Semantic matching of competences and skills
VDAB has a database of over 11,000 competences, which describe what an employee does within his or her function. This database keeps growing. One of the biggest challenges is avoiding duplicate competences and grouping competences that resemble each other. Given the size of the database, this can no longer be done manually. The aim of this project was to build an AI-based solution that automatically links similar competences.
Our matching algorithm works in several steps. Using the Google Translate API, we first translate everything from Dutch and French into English. This gives us four different "languages": Dutch, French, English (translated from Dutch) and English (translated from French). Next, we clean up and normalize all the text. This includes replacing punctuation marks with their correct standard characters and converting all upper-case letters to lower case.
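The normalization step can be sketched as follows. This is a minimal illustration, assuming Unicode NFKC normalization followed by a small, hypothetical table of typographic punctuation to replace; the project's exact cleaning rules are not described here. The function name `normalize` and the `PUNCT_MAP` table are illustrative only.

```python
import re
import unicodedata

# Hypothetical table of typographic punctuation mapped to plain ASCII.
# NFKC (applied first) already turns e.g. "\u2026" into "..." and a
# non-breaking space into a regular space, so those are not listed here.
PUNCT_MAP = {
    "\u2018": "'", "\u2019": "'",  # curly single quotes
    "\u201c": '"', "\u201d": '"',  # curly double quotes
    "\u2013": "-", "\u2014": "-",  # en and em dashes
}


def normalize(text: str) -> str:
    """Normalize Unicode, map punctuation, lowercase and collapse whitespace."""
    text = unicodedata.normalize("NFKC", text)
    for src, dst in PUNCT_MAP.items():
        text = text.replace(src, dst)
    text = text.lower()
    # Collapse runs of whitespace into a single space and trim the ends.
    return re.sub(r"\s+", " ", text).strip()


if __name__ == "__main__":
    # prints: klantgericht "werken" - teamwork
    print(normalize("Klantgericht  \u201cwerken\u201d \u2013 Teamwork"))
```

The same function would be applied to all four language variants, so that the embeddings in the next step see consistently formatted text.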
After this phase, we compute the word embeddings. We've chosen fastText from Facebook for this, as it allows us to perform subword matching. FastText provides pretrained models trained on Common Crawl data as well as on Wikipedia. Next, we use both classical methods and machine learning-based methods to compute sentence embeddings. Finally, we rank the candidate matches using these sentence embeddings and retrain our models based on continuous feedback from VDAB, so that the matching stays correct.
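To illustrate why subword information helps, the sketch below mimics how fastText represents a word: the word is wrapped in boundary markers `<` and `>`, split into character n-grams (fastText's default lengths are 3 to 6), and its vector is the average of the n-gram vectors. The n-gram vectors here are deterministic pseudo-random stand-ins derived from a hash, not trained fastText vectors, and the separate whole-word vector fastText adds for in-vocabulary words is omitted; with the official `fasttext` Python package, a loaded model's `get_word_vector` would play this role. For the sentence embedding we assume one classical method, the mean of the word vectors, and we rank candidates by cosine similarity. All function names and example strings are hypothetical, and the input is assumed to be already translated and normalized.

```python
import hashlib
import math
import random

DIM = 300  # matches the dimension of fastText's pretrained Common Crawl/Wikipedia vectors


def char_ngrams(word: str, minn: int = 3, maxn: int = 6) -> list[str]:
    """Character n-grams of the word wrapped in '<' and '>' boundary markers."""
    token = f"<{word}>"
    return [token[i:i + n]
            for n in range(minn, maxn + 1)
            for i in range(len(token) - n + 1)]


def ngram_vector(ngram: str) -> list[float]:
    """Stand-in for a trained n-gram embedding: a pseudo-random vector seeded by a hash."""
    seed = int.from_bytes(hashlib.sha256(ngram.encode("utf-8")).digest()[:8], "big")
    rng = random.Random(seed)
    return [rng.gauss(0.0, 1.0) for _ in range(DIM)]


def word_vector(word: str) -> list[float]:
    """Average the word's n-gram vectors, so even unseen words get a vector."""
    grams = char_ngrams(word)
    total = [0.0] * DIM
    for gram in grams:
        for k, x in enumerate(ngram_vector(gram)):
            total[k] += x
    return [x / len(grams) for x in total]


def sentence_vector(sentence: str) -> list[float]:
    """Classical sentence embedding (assumed here): the mean of the word vectors."""
    vectors = [word_vector(w) for w in sentence.split()]
    return [sum(column) / len(vectors) for column in zip(*vectors)]


def cosine(u: list[float], v: list[float]) -> float:
    """Cosine similarity between two non-zero vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    return dot / (math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v)))


def rank(query: str, candidates: list[str]) -> list[tuple[str, float]]:
    """Rank candidate competences by cosine similarity to the query, best first."""
    q = sentence_vector(query)
    scored = [(c, cosine(q, sentence_vector(c))) for c in candidates]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)


if __name__ == "__main__":
    for text, score in rank("competence in teamwork",
                            ["competency in team work", "bookkeeping", "teamwork"]):
        print(f"{score:.2f}  {text}")
```

Because "competence" and "competency" share most of their character n-grams, their vectors end up close even if one of them never appeared in training, which is the point of subword matching. In the real pipeline the stand-in vectors would be replaced by a pretrained fastText model, and the averaged sentence vectors by whichever classical or learned sentence embedding performs best on VDAB's feedback.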