Semantic matching of competences and skills

The challenge

VDAB has a database of over 11,000 competences that describe an employee within his or her function, and the database keeps growing. One of the biggest challenges is avoiding duplicate competences and grouping competences that resemble each other. Given the size of the database, this can no longer be done manually. The aim of this project was to build an AI-based solution that automatically links similar competences.

Our solution

Our matching algorithm works in several steps. First, we use the Google Translate API to translate everything from Dutch and French into English. This gives us four different "languages": Dutch, French, English translated from Dutch, and English translated from French. Next, we clean up and normalize all the text. This includes converting all punctuation marks to a consistent, standard form and converting all upper-case letters to lower case.
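
To make the clean-up step concrete, here is a minimal Python sketch of this kind of normalization. The specific rules (Unicode NFKC folding, mapping curly quotes, guillemets and dashes to plain ASCII, lowercasing, collapsing whitespace) and the names `PUNCT_MAP` and `normalize` are our own assumptions for illustration, not the exact rules used in the project.

```python
import re
import unicodedata

# Hypothetical mapping of typographic punctuation to plain ASCII equivalents.
# NFKC (applied first) already turns non-breaking spaces into spaces and the
# ellipsis character into "...", so those are not listed here.
PUNCT_MAP = str.maketrans({
    "\u2018": "'", "\u2019": "'",   # curly single quotes
    "\u201c": '"', "\u201d": '"',   # curly double quotes
    "\u00ab": '"', "\u00bb": '"',   # French guillemets
    "\u2013": "-", "\u2014": "-",   # en dash and em dash
})


def normalize(text: str) -> str:
    """Return a normalized, lower-cased version of a competence label."""
    # NFKC folds compatibility characters such as full-width letters.
    text = unicodedata.normalize("NFKC", text)
    # Replace typographic punctuation with a single standard form.
    text = text.translate(PUNCT_MAP)
    # Convert upper-case letters to lower case.
    text = text.lower()
    # Collapse runs of whitespace and trim the ends.
    return re.sub(r"\s+", " ", text).strip()
```

Applying the same function to all four "languages" keeps the inputs to the embedding step comparable; accents are deliberately left untouched, since they carry meaning in Dutch and French.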

After this phase, we compute the word embeddings. We chose Facebook's fastText for this, as it allows us to perform subword matching. fastText provides pretrained models trained on both Common Crawl and Wikipedia data. Next, we use both classical methods and machine learning-based methods to build sentence embeddings. Finally, we rank the matches produced by each of these sentence embeddings, and based on continuous feedback from VDAB we retrain our models to ensure correct matching.
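
The idea behind subword matching can be sketched in plain Python. This is a toy, not the production pipeline: the hypothetical helpers below split each word into fastText-style character n-grams (lengths 3 to 6 with `<` and `>` boundary markers, matching fastText's defaults), hash each n-gram into a bucket with FNV-1a (the hash family fastText itself uses), and assign every bucket a deterministic random vector as a stand-in for trained embeddings. Word vectors are averages of their n-gram vectors, sentence vectors are plain averages of word vectors (one classical approach), and candidates are ranked by cosine similarity. Because related word forms share most of their n-grams, they get similar vectors even without any training.

```python
import math
import random
from typing import List, Tuple

DIM = 100             # fastText's default training dimension (pretrained vectors use 300)
BUCKETS = 2_000_000   # fastText's default number of hash buckets for n-grams
MINN, MAXN = 3, 6     # fastText's default character n-gram lengths


def char_ngrams(word: str) -> List[str]:
    """Character n-grams of a word wrapped in '<' and '>' boundary markers."""
    token = f"<{word}>"
    return [token[i:i + n]
            for n in range(MINN, MAXN + 1)
            for i in range(len(token) - n + 1)]


def fnv1a(s: str) -> int:
    """32-bit FNV-1a hash over the UTF-8 bytes of s."""
    h = 2166136261
    for byte in s.encode("utf-8"):
        h = ((h ^ byte) * 16777619) & 0xFFFFFFFF
    return h


def ngram_vector(gram: str) -> List[float]:
    # Stand-in for a trained embedding row: a deterministic random vector per bucket.
    rng = random.Random(fnv1a(gram) % BUCKETS)
    return [rng.gauss(0.0, 1.0) for _ in range(DIM)]


def word_vector(word: str) -> List[float]:
    # Average of the subword vectors (fastText also adds a whole-word vector,
    # omitted here), so unseen words and spelling variants still get a vector.
    grams = char_ngrams(word)
    vec = [0.0] * DIM
    for g in grams:
        for k, x in enumerate(ngram_vector(g)):
            vec[k] += x
    return [x / len(grams) for x in vec]


def sentence_vector(text: str) -> List[float]:
    # Classical baseline: the mean of the word vectors (zero vector for empty text).
    words = text.split()
    vec = [0.0] * DIM
    for w in words:
        for k, x in enumerate(word_vector(w)):
            vec[k] += x
    return [x / len(words) for x in vec] if words else vec


def cosine(a: List[float], b: List[float]) -> float:
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb) if na and nb else 0.0


def rank(query: str, candidates: List[str]) -> List[Tuple[str, float]]:
    """Score every candidate against the query, highest cosine similarity first."""
    q = sentence_vector(query)
    scored = [(c, cosine(q, sentence_vector(c))) for c in candidates]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)
```

For example, `rank("klantgerichtheid", ["boekhouding", "klantgericht"])` returns both candidates with their similarity scores, best match first. In a real setup the random stand-in vectors would be replaced by a pretrained fastText model (for instance through the `fasttext` package's `load_model` and `get_word_vector`), and mean pooling would be only one of several sentence-embedding methods being compared.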


