A proactive approach for improving a candidate's employability
Paulo Maia on Jan 18, 2021
With COVID-19, many people were affected by the economic crisis and lost their jobs. In Portugal alone, between February and September 2020, there was a 30% increase in unemployment! AI can be a powerful tool for allocating scarce resources more efficiently. Inspired by the DSSG Fellowship’s project in partnership with IEFP (Instituto do Emprego e Formação Profissional), we started to think about how we could help reduce unemployment using AI.
Similar to DSSG’s project, the goals that will be discussed here are:
This is a summary of an internal, non-exhaustive discussion, intended purely for learning purposes – there are multiple possible solutions, all of which depend on the development time available and the data you can access.
Let’s assume that we can gather data when an unemployed person registers on the platform, and that all job-related interactions are stored until the candidate finds a job. For simplicity, we can build our dataset from a list of monthly snapshots of all platform users’ characteristics, each user having a unique identifier.
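As a minimal sketch of what such a snapshot table could look like (all column names and values here are hypothetical):

```python
import pandas as pd

# Hypothetical monthly snapshots: one row per candidate per month,
# identified by candidate_id (all columns and values are made up).
snapshots = pd.DataFrame({
    "candidate_id": [101, 101, 102],
    "snapshot_month": pd.to_datetime(["2020-03-01", "2020-04-01", "2020-03-01"]),
    "age": [34, 34, 52],
    "n_previous_courses": [2, 3, 10],
    "months_registered": [1, 2, 5],
    # Target, filled in later from the platform's interaction history:
    "months_until_employment": [4, 3, 7],
})
print(snapshots)
```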
How can we encode in a similar way the information from a person with 10 previous courses vs. one with 2?
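One option is to aggregate each candidate's course history into fixed-size features (counts, totals, per-area indicators), so that 2 courses and 10 courses produce vectors of the same shape. A minimal sketch, assuming a hypothetical `courses` table:

```python
import pandas as pd

# Hypothetical table with one row per completed course, per candidate.
courses = pd.DataFrame({
    "candidate_id": [101, 101, 102, 102, 102],
    "course_area": ["IT", "IT", "health", "IT", "logistics"],
    "course_hours": [25, 50, 40, 25, 60],
})

# Aggregate to a fixed-size representation per candidate, regardless of
# whether they took 2 or 10 courses: counts, totals and per-area counts.
agg = courses.groupby("candidate_id").agg(
    n_courses=("course_area", "size"),
    total_hours=("course_hours", "sum"),
)
area_counts = pd.crosstab(courses["candidate_id"], courses["course_area"])
features = agg.join(area_counts)
print(features)
```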
The candidate’s CV is also rich in data – we could use several NLP techniques to extract features from it, such as a Bag of Words or the average word embedding value (which has the advantage of providing a semantic representation of words).
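A rough illustration of both ideas, using scikit-learn's CountVectorizer for the Bag of Words and random placeholder vectors standing in for pre-trained word embeddings:

```python
import numpy as np
from sklearn.feature_extraction.text import CountVectorizer

cvs = ["data analyst with sql and python experience",
       "warehouse operator forklift certified"]

# Bag of Words: sparse counts of the vocabulary found in the CVs.
bow = CountVectorizer().fit_transform(cvs)

# Average word embedding: mean of per-word vectors (random placeholders
# standing in for e.g. pre-trained fastText vectors).
rng = np.random.default_rng(0)
embedding = {w: rng.normal(size=50) for cv in cvs for w in cv.split()}
cv_vectors = np.array([
    np.mean([embedding[w] for w in cv.split()], axis=0) for cv in cvs
])
print(bow.shape, cv_vectors.shape)
```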
Note on fairness: We could have added a feature related to the candidate’s monthly expenses, as a way to estimate how much money they would need per month. But this could create a negative feedback loop – if a candidate has low expenses because they cannot afford a better standard of living, the model could end up allocating them to offers with low salaries.
Ideally, we would represent the offer list in a domain comparable to the candidate’s expertise, so our features could represent the similarity between the offer and the candidate’s skills. As such, offer areas would also be represented using a Content-Based approach.
If we were to use a model that cannot learn this relationship directly (e.g. tree-based models), we could calculate pairwise features, such as the difference between the offer’s remuneration and the candidate’s monthly expenses, or the intersection between the candidate’s and the offer’s areas.
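A small sketch of such pairwise features, with hypothetical field names for the candidate and the offer:

```python
def pairwise_features(candidate: dict, offer: dict) -> dict:
    """Hand-crafted features comparing a candidate to a job offer (hypothetical fields)."""
    candidate_areas = set(candidate["areas"])
    offer_areas = set(offer["areas"])
    return {
        # Overlap between the candidate's areas of expertise and the offer's areas.
        "n_shared_areas": len(candidate_areas & offer_areas),
        # Difference between the offer's remuneration and the candidate's
        # monthly expenses (mind the fairness caveat discussed above).
        "remuneration_minus_expenses": offer["remuneration"] - candidate["monthly_expenses"],
    }

print(pairwise_features(
    {"areas": ["IT", "data"], "monthly_expenses": 800},
    {"areas": ["data", "finance"], "remuneration": 1400},
))
```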
After having all these features, we can create a model that, given a candidate snapshot in a given month and a job offer, returns an employability score. There are several ways this employability score can be modeled, all of which can be tested; the choice should depend on how the model is intended to be applied in production:
Let’s assume, for now, that the model gives us the number of months until the candidate finds a job.
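Under that assumption, scoring becomes a standard regression problem. A minimal sketch with synthetic data and a gradient boosting regressor (just one of many possible model choices):

```python
import numpy as np
from sklearn.ensemble import GradientBoostingRegressor

# Synthetic stand-in for the real feature matrix: one row per
# (candidate snapshot, offer) pair, built from the features above.
rng = np.random.default_rng(0)
X = rng.normal(size=(500, 8))
y = np.clip(6 + 2 * X[:, 0] + rng.normal(size=500), 0, None)  # months until employment

model = GradientBoostingRegressor()
model.fit(X, y)

# A lower predicted number of months means a better candidate/offer match.
print(model.predict(X[:5]))
```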
We can evaluate the model performance using regression metrics such as the Mean Absolute/Squared Error and the Spearman/Pearson correlation coefficients between the target and the predicted value.
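These metrics are readily available in scikit-learn and SciPy; for example, on toy predictions:

```python
from scipy.stats import pearsonr, spearmanr
from sklearn.metrics import mean_absolute_error, mean_squared_error

# Observed vs. predicted months until employment (toy values).
y_true = [2, 5, 7, 1, 12]
y_pred = [3, 4, 9, 2, 10]

print("MAE:", mean_absolute_error(y_true, y_pred))
print("MSE:", mean_squared_error(y_true, y_pred))
print("Pearson:", pearsonr(y_true, y_pred)[0])
print("Spearman:", spearmanr(y_true, y_pred)[0])
```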
Time is an important component in our learning system and, as such, we must pay attention to the way we split our data for performance estimation. Random splits might “leak” information into training, giving overestimates of the model’s performance.
As we’re building a dataset with several rows for the same candidate in different months, if we had the same candidate in train and test, but in different months, we’d already know whether they found a job or not.
An initial approach is grouping the train/test splits by candidate (A is in train, B is in test, in the above example).
However, this approach still has the issue of time leakage. Imagine that you were training the model with data from October 2020. You know that, since a big shift in employment started in March 2020, the average value of “months until employment” has increased, so you could also get an overestimation of model performance on predictions for earlier months.
A possible solution is to do both temporal and grouped stratification: for instance, train with a list of applicants from the previous year and test with a list of different applicants from the current year. In the example above, you could train with data from 2019, using a candidate list that excludes A and B, and test on 2020 data for candidates A and B.
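A minimal sketch of such a temporal + grouped split on a hypothetical snapshot table:

```python
import pandas as pd

# Hypothetical snapshot table (see the dataset description above).
df = pd.DataFrame({
    "candidate_id": [1, 1, 2, 3, 3, 4],
    "snapshot_month": pd.to_datetime(
        ["2019-05-01", "2019-06-01", "2019-07-01",
         "2020-02-01", "2020-03-01", "2020-04-01"]),
    "months_until_employment": [3, 2, 6, 8, 7, 5],
})

# Temporal split: train on snapshots from 2019, test on snapshots from 2020...
train = df[df["snapshot_month"] < "2020-01-01"]
test = df[df["snapshot_month"] >= "2020-01-01"]

# ...and grouped split: drop test candidates that also appear in training,
# so no candidate contributes rows to both sides.
test = test[~test["candidate_id"].isin(train["candidate_id"])]
print(sorted(train["candidate_id"].unique()), sorted(test["candidate_id"].unique()))
```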
We can optimize the employment institute’s resources by computing the model’s predictions for the list of available job/training offers and using a cost function that tells us how good each offer is for a given candidate. We will not discuss this extensively in this blog post.
Imagine, for instance, that we want to create an email marketing campaign where we send a list of K offers to each candidate. We can decide between (at least) two types of actions:
– Improve the candidate’s skills (e.g., courses).
– Improve the candidate’s exposure to jobs (e.g., interviews).
This is an assignment problem with a huge number of possible combinations and budget/time constraints.
Two possible cost functions for this problem are:
This can then be optimized efficiently, for instance, with the use of metaheuristics, if the list of possible options is very large.
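As a toy illustration of the assignment under capacity constraints (all scores and capacities here are made up), a greedy baseline could look like the sketch below; a metaheuristic such as simulated annealing or a genetic algorithm could then refine this kind of solution when the option list is too large for exact methods:

```python
import numpy as np

# Hypothetical setup: score[i, j] is the predicted benefit of sending offer j
# to candidate i (e.g. expected reduction in months until employment).
rng = np.random.default_rng(0)
n_candidates, n_offers, K = 4, 6, 2
score = rng.uniform(size=(n_candidates, n_offers))
offer_capacity = np.full(n_offers, 2)  # e.g. limited course or interview slots

# Greedy baseline: go through (candidate, offer) pairs from best to worst score
# and assign while the candidate has fewer than K offers and the offer has capacity.
assigned = {i: [] for i in range(n_candidates)}
for i, j in sorted(np.ndindex(score.shape), key=lambda ij: -score[ij]):
    if len(assigned[i]) < K and offer_capacity[j] > 0:
        assigned[i].append(j)
        offer_capacity[j] -= 1

print(assigned)
```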
There are some ethical concerns with such cost functions:
To address this, we can include extra factors in the loss function, such as a multi-objective search: considering time constraints for finding jobs for everyone while, at the same time, reducing the average time people spend without a job.
This blog post was written after an internal discussion on this topic. Obviously, it covers only a small fraction of what could be done. If you’re interested in having such discussions about a specific business problem of yours, make sure to contact us!