The Miranda warning prevents us from self-incrimination.
You have the right to remain silent. Anything you say will be used against you.
If we hold ML models accountable for their predictions, shouldn’t we at least grant them that right? Can we expect ML models to know everything? I guess we don’t! Moreover, it would be beneficial to know when the model is unsure about what to say.
Granting ML models the right to abstain is known as the reject option. And it’s pretty handy. We will show you how to use it in this article.
Context
Machine Learning and Artificial Intelligence algorithms are currently applied in almost every industry, integrating numerous Value Chains that depend on their decisions. However, despite the continuous advances in the state-of-the-art, these algorithms are still not perfect and make several mistakes in critical situations. The cause of each mistake might rely on several factors, for example:
Data points that are too close to a decision boundary – In real-world datasets, decision boundaries might be hard to define. In those cases, in the predictions close to the boundary, the model returns predictions with low levels of confidence, which might lead to misclassified data.
Outliers – If the data point doesn’t belong to any population seen on the training set, it’s hard for the model to make an inference about that data point.
Missing data – There are several ways to deal with missing data, either by imputation or by adding “missing data” flags. In both cases, we are making assumptions about the data that might not be true, therefore, the inferences about those data points should have a lower confidence level.
Low confidence levels might lead to misclassification cases but is it really the model’s fault? When we ask an algorithm for its predictions about a data point, we force it to return an answer, even if it doesn’t know it. For several use cases (see some examples in “Applications”), it’s beneficial to give the model the option to remain silent, i.e., if the algorithm is not confident enough, it has the option to reject the data point, avoiding making mistakes – Classification withReject Option.
Course
The ABCs of Machine Learning
Get familiar with the Machine Learning concepts with our course.
This approach is only applicable when it is possible to pass on the decision to another available decision system (e.g. another algorithm, a specialist, exams, or tests) or when there’s no need to return predictions for the entire dataset. In other words, apply Reject Option when the cost of rejecting an instance by the model is lower than the error cost. Here are a couple of examples of applications that can benefit from an approach of Classification with Reject Option:
Decision Support Systems for Medical Diagnose – there are few things riskier than a misdiagnosis, especially for lethal diseases. Therefore, delegating that task to an algorithm and making it responsible for the decision is hardly accepted by the medical community, since it brings a lot of reliability issues to the table. For that reason, it is difficult to integrate AI algorithms in the screening workflow of diseases. However, the intention of including AI in healthcare is to help the specialist and not to replace them. Using a Classification with Reject Option approach, the algorithm returns its predictions for the cases where it is highly confident and passes on the decision to a specialist when the confidence levels are lower. This way, the algorithm will be helping the specialist, relieving him/her from a significant workload.
Image-based Classification in Videos – In some applications, such as object detection, action recognition, video summary, or face recognition, not every frame is relevant since the normal frame rate of videos is 30 FPS and most of the time, one good frame is enough to trigger a decision. In these cases, instead of returning frame-wise predictions with a lot of noise and uncertainty, the models could use the Reject Option trick and return the predictions with high levels of confidence, only.
Implementation
Now that we have seen how Classification with Reject Option can help us in critical use cases, let explore how we can integrate it in our model implementations.
Method 0 – Threshold Optimization
The easiest and simplest way to integrate Reject Option in a Decision Support System is applying post-processing on the model results, considering the confidence level and the performance goal. For example, if the acceptable performance is an average accuracy above 95%, you can optimize the confidence threshold for each class. To do so, follow these steps:
Compute the predictions to your validation dataset
For each class, iterate through the prediction corresponding to the class from the lowest to the highest
Consider that prediction as to the value of your threshold
After applying the threshold, compute the metric of interest (in this case, the average accuracy)
Once you achieve the average accuracy of 95%, you have found the optimized threshold
To avoid overfitting over the validation set, apply cross-validation and compute the optimized threshold considering one of the sample statistics: average, median, or mode.
Despite being easy to implement, this method has some limitations. First of all, it’s hard to regularize the amount of data that is being ignored by post-processing. In the limit, this method is able to find perfect metrics by ignoring all the data, so you will need extra mechanisms to avoid that to occur in your optimization. Second of all, since this method is applied after getting the predictions, the model doesn’t learn how the feature space is related to data rejection. To overcome these limitations, we present to you the next three methods found in the literature.
Method 1 – Adding a Rejection Class
This method was explored by Sousa, Ricardo Gamelas, et al. in [1] for a binary problem. The solution implemented by them included the following steps:
Define a value (random or not) as the initial threshold.
Compute the ratio of rejected instances (R = number of rejected instances / total number of instances in the dataset) and the ratio of misclassified data points (E = number of misclassified instances / total number of instances in the dataset).
Compute the Ȓ, using the equation Ȓ = ⍵R + E, where ⍵ is the rejection cost, R is the ratio of rejected instances, and E is the ratio of misclassified instances.
Repeat steps 1, 2, and 3 for a set of thresholds.
Select the threshold that minimizes Ȓ.
Create the Rejection Class, and re-label the dataset with that class when the predictions are under the threshold value.
Train a new model for the 3 classes problem.
A weakness of this model is that it needs two different training sets, one for the first model and a second to be re-labeled and to train the second model. If you’re dealing with small amounts of data, you might compromise the model performance by using only half of it.
Method 2 – Class-Specialized Models
This method was also presented by Sousa, Ricardo Gamelas, et al. in [1] for a binary problem. However, as well as the previous method, it can be adapted for the multi-class problem.
The implementation of this solution integrated the following steps:
Define the Rejection Cost for each model, considering the context of your use case and the real-life costs. For example, if rejecting a sample implies that a specialist has to analyze it later, consider the duration of the task and the man-hour value.
Train the first model to become specialized on class 0, i.e. maximizing the precision for class 0.
Train the second model to become specialized in class 1, i.e. maximizing the precision for class 1.
Compute the predictions for the test dataset, for model 1 and for model 2.
For each data point, if the predictions match, classify the instance with the corresponding class otherwise, classify it as “rejected”.
To extend this method for a multi-class problem, you must train a different model for each class and then combine the predictions of all the models to check if there is unanimity, otherwise, the data point is rejected. This means the computation scales with the number of classes in the problem, which makes it impracticable when working with datasets with a high amount of classes, as the Imagenet (1000 classes), for example.
Method 3 – Regularization Through Loss Function
The fourth and last method was proposed by Geifman, Yonatan, and Ran El-Yaniv in [2] with the novel Dense Neural Network (DNN) architecture “Selectivenet”. The Selectivenet can be adapted to any DNN, by adding an extra task to the model for data selection. The selection task is self-supervised, which means there’s no ground truth related to this task but its output is supervised by the loss function.
The loss function has then two terms: one to punish misclassifications on the data points that were not rejected by the model and a second term to punish the rejection itself to avoid a massive rejection.
Additionally, the authors suggest joining an auxiliary task that can be the same as the classification task or a different one, as long as it doesn’t ignore any data point. The purpose of this task is to force the model to learn the entire feature space represented by the available data and to learn the relation between the feature space and the rejection. Adding the auxiliary task implies the addition of a third term to the loss function, whose impact is regularized by a parameter.
From all the methods this seems the most functional since it is easy to implement, it doesn’t require an extra data partition to optimize the thresholds, and it doesn’t cause a significant increase in the computation cost.
Course
The Machine Learning Spectrum
To know more about other learning strategies, check our course.
Reject Option methods are useful to increase the trustability of Machine Learning methods and to avoid mismanagement in critical situations. However, it is not applicable to every use case. When a data point is rejected by the model and it can’t be ignored, someone or something has to handle it, and that option might not be available. Once again, the key to a successful AI system is in understanding the problem, finding the strengths and the limitations associated with each possible method, and designing a solution that fits the problem and its context.
If you’re looking for more ideas or if you’re willing to discuss cutting-edge solutions in AI, contact us at [email protected]
Special offers, latest news and quality content in your inbox.
Signup single post
Recommended Articles
Article
AI City Intelligence
Oct 31, 2024 in
Use Case
Imagine being able to make better decisions about where to live, where to establish a new business, or how to understand the changing dynamics of urban neighborhoods. Access to detailed, up-to-date information about city environments allows us to answer these questions with greater confidence, but the challenge lies in accessing and analyzing the right data. […]
EcoRouteAI: Otimização de Ecopontos com Inteligência Artificial
Sep 30, 2024 in
News
O Plano Estratégico para os Resíduos Urbanos (PERSU) 2030 definiu metas ambiciosas para a gestão de resíduos em Portugal, com o objetivo de aumentar a reciclagem e melhorar a sustentabilidade ambiental. No entanto, os atuais índices de reciclagem e separação de resíduos ainda estão aquém do necessário, tanto a nível nacional quanto europeu, criando desafios […]
NILG.AI named Most-Reviewed AI Companies in Portugal by The Manifest
Aug 28, 2024 in
News
The artificial intelligence space has been showcasing many amazing technologies and solutions. AI is at its peak, and many businesses are using it to help propel their products and services to the top! You can do it, too, with the help of one of the best AI Companies in Portugal: NILG.AI. We focus on your […]
We use cookies on our website to give you the most relevant experience by remembering your preferences and repeat visits. By clicking “Accept All”, you consent to the use of ALL the cookies. However, you may visit "Cookie Settings" to provide a controlled consent.
This website uses cookies to improve your experience while you navigate through the website. Out of these, the cookies that are categorized as necessary are stored on your browser as they are essential for the working of basic functionalities of the website. We also use third-party cookies that help us analyze and understand how you use this website. These cookies will be stored in your browser only with your consent. You also have the option to opt-out of these cookies. But opting out of some of these cookies may affect your browsing experience.
Necessary cookies are absolutely essential for the website to function properly. These cookies ensure basic functionalities and security features of the website, anonymously.
Cookie
Duration
Description
cookielawinfo-checkbox-analytics
11 months
This cookie is set by GDPR Cookie Consent plugin. The cookie is used to store the user consent for the cookies in the category "Analytics".
cookielawinfo-checkbox-functional
11 months
The cookie is set by GDPR cookie consent to record the user consent for the cookies in the category "Functional".
cookielawinfo-checkbox-necessary
11 months
This cookie is set by GDPR Cookie Consent plugin. The cookies is used to store the user consent for the cookies in the category "Necessary".
cookielawinfo-checkbox-others
11 months
This cookie is set by GDPR Cookie Consent plugin. The cookie is used to store the user consent for the cookies in the category "Other.
cookielawinfo-checkbox-performance
11 months
This cookie is set by GDPR Cookie Consent plugin. The cookie is used to store the user consent for the cookies in the category "Performance".
viewed_cookie_policy
11 months
The cookie is set by the GDPR Cookie Consent plugin and is used to store whether or not user has consented to the use of cookies. It does not store any personal data.
Functional cookies help to perform certain functionalities like sharing the content of the website on social media platforms, collect feedbacks, and other third-party features.
Performance cookies are used to understand and analyze the key performance indexes of the website which helps in delivering a better user experience for the visitors.
Analytical cookies are used to understand how visitors interact with the website. These cookies help provide information on metrics the number of visitors, bounce rate, traffic source, etc.
Advertisement cookies are used to provide visitors with relevant ads and marketing campaigns. These cookies track visitors across websites and collect information to provide customized ads.