You Have the Right to Remain Silent

An overview of Classification with Reject Option

Paulo Maia, Aug 2, 2021

The Miranda warning prevents us from self-incrimination.

You have the right to remain silent. Anything you say will be used against you.

If we hold ML models accountable for their predictions, shouldn’t we at least grant them that right? Can we expect ML models to know everything? We can’t. Moreover, it would be useful to know when the model is unsure about what to say.

Granting ML models the right to abstain is known as the reject option. And it’s pretty handy. We will show you how to use it in this article.

Context

Machine Learning and Artificial Intelligence algorithms are now applied in almost every industry, embedded in numerous value chains that depend on their decisions. However, despite continuous advances in the state of the art, these algorithms are still imperfect and make mistakes in critical situations. Each mistake can stem from several factors, for example:

Data points that are too close to a decision boundary – In real-world datasets, decision boundaries can be hard to define. For data points close to the boundary, the model returns predictions with low confidence, which are more likely to be misclassified.

Outliers – If the data point doesn’t belong to any population seen on the training set, it’s hard for the model to make an inference about that data point.

Missing data – There are several ways to deal with missing data, such as imputation or adding “missing data” flags. In both cases, we make assumptions about the data that might not hold, so inferences about those data points should carry a lower confidence level.

Low confidence might lead to misclassification, but is that really the model’s fault? When we ask an algorithm for a prediction about a data point, we force it to return an answer, even if it doesn’t know it. For several use cases (see some examples in “Applications”), it’s beneficial to give the model the option to remain silent: if the algorithm is not confident enough, it can reject the data point and avoid making a mistake. This is Classification with Reject Option.

Applications

This approach is only applicable when the decision can be passed on to another available decision system (e.g. another algorithm, a specialist, exams, or tests) or when there’s no need to return predictions for the entire dataset. In other words, apply Reject Option when the cost of having the model reject an instance is lower than the cost of an error. Here are a couple of examples of applications that can benefit from Classification with Reject Option:

Decision Support Systems for Medical Diagnosis – There are few things riskier than a misdiagnosis, especially for lethal diseases. Delegating that task to an algorithm and making it responsible for the decision is therefore hardly accepted by the medical community, since it raises many reliability concerns. For that reason, it is difficult to integrate AI algorithms into disease screening workflows. However, the intention of including AI in healthcare is to help specialists, not to replace them. Using Classification with Reject Option, the algorithm returns its predictions for the cases where it is highly confident and passes the decision on to a specialist when its confidence is lower. This way, the algorithm helps the specialist by relieving them of a significant workload.
Image-based Classification in Videos – In some applications, such as object detection, action recognition, video summarization, or face recognition, not every frame is relevant: videos typically run at 30 FPS and, most of the time, one good frame is enough to trigger a decision. In these cases, instead of returning noisy and uncertain frame-wise predictions, the model could use the Reject Option and return only the predictions with high confidence.
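
The cost rule stated at the start of this section can be sketched in a few lines of Python. This is only an illustration under strong assumptions: the model outputs well-calibrated class probabilities, and `error_cost` and `reject_cost` (hypothetical names and values, not from any library) are expressed in the same unit. A data point is rejected when its expected error cost exceeds the cost of passing it on.

```python
import numpy as np

def should_reject(probabilities, error_cost, reject_cost):
    """Return True where abstaining is cheaper than the expected cost of an error."""
    probabilities = np.asarray(probabilities, dtype=float)
    # Estimated probability that the top-scoring class is wrong, per data point.
    expected_error = 1.0 - probabilities.max(axis=1)
    return expected_error * error_cost > reject_cost

# Three data points from a binary classifier (illustrative values).
probs = [[0.97, 0.03], [0.55, 0.45], [0.20, 0.80]]
decisions = should_reject(probs, error_cost=10.0, reject_cost=3.0)
print(decisions)  # only the second, least confident point is rejected
```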

Implementation

Now that we have seen how Classification with Reject Option can help us in critical use cases, let’s explore how we can integrate it into our model implementations.

Method 0 – Threshold Optimization

The simplest way to integrate Reject Option in a Decision Support System is to post-process the model outputs, taking into account the confidence level and the performance goal. For example, if the acceptable performance is an average accuracy above 95%, you can optimize the confidence threshold for each class. To do so, follow these steps:

Compute the predictions for your validation dataset
For each class, iterate through the predicted confidences for that class, from the lowest to the highest
Use each confidence value as a candidate threshold
After applying the threshold, compute the metric of interest (in this case, the average accuracy)
Once you reach the average accuracy of 95%, you have found the optimized threshold

To avoid overfitting to the validation set, apply cross-validation and aggregate the per-fold thresholds with a sample statistic such as the average, median, or mode.
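
The steps above can be sketched as follows. This is a minimal illustration, not a reference implementation: the function names, the `REJECT` label and the toy data are hypothetical, and the metric is simplified to the accuracy of the accepted predictions of each class rather than an average over classes.

```python
import numpy as np

REJECT = -1  # hypothetical label for rejected data points

def find_class_thresholds(probs, y_true, target_accuracy=0.95):
    """For each class, pick the lowest confidence threshold whose accepted
    predictions reach the target accuracy on the validation data."""
    probs = np.asarray(probs, dtype=float)
    y_true = np.asarray(y_true)
    y_pred = probs.argmax(axis=1)
    confidence = probs.max(axis=1)
    thresholds = {}
    for cls in range(probs.shape[1]):
        in_class = y_pred == cls
        # Candidate thresholds: confidences observed for this class, ascending.
        for candidate in np.sort(np.unique(confidence[in_class])):
            accepted = in_class & (confidence >= candidate)
            if np.mean(y_true[accepted] == cls) >= target_accuracy:
                thresholds[cls] = float(candidate)
                break
        else:
            thresholds[cls] = None  # target never reached: reject the whole class
    return thresholds

def predict_with_reject(probs, thresholds):
    """Keep a prediction only if its confidence reaches the class threshold."""
    probs = np.asarray(probs, dtype=float)
    y_pred = probs.argmax(axis=1)
    confidence = probs.max(axis=1)
    output = np.full(len(probs), REJECT)
    for cls, threshold in thresholds.items():
        if threshold is not None:
            output[(y_pred == cls) & (confidence >= threshold)] = cls
    return output

# Toy validation set: class probabilities and ground truth (illustrative).
val_probs = [[0.9, 0.1], [0.6, 0.4], [0.8, 0.2], [0.3, 0.7], [0.45, 0.55], [0.1, 0.9]]
val_labels = [0, 1, 0, 1, 0, 1]
thresholds = find_class_thresholds(val_probs, val_labels)
predictions = predict_with_reject(val_probs, thresholds)
```

With cross-validation, `find_class_thresholds` would be called once per fold and the per-fold values combined, for instance with `np.median`.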

Despite being easy to implement, this method has some limitations. First, it’s hard to control the amount of data that is ignored by the post-processing. In the limit, this method can reach perfect metrics by rejecting all the data, so you will need extra mechanisms to prevent that from happening during optimization. Second, since this method is applied after getting the predictions, the model doesn’t learn how the feature space relates to data rejection. To overcome these limitations, we present the next three methods, found in the literature.

Method 1 – Adding a Rejection Class

This method was explored by Sousa, Ricardo Gamelas, et al. in [1] for a binary problem. The solution implemented by them included the following steps:

Define a value (random or not) as the initial threshold.
Compute the ratio of rejected instances (R = number of rejected instances / total number of instances in the dataset) and the ratio of misclassified data points (E = number of misclassified instances / total number of instances in the dataset).
Compute the Ȓ, using the equation Ȓ = ⍵R + E, where ⍵ is the rejection cost, R is the ratio of rejected instances, and E is the ratio of misclassified instances.
Repeat steps 1, 2, and 3 for a set of thresholds.
Select the threshold that minimizes Ȓ.
Create the Rejection Class, and re-label the dataset with that class when the predictions are under the threshold value.
Train a new model for the three-class problem.
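
A minimal sketch of the threshold search and re-labeling steps is shown below. It assumes that the first model’s confidences and predictions are already available as arrays, and that E counts only errors among the accepted data points; the function names, the rejection label `2` and the toy values are hypothetical and not taken from [1].

```python
import numpy as np

def empirical_risk(confidence, y_pred, y_true, threshold, reject_cost):
    """Compute R_hat = w * R + E for one threshold."""
    rejected = confidence < threshold
    misclassified = ~rejected & (y_pred != y_true)
    R = rejected.mean()       # ratio of rejected instances
    E = misclassified.mean()  # ratio of accepted but misclassified instances
    return reject_cost * R + E

def select_threshold(confidence, y_pred, y_true, thresholds, reject_cost):
    """Return the candidate threshold with the lowest empirical risk."""
    risks = [empirical_risk(confidence, y_pred, y_true, t, reject_cost)
             for t in thresholds]
    return thresholds[int(np.argmin(risks))]

def relabel_with_reject_class(y_true, confidence, threshold, reject_label=2):
    """Assign the rejection class to every instance under the threshold."""
    y_new = np.array(y_true, copy=True)
    y_new[confidence < threshold] = reject_label
    return y_new

# Outputs of the first (binary) model on the second data partition (toy values).
confidence = np.array([0.95, 0.55, 0.80, 0.60, 0.90, 0.65])
y_pred = np.array([0, 1, 0, 1, 1, 0])
y_true = np.array([0, 0, 0, 1, 1, 1])

best = select_threshold(confidence, y_pred, y_true,
                        thresholds=[0.5, 0.6, 0.7, 0.8], reject_cost=0.3)
new_labels = relabel_with_reject_class(y_true, confidence, best)
```

The array `new_labels` would then be the target for training the second, three-class model.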

A weakness of this method is that it needs two different training sets: one for the first model and a second one to be re-labeled and used to train the second model. If you’re dealing with small amounts of data, you might compromise model performance by training each model on only half of it.

Method 2 – Class-Specialized Models

This method was also presented by Sousa, Ricardo Gamelas, et al. in [1] for a binary problem. However, like the previous method, it can be adapted to multi-class problems.

The implementation of this solution integrated the following steps:

Define the Rejection Cost for each model, considering the context of your use case and the real-life costs. For example, if rejecting a sample implies that a specialist has to analyze it later, consider the duration of the task and the man-hour value.
Train the first model to become specialized in class 0, i.e. maximizing the precision for class 0.
Train the second model to become specialized in class 1, i.e. maximizing the precision for class 1.
Compute the predictions of both models for the test dataset.
For each data point, if the predictions match, classify the instance with the corresponding class; otherwise, classify it as “rejected”.
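
The last step, combining the two specialized models, can be sketched as below. Training is left out: the two prediction arrays are assumed to come from models already trained as described, and `combine_specialists` and the `REJECT` label are hypothetical names used only for illustration.

```python
import numpy as np

REJECT = -1  # hypothetical label for rejected data points

def combine_specialists(pred_model_0, pred_model_1):
    """Keep the class when both models agree; reject otherwise."""
    pred_model_0 = np.asarray(pred_model_0)
    pred_model_1 = np.asarray(pred_model_1)
    return np.where(pred_model_0 == pred_model_1, pred_model_0, REJECT)

# Predictions of the class-0 specialist and the class-1 specialist (toy values).
combined = combine_specialists([0, 1, 1, 0], [0, 1, 0, 0])
print(combined)  # the third data point is rejected because the models disagree
```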

To extend this method to a multi-class problem, you must train a different model for each class and then combine the predictions of all the models: if they are unanimous, the data point is classified; otherwise, it is rejected. This means the computation scales with the number of classes in the problem, which makes the method impractical for datasets with a large number of classes, such as ImageNet (1000 classes).

Method 3 – Regularization Through Loss Function

The fourth and last method was proposed by Geifman, Yonatan, and Ran El-Yaniv in [2] with the novel Deep Neural Network (DNN) architecture “SelectiveNet”. SelectiveNet can be adapted to any DNN by adding an extra task to the model for data selection. The selection task is self-supervised, which means there’s no ground truth for this task; its output is supervised only through the loss function.

The loss function then has two terms: one to penalize misclassifications on the data points that were not rejected by the model, and a second to penalize the rejection itself, so that the model doesn’t reject most of the data.

Additionally, the authors suggest adding an auxiliary task, which can be the same as the classification task or a different one, as long as it doesn’t ignore any data point. The purpose of this task is to force the model to learn the entire feature space represented by the available data and to learn the relation between the feature space and the rejection. Adding the auxiliary task implies adding a third term to the loss function, whose impact is weighted by a parameter.
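
To make the three terms concrete, here is a NumPy sketch of such a loss. It loosely follows the structure described above and should be checked against [2] before use: the quadratic coverage penalty, the parameter names (`target_coverage`, `lam`, `alpha`) and their default values are assumptions made for this illustration, and in practice the loss would be written in a deep learning framework so that it can be differentiated.

```python
import numpy as np

def selective_loss(per_sample_loss, selection, aux_per_sample_loss,
                   target_coverage=0.8, lam=32.0, alpha=0.5):
    """Combine the selective risk, a rejection penalty and an auxiliary loss.

    per_sample_loss:     classification loss of each data point
    selection:           selection output in [0, 1] (1 = keep, 0 = reject)
    aux_per_sample_loss: auxiliary-task loss of each data point (no rejection)
    """
    per_sample_loss = np.asarray(per_sample_loss, dtype=float)
    selection = np.asarray(selection, dtype=float)
    coverage = selection.mean()
    # Term 1: loss on the data points the model chose to keep.
    selective_risk = (per_sample_loss * selection).mean() / max(coverage, 1e-12)
    # Term 2: penalty when the model keeps less data than the target coverage.
    coverage_penalty = lam * max(0.0, target_coverage - coverage) ** 2
    # Term 3: auxiliary task computed over every data point.
    auxiliary = float(np.mean(aux_per_sample_loss))
    return alpha * (selective_risk + coverage_penalty) + (1.0 - alpha) * auxiliary

# Toy batch: the model keeps the two low-loss points and rejects the others.
losses = [0.2, 1.0, 0.4, 2.0]
selection = [1.0, 0.0, 1.0, 0.0]
loss_ok = selective_loss(losses, selection, losses, target_coverage=0.5)
loss_penalized = selective_loss(losses, selection, losses, target_coverage=0.75)
```

In this toy batch the coverage is 0.5, so raising the target coverage to 0.75 activates the penalty and increases the loss, which is the mechanism that discourages massive rejection.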

Of all the methods, this seems the most practical, since it is easy to implement, it doesn’t require an extra data partition to optimize the thresholds, and it doesn’t cause a significant increase in computation cost.

Conclusion

Reject Option methods are useful to increase the trustworthiness of Machine Learning models and to avoid mistakes in critical situations. However, they are not applicable to every use case. When a data point is rejected by the model and it can’t be ignored, someone or something has to handle it, and that option might not be available. Once again, the key to a successful AI system lies in understanding the problem, identifying the strengths and limitations of each possible method, and designing a solution that fits the problem and its context.

If you’re looking for more ideas or if you’re willing to discuss cutting-edge solutions in AI, contact us at [email protected]

References

[1] – Sousa, Ricardo Gamelas, et al. “Robust classification with reject option using the self-organizing map.” Neural Computing and Applications 26.7 (2015): 1603-1619.

[2] – Geifman, Yonatan, and Ran El-Yaniv. “SelectiveNet: A deep neural network with an integrated reject option.” International Conference on Machine Learning. PMLR, 2019.


NILG.AI