Difficult Targets to Optimize: the ROC AUC

Aprendizagem Profunda • Aprendizagem Automática •

Difficult Targets to Optimize: the ROC AUC

A practical trick to learn ML models with imbalance data

Paulo Maia em Dez 18, 2020

In many binary classification problems, especially in domains with highly unbalanced problems (such as the medical domain and rare event detection), we need to make sure our model does not become too biased for the more predominant class.

Thus, you may have heard that accuracy is not a good metric to validate classifiers in unbalanced settings. Instead, people tend to use other performance metrics that are robust to imbalance, such as the ROC AUC and the F1-score. So, why don’t we train models that optimize these metrics directly? Well, in some cases it is not possible/efficient since they are not differentiable. This is a series of blog posts in which we’ll explain how to optimize your model for such metrics (or soft versions of them), starting with the ROC AUC.

What is the ROC AUC?

A ROC curve is a plot that illustrates the diagnostic ability of a binary classifier for varying discriminative thresholds in its probabilistic output. It can be built by thresholding the model’s predicted probabilities at several values between 0 to 1 and calculating the True Positive Rate (Proportion of positive samples correctly predicted as positive) and the False Positive Rate (proportion of negative samples incorrectly predicted as positive). The ROC curve is the plot of all these determined points, as shown below.

In ML, we typically want to achieve the curve with the highest possible area. Why? Maybe you didn’t know this, but the area under the ROC curve equals the probability of the classifier ranking a randomly chosen positive instance ahead of a randomly chosen negative one (the proof of such a theorem is available aqui). Namely, what’s the probability of assigning a higher priority to a sick patient than to a healthy patient.

So, we can think of the ROC AUC as the accuracy of a ranking model when exposed to a pair of samples with opposite classes (e.g, one sick and one healthy). Being the cross-entropy loss the de facto soft approach for learning binary classifiers when we have accuracy in mind, the cross-entropy of a pairwise ranking model (e.g., a siamese neural network) would be a soft way of learning that tends to maximize the ROC AUC.

How to optimize the ROC AUC?

Let’s say we have a model (e.g., a deep neural network) such as the following one that, given the input features, predicts a continuous score.

As we discussed before, maximizing the ROC AUC is equivalent to maximizing the accuracy of the score difference sign for a positive-negative pair:

Thus, we can simply use a Siamese architecture, where each stream will contain our target model, trained on positive-negative pairs. In our architecture, the scores will be subtracted and passed through a sigmoid activation in order to approximate the probability of the positive sample having a higher score than the negative sample.

For generating our training batches, each pair will have a sample from each class, one on the negative stream and one on the positive stream, meaning that the ground truth will always be 1, as the probability from the positive stream should always be higher than the probability from the negative stream. The model is trained by minimizing the cross-entropy loss of this pairwise target. We won’t converge to a naive solution here given that the weights on each stream of a siamese network are shared.

This means the model is penalized whenever the negative side has a greater score than the positive side. As such, we are optimizing the model so that it always gives a higher score to the positive class (input_pos) when compared to a negative class (input_neg) – which is essentially the definition of optimizing ROC AUC!

So, how do we turn this network into an actionable model, which returns the binary classes? We need to reduce our network to a single stream, with the pre-trained weights, and determine a threshold value for the predicted score above which the model classifies the class as positive.

Validation

This architecture was tested on the CIFAR10 dataset in Keras, by creating an artificially unbalanced problem. The positive class was considered to be “airplanes”, and the negative class was all the other classes in the dataset. The positive class was then subsampled to 5% so we created an artificially unbalanced problem.

Afterward, we used a feedforward network with internal dropout layers and compared the performance of the simple cross-entropy-based strategy to train classifiers with the siamese-based models that we discussed in this post. The experiment was repeated 5 times with different random seeds to get a mean value more independent of the image selection process.

The mean ROC AUC value for the Siamese Network was (86 ± 1.3)%, while for the one-stream network the value was reduced to (72.2 ± 7.2)%, showing that optimizing the model with the siamese architecture was beneficial for the ROC AUC.

Conclusão

This blogpost explained how to optimize your model for a different metric, based on the probabilistic interpretation of ROC AUC.

At NILG.AI, we have worked on a lot of medical and marketing applications, where targets tend to be extremely unbalanced. We have used this strategy in several projects, achieving on each case higher performance with this learning strategy than with traditional learning approaches. If you are facing a similar problem, let’s discuss how we can collaborate with these types of learning approaches!

Gosta desta história?

Subscreva a Nossa Newsletter

Ofertas especiais, últimas notícias e conteúdo de qualidade na sua caixa de entrada.

Bolacha	Duração	Descrição
cookielawinfo-checkbox-analiticas	11 meses	Este cookie é definido pelo plugin de Consentimento de Cookies do RGPD. O cookie é usado para armazenar o consentimento do utilizador para os cookies na categoria "Análise".
--- O seu texto é uma etiqueta ou nome de campo, provavelmente de um sistema de gestão de cookies ou de um formulário web, e não uma frase completa que necessite de tradução contextual. No entanto, se o objectivo for manter a clareza e a funcionalidade para um utilizador de língua portuguesa, sugiro a seguinte tradução e explicação: "Checkbox Funcional" Explicação: * Checkbox: Refere-se ao elemento gráfico de marcação (uma caixa que pode ser seleccionada ou desmarcada). * Funcional: Indica que esta caixa de seleção está relacionada com funcionalidades essenciais do website, como o login, a gestão do carrinho de compras ou outras características que tornam o site utilizável. Se esta etiqueta pertencer a um contexto onde se refere especificamente a cookies, a tradução poderia ser ajustada para ter mais clareza: "Aceitação de Cookies Funcionais" ou "Cookies Essenciais (Funcionais)" Esta última opção é comum em avisos de cookies para indicar que estes são estritamente necessários para o funcionamento do site. ---	11 meses	O cookie é definido pelo consentimento de cookies GDPR para registar o consentimento do utilizador para os cookies na categoria "Funcional".
cookielawinfo-checkbox-necessary	11 meses	Este cookie é definido pelo plugin GDPR Cookie Consent. O cookie é usado para armazenar o consentimento do utilizador para os cookies na categoria "Necessário".
cookielawinfo-checkbox-outros	11 meses	Este cookie é definido pelo plugin GDPR Cookie Consent. O cookie é usado para armazenar o consentimento do utilizador para os cookies na categoria "Outros".
checkbox-performance-cookielawinfo	11 meses	Este cookie é definido pelo plugin GDPR Cookie Consent. O cookie é usado para armazenar o consentimento do utilizador para os cookies na categoria "Desempenho".
política_de_cookies_visualizada	11 meses	O cookie é definido pelo plugin GDPR Cookie Consent e é utilizado para armazenar se o utilizador consentiu ou não com a utilização de cookies. Não armazena quaisquer dados pessoais.

NILG.AI