From supervised to unsupervised learning
Paulo Maia on May 5, 2020
Insurance codes are used by health plans to decide how much your doctor and other healthcare providers should be paid. Several coding systems are currently in use [1]:
After a given procedure, healthcare professionals list the procedure’s code in an insurance claim form so that the hospital is partially or fully reimbursed for that procedure.
However, it is natural that not all claims sent correspond to the actual procedures that were performed – due to fraud or submission errors, for instance.
Here’s an example (adapted from [1]): if you fall and sprain your ankle and go to the emergency services as a result, you might end up getting an X-ray of the ankle. If, by mistake, the healthcare professionals label the ankle X-ray as an elbow X-ray but still give you a diagnosis of a sprained ankle, the procedure and the diagnosis are not consistent, and the insurance claim might end up being rejected.
What issues arising from this process can have monetary consequences, from the point of view of each negatively affected stakeholder?
So, how can we use AI to assist in this area?
We will show you how to solve it using multiple techniques, including supervised, unsupervised and weakly supervised versions. To learn more, enroll in our online course, where we discuss all of these concepts in depth.
We can identify problems in insurance claims’ submissions at a certain granularity level:
This use case of insurance claim error detection can be applied to both hospitals and insurance companies, as there is interest in understanding which claims are not correct. Basically, the possible options are to: identify a claim as wrong, correct an error in claim codes and/or try to explain why or where it occurred.
We will now start using letters to refer to the claim codes, as a simplification. The following figure represents the possible inputs and outputs of an insurance claim model.
Regarding the third case, the claim sent by the hospital can contain extra codes. An insurance company will want the claim to contain as few codes as possible, so the model should delete the unnecessary ones.
For the fourth case, a claim sent by a hospital can have missing codes for some procedures (e.g. by mistake or through lack of knowledge of a specific code). A model applied in a hospital should be able to add codes when they are missing.
A model used to detect errors in insurance claims should be invariant to the order in which the healthcare professional places the codes (e.g. A-B-C or C-A-B). We can add this invariance in different ways:
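As a minimal sketch of this invariance (one possible option, assumed here for illustration rather than taken from the original list), we can map every claim to an order-independent representation before it reaches the model. The helper names `canonical_claim` and `claim_as_bag` are hypothetical.

```python
from collections import Counter

def canonical_claim(codes):
    """Return an order-independent key for a claim: its codes, sorted."""
    return tuple(sorted(codes))

def claim_as_bag(codes):
    """Return the claim as a multiset (code -> count), ignoring order."""
    return Counter(codes)

# A-B-C and C-A-B collapse to the same representation.
same_key = canonical_claim("ABC") == canonical_claim("CAB")
same_bag = claim_as_bag(["A", "B", "C"]) == claim_as_bag(["C", "A", "B"])
```

Both representations keep repeated codes, so a claim with two A codes stays distinct from a claim with one.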
There are several approaches we can take to this problem, depending on the number of labels and the amount of data available. We will give some examples of how to approach it in a supervised and an unsupervised way, with a more detailed focus on the unsupervised approach.
Two major learning mechanisms can be used:
If we have both positive and negative labels, this is a classical supervised learning problem framed as binary classification. We can then manually extract features from the codes, such as the co-occurrence of code pairs, or use a deep model (e.g. an RNN) to infer the relationships between codes from the input.
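To make the hand-crafted option concrete, here is a small sketch of one co-occurrence feature: how often the rarest pair of codes in a claim has been seen together in past claims. The function names and the `history` list are hypothetical, and this is only one of many features we could feed to a binary classifier.

```python
from collections import Counter
from itertools import combinations

def pair_counts(claims):
    """Count how many claims contain each unordered pair of distinct codes."""
    counts = Counter()
    for claim in claims:
        for pair in combinations(sorted(set(claim)), 2):
            counts[pair] += 1
    return counts

def rarest_pair_frequency(claim, counts, n_claims):
    """Feature: fraction of past claims containing the claim's rarest code pair."""
    pairs = list(combinations(sorted(set(claim)), 2))
    if not pairs:
        return 1.0  # a single-code claim has no pair to be inconsistent
    return min(counts[p] for p in pairs) / n_claims

history = ["ABC", "ABD", "ABC", "CAB"]  # hypothetical past claims
counts = pair_counts(history)
common = rarest_pair_frequency("ABC", counts, len(history))  # 0.75
rare = rarest_pair_frequency("ABD", counts, len(history))    # 0.25
```

A low value suggests the claim combines codes that rarely appear together, which a classifier can use alongside the label.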
However, as the examples labelled negative can also contain positives, we can instead treat this problem as weakly supervised and use Positive-Unlabeled learning (PU Learning), in which the class that is not positive is considered a mixed set containing both negative and positive examples. Within PU Learning, there are several algorithms that can be used, some of which are described or referenced in the literature [2].
If there are no labels available at all, we then need to follow an unsupervised approach. We will describe a few examples next:
We can train a word2vec model that, given two codes, estimates the most likely adjacent code. Note that we are adding position-invariance.
This way, we can train code embeddings (similar to word embeddings) which learn the relationships between different codes. Then, for each code in a claim, we check whether its embedding is the one with the smallest distance to its neighbours. If not, we replace it with the code that is.
This is more error-prone, as codes for common operations can have similar embedding distances.
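The distance check can be sketched as follows. The embeddings here are hand-made 2D toy vectors, not the output of a trained word2vec model, and the claim is a simplified A-G-K-M sequence; the function names are hypothetical.

```python
import math

def mean_distance(code, neighbours, emb):
    """Average Euclidean distance from one code to the other codes in the claim."""
    return sum(math.dist(emb[code], emb[n]) for n in neighbours) / len(neighbours)

def suggest_replacement(claim, position, emb):
    """Return the code whose embedding is closest to the rest of the claim."""
    neighbours = [c for i, c in enumerate(claim) if i != position]
    return min(emb, key=lambda code: mean_distance(code, neighbours, emb))

# Toy embeddings (assumed for illustration): M sits far from the other codes.
emb = {
    "A": (0.0, 0.0),
    "G": (2.0, 0.0),
    "K": (1.0, 2.0),
    "D": (1.0, 0.7),
    "M": (6.0, 6.0),
}
claim = ["A", "G", "K", "M"]
suggestion = suggest_replacement(claim, 3, emb)
needs_fix = suggestion != claim[3]
```

With these toy vectors the last code is flagged and D is suggested in its place. In practice several codes can sit at similar distances, which is why this approach is more error-prone.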
Using a generative model, we can fit a density function to our claims. The most common claims will be close together in the resulting probability space. Examples of such models are variational autoencoders and Gaussian mixture models. We can then estimate the probability of each claim being an outlier.
With a denoising autoencoder, we try to reconstruct a claim sequence: we add noise, and the model tries to identify what is wrong and correct it. We can then calculate a reconstruction error, which tells us whether the claim should have more of one code and fewer of another. This gives us an explanation of what is wrong in the claim.
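Assuming we already have a reconstructed claim from such a model, the explanation step can be sketched as a simple comparison of code counts. The function name `explain_difference` is hypothetical, and the reconstruction below is written by hand rather than produced by an autoencoder.

```python
from collections import Counter

def explain_difference(original, reconstructed):
    """Compare code counts between a claim and its reconstruction."""
    before, after = Counter(original), Counter(reconstructed)
    return {
        "to_remove": dict(before - after),  # codes the claim has too many of
        "to_add": dict(after - before),     # codes the claim has too few of
    }

# Hypothetical model output: AAGKM reconstructed as AAGKD.
explanation = explain_difference("AAGKM", "AAGKD")
```

Here the explanation says that one M should be removed and one D added, which is the kind of readable output a reviewer can act on.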
Alternatively, we can have a single model which tells us the probability of each element being wrong (and therefore, we know the probability of the whole claim being wrong).
To do this, we randomly add label noise by adding, removing and swapping codes in the claims. We then have an autoencoder with a sigmoid layer that outputs the probability of each code being wrong.
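The noise step could look like the sketch below. We assume here that "swapping" means replacing one code with a different one, that claims are non-empty, and that the vocabulary has at least two codes; `corrupt_claim` is a hypothetical helper, not part of any library.

```python
import random

def corrupt_claim(claim, vocabulary, rng):
    """Apply one random edit to a claim and return (noisy_claim, operation)."""
    noisy = list(claim)
    operation = rng.choice(["add", "remove", "swap"])
    if operation == "add":
        # Insert a random code at a random position.
        noisy.insert(rng.randrange(len(noisy) + 1), rng.choice(vocabulary))
    elif operation == "remove" and len(noisy) > 1:
        # Drop one code, but never empty the claim.
        del noisy[rng.randrange(len(noisy))]
    else:
        # Swap: replace one code with a different code.
        # Single-code claims that drew "remove" also end up here.
        position = rng.randrange(len(noisy))
        alternatives = [c for c in vocabulary if c != noisy[position]]
        noisy[position] = rng.choice(alternatives)
        operation = "swap"
    return noisy, operation

rng = random.Random(0)  # seeded for reproducibility
vocabulary = list("ABDGKM")
noisy, operation = corrupt_claim(list("AAGKM"), vocabulary, rng)
```

Each (noisy, clean) pair then becomes a training example, and the returned operation can serve as a label if we later want the model to say which kind of edit is needed.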
Since we have probabilities, we have a higher degree of confidence in the model (and can measure its uncertainty) and can better decide which claims we should manually analyze. On the other hand, we know a certain sequence has a high probability of being wrong, but we don’t know whether a code should be added or deleted.
To solve this issue, we could add a network with three extra tasks: the probability of the claim code being wrong because it needs to be edited, deleted or added.
We can also have a generative model which shares weights with a denoising autoencoder (or other reconstructive model). This way, a generative model tells us which claim is wrong, and the reconstructive model tells us why it’s wrong (i.e., what part of the sequence is wrong), returning also the corrected sequence.
So, how can we make our model actionable? Let us assume we are an insurance company with these two tools:
If we want to select N cases to manually evaluate, how could we optimize this to determine which are the most cost-effective claims?
The insurance company has certain costs associated with this procedure. Consider an example claim with the codes AAGKM, which should be AAGKD. Each code is a procedure/item with a certain cost.
Code | Cost (€) |
---|---|
A | 5 |
B | 50 |
K | 1000 |
M | 200 |
G | 300 |
D | 100 |
AAGKM = 5*2 + 300 + 1000 + 200 = 1510 €
AAGKD = 5*2 + 300 + 1000 + 100 = 1410 €
Positive cases are fraud cases, which we want to manually evaluate.
If we are applying this model in an insurance company, we want to maximize both True Positives (TP) and True Negatives (TN). By maximizing True Negatives, we save analysis time, and by maximizing True Positives, we are reducing the number of cases which the insurance company should not be paying, but actually is.
On the other hand, if we apply this in a hospital, we want to minimize False Positives (FP), cases which are flagged as fraud but are not, costing man-hours to evaluate manually, and False Negatives (FN), cases which are flagged as negative but are actually fraud, costing money due to errors.
There is a certain cost associated with correcting something in a claim and a price difference between the reconstructed claim and the original claim.
For each claim, X, we can calculate a score, and choose the N samples with the highest scores as the claims to manually evaluate.
This score is composed of two terms. In the first term, containing the expected value in case fraud is detected, we multiply the probability of fraud by the money saved by the insurance company when fraud is detected. Here, the money saved is the original claim price minus the price of the corrected claim, minus the man-hour cost of correcting that claim manually.
In the second term, we multiply the probability of non-fraud by the man-hour cost of analyzing that sample, because even if there’s no fraud, there’s a cost associated with analyzing that claim manually.
Score(X) = P(Fraud) x ( Price(X) - PriceCorrectedClaim(X) - ManHourRate(Corrected(X) - X) ) - (1 - P(Fraud)) x ManHourRate(Corrected(X) - X)
which, since the man-hour cost is paid whether or not there is fraud, simplifies to:
Score(X) = P(Fraud) x ( Price(X) - PriceCorrectedClaim(X) ) - ManHourRate(Corrected(X) - X)
So, for the above example, suppose the model corrects the sequence AAGKM to AAGKD with 90% confidence that the original is anomalous, and assume a fixed cost of 5€ per claim analysis. With the costs in the table, AAGKM is priced at 1510€ and AAGKD at 5*2 + 300 + 1000 + 100 = 1410€, so:
Score(AAGKM) = 0.9 x (1510 - 1410) - 5 = 85
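The scoring and selection can be sketched in a few lines. Prices are computed directly from the cost table, and the score is the probability of fraud times the price difference, minus the fixed review cost. The function names and the extra candidate claims are hypothetical.

```python
# Cost per code in euros, taken from the table above.
COST = {"A": 5, "B": 50, "K": 1000, "M": 200, "G": 300, "D": 100}

def price(claim):
    """Total price of a claim: the sum of the costs of its codes."""
    return sum(COST[code] for code in claim)

def score(original, corrected, p_fraud, review_cost):
    """Expected saving from reviewing a claim, net of the review cost."""
    return p_fraud * (price(original) - price(corrected)) - review_cost

def top_n(candidates, n, review_cost):
    """Pick the n claims with the highest score.

    candidates: list of (original, corrected, p_fraud) tuples.
    """
    ranked = sorted(
        candidates,
        key=lambda c: score(c[0], c[1], c[2], review_cost),
        reverse=True,
    )
    return ranked[:n]

example = score("AAGKM", "AAGKD", p_fraud=0.9, review_cost=5)

# Hypothetical batch of model outputs: (original, corrected, P(Fraud)).
batch = [("AAGKM", "AAGKD", 0.9), ("AB", "AB", 0.1), ("AAK", "AA", 0.2)]
to_review = top_n(batch, n=2, review_cost=5)
```

Note that a claim with a low fraud probability can still rank first when the potential saving is large, as with the AAK claim in this batch.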
In this blog post, we presented the issue of automatically detecting errors/anomalies in insurance claims, a use case which can affect several stakeholders: patients, hospitals and insurance companies.
This problem can be approached in a supervised or unsupervised way, depending on the available data. Even with no labels available, it is possible to create an interpretable and actionable model for optimizing the process of manually reviewing claims.
Let us know if you have any more ideas for solving this issue!