Detecting Errors in Insurance Claims

From supervised to unsupervised learning

Health plans use insurance codes to decide how much your doctor and other healthcare providers should be paid. Several coding systems are currently in use [1]:

  • Current Procedural Terminology (CPT) codes, used by physicians to describe the services they provide.
  • Healthcare Common Procedure Coding System (HCPCS), used by Medicare. It is subdivided into level I codes (equal to CPT codes) and level II codes. The latter identify products, supplies, and services not included in the CPT codes (e.g. prosthetics and ambulance services).
  • International Classification of Diseases (ICD), developed by the World Health Organization (WHO), with the goal of identifying the patient’s health condition/diagnosis. These codes are typically combined with CPT Codes, to make sure that the patient’s health condition and the received services match (i.e., for matching billing and documenting diagnosis).

After a given procedure, healthcare professionals list the procedure’s code in an insurance claim form so that the hospital is partially or fully reimbursed for that procedure.

However, not every submitted claim matches the procedures that were actually performed, whether because of fraud or submission errors.

Here’s an example (adapted from [1]): if you fall, sprain your ankle and go to the emergency room, you might end up getting an X-ray of the ankle. If the healthcare professionals mistakenly label the ankle X-ray as an elbow X-ray but still record a diagnosis of sprained ankle, the procedure and the diagnosis are inconsistent, and the insurance claim might be rejected.

Which issues in this process have monetary consequences, and how do they affect each of the stakeholders involved?


What does each lose?

  • Patient

    • Miscoding the received services (diagnosis or procedures) can lead to the patient being labeled with a condition they do not have.
    • Expenses can increase, creating additional costs for the patient, for the insurance company or for both.
    • A record of pre-existing conditions (which may themselves be misdiagnosed) can create obstacles to obtaining health coverage.
    • Patients often have no access to their medical records, so these issues go unnoticed.
  • Hospital

    • Healthcare professionals might be overpaid or underpaid for a procedure.
    • Insurance companies can even deny the claim and pay nothing, which increases costs for the hospital as well.
    • Hospitals can forget to include some expenses, losing the opportunity to be reimbursed for procedures they performed or materials they used.
    • Delays in corrections might lead to extra administrative costs and delays in payments.
  • Insurance Company

    • Besides coding errors, hospitals could try to submit extra items associated with a procedure that they did not really use, to receive more funding (fraud).
    • Delays in corrections might lead to extra administrative costs and delays in payments.

So, how can we use AI to assist in this area?

Modeling Anomaly Detection in Insurance Claims

We will show you how to approach this problem with several techniques, including supervised, unsupervised and weakly supervised versions. To learn more, enroll in our online course, where we discuss all of these concepts in depth.


What to do?

We can identify problems in insurance claims’ submissions at a certain granularity level:

  • A group of claim codes does not make sense
  • A group of claim codes does not make sense because the healthcare professional clicked the adjacent code in the interface or used a code from the wrong category (since the codes follow a certain hierarchy, e.g. local and general anesthesia).


This use case of insurance claim error detection can be applied to both hospitals and insurance companies, as there is interest in understanding which claims are not correct. Basically, the possible options are to: identify a claim as wrong, correct an error in claim codes and/or try to explain why or where it occurred.

We will now start using letters to refer to the claim codes, as a simplification. The following figure represents the possible inputs and outputs of an insurance claim model.

Regarding the third case, the claim sent by the hospital can contain extra codes. An insurance company will want the claim to contain as few codes as possible, so the model should delete the unnecessary ones.

For the fourth case, a claim sent by a hospital can be missing codes for some procedures (e.g. by mistake or through lack of knowledge of a specific code). A model applied in a hospital should be able to add codes when they are missing.

What are the restrictions?

A model used to detect errors in insurance claims should be invariant to the order in which the healthcare professional places the codes (e.g. A-B-C or C-A-B). We can add this invariance in different ways:

  • Augmenting the input data with random shuffling.
  • Ordering both input and outputs in alphabetical/numerical order.
  • Using a position-invariant representation. For instance, since claim sequences can be treated as text, we could use Bag-of-Words to count the occurrences of each code regardless of their order.
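
As a minimal sketch of the last two options, the snippet below sorts a claim into a canonical order and builds a Bag-of-Words count vector. The function names and the three-code vocabulary are illustrative assumptions and do not come from any specific library.

```python
from collections import Counter

def canonical_order(claim):
    """Sort the codes so that A-B-C and C-A-B map to the same sequence."""
    return sorted(claim)

def bag_of_codes(claim, vocabulary):
    """Count how often each vocabulary code appears, ignoring order."""
    counts = Counter(claim)
    return [counts[code] for code in vocabulary]

vocabulary = ["A", "B", "C"]  # hypothetical code vocabulary

print(canonical_order(["C", "A", "B"]))                 # ['A', 'B', 'C']
print(bag_of_codes(["C", "A", "B", "A"], vocabulary))   # [2, 1, 1]
```

Both functions return the same output for any permutation of the same claim, which is the invariance we are after.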

How can we do this?

There are several ways to approach this problem, depending on the amount of labels and data available. We will give some examples of supervised and unsupervised approaches, with a more detailed focus on the unsupervised ones.

Available data

  • Claim Code
  • Date of claim
  • Possibly: Result (used as a label)

Assumed labels

  • Positive: Wrong claim. Insurance company reported issues with a certain hospital’s claim, and the hospital backed down and agreed with the error.
  • Negative: Cases in which the insurance company found no errors in the claim, did not want to spend time and money on legal processes for that claim, or failed to detect an error that existed. As such, negative labels are a mix of actual positive and actual negative cases.
  • None: Remaining claims which have not been evaluated yet.

Two major learning mechanisms can be used:

Supervised Approaches

If we have both positive and negative labels, this is a classical supervised learning problem framed as binary classification. We can then manually extract features from the codes, such as the co-occurrence of code pairs, or use a deep model (e.g. RNNs) to infer the relationships between codes from the input.
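
To make the hand-crafted option concrete, here is a small sketch that turns each claim into binary features, one per pair of codes that co-occur in it. The helper names are hypothetical, and the resulting matrix could be fed to any binary classifier.

```python
from itertools import combinations

def pair_features(claim):
    """Return the set of unordered code pairs that co-occur in a claim."""
    unique_codes = sorted(set(claim))
    return set(combinations(unique_codes, 2))

def featurize(claims):
    """Build a binary claim-by-pair matrix plus the pair vocabulary."""
    per_claim = [pair_features(claim) for claim in claims]
    pair_vocabulary = sorted(set().union(*per_claim))
    matrix = [
        [int(pair in pairs) for pair in pair_vocabulary]
        for pairs in per_claim
    ]
    return matrix, pair_vocabulary

# Toy claims, using letters as stand-ins for real claim codes.
matrix, pair_vocabulary = featurize([list("ABC"), list("CA")])
print(pair_vocabulary)  # [('A', 'B'), ('A', 'C'), ('B', 'C')]
print(matrix)           # [[1, 1, 1], [0, 1, 0]]
```

Because the pairs are built from sorted, de-duplicated codes, these features are also invariant to the order of the codes in the claim.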

However, as negative labels can also contain positive cases, we can instead think of this problem as weakly supervised and use Positive Unlabeled learning (PU Learning), in which the class that is not positive is considered to contain both negative and positive examples (a mixed set). Within PU Learning, there are several algorithms that can be used, some of which are described in the literature [2].

Unsupervised approaches

If there are no labels available at all, we then need to follow an unsupervised approach. We will describe a few examples next:

Code embeddings

We can train a word2vec model that, given two codes, estimates the most likely adjacent code. Note that we are adding position-invariance here too (for instance, with one of the strategies listed above).

This way, we learn code embeddings (similar to word embeddings) that capture the relationships between different codes. Then, for each code in a claim, we check whether its embedding is the one with the smallest distance to its neighbours. If not, we replace it with the code whose embedding is.

This approach is more error-prone, as codes for common operations can have similar embedding distances.
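
The checking step can be sketched as follows. The 2-D embeddings are hand-written placeholders standing in for vectors from a trained model, and measuring the distance to the mean of the neighbours' embeddings is our assumption about how the comparison is done.

```python
import math

# Hypothetical 2-D embeddings; in practice these would come from a trained model.
EMBEDDINGS = {
    "A": (1.0, 0.0),
    "G": (0.9, 0.1),
    "K": (0.8, 0.2),
    "D": (0.92, 0.08),
    "M": (-1.0, 0.5),
}

def context_mean(claim, index):
    """Average the embeddings of every code in the claim except the one at `index`."""
    neighbours = [EMBEDDINGS[code] for i, code in enumerate(claim) if i != index]
    dims = len(neighbours[0])
    return tuple(sum(vec[d] for vec in neighbours) / len(neighbours) for d in range(dims))

def closest_code(claim, index):
    """Return the vocabulary code whose embedding is nearest to the context mean."""
    centre = context_mean(claim, index)
    return min(EMBEDDINGS, key=lambda code: math.dist(EMBEDDINGS[code], centre))

claim = list("AAGKM")  # needs at least two codes, so that a context exists
suggestion = closest_code(claim, 4)
if suggestion != claim[4]:
    print(f"Replace {claim[4]} with {suggestion}")  # Replace M with D
```

In this toy setup D wins only by a small margin over G, which illustrates why codes with similar embeddings make this approach error-prone.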

Generative model

Using a generative model, we can fit our claims to a model, learning a density function. The most common claims will be close to each other in the learned probability space. Examples of such models are variational autoencoders and Gaussian mixture models. We can then estimate the probability of each claim being an outlier.
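
To illustrate the idea of scoring claims by their likelihood, here is a deliberately simple stand-in: an independent Bernoulli model over code presence, which is far less expressive than a variational autoencoder or a Gaussian mixture model and ignores interactions between codes. The function names and the toy training set are assumptions made for this sketch.

```python
import math

def fit_code_frequencies(claims, vocabulary, smoothing=1.0):
    """Estimate P(code present in a claim) with Laplace smoothing."""
    n = len(claims)
    return {
        code: (sum(code in claim for claim in claims) + smoothing) / (n + 2 * smoothing)
        for code in vocabulary
    }

def log_likelihood(claim, presence_prob):
    """Log-probability of a claim under independent per-code Bernoulli variables."""
    present = set(claim)
    return sum(
        math.log(p if code in present else 1.0 - p)
        for code, p in presence_prob.items()
    )

# Toy training data (hypothetical): mostly AAGKD claims, a few AB claims.
training_claims = [list("AAGKD")] * 8 + [list("AB")] * 2
model = fit_code_frequencies(training_claims, vocabulary=list("ABDGKM"))

# The claim seen often in training should get the higher log-likelihood.
print(log_likelihood(list("AAGKD"), model) > log_likelihood(list("AAGKM"), model))  # True
```

Claims with a low log-likelihood are candidates for being outliers. A threshold on this value would have to be chosen from the data.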

Seq2Seq inspired: reconstructing correct sequences

With a denoising autoencoder, we try to reconstruct a given claim sequence: we add noise, and the model learns to identify what is wrong and correct it. We can then calculate a reconstruction error, which tells us that the claim should have more of a certain code and fewer of another. This gives us an informative explanation of what is wrong in the claim.
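
Assuming some model has already produced a reconstructed claim, the explanation step can be sketched as a simple comparison of code counts. The function name is hypothetical, and the reconstruction is passed in by hand here.

```python
from collections import Counter

def explain_difference(original, reconstructed):
    """Compare code counts and report what the reconstruction added or dropped."""
    original_counts = Counter(original)
    reconstructed_counts = Counter(reconstructed)
    return {
        "add": dict(reconstructed_counts - original_counts),
        "remove": dict(original_counts - reconstructed_counts),
    }

# The reconstructed claim would come from the autoencoder; here it is hard-coded.
print(explain_difference("AAGKM", "AAGKD"))
# {'add': {'D': 1}, 'remove': {'M': 1}}
```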


Seq2Seq inspired: probability of a sequence being wrong

Alternatively, we can have a single model which tells us the probability of each element being wrong (and therefore, we know the probability of the whole claim being wrong).

To do this, we randomly add noise by adding, removing and swapping claim codes. We then train an autoencoder whose sigmoid output layer gives the probability of each code being wrong.
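
A minimal sketch of the noising step is shown below. It assumes that "swapping" means substituting a code with a different one, and it produces a 0/1 label per code that the sigmoid outputs could be trained against. The function name, the noise probability and the vocabulary are illustrative assumptions.

```python
import random

def corrupt_claim(claim, vocabulary, rng, noise_prob=0.3):
    """Randomly add, remove, or swap codes; return the noisy claim and per-code labels.

    Label 1 marks a code that was inserted or substituted (i.e. wrong) and
    label 0 marks an untouched code. Removed codes leave no position to label.
    The vocabulary must contain at least two distinct codes.
    """
    noisy, labels = [], []
    for code in claim:
        if rng.random() >= noise_prob:
            noisy.append(code)
            labels.append(0)
            continue
        operation = rng.choice(["add", "remove", "swap"])
        if operation == "add":
            # Keep the original code and insert a random extra code after it.
            noisy.extend([code, rng.choice(vocabulary)])
            labels.extend([0, 1])
        elif operation == "swap":
            # Substitute the code with a different one from the vocabulary.
            noisy.append(rng.choice([c for c in vocabulary if c != code]))
            labels.append(1)
        # "remove": the code is simply dropped.
    return noisy, labels

rng = random.Random(0)  # fixed seed so the example is reproducible
noisy, labels = corrupt_claim(list("AAGKM"), list("ABDGKM"), rng)
print(noisy, labels)
```

Note that a removed code leaves nothing to label, so this labeling scheme on its own cannot signal that a code is missing.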

Since the model outputs probabilities, we have a higher degree of confidence in it (and can measure its uncertainty), and we can better decide which claims we should analyze manually. On the other hand, we know that a certain sequence has a high probability of being wrong, but we don’t know whether a code should be added or deleted.

To solve this issue, we could add a network with three extra tasks: the probability of a claim code being wrong because it needs to be edited, deleted or added.

Mixed generative and reconstructive model

We can also have a generative model which shares weights with a denoising autoencoder (or other reconstructive model). This way, a generative model tells us which claim is wrong, and the reconstructive model tells us why it’s wrong (i.e., what part of the sequence is wrong), returning also the corrected sequence.

What to do with model results?

So, how can we act on our model’s results? Let us assume we are an insurance company with these two tools:

  • Probability of the claim being wrong.
  • Suggestions of what is wrong.

If we want to select N cases to manually evaluate, how can we determine which claims are the most cost-effective to review?
The insurance company has certain costs associated with this procedure. Consider an example claim with the codes AAGKM, which should be AAGKD. Each code is a procedure/item with a certain cost.

Code  Cost (€)
A     5
B     50
K     1000
M     200
G     300
D     100


AAGKM = 5*2 + 300 + 1000 + 200 = 1510 €

AAGKD = 5*2 + 300 + 1000 + 100 = 1410 €
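
The same pricing can be sketched as a lookup over the cost table. The dictionary and function names are illustrative, and each letter stands for one claim code as in the rest of the post.

```python
# Cost per code in euros, copied from the table above.
COST = {"A": 5, "B": 50, "K": 1000, "M": 200, "G": 300, "D": 100}

def price_claim(claim):
    """Sum the cost of every code in the claim (repeated codes count each time)."""
    return sum(COST[code] for code in claim)

original, corrected = "AAGKM", "AAGKD"
print(price_claim(original), price_claim(corrected))
print("difference:", price_claim(original) - price_claim(corrected))
```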

Positive cases are fraud cases, which we want to manually evaluate.

If we are applying this model in an insurance company, we want to maximize both True Positives (TP) and True Negatives (TN). By maximizing True Negatives, we save analysis time, and by maximizing True Positives, we are reducing the number of cases which the insurance company should not be paying, but actually is.

On the other hand, if we apply this in a hospital, we want to minimize False Positives (FP), which are cases flagged as fraud that are not fraud and cost man-hours to evaluate manually, and False Negatives (FN), which are cases flagged as negative that are actually fraud and cost money due to errors.


How can we optimize this for an insurance company?

There is a certain cost associated with correcting something in a claim and a price difference between the reconstructed claim and the original claim.

For each claim, X, we can calculate a score, and choose the N samples with the highest scores as the claims to manually evaluate.

This score is composed of two terms. The first term is the expected value in case fraud is detected: we multiply the probability of fraud by the money saved by the insurance company when fraud is detected. Here, the saving is the original claim price minus the price of the corrected claim, minus the man-hour cost of correcting that claim manually.

In the second term, we multiply the probability of non-fraud by the man-hour cost of analyzing that sample, because even if there’s no fraud, there’s a cost associated with analyzing that claim manually.


Score(X) = P(Fraud) x (
  Price(X) -
  PriceCorrectedClaim(X) -
  ManHourRate(Corrected(X) - X)
) -
(1 - P(Fraud)) x ManHourRate(Corrected(X) - X)

which is equal to:


Score(X) = P(Fraud) x (
  Price(X) - PriceCorrectedClaim(X)
) - ManHourRate(Corrected(X) - X)

So, for the above example (AAGKM at 1510€, corrected by the model to AAGKD at 1410€ according to the cost table), if we have a 90% confidence that the claim is anomalous, and assuming a fixed price of 5€ per claim analysis:

Score(AAGKM) = 0.9 x (1510 - 1410) - 5 = 85
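
As a sketch of how this score could drive the selection of N claims, the snippet below implements the two-term form described above (the saving is the original price minus the corrected price minus the review cost) and ranks a few candidates. The candidate claims, their probabilities and their prices are made up for illustration.

```python
def review_score(p_fraud, price_original, price_corrected, review_cost):
    """Expected net saving from manually reviewing one claim."""
    saving_if_fraud = price_original - price_corrected - review_cost
    return p_fraud * saving_if_fraud - (1.0 - p_fraud) * review_cost

def select_for_review(candidates, n):
    """Return the ids of the n candidates with the highest review score."""
    ranked = sorted(candidates, key=lambda c: review_score(*c[1:]), reverse=True)
    return [c[0] for c in ranked[:n]]

# Hypothetical candidates: (claim id, P(fraud), original price, corrected price, review cost)
candidates = [
    ("claim-1", 0.9, 1000, 900, 5),
    ("claim-2", 0.2, 1000, 900, 5),
    ("claim-3", 0.6, 500, 100, 5),
]
print(select_for_review(candidates, 2))  # ['claim-3', 'claim-1']
```

Here claim-3 ranks above claim-1 despite its lower fraud probability, because the potential saving is larger. This is the trade-off the score is meant to capture.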



In this blog post, we presented the issue of automatically detecting errors/anomalies in insurance claims, a use case which can affect several stakeholders: patients, hospitals and insurance companies.

This approach can be done in a supervised or unsupervised way, depending on the available data. Even with no labels available, it is possible to create an interpretable and actionable model for optimizing the process of manually reviewing claims.

Let us know if you have any more ideas for solving this issue!


  2. Sansone, E., De Natale, F. G., & Zhou, Z. H. (2018). Efficient training for positive unlabeled learning. IEEE Transactions on Pattern Analysis and Machine Intelligence, 41(11), 2584-2598.
