Churn prediction, in tandem with engagement, is probably the most requested use case we get from Marketing departments across industries. For those of you who do not know what churn is, it refers to customers who leave your company or stop using your services. So, it shouldn’t be a surprise that companies put a lot of effort into making sure that, once a customer is acquired, they don’t lose them.
We can think of churn from multiple angles, but all of them are solved by more or less the same analytical models. For instance:
- We can talk about voluntary churn, when the customer is the one making the call, or involuntary churn, when you decide the customer is no longer beneficial and stop providing the service. In either case, you want to know which customers will switch from an income-generating user to a zero-income or debt-generating customer in the near future.
- We can talk about churn in the sense that the person is going to a competitor (e.g., employee turnover, changing internet provider), or cases where the customer has lost interest in the service you provide (e.g., gym membership, online courses). In either case, you want to predict who is doing this; the only thing that changes is your proposition to re-engage them.
- Churn can be classified as hard, when there’s a termination of a contract/service/subscription, or soft, when the customer gradually or suddenly stops buying at your store without explicitly stating that they no longer want to hear from you.
- A more esoteric definition applies to customers who end free trials without subscribing, as opposed to those who convert. While this is typically handled from a business perspective as lead conversion instead of churn, the analytical models that solve this case lie within the same family of models.
How to model churn?
In this section, we will cover churn from multiple perspectives, including some insights from our previous experience on how to boost these models.
Churn as a binary classification
To be realistic, all customers are churners. Eventually, they will stop buying your services (voluntary churn) or you will shut down your company/product line (involuntary churn). So, when we think about predicting churn as a binary signal, the prediction is restricted to a certain observation window. For instance, who is churning in the next week, in the next month, in the next quarter, etc.? The predictive function, in this case, tends to assume this form:
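One plausible way to write that form, with notation assumed here for illustration (w is the length of the observation window, and "customer" stands for whatever features describe the customer at prediction time):

$$F(\text{customer}) \approx P(\text{churn within the next } w \mid \text{customer})$$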
Models that predict churn in the more immediate future tend to be more accurate. Namely, it’s easier to identify dissatisfied customers who will initiate a contract termination tomorrow than those who will do so a year from now. However, short-term churn prediction is less actionable since, for most of those customers, the damage may be irreparable. So, we should aim for the right trade-off between prediction accuracy and recoverability/actionability.
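Since the window length is a modeling choice, it can help to build labels for several horizons from the same snapshot and compare the resulting models. A minimal sketch, assuming a hypothetical `churn_label` helper and made-up dates; it also assumes every customer was observed for the full window, which may not hold for recent snapshots:

```python
from datetime import date, timedelta
from typing import Optional


def churn_label(snapshot: date, churn_date: Optional[date], window_days: int) -> int:
    """Return 1 if the customer churns within `window_days` after `snapshot`, else 0."""
    if churn_date is None:  # no churn recorded: treated as active (assumption)
        return 0
    return int(snapshot < churn_date <= snapshot + timedelta(days=window_days))


# Illustrative data only: customer id -> recorded churn date (None = still active).
snapshot = date(2024, 1, 1)
churn_dates = {"a": date(2024, 1, 5), "b": date(2024, 2, 20), "c": None}

# One label set per candidate observation window (in days).
labels = {
    window: {cid: churn_label(snapshot, d, window) for cid, d in churn_dates.items()}
    for window in (7, 30, 90)
}
```

Shorter windows produce fewer positives, which makes the class imbalance even more severe.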
In general, this category falls into binary classification, which allows us to use any classification model we want, as long as it can deal with extremely imbalanced data, with churner-to-non-churner ratios in the order of 1:99 or even lower. Yes, that is bad news for Data Scientists but good news for the Business side; otherwise your business would go bankrupt soon. Even though the target is a binary label, treat the model output as a probability and act on the customers whose probability is high enough to compensate for the treatment costs. So, favor false positives, since a lost client is hard, or even impossible, to recover. False positives aren’t that bad unless you’re offering major discounts/benefits for re-engaging.
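The "probability high enough to compensate for the treatment costs" rule can be sketched as a break-even check. The function names and numbers below are hypothetical, and the sketch optimistically assumes that a treated churner is always retained, so it gives an upper bound on how many customers are worth treating:

```python
def break_even_probability(value_if_retained: float, treatment_cost: float) -> float:
    """Churn probability above which treating the customer pays off."""
    return treatment_cost / value_if_retained


def should_treat(p_churn: float, value_if_retained: float, treatment_cost: float) -> bool:
    """Treat when the expected value at risk exceeds the cost of the treatment."""
    return p_churn * value_if_retained > treatment_cost


# Illustrative numbers: a customer worth 500 and a treatment that costs 10.
threshold = break_even_probability(500.0, 10.0)  # 0.02
decisions = [should_treat(p, 500.0, 10.0) for p in (0.01, 0.05, 0.30)]
```

With cheap treatments the threshold is low, which is why tolerating false positives is usually acceptable; the threshold climbs quickly once the treatment includes a large discount.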
The main disadvantage of this approach is that it doesn’t give you an urgency indicator of when the customer will leave, nor does it tell you what actions would be required to win the customer back.
I’ve seen people using SHAP values to “decide” how to save these customers. Please, don’t go this way. It is simply wrong. If your model thinks a customer will churn because they haven’t used your services for a month, the right treatment isn’t inviting them to visit your website. Keep in mind that your models are learning correlations, not causality.
Time to churn
Time-to-churn models aim to predict when the customer will leave the company. These models can be based on standard regression, ordinal classification with time segmented into classes, or survival analysis. The predictive function, in this case, tends to assume this form:
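One plausible way to write that form, again with notation assumed here for illustration:

$$F(\text{customer}) \approx \text{time remaining until the customer churns}$$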
The main ambiguity of time-to-event models is how to handle customers who have not churned. Namely, how do you include active customers during training? You can use semi-supervised learning (assuming you don’t know their label) or weakly supervised learning (regularizing the prediction to be higher than your observation window without explicitly stating the actual value).
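The weakly supervised option can be illustrated with a one-sided loss: churned customers get an ordinary squared error, while active customers are penalized only when the prediction falls below the time they have already survived. This is a minimal sketch of one possible formulation, not the only way to handle censoring; the function name and data are hypothetical:

```python
from typing import Sequence


def censored_squared_loss(
    pred: Sequence[float],      # predicted time to churn
    observed: Sequence[float],  # time to churn, or time survived so far if still active
    churned: Sequence[bool],    # True if the churn event was actually observed
) -> float:
    total = 0.0
    for p, t, has_churned in zip(pred, observed, churned):
        error = p - t
        if has_churned:
            # Known label: penalize any deviation from the true time to churn.
            total += error ** 2
        else:
            # Active customer: only predictions below the survived time are wrong.
            total += min(error, 0.0) ** 2
    return total / len(pred)


loss = censored_squared_loss(
    pred=[10.0, 50.0, 20.0],
    observed=[12.0, 30.0, 30.0],
    churned=[True, False, False],
)
```

Here the second customer contributes nothing to the loss (predicting 50 for someone who has survived 30 is consistent with the data), while the third is penalized for a prediction that is already known to be too short.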
Once you know the time to churn, you can prioritize your retention efforts so that the customers expected to leave soonest are handled first.
While these models are more informative regarding urgency than binary classification ones, they still don’t tell you how to save the customer.
From our experience, the right way to model churn is using uplift modeling. In uplift modeling, you aim to discover the effect of a marketing action on a customer. For instance, how much the probability of churn decreases given a discount, a marketing campaign, etc. Learning in a counterfactual manner is difficult, i.e., you don’t know what would have happened if you hadn’t taken the action. Still, you can estimate the effect by applying different actions to different customers and learning the probability of churn conditioned on the action. The predictive function, in this case, tends to assume this form:
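One way to write it, with the notation assumed here for illustration:

$$P(\text{churn} \mid \text{customer}, \text{action}) \approx F(\text{customer}, \text{action})$$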
where F is either of the two methods previously referred to (churn as binary classification or time to churn), but conditioned on the action. For the sake of simplicity, let’s assume you’re modeling F as a binary classification task. In this case, the model is telling you the expected benefit of applying an action. Therefore, you are no longer targeting only your most dissatisfied customers; you’re targeting overall loyalty. The question becomes: what action should I apply to each customer to keep them engaged?
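To make "the expected benefit from applying an action" concrete, the sketch below estimates churn rates per action from experiment data and reports the uplift as the reduction in churn relative to doing nothing. It is deliberately simplified: it ignores customer features (a real F would condition on them) and assumes actions were assigned at random. The helper names, action names, and records are all hypothetical:

```python
from collections import defaultdict
from typing import Dict, Iterable, Tuple


def churn_rate_by_action(records: Iterable[Tuple[str, bool]]) -> Dict[str, float]:
    """Estimate P(churn | action) from (action, churned) pairs."""
    counts = defaultdict(lambda: [0, 0])  # action -> [churners, customers]
    for action, churned in records:
        counts[action][0] += int(churned)
        counts[action][1] += 1
    return {action: churners / total for action, (churners, total) in counts.items()}


def uplift(rates: Dict[str, float], action: str, baseline: str = "none") -> float:
    """Reduction in churn probability relative to the baseline action."""
    return rates[baseline] - rates[action]


# Illustrative experiment: 4 untreated customers, 4 customers given a discount.
records = [
    ("none", True), ("none", True), ("none", False), ("none", False),
    ("discount", True), ("discount", False), ("discount", False), ("discount", False),
]
rates = churn_rate_by_action(records)
discount_uplift = uplift(rates, "discount")
```

A positive uplift means the action reduces churn; a value near zero means the action is wasted on that group, no matter how likely those customers are to churn.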
As for encoding actions in your model, don’t go with abstract actions. Try to be as specific as possible in the features. Include features such as: What’s the contact channel? What’s the day of the month/week or the time of day? What discount percentage am I offering? What additional perks am I including? This way, your model will be able to generalize to new actions you take in the future.
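As an illustration of describing an action by its attributes instead of an opaque campaign ID, the sketch below turns an action into a numeric feature vector that could be concatenated with the customer features. The channel list, the `free_shipping` perk, and the field names are hypothetical placeholders:

```python
from dataclasses import dataclass
from typing import List

CHANNELS = ("email", "sms", "phone")  # hypothetical list of contact channels


@dataclass(frozen=True)
class Action:
    channel: str
    day_of_week: int     # 0 = Monday ... 6 = Sunday
    hour_of_day: int     # 0-23
    discount_pct: float  # e.g. 15.0 for a 15% discount
    free_shipping: bool  # hypothetical example of an additional perk


def action_features(action: Action) -> List[float]:
    """Encode an action as numeric features for the model."""
    one_hot_channel = [1.0 if action.channel == c else 0.0 for c in CHANNELS]
    return one_hot_channel + [
        float(action.day_of_week),
        float(action.hour_of_day),
        action.discount_pct / 100.0,
        1.0 if action.free_shipping else 0.0,
    ]


features = action_features(Action("sms", 2, 18, 15.0, True))
```

Because a new campaign is just a new combination of these attributes, the model can score it without having seen that exact campaign before.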
Of course, actions have a cost. So, you probably don’t want just the action that minimizes churn at any expense (which would likely be paying your customers to use your service). You want the action that offers the best trade-off between retaining the customer and making some profit out of it. Let’s say the predicted revenue for an alive customer is R(customer) (refer to our previous blog post to learn how to estimate this).
Also, we have a function that tells us the cost of an action, C(action). Assuming a user who leaves the service generates $0 (this may not be the case, since interrupting contracts tends to incur expenses on both sides), the final function that tells you the best action per client is:
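Reading F as the predicted churn probability from the binary classification setting, one way to write it (the argmax form is an assumption consistent with the definitions of R and C above) is:

$$\text{best action}(\text{customer}) = \arg\max_{\text{action}} \Big[ \big(1 - F(\text{customer}, \text{action})\big) \cdot R(\text{customer}) - C(\text{action}) \Big]$$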
In case the best action has a negative expected gain, the best thing is to let the customer go. Otherwise, go for that action and keep both your customer and CFO happy.
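The decision rule can be sketched as follows, assuming the expected gain of an action is the retained revenue weighted by the probability of not churning, minus the action cost. The `best_action` helper, the action names, and the numbers are hypothetical:

```python
from typing import Dict, Optional, Tuple


def best_action(
    churn_prob: Dict[str, float],  # action -> F(customer, action)
    revenue: float,                # R(customer)
    cost: Dict[str, float],        # action -> C(action)
) -> Tuple[Optional[str], float]:
    """Return the action with the highest expected gain, or None if all are negative."""
    gains = {a: (1.0 - p) * revenue - cost[a] for a, p in churn_prob.items()}
    action = max(gains, key=gains.get)
    if gains[action] < 0:
        return None, 0.0  # letting the customer go is the better option
    return action, gains[action]


# The discount retains more customers, but the email is cheaper and wins on expected gain.
choice = best_action(
    churn_prob={"email": 0.5, "discount": 0.25},
    revenue=100.0,
    cost={"email": 1.0, "discount": 30.0},
)

# A low-revenue customer for whom the only available action costs more than it returns.
no_choice = best_action(churn_prob={"call": 0.5}, revenue=10.0, cost={"call": 20.0})
```

Note that the action with the lowest churn probability is not necessarily the one selected; the cost term can change the ranking.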
We have built dozens of churn models for multiple industries. Therefore, if reducing churn* is one of your goals as a company and you’re looking at predictive analytics, let’s have a call and discuss how we can collaborate.
* increasing engagement, upselling, cross-selling, etc.