As Deep Neural Networks have risen in the ML community, we have observed a growing fit-predict mindset, in which AI practitioners don’t take the time to think about the domain knowledge that is already available and how to embed it in their models. In this blog post, we cover how we created custom deep neural networks that incorporate domain knowledge to estimate customer lifetime value over multiple timesteps, in a project developed together with a major Telco in Portugal. We start by explaining the problem and the different ways it has been approached in the literature, followed by our solution and the way we incrementally built it.
While this post reflects the ideas applied in the Telecommunications industry, it can be easily extrapolated to any other subscription-based industry.
What is Customer Lifetime Value and how to estimate it?
Customer lifetime value is the net present value of a customer’s calculated profit over a certain number of months [1]. In the telecommunications industry specifically, where price transitions are limited, the customer’s monthly margin and survival curve are the two major components of this value.
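To make the definition concrete, here is a minimal sketch of CLTV as the discounted sum of monthly margin weighted by the survival curve. The function name `cltv`, the `annual_discount_rate` parameter, and the toy numbers are assumptions for illustration, not values from the project.

```python
def cltv(monthly_margin, survival_curve, annual_discount_rate=0.0):
    """Net present value of expected margin over the given months.

    monthly_margin[i] and survival_curve[i] refer to month i + 1.
    The discount rate is a hypothetical parameter, not from the article.
    """
    if len(monthly_margin) != len(survival_curve):
        raise ValueError("margin and survival curve must cover the same months")
    # Convert the annual rate into an equivalent monthly rate.
    monthly_rate = (1.0 + annual_discount_rate) ** (1.0 / 12.0) - 1.0
    total = 0.0
    months = zip(monthly_margin, survival_curve)
    for month, (margin, p_alive) in enumerate(months, start=1):
        total += margin * p_alive / (1.0 + monthly_rate) ** month
    return total


margins = [30.0, 30.0, 30.0]   # toy monthly margin
survival = [1.0, 0.9, 0.8]     # toy probability of still being a client
print(cltv(margins, survival))
```

With no discounting, the result is simply the sum of margin times survival probability for each month.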
There are several approaches in the literature for estimating Customer Lifetime Value (CLTV). The most classical approaches are based on statistical models (e.g. Buy ’Til You Die models, the Pareto/NBD model, Recency-Frequency-Monetary value [2]) and consider features such as purchase frequency and most recent purchases, fitting them to a certain statistical distribution. More recently, machine-learning-based approaches that rely on hand-crafted features have been reported in the literature [3, 4].
Based on a previously presented idea of a model for recommending the best package for a given customer, we set out to build a model that could estimate customer lifetime value in a time window of N months, given such a recommendation. We intended to build a tool for understanding whether CLTV could help our stakeholders decide on the best package to offer in an outbound call. By maximizing CLTV over a set of possible package transitions, client satisfaction and loyalty are also expected to increase.
Business Rules and important dataset features
After accepting an offer, clients agree to a fidelization/contract period of 24 months. If the offer is rejected, the fidelization period remains the same, and the risk of the client leaving the company (churn) increases.
Our dataset contains two groups of features:
Behavioral features: consumption patterns, interactions with the company’s channels, etc.
Proposal features: Features such as internet speed, package type, number of TV Channels, …
Throughout this article, we will use the following notation: the offer month is month 0, and the subsequent months are named month 1, …, N. If the target variables are presented in curly brackets, the input is a dictionary of arrays; otherwise, it is a single array.
From the available data, we can calculate three different possible targets, which are all helpful for business decision making, in different use cases:
yAlive: Client has not left the company in month N (is alive/did not churn)
yPrice: Monthly Revenue (Price), simplified as subscription value
yTaker: Client accepted the offered package
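The three targets and the dictionary-of-arrays notation can be sketched as follows. This is an illustrative layout only: the function `build_targets`, the input shapes, and the convention that a price of 0.0 marks a churned month are assumptions, not the project's data pipeline.

```python
def build_targets(price_history, accepted, n_months):
    """Build alive, price, and taker targets for months 1..n_months.

    price_history[c][m - 1] is the price paid by client c in month m;
    a price of 0.0 is assumed to mean the client has churned.
    accepted[c] says whether client c accepted the offer in month 0.
    """
    months = range(1, n_months + 1)
    # Dictionary of arrays: one array over clients per month.
    y_price = {m: [client[m - 1] for client in price_history] for m in months}
    y_alive = {m: [1 if p > 0 else 0 for p in y_price[m]] for m in months}
    # Single array: acceptance is observed once, at the offer month.
    y_taker = [1 if a else 0 for a in accepted]
    return y_alive, y_price, y_taker


history = [[40.0, 40.0, 40.0],   # client 0 stays for three months
           [30.0, 30.0, 0.0]]    # client 1 churns in month 3
y_alive, y_price, y_taker = build_targets(history, [True, False], n_months=3)
print(y_alive[3], y_price[2], y_taker)
```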
After considering several approaches for estimating CLTV, based either on optimization functions or on machine learning models that used pre-trained models’ scores as features (for more, check the appendix), we opted to develop an approach based on Deep Neural Networks.
Deep Neural Networks Approach
Deep learning has promoted a black-box approach where people do not think about what they’re trying to do, and just plug their input features into a model and get an output value. At NILG.AI, we always think about business impact and how to create explainable models to help communicate with people in charge of making business decisions.
How can we create a block-based model that we can quickly manipulate when we want to test different things? How can we learn holistic representations of the client that cover more than one signal of interest (e.g., CLTV, churn, upselling, etc.)? At a very high level, we planned on building a model that could be single or multi-output, and single or multi-task, with embedded domain knowledge.
The next sections explain the building blocks of this architecture.
DNN User
DNN User is a loop of N Dense layers followed by Dropout layers, with a final Dense layer which creates a common latent space for further tasks. Basically, this is a feature extraction step that extracts relevant user-specific features from the input features.
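A framework-free sketch of this block is shown below, so the data flow is visible without any deep learning library. The ReLU hidden activation, the linear final layer, the layer sizes, the dropout rate, and all function names are assumptions; in practice this block would be written with a deep learning framework.

```python
import random


def dense(inputs, weights, biases, activation=None):
    """Fully connected layer; weights has shape [n_inputs][n_outputs]."""
    outputs = []
    for column, bias in zip(zip(*weights), biases):
        value = sum(x * w for x, w in zip(inputs, column)) + bias
        outputs.append(max(0.0, value) if activation == "relu" else value)
    return outputs


def dropout(values, rate, rng, training):
    """Inverted dropout: active only in training mode (rate must be < 1)."""
    if not training or rate == 0.0:
        return list(values)
    keep = 1.0 - rate
    return [v / keep if rng.random() < keep else 0.0 for v in values]


def init_layer(n_in, n_out, rng):
    """Random weights and zero biases (toy initialisation)."""
    weights = [[rng.uniform(-0.5, 0.5) for _ in range(n_out)] for _ in range(n_in)]
    return weights, [0.0] * n_out


def dnn_user(features, hidden_layers, latent_layer, rate=0.2, rng=None, training=False):
    """Loop of Dense + Dropout layers, then a final Dense latent layer."""
    rng = rng or random.Random(0)
    hidden = list(features)
    for weights, biases in hidden_layers:
        hidden = dense(hidden, weights, biases, activation="relu")
        hidden = dropout(hidden, rate, rng, training)
    weights, biases = latent_layer
    return dense(hidden, weights, biases)  # common latent space


rng = random.Random(42)
hidden_layers = [init_layer(4, 8, rng), init_layer(8, 8, rng)]
latent_layer = init_layer(8, 3, rng)
latent = dnn_user([0.1, 0.5, 0.0, 1.0], hidden_layers, latent_layer)
print(len(latent))
```

The latent vector returned here is what the task-specific heads in the following sections would consume.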
For now, let’s consider that we are outputting only the Price for a single month N after the proposal month.
Churn + Regression Task – Single Output
If we were building a simple regression model, all we would need to do to finish this architecture would be to add a final non-negative activation (e.g., ReLU/ELU) predicting the Price in month N. We decided to add an extra task: estimating the client survival probability, with a sigmoid layer that predicts whether the client is still alive in month N.
As a way of enforcing that a decrease in client survival probability leads to a decrease in CLTV, we added the following business rule to our neural network:

$$E[y_{Price}^{N}] = P(y_{Alive}^{N}=1)\,E[y_{Price}^{N} \mid y_{Alive}^{N}=1] + P(y_{Alive}^{N}=0)\,E[y_{Price}^{N} \mid y_{Alive}^{N}=0]$$

where $y_{Price}^{N}$ is the Price in month N and $y_{Alive}^{N}$ indicates whether the client is still alive in that month. The second term is equal to 0, as the Price when the client has churned ($y_{Alive}^{N}=0$) is zero. Therefore, this equation reduces to a multiplication between the survival probability and $E[y_{Price}^{N} \mid y_{Alive}^{N}=1]$, a latent space which can be interpreted as the potential value the client is willing to pay for the service, without considering customer satisfaction and competition.
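The business rule can be sketched as a small function: a sigmoid head gives the survival probability, a non-negative head gives the potential value, and their product is the predicted Price. The function and argument names are hypothetical, and the raw inputs stand in for the outputs of the layers on top of the user latent space.

```python
import math


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def expected_price(alive_logit, potential_value_raw):
    """Predicted Price = survival probability * potential value."""
    p_alive = sigmoid(alive_logit)                  # survival head
    potential_value = max(0.0, potential_value_raw)  # non-negative (ReLU) head
    return p_alive, p_alive * potential_value


print(expected_price(0.0, 40.0))   # survival 0.5, potential value 40
print(expected_price(-2.0, 40.0))  # lower survival gives a lower Price
```

Because the Price is a product with the survival probability, any decrease in survival lowers the predicted Price for the same potential value, which is the constraint the rule is meant to enforce.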
The network we have built so far is summarized in the architecture diagram, where the outputs highlighted with red boxes are learned in a multitask supervised fashion.
The loss function is calculated according to the following equation:
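As an illustration of what a multitask loss for these two outputs can look like, the sketch below combines binary cross-entropy for the survival output with mean absolute error for the Price output. The choice of these two terms, the weights `w_alive` and `w_price`, and all names are assumptions; the loss actually used in the project may differ.

```python
import math


def binary_cross_entropy(y_true, p_pred, eps=1e-7):
    """Mean binary cross-entropy, with clipping to avoid log(0)."""
    total = 0.0
    for y, p in zip(y_true, p_pred):
        p = min(max(p, eps), 1.0 - eps)
        total -= y * math.log(p) + (1 - y) * math.log(1.0 - p)
    return total / len(y_true)


def mean_absolute_error(y_true, y_pred):
    return sum(abs(t - p) for t, p in zip(y_true, y_pred)) / len(y_true)


def multitask_loss(alive_true, alive_pred, price_true, price_pred,
                   w_alive=1.0, w_price=1.0):
    """Weighted sum of the survival loss and the Price loss (assumed form)."""
    return (w_alive * binary_cross_entropy(alive_true, alive_pred)
            + w_price * mean_absolute_error(price_true, price_pred))


loss = multitask_loss([1, 0], [0.9, 0.2], [40.0, 0.0], [36.0, 8.0])
print(loss)
```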
Churn + Regression Task – Multiple Output
The above network predicts the Price for a single month N. However, CLTV is estimated by summing the Price over several months.
To create a multi-output model, all that is required is to repeat the above blocks, shifting the target by one month for each block.
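A minimal sketch of the multi-output idea: one survival head and one potential-value head per month, each producing that month's Price, with CLTV obtained by summing the monthly predictions. The names are hypothetical, and the per-month inputs stand in for the outputs of the repeated blocks.

```python
import math


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def multi_output_forecast(alive_logits, potential_values):
    """One block per month: Price(month) = P(alive in month) * value(month)."""
    prices = {}
    heads = zip(alive_logits, potential_values)
    for month, (logit, value) in enumerate(heads, start=1):
        prices[month] = sigmoid(logit) * max(0.0, value)
    return prices


def cltv_from_forecast(prices):
    """CLTV as the sum of the predicted Price over the forecast months."""
    return sum(prices.values())


prices = multi_output_forecast([2.0, 1.0, 0.0], [40.0, 40.0, 40.0])
print(prices, cltv_from_forecast(prices))
```

Each block is trained against the target of its own month, so block 1 sees month 1, block 2 sees month 2, and so on.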
The loss function is then calculated as:
Churn + Regression Task + Taker Task
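We can also add an extra task for further explainability: the probability of the client accepting the offer (yTaker), which can be modeled as a business rule in the network by the following equation:

$$E[V^{N}] = P(y_{Taker}=1)\,V_{accept}^{N} + \big(1 - P(y_{Taker}=1)\big)\,V_{reject}^{N}$$

This is the sum of the subscription value if the client accepts the offer ($V_{accept}^{N}$) and if they do not ($V_{reject}^{N}$), each weighted by its probability. This expected value is then multiplied by the survival probability in that timestep, as in the previous architecture.
The final equation (which can be followed by a non-negative activation, such as ReLU) is then written as:

$$\hat{y}_{Price}^{N} = P(y_{Alive}^{N}=1)\,\Big[P(y_{Taker}=1)\,V_{accept}^{N} + \big(1 - P(y_{Taker}=1)\big)\,V_{reject}^{N}\Big]$$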
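The taker rule can be sketched in a few lines: the expected subscription value mixes the accepted and rejected scenarios by the acceptance probability, and the result is scaled by the survival probability. All names and the toy numbers are hypothetical.

```python
def expected_price_with_taker(p_alive, p_taker, value_if_accepted, value_if_rejected):
    """Survival probability times the acceptance-weighted subscription value."""
    expected_value = (p_taker * value_if_accepted
                      + (1.0 - p_taker) * value_if_rejected)
    # Final non-negative activation (ReLU), as suggested in the text.
    return max(0.0, p_alive * expected_value)


# Toy case: 90% survival, 50% chance of accepting a 50-per-month package,
# otherwise the client stays on a 30-per-month package.
print(expected_price_with_taker(0.9, 0.5, 50.0, 30.0))
```

Setting `p_taker` to 1.0 recovers the earlier churn-plus-regression rule for the accepted package.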
Results
This architecture was compared to a base regression model, using the mean absolute error (MAE) between the predicted and observed Price. We improved the MAE by 50% using our custom architecture when compared with an off-the-shelf regression model.
While in some cases this may not have a direct impact on model results (and may even reduce performance, if some of the labels are noisy and contradictory), adding these priors can increase model regularization and reduce overfitting to noisy values, outliers, or sporadic events in time.
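For reference, the metric and the relative improvement can be computed as in the sketch below. The numbers are toy values chosen so that the improvement comes out at 50%; they are not the project's data or predictions.

```python
def mean_absolute_error(y_true, y_pred):
    return sum(abs(t - p) for t, p in zip(y_true, y_pred)) / len(y_true)


def relative_improvement(baseline_mae, model_mae):
    """Fraction by which the model reduces the baseline's MAE."""
    return (baseline_mae - model_mae) / baseline_mae


observed = [40.0, 30.0, 0.0]       # toy observed Price
baseline_pred = [44.0, 26.0, 4.0]  # toy off-the-shelf regression output
custom_pred = [42.0, 28.0, 2.0]    # toy custom-architecture output

baseline_mae = mean_absolute_error(observed, baseline_pred)
custom_mae = mean_absolute_error(observed, custom_pred)
print(baseline_mae, custom_mae, relative_improvement(baseline_mae, custom_mae))
```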
Conclusion
We have managed to innovate in estimating Customer Lifetime Value by creating a multitask model that can predict customer churn, lifetime value, and the propensity to accept an offer. Although DNNs are typically considered black boxes, we show here that, by thinking of DNNs as Differentiable Programmes (as encouraged by Yann LeCun), we can add different tasks to improve interpretability and decision making.
With this, we can understand which offer is best for improving customer satisfaction and retention. For instance, an offer that leads to a low acceptance propensity and a high probability of churn is probably not the best offer for that customer.
There are several more ways to add domain knowledge to this model, such as a loss function that penalizes errors more heavily for customers in a certain risk group, or constraints ensuring that monotonic features have a monotonic impact, among others.
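One of these ideas, a loss that weighs errors more heavily for a given risk group, could look like the sketch below. The weighted-MAE form, the group labels, and the weights are assumptions for illustration only.

```python
def weighted_mae(y_true, y_pred, risk_group, group_weights):
    """Mean absolute error where each client's error is scaled by a group weight.

    Groups missing from group_weights default to a weight of 1.0.
    """
    weighted_sum = 0.0
    weight_total = 0.0
    for truth, pred, group in zip(y_true, y_pred, risk_group):
        weight = group_weights.get(group, 1.0)
        weighted_sum += weight * abs(truth - pred)
        weight_total += weight
    return weighted_sum / weight_total


groups = ["high", "low"]
weights = {"high": 3.0}  # hypothetical: errors on high-risk clients count 3x
print(weighted_mae([40.0, 40.0], [30.0, 40.0], groups, weights))  # error on high-risk client
print(weighted_mae([40.0, 40.0], [40.0, 30.0], groups, weights))  # same error on low-risk client
```

The same absolute error costs more when it falls on the high-risk client, which pushes the model to fit that group more closely.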
Chamberlain, B. P., Cardoso, A., Liu, C. H., Pagliari, R., & Deisenroth, M. P. (2017, August). Customer lifetime value prediction using embeddings. In Proceedings of the 23rd ACM SIGKDD International Conference on Knowledge Discovery and Data Mining (pp. 1753-1762). ACM.