Churn prediction is a common use case in machine learning. Churn refers to customers leaving the company. It is critical for a business to understand why and when customers are likely to churn. A robust and accurate churn prediction model helps a business take action to prevent customers from leaving. Machine learning models have proved effective at detecting churn. However, if left unattended, the performance of a churn model can degrade over time, and the business loses customers it could have retained.
The Fiddler AI Observability platform provides a variety of tools that can be used to monitor, explain, analyze, and improve the performance of your machine learning-based churn model.
In this article we will go over a churn example and how we can mitigate performance degradation in a churn machine learning model.
Refer to the Colab notebook to learn how to:
- Onboard a model on the Fiddler platform
- Publish events to the Fiddler platform
- Use the Fiddler API to run explanations
Please refer to our Getting Started guide for a step-by-step walkthrough of how to upload baseline and production data to the Fiddler platform.
When we check the monitoring dashboard, we notice a drop in the predicted churn value and a rise in the predicted churn drift value. Our next step is to check if this has resulted in a drop in performance.
We use precision, recall, and F1-score as performance metrics for this example. We choose these metrics because they are suited to classification problems and help us identify the number of false positives and false negatives. We notice that although precision has remained constant, recall and F1-score have dropped. This means the number of false negatives has increased: some customers who are likely to churn are being predicted not to churn.
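To make the relationship between these metrics concrete, here is a minimal sketch in plain Python. The labels are made up for illustration and are not taken from the example dataset. The function name `classification_metrics` is hypothetical and is not part of the Fiddler API. The sketch shows how precision can stay constant while recall and F1-score drop when false negatives increase.

```python
def classification_metrics(y_true, y_pred):
    """Return (precision, recall, f1) for binary labels where 1 means 'churned'."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)  # churners caught
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)  # false alarms
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)  # churners missed
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    denom = precision + recall
    f1 = 2 * precision * recall / denom if denom else 0.0
    return precision, recall, f1


# Illustrative labels only (assumption): 4 churners, 4 non-churners.
y_true = [1, 1, 1, 1, 0, 0, 0, 0]
baseline_pred = [1, 1, 1, 0, 0, 0, 0, 0]    # misses one churner
production_pred = [1, 0, 0, 0, 0, 0, 0, 0]  # misses three churners

baseline_metrics = classification_metrics(y_true, baseline_pred)
production_metrics = classification_metrics(y_true, production_pred)
print("baseline  :", baseline_metrics)
print("production:", production_metrics)
```

In both cases there are no false positives, so precision is unchanged, while the extra false negatives in the second set of predictions pull recall and F1-score down.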
There could be a number of reasons for a drop in performance. Some of them are:
- Cases of extreme events (Outliers)
- Data distribution changes
- Model/Concept drift
- Pipeline health issues
Pipeline health issues could be due to a failing component in the data pipeline, while the first three could be due to changes in the data. To investigate, we can go to the Data Integrity tab and first check whether the incoming data is consistent with the baseline data.
Our next step is to check whether the drop could be due to any data integrity issues. On navigating to the Data Integrity tab under the Monitor tab, we see that there has been a range violation. On selecting the bins that have range violations, we notice they are due to the field numofproducts.
It is advised to check all the fields which cause data integrity violations. Since we see a range violation, we can check how much the data has drifted.
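As a rough illustration of what a range violation means, the sketch below counts production values that fall outside the minimum and maximum seen in the baseline. This is an assumption about the general idea, not a description of how Fiddler implements the check. The function name `range_violation_pct` and the sample values are hypothetical.

```python
def range_violation_pct(baseline, production):
    """Percentage of production values outside the baseline [min, max] range."""
    if not baseline or not production:
        raise ValueError("baseline and production must be non-empty")
    lo, hi = min(baseline), max(baseline)
    violations = sum(1 for v in production if v < lo or v > hi)
    return 100.0 * violations / len(production)


# Made-up numofproducts values (assumption): baseline spans 1 to 4.
baseline_numofproducts = [1, 2, 1, 3, 2, 4, 1, 2]
production_numofproducts = [1, 2, 5, 6, 2, 7, 1, 3, 2, 1]

violation_pct = range_violation_pct(baseline_numofproducts, production_numofproducts)
print(f"{violation_pct:.1f}% of production values violate the baseline range")
```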
Our next step is to go back to the Data Drift tab to measure the amount of drift in the field numofproducts. The drift is calculated using Jensen-Shannon divergence, which measures how different the baseline and production distributions are.
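For readers unfamiliar with Jensen-Shannon divergence, the following is a minimal sketch of the calculation for a discrete feature. The binning (one bin per distinct value) and the base-2 logarithm are assumptions made for illustration, and the platform's exact settings may differ. With base 2, the result lies between 0 (identical distributions) and 1 (no overlap).

```python
import math
from collections import Counter


def to_distribution(values, categories):
    """Relative frequency of each category in values, in a fixed order."""
    counts = Counter(values)
    total = len(values)
    return [counts[c] / total for c in categories]


def kl_divergence(p, q):
    """Kullback-Leibler divergence in bits; terms with p_i == 0 contribute 0."""
    return sum(pi * math.log2(pi / qi) for pi, qi in zip(p, q) if pi > 0)


def jensen_shannon_divergence(p, q):
    """Average KL divergence of p and q from their mixture m = (p + q) / 2."""
    m = [(pi + qi) / 2 for pi, qi in zip(p, q)]
    return 0.5 * kl_divergence(p, m) + 0.5 * kl_divergence(q, m)


# Made-up numofproducts samples (assumption).
baseline = [1, 1, 2, 2, 2, 3, 1, 2]
production = [2, 3, 4, 4, 5, 5, 5, 3]

categories = sorted(set(baseline) | set(production))
p = to_distribution(baseline, categories)
q = to_distribution(production, categories)
drift = jensen_shannon_divergence(p, q)
print(f"JSD between baseline and production: {drift:.3f}")
```

Because the mixture `m` is non-zero wherever either input is non-zero, the calculation stays finite even when production contains values never seen in the baseline, which is the situation a range violation produces.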
We can select the bin where we see an increase in both average value and drift. We see a significant increase in the numofproducts average value and drift. We can also see that the baseline and production distributions differ, which is what produces the drift.
The next step could be to find out whether the change in distribution affected only a subsection of the data or was due to other factors such as time (for example, seasonality), a fault in data reporting (for example, sensor data), or a change in the unit in which the metric is reported.
Seasonality could be observed by plotting the data across time (provided we have enough data), a fault in data reporting would mean missing values, and change in unit of data would mean change in values for all subsections of data.
In order to investigate if the change was only for a subsection of data, we will go to the Analyze tab. We can do this by clicking Export bin and feature to Analyze.
In the Analyze tab, we will have an auto-generated SQL query based on our selection in the Monitor tab. We can also write custom SQL queries to investigate the data.
We check the distribution of the field numofproducts for our selection. We can do this by selecting Chart Type - Feature Distribution on the right-hand side of the tab.
We further check the performance of the model for our selection by selecting the Chart Type - Slice Evaluation.
To check whether the range violation has occurred only for a subsection of the data, we can plot the field against a categorical variable. In our case, we can check the distribution across geography. For this, we can plot a feature correlation plot for two features by querying the data and selecting Chart Type - Feature Correlation.
On plotting the feature correlation plot of numofproducts, we observe the distribution to be similar.
For the sake of this example, let’s say that the state of Hawaii (which is a value in the geography field in the data) announced that it has eased restrictions on the number of loans. Since loans are one of the products, our hypothesis is that numofproducts would be higher for that state. To test this, we will check the feature correlation between geography and numofproducts.
We do see higher values for the state of Hawaii compared to other states. We can further check the distribution of the field numofproducts just for the state of Hawaii.
On checking performance for the subset of Hawaii, we see a huge performance drop.
In contrast, we see good performance for the subset of data that excludes Hawaii.
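The comparison between the Hawaii slice and the rest of the data can be sketched outside the platform as follows. The rows, the column names `churned` and `predicted`, and the helper `recall` are all hypothetical and chosen only for illustration. Recall is used here because the earlier drop was in recall and F1-score.

```python
def recall(rows):
    """Share of actual churners in rows that the model predicted as churn."""
    tp = sum(1 for r in rows if r["churned"] == 1 and r["predicted"] == 1)
    fn = sum(1 for r in rows if r["churned"] == 1 and r["predicted"] == 0)
    return tp / (tp + fn) if (tp + fn) else 0.0


# Made-up events (assumption): geography, true label, model prediction.
rows = [
    {"geography": "Hawaii", "churned": 1, "predicted": 0},
    {"geography": "Hawaii", "churned": 1, "predicted": 0},
    {"geography": "Hawaii", "churned": 1, "predicted": 1},
    {"geography": "Hawaii", "churned": 0, "predicted": 0},
    {"geography": "California", "churned": 1, "predicted": 1},
    {"geography": "California", "churned": 0, "predicted": 0},
    {"geography": "Texas", "churned": 1, "predicted": 1},
    {"geography": "Texas", "churned": 1, "predicted": 1},
]

# Split the data into the suspected slice and everything else.
hawaii_rows = [r for r in rows if r["geography"] == "Hawaii"]
other_rows = [r for r in rows if r["geography"] != "Hawaii"]

hawaii_recall = recall(hawaii_rows)
other_recall = recall(other_rows)
print(f"recall (Hawaii):     {hawaii_recall:.2f}")
print(f"recall (not Hawaii): {other_recall:.2f}")
```

A large gap between the two numbers, as in this toy data, is the kind of signal that points to a single slice driving the overall degradation.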
To measure the impact of the feature numofproducts, we can navigate back to the Monitor tab. We can see that the prediction drift impact is highest for numofproducts due to its high drift value, which means it is contributing the most to the prediction drift.
We can further measure the attribution of the feature numofproducts for a single data point. We can select a data point that was incorrectly predicted not to churn (a false negative). We can check point explanations either from the Analyze tab by running a query or from the Explain tab. Below, we check the point explanation for a data point from the Analyze tab by clicking the bulb symbol in the query results.
We see that the feature numofproducts contributes significantly towards the data point being predicted not to churn.
We have seen that the performance of the churn model drops due to a range violation in one of the features. We can improve the performance by retraining the model with new data, but before that we should take mitigation actions that help us detect model performance degradation early and inform our retraining frequency.
Add to dashboard
We can add the generated chart to the dashboard by clicking Pin this chart on the right-hand side of the Analyze tab. This helps us monitor important aspects of the model.
We can set up alerts to make sure we are notified the next time there is a performance degradation. For instance, in this example, the performance degradation was due to a range violation, which is a data integrity issue. To mitigate this, we can set up an alert that notifies us when the percentage of range violations exceeds a certain threshold (10% would be a good number in our case). We can also set up alerts on drift values for predictions and other metrics. Check out this link to learn how to set up alerts on the Fiddler platform.
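The alert rule described above amounts to a threshold check per time bin. The sketch below shows that logic in plain Python. It is not the Fiddler alerts API. The function name `bins_to_alert`, the dates, and the percentages are hypothetical, and only the 10% threshold comes from the text.

```python
def bins_to_alert(violation_pct_by_bin, threshold_pct=10.0):
    """Return the bins whose range-violation percentage exceeds the threshold."""
    return [
        bin_label
        for bin_label, pct in violation_pct_by_bin.items()
        if pct > threshold_pct
    ]


# Made-up daily range-violation percentages for numofproducts (assumption).
daily_violation_pct = {
    "2023-01-01": 0.0,
    "2023-01-02": 4.5,
    "2023-01-03": 12.0,
    "2023-01-04": 31.0,
}

alerts = bins_to_alert(daily_violation_pct, threshold_pct=10.0)
for day in alerts:
    print(f"ALERT: range violations on {day} exceeded 10%")
```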