Predicting Customer Churn with Machine Learning (AI)


You spent months, or even years fostering that treasured business-to-customer relationship. Then suddenly, you get the call:

“Hello, I’d like to cancel please.”

Whoa! You feel blindsided.

Where is this coming from?

Is it something I said?

Could I have prevented this?

Well, with AI and a KNIME workflow, you can identify some of the customers likely to churn! 

Identifying churn likelihood is crucial for customer retention as it enables businesses to take preventative measures before they lose the customer forever.

The Data

For this adventure, I used a squeaky clean Kaggle dataset on customer churn. The dataset consists of customers from a telephone company where out of 3,333 customers, 483 churned and 2,850 did not churn.

The predictive columns were: AccountWeeks, ContractRenewal, DataPlan, DataUsage, CustServCalls, DayMins, DayCalls, MonthlyCharge, OverageFee, and RoamMins.

There were no missing values, and the only data cleaning task was converting the 0/1-coded categorical variables (such as ContractRenewal and DataPlan) from numeric to categorical form.
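In pandas terms (my workflow used KNIME nodes, so this is just an illustrative sketch with made-up rows), that conversion looks like:

```python
import pandas as pd

# made-up rows using column names from the Kaggle dataset
df = pd.DataFrame({
    "Churn":           [0, 1, 0],
    "ContractRenewal": [1, 0, 1],
    "DataPlan":        [1, 0, 0],
    "DataUsage":       [2.7, 0.0, 0.0],
})

# the 0/1 flags are categorical in meaning, so recast them from numeric
for col in ["Churn", "ContractRenewal", "DataPlan"]:
    df[col] = df[col].astype("category")

print(df.dtypes)
```

Truly numeric columns like DataUsage stay numeric, of course.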

Data Exploration

For this section, I focused on finding differences between the churn and non-churn groups.

Central Tendencies of Churn and Non-Churn Groups

From the statistics above, the churn group used considerably less data, had more customer service calls, consumed more day minutes, and had higher fees. The other variables may also be of predictive relevance, but I do not see any major differences looking at these numbers.
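A group comparison like this boils down to a single group-by aggregation. Here is a sketch with made-up values, not the actual dataset:

```python
import pandas as pd

# made-up rows for illustration; the real dataset has 3,333 customers
df = pd.DataFrame({
    "Churn":         [0, 0, 1, 1],
    "DataUsage":     [3.0, 2.5, 0.5, 0.0],
    "CustServCalls": [1, 2, 4, 5],
})

# mean of each numeric column, per churn group
print(df.groupby("Churn")[["DataUsage", "CustServCalls"]].mean())
```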

Data Plan and Churn

Customers who churned were less likely to have a data plan than those who did not churn.

Contract Renewal and Churn

Customers who churned were less likely to have renewed their contracts.

Now we have an idea of which variables could help predict churn; nonetheless, I will use a feature selection loop to choose the variables for the final model. We also know that we have a class imbalance issue, which I will address in the modeling section.

Here is my workflow for the data cleaning (lite) and exploration:

Model Creation and Selection

I utilized the AutoML component to help me identify which model to focus on. I also tested SMOTE oversampling and Equal Size sampling to see which method of handling class imbalance works better. I got the best results using SMOTE to oversample the churn class, and the XGBoost model performed best across all measures. It classified both churn and non-churn cases relatively well, with a sensitivity of 94% and a specificity of 95% (for the displayed rows).
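The oversampling itself was done by KNIME’s SMOTE node; to illustrate the idea behind it, here is a minimal NumPy sketch that generates synthetic minority-class samples by interpolating between a minority point and one of its nearest minority-class neighbours:

```python
import numpy as np

def smote_oversample(X_min, n_new, k=5, seed=0):
    """Create n_new synthetic minority-class samples by interpolating
    between a random minority point and one of its k nearest
    minority-class neighbours (the core idea behind SMOTE)."""
    rng = np.random.default_rng(seed)
    n = len(X_min)
    k = min(k, n - 1)
    # pairwise distances within the minority class
    d = np.linalg.norm(X_min[:, None, :] - X_min[None, :, :], axis=-1)
    np.fill_diagonal(d, np.inf)                 # ignore self-distance
    nn = np.argsort(d, axis=1)[:, :k]           # k nearest neighbours per point
    base = rng.integers(0, n, size=n_new)       # random base points
    neigh = nn[base, rng.integers(0, k, size=n_new)]
    gap = rng.random((n_new, 1))                # interpolation factor in [0, 1)
    return X_min[base] + gap * (X_min[neigh] - X_min[base])

# toy minority class: 6 points inside the unit square
X_min = np.array([[0., 0.], [1., 0.], [0., 1.], [1., 1.], [.5, .5], [.2, .8]])
X_new = smote_oversample(X_min, n_new=10)
print(X_new.shape)  # (10, 2)
```

Because each synthetic point sits on the line segment between two real minority points, SMOTE enriches the churn class without simply duplicating rows.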

Proceeding with the XGBoost model, I performed feature selection and cross-validation.

Cross-Validation and Feature Selection

I love loops! It is always a treat seeing them run 😻

Back to data science stuff!

Feature selection helps us identify which variables are most relevant to our model. This helps us optimize accuracy and processing time.

I chose to maximize Cohen’s Kappa rather than accuracy. Because this is an imbalanced dataset, Cohen’s Kappa gives a more realistic picture of the model’s performance, since it takes the class imbalance into account. If a model performs poorly on the minority class but well on the majority class, the accuracy measure will not reflect this, but Cohen’s Kappa will. Here is more on Cohen’s Kappa.
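To see why kappa is the better target here, consider a degenerate model that predicts “no churn” for everyone in this dataset. A small illustrative snippet (not part of the KNIME workflow) computing kappa from a 2×2 confusion matrix:

```python
def cohens_kappa(tp, fp, fn, tn):
    """Cohen's kappa from a 2x2 confusion matrix."""
    total = tp + fp + fn + tn
    po = (tp + tn) / total                      # observed agreement (accuracy)
    # expected agreement by chance, from the marginal frequencies
    pe = ((tp + fp) * (tp + fn) + (fn + tn) * (fp + tn)) / total**2
    return (po - pe) / (1 - pe)

# a model that predicts "no churn" for all 3,333 customers still scores
# 2850/3333 ~ 85.5% accuracy, but kappa exposes it as no better than chance
acc = 2850 / 3333
kappa = cohens_kappa(tp=0, fp=0, fn=483, tn=2850)
print(round(acc, 3), round(kappa, 3))
```

Accuracy says 85.5%; kappa says (essentially) zero, which is exactly the trap we want to avoid with an imbalanced target.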

Cross-validation allows us to use all of the training data for feature selection, rather than partitioning our data even further. With cross-validation, the data is split into a specified number of folds (k folds), and each fold takes a turn as the test set. This is a good way to evaluate whether your model is stable, i.e., whether it shows similar accuracy levels when tested on different subsets of the data.
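In KNIME this is handled by a cross-validation loop; conceptually, the fold bookkeeping looks something like this NumPy sketch:

```python
import numpy as np

def kfold_indices(n, k=5, seed=0):
    """Shuffle n row indices and split them into k roughly equal folds."""
    idx = np.random.default_rng(seed).permutation(n)
    return np.array_split(idx, k)

# 5 folds over the 3,333 customers: each fold takes a turn as the test set,
# and the other 4 folds form the training set for that round
folds = kfold_indices(n=3333, k=5)
for i, test_idx in enumerate(folds):
    train_size = 3333 - len(test_idx)
    print(f"fold {i}: train={train_size}, test={len(test_idx)}")
```

Every row appears in exactly one test fold, so the whole training set gets used for evaluation.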

I used 5 folds, and the error ranged from approximately 5% to 6.5%. Pretty stable, no? 😏

Here are the selected features:

One note: I selected CustServCalls manually, since the XGBoost model needed at least one predictor variable to initiate the loop.

Model Testing

Alright XGBoost, let’s see how you do on data you’ve never seen before!

From the confusion matrix above, the model captures 76.5% (78/102) of customers who churned. Personally, I’d like that number to be higher…but being able to identify even half of the customers who are about to churn can be leveraged to retain the revenue stream from those customers.
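Working out the headline numbers from the counts quoted here (78 of 102 actual churners caught, 23 false alarms):

```python
# figures quoted from the confusion matrix
tp = 78                 # churners correctly flagged
fn = 102 - 78           # churners the model missed
fp = 23                 # non-churners flagged as churn

sensitivity = tp / (tp + fn)   # share of actual churners caught
precision = tp / (tp + fp)     # share of churn flags that were correct
print(f"sensitivity = {sensitivity:.1%}, precision = {precision:.1%}")
```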

23 customers were misclassified as churned, but in the worst case these customers will get some TLC, which could increase the chance of retaining them. The main goal should be to limit the number of churned customers missed by the model.

Prediction can be improved by providing more data such as customer demographics, and customer service ticket information like the reason for contact, customer service rating, and the number of calls to resolve an issue.

How Can Businesses Limit Churn?

A high number of customer service calls is indicative of an issue. Companies should give special attention to customers who call in more, both to resolve the underlying issue and to improve the customer experience by offering rewards and freebies.

Having a customer in a contract seems to reduce churn likelihood, so companies should make their contracts attractive to the customer for a win-win scenario. Keeping costs low can help as well, especially for daytime calls (DayMins), since the churn group accrues more daytime call minutes every month. Perhaps daytime-minute packages could be offered that save the customer money.

Where to get the Workflow

You can download and play with the KNIME workflow here. Let me know your thoughts 😁

See you on the blog!

-Tosin Adekanye-


Published by Tosinlitics

Hello! I'm Tosin and I love analyzing stuff and using data science as a crystal ball. Follow me to see my cool dashboards, data science, and analytics projects.
