PART 2: Enhancing Anti-Churn Strategies: Leveraging Advanced Machine Learning for Targeted Intervention in Telecommunications
Strategic Insights Unveiled: Machine Learning Illuminates the Path to Proactive Anti-Churn Promotions by Identifying At-Risk Consumers
Author
Affiliations
John Karuitha
Karatina University, School of Business
University of the Witwatersrand, Johannesburg, School of Construction Economics & Management
Published
January 12, 2024
Modified
January 12, 2024
Executive Summary
This study aims to improve a telecommunications company's anti-churn campaign using advanced machine learning. Several models — logistic regression, decision tree, random forest, bagged decision tree, extreme gradient boosting (XGBoost), ridge, and lasso — were trained to identify the 2,000 customers to target with urgent anti-churn action. The best-performing model, a bagged decision tree, flagged 2,740 clients at risk of churn, from which the campaign's 2,000 targets can be drawn. The findings highlight the value of sophisticated machine learning for precise customer churn prediction in the telecommunications sector.
1 Introduction
As of April 2023, many customers have been canceling their contracts with a mobile phone company, negatively affecting its performance. To tackle this issue, the company wants to reach out to active customers and offer them special deals to prevent them from canceling in the future—a strategy known as “anti-churn.” However, due to budget constraints, they can only contact 2,000 people. The goal is to identify those most likely to cancel in the next 3 months.
In this analysis, various tools are used to find 2,000 customers at risk of leaving for the anti-churn campaign. Here's a breakdown: Section 2 states the objective, Section 3 lists the methods used, Section 4 explores the data and identifies key features related to customer cancellations, Section 5 prepares the data, Sections 6 and 7 put the methods into action and compare their performance, Section 8 presents the target clients, and Section 9 concludes the analysis.
2 Objective
The primary objective of this analysis is to identify and select a targeted group of 2,000 active consumers who are at a heightened risk of canceling their contracts within the next 3 months.
3 Technique
The approach is to construct several candidate target lists of 2,000 customers using statistical methods of varying complexity, in order to improve the performance of the marketing campaign. In addition to a null (random) baseline and a logistic regression, the following models are compared:
Decision tree model.
Random forest model.
Bagged decision tree model.
Extreme gradient boosting model.
Ridge model.
Lasso model.
4 Data
There are two sets of data.
base_telecom_2022_12.txt has 44529 rows and 42 columns.
base_telecom_2023_03.txt has 22528 rows and 41 columns.
The extra column in base_telecom_2022_12.txt, flag_resiliation, indicates whether the consumer churned or not. I therefore use this data set to train and test the models; the second data set, base_telecom_2023_03.txt, contains the active clients to be scored for the campaign.
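The import step is not shown in the rendered post; a minimal sketch of reading the two files, where read_delim and the tab delimiter are assumptions about the file format, and the object names match those used in the later chunks:

Code
library(tidyverse)

## Read the two data sets (the delimiter is an assumption; adjust to
## the actual file format).
telcom_two <- read_delim("base_telecom_2022_12.txt", delim = "\t")     # labelled training data
telcom_one_val <- read_delim("base_telecom_2023_03.txt", delim = "\t") # active clients to score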
4.1 Data Exploration
The table below shows the summary statistics for numeric variables in the data. We see that the variables taille_ville and revenu_moyen_ville have a few missing observations.
I start by converting the date columns into the proper format. I also create an age column, age, computed as the current year (2023) minus the year of birth (date_naissance), to find out whether age has a bearing on consumer churn. Similarly, I create a duration column that captures the time elapsed since the start of the contract. Finally, I create a feature indicating whether the client is under commitment, meaning they can only exit the plan by paying a contract fee: any client whose commitment period ends on or before December 31, 2023 is classified as under commitment; otherwise, they are not.
I plot the extent of churn in our data. The figure below shows that instances of churn are relatively few compared to the consumers who stay. This imbalance is an important modelling consideration: the data must be balanced so the models aptly capture the characteristics of the people who churn.
Code
telcom_two %>%
  ggplot(aes(x = factor(flag_resiliation))) +
  geom_bar() +
  labs(
    title = "Prevalence of Churn in the Data",
    subtitle = "There is a serious imbalance with the non-churn over-represented in the data",
    x = "Churned?",
    y = "Count"
  )
I create a correlation funnel that bins the data and computes the correlation of each binned feature with churn. Reading from the top, the variables with the strongest linear relationships with churn appear first: here, the nb_reengagements and vols_appels variables are most strongly related to churn, while sex has the weakest linear relationship.
CorrelationFunnel Package
The correlationfunnel package includes a streamlined 3-step process for preparing data and performing visual correlation analysis. The visualization produced uncovers insights by elevating high-correlation features and lowering low-correlation features. The shape looks like a funnel (hence the name "Correlation Funnel"), making it very efficient to understand which features are most likely to provide business insights and lend well to a [machine learning model](https://cran.r-project.org/web/packages/correlationfunnel/vignettes/introducing_correlation_funnel.html).
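For illustration, here is a minimal sketch of the three-step funnel applied to this data. It assumes no missing values remain (binarize() requires complete data), that id_client is the identifier column, and that binarize() names the churn dummy flag_resiliation__1 following its variable__level convention:

Code
library(correlationfunnel)

telcom_two %>%
  select(-id_client) %>%                          # drop the identifier
  binarize(n_bins = 4, thresh_infreq = 0.01) %>%  # step 1: bin and one-hot encode
  correlate(target = flag_resiliation__1) %>%     # step 2: correlate with churn
  plot_correlation_funnel()                       # step 3: draw the funnel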
5 Data preparation
To run the machine learning models, I start by creating a recipe that I use throughout the analysis.
Code
## Packages assumed loaded in the setup chunk: tidyverse, tidymodels,
## themis (step_upsample), mice, and lubridate.

## Select pertinent variables ----
training_data <- telcom_two %>%
  mutate(flag_resiliation = factor(flag_resiliation))

## Final validation data ----
validation_data <- telcom_one_val %>%
  mutate(
    date_naissance            = dmy(date_naissance),
    date_activation           = dmy(date_activation),
    date_fin_engagement       = dmy(date_fin_engagement),
    date_dernier_reengagement = dmy(date_dernier_reengagement)
  ) %>%
  mutate(
    age      = as.numeric(today() - date_naissance),
    duration = as.numeric(today() - date_activation)
  ) %>%
  mutate(committed = case_when(
    date_fin_engagement <= as.Date("2022-12-31") ~ "Committed",
    .default = "Not Committed"
  )) %>%
  dplyr::select(-starts_with("date"), -code_postal, -id_client) %>%
  mice(seed = 234, printFlag = FALSE) %>%  # impute missing values
  complete()

## Create a recipe ----
my_recipe <- recipe(flag_resiliation ~ ., data = training_data) %>%
  step_dummy(all_nominal_predictors()) %>%
  step_upsample(flag_resiliation, over_ratio = 1) %>%  # balance the classes
  step_impute_knn(all_predictors())

## Apply the recipe steps to the validation set
validation_data <- my_recipe %>%
  prep(training = training_data) %>%
  bake(new_data = validation_data)

## Create a workflow; it preps the recipe itself at fit time
telcom_wf <- workflow() %>%
  add_recipe(my_recipe)
6 Training and testing sets
I split the labelled data (the set containing churn outcomes) into training and testing sets, and use the testing set to evaluate the models before choosing the final model. I then use the chosen model to select the clients at risk of churn.
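The split itself is not visible in the rendered post; a minimal sketch with rsample, where the 75/25 proportion, the stratification, and the seed are assumptions:

Code
set.seed(123)  # seed value is an assumption

## Stratify on the outcome so the rare churn cases appear in both sets
telcom_split <- initial_split(training_data, prop = 0.75, strata = flag_resiliation)
train_set <- training(telcom_split)
test_set <- testing(telcom_split)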
6.1 Null model
The null model, our baseline for the analysis, assumes that we pick the target clients at random. The model has very weak discriminatory power, with a diagonal ROC curve (AUC of 0.5): it is equivalent to guessing which clients to contact. In the remainder of the analysis, we explore models that do better than this baseline.
Code
null_spec <- null_model() %>%
  set_engine("parsnip") %>%
  set_mode("classification")

## Run the model
my_null <- telcom_wf %>%
  add_model(null_spec) %>%
  fit(data = train_set)

## Predict on the test set
null_output <- my_null %>%
  augment(new_data = test_set) %>%
  conf_mat(truth = flag_resiliation, estimate = .pred_class)

## ROC curve and confusion matrix for the null model
## (patchwork combines the two ggplots)
(my_null %>%
    augment(new_data = test_set) %>%
    roc_curve(truth = flag_resiliation, .pred_0) %>%
    autoplot() +
  null_output %>%
    autoplot())
Null model summary
6.2 Logistic regression model
Logistic regression is a statistical model widely used in machine learning for binary and multi-class classification tasks. Despite its name, it is used for classification, not regression: it models the probability that an observation belongs to a particular class based on one or more predictor variables. The logistic function, also known as the sigmoid function, squashes the output into the range (0, 1). The model is popular mainly for its simplicity and interpretability (Boateng and Abaye 2019; Das 2021). The ROC curve below shows the model does better than the null model; Table () shows the metrics used to compare it with the other models.
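For reference, with predictors $x_1, \dots, x_k$ and coefficients $\beta_0, \dots, \beta_k$, the model estimates

$$
p(\text{churn} = 1 \mid x) = \frac{1}{1 + e^{-(\beta_0 + \beta_1 x_1 + \cdots + \beta_k x_k)}},
$$

so predicted probabilities always lie in $(0, 1)$.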
Code
## Set up the model
logit_model <- logistic_reg() %>%
  set_engine("glm") %>%
  set_mode("classification")

## Run the model
my_logit <- telcom_wf %>%
  add_model(logit_model) %>%
  fit(data = train_set)

## Predict on the test set
logit_output <- my_logit %>%
  augment(new_data = test_set) %>%
  conf_mat(truth = flag_resiliation, estimate = .pred_class)

## Confusion matrix and ROC curve for the logistic model
(logit_output %>%
    autoplot() +
  my_logit %>%
    augment(new_data = test_set) %>%
    roc_curve(truth = flag_resiliation, .pred_0) %>%
    autoplot())
Logistic model summary
6.3 Decision tree model
A decision tree is a popular machine learning algorithm known for its simplicity, interpretability, and effectiveness in both classification and regression tasks. It mimics human decision-making by recursively partitioning the data based on features, ultimately producing a set of rules that guide predictions. The model is structured as a tree in which each internal node represents a decision based on a specific feature and each leaf node represents a predicted outcome. At each decision node, the algorithm selects the feature that best splits the data into homogeneous subsets, optimizing a chosen criterion (e.g., Gini impurity for classification, mean squared error for regression); the goal is to reduce uncertainty or impurity in the data at each step. Going by the AUC, the decision tree is much weaker than the logistic model. Table () has more metrics.
Code
decision_spec <- decision_tree() %>%
  set_engine("rpart") %>%
  set_mode("classification")

my_decision <- telcom_wf %>%
  add_model(decision_spec) %>%
  fit(data = train_set)

## Predict on the test set
decision_output <- my_decision %>%
  augment(new_data = test_set) %>%
  conf_mat(truth = flag_resiliation, estimate = .pred_class)

## Confusion matrix and ROC curve for the decision tree model
(decision_output %>%
    autoplot() +
  my_decision %>%
    augment(new_data = test_set) %>%
    roc_curve(truth = flag_resiliation, .pred_0) %>%
    autoplot())
AUC: Decision tree model
6.4 Random forest model
A random forest is an ensemble learning technique that builds a multitude of decision trees during training and outputs the average prediction (for regression tasks) or the majority vote (for classification tasks) of the individual trees. Random forests are particularly effective because they mitigate overfitting and improve the generalization of predictions. Each tree is trained independently on a random sample drawn from the training data with replacement; this process, known as bootstrap sampling, creates diverse subsets of data for each tree. The random forest appears very strong here. As an ensemble algorithm it is not heavily affected by over-fitting, and, going by the confusion matrix, it detects cases of consumer churn very well (Lantz 2019).
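The random forest chunk is not visible in the rendered post; a minimal sketch following the same pattern as the other models, where the ranger engine and 500 trees are assumptions:

Code
rf_spec <- rand_forest(trees = 500) %>%
  set_engine("ranger") %>%
  set_mode("classification")

my_rf <- telcom_wf %>%
  add_model(rf_spec) %>%
  fit(data = train_set)

## Confusion matrix on the test set
rf_output <- my_rf %>%
  augment(new_data = test_set) %>%
  conf_mat(truth = flag_resiliation, estimate = .pred_class)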
6.5 Bagged decision tree model
Bagged trees, short for bootstrap aggregating (bagging) trees, is an ensemble learning method that combines the predictive power of multiple decision trees. The technique is particularly effective at reducing overfitting and improving the stability and accuracy of predictions. Bagging uses bootstrap sampling, randomly selecting subsets of the original data with replacement; each bootstrap sample may therefore contain duplicated instances while omitting others. Like the random forest, the bagged tree does very well, and the confusion matrix shows high power in detecting cases of consumer churn.
Code
## bag_tree() comes from the baguette package
bag_spec <- bag_tree() %>%
  set_engine("rpart") %>%
  set_mode("classification")

my_bag <- telcom_wf %>%
  add_model(bag_spec) %>%
  fit(data = train_set)

## Predict on the test set
bag_output <- my_bag %>%
  augment(new_data = test_set) %>%
  conf_mat(truth = flag_resiliation, estimate = .pred_class)

## Confusion matrix and ROC curve for the bagged tree model
(bag_output %>%
    autoplot() +
  my_bag %>%
    augment(new_data = test_set) %>%
    roc_curve(truth = flag_resiliation, .pred_0) %>%
    autoplot())
AUC for Bagged Trees Model
6.6 Extreme gradient boosting (XGBoost)
XGBoost is an ensemble learning method that builds a strong predictive model by combining the outputs of multiple weak models, typically decision trees. It belongs to the family of boosting algorithms, which iteratively improve performance by focusing on previously misclassified data. XGBoost is particularly effective for regression and classification tasks, offering high predictive accuracy and robustness against overfitting. Going by the AUC, the XGBoost model is better than the null model and is fairly accurate in detecting cases of churn.
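The XGBoost chunk is likewise not visible here; a sketch with parsnip's default boost_tree() settings (any tuning used in the original analysis is not shown):

Code
xgb_spec <- boost_tree() %>%
  set_engine("xgboost") %>%
  set_mode("classification")

my_xgb <- telcom_wf %>%
  add_model(xgb_spec) %>%
  fit(data = train_set)

## Confusion matrix on the test set
xgb_output <- my_xgb %>%
  augment(new_data = test_set) %>%
  conf_mat(truth = flag_resiliation, estimate = .pred_class)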
6.7 Lasso model
The lasso model, implemented through the glmnet package in R and integrated with the tidymodels framework, is a powerful statistical method used for both variable selection and regularization in linear regression.
Lasso, short for Least Absolute Shrinkage and Selection Operator, is a regression technique that adds a penalty term to the linear regression equation. The penalty, based on the absolute values of the regression coefficients, encourages sparsity by shrinking some coefficients to exactly zero, which makes the lasso particularly useful for variable selection in high-dimensional datasets. In setting up the model in R, we set mixture to one (a specification sketch for both penalized models follows the ridge discussion below). The confusion matrix shows that the lasso model is below average at detecting churn cases.
6.8 Ridge model
The ridge model, like the lasso, is a regularization technique commonly used in linear regression. Both aim to address issues such as multi-collinearity and over-fitting by introducing penalty terms, but they differ in the nature of the penalties applied to the regression coefficients.
The ridge model, also known as Tikhonov regularization, adds a penalty term based on the sum of the squared values of the regression coefficients. This penalty discourages large coefficients and stabilizes the model by preventing coefficients from becoming too extreme. Here, we set mixture to zero. Like the lasso, the ridge model is below average at detecting churn cases.
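Both penalized models share the same parsnip specification and differ only in the mixture argument; a minimal sketch, where the penalty value is an assumption (the post does not report it):

Code
lasso_spec <- logistic_reg(penalty = 0.01, mixture = 1) %>%  # mixture = 1: lasso
  set_engine("glmnet") %>%
  set_mode("classification")

ridge_spec <- logistic_reg(penalty = 0.01, mixture = 0) %>%  # mixture = 0: ridge
  set_engine("glmnet") %>%
  set_mode("classification")

my_lasso <- telcom_wf %>% add_model(lasso_spec) %>% fit(data = train_set)
my_ridge <- telcom_wf %>% add_model(ridge_spec) %>% fit(data = train_set)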
7 Model comparison
The bagged tree has the highest sensitivity and specificity, a good AUC, and high discriminatory power going by the confusion matrix. I therefore choose the bagged tree.
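A sketch of how such a comparison can be assembled with yardstick, using the fitted models from the chunks above (the exact set of metrics in the original comparison table is an assumption):

Code
churn_metrics <- metric_set(roc_auc, sens, spec)

eval_model <- function(fitted, name) {
  fitted %>%
    augment(new_data = test_set) %>%
    churn_metrics(truth = flag_resiliation, .pred_0, estimate = .pred_class) %>%
    mutate(model = name)
}

bind_rows(
  eval_model(my_logit, "logistic"),
  eval_model(my_decision, "decision tree"),
  eval_model(my_bag, "bagged tree"),
  eval_model(my_xgb, "xgboost")
)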
8 Target Clients for Anti-churn Promotions
Using the bagged tree model, I generate a list of clients to be reached for the anti-churn promotion and export it as a CSV file. The model flags 2,740 clients as at risk of churn, from which the 2,000 campaign targets can be selected.
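A minimal sketch of producing that list, assuming the bagged-tree fit my_bag, the prepared validation_data, and parsnip's .pred_1 probability column; ranking by predicted probability and keeping the top 2,000 is one way to respect the campaign budget (the post does not state how the final 2,000 are chosen):

Code
target_clients <- my_bag %>%
  augment(new_data = validation_data) %>%
  filter(.pred_class == "1") %>%  # the clients flagged as at risk (2,740 here)
  arrange(desc(.pred_1)) %>%      # rank by predicted churn probability
  slice_head(n = 2000)            # keep the 2,000 the budget allows

## Note: re-attach client identifiers before export if they were
## dropped during preprocessing.
write_csv(target_clients, "anti_churn_targets.csv")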
9 Conclusion
This analysis set out to optimize the anti-churn campaign for a telecommunications company by harnessing machine learning techniques. Several models were trained and compared: logistic regression, decision tree, random forest, bagged decision tree, extreme gradient boosting, ridge, and lasso. The bagged decision tree performed best on sensitivity, specificity, and AUC, and was used to flag 2,740 clients at risk of churn, from which the 2,000 consumers to contact in the anti-churn initiative can be drawn. The demonstrated effectiveness of these methods emphasizes the role of machine learning in precise customer churn prediction and targeted intervention in the telecommunications sector. This study contributes to the ongoing discourse on anti-churn strategies and reinforces the value of advanced analytics for informed decision-making in a dynamic industry.
References
Boateng, Ernest Yeboah, and Daniel A. Abaye. 2019. “A Review of the Logistic Regression Model with Emphasis on Medical Research.” Journal of Data Analysis and Information Processing 7 (4): 190–207.
Das, Abhik. 2021. “Logistic Regression.” In Encyclopedia of Quality of Life and Well-Being Research, 1–2. Springer.
Lantz, Brett. 2019. Machine Learning with R: Expert Techniques for Predictive Modeling. Packt Publishing Ltd.