PART 2: Enhancing Anti-Churn Strategies: Leveraging Advanced Machine Learning for Targeted Intervention in Telecommunications

Strategic Insights Unveiled: Machine Learning Illuminates the Path to Proactive Anti-Churn Promotions by Identifying At-Risk Consumers

Author
Affiliations

John Karuitha

Karatina University, School of Business

University of the Witwatersrand, Johannesburg, School of Construction Economics & Management

Published

January 12, 2024

Modified

January 12, 2024

Executive Summary

This study aims to improve a telecommunications company’s anti-churn campaign using machine learning. Six models (Decision Tree, Random Forest, Bagged Decision Tree, Extreme Gradient Boosting, Ridge, and Lasso) were compared against a null model and a logistic regression baseline, with the goal of identifying about 2,000 customers for urgent anti-churn actions. The bagged decision tree, the model finally selected, flagged 2,740 clients as likely to churn, offering targeted input for the campaign. The findings highlight the value of machine learning for customer churn prediction in the telecommunications sector.

1 Introduction

As of April 2023, many customers have been canceling their contracts with a mobile phone company, negatively affecting its performance. To tackle this issue, the company wants to reach out to active customers and offer them special deals to prevent them from canceling in the future—a strategy known as “anti-churn.” However, due to budget constraints, they can only contact 2,000 people. The goal is to identify those most likely to cancel in the next 3 months.

In this analysis, various tools are used to find 2,000 customers at risk of leaving for the anti-churn campaign. Here’s a breakdown: Section 2 explains the goals, Section 3 lists the methods used, Section 4 explores the data and finds key features related to customer cancellations, Section 5 sets up the modelling recipe, Section 6 fits the models and assesses them on a testing set, Section 7 compares the model metrics, Section 8 selects the target clients, and finally, Section 9 concludes the analysis.

Code
if(!require(pacman)){
        install.packages('pacman')
}

p_load(tidyverse, janitor, skimr, 
       ggthemes, gt, correlationfunnel,
       mice, doParallel, tidymodels,
       klaR, ranger, rpart, kknn,
       kernlab, LiblineaR, brulee,
       conflicted, themis, xgboost,
       usemodels, AppliedPredictiveModeling,
       discrim, baguette, nnet, patchwork,
       kableExtra, caret, stacks)

p_load_gh("datarootsio/artyfarty")

theme_set(artyfarty::theme_scientific())
options(digits = 2)
options(scipen = 999)

## Speed
## Hasten code execution by parallel computing
all_cores <- parallel::detectCores(logical = FALSE)
cl <- makeCluster(all_cores)
registerDoParallel(cl)
Code
telcom_one_val <- read.csv2("base_telecom_2023_03.txt")
telcom_two <- read.csv2("base_telecom_2022_12.txt") %>% 
        mice(seed = 234, 
             printFlag = FALSE) %>% 
        complete()

2 Objective

The primary objective of this analysis is to identify and select a targeted group of 2,000 active consumers who are at a heightened risk of canceling their contracts within the next 3 months.

3 Technique

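The approach is to build several candidate target lists of 2,000 customers using statistical methods of increasing complexity, in order to improve the performance of the marketing campaign. A null model and a logistic regression serve as baselines (Sections 6.1 and 6.2); the six main models are: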

  1. Decision tree model.
  2. Random forest model.
  3. Bagged decision tree model.
  4. Extreme gradient boosting model.
  5. Ridge model.
  6. Lasso model.

4 Data

There are two sets of data.

  • base_telecom_2022_12.txt has 44529 rows and 42 columns.

  • base_telecom_2023_03.txt has 22528 rows and 41 columns.

The file base_telecom_2022_12.txt has one extra column, flag_resiliation, which indicates whether the consumer churned or not. Hence, I use this data set to train the models. I then apply the models to the second set of data, base_telecom_2023_03.txt, which has no churn outcome.

4.1 Data Exploration

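The table below shows the summary statistics for the numeric variables in the data. The variables taille_ville and revenu_moyen_ville have a few missing observations in the raw file; because the data are imputed with mice when loaded, the completeness rate reported in the table is 1 for every variable.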

Code
telcom_two %>% 
        dplyr::select(where(is.numeric)) %>% 
        skim_without_charts() %>% 
        dplyr::select(-skim_type, -n_missing) %>% 
        rename(
            Variable = skim_variable,
            Complete = complete_rate,
            Mean = numeric.mean,
            SD = numeric.sd,
            Min = numeric.p0,
            Q1 = numeric.p25,
            Median = numeric.p50,
            Q3 = numeric.p75,
            Max = numeric.p100
        ) %>% 
        gt(caption = "Summary Statistics: Numeric Variables")
Summary Statistics: Numeric Variables
Variable Complete Mean SD Min Q1 Median Q3 Max
flag_resiliation 1 0.18 0.38 0 0 0 0 1
taille_ville 1 58223.13 78778.66 37 6231 25701 73002 798074
revenu_moyen_ville 1 15744.46 4930.16 0 12482 14160 18071 57257
nb_migrations 1 1.46 1.44 0 0 1 2 13
flag_migration_hausse 1 0.35 0.48 0 0 0 1 1
flag_migration_baisse 1 0.50 0.50 0 0 1 1 1
nb_services 1 2.96 1.83 0 2 3 4 18
flag_personnalisation_repondeur 1 0.20 0.40 0 0 0 0 1
flag_telechargement_sonnerie 1 0.14 0.34 0 0 0 0 1
nb_reengagements 1 0.64 0.70 0 0 1 1 4
vol_appels_m6 1 18012.26 10562.61 0 9631 16683 25484 46229
vol_appels_m5 1 18002.92 10559.17 0 9626 16714 25532 46347
vol_appels_m4 1 18017.06 10597.15 0 9617 16667 25477 46613
vol_appels_m3 1 18056.74 10646.03 0 9575 16731 25554 46172
vol_appels_m2 1 18046.34 10619.43 0 9584 16692 25576 47101
vol_appels_m1 1 18029.88 10639.35 0 9541 16631 25618 47151
flag_appels_vers_international 1 0.26 0.44 0 0 0 1 1
flag_appels_depuis_international 1 0.17 0.37 0 0 0 0 1
flag_appels_numeros_speciaux 1 0.58 0.49 0 0 1 1 1
nb_sms_m6 1 101.75 133.43 0 11 29 135 534
nb_sms_m5 1 101.59 133.35 0 11 30 135 534
nb_sms_m4 1 101.60 133.51 0 11 30 135 538
nb_sms_m3 1 101.54 133.65 0 10 31 135 535
nb_sms_m2 1 101.42 133.56 0 10 31 134 537
nb_sms_m1 1 101.17 133.47 0 10 32 134 536

Summary Statistics for Numeric Variables

Code
telcom_two %>% 
        dplyr::select(where(is.character)) %>% 
        skim_without_charts() %>% 
        dplyr::select(-n_missing) %>% 
        dplyr::rename(
                Variable = skim_variable,
                Complete = complete_rate,
                Char_min = character.min,
                Char_max = character.max,
                Empty = character.empty,
                Unique = character.n_unique,
                Blank = character.whitespace
                
        ) %>% 
        gt(caption = "Summary Statistics for Character Variables")
Summary Statistics for Character Variables
skim_type Variable Complete Char_min Char_max Empty Unique Blank
character id_client 1 15 15 0 44529 0
character date_naissance 1 0 10 45 15796 0
character sexe 1 0 8 6 3 0
character csp 1 5 19 0 8 0
character code_postal 1 4 5 0 3628 0
character type_ville 1 0 14 1944 5 0
character date_activation 1 10 10 0 2414 0
character enseigne 1 8 19 0 3 0
character mode_paiement 1 3 8 0 3 0
character duree_offre_init 1 1 3 0 8 0
character duree_offre 1 1 3 0 8 0
character telephone_init 1 12 15 0 4 0
character telephone 1 12 15 0 3 0
character date_fin_engagement 1 0 10 602 1956 0
character date_dernier_reengagement 1 0 10 21429 1016 0
character situation_impayes 1 12 15 0 3 0
character segment 1 1 1 0 3 0

Summary Statistics for Character Variables

4.2 Feature Engineering

I start by converting the date columns into the proper format. I also create an age column, age, computed as the time elapsed between the date of birth (date_naissance) and the date on which the analysis is run, measured in days. The goal is to find out whether age has a bearing on consumer churn. Similarly, I create a duration column that captures the number of days since the start of the contract (date_activation). Finally, I create a feature, committed, intended to capture whether the client is under a commitment that they can only exit by paying a contract fee. In the code, clients whose commitment end date (date_fin_engagement) falls on or before December 31, 2022 are labelled "Committed"; all others, including those with a missing end date, are labelled "Not Committed".

Code
telcom_two <- telcom_two %>% 
        mutate(
                date_naissance = dmy(date_naissance),
                date_activation = dmy(date_activation),
                date_fin_engagement = dmy(date_fin_engagement),
                date_dernier_reengagement = dmy(date_dernier_reengagement)
        ) %>% 
        mutate(
                age = as.numeric(today() - date_naissance),
                duration = as.numeric(today() - date_activation)
        ) %>% 
        mutate(committed = case_when(
              date_fin_engagement <= as.Date("2022-12-31") ~ "Committed",
              .default = "Not Committed"
        )) %>% 
        dplyr::select(-starts_with("date"),
               -code_postal,
               -id_client) %>% 
        mice(seed = 234, printFlag = FALSE) %>% 
        complete()
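
The date-derived features can also be pinned to a fixed reference date so that they do not change with the day the script is run. The following is a minimal base R sketch of that variation, not the code used above: the toy values, the day/month/year string format, the reference date ref_date, and the column names age_years, duration_days and still_committed are all assumptions made for illustration.

```r
# Toy data: the column names mirror the source files, the values are made up.
clients <- data.frame(
  date_naissance      = c("15/06/1980", "02/01/1995", "30/11/1967"),
  date_activation     = c("01/03/2019", "12/12/2021", "20/07/2015"),
  date_fin_engagement = c("01/03/2023", "12/12/2022", NA),
  stringsAsFactors = FALSE
)

# Hypothetical fixed reference date (end of the training file's month).
ref_date <- as.Date("2022-12-31")

# Parse day/month/year strings (assumed format) into Date objects.
to_date <- function(x) as.Date(x, format = "%d/%m/%Y")

birth   <- to_date(clients$date_naissance)
start   <- to_date(clients$date_activation)
end_eng <- to_date(clients$date_fin_engagement)

# Age in completed years and contract duration in days, both as of ref_date.
clients$age_years     <- floor(as.numeric(ref_date - birth) / 365.25)
clients$duration_days <- as.numeric(ref_date - start)

# TRUE only when a commitment end date exists and lies after ref_date.
clients$still_committed <- !is.na(end_eng) & end_eng > ref_date

print(clients[, c("age_years", "duration_days", "still_committed")])
```

Using a fixed reference date keeps age and duration comparable between the training file and the later file, at the cost of having to choose that date explicitly.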

4.3 Data Visualization

I plot the extent of churn in the data. The figure below shows that churn is relatively rare: about 18% of the customers in the training file churned (the mean of flag_resiliation in the summary table above), while the rest stayed. This imbalance is an important consideration during modelling, because it calls for balancing the data so that the models aptly capture the characteristics of the customers who churn.

Code
telcom_two %>% 
        ggplot(aes(x = factor(flag_resiliation))) + 
        geom_bar() + 
        labs(title = "Prevalence of Churn in the Data",
             subtitle = "There is a serious imbalance with the non-churn over-represented in the data",
             x = "Churned?", y = "Count")
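
Balancing by upsampling means resampling rows of the rarer class, with replacement, until both classes are the same size. A minimal base R sketch of the idea follows; the data are simulated, the 18% churn rate is borrowed from the summary table, and this is not the themis implementation used later in the recipe.

```r
set.seed(42)

# Simulated outcome with roughly the 18% churn rate seen in the data.
toy <- data.frame(
  churn = rbinom(1000, size = 1, prob = 0.18),
  x     = rnorm(1000)
)

# Row indices of each class.
idx_minority <- which(toy$churn == 1)
idx_majority <- which(toy$churn == 0)

# Resample minority rows with replacement until both classes have equal size.
extra <- sample(idx_minority,
                size = length(idx_majority) - length(idx_minority),
                replace = TRUE)
balanced <- toy[c(idx_majority, idx_minority, extra), ]

print(table(original = toy$churn))
print(table(balanced = balanced$churn))
```

Upsampling should only be applied to the data used for fitting; the testing data keep their natural class shares so that the metrics reflect the real churn rate.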

I create a correlation funnel, which bins the data and computes the correlation of each binned feature with churn. Reading from the top, we see the variables that have the strongest linear relationships with churn. In this case, nb_reengagements and the vol_appels variables are the most strongly related to churn, while sex (sexe) has the weakest linear relationship with churn.

CorrelationFunnel Package

The correlationfunnel package includes a streamlined 3-step process for preparing data and performing visual Correlation Analysis. The visualization produced uncovers insights by elevating high-correlation features and lowering low-correlation features. The shape looks like a funnel (hence the name “Correlation Funnel”), making it very efficient to understand which features are most likely to provide business insights and lend well to a [machine learning model](https://cran.r-project.org/web/packages/correlationfunnel/vignettes/introducing_correlation_funnel.html).

Code
telcom_two %>% 
        binarize(n_bins = 5, thresh_infreq = 0.01, name_infreq = "OTHER", one_hot = TRUE) %>% 
        correlate("flag_resiliation__0") %>% 
        plot_correlation_funnel()

Correlation Funnel
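
The idea behind the funnel can be sketched in a few lines of base R: turn each feature into 0/1 indicators, correlate every indicator with the target, and sort by absolute correlation. The sketch below uses simulated data and a deliberately crude binarization (a single cut per feature), so it illustrates the principle only and is not the correlationfunnel implementation; the variable noise_feature and the effect sizes are made up.

```r
set.seed(1)

# Simulated data: churn is made more likely when there was no re-engagement.
n <- 2000
nb_reengagements <- rpois(n, lambda = 0.6)
noise_feature    <- rnorm(n)
churn <- rbinom(n, 1, prob = ifelse(nb_reengagements == 0, 0.30, 0.08))

# Step 1: binarize each feature into a 0/1 indicator.
features <- data.frame(
  reengaged  = as.integer(nb_reengagements > 0),
  noise_high = as.integer(noise_feature > median(noise_feature))
)

# Step 2: correlate every indicator with the target.
cors <- sapply(features, function(col) cor(col, churn))

# Step 3: order by absolute correlation; the top of the funnel comes first.
funnel <- sort(abs(cors), decreasing = TRUE)

print(round(cors, 3))
print(names(funnel))
```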

5 Modeling

To run the machine learning models, I start by creating a recipe that I use throughout the analysis.

Code
## Training data: convert the outcome to a factor for classification ----
training_data <- telcom_two %>% 
        mutate(flag_resiliation = factor(flag_resiliation))

## Final validation data ----
validation_data <- telcom_one_val %>% 
        mutate(
                date_naissance = dmy(date_naissance),
                date_activation = dmy(date_activation),
                date_fin_engagement = dmy(date_fin_engagement),
                date_dernier_reengagement = dmy(date_dernier_reengagement)
        ) %>% 
        mutate(
                age = as.numeric(today() - date_naissance),
                duration = as.numeric(today() - date_activation)
        ) %>% 
        mutate(committed = case_when(
              date_fin_engagement <= as.Date("2022-12-31") ~ "Committed",
              .default = "Not Committed"
        )) %>% 
        dplyr::select(-starts_with("date"),
               -code_postal,
               -id_client) %>% 
        mice(seed = 234, printFlag = FALSE) %>% 
        complete()


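## Create a recipe ----
## step_upsample() needs the factor column to balance, so the outcome is
## named explicitly; the imputation step names its columns as well.
my_recipe <- recipe(flag_resiliation ~ ., data = training_data) %>% 
        step_dummy(all_nominal_predictors()) %>% 
        step_upsample(flag_resiliation, over_ratio = 1) %>% 
        step_impute_knn(all_predictors()) %>% 
        prep(training = NULL)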

# ## Apply all the steps to training data ----
# training_data <- my_recipe %>% 
#           bake(new_data = training_data)


## Apply the recipe steps to the validation data
validation_data <- my_recipe %>% 
         bake(new_data = validation_data)

## Create a workflow
telcom_wf <- workflow() %>% 
        add_recipe(my_recipe)

6 Training and testing sets

I split the labelled data (the file that records whether each customer churned) into a training set (75%) and a testing set (25%), stratified by the churn flag. I use the testing set to compare the models before choosing the final one, and then use the chosen model to select the clients at risk of churn.

Code
split_object <- training_data %>% 
        initial_split(prop = 0.75, 
                      strata = flag_resiliation)

train_set <- split_object %>% 
        training()

test_set <- split_object %>% 
        testing()
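
Stratifying the split keeps the churn share nearly the same in both pieces. A minimal base R sketch of that idea, on simulated data, is shown below; it samples 75% of the rows within each outcome class and is an illustration of the principle rather than a reproduction of what rsample does internally.

```r
set.seed(123)

# Simulated outcome with about 18% positives, as in the training file.
toy <- data.frame(id = 1:1000,
                  churn = factor(rbinom(1000, 1, 0.18)))

# Within each outcome class, draw 75% of the row indices for training.
rows_by_class <- split(seq_len(nrow(toy)), toy$churn)
train_idx <- unlist(lapply(rows_by_class, function(rows) {
  sample(rows, size = floor(0.75 * length(rows)))
}))

train_toy <- toy[train_idx, ]
test_toy  <- toy[-train_idx, ]

# Class shares should be nearly identical in both pieces.
print(round(prop.table(table(train_toy$churn)), 3))
print(round(prop.table(table(test_toy$churn)), 3))
```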

6.1 Null model

The null model, which is our baseline for the analysis, assumes that we pick the target clients at random. The model has no discriminatory power: its ROC curve lies on the diagonal, which corresponds to an AUC of 0.5 and is equivalent to guessing which clients to contact. In the remainder of the analysis, we explore models that do better than the null model.

Code
null_spec <- null_model() %>% 
        set_engine('parsnip') %>% 
        set_mode('classification')

## Run the model 
my_null <- telcom_wf %>% 
        add_model(null_spec) %>% 
        fit(data = train_set)

## Predict on test set
null_output <- my_null %>% 
        augment(new_data= test_set) %>% 
        conf_mat(truth = flag_resiliation,
                 estimate = .pred_class)

## ROC curve for the null model, shown next to its confusion matrix
(my_null %>% 
        augment(new_data = test_set) %>% 
        roc_curve(truth = flag_resiliation,
                 .pred_0) %>% 
        autoplot() +

## Confusion matrix 
null_output %>% 
        autoplot())

Null model summary
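
Why a constant prediction gives an AUC of exactly 0.5 can be checked with a rank-based AUC, which is the probability that a randomly chosen positive case scores higher than a randomly chosen negative one, with ties counted as one half. The sketch below is a base R illustration on simulated data; the function name auc_rank and the "informative" score are hypothetical, and the calculation is not the yardstick code used above.

```r
# Rank-based AUC (Mann-Whitney form); average ranks handle tied scores.
auc_rank <- function(truth, score) {
  r     <- rank(score)
  n_pos <- sum(truth == 1)
  n_neg <- sum(truth == 0)
  (sum(r[truth == 1]) - n_pos * (n_pos + 1) / 2) / (n_pos * n_neg)
}

set.seed(7)
truth <- rbinom(500, 1, 0.18)

# A null model gives every customer the same score.
constant_score <- rep(mean(truth), length(truth))

# A hypothetical informative score, shifted upwards for churners.
informative_score <- rnorm(500) + 1.5 * truth

print(auc_rank(truth, constant_score))
print(auc_rank(truth, informative_score))
```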

6.2 Logistic regression model

Logistic Regression is a statistical model widely used in machine learning for binary and multi-class classification tasks. Despite its name, logistic regression is used for classification, not regression: it models the probability that an observation belongs to a particular class based on one or more predictor variables. The logistic function, also known as the sigmoid function, squashes the linear predictor into the range (0, 1). The model is useful mainly due to its simplicity and interpretability. The ROC curve below shows that the model does better than the null model. The summary table in Section 7 shows the metrics for the model for use in comparing it with the other models (Boateng and Abaye 2019; Das 2021).

Code
## Set up the model
logit_model <- logistic_reg() %>% 
        set_engine("glm") %>% 
        set_mode("classification")

## Run the model 
my_logit <- telcom_wf %>% 
        add_model(logit_model) %>% 
        fit(data = train_set)

## Predict on test set
logit_output <- my_logit %>% 
        augment(new_data= test_set) %>% 
        conf_mat(truth = flag_resiliation,
                 estimate = .pred_class)

(logit_output %>% 
        autoplot() + 

## ROC AUC for logit model.
my_logit %>% 
        augment(new_data= test_set) %>% 
        roc_curve(truth = flag_resiliation,
                 .pred_0) %>% 
        autoplot())

Logistic model summary
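
The role of the sigmoid can be made concrete with base R's glm(). In the minimal sketch below, the data are simulated and the true coefficients (-1.5 and 2) are made up; it shows that the fitted probabilities are simply the sigmoid applied to the fitted linear predictor.

```r
# The logistic (sigmoid) function maps any real number into (0, 1).
sigmoid <- function(z) 1 / (1 + exp(-z))

print(sigmoid(c(-4, 0, 4)))

set.seed(11)

# Simulated predictor and outcome: the true log-odds are -1.5 + 2 * x.
x <- rnorm(2000)
y <- rbinom(2000, 1, sigmoid(-1.5 + 2 * x))

# Fit a logistic regression with base R's glm().
fit <- glm(y ~ x, family = binomial())

# Predicted probabilities are the sigmoid of the fitted linear predictor.
p_manual  <- sigmoid(coef(fit)[1] + coef(fit)[2] * x)
p_predict <- predict(fit, type = "response")

print(round(coef(fit), 2))
print(all.equal(unname(p_manual), unname(p_predict)))
```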

6.3 Decision tree model

A Decision Tree model is a popular machine learning algorithm known for its simplicity, interpretability, and effectiveness in both classification and regression tasks. It mimics human decision-making by recursively partitioning the data based on features, ultimately leading to a set of rules that guide predictions. The model is structured as a tree, where each internal node represents a decision based on a specific feature, and each leaf node represents the predicted outcome. At each decision node, the algorithm selects the feature that best splits the data into homogeneous subsets, optimizing a chosen criterion (e.g., Gini impurity for classification, mean squared error for regression). The goal is to reduce uncertainty or impurity in the data at each step. The decision tree is much weaker than the logistic model going by the AUC. The summary table in Section 7 has more metrics.

Code
decision_spec <- decision_tree() %>% 
        set_engine("rpart") %>% 
        set_mode("classification")

my_decision <- telcom_wf %>% 
        add_model(decision_spec) %>% 
        fit(data = train_set)


## Predict on test set
decision_output <- my_decision %>% 
        augment(new_data= test_set) %>% 
        conf_mat(truth = flag_resiliation,
                 estimate = .pred_class)

(decision_output %>% 
        autoplot() + 

## ROC AUC for decision tree model.
my_decision %>% 
        augment(new_data= test_set) %>% 
        roc_curve(truth = flag_resiliation,
                 .pred_0) %>% 
        autoplot())

AUC: Decision tree model
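
The split criterion described above can be sketched directly. The base R example below computes the Gini impurity of a node and searches the candidate cutoffs of one feature for the split with the lowest weighted impurity. The toy values are made up, the helper names gini and split_impurity are hypothetical, and the search is a simplified illustration rather than the rpart algorithm.

```r
# Gini impurity of a vector of class labels: 1 minus the sum of squared shares.
gini <- function(labels) {
  p <- table(labels) / length(labels)
  1 - sum(p^2)
}

# Weighted impurity of the two child nodes produced by splitting x at a cutoff.
split_impurity <- function(x, labels, cutoff) {
  left  <- labels[x <= cutoff]
  right <- labels[x >  cutoff]
  (length(left) * gini(left) + length(right) * gini(right)) / length(labels)
}

# Toy data: customers with no re-engagement churn more often (made-up values).
nb_reengagements <- c(0, 0, 0, 0, 1, 1, 2, 2, 3, 1)
churn            <- c(1, 1, 1, 0, 0, 0, 0, 0, 0, 1)

# Candidate cutoffs: every observed value except the largest one.
cutoffs    <- head(sort(unique(nb_reengagements)), -1)
impurities <- sapply(cutoffs, function(k) split_impurity(nb_reengagements, churn, k))

print(round(gini(churn), 3))
print(data.frame(cutoff = cutoffs, impurity = round(impurities, 3)))
print(cutoffs[which.min(impurities)])
```

A tree repeats this search over all features at every node and stops when further splits no longer reduce impurity enough.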

6.4 Random forest model

A Random Forest model is an ensemble learning technique that builds a multitude of decision trees during training and outputs the average prediction (for regression tasks) or the majority vote (for classification tasks) of the individual trees. Random Forests are particularly effective due to their ability to mitigate overfitting and improve the generalization of predictions. A Random Forest consists of an ensemble of decision trees, where each tree is trained independently on a random subset of the training data. For each tree in the ensemble, a random sample is drawn from the training dataset with replacement. This process, known as bootstrap sampling, creates diverse subsets of data for each tree. The random forest appears strong. Being an ensemble algorithm, it is also less affected by over-fitting than a single tree. It also detects cases of consumer churn better than the logistic and decision tree models, going by the confusion matrix (specificity of 0.48 against 0.44 and 0.39 in the summary table in Section 7) (Lantz 2019).

Code
rf_model <- rand_forest() %>% 
        set_engine("ranger") %>% 
        set_mode("classification")

set.seed(134)
my_rf <- telcom_wf %>% 
        add_model(rf_model) %>% 
        fit(data = train_set)

rf_output <- my_rf %>% 
        augment(new_data= test_set) %>% 
        conf_mat(truth = flag_resiliation,
                 estimate = .pred_class)

(rf_output %>% 
        autoplot() + 

## ROC AUC for random forest model.
my_rf %>% 
        augment(new_data= test_set) %>% 
        roc_curve(truth = flag_resiliation,
                 .pred_0) %>% 
        autoplot())

AUC for Random Forest Model
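
The majority vote mentioned above is easy to show in isolation. In the base R sketch below, the matrix of tree predictions is entirely made up; in a real forest each column would come from a tree fitted to its own bootstrap sample.

```r
# Hypothetical class predictions from five trees (columns) for four customers
# (rows); 1 = churn, 0 = stay. The values are made up for illustration.
tree_votes <- matrix(
  c(1, 1, 0, 1, 0,
    0, 0, 0, 1, 0,
    1, 0, 1, 1, 1,
    0, 1, 0, 0, 1),
  nrow = 4, byrow = TRUE
)

# Share of trees voting "churn" for each customer.
churn_share <- rowMeans(tree_votes)

# Majority vote: predict churn when more than half of the trees say so.
forest_class <- as.integer(churn_share > 0.5)

print(data.frame(churn_share, forest_class))
```

The vote share can also be read as a rough churn score, which is useful when customers have to be ranked rather than just classified.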

6.5 Bagged decision tree

Bagged Trees, short for Bootstrap Aggregating Trees or Bagging, is an ensemble learning method that combines the predictive power of multiple decision trees. This technique is particularly effective in reducing overfitting and improving the stability and accuracy of predictions. Bagged Trees employs bootstrap sampling, which involves randomly selecting subsets of the original data with replacement. This results in the creation of multiple bootstrap samples, each of which may contain some instances more than once and omit others. Like the random forest model, the bagged tree does well, and its confusion matrix shows the strongest detection of churn cases among the models (specificity of 0.54 in the summary table in Section 7).

Code
bag_spec <- bag_tree() %>% 
  set_engine("rpart") %>% 
        set_mode('classification')

my_bag <- telcom_wf %>% 
        add_model(bag_spec) %>% 
        fit(data = train_set)

bag_output <- my_bag %>% 
        augment(new_data= test_set) %>% 
        conf_mat(truth = flag_resiliation,
                 estimate = .pred_class)

## Confusion matrix for the bagged tree model
(bag_output %>% 
        autoplot() + 

## ROC AUC for bagged decision tree model.
my_bag %>% 
        augment(new_data= test_set) %>% 
        roc_curve(truth = flag_resiliation,
                 .pred_0) %>% 
        autoplot())

AUC for Bagged Trees Model
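
The "duplicate and missing instances" property of a bootstrap sample can be verified numerically. The base R sketch below draws one bootstrap sample of row indices (the sample size of 10,000 is arbitrary) and measures how many distinct rows it contains; in expectation the share approaches 1 - exp(-1), about 0.632.

```r
set.seed(2024)

n <- 10000
row_ids <- seq_len(n)

# One bootstrap sample: n draws with replacement from the n original rows.
boot_ids <- sample(row_ids, size = n, replace = TRUE)

# Rows drawn more than once are duplicates; rows never drawn are "out of bag".
share_in_bag     <- length(unique(boot_ids)) / n
share_out_of_bag <- 1 - share_in_bag

print(round(c(in_bag = share_in_bag, out_of_bag = share_out_of_bag), 3))
```

Because each tree sees a different two-thirds or so of the rows, the trees disagree in useful ways, and averaging them reduces variance.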

6.6 Extreme gradient boosting (XGboost)

XGBoost is an ensemble learning method that builds a strong predictive model by combining the outputs of multiple weak models, often decision trees. It belongs to the family of boosting algorithms, which iteratively improve the model’s performance by focusing on previously mis-classified data. XGBoost is particularly effective for regression and classification tasks, offering high predictive accuracy and robustness against overfitting. The XGBoost model is better than the null model going by the AUC and is fairly accurate in detecting cases of churn (accuracy of 0.89 and specificity of 0.53 in the summary table in Section 7).

Code
xg_spec <- boost_tree() %>% 
        set_engine('xgboost') %>% 
        set_mode('classification')

## Fit on train_set only, so that test_set stays unseen during training
my_xg <- telcom_wf %>% 
        add_model(xg_spec) %>% 
        fit(data = train_set)

xg_output <- my_xg %>% 
        augment(new_data= test_set) %>% 
        conf_mat(truth = flag_resiliation,
                 estimate = .pred_class)

## Confusion matrix for the xg model
(xg_output %>% 
        autoplot() + 

## ROC AUC for xg model.
my_xg %>% 
        augment(new_data= test_set) %>% 
        roc_curve(truth = flag_resiliation,
                 .pred_0) %>% 
        autoplot())

AUC for XGBoost Model
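
The boosting principle, in which each new weak learner is fitted to what the current ensemble still gets wrong, can be sketched with one-split trees ("stumps") and squared-error loss. The base R example below is a deliberately simplified illustration on simulated data: the helper names, the learning rate and the number of rounds are assumptions, and the code omits the regularization and second-order terms that XGBoost adds.

```r
set.seed(99)

# Simulated regression data (made up): a noisy step-shaped signal.
x <- runif(300)
y <- ifelse(x > 0.5, 2, 0) + ifelse(x > 0.8, 1, 0) + rnorm(300, sd = 0.3)

# Fit a one-split tree (a "stump") to the residuals r by least squares.
fit_stump <- function(x, r) {
  cuts <- sort(x)[-length(x)]          # every cut leaves both sides non-empty
  sse  <- sapply(cuts, function(k) {
    sum((r - ifelse(x <= k, mean(r[x <= k]), mean(r[x > k])))^2)
  })
  k <- cuts[which.min(sse)]
  list(cut = k, left = mean(r[x <= k]), right = mean(r[x > k]))
}

predict_stump <- function(stump, x) ifelse(x <= stump$cut, stump$left, stump$right)

# Boosting loop: each stump is fitted to the current residuals and added
# to the ensemble with a small step size (the learning rate).
learn_rate <- 0.3
n_rounds   <- 50
pred       <- rep(mean(y), length(y))
mse_path   <- numeric(n_rounds)

for (m in seq_len(n_rounds)) {
  residuals   <- y - pred
  stump       <- fit_stump(x, residuals)
  pred        <- pred + learn_rate * predict_stump(stump, x)
  mse_path[m] <- mean((y - pred)^2)
}

print(round(mse_path[c(1, 10, 50)], 3))
```

The training error falls with every round, which is also why boosted models need a held-out set to decide when to stop.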

6.7 Lasso regression

The Lasso model, implemented through the glmnet package in R and integrated with the tidymodels framework, is a statistical method used for both variable selection and regularization in regression models. Here it is applied to a logistic regression.

The Lasso model, short for Least Absolute Shrinkage and Selection Operator, is a regression technique that introduces a penalty term to the regression equation. This penalty, based on the absolute values of the regression coefficients, encourages sparsity by shrinking some coefficients to exactly zero. This characteristic makes Lasso particularly useful for variable selection in high-dimensional datasets. In setting up the model in R, we set mixture to one. The confusion matrix shows that the lasso model is the weakest of the models in the summary table in Section 7 at detecting churn cases (specificity of 0.32).

Code
glmnet_spec <- logistic_reg(
        
        penalty = 0.01,
        mixture = 1
) %>% 
        set_engine('glmnet') %>% 
        set_mode('classification')

## Fit on train_set only, so that test_set stays unseen during training
my_lasso <- telcom_wf %>% 
        add_model(glmnet_spec) %>% 
        fit(data = train_set)

lasso_output <- my_lasso %>% 
        augment(new_data = test_set) %>% 
        conf_mat(truth = flag_resiliation,
                 estimate = .pred_class)

## Confusion matrix for the lasso model
(lasso_output %>% 
        autoplot() + 

## ROC AUC for lasso model.
my_lasso %>% 
        augment(new_data= test_set) %>% 
        roc_curve(truth = flag_resiliation,
                 .pred_0) %>% 
        autoplot())

AUC for LASSO Model
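
How the absolute-value penalty pushes coefficients to exactly zero can be seen in the simplest setting: a linear model with standardized, uncorrelated predictors, where the lasso solution is the soft-thresholded unpenalized estimate. The base R sketch below uses made-up coefficients and is an illustration of that special case, not of the full glmnet fit for logistic regression.

```r
# Soft-thresholding: shrink each estimate towards zero by lambda and set it
# to exactly zero when its absolute value is below lambda.
soft_threshold <- function(b, lambda) sign(b) * pmax(abs(b) - lambda, 0)

# Hypothetical unpenalized coefficients (made-up values).
b_ols <- c(strong = 1.20, moderate = -0.40, weak = 0.05, tiny = -0.008)

print(rbind(
  unpenalized = b_ols,
  lasso_0.01  = soft_threshold(b_ols, 0.01),
  lasso_0.10  = soft_threshold(b_ols, 0.10)
))
```

With the larger penalty, the two small coefficients drop out entirely, which is the variable-selection behaviour described above.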

6.8 Ridge Model

The Ridge model, like the Lasso model, is a regularization technique commonly used in regression. Both Ridge and Lasso aim to address issues like multi-collinearity and over-fitting by introducing penalty terms, but they differ in the nature of the penalties applied to the regression coefficients.

The Ridge model, also known as Tikhonov regularization, introduces a penalty term to the regression equation based on the sum of the squared values of the regression coefficients. This penalty discourages large coefficients and helps to stabilize the model by preventing coefficients from becoming too extreme. Unlike the lasso, it shrinks coefficients towards zero without setting them exactly to zero. Here, we set the mixture to zero. Like the lasso model, the ridge model is comparatively weak at detecting churn cases, going by its confusion matrix.

Code
ridge_spec <- logistic_reg(
        
        penalty = 0.01,
        mixture = 0
) %>% 
        set_engine('glmnet') %>% 
        set_mode('classification')

## Fit on train_set only, so that test_set stays unseen during training
my_ridge <- telcom_wf %>% 
        add_model(ridge_spec) %>% 
        fit(data = train_set)

ridge_output <- my_ridge %>% 
        augment(new_data = test_set) %>% 
        conf_mat(truth = flag_resiliation,
                 estimate = .pred_class)

## Confusion matrix for the ridge model
(ridge_output %>% 
        autoplot() + 

## ROC AUC for ridge model.
my_ridge %>% 
        augment(new_data= test_set) %>% 
        roc_curve(truth = flag_resiliation,
                 .pred_0) %>% 
        autoplot())

AUC for RIDGE Model
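
For a linear model with centred data, the ridge estimate has a closed form, which makes the shrinkage easy to inspect. The base R sketch below uses simulated, strongly correlated predictors; the helper ridge_coef and the penalty values are hypothetical, and glmnet scales its penalty differently, so the numbers are not comparable with the fit above.

```r
set.seed(5)

# Simulated data with two strongly correlated predictors (made-up values).
n  <- 200
x1 <- rnorm(n)
x2 <- x1 + rnorm(n, sd = 0.1)
X  <- scale(cbind(x1, x2))
y  <- 1 + 2 * x1 + rnorm(n)
y  <- y - mean(y)

# Ridge estimate for centred data: solve (X'X + lambda * I) beta = X'y.
ridge_coef <- function(X, y, lambda) {
  drop(solve(crossprod(X) + lambda * diag(ncol(X)), crossprod(X, y)))
}

lambdas <- c(0, 1, 10, 100)
coefs   <- t(sapply(lambdas, function(l) ridge_coef(X, y, l)))
rownames(coefs) <- paste0("lambda_", lambdas)

# Coefficients and their overall size (Euclidean norm) for each penalty.
coef_norms <- sqrt(rowSums(coefs^2))
print(round(coefs, 3))
print(round(coef_norms, 3))
```

The coefficient norm falls as the penalty grows, but no coefficient is set exactly to zero, in contrast to the lasso.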

7 Model Metrics

Here is a summary of model metrics, computed on the testing set. The ridge model is not part of the joined table.

Code
null_output %>% 
        summary() %>% 
        rename(Null_model = .estimate) %>% 
        left_join(
                logit_output %>% 
                        summary(),
                by = c(".metric", ".estimator")
        ) %>% rename(Logit_model = .estimate) %>% 
        left_join(
                decision_output %>% 
                        summary() %>% 
                        rename(DecisionTree = .estimate),
                
                by = c(".metric", ".estimator")        
        ) %>% 
        left_join(
                rf_output %>% 
                        summary() %>% 
                        rename(RandomForest = .estimate),
                
                by = c(".metric", ".estimator")        
        ) %>% 
        
        left_join(
                bag_output %>% 
                        summary() %>% 
                        rename(BaggedTree = .estimate),
                
                by = c(".metric", ".estimator")        
        )  %>% 
        left_join(
                xg_output %>% 
                        summary() %>% 
                        rename(XGBoost = .estimate),
                
                by = c(".metric", ".estimator")        
        ) %>% 
        left_join(
                lasso_output %>% 
                        summary() %>% 
                        rename(Lasso = .estimate),
                
                by = c(".metric", ".estimator")        
        ) %>% 
        dplyr::select(-.estimator) %>% 
        kbl(caption = "Summary of the Models",
            booktabs = TRUE) %>% 
        kable_classic(full_width = TRUE)
Summary of the Models
.metric Null_model Logit_model DecisionTree RandomForest BaggedTree XGBoost Lasso
accuracy 0.82 0.86 0.86 0.88 0.88 0.89 0.85
kap 0.00 0.45 0.42 0.53 0.54 0.56 0.37
sens 1.00 0.95 0.96 0.97 0.95 0.96 0.97
spec 0.00 0.44 0.39 0.48 0.54 0.53 0.32
ppv 0.82 0.89 0.88 0.89 0.90 0.90 0.87
npv NaN 0.66 0.67 0.78 0.72 0.77 0.70
mcc NA 0.46 0.44 0.55 0.55 0.58 0.41
j_index 0.00 0.39 0.35 0.45 0.49 0.50 0.29
bal_accuracy 0.50 0.70 0.67 0.72 0.75 0.75 0.65
detection_prevalence 1.00 0.88 0.90 0.89 0.86 0.87 0.92
precision 0.82 0.89 0.88 0.89 0.90 0.90 0.87
recall 1.00 0.95 0.96 0.97 0.95 0.96 0.97
f_meas 0.90 0.92 0.92 0.93 0.93 0.93 0.91

Model Summaries

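The bagged tree has the highest specificity (0.54) and, together with XGBoost, the highest balanced accuracy (0.75), while its sensitivity (0.95) is close to that of the other models. Because the metrics treat the first outcome level (0, no churn) as the event, specificity here is the share of actual churners that the model correctly flags, which is the quantity that matters most for the campaign. I therefore choose the bagged tree.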
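
How these metrics follow from a confusion matrix, and why specificity is the churn-detection rate when "0" is the event class, can be checked by hand. The counts in the base R sketch below are made up for illustration and do not correspond to any model in the table.

```r
# Hypothetical confusion-matrix counts (made up), with "0" = stayed treated
# as the event class, i.e. the first level of the outcome factor.
counts <- c(
  stay_pred_stay   = 900,  # actual 0, predicted 0
  stay_pred_churn  = 100,  # actual 0, predicted 1
  churn_pred_stay  = 120,  # actual 1, predicted 0
  churn_pred_churn = 80    # actual 1, predicted 1
)

# Sensitivity: share of actual stayers (event class) predicted to stay.
sens <- counts[["stay_pred_stay"]] /
  (counts[["stay_pred_stay"]] + counts[["stay_pred_churn"]])

# Specificity: share of actual churners predicted to churn.
spec <- counts[["churn_pred_churn"]] /
  (counts[["churn_pred_stay"]] + counts[["churn_pred_churn"]])

accuracy     <- (counts[["stay_pred_stay"]] + counts[["churn_pred_churn"]]) / sum(counts)
bal_accuracy <- (sens + spec) / 2
j_index      <- sens + spec - 1

print(round(c(sens = sens, spec = spec, accuracy = accuracy,
              bal_accuracy = bal_accuracy, j_index = j_index), 3))
```

In this toy case accuracy looks respectable while only 40% of churners are caught, which is why accuracy alone is a poor guide on imbalanced data.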

8 Target Clients for Anti-churn Promotions

Using the bagged tree model, I generate the list of clients to be reached for the anti-churn promotion and save it in CSV format (C5_Name.txt). The model flags 2740 clients as likely to churn, which is above the 2,000-contact budget.

Code
final_test_data <- telcom_one_val %>% 
        mutate(
                date_naissance = dmy(date_naissance),
                date_activation = dmy(date_activation),
                date_fin_engagement = dmy(date_fin_engagement),
                date_dernier_reengagement = dmy(date_dernier_reengagement)
        ) %>% 
        mutate(
                age = as.numeric(today() - date_naissance),
                duration = as.numeric(today() - date_activation)
        ) %>% 
        mutate(committed = case_when(
              date_fin_engagement <= as.Date("2022-12-31") ~ "Committed",
              .default = "Not Committed"
        )) %>% 
        dplyr::select(-starts_with("date"),
               -code_postal,
               -id_client) %>% 
        mice(seed = 234, printFlag = FALSE) %>% 
        complete()


cvcontrol <- trainControl(method="repeatedcv", 
                          number = 5,
                          allowParallel=TRUE)

train.bagg <- train(factor(flag_resiliation) ~ .,
                   data=training_data,
                   method="treebag",
                   trControl=cvcontrol,
                   importance=TRUE)

telcom_one_val %>% 
        bind_cols(pred = predict(train.bagg, final_test_data)) %>% 
        dplyr::filter(pred == 1) %>% 
        dplyr::select(id_client) %>% 
        write_csv("C5_Name.txt")
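
One way to respect the 2,000-contact budget is to rank clients by predicted churn probability and keep only the top 2,000. The base R sketch below illustrates this on simulated scores; the client ids and probabilities are made up, and the step of obtaining probabilities from the fitted model is assumed rather than shown.

```r
set.seed(8)

# Simulated scoring table: hypothetical client ids and churn probabilities.
scores <- data.frame(
  id_client  = sprintf("C%05d", 1:22528),
  prob_churn = rbeta(22528, shape1 = 1, shape2 = 4)
)

budget <- 2000

# Rank clients from highest to lowest predicted risk and keep the top 2,000.
ranked  <- scores[order(scores$prob_churn, decreasing = TRUE), ]
targets <- head(ranked, budget)

# Implied probability cutoff: the lowest score that still makes the list.
cutoff <- min(targets$prob_churn)

print(nrow(targets))
print(round(cutoff, 3))
```

With a caret model, class probabilities can usually be requested through predict() with type = "prob", after which the same ranking applies.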

9 Conclusion

In conclusion, this analysis set out to optimize the anti-churn campaign of a telecommunications company using machine learning. Six models (Decision Tree, Random Forest, Bagged Decision Tree, Extreme Gradient Boosting, Ridge, and Lasso) were fitted and compared against a null model and a logistic regression baseline, with the aim of identifying about 2,000 consumers for urgent contact. The bagged decision tree, chosen for its ability to detect churners on the testing set, flagged a total of 2740 clients in the March 2023 data, offering targeted input that has the potential to enhance the efficacy of the anti-churn campaign. The results underline the role of machine learning in customer churn prediction and strategic intervention within the telecommunications sector, and the value of analytics for informed decision-making in this setting.

References

Boateng, Ernest Yeboah, and Daniel A Abaye. 2019. “A Review of the Logistic Regression Model with Emphasis on Medical Research.” Journal of Data Analysis and Information Processing 7 (4): 190–207.
Das, Abhik. 2021. “Logistic Regression.” In Encyclopedia of Quality of Life and Well-Being Research, 1–2. Springer.
Lantz, Brett. 2019. Machine Learning with R: Expert Techniques for Predictive Modeling. Packt Publishing Ltd.