We are collaborating with a company called Regork, which has recently launched telecommunication products such as internet services, streaming platforms, and phone services. The CEO of Regork has assigned us the task of creating models to predict potential customer churn. Our focus is on analyzing customer retention by considering a variety of relevant variables.
We tackled these challenges by performing a more in-depth analysis of the data set. Our goal was to identify trends among customers who continued using the service versus those who decided to leave.
Our primary focus turned to internet service, given its widespread usage. We observed that a significant portion of those who discontinued the service were on fiber optics.
Next, we examined the lengths of customers’ contracts. We believe that promoting a reduced price for longer commitment periods could encourage more customers to remain with us.
Finally, we analyzed customer tenure, noting a clear trend in the number of individuals who stayed compared to those who left. We think that providing enhanced offers for those with longer tenure could be a crucial incentive for retention.
The core of our solution will involve creating enticing offers to encourage individuals to stay with Regork.
We will provide package deals that combine specific products, such as offering access to multiple streaming services at a reduced bundle price.
These are the packages required to conduct our analysis.
library(ggplot2)
library(tidymodels)
library(tidyverse)
library(baguette)
library(vip)
library(kernlab)
library(rpart.plot)
library(pdp)
library(ranger)
library(readr)
cr <- read_csv("data/customer_retention.csv")
variables <- ls(cr)
cr <- mutate(cr, Status = factor(Status)) %>%
na.omit()
cr %>%
ggplot(aes(x = InternetService)) +
geom_bar(fill = 'skyblue', size = 1.5) + facet_wrap(~ Status) +
ylab('# of Consumers') +
xlab('Mode of Service') +
ggtitle('Customer Retention with Different Internet Services')
cr %>%
ggplot(aes(Tenure)) +
geom_histogram(fill = 'aquamarine', size = .3, color = "black") +
facet_wrap(~ Status) +
ylab('# of Consumers') +
xlab('Time in Months') +
ggtitle('Consumer Retention with Tenure')
cr %>%
ggplot(aes(Contract)) +
geom_bar(width = .8, fill = 'pink') +
facet_wrap(~ Status) +
ylab('# of Consumers') +
xlab('Length of Contract') +
ggtitle('Consumer Retention relating to Contracts')
First, we have established test and training sets that will be used for our upcoming prediction models:
set.seed(123)
cr_split <- initial_split(cr, prop = 0.7, strata = Status)
cr_train <- training(cr_split)
cr_test <- testing(cr_split)
cr_recipe <- recipe(Status ~ ., data = cr_train)
In our initial approach, we opted to implement a simple logistic regression model, which yielded an AUC of 0.844—an encouraging outcome. Nonetheless, I believe there is scope for enhancement through the application of more sophisticated models.
set.seed(123)
kfold <- vfold_cv(cr_train, v = 5)
results <- logistic_reg() %>%
fit_resamples(Status ~ ., kfold)
collect_metrics(results)
## # A tibble: 3 × 6
## .metric .estimator mean n std_err .config
## <chr> <chr> <dbl> <int> <dbl> <chr>
## 1 accuracy binary 0.797 5 0.00705 Preprocessor1_Model1
## 2 brier_class binary 0.137 5 0.00374 Preprocessor1_Model1
## 3 roc_auc binary 0.844 5 0.00974 Preprocessor1_Model1
For our subsequent model, we opted to employ a random forest approach. We believed that implementing hyperparameter tuning would enhance the precision of our results. Ultimately, we achieved an AUC of 0.835, which is slightly lower in accuracy compared to our logistic regression model.
set.seed(123)
kfold <- vfold_cv(cr_train, v = 5)
results <- logistic_reg() %>%
fit_resamples(Status ~ ., kfold)
collect_metrics(results)
## # A tibble: 3 × 6
## .metric .estimator mean n std_err .config
## <chr> <chr> <dbl> <int> <dbl> <chr>
## 1 accuracy binary 0.797 5 0.00705 Preprocessor1_Model1
## 2 brier_class binary 0.137 5 0.00374 Preprocessor1_Model1
## 3 roc_auc binary 0.844 5 0.00974 Preprocessor1_Model1
In accordance with our third model’s findings, the inaugural row yields the most favorable outcomes, achieving an AUC of 0.814. However, this performance falls short of that exhibited by both the logistic regression and random forest models, positioning this particular model as the least optimal among the three.
dt_mod <- decision_tree(
mode = "classification",
cost_complexity = tune(),
tree_depth = tune(),
min_n = tune()
) %>%
set_engine("rpart")
dt_hyper_grid <- grid_regular(
cost_complexity(),
tree_depth(),
min_n(),
levels = 5
)
set.seed(123)
dt_results <- tune_grid(dt_mod, cr_recipe, resamples = kfold, grid = dt_hyper_grid)
print(dt_results)
## # Tuning results
## # 5-fold cross-validation
## # A tibble: 5 × 4
## splits id .metrics .notes
## <list> <chr> <list> <list>
## 1 <split [3912/979]> Fold1 <tibble [375 × 7]> <tibble [0 × 3]>
## 2 <split [3913/978]> Fold2 <tibble [375 × 7]> <tibble [0 × 3]>
## 3 <split [3913/978]> Fold3 <tibble [375 × 7]> <tibble [0 × 3]>
## 4 <split [3913/978]> Fold4 <tibble [375 × 7]> <tibble [0 × 3]>
## 5 <split [3913/978]> Fold5 <tibble [375 × 7]> <tibble [0 × 3]>
Our Optimal Model ultimately emerged as the logistic regression model, which achieved the highest AUC of 0.844. Armed with this insight, we can now strategically implement offers and refine pricing structures to enhance customer retention.
The following represents the five most significant variables within our model, accompanied by the accuracy derived from our confusion matrix.
final_fit <- logistic_reg() %>%
fit(Status ~ ., data = cr_test)
tidy(final_fit)
## # A tibble: 31 × 5
## term estimate std.error statistic p.value
## <chr> <dbl> <dbl> <dbl> <dbl>
## 1 (Intercept) 2.17 1.55 1.40 0.160
## 2 GenderMale -0.00317 0.120 -0.0264 0.979
## 3 SeniorCitizen 0.101 0.157 0.643 0.520
## 4 PartnerYes 0.328 0.149 2.20 0.0275
## 5 DependentsYes -0.382 0.166 -2.30 0.0215
## 6 Tenure -0.0511 0.0110 -4.64 0.00000351
## 7 PhoneServiceYes 1.07 1.22 0.877 0.381
## 8 MultipleLinesNo phone service NA NA NA NA
## 9 MultipleLinesYes 0.551 0.336 1.64 0.101
## 10 InternetServiceFiber optic 2.93 1.51 1.94 0.0521
## # ℹ 21 more rows
final_fit %>%
predict(cr_test) %>%
bind_cols(cr_test %>% select(Status)) %>%
conf_mat(truth = Status, estimate = .pred_class)
## Truth
## Prediction Current Left
## Current 1386 243
## Left 154 314
The confusion matrix revealed 154 false positives and 243 false negatives. The occurrence of false negatives indicates a higher number of customers who departed than what our model had initially projected.
The five most influential variables identified were tenure, the presence of two-year contracts, the utilization of paperless billing, familial status, and individuals with spouses.
Based on our exploratory analysis and the insights gained from our machine learning models, we recommend sticking with our original proposed solution of offering package deals with specific products for a lower price.
Through our analysis, we found that fiber-optic customers are much more likely to churn compared to DSL customers or those without internet service. To combat this, we recommend offering a deal that targets those who prefer fiber-optic. For example, we could offer a year long contract that includes 1-3 months of an additional service, such as a streaming subscription or membership to a location of their choosing. This could increase customer retention while also increasing value.
Our data has shown that customers with longer contracts are less likely to turnover. When plans are month-to-month, our customers are much more likely to churn, whereas our customers with 2-year contracts remained with us. We are recommending making our long term contracts mpre appealing to customers, this could be achieved by reducing the costs for longer deals or bundling temporary upgrades like a 3 or 6-month higher internet speed. That may be enough for some customers to bite on a 1 or 2-year contract.
Customer retention improves greatly as customers spend more time with our company. As our recommendation is to encourage our customers to take longer contracts, we can assist with customer tenure by adjusting our offers to look more appealing. After the first contract, our customers are much more likely to become loyal to our brand.