Customer churn is a major concern in the telecommunications industry. Retaining customers is more cost-effective than acquiring new ones. Regork Telecom seeks to predict customer churn to retain more customers and protect revenue.
I used historical customer data and applied machine learning techniques including Logistic Regression, Decision Trees, and Random Forests. I performed cross-validation and hyperparameter tuning to select the best model.
Based on predictive modeling results, I propose a targeted incentive program to retain customers identified as likely to churn. This approach is expected to improve customer retention rates and protect recurring revenue.
library(tidyverse)
library(caret)
library(randomForest)
library(rpart)
library(rpart.plot)
library(pROC)
library(janitor)
library(rsample)

data <- read_csv("customer_retention.csv") %>%
  clean_names() %>%
  drop_na(total_charges) %>%
  mutate(
    gender = as.factor(gender),
    partner = as.factor(partner),
    dependents = as.factor(dependents),
    phone_service = as.factor(phone_service),
    multiple_lines = as.factor(multiple_lines),
    internet_service = as.factor(internet_service),
    online_security = as.factor(online_security),
    online_backup = as.factor(online_backup),
    device_protection = as.factor(device_protection),
    tech_support = as.factor(tech_support),
    streaming_tv = as.factor(streaming_tv),
    streaming_movies = as.factor(streaming_movies),
    contract = as.factor(contract),
    paperless_billing = as.factor(paperless_billing),
    payment_method = as.factor(payment_method),
    status = as.factor(status)
  )
## Rows: 6999 Columns: 20
## ── Column specification ────────────────────────────────────────────────────────
## Delimiter: ","
## chr (16): Gender, Partner, Dependents, PhoneService, MultipleLines, Internet...
## dbl (4): SeniorCitizen, Tenure, MonthlyCharges, TotalCharges
##
## ℹ Use `spec()` to retrieve the full column specification for this data.
## ℹ Specify the column types or set `show_col_types = FALSE` to quiet this message.
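The prediction code later in this report uses objects named `test`, `logit_model`, `tree_model`, and `rf_model`, but the code that creates them is not shown. The following is a minimal sketch of how such objects could be built with `rsample` and `caret`. The 70/30 stratified split, the seed, the 5-fold cross-validation, the tuning grid sizes, and the helper name `fit_churn_models` are all assumptions for illustration, not the settings actually used for the reported results.

```r
library(caret)    # train(), trainControl(), twoClassSummary()
library(rsample)  # initial_split(), training(), testing()

# Hypothetical helper: split the data and fit the three model types named in
# this report. Every setting below is an assumption, not the original config.
fit_churn_models <- function(data, seed = 123) {
  set.seed(seed)  # assumed seed, only so the split is reproducible

  # Assumed 70/30 split, stratified on the outcome column `status`
  split <- initial_split(data, prop = 0.7, strata = status)
  train <- training(split)
  test  <- testing(split)

  # Assumed 5-fold cross-validation, selecting models by ROC AUC
  ctrl <- trainControl(
    method = "cv",
    number = 5,
    classProbs = TRUE,
    summaryFunction = twoClassSummary
  )

  logit_model <- train(status ~ ., data = train, method = "glm",
                       family = "binomial", trControl = ctrl, metric = "ROC")
  # tuneLength asks caret to try several complexity-parameter (cp) values
  tree_model  <- train(status ~ ., data = train, method = "rpart",
                       trControl = ctrl, metric = "ROC", tuneLength = 10)
  # tuneLength asks caret to try several mtry values
  rf_model    <- train(status ~ ., data = train, method = "rf",
                       trControl = ctrl, metric = "ROC", tuneLength = 3)

  list(train = train, test = test, logit_model = logit_model,
       tree_model = tree_model, rf_model = rf_model)
}

# Usage with the `data` object loaded above (commented out so that the
# sketch has no side effects when sourced):
# fits        <- fit_churn_models(data)
# test        <- fits$test
# logit_model <- fits$logit_model
# tree_model  <- fits$tree_model
# rf_model    <- fits$rf_model
```

Because these are `caret` models fitted with `classProbs = TRUE`, `predict(..., type = "prob")` returns one column per class, and column 2 corresponds to `Left` when the factor levels are `Current`, `Left`.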
ggplot(data, aes(x=contract, fill=status)) +
  geom_bar(position="fill", color="black") +
  scale_fill_manual(values=c("Current"="lightblue", "Left"="tomato")) +
  labs(
    title = "Customer Churn by Contract Type",
    subtitle = "Month-to-month customers churn significantly more",
    x = "Contract Type",
    y = "Proportion of Customers",
    fill = "Customer Status"
  ) +
  theme_minimal(base_size = 14)

ggplot(data, aes(x=payment_method, fill=status)) +
  geom_bar(position="fill", color="black") +
  scale_fill_manual(values=c("Current"="lightblue", "Left"="tomato")) +
  labs(
    title = "Churn Rate by Payment Method",
    subtitle = "Electronic check customers churn at higher rates",
    x = "Payment Method",
    y = "Proportion of Customers",
    fill = "Customer Status"
  ) +
  coord_flip() +
  theme_minimal(base_size = 14)

logit_preds <- predict(logit_model, test, type = "prob")[,2]
tree_preds <- predict(tree_model, test, type = "prob")[,2]
rf_preds <- predict(rf_model, test, type = "prob")[,2]
# ROC curves on the test set; "Left" is treated as the positive (case) class
roc_logit <- roc(test$status, logit_preds)
## Setting levels: control = Current, case = Left
## Setting direction: controls < cases
roc_tree <- roc(test$status, tree_preds)
## Setting levels: control = Current, case = Left
## Setting direction: controls < cases
roc_rf <- roc(test$status, rf_preds)
## Setting levels: control = Current, case = Left
## Setting direction: controls < cases
auc(roc_logit)
## Area under the curve: 0.8447
auc(roc_tree)
## Area under the curve: 0.7227
auc(roc_rf)
## Area under the curve: 0.8355
| Model | AUC Score |
|---|---|
| Logistic Regression | 0.8447 |
| Decision Tree | 0.7227 |
| Random Forest | 0.8355 |
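
To make the AUC scores in the table easier to interpret: AUC is the probability that a randomly chosen customer who left receives a higher predicted churn score than a randomly chosen current customer. The sketch below computes it from ranks (the Mann-Whitney formulation) in base R. The function name `auc_by_rank` and the toy scores are made up for illustration and are not taken from the customer data.

```r
# Rank-based AUC: the share of (case, control) pairs in which the case has
# the higher score, with ties counted as half.
auc_by_rank <- function(scores, is_case) {
  r <- rank(scores)          # average ranks, so ties are handled
  n_case <- sum(is_case)
  n_ctrl <- sum(!is_case)
  (sum(r[is_case]) - n_case * (n_case + 1) / 2) / (n_case * n_ctrl)
}

# Toy, made-up scores: 3 customers who left and 4 who stayed
toy_scores <- c(0.90, 0.80, 0.35, 0.60, 0.30, 0.20, 0.10)
toy_left   <- c(TRUE, TRUE, TRUE, FALSE, FALSE, FALSE, FALSE)

# 11 of the 12 (left, stayed) pairs are ordered correctly, so this is 11/12
toy_auc <- auc_by_rank(toy_scores, toy_left)
print(toy_auc)
```

Applied to the report's objects, `auc_by_rank(logit_preds, test$status == "Left")` should agree with the `pROC` result, since `pROC` was run with `Left` as the case class and direction `controls < cases`.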
Without intervention, churn could result in a monthly revenue loss of approximately $22,750.
Offer at-risk customers a $100 bill credit for committing to a 1-year
contract.
- One-time cost: approximately $35,000
- Revenue preserved: approximately $273,000 annually ($22,750 per month × 12)
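
The figures above can be checked with a few lines of arithmetic. This sketch uses only the numbers stated in this report. The implied customer count, the first-year net figure, and the break-even retention rate are derived here under two simplifying assumptions that the report does not state: the program cost consists only of $100 credits, and the credit is paid to every targeted customer whether or not they end up staying.

```r
# Figures stated in the report
monthly_loss <- 22750   # estimated monthly revenue lost to churn, in dollars
credit       <- 100     # bill credit per at-risk customer
program_cost <- 35000   # one-time cost of the incentive

# Derived values (assumptions: cost is credits only, full-year horizon)
annual_preserved  <- monthly_loss * 12               # 273000
implied_customers <- program_cost / credit           # 350 credits
net_first_year    <- annual_preserved - program_cost # 238000

# Hypothetical sensitivity check: first-year net benefit if only a fraction
# of the targeted revenue is actually retained
net_benefit <- function(retention_rate) {
  annual_preserved * retention_rate - program_cost
}

# Retention rate at which the program just pays for itself in year one
break_even_rate <- program_cost / annual_preserved

print(c(annual_preserved = annual_preserved,
        implied_customers = implied_customers,
        net_first_year = net_first_year))
print(round(break_even_rate, 3))
```

Under these assumptions the program breaks even if roughly 13% of the at-risk revenue is retained, which gives a rough sense of how much margin the recommendation has if acceptance is well below 100%.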
This analysis uses only available behavior-based data.
Future work could integrate satisfaction surveys, customer service
interactions, or competitor analysis to enhance predictions.
I recommend immediate implementation of targeted retention
incentives.
Focusing on high-risk groups such as month-to-month and new customers
will maximize return on investment and reduce revenue loss.
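
As one way to turn this recommendation into a target list, the sketch below flags month-to-month customers with short tenure and a high predicted churn probability. The function name `flag_high_risk`, the 0.5 probability cutoff, the 12-month definition of a "new" customer, and the exact level label `"Month-to-month"` are assumptions for illustration. The toy data frame is made up and only mimics the `contract` and `tenure` columns.

```r
# Hypothetical helper: select customers for the retention offer.
# Assumes `customers` has `contract` and `tenure` columns and that
# `churn_prob` holds one predicted probability of leaving per row.
flag_high_risk <- function(customers, churn_prob,
                           prob_cutoff = 0.5, max_tenure = 12) {
  stopifnot(nrow(customers) == length(churn_prob))
  customers$churn_prob <- churn_prob
  keep <- customers$contract == "Month-to-month" &
    customers$tenure <= max_tenure &
    customers$churn_prob >= prob_cutoff
  out <- customers[keep, , drop = FALSE]
  # Highest-risk customers first
  out[order(-out$churn_prob), , drop = FALSE]
}

# Toy, made-up example rows
toy_customers <- data.frame(
  contract = factor(c("Month-to-month", "One year", "Month-to-month",
                      "Month-to-month", "Two year")),
  tenure   = c(3, 24, 40, 8, 60)
)
toy_probs <- c(0.72, 0.65, 0.80, 0.91, 0.05)

targets <- flag_high_risk(toy_customers, toy_probs)
print(targets)
```

With the report's objects, a call such as `flag_high_risk(test, logit_preds)` would produce the corresponding list for the test set. Both cutoffs should be tuned against the incentive budget rather than taken from this sketch.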