With the rapid growth of the telecommunications industry, the service providers are more inclined towards customer base expansion. In order to fulfill the need to thrive in a competitive world, maintaining current customers has become a huge challenge. The cost of attracting a new customer is said to be much higher than that of maintaining the current one. Hence, it is important for the telecom companies to use advanced analytics to consider consumer behavior and forecast in-turn customer interaction as whether they would exit the business or not. This collection of data provides information regarding a telecom company’s level of customers. Specific characteristics are reported for each customer relating to the services used.
We will be performing various steps to arrive at model to predict the future churns in this industry.
Customer churns happen when consumers or subscribers cease business with a company or service. This is also known as consumer turnover or defection. Many service providers, such as telecommunications companies, internet companies, or insurance firms often use customer churn analysis as one of the main business indicators to maintain the customer satisfaction.
Target variable is called Churn - it represents customers who left.
Predictor variables include all the remaining variables.
This the demonstration in R to analyze all relevant customer data and predict Customer churn.
Some possible insights could be -
We have taken the dataset from Kaggle, data source link Customer Churn.
We are importing the Telecom Churn data and converting the response variable to factors.
library(rmarkdown)
library(data.table)
library(ggplot2)
library(GGally)
library(grid)
library(gridExtra)
library(corrplot)
library(tidyverse)
library(tidyr)
library(caret)
library(knitr)
library(mgcv)
library(nnet)
library(NeuralNetTools)
library(e1071)
library(verification)
library(glmnet)
library(dplyr)
library(ROCR)
set.seed(12345)
telecom <- read.csv("~/telecom_churn.csv")
telecom$Churn <- factor(telecom$Churn)
dim(telecom)
## [1] 3333 11
str(telecom)
## 'data.frame': 3333 obs. of 11 variables:
## $ Churn : Factor w/ 2 levels "0","1": 1 1 1 1 1 1 1 1 1 1 ...
## $ AccountWeeks : int 128 107 137 84 75 118 121 147 117 141 ...
## $ ContractRenewal: int 1 1 1 0 0 0 1 0 1 0 ...
## $ DataPlan : int 1 1 0 0 0 0 1 0 0 1 ...
## $ DataUsage : num 2.7 3.7 0 0 0 0 2.03 0 0.19 3.02 ...
## $ CustServCalls : int 1 1 0 2 3 0 3 0 1 0 ...
## $ DayMins : num 265 162 243 299 167 ...
## $ DayCalls : int 110 123 114 71 113 98 88 79 97 84 ...
## $ MonthlyCharge : num 89 82 52 57 41 57 87.3 36 63.9 93.2 ...
## $ OverageFee : num 9.87 9.78 6.06 3.1 7.42 ...
## $ RoamMins : num 10 13.7 12.2 6.6 10.1 6.3 7.5 7.1 8.7 11.2 ...
There are 3333 records of customer and 11 variables.
We will convert variables ContractRenewal and DataPlan to factors as well since it takes only two levels of values, namely “0” and “1”.
telecom$DataPlan <- factor(telecom$DataPlan)
telecom$ContractRenewal <- factor(telecom$ContractRenewal)
Here is the data.
paged_table(telecom)
We are checking if there are any missing values in any of the variables.
colSums(is.na(telecom))
## Churn AccountWeeks ContractRenewal DataPlan DataUsage
## 0 0 0 0 0
## CustServCalls DayMins DayCalls MonthlyCharge OverageFee
## 0 0 0 0 0
## RoamMins
## 0
print(paste(sum(complete.cases(telecom)),"Complete cases!"))
## [1] "3333 Complete cases!"
any(is.null(telecom))
## [1] FALSE
We see that there are no missing values in any of the variables and there are 3333 complete cases. Also, there are no invalid or spurious data in the dataset.
We are checking if there are duplicate records in the dataset which is redundant and hence needs to be removed.
dim(unique(telecom))[1]
## [1] 3333
There is no duplicate records in the dataset
We are splitting the data into training and testing set with 80:20 percentage of data. We will use the trainings set to train the model and testing set to validate the model.
index <- sample(nrow(telecom),nrow(telecom)*0.80)
telecom_train = telecom[index,]
telecom_test = telecom[-index,]
Variable | Type | Description |
---|---|---|
Churn | factor | 1 if customer cancelled service, 0 if not |
AccountWeeks | integer | number of weeks customer has had active account |
ContractRenewal | factor | 1 if customer recently renewed contract, 0 if not |
DataPlan | factor | 1 if customer has data plan, 0 if not |
DataUsage | double | gigabytes of monthly data usage |
CustServCalls | integer | number of calls into customer service |
DayMins | double | average daytime minutes per month |
DayCalls | integer | average number of daytime calls |
MonthlyCharge | double | average monthly bill |
OverageFee | double | largest overage fee in last 12 months |
RoamMins | double | average number of roaming minutes |
In the bar graph below, we can see that the number of customers’ turnover is made up 1/4 of the customers deciding to stay with their current service provider.
ggplot(telecom, aes(Churn)) + geom_bar(aes(fill = Churn), width = 0.6) + labs(title = "Stayed vs Cancelled", subtitle = "From Customer Churn dataset", x = "Customer Churn", y = "Count") + guides(fill = FALSE) + scale_x_discrete(labels = c("0" = "Stayed", "1" = "Cancelled")) + theme_classic()
We can see from the above bar graph that we have high count of customers who did not cancel their service.
corrplot(cor(telecom[,-1]), type = "lower", method = "number")
We can see the correlation coefficient of 0.95 between Data Usage and Data Plan. This indicates a strong positive linear relationship. It is a logical statement because customer who has data plan tends to use more data. Also, looking at the correlation between Monthly Charge and Data Plan or Monthly Charge and Data Usage, the coefficients are 0.74 and 0.78 respectively. These are strong linear relationships. Lastly, Days Mins and Monthly Charge has 0.57 correlation coefficent showing a moderate positive linear relationship.
ggpairs(telecom)
The plot shows that DayMins, AverageFee, RoamingMinutes are normally ditributed.AccountWeeks, Data Usage and Monthy Charges are right skewed and DailyCalls is left skewed.
ggplot(telecom, aes(Churn)) + geom_bar(aes(fill = ContractRenewal))
ggplot(telecom, aes(Churn)) + geom_bar(aes(fill = DataPlan))
We can see that the the count of customers who churned did not renew their contract but they have active data plan.
To begin, a logistic regression model containing all variables is fit to our training data, this model use the default logit link function. Train a logistic regression model with all predictor variables:
telecom.glm0 <- glm(Churn~., family=binomial, data=telecom_train)
summary(telecom.glm0)
##
## Call:
## glm(formula = Churn ~ ., family = binomial, data = telecom_train)
##
## Deviance Residuals:
## Min 1Q Median 3Q Max
## -1.8204 -0.5214 -0.3607 -0.2168 2.9479
##
## Coefficients:
## Estimate Std. Error z value Pr(>|z|)
## (Intercept) -5.6456455 0.5946808 -9.494 < 2e-16 ***
## AccountWeeks -0.0001974 0.0015357 -0.129 0.897716
## ContractRenewal1 -1.9158591 0.1597725 -11.991 < 2e-16 ***
## DataPlan1 -1.4915812 0.5980313 -2.494 0.012626 *
## DataUsage 0.5945315 2.1193691 0.281 0.779076
## CustServCalls 0.4912088 0.0428183 11.472 < 2e-16 ***
## DayMins 0.0184734 0.0358158 0.516 0.606001
## DayCalls 0.0028711 0.0030045 0.956 0.339278
## MonthlyCharge -0.0387268 0.2104983 -0.184 0.854032
## OverageFee 0.2056159 0.3594994 0.572 0.567355
## RoamMins 0.0796817 0.0238582 3.340 0.000838 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## (Dispersion parameter for binomial family taken to be 1)
##
## Null deviance: 2208.6 on 2665 degrees of freedom
## Residual deviance: 1785.3 on 2655 degrees of freedom
## AIC: 1807.3
##
## Number of Fisher Scoring iterations: 5
We see that some of the variables has p-valus < 0.05 and therefore, plays a significant role in the model.
Model performance:
AIC(telecom.glm0)
## [1] 1807.332
BIC(telecom.glm0)
## [1] 1872.103
telecom.glm0$deviance
## [1] 1785.332
AUC for training data:
pred.glm0.train <- predict(telecom.glm0, type="response")
pred0 <- prediction(pred.glm0.train, telecom_train$Churn)
unlist(slot(performance(pred0, "auc"), "y.values"))
## [1] 0.8093944
class.glm0.train<- (pred.glm0.train>0.5)*1
AUC for testing data:
pred.glm0.test <- predict(telecom.glm0, telecom_test, type="response")
pred0 <- prediction(pred.glm0.test, telecom_test$Churn)
unlist(slot(performance(pred0, "auc"), "y.values"))
## [1] 0.850737
class.glm0.test<- (pred.glm0.test>0.5)*1
Missclassification Rate for training data:
mean(telecom_test$Churn != class.glm0.test)
## [1] 0.1229385
Missclassification Rate for testing data:
mean(telecom_train$Churn != class.glm0.train)
## [1] 0.1425356
We see that the missclassification rate of the full model is not too good so to improve upon the model, we perform the variable selection.
In order to see if we could improve upon a simple model including all possible variables, various variable selection procedures were used including stepwise selection using both AIC and BIC criteria.
AIC-stepwise We build the model using AIC stepwise.
model_aic <- step(telecom.glm0)
summary(model_aic)
##
## Call:
## glm(formula = Churn ~ ContractRenewal + DataPlan + CustServCalls +
## DayMins + OverageFee + RoamMins, family = binomial, data = telecom_train)
##
## Deviance Residuals:
## Min 1Q Median 3Q Max
## -1.8718 -0.5291 -0.3604 -0.2185 2.9321
##
## Coefficients:
## Estimate Std. Error z value Pr(>|z|)
## (Intercept) -5.476759 0.475315 -11.522 < 2e-16 ***
## ContractRenewal1 -1.917849 0.159366 -12.034 < 2e-16 ***
## DataPlan1 -0.899978 0.159449 -5.644 1.66e-08 ***
## CustServCalls 0.487195 0.042614 11.433 < 2e-16 ***
## DayMins 0.011909 0.001175 10.132 < 2e-16 ***
## OverageFee 0.139130 0.025060 5.552 2.82e-08 ***
## RoamMins 0.089987 0.022005 4.089 4.33e-05 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## (Dispersion parameter for binomial family taken to be 1)
##
## Null deviance: 2208.6 on 2665 degrees of freedom
## Residual deviance: 1787.3 on 2659 degrees of freedom
## AIC: 1801.3
##
## Number of Fisher Scoring iterations: 5
AUC for training data:
pred.aic.train <- predict(model_aic, type="response")
pred_aic <- prediction(pred.aic.train, telecom_train$Churn)
auc_aic_train<-unlist(slot(performance(pred_aic, "auc"), "y.values"))
auc_aic_train
## [1] 0.8085984
class.glm0.train.aic<- (pred.aic.train>0.5)*1
AUC for testing data:
pred.aic.test <- predict(model_aic, telecom_test, type="response")
pred_aic1 <- prediction(pred.aic.test, telecom_test$Churn)
auc_aic_test<-unlist(slot(performance(pred_aic1, "auc"), "y.values"))
auc_aic_test
## [1] 0.85216
class.glm0.test.aic<- (pred.aic.test>0.5)*1
Missclassification Rate for training data:
MR_aic_train<-mean(telecom_train$Churn != class.glm0.train.aic)
MR_aic_train
## [1] 0.1414104
Missclassification Rate for testing data:
MR_aic_test<-mean(telecom_test$Churn != class.glm0.test.aic)
MR_aic_test
## [1] 0.125937
BIC-stepwise We build the model using BIC stepwise.
model_bic <- step(telecom.glm0, k=log(nrow(telecom_train)))
summary(model_bic)
##
## Call:
## glm(formula = Churn ~ ContractRenewal + DataPlan + CustServCalls +
## DayMins + OverageFee + RoamMins, family = binomial, data = telecom_train)
##
## Deviance Residuals:
## Min 1Q Median 3Q Max
## -1.8718 -0.5291 -0.3604 -0.2185 2.9321
##
## Coefficients:
## Estimate Std. Error z value Pr(>|z|)
## (Intercept) -5.476759 0.475315 -11.522 < 2e-16 ***
## ContractRenewal1 -1.917849 0.159366 -12.034 < 2e-16 ***
## DataPlan1 -0.899978 0.159449 -5.644 1.66e-08 ***
## CustServCalls 0.487195 0.042614 11.433 < 2e-16 ***
## DayMins 0.011909 0.001175 10.132 < 2e-16 ***
## OverageFee 0.139130 0.025060 5.552 2.82e-08 ***
## RoamMins 0.089987 0.022005 4.089 4.33e-05 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## (Dispersion parameter for binomial family taken to be 1)
##
## Null deviance: 2208.6 on 2665 degrees of freedom
## Residual deviance: 1787.3 on 2659 degrees of freedom
## AIC: 1801.3
##
## Number of Fisher Scoring iterations: 5
AUC for training data:
pred.bic.train <- predict(model_bic, type="response")
pred_bic <- prediction(pred.bic.train, telecom_train$Churn)
unlist(slot(performance(pred_bic, "auc"), "y.values"))
## [1] 0.8085984
class.glm0.train.bic<- (pred.bic.train>0.5)*1
AUC for testing data:
pred.bic.test <- predict(model_bic, telecom_test, type="response")
pred_bic1 <- prediction(pred.bic.test, telecom_test$Churn)
unlist(slot(performance(pred_bic1, "auc"), "y.values"))
## [1] 0.85216
class.glm0.test.bic<- (pred.bic.test>0.5)*1
Missclassification Rate for training data:
mean(telecom_train$Churn != class.glm0.train.bic)
## [1] 0.1414104
Missclassification Rate for testing data:
mean(telecom_test$Churn != class.glm0.test.bic)
## [1] 0.125937
Model Performance- Logistic Regression
Given that all three models AIC and BIC model is the same and is better than the full model in terms of missclassification rate of the testing data. As the model fit using AIC (same as BIC in this case) (hereafter the ‘AIC model’) has the smallest MR we considered it our best model.
In order to determine if our model has sufficient ability to discriminate between true positives and false positives, we plotted the ROC curve. Figure 1 displays the ROC curve for the AIC model based on our training data and Figure 2 displays the ROC curve for the AIC model based on the remaining 20% testing data.
ROC curve for training data:
pred <- prediction(pred.aic.train, telecom_train$Churn)
perf <- performance(pred, "tpr", "fpr")
plot(perf, colorize=TRUE)
ROC curve for testing data:
pred <- prediction(pred.aic.test, telecom_test$Churn)
perf <- performance(pred, "tpr", "fpr")
plot(perf, colorize=TRUE)
Conclusion- Logistic Regression
The final model contains 6 predictors. The in-sample and out-of-sample performance of this model is comparable. The model can accurately predict the churn classification for more than 88% of the customers in the data set.
Our goal here is to develop a neural network to determine if a customer cancels the telecom service or not.
We now generate the error of the neural network model, along with the weights between the inputs, hidden layers, and outputs.
library(neuralnet)
##
## Attaching package: 'neuralnet'
## The following object is masked from 'package:ROCR':
##
## prediction
## The following object is masked from 'package:dplyr':
##
## compute
nn <- neuralnet(Churn ~ ., data = telecom_train, hidden = c(2,1),act.fct = "logistic", linear.output = FALSE,stepmax = 1e+08, likelihood = TRUE)
nn$result.matrix
## [,1]
## error 8.844941e+01
## reached.threshold 9.797548e-03
## steps 1.906010e+05
## aic 2.308988e+02
## bic 3.898839e+02
## Intercept.to.1layhid1 6.172225e+00
## AccountWeeks.to.1layhid1 -3.867337e-01
## ContractRenewal.to.1layhid1 -6.828850e+00
## DataPlan.to.1layhid1 -1.875615e-01
## DataUsage.to.1layhid1 8.504366e-01
## CustServCalls.to.1layhid1 -9.084971e+00
## DayMins.to.1layhid1 3.515021e+00
## DayCalls.to.1layhid1 -8.295585e-01
## MonthlyCharge.to.1layhid1 -6.726830e-01
## OverageFee.to.1layhid1 2.096985e+00
## RoamMins.to.1layhid1 2.481003e-01
## Intercept.to.1layhid2 -1.813270e+00
## AccountWeeks.to.1layhid2 8.806643e-01
## ContractRenewal.to.1layhid2 -3.172061e+01
## DataPlan.to.1layhid2 -2.961457e+00
## DataUsage.to.1layhid2 -3.277022e+01
## CustServCalls.to.1layhid2 -1.179714e+01
## DayMins.to.1layhid2 1.485483e+01
## DayCalls.to.1layhid2 -2.991698e-01
## MonthlyCharge.to.1layhid2 3.879850e+01
## OverageFee.to.1layhid2 6.154376e+00
## RoamMins.to.1layhid2 4.641863e+00
## Intercept.to.2layhid1 -2.340426e+00
## 1layhid1.to.2layhid1 3.733727e+01
## 1layhid2.to.2layhid1 -3.516225e+01
## Intercept.to.Churn 2.241288e+00
## 2layhid1.to.Churn -5.803788e+00
We create a neural network with 2 hidden layers and our neural network looks like:
plot(nn, rep = "best")
We get the confusion matrix for the training data:
temp_train <- subset(telecom_train, select = c("AccountWeeks","ContractRenewal","DataPlan","DataUsage","CustServCalls","DayMins","DayCalls","MonthlyCharge","OverageFee","RoamMins"))
nn.results1 <- compute(nn, temp_train)
results1 <- data.frame(actual = telecom_train$Churn, prediction = nn.results1$net.result)
prob.result.train <- nn.results1$net.result
roundedresults1 <- sapply(results1,round,digits = 0)
roundedresultsdf1 = data.frame(roundedresults1)
attach(roundedresultsdf1)
confusion_matrix_train <- table(actual,prediction)
confusion_matrix_train
## prediction
## actual 0 1
## 0 2220 52
## 1 190 204
Missclassification rate(MR) for the training data:
MR_train <- (confusion_matrix_train[1,2] + confusion_matrix_train[2,1])/nrow(telecom_train)
MR_train
## [1] 0.09077269
ROC curve for training data and AUC:
nn.pred = ROCR::prediction(prob.result.train, telecom_train$Churn)
pref <- performance(nn.pred, "tpr", "fpr")
plot(pref, colorize=TRUE)
auc_train<-unlist(slot(ROCR::performance(nn.pred, "auc"),"y.values"))
auc_train
## [1] 0.8895576
We get the confusion matrix for the testing data:
temp_test <- subset(telecom_test, select = c("AccountWeeks","ContractRenewal","DataPlan","DataUsage","CustServCalls","DayMins","DayCalls","MonthlyCharge","OverageFee","RoamMins"))
nn.results <- compute(nn, temp_test)
results <- data.frame(actual = telecom_test$Churn, prediction = nn.results$net.result)
prob.result.test <- nn.results$net.result
roundedresults <- sapply(results,round,digits = 0)
roundedresultsdf = data.frame(roundedresults)
attach(roundedresultsdf)
confusion_matrix_test <- table(actual,prediction)
confusion_matrix_test
## prediction
## actual 0 1
## 0 569 9
## 1 39 50
Missclassification rate(MR) for the testing data:
MR_test <- (confusion_matrix_test[1,2] + confusion_matrix_test[2,1])/nrow(telecom_test)
MR_test
## [1] 0.07196402
ROC curve for testing data and AUC:
nn.pred = ROCR::prediction(prob.result.test, telecom_test$Churn)
pref <- performance(nn.pred, "tpr", "fpr")
plot(pref, colorize=TRUE)
auc_test<-unlist(slot(ROCR::performance(nn.pred, "auc"),"y.values"))
auc_test
## [1] 0.8989347
We will compare the AUC and missclassification rate (MR) of the training and testing data of both the approaches - Logistic regression and Neural Network. The lower the MR, the better the accuracy of prediction of the model. The higher the AUC, the better the model.
Logistic Regression
## MR for training MR for testing AUC for training AUC for testing
## 0.1414104 0.1259370 0.8085984 0.8521600
Neural Network
## MR for training MR for testing AUC for training AUC for testing
## 0.09077269 0.07196402 0.88955760 0.89893472
We can see that the Neural Network performs much better than the Logistic model here and therefore, we will choose the neural network as our final model for predicting the new data in the future.