The biggest problem a telecom company faces is of Churned Customers. Churning is a term used in this industry to describe whether the consumer or the user is going to continue the service with the company any further or not.
Churning has a huge impact on Revenue of a company. If the company predicts the churn rate of the customers with high accuracy, it gives the company a gestimate of how its revenues would look like and in turn give it freedom to plan finances ahead.
In this project we are more concerned about the customers who are going to get churned. We are going to predict whether the customer be Churned or not on the basis of its billing information and customer demographics.
The most important scenario for a company from a business perspective is to retain customers. Thus, we are more interested in the instances where we predicted that the customer would not be churned but infact the customer was churned. This shows potential losses for the company and is denoted by SPECIFICITY. Thus, our primary goal is to increases the SPECIFICITY of the model and select the model that trades off between ACCURACY and SPECIFICITY.
library(dplyr)
library(rpart)
library(randomForest)
library(ROCR)
library(rpart.plot)
library(dummies)
library(caret)
library(ggplot2)
library(pROC)
library(DT)
churn <- read.csv("WA_Fn-UseC_-Telco-Customer-Churn.csv")
churn$SeniorCitizen <- as.factor(churn$SeniorCitizen)
str(churn)
## 'data.frame': 7043 obs. of 21 variables:
## $ customerID : Factor w/ 7043 levels "0002-ORFBO","0003-MKNFE",..: 5376 3963 2565 5536 6512 6552 1003 4771 5605 4535 ...
## $ gender : Factor w/ 2 levels "Female","Male": 1 2 2 2 1 1 2 1 1 2 ...
## $ SeniorCitizen : Factor w/ 2 levels "0","1": 1 1 1 1 1 1 1 1 1 1 ...
## $ Partner : Factor w/ 2 levels "No","Yes": 2 1 1 1 1 1 1 1 2 1 ...
## $ Dependents : Factor w/ 2 levels "No","Yes": 1 1 1 1 1 1 2 1 1 2 ...
## $ tenure : int 1 34 2 45 2 8 22 10 28 62 ...
## $ PhoneService : Factor w/ 2 levels "No","Yes": 1 2 2 1 2 2 2 1 2 2 ...
## $ MultipleLines : Factor w/ 3 levels "No","No phone service",..: 2 1 1 2 1 3 3 2 3 1 ...
## $ InternetService : Factor w/ 3 levels "DSL","Fiber optic",..: 1 1 1 1 2 2 2 1 2 1 ...
## $ OnlineSecurity : Factor w/ 3 levels "No","No internet service",..: 1 3 3 3 1 1 1 3 1 3 ...
## $ OnlineBackup : Factor w/ 3 levels "No","No internet service",..: 3 1 3 1 1 1 3 1 1 3 ...
## $ DeviceProtection: Factor w/ 3 levels "No","No internet service",..: 1 3 1 3 1 3 1 1 3 1 ...
## $ TechSupport : Factor w/ 3 levels "No","No internet service",..: 1 1 1 3 1 1 1 1 3 1 ...
## $ StreamingTV : Factor w/ 3 levels "No","No internet service",..: 1 1 1 1 1 3 3 1 3 1 ...
## $ StreamingMovies : Factor w/ 3 levels "No","No internet service",..: 1 1 1 1 1 3 1 1 3 1 ...
## $ Contract : Factor w/ 3 levels "Month-to-month",..: 1 2 1 2 1 1 1 1 1 2 ...
## $ PaperlessBilling: Factor w/ 2 levels "No","Yes": 2 1 2 1 2 2 2 1 2 1 ...
## $ PaymentMethod : Factor w/ 4 levels "Bank transfer (automatic)",..: 3 4 4 1 3 3 2 4 3 1 ...
## $ MonthlyCharges : num 29.9 57 53.9 42.3 70.7 ...
## $ TotalCharges : num 29.9 1889.5 108.2 1840.8 151.7 ...
## $ Churn : Factor w/ 2 levels "No","Yes": 1 1 2 1 2 2 1 1 2 1 ...
churn_copy <- churn[,-1]
sum(is.na(churn_copy))
## [1] 11
We can see that there aree 11 entries in our data set which are empty or have NA values. The number of NA’s as compared to the whole dataset is relatively less and hence we would drop these 11 values.
set.seed(131313)
churn_final <- churn_copy[complete.cases(churn_copy),]
intrain <- createDataPartition(y = churn_final$Churn, p = 0.7, list = FALSE)
training <- churn_final[intrain, ]
testing <- churn_final[-intrain, ]
The goal of this project is the Classify whether the customer would be Churned or Not. It builds up a classic Classification probelm and hence we would run LOGISTIC regression on our data set.
After building the logistic model we would run a STEPWISE logistic regression in forward as well as backward direction to drop the unnecessary variables and obtain a model with as low as possible AIC.
model_glm <- glm(Churn ~ ., data = training, family = binomial)
model_step <- step(model_glm, direction = "both", steps = 10000, trace = FALSE)
predictions_step <- predict(model_step, testing, type = "response")
We predict the class probabilities of all the testing instances by using the predict function on our Stepwise logistic regression model. The next step is to select the right cut-off point so that we can distinguish the instances into 0’s and 1’s depending on their probabilities.
pred_step <- prediction(predictions_step, testing$Churn)
plot(performance(pred_step, "tpr", "fpr"), colorize = TRUE)
auc_step <- performance(pred_step, "auc")
auc_step
## An object of class "performance"
## Slot "x.name":
## [1] "None"
##
## Slot "y.name":
## [1] "Area under the ROC curve"
##
## Slot "alpha.name":
## [1] "none"
##
## Slot "x.values":
## list()
##
## Slot "y.values":
## [[1]]
## [1] 0.8434016
##
##
## Slot "alpha.values":
## list()
roc_step <- roc(response = testing$Churn, predictor = predictions_step)
c <- coords(roc_step, "best", "threshold")
c
## threshold specificity sensitivity
## 0.2865218 0.7525840 0.7821429
We find that the point 0.2865218 is the best cut-off point to distinguish the class probabilities into 0’s and 1’s. We now form the Predicted scores using ifelse loops.
Churn_step <- ifelse(predictions_step >= c[[1]],
"Yes", "No")
cm_log <- confusionMatrix(testing$Churn, Churn_step)
cm_log
## Confusion Matrix and Statistics
##
## Reference
## Prediction No Yes
## No 1165 383
## Yes 122 438
##
## Accuracy : 0.7604
## 95% CI : (0.7416, 0.7785)
## No Information Rate : 0.6105
## P-Value [Acc > NIR] : < 2.2e-16
##
## Kappa : 0.4655
## Mcnemar's Test P-Value : < 2.2e-16
##
## Sensitivity : 0.9052
## Specificity : 0.5335
## Pos Pred Value : 0.7526
## Neg Pred Value : 0.7821
## Prevalence : 0.6105
## Detection Rate : 0.5527
## Detection Prevalence : 0.7343
## Balanced Accuracy : 0.7194
##
## 'Positive' Class : No
##
From the confusion matrix for stepwise logisitic model we find that the Accuracy of the model is 0.7604364 & Specificity is 0.5334957.
tc_rf <- trainControl(method = "repeatedcv",repeats = 2,number = 3, search = "random")
rf_train <- train(Churn ~ ., data = training, method = "rf", trainControl = tc_rf) ##Time consuming
plot(varImp(rf_train, scale = F))
predict_rftrain <- predict(rf_train, testing)
cm_rf <- confusionMatrix(predict_rftrain, testing$Churn)
cm_rf
## Confusion Matrix and Statistics
##
## Reference
## Prediction No Yes
## No 1471 352
## Yes 77 208
##
## Accuracy : 0.7965
## 95% CI : (0.7787, 0.8135)
## No Information Rate : 0.7343
## P-Value [Acc > NIR] : 1.872e-11
##
## Kappa : 0.3815
## Mcnemar's Test P-Value : < 2.2e-16
##
## Sensitivity : 0.9503
## Specificity : 0.3714
## Pos Pred Value : 0.8069
## Neg Pred Value : 0.7298
## Prevalence : 0.7343
## Detection Rate : 0.6978
## Detection Prevalence : 0.8648
## Balanced Accuracy : 0.6608
##
## 'Positive' Class : No
##
From the confusion matrix for Random Forest model we find that the Accuracy of the model is 0.7964896 & Specificity is 0.3714286.
rp <- rpart(Churn ~ ., data = training, method = "class")
predictions_rp <- predict(rp, testing, type = "class")
rpart.plot(rp, type=1, extra=100, branch.lty=3, box.palette="RdYlGn", tweak = 1.2, fallen.leaves = FALSE)
cm_rpart <- confusionMatrix(predictions_rp, testing$Churn)
cm_rpart
## Confusion Matrix and Statistics
##
## Reference
## Prediction No Yes
## No 1404 304
## Yes 144 256
##
## Accuracy : 0.7875
## 95% CI : (0.7694, 0.8048)
## No Information Rate : 0.7343
## P-Value [Acc > NIR] : 9.284e-09
##
## Kappa : 0.4006
## Mcnemar's Test P-Value : 5.821e-14
##
## Sensitivity : 0.9070
## Specificity : 0.4571
## Pos Pred Value : 0.8220
## Neg Pred Value : 0.6400
## Prevalence : 0.7343
## Detection Rate : 0.6660
## Detection Prevalence : 0.8102
## Balanced Accuracy : 0.6821
##
## 'Positive' Class : No
##
From the confusion matrix for RPART model we find that the Accuracy of the model is 0.7874763 & Specificity is 0.4571429.
As majority of our variables are Factor variables, we create dummy variables for our data to perform Principal Component Analysis.
dummy.data <- dummy.data.frame(churn_final)
intrain.pca <- createDataPartition(y = dummy.data$ChurnYes, p = 0.7, list = FALSE)
pca.train <- dummy.data[intrain.pca, ]
pca.test <- dummy.data[-intrain.pca, ]
pca.train1 <- pca.train[,-c(47,48)] #Removing the dependent variables
pca.test1 <- pca.test[,-c(47,48)] #Removing the dependent variables
pr_train <- prcomp(pca.train1, center = T, scale. = T)
summary(pr_train)
## Importance of components:
## PC1 PC2 PC3 PC4 PC5 PC6
## Standard deviation 3.3801 2.4573 2.06827 1.61305 1.5181 1.41535
## Proportion of Variance 0.2484 0.1313 0.09299 0.05656 0.0501 0.04355
## Cumulative Proportion 0.2484 0.3796 0.47264 0.52920 0.5793 0.62285
## PC7 PC8 PC9 PC10 PC11 PC12
## Standard deviation 1.37808 1.25900 1.22355 1.15630 1.14324 1.12002
## Proportion of Variance 0.04128 0.03446 0.03255 0.02907 0.02841 0.02727
## Cumulative Proportion 0.66413 0.69859 0.73114 0.76020 0.78862 0.81589
## PC13 PC14 PC15 PC16 PC17 PC18
## Standard deviation 1.09540 1.07106 1.05333 1.00670 0.97358 0.96478
## Proportion of Variance 0.02608 0.02494 0.02412 0.02203 0.02061 0.02023
## Cumulative Proportion 0.84197 0.86691 0.89103 0.91306 0.93367 0.95390
## PC19 PC20 PC21 PC22 PC23 PC24
## Standard deviation 0.93056 0.9022 0.61726 0.24203 0.03076 4.365e-14
## Proportion of Variance 0.01882 0.0177 0.00828 0.00127 0.00002 0.000e+00
## Cumulative Proportion 0.97273 0.9904 0.99871 0.99998 1.00000 1.000e+00
## PC25 PC26 PC27 PC28 PC29
## Standard deviation 2.9e-14 2.477e-14 5.999e-15 5.969e-15 4.992e-15
## Proportion of Variance 0.0e+00 0.000e+00 0.000e+00 0.000e+00 0.000e+00
## Cumulative Proportion 1.0e+00 1.000e+00 1.000e+00 1.000e+00 1.000e+00
## PC30 PC31 PC32 PC33 PC34
## Standard deviation 4.025e-15 3.441e-15 3.264e-15 2.538e-15 2.309e-15
## Proportion of Variance 0.000e+00 0.000e+00 0.000e+00 0.000e+00 0.000e+00
## Cumulative Proportion 1.000e+00 1.000e+00 1.000e+00 1.000e+00 1.000e+00
## PC35 PC36 PC37 PC38 PC39
## Standard deviation 1.986e-15 1.428e-15 1.307e-15 1e-15 6.966e-16
## Proportion of Variance 0.000e+00 0.000e+00 0.000e+00 0e+00 0.000e+00
## Cumulative Proportion 1.000e+00 1.000e+00 1.000e+00 1e+00 1.000e+00
## PC40 PC41 PC42 PC43 PC44
## Standard deviation 2.41e-16 2.41e-16 2.41e-16 2.41e-16 2.41e-16
## Proportion of Variance 0.00e+00 0.00e+00 0.00e+00 0.00e+00 0.00e+00
## Cumulative Proportion 1.00e+00 1.00e+00 1.00e+00 1.00e+00 1.00e+00
## PC45 PC46
## Standard deviation 2.41e-16 7.346e-17
## Proportion of Variance 0.00e+00 0.000e+00
## Cumulative Proportion 1.00e+00 1.000e+00
plot(pr_train, type = "l")
biplot(pr_train, scale = 0)
pca_var <- pr_train$sdev ^ 2
pca_var_prop <- pca_var / sum(pca_var)
plot(cumsum(pca_var_prop), type = 'b', ylab = "Cummulative Variance", xlab = "Principal Components",main = "Principal Components VS Variance Explained")
pca.final.train <- data.frame(ChurnYes = pca.train$ChurnYes, pr_train$x)
pca.final.train1 <- pca.final.train[,1:16] #Selecting the first 15 Principal components
pca.final.test <- predict(pr_train, pca.test1)
pca.final.test <- as.data.frame(pca.final.test)
pca.final.test1 <- pca.final.test[,1:15]
rpart_pca <- rpart(ChurnYes ~ ., data = pca.final.train1)
predict_rpart.pca <- predict(rpart_pca, newdata = pca.final.test1)
pred_rpart_pca <- prediction(predict_rpart.pca, pca.test$ChurnNo)
perf_rpart_pca <- performance(pred_rpart_pca, "tpr", "fpr")
plot(perf_rpart_pca, colorize = TRUE)
myroc <- roc(pca.test$ChurnNo, predict_rpart.pca)
c_pca <- coords(myroc, "best", ret = "threshold")
churned <- ifelse(predict_rpart.pca > c_pca[[1]], 1, 0)
cm_pca <- confusionMatrix(churned, pca.test$ChurnYes)
cm_pca
## Confusion Matrix and Statistics
##
## Reference
## Prediction 0 1
## 0 966 103
## 1 565 475
##
## Accuracy : 0.6833
## 95% CI : (0.6629, 0.7031)
## No Information Rate : 0.7259
## P-Value [Acc > NIR] : 1
##
## Kappa : 0.3626
## Mcnemar's Test P-Value : <2e-16
##
## Sensitivity : 0.6310
## Specificity : 0.8218
## Pos Pred Value : 0.9036
## Neg Pred Value : 0.4567
## Prevalence : 0.7259
## Detection Rate : 0.4580
## Detection Prevalence : 0.5069
## Balanced Accuracy : 0.7264
##
## 'Positive' Class : 0
##
The summary of the Principal components tells us that the first 15 components explain 89% variance and hence we select the first 15 PCs and move forward to perform Decision trees predicition.
We feed these 15 PCs to the RPART model to predict whether the customer be churned or not.
The biplot suggests the directions of all the variables. It states that Tenure, Contract, Partner etc convey the same data value and explain the same variances as these vectors are in the same direction.
The Cumulative variance plot states that first 15 PCs explain almost 90% variance and PC’s after 23 are redudant and should be dropped as 23 PC’s explain 100% variance.
From the confusion matrix for Decision trees on Principal components model we find that the Accuracy of the model is 0.6832622 & Specificity is 0.8217993.
| Model | Specificty | Accuracy |
|---|---|---|
| Logistic | 0.53 | 0.76 |
| Random Forest | 0.37 | 0.79 |
| RPART | 0.46 | 0.78 |
| PCA + RPART | 0.82 | 0.68 |
According to our problem statement, PCA + Decisison trees yeilds SPECIFICITY of 82%, better than any other model and hence we select it as a final model.
If accuracy is taken into account, Random forest out performs every other problem.