Group Members:

  1. Venkataramana Muragaya (S2107792)
  2. Tan Jeanne Ywei (S2117461)
  3. Ahmad Faiz Aslam bin Mohd Abbas (S2125991)
  4. Sajay Thomas (S2102673)
  5. Siva Prabhu Sunthra Sakaran (S2103885)

INTRODUCTION:

The Telco data contains information about almost seven thousand three hundred users, their demographic characteristics, the services they use, the duration of using the operator’s services, the method of payment, and the amount of payment. The analysis of the data is to predict the churn of users (to identify people who will and will not renew their contract).

PROBLEM STATEMENT:

The Telco industry heavily relies on retaining existing customers. It’s vital for these organizations looking to grow their business without relying too heavily on the significantly higher cost of acquiring new customers. Marketing, sales, and customer retention departments need to work to make sure customers are satisfied, provide them with incentives, and present offers at the right time to reduce churn

Objective

  1. To identify the factors that contribute to customers churning.
  2. To build and evaluate 3 different machine learning models that would predict if the customer will churn

Libraries that are being used for data pre-processing.

library("dplyr")
library("tidyverse")

Importing the files

Data source: Telco Customer Churn/Kaggle

df = read.csv("Telco_Customer_Dataset_Dirty.csv")

Understanding the Data

Head of the data

head(df)

Dimension of the data

dim(df)
## [1] 7322   23

str() used for compactly displaying the internal structure of a R object

str(df)
## 'data.frame':    7322 obs. of  23 variables:
##  $ RowId             : chr  "0096-BXERSMonth-to-monthElectronic check" "0178-CIIKRMonth-to-monthMailed check" "0318-ZOPWSTwo yearBank transfer (automatic)" "4445-ZJNMUMonth-to-monthCredit card (automatic)" ...
##  $ customerID        : chr  "0096-BXERS" "0178-CIIKR" "0318-ZOPWS" "4445-ZJNMU" ...
##  $ gender            : chr  "female" "female" "Female" "M" ...
##  $ SeniorCitizen     : num  0 0 0 0 0 0 1 1 1 0 ...
##  $ Partner           : chr  "Yes" "No" "Yes" "No" ...
##  $ Supplementary.Line: chr  "" "" "" "" ...
##  $ Dependents        : chr  "No" "No" "No" "No" ...
##  $ tenure            : num  6 3 49 9 12 1 1 16 24 36 ...
##  $ PhoneService      : chr  "Yes" "Yes" "Yes" "Yes" ...
##  $ MultipleLines     : chr  "Yes" "No" "No" "Yes" ...
##  $ InternetService   : chr  "DSL" "No" "No" "Fiber optic" ...
##  $ OnlineSecurity    : chr  "No" "No internet service" "No internet service" "No" ...
##  $ OnlineBackup      : chr  "No" "No internet service" "No internet service" "Yes" ...
##  $ DeviceProtection  : chr  "No" "No internet service" "No internet service" "No" ...
##  $ TechSupport       : chr  "No" "No internet service" "No internet service" "No" ...
##  $ StreamingTV       : chr  "No" "No internet service" "No internet service" "Yes" ...
##  $ StreamingMovies   : chr  "No" "No internet service" "No internet service" "Yes" ...
##  $ Contract          : chr  "Month-to-month" "Month-to-month" "Two year" "Month-to-month" ...
##  $ PaperlessBilling  : chr  "n" "No" "Yes" "Yes" ...
##  $ PaymentMethod     : chr  "Electronic check" "Mailed check" "Bank transfer (automatic)" "Credit card (automatic)" ...
##  $ MonthlyCharges    : num  50.4 19.9 20.1 99.3 19.8 ...
##  $ TotalCharges      : num  315 58 973 919 202 ...
##  $ Churn             : chr  "No" "No" "No" "No" ...

Data Cleaning

Indentifying total duplicate values

sum(duplicated(df$customerID))
## [1] 279

Removing duplicated rows

df = df[!duplicated(df$customerID), ]

Checking if any duplicates are remaining

sum(duplicated(df$customerID))
## [1] 0
print(dim(df))
## [1] 7043   23

Checking ‘na’ collumns

colSums(is.na(df))
##              RowId         customerID             gender      SeniorCitizen 
##                  0                  0                  0                  0 
##            Partner Supplementary.Line         Dependents             tenure 
##                  0                  0                  0                  0 
##       PhoneService      MultipleLines    InternetService     OnlineSecurity 
##                  0                  0                  0                  0 
##       OnlineBackup   DeviceProtection        TechSupport        StreamingTV 
##                  0                  0                  0                  0 
##    StreamingMovies           Contract   PaperlessBilling      PaymentMethod 
##                  0                  0                  0                  0 
##     MonthlyCharges       TotalCharges              Churn 
##                  0                 11                  0

Finding missing values in the columns

colSums(df == "")
##              RowId         customerID             gender      SeniorCitizen 
##                  0                  0                  0                  0 
##            Partner Supplementary.Line         Dependents             tenure 
##                  0               6996                  0                  0 
##       PhoneService      MultipleLines    InternetService     OnlineSecurity 
##                  0                  0                  0                  0 
##       OnlineBackup   DeviceProtection        TechSupport        StreamingTV 
##                  0                  0                  0                  0 
##    StreamingMovies           Contract   PaperlessBilling      PaymentMethod 
##                  0                  0                  0                  0 
##     MonthlyCharges       TotalCharges              Churn 
##                  0                 NA                  0

Droping columns with missing or unecessary values

drop <- c("RowId","Supplementary.Line")
df = df[,!(names(df) %in% drop)]

‘Row ID’ and ‘Supplementary.Line’ is dropped.

RowID is a concatenation of customerID + Contract + PaymentMethod which is probably only being used in Data Warehousing

Supplementary.Line has more than 80% missing values and it’s safe to drop as it may cause bias

Viewing rows that contains missing values

new_DF <- df[rowSums(is.na(df)) > 0,]
head(new_DF,20)

Droping rows that contains ‘NaN’

df = na.omit(df)
colSums(is.na(df))
##       customerID           gender    SeniorCitizen          Partner 
##                0                0                0                0 
##       Dependents           tenure     PhoneService    MultipleLines 
##                0                0                0                0 
##  InternetService   OnlineSecurity     OnlineBackup DeviceProtection 
##                0                0                0                0 
##      TechSupport      StreamingTV  StreamingMovies         Contract 
##                0                0                0                0 
## PaperlessBilling    PaymentMethod   MonthlyCharges     TotalCharges 
##                0                0                0                0 
##            Churn 
##                0
head(df)

Standardizing the attributes

Grouping ‘gender’ attributes reveals that various gender attributes were used.

df %>% group_by(gender)%>%  dplyr::summarise(count = n())

Standardizing the ‘gender’ column.

male_cat <- c('m','male','M')
female_cat <- c('f','female','F')
df$gender[df$gender %in% male_cat]  <- 'Male'
df$gender[df$gender %in% female_cat]  <- 'Female'

Checking gender standardization results.

df %>% group_by(gender)%>%  dplyr::summarise(count = n())

Standardizing ‘Paperless Billing’.

df %>% group_by(PaperlessBilling)%>%  dplyr::summarise(count = n())
yes_cat <- c('n','no','N')
no_cat <- c('y','yes','Y')
df$PaperlessBilling[df$PaperlessBilling %in% yes_cat]  <- 'Yes'
df$PaperlessBilling[df$PaperlessBilling %in% no_cat]  <- 'No'

df %>% group_by(PaperlessBilling)%>%  dplyr::summarise(count = n())

Standardizing ‘Contract’.

df %>% group_by(Contract)%>%  dplyr::summarise(count = n())
one_cat <- c('1 year')
two_cat <- c('2 year')
df$Contract[df$Contract %in% one_cat]  <- 'One year'
df$Contract[df$Contract %in% two_cat]  <- 'Two year'

df %>% group_by(Contract)%>%  dplyr::summarise(count = n())

Dimension of the data.

dim(df)
## [1] 7032   21

Saving the cleaned dataset.

write.csv(df,'Telco_Customer_Dataset_Cleaned.csv')

Exploring the cleaned data.

Describing the columns/variables.

Column Name Description
customerID Customer ID
gender Customer gender
SeniorCitizen Whether the customer is a senior citizen or not
Partner Whether the customer has a partner or not
Dependents Whether the customer has dependents or not
tenure Number of months the customer has stayed with the company
PhoneService Whether the customer has a phone service or not
MultipleLines Whether the customer has multiple lines or not
InternetService Customer’s internet service provider
OnlineSecurity Whether the customer has online security or not
OnlineBackup Whether the customer has online backup or not
DeviceProtection Whether the customer has device protection or not
TechSupport Whether the customer has tech support or not
StreamingTV Whether the customer has streaming TV or not
StreamingMovies Whether the customer has streaming movies or not
Contract The contract term of the customer
PaperlessBilling Whether the customer has paperless billing or not
PaymentMethod The customer’s payment method
MonthlyCharges The amount charged to the customer monthly
TotalCharges The total amount charged to the customer

Libraries loaded for data exploration.

library(ggplot2)
library(caret)
library(cowplot)
library(ggcorrplot)
library(gridExtra)

Importing the cleaned files

df = read.csv("Telco_Customer_Dataset_Cleaned.csv")

Understanding customer through EDA

Does the product suit people with family ?

x = ggplot(df, aes(Partner, fill = Churn)) + 
  geom_bar() +
  labs(title = "Customer Partner Status", 
       x = "Does the Customer have a Partner?", 
       y = "Count") 
y= ggplot(df, aes(Dependents, fill = Churn)) + 
  geom_bar() +
  labs(title = "Customer Dependents Status", 
       x = "Does the Customer have Dependents?", 
       y = "Count")

grid.arrange(x, y,nrow=1,ncol=2)

Customers with a partner or dependents are much less likely to cancel their service.

This shows that the client’s product is mostly attractive to customers with families.

The company should improve their product to suit customers without dependents or a partner to able to maximize their profit.

How long is the customer retention?

df <- mutate(df, tenure_bin = tenure)

df$tenure_bin[df$tenure_bin >=0 & df$tenure_bin <= 12] <- '0-1 year'
df$tenure_bin[df$tenure_bin > 12 & df$tenure_bin <= 24] <- '1-2 years'
df$tenure_bin[df$tenure_bin > 24 & df$tenure_bin <= 36] <- '2-3 years'
df$tenure_bin[df$tenure_bin > 36 & df$tenure_bin <= 48] <- '3-4 years'
df$tenure_bin[df$tenure_bin > 48 & df$tenure_bin <= 60] <- '4-5 years'
df$tenure_bin[df$tenure_bin > 60 & df$tenure_bin <= 72] <- '5-6 years'

df$tenure_bin <- as.factor(df$tenure_bin)
options(repr.plot.width =6, repr.plot.height = 3)
ggplot(df, aes(tenure_bin, fill = Partner)) + geom_bar()

The highest frequency of customer retention period is 0 -1 year, this shows that the client has recently gained new customer through a recent promotion.

The number of churn decreases with the number of months which indicates that old customers are less difficult to maintain their contract.

The median also revealed that 10 months is common customer attrition period.

The company manages to maintain older clientele. The focus of the company should be in the implementation of strategies for the new clients.

options(repr.plot.width =6, repr.plot.height = 2)
ggplot(df, aes(y= tenure, x = "", fill = Churn)) + 
geom_boxplot()+ 
theme_bw()+
xlab("") +ylab("Tenure (months)")

Secondary exploration of data.

Libraries used for secondary exploration.

library("tidyverse")
library("dplyr")

Reading the clean data

df = read.csv("Telco_Customer_Dataset_Cleaned.csv")

Contract Type

Which contract type has the highest churn?

ggplot(df, aes(Contract, fill = Churn)) + 
  geom_bar() +
  labs(x = "Type of Contract Customer Has", 
       y = "Count") +
  ggtitle("Popularity of Contract Types") + 
  theme(plot.title = element_text(hjust = 0.5, size = 14, face = "bold"))

Customers that have longer than a year contract tend to stay with their telco compared to those that only have monthly contracts.

Long terms contracts should come with attractive package to ensure customers favor them better than monthly contracts.

Monthly Charges

How does the monthly charges affect churn?

ggplot(df, aes(MonthlyCharges, fill = Churn)) + 
  geom_histogram() +
  labs(x = "Monthly Charge to Customer", 
       y = "Count") +
  ggtitle("Monthly Charges Histogram") + 
  theme(plot.title = element_text(hjust = 0.5, size = 14, face = "bold"))

Customers having higher monthly charges tend to churn more than those with lower charges. This may be due to them believing the service provided is not value for money.

Internet Service

Is there any internet service with a higher churn rate?

df %>%
  dplyr::select(Churn, InternetService, OnlineSecurity, OnlineBackup) %>%
  mutate(OnlineSecurity = factor(OnlineSecurity, 
                                 levels = c("Yes", "No", "No internet service"),
                                 labels = c("Security", "No security", "No internet")),
         OnlineBackup = factor(OnlineBackup,
                               levels =  c("Yes", "No", "No internet service"),
                               labels = c("Backup", "No backup", "No internet"))) %>%
  group_by(Churn, InternetService, OnlineSecurity, OnlineBackup) %>%
  count() %>%
  ggplot(aes(x = InternetService, y = n, fill = Churn)) +
  geom_col(position = "fill") +
  scale_fill_brewer(palette = "Set2") +
  facet_grid(OnlineBackup ~ OnlineSecurity) +
  labs(y = "Customers") +
  ggtitle("Customer Churn by Internet service") + 
  theme(plot.title = element_text(hjust = 0.5, size = 14, face = "bold")) +
  scale_y_continuous(labels = scales::percent)  

Value-Added Services

Which services have a greater effect on customer churn rate?

df %>%
  mutate(SeniorCitizen = ifelse(SeniorCitizen == 0, "No", "Yes")) -> categorical

categorical %>%
  dplyr::select(gender:Dependents, PhoneService:PaymentMethod, Churn) -> categorical 

categorical %>%
  dplyr::select(MultipleLines, OnlineSecurity:StreamingMovies, Churn) %>%
  filter(MultipleLines != "No phone service" &
           OnlineSecurity != "No internet service") -> c2
           
gather(c2, columns, value, -Churn) -> c3

ggplot(c3)+
  geom_bar(aes(x = value, fill = Churn), position = "fill", stat = "count")+
  facet_wrap(~columns)+ 
  xlab("Attributes") +
  ggtitle("Value-Added Service Against Churn") + 
  theme(plot.title = element_text(hjust = 0.5, size = 14, face = "bold"))

Those who subscribe to the service of Device Protection, Online Backup, Online Security and Tech Support have lower churn rate compared to those who didn’t subscribe.

Building Prediction Models for Predicting Customer Churn

We are going to build 3 types of machine learning models namely:-

1) Logistic Regression

2) Decision Tree

3) Random Forest

library(caTools)
library(car)
library(pROC)
library(MASS)
library(cowplot)
library(e1071)

Minor Cleaning Before Building The dataset

data_raw = read.csv("Telco_Customer_Dataset_Cleaned.csv")
data_raw <- dplyr::select(data_raw, -1)
data_raw <- data_raw[complete.cases(data_raw),]
data_raw$SeniorCitizen <- as.factor(ifelse(data_raw$SeniorCitizen==1, 'YES', 'NO'))

Preparing Data For Training:

There are 6 steps we will undertake to prepare the dataset for training which are:-

1) We clean the categorical features in the dataset

For our machine learning purpose, we are aware that there are few categorical features for example that have ‘No Internet Service’ or ‘No Phone Service’ as categories. Thus we can transform them as ‘No’ and reorganize these features.

data_raw <- data.frame(lapply(data_raw, function(x) {
  gsub("No internet service", "No", x)}))

data_raw <- data.frame(lapply(data_raw, function(x) {
  gsub("No phone service", "No", x)}))

2) We standardise the continuous features in the dataset

columns <- c("tenure", "MonthlyCharges", "TotalCharges")
data_raw[columns] <- sapply(data_raw[columns], as.numeric)
data_raw_int<- data_raw[,c("tenure", "MonthlyCharges", "TotalCharges")]
data_raw_int <- data.frame(scale(data_raw_int))

3) We create derived features in the dataset

Let’s proceed to create derived features from tenure, where we have made multiple bins of tenure which are in months format to ‘0-1 year’, ‘2-3 years’, ‘3-4 years’, ‘4-5 years’, and etc.

data_raw <- mutate(data_raw, tenure_bin = tenure)

data_raw$tenure_bin[data_raw$tenure_bin >=0 & data_raw$tenure_bin <= 12] <- '0-1 year'
data_raw$tenure_bin[data_raw$tenure_bin > 12 & data_raw$tenure_bin <= 24] <- '1-2 years'
data_raw$tenure_bin[data_raw$tenure_bin > 24 & data_raw$tenure_bin <= 36] <- '2-3 years'
data_raw$tenure_bin[data_raw$tenure_bin > 36 & data_raw$tenure_bin <= 48] <- '3-4 years'
data_raw$tenure_bin[data_raw$tenure_bin > 48 & data_raw$tenure_bin <= 60] <- '4-5 years'
data_raw$tenure_bin[data_raw$tenure_bin > 60 & data_raw$tenure_bin <= 72] <- '5-6 years'

data_raw$tenure_bin <- as.factor(data_raw$tenure_bin)

4) We create dummy variables for factor variables in our dataset

data_raw_cat <- data_raw[,-c(1,6,19,20)]
dummy<- data.frame(sapply(data_raw_cat,function(x) data.frame(model.matrix(~x-1,data =data_raw_cat))[,-1]))
head(dummy)

5) We build the the final dataset by combining numeric and dummy data frames

data_final <- cbind(data_raw_int,dummy)
head(data_final)

6) We split the data into train and test set.

set.seed(123)
indices = sample.split(data_final$Churn, SplitRatio = 0.7)
train = data_final[indices,]
validation = data_final[!(indices),]

Building Machine Learning Models & Evaluating Them

1) Logistic Regression (Regression)

logisticRegressionModel_1 = glm(Churn ~ ., data = train, family = "binomial")
summary(logisticRegressionModel_1)
## 
## Call:
## glm(formula = Churn ~ ., family = "binomial", data = train)
## 
## Deviance Residuals: 
##     Min       1Q   Median       3Q      Max  
## -2.0302  -0.6825  -0.2813   0.6462   3.1922  
## 
## Coefficients:
##                                         Estimate Std. Error z value Pr(>|z|)
## (Intercept)                            -4.658413   1.594718  -2.921 0.003487
## tenure                                 -2.267207   0.312586  -7.253 4.07e-13
## MonthlyCharges                         -1.573734   1.162395  -1.354 0.175777
## TotalCharges                            0.047889   0.191794   0.250 0.802826
## gender                                 -0.044823   0.079352  -0.565 0.572167
## SeniorCitizen                           0.412759   0.102490   4.027 5.64e-05
## Partner                                 0.005844   0.095418   0.061 0.951166
## Dependents                             -0.086100   0.108181  -0.796 0.426097
## PhoneService                            0.611067   0.790021   0.773 0.439237
## MultipleLines                           0.628124   0.213993   2.935 0.003333
## InternetService.xFiber.optic            2.190157   0.971833   2.254 0.024219
## InternetService.xNo                    -2.157244   0.983457  -2.194 0.028269
## OnlineSecurity                         -0.069821   0.216431  -0.323 0.746996
## OnlineBackup                            0.121141   0.214402   0.565 0.572064
## DeviceProtection                        0.298124   0.214337   1.391 0.164251
## TechSupport                             0.017730   0.218627   0.081 0.935366
## StreamingTV                             0.759168   0.396137   1.916 0.055310
## StreamingMovies                         0.843451   0.398770   2.115 0.034419
## Contract.xOne.year                     -0.713523   0.129933  -5.491 3.99e-08
## Contract.xTwo.year                     -1.500326   0.216948  -6.916 4.66e-12
## PaperlessBilling                       -0.110683   0.125114  -0.885 0.376340
## PaymentMethod.xCredit.card..automatic.  0.022790   0.136162   0.167 0.867077
## PaymentMethod.xElectronic.check         0.377997   0.113721   3.324 0.000888
## PaymentMethod.xMailed.check            -0.011456   0.138454  -0.083 0.934058
## tenure_bin.x1.2.years                   0.257856   0.188121   1.371 0.170472
## tenure_bin.x2.3.years                   0.877473   0.312040   2.812 0.004923
## tenure_bin.x3.4.years                   1.927149   0.448153   4.300 1.71e-05
## tenure_bin.x4.5.years                   2.885506   0.578430   4.989 6.08e-07
## tenure_bin.x5.6.years                   3.581315   0.721274   4.965 6.86e-07
##                                           
## (Intercept)                            ** 
## tenure                                 ***
## MonthlyCharges                            
## TotalCharges                              
## gender                                    
## SeniorCitizen                          ***
## Partner                                   
## Dependents                                
## PhoneService                              
## MultipleLines                          ** 
## InternetService.xFiber.optic           *  
## InternetService.xNo                    *  
## OnlineSecurity                            
## OnlineBackup                              
## DeviceProtection                          
## TechSupport                               
## StreamingTV                            .  
## StreamingMovies                        *  
## Contract.xOne.year                     ***
## Contract.xTwo.year                     ***
## PaperlessBilling                          
## PaymentMethod.xCredit.card..automatic.    
## PaymentMethod.xElectronic.check        ***
## PaymentMethod.xMailed.check               
## tenure_bin.x1.2.years                     
## tenure_bin.x2.3.years                  ** 
## tenure_bin.x3.4.years                  ***
## tenure_bin.x4.5.years                  ***
## tenure_bin.x5.6.years                  ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## (Dispersion parameter for binomial family taken to be 1)
## 
##     Null deviance: 5699.5  on 4921  degrees of freedom
## Residual deviance: 4034.4  on 4893  degrees of freedom
## AIC: 4092.4
## 
## Number of Fisher Scoring iterations: 6

We will be using stepAIC function for variable selection. It is a continpus process of adding and removing variables for us to get a subset of variables that will provide us the best performing logistic regression model.

logisticRegressionModel_2<- stepAIC(logisticRegressionModel_1, direction="both")
## Start:  AIC=4092.35
## Churn ~ tenure + MonthlyCharges + TotalCharges + gender + SeniorCitizen + 
##     Partner + Dependents + PhoneService + MultipleLines + InternetService.xFiber.optic + 
##     InternetService.xNo + OnlineSecurity + OnlineBackup + DeviceProtection + 
##     TechSupport + StreamingTV + StreamingMovies + Contract.xOne.year + 
##     Contract.xTwo.year + PaperlessBilling + PaymentMethod.xCredit.card..automatic. + 
##     PaymentMethod.xElectronic.check + PaymentMethod.xMailed.check + 
##     tenure_bin.x1.2.years + tenure_bin.x2.3.years + tenure_bin.x3.4.years + 
##     tenure_bin.x4.5.years + tenure_bin.x5.6.years
## 
##                                          Df Deviance    AIC
## - Partner                                 1   4034.4 4090.4
## - TechSupport                             1   4034.4 4090.4
## - PaymentMethod.xMailed.check             1   4034.4 4090.4
## - PaymentMethod.xCredit.card..automatic.  1   4034.4 4090.4
## - TotalCharges                            1   4034.4 4090.4
## - OnlineSecurity                          1   4034.5 4090.5
## - gender                                  1   4034.7 4090.7
## - OnlineBackup                            1   4034.7 4090.7
## - PhoneService                            1   4035.0 4091.0
## - Dependents                              1   4035.0 4091.0
## - PaperlessBilling                        1   4035.1 4091.1
## - MonthlyCharges                          1   4036.2 4092.2
## - tenure_bin.x1.2.years                   1   4036.2 4092.2
## - DeviceProtection                        1   4036.3 4092.3
## <none>                                        4034.4 4092.4
## - StreamingTV                             1   4038.0 4094.0
## - StreamingMovies                         1   4038.8 4094.8
## - InternetService.xNo                     1   4039.2 4095.2
## - InternetService.xFiber.optic            1   4039.4 4095.4
## - tenure_bin.x2.3.years                   1   4042.3 4098.3
## - MultipleLines                           1   4043.0 4099.0
## - PaymentMethod.xElectronic.check         1   4045.5 4101.5
## - SeniorCitizen                           1   4050.6 4106.6
## - tenure_bin.x3.4.years                   1   4052.9 4108.9
## - tenure_bin.x5.6.years                   1   4059.2 4115.2
## - tenure_bin.x4.5.years                   1   4059.4 4115.4
## - Contract.xOne.year                      1   4066.0 4122.0
## - tenure                                  1   4088.6 4144.6
## - Contract.xTwo.year                      1   4091.7 4147.7
## 
## Step:  AIC=4090.36
## Churn ~ tenure + MonthlyCharges + TotalCharges + gender + SeniorCitizen + 
##     Dependents + PhoneService + MultipleLines + InternetService.xFiber.optic + 
##     InternetService.xNo + OnlineSecurity + OnlineBackup + DeviceProtection + 
##     TechSupport + StreamingTV + StreamingMovies + Contract.xOne.year + 
##     Contract.xTwo.year + PaperlessBilling + PaymentMethod.xCredit.card..automatic. + 
##     PaymentMethod.xElectronic.check + PaymentMethod.xMailed.check + 
##     tenure_bin.x1.2.years + tenure_bin.x2.3.years + tenure_bin.x3.4.years + 
##     tenure_bin.x4.5.years + tenure_bin.x5.6.years
## 
##                                          Df Deviance    AIC
## - TechSupport                             1   4034.4 4088.4
## - PaymentMethod.xMailed.check             1   4034.4 4088.4
## - PaymentMethod.xCredit.card..automatic.  1   4034.4 4088.4
## - TotalCharges                            1   4034.4 4088.4
## - OnlineSecurity                          1   4034.5 4088.5
## - OnlineBackup                            1   4034.7 4088.7
## - gender                                  1   4034.7 4088.7
## - PhoneService                            1   4035.0 4089.0
## - Dependents                              1   4035.1 4089.1
## - PaperlessBilling                        1   4035.1 4089.1
## - MonthlyCharges                          1   4036.2 4090.2
## - tenure_bin.x1.2.years                   1   4036.2 4090.2
## - DeviceProtection                        1   4036.3 4090.3
## <none>                                        4034.4 4090.4
## - StreamingTV                             1   4038.0 4092.0
## + Partner                                 1   4034.4 4092.4
## - StreamingMovies                         1   4038.8 4092.8
## - InternetService.xNo                     1   4039.2 4093.2
## - InternetService.xFiber.optic            1   4039.4 4093.4
## - tenure_bin.x2.3.years                   1   4042.3 4096.3
## - MultipleLines                           1   4043.0 4097.0
## - PaymentMethod.xElectronic.check         1   4045.5 4099.5
## - SeniorCitizen                           1   4050.8 4104.8
## - tenure_bin.x3.4.years                   1   4052.9 4106.9
## - tenure_bin.x5.6.years                   1   4059.2 4113.2
## - tenure_bin.x4.5.years                   1   4059.4 4113.4
## - Contract.xOne.year                      1   4066.0 4120.0
## - tenure                                  1   4088.9 4142.9
## - Contract.xTwo.year                      1   4091.8 4145.8
## 
## Step:  AIC=4088.36
## Churn ~ tenure + MonthlyCharges + TotalCharges + gender + SeniorCitizen + 
##     Dependents + PhoneService + MultipleLines + InternetService.xFiber.optic + 
##     InternetService.xNo + OnlineSecurity + OnlineBackup + DeviceProtection + 
##     StreamingTV + StreamingMovies + Contract.xOne.year + Contract.xTwo.year + 
##     PaperlessBilling + PaymentMethod.xCredit.card..automatic. + 
##     PaymentMethod.xElectronic.check + PaymentMethod.xMailed.check + 
##     tenure_bin.x1.2.years + tenure_bin.x2.3.years + tenure_bin.x3.4.years + 
##     tenure_bin.x4.5.years + tenure_bin.x5.6.years
## 
##                                          Df Deviance    AIC
## - PaymentMethod.xMailed.check             1   4034.4 4086.4
## - PaymentMethod.xCredit.card..automatic.  1   4034.4 4086.4
## - TotalCharges                            1   4034.4 4086.4
## - gender                                  1   4034.7 4086.7
## - OnlineSecurity                          1   4034.7 4086.7
## - OnlineBackup                            1   4035.0 4087.0
## - Dependents                              1   4035.1 4087.1
## - PaperlessBilling                        1   4035.1 4087.1
## - tenure_bin.x1.2.years                   1   4036.2 4088.2
## - PhoneService                            1   4036.3 4088.3
## <none>                                        4034.4 4088.4
## + TechSupport                             1   4034.4 4090.4
## + Partner                                 1   4034.4 4090.4
## - DeviceProtection                        1   4038.9 4090.9
## - MonthlyCharges                          1   4041.7 4093.7
## - tenure_bin.x2.3.years                   1   4042.3 4094.3
## - PaymentMethod.xElectronic.check         1   4045.5 4097.5
## - StreamingTV                             1   4046.7 4098.7
## - StreamingMovies                         1   4049.6 4101.6
## - SeniorCitizen                           1   4050.8 4102.8
## - InternetService.xNo                     1   4051.0 4103.0
## - tenure_bin.x3.4.years                   1   4053.0 4105.0
## - InternetService.xFiber.optic            1   4056.3 4108.3
## - MultipleLines                           1   4056.7 4108.7
## - tenure_bin.x5.6.years                   1   4059.2 4111.2
## - tenure_bin.x4.5.years                   1   4059.4 4111.4
## - Contract.xOne.year                      1   4066.0 4118.0
## - tenure                                  1   4088.9 4140.9
## - Contract.xTwo.year                      1   4092.0 4144.0
## 
## Step:  AIC=4086.37
## Churn ~ tenure + MonthlyCharges + TotalCharges + gender + SeniorCitizen + 
##     Dependents + PhoneService + MultipleLines + InternetService.xFiber.optic + 
##     InternetService.xNo + OnlineSecurity + OnlineBackup + DeviceProtection + 
##     StreamingTV + StreamingMovies + Contract.xOne.year + Contract.xTwo.year + 
##     PaperlessBilling + PaymentMethod.xCredit.card..automatic. + 
##     PaymentMethod.xElectronic.check + tenure_bin.x1.2.years + 
##     tenure_bin.x2.3.years + tenure_bin.x3.4.years + tenure_bin.x4.5.years + 
##     tenure_bin.x5.6.years
## 
##                                          Df Deviance    AIC
## - PaymentMethod.xCredit.card..automatic.  1   4034.4 4084.4
## - TotalCharges                            1   4034.4 4084.4
## - gender                                  1   4034.7 4084.7
## - OnlineSecurity                          1   4034.7 4084.7
## - OnlineBackup                            1   4035.0 4085.0
## - Dependents                              1   4035.1 4085.1
## - PaperlessBilling                        1   4035.2 4085.2
## - tenure_bin.x1.2.years                   1   4036.2 4086.2
## - PhoneService                            1   4036.3 4086.3
## <none>                                        4034.4 4086.4
## + PaymentMethod.xMailed.check             1   4034.4 4088.4
## + TechSupport                             1   4034.4 4088.4
## + Partner                                 1   4034.4 4088.4
## - DeviceProtection                        1   4038.9 4088.9
## - MonthlyCharges                          1   4041.7 4091.7
## - tenure_bin.x2.3.years                   1   4042.3 4092.3
## - StreamingTV                             1   4046.7 4096.7
## - StreamingMovies                         1   4049.7 4099.7
## - SeniorCitizen                           1   4050.9 4100.9
## - InternetService.xNo                     1   4051.1 4101.1
## - PaymentMethod.xElectronic.check         1   4051.9 4101.9
## - tenure_bin.x3.4.years                   1   4053.0 4103.0
## - InternetService.xFiber.optic            1   4056.4 4106.4
## - MultipleLines                           1   4056.8 4106.8
## - tenure_bin.x5.6.years                   1   4059.2 4109.2
## - tenure_bin.x4.5.years                   1   4059.5 4109.5
## - Contract.xOne.year                      1   4066.1 4116.1
## - tenure                                  1   4089.4 4139.4
## - Contract.xTwo.year                      1   4092.0 4142.0
## 
## Step:  AIC=4084.43
## Churn ~ tenure + MonthlyCharges + TotalCharges + gender + SeniorCitizen + 
##     Dependents + PhoneService + MultipleLines + InternetService.xFiber.optic + 
##     InternetService.xNo + OnlineSecurity + OnlineBackup + DeviceProtection + 
##     StreamingTV + StreamingMovies + Contract.xOne.year + Contract.xTwo.year + 
##     PaperlessBilling + PaymentMethod.xElectronic.check + tenure_bin.x1.2.years + 
##     tenure_bin.x2.3.years + tenure_bin.x3.4.years + tenure_bin.x4.5.years + 
##     tenure_bin.x5.6.years
## 
##                                          Df Deviance    AIC
## - TotalCharges                            1   4034.5 4082.5
## - gender                                  1   4034.8 4082.8
## - OnlineSecurity                          1   4034.8 4082.8
## - OnlineBackup                            1   4035.1 4083.1
## - Dependents                              1   4035.2 4083.2
## - PaperlessBilling                        1   4035.2 4083.2
## - tenure_bin.x1.2.years                   1   4036.3 4084.3
## - PhoneService                            1   4036.4 4084.4
## <none>                                        4034.4 4084.4
## + PaymentMethod.xCredit.card..automatic.  1   4034.4 4086.4
## + PaymentMethod.xMailed.check             1   4034.4 4086.4
## + TechSupport                             1   4034.4 4086.4
## + Partner                                 1   4034.4 4086.4
## - DeviceProtection                        1   4039.0 4087.0
## - MonthlyCharges                          1   4041.7 4089.7
## - tenure_bin.x2.3.years                   1   4042.3 4090.3
## - StreamingTV                             1   4046.8 4094.8
## - StreamingMovies                         1   4049.7 4097.7
## - SeniorCitizen                           1   4051.0 4099.0
## - InternetService.xNo                     1   4051.2 4099.2
## - tenure_bin.x3.4.years                   1   4053.0 4101.0
## - PaymentMethod.xElectronic.check         1   4054.3 4102.3
## - InternetService.xFiber.optic            1   4056.4 4104.4
## - MultipleLines                           1   4056.9 4104.9
## - tenure_bin.x5.6.years                   1   4059.3 4107.3
## - tenure_bin.x4.5.years                   1   4059.5 4107.5
## - Contract.xOne.year                      1   4066.1 4114.1
## - tenure                                  1   4089.4 4137.4
## - Contract.xTwo.year                      1   4092.1 4140.1
## 
## Step:  AIC=4082.48
## Churn ~ tenure + MonthlyCharges + gender + SeniorCitizen + Dependents + 
##     PhoneService + MultipleLines + InternetService.xFiber.optic + 
##     InternetService.xNo + OnlineSecurity + OnlineBackup + DeviceProtection + 
##     StreamingTV + StreamingMovies + Contract.xOne.year + Contract.xTwo.year + 
##     PaperlessBilling + PaymentMethod.xElectronic.check + tenure_bin.x1.2.years + 
##     tenure_bin.x2.3.years + tenure_bin.x3.4.years + tenure_bin.x4.5.years + 
##     tenure_bin.x5.6.years
## 
##                                          Df Deviance    AIC
## - gender                                  1   4034.8 4080.8
## - OnlineSecurity                          1   4034.8 4080.8
## - OnlineBackup                            1   4035.1 4081.1
## - Dependents                              1   4035.2 4081.2
## - PaperlessBilling                        1   4035.2 4081.2
## - tenure_bin.x1.2.years                   1   4036.3 4082.3
## - PhoneService                            1   4036.4 4082.4
## <none>                                        4034.5 4082.5
## + TotalCharges                            1   4034.4 4084.4
## + PaymentMethod.xCredit.card..automatic.  1   4034.4 4084.4
## + PaymentMethod.xMailed.check             1   4034.5 4084.5
## + TechSupport                             1   4034.5 4084.5
## + Partner                                 1   4034.5 4084.5
## - DeviceProtection                        1   4039.0 4085.0
## - MonthlyCharges                          1   4041.8 4087.8
## - tenure_bin.x2.3.years                   1   4042.4 4088.4
## - StreamingTV                             1   4046.8 4092.8
## - StreamingMovies                         1   4049.7 4095.7
## - SeniorCitizen                           1   4051.0 4097.0
## - InternetService.xNo                     1   4051.3 4097.3
## - tenure_bin.x3.4.years                   1   4053.3 4099.3
## - PaymentMethod.xElectronic.check         1   4054.3 4100.3
## - InternetService.xFiber.optic            1   4056.4 4102.4
## - MultipleLines                           1   4057.0 4103.0
## - tenure_bin.x4.5.years                   1   4060.1 4106.1
## - tenure_bin.x5.6.years                   1   4060.4 4106.4
## - Contract.xOne.year                      1   4066.2 4112.2
## - Contract.xTwo.year                      1   4093.3 4139.3
## - tenure                                  1   4097.4 4143.4
## 
## Step:  AIC=4080.81
## Churn ~ tenure + MonthlyCharges + SeniorCitizen + Dependents + 
##     PhoneService + MultipleLines + InternetService.xFiber.optic + 
##     InternetService.xNo + OnlineSecurity + OnlineBackup + DeviceProtection + 
##     StreamingTV + StreamingMovies + Contract.xOne.year + Contract.xTwo.year + 
##     PaperlessBilling + PaymentMethod.xElectronic.check + tenure_bin.x1.2.years + 
##     tenure_bin.x2.3.years + tenure_bin.x3.4.years + tenure_bin.x4.5.years + 
##     tenure_bin.x5.6.years
## 
##                                          Df Deviance    AIC
## - OnlineSecurity                          1   4035.1 4079.1
## - PaperlessBilling                        1   4035.4 4079.4
## - OnlineBackup                            1   4035.5 4079.5
## - Dependents                              1   4035.6 4079.6
## - tenure_bin.x1.2.years                   1   4036.7 4080.7
## - PhoneService                            1   4036.7 4080.7
## <none>                                        4034.8 4080.8
## + gender                                  1   4034.5 4082.5
## + TotalCharges                            1   4034.8 4082.8
## + PaymentMethod.xCredit.card..automatic.  1   4034.8 4082.8
## + PaymentMethod.xMailed.check             1   4034.8 4082.8
## + TechSupport                             1   4034.8 4082.8
## + Partner                                 1   4034.8 4082.8
## - DeviceProtection                        1   4039.4 4083.4
## - MonthlyCharges                          1   4042.1 4086.1
## - tenure_bin.x2.3.years                   1   4042.8 4086.8
## - StreamingTV                             1   4047.2 4091.2
## - StreamingMovies                         1   4050.2 4094.2
## - SeniorCitizen                           1   4051.4 4095.4
## - InternetService.xNo                     1   4051.7 4095.7
## - tenure_bin.x3.4.years                   1   4053.7 4097.7
## - PaymentMethod.xElectronic.check         1   4054.6 4098.6
## - InternetService.xFiber.optic            1   4056.9 4100.9
## - MultipleLines                           1   4057.4 4101.4
## - tenure_bin.x4.5.years                   1   4060.5 4104.5
## - tenure_bin.x5.6.years                   1   4060.8 4104.8
## - Contract.xOne.year                      1   4066.5 4110.5
## - Contract.xTwo.year                      1   4093.6 4137.6
## - tenure                                  1   4097.9 4141.9
## 
## Step:  AIC=4079.15
## Churn ~ tenure + MonthlyCharges + SeniorCitizen + Dependents + 
##     PhoneService + MultipleLines + InternetService.xFiber.optic + 
##     InternetService.xNo + OnlineBackup + DeviceProtection + StreamingTV + 
##     StreamingMovies + Contract.xOne.year + Contract.xTwo.year + 
##     PaperlessBilling + PaymentMethod.xElectronic.check + tenure_bin.x1.2.years + 
##     tenure_bin.x2.3.years + tenure_bin.x3.4.years + tenure_bin.x4.5.years + 
##     tenure_bin.x5.6.years
## 
##                                          Df Deviance    AIC
## - PaperlessBilling                        1   4035.8 4077.8
## - Dependents                              1   4035.9 4077.9
## - OnlineBackup                            1   4036.7 4078.7
## - tenure_bin.x1.2.years                   1   4037.0 4079.0
## <none>                                        4035.1 4079.1
## + OnlineSecurity                          1   4034.8 4080.8
## + gender                                  1   4034.8 4080.8
## + TechSupport                             1   4034.9 4080.9
## + TotalCharges                            1   4035.1 4081.1
## + PaymentMethod.xCredit.card..automatic.  1   4035.1 4081.1
## + PaymentMethod.xMailed.check             1   4035.1 4081.1
## + Partner                                 1   4035.1 4081.1
## - PhoneService                            1   4039.9 4081.9
## - DeviceProtection                        1   4042.8 4084.8
## - tenure_bin.x2.3.years                   1   4043.2 4085.2
## - SeniorCitizen                           1   4051.7 4093.7
## - MonthlyCharges                          1   4053.0 4095.0
## - tenure_bin.x3.4.years                   1   4054.1 4096.1
## - PaymentMethod.xElectronic.check         1   4055.0 4097.0
## - StreamingTV                             1   4059.0 4101.0
## - tenure_bin.x4.5.years                   1   4061.0 4103.0
## - tenure_bin.x5.6.years                   1   4061.3 4103.3
## - StreamingMovies                         1   4064.3 4106.3
## - InternetService.xNo                     1   4065.8 4107.8
## - Contract.xOne.year                      1   4066.7 4108.7
## - MultipleLines                           1   4067.6 4109.6
## - InternetService.xFiber.optic            1   4084.4 4126.4
## - Contract.xTwo.year                      1   4093.7 4135.7
## - tenure                                  1   4098.7 4140.7
## 
## Step:  AIC=4077.77
## Churn ~ tenure + MonthlyCharges + SeniorCitizen + Dependents + 
##     PhoneService + MultipleLines + InternetService.xFiber.optic + 
##     InternetService.xNo + OnlineBackup + DeviceProtection + StreamingTV + 
##     StreamingMovies + Contract.xOne.year + Contract.xTwo.year + 
##     PaymentMethod.xElectronic.check + tenure_bin.x1.2.years + 
##     tenure_bin.x2.3.years + tenure_bin.x3.4.years + tenure_bin.x4.5.years + 
##     tenure_bin.x5.6.years
## 
##                                          Df Deviance    AIC
## - Dependents                              1   4036.5 4076.5
## - OnlineBackup                            1   4037.2 4077.2
## - tenure_bin.x1.2.years                   1   4037.7 4077.7
## <none>                                        4035.8 4077.8
## + PaperlessBilling                        1   4035.1 4079.1
## + OnlineSecurity                          1   4035.4 4079.4
## + TechSupport                             1   4035.5 4079.5
## + gender                                  1   4035.6 4079.6
## + TotalCharges                            1   4035.7 4079.7
## + PaymentMethod.xCredit.card..automatic.  1   4035.7 4079.7
## + PaymentMethod.xMailed.check             1   4035.7 4079.7
## + Partner                                 1   4035.8 4079.8
## - PhoneService                            1   4040.4 4080.4
## - DeviceProtection                        1   4043.3 4083.3
## - tenure_bin.x2.3.years                   1   4043.9 4083.9
## - SeniorCitizen                           1   4052.2 4092.2
## - MonthlyCharges                          1   4053.5 4093.5
## - tenure_bin.x3.4.years                   1   4054.9 4094.9
## - PaymentMethod.xElectronic.check         1   4055.7 4095.7
## - StreamingTV                             1   4059.3 4099.3
## - tenure_bin.x4.5.years                   1   4061.9 4101.9
## - tenure_bin.x5.6.years                   1   4062.1 4102.1
## - StreamingMovies                         1   4064.6 4104.6
## - InternetService.xNo                     1   4066.2 4106.2
## - Contract.xOne.year                      1   4067.2 4107.2
## - MultipleLines                           1   4067.9 4107.9
## - InternetService.xFiber.optic            1   4084.7 4124.7
## - Contract.xTwo.year                      1   4094.3 4134.3
## - tenure                                  1   4099.6 4139.6
## 
## Step:  AIC=4076.54
## Churn ~ tenure + MonthlyCharges + SeniorCitizen + PhoneService + 
##     MultipleLines + InternetService.xFiber.optic + InternetService.xNo + 
##     OnlineBackup + DeviceProtection + StreamingTV + StreamingMovies + 
##     Contract.xOne.year + Contract.xTwo.year + PaymentMethod.xElectronic.check + 
##     tenure_bin.x1.2.years + tenure_bin.x2.3.years + tenure_bin.x3.4.years + 
##     tenure_bin.x4.5.years + tenure_bin.x5.6.years
## 
##                                          Df Deviance    AIC
## - OnlineBackup                            1   4038.0 4076.0
## - tenure_bin.x1.2.years                   1   4038.5 4076.5
## <none>                                        4036.5 4076.5
## + Dependents                              1   4035.8 4077.8
## + PaperlessBilling                        1   4035.9 4077.9
## + OnlineSecurity                          1   4036.2 4078.2
## + TechSupport                             1   4036.3 4078.3
## + gender                                  1   4036.4 4078.4
## + Partner                                 1   4036.4 4078.4
## + TotalCharges                            1   4036.5 4078.5
## + PaymentMethod.xCredit.card..automatic.  1   4036.5 4078.5
## + PaymentMethod.xMailed.check             1   4036.5 4078.5
## - PhoneService                            1   4041.3 4079.3
## - DeviceProtection                        1   4044.2 4082.2
## - tenure_bin.x2.3.years                   1   4044.7 4082.7
## - MonthlyCharges                          1   4054.4 4092.4
## - SeniorCitizen                           1   4054.5 4092.5
## - tenure_bin.x3.4.years                   1   4055.7 4093.7
## - PaymentMethod.xElectronic.check         1   4056.6 4094.6
## - StreamingTV                             1   4060.3 4098.3
## - tenure_bin.x4.5.years                   1   4062.8 4100.8
## - tenure_bin.x5.6.years                   1   4063.0 4101.0
## - StreamingMovies                         1   4065.7 4103.7
## - InternetService.xNo                     1   4067.3 4105.3
## - Contract.xOne.year                      1   4068.5 4106.5
## - MultipleLines                           1   4068.8 4106.8
## - InternetService.xFiber.optic            1   4086.1 4124.1
## - Contract.xTwo.year                      1   4096.3 4134.3
## - tenure                                  1   4100.8 4138.8
## 
## Step:  AIC=4076.01
## Churn ~ tenure + MonthlyCharges + SeniorCitizen + PhoneService + 
##     MultipleLines + InternetService.xFiber.optic + InternetService.xNo + 
##     DeviceProtection + StreamingTV + StreamingMovies + Contract.xOne.year + 
##     Contract.xTwo.year + PaymentMethod.xElectronic.check + tenure_bin.x1.2.years + 
##     tenure_bin.x2.3.years + tenure_bin.x3.4.years + tenure_bin.x4.5.years + 
##     tenure_bin.x5.6.years
## 
##                                          Df Deviance    AIC
## - tenure_bin.x1.2.years                   1   4039.9 4075.9
## <none>                                        4038.0 4076.0
## + OnlineBackup                            1   4036.5 4076.5
## + OnlineSecurity                          1   4036.8 4076.8
## + Dependents                              1   4037.2 4077.2
## - PhoneService                            1   4041.3 4077.3
## + PaperlessBilling                        1   4037.5 4077.5
## + gender                                  1   4037.8 4077.8
## + Partner                                 1   4037.9 4077.9
## + TotalCharges                            1   4038.0 4078.0
## + PaymentMethod.xCredit.card..automatic.  1   4038.0 4078.0
## + PaymentMethod.xMailed.check             1   4038.0 4078.0
## + TechSupport                             1   4038.0 4078.0
## - DeviceProtection                        1   4044.3 4080.3
## - tenure_bin.x2.3.years                   1   4046.2 4082.2
## - SeniorCitizen                           1   4056.3 4092.3
## - tenure_bin.x3.4.years                   1   4057.4 4093.4
## - MonthlyCharges                          1   4057.4 4093.4
## - PaymentMethod.xElectronic.check         1   4058.5 4094.5
## - StreamingTV                             1   4062.1 4098.1
## - tenure_bin.x4.5.years                   1   4064.7 4100.7
## - tenure_bin.x5.6.years                   1   4064.9 4100.9
## - StreamingMovies                         1   4068.4 4104.4
## - MultipleLines                           1   4069.4 4105.4
## - InternetService.xNo                     1   4071.0 4107.0
## - Contract.xOne.year                      1   4071.5 4107.5
## - InternetService.xFiber.optic            1   4098.4 4134.4
## - Contract.xTwo.year                      1   4101.0 4137.0
## - tenure                                  1   4102.2 4138.2
## 
## Step:  AIC=4075.92
## Churn ~ tenure + MonthlyCharges + SeniorCitizen + PhoneService + 
##     MultipleLines + InternetService.xFiber.optic + InternetService.xNo + 
##     DeviceProtection + StreamingTV + StreamingMovies + Contract.xOne.year + 
##     Contract.xTwo.year + PaymentMethod.xElectronic.check + tenure_bin.x2.3.years + 
##     tenure_bin.x3.4.years + tenure_bin.x4.5.years + tenure_bin.x5.6.years
## 
##                                          Df Deviance    AIC
## <none>                                        4039.9 4075.9
## + tenure_bin.x1.2.years                   1   4038.0 4076.0
## + OnlineBackup                            1   4038.5 4076.5
## + OnlineSecurity                          1   4038.7 4076.7
## + Dependents                              1   4039.2 4077.2
## - PhoneService                            1   4043.2 4077.2
## + PaperlessBilling                        1   4039.3 4077.3
## + gender                                  1   4039.7 4077.7
## + Partner                                 1   4039.8 4077.8
## + PaymentMethod.xCredit.card..automatic.  1   4039.9 4077.9
## + PaymentMethod.xMailed.check             1   4039.9 4077.9
## + TotalCharges                            1   4039.9 4077.9
## + TechSupport                             1   4039.9 4077.9
## - DeviceProtection                        1   4046.2 4080.2
## - tenure_bin.x2.3.years                   1   4048.3 4082.3
## - SeniorCitizen                           1   4058.3 4092.3
## - MonthlyCharges                          1   4059.5 4093.5
## - PaymentMethod.xElectronic.check         1   4060.6 4094.6
## - StreamingTV                             1   4063.9 4097.9
## - StreamingMovies                         1   4070.5 4104.5
## - tenure_bin.x3.4.years                   1   4070.5 4104.5
## - MultipleLines                           1   4071.0 4105.0
## - InternetService.xNo                     1   4073.1 4107.1
## - Contract.xOne.year                      1   4073.4 4107.4
## - tenure_bin.x5.6.years                   1   4088.6 4122.6
## - tenure_bin.x4.5.years                   1   4088.6 4122.6
## - InternetService.xFiber.optic            1   4100.8 4134.8
## - Contract.xTwo.year                      1   4104.6 4138.6
## - tenure                                  1   4172.8 4206.8
summary(logisticRegressionModel_2)
## 
## Call:
## glm(formula = Churn ~ tenure + MonthlyCharges + SeniorCitizen + 
##     PhoneService + MultipleLines + InternetService.xFiber.optic + 
##     InternetService.xNo + DeviceProtection + StreamingTV + StreamingMovies + 
##     Contract.xOne.year + Contract.xTwo.year + PaymentMethod.xElectronic.check + 
##     tenure_bin.x2.3.years + tenure_bin.x3.4.years + tenure_bin.x4.5.years + 
##     tenure_bin.x5.6.years, family = "binomial", data = train)
## 
## Deviance Residuals: 
##     Min       1Q   Median       3Q      Max  
## -2.0192  -0.6820  -0.2846   0.6391   3.1572  
## 
## Coefficients:
##                                 Estimate Std. Error z value Pr(>|z|)    
## (Intercept)                     -4.22546    0.43105  -9.803  < 2e-16 ***
## tenure                          -1.94005    0.17406 -11.146  < 2e-16 ***
## MonthlyCharges                  -1.39722    0.31817  -4.391 1.13e-05 ***
## SeniorCitizen                    0.43003    0.10031   4.287 1.81e-05 ***
## PhoneService                     0.48815    0.26831   1.819  0.06886 .  
## MultipleLines                    0.60072    0.10849   5.537 3.07e-08 ***
## InternetService.xFiber.optic     2.08003    0.27135   7.665 1.78e-14 ***
## InternetService.xNo             -2.00103    0.34864  -5.739 9.50e-09 ***
## DeviceProtection                 0.27377    0.10984   2.492  0.01269 *  
## StreamingTV                      0.70389    0.14465   4.866 1.14e-06 ***
## StreamingMovies                  0.79492    0.14529   5.471 4.47e-08 ***
## Contract.xOne.year              -0.72829    0.12903  -5.644 1.66e-08 ***
## Contract.xTwo.year              -1.55603    0.21379  -7.278 3.38e-13 ***
## PaymentMethod.xElectronic.check  0.38119    0.08339   4.571 4.85e-06 ***
## tenure_bin.x2.3.years            0.55021    0.18992   2.897  0.00377 ** 
## tenure_bin.x3.4.years            1.46331    0.26611   5.499 3.82e-08 ***
## tenure_bin.x4.5.years            2.30721    0.33517   6.884 5.83e-12 ***
## tenure_bin.x5.6.years            2.87120    0.41717   6.883 5.88e-12 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## (Dispersion parameter for binomial family taken to be 1)
## 
##     Null deviance: 5699.5  on 4921  degrees of freedom
## Residual deviance: 4039.9  on 4904  degrees of freedom
## AIC: 4075.9
## 
## Number of Fisher Scoring iterations: 6

Also, we can use Variance Inflation Factor (vif) to remove redundant variables or predictors that have high multicollinearity between them. Multicollinearity exists between two or more predictor variables when they are highly related to each other. As a result, then it becomes tough for us to analyze and understand the impact of independent variable on the dependent variables.

A predictor getting VIF lesser than 2 are considered safe and it can be understood that the predictor is not correlated with other predictor variables in the dataset.

The higher the VIF is, the more signifcant the correlation of the predictor variable is with other predictor variables.

vif(logisticRegressionModel_2)
##                          tenure                  MonthlyCharges 
##                       14.765303                       52.984214 
##                   SeniorCitizen                    PhoneService 
##                        1.085076                        4.168944 
##                   MultipleLines    InternetService.xFiber.optic 
##                        1.898677                       11.696648 
##             InternetService.xNo                DeviceProtection 
##                        7.357807                        1.721037 
##                     StreamingTV                 StreamingMovies 
##                        3.307173                        3.357872 
##              Contract.xOne.year              Contract.xTwo.year 
##                        1.373647                        1.398162 
## PaymentMethod.xElectronic.check           tenure_bin.x2.3.years 
##                        1.134441                        2.424132 
##           tenure_bin.x3.4.years           tenure_bin.x4.5.years 
##                        3.756845                        6.281268 
##           tenure_bin.x5.6.years 
##                        8.415471

We are now removing the DeviceProtection variable since it has high p-value

logisticRegressionModel_3 <-glm(formula = Churn ~ tenure + MonthlyCharges + SeniorCitizen + 
                Partner + InternetService.xFiber.optic + InternetService.xNo + 
                OnlineSecurity + OnlineBackup + TechSupport + 
                StreamingTV + Contract.xOne.year + Contract.xTwo.year + PaperlessBilling + 
                PaymentMethod.xElectronic.check + tenure_bin.x1.2.years + 
                tenure_bin.x5.6.years, family = "binomial", data = train)
summary(logisticRegressionModel_3)
## 
## Call:
## glm(formula = Churn ~ tenure + MonthlyCharges + SeniorCitizen + 
##     Partner + InternetService.xFiber.optic + InternetService.xNo + 
##     OnlineSecurity + OnlineBackup + TechSupport + StreamingTV + 
##     Contract.xOne.year + Contract.xTwo.year + PaperlessBilling + 
##     PaymentMethod.xElectronic.check + tenure_bin.x1.2.years + 
##     tenure_bin.x5.6.years, family = "binomial", data = train)
## 
## Deviance Residuals: 
##     Min       1Q   Median       3Q      Max  
## -1.9480  -0.6785  -0.2977   0.6795   3.1342  
## 
## Coefficients:
##                                 Estimate Std. Error z value Pr(>|z|)    
## (Intercept)                     -1.13909    0.19859  -5.736 9.71e-09 ***
## tenure                          -0.85816    0.07363 -11.654  < 2e-16 ***
## MonthlyCharges                   0.20547    0.15646   1.313 0.189116    
## SeniorCitizen                    0.46288    0.09948   4.653 3.27e-06 ***
## Partner                         -0.04999    0.08506  -0.588 0.556713    
## InternetService.xFiber.optic     0.64672    0.19855   3.257 0.001125 ** 
## InternetService.xNo             -0.88288    0.19095  -4.624 3.77e-06 ***
## OnlineSecurity                  -0.41112    0.10500  -3.915 9.02e-05 ***
## OnlineBackup                    -0.15733    0.09571  -1.644 0.100222    
## TechSupport                     -0.30170    0.10639  -2.836 0.004573 ** 
## StreamingTV                      0.26649    0.11462   2.325 0.020066 *  
## Contract.xOne.year              -0.70924    0.12702  -5.584 2.36e-08 ***
## Contract.xTwo.year              -1.52795    0.21586  -7.078 1.46e-12 ***
## PaperlessBilling                -0.11478    0.12215  -0.940 0.347392    
## PaymentMethod.xElectronic.check  0.41434    0.08247   5.024 5.05e-07 ***
## tenure_bin.x1.2.years           -0.38893    0.10330  -3.765 0.000166 ***
## tenure_bin.x5.6.years            0.45586    0.19036   2.395 0.016631 *  
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## (Dispersion parameter for binomial family taken to be 1)
## 
##     Null deviance: 5699.5  on 4921  degrees of freedom
## Residual deviance: 4108.1  on 4905  degrees of freedom
## AIC: 4142.1
## 
## Number of Fisher Scoring iterations: 6
vif(logisticRegressionModel_3)
##                          tenure                  MonthlyCharges 
##                        2.609193                       12.829118 
##                   SeniorCitizen                         Partner 
##                        1.082235                        1.149315 
##    InternetService.xFiber.optic             InternetService.xNo 
##                        6.401652                        2.252302 
##                  OnlineSecurity                    OnlineBackup 
##                        1.223534                        1.307788 
##                     TechSupport                     StreamingTV 
##                        1.303619                        2.104326 
##              Contract.xOne.year              Contract.xTwo.year 
##                        1.347368                        1.434496 
##                PaperlessBilling PaymentMethod.xElectronic.check 
##                        1.008861                        1.130146 
##           tenure_bin.x1.2.years           tenure_bin.x5.6.years 
##                        1.034265                        1.787422

We are now removing the StreamingTV variable since it has high p-value

logisticRegressionModel_4 <- glm(formula = Churn ~ tenure + MonthlyCharges + SeniorCitizen + 
                 Partner + InternetService.xFiber.optic + InternetService.xNo + 
                 OnlineSecurity + OnlineBackup + TechSupport +  
                 Contract.xOne.year + Contract.xTwo.year + PaperlessBilling + 
                 PaymentMethod.xElectronic.check + tenure_bin.x1.2.years + 
                 tenure_bin.x5.6.years, family = "binomial", data = train)

summary(logisticRegressionModel_4)
## 
## Call:
## glm(formula = Churn ~ tenure + MonthlyCharges + SeniorCitizen + 
##     Partner + InternetService.xFiber.optic + InternetService.xNo + 
##     OnlineSecurity + OnlineBackup + TechSupport + Contract.xOne.year + 
##     Contract.xTwo.year + PaperlessBilling + PaymentMethod.xElectronic.check + 
##     tenure_bin.x1.2.years + tenure_bin.x5.6.years, family = "binomial", 
##     data = train)
## 
## Deviance Residuals: 
##     Min       1Q   Median       3Q      Max  
## -1.9217  -0.6749  -0.2987   0.6852   3.1315  
## 
## Coefficients:
##                                 Estimate Std. Error z value Pr(>|z|)    
## (Intercept)                     -0.93249    0.17679  -5.274 1.33e-07 ***
## tenure                          -0.85359    0.07355 -11.606  < 2e-16 ***
## MonthlyCharges                   0.43511    0.12221   3.560 0.000370 ***
## SeniorCitizen                    0.45674    0.09936   4.597 4.29e-06 ***
## Partner                         -0.05066    0.08500  -0.596 0.551210    
## InternetService.xFiber.optic     0.41937    0.17221   2.435 0.014882 *  
## InternetService.xNo             -0.75129    0.18346  -4.095 4.22e-05 ***
## OnlineSecurity                  -0.45903    0.10297  -4.458 8.28e-06 ***
## OnlineBackup                    -0.19495    0.09431  -2.067 0.038723 *  
## TechSupport                     -0.33342    0.10546  -3.161 0.001570 ** 
## Contract.xOne.year              -0.69931    0.12685  -5.513 3.53e-08 ***
## Contract.xTwo.year              -1.51470    0.21570  -7.022 2.19e-12 ***
## PaperlessBilling                -0.11570    0.12215  -0.947 0.343557    
## PaymentMethod.xElectronic.check  0.42527    0.08228   5.168 2.36e-07 ***
## tenure_bin.x1.2.years           -0.38750    0.10315  -3.757 0.000172 ***
## tenure_bin.x5.6.years            0.44641    0.19012   2.348 0.018870 *  
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## (Dispersion parameter for binomial family taken to be 1)
## 
##     Null deviance: 5699.5  on 4921  degrees of freedom
## Residual deviance: 4113.5  on 4906  degrees of freedom
## AIC: 4145.5
## 
## Number of Fisher Scoring iterations: 6
vif(logisticRegressionModel_4)
##                          tenure                  MonthlyCharges 
##                        2.605061                        7.796109 
##                   SeniorCitizen                         Partner 
##                        1.081037                        1.149391 
##    InternetService.xFiber.optic             InternetService.xNo 
##                        4.821043                        2.081572 
##                  OnlineSecurity                    OnlineBackup 
##                        1.178617                        1.271566 
##                     TechSupport              Contract.xOne.year 
##                        1.284166                        1.346749 
##              Contract.xTwo.year                PaperlessBilling 
##                        1.433403                        1.008819 
## PaymentMethod.xElectronic.check           tenure_bin.x1.2.years 
##                        1.126829                        1.034175 
##           tenure_bin.x5.6.years 
##                        1.784223

logisticRegressionModel_3 has the best set of significant variables, let’s use this model for prediction first

logisticRegressionModel_final <- logisticRegressionModel_3

Model Evaluation using the validation data:

logisticRegression_Pred_result <- predict(logisticRegressionModel_final, type = "response", newdata = validation[,-24])
summary(logisticRegression_Pred_result)
##    Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
## 0.00676 0.05082 0.20036 0.26891 0.44784 0.86188
validation$prob <- logisticRegression_Pred_result

Lets use probability cutoff of 50%.

pred_churn <- factor(ifelse(logisticRegression_Pred_result >= 0.50, "Yes", "No"))
actual_churn <- factor(ifelse(validation$Churn==1,"Yes","No"))
table(actual_churn,pred_churn)
##             pred_churn
## actual_churn   No  Yes
##          No  1399  150
##          Yes  281  280

Let’s find the Accuracy, Sensitivity and Specificity of our model using when the cutoff is at 50%

cutoff_churn <- factor(ifelse(logisticRegression_Pred_result >=0.50, "Yes", "No"))
conf_final <- confusionMatrix(cutoff_churn, actual_churn, positive = "Yes")
accuracy <- conf_final$overall[1]
sensitivity <- conf_final$byClass[1]
specificity <- conf_final$byClass[2]
accuracy
##  Accuracy 
## 0.7957346
sensitivity
## Sensitivity 
##   0.4991087
specificity
## Specificity 
##   0.9031633

As shown above, when we use a cutoff of 0.50, we are yielding a good accuracy and specificity, however the sensitivity is very less. Thus, we need to be able to find the optimal probability cutoff which will yield maximum accuracy, sensitivity, and specificity

perform_fn <- function(cutoff) 
{
  predicted_churn <- factor(ifelse(logisticRegression_Pred_result >= cutoff, "Yes", "No"))
  conf <- confusionMatrix(predicted_churn, actual_churn, positive = "Yes")
  accuray <- conf$overall[1]
  sensitivity <- conf$byClass[1]
  specificity <- conf$byClass[2]
  out <- t(as.matrix(c(sensitivity, specificity, accuray))) 
  colnames(out) <- c("Sensitivity", "Specificity", "Accuracy")
  return(out)
}

options(repr.plot.width =8, repr.plot.height =6)
summary(logisticRegression_Pred_result)
##    Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
## 0.00676 0.05082 0.20036 0.26891 0.44784 0.86188
s = seq(0.01,0.80,length=100)
OUT = matrix(0,100,3)

for(i in 1:100)
{
  OUT[i,] = perform_fn(s[i])
} 

plot(s, OUT[,1],xlab="Cutoff",ylab="Value",cex.lab=1.5,cex.axis=1.5,ylim=c(0,1),
     type="l",lwd=2,axes=FALSE,col=2)
axis(1,seq(0,1,length=5),seq(0,1,length=5),cex.lab=1.5)
axis(2,seq(0,1,length=5),seq(0,1,length=5),cex.lab=1.5)
lines(s,OUT[,2],col="darkgreen",lwd=2)
lines(s,OUT[,3],col=4,lwd=2)
box()
legend("bottom",col=c(2,"darkgreen",4,"darkred"),text.font =3,inset = 0.02,
       box.lty=0,cex = 0.8, 
       lwd=c(2,2,2,2),c("Sensitivity","Specificity","Accuracy"))
abline(v = 0.32, col="red", lwd=1, lty=2)
axis(1, at = seq(0.1, 1, by = 0.1))

Choosing a cutoff value of 0.32% for our final Logistisc Regression model, where the three curves of accuracy, specificty and sensitivity meet

cutoff_churn <- factor(ifelse(logisticRegression_Pred_result >=0.32, "Yes", "No"))
conf_final <- confusionMatrix(cutoff_churn, actual_churn, positive = "Yes")
logisticRegression_accuracy <- conf_final$overall[1]
logisticRegression_sensitivity <- conf_final$byClass[1]
logisticRegression_specificity <- conf_final$byClass[2]
logisticRegression_accuracy
##  Accuracy 
## 0.7701422
logisticRegression_sensitivity
## Sensitivity 
##   0.7557932
logisticRegression_specificity
## Specificity 
##   0.7753389

With a cutoff probability value of 0.32, we are getting better values of accuracy, sensitivity and specificity in the validation data.

2) Decision Tree (Classification)

Preparing the dataset for training and testing

set.seed(123)
data_final$Churn <- as.factor(data_final$Churn)
indices = sample.split(data_final$Churn, SplitRatio = 0.7)
train = data_final[indices,]
validation = data_final[!(indices),]

Training the Decision Tree model using all the predictor variables & performing prediction on the validation dataset

options(repr.plot.width = 10, repr.plot.height = 8)
library(rpart)
library(rpart.plot)

#Training The Decision Tree Model
Dtree_model = rpart(Churn ~., data = train, method = "class")
summary(Dtree_model)
## Call:
## rpart(formula = Churn ~ ., data = train, method = "class")
##   n= 4922 
## 
##          CP nsplit rel error    xerror       xstd
## 1 0.1127676      0 1.0000000 1.0000000 0.02369296
## 2 0.0100000      2 0.7744648 0.7752294 0.02169290
## 
## Variable importance
##                          tenure                    TotalCharges 
##                              27                              24 
##    InternetService.xFiber.optic                  MonthlyCharges 
##                              16                              15 
## PaymentMethod.xElectronic.check                   MultipleLines 
##                               5                               5 
##             InternetService.xNo                         Partner 
##                               4                               2 
##     PaymentMethod.xMailed.check 
##                               1 
## 
## Node number 1: 4922 observations,    complexity param=0.1127676
##   predicted class=0  expected loss=0.2657456  P(node) =1
##     class counts:  3614  1308
##    probabilities: 0.734 0.266 
##   left son=2 (3091 obs) right son=3 (1831 obs)
##   Primary splits:
##       tenure                          < -0.6079294 to the right, improve=230.9873, (0 missing)
##       Contract.xTwo.year              < 0.5        to the right, improve=179.3544, (0 missing)
##       PaymentMethod.xElectronic.check < 0.5        to the left,  improve=178.0319, (0 missing)
##       InternetService.xFiber.optic    < 0.5        to the left,  improve=177.0621, (0 missing)
##       TotalCharges                    < -0.8594936 to the right, improve=104.2756, (0 missing)
##   Surrogate splits:
##       TotalCharges                    < -0.6010313 to the right, agree=0.892, adj=0.708, (0 split)
##       Partner                         < 0.5        to the right, agree=0.654, adj=0.069, (0 split)
##       PaymentMethod.xMailed.check     < 0.5        to the left,  agree=0.643, adj=0.039, (0 split)
##       MonthlyCharges                  < -1.519752  to the right, agree=0.629, adj=0.002, (0 split)
##       PaymentMethod.xElectronic.check < 0.5        to the left,  agree=0.628, adj=0.001, (0 split)
## 
## Node number 2: 3091 observations
##   predicted class=0  expected loss=0.1478486  P(node) =0.6279967
##     class counts:  2634   457
##    probabilities: 0.852 0.148 
## 
## Node number 3: 1831 observations,    complexity param=0.1127676
##   predicted class=0  expected loss=0.4647733  P(node) =0.3720033
##     class counts:   980   851
##    probabilities: 0.535 0.465 
##   left son=6 (1026 obs) right son=7 (805 obs)
##   Primary splits:
##       InternetService.xFiber.optic    < 0.5        to the left,  improve=137.11870, (0 missing)
##       MonthlyCharges                  < 0.1188857  to the left,  improve=116.66510, (0 missing)
##       InternetService.xNo             < 0.5        to the right, improve= 95.88223, (0 missing)
##       PaymentMethod.xElectronic.check < 0.5        to the left,  improve= 66.61460, (0 missing)
##       PaymentMethod.xMailed.check     < 0.5        to the right, improve= 43.01084, (0 missing)
##   Surrogate splits:
##       MonthlyCharges                  < 0.1188857  to the left,  agree=0.973, adj=0.939, (0 split)
##       PaymentMethod.xElectronic.check < 0.5        to the left,  agree=0.701, adj=0.320, (0 split)
##       TotalCharges                    < -0.8083967 to the left,  agree=0.692, adj=0.299, (0 split)
##       MultipleLines                   < 0.5        to the left,  agree=0.691, adj=0.297, (0 split)
##       InternetService.xNo             < 0.5        to the right, agree=0.670, adj=0.248, (0 split)
## 
## Node number 6: 1026 observations
##   predicted class=0  expected loss=0.2933723  P(node) =0.2084518
##     class counts:   725   301
##    probabilities: 0.707 0.293 
## 
## Node number 7: 805 observations
##   predicted class=1  expected loss=0.3167702  P(node) =0.1635514
##     class counts:   255   550
##    probabilities: 0.317 0.683
#Predicting the result of churning on the validation dataset using the trained model
DT_Pred_result <- predict(Dtree_model,type = "class", newdata = validation[,-24])
conf_final <- confusionMatrix(validation$Churn, DT_Pred_result)

DT_accuracy <- conf_final$overall[1]
DT_sensitivity <- conf_final$byClass[1]
DT_specificity <- conf_final$byClass[2]

DT_accuracy 
##  Accuracy 
## 0.7810427
DT_sensitivity 
## Sensitivity 
##    0.804141
DT_specificity 
## Specificity 
##   0.6532508

3) Random Forest Model (Classification)

Preparing the dataset for training and testing

library(randomForest)
set.seed(123)
data_final$Churn <- as.factor(data_final$Churn)

indices = sample.split(data_final$Churn, SplitRatio = 0.7)
train = data_final[indices,]
validation = data_final[!(indices),]

Model Training

model.rf <- randomForest(Churn ~ ., data=train, proximity=FALSE,importance = FALSE,
                        ntree=500,mtry=4, do.trace=FALSE)
model.rf
## 
## Call:
##  randomForest(formula = Churn ~ ., data = train, proximity = FALSE,      importance = FALSE, ntree = 500, mtry = 4, do.trace = FALSE) 
##                Type of random forest: classification
##                      Number of trees: 500
## No. of variables tried at each split: 4
## 
##         OOB estimate of  error rate: 20.44%
## Confusion matrix:
##      0   1 class.error
## 0 3265 349   0.0965689
## 1  657 651   0.5022936

Let’s predict the result on the validation set and check the Confusion Matrix.

RandomForest_Pred_result <- predict(model.rf, newdata=validation[,-24])
table(RandomForest_Pred_result, validation$Churn)
##                         
## RandomForest_Pred_result    0    1
##                        0 1414  298
##                        1  135  263
conf_final <- confusionMatrix(validation$Churn, RandomForest_Pred_result)

RF_accuracy <- conf_final$overall[1]
RF_sensitivity <- conf_final$byClass[1]
RF_specificity <- conf_final$byClass[2]

RF_accuracy 
##  Accuracy 
## 0.7947867
RF_sensitivity 
## Sensitivity 
##   0.8259346
RF_specificity 
## Specificity 
##    0.660804

Variable Importance Plot: As you can see below, this is the Variable Importance Plot which shows the most significant variable or attribute in descending order by mean decrease in Gini. The Mean decrease Gini shows how pure the nodes are at the end of the tree. The Higher the Gini Index is, the better the homogeneity of it.

varImpPlot(model.rf)

Let’s Check the AUC for all three machine learning models. The Area Under the Curve (AUC) is the ability of a model to differentiate between classes and is used as a summary of the ROC curve. The higher the AUC is, the better the performance of the model in differentiating between the positive and negative classes.

The ROC curve basically shows the trade-off between sensitivity (or TPR) and specificity (1 – FPR). Models that result in curves closer to the top-left corner of the graph indicate a better performance.The closer the curve is to the 45-degree diagonal line of the ROC space, the less accurate the model is in terms on performance.

Receiver operating characteristic (ROC) Graph Between The Three Machine Learning Models

options(repr.plot.width =10, repr.plot.height = 8)

glm.roc <- roc(response = validation$Churn, predictor = as.numeric(logisticRegression_Pred_result))
DT.roc <- roc(response = validation$Churn, predictor = as.numeric(DT_Pred_result))
rf.roc <- roc(response = validation$Churn, predictor = as.numeric(RandomForest_Pred_result))

plot(glm.roc,      legacy.axes = TRUE, print.auc.y = 1.0, print.auc = TRUE)
plot(DT.roc, col = "blue", add = TRUE, print.auc.y = 0.65, print.auc = TRUE)
plot(rf.roc, col = "red" , add = TRUE, print.auc.y = 0.85, print.auc = TRUE)
legend("bottom", c("Random Forest", "Decision Tree", "Logistic"),
       lty = c(1,1), lwd = c(2, 2), col = c("red", "blue", "black"), cex = 0.75)

As we can see from the above graph, we can see that the the curve of Logistic Regression model curves closer to the top-left corner which in turn indicates the best performer among the other two models namely Decision Tree and Random Forest.

Hence we can say that Logistic Regression is the most suited machine learning algorithm for our dataset to predict customer churn.

A brief summary for all the 3 models used:

Logistic Regression:

sprintf("Accuracy:  %f", logisticRegression_accuracy)
## [1] "Accuracy:  0.770142"
sprintf("Sensitivity:  %f", logisticRegression_sensitivity)
## [1] "Sensitivity:  0.755793"
sprintf("Specificity:  %f", logisticRegression_specificity)
## [1] "Specificity:  0.775339"

Decision Trees:

sprintf("Accuracy:  %f", DT_accuracy)
## [1] "Accuracy:  0.781043"
sprintf("Sensitivity:  %f", DT_sensitivity)
## [1] "Sensitivity:  0.804141"
sprintf("Specificity:  %f", DT_specificity)
## [1] "Specificity:  0.653251"

Random Forest:

sprintf("Accuracy:  %f", RF_accuracy)
## [1] "Accuracy:  0.794787"
sprintf("Sensitivity:  %f", RF_sensitivity)
## [1] "Sensitivity:  0.825935"
sprintf("Specificity:  %f", RF_specificity)
## [1] "Specificity:  0.660804"

Conclusion:

Based on the exploratory analysis and modelling which has been done, we can come up with few suggestion to minimise customer attrition and to maximise profit.

1)Target more on young and middle-aged customers since they are more likely to adopt modern technology and have the budget to enjoy.

2) Offer more discount for the customers who decide to choose the one year or two-year contract so that more customers will be bound with the contract.

3) Consider an overall discount since the price is always one of the major factors for customers to choose among existing incumbents.