Introduction

Overview

For this project, our team decided to perform logistic regression on customer data. Our data set concerns customer retention, and our dependent variable is a dummy variable indicating whether a customer is retained or has left. In this report we walk through our data (including visualizations), exploratory data analysis, model building, and conclusions.

The Data

The data selected for this project is a data set from Kaggle called “Telco Customer Churn”. It was originally an IBM sample data set. As stated in the overview, the data is concerned with whether customers are retained; this is recorded in a dummy variable called Churn. Churn takes a value of 1 if the customer churned (left) or 0 if the customer was retained. The data set also contains 19 predictor variables and an identification column. The predictor variables fall into three broad categories: services each customer has, customer account information, and customer demographics. These variables will be explored in greater depth below.

Exploratory Data Analysis

Initial Wrangling

In this section of the report we will be exploring and explaining the data.

# load necessary packages
library(dplyr)
library(tidyverse)
library(fastDummies)
library(skimr)
library(lares)
library(caTools)
library(caret)
# read in the data
file<-read.csv("C:/Users/Dhruv Cairae/Desktop/WA_Fn-UseC_-Telco-Customer-Churn.csv",header=T) # reads in the data using read.csv()
summary(file) # summary Statistics
##       customerID      gender     SeniorCitizen    Partner    Dependents
##  0002-ORFBO:   1   Female:3488   Min.   :0.0000   No :3641   No :4933  
##  0003-MKNFE:   1   Male  :3555   1st Qu.:0.0000   Yes:3402   Yes:2110  
##  0004-TLHLJ:   1                 Median :0.0000                        
##  0011-IGKFF:   1                 Mean   :0.1621                        
##  0013-EXCHZ:   1                 3rd Qu.:0.0000                        
##  0013-MHZWF:   1                 Max.   :1.0000                        
##  (Other)   :7037                                                       
##      tenure      PhoneService          MultipleLines     InternetService
##  Min.   : 0.00   No : 682     No              :3390   DSL        :2421  
##  1st Qu.: 9.00   Yes:6361     No phone service: 682   Fiber optic:3096  
##  Median :29.00                Yes             :2971   No         :1526  
##  Mean   :32.37                                                          
##  3rd Qu.:55.00                                                          
##  Max.   :72.00                                                          
##                                                                         
##              OnlineSecurity              OnlineBackup 
##  No                 :3498   No                 :3088  
##  No internet service:1526   No internet service:1526  
##  Yes                :2019   Yes                :2429  
##                                                       
##                                                       
##                                                       
##                                                       
##             DeviceProtection              TechSupport  
##  No                 :3095    No                 :3473  
##  No internet service:1526    No internet service:1526  
##  Yes                :2422    Yes                :2044  
##                                                        
##                                                        
##                                                        
##                                                        
##               StreamingTV              StreamingMovies           Contract   
##  No                 :2810   No                 :2785   Month-to-month:3875  
##  No internet service:1526   No internet service:1526   One year      :1473  
##  Yes                :2707   Yes                :2732   Two year      :1695  
##                                                                             
##                                                                             
##                                                                             
##                                                                             
##  PaperlessBilling                   PaymentMethod  MonthlyCharges  
##  No :2872         Bank transfer (automatic):1544   Min.   : 18.25  
##  Yes:4171         Credit card (automatic)  :1522   1st Qu.: 35.50  
##                   Electronic check         :2365   Median : 70.35  
##                   Mailed check             :1612   Mean   : 64.76  
##                                                    3rd Qu.: 89.85  
##                                                    Max.   :118.75  
##                                                                    
##   TotalCharges    Churn     
##  Min.   :  18.8   No :5174  
##  1st Qu.: 401.4   Yes:1869  
##  Median :1397.5             
##  Mean   :2283.3             
##  3rd Qu.:3794.7             
##  Max.   :8684.8             
##  NA's   :11
str(file) # simple structure
## 'data.frame':    7043 obs. of  21 variables:
##  $ customerID      : Factor w/ 7043 levels "0002-ORFBO","0003-MKNFE",..: 5376 3963 2565 5536 6512 6552 1003 4771 5605 4535 ...
##  $ gender          : Factor w/ 2 levels "Female","Male": 1 2 2 2 1 1 2 1 1 2 ...
##  $ SeniorCitizen   : int  0 0 0 0 0 0 0 0 0 0 ...
##  $ Partner         : Factor w/ 2 levels "No","Yes": 2 1 1 1 1 1 1 1 2 1 ...
##  $ Dependents      : Factor w/ 2 levels "No","Yes": 1 1 1 1 1 1 2 1 1 2 ...
##  $ tenure          : int  1 34 2 45 2 8 22 10 28 62 ...
##  $ PhoneService    : Factor w/ 2 levels "No","Yes": 1 2 2 1 2 2 2 1 2 2 ...
##  $ MultipleLines   : Factor w/ 3 levels "No","No phone service",..: 2 1 1 2 1 3 3 2 3 1 ...
##  $ InternetService : Factor w/ 3 levels "DSL","Fiber optic",..: 1 1 1 1 2 2 2 1 2 1 ...
##  $ OnlineSecurity  : Factor w/ 3 levels "No","No internet service",..: 1 3 3 3 1 1 1 3 1 3 ...
##  $ OnlineBackup    : Factor w/ 3 levels "No","No internet service",..: 3 1 3 1 1 1 3 1 1 3 ...
##  $ DeviceProtection: Factor w/ 3 levels "No","No internet service",..: 1 3 1 3 1 3 1 1 3 1 ...
##  $ TechSupport     : Factor w/ 3 levels "No","No internet service",..: 1 1 1 3 1 1 1 1 3 1 ...
##  $ StreamingTV     : Factor w/ 3 levels "No","No internet service",..: 1 1 1 1 1 3 3 1 3 1 ...
##  $ StreamingMovies : Factor w/ 3 levels "No","No internet service",..: 1 1 1 1 1 3 1 1 3 1 ...
##  $ Contract        : Factor w/ 3 levels "Month-to-month",..: 1 2 1 2 1 1 1 1 1 2 ...
##  $ PaperlessBilling: Factor w/ 2 levels "No","Yes": 2 1 2 1 2 2 2 1 2 1 ...
##  $ PaymentMethod   : Factor w/ 4 levels "Bank transfer (automatic)",..: 3 4 4 1 3 3 2 4 3 1 ...
##  $ MonthlyCharges  : num  29.9 57 53.9 42.3 70.7 ...
##  $ TotalCharges    : num  29.9 1889.5 108.2 1840.8 151.7 ...
##  $ Churn           : Factor w/ 2 levels "No","Yes": 1 1 2 1 2 2 1 1 2 1 ...
head(file) # first 6 rows
##   customerID gender SeniorCitizen Partner Dependents tenure PhoneService
## 1 7590-VHVEG Female             0     Yes         No      1           No
## 2 5575-GNVDE   Male             0      No         No     34          Yes
## 3 3668-QPYBK   Male             0      No         No      2          Yes
## 4 7795-CFOCW   Male             0      No         No     45           No
## 5 9237-HQITU Female             0      No         No      2          Yes
## 6 9305-CDSKC Female             0      No         No      8          Yes
##      MultipleLines InternetService OnlineSecurity OnlineBackup DeviceProtection
## 1 No phone service             DSL             No          Yes               No
## 2               No             DSL            Yes           No              Yes
## 3               No             DSL            Yes          Yes               No
## 4 No phone service             DSL            Yes           No              Yes
## 5               No     Fiber optic             No           No               No
## 6              Yes     Fiber optic             No           No              Yes
##   TechSupport StreamingTV StreamingMovies       Contract PaperlessBilling
## 1          No          No              No Month-to-month              Yes
## 2          No          No              No       One year               No
## 3          No          No              No Month-to-month              Yes
## 4         Yes          No              No       One year               No
## 5          No          No              No Month-to-month              Yes
## 6          No         Yes             Yes Month-to-month              Yes
##               PaymentMethod MonthlyCharges TotalCharges Churn
## 1          Electronic check          29.85        29.85    No
## 2              Mailed check          56.95      1889.50    No
## 3              Mailed check          53.85       108.15   Yes
## 4 Bank transfer (automatic)          42.30      1840.75    No
## 5          Electronic check          70.70       151.65   Yes
## 6          Electronic check          99.65       820.50   Yes

Based on the output of the summary, structure, and head functions, we can see that the variables are either numeric or categorical. Many of the categorical variables are stored as text labels such as "Yes" and "No" (we will need to transform these into numeric dummy variables), and we also need to check for missing values.

table(is.na(file)) # check for missing values in the data frame
## 
##  FALSE   TRUE 
## 147892     11

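Our table of missing values shows that only 11 values are missing out of nearly 150,000 data points; the summary above shows that all 11 are in TotalCharges. This is negligible and should not interfere with our model. Based on the data dictionary we have access to, we know that the customerID variable is unique for every row, so it carries no predictive information. Our next step is to remove this column, along with the rows that contain missing values, from our data set.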
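As a small illustration of the missing-value check, the sketch below counts missing values per column rather than over the whole data frame, which makes it easier to see where the gaps are. The data frame `toy` and its values are made up for this example and are not taken from the Telco data.

```r
# Toy data frame standing in for the Telco data (values are made up).
toy <- data.frame(
  tenure       = c(1, 34, 0, 45),
  TotalCharges = c(29.85, 1889.50, NA, 1840.75),
  Churn        = c("No", "No", "Yes", "No")
)

# Count missing values per column instead of over the whole data frame.
na_counts <- colSums(is.na(toy))
print(na_counts)

# Keep only the rows without any missing value, as na.omit() does.
toy_complete <- na.omit(toy)
print(nrow(toy_complete))
```

Counting by column is a cheap extra step; it would show directly which variable is responsible for the missing entries before rows are dropped.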

file_1<-file[-c(1)] # drop the first column (customerID)
file_2<-na.omit(file_1) # drop the rows that contain missing values

Now that we have removed the first column, the next step is to transform the categorical variables into dummy (0/1) columns. This will make it easier to perform our logistic regression in the model building section. The numeric variables are left alone.

file_3 <- dummy_cols(file_2, select_columns = c('gender','Partner','Dependents','PhoneService','MultipleLines',
                                              'InternetService','OnlineSecurity','OnlineBackup','DeviceProtection',
                                              'TechSupport','StreamingTV','StreamingMovies',
                                              'PaperlessBilling'),remove_selected_columns = TRUE) # creates one 0/1 dummy column per level of each selected variable

After creating these dummy columns, we keep only the ones we need, code the payment method, contract type, and churn variables by hand, and remove the unnecessary columns to make our analysis easier.

dataset<-subset(file_3,select = c(SeniorCitizen,tenure,MonthlyCharges,TotalCharges,gender_Male,Partner_Yes,Dependents_Yes,
                                 PhoneService_Yes,MultipleLines_Yes,InternetService_DSL,InternetService_No,OnlineSecurity_Yes,OnlineBackup_Yes,DeviceProtection_Yes,TechSupport_Yes,StreamingTV_Yes,
                                 StreamingMovies_Yes,PaperlessBilling_Yes)) # subset of only necessary data
dataset$creditcard<- ifelse(file_3$PaymentMethod=="Credit card (automatic)", 1, 0)  # Creates new columns using ifelse()                          
dataset$banktransfer<- ifelse(file_3$PaymentMethod=="Bank transfer (automatic)", 1, 0)
dataset$ec<- ifelse(file_3$PaymentMethod=="Electronic check", 1, 0)
dataset$monthlycontract<- ifelse(file_3$Contract=="Month-to-month", 1, 0)
dataset$annual<- ifelse(file_3$Contract=="One year", 1, 0)
dataset$Churn_Yes<- ifelse(file_3$Churn=="Yes", 1, 0)
skim(dataset) # summary statistics similar to summary()
Data summary
Name dataset
Number of rows 7032
Number of columns 24
_______________________
Column type frequency:
numeric 24
________________________
Group variables None

Variable type: numeric

skim_variable n_missing complete_rate mean sd p0 p25 p50 p75 p100 hist
SeniorCitizen 0 1 0.16 0.37 0.00 0.00 0.00 0.00 1.00 ▇▁▁▁▂
tenure 0 1 32.42 24.55 1.00 9.00 29.00 55.00 72.00 ▇▃▃▃▅
MonthlyCharges 0 1 64.80 30.09 18.25 35.59 70.35 89.86 118.75 ▇▅▆▇▅
TotalCharges 0 1 2283.30 2266.77 18.80 401.45 1397.47 3794.74 8684.80 ▇▂▂▂▁
gender_Male 0 1 0.50 0.50 0.00 0.00 1.00 1.00 1.00 ▇▁▁▁▇
Partner_Yes 0 1 0.48 0.50 0.00 0.00 0.00 1.00 1.00 ▇▁▁▁▇
Dependents_Yes 0 1 0.30 0.46 0.00 0.00 0.00 1.00 1.00 ▇▁▁▁▃
PhoneService_Yes 0 1 0.90 0.30 0.00 1.00 1.00 1.00 1.00 ▁▁▁▁▇
MultipleLines_Yes 0 1 0.42 0.49 0.00 0.00 0.00 1.00 1.00 ▇▁▁▁▆
InternetService_DSL 0 1 0.34 0.47 0.00 0.00 0.00 1.00 1.00 ▇▁▁▁▅
InternetService_No 0 1 0.22 0.41 0.00 0.00 0.00 0.00 1.00 ▇▁▁▁▂
OnlineSecurity_Yes 0 1 0.29 0.45 0.00 0.00 0.00 1.00 1.00 ▇▁▁▁▃
OnlineBackup_Yes 0 1 0.34 0.48 0.00 0.00 0.00 1.00 1.00 ▇▁▁▁▅
DeviceProtection_Yes 0 1 0.34 0.48 0.00 0.00 0.00 1.00 1.00 ▇▁▁▁▅
TechSupport_Yes 0 1 0.29 0.45 0.00 0.00 0.00 1.00 1.00 ▇▁▁▁▃
StreamingTV_Yes 0 1 0.38 0.49 0.00 0.00 0.00 1.00 1.00 ▇▁▁▁▅
StreamingMovies_Yes 0 1 0.39 0.49 0.00 0.00 0.00 1.00 1.00 ▇▁▁▁▅
PaperlessBilling_Yes 0 1 0.59 0.49 0.00 0.00 1.00 1.00 1.00 ▆▁▁▁▇
creditcard 0 1 0.22 0.41 0.00 0.00 0.00 0.00 1.00 ▇▁▁▁▂
banktransfer 0 1 0.22 0.41 0.00 0.00 0.00 0.00 1.00 ▇▁▁▁▂
ec 0 1 0.34 0.47 0.00 0.00 0.00 1.00 1.00 ▇▁▁▁▅
monthlycontract 0 1 0.55 0.50 0.00 0.00 1.00 1.00 1.00 ▆▁▁▁▇
annual 0 1 0.21 0.41 0.00 0.00 0.00 0.00 1.00 ▇▁▁▁▂
Churn_Yes 0 1 0.27 0.44 0.00 0.00 0.00 1.00 1.00 ▇▁▁▁▃

The summary of our new data set demonstrates how our transformations have affected our original data. This summary can be compared to the summary created in the first part of this section.
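The dummy coding used in this section can be illustrated on a toy data frame. This is only a sketch: the data frame `toy` and its values are invented, and the column names simply mirror the ones in our data set.

```r
# Toy data (made-up values) with one two-level and one three-level variable.
toy <- data.frame(
  Partner  = c("Yes", "No", "No", "Yes"),
  Contract = c("Month-to-month", "One year", "Two year", "Month-to-month"),
  stringsAsFactors = TRUE
)

# Two-level variable: a single 0/1 column is enough.
toy$Partner_Yes <- ifelse(toy$Partner == "Yes", 1, 0)

# Three-level variable: two 0/1 columns. "Two year" is the reference level
# (both columns equal 0), which avoids perfectly collinear predictors.
toy$monthlycontract <- ifelse(toy$Contract == "Month-to-month", 1, 0)
toy$annual          <- ifelse(toy$Contract == "One year", 1, 0)

print(toy)
```

Leaving one level out as the reference is the reason a variable with three levels, such as Contract, appears as only two columns in the final data set.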

Initial Model Building

Correlation Analysis & Visualizations of top 3 Variables of Interest

We perform a correlation analysis to find the 10 variables most strongly correlated with churn. From there we narrow the list down to the top 3 variables of interest.

corr_var(dataset, # name of dataset
         Churn_Yes, # name of variable to focus on
         top = 10 # display top 10 correlations
) # correlation analysis

Based on the correlation analysis, the most important variables in relation to Churn_Yes (the customer has churned) are monthlycontract (the customer's contract is month-to-month), tenure (the number of months the customer has stayed with the company), and ec (the customer pays by electronic check). The variables ec and monthlycontract are highly positively correlated with churn, while tenure is highly negatively correlated. The tenure result makes logical sense: the longer customers are with the company, the less likely they are to leave. We have graphed the three most important variables to make viewing them easier.

hist(file$tenure,main="Histogram of Tenure",freq = FALSE) #histogram of tenure
lines(density(file$tenure), lwd=5, col='blue') 

ggplot(file, aes(x = Churn)) + #ggplot of churn
  geom_bar(fill=c('green','red'))+
  theme_minimal()+
  ggtitle("Plot of Churn")+
  labs(x = "Churn", y = "Count")

ggplot(file, aes(x = Churn,fill=Contract)) + # ggplot of churn by contracts
  geom_bar()+
  ggtitle("Plot of Churn by Contracts")+
  theme_minimal()+
  labs(x = "Churn")

ggplot(file, aes(x = Churn,fill=PaymentMethod)) + # ggplot of churn by payment method
  geom_bar()+
  ggtitle("Plot of Churn by Payment Method")+
  theme_minimal()+
  labs(x = "Churn")
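The correlation ranking used in this section can also be sketched in base R without the lares package. The example below is an illustration on simulated data, not a reproduction of our results: the variable names mirror the report, while the values, the `noise` column, and the coefficients used to simulate churn are assumptions made for the example.

```r
set.seed(1)
n <- 200
# Simulated stand-in data: the names mirror the report, the values are random.
tenure <- sample(1:72, n, replace = TRUE)
monthlycontract <- rbinom(n, 1, 0.55)
noise <- rnorm(n)
# Churn is made more likely for short tenure and month-to-month contracts.
p <- plogis(-1 - 0.05 * tenure + 1.5 * monthlycontract)
Churn_Yes <- rbinom(n, 1, p)
toy <- data.frame(tenure, monthlycontract, noise, Churn_Yes)

# Correlation of every predictor with the target, sorted by absolute size.
cors <- cor(toy[, names(toy) != "Churn_Yes"], toy$Churn_Yes)[, 1]
ranked <- cors[order(abs(cors), decreasing = TRUE)]
print(round(ranked, 2))
```

Sorting by absolute value keeps strong negative correlations, such as the one we see for tenure, near the top of the list alongside the strong positive ones.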

Model Development

In this section we develop a model to predict whether a customer leaves the company. First we split the data into training (80%) and testing (20%) sets. Then we fit XGBoost to the training set and use 5-fold cross-validation to validate our predicted results.

set.seed(100)
split = sample.split(dataset$Churn_Yes, SplitRatio = 0.8) # create testing and training datasets
training_set = subset(dataset, split == TRUE)
test_set = subset(dataset, split == FALSE)
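# The split above is made on the outcome variable so that both parts keep a similar share of churners.

As a rough sketch of what a split that preserves class shares looks like, the base R example below samples 80% of the rows separately within each class. It is not the caTools implementation; the outcome vector `y` is simulated and the helper names are made up for this illustration.

```r
set.seed(100)
# Made-up binary outcome with roughly 27% positives, like Churn_Yes.
y <- rbinom(1000, 1, 0.27)

# Draw 80% of the row indices separately within each class.
train_idx <- unlist(lapply(split(seq_along(y), y), function(idx) {
  idx[sample.int(length(idx), size = round(0.8 * length(idx)))]
}))
in_train <- seq_along(y) %in% train_idx

# The class shares in both parts stay close to the overall share.
print(c(all = mean(y), train = mean(y[in_train]), test = mean(y[!in_train])))
```

Keeping the churn share similar in both parts matters here because churners are the minority class, and a purely random split could leave the test set with noticeably more or fewer of them.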

# Fitting XGBoost to the Training set & K-Fold Cross Validation
library(xgboost)
classifier = xgboost(data = as.matrix(training_set[-24]), label = training_set$Churn_Yes, nrounds = 10)
## [1]  train-rmse:0.434445 
## [2]  train-rmse:0.395380 
## [3]  train-rmse:0.372658 
## [4]  train-rmse:0.357991 
## [5]  train-rmse:0.348013 
## [6]  train-rmse:0.341597 
## [7]  train-rmse:0.335460 
## [8]  train-rmse:0.331614 
## [9]  train-rmse:0.327296 
## [10] train-rmse:0.324736
# Predicting the Test set results
y_pred = predict(classifier, newdata = as.matrix(test_set[-24]))
y_pred = (y_pred >= 0.5)
cm = table(test_set[, 24], y_pred)
cm
##    y_pred
##     FALSE TRUE
##   0   942   91
##   1   174  200
library(caret)
folds = createFolds(training_set$Churn_Yes, k = 5)
cv = lapply(folds, function(x) {
  training_fold = training_set[-x, ]
  test_fold = training_set[x, ]
  classifier = xgboost(data = as.matrix(training_fold[-24]), label = training_fold$Churn_Yes, nrounds = 5)
  y_pred = predict(classifier, newdata = as.matrix(test_fold[-24]))
  y_pred = (y_pred >= 0.5)
  cm = table(test_fold[, 24], y_pred)
  accuracy = (cm[1,1] + cm[2,2]) / (cm[1,1] + cm[2,2] + cm[1,2] + cm[2,1])
  return(accuracy)
})
## [1]  train-rmse:0.433763 
## [2]  train-rmse:0.394571 
## [3]  train-rmse:0.371047 
## [4]  train-rmse:0.355262 
## [5]  train-rmse:0.345286 
## [1]  train-rmse:0.433647 
## [2]  train-rmse:0.394210 
## [3]  train-rmse:0.370405 
## [4]  train-rmse:0.355088 
## [5]  train-rmse:0.345524 
## [1]  train-rmse:0.433397 
## [2]  train-rmse:0.393363 
## [3]  train-rmse:0.369611 
## [4]  train-rmse:0.355079 
## [5]  train-rmse:0.345474 
## [1]  train-rmse:0.432931 
## [2]  train-rmse:0.393024 
## [3]  train-rmse:0.369668 
## [4]  train-rmse:0.354796 
## [5]  train-rmse:0.344006 
## [1]  train-rmse:0.431996 
## [2]  train-rmse:0.392834 
## [3]  train-rmse:0.368289 
## [4]  train-rmse:0.353862 
## [5]  train-rmse:0.343629
accuracy = mean(as.numeric(cv))
accuracy
## [1] 0.7946667

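Based on the test-set confusion matrix (1,142 of 1,407 customers classified correctly, about 81%) and the cross-validated accuracy of about 79%, this appears to be a good model for our predictive purposes. Accuracy alone can flatter a model when one class dominates, so these figures should be read against the share of customers who did not churn (about 73%).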
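To look beyond accuracy, the sketch below recomputes a few standard metrics from the test-set confusion matrix printed above. The four counts are copied from that output; the metric names and the majority-class baseline are our additions for illustration.

```r
# Confusion matrix copied from the test-set output above
# (rows = actual class, columns = predicted class).
cm <- matrix(c(942, 174, 91, 200), nrow = 2,
             dimnames = list(actual = c("0", "1"),
                             predicted = c("FALSE", "TRUE")))

accuracy  <- sum(diag(cm)) / sum(cm)             # share of correct predictions
recall    <- cm["1", "TRUE"] / sum(cm["1", ])    # churners that were caught
precision <- cm["1", "TRUE"] / sum(cm[, "TRUE"]) # predicted churners that churned
baseline  <- max(rowSums(cm)) / sum(cm)          # always predicting the majority class

print(round(c(accuracy = accuracy, recall = recall,
              precision = precision, baseline = baseline), 3))
```

On these counts the model catches 200 of the 374 churners in the test set, so recall is a useful companion to accuracy if the goal is to find customers who are about to leave.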

Another modeling attempt using glm (Logistic Regression)

In this section we test several glm models: the full model, the null model, a model using the 3 predictor variables from the previous section, and models selected by stepwise AIC and stepwise BIC.
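Stepwise selection by AIC and BIC can be sketched with the step() function from base R. The example below runs on simulated data and is only an illustration of the technique; the data frame `toy`, the predictors `x1` to `x4`, and the choice of backward elimination are assumptions made for this sketch.

```r
set.seed(42)
n <- 500
# Simulated predictors: x1 and x2 drive the outcome, x3 and x4 are pure noise.
toy <- data.frame(x1 = rnorm(n), x2 = rnorm(n), x3 = rnorm(n), x4 = rnorm(n))
toy$y <- rbinom(n, 1, plogis(-0.5 + 1.2 * toy$x1 - 0.8 * toy$x2))

full_fit <- glm(y ~ ., family = binomial, data = toy)

# Backward elimination with the AIC penalty (k = 2 is the default).
aic_fit <- step(full_fit, direction = "backward", trace = 0)

# The same search with the BIC penalty, k = log(number of observations).
bic_fit <- step(full_fit, direction = "backward", k = log(nrow(toy)), trace = 0)

print(formula(aic_fit))
print(formula(bic_fit))
print(c(AIC_full = AIC(full_fit), AIC_step = AIC(aic_fit),
        BIC_full = BIC(full_fit), BIC_step = BIC(bic_fit)))
```

Because the BIC penalty per coefficient grows with the sample size, the BIC search tends to keep fewer variables than the AIC search. Note that step() labels its criterion column "AIC" even when k = log(n) is used.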

## glm Model on all variables
full_model <- glm(Churn_Yes ~ ., family = binomial, data = training_set)
full_model_summary <- summary(full_model)

full_model_summary$deviance/full_model_summary$df.residual # in-sample model mean residual deviance
## [1] 0.8298638
AIC(full_model) #AIC
## [1] 4696.067
BIC(full_model) #BIC
## [1] 4855.307
## glm Model on no variables
null_model <- glm(Churn_Yes ~ 1, family = binomial, data = training_set)
null_model_summary <- summary(null_model)

null_model_summary$deviance/null_model_summary$df.residual # in-sample model mean residual deviance
## [1] 1.158234
AIC(null_model)
## [1] 6515.907
BIC(null_model)
## [1] 6522.542
## glm Model on the monthlycontract, tenure, and ec variables
glm_model <- glm(Churn_Yes ~monthlycontract+tenure+ec, family = binomial, data = training_set)
glm_model_summary <- summary(glm_model)

glm_model_summary$deviance/glm_model_summary$df.residual # in-sample model mean residual deviance
## [1] 0.9191821
AIC(glm_model)
## [1] 5174.722
BIC(glm_model)
## [1] 5201.262
glm_model_summary$df.residual
## [1] 5621
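
The quantities reported for each model are closely related. For a 0/1 response, AIC equals the residual deviance plus 2 times the number of coefficients, and BIC replaces the 2 with log(n). This is consistent with the full model above, where the gap of about 48 between the AIC (4696.067) and the deviance (4648.1) matches 2 x 24 coefficients (23 predictors plus the intercept). The sketch below checks the identity on simulated data; the data frame `toy` and its coefficients are made up for this illustration.

```r
set.seed(7)
n <- 300
# Simulated 0/1 outcome with two predictors (made-up data).
toy <- data.frame(x1 = rnorm(n), x2 = rnorm(n))
toy$y <- rbinom(n, 1, plogis(0.3 + toy$x1))

fit <- glm(y ~ x1 + x2, family = binomial, data = toy)
k <- length(coef(fit))   # number of estimated coefficients, intercept included

# For a 0/1 response the residual deviance equals -2 * log-likelihood, so:
aic_by_hand <- deviance(fit) + 2 * k
bic_by_hand <- deviance(fit) + log(n) * k

print(c(AIC = AIC(fit), by_hand = aic_by_hand))
print(c(BIC = BIC(fit), by_hand = bic_by_hand))

# Mean residual deviance, the quantity reported for each model above.
print(deviance(fit) / df.residual(fit))
```

The same relationship explains the stepwise tables: in each row the AIC column is the deviance plus the penalty for the coefficients that remain in the model.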
## Start:  AIC=4696.07
## Churn_Yes ~ SeniorCitizen + tenure + MonthlyCharges + TotalCharges + 
##     gender_Male + Partner_Yes + Dependents_Yes + PhoneService_Yes + 
##     MultipleLines_Yes + InternetService_DSL + InternetService_No + 
##     OnlineSecurity_Yes + OnlineBackup_Yes + DeviceProtection_Yes + 
##     TechSupport_Yes + StreamingTV_Yes + StreamingMovies_Yes + 
##     PaperlessBilling_Yes + creditcard + banktransfer + ec + monthlycontract + 
##     annual
## 
##                        Df Deviance    AIC
## - banktransfer          1   4648.1 4694.1
## - OnlineBackup_Yes      1   4648.1 4694.1
## - Partner_Yes           1   4648.1 4694.1
## - gender_Male           1   4648.1 4694.1
## - PhoneService_Yes      1   4648.2 4694.2
## - DeviceProtection_Yes  1   4648.6 4694.6
## - OnlineSecurity_Yes    1   4648.9 4694.9
## - creditcard            1   4648.9 4694.9
## - MonthlyCharges        1   4649.3 4695.3
## - TechSupport_Yes       1   4649.3 4695.3
## <none>                      4648.1 4696.1
## - StreamingTV_Yes       1   4650.4 4696.4
## - StreamingMovies_Yes   1   4650.6 4696.6
## - Dependents_Yes        1   4651.0 4697.0
## - InternetService_DSL   1   4651.6 4697.6
## - SeniorCitizen         1   4651.6 4697.6
## - InternetService_No    1   4651.7 4697.7
## - MultipleLines_Yes     1   4652.6 4698.6
## - ec                    1   4658.0 4704.0
## - annual                1   4661.1 4707.1
## - PaperlessBilling_Yes  1   4663.4 4709.4
## - TotalCharges          1   4671.3 4717.3
## - monthlycontract       1   4714.9 4760.9
## - tenure                1   4740.9 4786.9
## 
## Step:  AIC=4694.08
## Churn_Yes ~ SeniorCitizen + tenure + MonthlyCharges + TotalCharges + 
##     gender_Male + Partner_Yes + Dependents_Yes + PhoneService_Yes + 
##     MultipleLines_Yes + InternetService_DSL + InternetService_No + 
##     OnlineSecurity_Yes + OnlineBackup_Yes + DeviceProtection_Yes + 
##     TechSupport_Yes + StreamingTV_Yes + StreamingMovies_Yes + 
##     PaperlessBilling_Yes + creditcard + ec + monthlycontract + 
##     annual
## 
##                        Df Deviance    AIC
## - OnlineBackup_Yes      1   4648.1 4692.1
## - Partner_Yes           1   4648.1 4692.1
## - gender_Male           1   4648.2 4692.2
## - PhoneService_Yes      1   4648.2 4692.2
## - DeviceProtection_Yes  1   4648.6 4692.6
## - OnlineSecurity_Yes    1   4648.9 4692.9
## - MonthlyCharges        1   4649.3 4693.3
## - TechSupport_Yes       1   4649.3 4693.3
## - creditcard            1   4649.4 4693.4
## <none>                      4648.1 4694.1
## - StreamingTV_Yes       1   4650.4 4694.4
## - StreamingMovies_Yes   1   4650.6 4694.6
## - Dependents_Yes        1   4651.0 4695.0
## - InternetService_DSL   1   4651.7 4695.7
## - SeniorCitizen         1   4651.7 4695.7
## - InternetService_No    1   4651.8 4695.8
## - MultipleLines_Yes     1   4652.6 4696.6
## - annual                1   4661.1 4705.1
## - ec                    1   4663.0 4707.0
## - PaperlessBilling_Yes  1   4663.4 4707.4
## - TotalCharges          1   4671.4 4715.4
## - monthlycontract       1   4714.9 4758.9
## - tenure                1   4742.6 4786.6
## 
## Step:  AIC=4692.1
## Churn_Yes ~ SeniorCitizen + tenure + MonthlyCharges + TotalCharges + 
##     gender_Male + Partner_Yes + Dependents_Yes + PhoneService_Yes + 
##     MultipleLines_Yes + InternetService_DSL + InternetService_No + 
##     OnlineSecurity_Yes + DeviceProtection_Yes + TechSupport_Yes + 
##     StreamingTV_Yes + StreamingMovies_Yes + PaperlessBilling_Yes + 
##     creditcard + ec + monthlycontract + annual
## 
##                        Df Deviance    AIC
## - Partner_Yes           1   4648.2 4690.2
## - gender_Male           1   4648.2 4690.2
## - PhoneService_Yes      1   4649.2 4691.2
## - creditcard            1   4649.5 4691.5
## - OnlineSecurity_Yes    1   4649.8 4691.8
## - DeviceProtection_Yes  1   4650.1 4692.1
## <none>                      4648.1 4692.1
## - TechSupport_Yes       1   4650.8 4692.8
## - Dependents_Yes        1   4651.0 4693.0
## - SeniorCitizen         1   4651.7 4693.7
## - MonthlyCharges        1   4655.7 4697.7
## - StreamingTV_Yes       1   4659.3 4701.3
## - StreamingMovies_Yes   1   4660.3 4702.3
## - annual                1   4661.1 4703.1
## - MultipleLines_Yes     1   4662.4 4704.4
## - ec                    1   4663.0 4705.0
## - PaperlessBilling_Yes  1   4663.4 4705.4
## - InternetService_No    1   4668.1 4710.1
## - InternetService_DSL   1   4668.2 4710.2
## - TotalCharges          1   4671.4 4713.4
## - monthlycontract       1   4714.9 4756.9
## - tenure                1   4742.7 4784.7
## 
## Step:  AIC=4690.16
## Churn_Yes ~ SeniorCitizen + tenure + MonthlyCharges + TotalCharges + 
##     gender_Male + Dependents_Yes + PhoneService_Yes + MultipleLines_Yes + 
##     InternetService_DSL + InternetService_No + OnlineSecurity_Yes + 
##     DeviceProtection_Yes + TechSupport_Yes + StreamingTV_Yes + 
##     StreamingMovies_Yes + PaperlessBilling_Yes + creditcard + 
##     ec + monthlycontract + annual
## 
##                        Df Deviance    AIC
## - gender_Male           1   4648.2 4688.2
## - PhoneService_Yes      1   4649.3 4689.3
## - creditcard            1   4649.5 4689.5
## - OnlineSecurity_Yes    1   4649.9 4689.9
## - DeviceProtection_Yes  1   4650.2 4690.2
## <none>                      4648.2 4690.2
## - TechSupport_Yes       1   4650.8 4690.8
## - Dependents_Yes        1   4651.3 4691.3
## - SeniorCitizen         1   4651.9 4691.9
## - MonthlyCharges        1   4655.8 4695.8
## - StreamingTV_Yes       1   4659.4 4699.4
## - StreamingMovies_Yes   1   4660.4 4700.4
## - annual                1   4661.2 4701.2
## - MultipleLines_Yes     1   4662.5 4702.5
## - ec                    1   4663.1 4703.1
## - PaperlessBilling_Yes  1   4663.5 4703.5
## - InternetService_No    1   4668.3 4708.3
## - InternetService_DSL   1   4668.4 4708.4
## - TotalCharges          1   4671.5 4711.5
## - monthlycontract       1   4715.0 4755.0
## - tenure                1   4743.2 4783.2
## 
## Step:  AIC=4688.23
## Churn_Yes ~ SeniorCitizen + tenure + MonthlyCharges + TotalCharges + 
##     Dependents_Yes + PhoneService_Yes + MultipleLines_Yes + InternetService_DSL + 
##     InternetService_No + OnlineSecurity_Yes + DeviceProtection_Yes + 
##     TechSupport_Yes + StreamingTV_Yes + StreamingMovies_Yes + 
##     PaperlessBilling_Yes + creditcard + ec + monthlycontract + 
##     annual
## 
##                        Df Deviance    AIC
## - PhoneService_Yes      1   4649.3 4687.3
## - creditcard            1   4649.6 4687.6
## - OnlineSecurity_Yes    1   4649.9 4687.9
## - DeviceProtection_Yes  1   4650.2 4688.2
## <none>                      4648.2 4688.2
## - TechSupport_Yes       1   4650.9 4688.9
## - Dependents_Yes        1   4651.4 4689.4
## - SeniorCitizen         1   4652.0 4690.0
## - MonthlyCharges        1   4655.8 4693.8
## - StreamingTV_Yes       1   4659.4 4697.4
## - StreamingMovies_Yes   1   4660.5 4698.5
## - annual                1   4661.2 4699.2
## - MultipleLines_Yes     1   4662.6 4700.6
## - ec                    1   4663.2 4701.2
## - PaperlessBilling_Yes  1   4663.6 4701.6
## - InternetService_No    1   4668.3 4706.3
## - InternetService_DSL   1   4668.4 4706.4
## - TotalCharges          1   4671.5 4709.5
## - monthlycontract       1   4715.0 4753.0
## - tenure                1   4743.3 4781.3
## 
## Step:  AIC=4687.31
## Churn_Yes ~ SeniorCitizen + tenure + MonthlyCharges + TotalCharges + 
##     Dependents_Yes + MultipleLines_Yes + InternetService_DSL + 
##     InternetService_No + OnlineSecurity_Yes + DeviceProtection_Yes + 
##     TechSupport_Yes + StreamingTV_Yes + StreamingMovies_Yes + 
##     PaperlessBilling_Yes + creditcard + ec + monthlycontract + 
##     annual
## 
##                        Df Deviance    AIC
## - DeviceProtection_Yes  1   4650.3 4686.3
## - creditcard            1   4650.7 4686.7
## <none>                      4649.3 4687.3
## - Dependents_Yes        1   4652.5 4688.5
## - SeniorCitizen         1   4653.0 4689.0
## - OnlineSecurity_Yes    1   4654.5 4690.5
## - TechSupport_Yes       1   4656.5 4692.5
## - annual                1   4662.3 4698.3
## - MultipleLines_Yes     1   4664.0 4700.0
## - ec                    1   4664.1 4700.1
## - PaperlessBilling_Yes  1   4664.6 4700.6
## - StreamingTV_Yes       1   4665.9 4701.9
## - MonthlyCharges        1   4666.6 4702.6
## - StreamingMovies_Yes   1   4668.4 4704.4
## - TotalCharges          1   4672.1 4708.1
## - InternetService_DSL   1   4690.9 4726.9
## - InternetService_No    1   4708.3 4744.3
## - monthlycontract       1   4715.9 4751.9
## - tenure                1   4745.4 4781.4
## 
## Step:  AIC=4686.27
## Churn_Yes ~ SeniorCitizen + tenure + MonthlyCharges + TotalCharges + 
##     Dependents_Yes + MultipleLines_Yes + InternetService_DSL + 
##     InternetService_No + OnlineSecurity_Yes + TechSupport_Yes + 
##     StreamingTV_Yes + StreamingMovies_Yes + PaperlessBilling_Yes + 
##     creditcard + ec + monthlycontract + annual
## 
##                        Df Deviance    AIC
## - creditcard            1   4651.7 4685.7
## <none>                      4650.3 4686.3
## - Dependents_Yes        1   4653.4 4687.4
## - SeniorCitizen         1   4654.1 4688.1
## - OnlineSecurity_Yes    1   4656.3 4690.3
## - TechSupport_Yes       1   4657.9 4691.9
## - annual                1   4663.1 4697.1
## - MultipleLines_Yes     1   4664.1 4698.1
## - ec                    1   4664.9 4698.9
## - PaperlessBilling_Yes  1   4665.5 4699.5
## - StreamingTV_Yes       1   4666.2 4700.2
## - MonthlyCharges        1   4666.8 4700.8
## - StreamingMovies_Yes   1   4668.6 4702.6
## - TotalCharges          1   4673.5 4707.5
## - InternetService_DSL   1   4692.0 4726.0
## - InternetService_No    1   4709.5 4743.5
## - monthlycontract       1   4716.1 4750.1
## - tenure                1   4746.1 4780.1
## 
## Step:  AIC=4685.66
## Churn_Yes ~ SeniorCitizen + tenure + MonthlyCharges + TotalCharges + 
##     Dependents_Yes + MultipleLines_Yes + InternetService_DSL + 
##     InternetService_No + OnlineSecurity_Yes + TechSupport_Yes + 
##     StreamingTV_Yes + StreamingMovies_Yes + PaperlessBilling_Yes + 
##     ec + monthlycontract + annual
## 
##                        Df Deviance    AIC
## <none>                      4651.7 4685.7
## - Dependents_Yes        1   4654.8 4686.8
## - SeniorCitizen         1   4655.4 4687.4
## - OnlineSecurity_Yes    1   4657.7 4689.7
## - TechSupport_Yes       1   4659.2 4691.2
## - annual                1   4664.5 4696.5
## - MultipleLines_Yes     1   4665.3 4697.3
## - PaperlessBilling_Yes  1   4666.4 4698.4
## - StreamingTV_Yes       1   4667.5 4699.5
## - MonthlyCharges        1   4668.2 4700.2
## - StreamingMovies_Yes   1   4669.9 4701.9
## - ec                    1   4673.9 4705.9
## - TotalCharges          1   4675.5 4707.5
## - InternetService_DSL   1   4693.1 4725.1
## - InternetService_No    1   4710.3 4742.3
## - monthlycontract       1   4717.6 4749.6
## - tenure                1   4749.9 4781.9
AIC_step_summary <- summary(AIC_step)

AIC_step_summary$deviance/AIC_step_summary$df.residual
## [1] 0.8294691
AIC(AIC_step)
## [1] 4685.663
BIC(AIC_step)
## [1] 4798.458
## Stepwise (BIC)
n <- dim(training_set)[1]
## Start:  AIC=4855.31
## Churn_Yes ~ SeniorCitizen + tenure + MonthlyCharges + TotalCharges + 
##     gender_Male + Partner_Yes + Dependents_Yes + PhoneService_Yes + 
##     MultipleLines_Yes + InternetService_DSL + InternetService_No + 
##     OnlineSecurity_Yes + OnlineBackup_Yes + DeviceProtection_Yes + 
##     TechSupport_Yes + StreamingTV_Yes + StreamingMovies_Yes + 
##     PaperlessBilling_Yes + creditcard + banktransfer + ec + monthlycontract + 
##     annual
## 
##                        Df Deviance    AIC
## - banktransfer          1   4648.1 4846.7
## - OnlineBackup_Yes      1   4648.1 4846.7
## - Partner_Yes           1   4648.1 4846.7
## - gender_Male           1   4648.1 4846.7
## - PhoneService_Yes      1   4648.2 4846.8
## - DeviceProtection_Yes  1   4648.6 4847.2
## - OnlineSecurity_Yes    1   4648.9 4847.5
## - creditcard            1   4648.9 4847.6
## - MonthlyCharges        1   4649.3 4847.9
## - TechSupport_Yes       1   4649.3 4847.9
## - StreamingTV_Yes       1   4650.4 4849.0
## - StreamingMovies_Yes   1   4650.6 4849.2
## - Dependents_Yes        1   4651.0 4849.6
## - InternetService_DSL   1   4651.6 4850.3
## - SeniorCitizen         1   4651.6 4850.3
## - InternetService_No    1   4651.7 4850.3
## - MultipleLines_Yes     1   4652.6 4851.2
## <none>                      4648.1 4855.3
## - ec                    1   4658.0 4856.6
## - annual                1   4661.1 4859.7
## - PaperlessBilling_Yes  1   4663.4 4862.0
## - TotalCharges          1   4671.3 4869.9
## - monthlycontract       1   4714.9 4913.5
## - tenure                1   4740.9 4939.5
## 
## Step:  AIC=4846.69
## Churn_Yes ~ SeniorCitizen + tenure + MonthlyCharges + TotalCharges + 
##     gender_Male + Partner_Yes + Dependents_Yes + PhoneService_Yes + 
##     MultipleLines_Yes + InternetService_DSL + InternetService_No + 
##     OnlineSecurity_Yes + OnlineBackup_Yes + DeviceProtection_Yes + 
##     TechSupport_Yes + StreamingTV_Yes + StreamingMovies_Yes + 
##     PaperlessBilling_Yes + creditcard + ec + monthlycontract + 
##     annual
## 
##                        Df Deviance    AIC
## - OnlineBackup_Yes      1   4648.1 4838.1
## - Partner_Yes           1   4648.1 4838.1
## - gender_Male           1   4648.2 4838.1
## - PhoneService_Yes      1   4648.2 4838.2
## - DeviceProtection_Yes  1   4648.6 4838.6
## - OnlineSecurity_Yes    1   4648.9 4838.9
## - MonthlyCharges        1   4649.3 4839.2
## - TechSupport_Yes       1   4649.3 4839.3
## - creditcard            1   4649.4 4839.4
## - StreamingTV_Yes       1   4650.4 4840.4
## - StreamingMovies_Yes   1   4650.6 4840.6
## - Dependents_Yes        1   4651.0 4841.0
## - InternetService_DSL   1   4651.7 4841.6
## - SeniorCitizen         1   4651.7 4841.6
## - InternetService_No    1   4651.8 4841.7
## - MultipleLines_Yes     1   4652.6 4842.6
## <none>                      4648.1 4846.7
## - annual                1   4661.1 4851.1
## - ec                    1   4663.0 4852.9
## - PaperlessBilling_Yes  1   4663.4 4853.4
## - TotalCharges          1   4671.4 4861.4
## - monthlycontract       1   4714.9 4904.9
## - tenure                1   4742.6 4932.6
## 
## Step:  AIC=4838.07
## Churn_Yes ~ SeniorCitizen + tenure + MonthlyCharges + TotalCharges + 
##     gender_Male + Partner_Yes + Dependents_Yes + PhoneService_Yes + 
##     MultipleLines_Yes + InternetService_DSL + InternetService_No + 
##     OnlineSecurity_Yes + DeviceProtection_Yes + TechSupport_Yes + 
##     StreamingTV_Yes + StreamingMovies_Yes + PaperlessBilling_Yes + 
##     creditcard + ec + monthlycontract + annual
## 
##                        Df Deviance    AIC
## - Partner_Yes           1   4648.2 4829.5
## - gender_Male           1   4648.2 4829.5
## - PhoneService_Yes      1   4649.2 4830.5
## - creditcard            1   4649.5 4830.8
## - OnlineSecurity_Yes    1   4649.8 4831.2
## - DeviceProtection_Yes  1   4650.1 4831.4
## - TechSupport_Yes       1   4650.8 4832.1
## - Dependents_Yes        1   4651.0 4832.4
## - SeniorCitizen         1   4651.7 4833.0
## - MonthlyCharges        1   4655.7 4837.0
## <none>                      4648.1 4838.1
## - StreamingTV_Yes       1   4659.3 4840.6
## - StreamingMovies_Yes   1   4660.3 4841.6
## - annual                1   4661.1 4842.5
## - MultipleLines_Yes     1   4662.4 4843.7
## - ec                    1   4663.0 4844.3
## - PaperlessBilling_Yes  1   4663.4 4844.8
## - InternetService_No    1   4668.1 4849.5
## - InternetService_DSL   1   4668.2 4849.6
## - TotalCharges          1   4671.4 4852.8
## - monthlycontract       1   4714.9 4896.3
## - tenure                1   4742.7 4924.0
## 
## Step:  AIC=4829.5
## Churn_Yes ~ SeniorCitizen + tenure + MonthlyCharges + TotalCharges + 
##     gender_Male + Dependents_Yes + PhoneService_Yes + MultipleLines_Yes + 
##     InternetService_DSL + InternetService_No + OnlineSecurity_Yes + 
##     DeviceProtection_Yes + TechSupport_Yes + StreamingTV_Yes + 
##     StreamingMovies_Yes + PaperlessBilling_Yes + creditcard + 
##     ec + monthlycontract + annual
## 
##                        Df Deviance    AIC
## - gender_Male           1   4648.2 4820.9
## - PhoneService_Yes      1   4649.3 4822.0
## - creditcard            1   4649.5 4822.2
## - OnlineSecurity_Yes    1   4649.9 4822.6
## - DeviceProtection_Yes  1   4650.2 4822.9
## - TechSupport_Yes       1   4650.8 4823.5
## - Dependents_Yes        1   4651.3 4824.0
## - SeniorCitizen         1   4651.9 4824.6
## - MonthlyCharges        1   4655.8 4828.5
## <none>                      4648.2 4829.5
## - StreamingTV_Yes       1   4659.4 4832.1
## - StreamingMovies_Yes   1   4660.4 4833.1
## - annual                1   4661.2 4833.9
## - MultipleLines_Yes     1   4662.5 4835.2
## - ec                    1   4663.1 4835.8
## - PaperlessBilling_Yes  1   4663.5 4836.2
## - InternetService_No    1   4668.3 4841.0
## - InternetService_DSL   1   4668.4 4841.1
## - TotalCharges          1   4671.5 4844.2
## - monthlycontract       1   4715.0 4887.7
## - tenure                1   4743.2 4915.9
## 
## Step:  AIC=4820.93
## Churn_Yes ~ SeniorCitizen + tenure + MonthlyCharges + TotalCharges + 
##     Dependents_Yes + PhoneService_Yes + MultipleLines_Yes + InternetService_DSL + 
##     InternetService_No + OnlineSecurity_Yes + DeviceProtection_Yes + 
##     TechSupport_Yes + StreamingTV_Yes + StreamingMovies_Yes + 
##     PaperlessBilling_Yes + creditcard + ec + monthlycontract + 
##     annual
## 
##                        Df Deviance    AIC
## - PhoneService_Yes      1   4649.3 4813.4
## - creditcard            1   4649.6 4813.7
## - OnlineSecurity_Yes    1   4649.9 4814.0
## - DeviceProtection_Yes  1   4650.2 4814.3
## - TechSupport_Yes       1   4650.9 4815.0
## - Dependents_Yes        1   4651.4 4815.4
## - SeniorCitizen         1   4652.0 4816.1
## - MonthlyCharges        1   4655.8 4819.9
## <none>                      4648.2 4820.9
## - StreamingTV_Yes       1   4659.4 4823.5
## - StreamingMovies_Yes   1   4660.5 4824.5
## - annual                1   4661.2 4825.3
## - MultipleLines_Yes     1   4662.6 4826.6
## - ec                    1   4663.2 4827.2
## - PaperlessBilling_Yes  1   4663.6 4827.6
## - InternetService_No    1   4668.3 4832.4
## - InternetService_DSL   1   4668.4 4832.5
## - TotalCharges          1   4671.5 4835.6
## - monthlycontract       1   4715.0 4879.1
## - tenure                1   4743.3 4907.3
## 
## Step:  AIC=4813.37
## Churn_Yes ~ SeniorCitizen + tenure + MonthlyCharges + TotalCharges + 
##     Dependents_Yes + MultipleLines_Yes + InternetService_DSL + 
##     InternetService_No + OnlineSecurity_Yes + DeviceProtection_Yes + 
##     TechSupport_Yes + StreamingTV_Yes + StreamingMovies_Yes + 
##     PaperlessBilling_Yes + creditcard + ec + monthlycontract + 
##     annual
## 
##                        Df Deviance    AIC
## - DeviceProtection_Yes  1   4650.3 4805.7
## - creditcard            1   4650.7 4806.1
## - Dependents_Yes        1   4652.5 4807.9
## - SeniorCitizen         1   4653.0 4808.4
## - OnlineSecurity_Yes    1   4654.5 4810.0
## - TechSupport_Yes       1   4656.5 4811.9
## <none>                      4649.3 4813.4
## - annual                1   4662.3 4817.7
## - MultipleLines_Yes     1   4664.0 4819.4
## - ec                    1   4664.1 4819.5
## - PaperlessBilling_Yes  1   4664.6 4820.0
## - StreamingTV_Yes       1   4665.9 4821.4
## - MonthlyCharges        1   4666.6 4822.1
## - StreamingMovies_Yes   1   4668.4 4823.8
## - TotalCharges          1   4672.1 4827.5
## - InternetService_DSL   1   4690.9 4846.4
## - InternetService_No    1   4708.3 4863.7
## - monthlycontract       1   4715.9 4871.3
## - tenure                1   4745.4 4900.8
## 
## Step:  AIC=4805.7
## Churn_Yes ~ SeniorCitizen + tenure + MonthlyCharges + TotalCharges + 
##     Dependents_Yes + MultipleLines_Yes + InternetService_DSL + 
##     InternetService_No + OnlineSecurity_Yes + TechSupport_Yes + 
##     StreamingTV_Yes + StreamingMovies_Yes + PaperlessBilling_Yes + 
##     creditcard + ec + monthlycontract + annual
## 
##                        Df Deviance    AIC
## - creditcard            1   4651.7 4798.5
## - Dependents_Yes        1   4653.4 4800.2
## - SeniorCitizen         1   4654.1 4800.9
## - OnlineSecurity_Yes    1   4656.3 4803.1
## - TechSupport_Yes       1   4657.9 4804.6
## <none>                      4650.3 4805.7
## - annual                1   4663.1 4809.9
## - MultipleLines_Yes     1   4664.1 4810.9
## - ec                    1   4664.9 4811.7
## - PaperlessBilling_Yes  1   4665.5 4812.3
## - StreamingTV_Yes       1   4666.2 4813.0
## - MonthlyCharges        1   4666.8 4813.6
## - StreamingMovies_Yes   1   4668.6 4815.4
## - TotalCharges          1   4673.5 4820.3
## - InternetService_DSL   1   4692.0 4838.8
## - InternetService_No    1   4709.5 4856.3
## - monthlycontract       1   4716.1 4862.9
## - tenure                1   4746.1 4892.9
## 
## Step:  AIC=4798.46
## Churn_Yes ~ SeniorCitizen + tenure + MonthlyCharges + TotalCharges + 
##     Dependents_Yes + MultipleLines_Yes + InternetService_DSL + 
##     InternetService_No + OnlineSecurity_Yes + TechSupport_Yes + 
##     StreamingTV_Yes + StreamingMovies_Yes + PaperlessBilling_Yes + 
##     ec + monthlycontract + annual
## 
##                        Df Deviance    AIC
## - Dependents_Yes        1   4654.8 4793.0
## - SeniorCitizen         1   4655.4 4793.6
## - OnlineSecurity_Yes    1   4657.7 4795.9
## - TechSupport_Yes       1   4659.2 4797.4
## <none>                      4651.7 4798.5
## - annual                1   4664.5 4802.6
## - MultipleLines_Yes     1   4665.3 4803.5
## - PaperlessBilling_Yes  1   4666.4 4804.6
## - StreamingTV_Yes       1   4667.5 4805.6
## - MonthlyCharges        1   4668.2 4806.4
## - StreamingMovies_Yes   1   4669.9 4808.0
## - ec                    1   4673.9 4812.0
## - TotalCharges          1   4675.5 4813.7
## - InternetService_DSL   1   4693.1 4831.2
## - InternetService_No    1   4710.3 4848.5
## - monthlycontract       1   4717.6 4855.8
## - tenure                1   4749.9 4888.1
## 
## Step:  AIC=4793.01
## Churn_Yes ~ SeniorCitizen + tenure + MonthlyCharges + TotalCharges + 
##     MultipleLines_Yes + InternetService_DSL + InternetService_No + 
##     OnlineSecurity_Yes + TechSupport_Yes + StreamingTV_Yes + 
##     StreamingMovies_Yes + PaperlessBilling_Yes + ec + monthlycontract + 
##     annual
## 
##                        Df Deviance    AIC
## - SeniorCitizen         1   4659.9 4789.5
## - OnlineSecurity_Yes    1   4661.0 4790.6
## - TechSupport_Yes       1   4662.3 4791.9
## <none>                      4654.8 4793.0
## - annual                1   4668.0 4797.5
## - MultipleLines_Yes     1   4668.7 4798.2
## - PaperlessBilling_Yes  1   4670.1 4799.6
## - StreamingTV_Yes       1   4670.5 4800.1
## - MonthlyCharges        1   4671.9 4801.4
## - StreamingMovies_Yes   1   4673.5 4803.0
## - ec                    1   4677.7 4807.2
## - TotalCharges          1   4679.7 4809.2
## - InternetService_DSL   1   4697.3 4826.9
## - InternetService_No    1   4715.0 4844.5
## - monthlycontract       1   4722.7 4852.2
## - tenure                1   4756.4 4886.0
## 
## Step:  AIC=4789.46
## Churn_Yes ~ tenure + MonthlyCharges + TotalCharges + MultipleLines_Yes + 
##     InternetService_DSL + InternetService_No + OnlineSecurity_Yes + 
##     TechSupport_Yes + StreamingTV_Yes + StreamingMovies_Yes + 
##     PaperlessBilling_Yes + ec + monthlycontract + annual
## 
##                        Df Deviance    AIC
## - OnlineSecurity_Yes    1   4666.3 4787.2
## - TechSupport_Yes       1   4668.1 4789.0
## <none>                      4659.9 4789.5
## - annual                1   4673.5 4794.4
## - MultipleLines_Yes     1   4674.8 4795.7
## - StreamingTV_Yes       1   4675.9 4796.8
## - PaperlessBilling_Yes  1   4676.0 4796.9
## - MonthlyCharges        1   4678.0 4798.9
## - StreamingMovies_Yes   1   4679.7 4800.6
## - ec                    1   4684.4 4805.3
## - TotalCharges          1   4684.9 4805.8
## - InternetService_DSL   1   4705.2 4826.1
## - InternetService_No    1   4723.8 4844.7
## - monthlycontract       1   4731.3 4852.2
## - tenure                1   4760.0 4880.9
## 
## Step:  AIC=4787.21
## Churn_Yes ~ tenure + MonthlyCharges + TotalCharges + MultipleLines_Yes + 
##     InternetService_DSL + InternetService_No + TechSupport_Yes + 
##     StreamingTV_Yes + StreamingMovies_Yes + PaperlessBilling_Yes + 
##     ec + monthlycontract + annual
## 
##                        Df Deviance    AIC
## - TechSupport_Yes       1   4673.4 4785.6
## <none>                      4666.3 4787.2
## - annual                1   4680.7 4793.0
## - PaperlessBilling_Yes  1   4683.7 4796.0
## - MultipleLines_Yes     1   4685.0 4797.3
## - StreamingTV_Yes       1   4688.8 4801.1
## - TotalCharges          1   4690.9 4803.2
## - StreamingMovies_Yes   1   4692.2 4804.5
## - ec                    1   4692.3 4804.5
## - MonthlyCharges        1   4696.1 4808.3
## - InternetService_DSL   1   4734.9 4847.1
## - monthlycontract       1   4741.4 4853.6
## - InternetService_No    1   4748.6 4860.9
## - tenure                1   4768.2 4880.4
## 
## Step:  AIC=4785.64
## Churn_Yes ~ tenure + MonthlyCharges + TotalCharges + MultipleLines_Yes + 
##     InternetService_DSL + InternetService_No + StreamingTV_Yes + 
##     StreamingMovies_Yes + PaperlessBilling_Yes + ec + monthlycontract + 
##     annual
## 
##                        Df Deviance    AIC
## <none>                      4673.4 4785.6
## - annual                1   4690.0 4793.6
## - PaperlessBilling_Yes  1   4690.7 4794.3
## - MultipleLines_Yes     1   4697.1 4800.7
## - TotalCharges          1   4697.3 4800.9
## - StreamingTV_Yes       1   4700.7 4804.3
## - ec                    1   4700.8 4804.5
## - StreamingMovies_Yes   1   4704.2 4807.8
## - MonthlyCharges        1   4719.4 4823.1
## - monthlycontract       1   4758.2 4861.8
## - tenure                1   4774.4 4878.0
## - InternetService_DSL   1   4775.5 4879.1
## - InternetService_No    1   4779.5 4883.2
BIC_step_summary <- summary(BIC_step)
BIC_step_summary$deviance/BIC_step_summary$df.residual
## [1] 0.8327492
AIC(BIC_step)
## [1] 4699.389
BIC(BIC_step)
## [1] 4785.643
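
The call that produced the BIC trace is not shown above. A minimal sketch of how such a search is usually run with `step()` follows, on simulated data rather than the Telco data. The names `sim`, `full_fit`, and `bic_fit`, and the choice of backward elimination, are assumptions made for illustration. Setting `k = log(n)` swaps the AIC penalty for the BIC penalty, and `step()` still labels the resulting criterion "AIC" in its trace.

```r
# Sketch on simulated data (not the Telco data); all object names are hypothetical.
set.seed(1)
n_obs <- 500
sim <- data.frame(x1 = rnorm(n_obs), x2 = rnorm(n_obs), noise = rnorm(n_obs))
sim$y <- rbinom(n_obs, 1, plogis(-0.5 + 1.2 * sim$x1 - 0.8 * sim$x2))

full_fit <- glm(y ~ x1 + x2 + noise, family = binomial, data = sim)

# k = 2 (the default) is the AIC penalty; k = log(n) is the BIC penalty.
# direction = "backward" is an assumption; the report does not state the direction used.
bic_fit <- step(full_fit, direction = "backward", k = log(n_obs), trace = 0)

# The value step() prints as "AIC" under k = log(n) is the model's BIC.
all.equal(extractAIC(bic_fit, k = log(n_obs))[2], BIC(bic_fit))
```

This is consistent with the output above, where the final "AIC" in the trace (4785.64) equals `BIC(BIC_step)`.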

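Of the logistic models (generalized linear models) fitted, the stepwise AIC model performs best on mean residual deviance (deviance divided by residual degrees of freedom: 0.829, against 0.833 for the stepwise BIC model) and on AIC (4685.7 against 4699.4). The stepwise BIC model has the lower BIC (4785.6 against 4798.5), as expected for a model selected on that criterion, but it drops four more predictors to get there. The stepwise AIC procedure chose the following variables for the final logistic model: SeniorCitizen + tenure + MonthlyCharges + TotalCharges + Dependents_Yes + MultipleLines_Yes + InternetService_DSL + InternetService_No + OnlineSecurity_Yes + TechSupport_Yes + StreamingTV_Yes + StreamingMovies_Yes + PaperlessBilling_Yes + ec + monthlycontract + annual. We will test this model to see if it is more accurate than the model produced earlier.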
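
The criteria reported in the output above can be lined up side by side. The sketch below simply copies the printed values into a data frame (the object names `criteria` and `best` are illustrative) and reports which model has the smallest value on each criterion.

```r
# Values copied from the printed output above; nothing is re-estimated here.
criteria <- data.frame(
  model          = c("stepwise AIC", "stepwise BIC"),
  mean_resid_dev = c(0.8294691, 0.8327492),  # deviance / df.residual
  AIC            = c(4685.663, 4699.389),
  BIC            = c(4798.458, 4785.643),
  stringsAsFactors = FALSE
)

# For each criterion (lower is better), name the model with the smallest value.
best <- sapply(criteria[-1], function(col) criteria$model[which.min(col)])
print(criteria)
print(best)
```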

Testing the best model from the logistic regression

ROC, AUC, and Symmetric Cost of the Stepwise AIC model

# ROC curve, in sample prediction
AIC_step_train<- predict(AIC_step, type="response")

# ROC Curve
library(ROCR)
## Warning: package 'ROCR' was built under R version 3.6.3
pred <- prediction(AIC_step_train, training_set$Churn_Yes)
perf <- performance(pred, "tpr", "fpr")
plot(perf, colorize=TRUE, main = "ROC Plot Training Data")

#Get the AUC
unlist(slot(performance(pred, "auc"), "y.values"))
## [1] 0.8490154
# 2X2 misclassification table
pred_resp <- predict(AIC_step,type="response")
hist(pred_resp)

table(training_set$Churn_Yes, (pred_resp > 0.5)*1, dnn=c("Truth","Predicted"))
##      Predicted
## Truth    0    1
##     0 3705  425
##     1  667  828
## Symmetric cost (misclassification rate) function
pcut <- 1/2 #prespecify pcut value
cost1 <- function(r, pi){
  mean(((r==0)&(pi>pcut)) | ((r==1)&(pi<pcut)))
}

#Symmetric cost
cost1(r = training_set$Churn_Yes, pi = AIC_step_train)
## [1] 0.1941333

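With an in-sample AUC of 0.849 and a misclassification rate of 19.4% at a 0.5 cutoff, the model is an effective predictor of whether a customer is retained. It is more effective than the model created earlier, and it beats the naive rule of predicting that every customer is retained, which would misclassify all 1,495 churners among the 5,625 training customers (26.6%).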
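
As a cross-check, the symmetric cost and a few related rates can be recomputed directly from the 2x2 table printed above. The sketch below only re-enters those four counts; the object names are illustrative.

```r
# Counts copied from the training-set table above (rows = truth, columns = predicted).
cm <- matrix(c(3705, 667, 425, 828), nrow = 2,
             dimnames = list(Truth = c("0", "1"), Predicted = c("0", "1")))

total       <- sum(cm)                                  # all training customers
misclass    <- (cm["0", "1"] + cm["1", "0"]) / total    # symmetric cost at cutoff 0.5
baseline    <- sum(cm["1", ]) / total                   # error if everyone is predicted "retained"
sensitivity <- cm["1", "1"] / sum(cm["1", ])            # share of churners caught
specificity <- cm["0", "0"] / sum(cm["0", ])            # share of retained customers kept as 0

round(c(misclass = misclass, baseline = baseline,
        sensitivity = sensitivity, specificity = specificity), 4)
```

The misclassification rate reproduces the 0.1941 returned by `cost1()`. At the 0.5 cutoff the model catches only a little over half of the churners, so a lower cutoff may be worth considering if missing a churner is costlier than a false alarm.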

Having confirmed that the stepwise AIC model performs well on the training data, we now check it on the testing data set.

# Out-of-sample Testing 
AIC_step_test<- predict(AIC_step, newdata = test_set, type="response")

# Get ROC curve
pred <- prediction(AIC_step_test, test_set$Churn_Yes)
perf <- performance(pred, "tpr", "fpr")
plot(perf, colorize=TRUE, main = "ROC Plot Testing Data")

#Get the AUC
unlist(slot(performance(pred, "auc"), "y.values"))
## [1] 0.8445755
#Symmetric cost
cost1(r = test_set$Churn_Yes, pi = AIC_step_test)
## [1] 0.1933191

The test-set AUC (0.845) and misclassification rate (19.3%) are close to the training-set values (0.849 and 19.4%), so the model shows little sign of overfitting and remains effective on unseen data.

Next we run 10-fold cross-validation, using AUC as the cost function.

#AUC as cost
costfunc1 <- function(obs, pred.p){
  pred <- prediction(pred.p, obs)
  # AUC of the predicted probabilities against the observed labels
  auc <- unlist(slot(performance(pred, "auc"), "y.values"))
  return(auc)
}

library(boot)
library(ROCR)

## Refit the stepwise AIC formula on the full data set for cross-validation
glm1 <- glm(Churn_Yes ~ SeniorCitizen + tenure + MonthlyCharges + TotalCharges + Dependents_Yes + MultipleLines_Yes + InternetService_DSL + InternetService_No + OnlineSecurity_Yes + TechSupport_Yes + StreamingTV_Yes + StreamingMovies_Yes + PaperlessBilling_Yes + ec + monthlycontract + annual, family=binomial, data=dataset)
cv_result  <- cv.glm(data=dataset, glmfit=glm1, cost=costfunc1, K=10) 
cv_result$delta[2]
## [1] 0.8461742

The cross-validated AUC of 0.846 is in line with the training (0.849) and test (0.845) values, confirming the strong results above.
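
To make the cross-validation step more concrete, the sketch below performs 10-fold cross-validation by hand on simulated data, using only base R. It is an illustration, not the computation above: `auc_rank()` is a hypothetical helper that computes AUC from ranks (the Mann-Whitney form) instead of calling ROCR, and the data frame `sim` is made up. Like `cv.glm()`, it averages the per-fold cost weighted by fold size, but it does not reproduce the bias adjustment behind `delta[2]`.

```r
# Rank-based (Mann-Whitney) AUC; hypothetical helper, not part of ROCR or boot.
auc_rank <- function(obs, prob) {
  r     <- rank(prob)               # average ranks, so ties are handled
  n_pos <- sum(obs == 1)
  n_neg <- sum(obs == 0)
  (sum(r[obs == 1]) - n_pos * (n_pos + 1) / 2) / (n_pos * n_neg)
}

# Simulated binary outcome with two informative predictors (not the Telco data).
set.seed(42)
n_obs <- 600
sim <- data.frame(x1 = rnorm(n_obs), x2 = rnorm(n_obs))
sim$y <- rbinom(n_obs, 1, plogis(-1 + 1.5 * sim$x1 - sim$x2))

K    <- 10
fold <- sample(rep(seq_len(K), length.out = n_obs))   # random fold labels 1..K

# Fit on K - 1 folds, score the held-out fold, repeat for every fold.
fold_auc <- sapply(seq_len(K), function(k) {
  fit  <- glm(y ~ x1 + x2, family = binomial, data = sim[fold != k, ])
  prob <- predict(fit, newdata = sim[fold == k, ], type = "response")
  auc_rank(sim$y[fold == k], prob)
})

# Fold-size-weighted average of the held-out AUCs.
cv_auc <- weighted.mean(fold_auc, w = as.numeric(table(fold)))
cv_auc
```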

Conclusion

This project walked through importing the Telco customer data, performing exploratory data analysis, fitting a variety of models, and comparing those models to determine the best fit. Key predictors such as monthly contract and tenure turned out to be important, which makes intuitive sense. The models included XGBoost and GLMs; the stepwise AIC model was the best GLM on multiple metrics and surpassed XGBoost. It had a high AUC (about 0.85 on the training and test sets) and a low symmetric cost (about 0.19), and these results were further confirmed in cross-validation.

Appendix

Functions used that were covered in class

  • c()
  • ggplot()
  • summary()
  • str()
  • aes()
  • function()
  • ifelse()
  • set.seed()
  • hist()
  • predict()
  • table()
  • head()
  • is.na()
  • glm()