knitr::opts_chunk$set(echo = TRUE)
options(width=120)

library(car)       # vif() for multicollinearity diagnostics
library(ggplot2)   # plotting
library(reshape2)  # data reshaping
library(psych)     # descriptive statistics
library(pastecs)   # descriptive statistics
library(tidyr)     # data tidying
library(dplyr)     # data manipulation
##   SeniorCitizen tenure MultipleLines InternetService       Contract MonthlyCharges Churn
## 1             0     71           Yes             DSL       Two year          80.10    No
## 2             0     53           Yes     Fiber optic       One year          91.15    No
## 3             0     67            No              No       Two year          19.40    No
## 4             0      6            No     Fiber optic Month-to-month          69.80    No
## 5             0     14            No              No Month-to-month          19.60    No
## 6             0     68            No     Fiber optic       One year         100.30    No

1 About the Dataset

The dataset contains data on customers in the telecommunications industry. It includes 21 variables for 7043 customers.

For easier data manipulation I reduced the data to 1000 observations and 7 variables, as the others are not needed for this analysis.

The unit of observation of the analysis is a single customer in the network.

Source:

https://www.kaggle.com/datasets/blastchar/telco-customer-churn

https://community.ibm.com/community/user/businessanalytics/blogs/steven-macko/2019/07/11/telco-customer-churn-1113

The variables retained for the analysis are those shown in the data preview above.

2 Objective of the Study

The main goal of this study is to build a model that predicts which customers are likely to churn from the network. The descriptive analysis was conducted in the first section and the hypothesis testing in the third; as mentioned, the prediction model will be built in this section.

3 Data Manipulation

Before moving to the modeling part, data preprocessing is a necessary step: the variables must be brought into suitable types so that the further analysis can proceed without errors. Several required preprocessing steps were identified in this dataset and carried out before modeling.
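
Before converting anything, the column types can be inspected; a minimal sketch, where df is the data frame previewed above:

sapply(df, class)   # class of every column in the reduced data frame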

# Recode SeniorCitizen from 0/1 dummies to No/Yes labels
df$SeniorCitizen <- factor(df$SeniorCitizen,
                           levels = c("0","1"),
                           labels = c("No", "Yes"))

All the remaining categorical variables in the dataset are stored in 'char' format, so they are converted to factors before moving on to the modeling.

3.1 Changing the data type of Categorical variables

df$MultipleLines   <- as.factor(df$MultipleLines)
df$InternetService <- as.factor(df$InternetService)
df$Contract        <- as.factor(df$Contract)
df$Churn           <- as.factor(df$Churn)
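
Since dplyr is already loaded, the same conversion can be written in one step; a sketch assuming every remaining character column should become a factor:

df <- df %>% mutate(across(where(is.character), as.factor))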

4 Building the Logistic Models

4.1 Model 01:

The first model is an intercept-only (null) model, meaning that no explanatory variables are used to build it.

# Fitting the intercept-only logistic model
fit0 <- glm(Churn ~ 1,         # dependent variable only, no explanatory variables
            family = binomial, # binary logistic regression
            data = df)

The output below shows the summary of the intercept-only model; the estimated intercept equals -1.046.

summary(fit0)
## 
## Call:
## glm(formula = Churn ~ 1, family = binomial, data = df)
## 
## Deviance Residuals: 
##    Min      1Q  Median      3Q     Max  
## -0.776  -0.776  -0.776   1.641   1.641  
## 
## Coefficients:
##             Estimate Std. Error z value Pr(>|z|)    
## (Intercept) -1.04597    0.07209  -14.51   <2e-16 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## (Dispersion parameter for binomial family taken to be 1)
## 
##     Null deviance: 1146.1  on 999  degrees of freedom
## Residual deviance: 1146.1  on 999  degrees of freedom
## AIC: 1148.1
## 
## Number of Fisher Scoring iterations: 4
exp(cbind(odds = fit0$coefficients, confint.default(fit0))) #Odds for Y=1
##                  odds     2.5 %    97.5 %
## (Intercept) 0.3513514 0.3050531 0.4046764

The next step is to compare the predicted values with the actual outcomes.

# Fitted probabilities from the intercept-only model
head(fitted(fit0))
##    1    2    3    4    5    6 
## 0.26 0.26 0.26 0.26 0.26 0.26
correctly <- 740/1000   # all units classified "No"; 740 of 1000 customers did not churn

The probabilities are all the same because no explanatory variables are included. The estimated probability of a client churning is 26 percent; with a 0.5 cutoff every unit is therefore classified as a non-churner, so 74 percent of units (the 740 non-churners) are correctly classified.
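
As a quick check, the intercept of the null model maps directly back to the sample churn rate:

# Log-odds -> odds -> probability for the intercept-only model
b0 <- -1.04597        # intercept from summary(fit0)
odds <- exp(b0)       # ~0.3514, matches the odds output above
odds / (1 + odds)     # ~0.26, the sample churn rate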

4.2 Model 02:

For the second model we include tenure as an explanatory variable, to see whether the model improves.

# Fitting the logistic model with tenure as the explanatory variable
fit1 <- glm(Churn ~ tenure,    # dependent and explanatory variables
            family = binomial, # binary logistic regression
            data = df)
summary(fit1)
## 
## Call:
## glm(formula = Churn ~ tenure, family = binomial, data = df)
## 
## Deviance Residuals: 
##     Min       1Q   Median       3Q      Max  
## -1.1429  -0.8486  -0.5072   1.2277   2.2956  
## 
## Coefficients:
##             Estimate Std. Error z value Pr(>|z|)    
## (Intercept) -0.08163    0.11218  -0.728    0.467    
## tenure      -0.03592    0.00372  -9.658   <2e-16 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## (Dispersion parameter for binomial family taken to be 1)
## 
##     Null deviance: 1146.1  on 999  degrees of freedom
## Residual deviance: 1031.3  on 998  degrees of freedom
## AIC: 1035.3
## 
## Number of Fisher Scoring iterations: 4

We see that tenure is statistically significant (p < 0.001).

anova(fit0, fit1, test = "Chi") # Comparison of the models based on the -2LL statistic
## Analysis of Deviance Table
## 
## Model 1: Churn ~ 1
## Model 2: Churn ~ tenure
##   Resid. Df Resid. Dev Df Deviance  Pr(>Chi)    
## 1       999     1146.1                          
## 2       998     1031.3  1   114.81 < 2.2e-16 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

H0: The simpler model is adequate.

H1: The more complex model fits better.

The difference in degrees of freedom (m) equals 1 and the residual deviance is smaller for the larger model. Given the p-value, we can reject the null hypothesis and conclude that the complex model, which includes tenure as an explanatory variable, is better (p < 0.001).
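
The same test can be reproduced by hand from the two residual deviances reported above:

# Likelihood-ratio statistic: difference in deviances, on 1 degree of freedom
LR <- 1146.1 - 1031.3                    # 114.8, matching the anova output
pchisq(LR, df = 1, lower.tail = FALSE)   # p-value, effectively zero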

exp(cbind(OR = fit1$coefficients, confint.default(fit1)))
##                    OR     2.5 %    97.5 %
## (Intercept) 0.9216090 0.7397017 1.1482509
## tenure      0.9647124 0.9577048 0.9717713

If tenure increases by 1 month, the odds of churning are multiplied by 0.96, i.e. they fall by roughly 3.5 percent (p < 0.001).
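
As an illustration, the fitted coefficients can be turned into a predicted probability; a sketch for a hypothetical customer with 12 months of tenure, using the estimates from summary(fit1):

b <- c(intercept = -0.08163, tenure = -0.03592)   # estimates from the summary above
plogis(b[["intercept"]] + b[["tenure"]] * 12)     # ~0.37 predicted churn probability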

4.3 Model 03:

Model 03 is built with all the explanatory variables that are left in the table.

fit2 <- glm(Churn ~ SeniorCitizen + tenure + MultipleLines + InternetService + Contract + MonthlyCharges,
            family = binomial,
            data = df)
vif(fit2)   # generalized variance inflation factors
##                     GVIF Df GVIF^(1/(2*Df))
## SeniorCitizen   1.085280  1        1.041768
## tenure          1.903811  1        1.379786
## MultipleLines   2.620181  2        1.272280
## InternetService 7.379071  2        1.648164
## Contract        1.531826  2        1.112506
## MonthlyCharges  8.484722  1        2.912855
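
A common rule of thumb flags predictors whose adjusted GVIF, GVIF^(1/(2*Df)), exceeds 2; the threshold is a convention rather than a hard rule. A minimal sketch:

gvif <- vif(fit2)
gvif[gvif[, "GVIF^(1/(2*Df))"] > 2, , drop = FALSE]   # only MonthlyCharges exceeds the threshold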

We will remove MonthlyCharges because of multicollinearity: its adjusted GVIF (2.91) is the largest in the model.

df$MonthlyCharges <- NULL   # drop the collinear variable (column 6)

fit2 <- glm(Churn ~ SeniorCitizen + tenure + MultipleLines + InternetService + Contract,
            family = binomial,
            data = df)

vif(fit2)
##                     GVIF Df GVIF^(1/(2*Df))
## SeniorCitizen   1.080885  1        1.039656
## tenure          1.745943  1        1.321341
## MultipleLines   1.746574  2        1.149600
## InternetService 1.635842  2        1.130929
## Contract        1.474496  2        1.101947

We see that the GVIF of InternetService drops from 7.38 to 1.64 once MonthlyCharges is removed, and all adjusted GVIFs are now below 2.

df$StdResid <- rstandard(fit2)        # standardized residuals
df$CookDist <- cooks.distance(fit2)   # Cook's distance (influence measure)

hist(df$StdResid, 
     main = "Histogram of standardized residuals",
     ylab = "Frequency", 
     xlab = "Standardized residuals")

Standardized residuals greater than 3 or less than -3 can be considered outliers. In this histogram, no such outliers are observed.
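
The same check can be done numerically rather than by eye:

sum(abs(df$StdResid) > 3)   # number of outlying residuals; none here, matching the histogram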

head(df$CookDist)
## [1] 2.689948e-06 2.028701e-04 6.770075e-07 1.238401e-03 1.422785e-04 1.397198e-04
head(df[order(-df$CookDist), ])
##     SeniorCitizen tenure MultipleLines InternetService Contract Churn StdResid   CookDist
## 401            No     55            No             DSL Two year   Yes 2.868602 0.02351044
## 604            No     57            No     Fiber optic Two year   Yes 2.389266 0.02100341
## 537            No     54            No     Fiber optic Two year   Yes 2.353393 0.02064268
## 112            No     66           Yes     Fiber optic Two year   Yes 2.490158 0.01965960
## 152            No     65           Yes     Fiber optic Two year   Yes 2.478531 0.01956842
## 57             No     61           Yes     Fiber optic Two year   Yes 2.431698 0.01925458
# Remove the most influential observation (row 401, the largest Cook's distance)
df <- df[-which.max(df$CookDist), ]

The model is refitted after removing this unit.

Chi-square test to compare two models:

H0: The simpler model is adequate.

H1: The more complex model fits better.

fit1 <- glm(Churn ~ tenure,
            family = binomial,
            data = df)


fit2 <- glm(Churn ~ SeniorCitizen + tenure + MultipleLines + InternetService + Contract,
            family = binomial,
            data = df)
anova(fit1, fit2, test = "Chi")
## Analysis of Deviance Table
## 
## Model 1: Churn ~ tenure
## Model 2: Churn ~ SeniorCitizen + tenure + MultipleLines + InternetService + 
##     Contract
##   Resid. Df Resid. Dev Df Deviance  Pr(>Chi)    
## 1       997     1026.9                          
## 2       990     857.46  7   169.44 < 2.2e-16 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Since p < 0.001, I reject the null hypothesis; the complex model fits the data better.

summary(fit2)
## 
## Call:
## glm(formula = Churn ~ SeniorCitizen + tenure + MultipleLines + 
##     InternetService + Contract, family = binomial, data = df)
## 
## Deviance Residuals: 
##     Min       1Q   Median       3Q      Max  
## -1.6737  -0.7248  -0.3129   0.7725   2.5500  
## 
## Coefficients:
##                               Estimate Std. Error z value Pr(>|z|)    
## (Intercept)                   -0.75688    0.19625  -3.857 0.000115 ***
## SeniorCitizenYes               0.46060    0.21782   2.115 0.034463 *  
## tenure                        -0.03121    0.00554  -5.634 1.76e-08 ***
## MultipleLinesNo phone service  0.74996    0.32134   2.334 0.019606 *  
## MultipleLinesYes               0.03004    0.20363   0.148 0.882725    
## InternetServiceFiber optic     1.41505    0.22573   6.269 3.64e-10 ***
## InternetServiceNo             -0.87388    0.35979  -2.429 0.015145 *  
## ContractOne year              -0.61237    0.27626  -2.217 0.026647 *  
## ContractTwo year              -1.84009    0.50240  -3.663 0.000250 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## (Dispersion parameter for binomial family taken to be 1)
## 
##     Null deviance: 1143.42  on 998  degrees of freedom
## Residual deviance:  857.46  on 990  degrees of freedom
## AIC: 875.46
## 
## Number of Fisher Scoring iterations: 6

We see that having multiple phone lines, compared to a single phone line, is not statistically significant (p = 0.88).

exp(cbind(OR = fit2$coefficients, confint.default(fit2)))
##                                      OR     2.5 %    97.5 %
## (Intercept)                   0.4691263 0.3193317 0.6891877
## SeniorCitizenYes              1.5850301 1.0342590 2.4291019
## tenure                        0.9692736 0.9588066 0.9798548
## MultipleLinesNo phone service 2.1169086 1.1276451 3.9740356
## MultipleLinesYes              1.0304944 0.6913771 1.5359472
## InternetServiceFiber optic    4.1167125 2.6448701 6.4076196
## InternetServiceNo             0.4173290 0.2061717 0.8447496
## ContractOne year              0.5420625 0.3154242 0.9315448
## ContractTwo year              0.1588025 0.0593216 0.4251107

If tenure increases by 1 month, the odds of churning are multiplied by 0.97, under the assumption that the remaining explanatory variables do not change (p < 0.001).

Given the values of all other variables, the odds of churning for a customer with no phone service are 2.12 times the odds of a customer with a single phone line (p < 0.05).

Given the values of all other variables, the odds of churning for a senior citizen are 1.59 times the odds of a non-senior citizen (p < 0.05).

Given the values of all other variables, the odds of churning for a customer with fiber optic internet are 4.12 times the odds of a customer with DSL (p < 0.001).

Given the values of all other variables, the odds of churning for a customer with no internet service are 0.42 times the odds of a customer with DSL (p < 0.05).

Given the values of all other variables, the odds of churning for a customer with a one-year contract are 0.54 times the odds of a customer with a month-to-month contract (p < 0.05).

Given the values of all other variables, the odds of churning for a customer with a two-year contract are 0.16 times the odds of a customer with a month-to-month contract (p < 0.001).
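
The fitted model can also be used to score new customers. A minimal sketch for a hypothetical customer profile (the levels below all exist in the data; predict() with type = "response" returns a churn probability):

new_cust <- data.frame(SeniorCitizen   = "No",
                       tenure          = 12,
                       MultipleLines   = "Yes",
                       InternetService = "Fiber optic",
                       Contract        = "Month-to-month")
predict(fit2, newdata = new_cust, type = "response")   # estimated churn probability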

df$EstimatedProbab <- fitted(fit2)   # estimated churn probability for each unit
head(df)
##   SeniorCitizen tenure MultipleLines InternetService       Contract Churn    StdResid     CookDist EstimatedProbab
## 1            No     71           Yes             DSL       Two year    No -0.14401934 2.689948e-06     0.008303542
## 2            No     53           Yes     Fiber optic       One year    No -0.61239629 2.028701e-04     0.171049282
## 3            No     67            No              No       Two year    No -0.09658068 6.770075e-07     0.003827074
## 4            No      6            No     Fiber optic Month-to-month    No -1.38722998 1.238401e-03     0.615601779
## 5            No     14            No              No Month-to-month    No -0.48866018 1.422785e-04     0.112277938
## 6            No     68            No     Fiber optic       One year    No -0.49242353 1.397198e-04     0.111414469

In the far-right column we can observe the estimated chance of each customer churning, given the values of their explanatory variables.

# Classify each unit using a 0.5 probability cutoff
df$Classification <- ifelse(test = df$EstimatedProbab < 0.5,
                            yes = "No",
                            no = "Yes")

ClassificationTable <- table(df$Churn, df$Classification)

ClassificationTable
##      
##        No Yes
##   No  664  76
##   Yes 133 126

The correctly classified units of observation lie on the main diagonal of the matrix.

# Count R2: correctly classified units (main diagonal) divided by all units
Pseudo_R2_fit2 <- (ClassificationTable[1,1] + ClassificationTable[2,2])/nrow(df)
Pseudo_R2_fit2
## [1] 0.7907908

79.1 percent of all observations were correctly classified, an improvement over the 74 percent baseline of the intercept-only model.
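
Overall accuracy alone can hide how each class is handled. Sensitivity and specificity can be read off the same classification table; a sketch using the counts printed above:

sens <- ClassificationTable["Yes", "Yes"] / sum(ClassificationTable["Yes", ])   # 126/259, ~0.49
spec <- ClassificationTable["No", "No"] / sum(ClassificationTable["No", ])      # 664/740, ~0.90
c(sensitivity = sens, specificity = spec)

The model identifies non-churners much more reliably than churners, which is unsurprising given that only about a quarter of the sample churned.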