Customer Attrition

Customer attrition, also known as customer churn, customer turnover, or customer defection, is the loss of clients or customers.

Telephone service companies, Internet service providers, pay TV companies, insurance firms, and alarm monitoring services, often use customer attrition analysis and customer attrition rates as one of their key business metrics because the cost of retaining an existing customer is far less than acquiring a new one. Companies from these sectors often have customer service branches which attempt to win back defecting clients, because recovered long-term customers can be worth much more to a company than newly recruited clients.

Companies usually make a distinction between voluntary churn and involuntary churn. Voluntary churn occurs due to a decision by the customer to switch to another company or service provider, involuntary churn occurs due to circumstances such as a customer’s relocation to a long-term care facility, death, or the relocation to a distant location. In most applications, involuntary reasons for churn are excluded from the analytical models. Analysts tend to concentrate on voluntary churn, because it typically occurs due to factors of the company-customer relationship which companies control, such as how billing interactions are handled or how after-sales help is provided.

predictive analytics use churn prediction models that predict customer churn by assessing their propensity of risk to churn. Since these models generate a small prioritized list of potential defectors, they are effective at focusing customer retention marketing programs on the subset of the customer base who are most vulnerable to churn.

library(tidyverse) 
library(dplyr)
library(MASS)
library(car)
library(e1071)
library(caret)
library(cowplot)
library(caTools)
library(pROC)
library(ggcorrplot)
telco <- read.csv("WA_Fn-UseC_-Telco-Customer-Churn.csv")
glimpse(telco)
## Observations: 7,043
## Variables: 21
## $ customerID       <fct> 7590-VHVEG, 5575-GNVDE, 3668-QPYBK, 7795-CFOCW, 92...
## $ gender           <fct> Female, Male, Male, Male, Female, Female, Male, Fe...
## $ SeniorCitizen    <int> 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,...
## $ Partner          <fct> Yes, No, No, No, No, No, No, No, Yes, No, Yes, No,...
## $ Dependents       <fct> No, No, No, No, No, No, Yes, No, No, Yes, Yes, No,...
## $ tenure           <int> 1, 34, 2, 45, 2, 8, 22, 10, 28, 62, 13, 16, 58, 49...
## $ PhoneService     <fct> No, Yes, Yes, No, Yes, Yes, Yes, No, Yes, Yes, Yes...
## $ MultipleLines    <fct> No phone service, No, No, No phone service, No, Ye...
## $ InternetService  <fct> DSL, DSL, DSL, DSL, Fiber optic, Fiber optic, Fibe...
## $ OnlineSecurity   <fct> No, Yes, Yes, Yes, No, No, No, Yes, No, Yes, Yes, ...
## $ OnlineBackup     <fct> Yes, No, Yes, No, No, No, Yes, No, No, Yes, No, No...
## $ DeviceProtection <fct> No, Yes, No, Yes, No, Yes, No, No, Yes, No, No, No...
## $ TechSupport      <fct> No, No, No, Yes, No, No, No, No, Yes, No, No, No i...
## $ StreamingTV      <fct> No, No, No, No, No, Yes, Yes, No, Yes, No, No, No ...
## $ StreamingMovies  <fct> No, No, No, No, No, Yes, No, No, Yes, No, No, No i...
## $ Contract         <fct> Month-to-month, One year, Month-to-month, One year...
## $ PaperlessBilling <fct> Yes, No, Yes, No, Yes, Yes, Yes, No, Yes, No, Yes,...
## $ PaymentMethod    <fct> Electronic check, Mailed check, Mailed check, Bank...
## $ MonthlyCharges   <dbl> 29.85, 56.95, 53.85, 42.30, 70.70, 99.65, 89.10, 2...
## $ TotalCharges     <dbl> 29.85, 1889.50, 108.15, 1840.75, 151.65, 820.50, 1...
## $ Churn            <fct> No, No, Yes, No, Yes, Yes, No, No, Yes, No, No, No...
telco <- telco %>% 
  mutate(SeniorCitizen= as.factor(SeniorCitizen))
colSums(is.na(telco))
##       customerID           gender    SeniorCitizen          Partner 
##                0                0                0                0 
##       Dependents           tenure     PhoneService    MultipleLines 
##                0                0                0                0 
##  InternetService   OnlineSecurity     OnlineBackup DeviceProtection 
##                0                0                0                0 
##      TechSupport      StreamingTV  StreamingMovies         Contract 
##                0                0                0                0 
## PaperlessBilling    PaymentMethod   MonthlyCharges     TotalCharges 
##                0                0                0               11 
##            Churn 
##                0
  1. There are only 11 missing data in the TotalCharges field, so getting rid of those rows from the dataset.
  2. There are three continuous variables and they are Tenure, MonthlyCharges and TotalCharges. SeniorCitizen is in ‘int’ form, that can be changed to categorical.

EXPLORATORY DATA ANALYSIS:

telco <- telco %>% 
  drop_na(TotalCharges)
colSums(is.na(telco))
##       customerID           gender    SeniorCitizen          Partner 
##                0                0                0                0 
##       Dependents           tenure     PhoneService    MultipleLines 
##                0                0                0                0 
##  InternetService   OnlineSecurity     OnlineBackup DeviceProtection 
##                0                0                0                0 
##      TechSupport      StreamingTV  StreamingMovies         Contract 
##                0                0                0                0 
## PaperlessBilling    PaymentMethod   MonthlyCharges     TotalCharges 
##                0                0                0                0 
##            Churn 
##                0
ggplot(telco, aes(x=Churn))+
  geom_bar()

  1. Gender - The churn percent is almost equal in case of Male and Females
  2. The percent of churn is higher in case of senior citizens
  3. Customers with Partners and Dependents have lower churn rate as compared to those who don’t have partners & Dependents.
telco_agg <- telco %>% 
  group_by(Contract, Churn) %>% 
  summarise(jumlah = n()) %>% 
  arrange(desc(jumlah))
telco_agg %>% 
  ggplot(aes(x = Contract, y = jumlah)) +
  geom_col(aes(fill = Churn), position = "dodge")

1. A larger percent of Customers with monthly subscription have left when compared to Customers with one or two year contract.
2. Churn percent is higher in case of cutsomers having paperless billing option.
3. Customers who have ElectronicCheck PaymentMethod tend to leave the platform more when compared to other options.

telco <- telco %>% 
  dplyr::select(-customerID)
options(repr.plot.width =6, repr.plot.height = 4)
telco_cor <- round(cor(telco[,c("tenure", "MonthlyCharges", "TotalCharges")]), 1)

ggcorrplot(telco_cor,  title = "Correlation")+theme(plot.title = element_text(hjust = 0.5))

Checking for outliers in the continuous variables, and it seems none of the values are beyond the whiskers here.
Splitting Data for checking model

RNGkind(sample.kind = "Rounding")
## Warning in RNGkind(sample.kind = "Rounding"): non-uniform 'Rounding' sampler
## used
set.seed(19)
index <- sample(nrow(telco), nrow(telco)*0.8)
data_train <- telco[index, ]
data_test <- telco[-index,]

Model 1
Build the first model using all variables

model_full <- glm(Churn~.,"binomial", data_train)
summary(model_full)
## 
## Call:
## glm(formula = Churn ~ ., family = "binomial", data = data_train)
## 
## Deviance Residuals: 
##     Min       1Q   Median       3Q      Max  
## -1.9427  -0.6696  -0.2849   0.7272   3.2043  
## 
## Coefficients: (7 not defined because of singularities)
##                                        Estimate Std. Error z value Pr(>|z|)    
## (Intercept)                           1.650e+00  9.143e-01   1.804 0.071162 .  
## genderMale                           -1.006e-02  7.253e-02  -0.139 0.889715    
## SeniorCitizen1                        2.545e-01  9.459e-02   2.690 0.007143 ** 
## PartnerYes                            1.233e-02  8.758e-02   0.141 0.888029    
## DependentsYes                        -1.798e-01  1.002e-01  -1.795 0.072689 .  
## tenure                               -5.598e-02  6.889e-03  -8.126 4.43e-16 ***
## PhoneServiceYes                       7.249e-01  7.292e-01   0.994 0.320149    
## MultipleLinesNo phone service                NA         NA      NA       NA    
## MultipleLinesYes                      6.214e-01  2.002e-01   3.103 0.001913 ** 
## InternetServiceFiber optic            2.383e+00  8.997e-01   2.649 0.008075 ** 
## InternetServiceNo                    -2.475e+00  9.066e-01  -2.730 0.006336 ** 
## OnlineSecurityNo internet service            NA         NA      NA       NA    
## OnlineSecurityYes                     4.908e-02  2.010e-01   0.244 0.807085    
## OnlineBackupNo internet service              NA         NA      NA       NA    
## OnlineBackupYes                       1.680e-01  1.968e-01   0.854 0.393230    
## DeviceProtectionNo internet service          NA         NA      NA       NA    
## DeviceProtectionYes                   2.941e-01  1.988e-01   1.480 0.138943    
## TechSupportNo internet service               NA         NA      NA       NA    
## TechSupportYes                       -1.205e-01  2.036e-01  -0.592 0.553707    
## StreamingTVNo internet service               NA         NA      NA       NA    
## StreamingTVYes                        8.786e-01  3.679e-01   2.388 0.016942 *  
## StreamingMoviesNo internet service           NA         NA      NA       NA    
## StreamingMoviesYes                    7.776e-01  3.671e-01   2.118 0.034171 *  
## ContractOne year                     -7.141e-01  1.211e-01  -5.895 3.75e-09 ***
## ContractTwo year                     -1.373e+00  1.971e-01  -6.968 3.22e-12 ***
## PaperlessBillingYes                   4.151e-01  8.314e-02   4.992 5.98e-07 ***
## PaymentMethodCredit card (automatic)  1.563e-02  1.279e-01   0.122 0.902728    
## PaymentMethodElectronic check         3.565e-01  1.075e-01   3.317 0.000910 ***
## PaymentMethodMailed check            -9.416e-04  1.295e-01  -0.007 0.994200    
## MonthlyCharges                       -6.591e-02  3.570e-02  -1.846 0.064873 .  
## TotalCharges                          2.704e-04  7.839e-05   3.450 0.000561 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## (Dispersion parameter for binomial family taken to be 1)
## 
##     Null deviance: 6532.1  on 5624  degrees of freedom
## Residual deviance: 4662.2  on 5601  degrees of freedom
## AIC: 4710.2
## 
## Number of Fisher Scoring iterations: 6

Using stepAIC for variable selection, which is a iterative process of adding or removing variables, in order to get a subset of variables that gives the best performing model.

model_both <- stepAIC(model_full, direction = "both")
## Start:  AIC=4710.2
## Churn ~ gender + SeniorCitizen + Partner + Dependents + tenure + 
##     PhoneService + MultipleLines + InternetService + OnlineSecurity + 
##     OnlineBackup + DeviceProtection + TechSupport + StreamingTV + 
##     StreamingMovies + Contract + PaperlessBilling + PaymentMethod + 
##     MonthlyCharges + TotalCharges
## 
## 
## Step:  AIC=4710.2
## Churn ~ gender + SeniorCitizen + Partner + Dependents + tenure + 
##     MultipleLines + InternetService + OnlineSecurity + OnlineBackup + 
##     DeviceProtection + TechSupport + StreamingTV + StreamingMovies + 
##     Contract + PaperlessBilling + PaymentMethod + MonthlyCharges + 
##     TotalCharges
## 
##                    Df Deviance    AIC
## - gender            1   4662.2 4708.2
## - Partner           1   4662.2 4708.2
## - OnlineSecurity    1   4662.3 4708.3
## - TechSupport       1   4662.6 4708.6
## - OnlineBackup      1   4662.9 4708.9
## <none>                  4662.2 4710.2
## - DeviceProtection  1   4664.4 4710.4
## - Dependents        1   4665.4 4711.4
## - MonthlyCharges    1   4665.6 4711.6
## - StreamingMovies   1   4666.7 4712.7
## - StreamingTV       1   4667.9 4713.9
## - InternetService   1   4669.2 4715.2
## - SeniorCitizen     1   4669.4 4715.4
## - TotalCharges      1   4674.6 4720.6
## - PaymentMethod     3   4682.6 4724.6
## - MultipleLines     2   4683.0 4727.0
## - PaperlessBilling  1   4687.3 4733.3
## - Contract          2   4732.8 4776.8
## - tenure            1   4739.1 4785.1
## 
## Step:  AIC=4708.22
## Churn ~ SeniorCitizen + Partner + Dependents + tenure + MultipleLines + 
##     InternetService + OnlineSecurity + OnlineBackup + DeviceProtection + 
##     TechSupport + StreamingTV + StreamingMovies + Contract + 
##     PaperlessBilling + PaymentMethod + MonthlyCharges + TotalCharges
## 
##                    Df Deviance    AIC
## - Partner           1   4662.2 4706.2
## - OnlineSecurity    1   4662.3 4706.3
## - TechSupport       1   4662.6 4706.6
## - OnlineBackup      1   4663.0 4707.0
## <none>                  4662.2 4708.2
## - DeviceProtection  1   4664.4 4708.4
## - Dependents        1   4665.5 4709.5
## - MonthlyCharges    1   4665.7 4709.7
## + gender            1   4662.2 4710.2
## - StreamingMovies   1   4666.7 4710.7
## - StreamingTV       1   4668.0 4712.0
## - InternetService   1   4669.3 4713.3
## - SeniorCitizen     1   4669.5 4713.5
## - TotalCharges      1   4674.6 4718.6
## - PaymentMethod     3   4682.6 4722.6
## - MultipleLines     2   4683.1 4725.1
## - PaperlessBilling  1   4687.4 4731.4
## - Contract          2   4732.8 4774.8
## - tenure            1   4739.2 4783.2
## 
## Step:  AIC=4706.24
## Churn ~ SeniorCitizen + Dependents + tenure + MultipleLines + 
##     InternetService + OnlineSecurity + OnlineBackup + DeviceProtection + 
##     TechSupport + StreamingTV + StreamingMovies + Contract + 
##     PaperlessBilling + PaymentMethod + MonthlyCharges + TotalCharges
## 
##                    Df Deviance    AIC
## - OnlineSecurity    1   4662.3 4704.3
## - TechSupport       1   4662.6 4704.6
## - OnlineBackup      1   4663.0 4705.0
## <none>                  4662.2 4706.2
## - DeviceProtection  1   4664.5 4706.5
## - MonthlyCharges    1   4665.7 4707.7
## - Dependents        1   4666.0 4708.0
## + Partner           1   4662.2 4708.2
## + gender            1   4662.2 4708.2
## - StreamingMovies   1   4666.8 4708.8
## - StreamingTV       1   4668.0 4710.0
## - InternetService   1   4669.3 4711.3
## - SeniorCitizen     1   4669.6 4711.6
## - TotalCharges      1   4674.6 4716.6
## - PaymentMethod     3   4682.7 4720.7
## - MultipleLines     2   4683.1 4723.1
## - PaperlessBilling  1   4687.4 4729.4
## - Contract          2   4732.9 4772.9
## - tenure            1   4739.6 4781.6
## 
## Step:  AIC=4704.31
## Churn ~ SeniorCitizen + Dependents + tenure + MultipleLines + 
##     InternetService + OnlineBackup + DeviceProtection + TechSupport + 
##     StreamingTV + StreamingMovies + Contract + PaperlessBilling + 
##     PaymentMethod + MonthlyCharges + TotalCharges
## 
##                    Df Deviance    AIC
## - OnlineBackup      1   4663.5 4703.5
## - TechSupport       1   4663.8 4703.8
## <none>                  4662.3 4704.3
## - Dependents        1   4666.0 4706.0
## + OnlineSecurity    1   4662.2 4706.2
## + gender            1   4662.3 4706.3
## + Partner           1   4662.3 4706.3
## - DeviceProtection  1   4666.8 4706.8
## - SeniorCitizen     1   4669.7 4709.7
## - MonthlyCharges    1   4674.4 4714.4
## - TotalCharges      1   4674.8 4714.8
## - StreamingMovies   1   4676.3 4716.3
## - PaymentMethod     3   4682.7 4718.7
## - StreamingTV       1   4680.8 4720.8
## - MultipleLines     2   4687.6 4725.6
## - PaperlessBilling  1   4687.5 4727.5
## - InternetService   1   4689.7 4729.7
## - Contract          2   4732.9 4770.9
## - tenure            1   4739.6 4779.6
## 
## Step:  AIC=4703.49
## Churn ~ SeniorCitizen + Dependents + tenure + MultipleLines + 
##     InternetService + DeviceProtection + TechSupport + StreamingTV + 
##     StreamingMovies + Contract + PaperlessBilling + PaymentMethod + 
##     MonthlyCharges + TotalCharges
## 
##                    Df Deviance    AIC
## <none>                  4663.5 4703.5
## + OnlineBackup      1   4662.3 4704.3
## - DeviceProtection  1   4666.8 4704.8
## + OnlineSecurity    1   4663.0 4705.0
## - Dependents        1   4667.2 4705.2
## - TechSupport       1   4667.3 4705.3
## + gender            1   4663.5 4705.5
## + Partner           1   4663.5 4705.5
## - SeniorCitizen     1   4670.9 4708.9
## - TotalCharges      1   4676.2 4714.2
## - MonthlyCharges    1   4677.8 4715.8
## - StreamingMovies   1   4678.9 4716.9
## - PaymentMethod     3   4684.1 4718.1
## - StreamingTV       1   4684.6 4722.6
## - MultipleLines     2   4688.1 4724.1
## - PaperlessBilling  1   4689.2 4727.2
## - InternetService   1   4701.6 4739.6
## - Contract          2   4735.4 4771.4
## - tenure            1   4740.8 4778.8
summary(model_both)
## 
## Call:
## glm(formula = Churn ~ SeniorCitizen + Dependents + tenure + MultipleLines + 
##     InternetService + DeviceProtection + TechSupport + StreamingTV + 
##     StreamingMovies + Contract + PaperlessBilling + PaymentMethod + 
##     MonthlyCharges + TotalCharges, family = "binomial", data = data_train)
## 
## Deviance Residuals: 
##     Min       1Q   Median       3Q      Max  
## -1.9500  -0.6687  -0.2870   0.7286   3.2230  
## 
## Coefficients: (4 not defined because of singularities)
##                                        Estimate Std. Error z value Pr(>|z|)    
## (Intercept)                           1.450e+00  5.963e-01   2.431 0.015071 *  
## SeniorCitizen1                        2.569e-01  9.403e-02   2.732 0.006294 ** 
## DependentsYes                        -1.727e-01  9.054e-02  -1.908 0.056384 .  
## tenure                               -5.590e-02  6.863e-03  -8.145 3.80e-16 ***
## MultipleLinesNo phone service        -3.086e-01  2.788e-01  -1.107 0.268403    
## MultipleLinesYes                      5.204e-01  1.073e-01   4.848 1.24e-06 ***
## InternetServiceFiber optic            1.875e+00  3.064e-01   6.120 9.35e-10 ***
## InternetServiceNo                    -1.962e+00  3.659e-01  -5.362 8.23e-08 ***
## DeviceProtectionNo internet service          NA         NA      NA       NA    
## DeviceProtectionYes                   1.932e-01  1.063e-01   1.817 0.069165 .  
## TechSupportNo internet service               NA         NA      NA       NA    
## TechSupportYes                       -2.253e-01  1.150e-01  -1.959 0.050067 .  
## StreamingTVNo internet service               NA         NA      NA       NA    
## StreamingTVYes                        6.769e-01  1.480e-01   4.574 4.78e-06 ***
## StreamingMoviesNo internet service           NA         NA      NA       NA    
## StreamingMoviesYes                    5.731e-01  1.467e-01   3.905 9.41e-05 ***
## ContractOne year                     -7.186e-01  1.211e-01  -5.936 2.92e-09 ***
## ContractTwo year                     -1.383e+00  1.967e-01  -7.029 2.08e-12 ***
## PaperlessBillingYes                   4.193e-01  8.303e-02   5.050 4.41e-07 ***
## PaymentMethodCredit card (automatic)  1.611e-02  1.278e-01   0.126 0.899659    
## PaymentMethodElectronic check         3.572e-01  1.074e-01   3.327 0.000879 ***
## PaymentMethodMailed check            -3.325e-03  1.294e-01  -0.026 0.979498    
## MonthlyCharges                       -4.548e-02  1.207e-02  -3.769 0.000164 ***
## TotalCharges                          2.740e-04  7.828e-05   3.500 0.000466 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## (Dispersion parameter for binomial family taken to be 1)
## 
##     Null deviance: 6532.1  on 5624  degrees of freedom
## Residual deviance: 4663.5  on 5605  degrees of freedom
## AIC: 4703.5
## 
## Number of Fisher Scoring iterations: 6

Build Model 3 for get p-value low

model_3 <- glm(formula = Churn ~ SeniorCitizen + Dependents + tenure + MultipleLines + 
    InternetService + DeviceProtection + TechSupport + StreamingTV + 
    StreamingMovies + Contract + PaperlessBilling + PaymentMethod + 
    MonthlyCharges + TotalCharges, family = "binomial", data = data_train)
summary(model_3)
## 
## Call:
## glm(formula = Churn ~ SeniorCitizen + Dependents + tenure + MultipleLines + 
##     InternetService + DeviceProtection + TechSupport + StreamingTV + 
##     StreamingMovies + Contract + PaperlessBilling + PaymentMethod + 
##     MonthlyCharges + TotalCharges, family = "binomial", data = data_train)
## 
## Deviance Residuals: 
##     Min       1Q   Median       3Q      Max  
## -1.9500  -0.6687  -0.2870   0.7286   3.2230  
## 
## Coefficients: (4 not defined because of singularities)
##                                        Estimate Std. Error z value Pr(>|z|)    
## (Intercept)                           1.450e+00  5.963e-01   2.431 0.015071 *  
## SeniorCitizen1                        2.569e-01  9.403e-02   2.732 0.006294 ** 
## DependentsYes                        -1.727e-01  9.054e-02  -1.908 0.056384 .  
## tenure                               -5.590e-02  6.863e-03  -8.145 3.80e-16 ***
## MultipleLinesNo phone service        -3.086e-01  2.788e-01  -1.107 0.268403    
## MultipleLinesYes                      5.204e-01  1.073e-01   4.848 1.24e-06 ***
## InternetServiceFiber optic            1.875e+00  3.064e-01   6.120 9.35e-10 ***
## InternetServiceNo                    -1.962e+00  3.659e-01  -5.362 8.23e-08 ***
## DeviceProtectionNo internet service          NA         NA      NA       NA    
## DeviceProtectionYes                   1.932e-01  1.063e-01   1.817 0.069165 .  
## TechSupportNo internet service               NA         NA      NA       NA    
## TechSupportYes                       -2.253e-01  1.150e-01  -1.959 0.050067 .  
## StreamingTVNo internet service               NA         NA      NA       NA    
## StreamingTVYes                        6.769e-01  1.480e-01   4.574 4.78e-06 ***
## StreamingMoviesNo internet service           NA         NA      NA       NA    
## StreamingMoviesYes                    5.731e-01  1.467e-01   3.905 9.41e-05 ***
## ContractOne year                     -7.186e-01  1.211e-01  -5.936 2.92e-09 ***
## ContractTwo year                     -1.383e+00  1.967e-01  -7.029 2.08e-12 ***
## PaperlessBillingYes                   4.193e-01  8.303e-02   5.050 4.41e-07 ***
## PaymentMethodCredit card (automatic)  1.611e-02  1.278e-01   0.126 0.899659    
## PaymentMethodElectronic check         3.572e-01  1.074e-01   3.327 0.000879 ***
## PaymentMethodMailed check            -3.325e-03  1.294e-01  -0.026 0.979498    
## MonthlyCharges                       -4.548e-02  1.207e-02  -3.769 0.000164 ***
## TotalCharges                          2.740e-04  7.828e-05   3.500 0.000466 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## (Dispersion parameter for binomial family taken to be 1)
## 
##     Null deviance: 6532.1  on 5624  degrees of freedom
## Residual deviance: 4663.5  on 5605  degrees of freedom
## AIC: 4703.5
## 
## Number of Fisher Scoring iterations: 6

Model Evaluation using the validation data:

why model_3, cause AIC lower

pred_telco <- predict(model_3,data_test,"response")
## Warning in predict.lm(object, newdata, se.fit, scale = 1, type = if (type == :
## prediction from a rank-deficient fit may be misleading
head(pred_telco)
##          2         11         15         22         23         24 
## 0.04512903 0.24148402 0.46505525 0.06027001 0.18506237 0.01464914
pred_telco2 <- as.factor(ifelse(pred_telco >= 0.5, "Yes", "No"))
head(pred_telco2)
##  2 11 15 22 23 24 
## No No No No No No 
## Levels: No Yes
head(data_test$Churn)
## [1] No  No  No  No  Yes No 
## Levels: No Yes
table("Predicted" = pred_telco2, "Reference" = data_test$Churn)
##          Reference
## Predicted  No Yes
##       No  924 164
##       Yes 118 201
confusionMatrix(pred_telco2,
                reference = data_test$Churn,
                positive = "Yes")
## Confusion Matrix and Statistics
## 
##           Reference
## Prediction  No Yes
##        No  924 164
##        Yes 118 201
##                                           
##                Accuracy : 0.7996          
##                  95% CI : (0.7777, 0.8202)
##     No Information Rate : 0.7406          
##     P-Value [Acc > NIR] : 1.286e-07       
##                                           
##                   Kappa : 0.4561          
##                                           
##  Mcnemar's Test P-Value : 0.007369        
##                                           
##             Sensitivity : 0.5507          
##             Specificity : 0.8868          
##          Pos Pred Value : 0.6301          
##          Neg Pred Value : 0.8493          
##              Prevalence : 0.2594          
##          Detection Rate : 0.1429          
##    Detection Prevalence : 0.2267          
##       Balanced Accuracy : 0.7187          
##                                           
##        'Positive' Class : Yes             
## 

as you can see this model have accuracy almost 80%

now uses recall to increase the “YES” prediction percentage.

pred_recall_telco <- as.factor(ifelse(pred_telco>0.3, "Yes","No"))
confusionMatrix(pred_recall_telco, data_test$Churn, positive = "Yes")
## Confusion Matrix and Statistics
## 
##           Reference
## Prediction  No Yes
##        No  794  80
##        Yes 248 285
##                                           
##                Accuracy : 0.7669          
##                  95% CI : (0.7439, 0.7888)
##     No Information Rate : 0.7406          
##     P-Value [Acc > NIR] : 0.01253         
##                                           
##                   Kappa : 0.4722          
##                                           
##  Mcnemar's Test P-Value : < 2e-16         
##                                           
##             Sensitivity : 0.7808          
##             Specificity : 0.7620          
##          Pos Pred Value : 0.5347          
##          Neg Pred Value : 0.9085          
##              Prevalence : 0.2594          
##          Detection Rate : 0.2026          
##    Detection Prevalence : 0.3788          
##       Balanced Accuracy : 0.7714          
##                                           
##        'Positive' Class : Yes             
## 

but the accuracy decreases because the recall effect makes this prediction not good if we are targeting maximum accuracy, but if we attach importance to reconsideration to find more customers who want to continue this solution right.

now uses precision to increase the percentage of “NO” predictions.

pred_presisi_telco <- as.factor(ifelse(pred_telco>0.6, "Yes","No"))
confusionMatrix(pred_presisi_telco, data_test$Churn, positive = "Yes")
## Confusion Matrix and Statistics
## 
##           Reference
## Prediction  No Yes
##        No  978 225
##        Yes  64 140
##                                           
##                Accuracy : 0.7946          
##                  95% CI : (0.7725, 0.8154)
##     No Information Rate : 0.7406          
##     P-Value [Acc > NIR] : 1.284e-06       
##                                           
##                   Kappa : 0.376           
##                                           
##  Mcnemar's Test P-Value : < 2.2e-16       
##                                           
##             Sensitivity : 0.3836          
##             Specificity : 0.9386          
##          Pos Pred Value : 0.6863          
##          Neg Pred Value : 0.8130          
##              Prevalence : 0.2594          
##          Detection Rate : 0.0995          
##    Detection Prevalence : 0.1450          
##       Balanced Accuracy : 0.6611          
##                                           
##        'Positive' Class : Yes             
## 

if we use the precision method, we will deduct the exact amount that will continue service. but it has a disadvantage that is the possibility to continue below 60% (to continue service) is very large.