Customer Attrition
Customer attrition, also known as customer churn, customer turnover, or customer defection, is the loss of clients or customers.
Telephone service companies, Internet service providers, pay TV companies, insurance firms, and alarm monitoring services, often use customer attrition analysis and customer attrition rates as one of their key business metrics because the cost of retaining an existing customer is far less than acquiring a new one. Companies from these sectors often have customer service branches which attempt to win back defecting clients, because recovered long-term customers can be worth much more to a company than newly recruited clients.
Companies usually make a distinction between voluntary churn and involuntary churn. Voluntary churn occurs due to a decision by the customer to switch to another company or service provider, involuntary churn occurs due to circumstances such as a customer’s relocation to a long-term care facility, death, or the relocation to a distant location. In most applications, involuntary reasons for churn are excluded from the analytical models. Analysts tend to concentrate on voluntary churn, because it typically occurs due to factors of the company-customer relationship which companies control, such as how billing interactions are handled or how after-sales help is provided.
predictive analytics use churn prediction models that predict customer churn by assessing their propensity of risk to churn. Since these models generate a small prioritized list of potential defectors, they are effective at focusing customer retention marketing programs on the subset of the customer base who are most vulnerable to churn.
library(tidyverse)
library(dplyr)
library(MASS)
library(car)
library(e1071)
library(caret)
library(cowplot)
library(caTools)
library(pROC)
library(ggcorrplot)
telco <- read.csv("WA_Fn-UseC_-Telco-Customer-Churn.csv")
glimpse(telco)
## Observations: 7,043
## Variables: 21
## $ customerID <fct> 7590-VHVEG, 5575-GNVDE, 3668-QPYBK, 7795-CFOCW, 92...
## $ gender <fct> Female, Male, Male, Male, Female, Female, Male, Fe...
## $ SeniorCitizen <int> 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,...
## $ Partner <fct> Yes, No, No, No, No, No, No, No, Yes, No, Yes, No,...
## $ Dependents <fct> No, No, No, No, No, No, Yes, No, No, Yes, Yes, No,...
## $ tenure <int> 1, 34, 2, 45, 2, 8, 22, 10, 28, 62, 13, 16, 58, 49...
## $ PhoneService <fct> No, Yes, Yes, No, Yes, Yes, Yes, No, Yes, Yes, Yes...
## $ MultipleLines <fct> No phone service, No, No, No phone service, No, Ye...
## $ InternetService <fct> DSL, DSL, DSL, DSL, Fiber optic, Fiber optic, Fibe...
## $ OnlineSecurity <fct> No, Yes, Yes, Yes, No, No, No, Yes, No, Yes, Yes, ...
## $ OnlineBackup <fct> Yes, No, Yes, No, No, No, Yes, No, No, Yes, No, No...
## $ DeviceProtection <fct> No, Yes, No, Yes, No, Yes, No, No, Yes, No, No, No...
## $ TechSupport <fct> No, No, No, Yes, No, No, No, No, Yes, No, No, No i...
## $ StreamingTV <fct> No, No, No, No, No, Yes, Yes, No, Yes, No, No, No ...
## $ StreamingMovies <fct> No, No, No, No, No, Yes, No, No, Yes, No, No, No i...
## $ Contract <fct> Month-to-month, One year, Month-to-month, One year...
## $ PaperlessBilling <fct> Yes, No, Yes, No, Yes, Yes, Yes, No, Yes, No, Yes,...
## $ PaymentMethod <fct> Electronic check, Mailed check, Mailed check, Bank...
## $ MonthlyCharges <dbl> 29.85, 56.95, 53.85, 42.30, 70.70, 99.65, 89.10, 2...
## $ TotalCharges <dbl> 29.85, 1889.50, 108.15, 1840.75, 151.65, 820.50, 1...
## $ Churn <fct> No, No, Yes, No, Yes, Yes, No, No, Yes, No, No, No...
telco <- telco %>%
mutate(SeniorCitizen= as.factor(SeniorCitizen))
colSums(is.na(telco))
## customerID gender SeniorCitizen Partner
## 0 0 0 0
## Dependents tenure PhoneService MultipleLines
## 0 0 0 0
## InternetService OnlineSecurity OnlineBackup DeviceProtection
## 0 0 0 0
## TechSupport StreamingTV StreamingMovies Contract
## 0 0 0 0
## PaperlessBilling PaymentMethod MonthlyCharges TotalCharges
## 0 0 0 11
## Churn
## 0
EXPLORATORY DATA ANALYSIS:
telco <- telco %>%
drop_na(TotalCharges)
colSums(is.na(telco))
## customerID gender SeniorCitizen Partner
## 0 0 0 0
## Dependents tenure PhoneService MultipleLines
## 0 0 0 0
## InternetService OnlineSecurity OnlineBackup DeviceProtection
## 0 0 0 0
## TechSupport StreamingTV StreamingMovies Contract
## 0 0 0 0
## PaperlessBilling PaymentMethod MonthlyCharges TotalCharges
## 0 0 0 0
## Churn
## 0
ggplot(telco, aes(x=Churn))+
geom_bar()
telco_agg <- telco %>%
group_by(Contract, Churn) %>%
summarise(jumlah = n()) %>%
arrange(desc(jumlah))
telco_agg %>%
ggplot(aes(x = Contract, y = jumlah)) +
geom_col(aes(fill = Churn), position = "dodge")
1. A larger percent of Customers with monthly subscription have left when compared to Customers with one or two year contract.
2. Churn percent is higher in case of cutsomers having paperless billing option.
3. Customers who have ElectronicCheck PaymentMethod tend to leave the platform more when compared to other options.
telco <- telco %>%
dplyr::select(-customerID)
options(repr.plot.width =6, repr.plot.height = 4)
telco_cor <- round(cor(telco[,c("tenure", "MonthlyCharges", "TotalCharges")]), 1)
ggcorrplot(telco_cor, title = "Correlation")+theme(plot.title = element_text(hjust = 0.5))
Checking for outliers in the continuous variables, and it seems none of the values are beyond the whiskers here.
Splitting Data for checking model
RNGkind(sample.kind = "Rounding")
## Warning in RNGkind(sample.kind = "Rounding"): non-uniform 'Rounding' sampler
## used
set.seed(19)
index <- sample(nrow(telco), nrow(telco)*0.8)
data_train <- telco[index, ]
data_test <- telco[-index,]
Model 1
Build the first model using all variables
model_full <- glm(Churn~.,"binomial", data_train)
summary(model_full)
##
## Call:
## glm(formula = Churn ~ ., family = "binomial", data = data_train)
##
## Deviance Residuals:
## Min 1Q Median 3Q Max
## -1.9427 -0.6696 -0.2849 0.7272 3.2043
##
## Coefficients: (7 not defined because of singularities)
## Estimate Std. Error z value Pr(>|z|)
## (Intercept) 1.650e+00 9.143e-01 1.804 0.071162 .
## genderMale -1.006e-02 7.253e-02 -0.139 0.889715
## SeniorCitizen1 2.545e-01 9.459e-02 2.690 0.007143 **
## PartnerYes 1.233e-02 8.758e-02 0.141 0.888029
## DependentsYes -1.798e-01 1.002e-01 -1.795 0.072689 .
## tenure -5.598e-02 6.889e-03 -8.126 4.43e-16 ***
## PhoneServiceYes 7.249e-01 7.292e-01 0.994 0.320149
## MultipleLinesNo phone service NA NA NA NA
## MultipleLinesYes 6.214e-01 2.002e-01 3.103 0.001913 **
## InternetServiceFiber optic 2.383e+00 8.997e-01 2.649 0.008075 **
## InternetServiceNo -2.475e+00 9.066e-01 -2.730 0.006336 **
## OnlineSecurityNo internet service NA NA NA NA
## OnlineSecurityYes 4.908e-02 2.010e-01 0.244 0.807085
## OnlineBackupNo internet service NA NA NA NA
## OnlineBackupYes 1.680e-01 1.968e-01 0.854 0.393230
## DeviceProtectionNo internet service NA NA NA NA
## DeviceProtectionYes 2.941e-01 1.988e-01 1.480 0.138943
## TechSupportNo internet service NA NA NA NA
## TechSupportYes -1.205e-01 2.036e-01 -0.592 0.553707
## StreamingTVNo internet service NA NA NA NA
## StreamingTVYes 8.786e-01 3.679e-01 2.388 0.016942 *
## StreamingMoviesNo internet service NA NA NA NA
## StreamingMoviesYes 7.776e-01 3.671e-01 2.118 0.034171 *
## ContractOne year -7.141e-01 1.211e-01 -5.895 3.75e-09 ***
## ContractTwo year -1.373e+00 1.971e-01 -6.968 3.22e-12 ***
## PaperlessBillingYes 4.151e-01 8.314e-02 4.992 5.98e-07 ***
## PaymentMethodCredit card (automatic) 1.563e-02 1.279e-01 0.122 0.902728
## PaymentMethodElectronic check 3.565e-01 1.075e-01 3.317 0.000910 ***
## PaymentMethodMailed check -9.416e-04 1.295e-01 -0.007 0.994200
## MonthlyCharges -6.591e-02 3.570e-02 -1.846 0.064873 .
## TotalCharges 2.704e-04 7.839e-05 3.450 0.000561 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## (Dispersion parameter for binomial family taken to be 1)
##
## Null deviance: 6532.1 on 5624 degrees of freedom
## Residual deviance: 4662.2 on 5601 degrees of freedom
## AIC: 4710.2
##
## Number of Fisher Scoring iterations: 6
Using stepAIC for variable selection, which is a iterative process of adding or removing variables, in order to get a subset of variables that gives the best performing model.
model_both <- stepAIC(model_full, direction = "both")
## Start: AIC=4710.2
## Churn ~ gender + SeniorCitizen + Partner + Dependents + tenure +
## PhoneService + MultipleLines + InternetService + OnlineSecurity +
## OnlineBackup + DeviceProtection + TechSupport + StreamingTV +
## StreamingMovies + Contract + PaperlessBilling + PaymentMethod +
## MonthlyCharges + TotalCharges
##
##
## Step: AIC=4710.2
## Churn ~ gender + SeniorCitizen + Partner + Dependents + tenure +
## MultipleLines + InternetService + OnlineSecurity + OnlineBackup +
## DeviceProtection + TechSupport + StreamingTV + StreamingMovies +
## Contract + PaperlessBilling + PaymentMethod + MonthlyCharges +
## TotalCharges
##
## Df Deviance AIC
## - gender 1 4662.2 4708.2
## - Partner 1 4662.2 4708.2
## - OnlineSecurity 1 4662.3 4708.3
## - TechSupport 1 4662.6 4708.6
## - OnlineBackup 1 4662.9 4708.9
## <none> 4662.2 4710.2
## - DeviceProtection 1 4664.4 4710.4
## - Dependents 1 4665.4 4711.4
## - MonthlyCharges 1 4665.6 4711.6
## - StreamingMovies 1 4666.7 4712.7
## - StreamingTV 1 4667.9 4713.9
## - InternetService 1 4669.2 4715.2
## - SeniorCitizen 1 4669.4 4715.4
## - TotalCharges 1 4674.6 4720.6
## - PaymentMethod 3 4682.6 4724.6
## - MultipleLines 2 4683.0 4727.0
## - PaperlessBilling 1 4687.3 4733.3
## - Contract 2 4732.8 4776.8
## - tenure 1 4739.1 4785.1
##
## Step: AIC=4708.22
## Churn ~ SeniorCitizen + Partner + Dependents + tenure + MultipleLines +
## InternetService + OnlineSecurity + OnlineBackup + DeviceProtection +
## TechSupport + StreamingTV + StreamingMovies + Contract +
## PaperlessBilling + PaymentMethod + MonthlyCharges + TotalCharges
##
## Df Deviance AIC
## - Partner 1 4662.2 4706.2
## - OnlineSecurity 1 4662.3 4706.3
## - TechSupport 1 4662.6 4706.6
## - OnlineBackup 1 4663.0 4707.0
## <none> 4662.2 4708.2
## - DeviceProtection 1 4664.4 4708.4
## - Dependents 1 4665.5 4709.5
## - MonthlyCharges 1 4665.7 4709.7
## + gender 1 4662.2 4710.2
## - StreamingMovies 1 4666.7 4710.7
## - StreamingTV 1 4668.0 4712.0
## - InternetService 1 4669.3 4713.3
## - SeniorCitizen 1 4669.5 4713.5
## - TotalCharges 1 4674.6 4718.6
## - PaymentMethod 3 4682.6 4722.6
## - MultipleLines 2 4683.1 4725.1
## - PaperlessBilling 1 4687.4 4731.4
## - Contract 2 4732.8 4774.8
## - tenure 1 4739.2 4783.2
##
## Step: AIC=4706.24
## Churn ~ SeniorCitizen + Dependents + tenure + MultipleLines +
## InternetService + OnlineSecurity + OnlineBackup + DeviceProtection +
## TechSupport + StreamingTV + StreamingMovies + Contract +
## PaperlessBilling + PaymentMethod + MonthlyCharges + TotalCharges
##
## Df Deviance AIC
## - OnlineSecurity 1 4662.3 4704.3
## - TechSupport 1 4662.6 4704.6
## - OnlineBackup 1 4663.0 4705.0
## <none> 4662.2 4706.2
## - DeviceProtection 1 4664.5 4706.5
## - MonthlyCharges 1 4665.7 4707.7
## - Dependents 1 4666.0 4708.0
## + Partner 1 4662.2 4708.2
## + gender 1 4662.2 4708.2
## - StreamingMovies 1 4666.8 4708.8
## - StreamingTV 1 4668.0 4710.0
## - InternetService 1 4669.3 4711.3
## - SeniorCitizen 1 4669.6 4711.6
## - TotalCharges 1 4674.6 4716.6
## - PaymentMethod 3 4682.7 4720.7
## - MultipleLines 2 4683.1 4723.1
## - PaperlessBilling 1 4687.4 4729.4
## - Contract 2 4732.9 4772.9
## - tenure 1 4739.6 4781.6
##
## Step: AIC=4704.31
## Churn ~ SeniorCitizen + Dependents + tenure + MultipleLines +
## InternetService + OnlineBackup + DeviceProtection + TechSupport +
## StreamingTV + StreamingMovies + Contract + PaperlessBilling +
## PaymentMethod + MonthlyCharges + TotalCharges
##
## Df Deviance AIC
## - OnlineBackup 1 4663.5 4703.5
## - TechSupport 1 4663.8 4703.8
## <none> 4662.3 4704.3
## - Dependents 1 4666.0 4706.0
## + OnlineSecurity 1 4662.2 4706.2
## + gender 1 4662.3 4706.3
## + Partner 1 4662.3 4706.3
## - DeviceProtection 1 4666.8 4706.8
## - SeniorCitizen 1 4669.7 4709.7
## - MonthlyCharges 1 4674.4 4714.4
## - TotalCharges 1 4674.8 4714.8
## - StreamingMovies 1 4676.3 4716.3
## - PaymentMethod 3 4682.7 4718.7
## - StreamingTV 1 4680.8 4720.8
## - MultipleLines 2 4687.6 4725.6
## - PaperlessBilling 1 4687.5 4727.5
## - InternetService 1 4689.7 4729.7
## - Contract 2 4732.9 4770.9
## - tenure 1 4739.6 4779.6
##
## Step: AIC=4703.49
## Churn ~ SeniorCitizen + Dependents + tenure + MultipleLines +
## InternetService + DeviceProtection + TechSupport + StreamingTV +
## StreamingMovies + Contract + PaperlessBilling + PaymentMethod +
## MonthlyCharges + TotalCharges
##
## Df Deviance AIC
## <none> 4663.5 4703.5
## + OnlineBackup 1 4662.3 4704.3
## - DeviceProtection 1 4666.8 4704.8
## + OnlineSecurity 1 4663.0 4705.0
## - Dependents 1 4667.2 4705.2
## - TechSupport 1 4667.3 4705.3
## + gender 1 4663.5 4705.5
## + Partner 1 4663.5 4705.5
## - SeniorCitizen 1 4670.9 4708.9
## - TotalCharges 1 4676.2 4714.2
## - MonthlyCharges 1 4677.8 4715.8
## - StreamingMovies 1 4678.9 4716.9
## - PaymentMethod 3 4684.1 4718.1
## - StreamingTV 1 4684.6 4722.6
## - MultipleLines 2 4688.1 4724.1
## - PaperlessBilling 1 4689.2 4727.2
## - InternetService 1 4701.6 4739.6
## - Contract 2 4735.4 4771.4
## - tenure 1 4740.8 4778.8
summary(model_both)
##
## Call:
## glm(formula = Churn ~ SeniorCitizen + Dependents + tenure + MultipleLines +
## InternetService + DeviceProtection + TechSupport + StreamingTV +
## StreamingMovies + Contract + PaperlessBilling + PaymentMethod +
## MonthlyCharges + TotalCharges, family = "binomial", data = data_train)
##
## Deviance Residuals:
## Min 1Q Median 3Q Max
## -1.9500 -0.6687 -0.2870 0.7286 3.2230
##
## Coefficients: (4 not defined because of singularities)
## Estimate Std. Error z value Pr(>|z|)
## (Intercept) 1.450e+00 5.963e-01 2.431 0.015071 *
## SeniorCitizen1 2.569e-01 9.403e-02 2.732 0.006294 **
## DependentsYes -1.727e-01 9.054e-02 -1.908 0.056384 .
## tenure -5.590e-02 6.863e-03 -8.145 3.80e-16 ***
## MultipleLinesNo phone service -3.086e-01 2.788e-01 -1.107 0.268403
## MultipleLinesYes 5.204e-01 1.073e-01 4.848 1.24e-06 ***
## InternetServiceFiber optic 1.875e+00 3.064e-01 6.120 9.35e-10 ***
## InternetServiceNo -1.962e+00 3.659e-01 -5.362 8.23e-08 ***
## DeviceProtectionNo internet service NA NA NA NA
## DeviceProtectionYes 1.932e-01 1.063e-01 1.817 0.069165 .
## TechSupportNo internet service NA NA NA NA
## TechSupportYes -2.253e-01 1.150e-01 -1.959 0.050067 .
## StreamingTVNo internet service NA NA NA NA
## StreamingTVYes 6.769e-01 1.480e-01 4.574 4.78e-06 ***
## StreamingMoviesNo internet service NA NA NA NA
## StreamingMoviesYes 5.731e-01 1.467e-01 3.905 9.41e-05 ***
## ContractOne year -7.186e-01 1.211e-01 -5.936 2.92e-09 ***
## ContractTwo year -1.383e+00 1.967e-01 -7.029 2.08e-12 ***
## PaperlessBillingYes 4.193e-01 8.303e-02 5.050 4.41e-07 ***
## PaymentMethodCredit card (automatic) 1.611e-02 1.278e-01 0.126 0.899659
## PaymentMethodElectronic check 3.572e-01 1.074e-01 3.327 0.000879 ***
## PaymentMethodMailed check -3.325e-03 1.294e-01 -0.026 0.979498
## MonthlyCharges -4.548e-02 1.207e-02 -3.769 0.000164 ***
## TotalCharges 2.740e-04 7.828e-05 3.500 0.000466 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## (Dispersion parameter for binomial family taken to be 1)
##
## Null deviance: 6532.1 on 5624 degrees of freedom
## Residual deviance: 4663.5 on 5605 degrees of freedom
## AIC: 4703.5
##
## Number of Fisher Scoring iterations: 6
Build Model 3 for get p-value low
model_3 <- glm(formula = Churn ~ SeniorCitizen + Dependents + tenure + MultipleLines +
InternetService + DeviceProtection + TechSupport + StreamingTV +
StreamingMovies + Contract + PaperlessBilling + PaymentMethod +
MonthlyCharges + TotalCharges, family = "binomial", data = data_train)
summary(model_3)
##
## Call:
## glm(formula = Churn ~ SeniorCitizen + Dependents + tenure + MultipleLines +
## InternetService + DeviceProtection + TechSupport + StreamingTV +
## StreamingMovies + Contract + PaperlessBilling + PaymentMethod +
## MonthlyCharges + TotalCharges, family = "binomial", data = data_train)
##
## Deviance Residuals:
## Min 1Q Median 3Q Max
## -1.9500 -0.6687 -0.2870 0.7286 3.2230
##
## Coefficients: (4 not defined because of singularities)
## Estimate Std. Error z value Pr(>|z|)
## (Intercept) 1.450e+00 5.963e-01 2.431 0.015071 *
## SeniorCitizen1 2.569e-01 9.403e-02 2.732 0.006294 **
## DependentsYes -1.727e-01 9.054e-02 -1.908 0.056384 .
## tenure -5.590e-02 6.863e-03 -8.145 3.80e-16 ***
## MultipleLinesNo phone service -3.086e-01 2.788e-01 -1.107 0.268403
## MultipleLinesYes 5.204e-01 1.073e-01 4.848 1.24e-06 ***
## InternetServiceFiber optic 1.875e+00 3.064e-01 6.120 9.35e-10 ***
## InternetServiceNo -1.962e+00 3.659e-01 -5.362 8.23e-08 ***
## DeviceProtectionNo internet service NA NA NA NA
## DeviceProtectionYes 1.932e-01 1.063e-01 1.817 0.069165 .
## TechSupportNo internet service NA NA NA NA
## TechSupportYes -2.253e-01 1.150e-01 -1.959 0.050067 .
## StreamingTVNo internet service NA NA NA NA
## StreamingTVYes 6.769e-01 1.480e-01 4.574 4.78e-06 ***
## StreamingMoviesNo internet service NA NA NA NA
## StreamingMoviesYes 5.731e-01 1.467e-01 3.905 9.41e-05 ***
## ContractOne year -7.186e-01 1.211e-01 -5.936 2.92e-09 ***
## ContractTwo year -1.383e+00 1.967e-01 -7.029 2.08e-12 ***
## PaperlessBillingYes 4.193e-01 8.303e-02 5.050 4.41e-07 ***
## PaymentMethodCredit card (automatic) 1.611e-02 1.278e-01 0.126 0.899659
## PaymentMethodElectronic check 3.572e-01 1.074e-01 3.327 0.000879 ***
## PaymentMethodMailed check -3.325e-03 1.294e-01 -0.026 0.979498
## MonthlyCharges -4.548e-02 1.207e-02 -3.769 0.000164 ***
## TotalCharges 2.740e-04 7.828e-05 3.500 0.000466 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## (Dispersion parameter for binomial family taken to be 1)
##
## Null deviance: 6532.1 on 5624 degrees of freedom
## Residual deviance: 4663.5 on 5605 degrees of freedom
## AIC: 4703.5
##
## Number of Fisher Scoring iterations: 6
Model Evaluation using the validation data:
why model_3, cause AIC lower
pred_telco <- predict(model_3,data_test,"response")
## Warning in predict.lm(object, newdata, se.fit, scale = 1, type = if (type == :
## prediction from a rank-deficient fit may be misleading
head(pred_telco)
## 2 11 15 22 23 24
## 0.04512903 0.24148402 0.46505525 0.06027001 0.18506237 0.01464914
pred_telco2 <- as.factor(ifelse(pred_telco >= 0.5, "Yes", "No"))
head(pred_telco2)
## 2 11 15 22 23 24
## No No No No No No
## Levels: No Yes
head(data_test$Churn)
## [1] No No No No Yes No
## Levels: No Yes
table("Predicted" = pred_telco2, "Reference" = data_test$Churn)
## Reference
## Predicted No Yes
## No 924 164
## Yes 118 201
confusionMatrix(pred_telco2,
reference = data_test$Churn,
positive = "Yes")
## Confusion Matrix and Statistics
##
## Reference
## Prediction No Yes
## No 924 164
## Yes 118 201
##
## Accuracy : 0.7996
## 95% CI : (0.7777, 0.8202)
## No Information Rate : 0.7406
## P-Value [Acc > NIR] : 1.286e-07
##
## Kappa : 0.4561
##
## Mcnemar's Test P-Value : 0.007369
##
## Sensitivity : 0.5507
## Specificity : 0.8868
## Pos Pred Value : 0.6301
## Neg Pred Value : 0.8493
## Prevalence : 0.2594
## Detection Rate : 0.1429
## Detection Prevalence : 0.2267
## Balanced Accuracy : 0.7187
##
## 'Positive' Class : Yes
##
as you can see this model have accuracy almost 80%
now uses recall to increase the “YES” prediction percentage.
pred_recall_telco <- as.factor(ifelse(pred_telco>0.3, "Yes","No"))
confusionMatrix(pred_recall_telco, data_test$Churn, positive = "Yes")
## Confusion Matrix and Statistics
##
## Reference
## Prediction No Yes
## No 794 80
## Yes 248 285
##
## Accuracy : 0.7669
## 95% CI : (0.7439, 0.7888)
## No Information Rate : 0.7406
## P-Value [Acc > NIR] : 0.01253
##
## Kappa : 0.4722
##
## Mcnemar's Test P-Value : < 2e-16
##
## Sensitivity : 0.7808
## Specificity : 0.7620
## Pos Pred Value : 0.5347
## Neg Pred Value : 0.9085
## Prevalence : 0.2594
## Detection Rate : 0.2026
## Detection Prevalence : 0.3788
## Balanced Accuracy : 0.7714
##
## 'Positive' Class : Yes
##
but the accuracy decreases because the recall effect makes this prediction not good if we are targeting maximum accuracy, but if we attach importance to reconsideration to find more customers who want to continue this solution right.
now uses precision to increase the percentage of “NO” predictions.
pred_presisi_telco <- as.factor(ifelse(pred_telco>0.6, "Yes","No"))
confusionMatrix(pred_presisi_telco, data_test$Churn, positive = "Yes")
## Confusion Matrix and Statistics
##
## Reference
## Prediction No Yes
## No 978 225
## Yes 64 140
##
## Accuracy : 0.7946
## 95% CI : (0.7725, 0.8154)
## No Information Rate : 0.7406
## P-Value [Acc > NIR] : 1.284e-06
##
## Kappa : 0.376
##
## Mcnemar's Test P-Value : < 2.2e-16
##
## Sensitivity : 0.3836
## Specificity : 0.9386
## Pos Pred Value : 0.6863
## Neg Pred Value : 0.8130
## Prevalence : 0.2594
## Detection Rate : 0.0995
## Detection Prevalence : 0.1450
## Balanced Accuracy : 0.6611
##
## 'Positive' Class : Yes
##
if we use the precision method, we will deduct the exact amount that will continue service. but it has a disadvantage that is the possibility to continue below 60% (to continue service) is very large.