Companies need to focus their efforts on reducing customer churn, that is, the number of customers who leave. In this tutorial, I will do a churn analysis for telecom customers using data that I found on Kaggle. You can download the data from this link: https://www.kaggle.com/becksddf/churn-in-telecoms-dataset/data#
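# stringsAsFactors=TRUE gives the factor columns shown below (the default became FALSE in R 4.0.0)
data=read.csv('C:/Users/TOSHIBA/Desktop/churn.csv',header=TRUE,stringsAsFactors=TRUE)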
str(data)
## 'data.frame': 3333 obs. of 21 variables:
## $ state : Factor w/ 51 levels "AK","AL","AR",..: 17 36 32 36 37 2 20 25 19 50 ...
## $ account.length : int 128 107 137 84 75 118 121 147 117 141 ...
## $ area.code : int 415 415 415 408 415 510 510 415 408 415 ...
## $ phone.number : Factor w/ 3333 levels "327-1058","327-1319",..: 1927 1576 1118 1708 111 2254 1048 81 292 118 ...
## $ international.plan : Factor w/ 2 levels "no","yes": 1 1 1 2 2 2 1 2 1 2 ...
## $ voice.mail.plan : Factor w/ 2 levels "no","yes": 2 2 1 1 1 1 2 1 1 2 ...
## $ number.vmail.messages : int 25 26 0 0 0 0 24 0 0 37 ...
## $ total.day.minutes : num 265 162 243 299 167 ...
## $ total.day.calls : int 110 123 114 71 113 98 88 79 97 84 ...
## $ total.day.charge : num 45.1 27.5 41.4 50.9 28.3 ...
## $ total.eve.minutes : num 197.4 195.5 121.2 61.9 148.3 ...
## $ total.eve.calls : int 99 103 110 88 122 101 108 94 80 111 ...
## $ total.eve.charge : num 16.78 16.62 10.3 5.26 12.61 ...
## $ total.night.minutes : num 245 254 163 197 187 ...
## $ total.night.calls : int 91 103 104 89 121 118 118 96 90 97 ...
## $ total.night.charge : num 11.01 11.45 7.32 8.86 8.41 ...
## $ total.intl.minutes : num 10 13.7 12.2 6.6 10.1 6.3 7.5 7.1 8.7 11.2 ...
## $ total.intl.calls : int 3 3 5 7 3 6 7 6 4 5 ...
## $ total.intl.charge : num 2.7 3.7 3.29 1.78 2.73 1.7 2.03 1.92 2.35 3.02 ...
## $ customer.service.calls: int 1 1 0 2 3 0 3 0 1 0 ...
## $ churn : Factor w/ 2 levels "False","True": 1 1 1 1 1 1 1 1 1 1 ...
summary(data)
## state account.length area.code phone.number
## WV : 106 Min. : 1.0 Min. :408.0 327-1058: 1
## MN : 84 1st Qu.: 74.0 1st Qu.:408.0 327-1319: 1
## NY : 83 Median :101.0 Median :415.0 327-3053: 1
## AL : 80 Mean :101.1 Mean :437.2 327-3587: 1
## OH : 78 3rd Qu.:127.0 3rd Qu.:510.0 327-3850: 1
## OR : 78 Max. :243.0 Max. :510.0 327-3954: 1
## (Other):2824 (Other) :3327
## international.plan voice.mail.plan number.vmail.messages
## no :3010 no :2411 Min. : 0.000
## yes: 323 yes: 922 1st Qu.: 0.000
## Median : 0.000
## Mean : 8.099
## 3rd Qu.:20.000
## Max. :51.000
##
## total.day.minutes total.day.calls total.day.charge total.eve.minutes
## Min. : 0.0 Min. : 0.0 Min. : 0.00 Min. : 0.0
## 1st Qu.:143.7 1st Qu.: 87.0 1st Qu.:24.43 1st Qu.:166.6
## Median :179.4 Median :101.0 Median :30.50 Median :201.4
## Mean :179.8 Mean :100.4 Mean :30.56 Mean :201.0
## 3rd Qu.:216.4 3rd Qu.:114.0 3rd Qu.:36.79 3rd Qu.:235.3
## Max. :350.8 Max. :165.0 Max. :59.64 Max. :363.7
##
## total.eve.calls total.eve.charge total.night.minutes total.night.calls
## Min. : 0.0 Min. : 0.00 Min. : 23.2 Min. : 33.0
## 1st Qu.: 87.0 1st Qu.:14.16 1st Qu.:167.0 1st Qu.: 87.0
## Median :100.0 Median :17.12 Median :201.2 Median :100.0
## Mean :100.1 Mean :17.08 Mean :200.9 Mean :100.1
## 3rd Qu.:114.0 3rd Qu.:20.00 3rd Qu.:235.3 3rd Qu.:113.0
## Max. :170.0 Max. :30.91 Max. :395.0 Max. :175.0
##
## total.night.charge total.intl.minutes total.intl.calls total.intl.charge
## Min. : 1.040 Min. : 0.00 Min. : 0.000 Min. :0.000
## 1st Qu.: 7.520 1st Qu.: 8.50 1st Qu.: 3.000 1st Qu.:2.300
## Median : 9.050 Median :10.30 Median : 4.000 Median :2.780
## Mean : 9.039 Mean :10.24 Mean : 4.479 Mean :2.765
## 3rd Qu.:10.590 3rd Qu.:12.10 3rd Qu.: 6.000 3rd Qu.:3.270
## Max. :17.770 Max. :20.00 Max. :20.000 Max. :5.400
##
## customer.service.calls churn
## Min. :0.000 False:2850
## 1st Qu.:1.000 True : 483
## Median :1.000
## Mean :1.563
## 3rd Qu.:2.000
## Max. :9.000
##
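Before plotting, it helps to turn the churn counts from the summary into proportions, because the two classes are far from balanced. The short sketch below only reuses the two counts printed above (2850 and 483); the variable names are mine and are not part of the original analysis.

```r
# Class counts copied from the summary() output above
churn_counts <- c(False = 2850, True = 483)

# Share of each class among the 3333 customers
churn_rate <- churn_counts / sum(churn_counts)
print(round(churn_rate, 4))
```

About 14.5% of customers churned, so a model that always predicts "False" would already be right about 85.5% of the time. On the real data frame, `prop.table(table(data$churn))` should give the same proportions.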
library(ggplot2)
library(plotly)
library(dplyr)
library(randomForest)
To visualise the data we will use two R packages: ggplot2 and plotly.
p=ggplot(data)+geom_bar(aes(x=churn))
p
Churn
p1=ggplot(data)+geom_bar(aes(x=state,fill=churn))
p1
Churn by state
p2=ggplot(data)+geom_bar(aes(x=as.factor(area.code),fill=churn))
p2
Churn by area code
p3=ggplot(data)+geom_bar(aes(x=international.plan,fill=churn))
p3
Churn by international plan
p4=ggplot(data)+geom_bar(aes(x=voice.mail.plan,fill=churn))
p4
Churn by voice mail plan
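The stacked bars above show counts, so groups of very different size are hard to compare. One possible complement is a table of row proportions. The sketch below uses a tiny made-up data frame (`toy`, invented purely for illustration, not the Kaggle data) to show the idea with base R.

```r
# Toy data, made up for illustration only (not the Kaggle data)
toy <- data.frame(
  international.plan = factor(c("no", "no", "no", "no", "yes", "yes")),
  churn = factor(c("False", "False", "False", "True", "False", "True"))
)

# Cross-tabulate plan against churn, then convert each row to proportions
counts <- table(toy$international.plan, toy$churn)
rates <- prop.table(counts, margin = 1)
print(round(rates, 2))
```

Applied to the real data, the same two calls on `data$international.plan` and `data$churn` would give the churn rate per plan. In ggplot2, `geom_bar(aes(x=international.plan,fill=churn), position="fill")` draws the equivalent picture.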
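# Keep columns 7 to 21: the numeric usage variables plus churn (drops state, account.length, area.code, phone.number and the two plan factors)
data2=data[,7:21]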
churn_RandomForest <- randomForest(churn~.,data=data2, ntree = 100,
mtry = 2, na.action = na.roughfix)
print(churn_RandomForest)
##
## Call:
## randomForest(formula = churn ~ ., data = data2, ntree = 100, mtry = 2, na.action = na.roughfix)
## Type of random forest: classification
## Number of trees: 100
## No. of variables tried at each split: 2
##
## OOB estimate of error rate: 8.31%
## Confusion matrix:
## False True class.error
## False 2824 26 0.009122807
## True 251 232 0.519668737
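The confusion matrix is worth reading more closely than the overall error rate. The sketch below retypes the four counts printed above and recomputes the OOB error together with recall and precision for the "True" class. It assumes, as randomForest prints it, that rows are the actual classes and columns the predicted ones; the object names are mine.

```r
# Counts copied from the printed confusion matrix (rows = actual, columns = predicted)
cm <- matrix(c(2824, 26,
               251, 232),
             nrow = 2, byrow = TRUE,
             dimnames = list(actual = c("False", "True"),
                             predicted = c("False", "True")))

# Overall OOB error: misclassified customers over all customers
oob_error <- (cm["False", "True"] + cm["True", "False"]) / sum(cm)

# Recall for churners: share of actual churners the forest catches
recall_true <- cm["True", "True"] / sum(cm["True", ])

# Precision for churners: share of predicted churners who really churned
precision_true <- cm["True", "True"] / sum(cm[, "True"])

print(round(c(oob_error = oob_error,
              recall_true = recall_true,
              precision_true = precision_true), 4))
```

The 8.31% error looks good, but recall for churners is only about 48%: the forest misses more than half of the customers who actually leave, which is the class.error of 0.52 in the "True" row.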
reg <- glm(churn ~.,
data = data2, family = binomial(logit))
reg
##
## Call: glm(formula = churn ~ ., family = binomial(logit), data = data2)
##
## Coefficients:
## (Intercept) number.vmail.messages total.day.minutes
## -7.8815516 -0.0245235 -0.5741144
## total.day.calls total.day.charge total.eve.minutes
## 0.0029735 3.4515631 0.3840558
## total.eve.calls total.eve.charge total.night.minutes
## 0.0009446 -4.4402475 0.0102128
## total.night.calls total.night.charge total.intl.minutes
## 0.0009018 -0.1669207 -1.4374061
## total.intl.calls total.intl.charge customer.service.calls
## -0.0793376 5.6669682 0.4549603
##
## Degrees of Freedom: 3332 Total (i.e. Null); 3318 Residual
## Null Deviance: 2758
## Residual Deviance: 2363 AIC: 2393
summary(reg)
##
## Call:
## glm(formula = churn ~ ., family = binomial(logit), data = data2)
##
## Deviance Residuals:
## Min 1Q Median 3Q Max
## -1.7853 -0.5661 -0.4016 -0.2502 2.9868
##
## Coefficients:
## Estimate Std. Error z value Pr(>|z|)
## (Intercept) -7.8815516 0.6695246 -11.772 < 2e-16 ***
## number.vmail.messages -0.0245235 0.0045098 -5.438 5.39e-08 ***
## total.day.minutes -0.5741144 3.1345784 -0.183 0.854676
## total.day.calls 0.0029735 0.0026444 1.124 0.260809
## total.day.charge 3.4515631 18.4388176 0.187 0.851512
## total.eve.minutes 0.3840558 1.5630932 0.246 0.805913
## total.eve.calls 0.0009446 0.0026287 0.359 0.719342
## total.eve.charge -4.4402475 18.3892705 -0.241 0.809200
## total.night.minutes 0.0102128 0.8339306 0.012 0.990229
## total.night.calls 0.0009018 0.0027169 0.332 0.739955
## total.night.charge -0.1669207 18.5312690 -0.009 0.992813
## total.intl.minutes -1.4374061 5.0276526 -0.286 0.774955
## total.intl.calls -0.0793376 0.0237729 -3.337 0.000846 ***
## total.intl.charge 5.6669682 18.6201516 0.304 0.760864
## customer.service.calls 0.4549603 0.0371624 12.242 < 2e-16 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## (Dispersion parameter for binomial family taken to be 1)
##
## Null deviance: 2758.3 on 3332 degrees of freedom
## Residual deviance: 2362.8 on 3318 degrees of freedom
## AIC: 2392.8
##
## Number of Fisher Scoring iterations: 5
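Logistic regression coefficients are on the log-odds scale, so they are easier to read once exponentiated into odds ratios. The sketch below does this for the three predictors flagged `***` in the summary above, using the estimates copied from that table; the vector name `coefs` is my own.

```r
# Estimates copied from the summary(reg) table above (significant predictors only)
coefs <- c(number.vmail.messages  = -0.0245235,
           total.intl.calls       = -0.0793376,
           customer.service.calls =  0.4549603)

# exp() turns a log-odds coefficient into an odds ratio per one-unit increase
odds_ratios <- exp(coefs)
print(round(odds_ratios, 3))
```

An odds ratio above 1 means the odds of churn go up with the predictor: each additional customer service call multiplies the odds of churn by roughly 1.58, holding the other variables fixed, while more voice mail messages and more international calls are associated with slightly lower odds. On the fitted model itself, `exp(coef(reg))` gives the odds ratios for all predictors.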