| Call.Failure | Complains | Subscription.Length | Charge.Amount | Minutes.of.Use | Distinct.Called.Numbers | Tariff.Plan | Status | Age | Customer.Value | Churn |
|---|---|---|---|---|---|---|---|---|---|---|
| 8 | 0 | 38 | 0 | 72.83 | 17 | 0 | 1 | 30 | 197.640 | 0 |
| 0 | 0 | 39 | 0 | 5.30 | 4 | 0 | 0 | 25 | 46.035 | 0 |
| 10 | 0 | 37 | 0 | 40.88 | 24 | 0 | 1 | 30 | 1536.520 | 0 |
| 10 | 0 | 38 | 0 | 69.97 | 35 | 0 | 1 | 15 | 240.020 | 0 |
| 3 | 0 | 38 | 0 | 39.88 | 33 | 0 | 1 | 15 | 145.805 | 0 |
## data.frame [3150 × 11]
## $ Call.Failure : integer [1:3150] 8 0 10 10 3
## $ Complains : factor [1:3150] 0 0 0 0 0
## $ Subscription.Length : integer [1:3150] 38 39 37 38 38
## $ Charge.Amount : integer [1:3150] 0 0 0 0 0
## $ Minutes.of.Use : numeric [1:3150] 72.83 5.3 40.88 69.97 39.88
## $ Distinct.Called.Numbers: integer [1:3150] 17 4 24 35 33
## $ Tariff.Plan : factor [1:3150] 0 0 0 0 0
## $ Status : factor [1:3150] 1 0 1 1 1
## $ Age : integer [1:3150] 30 25 30 15 15
## $ Customer.Value : numeric [1:3150] 197.64 46.035 1536.52 240.02 145.805
## $ Churn : factor [1:3150] 0 0 0 0 0
Fig. 1.1
Fig. 1.2
Fig. 1.3
Fig. 1.4
Fig. 1.5
The data were partitioned into a training set and a holdout set. The split was stratified on Churn, the dependent variable: 70% of the observations were allocated to the training set and the remaining 30% were set aside as the holdout set. The seed was set to 1 so that the results can be reproduced.
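```r
library(caret)  # provides createDataPartition()

set.seed(1)  # fixed seed so the split can be reproduced
# Stratified 70/30 split on the outcome; idx holds the training row indices
idx <- createDataPartition(df$Churn, p = 0.7, list = FALSE)
training.df <- df[idx, ]
holdout.df <- df[-idx, ]
# Make sure the outcome is a factor in both partitions
training.df$Churn <- as.factor(training.df$Churn)
holdout.df$Churn <- as.factor(holdout.df$Churn)
```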
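As a quick sanity check on the split, the partition sizes and churn rates printed elsewhere in this report can be compared in base R. This is only a sketch: the numbers are copied by hand from the printed outputs (2206 training rows, 944 holdout rows, a training prior of 0.1572983 and a holdout prevalence of 0.15678) rather than recomputed from `df`, and the variable names are illustrative.

```r
# Counts copied from the printed outputs in this report (not recomputed from df).
n.total    <- 3150
n.training <- 2206
n.holdout  <- 944

# Share of rows in each partition; p = 0.7 targets roughly 70/30.
share.training <- n.training / n.total
share.holdout  <- n.holdout / n.total

# Churn rates as reported: Naive Bayes a-priori probability (training)
# and confusion-matrix prevalence (holdout).
churn.rate.training <- 0.1572983
churn.rate.holdout  <- 0.15678

print(round(c(training = share.training, holdout = share.holdout), 4))

# A small gap suggests the stratified split kept the class balance.
churn.rate.gap <- abs(churn.rate.training - churn.rate.holdout)
print(churn.rate.gap)
```

The two churn rates differ by well under one percentage point, which is what a stratified split on the outcome is expected to produce.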
## k-Nearest Neighbors
##
## 2206 samples
## 10 predictor
## 2 classes: '0', '1'
##
## No pre-processing
## Resampling: Cross-Validated (10 fold, repeated 3 times)
## Summary of sample sizes: 1986, 1986, 1985, 1985, 1986, 1985, ...
## Resampling results across tuning parameters:
##
## k Accuracy Kappa
## 5 0.9031441 0.6125965
## 7 0.8977040 0.5792232
## 9 0.8880324 0.5329789
##
## Accuracy was used to select the optimal model using the largest value.
## The final value used for the model was k = 5.
| | Call.Failure | Complains | Subscription.Length | Charge.Amount | Minutes.of.Use | Distinct.Called.Numbers | Tariff.Plan | Status | Age | Customer.Value | Churn | prediction |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 2 | 0 | 0 | 39 | 0 | 5.30 | 4 | 0 | 0 | 25 | 46.035 | 0 | 1 |
| 6 | 11 | 0 | 38 | 1 | 62.92 | 28 | 0 | 1 | 30 | 282.280 | 0 | 0 |
| 10 | 7 | 0 | 38 | 1 | 75.25 | 25 | 0 | 1 | 30 | 191.920 | 0 | 0 |
| 22 | 8 | 0 | 37 | 1 | 111.97 | 37 | 0 | 1 | 25 | 791.685 | 0 | 0 |
| 28 | 9 | 1 | 36 | 0 | 37.80 | 31 | 0 | 0 | 30 | 228.480 | 1 | 1 |
| 31 | 3 | 0 | 33 | 1 | 113.08 | 22 | 0 | 1 | 55 | 124.230 | 0 | 0 |
| 34 | 25 | 0 | 31 | 3 | 267.92 | 80 | 1 | 1 | 25 | 734.085 | 0 | 0 |
| 38 | 6 | 0 | 36 | 0 | 46.33 | 23 | 0 | 1 | 25 | 914.985 | 0 | 0 |
| 39 | 4 | 0 | 27 | 1 | 21.92 | 11 | 1 | 1 | 25 | 929.295 | 0 | 0 |
| 40 | 4 | 0 | 26 | 0 | 26.13 | 68 | 0 | 1 | 45 | 115.175 | 0 | 0 |
## Confusion Matrix and Statistics
##
## Reference
## Prediction 0 1
## 0 757 59
## 1 39 89
##
## Accuracy : 0.8962
## 95% CI : (0.8749, 0.9149)
## No Information Rate : 0.8432
## P-Value [Acc > NIR] : 1.593e-06
##
## Kappa : 0.5845
##
## Mcnemar's Test P-Value : 0.05495
##
## Sensitivity : 0.60135
## Specificity : 0.95101
## Pos Pred Value : 0.69531
## Neg Pred Value : 0.92770
## Prevalence : 0.15678
## Detection Rate : 0.09428
## Detection Prevalence : 0.13559
## Balanced Accuracy : 0.77618
##
## 'Positive' Class : 1
##
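The headline statistics in the k-NN confusion matrix can be recomputed by hand from the four cell counts. The base R sketch below does that, taking class 1 (churn) as the positive class as in the output; the object names are illustrative and the counts are copied from the printed matrix.

```r
# Holdout confusion matrix for k-NN, copied from the printed output.
# Rows are predictions, columns are the reference (true) classes.
cm <- matrix(c(757, 39, 59, 89), nrow = 2,
             dimnames = list(Prediction = c("0", "1"),
                             Reference  = c("0", "1")))

tp <- cm["1", "1"]; fn <- cm["0", "1"]
fp <- cm["1", "0"]; tn <- cm["0", "0"]
n  <- sum(cm)

accuracy    <- (tp + tn) / n
sensitivity <- tp / (tp + fn)      # share of actual churners that were caught
specificity <- tn / (tn + fp)      # share of non-churners correctly retained
precision   <- tp / (tp + fp)      # "Pos Pred Value" in the output
nir         <- max(colSums(cm)) / n  # accuracy of always predicting the majority class

# Cohen's kappa: agreement beyond what the marginals alone would give.
expected <- sum(rowSums(cm) * colSums(cm)) / n^2
kappa    <- (accuracy - expected) / (1 - expected)

print(round(c(accuracy = accuracy, sensitivity = sensitivity,
              specificity = specificity, precision = precision,
              nir = nir, kappa = kappa), 4))
```

Read this way, the model is right about 90% of the time overall but catches only about 60% of the customers who actually churned, which matters if missed churners are the costly error.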
## 'data.frame': 2206 obs. of 11 variables:
## $ Call.Failure : Factor w/ 2 levels "Few","Many": 1 2 2 1 1 2 1 1 2 2 ...
## $ Complains : Factor w/ 2 levels "0","1": 1 1 1 1 1 1 1 1 1 1 ...
## $ Subscription.Length : Factor w/ 2 levels "Short","Long": 2 2 2 2 2 2 2 2 2 2 ...
## $ Charge.Amount : Factor w/ 2 levels "Small","Large": 1 1 1 1 1 2 1 1 1 2 ...
## $ Minutes.of.Use : Factor w/ 2 levels "Few","Many": 1 1 1 1 1 2 2 2 1 2 ...
## $ Distinct.Called.Numbers: Factor w/ 2 levels "Few","Many": 1 1 2 2 1 2 2 1 1 2 ...
## $ Tariff.Plan : Factor w/ 2 levels "0","1": 1 1 1 1 1 1 1 1 1 1 ...
## $ Status : Factor w/ 2 levels "0","1": 2 2 2 2 2 2 2 2 1 2 ...
## $ Age : Factor w/ 5 levels "15","25","30",..: 3 3 1 1 3 3 3 3 3 3 ...
## $ Customer.Value : Factor w/ 2 levels "Less Valuable",..: 1 2 1 1 2 2 2 1 1 2 ...
## $ Churn : Factor w/ 2 levels "0","1": 1 1 1 1 1 1 1 1 1 1 ...
## 'data.frame': 944 obs. of 11 variables:
## $ Call.Failure : Factor w/ 2 levels "Few","Many": 1 2 1 1 2 1 2 1 1 1 ...
## $ Complains : Factor w/ 2 levels "0","1": 1 1 1 1 2 1 1 1 1 1 ...
## $ Subscription.Length : Factor w/ 2 levels "Short","Long": 2 2 2 2 2 1 1 2 1 1 ...
## $ Charge.Amount : Factor w/ 2 levels "Small","Large": 1 1 1 1 1 1 2 1 1 1 ...
## $ Minutes.of.Use : Factor w/ 2 levels "Few","Many": 1 1 2 2 1 2 2 1 1 1 ...
## $ Distinct.Called.Numbers: Factor w/ 2 levels "Few","Many": 1 2 2 2 2 1 2 1 1 2 ...
## $ Tariff.Plan : Factor w/ 2 levels "0","1": 1 1 1 1 1 1 2 1 2 1 ...
## $ Status : Factor w/ 2 levels "0","1": 1 2 2 2 1 2 2 2 2 2 ...
## $ Age : Factor w/ 5 levels "15","25","30",..: 2 3 3 2 3 5 2 2 2 4 ...
## $ Customer.Value : Factor w/ 2 levels "Less Valuable",..: 1 1 1 2 1 1 2 2 2 1 ...
## $ Churn : Factor w/ 2 levels "0","1": 1 1 1 1 2 1 1 1 1 1 ...
##
## Naive Bayes Classifier for Discrete Predictors
##
## Call:
## naiveBayes.default(x = X, y = Y, laplace = laplace)
##
## A-priori probabilities:
## Y
## 0 1
## 0.8427017 0.1572983
##
## Conditional probabilities:
## Call.Failure
## Y Few Many
## 0 0.6120365 0.3879635
## 1 0.6590258 0.3409742
##
## Complains
## Y 0 1
## 0 0.98173025 0.01826975
## 1 0.57593123 0.42406877
##
## Subscription.Length
## Y Short Long
## 0 0.4040838 0.5959162
## 1 0.4040115 0.5959885
##
## Charge.Amount
## Y Small Large
## 0 0.72326706 0.27673294
## 1 0.93123209 0.06876791
##
## Minutes.of.Use
## Y Few Many
## 0 0.59108006 0.40891994
## 1 0.91690544 0.08309456
##
## Distinct.Called.Numbers
## Y Few Many
## 0 0.5206878 0.4793122
## 1 0.8538682 0.1461318
##
## Tariff.Plan
## Y 0 1
## 0 0.90865126 0.09134874
## 1 0.98280802 0.01719198
##
## Status
## Y 0 1
## 0 0.1526061 0.8473939
## 1 0.7363897 0.2636103
##
## Age
## Y 15 25 30 45 55
## 0 0.046673820 0.331008584 0.442596567 0.112124464 0.067596567
## 1 0.002840909 0.380681818 0.434659091 0.173295455 0.008522727
##
## Customer.Value
## Y Less Valuable More Valuable
## 0 0.60827512 0.39172488
## 1 0.98280802 0.01719198
| Call.Failure | Complains | Subscription.Length | Charge.Amount | Minutes.of.Use | Distinct.Called.Numbers | Tariff.Plan | Status | Age | Customer.Value | Churn | X0 | X1 | pred.class |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| Few | 0 | Long | Small | Few | Few | 0 | 0 | 25 | Less Valuable | 0 | 0.2107083 | 0.7892917 | 1 |
| Many | 0 | Long | Small | Few | Many | 0 | 1 | 30 | Less Valuable | 0 | 0.9696575 | 0.0303425 | 0 |
| Few | 0 | Long | Small | Many | Many | 0 | 1 | 30 | Less Valuable | 0 | 0.9950030 | 0.0049970 | 0 |
| Few | 0 | Long | Small | Many | Many | 0 | 1 | 25 | More Valuable | 0 | 0.9998403 | 0.0001597 | 0 |
| Many | 1 | Long | Small | Few | Many | 0 | 0 | 30 | Less Valuable | 1 | 0.0494923 | 0.9505077 | 1 |
| Few | 0 | Short | Small | Many | Few | 0 | 1 | 55 | Less Valuable | 0 | 0.9965450 | 0.0034550 | 0 |
| Many | 0 | Short | Large | Many | Many | 1 | 1 | 25 | More Valuable | 0 | 0.9999956 | 0.0000044 | 0 |
| Few | 0 | Long | Small | Few | Few | 0 | 1 | 25 | More Valuable | 0 | 0.9934832 | 0.0065168 | 0 |
| Few | 0 | Short | Small | Few | Few | 1 | 1 | 25 | More Valuable | 0 | 0.9988603 | 0.0011397 | 0 |
| Few | 0 | Short | Small | Few | Many | 0 | 1 | 45 | Less Valuable | 0 | 0.9431137 | 0.0568863 | 0 |
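To see where the `X0` and `X1` columns come from, the posterior for the first row of the table above can be rebuilt by hand from the printed a-priori and conditional probabilities. This is a minimal base R sketch, assuming no Laplace smoothing and that the posterior is simply the normalised product of the prior and the per-feature conditionals; the object names are illustrative.

```r
# Priors copied from the naiveBayes output, in the order c(class 0, class 1).
prior <- c(`0` = 0.8427017, `1` = 0.1572983)

# Conditional probabilities P(feature level | class) for the levels observed
# in the first holdout row: Few, 0, Long, Small, Few, Few, 0, 0, 25, Less Valuable.
cond <- rbind(
  Call.Failure.Few            = c(0.6120365,   0.6590258),
  Complains.0                 = c(0.98173025,  0.57593123),
  Subscription.Length.Long    = c(0.5959162,   0.5959885),
  Charge.Amount.Small         = c(0.72326706,  0.93123209),
  Minutes.of.Use.Few          = c(0.59108006,  0.91690544),
  Distinct.Called.Numbers.Few = c(0.5206878,   0.8538682),
  Tariff.Plan.0               = c(0.90865126,  0.98280802),
  Status.0                    = c(0.1526061,   0.7363897),
  Age.25                      = c(0.331008584, 0.380681818),
  Customer.Value.Less         = c(0.60827512,  0.98280802)
)

# Naive Bayes: multiply the prior by every conditional, then normalise.
unnormalised <- prior * apply(cond, 2, prod)
posterior    <- unnormalised / sum(unnormalised)

print(round(posterior, 4))
```

The result agrees with the tabulated `X1` of about 0.789 for that row. Most of the shift towards churn comes from `Status = 0`, which is roughly five times more likely among churners (0.736 against 0.153).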
## Confusion Matrix and Statistics
##
## Reference
## Prediction No Churn
## No 671 35
## Churn 125 113
##
## Accuracy : 0.8305
## 95% CI : (0.805, 0.8539)
## No Information Rate : 0.8432
## P-Value [Acc > NIR] : 0.8679
##
## Kappa : 0.4861
##
## Mcnemar's Test P-Value : 1.977e-12
##
## Sensitivity : 0.7635
## Specificity : 0.8430
## Pos Pred Value : 0.4748
## Neg Pred Value : 0.9504
## Prevalence : 0.1568
## Detection Rate : 0.1197
## Detection Prevalence : 0.2521
## Balanced Accuracy : 0.8032
##
## 'Positive' Class : Churn
##
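The two test statistics in the confusion-matrix outputs are the easiest to misread, so the sketch below recomputes them for both models in base R. It assumes that `P-Value [Acc > NIR]` is a one-sided exact binomial test of the accuracy against the no-information rate, and that McNemar's test uses the default continuity correction; the helper `nir.p.value` is a hypothetical name, not part of any package.

```r
# Holdout confusion matrices copied from the printed outputs
# (rows = prediction, columns = reference; filled column by column).
knn.cm <- matrix(c(757, 39, 59, 89), nrow = 2)
nb.cm  <- matrix(c(671, 125, 35, 113), nrow = 2)

# One-sided exact binomial test: is accuracy above the no-information rate?
nir.p.value <- function(cm) {
  nir <- max(colSums(cm)) / sum(cm)
  binom.test(sum(diag(cm)), sum(cm), p = nir,
             alternative = "greater")$p.value
}

p.knn <- nir.p.value(knn.cm)
p.nb  <- nir.p.value(nb.cm)

# McNemar's test compares the two kinds of error (off-diagonal cells).
mc.knn <- mcnemar.test(knn.cm)$p.value
mc.nb  <- mcnemar.test(nb.cm)$p.value

print(signif(c(knn.nir = p.knn, nb.nir = p.nb,
               knn.mcnemar = mc.knn, nb.mcnemar = mc.nb), 4))
```

On these counts, k-NN is clearly more accurate than always predicting "no churn", while Naive Bayes is not (its accuracy of 0.8305 sits below the no-information rate of 0.8432). The very small McNemar p-value for Naive Bayes reflects its lopsided errors: 125 false alarms against 35 missed churners.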
##
## Call:
## glm(formula = Churn ~ Call.Failure + Subscription.Length + Charge.Amount +
## Distinct.Called.Numbers + Customer.Value + Minutes.of.Use +
## Age, family = binomial(link = "logit"), data = training.df)
##
## Coefficients:
## Estimate Std. Error z value Pr(>|z|)
## (Intercept) -16.8251 391.9665 -0.043 0.9658
## Call.FailureMany 0.7697 0.1529 5.035 4.78e-07 ***
## Subscription.LengthLong 0.3409 0.1350 2.525 0.0116 *
## Charge.AmountLarge -1.0245 0.2521 -4.064 4.82e-05 ***
## Distinct.Called.NumbersMany -0.8124 0.1842 -4.411 1.03e-05 ***
## Customer.ValueMore Valuable -3.2931 0.4587 -7.180 6.99e-13 ***
## Minutes.of.UseMany -1.0874 0.2284 -4.760 1.94e-06 ***
## Age25 15.9001 391.9665 0.041 0.9676
## Age30 15.6433 391.9665 0.040 0.9682
## Age45 15.9637 391.9665 0.041 0.9675
## Age55 13.3153 391.9671 0.034 0.9729
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## (Dispersion parameter for binomial family taken to be 1)
##
## Null deviance: 1919.9 on 2205 degrees of freedom
## Residual deviance: 1460.6 on 2195 degrees of freedom
## AIC: 1482.6
##
## Number of Fisher Scoring iterations: 16
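Logistic regression coefficients are on the log-odds scale, so exponentiating them gives odds ratios that are easier to read. The base R sketch below does this for the estimates printed above and adds a likelihood-ratio test from the two reported deviances. The Age terms are left out because their standard errors (about 392) dwarf the estimates, so their odds ratios would not be meaningful; the object names are illustrative.

```r
# Estimates copied from the glm summary (Age terms omitted, see note above).
coefs <- c(
  Call.FailureMany              = 0.7697,
  Subscription.LengthLong       = 0.3409,
  Charge.AmountLarge            = -1.0245,
  Distinct.Called.NumbersMany   = -0.8124,
  `Customer.ValueMore Valuable` = -3.2931,
  Minutes.of.UseMany            = -1.0874
)

# Odds ratio: multiplicative change in the odds of churn versus the
# reference level, holding the other predictors fixed.
odds.ratios <- exp(coefs)
print(round(odds.ratios, 3))

# Likelihood-ratio test of the fitted model against the intercept-only model,
# using the null and residual deviances from the summary.
lr.stat <- 1919.9 - 1460.6
lr.df   <- 2205 - 2195
lr.p    <- pchisq(lr.stat, df = lr.df, lower.tail = FALSE)
print(c(statistic = lr.stat, df = lr.df, p.value = lr.p))
```

For example, customers with many call failures have a little over twice the odds of churning compared with those with few, while "More Valuable" customers have odds of churning that are only a few percent of those for "Less Valuable" customers.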