###
##  Homework 5C: Decision Trees – The Churn dataset from C50 Package
###
rm(list=ls())
## Install packages (“C50”)
library(C50)
library(tidyverse)
library(liver) # churn dataset
data(churn)
glimpse(churn)
## Rows: 5,000
## Columns: 20
## $ state          <fct> KS, OH, NJ, OH, OK, AL, MA, MO, LA, WV, IN, RI, IA, MT,…
## $ area.code      <fct> area_code_415, area_code_415, area_code_415, area_code_…
## $ account.length <int> 128, 107, 137, 84, 75, 118, 121, 147, 117, 141, 65, 74,…
## $ voice.plan     <fct> yes, yes, no, no, no, no, yes, no, no, yes, no, no, no,…
## $ voice.messages <int> 25, 26, 0, 0, 0, 0, 24, 0, 0, 37, 0, 0, 0, 0, 0, 0, 27,…
## $ intl.plan      <fct> no, no, no, yes, yes, yes, no, yes, no, yes, no, no, no…
## $ intl.mins      <dbl> 10.0, 13.7, 12.2, 6.6, 10.1, 6.3, 7.5, 7.1, 8.7, 11.2, …
## $ intl.calls     <int> 3, 3, 5, 7, 3, 6, 7, 6, 4, 5, 6, 5, 2, 5, 6, 9, 4, 3, 5…
## $ intl.charge    <dbl> 2.70, 3.70, 3.29, 1.78, 2.73, 1.70, 2.03, 1.92, 2.35, 3…
## $ day.mins       <dbl> 265.1, 161.6, 243.4, 299.4, 166.7, 223.4, 218.2, 157.0,…
## $ day.calls      <int> 110, 123, 114, 71, 113, 98, 88, 79, 97, 84, 137, 127, 9…
## $ day.charge     <dbl> 45.07, 27.47, 41.38, 50.90, 28.34, 37.98, 37.09, 26.69,…
## $ eve.mins       <dbl> 197.4, 195.5, 121.2, 61.9, 148.3, 220.6, 348.5, 103.1, …
## $ eve.calls      <int> 99, 103, 110, 88, 122, 101, 108, 94, 80, 111, 83, 148, …
## $ eve.charge     <dbl> 16.78, 16.62, 10.30, 5.26, 12.61, 18.75, 29.62, 8.76, 2…
## $ night.mins     <dbl> 244.7, 254.4, 162.6, 196.9, 186.9, 203.9, 212.6, 211.8,…
## $ night.calls    <int> 91, 103, 104, 89, 121, 118, 118, 96, 90, 97, 111, 94, 1…
## $ night.charge   <dbl> 11.01, 11.45, 7.32, 8.86, 8.41, 9.18, 9.57, 9.53, 9.71,…
## $ customer.calls <int> 1, 1, 0, 2, 3, 0, 3, 0, 1, 0, 4, 0, 1, 3, 4, 4, 1, 3, 1…
## $ churn          <fct> no, no, no, no, no, no, no, no, no, no, yes, no, no, no…
obs <- nrow(churn)
churn <- churn[order(runif(obs)), ]
churnTrain <- churn[1:4000, ]
churnTest  <- churn[4001:obs, ]
## Model Training  
library(caret)
library(rpart)

# method = "rpart",  builds a Decision Tree model using the Recursive Partitioning and Regression Trees method using rpart package.

dt_model <- train (churn ~. , 
 data = churnTrain, metric = "Accuracy", 
 method = "rpart")

typeof(dt_model)
## [1] "list"
names(dt_model)
##  [1] "method"       "modelInfo"    "modelType"    "results"      "pred"        
##  [6] "bestTune"     "call"         "dots"         "metric"       "control"     
## [11] "finalModel"   "preProcess"   "trainingData" "ptype"        "resample"    
## [16] "resampledCM"  "perfNames"    "maximize"     "yLimits"      "times"       
## [21] "levels"       "terms"        "coefnames"    "contrasts"    "xlevels"
## Check Decision Tree Classifier
print(dt_model)
## CART 
## 
## 4000 samples
##   19 predictor
##    2 classes: 'yes', 'no' 
## 
## No pre-processing
## Resampling: Bootstrapped (25 reps) 
## Summary of sample sizes: 4000, 4000, 4000, 4000, 4000, 4000, ... 
## Resampling results across tuning parameters:
## 
##   cp          Accuracy   Kappa    
##   0.07001795  0.8924821  0.4041502
##   0.08527828  0.8745606  0.2270605
##   0.08617594  0.8729086  0.2058219
## 
## Accuracy was used to select the optimal model using the largest value.
## The final value used for the model was cp = 0.07001795.
# To do:  Interpret the results in this output section
# To do:  Explain cp accuracy and kappa (Look up kappa)

Ans:

The training dataset has 4000 records with 19 predictors and the response variable is two classes categorical variable. The caret package uses a ten folds cross-validation using a resampling technique. The above output provided the three accuracy model by default. The maximum accuracy was 0.922 with cp value 0.052 and kappa = 0.608.

cp = Complexity parameter
Accuracy = Accuracy rate

\(\kappa =\frac{P_0 -P_e}{1-P_0}\)

where \(P_0\) = accuracy rate \(P_e\) = random guess proportion

Kappa is a useful indicator when accuracy rates lie near one or zero, as kappa considers how much better the model is than a random guess.

## Check Decision Tree Classifier Details     
print(dt_model$finalModel)
## n= 4000 
## 
## node), split, n, loss, yval, (yprob)
##       * denotes terminal node
## 
##  1) root 4000 557 no (0.13925000 0.86075000)  
##    2) day.mins>=265.4 236  94 yes (0.60169492 0.39830508) *
##    3) day.mins< 265.4 3764 415 no (0.11025505 0.88974495)  
##      6) customer.calls>=3.5 307 153 no (0.49837134 0.50162866)  
##       12) day.mins< 162.7 131  18 yes (0.86259542 0.13740458) *
##       13) day.mins>=162.7 176  40 no (0.22727273 0.77272727) *
##      7) customer.calls< 3.5 3457 262 no (0.07578826 0.92421174) *
## Model Prediction (1)
dt_predict <- predict(dt_model, newdata=churnTest, na.action=na.omit, type="prob")
head(dt_predict, 5)
# To do: Explain the results in the output of this section
Ans: 

It provides the finmodel in the training dataset, which has each node value, splitting variable with threshold, number of observations, loss in each node, y-value and probability.

The prediction function provides the probability of success or failure for each observation in the test dataset.
## Model Prediction (2)
dt_predict2 <- predict(dt_model, newdata=churnTest, type= "raw")
head(dt_predict2)
## [1] no no no no no no
## Levels: yes no
# To do:  Explain the results in the output of this section
Ans:

It predicts the raw type mean default value for the model. In our case, the category values of the model.
## Model Tuning (1)
dt_model_tune <- train(churn ~. , data=churnTrain, method= "rpart", metric="Accuracy",  tuneLength = 8)
print(dt_model_tune$finalModel)
## n= 4000 
## 
## node), split, n, loss, yval, (yprob)
##       * denotes terminal node
## 
##   1) root 4000 557 no (0.13925000 0.86075000)  
##     2) day.mins>=265.4 236  94 yes (0.60169492 0.39830508)  
##       4) voice.planno>=0.5 183  48 yes (0.73770492 0.26229508)  
##         8) eve.mins>=166.45 134  14 yes (0.89552239 0.10447761) *
##         9) eve.mins< 166.45 49  15 no (0.30612245 0.69387755)  
##          18) day.mins>=302.5 10   1 yes (0.90000000 0.10000000) *
##          19) day.mins< 302.5 39   6 no (0.15384615 0.84615385) *
##       5) voice.planno< 0.5 53   7 no (0.13207547 0.86792453) *
##     3) day.mins< 265.4 3764 415 no (0.11025505 0.88974495)  
##       6) customer.calls>=3.5 307 153 no (0.49837134 0.50162866)  
##        12) day.mins< 162.7 131  18 yes (0.86259542 0.13740458) *
##        13) day.mins>=162.7 176  40 no (0.22727273 0.77272727) *
##       7) customer.calls< 3.5 3457 262 no (0.07578826 0.92421174)  
##        14) intl.planno< 0.5 314 116 no (0.36942675 0.63057325)  
##          28) intl.calls< 2.5 60   0 yes (1.00000000 0.00000000) *
##          29) intl.calls>=2.5 254  56 no (0.22047244 0.77952756)  
##            58) intl.mins>=13.05 48   0 yes (1.00000000 0.00000000) *
##            59) intl.mins< 13.05 206   8 no (0.03883495 0.96116505) *
##        15) intl.planno>=0.5 3143 146 no (0.04645243 0.95354757)  
##          30) day.mins>=221.85 511  85 no (0.16634051 0.83365949)  
##            60) eve.mins>=242.35 99  41 yes (0.58585859 0.41414141)  
##             120) voice.planno>=0.5 79  21 yes (0.73417722 0.26582278) *
##             121) voice.planno< 0.5 20   0 no (0.00000000 1.00000000) *
##            61) eve.mins< 242.35 412  27 no (0.06553398 0.93446602) *
##          31) day.mins< 221.85 2632  61 no (0.02317629 0.97682371) *
# To do:  Explain what the arguments, method="rpart", and metric="Accuracy" and tell what they are telling the model to do

Ans:
method="rpart" --> develop a decision tree model using rpart package.
metric="Accuracy"  --> Use the highest "accuracy" rate to select the optimal model.
tuneLength = 8  --> The number of tuning grids chosen is 8. One of the tuning parameter grid chosen by CARET is the cp values.
## Model Tuning (2)
dt_model_tune2 <- train(churn ~. , data=churnTrain, method="rpart",
                          tuneGrid = expand.grid(cp=seq(0, 0.1, 0.01)))
print(dt_model_tune2$finalModel)
## n= 4000 
## 
## node), split, n, loss, yval, (yprob)
##       * denotes terminal node
## 
##   1) root 4000 557 no (0.13925000 0.86075000)  
##     2) day.mins>=265.4 236  94 yes (0.60169492 0.39830508)  
##       4) voice.planno>=0.5 183  48 yes (0.73770492 0.26229508)  
##         8) eve.mins>=166.45 134  14 yes (0.89552239 0.10447761) *
##         9) eve.mins< 166.45 49  15 no (0.30612245 0.69387755)  
##          18) day.mins>=302.5 10   1 yes (0.90000000 0.10000000) *
##          19) day.mins< 302.5 39   6 no (0.15384615 0.84615385) *
##       5) voice.planno< 0.5 53   7 no (0.13207547 0.86792453) *
##     3) day.mins< 265.4 3764 415 no (0.11025505 0.88974495)  
##       6) customer.calls>=3.5 307 153 no (0.49837134 0.50162866)  
##        12) day.mins< 162.7 131  18 yes (0.86259542 0.13740458) *
##        13) day.mins>=162.7 176  40 no (0.22727273 0.77272727)  
##          26) eve.mins< 141.45 19   6 yes (0.68421053 0.31578947) *
##          27) eve.mins>=141.45 157  27 no (0.17197452 0.82802548) *
##       7) customer.calls< 3.5 3457 262 no (0.07578826 0.92421174)  
##        14) intl.planno< 0.5 314 116 no (0.36942675 0.63057325)  
##          28) intl.calls< 2.5 60   0 yes (1.00000000 0.00000000) *
##          29) intl.calls>=2.5 254  56 no (0.22047244 0.77952756)  
##            58) intl.mins>=13.05 48   0 yes (1.00000000 0.00000000) *
##            59) intl.mins< 13.05 206   8 no (0.03883495 0.96116505) *
##        15) intl.planno>=0.5 3143 146 no (0.04645243 0.95354757)  
##          30) day.mins>=221.85 511  85 no (0.16634051 0.83365949)  
##            60) eve.mins>=242.35 99  41 yes (0.58585859 0.41414141)  
##             120) voice.planno>=0.5 79  21 yes (0.73417722 0.26582278) *
##             121) voice.planno< 0.5 20   0 no (0.00000000 1.00000000) *
##            61) eve.mins< 242.35 412  27 no (0.06553398 0.93446602) *
##          31) day.mins< 221.85 2632  61 no (0.02317629 0.97682371) *
# To do:  What is the default cp=  in the tuneGrid argument?
# To do:  Notice the terminal nodes indicated by * in the output
Ans:
The default cp = 0.10 in the package. 
In above we have choosen eleven cp grid point for tuning.
seq(0, 0.1, 0.01)
 [1] 0.00 0.01 0.02 0.03 0.04 0.05 0.06 0.07 0.08 0.09 0.10


Yes, the are eleven terminal nodes with asterisk sign.
## Model Pre-Pruning
dt_model_preprune <- train(churn ~. , data=churnTrain, method="rpart", metric="Accuracy",  tuneLength = 8, control = rpart.control(minsplit=50, minbucket=20, maxdepth=5))
print(dt_model_preprune$finalModel)
## n= 4000 
## 
## node), split, n, loss, yval, (yprob)
##       * denotes terminal node
## 
##  1) root 4000 557 no (0.13925000 0.86075000)  
##    2) day.mins>=265.4 236  94 yes (0.60169492 0.39830508)  
##      4) voice.planno>=0.5 183  48 yes (0.73770492 0.26229508)  
##        8) eve.mins>=166.45 134  14 yes (0.89552239 0.10447761) *
##        9) eve.mins< 166.45 49  15 no (0.30612245 0.69387755) *
##      5) voice.planno< 0.5 53   7 no (0.13207547 0.86792453) *
##    3) day.mins< 265.4 3764 415 no (0.11025505 0.88974495)  
##      6) customer.calls>=3.5 307 153 no (0.49837134 0.50162866)  
##       12) day.mins< 162.7 131  18 yes (0.86259542 0.13740458) *
##       13) day.mins>=162.7 176  40 no (0.22727273 0.77272727) *
##      7) customer.calls< 3.5 3457 262 no (0.07578826 0.92421174)  
##       14) intl.planno< 0.5 314 116 no (0.36942675 0.63057325)  
##         28) intl.calls< 2.5 60   0 yes (1.00000000 0.00000000) *
##         29) intl.calls>=2.5 254  56 no (0.22047244 0.77952756)  
##           58) intl.mins>=13.05 48   0 yes (1.00000000 0.00000000) *
##           59) intl.mins< 13.05 206   8 no (0.03883495 0.96116505) *
##       15) intl.planno>=0.5 3143 146 no (0.04645243 0.95354757)  
##         30) day.mins>=221.85 511  85 no (0.16634051 0.83365949)  
##           60) eve.mins>=242.35 99  41 yes (0.58585859 0.41414141) *
##           61) eve.mins< 242.35 412  27 no (0.06553398 0.93446602) *
##         31) day.mins< 221.85 2632  61 no (0.02317629 0.97682371) *
# To do:  Explain the control argument.
Ans:

method="rpart" --> Decision tree
metric="Accuracy" --> use accuracy rate method
tuneLength = 8  --> Choose eight tuning grid

control = rpart.control(minsplit=50, minbucket=20, maxdepth=5)

minsplit=50: Minimum number of observations required in the node to split node is 50.

minbucket=20: Minimum number of observation required in terminal node is 20.

maxdepth=5: Maximum depth of any node is 5. The root node depth is zero.
## Model Post-Pruning  
dt_model_postprune <- prune(dt_model$finalModel, cp = 0.2)
print(dt_model_postprune)
## n= 4000 
## 
## node), split, n, loss, yval, (yprob)
##       * denotes terminal node
## 
## 1) root 4000 557 no (0.1392500 0.8607500) *
## Check Decision Tree Classifier (1)
plot(dt_model$finalModel)
text(dt_model$finalModel)

#summary(dt_model$finalModel)
## Check Decision Tree Classifier (2)
library(rattle)
## Loading required package: bitops
## Rattle: A free graphical interface for data science with R.
## Version 5.5.1 Copyright (c) 2006-2021 Togaware Pty Ltd.
## Type 'rattle()' to shake, rattle, and roll your data.
fancyRpartPlot(dt_model$finalModel)

# To do:  Try to explain the decision tree in the output
Ans:
The root node is split by day.mins variable at the threshold value day.mins>= 255, with yes 9% yes and 91% no. The yes output values from root node are split by variable voice.plano at threshold 0.05, with yes 7% and no 2%. Similarly, for all the nodes splitting. It has a depth of 3. The different colors for each yes and no category are provided and the color darkness by proportional to the accuracy.
## Check Decision Tree Classifier (3) 
library(rpart.plot)
prp(dt_model$finalModel)

rpart.plot(dt_model$finalModel)