LOADING DATA INTO R ENVIRONMENT

TRAINING THE DECISION TREE MODEL

Running the Training Model

## CART 
## 
## 23681 samples
##     7 predictor
##     2 classes: 'No', 'Yes' 
## 
## No pre-processing
## Resampling: Cross-Validated (4 fold) 
## Summary of sample sizes: 17761, 17761, 17761, 17760 
## Resampling results across tuning parameters:
## 
##   cp           Accuracy   Kappa     
##   0.002081756  0.7782188  0.05488130
##   0.002933384  0.7776698  0.05526216
##   0.002964926  0.7776698  0.05526216
## 
## Accuracy was used to select the optimal model using the largest value.
## The final value used for the model was cp = 0.002081756.
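
The code chunk that produced this summary is not shown. A minimal sketch of a call that yields output in this format, assuming the `caret` and `rpart` packages are installed, is given below. The data frame `toy`, its column names, and the seed are hypothetical stand-ins, not the data used in this report.

```r
library(caret)   # train() with method = "rpart" also needs the rpart package

set.seed(42)     # hypothetical seed, only to make the sketch repeatable

# Hypothetical stand-in data: 7 numeric predictors and a No/Yes outcome.
n <- 2000
toy <- data.frame(matrix(rnorm(n * 7), ncol = 7))
names(toy) <- paste0("x", 1:7)
toy$default <- factor(ifelse(toy$x1 + rnorm(n) > 1, "Yes", "No"),
                      levels = c("No", "Yes"))

# 4-fold cross-validation, as in the resampling summary above.
ctrl <- trainControl(method = "cv", number = 4)

# CART via rpart; tuneLength = 3 asks caret to evaluate three cp values
# and keep the one with the highest cross-validated accuracy.
fit <- train(default ~ ., data = toy,
             method = "rpart",
             trControl = ctrl,
             tuneLength = 3)

print(fit)           # resampling table: cp, Accuracy, Kappa
print(fit$bestTune)  # the selected cp
print(varImp(fit))   # variable importance of the fitted tree
```

Note that in the summary above the accuracy differs only in the third decimal place across the three `cp` values, so the selected `cp` is not a strong preference.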

Variable Importance in Decision Tree Model

TESTING THE DECISION TREE MODEL

Confusion Matrix at 50% Cut-Off Probability

## Confusion Matrix and Statistics
## 
##          Actual
## Predicted  Yes   No
##       Yes   53   40
##       No  1268 4559
##                                           
##                Accuracy : 0.7791          
##                  95% CI : (0.7683, 0.7896)
##     No Information Rate : 0.7769          
##     P-Value [Acc > NIR] : 0.3491          
##                                           
##                   Kappa : 0.047           
##                                           
##  Mcnemar's Test P-Value : <2e-16          
##                                           
##             Sensitivity : 0.040121        
##             Specificity : 0.991302        
##          Pos Pred Value : 0.569892        
##          Neg Pred Value : 0.782392        
##              Prevalence : 0.223142        
##          Detection Rate : 0.008953        
##    Detection Prevalence : 0.015709        
##       Balanced Accuracy : 0.515712        
##                                           
##        'Positive' Class : Yes             
## 
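
The statistics in this output follow directly from the four cell counts. As a check, the base R sketch below recomputes accuracy, sensitivity, specificity, and kappa from the matrix above; the helper name `binary_metrics` is introduced here for illustration only.

```r
# Cell counts copied from the confusion matrix above
# (rows = predicted class, columns = actual class).
cm <- matrix(c(53, 1268, 40, 4559), nrow = 2,
             dimnames = list(Predicted = c("Yes", "No"),
                             Actual    = c("Yes", "No")))

# Hypothetical helper: metrics for a 2x2 matrix with "Yes" as the positive class.
binary_metrics <- function(cm) {
  tp <- cm["Yes", "Yes"]; fp <- cm["Yes", "No"]
  fn <- cm["No",  "Yes"]; tn <- cm["No",  "No"]
  n  <- sum(cm)
  acc <- (tp + tn) / n
  # Agreement expected by chance, used by Cohen's kappa.
  pe  <- ((tp + fp) * (tp + fn) + (fn + tn) * (fp + tn)) / n^2
  c(accuracy    = acc,
    sensitivity = tp / (tp + fn),
    specificity = tn / (tn + fp),
    kappa       = (acc - pe) / (1 - pe))
}

m <- binary_metrics(cm)
print(round(m, 6))
```

The model flags only 53 of the 1,321 actual "Yes" cases, which is why sensitivity is about 4% even though accuracy is close to the no-information rate.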

Confusion Matrix at 55% Cut-Off Probability

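Comparison with the 50% cut-off probability: the confusion matrix is unchanged. A decision tree assigns a single predicted probability to each segment (terminal node), and apparently no segment has a predicted probability between 50% and 55%, so no observation changes class.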

## Confusion Matrix and Statistics
## 
##          Actual
## Predicted  Yes   No
##       Yes   53   40
##       No  1268 4559
##                                           
##                Accuracy : 0.7791          
##                  95% CI : (0.7683, 0.7896)
##     No Information Rate : 0.7769          
##     P-Value [Acc > NIR] : 0.3491          
##                                           
##                   Kappa : 0.047           
##                                           
##  Mcnemar's Test P-Value : <2e-16          
##                                           
##             Sensitivity : 0.040121        
##             Specificity : 0.991302        
##          Pos Pred Value : 0.569892        
##          Neg Pred Value : 0.782392        
##              Prevalence : 0.223142        
##          Detection Rate : 0.008953        
##    Detection Prevalence : 0.015709        
##       Balanced Accuracy : 0.515712        
##                                           
##        'Positive' Class : Yes             
## 

Confusion Matrix at 45% Cut-Off Probability

The confusion matrix at the 45% cut-off probability differs from the one at the 50% cut-off probability:

(This is because segment 4 is classified as defaulters when the cut-off probability is 45%, but as non-defaulters when the cut-off probability is 50% or 55%.)

## Confusion Matrix and Statistics
## 
##          Actual
## Predicted  Yes   No
##       Yes  145  143
##       No  1176 4456
##                                           
##                Accuracy : 0.7772          
##                  95% CI : (0.7664, 0.7877)
##     No Information Rate : 0.7769          
##     P-Value [Acc > NIR] : 0.4825          
##                                           
##                   Kappa : 0.1091          
##                                           
##  Mcnemar's Test P-Value : <2e-16          
##                                           
##             Sensitivity : 0.10977         
##             Specificity : 0.96891         
##          Pos Pred Value : 0.50347         
##          Neg Pred Value : 0.79119         
##              Prevalence : 0.22314         
##          Detection Rate : 0.02449         
##    Detection Prevalence : 0.04865         
##       Balanced Accuracy : 0.53934         
##                                           
##        'Positive' Class : Yes             
## 

Performance Metrics at different Cut-Off Probabilities

##    cutoff  Accuracy Sensitivity Specificity      kappa
## 1    0.00 0.2231419  1.00000000   0.0000000 0.00000000
## 2    0.05 0.2231419  1.00000000   0.0000000 0.00000000
## 3    0.10 0.2231419  1.00000000   0.0000000 0.00000000
## 4    0.15 0.2231419  1.00000000   0.0000000 0.00000000
## 5    0.20 0.7250000  0.28538986   0.8512720 0.14697089
## 6    0.25 0.7250000  0.28538986   0.8512720 0.14697089
## 7    0.30 0.7643581  0.16654050   0.9360731 0.13117140
## 8    0.35 0.7771959  0.14004542   0.9602087 0.13495012
## 9    0.40 0.7771959  0.10976533   0.9689063 0.10906688
## 10   0.45 0.7771959  0.10976533   0.9689063 0.10906688
## 11   0.50 0.7790541  0.04012112   0.9913025 0.04699149
## 12   0.55 0.7790541  0.04012112   0.9913025 0.04699149
## 13   0.60 0.7778716  0.02271007   0.9947815 0.02657536
## 14   0.65 0.7783784  0.00984103   0.9991302 0.01384019
## 15   0.70 0.7783784  0.00984103   0.9991302 0.01384019
## 16   0.75 0.7768581  0.00000000   1.0000000 0.00000000
## 17   0.80 0.7768581  0.00000000   1.0000000 0.00000000
## 18   0.85 0.7768581  0.00000000   1.0000000 0.00000000
## 19   0.90 0.7768581  0.00000000   1.0000000 0.00000000
## 20   0.95 0.7768581  0.00000000   1.0000000 0.00000000
## 21   1.00 0.7768581  0.00000000   1.0000000 0.00000000
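
A table of this kind can be built by re-thresholding the predicted probabilities at each cut-off. The base R sketch below shows one way to do that. The function name `sweep_cutoffs`, the rule "predict Yes when the probability is at least the cut-off", and the toy vectors `prob` and `actual` are all assumptions made for illustration; they are not the code or data behind the table above.

```r
# Hypothetical helper: metrics at each cut-off on a 0.05 grid.
sweep_cutoffs <- function(prob, actual, positive = "Yes",
                          cutoffs = seq(0, 100, by = 5) / 100) {
  is_pos <- actual == positive
  n <- length(prob)
  rows <- lapply(cutoffs, function(k) {
    pred_pos <- prob >= k   # assumed rule: "Yes" when probability >= cut-off
    tp <- as.numeric(sum(pred_pos & is_pos))
    fp <- as.numeric(sum(pred_pos & !is_pos))
    fn <- as.numeric(sum(!pred_pos & is_pos))
    tn <- as.numeric(sum(!pred_pos & !is_pos))
    acc <- (tp + tn) / n
    pe  <- ((tp + fp) * (tp + fn) + (fn + tn) * (fp + tn)) / n^2
    data.frame(cutoff      = k,
               accuracy    = acc,
               sensitivity = tp / (tp + fn),
               specificity = tn / (tn + fp),
               kappa       = if (pe < 1) (acc - pe) / (1 - pe) else NA_real_)
  })
  do.call(rbind, rows)
}

# Toy example: a tree with four terminal nodes, so only four distinct
# predicted probabilities occur (hypothetical values).
prob   <- c(rep(0.12, 4), rep(0.33, 3), rep(0.47, 3), rep(0.72, 2))
actual <- c("No", "No", "No", "Yes",
            "No", "No", "Yes",
            "No", "Yes", "Yes",
            "Yes", "Yes")

res <- sweep_cutoffs(prob, actual)
print(res)
```

Because a tree produces one probability per terminal node, the metrics change only when a cut-off crosses one of those values. In the toy data the rows for 0.50 and 0.55 are identical while the row for 0.45 differs, which is the same step pattern seen in the table above.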