READING AND EXPLORING DATA

First Few Rows of the Data Table

##    Id CreditLimit Male Education MaritalStatus Age BillOutstanding LastPayment Default
## 1:  1       20000    0         2             1  24            3913           0       1
## 2:  2      120000    0         2             2  26            2682           0       1
## 3:  3       90000    0         2             2  34           29239        1518       0
## 4:  4       50000    0         2             1  37           46990        2000       0
## 5:  5       50000    1         2             1  57            8617        2000       0
## 6:  6       50000    1         1             2  37           64400        2500       0

Data Types of the Columns

## Classes 'data.table' and 'data.frame':   29601 obs. of  9 variables:
##  $ Id             : int  1 2 3 4 5 6 7 8 9 10 ...
##  $ CreditLimit    : int  20000 120000 90000 50000 50000 50000 500000 100000 140000 20000 ...
##  $ Male           : int  0 0 0 0 1 1 1 0 0 1 ...
##  $ Education      : int  2 2 2 2 2 1 1 2 3 3 ...
##  $ MaritalStatus  : int  1 2 2 1 1 2 2 2 1 2 ...
##  $ Age            : int  24 26 34 37 57 37 29 23 28 35 ...
##  $ BillOutstanding: int  3913 2682 29239 46990 8617 64400 367965 11876 11285 0 ...
##  $ LastPayment    : int  0 0 1518 2000 2000 2500 55000 380 3329 0 ...
##  $ Default        : int  1 1 0 0 0 0 0 0 0 0 ...
##  - attr(*, ".internal.selfref")=<externalptr>
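
The report does not echo the code behind the preview and structure listing above. A minimal sketch of how they could be produced with data.table, assuming a hypothetical file name credit_default.csv and a table named credit:

library(data.table)

# Hypothetical file name; the actual source path is not shown in the report
credit <- fread("credit_default.csv")

head(credit)   # first few rows of the table
str(credit)    # column classes (fread reads every column here as integer)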

Converting Column Data Types to Factors

## [1] "0" "1"
## Classes 'data.table' and 'data.frame':   29601 obs. of  9 variables:
##  $ Id             : Factor w/ 29601 levels "1","2","3","4",..: 1 2 3 4 5 6 7 8 9 10 ...
##  $ CreditLimit    : int  20000 120000 90000 50000 50000 50000 500000 100000 140000 20000 ...
##  $ Male           : Factor w/ 2 levels "0","1": 1 1 1 1 2 2 2 1 1 2 ...
##  $ Education      : Factor w/ 4 levels "1","2","3","4": 2 2 2 2 2 1 1 2 3 3 ...
##  $ MaritalStatus  : Factor w/ 3 levels "1","2","3": 1 2 2 1 1 2 2 2 1 2 ...
##  $ Age            : int  24 26 34 37 57 37 29 23 28 35 ...
##  $ BillOutstanding: int  3913 2682 29239 46990 8617 64400 367965 11876 11285 0 ...
##  $ LastPayment    : int  0 0 1518 2000 2000 2500 55000 380 3329 0 ...
##  $ Default        : Factor w/ 2 levels "No","Yes": 2 2 1 1 1 1 1 1 1 1 ...
##  - attr(*, ".internal.selfref")=<externalptr>
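
The '0' '1' line above is presumably the set of raw values in the Default column before relabeling. A sketch of the conversion, assuming the hypothetical credit table from the previous sketch; the exact calls used in the report are not shown:

# Treat the identifier and the categorical columns as factors
factor_cols <- c("Id", "Male", "Education", "MaritalStatus")
credit[, (factor_cols) := lapply(.SD, as.factor), .SDcols = factor_cols]

# Inspect the raw levels of the target, then relabel 0/1 as No/Yes
levels(factor(credit$Default))
credit[, Default := factor(Default, levels = c(0, 1), labels = c("No", "Yes"))]

str(credit)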

SPLITTING DATA
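
No code or output is shown for the split itself. A minimal sketch, assuming an 80/20 stratified split with caret::createDataPartition (consistent with the roughly 23,681 training and 5,920 test observations implied by the model summary and confusion matrix below); train_dt, test_dt, and the seed are hypothetical names and values:

library(caret)

set.seed(123)  # hypothetical seed; the report does not state one
train_idx <- as.vector(createDataPartition(credit$Default, p = 0.8, list = FALSE))
train_dt  <- credit[train_idx]
test_dt   <- credit[-train_idx]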

Training the Logistic Regression Model

Summary of the Trained Model

## 
## Call:
## NULL
## 
## Deviance Residuals: 
##     Min       1Q   Median       3Q      Max  
## -0.9767  -0.7751  -0.6468  -0.3778   4.1637  
## 
## Coefficients:
##                   Estimate Std. Error z value Pr(>|z|)    
## (Intercept)     -6.626e-01  4.867e-02 -13.614  < 2e-16 ***
## CreditLimit     -3.315e-06  1.615e-07 -20.523  < 2e-16 ***
## Male1            1.732e-01  3.226e-02   5.369 7.90e-08 ***
## Education2       1.606e-02  3.781e-02   0.425  0.67108    
## Education3       5.350e-03  4.933e-02   0.108  0.91363    
## Education4      -1.135e+00  3.952e-01  -2.872  0.00407 ** 
## MaritalStatus2  -2.464e-01  3.308e-02  -7.448 9.45e-14 ***
## MaritalStatus3  -6.862e-02  1.450e-01  -0.473  0.63608    
## BillOutstanding  1.400e-06  2.647e-07   5.292 1.21e-07 ***
## LastPayment     -2.302e-05  2.739e-06  -8.404  < 2e-16 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## (Dispersion parameter for binomial family taken to be 1)
## 
##     Null deviance: 25142  on 23680  degrees of freedom
## Residual deviance: 24266  on 23671  degrees of freedom
## AIC: 24286
## 
## Number of Fisher Scoring iterations: 6
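
The 'Call: NULL' in the summary suggests the model was refit inside a wrapper such as caret::train rather than stored from a direct glm() call, and the coefficient table implies that Id and Age were left out of the formula. An equivalent hedged sketch with glm(), using the hypothetical train_dt from the splitting sketch:

logit_model <- glm(Default ~ CreditLimit + Male + Education + MaritalStatus +
                     BillOutstanding + LastPayment,
                   data = train_dt, family = binomial)

summary(logit_model)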

Predicting Test Set Results with the Logistic Regression Model

Range of the Predicted Probabilities

## [1] "0.00001234883" "0.37989222693"

Confusion Matrix at a Cut-off Probability of 0.20

## Confusion Matrix and Statistics
## 
##           Reference
## Prediction   No  Yes
##        No  2333  416
##        Yes 2266  905
##                                           
##                Accuracy : 0.547           
##                  95% CI : (0.5342, 0.5597)
##     No Information Rate : 0.7769          
##     P-Value [Acc > NIR] : 1               
##                                           
##                   Kappa : 0.1283          
##                                           
##  Mcnemar's Test P-Value : <2e-16          
##                                           
##             Sensitivity : 0.6851          
##             Specificity : 0.5073          
##          Pos Pred Value : 0.2854          
##          Neg Pred Value : 0.8487          
##              Prevalence : 0.2231          
##          Detection Rate : 0.1529          
##    Detection Prevalence : 0.5356          
##       Balanced Accuracy : 0.5962          
##                                           
##        'Positive' Class : Yes             
## 
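
A sketch of how a confusion matrix like the one above can be produced with caret, assuming the hypothetical pred_prob and test_dt objects and the 0.20 cut-off named in the heading; 'Yes' (default) is treated as the positive class:

pred_class <- factor(ifelse(pred_prob > 0.20, "Yes", "No"), levels = c("No", "Yes"))
confusionMatrix(data = pred_class, reference = test_dt$Default, positive = "Yes")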

Calculating Accuracy, Sensitivity, and Specificity at Different Cut-off Levels

##    cutoff  Accuracy Sensitivity Specificity       Kappa
## 1    0.01 0.2256757 0.999242998 0.003479017 0.001217283
## 2    0.04 0.2331081 0.998485995 0.013263753 0.005283801
## 3    0.07 0.2498311 0.994700984 0.035877365 0.013935201
## 4    0.10 0.2856419 0.975018925 0.087627745 0.029507143
## 5    0.13 0.3315878 0.936411809 0.157860404 0.046618561
## 6    0.16 0.3866554 0.863739591 0.249619482 0.060218697
## 7    0.19 0.4641892 0.785768357 0.371819961 0.092532914
## 8    0.22 0.5469595 0.685087055 0.507284192 0.128324799
## 9    0.25 0.6238176 0.557153671 0.642965862 0.155715728
## 10   0.28 0.6881757 0.355791067 0.783648619 0.134225226
## 11   0.31 0.7467905 0.193792581 0.905631659 0.119821784
## 12   0.34 0.7684122 0.073429220 0.968036530 0.058448157
## 13   0.37 0.7770270 0.006056018 0.998477930 0.007000454
## 14   0.40 0.7768581 0.000000000 1.000000000 0.000000000
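
A sketch of the cut-off sweep, assuming the same hypothetical objects; the grid seq(0.01, 0.40, by = 0.03) reproduces the fourteen cut-offs listed above:

cutoffs <- seq(0.01, 0.40, by = 0.03)

perf <- t(sapply(cutoffs, function(co) {
  cls <- factor(ifelse(pred_prob > co, "Yes", "No"), levels = c("No", "Yes"))
  cm  <- confusionMatrix(cls, test_dt$Default, positive = "Yes")
  c(Accuracy    = unname(cm$overall["Accuracy"]),
    Sensitivity = unname(cm$byClass["Sensitivity"]),
    Specificity = unname(cm$byClass["Specificity"]),
    Kappa       = unname(cm$overall["Kappa"]))
}))

data.frame(cutoff = cutoffs, perf)

Because no predicted probability exceeds roughly 0.38, the 0.40 cut-off classifies every observation as "No", which is why sensitivity falls to zero in the last row.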