Preliminary Data Intake and Evaluation

Pull in the data and examine its structure. This is an RData file, so you could simply select it in the Files pane at the bottom right of your RStudio window, or open it with the load() command, replacing my directory info with yours.

#load(here("GermanData.RData"))
load('GermanData.RData')
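
The section below calls functions from several add-on packages that are presumably loaded earlier in the full document. For completeness, here is a sketch of the likely loads (the exact packages are an assumption on my part):

# Assumed package loads (these likely appear earlier in the full document)
library(faraway)            # vif() as used below reports per-coefficient VIFs
                            # (car::vif() would report per-term GVIFs instead)
library(ResourceSelection)  # hoslem.test()
library(DescTools)          # PseudoR2()
library(caret)              # confusionMatrix()
library(ROCR)               # prediction(), performance()
library(here)               # here(), in the commented-out load() above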

Examine structure of the dataset.

str(credit)
## 'data.frame':    1000 obs. of  21 variables:
##  $ V1 : chr  "A11" "A12" "A14" "A11" ...
##  $ V2 : int  6 48 12 42 24 36 24 36 12 30 ...
##  $ V3 : chr  "A34" "A32" "A34" "A32" ...
##  $ V4 : chr  "A43" "A43" "A46" "A42" ...
##  $ V5 : int  1169 5951 2096 7882 4870 9055 2835 6948 3059 5234 ...
##  $ V6 : chr  "A65" "A61" "A61" "A61" ...
##  $ V7 : chr  "A75" "A73" "A74" "A74" ...
##  $ V8 : int  4 2 2 2 3 2 3 2 2 4 ...
##  $ V9 : chr  "A93" "A92" "A93" "A93" ...
##  $ V10: chr  "A101" "A101" "A101" "A103" ...
##  $ V11: int  4 2 3 4 4 4 4 2 4 2 ...
##  $ V12: chr  "A121" "A121" "A121" "A122" ...
##  $ V13: int  67 22 49 45 53 35 53 35 61 28 ...
##  $ V14: chr  "A143" "A143" "A143" "A143" ...
##  $ V15: chr  "A152" "A152" "A152" "A153" ...
##  $ V16: int  2 1 1 1 2 1 1 1 1 2 ...
##  $ V17: chr  "A173" "A173" "A172" "A173" ...
##  $ V18: int  1 1 2 2 2 2 1 1 1 1 ...
##  $ V19: chr  "A192" "A191" "A191" "A191" ...
##  $ V20: chr  "A201" "A201" "A201" "A201" ...
##  $ V21: int  1 2 1 1 2 1 1 1 1 2 ...

1000 observations with 20 predictors and 1 response (V21). Let’s clean up V21 and give the columns some names.

colnames(credit) <- c("Chk_acct_status",    #1
                      "Duration",           #2
                      "Cred_history",       #3 
                      "Purpose",            #4
                      "Cred_amnt",          #5 
                      "Savings_acct",       #6 
                      "Pres_emplymnt",      #7
                      "Instllmnt_rate",     #8
                      "Status_Gendr",       #9
                      "Other_debtor",       #10
                      "Present_res",        #11
                      "Property",           #12
                      "Age",                #13
                      "Other_instll_plns",  #14
                      "Housing",            #15
                      "Num_credits",        #16
                      "Job",                #17
                      "Num_people",         #18
                      "Telephone",          #19
                      "Foreign_wrkr",       #20
                      "Response")           #21

#The raw data codes Response as 1 = good credit, 2 = bad credit;
#recode to 0 = good, 1 = bad, and store as a factor
credit$Response <- credit$Response - 1
credit$Response <- as.factor(credit$Response)

#Lots of factors in this guy.  Here goes.
credit$Chk_acct_status <- as.factor(credit$Chk_acct_status)
credit$Cred_history <- as.factor(credit$Cred_history)
credit$Purpose <- as.factor(credit$Purpose)
credit$Savings_acct <- as.factor(credit$Savings_acct)
credit$Pres_emplymnt <- as.factor(credit$Pres_emplymnt)
credit$Status_Gendr <- as.factor(credit$Status_Gendr)
credit$Other_debtor <- as.factor(credit$Other_debtor)
credit$Property <- as.factor(credit$Property)
credit$Other_instll_plns <- as.factor(credit$Other_instll_plns)
credit$Housing <- as.factor(credit$Housing)
credit$Job <- as.factor(credit$Job)
credit$Telephone <- as.factor(credit$Telephone)
credit$Foreign_wrkr <- as.factor(credit$Foreign_wrkr)
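
For reference, the thirteen conversions above are equivalent to converting every character column in one shot. A compact sketch (the explicit version above is what this document actually runs):

# Equivalent shortcut: every character column here is a categorical code
chr_cols <- sapply(credit, is.character)
credit[chr_cols] <- lapply(credit[chr_cols], as.factor)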

Summary of dataframe:

summary(credit)
##  Chk_acct_status    Duration    Cred_history    Purpose      Cred_amnt    
##  A11:274         Min.   : 4.0   A30: 40      A43    :280   Min.   :  250  
##  A12:269         1st Qu.:12.0   A31: 49      A40    :234   1st Qu.: 1366  
##  A13: 63         Median :18.0   A32:530      A42    :181   Median : 2320  
##  A14:394         Mean   :20.9   A33: 88      A41    :103   Mean   : 3271  
##                  3rd Qu.:24.0   A34:293      A49    : 97   3rd Qu.: 3972  
##                  Max.   :72.0                A46    : 50   Max.   :18424  
##                                              (Other): 55                  
##  Savings_acct Pres_emplymnt Instllmnt_rate  Status_Gendr Other_debtor
##  A61:603      A71: 62       Min.   :1.000   A91: 50      A101:907    
##  A62:103      A72:172       1st Qu.:2.000   A92:310      A102: 41    
##  A63: 63      A73:339       Median :3.000   A93:548      A103: 52    
##  A64: 48      A74:174       Mean   :2.973   A94: 92                  
##  A65:183      A75:253       3rd Qu.:4.000                            
##                             Max.   :4.000                            
##                                                                      
##   Present_res    Property        Age        Other_instll_plns Housing   
##  Min.   :1.000   A121:282   Min.   :19.00   A141:139          A151:179  
##  1st Qu.:2.000   A122:232   1st Qu.:27.00   A142: 47          A152:713  
##  Median :3.000   A123:332   Median :33.00   A143:814          A153:108  
##  Mean   :2.845   A124:154   Mean   :35.55                               
##  3rd Qu.:4.000              3rd Qu.:42.00                               
##  Max.   :4.000              Max.   :75.00                               
##                                                                         
##   Num_credits      Job        Num_people    Telephone  Foreign_wrkr Response
##  Min.   :1.000   A171: 22   Min.   :1.000   A191:596   A201:963     0:700   
##  1st Qu.:1.000   A172:200   1st Qu.:1.000   A192:404   A202: 37     1:300   
##  Median :1.000   A173:630   Median :1.000                                   
##  Mean   :1.407   A174:148   Mean   :1.155                                   
##  3rd Qu.:2.000              3rd Qu.:1.000                                   
##  Max.   :4.000              Max.   :2.000                                   
## 

Splitting the Data into Train / Test Datasets

Executing an 80/20 train/test split, per the case study documentation: the first 800 observations become the training set, and the last 200 are held out for testing. Normally we’d use sample() to draw a random sample from the dataset, but in this case we’ve been instructed otherwise.
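
For comparison, a random split would look something like the sketch below (left commented out, since the case study mandates the sequential split):

# Hypothetical random 80/20 split (not run here)
# train.idx <- sample(1:nrow(credit), 0.8 * nrow(credit))
# credit.train <- credit[train.idx, ]
# credit.test <- credit[-train.idx, ]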

For our tuning method we’ll want a pseudo-train / pseudo-test split of the training dataset. For that we’ll use a 70/30 split drawn with sample().

set.seed(2021)

credit.train <- credit[1:800, ]
credit.test <- credit[801:1000, ]

#Create pseudo-train and pseudo-test as 70/30 split using random sampling
pseudo.index <- sample(1:nrow(credit.train), 0.7*nrow(credit.train))
credit.pseudo.train <- credit.train[pseudo.index, ]
credit.pseudo.test <- credit.train[-pseudo.index, ]
credit.pseudo.test.true <- credit.pseudo.test$Response

Now we’ll fit a full model on the full training set to check variance inflation factors, and decide whether any high-VIF predictors should be removed before stepwise model selection.

credit.full.glm <- glm(Response ~ ., data = credit.train, family = "binomial")
credit.null.glm <- glm(Response ~ 1, data = credit.train, family = "binomial")
vif(credit.full.glm)
##    Chk_acct_statusA12    Chk_acct_statusA13    Chk_acct_statusA14 
##              1.550778              1.201653              1.479547 
##              Duration       Cred_historyA31       Cred_historyA32 
##              1.939880              2.430698              6.362419 
##       Cred_historyA33       Cred_historyA34            PurposeA41 
##              2.898700              4.785278              1.385062 
##           PurposeA410            PurposeA42            PurposeA43 
##              1.226612              1.612320              1.695109 
##            PurposeA44            PurposeA45            PurposeA46 
##              1.086700              1.203427              1.252694 
##            PurposeA48            PurposeA49             Cred_amnt 
##              1.108763              1.494406              2.578850 
##       Savings_acctA62       Savings_acctA63       Savings_acctA64 
##              1.214479              1.126197              1.084439 
##       Savings_acctA65      Pres_emplymntA72      Pres_emplymntA73 
##              1.198551              4.336868              5.842938 
##      Pres_emplymntA74      Pres_emplymntA75        Instllmnt_rate 
##              3.774112              4.540360              1.407154 
##       Status_GendrA92       Status_GendrA93       Status_GendrA94 
##              4.827323              5.165044              2.618493 
##      Other_debtorA102      Other_debtorA103           Present_res 
##              1.144550              1.131934              1.396683 
##          PropertyA122          PropertyA123          PropertyA124 
##              1.705516              1.802407              3.667789 
##                   Age Other_instll_plnsA142 Other_instll_plnsA143 
##              1.588120              1.431932              1.499877 
##           HousingA152           HousingA153           Num_credits 
##              1.864765              3.571403              1.679225 
##               JobA172               JobA173               JobA174 
##             11.173775             15.312464              9.064171 
##            Num_people         TelephoneA192      Foreign_wrkrA202 
##              1.225625              1.455223              1.061278

I observe large VIFs on the Job dummy variables; we’ll want to keep an eye on that.

Creating Initial Models

Proceeding with AIC stepwise model selection. The selected formulas will be plugged into models for the pseudo-train / pseudo-test tuning.

step.AIC.glm <- step(credit.null.glm, scope = list(upper = credit.full.glm),
                     direction = "both", test = "Chisq", trace = F)
summary(step.AIC.glm)
## 
## Call:
## glm(formula = Response ~ Chk_acct_status + Duration + Purpose + 
##     Cred_history + Status_Gendr + Instllmnt_rate + Other_debtor + 
##     Savings_acct + Cred_amnt + Foreign_wrkr + Pres_emplymnt + 
##     Age + Num_people + Num_credits, family = "binomial", data = credit.train)
## 
## Deviance Residuals: 
##     Min       1Q   Median       3Q      Max  
## -2.7958  -0.7171  -0.3801   0.6794   2.6147  
## 
## Coefficients:
##                      Estimate Std. Error z value Pr(>|z|)    
## (Intercept)         3.937e-01  9.924e-01   0.397 0.691568    
## Chk_acct_statusA12 -2.987e-01  2.378e-01  -1.256 0.209060    
## Chk_acct_statusA13 -9.614e-01  3.951e-01  -2.434 0.014951 *  
## Chk_acct_statusA14 -1.768e+00  2.639e-01  -6.700 2.08e-11 ***
## Duration            2.997e-02  1.002e-02   2.992 0.002767 ** 
## PurposeA41         -1.676e+00  4.290e-01  -3.907 9.35e-05 ***
## PurposeA410        -1.368e+00  7.968e-01  -1.718 0.085888 .  
## PurposeA42         -8.492e-01  2.913e-01  -2.916 0.003549 ** 
## PurposeA43         -9.546e-01  2.748e-01  -3.474 0.000513 ***
## PurposeA44         -9.276e-01  8.784e-01  -1.056 0.290983    
## PurposeA45         -4.809e-01  5.949e-01  -0.808 0.418897    
## PurposeA46          1.690e-01  4.151e-01   0.407 0.683923    
## PurposeA48         -2.211e+00  1.192e+00  -1.855 0.063584 .  
## PurposeA49         -8.435e-01  3.752e-01  -2.248 0.024558 *  
## Cred_historyA31     3.533e-01  6.028e-01   0.586 0.557839    
## Cred_historyA32    -8.373e-01  4.679e-01  -1.790 0.073529 .  
## Cred_historyA33    -9.210e-01  5.150e-01  -1.788 0.073710 .  
## Cred_historyA34    -1.575e+00  4.831e-01  -3.260 0.001116 ** 
## Status_GendrA92    -3.871e-01  4.166e-01  -0.929 0.352768    
## Status_GendrA93    -1.177e+00  4.105e-01  -2.868 0.004131 ** 
## Status_GendrA94    -4.581e-01  4.985e-01  -0.919 0.358127    
## Instllmnt_rate      3.687e-01  9.631e-02   3.828 0.000129 ***
## Other_debtorA102    8.665e-01  4.696e-01   1.845 0.065026 .  
## Other_debtorA103   -9.132e-01  4.661e-01  -1.959 0.050076 .  
## Savings_acctA62    -3.345e-01  3.079e-01  -1.087 0.277254    
## Savings_acctA63    -4.750e-01  4.765e-01  -0.997 0.318873    
## Savings_acctA64    -1.207e+00  5.288e-01  -2.282 0.022501 *  
## Savings_acctA65    -6.888e-01  2.813e-01  -2.449 0.014326 *  
## Cred_amnt           1.157e-04  4.739e-05   2.442 0.014588 *  
## Foreign_wrkrA202   -1.443e+00  8.023e-01  -1.799 0.072009 .  
## Pres_emplymntA72   -1.436e-01  4.376e-01  -0.328 0.742696    
## Pres_emplymntA73   -3.323e-01  4.068e-01  -0.817 0.414015    
## Pres_emplymntA74   -1.085e+00  4.543e-01  -2.387 0.016979 *  
## Pres_emplymntA75   -3.387e-01  4.154e-01  -0.815 0.414863    
## Age                -1.993e-02  9.679e-03  -2.059 0.039510 *  
## Num_people          5.152e-01  2.790e-01   1.847 0.064772 .  
## Num_credits         3.197e-01  2.044e-01   1.564 0.117744    
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## (Dispersion parameter for binomial family taken to be 1)
## 
##     Null deviance: 975.68  on 799  degrees of freedom
## Residual deviance: 717.52  on 763  degrees of freedom
## AIC: 791.52
## 
## Number of Fisher Scoring iterations: 5

There’s a lot going on with the AIC selection: 14 predictors survive, making for a very complex model.

Now evaluating BIC stepwise selection. BIC should, in theory, give us a cleaner model, because it penalizes model complexity more severely than AIC: each parameter costs log(n) = log(800) ≈ 6.68 units rather than AIC’s 2.

step.BIC.glm <- step(credit.null.glm, scope = list(upper = credit.full.glm),
                     direction = "both", test = "Chisq", trace = F, k=log(nrow(credit.train)))
summary(step.BIC.glm)
## 
## Call:
## glm(formula = Response ~ Chk_acct_status + Duration + Instllmnt_rate, 
##     family = "binomial", data = credit.train)
## 
## Deviance Residuals: 
##     Min       1Q   Median       3Q      Max  
## -1.6625  -0.8841  -0.4981   1.0258   2.5032  
## 
## Coefficients:
##                     Estimate Std. Error z value Pr(>|z|)    
## (Intercept)        -1.495796   0.310130  -4.823 1.41e-06 ***
## Chk_acct_statusA12 -0.302805   0.200405  -1.511  0.13080    
## Chk_acct_statusA13 -1.057450   0.362564  -2.917  0.00354 ** 
## Chk_acct_statusA14 -2.020412   0.232934  -8.674  < 2e-16 ***
## Duration            0.036564   0.006872   5.321 1.03e-07 ***
## Instllmnt_rate      0.208381   0.077129   2.702  0.00690 ** 
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## (Dispersion parameter for binomial family taken to be 1)
## 
##     Null deviance: 975.68  on 799  degrees of freedom
## Residual deviance: 833.93  on 794  degrees of freedom
## AIC: 845.93
## 
## Number of Fisher Scoring iterations: 4

True enough, BIC’s model is far less complex than AIC’s. It may, however, be too simple a model to be of any use!

Model Diagnostics

Run the Hosmer-Lemeshow goodness-of-fit test for both the AIC and BIC models:

hoslem.test(step.BIC.glm$y, fitted(step.BIC.glm), g=10)
## 
##  Hosmer and Lemeshow goodness of fit (GOF) test
## 
## data:  step.BIC.glm$y, fitted(step.BIC.glm)
## X-squared = 17.695, df = 8, p-value = 0.02364
hoslem.test(step.AIC.glm$y, fitted(step.AIC.glm), g=10)
## 
##  Hosmer and Lemeshow goodness of fit (GOF) test
## 
## data:  step.AIC.glm$y, fitted(step.AIC.glm)
## X-squared = 7.6941, df = 8, p-value = 0.4639

The null hypothesis of the Hosmer-Lemeshow test is that the model is adequate and fits the data well; the alternative is that it is not adequate. Here the AIC model’s p-value (0.4639) means we fail to reject the null hypothesis, so we can say that our AIC stepwise logistic model fits the data well. For the BIC model, however, we reject the null hypothesis (p = 0.0236), meaning it does not fit the data well.

Check Cook’s distance to see if there are any influential points on these models:

plot(step.AIC.glm, which = 4)

plot(step.BIC.glm, which = 4)

R labels the three most extreme points by default on both the AIC and BIC plots. The largest Cook’s distance appears in the AIC model, at just over 0.04 for observation 204. Given we have no guidance on handling influential points, let’s leave them as is.
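
As a quick numeric check, we can flag observations exceeding the common 4/n rule of thumb (a sketch; the threshold choice is my assumption, not part of the case study):

# Flag observations whose Cook's distance exceeds 4/n (a common heuristic)
which(cooks.distance(step.AIC.glm) > 4 / nrow(credit.train))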

Checking the models’ pseudo-R^2 values:

glm.pseudoR2.AIC <- PseudoR2(step.AIC.glm, which = c("McFadden","Nagel","CoxSnell"))
round(glm.pseudoR2.AIC, 3)
##   McFadden Nagelkerke   CoxSnell 
##      0.265      0.391      0.276
glm.pseudoR2.BIC <- PseudoR2(step.BIC.glm, which = c("McFadden","Nagel","CoxSnell"))
round(glm.pseudoR2.BIC, 3)
##   McFadden Nagelkerke   CoxSnell 
##      0.145      0.230      0.162

The pseudo-R^2 of the AIC model ranges from 26.5% (McFadden) to 39.1% (Nagelkerke); for the BIC model it ranges from 14.5% (McFadden) to 23.0% (Nagelkerke). The AIC model scores higher across the board. The simpler BIC model may lend itself to easier interpretation, but as the pseudo-R^2 scores show, its performance is rubbish.
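
McFadden’s value is easy to verify by hand from the deviances in the model summaries above (a quick sketch):

# McFadden's pseudo-R^2 = 1 - (residual deviance / null deviance)
1 - step.AIC.glm$deviance / step.AIC.glm$null.deviance  # 1 - 717.52/975.68 = ~0.265
1 - step.BIC.glm$deviance / step.BIC.glm$null.deviance  # 1 - 833.93/975.68 = ~0.145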

Performance Tuning of Models

Create the selection threshold sequence.

cut.seq <- seq(0.01, 0.99, by = 0.01)

Creating GLMs based on the previously selected AIC/BIC structures, re-fit on the pseudo-train data.

pseudo.step.AIC.glm <- glm(Response ~ Chk_acct_status + Duration + Purpose + 
    Cred_history + Status_Gendr + Instllmnt_rate + Other_debtor + 
    Savings_acct + Cred_amnt + Foreign_wrkr + Pres_emplymnt + 
    Age + Num_people + Num_credits, family = "binomial", data = credit.pseudo.train)

pseudo.step.BIC.glm <- glm(Response ~ Chk_acct_status + Duration + Instllmnt_rate, family = "binomial", data = credit.pseudo.train)

Process the AIC model to identify the optimal selection threshold.

#Predictions don't depend on the cut-off, so compute them once, up front
est.prob <- predict(pseudo.step.AIC.glm, newdata = credit.pseudo.test[, -21],
                    type = "response")   # prediction on credit.pseudo.test

metric.AIC <- c()
for(i in 1:length(cut.seq)){
  est.class <- ifelse(est.prob > cut.seq[i], 1, 0)   # class = 1 if p-hat > cut-off, else 0

  #Track error rate + false-good rate rather than the error rate alone;
  #we choose the threshold that minimizes this combined value.
  #Explicit factor levels keep confusionMatrix() working at extreme cut-offs,
  #where every prediction can land in a single class.
  conf_mat <- caret::confusionMatrix(factor(est.class, levels = c(0, 1)),
                                     credit.pseudo.test.true, positive = "1")
  err <- 1 - conf_mat$overall[["Accuracy"]]
  false_good <- 1 - conf_mat$byClass[["Sensitivity"]]
  metric.AIC[i] <- err + false_good
}

par(mfrow = c(1, 1))
plot(cut.seq, metric.AIC, type = "l")

min(metric.AIC)
## [1] 0.4849638
cut.seq[which.min(metric.AIC)]
## [1] 0.11
final.cut.AIC = cut.seq[which.min(metric.AIC)]

The optimal selection threshold is projected to be 0.11 for the AIC logistic model.

Applying the same technique to the BIC model now.

#As above: compute predictions once, outside the loop
est.prob <- predict(pseudo.step.BIC.glm, newdata = credit.pseudo.test[, -21],
                    type = "response")   # prediction on credit.pseudo.test

metric.BIC <- c()
for(i in 1:length(cut.seq)){
  est.class <- ifelse(est.prob > cut.seq[i], 1, 0)   # class = 1 if p-hat > cut-off, else 0

  #Same combined metric: error rate + false-good rate
  conf_mat <- caret::confusionMatrix(factor(est.class, levels = c(0, 1)),
                                     credit.pseudo.test.true, positive = "1")
  err <- 1 - conf_mat$overall[["Accuracy"]]
  false_good <- 1 - conf_mat$byClass[["Sensitivity"]]
  metric.BIC[i] <- err + false_good
}

par(mfrow = c(1, 1))
plot(cut.seq, metric.BIC, type = "l")

min(metric.BIC)
## [1] 0.5490942
cut.seq[which.min(metric.BIC)]
## [1] 0.14
final.cut.BIC = cut.seq[which.min(metric.BIC)]

The optimal selection threshold is projected to be 0.14 for the BIC logistic model.
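
Since the same sweep runs for every model in this section, a reusable helper could cut the duplication. A sketch under the same metric (not actually used below):

# Hypothetical helper: error rate + false-good rate at each cut-off
sweep_metric <- function(model, newdata, truth, cuts = seq(0.01, 0.99, by = 0.01)) {
  probs <- predict(model, newdata = newdata, type = "response")
  sapply(cuts, function(cutoff) {
    preds <- factor(ifelse(probs > cutoff, 1, 0), levels = c(0, 1))
    cm <- caret::confusionMatrix(preds, truth, positive = "1")
    (1 - cm$overall[["Accuracy"]]) + (1 - cm$byClass[["Sensitivity"]])
  })
}
# e.g. metric.BIC <- sweep_metric(pseudo.step.BIC.glm,
#                                 credit.pseudo.test[, -21],
#                                 credit.pseudo.test.true)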

Let’s apply these to the test data and see what happens.

step.AIC.probs <- predict(pseudo.step.AIC.glm, credit.test, type = "response")
step.AIC.preds <- rep(0, length(step.AIC.probs))
step.AIC.preds[step.AIC.probs > final.cut.AIC] = 1
caret::confusionMatrix(as.factor(step.AIC.preds), credit.test$Response, positive = "1")
## Confusion Matrix and Statistics
## 
##           Reference
## Prediction  0  1
##          0 59  7
##          1 80 54
##                                           
##                Accuracy : 0.565           
##                  95% CI : (0.4933, 0.6348)
##     No Information Rate : 0.695           
##     P-Value [Acc > NIR] : 1               
##                                           
##                   Kappa : 0.2319          
##                                           
##  Mcnemar's Test P-Value : 1.171e-14       
##                                           
##             Sensitivity : 0.8852          
##             Specificity : 0.4245          
##          Pos Pred Value : 0.4030          
##          Neg Pred Value : 0.8939          
##              Prevalence : 0.3050          
##          Detection Rate : 0.2700          
##    Detection Prevalence : 0.6700          
##       Balanced Accuracy : 0.6549          
##                                           
##        'Positive' Class : 1               
## 

Total error rate + false-good rate (the proportion of bad-credit observations classified as good credit):

AIC.score = (1 - 0.565) + (7 / (7+54))
AIC.score
## [1] 0.5497541
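
Equivalently, the score can be pulled straight from the confusionMatrix object instead of retyping the counts (a sketch):

# Same score, computed from the stored accuracy and sensitivity
cm <- caret::confusionMatrix(as.factor(step.AIC.preds), credit.test$Response,
                             positive = "1")
(1 - cm$overall[["Accuracy"]]) + (1 - cm$byClass[["Sensitivity"]])

Now for the BIC model:
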
step.BIC.probs <- predict(pseudo.step.BIC.glm, credit.test, type = "response")
step.BIC.preds <- rep(0, length(step.BIC.probs))
step.BIC.preds[step.BIC.probs > final.cut.BIC] = 1
caret::confusionMatrix(as.factor(step.BIC.preds), credit.test$Response, positive = "1")
## Confusion Matrix and Statistics
## 
##           Reference
## Prediction  0  1
##          0 67  8
##          1 72 53
##                                           
##                Accuracy : 0.6             
##                  95% CI : (0.5285, 0.6685)
##     No Information Rate : 0.695           
##     P-Value [Acc > NIR] : 0.9983          
##                                           
##                   Kappa : 0.2711          
##                                           
##  Mcnemar's Test P-Value : 1.873e-12       
##                                           
##             Sensitivity : 0.8689          
##             Specificity : 0.4820          
##          Pos Pred Value : 0.4240          
##          Neg Pred Value : 0.8933          
##              Prevalence : 0.3050          
##          Detection Rate : 0.2650          
##    Detection Prevalence : 0.6250          
##       Balanced Accuracy : 0.6754          
##                                           
##        'Positive' Class : 1               
## 

Total error rate + false-good rate (the proportion of bad-credit observations classified as good credit):

BIC.score = (1 - 0.6) + (8 / (8+53))
BIC.score
## [1] 0.5311475

Since we want as accurate a model as possible, we should theoretically run with the full model.

credit.opt.glm <- glm(Response ~ ., data = credit.pseudo.train, family = "binomial")

We now run the cut-sequence optimization for the full model.

#Establish the cut sequence and an empty "err" vector.
#Predictions come from the full model fit on the pseudo-train data,
#evaluated against the pseudo-test dataset.
opt_probs <- predict(credit.opt.glm, newdata = credit.pseudo.test[, -21], type = "response")
cut_seq <- seq(0, 1, by = 0.01)
err <- c()

for(i in 1:length(cut_seq)){
  opt.preds <- ifelse(opt_probs > cut_seq[i], 1, 0)
  opt.conf <- caret::confusionMatrix(factor(opt.preds, levels = c(0, 1)),
                                     credit.pseudo.test$Response, positive = "1")
  #Same combined metric as before: (1 - accuracy) + (1 - sensitivity),
  #i.e. error rate + false-good rate
  err[i] <- (1 - opt.conf$overall[["Accuracy"]]) + (1 - opt.conf$byClass[["Sensitivity"]])
}

par(mfrow = c(1, 1))
plot(cut_seq, err, type = "l")

min(err)
## [1] 0.5054348
cut_seq[which.min(err)]
## [1] 0.15
cut_opt = cut_seq[which.min(err)]

The optimal cut-off threshold for the full model is 0.15. We now apply this to the test set and examine its performance.

credit.opt.probs <- predict(credit.opt.glm, credit.test, type = "response")
credit.opt.preds <- rep(0, length(credit.opt.probs))
credit.opt.preds[credit.opt.probs > cut_opt] = 1
caret::confusionMatrix(as.factor(credit.opt.preds), credit.test$Response, positive = "1")
## Confusion Matrix and Statistics
## 
##           Reference
## Prediction  0  1
##          0 78  7
##          1 61 54
##                                           
##                Accuracy : 0.66            
##                  95% CI : (0.5898, 0.7253)
##     No Information Rate : 0.695           
##     P-Value [Acc > NIR] : 0.8747          
##                                           
##                   Kappa : 0.3576          
##                                           
##  Mcnemar's Test P-Value : 1.3e-10         
##                                           
##             Sensitivity : 0.8852          
##             Specificity : 0.5612          
##          Pos Pred Value : 0.4696          
##          Neg Pred Value : 0.9176          
##              Prevalence : 0.3050          
##          Detection Rate : 0.2700          
##    Detection Prevalence : 0.5750          
##       Balanced Accuracy : 0.7232          
##                                           
##        'Positive' Class : 1               
## 

Per the scoring metric:

Opt.score <- (1 - 0.66) + (7 / (7 + 54))
Opt.score
## [1] 0.4547541
1 - (7 / (7 + 54))  # complement of the false-good rate, i.e. the model's sensitivity
## [1] 0.8852459

We find that the “tuned” full model returns a performance metric of 0.455, better (lower) than the tuned AIC (0.550) or BIC (0.531) models.

Performance of Models using 0.5 Threshold

Let’s evaluate both the AIC and BIC models using the standard 0.5 threshold, and then use a tuning method to balance sensitivity and specificity. We’ll examine how these perform using confusion matrices, then run the best against the test set to see how we do.

step.AIC.probs <- predict(step.AIC.glm, credit.train, type = "response")
step.AIC.preds <- rep(0, length(step.AIC.probs))
step.AIC.preds[step.AIC.probs > 0.5] = 1
caret::confusionMatrix(as.factor(step.AIC.preds), credit.train$Response, positive = "1")
## Confusion Matrix and Statistics
## 
##           Reference
## Prediction   0   1
##          0 504 108
##          1  57 131
##                                          
##                Accuracy : 0.7938         
##                  95% CI : (0.764, 0.8213)
##     No Information Rate : 0.7012         
##     P-Value [Acc > NIR] : 2.105e-09      
##                                          
##                   Kappa : 0.4756         
##                                          
##  Mcnemar's Test P-Value : 9.922e-05      
##                                          
##             Sensitivity : 0.5481         
##             Specificity : 0.8984         
##          Pos Pred Value : 0.6968         
##          Neg Pred Value : 0.8235         
##              Prevalence : 0.2988         
##          Detection Rate : 0.1638         
##    Detection Prevalence : 0.2350         
##       Balanced Accuracy : 0.7233         
##                                          
##        'Positive' Class : 1              
## 
#Since Response codes credit risk, with 0 = good credit (no risk) and 1 = bad credit (risk), I'm coding positive as "1" for "event"

The performance metric for this model is:

(1-0.7938) + (108 / (108+131))
## [1] 0.6580828

This performance metric (computed on the training set, unlike the scores above) is nowhere near as low as that of the best model we currently have.

step.BIC.probs <- predict(step.BIC.glm, credit.train, type = "response")
step.BIC.preds <- rep(0, length(step.BIC.probs))
step.BIC.preds[step.BIC.probs > 0.5] = 1
caret::confusionMatrix(as.factor(step.BIC.preds), credit.train$Response, positive = "1")
## Confusion Matrix and Statistics
## 
##           Reference
## Prediction   0   1
##          0 513 161
##          1  48  78
##                                           
##                Accuracy : 0.7388          
##                  95% CI : (0.7068, 0.7689)
##     No Information Rate : 0.7012          
##     P-Value [Acc > NIR] : 0.01066         
##                                           
##                   Kappa : 0.2786          
##                                           
##  Mcnemar's Test P-Value : 9.394e-15       
##                                           
##             Sensitivity : 0.3264          
##             Specificity : 0.9144          
##          Pos Pred Value : 0.6190          
##          Neg Pred Value : 0.7611          
##              Prevalence : 0.2988          
##          Detection Rate : 0.0975          
##    Detection Prevalence : 0.1575          
##       Balanced Accuracy : 0.6204          
##                                           
##        'Positive' Class : 1               
## 
#Since Response codes credit risk, with 0 = good credit (no risk) and 1 = bad credit (risk), I'm coding positive as "1" for "event".

The performance metric for this model is:

(1-0.7388) + (161 / (161+78))
## [1] 0.9348402

Oh my. That’s not very good at all.

Now to tune the AIC model for sensitivity and specificity, and evaluate it against the test set.

#ROC Curve and AUC
pr = predict(step.AIC.glm, credit.train, type = "response") # predicted probabilities from the AIC model on the training set
response = credit.train$Response

pred <- prediction(pr, response) #predicted probabilities paired with true classifications

# area under curve
auc <- round(as.numeric(performance(pred, measure = "auc")@y.values),3)

# additional performance objects (computed for reference; only perf is plotted below)
false.rates <- performance(pred, "fpr", "fnr")
accuracy <- performance(pred, "acc", "err")
perf <- performance(pred, "tpr","fpr")

#plotting the ROC curve and computing AUC
plot(perf,colorize = T, main = "ROC Curve")
text(0.5,0.5, paste("AUC:", auc))

# computing threshold for cutoff to best trade off sensitivity and specificity
#first sensitivity
plot(unlist(performance(pred, "sens")@x.values), unlist(performance(pred, "sens")@y.values), 
     type="l", lwd=2, 
     ylab="Sensitivity", xlab="Cutoff", main = paste("Maximized Cutoff\n","AUC: ",auc))

par(new=TRUE) # plot another line in same plot

#second specificity
plot(unlist(performance(pred, "spec")@x.values), unlist(performance(pred, "spec")@y.values), 
     type="l", lwd=2, col='red', ylab="", xlab="")
axis(4, at=seq(0,1,0.2)) #specificity axis labels
mtext("Specificity",side=4, col='red')

#find where the lines intersect
min.diff <-which.min(abs(unlist(performance(pred, "sens")@y.values) - unlist(performance(pred, "spec")@y.values)))
min.x<-unlist(performance(pred, "sens")@x.values)[min.diff]
min.y<-unlist(performance(pred, "spec")@y.values)[min.diff]
optimal <- min.x #the cutoff that best trades off sensitivity and specificity

abline(h = min.y, lty = 3)
abline(v = min.x, lty = 3)
text(min.x,0,paste("optimal threshold=",round(optimal,2)), pos = 4)

Now that we have a threshold, let’s run the AIC model against the test data.

step.AIC.probs <- predict(step.AIC.glm, credit.test, type = "response")
step.AIC.preds <- rep(0, length(step.AIC.probs))
step.AIC.preds[step.AIC.probs > 0.32] = 1   # 0.32 is the optimal threshold read off the plot above
caret::confusionMatrix(as.factor(step.AIC.preds), credit.test$Response, positive = "1", mode='prec_recall')
## Confusion Matrix and Statistics
## 
##           Reference
## Prediction  0  1
##          0 96 18
##          1 43 43
##                                          
##                Accuracy : 0.695          
##                  95% CI : (0.6261, 0.758)
##     No Information Rate : 0.695          
##     P-Value [Acc > NIR] : 0.53455        
##                                          
##                   Kappa : 0.3548         
##                                          
##  Mcnemar's Test P-Value : 0.00212        
##                                          
##               Precision : 0.5000         
##                  Recall : 0.7049         
##                      F1 : 0.5850         
##              Prevalence : 0.3050         
##          Detection Rate : 0.2150         
##    Detection Prevalence : 0.4300         
##       Balanced Accuracy : 0.6978         
##                                          
##        'Positive' Class : 1              
## 

Total error rate + false-good rate (the proportion of bad-credit observations classified as good credit):

AIC.score = (1 - 0.695) + (18 / (18+43))
AIC.score
## [1] 0.600082

The tuned AIC stepwise model has a score metric of 0.600, which is better than the un-tuned models, but not as good as the models tuned to minimize the performance metric directly.

Now, BIC.

#ROC Curve and AUC
pr = predict(step.BIC.glm, credit.train, type = "response") # predicted probabilities from the BIC model on the training set
response = credit.train$Response

pred <- prediction(pr, response) #predicted probabilities paired with true classifications

# area under curve
auc <- round(as.numeric(performance(pred, measure = "auc")@y.values),3)

# additional performance objects (computed for reference; only perf is plotted below)
false.rates <- performance(pred, "fpr", "fnr")
accuracy <- performance(pred, "acc", "err")
perf <- performance(pred, "tpr","fpr")

#plotting the ROC curve and computing AUC
plot(perf,colorize = T, main = "ROC Curve")
text(0.5,0.5, paste("AUC:", auc))

# computing threshold for cutoff to best trade off sensitivity and specificity
#first sensitivity
plot(unlist(performance(pred, "sens")@x.values), unlist(performance(pred, "sens")@y.values), 
     type="l", lwd=2, 
     ylab="Sensitivity", xlab="Cutoff", main = paste("Maximized Cutoff\n","AUC: ",auc))

par(new=TRUE) # plot another line in same plot

#second specificity
plot(unlist(performance(pred, "spec")@x.values), unlist(performance(pred, "spec")@y.values), 
     type="l", lwd=2, col='red', ylab="", xlab="")
axis(4, at=seq(0,1,0.2)) #specificity axis labels
mtext("Specificity",side=4, col='red')

#find where the lines intersect
min.diff <-which.min(abs(unlist(performance(pred, "sens")@y.values) - unlist(performance(pred, "spec")@y.values)))
min.x<-unlist(performance(pred, "sens")@x.values)[min.diff]
min.y<-unlist(performance(pred, "spec")@y.values)[min.diff]
optimal <- min.x #the cutoff that best trades off sensitivity and specificity

abline(h = min.y, lty = 3)
abline(v = min.x, lty = 3)
text(min.x,0,paste("optimal threshold=",round(optimal,2)), pos = 4)

Now that we have a threshold, let’s run the BIC model against the test data.

step.BIC.probs <- predict(step.BIC.glm, credit.test, type = "response")
step.BIC.preds <- rep(0, length(step.BIC.probs))
step.BIC.preds[step.BIC.probs > 0.35] = 1   # 0.35 is the optimal threshold read off the plot above
caret::confusionMatrix(as.factor(step.BIC.preds), credit.test$Response, positive = "1", mode='prec_recall')
## Confusion Matrix and Statistics
## 
##           Reference
## Prediction   0   1
##          0 103  18
##          1  36  43
##                                           
##                Accuracy : 0.73            
##                  95% CI : (0.6628, 0.7902)
##     No Information Rate : 0.695           
##     P-Value [Acc > NIR] : 0.1590          
##                                           
##                   Kappa : 0.4118          
##                                           
##  Mcnemar's Test P-Value : 0.0207          
##                                           
##               Precision : 0.5443          
##                  Recall : 0.7049          
##                      F1 : 0.6143          
##              Prevalence : 0.3050          
##          Detection Rate : 0.2150          
##    Detection Prevalence : 0.3950          
##       Balanced Accuracy : 0.7230          
##                                           
##        'Positive' Class : 1               
## 

BIC score:

BIC.score = (1 - 0.73) + (18 / (18+43))
BIC.score
## [1] 0.565082

The tuned BIC stepwise model has a score metric of 0.565. This is a significant improvement over the un-tuned BIC model, and only marginally worse than the BIC model tuned to minimize the performance metric (0.531).
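
For reference, here are the test-set scores collected in this section, copied from the outputs above (a summary sketch):

# Error rate + false-good rate on the test set (lower is better)
round(c(AIC.tuned  = 0.5497541,   # threshold 0.11, tuned on pseudo-test
        BIC.tuned  = 0.5311475,   # threshold 0.14, tuned on pseudo-test
        full.tuned = 0.4547541,   # threshold 0.15, tuned on pseudo-test
        AIC.ROC    = 0.6000820,   # threshold 0.32, sens/spec trade-off
        BIC.ROC    = 0.5650820),  # threshold 0.35, sens/spec trade-off
      3)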