Pull in the data and examine its structure. This is an RData file, so you could simply select it in the Files pane at the bottom right of RStudio, or open it with the load() command, replacing my directory info with yours.
#load(here("GermanData.RData"))
load('GermanData.RData')
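The steps below lean on a handful of packages: car for vif(), ResourceSelection for hoslem.test(), DescTools for PseudoR2(), caret for confusionMatrix(), and ROCR for prediction()/performance(). A minimal sketch of the library calls this walkthrough assumes (the here package is only needed for the commented-out load above):
library(car)               # vif()
library(ResourceSelection) # hoslem.test()
library(DescTools)         # PseudoR2()
library(caret)             # confusionMatrix()
library(ROCR)              # prediction(), performance()
#library(here)             # optional, for the here() path above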
Examine structure of the dataset.
str(credit)
## 'data.frame': 1000 obs. of 21 variables:
## $ V1 : chr "A11" "A12" "A14" "A11" ...
## $ V2 : int 6 48 12 42 24 36 24 36 12 30 ...
## $ V3 : chr "A34" "A32" "A34" "A32" ...
## $ V4 : chr "A43" "A43" "A46" "A42" ...
## $ V5 : int 1169 5951 2096 7882 4870 9055 2835 6948 3059 5234 ...
## $ V6 : chr "A65" "A61" "A61" "A61" ...
## $ V7 : chr "A75" "A73" "A74" "A74" ...
## $ V8 : int 4 2 2 2 3 2 3 2 2 4 ...
## $ V9 : chr "A93" "A92" "A93" "A93" ...
## $ V10: chr "A101" "A101" "A101" "A103" ...
## $ V11: int 4 2 3 4 4 4 4 2 4 2 ...
## $ V12: chr "A121" "A121" "A121" "A122" ...
## $ V13: int 67 22 49 45 53 35 53 35 61 28 ...
## $ V14: chr "A143" "A143" "A143" "A143" ...
## $ V15: chr "A152" "A152" "A152" "A153" ...
## $ V16: int 2 1 1 1 2 1 1 1 1 2 ...
## $ V17: chr "A173" "A173" "A172" "A173" ...
## $ V18: int 1 1 2 2 2 2 1 1 1 1 ...
## $ V19: chr "A192" "A191" "A191" "A191" ...
## $ V20: chr "A201" "A201" "A201" "A201" ...
## $ V21: int 1 2 1 1 2 1 1 1 1 2 ...
1000 observations with 20 predictors and 1 response (V21). Let’s clean up V21 and give the columns some names.
colnames(credit) <- c("Chk_acct_status", #1
"Duration", #2
"Cred_history", #3
"Purpose", #4
"Cred_amnt", #5
"Savings_acct", #6
"Pres_emplymnt", #7
"Instllmnt_rate", #8
"Status_Gendr", #9
"Other_debtor", #10
"Present_res", #11
"Property", #12
"Age", #13
"Other_instll_plns", #14
"Housing", #15
"Num_credits", #16
"Job", #17
"Num_people", #18
"Telephone", #19
"Foreign_wrkr", #20
"Response") #21
credit$Response <- credit$Response - 1
credit$Response <- as.factor(credit$Response)
#Lots of factors in this guy. Here goes.
credit$Chk_acct_status <- as.factor(credit$Chk_acct_status)
credit$Cred_history <- as.factor(credit$Cred_history)
credit$Purpose <- as.factor(credit$Purpose)
credit$Savings_acct <- as.factor(credit$Savings_acct)
credit$Pres_emplymnt <- as.factor(credit$Pres_emplymnt)
credit$Status_Gendr <- as.factor(credit$Status_Gendr)
credit$Other_debtor <- as.factor(credit$Other_debtor)
credit$Property <- as.factor(credit$Property)
credit$Other_instll_plns <- as.factor(credit$Other_instll_plns)
credit$Housing <- as.factor(credit$Housing)
credit$Job <- as.factor(credit$Job)
credit$Telephone <- as.factor(credit$Telephone)
credit$Foreign_wrkr <- as.factor(credit$Foreign_wrkr)
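As an aside, the same character-to-factor conversions could be done in one pass; a minimal sketch that is equivalent to the per-column calls above:
#Convert every character column to a factor in one sweep
credit[] <- lapply(credit, function(x) if (is.character(x)) as.factor(x) else x)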
Summary of dataframe:
summary(credit)
## Chk_acct_status Duration Cred_history Purpose Cred_amnt
## A11:274 Min. : 4.0 A30: 40 A43 :280 Min. : 250
## A12:269 1st Qu.:12.0 A31: 49 A40 :234 1st Qu.: 1366
## A13: 63 Median :18.0 A32:530 A42 :181 Median : 2320
## A14:394 Mean :20.9 A33: 88 A41 :103 Mean : 3271
## 3rd Qu.:24.0 A34:293 A49 : 97 3rd Qu.: 3972
## Max. :72.0 A46 : 50 Max. :18424
## (Other): 55
## Savings_acct Pres_emplymnt Instllmnt_rate Status_Gendr Other_debtor
## A61:603 A71: 62 Min. :1.000 A91: 50 A101:907
## A62:103 A72:172 1st Qu.:2.000 A92:310 A102: 41
## A63: 63 A73:339 Median :3.000 A93:548 A103: 52
## A64: 48 A74:174 Mean :2.973 A94: 92
## A65:183 A75:253 3rd Qu.:4.000
## Max. :4.000
##
## Present_res Property Age Other_instll_plns Housing
## Min. :1.000 A121:282 Min. :19.00 A141:139 A151:179
## 1st Qu.:2.000 A122:232 1st Qu.:27.00 A142: 47 A152:713
## Median :3.000 A123:332 Median :33.00 A143:814 A153:108
## Mean :2.845 A124:154 Mean :35.55
## 3rd Qu.:4.000 3rd Qu.:42.00
## Max. :4.000 Max. :75.00
##
## Num_credits Job Num_people Telephone Foreign_wrkr Response
## Min. :1.000 A171: 22 Min. :1.000 A191:596 A201:963 0:700
## 1st Qu.:1.000 A172:200 1st Qu.:1.000 A192:404 A202: 37 1:300
## Median :1.000 A173:630 Median :1.000
## Mean :1.407 A174:148 Mean :1.155
## 3rd Qu.:2.000 3rd Qu.:1.000
## Max. :4.000 Max. :2.000
##
Executing an 80/20 train/test split per the case study documentation: the first 800 observations become training, and the last 200 are held out for testing. Normally we'd want to use sample() to draw a random sample from the dataset, but in this case we've been instructed otherwise.
For our tuning method we'll want a pseudo-train / pseudo-test split of the training dataset. For that we'll use a 70/30 split drawn with sample().
set.seed(2021)
credit.train<-credit[1:800, ]
credit.test<-credit[801:1000, ]
#Create pseudo-train and pseudo-test as 70/30 split using random sampling
pseudo.index <- sample(1:nrow(credit.train), 0.7*nrow(credit.train))
credit.pseudo.train <- credit.train[pseudo.index, ]
credit.pseudo.test <- credit.train[-pseudo.index, ]
credit.pseudo.test.true <- credit.pseudo.test$Response
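A quick sanity check on the split sizes and class balance never hurts; a sketch (output omitted):
#Confirm the 70/30 split and compare Response proportions across the pieces
nrow(credit.pseudo.train); nrow(credit.pseudo.test)
prop.table(table(credit.pseudo.train$Response))
prop.table(table(credit.pseudo.test$Response))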
Now we create a full model on the full training data set to evaluate variance inflation factors (VIF), and decide whether any predictors should be removed prior to stepwise model selection.
credit.full.glm <- glm(Response ~ ., data = credit.train, family = "binomial")
credit.null.glm <- glm(Response ~ 1, data = credit.train, family = "binomial")
vif(credit.full.glm)
## Chk_acct_statusA12 Chk_acct_statusA13 Chk_acct_statusA14
## 1.550778 1.201653 1.479547
## Duration Cred_historyA31 Cred_historyA32
## 1.939880 2.430698 6.362419
## Cred_historyA33 Cred_historyA34 PurposeA41
## 2.898700 4.785278 1.385062
## PurposeA410 PurposeA42 PurposeA43
## 1.226612 1.612320 1.695109
## PurposeA44 PurposeA45 PurposeA46
## 1.086700 1.203427 1.252694
## PurposeA48 PurposeA49 Cred_amnt
## 1.108763 1.494406 2.578850
## Savings_acctA62 Savings_acctA63 Savings_acctA64
## 1.214479 1.126197 1.084439
## Savings_acctA65 Pres_emplymntA72 Pres_emplymntA73
## 1.198551 4.336868 5.842938
## Pres_emplymntA74 Pres_emplymntA75 Instllmnt_rate
## 3.774112 4.540360 1.407154
## Status_GendrA92 Status_GendrA93 Status_GendrA94
## 4.827323 5.165044 2.618493
## Other_debtorA102 Other_debtorA103 Present_res
## 1.144550 1.131934 1.396683
## PropertyA122 PropertyA123 PropertyA124
## 1.705516 1.802407 3.667789
## Age Other_instll_plnsA142 Other_instll_plnsA143
## 1.588120 1.431932 1.499877
## HousingA152 HousingA153 Num_credits
## 1.864765 3.571403 1.679225
## JobA172 JobA173 JobA174
## 11.173775 15.312464 9.064171
## Num_people TelephoneA192 Foreign_wrkrA202
## 1.225625 1.455223 1.061278
I observe large VIFs for the Job dummy variables; we'll want to keep an eye on that.
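One optional check, not run in the original analysis: refit the full model without Job and see whether the remaining VIFs settle down before committing to stepwise selection.
#Hypothetical check: drop Job from the full model and re-examine the VIFs
credit.nojob.glm <- glm(Response ~ . - Job, data = credit.train, family = "binomial")
vif(credit.nojob.glm)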
Proceeding with AIC stepwise model selection. The selected formulas will later be plugged into models refit on the pseudo-train data for threshold tuning.
step.AIC.glm <- step(credit.null.glm, scope = list(upper = credit.full.glm),
direction = "both", test = "Chisq", trace = F)
summary(step.AIC.glm)
##
## Call:
## glm(formula = Response ~ Chk_acct_status + Duration + Purpose +
## Cred_history + Status_Gendr + Instllmnt_rate + Other_debtor +
## Savings_acct + Cred_amnt + Foreign_wrkr + Pres_emplymnt +
## Age + Num_people + Num_credits, family = "binomial", data = credit.train)
##
## Deviance Residuals:
## Min 1Q Median 3Q Max
## -2.7958 -0.7171 -0.3801 0.6794 2.6147
##
## Coefficients:
## Estimate Std. Error z value Pr(>|z|)
## (Intercept) 3.937e-01 9.924e-01 0.397 0.691568
## Chk_acct_statusA12 -2.987e-01 2.378e-01 -1.256 0.209060
## Chk_acct_statusA13 -9.614e-01 3.951e-01 -2.434 0.014951 *
## Chk_acct_statusA14 -1.768e+00 2.639e-01 -6.700 2.08e-11 ***
## Duration 2.997e-02 1.002e-02 2.992 0.002767 **
## PurposeA41 -1.676e+00 4.290e-01 -3.907 9.35e-05 ***
## PurposeA410 -1.368e+00 7.968e-01 -1.718 0.085888 .
## PurposeA42 -8.492e-01 2.913e-01 -2.916 0.003549 **
## PurposeA43 -9.546e-01 2.748e-01 -3.474 0.000513 ***
## PurposeA44 -9.276e-01 8.784e-01 -1.056 0.290983
## PurposeA45 -4.809e-01 5.949e-01 -0.808 0.418897
## PurposeA46 1.690e-01 4.151e-01 0.407 0.683923
## PurposeA48 -2.211e+00 1.192e+00 -1.855 0.063584 .
## PurposeA49 -8.435e-01 3.752e-01 -2.248 0.024558 *
## Cred_historyA31 3.533e-01 6.028e-01 0.586 0.557839
## Cred_historyA32 -8.373e-01 4.679e-01 -1.790 0.073529 .
## Cred_historyA33 -9.210e-01 5.150e-01 -1.788 0.073710 .
## Cred_historyA34 -1.575e+00 4.831e-01 -3.260 0.001116 **
## Status_GendrA92 -3.871e-01 4.166e-01 -0.929 0.352768
## Status_GendrA93 -1.177e+00 4.105e-01 -2.868 0.004131 **
## Status_GendrA94 -4.581e-01 4.985e-01 -0.919 0.358127
## Instllmnt_rate 3.687e-01 9.631e-02 3.828 0.000129 ***
## Other_debtorA102 8.665e-01 4.696e-01 1.845 0.065026 .
## Other_debtorA103 -9.132e-01 4.661e-01 -1.959 0.050076 .
## Savings_acctA62 -3.345e-01 3.079e-01 -1.087 0.277254
## Savings_acctA63 -4.750e-01 4.765e-01 -0.997 0.318873
## Savings_acctA64 -1.207e+00 5.288e-01 -2.282 0.022501 *
## Savings_acctA65 -6.888e-01 2.813e-01 -2.449 0.014326 *
## Cred_amnt 1.157e-04 4.739e-05 2.442 0.014588 *
## Foreign_wrkrA202 -1.443e+00 8.023e-01 -1.799 0.072009 .
## Pres_emplymntA72 -1.436e-01 4.376e-01 -0.328 0.742696
## Pres_emplymntA73 -3.323e-01 4.068e-01 -0.817 0.414015
## Pres_emplymntA74 -1.085e+00 4.543e-01 -2.387 0.016979 *
## Pres_emplymntA75 -3.387e-01 4.154e-01 -0.815 0.414863
## Age -1.993e-02 9.679e-03 -2.059 0.039510 *
## Num_people 5.152e-01 2.790e-01 1.847 0.064772 .
## Num_credits 3.197e-01 2.044e-01 1.564 0.117744
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## (Dispersion parameter for binomial family taken to be 1)
##
## Null deviance: 975.68 on 799 degrees of freedom
## Residual deviance: 717.52 on 763 degrees of freedom
## AIC: 791.52
##
## Number of Fisher Scoring iterations: 5
There's a lot going on with the AIC selection: it retains 14 predictors, a fairly complex model.
Now evaluating stepwise selection under the BIC criterion. BIC should, in theory, give us a cleaner model, because it penalizes model complexity more severely than AIC.
step.BIC.glm <- step(credit.null.glm, scope = list(upper = credit.full.glm),
direction = "both", test = "Chisq", trace = F, k=log(nrow(credit.train)))
summary(step.BIC.glm)
##
## Call:
## glm(formula = Response ~ Chk_acct_status + Duration + Instllmnt_rate,
## family = "binomial", data = credit.train)
##
## Deviance Residuals:
## Min 1Q Median 3Q Max
## -1.6625 -0.8841 -0.4981 1.0258 2.5032
##
## Coefficients:
## Estimate Std. Error z value Pr(>|z|)
## (Intercept) -1.495796 0.310130 -4.823 1.41e-06 ***
## Chk_acct_statusA12 -0.302805 0.200405 -1.511 0.13080
## Chk_acct_statusA13 -1.057450 0.362564 -2.917 0.00354 **
## Chk_acct_statusA14 -2.020412 0.232934 -8.674 < 2e-16 ***
## Duration 0.036564 0.006872 5.321 1.03e-07 ***
## Instllmnt_rate 0.208381 0.077129 2.702 0.00690 **
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## (Dispersion parameter for binomial family taken to be 1)
##
## Null deviance: 975.68 on 799 degrees of freedom
## Residual deviance: 833.93 on 794 degrees of freedom
## AIC: 845.93
##
## Number of Fisher Scoring iterations: 4
True enough, the BIC model is a lot less complex than the AIC model: it keeps only checking-account status, duration, and installment rate. It may be too simple a model to be of much use!
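The size of the penalty difference makes this unsurprising; a rough comparison, using the 800 training rows from the split above:
#AIC charges 2 per parameter; BIC charges k = log(n) per parameter
log(800)   #~6.68, so each extra parameter costs BIC roughly 3.3x what it costs AIC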
Run the Hosmer-Lemeshow goodness-of-fit test for both the AIC and BIC models:
hoslem.test(step.BIC.glm$y, fitted(step.BIC.glm), g=10)
##
## Hosmer and Lemeshow goodness of fit (GOF) test
##
## data: step.BIC.glm$y, fitted(step.BIC.glm)
## X-squared = 17.695, df = 8, p-value = 0.02364
hoslem.test(step.AIC.glm$y, fitted(step.AIC.glm), g=10)
##
## Hosmer and Lemeshow goodness of fit (GOF) test
##
## data: step.AIC.glm$y, fitted(step.AIC.glm)
## X-squared = 7.6941, df = 8, p-value = 0.4639
The null hypothesis of the Hosmer-Lemeshow test is that the model is adequate and fits the data well; the alternative is that it is not. Here we fail to reject the null hypothesis only for the AIC model (p = 0.46), so we can say the AIC stepwise logistic model shows no evidence of lack of fit. For the BIC model, however, we reject the null hypothesis (p = 0.02), meaning it does not fit the data well.
Check Cook’s distance to see if there are any influential points on these models:
plot(step.AIC.glm, which = 4)
plot(step.BIC.glm, which = 4)
R labels the three largest Cook's distance points by default on both the AIC and BIC plots. The biggest value is in the AIC model, at just over 0.04 for observation 204. Given we have no guidance on handling influential points, let's leave them as is.
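If we wanted the exact values rather than reading them off the plots, a quick sketch:
#Largest three Cook's distances for each stepwise model
head(sort(cooks.distance(step.AIC.glm), decreasing = TRUE), 3)
head(sort(cooks.distance(step.BIC.glm), decreasing = TRUE), 3)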
Checking the models' pseudo-R^2 values:
glm.pseudoR2.AIC <- PseudoR2(step.AIC.glm, which = c("McFadden","Nagel","CoxSnell"))
round(glm.pseudoR2.AIC, 3)
## McFadden Nagelkerke CoxSnell
## 0.265 0.391 0.276
glm.pseudoR2.BIC <- PseudoR2(step.BIC.glm, which = c("McFadden","Nagel","CoxSnell"))
round(glm.pseudoR2.BIC, 3)
## McFadden Nagelkerke CoxSnell
## 0.145 0.230 0.162
The pseudo-R^2 of the AIC model ranges from 26.5% (McFadden) to 39.1% (Nagelkerke); for the BIC model it ranges from 14.5% (McFadden) to 23.0% (Nagelkerke). The AIC model scores higher across the board. The simpler BIC model may lend itself to easier interpretation, but as the pseudo-R^2 values show, its fit is considerably worse.
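As a cross-check, McFadden's pseudo-R^2 is simply one minus the ratio of residual to null deviance, so it can be recovered from the model objects directly:
#McFadden's R^2 by hand: 1 - residual deviance / null deviance
1 - step.AIC.glm$deviance / step.AIC.glm$null.deviance   #~0.265
1 - step.BIC.glm$deviance / step.BIC.glm$null.deviance   #~0.145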
Create the sequence of candidate selection thresholds (cut-offs).
cut.seq <- seq(0.01, 0.99, by = 0.01)
Refitting GLMs with the AIC- and BIC-selected formulas on the pseudo-train data.
pseudo.step.AIC.glm <- glm(Response ~ Chk_acct_status + Duration + Purpose +
Cred_history + Status_Gendr + Instllmnt_rate + Other_debtor +
Savings_acct + Cred_amnt + Foreign_wrkr + Pres_emplymnt +
Age + Num_people + Num_credits, family = "binomial", data = credit.pseudo.train)
pseudo.step.BIC.glm <- glm(Response ~ Chk_acct_status + Duration + Instllmnt_rate, family = "binomial", data = credit.pseudo.train)
Loop the AIC model over the cut-off sequence to identify the optimal selection threshold.
metric.AIC=c(); #this used to be err.AIC=c();
for(i in 1:length(cut.seq)){
est.prob = predict(pseudo.step.AIC.glm, newdata=credit.pseudo.test[ ,-21], type="response") # prediction on credit.pseudo.test
est.class = ifelse(est.prob > cut.seq[i], 1, 0) # if p-hat > cut-off then class=1
# if p-hat <= cut-off then class=0
#err.AIC[i] = mean(credit.pseudo.test.true != est.class, na.rm=TRUE) ## error rate
#keeping track of error rate + false good rate rather than just the error rate; we will choose the threshold that minimizes this value instead
conf_mat = caret::confusionMatrix(as.factor(est.class), credit.pseudo.test.true, positive = "1")
err = (1 - conf_mat$overall[["Accuracy"]])
false_good = (1 - conf_mat$byClass[["Sensitivity"]])
metric.AIC[i] = (err + false_good)
}
#had to replace err.AIC with metric.AIC
par(mfrow=c(1,1))
plot(cut.seq, metric.AIC, type="l")
min(metric.AIC)
## [1] 0.4849638
cut.seq[which.min(metric.AIC)]
## [1] 0.11
final.cut.AIC = cut.seq[which.min(metric.AIC)]
The optimal selection threshold is projected to be 0.11 for the AIC logistic model.
Applying the same technique to the BIC model now.
metric.BIC=c(); #this used to be err.BIC=c();
for(i in 1:length(cut.seq)){
est.prob = predict(pseudo.step.BIC.glm, newdata=credit.pseudo.test[ ,-21], type="response") # prediction on credit.pseudo.test
est.class = ifelse(est.prob > cut.seq[i], 1, 0) # if p-hat > cut-off then class=1
# if p-hat <= cut-off then class=0
#err.BIC[i] = mean(credit.pseudo.test.true != est.class, na.rm=TRUE) ## error rate
#keeping track of error rate + false good rate rather than just the error rate; we will choose the threshold that minimizes this value instead
conf_mat = caret::confusionMatrix(as.factor(est.class), credit.pseudo.test.true, positive = "1")
err = (1 - conf_mat$overall[["Accuracy"]])
false_good = (1 - conf_mat$byClass[["Sensitivity"]])
metric.BIC[i] = (err + false_good)
}
#had to replace err.BIC with metric.BIC
par(mfrow=c(1,1))
plot(cut.seq, metric.BIC, type="l")
min(metric.BIC)
## [1] 0.5490942
cut.seq[which.min(metric.BIC)]
## [1] 0.14
final.cut.BIC = cut.seq[which.min(metric.BIC)]
The optimal selection threshold is projected to be 0.14 for the BIC logistic model.
Let’s apply these to the test data and see what happens.
step.AIC.probs <- predict(pseudo.step.AIC.glm, credit.test, type = "response")
step.AIC.preds <- rep(0, length(step.AIC.probs))
step.AIC.preds[step.AIC.probs > final.cut.AIC] = 1
caret::confusionMatrix(as.factor(step.AIC.preds), credit.test$Response, positive = "1")
## Confusion Matrix and Statistics
##
## Reference
## Prediction 0 1
## 0 59 7
## 1 80 54
##
## Accuracy : 0.565
## 95% CI : (0.4933, 0.6348)
## No Information Rate : 0.695
## P-Value [Acc > NIR] : 1
##
## Kappa : 0.2319
##
## Mcnemar's Test P-Value : 1.171e-14
##
## Sensitivity : 0.8852
## Specificity : 0.4245
## Pos Pred Value : 0.4030
## Neg Pred Value : 0.8939
## Prevalence : 0.3050
## Detection Rate : 0.2700
## Detection Prevalence : 0.6700
## Balanced Accuracy : 0.6549
##
## 'Positive' Class : 1
##
Total error rate + False good rate (ratio of observations classified as good credit when reference is bad credit):
AIC.score = (1 - 0.565) + (7 / (7+54))
AIC.score
## [1] 0.5497541
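Since this score gets computed by hand several more times below, here is a small hypothetical helper (not in the original code) that pulls it straight from a caret confusion matrix object:
#Hypothetical helper: error rate + false good rate from a confusionMatrix object
score_fn <- function(cm) {
  (1 - cm$overall[["Accuracy"]]) + (1 - cm$byClass[["Sensitivity"]])
}
#e.g. score_fn(caret::confusionMatrix(as.factor(step.AIC.preds), credit.test$Response, positive = "1"))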
step.BIC.probs <- predict(pseudo.step.BIC.glm, credit.test, type = "response")
step.BIC.preds <- rep(0, length(step.BIC.probs))
step.BIC.preds[step.BIC.probs > final.cut.BIC] = 1
caret::confusionMatrix(as.factor(step.BIC.preds), credit.test$Response, positive = "1")
## Confusion Matrix and Statistics
##
## Reference
## Prediction 0 1
## 0 67 8
## 1 72 53
##
## Accuracy : 0.6
## 95% CI : (0.5285, 0.6685)
## No Information Rate : 0.695
## P-Value [Acc > NIR] : 0.9983
##
## Kappa : 0.2711
##
## Mcnemar's Test P-Value : 1.873e-12
##
## Sensitivity : 0.8689
## Specificity : 0.4820
## Pos Pred Value : 0.4240
## Neg Pred Value : 0.8933
## Prevalence : 0.3050
## Detection Rate : 0.2650
## Detection Prevalence : 0.6250
## Balanced Accuracy : 0.6754
##
## 'Positive' Class : 1
##
Total error rate + False good rate (ratio of observations classified as good credit when reference is bad credit):
BIC.score = (1 - 0.6) + (8 / (8+53))
BIC.score
## [1] 0.5311475
Since we want as accurate a model as possible, in theory we should also run with the full model.
credit.opt.glm <- glm(Response ~ ., data = credit.pseudo.train, family = "binomial")
We now run the same cut-off optimization over the threshold sequence, this time for the full model.
#Generate predicted probabilities, then establish the cut sequence and an empty "err" vector.
opt_probs = predict(credit.opt.glm, newdata=credit.pseudo.test[ ,-21], type="response")
#The model within the predict function is the full model fit on the pseudo-train data
#Evaluating it against the pseudo.test dataset
cut_seq <- seq(0, 1, by = 0.01)
err=c();
for(i in 1:length(cut_seq)){
opt.preds = ifelse(opt_probs > cut_seq[i], 1, 0)
opt.conf = confusionMatrix(as.factor(opt.preds), credit.pseudo.test$Response, positive='1')
#err combines the misclassification rate (1 - Accuracy) with the false good rate (1 - Sensitivity)
err[i] = (1-opt.conf$overall[1]) + (1-opt.conf$byClass[1])
}
par(mfrow=c(1,1))
plot(cut_seq, err, type="l")
min(err)
## [1] 0.5054348
cut_seq[which.min(err)]
## [1] 0.15
cut_opt = cut_seq[which.min(err)]
The optimal cut-off threshold for the full model is 0.15. We now apply this to the test set and examine its performance.
credit.opt.probs <- predict(credit.opt.glm, credit.test, type = "response")
credit.opt.preds <- rep(0, length(credit.opt.probs))
credit.opt.preds[credit.opt.probs > cut_opt] = 1
caret::confusionMatrix(as.factor(credit.opt.preds), credit.test$Response, positive = "1")
## Confusion Matrix and Statistics
##
## Reference
## Prediction 0 1
## 0 78 7
## 1 61 54
##
## Accuracy : 0.66
## 95% CI : (0.5898, 0.7253)
## No Information Rate : 0.695
## P-Value [Acc > NIR] : 0.8747
##
## Kappa : 0.3576
##
## Mcnemar's Test P-Value : 1.3e-10
##
## Sensitivity : 0.8852
## Specificity : 0.5612
## Pos Pred Value : 0.4696
## Neg Pred Value : 0.9176
## Prevalence : 0.3050
## Detection Rate : 0.2700
## Detection Prevalence : 0.5750
## Balanced Accuracy : 0.7232
##
## 'Positive' Class : 1
##
Per the scoring metric:
Opt.score <- (1-0.66)+(7 / (7+54))
Opt.score
## [1] 0.4547541
1 - (7 / (7+54))   #for reference: 1 - false good rate, i.e., the model's sensitivity
## [1] 0.8852459
We find that the "tuned" full model returns a performance metric of 0.455, better (lower) than the tuned AIC (0.550) or BIC (0.531) models.
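To keep the running comparison in one place (values taken from the runs above), a small sketch:
#Test-set scores so far: lower is better
data.frame(model = c("Stepwise AIC (tuned cut-off)",
                     "Stepwise BIC (tuned cut-off)",
                     "Full model (tuned cut-off)"),
           score = c(0.550, 0.531, 0.455))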
Let's also evaluate the AIC and BIC models using the standard 0.5 threshold, and then use an ROC-based tuning method to optimize the sensitivity/specificity trade-off. We'll examine how these perform using confusion matrices, then run the tuned models against the test set to see how we do.
step.AIC.probs <- predict(step.AIC.glm, credit.train, type = "response")
step.AIC.preds <- rep(0, length(step.AIC.probs))
step.AIC.preds[step.AIC.probs > 0.5] = 1
caret::confusionMatrix(as.factor(step.AIC.preds), credit.train$Response, positive = "1")
## Confusion Matrix and Statistics
##
## Reference
## Prediction 0 1
## 0 504 108
## 1 57 131
##
## Accuracy : 0.7938
## 95% CI : (0.764, 0.8213)
## No Information Rate : 0.7012
## P-Value [Acc > NIR] : 2.105e-09
##
## Kappa : 0.4756
##
## Mcnemar's Test P-Value : 9.922e-05
##
## Sensitivity : 0.5481
## Specificity : 0.8984
## Pos Pred Value : 0.6968
## Neg Pred Value : 0.8235
## Prevalence : 0.2988
## Detection Rate : 0.1638
## Detection Prevalence : 0.2350
## Balanced Accuracy : 0.7233
##
## 'Positive' Class : 1
##
#Since Response is "credit risk" with 0 being no risk, and 1 being risk, I'm coding positive as "1" for "event"
The performance metric for this model is:
(1-0.7938) + (108 / (108+131))
## [1] 0.6580828
This performance metric is nowhere near as low as the best model we currently have.
step.BIC.probs <- predict(step.BIC.glm, credit.train, type = "response")
step.BIC.preds <- rep(0, length(step.BIC.probs))
step.BIC.preds[step.BIC.probs > 0.5] = 1
caret::confusionMatrix(as.factor(step.BIC.preds), credit.train$Response, positive = "1")
## Confusion Matrix and Statistics
##
## Reference
## Prediction 0 1
## 0 513 161
## 1 48 78
##
## Accuracy : 0.7388
## 95% CI : (0.7068, 0.7689)
## No Information Rate : 0.7012
## P-Value [Acc > NIR] : 0.01066
##
## Kappa : 0.2786
##
## Mcnemar's Test P-Value : 9.394e-15
##
## Sensitivity : 0.3264
## Specificity : 0.9144
## Pos Pred Value : 0.6190
## Neg Pred Value : 0.7611
## Prevalence : 0.2988
## Detection Rate : 0.0975
## Detection Prevalence : 0.1575
## Balanced Accuracy : 0.6204
##
## 'Positive' Class : 1
##
#Since Response is "credit risk" with 0 being no risk, and 1 being risk, I'm coding positive as "1" for "event".
The performance metric for this model is:
(1-0.7388) + (161 / (161+78))
## [1] 0.9348402
Oh my. That’s not very good at all.
Now to tune the AIC model's cut-off for the sensitivity/specificity trade-off, and evaluate it against the test set.
#ROC Curve and AUC
pr = predict(step.AIC.glm, credit.train, type = "response") # predictions from the AIC stepwise model on the training set
response = credit.train$Response
pred <- prediction(pr, response) #Predicted Probability and True Classification
# area under curve
auc <- round(as.numeric(performance(pred, measure = "auc")@y.values),3)
# some important statistics
false.rates <-performance(pred, "fpr","fnr")
accuracy <-performance(pred, "acc","err")
perf <- performance(pred, "tpr","fpr")
#plotting the ROC curve and computing AUC
plot(perf,colorize = T, main = "ROC Curve")
text(0.5,0.5, paste("AUC:", auc))
# computing threshold for cutoff to best trade off sensitivity and specificity
#first sensitivity
plot(unlist(performance(pred, "sens")@x.values), unlist(performance(pred, "sens")@y.values),
type="l", lwd=2,
ylab="Sensitivity", xlab="Cutoff", main = paste("Maximized Cutoff\n","AUC: ",auc))
par(new=TRUE) # plot another line in same plot
#second specificity
plot(unlist(performance(pred, "spec")@x.values), unlist(performance(pred, "spec")@y.values),
type="l", lwd=2, col='red', ylab="", xlab="")
axis(4, at=seq(0,1,0.2)) #specificity axis labels
mtext("Specificity",side=4, col='red')
#find where the lines intersect
min.diff <-which.min(abs(unlist(performance(pred, "sens")@y.values) - unlist(performance(pred, "spec")@y.values)))
min.x<-unlist(performance(pred, "sens")@x.values)[min.diff]
min.y<-unlist(performance(pred, "spec")@y.values)[min.diff]
optimal <-min.x #this is the optimal point to best trade off sensitivity and specificity
abline(h = min.y, lty = 3)
abline(v = min.x, lty = 3)
text(min.x,0,paste("optimal threshold=",round(optimal,2)), pos = 4)
Now that we have a threshold, let’s run the AIC model against the test data.
step.AIC.probs <- predict(step.AIC.glm, credit.test, type = "response")
step.AIC.preds <- rep(0, length(step.AIC.probs))
step.AIC.preds[step.AIC.probs > 0.32] = 1   # threshold taken from the sensitivity/specificity intersection above
caret::confusionMatrix(as.factor(step.AIC.preds), credit.test$Response, positive = "1", mode='prec_recall')
## Confusion Matrix and Statistics
##
## Reference
## Prediction 0 1
## 0 96 18
## 1 43 43
##
## Accuracy : 0.695
## 95% CI : (0.6261, 0.758)
## No Information Rate : 0.695
## P-Value [Acc > NIR] : 0.53455
##
## Kappa : 0.3548
##
## Mcnemar's Test P-Value : 0.00212
##
## Precision : 0.5000
## Recall : 0.7049
## F1 : 0.5850
## Prevalence : 0.3050
## Detection Rate : 0.2150
## Detection Prevalence : 0.4300
## Balanced Accuracy : 0.6978
##
## 'Positive' Class : 1
##
Total error rate + False good rate (ratio of observations classified as good credit when reference is bad credit):
AIC.score = (1 - 0.695) + (18 / (18+43))
AIC.score
## [1] 0.600082
The tuned AIC stepwise model has a score of 0.600, which is better than the un-tuned models, but not as good as the models whose cut-offs were tuned to minimize the performance metric directly.
Now, BIC.
#ROC Curve and AUC
pr = predict(step.BIC.glm, credit.train, type = "response") # predictions from the BIC stepwise model on the training set
response = credit.train$Response
pred <- prediction(pr, response) #Predicted Probability and True Classification
# area under curve
auc <- round(as.numeric(performance(pred, measure = "auc")@y.values),3)
# some important statistics
false.rates <-performance(pred, "fpr","fnr")
accuracy <-performance(pred, "acc","err")
perf <- performance(pred, "tpr","fpr")
#plotting the ROC curve and computing AUC
plot(perf,colorize = T, main = "ROC Curve")
text(0.5,0.5, paste("AUC:", auc))
# computing threshold for cutoff to best trade off sensitivity and specificity
#first sensitivity
plot(unlist(performance(pred, "sens")@x.values), unlist(performance(pred, "sens")@y.values),
type="l", lwd=2,
ylab="Sensitivity", xlab="Cutoff", main = paste("Maximized Cutoff\n","AUC: ",auc))
par(new=TRUE) # plot another line in same plot
#second specificity
plot(unlist(performance(pred, "spec")@x.values), unlist(performance(pred, "spec")@y.values),
type="l", lwd=2, col='red', ylab="", xlab="")
axis(4, at=seq(0,1,0.2)) #specificity axis labels
mtext("Specificity",side=4, col='red')
#find where the lines intersect
min.diff <-which.min(abs(unlist(performance(pred, "sens")@y.values) - unlist(performance(pred, "spec")@y.values)))
min.x<-unlist(performance(pred, "sens")@x.values)[min.diff]
min.y<-unlist(performance(pred, "spec")@y.values)[min.diff]
optimal <-min.x #this is the optimal point to best trade off sensitivity and specificity
abline(h = min.y, lty = 3)
abline(v = min.x, lty = 3)
text(min.x,0,paste("optimal threshold=",round(optimal,2)), pos = 4)
Now that we have a threshold, let’s run the BIC model against the test data.
step.BIC.probs <- predict(step.BIC.glm, credit.test, type = "response")
step.BIC.preds <- rep(0, length(step.BIC.probs))
step.BIC.preds[step.BIC.probs > 0.35] = 1   # threshold taken from the sensitivity/specificity intersection above
caret::confusionMatrix(as.factor(step.BIC.preds), credit.test$Response, positive = "1", mode='prec_recall')
## Confusion Matrix and Statistics
##
## Reference
## Prediction 0 1
## 0 103 18
## 1 36 43
##
## Accuracy : 0.73
## 95% CI : (0.6628, 0.7902)
## No Information Rate : 0.695
## P-Value [Acc > NIR] : 0.1590
##
## Kappa : 0.4118
##
## Mcnemar's Test P-Value : 0.0207
##
## Precision : 0.5443
## Recall : 0.7049
## F1 : 0.6143
## Prevalence : 0.3050
## Detection Rate : 0.2150
## Detection Prevalence : 0.3950
## Balanced Accuracy : 0.7230
##
## 'Positive' Class : 1
##
BIC score:
BIC.score = (1 - 0.73) + (18 / (18+43))
BIC.score
## [1] 0.565082
The tuned BIC stepwise model has a score of 0.565. That is a substantial improvement over the un-tuned BIC model (0.935), though marginally worse than the BIC model whose cut-off was tuned to minimize the performance metric directly (0.531).
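Gathering every score reported above into one table for the final comparison; a sketch using the values from the runs above (note that the 0.5 cut-off scores were computed on the training set, the rest on the test set):
#Summary of the performance metric (error rate + false good rate) across approaches
data.frame(model = c("Stepwise AIC, 0.5 cut-off (train)",
                     "Stepwise BIC, 0.5 cut-off (train)",
                     "Stepwise AIC, metric-tuned cut-off",
                     "Stepwise BIC, metric-tuned cut-off",
                     "Full model, metric-tuned cut-off",
                     "Stepwise AIC, ROC-tuned cut-off",
                     "Stepwise BIC, ROC-tuned cut-off"),
           score = c(0.658, 0.935, 0.550, 0.531, 0.455, 0.600, 0.565))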