Starter code for German credit scoring

Refer to http://archive.ics.uci.edu/ml/datasets/Statlog+(German+Credit+Data) for the variable descriptions. The response variable is Class and all others are predictors.

Run the following code only once to install the caret package. The German credit scoring data is provided in that package.

install.packages('caret')

Task 1: Data Preparation

1. Load the caret package and the GermanCredit dataset.

library(caret) #this package contains the german data with its numeric format
## Warning: package 'caret' was built under R version 4.3.1
## Loading required package: ggplot2
## Loading required package: lattice
data(GermanCredit)
GermanCredit$Class <-  GermanCredit$Class == "Good" # use this code to convert `Class` into True or False (equivalent to 1 or 0)
str(GermanCredit)
## 'data.frame':    1000 obs. of  62 variables:
##  $ Duration                              : int  6 48 12 42 24 36 24 36 12 30 ...
##  $ Amount                                : int  1169 5951 2096 7882 4870 9055 2835 6948 3059 5234 ...
##  $ InstallmentRatePercentage             : int  4 2 2 2 3 2 3 2 2 4 ...
##  $ ResidenceDuration                     : int  4 2 3 4 4 4 4 2 4 2 ...
##  $ Age                                   : int  67 22 49 45 53 35 53 35 61 28 ...
##  $ NumberExistingCredits                 : int  2 1 1 1 2 1 1 1 1 2 ...
##  $ NumberPeopleMaintenance               : int  1 1 2 2 2 2 1 1 1 1 ...
##  $ Telephone                             : num  0 1 1 1 1 0 1 0 1 1 ...
##  $ ForeignWorker                         : num  1 1 1 1 1 1 1 1 1 1 ...
##  $ Class                                 : logi  TRUE FALSE TRUE TRUE FALSE TRUE ...
##  $ CheckingAccountStatus.lt.0            : num  1 0 0 1 1 0 0 0 0 0 ...
##  $ CheckingAccountStatus.0.to.200        : num  0 1 0 0 0 0 0 1 0 1 ...
##  $ CheckingAccountStatus.gt.200          : num  0 0 0 0 0 0 0 0 0 0 ...
##  $ CheckingAccountStatus.none            : num  0 0 1 0 0 1 1 0 1 0 ...
##  $ CreditHistory.NoCredit.AllPaid        : num  0 0 0 0 0 0 0 0 0 0 ...
##  $ CreditHistory.ThisBank.AllPaid        : num  0 0 0 0 0 0 0 0 0 0 ...
##  $ CreditHistory.PaidDuly                : num  0 1 0 1 0 1 1 1 1 0 ...
##  $ CreditHistory.Delay                   : num  0 0 0 0 1 0 0 0 0 0 ...
##  $ CreditHistory.Critical                : num  1 0 1 0 0 0 0 0 0 1 ...
##  $ Purpose.NewCar                        : num  0 0 0 0 1 0 0 0 0 1 ...
##  $ Purpose.UsedCar                       : num  0 0 0 0 0 0 0 1 0 0 ...
##  $ Purpose.Furniture.Equipment           : num  0 0 0 1 0 0 1 0 0 0 ...
##  $ Purpose.Radio.Television              : num  1 1 0 0 0 0 0 0 1 0 ...
##  $ Purpose.DomesticAppliance             : num  0 0 0 0 0 0 0 0 0 0 ...
##  $ Purpose.Repairs                       : num  0 0 0 0 0 0 0 0 0 0 ...
##  $ Purpose.Education                     : num  0 0 1 0 0 1 0 0 0 0 ...
##  $ Purpose.Vacation                      : num  0 0 0 0 0 0 0 0 0 0 ...
##  $ Purpose.Retraining                    : num  0 0 0 0 0 0 0 0 0 0 ...
##  $ Purpose.Business                      : num  0 0 0 0 0 0 0 0 0 0 ...
##  $ Purpose.Other                         : num  0 0 0 0 0 0 0 0 0 0 ...
##  $ SavingsAccountBonds.lt.100            : num  0 1 1 1 1 0 0 1 0 1 ...
##  $ SavingsAccountBonds.100.to.500        : num  0 0 0 0 0 0 0 0 0 0 ...
##  $ SavingsAccountBonds.500.to.1000       : num  0 0 0 0 0 0 1 0 0 0 ...
##  $ SavingsAccountBonds.gt.1000           : num  0 0 0 0 0 0 0 0 1 0 ...
##  $ SavingsAccountBonds.Unknown           : num  1 0 0 0 0 1 0 0 0 0 ...
##  $ EmploymentDuration.lt.1               : num  0 0 0 0 0 0 0 0 0 0 ...
##  $ EmploymentDuration.1.to.4             : num  0 1 0 0 1 1 0 1 0 0 ...
##  $ EmploymentDuration.4.to.7             : num  0 0 1 1 0 0 0 0 1 0 ...
##  $ EmploymentDuration.gt.7               : num  1 0 0 0 0 0 1 0 0 0 ...
##  $ EmploymentDuration.Unemployed         : num  0 0 0 0 0 0 0 0 0 1 ...
##  $ Personal.Male.Divorced.Seperated      : num  0 0 0 0 0 0 0 0 1 0 ...
##  $ Personal.Female.NotSingle             : num  0 1 0 0 0 0 0 0 0 0 ...
##  $ Personal.Male.Single                  : num  1 0 1 1 1 1 1 1 0 0 ...
##  $ Personal.Male.Married.Widowed         : num  0 0 0 0 0 0 0 0 0 1 ...
##  $ Personal.Female.Single                : num  0 0 0 0 0 0 0 0 0 0 ...
##  $ OtherDebtorsGuarantors.None           : num  1 1 1 0 1 1 1 1 1 1 ...
##  $ OtherDebtorsGuarantors.CoApplicant    : num  0 0 0 0 0 0 0 0 0 0 ...
##  $ OtherDebtorsGuarantors.Guarantor      : num  0 0 0 1 0 0 0 0 0 0 ...
##  $ Property.RealEstate                   : num  1 1 1 0 0 0 0 0 1 0 ...
##  $ Property.Insurance                    : num  0 0 0 1 0 0 1 0 0 0 ...
##  $ Property.CarOther                     : num  0 0 0 0 0 0 0 1 0 1 ...
##  $ Property.Unknown                      : num  0 0 0 0 1 1 0 0 0 0 ...
##  $ OtherInstallmentPlans.Bank            : num  0 0 0 0 0 0 0 0 0 0 ...
##  $ OtherInstallmentPlans.Stores          : num  0 0 0 0 0 0 0 0 0 0 ...
##  $ OtherInstallmentPlans.None            : num  1 1 1 1 1 1 1 1 1 1 ...
##  $ Housing.Rent                          : num  0 0 0 0 0 0 0 1 0 0 ...
##  $ Housing.Own                           : num  1 1 1 0 0 0 1 0 1 1 ...
##  $ Housing.ForFree                       : num  0 0 0 1 1 1 0 0 0 0 ...
##  $ Job.UnemployedUnskilled               : num  0 0 0 0 0 0 0 0 0 0 ...
##  $ Job.UnskilledResident                 : num  0 0 1 0 0 1 0 0 1 0 ...
##  $ Job.SkilledEmployee                   : num  1 1 0 1 1 0 1 0 0 0 ...
##  $ Job.Management.SelfEmp.HighlyQualified: num  0 0 0 0 0 0 0 1 0 1 ...
# Optional: drop dummy columns that provide no additional information in the data
GermanCredit = GermanCredit[,-c(14,19,27,30,35,40,44,45,48,52,55,58,62)]

Your observation: In str(), every column except Class is stored as int or num, so no column is a factor and none needs to be converted to one.
Many columns are nevertheless categorical in meaning. ResidenceDuration, NumberExistingCredits, and NumberPeopleMaintenance take only a few integer values, while Telephone, ForeignWorker, Class, CheckingAccountStatus.lt.0, CheckingAccountStatus.0.to.200, etc. are 0/1 indicators. Judging from the mean, minimum, and maximum values, there are more categorical than truly numeric variables.

2. Explore the dataset to understand its structure.

summary(GermanCredit)
colnames(GermanCredit)

Your observation: In the Age variable, the median is 33 years. The sample therefore skews toward younger applicants, which may be associated with worse credit scores. In Duration, the maximum is 72, while the median and mean are about 18 and 21, respectively, which suggests that outliers may be present. The same holds for Amount: its maximum of 18424 is far above the bulk of the data.

3. Split the dataset into training and test set. Please use the random seed as 2023 for reproducibility.

set.seed(2023)
index <- sample(1:nrow(GermanCredit),nrow(GermanCredit)*0.80)
credit.train = GermanCredit[index,]
credit.test = GermanCredit[-index,]

Your observation: The random seed was set to 2023, and the data was randomly split into training (80%) and test (20%) sets.

Task 2: Model Fitting

1. Fit a logistic regression model using the training set. Please use all variables, but make sure the variable types are right.

credit.glm0<- glm(Class~., family=binomial, data=credit.train)
summary(credit.glm0)
## 
## Call:
## glm(formula = Class ~ ., family = binomial, data = credit.train)
## 
## Coefficients:
##                                      Estimate Std. Error z value Pr(>|z|)    
## (Intercept)                         7.948e+00  1.620e+00   4.908 9.22e-07 ***
## Duration                           -2.465e-02  1.027e-02  -2.401  0.01636 *  
## Amount                             -1.206e-04  4.943e-05  -2.440  0.01467 *  
## InstallmentRatePercentage          -2.766e-01  9.796e-02  -2.823  0.00476 ** 
## ResidenceDuration                   4.616e-02  9.831e-02   0.469  0.63872    
## Age                                 1.982e-02  1.046e-02   1.896  0.05802 .  
## NumberExistingCredits              -2.741e-01  2.145e-01  -1.278  0.20142    
## NumberPeopleMaintenance            -1.388e-01  2.898e-01  -0.479  0.63190    
## Telephone                          -2.586e-01  2.242e-01  -1.153  0.24877    
## ForeignWorker                      -1.789e+00  8.309e-01  -2.153  0.03132 *  
## CheckingAccountStatus.lt.0         -1.944e+00  2.646e-01  -7.347 2.02e-13 ***
## CheckingAccountStatus.0.to.200     -1.278e+00  2.551e-01  -5.009 5.46e-07 ***
## CheckingAccountStatus.gt.200       -5.367e-01  4.445e-01  -1.208  0.22724    
## CreditHistory.NoCredit.AllPaid     -1.284e+00  4.801e-01  -2.674  0.00750 ** 
## CreditHistory.ThisBank.AllPaid     -1.436e+00  4.997e-01  -2.873  0.00407 ** 
## CreditHistory.PaidDuly             -7.179e-01  2.865e-01  -2.506  0.01221 *  
## CreditHistory.Delay                -5.630e-01  3.726e-01  -1.511  0.13081    
## Purpose.NewCar                     -1.917e+00  8.668e-01  -2.212  0.02697 *  
## Purpose.UsedCar                    -2.727e-01  8.931e-01  -0.305  0.76006    
## Purpose.Furniture.Equipment        -1.069e+00  8.737e-01  -1.223  0.22118    
## Purpose.Radio.Television           -1.054e+00  8.812e-01  -1.196  0.23171    
## Purpose.DomesticAppliance          -1.109e+00  1.220e+00  -0.909  0.36321    
## Purpose.Repairs                    -1.992e+00  1.035e+00  -1.924  0.05433 .  
## Purpose.Education                  -1.896e+00  9.500e-01  -1.996  0.04595 *  
## Purpose.Retraining                 -1.045e+00  1.507e+00  -0.694  0.48796    
## Purpose.Business                   -1.240e+00  8.975e-01  -1.381  0.16721    
## SavingsAccountBonds.lt.100         -9.516e-01  2.975e-01  -3.199  0.00138 ** 
## SavingsAccountBonds.100.to.500     -7.571e-01  3.877e-01  -1.953  0.05083 .  
## SavingsAccountBonds.500.to.1000    -3.102e-01  5.274e-01  -0.588  0.55639    
## SavingsAccountBonds.gt.1000        -2.349e-01  5.947e-01  -0.395  0.69284    
## EmploymentDuration.lt.1             2.255e-01  4.925e-01   0.458  0.64711    
## EmploymentDuration.1.to.4           2.978e-01  4.682e-01   0.636  0.52473    
## EmploymentDuration.4.to.7           8.561e-01  5.057e-01   1.693  0.09045 .  
## EmploymentDuration.gt.7             3.178e-01  4.724e-01   0.673  0.50108    
## Personal.Male.Divorced.Seperated   -5.419e-01  4.982e-01  -1.088  0.27668    
## Personal.Female.NotSingle          -2.182e-01  3.492e-01  -0.625  0.53197    
## Personal.Male.Single                2.917e-01  3.523e-01   0.828  0.40770    
## OtherDebtorsGuarantors.None        -7.453e-01  4.707e-01  -1.583  0.11339    
## OtherDebtorsGuarantors.CoApplicant -1.243e+00  6.380e-01  -1.948  0.05138 .  
## Property.RealEstate                 8.035e-01  4.647e-01   1.729  0.08381 .  
## Property.Insurance                  6.041e-01  4.511e-01   1.339  0.18050    
## Property.CarOther                   4.111e-01  4.378e-01   0.939  0.34776    
## OtherInstallmentPlans.Bank         -5.736e-01  2.706e-01  -2.120  0.03401 *  
## OtherInstallmentPlans.Stores       -4.597e-01  4.649e-01  -0.989  0.32276    
## Housing.Rent                       -5.839e-01  5.256e-01  -1.111  0.26656    
## Housing.Own                        -7.262e-02  4.909e-01  -0.148  0.88240    
## Job.UnemployedUnskilled             9.950e-01  8.532e-01   1.166  0.24352    
## Job.UnskilledResident               1.006e-01  3.978e-01   0.253  0.80027    
## Job.SkilledEmployee                 1.195e-02  3.242e-01   0.037  0.97060    
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## (Dispersion parameter for binomial family taken to be 1)
## 
##     Null deviance: 980.75  on 799  degrees of freedom
## Residual deviance: 717.58  on 751  degrees of freedom
## AIC: 815.58
## 
## Number of Fisher Scoring iterations: 5

Your observation: Here the logistic regression model is fit with glm() and family = binomial, which treats Class as a binomial (0/1) response and, by default, uses the logit link.

2. Summarize the model and interpret the coefficients.

pred.glm0.train <- predict(credit.glm0,type="response")
hist(pred.glm0.train)

Your observation: The histogram is left-skewed, with most predicted probabilities toward the high end. Here predict() with type="response" returns \(\hat{P}(y=1)\) from the logistic regression, that is, the predicted probability for each training observation.
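To make the coefficient interpretation concrete, the sketch below converts one estimate from the summary output above (Duration, -2.465e-02) into an odds ratio. The variable names are illustrative and not part of the original code, and the reading assumes the other predictors are held fixed.

```r
# Duration estimate copied from the summary(credit.glm0) output above
beta_duration <- -2.465e-02

# exp(beta) is the multiplicative change in the odds of Class == TRUE (good credit)
# for a one-month increase in Duration, holding the other predictors fixed
odds_ratio_1m <- exp(beta_duration)

# The same effect accumulated over 12 additional months
odds_ratio_12m <- exp(12 * beta_duration)

round(c(one_month = odds_ratio_1m, twelve_months = odds_ratio_12m), 3)
```

An odds ratio below 1 indicates that, in this fit, longer loan durations are associated with lower odds of a good outcome.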

Task 3: Optimal Probability Cut-off, with weight0 = 1 and weight1 = 1.

1. Use the training set to predict probabilities. (only for optimal threshold)

pcut1<- mean(GermanCredit$Class)
class.glm0.train<- (pred.glm0.train > pcut1) *1
length(GermanCredit$Class)
## [1] 1000
length(class.glm0.train)
## [1] 800
#table(credit.train$Class, class.glm0.train, dnn = c("True", "Predicted"))  # labels must match the 800 training predictions


costfunc = function(obs, pred.p, pcut){
    weight1 = 1   # define the weight for "true=1 but pred=0" (FN)
    weight0 = 1    # define the weight for "true=0 but pred=1" (FP)
    c1 = (obs==1)&(pred.p<pcut)    # count for "true=1 but pred=0"   (FN)
    c0 = (obs==0)&(pred.p>=pcut)   # count for "true=0 but pred=1"   (FP)
    cost = mean(weight1*c1 + weight0*c0)  # misclassification with weight
    return(cost) }
p.seq = seq(0.01, 1, 0.01) 
cost = rep(0, length(p.seq))  
for(i in 1:length(p.seq)){ 
    cost[i] = costfunc(obs = GermanCredit$Class, pred.p = pred.glm0.train, pcut = p.seq[i])  } 
optimal.pcut.glm0 = p.seq[which(cost==min(cost))]
print(optimal.pcut.glm0)
## [1] 0.01 0.02 0.03 0.04

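Your observation: With weight0 = 1 and weight1 = 1 the cost reduces to the plain misclassification rate, and a grid search over cut-offs from 0.01 to 1.00 selects the value(s) with the lowest cost. The search above passes GermanCredit$Class (1000 labels) as obs but pred.glm0.train (800 training predictions) as pred.p, so the two vectors are not aligned. The tie across 0.01 to 0.04 most likely reflects this mismatch, and obs should be credit.train$Class.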
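The grid search can be sketched in a self-contained form as follows. The data here are simulated stand-ins (not GermanCredit), and the names cost_at, p_hat, and best_cut are hypothetical. The length check is one possible guard against labels and predictions of different lengths being silently recycled.

```r
# Toy data standing in for training labels and fitted probabilities (assumed, not GermanCredit)
set.seed(1)
n <- 200
p_hat <- runif(n)            # hypothetical predicted P(y = 1)
y <- rbinom(n, 1, p_hat)     # labels drawn so that p_hat is informative

# Weighted misclassification cost: w_fn penalises false negatives, w_fp false positives
cost_at <- function(obs, pred_p, pcut, w_fn = 1, w_fp = 1) {
  stopifnot(length(obs) == length(pred_p))   # fail loudly instead of recycling
  fn <- (obs == 1) & (pred_p < pcut)         # true = 1 but predicted 0
  fp <- (obs == 0) & (pred_p >= pcut)        # true = 0 but predicted 1
  mean(w_fn * fn + w_fp * fp)
}

# Evaluate the cost on a grid of cut-offs and keep the first minimiser
p_seq <- seq(0.01, 1, by = 0.01)
cost_seq <- vapply(p_seq, function(p) cost_at(y, p_hat, p), numeric(1))
best_cut <- p_seq[which.min(cost_seq)]
```

Using which.min returns a single cut-off even when several grid values tie for the lowest cost.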

2. Find the optimal probability cut-off point using the MR (misclassification rate) or equivalently the equal-weight cost.

pred.glm0.test<- predict(credit.glm0, newdata = GermanCredit, type="response")
pred.glm0.test.opt <- (pred.glm0.test > 0.5)*1
table(GermanCredit$Class, pred.glm0.test.opt, dnn = c("True", "Predicted"))
##        Predicted
## True      0   1
##   FALSE 165 135
##   TRUE   84 616
MR<- mean(GermanCredit$Class != pred.glm0.test.opt)  # same 1000 labels as in the table above
print(paste0("MR:",MR))
## [1] "MR:0.219"

Your observation: The confusion matrix is a 2x2 table. Its rows are the true classes and its columns the predicted classes, labeled "True" and "Predicted" through the dnn argument. With equal weights the cost is simply the overall misclassification rate (MR), so MR is a suitable criterion for evaluating the model's predictions.

Task 4: Model Evaluation

1. Using the optimal probability cut-off point obtained in 3.2, generate confusion matrix and obtain MR for the training set.

pcut1<- mean(GermanCredit$Class)
# get binary prediction
class.glm0.train<- (pred.glm0.train>pcut1)*1
table(GermanCredit$Class, pred.glm0.test.opt, dnn = c("True", "Predicted"))
##        Predicted
## True      0   1
##   FALSE 165 135
##   TRUE   84 616
MR<- mean(GermanCredit$Class!= pred.glm0.test.opt)
print(paste0("MR:",MR))
## [1] "MR:0.219"

Your observation: Since the MR is 0.219, about 22% of the observations (219 of the 1000 in the table above) are misclassified.

2. Using the optimal probability cut-off point obtained in 3.2, generate the ROC curve and calculate the AUC for the training set.

ROC Curve

library(ROCR)
pred <- prediction(pred.glm0.test.opt, GermanCredit$Class)
perf <- performance(pred, "tpr", "fpr")
plot(perf, colorize=TRUE)

#Get the AUC
unlist(slot(performance(pred, "auc"), "y.values"))
## [1] 0.715

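Your observation: The AUC is 0.715. The ROC curve plots TPR (sensitivity) against FPR (1 - specificity). Because prediction() was given the 0/1 classes in pred.glm0.test.opt rather than the predicted probabilities, the curve has only one interior point, and the AUC equals the average of sensitivity (616/700 = 0.88) and specificity (165/300 = 0.55).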
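As a check on that value, the sketch below rebuilds 0/1 labels and 0/1 predictions from the confusion-matrix counts shown above and computes the AUC with the rank-based (Mann-Whitney) formula in base R. The helper name auc_rank is hypothetical, and this is an illustration rather than the ROCR implementation.

```r
# Rebuild labels and binary predictions from the confusion-matrix counts above
truth <- c(rep(0, 300), rep(1, 700))
pred  <- c(rep(0, 165), rep(1, 135),   # true 0: 165 predicted 0, 135 predicted 1
           rep(0, 84),  rep(1, 616))   # true 1:  84 predicted 0, 616 predicted 1

# Rank-based AUC: probability that a random positive scores above a random negative,
# counting ties as one half (rank() assigns average ranks to ties)
auc_rank <- function(score, label) {
  n1 <- sum(label == 1)
  n0 <- sum(label == 0)
  (sum(rank(score)[label == 1]) - n1 * (n1 + 1) / 2) / (n1 * n0)
}

auc_binary <- auc_rank(pred, truth)

# With only two distinct scores, the AUC is the mean of sensitivity and specificity
sens <- 616 / 700
spec <- 165 / 300
c(auc = auc_binary, mean_sens_spec = (sens + spec) / 2)
```

Passing predicted probabilities instead of thresholded classes as the score would use every distinct probability as a potential threshold and generally gives a different AUC.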

3. Using the same cut-off point, generate confusion matrix and obtain MR for the test set.

pred.glm0.test<- predict(credit.glm0, newdata = GermanCredit, type="response")
pred.glm0.test.opt <- (pred.glm0.test>0.5)*1
table(GermanCredit$Class, pred.glm0.test.opt, dnn = c("True", "Predicted"))
##        Predicted
## True      0   1
##   FALSE 165 135
##   TRUE   84 616
MR<- mean(GermanCredit$Class!= pred.glm0.test.opt)
print(paste0("MR:",MR))
## [1] "MR:0.219"

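Your observation: The test sample is used only to evaluate the model's out-of-sample prediction accuracy. The code above passes newdata = GermanCredit, so the table and the MR of 0.219 cover all 1000 observations rather than the 200 held-out rows. Evaluating on the test set alone requires credit.test in predict(), table(), and the MR calculation.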
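A minimal sketch of a held-out evaluation is given below, using simulated data in place of GermanCredit so that it runs without extra packages. The data frame toy and all object names are assumptions made for illustration.

```r
# Simulated binary-response data (illustrative only)
set.seed(2023)
n <- 500
toy <- data.frame(x1 = rnorm(n), x2 = rnorm(n))
toy$y <- rbinom(n, 1, plogis(0.5 + 1.2 * toy$x1 - 0.8 * toy$x2))

# 80/20 split, mirroring the split used for the credit data
idx <- sample(seq_len(n), size = 0.8 * n)
toy_train <- toy[idx, ]
toy_test  <- toy[-idx, ]

# Fit on the training rows only
fit <- glm(y ~ ., family = binomial, data = toy_train)

# Predict, tabulate, and score on the held-out rows only
p_test     <- predict(fit, newdata = toy_test, type = "response")
class_test <- as.integer(p_test > 0.5)
conf_mat   <- table(True = toy_test$y, Predicted = class_test)
mr_test    <- mean(toy_test$y != class_test)
```

The key point is that the same data frame (here toy_test) supplies both the predictors for predict() and the labels for the table and MR.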

4. Using the same cut-off point, generate the ROC curve and calculate the AUC for the test set.

pred.glm0.test<- predict(credit.glm0, newdata = GermanCredit, type="response")
pred <- prediction(pred.glm0.test.opt, GermanCredit$Class)
perf <- performance(pred, "tpr", "fpr")
plot(perf, colorize=TRUE)

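Your observation: Neither ROC plot (training or test chunk) shows a smooth curve. The reason is that prediction() received the thresholded 0/1 values in pred.glm0.test.opt, so with only two distinct scores the ROC is a single interior point joined to (0,0) and (1,1). Passing the predicted probabilities (pred.glm0.test) instead would trace the full curve.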

Task 5: Using different weights

Now, let’s assume “It is worse to class a customer as good when they are bad (weight = 5), than it is to class a customer as bad when they are good (weight = 1).” Please figure out which weight should be 5 and which weight should be 1. Then define your cost function accordingly!
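Since Class is TRUE for good customers, classing a bad customer as good is a false positive, so the weight of 5 belongs to the "true=0 but pred=1" case. The sketch below uses simulated, well-calibrated probabilities (an assumption, as are all the object names) to illustrate how this asymmetric weighting moves the cost-minimising cut-off upward compared with equal weights.

```r
# Simulated, well-calibrated probabilities (assumption for illustration only)
set.seed(42)
n <- 2000
p_hat <- runif(n)
y <- rbinom(n, 1, p_hat)      # 1 = good customer, 0 = bad customer

weighted_cost <- function(obs, pred_p, pcut, w_fn, w_fp) {
  stopifnot(length(obs) == length(pred_p))
  fn <- (obs == 1) & (pred_p < pcut)    # good customer classed as bad
  fp <- (obs == 0) & (pred_p >= pcut)   # bad customer classed as good
  mean(w_fn * fn + w_fp * fp)
}

# Grid search returning the first cut-off with the lowest cost
best_cut <- function(w_fn, w_fp, grid = seq(0.01, 1, by = 0.01)) {
  costs <- vapply(grid, function(p) weighted_cost(y, p_hat, p, w_fn, w_fp), numeric(1))
  grid[which.min(costs)]
}

cut_equal    <- best_cut(w_fn = 1, w_fp = 1)
cut_weighted <- best_cut(w_fn = 1, w_fp = 5)

# If probabilities are calibrated, expected cost is minimised by predicting "good"
# only when p > w_fp / (w_fp + w_fn)
theory_cut <- 5 / (5 + 1)
```

A higher cut-off means the model must be more confident before labeling a customer as good, which is the intended effect of penalising false positives more heavily.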

1. Obtain optimal probability cut-off point again, with the new weights.

costfunc = function(obs, pred.p, pcut){
    weight1 = 1   # define the weight for "true=1 but pred=0" (FN)
    weight0 = 5    # define the weight for "true=0 but pred=1" (FP)
    c1 = (obs==1)&(pred.p<pcut)    # count for "true=1 but pred=0"   (FN)
    c0 = (obs==0)&(pred.p>=pcut)   # count for "true=0 but pred=1"   (FP)
    cost = mean(weight1*c1 + weight0*c0)  # misclassification with weight
    return(cost) # you have to return to a value when you write R functions
} # end of the function
p.seq = seq(0.01, 1, 0.01) 
cost = rep(0, length(p.seq))  
for(i in 1:length(p.seq)){ 
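    # NOTE: GermanCredit$Class has 1000 labels but pred.glm0.train has 800 predictions, so R
    # recycles the shorter vector and warns; credit.train$Class is the matching label vector.
    cost[i] = costfunc(obs = GermanCredit$Class, pred.p = pred.glm0.train, pcut = p.seq[i])  } 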
## Warning in (obs == 1) & (pred.p < pcut): longer object length is not a multiple
## of shorter object length
## Warning in (obs == 0) & (pred.p >= pcut): longer object length is not a
## multiple of shorter object length
## Warning in (obs == 1) & (pred.p < pcut): longer object length is not a multiple
## of shorter object length
## Warning in (obs == 0) & (pred.p >= pcut): longer object length is not a
## multiple of shorter object length
## Warning in (obs == 1) & (pred.p < pcut): longer object length is not a multiple
## of shorter object length
## Warning in (obs == 0) & (pred.p >= pcut): longer object length is not a
## multiple of shorter object length
## Warning in (obs == 1) & (pred.p < pcut): longer object length is not a multiple
## of shorter object length
## Warning in (obs == 0) & (pred.p >= pcut): longer object length is not a
## multiple of shorter object length
## Warning in (obs == 1) & (pred.p < pcut): longer object length is not a multiple
## of shorter object length
## Warning in (obs == 0) & (pred.p >= pcut): longer object length is not a
## multiple of shorter object length
## Warning in (obs == 1) & (pred.p < pcut): longer object length is not a multiple
## of shorter object length
## Warning in (obs == 0) & (pred.p >= pcut): longer object length is not a
## multiple of shorter object length
## Warning in (obs == 1) & (pred.p < pcut): longer object length is not a multiple
## of shorter object length
## Warning in (obs == 0) & (pred.p >= pcut): longer object length is not a
## multiple of shorter object length
## Warning in (obs == 1) & (pred.p < pcut): longer object length is not a multiple
## of shorter object length
## Warning in (obs == 0) & (pred.p >= pcut): longer object length is not a
## multiple of shorter object length
## Warning in (obs == 1) & (pred.p < pcut): longer object length is not a multiple
## of shorter object length
## Warning in (obs == 0) & (pred.p >= pcut): longer object length is not a
## multiple of shorter object length
## Warning in (obs == 1) & (pred.p < pcut): longer object length is not a multiple
## of shorter object length
## Warning in (obs == 0) & (pred.p >= pcut): longer object length is not a
## multiple of shorter object length
## Warning in (obs == 1) & (pred.p < pcut): longer object length is not a multiple
## of shorter object length
## Warning in (obs == 0) & (pred.p >= pcut): longer object length is not a
## multiple of shorter object length
## Warning in (obs == 1) & (pred.p < pcut): longer object length is not a multiple
## of shorter object length
## Warning in (obs == 0) & (pred.p >= pcut): longer object length is not a
## multiple of shorter object length
## Warning in (obs == 1) & (pred.p < pcut): longer object length is not a multiple
## of shorter object length
## Warning in (obs == 0) & (pred.p >= pcut): longer object length is not a
## multiple of shorter object length
## Warning in (obs == 1) & (pred.p < pcut): longer object length is not a multiple
## of shorter object length
## Warning in (obs == 0) & (pred.p >= pcut): longer object length is not a
## multiple of shorter object length
## Warning in (obs == 1) & (pred.p < pcut): longer object length is not a multiple
## of shorter object length
## Warning in (obs == 0) & (pred.p >= pcut): longer object length is not a
## multiple of shorter object length
## Warning in (obs == 1) & (pred.p < pcut): longer object length is not a multiple
## of shorter object length
## Warning in (obs == 0) & (pred.p >= pcut): longer object length is not a
## multiple of shorter object length
## Warning in (obs == 1) & (pred.p < pcut): longer object length is not a multiple
## of shorter object length
## Warning in (obs == 0) & (pred.p >= pcut): longer object length is not a
## multiple of shorter object length
## Warning in (obs == 1) & (pred.p < pcut): longer object length is not a multiple
## of shorter object length
## Warning in (obs == 0) & (pred.p >= pcut): longer object length is not a
## multiple of shorter object length
## Warning in (obs == 1) & (pred.p < pcut): longer object length is not a multiple
## of shorter object length
## Warning in (obs == 0) & (pred.p >= pcut): longer object length is not a
## multiple of shorter object length
## Warning in (obs == 1) & (pred.p < pcut): longer object length is not a multiple
## of shorter object length
## Warning in (obs == 0) & (pred.p >= pcut): longer object length is not a
## multiple of shorter object length
## Warning in (obs == 1) & (pred.p < pcut): longer object length is not a multiple
## of shorter object length
## Warning in (obs == 0) & (pred.p >= pcut): longer object length is not a
## multiple of shorter object length
## Warning in (obs == 1) & (pred.p < pcut): longer object length is not a multiple
## of shorter object length
## Warning in (obs == 0) & (pred.p >= pcut): longer object length is not a
## multiple of shorter object length
## Warning in (obs == 1) & (pred.p < pcut): longer object length is not a multiple
## of shorter object length
## Warning in (obs == 0) & (pred.p >= pcut): longer object length is not a
## multiple of shorter object length
## Warning in (obs == 1) & (pred.p < pcut): longer object length is not a multiple
## of shorter object length
## Warning in (obs == 0) & (pred.p >= pcut): longer object length is not a
## multiple of shorter object length
## Warning in (obs == 1) & (pred.p < pcut): longer object length is not a multiple
## of shorter object length
## Warning in (obs == 0) & (pred.p >= pcut): longer object length is not a
## multiple of shorter object length
## Warning in (obs == 1) & (pred.p < pcut): longer object length is not a multiple
## of shorter object length
## Warning in (obs == 0) & (pred.p >= pcut): longer object length is not a
## multiple of shorter object length
## Warning in (obs == 1) & (pred.p < pcut): longer object length is not a multiple
## of shorter object length
## Warning in (obs == 0) & (pred.p >= pcut): longer object length is not a
## multiple of shorter object length
plot(p.seq, cost)

optimal.pcut.glm0 = p.seq[which(cost==min(cost))]
print(optimal.pcut.glm0)
## [1] 0.99

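Your observation: In the plot of cost against p.seq, the cost is lowest at pcut = 0.99, so the optimal cut-off found by the grid search is 0.99. The repeated warnings show that obs and pred.p had different lengths when the cost was evaluated, so this value was computed on recycled vectors and should be treated with caution.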
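The grid search described above can be sketched as follows. This is a minimal illustration, not the assignment's own code: the data are simulated stand-ins, the grid `seq(0.01, 0.99, by = 0.01)` and the names `costfunc`, `weight1` and `weight0` are assumptions, and only the two logical expressions are taken from the warning messages. The length check is added so that a mismatch stops with an error instead of being recycled silently.

```r
# Hypothetical stand-ins for the observed classes and predicted probabilities;
# in the assignment both would come from the same subset of the data.
set.seed(1)
obs    <- rbinom(200, 1, 0.7)
pred.p <- plogis(rnorm(200, mean = ifelse(obs == 1, 1, -0.5)))

# Symmetric cost with equal weights: the misclassification rate at a cut-off.
costfunc <- function(obs, pred.p, pcut) {
  stopifnot(length(obs) == length(pred.p))  # guard against silent recycling
  weight1 <- 1  # cost of predicting 0 when the truth is 1
  weight0 <- 1  # cost of predicting 1 when the truth is 0
  c1 <- (obs == 1) & (pred.p <  pcut)
  c0 <- (obs == 0) & (pred.p >= pcut)
  mean(weight1 * c1 + weight0 * c0)
}

# Evaluate the cost on an assumed grid of cut-offs and pick the minimizer.
p.seq <- seq(0.01, 0.99, by = 0.01)
cost  <- sapply(p.seq, function(p) costfunc(obs, pred.p, p))
optimal.pcut <- p.seq[which.min(cost)]
print(optimal.pcut)
```

Here `which.min` returns only the first minimizer, whereas `which(cost == min(cost))` returns every cut-off that ties for the minimum.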

2. Obtain the confusion matrix and MR for the training set.

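class.glm0.train<- (pred.glm0.train>pcut1)*1
# table() is commented out because it fails when its arguments differ in length
#table(GermanCredit$Class, class.glm0.train, dnn = c("True", "Predicted"))
# NOTE: GermanCredit$Class has 1000 rows and class.glm0.train has a different length,
# so this comparison recycles the shorter vector (see the warning below)
MR<- mean(GermanCredit$Class!=class.glm0.train)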
## Warning in GermanCredit$Class != class.glm0.train: longer object length is not
## a multiple of shorter object length
print(paste0("MR:",MR))
## [1] "MR:0.453"

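Your observation: The warning shows that GermanCredit$Class (1000 observations) and class.glm0.train do not have the same length. R recycles the shorter vector in the comparison, so the reported MR of 0.453 is not a valid training misclassification rate, and the confusion matrix cannot be built at all. The predicted classes should be compared with the Class column of the training subset.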
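A minimal sketch of a length-consistent training evaluation is given below. Everything in it is illustrative: the data frame `dat`, the single predictor `x`, the 80/20 split and the cut-off of 0.5 are assumptions, not the assignment's objects or values. The point is only that the observed classes are taken from the same subset the model was fitted on.

```r
set.seed(2)
# Hypothetical data frame standing in for GermanCredit (one predictor only).
dat <- data.frame(x = rnorm(500))
dat$Class <- rbinom(500, 1, plogis(0.5 + dat$x)) == 1

# Hypothetical 80/20 split; the assignment's own split may differ.
index <- sample(nrow(dat), size = 0.8 * nrow(dat))
train <- dat[index, ]

fit  <- glm(Class ~ x, family = binomial, data = train)
pcut <- 0.5  # illustrative cut-off, not the assignment's value

# Fitted probabilities have one entry per training row.
pred.train  <- predict(fit, type = "response")
class.train <- (pred.train > pcut) * 1
stopifnot(length(class.train) == nrow(train))

# Observed classes come from the same subset, so no recycling occurs.
conf.train <- table(train$Class, class.train, dnn = c("True", "Predicted"))
MR.train   <- mean(train$Class != class.train)
print(conf.train)
print(paste0("MR:", MR.train))
```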

3. Obtain the confusion matrix and MR for the test set.

class.glm0.test<- (pred.glm0.test>pcut1)*1
table(GermanCredit$Class, class.glm0.test, dnn = c("True", "Predicted"))
##        Predicted
## True      0   1
##   FALSE 231  69
##   TRUE  174 526
MR<- mean(GermanCredit$Class!=class.glm0.test)
print(paste0("MR:",MR))
## [1] "MR:0.243"

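Your observation: Here the MR is 0.243, that is (69 + 174) / 1000, whereas no valid MR could be computed for the training set. The confusion matrix sums to 1000 observations, which is the size of the full GermanCredit data, so this MR does not appear to be computed on a held-out test subset alone.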
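For comparison, an out-of-sample evaluation restricted to held-out rows might look like the sketch below. The simulated data, the split and the cut-off are again assumptions for illustration; the relevant detail is that `predict()` receives the test subset through `newdata`, and the observed classes are taken from that same subset.

```r
set.seed(3)
# Hypothetical data frame standing in for GermanCredit (one predictor only).
dat <- data.frame(x = rnorm(500))
dat$Class <- rbinom(500, 1, plogis(0.5 + dat$x)) == 1

# Hypothetical 80/20 split into training and test subsets.
index <- sample(nrow(dat), size = 0.8 * nrow(dat))
train <- dat[index, ]
test  <- dat[-index, ]

fit  <- glm(Class ~ x, family = binomial, data = train)
pcut <- 0.5  # illustrative cut-off, not the assignment's value

# Predict only on the held-out rows.
pred.test  <- predict(fit, newdata = test, type = "response")
class.test <- (pred.test > pcut) * 1

# The confusion matrix total should equal the number of test rows.
conf.test <- table(test$Class, class.test, dnn = c("True", "Predicted"))
MR.test   <- mean(test$Class != class.test)
stopifnot(sum(conf.test) == nrow(test))
print(conf.test)
print(paste0("MR:", MR.test))
```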

Task 6: Report

Summarize your findings, including the optimal probability cut-off, MR and AUC (if calculated) for both in-sample and out-of-sample data. Discuss what you observed and make some suggestions on how we can improve the model.

The logistic regression model was fitted to the training set using all variables as predictors, with the binomial family.

The optimal probability cut-off was searched over p.seq using a weight of 1 and was found to be 0.99. The MR for the training set is 21%, and the MR reported in the test-set step is 0.243. The AUC is 0.715. The ROC curves for the training and test data do not have a clear shape, which might indicate that the model’s predictive performance is not very strong. One way to improve the analysis is to fix the evaluation code so that the predicted and observed classes always have the same number of elements (the length-mismatch warnings show that they currently do not), and then recompute the cut-off, MR and AUC.
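As a cross-check on a reported AUC, the value can be computed without any extra package from the rank-sum (Mann-Whitney) identity. The helper `auc` below is a hypothetical function written for this sketch, not the method used to obtain the 0.715 above, and the example vectors are made up.

```r
# AUC from the rank-sum identity: the probability that a randomly chosen
# positive case receives a higher predicted probability than a negative one.
auc <- function(obs, pred.p) {
  stopifnot(length(obs) == length(pred.p))  # guard against silent recycling
  obs <- as.logical(obs)
  n1  <- sum(obs)    # number of positive cases
  n0  <- sum(!obs)   # number of negative cases
  r   <- rank(pred.p)  # average ranks, so ties count as one half
  (sum(r[obs]) - n1 * (n1 + 1) / 2) / (n1 * n0)
}

# Made-up example: positives always score higher than negatives.
auc.perfect <- auc(c(0, 0, 1, 1), c(0.1, 0.2, 0.8, 0.9))
print(auc.perfect)
```

Applied separately to the training and test subsets, with predictions and observed classes taken from the same rows, this gives in-sample and out-of-sample AUC values that can be compared directly.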