Refer to http://archive.ics.uci.edu/ml/datasets/Statlog+(German+Credit+Data)
for variable descriptions. The response variable is Class,
and all other variables are predictors.
Only run the following code once to install the package
caret. The German credit scoring data is
provided in that package.
install.packages('caret')
library(caret) #this package contains the german data with its numeric format
## Warning: package 'caret' was built under R version 4.3.1
## Loading required package: ggplot2
## Loading required package: lattice
data(GermanCredit)
GermanCredit$Class <- GermanCredit$Class == "Good" # use this code to convert `Class` into True or False (equivalent to 1 or 0)
str(GermanCredit)
## 'data.frame': 1000 obs. of 62 variables:
## $ Duration : int 6 48 12 42 24 36 24 36 12 30 ...
## $ Amount : int 1169 5951 2096 7882 4870 9055 2835 6948 3059 5234 ...
## $ InstallmentRatePercentage : int 4 2 2 2 3 2 3 2 2 4 ...
## $ ResidenceDuration : int 4 2 3 4 4 4 4 2 4 2 ...
## $ Age : int 67 22 49 45 53 35 53 35 61 28 ...
## $ NumberExistingCredits : int 2 1 1 1 2 1 1 1 1 2 ...
## $ NumberPeopleMaintenance : int 1 1 2 2 2 2 1 1 1 1 ...
## $ Telephone : num 0 1 1 1 1 0 1 0 1 1 ...
## $ ForeignWorker : num 1 1 1 1 1 1 1 1 1 1 ...
## $ Class : logi TRUE FALSE TRUE TRUE FALSE TRUE ...
## $ CheckingAccountStatus.lt.0 : num 1 0 0 1 1 0 0 0 0 0 ...
## $ CheckingAccountStatus.0.to.200 : num 0 1 0 0 0 0 0 1 0 1 ...
## $ CheckingAccountStatus.gt.200 : num 0 0 0 0 0 0 0 0 0 0 ...
## $ CheckingAccountStatus.none : num 0 0 1 0 0 1 1 0 1 0 ...
## $ CreditHistory.NoCredit.AllPaid : num 0 0 0 0 0 0 0 0 0 0 ...
## $ CreditHistory.ThisBank.AllPaid : num 0 0 0 0 0 0 0 0 0 0 ...
## $ CreditHistory.PaidDuly : num 0 1 0 1 0 1 1 1 1 0 ...
## $ CreditHistory.Delay : num 0 0 0 0 1 0 0 0 0 0 ...
## $ CreditHistory.Critical : num 1 0 1 0 0 0 0 0 0 1 ...
## $ Purpose.NewCar : num 0 0 0 0 1 0 0 0 0 1 ...
## $ Purpose.UsedCar : num 0 0 0 0 0 0 0 1 0 0 ...
## $ Purpose.Furniture.Equipment : num 0 0 0 1 0 0 1 0 0 0 ...
## $ Purpose.Radio.Television : num 1 1 0 0 0 0 0 0 1 0 ...
## $ Purpose.DomesticAppliance : num 0 0 0 0 0 0 0 0 0 0 ...
## $ Purpose.Repairs : num 0 0 0 0 0 0 0 0 0 0 ...
## $ Purpose.Education : num 0 0 1 0 0 1 0 0 0 0 ...
## $ Purpose.Vacation : num 0 0 0 0 0 0 0 0 0 0 ...
## $ Purpose.Retraining : num 0 0 0 0 0 0 0 0 0 0 ...
## $ Purpose.Business : num 0 0 0 0 0 0 0 0 0 0 ...
## $ Purpose.Other : num 0 0 0 0 0 0 0 0 0 0 ...
## $ SavingsAccountBonds.lt.100 : num 0 1 1 1 1 0 0 1 0 1 ...
## $ SavingsAccountBonds.100.to.500 : num 0 0 0 0 0 0 0 0 0 0 ...
## $ SavingsAccountBonds.500.to.1000 : num 0 0 0 0 0 0 1 0 0 0 ...
## $ SavingsAccountBonds.gt.1000 : num 0 0 0 0 0 0 0 0 1 0 ...
## $ SavingsAccountBonds.Unknown : num 1 0 0 0 0 1 0 0 0 0 ...
## $ EmploymentDuration.lt.1 : num 0 0 0 0 0 0 0 0 0 0 ...
## $ EmploymentDuration.1.to.4 : num 0 1 0 0 1 1 0 1 0 0 ...
## $ EmploymentDuration.4.to.7 : num 0 0 1 1 0 0 0 0 1 0 ...
## $ EmploymentDuration.gt.7 : num 1 0 0 0 0 0 1 0 0 0 ...
## $ EmploymentDuration.Unemployed : num 0 0 0 0 0 0 0 0 0 1 ...
## $ Personal.Male.Divorced.Seperated : num 0 0 0 0 0 0 0 0 1 0 ...
## $ Personal.Female.NotSingle : num 0 1 0 0 0 0 0 0 0 0 ...
## $ Personal.Male.Single : num 1 0 1 1 1 1 1 1 0 0 ...
## $ Personal.Male.Married.Widowed : num 0 0 0 0 0 0 0 0 0 1 ...
## $ Personal.Female.Single : num 0 0 0 0 0 0 0 0 0 0 ...
## $ OtherDebtorsGuarantors.None : num 1 1 1 0 1 1 1 1 1 1 ...
## $ OtherDebtorsGuarantors.CoApplicant : num 0 0 0 0 0 0 0 0 0 0 ...
## $ OtherDebtorsGuarantors.Guarantor : num 0 0 0 1 0 0 0 0 0 0 ...
## $ Property.RealEstate : num 1 1 1 0 0 0 0 0 1 0 ...
## $ Property.Insurance : num 0 0 0 1 0 0 1 0 0 0 ...
## $ Property.CarOther : num 0 0 0 0 0 0 0 1 0 1 ...
## $ Property.Unknown : num 0 0 0 0 1 1 0 0 0 0 ...
## $ OtherInstallmentPlans.Bank : num 0 0 0 0 0 0 0 0 0 0 ...
## $ OtherInstallmentPlans.Stores : num 0 0 0 0 0 0 0 0 0 0 ...
## $ OtherInstallmentPlans.None : num 1 1 1 1 1 1 1 1 1 1 ...
## $ Housing.Rent : num 0 0 0 0 0 0 0 1 0 0 ...
## $ Housing.Own : num 1 1 1 0 0 0 1 0 1 1 ...
## $ Housing.ForFree : num 0 0 0 1 1 1 0 0 0 0 ...
## $ Job.UnemployedUnskilled : num 0 0 0 0 0 0 0 0 0 0 ...
## $ Job.UnskilledResident : num 0 0 1 0 0 1 0 0 1 0 ...
## $ Job.SkilledEmployee : num 1 1 0 1 1 0 1 0 0 0 ...
## $ Job.Management.SelfEmp.HighlyQualified: num 0 0 0 0 0 0 0 1 0 1 ...
# Optional: drop columns that provide no information in the model, such as the redundant reference level of each group of dummy variables (e.g. column 14, CheckingAccountStatus.none)
GermanCredit = GermanCredit[,-c(14,19,27,30,35,40,44,45,48,52,55,58,62)]
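Why drop one column per group? A minimal sketch on hypothetical toy data (the data frame `toy` and its columns `d.a`, `d.b`, `d.c` are made up for illustration): when every level of a dummy-coded factor is kept alongside an intercept, the dummies sum to 1 in every row, so they are perfectly collinear with the intercept and `glm()` reports `NA` for one coefficient.

```r
# Hypothetical toy data: a 3-level factor coded as three 0/1 dummies
set.seed(1)
n <- 60
level <- sample(c("a", "b", "c"), n, replace = TRUE)
toy <- data.frame(
  y   = rbinom(n, 1, 0.5),
  d.a = as.numeric(level == "a"),
  d.b = as.numeric(level == "b"),
  d.c = as.numeric(level == "c")
)

# All three dummies + intercept: d.a + d.b + d.c == 1 for every row,
# so the design matrix is rank-deficient and the last aliased column gets NA
fit.all <- glm(y ~ d.a + d.b + d.c, family = binomial, data = toy)
print(coef(fit.all))

# Dropping the reference level (d.c) removes the collinearity
fit.ref <- glm(y ~ d.a + d.b, family = binomial, data = toy)
print(coef(fit.ref))
```

The same reasoning applies to the column indices removed above: each removed column is implied by the remaining dummies in its group.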
Your observation: Based on str(), every variable is stored as an integer, numeric, or logical column, so none appear categorical
and none of the variables are converted to factors.
However, many columns are categorical in substance, for example
ResidenceDuration, NumberExistingCredits, NumberPeopleMaintenance,
Telephone, ForeignWorker, Class, CheckingAccountStatus.lt.0,
CheckingAccountStatus.0.to.200, etc. Judging from the mean,
minimum, and maximum values, there are more categorical (0/1 dummy)
variables than integer variables.
summary(GermanCredit)
colnames(GermanCredit)
Your observation: In the Age variable, the median is 33 years, so the applicants skew young; this may matter if younger borrowers tend to have worse credit. In Duration the maximum is 72 months, while the median and mean are only 18 and about 20.9, respectively. A maximum this far above the center, with the mean above the median, suggests a right-skewed distribution and possible outliers. The same pattern holds for Amount: the maximum is 18424, far above the rest of the data.
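One common way to check the "possible outliers" suspicion is Tukey's 1.5 x IQR rule, the same rule `boxplot()` uses to flag points. A minimal sketch on a hypothetical vector `amount` (made-up values, not the German credit data); on the real data the same steps would be applied to `GermanCredit$Amount` or `GermanCredit$Duration`.

```r
# Hypothetical right-skewed values standing in for a variable like Amount
amount <- c(1169, 2096, 1500, 2300, 1800, 2500, 3000, 2100, 1900, 18424)

q     <- quantile(amount, c(0.25, 0.75))
iqr   <- q[[2]] - q[[1]]
upper <- q[[2]] + 1.5 * iqr   # Tukey upper fence
lower <- q[[1]] - 1.5 * iqr   # Tukey lower fence

outliers <- amount[amount > upper | amount < lower]
print(outliers)

# A mean well above the median is another hint of right skew
print(c(mean = mean(amount), median = median(amount)))
```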
# Set the random seed to 2023 for reproducibility
set.seed(2023)
index <- sample(1:nrow(GermanCredit),nrow(GermanCredit)*0.80)
credit.train = GermanCredit[index,]
credit.test = GermanCredit[-index,]
Your observation: The random seed was set to 2023, and the data were randomly split into training (80%, 800 rows) and testing (20%, 200 rows) sets.
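The split can be sanity-checked: the two sets should not share any row and together should cover every row. A minimal sketch on a hypothetical 1000-row data frame `df` (simulated for illustration), using the same `sample()` and negative-indexing pattern as the code above.

```r
set.seed(2023)
df <- data.frame(x = rnorm(1000), y = rbinom(1000, 1, 0.7))  # hypothetical data

index <- sample(1:nrow(df), nrow(df) * 0.80)   # 80% of row numbers, without replacement
train <- df[index, ]
test  <- df[-index, ]                          # negative index keeps the remaining 20%

print(c(train = nrow(train), test = nrow(test)))
# No row appears in both sets, and together they cover all rows
print(length(intersect(rownames(train), rownames(test))))
```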
credit.glm0<- glm(Class~., family=binomial, data=credit.train)
summary(credit.glm0)
##
## Call:
## glm(formula = Class ~ ., family = binomial, data = credit.train)
##
## Coefficients:
## Estimate Std. Error z value Pr(>|z|)
## (Intercept) 7.948e+00 1.620e+00 4.908 9.22e-07 ***
## Duration -2.465e-02 1.027e-02 -2.401 0.01636 *
## Amount -1.206e-04 4.943e-05 -2.440 0.01467 *
## InstallmentRatePercentage -2.766e-01 9.796e-02 -2.823 0.00476 **
## ResidenceDuration 4.616e-02 9.831e-02 0.469 0.63872
## Age 1.982e-02 1.046e-02 1.896 0.05802 .
## NumberExistingCredits -2.741e-01 2.145e-01 -1.278 0.20142
## NumberPeopleMaintenance -1.388e-01 2.898e-01 -0.479 0.63190
## Telephone -2.586e-01 2.242e-01 -1.153 0.24877
## ForeignWorker -1.789e+00 8.309e-01 -2.153 0.03132 *
## CheckingAccountStatus.lt.0 -1.944e+00 2.646e-01 -7.347 2.02e-13 ***
## CheckingAccountStatus.0.to.200 -1.278e+00 2.551e-01 -5.009 5.46e-07 ***
## CheckingAccountStatus.gt.200 -5.367e-01 4.445e-01 -1.208 0.22724
## CreditHistory.NoCredit.AllPaid -1.284e+00 4.801e-01 -2.674 0.00750 **
## CreditHistory.ThisBank.AllPaid -1.436e+00 4.997e-01 -2.873 0.00407 **
## CreditHistory.PaidDuly -7.179e-01 2.865e-01 -2.506 0.01221 *
## CreditHistory.Delay -5.630e-01 3.726e-01 -1.511 0.13081
## Purpose.NewCar -1.917e+00 8.668e-01 -2.212 0.02697 *
## Purpose.UsedCar -2.727e-01 8.931e-01 -0.305 0.76006
## Purpose.Furniture.Equipment -1.069e+00 8.737e-01 -1.223 0.22118
## Purpose.Radio.Television -1.054e+00 8.812e-01 -1.196 0.23171
## Purpose.DomesticAppliance -1.109e+00 1.220e+00 -0.909 0.36321
## Purpose.Repairs -1.992e+00 1.035e+00 -1.924 0.05433 .
## Purpose.Education -1.896e+00 9.500e-01 -1.996 0.04595 *
## Purpose.Retraining -1.045e+00 1.507e+00 -0.694 0.48796
## Purpose.Business -1.240e+00 8.975e-01 -1.381 0.16721
## SavingsAccountBonds.lt.100 -9.516e-01 2.975e-01 -3.199 0.00138 **
## SavingsAccountBonds.100.to.500 -7.571e-01 3.877e-01 -1.953 0.05083 .
## SavingsAccountBonds.500.to.1000 -3.102e-01 5.274e-01 -0.588 0.55639
## SavingsAccountBonds.gt.1000 -2.349e-01 5.947e-01 -0.395 0.69284
## EmploymentDuration.lt.1 2.255e-01 4.925e-01 0.458 0.64711
## EmploymentDuration.1.to.4 2.978e-01 4.682e-01 0.636 0.52473
## EmploymentDuration.4.to.7 8.561e-01 5.057e-01 1.693 0.09045 .
## EmploymentDuration.gt.7 3.178e-01 4.724e-01 0.673 0.50108
## Personal.Male.Divorced.Seperated -5.419e-01 4.982e-01 -1.088 0.27668
## Personal.Female.NotSingle -2.182e-01 3.492e-01 -0.625 0.53197
## Personal.Male.Single 2.917e-01 3.523e-01 0.828 0.40770
## OtherDebtorsGuarantors.None -7.453e-01 4.707e-01 -1.583 0.11339
## OtherDebtorsGuarantors.CoApplicant -1.243e+00 6.380e-01 -1.948 0.05138 .
## Property.RealEstate 8.035e-01 4.647e-01 1.729 0.08381 .
## Property.Insurance 6.041e-01 4.511e-01 1.339 0.18050
## Property.CarOther 4.111e-01 4.378e-01 0.939 0.34776
## OtherInstallmentPlans.Bank -5.736e-01 2.706e-01 -2.120 0.03401 *
## OtherInstallmentPlans.Stores -4.597e-01 4.649e-01 -0.989 0.32276
## Housing.Rent -5.839e-01 5.256e-01 -1.111 0.26656
## Housing.Own -7.262e-02 4.909e-01 -0.148 0.88240
## Job.UnemployedUnskilled 9.950e-01 8.532e-01 1.166 0.24352
## Job.UnskilledResident 1.006e-01 3.978e-01 0.253 0.80027
## Job.SkilledEmployee 1.195e-02 3.242e-01 0.037 0.97060
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## (Dispersion parameter for binomial family taken to be 1)
##
## Null deviance: 980.75 on 799 degrees of freedom
## Residual deviance: 717.58 on 751 degrees of freedom
## AIC: 815.58
##
## Number of Fisher Scoring iterations: 5
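Your observation: Here the logistic regression model is fit with glm() and family = binomial, which treats Class as a binary (Bernoulli) response and, by default, uses the logit link. The strongest predictors are CheckingAccountStatus.lt.0 and CheckingAccountStatus.0.to.200 (p < 0.001), and the residual deviance (717.58 on 751 df) is well below the null deviance (980.75 on 799 df).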
pred.glm0.train <- predict(credit.glm0,type="response")
hist(pred.glm0.train)
Your observation: The histogram is skewed left: the bulk of the predicted probabilities sits at the high end, with a tail toward 0. With type = "response", predict() returns the predicted probability \(\hat{P}(y=1)\) from the logistic regression for each training observation, rather than the log-odds.
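The two scales of `predict()` for a logistic model can be checked against each other: `type = "link"` returns the log-odds \(x^\top\hat\beta\), and `type = "response"` applies the inverse logit, which in R is `plogis()`. A minimal sketch on hypothetical simulated data (`x`, `y`, and `fit` are made up for illustration).

```r
set.seed(42)
x <- rnorm(200)
y <- rbinom(200, 1, plogis(0.8 + 1.5 * x))   # hypothetical data from a known logit model
fit <- glm(y ~ x, family = binomial)

eta <- predict(fit, type = "link")      # linear predictor (log-odds)
p   <- predict(fit, type = "response")  # fitted P(y = 1)

# The response scale is the inverse logit of the link scale
print(all.equal(p, plogis(eta)))
print(range(p))   # every probability lies strictly between 0 and 1
```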
pcut1 <- mean(credit.train$Class)  # cutoff = proportion of good customers in the training data
class.glm0.train<- (pred.glm0.train > pcut1) *1
length(GermanCredit$Class)
## [1] 1000
length(class.glm0.train)
## [1] 800
table(credit.train$Class, class.glm0.train, dnn = c("True", "Predicted"))
costfunc = function(obs, pred.p, pcut){
  weight1 = 1   # define the weight for "true=1 but pred=0" (FN)
  weight0 = 1   # define the weight for "true=0 but pred=1" (FP)
  c1 = (obs==1)&(pred.p<pcut)    # indicator for "true=1 but pred=0" (FN)
  c0 = (obs==0)&(pred.p>=pcut)   # indicator for "true=0 but pred=1" (FP)
  cost = mean(weight1*c1 + weight0*c0)  # misclassification with weight
  return(cost)
}
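A small worked example makes the weighting concrete. Four hypothetical observations (made-up labels and probabilities) at a cutoff of 0.5 give one false negative and one false positive, so the cost is (1 + 1) / 4 = 0.5; raising the false-positive weight to 5 makes it (1 + 5) / 4 = 1.5. The helper `costfunc.w` is a hypothetical variant of the function above with the two weights exposed as arguments, so it does not overwrite `costfunc`.

```r
# Same logic as costfunc, with the two weights exposed as arguments
# (costfunc.w is a hypothetical name used only for this illustration)
costfunc.w <- function(obs, pred.p, pcut, weight1 = 1, weight0 = 1) {
  c1 <- (obs == 1) & (pred.p < pcut)   # false negatives: true 1, predicted 0
  c0 <- (obs == 0) & (pred.p >= pcut)  # false positives: true 0, predicted 1
  mean(weight1 * c1 + weight0 * c0)
}

obs    <- c(1, 1, 0, 0)           # hypothetical true labels
pred.p <- c(0.9, 0.3, 0.7, 0.2)   # hypothetical predicted P(y = 1)

costfunc.w(obs, pred.p, pcut = 0.5)               # one FN + one FP: 2 / 4
costfunc.w(obs, pred.p, pcut = 0.5, weight0 = 5)  # FP now costs 5: 6 / 4
```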
p.seq = seq(0.01, 1, 0.01)
cost = rep(0, length(p.seq))
for(i in 1:length(p.seq)){
  # obs must be the training labels so it lines up with the 800 training predictions
  cost[i] = costfunc(obs = credit.train$Class, pred.p = pred.glm0.train, pcut = p.seq[i])
}
optimal.pcut.glm0 = p.seq[which(cost==min(cost))]
print(optimal.pcut.glm0)
## [1] 0.01 0.02 0.03 0.04
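Your observation: With weight0 = 1 and weight1 = 1 the cost is just the ordinary misclassification rate. The grid search evaluates it at every cutoff in p.seq (0.01 to 1 in steps of 0.01) and keeps the cutoff(s) with the lowest cost; which(cost == min(cost)) returns every tied cutoff, which is why more than one value can be printed. The labels passed as obs must be the training labels (credit.train$Class, 800 values); passing all 1000 values of GermanCredit$Class recycles the 800 training predictions against the wrong rows and yields a meaningless cutoff.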
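A minimal sketch of the same grid search with the length check made explicit: labels and predicted probabilities must come from the same rows, otherwise R silently recycles the shorter vector. The data below are hypothetical simulated probabilities and labels, `cost.at` is a hypothetical helper, and `which.min()` is swapped in to return a single cutoff (the first one attaining the minimum) instead of all ties.

```r
set.seed(2023)
n <- 800
pred.p <- runif(n)                  # hypothetical predicted P(y = 1)
obs    <- rbinom(n, 1, pred.p)      # hypothetical labels consistent with pred.p

stopifnot(length(obs) == length(pred.p))   # guard against silent recycling

cost.at <- function(pcut, weight1 = 1, weight0 = 1) {
  fn <- (obs == 1) & (pred.p < pcut)    # true 1, predicted 0
  fp <- (obs == 0) & (pred.p >= pcut)   # true 0, predicted 1
  mean(weight1 * fn + weight0 * fp)
}

p.seq <- seq(0.01, 1, 0.01)
cost  <- sapply(p.seq, cost.at)
best  <- p.seq[which.min(cost)]
print(best)   # with equal weights and calibrated probabilities, expect a value near 0.5
```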
pred.glm0.test<- predict(credit.glm0, newdata = GermanCredit, type="response")
pred.glm0.test.opt <- (pred.glm0.test > 0.5)*1
table(GermanCredit$Class, pred.glm0.test.opt, dnn = c("True", "Predicted"))
## Predicted
## True 0 1
## FALSE 165 135
## TRUE 84 616
MR <- mean(GermanCredit$Class != pred.glm0.test.opt)
print(paste0("MR:",MR))
## [1] "MR:0.219"
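Your observation: The confusion matrix is a 2-by-2 table. Because the true labels are passed to table() first, the rows are the true classes (labeled "True") and the columns are the predicted classes (labeled "Predicted"). With equal misclassification costs, the overall misclassification rate (MR) is a suitable cost for evaluating the model's predictions: (135 + 84) / 1000 = 0.219 at a cutoff of 0.5. Note that newdata = GermanCredit scores all 1000 rows, including the 800 used for training.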
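The misclassification rate can be read straight off the confusion matrix printed above: it is the off-diagonal count divided by the total. A short sketch that re-enters those counts by hand (the matrix `cm` is typed in from the printed output, not recomputed from the model) and also derives sensitivity and specificity, treating Good (TRUE) as the positive class.

```r
# Counts copied from the printed confusion matrix (rows = true, columns = predicted)
cm <- matrix(c(165, 135,
                84, 616),
             nrow = 2, byrow = TRUE,
             dimnames = list(True = c("FALSE", "TRUE"), Predicted = c("0", "1")))

MR          <- 1 - sum(diag(cm)) / sum(cm)           # (135 + 84) / 1000
sensitivity <- cm["TRUE", "1"] / sum(cm["TRUE", ])    # 616 / 700, the TPR
specificity <- cm["FALSE", "0"] / sum(cm["FALSE", ])  # 165 / 300, the TNR

print(round(c(MR = MR, sensitivity = sensitivity, specificity = specificity), 3))
```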
pcut1 <- mean(credit.train$Class)
# get binary prediction
class.glm0.train<- (pred.glm0.train>pcut1)*1
table(GermanCredit$Class, pred.glm0.test.opt, dnn = c("True", "Predicted"))
## Predicted
## True 0 1
## FALSE 165 135
## TRUE 84 616
MR<- mean(GermanCredit$Class!= pred.glm0.test.opt)
print(paste0("MR:",MR))
## [1] "MR:0.219"
Your observation: An MR of 0.219 means that 219 of the 1000 observations (about 22%) are misclassified.
library(ROCR)
pred <- prediction(pred.glm0.test.opt, GermanCredit$Class)
perf <- performance(pred, "tpr", "fpr")
plot(perf, colorize=TRUE)
#Get the AUC
unlist(slot(performance(pred, "auc"), "y.values"))
## [1] 0.715
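Your observation: The AUC is 0.715. The ROC curve plots the TPR (sensitivity) against the FPR (1 - specificity). Because prediction() is given the 0/1 class predictions (pred.glm0.test.opt) rather than the predicted probabilities, this AUC equals the average of sensitivity and specificity at the 0.5 cutoff: (616/700 + 165/300) / 2 = 0.715.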
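When the scores passed to an ROC routine are only 0/1 classes, the AUC collapses to the mean of sensitivity and specificity. A base-R sketch of that check using the rank-based (Mann-Whitney) form of the AUC, which counts tied scores as one half. The label and prediction vectors are rebuilt from the printed confusion-matrix counts, and `auc_rank` is a hypothetical helper, not a ROCR function.

```r
# Rank-based AUC: P(score of a random positive > score of a random negative),
# with ties counted as 1/2 (equivalent to the Mann-Whitney U statistic)
auc_rank <- function(score, label) {
  r     <- rank(score)              # average ranks handle ties
  n_pos <- sum(label == 1)
  n_neg <- sum(label == 0)
  (sum(r[label == 1]) - n_pos * (n_pos + 1) / 2) / (n_pos * n_neg)
}

# Rebuild labels and 0/1 predictions from the confusion-matrix counts
label <- c(rep(0, 165), rep(0, 135), rep(1, 84), rep(1, 616))
pred  <- c(rep(0, 165), rep(1, 135), rep(0, 84), rep(1, 616))

auc_binary <- auc_rank(pred, label)
balanced   <- (616 / 700 + 165 / 300) / 2
print(c(auc = auc_binary, balanced.accuracy = balanced))   # both 0.715
```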
pred.glm0.test<- predict(credit.glm0, newdata = GermanCredit, type="response")
pred.glm0.test.opt <- (pred.glm0.test>0.5)*1
table(GermanCredit$Class, pred.glm0.test.opt, dnn = c("True", "Predicted"))
## Predicted
## True 0 1
## FALSE 165 135
## TRUE 84 616
MR<- mean(GermanCredit$Class!= pred.glm0.test.opt)
print(paste0("MR:",MR))
## [1] "MR:0.219"
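Your observation: The testing sample should only be used to evaluate the model's prediction accuracy, never to fit it. Since newdata = GermanCredit includes the 800 training rows, this MR of 0.219 is not a pure out-of-sample estimate; scoring newdata = credit.test against credit.test$Class would give one.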
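A minimal sketch of a hold-out evaluation in which the test rows never touch the fit: the model is trained on the training rows only, and its `newdata` predictions are scored against test labels of the same length. The data frame `dat` and its columns are hypothetical, simulated for illustration; on the German credit data the analogous call would be `predict(credit.glm0, newdata = credit.test, type = "response")` compared against `credit.test$Class`.

```r
set.seed(2023)
n   <- 1000
dat <- data.frame(x1 = rnorm(n), x2 = rnorm(n))
dat$y <- rbinom(n, 1, plogis(0.9 + 1.2 * dat$x1 - 0.8 * dat$x2))  # hypothetical response

index <- sample(1:n, n * 0.80)
train <- dat[index, ]
test  <- dat[-index, ]

fit <- glm(y ~ x1 + x2, family = binomial, data = train)   # fit on training rows only

p.test     <- predict(fit, newdata = test, type = "response")  # one value per test row
class.test <- (p.test > 0.5) * 1

stopifnot(length(class.test) == nrow(test))
print(table(True = test$y, Predicted = class.test))
MR.test <- mean(test$y != class.test)   # out-of-sample misclassification rate
print(MR.test)
```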
pred.glm0.test<- predict(credit.glm0, newdata = GermanCredit, type="response")
# use the predicted probabilities, not the 0/1 classes, so ROCR can vary the cutoff
pred <- prediction(pred.glm0.test, GermanCredit$Class)
perf <- performance(pred, "tpr", "fpr")
plot(perf, colorize=TRUE)
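Your observation: An ROC curve built from 0/1 class predictions (pred.glm0.test.opt, as in the first ROC chunk) has only one point between (0, 0) and (1, 1), so it appears as two straight segments rather than a smooth curve. Passing the predicted probabilities (pred.glm0.test) to prediction() lets ROCR vary the cutoff and trace the full curve.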
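Why an ROC curve built from 0/1 predictions lacks a clear curve can be seen without ROCR by computing (FPR, TPR) pairs directly: each distinct score is a possible cutoff, so a 0/1 score yields only the two corners plus one interior point, while continuous probabilities yield many points. A base-R sketch on hypothetical simulated scores; `roc_points` is a made-up helper for illustration.

```r
roc_points <- function(score, label) {
  cuts <- sort(unique(c(Inf, score)), decreasing = TRUE)   # Inf gives the (0, 0) corner
  t(sapply(cuts, function(k) {
    pred <- score >= k
    c(fpr = mean(pred[label == 0]), tpr = mean(pred[label == 1]))
  }))
}

set.seed(7)
label <- rbinom(300, 1, 0.7)
prob  <- plogis(rnorm(300, mean = ifelse(label == 1, 1, -0.5)))  # hypothetical probabilities
cls   <- (prob > 0.5) * 1                                         # the same scores as 0/1 classes

roc.prob <- roc_points(prob, label)
roc.cls  <- roc_points(cls, label)

print(nrow(roc.prob))  # many points: a curve
print(roc.cls)         # three points: (0, 0), one interior point, (1, 1)
```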
Now, let’s assume “It is worse to class a customer as good when they are bad (weight = 5), than it is to class a customer as bad when they are good (weight = 1).” Please figure out which weight should be 5 and which weight should be 1. Then define your cost function accordingly!
costfunc = function(obs, pred.p, pcut){
  weight1 = 1   # define the weight for "true=1 but pred=0" (FN): good customer classed as bad
  weight0 = 5   # define the weight for "true=0 but pred=1" (FP): bad customer classed as good
  c1 = (obs==1)&(pred.p<pcut)    # indicator for "true=1 but pred=0" (FN)
  c0 = (obs==0)&(pred.p>=pcut)   # indicator for "true=0 but pred=1" (FP)
  cost = mean(weight1*c1 + weight0*c0)  # misclassification with weight
  return(cost)  # return the weighted cost
} # end of the function
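With weight0 = 5 (a bad customer classed as good, a false positive) and weight1 = 1 (a good customer classed as bad, a false negative), decision theory gives a reference point for the grid search: if \(\hat p\) is a well-calibrated estimate of P(good), predicting good costs \(5(1-\hat p)\) in expectation and predicting bad costs \(1 \cdot \hat p\), so predicting good is cheaper only when \(\hat p > 5/6 \approx 0.83\). A small sketch of that calculation; `bayes_cutoff` is a hypothetical helper, and the grid-search cutoff on real data will only approximate it because fitted probabilities are not perfectly calibrated.

```r
# Expected-cost-minimising cutoff on P(y = 1):
# predict 1 when (1 - p) * weight0 < p * weight1, i.e. p > weight0 / (weight0 + weight1)
bayes_cutoff <- function(weight1, weight0) weight0 / (weight0 + weight1)

bayes_cutoff(weight1 = 1, weight0 = 1)   # symmetric costs: 0.5
bayes_cutoff(weight1 = 1, weight0 = 5)   # false positives five times as costly: 5/6

# Check at a few probabilities which decision has the lower expected cost
p <- c(0.5, 0.8, 0.9)
exp.cost.good <- (1 - p) * 5   # predict good, wrong if truly bad (weight0 = 5)
exp.cost.bad  <- p * 1         # predict bad, wrong if truly good (weight1 = 1)
print(data.frame(p, exp.cost.good, exp.cost.bad, predict.good = exp.cost.good < exp.cost.bad))
```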
p.seq = seq(0.01, 1, 0.01)
cost = rep(0, length(p.seq))
for(i in 1:length(p.seq)){
  # obs must be the training labels so it lines up with the 800 training predictions
  cost[i] = costfunc(obs = credit.train$Class, pred.p = pred.glm0.train, pcut = p.seq[i])
}
## Warning in (obs == 1) & (pred.p < pcut): longer object length is not a multiple
## of shorter object length
## Warning in (obs == 0) & (pred.p >= pcut): longer object length is not a
## multiple of shorter object length
## Warning in (obs == 1) & (pred.p < pcut): longer object length is not a multiple
## of shorter object length
## Warning in (obs == 0) & (pred.p >= pcut): longer object length is not a
## multiple of shorter object length
## Warning in (obs == 1) & (pred.p < pcut): longer object length is not a multiple
## of shorter object length
## Warning in (obs == 0) & (pred.p >= pcut): longer object length is not a
## multiple of shorter object length
## Warning in (obs == 1) & (pred.p < pcut): longer object length is not a multiple
## of shorter object length
## Warning in (obs == 0) & (pred.p >= pcut): longer object length is not a
## multiple of shorter object length
## Warning in (obs == 1) & (pred.p < pcut): longer object length is not a multiple
## of shorter object length
## Warning in (obs == 0) & (pred.p >= pcut): longer object length is not a
## multiple of shorter object length
## Warning in (obs == 1) & (pred.p < pcut): longer object length is not a multiple
## of shorter object length
## Warning in (obs == 0) & (pred.p >= pcut): longer object length is not a
## multiple of shorter object length
## Warning in (obs == 1) & (pred.p < pcut): longer object length is not a multiple
## of shorter object length
## Warning in (obs == 0) & (pred.p >= pcut): longer object length is not a
## multiple of shorter object length
## Warning in (obs == 1) & (pred.p < pcut): longer object length is not a multiple
## of shorter object length
## Warning in (obs == 0) & (pred.p >= pcut): longer object length is not a
## multiple of shorter object length
## Warning in (obs == 1) & (pred.p < pcut): longer object length is not a multiple
## of shorter object length
## Warning in (obs == 0) & (pred.p >= pcut): longer object length is not a
## multiple of shorter object length
## Warning in (obs == 1) & (pred.p < pcut): longer object length is not a multiple
## of shorter object length
## Warning in (obs == 0) & (pred.p >= pcut): longer object length is not a
## multiple of shorter object length
## Warning in (obs == 1) & (pred.p < pcut): longer object length is not a multiple
## of shorter object length
## Warning in (obs == 0) & (pred.p >= pcut): longer object length is not a
## multiple of shorter object length
## Warning in (obs == 1) & (pred.p < pcut): longer object length is not a multiple
## of shorter object length
## Warning in (obs == 0) & (pred.p >= pcut): longer object length is not a
## multiple of shorter object length
## Warning in (obs == 1) & (pred.p < pcut): longer object length is not a multiple
## of shorter object length
## Warning in (obs == 0) & (pred.p >= pcut): longer object length is not a
## multiple of shorter object length
## Warning in (obs == 1) & (pred.p < pcut): longer object length is not a multiple
## of shorter object length
## Warning in (obs == 0) & (pred.p >= pcut): longer object length is not a
## multiple of shorter object length
## Warning in (obs == 1) & (pred.p < pcut): longer object length is not a multiple
## of shorter object length
## Warning in (obs == 0) & (pred.p >= pcut): longer object length is not a
## multiple of shorter object length
## Warning in (obs == 1) & (pred.p < pcut): longer object length is not a multiple
## of shorter object length
## Warning in (obs == 0) & (pred.p >= pcut): longer object length is not a
## multiple of shorter object length
## Warning in (obs == 1) & (pred.p < pcut): longer object length is not a multiple
## of shorter object length
## Warning in (obs == 0) & (pred.p >= pcut): longer object length is not a
## multiple of shorter object length
## Warning in (obs == 1) & (pred.p < pcut): longer object length is not a multiple
## of shorter object length
## Warning in (obs == 0) & (pred.p >= pcut): longer object length is not a
## multiple of shorter object length
## Warning in (obs == 1) & (pred.p < pcut): longer object length is not a multiple
## of shorter object length
## Warning in (obs == 0) & (pred.p >= pcut): longer object length is not a
## multiple of shorter object length
## Warning in (obs == 1) & (pred.p < pcut): longer object length is not a multiple
## of shorter object length
## Warning in (obs == 0) & (pred.p >= pcut): longer object length is not a
## multiple of shorter object length
## Warning in (obs == 1) & (pred.p < pcut): longer object length is not a multiple
## of shorter object length
## Warning in (obs == 0) & (pred.p >= pcut): longer object length is not a
## multiple of shorter object length
## Warning in (obs == 1) & (pred.p < pcut): longer object length is not a multiple
## of shorter object length
## Warning in (obs == 0) & (pred.p >= pcut): longer object length is not a
## multiple of shorter object length
## Warning in (obs == 1) & (pred.p < pcut): longer object length is not a multiple
## of shorter object length
## Warning in (obs == 0) & (pred.p >= pcut): longer object length is not a
## multiple of shorter object length
## Warning in (obs == 1) & (pred.p < pcut): longer object length is not a multiple
## of shorter object length
## Warning in (obs == 0) & (pred.p >= pcut): longer object length is not a
## multiple of shorter object length
## Warning in (obs == 1) & (pred.p < pcut): longer object length is not a multiple
## of shorter object length
## Warning in (obs == 0) & (pred.p >= pcut): longer object length is not a
## multiple of shorter object length
## Warning in (obs == 1) & (pred.p < pcut): longer object length is not a multiple
## of shorter object length
## Warning in (obs == 0) & (pred.p >= pcut): longer object length is not a
## multiple of shorter object length
## Warning in (obs == 1) & (pred.p < pcut): longer object length is not a multiple
## of shorter object length
## Warning in (obs == 0) & (pred.p >= pcut): longer object length is not a
## multiple of shorter object length
## Warning in (obs == 1) & (pred.p < pcut): longer object length is not a multiple
## of shorter object length
## Warning in (obs == 0) & (pred.p >= pcut): longer object length is not a
## multiple of shorter object length
## Warning in (obs == 1) & (pred.p < pcut): longer object length is not a multiple
## of shorter object length
## Warning in (obs == 0) & (pred.p >= pcut): longer object length is not a
## multiple of shorter object length
## Warning in (obs == 1) & (pred.p < pcut): longer object length is not a multiple
## of shorter object length
## Warning in (obs == 0) & (pred.p >= pcut): longer object length is not a
## multiple of shorter object length
## Warning in (obs == 1) & (pred.p < pcut): longer object length is not a multiple
## of shorter object length
## Warning in (obs == 0) & (pred.p >= pcut): longer object length is not a
## multiple of shorter object length
## Warning in (obs == 1) & (pred.p < pcut): longer object length is not a multiple
## of shorter object length
## Warning in (obs == 0) & (pred.p >= pcut): longer object length is not a
## multiple of shorter object length
## Warning in (obs == 1) & (pred.p < pcut): longer object length is not a multiple
## of shorter object length
## Warning in (obs == 0) & (pred.p >= pcut): longer object length is not a
## multiple of shorter object length
## Warning in (obs == 1) & (pred.p < pcut): longer object length is not a multiple
## of shorter object length
## Warning in (obs == 0) & (pred.p >= pcut): longer object length is not a
## multiple of shorter object length
## Warning in (obs == 1) & (pred.p < pcut): longer object length is not a multiple
## of shorter object length
## Warning in (obs == 0) & (pred.p >= pcut): longer object length is not a
## multiple of shorter object length
## Warning in (obs == 1) & (pred.p < pcut): longer object length is not a multiple
## of shorter object length
## Warning in (obs == 0) & (pred.p >= pcut): longer object length is not a
## multiple of shorter object length
## Warning in (obs == 1) & (pred.p < pcut): longer object length is not a multiple
## of shorter object length
## Warning in (obs == 0) & (pred.p >= pcut): longer object length is not a
## multiple of shorter object length
## Warning in (obs == 1) & (pred.p < pcut): longer object length is not a multiple
## of shorter object length
## Warning in (obs == 0) & (pred.p >= pcut): longer object length is not a
## multiple of shorter object length
## Warning in (obs == 1) & (pred.p < pcut): longer object length is not a multiple
## of shorter object length
## Warning in (obs == 0) & (pred.p >= pcut): longer object length is not a
## multiple of shorter object length
## Warning in (obs == 1) & (pred.p < pcut): longer object length is not a multiple
## of shorter object length
## Warning in (obs == 0) & (pred.p >= pcut): longer object length is not a
## multiple of shorter object length
## Warning in (obs == 1) & (pred.p < pcut): longer object length is not a multiple
## of shorter object length
## Warning in (obs == 0) & (pred.p >= pcut): longer object length is not a
## multiple of shorter object length
## Warning in (obs == 1) & (pred.p < pcut): longer object length is not a multiple
## of shorter object length
## Warning in (obs == 0) & (pred.p >= pcut): longer object length is not a
## multiple of shorter object length
## Warning in (obs == 1) & (pred.p < pcut): longer object length is not a multiple
## of shorter object length
## Warning in (obs == 0) & (pred.p >= pcut): longer object length is not a
## multiple of shorter object length
## Warning in (obs == 1) & (pred.p < pcut): longer object length is not a multiple
## of shorter object length
## Warning in (obs == 0) & (pred.p >= pcut): longer object length is not a
## multiple of shorter object length
## Warning in (obs == 1) & (pred.p < pcut): longer object length is not a multiple
## of shorter object length
## Warning in (obs == 0) & (pred.p >= pcut): longer object length is not a
## multiple of shorter object length
## Warning in (obs == 1) & (pred.p < pcut): longer object length is not a multiple
## of shorter object length
## Warning in (obs == 0) & (pred.p >= pcut): longer object length is not a
## multiple of shorter object length
## Warning in (obs == 1) & (pred.p < pcut): longer object length is not a multiple
## of shorter object length
## Warning in (obs == 0) & (pred.p >= pcut): longer object length is not a
## multiple of shorter object length
## Warning in (obs == 1) & (pred.p < pcut): longer object length is not a multiple
## of shorter object length
## Warning in (obs == 0) & (pred.p >= pcut): longer object length is not a
## multiple of shorter object length
## Warning in (obs == 1) & (pred.p < pcut): longer object length is not a multiple
## of shorter object length
## Warning in (obs == 0) & (pred.p >= pcut): longer object length is not a
## multiple of shorter object length
## Warning in (obs == 1) & (pred.p < pcut): longer object length is not a multiple
## of shorter object length
## Warning in (obs == 0) & (pred.p >= pcut): longer object length is not a
## multiple of shorter object length
## Warning in (obs == 1) & (pred.p < pcut): longer object length is not a multiple
## of shorter object length
## Warning in (obs == 0) & (pred.p >= pcut): longer object length is not a
## multiple of shorter object length
## Warning in (obs == 1) & (pred.p < pcut): longer object length is not a multiple
## of shorter object length
## Warning in (obs == 0) & (pred.p >= pcut): longer object length is not a
## multiple of shorter object length
## Warning in (obs == 1) & (pred.p < pcut): longer object length is not a multiple
## of shorter object length
## Warning in (obs == 0) & (pred.p >= pcut): longer object length is not a
## multiple of shorter object length
## Warning in (obs == 1) & (pred.p < pcut): longer object length is not a multiple
## of shorter object length
## Warning in (obs == 0) & (pred.p >= pcut): longer object length is not a
## multiple of shorter object length
## Warning in (obs == 1) & (pred.p < pcut): longer object length is not a multiple
## of shorter object length
## Warning in (obs == 0) & (pred.p >= pcut): longer object length is not a
## multiple of shorter object length
## Warning in (obs == 1) & (pred.p < pcut): longer object length is not a multiple
## of shorter object length
## Warning in (obs == 0) & (pred.p >= pcut): longer object length is not a
## multiple of shorter object length
## Warning in (obs == 1) & (pred.p < pcut): longer object length is not a multiple
## of shorter object length
## Warning in (obs == 0) & (pred.p >= pcut): longer object length is not a
## multiple of shorter object length
## Warning in (obs == 1) & (pred.p < pcut): longer object length is not a multiple
## of shorter object length
## Warning in (obs == 0) & (pred.p >= pcut): longer object length is not a
## multiple of shorter object length
## Warning in (obs == 1) & (pred.p < pcut): longer object length is not a multiple
## of shorter object length
## Warning in (obs == 0) & (pred.p >= pcut): longer object length is not a
## multiple of shorter object length
## Warning in (obs == 1) & (pred.p < pcut): longer object length is not a multiple
## of shorter object length
## Warning in (obs == 0) & (pred.p >= pcut): longer object length is not a
## multiple of shorter object length
## Warning in (obs == 1) & (pred.p < pcut): longer object length is not a multiple
## of shorter object length
## Warning in (obs == 0) & (pred.p >= pcut): longer object length is not a
## multiple of shorter object length
## Warning in (obs == 1) & (pred.p < pcut): longer object length is not a multiple
## of shorter object length
## Warning in (obs == 0) & (pred.p >= pcut): longer object length is not a
## multiple of shorter object length
## Warning in (obs == 1) & (pred.p < pcut): longer object length is not a multiple
## of shorter object length
## Warning in (obs == 0) & (pred.p >= pcut): longer object length is not a
## multiple of shorter object length
## Warning in (obs == 1) & (pred.p < pcut): longer object length is not a multiple
## of shorter object length
## Warning in (obs == 0) & (pred.p >= pcut): longer object length is not a
## multiple of shorter object length
## Warning in (obs == 1) & (pred.p < pcut): longer object length is not a multiple
## of shorter object length
## Warning in (obs == 0) & (pred.p >= pcut): longer object length is not a
## multiple of shorter object length
## Warning in (obs == 1) & (pred.p < pcut): longer object length is not a multiple
## of shorter object length
## Warning in (obs == 0) & (pred.p >= pcut): longer object length is not a
## multiple of shorter object length
## Warning in (obs == 1) & (pred.p < pcut): longer object length is not a multiple
## of shorter object length
## Warning in (obs == 0) & (pred.p >= pcut): longer object length is not a
## multiple of shorter object length
## Warning in (obs == 1) & (pred.p < pcut): longer object length is not a multiple
## of shorter object length
## Warning in (obs == 0) & (pred.p >= pcut): longer object length is not a
## multiple of shorter object length
## Warning in (obs == 1) & (pred.p < pcut): longer object length is not a multiple
## of shorter object length
## Warning in (obs == 0) & (pred.p >= pcut): longer object length is not a
## multiple of shorter object length
## Warning in (obs == 1) & (pred.p < pcut): longer object length is not a multiple
## of shorter object length
## Warning in (obs == 0) & (pred.p >= pcut): longer object length is not a
## multiple of shorter object length
## Warning in (obs == 1) & (pred.p < pcut): longer object length is not a multiple
## of shorter object length
## Warning in (obs == 0) & (pred.p >= pcut): longer object length is not a
## multiple of shorter object length
## Warning in (obs == 1) & (pred.p < pcut): longer object length is not a multiple
## of shorter object length
## Warning in (obs == 0) & (pred.p >= pcut): longer object length is not a
## multiple of shorter object length
## Warning in (obs == 1) & (pred.p < pcut): longer object length is not a multiple
## of shorter object length
## Warning in (obs == 0) & (pred.p >= pcut): longer object length is not a
## multiple of shorter object length
plot(p.seq, cost)  # cost of each candidate cut-off
optimal.pcut.glm0 = p.seq[which(cost==min(cost))]  # cut-off(s) with the lowest cost
print(optimal.pcut.glm0)
## [1] 0.99
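Your observation: The plot of cost against p.seq shows that the cost is lowest at a cut-off of 0.99. However, the cost function raised length-mismatch warnings (obs and pred.p have different lengths, so R recycled the shorter vector), which means labels and predicted probabilities were misaligned and this cut-off should be treated as unreliable.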
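The grid search behind this plot can be sketched as follows. This is a minimal sketch on simulated data, not the original code: `cost_fn`, `weight1` and `weight0` are hypothetical names, and both weights default to 1 as an assumption. Unlike the computation above, it stops with an error when `obs` and `pred.p` differ in length instead of recycling.

```r
# Sketch: pick the probability cut-off that minimises a (possibly asymmetric) cost.
# cost_fn, weight1 and weight0 are hypothetical; the default weights are assumptions.
cost_fn <- function(obs, pred.p, pcut, weight1 = 1, weight0 = 1) {
  # Fail loudly instead of silently recycling a shorter vector
  stopifnot(length(obs) == length(pred.p))
  c1 <- (obs == 1) & (pred.p < pcut)   # observed 1, classified as 0
  c0 <- (obs == 0) & (pred.p >= pcut)  # observed 0, classified as 1
  mean(weight1 * c1 + weight0 * c0)
}

# Simulated labels and probabilities (NOT the German credit data)
set.seed(1)
n <- 200
obs <- rbinom(n, 1, 0.7)
pred.p <- plogis(qlogis(0.7) + ifelse(obs == 1, 0.8, -0.8) + rnorm(n))

p.seq <- seq(0.01, 1, 0.01)                                  # candidate cut-offs
cost <- sapply(p.seq, function(p) cost_fn(obs, pred.p, p))   # one cost per cut-off
optimal.pcut <- p.seq[which.min(cost)]                       # first lowest-cost cut-off
```

`plot(p.seq, cost, type = "l")` draws the same kind of curve. `which.min()` returns the first cut-off with the lowest cost, whereas `which(cost == min(cost))` can return several values when costs tie.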
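class.glm0.train<- (pred.glm0.train>pcut1)*1  # 1 if predicted probability exceeds pcut1, else 0
#table(GermanCredit$Class, class.glm0.train, dnn = c("True", "Predicted"))
# Caution: GermanCredit$Class has all 1000 rows, but class.glm0.train only covers the
# training rows, so R recycles the shorter vector (see the warning below).
MR<- mean(GermanCredit$Class!=class.glm0.train)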
## Warning in GermanCredit$Class != class.glm0.train: longer object length is not
## a multiple of shorter object length
print(paste0("MR:",MR))
## [1] "MR:0.453"
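Your observation: The warning shows that GermanCredit$Class (all 1000 observations) and class.glm0.train (training observations only) have different lengths. R recycles the shorter vector to make the comparison, so labels and predictions are misaligned and the in-sample MR of 0.453 is not meaningful.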
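A small self-contained illustration of what happened: `!=` recycles the shorter vector and only warns, so a number still comes out. The hypothetical helper `misclass_rate` below refuses to compare vectors of different lengths; the toy vectors are made up for illustration.

```r
obs  <- c(1, 0, 1, 1, 0)   # 5 observed labels
pred <- c(1, 0, 0)         # only 3 predicted classes

# R recycles pred to c(1, 0, 0, 1, 0) and merely warns, so a misleading MR still comes out
mr_recycled <- suppressWarnings(mean(obs != pred))
mr_recycled   # 0.2

# Hypothetical guarded version: stop instead of recycling
misclass_rate <- function(obs, pred.class) {
  if (length(obs) != length(pred.class)) {
    stop("obs has ", length(obs), " values but pred.class has ",
         length(pred.class), "; they must describe the same rows")
  }
  mean(obs != pred.class)
}
```

Applied to the code above, this means passing the `Class` column of the training data frame created earlier (its name is not shown in this section) together with `class.glm0.train`.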
class.glm0.test<- (pred.glm0.test>pcut1)*1
table(GermanCredit$Class, class.glm0.test, dnn = c("True", "Predicted"))
##        Predicted
## True      0   1
##   FALSE 231  69
##   TRUE  174 526
MR<- mean(GermanCredit$Class!=class.glm0.test)
print(paste0("MR:",MR))
## [1] "MR:0.243"
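Your observation: On this data the MR is 0.243, that is, 243 of the 1000 observations (69 + 174) are misclassified. Unlike the in-sample MR above, this comparison uses vectors of equal length, so no recycling occurs. Note, however, that the counts in the table sum to 1000, the size of the full data set, so these "test" predictions appear to cover every observation rather than only a held-out test set.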
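As a check, the MR can be recomputed directly from the confusion-table counts above (rows are the observed `Class`, columns the predicted class). The two per-class error rates are added for context; they are simple arithmetic on the same counts.

```r
# Counts copied from the table above; matrix() fills column by column
conf <- matrix(c(231, 174, 69, 526), nrow = 2,
               dimnames = list(True = c("FALSE", "TRUE"),
                               Predicted = c("0", "1")))

mr <- (conf["FALSE", "1"] + conf["TRUE", "0"]) / sum(conf)
mr   # 0.243

# Share of bad credits (Class FALSE) classified as good, and vice versa
bad_as_good <- conf["FALSE", "1"] / sum(conf["FALSE", ])   # 69 / 300
good_as_bad <- conf["TRUE", "0"] / sum(conf["TRUE", ])     # 174 / 700
```

Row `FALSE` corresponds to a "Bad" credit, because `Class` was recoded earlier as `Class == "Good"`.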
Summarize your findings, including the optimal probability cut-off, MR and AUC (if calculated) for both in-sample and out-of-sample data. Discuss what you observed and make some suggestions on how we can improve the model.
The logistic regression model (glm0) was fitted to the training set with all variables as predictors, using the binomial family.
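A minimal sketch of that fitting step, run on simulated data rather than GermanCredit so it is self-contained. The names `sim`, `train`, `test`, the 70/30 split and the placeholder cut-off of 0.5 are assumptions for illustration. The point is that each set's predictions are compared only with that same set's labels, so the lengths always match.

```r
set.seed(2023)
n <- 1000
sim <- data.frame(x1 = rnorm(n), x2 = rnorm(n))
sim$Class <- rbinom(n, 1, plogis(0.8 + 1.2 * sim$x1 - 0.7 * sim$x2))  # 0/1 response

idx   <- sample(seq_len(n), size = round(0.7 * n))   # 70% of rows for training
train <- sim[idx, ]
test  <- sim[-idx, ]

# Logistic regression on all predictors with the binomial family
fit <- glm(Class ~ ., family = binomial, data = train)

pred.train <- predict(fit, type = "response")                  # one value per training row
pred.test  <- predict(fit, newdata = test, type = "response")  # one value per test row

pcut <- 0.5   # placeholder cut-off, not the value chosen in this report
class.train <- as.numeric(pred.train > pcut)
class.test  <- as.numeric(pred.test > pcut)

# Labels and predictions come from the same rows, so no recycling can occur
MR.train <- mean(train$Class != class.train)
MR.test  <- mean(test$Class != class.test)
```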
The optimal probability cut-off was found by minimizing the cost function with a weight of 1, which gave a cut-off of 0.99. The MR for the training set is 21%, the MR on the test data is 0.243, and the AUC is 0.715. The ROC curves for both the training and testing data do not show a clear shape, which may indicate that the model's predictive performance is not very strong. The first improvement is to fix the length mismatch: compare training-set predictions only with the training-set Class, and test-set predictions only with the test-set Class, so that no vector is recycled, and then recompute the optimal cut-off, MR, and AUC.
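The AUC can be cross-checked without any package using its rank (Mann-Whitney) form: it is the probability that a randomly chosen observation with Class 1 receives a higher predicted probability than a randomly chosen observation with Class 0, with ties counting one half. `auc_rank` is a hypothetical helper, not the function that produced the 0.715 above. Unlike the MR, the AUC does not depend on the cut-off.

```r
# Hypothetical helper: AUC via the rank-sum (Mann-Whitney) formula
auc_rank <- function(obs, pred.p) {
  stopifnot(length(obs) == length(pred.p))   # same rows, no recycling
  obs <- as.numeric(obs)                     # accepts logical or 0/1 labels
  r  <- rank(pred.p)                         # average ranks give ties half credit
  n1 <- sum(obs == 1)
  n0 <- sum(obs == 0)
  (sum(r[obs == 1]) - n1 * (n1 + 1) / 2) / (n1 * n0)
}

# Hand-checkable example: 3 of the 4 positive/negative pairs are ordered correctly
auc_rank(c(0, 0, 1, 1), c(0.1, 0.4, 0.35, 0.8))   # 0.75
```

Calling it once with the training labels and training predictions, and once with the test labels and test predictions, gives the in-sample and out-of-sample AUCs that the summary asks for.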