Refer to http://archive.ics.uci.edu/ml/datasets/Statlog+(German+Credit+Data)
for the variable descriptions. The response variable is Class
and all other variables are predictors.
Run the following code only once to install the package
caret. The German credit scoring data is
provided in that package.
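A minimal sketch of the one-time installation step, assuming the package is installed from the configured CRAN mirror. The helper name `ensure_installed` is hypothetical and not part of caret; it skips the download when the package is already present, so the chunk can be re-run safely.

```r
# Hypothetical helper (not part of caret): install a package only if it is missing.
ensure_installed <- function(pkg) {
  if (!requireNamespace(pkg, quietly = TRUE)) {
    install.packages(pkg)  # downloads from the configured CRAN mirror
  }
  invisible(requireNamespace(pkg, quietly = TRUE))
}

# "stats" ships with R, so this call returns without installing anything.
ensure_installed("stats")

# For this assignment the call would be:
# ensure_installed("caret")
```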
library(caret) #this package contains the german data with its numeric format
data(GermanCredit)
GermanCredit$Class <- GermanCredit$Class == "Good" # use this code to convert `Class` into True or False (equivalent to 1 or 0)
str(GermanCredit)
## 'data.frame': 1000 obs. of 62 variables:
## $ Duration : int 6 48 12 42 24 36 24 36 12 30 ...
## $ Amount : int 1169 5951 2096 7882 4870 9055 2835 6948 3059 5234 ...
## $ InstallmentRatePercentage : int 4 2 2 2 3 2 3 2 2 4 ...
## $ ResidenceDuration : int 4 2 3 4 4 4 4 2 4 2 ...
## $ Age : int 67 22 49 45 53 35 53 35 61 28 ...
## $ NumberExistingCredits : int 2 1 1 1 2 1 1 1 1 2 ...
## $ NumberPeopleMaintenance : int 1 1 2 2 2 2 1 1 1 1 ...
## $ Telephone : num 0 1 1 1 1 0 1 0 1 1 ...
## $ ForeignWorker : num 1 1 1 1 1 1 1 1 1 1 ...
## $ Class : logi TRUE FALSE TRUE TRUE FALSE TRUE ...
## $ CheckingAccountStatus.lt.0 : num 1 0 0 1 1 0 0 0 0 0 ...
## $ CheckingAccountStatus.0.to.200 : num 0 1 0 0 0 0 0 1 0 1 ...
## $ CheckingAccountStatus.gt.200 : num 0 0 0 0 0 0 0 0 0 0 ...
## $ CheckingAccountStatus.none : num 0 0 1 0 0 1 1 0 1 0 ...
## $ CreditHistory.NoCredit.AllPaid : num 0 0 0 0 0 0 0 0 0 0 ...
## $ CreditHistory.ThisBank.AllPaid : num 0 0 0 0 0 0 0 0 0 0 ...
## $ CreditHistory.PaidDuly : num 0 1 0 1 0 1 1 1 1 0 ...
## $ CreditHistory.Delay : num 0 0 0 0 1 0 0 0 0 0 ...
## $ CreditHistory.Critical : num 1 0 1 0 0 0 0 0 0 1 ...
## $ Purpose.NewCar : num 0 0 0 0 1 0 0 0 0 1 ...
## $ Purpose.UsedCar : num 0 0 0 0 0 0 0 1 0 0 ...
## $ Purpose.Furniture.Equipment : num 0 0 0 1 0 0 1 0 0 0 ...
## $ Purpose.Radio.Television : num 1 1 0 0 0 0 0 0 1 0 ...
## $ Purpose.DomesticAppliance : num 0 0 0 0 0 0 0 0 0 0 ...
## $ Purpose.Repairs : num 0 0 0 0 0 0 0 0 0 0 ...
## $ Purpose.Education : num 0 0 1 0 0 1 0 0 0 0 ...
## $ Purpose.Vacation : num 0 0 0 0 0 0 0 0 0 0 ...
## $ Purpose.Retraining : num 0 0 0 0 0 0 0 0 0 0 ...
## $ Purpose.Business : num 0 0 0 0 0 0 0 0 0 0 ...
## $ Purpose.Other : num 0 0 0 0 0 0 0 0 0 0 ...
## $ SavingsAccountBonds.lt.100 : num 0 1 1 1 1 0 0 1 0 1 ...
## $ SavingsAccountBonds.100.to.500 : num 0 0 0 0 0 0 0 0 0 0 ...
## $ SavingsAccountBonds.500.to.1000 : num 0 0 0 0 0 0 1 0 0 0 ...
## $ SavingsAccountBonds.gt.1000 : num 0 0 0 0 0 0 0 0 1 0 ...
## $ SavingsAccountBonds.Unknown : num 1 0 0 0 0 1 0 0 0 0 ...
## $ EmploymentDuration.lt.1 : num 0 0 0 0 0 0 0 0 0 0 ...
## $ EmploymentDuration.1.to.4 : num 0 1 0 0 1 1 0 1 0 0 ...
## $ EmploymentDuration.4.to.7 : num 0 0 1 1 0 0 0 0 1 0 ...
## $ EmploymentDuration.gt.7 : num 1 0 0 0 0 0 1 0 0 0 ...
## $ EmploymentDuration.Unemployed : num 0 0 0 0 0 0 0 0 0 1 ...
## $ Personal.Male.Divorced.Seperated : num 0 0 0 0 0 0 0 0 1 0 ...
## $ Personal.Female.NotSingle : num 0 1 0 0 0 0 0 0 0 0 ...
## $ Personal.Male.Single : num 1 0 1 1 1 1 1 1 0 0 ...
## $ Personal.Male.Married.Widowed : num 0 0 0 0 0 0 0 0 0 1 ...
## $ Personal.Female.Single : num 0 0 0 0 0 0 0 0 0 0 ...
## $ OtherDebtorsGuarantors.None : num 1 1 1 0 1 1 1 1 1 1 ...
## $ OtherDebtorsGuarantors.CoApplicant : num 0 0 0 0 0 0 0 0 0 0 ...
## $ OtherDebtorsGuarantors.Guarantor : num 0 0 0 1 0 0 0 0 0 0 ...
## $ Property.RealEstate : num 1 1 1 0 0 0 0 0 1 0 ...
## $ Property.Insurance : num 0 0 0 1 0 0 1 0 0 0 ...
## $ Property.CarOther : num 0 0 0 0 0 0 0 1 0 1 ...
## $ Property.Unknown : num 0 0 0 0 1 1 0 0 0 0 ...
## $ OtherInstallmentPlans.Bank : num 0 0 0 0 0 0 0 0 0 0 ...
## $ OtherInstallmentPlans.Stores : num 0 0 0 0 0 0 0 0 0 0 ...
## $ OtherInstallmentPlans.None : num 1 1 1 1 1 1 1 1 1 1 ...
## $ Housing.Rent : num 0 0 0 0 0 0 0 1 0 0 ...
## $ Housing.Own : num 1 1 1 0 0 0 1 0 1 1 ...
## $ Housing.ForFree : num 0 0 0 1 1 1 0 0 0 0 ...
## $ Job.UnemployedUnskilled : num 0 0 0 0 0 0 0 0 0 0 ...
## $ Job.UnskilledResident : num 0 0 1 0 0 1 0 0 1 0 ...
## $ Job.SkilledEmployee : num 1 1 0 1 1 0 1 0 0 0 ...
## $ Job.Management.SelfEmp.HighlyQualified: num 0 0 0 0 0 0 0 1 0 1 ...
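Your observation: I can see that GermanCredit has 1000 observations and 62 variables. Seven are integer-valued numeric variables, Class is the logical response, and the remaining variables are binary (0/1) indicators, most of them dummy variables created from categorical variables.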
# Optional: drop variables that provide no information in the data
GermanCredit = GermanCredit[,-c(14,19,27,30,35,40,44,45,48,52,55,58,62)] #don't run this code twice!! Think about why.
str(GermanCredit)
## 'data.frame': 1000 obs. of 49 variables:
## $ Duration : int 6 48 12 42 24 36 24 36 12 30 ...
## $ Amount : int 1169 5951 2096 7882 4870 9055 2835 6948 3059 5234 ...
## $ InstallmentRatePercentage : int 4 2 2 2 3 2 3 2 2 4 ...
## $ ResidenceDuration : int 4 2 3 4 4 4 4 2 4 2 ...
## $ Age : int 67 22 49 45 53 35 53 35 61 28 ...
## $ NumberExistingCredits : int 2 1 1 1 2 1 1 1 1 2 ...
## $ NumberPeopleMaintenance : int 1 1 2 2 2 2 1 1 1 1 ...
## $ Telephone : num 0 1 1 1 1 0 1 0 1 1 ...
## $ ForeignWorker : num 1 1 1 1 1 1 1 1 1 1 ...
## $ Class : logi TRUE FALSE TRUE TRUE FALSE TRUE ...
## $ CheckingAccountStatus.lt.0 : num 1 0 0 1 1 0 0 0 0 0 ...
## $ CheckingAccountStatus.0.to.200 : num 0 1 0 0 0 0 0 1 0 1 ...
## $ CheckingAccountStatus.gt.200 : num 0 0 0 0 0 0 0 0 0 0 ...
## $ CreditHistory.NoCredit.AllPaid : num 0 0 0 0 0 0 0 0 0 0 ...
## $ CreditHistory.ThisBank.AllPaid : num 0 0 0 0 0 0 0 0 0 0 ...
## $ CreditHistory.PaidDuly : num 0 1 0 1 0 1 1 1 1 0 ...
## $ CreditHistory.Delay : num 0 0 0 0 1 0 0 0 0 0 ...
## $ Purpose.NewCar : num 0 0 0 0 1 0 0 0 0 1 ...
## $ Purpose.UsedCar : num 0 0 0 0 0 0 0 1 0 0 ...
## $ Purpose.Furniture.Equipment : num 0 0 0 1 0 0 1 0 0 0 ...
## $ Purpose.Radio.Television : num 1 1 0 0 0 0 0 0 1 0 ...
## $ Purpose.DomesticAppliance : num 0 0 0 0 0 0 0 0 0 0 ...
## $ Purpose.Repairs : num 0 0 0 0 0 0 0 0 0 0 ...
## $ Purpose.Education : num 0 0 1 0 0 1 0 0 0 0 ...
## $ Purpose.Retraining : num 0 0 0 0 0 0 0 0 0 0 ...
## $ Purpose.Business : num 0 0 0 0 0 0 0 0 0 0 ...
## $ SavingsAccountBonds.lt.100 : num 0 1 1 1 1 0 0 1 0 1 ...
## $ SavingsAccountBonds.100.to.500 : num 0 0 0 0 0 0 0 0 0 0 ...
## $ SavingsAccountBonds.500.to.1000 : num 0 0 0 0 0 0 1 0 0 0 ...
## $ SavingsAccountBonds.gt.1000 : num 0 0 0 0 0 0 0 0 1 0 ...
## $ EmploymentDuration.lt.1 : num 0 0 0 0 0 0 0 0 0 0 ...
## $ EmploymentDuration.1.to.4 : num 0 1 0 0 1 1 0 1 0 0 ...
## $ EmploymentDuration.4.to.7 : num 0 0 1 1 0 0 0 0 1 0 ...
## $ EmploymentDuration.gt.7 : num 1 0 0 0 0 0 1 0 0 0 ...
## $ Personal.Male.Divorced.Seperated : num 0 0 0 0 0 0 0 0 1 0 ...
## $ Personal.Female.NotSingle : num 0 1 0 0 0 0 0 0 0 0 ...
## $ Personal.Male.Single : num 1 0 1 1 1 1 1 1 0 0 ...
## $ OtherDebtorsGuarantors.None : num 1 1 1 0 1 1 1 1 1 1 ...
## $ OtherDebtorsGuarantors.CoApplicant: num 0 0 0 0 0 0 0 0 0 0 ...
## $ Property.RealEstate : num 1 1 1 0 0 0 0 0 1 0 ...
## $ Property.Insurance : num 0 0 0 1 0 0 1 0 0 0 ...
## $ Property.CarOther : num 0 0 0 0 0 0 0 1 0 1 ...
## $ OtherInstallmentPlans.Bank : num 0 0 0 0 0 0 0 0 0 0 ...
## $ OtherInstallmentPlans.Stores : num 0 0 0 0 0 0 0 0 0 0 ...
## $ Housing.Rent : num 0 0 0 0 0 0 0 1 0 0 ...
## $ Housing.Own : num 1 1 1 0 0 0 1 0 1 1 ...
## $ Job.UnemployedUnskilled : num 0 0 0 0 0 0 0 0 0 0 ...
## $ Job.UnskilledResident : num 0 0 1 0 0 1 0 0 1 0 ...
## $ Job.SkilledEmployee : num 1 1 0 1 1 0 1 0 0 0 ...
summary(GermanCredit)
## Duration Amount InstallmentRatePercentage ResidenceDuration
## Min. : 4.0 Min. : 250 Min. :1.000 Min. :1.000
## 1st Qu.:12.0 1st Qu.: 1366 1st Qu.:2.000 1st Qu.:2.000
## Median :18.0 Median : 2320 Median :3.000 Median :3.000
## Mean :20.9 Mean : 3271 Mean :2.973 Mean :2.845
## 3rd Qu.:24.0 3rd Qu.: 3972 3rd Qu.:4.000 3rd Qu.:4.000
## Max. :72.0 Max. :18424 Max. :4.000 Max. :4.000
## Age NumberExistingCredits NumberPeopleMaintenance Telephone
## Min. :19.00 Min. :1.000 Min. :1.000 Min. :0.000
## 1st Qu.:27.00 1st Qu.:1.000 1st Qu.:1.000 1st Qu.:0.000
## Median :33.00 Median :1.000 Median :1.000 Median :1.000
## Mean :35.55 Mean :1.407 Mean :1.155 Mean :0.596
## 3rd Qu.:42.00 3rd Qu.:2.000 3rd Qu.:1.000 3rd Qu.:1.000
## Max. :75.00 Max. :4.000 Max. :2.000 Max. :1.000
## ForeignWorker Class CheckingAccountStatus.lt.0
## Min. :0.000 Mode :logical Min. :0.000
## 1st Qu.:1.000 FALSE:300 1st Qu.:0.000
## Median :1.000 TRUE :700 Median :0.000
## Mean :0.963 Mean :0.274
## 3rd Qu.:1.000 3rd Qu.:1.000
## Max. :1.000 Max. :1.000
## CheckingAccountStatus.0.to.200 CheckingAccountStatus.gt.200
## Min. :0.000 Min. :0.000
## 1st Qu.:0.000 1st Qu.:0.000
## Median :0.000 Median :0.000
## Mean :0.269 Mean :0.063
## 3rd Qu.:1.000 3rd Qu.:0.000
## Max. :1.000 Max. :1.000
## CreditHistory.NoCredit.AllPaid CreditHistory.ThisBank.AllPaid
## Min. :0.00 Min. :0.000
## 1st Qu.:0.00 1st Qu.:0.000
## Median :0.00 Median :0.000
## Mean :0.04 Mean :0.049
## 3rd Qu.:0.00 3rd Qu.:0.000
## Max. :1.00 Max. :1.000
## CreditHistory.PaidDuly CreditHistory.Delay Purpose.NewCar Purpose.UsedCar
## Min. :0.00 Min. :0.000 Min. :0.000 Min. :0.000
## 1st Qu.:0.00 1st Qu.:0.000 1st Qu.:0.000 1st Qu.:0.000
## Median :1.00 Median :0.000 Median :0.000 Median :0.000
## Mean :0.53 Mean :0.088 Mean :0.234 Mean :0.103
## 3rd Qu.:1.00 3rd Qu.:0.000 3rd Qu.:0.000 3rd Qu.:0.000
## Max. :1.00 Max. :1.000 Max. :1.000 Max. :1.000
## Purpose.Furniture.Equipment Purpose.Radio.Television Purpose.DomesticAppliance
## Min. :0.000 Min. :0.00 Min. :0.000
## 1st Qu.:0.000 1st Qu.:0.00 1st Qu.:0.000
## Median :0.000 Median :0.00 Median :0.000
## Mean :0.181 Mean :0.28 Mean :0.012
## 3rd Qu.:0.000 3rd Qu.:1.00 3rd Qu.:0.000
## Max. :1.000 Max. :1.00 Max. :1.000
## Purpose.Repairs Purpose.Education Purpose.Retraining Purpose.Business
## Min. :0.000 Min. :0.00 Min. :0.000 Min. :0.000
## 1st Qu.:0.000 1st Qu.:0.00 1st Qu.:0.000 1st Qu.:0.000
## Median :0.000 Median :0.00 Median :0.000 Median :0.000
## Mean :0.022 Mean :0.05 Mean :0.009 Mean :0.097
## 3rd Qu.:0.000 3rd Qu.:0.00 3rd Qu.:0.000 3rd Qu.:0.000
## Max. :1.000 Max. :1.00 Max. :1.000 Max. :1.000
## SavingsAccountBonds.lt.100 SavingsAccountBonds.100.to.500
## Min. :0.000 Min. :0.000
## 1st Qu.:0.000 1st Qu.:0.000
## Median :1.000 Median :0.000
## Mean :0.603 Mean :0.103
## 3rd Qu.:1.000 3rd Qu.:0.000
## Max. :1.000 Max. :1.000
## SavingsAccountBonds.500.to.1000 SavingsAccountBonds.gt.1000
## Min. :0.000 Min. :0.000
## 1st Qu.:0.000 1st Qu.:0.000
## Median :0.000 Median :0.000
## Mean :0.063 Mean :0.048
## 3rd Qu.:0.000 3rd Qu.:0.000
## Max. :1.000 Max. :1.000
## EmploymentDuration.lt.1 EmploymentDuration.1.to.4 EmploymentDuration.4.to.7
## Min. :0.000 Min. :0.000 Min. :0.000
## 1st Qu.:0.000 1st Qu.:0.000 1st Qu.:0.000
## Median :0.000 Median :0.000 Median :0.000
## Mean :0.172 Mean :0.339 Mean :0.174
## 3rd Qu.:0.000 3rd Qu.:1.000 3rd Qu.:0.000
## Max. :1.000 Max. :1.000 Max. :1.000
## EmploymentDuration.gt.7 Personal.Male.Divorced.Seperated
## Min. :0.000 Min. :0.00
## 1st Qu.:0.000 1st Qu.:0.00
## Median :0.000 Median :0.00
## Mean :0.253 Mean :0.05
## 3rd Qu.:1.000 3rd Qu.:0.00
## Max. :1.000 Max. :1.00
## Personal.Female.NotSingle Personal.Male.Single OtherDebtorsGuarantors.None
## Min. :0.00 Min. :0.000 Min. :0.000
## 1st Qu.:0.00 1st Qu.:0.000 1st Qu.:1.000
## Median :0.00 Median :1.000 Median :1.000
## Mean :0.31 Mean :0.548 Mean :0.907
## 3rd Qu.:1.00 3rd Qu.:1.000 3rd Qu.:1.000
## Max. :1.00 Max. :1.000 Max. :1.000
## OtherDebtorsGuarantors.CoApplicant Property.RealEstate Property.Insurance
## Min. :0.000 Min. :0.000 Min. :0.000
## 1st Qu.:0.000 1st Qu.:0.000 1st Qu.:0.000
## Median :0.000 Median :0.000 Median :0.000
## Mean :0.041 Mean :0.282 Mean :0.232
## 3rd Qu.:0.000 3rd Qu.:1.000 3rd Qu.:0.000
## Max. :1.000 Max. :1.000 Max. :1.000
## Property.CarOther OtherInstallmentPlans.Bank OtherInstallmentPlans.Stores
## Min. :0.000 Min. :0.000 Min. :0.000
## 1st Qu.:0.000 1st Qu.:0.000 1st Qu.:0.000
## Median :0.000 Median :0.000 Median :0.000
## Mean :0.332 Mean :0.139 Mean :0.047
## 3rd Qu.:1.000 3rd Qu.:0.000 3rd Qu.:0.000
## Max. :1.000 Max. :1.000 Max. :1.000
## Housing.Rent Housing.Own Job.UnemployedUnskilled Job.UnskilledResident
## Min. :0.000 Min. :0.000 Min. :0.000 Min. :0.0
## 1st Qu.:0.000 1st Qu.:0.000 1st Qu.:0.000 1st Qu.:0.0
## Median :0.000 Median :1.000 Median :0.000 Median :0.0
## Mean :0.179 Mean :0.713 Mean :0.022 Mean :0.2
## 3rd Qu.:0.000 3rd Qu.:1.000 3rd Qu.:0.000 3rd Qu.:0.0
## Max. :1.000 Max. :1.000 Max. :1.000 Max. :1.0
## Job.SkilledEmployee
## Min. :0.00
## 1st Qu.:0.00
## Median :1.00
## Mean :0.63
## 3rd Qu.:1.00
## Max. :1.00
Your observation: After dropping the 13 variables that provide no information in the data, the number of observations remains 1000 and 49 of the original 62 variables are left.
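The index-based drop above should not be run twice because the column positions shift after the first run, so a second run would remove a different set of columns. A sketch of a name-based alternative that can be re-run safely, shown on a small made-up data frame (the values are illustrative assumptions; only the column names are taken from the str() output above):

```r
# Toy data frame; values are made up, names mirror a few GermanCredit columns.
toy <- data.frame(
  Duration                   = c(6, 48, 12),
  CheckingAccountStatus.lt.0 = c(1, 0, 0),
  CheckingAccountStatus.none = c(0, 0, 1),
  Purpose.Vacation           = c(0, 0, 0)
)

# Columns to drop, identified by name rather than by position.
drop_cols <- c("CheckingAccountStatus.none", "Purpose.Vacation")

# setdiff() keeps every column not listed in drop_cols.
toy <- toy[, setdiff(names(toy), drop_cols), drop = FALSE]

# Running the same line again changes nothing.
toy <- toy[, setdiff(names(toy), drop_cols), drop = FALSE]
names(toy)
```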
Set the random seed to 2024 for reproducibility. (10pts)
set.seed(2024)
# Create index for training data (70% training is standard)
train_index <- createDataPartition(GermanCredit$Class, p = 0.7, list = FALSE)
# Split the data
train_data <- GermanCredit[train_index, ]
test_data <- GermanCredit[-train_index, ]
Your observation: The dataset was split into training and test sets using a 70/30 split. A random seed of 2024 was used for reproducibility. The createDataPartition() function was used to perform a stratified split based on the Class variable, so both sets keep roughly the same share of good and bad credits.
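To illustrate what a stratified split does, here is a base-R sketch that samples 70% of the row indices within each class. It approximates the idea and is not caret's implementation; the simulated label vector `y` is an assumption that mirrors the 700/300 class counts shown in the summary above.

```r
set.seed(2024)

# Simulated labels with the same class balance as GermanCredit$Class.
y <- rep(c(TRUE, FALSE), times = c(700, 300))

# Sample 70% of the row indices separately within each class.
train_idx <- unlist(lapply(split(seq_along(y), y), function(idx) {
  sample(idx, size = round(0.7 * length(idx)))
}))

# Both subsets keep the 70/30 class balance of the full data.
mean(y[train_idx])   # share of TRUE in the training rows
mean(y[-train_idx])  # share of TRUE in the test rows
```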
# Make sure Class is a factor
train_data$Class <- as.factor(train_data$Class)
# Fit logistic regression model
log_model <- glm(Class ~ ., data = train_data, family = binomial)
summary(log_model)
##
## Call:
## glm(formula = Class ~ ., family = binomial, data = train_data)
##
## Coefficients:
## Estimate Std. Error z value Pr(>|z|)
## (Intercept) 9.7755921 1.7594925 5.556 2.76e-08 ***
## Duration -0.0281752 0.0114559 -2.459 0.013915 *
## Amount -0.0001968 0.0000580 -3.394 0.000690 ***
## InstallmentRatePercentage -0.3458012 0.1122102 -3.082 0.002058 **
## ResidenceDuration -0.1477247 0.1099835 -1.343 0.179222
## Age -0.0011930 0.0111092 -0.107 0.914479
## NumberExistingCredits -0.1741853 0.2247245 -0.775 0.438277
## NumberPeopleMaintenance -0.2953842 0.3033517 -0.974 0.330188
## Telephone -0.8357009 0.2619015 -3.191 0.001418 **
## ForeignWorker -1.6606566 0.8122576 -2.044 0.040905 *
## CheckingAccountStatus.lt.0 -2.0280291 0.2899845 -6.994 2.68e-12 ***
## CheckingAccountStatus.0.to.200 -1.4706478 0.2943908 -4.996 5.87e-07 ***
## CheckingAccountStatus.gt.200 -0.6052653 0.4876931 -1.241 0.214577
## CreditHistory.NoCredit.AllPaid -1.2639798 0.5155113 -2.452 0.014211 *
## CreditHistory.ThisBank.AllPaid -1.8780235 0.5706646 -3.291 0.000999 ***
## CreditHistory.PaidDuly -0.8775997 0.3159046 -2.778 0.005469 **
## CreditHistory.Delay -0.4012640 0.4307837 -0.931 0.351608
## Purpose.NewCar -1.0626620 0.8142904 -1.305 0.191887
## Purpose.UsedCar 1.1942539 0.8839916 1.351 0.176702
## Purpose.Furniture.Equipment -0.1681192 0.8320966 -0.202 0.839883
## Purpose.Radio.Television -0.3031554 0.8286036 -0.366 0.714467
## Purpose.DomesticAppliance -0.7371787 1.2321421 -0.598 0.549646
## Purpose.Repairs -0.8575710 0.9887784 -0.867 0.385776
## Purpose.Education -0.6848705 0.9364025 -0.731 0.464544
## Purpose.Retraining -0.1649183 1.5465838 -0.107 0.915079
## Purpose.Business -0.3600823 0.8535288 -0.422 0.673116
## SavingsAccountBonds.lt.100 -0.9786195 0.3127225 -3.129 0.001752 **
## SavingsAccountBonds.100.to.500 -0.9669534 0.4406228 -2.195 0.028198 *
## SavingsAccountBonds.500.to.1000 -0.2529878 0.5442721 -0.465 0.642061
## SavingsAccountBonds.gt.1000 0.2713176 0.6594268 0.411 0.680747
## EmploymentDuration.lt.1 -0.4435735 0.5345880 -0.830 0.406681
## EmploymentDuration.1.to.4 -0.4275141 0.5069023 -0.843 0.399013
## EmploymentDuration.4.to.7 0.4416798 0.5618787 0.786 0.431822
## EmploymentDuration.gt.7 -0.2520532 0.5037635 -0.500 0.616835
## Personal.Male.Divorced.Seperated -0.4301280 0.5538492 -0.777 0.437385
## Personal.Female.NotSingle -0.0179029 0.3950224 -0.045 0.963851
## Personal.Male.Single 0.6299901 0.3971902 1.586 0.112713
## OtherDebtorsGuarantors.None -1.0309812 0.5142560 -2.005 0.044984 *
## OtherDebtorsGuarantors.CoApplicant -1.0727811 0.7201303 -1.490 0.136302
## Property.RealEstate 1.2295999 0.5185315 2.371 0.017725 *
## Property.Insurance 0.8935212 0.5097800 1.753 0.079643 .
## Property.CarOther 1.1356001 0.5048681 2.249 0.024493 *
## OtherInstallmentPlans.Bank -0.6436463 0.3046547 -2.113 0.034626 *
## OtherInstallmentPlans.Stores -0.2405278 0.4731218 -0.508 0.611184
## Housing.Rent -0.7041915 0.5817432 -1.210 0.226093
## Housing.Own -0.5109041 0.5552490 -0.920 0.357502
## Job.UnemployedUnskilled 0.4681174 0.8091298 0.579 0.562897
## Job.UnskilledResident 0.3450109 0.4498926 0.767 0.443156
## Job.SkilledEmployee 0.1604813 0.3719210 0.431 0.666110
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## (Dispersion parameter for binomial family taken to be 1)
##
## Null deviance: 855.21 on 699 degrees of freedom
## Residual deviance: 595.77 on 651 degrees of freedom
## AIC: 693.77
##
## Number of Fisher Scoring iterations: 5
Your observation: A logistic regression model was fit to the training data using all predictors. The Class variable was converted to a factor so that glm() treats it as a binary outcome, with TRUE (good credit) modeled as the success level. The glm() function with family = binomial was used to model credit risk.
Your observation: The summary shows which predictors are statistically associated with credit class after adjusting for the others; a smaller p-value indicates stronger evidence of an association, not necessarily a larger effect. The variable Amount is significant (p = 0.00069) and has a negative coefficient, meaning that as the loan amount increases, the estimated probability of being a good credit risk decreases, indicating higher risk.
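Because logistic regression coefficients are on the log-odds scale, exponentiating them gives odds ratios. A small worked example using the Amount estimate from the summary output; the 1000-unit increment is an arbitrary choice made here for illustration.

```r
# Coefficient for Amount, copied from the model summary (log-odds per unit of Amount).
beta_amount <- -0.0001968

# Odds ratio for a one-unit increase in Amount.
or_per_unit <- exp(beta_amount)

# Odds ratio for a 1000-unit increase: the log-odds change scales linearly.
or_per_1000 <- exp(beta_amount * 1000)
round(or_per_1000, 2)  # about 0.82
```

Holding the other predictors fixed, each additional 1000 in loan amount multiplies the estimated odds of a good credit class by roughly 0.82.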
# Get predicted probabilities for training data
train_probs <- predict(log_model, newdata = train_data, type = "response")
train_probs
## 1 2 3 4 5 7 8
## 0.94848296 0.27796033 0.98221231 0.62636465 0.10671097 0.92238998 0.84618528
## 9 10 13 14 15 16 18
## 0.97464344 0.29520427 0.89639731 0.48124556 0.26183844 0.37328933 0.18355073
## 19 20 23 26 27 28 29
## 0.33817898 0.96535073 0.91592423 0.88704068 0.84251378 0.79908021 0.90161658
## 30 31 32 34 35 36 37
## 0.11509086 0.82665738 0.49668265 0.94312595 0.74340675 0.46396694 0.78401622
## 38 39 40 41 42 43 44
## 0.76316015 0.94528334 0.73722203 0.84356945 0.86475457 0.69292297 0.85923910
## 48 49 50 51 52 54 55
## 0.98501660 0.85538821 0.82703426 0.76140500 0.95987148 0.99641808 0.25491387
## 56 57 59 62 63 65 66
## 0.97350159 0.82771599 0.58803645 0.98516158 0.37131136 0.77367847 0.76290626
## 67 68 70 71 72 73 75
## 0.81940201 0.73052188 0.80972895 0.78221373 0.99313805 0.67454250 0.36950532
## 77 78 80 81 82 84 86
## 0.21962661 0.83854378 0.47954971 0.96050103 0.96501822 0.81606836 0.98331606
## 87 88 90 91 93 94 95
## 0.67777141 0.14093191 0.50279403 0.96279398 0.93298346 0.77511981 0.89050014
## 96 98 99 100 101 102 103
## 0.01516329 0.51328547 0.68826744 0.92134398 0.62280574 0.58369242 0.94350374
## 104 105 106 107 110 111 112
## 0.95934528 0.99294187 0.56799941 0.15997786 0.94907850 0.60755091 0.70162277
## 113 115 116 117 119 120 121
## 0.51948335 0.74046375 0.98063412 0.49912773 0.68104685 0.89444905 0.59170572
## 123 124 125 126 127 129 130
## 0.94156691 0.86381370 0.67749680 0.45282713 0.50625461 0.98230214 0.38530251
## 132 133 134 136 138 139 140
## 0.23073231 0.84914818 0.59185468 0.98834283 0.85150902 0.98448653 0.91929965
## 144 153 154 159 162 163 165
## 0.67016479 0.56875797 0.80840138 0.67435878 0.79629767 0.97458942 0.78519392
## 166 168 169 170 171 174 177
## 0.99237526 0.85522979 0.93042961 0.68128732 0.17803593 0.97587801 0.47978418
## 178 179 180 181 183 185 186
## 0.87121238 0.95471374 0.50914520 0.58679838 0.20234145 0.65211261 0.93476798
## 187 189 190 191 192 195 197
## 0.41059055 0.63695471 0.53325536 0.95712754 0.21663918 0.47002932 0.98933630
## 198 200 202 203 204 205 206
## 0.21891084 0.30483706 0.27715452 0.90839276 0.50155276 0.92420165 0.48160548
## 207 208 209 210 211 213 214
## 0.97421964 0.84956907 0.24139445 0.99951331 0.98952641 0.40067898 0.89013709
## 217 218 219 220 221 222 223
## 0.64898494 0.89401422 0.36646575 0.89342237 0.75640360 0.42770904 0.84441173
## 226 228 231 233 234 236 237
## 0.56279189 0.27219303 0.61041976 0.91418428 0.80686237 0.39188147 0.72743454
## 239 240 241 243 245 246 247
## 0.89165798 0.74474079 0.29160845 0.24148173 0.71523789 0.97945467 0.94736407
## 248 249 250 251 252 255 256
## 0.76533393 0.89637375 0.88064664 0.95824729 0.89989705 0.98826959 0.76736462
## 257 259 260 263 264 265 266
## 0.93486918 0.98449117 0.93256894 0.42518148 0.72486952 0.98492620 0.63769823
## 267 269 270 271 272 275 276
## 0.90285657 0.46919352 0.95282393 0.98548835 0.99277134 0.05176437 0.94987128
## 277 278 279 280 281 282 283
## 0.93487405 0.84005356 0.85624814 0.88508464 0.99329860 0.92054577 0.82718982
## 285 286 287 288 289 290 291
## 0.46587876 0.15444083 0.56361847 0.56002794 0.84413466 0.34379401 0.99291345
## 293 295 296 297 299 300 301
## 0.68736627 0.41589859 0.36772870 0.98213568 0.95211687 0.98818616 0.97240055
## 302 304 305 306 307 308 311
## 0.39979252 0.82820550 0.44945495 0.94454014 0.97609810 0.37337534 0.65317876
## 312 313 315 316 318 319 321
## 0.76700301 0.67812172 0.99155705 0.08943771 0.86455701 0.93910367 0.33707283
## 324 326 327 329 330 331 332
## 0.77229787 0.96934351 0.98729734 0.63802616 0.65240352 0.87074969 0.85789938
## 334 336 337 338 340 341 342
## 0.63977567 0.78297316 0.79511587 0.47539002 0.52472957 0.47186784 0.65897637
## 343 344 345 347 348 349 350
## 0.76101472 0.80609900 0.90949550 0.91737689 0.55978692 0.98108025 0.83501625
## 351 352 353 354 355 356 357
## 0.94182732 0.95953195 0.99575256 0.29909993 0.94702560 0.44304970 0.99677785
## 359 360 364 366 367 368 369
## 0.86623521 0.38121817 0.92369836 0.98645194 0.99448473 0.41780574 0.30421909
## 370 371 372 374 376 379 380
## 0.72418405 0.78290674 0.95035674 0.38468531 0.16643762 0.04850645 0.86089338
## 381 382 383 384 385 387 388
## 0.91513174 0.43267495 0.85013983 0.76102386 0.89111033 0.96178471 0.64896983
## 390 392 394 395 396 398 399
## 0.93663061 0.92068037 0.89566032 0.98179577 0.27530739 0.62640574 0.52284698
## 400 401 402 403 405 407 408
## 0.97427241 0.91824253 0.66723624 0.83624573 0.64764015 0.99767540 0.79360401
## 409 411 412 414 415 418 419
## 0.92029383 0.60784358 0.99024551 0.94360589 0.31856859 0.65234112 0.89777799
## 420 421 423 424 426 427 428
## 0.50879533 0.94868296 0.90159296 0.95634311 0.94147445 0.92010576 0.98270257
## 430 431 432 433 434 436 438
## 0.40888646 0.97922967 0.30572675 0.76254316 0.86131391 0.94859756 0.96478082
## 439 440 441 442 443 445 446
## 0.38183771 0.65973469 0.87745863 0.48866924 0.84334626 0.65707176 0.93478127
## 447 448 449 450 451 452 453
## 0.19776017 0.92609772 0.95688606 0.67870087 0.97659988 0.92839790 0.87108701
## 456 457 459 461 462 463 464
## 0.90123372 0.50490815 0.44404872 0.68896302 0.48957033 0.54736006 0.81853003
## 466 468 472 473 474 475 476
## 0.71365567 0.69600960 0.27536202 0.44074760 0.97935285 0.71375622 0.29743974
## 480 483 485 486 488 490 492
## 0.80107508 0.52930535 0.97783399 0.78187190 0.52010800 0.91802334 0.16333382
## 493 494 495 496 498 499 500
## 0.97051107 0.85660752 0.86273640 0.62018651 0.96121832 0.81924493 0.86617708
## 501 503 504 505 506 507 509
## 0.12557756 0.80322011 0.23608239 0.10007403 0.96280921 0.99757839 0.74431422
## 515 516 517 518 522 523 526
## 0.86442453 0.88114109 0.92808340 0.81073050 0.46862259 0.03872306 0.72560466
## 528 529 530 531 532 533 534
## 0.99234529 0.10405482 0.65602880 0.61520030 0.48810842 0.96062600 0.93461322
## 535 537 538 539 541 543 544
## 0.91678962 0.65824014 0.79892752 0.01288756 0.77381466 0.38219977 0.89744232
## 546 547 548 551 552 553 558
## 0.54737800 0.86061772 0.91150421 0.97108902 0.84042135 0.79257969 0.73495158
## 559 563 564 565 567 568 569
## 0.25034866 0.85207218 0.29583673 0.69491600 0.46073821 0.99097528 0.83683068
## 570 572 573 574 576 577 578
## 0.21658839 0.94866127 0.98276422 0.31754958 0.94513635 0.84094445 0.97509283
## 579 581 583 584 585 586 587
## 0.17198438 0.81841524 0.85300578 0.16350742 0.83342423 0.36681598 0.57289535
## 588 590 593 594 595 596 597
## 0.82340752 0.73466703 0.88536844 0.36596580 0.65765065 0.38746553 0.21613787
## 598 601 604 606 607 610 611
## 0.72477492 0.91640660 0.79959851 0.47381643 0.96115946 0.95204885 0.34469117
## 612 616 620 622 623 624 625
## 0.62857119 0.25056539 0.70363825 0.81544962 0.51802030 0.40029102 0.21576805
## 626 628 629 630 631 632 637
## 0.96189225 0.35797904 0.95708873 0.97434400 0.31400624 0.16495709 0.95445862
## 638 640 641 642 644 647 649
## 0.63065153 0.49937673 0.54005416 0.51704055 0.99060100 0.36487337 0.45347122
## 650 652 653 654 655 656 657
## 0.35400516 0.68451561 0.33916342 0.32179795 0.99595647 0.37351967 0.32341235
## 658 659 661 662 663 664 666
## 0.85997187 0.33673828 0.73561316 0.48668217 0.88757443 0.90423775 0.77995176
## 667 669 670 671 672 673 674
## 0.63831867 0.52146189 0.79305203 0.95433175 0.94901484 0.35265987 0.97267807
## 675 676 677 678 680 684 685
## 0.90451091 0.92025322 0.94918560 0.16416589 0.88367696 0.81515881 0.72407757
## 686 687 688 689 690 692 693
## 0.43827407 0.99252586 0.26420628 0.95781103 0.65683583 0.48027126 0.73336871
## 694 695 696 697 698 699 700
## 0.82197590 0.96994008 0.99043616 0.98523735 0.96653718 0.93943996 0.81122166
## 701 702 704 706 708 710 711
## 0.90405714 0.63151489 0.50514417 0.73632815 0.23735811 0.80499494 0.97084247
## 712 713 714 715 716 717 718
## 0.08939516 0.99275136 0.74637712 0.03303341 0.99014287 0.98483238 0.89447391
## 720 723 724 725 727 729 730
## 0.59470960 0.30216088 0.76465903 0.58128956 0.97400959 0.01428197 0.97111425
## 732 734 737 739 742 743 744
## 0.49848433 0.99218899 0.38751163 0.91066987 0.42172970 0.96323392 0.49076981
## 746 747 748 749 750 752 754
## 0.63868421 0.38337709 0.40513218 0.99041924 0.99281343 0.49301489 0.86319161
## 755 756 760 763 764 765 767
## 0.89808516 0.22253619 0.52846395 0.47314986 0.69684594 0.91496190 0.23214564
## 768 769 771 772 775 776 777
## 0.98894025 0.93075371 0.87435670 0.10229013 0.87795579 0.28923190 0.92604257
## 778 779 780 781 784 785 786
## 0.76253748 0.99360632 0.63772902 0.93975774 0.23407962 0.94804867 0.92399253
## 788 789 790 791 792 793 795
## 0.99130772 0.13829628 0.11558785 0.47875144 0.98274588 0.98867245 0.88147378
## 796 798 799 800 801 802 803
## 0.83741564 0.93315321 0.93422893 0.81185864 0.76041916 0.89374228 0.58253609
## 805 808 809 811 813 814 815
## 0.70364634 0.97435146 0.43790332 0.83135262 0.82812040 0.18231476 0.08420201
## 816 817 818 820 821 822 824
## 0.14695154 0.92339396 0.99349907 0.22618386 0.88626130 0.77315533 0.64046294
## 825 827 828 829 830 832 833
## 0.91824129 0.42681764 0.74962641 0.56623223 0.44105457 0.30950698 0.04893579
## 834 835 836 837 838 839 840
## 0.81286325 0.82793019 0.12268755 0.93828291 0.94377338 0.90390221 0.86255447
## 841 842 843 844 845 846 847
## 0.39991010 0.98011072 0.64370087 0.83118491 0.73356977 0.96783152 0.77852252
## 848 849 850 852 854 855 857
## 0.65955376 0.78941256 0.56774033 0.99775733 0.11662391 0.66146923 0.98532646
## 858 859 860 861 862 863 864
## 0.95425833 0.30913643 0.98929406 0.99026524 0.92995203 0.41886302 0.97994878
## 865 867 868 869 870 872 873
## 0.92055984 0.29607471 0.95194278 0.80818518 0.34006643 0.98456207 0.92593181
## 874 875 876 877 878 879 881
## 0.91889219 0.56288438 0.74844012 0.16345799 0.79387866 0.48884237 0.98032748
## 882 883 884 885 887 890 891
## 0.91621106 0.73821576 0.93230925 0.71290179 0.87392862 0.93795173 0.48126733
## 892 893 895 896 897 898 900
## 0.97096905 0.66762231 0.98640544 0.98814027 0.27742245 0.99847366 0.54058835
## 901 902 904 905 906 909 910
## 0.79557804 0.94259871 0.96339224 0.95478319 0.78458346 0.97934381 0.80580916
## 911 912 913 915 916 917 918
## 0.61053947 0.52627050 0.71017024 0.25122408 0.14150761 0.98727163 0.03812747
## 919 921 922 923 924 928 929
## 0.54426838 0.74745292 0.87174974 0.33239424 0.72206141 0.24462156 0.96579943
## 930 931 933 935 936 940 941
## 0.44327792 0.84285085 0.94928361 0.28824484 0.23367536 0.98968604 0.93860117
## 942 943 946 947 948 950 951
## 0.97167771 0.99014908 0.16126018 0.26765895 0.86679817 0.87118816 0.81936240
## 955 957 958 959 961 962 963
## 0.51917575 0.87693272 0.90988047 0.27688304 0.96166076 0.33831267 0.76861033
## 964 966 967 968 969 970 971
## 0.95272377 0.67310879 0.82418813 0.75116460 0.91004654 0.67519736 0.78465824
## 972 974 975 976 978 979 980
## 0.59270957 0.02151977 0.92769932 0.85712003 0.82919412 0.73268463 0.20333165
## 981 983 984 985 986 988 989
## 0.77464206 0.71982687 0.47173397 0.98461483 0.44355161 0.92342286 0.70474587
## 990 991 993 994 996 999 1000
## 0.76895996 0.97378937 0.80465761 0.47973885 0.94768552 0.30098042 0.89296843
Your observation: Predicted probabilities were successfully generated for all 700 observations in the training set. The values range between 0 and 1, representing the likelihood that each observation is classified as a good credit risk.
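# Candidate cutoffs from 0.10 to 0.90 in steps of 0.01
cutoffs <- seq(0.1, 0.9, by = 0.01)
# Misclassification rate at each cutoff. Comparing predictions with the labels
# directly stays correct even if every prediction falls in a single class,
# where table() would not return a full 2x2 matrix and diag() would mislead.
mr_values <- sapply(cutoffs, function(cutoff) {
  preds <- train_probs > cutoff
  mean(preds != train_data$Class)
})
optimal_cutoff <- cutoffs[which.min(mr_values)]
optimal_cutoff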
## [1] 0.41
Your observation: The optimal probability cutoff was found by testing multiple threshold values and selecting the one that minimizes the misclassification rate. The cutoff value of 0.41 resulted in the lowest error and was chosen as the optimal threshold.
# Apply optimal cutoff (0.41); the comparison already returns TRUE/FALSE
train_preds <- train_probs > optimal_cutoff
# Confusion matrix
cm_train <- table(Predicted = train_preds, Actual = train_data$Class)
# Misclassification rate
mr_train <- 1 - sum(diag(cm_train)) / sum(cm_train)
# Output results
cm_train
## Actual
## Predicted FALSE TRUE
## FALSE 103 27
## TRUE 107 463
mr_train
## [1] 0.1914286
Your observation: The confusion matrix was generated using the optimal cutoff of 0.41. The misclassification rate (MR) for the training set is 0.191, meaning that approximately 19.1% of the observations were incorrectly classified.
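The overall error rate hides how the errors split between the two classes. The following is a minimal sketch that re-enters the counts from the training confusion matrix above (the name `cm_train_reported` is introduced here for illustration) and derives sensitivity and specificity from them.

```r
# Training confusion matrix at cutoff 0.41, re-entered from the printed output
# (rows = predicted, columns = actual); matrix() fills column by column
cm_train_reported <- matrix(
  c(103, 107, 27, 463),
  nrow = 2,
  dimnames = list(Predicted = c("FALSE", "TRUE"), Actual = c("FALSE", "TRUE"))
)

TN <- cm_train_reported["FALSE", "FALSE"]  # predicted bad, actually bad
FP <- cm_train_reported["TRUE", "FALSE"]   # predicted good, actually bad
FN <- cm_train_reported["FALSE", "TRUE"]   # predicted bad, actually good
TP <- cm_train_reported["TRUE", "TRUE"]    # predicted good, actually good

sensitivity <- TP / (TP + FN)  # share of good customers predicted good
specificity <- TN / (TN + FP)  # share of bad customers predicted bad
mr <- (FP + FN) / sum(cm_train_reported)

round(c(MR = mr, sensitivity = sensitivity, specificity = specificity), 4)
##          MR sensitivity specificity
##      0.1914      0.9449      0.4905
```

At this cutoff roughly half of the bad customers are predicted to be good, which matters once false positives are weighted more heavily than false negatives.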
# Load library
library(pROC)
# Generate ROC curve
roc_obj <- roc(train_data$Class, train_probs)
## Setting levels: control = FALSE, case = TRUE
## Setting direction: controls < cases
# Plot ROC curve
plot(roc_obj, main = "ROC Curve - Training Set")
# Calculate AUC
auc_value <- auc(roc_obj)
# Output AUC
auc_value
## Area under the curve: 0.8497
Your observation: The ROC curve was generated for the training set to evaluate the model’s performance. The curve is well above the diagonal line, indicating strong classification ability. The AUC value is 0.8497, suggesting that the model performs well in distinguishing between good and bad credit risk on the data it was fitted to.
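One way to read the AUC is as the probability that a randomly chosen good customer receives a higher predicted probability than a randomly chosen bad customer. The sketch below illustrates that reading with a hypothetical helper, `rank_auc`, on toy data (not the German credit data). For an empirical ROC curve this rank-based value is expected to match the trapezoidal area, but that has not been checked against this model here.

```r
# Rank-based (Mann-Whitney) AUC: the share of (good, bad) pairs in which
# the good case receives the higher score. `labels` must be logical.
rank_auc <- function(labels, scores) {
  r <- rank(scores)      # average ranks, so tied scores count as half
  n_pos <- sum(labels)
  n_neg <- sum(!labels)
  (sum(r[labels]) - n_pos * (n_pos + 1) / 2) / (n_pos * n_neg)
}

# Toy data for illustration only
toy_labels <- c(FALSE, FALSE, TRUE, TRUE)
toy_scores <- c(0.10, 0.40, 0.35, 0.80)
rank_auc(toy_labels, toy_scores)
## [1] 0.75
```

Of the four (good, bad) pairs in the toy data, three are ordered correctly, which gives 0.75.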
# Get predicted probabilities for test set
test_probs <- predict(log_model, newdata = test_data, type = "response")
# Apply the cutoff chosen on the training set (0.41)
test_preds <- test_probs > optimal_cutoff
# Confusion matrix
cm_test <- table(Predicted = test_preds, Actual = test_data$Class)
# Misclassification rate
mr_test <- 1 - sum(diag(cm_test)) / sum(cm_test)
# Output results
cm_test
## Actual
## Predicted FALSE TRUE
## FALSE 30 22
## TRUE 60 188
mr_test
## [1] 0.2733333
Your observation: The confusion matrix was generated for the test set using the optimal cutoff of 0.41. The misclassification rate (MR) is 0.273, meaning that approximately 27.3% of the observations were incorrectly classified.
# ROC curve for test set
roc_test <- roc(test_data$Class, test_probs)
## Setting levels: control = FALSE, case = TRUE
## Setting direction: controls < cases
# Plot ROC curve
plot(roc_test, main = "ROC Curve - Test Set")
# Calculate AUC
auc_test <- auc(roc_test)
# Output AUC
auc_test
## Area under the curve: 0.7562
Your observation: The ROC curve was generated for the test set to evaluate model performance. The AUC value is 0.7562, down from 0.8497 on the training set, indicating that the model retains a moderate ability to distinguish between good and bad credit risk on unseen data.
Now, let’s assume “It is worse to class a customer as good when they are bad (weight = 5), than it is to class a customer as bad when they are good (weight = 1).” Please figure out which weight should be 5 and which weight should be 1. Then define your cost function accordingly!
# Define weights
weight_FP <- 5
weight_FN <- 1
# Function to calculate cost
# Expects a 2x2 table with rows = predicted (FALSE, TRUE) and columns = actual (FALSE, TRUE)
cost_function <- function(cm) {
  FP <- cm[2, 1] # predicted TRUE, actual FALSE
  FN <- cm[1, 2] # predicted FALSE, actual TRUE
  total <- sum(cm)
  cost <- (weight_FP * FP + weight_FN * FN) / total
  return(cost)
}
Your observation: A higher weight of 5 was assigned to false positives because misclassifying a bad customer as good is more costly. A lower weight of 1 was assigned to false negatives. The cost function was defined to reflect these differences in classification errors.
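To see what this weighting implies at the 0.41 cutoff, the sketch below applies the same formula to the confusion-matrix counts reported for the training and test sets. The helpers `weighted_cost` and `make_cm` and the `*_reported` names are introduced here for illustration; passing the weights as arguments is a small variation on the function above, not the original definition.

```r
# Asymmetric cost per observation; weights are arguments rather than globals
weighted_cost <- function(cm, w_fp = 5, w_fn = 1) {
  FP <- cm["TRUE", "FALSE"]  # predicted good, actually bad
  FN <- cm["FALSE", "TRUE"]  # predicted bad, actually good
  unname((w_fp * FP + w_fn * FN) / sum(cm))
}

# Build a 2x2 matrix laid out like table(Predicted, Actual)
make_cm <- function(tn, fp, fn, tp) {
  matrix(c(tn, fp, fn, tp), nrow = 2,
         dimnames = list(Predicted = c("FALSE", "TRUE"),
                         Actual = c("FALSE", "TRUE")))
}

# Counts copied from the printed confusion matrices at cutoff 0.41
cm_train_reported <- make_cm(tn = 103, fp = 107, fn = 27, tp = 463)
cm_test_reported  <- make_cm(tn = 30,  fp = 60,  fn = 22, tp = 188)

round(weighted_cost(cm_train_reported), 4)
## [1] 0.8029
round(weighted_cost(cm_test_reported), 4)
## [1] 1.0733
```

The training cost is (5 × 107 + 27) / 700 and the test cost is (5 × 60 + 22) / 300. In both cases the false positives account for most of the total.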
# Use same cutoff (0.41)
train_preds <- train_probs > optimal_cutoff
# Confusion matrix
cm_train <- table(Predicted = train_preds, Actual = train_data$Class)
# Misclassification rate (MR)
mr_train <- 1 - sum(diag(cm_train)) / sum(cm_train)
# Output
cm_train
## Actual
## Predicted FALSE TRUE
## FALSE 103 27
## TRUE 107 463
mr_train
## [1] 0.1914286
Your observation: The confusion matrix was generated for the training set using the cutoff of 0.41. The misclassification rate (MR) is 0.191, meaning that approximately 19.1% of the observations were incorrectly classified.
# Use same cutoff (0.41)
test_preds <- test_probs > optimal_cutoff
# Confusion matrix
cm_test <- table(Predicted = test_preds, Actual = test_data$Class)
# Misclassification rate (MR)
mr_test <- 1 - sum(diag(cm_test)) / sum(cm_test)
# Output
cm_test
## Actual
## Predicted FALSE TRUE
## FALSE 30 22
## TRUE 60 188
mr_test
## [1] 0.2733333
Your observation: The confusion matrix was generated for the test set using the cutoff of 0.41. The misclassification rate (MR) is 0.273, meaning that approximately 27.3% of the observations were incorrectly classified.
Summarize your findings, including the optimal probability cut-off, MR and AUC for both training and testing data. Discuss what you observed and what you will do to improve the model further.
The optimal cutoff was 0.41, which gave the lowest misclassification rate on the training data. The training MR was 0.191 and the test MR was 0.273, so the model performs worse on new data. The AUC fell from 0.8497 on the training set to 0.7562 on the test set, which still indicates a moderate ability to distinguish between good and bad credit risk.
Overall, the model performs reasonably, but the gap between training and test results suggests some overfitting. At the 0.41 cutoff most errors are false positives (107 of 134 on the training set and 60 of 82 on the test set), which is the more expensive error under the 5:1 cost weighting. To improve the model, I could try reducing the number of variables, using regularization, or testing other models like decision trees or random forests. I could also adjust the cutoff based on cost instead of just minimizing error.
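As a rough sketch of what adjusting the cutoff based on cost could look like, the code below searches the same grid of cutoffs but minimizes the weighted cost (false positive = 5, false negative = 1) instead of the misclassification rate. The data are simulated stand-ins for `train_data$Class` and `train_probs`, so the numbers it produces say nothing about the actual model. The names `actual`, `probs`, and `cost_at` are hypothetical.

```r
set.seed(1)

# Simulated labels and predicted probabilities (NOT the German credit data)
n <- 700
actual <- runif(n) < 0.7
probs <- plogis(rnorm(n, mean = ifelse(actual, 1.5, 0), sd = 1.2))

cutoffs <- seq(0.1, 0.9, by = 0.01)

# Weighted cost per observation at one cutoff
cost_at <- function(cutoff, w_fp = 5, w_fn = 1) {
  pred <- probs > cutoff
  fp <- sum(pred & !actual)   # predicted good, actually bad
  fn <- sum(!pred & actual)   # predicted bad, actually good
  (w_fp * fp + w_fn * fn) / length(actual)
}

costs <- sapply(cutoffs, cost_at)
mr <- sapply(cutoffs, function(k) mean((probs > k) != actual))

cost_cutoff <- cutoffs[which.min(costs)]
mr_cutoff <- cutoffs[which.min(mr)]
c(min_MR_cutoff = mr_cutoff, min_cost_cutoff = cost_cutoff)
```

With false positives weighted five times as heavily, the cost-minimizing cutoff would generally be expected to sit above the MR-minimizing one, because a higher threshold approves fewer bad customers. Whether that holds for the fitted model would need to be checked on the real `train_probs`, ideally with the cutoff chosen by cross-validation rather than on the full training set.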