# Data
The data come from Dr. Hans Hofmann of the University of Hamburg.
Each applicant is assigned to one of two credit-worthiness classes: good or bad.
The predictors describe applicant and loan attributes: checking account status,
duration, credit history, purpose of the loan, amount of the loan, savings
accounts or bonds, employment duration, installment rate as a percentage of
disposable income, personal information, other debtors/guarantors, residence
duration, property, age, other installment plans, housing, number of existing
credits, job information, number of people the applicant is liable to provide
maintenance for, telephone, and foreign worker status.
Many of these predictors are discrete and have been expanded into several 0/1
indicator variables, so most columns of the data set are binary.
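As a rough illustration of what expanding a discrete predictor into 0/1 indicator variables means, the sketch below uses base R's `model.matrix` on a small made-up factor. The `toy` data frame and its `housing` values are invented for this example and are not taken from the data set.

```r
# Toy data: a single categorical predictor (values are made up)
toy <- data.frame(housing = factor(c("Rent", "Own", "ForFree", "Own")))

# "~ housing - 1" drops the intercept so every level gets its own 0/1 column
indicators <- model.matrix(~ housing - 1, data = toy)
print(indicators)

# Each row has exactly one 1, in the column for that row's level
rowSums(indicators)
```

Because the indicator columns of one predictor always sum to 1, any one of them can be recovered from the others.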
library(caret)
## Loading required package: ggplot2
## Loading required package: lattice
library(lattice)
library(ROCR)
library(gplots)
##
## Attaching package: 'gplots'
## The following object is masked from 'package:stats':
##
## lowess
data(GermanCredit)
GermanCredit$Class <- GermanCredit$Class == "Good"
str(GermanCredit)
## 'data.frame': 1000 obs. of 62 variables:
## $ Duration : int 6 48 12 42 24 36 24 36 12 30 ...
## $ Amount : int 1169 5951 2096 7882 4870 9055 2835 6948 3059 5234 ...
## $ InstallmentRatePercentage : int 4 2 2 2 3 2 3 2 2 4 ...
## $ ResidenceDuration : int 4 2 3 4 4 4 4 2 4 2 ...
## $ Age : int 67 22 49 45 53 35 53 35 61 28 ...
## $ NumberExistingCredits : int 2 1 1 1 2 1 1 1 1 2 ...
## $ NumberPeopleMaintenance : int 1 1 2 2 2 2 1 1 1 1 ...
## $ Telephone : num 0 1 1 1 1 0 1 0 1 1 ...
## $ ForeignWorker : num 1 1 1 1 1 1 1 1 1 1 ...
## $ Class : logi TRUE FALSE TRUE TRUE FALSE TRUE ...
## $ CheckingAccountStatus.lt.0 : num 1 0 0 1 1 0 0 0 0 0 ...
## $ CheckingAccountStatus.0.to.200 : num 0 1 0 0 0 0 0 1 0 1 ...
## $ CheckingAccountStatus.gt.200 : num 0 0 0 0 0 0 0 0 0 0 ...
## $ CheckingAccountStatus.none : num 0 0 1 0 0 1 1 0 1 0 ...
## $ CreditHistory.NoCredit.AllPaid : num 0 0 0 0 0 0 0 0 0 0 ...
## $ CreditHistory.ThisBank.AllPaid : num 0 0 0 0 0 0 0 0 0 0 ...
## $ CreditHistory.PaidDuly : num 0 1 0 1 0 1 1 1 1 0 ...
## $ CreditHistory.Delay : num 0 0 0 0 1 0 0 0 0 0 ...
## $ CreditHistory.Critical : num 1 0 1 0 0 0 0 0 0 1 ...
## $ Purpose.NewCar : num 0 0 0 0 1 0 0 0 0 1 ...
## $ Purpose.UsedCar : num 0 0 0 0 0 0 0 1 0 0 ...
## $ Purpose.Furniture.Equipment : num 0 0 0 1 0 0 1 0 0 0 ...
## $ Purpose.Radio.Television : num 1 1 0 0 0 0 0 0 1 0 ...
## $ Purpose.DomesticAppliance : num 0 0 0 0 0 0 0 0 0 0 ...
## $ Purpose.Repairs : num 0 0 0 0 0 0 0 0 0 0 ...
## $ Purpose.Education : num 0 0 1 0 0 1 0 0 0 0 ...
## $ Purpose.Vacation : num 0 0 0 0 0 0 0 0 0 0 ...
## $ Purpose.Retraining : num 0 0 0 0 0 0 0 0 0 0 ...
## $ Purpose.Business : num 0 0 0 0 0 0 0 0 0 0 ...
## $ Purpose.Other : num 0 0 0 0 0 0 0 0 0 0 ...
## $ SavingsAccountBonds.lt.100 : num 0 1 1 1 1 0 0 1 0 1 ...
## $ SavingsAccountBonds.100.to.500 : num 0 0 0 0 0 0 0 0 0 0 ...
## $ SavingsAccountBonds.500.to.1000 : num 0 0 0 0 0 0 1 0 0 0 ...
## $ SavingsAccountBonds.gt.1000 : num 0 0 0 0 0 0 0 0 1 0 ...
## $ SavingsAccountBonds.Unknown : num 1 0 0 0 0 1 0 0 0 0 ...
## $ EmploymentDuration.lt.1 : num 0 0 0 0 0 0 0 0 0 0 ...
## $ EmploymentDuration.1.to.4 : num 0 1 0 0 1 1 0 1 0 0 ...
## $ EmploymentDuration.4.to.7 : num 0 0 1 1 0 0 0 0 1 0 ...
## $ EmploymentDuration.gt.7 : num 1 0 0 0 0 0 1 0 0 0 ...
## $ EmploymentDuration.Unemployed : num 0 0 0 0 0 0 0 0 0 1 ...
## $ Personal.Male.Divorced.Seperated : num 0 0 0 0 0 0 0 0 1 0 ...
## $ Personal.Female.NotSingle : num 0 1 0 0 0 0 0 0 0 0 ...
## $ Personal.Male.Single : num 1 0 1 1 1 1 1 1 0 0 ...
## $ Personal.Male.Married.Widowed : num 0 0 0 0 0 0 0 0 0 1 ...
## $ Personal.Female.Single : num 0 0 0 0 0 0 0 0 0 0 ...
## $ OtherDebtorsGuarantors.None : num 1 1 1 0 1 1 1 1 1 1 ...
## $ OtherDebtorsGuarantors.CoApplicant : num 0 0 0 0 0 0 0 0 0 0 ...
## $ OtherDebtorsGuarantors.Guarantor : num 0 0 0 1 0 0 0 0 0 0 ...
## $ Property.RealEstate : num 1 1 1 0 0 0 0 0 1 0 ...
## $ Property.Insurance : num 0 0 0 1 0 0 1 0 0 0 ...
## $ Property.CarOther : num 0 0 0 0 0 0 0 1 0 1 ...
## $ Property.Unknown : num 0 0 0 0 1 1 0 0 0 0 ...
## $ OtherInstallmentPlans.Bank : num 0 0 0 0 0 0 0 0 0 0 ...
## $ OtherInstallmentPlans.Stores : num 0 0 0 0 0 0 0 0 0 0 ...
## $ OtherInstallmentPlans.None : num 1 1 1 1 1 1 1 1 1 1 ...
## $ Housing.Rent : num 0 0 0 0 0 0 0 1 0 0 ...
## $ Housing.Own : num 1 1 1 0 0 0 1 0 1 1 ...
## $ Housing.ForFree : num 0 0 0 1 1 1 0 0 0 0 ...
## $ Job.UnemployedUnskilled : num 0 0 0 0 0 0 0 0 0 0 ...
## $ Job.UnskilledResident : num 0 0 1 0 0 1 0 0 1 0 ...
## $ Job.SkilledEmployee : num 1 1 0 1 1 0 1 0 0 0 ...
## $ Job.Management.SelfEmp.HighlyQualified: num 0 0 0 0 0 0 0 1 0 1 ...
Your observation: There are 1000 observations of 62 variables. The first seven (Duration, Amount, InstallmentRatePercentage, ResidenceDuration, Age, NumberExistingCredits, and NumberPeopleMaintenance) are stored as integers, Class is logical after the recoding above, and the remaining 54 columns are numeric 0/1 indicator variables.
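One way to check storage types without reading the whole `str` listing is to tabulate the class of each column. This is a minimal sketch on a made-up data frame; `toy` and its three columns are hypothetical stand-ins for the real columns.

```r
# Made-up stand-in with one column of each storage type
toy <- data.frame(
  Duration  = c(6L, 48L, 12L),       # integer
  Telephone = c(0, 1, 1),            # numeric 0/1 indicator
  Class     = c(TRUE, FALSE, TRUE)   # logical
)

# Count how many columns there are of each class
col_types <- sapply(toy, class)
table(col_types)

# Flag numeric columns that contain only 0 and 1
is_indicator <- sapply(toy, function(x) is.numeric(x) && all(x %in% c(0, 1)))
names(toy)[is_indicator]
```

The same two `sapply` calls should work unchanged on a larger data frame.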
?GermanCredit
View(GermanCredit)
colnames(GermanCredit)
## [1] "Duration"
## [2] "Amount"
## [3] "InstallmentRatePercentage"
## [4] "ResidenceDuration"
## [5] "Age"
## [6] "NumberExistingCredits"
## [7] "NumberPeopleMaintenance"
## [8] "Telephone"
## [9] "ForeignWorker"
## [10] "Class"
## [11] "CheckingAccountStatus.lt.0"
## [12] "CheckingAccountStatus.0.to.200"
## [13] "CheckingAccountStatus.gt.200"
## [14] "CheckingAccountStatus.none"
## [15] "CreditHistory.NoCredit.AllPaid"
## [16] "CreditHistory.ThisBank.AllPaid"
## [17] "CreditHistory.PaidDuly"
## [18] "CreditHistory.Delay"
## [19] "CreditHistory.Critical"
## [20] "Purpose.NewCar"
## [21] "Purpose.UsedCar"
## [22] "Purpose.Furniture.Equipment"
## [23] "Purpose.Radio.Television"
## [24] "Purpose.DomesticAppliance"
## [25] "Purpose.Repairs"
## [26] "Purpose.Education"
## [27] "Purpose.Vacation"
## [28] "Purpose.Retraining"
## [29] "Purpose.Business"
## [30] "Purpose.Other"
## [31] "SavingsAccountBonds.lt.100"
## [32] "SavingsAccountBonds.100.to.500"
## [33] "SavingsAccountBonds.500.to.1000"
## [34] "SavingsAccountBonds.gt.1000"
## [35] "SavingsAccountBonds.Unknown"
## [36] "EmploymentDuration.lt.1"
## [37] "EmploymentDuration.1.to.4"
## [38] "EmploymentDuration.4.to.7"
## [39] "EmploymentDuration.gt.7"
## [40] "EmploymentDuration.Unemployed"
## [41] "Personal.Male.Divorced.Seperated"
## [42] "Personal.Female.NotSingle"
## [43] "Personal.Male.Single"
## [44] "Personal.Male.Married.Widowed"
## [45] "Personal.Female.Single"
## [46] "OtherDebtorsGuarantors.None"
## [47] "OtherDebtorsGuarantors.CoApplicant"
## [48] "OtherDebtorsGuarantors.Guarantor"
## [49] "Property.RealEstate"
## [50] "Property.Insurance"
## [51] "Property.CarOther"
## [52] "Property.Unknown"
## [53] "OtherInstallmentPlans.Bank"
## [54] "OtherInstallmentPlans.Stores"
## [55] "OtherInstallmentPlans.None"
## [56] "Housing.Rent"
## [57] "Housing.Own"
## [58] "Housing.ForFree"
## [59] "Job.UnemployedUnskilled"
## [60] "Job.UnskilledResident"
## [61] "Job.SkilledEmployee"
## [62] "Job.Management.SelfEmp.HighlyQualified"
mean(GermanCredit$Class)
## [1] 0.7
# Understanding German Credit structure
summary(GermanCredit)
## Duration Amount InstallmentRatePercentage ResidenceDuration
## Min. : 4.0 Min. : 250 Min. :1.000 Min. :1.000
## 1st Qu.:12.0 1st Qu.: 1366 1st Qu.:2.000 1st Qu.:2.000
## Median :18.0 Median : 2320 Median :3.000 Median :3.000
## Mean :20.9 Mean : 3271 Mean :2.973 Mean :2.845
## 3rd Qu.:24.0 3rd Qu.: 3972 3rd Qu.:4.000 3rd Qu.:4.000
## Max. :72.0 Max. :18424 Max. :4.000 Max. :4.000
## Age NumberExistingCredits NumberPeopleMaintenance Telephone
## Min. :19.00 Min. :1.000 Min. :1.000 Min. :0.000
## 1st Qu.:27.00 1st Qu.:1.000 1st Qu.:1.000 1st Qu.:0.000
## Median :33.00 Median :1.000 Median :1.000 Median :1.000
## Mean :35.55 Mean :1.407 Mean :1.155 Mean :0.596
## 3rd Qu.:42.00 3rd Qu.:2.000 3rd Qu.:1.000 3rd Qu.:1.000
## Max. :75.00 Max. :4.000 Max. :2.000 Max. :1.000
## ForeignWorker Class CheckingAccountStatus.lt.0
## Min. :0.000 Mode :logical Min. :0.000
## 1st Qu.:1.000 FALSE:300 1st Qu.:0.000
## Median :1.000 TRUE :700 Median :0.000
## Mean :0.963 Mean :0.274
## 3rd Qu.:1.000 3rd Qu.:1.000
## Max. :1.000 Max. :1.000
## CheckingAccountStatus.0.to.200 CheckingAccountStatus.gt.200
## Min. :0.000 Min. :0.000
## 1st Qu.:0.000 1st Qu.:0.000
## Median :0.000 Median :0.000
## Mean :0.269 Mean :0.063
## 3rd Qu.:1.000 3rd Qu.:0.000
## Max. :1.000 Max. :1.000
## CheckingAccountStatus.none CreditHistory.NoCredit.AllPaid
## Min. :0.000 Min. :0.00
## 1st Qu.:0.000 1st Qu.:0.00
## Median :0.000 Median :0.00
## Mean :0.394 Mean :0.04
## 3rd Qu.:1.000 3rd Qu.:0.00
## Max. :1.000 Max. :1.00
## CreditHistory.ThisBank.AllPaid CreditHistory.PaidDuly CreditHistory.Delay
## Min. :0.000 Min. :0.00 Min. :0.000
## 1st Qu.:0.000 1st Qu.:0.00 1st Qu.:0.000
## Median :0.000 Median :1.00 Median :0.000
## Mean :0.049 Mean :0.53 Mean :0.088
## 3rd Qu.:0.000 3rd Qu.:1.00 3rd Qu.:0.000
## Max. :1.000 Max. :1.00 Max. :1.000
## CreditHistory.Critical Purpose.NewCar Purpose.UsedCar
## Min. :0.000 Min. :0.000 Min. :0.000
## 1st Qu.:0.000 1st Qu.:0.000 1st Qu.:0.000
## Median :0.000 Median :0.000 Median :0.000
## Mean :0.293 Mean :0.234 Mean :0.103
## 3rd Qu.:1.000 3rd Qu.:0.000 3rd Qu.:0.000
## Max. :1.000 Max. :1.000 Max. :1.000
## Purpose.Furniture.Equipment Purpose.Radio.Television Purpose.DomesticAppliance
## Min. :0.000 Min. :0.00 Min. :0.000
## 1st Qu.:0.000 1st Qu.:0.00 1st Qu.:0.000
## Median :0.000 Median :0.00 Median :0.000
## Mean :0.181 Mean :0.28 Mean :0.012
## 3rd Qu.:0.000 3rd Qu.:1.00 3rd Qu.:0.000
## Max. :1.000 Max. :1.00 Max. :1.000
## Purpose.Repairs Purpose.Education Purpose.Vacation Purpose.Retraining
## Min. :0.000 Min. :0.00 Min. :0 Min. :0.000
## 1st Qu.:0.000 1st Qu.:0.00 1st Qu.:0 1st Qu.:0.000
## Median :0.000 Median :0.00 Median :0 Median :0.000
## Mean :0.022 Mean :0.05 Mean :0 Mean :0.009
## 3rd Qu.:0.000 3rd Qu.:0.00 3rd Qu.:0 3rd Qu.:0.000
## Max. :1.000 Max. :1.00 Max. :0 Max. :1.000
## Purpose.Business Purpose.Other SavingsAccountBonds.lt.100
## Min. :0.000 Min. :0.000 Min. :0.000
## 1st Qu.:0.000 1st Qu.:0.000 1st Qu.:0.000
## Median :0.000 Median :0.000 Median :1.000
## Mean :0.097 Mean :0.012 Mean :0.603
## 3rd Qu.:0.000 3rd Qu.:0.000 3rd Qu.:1.000
## Max. :1.000 Max. :1.000 Max. :1.000
## SavingsAccountBonds.100.to.500 SavingsAccountBonds.500.to.1000
## Min. :0.000 Min. :0.000
## 1st Qu.:0.000 1st Qu.:0.000
## Median :0.000 Median :0.000
## Mean :0.103 Mean :0.063
## 3rd Qu.:0.000 3rd Qu.:0.000
## Max. :1.000 Max. :1.000
## SavingsAccountBonds.gt.1000 SavingsAccountBonds.Unknown
## Min. :0.000 Min. :0.000
## 1st Qu.:0.000 1st Qu.:0.000
## Median :0.000 Median :0.000
## Mean :0.048 Mean :0.183
## 3rd Qu.:0.000 3rd Qu.:0.000
## Max. :1.000 Max. :1.000
## EmploymentDuration.lt.1 EmploymentDuration.1.to.4 EmploymentDuration.4.to.7
## Min. :0.000 Min. :0.000 Min. :0.000
## 1st Qu.:0.000 1st Qu.:0.000 1st Qu.:0.000
## Median :0.000 Median :0.000 Median :0.000
## Mean :0.172 Mean :0.339 Mean :0.174
## 3rd Qu.:0.000 3rd Qu.:1.000 3rd Qu.:0.000
## Max. :1.000 Max. :1.000 Max. :1.000
## EmploymentDuration.gt.7 EmploymentDuration.Unemployed
## Min. :0.000 Min. :0.000
## 1st Qu.:0.000 1st Qu.:0.000
## Median :0.000 Median :0.000
## Mean :0.253 Mean :0.062
## 3rd Qu.:1.000 3rd Qu.:0.000
## Max. :1.000 Max. :1.000
## Personal.Male.Divorced.Seperated Personal.Female.NotSingle
## Min. :0.00 Min. :0.00
## 1st Qu.:0.00 1st Qu.:0.00
## Median :0.00 Median :0.00
## Mean :0.05 Mean :0.31
## 3rd Qu.:0.00 3rd Qu.:1.00
## Max. :1.00 Max. :1.00
## Personal.Male.Single Personal.Male.Married.Widowed Personal.Female.Single
## Min. :0.000 Min. :0.000 Min. :0
## 1st Qu.:0.000 1st Qu.:0.000 1st Qu.:0
## Median :1.000 Median :0.000 Median :0
## Mean :0.548 Mean :0.092 Mean :0
## 3rd Qu.:1.000 3rd Qu.:0.000 3rd Qu.:0
## Max. :1.000 Max. :1.000 Max. :0
## OtherDebtorsGuarantors.None OtherDebtorsGuarantors.CoApplicant
## Min. :0.000 Min. :0.000
## 1st Qu.:1.000 1st Qu.:0.000
## Median :1.000 Median :0.000
## Mean :0.907 Mean :0.041
## 3rd Qu.:1.000 3rd Qu.:0.000
## Max. :1.000 Max. :1.000
## OtherDebtorsGuarantors.Guarantor Property.RealEstate Property.Insurance
## Min. :0.000 Min. :0.000 Min. :0.000
## 1st Qu.:0.000 1st Qu.:0.000 1st Qu.:0.000
## Median :0.000 Median :0.000 Median :0.000
## Mean :0.052 Mean :0.282 Mean :0.232
## 3rd Qu.:0.000 3rd Qu.:1.000 3rd Qu.:0.000
## Max. :1.000 Max. :1.000 Max. :1.000
## Property.CarOther Property.Unknown OtherInstallmentPlans.Bank
## Min. :0.000 Min. :0.000 Min. :0.000
## 1st Qu.:0.000 1st Qu.:0.000 1st Qu.:0.000
## Median :0.000 Median :0.000 Median :0.000
## Mean :0.332 Mean :0.154 Mean :0.139
## 3rd Qu.:1.000 3rd Qu.:0.000 3rd Qu.:0.000
## Max. :1.000 Max. :1.000 Max. :1.000
## OtherInstallmentPlans.Stores OtherInstallmentPlans.None Housing.Rent
## Min. :0.000 Min. :0.000 Min. :0.000
## 1st Qu.:0.000 1st Qu.:1.000 1st Qu.:0.000
## Median :0.000 Median :1.000 Median :0.000
## Mean :0.047 Mean :0.814 Mean :0.179
## 3rd Qu.:0.000 3rd Qu.:1.000 3rd Qu.:0.000
## Max. :1.000 Max. :1.000 Max. :1.000
## Housing.Own Housing.ForFree Job.UnemployedUnskilled Job.UnskilledResident
## Min. :0.000 Min. :0.000 Min. :0.000 Min. :0.0
## 1st Qu.:0.000 1st Qu.:0.000 1st Qu.:0.000 1st Qu.:0.0
## Median :1.000 Median :0.000 Median :0.000 Median :0.0
## Mean :0.713 Mean :0.108 Mean :0.022 Mean :0.2
## 3rd Qu.:1.000 3rd Qu.:0.000 3rd Qu.:0.000 3rd Qu.:0.0
## Max. :1.000 Max. :1.000 Max. :1.000 Max. :1.0
## Job.SkilledEmployee Job.Management.SelfEmp.HighlyQualified
## Min. :0.00 Min. :0.000
## 1st Qu.:0.00 1st Qu.:0.000
## Median :1.00 Median :0.000
## Mean :0.63 Mean :0.148
## 3rd Qu.:1.00 3rd Qu.:0.000
## Max. :1.00 Max. :1.000
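Your observation: The summary confirms that most columns are 0/1 indicators (minimum 0, maximum 1), alongside seven integer-valued predictors. Class is TRUE for 700 of the 1000 observations and FALSE for 300. Purpose.Vacation and Personal.Female.Single are 0 for every observation.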
The random seed is set to 2024 for reproducibility.
# Splitting German Credit into Training and Testing sets
set.seed(2024)
index <- sample(1:nrow(GermanCredit),nrow(GermanCredit)*0.80)
German_credit_train = GermanCredit[index,]
German_credit_test = GermanCredit[-index,]
Your observation: GermanCredit was split at random into
German_credit_train (80% of the rows, 800 observations) and
German_credit_test (the remaining 200 observations).
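The split can be sanity-checked on a small scale. The sketch below repeats the same sample-then-negative-index pattern on a made-up ten-row data frame (`toy`, `id`, and `x` are hypothetical) and confirms that the two pieces have the expected sizes and share no rows.

```r
set.seed(2024)  # same seed value as in the text
toy <- data.frame(id = 1:10, x = rnorm(10))

# Draw 80% of the row numbers without replacement
train_rows <- sample(seq_len(nrow(toy)), size = floor(0.80 * nrow(toy)))

toy_train <- toy[train_rows, ]
toy_test  <- toy[-train_rows, ]  # negative index keeps the remaining rows

nrow(toy_train)
nrow(toy_test)

# No row appears in both sets
length(intersect(toy_train$id, toy_test$id))
```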
# Making a Logistic Regression Model
German_log <- glm(Class~., family=binomial, data=German_credit_train)
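Your observation: German_log regresses Class on all other variables. In its summary, most coefficients are not significantly different from 0. The intercept, Amount, InstallmentRatePercentage, CheckingAccountStatus.lt.0, CheckingAccountStatus.0.to.200, and SavingsAccountBonds.lt.100 are significant at the 0.001 level; Duration, CreditHistory.ThisBank.AllPaid, SavingsAccountBonds.100.to.500, and OtherInstallmentPlans.Bank at the 0.01 level; and ForeignWorker, CreditHistory.PaidDuly, CreditHistory.Delay, OtherDebtorsGuarantors.None, and OtherDebtorsGuarantors.CoApplicant at the 0.05 level.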
# Summary of the Logistic Regression of German Credit Train
summary(German_log)
##
## Call:
## glm(formula = Class ~ ., family = binomial, data = German_credit_train)
##
## Coefficients: (13 not defined because of singularities)
## Estimate Std. Error z value Pr(>|z|)
## (Intercept) 9.241e+00 1.719e+00 5.376 7.61e-08
## Duration -2.994e-02 1.072e-02 -2.794 0.005214
## Amount -1.771e-04 5.095e-05 -3.475 0.000510
## InstallmentRatePercentage -3.718e-01 1.036e-01 -3.589 0.000332
## ResidenceDuration 2.577e-02 1.010e-01 0.255 0.798510
## Age 1.183e-02 1.097e-02 1.078 0.280974
## NumberExistingCredits -1.225e-01 2.189e-01 -0.560 0.575690
## NumberPeopleMaintenance -1.731e-01 2.945e-01 -0.588 0.556678
## Telephone -4.236e-01 2.371e-01 -1.786 0.074081
## ForeignWorker -1.651e+00 7.421e-01 -2.224 0.026143
## CheckingAccountStatus.lt.0 -1.817e+00 2.710e-01 -6.703 2.04e-11
## CheckingAccountStatus.0.to.200 -1.432e+00 2.686e-01 -5.330 9.81e-08
## CheckingAccountStatus.gt.200 -5.912e-01 4.631e-01 -1.277 0.201696
## CheckingAccountStatus.none NA NA NA NA
## CreditHistory.NoCredit.AllPaid -8.724e-01 5.139e-01 -1.698 0.089584
## CreditHistory.ThisBank.AllPaid -1.676e+00 5.493e-01 -3.052 0.002277
## CreditHistory.PaidDuly -6.686e-01 2.939e-01 -2.275 0.022899
## CreditHistory.Delay -9.413e-01 3.780e-01 -2.491 0.012756
## CreditHistory.Critical NA NA NA NA
## Purpose.NewCar -1.733e+00 1.013e+00 -1.710 0.087282
## Purpose.UsedCar 6.716e-02 1.033e+00 0.065 0.948146
## Purpose.Furniture.Equipment -8.257e-01 1.015e+00 -0.814 0.415816
## Purpose.Radio.Television -8.386e-01 1.019e+00 -0.823 0.410457
## Purpose.DomesticAppliance -1.227e+00 1.328e+00 -0.923 0.355762
## Purpose.Repairs -1.321e+00 1.165e+00 -1.134 0.256825
## Purpose.Education -2.020e+00 1.088e+00 -1.857 0.063374
## Purpose.Vacation NA NA NA NA
## Purpose.Retraining 4.276e-01 1.640e+00 0.261 0.794237
## Purpose.Business -8.618e-01 1.032e+00 -0.835 0.403529
## Purpose.Other NA NA NA NA
## SavingsAccountBonds.lt.100 -1.266e+00 3.201e-01 -3.956 7.63e-05
## SavingsAccountBonds.100.to.500 -1.075e+00 4.171e-01 -2.577 0.009964
## SavingsAccountBonds.500.to.1000 -8.768e-01 5.216e-01 -1.681 0.092761
## SavingsAccountBonds.gt.1000 1.301e-02 6.161e-01 0.021 0.983157
## SavingsAccountBonds.Unknown NA NA NA NA
## EmploymentDuration.lt.1 3.581e-01 5.167e-01 0.693 0.488195
## EmploymentDuration.1.to.4 5.527e-01 5.000e-01 1.105 0.268967
## EmploymentDuration.4.to.7 9.863e-01 5.355e-01 1.842 0.065524
## EmploymentDuration.gt.7 5.253e-01 5.039e-01 1.042 0.297218
## EmploymentDuration.Unemployed NA NA NA NA
## Personal.Male.Divorced.Seperated -2.546e-01 5.214e-01 -0.488 0.625274
## Personal.Female.NotSingle -1.274e-01 3.573e-01 -0.357 0.721452
## Personal.Male.Single 4.118e-01 3.623e-01 1.137 0.255622
## Personal.Male.Married.Widowed NA NA NA NA
## Personal.Female.Single NA NA NA NA
## OtherDebtorsGuarantors.None -1.239e+00 5.370e-01 -2.308 0.021018
## OtherDebtorsGuarantors.CoApplicant -1.565e+00 6.828e-01 -2.292 0.021919
## OtherDebtorsGuarantors.Guarantor NA NA NA NA
## Property.RealEstate 7.166e-01 4.898e-01 1.463 0.143477
## Property.Insurance 3.544e-01 4.785e-01 0.741 0.458926
## Property.CarOther 6.110e-01 4.648e-01 1.314 0.188702
## Property.Unknown NA NA NA NA
## OtherInstallmentPlans.Bank -8.504e-01 2.730e-01 -3.115 0.001838
## OtherInstallmentPlans.Stores -4.293e-01 4.711e-01 -0.911 0.362139
## OtherInstallmentPlans.None NA NA NA NA
## Housing.Rent -9.538e-01 5.624e-01 -1.696 0.089924
## Housing.Own -2.723e-01 5.282e-01 -0.516 0.606157
## Housing.ForFree NA NA NA NA
## Job.UnemployedUnskilled 1.449e+00 8.788e-01 1.649 0.099175
## Job.UnskilledResident -2.641e-03 4.101e-01 -0.006 0.994861
## Job.SkilledEmployee -1.073e-02 3.349e-01 -0.032 0.974438
## Job.Management.SelfEmp.HighlyQualified NA NA NA NA
##
## (Intercept) ***
## Duration **
## Amount ***
## InstallmentRatePercentage ***
## ResidenceDuration
## Age
## NumberExistingCredits
## NumberPeopleMaintenance
## Telephone .
## ForeignWorker *
## CheckingAccountStatus.lt.0 ***
## CheckingAccountStatus.0.to.200 ***
## CheckingAccountStatus.gt.200
## CheckingAccountStatus.none
## CreditHistory.NoCredit.AllPaid .
## CreditHistory.ThisBank.AllPaid **
## CreditHistory.PaidDuly *
## CreditHistory.Delay *
## CreditHistory.Critical
## Purpose.NewCar .
## Purpose.UsedCar
## Purpose.Furniture.Equipment
## Purpose.Radio.Television
## Purpose.DomesticAppliance
## Purpose.Repairs
## Purpose.Education .
## Purpose.Vacation
## Purpose.Retraining
## Purpose.Business
## Purpose.Other
## SavingsAccountBonds.lt.100 ***
## SavingsAccountBonds.100.to.500 **
## SavingsAccountBonds.500.to.1000 .
## SavingsAccountBonds.gt.1000
## SavingsAccountBonds.Unknown
## EmploymentDuration.lt.1
## EmploymentDuration.1.to.4
## EmploymentDuration.4.to.7 .
## EmploymentDuration.gt.7
## EmploymentDuration.Unemployed
## Personal.Male.Divorced.Seperated
## Personal.Female.NotSingle
## Personal.Male.Single
## Personal.Male.Married.Widowed
## Personal.Female.Single
## OtherDebtorsGuarantors.None *
## OtherDebtorsGuarantors.CoApplicant *
## OtherDebtorsGuarantors.Guarantor
## Property.RealEstate
## Property.Insurance
## Property.CarOther
## Property.Unknown
## OtherInstallmentPlans.Bank **
## OtherInstallmentPlans.Stores
## OtherInstallmentPlans.None
## Housing.Rent .
## Housing.Own
## Housing.ForFree
## Job.UnemployedUnskilled .
## Job.UnskilledResident
## Job.SkilledEmployee
## Job.Management.SelfEmp.HighlyQualified
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## (Dispersion parameter for binomial family taken to be 1)
##
## Null deviance: 958.02 on 799 degrees of freedom
## Residual deviance: 672.78 on 751 degrees of freedom
## AIC: 770.78
##
## Number of Fisher Scoring iterations: 5
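The note "13 not defined because of singularities" and the NA rows arise because a complete set of indicator columns for one predictor is linearly dependent on the intercept. The sketch below reproduces the effect on made-up data; the names `toy`, `g_a`, `g_b`, `g_c`, and the probabilities used to simulate `y` are all assumptions for illustration.

```r
set.seed(1)  # arbitrary seed for the made-up data
n <- 200
group <- sample(c("a", "b", "c"), n, replace = TRUE)

# One 0/1 indicator per level, so the three columns always sum to 1
toy <- data.frame(
  g_a = as.numeric(group == "a"),
  g_b = as.numeric(group == "b"),
  g_c = as.numeric(group == "c")
)
toy$y <- rbinom(n, size = 1, prob = ifelse(group == "a", 0.3, 0.7))

fit <- glm(y ~ ., family = binomial, data = toy)

# g_c = 1 - g_a - g_b is redundant given the intercept, so it is reported as NA
coef(fit)
```

The level whose coefficient is NA acts as the reference category, and the other coefficients in that group are read relative to it.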
Your observation: Based on the model summary,
CheckingAccountStatus.lt.0 has the smallest p-value (2.04e-11) of any
predictor and, at -1.817e+00, the largest-magnitude estimate among the
predictors significant at the 0.05 level. The estimates are on the
log-odds scale, so the more negative an estimate, the less likely an
applicant is to be classed as having good credit. This makes sense: an
applicant with a checking balance below 0 has lower odds of good credit
than one in the omitted reference category, CheckingAccountStatus.none.
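Because a logistic regression coefficient is a change in log-odds, exponentiating it gives an odds ratio. The short calculation below applies this to the CheckingAccountStatus.lt.0 estimate copied from the summary; the variable names `beta_lt0` and `odds_ratio` are introduced here only for illustration.

```r
# Estimate for CheckingAccountStatus.lt.0, copied from the summary output
beta_lt0 <- -1.817

# exp() converts a log-odds coefficient into an odds ratio
odds_ratio <- exp(beta_lt0)
round(odds_ratio, 2)
```

An odds ratio of about 0.16 would mean that, holding the other predictors fixed, the odds of good credit for this group are roughly one sixth of the odds in the reference category.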
# Code for predicted probabilities
pred_prob_German_credit_train <- predict(German_log, newdata = German_credit_train, type = "response")
pred_German_train <- prediction(pred_prob_German_credit_train, German_credit_train$Class)
# Wanted more insight using the Histograms
Histogram_German_credit_train <- predict(German_log)
hist(Histogram_German_credit_train)
Histogram_prob_German_credit_train <- predict(German_log, type="response")
hist(Histogram_prob_German_credit_train)
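Your observation: The first histogram,
Histogram_German_credit_train, shows the linear predictor (log-odds),
which is what predict() returns by default. The second,
Histogram_prob_German_credit_train, uses type="response" and shows the
predicted probabilities for German_credit_train. A majority of the
training cases have a high predicted probability of Class = Good.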
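The two prediction scales are linked by the inverse logit. The sketch below fits a small logistic regression to simulated data (the data frame `toy` and the coefficients used to simulate it are assumptions) and checks that applying `plogis` to the default predictions reproduces the `type = "response"` predictions.

```r
set.seed(1)  # arbitrary seed for the made-up data
toy <- data.frame(x = rnorm(100))
toy$y <- rbinom(100, size = 1, prob = plogis(0.5 + toy$x))

fit <- glm(y ~ x, family = binomial, data = toy)

eta <- predict(fit)                     # default type = "link": log-odds
p   <- predict(fit, type = "response")  # probabilities between 0 and 1

# The inverse logit maps the log-odds onto the probabilities
all.equal(plogis(eta), p)
range(p)
```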
costfunc = function(obs, pred.p, pcut){
weight_FN = 1
weight_FP = 1
FNC = sum( (obs==1) & (pred.p < pcut))
FPC = sum( (obs==0) & (pred.p >=pcut))
MR = sum(weight_FN*FNC + weight_FP*FPC) / length(obs)
return(MR)
}
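To make the behaviour of a cost function like this concrete, the sketch below evaluates an equal-weight misclassification rate on five made-up cases at three cutoffs. The helper `misclass_rate` and the vectors `obs`, `pred_p`, and `cutoffs` are hypothetical; the length check is an addition that guards against R silently recycling a shorter vector.

```r
# Hypothetical helper: equal-weight misclassification rate at a given cutoff
misclass_rate <- function(obs, pred_p, pcut) {
  # Observed labels and predictions must line up one to one
  stopifnot(length(obs) == length(pred_p))
  false_neg <- sum(obs == 1 & pred_p <  pcut)  # actual 1, predicted 0
  false_pos <- sum(obs == 0 & pred_p >= pcut)  # actual 0, predicted 1
  (false_neg + false_pos) / length(obs)
}

obs     <- c(1, 1, 0, 0, 1)            # made-up labels
pred_p  <- c(0.9, 0.4, 0.6, 0.2, 0.7)  # made-up probabilities
cutoffs <- c(0.25, 0.50, 0.75)

rates <- sapply(cutoffs, function(pc) misclass_rate(obs, pred_p, pc))
rates                      # 0.2 0.4 0.4
cutoffs[which.min(rates)]  # 0.25
```

With equal weights the result is the plain misclassification rate; unequal weights would penalise one kind of error more than the other.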
pcut.seq = seq(0.01, 1, 0.01)
MR_vec = rep(0, length(pcut.seq))
for(i in 1:length(pcut.seq)){
MR_vec[i] = costfunc(obs = GermanCredit$Class, pred.p = pred_prob_German_credit_train, pcut = pcut.seq[i])
}
## Warning in (obs == 1) & (pred.p < pcut): longer object length is not a multiple
## of shorter object length
## Warning in (obs == 0) & (pred.p >= pcut): longer object length is not a
## multiple of shorter object length
## Warning in (obs == 1) & (pred.p < pcut): longer object length is not a multiple
## of shorter object length
## Warning in (obs == 0) & (pred.p >= pcut): longer object length is not a
## multiple of shorter object length
## Warning in (obs == 1) & (pred.p < pcut): longer object length is not a multiple
## of shorter object length
## Warning in (obs == 0) & (pred.p >= pcut): longer object length is not a
## multiple of shorter object length
## Warning in (obs == 1) & (pred.p < pcut): longer object length is not a multiple
## of shorter object length
## Warning in (obs == 0) & (pred.p >= pcut): longer object length is not a
## multiple of shorter object length
## Warning in (obs == 1) & (pred.p < pcut): longer object length is not a multiple
## of shorter object length
## Warning in (obs == 0) & (pred.p >= pcut): longer object length is not a
## multiple of shorter object length
## Warning in (obs == 1) & (pred.p < pcut): longer object length is not a multiple
## of shorter object length
## Warning in (obs == 0) & (pred.p >= pcut): longer object length is not a
## multiple of shorter object length
## Warning in (obs == 1) & (pred.p < pcut): longer object length is not a multiple
## of shorter object length
## Warning in (obs == 0) & (pred.p >= pcut): longer object length is not a
## multiple of shorter object length
## Warning in (obs == 1) & (pred.p < pcut): longer object length is not a multiple
## of shorter object length
## Warning in (obs == 0) & (pred.p >= pcut): longer object length is not a
## multiple of shorter object length
## Warning in (obs == 1) & (pred.p < pcut): longer object length is not a multiple
## of shorter object length
## Warning in (obs == 0) & (pred.p >= pcut): longer object length is not a
## multiple of shorter object length
## Warning in (obs == 1) & (pred.p < pcut): longer object length is not a multiple
## of shorter object length
## Warning in (obs == 0) & (pred.p >= pcut): longer object length is not a
## multiple of shorter object length
## Warning in (obs == 1) & (pred.p < pcut): longer object length is not a multiple
## of shorter object length
## Warning in (obs == 0) & (pred.p >= pcut): longer object length is not a
## multiple of shorter object length
## Warning in (obs == 1) & (pred.p < pcut): longer object length is not a multiple
## of shorter object length
## Warning in (obs == 0) & (pred.p >= pcut): longer object length is not a
## multiple of shorter object length
## Warning in (obs == 1) & (pred.p < pcut): longer object length is not a multiple
## of shorter object length
## Warning in (obs == 0) & (pred.p >= pcut): longer object length is not a
## multiple of shorter object length
## Warning in (obs == 1) & (pred.p < pcut): longer object length is not a multiple
## of shorter object length
## Warning in (obs == 0) & (pred.p >= pcut): longer object length is not a
## multiple of shorter object length
## Warning in (obs == 1) & (pred.p < pcut): longer object length is not a multiple
## of shorter object length
## Warning in (obs == 0) & (pred.p >= pcut): longer object length is not a
## multiple of shorter object length
## Warning in (obs == 1) & (pred.p < pcut): longer object length is not a multiple
## of shorter object length
## Warning in (obs == 0) & (pred.p >= pcut): longer object length is not a
## multiple of shorter object length
## Warning in (obs == 1) & (pred.p < pcut): longer object length is not a multiple
## of shorter object length
## Warning in (obs == 0) & (pred.p >= pcut): longer object length is not a
## multiple of shorter object length
## Warning in (obs == 1) & (pred.p < pcut): longer object length is not a multiple
## of shorter object length
## Warning in (obs == 0) & (pred.p >= pcut): longer object length is not a
## multiple of shorter object length
## Warning in (obs == 1) & (pred.p < pcut): longer object length is not a multiple
## of shorter object length
## Warning in (obs == 0) & (pred.p >= pcut): longer object length is not a
## multiple of shorter object length
## Warning in (obs == 1) & (pred.p < pcut): longer object length is not a multiple
## of shorter object length
## Warning in (obs == 0) & (pred.p >= pcut): longer object length is not a
## multiple of shorter object length
cbind(pcut.seq,MR_vec)
## pcut.seq MR_vec
## [1,] 0.01 0.300
## [2,] 0.02 0.300
## [3,] 0.03 0.300
## [4,] 0.04 0.304
## [5,] 0.05 0.304
## [6,] 0.06 0.304
## [7,] 0.07 0.305
## [8,] 0.08 0.305
## [9,] 0.09 0.308
## [10,] 0.10 0.308
## [11,] 0.11 0.310
## [12,] 0.12 0.310
## [13,] 0.13 0.310
## [14,] 0.14 0.311
## [15,] 0.15 0.311
## [16,] 0.16 0.311
## [17,] 0.17 0.312
## [18,] 0.18 0.316
## [19,] 0.19 0.312
## [20,] 0.20 0.315
## [21,] 0.21 0.316
## [22,] 0.22 0.317
## [23,] 0.23 0.319
## [24,] 0.24 0.317
## [25,] 0.25 0.320
## [26,] 0.26 0.325
## [27,] 0.27 0.325
## [28,] 0.28 0.326
## [29,] 0.29 0.325
## [30,] 0.30 0.331
## [31,] 0.31 0.331
## [32,] 0.32 0.336
## [33,] 0.33 0.337
## [34,] 0.34 0.341
## [35,] 0.35 0.346
## [36,] 0.36 0.352
## [37,] 0.37 0.354
## [38,] 0.38 0.361
## [39,] 0.39 0.365
## [40,] 0.40 0.365
## [41,] 0.41 0.368
## [42,] 0.42 0.370
## [43,] 0.43 0.370
## [44,] 0.44 0.372
## [45,] 0.45 0.374
## [46,] 0.46 0.378
## [47,] 0.47 0.384
## [48,] 0.48 0.384
## [49,] 0.49 0.387
## [50,] 0.50 0.390
## [51,] 0.51 0.392
## [52,] 0.52 0.394
## [53,] 0.53 0.403
## [54,] 0.54 0.405
## [55,] 0.55 0.405
## [56,] 0.56 0.409
## [57,] 0.57 0.414
## [58,] 0.58 0.413
## [59,] 0.59 0.422
## [60,] 0.60 0.421
## [61,] 0.61 0.431
## [62,] 0.62 0.432
## [63,] 0.63 0.434
## [64,] 0.64 0.436
## [65,] 0.65 0.442
## [66,] 0.66 0.447
## [67,] 0.67 0.448
## [68,] 0.68 0.450
## [69,] 0.69 0.453
## [70,] 0.70 0.457
## [71,] 0.71 0.458
## [72,] 0.72 0.457
## [73,] 0.73 0.460
## [74,] 0.74 0.464
## [75,] 0.75 0.478
## [76,] 0.76 0.484
## [77,] 0.77 0.485
## [78,] 0.78 0.485
## [79,] 0.79 0.487
## [80,] 0.80 0.488
## [81,] 0.81 0.492
## [82,] 0.82 0.495
## [83,] 0.83 0.503
## [84,] 0.84 0.509
## [85,] 0.85 0.516
## [86,] 0.86 0.519
## [87,] 0.87 0.528
## [88,] 0.88 0.525
## [89,] 0.89 0.530
## [90,] 0.90 0.546
## [91,] 0.91 0.559
## [92,] 0.92 0.566
## [93,] 0.93 0.577
## [94,] 0.94 0.597
## [95,] 0.95 0.612
## [96,] 0.96 0.620
## [97,] 0.97 0.640
## [98,] 0.98 0.657
## [99,] 0.99 0.679
## [100,] 1.00 0.700
# Plot the misclassification rate (y-axis) against each candidate cut-off (x-axis)
plot(pcut.seq, MR_vec)
# Cut-off(s) with the lowest misclassification rate
First.optimal.pcut = pcut.seq[which(MR_vec==min(MR_vec))]
print(First.optimal.pcut)
## [1] 0.01 0.02 0.03
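Your observation: We search a grid of candidate cut-offs (p-cuts) for
the one that gives the lowest misclassification rate (MR) on the
predicted probabilities pred_prob_train. The search returns 0.01, 0.02
and 0.03, each with an MR of 0.300, and MR_vec then rises almost
steadily to 0.700 at a cut-off of 1.00. The recycling warnings above
show that obs and pred.p had different lengths in this run, so the grid
minimum should be read with caution: an MR of 0.300 at the lowest
cut-offs is what one gets by classing nearly every customer as good.
Looking at the whole MR_vec curve rather than only its minimum, I judge
that it starts to shift at about 0.4, and I use 0.4 as the cut-off in
what follows.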
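The grid search described above can be sketched on simulated data. The vectors obs_demo and prob_demo and the helper mr_at are made up for illustration and are not part of the German credit analysis; the stopifnot() check is an added safeguard, so that a length mismatch between observed classes and predicted probabilities fails loudly instead of being recycled.

```r
# Minimal sketch on simulated data (obs_demo and prob_demo are hypothetical).
set.seed(1)
n <- 200
obs_demo  <- rbinom(n, 1, 0.7)                       # 1 = good, 0 = bad
prob_demo <- plogis(-0.5 + 2 * obs_demo + rnorm(n))  # noisy scores in (0, 1)

# Symmetric misclassification rate at a given cut-off
mr_at <- function(obs, pred.p, pcut) {
  stopifnot(length(obs) == length(pred.p))  # refuse silent recycling
  pred_class <- as.integer(pred.p >= pcut)
  mean(pred_class != obs)
}

pcut_grid <- seq(0.01, 1, 0.01)
mr_grid   <- sapply(pcut_grid, function(p) mr_at(obs_demo, prob_demo, p))

# which.min() returns the first minimiser; which(... == min(...)) returns all ties
best_first <- pcut_grid[which.min(mr_grid)]
best_all   <- pcut_grid[which(mr_grid == min(mr_grid))]
```

When several cut-offs tie for the minimum, as in the output above, best_all lists every one of them while best_first keeps only the smallest.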
# Confusion matrix of Training, and Training MR
pred_prob_credit_train <- predict(German_log, type="response")
class.glm0.train <- (pred_prob_credit_train > 0.4)*1
confusion_train <- table(German_credit_train$Class, class.glm0.train, dnn = c("True", "Predicted"))
confusion_train
## Predicted
## True 0 1
## FALSE 107 122
## TRUE 35 536
# MR is the off-diagonal share of the confusion matrix
MR <- 1 - sum(diag(confusion_train)) / sum(confusion_train)
print(paste0("MR:",MR))
## [1] "MR:0.19625"
Your observation: At the 0.4 cut-off the model classifies 643 of the
800 training cases correctly, an accuracy of 80.4% and an MR of 0.196.
It detects good customers well: 536 of 571 are classed as good, a
sensitivity of 93.9%, with 35 false negatives. It is much weaker on bad
customers: 122 of 229 are classed as good (false positives), a
specificity of only 46.7%. The MR has to be computed from the confusion
matrix, not from the vector of predicted probabilities.
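The accuracy quoted above can be reproduced from the four cells of the training confusion matrix. The sketch below retypes those counts by hand into a small matrix (conf_train is a name chosen here for illustration) and derives the usual summary rates from them.

```r
# Counts copied from the training confusion matrix at cut-off 0.4
conf_train <- matrix(c(107, 35, 122, 536), nrow = 2,
                     dimnames = list(True = c("FALSE", "TRUE"),
                                     Predicted = c("0", "1")))

TN <- conf_train["FALSE", "0"]; FP <- conf_train["FALSE", "1"]
FN <- conf_train["TRUE", "0"];  TP <- conf_train["TRUE", "1"]

accuracy    <- (TP + TN) / sum(conf_train)  # 643 / 800
mr          <- 1 - accuracy                 # 157 / 800
sensitivity <- TP / (TP + FN)               # 536 / 571
specificity <- TN / (TN + FP)               # 107 / 229

round(c(accuracy = accuracy, MR = mr,
        sensitivity = sensitivity, specificity = specificity), 3)
```

Reading sensitivity and specificity separately shows where the errors fall, which a single MR figure hides.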
# ROC and AUC for Training
ROC <- performance(pred_German_train, "tpr", "fpr")
plot(ROC, colorize=TRUE)
German_class_test_optim <- ifelse(pred_prob_German_credit_train >= 0.4, 1, 0)
auc_German_train = unlist(slot(performance(pred_German_train, "auc"), "y.values"))
auc_German_train
## [1] 0.8504807
Your observation: The training AUC is 0.8504807. An AUC of 0.5
corresponds to random ranking and 1 to perfect separation, so the model
separates good and bad customers well on the German_credit_train data.
Since this is an in-sample figure, it is likely to be optimistic.
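AUC can be read as the probability that a randomly chosen good customer receives a higher predicted probability than a randomly chosen bad one. A minimal base-R sketch of that reading, on made-up scores (label_demo, score_demo, auc_rank and auc_pairs are hypothetical names, not part of the analysis), computes it through the rank-sum identity rather than through ROCR and checks it against the direct pairwise definition.

```r
# Hypothetical toy data: 1 = good, 0 = bad
label_demo <- c(1, 1, 1, 0, 1, 0, 0, 1)
score_demo <- c(0.9, 0.8, 0.35, 0.7, 0.6, 0.3, 0.2, 0.55)

# AUC from ranks (Mann-Whitney U for the positive class, scaled to [0, 1])
auc_rank <- function(labels, scores) {
  stopifnot(length(labels) == length(scores))
  r     <- rank(scores)  # average ranks handle ties
  n_pos <- sum(labels == 1)
  n_neg <- sum(labels == 0)
  (sum(r[labels == 1]) - n_pos * (n_pos + 1) / 2) / (n_pos * n_neg)
}

# AUC from all positive-negative pairs; ties count as one half
auc_pairs <- function(labels, scores) {
  pos <- scores[labels == 1]
  neg <- scores[labels == 0]
  mean(outer(pos, neg, function(p, q) (p > q) + 0.5 * (p == q)))
}

auc_rank(label_demo, score_demo)
auc_pairs(label_demo, score_demo)
```

Both functions agree on this toy input: 12 of the 15 good-bad pairs are ordered correctly.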
# Confusion matrix of Testing, and Testing MR
pred_prob_credit_test<- predict(German_log, newdata = German_credit_test, type="response")
pred_German_test <- prediction(pred_prob_credit_test, German_credit_test$Class)
class.glm0.test <- (pred_prob_credit_test> 0.4 )*1
confusion_test<- table(German_credit_test$Class, class.glm0.test, dnn = c("True", "Predicted"))
MR<- 1 - sum(diag(confusion_test)) / sum(confusion_test)
print(paste0("MR:",MR))
## [1] "MR:0.27"
Your observation: With the 0.4 cut-off the test MR is 0.27, so the
model classifies 73% of the test cases correctly. That is below the
80.4% accuracy on the training data, which is expected for data the
model has not seen. An MR of 0.27 says nothing about recall on its own:
the split between false positives and false negatives has to be read
from confusion_test, which is not printed here.
library(ROCR)
library(gplots)
# ROC and AUC for Testing
ROC <- performance(pred_German_test, "tpr", "fpr")
plot(ROC, colorize=TRUE)
auc_German_test = unlist(slot(performance(pred_German_test, "auc"), "y.values"))
auc_German_test
## [1] 0.7353423
Your observation: The test AUC is 0.7353423, down from 0.8504807 on
the training data. The model still ranks good customers above bad ones
clearly better than chance (AUC 0.5) on the German_credit_test data,
but the drop suggests some overfitting to the training set.
Now, let’s assume “It is worse to class a customer as good when they are bad (weight = 5), than it is to class a customer as bad when they are good (weight = 1).” Please figure out which weight should be 5 and which weight should be 1. Then define your cost function accordingly!
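To see how the weights act, the training confusion matrix at cut-off 0.4 reported earlier (122 bad customers classed as good, 35 good customers classed as bad, 800 rows) can be plugged into a weighted cost. Since Class is TRUE for good customers, classing a bad customer as good is a false positive here. The helper weighted_cost is a hypothetical name used only for this illustration.

```r
# Hypothetical helper: average weighted cost per observation from error counts
weighted_cost <- function(n_fp, n_fn, n, w_fp = 5, w_fn = 1) {
  (w_fp * n_fp + w_fn * n_fn) / n
}

# Counts from the training confusion matrix at cut-off 0.4
cost_asym <- weighted_cost(n_fp = 122, n_fn = 35, n = 800)            # (5*122 + 35) / 800
cost_sym  <- weighted_cost(n_fp = 122, n_fn = 35, n = 800, w_fp = 1)  # plain MR, 157 / 800

c(asymmetric = cost_asym, symmetric = cost_sym)
```

With the 5:1 weighting the false positives dominate the cost, so a cost-minimising cut-off would be expected to sit higher than one chosen under equal weights.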
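costfunc = function(obs, pred.p, pcut){
  # Class is TRUE (1) for "Good", so "good" is the positive class.
  # False positive: a bad customer classed as good (the costlier error, weight 5).
  # False negative: a good customer classed as bad (weight 1).
  weight_FP = 5
  weight_FN = 1
  FNC = sum((obs == 1) & (pred.p < pcut))   # number of false negatives
  FPC = sum((obs == 0) & (pred.p >= pcut))  # number of false positives
  # Average weighted cost per observation
  MR = (weight_FN*FNC + weight_FP*FPC) / length(obs)
  return(MR)
}
pcut.seq = seq(0.01, 1, 0.01)
MR_vec = rep(0, length(pcut.seq))
# obs and pred.p must have the same length. GermanCredit$Class has 1000 rows,
# while pred_prob_German_credit_train covers only the training rows, which is
# what triggers the recycling warnings below.
for(i in seq_along(pcut.seq)){
  MR_vec[i] = costfunc(obs = GermanCredit$Class, pred.p = pred_prob_German_credit_train, pcut = pcut.seq[i])
}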
## Warning in (obs == 1) & (pred.p < pcut): longer object length is not a multiple
## of shorter object length
## Warning in (obs == 0) & (pred.p >= pcut): longer object length is not a
## multiple of shorter object length
## Warning in (obs == 1) & (pred.p < pcut): longer object length is not a multiple
## of shorter object length
## Warning in (obs == 0) & (pred.p >= pcut): longer object length is not a
## multiple of shorter object length
## Warning in (obs == 1) & (pred.p < pcut): longer object length is not a multiple
## of shorter object length
## Warning in (obs == 0) & (pred.p >= pcut): longer object length is not a
## multiple of shorter object length
## Warning in (obs == 1) & (pred.p < pcut): longer object length is not a multiple
## of shorter object length
## Warning in (obs == 0) & (pred.p >= pcut): longer object length is not a
## multiple of shorter object length
## Warning in (obs == 1) & (pred.p < pcut): longer object length is not a multiple
## of shorter object length
## Warning in (obs == 0) & (pred.p >= pcut): longer object length is not a
## multiple of shorter object length
## Warning in (obs == 1) & (pred.p < pcut): longer object length is not a multiple
## of shorter object length
## Warning in (obs == 0) & (pred.p >= pcut): longer object length is not a
## multiple of shorter object length
## Warning in (obs == 1) & (pred.p < pcut): longer object length is not a multiple
## of shorter object length
## Warning in (obs == 0) & (pred.p >= pcut): longer object length is not a
## multiple of shorter object length
## Warning in (obs == 1) & (pred.p < pcut): longer object length is not a multiple
## of shorter object length
## Warning in (obs == 0) & (pred.p >= pcut): longer object length is not a
## multiple of shorter object length
## Warning in (obs == 1) & (pred.p < pcut): longer object length is not a multiple
## of shorter object length
## Warning in (obs == 0) & (pred.p >= pcut): longer object length is not a
## multiple of shorter object length
## Warning in (obs == 1) & (pred.p < pcut): longer object length is not a multiple
## of shorter object length
## Warning in (obs == 0) & (pred.p >= pcut): longer object length is not a
## multiple of shorter object length
## Warning in (obs == 1) & (pred.p < pcut): longer object length is not a multiple
## of shorter object length
## Warning in (obs == 0) & (pred.p >= pcut): longer object length is not a
## multiple of shorter object length
## Warning in (obs == 1) & (pred.p < pcut): longer object length is not a multiple
## of shorter object length
## Warning in (obs == 0) & (pred.p >= pcut): longer object length is not a
## multiple of shorter object length
## Warning in (obs == 1) & (pred.p < pcut): longer object length is not a multiple
## of shorter object length
## Warning in (obs == 0) & (pred.p >= pcut): longer object length is not a
## multiple of shorter object length
## Warning in (obs == 1) & (pred.p < pcut): longer object length is not a multiple
## of shorter object length
## Warning in (obs == 0) & (pred.p >= pcut): longer object length is not a
## multiple of shorter object length
## Warning in (obs == 1) & (pred.p < pcut): longer object length is not a multiple
## of shorter object length
## Warning in (obs == 0) & (pred.p >= pcut): longer object length is not a
## multiple of shorter object length
## Warning in (obs == 1) & (pred.p < pcut): longer object length is not a multiple
## of shorter object length
## Warning in (obs == 0) & (pred.p >= pcut): longer object length is not a
## multiple of shorter object length
## Warning in (obs == 1) & (pred.p < pcut): longer object length is not a multiple
## of shorter object length
## Warning in (obs == 0) & (pred.p >= pcut): longer object length is not a
## multiple of shorter object length
## Warning in (obs == 1) & (pred.p < pcut): longer object length is not a multiple
## of shorter object length
## Warning in (obs == 0) & (pred.p >= pcut): longer object length is not a
## multiple of shorter object length
## Warning in (obs == 1) & (pred.p < pcut): longer object length is not a multiple
## of shorter object length
## Warning in (obs == 0) & (pred.p >= pcut): longer object length is not a
## multiple of shorter object length
## Warning in (obs == 1) & (pred.p < pcut): longer object length is not a multiple
## of shorter object length
## Warning in (obs == 0) & (pred.p >= pcut): longer object length is not a
## multiple of shorter object length
## Warning in (obs == 1) & (pred.p < pcut): longer object length is not a multiple
## of shorter object length
## Warning in (obs == 0) & (pred.p >= pcut): longer object length is not a
## multiple of shorter object length
## Warning in (obs == 1) & (pred.p < pcut): longer object length is not a multiple
## of shorter object length
## Warning in (obs == 0) & (pred.p >= pcut): longer object length is not a
## multiple of shorter object length
## Warning in (obs == 1) & (pred.p < pcut): longer object length is not a multiple
## of shorter object length
## Warning in (obs == 0) & (pred.p >= pcut): longer object length is not a
## multiple of shorter object length
## Warning in (obs == 1) & (pred.p < pcut): longer object length is not a multiple
## of shorter object length
## Warning in (obs == 0) & (pred.p >= pcut): longer object length is not a
## multiple of shorter object length
## Warning in (obs == 1) & (pred.p < pcut): longer object length is not a multiple
## of shorter object length
## Warning in (obs == 0) & (pred.p >= pcut): longer object length is not a
## multiple of shorter object length
## Warning in (obs == 1) & (pred.p < pcut): longer object length is not a multiple
## of shorter object length
## Warning in (obs == 0) & (pred.p >= pcut): longer object length is not a
## multiple of shorter object length
## Warning in (obs == 1) & (pred.p < pcut): longer object length is not a multiple
## of shorter object length
## Warning in (obs == 0) & (pred.p >= pcut): longer object length is not a
## multiple of shorter object length
## Warning in (obs == 1) & (pred.p < pcut): longer object length is not a multiple
## of shorter object length
## Warning in (obs == 0) & (pred.p >= pcut): longer object length is not a
## multiple of shorter object length
## Warning in (obs == 1) & (pred.p < pcut): longer object length is not a multiple
## of shorter object length
## Warning in (obs == 0) & (pred.p >= pcut): longer object length is not a
## multiple of shorter object length
## Warning in (obs == 1) & (pred.p < pcut): longer object length is not a multiple
## of shorter object length
## Warning in (obs == 0) & (pred.p >= pcut): longer object length is not a
## multiple of shorter object length
## Warning in (obs == 1) & (pred.p < pcut): longer object length is not a multiple
## of shorter object length
## Warning in (obs == 0) & (pred.p >= pcut): longer object length is not a
## multiple of shorter object length
## Warning in (obs == 1) & (pred.p < pcut): longer object length is not a multiple
## of shorter object length
## Warning in (obs == 0) & (pred.p >= pcut): longer object length is not a
## multiple of shorter object length
## Warning in (obs == 1) & (pred.p < pcut): longer object length is not a multiple
## of shorter object length
## Warning in (obs == 0) & (pred.p >= pcut): longer object length is not a
## multiple of shorter object length
## Warning in (obs == 1) & (pred.p < pcut): longer object length is not a multiple
## of shorter object length
## Warning in (obs == 0) & (pred.p >= pcut): longer object length is not a
## multiple of shorter object length
## Warning in (obs == 1) & (pred.p < pcut): longer object length is not a multiple
## of shorter object length
## Warning in (obs == 0) & (pred.p >= pcut): longer object length is not a
## multiple of shorter object length
## Warning in (obs == 1) & (pred.p < pcut): longer object length is not a multiple
## of shorter object length
## Warning in (obs == 0) & (pred.p >= pcut): longer object length is not a
## multiple of shorter object length
## Warning in (obs == 1) & (pred.p < pcut): longer object length is not a multiple
## of shorter object length
## Warning in (obs == 0) & (pred.p >= pcut): longer object length is not a
## multiple of shorter object length
## Warning in (obs == 1) & (pred.p < pcut): longer object length is not a multiple
## of shorter object length
## Warning in (obs == 0) & (pred.p >= pcut): longer object length is not a
## multiple of shorter object length
## Warning in (obs == 1) & (pred.p < pcut): longer object length is not a multiple
## of shorter object length
## Warning in (obs == 0) & (pred.p >= pcut): longer object length is not a
## multiple of shorter object length
## Warning in (obs == 1) & (pred.p < pcut): longer object length is not a multiple
## of shorter object length
## Warning in (obs == 0) & (pred.p >= pcut): longer object length is not a
## multiple of shorter object length
## Warning in (obs == 1) & (pred.p < pcut): longer object length is not a multiple
## of shorter object length
## Warning in (obs == 0) & (pred.p >= pcut): longer object length is not a
## multiple of shorter object length
## Warning in (obs == 1) & (pred.p < pcut): longer object length is not a multiple
## of shorter object length
## Warning in (obs == 0) & (pred.p >= pcut): longer object length is not a
## multiple of shorter object length
## Warning in (obs == 1) & (pred.p < pcut): longer object length is not a multiple
## of shorter object length
## Warning in (obs == 0) & (pred.p >= pcut): longer object length is not a
## multiple of shorter object length
## Warning in (obs == 1) & (pred.p < pcut): longer object length is not a multiple
## of shorter object length
## Warning in (obs == 0) & (pred.p >= pcut): longer object length is not a
## multiple of shorter object length
## Warning in (obs == 1) & (pred.p < pcut): longer object length is not a multiple
## of shorter object length
## Warning in (obs == 0) & (pred.p >= pcut): longer object length is not a
## multiple of shorter object length
## Warning in (obs == 1) & (pred.p < pcut): longer object length is not a multiple
## of shorter object length
## Warning in (obs == 0) & (pred.p >= pcut): longer object length is not a
## multiple of shorter object length
## Warning in (obs == 1) & (pred.p < pcut): longer object length is not a multiple
## of shorter object length
## Warning in (obs == 0) & (pred.p >= pcut): longer object length is not a
## multiple of shorter object length
## Warning in (obs == 1) & (pred.p < pcut): longer object length is not a multiple
## of shorter object length
## Warning in (obs == 0) & (pred.p >= pcut): longer object length is not a
## multiple of shorter object length
cbind(pcut.seq,MR_vec)
## pcut.seq MR_vec
## [1,] 0.01 0.300
## [2,] 0.02 0.300
## [3,] 0.03 0.300
## [4,] 0.04 0.328
## [5,] 0.05 0.328
## [6,] 0.06 0.328
## [7,] 0.07 0.333
## [8,] 0.08 0.337
## [9,] 0.09 0.356
## [10,] 0.10 0.368
## [11,] 0.11 0.382
## [12,] 0.12 0.382
## [13,] 0.13 0.386
## [14,] 0.14 0.391
## [15,] 0.15 0.395
## [16,] 0.16 0.395
## [17,] 0.17 0.400
## [18,] 0.18 0.424
## [19,] 0.19 0.420
## [20,] 0.20 0.439
## [21,] 0.21 0.452
## [22,] 0.22 0.465
## [23,] 0.23 0.483
## [24,] 0.24 0.485
## [25,] 0.25 0.504
## [26,] 0.26 0.529
## [27,] 0.27 0.537
## [28,] 0.28 0.546
## [29,] 0.29 0.545
## [30,] 0.30 0.591
## [31,] 0.31 0.603
## [32,] 0.32 0.632
## [33,] 0.33 0.649
## [34,] 0.34 0.673
## [35,] 0.35 0.710
## [36,] 0.36 0.760
## [37,] 0.37 0.778
## [38,] 0.38 0.817
## [39,] 0.39 0.853
## [40,] 0.40 0.857
## [41,] 0.41 0.880
## [42,] 0.42 0.890
## [43,] 0.43 0.894
## [44,] 0.44 0.912
## [45,] 0.45 0.922
## [46,] 0.46 0.950
## [47,] 0.47 0.984
## [48,] 0.48 0.996
## [49,] 0.49 1.023
## [50,] 0.50 1.046
## [51,] 0.51 1.060
## [52,] 0.52 1.078
## [53,] 0.53 1.127
## [54,] 0.54 1.145
## [55,] 0.55 1.157
## [56,] 0.56 1.189
## [57,] 0.57 1.234
## [58,] 0.58 1.249
## [59,] 0.59 1.294
## [60,] 0.60 1.305
## [61,] 0.61 1.363
## [62,] 0.62 1.384
## [63,] 0.63 1.398
## [64,] 0.64 1.416
## [65,] 0.65 1.454
## [66,] 0.66 1.487
## [67,] 0.67 1.520
## [68,] 0.68 1.534
## [69,] 0.69 1.561
## [70,] 0.70 1.593
## [71,] 0.71 1.614
## [72,] 0.72 1.617
## [73,] 0.73 1.648
## [74,] 0.74 1.680
## [75,] 0.75 1.754
## [76,] 0.76 1.796
## [77,] 0.77 1.817
## [78,] 0.78 1.841
## [79,] 0.79 1.863
## [80,] 0.80 1.884
## [81,] 0.81 1.916
## [82,] 0.82 1.951
## [83,] 0.83 2.003
## [84,] 0.84 2.045
## [85,] 0.85 2.096
## [86,] 0.86 2.135
## [87,] 0.87 2.200
## [88,] 0.88 2.225
## [89,] 0.89 2.274
## [90,] 0.90 2.366
## [91,] 0.91 2.459
## [92,] 0.92 2.522
## [93,] 0.93 2.589
## [94,] 0.94 2.725
## [95,] 0.95 2.840
## [96,] 0.96 2.912
## [97,] 0.97 3.068
## [98,] 0.98 3.205
## [99,] 0.99 3.375
## [100,] 1.00 3.500
# Plot the weighted cost (MR_vec, y axis) against each candidate p-cut (x axis)
plot(pcut.seq, MR_vec)
# New optimal p-cut
Second.optimal.pcut = pcut.seq[which(MR_vec==min(MR_vec))]
print(Second.optimal.pcut)
## [1] 0.01 0.02 0.03
Your observation: We search all candidate p-cuts to find the
one that gives the minimum MR (cost) for the predicted probabilities in
pred_prob_train, this time with the weight changed to
weight_FN = 5. The cost-minimizing cut-off points were still
0.01, 0.02, and 0.03 (cost 0.300), and MR_vec shows the cost
rising almost monotonically across the remaining cut-off values, up to
3.500 at 1.00. The "longer object length" warnings above indicate that
obs and pred.p have different lengths, so these costs should be read
with caution. The curve shifts slightly around 0.4, and we will
continue using 0.4 even though it is not the cost minimizer
(its cost is 0.857).
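The search described above can be sketched as follows. This is a minimal illustration, not the code that produced the output: `weighted_cost` is a hypothetical helper, `weight_FP = 1` is an assumption, and the data are simulated stand-ins rather than the German credit data. The length check is there so that a mismatch between `obs` and `pred.p` stops the run instead of being silently recycled with a warning.

```r
# Hypothetical helper: weighted misclassification cost for one cut-off.
# obs    : 0/1 (or TRUE/FALSE) vector of true classes
# pred.p : predicted probabilities for the SAME rows as obs
weighted_cost <- function(obs, pred.p, pcut, weight_FN = 5, weight_FP = 1) {
  # Fail early instead of letting R recycle the shorter vector
  stopifnot(length(obs) == length(pred.p))
  obs <- as.numeric(obs)
  FN <- (obs == 1) & (pred.p < pcut)    # actual 1, predicted 0
  FP <- (obs == 0) & (pred.p >= pcut)   # actual 0, predicted 1
  mean(weight_FN * FN + weight_FP * FP)
}

# Simulated stand-in data (assumption: not the German credit data)
set.seed(1)
n <- 800
x <- rnorm(n)
obs <- rbinom(n, 1, plogis(1 + x))
fit <- glm(obs ~ x, family = binomial)
pred.p <- fitted(fit)                   # same length as obs by construction

# Grid search over candidate cut-offs
pcut.seq <- seq(0.01, 1, by = 0.01)
cost_vec <- sapply(pcut.seq, function(p) weighted_cost(obs, pred.p, p))
best_pcut <- pcut.seq[which(cost_vec == min(cost_vec))]
print(best_pcut)
```

Passing the observed classes and the predicted probabilities from the same data split (for example, both from the training set) is what keeps the two vectors aligned.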
# Confusion matrix of Training data, Training MR (New Weights)
pred_class_credit_train_optimal <- (pred_prob_German_credit_train>0.4)*1
conf_train <- table(German_credit_train$Class, pred_class_credit_train_optimal, dnn = c("True", "Predicted"))
MR<- 1 - sum(diag(conf_train)) / sum(conf_train)
print(paste0("MR:",MR))
## [1] "MR:0.19625"
Your observation:
Accuracy (1 - MR) = 1 - 0.19625 = 0.80375, or
80.375%, which means the model correctly classifies
80.375% of the training instances at the 0.4 cut-off. This is a
moderate level of accuracy, with some room to reduce
the number of incorrect predictions. Can it predict as well on the
testing set? Let's go ahead and build another confusion
matrix and calculate the testing set's MR.
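The accuracy arithmetic above can be wrapped in a small helper. The sketch below is illustrative only: `mr_from_classes` is a hypothetical name, and the input vectors are made-up toy values. It fixes the factor levels so the table stays 2x2 even when one class is never predicted, and it also reports the false-negative and false-positive rates, which the overall MR hides.

```r
# Hypothetical helper: MR, accuracy, and error rates from true and predicted classes
mr_from_classes <- function(truth, pred_class) {
  # Fix both sets of levels so the table is always 2x2
  truth <- factor(as.numeric(truth), levels = c(0, 1))
  pred_class <- factor(as.numeric(pred_class), levels = c(0, 1))
  conf <- table(truth, pred_class, dnn = c("True", "Predicted"))
  mr <- 1 - sum(diag(conf)) / sum(conf)
  list(conf = conf,
       MR = mr,
       accuracy = 1 - mr,
       FNR = unname(conf["1", "0"] / sum(conf["1", ])),  # missed actual 1s
       FPR = unname(conf["0", "1"] / sum(conf["0", ])))  # false alarms on actual 0s
}

# Toy values (assumption: not taken from the German credit data)
truth <- c(TRUE, TRUE, TRUE, FALSE, FALSE, TRUE, FALSE, TRUE)
pred_prob <- c(0.9, 0.35, 0.7, 0.6, 0.2, 0.8, 0.1, 0.45)
res <- mr_from_classes(truth, (pred_prob > 0.4) * 1)
print(res$conf)
print(paste0("MR:", res$MR))
```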
# Confusion matrix of Testing data, Testing MR (New Weights)
pred_class_credit_test_optimal <- (pred_prob_credit_test>0.4)*1
conf_test <- table(German_credit_test$Class, pred_class_credit_test_optimal, dnn = c("True", "Predicted"))
MR<- 1 - sum(diag(conf_test)) / sum(conf_test)
print(paste0("MR:",MR))
## [1] "MR:0.27"
Your observation: A 73% accuracy (from a 27%
misclassification rate) may be acceptable depending on the domain. Some
companies might accept this as satisfactory
performance; for others, it might necessitate improvement, especially if
the cost of misclassification is high. In conclusion, the model does not
perform better on the testing set than on the training set (MR of 0.27
versus 0.19625), which is the expected pattern for out-of-sample data.
Summarize your findings, including the optimal probability cut-off, MR and AUC for both training and testing data. Discuss what you observed and what you will do to improve the model.
After testing various probability cutoffs, a threshold of
0.4 was used for classification, giving a misclassification
rate (MR) of 0.19625 and an accuracy of 80.4% on the training
set. The model’s AUC of 0.8505 on the training set
indicates strong class discrimination. On the testing set,
the model reached an accuracy of 73% (MR of 0.27) with a
lower AUC of 0.7353423, suggesting that performance drops somewhat on new
data. With a higher false-negative weight (weight_FN = 5), the grid
search still returned 0.01, 0.02, and 0.03 as the cost-minimizing
cutoffs, so 0.4 is a judgment call rather than the cost optimum, which
leaves room for improvement. Enhancements through hyperparameter
tuning, adjusted cutoffs, feature engineering, and cross-validation
could further boost accuracy and generalization, especially in
applications where misclassification costs are high.
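As one possible starting point for the cross-validation mentioned above, the sketch below estimates the MR of a logistic regression at a fixed cut-off with 5-fold cross-validation in base R. It is an assumption-laden illustration: the data are simulated, and the predictor names `x1` and `x2` are hypothetical placeholders rather than columns of the German credit data.

```r
# Simulated stand-in data (assumption: not the German credit data)
set.seed(2)
n <- 500
dat <- data.frame(x1 = rnorm(n), x2 = rnorm(n))
dat$y <- rbinom(n, 1, plogis(0.5 + dat$x1 - 0.5 * dat$x2))

k <- 5        # number of folds
pcut <- 0.4   # cut-off used in the report
fold <- sample(rep(1:k, length.out = n))  # random fold label for each row

# For each fold: fit on the other folds, predict the held-out fold, record its MR
cv_mr <- sapply(1:k, function(i) {
  fit <- glm(y ~ x1 + x2, family = binomial, data = dat[fold != i, ])
  p <- predict(fit, newdata = dat[fold == i, ], type = "response")
  mean((p > pcut) != dat$y[fold == i])
})

print(cv_mr)        # MR per fold
print(mean(cv_mr))  # cross-validated MR estimate
```

Averaging the held-out MR over folds gives a less optimistic estimate than the training MR, and the same loop could be repeated over a grid of cut-offs to choose one without touching the testing set.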