# Data

Data from Dr. Hans Hofmann of the University of Hamburg.

These data classify each applicant's creditworthiness as good or bad. The predictors describe the following attributes: checking account status, duration, credit history, purpose of the loan, amount of the loan, savings accounts or bonds, employment duration, installment rate as a percentage of disposable income, personal information, other debtors/guarantors, residence duration, property, age, other installment plans, housing, number of existing credits, job information, number of people the applicant is liable to provide maintenance for, telephone, and foreign worker status.

Many of these predictors are discrete and have been expanded into several 0/1 indicator variables, so most columns in the data set are binary.
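
To make the indicator expansion concrete, here is a minimal base R sketch on a toy factor. The object names `housing` and `indicators` and the four sample values are assumptions for illustration; the sketch does not claim to reproduce how the GermanCredit columns were actually built.

```r
# Toy factor; the four values are made up for illustration.
housing <- factor(c("Rent", "Own", "ForFree", "Own"),
                  levels = c("Rent", "Own", "ForFree"))

# "- 1" drops the intercept so every level gets its own 0/1 column.
indicators <- model.matrix(~ housing - 1)
colnames(indicators) <- paste0("Housing.", levels(housing))

print(indicators)

# Each row has exactly one 1, so the indicator columns of a group sum to 1.
rowSums(indicators)
```

Because the columns of one group always sum to 1, a full set of indicators carries one redundant column once a model also includes an intercept.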

Task 1: Data Preparation

1. Load the caret package and the GermanCredit dataset.

library(caret) 
## Loading required package: ggplot2
## Loading required package: lattice
library(lattice)
library(ROCR)
library(gplots)
## 
## Attaching package: 'gplots'
## The following object is masked from 'package:stats':
## 
##     lowess
data(GermanCredit)
# Recode Class as logical: TRUE for "Good", FALSE for "Bad"
GermanCredit$Class <- GermanCredit$Class == "Good"
str(GermanCredit)
## 'data.frame':    1000 obs. of  62 variables:
##  $ Duration                              : int  6 48 12 42 24 36 24 36 12 30 ...
##  $ Amount                                : int  1169 5951 2096 7882 4870 9055 2835 6948 3059 5234 ...
##  $ InstallmentRatePercentage             : int  4 2 2 2 3 2 3 2 2 4 ...
##  $ ResidenceDuration                     : int  4 2 3 4 4 4 4 2 4 2 ...
##  $ Age                                   : int  67 22 49 45 53 35 53 35 61 28 ...
##  $ NumberExistingCredits                 : int  2 1 1 1 2 1 1 1 1 2 ...
##  $ NumberPeopleMaintenance               : int  1 1 2 2 2 2 1 1 1 1 ...
##  $ Telephone                             : num  0 1 1 1 1 0 1 0 1 1 ...
##  $ ForeignWorker                         : num  1 1 1 1 1 1 1 1 1 1 ...
##  $ Class                                 : logi  TRUE FALSE TRUE TRUE FALSE TRUE ...
##  $ CheckingAccountStatus.lt.0            : num  1 0 0 1 1 0 0 0 0 0 ...
##  $ CheckingAccountStatus.0.to.200        : num  0 1 0 0 0 0 0 1 0 1 ...
##  $ CheckingAccountStatus.gt.200          : num  0 0 0 0 0 0 0 0 0 0 ...
##  $ CheckingAccountStatus.none            : num  0 0 1 0 0 1 1 0 1 0 ...
##  $ CreditHistory.NoCredit.AllPaid        : num  0 0 0 0 0 0 0 0 0 0 ...
##  $ CreditHistory.ThisBank.AllPaid        : num  0 0 0 0 0 0 0 0 0 0 ...
##  $ CreditHistory.PaidDuly                : num  0 1 0 1 0 1 1 1 1 0 ...
##  $ CreditHistory.Delay                   : num  0 0 0 0 1 0 0 0 0 0 ...
##  $ CreditHistory.Critical                : num  1 0 1 0 0 0 0 0 0 1 ...
##  $ Purpose.NewCar                        : num  0 0 0 0 1 0 0 0 0 1 ...
##  $ Purpose.UsedCar                       : num  0 0 0 0 0 0 0 1 0 0 ...
##  $ Purpose.Furniture.Equipment           : num  0 0 0 1 0 0 1 0 0 0 ...
##  $ Purpose.Radio.Television              : num  1 1 0 0 0 0 0 0 1 0 ...
##  $ Purpose.DomesticAppliance             : num  0 0 0 0 0 0 0 0 0 0 ...
##  $ Purpose.Repairs                       : num  0 0 0 0 0 0 0 0 0 0 ...
##  $ Purpose.Education                     : num  0 0 1 0 0 1 0 0 0 0 ...
##  $ Purpose.Vacation                      : num  0 0 0 0 0 0 0 0 0 0 ...
##  $ Purpose.Retraining                    : num  0 0 0 0 0 0 0 0 0 0 ...
##  $ Purpose.Business                      : num  0 0 0 0 0 0 0 0 0 0 ...
##  $ Purpose.Other                         : num  0 0 0 0 0 0 0 0 0 0 ...
##  $ SavingsAccountBonds.lt.100            : num  0 1 1 1 1 0 0 1 0 1 ...
##  $ SavingsAccountBonds.100.to.500        : num  0 0 0 0 0 0 0 0 0 0 ...
##  $ SavingsAccountBonds.500.to.1000       : num  0 0 0 0 0 0 1 0 0 0 ...
##  $ SavingsAccountBonds.gt.1000           : num  0 0 0 0 0 0 0 0 1 0 ...
##  $ SavingsAccountBonds.Unknown           : num  1 0 0 0 0 1 0 0 0 0 ...
##  $ EmploymentDuration.lt.1               : num  0 0 0 0 0 0 0 0 0 0 ...
##  $ EmploymentDuration.1.to.4             : num  0 1 0 0 1 1 0 1 0 0 ...
##  $ EmploymentDuration.4.to.7             : num  0 0 1 1 0 0 0 0 1 0 ...
##  $ EmploymentDuration.gt.7               : num  1 0 0 0 0 0 1 0 0 0 ...
##  $ EmploymentDuration.Unemployed         : num  0 0 0 0 0 0 0 0 0 1 ...
##  $ Personal.Male.Divorced.Seperated      : num  0 0 0 0 0 0 0 0 1 0 ...
##  $ Personal.Female.NotSingle             : num  0 1 0 0 0 0 0 0 0 0 ...
##  $ Personal.Male.Single                  : num  1 0 1 1 1 1 1 1 0 0 ...
##  $ Personal.Male.Married.Widowed         : num  0 0 0 0 0 0 0 0 0 1 ...
##  $ Personal.Female.Single                : num  0 0 0 0 0 0 0 0 0 0 ...
##  $ OtherDebtorsGuarantors.None           : num  1 1 1 0 1 1 1 1 1 1 ...
##  $ OtherDebtorsGuarantors.CoApplicant    : num  0 0 0 0 0 0 0 0 0 0 ...
##  $ OtherDebtorsGuarantors.Guarantor      : num  0 0 0 1 0 0 0 0 0 0 ...
##  $ Property.RealEstate                   : num  1 1 1 0 0 0 0 0 1 0 ...
##  $ Property.Insurance                    : num  0 0 0 1 0 0 1 0 0 0 ...
##  $ Property.CarOther                     : num  0 0 0 0 0 0 0 1 0 1 ...
##  $ Property.Unknown                      : num  0 0 0 0 1 1 0 0 0 0 ...
##  $ OtherInstallmentPlans.Bank            : num  0 0 0 0 0 0 0 0 0 0 ...
##  $ OtherInstallmentPlans.Stores          : num  0 0 0 0 0 0 0 0 0 0 ...
##  $ OtherInstallmentPlans.None            : num  1 1 1 1 1 1 1 1 1 1 ...
##  $ Housing.Rent                          : num  0 0 0 0 0 0 0 1 0 0 ...
##  $ Housing.Own                           : num  1 1 1 0 0 0 1 0 1 1 ...
##  $ Housing.ForFree                       : num  0 0 0 1 1 1 0 0 0 0 ...
##  $ Job.UnemployedUnskilled               : num  0 0 0 0 0 0 0 0 0 0 ...
##  $ Job.UnskilledResident                 : num  0 0 1 0 0 1 0 0 1 0 ...
##  $ Job.SkilledEmployee                   : num  1 1 0 1 1 0 1 0 0 0 ...
##  $ Job.Management.SelfEmp.HighlyQualified: num  0 0 0 0 0 0 0 1 0 1 ...

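Your observation: There are 1000 observations of 62 variables. Seven predictors (Duration, Amount, InstallmentRatePercentage, ResidenceDuration, Age, NumberExistingCredits, and NumberPeopleMaintenance) are stored as int and hold numeric or ordinal values, not indicator codes. Class is now logical (TRUE = good), and the remaining 54 columns are num 0/1 indicators.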
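
The column types reported by str() can also be tallied programmatically. The sketch below uses a toy data frame named `toy` (an assumption) that copies the first three values of four GermanCredit columns from the output above; the same two calls could be pointed at the full data frame.

```r
# Toy stand-in for four GermanCredit columns (first three rows of the str() output).
toy <- data.frame(
  Duration  = c(6L, 48L, 12L),
  Amount    = c(1169L, 5951L, 2096L),
  Telephone = c(0, 1, 1),
  Class     = c(TRUE, FALSE, TRUE)
)

# Count columns by storage class.
type_counts <- table(vapply(toy, function(x) class(x)[1], character(1)))
print(type_counts)

# Flag numeric columns that only ever take the values 0 and 1.
is_indicator <- vapply(toy,
                       function(x) is.numeric(x) && all(x %in% c(0, 1)),
                       logical(1))
names(toy)[is_indicator]
```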

2. Explore the dataset to understand its structure. (10pts)

?GermanCredit
View(GermanCredit)
colnames(GermanCredit)
##  [1] "Duration"                              
##  [2] "Amount"                                
##  [3] "InstallmentRatePercentage"             
##  [4] "ResidenceDuration"                     
##  [5] "Age"                                   
##  [6] "NumberExistingCredits"                 
##  [7] "NumberPeopleMaintenance"               
##  [8] "Telephone"                             
##  [9] "ForeignWorker"                         
## [10] "Class"                                 
## [11] "CheckingAccountStatus.lt.0"            
## [12] "CheckingAccountStatus.0.to.200"        
## [13] "CheckingAccountStatus.gt.200"          
## [14] "CheckingAccountStatus.none"            
## [15] "CreditHistory.NoCredit.AllPaid"        
## [16] "CreditHistory.ThisBank.AllPaid"        
## [17] "CreditHistory.PaidDuly"                
## [18] "CreditHistory.Delay"                   
## [19] "CreditHistory.Critical"                
## [20] "Purpose.NewCar"                        
## [21] "Purpose.UsedCar"                       
## [22] "Purpose.Furniture.Equipment"           
## [23] "Purpose.Radio.Television"              
## [24] "Purpose.DomesticAppliance"             
## [25] "Purpose.Repairs"                       
## [26] "Purpose.Education"                     
## [27] "Purpose.Vacation"                      
## [28] "Purpose.Retraining"                    
## [29] "Purpose.Business"                      
## [30] "Purpose.Other"                         
## [31] "SavingsAccountBonds.lt.100"            
## [32] "SavingsAccountBonds.100.to.500"        
## [33] "SavingsAccountBonds.500.to.1000"       
## [34] "SavingsAccountBonds.gt.1000"           
## [35] "SavingsAccountBonds.Unknown"           
## [36] "EmploymentDuration.lt.1"               
## [37] "EmploymentDuration.1.to.4"             
## [38] "EmploymentDuration.4.to.7"             
## [39] "EmploymentDuration.gt.7"               
## [40] "EmploymentDuration.Unemployed"         
## [41] "Personal.Male.Divorced.Seperated"      
## [42] "Personal.Female.NotSingle"             
## [43] "Personal.Male.Single"                  
## [44] "Personal.Male.Married.Widowed"         
## [45] "Personal.Female.Single"                
## [46] "OtherDebtorsGuarantors.None"           
## [47] "OtherDebtorsGuarantors.CoApplicant"    
## [48] "OtherDebtorsGuarantors.Guarantor"      
## [49] "Property.RealEstate"                   
## [50] "Property.Insurance"                    
## [51] "Property.CarOther"                     
## [52] "Property.Unknown"                      
## [53] "OtherInstallmentPlans.Bank"            
## [54] "OtherInstallmentPlans.Stores"          
## [55] "OtherInstallmentPlans.None"            
## [56] "Housing.Rent"                          
## [57] "Housing.Own"                           
## [58] "Housing.ForFree"                       
## [59] "Job.UnemployedUnskilled"               
## [60] "Job.UnskilledResident"                 
## [61] "Job.SkilledEmployee"                   
## [62] "Job.Management.SelfEmp.HighlyQualified"
mean(GermanCredit$Class)
## [1] 0.7
# Understanding German Credit structure 
summary(GermanCredit)
##     Duration        Amount      InstallmentRatePercentage ResidenceDuration
##  Min.   : 4.0   Min.   :  250   Min.   :1.000             Min.   :1.000    
##  1st Qu.:12.0   1st Qu.: 1366   1st Qu.:2.000             1st Qu.:2.000    
##  Median :18.0   Median : 2320   Median :3.000             Median :3.000    
##  Mean   :20.9   Mean   : 3271   Mean   :2.973             Mean   :2.845    
##  3rd Qu.:24.0   3rd Qu.: 3972   3rd Qu.:4.000             3rd Qu.:4.000    
##  Max.   :72.0   Max.   :18424   Max.   :4.000             Max.   :4.000    
##       Age        NumberExistingCredits NumberPeopleMaintenance   Telephone    
##  Min.   :19.00   Min.   :1.000         Min.   :1.000           Min.   :0.000  
##  1st Qu.:27.00   1st Qu.:1.000         1st Qu.:1.000           1st Qu.:0.000  
##  Median :33.00   Median :1.000         Median :1.000           Median :1.000  
##  Mean   :35.55   Mean   :1.407         Mean   :1.155           Mean   :0.596  
##  3rd Qu.:42.00   3rd Qu.:2.000         3rd Qu.:1.000           3rd Qu.:1.000  
##  Max.   :75.00   Max.   :4.000         Max.   :2.000           Max.   :1.000  
##  ForeignWorker     Class         CheckingAccountStatus.lt.0
##  Min.   :0.000   Mode :logical   Min.   :0.000             
##  1st Qu.:1.000   FALSE:300       1st Qu.:0.000             
##  Median :1.000   TRUE :700       Median :0.000             
##  Mean   :0.963                   Mean   :0.274             
##  3rd Qu.:1.000                   3rd Qu.:1.000             
##  Max.   :1.000                   Max.   :1.000             
##  CheckingAccountStatus.0.to.200 CheckingAccountStatus.gt.200
##  Min.   :0.000                  Min.   :0.000               
##  1st Qu.:0.000                  1st Qu.:0.000               
##  Median :0.000                  Median :0.000               
##  Mean   :0.269                  Mean   :0.063               
##  3rd Qu.:1.000                  3rd Qu.:0.000               
##  Max.   :1.000                  Max.   :1.000               
##  CheckingAccountStatus.none CreditHistory.NoCredit.AllPaid
##  Min.   :0.000              Min.   :0.00                  
##  1st Qu.:0.000              1st Qu.:0.00                  
##  Median :0.000              Median :0.00                  
##  Mean   :0.394              Mean   :0.04                  
##  3rd Qu.:1.000              3rd Qu.:0.00                  
##  Max.   :1.000              Max.   :1.00                  
##  CreditHistory.ThisBank.AllPaid CreditHistory.PaidDuly CreditHistory.Delay
##  Min.   :0.000                  Min.   :0.00           Min.   :0.000      
##  1st Qu.:0.000                  1st Qu.:0.00           1st Qu.:0.000      
##  Median :0.000                  Median :1.00           Median :0.000      
##  Mean   :0.049                  Mean   :0.53           Mean   :0.088      
##  3rd Qu.:0.000                  3rd Qu.:1.00           3rd Qu.:0.000      
##  Max.   :1.000                  Max.   :1.00           Max.   :1.000      
##  CreditHistory.Critical Purpose.NewCar  Purpose.UsedCar
##  Min.   :0.000          Min.   :0.000   Min.   :0.000  
##  1st Qu.:0.000          1st Qu.:0.000   1st Qu.:0.000  
##  Median :0.000          Median :0.000   Median :0.000  
##  Mean   :0.293          Mean   :0.234   Mean   :0.103  
##  3rd Qu.:1.000          3rd Qu.:0.000   3rd Qu.:0.000  
##  Max.   :1.000          Max.   :1.000   Max.   :1.000  
##  Purpose.Furniture.Equipment Purpose.Radio.Television Purpose.DomesticAppliance
##  Min.   :0.000               Min.   :0.00             Min.   :0.000            
##  1st Qu.:0.000               1st Qu.:0.00             1st Qu.:0.000            
##  Median :0.000               Median :0.00             Median :0.000            
##  Mean   :0.181               Mean   :0.28             Mean   :0.012            
##  3rd Qu.:0.000               3rd Qu.:1.00             3rd Qu.:0.000            
##  Max.   :1.000               Max.   :1.00             Max.   :1.000            
##  Purpose.Repairs Purpose.Education Purpose.Vacation Purpose.Retraining
##  Min.   :0.000   Min.   :0.00      Min.   :0        Min.   :0.000     
##  1st Qu.:0.000   1st Qu.:0.00      1st Qu.:0        1st Qu.:0.000     
##  Median :0.000   Median :0.00      Median :0        Median :0.000     
##  Mean   :0.022   Mean   :0.05      Mean   :0        Mean   :0.009     
##  3rd Qu.:0.000   3rd Qu.:0.00      3rd Qu.:0        3rd Qu.:0.000     
##  Max.   :1.000   Max.   :1.00      Max.   :0        Max.   :1.000     
##  Purpose.Business Purpose.Other   SavingsAccountBonds.lt.100
##  Min.   :0.000    Min.   :0.000   Min.   :0.000             
##  1st Qu.:0.000    1st Qu.:0.000   1st Qu.:0.000             
##  Median :0.000    Median :0.000   Median :1.000             
##  Mean   :0.097    Mean   :0.012   Mean   :0.603             
##  3rd Qu.:0.000    3rd Qu.:0.000   3rd Qu.:1.000             
##  Max.   :1.000    Max.   :1.000   Max.   :1.000             
##  SavingsAccountBonds.100.to.500 SavingsAccountBonds.500.to.1000
##  Min.   :0.000                  Min.   :0.000                  
##  1st Qu.:0.000                  1st Qu.:0.000                  
##  Median :0.000                  Median :0.000                  
##  Mean   :0.103                  Mean   :0.063                  
##  3rd Qu.:0.000                  3rd Qu.:0.000                  
##  Max.   :1.000                  Max.   :1.000                  
##  SavingsAccountBonds.gt.1000 SavingsAccountBonds.Unknown
##  Min.   :0.000               Min.   :0.000              
##  1st Qu.:0.000               1st Qu.:0.000              
##  Median :0.000               Median :0.000              
##  Mean   :0.048               Mean   :0.183              
##  3rd Qu.:0.000               3rd Qu.:0.000              
##  Max.   :1.000               Max.   :1.000              
##  EmploymentDuration.lt.1 EmploymentDuration.1.to.4 EmploymentDuration.4.to.7
##  Min.   :0.000           Min.   :0.000             Min.   :0.000            
##  1st Qu.:0.000           1st Qu.:0.000             1st Qu.:0.000            
##  Median :0.000           Median :0.000             Median :0.000            
##  Mean   :0.172           Mean   :0.339             Mean   :0.174            
##  3rd Qu.:0.000           3rd Qu.:1.000             3rd Qu.:0.000            
##  Max.   :1.000           Max.   :1.000             Max.   :1.000            
##  EmploymentDuration.gt.7 EmploymentDuration.Unemployed
##  Min.   :0.000           Min.   :0.000                
##  1st Qu.:0.000           1st Qu.:0.000                
##  Median :0.000           Median :0.000                
##  Mean   :0.253           Mean   :0.062                
##  3rd Qu.:1.000           3rd Qu.:0.000                
##  Max.   :1.000           Max.   :1.000                
##  Personal.Male.Divorced.Seperated Personal.Female.NotSingle
##  Min.   :0.00                     Min.   :0.00             
##  1st Qu.:0.00                     1st Qu.:0.00             
##  Median :0.00                     Median :0.00             
##  Mean   :0.05                     Mean   :0.31             
##  3rd Qu.:0.00                     3rd Qu.:1.00             
##  Max.   :1.00                     Max.   :1.00             
##  Personal.Male.Single Personal.Male.Married.Widowed Personal.Female.Single
##  Min.   :0.000        Min.   :0.000                 Min.   :0             
##  1st Qu.:0.000        1st Qu.:0.000                 1st Qu.:0             
##  Median :1.000        Median :0.000                 Median :0             
##  Mean   :0.548        Mean   :0.092                 Mean   :0             
##  3rd Qu.:1.000        3rd Qu.:0.000                 3rd Qu.:0             
##  Max.   :1.000        Max.   :1.000                 Max.   :0             
##  OtherDebtorsGuarantors.None OtherDebtorsGuarantors.CoApplicant
##  Min.   :0.000               Min.   :0.000                     
##  1st Qu.:1.000               1st Qu.:0.000                     
##  Median :1.000               Median :0.000                     
##  Mean   :0.907               Mean   :0.041                     
##  3rd Qu.:1.000               3rd Qu.:0.000                     
##  Max.   :1.000               Max.   :1.000                     
##  OtherDebtorsGuarantors.Guarantor Property.RealEstate Property.Insurance
##  Min.   :0.000                    Min.   :0.000       Min.   :0.000     
##  1st Qu.:0.000                    1st Qu.:0.000       1st Qu.:0.000     
##  Median :0.000                    Median :0.000       Median :0.000     
##  Mean   :0.052                    Mean   :0.282       Mean   :0.232     
##  3rd Qu.:0.000                    3rd Qu.:1.000       3rd Qu.:0.000     
##  Max.   :1.000                    Max.   :1.000       Max.   :1.000     
##  Property.CarOther Property.Unknown OtherInstallmentPlans.Bank
##  Min.   :0.000     Min.   :0.000    Min.   :0.000             
##  1st Qu.:0.000     1st Qu.:0.000    1st Qu.:0.000             
##  Median :0.000     Median :0.000    Median :0.000             
##  Mean   :0.332     Mean   :0.154    Mean   :0.139             
##  3rd Qu.:1.000     3rd Qu.:0.000    3rd Qu.:0.000             
##  Max.   :1.000     Max.   :1.000    Max.   :1.000             
##  OtherInstallmentPlans.Stores OtherInstallmentPlans.None  Housing.Rent  
##  Min.   :0.000                Min.   :0.000              Min.   :0.000  
##  1st Qu.:0.000                1st Qu.:1.000              1st Qu.:0.000  
##  Median :0.000                Median :1.000              Median :0.000  
##  Mean   :0.047                Mean   :0.814              Mean   :0.179  
##  3rd Qu.:0.000                3rd Qu.:1.000              3rd Qu.:0.000  
##  Max.   :1.000                Max.   :1.000              Max.   :1.000  
##   Housing.Own    Housing.ForFree Job.UnemployedUnskilled Job.UnskilledResident
##  Min.   :0.000   Min.   :0.000   Min.   :0.000           Min.   :0.0          
##  1st Qu.:0.000   1st Qu.:0.000   1st Qu.:0.000           1st Qu.:0.0          
##  Median :1.000   Median :0.000   Median :0.000           Median :0.0          
##  Mean   :0.713   Mean   :0.108   Mean   :0.022           Mean   :0.2          
##  3rd Qu.:1.000   3rd Qu.:0.000   3rd Qu.:0.000           3rd Qu.:0.0          
##  Max.   :1.000   Max.   :1.000   Max.   :1.000           Max.   :1.0          
##  Job.SkilledEmployee Job.Management.SelfEmp.HighlyQualified
##  Min.   :0.00        Min.   :0.000                         
##  1st Qu.:0.00        1st Qu.:0.000                         
##  Median :1.00        Median :0.000                         
##  Mean   :0.63        Mean   :0.148                         
##  3rd Qu.:1.00        3rd Qu.:0.000                         
##  Max.   :1.00        Max.   :1.000

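Your observation: Most columns are binary 0/1 indicators, and the seven int columns are numeric. The mean of Class is 0.7, so 700 of the 1000 applicants are rated good and 300 bad. Purpose.Vacation and Personal.Female.Single are 0 in every row, so they carry no information.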
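
In the summary output, Purpose.Vacation and Personal.Female.Single have a maximum of 0, which means they are constant. A hedged base R sketch for finding such columns follows; the toy data frame and the names `is_constant`, `constant_cols`, and `toy_reduced` are assumptions for illustration.

```r
# Toy data; Purpose.Vacation is deliberately all zeros.
toy <- data.frame(
  Purpose.NewCar   = c(0, 0, 1, 0),
  Purpose.Vacation = c(0, 0, 0, 0),
  Housing.Own      = c(1, 1, 0, 1)
)

# A column is constant when it has a single distinct value.
is_constant <- vapply(toy, function(x) length(unique(x)) == 1, logical(1))
constant_cols <- names(toy)[is_constant]
print(constant_cols)

# Keep only the columns that vary.
toy_reduced <- toy[, !is_constant, drop = FALSE]
```

The caret package also offers nearZeroVar() for a related, more general screen; the base R version is used here only to keep the sketch free of dependencies.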

3. Split the dataset into training and test set. Please use the random seed as 2024 for reproducibility. (10pts)

# Splitting German Credit into Training and Testing sets
set.seed(2024)
index <- sample(1:nrow(GermanCredit), nrow(GermanCredit) * 0.80)
German_credit_train <- GermanCredit[index, ]
German_credit_test <- GermanCredit[-index, ]

Your observation: GermanCredit was split at random into German_credit_train (80% of the rows, 800 observations) and German_credit_test (the remaining 200 observations).
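
A split like this can be sanity-checked by confirming the set sizes, the class balance in each part, and that no row appears in both parts. The sketch below runs the same sampling pattern on a toy data frame (the `toy` objects and the `id` column are assumptions) with the 70/30 class balance of GermanCredit; the proportions it prints apply to the toy data only.

```r
set.seed(2024)

# Toy data: 1000 rows, 700 TRUE and 300 FALSE, plus a row id for overlap checks.
toy <- data.frame(id = 1:1000,
                  Class = rep(c(TRUE, FALSE), times = c(700, 300)))

index <- sample(seq_len(nrow(toy)), size = 0.80 * nrow(toy))
toy_train <- toy[index, ]
toy_test  <- toy[-index, ]

# Sizes of the two parts.
c(train = nrow(toy_train), test = nrow(toy_test))

# Share of TRUE in each part; a simple random split only approximates 0.7.
c(train = mean(toy_train$Class), test = mean(toy_test$Class))

# Number of rows shared by both parts (should be 0).
length(intersect(toy_train$id, toy_test$id))
```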

Task 2: Model Fitting (20pts)

1. Fit a logistic regression model using the training set. Please use all variables, but make sure the variable types are right.

# Making a Logistic Regression Model
German_log <- glm(Class~., family=binomial, data=German_credit_train)
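
Because each categorical predictor arrives as a complete set of 0/1 indicators, fitting on every column gives glm() redundant columns, and it reports the redundant coefficients as NA. The toy sketch below illustrates that behavior under stated assumptions: the data are simulated, and the names `level`, `toy`, `fit_full`, and `fit_ref` are made up for the example.

```r
set.seed(1)

# Toy three-level factor expanded into a full set of indicators.
level <- sample(c("a", "b", "c"), size = 200, replace = TRUE)
toy <- data.frame(
  y   = rbinom(200, size = 1, prob = 0.6),
  x.a = as.numeric(level == "a"),
  x.b = as.numeric(level == "b"),
  x.c = as.numeric(level == "c")
)

# x.a + x.b + x.c equals 1 in every row, duplicating the intercept.
fit_full <- glm(y ~ ., family = binomial, data = toy)
coef(fit_full)  # x.c, the last indicator of the group, comes back NA

# Dropping one indicator per group makes that level the reference.
fit_ref <- glm(y ~ x.a + x.b, family = binomial, data = toy)
coef(fit_ref)
```

Both fits give the same fitted probabilities; the only difference is that the second makes the reference level explicit, which matters when interpreting the remaining coefficients.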

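Your observation: The model regresses Class on all 61 other columns. In the summary below, the predictors significant at the 0.001 level are Amount, InstallmentRatePercentage, CheckingAccountStatus.lt.0, CheckingAccountStatus.0.to.200, and SavingsAccountBonds.lt.100. Duration, CreditHistory.ThisBank.AllPaid, SavingsAccountBonds.100.to.500, and OtherInstallmentPlans.Bank are significant at the 0.01 level. Most of the remaining coefficients are not significantly different from 0 at the 0.05 level. Thirteen coefficients are NA because each group of indicators sums to 1 and so duplicates the intercept, and because Purpose.Vacation and Personal.Female.Single are 0 in every row.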
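
The coefficients in the summary printed under the next step are on the log-odds scale, so exponentiating one gives an odds ratio. As a worked example, the sketch below takes the estimate and standard error for CheckingAccountStatus.lt.0 from that output and converts them; the variable names and the choice of a 95% Wald interval are assumptions made for illustration.

```r
# Estimate and standard error copied from the summary output
# for CheckingAccountStatus.lt.0.
estimate  <- -1.817
std_error <- 0.2710

# Odds ratio relative to the omitted reference level.
odds_ratio <- exp(estimate)

# Approximate 95% Wald interval, computed on the log-odds scale
# and then exponentiated.
wald_ci <- exp(estimate + c(-1, 1) * qnorm(0.975) * std_error)

round(odds_ratio, 3)
round(wald_ci, 3)
```

The odds ratio is about 0.16. Since CheckingAccountStatus.none is the indicator left out of this group (its coefficient is NA), the comparison is against applicants with no checking account: holding the other predictors fixed, a balance below 0 is associated with odds of a good rating roughly 84% lower.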

2. Summarize the model and interpret the coefficients (pick at least one coefficient you think important and discuss it in detail).

# Summary of the Logistic Regression of German Credit Train
summary(German_log)
## 
## Call:
## glm(formula = Class ~ ., family = binomial, data = German_credit_train)
## 
## Coefficients: (13 not defined because of singularities)
##                                          Estimate Std. Error z value Pr(>|z|)
## (Intercept)                             9.241e+00  1.719e+00   5.376 7.61e-08
## Duration                               -2.994e-02  1.072e-02  -2.794 0.005214
## Amount                                 -1.771e-04  5.095e-05  -3.475 0.000510
## InstallmentRatePercentage              -3.718e-01  1.036e-01  -3.589 0.000332
## ResidenceDuration                       2.577e-02  1.010e-01   0.255 0.798510
## Age                                     1.183e-02  1.097e-02   1.078 0.280974
## NumberExistingCredits                  -1.225e-01  2.189e-01  -0.560 0.575690
## NumberPeopleMaintenance                -1.731e-01  2.945e-01  -0.588 0.556678
## Telephone                              -4.236e-01  2.371e-01  -1.786 0.074081
## ForeignWorker                          -1.651e+00  7.421e-01  -2.224 0.026143
## CheckingAccountStatus.lt.0             -1.817e+00  2.710e-01  -6.703 2.04e-11
## CheckingAccountStatus.0.to.200         -1.432e+00  2.686e-01  -5.330 9.81e-08
## CheckingAccountStatus.gt.200           -5.912e-01  4.631e-01  -1.277 0.201696
## CheckingAccountStatus.none                     NA         NA      NA       NA
## CreditHistory.NoCredit.AllPaid         -8.724e-01  5.139e-01  -1.698 0.089584
## CreditHistory.ThisBank.AllPaid         -1.676e+00  5.493e-01  -3.052 0.002277
## CreditHistory.PaidDuly                 -6.686e-01  2.939e-01  -2.275 0.022899
## CreditHistory.Delay                    -9.413e-01  3.780e-01  -2.491 0.012756
## CreditHistory.Critical                         NA         NA      NA       NA
## Purpose.NewCar                         -1.733e+00  1.013e+00  -1.710 0.087282
## Purpose.UsedCar                         6.716e-02  1.033e+00   0.065 0.948146
## Purpose.Furniture.Equipment            -8.257e-01  1.015e+00  -0.814 0.415816
## Purpose.Radio.Television               -8.386e-01  1.019e+00  -0.823 0.410457
## Purpose.DomesticAppliance              -1.227e+00  1.328e+00  -0.923 0.355762
## Purpose.Repairs                        -1.321e+00  1.165e+00  -1.134 0.256825
## Purpose.Education                      -2.020e+00  1.088e+00  -1.857 0.063374
## Purpose.Vacation                               NA         NA      NA       NA
## Purpose.Retraining                      4.276e-01  1.640e+00   0.261 0.794237
## Purpose.Business                       -8.618e-01  1.032e+00  -0.835 0.403529
## Purpose.Other                                  NA         NA      NA       NA
## SavingsAccountBonds.lt.100             -1.266e+00  3.201e-01  -3.956 7.63e-05
## SavingsAccountBonds.100.to.500         -1.075e+00  4.171e-01  -2.577 0.009964
## SavingsAccountBonds.500.to.1000        -8.768e-01  5.216e-01  -1.681 0.092761
## SavingsAccountBonds.gt.1000             1.301e-02  6.161e-01   0.021 0.983157
## SavingsAccountBonds.Unknown                    NA         NA      NA       NA
## EmploymentDuration.lt.1                 3.581e-01  5.167e-01   0.693 0.488195
## EmploymentDuration.1.to.4               5.527e-01  5.000e-01   1.105 0.268967
## EmploymentDuration.4.to.7               9.863e-01  5.355e-01   1.842 0.065524
## EmploymentDuration.gt.7                 5.253e-01  5.039e-01   1.042 0.297218
## EmploymentDuration.Unemployed                  NA         NA      NA       NA
## Personal.Male.Divorced.Seperated       -2.546e-01  5.214e-01  -0.488 0.625274
## Personal.Female.NotSingle              -1.274e-01  3.573e-01  -0.357 0.721452
## Personal.Male.Single                    4.118e-01  3.623e-01   1.137 0.255622
## Personal.Male.Married.Widowed                  NA         NA      NA       NA
## Personal.Female.Single                         NA         NA      NA       NA
## OtherDebtorsGuarantors.None            -1.239e+00  5.370e-01  -2.308 0.021018
## OtherDebtorsGuarantors.CoApplicant     -1.565e+00  6.828e-01  -2.292 0.021919
## OtherDebtorsGuarantors.Guarantor               NA         NA      NA       NA
## Property.RealEstate                     7.166e-01  4.898e-01   1.463 0.143477
## Property.Insurance                      3.544e-01  4.785e-01   0.741 0.458926
## Property.CarOther                       6.110e-01  4.648e-01   1.314 0.188702
## Property.Unknown                               NA         NA      NA       NA
## OtherInstallmentPlans.Bank             -8.504e-01  2.730e-01  -3.115 0.001838
## OtherInstallmentPlans.Stores           -4.293e-01  4.711e-01  -0.911 0.362139
## OtherInstallmentPlans.None                     NA         NA      NA       NA
## Housing.Rent                           -9.538e-01  5.624e-01  -1.696 0.089924
## Housing.Own                            -2.723e-01  5.282e-01  -0.516 0.606157
## Housing.ForFree                                NA         NA      NA       NA
## Job.UnemployedUnskilled                 1.449e+00  8.788e-01   1.649 0.099175
## Job.UnskilledResident                  -2.641e-03  4.101e-01  -0.006 0.994861
## Job.SkilledEmployee                    -1.073e-02  3.349e-01  -0.032 0.974438
## Job.Management.SelfEmp.HighlyQualified         NA         NA      NA       NA
##                                           
## (Intercept)                            ***
## Duration                               ** 
## Amount                                 ***
## InstallmentRatePercentage              ***
## ResidenceDuration                         
## Age                                       
## NumberExistingCredits                     
## NumberPeopleMaintenance                   
## Telephone                              .  
## ForeignWorker                          *  
## CheckingAccountStatus.lt.0             ***
## CheckingAccountStatus.0.to.200         ***
## CheckingAccountStatus.gt.200              
## CheckingAccountStatus.none                
## CreditHistory.NoCredit.AllPaid         .  
## CreditHistory.ThisBank.AllPaid         ** 
## CreditHistory.PaidDuly                 *  
## CreditHistory.Delay                    *  
## CreditHistory.Critical                    
## Purpose.NewCar                         .  
## Purpose.UsedCar                           
## Purpose.Furniture.Equipment               
## Purpose.Radio.Television                  
## Purpose.DomesticAppliance                 
## Purpose.Repairs                           
## Purpose.Education                      .  
## Purpose.Vacation                          
## Purpose.Retraining                        
## Purpose.Business                          
## Purpose.Other                             
## SavingsAccountBonds.lt.100             ***
## SavingsAccountBonds.100.to.500         ** 
## SavingsAccountBonds.500.to.1000        .  
## SavingsAccountBonds.gt.1000               
## SavingsAccountBonds.Unknown               
## EmploymentDuration.lt.1                   
## EmploymentDuration.1.to.4                 
## EmploymentDuration.4.to.7              .  
## EmploymentDuration.gt.7                   
## EmploymentDuration.Unemployed             
## Personal.Male.Divorced.Seperated          
## Personal.Female.NotSingle                 
## Personal.Male.Single                      
## Personal.Male.Married.Widowed             
## Personal.Female.Single                    
## OtherDebtorsGuarantors.None            *  
## OtherDebtorsGuarantors.CoApplicant     *  
## OtherDebtorsGuarantors.Guarantor          
## Property.RealEstate                       
## Property.Insurance                        
## Property.CarOther                         
## Property.Unknown                          
## OtherInstallmentPlans.Bank             ** 
## OtherInstallmentPlans.Stores              
## OtherInstallmentPlans.None                
## Housing.Rent                           .  
## Housing.Own                               
## Housing.ForFree                           
## Job.UnemployedUnskilled                .  
## Job.UnskilledResident                     
## Job.SkilledEmployee                       
## Job.Management.SelfEmp.HighlyQualified    
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## (Dispersion parameter for binomial family taken to be 1)
## 
##     Null deviance: 958.02  on 799  degrees of freedom
## Residual deviance: 672.78  on 751  degrees of freedom
## AIC: 770.78
## 
## Number of Fisher Scoring iterations: 5

Your observation: Based on the model, CheckingAccountStatus.0.to.200 has one of the smallest p-values in the fit (significant at the 0.001 level). Because Class is coded TRUE for good credit, a more negative estimate means a customer is less likely to be classed as having good credit. This makes sense: CheckingAccountStatus.lt.0 has one of the strongest estimates, -1.817e+00, so a checking account balance below zero sharply lowers the estimated odds of good credit.
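To make the size of that estimate concrete, a logistic regression coefficient can be converted into an odds ratio by exponentiating it. The short sketch below uses only the -1.817 reported in the summary; the variable names `beta` and `odds_ratio` are illustrative and not part of the original analysis.

```r
# Estimate reported in the summary for CheckingAccountStatus.lt.0
beta <- -1.817

# exp(beta) is the multiplicative change in the odds of Class = Good,
# holding the other predictors fixed and relative to the omitted reference level
odds_ratio <- exp(beta)
odds_ratio   # roughly 0.16, i.e. the odds of good credit drop by about 84%
```

An odds ratio below 1 corresponds to a negative coefficient, which is why the sign of the estimate alone tells us the direction of the effect.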

Task 3: Find Optimal Probability Cut-off, with weight_FN = 1 and weight_FP = 1. (20pts)

1. Use the training set to obtain predicted probabilities.

# Code for predicted probabilities 
pred_prob_German_credit_train <- predict(German_log, newdata = German_credit_train, type = "response")

pred_German_train <- prediction(pred_prob_German_credit_train, German_credit_train$Class)
# Wanted more insight using the Histograms 
Histogram_German_credit_train <- predict(German_log)
hist(Histogram_German_credit_train)

Histogram_prob_German_credit_train <- predict(German_log, type="response")
hist(Histogram_prob_German_credit_train)

Your observation: The first histogram shows the predictions on the link (log-odds) scale; the second, built with type = "response", shows the predicted probabilities for German_credit_train. In the second histogram the majority of cases have a high predicted probability of Class = Good.
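The relationship between the two histograms can be sketched on simulated data: for a binomial glm with the default logit link, applying the inverse logit (`plogis`) to the link-scale predictions should reproduce the response-scale probabilities. The data and the name `toy_fit` below are made up for illustration and are not the German credit model.

```r
set.seed(1)
x <- rnorm(200)
y <- rbinom(200, 1, plogis(0.5 + 1.2 * x))   # simulated 0/1 outcome

toy_fit <- glm(y ~ x, family = binomial)

link_scale <- predict(toy_fit)                      # log-odds (default type = "link")
prob_scale <- predict(toy_fit, type = "response")   # probabilities in (0, 1)

# The inverse logit maps one scale onto the other
all.equal(unname(plogis(link_scale)), unname(prob_scale))
```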

2. Find the optimal probability cut-off point using the MR (misclassification rate) or equivalently the equal-weight cost.

costfunc = function(obs, pred.p, pcut){
    weight_FN = 1    
    weight_FP = 1    
    FNC = sum( (obs==1) & (pred.p < pcut))   
    FPC = sum( (obs==0) & (pred.p >=pcut))   
    MR  = sum(weight_FN*FNC + weight_FP*FPC) / length(obs)  
    return(MR) 
} 
pcut.seq = seq(0.01, 1, 0.01) 
MR_vec = rep(0, length(pcut.seq))  
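# Caution: GermanCredit$Class has 1000 rows but pred_prob_German_credit_train has 800,
# so R recycles the shorter vector and emits the warnings below.
# Passing obs = German_credit_train$Class would align the two vectors.
for(i in 1:length(pcut.seq)){ 
    MR_vec[i] = costfunc(obs = GermanCredit$Class, pred.p = pred_prob_German_credit_train, pcut = pcut.seq[i])  
} 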
## Warning in (obs == 1) & (pred.p < pcut): longer object length is not a multiple
## of shorter object length
## Warning in (obs == 0) & (pred.p >= pcut): longer object length is not a
## multiple of shorter object length
cbind(pcut.seq,MR_vec)
##        pcut.seq MR_vec
##   [1,]     0.01  0.300
##   [2,]     0.02  0.300
##   [3,]     0.03  0.300
##   [4,]     0.04  0.304
##   [5,]     0.05  0.304
##   [6,]     0.06  0.304
##   [7,]     0.07  0.305
##   [8,]     0.08  0.305
##   [9,]     0.09  0.308
##  [10,]     0.10  0.308
##  [11,]     0.11  0.310
##  [12,]     0.12  0.310
##  [13,]     0.13  0.310
##  [14,]     0.14  0.311
##  [15,]     0.15  0.311
##  [16,]     0.16  0.311
##  [17,]     0.17  0.312
##  [18,]     0.18  0.316
##  [19,]     0.19  0.312
##  [20,]     0.20  0.315
##  [21,]     0.21  0.316
##  [22,]     0.22  0.317
##  [23,]     0.23  0.319
##  [24,]     0.24  0.317
##  [25,]     0.25  0.320
##  [26,]     0.26  0.325
##  [27,]     0.27  0.325
##  [28,]     0.28  0.326
##  [29,]     0.29  0.325
##  [30,]     0.30  0.331
##  [31,]     0.31  0.331
##  [32,]     0.32  0.336
##  [33,]     0.33  0.337
##  [34,]     0.34  0.341
##  [35,]     0.35  0.346
##  [36,]     0.36  0.352
##  [37,]     0.37  0.354
##  [38,]     0.38  0.361
##  [39,]     0.39  0.365
##  [40,]     0.40  0.365
##  [41,]     0.41  0.368
##  [42,]     0.42  0.370
##  [43,]     0.43  0.370
##  [44,]     0.44  0.372
##  [45,]     0.45  0.374
##  [46,]     0.46  0.378
##  [47,]     0.47  0.384
##  [48,]     0.48  0.384
##  [49,]     0.49  0.387
##  [50,]     0.50  0.390
##  [51,]     0.51  0.392
##  [52,]     0.52  0.394
##  [53,]     0.53  0.403
##  [54,]     0.54  0.405
##  [55,]     0.55  0.405
##  [56,]     0.56  0.409
##  [57,]     0.57  0.414
##  [58,]     0.58  0.413
##  [59,]     0.59  0.422
##  [60,]     0.60  0.421
##  [61,]     0.61  0.431
##  [62,]     0.62  0.432
##  [63,]     0.63  0.434
##  [64,]     0.64  0.436
##  [65,]     0.65  0.442
##  [66,]     0.66  0.447
##  [67,]     0.67  0.448
##  [68,]     0.68  0.450
##  [69,]     0.69  0.453
##  [70,]     0.70  0.457
##  [71,]     0.71  0.458
##  [72,]     0.72  0.457
##  [73,]     0.73  0.460
##  [74,]     0.74  0.464
##  [75,]     0.75  0.478
##  [76,]     0.76  0.484
##  [77,]     0.77  0.485
##  [78,]     0.78  0.485
##  [79,]     0.79  0.487
##  [80,]     0.80  0.488
##  [81,]     0.81  0.492
##  [82,]     0.82  0.495
##  [83,]     0.83  0.503
##  [84,]     0.84  0.509
##  [85,]     0.85  0.516
##  [86,]     0.86  0.519
##  [87,]     0.87  0.528
##  [88,]     0.88  0.525
##  [89,]     0.89  0.530
##  [90,]     0.90  0.546
##  [91,]     0.91  0.559
##  [92,]     0.92  0.566
##  [93,]     0.93  0.577
##  [94,]     0.94  0.597
##  [95,]     0.95  0.612
##  [96,]     0.96  0.620
##  [97,]     0.97  0.640
##  [98,]     0.98  0.657
##  [99,]     0.99  0.679
## [100,]     1.00  0.700
# Plot MR (y axis) against every candidate cut-off (x axis)
plot(pcut.seq, MR_vec)

# Cut-off value(s) that attain the minimum MR
First.optimal.pcut = pcut.seq[which(MR_vec==min(MR_vec))]
print(First.optimal.pcut)
## [1] 0.01 0.02 0.03

Your observation: We search every candidate cut-off for the one that gives the minimum MR (cost) for the predicted probabilities in pred_prob_German_credit_train. The search returns 0.01, 0.02, and 0.03, which tie at MR = 0.300. These values should be read with caution: the warnings show that the observed classes (1000 rows) and the predicted probabilities (800 rows) had different lengths, so the two vectors were not aligned. Looking at MR_vec across all cut-off values, the curve starts to shift at about 0.4, so I take 0.4 as the cut-off for the tasks that follow, even though it does not have the lowest MR in the table.
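A minimal sketch of the same grid search with a length check is shown below, so that a mismatch between observed classes and predicted probabilities stops the run instead of being recycled silently. It runs on simulated data; `cost_at_cut`, `toy_obs`, `toy_p`, and `toy_MR` are hypothetical names, and the weights are exposed as arguments purely for illustration.

```r
# Weighted misclassification cost at a single cut-off.
# obs: 0/1 (or logical) truth; pred.p: predicted probability of the positive class.
cost_at_cut <- function(obs, pred.p, pcut, weight_FN = 1, weight_FP = 1) {
  stopifnot(length(obs) == length(pred.p))   # guard against silent recycling
  FN <- sum(obs == 1 & pred.p <  pcut)       # positives predicted as negative
  FP <- sum(obs == 0 & pred.p >= pcut)       # negatives predicted as positive
  (weight_FN * FN + weight_FP * FP) / length(obs)
}

# Simulated probabilities and outcomes (not the German credit data)
set.seed(42)
toy_p   <- runif(800)
toy_obs <- rbinom(800, 1, toy_p)

pcut.seq <- seq(0.01, 1, 0.01)
toy_MR <- vapply(pcut.seq,
                 function(pc) cost_at_cut(toy_obs, toy_p, pc),
                 numeric(1))

# Cut-off with the smallest cost (first one if several tie)
pcut.seq[which.min(toy_MR)]
```

With equal weights the cost is the plain misclassification rate; `which.min` returns only the first minimizer, whereas `which(toy_MR == min(toy_MR))` would list all ties.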

Task 4: Model Evaluation (20pts)

1. Using the optimal probability cut-off point obtained in 3.2, generate confusion matrix and obtain MR for the training set.

# Confusion matrix of Training, and Training MR
pred_prob_credit_train <- predict(German_log, type="response")
class.glm0.train<- (pred_prob_credit_train> 0.4 )*1
confusion_train <- table(German_credit_train$Class, class.glm0.train, dnn = c("True", "Predicted"))
confusion_train
##        Predicted
## True      0   1
##   FALSE 107 122
##   TRUE   35 536
# MR = 1 - accuracy, computed from the confusion matrix rather than from the probability vector
MR<- 1 - sum(diag(confusion_train)) / sum(confusion_train)
print(paste0("MR:",MR))
## [1] "MR:0.19625"

Your observation: The model achieves an accuracy of 80.4% on the training set (643 of 800 cases classified correctly), so the MR is 19.6%. It performs well at detecting true positives: 536 of the 571 good customers are classed as good (sensitivity 93.9%), leaving 35 false negatives. The false positive rate is higher than desired: 122 of the 229 bad customers (53.3%) are classed as good.
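The rates implied by the training confusion matrix can be checked by hand. The sketch below rebuilds the printed 2x2 table from its four counts and derives accuracy, MR, sensitivity, and false positive rate; the object name `conf_train` and the metric names are illustrative.

```r
# Counts copied from the printed training confusion matrix
conf_train <- matrix(c(107, 35, 122, 536), nrow = 2,
                     dimnames = list(True = c("FALSE", "TRUE"),
                                     Predicted = c("0", "1")))

accuracy    <- sum(diag(conf_train)) / sum(conf_train)          # (107 + 536) / 800
MR          <- 1 - accuracy                                     # (122 + 35) / 800
sensitivity <- conf_train["TRUE", "1"]  / sum(conf_train["TRUE", ])   # 536 / 571
fpr         <- conf_train["FALSE", "1"] / sum(conf_train["FALSE", ])  # 122 / 229

round(c(accuracy = accuracy, MR = MR, sensitivity = sensitivity, fpr = fpr), 3)
```

An MR of 0 would require both off-diagonal cells to be zero, which is not the case here.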

2. Using the optimal probability cut-off point obtained in 3.2, generate the ROC curve and calculate the AUC for the training set.

# ROC and AUC for Training 
ROC <- performance(pred_German_train, "tpr", "fpr")
plot(ROC, colorize=TRUE)

German_class_test_optim <- ifelse(pred_prob_German_credit_train >= 0.4, 1, 0) 
auc_German_train = unlist(slot(performance(pred_German_train, "auc"), "y.values"))
auc_German_train 
## [1] 0.8504807

Your observation: The model discriminates well on the German_credit_train data. The training AUC is 0.8504807, which is good: an AUC of 1 means perfect separation of the two classes, while 0.5 is no better than chance. The ROC curve and the AUC summarize all possible cut-offs, so they do not depend on the 0.4 cut-off.
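One way to see what the AUC measures is to compute it from ranks: it equals the probability that a randomly chosen positive case receives a higher score than a randomly chosen negative case. The base-R sketch below uses the Mann-Whitney rank formula on a tiny made-up example; `auc_rank` is a hypothetical helper and is not how ROCR computes its value internally.

```r
# Rank-based AUC: share of (positive, negative) pairs ranked correctly,
# with ties counted as one half via average ranks.
auc_rank <- function(scores, labels) {
  labels <- as.logical(labels)
  n_pos <- sum(labels)
  n_neg <- sum(!labels)
  r <- rank(scores)   # average ranks handle tied scores
  (sum(r[labels]) - n_pos * (n_pos + 1) / 2) / (n_pos * n_neg)
}

# Toy example: 3 of the 4 positive/negative pairs are ordered correctly
auc_rank(scores = c(0.1, 0.4, 0.35, 0.8), labels = c(0, 0, 1, 1))   # 0.75
```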

3. Using the same cut-off point, generate confusion matrix and obtain MR for the test set.

# Confusion matrix of Testing, and Testing MR
pred_prob_credit_test<- predict(German_log, newdata = German_credit_test, type="response")

pred_German_test <- prediction(pred_prob_credit_test, German_credit_test$Class)

class.glm0.test <- (pred_prob_credit_test> 0.4 )*1
confusion_test<- table(German_credit_test$Class, class.glm0.test, dnn = c("True", "Predicted"))
MR<- 1 - sum(diag(confusion_test)) / sum(confusion_test)
print(paste0("MR:",MR))
## [1] "MR:0.27"

Your observation: An MR of 0.27 means the model correctly classifies 73% of the test instances. This is higher than the training MR of 0.19625 at the same 0.4 cut-off, which is expected because the model was fitted on the training data. The test confusion matrix is stored in confusion_test; its off-diagonal cells give the split between false positives and false negatives.

4. Using the same cut-off point, generate the ROC curve and calculate the AUC for the test set.

library(ROCR)
library(gplots)
# ROC and AUC for Testing 
ROC <- performance(pred_German_test, "tpr", "fpr")
plot(ROC, colorize=TRUE)

auc_German_test = unlist(slot(performance(pred_German_test, "auc"), "y.values"))
auc_German_test 
## [1] 0.7353423

Your observation: On the German_credit_test data the AUC is 0.7353423, which is above the 0.5 of random guessing but clearly below the training AUC of 0.8504807, so the model separates the classes less well on unseen data.

Task 5: Using different weights (20pts)

Now, let’s assume “It is worse to class a customer as good when they are bad (weight = 5), than it is to class a customer as bad when they are good (weight = 1).” Please figure out which weight should be 5 and which weight should be 1. Then define your cost function accordingly!

1. Obtain optimal probability cut-off point again, with the new weights.

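costfunc = function(obs, pred.p, pcut){
    # NOTE: Class is TRUE for a good customer, so classing a bad customer as good is a
    # false positive; under the task statement weight_FP, not weight_FN, should be 5.
    weight_FN = 5    # cost of a false negative (good customer classed as bad)
    weight_FP = 1    # cost of a false positive (bad customer classed as good)
    FNC = sum( (obs==1) & (pred.p < pcut))    # false negative count
    FPC = sum( (obs==0) & (pred.p >=pcut))    # false positive count
    MR  = sum(weight_FN*FNC + weight_FP*FPC) / length(obs)  # weighted cost per observation
    return(MR) 
} 
pcut.seq = seq(0.01, 1, 0.01) 
MR_vec = rep(0, length(pcut.seq))  
# NOTE: GermanCredit$Class has 1000 rows but pred_prob_German_credit_train has 800, so obs is
# mismatched with pred.p (hence the warnings below); obs should be German_credit_train$Class.
for(i in 1:length(pcut.seq)){ 
    MR_vec[i] = costfunc(obs = GermanCredit$Class, pred.p = pred_prob_German_credit_train, pcut = pcut.seq[i])  
} 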
## Warning in (obs == 1) & (pred.p < pcut): longer object length is not a multiple
## of shorter object length
## Warning in (obs == 0) & (pred.p >= pcut): longer object length is not a
## multiple of shorter object length
cbind(pcut.seq,MR_vec)
##        pcut.seq MR_vec
##   [1,]     0.01  0.300
##   [2,]     0.02  0.300
##   [3,]     0.03  0.300
##   [4,]     0.04  0.328
##   [5,]     0.05  0.328
##   [6,]     0.06  0.328
##   [7,]     0.07  0.333
##   [8,]     0.08  0.337
##   [9,]     0.09  0.356
##  [10,]     0.10  0.368
##  [11,]     0.11  0.382
##  [12,]     0.12  0.382
##  [13,]     0.13  0.386
##  [14,]     0.14  0.391
##  [15,]     0.15  0.395
##  [16,]     0.16  0.395
##  [17,]     0.17  0.400
##  [18,]     0.18  0.424
##  [19,]     0.19  0.420
##  [20,]     0.20  0.439
##  [21,]     0.21  0.452
##  [22,]     0.22  0.465
##  [23,]     0.23  0.483
##  [24,]     0.24  0.485
##  [25,]     0.25  0.504
##  [26,]     0.26  0.529
##  [27,]     0.27  0.537
##  [28,]     0.28  0.546
##  [29,]     0.29  0.545
##  [30,]     0.30  0.591
##  [31,]     0.31  0.603
##  [32,]     0.32  0.632
##  [33,]     0.33  0.649
##  [34,]     0.34  0.673
##  [35,]     0.35  0.710
##  [36,]     0.36  0.760
##  [37,]     0.37  0.778
##  [38,]     0.38  0.817
##  [39,]     0.39  0.853
##  [40,]     0.40  0.857
##  [41,]     0.41  0.880
##  [42,]     0.42  0.890
##  [43,]     0.43  0.894
##  [44,]     0.44  0.912
##  [45,]     0.45  0.922
##  [46,]     0.46  0.950
##  [47,]     0.47  0.984
##  [48,]     0.48  0.996
##  [49,]     0.49  1.023
##  [50,]     0.50  1.046
##  [51,]     0.51  1.060
##  [52,]     0.52  1.078
##  [53,]     0.53  1.127
##  [54,]     0.54  1.145
##  [55,]     0.55  1.157
##  [56,]     0.56  1.189
##  [57,]     0.57  1.234
##  [58,]     0.58  1.249
##  [59,]     0.59  1.294
##  [60,]     0.60  1.305
##  [61,]     0.61  1.363
##  [62,]     0.62  1.384
##  [63,]     0.63  1.398
##  [64,]     0.64  1.416
##  [65,]     0.65  1.454
##  [66,]     0.66  1.487
##  [67,]     0.67  1.520
##  [68,]     0.68  1.534
##  [69,]     0.69  1.561
##  [70,]     0.70  1.593
##  [71,]     0.71  1.614
##  [72,]     0.72  1.617
##  [73,]     0.73  1.648
##  [74,]     0.74  1.680
##  [75,]     0.75  1.754
##  [76,]     0.76  1.796
##  [77,]     0.77  1.817
##  [78,]     0.78  1.841
##  [79,]     0.79  1.863
##  [80,]     0.80  1.884
##  [81,]     0.81  1.916
##  [82,]     0.82  1.951
##  [83,]     0.83  2.003
##  [84,]     0.84  2.045
##  [85,]     0.85  2.096
##  [86,]     0.86  2.135
##  [87,]     0.87  2.200
##  [88,]     0.88  2.225
##  [89,]     0.89  2.274
##  [90,]     0.90  2.366
##  [91,]     0.91  2.459
##  [92,]     0.92  2.522
##  [93,]     0.93  2.589
##  [94,]     0.94  2.725
##  [95,]     0.95  2.840
##  [96,]     0.96  2.912
##  [97,]     0.97  3.068
##  [98,]     0.98  3.205
##  [99,]     0.99  3.375
## [100,]     1.00  3.500
# Plot the weighted cost (MR_vec) against each candidate p-cut
plot(pcut.seq, MR_vec)

# New optimal p-cut
Second.optimal.pcut = pcut.seq[which(MR_vec==min(MR_vec))]
print(Second.optimal.pcut)
## [1] 0.01 0.02 0.03

Your observation: We search every candidate p-cut again, this time with weight_FN = 5, to find the one that gives the minimum weighted cost for the predicted probabilities in pred_prob_German_credit_train. The search still reports 0.01, 0.02, and 0.03 as the minimizing cut-off points (cost 0.300), and MR_vec rises from there to 0.857 at 0.4 and 3.500 at 1.00. The warnings show that obs (GermanCredit$Class, 1000 rows) and pred.p (800 training predictions) have different lengths, so this curve is not a reliable guide. We will continue using 0.4, the cut-off from Task 3.
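
The task statement says that classing a bad customer as good is the costlier error. With Class coded as TRUE for good, that error is a false positive, so the weight of 5 belongs on the false positives. The following is a minimal sketch of such a cost function under that reading, with a length check to prevent silent recycling. It runs on hypothetical toy data, not on the German credit predictions, and the names weighted_cost, w_fp and w_fn are illustrative.

```r
# Weighted cost of a cut-off. A false positive here means a bad customer
# (obs == 0) is classed as good; the 5:1 weights follow the task statement.
weighted_cost <- function(obs, pred_p, pcut, w_fp = 5, w_fn = 1) {
  stopifnot(length(obs) == length(pred_p))   # guard against silent recycling
  fp <- sum(obs == 0 & pred_p >= pcut)
  fn <- sum(obs == 1 & pred_p <  pcut)
  (w_fp * fp + w_fn * fn) / length(obs)
}

# Hypothetical toy data, not the German credit predictions.
obs    <- c(1, 1, 1, 1, 0, 1, 0, 0)
pred_p <- c(0.953, 0.901, 0.804, 0.702, 0.655, 0.603, 0.404, 0.201)

pcut_grid <- seq(0.01, 1, by = 0.01)
cost_weighted <- sapply(pcut_grid, function(p) weighted_cost(obs, pred_p, p))
cost_equal    <- sapply(pcut_grid, function(p) weighted_cost(obs, pred_p, p, w_fp = 1))

# which.min() returns the first minimizer when several cut-offs tie.
best_weighted <- pcut_grid[which.min(cost_weighted)]
best_equal    <- pcut_grid[which.min(cost_equal)]
print(c(weighted = best_weighted, equal = best_equal))
```

In this toy example the heavier false-positive weight moves the selected cut-off upward, because a higher cut-off classes fewer customers as good. On the real data the call would pass German_credit_train$Class and pred_prob_German_credit_train, which have the same length.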

2. Obtain the confusion matrix and MR for the training set.

# Confusion matrix of Training data, Training MR (New Weights)
pred_class_credit_train_optimal <- (pred_prob_German_credit_train>0.4)*1
conf_train <- table(German_credit_train$Class, pred_class_credit_train_optimal, dnn = c("True", "Predicted"))
MR<- 1 - sum(diag(conf_train)) / sum(conf_train)
print(paste0("MR:",MR))
## [1] "MR:0.19625"

Your observation: Accuracy = 1 - MR = 1 - 0.19625 = 0.80375, so the model correctly classifies 80.375% of the training instances. This matches the 80.4% accuracy from Task 4 because the cut-off is still 0.4; the new weights change the cost function, not the predictions. The model has a moderate level of accuracy, with some room to reduce the number of incorrect predictions. Can it predict as well for the testing set? Let's go ahead and produce another confusion matrix and calculate the testing set's MR.

3. Obtain the confusion matrix and MR for the test set.

# Confusion matrix of Testing data, Testing MR (New Weights)
pred_class_credit_test_optimal <- (pred_prob_credit_test>0.4)*1
conf_test <- table(German_credit_test$Class, pred_class_credit_test_optimal, dnn = c("True", "Predicted"))
MR<- 1 - sum(diag(conf_test)) / sum(conf_test)
print(paste0("MR:",MR))
## [1] "MR:0.27"

Your observation: A 73% accuracy (from 27% misclassification) may be acceptable depending on the domain. Some companies might accept this as satisfactory performance; for others it might necessitate improvement, especially if the cost of misclassifications is high. In conclusion, the model does not perform better on the testing set than on the training set (MR 0.27 versus 0.19625), which is expected for data the model has not seen.

Task 6: Conclusion (10pts)

Summarize your findings, including the optimal probability cut-off, MR and AUC for both training and testing data. Discuss what you observed and what you will do to improve the model.

After testing various probability cutoffs, a threshold of 0.4 was chosen. The grid search itself returned 0.01 to 0.03 as the minimizing cutoffs, so 0.4 is a judgment call based on the shape of MR_vec and not the strict minimum. At 0.4 the model reaches an accuracy of 80.4% on the training set (MR 0.19625) and an AUC of 0.8505, which indicates strong class discrimination. On the testing set the accuracy drops to 73% (MR 0.27) and the AUC to 0.7353, so the model generalizes less well to new data. Applying the 5:1 weights did not change the cutoffs returned by the search, and the 0.4 cutoff was kept, which leaves room for improvement in balancing misclassification costs with accuracy. Enhancements through correcting the cost-function inputs so that the observed classes match the training predictions, hyperparameter tuning, adjusted cutoffs, feature engineering, and cross-validation could further boost accuracy and generalization, especially in applications where misclassification costs are high.
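
The figures in this summary can be collected in one small table to make the train-to-test gap explicit. This is only a sketch: the numbers are typed in from the outputs reported above, and the names results, mr_gap and auc_gap are illustrative.

```r
# Metrics as reported in the sections above (0.4 cut-off).
results <- data.frame(
  set = c("train", "test"),
  MR  = c(0.19625, 0.27),
  AUC = c(0.8504807, 0.7353423)
)
results$accuracy <- 1 - results$MR

# Train-to-test gap: positive values mean the test set is worse.
mr_gap  <- results$MR[2]  - results$MR[1]
auc_gap <- results$AUC[1] - results$AUC[2]

print(results)
print(c(MR_gap = mr_gap, AUC_gap = auc_gap))
```

A gap of roughly 7 percentage points in MR and 0.12 in AUC points to some overfitting on the training data.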