Refer to http://archive.ics.uci.edu/ml/datasets/Statlog+(German+Credit+Data))
for variable description. The response variable is Class
and all others are predictors.
Only run the following code once to install the package
caret. The German credit scoring data in
provided in that package.
if (!require("caret", quietly = TRUE)) {
install.packages("caret")
}
library(caret) #this package contains the german data with its numeric format
data(GermanCredit)
GermanCredit$Class <- GermanCredit$Class == "Good" # use this code to convert `Class` into True or False (equivalent to 1 or 0)
str(GermanCredit)
## 'data.frame': 1000 obs. of 62 variables:
## $ Duration : int 6 48 12 42 24 36 24 36 12 30 ...
## $ Amount : int 1169 5951 2096 7882 4870 9055 2835 6948 3059 5234 ...
## $ InstallmentRatePercentage : int 4 2 2 2 3 2 3 2 2 4 ...
## $ ResidenceDuration : int 4 2 3 4 4 4 4 2 4 2 ...
## $ Age : int 67 22 49 45 53 35 53 35 61 28 ...
## $ NumberExistingCredits : int 2 1 1 1 2 1 1 1 1 2 ...
## $ NumberPeopleMaintenance : int 1 1 2 2 2 2 1 1 1 1 ...
## $ Telephone : num 0 1 1 1 1 0 1 0 1 1 ...
## $ ForeignWorker : num 1 1 1 1 1 1 1 1 1 1 ...
## $ Class : logi TRUE FALSE TRUE TRUE FALSE TRUE ...
## $ CheckingAccountStatus.lt.0 : num 1 0 0 1 1 0 0 0 0 0 ...
## $ CheckingAccountStatus.0.to.200 : num 0 1 0 0 0 0 0 1 0 1 ...
## $ CheckingAccountStatus.gt.200 : num 0 0 0 0 0 0 0 0 0 0 ...
## $ CheckingAccountStatus.none : num 0 0 1 0 0 1 1 0 1 0 ...
## $ CreditHistory.NoCredit.AllPaid : num 0 0 0 0 0 0 0 0 0 0 ...
## $ CreditHistory.ThisBank.AllPaid : num 0 0 0 0 0 0 0 0 0 0 ...
## $ CreditHistory.PaidDuly : num 0 1 0 1 0 1 1 1 1 0 ...
## $ CreditHistory.Delay : num 0 0 0 0 1 0 0 0 0 0 ...
## $ CreditHistory.Critical : num 1 0 1 0 0 0 0 0 0 1 ...
## $ Purpose.NewCar : num 0 0 0 0 1 0 0 0 0 1 ...
## $ Purpose.UsedCar : num 0 0 0 0 0 0 0 1 0 0 ...
## $ Purpose.Furniture.Equipment : num 0 0 0 1 0 0 1 0 0 0 ...
## $ Purpose.Radio.Television : num 1 1 0 0 0 0 0 0 1 0 ...
## $ Purpose.DomesticAppliance : num 0 0 0 0 0 0 0 0 0 0 ...
## $ Purpose.Repairs : num 0 0 0 0 0 0 0 0 0 0 ...
## $ Purpose.Education : num 0 0 1 0 0 1 0 0 0 0 ...
## $ Purpose.Vacation : num 0 0 0 0 0 0 0 0 0 0 ...
## $ Purpose.Retraining : num 0 0 0 0 0 0 0 0 0 0 ...
## $ Purpose.Business : num 0 0 0 0 0 0 0 0 0 0 ...
## $ Purpose.Other : num 0 0 0 0 0 0 0 0 0 0 ...
## $ SavingsAccountBonds.lt.100 : num 0 1 1 1 1 0 0 1 0 1 ...
## $ SavingsAccountBonds.100.to.500 : num 0 0 0 0 0 0 0 0 0 0 ...
## $ SavingsAccountBonds.500.to.1000 : num 0 0 0 0 0 0 1 0 0 0 ...
## $ SavingsAccountBonds.gt.1000 : num 0 0 0 0 0 0 0 0 1 0 ...
## $ SavingsAccountBonds.Unknown : num 1 0 0 0 0 1 0 0 0 0 ...
## $ EmploymentDuration.lt.1 : num 0 0 0 0 0 0 0 0 0 0 ...
## $ EmploymentDuration.1.to.4 : num 0 1 0 0 1 1 0 1 0 0 ...
## $ EmploymentDuration.4.to.7 : num 0 0 1 1 0 0 0 0 1 0 ...
## $ EmploymentDuration.gt.7 : num 1 0 0 0 0 0 1 0 0 0 ...
## $ EmploymentDuration.Unemployed : num 0 0 0 0 0 0 0 0 0 1 ...
## $ Personal.Male.Divorced.Seperated : num 0 0 0 0 0 0 0 0 1 0 ...
## $ Personal.Female.NotSingle : num 0 1 0 0 0 0 0 0 0 0 ...
## $ Personal.Male.Single : num 1 0 1 1 1 1 1 1 0 0 ...
## $ Personal.Male.Married.Widowed : num 0 0 0 0 0 0 0 0 0 1 ...
## $ Personal.Female.Single : num 0 0 0 0 0 0 0 0 0 0 ...
## $ OtherDebtorsGuarantors.None : num 1 1 1 0 1 1 1 1 1 1 ...
## $ OtherDebtorsGuarantors.CoApplicant : num 0 0 0 0 0 0 0 0 0 0 ...
## $ OtherDebtorsGuarantors.Guarantor : num 0 0 0 1 0 0 0 0 0 0 ...
## $ Property.RealEstate : num 1 1 1 0 0 0 0 0 1 0 ...
## $ Property.Insurance : num 0 0 0 1 0 0 1 0 0 0 ...
## $ Property.CarOther : num 0 0 0 0 0 0 0 1 0 1 ...
## $ Property.Unknown : num 0 0 0 0 1 1 0 0 0 0 ...
## $ OtherInstallmentPlans.Bank : num 0 0 0 0 0 0 0 0 0 0 ...
## $ OtherInstallmentPlans.Stores : num 0 0 0 0 0 0 0 0 0 0 ...
## $ OtherInstallmentPlans.None : num 1 1 1 1 1 1 1 1 1 1 ...
## $ Housing.Rent : num 0 0 0 0 0 0 0 1 0 0 ...
## $ Housing.Own : num 1 1 1 0 0 0 1 0 1 1 ...
## $ Housing.ForFree : num 0 0 0 1 1 1 0 0 0 0 ...
## $ Job.UnemployedUnskilled : num 0 0 0 0 0 0 0 0 0 0 ...
## $ Job.UnskilledResident : num 0 0 1 0 0 1 0 0 1 0 ...
## $ Job.SkilledEmployee : num 1 1 0 1 1 0 1 0 0 0 ...
## $ Job.Management.SelfEmp.HighlyQualified: num 0 0 0 0 0 0 0 1 0 1 ...
Your observation: The majority of variables are numeric and only display 0s or 1s but a few variables are integers. 1 variable is logical
#This is an optional code that drop variables that provide no information in the data
GermanCredit = GermanCredit[,-c(14,19,27,30,35,40,44,45,48,52,55,58,62)] #don't run this code twice!! Think about why.
str(GermanCredit)
## 'data.frame': 1000 obs. of 49 variables:
## $ Duration : int 6 48 12 42 24 36 24 36 12 30 ...
## $ Amount : int 1169 5951 2096 7882 4870 9055 2835 6948 3059 5234 ...
## $ InstallmentRatePercentage : int 4 2 2 2 3 2 3 2 2 4 ...
## $ ResidenceDuration : int 4 2 3 4 4 4 4 2 4 2 ...
## $ Age : int 67 22 49 45 53 35 53 35 61 28 ...
## $ NumberExistingCredits : int 2 1 1 1 2 1 1 1 1 2 ...
## $ NumberPeopleMaintenance : int 1 1 2 2 2 2 1 1 1 1 ...
## $ Telephone : num 0 1 1 1 1 0 1 0 1 1 ...
## $ ForeignWorker : num 1 1 1 1 1 1 1 1 1 1 ...
## $ Class : logi TRUE FALSE TRUE TRUE FALSE TRUE ...
## $ CheckingAccountStatus.lt.0 : num 1 0 0 1 1 0 0 0 0 0 ...
## $ CheckingAccountStatus.0.to.200 : num 0 1 0 0 0 0 0 1 0 1 ...
## $ CheckingAccountStatus.gt.200 : num 0 0 0 0 0 0 0 0 0 0 ...
## $ CreditHistory.NoCredit.AllPaid : num 0 0 0 0 0 0 0 0 0 0 ...
## $ CreditHistory.ThisBank.AllPaid : num 0 0 0 0 0 0 0 0 0 0 ...
## $ CreditHistory.PaidDuly : num 0 1 0 1 0 1 1 1 1 0 ...
## $ CreditHistory.Delay : num 0 0 0 0 1 0 0 0 0 0 ...
## $ Purpose.NewCar : num 0 0 0 0 1 0 0 0 0 1 ...
## $ Purpose.UsedCar : num 0 0 0 0 0 0 0 1 0 0 ...
## $ Purpose.Furniture.Equipment : num 0 0 0 1 0 0 1 0 0 0 ...
## $ Purpose.Radio.Television : num 1 1 0 0 0 0 0 0 1 0 ...
## $ Purpose.DomesticAppliance : num 0 0 0 0 0 0 0 0 0 0 ...
## $ Purpose.Repairs : num 0 0 0 0 0 0 0 0 0 0 ...
## $ Purpose.Education : num 0 0 1 0 0 1 0 0 0 0 ...
## $ Purpose.Retraining : num 0 0 0 0 0 0 0 0 0 0 ...
## $ Purpose.Business : num 0 0 0 0 0 0 0 0 0 0 ...
## $ SavingsAccountBonds.lt.100 : num 0 1 1 1 1 0 0 1 0 1 ...
## $ SavingsAccountBonds.100.to.500 : num 0 0 0 0 0 0 0 0 0 0 ...
## $ SavingsAccountBonds.500.to.1000 : num 0 0 0 0 0 0 1 0 0 0 ...
## $ SavingsAccountBonds.gt.1000 : num 0 0 0 0 0 0 0 0 1 0 ...
## $ EmploymentDuration.lt.1 : num 0 0 0 0 0 0 0 0 0 0 ...
## $ EmploymentDuration.1.to.4 : num 0 1 0 0 1 1 0 1 0 0 ...
## $ EmploymentDuration.4.to.7 : num 0 0 1 1 0 0 0 0 1 0 ...
## $ EmploymentDuration.gt.7 : num 1 0 0 0 0 0 1 0 0 0 ...
## $ Personal.Male.Divorced.Seperated : num 0 0 0 0 0 0 0 0 1 0 ...
## $ Personal.Female.NotSingle : num 0 1 0 0 0 0 0 0 0 0 ...
## $ Personal.Male.Single : num 1 0 1 1 1 1 1 1 0 0 ...
## $ OtherDebtorsGuarantors.None : num 1 1 1 0 1 1 1 1 1 1 ...
## $ OtherDebtorsGuarantors.CoApplicant: num 0 0 0 0 0 0 0 0 0 0 ...
## $ Property.RealEstate : num 1 1 1 0 0 0 0 0 1 0 ...
## $ Property.Insurance : num 0 0 0 1 0 0 1 0 0 0 ...
## $ Property.CarOther : num 0 0 0 0 0 0 0 1 0 1 ...
## $ OtherInstallmentPlans.Bank : num 0 0 0 0 0 0 0 0 0 0 ...
## $ OtherInstallmentPlans.Stores : num 0 0 0 0 0 0 0 0 0 0 ...
## $ Housing.Rent : num 0 0 0 0 0 0 0 1 0 0 ...
## $ Housing.Own : num 1 1 1 0 0 0 1 0 1 1 ...
## $ Job.UnemployedUnskilled : num 0 0 0 0 0 0 0 0 0 0 ...
## $ Job.UnskilledResident : num 0 0 1 0 0 1 0 0 1 0 ...
## $ Job.SkilledEmployee : num 1 1 0 1 1 0 1 0 0 0 ...
Your observation: There are 1000 observations and 49 variables
class
(use table() function). How many observations are classed
as “good” and how many are “bad”? (2 pts)table(GermanCredit$Class)
##
## FALSE TRUE
## 300 700
Your observation: 300 are classified as Bad and 700 are classified as Good
class. Please add titles and labels to axis. (2 pts)barplot(table(GermanCredit$Class), main = "Barplot for German Credit vs Class", xlab = "Class", ylab = "# of Credit Observations")
2025 is set for reproducibility. Please
comment on what is the split proportion you choose for training and
testing data? (2 pts)set.seed(2025) # set random seed for reproducibility.
training_cred <- sample(1:nrow(GermanCredit), round(0.8 * nrow(GermanCredit)))
credit_train <- GermanCredit[training_cred, ]
credit_test <- GermanCredit[-training_cred, ]
Your comment: My split was 20% testing and 80% training
credit_log <- glm(Class ~ .,data = credit_train, family = "binomial")
InstallmentRatePercentage? Is it significant, and why? (2
pts)summary(credit_log)
##
## Call:
## glm(formula = Class ~ ., family = "binomial", data = credit_train)
##
## Coefficients:
## Estimate Std. Error z value Pr(>|z|)
## (Intercept) 8.815e+00 1.594e+00 5.529 3.22e-08 ***
## Duration -2.643e-02 1.027e-02 -2.573 0.010068 *
## Amount -1.355e-04 4.957e-05 -2.734 0.006253 **
## InstallmentRatePercentage -2.925e-01 9.747e-02 -3.001 0.002693 **
## ResidenceDuration -4.893e-02 9.689e-02 -0.505 0.613531
## Age 1.355e-02 1.020e-02 1.329 0.183763
## NumberExistingCredits -3.599e-01 2.175e-01 -1.654 0.098027 .
## NumberPeopleMaintenance -3.644e-01 2.712e-01 -1.344 0.178951
## Telephone -2.654e-01 2.276e-01 -1.166 0.243564
## ForeignWorker -1.454e+00 7.181e-01 -2.025 0.042862 *
## CheckingAccountStatus.lt.0 -1.775e+00 2.635e-01 -6.735 1.64e-11 ***
## CheckingAccountStatus.0.to.200 -1.393e+00 2.596e-01 -5.365 8.09e-08 ***
## CheckingAccountStatus.gt.200 -9.145e-01 4.225e-01 -2.165 0.030425 *
## CreditHistory.NoCredit.AllPaid -1.232e+00 4.874e-01 -2.528 0.011465 *
## CreditHistory.ThisBank.AllPaid -1.570e+00 4.829e-01 -3.252 0.001147 **
## CreditHistory.PaidDuly -9.374e-01 2.907e-01 -3.224 0.001262 **
## CreditHistory.Delay -5.626e-01 3.728e-01 -1.509 0.131283
## Purpose.NewCar -1.417e+00 8.247e-01 -1.718 0.085814 .
## Purpose.UsedCar 2.976e-01 8.684e-01 0.343 0.731867
## Purpose.Furniture.Equipment -5.842e-01 8.330e-01 -0.701 0.483148
## Purpose.Radio.Television -4.791e-01 8.311e-01 -0.576 0.564284
## Purpose.DomesticAppliance -7.999e-01 1.209e+00 -0.661 0.508314
## Purpose.Repairs -1.601e+00 1.011e+00 -1.585 0.113077
## Purpose.Education -1.304e+00 9.075e-01 -1.436 0.150873
## Purpose.Retraining 3.610e-01 1.500e+00 0.241 0.809867
## Purpose.Business -6.988e-01 8.558e-01 -0.817 0.414159
## SavingsAccountBonds.lt.100 -1.065e+00 3.092e-01 -3.444 0.000573 ***
## SavingsAccountBonds.100.to.500 -8.837e-01 3.957e-01 -2.233 0.025519 *
## SavingsAccountBonds.500.to.1000 -9.486e-01 5.005e-01 -1.896 0.058026 .
## SavingsAccountBonds.gt.1000 1.460e-01 5.940e-01 0.246 0.805829
## EmploymentDuration.lt.1 9.279e-02 4.634e-01 0.200 0.841281
## EmploymentDuration.1.to.4 2.090e-01 4.443e-01 0.470 0.638077
## EmploymentDuration.4.to.7 9.869e-01 4.882e-01 2.021 0.043232 *
## EmploymentDuration.gt.7 1.785e-01 4.568e-01 0.391 0.695924
## Personal.Male.Divorced.Seperated -1.288e-01 5.011e-01 -0.257 0.797124
## Personal.Female.NotSingle -2.033e-01 3.478e-01 -0.584 0.558947
## Personal.Male.Single 3.802e-01 3.481e-01 1.092 0.274710
## OtherDebtorsGuarantors.None -7.408e-01 4.667e-01 -1.587 0.112448
## OtherDebtorsGuarantors.CoApplicant -1.220e+00 6.317e-01 -1.931 0.053423 .
## Property.RealEstate 3.867e-01 4.875e-01 0.793 0.427607
## Property.Insurance 1.635e-01 4.784e-01 0.342 0.732464
## Property.CarOther 1.796e-01 4.596e-01 0.391 0.695959
## OtherInstallmentPlans.Bank -7.233e-01 2.617e-01 -2.764 0.005716 **
## OtherInstallmentPlans.Stores -1.929e-01 4.246e-01 -0.454 0.649608
## Housing.Rent -4.468e-01 5.415e-01 -0.825 0.409294
## Housing.Own -2.017e-01 5.184e-01 -0.389 0.697127
## Job.UnemployedUnskilled 5.206e-01 7.212e-01 0.722 0.470322
## Job.UnskilledResident -2.015e-02 3.833e-01 -0.053 0.958079
## Job.SkilledEmployee -6.640e-02 3.153e-01 -0.211 0.833214
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## (Dispersion parameter for binomial family taken to be 1)
##
## Null deviance: 987.34 on 799 degrees of freedom
## Residual deviance: 728.49 on 751 degrees of freedom
## AIC: 826.49
##
## Number of Fisher Scoring iterations: 5
Your comment: The estimated coefficient for InstallmentRatePercentage is -.2925 and is significant because there are 2 stars next to it and the P-value is very low. #### 3. Please interpret this number in detail (please calculate the corresponding odds ratio, and interpret it). (2 pts)
exp(-.2925)
## [1] 0.7463952
Your comment: For every one unit increased in the “InstallmentRatePercentage” variable, it will lead to a decrease by 29.25% down to a 74.6% of the original value.
pred_prob <- predict(credit_log, newdata = credit_train, type = "response")
pred_prob
## 909 460 932 922 961 279 510
## 0.95773202 0.89643422 0.59583944 0.82286550 0.95718687 0.87975870 0.88651650
## 187 266 461 881 972 900 539
## 0.58454522 0.63202386 0.68970175 0.96553745 0.69746948 0.52224145 0.05490455
## 983 891 373 407 571 907 159
## 0.52249286 0.36098959 0.93708656 0.99351643 0.31363827 0.92991255 0.53316440
## 142 549 800 575 797 801 764
## 0.33074716 0.22196333 0.86057386 0.62898028 0.92283095 0.82412861 0.85868114
## 857 630 88 195 686 940 371
## 0.99213773 0.97666440 0.27369504 0.55873152 0.58032700 0.98661879 0.85027790
## 306 59 317 528 837 564 186
## 0.96319764 0.46112830 0.82434373 0.98921934 0.93761834 0.51341403 0.94169214
## 144 882 49 399 50 626 185
## 0.63998359 0.96257747 0.88981728 0.54493907 0.81581919 0.96177678 0.48244904
## 107 491 433 302 536 788 991
## 0.29928848 0.95716547 0.69883918 0.38382487 0.71526357 0.97055596 0.95620908
## 779 503 819 937 987 958 214
## 0.98265791 0.88857998 0.11641353 0.81302085 0.28285732 0.88320224 0.86707488
## 304 850 913 593 744 153 550
## 0.73196238 0.71054119 0.74405169 0.87107759 0.37613926 0.52420738 0.98906540
## 440 754 432 772 408 753 517
## 0.67044209 0.87179854 0.26823056 0.18469067 0.83178670 0.82059780 0.92548726
## 413 495 325 336 363 41 182
## 0.93151965 0.82645341 0.81282925 0.87894074 0.61872074 0.75168692 0.51422080
## 792 599 104 15 308 524 742
## 0.96832521 0.90138841 0.92733401 0.29326192 0.60017238 0.94944969 0.55563970
## 725 387 300 264 648 23 776
## 0.57158582 0.93277872 0.97108970 0.74561871 0.80100535 0.89465682 0.38538900
## 898 482 154 781 962 873 743
## 0.99815049 0.38287596 0.88786906 0.92527108 0.52076923 0.87822850 0.96039632
## 956 687 708 654 827 878 652
## 0.84880908 0.98296713 0.23831359 0.23817412 0.26702978 0.78440766 0.69790073
## 340 608 995 425 807 583 960
## 0.59153368 0.35332001 0.93882956 0.79982099 0.92804133 0.82935872 0.57804224
## 37 228 468 846 929 117 254
## 0.77034184 0.51558271 0.68361342 0.94824647 0.96522167 0.61614621 0.90632971
## 915 820 876 676 758 586 879
## 0.29228540 0.34427125 0.77382084 0.92037755 0.97210807 0.36712068 0.53356541
## 164 994 672 531 170 825 218
## 0.41419800 0.37236901 0.90901122 0.43366378 0.66927855 0.91205400 0.79480729
## 870 691 824 639 791 148 590
## 0.36548962 0.71099397 0.66770445 0.77020742 0.51847461 0.94758734 0.81269341
## 405 901 110 210 10 241 18
## 0.67078335 0.70864554 0.85037445 0.99918711 0.26553521 0.27515076 0.19553322
## 385 158 692 284 177 430 504
## 0.87579681 0.50546028 0.40022631 0.98769478 0.62931376 0.43772301 0.49539906
## 240 6 90 715 250 447 912
## 0.76113070 0.79797878 0.42010822 0.10526342 0.73690550 0.16749866 0.49843792
## 25 465 774 369 930 121 256
## 0.99565385 0.91847149 0.88506903 0.32243455 0.41486430 0.49085271 0.79941546
## 314 295 845 301 943 331 429
## 0.55879628 0.63735822 0.57126803 0.95130964 0.97919319 0.88878503 0.94687289
## 580 557 500 917 671 53 965
## 0.87728537 0.21687782 0.76457253 0.96654072 0.93276602 0.92582903 0.45494320
## 644 748 438 5 316 397 28
## 0.98063529 0.39547587 0.93615565 0.26115599 0.13164892 0.51777319 0.81811262
## 297 908 163 568 999 396 152
## 0.96700731 0.58543713 0.93793631 0.98965747 0.41792693 0.42157224 0.99226554
## 280 320 750 836 427 298 414
## 0.88619208 0.54686011 0.98464786 0.16515719 0.90109376 0.97477224 0.93245092
## 81 713 8 416 474 388 607
## 0.93905641 0.98745949 0.73458617 0.94813351 0.98357157 0.58509803 0.97552702
## 278 428 348 130 382 206 559
## 0.87927669 0.98687108 0.60403393 0.32848365 0.45061515 0.50921460 0.34708508
## 868 680 926 643 699 730 184
## 0.96309287 0.87099453 0.15627585 0.86590479 0.92852426 0.95956655 0.98700998
## 656 551 147 679 222 209 145
## 0.41370077 0.95540357 0.67945593 0.41902555 0.32708363 0.31114361 0.89205580
## 499 58 572 285 866 865 362
## 0.79471639 0.71899947 0.94400958 0.53769990 0.83542971 0.87225414 0.94412981
## 959 533 268 444 251 612 986
## 0.33952173 0.95484178 0.79868640 0.62899020 0.93369539 0.68956026 0.48341907
## 169 576 619 951 40 98 452
## 0.89361266 0.87353374 0.32505854 0.67006189 0.76349041 0.61817326 0.92487607
## 199 383 451 16 752 333 415
## 0.85137726 0.87295845 0.96919858 0.46912898 0.40051610 0.06405302 0.44700200
## 577 683 337 629 902 996 813
## 0.84231892 0.78603796 0.77386531 0.89794248 0.95327636 0.94193011 0.72611772
## 740 198 516 345 957 269 71
## 0.16720595 0.13457044 0.88996788 0.80963914 0.89968692 0.66864346 0.74371163
## 203 2 860 73 682 450 211
## 0.89033316 0.35895450 0.98063092 0.80463656 0.97100575 0.72543622 0.98940178
## 633 871 655 155 223 573 116
## 0.84416446 0.87038090 0.98596150 0.73122756 0.81737835 0.95119186 0.97719956
## 261 554 855 274 979 334 156
## 0.75358290 0.70690633 0.52455814 0.55276744 0.66161361 0.59482687 0.48508345
## 562 714 233 622 281 135 993
## 0.48848740 0.78351586 0.96672829 0.78994951 0.98798977 0.73897060 0.82796790
## 914 720 616 646 911 225 522
## 0.98168163 0.41967461 0.36821170 0.79906079 0.69583353 0.95667459 0.55785604
## 949 693 631 964 44 918 94
## 0.71391355 0.74146546 0.44015318 0.81105580 0.85422423 0.19639600 0.75367799
## 763 621 126 585 378 974 569
## 0.57344508 0.87653235 0.50806034 0.90457114 0.98515853 0.06110540 0.76298207
## 305 26 417 939 799 32 120
## 0.58324796 0.78877498 0.32543043 0.14423310 0.90725119 0.52009960 0.85799473
## 594 700 802 201 893 473 197
## 0.26518127 0.61965056 0.79727080 0.94403002 0.69279047 0.43025917 0.97341895
## 322 512 99 479 793 398 207
## 0.36659751 0.94997551 0.77281290 0.90436579 0.98835938 0.64438000 0.94486134
## 380 756 817 783 739 969 685
## 0.88737724 0.50775546 0.95909366 0.92752115 0.95026803 0.93382901 0.65365662
## 771 96 555 464 176 47 924
## 0.83484740 0.08247094 0.67348910 0.92306831 0.82819816 0.83681086 0.61802524
## 434 834 11 127 906 403 171
## 0.79146925 0.80812096 0.33038335 0.64788520 0.71603305 0.67846460 0.14769656
## 263 160 420 277 13 217 767
## 0.49765422 0.99211507 0.58574586 0.92491095 0.79664229 0.56490709 0.40123248
## 778 553 651 119 695 560 601
## 0.76593752 0.73649917 0.34496041 0.56237524 0.94584300 0.86599307 0.90655621
## 1000 33 360 520 903 598 87
## 0.80623596 0.46462379 0.43855604 0.98450787 0.97001995 0.58203342 0.62449858
## 139 404 384 231 335 745 196
## 0.96256610 0.80087656 0.62802748 0.57372925 0.08462010 0.64425318 0.73395272
## 332 476 361 62 68 31 166
## 0.83441353 0.37285848 0.79185477 0.97787512 0.72750928 0.77362884 0.96791540
## 728 844 843 727 923 864 544
## 0.14247132 0.82960144 0.55997118 0.96682033 0.42937825 0.95331012 0.81477714
## 118 831 190 759 307 265 840
## 0.93877514 0.86220459 0.37722398 0.94473860 0.96697596 0.97155012 0.86555498
## 624 55 290 368 697 563 38
## 0.33960826 0.27308379 0.56925010 0.41682370 0.97554258 0.76462746 0.67761686
## 341 52 122 84 595 20 456
## 0.43264942 0.86333391 0.93119938 0.82454705 0.78600898 0.87883536 0.88645041
## 449 143 887 787 998 435 928
## 0.94166451 0.53864668 0.90929582 0.90548903 0.91227649 0.69266064 0.25696195
## 814 941 76 658 696 393 650
## 0.38466852 0.93668386 0.90816502 0.89042838 0.97919648 0.34005455 0.32152284
## 1 321 506 193 640 140 833
## 0.96023468 0.29237146 0.95712920 0.47836492 0.37838380 0.82622300 0.07914164
## 483 365 423 260 215 437 967
## 0.53340789 0.61217154 0.88298470 0.92189636 0.98026116 0.91148495 0.72490380
## 675 315 762 454 927 832 875
## 0.79459682 0.97775411 0.48035964 0.94477407 0.51173770 0.19970559 0.46564731
## 136 446 45 647 352 529 966
## 0.98039096 0.91845431 0.45248815 0.47260700 0.93442869 0.25188713 0.69971236
## 490 34 72 880 179 889 724
## 0.89808915 0.98094171 0.98766706 0.98374122 0.91360467 0.69790706 0.78163605
## 657 950 346 982 463 85 521
## 0.26875104 0.87591018 0.94103123 0.53258870 0.57157044 0.75965868 0.77524946
## 54 367 180 27 401 916 175
## 0.98997490 0.98555772 0.63960279 0.87394129 0.91721827 0.30604922 0.40931042
## 872 138 324 167 102 635 816
## 0.96878874 0.85602035 0.77831617 0.38399130 0.61831921 0.34937881 0.25711127
## 989 310 589 80 252 841 273
## 0.59755719 0.40222905 0.59189231 0.49768005 0.84594114 0.50077249 0.08475800
## 606 567 226 627 869 755 874
## 0.36523202 0.61270559 0.41686407 0.92617850 0.84000625 0.90303838 0.92358278
## 497 784 815 112 439 475 477
## 0.20463530 0.21762209 0.23663203 0.42826782 0.30440517 0.79386790 0.89308432
## 234 395 462 431 272 717 653
## 0.79382803 0.96961882 0.53843156 0.98757774 0.97664955 0.97121732 0.28469428
## 632 411 359 83 232 418 312
## 0.27022759 0.47807900 0.89708749 0.79835438 0.90009800 0.46607817 0.78091890
## 481 786 128 137 350 105 732
## 0.88415481 0.88192518 0.46726478 0.96933830 0.82812802 0.98087913 0.52961075
## 410 737 638 883 895 194 592
## 0.84132925 0.38539974 0.62337575 0.76337213 0.98170824 0.96355020 0.53447279
## 884 821 919 239 645 936 722
## 0.94842558 0.79704727 0.59105924 0.91224507 0.83884815 0.51671911 0.23171676
## 798 329 309 947 236 30 229
## 0.90349041 0.57413103 0.65651568 0.27137871 0.36168154 0.26763012 0.95343034
## 830 168 376 470 973 856 718
## 0.59256216 0.80490844 0.29373496 0.97614777 0.11339120 0.76225583 0.92337754
## 377 684 174 854 806 364 954
## 0.90208585 0.87484813 0.96877276 0.23552754 0.36899768 0.91674512 0.34732828
## 406 736 794 471 637 712 213
## 0.74178822 0.32682151 0.66904884 0.55747505 0.90884553 0.19987863 0.33477274
## 488 496 318 729 515 574 582
## 0.55893598 0.71179732 0.86605865 0.09193221 0.90176166 0.38420937 0.60577062
## 259 822 469 552 760 459 540
## 0.95618976 0.73639748 0.82475405 0.81423005 0.55339904 0.50076331 0.75888575
## 992 749 205 123 489 132 22
## 0.64826383 0.98428624 0.87252898 0.91256438 0.96762821 0.20958027 0.71819247
## 14 292 507 108 726 975 611
## 0.54503785 0.57183073 0.98052595 0.66959947 0.95797789 0.90646692 0.37524160
## 246 445 386 668 375 353 65
## 0.95234235 0.63470905 0.92883302 0.62103585 0.05172623 0.99447025 0.75398537
## 419 558 597 861 498 56 698
## 0.81197409 0.67605766 0.20465651 0.98367056 0.96301260 0.96364704 0.95726864
## 849 514 106 803 519 19 191
## 0.89689210 0.47069773 0.58551527 0.60976049 0.65082105 0.44084902 0.95164853
## 579 230 769 392 248 705 36
## 0.25680834 0.26311985 0.80883330 0.87970212 0.86265934 0.34986767 0.49652951
## 342 859 237 946 587 366 782
## 0.64254158 0.28645325 0.69016314 0.09484399 0.57433861 0.97475845 0.97920194
## 581 204 299 945 161 412 508
## 0.70131776 0.79095560 0.93758093 0.57626258 0.96218064 0.97717280 0.34405712
## 29 374 523 746 613 82 681
## 0.89819292 0.71844204 0.13990541 0.57169380 0.44464600 0.92761864 0.87509387
## 546 790 466 904 349 615 785
## 0.32892539 0.18677143 0.75768720 0.96271631 0.97219470 0.93576340 0.94471829
## 181 970 35 547 78 735 768
## 0.64753530 0.67485616 0.55815486 0.82607709 0.91419512 0.96380595 0.98249099
## 255 165 478 394 111 509 609
## 0.96886497 0.69606103 0.62892873 0.92514587 0.56959813 0.82326590 0.90154796
## 925 89 12 710 60 723 773
## 0.19514802 0.54304267 0.12938091 0.80827219 0.14249984 0.30587029 0.99067013
## 57 303 323 192 505 532 484
## 0.89486257 0.85813641 0.92964034 0.42807617 0.13442739 0.35951223 0.97922160
## 731 70 963 330 910 796 944
## 0.74450985 0.86149452 0.89434439 0.63917968 0.80835015 0.84968130 0.96187567
## 64 823 665 149 984 545 862
## 0.17363325 0.30765650 0.86881581 0.68919194 0.46838717 0.92311104 0.88404072
## 276 839 990 134 809 897 100
## 0.94938643 0.86994708 0.79340479 0.73293091 0.45227054 0.34322832 0.85853315
## 667 46 441 501 678 443 372
## 0.58990936 0.78323797 0.83243989 0.15583931 0.27659765 0.74434000 0.92669870
## 287 527 721 271 95 877 502
## 0.54156518 0.86517612 0.74444862 0.98624800 0.86317028 0.25976699 0.59168207
## 286 146 43 677 565 733 311
## 0.14105257 0.43774614 0.54040176 0.92168879 0.66137277 0.92121563 0.77093175
## 328 976 570 625 977 948 766
## 0.91058463 0.77852666 0.23446944 0.55572422 0.92383944 0.89697797 0.69012618
## 765 614 596 390 662 810 703
## 0.90008403 0.79328388 0.45313359 0.90205205 0.55632246 0.39990339 0.75725585
## 402 220 661 664 289 157 789
## 0.74361157 0.88399190 0.74306627 0.73776997 0.79567834 0.93601046 0.26853570
## 493 997 666 620 663 920 296
## 0.97087106 0.57900056 0.74160084 0.80969282 0.87121313 0.49566284 0.28099581
## 953 848 604 747 326 202 812
## 0.57907908 0.67482330 0.73079587 0.26618700 0.96669014 0.35060513 0.88654313
## 719 389 347 669 775 391 77
## 0.95374322 0.77563102 0.89551050 0.38812278 0.80274246 0.83864808 0.32189014
## 67 91 734 238 101 291 219
## 0.78998891 0.97714238 0.97552220 0.27391310 0.69988452 0.98496766 0.46688139
## 188 319 339 227 457 74 694
## 0.87963119 0.89619373 0.39359341 0.49748174 0.52418577 0.58498867 0.84697112
## 867 257
## 0.33837597 0.93633078
act_value <- credit_train$Class
pcut <- 0.5
as.numeric(pred_prob > pcut)
## [1] 1 1 1 1 1 1 1 1 1 1 1 1 1 0 1 0 1 1 0 1 1 0 0 1 1 1 1 1 1 1 0 1 1 1 1 1 0
## [38] 1 1 1 1 1 1 1 1 1 1 1 0 0 1 1 0 1 1 1 1 1 0 1 0 1 1 1 1 1 1 0 1 1 1 1 0 0
## [75] 1 1 1 1 1 1 1 1 1 1 1 1 1 0 1 1 1 1 1 1 1 1 1 0 1 0 1 1 1 1 1 1 1 0 0 0 1
## [112] 1 1 0 1 1 1 1 1 1 1 1 1 1 1 1 0 0 1 1 1 0 1 0 0 1 0 1 1 1 0 1 1 1 1 1 1 1
## [149] 1 1 1 0 0 0 1 1 0 1 1 0 0 1 1 0 0 1 0 0 1 1 1 0 0 0 1 1 1 1 1 1 1 1 1 0 1
## [186] 1 1 1 0 1 0 1 0 0 1 1 1 1 1 1 0 0 1 1 1 1 0 1 1 1 1 1 1 1 1 1 1 1 1 1 0 0
## [223] 1 0 1 1 0 1 1 1 1 0 1 1 0 0 0 1 1 1 1 1 1 1 1 0 1 1 1 1 1 0 1 1 0 1 1 1 1
## [260] 1 1 1 0 0 0 0 1 1 1 1 1 1 1 0 0 1 1 1 1 1 1 0 1 1 1 1 1 1 1 1 1 1 1 1 1 1
## [297] 1 1 1 1 0 0 1 1 1 1 1 1 1 0 0 1 1 1 1 1 1 0 1 1 0 1 1 1 1 1 1 0 1 1 1 0 0
## [334] 1 1 1 0 1 1 1 1 0 1 0 1 1 1 1 1 1 1 1 1 1 1 1 1 1 0 1 1 1 1 1 1 1 0 1 1 1
## [371] 0 0 1 1 1 1 1 0 1 1 0 1 1 1 1 1 0 0 1 1 1 1 1 1 1 1 0 1 1 1 0 1 1 1 1 1 0
## [408] 1 1 1 0 1 1 1 1 0 1 1 1 1 0 0 1 0 1 1 1 0 1 1 1 1 1 1 1 1 1 1 1 1 0 0 1 1
## [445] 1 1 0 0 1 0 1 0 0 1 0 1 1 1 1 1 1 1 1 1 0 1 1 0 0 1 1 0 0 1 0 1 1 1 1 1 1
## [482] 1 1 0 1 1 1 1 1 1 1 1 1 1 1 0 0 1 1 1 0 1 0 0 1 0 1 0 1 1 0 0 1 0 1 1 1 1
## [519] 0 0 0 0 0 1 1 1 1 1 1 1 1 0 0 0 1 1 1 0 1 1 1 0 1 1 1 1 1 0 1 1 1 1 1 1 1
## [556] 1 1 1 1 0 1 1 1 0 0 0 1 1 1 0 1 0 1 1 1 1 1 0 0 1 0 1 0 1 1 1 0 0 1 1 1 0
## [593] 1 0 1 1 1 1 1 1 1 1 1 1 1 1 1 0 1 1 1 1 1 1 1 0 1 1 1 1 0 1 1 1 1 0 1 1 1
## [630] 1 1 0 1 1 1 0 1 0 0 1 1 1 0 0 1 0 1 0 1 1 1 1 1 1 1 1 1 0 1 1 0 1 0 1 1 0
## [667] 0 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 0 1 0 1 0 0 1 1 1 1 0 0 0 1 1 1 1
## [704] 1 1 1 1 0 0 1 1 0 1 1 1 1 1 1 0 0 1 1 1 1 0 0 1 1 1 1 1 1 1 0 1 0 0 1 1 1
## [741] 1 1 1 1 0 1 1 1 1 1 1 0 1 1 0 1 1 1 1 1 1 1 0 1 1 1 1 1 0 0 1 1 1 0 1 0 1
## [778] 1 1 1 0 1 1 0 1 1 1 0 1 1 0 1 1 0 0 1 1 1 0 1
pred_value <- 1 * (pred_prob > pcut)
table(act_value)
## act_value
## FALSE TRUE
## 246 554
confusion_mat <- table(actual = act_value, predict = pred_value)
confusion_mat
## predict
## actual 0 1
## FALSE 134 112
## TRUE 62 492
# MR = (FP + FN) / Total #
MR_rate <- (112+62)/(134+112+62+492)
MR_rate
## [1] 0.2175
Your comment: 21.75% of the time predictions are incorrect
library(ROCR)
pred <- prediction(predictions = pred_prob, labels = credit_train$Class)
perf <- performance(pred, "tpr", "fpr")
plot(perf,colorize = T)
## Find optimal cut off prob
pcut_seq <- seq(from = 0, to = 1, by = 0.01)
WMR_seq <- rep(0, length(pcut_seq))
for(i in 1: length(pcut_seq)){
pcut <- pcut_seq[i]
pred_value <- as.numeric(pred_prob > pcut)
#calculate # of FP and FN
FP <- sum(credit_train$Class == 0 & pred_value == 1)
FN <- sum(credit_train$Class == 1 & pred_value == 0)
# calculate WMR Rate
WMR <- (0.9 *FN + 0.1 *FP)/length(pred_value)
WMR_seq[i] <- WMR
}
plot(WMR_seq ~ pcut_seq)
which.min(WMR_seq)
## [1] 9
pcut_seq[which.min(WMR_seq)]
## [1] 0.08
Your comment: Optimal cutoff is 0.08
# install.packages("pROC")
library(pROC)
## Type 'citation("pROC")' for a citation.
##
## Attaching package: 'pROC'
## The following objects are masked from 'package:stats':
##
## cov, smooth, var
pROC::auc(credit_train$Class, pred_prob)
## Setting levels: control = FALSE, case = TRUE
## Setting direction: controls < cases
## Area under the curve: 0.8312
pROC::roc(credit_train$Class, pred_prob)
## Setting levels: control = FALSE, case = TRUE
## Setting direction: controls < cases
##
## Call:
## roc.default(response = credit_train$Class, predictor = pred_prob)
##
## Data: pred_prob in 246 controls (credit_train$Class FALSE) < 554 cases (credit_train$Class TRUE).
## Area under the curve: 0.8312
Your comment: With an AUC of 83.12%, it can correctly select good vs bad credit 83% of the time. ## Task 4: Model Evaluation (Part II)
pred_prob_test <- predict(credit_log, newdata = credit_test, type = "response")
pred_prob_test
## 3 4 7 9 17 21 24 39
## 0.9772401 0.6814121 0.9038051 0.9909554 0.9742847 0.8756654 0.9516930 0.8876975
## 42 48 51 61 63 66 69 75
## 0.7466523 0.8952729 0.7223591 0.6706598 0.5768842 0.5582612 0.7020236 0.5482519
## 79 86 92 93 97 103 109 113
## 0.8809099 0.9650999 0.9353571 0.9361437 0.9818417 0.9377578 0.8978622 0.5338887
## 114 115 124 125 129 131 133 141
## 0.5099719 0.6878917 0.8518698 0.6880750 0.9134794 0.5234859 0.7764952 0.9850522
## 150 151 162 172 173 178 183 189
## 0.9815586 0.9203494 0.7580465 0.8952890 0.6503636 0.8418932 0.3037764 0.7466948
## 200 208 212 216 221 224 235 242
## 0.5758866 0.8966099 0.9515127 0.9915721 0.6818632 0.9338395 0.9845366 0.9673127
## 243 244 245 247 249 253 258 262
## 0.2655885 0.9271178 0.8390568 0.9511028 0.8335346 0.1894182 0.3363573 0.4473779
## 267 270 275 282 283 288 293 294
## 0.9285523 0.9324453 0.1149949 0.9327905 0.8692413 0.6855770 0.7141235 0.8882975
## 313 327 338 343 344 351 354 355
## 0.5888179 0.9866129 0.5691214 0.6956107 0.5916764 0.9067081 0.3053874 0.8901967
## 356 357 358 370 379 381 400 409
## 0.3783597 0.9942734 0.8211146 0.7661366 0.1486400 0.9013356 0.9742049 0.8655252
## 421 422 424 426 436 442 448 453
## 0.9297396 0.9336753 0.9594899 0.8410436 0.9308694 0.4572932 0.9213999 0.8619465
## 455 458 467 472 480 485 486 487
## 0.4715316 0.8224843 0.4705888 0.2810850 0.7008234 0.9522705 0.6789267 0.9570692
## 492 494 511 513 518 525 526 530
## 0.2827850 0.9016305 0.6425128 0.8916845 0.7387778 0.6598609 0.6141441 0.5520584
## 534 535 537 538 541 542 543 548
## 0.9370869 0.9049687 0.7067898 0.7661943 0.7626184 0.8206858 0.5470357 0.8809928
## 556 561 566 578 584 588 591 600
## 0.7528453 0.7591008 0.7581759 0.9385016 0.1967097 0.7529203 0.8242370 0.9377106
## 602 603 605 610 617 618 623 628
## 0.5931661 0.1900353 0.6272732 0.9554712 0.6549202 0.5392176 0.7144620 0.6668122
## 634 636 641 642 649 659 660 670
## 0.6960458 0.5002807 0.4064320 0.3464538 0.4861953 0.3055277 0.8491247 0.8448649
## 673 674 688 689 690 701 702 704
## 0.4487838 0.9468073 0.4053282 0.9149924 0.7182936 0.8078283 0.6494789 0.5371601
## 706 707 709 711 716 738 741 751
## 0.7509929 0.3667677 0.6037826 0.9213382 0.9898570 0.3536597 0.3300860 0.6213036
## 757 761 770 777 780 795 804 805
## 0.9855870 0.9733062 0.9858863 0.8846609 0.4338050 0.8231918 0.9805477 0.6781081
## 808 811 818 826 828 829 835 838
## 0.9785474 0.7417941 0.9821373 0.4222999 0.7275023 0.7668071 0.7655679 0.8978307
## 842 847 851 852 853 858 863 885
## 0.9500000 0.8721425 0.6665308 0.9956044 0.9537740 0.9539546 0.4479711 0.7426700
## 886 888 890 892 894 896 899 905
## 0.3234184 0.1881873 0.8733112 0.9558426 0.8213964 0.9758381 0.9662557 0.9320719
## 921 931 933 934 935 938 942 952
## 0.8517034 0.7096466 0.9290540 0.9531494 0.2974182 0.6220468 0.9537083 0.7617198
## 955 968 971 978 980 981 985 988
## 0.4494508 0.7276164 0.6663540 0.8272782 0.3348674 0.8701366 0.9768891 0.9425153
act_value <- credit_test$Class
pcut <- 0.5
as.numeric(pred_prob_test > pcut)
## [1] 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1
## [38] 1 0 1 1 1 1 1 1 1 1 1 0 1 1 1 1 0 0 0 1 1 0 1 1 1 1 1 1 1 1 1 1 1 0 1 0 1
## [75] 1 1 0 1 1 1 1 1 1 1 1 0 1 1 0 1 0 0 1 1 1 1 0 1 1 1 1 1 1 1 1 1 1 1 1 1 1
## [112] 1 1 1 1 1 0 1 1 1 1 0 1 1 1 1 1 1 1 1 0 0 0 0 1 1 0 1 0 1 1 1 1 1 1 0 1 1
## [149] 1 0 0 1 1 1 1 1 0 1 1 1 1 1 1 0 1 1 1 1 1 1 1 1 1 1 0 1 0 0 1 1 1 1 1 1 1
## [186] 1 1 1 0 1 1 1 0 1 1 1 0 1 1 1
pred_value <- 1 * (pred_prob_test > pcut)
table(act_value)
## act_value
## FALSE TRUE
## 54 146
confusion_mat <- table(actual = act_value, predict = pred_value)
confusion_mat
## predict
## actual 0 1
## FALSE 21 33
## TRUE 12 134
# MR = (FP + FN) / Total #
MR_rate <- (112+62)/(134+112+62+492)
MR_rate
## [1] 0.2175
Your comment:
21.75% of the time predictions are incorrect
(Note to Prof: I am not sure if you menat to ask us to do this for the training set or testing set…in the subquestions 1 and 3 you ask for testing but this one is asking for training…my testing set would not work so I just did it again with my training set and got the samw result from the previous time I did this in this assignment for Task 3.
library(ROCR)
pred <- prediction(predictions = pred_prob_test, labels = credit_test$Class)
perf <- performance(pred, "tpr", "fpr")
plot(perf,colorize = T)
# install.packages("pROC")
library(pROC)
pROC::auc(credit_train$Class, pred_prob)
## Setting levels: control = FALSE, case = TRUE
## Setting direction: controls < cases
## Area under the curve: 0.8312
pROC::roc(credit_train$Class, pred_prob)
## Setting levels: control = FALSE, case = TRUE
## Setting direction: controls < cases
##
## Call:
## roc.default(response = credit_train$Class, predictor = pred_prob)
##
## Data: pred_prob in 246 controls (credit_train$Class FALSE) < 554 cases (credit_train$Class TRUE).
## Area under the curve: 0.8312
AUC is 83.12% which means it predicts who will have good credit vs bad credit aeround 83% of the time which shows the model is very good but still could be improved.