Homework4

Starter code for German credit scoring

Refer to http://archive.ics.uci.edu/ml/datasets/Statlog+(German+Credit+Data)) for variable description. The response variable is Class and all others are predictors.

Only run the following code once to install the package caret. The German credit scoring data in provided in that package.

if (!require("caret", quietly = TRUE)) {
    install.packages("caret")
}

Task1: Data Preparation

1. Load the caret package and the GermanCredit dataset.

library(caret) #this package contains the german data with its numeric format
data(GermanCredit)
GermanCredit$Class <-  GermanCredit$Class == "Good" # use this code to convert `Class` into True or False (equivalent to 1 or 0)
str(GermanCredit)

## 'data.frame':    1000 obs. of  62 variables:
##  $ Duration                              : int  6 48 12 42 24 36 24 36 12 30 ...
##  $ Amount                                : int  1169 5951 2096 7882 4870 9055 2835 6948 3059 5234 ...
##  $ InstallmentRatePercentage             : int  4 2 2 2 3 2 3 2 2 4 ...
##  $ ResidenceDuration                     : int  4 2 3 4 4 4 4 2 4 2 ...
##  $ Age                                   : int  67 22 49 45 53 35 53 35 61 28 ...
##  $ NumberExistingCredits                 : int  2 1 1 1 2 1 1 1 1 2 ...
##  $ NumberPeopleMaintenance               : int  1 1 2 2 2 2 1 1 1 1 ...
##  $ Telephone                             : num  0 1 1 1 1 0 1 0 1 1 ...
##  $ ForeignWorker                         : num  1 1 1 1 1 1 1 1 1 1 ...
##  $ Class                                 : logi  TRUE FALSE TRUE TRUE FALSE TRUE ...
##  $ CheckingAccountStatus.lt.0            : num  1 0 0 1 1 0 0 0 0 0 ...
##  $ CheckingAccountStatus.0.to.200        : num  0 1 0 0 0 0 0 1 0 1 ...
##  $ CheckingAccountStatus.gt.200          : num  0 0 0 0 0 0 0 0 0 0 ...
##  $ CheckingAccountStatus.none            : num  0 0 1 0 0 1 1 0 1 0 ...
##  $ CreditHistory.NoCredit.AllPaid        : num  0 0 0 0 0 0 0 0 0 0 ...
##  $ CreditHistory.ThisBank.AllPaid        : num  0 0 0 0 0 0 0 0 0 0 ...
##  $ CreditHistory.PaidDuly                : num  0 1 0 1 0 1 1 1 1 0 ...
##  $ CreditHistory.Delay                   : num  0 0 0 0 1 0 0 0 0 0 ...
##  $ CreditHistory.Critical                : num  1 0 1 0 0 0 0 0 0 1 ...
##  $ Purpose.NewCar                        : num  0 0 0 0 1 0 0 0 0 1 ...
##  $ Purpose.UsedCar                       : num  0 0 0 0 0 0 0 1 0 0 ...
##  $ Purpose.Furniture.Equipment           : num  0 0 0 1 0 0 1 0 0 0 ...
##  $ Purpose.Radio.Television              : num  1 1 0 0 0 0 0 0 1 0 ...
##  $ Purpose.DomesticAppliance             : num  0 0 0 0 0 0 0 0 0 0 ...
##  $ Purpose.Repairs                       : num  0 0 0 0 0 0 0 0 0 0 ...
##  $ Purpose.Education                     : num  0 0 1 0 0 1 0 0 0 0 ...
##  $ Purpose.Vacation                      : num  0 0 0 0 0 0 0 0 0 0 ...
##  $ Purpose.Retraining                    : num  0 0 0 0 0 0 0 0 0 0 ...
##  $ Purpose.Business                      : num  0 0 0 0 0 0 0 0 0 0 ...
##  $ Purpose.Other                         : num  0 0 0 0 0 0 0 0 0 0 ...
##  $ SavingsAccountBonds.lt.100            : num  0 1 1 1 1 0 0 1 0 1 ...
##  $ SavingsAccountBonds.100.to.500        : num  0 0 0 0 0 0 0 0 0 0 ...
##  $ SavingsAccountBonds.500.to.1000       : num  0 0 0 0 0 0 1 0 0 0 ...
##  $ SavingsAccountBonds.gt.1000           : num  0 0 0 0 0 0 0 0 1 0 ...
##  $ SavingsAccountBonds.Unknown           : num  1 0 0 0 0 1 0 0 0 0 ...
##  $ EmploymentDuration.lt.1               : num  0 0 0 0 0 0 0 0 0 0 ...
##  $ EmploymentDuration.1.to.4             : num  0 1 0 0 1 1 0 1 0 0 ...
##  $ EmploymentDuration.4.to.7             : num  0 0 1 1 0 0 0 0 1 0 ...
##  $ EmploymentDuration.gt.7               : num  1 0 0 0 0 0 1 0 0 0 ...
##  $ EmploymentDuration.Unemployed         : num  0 0 0 0 0 0 0 0 0 1 ...
##  $ Personal.Male.Divorced.Seperated      : num  0 0 0 0 0 0 0 0 1 0 ...
##  $ Personal.Female.NotSingle             : num  0 1 0 0 0 0 0 0 0 0 ...
##  $ Personal.Male.Single                  : num  1 0 1 1 1 1 1 1 0 0 ...
##  $ Personal.Male.Married.Widowed         : num  0 0 0 0 0 0 0 0 0 1 ...
##  $ Personal.Female.Single                : num  0 0 0 0 0 0 0 0 0 0 ...
##  $ OtherDebtorsGuarantors.None           : num  1 1 1 0 1 1 1 1 1 1 ...
##  $ OtherDebtorsGuarantors.CoApplicant    : num  0 0 0 0 0 0 0 0 0 0 ...
##  $ OtherDebtorsGuarantors.Guarantor      : num  0 0 0 1 0 0 0 0 0 0 ...
##  $ Property.RealEstate                   : num  1 1 1 0 0 0 0 0 1 0 ...
##  $ Property.Insurance                    : num  0 0 0 1 0 0 1 0 0 0 ...
##  $ Property.CarOther                     : num  0 0 0 0 0 0 0 1 0 1 ...
##  $ Property.Unknown                      : num  0 0 0 0 1 1 0 0 0 0 ...
##  $ OtherInstallmentPlans.Bank            : num  0 0 0 0 0 0 0 0 0 0 ...
##  $ OtherInstallmentPlans.Stores          : num  0 0 0 0 0 0 0 0 0 0 ...
##  $ OtherInstallmentPlans.None            : num  1 1 1 1 1 1 1 1 1 1 ...
##  $ Housing.Rent                          : num  0 0 0 0 0 0 0 1 0 0 ...
##  $ Housing.Own                           : num  1 1 1 0 0 0 1 0 1 1 ...
##  $ Housing.ForFree                       : num  0 0 0 1 1 1 0 0 0 0 ...
##  $ Job.UnemployedUnskilled               : num  0 0 0 0 0 0 0 0 0 0 ...
##  $ Job.UnskilledResident                 : num  0 0 1 0 0 1 0 0 1 0 ...
##  $ Job.SkilledEmployee                   : num  1 1 0 1 1 0 1 0 0 0 ...
##  $ Job.Management.SelfEmp.HighlyQualified: num  0 0 0 0 0 0 0 1 0 1 ...

Your observation: The majority of variables are numeric and only display 0s or 1s but a few variables are integers. 1 variable is logical

#This is an optional code that drop variables that provide no information in the data
GermanCredit = GermanCredit[,-c(14,19,27,30,35,40,44,45,48,52,55,58,62)] #don't run this code twice!! Think about why.

2. Explore the dataset to understand its structure.

(1) How many observations and variables are there? (2 pts)

str(GermanCredit)

## 'data.frame':    1000 obs. of  49 variables:
##  $ Duration                          : int  6 48 12 42 24 36 24 36 12 30 ...
##  $ Amount                            : int  1169 5951 2096 7882 4870 9055 2835 6948 3059 5234 ...
##  $ InstallmentRatePercentage         : int  4 2 2 2 3 2 3 2 2 4 ...
##  $ ResidenceDuration                 : int  4 2 3 4 4 4 4 2 4 2 ...
##  $ Age                               : int  67 22 49 45 53 35 53 35 61 28 ...
##  $ NumberExistingCredits             : int  2 1 1 1 2 1 1 1 1 2 ...
##  $ NumberPeopleMaintenance           : int  1 1 2 2 2 2 1 1 1 1 ...
##  $ Telephone                         : num  0 1 1 1 1 0 1 0 1 1 ...
##  $ ForeignWorker                     : num  1 1 1 1 1 1 1 1 1 1 ...
##  $ Class                             : logi  TRUE FALSE TRUE TRUE FALSE TRUE ...
##  $ CheckingAccountStatus.lt.0        : num  1 0 0 1 1 0 0 0 0 0 ...
##  $ CheckingAccountStatus.0.to.200    : num  0 1 0 0 0 0 0 1 0 1 ...
##  $ CheckingAccountStatus.gt.200      : num  0 0 0 0 0 0 0 0 0 0 ...
##  $ CreditHistory.NoCredit.AllPaid    : num  0 0 0 0 0 0 0 0 0 0 ...
##  $ CreditHistory.ThisBank.AllPaid    : num  0 0 0 0 0 0 0 0 0 0 ...
##  $ CreditHistory.PaidDuly            : num  0 1 0 1 0 1 1 1 1 0 ...
##  $ CreditHistory.Delay               : num  0 0 0 0 1 0 0 0 0 0 ...
##  $ Purpose.NewCar                    : num  0 0 0 0 1 0 0 0 0 1 ...
##  $ Purpose.UsedCar                   : num  0 0 0 0 0 0 0 1 0 0 ...
##  $ Purpose.Furniture.Equipment       : num  0 0 0 1 0 0 1 0 0 0 ...
##  $ Purpose.Radio.Television          : num  1 1 0 0 0 0 0 0 1 0 ...
##  $ Purpose.DomesticAppliance         : num  0 0 0 0 0 0 0 0 0 0 ...
##  $ Purpose.Repairs                   : num  0 0 0 0 0 0 0 0 0 0 ...
##  $ Purpose.Education                 : num  0 0 1 0 0 1 0 0 0 0 ...
##  $ Purpose.Retraining                : num  0 0 0 0 0 0 0 0 0 0 ...
##  $ Purpose.Business                  : num  0 0 0 0 0 0 0 0 0 0 ...
##  $ SavingsAccountBonds.lt.100        : num  0 1 1 1 1 0 0 1 0 1 ...
##  $ SavingsAccountBonds.100.to.500    : num  0 0 0 0 0 0 0 0 0 0 ...
##  $ SavingsAccountBonds.500.to.1000   : num  0 0 0 0 0 0 1 0 0 0 ...
##  $ SavingsAccountBonds.gt.1000       : num  0 0 0 0 0 0 0 0 1 0 ...
##  $ EmploymentDuration.lt.1           : num  0 0 0 0 0 0 0 0 0 0 ...
##  $ EmploymentDuration.1.to.4         : num  0 1 0 0 1 1 0 1 0 0 ...
##  $ EmploymentDuration.4.to.7         : num  0 0 1 1 0 0 0 0 1 0 ...
##  $ EmploymentDuration.gt.7           : num  1 0 0 0 0 0 1 0 0 0 ...
##  $ Personal.Male.Divorced.Seperated  : num  0 0 0 0 0 0 0 0 1 0 ...
##  $ Personal.Female.NotSingle         : num  0 1 0 0 0 0 0 0 0 0 ...
##  $ Personal.Male.Single              : num  1 0 1 1 1 1 1 1 0 0 ...
##  $ OtherDebtorsGuarantors.None       : num  1 1 1 0 1 1 1 1 1 1 ...
##  $ OtherDebtorsGuarantors.CoApplicant: num  0 0 0 0 0 0 0 0 0 0 ...
##  $ Property.RealEstate               : num  1 1 1 0 0 0 0 0 1 0 ...
##  $ Property.Insurance                : num  0 0 0 1 0 0 1 0 0 0 ...
##  $ Property.CarOther                 : num  0 0 0 0 0 0 0 1 0 1 ...
##  $ OtherInstallmentPlans.Bank        : num  0 0 0 0 0 0 0 0 0 0 ...
##  $ OtherInstallmentPlans.Stores      : num  0 0 0 0 0 0 0 0 0 0 ...
##  $ Housing.Rent                      : num  0 0 0 0 0 0 0 1 0 0 ...
##  $ Housing.Own                       : num  1 1 1 0 0 0 1 0 1 1 ...
##  $ Job.UnemployedUnskilled           : num  0 0 0 0 0 0 0 0 0 0 ...
##  $ Job.UnskilledResident             : num  0 0 1 0 0 1 0 0 1 0 ...
##  $ Job.SkilledEmployee               : num  1 1 0 1 1 0 1 0 0 0 ...

Your observation: There are 1000 observations and 49 variables

(2) Please make a frequency table of variable `class` (use `table()` function). How many observations are classed as “good” and how many are “bad”? (2 pts)

table(GermanCredit$Class)

## 
## FALSE  TRUE 
##   300   700

Your observation: 300 are classified as Bad and 700 are classified as Good

(3) Please make a barplot of of response variable `class`. Please add titles and labels to axis. (2 pts)

barplot(table(GermanCredit$Class), main = "Barplot for German Credit vs Class", xlab = "Class", ylab = "# of Credit Observations")

3. Split the dataset into training and test set. A random seed of `2025` is set for reproducibility. Please comment on what is the split proportion you choose for training and testing data? (2 pts)

set.seed(2025) # set random seed for reproducibility.
training_cred <- sample(1:nrow(GermanCredit), round(0.8 * nrow(GermanCredit)))
credit_train <- GermanCredit[training_cred, ]
credit_test <- GermanCredit[-training_cred, ]

Your comment: My split was 20% testing and 80% training

Task 2: Model Fitting

1. Fit a logistic regression model using the training set. Please use all variables, but make sure the variable types are right. (2 pts)

credit_log <- glm(Class ~ .,data = credit_train, family = "binomial")

2. Summarize the model and interpret the coefficients. What is the estimated coefficients for variable `InstallmentRatePercentage`? Is it significant, and why? (2 pts)

summary(credit_log)

## 
## Call:
## glm(formula = Class ~ ., family = "binomial", data = credit_train)
## 
## Coefficients:
##                                      Estimate Std. Error z value Pr(>|z|)    
## (Intercept)                         8.815e+00  1.594e+00   5.529 3.22e-08 ***
## Duration                           -2.643e-02  1.027e-02  -2.573 0.010068 *  
## Amount                             -1.355e-04  4.957e-05  -2.734 0.006253 ** 
## InstallmentRatePercentage          -2.925e-01  9.747e-02  -3.001 0.002693 ** 
## ResidenceDuration                  -4.893e-02  9.689e-02  -0.505 0.613531    
## Age                                 1.355e-02  1.020e-02   1.329 0.183763    
## NumberExistingCredits              -3.599e-01  2.175e-01  -1.654 0.098027 .  
## NumberPeopleMaintenance            -3.644e-01  2.712e-01  -1.344 0.178951    
## Telephone                          -2.654e-01  2.276e-01  -1.166 0.243564    
## ForeignWorker                      -1.454e+00  7.181e-01  -2.025 0.042862 *  
## CheckingAccountStatus.lt.0         -1.775e+00  2.635e-01  -6.735 1.64e-11 ***
## CheckingAccountStatus.0.to.200     -1.393e+00  2.596e-01  -5.365 8.09e-08 ***
## CheckingAccountStatus.gt.200       -9.145e-01  4.225e-01  -2.165 0.030425 *  
## CreditHistory.NoCredit.AllPaid     -1.232e+00  4.874e-01  -2.528 0.011465 *  
## CreditHistory.ThisBank.AllPaid     -1.570e+00  4.829e-01  -3.252 0.001147 ** 
## CreditHistory.PaidDuly             -9.374e-01  2.907e-01  -3.224 0.001262 ** 
## CreditHistory.Delay                -5.626e-01  3.728e-01  -1.509 0.131283    
## Purpose.NewCar                     -1.417e+00  8.247e-01  -1.718 0.085814 .  
## Purpose.UsedCar                     2.976e-01  8.684e-01   0.343 0.731867    
## Purpose.Furniture.Equipment        -5.842e-01  8.330e-01  -0.701 0.483148    
## Purpose.Radio.Television           -4.791e-01  8.311e-01  -0.576 0.564284    
## Purpose.DomesticAppliance          -7.999e-01  1.209e+00  -0.661 0.508314    
## Purpose.Repairs                    -1.601e+00  1.011e+00  -1.585 0.113077    
## Purpose.Education                  -1.304e+00  9.075e-01  -1.436 0.150873    
## Purpose.Retraining                  3.610e-01  1.500e+00   0.241 0.809867    
## Purpose.Business                   -6.988e-01  8.558e-01  -0.817 0.414159    
## SavingsAccountBonds.lt.100         -1.065e+00  3.092e-01  -3.444 0.000573 ***
## SavingsAccountBonds.100.to.500     -8.837e-01  3.957e-01  -2.233 0.025519 *  
## SavingsAccountBonds.500.to.1000    -9.486e-01  5.005e-01  -1.896 0.058026 .  
## SavingsAccountBonds.gt.1000         1.460e-01  5.940e-01   0.246 0.805829    
## EmploymentDuration.lt.1             9.279e-02  4.634e-01   0.200 0.841281    
## EmploymentDuration.1.to.4           2.090e-01  4.443e-01   0.470 0.638077    
## EmploymentDuration.4.to.7           9.869e-01  4.882e-01   2.021 0.043232 *  
## EmploymentDuration.gt.7             1.785e-01  4.568e-01   0.391 0.695924    
## Personal.Male.Divorced.Seperated   -1.288e-01  5.011e-01  -0.257 0.797124    
## Personal.Female.NotSingle          -2.033e-01  3.478e-01  -0.584 0.558947    
## Personal.Male.Single                3.802e-01  3.481e-01   1.092 0.274710    
## OtherDebtorsGuarantors.None        -7.408e-01  4.667e-01  -1.587 0.112448    
## OtherDebtorsGuarantors.CoApplicant -1.220e+00  6.317e-01  -1.931 0.053423 .  
## Property.RealEstate                 3.867e-01  4.875e-01   0.793 0.427607    
## Property.Insurance                  1.635e-01  4.784e-01   0.342 0.732464    
## Property.CarOther                   1.796e-01  4.596e-01   0.391 0.695959    
## OtherInstallmentPlans.Bank         -7.233e-01  2.617e-01  -2.764 0.005716 ** 
## OtherInstallmentPlans.Stores       -1.929e-01  4.246e-01  -0.454 0.649608    
## Housing.Rent                       -4.468e-01  5.415e-01  -0.825 0.409294    
## Housing.Own                        -2.017e-01  5.184e-01  -0.389 0.697127    
## Job.UnemployedUnskilled             5.206e-01  7.212e-01   0.722 0.470322    
## Job.UnskilledResident              -2.015e-02  3.833e-01  -0.053 0.958079    
## Job.SkilledEmployee                -6.640e-02  3.153e-01  -0.211 0.833214    
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## (Dispersion parameter for binomial family taken to be 1)
## 
##     Null deviance: 987.34  on 799  degrees of freedom
## Residual deviance: 728.49  on 751  degrees of freedom
## AIC: 826.49
## 
## Number of Fisher Scoring iterations: 5

Your comment: The estimated coefficient for InstallmentRatePercentage is -.2925 and is significant because there are 2 stars next to it and the P-value is very low. #### 3. Please interpret this number in detail (please calculate the corresponding odds ratio, and interpret it). (2 pts)

exp(-.2925)

## [1] 0.7463952

Your comment: For every one unit increased in the “InstallmentRatePercentage” variable, it will lead to a decrease by 29.25% down to a 74.6% of the original value.

Task 3: Model Evaluation (Part I)

1. Use the training set to obtain predicted probabilities. (2 pts)

pred_prob <- predict(credit_log, newdata = credit_train, type = "response")
pred_prob

##        909        460        932        922        961        279        510 
## 0.95773202 0.89643422 0.59583944 0.82286550 0.95718687 0.87975870 0.88651650 
##        187        266        461        881        972        900        539 
## 0.58454522 0.63202386 0.68970175 0.96553745 0.69746948 0.52224145 0.05490455 
##        983        891        373        407        571        907        159 
## 0.52249286 0.36098959 0.93708656 0.99351643 0.31363827 0.92991255 0.53316440 
##        142        549        800        575        797        801        764 
## 0.33074716 0.22196333 0.86057386 0.62898028 0.92283095 0.82412861 0.85868114 
##        857        630         88        195        686        940        371 
## 0.99213773 0.97666440 0.27369504 0.55873152 0.58032700 0.98661879 0.85027790 
##        306         59        317        528        837        564        186 
## 0.96319764 0.46112830 0.82434373 0.98921934 0.93761834 0.51341403 0.94169214 
##        144        882         49        399         50        626        185 
## 0.63998359 0.96257747 0.88981728 0.54493907 0.81581919 0.96177678 0.48244904 
##        107        491        433        302        536        788        991 
## 0.29928848 0.95716547 0.69883918 0.38382487 0.71526357 0.97055596 0.95620908 
##        779        503        819        937        987        958        214 
## 0.98265791 0.88857998 0.11641353 0.81302085 0.28285732 0.88320224 0.86707488 
##        304        850        913        593        744        153        550 
## 0.73196238 0.71054119 0.74405169 0.87107759 0.37613926 0.52420738 0.98906540 
##        440        754        432        772        408        753        517 
## 0.67044209 0.87179854 0.26823056 0.18469067 0.83178670 0.82059780 0.92548726 
##        413        495        325        336        363         41        182 
## 0.93151965 0.82645341 0.81282925 0.87894074 0.61872074 0.75168692 0.51422080 
##        792        599        104         15        308        524        742 
## 0.96832521 0.90138841 0.92733401 0.29326192 0.60017238 0.94944969 0.55563970 
##        725        387        300        264        648         23        776 
## 0.57158582 0.93277872 0.97108970 0.74561871 0.80100535 0.89465682 0.38538900 
##        898        482        154        781        962        873        743 
## 0.99815049 0.38287596 0.88786906 0.92527108 0.52076923 0.87822850 0.96039632 
##        956        687        708        654        827        878        652 
## 0.84880908 0.98296713 0.23831359 0.23817412 0.26702978 0.78440766 0.69790073 
##        340        608        995        425        807        583        960 
## 0.59153368 0.35332001 0.93882956 0.79982099 0.92804133 0.82935872 0.57804224 
##         37        228        468        846        929        117        254 
## 0.77034184 0.51558271 0.68361342 0.94824647 0.96522167 0.61614621 0.90632971 
##        915        820        876        676        758        586        879 
## 0.29228540 0.34427125 0.77382084 0.92037755 0.97210807 0.36712068 0.53356541 
##        164        994        672        531        170        825        218 
## 0.41419800 0.37236901 0.90901122 0.43366378 0.66927855 0.91205400 0.79480729 
##        870        691        824        639        791        148        590 
## 0.36548962 0.71099397 0.66770445 0.77020742 0.51847461 0.94758734 0.81269341 
##        405        901        110        210         10        241         18 
## 0.67078335 0.70864554 0.85037445 0.99918711 0.26553521 0.27515076 0.19553322 
##        385        158        692        284        177        430        504 
## 0.87579681 0.50546028 0.40022631 0.98769478 0.62931376 0.43772301 0.49539906 
##        240          6         90        715        250        447        912 
## 0.76113070 0.79797878 0.42010822 0.10526342 0.73690550 0.16749866 0.49843792 
##         25        465        774        369        930        121        256 
## 0.99565385 0.91847149 0.88506903 0.32243455 0.41486430 0.49085271 0.79941546 
##        314        295        845        301        943        331        429 
## 0.55879628 0.63735822 0.57126803 0.95130964 0.97919319 0.88878503 0.94687289 
##        580        557        500        917        671         53        965 
## 0.87728537 0.21687782 0.76457253 0.96654072 0.93276602 0.92582903 0.45494320 
##        644        748        438          5        316        397         28 
## 0.98063529 0.39547587 0.93615565 0.26115599 0.13164892 0.51777319 0.81811262 
##        297        908        163        568        999        396        152 
## 0.96700731 0.58543713 0.93793631 0.98965747 0.41792693 0.42157224 0.99226554 
##        280        320        750        836        427        298        414 
## 0.88619208 0.54686011 0.98464786 0.16515719 0.90109376 0.97477224 0.93245092 
##         81        713          8        416        474        388        607 
## 0.93905641 0.98745949 0.73458617 0.94813351 0.98357157 0.58509803 0.97552702 
##        278        428        348        130        382        206        559 
## 0.87927669 0.98687108 0.60403393 0.32848365 0.45061515 0.50921460 0.34708508 
##        868        680        926        643        699        730        184 
## 0.96309287 0.87099453 0.15627585 0.86590479 0.92852426 0.95956655 0.98700998 
##        656        551        147        679        222        209        145 
## 0.41370077 0.95540357 0.67945593 0.41902555 0.32708363 0.31114361 0.89205580 
##        499         58        572        285        866        865        362 
## 0.79471639 0.71899947 0.94400958 0.53769990 0.83542971 0.87225414 0.94412981 
##        959        533        268        444        251        612        986 
## 0.33952173 0.95484178 0.79868640 0.62899020 0.93369539 0.68956026 0.48341907 
##        169        576        619        951         40         98        452 
## 0.89361266 0.87353374 0.32505854 0.67006189 0.76349041 0.61817326 0.92487607 
##        199        383        451         16        752        333        415 
## 0.85137726 0.87295845 0.96919858 0.46912898 0.40051610 0.06405302 0.44700200 
##        577        683        337        629        902        996        813 
## 0.84231892 0.78603796 0.77386531 0.89794248 0.95327636 0.94193011 0.72611772 
##        740        198        516        345        957        269         71 
## 0.16720595 0.13457044 0.88996788 0.80963914 0.89968692 0.66864346 0.74371163 
##        203          2        860         73        682        450        211 
## 0.89033316 0.35895450 0.98063092 0.80463656 0.97100575 0.72543622 0.98940178 
##        633        871        655        155        223        573        116 
## 0.84416446 0.87038090 0.98596150 0.73122756 0.81737835 0.95119186 0.97719956 
##        261        554        855        274        979        334        156 
## 0.75358290 0.70690633 0.52455814 0.55276744 0.66161361 0.59482687 0.48508345 
##        562        714        233        622        281        135        993 
## 0.48848740 0.78351586 0.96672829 0.78994951 0.98798977 0.73897060 0.82796790 
##        914        720        616        646        911        225        522 
## 0.98168163 0.41967461 0.36821170 0.79906079 0.69583353 0.95667459 0.55785604 
##        949        693        631        964         44        918         94 
## 0.71391355 0.74146546 0.44015318 0.81105580 0.85422423 0.19639600 0.75367799 
##        763        621        126        585        378        974        569 
## 0.57344508 0.87653235 0.50806034 0.90457114 0.98515853 0.06110540 0.76298207 
##        305         26        417        939        799         32        120 
## 0.58324796 0.78877498 0.32543043 0.14423310 0.90725119 0.52009960 0.85799473 
##        594        700        802        201        893        473        197 
## 0.26518127 0.61965056 0.79727080 0.94403002 0.69279047 0.43025917 0.97341895 
##        322        512         99        479        793        398        207 
## 0.36659751 0.94997551 0.77281290 0.90436579 0.98835938 0.64438000 0.94486134 
##        380        756        817        783        739        969        685 
## 0.88737724 0.50775546 0.95909366 0.92752115 0.95026803 0.93382901 0.65365662 
##        771         96        555        464        176         47        924 
## 0.83484740 0.08247094 0.67348910 0.92306831 0.82819816 0.83681086 0.61802524 
##        434        834         11        127        906        403        171 
## 0.79146925 0.80812096 0.33038335 0.64788520 0.71603305 0.67846460 0.14769656 
##        263        160        420        277         13        217        767 
## 0.49765422 0.99211507 0.58574586 0.92491095 0.79664229 0.56490709 0.40123248 
##        778        553        651        119        695        560        601 
## 0.76593752 0.73649917 0.34496041 0.56237524 0.94584300 0.86599307 0.90655621 
##       1000         33        360        520        903        598         87 
## 0.80623596 0.46462379 0.43855604 0.98450787 0.97001995 0.58203342 0.62449858 
##        139        404        384        231        335        745        196 
## 0.96256610 0.80087656 0.62802748 0.57372925 0.08462010 0.64425318 0.73395272 
##        332        476        361         62         68         31        166 
## 0.83441353 0.37285848 0.79185477 0.97787512 0.72750928 0.77362884 0.96791540 
##        728        844        843        727        923        864        544 
## 0.14247132 0.82960144 0.55997118 0.96682033 0.42937825 0.95331012 0.81477714 
##        118        831        190        759        307        265        840 
## 0.93877514 0.86220459 0.37722398 0.94473860 0.96697596 0.97155012 0.86555498 
##        624         55        290        368        697        563         38 
## 0.33960826 0.27308379 0.56925010 0.41682370 0.97554258 0.76462746 0.67761686 
##        341         52        122         84        595         20        456 
## 0.43264942 0.86333391 0.93119938 0.82454705 0.78600898 0.87883536 0.88645041 
##        449        143        887        787        998        435        928 
## 0.94166451 0.53864668 0.90929582 0.90548903 0.91227649 0.69266064 0.25696195 
##        814        941         76        658        696        393        650 
## 0.38466852 0.93668386 0.90816502 0.89042838 0.97919648 0.34005455 0.32152284 
##          1        321        506        193        640        140        833 
## 0.96023468 0.29237146 0.95712920 0.47836492 0.37838380 0.82622300 0.07914164 
##        483        365        423        260        215        437        967 
## 0.53340789 0.61217154 0.88298470 0.92189636 0.98026116 0.91148495 0.72490380 
##        675        315        762        454        927        832        875 
## 0.79459682 0.97775411 0.48035964 0.94477407 0.51173770 0.19970559 0.46564731 
##        136        446         45        647        352        529        966 
## 0.98039096 0.91845431 0.45248815 0.47260700 0.93442869 0.25188713 0.69971236 
##        490         34         72        880        179        889        724 
## 0.89808915 0.98094171 0.98766706 0.98374122 0.91360467 0.69790706 0.78163605 
##        657        950        346        982        463         85        521 
## 0.26875104 0.87591018 0.94103123 0.53258870 0.57157044 0.75965868 0.77524946 
##         54        367        180         27        401        916        175 
## 0.98997490 0.98555772 0.63960279 0.87394129 0.91721827 0.30604922 0.40931042 
##        872        138        324        167        102        635        816 
## 0.96878874 0.85602035 0.77831617 0.38399130 0.61831921 0.34937881 0.25711127 
##        989        310        589         80        252        841        273 
## 0.59755719 0.40222905 0.59189231 0.49768005 0.84594114 0.50077249 0.08475800 
##        606        567        226        627        869        755        874 
## 0.36523202 0.61270559 0.41686407 0.92617850 0.84000625 0.90303838 0.92358278 
##        497        784        815        112        439        475        477 
## 0.20463530 0.21762209 0.23663203 0.42826782 0.30440517 0.79386790 0.89308432 
##        234        395        462        431        272        717        653 
## 0.79382803 0.96961882 0.53843156 0.98757774 0.97664955 0.97121732 0.28469428 
##        632        411        359         83        232        418        312 
## 0.27022759 0.47807900 0.89708749 0.79835438 0.90009800 0.46607817 0.78091890 
##        481        786        128        137        350        105        732 
## 0.88415481 0.88192518 0.46726478 0.96933830 0.82812802 0.98087913 0.52961075 
##        410        737        638        883        895        194        592 
## 0.84132925 0.38539974 0.62337575 0.76337213 0.98170824 0.96355020 0.53447279 
##        884        821        919        239        645        936        722 
## 0.94842558 0.79704727 0.59105924 0.91224507 0.83884815 0.51671911 0.23171676 
##        798        329        309        947        236         30        229 
## 0.90349041 0.57413103 0.65651568 0.27137871 0.36168154 0.26763012 0.95343034 
##        830        168        376        470        973        856        718 
## 0.59256216 0.80490844 0.29373496 0.97614777 0.11339120 0.76225583 0.92337754 
##        377        684        174        854        806        364        954 
## 0.90208585 0.87484813 0.96877276 0.23552754 0.36899768 0.91674512 0.34732828 
##        406        736        794        471        637        712        213 
## 0.74178822 0.32682151 0.66904884 0.55747505 0.90884553 0.19987863 0.33477274 
##        488        496        318        729        515        574        582 
## 0.55893598 0.71179732 0.86605865 0.09193221 0.90176166 0.38420937 0.60577062 
##        259        822        469        552        760        459        540 
## 0.95618976 0.73639748 0.82475405 0.81423005 0.55339904 0.50076331 0.75888575 
##        992        749        205        123        489        132         22 
## 0.64826383 0.98428624 0.87252898 0.91256438 0.96762821 0.20958027 0.71819247 
##         14        292        507        108        726        975        611 
## 0.54503785 0.57183073 0.98052595 0.66959947 0.95797789 0.90646692 0.37524160 
##        246        445        386        668        375        353         65 
## 0.95234235 0.63470905 0.92883302 0.62103585 0.05172623 0.99447025 0.75398537 
##        419        558        597        861        498         56        698 
## 0.81197409 0.67605766 0.20465651 0.98367056 0.96301260 0.96364704 0.95726864 
##        849        514        106        803        519         19        191 
## 0.89689210 0.47069773 0.58551527 0.60976049 0.65082105 0.44084902 0.95164853 
##        579        230        769        392        248        705         36 
## 0.25680834 0.26311985 0.80883330 0.87970212 0.86265934 0.34986767 0.49652951 
##        342        859        237        946        587        366        782 
## 0.64254158 0.28645325 0.69016314 0.09484399 0.57433861 0.97475845 0.97920194 
##        581        204        299        945        161        412        508 
## 0.70131776 0.79095560 0.93758093 0.57626258 0.96218064 0.97717280 0.34405712 
##         29        374        523        746        613         82        681 
## 0.89819292 0.71844204 0.13990541 0.57169380 0.44464600 0.92761864 0.87509387 
##        546        790        466        904        349        615        785 
## 0.32892539 0.18677143 0.75768720 0.96271631 0.97219470 0.93576340 0.94471829 
##        181        970         35        547         78        735        768 
## 0.64753530 0.67485616 0.55815486 0.82607709 0.91419512 0.96380595 0.98249099 
##        255        165        478        394        111        509        609 
## 0.96886497 0.69606103 0.62892873 0.92514587 0.56959813 0.82326590 0.90154796 
##        925         89         12        710         60        723        773 
## 0.19514802 0.54304267 0.12938091 0.80827219 0.14249984 0.30587029 0.99067013 
##         57        303        323        192        505        532        484 
## 0.89486257 0.85813641 0.92964034 0.42807617 0.13442739 0.35951223 0.97922160 
##        731         70        963        330        910        796        944 
## 0.74450985 0.86149452 0.89434439 0.63917968 0.80835015 0.84968130 0.96187567 
##         64        823        665        149        984        545        862 
## 0.17363325 0.30765650 0.86881581 0.68919194 0.46838717 0.92311104 0.88404072 
##        276        839        990        134        809        897        100 
## 0.94938643 0.86994708 0.79340479 0.73293091 0.45227054 0.34322832 0.85853315 
##        667         46        441        501        678        443        372 
## 0.58990936 0.78323797 0.83243989 0.15583931 0.27659765 0.74434000 0.92669870 
##        287        527        721        271         95        877        502 
## 0.54156518 0.86517612 0.74444862 0.98624800 0.86317028 0.25976699 0.59168207 
##        286        146         43        677        565        733        311 
## 0.14105257 0.43774614 0.54040176 0.92168879 0.66137277 0.92121563 0.77093175 
##        328        976        570        625        977        948        766 
## 0.91058463 0.77852666 0.23446944 0.55572422 0.92383944 0.89697797 0.69012618 
##        765        614        596        390        662        810        703 
## 0.90008403 0.79328388 0.45313359 0.90205205 0.55632246 0.39990339 0.75725585 
##        402        220        661        664        289        157        789 
## 0.74361157 0.88399190 0.74306627 0.73776997 0.79567834 0.93601046 0.26853570 
##        493        997        666        620        663        920        296 
## 0.97087106 0.57900056 0.74160084 0.80969282 0.87121313 0.49566284 0.28099581 
##        953        848        604        747        326        202        812 
## 0.57907908 0.67482330 0.73079587 0.26618700 0.96669014 0.35060513 0.88654313 
##        719        389        347        669        775        391         77 
## 0.95374322 0.77563102 0.89551050 0.38812278 0.80274246 0.83864808 0.32189014 
##         67         91        734        238        101        291        219 
## 0.78998891 0.97714238 0.97552220 0.27391310 0.69988452 0.98496766 0.46688139 
##        188        319        339        227        457         74        694 
## 0.87963119 0.89619373 0.39359341 0.49748174 0.52418577 0.58498867 0.84697112 
##        867        257 
## 0.33837597 0.93633078

2. Using the probability cut-off of 0.5, generate confusion matrix and obtain MR (misclassification rate) for the the training set. (3 pts)

act_value <- credit_train$Class
pcut <- 0.5

as.numeric(pred_prob > pcut)

##   [1] 1 1 1 1 1 1 1 1 1 1 1 1 1 0 1 0 1 1 0 1 1 0 0 1 1 1 1 1 1 1 0 1 1 1 1 1 0
##  [38] 1 1 1 1 1 1 1 1 1 1 1 0 0 1 1 0 1 1 1 1 1 0 1 0 1 1 1 1 1 1 0 1 1 1 1 0 0
##  [75] 1 1 1 1 1 1 1 1 1 1 1 1 1 0 1 1 1 1 1 1 1 1 1 0 1 0 1 1 1 1 1 1 1 0 0 0 1
## [112] 1 1 0 1 1 1 1 1 1 1 1 1 1 1 1 0 0 1 1 1 0 1 0 0 1 0 1 1 1 0 1 1 1 1 1 1 1
## [149] 1 1 1 0 0 0 1 1 0 1 1 0 0 1 1 0 0 1 0 0 1 1 1 0 0 0 1 1 1 1 1 1 1 1 1 0 1
## [186] 1 1 1 0 1 0 1 0 0 1 1 1 1 1 1 0 0 1 1 1 1 0 1 1 1 1 1 1 1 1 1 1 1 1 1 0 0
## [223] 1 0 1 1 0 1 1 1 1 0 1 1 0 0 0 1 1 1 1 1 1 1 1 0 1 1 1 1 1 0 1 1 0 1 1 1 1
## [260] 1 1 1 0 0 0 0 1 1 1 1 1 1 1 0 0 1 1 1 1 1 1 0 1 1 1 1 1 1 1 1 1 1 1 1 1 1
## [297] 1 1 1 1 0 0 1 1 1 1 1 1 1 0 0 1 1 1 1 1 1 0 1 1 0 1 1 1 1 1 1 0 1 1 1 0 0
## [334] 1 1 1 0 1 1 1 1 0 1 0 1 1 1 1 1 1 1 1 1 1 1 1 1 1 0 1 1 1 1 1 1 1 0 1 1 1
## [371] 0 0 1 1 1 1 1 0 1 1 0 1 1 1 1 1 0 0 1 1 1 1 1 1 1 1 0 1 1 1 0 1 1 1 1 1 0
## [408] 1 1 1 0 1 1 1 1 0 1 1 1 1 0 0 1 0 1 1 1 0 1 1 1 1 1 1 1 1 1 1 1 1 0 0 1 1
## [445] 1 1 0 0 1 0 1 0 0 1 0 1 1 1 1 1 1 1 1 1 0 1 1 0 0 1 1 0 0 1 0 1 1 1 1 1 1
## [482] 1 1 0 1 1 1 1 1 1 1 1 1 1 1 0 0 1 1 1 0 1 0 0 1 0 1 0 1 1 0 0 1 0 1 1 1 1
## [519] 0 0 0 0 0 1 1 1 1 1 1 1 1 0 0 0 1 1 1 0 1 1 1 0 1 1 1 1 1 0 1 1 1 1 1 1 1
## [556] 1 1 1 1 0 1 1 1 0 0 0 1 1 1 0 1 0 1 1 1 1 1 0 0 1 0 1 0 1 1 1 0 0 1 1 1 0
## [593] 1 0 1 1 1 1 1 1 1 1 1 1 1 1 1 0 1 1 1 1 1 1 1 0 1 1 1 1 0 1 1 1 1 0 1 1 1
## [630] 1 1 0 1 1 1 0 1 0 0 1 1 1 0 0 1 0 1 0 1 1 1 1 1 1 1 1 1 0 1 1 0 1 0 1 1 0
## [667] 0 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 0 1 0 1 0 0 1 1 1 1 0 0 0 1 1 1 1
## [704] 1 1 1 1 0 0 1 1 0 1 1 1 1 1 1 0 0 1 1 1 1 0 0 1 1 1 1 1 1 1 0 1 0 0 1 1 1
## [741] 1 1 1 1 0 1 1 1 1 1 1 0 1 1 0 1 1 1 1 1 1 1 0 1 1 1 1 1 0 0 1 1 1 0 1 0 1
## [778] 1 1 1 0 1 1 0 1 1 1 0 1 1 0 1 1 0 0 1 1 1 0 1

pred_value <- 1 * (pred_prob > pcut)
table(act_value)

## act_value
## FALSE  TRUE 
##   246   554

confusion_mat <- table(actual = act_value, predict = pred_value)
confusion_mat

##        predict
## actual    0   1
##   FALSE 134 112
##   TRUE   62 492

# MR = (FP + FN) / Total #
MR_rate <- (112+62)/(134+112+62+492)
MR_rate

## [1] 0.2175

Your comment: 21.75% of the time predictions are incorrect

3. Find the optimal probability cut-off point using the MR. Please draw a plot of MR vs. cut-off probability, and comment on optimal cut-off probability. (3 pts)

library(ROCR)
pred <- prediction(predictions = pred_prob, labels = credit_train$Class)
perf <- performance(pred, "tpr", "fpr")
plot(perf,colorize = T)

## Find optimal cut off prob 
pcut_seq <- seq(from = 0, to = 1, by = 0.01)
WMR_seq <- rep(0, length(pcut_seq))
for(i in 1: length(pcut_seq)){
  pcut <- pcut_seq[i] 
  pred_value <- as.numeric(pred_prob > pcut) 
  #calculate # of FP and FN
  FP <- sum(credit_train$Class == 0 & pred_value == 1) 
  FN <- sum(credit_train$Class == 1 & pred_value == 0)
  # calculate WMR Rate
  WMR <- (0.9 *FN + 0.1 *FP)/length(pred_value)
  WMR_seq[i] <- WMR
}

plot(WMR_seq ~ pcut_seq)

which.min(WMR_seq)

## [1] 9

pcut_seq[which.min(WMR_seq)]

## [1] 0.08

Your comment: Optimal cutoff is 0.08

4. Please generate the ROC curve and calculate the AUC for the training set. Please comment on this AUC. (2 pts)

# install.packages("pROC")
library(pROC)

## Type 'citation("pROC")' for a citation.

## 
## Attaching package: 'pROC'

## The following objects are masked from 'package:stats':
## 
##     cov, smooth, var

pROC::auc(credit_train$Class, pred_prob)

## Setting levels: control = FALSE, case = TRUE

## Setting direction: controls < cases

## Area under the curve: 0.8312

pROC::roc(credit_train$Class, pred_prob)

## Setting levels: control = FALSE, case = TRUE
## Setting direction: controls < cases

## 
## Call:
## roc.default(response = credit_train$Class, predictor = pred_prob)
## 
## Data: pred_prob in 246 controls (credit_train$Class FALSE) < 554 cases (credit_train$Class TRUE).
## Area under the curve: 0.8312

Your comment: With an AUC of 83.12%, it can correctly select good vs bad credit 83% of the time. ## Task 4: Model Evaluation (Part II)

1. Use the testing set to obtain predicted probabilities. (2 pts)

pred_prob_test <- predict(credit_log, newdata = credit_test, type = "response")
pred_prob_test

##         3         4         7         9        17        21        24        39 
## 0.9772401 0.6814121 0.9038051 0.9909554 0.9742847 0.8756654 0.9516930 0.8876975 
##        42        48        51        61        63        66        69        75 
## 0.7466523 0.8952729 0.7223591 0.6706598 0.5768842 0.5582612 0.7020236 0.5482519 
##        79        86        92        93        97       103       109       113 
## 0.8809099 0.9650999 0.9353571 0.9361437 0.9818417 0.9377578 0.8978622 0.5338887 
##       114       115       124       125       129       131       133       141 
## 0.5099719 0.6878917 0.8518698 0.6880750 0.9134794 0.5234859 0.7764952 0.9850522 
##       150       151       162       172       173       178       183       189 
## 0.9815586 0.9203494 0.7580465 0.8952890 0.6503636 0.8418932 0.3037764 0.7466948 
##       200       208       212       216       221       224       235       242 
## 0.5758866 0.8966099 0.9515127 0.9915721 0.6818632 0.9338395 0.9845366 0.9673127 
##       243       244       245       247       249       253       258       262 
## 0.2655885 0.9271178 0.8390568 0.9511028 0.8335346 0.1894182 0.3363573 0.4473779 
##       267       270       275       282       283       288       293       294 
## 0.9285523 0.9324453 0.1149949 0.9327905 0.8692413 0.6855770 0.7141235 0.8882975 
##       313       327       338       343       344       351       354       355 
## 0.5888179 0.9866129 0.5691214 0.6956107 0.5916764 0.9067081 0.3053874 0.8901967 
##       356       357       358       370       379       381       400       409 
## 0.3783597 0.9942734 0.8211146 0.7661366 0.1486400 0.9013356 0.9742049 0.8655252 
##       421       422       424       426       436       442       448       453 
## 0.9297396 0.9336753 0.9594899 0.8410436 0.9308694 0.4572932 0.9213999 0.8619465 
##       455       458       467       472       480       485       486       487 
## 0.4715316 0.8224843 0.4705888 0.2810850 0.7008234 0.9522705 0.6789267 0.9570692 
##       492       494       511       513       518       525       526       530 
## 0.2827850 0.9016305 0.6425128 0.8916845 0.7387778 0.6598609 0.6141441 0.5520584 
##       534       535       537       538       541       542       543       548 
## 0.9370869 0.9049687 0.7067898 0.7661943 0.7626184 0.8206858 0.5470357 0.8809928 
##       556       561       566       578       584       588       591       600 
## 0.7528453 0.7591008 0.7581759 0.9385016 0.1967097 0.7529203 0.8242370 0.9377106 
##       602       603       605       610       617       618       623       628 
## 0.5931661 0.1900353 0.6272732 0.9554712 0.6549202 0.5392176 0.7144620 0.6668122 
##       634       636       641       642       649       659       660       670 
## 0.6960458 0.5002807 0.4064320 0.3464538 0.4861953 0.3055277 0.8491247 0.8448649 
##       673       674       688       689       690       701       702       704 
## 0.4487838 0.9468073 0.4053282 0.9149924 0.7182936 0.8078283 0.6494789 0.5371601 
##       706       707       709       711       716       738       741       751 
## 0.7509929 0.3667677 0.6037826 0.9213382 0.9898570 0.3536597 0.3300860 0.6213036 
##       757       761       770       777       780       795       804       805 
## 0.9855870 0.9733062 0.9858863 0.8846609 0.4338050 0.8231918 0.9805477 0.6781081 
##       808       811       818       826       828       829       835       838 
## 0.9785474 0.7417941 0.9821373 0.4222999 0.7275023 0.7668071 0.7655679 0.8978307 
##       842       847       851       852       853       858       863       885 
## 0.9500000 0.8721425 0.6665308 0.9956044 0.9537740 0.9539546 0.4479711 0.7426700 
##       886       888       890       892       894       896       899       905 
## 0.3234184 0.1881873 0.8733112 0.9558426 0.8213964 0.9758381 0.9662557 0.9320719 
##       921       931       933       934       935       938       942       952 
## 0.8517034 0.7096466 0.9290540 0.9531494 0.2974182 0.6220468 0.9537083 0.7617198 
##       955       968       971       978       980       981       985       988 
## 0.4494508 0.7276164 0.6663540 0.8272782 0.3348674 0.8701366 0.9768891 0.9425153

2. Using the probability cut-off of 0.5, generate confusion matrix and obtain MR (misclassification rate) for the the training set. (2 pts)

act_value <- credit_test$Class
pcut <- 0.5

as.numeric(pred_prob_test > pcut)

##   [1] 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1
##  [38] 1 0 1 1 1 1 1 1 1 1 1 0 1 1 1 1 0 0 0 1 1 0 1 1 1 1 1 1 1 1 1 1 1 0 1 0 1
##  [75] 1 1 0 1 1 1 1 1 1 1 1 0 1 1 0 1 0 0 1 1 1 1 0 1 1 1 1 1 1 1 1 1 1 1 1 1 1
## [112] 1 1 1 1 1 0 1 1 1 1 0 1 1 1 1 1 1 1 1 0 0 0 0 1 1 0 1 0 1 1 1 1 1 1 0 1 1
## [149] 1 0 0 1 1 1 1 1 0 1 1 1 1 1 1 0 1 1 1 1 1 1 1 1 1 1 0 1 0 0 1 1 1 1 1 1 1
## [186] 1 1 1 0 1 1 1 0 1 1 1 0 1 1 1

pred_value <- 1 * (pred_prob_test > pcut)
table(act_value)

## act_value
## FALSE  TRUE 
##    54   146

confusion_mat <- table(actual = act_value, predict = pred_value)
confusion_mat

##        predict
## actual    0   1
##   FALSE  21  33
##   TRUE   12 134

# MR = (FP + FN) / Total #
MR_rate <- (112+62)/(134+112+62+492)
MR_rate

## [1] 0.2175

Your comment:

21.75% of the time predictions are incorrect

(Note to Prof: I am not sure if you menat to ask us to do this for the training set or testing set…in the subquestions 1 and 3 you ask for testing but this one is asking for training…my testing set would not work so I just did it again with my training set and got the samw result from the previous time I did this in this assignment for Task 3.

3. Please generate the ROC curve and calculate the AUC for the test set. Please comment on this AUC. (2 pts)

library(ROCR)
pred <- prediction(predictions = pred_prob_test, labels = credit_test$Class)
perf <- performance(pred, "tpr", "fpr")
plot(perf,colorize = T)

# install.packages("pROC")
library(pROC)
pROC::auc(credit_train$Class, pred_prob)

## Setting levels: control = FALSE, case = TRUE

## Setting direction: controls < cases

## Area under the curve: 0.8312

pROC::roc(credit_train$Class, pred_prob)

## Setting levels: control = FALSE, case = TRUE
## Setting direction: controls < cases

## 
## Call:
## roc.default(response = credit_train$Class, predictor = pred_prob)
## 
## Data: pred_prob in 246 controls (credit_train$Class FALSE) < 554 cases (credit_train$Class TRUE).
## Area under the curve: 0.8312

AUC is 83.12% which means it predicts who will have good credit vs bad credit aeround 83% of the time which shows the model is very good but still could be improved.