Homework5

Starter code for German credit scoring

Refer to http://archive.ics.uci.edu/ml/datasets/Statlog+(German+Credit+Data)) for variable description. The response variable is Class and all others are predictors.

Only run the following code once to install the package caret. The German credit scoring data in provided in that package.

if (!require(package_name, quietly = TRUE)) {
    install.packages("caret")
}

Task1: Data Preparation

1. Load the caret package and the GermanCredit dataset.

library(caret) #this package contains the german data with its numeric format
data(GermanCredit)
GermanCredit$Class <-  GermanCredit$Class == "Good" # use this code to convert `Class` into True or False (equivalent to 1 or 0)
str(GermanCredit)

## 'data.frame':    1000 obs. of  62 variables:
##  $ Duration                              : int  6 48 12 42 24 36 24 36 12 30 ...
##  $ Amount                                : int  1169 5951 2096 7882 4870 9055 2835 6948 3059 5234 ...
##  $ InstallmentRatePercentage             : int  4 2 2 2 3 2 3 2 2 4 ...
##  $ ResidenceDuration                     : int  4 2 3 4 4 4 4 2 4 2 ...
##  $ Age                                   : int  67 22 49 45 53 35 53 35 61 28 ...
##  $ NumberExistingCredits                 : int  2 1 1 1 2 1 1 1 1 2 ...
##  $ NumberPeopleMaintenance               : int  1 1 2 2 2 2 1 1 1 1 ...
##  $ Telephone                             : num  0 1 1 1 1 0 1 0 1 1 ...
##  $ ForeignWorker                         : num  1 1 1 1 1 1 1 1 1 1 ...
##  $ Class                                 : logi  TRUE FALSE TRUE TRUE FALSE TRUE ...
##  $ CheckingAccountStatus.lt.0            : num  1 0 0 1 1 0 0 0 0 0 ...
##  $ CheckingAccountStatus.0.to.200        : num  0 1 0 0 0 0 0 1 0 1 ...
##  $ CheckingAccountStatus.gt.200          : num  0 0 0 0 0 0 0 0 0 0 ...
##  $ CheckingAccountStatus.none            : num  0 0 1 0 0 1 1 0 1 0 ...
##  $ CreditHistory.NoCredit.AllPaid        : num  0 0 0 0 0 0 0 0 0 0 ...
##  $ CreditHistory.ThisBank.AllPaid        : num  0 0 0 0 0 0 0 0 0 0 ...
##  $ CreditHistory.PaidDuly                : num  0 1 0 1 0 1 1 1 1 0 ...
##  $ CreditHistory.Delay                   : num  0 0 0 0 1 0 0 0 0 0 ...
##  $ CreditHistory.Critical                : num  1 0 1 0 0 0 0 0 0 1 ...
##  $ Purpose.NewCar                        : num  0 0 0 0 1 0 0 0 0 1 ...
##  $ Purpose.UsedCar                       : num  0 0 0 0 0 0 0 1 0 0 ...
##  $ Purpose.Furniture.Equipment           : num  0 0 0 1 0 0 1 0 0 0 ...
##  $ Purpose.Radio.Television              : num  1 1 0 0 0 0 0 0 1 0 ...
##  $ Purpose.DomesticAppliance             : num  0 0 0 0 0 0 0 0 0 0 ...
##  $ Purpose.Repairs                       : num  0 0 0 0 0 0 0 0 0 0 ...
##  $ Purpose.Education                     : num  0 0 1 0 0 1 0 0 0 0 ...
##  $ Purpose.Vacation                      : num  0 0 0 0 0 0 0 0 0 0 ...
##  $ Purpose.Retraining                    : num  0 0 0 0 0 0 0 0 0 0 ...
##  $ Purpose.Business                      : num  0 0 0 0 0 0 0 0 0 0 ...
##  $ Purpose.Other                         : num  0 0 0 0 0 0 0 0 0 0 ...
##  $ SavingsAccountBonds.lt.100            : num  0 1 1 1 1 0 0 1 0 1 ...
##  $ SavingsAccountBonds.100.to.500        : num  0 0 0 0 0 0 0 0 0 0 ...
##  $ SavingsAccountBonds.500.to.1000       : num  0 0 0 0 0 0 1 0 0 0 ...
##  $ SavingsAccountBonds.gt.1000           : num  0 0 0 0 0 0 0 0 1 0 ...
##  $ SavingsAccountBonds.Unknown           : num  1 0 0 0 0 1 0 0 0 0 ...
##  $ EmploymentDuration.lt.1               : num  0 0 0 0 0 0 0 0 0 0 ...
##  $ EmploymentDuration.1.to.4             : num  0 1 0 0 1 1 0 1 0 0 ...
##  $ EmploymentDuration.4.to.7             : num  0 0 1 1 0 0 0 0 1 0 ...
##  $ EmploymentDuration.gt.7               : num  1 0 0 0 0 0 1 0 0 0 ...
##  $ EmploymentDuration.Unemployed         : num  0 0 0 0 0 0 0 0 0 1 ...
##  $ Personal.Male.Divorced.Seperated      : num  0 0 0 0 0 0 0 0 1 0 ...
##  $ Personal.Female.NotSingle             : num  0 1 0 0 0 0 0 0 0 0 ...
##  $ Personal.Male.Single                  : num  1 0 1 1 1 1 1 1 0 0 ...
##  $ Personal.Male.Married.Widowed         : num  0 0 0 0 0 0 0 0 0 1 ...
##  $ Personal.Female.Single                : num  0 0 0 0 0 0 0 0 0 0 ...
##  $ OtherDebtorsGuarantors.None           : num  1 1 1 0 1 1 1 1 1 1 ...
##  $ OtherDebtorsGuarantors.CoApplicant    : num  0 0 0 0 0 0 0 0 0 0 ...
##  $ OtherDebtorsGuarantors.Guarantor      : num  0 0 0 1 0 0 0 0 0 0 ...
##  $ Property.RealEstate                   : num  1 1 1 0 0 0 0 0 1 0 ...
##  $ Property.Insurance                    : num  0 0 0 1 0 0 1 0 0 0 ...
##  $ Property.CarOther                     : num  0 0 0 0 0 0 0 1 0 1 ...
##  $ Property.Unknown                      : num  0 0 0 0 1 1 0 0 0 0 ...
##  $ OtherInstallmentPlans.Bank            : num  0 0 0 0 0 0 0 0 0 0 ...
##  $ OtherInstallmentPlans.Stores          : num  0 0 0 0 0 0 0 0 0 0 ...
##  $ OtherInstallmentPlans.None            : num  1 1 1 1 1 1 1 1 1 1 ...
##  $ Housing.Rent                          : num  0 0 0 0 0 0 0 1 0 0 ...
##  $ Housing.Own                           : num  1 1 1 0 0 0 1 0 1 1 ...
##  $ Housing.ForFree                       : num  0 0 0 1 1 1 0 0 0 0 ...
##  $ Job.UnemployedUnskilled               : num  0 0 0 0 0 0 0 0 0 0 ...
##  $ Job.UnskilledResident                 : num  0 0 1 0 0 1 0 0 1 0 ...
##  $ Job.SkilledEmployee                   : num  1 1 0 1 1 0 1 0 0 0 ...
##  $ Job.Management.SelfEmp.HighlyQualified: num  0 0 0 0 0 0 0 1 0 1 ...

Your observation: My observations from the German data is that there 1000 observations and 62 variables. Most variables are numeric while some are integers and one logic. The data contains variables like Age, Class, and Housing.

#This is an optional code that drop variables that provide no information in the data
GermanCredit = GermanCredit[,-c(14,19,27,30,35,40,44,45,48,52,55,58,62)] #don't run this code twice!! Think about why.

2. Explore the dataset to understand its structure.

(1) How many observations and variables are there? (2 pts)

# your code here:
str(GermanCredit)

## 'data.frame':    1000 obs. of  49 variables:
##  $ Duration                          : int  6 48 12 42 24 36 24 36 12 30 ...
##  $ Amount                            : int  1169 5951 2096 7882 4870 9055 2835 6948 3059 5234 ...
##  $ InstallmentRatePercentage         : int  4 2 2 2 3 2 3 2 2 4 ...
##  $ ResidenceDuration                 : int  4 2 3 4 4 4 4 2 4 2 ...
##  $ Age                               : int  67 22 49 45 53 35 53 35 61 28 ...
##  $ NumberExistingCredits             : int  2 1 1 1 2 1 1 1 1 2 ...
##  $ NumberPeopleMaintenance           : int  1 1 2 2 2 2 1 1 1 1 ...
##  $ Telephone                         : num  0 1 1 1 1 0 1 0 1 1 ...
##  $ ForeignWorker                     : num  1 1 1 1 1 1 1 1 1 1 ...
##  $ Class                             : logi  TRUE FALSE TRUE TRUE FALSE TRUE ...
##  $ CheckingAccountStatus.lt.0        : num  1 0 0 1 1 0 0 0 0 0 ...
##  $ CheckingAccountStatus.0.to.200    : num  0 1 0 0 0 0 0 1 0 1 ...
##  $ CheckingAccountStatus.gt.200      : num  0 0 0 0 0 0 0 0 0 0 ...
##  $ CreditHistory.NoCredit.AllPaid    : num  0 0 0 0 0 0 0 0 0 0 ...
##  $ CreditHistory.ThisBank.AllPaid    : num  0 0 0 0 0 0 0 0 0 0 ...
##  $ CreditHistory.PaidDuly            : num  0 1 0 1 0 1 1 1 1 0 ...
##  $ CreditHistory.Delay               : num  0 0 0 0 1 0 0 0 0 0 ...
##  $ Purpose.NewCar                    : num  0 0 0 0 1 0 0 0 0 1 ...
##  $ Purpose.UsedCar                   : num  0 0 0 0 0 0 0 1 0 0 ...
##  $ Purpose.Furniture.Equipment       : num  0 0 0 1 0 0 1 0 0 0 ...
##  $ Purpose.Radio.Television          : num  1 1 0 0 0 0 0 0 1 0 ...
##  $ Purpose.DomesticAppliance         : num  0 0 0 0 0 0 0 0 0 0 ...
##  $ Purpose.Repairs                   : num  0 0 0 0 0 0 0 0 0 0 ...
##  $ Purpose.Education                 : num  0 0 1 0 0 1 0 0 0 0 ...
##  $ Purpose.Retraining                : num  0 0 0 0 0 0 0 0 0 0 ...
##  $ Purpose.Business                  : num  0 0 0 0 0 0 0 0 0 0 ...
##  $ SavingsAccountBonds.lt.100        : num  0 1 1 1 1 0 0 1 0 1 ...
##  $ SavingsAccountBonds.100.to.500    : num  0 0 0 0 0 0 0 0 0 0 ...
##  $ SavingsAccountBonds.500.to.1000   : num  0 0 0 0 0 0 1 0 0 0 ...
##  $ SavingsAccountBonds.gt.1000       : num  0 0 0 0 0 0 0 0 1 0 ...
##  $ EmploymentDuration.lt.1           : num  0 0 0 0 0 0 0 0 0 0 ...
##  $ EmploymentDuration.1.to.4         : num  0 1 0 0 1 1 0 1 0 0 ...
##  $ EmploymentDuration.4.to.7         : num  0 0 1 1 0 0 0 0 1 0 ...
##  $ EmploymentDuration.gt.7           : num  1 0 0 0 0 0 1 0 0 0 ...
##  $ Personal.Male.Divorced.Seperated  : num  0 0 0 0 0 0 0 0 1 0 ...
##  $ Personal.Female.NotSingle         : num  0 1 0 0 0 0 0 0 0 0 ...
##  $ Personal.Male.Single              : num  1 0 1 1 1 1 1 1 0 0 ...
##  $ OtherDebtorsGuarantors.None       : num  1 1 1 0 1 1 1 1 1 1 ...
##  $ OtherDebtorsGuarantors.CoApplicant: num  0 0 0 0 0 0 0 0 0 0 ...
##  $ Property.RealEstate               : num  1 1 1 0 0 0 0 0 1 0 ...
##  $ Property.Insurance                : num  0 0 0 1 0 0 1 0 0 0 ...
##  $ Property.CarOther                 : num  0 0 0 0 0 0 0 1 0 1 ...
##  $ OtherInstallmentPlans.Bank        : num  0 0 0 0 0 0 0 0 0 0 ...
##  $ OtherInstallmentPlans.Stores      : num  0 0 0 0 0 0 0 0 0 0 ...
##  $ Housing.Rent                      : num  0 0 0 0 0 0 0 1 0 0 ...
##  $ Housing.Own                       : num  1 1 1 0 0 0 1 0 1 1 ...
##  $ Job.UnemployedUnskilled           : num  0 0 0 0 0 0 0 0 0 0 ...
##  $ Job.UnskilledResident             : num  0 0 1 0 0 1 0 0 1 0 ...
##  $ Job.SkilledEmployee               : num  1 1 0 1 1 0 1 0 0 0 ...

Your observation: This subset of the data set is 1000 observations with 49 variables instead of 62 variables.

(2) Please make a frequency table of variable `class` (use `table()` function). How many observations are classed as “good” and how many are “bad”? (2 pts)

# your code here:
table(GermanCredit$Class)

## 
## FALSE  TRUE 
##   300   700

Your observation: There are 300 observations in the FALSE category, and 700 observations in the TRUE category.

(3) Please make a barplot of of response variable `class`. Please add titles and labels to axis. (2 pts)

# your code here:
library(ggplot2)
freq_class <- table(GermanCredit$Class)
barplot(freq_class, main = "Classes", xlab = "Observations")

3. Split the dataset into training and test set. A random seed of `2025` is set for reproducibility. Please comment on what is the split proportion you choose for training and testing data? (2 pts)

set.seed(2025) # set random seed for reproducibility.
# your code here:
train <- createDataPartition(GermanCredit$Class, p = 0.7, list = FALSE)
train_data <- GermanCredit[train, ]
test_data  <- GermanCredit[-train, ]
dim(train_data)

## [1] 700  49

dim(test_data)

## [1] 300  49

Your comment: For this example, we used a 70/30 testing split. This provides the model enough to train it, while also giving it enough to reliably test the models performance.

Task 2: Model Fitting

1. Fit a logistic regression model using the training set. Please use all variables, but make sure the variable types are right. (2 pts)

# your code here:
logic_model <- glm(Class ~ ., data = train_data, family = "binomial")
summary(logic_model)

## 
## Call:
## glm(formula = Class ~ ., family = "binomial", data = train_data)
## 
## Coefficients:
##                                      Estimate Std. Error z value Pr(>|z|)    
## (Intercept)                         7.941e+00  1.617e+00   4.910 9.12e-07 ***
## Duration                           -2.901e-02  1.151e-02  -2.521 0.011705 *  
## Amount                             -1.075e-04  5.272e-05  -2.039 0.041401 *  
## InstallmentRatePercentage          -3.898e-01  1.092e-01  -3.569 0.000358 ***
## ResidenceDuration                  -6.141e-02  1.015e-01  -0.605 0.545004    
## Age                                 2.488e-02  1.142e-02   2.178 0.029393 *  
## NumberExistingCredits              -2.077e-01  2.276e-01  -0.913 0.361504    
## NumberPeopleMaintenance            -2.455e-01  3.068e-01  -0.800 0.423583    
## Telephone                          -4.703e-02  2.442e-01  -0.193 0.847267    
## ForeignWorker                      -1.116e+00  6.633e-01  -1.682 0.092485 .  
## CheckingAccountStatus.lt.0         -1.539e+00  2.694e-01  -5.714 1.10e-08 ***
## CheckingAccountStatus.0.to.200     -1.186e+00  2.786e-01  -4.258 2.06e-05 ***
## CheckingAccountStatus.gt.200       -2.359e-01  4.652e-01  -0.507 0.612139    
## CreditHistory.NoCredit.AllPaid     -1.508e+00  5.144e-01  -2.931 0.003383 ** 
## CreditHistory.ThisBank.AllPaid     -1.970e+00  5.276e-01  -3.733 0.000189 ***
## CreditHistory.PaidDuly             -9.283e-01  3.350e-01  -2.771 0.005588 ** 
## CreditHistory.Delay                -7.730e-01  4.062e-01  -1.903 0.057032 .  
## Purpose.NewCar                     -1.616e+00  8.906e-01  -1.815 0.069570 .  
## Purpose.UsedCar                     1.418e-02  9.328e-01   0.015 0.987872    
## Purpose.Furniture.Equipment        -8.664e-01  8.985e-01  -0.964 0.334875    
## Purpose.Radio.Television           -6.736e-01  8.958e-01  -0.752 0.452093    
## Purpose.DomesticAppliance          -1.316e+00  1.198e+00  -1.099 0.271980    
## Purpose.Repairs                    -1.125e+00  1.077e+00  -1.045 0.295888    
## Purpose.Education                  -1.842e+00  9.842e-01  -1.872 0.061248 .  
## Purpose.Retraining                  5.452e-01  1.534e+00   0.355 0.722233    
## Purpose.Business                   -1.013e+00  9.128e-01  -1.110 0.266924    
## SavingsAccountBonds.lt.100         -1.011e+00  3.130e-01  -3.230 0.001238 ** 
## SavingsAccountBonds.100.to.500     -6.076e-01  4.251e-01  -1.429 0.152873    
## SavingsAccountBonds.500.to.1000    -3.826e-01  5.583e-01  -0.685 0.493160    
## SavingsAccountBonds.gt.1000         2.995e-01  7.103e-01   0.422 0.673243    
## EmploymentDuration.lt.1             3.306e-01  4.957e-01   0.667 0.504788    
## EmploymentDuration.1.to.4           5.461e-01  4.722e-01   1.156 0.247501    
## EmploymentDuration.4.to.7           6.790e-01  5.128e-01   1.324 0.185449    
## EmploymentDuration.gt.7             4.548e-01  4.684e-01   0.971 0.331585    
## Personal.Male.Divorced.Seperated   -7.670e-01  5.438e-01  -1.410 0.158402    
## Personal.Female.NotSingle          -2.582e-01  3.681e-01  -0.701 0.483014    
## Personal.Male.Single                4.698e-01  3.750e-01   1.253 0.210300    
## OtherDebtorsGuarantors.None        -1.044e+00  5.749e-01  -1.817 0.069292 .  
## OtherDebtorsGuarantors.CoApplicant -1.167e+00  7.453e-01  -1.566 0.117349    
## Property.RealEstate                 7.808e-01  5.317e-01   1.468 0.141990    
## Property.Insurance                  3.269e-01  5.206e-01   0.628 0.530048    
## Property.CarOther                   7.458e-01  5.089e-01   1.466 0.142739    
## OtherInstallmentPlans.Bank         -5.795e-01  2.937e-01  -1.973 0.048493 *  
## OtherInstallmentPlans.Stores       -5.174e-01  4.619e-01  -1.120 0.262692    
## Housing.Rent                       -8.942e-01  5.908e-01  -1.514 0.130116    
## Housing.Own                        -4.131e-01  5.593e-01  -0.739 0.460202    
## Job.UnemployedUnskilled             1.175e+00  9.763e-01   1.204 0.228626    
## Job.UnskilledResident              -2.144e-01  4.270e-01  -0.502 0.615535    
## Job.SkilledEmployee                -1.170e-01  3.437e-01  -0.340 0.733609    
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## (Dispersion parameter for binomial family taken to be 1)
## 
##     Null deviance: 855.21  on 699  degrees of freedom
## Residual deviance: 621.15  on 651  degrees of freedom
## AIC: 719.15
## 
## Number of Fisher Scoring iterations: 5

2. Summarize the model and interpret the coefficients. What is the estimated coefficients for variable `InstallmentRatePercentage`? Is it significant, and why? (2 pts)

# your code here:
summary(logic_model)

## 
## Call:
## glm(formula = Class ~ ., family = "binomial", data = train_data)
## 
## Coefficients:
##                                      Estimate Std. Error z value Pr(>|z|)    
## (Intercept)                         7.941e+00  1.617e+00   4.910 9.12e-07 ***
## Duration                           -2.901e-02  1.151e-02  -2.521 0.011705 *  
## Amount                             -1.075e-04  5.272e-05  -2.039 0.041401 *  
## InstallmentRatePercentage          -3.898e-01  1.092e-01  -3.569 0.000358 ***
## ResidenceDuration                  -6.141e-02  1.015e-01  -0.605 0.545004    
## Age                                 2.488e-02  1.142e-02   2.178 0.029393 *  
## NumberExistingCredits              -2.077e-01  2.276e-01  -0.913 0.361504    
## NumberPeopleMaintenance            -2.455e-01  3.068e-01  -0.800 0.423583    
## Telephone                          -4.703e-02  2.442e-01  -0.193 0.847267    
## ForeignWorker                      -1.116e+00  6.633e-01  -1.682 0.092485 .  
## CheckingAccountStatus.lt.0         -1.539e+00  2.694e-01  -5.714 1.10e-08 ***
## CheckingAccountStatus.0.to.200     -1.186e+00  2.786e-01  -4.258 2.06e-05 ***
## CheckingAccountStatus.gt.200       -2.359e-01  4.652e-01  -0.507 0.612139    
## CreditHistory.NoCredit.AllPaid     -1.508e+00  5.144e-01  -2.931 0.003383 ** 
## CreditHistory.ThisBank.AllPaid     -1.970e+00  5.276e-01  -3.733 0.000189 ***
## CreditHistory.PaidDuly             -9.283e-01  3.350e-01  -2.771 0.005588 ** 
## CreditHistory.Delay                -7.730e-01  4.062e-01  -1.903 0.057032 .  
## Purpose.NewCar                     -1.616e+00  8.906e-01  -1.815 0.069570 .  
## Purpose.UsedCar                     1.418e-02  9.328e-01   0.015 0.987872    
## Purpose.Furniture.Equipment        -8.664e-01  8.985e-01  -0.964 0.334875    
## Purpose.Radio.Television           -6.736e-01  8.958e-01  -0.752 0.452093    
## Purpose.DomesticAppliance          -1.316e+00  1.198e+00  -1.099 0.271980    
## Purpose.Repairs                    -1.125e+00  1.077e+00  -1.045 0.295888    
## Purpose.Education                  -1.842e+00  9.842e-01  -1.872 0.061248 .  
## Purpose.Retraining                  5.452e-01  1.534e+00   0.355 0.722233    
## Purpose.Business                   -1.013e+00  9.128e-01  -1.110 0.266924    
## SavingsAccountBonds.lt.100         -1.011e+00  3.130e-01  -3.230 0.001238 ** 
## SavingsAccountBonds.100.to.500     -6.076e-01  4.251e-01  -1.429 0.152873    
## SavingsAccountBonds.500.to.1000    -3.826e-01  5.583e-01  -0.685 0.493160    
## SavingsAccountBonds.gt.1000         2.995e-01  7.103e-01   0.422 0.673243    
## EmploymentDuration.lt.1             3.306e-01  4.957e-01   0.667 0.504788    
## EmploymentDuration.1.to.4           5.461e-01  4.722e-01   1.156 0.247501    
## EmploymentDuration.4.to.7           6.790e-01  5.128e-01   1.324 0.185449    
## EmploymentDuration.gt.7             4.548e-01  4.684e-01   0.971 0.331585    
## Personal.Male.Divorced.Seperated   -7.670e-01  5.438e-01  -1.410 0.158402    
## Personal.Female.NotSingle          -2.582e-01  3.681e-01  -0.701 0.483014    
## Personal.Male.Single                4.698e-01  3.750e-01   1.253 0.210300    
## OtherDebtorsGuarantors.None        -1.044e+00  5.749e-01  -1.817 0.069292 .  
## OtherDebtorsGuarantors.CoApplicant -1.167e+00  7.453e-01  -1.566 0.117349    
## Property.RealEstate                 7.808e-01  5.317e-01   1.468 0.141990    
## Property.Insurance                  3.269e-01  5.206e-01   0.628 0.530048    
## Property.CarOther                   7.458e-01  5.089e-01   1.466 0.142739    
## OtherInstallmentPlans.Bank         -5.795e-01  2.937e-01  -1.973 0.048493 *  
## OtherInstallmentPlans.Stores       -5.174e-01  4.619e-01  -1.120 0.262692    
## Housing.Rent                       -8.942e-01  5.908e-01  -1.514 0.130116    
## Housing.Own                        -4.131e-01  5.593e-01  -0.739 0.460202    
## Job.UnemployedUnskilled             1.175e+00  9.763e-01   1.204 0.228626    
## Job.UnskilledResident              -2.144e-01  4.270e-01  -0.502 0.615535    
## Job.SkilledEmployee                -1.170e-01  3.437e-01  -0.340 0.733609    
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## (Dispersion parameter for binomial family taken to be 1)
## 
##     Null deviance: 855.21  on 699  degrees of freedom
## Residual deviance: 621.15  on 651  degrees of freedom
## AIC: 719.15
## 
## Number of Fisher Scoring iterations: 5

Your comment: The coefficient for InstallmentRatePercentage is about -0.43 which means that the installment rate percentage goes up while the good credit decreases. We also know that the coefficent is statistically significant.

3. Please interpret this number in detail (please calculate the corresponding odds ratio, and interpret it). (2 pts)

# you might need some code for calculation:
odds <- exp(-0.43)
odds

## [1] 0.6505091

Your comment: The odds for the InstallmentRatePercentage is 0.65 which means the for every one increase in installment rate percentage, the odds went down

Task 3: Model Evaluation (Part I)

1. Use the training set to obtain predicted probabilities. (2 pts)

# your code here:
train_pred_prob <- predict(logic_model, newdata = train_data, type = "response")
train_pred_prob

##          1          3          6         11         12         13         14 
## 0.97453792 0.95952716 0.71928110 0.32074645 0.07608803 0.83362826 0.69258565 
##         17         18         19         20         21         22         27 
## 0.97534586 0.30190275 0.44440128 0.93414779 0.91801561 0.87751175 0.83303158 
##         28         29         31         32         34         35         36 
## 0.86415628 0.94700988 0.78837840 0.57217581 0.96130270 0.59547297 0.45662901 
##         37         40         41         42         43         44         45 
## 0.71945153 0.78436056 0.87184749 0.74934363 0.73804195 0.85571115 0.39818725 
##         46         47         48         49         50         53         55 
## 0.80970117 0.90975368 0.92967068 0.91680013 0.87332510 0.93965266 0.28727474 
##         56         58         59         60         61         62         63 
## 0.95353840 0.76328005 0.63778003 0.08298532 0.69549150 0.98685437 0.49810381 
##         66         67         69         70         71         72         73 
## 0.76409405 0.83396713 0.45105723 0.84528069 0.76418751 0.97747220 0.89713474 
##         74         75         76         79         80         81         83 
## 0.54420799 0.36887102 0.94193521 0.85120347 0.46008017 0.95749589 0.67457061 
##         85         86         88         90         91         93         94 
## 0.82433136 0.96561410 0.38939636 0.44404657 0.98101984 0.91067955 0.79720755 
##         95         96         97         98        101        103        105 
## 0.89288421 0.05445515 0.96598445 0.73372531 0.64948174 0.91623995 0.97038749 
##        107        108        109        110        111        112        114 
## 0.25697354 0.68899368 0.84373246 0.91075861 0.62243962 0.26670060 0.39299756 
##        115        116        117        118        120        121        122 
## 0.82830648 0.97343284 0.48837169 0.93649086 0.79031272 0.50743668 0.91662667 
##        125        126        127        129        134        135        137 
## 0.58439242 0.55166319 0.67206485 0.91333212 0.77129936 0.61205648 0.98026546 
##        140        141        142        143        144        145        148 
## 0.88554312 0.99507557 0.22333677 0.66953945 0.73599183 0.77410823 0.93320228 
##        149        153        154        155        156        157        158 
## 0.78513767 0.72452984 0.85311719 0.59706989 0.48612500 0.98621768 0.60202683 
##        160        161        164        165        166        167        170 
## 0.99492455 0.96929030 0.52909516 0.73574041 0.98480174 0.35169734 0.47746567 
##        171        173        174        175        176        178        179 
## 0.16151866 0.44571354 0.97498970 0.33945227 0.71611495 0.94766547 0.85719842 
##        180        181        183        184        186        187        188 
## 0.77268849 0.55647894 0.25819819 0.98743571 0.93755529 0.63404503 0.97409758 
##        189        190        191        192        193        197        198 
## 0.61401290 0.44703332 0.96738525 0.24788011 0.54592239 0.97940358 0.21636882 
##        199        200        201        202        203        204        205 
## 0.82536522 0.27548963 0.96393135 0.34071828 0.82326223 0.64897242 0.89033531 
##        206        207        210        211        213        214        215 
## 0.63949745 0.94877641 0.99817417 0.99416101 0.20823273 0.91281987 0.98161676 
##        216        217        218        220        221        223        225 
## 0.99075149 0.33338165 0.87548525 0.91730153 0.73905079 0.85661604 0.95900690 
##        226        229        230        231        232        233        234 
## 0.58061266 0.92256324 0.25567170 0.75611831 0.94140859 0.94000707 0.82488527 
##        235        236        238        241        242        244        245 
## 0.98462010 0.34081693 0.27126045 0.32624289 0.95253056 0.92352144 0.88874722 
##        248        249        250        251        252        253        255 
## 0.73831590 0.91268716 0.65021448 0.89546655 0.93625340 0.19390704 0.95202190 
##        257        258        259        262        263        264        265 
## 0.94056216 0.15069989 0.96030236 0.55868983 0.61030221 0.76207414 0.96619683 
##        267        268        269        270        271        272        274 
## 0.93109731 0.69564809 0.51577521 0.93979651 0.95444792 0.96234097 0.46099325 
##        277        279        280        281        282        283        284 
## 0.94562188 0.86235165 0.88099448 0.97603860 0.94192067 0.86128746 0.98660863 
##        286        287        288        290        291        294        297 
## 0.16273577 0.38052024 0.68328398 0.48271889 0.97257311 0.93761212 0.96026572 
##        298        300        302        304        306        308        310 
## 0.96651258 0.96547321 0.35982447 0.70063228 0.93177913 0.48335368 0.31190666 
##        311        313        314        315        316        317        318 
## 0.86850693 0.74068964 0.49825737 0.98602978 0.12807656 0.88153545 0.84923432 
##        319        320        321        322        323        324        325 
## 0.89492362 0.49716200 0.32732381 0.20375737 0.88161581 0.84348474 0.81817381 
##        326        328        329        331        332        334        338 
## 0.95529442 0.84766112 0.75700509 0.92321552 0.81812022 0.68103809 0.46563408 
##        339        341        343        347        349        350        351 
## 0.37650887 0.33046478 0.80064371 0.93039746 0.97854242 0.82856761 0.81106483 
##        352        354        358        359        361        362        363 
## 0.94006883 0.21756736 0.81028011 0.88571450 0.78247128 0.98098723 0.70555665 
##        364        366        367        368        369        371        372 
## 0.87895617 0.95672989 0.98102594 0.43533586 0.44487211 0.84792417 0.94304877 
##        375        376        377        378        379        380        382 
## 0.09291021 0.20160964 0.87832247 0.97871669 0.20795926 0.92631805 0.39434636 
##        383        384        385        387        388        390        391 
## 0.62572033 0.74566189 0.85926741 0.85385342 0.44958271 0.84807150 0.79548508 
##        392        394        397        399        400        406        407 
## 0.84903633 0.92016707 0.50135188 0.50052190 0.97657529 0.69773028 0.98990856 
##        408        409        410        411        412        415        416 
## 0.78025405 0.84216773 0.89780987 0.43188111 0.96911710 0.40753187 0.94760045 
##        417        418        419        421        422        423        424 
## 0.37030238 0.40485679 0.86739645 0.87516196 0.94833818 0.88798919 0.93491570 
##        425        426        427        428        430        432        433 
## 0.77289600 0.83538876 0.92136839 0.99117198 0.82553850 0.61869609 0.66390981 
##        434        435        437        441        442        443        444 
## 0.77469403 0.73906933 0.92397607 0.83812794 0.53935071 0.84507998 0.62407893 
##        446        453        457        458        459        460        461 
## 0.81246360 0.82994731 0.66834631 0.80444850 0.41317085 0.90192378 0.82076868 
##        462        463        464        465        467        468        469 
## 0.63381333 0.54736303 0.94961402 0.81877255 0.31117073 0.77097503 0.78857056 
##        472        473        475        476        477        478        479 
## 0.16440797 0.48547341 0.83949642 0.24952196 0.93088847 0.71235438 0.87464793 
##        480        482        484        485        486        487        492 
## 0.78480218 0.34005065 0.94617429 0.96808608 0.54615228 0.97169471 0.25737091 
##        493        494        495        496        497        498        499 
## 0.97790449 0.89480182 0.82732966 0.76179214 0.19902227 0.92596531 0.83003623 
##        500        502        503        506        507        508        509 
## 0.84205100 0.68202573 0.92992678 0.94940548 0.99482851 0.32029871 0.76445422 
##        511        512        513        514        515        516        518 
## 0.53009776 0.94070080 0.81775407 0.47592078 0.89261965 0.76453606 0.59074771 
##        519        520        521        522        524        525        526 
## 0.69676770 0.98285122 0.89431148 0.62166108 0.92283994 0.71537382 0.75939568 
##        528        530        531        534        535        536        537 
## 0.98752234 0.57465164 0.37773903 0.84030294 0.91662107 0.64823120 0.66827368 
##        539        540        541        542        544        545        546 
## 0.07900311 0.78782778 0.76664840 0.78596402 0.91193669 0.94010786 0.39411095 
##        548        549        550        551        552        553        554 
## 0.87098340 0.15783893 0.97918517 0.94938022 0.85643042 0.76705349 0.79821292 
##        556        558        559        560        561        564        565 
## 0.76767447 0.60840016 0.24192396 0.89110435 0.77838523 0.59976244 0.63021663 
##        566        567        569        570        571        573        575 
## 0.50821954 0.43778667 0.73695832 0.27672907 0.17517846 0.87318297 0.69126793 
##        576        577        578        579        583        584        585 
## 0.88372185 0.78925999 0.91735823 0.19126138 0.74780235 0.20119736 0.86891472 
##        586        587        588        589        591        594        595 
## 0.38563003 0.70633445 0.75219215 0.50128887 0.79657042 0.14016731 0.72477278 
##        596        597        598        599        600        602        603 
## 0.31562494 0.12454123 0.61564514 0.86532861 0.92784827 0.53473965 0.04952951 
##        604        605        607        608        610        611        613 
## 0.67980607 0.63906810 0.98292420 0.51015476 0.95643866 0.23219891 0.55407255 
##        614        615        616        617        618        619        620 
## 0.76912021 0.81935353 0.49418156 0.66994115 0.73418572 0.46223651 0.68519594 
##        622        623        624        627        628        629        630 
## 0.81702124 0.55960966 0.29348451 0.96333896 0.46546598 0.92926984 0.97960362 
##        632        635        638        639        640        642        643 
## 0.31183900 0.29767799 0.52518083 0.78656769 0.25167095 0.60578872 0.83890473 
##        644        645        646        647        648        651        654 
## 0.97387569 0.71914262 0.69706459 0.43868372 0.84366363 0.19511318 0.47126307 
##        656        659        661        663        665        666        667 
## 0.40545779 0.35201351 0.80490708 0.89607732 0.89196243 0.74611556 0.49782930 
##        668        669        670        673        675        676        677 
## 0.43733106 0.46428488 0.90477905 0.44407490 0.84447333 0.80223151 0.91347219 
##        679        680        681        682        683        684        686 
## 0.49515501 0.84923433 0.82958478 0.95880366 0.78856994 0.88207633 0.50890760 
##        687        688        689        690        691        692        694 
## 0.97613260 0.39081144 0.95410598 0.82057805 0.58667493 0.61915916 0.92166799 
##        696        697        698        699        700        703        707 
## 0.98561848 0.96593756 0.96980871 0.94371749 0.65405593 0.83273741 0.28804317 
##        710        711        712        713        715        717        718 
## 0.87372424 0.94189919 0.09631206 0.98478493 0.05033778 0.97902590 0.89667909 
##        719        721        722        724        725        726        727 
## 0.97989870 0.82980082 0.10335929 0.89125380 0.47303441 0.97166232 0.98115841 
##        728        729        730        731        732        734        735 
## 0.17602662 0.04427021 0.96221315 0.82870956 0.53951191 0.96866080 0.95561262 
##        736        738        739        742        744        746        748 
## 0.27280382 0.49693933 0.93261667 0.36733954 0.28869202 0.58053174 0.40521607 
##        751        752        753        755        756        757        758 
## 0.63518224 0.45242644 0.67839292 0.95625543 0.22506124 0.99851051 0.98358470 
##        759        760        762        763        764        765        766 
## 0.95714237 0.58244836 0.43915395 0.52777947 0.83273547 0.79241244 0.81992959 
##        767        771        772        773        774        775        777 
## 0.22099847 0.80479292 0.12138484 0.99251261 0.92325825 0.97824531 0.82474115 
##        779        780        781        782        783        787        788 
## 0.98656283 0.67643063 0.93724353 0.98146880 0.93738783 0.95651779 0.97188657 
##        791        792        793        794        795        796        799 
## 0.52438609 0.97480344 0.98791063 0.65847486 0.74858560 0.78246258 0.89724390 
##        800        801        802        803        804        805        806 
## 0.93427958 0.81040480 0.83843173 0.63916310 0.98452148 0.78355267 0.33458435 
##        808        809        810        813        814        815        816 
## 0.98926744 0.42660803 0.51641813 0.65949955 0.48139335 0.18196006 0.24888121 
##        817        819        820        821        822        823        824 
## 0.96347468 0.19807853 0.39710471 0.78150600 0.86849548 0.40462350 0.73849207 
##        826        827        828        829        830        833        835 
## 0.46148744 0.34740088 0.73751825 0.80979448 0.50955766 0.09808805 0.83622947 
##        838        840        841        842        843        844        845 
## 0.85710880 0.91646248 0.33616838 0.91896694 0.64003106 0.66686110 0.61714953 
##        848        849        850        851        852        853        855 
## 0.62027279 0.89780530 0.70638848 0.66098374 0.99008619 0.94305070 0.64984560 
##        857        858        859        860        861        862        863 
## 0.98250019 0.92884640 0.24715094 0.98684260 0.98691119 0.82512093 0.40278737 
##        865        866        867        869        871        874        875 
## 0.85510834 0.87261964 0.30394556 0.83756422 0.86233151 0.85897062 0.59085203 
##        877        878        879        880        881        882        883 
## 0.26542044 0.80090392 0.49970245 0.97544062 0.94976689 0.92764136 0.80718668 
##        884        886        887        888        889        891        892 
## 0.97834109 0.25148088 0.80155295 0.27141375 0.73804684 0.46372696 0.96386194 
##        893        894        895        896        897        898        899 
## 0.67200540 0.87339443 0.97312848 0.94050166 0.21132033 0.99398055 0.92517698 
##        900        902        904        905        906        907        908 
## 0.76074879 0.91929429 0.94106804 0.92045627 0.72321408 0.89780792 0.60149900 
##        909        910        911        913        914        915        916 
## 0.94711214 0.84485290 0.66194849 0.78746382 0.96298269 0.13931765 0.32542120 
##        917        918        919        920        921        922        923 
## 0.98718776 0.36482078 0.44739573 0.39467439 0.77737840 0.82009373 0.31884228 
##        925        926        928        929        930        931        932 
## 0.10105060 0.17652282 0.16560287 0.96873445 0.48810331 0.62737551 0.56001765 
##        933        935        936        937        939        940        941 
## 0.94455295 0.37638520 0.40825226 0.87394904 0.12294907 0.98907595 0.93586336 
##        943        945        946        947        948        949        951 
## 0.93544027 0.56424324 0.20960083 0.24840481 0.79002941 0.71072155 0.84520790 
##        953        954        960        961        962        964        965 
## 0.67868981 0.29515088 0.60225364 0.95670983 0.43389802 0.91061129 0.53823572 
##        966        967        968        969        970        971        972 
## 0.72467216 0.79926950 0.73619573 0.80660823 0.82779187 0.88198871 0.75736442 
##        973        974        976        978        979        980        981 
## 0.12994303 0.05442441 0.93701089 0.80241461 0.70385991 0.25184260 0.84646060 
##        982        983        986        987        988        989        990 
## 0.43352988 0.78633499 0.38915474 0.22289042 0.95698463 0.49222532 0.81069345 
##        991        992        993        995        998        999       1000 
## 0.94444539 0.66486296 0.74360496 0.94370166 0.92512863 0.34030652 0.83830995

summary(train_pred_prob)

##    Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
## 0.04427 0.50110 0.78645 0.70000 0.92189 0.99851

2. Using the probability cut-off of 0.5, generate confusion matrix and obtain MR (misclassification rate) for the the training set. (3 pts)

# your code here:
cut <- 0.5 
train_pred_class <- as.numeric(train_pred_prob >= cut)
confusion_matrix <- table(Predicted = train_pred_class, Actual  =train_data$Class)
confusion_matrix

##          Actual
## Predicted FALSE TRUE
##         0   122   52
##         1    88  438

#MR
MR <- (82+53)/(128+53+82+437)
MR

## [1] 0.1928571

Your comment: With this matrix, we compare predicted class labels with the actual class values in the data set. The MR is 0.19 which means that 19% of the training observations were not classified correctly.

3. Find the optimal probability cut-off point using the MR. Please draw a plot of MR vs. cut-off probability, and comment on optimal cut-off probability. (3 pts)

# your code here:
pcut_seq <- seq(from=0, to=1,by=0.01)
MR_seq <- rep(0,length(pcut_seq))
for(i in 1: length(pcut_seq)){
  pcut <- pcut_seq[i]
  pred_value <- train_pred_prob > pcut
  FP <- sum(train_data$Class == FALSE & pred_value == TRUE)
  FN <- sum(train_data$Class == TRUE  & pred_value == FALSE)
  MR <- (FP + FN) / length(pred_value)
  MR_seq[i] <- MR
}

plot(MR_seq ~ pcut_seq, type = "l",
     xlab = "Probability Cutoff",
     ylab = "MR",
     main = "MR vs Cutoff ")

best_cut <- pcut_seq[which.min(MR_seq)]
best_cut

## [1] 0.49

Your comment: We were able to find the optimal cutoff of 0.56 which we can use to improve the models prediction. For the graph, we can see that MR starts small and increases

4. Please generate the ROC curve and calculate the AUC for the training set. Please comment on this AUC. (2 pts)

# your code here:
library(ROCR)
library(pROC)

## Type 'citation("pROC")' for a citation.

## 
## Attaching package: 'pROC'

## The following objects are masked from 'package:stats':
## 
##     cov, smooth, var

train_data$Class <- as.numeric(train_data$Class)
pred_obj <- prediction(train_pred_prob, train_data$Class)
perf_obj <- performance(pred_obj, "tpr", "fpr")
plot(perf_obj, col = "blue", lwd = 2, main = "ROC Curve")

roc_obj <- roc(train_data$Class, train_pred_prob)

## Setting levels: control = 0, case = 1

## Setting direction: controls < cases

auc(roc_obj)

## Area under the curve: 0.8346

Your comment: The AUC for the training set is 0.83 which means that the model is good at distingusing between good and bad credit applicants.

Task 4: Model Evaluation (Part II)

1. Use the testing set to obtain predicted probabilities. (2 pts)

# your code here:
test_pred_prob <- predict(logic_model, newdata = test_data, type = "response")
summary(test_pred_prob)

##    Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
## 0.06043 0.52355 0.80110 0.70132 0.91777 0.99664

2. Using the probability cut-off of 0.5, generate confusion matrix and obtain MR (misclassification rate) for the the training set. (2 pts)

# your code here:
cut <- 0.5
test_pred_class <- as.numeric(test_pred_prob >= cut)
confusion_matrix_test <- table(Predicted = test_pred_class, Actual = test_data$Class)
confusion_matrix_test

##          Actual
## Predicted FALSE TRUE
##         0    40   29
##         1    50  181

FP <- sum(test_data$Class == 0 & test_pred_class == 1)
FN <- sum(test_data$Class == 1 & test_pred_class == 0)
MR_test <- (FP + FN) / length(test_pred_class)
MR_test

## [1] 0.2633333

Your comment: Using a cutoff of 0.5, the confusion matrix compares the models prediction against the test data. The MR, which is 0.263, tells us that 26.3% of the test data were incorrect.

2. Please generate the ROC curve and calculate the AUC for the test set. Please comment on this AUC. (2 pts)

# your code here:
pred_obj_test <- prediction(test_pred_prob, test_data$Class)
perf_obj_test <- performance(pred_obj_test, "tpr", "fpr")
plot(perf_obj_test, col = "blue", lwd = 2, main = "ROC Curve")

roc_obj_test <- roc(test_data$Class, test_pred_prob)

## Setting levels: control = FALSE, case = TRUE

## Setting direction: controls < cases

auc_test <- auc(roc_obj_test)
auc_test

## Area under the curve: 0.7964