Solution For Assignment on Default Dataset

Sameer Mathur

Importing Data

library(ISLR)
library(data.table)
# reading inbuilt data as data table
default.dt <- data.table(Default)
# dimension of the data table
dim(default.dt)
[1] 10000     4

LOGISTIC REGRESSION

Model 1

# fit simple linear logistic model
Model1 <- glm(default ~ balance, data = default.dt, family = binomial())
# summary of the model
summary(Model1)

Call:
glm(formula = default ~ balance, family = binomial(), data = default.dt)

Deviance Residuals: 
    Min       1Q   Median       3Q      Max  
-2.2697  -0.1465  -0.0589  -0.0221   3.7589  

Coefficients:
              Estimate Std. Error z value Pr(>|z|)    
(Intercept) -1.065e+01  3.612e-01  -29.49   <2e-16 ***
balance      5.499e-03  2.204e-04   24.95   <2e-16 ***
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

(Dispersion parameter for binomial family taken to be 1)

    Null deviance: 2920.6  on 9999  degrees of freedom
Residual deviance: 1596.5  on 9998  degrees of freedom
AIC: 1600.5

Number of Fisher Scoring iterations: 8

Calculation

Q1- Using Model 1 what is the probability of Default when balance = 1500?

Coefficients given by Model 1

             coefficients
(Intercept) -10.651330614
balance       0.005498917

fitting the equation using these coefficients

\[ \hat{p}(X) = \frac{e^{\hat{\beta_0}+ \hat{\beta_1}*X}}{1 + e^{\hat{\beta_0}+ \hat{\beta_1}*X}} \]

\[ = \frac{e^{-10.651330614 + 0.005498917*1500}}{1 + e^{-10.651330614 + 0.005498917*1500}} \]

\[ = \frac{e^{-2.402955}}{1 + e^{-2.402955 }} \]

\[ = 0.08294763 \]

Prediction by R

Q1- Using Model 1 what is the probability of Default when balance = 1500?

# creating single value data frame
newdata <- data.frame(balance = 1500)
# predicting probability
PredProb <- predict(Model1, newdata , type = "response")
PredProb
         1 
0.08294762 

Model 2

# fit multiple linear logistic model with two variables
Model2 <- glm(default ~ student,
                      data = default.dt, family = binomial())
# summary of the model
summary(Model2)

Call:
glm(formula = default ~ student, family = binomial(), data = default.dt)

Deviance Residuals: 
    Min       1Q   Median       3Q      Max  
-0.2970  -0.2970  -0.2434  -0.2434   2.6585  

Coefficients:
            Estimate Std. Error z value Pr(>|z|)    
(Intercept) -3.50413    0.07071  -49.55  < 2e-16 ***
studentYes   0.40489    0.11502    3.52 0.000431 ***
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

(Dispersion parameter for binomial family taken to be 1)

    Null deviance: 2920.6  on 9999  degrees of freedom
Residual deviance: 2908.7  on 9998  degrees of freedom
AIC: 2912.7

Number of Fisher Scoring iterations: 6

Q2- Using Model 2 what is the probability of Default by non-students ?

Coefficients given by Model 2

            coefficients
(Intercept)   -3.5041278
studentYes     0.4048871

fitting the equation using these coefficients

\[ \hat{p}(X) = \frac{e^{\hat{\beta_0}+ \hat{\beta_1}*X}}{1 + e^{\hat{\beta_0}+ \hat{\beta_1}*X}} \]

As we have to calculate for non-student so for this taking X = 0 i.e. for student {“No” = 0,“Yes” = 1}

\[ = \frac{e^{-3.5041278 + 0.4048871*0}}{1 + e^{-3.5041278 + 0.4048871*0}} \]

\[ = 0.02919501 \]

Prediction by R

Q2- Using Model 2 what is the probability of Default by non-students ?

# create a single value dataframe
newdata2 <- data.frame(student = "No")
newdata2
  student
1      No
# prediction of glm
predict(Model2, newdata2, type = "response")
         1 
0.02919501 

Model 3

# fit multiple linear logistic model with all variables
Model3 <- glm(default ~ .,
                      data = default.dt, family = binomial())
# summary of the model
summary(Model3)

Call:
glm(formula = default ~ ., family = binomial(), data = default.dt)

Deviance Residuals: 
    Min       1Q   Median       3Q      Max  
-2.4691  -0.1418  -0.0557  -0.0203   3.7383  

Coefficients:
              Estimate Std. Error z value Pr(>|z|)    
(Intercept) -1.087e+01  4.923e-01 -22.080  < 2e-16 ***
studentYes  -6.468e-01  2.363e-01  -2.738  0.00619 ** 
balance      5.737e-03  2.319e-04  24.738  < 2e-16 ***
income       3.033e-06  8.203e-06   0.370  0.71152    
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

(Dispersion parameter for binomial family taken to be 1)

    Null deviance: 2920.6  on 9999  degrees of freedom
Residual deviance: 1571.5  on 9996  degrees of freedom
AIC: 1579.5

Number of Fisher Scoring iterations: 8

Q3- Using Model 3, what is the probability of Default by students having average Balance and average Income?

Coefficients given by Model 3

             coefficients
(Intercept) -1.086905e+01
studentYes  -6.467758e-01
balance      5.736505e-03
income       3.033450e-06

fitting the equation using these coefficients

\[ \hat{p}(X) = \frac{e^{\hat{\beta_0}+ \hat{\beta_1}*X_1 + \hat{\beta_2}*X_2 + \hat{\beta_3}*X_3}}{1 + e^{\hat{\beta_0}+ \hat{\beta_1}*X_1 + \hat{\beta_2}*X_2 + \hat{\beta_3}*X_3}} \]

\[ = \frac{e^{-10.86905 -0.6467758*1 + 0.005736505*835.3749 +0.00000303345*33516.98 }}{1 + e^{-10.86905 -0.6467758*1 + 0.005736505*835.3749 +0.00000303345*33516.98 }} \]

\[ = \frac{e^{-6.622021}}{1 + e^{-6.622021}} \]

\[ = 0.00132897 \]

Prediction by R

Q3- Using Model 3, what is the probability of Default by students having average Balance and average Income.

# create a single value dataframe
newdata3 <- data.frame(balance = mean(default.dt$balance),income = mean(default.dt$income),student = "Yes")
newdata3
   balance   income student
1 835.3749 33516.98     Yes
# prediction of glm
predict(Model3, newdata3, type = "response")
          1 
0.001328976 

Q4- Using Model 3, what is the probability of Default by non-students having average Balance and average Income.

Coefficients given by Model 3

             coefficients
(Intercept) -1.086905e+01
studentYes  -6.467758e-01
balance      5.736505e-03
income       3.033450e-06

fitting the equation using these coefficients

\[ \hat{p}(X) = \frac{e^{\hat{\beta_0}+ \hat{\beta_1}*X_1 + \hat{\beta_2}*X_2 + \hat{\beta_3}*X_3}}{1 + e^{\hat{\beta_0}+ \hat{\beta_1}*X_1 + \hat{\beta_2}*X_2 + \hat{\beta_3}*X_3}} \]

\[ = \frac{e^{-10.86905 - 0.6467758*0 + 0.005736505*835.3749 +0.00000303345*33516.98 }}{1 + e^{-10.86905 - 0.6467758*0 + 0.005736505*835.3749 +0.00000303345*33516.98 }} \]

\[ = \frac{e^{-5.975246}}{1 + e^{-5.975246}} \]

\[ = 0.002534437 \]

Prediction by R

Q4- Using Model 3, what is the probability of Default by non-students having average Balance and average Income.

# create a single value dataframe
newdata3 <- data.frame(balance = mean(default.dt$balance),income = mean(default.dt$income),student = "No")
newdata3
   balance   income student
1 835.3749 33516.98      No
# prediction of glm
predict(Model3, newdata3, type = "response")
          1 
0.002534451