Sameer Mathur
library(ISLR)
library(data.table)
# reading inbuilt data as data table
default.dt <- data.table(Default)
# dimension of the data table
dim(default.dt)
[1] 10000 4
# fit simple linear logistic model
Model1 <- glm(default ~ balance, data = default.dt, family = binomial())
# summary of the model
summary(Model1)
Call:
glm(formula = default ~ balance, family = binomial(), data = default.dt)
Deviance Residuals:
Min 1Q Median 3Q Max
-2.2697 -0.1465 -0.0589 -0.0221 3.7589
Coefficients:
Estimate Std. Error z value Pr(>|z|)
(Intercept) -1.065e+01 3.612e-01 -29.49 <2e-16 ***
balance 5.499e-03 2.204e-04 24.95 <2e-16 ***
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
(Dispersion parameter for binomial family taken to be 1)
Null deviance: 2920.6 on 9999 degrees of freedom
Residual deviance: 1596.5 on 9998 degrees of freedom
AIC: 1600.5
Number of Fisher Scoring iterations: 8
Q1- Using Model 1 what is the probability of Default when balance = 1500?
Coefficients given by Model 1
coefficients
(Intercept) -10.651330614
balance 0.005498917
\[ \hat{p}(X) = \frac{e^{\hat{\beta_0}+ \hat{\beta_1}*X}}{1 + e^{\hat{\beta_0}+ \hat{\beta_1}*X}} \]
\[ = \frac{e^{-10.651330614 + 0.005498917*1500}}{1 + e^{-10.651330614 + 0.005498917*1500}} \]
\[ = \frac{e^{-2.402955}}{1 + e^{-2.402955 }} \]
\[ = 0.08294763 \]
Q1- Using Model 1 what is the probability of Default when balance = 1500?
# creating single value data frame
newdata <- data.frame(balance = 1500)
# predicting probability
PredProb <- predict(Model1, newdata , type = "response")
PredProb
1
0.08294762
# fit multiple linear logistic model with two variables
Model2 <- glm(default ~ student,
data = default.dt, family = binomial())
# summary of the model
summary(Model2)
Call:
glm(formula = default ~ student, family = binomial(), data = default.dt)
Deviance Residuals:
Min 1Q Median 3Q Max
-0.2970 -0.2970 -0.2434 -0.2434 2.6585
Coefficients:
Estimate Std. Error z value Pr(>|z|)
(Intercept) -3.50413 0.07071 -49.55 < 2e-16 ***
studentYes 0.40489 0.11502 3.52 0.000431 ***
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
(Dispersion parameter for binomial family taken to be 1)
Null deviance: 2920.6 on 9999 degrees of freedom
Residual deviance: 2908.7 on 9998 degrees of freedom
AIC: 2912.7
Number of Fisher Scoring iterations: 6
Q2- Using Model 2 what is the probability of Default by non-students ?
Coefficients given by Model 2
coefficients
(Intercept) -3.5041278
studentYes 0.4048871
\[ \hat{p}(X) = \frac{e^{\hat{\beta_0}+ \hat{\beta_1}*X}}{1 + e^{\hat{\beta_0}+ \hat{\beta_1}*X}} \]
As we have to calculate for non-student so for this taking X = 0 i.e. for student {“No” = 0,“Yes” = 1}
\[ = \frac{e^{-3.5041278 + 0.4048871*0}}{1 + e^{-3.5041278 + 0.4048871*0}} \]
\[ = 0.02919501 \]
Q2- Using Model 2 what is the probability of Default by non-students ?
# create a single value dataframe
newdata2 <- data.frame(student = "No")
newdata2
student
1 No
# prediction of glm
predict(Model2, newdata2, type = "response")
1
0.02919501
# fit multiple linear logistic model with all variables
Model3 <- glm(default ~ .,
data = default.dt, family = binomial())
# summary of the model
summary(Model3)
Call:
glm(formula = default ~ ., family = binomial(), data = default.dt)
Deviance Residuals:
Min 1Q Median 3Q Max
-2.4691 -0.1418 -0.0557 -0.0203 3.7383
Coefficients:
Estimate Std. Error z value Pr(>|z|)
(Intercept) -1.087e+01 4.923e-01 -22.080 < 2e-16 ***
studentYes -6.468e-01 2.363e-01 -2.738 0.00619 **
balance 5.737e-03 2.319e-04 24.738 < 2e-16 ***
income 3.033e-06 8.203e-06 0.370 0.71152
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
(Dispersion parameter for binomial family taken to be 1)
Null deviance: 2920.6 on 9999 degrees of freedom
Residual deviance: 1571.5 on 9996 degrees of freedom
AIC: 1579.5
Number of Fisher Scoring iterations: 8
Q3- Using Model 3, what is the probability of Default by students having average Balance and average Income?
Coefficients given by Model 3
coefficients
(Intercept) -1.086905e+01
studentYes -6.467758e-01
balance 5.736505e-03
income 3.033450e-06
\[ \hat{p}(X) = \frac{e^{\hat{\beta_0}+ \hat{\beta_1}*X_1 + \hat{\beta_2}*X_2 + \hat{\beta_3}*X_3}}{1 + e^{\hat{\beta_0}+ \hat{\beta_1}*X_1 + \hat{\beta_2}*X_2 + \hat{\beta_3}*X_3}} \]
\[ = \frac{e^{-10.86905 -0.6467758*1 + 0.005736505*835.3749 +0.00000303345*33516.98 }}{1 + e^{-10.86905 -0.6467758*1 + 0.005736505*835.3749 +0.00000303345*33516.98 }} \]
\[ = \frac{e^{-6.622021}}{1 + e^{-6.622021}} \]
\[ = 0.00132897 \]
Q3- Using Model 3, what is the probability of Default by students having average Balance and average Income.
# create a single value dataframe
newdata3 <- data.frame(balance = mean(default.dt$balance),income = mean(default.dt$income),student = "Yes")
newdata3
balance income student
1 835.3749 33516.98 Yes
# prediction of glm
predict(Model3, newdata3, type = "response")
1
0.001328976
Q4- Using Model 3, what is the probability of Default by non-students having average Balance and average Income.
Coefficients given by Model 3
coefficients
(Intercept) -1.086905e+01
studentYes -6.467758e-01
balance 5.736505e-03
income 3.033450e-06
\[ \hat{p}(X) = \frac{e^{\hat{\beta_0}+ \hat{\beta_1}*X_1 + \hat{\beta_2}*X_2 + \hat{\beta_3}*X_3}}{1 + e^{\hat{\beta_0}+ \hat{\beta_1}*X_1 + \hat{\beta_2}*X_2 + \hat{\beta_3}*X_3}} \]
\[ = \frac{e^{-10.86905 - 0.6467758*0 + 0.005736505*835.3749 +0.00000303345*33516.98 }}{1 + e^{-10.86905 - 0.6467758*0 + 0.005736505*835.3749 +0.00000303345*33516.98 }} \]
\[ = \frac{e^{-5.975246}}{1 + e^{-5.975246}} \]
\[ = 0.002534437 \]
Q4- Using Model 3, what is the probability of Default by non-students having average Balance and average Income.
# create a single value dataframe
newdata3 <- data.frame(balance = mean(default.dt$balance),income = mean(default.dt$income),student = "No")
newdata3
balance income student
1 835.3749 33516.98 No
# prediction of glm
predict(Model3, newdata3, type = "response")
1
0.002534451