Midterm

Problem 2.

set.seed(1)

library("ISLR2")

glm.fit <- glm(default ~ income + balance, data = Default, family = binomial(link = "logit"))
summary(glm.fit)

## 
## Call:
## glm(formula = default ~ income + balance, family = binomial(link = "logit"), 
##     data = Default)
## 
## Coefficients:
##               Estimate Std. Error z value Pr(>|z|)    
## (Intercept) -1.154e+01  4.348e-01 -26.545  < 2e-16 ***
## income       2.081e-05  4.985e-06   4.174 2.99e-05 ***
## balance      5.647e-03  2.274e-04  24.836  < 2e-16 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## (Dispersion parameter for binomial family taken to be 1)
## 
##     Null deviance: 2920.6  on 9999  degrees of freedom
## Residual deviance: 1579.0  on 9997  degrees of freedom
## AIC: 1585
## 
## Number of Fisher Scoring iterations: 8

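# Statistic for boot(): refit the logistic model on the rows in `index`
# and return the coefficient estimates.
boot.fn <- function(data, index)
  coef(glm(default ~ income + balance, data = data, subset = index, family = binomial(link = "logit")))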

library(boot)

## Warning: package 'boot' was built under R version 4.3.3

boot(Default, boot.fn, 1000)

## 
## ORDINARY NONPARAMETRIC BOOTSTRAP
## 
## 
## Call:
## boot(data = Default, statistic = boot.fn, R = 1000)
## 
## 
## Bootstrap Statistics :
##          original        bias     std. error
## t1* -1.154047e+01 -3.945460e-02 4.344722e-01
## t2*  2.080898e-05  1.680317e-07 4.866284e-06
## t3*  5.647103e-03  1.855765e-05 2.298949e-04
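
For reference, the `bias` and `std. error` columns above are summaries of the R = 1000 replicate estimates. The notation below is introduced here for illustration and is not from the original: $\hat{\theta}$ is a coefficient estimated on the full data and $\hat{\theta}^{*}_{b}$ is the same coefficient estimated on the b-th resample.

```latex
\[
\bar{\theta}^{*} = \frac{1}{R}\sum_{b=1}^{R}\hat{\theta}^{*}_{b},
\qquad
\widehat{\mathrm{bias}} = \bar{\theta}^{*} - \hat{\theta},
\qquad
\widehat{\mathrm{SE}}_{\mathrm{boot}}
  = \sqrt{\frac{1}{R-1}\sum_{b=1}^{R}\bigl(\hat{\theta}^{*}_{b}-\bar{\theta}^{*}\bigr)^{2}}
\]
```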

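Standard errors (intercept, income, balance):
glm() = 0.43, 0.0000050, 0.00023
bootstrap = 0.43, 0.0000049, 0.00023

The glm() summary and the bootstrap give very similar standard errors for the coefficients of the logistic regression model. The differences are small (under 3% of each standard error) and do not run in one direction: the bootstrap standard error is slightly smaller for the intercept (0.4345 vs. 0.4348) and for income (4.87e-06 vs. 4.99e-06), and slightly larger for balance (2.30e-04 vs. 2.27e-04).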
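
The comparison can also be made in code rather than by reading the two printouts. The following is a minimal sketch, assuming the same model and statistic as above. The names `boot.out`, `se.glm`, `se.boot`, and `se.compare` are introduced here for illustration, and R = 200 is an assumption chosen to keep the run short (the analysis above used 1000), so the numbers will not match the output above exactly.

```r
library(ISLR2)  # provides the Default data set
library(boot)

set.seed(1)

# Same model and bootstrap statistic as above.
glm.fit <- glm(default ~ income + balance, data = Default,
               family = binomial(link = "logit"))
boot.fn <- function(data, index)
  coef(glm(default ~ income + balance, data = data, subset = index,
           family = binomial(link = "logit")))

# Keep the boot object so the replicate estimates can be inspected.
# R = 200 is an illustrative choice, not the value used in the analysis.
boot.out <- boot(Default, boot.fn, R = 200)

# Model-based SEs come from the coefficient table of summary();
# bootstrap SEs are the standard deviations of the columns of boot.out$t.
se.glm  <- summary(glm.fit)$coefficients[, "Std. Error"]
se.boot <- apply(boot.out$t, 2, sd)

# A ratio near 1 means the two methods agree for that coefficient.
se.compare <- data.frame(glm = se.glm,
                         bootstrap = se.boot,
                         ratio = se.boot / se.glm)
print(signif(se.compare, 3))
```

A ratio above 1 means the bootstrap standard error is the larger of the two for that coefficient, and a ratio below 1 means it is the smaller.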