Question 5
a
library(ISLR2)
glm.fit <- glm(default ~ income + balance, data = Default, family = binomial)
summary(glm.fit)
Call:
glm(formula = default ~ income + balance, family = binomial,
data = Default)
Coefficients:
Estimate Std. Error z value Pr(>|z|)
(Intercept) -1.154e+01 4.348e-01 -26.545 < 2e-16 ***
income 2.081e-05 4.985e-06 4.174 2.99e-05 ***
balance 5.647e-03 2.274e-04 24.836 < 2e-16 ***
---
Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
(Dispersion parameter for binomial family taken to be 1)
Null deviance: 2920.6 on 9999 degrees of freedom
Residual deviance: 1579.0 on 9997 degrees of freedom
AIC: 1585
Number of Fisher Scoring iterations: 8
b
We now estimate the test error of this model using the validation set
approach. We will split the dataset 50/50, fit the model on the training
set, predict on the validation set using a threshold of 0.5, and
calculate the validation error rate.
set.seed(1)
train <- sample(nrow(Default), nrow(Default) / 2)
glm.train_fit <- glm(default ~ income + balance, data = Default, family = binomial, subset = train)
glm.probs <- predict(glm.train_fit, Default[-train, ], type = "response")
glm.pred <- ifelse(glm.probs > 0.5, "Yes", "No")
val_error <- mean(glm.pred != Default[-train, ]$default)
cat("Validation Set Error Rate:", val_error, "\n")
Validation Set Error Rate: 0.0254
c
for (s in 2:4) {
set.seed(s)
train_split <- sample(nrow(Default), nrow(Default) / 2)
fit <- glm(default ~ income + balance, data = Default, family = binomial, subset = train_split)
probs <- predict(fit, Default[-train_split, ], type = "response")
preds <- ifelse(probs > 0.5, "Yes", "No")
err <- mean(preds != Default[-train_split, ]$default)
cat("Seed:", s, "-> Validation Error Rate:", round(err * 100, 2), "%\n")
}
Seed: 2 -> Validation Error Rate: 2.38 %
Seed: 3 -> Validation Error Rate: 2.64 %
Seed: 4 -> Validation Error Rate: 2.56 %
The validation error rates fluctuate between approximately 2.44% and
2.74%. This variability shows that the validation set approach has some
variance, as the estimated test error rate depends on which observations
happen to end up in the training and validation sets.
d
set.seed(1)
train_d <- sample(nrow(Default), nrow(Default) / 2)
glm.student_fit <- glm(default ~ income + balance + student, data = Default, family = binomial, subset = train_d)
student.probs <- predict(glm.student_fit, Default[-train_d, ], type = "response")
student.pred <- ifelse(student.probs > 0.5, "Yes", "No")
val_error_student <- mean(student.pred != Default[-train_d, ]$default)
cat("Validation Set Error Rate (with Student):", val_error_student, "\n")
Validation Set Error Rate (with Student): 0.026
Including the dummy variable does not lead to a reduction in the test
error rate.
Question 6
a
glm.full <- glm(default ~ income + balance, data = Default, family = binomial)
summary(glm.full)$coefficients[, c("Estimate", "Std. Error")]
Estimate Std. Error
(Intercept) -1.154047e+01 4.347564e-01
income 2.080898e-05 4.985167e-06
balance 5.647103e-03 2.273731e-04
b
boot.fn <- function(data, index) {
fit <- glm(default ~ income + balance, data = data, family = binomial, subset = index)
return(coef(fit))
}
c
library(boot)
set.seed(1)
boot_results <- boot(Default, boot.fn, R = 1000)
boot_results
ORDINARY NONPARAMETRIC BOOTSTRAP
Call:
boot(data = Default, statistic = boot.fn, R = 1000)
Bootstrap Statistics :
original bias std. error
t1* -1.154047e+01 -3.945460e-02 4.344722e-01
t2* 2.080898e-05 1.680317e-07 4.866284e-06
t3* 5.647103e-03 1.855765e-05 2.298949e-04
d
The bootstrap standard errors are similar to those obtained using
standard logistic regression formulas.
Question 9
a
mu.hat <- mean(Boston$medv)
mu.hat
[1] 22.53281
b
se.hat <- sd(Boston$medv) / sqrt(nrow(Boston))
se.hat
[1] 0.4088611
c
boot.fn.mean <- function(data, index) {
return(mean(data$medv[index]))
}
set.seed(1)
boot.mean_results <- boot(Boston, boot.fn.mean, R = 1000)
boot.mean_results
ORDINARY NONPARAMETRIC BOOTSTRAP
Call:
boot(data = Boston, statistic = boot.fn.mean, R = 1000)
Bootstrap Statistics :
original bias std. error
t1* 22.53281 0.007650791 0.4106622
The bootstrap standard error estimate is very close to the
theoretical standard error.
d
boot_se <- sd(boot.mean_results$t)
ci.boot <- c(mu.hat - 2 * boot_se, mu.hat + 2 * boot_se)
cat("Bootstrap 95% CI:", ci.boot, "\n")
Bootstrap 95% CI: 21.71148 23.35413
t.test(Boston$medv)
One Sample t-test
data: Boston$medv
t = 55.111, df = 505, p-value < 2.2e-16
alternative hypothesis: true mean is not equal to 0
95 percent confidence interval:
21.72953 23.33608
sample estimates:
mean of x
22.53281
The bootstrap 95% confidence interval is almost identical to the
interval obtained using the standard formula.
e
median.hat <- median(Boston$medv)
median.hat
[1] 21.2
f
boot.fn.median <- function(data, index) {
return(median(data$medv[index]))
}
set.seed(1)
boot.median_results <- boot(Boston, boot.fn.median, R = 1000)
boot.median_results
ORDINARY NONPARAMETRIC BOOTSTRAP
Call:
boot(data = Boston, statistic = boot.fn.median, R = 1000)
Bootstrap Statistics :
original bias std. error
t1* 21.2 0.02295 0.3778075
Because there is no straightforward theoretical formula for the
standard error of a median, the boot strap is an incredibly useful tool
here. The standard error is small, indicating that the median estimate
is relatively stable.
g
q10.hat <- quantile(Boston$medv, 0.1)
q10.hat
10%
12.75
h
boot.fn.q10 <- function(data, index) {
return(quantile(data$medv[index], 0.1))
}
set.seed(1)
boot.q10_results <- boot(Boston, boot.fn.q10, R = 1000)
boot.q10_results
ORDINARY NONPARAMETRIC BOOTSTRAP
Call:
boot(data = Boston, statistic = boot.fn.q10, R = 1000)
Bootstrap Statistics :
original bias std. error
t1* 12.75 0.0339 0.4767526
Similar to the median, there is no simple analytical formula for the
standard error of percentiles. The standard error xshows that the 10th
percentile is estimated with a moderate degree of precision.
LS0tCnRpdGxlOiAiQ2hyaXMgU2VycmFubyAtIEFzc2lnbm1lbnQgNCIKb3V0cHV0OgogIGh0bWxfbm90ZWJvb2s6CiAgICB0b2M6IHRydWUKICAgIHRvY19mbG9hdDogdHJ1ZQogICAgZWNobzogdHJ1ZQotLS0KCiMjIFF1ZXN0aW9uIDMKCiMjIyBhCgprLWZvbGQgY3Jvc3MtdmFsaWRhdGlvbiBpcyBpbXBsZW1lbnRlZCBieSBmb2xsb3dpbmcgdGhlc2Ugc3RlcHM6VGhlIG9yaWdpbmFsIGRhdGFzZXQgaXMgcmFuZG9tbHkgZGl2aWRlZCBpbnRvIGsgcm91Z2hseSBlcXVhbC1zaXplZCwgbm9uLW92ZXJsYXBwaW5nIHN1YnNldHMuCgoxLiAgRm9yIGVhY2ggZm9sZAogICAgLSAgIFVzZSBpIGFzIHRoZSB2YWxpZGF0aW9uIChvciB0ZXN0aW5nKSBzZXQuCiAgICAtICAgVXNlIHRoZSByZW1haW5pbmcgZm9sZHMgY29tYmluZWQgYXMgdGhlIHRyYWluaW5nIHNldC4KICAgIC0gICBGaXQgdGhlIHN0YXRpc3RpY2FsIGxlYXJuaW5nIG1ldGhvZCkgb24gdGhlIHRyYWluaW5nIHNldC4KICAgIC0gICBDb21wdXRlIHRoZSB0ZXN0IGVycm9yCjIuICBUaGUgb3ZlcmFsbCBrIGZvbGQgY3Jvc3MtdmFsaWRhdGlvbiBlc3RpbWF0ZSBvZiB0aGUgdGVzdCBlcnJvciBpcyBjYWxjdWxhdGVkIGJ5IGF2ZXJhZ2luZyB0aGUgaW5kaXZpZHVhbCB0ZXN0IGVycm9ycyBjb21wdXRlZCBvdmVyIGFsbCBrIGl0ZXJhdGlvbnM6CgojIyMgYgoKIyMjIyBpLgoKLSAgICoqQWR2YW50YWdlcyoqOgogICAgMS4gIExvd2VyIFZhcmlhbmNlCiAgICAyLiAgUmVkdWNlZCBQZXNzaW1pc3RpYyBCaWFzCi0gICAqKkRpc2FkdmFudGFnZXMqKjoKICAgIDEuICBDb21wdXRhdGlvbmFsIENvc3QKCiMjIyMgaWkuCgotICAgKipBZHZhbnRhZ2VzKio6CiAgICAxLiAgQ29tcHV0YXRpb25hbCBFZmZpY2llbmN5CiAgICAyLiAgQmV0dGVyIEJpYXMtVmFyaWFuY2UgVHJhZGUtb2ZmCi0gICAqKkRpc2FkdmFudGFnZXMqKjoKICAgIDEuICBIaWdoZXIgQmlhcwoKLS0tLS0tLS0tLS0tLS0tLS0tLS0tLS0tLS0tLS0tLS0tLS0tLS0tLS0tLS0tLS0tLS0tLS0tLS0tLS0tLS0tLS0tLS0tLS0tCgojIyBRdWVzdGlvbiA1CgojIyMgYQoKYGBge3J9CmxpYnJhcnkoSVNMUjIpCgpnbG0uZml0IDwtIGdsbShkZWZhdWx0IH4gaW5jb21lICsgYmFsYW5jZSwgZGF0YSA9IERlZmF1bHQsIGZhbWlseSA9IGJpbm9taWFsKQpzdW1tYXJ5KGdsbS5maXQpCmBgYAoKIyMjIGIKCldlIG5vdyBlc3RpbWF0ZSB0aGUgdGVzdCBlcnJvciBvZiB0aGlzIG1vZGVsIHVzaW5nIHRoZSB2YWxpZGF0aW9uIHNldCBhcHByb2FjaC4gV2Ugd2lsbCBzcGxpdCB0aGUgZGF0YXNldCA1MC81MCwgZml0IHRoZSBtb2RlbCBvbiB0aGUgdHJhaW5pbmcgc2V0LCBwcmVkaWN0IG9uIHRoZSB2YWxpZGF0aW9uIHNldCB1c2luZyBhIHRocmVzaG9sZCBvZiAwLjUsIGFuZCBjYWxjdWxhdGUgdGhlIHZhbGlkYXRpb24gZXJyb3IgcmF0ZS4KCmBgYHtyfQpzZXQuc2VlZCgxKQoKdHJhaW4gPC0gc2FtcGxlKG5yb3coRGVmYXVsdCksIG5yb3coRGVmYXVsdCkgLyAyKQoKZ2xtLnRyYWluX2ZpdCA8LSBnbG0oZGVmYXVsdCB+IGluY29tZSArIGJhbGFuY2UsIGRhdGEgPSBEZWZhdWx0LCBmYW1pbHkgPSBiaW5vbWlhbCwgc3Vic2V0ID0gdHJhaW4pCgpnbG0ucHJvYnMgPC0gcHJlZGljdChnbG0udHJhaW5fZml0LCBEZWZhdWx0Wy10cmFpbiwgXSwgdHlwZSA9ICJyZXNwb25zZSIpCmdsbS5wcmVkIDwtIGlmZWxzZShnbG0ucHJvYnMgPiAwLjUsICJZZXMiLCAiTm8iKQoKdmFsX2Vycm9yIDwtIG1lYW4oZ2xtLnByZWQgIT0gRGVmYXVsdFstdHJhaW4sIF0kZGVmYXVsdCkKY2F0KCJWYWxpZGF0aW9uIFNldCBFcnJvciBSYXRlOiIsIHZhbF9lcnJvciwgIlxuIikKYGBgCgojIyMgYwoKYGBge3J9CmZvciAocyBpbiAyOjQpIHsKICBzZXQuc2VlZChzKQogIHRyYWluX3NwbGl0IDwtIHNhbXBsZShucm93KERlZmF1bHQpLCBucm93KERlZmF1bHQpIC8gMikKICAKICBmaXQgPC0gZ2xtKGRlZmF1bHQgfiBpbmNvbWUgKyBiYWxhbmNlLCBkYXRhID0gRGVmYXVsdCwgZmFtaWx5ID0gYmlub21pYWwsIHN1YnNldCA9IHRyYWluX3NwbGl0KQogIHByb2JzIDwtIHByZWRpY3QoZml0LCBEZWZhdWx0Wy10cmFpbl9zcGxpdCwgXSwgdHlwZSA9ICJyZXNwb25zZSIpCiAgcHJlZHMgPC0gaWZlbHNlKHByb2JzID4gMC41LCAiWWVzIiwgIk5vIikKICAKICBlcnIgPC0gbWVhbihwcmVkcyAhPSBEZWZhdWx0Wy10cmFpbl9zcGxpdCwgXSRkZWZhdWx0KQogIGNhdCgiU2VlZDoiLCBzLCAiLT4gVmFsaWRhdGlvbiBFcnJvciBSYXRlOiIsIHJvdW5kKGVyciAqIDEwMCwgMiksICIlXG4iKQp9CmBgYAoKVGhlIHZhbGlkYXRpb24gZXJyb3IgcmF0ZXMgZmx1Y3R1YXRlIGJldHdlZW4gYXBwcm94aW1hdGVseSAyLjQ0JSBhbmQgMi43NCUuIFRoaXMgdmFyaWFiaWxpdHkgc2hvd3MgdGhhdCB0aGUgdmFsaWRhdGlvbiBzZXQgYXBwcm9hY2ggaGFzIHNvbWUgdmFyaWFuY2UsIGFzIHRoZSBlc3RpbWF0ZWQgdGVzdCBlcnJvciByYXRlIGRlcGVuZHMgb24gd2hpY2ggb2JzZXJ2YXRpb25zIGhhcHBlbiB0byBlbmQgdXAgaW4gdGhlIHRyYWluaW5nIGFuZCB2YWxpZGF0aW9uIHNldHMuCgojIyMgZAoKYGBge3J9CnNldC5zZWVkKDEpCnRyYWluX2QgPC0gc2FtcGxlKG5yb3coRGVmYXVsdCksIG5yb3coRGVmYXVsdCkgLyAyKQoKZ2xtLnN0dWRlbnRfZml0IDwtIGdsbShkZWZhdWx0IH4gaW5jb21lICsgYmFsYW5jZSArIHN0dWRlbnQsIGRhdGEgPSBEZWZhdWx0LCBmYW1pbHkgPSBiaW5vbWlhbCwgc3Vic2V0ID0gdHJhaW5fZCkKCnN0dWRlbnQucHJvYnMgPC0gcHJlZGljdChnbG0uc3R1ZGVudF9maXQsIERlZmF1bHRbLXRyYWluX2QsIF0sIHR5cGUgPSAicmVzcG9uc2UiKQpzdHVkZW50LnByZWQgPC0gaWZlbHNlKHN0dWRlbnQucHJvYnMgPiAwLjUsICJZZXMiLCAiTm8iKQoKdmFsX2Vycm9yX3N0dWRlbnQgPC0gbWVhbihzdHVkZW50LnByZWQgIT0gRGVmYXVsdFstdHJhaW5fZCwgXSRkZWZhdWx0KQpjYXQoIlZhbGlkYXRpb24gU2V0IEVycm9yIFJhdGUgKHdpdGggU3R1ZGVudCk6IiwgdmFsX2Vycm9yX3N0dWRlbnQsICJcbiIpCmBgYAoKSW5jbHVkaW5nIHRoZSBkdW1teSB2YXJpYWJsZSBkb2VzIG5vdCBsZWFkIHRvIGEgcmVkdWN0aW9uIGluIHRoZSB0ZXN0IGVycm9yIHJhdGUuCgotLS0tLS0tLS0tLS0tLS0tLS0tLS0tLS0tLS0tLS0tLS0tLS0tLS0tLS0tLS0tLS0tLS0tLS0tLS0tLS0tLS0tLS0tLS0tLS0KCiMjIFF1ZXN0aW9uIDYKCiMjIyBhCgpgYGB7cn0KZ2xtLmZ1bGwgPC0gZ2xtKGRlZmF1bHQgfiBpbmNvbWUgKyBiYWxhbmNlLCBkYXRhID0gRGVmYXVsdCwgZmFtaWx5ID0gYmlub21pYWwpCnN1bW1hcnkoZ2xtLmZ1bGwpJGNvZWZmaWNpZW50c1ssIGMoIkVzdGltYXRlIiwgIlN0ZC4gRXJyb3IiKV0KYGBgCgojIyMgYgoKYGBge3J9CmJvb3QuZm4gPC0gZnVuY3Rpb24oZGF0YSwgaW5kZXgpIHsKICBmaXQgPC0gZ2xtKGRlZmF1bHQgfiBpbmNvbWUgKyBiYWxhbmNlLCBkYXRhID0gZGF0YSwgZmFtaWx5ID0gYmlub21pYWwsIHN1YnNldCA9IGluZGV4KQogIHJldHVybihjb2VmKGZpdCkpCn0KCmBgYAoKIyMjIGMKCmBgYHtyfQpsaWJyYXJ5KGJvb3QpCnNldC5zZWVkKDEpCgpib290X3Jlc3VsdHMgPC0gYm9vdChEZWZhdWx0LCBib290LmZuLCBSID0gMTAwMCkKYm9vdF9yZXN1bHRzCmBgYAoKIyMjIGQKClRoZSBib290c3RyYXAgc3RhbmRhcmQgZXJyb3JzIGFyZSBzaW1pbGFyIHRvIHRob3NlIG9idGFpbmVkIHVzaW5nIHN0YW5kYXJkIGxvZ2lzdGljIHJlZ3Jlc3Npb24gZm9ybXVsYXMuCgotLS0tLS0tLS0tLS0tLS0tLS0tLS0tLS0tLS0tLS0tLS0tLS0tLS0tLS0tLS0tLS0tLS0tLS0tLS0tLS0tLS0tLS0tLS0tLS0KCiMjIFF1ZXN0aW9uIDkKCiMjIyBhCgpgYGB7cn0KbXUuaGF0IDwtIG1lYW4oQm9zdG9uJG1lZHYpCm11LmhhdApgYGAKCiMjIyBiCgpgYGB7cn0Kc2UuaGF0IDwtIHNkKEJvc3RvbiRtZWR2KSAvIHNxcnQobnJvdyhCb3N0b24pKQpzZS5oYXQKYGBgCgojIyMgYwoKYGBge3J9CmJvb3QuZm4ubWVhbiA8LSBmdW5jdGlvbihkYXRhLCBpbmRleCkgewogIHJldHVybihtZWFuKGRhdGEkbWVkdltpbmRleF0pKQp9CgpzZXQuc2VlZCgxKQpib290Lm1lYW5fcmVzdWx0cyA8LSBib290KEJvc3RvbiwgYm9vdC5mbi5tZWFuLCBSID0gMTAwMCkKYm9vdC5tZWFuX3Jlc3VsdHMKYGBgCgpUaGUgYm9vdHN0cmFwIHN0YW5kYXJkIGVycm9yIGVzdGltYXRlIGlzIHZlcnkgY2xvc2UgdG8gdGhlIHRoZW9yZXRpY2FsIHN0YW5kYXJkIGVycm9yLgoKIyMjIGQKCmBgYHtyfQpib290X3NlIDwtIHNkKGJvb3QubWVhbl9yZXN1bHRzJHQpCgpjaS5ib290IDwtIGMobXUuaGF0IC0gMiAqIGJvb3Rfc2UsIG11LmhhdCArIDIgKiBib290X3NlKQpjYXQoIkJvb3RzdHJhcCA5NSUgQ0k6IiwgY2kuYm9vdCwgIlxuIikKCnQudGVzdChCb3N0b24kbWVkdikKYGBgCgpUaGUgYm9vdHN0cmFwIDk1JSBjb25maWRlbmNlIGludGVydmFsIGlzIGFsbW9zdCBpZGVudGljYWwgdG8gdGhlIGludGVydmFsIG9idGFpbmVkIHVzaW5nIHRoZSBzdGFuZGFyZCBmb3JtdWxhLgoKIyMjIGUKCmBgYHtyfQptZWRpYW4uaGF0IDwtIG1lZGlhbihCb3N0b24kbWVkdikKbWVkaWFuLmhhdApgYGAKCiMjIyBmCgpgYGB7cn0KYm9vdC5mbi5tZWRpYW4gPC0gZnVuY3Rpb24oZGF0YSwgaW5kZXgpIHsKICByZXR1cm4obWVkaWFuKGRhdGEkbWVkdltpbmRleF0pKQp9CgpzZXQuc2VlZCgxKQpib290Lm1lZGlhbl9yZXN1bHRzIDwtIGJvb3QoQm9zdG9uLCBib290LmZuLm1lZGlhbiwgUiA9IDEwMDApCmJvb3QubWVkaWFuX3Jlc3VsdHMKYGBgCgpCZWNhdXNlIHRoZXJlIGlzIG5vIHN0cmFpZ2h0Zm9yd2FyZCB0aGVvcmV0aWNhbCBmb3JtdWxhIGZvciB0aGUgc3RhbmRhcmQgZXJyb3Igb2YgYSBtZWRpYW4sIHRoZSBib290IHN0cmFwIGlzIGFuIGluY3JlZGlibHkgdXNlZnVsIHRvb2wgaGVyZS4gVGhlIHN0YW5kYXJkIGVycm9yIGlzIHNtYWxsLCBpbmRpY2F0aW5nIHRoYXQgdGhlIG1lZGlhbiBlc3RpbWF0ZSBpcyByZWxhdGl2ZWx5IHN0YWJsZS4KCiMjIyBnCgpgYGB7cn0KcTEwLmhhdCA8LSBxdWFudGlsZShCb3N0b24kbWVkdiwgMC4xKQpxMTAuaGF0CmBgYAoKIyMjIGgKCmBgYHtyfQpib290LmZuLnExMCA8LSBmdW5jdGlvbihkYXRhLCBpbmRleCkgewogIHJldHVybihxdWFudGlsZShkYXRhJG1lZHZbaW5kZXhdLCAwLjEpKQp9CgpzZXQuc2VlZCgxKQpib290LnExMF9yZXN1bHRzIDwtIGJvb3QoQm9zdG9uLCBib290LmZuLnExMCwgUiA9IDEwMDApCmJvb3QucTEwX3Jlc3VsdHMKYGBgCgpTaW1pbGFyIHRvIHRoZSBtZWRpYW4sIHRoZXJlIGlzIG5vIHNpbXBsZSBhbmFseXRpY2FsIGZvcm11bGEgZm9yIHRoZSBzdGFuZGFyZCBlcnJvciBvZiBwZXJjZW50aWxlcy4gVGhlIHN0YW5kYXJkIGVycm9yIHhzaG93cyB0aGF0IHRoZSAxMHRoIHBlcmNlbnRpbGUgaXMgZXN0aW1hdGVkIHdpdGggYSBtb2RlcmF0ZSBkZWdyZWUgb2YgcHJlY2lzaW9uLgo=