library(ISLR2)
library(boot)
library(splines)
library(gam)
library(leaps)
library(MASS)
library(ggplot2)
set.seed(1)
Wage
Goal. Fit a step function for
wage ~ age, choose the number of cuts by 10-fold
cross-validation, and plot the resulting fit.
attach(Wage)
max_cuts <- 10
cv_err <- rep(NA, max_cuts)
for (k in 2:max_cuts) {
  # Precompute the cut factor on the full data so every CV fold shares the
  # same break points and factor levels.
  Wage$age.cut <- cut(Wage$age, k)
  fit <- glm(wage ~ age.cut, data = Wage)
  cv_err[k] <- cv.glm(Wage, fit, K = 10)$delta[1]
}
cv_err
## [1] NA 1734.489 1684.271 1635.552 1632.080 1623.415 1614.996 1601.318
## [9] 1613.954 1606.331
best_k <- which.min(cv_err)
best_k
## [1] 8
The CV curve is minimized at best_k cuts. Now refit with
the chosen number of cuts and plot.
plot(2:max_cuts, cv_err[-1], type = "b", pch = 19,
xlab = "Number of cuts", ylab = "10-fold CV MSE",
main = "CV error vs. number of cuts")
abline(v = best_k, lty = 2, col = "red")
# cut() is re-evaluated on newdata at predict time; this works here only
# because age_grid spans exactly the same range as Wage$age.
fit_best <- glm(wage ~ cut(age, best_k), data = Wage)
age_grid <- seq(min(Wage$age), max(Wage$age), length.out = 200)
preds <- predict(fit_best, newdata = list(age = age_grid))
plot(Wage$age, Wage$wage, col = "darkgrey", cex = 0.5,
xlab = "Age", ylab = "Wage",
main = paste("Step function fit with", best_k, "cuts"))
lines(age_grid, preds, lwd = 2, col = "blue")
detach(Wage)
Comment. The CV error drops sharply from 2 to ~4
cuts, then plateaus. After roughly 4–8 cuts the additional flexibility
yields diminishing returns; the chosen best_k gives a
reasonable bias–variance trade-off.
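cv.glm() does not report fold-level standard errors, so the usual one-standard-error rule is not directly available. A rough stand-in, sketched below, is to take the smallest number of cuts whose CV error is within a small tolerance of the minimum; the 1% tolerance is my own choice, not part of the exercise.
tol <- 1.01 * min(cv_err, na.rm = TRUE)        # within 1% of the best CV error
parsimonious_k <- min(which(cv_err <= tol))    # smallest such number of cuts
parsimonious_k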
Wage predictors with non-linear methods
The Wage data contains several factors not used in the
chapter examples: maritl, jobclass,
race, education, health,
health_ins, plus the continuous year and
age. Let's explore them.
par(mfrow = c(2, 3))
plot(Wage$maritl, Wage$wage, main = "Marital status", col = "lightblue")
plot(Wage$jobclass, Wage$wage, main = "Job class", col = "lightblue")
plot(Wage$race, Wage$wage, main = "Race", col = "lightblue")
plot(Wage$education, Wage$wage, main = "Education", col = "lightblue")
plot(Wage$health, Wage$wage, main = "Health", col = "lightblue")
plot(Wage$health_ins, Wage$wage, main = "Health insurance", col = "lightblue")
par(mfrow = c(1, 1))
Observations. Wage is noticeably higher for workers with more education and
for those with health insurance; married workers and Information-sector jobs
also show higher median wages, while differences across race are smaller.
gam_base <- gam(wage ~ s(age, 5) + s(year, 4), data = Wage)
gam_m1 <- gam(wage ~ s(age, 5) + s(year, 4) + education, data = Wage)
gam_m2 <- gam(wage ~ s(age, 5) + s(year, 4) + education + jobclass, data = Wage)
gam_m3 <- gam(wage ~ s(age, 5) + s(year, 4) + education + jobclass + maritl, data = Wage)
gam_m4 <- gam(wage ~ s(age, 5) + s(year, 4) + education + jobclass + maritl + health + health_ins,
data = Wage)
anova(gam_base, gam_m1, gam_m2, gam_m3, gam_m4, test = "F")
## Analysis of Deviance Table
##
## Model 1: wage ~ s(age, 5) + s(year, 4)
## Model 2: wage ~ s(age, 5) + s(year, 4) + education
## Model 3: wage ~ s(age, 5) + s(year, 4) + education + jobclass
## Model 4: wage ~ s(age, 5) + s(year, 4) + education + jobclass + maritl
## Model 5: wage ~ s(age, 5) + s(year, 4) + education + jobclass + maritl +
## health + health_ins
## Resid. Df Resid. Dev Df Deviance F Pr(>F)
## 1 2990 4738489
## 2 2986 3689770 4 1048718 230.217 < 2.2e-16 ***
## 3 2985 3677553 1 12218 10.728 0.001067 **
## 4 2981 3581781 4 95772 21.024 < 2.2e-16 ***
## 5 2979 3392602 2 189179 83.058 < 2.2e-16 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
Each successive predictor is highly significant — every term improves
the fit beyond what a smooth in age and year
already explains.
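As a side check (not part of the model sequence above), we can ask whether year needs a smooth at all by comparing a linear year term against s(year, 4), keeping the other gam_m4 terms fixed:
gam_year_lin <- gam(wage ~ s(age, 5) + year + education + jobclass + maritl +
                      health + health_ins, data = Wage)
anova(gam_year_lin, gam_m4, test = "F")  # smooth vs. linear year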
par(mfrow = c(2, 4))  # gam_m4 has seven terms, so a 2 x 3 layout would overflow
plot(gam_m4, se = TRUE, col = "blue")
par(mfrow = c(1, 1))
Findings. The partial plots show the familiar hump-shaped effect of age, a
mild increase over year, and clear upward shifts for more education,
Information-sector jobs, married workers, very good health, and having
health insurance.
Auto
Question. Is there evidence for non-linear
relationships in the Auto data?
auto <- na.omit(Auto)
pairs(auto[, c("mpg", "horsepower", "weight", "displacement", "acceleration")],
pch = ".", main = "Auto: pairwise relationships")
mpg vs. horsepower, weight,
and displacement all show clear curvature. Focus on
mpg ~ horsepower.
fit1 <- lm(mpg ~ poly(horsepower, 1), data = auto)
fit2 <- lm(mpg ~ poly(horsepower, 2), data = auto)
fit3 <- lm(mpg ~ poly(horsepower, 3), data = auto)
fit4 <- lm(mpg ~ poly(horsepower, 4), data = auto)
fit5 <- lm(mpg ~ poly(horsepower, 5), data = auto)
anova(fit1, fit2, fit3, fit4, fit5)
## Analysis of Variance Table
##
## Model 1: mpg ~ poly(horsepower, 1)
## Model 2: mpg ~ poly(horsepower, 2)
## Model 3: mpg ~ poly(horsepower, 3)
## Model 4: mpg ~ poly(horsepower, 4)
## Model 5: mpg ~ poly(horsepower, 5)
## Res.Df RSS Df Sum of Sq F Pr(>F)
## 1 390 9385.9
## 2 389 7442.0 1 1943.89 103.8767 < 2.2e-16 ***
## 3 388 7426.4 1 15.59 0.8333 0.361897
## 4 387 7399.5 1 26.91 1.4382 0.231169
## 5 386 7223.4 1 176.15 9.4131 0.002306 **
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
The quadratic term is overwhelmingly significant (p < 2e-16), while the cubic
and quartic terms are not (p = 0.36 and p = 0.23). The degree-5 term is
nominally significant (p = 0.002), but the cross-validation below shows it
buys almost nothing in prediction error; a quadratic captures the bulk of the
non-linearity.
cv_poly <- rep(NA, 10)
for (d in 1:10) {
fit <- glm(mpg ~ poly(horsepower, d), data = auto)
cv_poly[d] <- cv.glm(auto, fit, K = 10)$delta[1]
}
plot(1:10, cv_poly, type = "b", pch = 19,
xlab = "Polynomial degree", ylab = "10-fold CV MSE",
main = "mpg ~ poly(horsepower, d)")
which.min(cv_poly)
## [1] 7
Although which.min() lands on degree 7 for this particular fold split, the CV
curve drops sharply from degree 1 to 2 and is essentially flat afterwards; the
gain beyond the quadratic is negligible, so degree 2 remains the sensible
choice.
hp_grid <- seq(min(auto$horsepower), max(auto$horsepower), length.out = 200)
fit_bs <- lm(mpg ~ bs(horsepower, df = 4), data = auto)
fit_ns <- lm(mpg ~ ns(horsepower, df = 4), data = auto)
fit_q <- lm(mpg ~ poly(horsepower, 2), data = auto)
plot(auto$horsepower, auto$mpg, col = "grey", pch = 19, cex = 0.6,
xlab = "Horsepower", ylab = "MPG",
main = "Auto: non-linear fits of mpg on horsepower")
lines(hp_grid, predict(fit_q, newdata = list(horsepower = hp_grid)), col = "blue", lwd = 2)
lines(hp_grid, predict(fit_bs, newdata = list(horsepower = hp_grid)), col = "red", lwd = 2, lty = 2)
lines(hp_grid, predict(fit_ns, newdata = list(horsepower = hp_grid)), col = "darkgreen", lwd = 2, lty = 3)
legend("topright", c("Quadratic", "B-spline (df=4)", "Natural spline (df=4)"),
col = c("blue", "red", "darkgreen"), lty = c(1, 2, 3), lwd = 2, bty = "n")
gam_auto <- gam(mpg ~ s(horsepower, 4) + s(weight, 4) + s(displacement, 4)
+ s(acceleration, 4) + factor(origin),
data = auto)
par(mfrow = c(2, 3))  # gam_auto has five terms to plot, so 1 x 4 would overflow
plot(gam_auto, se = TRUE, col = "blue")
par(mfrow = c(1, 1))
summary(gam_auto)
##
## Call: gam(formula = mpg ~ s(horsepower, 4) + s(weight, 4) + s(displacement,
## 4) + s(acceleration, 4) + factor(origin), data = auto)
## Deviance Residuals:
## Min 1Q Median 3Q Max
## -11.5252 -2.2879 -0.2182 1.8221 16.4252
##
## (Dispersion Parameter for gaussian family taken to be 13.9528)
##
## Null Deviance: 23818.99 on 391 degrees of freedom
## Residual Deviance: 5204.398 on 373.0002 degrees of freedom
## AIC: 2166.159
##
## Number of Local Scoring Iterations: NA
##
## Anova for Parametric Effects
## Df Sum Sq Mean Sq F value Pr(>F)
## s(horsepower, 4) 1 15675.1 15675.1 1123.4355 < 2.2e-16 ***
## s(weight, 4) 1 1088.1 1088.1 77.9821 < 2.2e-16 ***
## s(displacement, 4) 1 53.7 53.7 3.8495 0.0505048 .
## s(acceleration, 4) 1 155.9 155.9 11.1737 0.0009136 ***
## factor(origin) 2 213.3 106.6 7.6436 0.0005580 ***
## Residuals 373 5204.4 14.0
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Anova for Nonparametric Effects
## Npar Df Npar F Pr(F)
## (Intercept)
## s(horsepower, 4) 3 15.3602 1.92e-09 ***
## s(weight, 4) 3 0.7805 0.505404
## s(displacement, 4) 3 5.6932 0.000806 ***
## s(acceleration, 4) 3 2.5391 0.056280 .
## factor(origin)
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
Conclusion. Yes, there is strong evidence for
non-linearity, most visibly in horsepower,
weight, and displacement in the pairwise plots.
The non-parametric ANOVA in summary(gam_auto) rejects
linearity for horsepower and displacement (and
marginally for acceleration); with the other smooths in the model,
weight looks adequately linear. A quadratic in
horsepower already captures most of the curvature; splines and GAMs offer
modest additional flexibility.
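To put a single number on "strong evidence for non-linearity", a supplementary F test (my own addition, not required by the exercise) compares an all-linear fit against the smooth GAM:
gam_lin <- gam(mpg ~ horsepower + weight + displacement + acceleration +
                 factor(origin), data = auto)   # same predictors, linear terms
anova(gam_lin, gam_auto, test = "F")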
Boston data: nox ~ dis
data(Boston, package = "MASS")
boston <- Boston
fit_poly3 <- lm(nox ~ poly(dis, 3), data = boston)
summary(fit_poly3)
##
## Call:
## lm(formula = nox ~ poly(dis, 3), data = boston)
##
## Residuals:
## Min 1Q Median 3Q Max
## -0.121130 -0.040619 -0.009738 0.023385 0.194904
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 0.554695 0.002759 201.021 < 2e-16 ***
## poly(dis, 3)1 -2.003096 0.062071 -32.271 < 2e-16 ***
## poly(dis, 3)2 0.856330 0.062071 13.796 < 2e-16 ***
## poly(dis, 3)3 -0.318049 0.062071 -5.124 4.27e-07 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 0.06207 on 502 degrees of freedom
## Multiple R-squared: 0.7148, Adjusted R-squared: 0.7131
## F-statistic: 419.3 on 3 and 502 DF, p-value: < 2.2e-16
dis_grid <- seq(min(boston$dis), max(boston$dis), length.out = 200)
preds <- predict(fit_poly3, newdata = list(dis = dis_grid), se = TRUE)
plot(boston$dis, boston$nox, col = "darkgrey", pch = 19, cex = 0.5,
xlab = "dis", ylab = "nox",
main = "Cubic polynomial fit: nox ~ poly(dis, 3)")
lines(dis_grid, preds$fit, col = "blue", lwd = 2)
lines(dis_grid, preds$fit + 2 * preds$se, col = "blue", lwd = 1, lty = 2)
lines(dis_grid, preds$fit - 2 * preds$se, col = "blue", lwd = 1, lty = 2)
All three polynomial terms are highly significant. The fit captures a
clear decreasing trend with curvature: nox falls sharply as
dis increases from low values, then levels off.
rss_poly <- rep(NA, 10)
plot(boston$dis, boston$nox, col = "darkgrey", pch = 19, cex = 0.5,
xlab = "dis", ylab = "nox",
main = "Polynomial fits, degrees 1 to 10")
cols <- rainbow(10)
for (d in 1:10) {
fit <- lm(nox ~ poly(dis, d), data = boston)
rss_poly[d] <- sum(resid(fit)^2)
lines(dis_grid, predict(fit, newdata = list(dis = dis_grid)),
col = cols[d], lwd = 1.5)
}
legend("topright", legend = paste("d =", 1:10),
col = cols, lwd = 1.5, cex = 0.7, bty = "n", ncol = 2)
data.frame(degree = 1:10, RSS = round(rss_poly, 4))
## degree RSS
## 1 1 2.7686
## 2 2 2.0353
## 3 3 1.9341
## 4 4 1.9330
## 5 5 1.9153
## 6 6 1.8783
## 7 7 1.8495
## 8 8 1.8356
## 9 9 1.8333
## 10 10 1.8322
RSS strictly decreases with degree (training-set guarantee), but high-degree fits start to wiggle near the data boundaries — a sign of overfitting.
cv_poly_b <- rep(NA, 10)
for (d in 1:10) {
fit <- glm(nox ~ poly(dis, d), data = boston)
cv_poly_b[d] <- cv.glm(boston, fit, K = 10)$delta[1]
}
plot(1:10, cv_poly_b, type = "b", pch = 19,
xlab = "Polynomial degree", ylab = "10-fold CV MSE",
main = "CV error: nox ~ poly(dis, d)")
which.min(cv_poly_b)
## [1] 3
CV bottoms out around degree 3 or 4 and rises again for higher degrees — overfitting. A cubic is a good, parsimonious choice.
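As a cross-check outside the polynomial family (again my own addition), a smoothing spline with its effective degrees of freedom chosen by generalized cross-validation gives another data-driven estimate of how much flexibility the relationship supports:
ss <- smooth.spline(boston$dis, boston$nox)  # df chosen by GCV, the default
ss$df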
Regression spline with bs()
fit_bs4 <- lm(nox ~ bs(dis, df = 4), data = boston)
summary(fit_bs4)
##
## Call:
## lm(formula = nox ~ bs(dis, df = 4), data = boston)
##
## Residuals:
## Min 1Q Median 3Q Max
## -0.124622 -0.039259 -0.008514 0.020850 0.193891
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 0.73447 0.01460 50.306 < 2e-16 ***
## bs(dis, df = 4)1 -0.05810 0.02186 -2.658 0.00812 **
## bs(dis, df = 4)2 -0.46356 0.02366 -19.596 < 2e-16 ***
## bs(dis, df = 4)3 -0.19979 0.04311 -4.634 4.58e-06 ***
## bs(dis, df = 4)4 -0.38881 0.04551 -8.544 < 2e-16 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 0.06195 on 501 degrees of freedom
## Multiple R-squared: 0.7164, Adjusted R-squared: 0.7142
## F-statistic: 316.5 on 4 and 501 DF, p-value: < 2.2e-16
attr(bs(boston$dis, df = 4), "knots") # how R chose the interior knots
## [1] 3.20745
When you specify df = 4 for a cubic B-spline (the
default degree = 3), R places one interior knot at
the median of dis: with the default intercept = FALSE,
df = degree + number of interior knots, so df = 4 means a single interior
knot, placed at the 50th percentile. I let R choose the knots by quantiles of
the data; this is the standard approach when there is no domain reason to
prefer specific cut points.
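Equivalently, the knot can be passed explicitly. A quick sketch (purely illustrative): bs(dis, df = 4) and bs(dis, knots = median(dis)) should describe the same basis here, since both place the single interior knot at the median.
fit_bs4_knots <- lm(nox ~ bs(dis, knots = median(dis)), data = boston)
all.equal(fitted(fit_bs4), fitted(fit_bs4_knots))  # expected to be TRUE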
pred_bs <- predict(fit_bs4, newdata = list(dis = dis_grid), se = TRUE)
plot(boston$dis, boston$nox, col = "darkgrey", pch = 19, cex = 0.5,
xlab = "dis", ylab = "nox",
main = "Regression spline (bs, df = 4)")
lines(dis_grid, pred_bs$fit, col = "red", lwd = 2)
lines(dis_grid, pred_bs$fit + 2 * pred_bs$se, col = "red", lwd = 1, lty = 2)
lines(dis_grid, pred_bs$fit - 2 * pred_bs$se, col = "red", lwd = 1, lty = 2)
rss_bs <- rep(NA, 13)
plot(boston$dis, boston$nox, col = "darkgrey", pch = 19, cex = 0.5,
xlab = "dis", ylab = "nox",
main = "Regression splines, df = 3 to 15")
cols2 <- rainbow(13)
for (i in seq_along(cols2)) {
df_i <- i + 2 # df = 3..15
fit <- lm(nox ~ bs(dis, df = df_i), data = boston)
rss_bs[i] <- sum(resid(fit)^2)
lines(dis_grid, predict(fit, newdata = list(dis = dis_grid)),
col = cols2[i], lwd = 1.2)
}
legend("topright", legend = paste("df =", 3:15),
col = cols2, lwd = 1.2, cex = 0.7, bty = "n", ncol = 2)
data.frame(df = 3:15, RSS = round(rss_bs, 4))
## df RSS
## 1 3 1.9341
## 2 4 1.9228
## 3 5 1.8402
## 4 6 1.8340
## 5 7 1.8299
## 6 8 1.8170
## 7 9 1.8257
## 8 10 1.7925
## 9 11 1.7970
## 10 12 1.7890
## 11 13 1.7824
## 12 14 1.7818
## 13 15 1.7828
Training RSS decreases monotonically with df, as expected. Visually,
fits beyond df ≈ 8 start to wiggle in regions with few observations
(large dis).
cv_bs <- rep(NA, 13)
for (i in seq_along(cv_bs)) {
df_i <- i + 2
fit <- glm(nox ~ bs(dis, df = df_i), data = boston)
cv_bs[i] <- cv.glm(boston, fit, K = 10)$delta[1]
}
plot(3:15, cv_bs, type = "b", pch = 19,
xlab = "Degrees of freedom", ylab = "10-fold CV MSE",
main = "CV error: bs(dis, df)")
best_df <- which.min(cv_bs) + 2
best_df
## [1] 13
CV selects df = 13 with this seed (best_df above), but the curve is fairly
flat from roughly df = 8 onward and the minimizer moves around with the
random fold split, so the exact choice is not very sensitive; anything in the
8–13 range gives a similar fit.
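Since the minimizer depends on the fold assignment, a quick stability check (seeds chosen arbitrarily, not part of the exercise) is to repeat the 10-fold CV a few times and record the selected df each time:
sapply(1:5, function(s) {
  set.seed(s)
  err <- sapply(3:15, function(df_i) {
    fit <- glm(nox ~ bs(dis, df = df_i), data = boston)
    cv.glm(boston, fit, K = 10)$delta[1]
  })
  (3:15)[which.min(err)]   # df chosen under this seed
})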
College
Goal. Split College into training/test
sets; with Outstate as the response and the others as
predictors, use forward stepwise selection to find a satisfactory subset
model.
set.seed(1)
n <- nrow(College)
train <- sample(seq_len(n), size = floor(0.7 * n))
test <- setdiff(seq_len(n), train)
college_train <- College[train, ]
college_test <- College[test, ]
Forward stepwise selection with regsubsets()
fwd <- regsubsets(Outstate ~ ., data = college_train,
                  nvmax = 17, method = "forward")
fwd_summary <- summary(fwd)
par(mfrow = c(1, 3))
plot(fwd_summary$cp, type = "b", pch = 19,
xlab = "Number of variables", ylab = "Cp")
plot(fwd_summary$bic, type = "b", pch = 19,
xlab = "Number of variables", ylab = "BIC")
plot(fwd_summary$adjr2, type = "b", pch = 19,
xlab = "Number of variables", ylab = "Adj R^2")
par(mfrow = c(1, 1))
c(Cp_best = which.min(fwd_summary$cp),
BIC_best = which.min(fwd_summary$bic),
AdjR2_best = which.max(fwd_summary$adjr2))
## Cp_best BIC_best AdjR2_best
## 13 13 13
With this split, Cp, BIC, and adjusted R² all reach their optimum at 13
variables (BIC, which penalizes model size more aggressively, often picks
fewer on other splits). The criteria improve quickly up to about 6 variables
and then plateau, so a parsimonious 6-variable model is a defensible
compromise; the validation-error sketch that follows makes the same point.
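To see the plateau on held-out data, compute the validation-set MSE for every model size. leaps has no predict() method for regsubsets objects, so the sketch below uses the usual hand-written helper (the helper name is my own):
predict_regsubsets <- function(object, newdata, id) {
  form  <- as.formula(object$call[[2]])   # recover the model formula
  mat   <- model.matrix(form, newdata)
  coefi <- coef(object, id = id)
  mat[, names(coefi)] %*% coefi
}
val_err <- sapply(1:17, function(k)
  mean((college_test$Outstate - predict_regsubsets(fwd, college_test, k))^2))
plot(1:17, val_err, type = "b", pch = 19,
     xlab = "Number of variables", ylab = "Validation MSE")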
chosen_k <- 6
coef(fwd, chosen_k)
## (Intercept) PrivateYes Room.Board PhD perc.alumni
## -3782.6544026 2808.0522815 0.9722009 38.0615210 59.1918276
## Expend Grad.Rate
## 0.2032837 28.6598253
These six predictors (Private,
Room.Board, PhD,
perc.alumni, Expend, Grad.Rate)
tend to be picked up early by the forward search: they cover the major axes
of sector, cost of attendance, faculty quality, alumni engagement,
spending, and outcomes.
test_mat <- model.matrix(Outstate ~ ., data = college_test)
coefs <- coef(fwd, id = chosen_k)
preds <- test_mat[, names(coefs)] %*% coefs
test_mse <- mean((college_test$Outstate - preds)^2)
test_mse
## [1] 3635355
sqrt(test_mse)
## [1] 1906.661
The resulting RMSE gives a tangible sense of out-of-state-tuition
prediction error in dollars; you can compare this against the response’s
standard deviation (sd(College$Outstate) ≈ 4000) to judge
fit quality.
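Another way to put the test MSE on a unitless scale (a small addition of mine) is a test-set R², i.e. the proportional reduction in squared error relative to always predicting the training mean:
test_r2 <- 1 - test_mse /
  mean((college_test$Outstate - mean(college_train$Outstate))^2)
test_r2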