Assignment 3

Question 1

We perform best subset, forward stepwise, and backward stepwise selection on a single data set. For each approach, we obtain \(p+1\) models, containing \(0, 1, 2,\dots , p\) predictors. Explain your answers:

Part 1

Which of the three models with k predictors has the smallest training RSS?

Answer 1 Part 1

Best subset selection attains the smallest training RSS for each \(k\). By definition it searches all \(\binom{p}{k}\) models with \(k\) predictors, a search space that contains every model forward or backward stepwise could produce, and returns the one with the smallest RSS; the stepwise methods only match it when their greedy paths happen to pass through that same model.
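A quick empirical check of this claim (a minimal sketch on simulated data; the variables and setup are my own, not part of the assignment):

# Best subset's training RSS at each size k should be <= that of either
# stepwise method, since the exhaustive search covers all subsets.
library(leaps)
set.seed(1)
n <- 100; p <- 8
x <- matrix(rnorm(n * p), n, p)
y <- x[, 1] - 2 * x[, 3] + rnorm(n)
rss_path <- function(method)
  summary(regsubsets(x, y, nvmax = p, method = method))$rss
cbind(best = rss_path("exhaustive"),
      forward = rss_path("forward"),
      backward = rss_path("backward"))
# In every row, the "best" column is the (weakly) smallest.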

Part 2

Which of the three models with k predictors has the smallest test RSS?

Answer 1 Part 2

Without further information, best subset selection is again the most likely to have the smallest test RSS, purely because for each \(k\) it considers all possible models, including those produced by the other two methods. Assuming the final model size is chosen carefully (using \(C_p\), AIC, BIC, adjusted \(R^2\), or cross-validation), it will most likely find the most parsimonious model, which will hopefully be the one that generalizes best to the true response distribution. It is possible, however, that all of our validation criteria point to a model found only by the best subset method, while the model that actually performs best on test data is a different one; test RSS is not guaranteed to favor the training-RSS winner.

Part 3: True or False:

See the answers below for each of the five statements.

Answer 1 Part 3

  • The predictors in the k-variable model identified by forward stepwise are a subset of the predictors in the \((k+1)\)-variable model identified by forward stepwise selection.

  • True. Forward stepwise builds the \((k+1)\)-variable model by adding a single predictor to the \(k\)-variable model, so the smaller model is nested in the larger by construction (see the sketch after this list).

  • The predictors in the k-variable model identified by backward stepwise are a subset of the predictors in the \((k+1)\) variable model identified by backward stepwise selection.

  • True. Backward stepwise obtains the \(k\)-variable model by deleting a single predictor from the \((k+1)\)-variable model, so the nesting again holds by construction.

  • The predictors in the k-variable model identified by backward stepwise are a subset of the predictors in the \((k+1)\) variable model identified by forward stepwise selection.

  • False. The two procedures follow different search paths, so there is no guarantee that the backward \(k\)-variable model is contained in the forward \((k+1)\)-variable model.

  • The predictors in the k-variable model identified by forward stepwise are a subset of the predictors in the \((k+1)\)-variable model identified by backward stepwise selection.

  • False. Likewise, there is no guarantee that the forward \(k\)-variable model is contained in the backward \((k+1)\)-variable model.

  • The predictors in the k-variable model identified by best subset are a subset of the predictors in the \((k+1)\)-variable model identified by best subset selection.

  • False. Best subset selection re-searches all subsets at each size, so the \((k+1)\)-variable model need not contain the \(k\)-variable model.
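The nesting property for forward stepwise can be verified directly (a sketch, reusing the simulated x and y from the check above):

# Each k-variable forward-stepwise model should be a subset of the (k+1)-variable one.
fwd <- summary(regsubsets(x, y, nvmax = p, method = "forward"))$which
for (k in 1:(p - 1)) {
  stopifnot(all(colnames(fwd)[fwd[k, ]] %in% colnames(fwd)[fwd[k + 1, ]]))
}
# No error: every smaller model is nested in the next larger one.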

Question 2

Suppose we are interested in estimating the regression coefficients in linear regression with regularization. Particularly, the goal is to minimize

\[||Y - X \beta||_2^2 + \lambda \sum_{j = 1}^{p} J_{\tau}(|\beta_j|), \]

with respect to \(\beta,\) where \(Y\) is a vector of responses, \(X\) is our usual data matrix, and for some fixed \(\lambda > 0\), \(\tau > 0\), and \(J_{\tau}(|\beta_j|) = \min\{\frac{|\beta_j|}{\tau}, 1\}.\)

Part 1

Intuitively, describe what this regularization \(J_{\tau}(|\beta_j|)\) is doing. Hint: Consider what happens to \(|\beta_j|\) that are smaller than \(\tau\) and greater than \(\tau\).

Answer 2 Part 1

The regularization term \(J_{\tau}(|\beta_j|)\) penalizes each coefficient in proportion to \(|\beta_j|/\tau\), but caps the penalty at 1. In other words, \(\tau\) serves as a threshold: if \(|\beta_j| < \tau\), the penalty grows linearly in \(|\beta_j|\) (like a rescaled lasso penalty); once \(|\beta_j|\) exceeds \(\tau\), the penalty stays fixed at 1 no matter how large the coefficient becomes.

Intuitively, \(J_{\tau}(|\beta_j|)\) caps the total contribution of the regularizer: each coefficient contributes at most 1, so the total penalty is bounded, \(\lambda \sum_{j = 1}^{p} J_{\tau}(|\beta_j|) \le \lambda p\). The second hyperparameter \(\tau\) sets where the penalty saturates: once \(|\beta_j|\) surpasses \(\tau\), the penalty stops increasing, so large coefficients are not shrunk further. This can help with unstandardized data sets or categorical predictors, where some coefficients are legitimately large.

Coefficients with \(|\beta_j| < \tau\) still feel a lasso-like pull toward zero, while coefficients beyond \(\tau\) are unpenalized at the margin.
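A small plot makes the shape of the penalty clear (a sketch; the function name J is my own):

# J_tau(|b|) = min(|b|/tau, 1): linear up to tau, flat at 1 beyond it.
J <- function(b, tau) pmin(abs(b) / tau, 1)
b <- seq(-3, 3, length.out = 400)
plot(b, J(b, tau = 1), type = "l", ylim = c(0, 1.1),
     xlab = expression(beta[j]), ylab = "penalty")
lines(b, J(b, tau = 0.25), lty = 2)
legend("bottomright", legend = c("tau = 1", "tau = 0.25"), lty = 1:2)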

Part 2

Describe what happens when \(\tau \downarrow 0\). That is, what happens when \(\tau \rightarrow 0\) from the right side of \(0\).

Answer 2 Part 2

As \(\tau \downarrow 0\), \(J_{\tau}(|\beta_j|) \rightarrow 1\) for every nonzero \(\beta_j\), while \(J_{\tau}(0) = 0\). The penalty therefore approaches \(\lambda \cdot \#\{j : \beta_j \neq 0\}\), an \(\ell_0\)-style penalty, and the problem behaves like best subset selection: each coefficient is either set exactly to zero or, because the penalty is flat beyond \(\tau\) and applies no further shrinkage, left at its least squares value on the selected support.
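A quick numerical check of the limit (a sketch with made-up coefficients):

# As tau -> 0+, the total penalty approaches the number of nonzero coefficients.
beta <- c(0, 0.01, 0.5, 2)
for (tau in c(1, 0.1, 0.001))
  print(sum(pmin(abs(beta) / tau, 1)))
# 1.51, 2.1, 3: converging to 3, the number of nonzero entries.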

Part 3

Instead, suppose we consider the LASSO, with the goal to minimize

\[||Y - X \beta||_2^2 + \lambda \sum_{j = 1}^{p} |\beta_j|.\]

In this case, explain what the \(\beta\) coefficient solution approaches when \(\lambda \rightarrow 0\) and when \(\lambda \rightarrow \infty\).

Answer 2 Part 3

As \(\lambda \rightarrow 0\), the regularization disappears and we arrive at the least squares estimates. As \(\lambda \rightarrow \infty\), all slope estimates drop to \(0\), and we are left with the intercept-only model (the sample mean \(\bar{y}\)): all bias and essentially no variance.
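This can be seen with glmnet (a sketch; the package and the toy data are assumptions of mine, not part of the assignment):

library(glmnet)
set.seed(1)
x <- matrix(rnorm(100 * 5), 100, 5)
y <- 3 + 2 * x[, 1] + rnorm(100)
coef(glmnet(x, y, alpha = 1, lambda = 1e-8))  # approx. the least squares fit
coef(glmnet(x, y, alpha = 1, lambda = 1e8))   # intercept ~ mean(y); all slopes 0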

Part 4

Explain what is the difference between the LASSO and Ridge Regression.

Answer 2 Part 4

The key difference between the LASSO and ridge regression is the penalty. The LASSO uses an \(\ell_1\) regularization term, which can set estimates exactly to \(0\) and so removes variables from the model; ridge uses a squared \(\ell_2\) term, which shrinks coefficients toward \(0\) but never exactly to \(0\), keeping every predictor in the model.
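The contrast shows up directly in the fitted coefficients (a sketch, reusing the toy x and y from the previous sketch):

cv_lasso <- cv.glmnet(x, y, alpha = 1)          # l1 penalty
cv_ridge <- cv.glmnet(x, y, alpha = 0)          # squared l2 penalty
sum(coef(cv_lasso, s = "lambda.min")[-1] == 0)  # lasso: some slopes exactly 0
sum(coef(cv_ridge, s = "lambda.min")[-1] == 0)  # ridge: typically none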


library(leaps)
library(ISLR2)
library(ramify)
## 
## Attaching package: 'ramify'
## The following object is masked from 'package:graphics':
## 
##     clip

Question 3

In this exercise, we will generate simulated data, and will then use this data to perform best subset selection.

Part 1

Use the rnorm() function to generate a predictor \(X\) of length \(n = 200\), as well as a noise vector \(\varepsilon\) of length \(n = 200\).

Answer 3 Part 1

set.seed(1)
X <- rnorm(200)
eps <- rnorm(200, sd = 1)

Part 2

Generate a response vector \(Y\) of length \(n = 200\) according to the model

\[Y = \beta_0 + \beta_1 X +\beta_2X^2 +\beta_3X^3 +\varepsilon,\]

where \(\beta_0, \beta_1, \beta_2\), and \(\beta_3\) are constants of your choice.

Answer 3 Part 2

beta_0 <- 42
beta_1 <- 10
beta_2 <- 11
beta_3 <- 12
# eps was generated in Part 1 with length 200; regenerating it with n = 100
# would recycle it against X (length 200)
Y <- beta_0 + beta_1*X + beta_2*X^2 + beta_3*X^3 + eps
par(mfrow=c(1,1))
plot(Y)

Part 3

Use the regsubsets() function to perform best subset selection in order to choose the best model containing the predictors \(X,X^2, \dots , X^{9}\). What is the best model obtained according to RSS, \(C_p\), BIC, and adjusted \(R^2\)? Show some plots to provide evidence for your answer, and report the coefficients of the best model obtained. Note you will need to use the data.frame() function to create a single data set containing both \(X\) and \(Y\).

Answer 3 Part 3

# a single data frame with Y and the polynomial terms X, X^2, ..., X^9
data = data.frame(Y = Y, X = X, X2 = X^2, X3 = X^3, X4 = X^4, X5 = X^5,
                  X6 = X^6, X7 = X^7, X8 = X^8, X9 = X^9)
# Passing the model matrix (which already contains an intercept column) triggers
# the "linear dependencies" warning below; regsubsets(Y ~ ., data = data,
# nvmax = 9) would avoid the warning.
m.matrix = model.matrix(Y ~ ., data = data)
regfit.full <- regsubsets(m.matrix, y = Y, data = data)
## Warning in leaps.setup(x, y, wt = weights, nbest = nbest, nvmax = nvmax, : 1
## linear dependencies found
## Reordering variables and trying again:
reg.summary <- summary(regfit.full)
reg.summary
## Subset selection object
## 10 Variables  (and intercept)
##             Forced in Forced out
## X               FALSE      FALSE
## X2              FALSE      FALSE
## X3              FALSE      FALSE
## X4              FALSE      FALSE
## X5              FALSE      FALSE
## X6              FALSE      FALSE
## X7              FALSE      FALSE
## X8              FALSE      FALSE
## X9              FALSE      FALSE
## (Intercept)     FALSE      FALSE
## 1 subsets of each size up to 9
## Selection Algorithm: exhaustive
##          (Intercept) X   X2  X3  X4  X5  X6  X7  X8  X9 
## 1  ( 1 ) " "         " " " " "*" " " " " " " " " " " " "
## 2  ( 1 ) " "         " " "*" "*" " " " " " " " " " " " "
## 3  ( 1 ) " "         "*" "*" "*" " " " " " " " " " " " "
## 4  ( 1 ) " "         "*" "*" "*" " " "*" " " " " " " " "
## 5  ( 1 ) " "         "*" "*" "*" " " "*" " " "*" " " " "
## 6  ( 1 ) " "         "*" "*" "*" "*" " " "*" " " "*" " "
## 7  ( 1 ) " "         "*" "*" "*" "*" "*" "*" " " "*" " "
## 8  ( 1 ) " "         "*" "*" "*" "*" " " "*" "*" "*" "*"
## 9  ( 1 ) " "         "*" "*" "*" "*" "*" "*" "*" "*" "*"

First, we look at the best model according to \(R^2\), which, unsurprisingly, is the model that includes all nine predictors: \(R^2\) never decreases as variables are added.

best.model = reg.summary$rsq   # R^2 for each model size
best.model
## [1] 0.9298092 0.9868603 0.9993822 0.9993827 0.9993865 0.9993887 0.9993895
## [8] 0.9993977 0.9993982
best.model = data.matrix(best.model)
reg.summary$which[argmax(best.model, rows = F),]   # variables in the max-R^2 model
## (Intercept) (Intercept)           X          X2          X3          X4 
##        TRUE       FALSE        TRUE        TRUE        TRUE        TRUE 
##          X5          X6          X7          X8          X9 
##        TRUE        TRUE        TRUE        TRUE        TRUE

To condense everything into one output, here are all of the respective models according to each of the model criteria found in the regsubsets() function:

metrics = c("rsq","rss","adjr2","cp","bic")
for (m in metrics){
  print("")
  print(paste("Evaluating models from 'best' selection using ",m, ":"))
  best.model = data.frame(reg.summary[m])[1]
  # Cp, BIC, and RSS are "smaller is better"; R^2 and adjusted R^2 are maximized
  if (m %in% c("cp", "bic", "rss")){
    best.model.index = argmin(best.model, rows = F)
  } else{
    best.model.index = argmax(best.model, rows = F)
  }
  xy = data.frame(reg.summary[m], x = 1:9)
  plot(y = xy[,m], x = xy[,'x'], xlab = "Number of Variables", ylab = m, type = "l")
  points(best.model.index, xy[,m][best.model.index], col = "red", cex = 2, pch = 20)
  print(paste("The best model number is number ", best.model.index))
  print("It includes the following variables:")
  print(reg.summary$which[best.model.index,])
  print(reg.summary$outmat[best.model.index,])
}
## [1] ""
## [1] "Evaluating models from 'best' selection using  rsq :"

## [1] "The best model number is number  9"
## [1] "It includes the following variables:"
## (Intercept) (Intercept)           X          X2          X3          X4 
##        TRUE       FALSE        TRUE        TRUE        TRUE        TRUE 
##          X5          X6          X7          X8          X9 
##        TRUE        TRUE        TRUE        TRUE        TRUE 
## (Intercept)           X          X2          X3          X4          X5 
##         " "         "*"         "*"         "*"         "*"         "*" 
##          X6          X7          X8          X9 
##         "*"         "*"         "*"         "*" 
## [1] ""
## [1] "Evaluating models from 'best' selection using  rss :"
## [1] "The best model number is number  9"
## [1] "It includes the following variables:"
## (Intercept) (Intercept)           X          X2          X3          X4 
##        TRUE       FALSE        TRUE        TRUE        TRUE        TRUE 
##          X5          X6          X7          X8          X9 
##        TRUE        TRUE        TRUE        TRUE        TRUE 
## (Intercept)           X          X2          X3          X4          X5 
##         " "         "*"         "*"         "*"         "*"         "*" 
##          X6          X7          X8          X9 
##         "*"         "*"         "*"         "*" 
## [1] ""
## [1] "Evaluating models from 'best' selection using  adjr2 :"

## [1] "The best model number is number  3"
## [1] "It includes the following variables:"
## (Intercept) (Intercept)           X          X2          X3          X4 
##        TRUE       FALSE        TRUE        TRUE        TRUE       FALSE 
##          X5          X6          X7          X8          X9 
##       FALSE       FALSE       FALSE       FALSE       FALSE 
## (Intercept)           X          X2          X3          X4          X5 
##         " "         "*"         "*"         "*"         " "         " " 
##          X6          X7          X8          X9 
##         " "         " "         " "         " " 
## [1] ""
## [1] "Evaluating models from 'best' selection using  cp :"

## [1] "The best model number is number  3"
## [1] "It includes the following variables:"
## (Intercept) (Intercept)           X          X2          X3          X4 
##        TRUE       FALSE        TRUE        TRUE        TRUE       FALSE 
##          X5          X6          X7          X8          X9 
##       FALSE       FALSE       FALSE       FALSE       FALSE 
## (Intercept)           X          X2          X3          X4          X5 
##         " "         "*"         "*"         "*"         " "         " " 
##          X6          X7          X8          X9 
##         " "         " "         " "         " " 
## [1] ""
## [1] "Evaluating models from 'best' selection using  bic :"

## [1] "The best model number is number  3"
## [1] "It includes the following variables:"
## (Intercept) (Intercept)           X          X2          X3          X4 
##        TRUE       FALSE        TRUE        TRUE        TRUE       FALSE 
##          X5          X6          X7          X8          X9 
##       FALSE       FALSE       FALSE       FALSE       FALSE 
## (Intercept)           X          X2          X3          X4          X5 
##         " "         "*"         "*"         "*"         " "         " " 
##          X6          X7          X8          X9 
##         " "         " "         " "         " "
plot(reg.summary$cp, xlab = "Number of Variables", ylab = "Cp", type = "l")

Part 4

Repeat 3. using forward stepwise selection and also using backwards stepwise selection. How does your answer compare to the results in 3. ?

Answer 3 Part 4

Re-running the analysis using forward and then backward stepwise selection yields, for each criterion, a best model of the same size, with the same variables, as best subset selection.

regfit.full <- regsubsets(m.matrix, y = Y, data=data, method = 'forward')
## Warning in leaps.setup(x, y, wt = weights, nbest = nbest, nvmax = nvmax, : 1
## linear dependencies found
## Reordering variables and trying again:
reg.summary <- summary(regfit.full)
for (m in metrics){
  print("")
  print(paste("Evaluating models from 'forward' selection using ",m, ":"))
  best.model = data.frame(reg.summary[m])[1]
  # Cp, BIC, and RSS are "smaller is better"; R^2 and adjusted R^2 are maximized
  if (m %in% c("cp", "bic", "rss")){
    best.model.index = argmin(best.model, rows = F)
  } else{
    best.model.index = argmax(best.model, rows = F)
  }
  xy = data.frame(reg.summary[m], x = 1:9)
  plot(y = xy[,m], x = xy[,'x'], xlab = "Number of Variables", ylab = m, type = "l")
  points(best.model.index, xy[,m][best.model.index], col = "red", cex = 2, pch = 20)
  print(paste("The best model number is number ", best.model.index))
  print("It includes the following variables:")
  print(reg.summary$which[best.model.index,])
  print(reg.summary$outmat[best.model.index,])
}
## [1] ""
## [1] "Evaluating models from 'forward' selection using  rsq :"

## [1] "The best model number is number  9"
## [1] "It includes the following variables:"
## (Intercept) (Intercept)           X          X2          X3          X4 
##        TRUE       FALSE        TRUE        TRUE        TRUE        TRUE 
##          X5          X6          X7          X8          X9 
##        TRUE        TRUE        TRUE        TRUE        TRUE 
## (Intercept)           X          X2          X3          X4          X5 
##         " "         "*"         "*"         "*"         "*"         "*" 
##          X6          X7          X8          X9 
##         "*"         "*"         "*"         "*" 
## [1] ""
## [1] "Evaluating models from 'forward' selection using  rss :"
## [1] "The best model number is number  9"
## [1] "It includes the following variables:"
## (Intercept) (Intercept)           X          X2          X3          X4 
##        TRUE       FALSE        TRUE        TRUE        TRUE        TRUE 
##          X5          X6          X7          X8          X9 
##        TRUE        TRUE        TRUE        TRUE        TRUE 
## (Intercept)           X          X2          X3          X4          X5 
##         " "         "*"         "*"         "*"         "*"         "*" 
##          X6          X7          X8          X9 
##         "*"         "*"         "*"         "*" 
## [1] ""
## [1] "Evaluating models from 'forward' selection using  adjr2 :"

## [1] "The best model number is number  3"
## [1] "It includes the following variables:"
## (Intercept) (Intercept)           X          X2          X3          X4 
##        TRUE       FALSE        TRUE        TRUE        TRUE       FALSE 
##          X5          X6          X7          X8          X9 
##       FALSE       FALSE       FALSE       FALSE       FALSE 
## (Intercept)           X          X2          X3          X4          X5 
##         " "         "*"         "*"         "*"         " "         " " 
##          X6          X7          X8          X9 
##         " "         " "         " "         " " 
## [1] ""
## [1] "Evaluating models from 'forward' selection using  cp :"

## [1] "The best model number is number  3"
## [1] "It includes the following variables:"
## (Intercept) (Intercept)           X          X2          X3          X4 
##        TRUE       FALSE        TRUE        TRUE        TRUE       FALSE 
##          X5          X6          X7          X8          X9 
##       FALSE       FALSE       FALSE       FALSE       FALSE 
## (Intercept)           X          X2          X3          X4          X5 
##         " "         "*"         "*"         "*"         " "         " " 
##          X6          X7          X8          X9 
##         " "         " "         " "         " " 
## [1] ""
## [1] "Evaluating models from 'forward' selection using  bic :"

## [1] "The best model number is number  3"
## [1] "It includes the following variables:"
## (Intercept) (Intercept)           X          X2          X3          X4 
##        TRUE       FALSE        TRUE        TRUE        TRUE       FALSE 
##          X5          X6          X7          X8          X9 
##       FALSE       FALSE       FALSE       FALSE       FALSE 
## (Intercept)           X          X2          X3          X4          X5 
##         " "         "*"         "*"         "*"         " "         " " 
##          X6          X7          X8          X9 
##         " "         " "         " "         " "
regfit.full <- regsubsets(m.matrix, y = Y, data=data, method = 'backward')
## Warning in leaps.setup(x, y, wt = weights, nbest = nbest, nvmax = nvmax, : 1
## linear dependencies found
## Reordering variables and trying again:
reg.summary <- summary(regfit.full)
for (m in metrics){
  print("")
  print(paste("Evaluating models from 'backward' selection using ",m, ":"))
  best.model = data.frame(reg.summary[m])[1]
  # Cp, BIC, and RSS are "smaller is better"; R^2 and adjusted R^2 are maximized
  if (m %in% c("cp", "bic", "rss")){
    best.model.index = argmin(best.model, rows = F)
  } else{
    best.model.index = argmax(best.model, rows = F)
  }
  xy = data.frame(reg.summary[m], x = 1:9)
  plot(y = xy[,m], x = xy[,'x'], xlab = "Number of Variables", ylab = m, type = "l")
  points(best.model.index, xy[,m][best.model.index], col = "red", cex = 2, pch = 20)
  print(paste("The best model number is number ", best.model.index))
  print("It includes the following variables:")
  print(reg.summary$which[best.model.index,])
  print(reg.summary$outmat[best.model.index,])
}
## [1] ""
## [1] "Evaluating models from 'backward' selection using  rsq :"

## [1] "The best model number is number  9"
## [1] "It includes the following variables:"
## (Intercept) (Intercept)           X          X2          X3          X4 
##        TRUE       FALSE        TRUE        TRUE        TRUE        TRUE 
##          X5          X6          X7          X8          X9 
##        TRUE        TRUE        TRUE        TRUE        TRUE 
## (Intercept)           X          X2          X3          X4          X5 
##         " "         "*"         "*"         "*"         "*"         "*" 
##          X6          X7          X8          X9 
##         "*"         "*"         "*"         "*" 
## [1] ""
## [1] "Evaluating models from 'backward' selection using  rss :"
## [1] "The best model number is number  9"
## [1] "It includes the following variables:"
## (Intercept) (Intercept)           X          X2          X3          X4 
##        TRUE       FALSE        TRUE        TRUE        TRUE        TRUE 
##          X5          X6          X7          X8          X9 
##        TRUE        TRUE        TRUE        TRUE        TRUE 
## (Intercept)           X          X2          X3          X4          X5 
##         " "         "*"         "*"         "*"         "*"         "*" 
##          X6          X7          X8          X9 
##         "*"         "*"         "*"         "*" 
## [1] ""
## [1] "Evaluating models from 'backward' selection using  adjr2 :"

## [1] "The best model number is number  3"
## [1] "It includes the following variables:"
## (Intercept) (Intercept)           X          X2          X3          X4 
##        TRUE       FALSE        TRUE        TRUE        TRUE       FALSE 
##          X5          X6          X7          X8          X9 
##       FALSE       FALSE       FALSE       FALSE       FALSE 
## (Intercept)           X          X2          X3          X4          X5 
##         " "         "*"         "*"         "*"         " "         " " 
##          X6          X7          X8          X9 
##         " "         " "         " "         " " 
## [1] ""
## [1] "Evaluating models from 'backward' selection using  cp :"

## [1] "The best model number is number  3"
## [1] "It includes the following variables:"
## (Intercept) (Intercept)           X          X2          X3          X4 
##        TRUE       FALSE        TRUE        TRUE        TRUE       FALSE 
##          X5          X6          X7          X8          X9 
##       FALSE       FALSE       FALSE       FALSE       FALSE 
## (Intercept)           X          X2          X3          X4          X5 
##         " "         "*"         "*"         "*"         " "         " " 
##          X6          X7          X8          X9 
##         " "         " "         " "         " " 
## [1] ""
## [1] "Evaluating models from 'backward' selection using  bic :"

## [1] "The best model number is number  3"
## [1] "It includes the following variables:"
## (Intercept) (Intercept)           X          X2          X3          X4 
##        TRUE       FALSE        TRUE        TRUE        TRUE       FALSE 
##          X5          X6          X7          X8          X9 
##       FALSE       FALSE       FALSE       FALSE       FALSE 
## (Intercept)           X          X2          X3          X4          X5 
##         " "         "*"         "*"         "*"         " "         " " 
##          X6          X7          X8          X9 
##         " "         " "         " "         " "

Part 5

Now fit a lasso model to the simulated data, again using \(X,X^2, \dots , X^{9}\) as predictors. Use cross-validation to select the optimal value of \(\lambda\). Create plots of the cross-validation error as a function of \(\lambda\). Report the resulting coefficient estimates, and discuss the results obtained.
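This part is left unanswered above; a minimal sketch of one approach, assuming the glmnet package and reusing the data frame built in Part 3:

library(glmnet)
xmat <- as.matrix(data[, -1])     # X, X^2, ..., X^9 from Part 3
cv.out <- cv.glmnet(xmat, Y, alpha = 1)
plot(cv.out)                      # CV error versus log(lambda)
coef(cv.out, s = "lambda.min")    # expect X, X^2, X^3 retained; higher powers ~ 0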

Part 6

Now generate a response vector \(Y\) according to the model

\[Y = \beta_0 + \beta_7X^7 + \varepsilon,\]

and perform best subset selection and the lasso. Discuss the results obtained.
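This part is also left without a worked answer above; a minimal sketch of the setup (the value \(\beta_7 = 7\) is a hypothetical choice of mine):

beta_7 <- 7                               # hypothetical coefficient
Y2 <- beta_0 + beta_7 * X^7 + rnorm(200)
data2 <- data.frame(Y2 = Y2, data[, -1])  # reuse the polynomial columns
summary(regsubsets(Y2 ~ ., data = data2, nvmax = 9))        # best subset
coef(cv.glmnet(as.matrix(data[, -1]), Y2, alpha = 1),
     s = "lambda.min")                                      # lasso with CV lambda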

Question 4

In this exercise, we will predict the number of applications received using the other variables in the College data set.

Part 1

Split the data set into a training set and a test set.

Part 2

Fit a linear model using least squares on the training set, and report the test error obtained.

Part 3

Fit a ridge regression model on the training set, with \(\lambda\) chosen by cross-validation. Report the test error obtained.

Part 4

Fit a lasso model on the training set, with \(\lambda\) chosen by cross-validation. Report the test error obtained, along with the number of non-zero coefficient estimates.

Part 5

Fit a PCR model on the training set, with M chosen by cross-validation. Report the test error obtained, along with the value of M selected by cross-validation.
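A minimal sketch of this step, assuming the pls package (the split sizes and seed are my own choices; College is available from ISLR2, loaded earlier):

library(pls)
set.seed(1)
train <- sample(nrow(College), nrow(College) / 2)
pcr.fit <- pcr(Apps ~ ., data = College, subset = train,
               scale = TRUE, validation = "CV")
validationplot(pcr.fit, val.type = "MSEP")  # choose M at the CV minimum
# With M set to that value:
# pred <- predict(pcr.fit, College[-train, ], ncomp = M)
# mean((pred - College$Apps[-train])^2)     # test MSE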

Part 6

Comment on the results obtained. How accurately can we predict the number of college applications received? Is there much difference among the test errors resulting from these five approaches?

Question 5

Suppose that a curve \(\hat{g}\) is computed to smoothly fit a set of \(n\) points using the following formula:

\[ \hat{g} = \mathrm{argmin}_g \left( \sum_{i=1}^n \left(y_i - g(x_i)\right)^2 + \lambda \int H\left(g^{(m)}(x)\right) \, dx \right) \]

where \(g^{(m)}\) represents the \(m\)th derivative of \(g\) (and \(g^{(0)} = g\)), and \(H = H_\delta\) is the Huber loss for some pre-determined \(\delta\): \(H_\delta(z) = \frac{1}{2}z^2\) for \(|z| \le \delta\) and \(H_\delta(z) = \delta(|z| - \frac{\delta}{2})\) otherwise. The Huber loss is a loss function typically used in robust regression that is less sensitive to outliers in data than the usual squared error loss.

Provide example sketches of \(\hat{g}\) in each of the following scenarios.

  1. \(\lambda = \infty, m = 0\).

  2. \(\lambda = \infty, m = 1\).

  3. \(\lambda = \infty, m = 2\).

  4. \(\lambda = \infty, m = 3\).

  5. \(\lambda = 0, m = 3\).
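A reasoning sketch for these scenarios (not the required drawings): \(H_\delta(z) \ge 0\) with equality only at \(z = 0\), so \(\lambda = \infty\) forces \(g^{(m)} \equiv 0\), and \(\hat{g}\) is the least squares fit subject to that constraint:

\[ m = 0: \hat{g} \equiv 0; \qquad m = 1: \hat{g}(x) = \bar{y}; \qquad m = 2: \hat{g} \text{ is the least squares line}; \qquad m = 3: \hat{g} \text{ is the least squares quadratic}. \]

When \(\lambda = 0\) the penalty vanishes entirely, so \(\hat{g}\) is free to interpolate all \(n\) points exactly.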

Question 6

The Wage data set contains a number of other features not explored in this chapter, such as marital status (maritl), job class (jobclass), and others. Explore the relationships between some of these other predictors and wage. Use non-linear fitting techniques with regularization in order to fit flexible models to the data. You should use \(k\)-fold cross validation to select a reasonable \(\lambda\) as your penalized regression parameter. Create plots of the results obtained, and write a summary of your findings.

Question 7

This question relates to the College data set.

Part 1

Split the data into a training set and a test set. Using out-of-state tuition as the response and the other variables as the predictors, perform backward stepwise selection on the training set in order to identify a satisfactory model that uses just a subset of the predictors.

Part 2

Fit a GAM on the training data, using out-of-state tuition as the response and the features selected in the previous step as the predictors. Plot the results, and explain your findings.

Part 3

Evaluate the model obtained on the test set, and explain the results obtained.

Part 4

For which variables, if any, is there evidence of a non-linear relationship with the response?

Question 8

GAMs are generally fit using a backfitting approach. The idea behind backfitting is actually quite simple. We will now explore backfitting in the context of multiple linear regression.

Suppose that we would like to perform multiple linear regression, but we do not have software to do so. Instead, we only have software to perform simple linear regression. Therefore, we take the following iterative approach: we repeatedly hold all but one coefficient estimate fixed at its current value, and update only that coefficient estimate using a simple linear regression. The process is continued until the coefficient estimates stop changing (convergence).

We now try this out on a toy example.

Part 1

Use the following code to generate a response \(Y\) and two predictors \(X_1\) and \(X_2\), with \(n = 100\).

set.seed(2022)
n = 100
x1 = rnorm(n)
x2 = rnorm(n)
y = 13 + 4 * x1 - 7 * x2 + rnorm(n)

Part 2

Initialize \(\hat{\alpha}_1\) to the value 0.

alpha1 = 0

Part 3

Keeping \(\hat{\alpha}_1\) fixed, fit the model

\[Y - \hat{\alpha}_1 X_1 = \alpha_0 + \alpha_2 X_2 + \epsilon.\]

You can do this as follows:

a = y - alpha1 * x1
alpha2 = lm(a ~ x2)$coef[2]

Part 4

Keeping \(\hat{\alpha}_2\) fixed, fit the model

\[Y - \hat{\alpha}_2 X_2 = \alpha_0 + \alpha_1 X_1 + \epsilon.\]

You can do this as follows:

a = y - alpha2 * x2
alpha1 = lm(a ~ x1)$coef[2]

Part 5

Write a for loop to repeat 3. and 4. 1,000 times. Report the estimates of \(\hat{\alpha}_0\), \(\hat{\alpha}_1\), and \(\hat{\alpha}_2\) at each iteration of the for loop. Create a plot in which each of these values is displayed, with \(\hat{\alpha}_0\), \(\hat{\alpha}_1\), and \(\hat{\alpha}_2\) each shown in a different color.
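A minimal sketch of such a loop (the updates are exactly those from Parts 3 and 4; the matrix est and the plotting choices are my own):

niter <- 1000
est <- matrix(NA, niter, 3,
              dimnames = list(NULL, c("alpha0", "alpha1", "alpha2")))
alpha1 <- 0
for (i in 1:niter) {
  a <- y - alpha1 * x1                 # hold alpha1 fixed, update alpha2
  fit2 <- lm(a ~ x2)
  alpha2 <- fit2$coef[2]
  a <- y - alpha2 * x2                 # hold alpha2 fixed, update alpha1
  fit1 <- lm(a ~ x1)
  alpha1 <- fit1$coef[2]
  est[i, ] <- c(fit1$coef[1], alpha1, alpha2)
}
matplot(est, type = "l", lty = 1, col = c("black", "red", "blue"),
        xlab = "Iteration", ylab = "Coefficient estimate")

The abline() overlay asked for in Part 6 can then be added with abline(h = coef(lm(y ~ x1 + x2)), lty = 2).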

Part 6

Compare your answer in 5. to the results of simply performing multiple linear regression to predict \(Y\) using \(X_1\) and \(X_2\). Use the abline() function to overlay those multiple linear regression coefficient estimates on the plot obtained in 5.

Part 7

On this data set, how many backfitting iterations were required in order to obtain a “good” approximation to the multiple regression coefficient estimates?
