Problem 1

In this problem, you will generate a simulated dataset, and then use subset selection and LASSO to identify the most important predictors. In class we introduced subset selection as a method to select an optimal set of predictors. Specifically, given \(p\) predictors and the model \[\begin{equation} Y \simeq \beta_0 + \beta_1X_1 + ... + \beta_pX_p, \end{equation}\] subset selection identifies a set of predictors \(M\) such that \(\beta_j\) can take any value if \(j \in M\) and \(\beta_j=0\) if \(j\notin M\). Earlier in the class, we showed that linear regression can be used to fit polynomials by treating \(X^j\) (that is, \(X\) raised to the power \(j\), for \(j=2...p\)) as independent predictors: \[\begin{equation} Y \simeq \beta_0 + \beta_1X + \beta_2X^2 ... + \beta_pX^p \end{equation}\] In this problem you will first generate synthetic data for a specific choice of a polynomial. You will then show that the polynomial can be re-inferred from the synthetic data using best subset selection or LASSO. Make sure to read through the R-lab in section 6.5 of ISLR to see how to implement subset selection and LASSO in R.

  1. First generate the synthetic dataset with \(n=100\) data points. Before doing so, use set.seed(1) for reproducibility. Then use the function rnorm() to generate a vector of \(n=100\) random predictor values \(X\) as well as a vector of \(n=00\) random noise values \(\epsilon\), both following the standard normal distribution \(N(0,1)\). Then generate a vector of response values \(Y\) according to the model: \[\begin{equation} Y = 2 + X + 3X^2 - 0.5X^3 + \epsilon \label{eqn:model} \end{equation}\] Finally store vectors \(X\) and \(Y\) in a data frame df with with column names x and y (hint: use data.frame() to create the data frame.
set.seed(1)

n <- 100
X <- rnorm(n)
epsilon <- rnorm(n)

Y <- 2 + X + 3 * X^2 - 0.5 * X^3 + epsilon

df <- data.frame(x = X, y = Y)
  1. Use the function regsubsets() to perform best subset selection to identify the model containing the best combination of the predictors \(X,\,X^2,\,...X^{10}\). Perform best subset selection and store the results in the variable regfit.full. What powers of \(X\) were selected for the model with \(k=3\) predictors, and is this what you expected? (hint: use library(leaps) to load the required library. A \(p\)th degree polynomial model can be specified using the formula y~poly(x, p, raw=TRUE) in R. Alternatively it can be specified using y~x+I(x^2)+I(x^3)+.... The command summary() show what predictors were selected).
library(leaps)

regfit.full <- regsubsets(y ~ poly(x, 10, raw = TRUE), data = df, nvmax = 10)

regfit.sum <- summary(regfit.full)
print(regfit.sum)
## Subset selection object
## Call: regsubsets.formula(y ~ poly(x, 10, raw = TRUE), data = df, nvmax = 10)
## 10 Variables  (and intercept)
##                           Forced in Forced out
## poly(x, 10, raw = TRUE)1      FALSE      FALSE
## poly(x, 10, raw = TRUE)2      FALSE      FALSE
## poly(x, 10, raw = TRUE)3      FALSE      FALSE
## poly(x, 10, raw = TRUE)4      FALSE      FALSE
## poly(x, 10, raw = TRUE)5      FALSE      FALSE
## poly(x, 10, raw = TRUE)6      FALSE      FALSE
## poly(x, 10, raw = TRUE)7      FALSE      FALSE
## poly(x, 10, raw = TRUE)8      FALSE      FALSE
## poly(x, 10, raw = TRUE)9      FALSE      FALSE
## poly(x, 10, raw = TRUE)10     FALSE      FALSE
## 1 subsets of each size up to 10
## Selection Algorithm: exhaustive
##           poly(x, 10, raw = TRUE)1 poly(x, 10, raw = TRUE)2
## 1  ( 1 )  " "                      "*"                     
## 2  ( 1 )  " "                      "*"                     
## 3  ( 1 )  "*"                      "*"                     
## 4  ( 1 )  "*"                      "*"                     
## 5  ( 1 )  "*"                      "*"                     
## 6  ( 1 )  "*"                      "*"                     
## 7  ( 1 )  "*"                      "*"                     
## 8  ( 1 )  "*"                      "*"                     
## 9  ( 1 )  "*"                      "*"                     
## 10  ( 1 ) "*"                      "*"                     
##           poly(x, 10, raw = TRUE)3 poly(x, 10, raw = TRUE)4
## 1  ( 1 )  " "                      " "                     
## 2  ( 1 )  " "                      " "                     
## 3  ( 1 )  "*"                      " "                     
## 4  ( 1 )  "*"                      " "                     
## 5  ( 1 )  "*"                      " "                     
## 6  ( 1 )  "*"                      " "                     
## 7  ( 1 )  "*"                      " "                     
## 8  ( 1 )  "*"                      "*"                     
## 9  ( 1 )  "*"                      "*"                     
## 10  ( 1 ) "*"                      "*"                     
##           poly(x, 10, raw = TRUE)5 poly(x, 10, raw = TRUE)6
## 1  ( 1 )  " "                      " "                     
## 2  ( 1 )  "*"                      " "                     
## 3  ( 1 )  " "                      " "                     
## 4  ( 1 )  "*"                      " "                     
## 5  ( 1 )  "*"                      "*"                     
## 6  ( 1 )  " "                      " "                     
## 7  ( 1 )  "*"                      "*"                     
## 8  ( 1 )  " "                      "*"                     
## 9  ( 1 )  "*"                      "*"                     
## 10  ( 1 ) "*"                      "*"                     
##           poly(x, 10, raw = TRUE)7 poly(x, 10, raw = TRUE)8
## 1  ( 1 )  " "                      " "                     
## 2  ( 1 )  " "                      " "                     
## 3  ( 1 )  " "                      " "                     
## 4  ( 1 )  " "                      " "                     
## 5  ( 1 )  " "                      " "                     
## 6  ( 1 )  "*"                      "*"                     
## 7  ( 1 )  " "                      "*"                     
## 8  ( 1 )  " "                      "*"                     
## 9  ( 1 )  " "                      "*"                     
## 10  ( 1 ) "*"                      "*"                     
##           poly(x, 10, raw = TRUE)9 poly(x, 10, raw = TRUE)10
## 1  ( 1 )  " "                      " "                      
## 2  ( 1 )  " "                      " "                      
## 3  ( 1 )  " "                      " "                      
## 4  ( 1 )  " "                      " "                      
## 5  ( 1 )  " "                      " "                      
## 6  ( 1 )  "*"                      " "                      
## 7  ( 1 )  " "                      "*"                      
## 8  ( 1 )  "*"                      "*"                      
## 9  ( 1 )  "*"                      "*"                      
## 10  ( 1 ) "*"                      "*"
  1. We next need to decide how many predictors \(k\) to include in the final model. This can be decided by selecting the model with the best \(C_p\), \(BIC\) or the adjusted \(R^2\) (or cross-validation error, but we will not consider that here). Fortunately, the command regfit.sum=summary(regfit.full) computes these three values for all models. For each of the measures \(C_p\), \(BIC\) or the adjusted \(R^2\), do: 1) create a plot showing how the value depends on the number of predictors, 2) find the optimal value of \(k\) according to the measure, and 3) report the coefficients and compare them to the true model. (hint: use names() to find what values are stored in regfit.sum. These can then be accessed using, for example regfit.sum$cp. which.min() and which.max() are useful for finding the index in an array with the largest or smallest value. The command \(coef()\) is uefull for getting the coefficients.)
plot(regfit.sum$cp, type = 'b', xlab = "Number of Predictors", ylab = "Cp")

plot(regfit.sum$bic, type = 'b', xlab = "Number of Predictors", ylab = "BIC")

plot(regfit.sum$adjr2, type = 'b', xlab = "Number of Predictors", ylab = "Adjusted R^2")

k_cp <- which.min(regfit.sum$cp)
k_bic <- which.min(regfit.sum$bic)
k_adjr2 <- which.max(regfit.sum$adjr2)

coef(regfit.full, k_cp)
##              (Intercept) poly(x, 10, raw = TRUE)1 poly(x, 10, raw = TRUE)2 
##               2.07200775               1.38745596               2.84575641 
## poly(x, 10, raw = TRUE)3 poly(x, 10, raw = TRUE)5 
##              -0.94202574               0.08072292
coef(regfit.full, k_bic)
##              (Intercept) poly(x, 10, raw = TRUE)1 poly(x, 10, raw = TRUE)2 
##                2.0615072                0.9752803                2.8762090 
## poly(x, 10, raw = TRUE)3 
##               -0.4823614
coef(regfit.full, k_adjr2)
##              (Intercept) poly(x, 10, raw = TRUE)1 poly(x, 10, raw = TRUE)2 
##               2.07200775               1.38745596               2.84575641 
## poly(x, 10, raw = TRUE)3 poly(x, 10, raw = TRUE)5 
##              -0.94202574               0.08072292
  1. Repeat (c) using forward stepwise selection and backwards stepwise selection.
regfit.fwd <- regsubsets(y ~ poly(x, 10, raw = TRUE), data = df, nvmax = 10, method = "forward")
summary(regfit.fwd)
## Subset selection object
## Call: regsubsets.formula(y ~ poly(x, 10, raw = TRUE), data = df, nvmax = 10, 
##     method = "forward")
## 10 Variables  (and intercept)
##                           Forced in Forced out
## poly(x, 10, raw = TRUE)1      FALSE      FALSE
## poly(x, 10, raw = TRUE)2      FALSE      FALSE
## poly(x, 10, raw = TRUE)3      FALSE      FALSE
## poly(x, 10, raw = TRUE)4      FALSE      FALSE
## poly(x, 10, raw = TRUE)5      FALSE      FALSE
## poly(x, 10, raw = TRUE)6      FALSE      FALSE
## poly(x, 10, raw = TRUE)7      FALSE      FALSE
## poly(x, 10, raw = TRUE)8      FALSE      FALSE
## poly(x, 10, raw = TRUE)9      FALSE      FALSE
## poly(x, 10, raw = TRUE)10     FALSE      FALSE
## 1 subsets of each size up to 10
## Selection Algorithm: forward
##           poly(x, 10, raw = TRUE)1 poly(x, 10, raw = TRUE)2
## 1  ( 1 )  " "                      "*"                     
## 2  ( 1 )  " "                      "*"                     
## 3  ( 1 )  "*"                      "*"                     
## 4  ( 1 )  "*"                      "*"                     
## 5  ( 1 )  "*"                      "*"                     
## 6  ( 1 )  "*"                      "*"                     
## 7  ( 1 )  "*"                      "*"                     
## 8  ( 1 )  "*"                      "*"                     
## 9  ( 1 )  "*"                      "*"                     
## 10  ( 1 ) "*"                      "*"                     
##           poly(x, 10, raw = TRUE)3 poly(x, 10, raw = TRUE)4
## 1  ( 1 )  " "                      " "                     
## 2  ( 1 )  " "                      " "                     
## 3  ( 1 )  " "                      " "                     
## 4  ( 1 )  "*"                      " "                     
## 5  ( 1 )  "*"                      " "                     
## 6  ( 1 )  "*"                      " "                     
## 7  ( 1 )  "*"                      " "                     
## 8  ( 1 )  "*"                      " "                     
## 9  ( 1 )  "*"                      " "                     
## 10  ( 1 ) "*"                      "*"                     
##           poly(x, 10, raw = TRUE)5 poly(x, 10, raw = TRUE)6
## 1  ( 1 )  " "                      " "                     
## 2  ( 1 )  "*"                      " "                     
## 3  ( 1 )  "*"                      " "                     
## 4  ( 1 )  "*"                      " "                     
## 5  ( 1 )  "*"                      "*"                     
## 6  ( 1 )  "*"                      "*"                     
## 7  ( 1 )  "*"                      "*"                     
## 8  ( 1 )  "*"                      "*"                     
## 9  ( 1 )  "*"                      "*"                     
## 10  ( 1 ) "*"                      "*"                     
##           poly(x, 10, raw = TRUE)7 poly(x, 10, raw = TRUE)8
## 1  ( 1 )  " "                      " "                     
## 2  ( 1 )  " "                      " "                     
## 3  ( 1 )  " "                      " "                     
## 4  ( 1 )  " "                      " "                     
## 5  ( 1 )  " "                      " "                     
## 6  ( 1 )  " "                      " "                     
## 7  ( 1 )  "*"                      " "                     
## 8  ( 1 )  "*"                      "*"                     
## 9  ( 1 )  "*"                      "*"                     
## 10  ( 1 ) "*"                      "*"                     
##           poly(x, 10, raw = TRUE)9 poly(x, 10, raw = TRUE)10
## 1  ( 1 )  " "                      " "                      
## 2  ( 1 )  " "                      " "                      
## 3  ( 1 )  " "                      " "                      
## 4  ( 1 )  " "                      " "                      
## 5  ( 1 )  " "                      " "                      
## 6  ( 1 )  "*"                      " "                      
## 7  ( 1 )  "*"                      " "                      
## 8  ( 1 )  "*"                      " "                      
## 9  ( 1 )  "*"                      "*"                      
## 10  ( 1 ) "*"                      "*"
regfit.bwd <- regsubsets(y ~ poly(x, 10, raw = TRUE), data = df, nvmax = 10, method = "backward")
summary(regfit.bwd)
## Subset selection object
## Call: regsubsets.formula(y ~ poly(x, 10, raw = TRUE), data = df, nvmax = 10, 
##     method = "backward")
## 10 Variables  (and intercept)
##                           Forced in Forced out
## poly(x, 10, raw = TRUE)1      FALSE      FALSE
## poly(x, 10, raw = TRUE)2      FALSE      FALSE
## poly(x, 10, raw = TRUE)3      FALSE      FALSE
## poly(x, 10, raw = TRUE)4      FALSE      FALSE
## poly(x, 10, raw = TRUE)5      FALSE      FALSE
## poly(x, 10, raw = TRUE)6      FALSE      FALSE
## poly(x, 10, raw = TRUE)7      FALSE      FALSE
## poly(x, 10, raw = TRUE)8      FALSE      FALSE
## poly(x, 10, raw = TRUE)9      FALSE      FALSE
## poly(x, 10, raw = TRUE)10     FALSE      FALSE
## 1 subsets of each size up to 10
## Selection Algorithm: backward
##           poly(x, 10, raw = TRUE)1 poly(x, 10, raw = TRUE)2
## 1  ( 1 )  " "                      "*"                     
## 2  ( 1 )  " "                      "*"                     
## 3  ( 1 )  "*"                      "*"                     
## 4  ( 1 )  "*"                      "*"                     
## 5  ( 1 )  "*"                      "*"                     
## 6  ( 1 )  "*"                      "*"                     
## 7  ( 1 )  "*"                      "*"                     
## 8  ( 1 )  "*"                      "*"                     
## 9  ( 1 )  "*"                      "*"                     
## 10  ( 1 ) "*"                      "*"                     
##           poly(x, 10, raw = TRUE)3 poly(x, 10, raw = TRUE)4
## 1  ( 1 )  " "                      " "                     
## 2  ( 1 )  "*"                      " "                     
## 3  ( 1 )  "*"                      " "                     
## 4  ( 1 )  "*"                      " "                     
## 5  ( 1 )  "*"                      " "                     
## 6  ( 1 )  "*"                      " "                     
## 7  ( 1 )  "*"                      " "                     
## 8  ( 1 )  "*"                      "*"                     
## 9  ( 1 )  "*"                      "*"                     
## 10  ( 1 ) "*"                      "*"                     
##           poly(x, 10, raw = TRUE)5 poly(x, 10, raw = TRUE)6
## 1  ( 1 )  " "                      " "                     
## 2  ( 1 )  " "                      " "                     
## 3  ( 1 )  " "                      " "                     
## 4  ( 1 )  " "                      " "                     
## 5  ( 1 )  " "                      " "                     
## 6  ( 1 )  " "                      " "                     
## 7  ( 1 )  " "                      "*"                     
## 8  ( 1 )  " "                      "*"                     
## 9  ( 1 )  "*"                      "*"                     
## 10  ( 1 ) "*"                      "*"                     
##           poly(x, 10, raw = TRUE)7 poly(x, 10, raw = TRUE)8
## 1  ( 1 )  " "                      " "                     
## 2  ( 1 )  " "                      " "                     
## 3  ( 1 )  " "                      " "                     
## 4  ( 1 )  " "                      " "                     
## 5  ( 1 )  " "                      "*"                     
## 6  ( 1 )  " "                      "*"                     
## 7  ( 1 )  " "                      "*"                     
## 8  ( 1 )  " "                      "*"                     
## 9  ( 1 )  " "                      "*"                     
## 10  ( 1 ) "*"                      "*"                     
##           poly(x, 10, raw = TRUE)9 poly(x, 10, raw = TRUE)10
## 1  ( 1 )  " "                      " "                      
## 2  ( 1 )  " "                      " "                      
## 3  ( 1 )  " "                      " "                      
## 4  ( 1 )  "*"                      " "                      
## 5  ( 1 )  "*"                      " "                      
## 6  ( 1 )  "*"                      "*"                      
## 7  ( 1 )  "*"                      "*"                      
## 8  ( 1 )  "*"                      "*"                      
## 9  ( 1 )  "*"                      "*"                      
## 10  ( 1 ) "*"                      "*"
  1. A subset of important predictors can also be selected using LASSO. Use cv.glmnet() (imported using library(glmnet)) with alpha=1 to fit a LASSO model to the synthetic data and store it in cv.out. Plot how the cross-validation error depends on \(\lambda\). How many predictors were selected? Is the LASSO model similar to the true model? (hint: An example of how to fit a model using cv.glmnet() can be found on page 278. As input, cv.glmnet() needs the design matrix with the intercept column removed, as demonstrated on page 274. Both the \(\lambda\) values and the corresponding cross-validation error are stored in cv.out; use names() to find them! It may be useful to use the argument log='x' in plot())
library(glmnet)
## Loading required package: Matrix
## Loaded glmnet 4.1-8
x_matrix <- model.matrix(y ~ poly(x, 10, raw = TRUE), data = df)[, -1]  # Remove intercept
y_response <- df$y

cv.out <- cv.glmnet(x_matrix, y_response, alpha = 1)
plot(cv.out)

best_lambda <- cv.out$lambda.min
lasso_model <- glmnet(x_matrix, y_response, alpha = 1, lambda = best_lambda)
coef(lasso_model)
## 11 x 1 sparse Matrix of class "dgCMatrix"
##                                      s0
## (Intercept)                2.1482869810
## poly(x, 10, raw = TRUE)1   0.8852714744
## poly(x, 10, raw = TRUE)2   2.7158832671
## poly(x, 10, raw = TRUE)3  -0.4596297829
## poly(x, 10, raw = TRUE)4   .           
## poly(x, 10, raw = TRUE)5   .           
## poly(x, 10, raw = TRUE)6   0.0036926467
## poly(x, 10, raw = TRUE)7   .           
## poly(x, 10, raw = TRUE)8   0.0007278168
## poly(x, 10, raw = TRUE)9   .           
## poly(x, 10, raw = TRUE)10  .

Problem 2

In the previous problem you performed subset selection using three approaches: best subset selection, forward stepwise selection, and backward stepwise selection. For each approach, you obtained the models \(M_0\), …, \(M_p\) containing \(0, 1, 2, ..., p\) predictors. In this problem, you will compare the theoretical properties of the three approaches.

  1. Given that you used the three approaches to independently generate three models \(M_k\) with \(k\) predictors, which model do you expect to have the smallest training RSS? Can there be a tie? Is it impossible for some model(s) to be better or worse than the other model(s)?

The best subset selection will have the smallest training RSS because it evaluates all possible predictor combinations. Forward and backward stepwise selection may match the best subset model’s RSS, but usually will not, because they are limited.

  1. Which of the three models with \(k\) predictors do you expect to have the smallest test RSS? Can there be a tie? Is it impossible for some model(s) to be better or worse than the other model(s)?

Any method could perform best, as it depends on which predictors go best with the actual data pattern. None of these methods is guaranteed to have the smallest test RSS, but they can produce similar test errors if they select similar predictors.

  1. Determine which of the following statements true or false, and provide a motivation:
  1. The predictors in the \(k\)-variable model identified by forward stepwise are always a subset of the predictors in the \((k+1)\)-variable model identified by forward stepwise selection.

True: In forward stepwise selection, each (k+1) predictor model includes all k-predictors + 1, so each k-predictor model is a subset of the next.

  1. The predictors in the \(k\)-variable model identified by backward stepwise are always a subset of the predictors in the \((k + 1)\)-variable model identified by backward stepwise selection.

False: Backward stepwise doesn’t guarantee that each k-predictor model is a subset of the previous (k+1) predictor model, because predictors are removed sequentially.

  1. The predictors in the \(k\)-variable model identified by backward stepwise are always a subset of the predictors in the \((k + 1)\)-variable model identified by forward stepwise selection.

False: Forward and backward stepwise models do not follow the same paths, so the predictors chosen by one method are not necessarily subsets of the other.

  1. The predictors in the \(k\)-variable model identified by forward stepwise are always a subset of the predictors in the \((k+1)\)-variable model identified by backward stepwise selection.

False: Forward stepwise predictors are not guaranteed to be subsets of those chosen by backward stepwise selection.The two methods can select different predictors at each step, so there’s no subset relationship.

  1. The predictors in the \(k\)-variable model identified by best subset are always a subset of the predictors in the \((k + 1)\)-variable model identified by best subset selection.

True: Best subset selection includes each k-predictor model as a subset of the (k+1) model since it explores all possible combinations.

Problem 3

In this problem, you will study the fat dataset from the library faraway. To load this data, use:

#install.packages("faraway")
library(faraway)
data(fat)

Here the first line installs the package and only needs to be executed once. This dataset contains biometric data for 252 individuals. Our goal is to predict the body fat (stored in brozek) using all other predictors except for siri, density and free. These predictors can be removed using

df=subset(fat, select = -c(siri,density,free))

We also remove one datapoint with an anomalously short individual, and split the data into a training and testing dataset

df=df[df$height>50,]
set.seed(1)
train=sample(1:nrow(df), nrow(df)/2)
df.tr = df[train,]
df.te = df[-train,]
  1. Fit a linear regression model using all predictors. What predictor has the smallest p-value? What is the training and testing MSE?
lm.fit <- lm(brozek ~ ., data = df.tr)

train.mse <- mean(lm.fit$residuals^2)

pred.te <- predict(lm.fit, newdata = df.te)
test.mse <- mean((df.te$brozek - pred.te)^2)
  1. Fit a ridge regression model on the training set and store it in cv.out, with \(\lambda\) chosen by cross-validation. What is the optimal value of \(\lambda\)? Compute and report the training and testing MSE. (hint: use cv.glmnet() as in Problem 1e but with a different value of alpha. The optimal value of \(\lambda\) is stored in cv.out$lambda.min. When making predictions, use the function predict() with the model cv.out; the argument newx is used to specify the matrix of (possibly new) predictor values as input, and the argument s is used to specify what lambda to use for the prediction).
library(glmnet)
x <- model.matrix(brozek ~ ., df.tr)[, -1]
y <- df.tr$brozek
cv.out <- cv.glmnet(x, y, alpha = 0)

best.lambda <- cv.out$lambda.min

ridge.train.pred <- predict(cv.out, newx = x, s = best.lambda)
ridge.train.mse <- mean((y - ridge.train.pred)^2)

x.te <- model.matrix(brozek ~ ., df.te)[, -1]
ridge.test.pred <- predict(cv.out, newx = x.te, s = best.lambda)
ridge.test.mse <- mean((df.te$brozek - ridge.test.pred)^2)
  1. Fit a LASSO model on the training set, with \(\lambda\) chosen by cross-validation. Compute and report the training and testing MSE, along with the number of non-zero coefficient estimates (hint: see problem 1e and 3b).
cv.out.lasso <- cv.glmnet(x, y, alpha = 1)

lasso.train.pred <- predict(cv.out.lasso, newx = x, s = cv.out.lasso$lambda.min)
lasso.train.mse <- mean((y - lasso.train.pred)^2)

lasso.test.pred <- predict(cv.out.lasso, newx = x.te, s = cv.out.lasso$lambda.min)
lasso.test.mse <- mean((df.te$brozek - lasso.test.pred)^2)

lasso.coef <- coef(cv.out.lasso, s = cv.out.lasso$lambda.min)
num.nonzero <- sum(lasso.coef != 0)
  1. Perform best subset selection using the training data, saving the result in regfit.full. Plot the \(C_p\), BIC and adjusted \(R^2\) as a function of the number of predictors \(k\). What number of predictors minimize \(BIC\)? What predictors does this model use? (hint: See problem 1c)
regfit.full <- regsubsets(brozek ~ ., data = df.tr, nvmax = ncol(df.tr) - 1)
regfit.sum <- summary(regfit.full)

plot(regfit.sum$cp, xlab = "Number of Predictors", ylab = "Cp", type = "l")

plot(regfit.sum$bic, xlab = "Number of Predictors", ylab = "BIC", type = "l")

plot(regfit.sum$adjr2, xlab = "Number of Predictors", ylab = "Adjusted R^2", type = "l")

best.bic.k <- which.min(regfit.sum$bic)
best.bic.predictors <- coef(regfit.full, id = best.bic.k)
  1. Finally, compute the training and testing errors for the model from part (d). How does the testing error compare to those estimated for the LASSO and Ridge Regression models? In the book on page 272, the following function was defined to make prediction using models generated by regsubsets():
predict.regsubsets <- function(object, newdata, id, ...) {
  form <- as.formula(object$call[[2]])
  mat <- model.matrix(form, newdata)
  coefi <- coef(object, id = id)
  xvars <- names(coefi)
  mat[, xvars] %*% coefi
}