Predictive Modeling Homework 5

Author

Cheryl Chiu (wky301)

Published

April 1, 2025

Load libraries

library(ISLR)
library(pls)
library(MASS)   
library(leaps)   
library(glmnet)

Exercise 2

For parts (a) through (c), indicate which of i. through iv. is correct. Justify your answer.

(a) The lasso, relative to least squares, is:

✅iii. Less flexible and hence will give improved prediction accu- racy when its increase in bias is less than its decrease in variance.

🔴 Answer:The lasso is less flexible than least squares. The method does better when the extra bias is smaller than the reduction in variability.

(b) Repeat (a) for ridge regression relative to least squares.

✅iii. Less flexible and hence will give improved prediction accu- racy when its increase in bias is less than its decrease in variance.

🔴 Answer: The ridge regression is less flexible than least squares. It works better when the extra bias is smaller than the reduction in variability.

(c) Repeat (a) for non-linear methods relative to least squares.

More flexible and hence will give improved prediction accuracy when its increase in variance is less than its decrease in bias.

✅ ii. More flexible and hence will give improved prediction accuracy when its increase in variance is less than its decrease in bias.

🔴 Answer: Non-linear methods are more flexible than least squares. They work better when the reduction in bias is larger than the increase in variability.

Exercise 9

In this exercise, we will predict the number of applications received using the other variables in the College data set.

(a) Split the data set into a training set and a test set.

set.seed(1)
train = sample(1:nrow(College), nrow(College)/2)
test = -train
College.test <- College[test, ]

(b) Fit a linear model using least squares on the training set, and report the test error obtained.

linear.fit <- lm(Apps ~ ., data=College, subset=train)
summary(linear.fit)


Call:
lm(formula = Apps ~ ., data = College, subset = train)

Residuals:
    Min      1Q  Median      3Q     Max 
-5741.2  -479.5    15.3   359.6  7258.0 

Coefficients:
              Estimate Std. Error t value Pr(>|t|)    
(Intercept) -7.902e+02  6.381e+02  -1.238 0.216410    
PrivateYes  -3.070e+02  2.006e+02  -1.531 0.126736    
Accept       1.779e+00  5.420e-02  32.830  < 2e-16 ***
Enroll      -1.470e+00  3.115e-01  -4.720 3.35e-06 ***
Top10perc    6.673e+01  8.310e+00   8.030 1.31e-14 ***
Top25perc   -2.231e+01  6.533e+00  -3.415 0.000708 ***
F.Undergrad  9.269e-02  5.529e-02   1.676 0.094538 .  
P.Undergrad  9.397e-03  5.493e-02   0.171 0.864275    
Outstate    -1.084e-01  2.700e-02  -4.014 7.22e-05 ***
Room.Board   2.115e-01  7.224e-02   2.928 0.003622 ** 
Books        2.912e-01  3.985e-01   0.731 0.465399    
Personal     6.133e-03  8.803e-02   0.070 0.944497    
PhD         -1.548e+01  6.681e+00  -2.316 0.021082 *  
Terminal     6.415e+00  7.290e+00   0.880 0.379470    
S.F.Ratio    2.283e+01  2.047e+01   1.115 0.265526    
perc.alumni  1.134e+00  6.083e+00   0.186 0.852274    
Expend       4.857e-02  1.619e-02   2.999 0.002890 ** 
Grad.Rate    7.490e+00  4.397e+00   1.703 0.089324 .  
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Residual standard error: 1083 on 370 degrees of freedom
Multiple R-squared:  0.9389,    Adjusted R-squared:  0.9361 
F-statistic: 334.3 on 17 and 370 DF,  p-value: < 2.2e-16

pred.app <- predict(linear.fit, newdata = College.test)
test.error <- mean((College.test$Apps - pred.app)^2)
test.error

[1] 1135758

🔴 Answer: The MSE obtained from least square is 1135758.

(c) Fit a ridge regression model on the training set, with λ chosen by cross-validation. Report the test error obtained.

x.train <- model.matrix(Apps ~ ., data = College)[train, -1]
y.train <- College$Apps[train]
x.test <- model.matrix(Apps ~ ., data = College)[test, -1]
y.test <- College$Apps[test]

set.seed(1)
ridge.cv <- cv.glmnet(x.train, y.train, alpha = 0)  
best.lambda.ridge <- ridge.cv$lambda.min
ridge.pred <- predict(ridge.cv, s = best.lambda.ridge, newx = x.test)
test.error.ridge <- mean((y.test - ridge.pred)^2)
test.error.ridge

[1] 976261.5

🔴 Answer: The MSE obtained from ridge regression model is 976261.5.

(d) Fit a lasso model on the training set, with λ chosen by cross- validation. Report the test error obtained, along with the num-er of non-zero coeﬀicient estimates.

set.seed(1)
lasso.cv <- cv.glmnet(x.train, y.train, alpha = 1) 
best.lambda.lasso <- lasso.cv$lambda.min
lasso.pred <- predict(lasso.cv, s = best.lambda.lasso, newx = x.test)
test.error.lasso <- mean((y.test - lasso.pred)^2)
test.error.lasso

[1] 1115901

lasso.fit <- glmnet(x.train, y.train, alpha = 1)
lasso.coef <- predict(lasso.fit, type = "coefficients", s = best.lambda.lasso)
nonzero.coefs <- sum(lasso.coef != 0) - 1
nonzero.coefs

[1] 17

🔴 Answer: The MSE obtained from lasso model is 1115901. The number of non-zero coeﬀicient is 17.

(e) Fit a PCR model on the training set, with M chosen by cross-validation. Report the test error obtained, along with the valueof M selected by cross-validation.

set.seed(1)
pcr.fit <- pcr(Apps ~ ., data = College, subset = train, scale = TRUE, validation = "CV")

validationplot(pcr.fit, val.type = "MSEP", main = "PCR: CV MSEP")

optimal.pcr <- which.min(pcr.fit$validation$PRESS) - 1 
pcr.pred <- predict(pcr.fit, newdata = College.test, ncomp = optimal.pcr)
test.error.pcr <- mean((College.test$Apps - pcr.pred)^2)
test.error.pcr

[1] 1137877

optimal.pcr

[1] 16

🔴 Answer: The MSE obtained from PCR is 1137877. Optimal number of components in PCR is 16.

(f) Fit a PLS model on the training set, with M chosen by cross-validation. Report the test error obtained, along with the value of M selected by cross-validation.

set.seed(1)
pls.fit <- plsr(Apps ~ ., data = College, subset = train, scale = TRUE, validation = "CV")
validationplot(pls.fit, val.type = "MSEP", main = "PLS: CV MSEP")

optimal.pls <- which.min(pls.fit$validation$PRESS) - 1
pls.pred <- predict(pls.fit, newdata = College.test, ncomp = optimal.pls)
test.error.pls <- mean((College.test$Apps - pls.pred)^2)
test.error.pls

[1] 1135812

optimal.pls

[1] 16

🔴 Answer: The MSE obtained from PLS is 1135812. Optimal number of components in PCR is 16.

(g) Comment on the results obtained. How accurately can we predict the number of college applications received? Is there much difference among the test errors resulting from these five approaches?

sqrt(1115901)

[1] 1056.362

🔴 Answer: Overall, the test errors from the five methods are very similar. None of the methods substantially outperforms the others in predicting the number of college applications received. Lasso has the lowest MSE. For how accurately can we predict the number of college applications received for Lasso model, the RMSE is ~1056, which means the prediction error is about 1056 applications. (If a college actually received 20000 applications, the Lasso model’s prediction might be around 20000 ± 1056 applications.)

Exercise 11

We will now try to predict per capita crime rate in the Boston data set.

(a) Try out some of the regression methods explored in this chapter, such as best subset selection, the lasso, ridge regression, and PCR. Present and discuss results for the approaches that you consider.

data(Boston)
set.seed(1)

# Best Subset Selection with 10-fold CV
predict.regsubsets <- function(object, newdata, id, ...) {
  form <- as.formula(object$call[[2]])
  mat <- model.matrix(form, newdata)
  coefi <- coef(object, id = id)
  mat[, names(coefi)] %*% coefi
}

k <- 10
folds <- sample(rep(1:k, length = nrow(Boston)))
p <- ncol(Boston) - 1  
cv.errors <- matrix(NA, k, p)

for (i in 1:k) {
  best.fit <- regsubsets(crim ~ ., data = Boston[folds != i, ], nvmax = p)
  for (j in 1:p) {
    pred <- predict.regsubsets(best.fit, Boston[folds == i, ], id = j)
    cv.errors[i, j] <- mean((Boston$crim[folds == i] - pred)^2)
  }
}

mean.cv.errors <- apply(cv.errors, 2, mean)
best.num <- which.min(mean.cv.errors)
cat("Best Subset Selection: Best number of predictors =", best.num, "\n")

Best Subset Selection: Best number of predictors = 9

cat("Minimum CV Error for Best Subset:", mean.cv.errors[best.num], "\n\n")

Minimum CV Error for Best Subset: 42.81453

# Ridge Regression with cv.glmnet
x <- model.matrix(crim ~ ., Boston)[, -1]
y <- Boston$crim

set.seed(1)
cv.ridge <- cv.glmnet(x, y, alpha = 0)
ridge.mse <- min(cv.ridge$cvm)
cat("Ridge Regression CV MSE:", ridge.mse, "\n\n")

Ridge Regression CV MSE: 42.71472

# Lasso Regression with cv.glmnet
set.seed(1)
cv.lasso <- cv.glmnet(x, y, alpha = 1)
lasso.mse <- min(cv.lasso$cvm)
cat("Lasso Regression CV MSE:", lasso.mse, "\n")

Lasso Regression CV MSE: 42.51513

lasso.coefs <- coef(cv.lasso, s = cv.lasso$lambda.min)
nonzero.coefs <- sum(lasso.coefs != 0) - 1
cat("Number of predictors selected by Lasso:", nonzero.coefs, "\n\n")

Number of predictors selected by Lasso: 11

# Principal Components Regression (PCR) with pls package
set.seed(1)
pcr.fit <- pcr(crim ~ ., data = Boston, scale = TRUE, validation = "CV")
validationplot(pcr.fit, val.type = "MSEP", main = "PCR: CV RMSEP")

pcr.cv <- RMSEP(pcr.fit, estimate = "CV")
pcr.rmsep <- pcr.cv$val[1, 1, ]
optimal.components <- which.min(pcr.rmsep) - 1
cat("Optimal number of components for PCR:", optimal.components, "\n")

Optimal number of components for PCR: 13

cat("PCR CV MSEP:", min(pcr.rmsep), "\n")

PCR CV MSEP: 6.623616

(b) Propose a model (or set of models) that seem to perform well on this data set, and justify your answer. Make sure that you are evaluating model performance using validation set error, cross- validation, or some other reasonable alternative, as opposed to using training error.

🔴 Answer: I propose using the lasso model. Although the cross-validated MSEs from all methods are very similar, the lasso achieves the lowest CV MSE.

(c) Does your chosen model involve all of the features in the data set? Why or why not?

🔴 Answer: The lasso model does not involve all of the features. It selected 11 predictors out of the 13 available. Coefficients of some predictors that are zero were excluded to simplify the model and reduce overfitting.