2.

(a)

Option iii., the lasso, relative to least squares, is less flexible and hence will give improved prediction accuracy when its increase in bias is less than its decrease in variance. This is because the lasso can shrink coefficient estimates exactly to zero, removing those predictors from the model entirely; this decreases variance but may increase the bias of the coefficient estimates.
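
As a small illustration of this variable-removal behaviour (a sketch on simulated data, not part of the exercise; assumes the glmnet package), the lasso sets some coefficient estimates exactly to zero:

library(glmnet)
set.seed(1)
x <- matrix(rnorm(100 * 20), 100, 20)  # 20 predictors, only 2 truly matter
y <- x[, 1] - 2 * x[, 2] + rnorm(100)
coef(glmnet(x, y, alpha = 1, lambda = 0.5))  # most coefficients are exactly zero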

(b)

Again, option iii.: ridge regression, relative to least squares, is less flexible and hence will give improved prediction accuracy when its increase in bias is less than its decrease in variance. Ridge regression shrinks all coefficient estimates toward zero, with the greatest practical effect on variables weakly associated with the response, thereby increasing bias and decreasing variance.
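
By way of contrast (the same simulated setup as the sketch above, again only illustrative), ridge regression keeps every predictor in the model while shrinking its coefficient:

library(glmnet)
set.seed(1)
x <- matrix(rnorm(100 * 20), 100, 20)
y <- x[, 1] - 2 * x[, 2] + rnorm(100)
coef(glmnet(x, y, alpha = 0, lambda = 0.5))  # all coefficients shrunk toward zero, none exactly zero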

(c)

Option ii., non-linear methods, relative to least squares, are more flexible and hence will give improved prediction accuracy when their increase in variance is less than their decrease in bias. Because non-linear methods can follow the training data more closely, variance generally increases while bias decreases.
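
A toy illustration of this trade-off (simulated data, purely a sketch): when the true relationship is non-linear, a flexible fit reduces bias by more than it adds variance, and its test MSE comes out lower:

set.seed(1)
x <- runif(200, -3, 3)
y <- sin(x) + rnorm(200, sd = 0.3)
train <- sample(200, 100)
linFit <- lm(y ~ x, subset = train)            # rigid fit: high bias
polyFit <- lm(y ~ poly(x, 4), subset = train)  # flexible fit: lower bias, higher variance
mean((y[-train] - predict(linFit, data.frame(x = x[-train])))^2)
mean((y[-train] - predict(polyFit, data.frame(x = x[-train])))^2)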

9.

(a)

library(ISLR)
library(glmnet)
## Loading required package: Matrix
## Loaded glmnet 4.0-2
collegeDf <- College
set.seed(1)
# split the data into equal-sized training and test sets
train <- sample(nrow(collegeDf), nrow(collegeDf) / 2)
collegeDfTrain <- collegeDf[train, ]
collegeDfTest <- collegeDf[-train, ]

(b)

lm.fit <- lm(Apps ~ ., data = collegeDfTrain)
lm.pred <- predict(lm.fit, newdata = collegeDfTest)
mean((lm.pred - collegeDfTest$Apps)^2)
## [1] 1135758

The test MSE using a least squares linear regression model is \(1,135,758\).

(c)

trainMatrix <- model.matrix(Apps ~ ., data = collegeDfTrain)
testMatrix <- model.matrix(Apps ~ ., data = collegeDfTest)
grid <- 10 ^ seq(4, -2, length = 100)
ridge.fit <- glmnet(trainMatrix, collegeDfTrain$Apps, alpha = 0, lambda = grid, thresh = 1e-12)
ridge.cv <- cv.glmnet(trainMatrix, collegeDfTrain$Apps, alpha = 0, lambda = grid, thresh = 1e-12)
ridge.minLambda <- ridge.cv$lambda.min
ridge.minLambda
## [1] 0.01
ridge.pred <- predict(ridge.fit, s = ridge.minLambda, newx = testMatrix)
mean((ridge.pred - collegeDfTest$Apps)^2)
## [1] 1135714

The test MSE from the ridge regression model, \(1,135,714\), is slightly lower than that of least squares. The model uses \(\lambda = 0.01\), the smallest value in the grid, so it applies very little shrinkage and stays close to the least squares fit.

(d)

lasso.fit <- glmnet(trainMatrix, collegeDfTrain$Apps, alpha = 1, lambda = grid, thresh = 1e-12)
lasso.cv <- cv.glmnet(trainMatrix, collegeDfTrain$Apps, alpha = 1, lambda = grid, thresh = 1e-12)
lasso.minLambda <- lasso.cv$lambda.min
lasso.minLambda
## [1] 0.01
lasso.pred <- predict(lasso.fit, s = lasso.minLambda, newx = testMatrix)
mean((lasso.pred - collegeDfTest$Apps)^2)
## [1] 1135660

The test MSE for the lasso model is \(1,135,660\), slightly outperforming both the least squares and ridge regression models. The model again uses \(\lambda = 0.01\), the smallest value in the grid, so only very light shrinkage is applied.

(e)

library(pls)
## 
## Attaching package: 'pls'
## The following object is masked from 'package:stats':
## 
##     loadings
pcr.fit <- pcr(Apps ~., data = collegeDfTrain, scale = TRUE, validation = 'CV')
validationplot(pcr.fit, val.type = "MSEP")

pcr.pred <- predict(pcr.fit, collegeDfTest, ncomp = 10)
mean((pcr.pred - collegeDfTest$Apps)^2)
## [1] 1723100

The PCR model has the worst test MSE so far of \(1,723,100\). The model uses \(M = 10\).

(f)

pls.fit <- plsr(Apps ~., data = collegeDfTrain, scale = TRUE, validation = 'CV')
validationplot(pls.fit, val.type = "MSEP")

pls.pred <- predict(pls.fit, collegeDfTest, ncomp = 10)
mean((pls.pred - collegeDfTest$Apps)^2)
## [1] 1131661

The PLS model has the lowest test MSE out of any of the models, with a value of \(1,131,661\). The model uses \(M = 10\).

(g)

test.avg <- mean(collegeDfTest$Apps)
lm.r2 <- 1 - mean((lm.pred - collegeDfTest$Apps)^2) / mean((test.avg - collegeDfTest$Apps)^2)
ridge.r2 <- 1 - mean((ridge.pred - collegeDfTest$Apps)^2) / mean((test.avg - collegeDfTest$Apps)^2)
lasso.r2 <- 1 - mean((lasso.pred - collegeDfTest$Apps)^2) / mean((test.avg - collegeDfTest$Apps)^2)
pcr.r2 <- 1 - mean((pcr.pred - collegeDfTest$Apps)^2) / mean((test.avg - collegeDfTest$Apps)^2)
pls.r2 <- 1 - mean((pls.pred - collegeDfTest$Apps)^2) / mean((test.avg - collegeDfTest$Apps)^2)
lm.r2
## [1] 0.9015413
ridge.r2
## [1] 0.9015452
lasso.r2
## [1] 0.9015499
pcr.r2
## [1] 0.8506248
pls.r2
## [1] 0.9018965

Comparing the models shows some noticeable differences. The PLS model performs the best, with the highest \(R^2\) of \(0.9018965\) (explaining the most variance) and the lowest test MSE of \(1,131,661\). The least squares, ridge, and lasso regressions all perform very similarly, with \(R^2\) values of \(0.9015413\), \(0.9015452\), and \(0.9015499\) respectively, and test MSEs of \(1,135,758\), \(1,135,714\), and \(1,135,660\). Lastly, PCR performed the worst of all the models, with an \(R^2 = 0.8506248\) and a test MSE of \(1,723,100\).
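
For a side-by-side view, the results can be collected into one table (a sketch reusing the objects computed above; the name results is ours, not part of the exercise):

results <- data.frame(
    model = c("Least squares", "Ridge", "Lasso", "PCR", "PLS"),
    testMSE = c(mean((lm.pred - collegeDfTest$Apps)^2),
                mean((ridge.pred - collegeDfTest$Apps)^2),
                mean((lasso.pred - collegeDfTest$Apps)^2),
                mean((pcr.pred - collegeDfTest$Apps)^2),
                mean((pls.pred - collegeDfTest$Apps)^2)),
    r2 = c(lm.r2, ridge.r2, lasso.r2, pcr.r2, pls.r2))
results[order(results$testMSE), ]  # best model first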

11.

library(MASS)
library(leaps)
bostonDf <- Boston

(a)

Since regsubsets() does not provide a predict() method, we first define one:

predict.regsubsets <- function(object, newdata, id, ...) {
    # rebuild the model matrix from the formula stored in the regsubsets call
    form <- as.formula(object$call[[2]])
    mat <- model.matrix(form, newdata)
    # extract the coefficients of the id-variable model and compute fitted values
    coefi <- coef(object, id = id)
    xvars <- names(coefi)
    mat[, xvars] %*% coefi
}

k <- 10
set.seed(1)
folds <- sample(1:k, nrow(bostonDf), replace = TRUE)
cvErrors <- matrix(NA, k, 13, dimnames = list(NULL, paste(1:13)))
for (j in 1:k) {
    bestFit <- regsubsets(crim ~ ., data = bostonDf[folds != j, ], nvmax = 13)
    for (i in 1:13) {
        pred <- predict(bestFit, bostonDf[folds == j, ], id = i)
        cvErrors[j, i] <- mean((bostonDf$crim[folds == j] - pred)^2)
    }
}

meanCvErrors <- apply(cvErrors, 2, mean)
plot(meanCvErrors, type = "b", xlab = "Number of variables", ylab = "CV error")

min(meanCvErrors)
## [1] 42.46014
which.min(meanCvErrors)
## 12 
## 12

Best subset selection determines that a 12-variable model is best suited, with a cross-validated MSE of \(42.46014\).

set.seed(1)
lassoCvOut <- cv.glmnet(model.matrix(crim ~ ., bostonDf)[, -1], bostonDf$crim, alpha = 1, type.measure = "mse")
plot(lassoCvOut)

lassoCvOut$lambda.min
## [1] 0.05630926
lassoCvOut
## 
## Call:  cv.glmnet(x = model.matrix(crim ~ ., bostonDf)[, -1], y = bostonDf$crim,      type.measure = "mse", alpha = 1) 
## 
## Measure: Mean-Squared Error 
## 
##     Lambda Measure    SE Nonzero
## min 0.0563   42.52 13.53      11
## 1se 3.0758   55.33 17.14       1

The lasso regression performs slightly worse than the best subset selection model, with \(\lambda = 0.05630926\) and a cross-validated MSE of \(42.52\).
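
To check which predictors are dropped (the cv.glmnet output above reports 11 nonzero coefficients at the minimising \(\lambda\)), one can inspect the coefficients directly, e.g.:

coef(lassoCvOut, s = "lambda.min")  # dropped predictors appear as "."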

set.seed(1)
ridgeCvOut <- cv.glmnet(model.matrix(crim ~ ., bostonDf)[, -1], bostonDf$crim, alpha = 0, type.measure = "mse")
plot(ridgeCvOut)

ridgeCvOut$lambda.min
## [1] 0.5374992
ridgeCvOut
## 
## Call:  cv.glmnet(x = model.matrix(crim ~ ., bostonDf)[, -1], y = bostonDf$crim,      type.measure = "mse", alpha = 0) 
## 
## Measure: Mean-Squared Error 
## 
##     Lambda Measure    SE Nonzero
## min   0.54   42.71 13.71      13
## 1se  56.31   55.80 17.00      13

The ridge regression model performs the worst so far, with \(\lambda = 0.5374992\) and a cross-validated MSE of \(42.71\).

set.seed(1)
pcrFit <- pcr(crim ~ ., data = bostonDf, scale = TRUE, validation = "CV")
summary(pcrFit)
## Data:    X dimension: 506 13 
##  Y dimension: 506 1
## Fit method: svdpc
## Number of components considered: 13
## 
## VALIDATION: RMSEP
## Cross-validated using 10 random segments.
##        (Intercept)  1 comps  2 comps  3 comps  4 comps  5 comps  6 comps
## CV            8.61    7.250    7.253    6.833    6.815    6.826    6.847
## adjCV         8.61    7.245    7.247    6.825    6.803    6.818    6.838
##        7 comps  8 comps  9 comps  10 comps  11 comps  12 comps  13 comps
## CV       6.837    6.710    6.735     6.723     6.714     6.696     6.624
## adjCV    6.827    6.698    6.724     6.710     6.702     6.682     6.609
## 
## TRAINING: % variance explained
##       1 comps  2 comps  3 comps  4 comps  5 comps  6 comps  7 comps  8 comps
## X       47.70    60.36    69.67    76.45    82.99    88.00    91.14    93.45
## crim    30.69    30.87    39.27    39.61    39.61    39.86    40.14    42.47
##       9 comps  10 comps  11 comps  12 comps  13 comps
## X       95.40     97.04     98.46     99.52     100.0
## crim    42.55     42.78     43.04     44.13      45.4
6.624^2  # square the lowest CV RMSEP (13 components) to get the CV MSE
## [1] 43.87738

The PCR model has a cross-validated MSE of \(43.87738\) and selects all \(M = 13\) components; with every component retained, PCR is equivalent to the full least squares fit.

(b)

Of all the models fit, best subset selection performed the best, with a cross-validated MSE of \(42.46014\). For this reason, it is the model I would recommend.

(c)

The chosen model does not involve all of the features: it uses 12 of the 13 predictors, since the 12-variable model had a slightly lower cross-validated MSE than the full 13-variable model.
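
To see which predictor is excluded, one can refit best subset selection on the full data set and inspect the 12-variable model (a quick sketch; the name bestFull is ours):

bestFull <- regsubsets(crim ~ ., data = bostonDf, nvmax = 13)
coef(bestFull, 12)  # the single omitted predictor will not appear in this output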