- For parts (a) through (c), indicate which of i. through iv. is
correct. Justify your answer.
- The lasso, relative to least squares, is:
- i. More flexible and hence will give improved prediction accuracy when
its increase in bias is less than its decrease in variance.
- ii. More flexible and hence will give improved prediction accuracy when
its increase in variance is less than its decrease in bias.
- iii. Less flexible and hence will give improved prediction accuracy when
its increase in bias is less than its decrease in variance.
- iv. Less flexible and hence will give improved prediction accuracy when
its increase in variance is less than its decrease in bias.
## (iii) is the correct answer. The lasso is less flexible than least squares, but it gives
## better prediction accuracy when its increase in bias is less than its decrease in variance.
## The lasso reduces model flexibility by adding an L1 penalty to the least squares objective,
## which shrinks coefficients and sets some of them exactly to zero.
- Repeat (a) for ridge regression relative to least squares.
## (iii) is the correct answer.
## Ridge regression adds an L2 penalty that shrinks coefficients towards zero but
## does not set any of them exactly to zero.
## Like the lasso, ridge is less flexible than least squares: it reduces variance at the cost of some bias.
- Repeat (a) for non-linear methods relative to least squares.
## (ii) is the correct answer.
## Non-linear methods are more flexible than linear regression:
## they have lower bias but higher variance.
## Prediction accuracy improves when the decrease in bias is larger than the increase in variance.
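As a small illustration of the contrast in (a) and (b), the following sketch (simulated data, not part of the exercise) shows that the lasso sets some coefficients exactly to zero while ridge regression only shrinks them towards zero:

# Illustrative sketch: many irrelevant predictors, only two matter
library(glmnet)
set.seed(1)
n <- 100; p <- 20
x <- matrix(rnorm(n * p), n, p)
y <- x[, 1] + 0.5 * x[, 2] + rnorm(n)
ridge.demo <- cv.glmnet(x, y, alpha = 0)   # ridge: L2 penalty
lasso.demo <- cv.glmnet(x, y, alpha = 1)   # lasso: L1 penalty
ridge.coefs <- as.matrix(coef(ridge.demo, s = "lambda.min"))[-1, 1]  # drop intercept
lasso.coefs <- as.matrix(coef(lasso.demo, s = "lambda.min"))[-1, 1]
sum(ridge.coefs == 0)   # typically 0: ridge keeps every predictor
sum(lasso.coefs == 0)   # typically > 0: the lasso drops some predictors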
- In this exercise, we will predict the number of applications
received using the other variables in the College data set.
- Split the data set into a training set and a test set.
library(ISLR2) #load necessary library
set.seed(10) # ensuring repeatability of the outcome
# Splitting data into train and test data sets
train <- sample(1:nrow(College), nrow(College)/2) # creating train data set
test <- (-train) # Creating test data set
College.train <- College[train, ]
College.test <- College[test, ]
- Fit a linear model using least squares on the training set, and
report the test error obtained.
lm.fit <- lm(Apps ~ ., data = College.train)
lm.pred <- predict(lm.fit, College.test)
mean((College.test$Apps - lm.pred)^2)
## [1] 1020100
## Test error obtained: 1020100
- Fit a ridge regression model on the training set, with λ chosen by
cross-validation. Report the test error obtained.
library(glmnet)
## Warning: package 'glmnet' was built under R version 4.4.3
## Loading required package: Matrix
## Loaded glmnet 4.1-8
train.mat <- model.matrix(Apps ~ ., data = College.train)
test.mat <- model.matrix(Apps ~ ., data = College.test)
y.train <- College.train$Apps
y.test <- College.test$Apps
grid <- 10^seq(10, -2, length = 100)
ridge.mod <- glmnet(train.mat, y.train, alpha = 0, lambda = grid)
cv.out <- cv.glmnet(train.mat, y.train, alpha = 0)
bestlam <- cv.out$lambda.min
ridge.pred <- predict(ridge.mod, s = bestlam, newx = test.mat)
mean((ridge.pred - y.test)^2)
## [1] 985020.1
## Test error: 985020.1
- Fit a lasso model on the training set, with λ chosen by
crossvalidation. Report the test error obtained, along with the number
of non-zero coefficient estimates.
lasso.mod <- glmnet(train.mat, y.train, alpha = 1, lambda = grid)
cv.out <- cv.glmnet(train.mat, y.train, alpha = 1)
bestlam <- cv.out$lambda.min
lasso.pred <- predict(lasso.mod, s = bestlam, newx = test.mat)
mean((lasso.pred - y.test)^2)
## [1] 1008145
lasso.coef <- predict(lasso.mod, type = "coefficients", s = bestlam)
sum(lasso.coef != 0)
## [1] 16
## Test error: 1008145
## non-zero coefficient: 16
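To see which predictors the lasso keeps, the non-zero entries of the coefficient vector can be listed directly (a small sketch reusing the lasso.coef object computed above):

# Sketch: list the non-zero lasso coefficients
lasso.coef.dense <- as.matrix(lasso.coef)      # convert sparse column matrix to dense
keep <- abs(lasso.coef.dense[, 1]) > 0         # non-zero rows (intercept and predictors)
lasso.coef.dense[keep, , drop = FALSE]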
- Fit a PCR model on the training set, with M chosen by
crossvalidation. Report the test error obtained, along with the value of
M selected by cross-validation.
library(pls)
## Warning: package 'pls' was built under R version 4.4.3
##
## Attaching package: 'pls'
## The following object is masked from 'package:stats':
##
## loadings
set.seed(1)
pcr.fit <- pcr(Apps ~ ., data = College.train, scale = TRUE, validation = "CV")
validationplot(pcr.fit, val.type = "MSEP")

summary(pcr.fit)
## Data: X dimension: 388 17
## Y dimension: 388 1
## Fit method: svdpc
## Number of components considered: 17
##
## VALIDATION: RMSEP
## Cross-validated using 10 random segments.
## (Intercept) 1 comps 2 comps 3 comps 4 comps 5 comps 6 comps
## CV 4347 4335 2390 2401 2112 1954 1914
## adjCV 4347 4335 2386 2401 2085 1949 1905
## 7 comps 8 comps 9 comps 10 comps 11 comps 12 comps 13 comps
## CV 1910 1879 1871 1867 1867 1875 1894
## adjCV 1902 1862 1863 1860 1859 1867 1887
## 14 comps 15 comps 16 comps 17 comps
## CV 1853 1634 1323 1286
## adjCV 1934 1586 1310 1273
##
## TRAINING: % variance explained
## 1 comps 2 comps 3 comps 4 comps 5 comps 6 comps 7 comps 8 comps
## X 32.6794 56.94 64.38 70.61 76.27 80.97 84.48 87.54
## Apps 0.9148 71.17 71.36 79.85 81.49 82.73 82.79 83.70
## 9 comps 10 comps 11 comps 12 comps 13 comps 14 comps 15 comps
## X 90.50 92.89 94.96 96.81 97.97 98.73 99.39
## Apps 83.86 84.08 84.11 84.11 84.16 84.28 93.08
## 16 comps 17 comps
## X 99.86 100.00
## Apps 93.71 93.95
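The PCR test error is not computed above; a minimal sketch for obtaining it takes M as the number of components minimising the cross-validated RMSEP (per the table above that is M = 17, so a smaller M read off the validation plot could be substituted):

# Sketch: PCR test error for the CV-optimal number of components
pcr.cv.rmsep <- RMSEP(pcr.fit, estimate = "CV")
best.M <- which.min(pcr.cv.rmsep$val[1, 1, -1])   # drop the intercept-only entry
pcr.pred <- predict(pcr.fit, College.test, ncomp = best.M)
mean((College.test$Apps - pcr.pred)^2)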
- Fit a PLS model on the training set, with M chosen by
crossvalidation. Report the test error obtained, along with the value of
M selected by cross-validation.
pls.fit <- plsr(Apps ~ ., data = College.train, scale = TRUE, validation = "CV")
validationplot(pls.fit, val.type = "MSEP")

summary(pls.fit)
## Data: X dimension: 388 17
## Y dimension: 388 1
## Fit method: kernelpls
## Number of components considered: 17
##
## VALIDATION: RMSEP
## Cross-validated using 10 random segments.
## (Intercept) 1 comps 2 comps 3 comps 4 comps 5 comps 6 comps
## CV 4347 2154 1836 1732 1620 1422 1314
## adjCV 4347 2148 1832 1724 1591 1397 1298
## 7 comps 8 comps 9 comps 10 comps 11 comps 12 comps 13 comps
## CV 1276 1264 1260 1266 1265 1262 1262
## adjCV 1264 1253 1250 1254 1253 1251 1250
## 14 comps 15 comps 16 comps 17 comps
## CV 1263 1263 1263 1263
## adjCV 1251 1251 1251 1252
##
## TRAINING: % variance explained
## 1 comps 2 comps 3 comps 4 comps 5 comps 6 comps 7 comps 8 comps
## X 24.27 38.72 62.64 65.26 69.01 73.96 78.86 82.18
## Apps 76.96 84.31 86.80 91.48 93.37 93.75 93.81 93.84
## 9 comps 10 comps 11 comps 12 comps 13 comps 14 comps 15 comps
## X 85.35 87.42 89.18 91.41 92.70 94.58 97.16
## Apps 93.88 93.91 93.93 93.94 93.95 93.95 93.95
## 16 comps 17 comps
## X 98.15 100.00
## Apps 93.95 93.95
pls.pred <- predict(pls.fit, College.test, ncomp = 5)
mean((College.test$Apps - pls.pred)^2)
## [1] 1129004
## test error: 1129004
## value of M: 5
- Comment on the results obtained. How accurately can we predict the
number of college applications received? Is there much difference among
the test errors resulting from these five approaches?
## All the methods yield similar test errors.
## Ridge regression performs best.
## Lasso is competitive and yields a sparse model (16 non-zero coefficients, counting the intercept).
## PCR and PLS have similar cross-validation errors, but require tuning the number of components.
## None of the methods dramatically outperforms the others, which means:
## The predictability of applications is moderate.
## There's no strong advantage to using one method over another in this case.
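For a side-by-side view, the test MSEs computed above can be collected into one vector (a sketch; it assumes the prediction objects are still in the workspace and omits PCR, whose test error is not reported above):

# Sketch: gather the test MSEs computed above for comparison
test.mse <- c(
  lm    = mean((College.test$Apps - lm.pred)^2),
  ridge = mean((y.test - ridge.pred)^2),
  lasso = mean((y.test - lasso.pred)^2),
  pls   = mean((College.test$Apps - pls.pred)^2)
)
sort(test.mse)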
- We will now try to predict per capita crime rate in the Boston data
set.
## Loading necessary libraries:
library(ISLR2)
library(leaps)
## Warning: package 'leaps' was built under R version 4.4.3
library(glmnet)
library(pls)
# splitting the data
data("Boston")
set.seed(10)
train <- sample(1:nrow(Boston), nrow(Boston) / 2)
test <- (-train)
x.train <- model.matrix(crim ~ ., Boston)[train, ]
x.test <- model.matrix(crim ~ ., Boston)[test, ]
y.train <- Boston$crim[train]
y.test <- Boston$crim[test]
- Try out some of the regression methods explored in this chapter,
such as best subset selection, the lasso, ridge regression, and PCR.
Present and discuss results for the approaches that you consider.
# Best subset selection
# Split data
set.seed(1)
train <- sample(1:nrow(Boston), nrow(Boston)/2)
test <- (-train)
boston.train <- Boston[train, ]
boston.test <- Boston[test, ]
# Fit best subset
regfit.best <- regsubsets(crim ~ ., data = boston.train, nvmax = 13)
test.mat <- model.matrix(crim ~ ., data = boston.test)
# Helper function for prediction
predict.regsubsets <- function(object, newdata, id) {
  form <- as.formula(object$call[[2]])
  mat <- model.matrix(form, newdata)
  coefi <- coef(object, id = id)
  vars <- names(coefi)
  mat[, vars] %*% coefi
}
# Adjust loop based on what was actually fitted
max.models <- dim(summary(regfit.best)$which)[1]
val.errors <- rep(NA, max.models)
for (i in 1:max.models) {
  pred <- predict.regsubsets(regfit.best, boston.test, id = i)
  val.errors[i] <- mean((boston.test$crim - pred)^2)
}
best.size <- which.min(val.errors)
cat("Best model size:", best.size, "\n")
## Best model size: 1
cat("Test MSE:", val.errors[best.size], "\n")
## Test MSE: 40.14557
## Best model size: 1
## Test MSE: 40.14557
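Part (b) below recommends the lasso, so here is a minimal sketch (not part of the output above) of inspecting the single predictor chosen by best subset selection and of fitting the lasso and ridge regression on the same set.seed(1) validation split, with lambda chosen by cross-validation:

# Which predictor the size-1 best subset model uses
coef(regfit.best, id = best.size)
# Sketch: lasso and ridge on the same train/test split
x.tr <- model.matrix(crim ~ ., data = boston.train)[, -1]
x.te <- model.matrix(crim ~ ., data = boston.test)[, -1]
cv.lasso.b <- cv.glmnet(x.tr, boston.train$crim, alpha = 1)
boston.lasso.pred <- predict(cv.lasso.b, s = "lambda.min", newx = x.te)
mean((boston.test$crim - boston.lasso.pred)^2)              # lasso test MSE
cv.ridge.b <- cv.glmnet(x.tr, boston.train$crim, alpha = 0)
boston.ridge.pred <- predict(cv.ridge.b, s = "lambda.min", newx = x.te)
mean((boston.test$crim - boston.ridge.pred)^2)              # ridge test MSE
predict(cv.lasso.b, type = "coefficients", s = "lambda.min")  # predictors kept by the lasso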
- Propose a model (or set of models) that seem to perform well on this
data set, and justify your answer. Make sure that you are evaluating
model performance using validation set error, crossvalidation, or some
other reasonable alternative, as opposed to using training error.
## Lasso regression is recommended:
## It produced a low test error on the validation split.
## It also performs automatic feature selection, which simplifies the model.
- Does your chosen model involve all of the features in the data set?
Why or why not?
## No, lasso does not use all features.
## Lasso introduces an L1 penalty which can shrink some coefficients exactly to zero.
## This allows the model to automatically discard irrelevant or redundant variables.
## This improves interpretability and often reduces overfitting, especially when some predictors are weak or highly correlated.