Problem 2

For parts (a) through (c), indicate which of i. through iv. is correct. Justify your answer.

(a) The lasso, relative to least squares, is:
i. More flexible and hence will give improved prediction accuracy when its increase in bias is less than its decrease in variance.
ii. More flexible and hence will give improved prediction accuracy when its increase in variance is less than its decrease in bias.
iii. Less flexible and hence will give improved prediction accuracy when its increase in bias is less than its decrease in variance.
iv. Less flexible and hence will give improved prediction accuracy when its increase in variance is less than its decrease in bias.
(b) Repeat (a) for ridge regression relative to least squares.
(c) Repeat (a) for non-linear methods relative to least squares.

(a): iii. The lasso constrains the coefficient estimates (shrinking some exactly to zero), so it is less flexible than least squares. It accepts an increase in bias in exchange for a decrease in variance, and it improves prediction accuracy when that increase in bias is smaller than the decrease in variance (see the short sketch below).
(b): iii. Ridge regression is similar to the lasso, but it shrinks coefficients toward zero rather than setting them exactly to zero. It is likewise less flexible than least squares and makes the same trade-off: a small increase in bias for a larger reduction in variance.
(c): ii. Non-linear methods are more flexible than least squares and give improved prediction accuracy when their increase in variance is smaller than their decrease in bias.
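
As a small standalone illustration of the flexibility argument (a sketch on simulated data, not part of the assignment): as the penalty λ grows, the lasso pulls the coefficient path toward zero, i.e. the fit becomes progressively less flexible than least squares.

library(glmnet)
set.seed(1)
n=100; p=10
x=matrix(rnorm(n*p), n, p)            # simulated predictors
y=drop(x %*% rnorm(p)) + rnorm(n)     # simulated response
fit=glmnet(x, y, alpha=1)             # lasso path over a sequence of lambda values
plot(fit, xvar="lambda", label=TRUE)  # coefficients shrink toward zero as lambda increases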

Problem 9

In this exercise, we will predict the number of applications received using the other variables in the College data set.

library(ISLR2)
attach(College)

(a) Split the data set into a training set and a test set.

set.seed(1)
train=sample(dim(College)[1], dim(College)[1]/2)  # randomly choose half of the rows for training
test=-train                                       # negative indices give the remaining rows

(b) Fit a linear model using least squares on the training set, and report the test error obtained.

lm.fit=lm(Apps~.,data=College[train,])
lm.pred=predict(lm.fit,newdata = College[test,])
mean((lm.pred-College[test,"Apps"])^2)
## [1] 1135758

The least squares model has a test MSE of 1135758.

(c) Fit a ridge regression model on the training set, with λ chosen by cross-validation. Report the test error obtained.

library(glmnet)
set.seed(1)
grid=10^seq(10,-2,length=100)                # grid of candidate lambda values
x=model.matrix(Apps~.,data=College[train,])  # design matrix for glmnet
ridge.cv=cv.glmnet(x,y=College[train,"Apps"],alpha=0,lambda=grid)  # alpha=0 fits ridge regression
ridge.cv$lambda.min
## [1] 0.01
x.test=model.matrix(Apps~.,data=College[test,])
ridge.pred=predict(ridge.cv, x.test, s=ridge.cv$lambda.min)
mean((College[test,"Apps"]-ridge.pred)^2)
## [1] 1134677

Cross-validation selects λ = 0.01, the smallest value in the grid, so the fitted ridge model is close to least squares; its test error of 1134677 is only a slight improvement over the linear model.

(d) Fit a lasso model on the training set, with λ chosen by cross-validation. Report the test error obtained, along with the number of non-zero coefficient estimates.

set.seed(1)
lasso.cv=cv.glmnet(x,y=College[train,"Apps"],alpha=1,lambda=grid)  # alpha=1 fits the lasso
lasso.cv$lambda.min
## [1] 0.01
lasso.pred=predict(lasso.cv, x.test, s=lasso.cv$lambda.min)
mean((College[test,"Apps"]-lasso.pred)^2)
## [1] 1133422
lasso.coef=predict(lasso.cv,type='coefficients',lasso.cv$lambda.min)
lasso.coef[lasso.coef!=0]
## [1] -215.425105    1.368687   19.318477

The test error of 1133422 is a further slight improvement over ridge regression. There are 3 non-zero coefficient estimates.
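
As a small sketch, the non-zero estimates can be counted and named directly from the sparse coefficient matrix returned above (note that the count includes the intercept row):

sum(lasso.coef!=0)                              # number of non-zero entries, including the intercept
rownames(lasso.coef)[as.vector(lasso.coef!=0)]  # names of the retained terms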

(e) Fit a PCR model on the training set, with M chosen by cross-validation. Report the test error obtained, along with the value of M selected by cross-validation.

library(pls)
set.seed(1)
pcr.fit=pcr(Apps~.,data=College[train,],scale=TRUE,validation='CV')
validationplot(pcr.fit,val.type="MSEP")

pcr.pred=predict(pcr.fit, College[test,], ncomp=17)
mean((College[test,"Apps"]-pcr.pred)^2)
## [1] 1135758

The validation plot points to using all M = 17 components, so PCR performs no dimension reduction here and its test error of 1135758 is identical to the least squares fit.
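
Instead of reading M off the validation plot, the cross-validated errors can be inspected directly; a short sketch using the pls package's MSEP() accessor on the fit above (the first entry is the intercept-only model, hence the -1):

msep=MSEP(pcr.fit, estimate="CV")
which.min(msep$val["CV",1,])-1   # number of components minimising the cross-validated MSEP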

(f) Fit a PLS model on the training set, with M chosen by cross-validation. Report the test error obtained, along with the value of M selected by cross-validation.

set.seed(1)
pls.fit=plsr(Apps~.,data=College[train,],scale=TRUE,validation='CV')
validationplot(pls.fit,val.type="MSEP")

pls.pred=predict(pls.fit, College[test,], ncomp=6)
mean((College[test,"Apps"]-pls.pred)^2)
## [1] 1066991
detach(College)

Using M = 6 components (chosen from the validation plot), PLS gives a test error of 1066991, the lowest of the methods tried so far.

(g) Comment on the results obtained. How accurately can we predict the number of college applications received? Is there much difference among the test errors resulting from these five approaches?

There is not much difference among the test errors from the five approaches: ridge and the lasso improve only marginally on least squares, PCR with all 17 components matches it exactly, and PLS gives a modestly lower test MSE.
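
Collecting the test MSEs reported above (values copied from parts (b) through (f)) makes the comparison concrete:

test.mse=c(OLS=1135758, Ridge=1134677, Lasso=1133422, PCR=1135758, PLS=1066991)
barplot(test.mse, ylab="Test MSE")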

1-mean(abs(College[test,"Apps"]-pls.pred))/mean(College[test,"Apps"])  # 1 - (mean absolute error as a fraction of the mean response)
## [1] 0.7757627

The mean absolute error of the PLS predictions is about 22% of the average number of applications received, so by this rough measure we can predict the number of applications with about 77.6% accuracy.
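
Another common summary is the R² on the test set, i.e. the fraction of the variance in Apps explained by the PLS predictions; a short sketch using the objects defined above:

test.apps=College[test,"Apps"]
1-sum((test.apps-pls.pred)^2)/sum((test.apps-mean(test.apps))^2)  # test-set R^2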

Problem 11

We will now try to predict per capita crime rate in the Boston data set.

(a) Try out some of the regression methods explored in this chapter, such as best subset selection, the lasso, ridge regression, and PCR. Present and discuss results for the approaches that you consider.

library(MASS)
attach(Boston)
set.seed(1)
train=sample(dim(Boston)[1], dim(Boston)[1]/2)
test=-train

set.seed(1)
x=model.matrix(crim~.,data=Boston[train,])
ridge.cv=cv.glmnet(x,y=Boston[train,"crim"],alpha=0,lambda=grid)  # reuses the lambda grid defined in Problem 9
ridge.cv$lambda.min
## [1] 0.2848036
x.test=model.matrix(crim~.,data=Boston[test,])
ridge.pred=predict(ridge.cv, x.test, s=ridge.cv$lambda.min)
mean((Boston[test,"crim"]-ridge.pred)^2)
## [1] 41.11039
set.seed(1)
lasso.cv=cv.glmnet(x,y=Boston[train,"crim"],alpha=1,lambda=grid)
lasso.cv$lambda.min
## [1] 0.05336699
lasso.pred=predict(lasso.cv, x.test, s=lasso.cv$lambda.min)
mean((Boston[test,"crim"]-lasso.pred)^2)
## [1] 40.97871
lasso.coef=predict(lasso.cv,type='coefficients',lasso.cv$lambda.min)
lasso.coef[lasso.coef!=0]
## [1] 1.1928272 0.2657869 0.0183239
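
The problem also lists PCR as a candidate; below is a sketch of how it could be fit on the same split, following the pattern of Problem 9(e) (output not reproduced here, with M taken from the cross-validated error rather than read off the plot):

set.seed(1)
pcr.fit=pcr(crim~.,data=Boston[train,],scale=TRUE,validation='CV')
validationplot(pcr.fit,val.type="MSEP")
m=which.min(MSEP(pcr.fit, estimate="CV")$val["CV",1,])-1  # components minimising the CV error
pcr.pred=predict(pcr.fit, Boston[test,], ncomp=m)
mean((Boston[test,"crim"]-pcr.pred)^2)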

(b) Propose a model (or set of models) that seem to perform well on this data set, and justify your answer. Make sure that you are evaluating model performance using validation set error, cross-validation, or some other reasonable alternative, as opposed to using training error.

The lasso model looks like a good choice: its validation-set MSE (40.98) is slightly lower than that of ridge regression (41.11), and it retains only three non-zero coefficient estimates, giving a much simpler model.

(c) Does your chosen model involve all of the features in the data set? Why or why not?

The chosen lasso model does not involve all of the features: the lasso shrinks many coefficients exactly to zero, keeping only the predictors most useful for predicting per capita crime rate.