For parts (a) through (c), indicate which of i. through iv.
is correct. Justify your answer. (a) The lasso,
relative to least squares, is:
i. More flexible and hence will give improved prediction
accuracy when its increase in bias is less than its decrease in
variance.
ii. More flexible and hence will give improved prediction
accuracy when its increase in variance is less than its decrease in
bias.
iii. Less flexible and hence will give improved prediction
accuracy when its increase in bias is less than its decrease in
variance.
iv. Less flexible and hence will give improved prediction
accuracy when its increase in variance is less than its decrease in
bias.
(b) Repeat (a) for ridge regression relative to least
squares.
(c) Repeat (a) for non-linear methods relative to least
squares.
(a): iii. The lasso shrinks coefficient estimates toward zero, setting some of
them exactly to zero, so it is less flexible than least squares. It accepts an
increase in bias in exchange for a larger decrease in variance, which improves
prediction accuracy when the variance reduction dominates.
(b): iii. Ridge regression is analogous to the lasso, except that it shrinks
coefficients toward zero without setting any of them exactly to zero. The same
trade-off applies: a small increase in bias for a larger reduction in variance.
(c): ii. Non-linear methods are more flexible than least squares and give
better prediction accuracy when their increase in variance is smaller than
their decrease in bias.
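For reference, the standard penalized objectives make the flexibility ranking concrete (these are the textbook formulas, not output from the code below): the lasso and ridge both minimize the least-squares criterion plus a penalty on the coefficients,

$$\hat{\beta}^{\text{lasso}} = \arg\min_{\beta} \sum_{i=1}^{n}\Big(y_i - \beta_0 - \sum_{j=1}^{p}\beta_j x_{ij}\Big)^2 + \lambda\sum_{j=1}^{p}|\beta_j|,$$

$$\hat{\beta}^{\text{ridge}} = \arg\min_{\beta} \sum_{i=1}^{n}\Big(y_i - \beta_0 - \sum_{j=1}^{p}\beta_j x_{ij}\Big)^2 + \lambda\sum_{j=1}^{p}\beta_j^2.$$

Increasing λ shrinks the coefficients toward zero (exactly to zero under the lasso's absolute-value penalty), which is the mechanism that lowers variance at the cost of some bias.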
In this exercise, we will predict the number of applications
received using the other variables in the College data
set.
library(ISLR2)
attach(College)
(a) Split the data set into a training set and a test set.
set.seed(1)
train=sample(nrow(College), nrow(College)/2)  # half of the rows for training
test=-train                                   # negative index gives the remaining rows
(b) Fit a linear model using least squares on the training set, and report the test error obtained.
lm.fit=lm(Apps~.,data=College[train,])
lm.pred=predict(lm.fit,newdata = College[test,])
mean((lm.pred-College[test,"Apps"])^2)
## [1] 1135758
The test MSE for least squares is 1135758 (an RMSE of roughly 1066 applications).
(c) Fit a ridge regression model on the training set, with λ chosen by cross-validation. Report the test error obtained.
library(glmnet)
set.seed(1)
grid=10^seq(10,-2,length=100)
x=model.matrix(Apps~.,data=College[train,])
ridge.cv=cv.glmnet(x,y=College[train,"Apps"],alpha=0,lambda=grid)
ridge.cv$lambda.min
## [1] 0.01
x.test=model.matrix(Apps~.,data=College[test,])
ridge.pred=predict(ridge.cv, x.test, s=ridge.cv$lambda.min)
mean((College[test,"Apps"]-ridge.pred)^2)
## [1] 1134677
A test MSE of 1134677 is a marginal improvement over the linear model. Note that the selected λ = 0.01 is the smallest value in the grid, so the ridge fit stays close to least squares; extending the grid below 10^-2 might find a better λ.
(d) Fit a lasso model on the training set, with λ chosen by cross-validation. Report the test error obtained, along with the number of non-zero coefficient estimates.
set.seed(1)
lasso.cv=cv.glmnet(x,y=College[train,"Apps"],alpha=1,lambda=grid)
lasso.cv$lambda.min
## [1] 0.01
lasso.pred=predict(lasso.cv, x.test, s=lasso.cv$lambda.min)
mean((College[test,"Apps"]-lasso.pred)^2)
## [1] 1133422
lasso.coef=predict(lasso.cv,type='coefficients',s=lasso.cv$lambda.min)
lasso.coef[lasso.coef!=0]
## [1] -215.425105 1.368687 19.318477
This test MSE of 1133422 is a slight further improvement over ridge regression. The filtered vector shows three non-zero entries, but it is unnamed and includes the (unpenalized) intercept, so the number of surviving predictors is one fewer; the sketch below lists them by name.
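Because subsetting a sparse coefficient matrix drops its row names, the printout above is easy to misread. A minimal sketch for listing the surviving predictors by name, assuming the lasso.coef object from above (no output shown, since it depends on the seed and data version):

nz=as.matrix(lasso.coef)          # dense (p+1) x 1 matrix, row names preserved
nz[nz!=0, , drop=FALSE]           # named non-zero entries, including the intercept
sum(nz[-1]!=0)                    # number of non-zero predictor coefficients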
(e) Fit a PCR model on the training set, with M chosen by cross-validation. Report the test error obtained, along with the value of M selected by cross-validation.
library(pls)
set.seed(1)
pcr.fit=pcr(Apps~.,data=College[train,],scale=TRUE,validation='CV')
validationplot(pcr.fit,val.type="MSEP")
pcr.pred=predict(pcr.fit, College[test,], ncomp=17)
mean((College[test,"Apps"]-pcr.pred)^2)
## [1] 1135758
Cross-validation selects M = 17, i.e., all of the components, so PCR performs no dimension reduction here and its test MSE of 1135758 matches least squares exactly.
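Rather than reading M off the validation plot, the CV-minimizing number of components can be extracted programmatically; a minimal sketch using the pls package's MSEP accessor on the pcr.fit object above:

cv.mse=MSEP(pcr.fit)$val["adjCV", , ]  # adjusted CV MSE for 0 through 17 components
which.min(cv.mse)-1                    # subtract 1: the first entry is the intercept-only model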
(f) Fit a PLS model on the training set, with M chosen by cross-validation. Report the test error obtained, along with the value of M selected by cross-validation.
set.seed(1)
pls.fit=plsr(Apps~.,data=College[train,],scale=TRUE,validation='CV')
validationplot(pls.fit,val.type="MSEP")
pls.pred=predict(pls.fit, College[test,], ncomp=6)
mean((College[test,"Apps"]-pls.pred)^2)
## [1] 1066991
detach(College)
Using M = 6 components, we get a test MSE of 1066991, the lowest of the five fits.
(g) Comment on the results obtained. How accurately can we predict the number of college applications received? Is there much difference among the test errors resulting from these five approaches?
Apart from PLS, which does noticeably better, there is little difference among the test errors of the five approaches.
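For a side-by-side view, the reported test MSEs can be collected into one vector (values copied from the runs above; they will shift under a different seed or data version):

errs=c(OLS=1135758, Ridge=1134677, Lasso=1133422, PCR=1135758, PLS=1066991)
sort(errs)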
1-mean(abs(College[test,"Apps"]-pls.pred))/mean(College[test,"Apps"])
## [1] 0.7757627
The mean absolute prediction error is about 22% of the average number of applications received, i.e., roughly 78% relative accuracy.
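The quantity above is one minus the relative mean absolute error, a rough yardstick rather than a standard metric. A minimal sketch of the more conventional test R-squared for the same PLS predictions, assuming pls.pred and the split are still in scope:

apps.test=College[test,"Apps"]
1-sum((apps.test-pls.pred)^2)/sum((apps.test-mean(apps.test))^2)  # test R^2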
We will now try to predict per capita crime rate in the
Boston data set.
(a) Try out some of the regression methods explored in this
chapter, such as best subset selection, the lasso, ridge regression, and
PCR. Present and discuss results for the approaches that you
consider.
library(MASS)
attach(Boston)
set.seed(1)
train=sample(nrow(Boston), nrow(Boston)/2)  # same half/half split as before
test=-train
set.seed(1)
x=model.matrix(crim~.,data=Boston[train,])
ridge.cv=cv.glmnet(x,y=Boston[train,"crim"],alpha=0,lambda=grid)  # reuses the grid defined above
ridge.cv$lambda.min
## [1] 0.2848036
x.test=model.matrix(crim~.,data=Boston[test,])
ridge.pred=predict(ridge.cv, x.test, s=ridge.cv$lambda.min)
mean((Boston[test,"crim"]-ridge.pred)^2)
## [1] 41.11039
set.seed(1)
lasso.cv=cv.glmnet(x,y=Boston[train,"crim"],alpha=1,lambda=grid)
lasso.cv$lambda.min
## [1] 0.05336699
lasso.pred=predict(lasso.cv, x.test, s=lasso.cv$lambda.min)
mean((Boston[test,"crim"]-lasso.pred)^2)
## [1] 40.97871
lasso.coef=predict(lasso.cv,type='coefficients',s=lasso.cv$lambda.min)
lasso.coef[lasso.coef!=0]
## [1] 1.1928272 0.2657869 0.0183239
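As in part (d) above, this unnamed vector includes the intercept, so the same caveat about counting predictors applies. The exercise also mentions PCR; a minimal sketch on the same Boston split (the object name pcr.crim is ours, and no output is shown since the chosen M depends on the run):

set.seed(1)
pcr.crim=pcr(crim~.,data=Boston[train,],scale=TRUE,validation='CV')
validationplot(pcr.crim,val.type="MSEP")
# After choosing M at the CV minimum, compute the test MSE, e.g.:
# pcr.pred.b=predict(pcr.crim, Boston[test,], ncomp=M)  # M: hypothetical choice
# mean((Boston[test,"crim"]-pcr.pred.b)^2)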
(b) Propose a model (or set of models) that seem to perform well on this data set, and justify your answer. Make sure that you are evaluating model performance using validation set error, cross-validation, or some other reasonable alternative, as opposed to using training error.
The lasso model looks like a good choice: its test MSE (40.98) is slightly lower than ridge's (41.11), and it retains only a few non-zero coefficients, giving a sparser and more interpretable model.
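A hedged sketch of refitting the proposed lasso model on the full Boston data, with λ re-chosen by cross-validation over glmnet's default path (the object names here are ours):

set.seed(1)
x.all=model.matrix(crim~.,data=Boston)
lasso.final=cv.glmnet(x.all, y=Boston$crim, alpha=1)
predict(lasso.final, type='coefficients', s=lasso.final$lambda.min)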
(c) Does your chosen model involve all of the features in the data set? Why or why not?
No. The model does not involve all of the features: the lasso penalty shrinks many coefficients exactly to zero, removing those predictors from the model.