Assignment #5

Author

Christian Rivera

For parts (a) through (c), indicate which of i. through iv. is correct. Justify your answer.

(a) The lasso, relative to least squares, is:

iii
The lasso is less flexible than least squares: it shrinks the coefficient estimates toward zero and sets some of them exactly to zero. It gives improved prediction accuracy when the resulting decrease in variance outweighs the increase in bias.

(b) Repeat (a) for ridge regression relative to least squares.

iii
Ridge regression is likewise less flexible than least squares: it shrinks all of the coefficients toward zero, although none become exactly zero. It improves prediction accuracy when the reduction in variance exceeds the increase in bias.

(c) Repeat (a) for non-linear methods relative to least squares.

ii
Non-linear methods are more flexible than least squares. They reduce bias at the cost of higher variance, so they improve prediction accuracy when the decrease in bias outweighs the increase in variance.
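
A small illustrative sketch of the shrinkage behind (a) and (b), using simulated data (not part of the assignment): as λ grows, glmnet pulls the coefficients toward zero, trading a little bias for a reduction in variance. The lasso sets some coefficients exactly to zero, while ridge only shrinks them.

# Illustration only (simulated data): coefficient paths for lasso and ridge
library(glmnet)

set.seed(1)
n <- 100; p <- 10
X <- matrix(rnorm(n * p), n, p)
beta <- c(3, -2, 1.5, rep(0, p - 3))        # only three true signals
y.sim <- drop(X %*% beta + rnorm(n))

lasso.path <- glmnet(X, y.sim, alpha = 1)   # lasso: some coefficients hit exactly zero
ridge.path <- glmnet(X, y.sim, alpha = 0)   # ridge: coefficients shrink but stay non-zero

par(mfrow = c(1, 2))
plot(lasso.path, xvar = "lambda")
plot(ridge.path, xvar = "lambda")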

9. In this exercise, we will predict the number of applications received using the other variables in the College data set.

(a) Split the data set into a training set and a test set.

library(ISLR2)
set.seed(23)

x <- model.matrix(Apps ~ ., College)[, -1]
y <- College$Apps

train <- sample(1:nrow(x), nrow(x) / 2)
test <- -train
y.test <- y[test]

(b) Fit a linear model using least squares on the training set, and report the test error obtained.

LS.fit <- lm(Apps ~ ., data = College, subset = train)
pred <- predict(LS.fit, College[test, ])
mean((y.test - pred)^2)
[1] 1502889

(c) Fit a ridge regression model on the training set, with λ chosen by cross-validation. Report the test error obtained.

library(glmnet)
grid <- 10^seq(10, -2, length = 100)
ridge.mod <- glmnet(x[train, ], y[train], alpha = 0, lambda = grid)

cv.ridge <- cv.glmnet(x[train, ], y[train], alpha = 0)
bestlam.ridge <- cv.ridge$lambda.min

ridge.pred <- predict(ridge.mod, s = bestlam.ridge, newx = x[test, ])
mean((y.test - ridge.pred)^2)
[1] 2183465
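
The question also asks for the λ selected by cross-validation; it is stored above and can simply be printed (the exact value depends on the random CV folds, so no output is shown here):

# lambda chosen by cv.glmnet for the ridge fit
bestlam.ridge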

(d) Fit a lasso model on the training set, with λ chosen by cross validation. Report the test error obtained, along with the number of non-zero coefficient estimates.

lasso.mod <- glmnet(x[train, ], y[train], alpha = 1, lambda = grid)
cv.lasso <- cv.glmnet(x[train, ], y[train], alpha = 1)
bestlam <- cv.lasso$lambda.min
lasso.pred <- predict(lasso.mod, s = bestlam, newx = x[test, ])
mean((y.test - lasso.pred)^2)
[1] 1536580
coef(lasso.mod, s = bestlam)
18 x 1 sparse Matrix of class "dgCMatrix"
               s=16.13985
(Intercept) -8.062771e+02
PrivateYes  -5.798990e+02
Accept       1.254614e+00
Enroll       .           
Top10perc    2.590386e+01
Top25perc    .           
F.Undergrad  3.244117e-02
P.Undergrad -4.214541e-04
Outstate    -8.696287e-02
Room.Board   9.443607e-02
Books       -1.507894e-01
Personal     1.998897e-02
PhD         -6.220556e+00
Terminal    -8.399445e+00
S.F.Ratio    2.407169e+01
perc.alumni -4.605567e+00
Expend       1.809945e-01
Grad.Rate    9.764346e+00
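
Part (d) also asks for the number of non-zero coefficient estimates. From the matrix above, only Enroll and Top25perc are zeroed out, leaving 15 non-zero predictor coefficients; a quick count confirms this:

# number of non-zero coefficient estimates, excluding the intercept
lasso.coef <- as.numeric(coef(lasso.mod, s = bestlam))
sum(lasso.coef[-1] != 0)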

(e) Fit a PCR model on the training set, with M chosen by cross validation. Report the test error obtained, along with the value of M selected by cross-validation.

library(pls)
pcr.fit <- pcr(Apps ~ ., data = College[train, ], scale = TRUE, validation = "CV")
pcr.pred <- predict(pcr.fit, x[test, ], ncomp = 10)
mean((y.test - pcr.pred)^2)
[1] 2846437
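
The question also asks for the value of M chosen by cross-validation; the prediction above uses M = 10 components. The CV error for each number of components can be inspected from the fitted object to justify that choice, for example with (plot not shown):

# cross-validation error (MSEP) for each number of principal components
validationplot(pcr.fit, val.type = "MSEP")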

(f) Fit a PLS model on the training set, with M chosen by cross validation. Report the test error obtained, along with the value of M selected by cross-validation.

pls.fit <- plsr(Apps ~ ., data = College[train, ], scale = TRUE, validation = "CV")
pls.pred <- predict(pls.fit, x[test, ], ncomp = 9)
mean((y.test - pls.pred)^2)
[1] 1539409
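
Likewise, the cross-validation results behind the choice of M = 9 components for PLS can be read off the fitted object (output not shown):

# cross-validated RMSEP for each number of PLS components
summary(pls.fit)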

(g) Comment on the results obtained. How accurately can we predict the number of college applications received? Is there much difference among the test errors resulting from these five approaches?

Least squares gave the lowest test error (about 1.50 million), with the lasso and PLS close behind at roughly 1.54 million each. Ridge regression did noticeably worse here, and PCR with 10 components performed worst; the errors are collected side by side below. A test MSE near 1.5 million corresponds to a typical prediction error of roughly 1,200 applications, so the number of applications can be predicted reasonably accurately, but far from perfectly.
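
For ease of comparison, the five test errors computed above can be collected in one place (this just reuses the objects already created):

# gather the five test MSEs computed above
errs <- c(
  OLS   = mean((y.test - pred)^2),
  Ridge = mean((y.test - ridge.pred)^2),
  Lasso = mean((y.test - lasso.pred)^2),
  PCR   = mean((y.test - pcr.pred)^2),
  PLS   = mean((y.test - pls.pred)^2)
)
sort(errs)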

11. We will now try to predict per capita crime rate in the Boston data set.

(a) Try out some of the regression methods explored in this chapter, such as best subset selection, the lasso, ridge regression, and PCR. Present and discuss results for the approaches that you consider.

library(ISLR2)
library(glmnet)
library(pls)

set.seed(23)
x <- model.matrix(crim ~ ., Boston)[, -1]
y <- Boston$crim
train <- sample(1:nrow(x), nrow(x) / 2)
test <- -train
y.test <- y[test]

Least squares

LS.fit <- lm(crim ~ ., data = Boston, subset = train)
mean((y.test - predict(LS.fit, Boston[test, ]))^2)
[1] 43.39276

Ridge

grid <- 10^seq(10, -2, length = 100)
ridge.mod <- glmnet(x[train, ], y[train], alpha = 0, lambda = grid)
cv.ridge <- cv.glmnet(x[train, ], y[train], alpha = 0)
mean((y.test - predict(ridge.mod, s = cv.ridge$lambda.min, newx = x[test, ]))^2)
[1] 44.64073

Lasso

lasso.mod <- glmnet(x[train, ], y[train], alpha = 1, lambda = grid)
cv.lasso <- cv.glmnet(x[train, ], y[train], alpha = 1)
mean((y.test - predict(lasso.mod, s = cv.lasso$lambda.min, newx = x[test, ]))^2)
[1] 44.92273
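
Because the lasso can set coefficients exactly to zero, it is also worth checking which predictors it retains at the CV-selected λ (output not shown; the retained set depends on the seed):

# coefficients kept by the lasso at the CV-selected lambda
coef(lasso.mod, s = cv.lasso$lambda.min)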

PCR

pcr.fit <- pcr(crim ~ ., data = Boston[train, ], scale = TRUE, validation = "CV")
mean((y.test - predict(pcr.fit, x[test, ], ncomp = 10))^2)
[1] 45.69818

PLS

pls.fit <- plsr(crim ~ ., data = Boston[train, ], scale = TRUE, validation = "CV")
mean((y.test - predict(pls.fit, x[test, ], ncomp = 9))^2)
[1] 43.40717

(b) Propose a model (or set of models) that seem to perform well on this data set, and justify your answer. Make sure that you are evaluating model performance using validation set error, cross validation, or some other reasonable alternative, as opposed to using training error.

Least squares (test MSE ≈ 43.4) and PLS with 9 components (≈ 43.4) gave the lowest test errors, with ridge, the lasso, and PCR trailing by only one to two units of MSE. I would propose the least-squares model, or the nearly equivalent PLS model: both are simple and performed best here. Performance was judged on a held-out test set (validation-set MSE) rather than training error; a 10-fold cross-validation check is sketched below.
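
As an additional check beyond the single train/test split, the proposed least-squares model could also be scored with 10-fold cross-validation on the full data set, for example with cv.glm from the boot package (a sketch; the resulting estimate will differ somewhat from the split-based MSE above):

# 10-fold cross-validated MSE for the full least-squares model
library(boot)
glm.fit <- glm(crim ~ ., data = Boston)
cv.glm(Boston, glm.fit, K = 10)$delta[1]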

(c) Does your chosen model involve all of the features in the data set? Why or why not?

No. The chosen PLS model uses M = 9 components rather than all of the original predictors as separate terms; each component is a linear combination of the (standardized) features, so the model reduces dimensionality instead of keeping every predictor individually in the final fit.