2. For parts (a) through (c), indicate which of i. through iv. is correct. Justify your answer.

  i. More flexible and hence will give improved prediction accuracy when its increase in bias is less than its decrease in variance.

  ii. More flexible and hence will give improved prediction accuracy when its increase in variance is less than its decrease in bias.

  iii. Less flexible and hence will give improved prediction accuracy when its increase in bias is less than its decrease in variance.

  iv. Less flexible and hence will give improved prediction accuracy when its increase in variance is less than its decrease in bias.

  (a) The lasso, relative to least squares, is:

“iii. Less flexible and hence will give improved prediction accuracy when its increase in bias is less than its decrease in variance.” - The penalty that the lasso imposes on the regression coefficients shrinks them toward zero and sets some of them exactly to zero. This constraint makes the model less flexible than least squares: the fit has lower variance at the cost of some added bias, so prediction improves when the reduction in variance outweighs the increase in bias.

  (b) Repeat (a) for ridge regression relative to least squares.

“iii. Less flexible and hence will give improved prediction accuracy when its increase in bias is less than its decrease in variance.” - Like the lasso, ridge regression imposes a penalty that shrinks coefficients, making the model less flexible than least squares. The difference is that ridge regression shrinks coefficients toward zero without setting any of them exactly to zero.
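The contrast between the two penalties in (a) and (b) can be sketched in the simplified special case where the design matrix is the identity and there is no intercept, so each coefficient is shrunk separately. The least squares values and the value of `lambda` below are hypothetical, chosen only for illustration.

```r
# Hypothetical least squares estimates (illustrative values only)
beta_ols <- c(-3, -0.4, 0.2, 1.5)
lambda <- 1

# Ridge: every coefficient is scaled by 1 / (1 + lambda),
# so a non-zero estimate stays non-zero
beta_ridge <- beta_ols / (1 + lambda)

# Lasso: soft-thresholding at lambda / 2,
# so estimates with |beta| <= lambda / 2 become exactly zero
beta_lasso <- sign(beta_ols) * pmax(abs(beta_ols) - lambda / 2, 0)

print(rbind(ols = beta_ols, ridge = beta_ridge, lasso = beta_lasso))
```

In this sketch the two small coefficients are zeroed by the lasso but only halved by ridge, which is the behaviour described in the answers above.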

  (c) Repeat (a) for non-linear methods relative to least squares.

“ii. More flexible and hence will give improved prediction accuracy when its increase in variance is less than its decrease in bias.” - Non-linear methods are more flexible than least squares, as they can model more complex relationships. This typically reduces bias and increases variance, so prediction accuracy improves when the reduction in bias outweighs the increase in variance.
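The reasoning in all three parts rests on the standard bias-variance decomposition of the expected test error at a point $x_0$. The notation below ($\hat{f}$ for the fitted model, $\varepsilon$ for the irreducible noise, $\Delta$ for the change when switching methods) is an assumption of this sketch.

```latex
\[
\mathbb{E}\big[(y_0 - \hat{f}(x_0))^2\big]
  = \operatorname{Var}\big(\hat{f}(x_0)\big)
  + \big[\operatorname{Bias}\big(\hat{f}(x_0)\big)\big]^2
  + \operatorname{Var}(\varepsilon)
\]
\[
\Delta\,\text{MSE}
  = \Delta\operatorname{Var}\big(\hat{f}(x_0)\big)
  + \Delta\big[\operatorname{Bias}\big(\hat{f}(x_0)\big)\big]^2
\]
```

Since $\operatorname{Var}(\varepsilon)$ does not depend on the method, a less flexible method helps when the drop in variance exceeds the rise in squared bias, and a more flexible method helps when the drop in squared bias exceeds the rise in variance.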

9. In this exercise, we will predict the number of applications received using the other variables in the College data set.

library(ISLR2)
## Warning: package 'ISLR2' was built under R version 4.4.3
data("College")
  (a) Split the data set into a training set and a test set.
set.seed(1)
train_coll <- sample(1:nrow(College),nrow(College)/2)
train.coll<-College[train_coll,]
test.coll<-College[-train_coll,]
  (b) Fit a linear model using least squares on the training set, and...
college_lm.fit<-lm(Apps~.,data=train.coll)
college_pred<-predict(college_lm.fit,newdata=test.coll)
mean((test.coll$Apps-college_pred)^2)
## [1] 1135758

...report the test error obtained.

The linear model fit by least squares on the training set gives a test MSE (mean squared error) of 1,135,758.

  (c) Fit a ridge regression model on the training set, with λ chosen by cross-validation.
library(glmnet)
## Loading required package: Matrix
## Loaded glmnet 4.1-8
x_train.college<-model.matrix(Apps~.,train.coll)[,-1]
x_test.college<-model.matrix(Apps~.,test.coll)[,-1]
y_train.college<-train.coll$Apps
ridge_college<-cv.glmnet(x_train.college,y_train.college,alpha=0)
best.ridge_college<-ridge_college$lambda.min
ridge.pred_college<-predict(ridge_college,s=best.ridge_college,newx=x_test.college)
mean((test.coll$Apps-ridge.pred_college)^2)
## [1] 976261.5

Report the test error obtained.

The ridge regression model fit on the training set, with λ chosen by cross-validation, gives a test MSE of 976,261.5.

  (d) Fit a lasso model on the training set, with λ chosen by cross-validation.
cv_college<-cv.glmnet(x_train.college,y_train.college,alpha=1)
best.lasso_college<-cv_college$lambda.min
lasso.pred_college<-predict(cv_college,s=best.lasso_college,newx=x_test.college)
mean((test.coll$Apps-lasso.pred_college)^2)
## [1] 1115901
lasso.coef_college<-predict(cv_college,s=best.lasso_college,type="coefficients")
sum(lasso.coef_college!=0)
## [1] 18

Report the test error obtained, along with the number of non-zero coefficient estimates.

The lasso model with λ chosen by cross-validation produced a test MSE of 1,115,901 with 18 non-zero coefficient estimates. That count includes the intercept, so all 17 predictors are retained at this value of λ.

  (e) Fit a PCR model on the training set, with M chosen by cross-validation.
library(pls)
## Warning: package 'pls' was built under R version 4.4.3
## 
## Attaching package: 'pls'
## The following object is masked from 'package:stats':
## 
##     loadings
set.seed(1)
pcr.fit_college<-pcr(Apps~.,data=train.coll,scale=TRUE,validation="CV")
validationplot(pcr.fit_college,val.type="MSEP")

which.min(pcr.fit_college$validation$PRESS)
## [1] 17
pcr.pred_college<-predict(pcr.fit_college,newdata=test.coll,ncomp=which.min(pcr.fit_college$validation$PRESS))
mean((test.coll$Apps-pcr.pred_college)^2)
## [1] 1135758

Report the test error obtained, along with the value of M selected by cross-validation.

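PCR (principal components regression) with cross-validation selects M = 17 components, with a resulting test MSE of 1,135,758. Because M = 17 equals the number of predictors, no dimension reduction takes place and the PCR fit coincides with least squares, which is why the two test errors are identical.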
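The equivalence between PCR with every component and least squares can be checked by hand. The sketch below uses the built-in `mtcars` data and four arbitrarily chosen predictors as a stand-in (an assumption for illustration, not the College data), and builds PCR from `prcomp()` and `lm()` rather than from the `pls` package.

```r
# Illustrative stand-in data: predict mpg from four standardized predictors
X <- scale(as.matrix(mtcars[, c("disp", "hp", "wt", "qsec")]))
y <- mtcars$mpg

# Least squares on the standardized predictors
ols_fit <- fitted(lm(y ~ X))

# PCR by hand: regress y on principal component scores
pc <- prcomp(X, center = FALSE, scale. = FALSE)
pcr_fit_full <- fitted(lm(y ~ pc$x))          # M = 4, every component
pcr_fit_two  <- fitted(lm(y ~ pc$x[, 1:2]))   # M = 2, a real reduction

# With all components, the fitted values match least squares
print(all.equal(unname(ols_fit), unname(pcr_fit_full)))

# With fewer components, the training residual sum of squares cannot be smaller
print(c(full = sum((y - pcr_fit_full)^2), two = sum((y - pcr_fit_two)^2)))
```

The full set of component scores spans the same column space as the original predictors, so the two regressions produce the same fitted values; only a choice of M below the number of predictors changes the model.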

  (f) Fit a PLS model on the training set, with M chosen by cross-validation.
set.seed(1)
pls.fit_college<-plsr(Apps~.,data=train.coll,scale=TRUE,validation="CV")
validationplot(pls.fit_college,val.type="MSEP")

pls.min_college<-which.min(pls.fit_college$validation$PRESS)
pls.pred_college<-predict(pls.fit_college,newdata=test.coll,ncomp=pls.min_college)
mean((test.coll$Apps-pls.pred_college)^2)
## [1] 1135758

Report the test error obtained, along with the value of M selected by cross-validation.

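The PLS model fit on the training set, with M chosen by cross-validation, gives a test MSE of 1,135,758. The selected M is stored in pls.min_college; the test error matches the least squares value to the printed precision.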

  (g) Comment on the results obtained. How accurately can we predict the number of college applications received? Is there much difference among the test errors resulting from these five approaches?

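Ridge regression gives the lowest test MSE (976,261.5), about 14% below least squares (1,135,758). The lasso (1,115,901) improves on least squares by less than 2% and, with 18 non-zero coefficient estimates including the intercept, keeps every predictor, so it does not simplify the model here. PCR and PLS both return the least squares test error of 1,135,758. Taking square roots, the test RMSE is roughly 1,000 applications (about 990 for ridge and about 1,070 for least squares), so apart from ridge there is little difference among the five approaches.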
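The comparison can be made concrete by collecting the five test errors reported above into one vector. This is a minimal sketch: the numbers are copied from the output in parts (b) through (f), and the vector and its names are introduced here for illustration.

```r
# Test MSEs copied from the output above
test_mse <- c(least_squares = 1135758,
              ridge         = 976261.5,
              lasso         = 1115901,
              pcr           = 1135758,
              pls           = 1135758)

# Root mean squared error, in units of applications
rmse <- sqrt(test_mse)

# Percentage change in test MSE relative to least squares
pct_vs_ols <- 100 * (test_mse / test_mse["least_squares"] - 1)

print(round(rmse))
print(round(pct_vs_ols, 1))
```

Reporting RMSE alongside MSE puts the error on the same scale as the response, which makes the question of how accurately applications can be predicted easier to answer.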

11. We will now try to predict per capita crime rate in the Boston data set.

# Use the ISLR2 version of Boston (crim plus 12 predictors); MASS is not needed
data("Boston", package = "ISLR2")
  (a) Try out some of the regression methods explored in this chapter, such as best subset selection, the lasso, ridge regression, and PCR.
set.seed(1)
train_boston<-sample(1:nrow(Boston),nrow(Boston)/2)
train.boston<-Boston[train_boston,]
test.boston<-Boston[-train_boston,]
x_train.boston<-model.matrix(crim~.,train.boston)[,-1]
x_test.boston<-model.matrix(crim~.,test.boston)[,-1]
y_train.boston<-train.boston$crim
y_test.boston<-test.boston$crim
ridge_boston<-cv.glmnet(x_train.boston,y_train.boston,alpha=0)
ridge.pred_boston<-predict(ridge_boston,s=ridge_boston$lambda.min,newx=x_test.boston)
ridge.mse_boston<-mean((y_test.boston-ridge.pred_boston)^2)
ridge.mse_boston
## [1] 40.17065
lasso_boston<-cv.glmnet(x_train.boston,y_train.boston,alpha=1)
lasso.pred_boston<-predict(lasso_boston,s=lasso_boston$lambda.min,newx=x_test.boston)
lasso.mse_boston<-mean((y_test.boston-lasso.pred_boston)^2)
lasso.coef_boston<-predict(lasso_boston,s=lasso_boston$lambda.min,type="coefficients")
sum(lasso.coef_boston!=0)
## [1] 12
pcr_boston<-pcr(crim~.,data=train.boston,scale=TRUE,validation="CV")
pcr.m_boston<-which.min(pcr_boston$validation$PRESS)
pcr.pred_boston<-predict(pcr_boston,newdata=test.boston,ncomp=pcr.m_boston)
pcr.mse_boston<-mean((y_test.boston-pcr.pred_boston)^2)
pcr.m_boston
## [1] 12
pcr.mse_boston
## [1] 41.19923

Present and discuss results for the approaches that you consider.

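Ridge regression, the lasso, and PCR were used to predict the per capita crime rate, each tuned by cross-validation on the training half and evaluated on the held-out test half. Ridge regression gives a test MSE of 40.17, and ridge and the lasso produced the best balance of predictive performance and model stability. PCR did not provide a performance gain: it gives a test MSE of 41.20 with M = 12 components, which is every available component, so it amounts to least squares on all 12 predictors.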

  (b) Propose a model (or set of models) that seem to perform well on this data set, and justify your answer. Make sure that you are evaluating model performance using validation set error, cross-validation, or some other reasonable alternative, as opposed to using training error.

The lasso gave comparable predictive performance while performing variable selection: it keeps 12 non-zero coefficient estimates including the intercept, so it drops one of the 12 predictors. This makes the lasso the better choice when interpretability is the priority. Both the lasso and ridge regression performed well at predicting per capita crime rates on the held-out test set. Overall, ridge regression is preferred for predictive accuracy, with a test MSE of 40.17.

  (c) Does your chosen model involve all of the features in the data set? Why or why not?

The ridge regression model involves all of the features in the data set. Unlike the lasso, ridge regression shrinks coefficients toward zero but does not set any of them exactly to zero. This regularization reduces variance and helps prevent overfitting. However, since all predictors are retained, the model is less interpretable than the lasso.