HW 5 STA 4143

library(ISLR2)

library(glmnet)

## Loading required package: Matrix

## Loaded glmnet 4.1-3

library(pls)

## 
## Attaching package: 'pls'

## The following object is masked from 'package:stats':
## 
##     loadings

library(leaps)

library(tidyverse)

## ── Attaching core tidyverse packages ──────────────────────── tidyverse 2.0.0 ──
## ✔ dplyr     1.1.0     ✔ readr     2.1.4
## ✔ forcats   1.0.0     ✔ stringr   1.5.0
## ✔ ggplot2   3.4.1     ✔ tibble    3.1.8
## ✔ lubridate 1.9.2     ✔ tidyr     1.3.0
## ✔ purrr     1.0.1     
## ── Conflicts ────────────────────────────────────────── tidyverse_conflicts() ──
## ✖ tidyr::expand() masks Matrix::expand()
## ✖ dplyr::filter() masks stats::filter()
## ✖ dplyr::lag()    masks stats::lag()
## ✖ tidyr::pack()   masks Matrix::pack()
## ✖ tidyr::unpack() masks Matrix::unpack()
## ℹ Use the conflicted package (<http://conflicted.r-lib.org/>) to force all conflicts to become errors

library(ISLR)

## 
## Attaching package: 'ISLR'

## The following objects are masked from 'package:ISLR2':
## 
##     Auto, Credit

library(MASS)

## 
## Attaching package: 'MASS'

## The following object is masked from 'package:dplyr':
## 
##     select

## The following object is masked from 'package:ISLR2':
## 
##     Boston

Problem 2.

For parts (a) through (c), indicate which of i. through iv. is correct. Justify your answer.

(a) The lasso, relative to least squares, is:

iii. Less flexible and hence will give improved prediction accuracy when its increase in bias is less than its decrease in variance. When the least squares estimates have high variance, the lasso can reduce that variance at the cost of a small increase in bias, which can yield more accurate predictions.

(b) Repeat (a) for ridge regression relative to least squares.

iii. Less flexible and hence will give improved prediction accuracy when its increase in bias is less than its decrease in variance. Like the lasso, ridge regression shrinks the coefficient estimates, trading a small increase in bias for a reduction in variance. It works best when the least squares estimates have high variance.

(c) Repeat (a) for non-linear methods relative to least squares.

ii. More flexible and hence will give improved prediction accuracy when its increase in variance is less than its decrease in bias. Non-linear methods make fewer assumptions about the form of the relationship, so they have lower bias but higher variance than least squares. When the true relationship between the response and the predictors is far from linear, least squares is badly biased and the drop in bias from a non-linear method outweighs its extra variance. When the relationship is close to linear, least squares already has low bias and the added flexibility mostly adds variance.
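The trade-off in (a) and (b) can be illustrated with a small simulation. The sketch below is not part of the assignment: the data are simulated (n = 30, p = 20, noise sd = 3), `ridge_coef()` is a hypothetical helper that uses the closed-form ridge solution without an intercept, and λ = 10 is an arbitrary choice. Whether ridge actually beats least squares depends on the simulated setting, so no particular output is claimed.

```r
# Simulated illustration only (not the College or Boston data)
set.seed(1)
n <- 30
p <- 20
X <- matrix(rnorm(n * p), n, p)
beta <- rep(0.5, p)
y <- drop(X %*% beta + rnorm(n, sd = 3))

# Closed-form ridge estimate without an intercept;
# lambda = 0 reduces to ordinary least squares
ridge_coef <- function(X, y, lambda) {
  drop(solve(crossprod(X) + lambda * diag(ncol(X)), crossprod(X, y)))
}

b_ols <- ridge_coef(X, y, 0)
b_ridge <- ridge_coef(X, y, 10)

# Fresh test data drawn from the same model
X_new <- matrix(rnorm(1000 * p), 1000, p)
y_new <- drop(X_new %*% beta + rnorm(1000, sd = 3))
test_mse <- function(b) mean((y_new - drop(X_new %*% b))^2)

c(ols = test_mse(b_ols), ridge = test_mse(b_ridge))
```

The one property that holds regardless of the seed is that the ridge coefficients are shrunk: their squared length is smaller than that of the least squares coefficients.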

Problem 9.

In this exercise, we will predict the number of applications received using the other variables in the College data set.

(a) Split the data set into a training set and a test set.

attach(College)

# Use 75% of the colleges for training and hold out the rest as a test set
set.seed(5)
coll.num = sample(nrow(College), floor(nrow(College) * 0.75))
coll.train = College[coll.num, ]
coll.test = College[-coll.num, ]

(b) Fit a linear model using least squares on the training set, and report the test error obtained.

The test MSE is 1393022.

coll.lm = lm(Apps ~ ., data = coll.train)
coll.pred = predict(coll.lm, coll.test)
coll.mse = mean((coll.pred - coll.test$Apps)^2)
coll.mse

## [1] 1393022

(c) Fit a ridge regression model on the training set, with λ chosen by cross-validation. Report the test error obtained.

The test MSE is 1392995.

library(glmnet)
set.seed(5)
# glmnet needs numeric design matrices rather than a formula
rid.train = model.matrix(Apps ~ ., data = coll.train)
rid.test = model.matrix(Apps ~ ., data = coll.test)
# Candidate lambda values from 10^10 down to 10^-2
grid <- 10^seq(10, -2, length = 100)
# alpha = 0 selects the ridge penalty
rid.fit = glmnet(rid.train, coll.train$Apps, alpha = 0, lambda = grid, thresh = 1e-12)
# Cross-validation (10-fold by default) over the same grid
rid.val = cv.glmnet(rid.train, coll.train$Apps, alpha = 0, lambda = grid, thresh = 1e-12)
bestlam = rid.val$lambda.min
rid.pred = predict(rid.fit, s = bestlam, newx = rid.test)
rid.mse = mean((rid.pred - coll.test$Apps)^2)
rid.mse

## [1] 1392995

(d) Fit a lasso model on the training set, with λ chosen by cross-validation. Report the test error obtained, along with the number of non-zero coefficient estimates.

The test MSE is 1392969. The coefficient output below lists 18 non-zero values: the intercept plus all 17 predictors, so the lasso does not drop any variable at the selected λ.

set.seed(5)
lasso.fit = glmnet(rid.train, coll.train$Apps, alpha = 1, lambda = grid, thresh = 1e-12)
lasso.val = cv.glmnet(rid.train, coll.train$Apps, alpha = 1, lambda = grid, thresh = 1e-12)
lasso.lam = lasso.val$lambda.min
lasso.pred = predict(lasso.fit, s = lasso.lam, newx = rid.test)
lasso.mse = mean((lasso.pred - coll.test$Apps)^2)
lasso.mse

## [1] 1392969

lasso.coeff = predict(lasso.fit, type = "coefficients" , s = lasso.lam)
lasso.coeff[lasso.coeff!=0]

## <sparse>[ <logic> ] : .M.sub.i.logical() maybe inefficient

##  [1] -5.181166e+02 -4.744945e+02  1.637170e+00 -1.252324e+00  4.153058e+01
##  [6] -8.295840e+00  1.125699e-01 -2.090781e-03 -8.658611e-02  1.779387e-01
## [11] -7.260300e-02  1.260943e-02 -1.049697e+01 -1.385689e+00  1.770654e+01
## [16]  6.083835e-01  6.004785e-02  9.156623e+00

(e) Fit a PCR model on the training set, with M chosen by cross-validation. Report the test error obtained, along with the value of M selected by cross-validation.

The test MSE is 1937716, using M = 9 components.

library(pls)
set.seed(2)
pcr.fit = pcr(Apps ~ ., data = coll.train, scale = TRUE, validation = "CV")
validationplot(pcr.fit, val.type = "MSEP")

pcr.pred = predict(pcr.fit, coll.test, ncomp = 9)
pcr.mse = mean((pcr.pred - coll.test$Apps)^2)
pcr.mse

## [1] 1937716
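
The number of components above is hard-coded after looking at the validation plot. As a rough illustration of what choosing M by cross-validation involves, here is a hand-rolled base-R sketch. It is an assumption-laden stand-in, not what `pcr()` does internally: it uses the built-in `mtcars` data instead of College, 5 folds chosen arbitrarily, and hypothetical names such as `cv_sse` and `best_M`.

```r
# Base-R sketch: pick the number of principal components M by K-fold CV
set.seed(1)
dat <- mtcars                      # stand-in data; response is mpg (column 1)
K <- 5                             # arbitrary number of folds
folds <- sample(rep(1:K, length.out = nrow(dat)))
max_M <- ncol(dat) - 1             # 10 predictors, so at most 10 components

cv_sse <- numeric(max_M)           # accumulated squared error for each M
for (k in 1:K) {
  train <- dat[folds != k, ]
  test <- dat[folds == k, ]
  # PCA on the standardized training predictors only
  pca <- prcomp(train[, -1], center = TRUE, scale. = TRUE)
  z_train <- as.data.frame(pca$x)
  z_test <- as.data.frame(predict(pca, test[, -1]))
  for (M in 1:max_M) {
    d <- cbind(mpg = train$mpg, z_train[, 1:M, drop = FALSE])
    fit <- lm(mpg ~ ., data = d)   # regress the response on the first M scores
    pred <- predict(fit, z_test[, 1:M, drop = FALSE])
    cv_sse[M] <- cv_sse[M] + sum((test$mpg - pred)^2)
  }
}
cv_mse <- cv_sse / nrow(dat)       # cross-validated MSE for M = 1, ..., max_M
best_M <- which.min(cv_mse)
best_M
```

Taking `which.min()` of the cross-validated error makes the choice of M reproducible instead of relying on reading a plot.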

(f) Fit a PLS model on the training set, with M chosen by cross-validation. Report the test error obtained, along with the value of M selected by cross-validation.

The test MSE is 1344300, using M = 10 components.

set.seed(4)
pls.fit = plsr(Apps ~., data = coll.train , scale = TRUE, validation = "CV")
validationplot(pls.fit, val.type = "MSEP")

pls.pred = predict(pls.fit, coll.test, ncomp = 10)
pls.mse = mean((pls.pred - coll.test$Apps)^2)
pls.mse

## [1] 1344300

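(g) Comment on the results obtained. How accurately can we predict the number of college applications received? Is there much difference among the test errors resulting from these five approaches?

Least squares, ridge regression, and the lasso gave nearly identical test MSEs (1393022, 1392995, and 1392969). PLS with M = 10 was slightly lower at 1344300, and PCR with M = 9 was clearly the worst at 1937716, about 39% higher than least squares. In the units of the response, these correspond to root mean squared errors of roughly 1,160 to 1,180 applications for the first four methods and about 1,390 for PCR.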
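
A test MSE is easier to interpret once it is put on the scale of the response. The sketch below shows one way to do that: report the RMSE and a test R² measured against a baseline that always predicts the training mean. It uses the built-in `mtcars` data and an arbitrary model (`mpg ~ wt + hp`) purely as stand-ins, and the names `test_rmse` and `test_r2` are illustrative.

```r
# Stand-in example: RMSE and test R^2 for a held-out set
set.seed(1)
idx <- sample(nrow(mtcars), floor(0.75 * nrow(mtcars)))
train <- mtcars[idx, ]
test <- mtcars[-idx, ]

fit <- lm(mpg ~ wt + hp, data = train)
pred <- predict(fit, test)

test_mse <- mean((pred - test$mpg)^2)
test_rmse <- sqrt(test_mse)        # typical error in the units of the response

# Baseline: always predict the training mean of the response
baseline_mse <- mean((mean(train$mpg) - test$mpg)^2)
test_r2 <- 1 - test_mse / baseline_mse

c(rmse = test_rmse, r2 = test_r2)
```

Applied to the College fits, the same calculation would use `coll.test$Apps` and each model's predictions in place of the stand-ins.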

Problem 11.

We will now try to predict per capita crime rate in the Boston data set.

(a) Try out some of the regression methods explored in this chapter, such as best subset selection, the lasso, ridge regression, and PCR. Present and discuss results for the approaches that you consider.

On a held-out test set, ridge regression had a test MSE of 12.28, the lasso had a test MSE of 12.28, and PCR had a test MSE of 12.71. Ridge regression and the lasso performed slightly better than PCR.

attach(Boston)
# Use 75% of the observations for training and hold out the rest as a test set
set.seed(5)
bos.num = sample(nrow(Boston), floor(nrow(Boston) * 0.75))
bos.train = Boston[bos.num, ]
bos.test = Boston[-bos.num, ]

Ridge Regression Model

set.seed(5)
ridb.train = model.matrix(crim ~., data = bos.train)
ridb.test = model.matrix(crim ~. , data = bos.test)
gridb <- 10 ^ seq(10, -2, length = 100)
ridb.fit = glmnet(ridb.train, bos.train$crim, alpha = 0, lambda = gridb, thresh = 1e-12)
ridb.val = cv.glmnet(ridb.train, bos.train$crim, alpha = 0, lambda = gridb, thresh = 1e-12)
bestblam = ridb.val$lambda.min
ridb.pred = predict(ridb.fit, s = bestblam, newx = ridb.test)
ridb.mse =  mean((ridb.pred - bos.test$crim) ^ 2)
ridb.mse

## [1] 12.28061

Lasso

set.seed(5)
lassob.fit = glmnet(ridb.train, bos.train$crim, alpha = 1, lambda = gridb, thresh = 1e-12)
lassob.val = cv.glmnet(ridb.train, bos.train$crim, alpha = 1, lambda = gridb, thresh = 1e-12)
lassob.lam = lassob.val$lambda.min
lassob.pred = predict(lassob.fit, s = lassob.lam, newx = ridb.test)
lassob.mse = mean((lassob.pred - bos.test$crim)^2)
lassob.mse

## [1] 12.28103

PCR

library(pls)
set.seed(2)
pcrb.fit = pcr(crim ~ ., data = bos.train, scale = TRUE, validation = "CV")
validationplot(pcrb.fit, val.type = "MSEP")

pcrb.pred = predict(pcrb.fit, bos.test, ncomp = 8)
pcrb.mse = mean((pcrb.pred - bos.test$crim)^2)
pcrb.mse

## [1] 12.70854

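(b) Propose a model (or set of models) that seem to perform well on this data set, and justify your answer. Make sure that you are evaluating model performance using validation set error, cross-validation, or some other reasonable alternative, as opposed to using training error.

Ridge regression and the lasso performed best on this data set, and both beat PCR (test MSE = 12.70854). The two are essentially tied: ridge regression test MSE = 12.28061 and lasso test MSE = 12.28103, so ridge is lower by a negligible margin. All of these errors were computed on a held-out test set, not on the training data.

(c) Does your chosen model involve all of the features in the data set? Why or why not?

My chosen model does include all of the features in the data set. Ridge regression shrinks the coefficients toward zero but does not set any of them exactly to zero, so it keeps every predictor. For the lasso, this would have to be confirmed by inspecting the coefficients at the selected λ, which the output above does not show.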
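
Whether a penalized model keeps every feature depends on the penalty. In the simple special case of an orthonormal design with no intercept, both estimators have closed forms: ridge divides each least squares coefficient by (1 + λ), while the lasso soft-thresholds it at λ/2. The sketch below uses made-up coefficient values and λ = 1 to show the difference; it is an illustration of that special case, not a fit to the Boston data.

```r
# Made-up least squares estimates under an orthonormal design
b_ols <- c(3, -0.4, 1.2, 0.1)
lambda <- 1

# Ridge: proportional shrinkage, never exactly zero unless b_ols is zero
b_ridge <- b_ols / (1 + lambda)

# Lasso: soft-thresholding, small coefficients become exactly zero
b_lasso <- sign(b_ols) * pmax(abs(b_ols) - lambda / 2, 0)

sum(b_ridge != 0)  # 4: every coefficient is kept
sum(b_lasso != 0)  # 2: the two small coefficients are dropped
```

This is why a ridge model always involves all features, while a lasso model does so only when λ is small enough that no coefficient is thresholded to zero.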

Linda Davila

3/8/2023
