---
title: "Assignment 5 STA 6543 UTSA SMMR 2025"
author: "AC Band"
date: "`r Sys.Date()`"
output: 
  html_notebook:
    toc: true
---


```{r setup, include=FALSE}
knitr::opts_chunk$set(echo = TRUE)
```
Chapter 6 #2, 9, 11

2. For parts (a) through (c), indicate which of i. through iv. is correct.
Justify your answer.
(a) The lasso, relative to least squares, is:

i. More flexible and hence will give improved prediction accuracy
when its increase in bias is less than its decrease in
variance.

*FALSE*

ii. More flexible and hence will give improved prediction accuracy
when its increase in variance is less than its decrease
in bias.

*FALSE*

iii. Less flexible and hence will give improved prediction accuracy
when its increase in bias is less than its decrease in
variance.

*TRUE*. The lasso constrains the coefficient estimates, so it is less flexible than least squares; it improves test error precisely when the variance reduction from shrinkage outweighs the squared bias it introduces.

iv. Less flexible and hence will give improved prediction accuracy
when its increase in variance is less than its decrease
in bias.

*FALSE*

(b) Repeat (a) for ridge regression relative to least squares.

*FALSE*

*FALSE*

*TRUE*. Ridge regression likewise shrinks the coefficients and is therefore less flexible than least squares; the same bias-variance trade-off applies.

*FALSE*



(c) Repeat (a) for non-linear methods relative to least squares.

*FALSE*

*TRUE*. Non-linear methods are more flexible than least squares, so they improve prediction accuracy when the bias reduction from the richer fit exceeds the added variance.

*FALSE*

*FALSE*
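
All of the answers above follow from the bias-variance decomposition of the expected test error; writing it out makes the "increase in bias" versus "decrease in variance" wording concrete:

$$
E\left[ \left( y_0 - \hat{f}(x_0) \right)^2 \right]
= \operatorname{Var}\left( \hat{f}(x_0) \right)
+ \left[ \operatorname{Bias}\left( \hat{f}(x_0) \right) \right]^2
+ \operatorname{Var}(\varepsilon).
$$

Shrinkage methods such as ridge and the lasso lower the variance term at the cost of a larger squared-bias term, while flexible non-linear methods do the reverse; test error improves whenever the term that shrinks falls by more than the term that grows.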





9. In this exercise, we will predict the number of applications received
using the other variables in the College data set.
(a) Split the data set into a training set and a test set.

```{r}

install.packages("pacman")
library("pacman")

```

```{r}

p_load(ggplot2, dplyr, tidyr, tidyverse, ISLR, ISLR2, tibble, readr, purrr, stringr, forcats, lubridate, glmnet, leaps, caret, klaR, pls, mlbench)

```


```{r}

# load the College data set and inspect

data("College")

str(College)

```
```{r}
# define an 80%/20% train/test split of the data set

set.seed(997)

split <- 0.80

# createDataPartition stratifies the split on the response, Apps
trainIndex <- createDataPartition(College$Apps, p = split, list = FALSE)

data_train <- College[trainIndex, ]

data_test <- College[-trainIndex, ]

```
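
As a quick sanity check on the partition (a minimal sketch; the exact counts depend on the seed and on createDataPartition's stratified rounding):

```{r}

# verify the split is roughly 80/20
nrow(data_train)
nrow(data_test)
nrow(data_train) / nrow(College)

```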




(b) Fit a linear model using least squares on the training set, and
report the test error obtained.

```{r}
# train a linear model using least squares

College_lm_model <- lm(Apps ~., data = data_train)

summary(College_lm_model)

```

```{r}

College_appPredict <- predict(College_lm_model, newdata = data_test)

MSE_lm <- mean((data_test$Apps - College_appPredict )^2)

print(sprintf("Linear model test MSE= %10.3f", MSE_lm))

```
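
Since the MSE is in squared units, the RMSE is easier to interpret on the scale of the response (a small sketch reusing the object above):

```{r}

# RMSE on the scale of Apps: the typical prediction error in number of applications
sqrt(MSE_lm)

```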

(c) Fit a ridge regression model on the training set, with λ chosen
by cross-validation. Report the test error obtained.

```{r}

Xtrain <- model.matrix(Apps~., data = data_train)[,-1]

Ytrain <- data_train$Apps

Xtest <- model.matrix(Apps~., data = data_test)[,-1]

Ytest <- data_test$Apps


set.seed(997)  # cv.glmnet's fold assignment is random; fix it for reproducibility
ridge_cv <- cv.glmnet(Xtrain, Ytrain, alpha = 0)  # alpha = 0 gives the ridge penalty

ridge_best_lambda <- ridge_cv$lambda.min

ridge_preds <- predict(ridge_cv, s = ridge_best_lambda, newx = Xtest)

mse_ridge <- mean((Ytest - ridge_preds)^2)

mse_ridge

```
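
It can also help to inspect the cross-validation curve and glmnet's more conservative lambda.1se choice (a sketch; lambda.1se applies the one-standard-error rule and typically yields heavier shrinkage than lambda.min):

```{r}

# CV curve; the vertical dashed lines mark lambda.min and lambda.1se
plot(ridge_cv)
ridge_cv$lambda.1se

```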


(d) Fit a lasso model on the training set, with λ chosen by cross-validation.
Report the test error obtained, along with the number
of non-zero coefficient estimates.

```{r}

set.seed(997)

cv_lasso <- cv.glmnet(Xtrain, Ytrain, alpha = 1)
best_lasso <- cv_lasso$lambda.min


lasso.pred <- predict(cv_lasso, s = best_lasso, newx = Xtest)
mean((Ytest - lasso.pred)^2)


```

```{r}

lasso_coef <- predict(cv_lasso, s = best_lasso, type = "coefficients")
sum(lasso_coef != 0)      # note: this count includes the intercept
sum(lasso_coef != 0) - 1  # number of non-zero predictor coefficients


```

(e) Fit a PCR model on the training set, with M chosen by cross-validation.
Report the test error obtained, along with the value
of M selected by cross-validation.

```{r}

set.seed(997)

pcr.fit <- pcr(Apps ~ ., data = data_train, scale = TRUE, validation = "CV")

summary(pcr.fit)



```

```{r}

validationplot(pcr.fit,val.type="MSEP")


```
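
The CV column of the summary is minimized at the full M = 17, so cross-validation effectively selects the least-squares fit here. As a sketch, pls::selectNcomp() can also pick M programmatically; its "onesigma" rule is a more parsimonious alternative that accepts the smallest M within one standard error of the minimum:

```{r}

# CV-minimizing M (intercept column dropped) and the one-SE choice
which.min(RMSEP(pcr.fit)$val["CV", 1, -1])
selectNcomp(pcr.fit, method = "onesigma", plot = FALSE)

```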
```{r}

# CV error is minimized at M = 17 (the full model), which reproduces the least-squares fit
pcr.pred <- predict(pcr.fit, newdata = data_test, ncomp = 17)

mean((data_test$Apps - pcr.pred)^2)


```


(f) Fit a PLS model on the training set, with M chosen by cross-validation.
Report the test error obtained, along with the value
of M selected by cross-validation.

```{r}

set.seed(997)

pls.fit <- plsr(Apps ~ ., data = data_train, scale = TRUE, validation = "CV")

summary(pls.fit)


```
```{r}

validationplot(pls.fit,val.type="MSEP")


```
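
The same selection check applies here (a sketch reusing selectNcomp); the CV curve flattens after roughly ten components, so the one-SE rule would pick a much smaller M than 17:

```{r}

# one-standard-error choice of M for the PLS fit
selectNcomp(pls.fit, method = "onesigma", plot = FALSE)

```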



```{r}

# the CV curve is flat beyond roughly M = 10; M = 17 again reproduces the least-squares fit
pls.pred <- predict(pls.fit, newdata = data_test, ncomp = 17)

mean((data_test$Apps - pls.pred)^2)

```

(g) Comment on the results obtained. How accurately can we predict
the number of college applications received? Is there much
difference among the test errors resulting from these five approaches?

```{r}

#Least Square model

test.avg <- mean(data_test$Apps)

lm.r2 <- 1 - mean((College_appPredict - data_test$Apps)^2) / mean((test.avg - data_test$Apps)^2)

#Ridge model
ridge.r2 <- 1 - mean((ridge_preds - data_test$Apps)^2) / mean((test.avg - data_test$Apps)^2)

#Lasso model
lasso.r2 <- 1 - mean((lasso.pred - data_test$Apps)^2) / mean((test.avg - data_test$Apps)^2)

#PCR model
pcr.r2 <- 1 - mean((pcr.pred - data_test$Apps)^2) / mean((test.avg - data_test$Apps)^2)

#PLS model
pls.r2 <- 1 - mean((pls.pred - data_test$Apps)^2) / mean((test.avg - data_test$Apps)^2)


# collect the test R^2 values for comparison
round(c(lm = lm.r2, ridge = ridge.r2, lasso = lasso.r2, pcr = pcr.r2, pls = pls.r2), 4)

```
Least squares, PCR, and PLS have identical test R^2 (about 0.918) because PCR and PLS were evaluated with all M = 17 components, which reproduces the least-squares fit exactly. The lasso is nearly as accurate (R^2 about 0.910), while ridge lags behind (about 0.843). With roughly 92% of the test-set variance in applications explained, the predictions are quite accurate, and apart from ridge there is little practical difference among the five approaches.

11. We will now try to predict per capita crime rate in the Boston data
set.
(a) Try out some of the regression methods explored in this chapter,
such as best subset selection, the lasso, ridge regression, and
PCR. Present and discuss results for the approaches that you
consider.

```{r}

library(pls)

data(Boston)

str(Boston)


```
```{r}

set.seed(997)

# PCR on the full Boston data; predictors standardized, 10-fold CV
pcr.fit1 <- pcr(crim ~ ., data = Boston, scale = TRUE, validation = "CV")

summary(pcr.fit1)

```





```{r}

set.seed(997)

# random TRUE/FALSE assignment to training and validation sets
train <- sample(c(TRUE, FALSE), nrow(Boston), replace = TRUE)

test <- !train

pcr.fit <- pcr(crim ~ ., data = Boston[train, ], scale = TRUE, validation = "CV")

summary(pcr.fit)

```

```{r}

validationplot(pcr.fit, val.type = "MSEP")


```

```{r}

set.seed(997)

pcr.fit <- pcr(crim ~ ., data = Boston, subset = train, scale = TRUE, validation = "CV")

validationplot(pcr.fit, val.type="MSEP")


```





```{r}

# CV RMSEP for each model size; the final [-1] drops the intercept-only column
cverr <- RMSEP(pcr.fit)$val[1, , -1]

imin <- which.min(cverr)  # intercept already dropped, so the index equals the number of components

imin

```


```{r}

# validation-set MSE using the CV-selected M = 4 components (predict is deterministic, no seed needed)
pcr.pred <- predict(pcr.fit, newdata = Boston[test, ], ncomp = 4)

mean((pcr.pred - Boston$crim[test])^2)


```
```{r}

# PRESS has no intercept column, so which.min gives M directly
best_ncomp <- which.min(pcr.fit$validation$PRESS)

best_ncomp

```
```{r}

x_train <- model.matrix(crim ~ ., Boston[train, ])[, -1]

x_test  <- model.matrix(crim ~ ., Boston[test, ])[, -1]

y_train <- Boston$crim[train]

y_test  <- Boston$crim[test]


set.seed(997)  # fix the CV folds for reproducibility
cv_ridge <- cv.glmnet(x_train, y_train, alpha = 0)

best_ridge <- cv_ridge$lambda.min


ridge.pred <- predict(cv_ridge, s = best_ridge, newx = x_test)

mean((y_test - ridge.pred)^2)

```

```{r}

plot(cv_ridge)

title("Ridge Regression: Cross-Validation Curve", line = 2.5)
```
```{r}

set.seed(997)

cv_lasso <- cv.glmnet(x_train, y_train, alpha = 1)

best_lasso <- cv_lasso$lambda.min


lasso.pred <- predict(cv_lasso, s = best_lasso, newx = x_test)

mean((y_test - lasso.pred)^2)


```
```{r}

reg.fit.best <- regsubsets(crim ~ ., data = Boston[train, ], nvmax = 13)

test_matrix <- model.matrix(crim ~ ., data = Boston[test, ])


# validation-set MSE for the best model of each size
nv <- nrow(summary(reg.fit.best)$which)
val_errors <- rep(NA, nv)

for (i in 1:nv) {
  coefficient <- coef(reg.fit.best, id = i)
  pred <- test_matrix[, names(coefficient)] %*% coefficient
  val_errors[i] <- mean((Boston$crim[test] - pred)^2)
}


min(val_errors)



```
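
For reference, the coefficients of the best-performing subset size (a short sketch reusing the objects above):

```{r}

# coefficients of the model size with the lowest validation MSE
coef(reg.fit.best, id = which.min(val_errors))

```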
```{r}

fitControl <- trainControl(
  method = "repeatedcv",
  number = 10,
  repeats = 3,
  verboseIter = TRUE # Show training progress
)


```


```{r}


ridgeGrid <- expand.grid(alpha = 0,
                         lambda = 10^seq(-3, 1, length = 100)) # A range of lambda values

cat("\n--- Training Ridge Regression with caret ---\n")


```
```{r}

ridge_caret <- train(
  crim ~ .,                # Formula interface: response ~ all predictors
  data = Boston,          # full data set; caret's repeated CV supplies the resampling here
  method = "glmnet",      # Specify the glmnet model
  tuneGrid = ridgeGrid,   # Our custom tuning grid for alpha and lambda
  trControl = fitControl, # Our defined cross-validation control
  preProcess = c("center", "scale") # Center and scale predictors (important for regularization)
)


```
```{r}

print(ridge_caret)

```
```{r}

# Plot the tuning results (RMSE vs lambda) using ggplot2 directly
ridge_plot_data <- ridge_caret$results
print(ggplot(ridge_plot_data, aes(x = lambda, y = RMSE)) +
        geom_line() +
        geom_point() +
        # Add error bars if RMSESD is available in the results
        {if ("RMSESD" %in% names(ridge_plot_data)) geom_errorbar(aes(ymin = RMSE - RMSESD, ymax = RMSE + RMSESD), width = 0.01)} +
        scale_x_log10() +  # lambda was tuned on a log-spaced grid
        ggplot2::labs(title = "Ridge Regression Tuning (caret)",
                      x = "Lambda",
                      y = "RMSE") +
        theme_minimal())


```


```{r}

cat("Optimal lambda for Ridge (caret):", ridge_caret$bestTune$lambda, "\n")

```
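
The resampled error at that lambda is also available in the results table (a sketch; the column names follow caret's conventions):

```{r}

# repeated-CV RMSE and R-squared at the selected lambda
subset(ridge_caret$results, lambda == ridge_caret$bestTune$lambda,
       select = c(lambda, RMSE, Rsquared))

```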

```{r}

# --- Lasso Regression with caret ---
# For Lasso, we specify a tuneGrid where alpha is fixed at 1.
# caret will then search for the best lambda.

lassoGrid <- expand.grid(alpha = 1,
                         lambda = 10^seq(-3, 1, length = 100)) # A range of lambda values

cat("\n--- Training Lasso Regression with caret ---\n")



```


```{r}

lasso_caret <- train(
  crim ~ .,
  data = Boston,
  method = "glmnet",
  tuneGrid = lassoGrid,
  trControl = fitControl,
  preProcess = c("center", "scale")
)



```

```{r}

# Print the model results
print(lasso_caret)


```

```{r}

# Get the best lambda found by caret
cat("Optimal lambda for Lasso (caret):", lasso_caret$bestTune$lambda, "\n")


```

```{r}

# Get the coefficients from the best model
coef(lasso_caret$finalModel, s = lasso_caret$bestTune$lambda)


```
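
A quick count of the surviving predictors at caret's chosen lambda (a sketch; the intercept is excluded from the count, assuming it is non-zero, as here):

```{r}

# number of non-zero predictor coefficients in the caret lasso fit
sum(coef(lasso_caret$finalModel, s = lasso_caret$bestTune$lambda) != 0) - 1

```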




```{r}

# model size with the lowest validation MSE, from the best-subset loop above
which.min(val_errors)

```


```{r}

plot(val_errors, type = "b", xlab = "Number of Variables", ylab = "Validation MSE")

```

(b) Propose a model (or set of models) that seem to perform well on
this data set, and justify your answer. Make sure that you are
evaluating model performance using validation set error, cross-validation,
or some other reasonable alternative, as opposed to
using training error.

Of the approaches compared on the held-out validation set, ridge regression (with λ chosen by cross-validation) performed best, with a validation MSE of 34.71, narrowly ahead of the lasso (34.98), PCR with M = 4 (35.15), and best subset selection (35.27). Given how close these are, ridge is proposed, with the lasso as a reasonable alternative.


(c) Does your chosen model involve all of the features in the data
set? Why or why not?

Yes. Ridge regression shrinks all coefficient estimates toward zero but never sets any of them exactly to zero, so the chosen model retains every predictor in the data set. If a sparser model were preferred, the lasso (which can zero out coefficients) or the four-variable best subset model would trade a small amount of accuracy for fewer features.
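
To make the contrast concrete (a sketch reusing the glmnet fits from part (a)): ridge shrinks but never zeroes coefficients, while the lasso can set some exactly to zero.

```{r}

# side-by-side coefficients: ridge keeps every predictor, the lasso may drop some
round(cbind(ridge = coef(cv_ridge, s = best_ridge)[, 1],
            lasso = coef(cv_lasso, s = best_lasso)[, 1]), 4)

```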
