---
title: "Assignment 5 STA 6543 UTSA SMMR 2025"
author: "AC Band"
date: "`r Sys.Date()`"
output: 
  html_notebook:
    toc: true
---


```{r setup, include=FALSE}
knitr::opts_chunk$set(echo = TRUE)
```
Chapter 6 #2, 9, 11

2. For parts (a) through (c), indicate which of i. through iv. is correct. Justify your answer.

(a) The lasso, relative to least squares, is:

i. More flexible and hence will give improved prediction accuracy when its increase in bias is less than its decrease in variance.

*FALSE*

ii. More flexible and hence will give improved prediction accuracy when its increase in variance is less than its decrease in bias.

*FALSE*

iii. Less flexible and hence will give improved prediction accuracy when its increase in bias is less than its decrease in variance.

*TRUE*. The lasso constrains the coefficient estimates, so it is less flexible than least squares; the shrinkage adds bias but reduces variance, and prediction improves when the variance saved exceeds the bias incurred.

iv. Less flexible and hence will give improved prediction accuracy when its increase in variance is less than its decrease in bias.

*FALSE*

(b) Repeat (a) for ridge regression relative to least squares.

*FALSE*

*FALSE*

*TRUE*. The same reasoning applies: ridge shrinks the coefficients (without setting any exactly to zero), trading a small increase in bias for a larger decrease in variance.

*FALSE*



(c) Repeat (a) for non-linear methods relative to least squares.

*FALSE*

*TRUE*. Non-linear methods are more flexible than least squares, so they raise variance while lowering bias; they win when the bias removed exceeds the variance added.

*FALSE*

*FALSE*
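
All three answers follow from the bias-variance decomposition of the expected test error at a point $x_0$:

$$
E\big[(y_0 - \hat f(x_0))^2\big] \;=\; \mathrm{Var}\big(\hat f(x_0)\big) + \big[\mathrm{Bias}\big(\hat f(x_0)\big)\big]^2 + \mathrm{Var}(\varepsilon).
$$

Moving toward a less flexible method shifts error out of the variance term and into the bias term (and vice versa), so the move pays off exactly when the term that grows does so by less than the term that shrinks.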





9. In this exercise, we will predict the number of applications received using the other variables in the College data set.

(a) Split the data set into a training set and a test set.

```{r}

# install pacman only if it is missing, then load it; an unconditional
# install.packages() call inside a notebook re-downloads on every run
if (!requireNamespace("pacman", quietly = TRUE)) install.packages("pacman")
library(pacman)

```

```{r}

# p_load() loads each listed package, installing any that are missing first
p_load(ggplot2, dplyr, tidyr, tidyverse, ISLR, ISLR2, tibble, readr, purrr,
       stringr, forcats, lubridate, glmnet, leaps, caret, klaR, pls, mlbench)

```


```{r}

# load the College data set and inspect

data("College")

str(College)

```
```{r}
# define an 80%/20% train/test split of the data set;
# createDataPartition() stratifies the split on the response
set.seed(997)

split <- 0.80

trainIndex <- createDataPartition(College$Apps, p = split, list = FALSE)

data_train <- College[trainIndex, ]

data_test <- College[-trainIndex, ]

```
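
As a quick optional sanity check, using only the objects just created, the two pieces should be disjoint and split the 777 colleges roughly 80/20:

```{r}
# confirm the partition: sizes, and that the two pieces cover the data
nrow(data_train); nrow(data_test)
nrow(data_train) + nrow(data_test) == nrow(College)
```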




(b) Fit a linear model using least squares on the training set, and report the test error obtained.

```{r}
# train a linear model using least squares

College_lm_model <- lm(Apps ~., data = data_train)

summary(College_lm_model)

```

```{r}

College_appPredict <- predict(College_lm_model, newdata = data_test)

MSE_lm <- mean((data_test$Apps - College_appPredict)^2)

print(sprintf("Linear model test MSE= %10.3f", MSE_lm))

```
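
On the original scale this is easier to read as an RMSE: in the run reported here the test MSE was about 1,954,810, so the predictions miss by roughly 1,400 applications on average.

```{r}
# square root of the test MSE, in units of applications
sqrt(MSE_lm)
```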

(c) Fit a ridge regression model on the training set, with λ chosen by cross-validation. Report the test error obtained.

```{r}

# glmnet wants numeric model matrices rather than a formula
Xtrain <- model.matrix(Apps ~ ., data = data_train)[, -1]

Ytrain <- data_train$Apps

Xtest <- model.matrix(Apps ~ ., data = data_test)[, -1]

Ytest <- data_test$Apps

# seed the CV fold assignment so lambda.min is reproducible,
# as in the lasso fit below
set.seed(997)

ridge_cv <- cv.glmnet(Xtrain, Ytrain, alpha = 0)

ridge_best_lambda <- ridge_cv$lambda.min

ridge_preds <- predict(ridge_cv, s = ridge_best_lambda, newx = Xtest)

mse_ridge <- mean((Ytest - ridge_preds)^2)

mse_ridge

```
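
Since the exercise asks for the λ chosen by cross-validation, it is worth printing as well:

```{r}
# the cross-validated lambda used for the test predictions above
ridge_best_lambda
```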


(d) Fit a lasso model on the training set, with λ chosen by cross-validation. Report the test error obtained, along with the number of non-zero coefficient estimates.

```{r}

set.seed(997)

cv_lasso <- cv.glmnet(Xtrain, Ytrain, alpha = 1)
best_lasso <- cv_lasso$lambda.min


lasso.pred <- predict(cv_lasso, s = best_lasso, newx = Xtest)
mean((Ytest - lasso.pred)^2)


```

```{r}

lasso_coef <- predict(cv_lasso, s = best_lasso, type = "coefficients")
sum(lasso_coef != 0)


```
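
In the run reported here the lasso test MSE was 2,155,835, with 15 non-zero coefficient estimates; note that the count from `sum(lasso_coef != 0)` includes the intercept term along with the 17 predictor coefficients.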

(e) Fit a PCR model on the training set, with M chosen by cross-validation. Report the test error obtained, along with the value of M selected by cross-validation.

```{r}

set.seed(997)

# principal components regression with 10-fold CV on the training set
pcr.fit <- pcr(Apps ~ ., data = data_train, scale = TRUE, validation = "CV")

summary(pcr.fit)

```
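
To read the selected M off programmatically rather than from the RMSEP table, the same `RMSEP()` indexing used for the Boston fit later in this assignment works here; `pcr_cv_err` is just an illustrative name. With this seed the CV error is minimized at the full M = 17, which is why 17 components are used in the prediction below.

```{r}
# CV RMSEP for 1..17 components: row 1 is the "CV" estimate,
# and dropping column 1 removes the intercept-only model
pcr_cv_err <- RMSEP(pcr.fit)$val[1, , -1]
which.min(pcr_cv_err)
```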

```{r}

validationplot(pcr.fit, val.type = "MSEP")

```
```{r}

pcr.pred <- predict(pcr.fit, newdata = data_test, ncomp = 17)

mean((data_test$Apps - pcr.pred)^2)


```


(f) Fit a PLS model on the training set, with M chosen by cross-validation. Report the test error obtained, along with the value of M selected by cross-validation.

```{r}

set.seed(997)

pls.fit <- plsr(Apps ~ ., data = data_train, scale = TRUE, validation = "CV")

summary(pls.fit)

```
```{r}

validationplot(pls.fit, val.type = "MSEP")

```
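
The same programmatic check applies here (again, `pls_cv_err` is an illustrative name). With this seed the CV curve flattens from about 10 components onward and is minimized at M = 11, so the 17-component prediction below is essentially equivalent (CV RMSEP 1018 vs. 1017).

```{r}
# CV RMSEP for 1..17 PLS components, as above
pls_cv_err <- RMSEP(pls.fit)$val[1, , -1]
which.min(pls_cv_err)
```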



```{r}

pls.pred <- predict(pls.fit, newdata = data_test, ncomp = 17)

mean((data_test$Apps - pls.pred)^2)

```

(g) Comment on the results obtained. How accurately can we predict the number of college applications received? Is there much difference among the test errors resulting from these five approaches?

```{r}

# test-set R^2 for each model: 1 - RSS/TSS, relative to predicting
# the test-set mean

test.avg <- mean(data_test$Apps)

# Least squares
lm.r2 <- 1 - mean((College_appPredict - data_test$Apps)^2) / mean((test.avg - data_test$Apps)^2)

# Ridge
ridge.r2 <- 1 - mean((ridge_preds - data_test$Apps)^2) / mean((test.avg - data_test$Apps)^2)

# Lasso
lasso.r2 <- 1 - mean((lasso.pred - data_test$Apps)^2) / mean((test.avg - data_test$Apps)^2)

# PCR
pcr.r2 <- 1 - mean((pcr.pred - data_test$Apps)^2) / mean((test.avg - data_test$Apps)^2)

# PLS
pls.r2 <- 1 - mean((pls.pred - data_test$Apps)^2) / mean((test.avg - data_test$Apps)^2)

print(lm.r2)
print(ridge.r2)
print(lasso.r2)
print(pcr.r2)
print(pls.r2)

```
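
One way to see the same comparison at a glance is to tabulate the five test MSEs directly; every object used here was created above.

```{r}
# gather the five test MSEs side by side
data.frame(
  model = c("least squares", "ridge", "lasso", "PCR (M = 17)", "PLS (M = 17)"),
  test_MSE = c(MSE_lm,
               mse_ridge,
               mean((Ytest - lasso.pred)^2),
               mean((data_test$Apps - pcr.pred)^2),
               mean((data_test$Apps - pls.pred)^2))
)
```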
Least squares, PCR, and PLS produce identical test R² (0.918) because with all M = 17 components PCR and PLS reproduce the least-squares fit exactly. The lasso is close behind (R² = 0.910), while ridge trails (R² = 0.843) in the run reported here. All five models predict applications quite accurately, explaining at least 84% of the test-set variance, and apart from ridge the differences among the test errors are small.

11. We will now try to predict per capita crime rate in the Boston data set.

(a) Try out some of the regression methods explored in this chapter, such as best subset selection, the lasso, ridge regression, and PCR. Present and discuss results for the approaches that you consider.

```{r}

library(pls)

data(Boston)

str(Boston)


```
```{r}

# first look: PCR fit on the full Boston data (the formula interface
# means no attach() is needed)
set.seed(997)

pcr.fit1 <- pcr(crim ~ ., data = Boston, scale = TRUE, validation = "CV")

summary(pcr.fit1)

```





```{r}

set.seed(997)

# logical train/test indicator (roughly a 50/50 split)
train <- sample(c(TRUE, FALSE), nrow(Boston), replace = TRUE)

test <- !train

# reseed so the CV fold assignment below is reproducible regardless of
# the draws consumed by sample() above
set.seed(997)

pcr.fit <- pcr(crim ~ ., data = Boston[train, ], scale = TRUE, validation = "CV")

summary(pcr.fit)

```

```{r}

validationplot(pcr.fit, val.type = "MSEP")


```


```{r}

# CV RMSEP for each number of components: row 1 is the "CV" estimate and
# column 1 (the intercept-only model) is dropped, so which.min() already
# returns the best M directly; no further offset is needed
cverr <- RMSEP(pcr.fit)$val[1, , -1]

imin <- which.min(cverr)

imin

```


```{r}

# predict on the held-out rows using the CV-selected M = 4 components
pcr.pred <- predict(pcr.fit, newdata = Boston[test, ], ncomp = 4)

mean((pcr.pred - Boston$crim[test])^2)

```
```{r}

# cross-check: the component count minimizing PRESS agrees (M = 4)
best_ncomp <- which.min(pcr.fit$validation$PRESS)

best_ncomp

```
```{r}

x_train <- model.matrix(crim ~ ., Boston[train, ])[, -1]

x_test  <- model.matrix(crim ~ ., Boston[test, ])[, -1]

y_train <- Boston$crim[train]

y_test  <- Boston$crim[test]

# seed the CV folds, as for the lasso below
set.seed(997)

cv_ridge <- cv.glmnet(x_train, y_train, alpha = 0)

best_ridge <- cv_ridge$lambda.min

ridge.pred <- predict(cv_ridge, s = best_ridge, newx = x_test)

mean((y_test - ridge.pred)^2)

```

```{r}

plot(cv_ridge)

title("Ridge Regression: Cross-Validation Curve", line = 2.5)
```
```{r}

set.seed(997)

cv_lasso <- cv.glmnet(x_train, y_train, alpha = 1)

best_lasso <- cv_lasso$lambda.min


lasso.pred <- predict(cv_lasso, s = best_lasso, newx = x_test)

mean((y_test - lasso.pred)^2)


```
```{r}

reg.fit.best <- regsubsets(crim ~ ., data = Boston[train, ], nvmax = 13)

test_matrix <- model.matrix(crim ~ ., data = Boston[test, ])

# regsubsets() has no predict() method, so compute the validation MSE by
# hand for each model size
nv <- nrow(summary(reg.fit.best)$which)
val_errors <- rep(NA, nv)

for (i in 1:nv) {
  coefficient <- coef(reg.fit.best, id = i)
  pred <- test_matrix[, names(coefficient)] %*% coefficient
  val_errors[i] <- mean((Boston$crim[test] - pred)^2)
}

min(val_errors)

```
```{r}

fitControl <- trainControl(
  method = "repeatedcv",
  number = 10,
  repeats = 3,
  verboseIter = TRUE # Show training progress
)


```


```{r}


ridgeGrid <- expand.grid(alpha = 0,
                         lambda = 10^seq(-3, 1, length = 100)) # A range of lambda values

cat("\n--- Training Ridge Regression with caret ---\n")


```
```{r}

ridge_caret <- train(
  crim ~ .,                # formula interface: response ~ all predictors
  data = Boston,           # note: the full data set, tuned by repeated CV
                           # rather than the train/test split used above
  method = "glmnet",       # glmnet fits the ridge penalty when alpha = 0
  tuneGrid = ridgeGrid,    # custom tuning grid for alpha and lambda
  trControl = fitControl,  # repeated 10-fold cross-validation
  preProcess = c("center", "scale") # standardize predictors before penalizing
)

```
```{r}

print(ridge_caret)

```
```{r}

# Plot the tuning results (RMSE vs lambda) using ggplot2 directly
ridge_plot_data <- ridge_caret$results
print(ggplot(ridge_plot_data, aes(x = lambda, y = RMSE)) +
        geom_line() +
        geom_point() +
        # Add error bars if RMSESD is available in the results
        {if("RMSESD" %in% names(ridge_plot_data)) geom_errorbar(aes(ymin = RMSE - RMSESD, ymax = RMSE + RMSESD), width = 0.01)} +
        ggplot2::labs(title = "Ridge Regression Tuning (caret)",
                      x = "Lambda",
                      y = "RMSE") +
        theme_minimal())


```


```{r}

cat("Optimal lambda for Ridge (caret):", ridge_caret$bestTune$lambda, "\n")

```

```{r}

# --- Lasso Regression with caret ---
# For Lasso, we specify a tuneGrid where alpha is fixed at 1.
# caret will then search for the best lambda.

lassoGrid <- expand.grid(alpha = 1,
                         lambda = 10^seq(-3, 1, length = 100)) # A range of lambda values

cat("\n--- Training Lasso Regression with caret ---\n")



```


```{r}

lasso_caret <- train(
  crim ~ .,
  data = Boston,
  method = "glmnet",
  tuneGrid = lassoGrid,
  trControl = fitControl,
  preProcess = c("center", "scale")
)

```

```{r}

# Print the model results
print(lasso_caret)


```

```{r}

# Get the best lambda found by caret
cat("Optimal lambda for Lasso (caret):", lasso_caret$bestTune$lambda, "\n")


```
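
In the run reported here these grid searches selected λ ≈ 0.464 for ridge and λ ≈ 0.087 for the lasso, both comfortably inside the 10^-3 to 10 grid.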

```{r}

# Get the coefficients from the best model
coef(lasso_caret$finalModel, s = lasso_caret$bestTune$lambda)


```



```{r}

which.min(val_errors)

```


```{r}

plot(val_errors, type = "b", xlab = "Number of Variables", ylab = "Validation MSE")

```
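
To ground the discussion in (b), here are the four validation-set MSEs computed above, collected in one place; the caret fits are excluded because they were tuned on the full data set by cross-validation, so their RMSEs are not comparable to these hold-out errors.

```{r}
# validation-set MSEs for the four hold-out comparisons above
data.frame(
  model = c("PCR (M = 4)", "ridge", "lasso",
            paste0("best subset (", which.min(val_errors), " vars)")),
  val_MSE = c(mean((pcr.pred - Boston$crim[test])^2),
              mean((y_test - ridge.pred)^2),
              mean((y_test - lasso.pred)^2),
              min(val_errors))
)
```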

(b) Propose a model (or set of models) that seem to perform well on this data set, and justify your answer. Make sure that you are evaluating model performance using validation set error, cross-validation, or some other reasonable alternative, as opposed to using training error.

In the run reported here, ridge regression performed best on the held-out data with a validation MSE of 34.71, versus 34.98 for the lasso, 35.15 for PCR with 4 components, and 35.27 for best subset selection with 4 variables. All four methods land within about 1.6% of one another, so ridge is proposed as the primary model, with the lasso a close and more parsimonious alternative.


(c) Does your chosen model involve all of the features in the data set? Why or why not?

Yes. Ridge regression shrinks coefficients toward zero but never sets them exactly to zero, so the chosen model keeps every predictor. Had the lasso been chosen instead, some coefficients would have been driven exactly to zero, yielding a model with fewer features at nearly the same validation error.
