Question 2. For parts (a) through (c), indicate which of i. through iv. is correct. Justify your answer.

(a) The lasso, relative to least squares, is:

i. More flexible and hence will give improved prediction accuracy when its increase in bias is less than its decrease in variance.

ii. More flexible and hence will give improved prediction accuracy when its increase in variance is less than its decrease in bias.

iii. Less flexible and hence will give improved prediction accuracy when its increase in bias is less than its decrease in variance.

iv. Less flexible and hence will give improved prediction accuracy when its increase in variance is less than its decrease in bias.

Option iii is correct. The lasso shrinks coefficient estimates toward zero and can set some exactly to zero, making it less flexible than least squares. This constraint introduces more bias but reduces variance, so prediction accuracy improves when the decrease in variance exceeds the increase in bias.

(b) Repeat (a) for ridge regression relative to least squares.

Ridge regression also leads to option iii. Like the lasso, it is a shrinkage method, but it retains all coefficients in the model rather than zeroing any out. The penalty term discourages large coefficients, which introduces bias while reducing variance, and predictions improve when the variance reduction outweighs the added bias.

(c) Repeat (a) for non-linear methods relative to least squares.

Non-linear methods lead to option ii. They allow more complex relationships between the predictors and the response, which makes the model more flexible than least squares and usually means lower bias and higher variance. Predictions improve when the increase in variance is smaller than the decrease in bias.
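
All three answers rest on the bias-variance decomposition of the expected test MSE, stated here for reference:

$$E\big[(y_0 - \hat{f}(x_0))^2\big] = \mathrm{Var}\big(\hat{f}(x_0)\big) + \big[\mathrm{Bias}\big(\hat{f}(x_0)\big)\big]^2 + \mathrm{Var}(\varepsilon)$$

A method improves on least squares whenever its net change in the first two terms is a decrease; the irreducible error $\mathrm{Var}(\varepsilon)$ is fixed regardless of the method.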

Question 9. In this exercise, we will predict the number of applications received using the other variables in the College data set.

(a) Split the data set into a training set and a test set.

library(ISLR2)
library(glmnet)
## Loading required package: Matrix
## Loaded glmnet 4.1-8
library(leaps)
library(pls)
## 
## Attaching package: 'pls'
## The following object is masked from 'package:stats':
## 
##     loadings
data(College)
set.seed(1)
n <- nrow(College)
train <- sample(1:n, size = round(0.8 * n))
test <- setdiff(1:n, train)
train_data <- College[train, ]
test_data <- College[test, ]
x <- model.matrix(Apps ~ ., data = College)[, -1]
y <- College$Apps
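
As a quick sanity check (a minimal sketch, optional), the split should yield 622 training and 155 test rows with no overlap; the 622 matches the X dimension reported by pcr() in part (e).

# Sanity check: 777 rows split roughly 80/20, with disjoint indices
length(train)   # 622
length(test)    # 155
stopifnot(length(intersect(train, test)) == 0)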

(b) Fit a linear model using least squares on the training set, and report the test error obtained.

lm.fit <- lm(Apps ~ ., data = train_data)
lm.pred <- predict(lm.fit, newdata = test_data)
lm.mse <- mean((lm.pred - test_data$Apps)^2)
lm.mse
## [1] 1578073

(c) Fit a ridge regression model on the training set, with λ chosen by cross-validation. Report the test error obtained.

grid <- 10^seq(10, -2, length = 100)
ridge.mod <- glmnet(x[train, ], y[train], alpha = 0, lambda = grid)
cv.ridge <- cv.glmnet(x[train, ], y[train], alpha = 0)
bestlam.ridge <- cv.ridge$lambda.min
bestlam.ridge
## [1] 362.6608
ridge.pred <- predict(ridge.mod, s = bestlam.ridge, newx = x[test, ])
ridge.mse <- mean((ridge.pred - y[test])^2)
ridge.mse
## [1] 1445510
cat("\n--- Ridge Results ---\n",
    "Test MSE:", round(ridge.mse, 2), 
    "; best λ:", signif(bestlam.ridge, 3), "\n")
## 
## --- Ridge Results ---
##  Test MSE: 1445510 ; best λ: 363
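
For reference, a short sketch (not part of the required answer) to inspect the ridge fit at the chosen λ; unlike the lasso, ridge shrinks all 17 coefficients but never sets any exactly to zero.

# Inspect ridge coefficients at lambda.min; all predictors remain non-zero
ridge.coef <- predict(ridge.mod, s = bestlam.ridge, type = "coefficients")
round(ridge.coef, 3)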

(d) Fit a lasso model on the training set, with λ chosen by cross-validation. Report the test error obtained, along with the number of non-zero coefficient estimates.

lasso.mod <- glmnet(x[train, ], y[train], alpha = 1, lambda = grid)
cv.lasso <- cv.glmnet(x[train, ], y[train], alpha = 1)
bestlam.lasso <- cv.lasso$lambda.min

bestlam.lasso
## [1] 1.935412
lasso.pred <- predict(lasso.mod, s = bestlam.lasso, newx = x[test, ])
lasso.mse <- mean((lasso.pred - y[test])^2)

lasso.coef <- predict(lasso.mod, s = bestlam.lasso, type = "coefficients")[1:18, ]
nonzero.count <- sum(lasso.coef != 0) - 1  # exclude intercept

lasso.mse
## [1] 1565419
nonzero.count
## [1] 17
cat("\n--- Lasso Results ---\n",
    "Test MSE:", round(lasso.mse, 2), 
    "; best λ:", signif(bestlam.lasso, 3), 
    "; non‑zero coefficients:", nonzero.count, "\n")
## 
## --- Lasso Results ---
##  Test MSE: 1565419 ; best λ: 1.94 ; non-zero coefficients: 17
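
Since all 17 coefficients are non-zero at lambda.min, the chosen λ barely regularizes here. A hedged sketch of one alternative, assuming a sparser model is wanted: the larger lambda.1se from the same cv.glmnet fit typically zeroes more coefficients at a small cost in CV error.

# Sparser alternative: coefficients at lambda.1se (one-standard-error rule)
lasso.coef.1se <- predict(lasso.mod, s = cv.lasso$lambda.1se, type = "coefficients")
sum(lasso.coef.1se != 0) - 1  # non-zero coefficients, excluding intercept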

(e) Fit a PCR model on the training set, with M chosen by cross-validation. Report the test error obtained, along with the value of M selected by cross-validation.

pcr.fit <- pcr(Apps ~ ., data = train_data, scale = TRUE, validation = "CV")
summary(pcr.fit)
## Data:    X dimension: 622 17 
##  Y dimension: 622 1
## Fit method: svdpc
## Number of components considered: 17
## 
## VALIDATION: RMSEP
## Cross-validated using 10 random segments.
##        (Intercept)  1 comps  2 comps  3 comps  4 comps  5 comps  6 comps
## CV            3834     3753     2060     2071     1672     1594     1579
## adjCV         3834     3754     2057     2072     1665     1583     1576
##        7 comps  8 comps  9 comps  10 comps  11 comps  12 comps  13 comps
## CV        1566     1526     1497      1483      1486      1488      1484
## adjCV     1565     1519     1494      1481      1484      1485      1481
##        14 comps  15 comps  16 comps  17 comps
## CV         1486      1464      1160      1079
## adjCV      1483      1454      1150      1073
## 
## TRAINING: % variance explained
##       1 comps  2 comps  3 comps  4 comps  5 comps  6 comps  7 comps  8 comps
## X      32.019    57.05    64.13    70.01    75.36    80.38    84.09    87.44
## Apps    4.315    72.01    72.02    81.89    83.65    83.73    83.98    85.12
##       9 comps  10 comps  11 comps  12 comps  13 comps  14 comps  15 comps
## X       90.48     92.84     94.92     96.78     97.86     98.72     99.36
## Apps    85.40     85.75     85.75     85.76     85.88     85.94     89.94
##       16 comps  17 comps
## X        99.83    100.00
## Apps     92.88     93.47
validationplot(pcr.fit, val.type = "MSEP")

# Choose the number of components with the lowest CV (PRESS) error
pcr.ncomp <- which.min(pcr.fit$validation$PRESS)
pcr.pred <- predict(pcr.fit, newdata = test_data, ncomp = pcr.ncomp)
pcr.mse <- mean((pcr.pred - test_data$Apps)^2)
pcr.mse
## [1] 1578073
pcr.ncomp
## [1] 17
cat("\n--- PCR Results ---\n",
    "Test MSE:", round(pcr.mse, 2), 
    "; Optimal number of components (M):", pcr.ncomp, "\n")
## 
## --- PCR Results ---
##  Test MSE: 1578073 ; Optimal number of components (M): 17
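
Cross-validation chose M = 17, i.e., no dimension reduction at all, so this PCR fit is least squares in rotated coordinates (hence the identical test MSE). A sketch of the one-standard-error rule via pls::selectNcomp, which typically selects a smaller M at little cost in CV error:

# One-SE rule alternative for choosing M (often smaller than the CV minimum)
pcr.ncomp.1se <- selectNcomp(pcr.fit, method = "onesigma", plot = FALSE)
pcr.ncomp.1se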

(f) Fit a PLS model on the training set, with M chosen by cross-validation. Report the test error obtained, along with the value of M selected by cross-validation.

pls.fit <- plsr(Apps ~ ., data = train_data, scale = TRUE, validation = "CV")
summary(pls.fit)
## Data:    X dimension: 622 17 
##  Y dimension: 622 1
## Fit method: kernelpls
## Number of components considered: 17
## 
## VALIDATION: RMSEP
## Cross-validated using 10 random segments.
##        (Intercept)  1 comps  2 comps  3 comps  4 comps  5 comps  6 comps
## CV            3834     1880     1630     1445     1391     1243     1165
## adjCV         3834     1876     1630     1440     1374     1221     1153
##        7 comps  8 comps  9 comps  10 comps  11 comps  12 comps  13 comps
## CV        1143     1130     1124      1121      1122      1118      1116
## adjCV     1133     1121     1116      1112      1113      1109      1108
##        14 comps  15 comps  16 comps  17 comps
## CV         1116      1116      1116      1116
## adjCV      1108      1108      1108      1108
## 
## TRAINING: % variance explained
##       1 comps  2 comps  3 comps  4 comps  5 comps  6 comps  7 comps  8 comps
## X       25.52    45.30    62.57    65.06    67.50    72.05    76.04    80.49
## Apps    77.30    83.58    87.50    90.88    92.89    93.15    93.24    93.31
##       9 comps  10 comps  11 comps  12 comps  13 comps  14 comps  15 comps
## X       82.50     85.41     87.76     91.08     92.72     95.12     96.97
## Apps    93.39     93.42     93.45     93.46     93.46     93.47     93.47
##       16 comps  17 comps
## X        97.98    100.00
## Apps     93.47     93.47
validationplot(pls.fit, val.type = "MSEP")

# Choose the number of components with the lowest CV (PRESS) error
pls.ncomp <- which.min(pls.fit$validation$PRESS)
pls.pred <- predict(pls.fit, newdata = test_data, ncomp = pls.ncomp)
pls.mse <- mean((pls.pred - test_data$Apps)^2)
pls.mse
## [1] 1578073
pls.ncomp
## [1] 17
cat("\n--- PLS Results ---\n",
    "Test MSE:", round(pls.mse, 2), 
    "; Optimal number of components (M):", pls.ncomp, "\n")
## 
## --- PLS Results ---
##  Test MSE: 1578073 ; Optimal number of components (M): 17
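
The same caveat applies: CV again selected the maximum M = 17, even though the RMSEP table flattens after about 10 components, so a one-SE-style choice (sketched below, same caveats as for PCR) would likely give a much smaller model with nearly identical accuracy.

# One-SE rule for PLS; likely to pick far fewer than 17 components
selectNcomp(pls.fit, method = "onesigma", plot = FALSE)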

(g) Comment on the results obtained. How accurately can we predict the number of college applications received? Is there much difference among the test errors resulting from these five approaches?

cat("Test MSEs:\n")
## Test MSEs:
cat("Linear Regression:", lm.mse, "\n")
## Linear Regression: 1578073
cat("Ridge Regression:", ridge.mse, "\n")
## Ridge Regression: 1445510
cat("Lasso Regression:", lasso.mse, "\n")
## Lasso Regression: 1565419
cat("PCR:", pcr.mse, "\n")
## PCR: 1578073
cat("PLS:", pls.mse, "\n")
## PLS: 1578073
range_values <- range(College$Apps)
print(range_values)
## [1]    81 48094

Predictions are off by roughly 1,200 to 1,260 applications per college (the square roots of the test MSEs), which is about 2.5% of the observed range of Apps (81 to 48,094), so the models predict fairly accurately. There is not much difference among the five approaches: the gap between the best (ridge regression, MSE 1,445,510) and the worst (least squares, PCR, and PLS, all at MSE 1,578,073) is about 133,000 in MSE, under 10% in relative terms. Note that cross-validation chose M = 17 (all components) for both PCR and PLS, so those fits reproduce the least squares fit exactly, which is why the three test MSEs are identical.
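
A short sketch backing those numbers, converting each test MSE to an RMSE in application counts and expressing it as a share of the range of Apps:

# RMSE in applications, and as a percentage of the observed range of Apps
mses <- c(lm = lm.mse, ridge = ridge.mse, lasso = lasso.mse,
          pcr = pcr.mse, pls = pls.mse)
round(sqrt(mses))                                       # roughly 1200-1260
round(100 * sqrt(mses) / diff(range(College$Apps)), 2)  # about 2.5-2.6%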

Question 11. We will now try to predict per capita crime rate in the Boston data set.

(a) Try out some of the regression methods explored in this chapter, such as best subset selection, the lasso, ridge regression, and PCR. Present and discuss results for the approaches that you consider.

library(leaps)
data(Boston)
set.seed(1)
train_indices <- sample(1:nrow(Boston), size = 0.8 * nrow(Boston))
train_data <- Boston[train_indices, ]
test_data <- Boston[-train_indices, ]
# Ridge Regression Method 
x_train <- model.matrix(crim ~ ., data = train_data)[, -1]
y_train <- train_data$crim
x_test <- model.matrix(crim ~ ., data = test_data)[, -1]
y_test <- test_data$crim

# Fit ridge with CV
cv.ridge <- cv.glmnet(x_train, y_train, alpha = 0)
best_lambda_ridge <- cv.ridge$lambda.min
ridge.pred <- predict(cv.ridge, s = best_lambda_ridge, newx = x_test)
ridge_mse <- mean((ridge.pred - y_test)^2)
cat("Ridge test MSE:", ridge_mse, "\n")
## Ridge test MSE: 65.66146
# Lasso Regression
# Fit lasso with CV
cv.lasso <- cv.glmnet(x_train, y_train, alpha = 1)
best_lambda_lasso <- cv.lasso$lambda.min
lasso.pred <- predict(cv.lasso, s = best_lambda_lasso, newx = x_test)
lasso_mse <- mean((lasso.pred - y_test)^2)
cat("Lasso test MSE:", lasso_mse, "\n")
## Lasso test MSE: 64.66192
# Number of non-zero coefficients
lasso.coef <- predict(cv.lasso, s = best_lambda_lasso, type = "coefficients")
cat("Number of non-zero coefficients in lasso:", sum(lasso.coef != 0) - 1, "\n")
## Number of non-zero coefficients in lasso: 11
# PCR Method

pcr.fit <- pcr(crim ~ ., data = train_data, scale = TRUE, validation = "CV")
validationplot(pcr.fit, val.type = "MSEP")

# Choose number of components with lowest CV error
pcr.ncomp <- which.min(pcr.fit$validation$PRESS)
pcr.pred <- predict(pcr.fit, newdata = test_data, ncomp = pcr.ncomp)
pcr.mse <- mean((pcr.pred - test_data$crim)^2)
cat("PCR test MSE:", pcr.mse, "with components:", pcr.ncomp, "\n")
## PCR test MSE: 64.30352 with components: 12
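
The prompt also mentions best subset selection, and leaps is already loaded, so here is a minimal validation-set sketch; the predict.regsubsets helper is the usual workaround from the ISL labs, since regsubsets() has no built-in predict method.

# Best subset selection, evaluated by validation-set MSE
regfit.best <- regsubsets(crim ~ ., data = train_data, nvmax = 12)

# Helper: build the test model matrix and multiply by the coefficients
# of the size-id model
predict.regsubsets <- function(object, newdata, id, ...) {
  form <- as.formula(object$call[[2]])
  mat <- model.matrix(form, newdata)
  coefi <- coef(object, id = id)
  drop(mat[, names(coefi)] %*% coefi)
}

val.errors <- sapply(1:12, function(i) {
  mean((test_data$crim - predict(regfit.best, test_data, id = i))^2)
})
which.min(val.errors)  # subset size with the lowest validation MSE
min(val.errors)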

(b) Propose a model (or set of models) that seem to perform well on this data set, and justify your answer. Make sure that you are evaluating model performance using validation set error, cross-validation, or some other reasonable alternative, as opposed to using training error.

Among the models I ran, PCR has the lowest test MSE at 64.30, compared with 64.66 for the lasso and 65.66 for ridge regression, indicating slightly better predictive performance. If predictive accuracy is the priority, PCR is the best choice; if model simplicity matters more, the lasso, which drops a predictor entirely, is the better option.

(c) Does your chosen model involve all of the features in the data set? Why or why not?

I chose the PCR model since predictive accuracy is the main priority. However, cross-validation selected M = 12 components, which equals the number of predictors of crim in the ISLR2 Boston data (12), so this PCR fit performs no dimension reduction: each component is a linear combination of all 12 predictors, and with all components retained the model is essentially least squares on the full feature set. If a model that genuinely excludes features is preferred, the lasso is the natural choice; at its cross-validated λ it keeps 11 non-zero coefficients, dropping one predictor.
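
A hedged check on that sparsity claim, listing whichever predictor the lasso zeroed at its chosen λ (reusing the lasso.coef object computed in part (a)):

# Which predictor did the lasso drop? (coefficient exactly zero)
rownames(lasso.coef)[as.matrix(lasso.coef) == 0]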