Assignment5

Chapter 6 Questions

2.

a. iii. Less flexible and hence will give improved prediction accuracy when its increase in bias is less than its decrease in variance.

Lasso shrinks coefficient estimates toward zero and can set some of them exactly to zero, so it yields a simpler model. It is less flexible than ordinary least squares, which places no constraint on the coefficients. The shrinkage introduces some bias, but it reduces variance, so the fit is less sensitive to noise in the training data. When the drop in variance outweighs the added bias, lasso gives more accurate predictions on new data.
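As a rough illustration of how lasso can set coefficients exactly to zero, the sketch below applies soft-thresholding, which is the form the lasso solution takes in the special case of an orthonormal design. The function name soft_threshold, the example coefficients, and the threshold value are made up for illustration; in practice the threshold depends on lambda and on how the objective is scaled.

```r
# Soft-thresholding: move each coefficient toward zero by `threshold`,
# and set it to exactly zero if it is smaller than `threshold` in size.
soft_threshold <- function(beta_ols, threshold) {
  sign(beta_ols) * pmax(abs(beta_ols) - threshold, 0)
}

# Hypothetical least squares estimates (illustrative values only)
beta_ols <- c(3.0, -1.5, 0.4, -0.2)

beta_lasso <- soft_threshold(beta_ols, threshold = 0.5)
print(beta_lasso)
```

The two large coefficients are pulled toward zero by the threshold, while the two small ones are removed entirely, which is the variable-selection behavior described above.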

b. iii. Less flexible and hence will give improved prediction accuracy when its increase in bias is less than its decrease in variance.

Ridge regression adds a penalty on the size of the coefficients, which shrinks them toward zero without setting any of them exactly to zero. This makes it less flexible than ordinary least squares. Ridge will not fit the training data as closely, which adds bias, but it reduces variance, and when that reduction outweighs the added bias it predicts better on new data.
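The shrinkage can be seen directly from the closed-form ridge solution. The sketch below is a minimal base R illustration on simulated data; the helper ridge_coef, the simulated coefficients, and the lambda grid are assumptions made for the example, and the penalty here is not scaled the same way as in glmnet.

```r
set.seed(1)

# Simulated data: 5 standardized predictors, centered response
n <- 50
p <- 5
X <- scale(matrix(rnorm(n * p), n, p))
y <- as.numeric(X %*% c(2, -1, 0.5, 0, 0) + rnorm(n))
y <- y - mean(y)

# Closed-form ridge estimate: (X'X + lambda * I)^(-1) X'y
ridge_coef <- function(X, y, lambda) {
  solve(crossprod(X) + lambda * diag(ncol(X)), crossprod(X, y))
}

# The L2 norm of the coefficient vector shrinks as lambda grows
lambdas <- c(0, 1, 10, 100)
norms <- sapply(lambdas, function(l) sqrt(sum(ridge_coef(X, y, l)^2)))
print(data.frame(lambda = lambdas, l2_norm = norms))
```

With lambda = 0 this reproduces least squares; larger values pull every coefficient toward zero, trading a little bias for lower variance.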

c. ii. More flexible and hence will give improved prediction accuracy when its increase in variance is less than its decrease in bias.

Non-linear methods are more flexible than least squares and can fit more complex patterns in the data. This flexibility helps when the true relationship between the variables is not linear, because it reduces bias. The cost is higher variance, so a very flexible method can overfit. If the benefit from reducing bias is greater than the cost of the added variance, these methods will be more accurate.
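The trade-off behind parts a to c can be written with the standard decomposition of the expected test error at a point x_0, stated here in the usual textbook notation as a reminder rather than a derivation:

```latex
E\left(y_0 - \hat{f}(x_0)\right)^2
  = \operatorname{Var}\left(\hat{f}(x_0)\right)
  + \left[\operatorname{Bias}\left(\hat{f}(x_0)\right)\right]^2
  + \operatorname{Var}(\varepsilon)
```

A less flexible method such as lasso or ridge lowers the variance term and raises the bias term; a more flexible non-linear method does the reverse. The last term is irreducible noise and is unaffected by the choice of method.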

9.

library(ISLR2)
library(leaps)
library(MASS)
library(glmnet)
library(pls)
data(College)

#Train/test split (70% training, 30% test)
set.seed(1)

n <- nrow(College)
train_index <- sample(1:n, size = round(0.7 * n))

train_set <- College[train_index, ]
test_set <- College[-train_index, ]

train_set$Private <- ifelse(train_set$Private == "Yes", 1, 0)
test_set$Private <- ifelse(test_set$Private == "Yes", 1, 0)

#Least Squares
lm_fit <- lm(Outstate ~ ., data = train_set)
predictions <- predict(lm_fit, newdata = test_set)

mse <- mean((test_set$Outstate - predictions)^2)
mse

## [1] 3419805

x_train <- model.matrix(Outstate ~ ., data = train_set)[, -1]
y_train <- train_set$Outstate

x_test <- model.matrix(Outstate ~ ., data = test_set)[, -1]
y_test <- test_set$Outstate

#Ridge Regression (alpha = 0), with lambda chosen by cross-validation
cv_ridge <- cv.glmnet(x_train, y_train, alpha = 0)

best_lambda <- cv_ridge$lambda.min
ridge_pred <- predict(cv_ridge, s = best_lambda, newx = x_test)

ridge_mse <- mean((y_test - ridge_pred)^2)
ridge_mse

## [1] 3456248

#Lasso Regression (alpha = 1), with lambda chosen by cross-validation
cv_lasso <- cv.glmnet(x_train, y_train, alpha = 1)

best_lambda_lasso <- cv_lasso$lambda.min
lasso_pred <- predict(cv_lasso, s = best_lambda_lasso, newx = x_test)

lasso_mse <- mean((y_test - lasso_pred)^2)

#Count the non-zero coefficients, excluding the intercept
lasso_coef <- predict(cv_lasso, s = best_lambda_lasso, type = "coefficients")
nonzero_coef_count <- sum(lasso_coef != 0) - 1

lasso_mse

## [1] 3383435

nonzero_coef_count

## [1] 14

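#PCR, with the number of components M chosen by cross-validation
pcr_fit <- pcr(Outstate ~ ., data = train_set, scale = TRUE, validation = "CV")

#RMSEP is the square root of MSEP, so both are minimized at the same M;
#[1, 1, -1] takes the CV estimate and drops the 0-component (intercept-only) model
cv_msep <- RMSEP(pcr_fit)
M_opt <- which.min(cv_msep$val[1, 1, -1])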

pcr_pred <- predict(pcr_fit, newdata = test_set, ncomp = M_opt)
pcr_mse <- mean((test_set$Outstate - pcr_pred)^2)

pcr_mse

## [1] 3560640

M_opt

## 16 comps 
##       16

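#PLS, with the number of components M chosen by cross-validation
pls_fit <- plsr(Outstate ~ ., data = train_set, scale = TRUE, validation = "CV")

#Same selection rule as for PCR: minimize the CV error over 1..M components
cv_msep1 <- RMSEP(pls_fit)
M_opt1 <- which.min(cv_msep1$val[1, 1, -1])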

pls_pred <- predict(pls_fit, newdata = test_set, ncomp = M_opt1)
pls_mse <- mean((test_set$Outstate - pls_pred)^2)

pls_mse

## [1] 3434281

M_opt1

## 11 comps 
##       11

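g. The response modeled in the code above, out-of-state tuition (Outstate), can be predicted reasonably well, though not perfectly. Lasso performed best, with a test MSE of 3,383,435, against 3,419,805 for least squares, 3,434,281 for PLS, 3,456,248 for ridge, and 3,560,640 for PCR. The differences in test error are not dramatic (about 5% between the best and worst model), which suggests that the choice of method matters less here than the information in the predictors, and that further improvement may require more informative or non-linear features.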
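To make the comparison easier to read, the test MSEs printed in the output above can be collected and ranked. This is only a convenience sketch: the numbers are copied by hand from that output, and the names test_mse and comparison are introduced here for illustration.

```r
# Test MSEs copied from the printed output in this section
test_mse <- c(
  least_squares = 3419805,
  ridge         = 3456248,
  lasso         = 3383435,
  pcr           = 3560640,
  pls           = 3434281
)

# Rank from best to worst; add RMSE (in the units of the response)
# and the percentage by which each model exceeds the best test MSE
ranked <- sort(test_mse)
comparison <- data.frame(
  test_mse       = ranked,
  test_rmse      = round(sqrt(ranked)),
  pct_above_best = round(100 * (ranked / min(ranked) - 1), 1)
)
print(comparison)
```

Reading the errors on the RMSE scale, or as a percentage above the best model, shows how close the five methods are on this split.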

11.

a. Best subset selection, ridge regression, lasso, and PCR were fit to predict the per capita crime rate (crim) in the Boston data, using a 70/30 train/test split and comparing test MSE. Best subset selection, with the subset size chosen by validation error, predicted well. Lasso worked slightly better: it made accurate predictions and simplified the model by shrinking the coefficients of unimportant predictors to zero. Overall, lasso had the best balance between accuracy and simplicity.

data(Boston, package = "MASS")  #MASS version has 13 predictors, matching nvmax = 13 below

set.seed(2)

nb <- nrow(Boston)
train_indexb <- sample(1:nb, round(0.7 * nb))
train <- Boston[train_indexb, ]
test <- Boston[-train_indexb, ]

#Best Subset Selection
regfit_full <- regsubsets(crim ~ ., data = train, nvmax = 13)
val_errors <- rep(NA, 13)

test_mat <- model.matrix(crim ~ ., data = test)
for (i in 1:13) {
  coef_i <- coef(regfit_full, id = i)
  pred_i <- test_mat[, names(coef_i)] %*% coef_i
  val_errors[i] <- mean((test$crim - pred_i)^2)
}

best_size <- which.min(val_errors)
best_subset_mse <- val_errors[best_size]

#Ridge Regression
x_trainb <- model.matrix(crim ~ ., data = train)[, -1]
y_trainb <- train$crim
x_testb <- model.matrix(crim ~ ., data = test)[, -1]
y_testb <- test$crim

cv_ridgeb <- cv.glmnet(x_trainb, y_trainb, alpha = 0)
ridge_predb <- predict(cv_ridgeb, s = cv_ridgeb$lambda.min, newx = x_testb)
ridge_mseb <- mean((y_testb - ridge_predb)^2)

#Lasso Regression
cv_lassob <- cv.glmnet(x_trainb, y_trainb, alpha = 1)
lasso_predb <- predict(cv_lassob, s = cv_lassob$lambda.min, newx = x_testb)
lasso_mseb <- mean((y_testb - lasso_predb)^2)

lasso_coefb <- predict(cv_lassob, s = cv_lassob$lambda.min, type = "coefficients")
lasso_nonzero <- sum(lasso_coefb != 0) - 1

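#PCR
pcr_fitb <- pcr(crim ~ ., data = train, scale = TRUE, validation = "CV")
#RMSEP and MSEP share the same minimizer; [-1] drops the 0-component model
pcr_msep <- RMSEP(pcr_fitb)
pcr_M <- which.min(pcr_msep$val[1, 1, -1])
pcr_predb <- predict(pcr_fitb, newdata = test, ncomp = pcr_M)
pcr_mseb <- mean((test$crim - pcr_predb)^2)

#Test MSE by method, and the number of predictors lasso keeps
c(best_subset = best_subset_mse, ridge = ridge_mseb, lasso = lasso_mseb, pcr = pcr_mseb)
lasso_nonzero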

b. Lasso is the proposed model because it made accurate predictions while also simplifying the model by dropping less important predictors, which makes it easier to interpret. It had the lowest error of the models compared, and that error was measured on a held-out test set rather than on the training data.
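A single train/test split can be noisy, so k-fold cross-validation is a common alternative for this kind of comparison. The sketch below is a minimal base R version on simulated data, not the analysis above: the data frame sim, the helper cv_mse, and the choice of 5 folds are all assumptions made for illustration, and the helper assumes the response column is named y.

```r
set.seed(1)

# Simulated data: y depends on x1 only; x2 is pure noise
n <- 100
sim <- data.frame(x1 = rnorm(n), x2 = rnorm(n))
sim$y <- 1 + 2 * sim$x1 + rnorm(n)

# Randomly assign each row to one of k folds
k <- 5
fold <- sample(rep(seq_len(k), length.out = n))

# k-fold CV estimate of test MSE for a linear model formula
cv_mse <- function(formula, data, fold) {
  errs <- sapply(sort(unique(fold)), function(f) {
    fit  <- lm(formula, data = data[fold != f, ])
    pred <- predict(fit, newdata = data[fold == f, ])
    mean((data$y[fold == f] - pred)^2)
  })
  mean(errs)
}

cv_results <- c(
  x1_only   = cv_mse(y ~ x1, sim, fold),
  x2_only   = cv_mse(y ~ x2, sim, fold),
  x1_and_x2 = cv_mse(y ~ x1 + x2, sim, fold)
)
print(cv_results)
```

Each model is fit k times, always tested on the fold it did not see, and the averaged error gives a more stable basis for choosing between models than one split.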

c. No, the chosen model does not involve all of the features in the dataset. The lasso penalty sets the coefficients of features that do not help with prediction exactly to zero. This makes the model simpler, easier to interpret, and lower in variance, at the cost of a small increase in bias. Keeping only the most useful features reduces overfitting and can improve performance on new data.

Mandi Stanley

2025-07-10