The lasso is less flexible than least squares because it imposes an ℓ₁ penalty that shrinks coefficients and can set some equal to zero. This increases bias but reduces variance, so lasso improves prediction when the reduction in variance outweighs the bias increase.
Ridge regression is also less flexible than least squares. Using an ℓ₂ penalty, it shrinks coefficients toward zero (without eliminating them), which introduces bias but lowers variance. Prediction improves when the variance decrease compensates for the bias increase.
Non-linear methods are more flexible than least squares. They typically have lower bias because they can follow complex trends, but they tend to have higher variance. They improve prediction when the reduction in bias more than offsets the increase in variance.
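For reference, these are the criteria the two penalized fits minimize (standard definitions; λ ≥ 0 is the tuning parameter, and λ = 0 recovers least squares in both cases):

$$\hat{\beta}^{\text{ridge}} = \arg\min_{\beta} \sum_{i=1}^{n}\Big(y_i - \beta_0 - \sum_{j=1}^{p}\beta_j x_{ij}\Big)^2 + \lambda\sum_{j=1}^{p}\beta_j^{2}$$

$$\hat{\beta}^{\text{lasso}} = \arg\min_{\beta} \sum_{i=1}^{n}\Big(y_i - \beta_0 - \sum_{j=1}^{p}\beta_j x_{ij}\Big)^2 + \lambda\sum_{j=1}^{p}\lvert\beta_j\rvert$$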
# Load the College dataset from ISLR
library(ISLR2)
data("College")
# Split the data: 70% training and 30% test
set.seed(1)
train_index <- sample(1:nrow(College), size = floor(0.7 * nrow(College)))
train_data <- College[train_index, ]
test_data <- College[-train_index, ]
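# Quick sanity check of the split: College has 777 rows, so floor(0.7 * 777)
# = 543 observations land in training and the remaining 234 in test
nrow(train_data) # 543
nrow(test_data)  # 234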
# Fit a linear regression model predicting the number of applications ("Apps")
fit_lm <- lm(Apps ~ ., data = train_data)
# Make predictions on the test set
pred_lm <- predict(fit_lm, newdata = test_data)
# Calculate the residuals and MSE for the test set
residuals_lm <- test_data$Apps - pred_lm
test_mse_lm <- mean(residuals_lm^2)
# Report the test MSE
cat("Test MSE for Least Squares:", test_mse_lm, "\n")
## Test MSE for Least Squares: 1261630
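A raw MSE near 1.26 million is hard to judge on its own; a test-set R² against a baseline that always predicts the training mean gives a scale-free view (a small sketch using quantities already computed above):

# Test R^2 relative to predicting the training-mean number of applications
baseline_mse <- mean((test_data$Apps - mean(train_data$Apps))^2)
cat("Test R^2 for Least Squares:", 1 - test_mse_lm / baseline_mse, "\n")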
# Load the glmnet package for ridge regression
library(glmnet)
# Prepare the data for glmnet (response and predictors)
x_train <- model.matrix(Apps ~ ., train_data)[, -1] # Remove intercept
y_train <- train_data$Apps
x_test <- model.matrix(Apps ~ ., test_data)[, -1]
y_test <- test_data$Apps
# Fit a ridge regression model (alpha = 0 for ridge)
ridge_model <- cv.glmnet(x_train, y_train, alpha = 0)
# Get the best lambda chosen by cross-validation
best_lambda_ridge <- ridge_model$lambda.min
# Make predictions on the test set using the best lambda
pred_ridge <- predict(ridge_model, s = best_lambda_ridge, newx = x_test)
# Calculate the residuals and MSE for the test set
residuals_ridge <- y_test - pred_ridge
test_mse_ridge <- mean(residuals_ridge^2)
# Report the test MSE
cat("Test MSE for Ridge Regression:", test_mse_ridge, "\n")
## Test MSE for Ridge Regression: 1121034
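cv.glmnet also reports lambda.1se, the largest λ whose CV error is within one standard error of the minimum; a sketch comparing it with lambda.min (the resulting MSE depends on the random fold assignment):

# Predictions under the more heavily regularized one-standard-error choice
pred_ridge_1se <- predict(ridge_model, s = ridge_model$lambda.1se, newx = x_test)
cat("Test MSE for Ridge (lambda.1se):", mean((y_test - pred_ridge_1se)^2), "\n")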
# Fit a lasso regression model (alpha = 1 for lasso)
lasso_model <- cv.glmnet(x_train, y_train, alpha = 1)
# Get the best lambda chosen by cross-validation
best_lambda_lasso <- lasso_model$lambda.min
# Make predictions on the test set using the best lambda
pred_lasso <- predict(lasso_model, s = best_lambda_lasso, newx = x_test)
# Calculate the residuals and MSE for the test set
residuals_lasso <- y_test - pred_lasso
test_mse_lasso <- mean(residuals_lasso^2)
# Count the non-zero coefficients (note: this count includes the intercept)
num_non_zero_lasso <- sum(coef(lasso_model, s = best_lambda_lasso) != 0)
# Report the test MSE and number of non-zero coefficients
cat("Test MSE for Lasso Regression:", test_mse_lasso, "\n")
## Test MSE for Lasso Regression: 1254408
cat("Number of non-zero coefficients in Lasso:", num_non_zero_lasso, "\n")
## Number of non-zero coefficients in Lasso: 18
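Since the count above includes the intercept, 18 non-zero coefficients means the lasso retained all 17 predictors at lambda.min. A sketch for listing them explicitly:

# List the predictors with non-zero lasso coefficients (intercept excluded)
lasso_coefs <- as.matrix(coef(lasso_model, s = best_lambda_lasso))
setdiff(rownames(lasso_coefs)[lasso_coefs != 0], "(Intercept)")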
# Load the pls package for PCR
library(pls)
# Fit a PCR model (cross-validation is computed over the number of components)
pcr_model <- pcr(Apps ~ ., data = train_data, scale = TRUE, validation = "CV")
# NOTE: pcr_model$ncomp is the maximum number of components fitted (all 17
# here), not a CV-selected value; with M = p, PCR reproduces least squares
best_M_pcr <- pcr_model$ncomp
# Make predictions on the test set
pred_pcr <- predict(pcr_model, ncomp = best_M_pcr, newdata = test_data)
# Calculate the residuals and MSE for the test set
residuals_pcr <- test_data$Apps - pred_pcr
test_mse_pcr <- mean(residuals_pcr^2)
# Report the test MSE and value of M
cat("Test MSE for PCR:", test_mse_pcr, "\n")
## Test MSE for PCR: 1261630
cat("Value of M selected by cross-validation for PCR:", best_M_pcr, "\n")
## Value of M selected by cross-validation for PCR: 17
# Fit a PLS model (cross-validation is computed over the number of components)
pls_model <- plsr(Apps ~ ., data = train_data, scale = TRUE, validation = "CV")
# As with PCR, $ncomp is the maximum number of components fitted (all 17)
best_M_pls <- pls_model$ncomp
# Predict on the test set and compute the test MSE
pred_pls <- predict(pls_model, ncomp = best_M_pls, newdata = test_data)
residuals_pls <- test_data$Apps - pred_pls
test_mse_pls <- mean(residuals_pls^2)
cat("Test MSE for PLS:", test_mse_pls, "\n")
## Test MSE for PLS: 1261630
cat("Value of M selected by cross-validation for PLS:", best_M_pls, "\n")
## Value of M selected by cross-validation for PLS: 17
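Because $ncomp returned the maximum, both fits above use all 17 components and therefore match least squares exactly. A sketch of picking M from the cross-validation curve instead, using pls::selectNcomp (the "onesigma" method applies a one-standard-error rule; the chosen M depends on the random CV folds):

# Choose M where the CV error is within one standard error of its minimum
M_pcr_cv <- selectNcomp(pcr_model, method = "onesigma", plot = FALSE)
M_pls_cv <- selectNcomp(pls_model, method = "onesigma", plot = FALSE)
cat("CV-chosen M for PCR:", M_pcr_cv, "| for PLS:", M_pls_cv, "\n")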
cat("\nSummary of test errors:\n")
##
## Summary of test errors:
cat("Least Squares Test MSE:", test_mse_lm, "\n")
## Least Squares Test MSE: 1261630
cat("Ridge Regression Test MSE:", test_mse_ridge, "\n")
## Ridge Regression Test MSE: 1121034
cat("Lasso Regression Test MSE:", test_mse_lasso, "\n")
## Lasso Regression Test MSE: 1254408
cat("PCR Test MSE:", test_mse_pcr, "\n")
## PCR Test MSE: 1261630
cat("PLS Test MSE:", test_mse_pls, "\n")
## PLS Test MSE: 1261630
Best Performing Model: Ridge regression outperforms the other methods with the lowest test MSE (1,121,034). This suggests that shrinking the coefficients trades a little bias for a useful reduction in variance here, which is plausible given the strong correlations among several College predictors (e.g., Accept and Enroll).
Comparison of Other Models:
Least Squares, PCR, and PLS have identical test MSEs. This is expected rather than evidence about dimensionality reduction: since all M = 17 components were retained, the PCR and PLS fits reproduce the least squares fit exactly.
Lasso Regression (1,254,408) performed only marginally better than least squares (1,261,630), and it retained all 17 predictors, so variable selection did not materially change the model for these data.
# Load necessary libraries and the Boston dataset
library(MASS) # MASS's Boston (used below) masks the ISLR2 version
library(leaps)
library(glmnet)
library(pls)
# Load the Boston data
data("Boston")
# Split the data into training and test sets
set.seed(1)
train_index <- sample(1:nrow(Boston), size = floor(0.7 * nrow(Boston)))
train_data <- Boston[train_index, ]
test_data <- Boston[-train_index, ]
# Prepare the response variable (crim) and predictors
y_train <- train_data$crim
x_train <- model.matrix(crim ~ ., train_data)[, -1] # Remove intercept
y_test <- test_data$crim
x_test <- model.matrix(crim ~ ., test_data)[, -1] # Remove intercept
# Perform best subset selection
best_subset <- regsubsets(crim ~ ., data = train_data, nvmax = 13)
best_subset_summary <- summary(best_subset)
# Plot the model selection criteria (Cp and BIC)
par(mfrow = c(1, 2))
plot(best_subset_summary$cp, xlab = "Number of Variables", ylab = "Cp", type = "l")
plot(best_subset_summary$bic, xlab = "Number of Variables", ylab = "BIC", type = "l")
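# Optional: mark the BIC minimum on the active (right-hand) panel so the
# selected model size is easy to read off the curve
points(which.min(best_subset_summary$bic), min(best_subset_summary$bic),
       col = "red", cex = 2, pch = 20)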
# Best model according to BIC
best_model_idx <- which.min(best_subset_summary$bic)
best_subset_model <- coef(best_subset, id = best_model_idx)
cat("Best Subset Model with BIC has the following coefficients:\n")
## Best Subset Model with BIC has the following coefficients:
print(best_subset_model)
## (Intercept) rad black lstat
## 0.95384519 0.43521730 -0.01310151 0.24416589
# Now, manually make predictions using the selected model
# Select the predictor variables used in the best model
selected_vars <- names(best_subset_model)[-1] # Exclude intercept
x_test_selected <- test_data[, selected_vars]
# Add intercept to the prediction
pred_best_subset <- as.matrix(cbind(1, x_test_selected)) %*% best_subset_model
# Calculate the test MSE
test_mse_best_subset <- mean((y_test - pred_best_subset)^2)
cat("Test MSE for Best Subset Selection:", test_mse_best_subset, "\n")
## Test MSE for Best Subset Selection: 61.86531
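leaps does not provide a predict() method for regsubsets objects, which is why the prediction above is assembled by hand. A reusable sketch of the same idea (predict_regsubsets is our own helper name, not part of leaps):

# Build the test model matrix, pull the coefficients for model size id,
# and multiply; equivalent to the manual computation above
predict_regsubsets <- function(object, newdata, id, formula) {
  mat <- model.matrix(formula, newdata)
  coefs <- coef(object, id = id)
  drop(mat[, names(coefs)] %*% coefs)
}
# Usage: predict_regsubsets(best_subset, test_data, best_model_idx, crim ~ .)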
# Fit a ridge regression model using cross-validation to choose lambda
ridge_model <- cv.glmnet(x_train, y_train, alpha = 0)
# Get the best lambda chosen by cross-validation
best_lambda_ridge <- ridge_model$lambda.min
# Make predictions on the test set
pred_ridge <- predict(ridge_model, s = best_lambda_ridge, newx = x_test)
# Calculate the test MSE
test_mse_ridge <- mean((y_test - pred_ridge)^2)
cat("Test MSE for Ridge Regression:", test_mse_ridge, "\n")
## Test MSE for Ridge Regression: 60.15076
# Fit a lasso regression model using cross-validation to choose lambda
lasso_model <- cv.glmnet(x_train, y_train, alpha = 1)
# Get the best lambda chosen by cross-validation
best_lambda_lasso <- lasso_model$lambda.min
# Make predictions on the test set
pred_lasso <- predict(lasso_model, s = best_lambda_lasso, newx = x_test)
# Calculate the test MSE
test_mse_lasso <- mean((y_test - pred_lasso)^2)
cat("Test MSE for Lasso Regression:", test_mse_lasso, "\n")
## Test MSE for Lasso Regression: 59.51757
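Unlike in the College fit, the lasso may drop some Boston predictors; a sketch for listing the variables it retains (which ones survive depends on the CV-chosen λ):

# Predictors with non-zero lasso coefficients for crim (intercept excluded)
boston_lasso_coefs <- as.matrix(coef(lasso_model, s = best_lambda_lasso))
setdiff(rownames(boston_lasso_coefs)[boston_lasso_coefs != 0], "(Intercept)")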
# Fit a PCR model (cross-validation is computed over the number of components)
pcr_model <- pcr(crim ~ ., data = train_data, scale = TRUE, validation = "CV")
# NOTE: as before, $ncomp is the maximum number of components fitted (all 13
# here), not a CV-selected value, so this fit uses every component
best_M_pcr <- pcr_model$ncomp
# Make predictions on the test set
pred_pcr <- predict(pcr_model, ncomp = best_M_pcr, newdata = test_data)
# Calculate the test MSE
test_mse_pcr <- mean((y_test - pred_pcr)^2)
cat("Test MSE for PCR:", test_mse_pcr, "\n")
## Test MSE for PCR: 58.97718
cat("\n### Summary of Test MSE for Different Models ###\n")
##
## ### Summary of Test MSE for Different Models ###
cat("Best Subset Selection Test MSE:", test_mse_best_subset, "\n")
## Best Subset Selection Test MSE: 61.86531
cat("Ridge Regression Test MSE:", test_mse_ridge, "\n")
## Ridge Regression Test MSE: 60.15076
cat("Lasso Regression Test MSE:", test_mse_lasso, "\n")
## Lasso Regression Test MSE: 59.51757
cat("PCR Test MSE:", test_mse_pcr, "\n")
## PCR Test MSE: 58.97718
Based on the test MSE results, PCR is the best performing model in this comparison, with the lowest test MSE of 58.98, so we would propose Principal Component Regression (PCR) for predicting the per capita crime rate in the Boston dataset.
Justification and caveats for choosing PCR:
Performance: PCR achieved the lowest test MSE of all the models considered, although the margins over lasso (59.52) and ridge (60.15) are small and could flip under a different train/test split; a fully cross-validated comparison would be more reliable.
Number of components: because best_M_pcr equals the maximum here (all 13 components), this PCR fit coincides with a least squares fit on the standardized predictors. Choosing M from the cross-validation curve (as in the selectNcomp sketch earlier) could yield a genuinely lower-dimensional model.
Dimensionality reduction: in general, PCR replaces correlated predictors with a smaller set of principal components, which can stabilize the fit when features are collinear, as several of the Boston predictors are.
Does the chosen model involve all of the features? Effectively yes. Each principal component is a linear combination of all of the original predictors, so every feature enters the fit through the component loadings, even though the model is expressed in terms of M components rather than the raw variables. When M is chosen smaller than p, the model keeps only the directions that explain the most variance in the predictors; in our fit, however, M equals the full number of components, so all of the information in the original features is used.
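To make the variance-retention point concrete, the pls package's explvar() reports the percentage of predictor variance each component captures; a quick sketch for the Boston fit:

# Cumulative % of predictor variance explained by the first 1, 2, ... components
cumsum(explvar(pcr_model))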