For parts (a) through (c), indicate which of i. through iv. is correct. Justify your answer. (a) The lasso, relative to least squares, is: i. More flexible and hence will give improved prediction accuracy when its increase in bias is less than its decrease in variance. ii. More flexible and hence will give improved prediction accuracy when its increase in variance is less than its decrease in bias. iii. Less flexible and hence will give improved prediction accuracy when its increase in bias is less than its decrease in variance. iv. Less flexible and hence will give improved prediction accuracy when its increase in variance is less than its decrease in bias.
III.Lasso is similar to ridge regression, in that it shrinks coefficients, but it goes further to reduce some coefficients to zero - removing some variables.This reduces the flexibility of the model compared to ordinary least squares. This reduction in flexibility decreases variance but may slightly increase bias.
Repeat (a) for ridge regression relative to least squares. III.Ridge regression adds a penalty term to the coefficients that reduces their magnitudes. This penalty term shrinks the coefficients towards zero, effectively reducing the flexibility of the model compared to ordinary least squares. This reduction in flexibility decreases variance but may slightly increase bias.
Repeat (a) for non-linear methods relative to least squares.
In this exercise, we will predict the number of applications received using the other variables in the College data set. (a) Split the data set into a training set and a test set.
library(caret)
## Loading required package: ggplot2
## Loading required package: lattice
library(glmnet)
## Loading required package: Matrix
## Loaded glmnet 4.1-8
library(ISLR2)
college = College
# 80/20 Split
set.seed(1)
index = sample(nrow(college), 0.8*nrow(college), replace = F)
college_train = college[index,]
college_test = college[-index,]
lm1 = lm(Apps ~ ., data = college_train)
summary(lm1)
##
## Call:
## lm(formula = Apps ~ ., data = college_train)
##
## Residuals:
## Min 1Q Median 3Q Max
## -5555.2 -404.6 19.9 310.3 7577.7
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) -630.58238 435.56266 -1.448 0.148209
## PrivateYes -388.97393 148.87623 -2.613 0.009206 **
## Accept 1.69123 0.04433 38.153 < 2e-16 ***
## Enroll -1.21543 0.20873 -5.823 9.41e-09 ***
## Top10perc 50.45622 5.88174 8.578 < 2e-16 ***
## Top25perc -13.62655 4.67321 -2.916 0.003679 **
## F.Undergrad 0.08271 0.03632 2.277 0.023111 *
## P.Undergrad 0.06555 0.03367 1.947 0.052008 .
## Outstate -0.07562 0.01987 -3.805 0.000156 ***
## Room.Board 0.14161 0.05130 2.760 0.005947 **
## Books 0.21161 0.25184 0.840 0.401102
## Personal 0.01873 0.06604 0.284 0.776803
## PhD -9.72551 4.91228 -1.980 0.048176 *
## Terminal -0.48690 5.43302 -0.090 0.928620
## S.F.Ratio 18.26146 13.83984 1.319 0.187508
## perc.alumni 1.39008 4.39572 0.316 0.751934
## Expend 0.05764 0.01254 4.595 5.26e-06 ***
## Grad.Rate 5.89480 3.11185 1.894 0.058662 .
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 993.8 on 603 degrees of freedom
## Multiple R-squared: 0.9347, Adjusted R-squared: 0.9328
## F-statistic: 507.5 on 17 and 603 DF, p-value: < 2.2e-16
lm1_pred=predict(lm1, college_test)
lm1_test_error=mean((college_test$Apps-lm1_pred)^2)
lm1_test_error
## [1] 1567324
The test error obtained is 1567324.
set.seed(1)
rm1 = cv.glmnet(as.matrix(college_train[, -1]), college_train$Apps, alpha = 0)
best_lambda_ridge = rm1$lambda.min
best_lambda_ridge
## [1] 383.1245
rm1_pred <- predict(rm1, s=best_lambda_ridge, newx=as.matrix(college_test[, -1]))
rm1_test_error <- mean((rm1_pred-college_test$Apps)^2)
rm1_test_error
## [1] 268542.9
The test error obtained is 268542.9.
set.seed(1)
lasso_model <- cv.glmnet(as.matrix(college_train[, -1]), college_train$Apps, alpha = 1)
best_lambda_lasso <- lasso_model$lambda.min
best_lambda_lasso
## [1] 111.6828
lasso_pred <- predict(lasso_model, s = best_lambda_lasso, newx = as.matrix(college_test[, -1]))
lasso_test_error <- mean((lasso_pred - college_test$Apps)^2)
lasso_test_error
## [1] 13706.72
num_non_zero_coef <- sum(coef(lasso_model, s = best_lambda_lasso) != 0)
num_non_zero_coef
## [1] 2
The test error obtained is 13706.72. There are 2 non-zero coefficients.
To summarize test errors for three methods:
Linear Regression: Test error of approximately 1,567,324. Ridge Regression: Test error of approximately 268,542.9. Lasso Regression: Test error of approximately 13,706.72 with 2 non-zero coefficients.
We can observe a significant improvement in prediction accuracy when moving from linear regression to ridge regression, and then to lasso regression. Lasso regression outperforms both linear and ridge regression models with substantially lower test error.This indicates tje lasso model’s effectiveness in capturing the underlying patterns in the data and making accurate predictions. Additionally, having only two non-zero coefficients shows that the lasso model successfully performed feature selection, which could enhance model interpretability and generalization to new data.