Assignment 5

Question 2

For parts (a) through (c), indicate which of i. through iv. is correct. Justify your answer. (a) The lasso, relative to least squares, is: i. More flexible and hence will give improved prediction accuracy when its increase in bias is less than its decrease in variance. ii. More flexible and hence will give improved prediction accuracy when its increase in variance is less than its decrease in bias. iii. Less flexible and hence will give improved prediction accuracy when its increase in bias is less than its decrease in variance. iv. Less flexible and hence will give improved prediction accuracy when its increase in variance is less than its decrease in bias.

III.Lasso is similar to ridge regression, in that it shrinks coefficients, but it goes further to reduce some coefficients to zero - removing some variables.This reduces the flexibility of the model compared to ordinary least squares. This reduction in flexibility decreases variance but may slightly increase bias.

Repeat (a) for ridge regression relative to least squares. III.Ridge regression adds a penalty term to the coefficients that reduces their magnitudes. This penalty term shrinks the coefficients towards zero, effectively reducing the flexibility of the model compared to ordinary least squares. This reduction in flexibility decreases variance but may slightly increase bias.
Repeat (a) for non-linear methods relative to least squares.

Non-linear methods introduce more complexity and flexibility to the model compared to ordinary least squares. This increased flexibility can lead to higher variance but may also capture more complex patterns in the data, potentially reducing bias. ### Question 9

In this exercise, we will predict the number of applications received using the other variables in the College data set. (a) Split the data set into a training set and a test set.

library(caret)

## Loading required package: ggplot2

## Loading required package: lattice

library(glmnet)

## Loading required package: Matrix

## Loaded glmnet 4.1-8

library(ISLR2)
college = College

# 80/20 Split
set.seed(1)
index = sample(nrow(college), 0.8*nrow(college), replace = F)
college_train = college[index,]
college_test = college[-index,]

Fit a linear model using least squares on the training set, and report the test error obtained.

lm1 = lm(Apps ~ ., data = college_train)
summary(lm1)

## 
## Call:
## lm(formula = Apps ~ ., data = college_train)
## 
## Residuals:
##     Min      1Q  Median      3Q     Max 
## -5555.2  -404.6    19.9   310.3  7577.7 
## 
## Coefficients:
##               Estimate Std. Error t value Pr(>|t|)    
## (Intercept) -630.58238  435.56266  -1.448 0.148209    
## PrivateYes  -388.97393  148.87623  -2.613 0.009206 ** 
## Accept         1.69123    0.04433  38.153  < 2e-16 ***
## Enroll        -1.21543    0.20873  -5.823 9.41e-09 ***
## Top10perc     50.45622    5.88174   8.578  < 2e-16 ***
## Top25perc    -13.62655    4.67321  -2.916 0.003679 ** 
## F.Undergrad    0.08271    0.03632   2.277 0.023111 *  
## P.Undergrad    0.06555    0.03367   1.947 0.052008 .  
## Outstate      -0.07562    0.01987  -3.805 0.000156 ***
## Room.Board     0.14161    0.05130   2.760 0.005947 ** 
## Books          0.21161    0.25184   0.840 0.401102    
## Personal       0.01873    0.06604   0.284 0.776803    
## PhD           -9.72551    4.91228  -1.980 0.048176 *  
## Terminal      -0.48690    5.43302  -0.090 0.928620    
## S.F.Ratio     18.26146   13.83984   1.319 0.187508    
## perc.alumni    1.39008    4.39572   0.316 0.751934    
## Expend         0.05764    0.01254   4.595 5.26e-06 ***
## Grad.Rate      5.89480    3.11185   1.894 0.058662 .  
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 993.8 on 603 degrees of freedom
## Multiple R-squared:  0.9347, Adjusted R-squared:  0.9328 
## F-statistic: 507.5 on 17 and 603 DF,  p-value: < 2.2e-16

lm1_pred=predict(lm1, college_test)
lm1_test_error=mean((college_test$Apps-lm1_pred)^2)
lm1_test_error

## [1] 1567324

The test error obtained is 1567324.

Fit a ridge regression model on the training set, with λ chosen by cross-validation. Report the test error obtained.

set.seed(1)
rm1 = cv.glmnet(as.matrix(college_train[, -1]), college_train$Apps, alpha = 0)
best_lambda_ridge = rm1$lambda.min
best_lambda_ridge

## [1] 383.1245

rm1_pred <- predict(rm1, s=best_lambda_ridge, newx=as.matrix(college_test[, -1]))
rm1_test_error <- mean((rm1_pred-college_test$Apps)^2)
rm1_test_error

## [1] 268542.9

The test error obtained is 268542.9.

Fit a lasso model on the training set, with λ chosen by cross- validation. Report the test error obtained, along with the number of non-zero coefficient estimates.

set.seed(1)
lasso_model <- cv.glmnet(as.matrix(college_train[, -1]), college_train$Apps, alpha = 1)
best_lambda_lasso <- lasso_model$lambda.min
best_lambda_lasso

## [1] 111.6828

lasso_pred <- predict(lasso_model, s = best_lambda_lasso, newx = as.matrix(college_test[, -1]))
lasso_test_error <- mean((lasso_pred - college_test$Apps)^2)
lasso_test_error

## [1] 13706.72

num_non_zero_coef <- sum(coef(lasso_model, s = best_lambda_lasso) != 0)
num_non_zero_coef

## [1] 2

The test error obtained is 13706.72. There are 2 non-zero coefficients.

Comment on the results obtained. How accurately can we predict the number of college applications received? Is there much difference among the test errors resulting from these approaches?

To summarize test errors for three methods:

Linear Regression: Test error of approximately 1,567,324. Ridge Regression: Test error of approximately 268,542.9. Lasso Regression: Test error of approximately 13,706.72 with 2 non-zero coefficients.

We can observe a significant improvement in prediction accuracy when moving from linear regression to ridge regression, and then to lasso regression. Lasso regression outperforms both linear and ridge regression models with substantially lower test error.This indicates tje lasso model’s effectiveness in capturing the underlying patterns in the data and making accurate predictions. Additionally, having only two non-zero coefficients shows that the lasso model successfully performed feature selection, which could enhance model interpretability and generalization to new data.

Assignment 5

Emily Bates

2024-03-22

Question 2