Answer: option iii. is correct.
Because the lasso adds a regularization term to the least squares loss function, it is less flexible than least squares. This penalty shrinks large coefficients, which lowers the complexity of the model, and because the lasso can shrink some coefficients exactly to zero, thereby performing variable selection, the resulting models may have fewer parameters.
In the context of the bias-variance tradeoff, a less flexible method such as the lasso typically has higher bias but lower variance than a more flexible method such as least squares. The lasso will therefore outperform least squares in prediction accuracy when the increase in bias is smaller than the corresponding decrease in variance: the expected test MSE decomposes into squared bias plus variance (plus irreducible error), so the total is minimized when the two are balanced. By introducing some bias through regularization, the lasso aims to strike this balance, preventing overfitting and substantially lowering variance.
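As a small illustration of this shrinkage, here is a sketch on simulated data (the variables X, y, and lasso_path below are our own and not part of the exercise): comparing a strong penalty to a weak one shows the lasso setting coefficients exactly to zero.
library(glmnet)
set.seed(1)
# 20 predictors, of which only the first 5 are truly non-zero
X <- matrix(rnorm(100 * 20), 100, 20)
y <- drop(X %*% c(rep(2, 5), rep(0, 15))) + rnorm(100)
lasso_path <- glmnet(X, y, alpha = 1)  # alpha = 1: lasso penalty lambda * sum(|beta_j|)
sum(coef(lasso_path, s = 1.0) != 0)    # strong penalty: only a few non-zero coefficients
sum(coef(lasso_path, s = 0.01) != 0)   # weak penalty: most coefficients survive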
Answer: option iii. is correct.
The only real difference in this case is the ridge objective function, RSS + λΣβj², whose shrinkage penalty (squared coefficients rather than the lasso's absolute values) is marginally different from that of the lasso.
This means that, unlike the lasso, ridge regression cannot reduce the coefficients of less useful features to exactly zero. Nevertheless, the remainder of the argument, that shrinkage lowers variance at the expense of increased bias, remains valid.
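A quick check of this distinction, reusing the simulated X, y, and lasso_path from the sketch above: at the same λ, ridge keeps every coefficient non-zero while the lasso zeroes most of them out.
ridge_fit <- glmnet(X, y, alpha = 0)  # alpha = 0: ridge penalty lambda * sum(beta_j^2)
# Excluding the intercept, ridge shrinks coefficients toward zero but never exactly to zero
sum(as.numeric(coef(ridge_fit, s = 1.0))[-1] != 0)   # typically all 20 predictors
sum(as.numeric(coef(lasso_path, s = 1.0))[-1] != 0)  # typically close to the 5 true predictors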
Answer:
library(glmnet)
## Warning: package 'glmnet' was built under R version 4.3.3
## Loading required package: Matrix
## Loaded glmnet 4.1-8
library(ISLR)
## Warning: package 'ISLR' was built under R version 4.3.2
library(dplyr)
##
## Attaching package: 'dplyr'
## The following objects are masked from 'package:stats':
##
## filter, lag
## The following objects are masked from 'package:base':
##
## intersect, setdiff, setequal, union
attach(College)
set.seed(123)
# Split the data into training (70%) and test (30%) sets
subset_split <- sample(nrow(College), nrow(College) * 0.7)
train_data <- College[subset_split, ]
test_data <- College[-subset_split, ]
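# Optional sanity check (our addition, not part of the original solution):
# sample() truncates 777 * 0.7 = 543.9 to 543, so we expect 543 training rows
# and 234 test rows
dim(train_data)
dim(test_data)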
# Fit linear model using least squares on the training set
lm_model <- lm(Apps ~ ., data = train_data)
summary(lm_model)
##
## Call:
## lm(formula = Apps ~ ., data = train_data)
##
## Residuals:
## Min 1Q Median 3Q Max
## -3097.8 -455.8 -46.5 343.8 6452.5
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) -310.17331 481.30075 -0.644 0.519566
## PrivateYes -681.96465 164.08211 -4.156 3.78e-05 ***
## Accept 1.22130 0.05921 20.626 < 2e-16 ***
## Enroll 0.08046 0.21794 0.369 0.712155
## Top10perc 49.33503 6.18296 7.979 9.31e-15 ***
## Top25perc -16.11744 5.02717 -3.206 0.001428 **
## F.Undergrad 0.02284 0.03985 0.573 0.566831
## P.Undergrad 0.03541 0.03529 1.003 0.316139
## Outstate -0.05446 0.02132 -2.555 0.010910 *
## Room.Board 0.18967 0.05275 3.596 0.000354 ***
## Books 0.21366 0.28099 0.760 0.447381
## Personal -0.03685 0.07279 -0.506 0.612876
## PhD -6.00401 5.34580 -1.123 0.261897
## Terminal -5.01712 5.77787 -0.868 0.385609
## S.F.Ratio -2.18927 14.83898 -0.148 0.882766
## perc.alumni -8.01836 4.67330 -1.716 0.086792 .
## Expend 0.07614 0.01340 5.681 2.23e-08 ***
## Grad.Rate 10.63461 3.38228 3.144 0.001760 **
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 992.3 on 525 degrees of freedom
## Multiple R-squared: 0.9175, Adjusted R-squared: 0.9148
## F-statistic: 343.2 on 17 and 525 DF, p-value: < 2.2e-16
# Make predictions on the test set
test_predictions <- predict(lm_model, test_data)
# Compute test error (mean squared error)
test_error <- mean((test_data$Apps - test_predictions)^2)
test_error
## [1] 1734841
The test MSE for the least squares model is 1734841.
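Since the MSE is in squared units of Apps, its square root can be easier to interpret; a quick follow-up using the object computed above:
# RMSE expresses the error in the original units (number of applications)
sqrt(test_error)  # about 1317 applications of typical prediction error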
Answer:
# alpha = 0 gives ridge regression in glmnet
# Create model matrices for the training set and the test set
# (the test set doubles as the validation set in this grid search)
train_data.mat <- model.matrix(Apps ~ ., data = train_data)
validation_data.mat <- model.matrix(Apps ~ ., data = test_data)
# Define a grid of candidate lambda values from 10^4 down to 10^-2
grid <- 10^seq(4, -2, length = 100)
# Perform grid search to find the best lambda
mse <- rep(NA, length(grid))
for (i in seq_along(grid)) {
  ridge <- glmnet(train_data.mat, train_data$Apps, alpha = 0, lambda = grid[i], thresh = 1e-12)
  pred <- predict(ridge, s = grid[i], newx = validation_data.mat)
  mse[i] <- mean((test_data$Apps - pred)^2)
}
# Find the index of the lambda with the minimum MSE
best_lambda_index <- which.min(mse)
best_lambda <- grid[best_lambda_index]
best_lambda
## [1] 0.01
# Predict on the test set with the ridge model; `ridge` holds the fit from the
# final loop iteration (lambda = 0.01), which here coincides with best_lambda
pred_test <- predict(ridge, s = best_lambda, newx = validation_data.mat)
# Calculate Mean Square Error (MSE) on the test set
test_mse <- mean((test_data$Apps - pred_test)^2)
test_mse
## [1] 1734931
The test MSE for ridge regression (1734931) is slightly higher than for least squares regression (1734841).
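As an aside, glmnet also offers built-in k-fold cross-validation, which selects λ using the training data alone rather than the test set as the manual search above does. A minimal sketch using the objects already defined (the chosen λ and the resulting test MSE will generally differ somewhat from the validation-set search; alpha = 1 would give the lasso analogue):
set.seed(123)
# 10-fold CV on the training set over the same lambda grid (alpha = 0 for ridge)
cv_ridge <- cv.glmnet(train_data.mat, train_data$Apps, alpha = 0, lambda = grid)
cv_ridge$lambda.min  # lambda with the lowest cross-validated error
# Test MSE at the CV-chosen lambda
cv_pred <- predict(cv_ridge, s = cv_ridge$lambda.min, newx = validation_data.mat)
mean((test_data$Apps - cv_pred)^2)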
# alpha = 1 gives the lasso in glmnet
# Create model matrices for the training and test sets (identical to those above)
train_data_1.mat <- model.matrix(Apps ~ ., data = train_data)
validation_data_1.mat <- model.matrix(Apps ~ ., data = test_data)
# Define a grid of candidate lambda values from 10^4 down to 10^-2
grid_1 <- 10^seq(4, -2, length = 100)
# Perform grid search to find the best lambda
mse_1 <- rep(NA, length(grid_1))
for (i in seq_along(grid_1)) {
  lasso_model <- glmnet(train_data_1.mat, train_data$Apps, alpha = 1, lambda = grid_1[i], thresh = 1e-12)
  pred_1 <- predict(lasso_model, s = grid_1[i], newx = validation_data_1.mat)
  mse_1[i] <- mean((test_data$Apps - pred_1)^2)
}
# Find the index of the lambda with the minimum MSE
best_lambda_index_1 <- which.min(mse_1)
best_lambda_1 <- grid_1[best_lambda_index_1]
best_lambda_1
## [1] 4.641589
# Predict on the test set with the lasso model (the fit from the final loop iteration)
pred_test_1 <- predict(lasso_model, s = best_lambda_1, newx = validation_data_1.mat)
# Calculate Mean Square Error (MSE) on the test set
test_mse_1 <- mean((test_data$Apps - pred_test_1)^2)
test_mse_1
## [1] 1734857
The lasso test MSE (1734857) and the ridge test MSE (1734931) are both higher than the least squares test MSE (1734841); between the two regularized models, the lasso achieves the lower test MSE.
Overall, the least squares model has the lowest test MSE of the three, though the differences are small.
coefficients_non_zero <- lasso_model$beta
print(coefficients_non_zero[coefficients_non_zero[, 1] != 0, ])  # extract the non-zero coefficients
## PrivateYes Accept Enroll Top10perc Top25perc
## -681.92813047 1.22128169 0.08049546 49.32924219 -16.11270195
## F.Undergrad P.Undergrad Outstate Room.Board Books
## 0.02283484 0.03539740 -0.05444704 0.18965658 0.21358532
## Personal PhD Terminal S.F.Ratio perc.alumni
## -0.03682464 -6.00292876 -5.01673188 -2.18405765 -8.01780938
## Expend Grad.Rate
## 0.07614101 10.63317894
print(paste("Number of Non-zero Coefficients:", length(coefficients_non_zero[coefficients_non_zero[,1]!=0,])))
## [1] "Number of Non-zero Coefficients: 17"
# Least squares model: test R-squared
test_avg <- mean(test_data$Apps)
lm_lsmodel_accu <- 1 - mean((test_predictions - test_data$Apps)^2) / mean((test_avg - test_data$Apps)^2)
print(paste("Least Square Model R-Square:",lm_lsmodel_accu,"~",round(lm_lsmodel_accu*100, digits = 4)))
## [1] "Least Square Model R-Square: 0.924075933783536 ~ 92.4076"
# Ridge model: test R-squared
ridge_model_accu <- 1 - mean((pred_test - test_data$Apps)^2) / mean((test_avg - test_data$Apps)^2)
print(paste("Ridge model R-Square: ", ridge_model_accu,"~",round(ridge_model_accu*100,digits = 4)))
## [1] "Ridge model R-Square: 0.924071999658342 ~ 92.4072"
# Lasso model: test R-squared
lasso_model_accu <- 1 - mean((pred_test_1 - test_data$Apps)^2) / mean((test_avg - test_data$Apps)^2)
print(paste("Lasso model R-Square: ", lasso_model_accu,"~",round(lasso_model_accu*100,digits=4)))
## [1] "Lasso model R-Square: 0.924075231851842 ~ 92.4075"
From the above analysis, the lasso model has a slightly higher R-square than the ridge model, which is consistent with the lasso's lower test MSE found in solution 9(d). The least squares model has the highest R-square of the three, though only marginally.
In practice, the R-square values show that all three models have nearly identical accuracy in predicting the number of college applications received; there isn't a large difference among their test error metrics.
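Finally, a compact side-by-side view of the three models, built from the objects computed above (the results data frame is our own addition):
# Collect test MSE and test R-squared for the three models in one table
results <- data.frame(
  Model   = c("Least squares", "Ridge", "Lasso"),
  TestMSE = c(test_error, test_mse, test_mse_1),
  TestR2  = c(lm_lsmodel_accu, ridge_model_accu, lasso_model_accu)
)
results[order(results$TestMSE), ]  # sorted so the lowest-MSE model appears first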