(a) The lasso, relative to least squares, is:
Answer : iii Less flexible and hence will give improved prediction accuracy when its increase in bias is less than its decrease in variance Less flexible than least squares, the lasso technique adds a regularization factor to the loss function. This regularization term penalizes large coefficients, hence reducing the model’s complexity. The models that are produced may contain fewer parameters as a result of variable selection, as the lasso may reduce some coefficients to zero.
(b) Repeat (a) for ridge regression relative to least squares.
Answer:iii. Less flexible and hence will give improved prediction accuracy when its increase in bias is less than its decrease in variance. The only substantial distinction in this scenario is the ridge objective function RSS + λΣβj2, where the shrinkage penalty term for ridge regression differs somewhat from that of the lasso.
(c) Repeat (a) for non-linear methods relative to least squares.
Answer: ii More flexible and hence will give improved prediction accuracy when its increase in variance is less than its decrease in bias. Since non-linear techniques are more flexible than least squares, they can reduce bias despite having a higher variation. If the underlying relationship in the data is nonlinear, we can expect an increase in prediction accuracy (the bias reduction will be greater than the variance increase).
(a) Split the data set into a training set and a test set.
library(glmnet)
## Loading required package: Matrix
## Loaded glmnet 4.1-8
library(ISLR)
library(dplyr)
##
## Attaching package: 'dplyr'
## The following objects are masked from 'package:stats':
##
## filter, lag
## The following objects are masked from 'package:base':
##
## intersect, setdiff, setequal, union
attach(College)
set.seed(369)
splitsubset<-sample(nrow(College),nrow(College)*0.8)
traindata<-College[splitsubset,]
testdata<-College[-splitsubset,]
report the test error obtained.**
lmmodel <- lm(Apps ~ ., data = traindata)
summary(lmmodel)
##
## Call:
## lm(formula = Apps ~ ., data = traindata)
##
## Residuals:
## Min 1Q Median 3Q Max
## -3227.1 -436.5 -51.6 322.0 7047.8
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) -538.55978 435.98067 -1.235 0.217206
## PrivateYes -576.80294 147.89337 -3.900 0.000107 ***
## Accept 1.31302 0.05478 23.969 < 2e-16 ***
## Enroll -0.42055 0.22195 -1.895 0.058597 .
## Top10perc 44.94508 6.04845 7.431 3.71e-13 ***
## Top25perc -12.02853 4.88751 -2.461 0.014131 *
## F.Undergrad 0.08285 0.03633 2.280 0.022947 *
## P.Undergrad 0.02368 0.03411 0.694 0.487956
## Outstate -0.05858 0.02091 -2.801 0.005263 **
## Room.Board 0.18154 0.05151 3.524 0.000457 ***
## Books -0.11038 0.27533 -0.401 0.688648
## Personal 0.02458 0.06750 0.364 0.715837
## PhD -7.51054 4.93727 -1.521 0.128736
## Terminal -4.81507 5.34484 -0.901 0.368011
## S.F.Ratio 10.79350 13.87656 0.778 0.436979
## perc.alumni -5.49951 4.42619 -1.242 0.214538
## Expend 0.08068 0.01274 6.332 4.73e-10 ***
## Grad.Rate 10.49503 3.16479 3.316 0.000967 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 1008 on 603 degrees of freedom
## Multiple R-squared: 0.9221, Adjusted R-squared: 0.9199
## F-statistic: 419.6 on 17 and 603 DF, p-value: < 2.2e-16
testpredictions <- predict(lmmodel, testdata)
testerror <- mean((testdata$Apps - testpredictions)^2)
testerror
## [1] 1703945
Linear model using least squares on the training set test MSE:1703945
traindata.mat <- model.matrix(Apps ~ ., data = traindata)
validationdata.mat <- model.matrix(Apps ~ ., data = testdata)
grid <- 10^seq(4, -2, length = 80)
mse <- rep(NA, length(grid))
for (i in 1:length(grid)) {
ridge <- glmnet(traindata.mat, traindata$Apps, alpha = 0, lambda = grid[i], thresh = 1e-12)
pred <- predict(ridge, s = grid[i], newx = validationdata.mat)
mse[i] <- mean((testdata$Apps - pred)^2)
}
bestlambda_index <- which.min(mse)
bestlambda <- grid[bestlambda_index]
bestlambda
## [1] 0.01
predtest <- predict(ridge, s = bestlambda, newx = validationdata.mat)
testmse <- mean((testdata$Apps - predtest)^2)
testmse
## [1] 1704069
Test MSEs for ridge regression and least squares regression are slightly different (1704069 versus 1703945), indicating that least squares performed somewhat better. This suggests that there is very little difference in favor of least squares.
traindata1.mat <- model.matrix(Apps ~ ., data = traindata)
validationdata1.mat <- model.matrix(Apps ~ ., data = testdata)
grid1 <- 10^seq(4, -2, length = 80)
mse1 <- rep(NA, length(grid1))
for (i in 1:length(grid)) {
lasso1 <- glmnet(traindata1.mat, traindata$Apps, alpha = 1, lambda = grid1[i], thresh = 1e-12)
pred1 <- predict(lasso1, s = grid1[i], newx = validationdata1.mat)
mse1[i] <- mean((testdata$Apps - pred1)^2)
}
bestlambdaindex1 <- which.min(mse1)
bestlambda1 <- grid[bestlambdaindex1]
bestlambda1
## [1] 0.01
predtest1 <- predict(lasso1, s = bestlambda, newx = validationdata1.mat)
testmse <- mean((testdata$Apps - predtest1)^2)
testmse
## [1] 1704155
Lasso Model Test MSE is 1704155.
nonzerocoefficients = lasso1$beta
print(nonzerocoefficients[nonzerocoefficients[,1]!=0,])
## PrivateYes Accept Enroll Top10perc Top25perc
## -576.82842033 1.31291719 -0.41990063 44.93659128 -12.02215856
## F.Undergrad P.Undergrad Outstate Room.Board Books
## 0.08277026 0.02367221 -0.05855824 0.18153474 -0.11023202
## Personal PhD Terminal S.F.Ratio perc.alumni
## 0.02456031 -7.50938731 -4.81506437 10.78828576 -5.50067277
## Expend Grad.Rate
## 0.08067941 10.49295153
print(paste("non zero coefficients:", length(nonzerocoefficients[nonzerocoefficients[,1]!=0,])))
## [1] "non zero coefficients: 17"
#Least Square model
testavg <- mean(testdata$Apps)
lm_lsmodel<- 1 - mean((testpredictions - testdata$Apps)^2) / mean((testavg - testdata$Apps)^2)
print(paste("Least Square Model R-Square:",round(lm_lsmodel*100, digits = 4)))
## [1] "Least Square Model R-Square: 92.9122"
#Lasso model
lasso_model<- 1 - mean((predtest1 - testdata$Apps)^2) / mean((testavg - testdata$Apps)^2)
print(paste("Lasso model R-Square: ",round(lasso_model*100,digits=4)))
## [1] "Lasso model R-Square: 92.9113"
#Ridge model
ridge_model_accu <- 1 - mean((predtest - testdata$Apps)^2) / mean((testavg - testdata$Apps)^2)
print(paste("Ridge model R-Square: ", round(ridge_model_accu*100,digits = 4)))
## [1] "Ridge model R-Square: 92.9117"
The results of the investigation show that the R-square for the Least Square model (92.9122) is marginally greater than that of the Lasso (92.9113) and Ridge (92.9117) models. All models show comparable accuracy in forecasting the number of college applications received, indicating similar performance overall, despite slight variations in R-square values.