2.

For parts (a) through (c), indicate which of i. through iv. is correct. Justify your answer.

(a) The lasso, relative to least squares, is:

  1. More flexible and hence will give improved prediction accuracy when its increase in bias is less than its decrease in variance.
  2. More flexible and hence will give improved prediction accuracy when its increase in variance is less than its decrease in bias.
  3. Less flexible and hence will give improved prediction accuracy when its increase in bias is less than its decrease in variance.
  4. Less flexible and hence will give improved prediction accuracy when its increase in variance is less than its decrease in bias.

Answer : iii Less flexible and hence will give improved prediction accuracy when its increase in bias is less than its decrease in variance Less flexible than least squares, the lasso technique adds a regularization factor to the loss function. This regularization term penalizes large coefficients, hence reducing the model’s complexity. The models that are produced may contain fewer parameters as a result of variable selection, as the lasso may reduce some coefficients to zero.

(b) Repeat (a) for ridge regression relative to least squares.

Answer:iii. Less flexible and hence will give improved prediction accuracy when its increase in bias is less than its decrease in variance. The only substantial distinction in this scenario is the ridge objective function RSS + λΣβj2, where the shrinkage penalty term for ridge regression differs somewhat from that of the lasso.

(c) Repeat (a) for non-linear methods relative to least squares.

Answer: ii More flexible and hence will give improved prediction accuracy when its increase in variance is less than its decrease in bias. Since non-linear techniques are more flexible than least squares, they can reduce bias despite having a higher variation. If the underlying relationship in the data is nonlinear, we can expect an increase in prediction accuracy (the bias reduction will be greater than the variance increase).

9.

In this exercise, we will predict the number of applications received using the other variables in the College data set.

(a) Split the data set into a training set and a test set.

library(glmnet)
## Loading required package: Matrix
## Loaded glmnet 4.1-8
library(ISLR)
library(dplyr)
## 
## Attaching package: 'dplyr'
## The following objects are masked from 'package:stats':
## 
##     filter, lag
## The following objects are masked from 'package:base':
## 
##     intersect, setdiff, setequal, union
attach(College)
set.seed(369)
splitsubset<-sample(nrow(College),nrow(College)*0.8)
traindata<-College[splitsubset,]
testdata<-College[-splitsubset,]

(b) Fit a linear model using least squares on the training set, and

report the test error obtained.**

lmmodel <- lm(Apps ~ ., data = traindata)
summary(lmmodel)
## 
## Call:
## lm(formula = Apps ~ ., data = traindata)
## 
## Residuals:
##     Min      1Q  Median      3Q     Max 
## -3227.1  -436.5   -51.6   322.0  7047.8 
## 
## Coefficients:
##               Estimate Std. Error t value Pr(>|t|)    
## (Intercept) -538.55978  435.98067  -1.235 0.217206    
## PrivateYes  -576.80294  147.89337  -3.900 0.000107 ***
## Accept         1.31302    0.05478  23.969  < 2e-16 ***
## Enroll        -0.42055    0.22195  -1.895 0.058597 .  
## Top10perc     44.94508    6.04845   7.431 3.71e-13 ***
## Top25perc    -12.02853    4.88751  -2.461 0.014131 *  
## F.Undergrad    0.08285    0.03633   2.280 0.022947 *  
## P.Undergrad    0.02368    0.03411   0.694 0.487956    
## Outstate      -0.05858    0.02091  -2.801 0.005263 ** 
## Room.Board     0.18154    0.05151   3.524 0.000457 ***
## Books         -0.11038    0.27533  -0.401 0.688648    
## Personal       0.02458    0.06750   0.364 0.715837    
## PhD           -7.51054    4.93727  -1.521 0.128736    
## Terminal      -4.81507    5.34484  -0.901 0.368011    
## S.F.Ratio     10.79350   13.87656   0.778 0.436979    
## perc.alumni   -5.49951    4.42619  -1.242 0.214538    
## Expend         0.08068    0.01274   6.332 4.73e-10 ***
## Grad.Rate     10.49503    3.16479   3.316 0.000967 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 1008 on 603 degrees of freedom
## Multiple R-squared:  0.9221, Adjusted R-squared:  0.9199 
## F-statistic: 419.6 on 17 and 603 DF,  p-value: < 2.2e-16
testpredictions <- predict(lmmodel, testdata)


testerror <- mean((testdata$Apps - testpredictions)^2)
testerror
## [1] 1703945

Linear model using least squares on the training set test MSE:1703945

(c) Fit a ridge regression model on the training set, with Lambda chosen by cross-validation. Report the test error obtained.**

traindata.mat <- model.matrix(Apps ~ ., data = traindata)
validationdata.mat <- model.matrix(Apps ~ ., data = testdata)

grid <- 10^seq(4, -2, length = 80)

mse <- rep(NA, length(grid))
for (i in 1:length(grid)) {
  ridge <- glmnet(traindata.mat, traindata$Apps, alpha = 0, lambda = grid[i], thresh = 1e-12)
  pred <- predict(ridge, s = grid[i], newx = validationdata.mat)
  mse[i] <- mean((testdata$Apps - pred)^2)
}

bestlambda_index <- which.min(mse)
bestlambda <- grid[bestlambda_index]
bestlambda
## [1] 0.01
predtest <- predict(ridge, s = bestlambda, newx = validationdata.mat)

testmse <- mean((testdata$Apps - predtest)^2)
testmse
## [1] 1704069

Test MSEs for ridge regression and least squares regression are slightly different (1704069 versus 1703945), indicating that least squares performed somewhat better. This suggests that there is very little difference in favor of least squares.

(d) Fit a lasso model on the training set, with Lambda chosen by crossvalidation. Report the test error obtained, along with the number of non-zero coefficient estimates.

traindata1.mat <- model.matrix(Apps ~ ., data = traindata)
validationdata1.mat <- model.matrix(Apps ~ ., data = testdata)

grid1 <- 10^seq(4, -2, length = 80)


mse1 <- rep(NA, length(grid1))
for (i in 1:length(grid)) {
  lasso1 <- glmnet(traindata1.mat, traindata$Apps, alpha = 1, lambda = grid1[i], thresh = 1e-12)
  pred1 <- predict(lasso1, s = grid1[i], newx = validationdata1.mat)
  mse1[i] <- mean((testdata$Apps - pred1)^2)
}

bestlambdaindex1 <- which.min(mse1)
bestlambda1 <- grid[bestlambdaindex1]
bestlambda1
## [1] 0.01
predtest1 <- predict(lasso1, s = bestlambda, newx = validationdata1.mat)

testmse <- mean((testdata$Apps - predtest1)^2)
testmse
## [1] 1704155

Lasso Model Test MSE is 1704155.

nonzerocoefficients = lasso1$beta

print(nonzerocoefficients[nonzerocoefficients[,1]!=0,]) 
##    PrivateYes        Accept        Enroll     Top10perc     Top25perc 
## -576.82842033    1.31291719   -0.41990063   44.93659128  -12.02215856 
##   F.Undergrad   P.Undergrad      Outstate    Room.Board         Books 
##    0.08277026    0.02367221   -0.05855824    0.18153474   -0.11023202 
##      Personal           PhD      Terminal     S.F.Ratio   perc.alumni 
##    0.02456031   -7.50938731   -4.81506437   10.78828576   -5.50067277 
##        Expend     Grad.Rate 
##    0.08067941   10.49295153
print(paste("non zero coefficients:", length(nonzerocoefficients[nonzerocoefficients[,1]!=0,])))
## [1] "non zero coefficients: 17"

(g) Comment on the results obtained. How accurately can we predict the number of college applications received? Is there much difference among the test errors resulting from these five approaches?

#Least Square model
testavg <- mean(testdata$Apps)
lm_lsmodel<- 1 - mean((testpredictions - testdata$Apps)^2) / mean((testavg - testdata$Apps)^2)
print(paste("Least Square Model R-Square:",round(lm_lsmodel*100, digits = 4)))
## [1] "Least Square Model R-Square: 92.9122"
#Lasso model
lasso_model<- 1 - mean((predtest1 - testdata$Apps)^2) / mean((testavg - testdata$Apps)^2)
print(paste("Lasso model  R-Square: ",round(lasso_model*100,digits=4)))
## [1] "Lasso model  R-Square:  92.9113"
#Ridge model
ridge_model_accu <- 1 - mean((predtest - testdata$Apps)^2) / mean((testavg - testdata$Apps)^2)
print(paste("Ridge model R-Square: ", round(ridge_model_accu*100,digits = 4)))
## [1] "Ridge model R-Square:  92.9117"

The results of the investigation show that the R-square for the Least Square model (92.9122) is marginally greater than that of the Lasso (92.9113) and Ridge (92.9117) models. All models show comparable accuracy in forecasting the number of college applications received, indicating similar performance overall, despite slight variations in R-square values.