Libraries

data("College")
data("Boston")
library(ISLR2)
library(glmnet)
library(pls)

Question 2

For parts (a) through (c), indicate which of i. through iv. is correct. Justify your answer.

  (a) The lasso, relative to least squares, is:

      i. More flexible and hence will give improved prediction accuracy when its increase in bias is less than its decrease in variance.

      ii. More flexible and hence will give improved prediction accuracy when its increase in variance is less than its decrease in bias.

      iii. Less flexible and hence will give improved prediction accuracy when its increase in bias is less than its decrease in variance.

      iv. Less flexible and hence will give improved prediction accuracy when its increase in variance is less than its decrease in bias.

Answer: iii. The lasso is less flexible than least squares: it shrinks the coefficient estimates towards zero, and this shrinkage reduces the variance of the predictions at the cost of a small increase in bias. Prediction accuracy therefore improves when the increase in bias is less than the decrease in variance.

  (b) Repeat (a) for ridge regression relative to least squares.

Answer: iii. As λ increases, the flexibility of the ridge regression fit decreases, leading to decreased variance but increased bias; the same reasoning as in (a) applies.

  (c) Repeat (a) for non-linear methods relative to least squares.

Answer: ii. Non-linear methods are more flexible than least squares, so they give improved prediction accuracy when the resulting decrease in bias outweighs the increase in variance.
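For reference, a minimal statement of the two penalized criteria behind these answers, in standard ISLR notation: ridge regression chooses the coefficients to minimize

$$\sum_{i=1}^{n}\Big(y_i-\beta_0-\sum_{j=1}^{p}\beta_j x_{ij}\Big)^2+\lambda\sum_{j=1}^{p}\beta_j^2,$$

while the lasso uses the penalty $\lambda\sum_{j=1}^{p}|\beta_j|$ instead. In both cases a larger λ means more shrinkage and less flexibility, hence lower variance and higher bias.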

Question 9

In this exercise, we will predict the number of applications received using the other variables in the College data set.

  (a) Split the data set into a training set and a test set.
set.seed(1)
# Use 70% of the observations for training and the remainder for testing
s <- sample(1:nrow(College), nrow(College) * 0.7)
train <- College[s, ]
test <- College[-s, ]
  (b) Fit a linear model using least squares on the training set, and report the test error obtained.
fit.lm <- lm(Apps ~ ., data = train)
summary(fit.lm)
## 
## Call:
## lm(formula = Apps ~ ., data = train)
## 
## Residuals:
##     Min      1Q  Median      3Q     Max 
## -5816.1  -451.6    -1.0   327.2  7445.9 
## 
## Coefficients:
##               Estimate Std. Error t value Pr(>|t|)    
## (Intercept) -5.377e+02  5.076e+02  -1.059  0.28995    
## PrivateYes  -5.045e+02  1.648e+02  -3.061  0.00232 ** 
## Accept       1.722e+00  4.763e-02  36.159  < 2e-16 ***
## Enroll      -1.055e+00  2.437e-01  -4.329 1.80e-05 ***
## Top10perc    5.358e+01  6.440e+00   8.320 7.64e-16 ***
## Top25perc   -1.614e+01  5.092e+00  -3.170  0.00161 ** 
## F.Undergrad  2.970e-02  4.399e-02   0.675  0.49976    
## P.Undergrad  7.162e-02  3.649e-02   1.963  0.05019 .  
## Outstate    -8.841e-02  2.176e-02  -4.064 5.57e-05 ***
## Room.Board   1.630e-01  5.577e-02   2.923  0.00362 ** 
## Books        2.727e-01  2.723e-01   1.001  0.31715    
## Personal    -7.316e-03  7.283e-02  -0.100  0.92002    
## PhD         -9.676e+00  5.360e+00  -1.805  0.07161 .  
## Terminal    -3.781e-01  6.015e+00  -0.063  0.94990    
## S.F.Ratio    1.627e+01  1.608e+01   1.012  0.31214    
## perc.alumni  2.358e+00  4.853e+00   0.486  0.62722    
## Expend       5.986e-02  1.337e-02   4.476 9.34e-06 ***
## Grad.Rate    7.158e+00  3.520e+00   2.034  0.04248 *  
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 1022 on 525 degrees of freedom
## Multiple R-squared:  0.9332, Adjusted R-squared:  0.9311 
## F-statistic: 431.6 on 17 and 525 DF,  p-value: < 2.2e-16
pred <- predict(fit.lm, test)
mean((pred - test$Apps)^2)  # test MSE
## [1] 1261630
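On the scale of the response, this corresponds to a root mean squared error of roughly 1,123 applications (a quick check using the objects above):
sqrt(mean((pred - test$Apps)^2))  # RMSE, about 1123 applications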
  (c) Fit a ridge regression model on the training set, with λ chosen by cross-validation. Report the test error obtained.
trmat <- model.matrix(Apps ~ ., data = train)
tmat <- model.matrix(Apps ~ ., data = test)
# Grid of candidate lambda values from 10^4 down to 10^-2
grid <- 10^seq(4, -2, length = 100)
# alpha = 0 selects the ridge penalty
ridge <- glmnet(trmat, train$Apps, alpha = 0, lambda = grid, thresh = 1e-12)

# Choose lambda by cross-validation on the training set
cv.out <- cv.glmnet(trmat, train$Apps, alpha = 0, lambda = grid, thresh = 1e-12)

bestlam <- cv.out$lambda.min
bestlam
## [1] 0.01
predridge <- predict(ridge, s = bestlam, newx = tmat)
mean((predridge - test$Apps)^2)
## [1] 1261598

The test MSE for ridge regression is 1261598, essentially the same as (very slightly below) least squares. This is expected: the selected λ = 0.01 is the smallest value in the grid, so almost no shrinkage is applied and the ridge fit nearly coincides with the least squares fit.
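To inspect how the cross-validated error varies with λ, the cv.glmnet object can be plotted directly; lambda.1se gives a more heavily regularized alternative to lambda.min (a quick sketch, output not shown):
plot(cv.out)       # CV error as a function of log(lambda)
cv.out$lambda.1se  # largest lambda within one standard error of the minimum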

  (d) Fit a lasso model on the training set, with λ chosen by cross-validation. Report the test error obtained, along with the number of non-zero coefficient estimates.
# alpha = 1 selects the lasso penalty
lasso <- glmnet(trmat, train$Apps, alpha = 1, lambda = grid, thresh = 1e-12)

cv.out2 <- cv.glmnet(trmat, train$Apps, alpha = 1, lambda = grid, thresh = 1e-12)

bestlaml <- cv.out2$lambda.min
bestlaml
## [1] 0.01
predlasso <- predict(lasso, s = bestlaml, newx = tmat)
mean((predlasso - test$Apps)^2)
## [1] 1261591

The test MSE for the lasso is 1261591, marginally lower than both of the previous models. The coefficient estimates at the selected λ are shown below.

predict(lasso, s = bestlaml, type = "coefficients")
## 19 x 1 sparse Matrix of class "dgCMatrix"
##                        s1
## (Intercept) -5.377539e+02
## (Intercept)  .           
## PrivateYes  -5.044625e+02
## Accept       1.722329e+00
## Enroll      -1.054317e+00
## Top10perc    5.357463e+01
## Top25perc   -1.613792e+01
## F.Undergrad  2.961867e-02
## P.Undergrad  7.161220e-02
## Outstate    -8.840061e-02
## Room.Board   1.630166e-01
## Books        2.725781e-01
## Personal    -7.289348e-03
## PhD         -9.674575e+00
## Terminal    -3.772228e-01
## S.F.Ratio    1.626160e+01
## perc.alumni  2.354463e+00
## Expend       5.986103e-02
## Grad.Rate    7.157200e+00
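Because λ = 0.01 applies almost no shrinkage, none of the predictors are dropped: all 17 have non-zero coefficient estimates (the second intercept row is an artifact of the intercept column in the model matrix). A quick way to count the non-zero estimates, as a small sketch using the objects above:
lasso.coef <- predict(lasso, s = bestlaml, type = "coefficients")
sum(lasso.coef != 0)  # counts the intercept plus the 17 non-zero predictor coefficients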
  (e) Fit a PCR model on the training set, with M chosen by cross-validation. Report the test error obtained, along with the value of M selected by cross-validation.
# Standardize the predictors and cross-validate over the number of components
pcr.fit <- pcr(Apps ~ ., data = train, scale = TRUE, validation = "CV")
predpcr <- predict(pcr.fit, test)
mean((predpcr - test$Apps)^2)
## [1] 2670100

The reported MSE of 2670100 is larger than for the other models. Note that predict() without an ncomp argument returns predictions for every number of components, so this figure averages the squared error over all values of M rather than reporting it at the CV-chosen M; the sketch below shows one way to select and use a specific M.
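A sketch of choosing M explicitly, assuming the pls package's validationplot() and selectNcomp() helpers (the selected M and the resulting MSE depend on the CV folds, so they may vary from run to run):
validationplot(pcr.fit, val.type = "MSEP")      # CV error versus number of components
M <- selectNcomp(pcr.fit, method = "onesigma")  # smallest M within one SE of the best
predpcr.M <- predict(pcr.fit, test, ncomp = M)
mean((predpcr.M - test$Apps)^2)                 # test MSE at the chosen M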

  (f) Fit a PLS model on the training set, with M chosen by cross-validation. Report the test error obtained, along with the value of M selected by cross-validation.
pls.fit <- plsr(Apps ~ ., data = train, scale = TRUE, validation = "CV")
predplsr <- predict(pls.fit, test)
mean((predplsr - test$Apps)^2)
## [1] 1413894

The test MSE for the PLS model is 1413894, again averaged over all component counts for the reason noted in (e).
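The same component-selection sketch applies here:
M.pls <- selectNcomp(pls.fit, method = "onesigma")
mean((predict(pls.fit, test, ncomp = M.pls) - test$Apps)^2)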

  (g) Comment on the results obtained. How accurately can we predict the number of applications received? Is there much difference among the test errors resulting from these five approaches?

Least squares, ridge, and the lasso give nearly identical test errors (about 1.26 million), PLS is somewhat higher, and PCR is noticeably worse. Apart from PCR, then, there is little difference among the approaches, and all of them predict the number of applications reasonably accurately. One way to put the errors on an interpretable scale is the test-set R², sketched below.
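A sketch of the test-set R² for the least squares fit (the same computation applies to each of the other models' predictions):
# Proportion of variance in Apps explained on the held-out test set
test.avg <- mean(test$Apps)
1 - mean((pred - test$Apps)^2) / mean((test$Apps - test.avg)^2)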

Question 11

We will now try to predict per capita crime rate in the Boston data set.

  (a) Try out some of the regression methods explored in this chapter, such as best subset selection, the lasso, ridge regression, and PCR. Present and discuss results for the approaches that you consider.
set.seed(1)
# 70/30 train/test split; the response is the per capita crime rate, crim
s <- sample(1:nrow(Boston), nrow(Boston) * 0.7)
btrain <- Boston[s, ]
btest <- Boston[-s, ]
btrmat <- model.matrix(crim ~ ., data = btrain)
btmat <- model.matrix(crim ~ ., data = btest)
grid <- 10^seq(4, -2, length = 100)

# Ridge regression (alpha = 0) with lambda chosen by cross-validation
bridge <- glmnet(btrmat, btrain$crim, alpha = 0, lambda = grid)
bcv.out <- cv.glmnet(btrmat, btrain$crim, alpha = 0, lambda = grid)
bbestlam <- bcv.out$lambda.min
bpred <- predict(bridge, s = bbestlam, newx = btmat)
mean((btest$crim - bpred)^2)

# The lasso (alpha = 1) with lambda chosen by cross-validation
blasso <- glmnet(btrmat, btrain$crim, alpha = 1, lambda = grid)
bcv.out2 <- cv.glmnet(btrmat, btrain$crim, alpha = 1, lambda = grid)
bbestlaml <- bcv.out2$lambda.min
bpredl <- predict(blasso, s = bbestlaml, newx = btmat)
mean((bpredl - btest$crim)^2)

# Principal components regression with standardized predictors and CV
bpcr.fit <- pcr(crim ~ ., data = btrain, scale = TRUE, validation = "CV")
bpredpcr <- predict(bpcr.fit, btest)
mean((bpredpcr - btest$crim)^2)
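The exercise also lists best subset selection; a sketch using the leaps package and the validation-set approach (regsubsets objects have no predict method, so predictions are formed by hand from the coefficient vector):
library(leaps)
# Best subset selection on the training set, up to all available predictors
best <- regsubsets(crim ~ ., data = btrain, nvmax = ncol(Boston) - 1)
# Validation-set MSE for each model size
errs <- sapply(1:(ncol(Boston) - 1), function(k) {
  coefk <- coef(best, id = k)
  predk <- btmat[, names(coefk)] %*% coefk
  mean((btest$crim - predk)^2)
})
which.min(errs)  # model size with the smallest validation error
min(errs)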
  (b) Propose a model (or set of models) that seem to perform well on this data set, and justify your answer. Make sure that you are evaluating model performance using validation set error, cross-validation, or some other reasonable alternative, as opposed to using training error.

Ridge regression and the lasso performed best here. Both had λ chosen by cross-validation on the training set and were then evaluated on a held-out test set, so the comparison rests on validation-set error rather than training error.

  (c) Does your chosen model involve all of the features in the data set? Why or why not?

Ridge regression involves all of the features by construction: the ridge penalty shrinks coefficients toward zero but never sets them exactly to zero. The lasso can drop features entirely, so whether it uses all of them depends on how much shrinkage the cross-validated λ applies; with a very small λ it, too, retains every predictor.