Chapter 06 (page 259): 2, 9, 11

library(MASS)
library(glmnet)
## Loading required package: Matrix
## Loaded glmnet 4.1-6
library(ISLR2)
## 
## Attaching package: 'ISLR2'
## The following object is masked from 'package:MASS':
## 
##     Boston
library(pls)
## 
## Attaching package: 'pls'
## The following object is masked from 'package:stats':
## 
##     loadings
library(leaps)

#2

For parts (a) through (c), indicate which of i. through iv. is correct. Justify your answer.

  1. The lasso, relative to least squares, is:
  1. More flexible and hence will give improved prediction accuracy when its increase in bias is less than its decrease in variance.
  2. More flexible and hence will give improved prediction accuracy when its increase in variance is less than its decrease in bias.
  3. Less flexible and hence will give improved prediction accuracy when its increase in bias is less than its decrease in variance.
  4. Less flexible and hence will give improved prediction accuracy when its increase in variance is less than its decrease in bias.
  1. The lasso, relative to least squares, is: (iii)

Relative to least squares, the lasso is less flexible and hence will give improved prediction accuracy when its increase in bias is less than its decrease in variance.

  1. Repeat (a) for ridge regression relative to least squares. (iii)

Relative to least squares, the ridge regresson is less flexible and hence will give improved prediction accuracy when its increase in bias is less than its decrease in variance.

  1. Repeat (a) for non-linear methods relative to least squares. (ii)

Relative to least squares, non-linear methods are more flexible and hence will give improved prediction accuracy when its increase in variance is less than its decrease

#9

In this exercise, we will predict the number of applications received using the other variables in the College data set.

names(College)
##  [1] "Private"     "Apps"        "Accept"      "Enroll"      "Top10perc"  
##  [6] "Top25perc"   "F.Undergrad" "P.Undergrad" "Outstate"    "Room.Board" 
## [11] "Books"       "Personal"    "PhD"         "Terminal"    "S.F.Ratio"  
## [16] "perc.alumni" "Expend"      "Grad.Rate"
  1. Split the data set into a training set and a test set.
set.seed(99)
train <- sample(c(TRUE,FALSE),nrow(College),rep=TRUE)
test <- (!train)
  1. Fit a linear model using least squares on the training set, and report the test error obtained.
lm.fit <- lm(Apps ~ ., data = College[train,])
pred <- predict(lm.fit, College[test,])
mean((pred - College[test,]$Apps)^2)
## [1] 1324037

The MSE for the linear model is: 1324037

  1. Fit a ridge regression model on the training set, with λ chosen by cross-validation. Report the test error obtained.
set.seed(99)
x <- model.matrix(Apps ~ ., College)
y <- College$Apps
y.test <- y[test]

grid <- 10^seq(10,-2,length=100)

cv.out <- cv.glmnet(x[train,],y[train], alpha=0)
bestlam <- cv.out$lambda.min

ridge.mod <- glmnet(x[train,], y[train], alpha = 0, lambda = grid, thresh = 1e-12)
ridge.pred <- predict(ridge.mod, s=bestlam, newx = x[test,])

mean((ridge.pred-y.test)^2)
## [1] 1969100

The MSE for the ridge model is: 1969100

  1. Fit a lasso model on the training set, with λ chosen by crossvalidation. Report the test error obtained, along with the number of non-zero coefficient estimates.
set.seed(99)

cv.out <- cv.glmnet(x[train,],y[train], alpha=1)
bestlam <- cv.out$lambda.min

lasso.mod <- glmnet(x[train,], y[train], alpha = 1, lambda = grid)
lasso.pred <- predict(lasso.mod, s=bestlam, newx = x[test,])

mean((lasso.pred-y.test)^2)
## [1] 1371963

The MSE for the lasso model is: 1371963

  1. Fit a PCR model on the training set, with M chosen by crossvalidation. Report the test error obtained, along with the value of M selected by cross-validation.
set.seed(99)
pcr_fit <- pcr(Apps~., data=College[train,], scale=TRUE, validation="CV")
validationplot(pcr_fit, val.type = "MSEP")

summary(pcr_fit)
## Data:    X dimension: 393 17 
##  Y dimension: 393 1
## Fit method: svdpc
## Number of components considered: 17
## 
## VALIDATION: RMSEP
## Cross-validated using 10 random segments.
##        (Intercept)  1 comps  2 comps  3 comps  4 comps  5 comps  6 comps
## CV            3349     3256     1705     1722     1467     1347     1344
## adjCV         3349     3253     1701     1720     1436     1340     1341
##        7 comps  8 comps  9 comps  10 comps  11 comps  12 comps  13 comps
## CV        1239     1218     1251      1269      1270      1269      1272
## adjCV     1215     1212     1244      1262      1264      1262      1265
##        14 comps  15 comps  16 comps  17 comps
## CV         1269      1216      1053      1059
## adjCV      1262      1210      1049      1052
## 
## TRAINING: % variance explained
##       1 comps  2 comps  3 comps  4 comps  5 comps  6 comps  7 comps  8 comps
## X      32.953    58.23    65.81    71.45    76.86    81.40    84.92    88.32
## Apps    6.669    75.32    75.37    82.89    85.26    85.29    87.67    87.67
##       9 comps  10 comps  11 comps  12 comps  13 comps  14 comps  15 comps
## X       91.17      93.6     95.61     97.12     98.16     98.88     99.41
## Apps    87.89      87.9     88.01     88.17     88.28     88.35     89.51
##       16 comps  17 comps
## X        99.83    100.00
## Apps     91.53     91.89
pcr_pred <- predict(pcr_fit , College[test , ], ncomp = 10)
mean (( pcr_pred - y.test)^2)
## [1] 2775474

The MSE for the PCR model is: 2775474

  1. Fit a PLS model on the training set, with M chosen by crossvalidation. Report the test error obtained, along with the value of M selected by cross-validation.
pls_fit <- plsr(Apps~. ,data=College[train,] ,scale=TRUE ,validation="CV")
validationplot(pls_fit ,val.type = "MSEP")

summary(pls_fit)
## Data:    X dimension: 393 17 
##  Y dimension: 393 1
## Fit method: kernelpls
## Number of components considered: 17
## 
## VALIDATION: RMSEP
## Cross-validated using 10 random segments.
##        (Intercept)  1 comps  2 comps  3 comps  4 comps  5 comps  6 comps
## CV            3349     1559     1376     1176     1144     1092     1055
## adjCV         3349     1555     1376     1173     1136     1080     1050
##        7 comps  8 comps  9 comps  10 comps  11 comps  12 comps  13 comps
## CV        1041     1041     1043      1045      1043      1045      1045
## adjCV     1037     1036     1038      1040      1038      1040      1040
##        14 comps  15 comps  16 comps  17 comps
## CV         1044      1044      1044      1044
## adjCV      1039      1039      1039      1039
## 
## TRAINING: % variance explained
##       1 comps  2 comps  3 comps  4 comps  5 comps  6 comps  7 comps  8 comps
## X       26.23    50.58    63.51    66.76    69.84    75.24    78.06    82.10
## Apps    79.46    84.26    88.70    90.00    91.21    91.61    91.75    91.79
##       9 comps  10 comps  11 comps  12 comps  13 comps  14 comps  15 comps
## X       84.40     86.41     89.47     91.43     92.77     94.11     96.18
## Apps    91.83     91.86     91.87     91.88     91.88     91.88     91.88
##       16 comps  17 comps
## X        98.90    100.00
## Apps     91.88     91.89
pls_pred <- predict(pls_fit , College[test, ], ncomp = 7)
mean (( pls_pred - y.test)^2)
## [1] 1376845

The error for the PCR model is: 1376845

#11

We will now try to predict per capita crime rate in the Boston data set. (a) Try out some of the regression methods explored in this chapter, such as best subset selection, the lasso, ridge regression, and PCR. Present and discuss results for the approaches that you consider.

attach(Boston)
set.seed(1)
predict.regsubsets <-  function(object, newdata, id, ...) {
    form <-  as.formula(object$call[[2]])
    mat <-  model.matrix(form, newdata)
    coefi <-  coef(object, id = id)
    mat[, names(coefi)] %*% coefi
}

k = 10
p <-  ncol(Boston) - 1
folds <-  sample(rep(1:k, length = nrow(Boston)))
cv.errors <-  matrix(NA, k, p)
for (i in 1:k) {
    best.fit <-  regsubsets(crim ~ ., data = Boston[folds != i, ], nvmax = p)
    for (j in 1:p) {
        pred <-  predict(best.fit, Boston[folds == i, ], id = j)
        cv.errors[i, j] <-  mean((Boston$crim[folds == i] - pred)^2)}}

mean.cv.errors <- apply(cv.errors, 2, mean)
plot(mean.cv.errors, type = "b", xlab = "Predictors", ylab = "Error")

which.min(mean.cv.errors)
## [1] 11
mean.cv.errors[9]
## [1] 42.75855

The Best subset model with 9 predictors displays the lowest Test MSE, which is= 42.81453

set.seed(1)
split <- sample(nrow(Boston), 0.7*nrow(Boston))
train <- Boston[split,]
test <- Boston[-split,]

train.mat <- model.matrix(crim~.,train)
test.mat <- model.matrix(crim~.,test)
grid=10^seq(10,-2,length=100)

bridge.mod <- glmnet(train.mat, train$crim, alpha=0, lambda = grid, thresh=1e-12)
cv.out <- cv.glmnet(train.mat, train$crim, alpha=0, lambda = grid, thresh=1e-12)
lambda <- cv.out$lambda.min
lambda
## [1] 0.07054802
ridge <-glmnet(train.mat, train$crim, alpha=0, lambda = lambda)
coef(ridge)
## 14 x 1 sparse Matrix of class "dgCMatrix"
##                        s0
## (Intercept)  8.6551478305
## (Intercept)  .           
## zn           0.0348028903
## indus       -0.0778134418
## chas        -0.7005827378
## nox         -5.8865340096
## rm           0.4652812869
## age         -0.0023113130
## dis         -0.6847741847
## rad          0.5311924048
## tax         -0.0006274497
## ptratio     -0.3173369112
## lstat        0.2300573891
## medv        -0.1460584759
pred <- predict(ridge, s=lambda, newx=test.mat)
MSE <-  mean((pred-test$crim)^2)
MSE
## [1] 57.81492

The MSE for the ridge model is: 59.57622

lasso <- glmnet(train.mat, train$crim, alpha=0,lambda=grid)
cv.lasso <- cv.glmnet(train.mat, train$crim, alpha=0, lambda=grid)
lambda.lasso <- cv.lasso$lambda.min
lambda.lasso
## [1] 0.09326033
lasso1<-glmnet(train.mat, train$crim, alpha=0, lambda=lambda.lasso)
coef(lasso1)
## 14 x 1 sparse Matrix of class "dgCMatrix"
##                        s0
## (Intercept)  8.2638175782
## (Intercept)  .           
## zn           0.0343471195
## indus       -0.0792217778
## chas        -0.6979559379
## nox         -5.6728263126
## rm           0.4575823669
## age         -0.0022603493
## dis         -0.6731592993
## rad          0.5224052360
## tax         -0.0002176078
## ptratio     -0.3099114914
## lstat        0.2307052204
## medv        -0.1431985787
pred.lasso <- predict(lasso, s=lambda.lasso, newx= test.mat)

lasso_MSE <- mean((test$crim - pred.lasso)^2)
lasso_MSE
## [1] 57.88517

The MSE for the lasso model is: 59.91168

pcr.model <- pcr(crim ~ . ,data=train ,scale=TRUE ,validation="CV")
summary(pcr.model)
## Data:    X dimension: 354 12 
##  Y dimension: 354 1
## Fit method: svdpc
## Number of components considered: 12
## 
## VALIDATION: RMSEP
## Cross-validated using 10 random segments.
##        (Intercept)  1 comps  2 comps  3 comps  4 comps  5 comps  6 comps
## CV           8.085    6.776    6.774    6.422    6.363    6.307    6.279
## adjCV        8.085    6.770    6.768    6.412    6.354    6.299    6.271
##        7 comps  8 comps  9 comps  10 comps  11 comps  12 comps
## CV       6.096    6.090    6.074     6.078     6.068     6.009
## adjCV    6.087    6.078    6.065     6.069     6.058     5.999
## 
## TRAINING: % variance explained
##       1 comps  2 comps  3 comps  4 comps  5 comps  6 comps  7 comps  8 comps
## X       50.57    64.47    73.58    80.72    87.14    90.65    93.18    95.07
## crim    31.52    31.89    39.44    40.88    41.82    42.41    45.89    46.09
##       9 comps  10 comps  11 comps  12 comps
## X       96.86     98.34     99.46    100.00
## crim    46.40     46.46     46.94     48.03
validationplot(pcr.model , val.type = "MSEP")

pcr.model1 <-  pcr(crim ~ . ,data=train ,scale=TRUE , ncomp = 12)
summary(pcr.model1)
## Data:    X dimension: 354 12 
##  Y dimension: 354 1
## Fit method: svdpc
## Number of components considered: 12
## TRAINING: % variance explained
##       1 comps  2 comps  3 comps  4 comps  5 comps  6 comps  7 comps  8 comps
## X       50.57    64.47    73.58    80.72    87.14    90.65    93.18    95.07
## crim    31.52    31.89    39.44    40.88    41.82    42.41    45.89    46.09
##       9 comps  10 comps  11 comps  12 comps
## X       96.86     98.34     99.46    100.00
## crim    46.40     46.46     46.94     48.03
predict.pcr <- predict(pcr.model ,test ,ncomp=12)
MSE <- mean((test$crim - predict.pcr)^2)
MSE
## [1] 57.61252

The MSE for the PCR model is: 58.97718

  1. Propose a model (or set of models) that seem to perform well on this data set, and justify your answer. Make sure that you are evaluating model performance using validation set error, crossvalidation, or some other reasonable alternative, as opposed to using training error.

Comparing the MSE from the different models applied to the Boston data set, we can observe that the best subset model has the lowest MSE, being 42.81453

  1. Does your chosen model involve all of the features in the data set? Why or why not?

the Best Subset Model performs with only 9 predictors, unlike the rest of the models which perform with 13.

detach(Boston)