Exercise 1

1.a)

Best subset selection.

1.b)

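Best subset selection may have the smallest test RSS, since it searches over more candidate models than the stepwise methods, but this is not guaranteed. The wider search also makes it more prone to overfitting (higher variance, not lower), so forward or backward stepwise selection can do better on test data.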
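Best subset's advantage is guaranteed only on the training data. The following is a minimal sketch on simulated data (the sample size, the coefficients and the helper `rss` are all illustrative assumptions, not part of the exercise) that compares the training RSS of an exhaustive search with a greedy forward search for each model size k.

```r
set.seed(1)
n = 100; p = 6
X = matrix(rnorm(n * p), n, p)
X[, 2] = X[, 1] + rnorm(n, sd = 0.3)          # make two predictors correlated
y = X[, 1] - X[, 2] + 0.5 * X[, 3] + rnorm(n) # hypothetical true model

# Training RSS of the least squares fit on the given columns of X
rss = function(vars) sum(resid(lm(y ~ X[, vars, drop = FALSE]))^2)

# Best subset: exhaustive search over every model of size k
best_rss = sapply(1:p, function(k) min(apply(combn(p, k), 2, rss)))

# Forward stepwise: greedily add the predictor that lowers RSS the most
chosen = integer(0)
fwd_rss = numeric(p)
for (k in 1:p) {
  candidates = setdiff(1:p, chosen)
  cand_rss = sapply(candidates, function(j) rss(c(chosen, j)))
  chosen = c(chosen, candidates[which.min(cand_rss)])
  fwd_rss[k] = min(cand_rss)
}

print(cbind(k = 1:p, best_rss, fwd_rss))
```

For every k the `best_rss` entry can never exceed the `fwd_rss` entry, because the forward stepwise model is one of the candidates the exhaustive search examines. No such ordering holds for test RSS.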

1.c)

  1. True
  2. True
  3. False
  4. False
  5. False

Exercise 2

Lasso - Less flexible and hence will give improved prediction accuracy when its increase in bias is less than its decrease in variance.

Ridge regression - Also less flexible and hence will give improved prediction accuracy when its increase in bias is less than its decrease in variance.

Non-linear methods - More flexible and hence will give improved prediction accuracy when their increase in variance is less than their decrease in bias.

Exercise 3

  1. Steadily decrease.

  2. Decrease initially, and then eventually start increasing in a U shape.

  3. Steadily increase.

  4. Steadily decrease.
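The training-RSS answer can be checked numerically. This is a minimal sketch, assuming the exercise concerns the lasso in its constrained form with budget s, and assuming an orthonormal design so that the lasso solution is simply the soft-thresholded least squares estimate. The data, the coefficients and the helper `soft` are illustrative assumptions.

```r
set.seed(1)
n = 50; p = 5
Q = qr.Q(qr(matrix(rnorm(n * p), n, p)))   # orthonormal columns: t(Q) %*% Q = I
beta = c(3, -2, 1, 0, 0)                   # hypothetical true coefficients
y = drop(Q %*% beta + rnorm(n, sd = 0.5))

ols = drop(crossprod(Q, y))                # least squares fit for orthonormal Q

# Soft-thresholding operator: the lasso solution under an orthonormal design
soft = function(z, t) sign(z) * pmax(abs(z) - t, 0)

# Lower the threshold step by step, which loosens the budget s = sum(|beta_j|)
thresholds = seq(max(abs(ols)), 0, length.out = 8)
path = t(sapply(thresholds, function(t) {
  b = soft(ols, t)
  c(s = sum(abs(b)), train_rss = sum((y - Q %*% b)^2))
}))

print(path)
```

As s grows from 0 the `train_rss` column falls monotonically, ending at the least squares RSS once the constraint no longer binds.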

Exercise 4

  1. Steadily increase.

  2. Decrease initially, and then eventually start increasing in a U shape.

  3. Steadily decrease.

  4. Steadily increase.
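The direction of the training RSS can be checked numerically. This is a minimal sketch, assuming the exercise concerns ridge regression as the penalty λ grows from 0 (the setting in which variance falls and bias rises). It uses the closed-form ridge estimate on simulated data; the data, the coefficients and the helper `ridge_coef` are illustrative assumptions.

```r
set.seed(1)
n = 100; p = 10
X = scale(matrix(rnorm(n * p), n, p))    # centred and scaled predictors
beta = c(3, -2, 1.5, rep(0, p - 3))      # hypothetical true coefficients
y = drop(X %*% beta + rnorm(n))
y = y - mean(y)                          # centre the response, so no intercept is needed

# Closed-form ridge estimate: (X'X + lambda * I)^{-1} X'y
ridge_coef = function(X, y, lambda) {
  solve(crossprod(X) + lambda * diag(ncol(X)), crossprod(X, y))
}

lambdas = c(0, 0.1, 1, 10, 100, 1000)
train_rss = sapply(lambdas, function(l) sum((y - X %*% ridge_coef(X, y, l))^2))
coef_norm = sapply(lambdas, function(l) sqrt(sum(ridge_coef(X, y, l)^2)))

print(data.frame(lambda = lambdas, train_rss = train_rss, coef_norm = coef_norm))
```

The `train_rss` column never decreases as λ grows, since λ = 0 gives the least squares fit, which minimises training RSS, and a larger penalty shrinks the coefficients further away from it (the `coef_norm` column).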

Exercise 11

set.seed(1)
library(MASS)
library(pls)
## 
## Attaching package: 'pls'
## The following object is masked from 'package:stats':
## 
##     loadings
library(glmnet)
## Loading required package: Matrix
## Loaded glmnet 4.0-2
cv.lasso = cv.glmnet(model.matrix(crim ~ . - 1, data = Boston),
                     Boston$crim, type.measure = "mse")
plot(cv.lasso)

coef(cv.lasso)
## 14 x 1 sparse Matrix of class "dgCMatrix"
##                     1
## (Intercept) 1.0894283
## zn          .        
## indus       .        
## chas        .        
## nox         .        
## rm          .        
## age         .        
## dis         .        
## rad         0.2643196
## tax         .        
## ptratio     .        
## black       .        
## lstat       .        
## medv        .
sqrt(cv.lasso$cvm[cv.lasso$lambda == cv.lasso$lambda.1se])
## [1] 7.438669
cv.ridge = cv.glmnet(model.matrix(crim ~ . - 1, data = Boston),
                     Boston$crim, type.measure = "mse", alpha = 0)
plot(cv.ridge)

coef(cv.ridge)
## 14 x 1 sparse Matrix of class "dgCMatrix"
##                        1
## (Intercept)  2.190805375
## zn          -0.002514410
## indus        0.021067968
## chas        -0.102372822
## nox          1.323474252
## rm          -0.106109537
## age          0.004452629
## dis         -0.066139399
## rad          0.029778399
## tax          0.001383946
## ptratio      0.049720162
## black       -0.001713827
## lstat        0.024367321
## medv        -0.016059212
sqrt(cv.ridge$cvm[cv.ridge$lambda == cv.ridge$lambda.1se])
## [1] 7.948055
pcr.fit = pcr(crim ~ ., data = Boston, scale = TRUE, validation = "CV")
summary(pcr.fit)
## Data:    X dimension: 506 13 
##  Y dimension: 506 1
## Fit method: svdpc
## Number of components considered: 13
## 
## VALIDATION: RMSEP
## Cross-validated using 10 random segments.
##        (Intercept)  1 comps  2 comps  3 comps  4 comps  5 comps  6 comps
## CV            8.61    7.200    7.197    6.776    6.770    6.770    6.783
## adjCV         8.61    7.198    7.194    6.771    6.761    6.765    6.777
##        7 comps  8 comps  9 comps  10 comps  11 comps  12 comps  13 comps
## CV       6.772    6.644    6.646     6.641     6.629     6.600     6.539
## adjCV    6.766    6.637    6.639     6.633     6.621     6.591     6.529
## 
## TRAINING: % variance explained
##       1 comps  2 comps  3 comps  4 comps  5 comps  6 comps  7 comps  8 comps
## X       47.70    60.36    69.67    76.45    82.99    88.00    91.14    93.45
## crim    30.69    30.87    39.27    39.61    39.61    39.86    40.14    42.47
##       9 comps  10 comps  11 comps  12 comps  13 comps
## X       95.40     97.04     98.46     99.52     100.0
## crim    42.55     42.78     43.04     44.13      45.4

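I propose the 13-component PCR model, since it has the lowest cross-validated root mean squared error (6.54). Note that with all 13 components PCR is equivalent to least squares on all the predictors, so no dimension reduction takes place. Next I would pick the lasso model (7.44) and finally the ridge regression (7.95), which had the worst results. Both glmnet figures are read at lambda.1se, which by construction has a higher cross-validation error than lambda.min, so the comparison is somewhat biased against them.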

It is also possible that an ensemble of these models would yield better results still.

I would use all the features in the data set.
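The lasso and ridge errors above are evaluated at lambda.1se, while the PCR figure is the minimum over all component counts. A hedged sketch of a more like-for-like comparison follows; the helper `cv_rmse` and the object `rmse.table` are illustrative names, and the exact numbers will depend on the random fold assignment.

```r
library(MASS)    # provides the Boston data
library(glmnet)

set.seed(1)
x = model.matrix(crim ~ . - 1, data = Boston)
y = Boston$crim

# Cross-validated RMSE at both penalties that cv.glmnet reports
cv_rmse = function(fit) {
  c(lambda.min = sqrt(fit$cvm[fit$lambda == fit$lambda.min]),
    lambda.1se = sqrt(fit$cvm[fit$lambda == fit$lambda.1se]))
}

rmse.table = rbind(lasso = cv_rmse(cv.glmnet(x, y, alpha = 1)),
                   ridge = cv_rmse(cv.glmnet(x, y, alpha = 0)))
print(rmse.table)
```

The lambda.min column is the one to set against the best PCR error. Because each method uses its own random folds, small differences between them should not be over-interpreted.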