Chapter 06 (page 259): 2, 9, 11

2. For parts (a) through (c), indicate which of i. through iv. is correct. Justify your answer.

(a) The lasso, relative to least squares, is:
i. More flexible and hence will give improved prediction accuracy when its increase in bias is less than its decrease in variance.
ii. More flexible and hence will give improved prediction accuracy when its increase in variance is less than its decrease in bias.
iii. Less flexible and hence will give improved prediction accuracy when its increase in bias is less than its decrease in variance.
iv. Less flexible and hence will give improved prediction accuracy when its increase in variance is less than its decrease in bias.
  • Option iii is correct: the lasso is less flexible than least squares, and it gives improved prediction accuracy when its increase in bias is less than its decrease in variance.
  • Shrinking the coefficients reduces the variance of the predictions at the cost of a small increase in bias, and this trade-off is usually worthwhile; the simulation sketch after part (c) illustrates it.
(b) Repeat (a) for ridge regression relative to least squares.
  • Option iii is correct, for the same reason as in (a): ridge regression is less flexible and gives improved prediction accuracy when its increase in bias is less than its decrease in variance.
(c) Repeat (a) for non-linear methods relative to least squares.
  • Option ii is correct: non-linear methods are more flexible and give improved prediction accuracy when their increase in variance is less than their decrease in bias.
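  • A minimal simulation (hypothetical data, not from the College exercise) illustrating the trade-off in (a): with only a few true signals among many noise predictors, the lasso trades a small increase in bias for a large reduction in variance and beats least squares on test MSE.
library(glmnet)
set.seed(1)
n = 100; p = 50
X = matrix(rnorm(n * p), n, p)
beta = c(rep(1, 5), rep(0, p - 5))   # only 5 predictors carry signal
y = as.vector(X %*% beta + rnorm(n))
X.new = matrix(rnorm(n * p), n, p)   # independent test set
y.new = as.vector(X.new %*% beta + rnorm(n))
ols.fit = lm(y ~ X)
ols.pred = cbind(1, X.new) %*% coef(ols.fit)   # manual OLS predictions
cv.fit = cv.glmnet(X, y, alpha = 1)            # lasso, lambda chosen by CV
lasso.pred = predict(cv.fit, X.new, s = "lambda.min")
c(ols = mean((y.new - ols.pred)^2), lasso = mean((y.new - lasso.pred)^2))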

9. In this exercise, we will predict the number of applications received using the other variables in the College data set.

library(ISLR)
## Warning: package 'ISLR' was built under R version 4.0.5
data(College)
(a) Split the data set into a training set and a test set.
set.seed(1)
id = sample(1:nrow(College), nrow(College)/2)
train = College[id,]
test = College[-id,]
(b) Fit a linear model using least squares on the training set, and report the test error obtained.
lm.fit = lm(Apps~., data=train)
lm.pred = predict(lm.fit, test)
lm.error = mean((test$Apps - lm.pred)^2)
lm.error
## [1] 1135758
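  • On the scale of the response, this MSE corresponds to a root-mean-squared error of roughly 1066 applications:
sqrt(lm.error)
## [1] 1065.719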
(c) Fit a ridge regression model on the training set, with λ chosen by cross-validation. Report the test error obtained.
#install.packages("glmnet", repos = "https://cran.us.r-project.org")
library(glmnet)
## Warning: package 'glmnet' was built under R version 4.0.5
## Loading required package: Matrix
## Loaded glmnet 4.1-2
# model.matrix() keeps its intercept column here; glmnet fits its own
# intercept, so the redundant column appears as a second "(Intercept)"
# row (shrunk to zero) in the coefficient printout under (d).
train.X = model.matrix(Apps ~ ., data = train)
train.Y = train[, "Apps"]
test.X = model.matrix(Apps ~ ., data = test)
test.Y = test[, "Apps"]
grid = 10 ^ seq(4, -2, length=100)
ridge.mod = glmnet(train.X, train.Y, alpha =0, lambda=grid, thresh = 1e-12)
ridge.cv = cv.glmnet(train.X, train.Y, alpha =0, lambda=grid, thresh = 1e-12)
lambda.best = ridge.cv$lambda.min
ridge.pred = predict(ridge.mod, newx= test.X, s=lambda.best)
(ridge.err = mean((test.Y - ridge.pred)^2))
## [1] 1164319
  • Test error: 1164319
(d) Fit a lasso model on the training set, with λ chosen by cross-validation. Report the test error obtained, along with the number of non-zero coefficient estimates.
lasso.mod = glmnet(train.X, train.Y, alpha =1, lambda=grid, thresh = 1e-12)
lasso.cv = cv.glmnet(train.X, train.Y, alpha =1, lambda=grid, thresh = 1e-12)
lambda.best = lasso.cv$lambda.min
lasso.pred = predict(lasso.mod, newx= test.X, s=lambda.best)
(lasso.err = mean((test.Y - lasso.pred)^2))
## [1] 1135660
lasso_coef = predict(lasso.mod, s = lambda.best, type="coefficients")
round(lasso_coef, 3)
## 19 x 1 sparse Matrix of class "dgCMatrix"
##                   s1
## (Intercept) -790.036
## (Intercept)    .    
## PrivateYes  -307.010
## Accept         1.779
## Enroll        -1.470
## Top10perc     66.722
## Top25perc    -22.304
## F.Undergrad    0.093
## P.Undergrad    0.009
## Outstate      -0.108
## Room.Board     0.212
## Books          0.291
## Personal       0.006
## PhD          -15.472
## Terminal       6.410
## S.F.Ratio     22.826
## perc.alumni    1.130
## Expend         0.049
## Grad.Rate      7.488
  • Test error: 1135660
  • Number of non-zero coefficient estimates: 18 of the 19 rows above; all 17 predictors are retained, and only the redundant intercept column is shrunk to zero.
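  • The sparsity can also be checked programmatically (the count includes the intercept row):
sum(lasso_coef != 0)
## [1] 18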
(e) Fit a PCR model on the training set, with M chosen by cross-validation. Report the test error obtained, along with the value of M selected by cross-validation.
#install.packages("pls", repos = "https://cran.us.r-project.org")
library(pls)
## Warning: package 'pls' was built under R version 4.0.5
## 
## Attaching package: 'pls'
## The following object is masked from 'package:stats':
## 
##     loadings
pcr.fit = pcr(Apps~., data=train, scale=TRUE, validation="CV")
validationplot(pcr.fit, val.type="MSEP")

summary(pcr.fit)
## Data:    X dimension: 388 17 
##  Y dimension: 388 1
## Fit method: svdpc
## Number of components considered: 17
## 
## VALIDATION: RMSEP
## Cross-validated using 10 random segments.
##        (Intercept)  1 comps  2 comps  3 comps  4 comps  5 comps  6 comps
## CV            4288     4027     2351     2355     2046     1965     1906
## adjCV         4288     4031     2347     2353     2014     1955     1899
##        7 comps  8 comps  9 comps  10 comps  11 comps  12 comps  13 comps
## CV        1910     1913     1871      1799      1799      1802      1800
## adjCV     1903     1908     1866      1790      1792      1795      1793
##        14 comps  15 comps  16 comps  17 comps
## CV         1832      1728      1310      1222
## adjCV      1829      1702      1296      1212
## 
## TRAINING: % variance explained
##       1 comps  2 comps  3 comps  4 comps  5 comps  6 comps  7 comps  8 comps
## X       32.20    57.78    65.31    70.99    76.37    81.27     84.8    87.85
## Apps    13.44    70.93    71.07    79.87    81.15    82.25     82.3    82.33
##       9 comps  10 comps  11 comps  12 comps  13 comps  14 comps  15 comps
## X       90.62     92.91     94.98     96.74     97.79     98.72     99.42
## Apps    83.38     84.76     84.80     84.84     85.11     85.14     90.55
##       16 comps  17 comps
## X        99.88    100.00
## Apps     93.42     93.89
pcr.pred = predict(pcr.fit, test, ncomp=10)
(pcr.error = mean((test$Apps - pcr.pred)^2))
## [1] 1723100
  • Cross-validated error is lowest at M = 17, which is just the full least-squares fit; the prediction above uses M = 10, where the CV curve levels off.
  • Test error (M = 10): 1723100
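  • The CV-minimizing M can be extracted programmatically rather than read off the plot; a sketch using the pls package's MSEP() accessor (the first entry of the array is the intercept-only model, hence the - 1):
msep = MSEP(pcr.fit)
which.min(msep$val["adjCV", 1, ]) - 1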
(f) Fit a PLS model on the training set, with M chosen by cross-validation. Report the test error obtained, along with the value of M selected by cross-validation.
pls.fit = plsr(Apps~., data=train, scale=TRUE, validation="CV")
validationplot(pls.fit, val.type="MSEP")

summary(pls.fit)
## Data:    X dimension: 388 17 
##  Y dimension: 388 1
## Fit method: kernelpls
## Number of components considered: 17
## 
## VALIDATION: RMSEP
## Cross-validated using 10 random segments.
##        (Intercept)  1 comps  2 comps  3 comps  4 comps  5 comps  6 comps
## CV            4288     4054     2422     2432     2117     2022     1959
## adjCV         4288     4051     2415     2426     2061     1999     1948
##        7 comps  8 comps  9 comps  10 comps  11 comps  12 comps  13 comps
## CV        1957     1955     1911      1852      1853      1856      1861
## adjCV     1947     1947     1900      1840      1843      1846      1851
##        14 comps  15 comps  16 comps  17 comps
## CV         1877      1679      1338      1269
## adjCV      1876      1649      1323      1256
## 
## TRAINING: % variance explained
##       1 comps  2 comps  3 comps  4 comps  5 comps  6 comps  7 comps  8 comps
## X       32.20    57.78    65.31    70.99    76.37    81.27     84.8    87.85
## Apps    13.44    70.93    71.07    79.87    81.15    82.25     82.3    82.33
##       9 comps  10 comps  11 comps  12 comps  13 comps  14 comps  15 comps
## X       90.62     92.91     94.98     96.74     97.79     98.72     99.42
## Apps    83.38     84.76     84.80     84.84     85.11     85.14     90.55
##       16 comps  17 comps
## X        99.88    100.00
## Apps     93.42     93.89
pls.pred = predict(pls.fit, test, ncomp=12)
(pls.err = mean((test$Apps - pls.pred)^2))
## [1] 1706651
  • Cross-validated error is again lowest at M = 17; the prediction uses M = 12, where the CV curve has flattened.
  • Test error (M = 12): 1706651
(g) Comment on the results obtained. How accurately can we predict the number of college applications received? Is there much difference among the test errors resulting from these five approaches?
test.avg = mean(test$Apps)
lm.r2 = 1 - lm.error/mean((test.avg - test$Apps)^2)
ridge.r2 = 1 - ridge.err/mean((test.avg - test$Apps)^2)
lasso.r2 = 1 - lasso.err/mean((test.avg - test$Apps)^2)
pcr.r2 = 1 - pcr.error/mean((test.avg - test$Apps)^2)
pls.r2 = 1 - pls.err/mean((test.avg - test$Apps)^2)
all.r2 = c(lm.r2, ridge.r2, lasso.r2, pcr.r2, pls.r2)
names(all.r2) = c("lm", "ridge", "lasso", "pcr", "pls")
barplot(all.r2)

  • All of the models except PCR and PLS predict the number of applications with high accuracy; least squares, ridge, and the lasso give nearly identical test errors, while PCR (M = 10) and PLS (M = 12) are noticeably worse.

11. We will now try to predict per capita crime rate in the Boston data set.

(a) Try out some of the regression methods explored in this chapter, such as best subset selection, the lasso, ridge regression, and PCR. Present and discuss results for the approaches that you consider.
#install.packages("leaps", repos = "https://cran.us.r-project.org")
  • Best subset selection
library(MASS) 
library(leaps)   
## Warning: package 'leaps' was built under R version 4.0.5
library(glmnet)  
data(Boston)
set.seed(1)
# regsubsets objects have no predict() method, so define one
# (as in the Chapter 6 lab) for use inside the CV loop below.
predict.regsubsets = function(object, newdata, id, ...) {
    form = as.formula(object$call[[2]])   # recover the formula from the call
    mat = model.matrix(form, newdata)     # design matrix for the new data
    coefi = coef(object, id = id)         # coefficients of the id-variable model
    xvars = names(coefi)
    mat[, xvars] %*% coefi                # predictions from the selected variables
}

k = 10
folds = sample(1:k, nrow(Boston), replace = TRUE)
cv.errors = matrix(NA, k, 13, dimnames = list(NULL, paste(1:13)))
for (j in 1:k) {
    best.fit = regsubsets(crim ~ ., data = Boston[folds != j, ], nvmax = 13)
    for (i in 1:13) {
        pred = predict(best.fit, Boston[folds == j, ], id = i)
        cv.errors[j, i] = mean((Boston$crim[folds == j] - pred)^2)
    }
}
mean.cv.errors = apply(cv.errors, 2, mean)
plot(mean.cv.errors, type = "b", xlab = "Number of variables", ylab = "CV error")

min(mean.cv.errors)
## [1] 42.46014
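  • Which model size attains this minimum can be read off directly; the coefficient extraction below uses the 12-variable model:
which.min(mean.cv.errors)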
regfit.best = regsubsets(crim~., data=Boston, nvmax=13)
coef(regfit.best, 12)
##   (Intercept)            zn         indus          chas           nox 
##  16.985713928   0.044673247  -0.063848469  -0.744367726 -10.202169211 
##            rm           dis           rad           tax       ptratio 
##   0.439588002  -0.993556631   0.587660185  -0.003767546  -0.269948860 
##         black         lstat          medv 
##  -0.007518904   0.128120290  -0.198877768
  • Lasso
x = model.matrix(crim ~ ., Boston)[, -1]
y = Boston$crim
cv.out = cv.glmnet(x, y, alpha = 1, type.measure = "mse")
plot(cv.out)

cv.out
## 
## Call:  cv.glmnet(x = x, y = y, type.measure = "mse", alpha = 1) 
## 
## Measure: Mean-Squared Error 
## 
##     Lambda Index Measure    SE Nonzero
## min  0.051    51   43.11 14.16      11
## 1se  3.376     6   56.89 17.29       1
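  • The coefficients at the CV-selected λ can be pulled straight from the cv.glmnet object, e.g. at lambda.min (11 non-zero entries per the table above):
predict(cv.out, type = "coefficients", s = "lambda.min")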
  • Ridge
cv.out = cv.glmnet(x, y, alpha = 0, type.measure = "mse")
cv.out
## 
## Call:  cv.glmnet(x = x, y = y, type.measure = "mse", alpha = 0) 
## 
## Measure: Mean-Squared Error 
## 
##     Lambda Index Measure    SE Nonzero
## min   0.54   100   43.48 14.33      13
## 1se  74.44    47   57.69 16.76      13
  • PCR
pcr.fit = pcr(crim ~ ., data = Boston, scale = TRUE, validation = "CV")
validationplot(pcr.fit, val.type = "MSEP")
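  • To compare PCR with the cross-validated MSEs above, the minimum cross-validated MSEP and the M attaining it can be extracted; a sketch using the same MSEP() accessor as in Question 9:
msep = MSEP(pcr.fit)
min(msep$val["adjCV", 1, ])
which.min(msep$val["adjCV", 1, ]) - 1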

(b) Propose a model (or set of models) that seem to perform well on this data set, and justify your answer. Make sure that you are evaluating model performance using validation set error, cross-validation, or some other reasonable alternative, as opposed to using training error.
  • Best subset selection yields the lowest cross-validated error (about 42.5, versus 43.11 for the lasso and 43.48 for ridge regression), so it is the model proposed here.
(c) Does your chosen model involve all of the features in the data set? Why or why not?
  • No. The model chosen by cross-validation uses 12 of the 13 predictors; age is the one variable dropped.
  • Cross-validation favored the 12-variable model over the full model, so the extra variable is left out.