Assignment #5

Exercises

Conceptual

2. For parts (a) through (c), indicate which of i. through iv. is correct. Justify your answer.

(a) The lasso, relative to least squares, is:

i. More flexible and hence will give improved prediction accuracy when its increase in bias is less than its decrease in variance.

ii. More flexible and hence will give improved prediction accuracy when its increase in variance is less than its decrease in bias.

iii. Less flexible and hence will give improved prediction accuracy when its increase in bias is less than its decrease in variance.

iv. Less flexible and hence will give improved prediction accuracy when its increase in variance is less than its decrease in bias.

PART iii is correct. Relative to least squares, the lasso is less flexible: when the least squares estimates have excessively high variance, the lasso can yield a reduction in variance at the expense of a small increase in bias, and consequently can generate more accurate predictions. Unlike ridge regression, the lasso also performs variable selection, and hence results in models that are easier to interpret.

(b) Repeat (a) for ridge regression relative to least squares.

PART iii is correct. As with the lasso, ridge regression is less flexible than least squares: when the least squares estimates have excessively high variance, ridge regression can yield a reduction in variance at the expense of a small increase in bias, and consequently can generate more accurate predictions. Unlike the lasso, however, ridge regression does not perform variable selection, since its coefficient estimates are shrunk toward zero but never set exactly to zero.

(c) Repeat (a) for non-linear methods relative to least squares.

PART ii is correct. Non-linear methods are more flexible than least squares, so they will give improved prediction accuracy when their increase in variance is less than their decrease in bias.

Applied

9. In this exercise, we will predict the number of applications received using the other variables in the College data set.

## -- Attaching packages --------------------------------------- tidyverse 1.3.0 --
## v ggplot2 3.3.3     v purrr   0.3.4
## v tibble  3.0.5     v dplyr   1.0.3
## v tidyr   1.1.2     v stringr 1.4.0
## v readr   1.4.0     v forcats 0.5.1
## -- Conflicts ------------------------------------------ tidyverse_conflicts() --
## x dplyr::filter() masks stats::filter()
## x dplyr::lag()    masks stats::lag()
## 
## Attaching package: 'MASS'
## The following object is masked from 'package:dplyr':
## 
##     select
## Loading required package: lattice
## 
## Attaching package: 'caret'
## The following object is masked from 'package:purrr':
## 
##     lift
## Warning: package 'pls' was built under R version 4.0.4
## 
## Attaching package: 'pls'
## The following object is masked from 'package:caret':
## 
##     R2
## The following object is masked from 'package:stats':
## 
##     loadings
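
The setup chunk that produced these messages is not echoed. Based on the attach messages above and the data sets used below, it presumably loads something like the following (ISLR, which provides the College data, is an assumption since it prints no startup message):

library(tidyverse)
library(ISLR)
library(MASS)
library(caret)
library(pls)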

(a) Split the data set into a training set and a test set.

str(College)
## 'data.frame':    777 obs. of  18 variables:
##  $ Private    : Factor w/ 2 levels "No","Yes": 2 2 2 2 2 2 2 2 2 2 ...
##  $ Apps       : num  1660 2186 1428 417 193 ...
##  $ Accept     : num  1232 1924 1097 349 146 ...
##  $ Enroll     : num  721 512 336 137 55 158 103 489 227 172 ...
##  $ Top10perc  : num  23 16 22 60 16 38 17 37 30 21 ...
##  $ Top25perc  : num  52 29 50 89 44 62 45 68 63 44 ...
##  $ F.Undergrad: num  2885 2683 1036 510 249 ...
##  $ P.Undergrad: num  537 1227 99 63 869 ...
##  $ Outstate   : num  7440 12280 11250 12960 7560 ...
##  $ Room.Board : num  3300 6450 3750 5450 4120 ...
##  $ Books      : num  450 750 400 450 800 500 500 450 300 660 ...
##  $ Personal   : num  2200 1500 1165 875 1500 ...
##  $ PhD        : num  70 29 53 92 76 67 90 89 79 40 ...
##  $ Terminal   : num  78 30 66 97 72 73 93 100 84 41 ...
##  $ S.F.Ratio  : num  18.1 12.2 12.9 7.7 11.9 9.4 11.5 13.7 11.3 11.5 ...
##  $ perc.alumni: num  12 16 30 37 2 11 26 37 23 15 ...
##  $ Expend     : num  7041 10527 8735 19016 10922 ...
##  $ Grad.Rate  : num  60 56 54 59 15 55 63 73 80 52 ...
set.seed(3)
train_index <- sample(1:nrow(College), round(nrow(College) * 0.7))
train <- College[train_index, ]
test <- College[-train_index, ]
dim(train)
## [1] 544  18
dim(test)
## [1] 233  18
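
As a side note, caret (already attached above) offers a stratified alternative to sample() for creating the split; a minimal sketch, assuming the same 70/30 proportion (part_idx, train_alt and test_alt are hypothetical names not used elsewhere):

# Hypothetical alternative split: createDataPartition stratifies on the response
set.seed(3)
part_idx <- createDataPartition(College$Apps, p = 0.7, list = FALSE)
train_alt <- College[part_idx, ]
test_alt <- College[-part_idx, ]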

(b) Fit a linear model using least squares on the training set, and report the test error obtained.

lm.fit <- lm(Apps ~., data = train)
summary(lm.fit)
## 
## Call:
## lm(formula = Apps ~ ., data = train)
## 
## Residuals:
##     Min      1Q  Median      3Q     Max 
## -3371.9  -437.6   -63.3   350.2  6233.4 
## 
## Coefficients:
##               Estimate Std. Error t value Pr(>|t|)    
## (Intercept) -605.81244  493.25798  -1.228 0.219927    
## PrivateYes  -584.35163  164.18527  -3.559 0.000406 ***
## Accept         1.32861    0.06124  21.696  < 2e-16 ***
## Enroll        -0.24166    0.21699  -1.114 0.265908    
## Top10perc     52.92817    6.99182   7.570 1.68e-13 ***
## Top25perc    -17.86210    5.44140  -3.283 0.001097 ** 
## F.Undergrad    0.04199    0.03837   1.094 0.274342    
## P.Undergrad    0.04110    0.03435   1.197 0.231996    
## Outstate      -0.07531    0.02325  -3.239 0.001274 ** 
## Room.Board     0.17973    0.05972   3.010 0.002742 ** 
## Books         -0.09909    0.26773  -0.370 0.711436    
## Personal       0.02642    0.07653   0.345 0.730076    
## PhD           -9.93047    5.51335  -1.801 0.072249 .  
## Terminal      -5.68887    6.08867  -0.934 0.350559    
## S.F.Ratio     23.09251   15.20844   1.518 0.129514    
## perc.alumni   -6.52534    5.04819  -1.293 0.196713    
## Expend         0.12521    0.01797   6.966 9.77e-12 ***
## Grad.Rate     10.65431    3.52288   3.024 0.002614 ** 
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 1042 on 526 degrees of freedom
## Multiple R-squared:  0.9215, Adjusted R-squared:  0.9189 
## F-statistic:   363 on 17 and 526 DF,  p-value: < 2.2e-16
lm.pred <- predict(lm.fit, test)
lm.mse <- mean((lm.pred - test$Apps)^2)
lm.mse
## [1] 1413287
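
Since the same test MSE and test R-squared arithmetic recurs for every model below, a small helper could keep it in one place; a minimal sketch (test_metrics is a hypothetical name, and the remaining answers keep the original explicit calculations):

# Hypothetical helper: test MSE and test R-squared for a vector of predictions
test_metrics <- function(pred, actual) {
  c(MSE = mean((pred - actual)^2),
    R2 = 1 - sum((actual - pred)^2) / sum((actual - mean(actual))^2))
}
test_metrics(lm.pred, test$Apps)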

(c) Fit a ridge regression model on the training set, with λ chosen by cross-validation. Report the test error obtained.

library(glmnet)
## Warning: package 'glmnet' was built under R version 4.0.4
# Design matrices for glmnet; note the test matrix must be built from the test set
X_train <- model.matrix(Apps ~ ., data = train)
X_test <- model.matrix(Apps ~ ., data = test)
set.seed(3)

ridge.mod <- cv.glmnet(y = train$Apps, x = X_train, alpha = 0, lambda = 10^seq(2,-2,length=100), standardize = TRUE, nfolds = 5)
print(ridge.mod)
## 
## Call:  cv.glmnet(x = X_train, y = train$Apps, lambda = 10^seq(2, -2,      length = 100), nfolds = 5, alpha = 0, standardize = TRUE) 
## 
## Measure: Mean-Squared Error 
## 
##     Lambda Index Measure     SE Nonzero
## min  22.57    17 1224114 217572      17
## 1se 100.00     1 1249628 204525      17
ridge.mod_best <- glmnet(y = train$Apps, x = X_train, alpha = 0,lambda = 10^seq(2,-2,length=100))
ridge_pred <- predict(ridge.mod_best, s = ridge.mod$lambda.min, newx = X_test)
ridge_mse <- mean((ridge_pred - test$Apps)^2)
ridge_mse
## [1] 29869158
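
Note that the 1se lambda reported above (100) sits at the upper boundary of the hand-specified grid, which hints that the grid may be too narrow. A minimal sketch that instead lets cv.glmnet generate its own lambda path (ridge.cv_auto is a hypothetical name):

# Let cv.glmnet choose its own lambda sequence rather than a manual grid
set.seed(3)
ridge.cv_auto <- cv.glmnet(y = train$Apps, x = X_train, alpha = 0, nfolds = 5)
ridge.cv_auto$lambda.min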

(d) Fit a lasso model on the training set, with λ chosen by cross-validation. Report the test error obtained, along with the number of non-zero coefficient estimates.

set.seed(3)
lasso.mod <- cv.glmnet(y = train$Apps, x = X_train, alpha = 1, lambda = 10^seq(2,-2,length=100), standardize = TRUE, nfolds = 5)
print(lasso.mod)
## 
## Call:  cv.glmnet(x = X_train, y = train$Apps, lambda = 10^seq(2, -2,      length = 100), nfolds = 5, alpha = 1, standardize = TRUE) 
## 
## Measure: Mean-Squared Error 
## 
##     Lambda Index Measure     SE Nonzero
## min   9.77    26 1212291 231037      16
## 1se 100.00     1 1325895 292896       5
lasso.mod_best <- glmnet(y = train$Apps, x = X_train, alpha = 1,lambda = 10^seq(2,-2,length=100))
lasso_pred <- predict(lasso.mod_best, s = lasso.mod$lambda.min, newx = X_test)
lasso_mse <- mean((lasso_pred - test$Apps)^2)
lasso_mse
## [1] 29900964
lasso_coef <- predict(lasso.mod_best, type = "coefficients", s = lasso.mod$lambda.min)
round(lasso_coef, 4)
## 19 x 1 sparse Matrix of class "dgCMatrix"
##                     1
## (Intercept) -734.9381
## (Intercept)    .     
## PrivateYes  -569.0715
## Accept         1.2808
## Enroll         .     
## Top10perc     46.0597
## Top25perc    -12.6486
## F.Undergrad    0.0179
## P.Undergrad    0.0350
## Outstate      -0.0627
## Room.Board     0.1635
## Books         -0.0007
## Personal       0.0095
## PhD           -8.0844
## Terminal      -5.6616
## S.F.Ratio     18.6745
## perc.alumni   -7.0681
## Expend         0.1202
## Grad.Rate      9.2769

One of the 17 predictors (Enroll) has a coefficient of exactly zero, so the lasso fit at lambda.min retains 16 non-zero coefficient estimates.
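
The count can also be checked programmatically from the sparse coefficient matrix; a short sketch (it should agree with the Nonzero = 16 reported by cv.glmnet at lambda.min):

# Count non-zero coefficient estimates, dropping the two intercept rows
sum(lasso_coef[-(1:2), 1] != 0)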

(e) Fit a PCR model on the training set, with M chosen by cross-validation. Report the test error obtained, along with the value of M selected by cross-validation.

train_control <- trainControl(method="cv", number = 10)
set.seed(3)
pcr.mod <- train(Apps ~ ., data = train, scale = T, trControl = train_control, method = 'pcr'  )
print(pcr.mod)
## Principal Component Analysis 
## 
## 544 samples
##  17 predictor
## 
## No pre-processing
## Resampling: Cross-Validated (10 fold) 
## Summary of sample sizes: 491, 490, 490, 490, 489, 489, ... 
## Resampling results across tuning parameters:
## 
##   ncomp  RMSE      Rsquared   MAE     
##   1      3611.390  0.0344369  2637.460
##   2      1743.708  0.7735935  1276.479
##   3      1745.749  0.7734178  1271.414
## 
## RMSE was used to select the optimal model using the smallest value.
## The final value used for the model was ncomp = 2.
plot(pcr.mod)

pcr.mod$bestTune
##   ncomp
## 2     2
pcr_pred <- predict(pcr.mod, test, ncomp = 2)
(pcr_mse <- mean((pcr_pred - test$Apps)^2))
## [1] 6399126
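
One caveat: train() defaults to tuneLength = 3, so only ncomp = 1, 2 and 3 were evaluated above (the same applies to the PLS fit in part (f)). A minimal sketch of searching the full range of components (pcr.mod_full is a hypothetical name; its results may differ from those reported above):

# Search all 17 possible components instead of caret's default of 3
set.seed(3)
pcr.mod_full <- train(Apps ~ ., data = train, scale = TRUE, trControl = train_control,
                      method = 'pcr', tuneGrid = expand.grid(ncomp = 1:17))
pcr.mod_full$bestTune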

(f) Fit a PLS model on the training set, with M chosen by cross-validation. Report the test error obtained, along with the value of M selected by cross-validation.

set.seed(3)
pls.mod <- train(Apps ~ ., data = train, scale = T, trControl = train_control, method = 'pls'  )
print(pls.mod)
## Partial Least Squares 
## 
## 544 samples
##  17 predictor
## 
## No pre-processing
## Resampling: Cross-Validated (10 fold) 
## Summary of sample sizes: 491, 490, 490, 490, 489, 489, ... 
## Resampling results across tuning parameters:
## 
##   ncomp  RMSE      Rsquared   MAE      
##   1      1592.108  0.8101942  1172.3893
##   2      1324.546  0.8663608   831.1558
##   3      1204.053  0.8911273   794.6171
## 
## RMSE was used to select the optimal model using the smallest value.
## The final value used for the model was ncomp = 3.
plot(pls.mod)

pls_pred <- predict(pls.mod, test, ncomp = 3)
(pls_mse <- mean((pls_pred - test$Apps)^2))
## [1] 3234791

(g) Comment on the results obtained. How accurately can we predict the number of college applications received? Is there much difference among the test errors resulting from these five approaches?

SS_tot <- sum((test$Apps - mean(test$Apps))^2)

data.frame(method = c("OLS", "Ridge", "Lasso", "PCR", "PLS"), 
           test_MSE = c(lm.mse, ridge_mse, lasso_mse, pcr_mse, pls_mse), 
           test_R2 = c(1 - sum((test$Apps - lm.pred)^2) / SS_tot,
                       1 - sum((test$Apps - ridge_pred)^2) / SS_tot, 
                       1 - sum((test$Apps - lasso_pred)^2) / SS_tot, 
                       1 - sum((test$Apps - pcr_pred)^2) / SS_tot, 
                       1 - sum((test$Apps - pls_pred)^2) / SS_tot)) %>%
  arrange(test_MSE)
##   method test_MSE    test_R2
## 1    OLS  1413287  0.9241061
## 2    PLS  3234791  0.8262909
## 3    PCR  6399126  0.6563652
## 4  Ridge 29869158 -2.7449184
## 5  Lasso 29900964 -2.7489062

According to these results, the OLS model has the lowest test MSE and the highest test R-squared (about 0.92), so the number of applications can be predicted fairly accurately on this split. The five approaches differ substantially here: PLS and PCR trail OLS, and the ridge and lasso fits show by far the largest test errors.
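
Taking the square root of the test MSE puts the error on the original scale of the response; for OLS this is roughly 1,190 applications:

# OLS test RMSE, in units of applications received
sqrt(lm.mse)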

11. We will now try to predict per capita crime rate in the Boston data set.

data(Boston)
str(Boston)
## 'data.frame':    506 obs. of  14 variables:
##  $ crim   : num  0.00632 0.02731 0.02729 0.03237 0.06905 ...
##  $ zn     : num  18 0 0 0 0 0 12.5 12.5 12.5 12.5 ...
##  $ indus  : num  2.31 7.07 7.07 2.18 2.18 2.18 7.87 7.87 7.87 7.87 ...
##  $ chas   : int  0 0 0 0 0 0 0 0 0 0 ...
##  $ nox    : num  0.538 0.469 0.469 0.458 0.458 0.458 0.524 0.524 0.524 0.524 ...
##  $ rm     : num  6.58 6.42 7.18 7 7.15 ...
##  $ age    : num  65.2 78.9 61.1 45.8 54.2 58.7 66.6 96.1 100 85.9 ...
##  $ dis    : num  4.09 4.97 4.97 6.06 6.06 ...
##  $ rad    : int  1 2 2 3 3 3 5 5 5 5 ...
##  $ tax    : num  296 242 242 222 222 222 311 311 311 311 ...
##  $ ptratio: num  15.3 17.8 17.8 18.7 18.7 18.7 15.2 15.2 15.2 15.2 ...
##  $ black  : num  397 397 393 395 397 ...
##  $ lstat  : num  4.98 9.14 4.03 2.94 5.33 ...
##  $ medv   : num  24 21.6 34.7 33.4 36.2 28.7 22.9 27.1 16.5 18.9 ...
control <- trainControl(method = "repeatedcv", number = 10, repeats = 10)

(a) Try out some of the regression methods explored in this chapter, such as best subset selection, the lasso, ridge regression, and PCR. Present and discuss results for the approaches that you consider.

OLS

lm_model <- train(crim ~., data = Boston, trControl = control, method = "lm")
print(lm_model)
## Linear Regression 
## 
## 506 samples
##  13 predictor
## 
## No pre-processing
## Resampling: Cross-Validated (10 fold, repeated 10 times) 
## Summary of sample sizes: 456, 455, 455, 456, 456, 457, ... 
## Resampling results:
## 
##   RMSE      Rsquared   MAE     
##   5.843998  0.5682393  2.941571
## 
## Tuning parameter 'intercept' was held constant at a value of TRUE

Ridge Regression

set.seed(42)
ridge.fit <- train(crim ~., data = Boston, method = "glmnet", metric = "RMSE", trControl = control, tuneGrid = expand.grid(alpha = 0, lambda = seq(0, 1, length = 100)))
print(ridge.fit)
## glmnet 
## 
## 506 samples
##  13 predictor
## 
## No pre-processing
## Resampling: Cross-Validated (10 fold, repeated 10 times) 
## Summary of sample sizes: 455, 455, 456, 455, 456, 456, ... 
## Resampling results across tuning parameters:
## 
##   lambda      RMSE      Rsquared   MAE     
##   0.00000000  5.788281  0.5708364  2.784917
##   0.01010101  5.788281  0.5708364  2.784917
##   0.02020202  5.788281  0.5708364  2.784917
##   0.03030303  5.788281  0.5708364  2.784917
##   0.04040404  5.788281  0.5708364  2.784917
##   0.05050505  5.788281  0.5708364  2.784917
##   0.06060606  5.788281  0.5708364  2.784917
##   0.07070707  5.788281  0.5708364  2.784917
##   0.08080808  5.788281  0.5708364  2.784917
##   0.09090909  5.788281  0.5708364  2.784917
##   0.10101010  5.788281  0.5708364  2.784917
##   0.11111111  5.788281  0.5708364  2.784917
##   0.12121212  5.788281  0.5708364  2.784917
##   0.13131313  5.788281  0.5708364  2.784917
##   0.14141414  5.788281  0.5708364  2.784917
##   0.15151515  5.788281  0.5708364  2.784917
##   0.16161616  5.788281  0.5708364  2.784917
##   0.17171717  5.788281  0.5708364  2.784917
##   0.18181818  5.788281  0.5708364  2.784917
##   0.19191919  5.788281  0.5708364  2.784917
##   0.20202020  5.788281  0.5708364  2.784917
##   0.21212121  5.788281  0.5708364  2.784917
##   0.22222222  5.788281  0.5708364  2.784917
##   0.23232323  5.788281  0.5708364  2.784917
##   0.24242424  5.788281  0.5708364  2.784917
##   0.25252525  5.788281  0.5708364  2.784917
##   0.26262626  5.788281  0.5708364  2.784917
##   0.27272727  5.788281  0.5708364  2.784917
##   0.28282828  5.788281  0.5708364  2.784917
##   0.29292929  5.788281  0.5708364  2.784917
##   0.30303030  5.788281  0.5708364  2.784917
##   0.31313131  5.788281  0.5708364  2.784917
##   0.32323232  5.788281  0.5708364  2.784917
##   0.33333333  5.788281  0.5708364  2.784917
##   0.34343434  5.788281  0.5708364  2.784917
##   0.35353535  5.788281  0.5708364  2.784917
##   0.36363636  5.788281  0.5708364  2.784917
##   0.37373737  5.788281  0.5708364  2.784917
##   0.38383838  5.788281  0.5708364  2.784917
##   0.39393939  5.788281  0.5708364  2.784917
##   0.40404040  5.788281  0.5708364  2.784917
##   0.41414141  5.788281  0.5708364  2.784917
##   0.42424242  5.788281  0.5708364  2.784917
##   0.43434343  5.788281  0.5708364  2.784917
##   0.44444444  5.788281  0.5708364  2.784917
##   0.45454545  5.788281  0.5708364  2.784917
##   0.46464646  5.788281  0.5708364  2.784917
##   0.47474747  5.788281  0.5708364  2.784917
##   0.48484848  5.788281  0.5708364  2.784917
##   0.49494949  5.788313  0.5708357  2.784900
##   0.50505051  5.788382  0.5708331  2.784856
##   0.51515152  5.788495  0.5708260  2.784793
##   0.52525253  5.788683  0.5708164  2.784675
##   0.53535354  5.789008  0.5707944  2.784440
##   0.54545455  5.789457  0.5707545  2.783946
##   0.55555556  5.789722  0.5707090  2.783157
##   0.56565657  5.789635  0.5706665  2.782046
##   0.57575758  5.789479  0.5706232  2.780886
##   0.58585859  5.789327  0.5705793  2.779734
##   0.59595960  5.789175  0.5705356  2.778593
##   0.60606061  5.789031  0.5704914  2.777462
##   0.61616162  5.788906  0.5704460  2.776356
##   0.62626263  5.788793  0.5703995  2.775262
##   0.63636364  5.788687  0.5703519  2.774176
##   0.64646465  5.788582  0.5703042  2.773106
##   0.65656566  5.788476  0.5702568  2.772051
##   0.66666667  5.788380  0.5702089  2.771008
##   0.67676768  5.788300  0.5701600  2.769993
##   0.68686869  5.788230  0.5701102  2.768989
##   0.69696970  5.788167  0.5700592  2.767996
##   0.70707071  5.788104  0.5700082  2.767017
##   0.71717172  5.788040  0.5699575  2.766051
##   0.72727273  5.787982  0.5699066  2.765096
##   0.73737374  5.787936  0.5698550  2.764160
##   0.74747475  5.787902  0.5698026  2.763235
##   0.75757576  5.787875  0.5697494  2.762318
##   0.76767677  5.787853  0.5696953  2.761416
##   0.77777778  5.787831  0.5696414  2.760532
##   0.78787879  5.787806  0.5695879  2.759659
##   0.79797980  5.787787  0.5695342  2.758796
##   0.80808081  5.787778  0.5694800  2.757944
##   0.81818182  5.787780  0.5694252  2.757108
##   0.82828283  5.787788  0.5693697  2.756285
##   0.83838384  5.787802  0.5693132  2.755470
##   0.84848485  5.787816  0.5692568  2.754672
##   0.85858586  5.787829  0.5692007  2.753889
##   0.86868687  5.787841  0.5691448  2.753124
##   0.87878788  5.787859  0.5690888  2.752368
##   0.88888889  5.787888  0.5690322  2.751625
##   0.89898990  5.787925  0.5689753  2.750898
##   0.90909091  5.787967  0.5689176  2.750186
##   0.91919192  5.788014  0.5688592  2.749482
##   0.92929293  5.788062  0.5688008  2.748792
##   0.93939394  5.788107  0.5687425  2.748113
##   0.94949495  5.788152  0.5686847  2.747441
##   0.95959596  5.788200  0.5686268  2.746773
##   0.96969697  5.788255  0.5685687  2.746126
##   0.97979798  5.788319  0.5685103  2.745497
##   0.98989899  5.788388  0.5684515  2.744890
##   1.00000000  5.788462  0.5683920  2.744301
## 
## Tuning parameter 'alpha' was held constant at a value of 0
## RMSE was used to select the optimal model using the smallest value.
## The final values used for the model were alpha = 0 and lambda = 0.8080808.
ridge.fit$bestTune
##    alpha    lambda
## 81     0 0.8080808
min(ridge.fit$results$RMSE)
## [1] 5.787778
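
To see how the selected penalty affects the fit, the ridge coefficients at the chosen lambda can be extracted from the underlying glmnet object; a minimal sketch:

# Ridge coefficients at the cross-validated lambda
coef(ridge.fit$finalModel, s = ridge.fit$bestTune$lambda)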

The Lasso

lasso.fit <- train(crim ~., data = Boston, method = "glmnet", metric = "RMSE", trControl = control, tuneGrid = expand.grid(alpha = 1, lambda = seq(0, 1, length = 100)))
print(lasso.fit)
## glmnet 
## 
## 506 samples
##  13 predictor
## 
## No pre-processing
## Resampling: Cross-Validated (10 fold, repeated 10 times) 
## Summary of sample sizes: 457, 455, 455, 456, 454, 456, ... 
## Resampling results across tuning parameters:
## 
##   lambda      RMSE      Rsquared   MAE     
##   0.00000000  5.798650  0.5804312  2.910859
##   0.01010101  5.793855  0.5809716  2.897997
##   0.02020202  5.785495  0.5818662  2.875008
##   0.03030303  5.778485  0.5825679  2.853748
##   0.04040404  5.772089  0.5831949  2.833090
##   0.05050505  5.766120  0.5838055  2.813385
##   0.06060606  5.761571  0.5841904  2.795629
##   0.07070707  5.758395  0.5843534  2.780064
##   0.08080808  5.756105  0.5843842  2.766013
##   0.09090909  5.754633  0.5842793  2.753891
##   0.10101010  5.753917  0.5840540  2.743583
##   0.11111111  5.754016  0.5837168  2.734308
##   0.12121212  5.754802  0.5832790  2.725782
##   0.13131313  5.756112  0.5827925  2.717833
##   0.14141414  5.757865  0.5822581  2.710578
##   0.15151515  5.759640  0.5817416  2.703985
##   0.16161616  5.761277  0.5812454  2.698419
##   0.17171717  5.762405  0.5807992  2.693417
##   0.18181818  5.763348  0.5803885  2.689023
##   0.19191919  5.764228  0.5799840  2.685259
##   0.20202020  5.764637  0.5796546  2.681701
##   0.21212121  5.765234  0.5792922  2.678940
##   0.22222222  5.765988  0.5789262  2.676837
##   0.23232323  5.766994  0.5785253  2.675224
##   0.24242424  5.768188  0.5780963  2.674169
##   0.25252525  5.769567  0.5776397  2.673640
##   0.26262626  5.770996  0.5771761  2.673299
##   0.27272727  5.772501  0.5766983  2.673239
##   0.28282828  5.774090  0.5762099  2.673268
##   0.29292929  5.775725  0.5757210  2.673469
##   0.30303030  5.777269  0.5752538  2.673482
##   0.31313131  5.778729  0.5748082  2.673089
##   0.32323232  5.780049  0.5743886  2.672440
##   0.33333333  5.781075  0.5740285  2.671342
##   0.34343434  5.781310  0.5738154  2.669126
##   0.35353535  5.780930  0.5737138  2.666102
##   0.36363636  5.780515  0.5736192  2.663119
##   0.37373737  5.779910  0.5735610  2.659863
##   0.38383838  5.779212  0.5735305  2.656452
##   0.39393939  5.778531  0.5735033  2.653062
##   0.40404040  5.777883  0.5734750  2.649721
##   0.41414141  5.777268  0.5734451  2.646444
##   0.42424242  5.776682  0.5734150  2.643243
##   0.43434343  5.776116  0.5733861  2.640098
##   0.44444444  5.775569  0.5733586  2.637004
##   0.45454545  5.775049  0.5733309  2.633976
##   0.46464646  5.774567  0.5733014  2.631006
##   0.47474747  5.774122  0.5732700  2.628088
##   0.48484848  5.773699  0.5732391  2.625191
##   0.49494949  5.773303  0.5732083  2.622334
##   0.50505051  5.772935  0.5731771  2.619515
##   0.51515152  5.772604  0.5731440  2.616762
##   0.52525253  5.772287  0.5731134  2.614029
##   0.53535354  5.771986  0.5730848  2.611327
##   0.54545455  5.771713  0.5730561  2.608646
##   0.55555556  5.771475  0.5730264  2.606019
##   0.56565657  5.771265  0.5729966  2.603418
##   0.57575758  5.771069  0.5729690  2.600852
##   0.58585859  5.770891  0.5729440  2.598360
##   0.59595960  5.770735  0.5729195  2.595905
##   0.60606061  5.770581  0.5728987  2.593491
##   0.61616162  5.770461  0.5728764  2.591122
##   0.62626263  5.770371  0.5728539  2.588813
##   0.63636364  5.770263  0.5728408  2.586559
##   0.64646465  5.770177  0.5728287  2.584353
##   0.65656566  5.770095  0.5728208  2.582197
##   0.66666667  5.770035  0.5728116  2.580114
##   0.67676768  5.770006  0.5728014  2.578070
##   0.68686869  5.770011  0.5727900  2.576102
##   0.69696970  5.770036  0.5727800  2.574170
##   0.70707071  5.770087  0.5727697  2.572326
##   0.71717172  5.770157  0.5727612  2.570535
##   0.72727273  5.770269  0.5727507  2.568799
##   0.73737374  5.770415  0.5727388  2.567109
##   0.74747475  5.770597  0.5727254  2.565472
##   0.75757576  5.770816  0.5727094  2.563908
##   0.76767677  5.771078  0.5726902  2.562413
##   0.77777778  5.771385  0.5726675  2.560933
##   0.78787879  5.771732  0.5726423  2.559489
##   0.79797980  5.772119  0.5726148  2.558086
##   0.80808081  5.772547  0.5725849  2.556742
##   0.81818182  5.773014  0.5725523  2.555437
##   0.82828283  5.773520  0.5725176  2.554173
##   0.83838384  5.774067  0.5724800  2.552955
##   0.84848485  5.774611  0.5724489  2.551783
##   0.85858586  5.775187  0.5724172  2.550670
##   0.86868687  5.775801  0.5723832  2.549618
##   0.87878788  5.776452  0.5723471  2.548617
##   0.88888889  5.777143  0.5723085  2.547661
##   0.89898990  5.777861  0.5722693  2.546747
##   0.90909091  5.778614  0.5722282  2.545895
##   0.91919192  5.779397  0.5721866  2.545085
##   0.92929293  5.780209  0.5721446  2.544298
##   0.93939394  5.781044  0.5721027  2.543548
##   0.94949495  5.781905  0.5720602  2.542824
##   0.95959596  5.782801  0.5720159  2.542130
##   0.96969697  5.783734  0.5719694  2.541490
##   0.97979798  5.784682  0.5719255  2.540873
##   0.98989899  5.785658  0.5718815  2.540287
##   1.00000000  5.786667  0.5718363  2.539735
## 
## Tuning parameter 'alpha' was held constant at a value of 1
## RMSE was used to select the optimal model using the smallest value.
## The final values used for the model were alpha = 1 and lambda = 0.1010101.
min(lasso.fit$results$RMSE)
## [1] 5.753917

Principal Components Regression

pcr.fit <- train(crim ~ ., data = Boston, method = "pcr", preProcess = c("center", "scale"), metric = "RMSE", maximize = F, trControl = control, tuneGrid = expand.grid(ncomp = 1:(ncol(Boston)-1)))
print(pcr.fit)
## Principal Component Analysis 
## 
## 506 samples
##  13 predictor
## 
## Pre-processing: centered (13), scaled (13) 
## Resampling: Cross-Validated (10 fold, repeated 10 times) 
## Summary of sample sizes: 456, 455, 455, 456, 455, 455, ... 
## Resampling results across tuning parameters:
## 
##   ncomp  RMSE      Rsquared   MAE     
##    1     6.562125  0.4028868  3.611301
##    2     6.558251  0.4037856  3.648972
##    3     6.105260  0.5084291  2.925960
##    4     6.086937  0.5122725  2.897432
##    5     6.096105  0.5107041  2.906228
##    6     6.113882  0.5087445  2.966602
##    7     6.115560  0.5089303  3.034096
##    8     5.968378  0.5398229  2.919653
##    9     5.977378  0.5387653  2.908556
##   10     5.967625  0.5416283  2.867324
##   11     5.971455  0.5411952  2.865369
##   12     5.950563  0.5441186  2.987375
##   13     5.882760  0.5575203  2.916545
## 
## RMSE was used to select the optimal model using the smallest value.
## The final value used for the model was ncomp = 13.
min(pcr.fit$results$RMSE)
## [1] 5.88276
plot(pcr.fit)

Partial Least Squares

pls.fit <- train(crim ~ ., data = Boston, method = "pls", preProcess = c("center", "scale"), metric = "RMSE", maximize = F, trControl = control, tuneGrid = expand.grid(ncomp = 1:(ncol(Boston)-1)))

plot(pls.fit)

min(pls.fit$results$RMSE)
## [1] 5.8453

(b) Propose a model (or set of models) that seem to perform well on this data set, and justify your answer. Make sure that you are evaluating model performance using validation set error, cross-validation, or some other reasonable alternative, as opposed to using training error.

data.frame(method = c("OLS", "Ridge", "Lasso", "PCR", "PLS"), 
           CV_RMSE = round(c(lm_model$results$RMSE, min(ridge.fit$results$RMSE), min(lasso.fit$results$RMSE), min(pcr.fit$results$RMSE), min(pls.fit$results$RMSE)),4)) %>% arrange(CV_RMSE)
##   method CV_RMSE
## 1  Lasso  5.7539
## 2  Ridge  5.7878
## 3    OLS  5.8440
## 4    PLS  5.8453
## 5    PCR  5.8828

According to these results, the best model is the lasso, which has the lowest cross-validated RMSE (5.7539). Ridge is a close second (5.7878), and all five methods fall within about 0.13 RMSE of one another, so the lasso's edge is modest; its variable selection, however, also yields a sparser and more interpretable model.

(c) Does your chosen model involve all of the features in the data set? Why or why not?

lasso_best <- train(crim ~., data = Boston, method = "glmnet", metric = "RMSE", trControl = control, tuneGrid = expand.grid(alpha = 1, lambda = seq(0, 1, length = 100)))
print(lasso_best)
## glmnet 
## 
## 506 samples
##  13 predictor
## 
## No pre-processing
## Resampling: Cross-Validated (10 fold, repeated 10 times) 
## Summary of sample sizes: 455, 455, 456, 455, 455, 456, ... 
## Resampling results across tuning parameters:
## 
##   lambda      RMSE      Rsquared   MAE     
##   0.00000000  5.933629  0.5566482  2.907770
##   0.01010101  5.929668  0.5570211  2.895452
##   0.02020202  5.923113  0.5575818  2.873112
##   0.03030303  5.917840  0.5579689  2.851965
##   0.04040404  5.913296  0.5582703  2.831880
##   0.05050505  5.909119  0.5585626  2.812843
##   0.06060606  5.906241  0.5586583  2.796078
##   0.07070707  5.904442  0.5585881  2.781000
##   0.08080808  5.903404  0.5584066  2.767545
##   0.09090909  5.903199  0.5581016  2.755649
##   0.10101010  5.903863  0.5576621  2.745460
##   0.11111111  5.905234  0.5571289  2.736538
##   0.12121212  5.907238  0.5565025  2.728349
##   0.13131313  5.909670  0.5558159  2.720601
##   0.14141414  5.912029  0.5551527  2.713312
##   0.15151515  5.914301  0.5545129  2.706726
##   0.16161616  5.916456  0.5538892  2.701238
##   0.17171717  5.918213  0.5533047  2.696487
##   0.18181818  5.919838  0.5527421  2.692188
##   0.19191919  5.921296  0.5521933  2.688309
##   0.20202020  5.922148  0.5517678  2.684734
##   0.21212121  5.923031  0.5513426  2.681838
##   0.22222222  5.924050  0.5509260  2.679516
##   0.23232323  5.925370  0.5504708  2.677714
##   0.24242424  5.926875  0.5499889  2.676284
##   0.25252525  5.928568  0.5494793  2.675486
##   0.26262626  5.930407  0.5489475  2.675022
##   0.27272727  5.932394  0.5483994  2.674934
##   0.28282828  5.934507  0.5478352  2.675088
##   0.29292929  5.936657  0.5472672  2.675343
##   0.30303030  5.938609  0.5467373  2.675355
##   0.31313131  5.940452  0.5462347  2.675094
##   0.32323232  5.942082  0.5457842  2.674333
##   0.33333333  5.943445  0.5453891  2.673218
##   0.34343434  5.944099  0.5451289  2.671225
##   0.35353535  5.944171  0.5449879  2.668287
##   0.36363636  5.944174  0.5448648  2.665171
##   0.37373737  5.944063  0.5447681  2.661873
##   0.38383838  5.943874  0.5446965  2.658381
##   0.39393939  5.943715  0.5446247  2.654923
##   0.40404040  5.943594  0.5445505  2.651523
##   0.41414141  5.943507  0.5444745  2.648194
##   0.42424242  5.943453  0.5443970  2.644953
##   0.43434343  5.943428  0.5443186  2.641764
##   0.44444444  5.943436  0.5442388  2.638611
##   0.45454545  5.943471  0.5441587  2.635563
##   0.46464646  5.943519  0.5440815  2.632573
##   0.47474747  5.943596  0.5440037  2.629650
##   0.48484848  5.943697  0.5439258  2.626794
##   0.49494949  5.943829  0.5438469  2.623985
##   0.50505051  5.943978  0.5437684  2.621235
##   0.51515152  5.944159  0.5436880  2.618516
##   0.52525253  5.944367  0.5436071  2.615834
##   0.53535354  5.944598  0.5435265  2.613203
##   0.54545455  5.944865  0.5434440  2.610615
##   0.55555556  5.945158  0.5433607  2.608075
##   0.56565657  5.945480  0.5432763  2.605584
##   0.57575758  5.945816  0.5431944  2.603140
##   0.58585859  5.946156  0.5431164  2.600766
##   0.59595960  5.946521  0.5430385  2.598404
##   0.60606061  5.946909  0.5429608  2.596083
##   0.61616162  5.947320  0.5428840  2.593811
##   0.62626263  5.947761  0.5428061  2.591620
##   0.63636364  5.948221  0.5427297  2.589469
##   0.64646465  5.948707  0.5426527  2.587386
##   0.65656566  5.949202  0.5425774  2.585363
##   0.66666667  5.949703  0.5425062  2.583381
##   0.67676768  5.950228  0.5424354  2.581440
##   0.68686869  5.950779  0.5423643  2.579579
##   0.69696970  5.951351  0.5422941  2.577765
##   0.70707071  5.951950  0.5422234  2.576004
##   0.71717172  5.952578  0.5421512  2.574314
##   0.72727273  5.953243  0.5420772  2.572666
##   0.73737374  5.953939  0.5420020  2.571082
##   0.74747475  5.954648  0.5419295  2.569541
##   0.75757576  5.955375  0.5418582  2.568022
##   0.76767677  5.956116  0.5417885  2.566539
##   0.77777778  5.956873  0.5417206  2.565105
##   0.78787879  5.957661  0.5416509  2.563724
##   0.79797980  5.958492  0.5415778  2.562395
##   0.80808081  5.959361  0.5415023  2.561108
##   0.81818182  5.960266  0.5414246  2.559855
##   0.82828283  5.961180  0.5413489  2.558656
##   0.83838384  5.962123  0.5412721  2.557497
##   0.84848485  5.963095  0.5411943  2.556389
##   0.85858586  5.964098  0.5411153  2.555318
##   0.86868687  5.965137  0.5410340  2.554285
##   0.87878788  5.966212  0.5409502  2.553296
##   0.88888889  5.967313  0.5408672  2.552338
##   0.89898990  5.968435  0.5407850  2.551416
##   0.90909091  5.969574  0.5407036  2.550529
##   0.91919192  5.970740  0.5406213  2.549677
##   0.92929293  5.971928  0.5405396  2.548863
##   0.93939394  5.973125  0.5404610  2.548082
##   0.94949495  5.974350  0.5403815  2.547329
##   0.95959596  5.975605  0.5403004  2.546618
##   0.96969697  5.976875  0.5402204  2.545955
##   0.97979798  5.978160  0.5401431  2.545323
##   0.98989899  5.979457  0.5400685  2.544712
##   1.00000000  5.980771  0.5399952  2.544147
## 
## Tuning parameter 'alpha' was held constant at a value of 1
## RMSE was used to select the optimal model using the smallest value.
## The final values used for the model were alpha = 1 and lambda = 0.09090909.
# Coefficients of the chosen lasso model at the cross-validated lambda
lasso_best_coef <- coef(lasso_best$finalModel, s = lasso_best$bestTune$lambda)
lasso_best_coef

The chosen model does not necessarily involve all 13 features. The lasso drops a predictor only when its coefficient is driven exactly to zero, and with the small penalty selected by cross-validation (lambda of about 0.09) the coefficients are only mildly shrunk, so most predictors are typically retained. Inspecting lasso_best_coef shows which predictors, if any, are set exactly to zero and hence excluded from the final model.