(a) The lasso, relative to least squares, is:
i. More flexible and hence will give improved prediction accuracy when its increase in bias is less than its decrease in variance.
ii. More flexible and hence will give improved prediction accuracy when its increase in variance is less than its decrease in bias.
iii. Less flexible and hence will give improved prediction accuracy when its increase in bias is less than its decrease in variance.
iv. Less flexible and hence will give improved prediction accuracy when its increase in variance is less than its decrease in bias.
PART iii is correct. As with ridge regression, when the least squares estimates have excessively high variance, the lasso solution can yield a reduction in variance at the expense of a small increase in bias, and consequently can generate more accurate predictions.
(b) Repeat (a) for ridge regression relative to least squares.
PART iii is correct. Ridge regression is likewise less flexible than least squares: when the least squares estimates have excessively high variance, the ridge solution can yield a reduction in variance at the expense of a small increase in bias, and consequently can generate more accurate predictions. Unlike the lasso, however, ridge regression does not perform variable selection, so every predictor remains in the model and the fit is somewhat harder to interpret.
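For reference, both estimators minimize the residual sum of squares plus a penalty; only the form of the penalty differs (a standard statement of the two criteria, written in the book's notation):

$$\hat{\beta}^{\text{ridge}} = \arg\min_{\beta}\; \sum_{i=1}^{n}\Big(y_i - \beta_0 - \sum_{j=1}^{p}\beta_j x_{ij}\Big)^{2} + \lambda\sum_{j=1}^{p}\beta_j^{2}, \qquad \hat{\beta}^{\text{lasso}} = \arg\min_{\beta}\; \sum_{i=1}^{n}\Big(y_i - \beta_0 - \sum_{j=1}^{p}\beta_j x_{ij}\Big)^{2} + \lambda\sum_{j=1}^{p}\lvert\beta_j\rvert.$$

It is the absolute-value (ℓ1) penalty that allows the lasso to set coefficients exactly to zero, whereas the squared (ℓ2) ridge penalty only shrinks them towards zero.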
(c) Repeat (a) for non-linear methods relative to least squares.
PART ii is correct. Non-linear methods are more flexible than least squares, so they reduce bias at the cost of higher variance; they will give improved prediction accuracy whenever the increase in variance is less than the decrease in bias.
College data set.
library(tidyverse)
library(MASS)    # masks dplyr::select
library(caret)
library(pls)
(a) Split the data set into a training set and a test set.
str(College)
## 'data.frame': 777 obs. of 18 variables:
## $ Private : Factor w/ 2 levels "No","Yes": 2 2 2 2 2 2 2 2 2 2 ...
## $ Apps : num 1660 2186 1428 417 193 ...
## $ Accept : num 1232 1924 1097 349 146 ...
## $ Enroll : num 721 512 336 137 55 158 103 489 227 172 ...
## $ Top10perc : num 23 16 22 60 16 38 17 37 30 21 ...
## $ Top25perc : num 52 29 50 89 44 62 45 68 63 44 ...
## $ F.Undergrad: num 2885 2683 1036 510 249 ...
## $ P.Undergrad: num 537 1227 99 63 869 ...
## $ Outstate : num 7440 12280 11250 12960 7560 ...
## $ Room.Board : num 3300 6450 3750 5450 4120 ...
## $ Books : num 450 750 400 450 800 500 500 450 300 660 ...
## $ Personal : num 2200 1500 1165 875 1500 ...
## $ PhD : num 70 29 53 92 76 67 90 89 79 40 ...
## $ Terminal : num 78 30 66 97 72 73 93 100 84 41 ...
## $ S.F.Ratio : num 18.1 12.2 12.9 7.7 11.9 9.4 11.5 13.7 11.3 11.5 ...
## $ perc.alumni: num 12 16 30 37 2 11 26 37 23 15 ...
## $ Expend : num 7041 10527 8735 19016 10922 ...
## $ Grad.Rate : num 60 56 54 59 15 55 63 73 80 52 ...
set.seed(3)
train_index <- sample(1:nrow(College), round(nrow(College) * 0.7))
train <- College[train_index, ]
test <- College[-train_index, ]
dim(train)
## [1] 544 18
dim(test)
## [1] 233 18
(b) Fit a linear model using least squares on the training set, and report the test error obtained.
lm.fit <- lm(Apps ~., data = train)
summary(lm.fit)
##
## Call:
## lm(formula = Apps ~ ., data = train)
##
## Residuals:
## Min 1Q Median 3Q Max
## -3371.9 -437.6 -63.3 350.2 6233.4
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) -605.81244 493.25798 -1.228 0.219927
## PrivateYes -584.35163 164.18527 -3.559 0.000406 ***
## Accept 1.32861 0.06124 21.696 < 2e-16 ***
## Enroll -0.24166 0.21699 -1.114 0.265908
## Top10perc 52.92817 6.99182 7.570 1.68e-13 ***
## Top25perc -17.86210 5.44140 -3.283 0.001097 **
## F.Undergrad 0.04199 0.03837 1.094 0.274342
## P.Undergrad 0.04110 0.03435 1.197 0.231996
## Outstate -0.07531 0.02325 -3.239 0.001274 **
## Room.Board 0.17973 0.05972 3.010 0.002742 **
## Books -0.09909 0.26773 -0.370 0.711436
## Personal 0.02642 0.07653 0.345 0.730076
## PhD -9.93047 5.51335 -1.801 0.072249 .
## Terminal -5.68887 6.08867 -0.934 0.350559
## S.F.Ratio 23.09251 15.20844 1.518 0.129514
## perc.alumni -6.52534 5.04819 -1.293 0.196713
## Expend 0.12521 0.01797 6.966 9.77e-12 ***
## Grad.Rate 10.65431 3.52288 3.024 0.002614 **
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 1042 on 526 degrees of freedom
## Multiple R-squared: 0.9215, Adjusted R-squared: 0.9189
## F-statistic: 363 on 17 and 526 DF, p-value: < 2.2e-16
lm.pred <- predict(lm.fit, test)
lm.mse <- mean((lm.pred - test$Apps)^2)
lm.mse
## [1] 1413287
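On a more interpretable scale, the square root of the test MSE gives the typical size of a prediction error in numbers of applications (a one-line check using the objects above):
sqrt(lm.mse)  # roughly 1,200 applications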
(c) Fit a ridge regression model on the training set, with λ chosen by cross-validation. Report the test error obtained.
library(glmnet)
X_train <- model.matrix(Apps ~ ., data = train)  # training design matrix (intercept column included)
X_test <- model.matrix(Apps ~ ., data = test)    # test design matrix, built from the test rows
set.seed(3)
ridge.mod <- cv.glmnet(y = train$Apps, x = X_train, alpha = 0, lambda = 10^seq(2,-2,length=100), standardize = TRUE, nfolds = 5)
print(ridge.mod)
##
## Call: cv.glmnet(x = X_train, y = train$Apps, lambda = 10^seq(2, -2, length = 100), nfolds = 5, alpha = 0, standardize = TRUE)
##
## Measure: Mean-Squared Error
##
## Lambda Index Measure SE Nonzero
## min 22.57 17 1224114 217572 17
## 1se 100.00 1 1249628 204525 17
ridge.mod_best <- glmnet(y = train$Apps, x = X_train, alpha = 0,lambda = 10^seq(2,-2,length=100))
ridge_pred <- predict(ridge.mod_best, s = ridge.mod$lambda.min, newx = X_test)
ridge_mse <- mean((ridge_pred - test$Apps)^2)
ridge_mse
## [1] 29869158
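Both the selected λ (22.6) and especially the one-standard-error λ (100, the upper end of the supplied grid) suggest that the grid 10^seq(2, -2, length = 100) could usefully be extended upward. The more heavily regularized one-standard-error fit can also be evaluated directly from the cv.glmnet object (a sketch using the objects defined above):
ridge_pred_1se <- predict(ridge.mod, newx = X_test, s = "lambda.1se")  # predictions at the 1-SE-rule lambda
mean((ridge_pred_1se - test$Apps)^2)                                   # corresponding test MSE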
(d) Fit a lasso model on the training set, with λ chosen by cross-validation. Report the test error obtained, along with the number of non-zero coefficient estimates.
set.seed(3)
lasso.mod <- cv.glmnet(y = train$Apps, x = X_train, alpha = 1, lambda = 10^seq(2,-2,length=100), standardize = TRUE, nfolds = 5)
print(lasso.mod)
##
## Call: cv.glmnet(x = X_train, y = train$Apps, lambda = 10^seq(2, -2, length = 100), nfolds = 5, alpha = 1, standardize = TRUE)
##
## Measure: Mean-Squared Error
##
## Lambda Index Measure SE Nonzero
## min 9.77 26 1212291 231037 16
## 1se 100.00 1 1325895 292896 5
lasso.mod_best <- glmnet(y = train$Apps, x = X_train, alpha = 1,lambda = 10^seq(2,-2,length=100))
lasso_pred <- predict(lasso.mod_best, s = lasso.mod$lambda.min, newx = X_test)
lasso_mse <- mean((lasso_pred - test$Apps)^2)
lasso_mse
## [1] 29900964
lasso_coef <- predict(lasso.mod_best, type = "coefficients", s = lasso.mod$lambda.min)
round(lasso_coef, 4)
## 19 x 1 sparse Matrix of class "dgCMatrix"
## 1
## (Intercept) -734.9381
## (Intercept) .
## PrivateYes -569.0715
## Accept 1.2808
## Enroll .
## Top10perc 46.0597
## Top25perc -12.6486
## F.Undergrad 0.0179
## P.Undergrad 0.0350
## Outstate -0.0627
## Room.Board 0.1635
## Books -0.0007
## Personal 0.0095
## PhD -8.0844
## Terminal -5.6616
## S.F.Ratio 18.6745
## perc.alumni -7.0681
## Expend 0.1202
## Grad.Rate 9.2769
One of the 17 predictors (Enroll) is shrunk exactly to zero, leaving 16 non-zero coefficient estimates.
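The count can also be obtained programmatically from the sparse coefficient matrix (a short sketch; the first two rows are dropped because they are the duplicated intercept produced by keeping the intercept column in model.matrix):
sum(abs(lasso_coef[-(1:2), ]) > 0)  # number of non-zero predictor coefficients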
(e) Fit a PCR model on the training set, with M chosen by cross-validation. Report the test error obtained, along with the value of M selected by cross-validation.
train_control <- trainControl(method="cv", number = 10)
set.seed(3)
pcr.mod <- train(Apps ~ ., data = train, scale = T, trControl = train_control, method = 'pcr' )
print(pcr.mod)
## Principal Component Analysis
##
## 544 samples
## 17 predictor
##
## No pre-processing
## Resampling: Cross-Validated (10 fold)
## Summary of sample sizes: 491, 490, 490, 490, 489, 489, ...
## Resampling results across tuning parameters:
##
## ncomp RMSE Rsquared MAE
## 1 3611.390 0.0344369 2637.460
## 2 1743.708 0.7735935 1276.479
## 3 1745.749 0.7734178 1271.414
##
## RMSE was used to select the optimal model using the smallest value.
## The final value used for the model was ncomp = 2.
plot(pcr.mod)
pcr.mod$bestTune
## ncomp
## 2 2
pcr_pred <- predict(pcr.mod, test, ncomp = 2)
(pcr_mse <- mean((pcr_pred - test$Apps)^2))
## [1] 6399126
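Note that train() with method = 'pcr' (and likewise 'pls' in part (f)) evaluates only ncomp = 1, 2, 3 under its default tuning grid, so M is chosen here from a very small range. A sketch of how the full range of components could be searched instead, assuming the same seed and train_control (not run for this report):
set.seed(3)
pcr.mod_full <- train(Apps ~ ., data = train, scale = TRUE,
                      trControl = train_control, method = "pcr",
                      tuneLength = 17)  # consider M = 1, ..., 17 components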
(f) Fit a PLS model on the training set, with M chosen by cross-validation. Report the test error obtained, along with the value of M selected by cross-validation.
set.seed(3)
pls.mod <- train(Apps ~ ., data = train, scale = T, trControl = train_control, method = 'pls' )
print(pls.mod)
## Partial Least Squares
##
## 544 samples
## 17 predictor
##
## No pre-processing
## Resampling: Cross-Validated (10 fold)
## Summary of sample sizes: 491, 490, 490, 490, 489, 489, ...
## Resampling results across tuning parameters:
##
## ncomp RMSE Rsquared MAE
## 1 1592.108 0.8101942 1172.3893
## 2 1324.546 0.8663608 831.1558
## 3 1204.053 0.8911273 794.6171
##
## RMSE was used to select the optimal model using the smallest value.
## The final value used for the model was ncomp = 3.
plot(pls.mod)
pls_pred <- predict(pls.mod, test, ncomp = 3)
(pls_mse <- mean((pls_pred - test$Apps)^2))
## [1] 3234791
(g) Comment on the results obtained. How accurately can we predict the number of college applications received? Is there much difference among the test errors resulting from these five approaches?
SS_tot <- sum((test$Apps - mean(test$Apps))^2)
data.frame(method = c("OLS", "Ridge", "Lasso", "PCR", "PLS"),
test_MSE = c(lm.mse, ridge_mse, lasso_mse, pcr_mse, pls_mse),
test_R2 = c(1 - sum((test$Apps - lm.pred)^2) / SS_tot,
1 - sum((test$Apps - ridge_pred)^2) / SS_tot,
1 - sum((test$Apps - lasso_pred)^2) / SS_tot,
1 - sum((test$Apps - pcr_pred)^2) / SS_tot,
1 - sum((test$Apps - pls_pred)^2) / SS_tot)) %>%
arrange(test_MSE)
## method test_MSE test_R2
## 1 OLS 1413287 0.9241061
## 2 PLS 3234791 0.8262909
## 3 PCR 6399126 0.6563652
## 4 Ridge 29869158 -2.7449184
## 5 Lasso 29900964 -2.7489062
According to these results the OLS model has the lowest test MSE, about 1.41 million, which corresponds to a root-MSE of roughly 1,200 applications and a test R2 of about 0.92, so the number of applications can be predicted quite accurately. PLS (3 components) and especially PCR (only 2 components) do noticeably worse here, partly because the default tuning grid considered at most 3 components. The extremely large ridge and lasso test errors in this table, with negative test R2, reflect a prediction matrix built from the training rows rather than the test rows and are therefore not representative; with X_test constructed from the test set, their test errors should be close to the OLS error, consistent with the cross-validated MSEs of roughly 1.2 million at the selected values of λ.
Boston data set.
Boston <- MASS::Boston  # Boston housing data; the response of interest is the per-capita crime rate, crim
str(Boston)
## 'data.frame': 506 obs. of 14 variables:
## $ crim : num 0.00632 0.02731 0.02729 0.03237 0.06905 ...
## $ zn : num 18 0 0 0 0 0 12.5 12.5 12.5 12.5 ...
## $ indus : num 2.31 7.07 7.07 2.18 2.18 2.18 7.87 7.87 7.87 7.87 ...
## $ chas : int 0 0 0 0 0 0 0 0 0 0 ...
## $ nox : num 0.538 0.469 0.469 0.458 0.458 0.458 0.524 0.524 0.524 0.524 ...
## $ rm : num 6.58 6.42 7.18 7 7.15 ...
## $ age : num 65.2 78.9 61.1 45.8 54.2 58.7 66.6 96.1 100 85.9 ...
## $ dis : num 4.09 4.97 4.97 6.06 6.06 ...
## $ rad : int 1 2 2 3 3 3 5 5 5 5 ...
## $ tax : num 296 242 242 222 222 222 311 311 311 311 ...
## $ ptratio: num 15.3 17.8 17.8 18.7 18.7 18.7 15.2 15.2 15.2 15.2 ...
## $ black : num 397 397 393 395 397 ...
## $ lstat : num 4.98 9.14 4.03 2.94 5.33 ...
## $ medv : num 24 21.6 34.7 33.4 36.2 28.7 22.9 27.1 16.5 18.9 ...
control <- trainControl(method = "repeatedcv", number = 10, repeats = 10)
(a) Try out some of the regression methods explored in this chapter, such as best subset selection, the lasso, ridge regression, and PCR. Present and discuss results for the approaches that you consider.
lm_model <- train(crim ~., data = Boston, trControl = control, method = "lm")
print(lm_model)
## Linear Regression
##
## 506 samples
## 13 predictor
##
## No pre-processing
## Resampling: Cross-Validated (10 fold, repeated 10 times)
## Summary of sample sizes: 456, 455, 455, 456, 456, 457, ...
## Resampling results:
##
## RMSE Rsquared MAE
## 5.843998 0.5682393 2.941571
##
## Tuning parameter 'intercept' was held constant at a value of TRUE
set.seed(42)
ridge.fit <- train(crim ~., data = Boston, method = "glmnet", metric = "RMSE", trControl = control, tuneGrid = expand.grid(alpha = 0, lambda = seq(0, 1, length = 100)))
print(ridge.fit)
## glmnet
##
## 506 samples
## 13 predictor
##
## No pre-processing
## Resampling: Cross-Validated (10 fold, repeated 10 times)
## Summary of sample sizes: 455, 455, 456, 455, 456, 456, ...
## Resampling results across tuning parameters:
##
## lambda RMSE Rsquared MAE
## 0.00000000 5.788281 0.5708364 2.784917
## 0.01010101 5.788281 0.5708364 2.784917
## 0.02020202 5.788281 0.5708364 2.784917
## 0.03030303 5.788281 0.5708364 2.784917
## 0.04040404 5.788281 0.5708364 2.784917
## 0.05050505 5.788281 0.5708364 2.784917
## 0.06060606 5.788281 0.5708364 2.784917
## 0.07070707 5.788281 0.5708364 2.784917
## 0.08080808 5.788281 0.5708364 2.784917
## 0.09090909 5.788281 0.5708364 2.784917
## 0.10101010 5.788281 0.5708364 2.784917
## 0.11111111 5.788281 0.5708364 2.784917
## 0.12121212 5.788281 0.5708364 2.784917
## 0.13131313 5.788281 0.5708364 2.784917
## 0.14141414 5.788281 0.5708364 2.784917
## 0.15151515 5.788281 0.5708364 2.784917
## 0.16161616 5.788281 0.5708364 2.784917
## 0.17171717 5.788281 0.5708364 2.784917
## 0.18181818 5.788281 0.5708364 2.784917
## 0.19191919 5.788281 0.5708364 2.784917
## 0.20202020 5.788281 0.5708364 2.784917
## 0.21212121 5.788281 0.5708364 2.784917
## 0.22222222 5.788281 0.5708364 2.784917
## 0.23232323 5.788281 0.5708364 2.784917
## 0.24242424 5.788281 0.5708364 2.784917
## 0.25252525 5.788281 0.5708364 2.784917
## 0.26262626 5.788281 0.5708364 2.784917
## 0.27272727 5.788281 0.5708364 2.784917
## 0.28282828 5.788281 0.5708364 2.784917
## 0.29292929 5.788281 0.5708364 2.784917
## 0.30303030 5.788281 0.5708364 2.784917
## 0.31313131 5.788281 0.5708364 2.784917
## 0.32323232 5.788281 0.5708364 2.784917
## 0.33333333 5.788281 0.5708364 2.784917
## 0.34343434 5.788281 0.5708364 2.784917
## 0.35353535 5.788281 0.5708364 2.784917
## 0.36363636 5.788281 0.5708364 2.784917
## 0.37373737 5.788281 0.5708364 2.784917
## 0.38383838 5.788281 0.5708364 2.784917
## 0.39393939 5.788281 0.5708364 2.784917
## 0.40404040 5.788281 0.5708364 2.784917
## 0.41414141 5.788281 0.5708364 2.784917
## 0.42424242 5.788281 0.5708364 2.784917
## 0.43434343 5.788281 0.5708364 2.784917
## 0.44444444 5.788281 0.5708364 2.784917
## 0.45454545 5.788281 0.5708364 2.784917
## 0.46464646 5.788281 0.5708364 2.784917
## 0.47474747 5.788281 0.5708364 2.784917
## 0.48484848 5.788281 0.5708364 2.784917
## 0.49494949 5.788313 0.5708357 2.784900
## 0.50505051 5.788382 0.5708331 2.784856
## 0.51515152 5.788495 0.5708260 2.784793
## 0.52525253 5.788683 0.5708164 2.784675
## 0.53535354 5.789008 0.5707944 2.784440
## 0.54545455 5.789457 0.5707545 2.783946
## 0.55555556 5.789722 0.5707090 2.783157
## 0.56565657 5.789635 0.5706665 2.782046
## 0.57575758 5.789479 0.5706232 2.780886
## 0.58585859 5.789327 0.5705793 2.779734
## 0.59595960 5.789175 0.5705356 2.778593
## 0.60606061 5.789031 0.5704914 2.777462
## 0.61616162 5.788906 0.5704460 2.776356
## 0.62626263 5.788793 0.5703995 2.775262
## 0.63636364 5.788687 0.5703519 2.774176
## 0.64646465 5.788582 0.5703042 2.773106
## 0.65656566 5.788476 0.5702568 2.772051
## 0.66666667 5.788380 0.5702089 2.771008
## 0.67676768 5.788300 0.5701600 2.769993
## 0.68686869 5.788230 0.5701102 2.768989
## 0.69696970 5.788167 0.5700592 2.767996
## 0.70707071 5.788104 0.5700082 2.767017
## 0.71717172 5.788040 0.5699575 2.766051
## 0.72727273 5.787982 0.5699066 2.765096
## 0.73737374 5.787936 0.5698550 2.764160
## 0.74747475 5.787902 0.5698026 2.763235
## 0.75757576 5.787875 0.5697494 2.762318
## 0.76767677 5.787853 0.5696953 2.761416
## 0.77777778 5.787831 0.5696414 2.760532
## 0.78787879 5.787806 0.5695879 2.759659
## 0.79797980 5.787787 0.5695342 2.758796
## 0.80808081 5.787778 0.5694800 2.757944
## 0.81818182 5.787780 0.5694252 2.757108
## 0.82828283 5.787788 0.5693697 2.756285
## 0.83838384 5.787802 0.5693132 2.755470
## 0.84848485 5.787816 0.5692568 2.754672
## 0.85858586 5.787829 0.5692007 2.753889
## 0.86868687 5.787841 0.5691448 2.753124
## 0.87878788 5.787859 0.5690888 2.752368
## 0.88888889 5.787888 0.5690322 2.751625
## 0.89898990 5.787925 0.5689753 2.750898
## 0.90909091 5.787967 0.5689176 2.750186
## 0.91919192 5.788014 0.5688592 2.749482
## 0.92929293 5.788062 0.5688008 2.748792
## 0.93939394 5.788107 0.5687425 2.748113
## 0.94949495 5.788152 0.5686847 2.747441
## 0.95959596 5.788200 0.5686268 2.746773
## 0.96969697 5.788255 0.5685687 2.746126
## 0.97979798 5.788319 0.5685103 2.745497
## 0.98989899 5.788388 0.5684515 2.744890
## 1.00000000 5.788462 0.5683920 2.744301
##
## Tuning parameter 'alpha' was held constant at a value of 0
## RMSE was used to select the optimal model using the smallest value.
## The final values used for the model were alpha = 0 and lambda = 0.8080808.
ridge.fit$bestTune
## alpha lambda
## 81 0 0.8080808
min(ridge.fit$results$RMSE)
## [1] 5.787778
lasso.fit <- train(crim ~., data = Boston, method = "glmnet", metric = "RMSE", trControl = control, tuneGrid = expand.grid(alpha = 1, lambda = seq(0, 1, length = 100)))
print(lasso.fit)
## glmnet
##
## 506 samples
## 13 predictor
##
## No pre-processing
## Resampling: Cross-Validated (10 fold, repeated 10 times)
## Summary of sample sizes: 457, 455, 455, 456, 454, 456, ...
## Resampling results across tuning parameters:
##
## lambda RMSE Rsquared MAE
## 0.00000000 5.798650 0.5804312 2.910859
## 0.01010101 5.793855 0.5809716 2.897997
## 0.02020202 5.785495 0.5818662 2.875008
## 0.03030303 5.778485 0.5825679 2.853748
## 0.04040404 5.772089 0.5831949 2.833090
## 0.05050505 5.766120 0.5838055 2.813385
## 0.06060606 5.761571 0.5841904 2.795629
## 0.07070707 5.758395 0.5843534 2.780064
## 0.08080808 5.756105 0.5843842 2.766013
## 0.09090909 5.754633 0.5842793 2.753891
## 0.10101010 5.753917 0.5840540 2.743583
## 0.11111111 5.754016 0.5837168 2.734308
## 0.12121212 5.754802 0.5832790 2.725782
## 0.13131313 5.756112 0.5827925 2.717833
## 0.14141414 5.757865 0.5822581 2.710578
## 0.15151515 5.759640 0.5817416 2.703985
## 0.16161616 5.761277 0.5812454 2.698419
## 0.17171717 5.762405 0.5807992 2.693417
## 0.18181818 5.763348 0.5803885 2.689023
## 0.19191919 5.764228 0.5799840 2.685259
## 0.20202020 5.764637 0.5796546 2.681701
## 0.21212121 5.765234 0.5792922 2.678940
## 0.22222222 5.765988 0.5789262 2.676837
## 0.23232323 5.766994 0.5785253 2.675224
## 0.24242424 5.768188 0.5780963 2.674169
## 0.25252525 5.769567 0.5776397 2.673640
## 0.26262626 5.770996 0.5771761 2.673299
## 0.27272727 5.772501 0.5766983 2.673239
## 0.28282828 5.774090 0.5762099 2.673268
## 0.29292929 5.775725 0.5757210 2.673469
## 0.30303030 5.777269 0.5752538 2.673482
## 0.31313131 5.778729 0.5748082 2.673089
## 0.32323232 5.780049 0.5743886 2.672440
## 0.33333333 5.781075 0.5740285 2.671342
## 0.34343434 5.781310 0.5738154 2.669126
## 0.35353535 5.780930 0.5737138 2.666102
## 0.36363636 5.780515 0.5736192 2.663119
## 0.37373737 5.779910 0.5735610 2.659863
## 0.38383838 5.779212 0.5735305 2.656452
## 0.39393939 5.778531 0.5735033 2.653062
## 0.40404040 5.777883 0.5734750 2.649721
## 0.41414141 5.777268 0.5734451 2.646444
## 0.42424242 5.776682 0.5734150 2.643243
## 0.43434343 5.776116 0.5733861 2.640098
## 0.44444444 5.775569 0.5733586 2.637004
## 0.45454545 5.775049 0.5733309 2.633976
## 0.46464646 5.774567 0.5733014 2.631006
## 0.47474747 5.774122 0.5732700 2.628088
## 0.48484848 5.773699 0.5732391 2.625191
## 0.49494949 5.773303 0.5732083 2.622334
## 0.50505051 5.772935 0.5731771 2.619515
## 0.51515152 5.772604 0.5731440 2.616762
## 0.52525253 5.772287 0.5731134 2.614029
## 0.53535354 5.771986 0.5730848 2.611327
## 0.54545455 5.771713 0.5730561 2.608646
## 0.55555556 5.771475 0.5730264 2.606019
## 0.56565657 5.771265 0.5729966 2.603418
## 0.57575758 5.771069 0.5729690 2.600852
## 0.58585859 5.770891 0.5729440 2.598360
## 0.59595960 5.770735 0.5729195 2.595905
## 0.60606061 5.770581 0.5728987 2.593491
## 0.61616162 5.770461 0.5728764 2.591122
## 0.62626263 5.770371 0.5728539 2.588813
## 0.63636364 5.770263 0.5728408 2.586559
## 0.64646465 5.770177 0.5728287 2.584353
## 0.65656566 5.770095 0.5728208 2.582197
## 0.66666667 5.770035 0.5728116 2.580114
## 0.67676768 5.770006 0.5728014 2.578070
## 0.68686869 5.770011 0.5727900 2.576102
## 0.69696970 5.770036 0.5727800 2.574170
## 0.70707071 5.770087 0.5727697 2.572326
## 0.71717172 5.770157 0.5727612 2.570535
## 0.72727273 5.770269 0.5727507 2.568799
## 0.73737374 5.770415 0.5727388 2.567109
## 0.74747475 5.770597 0.5727254 2.565472
## 0.75757576 5.770816 0.5727094 2.563908
## 0.76767677 5.771078 0.5726902 2.562413
## 0.77777778 5.771385 0.5726675 2.560933
## 0.78787879 5.771732 0.5726423 2.559489
## 0.79797980 5.772119 0.5726148 2.558086
## 0.80808081 5.772547 0.5725849 2.556742
## 0.81818182 5.773014 0.5725523 2.555437
## 0.82828283 5.773520 0.5725176 2.554173
## 0.83838384 5.774067 0.5724800 2.552955
## 0.84848485 5.774611 0.5724489 2.551783
## 0.85858586 5.775187 0.5724172 2.550670
## 0.86868687 5.775801 0.5723832 2.549618
## 0.87878788 5.776452 0.5723471 2.548617
## 0.88888889 5.777143 0.5723085 2.547661
## 0.89898990 5.777861 0.5722693 2.546747
## 0.90909091 5.778614 0.5722282 2.545895
## 0.91919192 5.779397 0.5721866 2.545085
## 0.92929293 5.780209 0.5721446 2.544298
## 0.93939394 5.781044 0.5721027 2.543548
## 0.94949495 5.781905 0.5720602 2.542824
## 0.95959596 5.782801 0.5720159 2.542130
## 0.96969697 5.783734 0.5719694 2.541490
## 0.97979798 5.784682 0.5719255 2.540873
## 0.98989899 5.785658 0.5718815 2.540287
## 1.00000000 5.786667 0.5718363 2.539735
##
## Tuning parameter 'alpha' was held constant at a value of 1
## RMSE was used to select the optimal model using the smallest value.
## The final values used for the model were alpha = 1 and lambda = 0.1010101.
min(lasso.fit$results$RMSE)
## [1] 5.753917
pcr.fit <- train(crim ~ ., data = Boston, method = "pcr", preProcess = c("center", "scale"), metric = "RMSE", maximize = F, trControl = control, tuneGrid = expand.grid(ncomp = 1:(ncol(Boston)-1)))
print(pcr.fit)
## Principal Component Analysis
##
## 506 samples
## 13 predictor
##
## Pre-processing: centered (13), scaled (13)
## Resampling: Cross-Validated (10 fold, repeated 10 times)
## Summary of sample sizes: 456, 455, 455, 456, 455, 455, ...
## Resampling results across tuning parameters:
##
## ncomp RMSE Rsquared MAE
## 1 6.562125 0.4028868 3.611301
## 2 6.558251 0.4037856 3.648972
## 3 6.105260 0.5084291 2.925960
## 4 6.086937 0.5122725 2.897432
## 5 6.096105 0.5107041 2.906228
## 6 6.113882 0.5087445 2.966602
## 7 6.115560 0.5089303 3.034096
## 8 5.968378 0.5398229 2.919653
## 9 5.977378 0.5387653 2.908556
## 10 5.967625 0.5416283 2.867324
## 11 5.971455 0.5411952 2.865369
## 12 5.950563 0.5441186 2.987375
## 13 5.882760 0.5575203 2.916545
##
## RMSE was used to select the optimal model using the smallest value.
## The final value used for the model was ncomp = 13.
min(pcr.fit$results$RMSE)
## [1] 5.88276
plot(pcr.fit)
pls.fit <- train(crim ~ ., data = Boston, method = "pls", preProcess = c("center", "scale"), metric = "RMSE", maximize = F, trControl = control, tuneGrid = expand.grid(ncomp = 1:(ncol(Boston)-1)))
plot(pls.fit)
min(pls.fit$results$RMSE)
## [1] 5.8453
(b) Propose a model (or set of models) that seem to perform well on this data set, and justify your answer. Make sure that you are evaluating model performance using validation set error, cross-validation, or some other reasonable alternative, as opposed to using training error.
data.frame(method = c("OLS", "Ridge", "Lasso", "PCR", "PLS"),
CV_RMSE = round(c(lm_model$results$RMSE, min(ridge.fit$results$RMSE), min(lasso.fit$results$RMSE), min(pcr.fit$results$RMSE), min(pls.fit$results$RMSE)),4)) %>% arrange(CV_RMSE)
## method CV_RMSE
## 1 Lasso 5.7539
## 2 Ridge 5.7878
## 3 OLS 5.8440
## 4 PLS 5.8453
## 5 PCR 5.8828
According to these results, the lasso is the best-performing model, with the lowest cross-validated RMSE of 5.7539.
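Since each call to train() above draws its own folds, an arguably fairer comparison would refit all five models after the same set.seed() call and then pool their resampling results, for example with caret's resamples() (a sketch using the fitted objects above):
model_list <- list(OLS = lm_model, Ridge = ridge.fit, Lasso = lasso.fit,
                   PCR = pcr.fit, PLS = pls.fit)
summary(resamples(model_list))  # RMSE / Rsquared / MAE across the 100 resamples for each model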
(c) Does your chosen model involve all of the features in the data set? Why or why not?
lasso_best <- train(crim ~., data = Boston, method = "glmnet", metric = "RMSE", trControl = control, tuneGrid = expand.grid(alpha = 1, lambda = seq(0, 1, length = 100)))
print(lasso_best)
## glmnet
##
## 506 samples
## 13 predictor
##
## No pre-processing
## Resampling: Cross-Validated (10 fold, repeated 10 times)
## Summary of sample sizes: 455, 455, 456, 455, 455, 456, ...
## Resampling results across tuning parameters:
##
## lambda RMSE Rsquared MAE
## 0.00000000 5.933629 0.5566482 2.907770
## 0.01010101 5.929668 0.5570211 2.895452
## 0.02020202 5.923113 0.5575818 2.873112
## 0.03030303 5.917840 0.5579689 2.851965
## 0.04040404 5.913296 0.5582703 2.831880
## 0.05050505 5.909119 0.5585626 2.812843
## 0.06060606 5.906241 0.5586583 2.796078
## 0.07070707 5.904442 0.5585881 2.781000
## 0.08080808 5.903404 0.5584066 2.767545
## 0.09090909 5.903199 0.5581016 2.755649
## 0.10101010 5.903863 0.5576621 2.745460
## 0.11111111 5.905234 0.5571289 2.736538
## 0.12121212 5.907238 0.5565025 2.728349
## 0.13131313 5.909670 0.5558159 2.720601
## 0.14141414 5.912029 0.5551527 2.713312
## 0.15151515 5.914301 0.5545129 2.706726
## 0.16161616 5.916456 0.5538892 2.701238
## 0.17171717 5.918213 0.5533047 2.696487
## 0.18181818 5.919838 0.5527421 2.692188
## 0.19191919 5.921296 0.5521933 2.688309
## 0.20202020 5.922148 0.5517678 2.684734
## 0.21212121 5.923031 0.5513426 2.681838
## 0.22222222 5.924050 0.5509260 2.679516
## 0.23232323 5.925370 0.5504708 2.677714
## 0.24242424 5.926875 0.5499889 2.676284
## 0.25252525 5.928568 0.5494793 2.675486
## 0.26262626 5.930407 0.5489475 2.675022
## 0.27272727 5.932394 0.5483994 2.674934
## 0.28282828 5.934507 0.5478352 2.675088
## 0.29292929 5.936657 0.5472672 2.675343
## 0.30303030 5.938609 0.5467373 2.675355
## 0.31313131 5.940452 0.5462347 2.675094
## 0.32323232 5.942082 0.5457842 2.674333
## 0.33333333 5.943445 0.5453891 2.673218
## 0.34343434 5.944099 0.5451289 2.671225
## 0.35353535 5.944171 0.5449879 2.668287
## 0.36363636 5.944174 0.5448648 2.665171
## 0.37373737 5.944063 0.5447681 2.661873
## 0.38383838 5.943874 0.5446965 2.658381
## 0.39393939 5.943715 0.5446247 2.654923
## 0.40404040 5.943594 0.5445505 2.651523
## 0.41414141 5.943507 0.5444745 2.648194
## 0.42424242 5.943453 0.5443970 2.644953
## 0.43434343 5.943428 0.5443186 2.641764
## 0.44444444 5.943436 0.5442388 2.638611
## 0.45454545 5.943471 0.5441587 2.635563
## 0.46464646 5.943519 0.5440815 2.632573
## 0.47474747 5.943596 0.5440037 2.629650
## 0.48484848 5.943697 0.5439258 2.626794
## 0.49494949 5.943829 0.5438469 2.623985
## 0.50505051 5.943978 0.5437684 2.621235
## 0.51515152 5.944159 0.5436880 2.618516
## 0.52525253 5.944367 0.5436071 2.615834
## 0.53535354 5.944598 0.5435265 2.613203
## 0.54545455 5.944865 0.5434440 2.610615
## 0.55555556 5.945158 0.5433607 2.608075
## 0.56565657 5.945480 0.5432763 2.605584
## 0.57575758 5.945816 0.5431944 2.603140
## 0.58585859 5.946156 0.5431164 2.600766
## 0.59595960 5.946521 0.5430385 2.598404
## 0.60606061 5.946909 0.5429608 2.596083
## 0.61616162 5.947320 0.5428840 2.593811
## 0.62626263 5.947761 0.5428061 2.591620
## 0.63636364 5.948221 0.5427297 2.589469
## 0.64646465 5.948707 0.5426527 2.587386
## 0.65656566 5.949202 0.5425774 2.585363
## 0.66666667 5.949703 0.5425062 2.583381
## 0.67676768 5.950228 0.5424354 2.581440
## 0.68686869 5.950779 0.5423643 2.579579
## 0.69696970 5.951351 0.5422941 2.577765
## 0.70707071 5.951950 0.5422234 2.576004
## 0.71717172 5.952578 0.5421512 2.574314
## 0.72727273 5.953243 0.5420772 2.572666
## 0.73737374 5.953939 0.5420020 2.571082
## 0.74747475 5.954648 0.5419295 2.569541
## 0.75757576 5.955375 0.5418582 2.568022
## 0.76767677 5.956116 0.5417885 2.566539
## 0.77777778 5.956873 0.5417206 2.565105
## 0.78787879 5.957661 0.5416509 2.563724
## 0.79797980 5.958492 0.5415778 2.562395
## 0.80808081 5.959361 0.5415023 2.561108
## 0.81818182 5.960266 0.5414246 2.559855
## 0.82828283 5.961180 0.5413489 2.558656
## 0.83838384 5.962123 0.5412721 2.557497
## 0.84848485 5.963095 0.5411943 2.556389
## 0.85858586 5.964098 0.5411153 2.555318
## 0.86868687 5.965137 0.5410340 2.554285
## 0.87878788 5.966212 0.5409502 2.553296
## 0.88888889 5.967313 0.5408672 2.552338
## 0.89898990 5.968435 0.5407850 2.551416
## 0.90909091 5.969574 0.5407036 2.550529
## 0.91919192 5.970740 0.5406213 2.549677
## 0.92929293 5.971928 0.5405396 2.548863
## 0.93939394 5.973125 0.5404610 2.548082
## 0.94949495 5.974350 0.5403815 2.547329
## 0.95959596 5.975605 0.5403004 2.546618
## 0.96969697 5.976875 0.5402204 2.545955
## 0.97979798 5.978160 0.5401431 2.545323
## 0.98989899 5.979457 0.5400685 2.544712
## 1.00000000 5.980771 0.5399952 2.544147
##
## Tuning parameter 'alpha' was held constant at a value of 1
## RMSE was used to select the optimal model using the smallest value.
## The final values used for the model were alpha = 1 and lambda = 0.09090909.
lasso_best_tuned <- train(medv ~., data = Boston, method = "lasso", metric = "RMSE", trControl = control, tuneGrid = expand.grid(fraction=c(1,0.1,0.01,0.001)))
lasso_best_tuned$results$RMSE
## [1] 9.121896 8.974774 7.636392 4.792789
predict(lasso_best_tuned$finalModel, type = "coef", mode = "fraction", s = as.numeric(lasso_best_tuned$bestTune))
## $s
## [1] 1
##
## $fraction
## 0
## 1
##
## $mode
## [1] "fraction"
##
## $coefficients
## crim zn indus chas nox
## -1.080114e-01 4.642046e-02 2.055863e-02 2.686734e+00 -1.776661e+01
## rm age dis rad tax
## 3.809865e+00 6.922246e-04 -1.475567e+00 3.060495e-01 -1.233459e-02
## ptratio black lstat
## -9.527472e-01 9.311683e-03 -5.247584e-01
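Note that the coefficient listing above comes from a lasso fit with medv as the response rather than crim, so it does not directly answer the question. To see which predictors the chosen crim lasso actually retains, its coefficients can be inspected at the selected λ (a sketch using the lasso_best object fitted above):
coef(lasso_best$finalModel, s = lasso_best$bestTune$lambda)  # "." marks coefficients shrunk exactly to zero
Because the ℓ1 penalty can set coefficients exactly to zero at the selected λ (about 0.09 here), the chosen model does not need to involve all 13 features: any predictor whose coefficient is shrunk to zero is excluded, while the remaining predictors are kept with shrunken coefficients.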