Exercise 2

For parts (a) through (c), indicate which of i. through iv. is correct. Justify your answer.

(A) The lasso, relative to least squares, is: iii. Less flexible and hence will give improved prediction accuracy when its increase in bias is less than its decrease in variance. The lasso's penalty constrains the coefficient estimates (shrinking some exactly to zero), so it is less flexible than least squares; the constraint introduces bias but reduces variance, and prediction accuracy improves when the variance reduction outweighs the added bias.

(B) Ridge regression, relative to least squares, is: iii. Less flexible and hence will give improved prediction accuracy when its increase in bias is less than its decrease in variance. The same reasoning applies as for the lasso: the ridge penalty shrinks the coefficients toward zero, trading a small increase in bias for a larger decrease in variance when λ is well chosen.

(C) Non-linear methods, relative to least squares, are: ii. More flexible and hence will give improved prediction accuracy when their increase in variance is less than their decrease in bias. Here the tradeoff runs the other way: extra flexibility lowers bias but raises variance, so accuracy improves only when the bias reduction dominates.
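
All three answers follow from the bias-variance decomposition of the expected test error at a point $x_0$:

$$
E\left[\left(y_0 - \hat{f}(x_0)\right)^2\right] = \mathrm{Var}\left(\hat{f}(x_0)\right) + \left[\mathrm{Bias}\left(\hat{f}(x_0)\right)\right]^2 + \mathrm{Var}(\varepsilon)
$$

A method improves on least squares exactly when its net effect on the first two terms is a decrease; the irreducible error $\mathrm{Var}(\varepsilon)$ is unaffected by the choice of method.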

Exercise 9

In this exercise, we will predict the number of applications received using the other variables in the College data set.

(A) Split the data set into a training set and a test set.

data("College")
names(College)
##  [1] "Private"     "Apps"        "Accept"      "Enroll"      "Top10perc"  
##  [6] "Top25perc"   "F.Undergrad" "P.Undergrad" "Outstate"    "Room.Board" 
## [11] "Books"       "Personal"    "PhD"         "Terminal"    "S.F.Ratio"  
## [16] "perc.alumni" "Expend"      "Grad.Rate"
set.seed(1)
College.split = initial_split(College, strata = "Apps", prop = .7)
College.train = training(College.split)
College.test = testing(College.split)
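
A quick sanity check on the stratified split (College has 777 rows, so prop = .7 yields 541 training and 236 test observations; the 541 matches the X dimension reported by pcr() in part (e)):

nrow(College.train)   # 541
nrow(College.test)    # 236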

(B) Fit a linear model using least squares on the training set, and report the test error obtained.

lm.fit = lm(Apps ~ ., data = College.train)
summary(lm.fit)
## 
## Call:
## lm(formula = Apps ~ ., data = College.train)
## 
## Residuals:
##     Min      1Q  Median      3Q     Max 
## -5107.8  -391.0    -7.3   290.7  7594.2 
## 
## Coefficients:
##               Estimate Std. Error t value Pr(>|t|)    
## (Intercept) -629.04441  506.66716  -1.242  0.21497    
## PrivateYes  -528.76499  161.85774  -3.267  0.00116 ** 
## Accept         1.67554    0.04556  36.779  < 2e-16 ***
## Enroll        -0.98055    0.21469  -4.567 6.17e-06 ***
## Top10perc     38.91487    6.55417   5.937 5.28e-09 ***
## Top25perc     -7.27523    5.16743  -1.408  0.15975    
## F.Undergrad    0.01406    0.03889   0.362  0.71783    
## P.Undergrad    0.08264    0.03669   2.252  0.02472 *  
## Outstate      -0.09102    0.02239  -4.065 5.53e-05 ***
## Room.Board     0.12088    0.05781   2.091  0.03701 *  
## Books          0.04141    0.31067   0.133  0.89400    
## Personal       0.01102    0.07274   0.151  0.87970    
## PhD           -6.36443    5.73029  -1.111  0.26722    
## Terminal      -5.11899    6.17210  -0.829  0.40727    
## S.F.Ratio     26.61536   16.14142   1.649  0.09977 .  
## perc.alumni    3.13772    4.89273   0.641  0.52161    
## Expend         0.09128    0.01416   6.447 2.59e-10 ***
## Grad.Rate      7.77373    3.59667   2.161  0.03112 *  
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 1039 on 523 degrees of freedom
## Multiple R-squared:  0.9359, Adjusted R-squared:  0.9338 
## F-statistic: 449.2 on 17 and 523 DF,  p-value: < 2.2e-16
lm.pred = predict(lm.fit, College.test)
(lm.info = postResample(lm.pred, College.test$Apps))
##         RMSE     Rsquared          MAE 
## 1083.7809074    0.9026437  631.4713036

The test root mean squared error (RMSE) obtained is 1083.78.
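
postResample() reports RMSE, R-squared, and MAE; the same test RMSE can be computed by hand as the square root of the mean squared prediction error (a quick sketch using the objects above):

# test RMSE computed directly; matches postResample()'s value of about 1083.78
sqrt(mean((lm.pred - College.test$Apps)^2))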

(C) Fit a ridge regression model on the training set, with λ chosen by cross-validation. Report the test error obtained.

# model.matrix() keeps an intercept column; glmnet fits its own intercept,
# so the redundant column shows up as a second, zeroed "(Intercept)" row in
# the coefficient printouts below
train = model.matrix(Apps ~ ., data = College.train)
test = model.matrix(Apps ~ ., data = College.test)
grid <- 10 ^ seq(4, -2, length = 100)
ridge.fit <- glmnet(train, College.train$Apps, alpha = 0, lambda = grid, thresh = 1e-12)
ridge.cv <- cv.glmnet(train, College.train$Apps, alpha = 0, lambda = grid, thresh = 1e-12)
(ridge.bestlam <- ridge.cv$lambda.min)
## [1] 0.01

The best λ is 0.01, the smallest value in the grid, so cross-validation is effectively selecting an unpenalized fit; the ridge solution at this λ is very close to least squares.

ridge.pred = predict(ridge.fit, s = ridge.bestlam, newx = test)
(ridge.info = postResample(ridge.pred, College.test$Apps))
##         RMSE     Rsquared          MAE 
## 1083.7730184    0.9026454  631.4656057

The test RMSE for the ridge regression model is 1083.77, essentially identical to the least-squares result, as expected given the tiny λ.
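
Since the chosen λ sits on the lower boundary of the grid, it is worth verifying that a wider grid would not select a different value; a sketch (the extended grid bounds here are illustrative):

# extend the grid below 0.01 and re-run cross-validation
grid2 <- 10 ^ seq(4, -4, length = 100)
ridge.cv2 <- cv.glmnet(train, College.train$Apps, alpha = 0, lambda = grid2, thresh = 1e-12)
ridge.cv2$lambda.min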

(D) Fit a lasso model on the training set, with λ chosen by cross-validation. Report the test error obtained, along with the number of non-zero coefficient estimates.

set.seed(2)
lasso.fit <- glmnet(train, College.train$Apps, alpha = 1, lambda = grid, thresh = 1e-12)
lasso.cv <- cv.glmnet(train, College.train$Apps, alpha = 1, lambda = grid, thresh = 1e-12)
(lasso.bestlam <- lasso.cv$lambda.min)
## [1] 0.01
lasso.pred = predict(lasso.fit, s = lasso.bestlam, newx = test)
(lasso.info = postResample(lasso.pred, College.test$Apps))
##         RMSE     Rsquared          MAE 
## 1083.7730184    0.9026454  631.4656057

The test RMSE for the lasso is 1083.77, again essentially the least-squares result, since the selected λ is once more the smallest value in the grid.

predict(lasso.fit, s=lasso.bestlam, type="coefficients")
## 19 x 1 sparse Matrix of class "dgCMatrix"
##                        s1
## (Intercept) -629.12897655
## (Intercept)    .         
## PrivateYes  -528.72947567
## Accept         1.67546947
## Enroll        -0.98002139
## Top10perc     38.90769848
## Top25perc     -7.26977780
## F.Undergrad    0.01398635
## P.Undergrad    0.08263787
## Outstate      -0.09101140
## Room.Board     0.12086539
## Books          0.04139372
## Personal       0.01099352
## PhD           -6.36221484
## Terminal      -5.11870024
## S.F.Ratio     26.61043507
## perc.alumni    3.13343444
## Expend         0.09127878
## Grad.Rate      7.77280764
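
At λ = 0.01 the lasso applies almost no shrinkage: all 17 predictors keep non-zero estimates (the lone `.` above belongs to the duplicated intercept column noted earlier). A quick programmatic count, as a sketch using the fitted object above:

# count non-zero predictor coefficients, skipping the two intercept rows
lasso.coef <- predict(lasso.fit, s = lasso.bestlam, type = "coefficients")
sum(lasso.coef[-(1:2), 1] != 0)   # 17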

(E) Fit a PCR model on the training set, with M chosen by cross-validation. Report the test error obtained, along with the value of M selected by cross-validation.

set.seed(3)
pcr.fit = pcr(Apps ~ ., data=College.train, scale=TRUE, validation="CV")
summary(pcr.fit)
## Data:    X dimension: 541 17 
##  Y dimension: 541 1
## Fit method: svdpc
## Number of components considered: 17
## 
## VALIDATION: RMSEP
## Cross-validated using 10 random segments.
##        (Intercept)  1 comps  2 comps  3 comps  4 comps  5 comps  6 comps
## CV            4041     3962     2213     2216     1755     1741     1734
## adjCV         4041     3964     2210     2216     1734     1734     1730
##        7 comps  8 comps  9 comps  10 comps  11 comps  12 comps  13 comps
## CV        1705     1703     1637      1640      1649      1649      1656
## adjCV     1698     1700     1633      1635      1644      1644      1651
##        14 comps  15 comps  16 comps  17 comps
## CV         1660      1602      1241      1198
## adjCV      1655      1567      1229      1187
## 
## TRAINING: % variance explained
##       1 comps  2 comps  3 comps  4 comps  5 comps  6 comps  7 comps  8 comps
## X       30.70    56.67    63.58    69.60     75.1    80.06    83.76    87.24
## Apps     4.56    70.96    71.47    82.69     82.7    82.78    83.55    83.63
##       9 comps  10 comps  11 comps  12 comps  13 comps  14 comps  15 comps
## X       90.40     92.77     94.92     96.70     97.90     98.72     99.33
## Apps    84.85     84.98     85.02     85.08     85.08     85.16     91.34
##       16 comps  17 comps
## X        99.84    100.00
## Apps     93.26     93.59
validationplot(pcr.fit, val.type="MSEP")
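
The component count minimizing the cross-validated error can also be read off programmatically with RMSEP() from the pls package (a sketch; the first entry of the CV curve is the intercept-only model, hence the subtraction):

cv.rmsep <- RMSEP(pcr.fit)$val["adjCV", 1, ]
which.min(cv.rmsep) - 1   # 17 components minimize adjCV here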

set.seed(3)
pcr.pred = predict(pcr.fit, College.test, ncomp = 8)
(pcr.info = postResample(pcr.pred, College.test$Apps))
##         RMSE     Rsquared          MAE 
## 1211.2579996    0.8796107  820.2794103

With PCR, the test RMSE is 1211.26 using M = 8 components. Note that the cross-validated RMSEP is minimized at M = 17, where PCR is equivalent to the full least-squares fit; M = 8 keeps a genuine dimension reduction at the cost of a somewhat higher test error.

(F) Fit a PLS model on the training set, with M chosen by cross-validation. Report the test error obtained, along with the value of M selected by cross-validation.

set.seed(4)
pls.fit <- plsr(Apps ~ ., data=College.train, scale=TRUE, validation="CV")
summary(pls.fit)
## Data:    X dimension: 541 17 
##  Y dimension: 541 1
## Fit method: kernelpls
## Number of components considered: 17
## 
## VALIDATION: RMSEP
## Cross-validated using 10 random segments.
##        (Intercept)  1 comps  2 comps  3 comps  4 comps  5 comps  6 comps
## CV            4041     2015     1682     1579     1498     1301     1214
## adjCV         4041     2010     1678     1572     1477     1280     1202
##        7 comps  8 comps  9 comps  10 comps  11 comps  12 comps  13 comps
## CV        1186     1182     1176      1173      1171      1166      1165
## adjCV     1176     1172     1167      1164      1162      1157      1156
##        14 comps  15 comps  16 comps  17 comps
## CV         1165      1166      1166      1166
## adjCV      1157      1157      1157      1157
## 
## TRAINING: % variance explained
##       1 comps  2 comps  3 comps  4 comps  5 comps  6 comps  7 comps  8 comps
## X       26.19    39.93    62.35    65.02    68.04    73.36    76.45    79.44
## Apps    76.62    84.55    87.05    90.85    93.02    93.39    93.45    93.49
##       9 comps  10 comps  11 comps  12 comps  13 comps  14 comps  15 comps
## X       81.96     84.40     86.78     90.56     92.46     94.80     96.78
## Apps    93.54     93.57     93.58     93.58     93.59     93.59     93.59
##       16 comps  17 comps
## X        98.65    100.00
## Apps     93.59     93.59
validationplot(pls.fit, val.type = "MSEP")

pls.pred = predict(pls.fit, College.test, ncomp = 2)
(pls.info = postResample(pls.pred, College.test$Apps))
##         RMSE     Rsquared          MAE 
## 1284.1314290    0.8647247  763.4251588

The test RMSE is 1284.13 with M = 2, although the cross-validated RMSEP continues to fall until roughly M = 12, so a larger M would likely perform closer to the least-squares fit.

(G) Comment on the results obtained. How accurately can we predict the number of college applications received? Is there much difference among the test errors resulting from these five approaches?

as_tibble(rbind(lm.info, ridge.info, lasso.info, pcr.info, pls.info)) %>%
    mutate(model = c('Linear', 'Ridge', 'Lasso', 'PCR', 'PLS')) %>%
    select(model, RMSE, Rsquared)
## # A tibble: 5 × 3
##   model   RMSE Rsquared
##   <chr>  <dbl>    <dbl>
## 1 Linear 1084.    0.903
## 2 Ridge  1084.    0.903
## 3 Lasso  1084.    0.903
## 4 PCR    1211.    0.880
## 5 PLS    1284.    0.865

The linear, ridge, and lasso models all have reasonably large R^2 values that are nearly identical, so the differences among them are minuscule; this is expected, since the cross-validated λ values were essentially zero. The PCR and PLS models are less desirable, with noticeably lower R^2 values, though the two are close enough to each other that there may not be much difference between them. Overall, with a test R^2 around 0.90, the number of applications can be predicted with reasonably high accuracy.
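
To translate the RMSE into a sense of predictive accuracy, one can compare it with the spread of the response in the test set; a quick sketch using objects defined above:

# test RMSE as a fraction of the response's spread in the test set;
# with a test R^2 near 0.90 this ratio comes out to roughly sqrt(1 - 0.90)
lm.info["RMSE"] / sd(College.test$Apps)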

Exercise 11

We will now try to predict per capita crime rate in the Boston data set.

(A) Try out some of the regression methods explored in this chapter, such as best subset selection, the lasso, ridge regression, and PCR. Present and discuss results for the approaches that you consider.

data("Boston")

set.seed(1)
Boston.split = initial_split(Boston, strata = "crim", prop = .8)
Boston.train = training(Boston.split)
Boston.test = testing(Boston.split)

Linear Regression:

lm.fit2 = lm(crim ~ ., data = Boston.train)

crim.pred = predict(lm.fit2, Boston.test)
(lm.info2 = postResample(crim.pred, Boston.test$crim))
##      RMSE  Rsquared       MAE 
## 6.7551201 0.4123084 2.9804667

The full linear model's test RMSE is 6.7551.

Ridge Regression:

set.seed(1)
matrix.train = model.matrix(crim ~ ., data = Boston.train)
matrix.test = model.matrix(crim ~ ., data = Boston.test)
grid = 10 ^ seq(4, -2, length = 100)

ridge.fit = glmnet(matrix.train, Boston.train$crim, alpha = 0, lambda = grid, thresh = 1e-12)
ridge.cv = cv.glmnet(matrix.train, Boston.train$crim, alpha = 0, lambda = grid, thresh = 1e-12)
(ridge.bestlam = ridge.cv$lambda.min)
## [1] 0.2477076
ridge.pred = predict(ridge.fit, s = ridge.bestlam, newx = matrix.test)
(ridge.info2 = postResample(ridge.pred, Boston.test$crim))
##      RMSE  Rsquared       MAE 
## 6.7350604 0.4155481 2.8834667

The best λ for ridge regression is 0.2477 and its test RMSE is 6.7351.

Lasso Model:

set.seed(1)
lasso.fit = glmnet(matrix.train, Boston.train$crim, alpha = 1, lambda = grid, thresh = 1e-12)
lasso.cv = cv.glmnet(matrix.train, Boston.train$crim, alpha = 1, lambda = grid, thresh = 1e-12)
(lasso.bestlam = lasso.cv$lambda.min)
## [1] 0.06135907
lasso.pred = predict(lasso.fit, s = lasso.bestlam, newx = matrix.test)
(lasso.info2 = postResample(lasso.pred, Boston.test$crim))
##      RMSE  Rsquared       MAE 
## 6.7506997 0.4127902 2.8520426
predict(lasso.fit, s=lasso.bestlam, type="coefficients")
## 15 x 1 sparse Matrix of class "dgCMatrix"
##                       s1
## (Intercept) 11.801194827
## (Intercept)  .          
## zn           0.038681799
## indus       -0.079153523
## chas        -0.707239482
## nox         -5.594725457
## rm           0.235049318
## age          .          
## dis         -0.841139977
## rad          0.526935923
## tax          .          
## ptratio     -0.208748543
## black       -0.003396034
## lstat        0.098860201
## medv        -0.182569148
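
The variables the lasso drops can be pulled directly from the sparse coefficient matrix (a sketch; the two intercept rows are skipped as before):

lasso.coef2 <- predict(lasso.fit, s = lasso.bestlam, type = "coefficients")
rownames(lasso.coef2)[-(1:2)][lasso.coef2[-(1:2), 1] == 0]   # "age" "tax"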

The lasso's best λ is 0.0614 and its test RMSE is 6.7507; at this λ the age and tax coefficients are set exactly to zero.

Principal Components Regression:

set.seed(1)
pcr.fit = pcr(crim ~ ., data = Boston.train, scale = TRUE, validation = "CV")
summary(pcr.fit)
## Data:    X dimension: 402 13 
##  Y dimension: 402 1
## Fit method: svdpc
## Number of components considered: 13
## 
## VALIDATION: RMSEP
## Cross-validated using 10 random segments.
##        (Intercept)  1 comps  2 comps  3 comps  4 comps  5 comps  6 comps
## CV           8.557    7.170    7.162    6.787    6.780    6.791    6.770
## adjCV        8.557    7.166    7.157    6.779    6.771    6.783    6.761
##        7 comps  8 comps  9 comps  10 comps  11 comps  12 comps  13 comps
## CV       6.756    6.635    6.672     6.674     6.636     6.603     6.530
## adjCV    6.747    6.626    6.662     6.665     6.624     6.589     6.516
## 
## TRAINING: % variance explained
##       1 comps  2 comps  3 comps  4 comps  5 comps  6 comps  7 comps  8 comps
## X       47.06    59.74    69.06    75.98    82.73    87.90    91.09    93.52
## crim    30.86    31.09    39.09    39.29    39.34    40.12    40.36    42.63
##       9 comps  10 comps  11 comps  12 comps  13 comps
## X       95.40     97.01     98.43     99.49    100.00
## crim    42.74     42.85     43.55     44.81     46.31
validationplot(pcr.fit, val.type = "MSEP")

pcr.pred = predict(pcr.fit, Boston.test, ncomp = 3)
(pcr.info2 = postResample(pcr.pred, Boston.test$crim))
##      RMSE  Rsquared       MAE 
## 6.8425179 0.3967627 2.8322963

The PCR model used M = 3 components and has a test RMSE of 6.8425. The CV curve is fairly flat beyond three components, and its minimum at M = 13 corresponds to the full least-squares fit, so a small M retains most of the benefit of the dimension reduction.

Partial Least Squares:

pls.fit = plsr(crim ~ ., data = Boston.train, scale = TRUE, validation = "CV")
summary(pls.fit)
## Data:    X dimension: 402 13 
##  Y dimension: 402 1
## Fit method: kernelpls
## Number of components considered: 13
## 
## VALIDATION: RMSEP
## Cross-validated using 10 random segments.
##        (Intercept)  1 comps  2 comps  3 comps  4 comps  5 comps  6 comps
## CV           8.557    6.974    6.630    6.561    6.543    6.516    6.504
## adjCV        8.557    6.971    6.624    6.550    6.531    6.503    6.490
##        7 comps  8 comps  9 comps  10 comps  11 comps  12 comps  13 comps
## CV       6.492    6.481    6.483     6.481     6.482     6.482     6.482
## adjCV    6.479    6.469    6.471     6.469     6.470     6.470     6.470
## 
## TRAINING: % variance explained
##       1 comps  2 comps  3 comps  4 comps  5 comps  6 comps  7 comps  8 comps
## X       46.63    56.09    61.08    70.50    75.66    78.89    83.93    85.92
## crim    34.54    42.22    44.62    45.32    45.81    46.14    46.23    46.30
##       9 comps  10 comps  11 comps  12 comps  13 comps
## X       88.72     94.23     96.64     98.37    100.00
## crim    46.31     46.31     46.31     46.31     46.31
validationplot(pls.fit, val.type = "MSEP")

pls.pred = predict(pls.fit, Boston.test, ncomp = 5)
(pls.info2 = postResample(pls.pred, Boston.test$crim))
##      RMSE  Rsquared       MAE 
## 6.7362959 0.4154761 2.9841105

The PLS model used M = 5 components and has a test RMSE of 6.7363.

(B) Propose a model (or set of models) that seem to perform well on this data set, and justify your answer. Make sure that you are evaluating model performance using validation set error, cross-validation, or some other reasonable alternative, as opposed to using training error.

as_tibble(rbind(lm.info2, ridge.info2, lasso.info2, pcr.info2, pls.info2)) %>%
    mutate(model = c('Linear', 'Ridge', 'Lasso', 'PCR', 'PLS')) %>%
    select(model, RMSE, Rsquared)
## # A tibble: 5 × 3
##   model   RMSE Rsquared
##   <chr>  <dbl>    <dbl>
## 1 Linear  6.76    0.412
## 2 Ridge   6.74    0.416
## 3 Lasso   6.75    0.413
## 4 PCR     6.84    0.397
## 5 PLS     6.74    0.415

With the exception of PCR, all of the models have R^2 values that are extremely close together, indicating that the differences among them are minuscule. Ridge regression and partial least squares have the lowest test RMSE and seem to perform well on this data set; based on the RMSE values, the ridge model would be the proposed choice.

(C) Does your chosen model involve all of the features in the data set? Why or why not?

Yes. The ridge regression model uses all of the features: the ridge penalty shrinks coefficients toward zero but never sets them exactly to zero, so no variables are removed from the model.
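
This can be confirmed directly from the fitted Boston ridge model, as a sketch using the objects from part (a):

# every predictor keeps a non-zero coefficient under the ridge penalty
ridge.coef2 <- predict(ridge.fit, s = ridge.bestlam, type = "coefficients")
sum(ridge.coef2[-(1:2), 1] != 0)   # 13, i.e. all 13 predictors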