2. For parts (a) through (c), indicate which of i. through iv. is correct. Justify your answer.

(a) The lasso, relative to least squares, is:

  1. More flexible and hence will give improved prediction accuracy when its increase in bias is less than its decrease in variance.
  2. More flexible and hence will give improved prediction accuracy when its increase in variance is less than its decrease in bias.
  3. Less flexible and hence will give improved prediction accuracy when its increase in bias is less than its decrease in variance.
  4. Less flexible and hence will give improved prediction accuracy when its increase in variance is less than its decrease in bias.

Answer: iii. The flexibility of the fit decreases as λ increases, which in turn leads to a decrease in variance and an increase in bias.

(b) Repeat (a) for ridge regression relative to least squares.

Answer: iii. Similar to the lasso, as the λ increases, the variance will decrease and the bias will increase.

(c) Repeat (a) for non-linear methods relative to least squares.

Answer: ii. A non-linear method will be more flexible than the least squares method.

9. In this exercise, we will predict the number of applications received using the other variables in the College data set.

library(ISLR)
## Warning: package 'ISLR' was built under R version 4.1.2
summary(College)
##  Private        Apps           Accept          Enroll       Top10perc    
##  No :212   Min.   :   81   Min.   :   72   Min.   :  35   Min.   : 1.00  
##  Yes:565   1st Qu.:  776   1st Qu.:  604   1st Qu.: 242   1st Qu.:15.00  
##            Median : 1558   Median : 1110   Median : 434   Median :23.00  
##            Mean   : 3002   Mean   : 2019   Mean   : 780   Mean   :27.56  
##            3rd Qu.: 3624   3rd Qu.: 2424   3rd Qu.: 902   3rd Qu.:35.00  
##            Max.   :48094   Max.   :26330   Max.   :6392   Max.   :96.00  
##    Top25perc      F.Undergrad     P.Undergrad         Outstate    
##  Min.   :  9.0   Min.   :  139   Min.   :    1.0   Min.   : 2340  
##  1st Qu.: 41.0   1st Qu.:  992   1st Qu.:   95.0   1st Qu.: 7320  
##  Median : 54.0   Median : 1707   Median :  353.0   Median : 9990  
##  Mean   : 55.8   Mean   : 3700   Mean   :  855.3   Mean   :10441  
##  3rd Qu.: 69.0   3rd Qu.: 4005   3rd Qu.:  967.0   3rd Qu.:12925  
##  Max.   :100.0   Max.   :31643   Max.   :21836.0   Max.   :21700  
##    Room.Board       Books           Personal         PhD        
##  Min.   :1780   Min.   :  96.0   Min.   : 250   Min.   :  8.00  
##  1st Qu.:3597   1st Qu.: 470.0   1st Qu.: 850   1st Qu.: 62.00  
##  Median :4200   Median : 500.0   Median :1200   Median : 75.00  
##  Mean   :4358   Mean   : 549.4   Mean   :1341   Mean   : 72.66  
##  3rd Qu.:5050   3rd Qu.: 600.0   3rd Qu.:1700   3rd Qu.: 85.00  
##  Max.   :8124   Max.   :2340.0   Max.   :6800   Max.   :103.00  
##     Terminal       S.F.Ratio      perc.alumni        Expend     
##  Min.   : 24.0   Min.   : 2.50   Min.   : 0.00   Min.   : 3186  
##  1st Qu.: 71.0   1st Qu.:11.50   1st Qu.:13.00   1st Qu.: 6751  
##  Median : 82.0   Median :13.60   Median :21.00   Median : 8377  
##  Mean   : 79.7   Mean   :14.09   Mean   :22.74   Mean   : 9660  
##  3rd Qu.: 92.0   3rd Qu.:16.50   3rd Qu.:31.00   3rd Qu.:10830  
##  Max.   :100.0   Max.   :39.80   Max.   :64.00   Max.   :56233  
##    Grad.Rate     
##  Min.   : 10.00  
##  1st Qu.: 53.00  
##  Median : 65.00  
##  Mean   : 65.46  
##  3rd Qu.: 78.00  
##  Max.   :118.00
attach(College)

(a) Split the data set into a training set and a test set.

set.seed(1)
x=model.matrix(Apps~.,College)[,-1]
y=College$Apps
train=sample(1:nrow(x), nrow(x)/2)
test=(-train)
College.train = College[train, ]
College.test = College[test, ]
y.test=y[test]

(b) Fit a linear model using least squares on the training set, and report the test error obtained.

lm.fit=lm(Apps~., data=College, subset=train)
summary(lm.fit)
## 
## Call:
## lm(formula = Apps ~ ., data = College, subset = train)
## 
## Residuals:
##     Min      1Q  Median      3Q     Max 
## -5741.2  -479.5    15.3   359.6  7258.0 
## 
## Coefficients:
##               Estimate Std. Error t value Pr(>|t|)    
## (Intercept) -7.902e+02  6.381e+02  -1.238 0.216410    
## PrivateYes  -3.070e+02  2.006e+02  -1.531 0.126736    
## Accept       1.779e+00  5.420e-02  32.830  < 2e-16 ***
## Enroll      -1.470e+00  3.115e-01  -4.720 3.35e-06 ***
## Top10perc    6.673e+01  8.310e+00   8.030 1.31e-14 ***
## Top25perc   -2.231e+01  6.533e+00  -3.415 0.000708 ***
## F.Undergrad  9.269e-02  5.529e-02   1.676 0.094538 .  
## P.Undergrad  9.397e-03  5.493e-02   0.171 0.864275    
## Outstate    -1.084e-01  2.700e-02  -4.014 7.22e-05 ***
## Room.Board   2.115e-01  7.224e-02   2.928 0.003622 ** 
## Books        2.912e-01  3.985e-01   0.731 0.465399    
## Personal     6.133e-03  8.803e-02   0.070 0.944497    
## PhD         -1.548e+01  6.681e+00  -2.316 0.021082 *  
## Terminal     6.415e+00  7.290e+00   0.880 0.379470    
## S.F.Ratio    2.283e+01  2.047e+01   1.115 0.265526    
## perc.alumni  1.134e+00  6.083e+00   0.186 0.852274    
## Expend       4.857e-02  1.619e-02   2.999 0.002890 ** 
## Grad.Rate    7.490e+00  4.397e+00   1.703 0.089324 .  
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 1083 on 370 degrees of freedom
## Multiple R-squared:  0.9389, Adjusted R-squared:  0.9361 
## F-statistic: 334.3 on 17 and 370 DF,  p-value: < 2.2e-16
lm.pred=predict(lm.fit, College.test)
mean((College.test$Apps - lm.pred)^2)
## [1] 1135758

(c) Fit a ridge regression model on the training set, with λ chosen by cross-validation. Report the test error obtained.

library(glmnet)
## Warning: package 'glmnet' was built under R version 4.1.2
## Loading required package: Matrix
## Loaded glmnet 4.1-3
grid=10^seq(10, -2, length=100)
ridge.mod=glmnet(x, y, alpha=0, lambda=grid)
set.seed(1)
cv.out=cv.glmnet(x[train,], y[train], alpha=0)
bestlam=cv.out$lambda.min
bestlam
## [1] 405.8404
ridge.pred=predict(ridge.mod, s=bestlam, newx=x[test,])
mean((ridge.pred -y.test)^2)
## [1] 906029.4

(d) Fit a lasso model on the training set, with λ chosen by cross-validation. Report the test error obtained, along with the number of non-zero coefficient estimates.

lasso.mod=glmnet(x[train,], y[train], alpha=1, lambda=grid)
set.seed(1)
cv.out1=cv.glmnet(x[train,], y[train], alpha=1)
bestlam1=cv.out1$lambda.min
lasso.pred=predict(lasso.mod ,s=bestlam1, newx=x[test ,])
mean((lasso.pred - y.test)^2)
## [1] 1116252
lasso.coef=predict(lasso.mod, type="coefficients", s=bestlam1)[1:18,]
lasso.coef[lasso.coef!=0]
##   (Intercept)    PrivateYes        Accept        Enroll     Top10perc 
## -7.700064e+02 -3.118246e+02  1.763118e+00 -1.324681e+00  6.484111e+01 
##     Top25perc   F.Undergrad   P.Undergrad      Outstate    Room.Board 
## -2.084190e+01  7.231529e-02  1.210751e-02 -1.049092e-01  2.088339e-01 
##         Books      Personal           PhD      Terminal     S.F.Ratio 
##  2.936591e-01  3.742302e-03 -1.443128e+01  5.292709e+00  2.176770e+01 
##   perc.alumni        Expend     Grad.Rate 
##  5.149189e-01  4.824620e-02  7.035589e+00

(e) Fit a PCR model on the training set, with M chosen by cross-validation. Report the test error obtained, along with the value of M selected by cross-validation.

library(pls)
## Warning: package 'pls' was built under R version 4.1.3
## 
## Attaching package: 'pls'
## The following object is masked from 'package:stats':
## 
##     loadings
pcr.fit=pcr(Apps~., data=College.train, scale=TRUE, validation="CV")
summary(pcr.fit)
## Data:    X dimension: 388 17 
##  Y dimension: 388 1
## Fit method: svdpc
## Number of components considered: 17
## 
## VALIDATION: RMSEP
## Cross-validated using 10 random segments.
##        (Intercept)  1 comps  2 comps  3 comps  4 comps  5 comps  6 comps
## CV            4288     4013     2368     2388     2072     1926     1900
## adjCV         4288     4012     2364     2386     2025     1907     1893
##        7 comps  8 comps  9 comps  10 comps  11 comps  12 comps  13 comps
## CV        1905     1907     1868      1811      1815      1820      1809
## adjCV     1899     1903     1862      1799      1807      1812      1801
##        14 comps  15 comps  16 comps  17 comps
## CV         1819      1753      1312      1236
## adjCV      1813      1725      1298      1225
## 
## TRAINING: % variance explained
##       1 comps  2 comps  3 comps  4 comps  5 comps  6 comps  7 comps  8 comps
## X       32.20    57.78    65.31    70.99    76.37    81.27     84.8    87.85
## Apps    13.44    70.93    71.07    79.87    81.15    82.25     82.3    82.33
##       9 comps  10 comps  11 comps  12 comps  13 comps  14 comps  15 comps
## X       90.62     92.91     94.98     96.74     97.79     98.72     99.42
## Apps    83.38     84.76     84.80     84.84     85.11     85.14     90.55
##       16 comps  17 comps
## X        99.88    100.00
## Apps     93.42     93.89
validationplot(pcr.fit, val.type="MSEP")

pcr.pred=predict(pcr.fit, x[test ,], ncomp=10)
mean((pcr.pred-y.test)^2)
## [1] 1723100

The value of M selected by cross-validation is 10.

(f) Fit a PLS model on the training set, with M chosen by crossvalidation. Report the test error obtained, along with the value of M selected by cross-validation.

set.seed(1)
pls.fit=plsr(Apps~., data=College.train, scale=TRUE, validation="CV")
summary (pls.fit)
## Data:    X dimension: 388 17 
##  Y dimension: 388 1
## Fit method: kernelpls
## Number of components considered: 17
## 
## VALIDATION: RMSEP
## Cross-validated using 10 random segments.
##        (Intercept)  1 comps  2 comps  3 comps  4 comps  5 comps  6 comps
## CV            4288     2217     2019     1761     1630     1533     1347
## adjCV         4288     2211     2012     1749     1605     1510     1331
##        7 comps  8 comps  9 comps  10 comps  11 comps  12 comps  13 comps
## CV        1309     1303     1286      1283      1283      1277      1271
## adjCV     1296     1289     1273      1270      1270      1264      1258
##        14 comps  15 comps  16 comps  17 comps
## CV         1270      1270      1270      1270
## adjCV      1258      1257      1257      1257
## 
## TRAINING: % variance explained
##       1 comps  2 comps  3 comps  4 comps  5 comps  6 comps  7 comps  8 comps
## X       27.21    50.73    63.06    65.52    70.20    74.20    78.62    80.81
## Apps    75.39    81.24    86.97    91.14    92.62    93.43    93.56    93.68
##       9 comps  10 comps  11 comps  12 comps  13 comps  14 comps  15 comps
## X       83.29     87.17     89.15     91.37     92.58     94.42     96.98
## Apps    93.76     93.79     93.83     93.86     93.88     93.89     93.89
##       16 comps  17 comps
## X        98.78    100.00
## Apps     93.89     93.89
validationplot(pls.fit, val.type="MSEP")

pls.pred=predict(pls.fit, x[test,], ncomp=8)
mean((pls.pred-y.test)^2)
## [1] 1090290

The value of M selected by cross-validation is 8.

(g) Comment on the results obtained. How accurately can we predict the number of college applications received? Is there much difference among the test errors resulting from these five approaches?

Based on the models, the number of college applications received can be predicted fairly accurately. The test errors for each approach were close in range, except for the PCR approach, which had the highest test error.

detach(College)

11. We will now try to predict per capita crime rate in the Boston data set.

library(MASS)
attach(Boston)
summary(Boston)
##       crim                zn             indus            chas        
##  Min.   : 0.00632   Min.   :  0.00   Min.   : 0.46   Min.   :0.00000  
##  1st Qu.: 0.08205   1st Qu.:  0.00   1st Qu.: 5.19   1st Qu.:0.00000  
##  Median : 0.25651   Median :  0.00   Median : 9.69   Median :0.00000  
##  Mean   : 3.61352   Mean   : 11.36   Mean   :11.14   Mean   :0.06917  
##  3rd Qu.: 3.67708   3rd Qu.: 12.50   3rd Qu.:18.10   3rd Qu.:0.00000  
##  Max.   :88.97620   Max.   :100.00   Max.   :27.74   Max.   :1.00000  
##       nox               rm             age              dis        
##  Min.   :0.3850   Min.   :3.561   Min.   :  2.90   Min.   : 1.130  
##  1st Qu.:0.4490   1st Qu.:5.886   1st Qu.: 45.02   1st Qu.: 2.100  
##  Median :0.5380   Median :6.208   Median : 77.50   Median : 3.207  
##  Mean   :0.5547   Mean   :6.285   Mean   : 68.57   Mean   : 3.795  
##  3rd Qu.:0.6240   3rd Qu.:6.623   3rd Qu.: 94.08   3rd Qu.: 5.188  
##  Max.   :0.8710   Max.   :8.780   Max.   :100.00   Max.   :12.127  
##       rad              tax           ptratio          black       
##  Min.   : 1.000   Min.   :187.0   Min.   :12.60   Min.   :  0.32  
##  1st Qu.: 4.000   1st Qu.:279.0   1st Qu.:17.40   1st Qu.:375.38  
##  Median : 5.000   Median :330.0   Median :19.05   Median :391.44  
##  Mean   : 9.549   Mean   :408.2   Mean   :18.46   Mean   :356.67  
##  3rd Qu.:24.000   3rd Qu.:666.0   3rd Qu.:20.20   3rd Qu.:396.23  
##  Max.   :24.000   Max.   :711.0   Max.   :22.00   Max.   :396.90  
##      lstat            medv      
##  Min.   : 1.73   Min.   : 5.00  
##  1st Qu.: 6.95   1st Qu.:17.02  
##  Median :11.36   Median :21.20  
##  Mean   :12.65   Mean   :22.53  
##  3rd Qu.:16.95   3rd Qu.:25.00  
##  Max.   :37.97   Max.   :50.00

(a) Try out some of the regression methods explored in this chapter, such as best subset selection, the lasso, ridge regression, and PCR. Present and discuss results for the approaches that you consider.

set.seed(1)
x1=model.matrix(crim~., Boston)
y1=Boston$crim
train1=sample(1:nrow(x1), nrow(x1)/2)
test1=(-train1)
Boston.train = Boston[train1, ]
Boston.test = Boston[test1, ]
y1.test1=y1[test1]

Ridge Regression

grid=10^seq(10, -2, length=100)
ridge.mod2=glmnet(x1, y1, alpha=0, lambda=grid)
set.seed(1)
cv.out2=cv.glmnet(x1[train1,], y1[train1], alpha=0)
bestlam2=cv.out2$lambda.min
bestlam2
## [1] 0.5919159
ridge.pred2=predict(ridge.mod2, s=bestlam2, newx=x1[test1,])
mean((ridge.pred2 - y1.test1)^2)
## [1] 38.01174

Lasso

lasso.mod3=glmnet(x1[train1,], y1[train1], alpha=1, lambda=grid)
set.seed(1)
cv.out3=cv.glmnet(x1[train1,], y1[train1], alpha=1)
bestlam3=cv.out3$lambda.min
lasso.pred3=predict(lasso.mod3 ,s=bestlam3, newx=x1[test1 ,])
mean((lasso.pred3 - y1.test1)^2)
## [1] 40.99066
lasso.coef3=predict(lasso.mod3,type="coefficients",s=bestlam3)[1:15,]
lasso.coef3[lasso.coef3!=0]
##   (Intercept)            zn         indus          chas           nox 
## 18.9503431634  0.0366638419 -0.1248394521 -0.4770056594 -8.3082861310 
##            rm           age           dis           rad           tax 
##  0.1230732403 -0.0007472670 -0.8420393123  0.5337673781 -0.0001484142 
##       ptratio         black         lstat          medv 
## -0.3832539956 -0.0129716801  0.2580918024 -0.1611287956
set.seed(1)
pls.fit1=plsr(crim~., data=Boston.train, scale=TRUE, validation="CV")
summary (pls.fit1)
## Data:    X dimension: 253 13 
##  Y dimension: 253 1
## Fit method: kernelpls
## Number of components considered: 13
## 
## VALIDATION: RMSEP
## Cross-validated using 10 random segments.
##        (Intercept)  1 comps  2 comps  3 comps  4 comps  5 comps  6 comps
## CV           9.275    7.328    6.842    6.784    6.741    6.718    6.695
## adjCV        9.275    7.322    6.834    6.768    6.728    6.707    6.683
##        7 comps  8 comps  9 comps  10 comps  11 comps  12 comps  13 comps
## CV       6.722    6.707    6.696     6.701     6.700     6.700     6.700
## adjCV    6.706    6.693    6.683     6.688     6.686     6.686     6.686
## 
## TRAINING: % variance explained
##       1 comps  2 comps  3 comps  4 comps  5 comps  6 comps  7 comps  8 comps
## X       48.10    57.46    62.68    69.65    77.95    80.82    84.40    87.65
## crim    38.94    47.55    49.73    50.61    50.85    51.17    51.29    51.35
##       9 comps  10 comps  11 comps  12 comps  13 comps
## X       90.47     93.07     96.66     98.26    100.00
## crim    51.37     51.37     51.37     51.37     51.37
validationplot(pls.fit1, val.type="MSEP")

pls.pred1=predict(pls.fit1, Boston.test, ncomp=6)
mean((pls.pred1-y1.test1)^2)
## [1] 41.78158

Based on the test errors of the Ridge Regression, Lasso, and PLS approaches, Ridge Regression is the lowest.

(b) Propose a model (or set of models) that seem to perform well on this data set, and justify your answer. Make sure that you are evaluating model performance using validation set error, cross-validation, or some other reasonable alternative, as opposed to using training error.

The Ridge Regression model performed well using cross-validation. It had the lowest test error compared to the Lasso and PCR approaches.

(c) Does your chosen model involve all of the features in the data set? Why or why not?

This model does include all of the features and reduces the size of coefficients.