Q2.

For parts (a) through (c), indicate which of i. through iv. is correct. Justify your answer.

  1. More flexible and hence will give improved prediction ac- curacy when its increase in bias is less than its decrease in variance.

  2. More flexible and hence will give improved prediction accu- racy when its increase in variance is less than its decrease in bias.

  3. Less flexible and hence will give improved prediction accu- racy when its increase in bias is less than its decrease in variance.

  4. Less flexible and hence will give improved prediction accu- racy when its increase in variance is less than its decrease in bias.

a.)

The lasso, relative to least squares, is: answer choice- iii

b.)

Repeat (a) for ridge regression relative to least squares.: answer choice- iii

c.)

Repeat (a) for non-linear methods relative to least squares.: answer choice- ii

Q9.

In this exercise, we will predict the number of applications received using the other variables in the College data set.

a.)

Split the data set into a training set and a test set.

library(ISLR)
library(glmnet)
## Loading required package: Matrix
## Loaded glmnet 4.1-2
library(pls)
## 
## Attaching package: 'pls'
## The following object is masked from 'package:stats':
## 
##     loadings
library(MASS)
x=model.matrix(Apps~.,College)[,-1]
y=College$Apps
set.seed(10)
train=sample(1:nrow(x), nrow(x)/2)
test=(-train)
College.train = College[train, ]
College.test = College[test, ]
y.test=y[test]

b.)

LS.fit<-lm(Apps~., data=College, subset=train)
summary(LS.fit)
## 
## Call:
## lm(formula = Apps ~ ., data = College, subset = train)
## 
## Residuals:
##     Min      1Q  Median      3Q     Max 
## -5139.5  -473.3   -21.1   353.2  7402.7 
## 
## Coefficients:
##               Estimate Std. Error t value Pr(>|t|)    
## (Intercept) -629.36179  639.35741  -0.984 0.325579    
## PrivateYes  -647.56836  192.17056  -3.370 0.000832 ***
## Accept         1.68912    0.05038  33.530  < 2e-16 ***
## Enroll        -1.02383    0.27721  -3.693 0.000255 ***
## Top10perc     48.19124    8.10714   5.944 6.42e-09 ***
## Top25perc    -10.51538    6.44952  -1.630 0.103865    
## F.Undergrad    0.01992    0.05364   0.371 0.710574    
## P.Undergrad    0.04213    0.05348   0.788 0.431373    
## Outstate      -0.09489    0.02674  -3.549 0.000436 ***
## Room.Board     0.14549    0.07243   2.009 0.045277 *  
## Books          0.06660    0.31115   0.214 0.830623    
## Personal       0.05663    0.09453   0.599 0.549475    
## PhD          -10.11489    7.11588  -1.421 0.156027    
## Terminal      -2.29300    8.03546  -0.285 0.775528    
## S.F.Ratio     22.07117   18.70991   1.180 0.238897    
## perc.alumni    2.08121    6.00673   0.346 0.729179    
## Expend         0.07654    0.01672   4.577 6.45e-06 ***
## Grad.Rate      9.99706    4.49821   2.222 0.026857 *  
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 1092 on 370 degrees of freedom
## Multiple R-squared:  0.9395, Adjusted R-squared:  0.9367 
## F-statistic:   338 on 17 and 370 DF,  p-value: < 2.2e-16
pred.app<-predict(LS.fit, College.test)
test.error<-mean((College.test$Apps-pred.app)^2)
test.error
## [1] 1020100

c.)

Fit a ridge regression model on the training set, with λ chosen by cross-validation. Report the test error obtained.

grid=10^seq(10,-2,length=100)
ridge.mod=glmnet(x[train,],y[train],alpha=0,lambda=grid)
summary(ridge.mod)
##           Length Class     Mode   
## a0         100   -none-    numeric
## beta      1700   dgCMatrix S4     
## df         100   -none-    numeric
## dim          2   -none-    numeric
## lambda     100   -none-    numeric
## dev.ratio  100   -none-    numeric
## nulldev      1   -none-    numeric
## npasses      1   -none-    numeric
## jerr         1   -none-    numeric
## offset       1   -none-    logical
## call         5   -none-    call   
## nobs         1   -none-    numeric
cv.college.out=cv.glmnet(x[train,],y[train] ,alpha=0)
bestlam=cv.college.out$lambda.min
bestlam
## [1] 411.3927
ridge.pred=predict(ridge.mod,s=bestlam,newx=x[test,])
mean((ridge.pred-y.test)^2)
## [1] 985020.1

d.)

Fit a lasso model on the training set, with λ chosen by cross- validation. Report the test error obtained, along with the number of non-zero coefficient estimates.

lasso.mod=glmnet(x[train,],y[train],alpha=1,lambda=grid)
summary(lasso.mod)
##           Length Class     Mode   
## a0         100   -none-    numeric
## beta      1700   dgCMatrix S4     
## df         100   -none-    numeric
## dim          2   -none-    numeric
## lambda     100   -none-    numeric
## dev.ratio  100   -none-    numeric
## nulldev      1   -none-    numeric
## npasses      1   -none-    numeric
## jerr         1   -none-    numeric
## offset       1   -none-    logical
## call         5   -none-    call   
## nobs         1   -none-    numeric
cv.out=cv.glmnet(x[train,],y[train],alpha=1)
bestlam=cv.out$lambda.min
bestlam
## [1] 24.66235
lasso.pred=predict(lasso.mod,s=bestlam,newx=x[test,])
mean((lasso.pred-y.test)^2)
## [1] 1008145
out=glmnet(x,y,alpha=1,lambda = grid)
lasso.coef=predict(out,type="coefficients",s=bestlam)[1:18,]
lasso.coef[lasso.coef!=0]
##   (Intercept)    PrivateYes        Accept        Enroll     Top10perc 
## -6.324960e+02 -4.087012e+02  1.436837e+00 -1.410240e-01  3.143012e+01 
##     Top25perc   P.Undergrad      Outstate    Room.Board      Personal 
## -8.606525e-01  1.480293e-02 -5.342495e-02  1.205819e-01  4.379046e-05 
##           PhD      Terminal     S.F.Ratio   perc.alumni        Expend 
## -5.121245e+00 -3.371192e+00  2.717229e+00 -1.039648e+00  6.838161e-02 
##     Grad.Rate 
##  4.700317e+00

e.)

Fit a PCR model on the training set, with M chosen by cross- validation. Report the test error obtained, along with the value of M selected by cross-validation.

pcr.college=pcr(Apps~., data=College.train,scale=TRUE,validation="CV")
summary(pcr.college)
## Data:    X dimension: 388 17 
##  Y dimension: 388 1
## Fit method: svdpc
## Number of components considered: 17
## 
## VALIDATION: RMSEP
## Cross-validated using 10 random segments.
##        (Intercept)  1 comps  2 comps  3 comps  4 comps  5 comps  6 comps
## CV            4347     4345     2371     2391     2104     1949     1898
## adjCV         4347     4345     2368     2396     2085     1939     1891
##        7 comps  8 comps  9 comps  10 comps  11 comps  12 comps  13 comps
## CV        1899     1880     1864      1861      1870      1873      1891
## adjCV     1893     1862     1857      1853      1862      1865      1885
##        14 comps  15 comps  16 comps  17 comps
## CV         1903      1727      1295      1260
## adjCV      1975      1669      1283      1249
## 
## TRAINING: % variance explained
##       1 comps  2 comps  3 comps  4 comps  5 comps  6 comps  7 comps  8 comps
## X     32.6794    56.94    64.38    70.61    76.27    80.97    84.48    87.54
## Apps   0.9148    71.17    71.36    79.85    81.49    82.73    82.79    83.70
##       9 comps  10 comps  11 comps  12 comps  13 comps  14 comps  15 comps
## X       90.50     92.89     94.96     96.81     97.97     98.73     99.39
## Apps    83.86     84.08     84.11     84.11     84.16     84.28     93.08
##       16 comps  17 comps
## X        99.86    100.00
## Apps     93.71     93.95
pcr.pred=predict(pcr.college,x[test,],ncomp=10)
mean((pcr.pred-y.test)^2)
## [1] 1422699

f.)

pls.college=plsr(Apps~., data=College.train,scale=TRUE, validation="CV")
summary(pls.college)
## Data:    X dimension: 388 17 
##  Y dimension: 388 1
## Fit method: kernelpls
## Number of components considered: 17
## 
## VALIDATION: RMSEP
## Cross-validated using 10 random segments.
##        (Intercept)  1 comps  2 comps  3 comps  4 comps  5 comps  6 comps
## CV            4347     2178     1872     1734     1615     1453     1359
## adjCV         4347     2171     1867     1726     1586     1427     1341
##        7 comps  8 comps  9 comps  10 comps  11 comps  12 comps  13 comps
## CV        1347     1340     1329      1317      1310      1305      1305
## adjCV     1330     1324     1314      1302      1296      1291      1291
##        14 comps  15 comps  16 comps  17 comps
## CV         1305      1307      1307      1307
## adjCV      1291      1292      1293      1293
## 
## TRAINING: % variance explained
##       1 comps  2 comps  3 comps  4 comps  5 comps  6 comps  7 comps  8 comps
## X       24.27    38.72    62.64    65.26    69.01    73.96    78.86    82.18
## Apps    76.96    84.31    86.80    91.48    93.37    93.75    93.81    93.84
##       9 comps  10 comps  11 comps  12 comps  13 comps  14 comps  15 comps
## X       85.35     87.42     89.18     91.41     92.70     94.58     97.16
## Apps    93.88     93.91     93.93     93.94     93.95     93.95     93.95
##       16 comps  17 comps
## X        98.15    100.00
## Apps     93.95     93.95
pls.pred=predict(pls.college,x[test,],ncomp=9)
mean((pls.pred-y.test)^2)
## [1] 1049868

g.)

test.avg = mean(College.test[, "Apps"])
lm.test.r2 = 1 - mean((College.test[, "Apps"] - pred.app)^2) /mean((College.test[, "Apps"] - test.avg)^2)
ridge.test.r2 = 1 - mean((College.test[, "Apps"] - ridge.pred)^2) /mean((College.test[, "Apps"] - test.avg)^2)
lasso.test.r2 = 1 - mean((College.test[, "Apps"] - lasso.pred)^2) /mean((College.test[, "Apps"] - test.avg)^2)
pcr.test.r2 = 1 - mean((pcr.pred-y.test)^2) /mean((College.test[, "Apps"] - test.avg)^2)
pls.test.r2 = 1 - mean((pls.pred-y.test)^2) /mean((College.test[, "Apps"] - test.avg)^2)
print(lm.test.r2)
## [1] 0.9076134
print(ridge.test.r2)
## [1] 0.9107905
print(lasso.test.r2)
## [1] 0.9086962
print(pcr.test.r2)
## [1] 0.8711516
print(pls.test.r2)
## [1] 0.9049174
Comment

Based on the R2’s calculated, it seems that all of the models are reliable in giving us good predictions for the number of college applications we will get. The PCR model had the lowest with .87. And Ridge had an MSE of 985020.1, making it one of the better models and gives better quality.

Q11.

We will now try to predict per capita crime rate in the Boston data set.

a.)

Try out some of the regression methods explored in this chapter, such as best subset selection, the lasso, ridge regression, and PCR. Present and discuss results for the approaches that you consider.

x2=model.matrix(crim~.,Boston)[,-1]
y2=Boston$crim
set.seed(10)
train2=sample(1:nrow(x2), nrow(x2)/2)
test2=(-train2)
Boston.train2 = Boston[train2, ]
Boston.test2 = Boston[test2, ]
y.test2=y[test2]

Linear Model

LS2.fit<-lm(crim~., data=Boston, subset=train2)
summary(LS2.fit)
## 
## Call:
## lm(formula = crim ~ ., data = Boston, subset = train2)
## 
## Residuals:
##    Min     1Q Median     3Q    Max 
## -8.440 -1.772 -0.274  0.902 56.481 
## 
## Coefficients:
##              Estimate Std. Error t value Pr(>|t|)    
## (Intercept) 16.608332   8.075136   2.057   0.0408 *  
## zn           0.036330   0.023112   1.572   0.1173    
## indus       -0.092007   0.105736  -0.870   0.3851    
## chas        -1.201144   1.318540  -0.911   0.3632    
## nox         -6.136130   5.726461  -1.072   0.2850    
## rm          -0.273063   0.624231  -0.437   0.6622    
## age          0.007418   0.019710   0.376   0.7070    
## dis         -0.605597   0.314563  -1.925   0.0554 .  
## rad          0.532056   0.094632   5.622 5.24e-08 ***
## tax         -0.002360   0.006015  -0.392   0.6952    
## ptratio     -0.222514   0.219675  -1.013   0.3121    
## black       -0.013073   0.004510  -2.899   0.0041 ** 
## lstat        0.133088   0.083231   1.599   0.1111    
## medv        -0.111278   0.068103  -1.634   0.1036    
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 5.255 on 239 degrees of freedom
## Multiple R-squared:  0.5351, Adjusted R-squared:  0.5098 
## F-statistic: 21.16 on 13 and 239 DF,  p-value: < 2.2e-16
pred2.app<-predict(LS2.fit, Boston.test2)
test2.error<-mean((Boston.test2$crim-pred2.app)^2)
test2.error
## [1] 55.60279

Ridge Model

grid2=10^seq(10,-2,length=100)
ridge2.mod=glmnet(x[train2,],y[train2],alpha=0,lambda=grid2)
summary(ridge2.mod)
##           Length Class     Mode   
## a0         100   -none-    numeric
## beta      1700   dgCMatrix S4     
## df         100   -none-    numeric
## dim          2   -none-    numeric
## lambda     100   -none-    numeric
## dev.ratio  100   -none-    numeric
## nulldev      1   -none-    numeric
## npasses      1   -none-    numeric
## jerr         1   -none-    numeric
## offset       1   -none-    logical
## call         5   -none-    call   
## nobs         1   -none-    numeric
cv2.boston.out=cv.glmnet(x[train2,],y[train2] ,alpha=0)
bestlam2=cv2.boston.out$lambda.min
bestlam2
## [1] 288.8223
ridge2.pred=predict(ridge2.mod,s=bestlam2,newx=x[test2,])
mean((ridge2.pred-y.test2)^2)
## [1] 2138171

Lasso Model

lasso2.mod=glmnet(x[train2,],y[train2],alpha=1,lambda=grid2)
summary(lasso2.mod)
##           Length Class     Mode   
## a0         100   -none-    numeric
## beta      1700   dgCMatrix S4     
## df         100   -none-    numeric
## dim          2   -none-    numeric
## lambda     100   -none-    numeric
## dev.ratio  100   -none-    numeric
## nulldev      1   -none-    numeric
## npasses      1   -none-    numeric
## jerr         1   -none-    numeric
## offset       1   -none-    logical
## call         5   -none-    call   
## nobs         1   -none-    numeric
cv.out_L=cv.glmnet(x[train2,],y[train2],alpha=1)
bestlam_L=cv.out_L$lambda.min
bestlam_L
## [1] 17.31444
lasso2.pred=predict(lasso2.mod,s=bestlam_L,newx=x[test2,])
mean((lasso2.pred-y.test2)^2)
## [1] 1494277
out_L=glmnet(x,y,alpha=1,lambda = grid2)
lasso2.coef=predict(out_L,type="coefficients",s=bestlam_L)[1:18,]
lasso2.coef[lasso2.coef!=0]
##   (Intercept)    PrivateYes        Accept        Enroll     Top10perc 
## -5.704754e+02 -4.437052e+02  1.481060e+00 -2.829124e-01  3.683722e+01 
##     Top25perc   P.Undergrad      Outstate    Room.Board      Personal 
## -4.610571e+00  2.833588e-02 -6.390853e-02  1.303785e-01  5.325777e-03 
##           PhD      Terminal     S.F.Ratio   perc.alumni        Expend 
## -6.221909e+00 -3.223163e+00  6.728209e+00 -8.212749e-01  7.126981e-02 
##     Grad.Rate 
##  5.799437e+00

PCR Model

pcr2.boston=pcr(crim~., data=Boston.train2,scale=TRUE,validation="CV")
summary(pcr2.boston)
## Data:    X dimension: 253 13 
##  Y dimension: 253 1
## Fit method: svdpc
## Number of components considered: 13
## 
## VALIDATION: RMSEP
## Cross-validated using 10 random segments.
##        (Intercept)  1 comps  2 comps  3 comps  4 comps  5 comps  6 comps
## CV           7.521    6.068    6.014    5.627    5.523    5.562    5.631
## adjCV        7.521    6.066    6.013    5.619    5.516    5.555    5.621
##        7 comps  8 comps  9 comps  10 comps  11 comps  12 comps  13 comps
## CV       5.644    5.556    5.443     5.444     5.465     5.447     5.404
## adjCV    5.633    5.539    5.426     5.429     5.451     5.431     5.388
## 
## TRAINING: % variance explained
##       1 comps  2 comps  3 comps  4 comps  5 comps  6 comps  7 comps  8 comps
## X       46.57    58.79    68.40    76.02    82.37    87.52    90.99    93.28
## crim    35.21    36.53    45.03    47.18    47.30    47.35    47.35    49.85
##       9 comps  10 comps  11 comps  12 comps  13 comps
## X       95.35     97.13     98.44     99.51    100.00
## crim    51.98     51.99     52.05     52.64     53.51
#pcr2.pred=predict(pcr2.boston, data=x[test2,], ncomp= 11)
#mean((pcr2.pred-y.test2)^2)

PLS Model

pls2.boston=plsr(crim~., data=Boston.train2,scale=TRUE, validation="CV")
summary(pls2.boston)
## Data:    X dimension: 253 13 
##  Y dimension: 253 1
## Fit method: kernelpls
## Number of components considered: 13
## 
## VALIDATION: RMSEP
## Cross-validated using 10 random segments.
##        (Intercept)  1 comps  2 comps  3 comps  4 comps  5 comps  6 comps
## CV           7.521    5.910    5.565    5.567    5.549    5.509    5.498
## adjCV        7.521    5.905    5.554    5.546    5.525    5.489    5.477
##        7 comps  8 comps  9 comps  10 comps  11 comps  12 comps  13 comps
## CV       5.521    5.510    5.505     5.505     5.505     5.505     5.505
## adjCV    5.497    5.488    5.483     5.483     5.483     5.484     5.484
## 
## TRAINING: % variance explained
##       1 comps  2 comps  3 comps  4 comps  5 comps  6 comps  7 comps  8 comps
## X       46.03    56.08    62.49    69.86    77.37    79.54    84.48    87.27
## crim    40.25    49.70    51.99    52.95    53.12    53.41    53.47    53.50
##       9 comps  10 comps  11 comps  12 comps  13 comps
## X       91.38     93.60     96.75     98.27    100.00
## crim    53.51     53.51     53.51     53.51     53.51
#pls2.pred=predict(pls2.boston, data = x[test2,],ncomp=11)
#mean((pls2.pred-y.test2)^2)
#test2.avg = mean(Boston.test2[, "crim"])
#lm2.test.r2 = 1 - mean((Boston.test2[, "crim"] - pred2.app)^2) /mean((Boston.test2[, "crim"] - test2.avg)^2)
#ridge2.test.r2 = 1 - mean((Boston.test2[, "crim"] - ridge2.pred)^2) /mean((Boston.test2[, "crim"] - test2.avg)^2)
#lasso2.test.r2 = 1 - mean((Boston.test2[, "crim"] - lasso2.pred)^2) /mean((Boston.test2[, "crim"] - test2.avg)^2)
#pcr2.test.r2 = 1 - mean((pcr.pred-y.test)^2) /mean((College.test[, "Apps"] - test.avg)^2)
#pls2.test.r2 = 1 - mean((pls.pred-y.test)^2) /mean((College.test[, "Apps"] - test.avg)^2)