Problem 2
For parts (a) through (c), indicate which of i. through iv. is correct. Justify your answer.
(a) The lasso, relative to least squares, is less flexible and hence will give improved prediction accuracy when its increase in bias is less than its decrease in variance (answer iii). When the least squares estimates have excessively high variance, the lasso solution can yield a reduction in variance at the expense of a small increase in bias, and consequently can generate more accurate predictions.
(b) Ridge regression, relative to least squares, is likewise less flexible and will give improved prediction accuracy when its increase in bias is less than its decrease in variance (answer iii). Ridge regression’s advantage over least squares is rooted in the bias-variance trade-off: as λ increases, the flexibility of the ridge regression fit decreases, leading to decreased variance but increased bias.
(c) Non-linear methods, relative to least squares, are more flexible and hence will give improved prediction accuracy when their increase in variance is less than their decrease in bias (answer ii). Being more flexible than least squares, non-linear methods are less biased.
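As a small illustration of this trade-off (not part of the original answer; the simulated data below are made up purely for illustration), the sketch fits ridge and lasso with glmnet and plots the coefficient paths: as λ grows, both methods become less flexible and shrink the coefficient estimates toward zero.
library(glmnet)
set.seed(1)
n=100; p=10
X=matrix(rnorm(n*p), n, p)
beta=c(3, -2, 1.5, rep(0, p-3))      # only three truly non-zero coefficients
y.sim=drop(X %*% beta + rnorm(n))
par(mfrow=c(1,2))
plot(glmnet(X, y.sim, alpha=0), xvar="lambda", main="Ridge")   # smooth shrinkage toward zero
plot(glmnet(X, y.sim, alpha=1), xvar="lambda", main="Lasso")   # coefficients hit exactly zero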
Problem 9
In this exercise, we will predict the number of applications received using the other variables in the College data set.
library(ISLR)
attach(College)
x=model.matrix(Apps~.,College)[,-1]
y=College$Apps
set.seed(10)
train=sample(1:nrow(x), nrow(x)/2)
test=(-train)
College.train = College[train, ]
College.test = College[test, ]
y.test=y[test]
LS.fit<-lm(Apps~., data=College, subset=train)
summary(LS.fit)
##
## Call:
## lm(formula = Apps ~ ., data = College, subset = train)
##
## Residuals:
## Min 1Q Median 3Q Max
## -5139.5 -473.3 -21.1 353.2 7402.7
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) -629.36179 639.35741 -0.984 0.325579
## PrivateYes -647.56836 192.17056 -3.370 0.000832 ***
## Accept 1.68912 0.05038 33.530 < 2e-16 ***
## Enroll -1.02383 0.27721 -3.693 0.000255 ***
## Top10perc 48.19124 8.10714 5.944 6.42e-09 ***
## Top25perc -10.51538 6.44952 -1.630 0.103865
## F.Undergrad 0.01992 0.05364 0.371 0.710574
## P.Undergrad 0.04213 0.05348 0.788 0.431373
## Outstate -0.09489 0.02674 -3.549 0.000436 ***
## Room.Board 0.14549 0.07243 2.009 0.045277 *
## Books 0.06660 0.31115 0.214 0.830623
## Personal 0.05663 0.09453 0.599 0.549475
## PhD -10.11489 7.11588 -1.421 0.156027
## Terminal -2.29300 8.03546 -0.285 0.775528
## S.F.Ratio 22.07117 18.70991 1.180 0.238897
## perc.alumni 2.08121 6.00673 0.346 0.729179
## Expend 0.07654 0.01672 4.577 6.45e-06 ***
## Grad.Rate 9.99706 4.49821 2.222 0.026857 *
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 1092 on 370 degrees of freedom
## Multiple R-squared: 0.9395, Adjusted R-squared: 0.9367
## F-statistic: 338 on 17 and 370 DF, p-value: < 2.2e-16
pred.app<-predict(LS.fit, College.test)
test.error<-mean((College.test$Apps-pred.app)^2)
test.error
## [1] 1020100
The test error is 1,020,100.
library(glmnet)
## Warning: package 'glmnet' was built under R version 4.3.3
## Loading required package: Matrix
## Loaded glmnet 4.1-10
grid=10^seq(10,-2,length=100)
ridge.mod=glmnet(x[train,],y[train],alpha=0,lambda=grid)
summary(ridge.mod)
## Length Class Mode
## a0 100 -none- numeric
## beta 1700 dgCMatrix S4
## df 100 -none- numeric
## dim 2 -none- numeric
## lambda 100 -none- numeric
## dev.ratio 100 -none- numeric
## nulldev 1 -none- numeric
## npasses 1 -none- numeric
## jerr 1 -none- numeric
## offset 1 -none- logical
## call 5 -none- call
## nobs 1 -none- numeric
cv.college.out=cv.glmnet(x[train,],y[train] ,alpha=0)
bestlam=cv.college.out$lambda.min
bestlam
## [1] 411.3927
ridge.pred=predict(ridge.mod,s=bestlam,newx=x[test,])
mean((ridge.pred-y.test)^2)
## [1] 985020.1
The test error is 985,020.1.
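The cross-validation curve behind this choice of λ is not shown above; if desired, it can be drawn directly from the cv.glmnet object (sketch, plot omitted here):
plot(cv.college.out)   # CV mean-squared error versus log(lambda), with lambda.min marked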
lasso.mod=glmnet(x[train,],y[train],alpha=1,lambda=grid)
summary(lasso.mod)
## Length Class Mode
## a0 100 -none- numeric
## beta 1700 dgCMatrix S4
## df 100 -none- numeric
## dim 2 -none- numeric
## lambda 100 -none- numeric
## dev.ratio 100 -none- numeric
## nulldev 1 -none- numeric
## npasses 1 -none- numeric
## jerr 1 -none- numeric
## offset 1 -none- logical
## call 5 -none- call
## nobs 1 -none- numeric
cv.out=cv.glmnet(x[train,],y[train],alpha=1)
bestlam=cv.out$lambda.min
bestlam
## [1] 24.66235
lasso.pred=predict(lasso.mod,s=bestlam,newx=x[test,])
mean((lasso.pred-y.test)^2)
## [1] 1008145
The test error is 1,008,145.
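The lasso's main practical difference from ridge is that it can set coefficients exactly to zero; how many predictors survive at the cross-validated λ is not reported above, but it can be checked as follows (sketch, output omitted):
lasso.coef=predict(lasso.mod, type="coefficients", s=bestlam)[,1]   # named coefficient vector
lasso.coef[lasso.coef!=0]                                           # predictors the lasso keeps
sum(lasso.coef!=0)-1                                                # count, excluding the intercept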
library(pls)
## Warning: package 'pls' was built under R version 4.3.3
##
## Attaching package: 'pls'
## The following object is masked from 'package:stats':
##
## loadings
pcr.college=pcr(Apps~., data=College.train,scale=TRUE,validation="CV")
summary(pcr.college)
## Data: X dimension: 388 17
## Y dimension: 388 1
## Fit method: svdpc
## Number of components considered: 17
##
## VALIDATION: RMSEP
## Cross-validated using 10 random segments.
## (Intercept) 1 comps 2 comps 3 comps 4 comps 5 comps 6 comps
## CV 4347 4345 2371 2391 2104 1949 1898
## adjCV 4347 4345 2368 2396 2085 1939 1891
## 7 comps 8 comps 9 comps 10 comps 11 comps 12 comps 13 comps
## CV 1899 1880 1864 1861 1870 1873 1891
## adjCV 1893 1862 1857 1853 1862 1865 1885
## 14 comps 15 comps 16 comps 17 comps
## CV 1903 1727 1295 1260
## adjCV 1975 1669 1283 1249
##
## TRAINING: % variance explained
## 1 comps 2 comps 3 comps 4 comps 5 comps 6 comps 7 comps 8 comps
## X 32.6794 56.94 64.38 70.61 76.27 80.97 84.48 87.54
## Apps 0.9148 71.17 71.36 79.85 81.49 82.73 82.79 83.70
## 9 comps 10 comps 11 comps 12 comps 13 comps 14 comps 15 comps
## X 90.50 92.89 94.96 96.81 97.97 98.73 99.39
## Apps 83.86 84.08 84.11 84.11 84.16 84.28 93.08
## 16 comps 17 comps
## X 99.86 100.00
## Apps 93.71 93.95
validationplot(pcr.college, val.type="MSEP")
pcr.pred=predict(pcr.college,x[test,],ncomp=10)
mean((pcr.pred-y.test)^2)
## [1] 1422699
With M = 10 components, the test error is 1,422,699.
pls.college=plsr(Apps~., data=College.train,scale=TRUE, validation="CV")
validationplot(pls.college, val.type="MSEP")
summary(pls.college)
## Data: X dimension: 388 17
## Y dimension: 388 1
## Fit method: kernelpls
## Number of components considered: 17
##
## VALIDATION: RMSEP
## Cross-validated using 10 random segments.
## (Intercept) 1 comps 2 comps 3 comps 4 comps 5 comps 6 comps
## CV 4347 2178 1872 1734 1615 1453 1359
## adjCV 4347 2171 1867 1726 1586 1427 1341
## 7 comps 8 comps 9 comps 10 comps 11 comps 12 comps 13 comps
## CV 1347 1340 1329 1317 1310 1305 1305
## adjCV 1330 1324 1314 1302 1296 1291 1291
## 14 comps 15 comps 16 comps 17 comps
## CV 1305 1307 1307 1307
## adjCV 1291 1292 1293 1293
##
## TRAINING: % variance explained
## 1 comps 2 comps 3 comps 4 comps 5 comps 6 comps 7 comps 8 comps
## X 24.27 38.72 62.64 65.26 69.01 73.96 78.86 82.18
## Apps 76.96 84.31 86.80 91.48 93.37 93.75 93.81 93.84
## 9 comps 10 comps 11 comps 12 comps 13 comps 14 comps 15 comps
## X 85.35 87.42 89.18 91.41 92.70 94.58 97.16
## Apps 93.88 93.91 93.93 93.94 93.95 93.95 93.95
## 16 comps 17 comps
## X 98.15 100.00
## Apps 93.95 93.95
pls.pred=predict(pls.college,x[test,],ncomp=9)
mean((pls.pred-y.test)^2)
## [1] 1049868
With M = 9 components, the test error is 1,049,868.
Ridge regression has the lowest test error (985,020.1) and may be the best model. Apart from PCR, which is noticeably worse, there is not much difference among the test errors.
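As a supplementary check (not part of the original write-up), the five fits can also be compared on a common scale through their test R-squared, 1 - RSS/TSS on the held-out data, computed relative to simply predicting the test-set mean:
test.avg=mean(y.test)
test.r2=function(pred) 1-mean((y.test-pred)^2)/mean((y.test-test.avg)^2)   # test R-squared
c(LS=test.r2(pred.app), Ridge=test.r2(ridge.pred), Lasso=test.r2(lasso.pred),
  PCR=test.r2(pcr.pred), PLS=test.r2(pls.pred))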
Problem 11
We will now try to predict per capita crime rate in the Boston data set.
library(MASS)
attach(Boston)
x=model.matrix(crim~.,Boston)[,-1]
y=Boston$crim
set.seed(10)
train=sample(1:nrow(x), nrow(x)/2)
test=(-train)
Boston.train = Boston[train, ]
Boston.test = Boston[test, ]
y.test=y[test]
Linear:
LS.fit<-lm(crim~., data=Boston, subset=train)
summary(LS.fit)
##
## Call:
## lm(formula = crim ~ ., data = Boston, subset = train)
##
## Residuals:
## Min 1Q Median 3Q Max
## -8.440 -1.772 -0.274 0.902 56.481
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 16.608332 8.075136 2.057 0.0408 *
## zn 0.036330 0.023112 1.572 0.1173
## indus -0.092007 0.105736 -0.870 0.3851
## chas -1.201144 1.318540 -0.911 0.3632
## nox -6.136130 5.726461 -1.072 0.2850
## rm -0.273063 0.624231 -0.437 0.6622
## age 0.007418 0.019710 0.376 0.7070
## dis -0.605597 0.314563 -1.925 0.0554 .
## rad 0.532056 0.094632 5.622 5.24e-08 ***
## tax -0.002360 0.006015 -0.392 0.6952
## ptratio -0.222514 0.219675 -1.013 0.3121
## black -0.013073 0.004510 -2.899 0.0041 **
## lstat 0.133088 0.083231 1.599 0.1111
## medv -0.111278 0.068103 -1.634 0.1036
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 5.255 on 239 degrees of freedom
## Multiple R-squared: 0.5351, Adjusted R-squared: 0.5098
## F-statistic: 21.16 on 13 and 239 DF, p-value: < 2.2e-16
pred.crim<-predict(LS.fit, Boston.test)
test.error<-mean((Boston.test$crim-pred.crim)^2)
test.error
## [1] 55.60279
The test error is 55.60279.
Ridge Regression:
library(glmnet)
grid=10^seq(10,-2,length=100)
ridge.mod=glmnet(x[train,],y[train],alpha=0,lambda=grid)
summary(ridge.mod)
## Length Class Mode
## a0 100 -none- numeric
## beta 1300 dgCMatrix S4
## df 100 -none- numeric
## dim 2 -none- numeric
## lambda 100 -none- numeric
## dev.ratio 100 -none- numeric
## nulldev 1 -none- numeric
## npasses 1 -none- numeric
## jerr 1 -none- numeric
## offset 1 -none- logical
## call 5 -none- call
## nobs 1 -none- numeric
cv.boston.out=cv.glmnet(x[train,],y[train] ,alpha=0)
bestlam=cv.boston.out$lambda.min
bestlam
## [1] 0.5060121
ridge.pred=predict(ridge.mod,s=bestlam,newx=x[test,])
mean((ridge.pred-y.test)^2)
## [1] 56.33914
The test error is 56.33914.
Lasso:
lasso.mod=glmnet(x[train,],y[train],alpha=1,lambda=grid)
summary(lasso.mod)
## Length Class Mode
## a0 100 -none- numeric
## beta 1300 dgCMatrix S4
## df 100 -none- numeric
## dim 2 -none- numeric
## lambda 100 -none- numeric
## dev.ratio 100 -none- numeric
## nulldev 1 -none- numeric
## npasses 1 -none- numeric
## jerr 1 -none- numeric
## offset 1 -none- logical
## call 5 -none- call
## nobs 1 -none- numeric
cv.out=cv.glmnet(x[train,],y[train],alpha=1)
bestlam=cv.out$lambda.min
bestlam
## [1] 0.04830131
lasso.pred=predict(lasso.mod,s=bestlam,newx=x[test,])
mean((lasso.pred-y.test)^2)
## [1] 55.95196
The test error is 55.95196.
PCR Model:
library(pls)
pcr.boston=pcr(crim~., data=Boston.train,scale=TRUE,validation="CV")
summary(pcr.boston)
## Data: X dimension: 253 13
## Y dimension: 253 1
## Fit method: svdpc
## Number of components considered: 13
##
## VALIDATION: RMSEP
## Cross-validated using 10 random segments.
## (Intercept) 1 comps 2 comps 3 comps 4 comps 5 comps 6 comps
## CV 7.521 6.068 6.014 5.627 5.523 5.562 5.631
## adjCV 7.521 6.066 6.013 5.619 5.516 5.555 5.621
## 7 comps 8 comps 9 comps 10 comps 11 comps 12 comps 13 comps
## CV 5.644 5.556 5.443 5.444 5.465 5.447 5.404
## adjCV 5.633 5.539 5.426 5.429 5.451 5.431 5.388
##
## TRAINING: % variance explained
## 1 comps 2 comps 3 comps 4 comps 5 comps 6 comps 7 comps 8 comps
## X 46.57 58.79 68.40 76.02 82.37 87.52 90.99 93.28
## crim 35.21 36.53 45.03 47.18 47.30 47.35 47.35 49.85
## 9 comps 10 comps 11 comps 12 comps 13 comps
## X 95.35 97.13 98.44 99.51 100.00
## crim 51.98 51.99 52.05 52.64 53.51
validationplot(pcr.boston, val.type="MSEP")
pcr.pred=predict(pcr.boston,x[test,],ncomp=10)
mean((pcr.pred-y.test)^2)
## [1] 57.77502
With M = 10 components, the test error is 57.77502.
PLS Model:
pls.boston=plsr(crim~., data=Boston.train,scale=TRUE, validation="CV")
validationplot(pls.boston, val.type="MSEP")
summary(pls.boston)
## Data: X dimension: 253 13
## Y dimension: 253 1
## Fit method: kernelpls
## Number of components considered: 13
##
## VALIDATION: RMSEP
## Cross-validated using 10 random segments.
## (Intercept) 1 comps 2 comps 3 comps 4 comps 5 comps 6 comps
## CV 7.521 5.910 5.565 5.567 5.549 5.509 5.498
## adjCV 7.521 5.905 5.554 5.546 5.525 5.489 5.477
## 7 comps 8 comps 9 comps 10 comps 11 comps 12 comps 13 comps
## CV 5.521 5.510 5.505 5.505 5.505 5.505 5.505
## adjCV 5.497 5.488 5.483 5.483 5.483 5.484 5.484
##
## TRAINING: % variance explained
## 1 comps 2 comps 3 comps 4 comps 5 comps 6 comps 7 comps 8 comps
## X 46.03 56.08 62.49 69.86 77.37 79.54 84.48 87.27
## crim 40.25 49.70 51.99 52.95 53.12 53.41 53.47 53.50
## 9 comps 10 comps 11 comps 12 comps 13 comps
## X 91.38 93.60 96.75 98.27 100.00
## crim 53.51 53.51 53.51 53.51 53.51
pls.pred=predict(pls.boston,x[test,],ncomp=9)
mean((pls.pred-y.test)^2)
## [1] 55.56646
With M = 9 components, the test error is 55.56646.
The PLS model has the lowest test error (55.56646) and may be the best model; least squares (55.60279) and the lasso (55.95196) are close behind, while PCR performs worst.
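Unlike the lasso, the PLS fit does not set any coefficients exactly to zero, so the proposed model involves all 13 predictors; their relative weights in the 9-component fit can be inspected directly (sketch, output omitted):
coef(pls.boston, ncomp=9)   # coefficients of the 9-component PLS fit (predictors standardized via scale=TRUE)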