(a) The lasso, relative to least squares, is: iii. Less
flexible and hence will give improved prediction accuracy when its
increase in bias is less than its decrease in variance.
(b) Ridge regression, relative to least squares, is: iii. Less
flexible and hence will give improved prediction accuracy when its
increase in bias is less than its decrease in variance.
(c) Non-linear methods, relative to least squares, are: ii. More flexible and hence will give improved prediction accuracy when their increase in variance is less than their decrease in bias.
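All three answers rest on the bias-variance trade-off. As a reminder, the expected test error at a point x_0 can be decomposed as below. The notation is the standard one and is an assumption here, not taken from the answers above: \hat{f} is the fitted model and \varepsilon is the irreducible noise.

```latex
\[
E\left[\left(y_0 - \hat{f}(x_0)\right)^2\right]
  = \operatorname{Var}\left(\hat{f}(x_0)\right)
  + \left[\operatorname{Bias}\left(\hat{f}(x_0)\right)\right]^2
  + \operatorname{Var}(\varepsilon)
\]
```

A less flexible method such as the lasso or ridge regression raises the bias term and lowers the variance term, so it helps only when the drop in variance is the larger of the two. For a more flexible method the roles are reversed.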
(a) Split the data set into a training set and a test set.
library(ISLR)
rand_obs <- sample.int(nrow(College), nrow(College)*.7)
train_set <- College[rand_obs,]
test_set <- College[-rand_obs,]
(b) Fit a linear model using least squares on the training set, and report the test error obtained.
lm.fit <- lm(Apps ~ ., data = train_set)
summary(lm.fit)
##
## Call:
## lm(formula = Apps ~ ., data = train_set)
##
## Residuals:
##     Min      1Q  Median      3Q     Max
## -3541.6  -458.7   -37.4   312.3  7418.8
##
## Coefficients:
##               Estimate Std. Error t value Pr(>|t|)
## (Intercept) -144.31603  512.02034  -0.282 0.778165
## PrivateYes  -442.87818  172.42573  -2.569 0.010489 *
## Accept         1.67649    0.04572  36.668  < 2e-16 ***
## Enroll        -0.79910    0.20746  -3.852 0.000132 ***
## Top10perc     48.33376    6.71193   7.201 2.08e-12 ***
## Top25perc    -12.57759    5.36135  -2.346 0.019348 *
## F.Undergrad    0.01079    0.03771   0.286 0.774956
## P.Undergrad    0.06310    0.03576   1.764 0.078231 .
## Outstate      -0.09519    0.02306  -4.127 4.27e-05 ***
## Room.Board     0.12475    0.05684   2.195 0.028622 *
## Books          0.16668    0.28951   0.576 0.565032
## Personal       0.01771    0.07062   0.251 0.802134
## PhD           -8.78289    5.81358  -1.511 0.131453
## Terminal      -4.33909    6.40877  -0.677 0.498669
## S.F.Ratio      2.53563   17.12521   0.148 0.882349
## perc.alumni    1.37125    4.92392   0.278 0.780747
## Expend         0.07091    0.01602   4.426 1.17e-05 ***
## Grad.Rate      8.46122    3.55393   2.381 0.017630 *
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 1034 on 525 degrees of freedom
## Multiple R-squared: 0.936, Adjusted R-squared: 0.934
## F-statistic: 451.9 on 17 and 525 DF, p-value: < 2.2e-16
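The summary above describes the fit on the training data only; the question also asks for the test error. A minimal sketch of that step follows. The seed is an assumption added for reproducibility (the original split is unseeded), so the resulting number will not correspond exactly to the fit shown above.

```r
library(ISLR)

# Assumed seed, not in the original: makes the 70/30 split reproducible
set.seed(1)
rand_obs  <- sample.int(nrow(College), floor(nrow(College) * 0.7))
train_set <- College[rand_obs, ]
test_set  <- College[-rand_obs, ]

# Least squares fit on the training rows
lm.fit <- lm(Apps ~ ., data = train_set)

# Predict on the held-out rows and compute the test mean squared error
lm.pred <- predict(lm.fit, newdata = test_set)
lm.mse  <- mean((lm.pred - test_set$Apps)^2)
lm.mse
```

Taking `sqrt(lm.mse)` gives the error on the scale of the response (number of applications), which is often easier to interpret.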
(c) Fit a ridge regression model on the training set, with λ chosen by cross-validation. Report the test error obtained.
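# Caution: column 1 of the model matrix is the intercept and column 2 is
# PrivateYes, so [,-2] drops the Private indicator; [,-1] would drop only
# the intercept.
x <- model.matrix(Apps~ ., College)[,-2]
y <- College$Apps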
library(glmnet)
## Loading required package: Matrix
## Loaded glmnet 4.1-3
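grid=10^seq(10,-2,length=100)
set.seed(1)
# Split the observations in half: training indices and their complement
train=sample(1:nrow(x),nrow(x)/2)
test=(-train)
y.test=y[test]
# Fit ridge regression (alpha=0) on the training half only
ridge.mod=glmnet(x[train,],y[train],alpha=0,lambda=grid)
# Choose lambda by 10-fold cross-validation on the training half
cv.out=cv.glmnet(x[train,],y[train],alpha=0)
plot(cv.out)
bestlam=cv.out$lambda.min
bestlam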
## [1] 405.8404
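The cross-validated λ above still has to be turned into a test error. A minimal sketch of that step is below. It assumes the model matrix drops only the intercept column (`[, -1]`), which differs from the `[, -2]` used above, so its λ and error need not match the printed `bestlam`. The names `cv.ridge`, `ridge.pred`, and `ridge.mse` are introduced here for illustration.

```r
library(ISLR)
library(glmnet)

# Assumption: drop only the intercept column, keeping the Private indicator
x <- model.matrix(Apps ~ ., College)[, -1]
y <- College$Apps

set.seed(1)
train <- sample(1:nrow(x), nrow(x) / 2)
test  <- -train

# Ridge regression (alpha = 0) with lambda chosen by cross-validation
cv.ridge <- cv.glmnet(x[train, ], y[train], alpha = 0)

# Predict on the held-out half at the lambda with the lowest CV error
ridge.pred <- predict(cv.ridge, s = "lambda.min", newx = x[test, ])
ridge.mse  <- mean((ridge.pred - y[test])^2)
ridge.mse
```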
(d) Fit a lasso model on the training set, with λ chosen by cross-validation. Report the test error obtained, along with the number of non-zero coefficient estimates.
lasso.mod=glmnet(x[train,],y[train],alpha=1,lambda=grid)
plot(lasso.mod)
## Warning in regularize.values(x, y, ties, missing(ties), na.rm = na.rm):
## collapsing to unique 'x' values
set.seed(1)
cv.out=cv.glmnet(x[train,],y[train],alpha=1)
plot(cv.out)
bestlam =cv.out$lambda.min
lasso.pred=predict(lasso.mod ,s=bestlam ,newx=x[test ,])
mean((lasso.pred - y.test)^2)
## [1] 1140717
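The question also asks for the number of non-zero coefficient estimates, which the output above does not show. One way to count them is sketched below, under the same assumptions as a standard glmnet workflow: the model matrix drops only the intercept column (`[, -1]`, unlike the `[, -2]` used earlier), and `cv.lasso`, `lasso.coef`, and `n.nonzero` are illustrative names.

```r
library(ISLR)
library(glmnet)

# Assumption: drop only the intercept column of the model matrix
x <- model.matrix(Apps ~ ., College)[, -1]
y <- College$Apps

set.seed(1)
train <- sample(1:nrow(x), nrow(x) / 2)

# Lasso (alpha = 1) with lambda chosen by cross-validation on the training half
cv.lasso <- cv.glmnet(x[train, ], y[train], alpha = 1)

# Coefficients at the selected lambda, returned as a sparse one-column matrix
lasso.coef <- coef(cv.lasso, s = "lambda.min")

# Count non-zero estimates, excluding the intercept in the first row
n.nonzero <- sum(lasso.coef[-1, 1] != 0)
n.nonzero
```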
(e) Fit a PCR model on the training set, with M chosen by cross-validation. Report the test error obtained, along with the value of M selected by cross-validation.
library(pls)
##
## Attaching package: 'pls'
## The following object is masked from 'package:stats':
##
## loadings
pcr.fit=pcr(Apps ~ ., data = College, scale=TRUE, validation ="CV")
summary(pcr.fit)
## Data: X dimension: 777 17
## Y dimension: 777 1
## Fit method: svdpc
## Number of components considered: 17
##
## VALIDATION: RMSEP
## Cross-validated using 10 random segments.
##        (Intercept)  1 comps  2 comps  3 comps  4 comps  5 comps  6 comps
## CV            3873     3836     2035     2041     1784     1591     1595
## adjCV         3873     3838     2033     2040     1720     1585     1592
##        7 comps  8 comps  9 comps  10 comps  11 comps  12 comps  13 comps
## CV        1581     1557     1514      1513      1515      1514      1520
## adjCV     1582     1551     1511      1509      1512      1510      1516
##        14 comps  15 comps  16 comps  17 comps
## CV         1521      1466      1176      1139
## adjCV      1518      1450      1169      1133
##
## TRAINING: % variance explained
##       1 comps  2 comps  3 comps  4 comps  5 comps  6 comps  7 comps  8 comps
## X      31.670    57.30    64.30    69.90    75.39    80.38    83.99    87.40
## Apps    2.316    73.06    73.07    82.08    84.08    84.11    84.32    85.18
##       9 comps  10 comps  11 comps  12 comps  13 comps  14 comps  15 comps
## X       90.50     92.91     95.01     96.81      97.9     98.75     99.36
## Apps    85.88     86.06     86.06     86.10      86.1     86.13     90.32
##       16 comps  17 comps
## X        99.84    100.00
## Apps     92.52     92.92
validationplot(pcr.fit,val.type="MSEP")
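The fit above uses all 777 observations, so it does not yet give a test error. A minimal sketch of the train/test version follows, assuming the same half split as in (c). The names `pcr.train`, `cv.rmsep`, `M`, and `pcr.mse` are introduced here for illustration, and picking M at the minimum of the CV curve is one reasonable rule among several.

```r
library(ISLR)
library(pls)

set.seed(1)
train <- sample(1:nrow(College), nrow(College) / 2)

# PCR on the training half, with 10-fold cross-validation
pcr.train <- pcr(Apps ~ ., data = College, subset = train,
                 scale = TRUE, validation = "CV")

# CV RMSEP for 0..17 components; the first entry is the intercept-only model
cv.rmsep <- RMSEP(pcr.train, estimate = "CV")$val[1, 1, ]

# Number of components with the lowest CV error (at least one component)
M <- max(1, which.min(cv.rmsep) - 1)

# Test error on the held-out half using M components
pcr.pred <- predict(pcr.train, newdata = College[-train, ], ncomp = M)
pcr.mse  <- mean((pcr.pred - College$Apps[-train])^2)
c(M = M, test.mse = pcr.mse)
```

If M comes out equal to the number of predictors (17), PCR performs no dimension reduction and is equivalent to least squares on the same training rows.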
(f) Fit a PLS model on the training set, with M chosen by
cross-validation. Report the test error obtained, along with the value
of M selected by cross-validation.
set.seed(1)
pls.fit=plsr(Apps ~ .,data=College, subset=train,scale=TRUE,validation ="CV")
summary (pls.fit)
## Data: X dimension: 388 17
## Y dimension: 388 1
## Fit method: kernelpls
## Number of components considered: 17
##
## VALIDATION: RMSEP
## Cross-validated using 10 random segments.
##        (Intercept)  1 comps  2 comps  3 comps  4 comps  5 comps  6 comps
## CV            4288     2217     2019     1761     1630     1533     1347
## adjCV         4288     2211     2012     1749     1605     1510     1331
##        7 comps  8 comps  9 comps  10 comps  11 comps  12 comps  13 comps
## CV        1309     1303     1286      1283      1283      1277      1271
## adjCV     1296     1289     1273      1270      1270      1264      1258
##        14 comps  15 comps  16 comps  17 comps
## CV         1270      1270      1270      1270
## adjCV      1258      1257      1257      1257
##
## TRAINING: % variance explained
##       1 comps  2 comps  3 comps  4 comps  5 comps  6 comps  7 comps  8 comps
## X       27.21    50.73    63.06    65.52    70.20    74.20    78.62    80.81
## Apps    75.39    81.24    86.97    91.14    92.62    93.43    93.56    93.68
##       9 comps  10 comps  11 comps  12 comps  13 comps  14 comps  15 comps
## X       83.29     87.17     89.15     91.37     92.58     94.42     96.98
## Apps    93.76     93.79     93.83     93.86     93.88     93.89     93.89
##       16 comps  17 comps
## X        98.78    100.00
## Apps     93.89     93.89
validationplot(pls.fit,val.type="MSEP")
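The cross-validation table above gives the error for each number of components, but not the test error. A minimal sketch of that last step follows, assuming the same half split as in (c). The names `pls.train`, `cv.rmsep`, `M`, and `pls.mse` are illustrative. Because the CV curve is nearly flat over a wide range of M, a smaller M than the strict minimum may be an equally defensible choice.

```r
library(ISLR)
library(pls)

set.seed(1)
train <- sample(1:nrow(College), nrow(College) / 2)

# PLS on the training half, with 10-fold cross-validation
pls.train <- plsr(Apps ~ ., data = College, subset = train,
                  scale = TRUE, validation = "CV")

# CV RMSEP for 0..17 components; the first entry is the intercept-only model
cv.rmsep <- RMSEP(pls.train, estimate = "CV")$val[1, 1, ]

# Number of components with the lowest CV error (at least one component)
M <- max(1, which.min(cv.rmsep) - 1)

# Test error on the held-out half using M components
pls.pred <- predict(pls.train, newdata = College[-train, ], ncomp = M)
pls.mse  <- mean((pls.pred - College$Apps[-train])^2)
c(M = M, test.mse = pls.mse)
```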
(g) Comment on the results obtained. How accurately can we
predict the number of college applications received? Is there much
difference among the test errors resulting from these five
approaches?
There is not much difference in test error among the five approaches, but
ridge regression appears to perform worst.
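To answer "how accurately" on a scale that is comparable across methods, one option is the test-set R², the share of variance in the held-out response explained by the predictions. The helper below is a hypothetical sketch (the name `test_r2` is not from the original), shown on made-up numbers rather than the College data.

```r
# Hypothetical helper: test-set R^2 from predictions and observed values
test_r2 <- function(pred, actual) {
  rss <- sum((actual - pred)^2)            # residual sum of squares
  tss <- sum((actual - mean(actual))^2)    # total sum of squares
  1 - rss / tss
}

# Toy check with made-up numbers (not from the College data)
actual <- c(10, 20, 30, 40)
pred   <- c(12, 18, 33, 39)
test_r2(pred, actual)   # 1 - 18/500 = 0.964
```

Applied to each method's test predictions and the held-out `Apps` values, this gives one number per method that can be compared directly.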