In this exercise, we will predict the number of applications received using the other variables in the College data set. (a) Split the data set into a training set and a test set.
library(ISLR)
## Warning: package 'ISLR' was built under R version 4.4.2
attach(College)
set.seed(1)
train=sample(c(TRUE,FALSE),nrow(College),rep=TRUE)
test=(!train)
College.train = College[train,]
College.test = College[test,]
lm.fit = lm(Apps~., data=College.train)
lm.pred = predict(lm.fit, College.test, type="response")
mean((lm.pred-College.test$Apps)^2)
## [1] 984743.1
library(glmnet)
## Warning: package 'glmnet' was built under R version 4.4.3
## Loading required package: Matrix
## Loaded glmnet 4.1-10
set.seed(1)
train.mat = model.matrix(Apps~., data = College.train)
test.mat = model.matrix(Apps~., data = College.test)
grid=10^seq(10,-2,length=100)
ridge.mod=glmnet(train.mat,College.train$Apps,alpha=0,lambda=grid)
cv.out = cv.glmnet(train.mat,College.train$Apps,alpha=0)
bestlam = cv.out$lambda.min
bestlam
## [1] 394.2365
summary(ridge.mod)
## Length Class Mode
## a0 100 -none- numeric
## beta 1800 dgCMatrix S4
## df 100 -none- numeric
## dim 2 -none- numeric
## lambda 100 -none- numeric
## dev.ratio 100 -none- numeric
## nulldev 1 -none- numeric
## npasses 1 -none- numeric
## jerr 1 -none- numeric
## offset 1 -none- logical
## call 5 -none- call
## nobs 1 -none- numeric
ridge.pred = predict(ridge.mod,s=bestlam,newx = test.mat)
mean((ridge.pred - College.test$Apps)^2)
## [1] 940753.5
The test error of the ridge regression fit is 940753.5. Less than the test error of the linear model.
lasso.mod=glmnet(train.mat,College.train$Apps,alpha=1,lambda=grid)
summary(lasso.mod)
## Length Class Mode
## a0 100 -none- numeric
## beta 1800 dgCMatrix S4
## df 100 -none- numeric
## dim 2 -none- numeric
## lambda 100 -none- numeric
## dev.ratio 100 -none- numeric
## nulldev 1 -none- numeric
## npasses 1 -none- numeric
## jerr 1 -none- numeric
## offset 1 -none- logical
## call 5 -none- call
## nobs 1 -none- numeric
cv.outl2=cv.glmnet(train.mat,College.train$Apps,alpha=1)
bestlam2=cv.outl2$lambda.min
bestlam2
## [1] 2.309051
lasso.pred=predict(lasso.mod,s=bestlam2,newx = test.mat)
mean((lasso.pred-College.test$Apps)^2)
## [1] 978962.6
The test error of the lasso model is higher than the linear model test error and ridge regression test error.
out=glmnet(train.mat,College.train$Apps,alpha=1,lambda = grid)
lasso.coef=predict(out,type="coefficients",s=bestlam2)[1:18,]
lasso.coef[lasso.coef!=0]
## (Intercept) PrivateYes Accept Enroll Top10perc
## -173.89023228 -649.94882654 1.67083516 -0.75305706 57.37667424
## Top25perc F.Undergrad P.Undergrad Outstate Room.Board
## -20.72078908 0.01792667 0.03670117 -0.08378483 0.10937820
## Books Personal PhD Terminal S.F.Ratio
## 0.05923882 -0.01840355 -9.09196597 -6.37371755 20.24530218
## perc.alumni Expend
## 8.95013487 0.08288697
library(pls)
## Warning: package 'pls' was built under R version 4.4.3
##
## Attaching package: 'pls'
## The following object is masked from 'package:stats':
##
## loadings
pcr.college=pcr(Apps~., data=College.train,scale=TRUE,validation="CV")
summary(pcr.college)
## Data: X dimension: 393 17
## Y dimension: 393 1
## Fit method: svdpc
## Number of components considered: 17
##
## VALIDATION: RMSEP
## Cross-validated using 10 random segments.
## (Intercept) 1 comps 2 comps 3 comps 4 comps 5 comps 6 comps
## CV 4189 4147 2344 2340 2099 2048 1978
## adjCV 4189 4146 2338 2335 2092 2073 1968
## 7 comps 8 comps 9 comps 10 comps 11 comps 12 comps 13 comps
## CV 1972 1949 1889 1893 1895 1906 1913
## adjCV 1964 1933 1875 1881 1884 1892 1900
## 14 comps 15 comps 16 comps 17 comps
## CV 1944 1853 1396 1404
## adjCV 1960 1812 1379 1386
##
## TRAINING: % variance explained
## 1 comps 2 comps 3 comps 4 comps 5 comps 6 comps 7 comps 8 comps
## X 31.858 57.44 64.20 69.91 75.10 80.17 83.82 87.30
## Apps 4.353 70.99 71.18 76.84 78.34 81.03 81.59 82.21
## 9 comps 10 comps 11 comps 12 comps 13 comps 14 comps 15 comps
## X 90.26 92.74 94.79 96.70 97.76 98.67 99.37
## Apps 83.31 83.97 83.97 84.34 84.58 84.70 91.28
## 16 comps 17 comps
## X 99.82 100.00
## Apps 92.83 93.02
validationplot(pcr.college, val.type="MSEP")
pcr.pred=predict(pcr.college,College.test,ncomp=10)
mean((pcr.pred-College.test$Apps)^2)
## [1] 1682909
pls.college=plsr(Apps~., data=College.train,scale=TRUE, validation="CV")
validationplot(pls.college, val.type="MSEP")
summary(pls.college)
## Data: X dimension: 393 17
## Y dimension: 393 1
## Fit method: kernelpls
## Number of components considered: 17
##
## VALIDATION: RMSEP
## Cross-validated using 10 random segments.
## (Intercept) 1 comps 2 comps 3 comps 4 comps 5 comps 6 comps
## CV 4189 2180 1909 1755 1696 1455 1332
## adjCV 4189 2171 1906 1741 1665 1431 1318
## 7 comps 8 comps 9 comps 10 comps 11 comps 12 comps 13 comps
## CV 1310 1301 1298 1298 1297 1298 1297
## adjCV 1298 1289 1287 1286 1285 1286 1285
## 14 comps 15 comps 16 comps 17 comps
## CV 1297 1296 1296 1296
## adjCV 1285 1285 1285 1285
##
## TRAINING: % variance explained
## 1 comps 2 comps 3 comps 4 comps 5 comps 6 comps 7 comps 8 comps
## X 26.01 44.96 62.49 65.22 68.52 72.89 77.13 80.46
## Apps 75.74 82.40 86.74 90.58 92.34 92.79 92.88 92.93
## 9 comps 10 comps 11 comps 12 comps 13 comps 14 comps 15 comps
## X 82.45 84.76 88.08 90.76 92.80 94.45 97.02
## Apps 92.98 93.00 93.01 93.01 93.02 93.02 93.02
## 16 comps 17 comps
## X 98.03 100.00
## Apps 93.02 93.02
pls.pred=predict(pls.college,College.test,ncomp=8)
mean((pls.pred-College.test$Apps)^2)
## [1] 978534.3
pls.pred=predict(pls.college,College.test,ncomp=9)
mean((pls.pred-College.test$Apps)^2)
## [1] 1007163
M=8 was the best performing model.
PLS, linear, and lasso performed similarly. Ridge regression worse than those and PCR the worst by far.
We will now try to predict per capita crime rate in the Boston data set. (a) Try out some of the regression methods explored in this chapter, such as best subset selection, the lasso, ridge regression, and PCR. Present and discuss results for the approaches that you consider.
library(leaps)
## Warning: package 'leaps' was built under R version 4.4.3
library(MASS)
set.seed(1)
attach(Boston)
predict.regsubsets = function(object, newdata, id, ...) {
form = as.formula(object$call[[2]])
mat = model.matrix(form, newdata)
coefi = coef(object, id = id)
mat[, names(coefi)] %*% coefi
}
k = 10
p = ncol(Boston) - 1
folds = sample(rep(1:k, length = nrow(Boston)))
cv.errors = matrix(NA, k, p)
for (i in 1:k) {
best.fit = regsubsets(crim ~ ., data = Boston[folds != i, ], nvmax = p)
for (j in 1:p) {
pred = predict(best.fit, Boston[folds == i, ], id = j)
cv.errors[i, j] = mean((Boston$crim[folds == i] - pred)^2)
}
}
mean.cv.errors <- apply(cv.errors, 2, mean)
plot(mean.cv.errors, type = "b", xlab = "Number of variables", ylab = "CV error")
which.min(mean.cv.errors)
## [1] 9
mean.cv.errors[which.min(mean.cv.errors)]
## [1] 42.81453
x = model.matrix(crim ~ . - 1, data = Boston)
y = Boston$crim
cv.lasso = cv.glmnet(x, y, type.measure = "mse")
plot(cv.lasso)
coef(cv.lasso)
## 14 x 1 sparse Matrix of class "dgCMatrix"
## lambda.1se
## (Intercept) 2.176491
## zn .
## indus .
## chas .
## nox .
## rm .
## age .
## dis .
## rad 0.150484
## tax .
## ptratio .
## black .
## lstat .
## medv .
sqrt(cv.lasso$cvm[cv.lasso$lambda == cv.lasso$lambda.1se])
## [1] 7.921353
x = model.matrix(crim ~ . - 1, data = Boston)
y = Boston$crim
cv.ridge = cv.glmnet(x, y, type.measure = "mse", alpha = 0)
plot(cv.ridge)
coef(cv.ridge)
## 14 x 1 sparse Matrix of class "dgCMatrix"
## lambda.1se
## (Intercept) 1.523899542
## zn -0.002949852
## indus 0.029276741
## chas -0.166526007
## nox 1.874769665
## rm -0.142852604
## age 0.006207995
## dis -0.094547258
## rad 0.045932737
## tax 0.002086668
## ptratio 0.071258052
## black -0.002605281
## lstat 0.035745604
## medv -0.023480540
sqrt(cv.ridge$cvm[cv.ridge$lambda == cv.ridge$lambda.1se])
## [1] 7.669133
pcr.crime = pcr(crim ~ ., data = Boston, scale = TRUE, validation = "CV")
summary(pcr.crime)
## Data: X dimension: 506 13
## Y dimension: 506 1
## Fit method: svdpc
## Number of components considered: 13
##
## VALIDATION: RMSEP
## Cross-validated using 10 random segments.
## (Intercept) 1 comps 2 comps 3 comps 4 comps 5 comps 6 comps
## CV 8.61 7.175 7.180 6.724 6.731 6.727 6.727
## adjCV 8.61 7.174 7.179 6.721 6.725 6.724 6.724
## 7 comps 8 comps 9 comps 10 comps 11 comps 12 comps 13 comps
## CV 6.722 6.614 6.618 6.607 6.598 6.553 6.488
## adjCV 6.718 6.609 6.613 6.602 6.592 6.546 6.481
##
## TRAINING: % variance explained
## 1 comps 2 comps 3 comps 4 comps 5 comps 6 comps 7 comps 8 comps
## X 47.70 60.36 69.67 76.45 82.99 88.00 91.14 93.45
## crim 30.69 30.87 39.27 39.61 39.61 39.86 40.14 42.47
## 9 comps 10 comps 11 comps 12 comps 13 comps
## X 95.40 97.04 98.46 99.52 100.0
## crim 42.55 42.78 43.04 44.13 45.4
The best subset selection model had the lowest cross-validation error.
The model has 11 predictors. By excluding 2, the model has lower variance and higher accuracy.