Chapter 06 (page 259): 2, 9, 11
library(MASS)
library(glmnet)
## Loading required package: Matrix
## Loaded glmnet 4.1-6
library(ISLR2)
##
## Attaching package: 'ISLR2'
## The following object is masked from 'package:MASS':
##
## Boston
library(pls)
##
## Attaching package: 'pls'
## The following object is masked from 'package:stats':
##
## loadings
library(leaps)
#2
For parts (a) through (c), indicate which of i. through iv. is correct. Justify your answer.
Relative to least squares, the lasso is less flexible and hence will give improved prediction accuracy when its increase in bias is less than its decrease in variance.
Relative to least squares, the ridge regresson is less flexible and hence will give improved prediction accuracy when its increase in bias is less than its decrease in variance.
Relative to least squares, non-linear methods are more flexible and hence will give improved prediction accuracy when its increase in variance is less than its decrease
#9
In this exercise, we will predict the number of applications received using the other variables in the College data set.
names(College)
## [1] "Private" "Apps" "Accept" "Enroll" "Top10perc"
## [6] "Top25perc" "F.Undergrad" "P.Undergrad" "Outstate" "Room.Board"
## [11] "Books" "Personal" "PhD" "Terminal" "S.F.Ratio"
## [16] "perc.alumni" "Expend" "Grad.Rate"
set.seed(99)
train <- sample(c(TRUE,FALSE),nrow(College),rep=TRUE)
test <- (!train)
lm.fit <- lm(Apps ~ ., data = College[train,])
pred <- predict(lm.fit, College[test,])
mean((pred - College[test,]$Apps)^2)
## [1] 1324037
The MSE for the linear model is: 1324037
set.seed(99)
x <- model.matrix(Apps ~ ., College)
y <- College$Apps
y.test <- y[test]
grid <- 10^seq(10,-2,length=100)
cv.out <- cv.glmnet(x[train,],y[train], alpha=0)
bestlam <- cv.out$lambda.min
ridge.mod <- glmnet(x[train,], y[train], alpha = 0, lambda = grid, thresh = 1e-12)
ridge.pred <- predict(ridge.mod, s=bestlam, newx = x[test,])
mean((ridge.pred-y.test)^2)
## [1] 1969100
The MSE for the ridge model is: 1969100
set.seed(99)
cv.out <- cv.glmnet(x[train,],y[train], alpha=1)
bestlam <- cv.out$lambda.min
lasso.mod <- glmnet(x[train,], y[train], alpha = 1, lambda = grid)
lasso.pred <- predict(lasso.mod, s=bestlam, newx = x[test,])
mean((lasso.pred-y.test)^2)
## [1] 1371963
The MSE for the lasso model is: 1371963
set.seed(99)
pcr_fit <- pcr(Apps~., data=College[train,], scale=TRUE, validation="CV")
validationplot(pcr_fit, val.type = "MSEP")
summary(pcr_fit)
## Data: X dimension: 393 17
## Y dimension: 393 1
## Fit method: svdpc
## Number of components considered: 17
##
## VALIDATION: RMSEP
## Cross-validated using 10 random segments.
## (Intercept) 1 comps 2 comps 3 comps 4 comps 5 comps 6 comps
## CV 3349 3256 1705 1722 1467 1347 1344
## adjCV 3349 3253 1701 1720 1436 1340 1341
## 7 comps 8 comps 9 comps 10 comps 11 comps 12 comps 13 comps
## CV 1239 1218 1251 1269 1270 1269 1272
## adjCV 1215 1212 1244 1262 1264 1262 1265
## 14 comps 15 comps 16 comps 17 comps
## CV 1269 1216 1053 1059
## adjCV 1262 1210 1049 1052
##
## TRAINING: % variance explained
## 1 comps 2 comps 3 comps 4 comps 5 comps 6 comps 7 comps 8 comps
## X 32.953 58.23 65.81 71.45 76.86 81.40 84.92 88.32
## Apps 6.669 75.32 75.37 82.89 85.26 85.29 87.67 87.67
## 9 comps 10 comps 11 comps 12 comps 13 comps 14 comps 15 comps
## X 91.17 93.6 95.61 97.12 98.16 98.88 99.41
## Apps 87.89 87.9 88.01 88.17 88.28 88.35 89.51
## 16 comps 17 comps
## X 99.83 100.00
## Apps 91.53 91.89
pcr_pred <- predict(pcr_fit , College[test , ], ncomp = 10)
mean (( pcr_pred - y.test)^2)
## [1] 2775474
The MSE for the PCR model is: 2775474
pls_fit <- plsr(Apps~. ,data=College[train,] ,scale=TRUE ,validation="CV")
validationplot(pls_fit ,val.type = "MSEP")
summary(pls_fit)
## Data: X dimension: 393 17
## Y dimension: 393 1
## Fit method: kernelpls
## Number of components considered: 17
##
## VALIDATION: RMSEP
## Cross-validated using 10 random segments.
## (Intercept) 1 comps 2 comps 3 comps 4 comps 5 comps 6 comps
## CV 3349 1559 1376 1176 1144 1092 1055
## adjCV 3349 1555 1376 1173 1136 1080 1050
## 7 comps 8 comps 9 comps 10 comps 11 comps 12 comps 13 comps
## CV 1041 1041 1043 1045 1043 1045 1045
## adjCV 1037 1036 1038 1040 1038 1040 1040
## 14 comps 15 comps 16 comps 17 comps
## CV 1044 1044 1044 1044
## adjCV 1039 1039 1039 1039
##
## TRAINING: % variance explained
## 1 comps 2 comps 3 comps 4 comps 5 comps 6 comps 7 comps 8 comps
## X 26.23 50.58 63.51 66.76 69.84 75.24 78.06 82.10
## Apps 79.46 84.26 88.70 90.00 91.21 91.61 91.75 91.79
## 9 comps 10 comps 11 comps 12 comps 13 comps 14 comps 15 comps
## X 84.40 86.41 89.47 91.43 92.77 94.11 96.18
## Apps 91.83 91.86 91.87 91.88 91.88 91.88 91.88
## 16 comps 17 comps
## X 98.90 100.00
## Apps 91.88 91.89
pls_pred <- predict(pls_fit , College[test, ], ncomp = 7)
mean (( pls_pred - y.test)^2)
## [1] 1376845
The error for the PCR model is: 1376845
#11
We will now try to predict per capita crime rate in the Boston data set. (a) Try out some of the regression methods explored in this chapter, such as best subset selection, the lasso, ridge regression, and PCR. Present and discuss results for the approaches that you consider.
attach(Boston)
set.seed(1)
predict.regsubsets <- function(object, newdata, id, ...) {
form <- as.formula(object$call[[2]])
mat <- model.matrix(form, newdata)
coefi <- coef(object, id = id)
mat[, names(coefi)] %*% coefi
}
k = 10
p <- ncol(Boston) - 1
folds <- sample(rep(1:k, length = nrow(Boston)))
cv.errors <- matrix(NA, k, p)
for (i in 1:k) {
best.fit <- regsubsets(crim ~ ., data = Boston[folds != i, ], nvmax = p)
for (j in 1:p) {
pred <- predict(best.fit, Boston[folds == i, ], id = j)
cv.errors[i, j] <- mean((Boston$crim[folds == i] - pred)^2)}}
mean.cv.errors <- apply(cv.errors, 2, mean)
plot(mean.cv.errors, type = "b", xlab = "Predictors", ylab = "Error")
which.min(mean.cv.errors)
## [1] 11
mean.cv.errors[9]
## [1] 42.75855
The Best subset model with 9 predictors displays the lowest Test MSE, which is= 42.81453
set.seed(1)
split <- sample(nrow(Boston), 0.7*nrow(Boston))
train <- Boston[split,]
test <- Boston[-split,]
train.mat <- model.matrix(crim~.,train)
test.mat <- model.matrix(crim~.,test)
grid=10^seq(10,-2,length=100)
bridge.mod <- glmnet(train.mat, train$crim, alpha=0, lambda = grid, thresh=1e-12)
cv.out <- cv.glmnet(train.mat, train$crim, alpha=0, lambda = grid, thresh=1e-12)
lambda <- cv.out$lambda.min
lambda
## [1] 0.07054802
ridge <-glmnet(train.mat, train$crim, alpha=0, lambda = lambda)
coef(ridge)
## 14 x 1 sparse Matrix of class "dgCMatrix"
## s0
## (Intercept) 8.6551478305
## (Intercept) .
## zn 0.0348028903
## indus -0.0778134418
## chas -0.7005827378
## nox -5.8865340096
## rm 0.4652812869
## age -0.0023113130
## dis -0.6847741847
## rad 0.5311924048
## tax -0.0006274497
## ptratio -0.3173369112
## lstat 0.2300573891
## medv -0.1460584759
pred <- predict(ridge, s=lambda, newx=test.mat)
MSE <- mean((pred-test$crim)^2)
MSE
## [1] 57.81492
The MSE for the ridge model is: 59.57622
lasso <- glmnet(train.mat, train$crim, alpha=0,lambda=grid)
cv.lasso <- cv.glmnet(train.mat, train$crim, alpha=0, lambda=grid)
lambda.lasso <- cv.lasso$lambda.min
lambda.lasso
## [1] 0.09326033
lasso1<-glmnet(train.mat, train$crim, alpha=0, lambda=lambda.lasso)
coef(lasso1)
## 14 x 1 sparse Matrix of class "dgCMatrix"
## s0
## (Intercept) 8.2638175782
## (Intercept) .
## zn 0.0343471195
## indus -0.0792217778
## chas -0.6979559379
## nox -5.6728263126
## rm 0.4575823669
## age -0.0022603493
## dis -0.6731592993
## rad 0.5224052360
## tax -0.0002176078
## ptratio -0.3099114914
## lstat 0.2307052204
## medv -0.1431985787
pred.lasso <- predict(lasso, s=lambda.lasso, newx= test.mat)
lasso_MSE <- mean((test$crim - pred.lasso)^2)
lasso_MSE
## [1] 57.88517
The MSE for the lasso model is: 59.91168
pcr.model <- pcr(crim ~ . ,data=train ,scale=TRUE ,validation="CV")
summary(pcr.model)
## Data: X dimension: 354 12
## Y dimension: 354 1
## Fit method: svdpc
## Number of components considered: 12
##
## VALIDATION: RMSEP
## Cross-validated using 10 random segments.
## (Intercept) 1 comps 2 comps 3 comps 4 comps 5 comps 6 comps
## CV 8.085 6.776 6.774 6.422 6.363 6.307 6.279
## adjCV 8.085 6.770 6.768 6.412 6.354 6.299 6.271
## 7 comps 8 comps 9 comps 10 comps 11 comps 12 comps
## CV 6.096 6.090 6.074 6.078 6.068 6.009
## adjCV 6.087 6.078 6.065 6.069 6.058 5.999
##
## TRAINING: % variance explained
## 1 comps 2 comps 3 comps 4 comps 5 comps 6 comps 7 comps 8 comps
## X 50.57 64.47 73.58 80.72 87.14 90.65 93.18 95.07
## crim 31.52 31.89 39.44 40.88 41.82 42.41 45.89 46.09
## 9 comps 10 comps 11 comps 12 comps
## X 96.86 98.34 99.46 100.00
## crim 46.40 46.46 46.94 48.03
validationplot(pcr.model , val.type = "MSEP")
pcr.model1 <- pcr(crim ~ . ,data=train ,scale=TRUE , ncomp = 12)
summary(pcr.model1)
## Data: X dimension: 354 12
## Y dimension: 354 1
## Fit method: svdpc
## Number of components considered: 12
## TRAINING: % variance explained
## 1 comps 2 comps 3 comps 4 comps 5 comps 6 comps 7 comps 8 comps
## X 50.57 64.47 73.58 80.72 87.14 90.65 93.18 95.07
## crim 31.52 31.89 39.44 40.88 41.82 42.41 45.89 46.09
## 9 comps 10 comps 11 comps 12 comps
## X 96.86 98.34 99.46 100.00
## crim 46.40 46.46 46.94 48.03
predict.pcr <- predict(pcr.model ,test ,ncomp=12)
MSE <- mean((test$crim - predict.pcr)^2)
MSE
## [1] 57.61252
The MSE for the PCR model is: 58.97718
Comparing the MSE from the different models applied to the Boston data set, we can observe that the best subset model has the lowest MSE, being 42.81453
the Best Subset Model performs with only 9 predictors, unlike the rest of the models which perform with 13.
detach(Boston)