(a) The lasso, relative to least squares, is:
Less flexible and hence will give improved prediction accuracy when its increase in bias is less than its decrease in variance.
(b) Repeat (a) for ridge regression relative to least
squares.
Less flexible and hence will give improved prediction accuracy when its increase in bias is less than its decrease in variance.
(c) Repeat (a) for non-linear methods relative to least
squares.
More flexible and hence will give improved prediction accuracy when its increase in variance is less than its decrease in bias.
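All three answers follow from the standard bias-variance decomposition of the expected test MSE, which splits the error into a variance term, a squared-bias term, and irreducible noise:

\[
E\!\left[\left(y_0 - \hat{f}(x_0)\right)^2\right]
  = \mathrm{Var}\!\left(\hat{f}(x_0)\right)
  + \left[\mathrm{Bias}\!\left(\hat{f}(x_0)\right)\right]^2
  + \mathrm{Var}(\varepsilon).
\]

A less flexible method (the lasso, ridge) trades a small increase in bias for a larger drop in variance; a more flexible non-linear method makes the opposite trade.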
(a) Split the data set into a training set and a test set.
library(ISLR)
data(College)
set.seed(11)
# split the observations in half at random
train <- sample(1:dim(College)[1], dim(College)[1] / 2)
test <- -train
College.train <- College[train, ]
College.test <- College[test, ]
(b) Fit a linear model using least squares on the training set, and report the test error obtained.
fit.lm <- lm(Apps ~ ., data = College.train)
pred.lm <- predict(fit.lm, College.test)
mean((pred.lm - College.test$Apps)^2)
## [1] 1026096
(c) Fit a ridge regression model on the training set, with λ chosen by cross-validation. Report the test error obtained.
library(Matrix)
library(MatrixModels)
library(glmnet)
## Loaded glmnet 4.1-7
train.matrix <- model.matrix(Apps ~ ., data = College.train)
test.matrix <- model.matrix(Apps ~ ., data = College.test)
grid <- 10 ^ seq(4, -2, length = 100)
fit.ridge <- glmnet(train.matrix, College.train$Apps, alpha = 0, lambda = grid, thresh = 1e-12)
cv.ridge <- cv.glmnet(train.matrix, College.train$Apps, alpha = 0, lambda = grid, thresh = 1e-12)
bestlambda.ridge <- cv.ridge$lambda.min
bestlambda.ridge
## [1] 0.01
pred.ridge <- predict(fit.ridge, s = bestlambda.ridge, newx = test.matrix)
mean((pred.ridge - College.test$Apps)^2)
## [1] 1026069
The ridge regression test error (1,026,069) is essentially identical to, in fact marginally lower than, the least squares test error (1,026,096).
(d) Fit a lasso model on the training set, with λ chosen by cross-validation. Report the test error obtained, along with the number of non-zero coefficient estimates.
fit.lasso <- glmnet(train.matrix, College.train$Apps, alpha = 1, lambda = grid, thresh = 1e-12)
cv.lasso <- cv.glmnet(train.matrix, College.train$Apps, alpha = 1, lambda = grid, thresh = 1e-12)
bestlambda.lasso <- cv.lasso$lambda.min
bestlambda.lasso
## [1] 0.01
pred.lasso <- predict(fit.lasso, s = bestlambda.lasso, newx = test.matrix)
mean((pred.lasso - College.test$Apps)^2)
## [1] 1026036
predict(fit.lasso, s = bestlambda.lasso, type = "coefficients")
## 19 x 1 sparse Matrix of class "dgCMatrix"
## s1
## (Intercept) 37.86520037
## (Intercept) .
## PrivateYes -551.14946609
## Accept 1.74980812
## Enroll -1.36005786
## Top10perc 65.55655577
## Top25perc -22.52640339
## F.Undergrad 0.10181853
## P.Undergrad 0.01789131
## Outstate -0.08706371
## Room.Board 0.15384585
## Books -0.12227313
## Personal 0.16194591
## PhD -14.29638634
## Terminal -1.03118224
## S.F.Ratio 4.47956819
## perc.alumni -0.45456280
## Expend 0.05618050
## Grad.Rate 9.07242834
The lasso test error (1,026,036) is likewise essentially identical to, and marginally lower than, the least squares test error.
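Part (d) also asks for the number of non-zero coefficient estimates. A quick way to count them from the sparse coefficient matrix printed above (a small sketch reusing the fitted objects; the resulting count of 18 includes the intercept, so 17 predictors are retained at the selected λ):

# coefficients at the CV-selected lambda
lasso.coef <- predict(fit.lasso, s = bestlambda.lasso, type = "coefficients")
# number of entries not shrunk exactly to zero (includes the intercept)
sum(lasso.coef != 0)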
(e) Fit a PCR model on the training set, with M chosen by cross-validation. Report the test error obtained, along with the value of M selected by cross-validation.
library(pls)
##
## Attaching package: 'pls'
## The following object is masked from 'package:stats':
##
## loadings
fit.pcr <- pcr(Apps ~ ., data = College.train, scale = TRUE, validation = "CV")
validationplot(fit.pcr, val.type = "MSEP")
pred.pcr <- predict(fit.pcr, College.test, ncomp = 10)
mean((pred.pcr - College.test$Apps)^2)
## [1] 1867486
With M = 10 components, the test error for PCR (1,867,486) is substantially higher than for least squares.
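The M = 10 used above is read off the validation plot. To extract the CV-minimizing number of components programmatically, one option is the `MSEP` accessor from pls (a sketch; the index is shifted by one because the first entry corresponds to the 0-component, intercept-only model):

cv.msep.pcr <- MSEP(fit.pcr, estimate = "CV")
which.min(cv.msep.pcr$val) - 1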
(f) Fit a PLS model on the training set, with M chosen by cross-validation. Report the test error obtained, along with the value of M selected by cross-validation.
fit.pls <- plsr(Apps ~ ., data = College.train, scale = TRUE, validation = "CV")
validationplot(fit.pls, val.type = "MSEP")
pred.pls <- predict(fit.pls, College.test, ncomp = 10)
mean((pred.pls - College.test$Apps)^2)
## [1] 1031287
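The PLS test MSE (1,031,287) is only slightly higher than that of least squares. As with PCR, the M requested in (f) can be read from the validation plot or extracted programmatically (the same sketch as above, applied to the PLS fit):

cv.msep.pls <- MSEP(fit.pls, estimate = "CV")
which.min(cv.msep.pls$val) - 1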
(g) Comment on the results obtained. How accurately can we predict the number of college applications received? Is there much difference among the test errors resulting from these five approaches?
test.avg <- mean(College.test$Apps)
lm.r2 <- 1 - mean((pred.lm - College.test$Apps)^2) / mean((test.avg - College.test$Apps)^2)
ridge.r2 <- 1 - mean((pred.ridge - College.test$Apps)^2) / mean((test.avg - College.test$Apps)^2)
lasso.r2 <- 1 - mean((pred.lasso - College.test$Apps)^2) / mean((test.avg - College.test$Apps)^2)
pcr.r2 <- 1 - mean((pred.pcr - College.test$Apps)^2) / mean((test.avg - College.test$Apps)^2)
pls.r2 <- 1 - mean((pred.pls - College.test$Apps)^2) / mean((test.avg - College.test$Apps)^2)
The test \(R^2\) is 0.910423 for least squares, 0.910425 for ridge regression, 0.910428 for the lasso, 0.836970 for PCR, and 0.909970 for PLS.
All models except PCR predict the number of college applications received with high accuracy, and there is very little difference among the test errors of least squares, ridge regression, the lasso, and PLS.
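To make the comparison concrete, the five test \(R^2\) values computed above can be collected and plotted side by side (a small sketch reusing the objects already defined):

# gather the test R^2 values for a direct comparison
test.r2 <- c(OLS = lm.r2, Ridge = ridge.r2, Lasso = lasso.r2, PCR = pcr.r2, PLS = pls.r2)
round(test.r2, 3)
barplot(test.r2, ylab = "Test R-squared")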
(a) Try out some of the regression methods explored in this chapter, such as best subset selection, the lasso, ridge regression, and PCR. Present and discuss results for the approaches that you consider.
library(leaps)
library(MASS)
data(Boston)
set.seed(1)
# regsubsets() has no predict() method, so define one that rebuilds the
# model matrix from the original formula and multiplies it by the
# coefficient vector of the model of the requested size
predict.regsubsets <- function(object, newdata, id, ...) {
    form <- as.formula(object$call[[2]])
    mat <- model.matrix(form, newdata)
    coefi <- coef(object, id = id)
    xvars <- names(coefi)
    mat[, xvars] %*% coefi
}
k <- 10
folds <- sample(1:k, nrow(Boston), replace = TRUE)
cv.errors <- matrix(NA, k, 13, dimnames = list(NULL, paste(1:13)))
# 10-fold cross-validation: for each fold, run best subset selection on the
# remaining folds and record the held-out MSE for every model size
for (j in 1:k) {
    best.fit <- regsubsets(crim ~ ., data = Boston[folds != j, ], nvmax = 13)
    for (i in 1:13) {
        pred <- predict(best.fit, Boston[folds == j, ], id = i)
        cv.errors[j, i] <- mean((Boston$crim[folds == j] - pred)^2)
    }
}
mean.cv.errors <- apply(cv.errors, 2, mean)
plot(mean.cv.errors, type = "b", xlab = "Number of variables", ylab = "CV error")
The plot shows the 10-fold cross-validation error of best subset selection as a function of the number of variables in the model.
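To see which model size cross-validation favors and which variables that model uses, one can take the size with the smallest mean CV error and refit best subset selection on the full data (a sketch following the usual regsubsets workflow):

best.size <- which.min(mean.cv.errors)   # model size with the lowest CV error
best.size
reg.best <- regsubsets(crim ~ ., data = Boston, nvmax = 13)
coef(reg.best, best.size)                # coefficients of that model, fit on all the data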
x <- model.matrix(crim ~ ., Boston)[, -1]
y <- Boston$crim
cv.out <- cv.glmnet(x, y, alpha = 1, type.measure = "mse")
plot(cv.out)
This plot shows the cross-validation curve for the lasso.
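The λ chosen by cross-validation and the variables the lasso keeps at that λ can be pulled directly from `cv.out` (a sketch; output not shown):

cv.out$lambda.min              # lambda with the lowest CV MSE
min(cv.out$cvm)                # estimated CV MSE at that lambda
coef(cv.out, s = "lambda.min") # coefficients at that lambda; "." marks dropped variables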
cv.out <- cv.glmnet(x, y, alpha = 0, type.measure = "mse")
plot(cv.out)
This plot shows the cross-validation curve for ridge regression.
pcr.fit <- pcr(crim ~ ., data = Boston, scale = TRUE, validation = "CV")
summary(pcr.fit)
## Data: X dimension: 506 13
## Y dimension: 506 1
## Fit method: svdpc
## Number of components considered: 13
##
## VALIDATION: RMSEP
## Cross-validated using 10 random segments.
## (Intercept) 1 comps 2 comps 3 comps 4 comps 5 comps 6 comps
## CV 8.61 7.198 7.198 6.786 6.762 6.790 6.821
## adjCV 8.61 7.195 7.195 6.780 6.753 6.784 6.813
## 7 comps 8 comps 9 comps 10 comps 11 comps 12 comps 13 comps
## CV 6.822 6.689 6.712 6.720 6.712 6.664 6.593
## adjCV 6.812 6.679 6.701 6.708 6.700 6.651 6.580
##
## TRAINING: % variance explained
## 1 comps 2 comps 3 comps 4 comps 5 comps 6 comps 7 comps 8 comps
## X 47.70 60.36 69.67 76.45 82.99 88.00 91.14 93.45
## crim 30.69 30.87 39.27 39.61 39.61 39.86 40.14 42.47
## 9 comps 10 comps 11 comps 12 comps 13 comps
## X 95.40 97.04 98.46 99.52 100.0
## crim 42.55 42.78 43.04 44.13 45.4
validationplot(pcr.fit, val.type = "MSEP")
This is the validation plot for PCR; from the summary above, the cross-validated RMSEP decreases only gradually and is lowest (6.593) for the full 13-component model.
(b) Propose a model (or set of models) that seem to perform well on this data set, and justify your answer. Make sure that you are evaluating model performance using validation set error, cross-validation, or some other reasonable alternative, as opposed to using training error.
We compare the candidate models using their cross-validation error estimates and propose the model with the lowest CV error, rather than selecting on training error.
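One way to back up this choice is to put the cross-validated error estimates of the methods on a common scale. Since `cv.out` was overwritten above, the lasso and ridge cross-validations are rerun here under new (hypothetical) names (a sketch; output not shown):

min(mean.cv.errors)                      # best subset: lowest 10-fold CV MSE
cv.lasso.bos <- cv.glmnet(x, y, alpha = 1, type.measure = "mse")
cv.ridge.bos <- cv.glmnet(x, y, alpha = 0, type.measure = "mse")
min(cv.lasso.bos$cvm)                    # lasso: CV MSE at lambda.min
min(cv.ridge.bos$cvm)                    # ridge: CV MSE at lambda.min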
(c) Does your chosen model involve all of the features in the
data set? Why or why not?
No, it does not use all of the features; the model chosen by cross-validation keeps only a subset of the 13 available predictor variables.