Conceptual Questions
2.) For parts (a) through (c), indicate which of i. through iv. is correct. Justify your answer.
(a) The lasso, relative to least squares, is:
Less flexible and hence will give improved prediction accuracy when its increase in bias is less than its decrease in variance.
(b) Repeat (a) for ridge regression relative to least squares.
Less flexible and hence will give improved prediction accuracy when its increase in bias is less than its decrease in variance.
(c) Repeat (a) for non-linear methods relative to least squares.
More flexible and hence will give improved prediction accuracy when its increase in variance is less than its decrease in bias
Applied Questions
9. In this exercise, we will predict the number of applications received using the other variables in the College data set.
(a) Split the data set into a training set and a test set.
library(ISLR)
library(caret)
library(tidyverse)
set.seed(1)
College=na.omit(College)
data("College")
# 70% of the sample size
smp_size <- floor(.7 * nrow(College))
# set the seed to make partition reproducible
set.seed(1)
train_ind <- sample(seq_len(nrow(College)), size = smp_size)
train <- College[train_ind, ]
test <- College[-train_ind, ]
(b) Fit a linear model using least squares on the training set, and report the test error obtained.
The test mean squared error of the linear model is 1,261,630
set.seed(1)
lm.fit <- lm(Apps ~ ., data = train)
lm.pred <- predict(lm.fit, test)
lm.RMSE <- mean((lm.pred-test[, "Apps"])^2)
lm.RMSE
## [1] 1261630
(c) Fit a ridge regression model on the training set, with λ chosen by cross-validation. Report the test error obtained.
library(glmnet)
set.seed(1)
#find best λ
x_train <- model.matrix(Apps ~ ., data = train)
x_test <- model.matrix(Apps ~ ., data = test)
grid <- 4 ^ seq(4, -2, length = 100)
ridge.mod <- cv.glmnet(x_train, train[, "Apps"], alpha = 0, lambda = grid, thresh = 1e-12)
ridge.bestlam <- ridge.mod$lambda.min
ridge.bestlam
## [1] 0.0625
The test mean squared error of the ridge model is 1,261,460
set.seed(1)
#fit ridge regression with the best λ
ridge.pred = predict(ridge.mod, newx=x_test, s=ridge.bestlam)
ridge.RMSE <- mean((ridge.pred - test[, "Apps"])^2)
ridge.RMSE
## [1] 1261460
(d) Fit a lasso model on the training set, with λ chosen by crossvalidation. Report the test error obtained, along with the number of non-zero coefficient estimates.1
set.seed(1)
#find best λ
lasso.mod = cv.glmnet(x_train, train[, "Apps"], alpha=1, lambda=grid, thresh=1e-12)
lasso.bestlam = lasso.mod$lambda.min
lasso.bestlam
## [1] 0.0625
The test mean squared error of the lasso model is 1,261,414
set.seed(1)
#fit lasso regression with the best λ
lasso.pred = predict(lasso.mod, newx=x_test, s=lasso.bestlam)
lasso.RMSE <- mean((lasso.pred - test[, "Apps"])^2)
lasso.RMSE
## [1] 1261414
The coefficients results below suggests that most variables has a coefficients close to zero.
model_matrix <- model.matrix(Apps~., data=College)
lasso.mod = glmnet(model_matrix, College[, "Apps"], alpha=1)
predict(lasso.mod, s=lasso.bestlam, type="coefficients")
## 19 x 1 sparse Matrix of class "dgCMatrix"
## 1
## (Intercept) -471.39372069
## (Intercept) .
## PrivateYes -491.04485135
## Accept 1.57033288
## Enroll -0.75961467
## Top10perc 48.14698891
## Top25perc -12.84690694
## F.Undergrad 0.04149116
## P.Undergrad 0.04438973
## Outstate -0.08328388
## Room.Board 0.14943472
## Books 0.01532293
## Personal 0.02909954
## PhD -8.39597537
## Terminal -3.26800340
## S.F.Ratio 14.59298267
## perc.alumni -0.04404771
## Expend 0.07712632
## Grad.Rate 8.28950241
(e) Fit a PCR model on the training set, with M chosen by cross validation. Report the test error obtained, along with the value of M selected by cross-validation.
Based on a cross validation of 5, the test mean squared error of the PCR model is 2,057,929.
library(pls)
set.seed(1)
pcr.fit=pcr(Apps~., data=train, scale=TRUE, validation ="CV")
validationplot(pcr.fit, val.type="RMSEP")
set.seed(1)
pcr.pred = predict(pcr.fit, test, ncomp=5)
pcr.RMSE <- mean((pcr.pred - test[, "Apps"])^2)
pcr.RMSE
## [1] 2057929
(f) Fit a PLS model on the training set, with M chosen by cross validation. Report the test error obtained, along with the value of M selected by cross-validation.
Based on a cross validation of 5, the test mean squared error of the PLS model is 1,316,934.
pls.fit=plsr(Apps~., data=train, scale=TRUE, validation ="CV")
validationplot(pls.fit, val.type="RMSEP")
set.seed(1)
pls.pred = predict(pls.fit, test, ncomp=5)
pls.RMSE <- mean((pls.pred - test[, "Apps"])^2)
pls.RMSE
## [1] 1316934
(g) Comment on the results obtained. How accurately can we predict the number of College applications received? Is there much difference among the test errors resulting from these five approaches?
The test rsquared results, Linear Model = 91.34%, Ridge Model = 91.35%, Lasso Model = 91.58%, PCR Model = 85.88%, and PLS 90.97%. The results suggests that all of the model does fairly well in predicting the number of College applications.
The test error results, Linear Model = 1261630, Ridge Model = 1261460, Lasso Model = 1227029, PCR Model = 2057929, and PLS 1316934. The results suggests that the PCR has the higher test error and Lasso has the lowest error.
test.avg <- mean(test[, "Apps"])
lm.test.r2 <- 1 - mean((test[, "Apps"] - lm.pred)^2) /
mean((test[, "Apps"] - test.avg)^2)
ridge.test.r2 <- 1 - mean((test[, "Apps"] - ridge.pred)^2)/
mean((test[, "Apps"] - test.avg)^2)
lasso.test.r2 <- 1 - mean((test[, "Apps"] - lasso.pred)^2) /
mean((test[, "Apps"] - test.avg)^2)
pcr.test.r2 <- 1 - mean((test[, "Apps"] - pcr.pred)^2) /
mean((test[, "Apps"] - test.avg)^2)
pls.test.r2 <- 1 - mean((test[, "Apps"] - pls.pred)^2) /
mean((test[, "Apps"] - test.avg)^2)
r2 <- c(lm.test.r2, ridge.test.r2, lasso.test.r2, pcr.test.r2, pls.test.r2)
r2
## [1] 0.9134458 0.9134575 0.9134607 0.8588157 0.9096517
barplot(c(lm.test.r2, ridge.test.r2, lasso.test.r2, pcr.test.r2, pls.test.r2),
col = "#FF9999" , names.arg = c("Linear Model", "Ridge", "Lasso", "PCR", "PLS"),
main = "Test R-Squared")
RMSE <- c(lm.RMSE, ridge.RMSE, lasso.RMSE, pcr.RMSE, pls.RMSE)
RMSE
## [1] 1261630 1261460 1261414 2057929 1316934
barplot(c(lm.RMSE, ridge.RMSE, lasso.RMSE, pcr.RMSE, pls.RMSE),
col = "#FF9999" , names.arg = c("Linear Model", "Ridge", "Lasso", "PCR", "PLS"),
main = "RMSE")
11. We will now try to predict per capita crime rate in the Boston data set.
(a) Try out some of the regression methods explored in this chapter, such as best subset selection, the lasso, ridge regression, and PCR. Present and discuss results for the approaches that you consider
The results of the models, Ridge Regression RRMSE = 59.63 and Test R-squared 36.70%, Lasso RMSE = 59.80 and Test R-squared 36.70%, PCR RMSE = 63.96 and Test R-squared 36.70%, PLS RMSE = 61.03 and Test R-squared 36.70%. Based on these results none of the models did well in predicting the crime rate.
library(MASS)
data("Boston")
Boston=na.omit(Boston)
# 70% of the sample size
smp_size <- floor(.7 * nrow(Boston))
# set the seed to make partition reproducible
set.seed(1)
train_ind <- sample(seq_len(nrow(Boston)), size = smp_size)
train <- Boston[train_ind, ]
test <- Boston[-train_ind, ]
Ridge Regression
set.seed(1)
#find best λ
x_train <- model.matrix(crim ~ ., data = train)
x_test <- model.matrix(crim ~ ., data = test)
grid <- 4 ^ seq(4, -2, length = 100)
ridge.mod <- cv.glmnet(x_train, train[, "crim"], alpha = 0, lambda = grid, thresh = 1e-12)
ridge.bl <- ridge.mod$lambda.min
#fit ridge regression with the best λ
ridge.pred = predict(ridge.mod, newx=x_test, s=ridge.bl)
ridge.RMSE <- mean((ridge.pred - test[, "crim"])^2)
ridge.RMSE
## [1] 59.63209
#calculate test r squared
test.avg <- mean(test[, "crim"])
ridge.test.r2 <- 1 - mean((test[, "crim"] - ridge.pred)^2)/
mean((test[, "crim"] - test.avg)^2)
ridge.test.r2
## [1] 0.3670422
Lasso
set.seed(1)
#find best λ
lasso.mod = cv.glmnet(x_train, train[, "crim"], alpha=1, lambda=grid, thresh=1e-12)
lasso.bl = lasso.mod$lambda.min
#fit lasso regression with the best λ
lasso.pred = predict(lasso.mod, newx=x_test, s=lasso.bl)
lasso.RMSE <- mean((lasso.pred - test[, "crim"])^2)
lasso.RMSE
## [1] 59.80344
#calculate test r squared
lasso.test.r2 <- 1 - mean((test[, "crim"] - ridge.pred)^2)/
mean((test[, "crim"] - test.avg)^2)
lasso.test.r2
## [1] 0.3670422
PCR
set.seed(1)
#Fit PCR and plot
pcr.fit=pcr(crim~., data=train, scale=TRUE, validation ="CV")
validationplot(pcr.fit, val.type="RMSEP")
#calculate RMSE
pcr.pred = predict(pcr.fit, test, ncomp=4)
pcr.RMSE <- mean((pcr.pred - test[, "crim"])^2)
pcr.RMSE
## [1] 63.96279
#calculate test r squared
pcr.test.r2 <- 1 - mean((test[, "crim"] - ridge.pred)^2)/
mean((test[, "crim"] - test.avg)^2)
pcr.test.r2
## [1] 0.3670422
PLS
#fit PLS and plot
set.seed(1)
pls.fit=plsr(crim~., data=train, scale=TRUE, validation ="CV")
validationplot(pls.fit, val.type="RMSEP")
#calculate RMSE
pls.pred = predict(pls.fit, test, ncomp=4)
pls.RMSE <- mean((pls.pred - test[, "crim"])^2)
pls.RMSE
## [1] 61.02598
#calculate test r squared
pls.test.r2 <- 1 - mean((test[, "crim"] - ridge.pred)^2)/
mean((test[, "crim"] - test.avg)^2)
pls.test.r2
## [1] 0.3670422
(b) Propose a model (or set of models) that seem to perform well on this data set, and justify your answer. Make sure that you are evaluating model performance using validation set error, crossvalidation, or some other reasonable alternative, as opposed to using training error.
Based on the result of the RMSE and Test R-squared, none of the models did well, and only produced of r-squared of 36.70% on all of the models.
(c) Does your chosen model involve all of the features in the data set? Why or why not?
Yes, based on the plot of PCR and PLS, both models includes all variables. The results of the below of the coefficients indicates that all variables were included in both ridge and lasso. The variables were included because none were removed before modeling and none of the models did not disregard any of the varibles.
model_matrix <- model.matrix(crim~., data=Boston)
ridge.mod = glmnet(model_matrix, Boston[, "crim"], alpha=0)
predict(ridge.mod, s=ridge.bl, type="coefficients")
## 15 x 1 sparse Matrix of class "dgCMatrix"
## 1
## (Intercept) 9.063048626
## (Intercept) .
## zn 0.033002416
## indus -0.082046152
## chas -0.737684583
## nox -5.393098481
## rm 0.335972073
## age 0.001962473
## dis -0.702123641
## rad 0.422779054
## tax 0.003400607
## ptratio -0.135911587
## black -0.008483285
## lstat 0.142613436
## medv -0.139604127
model_matrix <- model.matrix(crim~., data=Boston)
lasso.mod = glmnet(model_matrix, Boston[, "crim"], alpha=1)
predict(ridge.mod, s=lasso.bl, type="coefficients")
## 15 x 1 sparse Matrix of class "dgCMatrix"
## 1
## (Intercept) 9.063048626
## (Intercept) .
## zn 0.033002416
## indus -0.082046152
## chas -0.737684583
## nox -5.393098481
## rm 0.335972073
## age 0.001962473
## dis -0.702123641
## rad 0.422779054
## tax 0.003400607
## ptratio -0.135911587
## black -0.008483285
## lstat 0.142613436
## medv -0.139604127