library(tidyverse)
library(openintro)
library(ISLR)
library(glmnet)
library(pls)
library(MASS)
For parts (a) through (c), indicate which of i. through iv. is correct. Justify your answer.
(a) The lasso, relative to least squares, is:
i. More flexible and hence will give improved prediction accuracy when its increase in bias is less than its decrease in variance.
ii. More flexible and hence will give improved prediction accuracy when its increase in variance is less than its decrease in bias.
iii. Less flexible and hence will give improved prediction accuracy when its increase in bias is less than its decrease in variance.
iv. Less flexible and hence will give improved prediction accuracy when its increase in variance is less than its decrease in bias.
The answer is iii: less flexible, and hence it will give improved prediction accuracy when its increase in bias is less than its decrease in variance. The lasso shrinks the coefficient estimates toward zero (setting some exactly to zero), which makes the model less flexible: variance drops at the cost of some added bias, and prediction accuracy improves whenever the variance reduction outweighs the bias increase.
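As an illustration (a minimal sketch on simulated data, not part of the exercise), the coefficient paths below show lasso estimates shrinking toward zero as λ grows:
# Minimal sketch on simulated data: lasso coefficient paths collapse toward
# zero as lambda increases, trading a little bias for lower variance.
set.seed(2)
x.sim = matrix(rnorm(100*5), 100, 5)
y.sim = x.sim %*% c(3, 2, 0, 0, 1) + rnorm(100)
shrink.fit = glmnet(x.sim, y.sim, alpha=1)      # alpha=1 fits the lasso
plot(shrink.fit, xvar="lambda", label=TRUE)     # all paths reach 0 for large lambda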
(b) Repeat (a) for ridge regression relative to least squares.
The answer is again iii: less flexible, with improved prediction accuracy when the increase in bias is less than the decrease in variance. Ridge regression likewise shrinks the coefficient estimates toward zero (though never exactly to zero), trading a small increase in bias for a larger decrease in variance.
(c) Repeat (a) for non-linear methods relative to least squares.
The answer is ii. More flexible and hence will give improved prediction accuracy when its increase in variance is less than its decrease in bias. Non-linear methods are more flexible than least squares, so they reduce bias at the cost of higher variance, and they predict better when the bias reduction dominates.
In this exercise, we will predict the number of applications received using the other variables in the College data set.
(a) Split the data set into a training set and a test set.
set.seed(1)
num = sample(nrow(College), 0.75*nrow(College))
train = College[num,]
test = College[-num,]
(b) Fit a linear model using least squares on the training set, and report the test error obtained.
lmfit = lm(Apps~.,data=train)
lmpred = predict(lmfit,test)
lmerr = mean((test$Apps-lmpred)^2)
lmerr
## [1] 1384604
avg.apps=mean(test$Apps)
lm.r2=1-mean((lmpred-test$Apps)^2)/mean((avg.apps-test$Apps)^2)
lm.r2
## [1] 0.9086432
The test error (MSE) obtained is 1384604, and the test R^2 is about 0.909.
(c) Fit a ridge regression model on the training set, with λ chosen by cross-validation. Report the test error obtained.
trainc = model.matrix(Apps~.,data=train)[,-1]
testc = model.matrix(Apps~.,data=test)[,-1]
rridge = cv.glmnet(trainc, train$Apps, alpha = 0)
lambda = rridge$lambda.min
rrpred = predict(rridge, s=lambda, newx = testc)
rrerr = mean((test$Apps-rrpred)^2)
rrerr
## [1] 1206346
rr.r2=1-mean((rrpred-test$Apps)^2)/mean((avg.apps-test$Apps)^2)
rr.r2
## [1] 0.9204048
The test error obtained is 1206346.
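As a sanity check (optional, not required by the exercise), plotting the cv.glmnet object shows the cross-validation curve over the λ grid, with the selected λ at its minimum:
# CV error across the lambda grid; the dashed line marks lambda.min used above
plot(rridge)
abline(v=log(lambda), lty=2)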
(d) Fit a lasso model on the training set, with λ chosen by cross-validation. Report the test error obtained, along with the number of non-zero coefficient estimates.
lass.train = model.matrix(Apps~., data=train)[,-1]
lass.test = model.matrix(Apps~., data=test)[,-1]
lass.fit = cv.glmnet(lass.train, train$Apps, alpha=1)
lambda = lass.fit$lambda.min
lass.pred = predict(lass.fit, s=lambda, newx=lass.test)
lass.err = mean((test$Apps - lass.pred)^2)
lass.err
## [1] 1370403
lass.cof = predict(lass.fit, type="coefficients", s=lambda)[1:ncol(College),]
lass.cof
## (Intercept) PrivateYes Accept Enroll Top10perc
## -5.651833e+02 -4.646154e+02 1.696513e+00 -1.079012e+00 5.131131e+01
## Top25perc F.Undergrad P.Undergrad Outstate Room.Board
## -1.397974e+01 5.644318e-02 5.646120e-02 -7.818063e-02 1.592739e-01
## Books Personal PhD Terminal S.F.Ratio
## 2.282606e-01 3.466774e-03 -1.025118e+01 0.000000e+00 1.561720e+01
## perc.alumni Expend Grad.Rate
## 1.514189e+00 5.513999e-02 6.157195e+00
lass.r2=1-mean((lass.pred-test$Apps)^2)/mean((avg.apps-test$Apps)^2)
lass.r2
## [1] 0.9095802
Using the lasso, we obtained a test error of 1370403. At the selected λ, 16 of the 17 predictors have non-zero coefficients; only Terminal was shrunk exactly to zero.
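This count can be confirmed directly from the fitted coefficients (a quick check, not in the original output):
sum(lass.cof[-1] != 0)   # 16: every predictor except Terminal is non-zero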
(e) Fit a PCR model on the training set, with M chosen by cross-validation. Report the test error obtained, along with the value of M selected by cross-validation.
pcr.fit = pcr(Apps~., data=train, scale=TRUE, validation="CV")
summary(pcr.fit)
## Data: X dimension: 582 17
## Y dimension: 582 1
## Fit method: svdpc
## Number of components considered: 17
##
## VALIDATION: RMSEP
## Cross-validated using 10 random segments.
## (Intercept) 1 comps 2 comps 3 comps 4 comps 5 comps 6 comps
## CV 3862 3761 2078 2076 1809 1709 1662
## adjCV 3862 3761 2075 2076 1790 1689 1657
## 7 comps 8 comps 9 comps 10 comps 11 comps 12 comps 13 comps
## CV 1654 1615 1573 1578 1584 1587 1586
## adjCV 1649 1605 1568 1572 1579 1582 1580
## 14 comps 15 comps 16 comps 17 comps
## CV 1588 1528 1193 1133
## adjCV 1583 1511 1183 1124
##
## TRAINING: % variance explained
## 1 comps 2 comps 3 comps 4 comps 5 comps 6 comps 7 comps 8 comps
## X 32.159 57.17 64.41 70.20 75.53 80.48 84.24 87.56
## Apps 5.226 71.83 71.84 80.02 83.01 83.07 83.21 84.46
## 9 comps 10 comps 11 comps 12 comps 13 comps 14 comps 15 comps
## X 90.54 92.81 94.92 96.73 97.81 98.69 99.35
## Apps 85.00 85.22 85.22 85.23 85.36 85.45 89.93
## 16 comps 17 comps
## X 99.82 100.00
## Apps 92.84 93.36
pcr.pred = predict(pcr.fit, test, ncomp=10)
pcr.err = mean((test$Apps - pcr.pred)^2)
pcr.err## [1] 1952693
pcr.r2=1-mean((pcr.pred-test$Apps)^2)/mean((avg.apps-test$Apps)^2)
pcr.r2## [1] 0.8711605
The test error obtained using PCR is 1952693, with M = 10. Although the cross-validated RMSEP keeps falling through 17 components, the curve is essentially flat from about 9 or 10 components until the late drop at 16 and 17, and using all 17 components would simply reproduce the least squares fit, so M = 10 was selected.
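The choice of M is easiest to see from the CV error curve (an optional plot, not shown in the original output):
validationplot(pcr.fit, val.type="MSEP")   # CV MSE versus number of components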
(f) Fit a PLS model on the training set, with M chosen by cross-validation. Report the test error obtained, along with the value of M selected by cross-validation.
pls.fit = plsr(Apps~., data=train, scale=TRUE, validation="CV")
summary(pls.fit)
## Data: X dimension: 582 17
## Y dimension: 582 1
## Fit method: kernelpls
## Number of components considered: 17
##
## VALIDATION: RMSEP
## Cross-validated using 10 random segments.
## (Intercept) 1 comps 2 comps 3 comps 4 comps 5 comps 6 comps
## CV 3862 1933 1688 1489 1424 1250 1167
## adjCV 3862 1927 1687 1482 1404 1227 1157
## 7 comps 8 comps 9 comps 10 comps 11 comps 12 comps 13 comps
## CV 1149 1144 1138 1134 1135 1132 1132
## adjCV 1140 1136 1130 1126 1127 1124 1123
## 14 comps 15 comps 16 comps 17 comps
## CV 1130 1131 1130 1130
## adjCV 1122 1122 1122 1122
##
## TRAINING: % variance explained
## 1 comps 2 comps 3 comps 4 comps 5 comps 6 comps 7 comps 8 comps
## X 25.67 47.09 62.54 65.0 67.54 72.28 76.80 80.63
## Apps 76.80 82.71 87.20 90.8 92.79 93.05 93.14 93.22
## 9 comps 10 comps 11 comps 12 comps 13 comps 14 comps 15 comps
## X 82.71 85.53 88.01 90.95 93.07 95.18 96.86
## Apps 93.30 93.32 93.34 93.35 93.36 93.36 93.36
## 16 comps 17 comps
## X 98.00 100.00
## Apps 93.36 93.36
pls.pred = predict(pls.fit, test, ncomp=9)
pls.err = mean((test$Apps - pls.pred)^2)
pls.err
## [1] 1381335
pls.r2=1-mean((pls.pred-test$Apps)^2)/mean((avg.apps-test$Apps)^2)
pls.r2
## [1] 0.9088589
The test error obtained using PLS is 1381335, with M = 9; the cross-validated RMSEP is essentially flat beyond 9 components, so little is gained by adding more.
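The analogous CV plot for PLS (again optional) shows the error leveling off by roughly 9 components:
validationplot(pls.fit, val.type="MSEP")   # flat beyond about 9 components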
(g) Comment on the results obtained. How accurately can we predict the number of college applications received? Is there much difference among the test errors resulting from these five approaches?
Across the five models, the test errors were: least squares 1384604, ridge regression 1206346, lasso 1370403, PCR 1952693, and PLS 1381335. Ridge regression therefore performed best on this split. To judge how accurately we can predict the number of applications received, we compare the test R^2 of each model: all are 0.87 or higher, so every model predicts the data reasonably accurately. Overall there is not much difference among the five test errors, though PCR is noticeably worse and ridge regression noticeably better than the rest.
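One way to summarize the comparison, using the quantities already computed above, is to collect the test R^2 values side by side:
# Side-by-side comparison of the test R^2 values computed for each model
r2.all = c(lm=lm.r2, ridge=rr.r2, lasso=lass.r2, pcr=pcr.r2, pls=pls.r2)
barplot(r2.all, ylim=c(0,1), ylab="Test R^2")   # all models exceed 0.87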
We will now try to predict per capita crime rate in the Boston data set.
(a) Try out some of the regression methods explored in this chapter, such as best subset selection, the lasso, ridge regression, and PCR. Present and discuss results for the approaches that you consider.
set.seed(10)
training = sample(1:nrow(Boston), nrow(Boston)/2)
train = Boston[training,]
test = Boston[-training,]
trainset = model.matrix(crim~., data=train)[,-1]
testset = model.matrix(crim~., data=test)[,-1]
First, I’m performing a lasso regression.
lasso.fit = cv.glmnet(trainset, train$crim, alpha=1)
lambda = lasso.fit$lambda.min
lasso.pred = predict(lasso.fit, s=lambda, newx=testset)
lasso.err = mean((test$crim - lasso.pred)^2)
lasso.err
## [1] 56.09141
We are given a test error of 56.09141.
Next, I’m going to perform PCR.
pcr2.fit = pcr(crim~., data=train, scale=TRUE, validation="CV")
summary(pcr2.fit)
## Data: X dimension: 253 13
## Y dimension: 253 1
## Fit method: svdpc
## Number of components considered: 13
##
## VALIDATION: RMSEP
## Cross-validated using 10 random segments.
## (Intercept) 1 comps 2 comps 3 comps 4 comps 5 comps 6 comps
## CV 7.521 6.136 6.095 5.727 5.626 5.675 5.777
## adjCV 7.521 6.130 6.089 5.717 5.616 5.661 5.759
## 7 comps 8 comps 9 comps 10 comps 11 comps 12 comps 13 comps
## CV 5.782 5.673 5.552 5.546 5.554 5.542 5.505
## adjCV 5.764 5.649 5.530 5.526 5.536 5.521 5.484
##
## TRAINING: % variance explained
## 1 comps 2 comps 3 comps 4 comps 5 comps 6 comps 7 comps 8 comps
## X 46.57 58.79 68.40 76.02 82.37 87.52 90.99 93.28
## crim 35.21 36.53 45.03 47.18 47.30 47.35 47.35 49.85
## 9 comps 10 comps 11 comps 12 comps 13 comps
## X 95.35 97.13 98.44 99.51 100.00
## crim 51.98 51.99 52.05 52.64 53.51
pcr2.pred = predict(pcr2.fit, test, ncomp=12)
pcr2.err = mean((test$crim - pcr2.pred)^2)
pcr2.err
## [1] 56.58621
We are given a test error of 56.58621 with M = 12 components (using all 13 would simply reproduce least squares).
Last, I’m going to perform a ridge regression.
ridge.fit = cv.glmnet(trainset, train$crim, alpha=0)
lambda = ridge.fit$lambda.min
ridge.pred = predict(ridge.fit, s=lambda, newx=testset)
ridge.err = mean((test$crim - ridge.pred)^2)
ridge.err
## [1] 56.34103
We are given a test error of 56.34103.
(b) Propose a model (or set of models) that seem to perform well on this data set, and justify your answer. Make sure that you are evaluating model performance using validation set error, crossvalidation, or some other reasonable alternative, as opposed to using training error.
As computed in part (a), the test errors were 56.09141 for the lasso, 56.58621 for PCR, and 56.34103 for ridge regression. The lasso has the lowest test error, so we propose it as the best-performing model on this data set. That said, the other models’ errors are very close to the lasso’s, so all three can be said to perform well.
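Collecting the three validation-set MSEs computed above makes the comparison explicit:
c(lasso=lasso.err, pcr=pcr2.err, ridge=ridge.err)   # 56.09141, 56.58621, 56.34103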
(c) Does your chosen model involve all of the features in the data set? Why or why not?
No. The chosen lasso model drops features it deems unimportant or barely significant by shrinking their coefficients exactly to zero, so it does not involve all of the features in the data set.
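To verify which features were dropped, we can print the lasso coefficients at the selected λ; entries displayed as “.” in the sparse output are exactly zero:
# Use lasso.fit$lambda.min explicitly, since lambda was reassigned for ridge above
predict(lasso.fit, type="coefficients", s=lasso.fit$lambda.min)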
…