Assignment 5

library(ISLR2)
Question 2
For parts (a) through (c), indicate which of i. through iv. is correct. Justify your answer.
(a) The lasso, relative to least squares, is:
i. More flexible and hence will give improved prediction accuracy when its increase in bias is less than its decrease in variance.
ii. More flexible and hence will give improved prediction accuracy when its increase in variance is less than its decrease in bias.
iii. Less flexible and hence will give improved prediction accuracy when its increase in bias is less than its decrease in variance.
iv. Less flexible and hence will give improved prediction accuracy when its increase in variance is less than its decrease in bias.
A: iii is the correct answer. The lasso is less flexible than least squares: by shrinking the coefficient estimates (and setting some exactly to zero, removing non-essential variables), it reduces variance at the cost of a small increase in bias, whereas least squares estimates are unbiased but can have high variance. The lasso therefore improves prediction accuracy when its increase in bias is less than its decrease in variance.
(b) Repeat (a) for ridge regression relative to least squares.
A: iii is the correct answer. Like the lasso, ridge regression is less flexible than least squares: it shrinks the coefficient estimates (though never exactly to zero), trading a small increase in bias for a decrease in variance.
(c) Repeat (a) for non-linear methods relative to least squares.
A: ii is the correct answer. Non-linear methods are more flexible than least squares, so they have lower bias but higher variance; they improve prediction accuracy when the increase in variance is less than the decrease in bias.
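To make the shrinkage argument concrete, the following minimal sketch (not part of the assignment; the simulated data and the fixed λ = 0.5 are illustrative assumptions) fits least squares and the lasso to data where only 2 of 20 predictors matter. Least squares estimates all 20 coefficients, while the lasso sets most of the irrelevant ones exactly to zero, cutting variance for a small price in bias:
library(glmnet)
set.seed(1)
n <- 100; p <- 20                       # 100 observations, 20 predictors
X <- matrix(rnorm(n * p), n, p)
beta <- c(3, -2, rep(0, p - 2))         # only the first two predictors matter
y <- as.numeric(X %*% beta + rnorm(n))
coef(lm(y ~ X))                         # least squares: all 20 slopes non-zero
lasso.toy <- coef(glmnet(X, y, alpha = 1), s = 0.5)
sum(as.vector(lasso.toy) != 0) - 1      # lasso keeps far fewer predictors
Here the lasso's extra bias is small (the true signal is sparse) while its variance reduction is large, which is exactly the regime in which answer iii applies.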
Question 9
In this exercise, we will predict the number of applications received using the other variables in the College data set.
(a) Split the data set into a training set and a test set.
attach(College)
x=model.matrix(Apps~.,College)[,-1]
y=College$Apps
set.seed(10)
train=sample(1:nrow(x), nrow(x)/2)
test=(-train)
College.train = College[train, ]
College.test = College[test, ]
y.test=y[test]
(b) Fit a linear model using least squares on the training set, and report the test error obtained.
lm.fit<-lm(Apps~., data=College, subset=train)
summary(lm.fit)
Call:
lm(formula = Apps ~ ., data = College, subset = train)
Residuals:
Min 1Q Median 3Q Max
-5139.5 -473.3 -21.1 353.2 7402.7
Coefficients:
Estimate Std. Error t value Pr(>|t|)
(Intercept) -629.36179 639.35741 -0.984 0.325579
PrivateYes -647.56836 192.17056 -3.370 0.000832 ***
Accept 1.68912 0.05038 33.530 < 2e-16 ***
Enroll -1.02383 0.27721 -3.693 0.000255 ***
Top10perc 48.19124 8.10714 5.944 6.42e-09 ***
Top25perc -10.51538 6.44952 -1.630 0.103865
F.Undergrad 0.01992 0.05364 0.371 0.710574
P.Undergrad 0.04213 0.05348 0.788 0.431373
Outstate -0.09489 0.02674 -3.549 0.000436 ***
Room.Board 0.14549 0.07243 2.009 0.045277 *
Books 0.06660 0.31115 0.214 0.830623
Personal 0.05663 0.09453 0.599 0.549475
PhD -10.11489 7.11588 -1.421 0.156027
Terminal -2.29300 8.03546 -0.285 0.775528
S.F.Ratio 22.07117 18.70991 1.180 0.238897
perc.alumni 2.08121 6.00673 0.346 0.729179
Expend 0.07654 0.01672 4.577 6.45e-06 ***
Grad.Rate 9.99706 4.49821 2.222 0.026857 *
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
Residual standard error: 1092 on 370 degrees of freedom
Multiple R-squared: 0.9395, Adjusted R-squared: 0.9367
F-statistic: 338 on 17 and 370 DF, p-value: < 2.2e-16
pred.app<-predict(lm.fit, College.test)
test.error<-mean((College.test$Apps-pred.app)^2)
test.error
[1] 1020100
(c) Fit a ridge regression model on the training set, with λ chosen by cross-validation. Report the test error obtained.
library(glmnet)
Loading required package: Matrix
Loaded glmnet 4.1-10
grid=10^seq(10,-2,length=100)   # lambda grid from 10^10 down to 10^-2
ridge.mod=glmnet(x[train,],y[train],alpha=0,lambda=grid)
summary(ridge.mod)
          Length Class     Mode
a0 100 -none- numeric
beta 1700 dgCMatrix S4
df 100 -none- numeric
dim 2 -none- numeric
lambda 100 -none- numeric
dev.ratio 100 -none- numeric
nulldev 1 -none- numeric
npasses 1 -none- numeric
jerr 1 -none- numeric
offset 1 -none- logical
call 5 -none- call
nobs 1 -none- numeric
cv.college.out=cv.glmnet(x[train,],y[train],alpha=0)
bestlam=cv.college.out$lambda.min
bestlam
[1] 411.3927
ridge.pred=predict(ridge.mod,s=bestlam,newx=x[test,])
mean((ridge.pred-y.test)^2)
[1] 985020.1
(d) Fit a lasso model on the training set, with λ chosen by cross-validation. Report the test error obtained, along with the number of non-zero coefficient estimates.
lasso.mod=glmnet(x[train,],y[train],alpha=1,lambda=grid)
summary(lasso.mod)
          Length Class     Mode
a0 100 -none- numeric
beta 1700 dgCMatrix S4
df 100 -none- numeric
dim 2 -none- numeric
lambda 100 -none- numeric
dev.ratio 100 -none- numeric
nulldev 1 -none- numeric
npasses 1 -none- numeric
jerr 1 -none- numeric
offset 1 -none- logical
call 5 -none- call
nobs 1 -none- numeric
cv.out=cv.glmnet(x[train,],y[train],alpha=1)
bestlam=cv.out$lambda.min
bestlam
[1] 24.66235
lasso.pred=predict(lasso.mod,s=bestlam,newx=x[test,])
mean((lasso.pred-y.test)^2)
[1] 1008145
out=glmnet(x,y,alpha=1,lambda = grid)
lasso.coef=predict(out,type="coefficients",s=bestlam)[1:18,]
lasso.coef[lasso.coef!=0]
(Intercept) PrivateYes Accept Enroll Top10perc
-6.324960e+02 -4.087012e+02 1.436837e+00 -1.410240e-01 3.143012e+01
Top25perc P.Undergrad Outstate Room.Board Personal
-8.606536e-01 1.480293e-02 -5.342495e-02 1.205819e-01 4.379135e-05
PhD Terminal S.F.Ratio perc.alumni Expend
-5.121245e+00 -3.371192e+00 2.717231e+00 -1.039648e+00 6.838161e-02
Grad.Rate
4.700317e+00
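Counting the printed coefficients above, 15 of the 17 predictors have non-zero estimates; only F.Undergrad and Books are shrunk exactly to zero. The count can also be checked directly (a one-line addition using the lasso.coef vector defined above):
sum(lasso.coef != 0) - 1   # non-zero coefficient estimates, excluding the intercept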
(e) Fit a PCR model on the training set, with M chosen by cross-validation. Report the test error obtained, along with the value of M selected by cross-validation.
library(pls)
Attaching package: 'pls'
The following object is masked from 'package:stats':
loadings
pcr.college=pcr(Apps~., data=College.train,scale=TRUE,validation="CV")
summary(pcr.college)
Data: X dimension: 388 17
Y dimension: 388 1
Fit method: svdpc
Number of components considered: 17
VALIDATION: RMSEP
Cross-validated using 10 random segments.
(Intercept) 1 comps 2 comps 3 comps 4 comps 5 comps 6 comps
CV 4347 4345 2371 2391 2104 1949 1898
adjCV 4347 4345 2368 2396 2085 1939 1891
7 comps 8 comps 9 comps 10 comps 11 comps 12 comps 13 comps
CV 1899 1880 1864 1861 1870 1873 1891
adjCV 1893 1862 1857 1853 1862 1865 1885
14 comps 15 comps 16 comps 17 comps
CV 1903 1727 1295 1260
adjCV 1975 1669 1283 1249
TRAINING: % variance explained
1 comps 2 comps 3 comps 4 comps 5 comps 6 comps 7 comps 8 comps
X 32.6794 56.94 64.38 70.61 76.27 80.97 84.48 87.54
Apps 0.9148 71.17 71.36 79.85 81.49 82.73 82.79 83.70
9 comps 10 comps 11 comps 12 comps 13 comps 14 comps 15 comps
X 90.50 92.89 94.96 96.81 97.97 98.73 99.39
Apps 83.86 84.08 84.11 84.11 84.16 84.28 93.08
16 comps 17 comps
X 99.86 100.00
Apps 93.71 93.95
validationplot(pcr.college, val.type="MSEP")
pcr.pred=predict(pcr.college,x[test,],ncomp=10)
mean((pcr.pred-y.test)^2)
[1] 1422699
Based on the validation plot, M = 10 components is chosen: the CV error is essentially flat from about 10 components until nearly all 17 are used (which amounts to least squares). The resulting test MSE is 1422699.
(f) Fit a PLS model on the training set, with M chosen by cross-validation. Report the test error obtained, along with the value of M selected by cross-validation.
pls.college=plsr(Apps~., data=College.train,scale=TRUE, validation="CV")
validationplot(pls.college, val.type="MSEP")
summary(pls.college)
Data: X dimension: 388 17
Y dimension: 388 1
Fit method: kernelpls
Number of components considered: 17
VALIDATION: RMSEP
Cross-validated using 10 random segments.
(Intercept) 1 comps 2 comps 3 comps 4 comps 5 comps 6 comps
CV 4347 2178 1872 1734 1615 1453 1359
adjCV 4347 2171 1867 1726 1586 1427 1341
7 comps 8 comps 9 comps 10 comps 11 comps 12 comps 13 comps
CV 1347 1340 1329 1317 1310 1305 1305
adjCV 1330 1324 1314 1302 1296 1291 1291
14 comps 15 comps 16 comps 17 comps
CV 1305 1307 1307 1307
adjCV 1291 1292 1293 1293
TRAINING: % variance explained
1 comps 2 comps 3 comps 4 comps 5 comps 6 comps 7 comps 8 comps
X 24.27 38.72 62.64 65.26 69.01 73.96 78.86 82.18
Apps 76.96 84.31 86.80 91.48 93.37 93.75 93.81 93.84
9 comps 10 comps 11 comps 12 comps 13 comps 14 comps 15 comps
X 85.35 87.42 89.18 91.41 92.70 94.58 97.16
Apps 93.88 93.91 93.93 93.94 93.95 93.95 93.95
16 comps 17 comps
X 98.15 100.00
Apps 93.95 93.95
pls.pred=predict(pls.college,x[test,],ncomp=9)
mean((pls.pred-y.test)^2)
[1] 1049868
The CV error levels off by about M = 9 components, so M = 9 is chosen; the resulting test MSE is 1049868.
(g) Comment on the results obtained. How accurately can we predict the number of college applications received? Is there much difference among the test errors resulting from these five approaches?
test.avg = mean(College.test[, "Apps"])
lm.test.r2 = 1 - mean((College.test[, "Apps"] - pred.app)^2) /mean((College.test[, "Apps"] - test.avg)^2)
ridge.test.r2 = 1 - mean((College.test[, "Apps"] - ridge.pred)^2) /mean((College.test[, "Apps"] - test.avg)^2)
lasso.test.r2 = 1 - mean((College.test[, "Apps"] - lasso.pred)^2) /mean((College.test[, "Apps"] - test.avg)^2)
pcr.test.r2 = 1 - mean((pcr.pred-y.test)^2) /mean((College.test[, "Apps"] - test.avg)^2)
pls.test.r2 = 1 - mean((pls.pred-y.test)^2) /mean((College.test[, "Apps"] - test.avg)^2)
barplot(c(lm.test.r2, ridge.test.r2, lasso.test.r2, pcr.test.r2, pls.test.r2), names.arg=c("OLS", "Ridge", "Lasso", "PCR", "PLS"), main="Test R-squared")
detach(College)
A: All five approaches predict the number of applications with broadly similar accuracy. Ridge regression has the lowest test MSE (985020), followed by the lasso (1008145), least squares (1020100), and PLS (1049868); PCR with M = 10 is noticeably worse (1422699). A test MSE near one million corresponds to a typical prediction error of roughly 1000 applications.
Question 11
We will now try to predict per capita crime rate in the Boston dataset.
- Try out some of the regression methods explored in this chapter, such as best subset selection, the lasso, ridge regression, and PCR. Present and discuss results for the approaches that you consider.
library(leaps)
library(MASS)
Attaching package: 'MASS'
The following object is masked from 'package:ISLR2':
Boston
set.seed(1)
attach(Boston)
# regsubsets() has no predict() method, so we define one:
predict.regsubsets = function(object, newdata, id, ...) {
    form = as.formula(object$call[[2]])    # recover the formula used in the fit
    mat = model.matrix(form, newdata)      # build the design matrix for newdata
    coefi = coef(object, id = id)          # coefficients of the id-variable model
    mat[, names(coefi)] %*% coefi          # predictions
}
k = 10
p = ncol(Boston) - 1
folds = sample(rep(1:k, length = nrow(Boston)))
cv.errors = matrix(NA, k, p)
for (i in 1:k) {
best.fit = regsubsets(crim ~ ., data = Boston[folds != i, ], nvmax = p)
for (j in 1:p) {
pred = predict(best.fit, Boston[folds == i, ], id = j)
cv.errors[i, j] = mean((Boston$crim[folds == i] - pred)^2)
}
}
mean.cv.errors <- apply(cv.errors, 2, mean)
plot(mean.cv.errors, type = "b", xlab = "Number of variables", ylab = "CV error")
which.min(mean.cv.errors)
[1] 9
mean.cv.errors[which.min(mean.cv.errors)]
[1] 42.81453
x = model.matrix(crim ~ . - 1, data = Boston)
y = Boston$crim
cv.lasso = cv.glmnet(x, y, type.measure = "mse")
plot(cv.lasso)
coef(cv.lasso)
14 x 1 sparse Matrix of class "dgCMatrix"
lambda.1se
(Intercept) 2.176491
zn .
indus .
chas .
nox .
rm .
age .
dis .
rad 0.150484
tax .
ptratio .
black .
lstat .
medv .
sqrt(cv.lasso$cvm[cv.lasso$lambda == cv.lasso$lambda.1se])
[1] 7.921353
At the one-standard-error λ, the lasso keeps only rad and has a cross-validated RMSE of about 7.92.
cv.ridge = cv.glmnet(x, y, type.measure = "mse", alpha = 0)
plot(cv.ridge)
coef(cv.ridge)
14 x 1 sparse Matrix of class "dgCMatrix"
lambda.1se
(Intercept) 1.523899542
zn -0.002949852
indus 0.029276741
chas -0.166526007
nox 1.874769665
rm -0.142852604
age 0.006207995
dis -0.094547258
rad 0.045932737
tax 0.002086668
ptratio 0.071258052
black -0.002605281
lstat 0.035745604
medv -0.023480540
sqrt(cv.ridge$cvm[cv.ridge$lambda == cv.ridge$lambda.1se])
[1] 7.669133
Ridge keeps all 13 predictors and attains a slightly lower cross-validated RMSE of about 7.67.
pcr.crime = pcr(crim ~ ., data = Boston, scale = TRUE, validation = "CV")
summary(pcr.crime)
Data: X dimension: 506 13
Y dimension: 506 1
Fit method: svdpc
Number of components considered: 13
VALIDATION: RMSEP
Cross-validated using 10 random segments.
(Intercept) 1 comps 2 comps 3 comps 4 comps 5 comps 6 comps
CV 8.61 7.175 7.180 6.724 6.731 6.727 6.727
adjCV 8.61 7.174 7.179 6.721 6.725 6.724 6.724
7 comps 8 comps 9 comps 10 comps 11 comps 12 comps 13 comps
CV 6.722 6.614 6.618 6.607 6.598 6.553 6.488
adjCV 6.718 6.609 6.613 6.602 6.592 6.546 6.481
TRAINING: % variance explained
1 comps 2 comps 3 comps 4 comps 5 comps 6 comps 7 comps 8 comps
X 47.70 60.36 69.67 76.45 82.99 88.00 91.14 93.45
crim 30.69 30.87 39.27 39.61 39.61 39.86 40.14 42.47
9 comps 10 comps 11 comps 12 comps 13 comps
X 95.40 97.04 98.46 99.52 100.0
crim 42.55 42.78 43.04 44.13 45.4
- Propose a model (or set of models) that seem to perform well on this data set, and justify your answer. Make sure that you are evaluating model performance using validation set error, cross-validation, or some other reasonable alternative, as opposed to using training error.
Based on the cross-validated MSE, the best subset selection model performed best: the 9-variable model has the lowest CV error, with an MSE of about 42.8 (RMSE about 6.5, versus roughly 7.92 for the lasso and 7.67 for ridge at the one-standard-error λ).
- Does your chosen model involve all of the features in the dataset? Why or why not?
No. The chosen best subset selection model uses 9 of the 13 predictors. Excluding the remaining predictors reduces variance at the cost of a small increase in bias, which is what gives this model the lowest cross-validated MSE; including all 13 features would add variance without improving predictive accuracy.
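As a quick way to see which variables the chosen model uses, one could refit best subset selection on the full Boston data and extract the 9-predictor model (a sketch using the objects defined above; this refit is an addition, not part of the original output):
# Refit best subset selection on all of Boston and inspect the model chosen by CV
best.full = regsubsets(crim ~ ., data = Boston, nvmax = p)
coef(best.full, id = 9)   # coefficients of the 9-predictor model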