#2. For parts (a) through (c), indicate which of i. through iv. is correct. Justify your answer.
(a) The lasso, relative to least squares, is:
i. More flexible and hence will give improved prediction accuracy when its increase in bias is less than its decrease in variance.
ii. More flexible and hence will give improved prediction accuracy when its increase in variance is less than its decrease in bias.
iii. Less flexible and hence will give improved prediction accuracy when its increase in bias is less than its decrease in variance.
iv. Less flexible and hence will give improved prediction accuracy when its increase in variance is less than its decrease in bias.
The lasso constrains the size of the regression coefficients, shrinking some of them to zero. This makes it less flexible than least squares: variance falls while bias rises, so prediction accuracy improves whenever the increase in bias is smaller than the decrease in variance (a small simulation after part (c) illustrates this tradeoff). Option iii is correct.
(b) Repeat (a) for ridge regression relative to least squares. Ridge regression also shrinks the coefficients toward zero, so its variance decreases and its bias increases, making it less flexible than least squares. Option iii is again correct.
(c) Repeat (a) for non-linear methods relative to least squares. Non-linear methods are generally more flexible than least squares: they have lower bias and higher variance, so they improve prediction accuracy when their increase in variance is less than their decrease in bias. Option ii is correct.
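As a quick illustration of the tradeoff in part (a), here is a small simulation (my own sketch, not part of the exercise; all object names below are new) in which many irrelevant predictors and a small sample make least squares high-variance, so the lasso typically wins on test MSE:
library(glmnet)
set.seed(1)
n = 100; p = 30
X = matrix(rnorm(n * p), n, p)
beta = c(rep(1, 5), rep(0, p - 5))   # only the first 5 predictors matter
y = as.vector(X %*% beta + rnorm(n, sd = 3))
train = sample(n, n / 2)
ols = lm(y ~ X, subset = train)                  # least squares: 31 parameters on 50 observations
ols.pred = cbind(1, X[-train, ]) %*% coef(ols)   # manual predictions on the held-out half
cv = cv.glmnet(X[train, ], y[train], alpha = 1)  # lasso, with lambda chosen by CV
lasso.pred = predict(cv, newx = X[-train, ], s = "lambda.min")
c(ols.mse = mean((y[-train] - ols.pred)^2),      # the lasso MSE is typically the smaller
  lasso.mse = mean((y[-train] - lasso.pred)^2))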
#9. In this exercise, we will predict the number of applications received using the other variables in the College data set.
library(ISLR2)
attach(College)
(a) Split the data set into a training set and a test set.
x=model.matrix(Apps~.,College)[,-1]
y=College$Apps
set.seed(10)
train=sample(1:nrow(x), nrow(x)/2)
test=(-train)
College1.train = College[train, ]
College1.test = College[test, ]
y.test=y[test]
This splits the data set into training and test sets of equal size. (b) Fit a linear model using least squares on the training set, and report the test error obtained.
ls.fit = lm(Apps~., data = College, subset = train)
summary(ls.fit)
##
## Call:
## lm(formula = Apps ~ ., data = College, subset = train)
##
## Residuals:
## Min 1Q Median 3Q Max
## -5139.5 -473.3 -21.1 353.2 7402.7
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) -629.36179 639.35741 -0.984 0.325579
## PrivateYes -647.56836 192.17056 -3.370 0.000832 ***
## Accept 1.68912 0.05038 33.530 < 2e-16 ***
## Enroll -1.02383 0.27721 -3.693 0.000255 ***
## Top10perc 48.19124 8.10714 5.944 6.42e-09 ***
## Top25perc -10.51538 6.44952 -1.630 0.103865
## F.Undergrad 0.01992 0.05364 0.371 0.710574
## P.Undergrad 0.04213 0.05348 0.788 0.431373
## Outstate -0.09489 0.02674 -3.549 0.000436 ***
## Room.Board 0.14549 0.07243 2.009 0.045277 *
## Books 0.06660 0.31115 0.214 0.830623
## Personal 0.05663 0.09453 0.599 0.549475
## PhD -10.11489 7.11588 -1.421 0.156027
## Terminal -2.29300 8.03546 -0.285 0.775528
## S.F.Ratio 22.07117 18.70991 1.180 0.238897
## perc.alumni 2.08121 6.00673 0.346 0.729179
## Expend 0.07654 0.01672 4.577 6.45e-06 ***
## Grad.Rate 9.99706 4.49821 2.222 0.026857 *
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 1092 on 370 degrees of freedom
## Multiple R-squared: 0.9395, Adjusted R-squared: 0.9367
## F-statistic: 338 on 17 and 370 DF, p-value: < 2.2e-16
predict.app = predict(ls.fit, College1.test)
test.error = mean((College1.test$Apps-predict.app)^2)
test.error
## [1] 1020100
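For scale, the square root of the test MSE puts the error back in units of applications (a quick arithmetic check, since 1010² = 1020100):
sqrt(test.error)   # about 1010: an RMSE of roughly a thousand applications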
(c) Fit a ridge regression model on the training set, with λ chosen by cross-validation. Report the test error obtained.
library(glmnet)
grid=10^seq(10,-2)   # 13 values of lambda, from 1e10 down to 1e-2
ridge.mod=glmnet(x[train,],y[train],alpha=0,lambda=grid)
summary(ridge.mod)
## Length Class Mode
## a0 13 -none- numeric
## beta 221 dgCMatrix S4
## df 13 -none- numeric
## dim 2 -none- numeric
## lambda 13 -none- numeric
## dev.ratio 13 -none- numeric
## nulldev 1 -none- numeric
## npasses 1 -none- numeric
## jerr 1 -none- numeric
## offset 1 -none- logical
## call 5 -none- call
## nobs 1 -none- numeric
college.out=cv.glmnet(x[train,],y[train],alpha=0)
lam=college.out$lambda.min
lam
## [1] 32.60217
rid.predict=predict(ridge.mod, s=lam, newx=x[test,])
mean((rid.predict-y.test)^2)
## [1] 999129.1
The ridge regression test MSE is 999129.1, slightly below the least squares error. (d) Fit a lasso model on the training set, with λ chosen by cross-validation. Report the test error obtained, along with the number of non-zero coefficient estimates.
lasso.mod = glmnet(x[train,], y[train],alpha=1,lambda=grid)
summary(lasso.mod)
## Length Class Mode
## a0 13 -none- numeric
## beta 221 dgCMatrix S4
## df 13 -none- numeric
## dim 2 -none- numeric
## lambda 13 -none- numeric
## dev.ratio 13 -none- numeric
## nulldev 1 -none- numeric
## npasses 1 -none- numeric
## jerr 1 -none- numeric
## offset 1 -none- logical
## call 5 -none- call
## nobs 1 -none- numeric
lass.out=cv.glmnet(x[train,],y[train],alpha=1)
lasso.lam=lass.out$lambda.min   # use the lasso's own CV-selected lambda, not the ridge one
lasso.predict=predict(lasso.mod,s=lasso.lam,newx=x[test,])
mean((lasso.predict-y.test)^2)
## [1] 997606.4
out=glmnet(x,y,alpha=1,lambda = grid)
lasso.co=predict(out,type="coefficients",s=lasso.lam)[1:18,]
lasso.co[lasso.co!=0]
## (Intercept) PrivateYes Accept Enroll Top10perc
## -578.14677497 -402.62550281 1.49288970 -0.32380778 37.98789203
## Top25perc P.Undergrad Outstate Room.Board Personal
## -6.46409544 0.03086870 -0.05613934 0.10555602 0.01510032
## PhD Terminal S.F.Ratio perc.alumni Expend
## -5.53047112 -2.24688282 8.28625922 -0.35713200 0.06603770
## Grad.Rate
## 5.17749687
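The count of non-zero estimates can be read off directly (a one-liner using lasso.co from above; the printout has 16 entries, i.e. 15 predictors plus the intercept):
sum(lasso.co != 0) - 1   # non-zero coefficients, excluding the intercept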
The test MSE for the lasso model is 997606.4; 15 of the 17 coefficient estimates are non-zero (F.Undergrad and Books are shrunk to zero). (e) Fit a PCR model on the training set, with M chosen by cross-validation. Report the test error obtained, along with the value of M selected by cross-validation.
library(pls)
pcr.c=pcr(Apps~., data=College1.train,scale=TRUE,validation='CV')
summary(pcr.c)
## Data: X dimension: 388 17
## Y dimension: 388 1
## Fit method: svdpc
## Number of components considered: 17
##
## VALIDATION: RMSEP
## Cross-validated using 10 random segments.
## (Intercept) 1 comps 2 comps 3 comps 4 comps 5 comps 6 comps
## CV 4347 4345 2371 2391 2104 1949 1898
## adjCV 4347 4345 2368 2396 2085 1939 1891
## 7 comps 8 comps 9 comps 10 comps 11 comps 12 comps 13 comps
## CV 1899 1880 1864 1861 1870 1873 1891
## adjCV 1893 1862 1857 1853 1862 1865 1885
## 14 comps 15 comps 16 comps 17 comps
## CV 1903 1727 1295 1260
## adjCV 1975 1669 1283 1249
##
## TRAINING: % variance explained
## 1 comps 2 comps 3 comps 4 comps 5 comps 6 comps 7 comps 8 comps
## X 32.6794 56.94 64.38 70.61 76.27 80.97 84.48 87.54
## Apps 0.9148 71.17 71.36 79.85 81.49 82.73 82.79 83.70
## 9 comps 10 comps 11 comps 12 comps 13 comps 14 comps 15 comps
## X 90.50 92.89 94.96 96.81 97.97 98.73 99.39
## Apps 83.86 84.08 84.11 84.11 84.16 84.28 93.08
## 16 comps 17 comps
## X 99.86 100.00
## Apps 93.71 93.95
validationplot(pcr.c, val.type = "MSEP")
pcr.predict=predict(pcr.c,x[test,],ncomp=10)
mean((pcr.predict-y.test)^2)
## [1] 1422699
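The choice of M can also be made less ad hoc: recent versions of the pls package provide selectNcomp(), which applies a one-standard-error rule to the CV curve (a sketch; output not shown):
selectNcomp(pcr.c, method = "onesigma")   # smallest M within one SE of the CV minimum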
The CV error is minimized at M = 17, but that is just the full least squares fit; ignoring it, the error levels off around M = 10 (CV RMSEP 1861, versus 1864 at M = 9), so M = 10 is selected. The test MSE for the PCR model is 1422699, noticeably higher than the previous methods. (f) Fit a PLS model on the training set, with M chosen by cross-validation. Report the test error obtained, along with the value of M selected by cross-validation.
pls.c=plsr(Apps~., data=College1.train, scale=TRUE, validation="CV")
validationplot(pls.c,val.type = "MSEP")
summary(pls.c)
## Data: X dimension: 388 17
## Y dimension: 388 1
## Fit method: kernelpls
## Number of components considered: 17
##
## VALIDATION: RMSEP
## Cross-validated using 10 random segments.
## (Intercept) 1 comps 2 comps 3 comps 4 comps 5 comps 6 comps
## CV 4347 2178 1872 1734 1615 1453 1359
## adjCV 4347 2171 1867 1726 1586 1427 1341
## 7 comps 8 comps 9 comps 10 comps 11 comps 12 comps 13 comps
## CV 1347 1340 1329 1317 1310 1305 1305
## adjCV 1330 1324 1314 1302 1296 1291 1291
## 14 comps 15 comps 16 comps 17 comps
## CV 1305 1307 1307 1307
## adjCV 1291 1292 1293 1293
##
## TRAINING: % variance explained
## 1 comps 2 comps 3 comps 4 comps 5 comps 6 comps 7 comps 8 comps
## X 24.27 38.72 62.64 65.26 69.01 73.96 78.86 82.18
## Apps 76.96 84.31 86.80 91.48 93.37 93.75 93.81 93.84
## 9 comps 10 comps 11 comps 12 comps 13 comps 14 comps 15 comps
## X 85.35 87.42 89.18 91.41 92.70 94.58 97.16
## Apps 93.88 93.91 93.93 93.94 93.95 93.95 93.95
## 16 comps 17 comps
## X 98.15 100.00
## Apps 93.95 93.95
pls.predict=predict(pls.c,x[test,],ncomp=9)
mean((pls.predict-y.test)^2)
## [1] 1049868
With M = 9, beyond which the CV error is essentially flat (RMSEP 1329, versus a minimum of 1305 at M = 12), the PLS test MSE is 1049868. (g) Comment on the results obtained. How accurately can we predict the number of college applications received? Is there much difference among the test errors resulting from these five approaches?
test.avg = mean(College1.test[, "Apps"])
lm.test.r2 = 1 - mean((College1.test[, "Apps"] - predict.app)^2) / mean((College1.test[, "Apps"] - test.avg)^2)
ridge.test.r2 = 1 - mean((College1.test[, "Apps"] - rid.predict)^2) /mean((College1.test[, "Apps"] - test.avg)^2)
lasso.test.r2 = 1 - mean((College1.test[, "Apps"] - lasso.predict)^2) /mean((College1.test[, "Apps"] - test.avg)^2)
pcr.test.r2 = 1 - mean((pcr.predict-y.test)^2) /mean((College1.test[, "Apps"] - test.avg)^2)
pls.test.r2 = 1 - mean((pls.predict-y.test)^2) /mean((College1.test[, "Apps"] - test.avg)^2)
barplot(c(lm.test.r2, ridge.test.r2, lasso.test.r2, pcr.test.r2, pls.test.r2), col="yellow", names.arg=c("OLS", "Ridge", "Lasso", "PCR", "PLS"), main="Test R-squared")
The number of applications can be predicted quite accurately: every model achieves a test R-squared close to 0.9. OLS, ridge, lasso, and PLS give very similar test errors (roughly 1.00 to 1.05 million), while PCR with M = 10 is noticeably worse (about 1.42 million); otherwise there is little difference among the five approaches.
detach(College)
#11. We will now try to predict per capita crime rate in the Boston data set. (a) Try out some of the regression methods explored in this chapter, such as best subset selection, the lasso, ridge regression, and PCR. Present and discuss results for the approaches that you consider.
library(leaps)
library(ISLR2)
attach(Boston)
predict.regsubsets = function(object, newdata, id, ...) {
  # regsubsets objects have no predict() method, so define one:
  form = as.formula(object$call[[2]])   # recover the model formula from the call
  mat = model.matrix(form, newdata)     # build the design matrix for the new data
  coefi = coef(object, id = id)         # coefficients of the id-variable model
  mat[, names(coefi)] %*% coefi         # matrix product gives the predictions
}
k = 10
p = ncol(Boston) - 1
folds = sample(rep(1:k, length = nrow(Boston)))
cv.errors = matrix(NA, k, p)
for (i in 1:k) {
best.fit = regsubsets(crim ~ ., data = Boston[folds != i, ], nvmax = p)
for (j in 1:p) {
pred = predict(best.fit, Boston[folds == i, ], id = j)
cv.errors[i, j] = mean((Boston$crim[folds == i] - pred)^2)
}
}
mean.cv.errors <- apply(cv.errors, 2, mean)
plot(mean.cv.errors, type = "b", xlab = "Number of variables", ylab = "CV error")
which.min(mean.cv.errors)
## [1] 11
mean.cv.errors[which.min(mean.cv.errors)]
## [1] 42.09744
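To see which predictors the selected 11-variable model actually uses, best subset selection can be refit on the full data set and its coefficients extracted (a natural follow-up; reg.best is a new object introduced here, output not shown):
reg.best = regsubsets(crim ~ ., data = Boston, nvmax = p)   # refit on all of Boston
coef(reg.best, 11)   # coefficients of the 11-variable model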
x = model.matrix(crim ~ . - 1, data = Boston)
y = Boston$crim
cv.lasso = cv.glmnet(x, y, type.measure = "mse")
plot(cv.lasso)
coef(cv.lasso)
## 13 x 1 sparse Matrix of class "dgCMatrix"
## s1
## (Intercept) 1.7799525
## zn .
## indus .
## chas .
## nox .
## rm .
## age .
## dis .
## rad 0.1920089
## tax .
## ptratio .
## lstat .
## medv .
sqrt(cv.lasso$cvm[cv.lasso$lambda == cv.lasso$lambda.1se])
## [1] 7.737743
x = model.matrix(crim ~ . - 1, data = Boston)
y = Boston$crim
cv.ridge = cv.glmnet(x, y, type.measure = "mse", alpha = 0)
plot(cv.ridge)
coef(cv.ridge)
## 13 x 1 sparse Matrix of class "dgCMatrix"
## s1
## (Intercept) -0.077937209
## zn -0.002995505
## indus 0.034313469
## chas -0.215824241
## nox 2.232666325
## rm -0.160148174
## age 0.007254035
## dis -0.112595718
## rad 0.057406967
## tax 0.002573222
## ptratio 0.084061113
## lstat 0.043426139
## medv -0.028553107
sqrt(cv.ridge$cvm[cv.ridge$lambda == cv.ridge$lambda.1se])
## [1] 7.560442
pcr.crime = pcr(crim ~ ., data = Boston, scale = TRUE, validation = "CV")
summary(pcr.crime)
## Data: X dimension: 506 12
## Y dimension: 506 1
## Fit method: svdpc
## Number of components considered: 12
##
## VALIDATION: RMSEP
## Cross-validated using 10 random segments.
## (Intercept) 1 comps 2 comps 3 comps 4 comps 5 comps 6 comps
## CV 8.61 7.277 7.275 6.865 6.848 6.796 6.788
## adjCV 8.61 7.274 7.271 6.858 6.846 6.792 6.784
## 7 comps 8 comps 9 comps 10 comps 11 comps 12 comps
## CV 6.645 6.650 6.639 6.627 6.587 6.495
## adjCV 6.639 6.645 6.633 6.622 6.580 6.488
##
## TRAINING: % variance explained
## 1 comps 2 comps 3 comps 4 comps 5 comps 6 comps 7 comps 8 comps
## X 49.93 63.64 72.94 80.21 86.83 90.26 92.79 94.99
## crim 29.39 29.55 37.39 37.85 38.85 39.23 41.73 41.82
## 9 comps 10 comps 11 comps 12 comps
## X 96.78 98.33 99.48 100.00
## crim 42.12 42.43 43.58 44.93
(b) Propose a model (or set of models) that seem to perform well on this data set, and justify your answer. Make sure that you are evaluating model performance using validation set error, cross-validation, or some other reasonable alternative, as opposed to using training error. All of the figures reported above are cross-validation estimates, not training errors. On a common RMSE scale, the 11-variable best subset model achieves √42.10 ≈ 6.49; ridge gives 7.56 and the lasso 7.74 (both at lambda.1se), while PCR reaches its minimum of 6.495 only when all 12 components are used. The 11-variable best subset model therefore does about as well as full-dimensional PCR and clearly better than ridge or the lasso, so it is the model I propose; the snippet below collects these numbers side by side.
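For reference, a short snippet that gathers the CV errors above on a single RMSE scale (reusing objects defined earlier; RMSEP() is from the pls package, and I am assuming its $val array is indexed as estimate × response × components):
c(best.subset = sqrt(min(mean.cv.errors)),
  lasso = sqrt(cv.lasso$cvm[cv.lasso$lambda == cv.lasso$lambda.1se]),
  ridge = sqrt(cv.ridge$cvm[cv.ridge$lambda == cv.ridge$lambda.1se]),
  pcr = min(RMSEP(pcr.crime)$val["CV", 1, ]))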
(c) Does your chosen model involve all of the features in the data set? Why or why not? No. The chosen best subset model uses 11 of the 12 predictors, since adding a twelfth variable did not lower the CV error. By contrast, ridge regression shrinks but retains all 12 predictors, the lasso at lambda.1se keeps only rad, and PCR attains its lowest CV error with all 12 components, so it offers no dimension reduction here.