More flexible and hence will give improved prediction accuracy when its increase in bias is less than its decrease in variance.
More flexible and hence will give improved prediction accuracy when its increase in variance is less than its decrease in bias.
Less flexible and hence will give improved prediction accuracy when its increase in bias is less than its decrease in variance.
Less flexible and hence will give improved prediction accuracy when its increase in variance is less than its decrease in bias.
(a) The lasso, relative to least squares, is:
Option iii because Lasso’s solution can have a reduction in variance at the expense of small increase in bias while least squares estimates have high variance.
(b) Repeat (a) for ridge regression relative to least squares. Option iii, because Ridge regression and lasso’s advantage over least squares is rooted in the bias-variance trade-off.Ridge regression works best when the least squares estimates have high variance, the difference is the variance selection.
(c) Repeat (a) for non-linear methods relative to least squares. Option ii, because non-linear models are more flexible and have less bias.
library(ISLR)
data(College)
str(College)
## 'data.frame': 777 obs. of 18 variables:
## $ Private : Factor w/ 2 levels "No","Yes": 2 2 2 2 2 2 2 2 2 2 ...
## $ Apps : num 1660 2186 1428 417 193 ...
## $ Accept : num 1232 1924 1097 349 146 ...
## $ Enroll : num 721 512 336 137 55 158 103 489 227 172 ...
## $ Top10perc : num 23 16 22 60 16 38 17 37 30 21 ...
## $ Top25perc : num 52 29 50 89 44 62 45 68 63 44 ...
## $ F.Undergrad: num 2885 2683 1036 510 249 ...
## $ P.Undergrad: num 537 1227 99 63 869 ...
## $ Outstate : num 7440 12280 11250 12960 7560 ...
## $ Room.Board : num 3300 6450 3750 5450 4120 ...
## $ Books : num 450 750 400 450 800 500 500 450 300 660 ...
## $ Personal : num 2200 1500 1165 875 1500 ...
## $ PhD : num 70 29 53 92 76 67 90 89 79 40 ...
## $ Terminal : num 78 30 66 97 72 73 93 100 84 41 ...
## $ S.F.Ratio : num 18.1 12.2 12.9 7.7 11.9 9.4 11.5 13.7 11.3 11.5 ...
## $ perc.alumni: num 12 16 30 37 2 11 26 37 23 15 ...
## $ Expend : num 7041 10527 8735 19016 10922 ...
## $ Grad.Rate : num 60 56 54 59 15 55 63 73 80 52 ...
(a) Split the data set into a training set and a test set.
set.seed(1)
mydata<-sample(nrow(College), 0.75*nrow(College))
head(mydata)
## [1] 679 129 509 471 299 270
training<-College[mydata, ]
testing<-College[-mydata, ]
dim(College)
## [1] 777 18
(b) Fit a linear model using least squares on the training set, and report the test error obtained.
lmod=lm(Apps~., data=training)
summary(lmod)
##
## Call:
## lm(formula = Apps ~ ., data = training)
##
## Residuals:
## Min 1Q Median 3Q Max
## -5773.1 -425.2 4.5 327.9 7496.3
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) -5.784e+02 4.707e+02 -1.229 0.21962
## PrivateYes -4.673e+02 1.571e+02 -2.975 0.00305 **
## Accept 1.712e+00 4.567e-02 37.497 < 2e-16 ***
## Enroll -1.197e+00 2.151e-01 -5.564 4.08e-08 ***
## Top10perc 5.298e+01 6.158e+00 8.603 < 2e-16 ***
## Top25perc -1.528e+01 4.866e+00 -3.141 0.00177 **
## F.Undergrad 7.085e-02 3.760e-02 1.884 0.06002 .
## P.Undergrad 5.771e-02 3.530e-02 1.635 0.10266
## Outstate -8.143e-02 2.077e-02 -3.920 9.95e-05 ***
## Room.Board 1.609e-01 5.361e-02 3.002 0.00280 **
## Books 2.338e-01 2.634e-01 0.887 0.37521
## Personal 6.611e-03 6.850e-02 0.097 0.92315
## PhD -1.114e+01 5.149e+00 -2.163 0.03093 *
## Terminal 9.186e-01 5.709e+00 0.161 0.87223
## S.F.Ratio 1.689e+01 1.542e+01 1.096 0.27368
## perc.alumni 2.256e+00 4.635e+00 0.487 0.62667
## Expend 5.567e-02 1.300e-02 4.284 2.16e-05 ***
## Grad.Rate 6.427e+00 3.307e+00 1.944 0.05243 .
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 1009 on 564 degrees of freedom
## Multiple R-squared: 0.9336, Adjusted R-squared: 0.9316
## F-statistic: 466.7 on 17 and 564 DF, p-value: < 2.2e-16
lmod.pred=predict(lmod, newdata=testing)
lmod.err=mean((testing$Apps-lmod.pred)^2)
lmod.err
## [1] 1384604
The test error is 1384604 and The r2 for this model is 93.16%; This means that 93.16% of the variance is described by this model.
(c) Fit a ridge regression model on the training set, with λ chosen by cross-validation. Report the test error obtained.
library(Matrix)
library(foreach)
library(glmnet)
## Warning: package 'glmnet' was built under R version 4.1.3
## Loaded glmnet 4.1-3
set.seed(1)
xtraining=model.matrix(Apps~., data=training[,-1])
ytraining=training$Apps
xtesting=model.matrix(Apps~., data=testing[,-1])
ytesting=testing$Apps
ridge.reg=cv.glmnet(xtraining, ytraining, alpha=0)
plot(ridge.reg)
ridge.lambda=ridge.reg$lambda.min
ridge.lambda
## [1] 364.6228
ridge.pred=predict(ridge.reg, s=ridge.lambda, newx = xtesting)
ridge.err=mean((ridge.pred-ytesting)^2)
ridge.err
## [1] 1260111
The ridge regression has lower test error compared to the OLS regression, this is due that ridge regression predicts better because OLS has higher variance due to large variance in the data.
(d) Fit a lasso model on the training set, with λ chosen by crossvalidation. Report the test error obtained, along with the number of non-zero coefficient estimates.
set.seed(1)
lasso.mod=cv.glmnet(xtraining, ytraining, alpha=1)
plot(lasso.mod)
lasso.lambda=lasso.mod$lambda.min
lasso.lambda
## [1] 1.945882
lasso.pred=predict(lasso.mod, s=lasso.lambda, newx = xtesting)
lasso.err=mean((lasso.pred-ytesting)^2)
lasso.err
## [1] 1394834
lasso.coeff=predict(lasso.mod, type="coefficients", s=lasso.lambda)[1:18,]
lasso.coeff
## (Intercept) (Intercept) Accept Enroll Top10perc
## -1.030320e+03 0.000000e+00 1.693638e+00 -1.100616e+00 5.104012e+01
## Top25perc F.Undergrad P.Undergrad Outstate Room.Board
## -1.369969e+01 7.851307e-02 6.036558e-02 -9.861002e-02 1.475536e-01
## Books Personal PhD Terminal S.F.Ratio
## 2.185146e-01 0.000000e+00 -8.390364e+00 1.349648e+00 2.457249e+01
## perc.alumni Expend Grad.Rate
## 7.684156e-01 5.858007e-02 5.324356e+00
Lasso is more simple and with easier interpretation than ridge regression, however it did not perform as well in prediction accuracy with a test error of 1394834, because the variance of the ridge regression was slightly lower than that of the lasso, it produced a more minimum error.
(e) Fit a PCR model on the training set, with M chosen by crossvalidation. Report the test error obtained, along with the value of M selected by cross-validation.
library(pls)
## Warning: package 'pls' was built under R version 4.1.3
##
## Attaching package: 'pls'
## The following object is masked from 'package:stats':
##
## loadings
library(stats)
pcr.mod=pcr(Apps~., data=training, scale=TRUE, validation="CV")
validationplot(pcr.mod, val.type = "MSEP")
summary(pcr.mod)
## Data: X dimension: 582 17
## Y dimension: 582 1
## Fit method: svdpc
## Number of components considered: 17
##
## VALIDATION: RMSEP
## Cross-validated using 10 random segments.
## (Intercept) 1 comps 2 comps 3 comps 4 comps 5 comps 6 comps
## CV 3862 3773 2100 2106 1812 1678 1674
## adjCV 3862 3772 2097 2104 1800 1666 1668
## 7 comps 8 comps 9 comps 10 comps 11 comps 12 comps 13 comps
## CV 1674 1625 1580 1579 1582 1586 1590
## adjCV 1669 1613 1574 1573 1576 1580 1584
## 14 comps 15 comps 16 comps 17 comps
## CV 1585 1480 1189 1123
## adjCV 1580 1463 1179 1115
##
## TRAINING: % variance explained
## 1 comps 2 comps 3 comps 4 comps 5 comps 6 comps 7 comps 8 comps
## X 32.159 57.17 64.41 70.20 75.53 80.48 84.24 87.56
## Apps 5.226 71.83 71.84 80.02 83.01 83.07 83.21 84.46
## 9 comps 10 comps 11 comps 12 comps 13 comps 14 comps 15 comps
## X 90.54 92.81 94.92 96.73 97.81 98.69 99.35
## Apps 85.00 85.22 85.22 85.23 85.36 85.45 89.93
## 16 comps 17 comps
## X 99.82 100.00
## Apps 92.84 93.36
validationplot(pcr.mod, val.type="MSEP")
pcr.pred=predict(pcr.mod, testing, ncomp=10)
pcr.err=mean((pcr.pred-testing$Apps)^2)
pcr.err
## [1] 1952693
The number of components that has the lowest CV is number 10, with 1560, with a variance of 85.22%, with a MSE error of 1952693 for the pcr model.
(f) Fit a PLS model on the training set, with M chosen by crossvalidation. Report the test error obtained, along with the value of M selected by cross-validation.
pls.mod=plsr(Apps~., data=training, scale=TRUE, validation="CV")
validationplot(pls.mod, val.type = "MSEP")
summary(pls.mod)
## Data: X dimension: 582 17
## Y dimension: 582 1
## Fit method: kernelpls
## Number of components considered: 17
##
## VALIDATION: RMSEP
## Cross-validated using 10 random segments.
## (Intercept) 1 comps 2 comps 3 comps 4 comps 5 comps 6 comps
## CV 3862 1916 1708 1496 1415 1261 1181
## adjCV 3862 1912 1706 1489 1396 1237 1170
## 7 comps 8 comps 9 comps 10 comps 11 comps 12 comps 13 comps
## CV 1166 1152 1146 1144 1144 1140 1138
## adjCV 1157 1144 1137 1135 1135 1131 1129
## 14 comps 15 comps 16 comps 17 comps
## CV 1137 1137 1137 1137
## adjCV 1129 1129 1129 1129
##
## TRAINING: % variance explained
## 1 comps 2 comps 3 comps 4 comps 5 comps 6 comps 7 comps 8 comps
## X 25.67 47.09 62.54 65.0 67.54 72.28 76.80 80.63
## Apps 76.80 82.71 87.20 90.8 92.79 93.05 93.14 93.22
## 9 comps 10 comps 11 comps 12 comps 13 comps 14 comps 15 comps
## X 82.71 85.53 88.01 90.95 93.07 95.18 96.86
## Apps 93.30 93.32 93.34 93.35 93.36 93.36 93.36
## 16 comps 17 comps
## X 98.00 100.00
## Apps 93.36 93.36
Based on the summary of the model, the lowest CV error occurs at 13 components, with 93.36% of the variance in the predictors is explained and 93.07% of the variance in the response variable is explained.
pls.pred=predict(pls.mod, testing, ncomp = 13)
pls.err=mean((pls.pred-testing$Apps)^2)
pls.err
## [1] 1380520
(g) Comment on the results obtained. How accurately can we predict the number of college applications received? Is there much difference among the test errors resulting from these five approaches?
barplot(c(lmod.err, ridge.err, lasso.err, pcr.err, pls.err), col="gray", xlab="Methods", ylab="Test Error", main="Test Errors general", names.arg=c("OLS", "Ridge", "Lasso", "PCR", "PLS"))
avg.apps=mean(testing$Apps)
lm.r2=1-mean((lmod.pred-testing$Apps)^2)/mean((avg.apps-testing$Apps)^2)
lm.r2
## [1] 0.9086432
ridge.r2=1-mean((ridge.pred-testing$Apps)^2)/mean((avg.apps-testing$Apps)^2)
ridge.r2
## [1] 0.9168573
lasso.r2=1-mean((lasso.pred-testing$Apps)^2)/mean((avg.apps-testing$Apps)^2)
lasso.r2
## [1] 0.9079682
pcr.r2=1-mean((pcr.pred-testing$Apps)^2)/mean((avg.apps-testing$Apps)^2)
pcr.r2
## [1] 0.8711605
pls.r2=1-mean((pls.pred-testing$Apps)^2)/mean((avg.apps-testing$Apps)^2)
pls.r2
## [1] 0.9089127
Based on the results most of the models have high accuracy, the best is the ridge model with R2 of 0.9168573.
library(MASS)
library(leaps)
## Warning: package 'leaps' was built under R version 4.1.3
attach(Boston)
data(Boston)
set.seed(1)
(a) Try out some of the regression methods explored in this chapter, such as best subset selection, the lasso, ridge regression, and PCR. Present and discuss results for the approaches that you consider.
predict.regsubsets = function(object, newdata, id, ...) {
form = as.formula(object$call[[2]])
mat = model.matrix(form, newdata)
coefi = coef(object, id = id)
mat[, names(coefi)] %*% coefi
}
k = 10
p = ncol(Boston) - 1
folds = sample(rep(1:k, length = nrow(Boston)))
cv.errors = matrix(NA, k, p)
for (i in 1:k) {
best.fit = regsubsets(crim ~ ., data = Boston[folds != i, ], nvmax = p)
for (j in 1:p) {
pred = predict(best.fit, Boston[folds == i, ], id = j)
cv.errors[i, j] = mean((Boston$crim[folds == i] - pred)^2)
}
}
rmse.cv = sqrt(apply(cv.errors, 2, mean))
plot(rmse.cv, pch = 19, type = "b")
summary(best.fit)
## Subset selection object
## Call: regsubsets.formula(crim ~ ., data = Boston[folds != i, ], nvmax = p)
## 13 Variables (and intercept)
## Forced in Forced out
## zn FALSE FALSE
## indus FALSE FALSE
## chas FALSE FALSE
## nox FALSE FALSE
## rm FALSE FALSE
## age FALSE FALSE
## dis FALSE FALSE
## rad FALSE FALSE
## tax FALSE FALSE
## ptratio FALSE FALSE
## black FALSE FALSE
## lstat FALSE FALSE
## medv FALSE FALSE
## 1 subsets of each size up to 13
## Selection Algorithm: exhaustive
## zn indus chas nox rm age dis rad tax ptratio black lstat medv
## 1 ( 1 ) " " " " " " " " " " " " " " "*" " " " " " " " " " "
## 2 ( 1 ) " " " " " " " " " " " " " " "*" " " " " " " "*" " "
## 3 ( 1 ) " " " " " " " " " " " " " " "*" " " " " "*" "*" " "
## 4 ( 1 ) "*" " " " " " " " " " " " " "*" " " " " "*" "*" " "
## 5 ( 1 ) "*" " " " " " " " " " " "*" "*" " " " " " " "*" "*"
## 6 ( 1 ) "*" "*" " " " " " " " " "*" "*" " " " " " " "*" "*"
## 7 ( 1 ) "*" "*" " " " " " " " " "*" "*" " " " " "*" "*" "*"
## 8 ( 1 ) "*" " " " " "*" " " " " "*" "*" " " "*" "*" "*" "*"
## 9 ( 1 ) "*" " " " " "*" "*" " " "*" "*" " " "*" "*" "*" "*"
## 10 ( 1 ) "*" "*" " " "*" "*" " " "*" "*" " " "*" "*" "*" "*"
## 11 ( 1 ) "*" "*" "*" "*" "*" " " "*" "*" " " "*" "*" "*" "*"
## 12 ( 1 ) "*" "*" "*" "*" "*" " " "*" "*" "*" "*" "*" "*" "*"
## 13 ( 1 ) "*" "*" "*" "*" "*" "*" "*" "*" "*" "*" "*" "*" "*"
The variables included in this model are zn, indus, chas, nox, rm, age, dis, rad, tax, ptratio, black, lstat, and medv.
which.min(rmse.cv)
## [1] 9
boston.bsm.err=(rmse.cv[which.min(rmse.cv)])^2
boston.bsm.err
## [1] 42.81453
The cross-validation selects a 2-variable model based on the Test MSE, the CV estimate for the test MSE is 44.32201–the lowest MSE reported. PCR
library(pls)
boston.pcr=pcr(crim~., data=Boston, scale=TRUE, validation="CV")
summary(boston.pcr)
## Data: X dimension: 506 13
## Y dimension: 506 1
## Fit method: svdpc
## Number of components considered: 13
##
## VALIDATION: RMSEP
## Cross-validated using 10 random segments.
## (Intercept) 1 comps 2 comps 3 comps 4 comps 5 comps 6 comps
## CV 8.61 7.197 7.191 6.740 6.742 6.738 6.745
## adjCV 8.61 7.195 7.188 6.736 6.735 6.735 6.741
## 7 comps 8 comps 9 comps 10 comps 11 comps 12 comps 13 comps
## CV 6.745 6.609 6.635 6.624 6.625 6.589 6.526
## adjCV 6.740 6.604 6.628 6.617 6.618 6.581 6.517
##
## TRAINING: % variance explained
## 1 comps 2 comps 3 comps 4 comps 5 comps 6 comps 7 comps 8 comps
## X 47.70 60.36 69.67 76.45 82.99 88.00 91.14 93.45
## crim 30.69 30.87 39.27 39.61 39.61 39.86 40.14 42.47
## 9 comps 10 comps 11 comps 12 comps 13 comps
## X 95.40 97.04 98.46 99.52 100.0
## crim 42.55 42.78 43.04 44.13 45.4
Based on the CV error as well as the variances explained, I think that the appropriate PCR model would only include 8 components. With 8 components, 93.45% of the variance is explained in the predictors by the model, and 42.47% of the variance is explained in the response variable by the model.
LASSO MODEL
boston.x=model.matrix(crim~., data=Boston)[,-1]
boston.y=Boston$crim
boston.lassomod=cv.glmnet(boston.x,boston.y,alpha=1, type.measure = "mse")
coef(boston.lassomod)
## 14 x 1 sparse Matrix of class "dgCMatrix"
## s1
## (Intercept) 1.4186414
## zn .
## indus .
## chas .
## nox .
## rm .
## age .
## dis .
## rad 0.2298449
## tax .
## ptratio .
## black .
## lstat .
## medv .
boston.lasso.err=(boston.lassomod$cvm[boston.lassomod$lambda==boston.lassomod$lambda.1se])
boston.lasso.err
## [1] 57.47456
plot(boston.lassomod)
The Lasso model has a MSE of 55.0202, with the variable rad.
Ridge regression
boston.ridge=cv.glmnet(boston.x, boston.y, type.measure = "mse", alpha=0)
plot(boston.ridge)
coef(boston.ridge)
## 14 x 1 sparse Matrix of class "dgCMatrix"
## s1
## (Intercept) 1.146730398
## zn -0.002889389
## indus 0.033208330
## chas -0.209288622
## nox 2.158437385
## rm -0.158326514
## age 0.007072375
## dis -0.109819199
## rad 0.056100087
## tax 0.002508948
## ptratio 0.082673533
## black -0.003147919
## lstat 0.042243863
## medv -0.027678989
boston.ridge.err=boston.ridge$cvm[boston.ridge$lambda==boston.ridge$lambda.1se]
boston.ridge.err
## [1] 56.23578
The ridge regression has a MSE if 58.66963.
(b) Propose a model (or set of models) that seem to perform well on this data set, and justify your answer. Make sure that you are evaluating model performance using validation set error, crossvalidation, or some other reasonable alternative, as opposed to using training error.
The model that has the lowest error is the chosen by the subset selection method 42.81453.
(c) Does your chosen model involve all of the features in the data set? Why or why not? The best option is the best subset model, because it has a lowest MSE and has the least number of predictors with 9.