For parts (a) through (c), indicate which of i. through iv. is correct. Justify your answer.
(a) The lasso, relative to least squares, is:
i. More flexible and hence will give improved prediction accuracy when its increase in bias is less than its decrease in variance.
ii. More flexible and hence will give improved prediction accuracy when its increase in variance is less than its decrease in bias.
iii. Less flexible and hence will give improved prediction accuracy when its increase in bias is less than its decrease in variance.
iv. Less flexible and hence will give improved prediction accuracy when its increase in variance is less than its decrease in bias.
The Correct answer to part A is number III Less Flexible and hence will give improved prediction accuracy when its increase in bias is less than its decrease.
This is due to Lasso’s solution can have a reduced variance when there is a small increase in bias while the least squares estimate bias get a higher variance. The Lasso can make the coefficient smaller as well as remove the non-essential variables for less variance & higher bias.
(b) Repeat (a) for ridge regression relative to least squares.
The Correct answer to part B is the same ass part A which is number III, Less Flexible and hence will give improved prediction accuracy when its increase in bias is less than its decrease.
This is due to the fact that the Ridge Regression can also shrink the coefficient estimates as small as possible and leads to keeping some of the extra variance provided by the incorporation of non-essential variables. It also makes a model with a smaller variance and larger bias relative to least squares. When compared to lasso Ridge has a smaller bias and larger variance. At times ther is an extreme situation of this depending on the size of the λ penalty used which leads to higher penalty, higher bias, and lower variance.
(c) Repeat (a) for non-linear methods relative to least squares.
The Correct answer to part C is number II, more flexible and hence will give improved prediction accuracy when its increase in variance is than its decrease in bias.
This is primarily due to the reason that non-linear regression is more flexible when it comes to the shape of the curves of the models that it can fit. Non- linear regression tends to have less bias when compared to least squares. Finally, these models tend to have higher variance and yet has the goal of increased prediction accuracy.
In this exercise, we will predict the number of applications received using the other variables in the College data set.
(a) Split the data set into a training set and a test set.
library(ISLR2)
library(tidyverse)
## ── Attaching packages ─────────────────────────────────────── tidyverse 1.3.2 ──
## ✔ ggplot2 3.4.1 ✔ purrr 1.0.1
## ✔ tibble 3.1.8 ✔ dplyr 1.1.0
## ✔ tidyr 1.3.0 ✔ stringr 1.5.0
## ✔ readr 2.1.4 ✔ forcats 1.0.0
## ── Conflicts ────────────────────────────────────────── tidyverse_conflicts() ──
## ✖ dplyr::filter() masks stats::filter()
## ✖ dplyr::lag() masks stats::lag()
library(glmnet)
## Warning: package 'glmnet' was built under R version 4.2.3
## Loading required package: Matrix
##
## Attaching package: 'Matrix'
##
## The following objects are masked from 'package:tidyr':
##
## expand, pack, unpack
##
## Loaded glmnet 4.1-7
library(pls)
## Warning: package 'pls' was built under R version 4.2.3
##
## Attaching package: 'pls'
##
## The following object is masked from 'package:stats':
##
## loadings
library(dplyr)
attach(College)
str(College)
## 'data.frame': 777 obs. of 18 variables:
## $ Private : Factor w/ 2 levels "No","Yes": 2 2 2 2 2 2 2 2 2 2 ...
## $ Apps : num 1660 2186 1428 417 193 ...
## $ Accept : num 1232 1924 1097 349 146 ...
## $ Enroll : num 721 512 336 137 55 158 103 489 227 172 ...
## $ Top10perc : num 23 16 22 60 16 38 17 37 30 21 ...
## $ Top25perc : num 52 29 50 89 44 62 45 68 63 44 ...
## $ F.Undergrad: num 2885 2683 1036 510 249 ...
## $ P.Undergrad: num 537 1227 99 63 869 ...
## $ Outstate : num 7440 12280 11250 12960 7560 ...
## $ Room.Board : num 3300 6450 3750 5450 4120 ...
## $ Books : num 450 750 400 450 800 500 500 450 300 660 ...
## $ Personal : num 2200 1500 1165 875 1500 ...
## $ PhD : num 70 29 53 92 76 67 90 89 79 40 ...
## $ Terminal : num 78 30 66 97 72 73 93 100 84 41 ...
## $ S.F.Ratio : num 18.1 12.2 12.9 7.7 11.9 9.4 11.5 13.7 11.3 11.5 ...
## $ perc.alumni: num 12 16 30 37 2 11 26 37 23 15 ...
## $ Expend : num 7041 10527 8735 19016 10922 ...
## $ Grad.Rate : num 60 56 54 59 15 55 63 73 80 52 ...
pairs(College)
x=model.matrix(Apps~.,College)[,-1]
y=College$Apps
set.seed(10)
train=sample(1:nrow(x), nrow(x)/2)
test=(-train)
College.train = College[train, ]
College.test = College[test, ]
y.test=y[test]
(b) Fit a linear model using least squares on the training set, and report the test error obtained.
pls.fit<-lm(Apps~., data=College, subset=train)
summary(pls.fit)
##
## Call:
## lm(formula = Apps ~ ., data = College, subset = train)
##
## Residuals:
## Min 1Q Median 3Q Max
## -5139.5 -473.3 -21.1 353.2 7402.7
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) -629.36179 639.35741 -0.984 0.325579
## PrivateYes -647.56836 192.17056 -3.370 0.000832 ***
## Accept 1.68912 0.05038 33.530 < 2e-16 ***
## Enroll -1.02383 0.27721 -3.693 0.000255 ***
## Top10perc 48.19124 8.10714 5.944 6.42e-09 ***
## Top25perc -10.51538 6.44952 -1.630 0.103865
## F.Undergrad 0.01992 0.05364 0.371 0.710574
## P.Undergrad 0.04213 0.05348 0.788 0.431373
## Outstate -0.09489 0.02674 -3.549 0.000436 ***
## Room.Board 0.14549 0.07243 2.009 0.045277 *
## Books 0.06660 0.31115 0.214 0.830623
## Personal 0.05663 0.09453 0.599 0.549475
## PhD -10.11489 7.11588 -1.421 0.156027
## Terminal -2.29300 8.03546 -0.285 0.775528
## S.F.Ratio 22.07117 18.70991 1.180 0.238897
## perc.alumni 2.08121 6.00673 0.346 0.729179
## Expend 0.07654 0.01672 4.577 6.45e-06 ***
## Grad.Rate 9.99706 4.49821 2.222 0.026857 *
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 1092 on 370 degrees of freedom
## Multiple R-squared: 0.9395, Adjusted R-squared: 0.9367
## F-statistic: 338 on 17 and 370 DF, p-value: < 2.2e-16
pred.app<-predict(pls.fit, College.test)
test.error<-mean((College.test$Apps-pred.app)^2)
test.error
## [1] 1020100
This shows the mean squared error for the linear model using least squares is 1020100.
(c) Fit a ridge regression model on the training set, with λ chosen by cross-validation. Report the test error obtained.
grid=10^seq(10,-2,length=100)
ridge.mod=glmnet(x[train,],y[train],alpha=0,lambda=grid)
summary(ridge.mod)
## Length Class Mode
## a0 100 -none- numeric
## beta 1700 dgCMatrix S4
## df 100 -none- numeric
## dim 2 -none- numeric
## lambda 100 -none- numeric
## dev.ratio 100 -none- numeric
## nulldev 1 -none- numeric
## npasses 1 -none- numeric
## jerr 1 -none- numeric
## offset 1 -none- logical
## call 5 -none- call
## nobs 1 -none- numeric
cv.college.out=cv.glmnet(x[train,],y[train] ,alpha=0)
bestlam=cv.college.out$lambda.min
bestlam
## [1] 411.3927
ridge.pred=predict(ridge.mod,s=bestlam,newx=x[test,])
mean((ridge.pred-y.test)^2)
## [1] 985020.1
This shows the Mean Squared Error for the ridge regression model is 985020.1 which is slightly lower than the one with the leaner model.
(d) Fit a lasso model on the training set, with λ chosen by crossvalidation. Report the test error obtained, along with the number of non-zero coefficient estimates.
lasso.mod=glmnet(x[train,],y[train],alpha=1,lambda=grid)
summary(lasso.mod)
## Length Class Mode
## a0 100 -none- numeric
## beta 1700 dgCMatrix S4
## df 100 -none- numeric
## dim 2 -none- numeric
## lambda 100 -none- numeric
## dev.ratio 100 -none- numeric
## nulldev 1 -none- numeric
## npasses 1 -none- numeric
## jerr 1 -none- numeric
## offset 1 -none- logical
## call 5 -none- call
## nobs 1 -none- numeric
cv.out=cv.glmnet(x[train,],y[train],alpha=1)
bestlam=cv.out$lambda.min
bestlam
## [1] 24.66235
lasso.pred=predict(lasso.mod,s=bestlam,newx=x[test,])
mean((lasso.pred-y.test)^2)
## [1] 1008145
out=glmnet(x,y,alpha=1,lambda = grid)
lasso.coef=predict(out,type="coefficients",s=bestlam)[1:18,]
lasso.coef[lasso.coef!=0]
## (Intercept) PrivateYes Accept Enroll Top10perc
## -6.324960e+02 -4.087012e+02 1.436837e+00 -1.410240e-01 3.143012e+01
## Top25perc P.Undergrad Outstate Room.Board Personal
## -8.606536e-01 1.480293e-02 -5.342495e-02 1.205819e-01 4.379135e-05
## PhD Terminal S.F.Ratio perc.alumni Expend
## -5.121245e+00 -3.371192e+00 2.717231e+00 -1.039648e+00 6.838161e-02
## Grad.Rate
## 4.700317e+00
This next model shows the Mean Squared Error of the LASSO model id 1008145 higher than the previous model in part c but still lower than the model of part a.
(e) Fit a PCR model on the training set, with M chosen by crossvalidation. Report the test error obtained, along with the value of M selected by cross-validation.
pcr.college=pcr(Apps~., data=College.train,scale=TRUE,validation="CV")
summary(pcr.college)
## Data: X dimension: 388 17
## Y dimension: 388 1
## Fit method: svdpc
## Number of components considered: 17
##
## VALIDATION: RMSEP
## Cross-validated using 10 random segments.
## (Intercept) 1 comps 2 comps 3 comps 4 comps 5 comps 6 comps
## CV 4347 4345 2371 2391 2104 1949 1898
## adjCV 4347 4345 2368 2396 2085 1939 1891
## 7 comps 8 comps 9 comps 10 comps 11 comps 12 comps 13 comps
## CV 1899 1880 1864 1861 1870 1873 1891
## adjCV 1893 1862 1857 1853 1862 1865 1885
## 14 comps 15 comps 16 comps 17 comps
## CV 1903 1727 1295 1260
## adjCV 1975 1669 1283 1249
##
## TRAINING: % variance explained
## 1 comps 2 comps 3 comps 4 comps 5 comps 6 comps 7 comps 8 comps
## X 32.6794 56.94 64.38 70.61 76.27 80.97 84.48 87.54
## Apps 0.9148 71.17 71.36 79.85 81.49 82.73 82.79 83.70
## 9 comps 10 comps 11 comps 12 comps 13 comps 14 comps 15 comps
## X 90.50 92.89 94.96 96.81 97.97 98.73 99.39
## Apps 83.86 84.08 84.11 84.11 84.16 84.28 93.08
## 16 comps 17 comps
## X 99.86 100.00
## Apps 93.71 93.95
validationplot(pcr.college, val.type="MSEP")
pcr.pred=predict(pcr.college,x[test,],ncomp=10)
mean((pcr.pred-y.test)^2)
## [1] 1422699
This model shows the mean squared error for the PCR model is 1422699 and the selected number of components was 10 as it has a low cv and high varience explained< so far this is the highest mean sqaured error of all the models ran so far.
(f) Fit a PLS model on the training set, with M chosen by crossvalidation. Report the test error obtained, along with the valueof M selected by cross-validation.
pls.college=plsr(Apps~., data=College.train,scale=TRUE, validation="CV")
validationplot(pls.college, val.type="MSEP")
summary(pls.college)
## Data: X dimension: 388 17
## Y dimension: 388 1
## Fit method: kernelpls
## Number of components considered: 17
##
## VALIDATION: RMSEP
## Cross-validated using 10 random segments.
## (Intercept) 1 comps 2 comps 3 comps 4 comps 5 comps 6 comps
## CV 4347 2178 1872 1734 1615 1453 1359
## adjCV 4347 2171 1867 1726 1586 1427 1341
## 7 comps 8 comps 9 comps 10 comps 11 comps 12 comps 13 comps
## CV 1347 1340 1329 1317 1310 1305 1305
## adjCV 1330 1324 1314 1302 1296 1291 1291
## 14 comps 15 comps 16 comps 17 comps
## CV 1305 1307 1307 1307
## adjCV 1291 1292 1293 1293
##
## TRAINING: % variance explained
## 1 comps 2 comps 3 comps 4 comps 5 comps 6 comps 7 comps 8 comps
## X 24.27 38.72 62.64 65.26 69.01 73.96 78.86 82.18
## Apps 76.96 84.31 86.80 91.48 93.37 93.75 93.81 93.84
## 9 comps 10 comps 11 comps 12 comps 13 comps 14 comps 15 comps
## X 85.35 87.42 89.18 91.41 92.70 94.58 97.16
## Apps 93.88 93.91 93.93 93.94 93.95 93.95 93.95
## 16 comps 17 comps
## X 98.15 100.00
## Apps 93.95 93.95
pls.pred=predict(pls.college,x[test,],ncomp=9)
mean((pls.pred-y.test)^2)
## [1] 1049868
This model shows tje mean squared error for the PLS model is 1049868 which is simular but in the middle of the pack compared to the other models. The selected number of components was 11 and had the lowest CV and a high variance.
(g) Comment on the results obtained. How accurately can we predict the number of college applications received? Is there much difference among the test errors resulting from these five approaches?
test.avg = mean(College.test[, "Apps"])
lm.test.r2 = 1 - mean((College.test[, "Apps"] - pred.app)^2) /mean((College.test[, "Apps"] - test.avg)^2)
ridge.test.r2 = 1 - mean((College.test[, "Apps"] - ridge.pred)^2) /mean((College.test[, "Apps"] - test.avg)^2)
lasso.test.r2 = 1 - mean((College.test[, "Apps"] - lasso.pred)^2) /mean((College.test[, "Apps"] - test.avg)^2)
pcr.test.r2 = 1 - mean((pcr.pred-y.test)^2) /mean((College.test[, "Apps"] - test.avg)^2)
pls.test.r2 = 1 - mean((pls.pred-y.test)^2) /mean((College.test[, "Apps"] - test.avg)^2)
barplot(c(lm.test.r2, ridge.test.r2, lasso.test.r2, pcr.test.r2, pls.test.r2), names.arg=c("OLS", "Ridge", "Lasso", "PCR", "PLS"), main="Test R-squared")
detach(College)
When comparing the R-squared of the models they show the PCR had the lowest accuracy but not too much lower than the other models. When looking at the Mean Squared Error of the different model the Ridge Regression model has the lowest so it would be the better of model.
We will now try to predict per capita crime rate in the Boston data set.
library(leaps)
## Warning: package 'leaps' was built under R version 4.2.3
attach(Boston)
str(Boston)
## 'data.frame': 506 obs. of 13 variables:
## $ crim : num 0.00632 0.02731 0.02729 0.03237 0.06905 ...
## $ zn : num 18 0 0 0 0 0 12.5 12.5 12.5 12.5 ...
## $ indus : num 2.31 7.07 7.07 2.18 2.18 2.18 7.87 7.87 7.87 7.87 ...
## $ chas : int 0 0 0 0 0 0 0 0 0 0 ...
## $ nox : num 0.538 0.469 0.469 0.458 0.458 0.458 0.524 0.524 0.524 0.524 ...
## $ rm : num 6.58 6.42 7.18 7 7.15 ...
## $ age : num 65.2 78.9 61.1 45.8 54.2 58.7 66.6 96.1 100 85.9 ...
## $ dis : num 4.09 4.97 4.97 6.06 6.06 ...
## $ rad : int 1 2 2 3 3 3 5 5 5 5 ...
## $ tax : num 296 242 242 222 222 222 311 311 311 311 ...
## $ ptratio: num 15.3 17.8 17.8 18.7 18.7 18.7 15.2 15.2 15.2 15.2 ...
## $ lstat : num 4.98 9.14 4.03 2.94 5.33 ...
## $ medv : num 24 21.6 34.7 33.4 36.2 28.7 22.9 27.1 16.5 18.9 ...
(a) Try out some of the regression methods explored in this chapter, such as best subset selection, the lasso, ridge regression, and PCR. Present and discuss results for the approaches that you consider.
set.seed(123)
index = sample(nrow(Boston), 0.7*nrow(Boston))
boston.train<-Boston[index,]
boston.test<-Boston[-index,]
boston.bsm = regsubsets(crim ~ .,data = boston.train, nvmax = 13)
summary(boston.bsm)
## Subset selection object
## Call: regsubsets.formula(crim ~ ., data = boston.train, nvmax = 13)
## 12 Variables (and intercept)
## Forced in Forced out
## zn FALSE FALSE
## indus FALSE FALSE
## chas FALSE FALSE
## nox FALSE FALSE
## rm FALSE FALSE
## age FALSE FALSE
## dis FALSE FALSE
## rad FALSE FALSE
## tax FALSE FALSE
## ptratio FALSE FALSE
## lstat FALSE FALSE
## medv FALSE FALSE
## 1 subsets of each size up to 12
## Selection Algorithm: exhaustive
## zn indus chas nox rm age dis rad tax ptratio lstat medv
## 1 ( 1 ) " " " " " " " " " " " " " " "*" " " " " " " " "
## 2 ( 1 ) " " " " " " " " " " " " " " "*" " " " " "*" " "
## 3 ( 1 ) " " " " " " " " " " " " " " "*" " " "*" " " "*"
## 4 ( 1 ) "*" " " " " " " " " " " "*" "*" " " " " " " "*"
## 5 ( 1 ) "*" "*" " " " " " " " " "*" "*" " " " " " " "*"
## 6 ( 1 ) "*" " " " " "*" " " " " "*" "*" " " "*" " " "*"
## 7 ( 1 ) "*" " " " " "*" " " " " "*" "*" " " "*" "*" "*"
## 8 ( 1 ) "*" " " " " "*" "*" " " "*" "*" " " "*" "*" "*"
## 9 ( 1 ) "*" " " " " "*" "*" " " "*" "*" "*" "*" "*" "*"
## 10 ( 1 ) "*" "*" " " "*" "*" " " "*" "*" "*" "*" "*" "*"
## 11 ( 1 ) "*" "*" "*" "*" "*" " " "*" "*" "*" "*" "*" "*"
## 12 ( 1 ) "*" "*" "*" "*" "*" "*" "*" "*" "*" "*" "*" "*"
boston.test.mat = model.matrix(crim ~., data = boston.test, nvmax = 12)
val.errors = rep(NA, 12)
for (i in 1:12) {
coefi = coef(boston.bsm, id = i)
pred = boston.test.mat[, names(coefi)] %*% coefi
val.errors[i] = mean((pred - boston.test$crim)^2)
}
plot(val.errors, xlab = "Number of predictors in model", ylab = "Test MSE", pch = 19, type = "b")
which.min(val.errors)
## [1] 11
coef(boston.bsm,which.min(val.errors))
## (Intercept) zn indus chas nox
## 15.299873241 0.052384985 -0.054494823 -0.252605744 -10.641836652
## rm dis rad tax ptratio
## 0.731054549 -1.025629494 0.683220087 -0.004538344 -0.359671426
## lstat medv
## 0.130666954 -0.263974404
boston.bsm.MSE = val.errors[11]
boston.bsm.MSE
## [1] 18.67874
The best subset model adds 1 predictor each time in this case up to 12 predictors within all 12 models, the one that gave the best/ least Means squared error was 11. Concluding the final model selected by the best subset approach with the MSE being 18.68
boston.train.mat=model.matrix(crim~.,boston.train)
boston.test.mat=model.matrix(crim~.,boston.test)
grid=10^seq(10,-2,length=100)
set.seed(1)
bridge.mod = glmnet(boston.train.mat,boston.train$crim,alpha=0, lambda = grid, thresh=1e-12)
cv.out=cv.glmnet(boston.train.mat,boston.train$crim,alpha=0, lambda = grid, thresh=1e-12)
bestlam=cv.out$lambda.min
bestlam
## [1] 0.2154435
plot(cv.out)
ridge.model.1<-glmnet(boston.train.mat,boston.train$crim,alpha=0,lambda=bestlam)
coef(ridge.model.1)
## 14 x 1 sparse Matrix of class "dgCMatrix"
## s0
## (Intercept) 1.072632e+01
## (Intercept) .
## zn 4.543365e-02
## indus -7.477200e-02
## chas -2.817295e-01
## nox -8.051290e+00
## rm 6.901592e-01
## age -2.601379e-04
## dis -8.892072e-01
## rad 5.868775e-01
## tax 6.594563e-05
## ptratio -2.900139e-01
## lstat 1.457719e-01
## medv -2.295003e-01
ridge.pred=predict(ridge.model.1,s=bestlam,newx=boston.test.mat)
ridge.MSE = mean((ridge.pred-boston.test$crim)^2)
ridge.MSE
## [1] 18.1551
As output the ridge regression model runs a range of values of lambda. In this model it found the best lambda of 0.2154465 in which it got its lowest Means square error of 18.16
lasso.model<-glmnet(boston.train.mat,boston.train$crim,alpha=0,lambda=grid)
boston.cv.lasso<-cv.glmnet(boston.train.mat,boston.train$crim,alpha=0,lambda=grid)
plot(boston.cv.lasso)
boston.bestlam.lasso<-boston.cv.lasso$lambda.min
boston.bestlam.lasso
## [1] 0.1629751
lasso.model.1<-glmnet(boston.train.mat,boston.train$crim,alpha=0,lambda=boston.bestlam.lasso)
coef(lasso.model.1)
## 14 x 1 sparse Matrix of class "dgCMatrix"
## s0
## (Intercept) 11.6672209672
## (Intercept) .
## zn 0.0468570045
## indus -0.0720333874
## chas -0.2725308738
## nox -8.5989190675
## rm 0.6993673676
## age -0.0001445266
## dis -0.9184581671
## rad 0.6059432441
## tax -0.0007972230
## ptratio -0.3048771521
## lstat 0.1428859802
## medv -0.2366344026
boston.pred.newlasso<-predict(lasso.model,s=boston.bestlam.lasso,newx=boston.test.mat)
lasso.MSE<-mean((boston.test$crim-boston.pred.newlasso)^2)
lasso.MSE
## [1] 18.24209
As found in the Lasso regression model the lambda was found of 0.1629751 with the Means squared error being 18.24. Which the lamba is lower than the previous model and the MSE is higher.
boston.pcr.model<-pcr(crim~.,data=boston.train,scale=TRUE,validation="CV")
summary(boston.pcr.model)
## Data: X dimension: 354 12
## Y dimension: 354 1
## Fit method: svdpc
## Number of components considered: 12
##
## VALIDATION: RMSEP
## Cross-validated using 10 random segments.
## (Intercept) 1 comps 2 comps 3 comps 4 comps 5 comps 6 comps
## CV 9.525 8.068 8.053 7.676 7.574 7.564 7.541
## adjCV 9.525 8.066 8.050 7.673 7.569 7.561 7.537
## 7 comps 8 comps 9 comps 10 comps 11 comps 12 comps
## CV 7.491 7.427 7.424 7.430 7.376 7.292
## adjCV 7.519 7.420 7.417 7.423 7.367 7.282
##
## TRAINING: % variance explained
## 1 comps 2 comps 3 comps 4 comps 5 comps 6 comps 7 comps 8 comps
## X 50.92 64.72 73.86 80.98 87.04 90.43 92.81 95.13
## crim 28.57 29.01 35.55 37.36 37.72 38.42 38.72 40.97
## 9 comps 10 comps 11 comps 12 comps
## X 96.92 98.38 99.50 100.00
## crim 41.14 41.14 42.64 43.88
validationplot(boston.pcr.model,val.type="MSEP")
boston.pcr.model$ncomp
## [1] 12
boston.pcr.12.model= pcr(crim~.,data=boston.train,scale=TRUE,ncomp=12)
summary(boston.pcr.12.model)
## Data: X dimension: 354 12
## Y dimension: 354 1
## Fit method: svdpc
## Number of components considered: 12
## TRAINING: % variance explained
## 1 comps 2 comps 3 comps 4 comps 5 comps 6 comps 7 comps 8 comps
## X 50.92 64.72 73.86 80.98 87.04 90.43 92.81 95.13
## crim 28.57 29.01 35.55 37.36 37.72 38.42 38.72 40.97
## 9 comps 10 comps 11 comps 12 comps
## X 96.92 98.38 99.50 100.00
## crim 41.14 41.14 42.64 43.88
boston.predict.pcr<-predict(boston.pcr.model,boston.test,ncomp=12)
pcr.MSE<-mean((boston.test$crim-boston.predict.pcr)^2)
pcr.MSE
## [1] 18.67968
As found in the PCR model shows the mean squared error for the PCR model is 18.68 and the selected number of components was 12.
model.names = c("Best subset selection", "Lasso", "Ridge regression", "PCR")
error.matrix = c(boston.bsm.MSE, lasso.MSE, ridge.MSE, pcr.MSE)
data.frame(model.names, error.matrix)
## model.names error.matrix
## 1 Best subset selection 18.67874
## 2 Lasso 18.24209
## 3 Ridge regression 18.15510
## 4 PCR 18.67968
To Sum this comparison of the 4 models show that the Ridge Regression model has the lowest Means sqaured error of 18.16.
(b) Propose a model (or set of models) that seem to perform well on this data set, and justify your answer. Make sure that you are evaluating model performance using validation set error, crossvalidation, or some other reasonable alternative, as opposed to using training error.
In conclusion, s shown in the running of the four different models and then model comparision, it was found the Rigde regression is the best model. This is due to it having the lowest Means Sqaure Error of the 4 models which is 18.16. To add though all four models' MSE are within the same range.
(c) Does your chosen model involve all of the features in the data set? Why or why not?
Yes, the choosen model RIdge regression involves all the features in the dataset. It is shown from having a better lambda & the lower MSE which is due to it using as many of the predictors as possible and has multicollinearity like this dataset has.