library(tidyverse)
library(ISLR)
library(leaps)
library(glmnet)
library(caret)
library(pls)
library(MASS)
library(dplyr)
For parts (a) through (c), indicate which of i. through iv. is correct. Justify your answer.
i. More flexible and hence will give improved prediction accuracy when its increase in bias is less than its decrease in variance.
ii. More flexible and hence will give improved prediction accuracy when its increase in variance is less than its decrease in bias.
iii. Less flexible and hence will give improved prediction accuracy when its increase in bias is less than its decrease in variance.
iv. Less flexible and hence will give improved prediction accuracy when its increase in variance is less than its decrease in bias.
(a) The lasso, relative to least squares, is:
Option iii. The lasso minimizes the RSS plus an \(\ell_1\) penalty on the coefficients, which shrinks the estimates toward zero (and sets some exactly to zero). It is therefore less flexible than least squares, and it improves prediction accuracy when the resulting increase in bias is smaller than the decrease in variance.
(b) Repeat (a) for ridge regression relative to least squares.
Also option iii, for the same reason as in part (a): the \(\ell_2\) penalty makes ridge regression less flexible than least squares, trading a small increase in bias for a reduction in variance. Unlike the lasso, however, ridge regression shrinks the coefficients toward zero but never to exactly zero.
(c) Repeat (a) for non-linear methods relative to least squares.
Option ii. Non-linear methods are more flexible and have higher variance, which allows the fitted curve to follow the observations more closely; they give improved prediction accuracy when the increase in variance is smaller than the decrease in bias.
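The tradeoff in parts (a) and (b) can be made concrete with a small simulation (a sketch, not part of the assignment; all object names are illustrative). With n = 50 observations and p = 45 weak predictors, least squares is nearly unbiased but highly variable, so a ridge fit with \(\lambda\) chosen by cross-validation will typically achieve a lower test MSE.
set.seed(1)
n <- 50; p <- 45
x.sim <- matrix(rnorm(n * p), n, p)            # many weak predictors
beta.sim <- rep(0.2, p)
y.sim <- drop(x.sim %*% beta.sim + rnorm(n))
x.new <- matrix(rnorm(1000 * p), 1000, p)      # large test set
y.new <- drop(x.new %*% beta.sim + rnorm(1000))
ols.sim <- lm(y ~ ., data = data.frame(y = y.sim, x.sim))
mean((predict(ols.sim, data.frame(x.new)) - y.new)^2)             # least squares test MSE
cv.sim <- cv.glmnet(x.sim, y.sim, alpha = 0)                      # ridge, lambda by CV
mean((predict(cv.sim, newx = x.new, s = "lambda.min") - y.new)^2) # ridge test MSE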
In this exercise, we will predict the number of applications received using the other variables in the College data set.
(a) Split the data set into a training set and a test set.
data("College")
set.seed(1)
train_1 <- sample(1:nrow(College), round(nrow(College) * 0.5))
train <- College[train_1, ]
nrow(train) / nrow(College)
## [1] 0.4993565
test <- College[-train_1, ]
nrow(test) / nrow(College)
## [1] 0.5006435
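caret is loaded above but otherwise unused; for reference, an equivalent validation-set split could be drawn with it (a sketch; createDataPartition() samples within quantile groups of the response, and train.alt/test.alt are illustrative names):
idx <- caret::createDataPartition(College$Apps, p = 0.5, list = FALSE)
train.alt <- College[idx, ]
test.alt <- College[-idx, ]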
(b) Fit a linear model using least squares on the training set, and report the test error obtained.
linear <- lm(Apps ~ ., data = train)
summary(linear)
##
## Call:
## lm(formula = Apps ~ ., data = train)
##
## Residuals:
## Min 1Q Median 3Q Max
## -5741.2 -479.5 15.3 359.6 7258.0
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) -7.902e+02 6.381e+02 -1.238 0.216410
## PrivateYes -3.070e+02 2.006e+02 -1.531 0.126736
## Accept 1.779e+00 5.420e-02 32.830 < 2e-16 ***
## Enroll -1.470e+00 3.115e-01 -4.720 3.35e-06 ***
## Top10perc 6.673e+01 8.310e+00 8.030 1.31e-14 ***
## Top25perc -2.231e+01 6.533e+00 -3.415 0.000708 ***
## F.Undergrad 9.269e-02 5.529e-02 1.676 0.094538 .
## P.Undergrad 9.397e-03 5.493e-02 0.171 0.864275
## Outstate -1.084e-01 2.700e-02 -4.014 7.22e-05 ***
## Room.Board 2.115e-01 7.224e-02 2.928 0.003622 **
## Books 2.912e-01 3.985e-01 0.731 0.465399
## Personal 6.133e-03 8.803e-02 0.070 0.944497
## PhD -1.548e+01 6.681e+00 -2.316 0.021082 *
## Terminal 6.415e+00 7.290e+00 0.880 0.379470
## S.F.Ratio 2.283e+01 2.047e+01 1.115 0.265526
## perc.alumni 1.134e+00 6.083e+00 0.186 0.852274
## Expend 4.857e-02 1.619e-02 2.999 0.002890 **
## Grad.Rate 7.490e+00 4.397e+00 1.703 0.089324 .
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 1083 on 370 degrees of freedom
## Multiple R-squared: 0.9389, Adjusted R-squared: 0.9361
## F-statistic: 334.3 on 17 and 370 DF, p-value: < 2.2e-16
ols_pred <- predict(linear, test)
(ols_mse <- mean((ols_pred - test$Apps)^2))
## [1] 1135758
The test MSE for the least squares model is 1135758.
(c) Fit a ridge regression model on the training set, with λ chosen by cross-validation. Report the test error obtained.
library(glmnet)
train.mat<-model.matrix(Apps~.,data=train)
test.mat<-model.matrix(Apps~.,data=test)
grid<-10^seq(4,-2,length=100)
ridge<-glmnet(train.mat,train$Apps,alpha=0,lambda=grid,thresh = 1e-12)
cv.ridge<-cv.glmnet(train.mat,train$Apps,alpha=0,lambda=grid,thresh=1e-12)
bestlam.ridge<-cv.ridge$lambda.min
bestlam.ridge
## [1] 0.01
pred.newridge<-predict(ridge,s=bestlam.ridge,newx =test.mat)
mean((test$Apps-pred.newridge)^2)
## [1] 1135714
The test MSE for ridge regression is 1135714, slightly below the least squares error of 1135758.
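As a quick visual check on the selected \(\lambda\) (a sketch), the cross-validation error curve can be plotted against \(\log(\lambda)\):
plot(cv.ridge)
abline(v = log(bestlam.ridge), lty = 2)  # the lambda.min chosen above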
(d) Fit a lasso model on the training set, with λ chosen by cross-validation. Report the test error obtained, along with the number of non-zero coefficient estimates.
lasso<-glmnet(train.mat,train$Apps,alpha=1,lambda=grid,thresh = 1e-12)
cv.lasso<-cv.glmnet(train.mat,train$Apps,alpha=1,lambda=grid,thresh=1e-12)
bestlam.lasso<-cv.lasso$lambda.min
bestlam.lasso
## [1] 0.01
pred.newlasso<-predict(lasso,s=bestlam.lasso,newx =test.mat)
mean((test$Apps-pred.newlasso)^2)
## [1] 1135660
predict(lasso,s=bestlam.lasso,type="coefficients")
## 19 x 1 sparse Matrix of class "dgCMatrix"
## s1
## (Intercept) -7.900363e+02
## (Intercept) .
## PrivateYes -3.070103e+02
## Accept 1.779328e+00
## Enroll -1.469508e+00
## Top10perc 6.672214e+01
## Top25perc -2.230442e+01
## F.Undergrad 9.258974e-02
## P.Undergrad 9.408838e-03
## Outstate -1.083495e-01
## Room.Board 2.115147e-01
## Books 2.912105e-01
## Personal 6.120406e-03
## PhD -1.547200e+01
## Terminal 6.409503e+00
## S.F.Ratio 2.282638e+01
## perc.alumni 1.130498e+00
## Expend 4.856697e-02
## Grad.Rate 7.488081e+00
The test MSE for the lasso is 1135660, slightly lower than both the ridge (1135714) and least squares (1135758) errors. At the selected \(\lambda = 0.01\) none of the coefficients is shrunk exactly to zero, so all 17 predictor coefficient estimates are non-zero.
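Rather than counting the printout by eye, the number of non-zero estimates can be computed directly (a sketch; the subtraction removes the intercept, and the duplicated (Intercept) row produced by model.matrix() is already zero):
lasso.coef <- predict(lasso, s = bestlam.lasso, type = "coefficients")
sum(lasso.coef != 0) - 1  # count of non-zero predictor coefficients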
(e) Fit a PCR model on the training set, with M chosen by cross-validation. Report the test error obtained, along with the value of M selected by cross-validation.
library(pls)
fit.pcr <- pcr(Apps ~ ., data = train, scale = TRUE, validation = "CV")
validationplot(fit.pcr, val.type = "MSEP")
pred.pcr <- predict(fit.pcr, test, ncomp = 10)
mean((pred.pcr - test$Apps)^2)
## [1] 1723100
With M = 10 components, the test MSE is 1723100, higher than the least squares, ridge regression, and lasso models.
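The value M = 10 above was read off the validation plot; as a cross-check, the CV-minimizing M can be extracted programmatically (a sketch using pls::RMSEP; the first entry of the table is the intercept-only model, hence the subtraction):
cv.rmsep <- RMSEP(fit.pcr)$val["adjCV", , ]
which.min(cv.rmsep) - 1  # CV-optimal number of components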
(f) Fit a PLS model on the training set, with M chosen by cross-validation. Report the test error obtained, along with the value of M selected by cross-validation.
fit.pls <- plsr(Apps ~ ., data = train, scale = TRUE, validation = "CV")
validationplot(fit.pls, val.type = "MSEP")
pred.pls <- predict(fit.pls, test, ncomp = 10)
mean((pred.pls - test$Apps)^2)
## [1] 1131661
With M = 10 components, the test MSE for PLS is 1131661, the lowest of the five models fitted so far. (The same RMSEP check sketched in part (e) can be used to confirm the choice of M.)
(g) Comment on the results obtained. How accurately can we predict the number of college applications received? Is there much difference among the test errors resulting from these five approaches?
totalsumofsquares = sum((mean(test$Apps) - test$Apps)^2)
totalsumofresiduals = sum((pred.newridge - test$Apps)^2)
1 - (totalsumofresiduals)/(totalsumofsquares)
## [1] 0.9015452
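The same calculation extends to all five fits (a sketch, assuming the prediction objects from parts (b) through (f) are still in memory):
tss <- sum((mean(test$Apps) - test$Apps)^2)
test.r2 <- function(pred) 1 - sum((pred - test$Apps)^2) / tss
sapply(list(ols = ols_pred, ridge = pred.newridge, lasso = pred.newlasso,
            pcr = pred.pcr, pls = pred.pls), test.r2)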
The test \(R^2\) for ridge regression is about 0.90, so these models predict the number of applications received fairly accurately. Comparing test MSEs, PLS had the lowest error (1131661) and PCR by far the highest (1723100); least squares, ridge, and the lasso fall within about 100 units of one another, so apart from PCR there is little difference among the five approaches.
We will now try to predict per capita crime rate in the Boston data set.
library(MASS)
library(glmnet)
library(leaps)
library(pls)
data("Boston")
(a) Try out some of the regression methods explored in this chapter, such as best subset selection, the lasso, ridge regression, and PCR. Present and discuss results for the approaches that you consider.
set.seed(1)
subset<-sample(nrow(Boston),nrow(Boston)*0.50)
boston.train<-Boston[subset,]
boston.test<-Boston[-subset,]
slr.full<-lm(medv~.,data=boston.train)
slr.predict<-predict(slr.full,boston.test)
slr.MSE<-mean((slr.predict-boston.test$medv)^2)
slr.MSE
## [1] 26.86123
bsm<-regsubsets(medv~.,data=boston.train,nbest=1,nvmax=13)
summary(bsm)
## Subset selection object
## Call: regsubsets.formula(medv ~ ., data = boston.train, nbest = 1,
## nvmax = 13)
## 13 Variables (and intercept)
## Forced in Forced out
## crim FALSE FALSE
## zn FALSE FALSE
## indus FALSE FALSE
## chas FALSE FALSE
## nox FALSE FALSE
## rm FALSE FALSE
## age FALSE FALSE
## dis FALSE FALSE
## rad FALSE FALSE
## tax FALSE FALSE
## ptratio FALSE FALSE
## black FALSE FALSE
## lstat FALSE FALSE
## 1 subsets of each size up to 13
## Selection Algorithm: exhaustive
## crim zn indus chas nox rm age dis rad tax ptratio black lstat
## 1 ( 1 ) " " " " " " " " " " " " " " " " " " " " " " " " "*"
## 2 ( 1 ) " " " " " " " " " " "*" " " " " " " " " " " " " "*"
## 3 ( 1 ) " " " " " " " " " " "*" " " " " " " " " "*" " " "*"
## 4 ( 1 ) " " " " " " "*" " " "*" " " " " " " " " "*" " " "*"
## 5 ( 1 ) " " " " " " " " "*" "*" " " "*" " " " " "*" " " "*"
## 6 ( 1 ) " " " " " " " " " " "*" " " "*" "*" "*" "*" " " "*"
## 7 ( 1 ) " " " " " " " " "*" "*" " " "*" "*" "*" "*" " " "*"
## 8 ( 1 ) "*" " " " " " " "*" "*" " " "*" "*" "*" "*" " " "*"
## 9 ( 1 ) "*" " " " " "*" "*" "*" " " "*" "*" "*" "*" " " "*"
## 10 ( 1 ) "*" " " " " "*" "*" "*" " " "*" "*" "*" "*" "*" "*"
## 11 ( 1 ) "*" "*" " " "*" "*" "*" " " "*" "*" "*" "*" "*" "*"
## 12 ( 1 ) "*" "*" "*" "*" "*" "*" " " "*" "*" "*" "*" "*" "*"
## 13 ( 1 ) "*" "*" "*" "*" "*" "*" "*" "*" "*" "*" "*" "*" "*"
boston.test.mat <- model.matrix(medv ~ ., data = boston.test)
val.errors <- rep(NA, 13)
for (i in 1:13) {
  coefi <- coef(bsm, id = i)
  pred <- boston.test.mat[, names(coefi)] %*% coefi
  val.errors[i] <- mean((pred - boston.test$medv)^2)
}
which.min(val.errors)
## [1] 12
# which.min() above points to the 12-variable subset; the value reported
# below is the validation MSE of the 11-variable model.
bsm.MSE<-val.errors[11]
bsm.MSE
## [1] 26.85842
nullmodel<-lm(medv ~ 1, data = boston.train)
fullmodel<-lm(medv ~ ., data = boston.train)
fwd.model<-step(nullmodel, scope = list(lower = nullmodel, upper = fullmodel), direction = "forward")
## Start: AIC=1100.55
## medv ~ 1
##
## Df Sum of Sq RSS AIC
## + lstat 1 10842.0 8605.9 896.28
## + rm 1 9884.6 9563.3 922.97
## + ptratio 1 5114.5 14333.4 1025.35
## + tax 1 4779.1 14668.8 1031.20
## + indus 1 4484.1 14963.7 1036.24
## + nox 1 4068.6 15379.3 1043.17
## + crim 1 3718.8 15729.1 1048.86
## + age 1 3600.4 15847.5 1050.76
## + rad 1 3074.6 16373.3 1059.01
## + black 1 2461.7 16986.2 1068.31
## + zn 1 2457.9 16990.0 1068.37
## + dis 1 1590.4 17857.5 1080.97
## + chas 1 730.5 18717.4 1092.87
## <none> 19447.9 1100.55
##
## Step: AIC=896.28
## medv ~ lstat
##
## Df Sum of Sq RSS AIC
## + rm 1 2085.06 6520.9 828.09
## + ptratio 1 1171.36 7434.6 861.27
## + chas 1 427.46 8178.5 885.39
## + tax 1 300.89 8305.0 889.28
## + dis 1 228.39 8377.5 891.48
## + black 1 92.97 8512.9 895.54
## + zn 1 85.26 8520.6 895.77
## + crim 1 74.82 8531.1 896.08
## + age 1 72.51 8533.4 896.14
## <none> 8605.9 896.28
## + rad 1 47.77 8558.1 896.88
## + indus 1 39.19 8566.7 897.13
## + nox 1 4.39 8601.5 898.16
##
## Step: AIC=828.09
## medv ~ lstat + rm
##
## Df Sum of Sq RSS AIC
## + ptratio 1 824.15 5696.7 795.91
## + tax 1 351.28 6169.6 816.08
## + chas 1 344.68 6176.2 816.35
## + black 1 227.88 6293.0 821.09
## + crim 1 143.66 6377.2 824.45
## + rad 1 130.81 6390.0 824.96
## + dis 1 84.67 6436.2 826.78
## <none> 6520.9 828.09
## + zn 1 30.73 6490.1 828.90
## + indus 1 21.44 6499.4 829.26
## + nox 1 15.40 6505.5 829.49
## + age 1 0.21 6520.6 830.08
##
## Step: AIC=795.91
## medv ~ lstat + rm + ptratio
##
## Df Sum of Sq RSS AIC
## + chas 1 203.711 5493.0 788.69
## + dis 1 187.949 5508.7 789.42
## + black 1 126.547 5570.2 792.22
## + tax 1 101.425 5595.3 793.36
## + crim 1 70.418 5626.3 794.76
## <none> 5696.7 795.91
## + age 1 19.500 5677.2 797.04
## + zn 1 11.637 5685.1 797.39
## + nox 1 6.848 5689.8 797.60
## + rad 1 1.435 5695.3 797.84
## + indus 1 0.376 5696.3 797.89
##
## Step: AIC=788.69
## medv ~ lstat + rm + ptratio + chas
##
## Df Sum of Sq RSS AIC
## + dis 1 135.559 5357.4 784.37
## + black 1 106.263 5386.7 785.75
## + tax 1 97.720 5395.3 786.15
## + crim 1 62.204 5430.8 787.81
## <none> 5493.0 788.69
## + nox 1 19.651 5473.3 789.79
## + zn 1 7.597 5485.4 790.34
## + age 1 6.503 5486.5 790.39
## + rad 1 2.406 5490.6 790.58
## + indus 1 0.712 5492.3 790.66
##
## Step: AIC=784.37
## medv ~ lstat + rm + ptratio + chas + dis
##
## Df Sum of Sq RSS AIC
## + nox 1 294.802 5062.6 772.05
## + tax 1 240.625 5116.8 774.74
## + black 1 139.238 5218.2 779.71
## + crim 1 103.497 5253.9 781.44
## + indus 1 100.754 5256.7 781.57
## <none> 5357.4 784.37
## + rad 1 31.976 5325.5 784.86
## + age 1 31.945 5325.5 784.86
## + zn 1 17.620 5339.8 785.54
##
## Step: AIC=772.05
## medv ~ lstat + rm + ptratio + chas + dis + nox
##
## Df Sum of Sq RSS AIC
## + black 1 93.740 4968.9 769.32
## + tax 1 64.410 4998.2 770.81
## + crim 1 63.936 4998.7 770.84
## <none> 5062.6 772.05
## + zn 1 17.206 5045.4 773.19
## + indus 1 12.916 5049.7 773.41
## + rad 1 5.096 5057.5 773.80
## + age 1 3.632 5059.0 773.87
##
## Step: AIC=769.32
## medv ~ lstat + rm + ptratio + chas + dis + nox + black
##
## Df Sum of Sq RSS AIC
## <none> 4968.9 769.32
## + rad 1 31.4109 4937.5 769.72
## + tax 1 29.9137 4939.0 769.80
## + crim 1 28.4346 4940.5 769.87
## + zn 1 25.4566 4943.4 770.02
## + indus 1 10.3822 4958.5 770.79
## + age 1 8.4054 4960.5 770.89
summary(fwd.model)
##
## Call:
## lm(formula = medv ~ lstat + rm + ptratio + chas + dis + nox +
## black, data = boston.train)
##
## Residuals:
## Min 1Q Median 3Q Max
## -10.1197 -2.7283 -0.4954 1.7559 16.7531
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 28.027255 6.590656 4.253 3.01e-05 ***
## lstat -0.514869 0.062929 -8.182 1.53e-14 ***
## rm 4.522123 0.529007 8.548 1.36e-15 ***
## ptratio -0.973703 0.158207 -6.155 3.05e-09 ***
## chas 3.058479 1.167288 2.620 0.009338 **
## dis -1.039421 0.230975 -4.500 1.05e-05 ***
## nox -15.136660 4.317294 -3.506 0.000541 ***
## black 0.007219 0.003358 2.150 0.032543 *
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 4.503 on 245 degrees of freedom
## Multiple R-squared: 0.7445, Adjusted R-squared: 0.7372
## F-statistic: 102 on 7 and 245 DF, p-value: < 2.2e-16
fwd.predict<-predict(fwd.model,boston.test)
fwd.mse<-mean((fwd.predict-boston.test$medv)^2)
fwd.mse
## [1] 27.45563
bwd.model<-step(fullmodel,direction = "backward")
## Start: AIC=756.62
## medv ~ crim + zn + indus + chas + nox + rm + age + dis + rad +
## tax + ptratio + black + lstat
##
## Df Sum of Sq RSS AIC
## - age 1 0.78 4507.4 754.66
## - indus 1 3.57 4510.2 754.82
## <none> 4506.6 756.62
## - zn 1 50.40 4557.0 757.43
## - black 1 72.28 4578.9 758.64
## - chas 1 77.17 4583.8 758.91
## - crim 1 90.78 4597.4 759.66
## - nox 1 133.78 4640.4 762.02
## - tax 1 307.43 4814.0 771.31
## - rad 1 364.39 4871.0 774.29
## - dis 1 411.50 4918.1 776.72
## - ptratio 1 592.43 5099.0 785.86
## - lstat 1 943.49 5450.1 802.71
## - rm 1 1098.86 5605.5 809.82
##
## Step: AIC=754.66
## medv ~ crim + zn + indus + chas + nox + rm + dis + rad + tax +
## ptratio + black + lstat
##
## Df Sum of Sq RSS AIC
## - indus 1 3.65 4511.0 752.87
## <none> 4507.4 754.66
## - zn 1 51.85 4559.2 755.55
## - black 1 71.74 4579.1 756.66
## - chas 1 76.63 4584.0 756.93
## - crim 1 90.53 4597.9 757.69
## - nox 1 147.42 4654.8 760.80
## - tax 1 310.41 4817.8 769.51
## - rad 1 373.48 4880.9 772.80
## - dis 1 443.51 4950.9 776.41
## - ptratio 1 611.10 5118.5 784.83
## - rm 1 1141.83 5649.2 809.79
## - lstat 1 1149.41 5656.8 810.13
##
## Step: AIC=752.87
## medv ~ crim + zn + chas + nox + rm + dis + rad + tax + ptratio +
## black + lstat
##
## Df Sum of Sq RSS AIC
## <none> 4511.0 752.87
## - zn 1 49.56 4560.6 753.63
## - black 1 71.75 4582.8 754.86
## - chas 1 78.50 4589.5 755.23
## - crim 1 93.75 4604.8 756.07
## - nox 1 144.49 4655.5 758.84
## - tax 1 325.81 4836.8 768.51
## - rad 1 372.51 4883.5 770.94
## - dis 1 499.36 5010.4 777.43
## - ptratio 1 608.00 5119.0 782.86
## - rm 1 1138.88 5649.9 807.82
## - lstat 1 1152.03 5663.1 808.41
summary(bwd.model)
##
## Call:
## lm(formula = medv ~ crim + zn + chas + nox + rm + dis + rad +
## tax + ptratio + black + lstat, data = boston.train)
##
## Residuals:
## Min 1Q Median 3Q Max
## -10.4289 -2.5103 -0.3652 1.5543 16.8483
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 34.844360 6.711874 5.191 4.43e-07 ***
## crim -0.093283 0.041682 -2.238 0.02614 *
## zn 0.029404 0.018070 1.627 0.10500
## chas 2.320891 1.133312 2.048 0.04166 *
## nox -13.320909 4.794503 -2.778 0.00589 **
## rm 4.060373 0.520542 7.800 1.86e-13 ***
## dis -1.274369 0.246729 -5.165 5.04e-07 ***
## rad 0.375198 0.084105 4.461 1.25e-05 ***
## tax -0.018505 0.004435 -4.172 4.22e-05 ***
## ptratio -0.982326 0.172358 -5.699 3.50e-08 ***
## black 0.006774 0.003460 1.958 0.05140 .
## lstat -0.488743 0.062299 -7.845 1.40e-13 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 4.326 on 241 degrees of freedom
## Multiple R-squared: 0.768, Adjusted R-squared: 0.7575
## F-statistic: 72.54 on 11 and 241 DF, p-value: < 2.2e-16
bwd.predict<-predict(bwd.model,boston.test)
bwd.mse<-mean((bwd.predict-boston.test$medv)^2)
bwd.mse
## [1] 26.85842
step.model<-step(nullmodel, scope = list(lower = nullmodel, upper = fullmodel), direction = "both")
## Start: AIC=1100.55
## medv ~ 1
##
## Df Sum of Sq RSS AIC
## + lstat 1 10842.0 8605.9 896.28
## + rm 1 9884.6 9563.3 922.97
## + ptratio 1 5114.5 14333.4 1025.35
## + tax 1 4779.1 14668.8 1031.20
## + indus 1 4484.1 14963.7 1036.24
## + nox 1 4068.6 15379.3 1043.17
## + crim 1 3718.8 15729.1 1048.86
## + age 1 3600.4 15847.5 1050.76
## + rad 1 3074.6 16373.3 1059.01
## + black 1 2461.7 16986.2 1068.31
## + zn 1 2457.9 16990.0 1068.37
## + dis 1 1590.4 17857.5 1080.97
## + chas 1 730.5 18717.4 1092.87
## <none> 19447.9 1100.55
##
## Step: AIC=896.28
## medv ~ lstat
##
## Df Sum of Sq RSS AIC
## + rm 1 2085.1 6520.9 828.09
## + ptratio 1 1171.4 7434.6 861.27
## + chas 1 427.5 8178.5 885.39
## + tax 1 300.9 8305.0 889.28
## + dis 1 228.4 8377.5 891.48
## + black 1 93.0 8512.9 895.54
## + zn 1 85.3 8520.6 895.77
## + crim 1 74.8 8531.1 896.08
## + age 1 72.5 8533.4 896.14
## <none> 8605.9 896.28
## + rad 1 47.8 8558.1 896.88
## + indus 1 39.2 8566.7 897.13
## + nox 1 4.4 8601.5 898.16
## - lstat 1 10842.0 19447.9 1100.55
##
## Step: AIC=828.09
## medv ~ lstat + rm
##
## Df Sum of Sq RSS AIC
## + ptratio 1 824.15 5696.7 795.91
## + tax 1 351.28 6169.6 816.08
## + chas 1 344.68 6176.2 816.35
## + black 1 227.88 6293.0 821.09
## + crim 1 143.66 6377.2 824.45
## + rad 1 130.81 6390.0 824.96
## + dis 1 84.67 6436.2 826.78
## <none> 6520.9 828.09
## + zn 1 30.73 6490.1 828.90
## + indus 1 21.44 6499.4 829.26
## + nox 1 15.40 6505.5 829.49
## + age 1 0.21 6520.6 830.08
## - rm 1 2085.06 8605.9 896.28
## - lstat 1 3042.41 9563.3 922.97
##
## Step: AIC=795.91
## medv ~ lstat + rm + ptratio
##
## Df Sum of Sq RSS AIC
## + chas 1 203.71 5493.0 788.69
## + dis 1 187.95 5508.7 789.42
## + black 1 126.55 5570.2 792.22
## + tax 1 101.42 5595.3 793.36
## + crim 1 70.42 5626.3 794.76
## <none> 5696.7 795.91
## + age 1 19.50 5677.2 797.04
## + zn 1 11.64 5685.1 797.39
## + nox 1 6.85 5689.8 797.60
## + rad 1 1.44 5695.3 797.84
## + indus 1 0.38 5696.3 797.89
## - ptratio 1 824.15 6520.9 828.09
## - rm 1 1737.85 7434.6 861.27
## - lstat 1 2145.70 7842.4 874.78
##
## Step: AIC=788.69
## medv ~ lstat + rm + ptratio + chas
##
## Df Sum of Sq RSS AIC
## + dis 1 135.56 5357.4 784.37
## + black 1 106.26 5386.7 785.75
## + tax 1 97.72 5395.3 786.15
## + crim 1 62.20 5430.8 787.81
## <none> 5493.0 788.69
## + nox 1 19.65 5473.3 789.79
## + zn 1 7.60 5485.4 790.34
## + age 1 6.50 5486.5 790.39
## + rad 1 2.41 5490.6 790.58
## + indus 1 0.71 5492.3 790.66
## - chas 1 203.71 5696.7 795.91
## - ptratio 1 683.19 6176.2 816.35
## - rm 1 1704.54 7197.5 855.07
## - lstat 1 2169.81 7662.8 870.92
##
## Step: AIC=784.37
## medv ~ lstat + rm + ptratio + chas + dis
##
## Df Sum of Sq RSS AIC
## + nox 1 294.80 5062.6 772.05
## + tax 1 240.63 5116.8 774.74
## + black 1 139.24 5218.2 779.71
## + crim 1 103.50 5253.9 781.44
## + indus 1 100.75 5256.7 781.57
## <none> 5357.4 784.37
## + rad 1 31.98 5325.5 784.86
## + age 1 31.94 5325.5 784.86
## + zn 1 17.62 5339.8 785.54
## - dis 1 135.56 5493.0 788.69
## - chas 1 151.32 5508.7 789.42
## - ptratio 1 771.23 6128.7 816.40
## - rm 1 1529.43 6886.9 845.91
## - lstat 1 2190.86 7548.3 869.11
##
## Step: AIC=772.05
## medv ~ lstat + rm + ptratio + chas + dis + nox
##
## Df Sum of Sq RSS AIC
## + black 1 93.74 4968.9 769.32
## + tax 1 64.41 4998.2 770.81
## + crim 1 63.94 4998.7 770.84
## <none> 5062.6 772.05
## + zn 1 17.21 5045.4 773.19
## + indus 1 12.92 5049.7 773.41
## + rad 1 5.10 5057.5 773.80
## + age 1 3.63 5059.0 773.87
## - chas 1 161.79 5224.4 778.01
## - nox 1 294.80 5357.4 784.37
## - dis 1 410.71 5473.3 789.79
## - ptratio 1 842.05 5904.7 808.98
## - rm 1 1410.77 6473.4 832.24
## - lstat 1 1606.46 6669.1 839.78
##
## Step: AIC=769.32
## medv ~ lstat + rm + ptratio + chas + dis + nox + black
##
## Df Sum of Sq RSS AIC
## <none> 4968.9 769.32
## + rad 1 31.41 4937.5 769.72
## + tax 1 29.91 4939.0 769.80
## + crim 1 28.43 4940.5 769.87
## + zn 1 25.46 4943.4 770.02
## + indus 1 10.38 4958.5 770.79
## + age 1 8.41 4960.5 770.89
## - black 1 93.74 5062.6 772.05
## - chas 1 139.23 5108.1 774.31
## - nox 1 249.30 5218.2 779.71
## - dis 1 410.72 5379.6 787.42
## - ptratio 1 768.24 5737.1 803.69
## - lstat 1 1357.64 6326.5 828.44
## - rm 1 1482.02 6450.9 833.36
summary(step.model)
##
## Call:
## lm(formula = medv ~ lstat + rm + ptratio + chas + dis + nox +
## black, data = boston.train)
##
## Residuals:
## Min 1Q Median 3Q Max
## -10.1197 -2.7283 -0.4954 1.7559 16.7531
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 28.027255 6.590656 4.253 3.01e-05 ***
## lstat -0.514869 0.062929 -8.182 1.53e-14 ***
## rm 4.522123 0.529007 8.548 1.36e-15 ***
## ptratio -0.973703 0.158207 -6.155 3.05e-09 ***
## chas 3.058479 1.167288 2.620 0.009338 **
## dis -1.039421 0.230975 -4.500 1.05e-05 ***
## nox -15.136660 4.317294 -3.506 0.000541 ***
## black 0.007219 0.003358 2.150 0.032543 *
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 4.503 on 245 degrees of freedom
## Multiple R-squared: 0.7445, Adjusted R-squared: 0.7372
## F-statistic: 102 on 7 and 245 DF, p-value: < 2.2e-16
step.predict<-predict(step.model,boston.test)
step.mse<-mean((step.predict-boston.test$medv)^2)
step.mse
## [1] 27.45563
bostontrain.mat<-model.matrix(medv~.,data=boston.train)
bostontest.mat<-model.matrix(medv~.,data=boston.test)
grid<-10^seq(4,-2,length=100)
ridge.model<-glmnet(bostontrain.mat,boston.train$medv,alpha=0,lambda=grid)
boston.cv.ridge<-cv.glmnet(bostontrain.mat,boston.train$medv,alpha=0,lambda=grid)
boston.bestlam.ridge<-boston.cv.ridge$lambda.min
boston.bestlam.ridge
## [1] 0.05336699
ridge.model.1<-glmnet(bostontrain.mat,boston.train$medv,alpha=0,lambda=boston.bestlam.ridge)
coef(ridge.model.1)
## 15 x 1 sparse Matrix of class "dgCMatrix"
## s0
## (Intercept) 33.549938944
## (Intercept) .
## crim -0.089333575
## zn 0.028630555
## indus 0.022489614
## chas 2.372039925
## nox -13.053954757
## rm 4.138827718
## age -0.004384557
## dis -1.236522044
## rad 0.347615984
## tax -0.017518196
## ptratio -0.965923891
## black 0.006772119
## lstat -0.481050642
boston.pred.newridge<-predict(ridge.model,s=boston.bestlam.ridge,newx=bostontest.mat)
ridge.MSE<-mean((boston.test$medv-boston.pred.newridge)^2)
ridge.MSE
## [1] 26.79522
lasso.model<-glmnet(bostontrain.mat,boston.train$medv,alpha=1,lambda=grid)  # alpha=1 fits the lasso
boston.cv.lasso<-cv.glmnet(bostontrain.mat,boston.train$medv,alpha=1,lambda=grid)
boston.bestlam.lasso<-boston.cv.lasso$lambda.min
boston.bestlam.lasso
## [1] 0.08111308
lasso.model.1<-glmnet(bostontrain.mat,boston.train$medv,alpha=1,lambda=boston.bestlam.lasso)
coef(lasso.model.1)
## 15 x 1 sparse Matrix of class "dgCMatrix"
## s0
## (Intercept) 33.024094010
## (Intercept) .
## crim -0.088159911
## zn 0.027992223
## indus 0.018467094
## chas 2.400885843
## nox -12.807002261
## rm 4.155104180
## age -0.004805140
## dis -1.221950655
## rad 0.333681186
## tax -0.016820543
## ptratio -0.959305356
## black 0.006760755
## lstat -0.478261284
boston.pred.newlasso<-predict(lasso.model,s=boston.bestlam.lasso,newx=bostontest.mat)
lasso.MSE<-mean((boston.test$medv-boston.pred.newlasso)^2)
lasso.MSE
## [1] 26.77717
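With alpha = 1 the lasso can drop predictors entirely; a sketch for counting how many survive at the selected \(\lambda\) (the exact count depends on the CV draw):
sum(coef(lasso.model.1) != 0) - 1  # non-zero predictor coefficients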
pcr.model<-pcr(medv~.,data=boston.train,scale=TRUE,validation="CV")
summary(pcr.model)
## Data: X dimension: 253 13
## Y dimension: 253 1
## Fit method: svdpc
## Number of components considered: 13
##
## VALIDATION: RMSEP
## Cross-validated using 10 random segments.
## (Intercept) 1 comps 2 comps 3 comps 4 comps 5 comps 6 comps
## CV 8.802 6.785 6.673 5.361 4.840 4.813 4.849
## adjCV 8.802 6.783 6.674 5.352 4.817 4.797 4.834
## 7 comps 8 comps 9 comps 10 comps 11 comps 12 comps 13 comps
## CV 4.857 4.847 4.864 4.860 4.888 4.850 4.686
## adjCV 4.842 4.830 4.847 4.841 4.872 4.825 4.662
##
## TRAINING: % variance explained
## 1 comps 2 comps 3 comps 4 comps 5 comps 6 comps 7 comps 8 comps
## X 48.02 58.80 67.99 75.11 80.85 85.73 89.55 92.97
## medv 40.83 43.75 64.55 72.78 72.81 72.85 72.88 73.28
## 9 comps 10 comps 11 comps 12 comps 13 comps
## X 95.09 96.92 98.29 99.50 100.00
## medv 73.43 73.77 73.78 75.04 76.83
pcr.model.5comp<-pcr(medv~.,data=boston.train,scale=TRUE,ncomp=5)
summary(pcr.model.5comp)
## Data: X dimension: 253 13
## Y dimension: 253 1
## Fit method: svdpc
## Number of components considered: 5
## TRAINING: % variance explained
## 1 comps 2 comps 3 comps 4 comps 5 comps
## X 48.02 58.80 67.99 75.11 80.85
## medv 40.83 43.75 64.55 72.78 72.81
pcr.model.5comp$coefficients
## , , 1 comps
##
## medv
## crim -0.58901583
## zn 0.55290872
## indus -0.75658758
## chas 0.04169713
## nox -0.77186413
## rm 0.42824335
## age -0.69998703
## dis 0.72258349
## rad -0.71757547
## tax -0.75690381
## ptratio -0.48362619
## black 0.48103143
## lstat -0.68991293
##
## , , 2 comps
##
## medv
## crim -1.0327137
## zn 0.1183425
## indus -0.5461255
## chas 0.5908909
## nox -0.5331747
## rm 0.4320955
## age -0.3325371
## dis 0.3178432
## rad -1.0960500
## tax -1.0634332
## ptratio -0.7103027
## black 0.9657038
## lstat -0.6915826
##
## , , 3 comps
##
## medv
## crim -0.41021087
## zn 1.06227200
## indus -0.44361778
## chas 2.25645994
## nox 0.04055328
## rm 2.48038051
## age -0.31224947
## dis -0.15731819
## rad -0.18301087
## tax -0.34805561
## ptratio -2.08747479
## black 0.25720650
## lstat -1.63011220
##
## , , 4 comps
##
## medv
## crim -0.94766845
## zn 0.14082804
## indus -0.37620155
## chas 0.97649527
## nox 0.02713317
## rm 3.94426169
## age -0.14133317
## dis -0.64803172
## rad 0.03854496
## tax -0.18770194
## ptratio -1.34190432
## black 0.43537465
## lstat -2.62947540
##
## , , 5 comps
##
## medv
## crim -0.936993259
## zn 0.161504585
## indus -0.364860941
## chas 0.864967231
## nox 0.063304433
## rm 3.952029786
## age -0.105517898
## dis -0.675079851
## rad -0.005581025
## tax -0.214616117
## ptratio -1.471135730
## black 0.405716567
## lstat -2.596174401
boston.predict.pcr<-predict(pcr.model,boston.test,ncomp=5)
pcr.MSE<-mean((boston.test$medv-boston.predict.pcr)^2)
pcr.MSE
## [1] 30.28912
plsr.model<-plsr(medv~.,data=boston.train,scale=TRUE,validation="CV")
validationplot(plsr.model,val.type="MSEP")
plsr.model.5comp<-plsr(medv~.,data=boston.train,scale=TRUE,ncomp=5)
summary(plsr.model.5comp)
## Data: X dimension: 253 13
## Y dimension: 253 1
## Fit method: kernelpls
## Number of components considered: 5
## TRAINING: % variance explained
## 1 comps 2 comps 3 comps 4 comps 5 comps
## X 47.00 56.99 64.47 69.35 76.13
## medv 52.23 73.62 74.80 75.90 76.34
plsr.model.5comp$coefficients
## , , 1 comps
##
## medv
## crim -0.6868980
## zn 0.5584312
## indus -0.7542734
## chas 0.3044365
## nox -0.7184728
## rm 1.1198754
## age -0.6758696
## dis 0.4492035
## rad -0.6245714
## tax -0.7786859
## ptratio -0.8055454
## black 0.5588625
## lstat -1.1728534
##
## , , 2 comps
##
## medv
## crim -0.7133688
## zn 0.2549858
## indus -0.3176855
## chas 1.2241722
## nox -0.1155627
## rm 3.5893439
## age -0.2038082
## dis -1.0812735
## rad 0.2048370
## tax -0.4464095
## ptratio -1.7663214
## black 0.5851232
## lstat -2.7200795
##
## , , 3 comps
##
## medv
## crim -0.5518822
## zn 0.1765839
## indus -0.3471971
## chas 0.5337188
## nox -0.5437048
## rm 3.6351309
## age -0.3851608
## dis -1.5566943
## rad 0.7046832
## tax -0.6243897
## ptratio -1.6045435
## black 0.5789136
## lstat -3.1847314
##
## , , 4 comps
##
## medv
## crim -0.5531629
## zn 0.1879035
## indus -0.2014220
## chas 0.5457557
## nox -1.2314578
## rm 2.9663873
## age -0.3673851
## dis -2.6699970
## rad 1.3611945
## tax -1.4978588
## ptratio -1.8436926
## black 0.8602464
## lstat -3.5238234
##
## , , 5 comps
##
## medv
## crim -0.4810112
## zn 0.6098952
## indus -0.1724216
## chas 0.6845996
## nox -1.6119076
## rm 2.7556799
## age -0.3872218
## dis -2.9521650
## rad 1.9996162
## tax -1.8545314
## ptratio -1.9948811
## black 0.6537034
## lstat -3.5619537
boston.predict.plsr<-predict(plsr.model,boston.test,ncomp=5)
plsr.MSE<-mean((boston.test$medv-boston.predict.plsr)^2)
plsr.MSE
## [1] 26.36205
(b) Propose a model (or set of models) that seem to perform well on this data set, and justify your answer. Make sure that you are evaluating model performance using validation set error, cross-validation, or some other reasonable alternative, as opposed to using training error.
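Collecting the validation-set MSEs from part (a) side by side makes the comparison easier (a sketch; assumes the objects computed above are still in memory):
sort(c(OLS = slr.MSE, best.subset = bsm.MSE, forward = fwd.mse, backward = bwd.mse,
       stepwise = step.mse, ridge = ridge.MSE, lasso = lasso.MSE,
       pcr = pcr.MSE, pls = plsr.MSE))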
In this case, the model with the lowest validation-set error is partial least squares, with an MSE of 26.3621, so PLS with five components is a reasonable choice. The model with the highest error is principal components regression, at 30.2891. Best subset selection and backward stepwise selection arrive at the same eleven-variable model and hence the same MSE (26.8584), while forward and both-direction stepwise give 27.4556; ridge (26.7952) and the lasso (26.7772) fall just behind PLS. Apart from PCR, the differences among these models are small.
(c) Does your chosen model involve all of the features in the data set? Why or why not?
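Yes. Each of the five PLS components is a linear combination of all 13 predictors, so every variable in the data set contributes to the predictions; the coefficient table above shows a non-zero coefficient on each feature. If a sparser model were preferred, the lasso or best subset selection would be the natural alternatives, since those methods can exclude features entirely.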