library(tidyverse)
library(ISLR)
library(leaps)
library(glmnet)
library(caret)
library(pls) 
library(MASS)
library(dplyr)

Problem 2

For parts (a) through (c), indicate which of i. through iv. is correct. Justify your answer.

i. More flexible and hence will give improved prediction accuracy when its increase in bias is less than its decrease in variance.

ii. More flexible and hence will give improved prediction accuracy when its increase in variance is less than its decrease in bias.

iii. Less flexible and hence will give improved prediction accuracy when its increase in bias is less than its decrease in variance.

iv. Less flexible and hence will give improved prediction accuracy when its increase in variance is less than its decrease in bias.

(a) The lasso, relative to least squares, is:

Option iii. This is because lasso estimates the \(\beta\) that shrinks the RSS, which gives a \(\beta\) close to 0

(b) Repeat (a) for ridge regression relative to least squares.

Also option iii, for the same reasons as in part (a), however Ridge Regression will not shrink the coefficients to exactly zero.

(c) Repeat (a) for non-linear methods relative to least squares.

Option ii. The non-linear method have a higher variance, which allows the curve to follow observations more closely which then lets the model preform better.

Problem 9

In this exercise, we will predict the number of applications received using the other variables in the College data set.

(a) Split the data set into a training set and a test set.

data("College")
set.seed(1)
train_1 <- sample(1:nrow(College), round(nrow(College) * 0.5))
train <- College[train_1, ]
nrow(train) / nrow(College)
## [1] 0.4993565
test <- College[-train_1, ]
nrow(test) / nrow(College)
## [1] 0.5006435

(b) Fit a linear model using least squares on the training set, and report the test error obtained.

linear <- lm(Apps ~ ., data = train)
summary(linear)
## 
## Call:
## lm(formula = Apps ~ ., data = train)
## 
## Residuals:
##     Min      1Q  Median      3Q     Max 
## -5741.2  -479.5    15.3   359.6  7258.0 
## 
## Coefficients:
##               Estimate Std. Error t value Pr(>|t|)    
## (Intercept) -7.902e+02  6.381e+02  -1.238 0.216410    
## PrivateYes  -3.070e+02  2.006e+02  -1.531 0.126736    
## Accept       1.779e+00  5.420e-02  32.830  < 2e-16 ***
## Enroll      -1.470e+00  3.115e-01  -4.720 3.35e-06 ***
## Top10perc    6.673e+01  8.310e+00   8.030 1.31e-14 ***
## Top25perc   -2.231e+01  6.533e+00  -3.415 0.000708 ***
## F.Undergrad  9.269e-02  5.529e-02   1.676 0.094538 .  
## P.Undergrad  9.397e-03  5.493e-02   0.171 0.864275    
## Outstate    -1.084e-01  2.700e-02  -4.014 7.22e-05 ***
## Room.Board   2.115e-01  7.224e-02   2.928 0.003622 ** 
## Books        2.912e-01  3.985e-01   0.731 0.465399    
## Personal     6.133e-03  8.803e-02   0.070 0.944497    
## PhD         -1.548e+01  6.681e+00  -2.316 0.021082 *  
## Terminal     6.415e+00  7.290e+00   0.880 0.379470    
## S.F.Ratio    2.283e+01  2.047e+01   1.115 0.265526    
## perc.alumni  1.134e+00  6.083e+00   0.186 0.852274    
## Expend       4.857e-02  1.619e-02   2.999 0.002890 ** 
## Grad.Rate    7.490e+00  4.397e+00   1.703 0.089324 .  
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 1083 on 370 degrees of freedom
## Multiple R-squared:  0.9389, Adjusted R-squared:  0.9361 
## F-statistic: 334.3 on 17 and 370 DF,  p-value: < 2.2e-16
ols_pred <- predict(linear, test)
(ols_mse <- mean((ols_pred - test$Apps)^2))
## [1] 1135758

The MSE we got for this test was 1108531.

(c) Fit a ridge regression model on the training set, with λ chosen by cross-validation. Report the test error obtained.

library(glmnet)
train.mat<-model.matrix(Apps~.,data=train)
test.mat<-model.matrix(Apps~.,data=test)
grid<-10^seq(4,-2,length=100)
library(glmnet)
ridge<-glmnet(train.mat,train$Apps,alpha=0,lambda=grid,thresh = 1e-12)
cv.ridge<-cv.glmnet(train.mat,train$Apps,alpha=0,lambda=grid,thresh=1e-12)
bestlam.ridge<-cv.ridge$lambda.min
bestlam.ridge
## [1] 0.01
pred.newridge<-predict(ridge,s=bestlam.ridge,newx =test.mat)
mean((test$Apps-pred.newridge)^2)
## [1] 1135714

This MSE for this test is 1108514 which is slightly less than LSM model, which was 1108531.

(d) Fit a lasso model on the training set, with λ chosen by crossvalidation. Report the test error obtained, along with the number of non-zero coefficient estimates.

lasso<-glmnet(train.mat,train$Apps,alpha=1,lambda=grid,thresh = 1e-12)
cv.lasso<-cv.glmnet(train.mat,train$Apps,alpha=1,lambda=grid,thresh=1e-12)
bestlam.lasso<-cv.lasso$lambda.min
bestlam.lasso
## [1] 0.01
pred.newlasso<-predict(lasso,s=bestlam.lasso,newx =test.mat)
mean((test$Apps-pred.newlasso)^2)
## [1] 1135660
predict(lasso,s=bestlam.lasso,type="coefficients")
## 19 x 1 sparse Matrix of class "dgCMatrix"
##                        s1
## (Intercept) -7.900363e+02
## (Intercept)  .           
## PrivateYes  -3.070103e+02
## Accept       1.779328e+00
## Enroll      -1.469508e+00
## Top10perc    6.672214e+01
## Top25perc   -2.230442e+01
## F.Undergrad  9.258974e-02
## P.Undergrad  9.408838e-03
## Outstate    -1.083495e-01
## Room.Board   2.115147e-01
## Books        2.912105e-01
## Personal     6.120406e-03
## PhD         -1.547200e+01
## Terminal     6.409503e+00
## S.F.Ratio    2.282638e+01
## perc.alumni  1.130498e+00
## Expend       4.856697e-02
## Grad.Rate    7.488081e+00

The error we got was 999125.6 which is slightly lower than what we got in part (d).

(e) Fit a PCR model on the training set, with M chosen by crossvalidation. Report the test error obtained, along with the value of M selected by cross-validation.

library(pls)
fit.pcr <- pcr(Apps ~ ., data = train, scale = TRUE, validation = "CV")
validationplot(fit.pcr, val.type = "MSEP")

pred.pcr <- predict(fit.pcr, test, ncomp = 10)
mean((pred.pcr - test$Apps)^2)
## [1] 1723100

The test error was 1505718, which is higher than both the ridge regression and lasso models.

(f) Fit a PLS model on the training set, with M chosen by crossvalidation. Report the test error obtained, along with the value of M selected by cross-validation.

fit.pls <- plsr(Apps ~ ., data = train, scale = TRUE, validation = "CV")
validationplot(fit.pls, val.type = "MSEP")

pred.pls <- predict(fit.pls, test, ncomp = 10)
mean((pred.pls - test$Apps)^2)
## [1] 1131661

The error for this model is 1134531.

(g) Comment on the results obtained. How accurately can we predict the number of college applications received? Is there much difference among the test errors resulting from these five approaches?

totalsumofsquares = sum((mean(test$Apps) - test$Apps)^2)
totalsumofresiduals = sum((pred.newridge - test$Apps)^2)
1 - (totalsumofresiduals)/(totalsumofsquares)  
## [1] 0.9015452

Based on all of our results, Ridge Regression had the least error which would most accurately predict the number of applications received. The model with the highest error woul be the LASSO method.

Problem 11

We will now try to predict per capita crime rate in the Boston data set.

library(MASS)
library(glmnet)
library(leaps)
library(pls)
data("Boston")

(a) Try out some of the regression methods explored in this chapter, such as best subset selection, the lasso, ridge regression, and PCR. Present and discuss results for the approaches that you consider.

set.seed(1)
subset<-sample(nrow(Boston),nrow(Boston)*0.50)
boston.train<-Boston[subset,]
boston.test<-Boston[-subset,]

Full Model

slr.full<-lm(medv~.,data=boston.train)
slr.predict<-predict(slr.full,boston.test)
slr.MSE<-mean((slr.predict-boston.test$medv)^2)
slr.MSE
## [1] 26.86123

Best Subset

bsm<-regsubsets(medv~.,data=boston.train,nbest=1,nvmax=13)
summary(bsm)
## Subset selection object
## Call: regsubsets.formula(medv ~ ., data = boston.train, nbest = 1, 
##     nvmax = 13)
## 13 Variables  (and intercept)
##         Forced in Forced out
## crim        FALSE      FALSE
## zn          FALSE      FALSE
## indus       FALSE      FALSE
## chas        FALSE      FALSE
## nox         FALSE      FALSE
## rm          FALSE      FALSE
## age         FALSE      FALSE
## dis         FALSE      FALSE
## rad         FALSE      FALSE
## tax         FALSE      FALSE
## ptratio     FALSE      FALSE
## black       FALSE      FALSE
## lstat       FALSE      FALSE
## 1 subsets of each size up to 13
## Selection Algorithm: exhaustive
##           crim zn  indus chas nox rm  age dis rad tax ptratio black lstat
## 1  ( 1 )  " "  " " " "   " "  " " " " " " " " " " " " " "     " "   "*"  
## 2  ( 1 )  " "  " " " "   " "  " " "*" " " " " " " " " " "     " "   "*"  
## 3  ( 1 )  " "  " " " "   " "  " " "*" " " " " " " " " "*"     " "   "*"  
## 4  ( 1 )  " "  " " " "   "*"  " " "*" " " " " " " " " "*"     " "   "*"  
## 5  ( 1 )  " "  " " " "   " "  "*" "*" " " "*" " " " " "*"     " "   "*"  
## 6  ( 1 )  " "  " " " "   " "  " " "*" " " "*" "*" "*" "*"     " "   "*"  
## 7  ( 1 )  " "  " " " "   " "  "*" "*" " " "*" "*" "*" "*"     " "   "*"  
## 8  ( 1 )  "*"  " " " "   " "  "*" "*" " " "*" "*" "*" "*"     " "   "*"  
## 9  ( 1 )  "*"  " " " "   "*"  "*" "*" " " "*" "*" "*" "*"     " "   "*"  
## 10  ( 1 ) "*"  " " " "   "*"  "*" "*" " " "*" "*" "*" "*"     "*"   "*"  
## 11  ( 1 ) "*"  "*" " "   "*"  "*" "*" " " "*" "*" "*" "*"     "*"   "*"  
## 12  ( 1 ) "*"  "*" "*"   "*"  "*" "*" " " "*" "*" "*" "*"     "*"   "*"  
## 13  ( 1 ) "*"  "*" "*"   "*"  "*" "*" "*" "*" "*" "*" "*"     "*"   "*"
boston.test.mat <- model.matrix(medv~ ., data = boston.test, nvmax = 13)
val.errors <- rep(NA, 13)
for (i in 1:13) {coefi <- coef(bsm, id = i) 
pred <- boston.test.mat[, names(coefi)] %*% coefi 
val.errors[i] <- mean((pred - boston.test$medv)^2)}
which.min(val.errors)
## [1] 12
bsm.MSE<-val.errors[11]
bsm.MSE
## [1] 26.85842

Forward Selection

nullmodel<-lm(medv ~ 1, data = boston.train)
fullmodel<-lm(medv ~ ., data = boston.train)
fwd.model<-step(nullmodel, scope = list(lower = nullmodel, upper = fullmodel), direction = "forward")
## Start:  AIC=1100.55
## medv ~ 1
## 
##           Df Sum of Sq     RSS     AIC
## + lstat    1   10842.0  8605.9  896.28
## + rm       1    9884.6  9563.3  922.97
## + ptratio  1    5114.5 14333.4 1025.35
## + tax      1    4779.1 14668.8 1031.20
## + indus    1    4484.1 14963.7 1036.24
## + nox      1    4068.6 15379.3 1043.17
## + crim     1    3718.8 15729.1 1048.86
## + age      1    3600.4 15847.5 1050.76
## + rad      1    3074.6 16373.3 1059.01
## + black    1    2461.7 16986.2 1068.31
## + zn       1    2457.9 16990.0 1068.37
## + dis      1    1590.4 17857.5 1080.97
## + chas     1     730.5 18717.4 1092.87
## <none>                 19447.9 1100.55
## 
## Step:  AIC=896.28
## medv ~ lstat
## 
##           Df Sum of Sq    RSS    AIC
## + rm       1   2085.06 6520.9 828.09
## + ptratio  1   1171.36 7434.6 861.27
## + chas     1    427.46 8178.5 885.39
## + tax      1    300.89 8305.0 889.28
## + dis      1    228.39 8377.5 891.48
## + black    1     92.97 8512.9 895.54
## + zn       1     85.26 8520.6 895.77
## + crim     1     74.82 8531.1 896.08
## + age      1     72.51 8533.4 896.14
## <none>                 8605.9 896.28
## + rad      1     47.77 8558.1 896.88
## + indus    1     39.19 8566.7 897.13
## + nox      1      4.39 8601.5 898.16
## 
## Step:  AIC=828.09
## medv ~ lstat + rm
## 
##           Df Sum of Sq    RSS    AIC
## + ptratio  1    824.15 5696.7 795.91
## + tax      1    351.28 6169.6 816.08
## + chas     1    344.68 6176.2 816.35
## + black    1    227.88 6293.0 821.09
## + crim     1    143.66 6377.2 824.45
## + rad      1    130.81 6390.0 824.96
## + dis      1     84.67 6436.2 826.78
## <none>                 6520.9 828.09
## + zn       1     30.73 6490.1 828.90
## + indus    1     21.44 6499.4 829.26
## + nox      1     15.40 6505.5 829.49
## + age      1      0.21 6520.6 830.08
## 
## Step:  AIC=795.91
## medv ~ lstat + rm + ptratio
## 
##         Df Sum of Sq    RSS    AIC
## + chas   1   203.711 5493.0 788.69
## + dis    1   187.949 5508.7 789.42
## + black  1   126.547 5570.2 792.22
## + tax    1   101.425 5595.3 793.36
## + crim   1    70.418 5626.3 794.76
## <none>               5696.7 795.91
## + age    1    19.500 5677.2 797.04
## + zn     1    11.637 5685.1 797.39
## + nox    1     6.848 5689.8 797.60
## + rad    1     1.435 5695.3 797.84
## + indus  1     0.376 5696.3 797.89
## 
## Step:  AIC=788.69
## medv ~ lstat + rm + ptratio + chas
## 
##         Df Sum of Sq    RSS    AIC
## + dis    1   135.559 5357.4 784.37
## + black  1   106.263 5386.7 785.75
## + tax    1    97.720 5395.3 786.15
## + crim   1    62.204 5430.8 787.81
## <none>               5493.0 788.69
## + nox    1    19.651 5473.3 789.79
## + zn     1     7.597 5485.4 790.34
## + age    1     6.503 5486.5 790.39
## + rad    1     2.406 5490.6 790.58
## + indus  1     0.712 5492.3 790.66
## 
## Step:  AIC=784.37
## medv ~ lstat + rm + ptratio + chas + dis
## 
##         Df Sum of Sq    RSS    AIC
## + nox    1   294.802 5062.6 772.05
## + tax    1   240.625 5116.8 774.74
## + black  1   139.238 5218.2 779.71
## + crim   1   103.497 5253.9 781.44
## + indus  1   100.754 5256.7 781.57
## <none>               5357.4 784.37
## + rad    1    31.976 5325.5 784.86
## + age    1    31.945 5325.5 784.86
## + zn     1    17.620 5339.8 785.54
## 
## Step:  AIC=772.05
## medv ~ lstat + rm + ptratio + chas + dis + nox
## 
##         Df Sum of Sq    RSS    AIC
## + black  1    93.740 4968.9 769.32
## + tax    1    64.410 4998.2 770.81
## + crim   1    63.936 4998.7 770.84
## <none>               5062.6 772.05
## + zn     1    17.206 5045.4 773.19
## + indus  1    12.916 5049.7 773.41
## + rad    1     5.096 5057.5 773.80
## + age    1     3.632 5059.0 773.87
## 
## Step:  AIC=769.32
## medv ~ lstat + rm + ptratio + chas + dis + nox + black
## 
##         Df Sum of Sq    RSS    AIC
## <none>               4968.9 769.32
## + rad    1   31.4109 4937.5 769.72
## + tax    1   29.9137 4939.0 769.80
## + crim   1   28.4346 4940.5 769.87
## + zn     1   25.4566 4943.4 770.02
## + indus  1   10.3822 4958.5 770.79
## + age    1    8.4054 4960.5 770.89
summary(fwd.model)
## 
## Call:
## lm(formula = medv ~ lstat + rm + ptratio + chas + dis + nox + 
##     black, data = boston.train)
## 
## Residuals:
##      Min       1Q   Median       3Q      Max 
## -10.1197  -2.7283  -0.4954   1.7559  16.7531 
## 
## Coefficients:
##               Estimate Std. Error t value Pr(>|t|)    
## (Intercept)  28.027255   6.590656   4.253 3.01e-05 ***
## lstat        -0.514869   0.062929  -8.182 1.53e-14 ***
## rm            4.522123   0.529007   8.548 1.36e-15 ***
## ptratio      -0.973703   0.158207  -6.155 3.05e-09 ***
## chas          3.058479   1.167288   2.620 0.009338 ** 
## dis          -1.039421   0.230975  -4.500 1.05e-05 ***
## nox         -15.136660   4.317294  -3.506 0.000541 ***
## black         0.007219   0.003358   2.150 0.032543 *  
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 4.503 on 245 degrees of freedom
## Multiple R-squared:  0.7445, Adjusted R-squared:  0.7372 
## F-statistic:   102 on 7 and 245 DF,  p-value: < 2.2e-16
fwd.predict<-predict(fwd.model,boston.test)
fwd.mse<-mean((fwd.predict-boston.test$medv)^2)
fwd.mse
## [1] 27.45563

Backward Selection

bwd.model<-step(fullmodel,direction = "backward")
## Start:  AIC=756.62
## medv ~ crim + zn + indus + chas + nox + rm + age + dis + rad + 
##     tax + ptratio + black + lstat
## 
##           Df Sum of Sq    RSS    AIC
## - age      1      0.78 4507.4 754.66
## - indus    1      3.57 4510.2 754.82
## <none>                 4506.6 756.62
## - zn       1     50.40 4557.0 757.43
## - black    1     72.28 4578.9 758.64
## - chas     1     77.17 4583.8 758.91
## - crim     1     90.78 4597.4 759.66
## - nox      1    133.78 4640.4 762.02
## - tax      1    307.43 4814.0 771.31
## - rad      1    364.39 4871.0 774.29
## - dis      1    411.50 4918.1 776.72
## - ptratio  1    592.43 5099.0 785.86
## - lstat    1    943.49 5450.1 802.71
## - rm       1   1098.86 5605.5 809.82
## 
## Step:  AIC=754.66
## medv ~ crim + zn + indus + chas + nox + rm + dis + rad + tax + 
##     ptratio + black + lstat
## 
##           Df Sum of Sq    RSS    AIC
## - indus    1      3.65 4511.0 752.87
## <none>                 4507.4 754.66
## - zn       1     51.85 4559.2 755.55
## - black    1     71.74 4579.1 756.66
## - chas     1     76.63 4584.0 756.93
## - crim     1     90.53 4597.9 757.69
## - nox      1    147.42 4654.8 760.80
## - tax      1    310.41 4817.8 769.51
## - rad      1    373.48 4880.9 772.80
## - dis      1    443.51 4950.9 776.41
## - ptratio  1    611.10 5118.5 784.83
## - rm       1   1141.83 5649.2 809.79
## - lstat    1   1149.41 5656.8 810.13
## 
## Step:  AIC=752.87
## medv ~ crim + zn + chas + nox + rm + dis + rad + tax + ptratio + 
##     black + lstat
## 
##           Df Sum of Sq    RSS    AIC
## <none>                 4511.0 752.87
## - zn       1     49.56 4560.6 753.63
## - black    1     71.75 4582.8 754.86
## - chas     1     78.50 4589.5 755.23
## - crim     1     93.75 4604.8 756.07
## - nox      1    144.49 4655.5 758.84
## - tax      1    325.81 4836.8 768.51
## - rad      1    372.51 4883.5 770.94
## - dis      1    499.36 5010.4 777.43
## - ptratio  1    608.00 5119.0 782.86
## - rm       1   1138.88 5649.9 807.82
## - lstat    1   1152.03 5663.1 808.41
summary(bwd.model)
## 
## Call:
## lm(formula = medv ~ crim + zn + chas + nox + rm + dis + rad + 
##     tax + ptratio + black + lstat, data = boston.train)
## 
## Residuals:
##      Min       1Q   Median       3Q      Max 
## -10.4289  -2.5103  -0.3652   1.5543  16.8483 
## 
## Coefficients:
##               Estimate Std. Error t value Pr(>|t|)    
## (Intercept)  34.844360   6.711874   5.191 4.43e-07 ***
## crim         -0.093283   0.041682  -2.238  0.02614 *  
## zn            0.029404   0.018070   1.627  0.10500    
## chas          2.320891   1.133312   2.048  0.04166 *  
## nox         -13.320909   4.794503  -2.778  0.00589 ** 
## rm            4.060373   0.520542   7.800 1.86e-13 ***
## dis          -1.274369   0.246729  -5.165 5.04e-07 ***
## rad           0.375198   0.084105   4.461 1.25e-05 ***
## tax          -0.018505   0.004435  -4.172 4.22e-05 ***
## ptratio      -0.982326   0.172358  -5.699 3.50e-08 ***
## black         0.006774   0.003460   1.958  0.05140 .  
## lstat        -0.488743   0.062299  -7.845 1.40e-13 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 4.326 on 241 degrees of freedom
## Multiple R-squared:  0.768,  Adjusted R-squared:  0.7575 
## F-statistic: 72.54 on 11 and 241 DF,  p-value: < 2.2e-16
bwd.predict<-predict(bwd.model,boston.test)
bwd.mse<-mean((bwd.predict-boston.test$medv)^2)
bwd.mse
## [1] 26.85842

Stepwise Selection

step.model<-step(nullmodel, scope = list(lower = nullmodel, upper = fullmodel), direction = "both")
## Start:  AIC=1100.55
## medv ~ 1
## 
##           Df Sum of Sq     RSS     AIC
## + lstat    1   10842.0  8605.9  896.28
## + rm       1    9884.6  9563.3  922.97
## + ptratio  1    5114.5 14333.4 1025.35
## + tax      1    4779.1 14668.8 1031.20
## + indus    1    4484.1 14963.7 1036.24
## + nox      1    4068.6 15379.3 1043.17
## + crim     1    3718.8 15729.1 1048.86
## + age      1    3600.4 15847.5 1050.76
## + rad      1    3074.6 16373.3 1059.01
## + black    1    2461.7 16986.2 1068.31
## + zn       1    2457.9 16990.0 1068.37
## + dis      1    1590.4 17857.5 1080.97
## + chas     1     730.5 18717.4 1092.87
## <none>                 19447.9 1100.55
## 
## Step:  AIC=896.28
## medv ~ lstat
## 
##           Df Sum of Sq     RSS     AIC
## + rm       1    2085.1  6520.9  828.09
## + ptratio  1    1171.4  7434.6  861.27
## + chas     1     427.5  8178.5  885.39
## + tax      1     300.9  8305.0  889.28
## + dis      1     228.4  8377.5  891.48
## + black    1      93.0  8512.9  895.54
## + zn       1      85.3  8520.6  895.77
## + crim     1      74.8  8531.1  896.08
## + age      1      72.5  8533.4  896.14
## <none>                  8605.9  896.28
## + rad      1      47.8  8558.1  896.88
## + indus    1      39.2  8566.7  897.13
## + nox      1       4.4  8601.5  898.16
## - lstat    1   10842.0 19447.9 1100.55
## 
## Step:  AIC=828.09
## medv ~ lstat + rm
## 
##           Df Sum of Sq    RSS    AIC
## + ptratio  1    824.15 5696.7 795.91
## + tax      1    351.28 6169.6 816.08
## + chas     1    344.68 6176.2 816.35
## + black    1    227.88 6293.0 821.09
## + crim     1    143.66 6377.2 824.45
## + rad      1    130.81 6390.0 824.96
## + dis      1     84.67 6436.2 826.78
## <none>                 6520.9 828.09
## + zn       1     30.73 6490.1 828.90
## + indus    1     21.44 6499.4 829.26
## + nox      1     15.40 6505.5 829.49
## + age      1      0.21 6520.6 830.08
## - rm       1   2085.06 8605.9 896.28
## - lstat    1   3042.41 9563.3 922.97
## 
## Step:  AIC=795.91
## medv ~ lstat + rm + ptratio
## 
##           Df Sum of Sq    RSS    AIC
## + chas     1    203.71 5493.0 788.69
## + dis      1    187.95 5508.7 789.42
## + black    1    126.55 5570.2 792.22
## + tax      1    101.42 5595.3 793.36
## + crim     1     70.42 5626.3 794.76
## <none>                 5696.7 795.91
## + age      1     19.50 5677.2 797.04
## + zn       1     11.64 5685.1 797.39
## + nox      1      6.85 5689.8 797.60
## + rad      1      1.44 5695.3 797.84
## + indus    1      0.38 5696.3 797.89
## - ptratio  1    824.15 6520.9 828.09
## - rm       1   1737.85 7434.6 861.27
## - lstat    1   2145.70 7842.4 874.78
## 
## Step:  AIC=788.69
## medv ~ lstat + rm + ptratio + chas
## 
##           Df Sum of Sq    RSS    AIC
## + dis      1    135.56 5357.4 784.37
## + black    1    106.26 5386.7 785.75
## + tax      1     97.72 5395.3 786.15
## + crim     1     62.20 5430.8 787.81
## <none>                 5493.0 788.69
## + nox      1     19.65 5473.3 789.79
## + zn       1      7.60 5485.4 790.34
## + age      1      6.50 5486.5 790.39
## + rad      1      2.41 5490.6 790.58
## + indus    1      0.71 5492.3 790.66
## - chas     1    203.71 5696.7 795.91
## - ptratio  1    683.19 6176.2 816.35
## - rm       1   1704.54 7197.5 855.07
## - lstat    1   2169.81 7662.8 870.92
## 
## Step:  AIC=784.37
## medv ~ lstat + rm + ptratio + chas + dis
## 
##           Df Sum of Sq    RSS    AIC
## + nox      1    294.80 5062.6 772.05
## + tax      1    240.63 5116.8 774.74
## + black    1    139.24 5218.2 779.71
## + crim     1    103.50 5253.9 781.44
## + indus    1    100.75 5256.7 781.57
## <none>                 5357.4 784.37
## + rad      1     31.98 5325.5 784.86
## + age      1     31.94 5325.5 784.86
## + zn       1     17.62 5339.8 785.54
## - dis      1    135.56 5493.0 788.69
## - chas     1    151.32 5508.7 789.42
## - ptratio  1    771.23 6128.7 816.40
## - rm       1   1529.43 6886.9 845.91
## - lstat    1   2190.86 7548.3 869.11
## 
## Step:  AIC=772.05
## medv ~ lstat + rm + ptratio + chas + dis + nox
## 
##           Df Sum of Sq    RSS    AIC
## + black    1     93.74 4968.9 769.32
## + tax      1     64.41 4998.2 770.81
## + crim     1     63.94 4998.7 770.84
## <none>                 5062.6 772.05
## + zn       1     17.21 5045.4 773.19
## + indus    1     12.92 5049.7 773.41
## + rad      1      5.10 5057.5 773.80
## + age      1      3.63 5059.0 773.87
## - chas     1    161.79 5224.4 778.01
## - nox      1    294.80 5357.4 784.37
## - dis      1    410.71 5473.3 789.79
## - ptratio  1    842.05 5904.7 808.98
## - rm       1   1410.77 6473.4 832.24
## - lstat    1   1606.46 6669.1 839.78
## 
## Step:  AIC=769.32
## medv ~ lstat + rm + ptratio + chas + dis + nox + black
## 
##           Df Sum of Sq    RSS    AIC
## <none>                 4968.9 769.32
## + rad      1     31.41 4937.5 769.72
## + tax      1     29.91 4939.0 769.80
## + crim     1     28.43 4940.5 769.87
## + zn       1     25.46 4943.4 770.02
## + indus    1     10.38 4958.5 770.79
## + age      1      8.41 4960.5 770.89
## - black    1     93.74 5062.6 772.05
## - chas     1    139.23 5108.1 774.31
## - nox      1    249.30 5218.2 779.71
## - dis      1    410.72 5379.6 787.42
## - ptratio  1    768.24 5737.1 803.69
## - lstat    1   1357.64 6326.5 828.44
## - rm       1   1482.02 6450.9 833.36
summary(step.model)
## 
## Call:
## lm(formula = medv ~ lstat + rm + ptratio + chas + dis + nox + 
##     black, data = boston.train)
## 
## Residuals:
##      Min       1Q   Median       3Q      Max 
## -10.1197  -2.7283  -0.4954   1.7559  16.7531 
## 
## Coefficients:
##               Estimate Std. Error t value Pr(>|t|)    
## (Intercept)  28.027255   6.590656   4.253 3.01e-05 ***
## lstat        -0.514869   0.062929  -8.182 1.53e-14 ***
## rm            4.522123   0.529007   8.548 1.36e-15 ***
## ptratio      -0.973703   0.158207  -6.155 3.05e-09 ***
## chas          3.058479   1.167288   2.620 0.009338 ** 
## dis          -1.039421   0.230975  -4.500 1.05e-05 ***
## nox         -15.136660   4.317294  -3.506 0.000541 ***
## black         0.007219   0.003358   2.150 0.032543 *  
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 4.503 on 245 degrees of freedom
## Multiple R-squared:  0.7445, Adjusted R-squared:  0.7372 
## F-statistic:   102 on 7 and 245 DF,  p-value: < 2.2e-16
step.predict<-predict(step.model,boston.test)
step.mse<-mean((step.predict-boston.test$medv)^2)
step.mse
## [1] 27.45563

Ridge Regression

bostontrain.mat<-model.matrix(medv~.,data=boston.train)
bostontest.mat<-model.matrix(medv~.,data=boston.test)
grid<-10^seq(4,-2,length=100)
ridge.model<-glmnet(bostontrain.mat,boston.train$medv,alpha=0,lambda=grid)
boston.cv.ridge<-cv.glmnet(bostontrain.mat,boston.train$medv,alpha=0,lambda=grid)
boston.bestlam.ridge<-boston.cv.ridge$lambda.min
boston.bestlam.ridge
## [1] 0.05336699
ridge.model.1<-glmnet(bostontrain.mat,boston.train$medv,alpha=0,lambda=boston.bestlam.ridge)
coef(ridge.model.1)
## 15 x 1 sparse Matrix of class "dgCMatrix"
##                        s0
## (Intercept)  33.549938944
## (Intercept)   .          
## crim         -0.089333575
## zn            0.028630555
## indus         0.022489614
## chas          2.372039925
## nox         -13.053954757
## rm            4.138827718
## age          -0.004384557
## dis          -1.236522044
## rad           0.347615984
## tax          -0.017518196
## ptratio      -0.965923891
## black         0.006772119
## lstat        -0.481050642
boston.pred.newridge<-predict(ridge.model,s=boston.bestlam.ridge,newx=bostontest.mat)
ridge.MSE<-mean((boston.test$medv-boston.pred.newridge)^2)
ridge.MSE
## [1] 26.79522

Lasso Regression

lasso.model<-glmnet(bostontrain.mat,boston.train$medv,alpha=0,lambda=grid)
boston.cv.lasso<-cv.glmnet(bostontrain.mat,boston.train$medv,alpha=0,lambda=grid)
boston.bestlam.lasso<-boston.cv.lasso$lambda.min
boston.bestlam.lasso
## [1] 0.08111308
lasso.model.1<-glmnet(bostontrain.mat,boston.train$medv,alpha=0,lambda=boston.bestlam.lasso)
coef(lasso.model.1)
## 15 x 1 sparse Matrix of class "dgCMatrix"
##                        s0
## (Intercept)  33.024094010
## (Intercept)   .          
## crim         -0.088159911
## zn            0.027992223
## indus         0.018467094
## chas          2.400885843
## nox         -12.807002261
## rm            4.155104180
## age          -0.004805140
## dis          -1.221950655
## rad           0.333681186
## tax          -0.016820543
## ptratio      -0.959305356
## black         0.006760755
## lstat        -0.478261284
boston.pred.newlasso<-predict(lasso.model,s=boston.bestlam.lasso,newx=bostontest.mat)
lasso.MSE<-mean((boston.test$medv-boston.pred.newlasso)^2)
lasso.MSE
## [1] 26.77717

PCR

pcr.model<-pcr(medv~.,data=boston.train,scale=TRUE,validation="CV")
summary(pcr.model)
## Data:    X dimension: 253 13 
##  Y dimension: 253 1
## Fit method: svdpc
## Number of components considered: 13
## 
## VALIDATION: RMSEP
## Cross-validated using 10 random segments.
##        (Intercept)  1 comps  2 comps  3 comps  4 comps  5 comps  6 comps
## CV           8.802    6.785    6.673    5.361    4.840    4.813    4.849
## adjCV        8.802    6.783    6.674    5.352    4.817    4.797    4.834
##        7 comps  8 comps  9 comps  10 comps  11 comps  12 comps  13 comps
## CV       4.857    4.847    4.864     4.860     4.888     4.850     4.686
## adjCV    4.842    4.830    4.847     4.841     4.872     4.825     4.662
## 
## TRAINING: % variance explained
##       1 comps  2 comps  3 comps  4 comps  5 comps  6 comps  7 comps  8 comps
## X       48.02    58.80    67.99    75.11    80.85    85.73    89.55    92.97
## medv    40.83    43.75    64.55    72.78    72.81    72.85    72.88    73.28
##       9 comps  10 comps  11 comps  12 comps  13 comps
## X       95.09     96.92     98.29     99.50    100.00
## medv    73.43     73.77     73.78     75.04     76.83
pcr.model.5comp<-pcr(medv~.,data=boston.train,scale=TRUE,ncomp=5)
summary(pcr.model.5comp)
## Data:    X dimension: 253 13 
##  Y dimension: 253 1
## Fit method: svdpc
## Number of components considered: 5
## TRAINING: % variance explained
##       1 comps  2 comps  3 comps  4 comps  5 comps
## X       48.02    58.80    67.99    75.11    80.85
## medv    40.83    43.75    64.55    72.78    72.81
pcr.model.5comp$coefficients
## , , 1 comps
## 
##                medv
## crim    -0.58901583
## zn       0.55290872
## indus   -0.75658758
## chas     0.04169713
## nox     -0.77186413
## rm       0.42824335
## age     -0.69998703
## dis      0.72258349
## rad     -0.71757547
## tax     -0.75690381
## ptratio -0.48362619
## black    0.48103143
## lstat   -0.68991293
## 
## , , 2 comps
## 
##               medv
## crim    -1.0327137
## zn       0.1183425
## indus   -0.5461255
## chas     0.5908909
## nox     -0.5331747
## rm       0.4320955
## age     -0.3325371
## dis      0.3178432
## rad     -1.0960500
## tax     -1.0634332
## ptratio -0.7103027
## black    0.9657038
## lstat   -0.6915826
## 
## , , 3 comps
## 
##                medv
## crim    -0.41021087
## zn       1.06227200
## indus   -0.44361778
## chas     2.25645994
## nox      0.04055328
## rm       2.48038051
## age     -0.31224947
## dis     -0.15731819
## rad     -0.18301087
## tax     -0.34805561
## ptratio -2.08747479
## black    0.25720650
## lstat   -1.63011220
## 
## , , 4 comps
## 
##                medv
## crim    -0.94766845
## zn       0.14082804
## indus   -0.37620155
## chas     0.97649527
## nox      0.02713317
## rm       3.94426169
## age     -0.14133317
## dis     -0.64803172
## rad      0.03854496
## tax     -0.18770194
## ptratio -1.34190432
## black    0.43537465
## lstat   -2.62947540
## 
## , , 5 comps
## 
##                 medv
## crim    -0.936993259
## zn       0.161504585
## indus   -0.364860941
## chas     0.864967231
## nox      0.063304433
## rm       3.952029786
## age     -0.105517898
## dis     -0.675079851
## rad     -0.005581025
## tax     -0.214616117
## ptratio -1.471135730
## black    0.405716567
## lstat   -2.596174401
boston.predict.pcr<-predict(pcr.model,boston.test,ncomp=5)
pcr.MSE<-mean((boston.test$medv-boston.predict.pcr)^2)
pcr.MSE
## [1] 30.28912

PLS

plsr.model<-plsr(medv~.,data=boston.train,scale=TRUE,validation="CV")
validationplot(plsr.model,val.type="MSEP")

plsr.model.5comp<-plsr(medv~.,data=boston.train,scale=TRUE,ncomp=5)
summary(plsr.model.5comp)
## Data:    X dimension: 253 13 
##  Y dimension: 253 1
## Fit method: kernelpls
## Number of components considered: 5
## TRAINING: % variance explained
##       1 comps  2 comps  3 comps  4 comps  5 comps
## X       47.00    56.99    64.47    69.35    76.13
## medv    52.23    73.62    74.80    75.90    76.34
plsr.model.5comp$coefficients
## , , 1 comps
## 
##               medv
## crim    -0.6868980
## zn       0.5584312
## indus   -0.7542734
## chas     0.3044365
## nox     -0.7184728
## rm       1.1198754
## age     -0.6758696
## dis      0.4492035
## rad     -0.6245714
## tax     -0.7786859
## ptratio -0.8055454
## black    0.5588625
## lstat   -1.1728534
## 
## , , 2 comps
## 
##               medv
## crim    -0.7133688
## zn       0.2549858
## indus   -0.3176855
## chas     1.2241722
## nox     -0.1155627
## rm       3.5893439
## age     -0.2038082
## dis     -1.0812735
## rad      0.2048370
## tax     -0.4464095
## ptratio -1.7663214
## black    0.5851232
## lstat   -2.7200795
## 
## , , 3 comps
## 
##               medv
## crim    -0.5518822
## zn       0.1765839
## indus   -0.3471971
## chas     0.5337188
## nox     -0.5437048
## rm       3.6351309
## age     -0.3851608
## dis     -1.5566943
## rad      0.7046832
## tax     -0.6243897
## ptratio -1.6045435
## black    0.5789136
## lstat   -3.1847314
## 
## , , 4 comps
## 
##               medv
## crim    -0.5531629
## zn       0.1879035
## indus   -0.2014220
## chas     0.5457557
## nox     -1.2314578
## rm       2.9663873
## age     -0.3673851
## dis     -2.6699970
## rad      1.3611945
## tax     -1.4978588
## ptratio -1.8436926
## black    0.8602464
## lstat   -3.5238234
## 
## , , 5 comps
## 
##               medv
## crim    -0.4810112
## zn       0.6098952
## indus   -0.1724216
## chas     0.6845996
## nox     -1.6119076
## rm       2.7556799
## age     -0.3872218
## dis     -2.9521650
## rad      1.9996162
## tax     -1.8545314
## ptratio -1.9948811
## black    0.6537034
## lstat   -3.5619537
boston.predict.plsr<-predict(plsr.model,boston.test,ncomp=5)
plsr.MSE<-mean((boston.test$medv-boston.predict.plsr)^2)
plsr.MSE
## [1] 26.36205

(b) Propose a model (or set of models) that seem to perform well on this data set, and justify your answer. Make sure that you are evaluating model performance using validation set error, crossvalidation, or some other reasonable alternative, as opposed to using training error.

In this case, the model with the least error would be Partial Least Square with 26.3621. The model with the most error would be Principal Component Regression with 30.2891. The Best Subset and Stepwise Selection both have the same MSE.

(c) Does your chosen model involve all of the features in the data set? Why or why not?