(2) For parts (a) through (c), indicate which of i. through iv. is correct.
i. More flexible and hence will give improved prediction accuracy when its increase in bias is less than its decrease in variance.
ii. More flexible and hence will give improved prediction accuracy when its increase in variance is less than its decrease in bias.
iii. Less flexible and hence will give improved prediction accuracy when its increase in bias is less than its decrease in variance.
iv. Less flexible and hence will give improved prediction accuracy when its increase in variance is less than its decrease in bias.
# (a) The lasso, relative to least squares, is: Answer is (iii) Less flexible giving improved prediction accuracy when its increase in bias is less than its decrease in variance.
# (b) Repeat (a) for ridge regression relative to least squares: Answer is (iii) Same as Lasso, Less flexible and will give improved prediction accuracy when its increase in bias is less than its decrease in variance.
# (c) Repeat (a) for non-linear methods relative to least squares: Answer is (ii) More flexible giving improved prediction accuracy when its increase in variance is less than its decrease in bias.
(9)
(a) Split the data set into a training set and a test set.
## Warning: package 'ISLR' was built under R version 3.5.3
(b) Fit a linear model using least squares on the training set, and report the test error obtained.
ls.full<-lm(Apps~.,data=train)
summary(ls.full)
##
## Call:
## lm(formula = Apps ~ ., data = train)
##
## Residuals:
## Min 1Q Median 3Q Max
## -5023.2 -408.6 -46.6 336.7 7242.5
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) -140.27742 518.76903 -0.270 0.786955
## PrivateYes -511.17144 176.18600 -2.901 0.003872 **
## Accept 1.63646 0.05066 32.302 < 2e-16 ***
## Enroll -1.08353 0.24583 -4.408 1.27e-05 ***
## Top10perc 58.88924 7.33783 8.025 6.66e-15 ***
## Top25perc -21.37240 6.10623 -3.500 0.000505 ***
## F.Undergrad 0.07107 0.04330 1.641 0.101333
## P.Undergrad 0.05855 0.03783 1.548 0.122318
## Outstate -0.08871 0.02447 -3.625 0.000317 ***
## Room.Board 0.14479 0.06418 2.256 0.024493 *
## Books -0.19750 0.29231 -0.676 0.499547
## Personal 0.03265 0.07920 0.412 0.680391
## PhD -8.78197 6.16418 -1.425 0.154844
## Terminal -4.79463 6.69874 -0.716 0.474463
## S.F.Ratio 17.93368 16.68176 1.075 0.282848
## perc.alumni 2.29991 5.51102 0.417 0.676608
## Expend 0.07830 0.01574 4.974 8.92e-07 ***
## Grad.Rate 9.14788 3.83929 2.383 0.017541 *
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 1113 on 525 degrees of freedom
## Multiple R-squared: 0.9308, Adjusted R-squared: 0.9286
## F-statistic: 415.4 on 17 and 525 DF, p-value: < 2.2e-16
predicted.apps<-predict(ls.full,test)
testerror<-mean((test$Apps-predicted.apps)^2)
testerror
## [1] 769127.1
The Mean Square error for the test data set is 769127.1
(c) Fit a ridge regression model on the training set, with ?? chosen by cross-validation. Report the test error obtained.
train.mat<-model.matrix(Apps~.,data=train)
test.mat<-model.matrix(Apps~.,data=test)
grid<-10^seq(4,-2,length=100)
library(glmnet)
## Warning: package 'glmnet' was built under R version 3.5.3
## Loading required package: Matrix
## Loading required package: foreach
## Warning: package 'foreach' was built under R version 3.5.3
## Loaded glmnet 2.0-16
ridge<-glmnet(train.mat,train$Apps,alpha=0,lambda=grid,thresh = 1e-12)
cv.ridge<-cv.glmnet(train.mat,train$Apps,alpha=0,lambda=grid,thresh=1e-12)
bestlam.ridge<-cv.ridge$lambda.min
bestlam.ridge
## [1] 0.01
pred.newridge<-predict(ridge,s=bestlam.ridge,newx =test.mat)
mean((test$Apps-pred.newridge)^2)
## [1] 769103.1
The Mean Square error for the test data set is found to be 7.69103.1 for Ridge Regression Model, slightly less than the one for Least Square Model
(d) Fit a lasso model on the training set, with ?? chosen by crossvalidation.Report the test error obtained, along with the number of non-zero coefficient estimates.
lasso<-glmnet(train.mat,train$Apps,alpha=1,lambda=grid,thresh = 1e-12)
cv.lasso<-cv.glmnet(train.mat,train$Apps,alpha=1,lambda=grid,thresh=1e-12)
bestlam.lasso<-cv.lasso$lambda.min
bestlam.lasso
## [1] 0.01
pred.newlasso<-predict(lasso,s=bestlam.lasso,newx =test.mat)
mean((test$Apps-pred.newlasso)^2)
## [1] 769058.3
predict(lasso,s=bestlam.lasso,type="coefficients")
## 19 x 1 sparse Matrix of class "dgCMatrix"
## 1
## (Intercept) -140.44188745
## (Intercept) .
## PrivateYes -511.17487803
## Accept 1.63637874
## Enroll -1.08293849
## Top10perc 58.87977942
## Top25perc -21.36470140
## F.Undergrad 0.07099655
## P.Undergrad 0.05854286
## Outstate -0.08868893
## Room.Board 0.14477237
## Books -0.19740566
## Personal 0.03263004
## PhD -8.78073978
## Terminal -4.79422684
## S.F.Ratio 17.92955431
## perc.alumni 2.29577721
## Expend 0.07829692
## Grad.Rate 9.14621252
The Mean Square error for the test data set is found to be 7.69058.3 for Lasso Regression Model The mean square error has further reduced compared to Ridge Regression.
(e) Fit a PCR model on the training set, with M chosen by crossvalidation. Report the test error obtained, along with the value of M selected by cross-validation.
library(pls)
## Warning: package 'pls' was built under R version 3.5.3
##
## Attaching package: 'pls'
## The following object is masked from 'package:stats':
##
## loadings
pcrmodel<-pcr(Apps~.,data=train,scale=TRUE,validation="CV")
validationplot(pcrmodel,val.type="MSEP")

predict.pcr<-predict(pcrmodel,test,ncomp=17)
mean((test$Apps-predict.pcr)^2)
## [1] 769127.1
We see that the cross validation error is minimum for M=17.If we use 17 components it gives MSE on test set as 7.69127.1 which is similar to the one obtained for least squares method.
(f) Fit a PLS model on the training set, with M chosen by crossvalidation. Report the test error obtained, along with the value of M selected by cross-validation.
plsrmodel<-plsr(Apps~.,data=train,scale=TRUE,validation="CV")
validationplot(plsrmodel,val.type="MSEP")

predict.plsr<-predict(plsrmodel,test,ncomp=10)
mean((test$Apps-predict.plsr)^2)
## [1] 775233.6
We see that the cross validation error is minimum for M=10.If we use 10 components it gives MSE on test set as 7.75233.6, which is higher compared to the one obtained for least squares method.
(11)
(a) Try out some of the regression methods explored in this chapter, such as best subset selection, the lasso, ridge regression, and PCR. Present and discuss results for the approaches that you consider.
library(MASS)
attach(Boston)
dim(Boston)
## [1] 506 14
str(Boston)
## 'data.frame': 506 obs. of 14 variables:
## $ crim : num 0.00632 0.02731 0.02729 0.03237 0.06905 ...
## $ zn : num 18 0 0 0 0 0 12.5 12.5 12.5 12.5 ...
## $ indus : num 2.31 7.07 7.07 2.18 2.18 2.18 7.87 7.87 7.87 7.87 ...
## $ chas : int 0 0 0 0 0 0 0 0 0 0 ...
## $ nox : num 0.538 0.469 0.469 0.458 0.458 0.458 0.524 0.524 0.524 0.524 ...
## $ rm : num 6.58 6.42 7.18 7 7.15 ...
## $ age : num 65.2 78.9 61.1 45.8 54.2 58.7 66.6 96.1 100 85.9 ...
## $ dis : num 4.09 4.97 4.97 6.06 6.06 ...
## $ rad : int 1 2 2 3 3 3 5 5 5 5 ...
## $ tax : num 296 242 242 222 222 222 311 311 311 311 ...
## $ ptratio: num 15.3 17.8 17.8 18.7 18.7 18.7 15.2 15.2 15.2 15.2 ...
## $ black : num 397 397 393 395 397 ...
## $ lstat : num 4.98 9.14 4.03 2.94 5.33 ...
## $ medv : num 24 21.6 34.7 33.4 36.2 28.7 22.9 27.1 16.5 18.9 ...
summary(Boston)
## crim zn indus chas
## Min. : 0.00632 Min. : 0.00 Min. : 0.46 Min. :0.00000
## 1st Qu.: 0.08204 1st Qu.: 0.00 1st Qu.: 5.19 1st Qu.:0.00000
## Median : 0.25651 Median : 0.00 Median : 9.69 Median :0.00000
## Mean : 3.61352 Mean : 11.36 Mean :11.14 Mean :0.06917
## 3rd Qu.: 3.67708 3rd Qu.: 12.50 3rd Qu.:18.10 3rd Qu.:0.00000
## Max. :88.97620 Max. :100.00 Max. :27.74 Max. :1.00000
## nox rm age dis
## Min. :0.3850 Min. :3.561 Min. : 2.90 Min. : 1.130
## 1st Qu.:0.4490 1st Qu.:5.886 1st Qu.: 45.02 1st Qu.: 2.100
## Median :0.5380 Median :6.208 Median : 77.50 Median : 3.207
## Mean :0.5547 Mean :6.285 Mean : 68.57 Mean : 3.795
## 3rd Qu.:0.6240 3rd Qu.:6.623 3rd Qu.: 94.08 3rd Qu.: 5.188
## Max. :0.8710 Max. :8.780 Max. :100.00 Max. :12.127
## rad tax ptratio black
## Min. : 1.000 Min. :187.0 Min. :12.60 Min. : 0.32
## 1st Qu.: 4.000 1st Qu.:279.0 1st Qu.:17.40 1st Qu.:375.38
## Median : 5.000 Median :330.0 Median :19.05 Median :391.44
## Mean : 9.549 Mean :408.2 Mean :18.46 Mean :356.67
## 3rd Qu.:24.000 3rd Qu.:666.0 3rd Qu.:20.20 3rd Qu.:396.23
## Max. :24.000 Max. :711.0 Max. :22.00 Max. :396.90
## lstat medv
## Min. : 1.73 Min. : 5.00
## 1st Qu.: 6.95 1st Qu.:17.02
## Median :11.36 Median :21.20
## Mean :12.65 Mean :22.53
## 3rd Qu.:16.95 3rd Qu.:25.00
## Max. :37.97 Max. :50.00
set.seed(10743959)
subset<-sample(nrow(Boston),nrow(Boston)*0.70)
boston.train<-Boston[subset,]
boston.test<-Boston[-subset,]
Full model
slr.full<-lm(medv~.,data=boston.train)
slr.predict<-predict(slr.full,boston.test)
slr.MSE<-mean((slr.predict-boston.test$medv)^2)
slr.MSE
## [1] 17.96323
Best subset
library(leaps)
## Warning: package 'leaps' was built under R version 3.5.3
bsm<-regsubsets(medv~.,data=boston.train,nbest=1,nvmax=13)
summary(bsm)
## Subset selection object
## Call: regsubsets.formula(medv ~ ., data = boston.train, nbest = 1,
## nvmax = 13)
## 13 Variables (and intercept)
## Forced in Forced out
## crim FALSE FALSE
## zn FALSE FALSE
## indus FALSE FALSE
## chas FALSE FALSE
## nox FALSE FALSE
## rm FALSE FALSE
## age FALSE FALSE
## dis FALSE FALSE
## rad FALSE FALSE
## tax FALSE FALSE
## ptratio FALSE FALSE
## black FALSE FALSE
## lstat FALSE FALSE
## 1 subsets of each size up to 13
## Selection Algorithm: exhaustive
## crim zn indus chas nox rm age dis rad tax ptratio black lstat
## 1 ( 1 ) " " " " " " " " " " " " " " " " " " " " " " " " "*"
## 2 ( 1 ) " " " " " " " " " " "*" " " " " " " " " " " " " "*"
## 3 ( 1 ) " " " " " " " " " " "*" " " " " " " " " "*" " " "*"
## 4 ( 1 ) " " " " " " " " " " "*" " " "*" " " " " "*" " " "*"
## 5 ( 1 ) " " " " " " " " "*" "*" " " "*" " " " " "*" " " "*"
## 6 ( 1 ) " " "*" " " " " "*" "*" " " "*" " " " " "*" " " "*"
## 7 ( 1 ) " " "*" " " "*" "*" "*" " " "*" " " " " "*" " " "*"
## 8 ( 1 ) " " "*" " " "*" "*" "*" " " "*" " " " " "*" "*" "*"
## 9 ( 1 ) " " "*" " " " " "*" "*" " " "*" "*" "*" "*" "*" "*"
## 10 ( 1 ) " " "*" " " "*" "*" "*" " " "*" "*" "*" "*" "*" "*"
## 11 ( 1 ) "*" "*" " " "*" "*" "*" " " "*" "*" "*" "*" "*" "*"
## 12 ( 1 ) "*" "*" " " "*" "*" "*" "*" "*" "*" "*" "*" "*" "*"
## 13 ( 1 ) "*" "*" "*" "*" "*" "*" "*" "*" "*" "*" "*" "*" "*"
boston.test.mat <- model.matrix(medv~ ., data = boston.test, nvmax = 13)
val.errors <- rep(NA, 13)
for (i in 1:13) {
coefi <- coef(bsm, id = i)
pred <- boston.test.mat[, names(coefi)] %*% coefi
val.errors[i] <- mean((pred - boston.test$medv)^2)
}
plot(val.errors, xlab = "Number of predictors", ylab = "Test MSE", pch = 19, type = "b",col="orange")

which.min(val.errors)
## [1] 11
coef(bsm,which.min(val.errors))
## (Intercept) crim zn chas nox
## 42.504262409 -0.051977242 0.058240750 2.579296005 -16.716321472
## rm dis rad tax ptratio
## 3.017526315 -1.598507331 0.330049548 -0.013725911 -0.904781765
## black lstat
## 0.008099312 -0.611990159
bsm.MSE<-val.errors[11]
bsm.MSE
## [1] 17.81448
forward selection
nullmodel<-lm(medv ~ 1, data = boston.train)
fullmodel<-lm(medv ~ ., data = boston.train)
fwd.model<-step(nullmodel, scope = list(lower = nullmodel, upper = fullmodel),
direction = "forward")
## Start: AIC=1586.28
## medv ~ 1
##
## Df Sum of Sq RSS AIC
## + lstat 1 16886.9 14204 1311.0
## + rm 1 13875.0 17216 1379.0
## + indus 1 6900.6 24190 1499.4
## + ptratio 1 6829.7 24261 1500.5
## + tax 1 5850.3 25241 1514.5
## + nox 1 5552.7 25538 1518.6
## + crim 1 4592.6 26498 1531.7
## + age 1 4281.3 26810 1535.8
## + zn 1 4209.0 26882 1536.8
## + black 1 3725.3 27366 1543.1
## + rad 1 3716.7 27374 1543.2
## + dis 1 1831.6 29259 1566.8
## + chas 1 1093.5 29997 1575.6
## <none> 31091 1586.3
##
## Step: AIC=1310.96
## medv ~ lstat
##
## Df Sum of Sq RSS AIC
## + rm 1 2395.69 11808 1247.6
## + ptratio 1 1732.25 12472 1266.9
## + dis 1 876.01 13328 1290.4
## + chas 1 517.36 13687 1299.8
## + age 1 225.86 13978 1307.3
## + zn 1 110.67 14093 1310.2
## + tax 1 91.75 14112 1310.7
## <none> 14204 1311.0
## + black 1 77.31 14127 1311.0
## + nox 1 44.17 14160 1311.9
## + crim 1 16.41 14188 1312.5
## + indus 1 10.89 14193 1312.7
## + rad 1 0.00 14204 1313.0
##
## Step: AIC=1247.57
## medv ~ lstat + rm
##
## Df Sum of Sq RSS AIC
## + ptratio 1 1132.61 10676 1213.9
## + dis 1 549.77 11258 1232.7
## + chas 1 387.64 11421 1237.8
## + black 1 222.12 11586 1242.8
## + tax 1 141.87 11666 1245.3
## <none> 11808 1247.6
## + age 1 41.66 11767 1248.3
## + zn 1 36.09 11772 1248.5
## + rad 1 28.87 11779 1248.7
## + crim 1 27.66 11781 1248.7
## + nox 1 8.20 11800 1249.3
## + indus 1 3.04 11805 1249.5
##
## Step: AIC=1213.87
## medv ~ lstat + rm + ptratio
##
## Df Sum of Sq RSS AIC
## + dis 1 582.48 10093 1196.0
## + chas 1 306.74 10369 1205.5
## + black 1 152.15 10524 1210.8
## + rad 1 81.27 10594 1213.2
## + age 1 63.71 10612 1213.8
## <none> 10676 1213.9
## + indus 1 17.59 10658 1215.3
## + zn 1 4.49 10671 1215.7
## + crim 1 0.99 10675 1215.8
## + tax 1 0.02 10676 1215.9
## + nox 1 0.01 10676 1215.9
##
## Step: AIC=1196.01
## medv ~ lstat + rm + ptratio + dis
##
## Df Sum of Sq RSS AIC
## + nox 1 485.83 9607.3 1180.5
## + zn 1 246.86 9846.3 1189.2
## + black 1 220.03 9873.1 1190.2
## + chas 1 189.94 9903.2 1191.3
## + indus 1 167.38 9925.8 1192.1
## + age 1 96.92 9996.3 1194.6
## + tax 1 84.77 10008.4 1195.0
## <none> 10093.2 1196.0
## + crim 1 14.98 10078.2 1197.5
## + rad 1 0.32 10092.9 1198.0
##
## Step: AIC=1180.55
## medv ~ lstat + rm + ptratio + dis + nox
##
## Df Sum of Sq RSS AIC
## + zn 1 263.206 9344.1 1172.7
## + chas 1 235.838 9371.5 1173.8
## + black 1 134.635 9472.7 1177.5
## + rad 1 94.690 9512.7 1179.0
## <none> 9607.3 1180.5
## + age 1 18.917 9588.4 1181.8
## + indus 1 17.286 9590.1 1181.9
## + crim 1 1.982 9605.4 1182.5
## + tax 1 0.423 9606.9 1182.5
##
## Step: AIC=1172.72
## medv ~ lstat + rm + ptratio + dis + nox + zn
##
## Df Sum of Sq RSS AIC
## + chas 1 228.511 9115.6 1166.0
## + black 1 161.277 9182.9 1168.5
## <none> 9344.1 1172.7
## + rad 1 44.435 9299.7 1173.0
## + crim 1 20.334 9323.8 1173.9
## + indus 1 13.701 9330.4 1174.2
## + tax 1 12.329 9331.8 1174.2
## + age 1 7.138 9337.0 1174.5
##
## Step: AIC=1165.95
## medv ~ lstat + rm + ptratio + dis + nox + zn + chas
##
## Df Sum of Sq RSS AIC
## + black 1 129.863 8985.8 1162.9
## <none> 9115.6 1166.0
## + rad 1 46.912 9068.7 1166.1
## + indus 1 18.764 9096.9 1167.2
## + crim 1 12.611 9103.0 1167.5
## + age 1 10.235 9105.4 1167.5
## + tax 1 6.846 9108.8 1167.7
##
## Step: AIC=1162.87
## medv ~ lstat + rm + ptratio + dis + nox + zn + chas + black
##
## Df Sum of Sq RSS AIC
## + rad 1 113.259 8872.5 1160.4
## <none> 8985.8 1162.9
## + age 1 14.476 8971.3 1164.3
## + indus 1 10.915 8974.9 1164.4
## + crim 1 0.376 8985.4 1164.9
## + tax 1 0.161 8985.6 1164.9
##
## Step: AIC=1160.38
## medv ~ lstat + rm + ptratio + dis + nox + zn + chas + black +
## rad
##
## Df Sum of Sq RSS AIC
## + tax 1 226.847 8645.7 1153.2
## <none> 8872.5 1160.4
## + indus 1 25.821 8846.7 1161.3
## + crim 1 20.783 8851.7 1161.5
## + age 1 10.522 8862.0 1162.0
##
## Step: AIC=1153.21
## medv ~ lstat + rm + ptratio + dis + nox + zn + chas + black +
## rad + tax
##
## Df Sum of Sq RSS AIC
## <none> 8645.7 1153.2
## + crim 1 24.7408 8620.9 1154.2
## + age 1 6.6613 8639.0 1154.9
## + indus 1 2.1652 8643.5 1155.1
summary(fwd.model)
##
## Call:
## lm(formula = medv ~ lstat + rm + ptratio + dis + nox + zn + chas +
## black + rad + tax, data = boston.train)
##
## Residuals:
## Min 1Q Median 3Q Max
## -14.5511 -2.7981 -0.5087 1.6696 24.1049
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 41.553676 6.098471 6.814 4.28e-11 ***
## lstat -0.620575 0.058679 -10.576 < 2e-16 ***
## rm 3.068242 0.484744 6.330 7.69e-10 ***
## ptratio -0.899612 0.160989 -5.588 4.69e-08 ***
## dis -1.568736 0.229077 -6.848 3.47e-11 ***
## nox -16.280488 4.309207 -3.778 0.000186 ***
## zn 0.056446 0.016975 3.325 0.000979 ***
## chas 2.628518 1.055356 2.491 0.013224 *
## black 0.008998 0.003350 2.686 0.007590 **
## rad 0.305467 0.083179 3.672 0.000279 ***
## tax -0.013603 0.004534 -3.000 0.002898 **
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 5.021 on 343 degrees of freedom
## Multiple R-squared: 0.7219, Adjusted R-squared: 0.7138
## F-statistic: 89.05 on 10 and 343 DF, p-value: < 2.2e-16
fwd.predict<-predict(fwd.model,boston.test)
fwd.mse<-mean((fwd.predict-boston.test$medv)^2)
fwd.mse
## [1] 18.84049
backward selection
bwd.model<-step(fullmodel,direction = "backward")
## Start: AIC=1157.9
## medv ~ crim + zn + indus + chas + nox + rm + age + dis + rad +
## tax + ptratio + black + lstat
##
## Df Sum of Sq RSS AIC
## - indus 1 1.52 8615.3 1156.0
## - age 1 5.66 8619.4 1156.1
## - crim 1 23.03 8636.8 1156.8
## <none> 8613.8 1157.9
## - black 1 140.52 8754.3 1161.6
## - chas 1 148.15 8761.9 1161.9
## - tax 1 200.73 8814.5 1164.1
## - zn 1 282.73 8896.5 1167.3
## - nox 1 323.10 8936.9 1168.9
## - rad 1 344.22 8958.0 1169.8
## - ptratio 1 781.31 9395.1 1186.6
## - rm 1 962.12 9575.9 1193.4
## - dis 1 1057.32 9671.1 1196.9
## - lstat 1 2403.65 11017.4 1243.0
##
## Step: AIC=1155.97
## medv ~ crim + zn + chas + nox + rm + age + dis + rad + tax +
## ptratio + black + lstat
##
## Df Sum of Sq RSS AIC
## - age 1 5.62 8620.9 1154.2
## - crim 1 23.70 8639.0 1154.9
## <none> 8615.3 1156.0
## - black 1 139.54 8754.8 1159.7
## - chas 1 152.16 8767.5 1160.2
## - tax 1 227.08 8842.4 1163.2
## - zn 1 282.42 8897.7 1165.4
## - nox 1 330.81 8946.1 1167.3
## - rad 1 355.54 8970.8 1168.3
## - ptratio 1 782.19 9397.5 1184.7
## - rm 1 961.95 9577.2 1191.4
## - dis 1 1129.57 9744.9 1197.6
## - lstat 1 2415.87 11031.2 1241.5
##
## Step: AIC=1154.2
## medv ~ crim + zn + chas + nox + rm + dis + rad + tax + ptratio +
## black + lstat
##
## Df Sum of Sq RSS AIC
## - crim 1 24.74 8645.7 1153.2
## <none> 8620.9 1154.2
## - black 1 137.25 8758.2 1157.8
## - chas 1 150.23 8771.1 1158.3
## - tax 1 230.80 8851.7 1161.5
## - zn 1 293.36 8914.3 1164.0
## - rad 1 364.43 8985.4 1166.9
## - nox 1 375.39 8996.3 1167.3
## - ptratio 1 795.33 9416.2 1183.4
## - rm 1 965.97 9586.9 1189.8
## - dis 1 1206.59 9827.5 1198.6
## - lstat 1 2683.27 11304.2 1248.1
##
## Step: AIC=1153.21
## medv ~ zn + chas + nox + rm + dis + rad + tax + ptratio + black +
## lstat
##
## Df Sum of Sq RSS AIC
## <none> 8645.7 1153.2
## - chas 1 156.36 8802.0 1157.6
## - black 1 181.81 8827.5 1158.6
## - tax 1 226.85 8872.5 1160.4
## - zn 1 278.70 8924.4 1162.4
## - rad 1 339.94 8985.6 1164.9
## - nox 1 359.79 9005.4 1165.7
## - ptratio 1 787.09 9432.8 1182.1
## - rm 1 1009.85 9655.5 1190.3
## - dis 1 1182.06 9827.7 1196.6
## - lstat 1 2819.25 11464.9 1251.1
summary(bwd.model)
##
## Call:
## lm(formula = medv ~ zn + chas + nox + rm + dis + rad + tax +
## ptratio + black + lstat, data = boston.train)
##
## Residuals:
## Min 1Q Median 3Q Max
## -14.5511 -2.7981 -0.5087 1.6696 24.1049
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 41.553676 6.098471 6.814 4.28e-11 ***
## zn 0.056446 0.016975 3.325 0.000979 ***
## chas 2.628518 1.055356 2.491 0.013224 *
## nox -16.280488 4.309207 -3.778 0.000186 ***
## rm 3.068242 0.484744 6.330 7.69e-10 ***
## dis -1.568736 0.229077 -6.848 3.47e-11 ***
## rad 0.305467 0.083179 3.672 0.000279 ***
## tax -0.013603 0.004534 -3.000 0.002898 **
## ptratio -0.899612 0.160989 -5.588 4.69e-08 ***
## black 0.008998 0.003350 2.686 0.007590 **
## lstat -0.620575 0.058679 -10.576 < 2e-16 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 5.021 on 343 degrees of freedom
## Multiple R-squared: 0.7219, Adjusted R-squared: 0.7138
## F-statistic: 89.05 on 10 and 343 DF, p-value: < 2.2e-16
bwd.predict<-predict(bwd.model,boston.test)
bwd.mse<-mean((bwd.predict-boston.test$medv)^2)
bwd.mse
## [1] 18.84049
stepwise
step.model<-step(nullmodel, scope = list(lower = nullmodel, upper = fullmodel),
direction = "both")
## Start: AIC=1586.28
## medv ~ 1
##
## Df Sum of Sq RSS AIC
## + lstat 1 16886.9 14204 1311.0
## + rm 1 13875.0 17216 1379.0
## + indus 1 6900.6 24190 1499.4
## + ptratio 1 6829.7 24261 1500.5
## + tax 1 5850.3 25241 1514.5
## + nox 1 5552.7 25538 1518.6
## + crim 1 4592.6 26498 1531.7
## + age 1 4281.3 26810 1535.8
## + zn 1 4209.0 26882 1536.8
## + black 1 3725.3 27366 1543.1
## + rad 1 3716.7 27374 1543.2
## + dis 1 1831.6 29259 1566.8
## + chas 1 1093.5 29997 1575.6
## <none> 31091 1586.3
##
## Step: AIC=1310.96
## medv ~ lstat
##
## Df Sum of Sq RSS AIC
## + rm 1 2395.7 11808 1247.6
## + ptratio 1 1732.3 12472 1266.9
## + dis 1 876.0 13328 1290.4
## + chas 1 517.4 13687 1299.8
## + age 1 225.9 13978 1307.3
## + zn 1 110.7 14093 1310.2
## + tax 1 91.8 14112 1310.7
## <none> 14204 1311.0
## + black 1 77.3 14127 1311.0
## + nox 1 44.2 14160 1311.9
## + crim 1 16.4 14188 1312.5
## + indus 1 10.9 14193 1312.7
## + rad 1 0.0 14204 1313.0
## - lstat 1 16886.9 31091 1586.3
##
## Step: AIC=1247.57
## medv ~ lstat + rm
##
## Df Sum of Sq RSS AIC
## + ptratio 1 1132.6 10676 1213.9
## + dis 1 549.8 11258 1232.7
## + chas 1 387.6 11421 1237.8
## + black 1 222.1 11586 1242.8
## + tax 1 141.9 11666 1245.3
## <none> 11808 1247.6
## + age 1 41.7 11767 1248.3
## + zn 1 36.1 11772 1248.5
## + rad 1 28.9 11779 1248.7
## + crim 1 27.7 11781 1248.7
## + nox 1 8.2 11800 1249.3
## + indus 1 3.0 11805 1249.5
## - rm 1 2395.7 14204 1311.0
## - lstat 1 5407.6 17216 1379.0
##
## Step: AIC=1213.87
## medv ~ lstat + rm + ptratio
##
## Df Sum of Sq RSS AIC
## + dis 1 582.5 10093 1196.0
## + chas 1 306.7 10369 1205.5
## + black 1 152.2 10524 1210.8
## + rad 1 81.3 10594 1213.2
## + age 1 63.7 10612 1213.8
## <none> 10676 1213.9
## + indus 1 17.6 10658 1215.3
## + zn 1 4.5 10671 1215.7
## + crim 1 1.0 10675 1215.8
## + tax 1 0.0 10676 1215.9
## + nox 1 0.0 10676 1215.9
## - ptratio 1 1132.6 11808 1247.6
## - rm 1 1796.1 12472 1266.9
## - lstat 1 4415.1 15091 1334.4
##
## Step: AIC=1196.01
## medv ~ lstat + rm + ptratio + dis
##
## Df Sum of Sq RSS AIC
## + nox 1 485.8 9607.3 1180.5
## + zn 1 246.9 9846.3 1189.2
## + black 1 220.0 9873.1 1190.2
## + chas 1 189.9 9903.2 1191.3
## + indus 1 167.4 9925.8 1192.1
## + age 1 96.9 9996.3 1194.6
## + tax 1 84.8 10008.4 1195.0
## <none> 10093.2 1196.0
## + crim 1 15.0 10078.2 1197.5
## + rad 1 0.3 10092.9 1198.0
## - dis 1 582.5 10675.7 1213.9
## - ptratio 1 1165.3 11258.5 1232.7
## - rm 1 1509.0 11602.2 1243.3
## - lstat 1 4868.4 14961.6 1333.4
##
## Step: AIC=1180.55
## medv ~ lstat + rm + ptratio + dis + nox
##
## Df Sum of Sq RSS AIC
## + zn 1 263.2 9344.1 1172.7
## + chas 1 235.8 9371.5 1173.8
## + black 1 134.6 9472.7 1177.5
## + rad 1 94.7 9512.7 1179.0
## <none> 9607.3 1180.5
## + age 1 18.9 9588.4 1181.8
## + indus 1 17.3 9590.1 1181.9
## + crim 1 2.0 9605.4 1182.5
## + tax 1 0.4 9606.9 1182.5
## - nox 1 485.8 10093.2 1196.0
## - dis 1 1068.3 10675.6 1215.9
## - ptratio 1 1356.9 10964.3 1225.3
## - rm 1 1444.7 11052.1 1228.1
## - lstat 1 3420.8 13028.1 1286.4
##
## Step: AIC=1172.72
## medv ~ lstat + rm + ptratio + dis + nox + zn
##
## Df Sum of Sq RSS AIC
## + chas 1 228.5 9115.6 1166.0
## + black 1 161.3 9182.9 1168.5
## <none> 9344.1 1172.7
## + rad 1 44.4 9299.7 1173.0
## + crim 1 20.3 9323.8 1173.9
## + indus 1 13.7 9330.4 1174.2
## + tax 1 12.3 9331.8 1174.2
## + age 1 7.1 9337.0 1174.5
## - zn 1 263.2 9607.3 1180.5
## - nox 1 502.2 9846.3 1189.2
## - ptratio 1 954.0 10298.1 1205.1
## - rm 1 1215.1 10559.2 1214.0
## - dis 1 1326.0 10670.1 1217.7
## - lstat 1 3512.4 12856.5 1283.7
##
## Step: AIC=1165.95
## medv ~ lstat + rm + ptratio + dis + nox + zn + chas
##
## Df Sum of Sq RSS AIC
## + black 1 129.9 8985.8 1162.9
## <none> 9115.6 1166.0
## + rad 1 46.9 9068.7 1166.1
## + indus 1 18.8 9096.9 1167.2
## + crim 1 12.6 9103.0 1167.5
## + age 1 10.2 9105.4 1167.5
## + tax 1 6.8 9108.8 1167.7
## - chas 1 228.5 9344.1 1172.7
## - zn 1 255.9 9371.5 1173.8
## - nox 1 547.8 9663.4 1184.6
## - ptratio 1 902.1 10017.7 1197.4
## - rm 1 1183.0 10298.7 1207.2
## - dis 1 1240.9 10356.5 1209.1
## - lstat 1 3276.0 12391.6 1272.6
##
## Step: AIC=1162.87
## medv ~ lstat + rm + ptratio + dis + nox + zn + chas + black
##
## Df Sum of Sq RSS AIC
## + rad 1 113.26 8872.5 1160.4
## <none> 8985.8 1162.9
## + age 1 14.48 8971.3 1164.3
## + indus 1 10.92 8974.9 1164.4
## + crim 1 0.38 8985.4 1164.9
## + tax 1 0.16 8985.6 1164.9
## - black 1 129.86 9115.6 1166.0
## - chas 1 197.10 9182.9 1168.5
## - zn 1 279.84 9265.6 1171.7
## - nox 1 454.26 9440.0 1178.3
## - ptratio 1 820.48 9806.2 1191.8
## - dis 1 1252.80 10238.6 1207.1
## - rm 1 1260.63 10246.4 1207.3
## - lstat 1 2823.02 11808.8 1257.6
##
## Step: AIC=1160.38
## medv ~ lstat + rm + ptratio + dis + nox + zn + chas + black +
## rad
##
## Df Sum of Sq RSS AIC
## + tax 1 226.85 8645.7 1153.2
## <none> 8872.5 1160.4
## + indus 1 25.82 8846.7 1161.3
## + crim 1 20.78 8851.7 1161.5
## + age 1 10.52 8862.0 1162.0
## - rad 1 113.26 8985.8 1162.9
## - chas 1 192.69 9065.2 1166.0
## - black 1 196.21 9068.7 1166.1
## - zn 1 209.06 9081.6 1166.6
## - nox 1 562.12 9434.6 1180.1
## - ptratio 1 904.86 9777.4 1192.8
## - dis 1 1144.82 10017.3 1201.3
## - rm 1 1151.98 10024.5 1201.6
## - lstat 1 2850.03 11722.5 1257.0
##
## Step: AIC=1153.21
## medv ~ lstat + rm + ptratio + dis + nox + zn + chas + black +
## rad + tax
##
## Df Sum of Sq RSS AIC
## <none> 8645.7 1153.2
## + crim 1 24.74 8620.9 1154.2
## + age 1 6.66 8639.0 1154.9
## + indus 1 2.17 8643.5 1155.1
## - chas 1 156.36 8802.0 1157.6
## - black 1 181.81 8827.5 1158.6
## - tax 1 226.85 8872.5 1160.4
## - zn 1 278.70 8924.4 1162.4
## - rad 1 339.94 8985.6 1164.9
## - nox 1 359.79 9005.4 1165.7
## - ptratio 1 787.09 9432.8 1182.1
## - rm 1 1009.85 9655.5 1190.3
## - dis 1 1182.06 9827.7 1196.6
## - lstat 1 2819.25 11464.9 1251.1
summary(step.model)
##
## Call:
## lm(formula = medv ~ lstat + rm + ptratio + dis + nox + zn + chas +
## black + rad + tax, data = boston.train)
##
## Residuals:
## Min 1Q Median 3Q Max
## -14.5511 -2.7981 -0.5087 1.6696 24.1049
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 41.553676 6.098471 6.814 4.28e-11 ***
## lstat -0.620575 0.058679 -10.576 < 2e-16 ***
## rm 3.068242 0.484744 6.330 7.69e-10 ***
## ptratio -0.899612 0.160989 -5.588 4.69e-08 ***
## dis -1.568736 0.229077 -6.848 3.47e-11 ***
## nox -16.280488 4.309207 -3.778 0.000186 ***
## zn 0.056446 0.016975 3.325 0.000979 ***
## chas 2.628518 1.055356 2.491 0.013224 *
## black 0.008998 0.003350 2.686 0.007590 **
## rad 0.305467 0.083179 3.672 0.000279 ***
## tax -0.013603 0.004534 -3.000 0.002898 **
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 5.021 on 343 degrees of freedom
## Multiple R-squared: 0.7219, Adjusted R-squared: 0.7138
## F-statistic: 89.05 on 10 and 343 DF, p-value: < 2.2e-16
step.predict<-predict(step.model,boston.test)
step.mse<-mean((step.predict-boston.test$medv)^2)
step.mse
## [1] 18.84049
ridge regression
bostontrain.mat<-model.matrix(medv~.,data=boston.train)
bostontest.mat<-model.matrix(medv~.,data=boston.test)
grid<-10^seq(4,-2,length=100)
library(glmnet)
ridge.model<-glmnet(bostontrain.mat,boston.train$medv,alpha=0,lambda=grid)
boston.cv.ridge<-cv.glmnet(bostontrain.mat,boston.train$medv,alpha=0,lambda=grid)
plot(boston.cv.ridge)

boston.bestlam.ridge<-boston.cv.ridge$lambda.min
boston.bestlam.ridge
## [1] 0.1417474
ridge.model.1<-glmnet(bostontrain.mat,boston.train$medv,alpha=0,lambda=boston.bestlam.ridge)
coef(ridge.model.1)
## 15 x 1 sparse Matrix of class "dgCMatrix"
## s0
## (Intercept) 39.726176503
## (Intercept) .
## crim -0.043885158
## zn 0.052687856
## indus -0.005063095
## chas 2.692011518
## nox -15.182853995
## rm 3.177687894
## age -0.008481379
## dis -1.529508653
## rad 0.269963547
## tax -0.010966192
## ptratio -0.881138929
## black 0.008225794
## lstat -0.588333809
boston.pred.newridge<-predict(ridge.model,s=boston.bestlam.ridge,newx=bostontest.mat)
ridge.MSE<-mean((boston.test$medv-boston.pred.newridge)^2)
ridge.MSE
## [1] 17.74425
I performed ridge regression for a range of values of lambda and find MSE for training set. The value of lambda corresponding to minimum MSE is used to build final model and for prediction. The value of lambda is 0.1417474 for which we got minimum train set MSE
lasso regression
lasso.model<-glmnet(bostontrain.mat,boston.train$medv,alpha=0,lambda=grid)
boston.cv.lasso<-cv.glmnet(bostontrain.mat,boston.train$medv,alpha=0,lambda=grid)
plot(boston.cv.lasso)

boston.bestlam.lasso<-boston.cv.lasso$lambda.min
boston.bestlam.lasso
## [1] 0.1232847
lasso.model.1<-glmnet(bostontrain.mat,boston.train$medv,alpha=0,lambda=boston.bestlam.lasso)
coef(lasso.model.1)
## 15 x 1 sparse Matrix of class "dgCMatrix"
## s0
## (Intercept) 40.033394715
## (Intercept) .
## crim -0.044466644
## zn 0.053252258
## indus -0.002505365
## chas 2.680304197
## nox -15.340063745
## rm 3.166768913
## age -0.008441933
## dis -1.541098565
## rad 0.276257251
## tax -0.011281348
## ptratio -0.883796393
## black 0.008225935
## lstat -0.590324265
boston.pred.newlasso<-predict(lasso.model,s=boston.bestlam.lasso,newx=bostontest.mat)
lasso.MSE<-mean((boston.test$medv-boston.pred.newlasso)^2)
lasso.MSE
## [1] 17.76477
I performed lasso regression for a range of values of lambda and find MSE for training set. The value of lambda corresponding to minimum MSE is used to build final model and for prediction The value of lambda is 0.1232847 for which we got minimum train set MSE.
Principal Component Regression (PCR)
library(pls)
pcr.model<-pcr(medv~.,data=boston.train,scale=TRUE,validation="CV")
summary(pcr.model)
## Data: X dimension: 354 13
## Y dimension: 354 1
## Fit method: svdpc
## Number of components considered: 13
##
## VALIDATION: RMSEP
## Cross-validated using 10 random segments.
## (Intercept) 1 comps 2 comps 3 comps 4 comps 5 comps 6 comps
## CV 9.398 7.599 7.437 5.937 5.957 5.576 5.598
## adjCV 9.398 7.597 7.434 5.932 5.952 5.567 5.591
## 7 comps 8 comps 9 comps 10 comps 11 comps 12 comps 13 comps
## CV 5.575 5.477 5.447 5.428 5.452 5.253 5.186
## adjCV 5.565 5.467 5.444 5.418 5.445 5.240 5.173
##
## TRAINING: % variance explained
## 1 comps 2 comps 3 comps 4 comps 5 comps 6 comps 7 comps
## X 47.68 59.12 68.52 75.68 82.00 86.70 90.14
## medv 34.95 39.63 60.77 60.94 65.86 65.96 66.42
## 8 comps 9 comps 10 comps 11 comps 12 comps 13 comps
## X 93.15 95.05 96.84 98.28 99.58 100.00
## medv 67.91 68.10 69.03 69.15 71.49 72.29
validationplot(pcr.model,val.type="MSEP")

pcr.model.5comp<-pcr(medv~.,data=boston.train,scale=TRUE,ncomp=5)
summary(pcr.model.5comp)
## Data: X dimension: 354 13
## Y dimension: 354 1
## Fit method: svdpc
## Number of components considered: 5
## TRAINING: % variance explained
## 1 comps 2 comps 3 comps 4 comps 5 comps
## X 47.68 59.12 68.52 75.68 82.00
## medv 34.95 39.63 60.77 60.94 65.86
pcr.model.5comp$coefficients
## , , 1 comps
##
## medv
## crim -0.6063915501
## zn 0.5653092042
## indus -0.7708923694
## chas -0.0006292165
## nox -0.7499003656
## rm 0.4304083600
## age -0.6915127075
## dis 0.7132887312
## rad -0.7111004067
## tax -0.7475540656
## ptratio -0.4139434089
## black 0.4877422872
## lstat -0.6904890541
##
## , , 2 comps
##
## medv
## crim -1.16076039
## zn 0.04235492
## indus -0.56193240
## chas 0.63284060
## nox -0.34381707
## rm 0.58457968
## age -0.17407242
## dis 0.12326631
## rad -1.17923658
## tax -1.15681336
## ptratio -0.97561493
## black 0.97449535
## lstat -0.73255090
##
## , , 3 comps
##
## medv
## crim -0.40885303
## zn 1.14506849
## indus -0.57361682
## chas 2.11629628
## nox 0.07154186
## rm 2.92971918
## age -0.12664889
## dis -0.12661619
## rad -0.06747743
## tax -0.27611708
## ptratio -2.13668603
## black 0.02081913
## lstat -1.84931819
##
## , , 4 comps
##
## medv
## crim -0.446493901
## zn 1.082229500
## indus -0.560066308
## chas 2.339542181
## nox 0.002820896
## rm 2.929573203
## age -0.173545861
## dis -0.095552865
## rad 0.012642142
## tax -0.218882977
## ptratio -1.892652735
## black 0.159432982
## lstat -1.922300636
##
## , , 5 comps
##
## medv
## crim -0.79069410
## zn 0.31377021
## indus -0.53227588
## chas 0.97980565
## nox -0.07258062
## rm 4.11921228
## age 0.08861939
## dis -0.53232412
## rad 0.24973101
## tax -0.07788551
## ptratio -1.33947559
## black 0.49875496
## lstat -2.65801809
boston.predict.pcr<-predict(pcr.model,boston.test,ncomp=5)
pcr.MSE<-mean((boston.test$medv-boston.predict.pcr)^2)
pcr.MSE
## [1] 15.96318
I performed Principal Component Regression and see summary for 13 components plot of MSE vs Number of Components suggests that minimum MSE is obtained for 13 components.This means that PCR is not helping in dimension reduction from p+1 to m+1 I select 5 as number of components in final model as 5 components explain about 80 % of variation Scores of the components are also summarized. Scores are further used to fit the Principal Component Model
PLS
library(pls)
plsr.model<-plsr(medv~.,data=boston.train,scale=TRUE,validation="CV")
validationplot(plsr.model,val.type="MSEP")

plsr.model.5comp<-plsr(medv~.,data=boston.train,scale=TRUE,ncomp=5)
summary(plsr.model.5comp)
## Data: X dimension: 354 13
## Y dimension: 354 1
## Fit method: kernelpls
## Number of components considered: 5
## TRAINING: % variance explained
## 1 comps 2 comps 3 comps 4 comps 5 comps
## X 46.42 57.12 64.14 70.68 76.34
## medv 46.90 67.90 70.21 71.38 71.87
plsr.model.5comp$coefficients
## , , 1 comps
##
## medv
## crim -0.6629002
## zn 0.6346088
## indus -0.8125719
## chas 0.3234698
## nox -0.7289083
## rm 1.1522209
## age -0.6400400
## dis 0.4186334
## rad -0.5963464
## tax -0.7481862
## ptratio -0.8083894
## black 0.5970368
## lstat -1.2711419
##
## , , 2 comps
##
## medv
## crim -0.42482542
## zn 0.52607570
## indus -0.47112526
## chas 1.40382834
## nox -0.24208308
## rm 3.48196710
## age -0.04543861
## dis -1.18593277
## rad 0.35129618
## tax -0.25239724
## ptratio -1.93617630
## black 0.69452051
## lstat -3.03566596
##
## , , 3 comps
##
## medv
## crim 0.0162841
## zn 0.5614550
## indus -0.4975755
## chas 0.5868014
## nox -0.9170569
## rm 3.3403655
## age -0.3678324
## dis -1.9690256
## rad 0.9934994
## tax -0.1655670
## ptratio -1.6703181
## black 0.8684079
## lstat -4.0099664
##
## , , 4 comps
##
## medv
## crim -0.0137188
## zn 0.5497794
## indus -0.3653050
## chas 0.5499696
## nox -1.3785074
## rm 2.5871245
## age -0.3460459
## dis -3.0540460
## rad 1.1149044
## tax -0.6409048
## ptratio -1.9283887
## black 1.1417590
## lstat -4.5137243
##
## , , 5 comps
##
## medv
## crim 0.04611646
## zn 1.10369104
## indus -0.31643319
## chas 0.79412630
## nox -1.76929120
## rm 2.19271829
## age -0.40901868
## dis -3.49536648
## rad 1.46882099
## tax -0.87414555
## ptratio -2.08221751
## black 0.81717250
## lstat -4.63089016
boston.predict.plsr<-predict(plsr.model,boston.test,ncomp=5)
plsr.MSE<-mean((boston.test$medv-boston.predict.plsr)^2)
plsr.MSE
## [1] 19.42082
(c) Does your chosen model involve all of the features in the data set? Why or why not?
# The PCR model contains 5 components which comprises of all the 13 predictor variables. The weightage of each predictor variable for every component is given by scores
#The Ridge regression model also contains all the predictor variables
#This suggest that all the predictor variables are contributing in predicting the response variable
The End