Assignment #5
Chapter 06 (page 259): 2, 9, 11
2. For parts (a) through (c), indicate which of i. through iv. is correct. Justify your answer.
(a) The lasso, relative to least squares, is:
iii. Less flexible and hence will give improved prediction accuracy when its increase in bias is less than its decrease invariance.
Since it shrinks coefficients, lasso, like ridge regression, trades decreased variance for increased bias. As long as the increase in bias does not outpace the decrease in variance, accuracy will be improved.
(b) Repeat (a) for ridge regression relative to least squares.
iii. Less flexible and hence will give improved prediction accuracy when its increase in bias is less than its decrease invariance.
Since it shrinks coefficients, ridge regression trades decreased variance for increased bias. As long as the increase in bias does not outpace the decrease in variance, accuracy will be improved.
(c) Repeat (a) for non-linear methods relative to least squares.
ii. More flexible and hence will give improved prediction accuracy when its increase in variance is less than its decreasein bias.
Non-linear methods are more flexible than OLS, and will follow the observations more closely than a model fit with least squares. The non-linear methods will result in a more accurate model than OLS when the relationship between the dependant and independant variables are non-linear.
9. In this exercise, we will predict the number of applications received using the other variables in the College data set.
library(ISLR)
## Warning: package 'ISLR' was built under R version 3.6.3
attach(College)
set.seed(1)
train=sample(1:nrow(College),nrow(College)/2)
train.college = College[train,]
test=(-train)
test.college = College[test,]
Apps.test=Apps[test]
(b) Fit a linear model using least squares on the training set, and report the test error obtained.
lm.fit = lm(Apps~., data=train.college)
lm.pred = predict(lm.fit, test.college)
mean((lm.pred-Apps.test)^2)
## [1] 1135758
After fitting a linear model on the training subset of College, the test error is 1135758.
(c) Fit a ridge regression model on the training set, with λ chosen by cross-validation. Report the test error obtained.
library(glmnet)
## Warning: package 'glmnet' was built under R version 3.6.3
## Loading required package: Matrix
## Loaded glmnet 4.0-2
set.seed(1)
x=model.matrix(Apps~.,College[,-Apps])
y=College$Apps
y.test=y[test]
train=sample(1:nrow(x), nrow(x)/2)
test=(-train)
grid=10^seq(10,-2,length=100)
ridge.mod = glmnet(x,y,lambda=grid, alpha=0)
cv.out=cv.glmnet(x[train,],y[train],alpha=0)
plot(cv.out)
bestlam=cv.out$lambda.min
ridge.pred=predict(ridge.mod,s=bestlam,newx=x[test,])
mean((ridge.pred-y.test)^2)
## [1] 906029.4
The MSE of a ridge regression fit using \(\lambda\) selected by cross-validation is 906029.4.
(d) Fit a lasso model on the training set, with λ chosen by cross-validation. Report the test error obtained, along with the number of non-zero coefficient estimates.
set.seed(1)
x=model.matrix(Apps~.,College[,-Apps])
y=College$Apps
y.test=y[test]
train=sample(1:nrow(x), nrow(x)/2)
test=(-train)
lasso.mod=glmnet(x[train,],y[train],alpha=1,lambda=grid)
plot(lasso.mod)
## Warning in regularize.values(x, y, ties, missing(ties)): collapsing to
## unique 'x' values
cv.out=cv.glmnet(x[train,],y[train],alpha=1)
plot(cv.out)
bestlam=cv.out$lambda.min
lasso.pred=predict(lasso.mod,s=bestlam,newx=x[test,])
mean((lasso.pred-y.test)^2)
## [1] 1116252
out=glmnet(x,y,alpha=1,lambda =grid)
lasso.coef=predict(out,type='coefficients',s=bestlam)[1:19,]
lasso.coef
## (Intercept) (Intercept) PrivateYes Accept Enroll
## -468.96036214 0.00000000 -491.47351867 1.57117027 -0.76858284
## Top10perc Top25perc F.Undergrad P.Undergrad Outstate
## 48.25732969 -12.94716203 0.04293538 0.04406960 -0.08340724
## Room.Board Books Personal PhD Terminal
## 0.14963683 0.01549548 0.02915128 -8.42538012 -3.26671319
## S.F.Ratio perc.alumni Expend Grad.Rate
## 14.61167079 -0.02612797 0.07718333 8.31579869
lasso.coef[lasso.coef!=0]
## (Intercept) PrivateYes Accept Enroll Top10perc
## -468.96036214 -491.47351867 1.57117027 -0.76858284 48.25732969
## Top25perc F.Undergrad P.Undergrad Outstate Room.Board
## -12.94716203 0.04293538 0.04406960 -0.08340724 0.14963683
## Books Personal PhD Terminal S.F.Ratio
## 0.01549548 0.02915128 -8.42538012 -3.26671319 14.61167079
## perc.alumni Expend Grad.Rate
## -0.02612797 0.07718333 8.31579869
The MSE of a lasso model fit using \(\lambda\) selected by cross-validation is 1116252. There are 17 coefficients with non-zero estimates.
(e) Fit a PCR model on the training set, with M chosen by cross-validation. Report the test error obtained, along with the value of M selected by cross-validation.
set.seed(1)
library(pls)
## Warning: package 'pls' was built under R version 3.6.3
##
## Attaching package: 'pls'
## The following object is masked from 'package:stats':
##
## loadings
pcr.fit = pcr(Apps~., data=train.college, Scale=TRUE, validation='CV')
summary(pcr.fit)
## Data: X dimension: 388 17
## Y dimension: 388 1
## Fit method: svdpc
## Number of components considered: 17
##
## VALIDATION: RMSEP
## Cross-validated using 10 random segments.
## (Intercept) 1 comps 2 comps 3 comps 4 comps 5 comps 6 comps
## CV 4288 4171 2237 2162 2018 1444 1440
## adjCV 4288 4168 2231 2157 2016 1429 1427
## 7 comps 8 comps 9 comps 10 comps 11 comps 12 comps 13 comps
## CV 1425 1404 1407 1359 1326 1327 1339
## adjCV 1413 1391 1394 1347 1314 1314 1326
## 14 comps 15 comps 16 comps 17 comps
## CV 1335 1269 1269 1270
## adjCV 1322 1257 1256 1257
##
## TRAINING: % variance explained
## 1 comps 2 comps 3 comps 4 comps 5 comps 6 comps 7 comps
## X 51.812 87.28 95.83 97.47 98.79 99.45 99.94
## Apps 5.672 75.44 77.29 85.81 91.84 91.84 91.86
## 8 comps 9 comps 10 comps 11 comps 12 comps 13 comps 14 comps
## X 99.98 100.00 100.00 100.00 100.00 100.00 100.00
## Apps 92.16 92.18 92.68 93.05 93.06 93.07 93.15
## 15 comps 16 comps 17 comps
## X 100.00 100.00 100.00
## Apps 93.82 93.85 93.89
validationplot(pcr.fit,val.type="MSEP")
pcr.pred=predict(pcr.fit,test.college,ncomp =7)
mean((pcr.pred-Apps.test)^2)
## [1] 1230956
Using a PCR model, the MSE is 1230956.
(f) Fit a PLS model on the training set, with M chosen by cross-validation. Report the test error obtained, along with the value of M selected by cross-validation.
pls.fit=plsr(Apps~.,data=train.college,scale=TRUE,validation ="CV")
summary(pls.fit)
## Data: X dimension: 388 17
## Y dimension: 388 1
## Fit method: kernelpls
## Number of components considered: 17
##
## VALIDATION: RMSEP
## Cross-validated using 10 random segments.
## (Intercept) 1 comps 2 comps 3 comps 4 comps 5 comps 6 comps
## CV 4288 2213 1990 1733 1610 1381 1298
## adjCV 4288 2207 1985 1722 1580 1369 1285
## 7 comps 8 comps 9 comps 10 comps 11 comps 12 comps 13 comps
## CV 1279 1268 1254 1250 1241 1239 1238
## adjCV 1267 1256 1243 1239 1231 1229 1227
## 14 comps 15 comps 16 comps 17 comps
## CV 1236 1236 1236 1236
## adjCV 1225 1225 1225 1225
##
## TRAINING: % variance explained
## 1 comps 2 comps 3 comps 4 comps 5 comps 6 comps 7 comps
## X 27.21 50.73 63.06 65.52 70.20 74.20 78.62
## Apps 75.39 81.24 86.97 91.14 92.62 93.43 93.56
## 8 comps 9 comps 10 comps 11 comps 12 comps 13 comps 14 comps
## X 80.81 83.29 87.17 89.15 91.37 92.58 94.42
## Apps 93.68 93.76 93.79 93.83 93.86 93.88 93.89
## 15 comps 16 comps 17 comps
## X 96.98 98.78 100.00
## Apps 93.89 93.89 93.89
validationplot(pls.fit,val.type="MSEP")
pls.pred=predict(pls.fit,test.college,ncomp =7)
mean((pls.pred -Apps.test)^2)
## [1] 1095359
Using a PLS model, the MSE is 1095359.
(g) Comment on the results obtained. How accurately can we predict the number of college applications received? Is there much difference among the test errors resulting from these five approaches?
The above models created with different methods all had relatively similar MSE ranging between ~910000 and ~1200000. The ridge regression resulted in the lowest MSE with 906029.4, while PCR was the worst fit model with an MSE of 1230956.
11. We will now try to predict per capita crime rate in the Boston dataset.
library(MASS)
library(leaps)
## Warning: package 'leaps' was built under R version 3.6.3
attach(Boston)
set.seed(1)
train=sample(1:nrow(Boston), nrow(Boston)/2)
train.boston=Boston[train,]
test=(-train)
test.boston=Boston[test,]
crim.test=Boston$crim[test]
x.mat.train=model.matrix(crim~., data=train.boston)[,-crim]
x.mat.train=model.matrix(crim~., data=test.boston)[,-crim]
(a) Try out some of the regression methods explored in this chapter, such as best subset selection, the lasso, ridge regression, and PCR. Present and discuss results for the approaches that you consider.
# Least Squares - all variables
lm.full=lm(crim~.,data=train.boston)
lm.predict=predict(lm.full,test.boston)
lm.MSE=mean((lm.predict-crim.test)^2)
lm.MSE # 41.54639
## [1] 41.54639
# subset
library(leaps)
regfit.best=regsubsets(crim~.,data=Boston[train,],nvmax=13)
test.mat=model.matrix(crim~.,data=Boston[test,])
val.errors=rep(NA,13)
for(i in 1:13){
coefi=coef(regfit.best,id=i)
pred=test.mat[,names(coefi)]%*%coefi
val.errors[i]=mean((Boston$crim[test]-pred)^2)
}
val.errors
## [1] 40.14557 41.87706 42.40901 42.45745 43.03836 42.13258 41.25016
## [8] 41.28957 41.49271 41.50577 41.48839 41.46692 41.54639
which.min(val.errors) # 1 Variable - MSE=40.14557
## [1] 1
sub.mse=val.errors[1]
#ridge
library(glmnet)
set.seed(1)
x=model.matrix(crim~.,Boston)[,-1]
y=Boston$crim
y.test=y[test]
train=sample(1:nrow(x), nrow(x)/2)
test=(-train)
grid=10^seq(10,-2,length=100)
ridge.mod = glmnet(x,y,lambda=grid, alpha=0)
cv.out=cv.glmnet(x[train,],y[train],alpha=0)
plot(cv.out)
bestlam=cv.out$lambda.min
ridge.pred=predict(ridge.mod,s=bestlam,newx=x[test,])
ridge.mse = mean((ridge.pred-y.test)^2)
ridge.mse #39.67203
## [1] 38.01174
#lasso
train=sample(1:nrow(x), nrow(x)/2)
test=(-train)
grid=10^seq(10,-2,length=100)
lasso.mod = glmnet(x,y,lambda=grid, alpha=1)
cv.out=cv.glmnet(x[train,],y[train],alpha=1)
plot(cv.out)
bestlam=cv.out$lambda.min
lasso.pred=predict(lasso.mod,s=bestlam,newx=x[test,])
lasso.mse = mean((lasso.pred-y.test)^2)
lasso.mse #38.05746
## [1] 39.67203
#PCR
set.seed(1)
library(pls)
pcr.fit = pcr(crim~., data=train.boston, Scale=TRUE, validation='CV')
summary(pcr.fit)
## Data: X dimension: 253 13
## Y dimension: 253 1
## Fit method: svdpc
## Number of components considered: 13
##
## VALIDATION: RMSEP
## Cross-validated using 10 random segments.
## (Intercept) 1 comps 2 comps 3 comps 4 comps 5 comps 6 comps
## CV 9.275 7.305 7.267 7.254 7.234 7.130 7.115
## adjCV 9.275 7.300 7.259 7.246 7.226 7.121 7.107
## 7 comps 8 comps 9 comps 10 comps 11 comps 12 comps 13 comps
## CV 6.795 6.763 6.743 6.694 6.694 6.696 6.700
## adjCV 6.782 6.753 6.732 6.682 6.682 6.684 6.686
##
## TRAINING: % variance explained
## 1 comps 2 comps 3 comps 4 comps 5 comps 6 comps 7 comps
## X 81.27 97.18 99.08 99.72 99.89 99.93 99.97
## crim 39.26 40.61 40.87 41.25 43.24 43.95 48.88
## 8 comps 9 comps 10 comps 11 comps 12 comps 13 comps
## X 99.99 100.00 100.00 100.00 100.00 100.00
## crim 49.34 49.94 50.86 50.93 50.98 51.37
validationplot(pcr.fit,val.type="MSEP")
pcr.pred=predict(pcr.fit,test.boston,ncomp =7)
pcr.mse=mean((pcr.pred-y.test)^2) #43.48068
pcr.mse #43.48068
## [1] 43.48068
Simple least squares, best subset, lasso, ridge regression, and PCR were all used to fit and test models. Test MSE were similar for all fit models, but the lasso model appears to have the lowest test MSE.
(b) Propose a model (or set of models) that seem to perform well on this data set, and justify your answer. Make sure that you are evaluating model performance using validation set error, cross-validation, or some other reasonable alternative, as opposed to using training error.
out=glmnet(x,y,alpha=1,lambda =grid)
lasso.coef=predict(out,type='coefficient',s=bestlam)[1:14,]
lasso.coef
## (Intercept) zn indus chas nox
## 14.476496146 0.039476650 -0.070676946 -0.643189309 -8.433633678
## rm age dis rad tax
## 0.323327442 0.000000000 -0.877313853 0.539447859 -0.001159816
## ptratio black lstat medv
## -0.224731922 -0.007537653 0.126684807 -0.175267556
The proposed model fit with the lasso method is: \(crim = 14.0596 + .0386zn - .0718indus -.6269chas - 8.1333nox + .30354rm -.8579dis + .5316rad-.0007tax - .2171ptratio - .0075black + .1264lstat - .1713medv\). This model performed the best of any of the fit models based on test MSE.
(c) Does your chosen model involve all of the features in the dataset? Why or why not?
This model does not include all of the features in the dataset. The coefficient for age became 0 in the model