Assignment #5

Chapter 06 (page 259): 2, 9, 11

#2

2. For parts (a) through (c), indicate which of i. through iv. is correct. Justify your answer.
(a) The lasso, relative to least squares, is:
iii. Less flexible and hence will give improved prediction accuracy when its increase in bias is less than its decrease invariance.

Since it shrinks coefficients, lasso, like ridge regression, trades decreased variance for increased bias. As long as the increase in bias does not outpace the decrease in variance, accuracy will be improved.

(b) Repeat (a) for ridge regression relative to least squares.
iii. Less flexible and hence will give improved prediction accuracy when its increase in bias is less than its decrease invariance.

Since it shrinks coefficients, ridge regression trades decreased variance for increased bias. As long as the increase in bias does not outpace the decrease in variance, accuracy will be improved.

(c) Repeat (a) for non-linear methods relative to least squares.
ii. More flexible and hence will give improved prediction accuracy when its increase in variance is less than its decreasein bias.

Non-linear methods are more flexible than OLS, and will follow the observations more closely than a model fit with least squares. The non-linear methods will result in a more accurate model than OLS when the relationship between the dependant and independant variables are non-linear.

#9

9. In this exercise, we will predict the number of applications received using the other variables in the College data set.

library(ISLR)
## Warning: package 'ISLR' was built under R version 3.6.3
attach(College)
  1. Split the data ** set into a training set and a test set.**
set.seed(1)
train=sample(1:nrow(College),nrow(College)/2)
train.college = College[train,]
test=(-train)
test.college = College[test,]
Apps.test=Apps[test]

(b) Fit a linear model using least squares on the training set, and report the test error obtained.

lm.fit = lm(Apps~., data=train.college)
lm.pred = predict(lm.fit, test.college)
mean((lm.pred-Apps.test)^2)
## [1] 1135758

After fitting a linear model on the training subset of College, the test error is 1135758.

(c) Fit a ridge regression model on the training set, with λ chosen by cross-validation. Report the test error obtained.

library(glmnet)
## Warning: package 'glmnet' was built under R version 3.6.3
## Loading required package: Matrix
## Loaded glmnet 4.0-2
set.seed(1)
x=model.matrix(Apps~.,College[,-Apps])
y=College$Apps
y.test=y[test]
train=sample(1:nrow(x), nrow(x)/2)
test=(-train)
grid=10^seq(10,-2,length=100)
ridge.mod = glmnet(x,y,lambda=grid, alpha=0)
cv.out=cv.glmnet(x[train,],y[train],alpha=0)
plot(cv.out)

bestlam=cv.out$lambda.min
ridge.pred=predict(ridge.mod,s=bestlam,newx=x[test,])
mean((ridge.pred-y.test)^2)
## [1] 906029.4

The MSE of a ridge regression fit using \(\lambda\) selected by cross-validation is 906029.4.

(d) Fit a lasso model on the training set, with λ chosen by cross-validation. Report the test error obtained, along with the number of non-zero coefficient estimates.

set.seed(1)
x=model.matrix(Apps~.,College[,-Apps])
y=College$Apps
y.test=y[test]
train=sample(1:nrow(x), nrow(x)/2)
test=(-train)
lasso.mod=glmnet(x[train,],y[train],alpha=1,lambda=grid)
plot(lasso.mod)
## Warning in regularize.values(x, y, ties, missing(ties)): collapsing to
## unique 'x' values

cv.out=cv.glmnet(x[train,],y[train],alpha=1)
plot(cv.out)

bestlam=cv.out$lambda.min
lasso.pred=predict(lasso.mod,s=bestlam,newx=x[test,])
mean((lasso.pred-y.test)^2)
## [1] 1116252
out=glmnet(x,y,alpha=1,lambda =grid)
lasso.coef=predict(out,type='coefficients',s=bestlam)[1:19,]
lasso.coef
##   (Intercept)   (Intercept)    PrivateYes        Accept        Enroll 
## -468.96036214    0.00000000 -491.47351867    1.57117027   -0.76858284 
##     Top10perc     Top25perc   F.Undergrad   P.Undergrad      Outstate 
##   48.25732969  -12.94716203    0.04293538    0.04406960   -0.08340724 
##    Room.Board         Books      Personal           PhD      Terminal 
##    0.14963683    0.01549548    0.02915128   -8.42538012   -3.26671319 
##     S.F.Ratio   perc.alumni        Expend     Grad.Rate 
##   14.61167079   -0.02612797    0.07718333    8.31579869
lasso.coef[lasso.coef!=0]
##   (Intercept)    PrivateYes        Accept        Enroll     Top10perc 
## -468.96036214 -491.47351867    1.57117027   -0.76858284   48.25732969 
##     Top25perc   F.Undergrad   P.Undergrad      Outstate    Room.Board 
##  -12.94716203    0.04293538    0.04406960   -0.08340724    0.14963683 
##         Books      Personal           PhD      Terminal     S.F.Ratio 
##    0.01549548    0.02915128   -8.42538012   -3.26671319   14.61167079 
##   perc.alumni        Expend     Grad.Rate 
##   -0.02612797    0.07718333    8.31579869

The MSE of a lasso model fit using \(\lambda\) selected by cross-validation is 1116252. There are 17 coefficients with non-zero estimates.

(e) Fit a PCR model on the training set, with M chosen by cross-validation. Report the test error obtained, along with the value of M selected by cross-validation.

set.seed(1)
library(pls)
## Warning: package 'pls' was built under R version 3.6.3
## 
## Attaching package: 'pls'
## The following object is masked from 'package:stats':
## 
##     loadings
pcr.fit = pcr(Apps~., data=train.college, Scale=TRUE, validation='CV')
summary(pcr.fit)
## Data:    X dimension: 388 17 
##  Y dimension: 388 1
## Fit method: svdpc
## Number of components considered: 17
## 
## VALIDATION: RMSEP
## Cross-validated using 10 random segments.
##        (Intercept)  1 comps  2 comps  3 comps  4 comps  5 comps  6 comps
## CV            4288     4171     2237     2162     2018     1444     1440
## adjCV         4288     4168     2231     2157     2016     1429     1427
##        7 comps  8 comps  9 comps  10 comps  11 comps  12 comps  13 comps
## CV        1425     1404     1407      1359      1326      1327      1339
## adjCV     1413     1391     1394      1347      1314      1314      1326
##        14 comps  15 comps  16 comps  17 comps
## CV         1335      1269      1269      1270
## adjCV      1322      1257      1256      1257
## 
## TRAINING: % variance explained
##       1 comps  2 comps  3 comps  4 comps  5 comps  6 comps  7 comps
## X      51.812    87.28    95.83    97.47    98.79    99.45    99.94
## Apps    5.672    75.44    77.29    85.81    91.84    91.84    91.86
##       8 comps  9 comps  10 comps  11 comps  12 comps  13 comps  14 comps
## X       99.98   100.00    100.00    100.00    100.00    100.00    100.00
## Apps    92.16    92.18     92.68     93.05     93.06     93.07     93.15
##       15 comps  16 comps  17 comps
## X       100.00    100.00    100.00
## Apps     93.82     93.85     93.89
validationplot(pcr.fit,val.type="MSEP")

pcr.pred=predict(pcr.fit,test.college,ncomp =7)
mean((pcr.pred-Apps.test)^2)
## [1] 1230956

Using a PCR model, the MSE is 1230956.

(f) Fit a PLS model on the training set, with M chosen by cross-validation. Report the test error obtained, along with the value of M selected by cross-validation.

pls.fit=plsr(Apps~.,data=train.college,scale=TRUE,validation ="CV")
summary(pls.fit)
## Data:    X dimension: 388 17 
##  Y dimension: 388 1
## Fit method: kernelpls
## Number of components considered: 17
## 
## VALIDATION: RMSEP
## Cross-validated using 10 random segments.
##        (Intercept)  1 comps  2 comps  3 comps  4 comps  5 comps  6 comps
## CV            4288     2213     1990     1733     1610     1381     1298
## adjCV         4288     2207     1985     1722     1580     1369     1285
##        7 comps  8 comps  9 comps  10 comps  11 comps  12 comps  13 comps
## CV        1279     1268     1254      1250      1241      1239      1238
## adjCV     1267     1256     1243      1239      1231      1229      1227
##        14 comps  15 comps  16 comps  17 comps
## CV         1236      1236      1236      1236
## adjCV      1225      1225      1225      1225
## 
## TRAINING: % variance explained
##       1 comps  2 comps  3 comps  4 comps  5 comps  6 comps  7 comps
## X       27.21    50.73    63.06    65.52    70.20    74.20    78.62
## Apps    75.39    81.24    86.97    91.14    92.62    93.43    93.56
##       8 comps  9 comps  10 comps  11 comps  12 comps  13 comps  14 comps
## X       80.81    83.29     87.17     89.15     91.37     92.58     94.42
## Apps    93.68    93.76     93.79     93.83     93.86     93.88     93.89
##       15 comps  16 comps  17 comps
## X        96.98     98.78    100.00
## Apps     93.89     93.89     93.89
validationplot(pls.fit,val.type="MSEP")

pls.pred=predict(pls.fit,test.college,ncomp =7)
mean((pls.pred -Apps.test)^2)
## [1] 1095359

Using a PLS model, the MSE is 1095359.

(g) Comment on the results obtained. How accurately can we predict the number of college applications received? Is there much difference among the test errors resulting from these five approaches?

The above models created with different methods all had relatively similar MSE ranging between ~910000 and ~1200000. The ridge regression resulted in the lowest MSE with 906029.4, while PCR was the worst fit model with an MSE of 1230956.

#11

11. We will now try to predict per capita crime rate in the Boston dataset.

library(MASS)
library(leaps)
## Warning: package 'leaps' was built under R version 3.6.3
attach(Boston)
set.seed(1)
train=sample(1:nrow(Boston), nrow(Boston)/2)
train.boston=Boston[train,]
test=(-train)
test.boston=Boston[test,]
crim.test=Boston$crim[test]
x.mat.train=model.matrix(crim~., data=train.boston)[,-crim]
x.mat.train=model.matrix(crim~., data=test.boston)[,-crim]

(a) Try out some of the regression methods explored in this chapter, such as best subset selection, the lasso, ridge regression, and PCR. Present and discuss results for the approaches that you consider.

# Least Squares - all variables
lm.full=lm(crim~.,data=train.boston)
lm.predict=predict(lm.full,test.boston)
lm.MSE=mean((lm.predict-crim.test)^2)
lm.MSE # 41.54639
## [1] 41.54639
# subset
library(leaps)
regfit.best=regsubsets(crim~.,data=Boston[train,],nvmax=13)
test.mat=model.matrix(crim~.,data=Boston[test,])
val.errors=rep(NA,13)
for(i in 1:13){
   coefi=coef(regfit.best,id=i)
   pred=test.mat[,names(coefi)]%*%coefi
   val.errors[i]=mean((Boston$crim[test]-pred)^2)
}
val.errors
##  [1] 40.14557 41.87706 42.40901 42.45745 43.03836 42.13258 41.25016
##  [8] 41.28957 41.49271 41.50577 41.48839 41.46692 41.54639
which.min(val.errors) # 1 Variable - MSE=40.14557
## [1] 1
sub.mse=val.errors[1]

#ridge
library(glmnet)
set.seed(1)
x=model.matrix(crim~.,Boston)[,-1]
y=Boston$crim
y.test=y[test]
train=sample(1:nrow(x), nrow(x)/2)
test=(-train)
grid=10^seq(10,-2,length=100)
ridge.mod = glmnet(x,y,lambda=grid, alpha=0)
cv.out=cv.glmnet(x[train,],y[train],alpha=0)
plot(cv.out)

bestlam=cv.out$lambda.min
ridge.pred=predict(ridge.mod,s=bestlam,newx=x[test,])
ridge.mse = mean((ridge.pred-y.test)^2) 
ridge.mse #39.67203
## [1] 38.01174
#lasso
train=sample(1:nrow(x), nrow(x)/2)
test=(-train)
grid=10^seq(10,-2,length=100)
lasso.mod = glmnet(x,y,lambda=grid, alpha=1)
cv.out=cv.glmnet(x[train,],y[train],alpha=1)
plot(cv.out)

bestlam=cv.out$lambda.min
lasso.pred=predict(lasso.mod,s=bestlam,newx=x[test,])
lasso.mse = mean((lasso.pred-y.test)^2)
lasso.mse #38.05746
## [1] 39.67203
#PCR
set.seed(1)
library(pls)
pcr.fit = pcr(crim~., data=train.boston, Scale=TRUE, validation='CV')
summary(pcr.fit)
## Data:    X dimension: 253 13 
##  Y dimension: 253 1
## Fit method: svdpc
## Number of components considered: 13
## 
## VALIDATION: RMSEP
## Cross-validated using 10 random segments.
##        (Intercept)  1 comps  2 comps  3 comps  4 comps  5 comps  6 comps
## CV           9.275    7.305    7.267    7.254    7.234    7.130    7.115
## adjCV        9.275    7.300    7.259    7.246    7.226    7.121    7.107
##        7 comps  8 comps  9 comps  10 comps  11 comps  12 comps  13 comps
## CV       6.795    6.763    6.743     6.694     6.694     6.696     6.700
## adjCV    6.782    6.753    6.732     6.682     6.682     6.684     6.686
## 
## TRAINING: % variance explained
##       1 comps  2 comps  3 comps  4 comps  5 comps  6 comps  7 comps
## X       81.27    97.18    99.08    99.72    99.89    99.93    99.97
## crim    39.26    40.61    40.87    41.25    43.24    43.95    48.88
##       8 comps  9 comps  10 comps  11 comps  12 comps  13 comps
## X       99.99   100.00    100.00    100.00    100.00    100.00
## crim    49.34    49.94     50.86     50.93     50.98     51.37
validationplot(pcr.fit,val.type="MSEP")

pcr.pred=predict(pcr.fit,test.boston,ncomp =7)
pcr.mse=mean((pcr.pred-y.test)^2) #43.48068
pcr.mse #43.48068
## [1] 43.48068

Simple least squares, best subset, lasso, ridge regression, and PCR were all used to fit and test models. Test MSE were similar for all fit models, but the lasso model appears to have the lowest test MSE.

(b) Propose a model (or set of models) that seem to perform well on this data set, and justify your answer. Make sure that you are evaluating model performance using validation set error, cross-validation, or some other reasonable alternative, as opposed to using training error.

out=glmnet(x,y,alpha=1,lambda =grid)
lasso.coef=predict(out,type='coefficient',s=bestlam)[1:14,]
lasso.coef
##  (Intercept)           zn        indus         chas          nox 
## 14.476496146  0.039476650 -0.070676946 -0.643189309 -8.433633678 
##           rm          age          dis          rad          tax 
##  0.323327442  0.000000000 -0.877313853  0.539447859 -0.001159816 
##      ptratio        black        lstat         medv 
## -0.224731922 -0.007537653  0.126684807 -0.175267556

The proposed model fit with the lasso method is: \(crim = 14.0596 + .0386zn - .0718indus -.6269chas - 8.1333nox + .30354rm -.8579dis + .5316rad-.0007tax - .2171ptratio - .0075black + .1264lstat - .1713medv\). This model performed the best of any of the fit models based on test MSE.

(c) Does your chosen model involve all of the features in the dataset? Why or why not?
This model does not include all of the features in the dataset. The coefficient for age became 0 in the model