Chapter 06 (page 259): 2, 9, 11

#install.packages("leaps")

2. For parts (a) through (c), indicate which of i. through iv. is correct. Justify your answer.

(a) The lasso, relative to least squares, is (iii) less flexible and will give improved prediction accuracy when its increase in bias is less than its decrease in variance. The lasso penalty shrinks the coefficient estimates (setting some exactly to zero), which reduces variance at the cost of some added bias.

(b) Ridge regression, relative to least squares, is likewise (iii) less flexible and will give improved prediction accuracy when its increase in bias is less than its decrease in variance. The ridge penalty shrinks the coefficients toward zero, again trading a little bias for a reduction in variance.

(c) Non-linear methods, relative to least squares, are (ii) more flexible and will give improved prediction accuracy when their increase in variance is less than their decrease in bias.
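
A quick simulation (entirely hypothetical data, not part of the exercise) makes the justification for (a) and (b) concrete: with many weak predictors and a small sample, least squares overfits, and the shrunken ridge and lasso fits do better on held-out data.

library(glmnet)
set.seed(1)
n = 50; p = 40
x = matrix(rnorm(n * p), n, p)
beta = c(rep(1, 5), rep(0, p - 5))   # only 5 predictors truly matter
y = drop(x %*% beta + rnorm(n, sd = 3))
xtest = matrix(rnorm(1000 * p), 1000, p)
ytest = drop(xtest %*% beta + rnorm(1000, sd = 3))
ls.pred = cbind(1, xtest) %*% coef(lm(y ~ x))   # least squares: low bias, high variance
c(ls    = mean((ytest - ls.pred)^2),
  ridge = mean((ytest - predict(cv.glmnet(x, y, alpha = 0), xtest))^2),
  lasso = mean((ytest - predict(cv.glmnet(x, y, alpha = 1), xtest))^2))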

9. In this exercise, we will predict the number of applications received using the other variables in the College data set.

library(ISLR)
library(leaps)

(a) Split the data set into a training set and a test set.

attach(College)
set.seed(1)
# draw a random ~70% of the row indices for the training set
train <- sample(1:nrow(College), nrow(College) * 0.7)
Ctrain <- College[train, ]
Ctest <- College[-train, ]
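
As a quick optional sanity check, the split sizes should be roughly 70% and 30% of the 777 rows.

dim(Ctrain)   # about 543 rows
dim(Ctest)    # about 234 rows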

(b) Fit a linear model using least squares on the training set, and report the test error obtained.
The test mean squared error is 1413322.

Col.lm <- lm(Apps~., data = Ctrain)
pred.lm <- predict(Col.lm, Ctest)
mean((Ctest$Apps - pred.lm)^2)
## [1] 1413322
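
To put this on an interpretable scale (a convenience, not required by the exercise), the square root of the test MSE gives a typical error of roughly 1,189 applications.

sqrt(mean((Ctest$Apps - pred.lm)^2))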

(c) Fit a ridge regression model on the training set, with λ chosen by cross-validation. Report the test error obtained.
The test mean squared error is 976268.9.

#install.packages("glmnet")
library(glmnet)
## Loading required package: Matrix
## Loaded glmnet 4.1-2
x=model.matrix(Apps~.,College)[,-1]   # predictor matrix, intercept column dropped
y=College$Apps
grid=10^seq(10,-2,length=100)         # lambda grid from 10^10 down to 10^-2
set.seed(1)
train <- sample(1:nrow(x), nrow(x)/2)
test=(-train)
y.test=y[test]
Cridge.mod=glmnet(x[train,], y[train], alpha = 0, lambda = grid)   # alpha = 0 fits ridge
set.seed(1)
cv.out=cv.glmnet(x[train,],y[train],alpha=0)
bestlam=cv.out$lambda.min
bestlam
## [1] 405.8404
Cridge.pred=predict(Cridge.mod,s=bestlam,newx=x[test,])
mean((Cridge.pred-y.test)^2)
## [1] 976268.9
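
The CV curve behind the choice of λ can be inspected, and cv.glmnet also reports a more conservative one-standard-error choice (optional inspection, not required by the exercise).

plot(cv.out)          # CV MSE as a function of log(lambda)
cv.out$lambda.1se     # largest lambda within one SE of the minimum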

(d) Fit a lasso model on the training set, with λ chosen by cross-validation. Report the test error obtained, along with the number of non-zero coefficient estimates.
The test mean squared error is 1116252, and all 18 coefficient estimates (the intercept plus 17 predictors) are non-zero, so at this λ the lasso performs essentially no variable selection.

Classo.mod=glmnet(x[train,], y[train], alpha = 1, lambda = grid)
set.seed(1)
cv.out=cv.glmnet(x[train,],y[train],alpha=1)
bestlam=cv.out$lambda.min
bestlam
## [1] 1.97344
Classo.pred=predict(Classo.mod,s=bestlam,newx=x[test,])
mean((Classo.pred-y.test)^2)
## [1] 1116252
predict(cv.out, type="coefficients", s=bestlam)[1:18,]
##   (Intercept)    PrivateYes        Accept        Enroll     Top10perc 
## -7.688896e+02 -3.127034e+02  1.762718e+00 -1.318195e+00  6.482356e+01 
##     Top25perc   F.Undergrad   P.Undergrad      Outstate    Room.Board 
## -2.081406e+01  7.119149e-02  1.246161e-02 -1.049091e-01  2.088305e-01 
##         Books      Personal           PhD      Terminal     S.F.Ratio 
##  2.926466e-01  3.955068e-03 -1.455463e+01  5.395858e+00  2.171398e+01 
##   perc.alumni        Expend     Grad.Rate 
##  5.088260e-01  4.824455e-02  7.036148e+00
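
The count of non-zero estimates can also be obtained programmatically rather than by inspecting the printout (a small convenience check).

lasso.coef = predict(cv.out, type="coefficients", s=bestlam)
sum(lasso.coef != 0)   # 18: the intercept plus all 17 predictors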

(e) Fit a PCR model on the training set, with M chosen by cross-validation. Report the test error obtained, along with the value of M selected by cross-validation.
The test mean squared error is 1322362 with M = 10 components.

#install.packages("pls")
library(pls)
## 
## Attaching package: 'pls'
## The following object is masked from 'package:stats':
## 
##     loadings
set.seed(1)   # seed set before sampling so the split is reproducible
train=sample(1:nrow(College), nrow(College)/2)
test=(-train)
Colpcr=pcr(Apps~., data=College, scale=TRUE, validation="CV")   # full-data fit for the CV summary below
summary(Colpcr)
## Data:    X dimension: 777 17 
##  Y dimension: 777 1
## Fit method: svdpc
## Number of components considered: 17
## 
## VALIDATION: RMSEP
## Cross-validated using 10 random segments.
##        (Intercept)  1 comps  2 comps  3 comps  4 comps  5 comps  6 comps
## CV            3873     3842     2033     2033     1725     1617     1597
## adjCV         3873     3844     2031     2033     1684     1604     1593
##        7 comps  8 comps  9 comps  10 comps  11 comps  12 comps  13 comps
## CV        1592     1549     1510      1508      1514      1515      1525
## adjCV     1592     1543     1507      1505      1511      1511      1522
##        14 comps  15 comps  16 comps  17 comps
## CV         1526      1453      1163      1140
## adjCV      1522      1435      1157      1134
## 
## TRAINING: % variance explained
##       1 comps  2 comps  3 comps  4 comps  5 comps  6 comps  7 comps  8 comps
## X      31.670    57.30    64.30    69.90    75.39    80.38    83.99    87.40
## Apps    2.316    73.06    73.07    82.08    84.08    84.11    84.32    85.18
##       9 comps  10 comps  11 comps  12 comps  13 comps  14 comps  15 comps
## X       90.50     92.91     95.01     96.81      97.9     98.75     99.36
## Apps    85.88     86.06     86.06     86.10      86.1     86.13     90.32
##       16 comps  17 comps
## X        99.84    100.00
## Apps     92.52     92.92
validationplot(Colpcr, val.type = "MSEP")

set.seed(1)
Colpcr=pcr(Apps~., data=College[train, ], scale=TRUE, validation="CV")
#validationplot(Colpcr, val.type = "MSEP")
Colpcr.pred=predict(Colpcr, College[test,], ncomp=10)
mean((Colpcr.pred-College$Apps[test])^2)                    
## [1] 1322362
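
The M = 10 used above was read off the validation plot; the pls package also provides selectNcomp() to choose M automatically, e.g. by the one-standard-error rule (a sketch, and it may select a different M than the eyeballed choice).

selectNcomp(Colpcr, method = "onesigma", plot = FALSE)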

(f) Fit a PLS model on the training set, with M chosen by cross-validation. Report the test error obtained, along with the value of M selected by cross-validation.
The test mean squared error is 1243875 with M = 11 components.

set.seed(1)
Colpls=plsr(Apps~., data=College[train,], scale=TRUE, validation="CV")
summary(Colpls)
## Data:    X dimension: 388 17 
##  Y dimension: 388 1
## Fit method: kernelpls
## Number of components considered: 17
## 
## VALIDATION: RMSEP
## Cross-validated using 10 random segments.
##        (Intercept)  1 comps  2 comps  3 comps  4 comps  5 comps  6 comps
## CV            4377     2187     1789     1784     1615     1453     1298
## adjCV         4377     2182     1773     1800     1588     1414     1282
##        7 comps  8 comps  9 comps  10 comps  11 comps  12 comps  13 comps
## CV        1273     1244     1241      1235      1220      1221      1220
## adjCV     1260     1233     1229      1223      1210      1210      1208
##        14 comps  15 comps  16 comps  17 comps
## CV         1219      1220      1220      1220
## adjCV      1208      1209      1209      1209
## 
## TRAINING: % variance explained
##       1 comps  2 comps  3 comps  4 comps  5 comps  6 comps  7 comps  8 comps
## X       26.36    33.68    61.84    64.33    66.96    72.94    76.83    79.79
## Apps    76.52    86.14    87.25    91.87    93.75    93.96    94.04    94.15
##       9 comps  10 comps  11 comps  12 comps  13 comps  14 comps  15 comps
## X       82.64     86.12     89.36     90.92     92.85     95.55     96.98
## Apps    94.26     94.31     94.36     94.40     94.41     94.41     94.41
##       16 comps  17 comps
## X        98.86    100.00
## Apps     94.41     94.41
Colpls.pred=predict(Colpls, College[test,], ncomp=11)
mean((Colpls.pred-College$Apps[test])^2) 
## [1] 1243875
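
The component count minimizing the CV RMSEP can also be located directly from the RMSEP object rather than read off the printed table (subtracting 1 because the first entry is the intercept-only model).

cv.rmsep = RMSEP(Colpls)$val["CV", 1, ]
which.min(cv.rmsep) - 1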

(g) Comment on the results obtained. How accurately can we predict the number of college applications received? Is there much difference among the test errors resulting from these five approaches?
There is a noticeable spread among the five test MSEs: ridge regression does best and the plain least-squares fit does worst. (The linear model in (b) was evaluated on a different train/test split than the other four methods, so the comparison is only approximate.) Taking square roots, even the best model is typically off by roughly 1,000 applications.
Ridge MSE 976268.9
Lasso MSE 1116252
PLS MSE 1243875
PCR MSE 1322362
Linear MSE 1413322
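
For a side-by-side view, the errors can be collected and put on the RMSE scale (values transcribed from the outputs above).

mse = c(ridge=976268.9, lasso=1116252, pls=1243875, pcr=1322362, lm=1413322)
sort(sqrt(mse))   # typical prediction error, in number of applications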

11. We will now try to predict per capita crime rate in the Boston data set.

library(MASS)

(a) Try out some of the regression methods explored in this chapter, such as best subset selection, the lasso, ridge regression, and PCR. Present and discuss results for the approaches that you consider.
I tried best subset selection, the lasso, and ridge regression. Ridge regression with cross-validated λ gave the lowest test MSE, with the lasso close behind; best subset selection had the highest estimated MSE of the three.

set.seed(1)
# draw a random ~70% of the row indices for the training set
train <- sample(1:nrow(Boston), nrow(Boston) * 0.7)
Btrain <- Boston[train, ]
Btest <- Boston[-train, ]

Best Subset

bestBost <- regsubsets(crim ~ ., data = Boston, nvmax = 13)
summary(bestBost)
## Subset selection object
## Call: regsubsets.formula(crim ~ ., data = Boston, nvmax = 13)
## 13 Variables  (and intercept)
##         Forced in Forced out
## zn          FALSE      FALSE
## indus       FALSE      FALSE
## chas        FALSE      FALSE
## nox         FALSE      FALSE
## rm          FALSE      FALSE
## age         FALSE      FALSE
## dis         FALSE      FALSE
## rad         FALSE      FALSE
## tax         FALSE      FALSE
## ptratio     FALSE      FALSE
## black       FALSE      FALSE
## lstat       FALSE      FALSE
## medv        FALSE      FALSE
## 1 subsets of each size up to 13
## Selection Algorithm: exhaustive
##           zn  indus chas nox rm  age dis rad tax ptratio black lstat medv
## 1  ( 1 )  " " " "   " "  " " " " " " " " "*" " " " "     " "   " "   " " 
## 2  ( 1 )  " " " "   " "  " " " " " " " " "*" " " " "     " "   "*"   " " 
## 3  ( 1 )  " " " "   " "  " " " " " " " " "*" " " " "     "*"   "*"   " " 
## 4  ( 1 )  "*" " "   " "  " " " " " " "*" "*" " " " "     " "   " "   "*" 
## 5  ( 1 )  "*" " "   " "  " " " " " " "*" "*" " " " "     "*"   " "   "*" 
## 6  ( 1 )  "*" " "   " "  "*" " " " " "*" "*" " " " "     "*"   " "   "*" 
## 7  ( 1 )  "*" " "   " "  "*" " " " " "*" "*" " " "*"     "*"   " "   "*" 
## 8  ( 1 )  "*" " "   " "  "*" " " " " "*" "*" " " "*"     "*"   "*"   "*" 
## 9  ( 1 )  "*" "*"   " "  "*" " " " " "*" "*" " " "*"     "*"   "*"   "*" 
## 10  ( 1 ) "*" "*"   " "  "*" "*" " " "*" "*" " " "*"     "*"   "*"   "*" 
## 11  ( 1 ) "*" "*"   " "  "*" "*" " " "*" "*" "*" "*"     "*"   "*"   "*" 
## 12  ( 1 ) "*" "*"   "*"  "*" "*" " " "*" "*" "*" "*"     "*"   "*"   "*" 
## 13  ( 1 ) "*" "*"   "*"  "*" "*" "*" "*" "*" "*" "*"     "*"   "*"   "*"
reg.summary=summary(bestBost)
par(mfrow=c(2,2))
plot(reg.summary$rss,xlab="Number of Variables",ylab="RSS",type="l")
plot(reg.summary$adjr2,xlab="Number of Variables",ylab="Adjusted RSq",type="l")
plot(reg.summary$cp,xlab="Number of Variables",ylab="Cp",type="l")
plot(reg.summary$bic,xlab="Number of Variables",ylab="BIC",type="l")

which.max(reg.summary$adjr2)
## [1] 9
which.min(reg.summary$cp)
## [1] 8
which.min(reg.summary$bic)
## [1] 3
set.seed(1)
train=sample(c(TRUE,FALSE), nrow(Boston), rep=TRUE)   # logical indicator: TRUE = training row
test=(!train)
bestBost <- regsubsets(crim ~ ., data = Boston[train, ], nvmax = 13)
test.mat=model.matrix(crim~.,data=Boston[test,])

# regsubsets has no predict method, so define one: rebuild the model matrix
# from the original formula and multiply by the coefficients of the id-th model
predict.regsubsets=function(object,newdata,id,...){
  form=as.formula(object$call[[2]])
  mat=model.matrix(form,newdata)
  coefi=coef(object,id=id)
  xvars=names(coefi)
  mat[,xvars]%*%coefi
}
k=10
set.seed(1)
folds=sample(1:k,nrow(Boston),replace=TRUE)   # assign each row to one of k folds
cv.errors=matrix(NA,k,13, dimnames=list(NULL, paste(1:13)))
for(j in 1:k){
  best.fit=regsubsets(crim~.,data=Boston[folds!=j,],nvmax=13)
  for(i in 1:13){
    pred=predict(best.fit,Boston[folds==j,],id=i)
    cv.errors[j,i]=mean( (Boston$crim[folds==j]-pred)^2)
    }
  }
mean.cv.errors=apply(cv.errors,2,mean)
mean.cv.errors
##        1        2        3        4        5        6        7        8 
## 45.44573 43.87260 43.94979 44.02424 43.96415 43.96199 42.96268 42.66948 
##        9       10       11       12       13 
## 42.53822 42.73416 42.52367 42.46014 42.50125
reg.best=regsubsets(crim~.,data=Boston, nvmax=13)
coef(reg.best, 12)
##   (Intercept)            zn         indus          chas           nox 
##  16.985713928   0.044673247  -0.063848469  -0.744367726 -10.202169211 
##            rm           dis           rad           tax       ptratio 
##   0.439588002  -0.993556631   0.587660185  -0.003767546  -0.269948860 
##         black         lstat          medv 
##  -0.007518904   0.128120290  -0.198877768

Lasso Regression

x=model.matrix(crim~.,Boston)[,-1]
y=Boston$crim
grid=10^seq(10,-2,length=100)
lasso.mod=glmnet(x,y,alpha=1,lambda=grid)
set.seed(1)
train <- sample(1:nrow(x), nrow(x)/2)
test=(-train)
y.test=y[test]
lasso.mod=glmnet(x[train,], y[train], alpha = 1, lambda = grid)
set.seed(1)
cv.out=cv.glmnet(x[train,],y[train],alpha=1)
bestlam=cv.out$lambda.min
bestlam
## [1] 0.05148183
lasso.pred=predict(lasso.mod,s=bestlam,newx=x[test,])
mean((lasso.pred-y.test)^2)
## [1] 40.99066
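
As a check on how much selection the lasso actually does at this small λ (a quick sketch):

lassoB.coef = predict(cv.out, type="coefficients", s=bestlam)
sum(lassoB.coef != 0) - 1   # number of predictors retained, excluding the intercept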

Ridge Regression

ridge.mod=glmnet(x[train,],y[train],alpha=0,lambda=grid)
set.seed(1)
cv.out=cv.glmnet(x[train,],y[train],alpha=0)
bestlam=cv.out$lambda.min
bestlam
## [1] 0.5919159
ridge.pred=predict(ridge.mod,s=bestlam,newx=x[test,])
mean((ridge.pred-y.test)^2)
## [1] 40.92395

(b) Propose a model (or set of models) that seem to perform well on this data set, and justify your answer. Make sure that you are evaluating model performance using validation set error, cross- validation, or some other reasonable alternative, as opposed to using training error.
Ridge regression with cross-validated λ performed best on the held-out half of the data, with the lasso close behind; both are evaluated by validation-set error, not training error. Best subset selection, evaluated by 10-fold cross-validation, had the highest estimated MSE of the three.
Lasso test MSE 40.99066
Ridge test MSE 40.92395
Best subset 10-fold CV MSE 42.46014 (12-variable model)

mean((lasso.pred-y.test)^2)
## [1] 40.99066
mean((ridge.pred-y.test)^2)
## [1] 40.92395

(c) Does your chosen model involve all of the features in the data set? Why or why not?
The chosen ridge regression model involves all 13 features, because the ridge penalty shrinks coefficients toward zero but never sets them exactly to zero; ridge regression does not perform variable selection. If a sparser model were desired, the lasso would be the natural alternative.
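
A quick check of this claim (sketch): every ridge coefficient is shrunken but non-zero.

ridge.coef = predict(ridge.mod, type="coefficients", s=bestlam)
sum(ridge.coef != 0)   # 14: the intercept plus all 13 predictors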