2,9,11

  1. For parts (a) through (c), indicate which of i. through iv. is correct. Justify your answer.
  1. The lasso, relative to least squares, is:
  1. More flexible and hence will give improved prediction accuracy when its increase in bias is less than its decrease in variance.
  2. More flexible and hence will give improved prediction accuracy when its increase in variance is less than its decrease in bias.
  3. Less flexible and hence will give improved prediction accuracy when its increase in bias is less than its decrease in variance.
  4. Less flexible and hence will give improved prediction accuracy when its increase in variance is less than its decrease in bias.

iii is correct because it can have a less variance and a better prediction,a higher bias

  1. Repeat (a) for ridge regression relative to least squares.
  1. Less flexible and hence will give improved prediction accuracy when its increase in bias is less than its decrease in variance. iii is correct because it decreases variance and has a higher bias. it is also less flexible relative to least squares.
  1. Repeat (a) for non-linear methods relative to least squares.
  1. More flexible and hence will give improved prediction accuracy when its increase in variance is less than its decrease in bias. ii is correct because it has less bias than least squares in linear models
  1. In this exercise, we will predict the number of applications received using the other variables in the College data set.
  1. Split the data set into a training set and a test set.
library(ISLR)
## Warning: package 'ISLR' was built under R version 3.6.3
attach(College)
x=model.matrix(Apps~.,College)[,-1]
y=College$Apps
set.seed(10)
train=sample(1:nrow(x), nrow(x)/2)
test=(-train)
College.train = College[train, ]
College.test = College[test, ]
y.test=y[test]
  1. Fit a linear model using least squares on the training set, and report the test error obtained.
pls.fit<-lm(Apps~., data=College, subset=train)
summary(pls.fit)
## 
## Call:
## lm(formula = Apps ~ ., data = College, subset = train)
## 
## Residuals:
##     Min      1Q  Median      3Q     Max 
## -5139.5  -473.3   -21.1   353.2  7402.7 
## 
## Coefficients:
##               Estimate Std. Error t value Pr(>|t|)    
## (Intercept) -629.36179  639.35741  -0.984 0.325579    
## PrivateYes  -647.56836  192.17056  -3.370 0.000832 ***
## Accept         1.68912    0.05038  33.530  < 2e-16 ***
## Enroll        -1.02383    0.27721  -3.693 0.000255 ***
## Expend         0.07654    0.01672   4.577 6.45e-06 ***
## Grad.Rate      9.99706    4.49821   2.222 0.026857 *  
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 1092 on 370 degrees of freedom
## Multiple R-squared:  0.9395, Adjusted R-squared:  0.9367 
## F-statistic:   338 on 17 and 370 DF,  p-value: < 2.2e-16
pred.app<-predict(pls.fit, College.test)
test.error<-mean((College.test$Apps-pred.app)^2)
test.error
## [1] 1020100
  1. Fit a ridge regression model on the training set, with λ chosen by cross-validation. Report the test error obtained.
library(glmnet)
## Loaded glmnet 4.0-2
grid=10^seq(10,-2,length=100)
ridge.mod=glmnet(x[train,],y[train],alpha=0,lambda=grid)
summary(ridge.mod)
##           Length Class     Mode   
## a0         100   -none-    numeric
## beta      1700   dgCMatrix S4     
## df         100   -none-    numeric
## dim          2   -none-    numeric
cv.college.out=cv.glmnet(x[train,],y[train] ,alpha=0)
bestlam=cv.college.out$lambda.min
bestlam
ridge.pred=predict(ridge.mod,s=bestlam,newx=x[test,])
mean((ridge.pred-y.test)^2)
## [1] 985020.1
  1. Fit a lasso model on the training set, with λ chosen by crossvalidation. Report the test error obtained, along with the number of non-zero coefficient estimates.
lasso.mod=glmnet(x[train,],y[train],alpha=1,lambda=grid)
summary(lasso.mod)
##           Length Class     Mode   
## a0         100   -none-    numeric
## beta      1700   dgCMatrix S4     
## df         100   -none-    numeric
## dim          2   -none-    numeric
## lambda     100   -none-    numeric
## dev.ratio  100   -none-    numeric
cv.out=cv.glmnet(x[train,],y[train],alpha=1)
bestlam=cv.out$lambda.min
bestlam
lasso.pred=predict(lasso.mod,s=bestlam,newx=x[test,])
mean((lasso.pred-y.test)^2)
out=glmnet(x,y,alpha=1,lambda = grid)
lasso.coef=predict(out,type="coefficients",s=bestlam)[1:18,]
lasso.coef[lasso.coef!=0]
##   (Intercept)    PrivateYes        Accept        Enroll     Top10perc 
## -6.324960e+02 -4.087012e+02  1.436837e+00 -1.410240e-01  3.143012e+01 
##     Top25perc   P.Undergrad      Outstate    Room.Board      Personal 
## -8.606525e-01  1.480293e-02 -5.342495e-02  1.205819e-01  4.379046e-05 
  1. Fit a PCR model on the training set, with M chosen by crossvalidation. Report the test error obtained, along with the value of M selected by cross-validation.
library(pls)
## Warning: package 'pls' was built under R version 3.6.3
## 
## Attaching package: 'pls'
## The following object is masked from 'package:stats':
pcr.college=pcr(Apps~., data=College.train,scale=TRUE,validation="CV")
summary(pcr.college)
## Data:    X dimension: 388 17 
##  Y dimension: 388 1
## Fit method: svdpc
## Number of components considered: 17
## 
## VALIDATION: RMSEP
## Cross-validated using 10 random segments.
##        (Intercept)  1 comps  2 comps  3 comps  4 comps  5 comps  6 comps
## CV            4347     4345     2371     2391     2104     1949     1898
## adjCV         4347     4345     2368     2396     2085     1939     1891
##        7 comps  8 comps  9 comps  10 comps  11 comps  12 comps  13 comps
## CV        1899     1880     1864      1861      1870      1873      1891
## adjCV     1893     1862     1857      1853      1862      1865      1885
##       8 comps  9 comps  10 comps  11 comps  12 comps  13 comps  14 comps
## X       87.54    90.50     92.89     94.96     96.81     97.97     98.73
## Apps    83.70    83.86     84.08     84.11     84.11     84.16     84.28
##       15 comps  16 comps  17 comps
## X        99.39     99.86    100.00
## Apps     93.08     93.71     93.95
validationplot(pcr.college, val.type="MSEP")
pcr.pred=predict(pcr.college,x[test,],ncomp=10)
mean((pcr.pred-y.test)^2)
  1. Fit a PLS model on the training set, with M chosen by crossvalidation. Report the test error obtained, along with the value of M selected by cross-validation.
pls.college=plsr(Apps~., data=College.train,scale=TRUE, validation="CV")
validationplot(pls.college, val.type="MSEP")
summary(pls.college)
##        14 comps  15 comps  16 comps  17 comps
## CV         1305      1307      1307      1307
## adjCV      1291      1292      1293      1293
## 
## TRAINING: % variance explained
##       1 comps  2 comps  3 comps  4 comps  5 comps  6 comps  7 comps
## X       24.27    38.72    62.64    65.26    69.01    73.96    78.86
## Apps    76.96    84.31    86.80    91.48    93.37    93.75    93.81
##       8 comps  9 comps  10 comps  11 comps  12 comps  13 comps  14 comps
## X       82.18    85.35     87.42     89.18     91.41     92.70     94.58
## Apps    93.84    93.88     93.91     93.93     93.94     93.95     93.95
pls.pred=predict(pls.college,x[test,],ncomp=9)
mean((pls.pred-y.test)^2)
## [1] 1049868
  1. Comment on the results obtained. How accurately can we predict the number of college applications received? Is there much difference among the test errors resulting from these five approaches?
test.avg = mean(College.test[, "Apps"])
lm.test.r2 = 1 - mean((College.test[, "Apps"] - pred.app)^2) 
pcr.test.r2 = 1 - mean((pcr.pred-y.test)^2) /mean((College.test[, "Apps"] - test.avg)^2)
pls.test.r2 = 1 - mean((pls.pred-y.test)^2) /mean((College.test[, "Apps"] - test.avg)^2)
barplot(c(lm.test.r2, ridge.test.r2, lasso.test.r2, pcr.test.r2, pls.test.r2), names.arg=c("OLS", "Ridge", "Lasso", "PCR", "PLS"), main="Test R-squared")
detach(College)
  1. We will now try to predict per capita crime rate in the Boston data set.
  1. Try out some of the regression methods explored in this chapter, such as best subset selection, the lasso, ridge regression, and PCR. Present and discuss results for the approaches that you consider.
library(leaps)
library(MASS)
set.seed(1)
attach(Boston)
predict.regsubsets = function(object, newdata, id, ...) {
    form = as.formula(object$call[[2]])
    mat = model.matrix(form, newdata)
    coefi = coef(object, id = id)
    mat[, names(coefi)] %*% coefi
}
k = 10
p = ncol(Boston) - 1
folds = sample(rep(1:k, length = nrow(Boston)))
cv.errors = matrix(NA, k, p)
for (j in 1:p) {
        pred = predict(best.fit, Boston[folds == i, ], id = j)
}
mean.cv.errors <- apply(cv.errors, 2, mean)
plot(mean.cv.errors, type = "b", xlab = "Number of variables", ylab = "CV error")
mean.cv.errors[which.min(mean.cv.errors)]
x = model.matrix(crim ~ . - 1, data = Boston)
y = Boston$crim
cv.ridge = cv.glmnet(x, y, type.measure = "mse", alpha = 0)
plot(cv.ridge)
## zn          -0.002949852
## indus        0.029276741
## chas        -0.166526006
## nox          1.874769661
## rm          -0.142852604
## age          0.006207995
## dis         -0.094547258
## rad          0.045932737
## tax          0.002086668
## ptratio      0.071258052
pcr.crime = pcr(crim ~ ., data = Boston, scale = TRUE, validation = "CV")
summary(pcr.crime)
# VALIDATION: RMSEP
## Cross-validated using 10 random segments.
##        (Intercept)  1 comps  2 comps  3 comps  4 comps  5 comps  6 comps
## CV            8.61    7.175    7.180    6.724    6.731    6.727    6.727
## adjCV         8.61    7.174    7.179    6.721    6.725    6.724    6.724
##        7 comps  8 comps  9 comps  10 comps  11 comps  12 comps  13 comps
## CV       6.722    6.614    6.618     6.607     6.598     6.553     6.488
## adjCV    6.718    6.609    6.613     6.602     6.592     6.546     6.481
  1. Propose a model (or set of models) that seem to perform well on this data set, and justify your answer. Make sure that you are evaluating model performance using validation set error, crossvalidation, or some other reasonable alternative, as opposed to using training error. The model chosen has the lowest cross validation.

  2. Does your chosen model involve all of the features in the data set? Why or why not? The model that is picked has the lowest MSE. It has all of the features in the data set because it has low variance.