2. For parts (a) through (c), indicate which of i. through iv. is correct. Justify your answer.

  i. More flexible and hence will give improved prediction accuracy when its increase in bias is less than its decrease in variance.

  ii. More flexible and hence will give improved prediction accuracy when its increase in variance is less than its decrease in bias.

  iii. Less flexible and hence will give improved prediction accuracy when its increase in bias is less than its decrease in variance.

  iv. Less flexible and hence will give improved prediction accuracy when its increase in variance is less than its decrease in bias.

(a) The lasso, relative to least squares, is:

iii. Less flexible and hence will give improved prediction accuracy when its increase in bias is less than its decrease in variance.

Explanation: Generally, the more flexible a method is, the more variance it has. When the least squares estimates have excessively high variance, the lasso behaves qualitatively like ridge regression: as λ increases, the variance decreases and the bias increases.
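
For reference, the lasso coefficient estimates minimize the penalized least squares criterion, where the ℓ1 penalty both shrinks coefficients and can set some of them exactly to zero:

$$\sum_{i=1}^{n}\Big(y_i - \beta_0 - \sum_{j=1}^{p}\beta_j x_{ij}\Big)^2 + \lambda\sum_{j=1}^{p}|\beta_j|$$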

(b) Repeat (a) for ridge regression relative to least squares.

iii. Less flexible and hence will give improved prediction accuracy when its increase in bias is less than its decrease in variance.

Explanation: Generally, the more flexible a method is, the more variance it has. The least squares coefficient estimates correspond to ridge regression with λ = 0: the variance is high but there is no bias. As λ increases, the shrinkage of the ridge coefficient estimates leads to a substantial reduction in the variance of the predictions, at the expense of a slight increase in bias.
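
Ridge regression's criterion differs from the lasso's only in the penalty, replacing the ℓ1 term with an ℓ2 term, which shrinks coefficients toward zero but never to exactly zero:

$$\sum_{i=1}^{n}\Big(y_i - \beta_0 - \sum_{j=1}^{p}\beta_j x_{ij}\Big)^2 + \lambda\sum_{j=1}^{p}\beta_j^2$$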

(c) Repeat (a) for non-linear methods relative to least squares.

ii. More flexible and hence will give improved prediction accuracy when its increase in variance is less than its decrease in bias.

Explanation: While OLS can model curves (e.g., via polynomial terms), it is relatively restricted in the shapes it can fit, and sometimes it cannot capture the specific curve in the data. Non-linear methods are far more flexible in the shapes of the curves they can fit.
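
As a minimal illustration (simulated data, not part of the exercise): a smoothing spline can track a curve that a straight-line OLS fit cannot.

set.seed(1)
x = seq(0, 10, length.out = 100)
y = sin(x) + rnorm(100, sd = 0.3)          # the true relationship is non-linear
lm.fit = lm(y ~ x)                          # OLS restricted to a straight line
spline.fit = smooth.spline(x, y)            # flexible non-linear fit
mean((fitted(lm.fit) - y)^2)                # training MSE, linear fit
mean((predict(spline.fit, x)$y - y)^2)      # training MSE, spline tracks the curve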

9. In this exercise, we will predict the number of applications received using the other variables in the College data set.

(a) Split the data set into a training set and a test set.

library(ISLR)
attach(College)
set.seed(7)
subset = sample(nrow(College), nrow(College) * 0.75)   # 75/25 train/test split
train = College[subset, ]
test = College[-subset, ]

(b) Fit a linear model using least squares on the training set, and report the test error obtained.

fit.lm9B = lm(Apps ~ ., data = train)    # OLS fit on the training set
pred.fit = predict(fit.lm9B, test)       # predict on the held-out test set
te9B = mean((pred.fit - test$Apps)^2)    # test MSE
te9B
## [1] 784381.4

The test mean squared error (MSE) is 784381.4.

(c) Fit a ridge regression model on the training set, with λ chosen by cross-validation. Report the test error obtained.

library(glmnet)
## Loading required package: Matrix
## Loaded glmnet 4.0-2
train.mat = model.matrix(Apps ~ ., data = train)   # design matrices for glmnet
test.mat = model.matrix(Apps ~ ., data = test)
grid = 10^seq(4, -2, length = 100)                 # lambda grid from 10^4 down to 10^-2

ridge = glmnet(train.mat, train$Apps, alpha = 0, lambda = grid, thresh = 1e-12)       # alpha = 0: ridge
cv.ridge = cv.glmnet(train.mat, train$Apps, alpha = 0, lambda = grid, thresh = 1e-12)
bestlam.ridge = cv.ridge$lambda.min                # lambda minimizing the CV error
pred.ridge = predict(ridge, s = bestlam.ridge, newx = test.mat)

te9C = mean((pred.ridge - test$Apps)^2)
te9C
## [1] 784367.1

The test MSE via ridge regression is 784367.1, slightly lower than the OLS test MSE of 784381.4.
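
To see which λ cross-validation selected, the cv.glmnet object can be inspected directly (reusing the objects fit above):

plot(cv.ridge)       # CV MSE across the lambda grid
bestlam.ridge        # the lambda.min value used for the test predictions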

(d) Fit a lasso model on the training set, with λ chosen by cross-validation. Report the test error obtained, along with the number of non-zero coefficient estimates.

fit.lasso = glmnet(train.mat, train$Apps, alpha = 1, lambda = grid, thresh = 1e-12)   # alpha = 1: lasso
cv.lasso = cv.glmnet(train.mat, train$Apps, alpha = 1, lambda = grid, thresh = 1e-12)
bestlam.lasso = cv.lasso$lambda.min

pred.lasso = predict(fit.lasso, s = bestlam.lasso, newx = test.mat)
te9D = mean((pred.lasso - test$Apps)^2)
te9D
## [1] 781272.7
predict(fit.lasso, s = bestlam.lasso, type = "coefficients")
## 19 x 1 sparse Matrix of class "dgCMatrix"
##                         1
## (Intercept) -599.62722787
## (Intercept)    .         
## PrivateYes  -424.49706194
## Accept         1.48427724
## Enroll        -0.23908656
## Top10perc     30.66864732
## Top25perc      .         
## F.Undergrad    .         
## P.Undergrad    .         
## Outstate      -0.04569949
## Room.Board     0.10671006
## Books          .         
## Personal       0.02434163
## PhD           -5.20456803
## Terminal      -3.54311550
## S.F.Ratio      .         
## perc.alumni   -0.86758825
## Expend         0.06367211
## Grad.Rate      4.30134480

The test MSE via the lasso is 781272.7, which is lower than both the ridge regression MSE and the OLS MSE. Of the coefficient estimates, only 12 are non-zero (excluding the intercept).
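
Rather than counting the non-zero estimates by eye, a quick check (reusing the fitted objects above):

lasso.coef = predict(fit.lasso, s = bestlam.lasso, type = "coefficients")
sum(lasso.coef != 0) - 1   # non-zero coefficient estimates, excluding the intercept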

(e) Fit a PCR model on the training set, with M chosen by cross-validation. Report the test error obtained, along with the value of M selected by cross-validation.

library(pls)
## 
## Attaching package: 'pls'
## The following object is masked from 'package:stats':
## 
##     loadings
set.seed(1)
pcrmodel = pcr(Apps~., data = train, scale = TRUE, validation = "CV")
validationplot(pcrmodel, val.type = "MSEP")

summary(pcrmodel)
## Data:    X dimension: 582 17 
##  Y dimension: 582 1
## Fit method: svdpc
## Number of components considered: 17
## 
## VALIDATION: RMSEP
## Cross-validated using 10 random segments.
##        (Intercept)  1 comps  2 comps  3 comps  4 comps  5 comps  6 comps
## CV            4081     4065     2197     2194     1959     1811     1778
## adjCV         4081     4065     2193     2190     1944     1799     1772
##        7 comps  8 comps  9 comps  10 comps  11 comps  12 comps  13 comps
## CV        1765     1726     1666      1677      1684      1690      1700
## adjCV     1759     1717     1660      1670      1677      1684      1693
##        14 comps  15 comps  16 comps  17 comps
## CV         1700      1601      1302      1264
## adjCV      1694      1573      1291      1253
## 
## TRAINING: % variance explained
##       1 comps  2 comps  3 comps  4 comps  5 comps  6 comps  7 comps  8 comps
## X      32.153    57.36    64.31    70.06    75.61    80.56    84.16    87.66
## Apps    1.529    72.25    72.54    78.43    82.58    83.06    83.83    84.40
##       9 comps  10 comps  11 comps  12 comps  13 comps  14 comps  15 comps
## X       90.58     92.83     94.97     96.78     97.87     98.71     99.33
## Apps    85.28     85.31     85.31     85.31     85.32     85.36     90.82
##       16 comps  17 comps
## X        99.81    100.00
## Apps     92.47     93.03
pred.pcr = predict(pcrmodel, test, ncomp = 16)
te9E = mean((pred.pcr - test$Apps)^2)
te9E
## [1] 774105.7

The PCR test MSE is 774105.7. Choosing M = 17 would simply reproduce the OLS fit, so excluding it, M = 16 produces the lowest cross-validated error.
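
The CV-minimizing M can also be pulled programmatically via pls' RMSEP accessor (a sketch; the "- 1" drops the intercept-only model from the count). Note that here it returns 17, the full least squares fit, which is why M = 16 is used instead:

cv.errors = RMSEP(pcrmodel)$val["adjCV", , ]
which.min(cv.errors) - 1   # 17 here, i.e. the full OLS fit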

(f) Fit a PLS model on the training set, with M chosen by cross-validation. Report the test error obtained, along with the value of M selected by cross-validation.

set.seed(1)
plsmodel = plsr(Apps~., data = train, scale = TRUE, validation = "CV")
validationplot(plsmodel, val.type = "MSEP")

summary(plsmodel)
## Data:    X dimension: 582 17 
##  Y dimension: 582 1
## Fit method: kernelpls
## Number of components considered: 17
## 
## VALIDATION: RMSEP
## Cross-validated using 10 random segments.
##        (Intercept)  1 comps  2 comps  3 comps  4 comps  5 comps  6 comps
## CV            4081     2013     1724     1608     1525     1349     1295
## adjCV         4081     2008     1719     1600     1506     1334     1283
##        7 comps  8 comps  9 comps  10 comps  11 comps  12 comps  13 comps
## CV        1282     1269     1263      1265      1267      1264      1263
## adjCV     1271     1259     1252      1255      1256      1253      1252
##        14 comps  15 comps  16 comps  17 comps
## CV         1264      1264      1264      1264
## adjCV      1253      1253      1253      1253
## 
## TRAINING: % variance explained
##       1 comps  2 comps  3 comps  4 comps  5 comps  6 comps  7 comps  8 comps
## X       25.28    39.73    62.68    65.48    68.91    73.73    77.69    80.69
## Apps    77.38    84.65    87.18    90.44    92.33    92.78    92.84    92.92
##       9 comps  10 comps  11 comps  12 comps  13 comps  14 comps  15 comps
## X       82.97     87.00     89.36     90.79     93.32     95.13     96.76
## Apps    92.99     93.01     93.02     93.03     93.03     93.03     93.03
##       16 comps  17 comps
## X        98.17    100.00
## Apps     93.03     93.03
pred.pls = predict(plsmodel, test, ncomp = 9)
te9F = mean((pred.pls - test$Apps)^2)
te9F 
## [1] 777977.1

The PLS test MSE is 777977.1, and M = 9 is the smallest number of components achieving the lowest cross-validated error.

(g) Comment on the results obtained. How accurately can we predict the number of college applications received? Is there much difference among the test errors resulting from these five approaches?

Test_Error_Values = c(te9B, te9C, te9D, te9E, te9F)
names(Test_Error_Values) = c("lm", "ridge", "lasso", "pcr", "pls")


test.avg = mean(test$Apps)
# Test-set R-squared for each model: 1 - MSE / variance of Apps around its test mean
lm.r2 = 1 - mean((pred.fit - test$Apps)^2) / mean((test.avg - test$Apps)^2)
ridge.r2 = 1 - mean((pred.ridge - test$Apps)^2) / mean((test.avg - test$Apps)^2)
lasso.r2 = 1 - mean((pred.lasso - test$Apps)^2) / mean((test.avg - test$Apps)^2)
pcr.r2 = 1 - mean((pred.pcr - test$Apps)^2) / mean((test.avg - test$Apps)^2)
pls.r2 = 1 - mean((pred.pls - test$Apps)^2) / mean((test.avg - test$Apps)^2)


RSquared_Chart = c(lm.r2, ridge.r2, lasso.r2, pcr.r2, pls.r2)
names(RSquared_Chart) = c("lm", "ridge", "lasso", "pcr", "pls")

Test Error Values

Test_Error_Values
##       lm    ridge    lasso      pcr      pls 
## 784381.4 784367.1 781272.7 774105.7 777977.1

R-squared Values

RSquared_Chart
##        lm     ridge     lasso       pcr       pls 
## 0.9208070 0.9208084 0.9211209 0.9218445 0.9214536

In terms of how accurately we can predict the number of college applications received, all of the test-set R-squared values are above 0.92, meaning each model explains over 92% of the variance in Apps on the test set.

After running the OLS, ridge regression, lasso, PCR, and PLS models, the results show that the PCR model, with a test MSE of 774105.7 and an R-squared of 0.921, performed best. Still, PCR did not outperform the other models by a large margin; the differences are fairly small.
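
A quick visual comparison of the five test errors (base R, reusing the vector built above):

barplot(Test_Error_Values, ylab = "Test MSE", main = "Test error by method")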

11. We will now try to predict per capita crime rate in the Boston data set.

(a) Try out some of the regression methods explored in this chapter, such as best subset selection, the lasso, ridge regression, and PCR. Present and discuss results for the approaches that you consider.

I’ll begin by exploring the Boston dataset’s basic structure and summary statistics, then do a 75/25 train/test split. After that, I will try out several regression methods and collect, analyze, and discuss the results, judging the best method by test MSE. The methods are best subset selection, ridge regression, the lasso, PCR, and PLS.

library(MASS)
attach(Boston)
str(Boston)
## 'data.frame':    506 obs. of  14 variables:
##  $ crim   : num  0.00632 0.02731 0.02729 0.03237 0.06905 ...
##  $ zn     : num  18 0 0 0 0 0 12.5 12.5 12.5 12.5 ...
##  $ indus  : num  2.31 7.07 7.07 2.18 2.18 2.18 7.87 7.87 7.87 7.87 ...
##  $ chas   : int  0 0 0 0 0 0 0 0 0 0 ...
##  $ nox    : num  0.538 0.469 0.469 0.458 0.458 0.458 0.524 0.524 0.524 0.524 ...
##  $ rm     : num  6.58 6.42 7.18 7 7.15 ...
##  $ age    : num  65.2 78.9 61.1 45.8 54.2 58.7 66.6 96.1 100 85.9 ...
##  $ dis    : num  4.09 4.97 4.97 6.06 6.06 ...
##  $ rad    : int  1 2 2 3 3 3 5 5 5 5 ...
##  $ tax    : num  296 242 242 222 222 222 311 311 311 311 ...
##  $ ptratio: num  15.3 17.8 17.8 18.7 18.7 18.7 15.2 15.2 15.2 15.2 ...
##  $ black  : num  397 397 393 395 397 ...
##  $ lstat  : num  4.98 9.14 4.03 2.94 5.33 ...
##  $ medv   : num  24 21.6 34.7 33.4 36.2 28.7 22.9 27.1 16.5 18.9 ...
summary(Boston)
##       crim                zn             indus            chas        
##  Min.   : 0.00632   Min.   :  0.00   Min.   : 0.46   Min.   :0.00000  
##  1st Qu.: 0.08204   1st Qu.:  0.00   1st Qu.: 5.19   1st Qu.:0.00000  
##  Median : 0.25651   Median :  0.00   Median : 9.69   Median :0.00000  
##  Mean   : 3.61352   Mean   : 11.36   Mean   :11.14   Mean   :0.06917  
##  3rd Qu.: 3.67708   3rd Qu.: 12.50   3rd Qu.:18.10   3rd Qu.:0.00000  
##  Max.   :88.97620   Max.   :100.00   Max.   :27.74   Max.   :1.00000  
##       nox               rm             age              dis        
##  Min.   :0.3850   Min.   :3.561   Min.   :  2.90   Min.   : 1.130  
##  1st Qu.:0.4490   1st Qu.:5.886   1st Qu.: 45.02   1st Qu.: 2.100  
##  Median :0.5380   Median :6.208   Median : 77.50   Median : 3.207  
##  Mean   :0.5547   Mean   :6.285   Mean   : 68.57   Mean   : 3.795  
##  3rd Qu.:0.6240   3rd Qu.:6.623   3rd Qu.: 94.08   3rd Qu.: 5.188  
##  Max.   :0.8710   Max.   :8.780   Max.   :100.00   Max.   :12.127  
##       rad              tax           ptratio          black       
##  Min.   : 1.000   Min.   :187.0   Min.   :12.60   Min.   :  0.32  
##  1st Qu.: 4.000   1st Qu.:279.0   1st Qu.:17.40   1st Qu.:375.38  
##  Median : 5.000   Median :330.0   Median :19.05   Median :391.44  
##  Mean   : 9.549   Mean   :408.2   Mean   :18.46   Mean   :356.67  
##  3rd Qu.:24.000   3rd Qu.:666.0   3rd Qu.:20.20   3rd Qu.:396.23  
##  Max.   :24.000   Max.   :711.0   Max.   :22.00   Max.   :396.90  
##      lstat            medv      
##  Min.   : 1.73   Min.   : 5.00  
##  1st Qu.: 6.95   1st Qu.:17.02  
##  Median :11.36   Median :21.20  
##  Mean   :12.65   Mean   :22.53  
##  3rd Qu.:16.95   3rd Qu.:25.00  
##  Max.   :37.97   Max.   :50.00
set.seed(2244422)
subset = sample(nrow(Boston), nrow(Boston)*0.75)
btrain = Boston[subset,]
btest = Boston[-subset,]

Best Subset Selection

library(leaps)
fit.bestsub = regsubsets(crim ~ ., data = btrain, nvmax = 13)   # models with up to all 13 predictors
btest.mat = model.matrix(crim ~ ., data = btest)                # (nvmax is not a model.matrix argument)
val.errors = rep(NA, 13)
for (i in 1:13) {
    coefi = coef(fit.bestsub, id = i)
    pred = btest.mat[, names(coefi)] %*% coefi   # regsubsets has no predict() method, so multiply by hand
    val.errors[i] = mean((pred - btest$crim)^2)
}
plot(val.errors, xlab = "Number of predictors", ylab = "Test MSE", pch = 19, type = "b", col = "orange")

which.min(val.errors)
## [1] 12
Bestsub_MSE = val.errors[12]
Bestsub_MSE
## [1] 38.88763
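
For reuse, the manual prediction step in the loop above can be wrapped into a predict method for regsubsets objects (a common ISLR lab idiom; this sketch assumes the model formula is the second element of the stored call, as it is here):

predict.regsubsets = function(object, newdata, id, ...) {
    form = as.formula(object$call[[2]])   # recover the model formula from the call
    mat = model.matrix(form, newdata)
    coefi = coef(object, id = id)
    mat[, names(coefi)] %*% coefi
}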

Ridge Regression

set.seed(2321)
train.mat = model.matrix(crim~., data = btrain)
test.mat = model.matrix(crim~., data = btest)
grid = 10^seq(4, -2, length = 100)

ridge = glmnet(train.mat, btrain$crim, alpha = 0, lambda = grid, thresh = 1e-12)
cv.ridge = cv.glmnet(train.mat, btrain$crim, alpha = 0, lambda = grid, thresh = 1e-12)
bestlam.ridge = cv.ridge$lambda.min
pred.ridge = predict(ridge, s = bestlam.ridge, newx = test.mat)

Ridge_MSE = mean((pred.ridge - btest$crim)^2)
Ridge_MSE
## [1] 38.66517

Lasso

fit.lasso = glmnet(train.mat, btrain$crim, alpha = 1, lambda = grid, thresh = 1e-12)
cv.lasso = cv.glmnet(train.mat, btrain$crim, alpha = 1, lambda = grid, thresh = 1e-12)
bestlam.lasso = cv.lasso$lambda.min

pred.lasso = predict(fit.lasso, s = bestlam.lasso, newx = test.mat)
Lasso_MSE = mean((pred.lasso - btest$crim)^2)
Lasso_MSE
## [1] 39.03718

PCR

set.seed(123)
pcrmodel = pcr(crim~., data = btrain, scale = TRUE, validation = "CV")
validationplot(pcrmodel, val.type = "MSEP")

summary(pcrmodel)
## Data:    X dimension: 379 13 
##  Y dimension: 379 1
## Fit method: svdpc
## Number of components considered: 13
## 
## VALIDATION: RMSEP
## Cross-validated using 10 random segments.
##        (Intercept)  1 comps  2 comps  3 comps  4 comps  5 comps  6 comps
## CV           8.776    7.356    7.331    6.916    6.930    6.927    6.920
## adjCV        8.776    7.351    7.326    6.910    6.925    6.922    6.913
##        7 comps  8 comps  9 comps  10 comps  11 comps  12 comps  13 comps
## CV       6.928    6.770    6.838     6.825     6.838     6.815     6.734
## adjCV    6.921    6.759    6.825     6.812     6.825     6.798     6.717
## 
## TRAINING: % variance explained
##       1 comps  2 comps  3 comps  4 comps  5 comps  6 comps  7 comps  8 comps
## X       47.72    60.18    69.28    76.11    82.78    87.90    90.90    93.32
## crim    31.17    31.66    39.08    39.13    39.24    40.08    40.28    43.21
##       9 comps  10 comps  11 comps  12 comps  13 comps
## X       95.45     97.07     98.41     99.49    100.00
## crim    43.24     43.65     43.67     44.98     46.37
pred.pcr = predict(pcrmodel, btest, ncomp = 8)
PCR_MSE = mean((pred.pcr - btest$crim)^2)
PCR_MSE
## [1] 40.21201

PLS

set.seed(3211)
plsmodel = plsr(crim~., data = btrain, scale = TRUE, validation = "CV")
validationplot(plsmodel, val.type = "MSEP")

summary(plsmodel)
## Data:    X dimension: 379 13 
##  Y dimension: 379 1
## Fit method: kernelpls
## Number of components considered: 13
## 
## VALIDATION: RMSEP
## Cross-validated using 10 random segments.
##        (Intercept)  1 comps  2 comps  3 comps  4 comps  5 comps  6 comps
## CV           8.776    7.126    6.771    6.707    6.677    6.653    6.645
## adjCV        8.776    7.124    6.765    6.695    6.666    6.641    6.632
##        7 comps  8 comps  9 comps  10 comps  11 comps  12 comps  13 comps
## CV       6.632    6.631    6.625     6.627     6.627     6.627     6.627
## adjCV    6.620    6.619    6.614     6.616     6.615     6.615     6.615
## 
## TRAINING: % variance explained
##       1 comps  2 comps  3 comps  4 comps  5 comps  6 comps  7 comps  8 comps
## X       47.32    56.66    62.55    71.27    75.80    78.52    83.85    85.70
## crim    34.67    42.35    44.63    45.43    45.89    46.23    46.29    46.36
##       9 comps  10 comps  11 comps  12 comps  13 comps
## X       88.56     94.09     96.53     98.54    100.00
## crim    46.37     46.37     46.37     46.37     46.37
pred.pls = predict(plsmodel, btest, ncomp = 9)
PLS_MSE = mean((pred.pls - btest$crim)^2)
PLS_MSE 
## [1] 38.90602

Now that I have run through the different models, here are their respective test MSE values:

Test_Error_Values11 = c(Bestsub_MSE, Ridge_MSE, Lasso_MSE, PCR_MSE, PLS_MSE)
names(Test_Error_Values11) = c("Best SS", "ridge", "lasso", "pcr", "pls")
Test_Error_Values11
##  Best SS    ridge    lasso      pcr      pls 
## 38.88763 38.66517 39.03718 40.21201 38.90602

Best Subset MSE = 38.88763

Ridge MSE = 38.66517

Lasso MSE = 39.03718

PCR MSE = 40.21201

PLS MSE = 38.90602

(b) Propose a model (or set of models) that seem to perform well on this data set, and justify your answer. Make sure that you are evaluating model performance using validation set error, crossvalidation, or some other reasonable alternative, as opposed to using training error.

There was not a huge difference among the test error values. Still, if I had to make a recommendation, I would choose the ridge model, since it has the lowest test MSE at 38.66517.
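
For context on scale, a test-set R-squared for the chosen ridge model (reusing btest and Ridge_MSE from above) shows what fraction of the variance in crim it explains:

1 - Ridge_MSE / mean((btest$crim - mean(btest$crim))^2)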

(c) Does your chosen model involve all of the features in the data set? Why or why not?

My chosen model does involve all of the features in the data set. Since I picked the model with the lowest test MSE, I chose ridge regression, which retains every feature by design: the ridge penalty shrinks coefficients toward zero but never sets any of them exactly to zero.
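
This can be verified directly: the ridge fit at the chosen λ has no exactly-zero coefficients (reusing the objects fit above).

predict(ridge, s = bestlam.ridge, type = "coefficients")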