Problem 2 - Part A

The correct selection is III.

Relative to least squares, the best predictive lasso model is achieved when its increase in bias is smaller than its decrease in variance. Because the lasso constrains the coefficient estimates and can set some exactly to zero, removing weakly related predictors increases bias while decreasing variance; the lasso therefore improves on least squares whenever the resulting increase in bias is outweighed by the decrease in variance, as the decomposition below makes explicit.
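This trade-off follows directly from the standard decomposition of the expected test error, restated here for reference:

$$
E\left[\left(y_0 - \hat{f}(x_0)\right)^2\right] = \mathrm{Var}\left(\hat{f}(x_0)\right) + \left[\mathrm{Bias}\left(\hat{f}(x_0)\right)\right]^2 + \mathrm{Var}(\varepsilon)
$$

A more restrictive method beats least squares whenever the reduction in the variance term exceeds the growth in the squared-bias term.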

Problem 2 - Part B

The correct selection is III.

Relative to least squares, the best predictive ridge model is likewise achieved when its increase in bias is smaller than its decrease in variance. The ridge bias/variance trade-off is similar to the lasso's, except that coefficients of predictors with little relationship to the target are shrunk toward zero rather than removed (they typically never equal exactly zero), as the sketch below illustrates.
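A minimal illustration on simulated data (the names x, y, and ridgeDemo are hypothetical, not the College data used later): as lambda grows, every ridge coefficient path shrinks smoothly toward zero without actually reaching it.

library(glmnet)

set.seed(1)
x <- matrix(rnorm(100 * 5), 100, 5)
y <- drop(x %*% c(3, 1.5, 0, 0, 2)) + rnorm(100)
ridgeDemo <- glmnet(x, y, alpha = 0)            # alpha = 0 gives the ridge penalty
plot(ridgeDemo, xvar = "lambda", label = TRUE)  # coefficient paths vs. log(lambda)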

Problem 2 - Part C

The correct selection is II.

Relative to least squares, the best predictive nonlinear method is achieved when its increase in variance is smaller than its decrease in bias. Nonlinear methods are more flexible, so they perform better when the true relationship between the target and the predictors is not linear; the small simulation below illustrates this.
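A minimal simulated sketch of this point (the data and names here are hypothetical, not part of the assignment): when the truth is nonlinear, a flexible smoother tracks it far more closely than a straight line.

set.seed(1)
x <- runif(200, -2, 2)
y <- sin(2 * x) + rnorm(200, sd = 0.3)   # nonlinear truth plus noise
linFit <- lm(y ~ x)                      # high-bias linear fit
splineFit <- smooth.spline(x, y)         # flexible, low-bias fit
# Average squared error against the true signal (a bias proxy):
mean((fitted(linFit) - sin(2 * x))^2)
mean((predict(splineFit, x)$y - sin(2 * x))^2)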

Problem 9 - Part A

In this chunk, I am splitting the college data set in half to create one training set and one validation set.

library(ISLR)

# Load the College data and split it 50/50 into training and test sets
college <- College
attach(college)
set.seed(1)
split <- sample(1:nrow(college), nrow(college)/2)
collegeTrain <- college[split,]
collegeTest <- college[-split,]

Problem 9 - Part B

The test error obtained for the least squares model is 1135758.

# Least squares fit on the training set, evaluated by test MSE
collegeLM <- lm(Apps~., data=collegeTrain)
collegeLM.pred <- predict(collegeLM, collegeTest)
mean((collegeTest$Apps - collegeLM.pred)^2)
## [1] 1135758
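For reference, the quantity computed by the last line above is the test mean squared error:

$$
\mathrm{MSE}_{\text{test}} = \frac{1}{n_{\text{test}}} \sum_{i \in \text{test}} \left(y_i - \hat{y}_i\right)^2
$$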

Problem 9 - Part C

library(glmnet)

# Build model matrices for glmnet and tune the ridge penalty by cross-validation
collegeTrain.matrix <- model.matrix(Apps~., data=collegeTrain)
collegeTest.matrix <- model.matrix(Apps~., data=collegeTest)
grid <- 10^seq(4, -2, length=100)
collegeRidge <- cv.glmnet(collegeTrain.matrix, collegeTrain$Apps, alpha=0, lambda=grid, thresh=1e-12)
bestLambda.ridge <- collegeRidge$lambda.min
bestLambda.ridge
## [1] 0.01
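Note that the selected lambda (0.01) is the smallest value in the grid, meaning cross-validation favors the lightest penalty available. An optional follow-up, sketched here rather than taken from the original run (grid2 and collegeRidge2 are new names), would extend the grid downward to confirm this:

grid2 <- 10^seq(4, -4, length = 100)
collegeRidge2 <- cv.glmnet(collegeTrain.matrix, collegeTrain$Apps,
                           alpha = 0, lambda = grid2, thresh = 1e-12)
collegeRidge2$lambda.min  # if this again lands at the boundary, CV favors ~no penalty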

The test error obtained for the ridge regression model is 1135714, essentially identical to least squares; with such a small penalty, ridge barely shrinks the coefficients.

collegeRidge.pred <- predict(collegeRidge, newx=collegeTest.matrix, s=bestLambda.ridge)
mean((collegeTest$Apps - collegeRidge.pred)^2)
## [1] 1135714

Problem 9 - Part D

collegeLasso <- cv.glmnet(collegeTrain.matrix, collegeTrain$Apps, alpha=1, lambda=grid, thresh=1e-12)
bestLambda.lasso <- collegeLasso$lambda.min
bestLambda.lasso
## [1] 0.01

The test error obtained for the lasso model is 1135660. As with ridge, the selected lambda sits at the bottom of the grid, so the fit is again close to least squares.

collegeLasso.pred <- predict(collegeLasso, newx=collegeTest.matrix, s=bestLambda.lasso)
mean((collegeTest$Apps - collegeLasso.pred)^2)
## [1] 1135660

Excluding the intercept, 16 coefficients do not equal zero.

collegeLasso.coef <- predict(collegeLasso, s=bestLambda.lasso, type="coefficients")[1:ncol(college),]
collegeLasso.coef[collegeLasso.coef!=0]
##   (Intercept)    PrivateYes        Accept        Enroll     Top10perc 
## -7.900363e+02 -3.070103e+02  1.779328e+00 -1.469508e+00  6.672214e+01 
##     Top25perc   F.Undergrad   P.Undergrad      Outstate    Room.Board 
## -2.230442e+01  9.258974e-02  9.408838e-03 -1.083495e-01  2.115147e-01 
##         Books      Personal           PhD      Terminal     S.F.Ratio 
##  2.912105e-01  6.120406e-03 -1.547200e+01  6.409503e+00  2.282638e+01 
##   perc.alumni        Expend 
##  1.130498e+00  4.856697e-02

Problem 9 - Part E

library(pls)

# Principal components regression with 10-fold CV on the training set
set.seed(1)
collegePCR <- pcr(Apps~., data=collegeTrain, scale=TRUE, validation="CV")
validationplot(collegePCR, val.type="MSEP")

summary(collegePCR)
## Data:    X dimension: 388 17 
##  Y dimension: 388 1
## Fit method: svdpc
## Number of components considered: 17
## 
## VALIDATION: RMSEP
## Cross-validated using 10 random segments.
##        (Intercept)  1 comps  2 comps  3 comps  4 comps  5 comps  6 comps
## CV            4288     4006     2373     2372     2069     1961     1919
## adjCV         4288     4007     2368     2369     1999     1948     1911
##        7 comps  8 comps  9 comps  10 comps  11 comps  12 comps  13 comps
## CV        1919     1921     1876      1832      1832      1836      1837
## adjCV     1912     1915     1868      1821      1823      1827      1827
##        14 comps  15 comps  16 comps  17 comps
## CV         1853      1759      1341      1270
## adjCV      1850      1733      1326      1257
## 
## TRAINING: % variance explained
##       1 comps  2 comps  3 comps  4 comps  5 comps  6 comps  7 comps  8 comps
## X       32.20    57.78    65.31    70.99    76.37    81.27     84.8    87.85
## Apps    13.44    70.93    71.07    79.87    81.15    82.25     82.3    82.33
##       9 comps  10 comps  11 comps  12 comps  13 comps  14 comps  15 comps
## X       90.62     92.91     94.98     96.74     97.79     98.72     99.42
## Apps    83.38     84.76     84.80     84.84     85.11     85.14     90.55
##       16 comps  17 comps
## X        99.88    100.00
## Apps     93.42     93.89

The value of M selected by cross-validation is 17 (min CV is 1270). Since M = 17 uses all of the components, this PCR fit is equivalent to least squares, which is why its test error, 1135758, exactly matches the least squares model; a quick check below confirms the equivalence.

collegePCR.pred <- predict(collegePCR, collegeTest, ncomp=17)
mean((collegeTest$Apps - collegePCR.pred)^2)
## [1] 1135758
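A minimal sanity check of that equivalence (assuming the prediction objects from Parts B and E are still in the workspace): with all 17 components, the PCR predictions should match the least squares predictions up to numerical error.

# Should return TRUE (or report only a tiny mean relative difference)
all.equal(as.numeric(collegePCR.pred), as.numeric(collegeLM.pred))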

Problem 9 - Part F

set.seed(1)
collegePLS <- plsr(Apps~., data=collegeTrain, scale=TRUE, validation="CV")
validationplot(collegePLS, val.type="MSEP")

summary(collegePLS)
## Data:    X dimension: 388 17 
##  Y dimension: 388 1
## Fit method: kernelpls
## Number of components considered: 17
## 
## VALIDATION: RMSEP
## Cross-validated using 10 random segments.
##        (Intercept)  1 comps  2 comps  3 comps  4 comps  5 comps  6 comps
## CV            4288     2217     2019     1761     1630     1533     1347
## adjCV         4288     2211     2012     1749     1605     1510     1331
##        7 comps  8 comps  9 comps  10 comps  11 comps  12 comps  13 comps
## CV        1309     1303     1286      1283      1283      1277      1271
## adjCV     1296     1289     1273      1270      1270      1264      1258
##        14 comps  15 comps  16 comps  17 comps
## CV         1270      1270      1270      1270
## adjCV      1258      1257      1257      1257
## 
## TRAINING: % variance explained
##       1 comps  2 comps  3 comps  4 comps  5 comps  6 comps  7 comps  8 comps
## X       27.21    50.73    63.06    65.52    70.20    74.20    78.62    80.81
## Apps    75.39    81.24    86.97    91.14    92.62    93.43    93.56    93.68
##       9 comps  10 comps  11 comps  12 comps  13 comps  14 comps  15 comps
## X       83.29     87.17     89.15     91.37     92.58     94.42     96.98
## Apps    93.76     93.79     93.83     93.86     93.88     93.89     93.89
##       16 comps  17 comps
## X        98.78    100.00
## Apps     93.89     93.89

The value of M selected by cross-validation is 15 (min CV is 1270 and min adjusted CV is 1257). The test error obtained for the PLS model is 1135806.

collegePLS.pred <- predict(collegePLS, collegeTest, ncomp=15)
mean((collegeTest$Apps - collegePLS.pred)^2)
## [1] 1135806
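Rather than reading M off the CV table by eye, the pls package can also select it automatically; a short optional check, not part of the original analysis:

# "onesigma" picks the smallest M whose CV error is within one standard
# error of the overall minimum
selectNcomp(collegePLS, method = "onesigma", plot = TRUE)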

Problem 9 - Part G

# Test R^2 = 1 - RSS/TSS, with TSS computed around the mean of the test response
collegeTest.avg <- mean(collegeTest$Apps)
collegeLM.r2 <- 1 - mean((collegeLM.pred - collegeTest$Apps)^2) / mean((collegeTest.avg - collegeTest$Apps)^2)
collegeRidge.r2 <- 1 - mean((collegeRidge.pred - collegeTest$Apps)^2) / mean((collegeTest.avg - collegeTest$Apps)^2)
collegeLasso.r2 <- 1 - mean((collegeLasso.pred - collegeTest$Apps)^2) / mean((collegeTest.avg - collegeTest$Apps)^2)
collegePCR.r2 <- 1 - mean((collegePCR.pred - collegeTest$Apps)^2) / mean((collegeTest.avg - collegeTest$Apps)^2)
collegePLS.r2 <- 1 - mean((collegePLS.pred - collegeTest$Apps)^2) / mean((collegeTest.avg - collegeTest$Apps)^2)

The test R-squared values for all of the models are highly similar, though the lasso model appears to provide the best performance, with a slightly higher value than the other four models (0.9015499, i.e., about 90.15% of the variance explained).

collegeLM.r2
## [1] 0.9015413
collegeRidge.r2
## [1] 0.9015452
collegeLasso.r2
## [1] 0.9015499
collegePCR.r2
## [1] 0.9015413
collegePLS.r2
## [1] 0.9015372

Problem 11 - Part A

library(MASS)

# Load the Boston data (crim, the per-capita crime rate, is the response)
boston <- Boston
attach(boston)

In this chunk, I am splitting the boston data set in half to create one training set and one validation set.

set.seed(1)
# Split Boston 50/50 and build model matrices for glmnet,
# dropping the intercept column that model.matrix adds
split.2 <- sample(1:nrow(boston), nrow(boston)/2)
bostonTrain <- boston[split.2,]
bostonTest <- boston[-split.2,]
bostonTrain.matrix <- model.matrix(crim~., data=bostonTrain)[,-1]
bostonTest.matrix <- model.matrix(crim~., data=bostonTest)[,-1]

Lasso Regression

# alpha = 1 selects the lasso penalty; lambda is tuned by 10-fold CV
bostonLasso <- cv.glmnet(bostonTrain.matrix, bostonTrain$crim, alpha=1)
bestLambda.boston.lasso <- bostonLasso$lambda.min
bestLambda.boston.lasso
## [1] 0.06805595
plot(bostonLasso)

The test error obtained on the held-out validation set is 40.90173 (cross-validation was used only to select lambda).

bostonLasso.pred <- predict(bostonLasso, newx=bostonTest.matrix, s=bestLambda.boston.lasso)
mean((bostonTest$crim - bostonLasso.pred)^2)
## [1] 40.90173

Two coefficients, age and tax, are shrunk to exactly zero; eleven coefficients remain nonzero.

bostonLasso.coef <- predict(bostonLasso, s=bestLambda.boston.lasso, type="coefficients")[1:13,]
bostonLasso.coef
## (Intercept)          zn       indus        chas         nox          rm 
## 17.65005511  0.03516255 -0.11838293 -0.43135144 -7.19578178  0.04271112 
##         age         dis         rad         tax     ptratio       black 
##  0.00000000 -0.76801501  0.52430211  0.00000000 -0.35072332 -0.01307754 
##       lstat 
##  0.25559458
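One optional refinement, sketched here rather than taken from the original analysis: cv.glmnet also records lambda.1se, the largest lambda whose CV error is within one standard error of the minimum, which typically yields a sparser, more conservative model.

# Coefficients under the more conservative one-standard-error rule
coef(bostonLasso, s = bostonLasso$lambda.1se)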

Ridge Regression

# alpha = 0 selects the ridge penalty; lambda is tuned by 10-fold CV
bostonRidge <- cv.glmnet(bostonTrain.matrix, bostonTrain$crim, alpha=0)
bestLambda.boston.ridge <- bostonRidge$lambda.min  # take lambda from the ridge fit, not the lasso fit
bestLambda.boston.ridge
plot(bostonRidge)

The test error obtained on the held-out validation set is 40.92777.

bostonRidge.pred <- predict(bostonRidge, newx=bostonTest.matrix, s=bestLambda.boston.ridge)
mean((bostonTest$crim - bostonRidge.pred)^2)
## [1] 40.92777

As expected, ridge regression does not set any coefficients exactly to zero; all of them remain in the model.

bostonRidge.coef <- predict(bostonRidge, s=bestLambda.boston.ridge, type="coefficients")[1:13,]
bostonRidge.coef
##  (Intercept)           zn        indus         chas          nox           rm 
## 14.702068274  0.035283661 -0.119976460 -0.616052143 -5.629356968  0.228001208 
##          age          dis          rad          tax      ptratio        black 
## -0.004314219 -0.768474301  0.434236779  0.003139323 -0.298647696 -0.013823430 
##        lstat 
##  0.262179350
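For a quick side-by-side look at how the two penalties treat each predictor, the two coefficient vectors extracted above can be bound into one table:

# Lasso zeroes out age and tax; ridge merely shrinks them
round(cbind(lasso = bostonLasso.coef, ridge = bostonRidge.coef), 4)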

Principal Components Regression

set.seed(1)
bostonPCR <- pcr(crim~., data=bostonTrain, scale=TRUE, validation="CV")
validationplot(bostonPCR, val.type="MSEP")

summary(bostonPCR)
## Data:    X dimension: 253 13 
##  Y dimension: 253 1
## Fit method: svdpc
## Number of components considered: 13
## 
## VALIDATION: RMSEP
## Cross-validated using 10 random segments.
##        (Intercept)  1 comps  2 comps  3 comps  4 comps  5 comps  6 comps
## CV           9.275    7.555    7.549    7.093    6.926    6.932    6.977
## adjCV        9.275    7.550    7.544    7.088    6.920    6.926    6.968
##        7 comps  8 comps  9 comps  10 comps  11 comps  12 comps  13 comps
## CV       6.976    6.871    6.845     6.848     6.797     6.764     6.700
## adjCV    6.967    6.859    6.843     6.842     6.786     6.751     6.686
## 
## TRAINING: % variance explained
##       1 comps  2 comps  3 comps  4 comps  5 comps  6 comps  7 comps  8 comps
## X       48.51     60.4    69.86    77.08    82.80    87.68    91.24    93.56
## crim    34.94     35.2    42.83    45.47    45.57    45.58    45.75    47.59
##       9 comps  10 comps  11 comps  12 comps  13 comps
## X       95.47     97.08     98.48     99.54    100.00
## crim    47.68     48.75     49.31     50.14     51.37

The value of M selected by cross-validation is 13 (min CV is 6.700). Since M = 13 uses all of the components, this PCR fit is equivalent to least squares. The test error obtained for this model is 41.54639.

bostonPCR.pred <- predict(bostonPCR, bostonTest, ncomp=13)
mean((bostonTest$crim - bostonPCR.pred)^2)
## [1] 41.54639

Partial Least Squares

set.seed(1)
bostonPLS <- plsr(crim~., data=bostonTrain, scale=TRUE, validation="CV")
validationplot(bostonPLS, val.type="MSEP")

summary(bostonPLS)
## Data:    X dimension: 253 13 
##  Y dimension: 253 1
## Fit method: kernelpls
## Number of components considered: 13
## 
## VALIDATION: RMSEP
## Cross-validated using 10 random segments.
##        (Intercept)  1 comps  2 comps  3 comps  4 comps  5 comps  6 comps
## CV           9.275    7.328    6.842    6.784    6.741    6.718    6.695
## adjCV        9.275    7.322    6.834    6.768    6.728    6.707    6.683
##        7 comps  8 comps  9 comps  10 comps  11 comps  12 comps  13 comps
## CV       6.722    6.707    6.696     6.701     6.700     6.700     6.700
## adjCV    6.706    6.693    6.683     6.688     6.686     6.686     6.686
## 
## TRAINING: % variance explained
##       1 comps  2 comps  3 comps  4 comps  5 comps  6 comps  7 comps  8 comps
## X       48.10    57.46    62.68    69.65    77.95    80.82    84.40    87.65
## crim    38.94    47.55    49.73    50.61    50.85    51.17    51.29    51.35
##       9 comps  10 comps  11 comps  12 comps  13 comps
## X       90.47     93.07     96.66     98.26    100.00
## crim    51.37     51.37     51.37     51.37     51.37

The value of M selected by cross-validation is 6 (min CV is 6.695). The test error obtained for this model is 41.78158.

bostonPLS.pred <- predict(bostonPLS, bostonTest, ncomp=6)
mean((bostonTest$crim - bostonPLS.pred)^2)
## [1] 41.78158

Problem 11 - Part B

The test errors obtained for the four models I fit are as follows:

Lasso: 40.90173
Ridge: 40.92777
PCR: 41.54639
PLS: 41.78158

Based on these results, the lasso model appears to perform best for this data set: it has the lowest test error (although all four test errors are similar) and it eliminates two of the thirteen predictors. A compact way to gather the comparison is sketched below.
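A minimal sketch collecting the four test errors into one sorted vector (testErrors is a new name; this assumes the prediction objects from Part A are still in the workspace):

testErrors <- c(
  Lasso = mean((bostonTest$crim - bostonLasso.pred)^2),
  Ridge = mean((bostonTest$crim - bostonRidge.pred)^2),
  PCR   = mean((bostonTest$crim - bostonPCR.pred)^2),
  PLS   = mean((bostonTest$crim - bostonPLS.pred)^2)
)
sort(testErrors)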

Problem 11 - Part C

The model I chose does not contain all of the features in the data set: the lasso shrinks two coefficients (age and tax) exactly to zero, so those two predictors are eliminated, leaving eleven nonzero coefficients in the model.