Problem 2 - Part A
The correct selection is III.
The best predictive lasso model, relative to least squares, is achieved when its increase in bias is less than its decrease in variance. Since the number of predictors in the model is limited by the lasso technique, bias increases and variance decreases as insignificant predictors are removed. The best predictive lasso model is achieved when the resulting increase in bias is less than the decrease in variance.
Problem 2 - Part B
The correct selection is III.
The best predictive ridge model, relative to least squares, is achieved when its increase in bias is less than its decrease in variance. The ridge regression bias/variance trade is similar to the lasso technique, expect that variables will be shrunk when they do not have a significant relationship with the target (they typically will not equal zero).
Problem 2 - Part C
The correct selection is II.
The best predictive nonlinear method is achieved when its increase in variance is less than its decrease in bias. In the case of nonlinear methods, the model will achieve better performance when the relationship between the target and the predictors is not linear.
Problem 9 - Part A
In this chunk, I am splitting the college data set in half to create one training set and one validation set.
college <- College
attach(college)
set.seed(1)
split <- sample(1:nrow(college), nrow(college)/2)
collegeTrain <- college[split,]
collegeTest <- college[-split,]
Problem 9 - Part B
The test error obtained for the least squares model is 1135758.
collegeLM <- lm(Apps~., data=collegeTrain)
collegeLM.pred <- predict(collegeLM, collegeTest)
mean((collegeTest$Apps - collegeLM.pred)^2)
## [1] 1135758
Problem 9 - Part C
collegeTrain.matrix <- model.matrix(Apps~., data=collegeTrain)
collegeTest.matrix <- model.matrix(Apps~., data=collegeTest)
grid = 10^seq(4,-2, length=100)
collegeRidge <- cv.glmnet(collegeTrain.matrix, collegeTrain$Apps, alpha=0, lambda=grid, thresh=1e-12)
bestLambda.ridge <- collegeRidge$lambda.min
bestLambda.ridge
## [1] 0.01
The test error obtained for the ridge regression model is 1135714.
collegeRidge.pred <- predict(collegeRidge, newx=collegeTest.matrix, s=bestLambda.ridge)
mean((collegeTest$Apps - collegeRidge.pred)^2)
## [1] 1135714
Problem 9 - Part D
collegeLasso <- cv.glmnet(collegeTrain.matrix, collegeTrain$Apps, alpha=1, lambda=grid, thresh=1e-12)
bestLambda.lasso <- collegeLasso$lambda.min
bestLambda.lasso
## [1] 0.01
The test error obtained for the lasso model is 1135660.
collegeLasso.pred <- predict(collegeLasso, newx=collegeTest.matrix, s=bestLambda.lasso)
mean((collegeTest$Apps - collegeLasso.pred)^2)
## [1] 1135660
16 coefficients do not equal zero.
collegeLasso.coef <- predict(collegeLasso, s=bestLambda.lasso, type="coefficients")[1:ncol(college),]
collegeLasso.coef[collegeLasso.coef!=0]
## (Intercept) PrivateYes Accept Enroll Top10perc
## -7.900363e+02 -3.070103e+02 1.779328e+00 -1.469508e+00 6.672214e+01
## Top25perc F.Undergrad P.Undergrad Outstate Room.Board
## -2.230442e+01 9.258974e-02 9.408838e-03 -1.083495e-01 2.115147e-01
## Books Personal PhD Terminal S.F.Ratio
## 2.912105e-01 6.120406e-03 -1.547200e+01 6.409503e+00 2.282638e+01
## perc.alumni Expend
## 1.130498e+00 4.856697e-02
Problem 9 - Part E
set.seed(1)
collegePCR <- pcr(Apps~., data=collegeTrain, scale=TRUE, validation="CV")
validationplot(collegePCR, val.type="MSEP")
summary(collegePCR)
## Data: X dimension: 388 17
## Y dimension: 388 1
## Fit method: svdpc
## Number of components considered: 17
##
## VALIDATION: RMSEP
## Cross-validated using 10 random segments.
## (Intercept) 1 comps 2 comps 3 comps 4 comps 5 comps 6 comps
## CV 4288 4006 2373 2372 2069 1961 1919
## adjCV 4288 4007 2368 2369 1999 1948 1911
## 7 comps 8 comps 9 comps 10 comps 11 comps 12 comps 13 comps
## CV 1919 1921 1876 1832 1832 1836 1837
## adjCV 1912 1915 1868 1821 1823 1827 1827
## 14 comps 15 comps 16 comps 17 comps
## CV 1853 1759 1341 1270
## adjCV 1850 1733 1326 1257
##
## TRAINING: % variance explained
## 1 comps 2 comps 3 comps 4 comps 5 comps 6 comps 7 comps 8 comps
## X 32.20 57.78 65.31 70.99 76.37 81.27 84.8 87.85
## Apps 13.44 70.93 71.07 79.87 81.15 82.25 82.3 82.33
## 9 comps 10 comps 11 comps 12 comps 13 comps 14 comps 15 comps
## X 90.62 92.91 94.98 96.74 97.79 98.72 99.42
## Apps 83.38 84.76 84.80 84.84 85.11 85.14 90.55
## 16 comps 17 comps
## X 99.88 100.00
## Apps 93.42 93.89
The value of M selected by cross-validation is 17 (min CV is 1270). The test error obtained for the PCR model is 1135758.
collegePCR.pred <- predict(collegePCR, collegeTest, ncomp=17)
mean((collegeTest$Apps - collegePCR.pred)^2)
## [1] 1135758
Problem 9 - Part F
set.seed(1)
collegePLS <- plsr(Apps~., data=collegeTrain, scale=TRUE, validation="CV")
validationplot(collegePLS, val.type="MSEP")
summary(collegePLS)
## Data: X dimension: 388 17
## Y dimension: 388 1
## Fit method: kernelpls
## Number of components considered: 17
##
## VALIDATION: RMSEP
## Cross-validated using 10 random segments.
## (Intercept) 1 comps 2 comps 3 comps 4 comps 5 comps 6 comps
## CV 4288 2217 2019 1761 1630 1533 1347
## adjCV 4288 2211 2012 1749 1605 1510 1331
## 7 comps 8 comps 9 comps 10 comps 11 comps 12 comps 13 comps
## CV 1309 1303 1286 1283 1283 1277 1271
## adjCV 1296 1289 1273 1270 1270 1264 1258
## 14 comps 15 comps 16 comps 17 comps
## CV 1270 1270 1270 1270
## adjCV 1258 1257 1257 1257
##
## TRAINING: % variance explained
## 1 comps 2 comps 3 comps 4 comps 5 comps 6 comps 7 comps 8 comps
## X 27.21 50.73 63.06 65.52 70.20 74.20 78.62 80.81
## Apps 75.39 81.24 86.97 91.14 92.62 93.43 93.56 93.68
## 9 comps 10 comps 11 comps 12 comps 13 comps 14 comps 15 comps
## X 83.29 87.17 89.15 91.37 92.58 94.42 96.98
## Apps 93.76 93.79 93.83 93.86 93.88 93.89 93.89
## 16 comps 17 comps
## X 98.78 100.00
## Apps 93.89 93.89
The value of M selected by cross-validation is 15 (min CV is 1270 and min adjusted CV is 1257). The test error obtained for the PCR model is 1135806.
collegePLS.pred <- predict(collegePLS, collegeTest, ncomp=15)
mean((collegeTest$Apps - collegePLS.pred)^2)
## [1] 1135806
Problem 9 - Part G
collegeTest.avg <- mean(collegeTest$Apps)
collegeLM.r2 <- 1 - mean((collegeLM.pred - collegeTest$Apps)^2)/mean((collegeTest.avg - collegeTest$Apps)^2)
collegeRidge.r2 <- 1 - mean((collegeRidge.pred - collegeTest$Apps)^2) / mean((collegeTest.avg - collegeTest$Apps)^2)
collegeLasso.r2 <- 1 - mean((collegeLasso.pred - collegeTest$Apps)^2) / mean((collegeTest.avg - collegeTest$Apps)^2)
collegePCR.r2 <- 1 - mean((collegePCR.pred - collegeTest$Apps)^2) / mean((collegeTest.avg - collegeTest$Apps)^2)
collegePLS.r2 <- 1 - mean((collegePLS.pred - collegeTest$Apps)^2) / mean((collegeTest.avg - collegeTest$Apps)^2)
The test R-square value for all of the models are highly similar, though it appears that the lasso model provides the best performance since it has a slightly higher value than the other four models (0.9015499 or 90.15499% of the variance explained).
collegeLM.r2
## [1] 0.9015413
collegeRidge.r2
## [1] 0.9015452
collegeLasso.r2
## [1] 0.9015499
collegePCR.r2
## [1] 0.9015413
collegePLS.r2
## [1] 0.9015372
Problem 11 - Part A
boston <- Boston
attach(boston)
In this chunk, I am splitting the boston data set in half to create one training set and one validation set.
set.seed(1)
split.2 <- sample(1:nrow(boston), nrow(boston)/2)
bostonTrain <- boston[split.2,]
bostonTest <- boston[-split.2,]
bostonTrain.matrix <- model.matrix(crim~., data=bostonTrain)[,-1]
bostonTest.matrix <- model.matrix(crim~., data=bostonTest)[,-1]
Lasso Regression
bostonLasso <- cv.glmnet(bostonTrain.matrix, bostonTrain$crim, alpha=1)
bestLambda.boston.lasso <- bostonLasso$lambda.min
bestLambda.boston.lasso
## [1] 0.06805595
plot(bostonLasso)
The cross-validation estimate for the test error is 40.90173.
bostonLasso.pred <- predict(bostonLasso, newx=bostonTest.matrix, s=bestLambda.boston.lasso)
mean((bostonTest$crim - bostonLasso.pred)^2)
## [1] 40.90173
11 coefficients in this model do not equal zero.
bostonLasso.coef <- predict(bostonLasso, s=bestLambda.boston.lasso, type="coefficients")[1:13,]
bostonLasso.coef
## (Intercept) zn indus chas nox rm
## 17.65005511 0.03516255 -0.11838293 -0.43135144 -7.19578178 0.04271112
## age dis rad tax ptratio black
## 0.00000000 -0.76801501 0.52430211 0.00000000 -0.35072332 -0.01307754
## lstat
## 0.25559458
Ridge Regression
bostonRidge <- cv.glmnet(bostonTrain.matrix, bostonTrain$crim, alpha=0)
bestLambda.boston.ridge <- bostonLasso$lambda.min
bestLambda.boston.ridge
## [1] 0.06805595
plot(bostonRidge)
The cross-validation estimate for the test error is 40.92777.
bostonRidge.pred <- predict(bostonRidge, newx=bostonTest.matrix, s=bestLambda.boston.ridge)
mean((bostonTest$crim - bostonRidge.pred)^2)
## [1] 40.92777
11 coefficients in this model do not equal zero.
bostonRidge.coef <- predict(bostonRidge, s=bestLambda.boston.ridge, type="coefficients")[1:13,]
bostonRidge.coef
## (Intercept) zn indus chas nox rm
## 14.702068274 0.035283661 -0.119976460 -0.616052143 -5.629356968 0.228001208
## age dis rad tax ptratio black
## -0.004314219 -0.768474301 0.434236779 0.003139323 -0.298647696 -0.013823430
## lstat
## 0.262179350
Principal Components Regression
set.seed(1)
bostonPCR <- pcr(crim~., data=bostonTrain, scale=TRUE, validation="CV")
validationplot(bostonPCR, val.type="MSEP")
summary(bostonPCR)
## Data: X dimension: 253 13
## Y dimension: 253 1
## Fit method: svdpc
## Number of components considered: 13
##
## VALIDATION: RMSEP
## Cross-validated using 10 random segments.
## (Intercept) 1 comps 2 comps 3 comps 4 comps 5 comps 6 comps
## CV 9.275 7.555 7.549 7.093 6.926 6.932 6.977
## adjCV 9.275 7.550 7.544 7.088 6.920 6.926 6.968
## 7 comps 8 comps 9 comps 10 comps 11 comps 12 comps 13 comps
## CV 6.976 6.871 6.845 6.848 6.797 6.764 6.700
## adjCV 6.967 6.859 6.843 6.842 6.786 6.751 6.686
##
## TRAINING: % variance explained
## 1 comps 2 comps 3 comps 4 comps 5 comps 6 comps 7 comps 8 comps
## X 48.51 60.4 69.86 77.08 82.80 87.68 91.24 93.56
## crim 34.94 35.2 42.83 45.47 45.57 45.58 45.75 47.59
## 9 comps 10 comps 11 comps 12 comps 13 comps
## X 95.47 97.08 98.48 99.54 100.00
## crim 47.68 48.75 49.31 50.14 51.37
The value of M selected by cross-validation is 13 (min CV is 6.700). The test error obtained for this model is 41.54639.
bostonPCR.pred <- predict(bostonPCR, bostonTest, ncomp=13)
mean((bostonTest$crim - bostonPCR.pred)^2)
## [1] 41.54639
Partial Least Squares
set.seed(1)
bostonPLS <- plsr(crim~., data=bostonTrain, scale=TRUE, validation="CV")
validationplot(bostonPLS, val.type="MSEP")
summary(bostonPLS)
## Data: X dimension: 253 13
## Y dimension: 253 1
## Fit method: kernelpls
## Number of components considered: 13
##
## VALIDATION: RMSEP
## Cross-validated using 10 random segments.
## (Intercept) 1 comps 2 comps 3 comps 4 comps 5 comps 6 comps
## CV 9.275 7.328 6.842 6.784 6.741 6.718 6.695
## adjCV 9.275 7.322 6.834 6.768 6.728 6.707 6.683
## 7 comps 8 comps 9 comps 10 comps 11 comps 12 comps 13 comps
## CV 6.722 6.707 6.696 6.701 6.700 6.700 6.700
## adjCV 6.706 6.693 6.683 6.688 6.686 6.686 6.686
##
## TRAINING: % variance explained
## 1 comps 2 comps 3 comps 4 comps 5 comps 6 comps 7 comps 8 comps
## X 48.10 57.46 62.68 69.65 77.95 80.82 84.40 87.65
## crim 38.94 47.55 49.73 50.61 50.85 51.17 51.29 51.35
## 9 comps 10 comps 11 comps 12 comps 13 comps
## X 90.47 93.07 96.66 98.26 100.00
## crim 51.37 51.37 51.37 51.37 51.37
The value of M selected by cross-validation is 6 (min CV is 6.695). The test error obtained for this model is 41.78158.
bostonPLS.pred <- predict(bostonPLS, bostonTest, ncomp=6)
mean((bostonTest$crim - bostonPLS.pred)^2)
## [1] 41.78158
Problem 11 - Part B
The test errors obtained for the four models I performed are as follows:
Lasso: 40.90173 Ridge: 40.92777 PCR: 41.54639 PLS: 41.78158
Based on these results, I am led to believe that the lasso model performs best for this data set since it has the lowest test error (although the test error for all four models are similar) and it eliminates two of the thirteen predictors.
Problem 11 - Part C
The model I chose does not contain all of the features in the data set. Two coefficients in the lasso model are equal to zero and can therefore be eliminated. Consequently, this leaves eleven coefficients in the model which do not equal zero.