For parts (a) through (c), indicate which of i. through iv. is correct. Justify your answer.
(a) The lasso, relative to least squares, is: iii. Less flexible and hence will give improved prediction accuracy when its increase in bias is less than its decrease in variance.
(b) Repeat (a) for ridge regression relative to least squares.
- iii. Less flexible and hence will give improved prediction accuracy when its increase in bias is less than its decrease in variance. As the shrinkage parameter increases, ridge regression becomes less flexible: bias rises while variance falls, so ridge improves on least squares whenever the increase in bias is smaller than the decrease in variance. Least squares itself has low bias but high variance, especially when the number of observations is close to the number of predictors, so its prediction accuracy can be poor.
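Both (a) and (b) rest on the bias-variance decomposition of the expected test error at a point $x_0$ (a standard identity, stated here for reference):

$$
E\left[(y_0 - \hat{f}(x_0))^2\right] = \mathrm{Var}\left(\hat{f}(x_0)\right) + \left[\mathrm{Bias}\left(\hat{f}(x_0)\right)\right]^2 + \mathrm{Var}(\varepsilon)
$$

Shrinking coefficients lowers the variance term while raising the squared bias, so the test error improves exactly when the variance reduction outweighs the added squared bias.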
(c) Repeat (a) for non-linear methods relative to least squares.
- ii. More flexible and hence will give improved prediction accuracy when its increase in variance is less than its decrease in bias. Non-linear methods relax the linearity assumption, lowering bias at the cost of higher variance.
College data set
data("College")
sum(is.na(College))
## [1] 0
set.seed(52)
train = sample(1:nrow(College), nrow(College) / 2)  # 50/50 train/test split
train_col = College[train, ]
test_col = College[-train, ]
lm.fit = lm(Apps ~ ., data = train_col)
pred.lm = predict(lm.fit, newdata = test_col)
lm.mse = mean((test_col$Apps - pred.lm)^2)  # test MSE for least squares
lm.mse
## [1] 1338031
train_mat = model.matrix(Apps ~ ., train_col)[, -1]
test.mat = model.matrix(Apps ~ ., test_col)[, -1]
train_vec = train_col$Apps
test_vec = test_col$Apps
grid <- 10^seq(10, -2, length = 1000)  # lambda grid from 10^10 down to 10^-2
ridge.mod = cv.glmnet(train_mat, train_vec, alpha = 0, lambda = grid, thresh = 1e-12)  # alpha = 0 gives ridge
best_lam = ridge.mod$lambda.min
best_lam
## [1] 20.6688
ridge.pred <- predict(ridge.mod, s = best_lam, newx = test.mat)
ridge.mse = mean((ridge.pred - test_vec)^2)
ridge.mse
## [1] 1430793
The ridge regression test MSE (1430793) is higher than the OLS test MSE (1338031).
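For reference, the ridge coefficient estimates at the CV-chosen lambda can be extracted from the same fit, mirroring the lasso step below (a short sketch; output not shown):

ridge.coef <- predict(ridge.mod, type = "coefficients", s = best_lam)[1:18, ]
ridge.coef  # unlike the lasso, ridge shrinks coefficients but sets none exactly to zero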
lasso.mod = cv.glmnet(train_mat, train_vec, alpha = 1, lambda = grid, thresh = 1e-12)  # alpha = 1 gives the lasso
best_lam_lasso = lasso.mod$lambda.min
best_lam_lasso
## [1] 12.9155
lasso.pred <- predict(lasso.mod, s = best_lam_lasso, newx = test.mat)
lasso.mse = mean((lasso.pred - test_vec)^2)
lasso.mse
## [1] 1416085
lasso.coef <- predict(lasso.mod, type = "coefficients", s = best_lam_lasso)[1:18, ]
lasso.coef
## (Intercept) PrivateYes Accept Enroll Top10perc
## -781.78084063 -583.76048433 1.18036855 0.00000000 26.44871883
## Top25perc F.Undergrad P.Undergrad Outstate Room.Board
## 0.00000000 0.05690070 0.00000000 -0.04118382 0.15967536
## Books Personal PhD Terminal S.F.Ratio
## -0.15706118 0.03455801 -6.73033624 -5.60584788 12.74968758
## perc.alumni Expend Grad.Rate
## -10.91308670 0.10784044 9.39631529
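From the output above, Enroll, Top25perc, and P.Undergrad are shrunk exactly to zero. A quick count of the predictors the lasso keeps (using lasso.coef from above; output not shown):

sum(lasso.coef[-1] != 0)  # number of nonzero coefficients, excluding the intercept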
set.seed(1)
pcr.fit <- pcr(Apps ~ ., data = train_col, scale = TRUE, validation = "CV")
validationplot(pcr.fit, val.type="MSEP")
pcr.pred = predict(pcr.fit, test.mat, ncomp = 9)  # ncomp = 9 chosen from the validation plot
pcr.mse = mean((test_vec - pcr.pred)^2)
pcr.mse
## [1] 3048364
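Here ncomp = 9 was read off the validation plot. As an alternative, the pls package provides selectNcomp() to choose the number of components programmatically; a sketch using the one-sigma heuristic (it may suggest a different value than the eyeballed 9):

selectNcomp(pcr.fit, method = "onesigma", plot = FALSE)  # CV-based component choice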
set.seed(1)
pls.fit <- plsr(Apps ~ ., data = train_col, scale = TRUE, validation = "CV")
validationplot(pls.fit, val.type="MSEP")
pls.pred = predict(pls.fit, test.mat, ncomp = 5)  # ncomp = 5 chosen from the validation plot
pls.mse = mean((test_vec - pls.pred)^2)
pls.mse
## [1] 2146014
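The CV minimum can also be located numerically from the MSEP object rather than the plot; a sketch (the array index includes the zero-component intercept-only model, hence the subtraction):

which.min(MSEP(pls.fit, estimate = "CV")$val[1, 1, ]) - 1  # best ncomp by CV MSEP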
MSE = c(lm.mse, ridge.mse, lasso.mse, pcr.mse, pls.mse)
Model = c("Linear Reg", "Ridge Reg", "Lasso", "PCR", "PLS")
all = data.frame(Model, MSE)
(all = all[order(MSE, Model),])
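On this split, plain least squares has the lowest test MSE (1338031), followed by the lasso (1416085) and ridge (1430793); PLS (2146014) and PCR (3048364) do noticeably worse. Since raw MSEs are hard to interpret on their own, a small extra step (a sketch using the objects above) converts them to test-set R^2:

tss <- mean((test_vec - mean(test_vec))^2)   # total variability of the test response
data.frame(Model, R2 = round(1 - MSE / tss, 3))  # test R^2 for each model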
# predict() method for regsubsets objects: rebuild the design matrix for the
# requested model size and multiply by the fitted coefficients
predict.regsubsets <- function(object, newdata, id, ...) {
  form <- as.formula(object$call[[2]])  # recover the formula from the original call
  mat <- model.matrix(form, newdata)
  coefi <- coef(object, id = id)        # coefficients of the id-variable model
  xvars <- names(coefi)
  mat[, xvars] %*% coefi
}
k <- 10
n <- nrow(Boston)
p = ncol(Boston) - 1
folds = sample(rep(1:k, length = n))  # random fold assignment
cv.errors <- matrix(NA, k, p, dimnames = list(NULL, paste(1:p)))
for (j in 1:k) {
  best.fit = regsubsets(crim ~ ., data = Boston[folds != j, ], nvmax = p)
  for (i in 1:p) {
    pred = predict(best.fit, Boston[folds == j, ], id = i)
    cv.errors[j, i] = mean((Boston$crim[folds == j] - pred)^2)
  }
}
rmse.cv.err = sqrt(apply(cv.errors, 2, mean))  # CV RMSE for each model size
plot(rmse.cv.err, pch = 1, type = "b")
coef(best.fit, 10)  # note: best.fit is the fit from the last CV fold
## (Intercept) zn indus nox dis
## 21.824249741 0.053352858 -0.093824399 -10.682834604 -1.153005385
## rad tax ptratio black lstat
## 0.622543420 -0.004911094 -0.273150005 -0.007863557 0.110571831
## medv
## -0.216655711
which.min(rmse.cv.err)
## 9
## 9
best_rmse = rmse.cv.err[which.min(rmse.cv.err)]  # lowest CV RMSE
best_rmse
## 9
## 6.514147
x = model.matrix(crim ~ . - 1, data = Boston)  # predictor matrix without intercept
y = Boston$crim
lasso.cv = cv.glmnet(x, y, type.measure = "mse")  # alpha defaults to 1 (lasso)
plot(lasso.cv)
coef(lasso.cv)  # coefficients at the default lambda.1se
## 14 x 1 sparse Matrix of class "dgCMatrix"
## s1
## (Intercept) 1.0894283
## zn .
## indus .
## chas .
## nox .
## rm .
## age .
## dis .
## rad 0.2643196
## tax .
## ptratio .
## black .
## lstat .
## medv .
lasso_rmse = sqrt(lasso.cv$cvm[lasso.cv$lambda == lasso.cv$lambda.1se])  # CV RMSE at lambda.1se
lasso_rmse
## [1] 7.402467
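Both coef(lasso.cv) and the RMSE above use glmnet's default lambda.1se; for comparison, the CV RMSE at lambda.min can be pulled the same way (a sketch; output not shown):

sqrt(lasso.cv$cvm[lasso.cv$lambda == lasso.cv$lambda.min])  # CV RMSE at lambda.min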
ridge.cv = cv.glmnet(x, y, type.measure = "mse", alpha = 0)  # alpha = 0 gives ridge
plot(ridge.cv)
coef(ridge.cv)
## 14 x 1 sparse Matrix of class "dgCMatrix"
## s1
## (Intercept) 1.761587688
## zn -0.002865786
## indus 0.026514040
## chas -0.142145708
## nox 1.684835359
## rm -0.130992458
## age 0.005612849
## dis -0.084630196
## rad 0.039946869
## tax 0.001830673
## ptratio 0.063756646
## black -0.002278831
## lstat 0.031682619
## medv -0.020839190
ridge_rmse = sqrt(ridge.cv$cvm[ridge.cv$lambda == ridge.cv$lambda.1se])  # CV RMSE at lambda.1se
ridge_rmse
## [1] 7.735648
pcr.fit.b = pcr(crim ~ ., data = Boston, scale = TRUE, validation = "CV")
summary(pcr.fit.b)
## Data: X dimension: 506 13
## Y dimension: 506 1
## Fit method: svdpc
## Number of components considered: 13
##
## VALIDATION: RMSEP
## Cross-validated using 10 random segments.
## (Intercept) 1 comps 2 comps 3 comps 4 comps 5 comps 6 comps
## CV 8.61 7.195 7.190 6.781 6.756 6.773 6.805
## adjCV 8.61 7.193 7.188 6.776 6.749 6.768 6.798
## 7 comps 8 comps 9 comps 10 comps 11 comps 12 comps 13 comps
## CV 6.798 6.669 6.685 6.687 6.679 6.630 6.562
## adjCV 6.790 6.660 6.677 6.677 6.669 6.619 6.550
##
## TRAINING: % variance explained
## 1 comps 2 comps 3 comps 4 comps 5 comps 6 comps 7 comps 8 comps
## X 47.70 60.36 69.67 76.45 82.99 88.00 91.14 93.45
## crim 30.69 30.87 39.27 39.61 39.61 39.86 40.14 42.47
## 9 comps 10 comps 11 comps 12 comps 13 comps
## X 95.40 97.04 98.46 99.52 100.0
## crim 42.55 42.78 43.04 44.13 45.4
rmse_pcr = sqrt(pcr.fit.b$validation$adj[13])
rmse_pcr
## [1] 6.361026
RMSE = c(best_rmse, ridge_rmse, lasso_rmse, rmse_pcr)
Models = c("Best Subset", "Ridge Reg", "Lasso", "PCR")
all_models = data.frame(Models, RMSE)
(all_models = all_models[order(RMSE, Models),])
Based on the CV RMSE values for the different models, the PCR and best subset models perform best on this data set.
Although the 13-component PCR model has the lowest RMSE (6.361), I would prefer the model chosen by best subset selection: with only 10 predictors it is a simpler model with an RMSE (6.514) only slightly higher than PCR's, and much lower than those of the lasso (7.402) and ridge (7.736) models.
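One caveat: best.fit above comes from the last CV fold only. The usual final step, sketched below, is to refit best subset selection on the full Boston data and report the coefficients for the chosen model size (best.full is a name introduced here for illustration):

best.full = regsubsets(crim ~ ., data = Boston, nvmax = p)  # refit on all observations
coef(best.full, 10)  # coefficients of the 10-predictor model on the full data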