For parts (a) through (c), indicate which of i. through iv. is correct. Justify your answer.
i. More flexible and hence will give improved prediction accuracy when its increase in bias is less than its decrease in variance.
ii. More flexible and hence will give improved prediction accuracy when its increase in variance is less than its decrease in bias.
iii. Less flexible and hence will give improved prediction accuracy when its increase in bias is less than its decrease in variance.
iv. Less flexible and hence will give improved prediction accuracy when its increase in variance is less than its decrease in bias.
(a) The lasso, relative to least squares, is:
iii. Explanation: The lasso shrinks the coefficient estimates toward zero (setting some exactly to zero), which makes it less flexible than least squares. The shrinkage substantially reduces the variance of the estimates at the cost of some bias, so prediction accuracy improves when the increase in bias is smaller than the decrease in variance.
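For reference, the lasso coefficient estimates minimize the standard penalized criterion

$$\mathrm{RSS} + \lambda \sum_{j=1}^{p} |\beta_j|,$$

and it is the absolute-value penalty that can force some coefficients exactly to zero.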
(b) Repeat (a) for ridge regression relative to least squares.
iii. Explanation: Ridge regression behaves similarly to the lasso: it also shrinks the coefficient estimates (though not exactly to zero), significantly reducing their variance at the expense of a slight increase in bias, so it too is less flexible than least squares.
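Ridge regression minimizes the analogous criterion with a squared penalty,

$$\mathrm{RSS} + \lambda \sum_{j=1}^{p} \beta_j^2,$$

which shrinks the coefficients toward zero but never sets any of them exactly to zero.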
(c) Repeat (a) for non-linear methods relative to least squares.
ii. Explanation: Non-linear methods are more flexible than least squares; they reduce bias at the cost of higher variance, so they improve prediction accuracy when the increase in variance is less than the decrease in bias.
In this exercise, we will predict the number of applications received using the other variables in the College data set.
library(ISLR2)
attach(College)
(a) Split the data set into a training set and a test set.
set.seed(1)
inTrain <- sample(1:nrow(College), 0.8*nrow(College))
college_train <- College[inTrain,]
college_test <- College[-inTrain,]
(b) Fit a linear model using least squares on the training set, and report the test error obtained.
The test error after fitting the linear model is 1567324.
lm_fit <- lm(Apps~., data=college_train)
summary(lm_fit)
##
## Call:
## lm(formula = Apps ~ ., data = college_train)
##
## Residuals:
## Min 1Q Median 3Q Max
## -5555.2 -404.6 19.9 310.3 7577.7
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) -630.58238 435.56266 -1.448 0.148209
## PrivateYes -388.97393 148.87623 -2.613 0.009206 **
## Accept 1.69123 0.04433 38.153 < 2e-16 ***
## Enroll -1.21543 0.20873 -5.823 9.41e-09 ***
## Top10perc 50.45622 5.88174 8.578 < 2e-16 ***
## Top25perc -13.62655 4.67321 -2.916 0.003679 **
## F.Undergrad 0.08271 0.03632 2.277 0.023111 *
## P.Undergrad 0.06555 0.03367 1.947 0.052008 .
## Outstate -0.07562 0.01987 -3.805 0.000156 ***
## Room.Board 0.14161 0.05130 2.760 0.005947 **
## Books 0.21161 0.25184 0.840 0.401102
## Personal 0.01873 0.06604 0.284 0.776803
## PhD -9.72551 4.91228 -1.980 0.048176 *
## Terminal -0.48690 5.43302 -0.090 0.928620
## S.F.Ratio 18.26146 13.83984 1.319 0.187508
## perc.alumni 1.39008 4.39572 0.316 0.751934
## Expend 0.05764 0.01254 4.595 5.26e-06 ***
## Grad.Rate 5.89480 3.11185 1.894 0.058662 .
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 993.8 on 603 degrees of freedom
## Multiple R-squared: 0.9347, Adjusted R-squared: 0.9328
## F-statistic: 507.5 on 17 and 603 DF, p-value: < 2.2e-16
lm_pred <- predict(lm_fit, college_test, type="response")
mean((lm_pred-college_test$Apps)^2)
## [1] 1567324
(c) Fit a ridge regression model on the training set, with λ chosen by cross-validation. Report the test error obtained.
The test error obtained after fitting the ridge regression model is 1441717.
library(glmnet)
set.seed(1)
train_mat <- model.matrix(Apps~., data = college_train)
test_mat <- model.matrix(Apps~., data = college_test)
cv_out <- cv.glmnet(train_mat,college_train$Apps,alpha=0)
bestlam <- cv_out$lambda.min
bestlam
## [1] 362.9786
ridge_mod <- glmnet(train_mat,college_train$Apps,alpha = 0)
ridge_pred <- predict(ridge_mod,s=bestlam,newx = test_mat)
mean((ridge_pred - college_test$Apps)^2)
## [1] 1441717
(d) Fit a lasso model on the training set, with λ chosen by cross-validation. Report the test error obtained, along with the number of non-zero coefficient estimates.
The test error of the lasso model is 1526566, with 3 non-zero coefficient estimates.
set.seed(1)
cv_out2 <- cv.glmnet(train_mat,college_train$Apps,alpha=1)
bestlam2 <- cv_out2$lambda.min
bestlam2
## [1] 10.33776
lasso_mod <- glmnet(train_mat,college_train$Apps,alpha=1)
lasso_pred <- predict(lasso_mod,s=bestlam2,newx=test_mat)
mean((lasso_pred - college_test$Apps)^2)
## [1] 1526566
lasso_coef <- predict(lasso_mod, type="coefficients", s=bestlam2)[1:19,]
lasso_coef[lasso_coef!=0]
## (Intercept) Accept Top10perc Expend
## -1.807026e+02 1.349772e+00 1.645574e+01 1.014995e-03
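A quick way to count the non-zero coefficient estimates from the lasso_coef vector above (excluding the intercept) is:
sum(lasso_coef != 0) - 1  # subtract 1 for the (non-zero) intercept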
(e) Fit a PCR model on the training set, with M chosen by cross-validation. Report the test error obtained, along with the value of M selected by cross-validation.
The test error obtained after fitting a PCR model with M = 10 components, chosen from the cross-validation results below, is 2322618.
library(pls)
set.seed(1)
pcr_fit <- pcr(Apps~., data = college_train, scale = TRUE, validation = "CV")
summary(pcr_fit)
## Data: X dimension: 621 17
## Y dimension: 621 1
## Fit method: svdpc
## Number of components considered: 17
##
## VALIDATION: RMSEP
## Cross-validated using 10 random segments.
## (Intercept) 1 comps 2 comps 3 comps 4 comps 5 comps 6 comps
## CV 3837 3756 2050 2056 1660 1574 1578
## adjCV 3837 3756 2047 2056 1642 1562 1576
## 7 comps 8 comps 9 comps 10 comps 11 comps 12 comps 13 comps
## CV 1564 1522 1509 1496 1501 1503 1496
## adjCV 1563 1515 1506 1493 1498 1500 1493
## 14 comps 15 comps 16 comps 17 comps
## CV 1497 1497 1199 1118
## adjCV 1494 1487 1187 1109
##
## TRAINING: % variance explained
## 1 comps 2 comps 3 comps 4 comps 5 comps 6 comps 7 comps 8 comps
## X 32.003 57.06 64.13 70.03 75.36 80.38 84.09 87.44
## Apps 4.441 72.01 72.02 81.86 83.67 83.77 84.01 85.14
## 9 comps 10 comps 11 comps 12 comps 13 comps 14 comps 15 comps
## X 90.47 92.83 94.91 96.78 97.86 98.72 99.36
## Apps 85.42 85.76 85.76 85.76 85.89 85.95 89.93
## 16 comps 17 comps
## X 99.83 100.00
## Apps 92.89 93.47
pcr_pred <- predict(pcr_fit, college_test, ncomp = 10)
mean((pcr_pred -college_test$Apps)^2)
## [1] 2322618
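The cross-validation curve for the PCR fit can also be inspected directly to justify the choice of M, using the same validationplot call from the pls package that is applied to the PLS fit below:
validationplot(pcr_fit, val.type = "MSEP")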
(f) Fit a PLS model on the training set, with M chosen by cross-validation. Report the test error obtained, along with the value of M selected by cross-validation.
The test error obtained after fitting a PLS model with M = 10 components, chosen by cross-validation, is 1551898.
set.seed(1)
pls_fit <- plsr(Apps~., data=college_train, scale=TRUE,validation="CV")
summary(pls_fit)
## Data: X dimension: 621 17
## Y dimension: 621 1
## Fit method: kernelpls
## Number of components considered: 17
##
## VALIDATION: RMSEP
## Cross-validated using 10 random segments.
## (Intercept) 1 comps 2 comps 3 comps 4 comps 5 comps 6 comps
## CV 3837 1865 1623 1441 1380 1244 1166
## adjCV 3837 1862 1623 1437 1363 1222 1154
## 7 comps 8 comps 9 comps 10 comps 11 comps 12 comps 13 comps
## CV 1147 1133 1123 1122 1122 1121 1118
## adjCV 1136 1125 1115 1114 1114 1112 1110
## 14 comps 15 comps 16 comps 17 comps
## CV 1118 1118 1118 1118
## adjCV 1109 1110 1109 1109
##
## TRAINING: % variance explained
## 1 comps 2 comps 3 comps 4 comps 5 comps 6 comps 7 comps 8 comps
## X 25.55 45.38 62.59 65.08 67.55 72.02 75.93 80.46
## Apps 77.30 83.57 87.51 90.88 92.88 93.15 93.24 93.31
## 9 comps 10 comps 11 comps 12 comps 13 comps 14 comps 15 comps
## X 82.51 85.43 87.83 91.09 92.73 95.12 96.95
## Apps 93.39 93.42 93.45 93.46 93.47 93.47 93.47
## 16 comps 17 comps
## X 97.97 100.00
## Apps 93.47 93.47
validationplot(pls_fit, val.type = "MSEP")
pls_pred <- predict(pls_fit, college_test, ncomp = 10)
mean((pls_pred - college_test$Apps)^2)
## [1] 1551898
(g) Comment on the results obtained. How accurately can we predict the number of college applications received? Is there much difference among the test errors resulting from these five approaches?
The test errors are 1567324 for the linear model, 1441717 for ridge regression, 1526566 for the lasso, 2322618 for PCR, and 1551898 for PLS. Apart from PCR, the approaches give fairly similar test errors, with ridge regression performing best. A test MSE of roughly 1.4-1.6 million corresponds to a root-mean-squared error of about 1200-1250 applications, so the models predict the number of applications reasonably well but not precisely.
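To put these errors on a more interpretable scale, the test-set R² of each model can be computed from the prediction objects above, relative to simply predicting the test-set mean of Apps (a quick sketch):
# Test-set R^2 for each model: 1 - test MSE / MSE of predicting the test-set mean
sst <- mean((college_test$Apps - mean(college_test$Apps))^2)
r2 <- c(lm    = 1 - mean((lm_pred - college_test$Apps)^2) / sst,
        ridge = 1 - mean((as.numeric(ridge_pred) - college_test$Apps)^2) / sst,
        lasso = 1 - mean((as.numeric(lasso_pred) - college_test$Apps)^2) / sst,
        pcr   = 1 - mean((as.numeric(pcr_pred) - college_test$Apps)^2) / sst,
        pls   = 1 - mean((as.numeric(pls_pred) - college_test$Apps)^2) / sst)
round(r2, 3)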
detach(College)
We will now try to predict per capita crime rate in the Boston data set.
library(ISLR2)
attach(Boston)
set.seed(1)
inTrain <- sample(1:nrow(Boston), 0.8*nrow(Boston))
boston_train <- Boston[inTrain,]
boston_test <- Boston[-inTrain,]
(a) Try out some of the regression methods explored in this chapter, such as best subset selection, the lasso, ridge regression, and PCR. Present and discuss results for the approaches that you consider.
The linear model produced a test error of 64.30352.
lm_fit <- lm(crim~., data=boston_train)
summary(lm_fit)
##
## Call:
## lm(formula = crim ~ ., data = boston_train)
##
## Residuals:
## Min 1Q Median 3Q Max
## -7.909 -1.991 -0.354 0.976 59.619
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 14.424748 7.433437 1.941 0.05303 .
## zn 0.037134 0.019236 1.930 0.05427 .
## indus -0.061103 0.084930 -0.719 0.47229
## chas -0.694068 1.202456 -0.577 0.56413
## nox -8.523151 5.455552 -1.562 0.11903
## rm 0.272036 0.637903 0.426 0.67001
## age 0.001956 0.018389 0.106 0.91536
## dis -0.840921 0.295902 -2.842 0.00472 **
## rad 0.590820 0.086960 6.794 4.08e-11 ***
## tax -0.003601 0.005140 -0.701 0.48392
## ptratio -0.344734 0.195586 -1.763 0.07875 .
## lstat 0.138942 0.077358 1.796 0.07325 .
## medv -0.181755 0.060290 -3.015 0.00274 **
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 6.006 on 391 degrees of freedom
## Multiple R-squared: 0.4603, Adjusted R-squared: 0.4437
## F-statistic: 27.79 on 12 and 391 DF, p-value: < 2.2e-16
lm_pred <- predict(lm_fit, boston_test, type="response")
mean((lm_pred-boston_test$crim)^2)
## [1] 64.30352
The ridge regression model produced a test error of 65.04252.
library(glmnet)
set.seed(1)
train_mat <- model.matrix(crim~., data = boston_train)
test_mat <- model.matrix(crim~., data = boston_test)
cv_out <- cv.glmnet(train_mat,boston_train$crim,alpha=0)  # alpha = 0 fits ridge regression
bestlam <- cv_out$lambda.min
bestlam
## [1] 0.04862993
ridge_mod <- glmnet(train_mat,boston_train$crim,alpha = 0)
ridge_pred <- predict(ridge_mod,s=bestlam,newx = test_mat)
mean((ridge_pred - boston_test$crim)^2)
## [1] 65.04252
The lasso model produced a test error of 65.04252.
set.seed(1)
cv_out2 <- cv.glmnet(train_mat,boston_train$crim ,alpha=1)
bestlam2 <- cv_out2$lambda.min
bestlam2
## [1] 0.04862993
lasso_mod <- glmnet(train_mat,boston_train$crim,alpha=1)
lasso_pred <- predict(lasso_mod,s=bestlam2,newx=test_mat)
mean((lasso_pred - boston_test$crim)^2)
## [1] 65.04252
The PCR model produced a test error of 67.85974.
library(pls)
set.seed(1)
pcr_fit <- pcr(crim~., data = boston_train, scale = TRUE, validation = "CV")
summary(pcr_fit)
## Data: X dimension: 404 12
## Y dimension: 404 1
## Fit method: svdpc
## Number of components considered: 12
##
## VALIDATION: RMSEP
## Cross-validated using 10 random segments.
## (Intercept) 1 comps 2 comps 3 comps 4 comps 5 comps 6 comps
## CV 8.062 6.812 6.809 6.433 6.374 6.347 6.325
## adjCV 8.062 6.808 6.804 6.427 6.369 6.342 6.320
## 7 comps 8 comps 9 comps 10 comps 11 comps 12 comps
## CV 6.183 6.198 6.184 6.193 6.176 6.094
## adjCV 6.176 6.191 6.177 6.186 6.166 6.084
##
## TRAINING: % variance explained
## 1 comps 2 comps 3 comps 4 comps 5 comps 6 comps 7 comps 8 comps
## X 50.37 64.05 73.18 80.30 86.59 90.14 92.71 94.86
## crim 29.85 30.02 37.77 38.96 39.64 40.17 42.82 43.05
## 9 comps 10 comps 11 comps 12 comps
## X 96.71 98.23 99.44 100.00
## crim 43.45 43.53 44.50 46.03
pcr_pred <- predict(pcr_fit, boston_test, ncomp = 7)
mean((pcr_pred -boston_test$crim)^2)
## [1] 67.85974
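Part (a) also lists best subset selection, which was not fit above; a minimal validation-set sketch using regsubsets from the leaps package (reusing test_mat from the ridge/lasso code) could look like this, though no output is shown since it was not run here:
library(leaps)
# best subsets of every size on the training data (12 predictors in the ISLR2 Boston data)
regfit_best <- regsubsets(crim~., data = boston_train, nvmax = 12)
# validation-set MSE for each subset size
val_errors <- rep(NA, 12)
for (i in 1:12) {
  coefi <- coef(regfit_best, id = i)
  pred <- test_mat[, names(coefi)] %*% coefi
  val_errors[i] <- mean((boston_test$crim - pred)^2)
}
val_errors
which.min(val_errors)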
(b) Propose a model (or set of models) that seem to perform well on this data set, and justify your answer. Make sure that you are evaluating model performance using validation set error, cross-validation, or some other reasonable alternative, as opposed to using training error.
The linear model produced the lowest test error, 64.30352, and is also the simplest of these models to fit and interpret.
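As a check that this ranking is not just an artifact of the single train/test split, the linear model's error could also be estimated by 10-fold cross-validation on the full data set; a sketch using cv.glm from the boot package (glm with its default gaussian family fits the same model as lm) is:
library(boot)
set.seed(1)
glm_fit <- glm(crim~., data = Boston)
cv_err <- cv.glm(Boston, glm_fit, K = 10)
cv_err$delta[1]  # cross-validated estimate of the test MSE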
(c) Does your chosen model involve all of the features in the data set? Why or why not?
Yes, the chosen linear model uses all of the predictors. In the comparisons above, keeping every predictor gave the lowest test error, and the methods that shrink or drop predictors (ridge regression, the lasso, PCR) did not improve on it.