#2. For parts (a) through (c), indicate which of i. through iv. is correct. Justify your answer.
##(a) The lasso, relative to least squares, is:
#iii. Less flexible and hence will give improved prediction accuracy when its increase in bias is less than its decrease in variance. As λ increases, the variance at first decreases faster than the squared bias increases, so the test MSE falls before rising again, producing the U-shaped MSE curve.
##(b) Repeat (a) for ridge regression relative to least squares.
#iii. Less flexible and hence will give improved prediction accuracy when its increase in bias is less than its decrease in variance. The same reasoning as in (a) applies: as λ increases, the variance initially falls faster than the squared bias rises, producing the U-shaped test MSE curve (the short simulation below illustrates this for both ridge and the lasso).
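#A minimal sketch on simulated data (not part of the exercise) illustrating the U-shaped test MSE: as λ grows, the variance of ridge and lasso fits falls faster than the squared bias rises at first, so the held-out MSE first drops and then climbs.
library(glmnet)
set.seed(3)
n = 100; p = 40
X = matrix(rnorm(2 * n * p), ncol = p)          # 2n observations, p predictors
beta = c(rep(1, 5), rep(0, p - 5))              # sparse true coefficients
y = X %*% beta + rnorm(2 * n)
tr = 1:n; te = (n + 1):(2 * n)                  # half train, half test
lam = 10 ^ seq(2, -3, length = 50)
for (a in c(0, 1)) {                            # alpha = 0 is ridge, alpha = 1 is the lasso
  fit = glmnet(X[tr, ], y[tr], alpha = a, lambda = lam)
  pred = predict(fit, newx = X[te, ], s = lam)  # predictions at every lambda
  mse = colMeans((y[te] - pred)^2)
  plot(log(lam), mse, type = "l", xlab = "log(lambda)", ylab = "Test MSE",
       main = ifelse(a == 0, "Ridge", "Lasso")) # both curves are U shaped
}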
##(c) Repeat (a) for non-linear methods relative to least squares
#ii. More flexible and hence will give improved prediction accuracy when its increase in variance is less than its decrease in bias.
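#A minimal sketch on simulated data (not part of the exercise): when the true relationship is non-linear, a more flexible fit reduces bias by more than it adds variance, so its held-out MSE beats least squares.
set.seed(4)
x = runif(200, -3, 3)
y = sin(x) + rnorm(200, sd = 0.3)               # non-linear truth
tr = 1:100; te = 101:200
lin_fit = lm(y ~ x, subset = tr)                # least squares (linear)
spl_fit = smooth.spline(x[tr], y[tr])           # flexible non-linear fit
mean((y[te] - predict(lin_fit, data.frame(x = x[te])))^2)  # linear test MSE
mean((y[te] - predict(spl_fit, x[te])$y)^2)                # flexible test MSE (lower)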
#9. In this exercise, we will predict the number of applications received using the other variables in the College data set.
##(a) Split the data set into a training set and a test set.
library(ISLR2)
data(College)
set.seed(1)
trainid = sample(1:nrow(College), nrow(College)/2)
train = College[trainid,]
test = College[-trainid,]
##(b) Fit a linear model using least squares on the training set, and report the test error obtained.
lm_fit = lm(Apps~., data=train)
lm_pred = predict(lm_fit, test)
lm_err = mean((test$Apps - lm_pred)^2)
lm_err
## [1] 1135758
##(c) Fit a ridge regression model on the training set, with λ chosen by cross-validation. Report the test error obtained.
library(glmnet)
## Warning: package 'glmnet' was built under R version 4.0.5
## Loading required package: Matrix
## Warning: package 'Matrix' was built under R version 4.0.5
## Loaded glmnet 4.1-4
train_X = model.matrix(Apps ~ ., data = train)
train_Y = train[, "Apps"]
test_X = model.matrix(Apps ~ ., data = test)
test_Y = test[, "Apps"]
grid = 10 ^ seq(4, -2, length=100)
ridge_mod = glmnet(train_X, train_Y, alpha = 0, lambda = grid, thresh = 1e-12)
ridge_cv = cv.glmnet(train_X, train_Y, alpha = 0, lambda = grid, thresh = 1e-12)
lambda_best = ridge_cv$lambda.min  # lambda chosen by cross-validation
ridge_pred = predict(ridge_mod, newx = test_X, s = lambda_best)
(ridge_err = mean((test_Y - ridge_pred)^2))
## [1] 1164319
##(d) Fit a lasso model on the training set, with λ chosen by cross-validation. Report the test error obtained, along with the number of non-zero coefficient estimates.
lasso_mod = glmnet(train_X, train_Y, alpha =1, lambda=grid, thresh = 1e-12)
lasso_cv = cv.glmnet(train_X, train_Y, alpha =1, lambda=grid, thresh = 1e-12)
lambda_best = lasso_cv$lambda.min
lasso_pred = predict(lasso_mod, newx= test_X, s=lambda_best)
(lasso_err = mean((test_Y - lasso_pred)^2))
## [1] 1135660
predict(lasso_mod, s = lambda_best, type="coefficients")
## 19 x 1 sparse Matrix of class "dgCMatrix"
## s1
## (Intercept) -7.900363e+02
## (Intercept) .
## PrivateYes -3.070103e+02
## Accept 1.779328e+00
## Enroll -1.469508e+00
## Top10perc 6.672214e+01
## Top25perc -2.230442e+01
## F.Undergrad 9.258974e-02
## P.Undergrad 9.408838e-03
## Outstate -1.083495e-01
## Room.Board 2.115147e-01
## Books 2.912105e-01
## Personal 6.120406e-03
## PhD -1.547200e+01
## Terminal 6.409503e+00
## S.F.Ratio 2.282638e+01
## perc.alumni 1.130498e+00
## Expend 4.856697e-02
## Grad.Rate 7.488081e+00
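#At the chosen λ, none of the 17 predictors is shrunk exactly to zero (only the redundant second intercept column from model.matrix is dropped), so the lasso keeps all of the variables in this fit.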
##(e) Fit a PCR model on the training set, with M chosen by cross-validation. Report the test error obtained, along with the value of M selected by cross-validation.
library(pls)
##
## Attaching package: 'pls'
## The following object is masked from 'package:stats':
##
## loadings
pcr_fit = pcr(Apps~., data=train, scale=TRUE, validation="CV")
validationplot(pcr_fit, val.type="MSEP")
summary(pcr_fit)
## Data: X dimension: 388 17
## Y dimension: 388 1
## Fit method: svdpc
## Number of components considered: 17
##
## VALIDATION: RMSEP
## Cross-validated using 10 random segments.
## (Intercept) 1 comps 2 comps 3 comps 4 comps 5 comps 6 comps
## CV 4288 4027 2351 2355 2046 1965 1906
## adjCV 4288 4031 2347 2353 2014 1955 1899
## 7 comps 8 comps 9 comps 10 comps 11 comps 12 comps 13 comps
## CV 1910 1913 1871 1799 1799 1802 1800
## adjCV 1903 1908 1866 1790 1792 1795 1793
## 14 comps 15 comps 16 comps 17 comps
## CV 1832 1728 1310 1222
## adjCV 1829 1702 1296 1212
##
## TRAINING: % variance explained
## 1 comps 2 comps 3 comps 4 comps 5 comps 6 comps 7 comps 8 comps
## X 32.20 57.78 65.31 70.99 76.37 81.27 84.8 87.85
## Apps 13.44 70.93 71.07 79.87 81.15 82.25 82.3 82.33
## 9 comps 10 comps 11 comps 12 comps 13 comps 14 comps 15 comps
## X 90.62 92.91 94.98 96.74 97.79 98.72 99.42
## Apps 83.38 84.76 84.80 84.84 85.11 85.14 90.55
## 16 comps 17 comps
## X 99.88 100.00
## Apps 93.42 93.89
pcr_pred = predict(pcr_fit, test, ncomp=10)
(pcr_err = mean((test$Apps - pcr_pred)^2))
## [1] 1723100
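#The test error above uses M = 10 components: the cross-validated RMSEP levels off at about 1799 around 10 components and improves substantially again only once nearly all 17 components are included.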
##(f) Fit a PLS model on the training set, with M chosen by cross-validation. Report the test error obtained, along with the value of M selected by cross-validation.
pls_fit = plsr(Apps~., data=train, scale=TRUE, validation="CV")
validationplot(pls_fit, val.type="MSEP")
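#The PLS test error is not computed above; a minimal sketch follows. The ncomp value is a placeholder to be replaced by the number of components read off the validation plot, not the CV-selected value.
pls_pred = predict(pls_fit, test, ncomp = 10)  # ncomp = 10 is an assumed, illustrative choice
(pls_err = mean((test$Apps - pls_pred)^2))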
##(g) Comment on the results obtained. How accurately can we predict the number of college applications received? Is there much difference among the test errors resulting from these five approaches?
test_avg = mean(test$Apps)
lm_r2 =1 - lm_err/mean((test_avg - test$Apps)^2)
ridge_r2 =1 - ridge_err/mean((test_avg - test$Apps)^2)
lasso_r2 =1 - lasso_err/mean((test_avg - test$Apps)^2)
pcr_r2 =1 - pcr_err/mean((test_avg - test$Apps)^2)
pls_r2 =1 - pls_err/mean((test_avg - test$Apps)^2)
all_r2 = c(lm_r2, ridge_r2, lasso_r2, pcr_r2, pls_r2)
names(all_r2) = c("lm", "ridge", "lasso", "pcr", "pls")
barplot(all_r2)
#All of the models except PCR predict the number of college applications with comparable accuracy (similar test R²); the PCR fit with 10 components has a noticeably larger test error.
#11. We will now try to predict per capita crime rate in the Boston data set.
##(a) Try out some of the regression methods explored in this chapter, such as best subset selection, the lasso, ridge regression, and PCR. Present and discuss results for the approaches that you consider.
set.seed(10)
training = sample(1:nrow(Boston), nrow(Boston)/2)
train = Boston[training,]
test = Boston[-training,]
trainset = model.matrix(crim~., data=train)[,-1]
testset = model.matrix(crim~., data=test)[,-1]
lasso.fit = cv.glmnet(trainset, train$crim, alpha=1)
lambda = lasso.fit$lambda.min
lasso.pred = predict(lasso.fit, s=lambda, newx=testset)
lasso.err = mean((test$crim - lasso.pred)^2)
lasso.err
## [1] 55.79952
pcr2.fit = pcr(crim~., data=train, scale=TRUE, validation="CV")
summary(pcr2.fit)
## Data: X dimension: 253 12
## Y dimension: 253 1
## Fit method: svdpc
## Number of components considered: 12
##
## VALIDATION: RMSEP
## Cross-validated using 10 random segments.
## (Intercept) 1 comps 2 comps 3 comps 4 comps 5 comps 6 comps
## CV 7.521 6.213 6.177 5.881 5.745 5.690 5.710
## adjCV 7.521 6.208 6.172 5.874 5.738 5.685 5.703
## 7 comps 8 comps 9 comps 10 comps 11 comps 12 comps
## CV 5.582 5.433 5.425 5.422 5.411 5.373
## adjCV 5.577 5.423 5.417 5.415 5.402 5.363
##
## TRAINING: % variance explained
## 1 comps 2 comps 3 comps 4 comps 5 comps 6 comps 7 comps 8 comps
## X 48.65 61.88 71.98 79.65 86.07 89.97 92.68 94.95
## crim 33.28 34.39 40.90 43.35 44.56 44.71 47.38 50.19
## 9 comps 10 comps 11 comps 12 comps
## X 96.87 98.31 99.47 100.00
## crim 50.21 50.32 50.91 51.88
pcr2.pred = predict(pcr2.fit, test, ncomp=12)
pcr2.err = mean((test$crim - pcr2.pred)^2)
pcr2.err
## [1] 55.17084
ridge.fit = cv.glmnet(trainset, train$crim, alpha=0)
lambda = ridge.fit$lambda.min
ridge.pred = predict(ridge.fit, s=lambda, newx=testset)
ridge.err = mean((test$crim - ridge.pred)^2)
ridge.err
## [1] 56.03528
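#Best subset selection is also listed in the exercise but was not fitted above; a sketch using the leaps package (assumed available), evaluated on the same held-out half, is given below.
library(leaps)
bss.fit = regsubsets(crim ~ ., data = train, nvmax = ncol(Boston) - 1)
bss.err = rep(NA, ncol(Boston) - 1)
for (i in 1:(ncol(Boston) - 1)) {
  coefi = coef(bss.fit, id = i)                                   # size-i model
  pred = cbind(1, testset[, names(coefi)[-1], drop = FALSE]) %*% coefi
  bss.err[i] = mean((test$crim - pred)^2)                         # held-out MSE
}
which.min(bss.err)   # subset size with the lowest held-out error
min(bss.err)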
##(b) Propose a model (or set of models) that seem to perform well on this data set, and justify your answer. Make sure that you are evaluating model performance using validation-set error, cross-validation, or some other reasonable alternative, as opposed to using training error.
#In part (a), the held-out test MSEs were 55.80 for the lasso, 55.17 for PCR, and 56.04 for ridge regression. The three errors are very close, so all of the approaches perform comparably well on this data set. PCR achieves the lowest error, but it uses all 12 components and so offers no dimension reduction; the lasso comes within about 1% of it while also producing a simpler, sparser model, so the lasso is a reasonable choice.
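#For a side-by-side look at the held-out errors computed in part (a):
c(lasso = lasso.err, pcr = pcr2.err, ridge = ridge.err)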
##(c) Does your chosen model involve all of the features in the data set? Why or why not?
#No. The lasso shrinks the coefficients of predictors it deems unimportant exactly to zero, so those variables are dropped from the model and the chosen fit generally does not involve all of the features.
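#To check which features the selected lasso model actually keeps, one can inspect its coefficients at the CV-chosen λ (output not shown above):
predict(lasso.fit, s = "lambda.min", type = "coefficients")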