#2. For parts (a) through (c), indicate which of i. through iv. is correct. Justify your answer.
##(a) The lasso, relative to least squares, is:
#iii. Less flexible and hence will give improved prediction accuracy when its increase in bias is less than its decrease in variance. As λ increases, the variance at first decreases faster than the squared bias increases, so the test MSE falls before rising again, producing the U-shaped MSE curve.
##(b) Repeat (a) for ridge regression relative to least squares.
#iii. Less flexible and hence will give improved prediction accuracy when its increase in bias is less than its decrease in variance. The same reasoning as in (a) applies: as λ increases, the variance initially falls faster than the squared bias rises, producing the U-shaped test MSE curve (the short simulation below illustrates this for both ridge and the lasso).
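#A minimal sketch on simulated data (not part of the exercise) illustrating the U-shaped test MSE: as λ grows, the variance of ridge and lasso fits falls faster than the squared bias rises at first, so the held-out MSE first drops and then climbs.
library(glmnet)
set.seed(3)
n = 100; p = 40
X = matrix(rnorm(2 * n * p), ncol = p)          # 2n observations, p predictors
beta = c(rep(1, 5), rep(0, p - 5))              # sparse true coefficients
y = X %*% beta + rnorm(2 * n)
tr = 1:n; te = (n + 1):(2 * n)                  # half train, half test
lam = 10 ^ seq(2, -3, length = 50)
for (a in c(0, 1)) {                            # alpha = 0 is ridge, alpha = 1 is the lasso
  fit = glmnet(X[tr, ], y[tr], alpha = a, lambda = lam)
  pred = predict(fit, newx = X[te, ], s = lam)  # predictions at every lambda
  mse = colMeans((y[te] - pred)^2)
  plot(log(lam), mse, type = "l", xlab = "log(lambda)", ylab = "Test MSE",
       main = ifelse(a == 0, "Ridge", "Lasso")) # both curves are U shaped
}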
##(c) Repeat (a) for non-linear methods relative to least squares
#ii. More flexible and hence will give improved prediction accuracy when its increase in variance is less than its decrease in bias.
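#A minimal sketch on simulated data (not part of the exercise): when the true relationship is non-linear, a more flexible fit reduces bias by more than it adds variance, so its held-out MSE beats least squares.
set.seed(4)
x = runif(200, -3, 3)
y = sin(x) + rnorm(200, sd = 0.3)               # non-linear truth
tr = 1:100; te = 101:200
lin_fit = lm(y ~ x, subset = tr)                # least squares (linear)
spl_fit = smooth.spline(x[tr], y[tr])           # flexible non-linear fit
mean((y[te] - predict(lin_fit, data.frame(x = x[te])))^2)  # linear test MSE
mean((y[te] - predict(spl_fit, x[te])$y)^2)                # flexible test MSE (lower)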
#9. In this exercise, we will predict the number of applications received using the other variables in the College data set.
##(a) Split the data set into a training set and a test set.
library(ISLR2)
data(College)
set.seed(1)
trainid = sample(1:nrow(College), nrow(College)/2)
train = College[trainid,]
test = College[-trainid,]
##(b) Fit a linear model using least squares on the training set, and report the test error obtained.
lm_fit = lm(Apps~., data=train)
lm_pred = predict(lm_fit, test)
lm_err = mean((test$Apps - lm_pred)^2)
lm_err
## [1] 1135758
##(c) Fit a ridge regression model on the training set, with λ chosen by cross-validation. Report the test error obtained.
library(glmnet)
## Warning: package 'glmnet' was built under R version 4.0.5
## Loading required package: Matrix
## Warning: package 'Matrix' was built under R version 4.0.5
## Loaded glmnet 4.1-4
train_X = model.matrix(Apps ~ ., data = train)
train_Y = train[, "Apps"]
test_X = model.matrix(Apps ~ ., data = test)
test_Y = test[, "Apps"]
grid = 10 ^ seq(4, -2, length=100)
ridge_mod = glmnet(train_X, train_Y, alpha = 0, lambda = grid, thresh = 1e-12)
ridge_cv = cv.glmnet(train_X, train_Y, alpha = 0, lambda = grid, thresh = 1e-12)
lambda_best = ridge_cv$lambda.min  # lambda chosen by cross-validation
ridge_pred = predict(ridge_mod, newx = test_X, s = lambda_best)
(ridge_err = mean((test_Y - ridge_pred)^2))
## [1] 1164319
##(d) Fit a lasso model on the training set, with λ chosen by cross-validation. Report the test error obtained, along with the number of non-zero coefficient estimates.
lasso_mod = glmnet(train_X, train_Y, alpha =1, lambda=grid, thresh = 1e-12)
lasso_cv = cv.glmnet(train_X, train_Y, alpha =1, lambda=grid, thresh = 1e-12)
lambda_best = lasso_cv$lambda.min
lasso_pred = predict(lasso_mod, newx= test_X, s=lambda_best)
(lasso_err = mean((test_Y - lasso_pred)^2))
## [1] 1135660
predict(lasso_mod, s = lambda_best, type="coefficients")
## 19 x 1 sparse Matrix of class "dgCMatrix"
## s1
## (Intercept) -7.900363e+02
## (Intercept) .
## PrivateYes -3.070103e+02
## Accept 1.779328e+00
## Enroll -1.469508e+00
## Top10perc 6.672214e+01
## Top25perc -2.230442e+01
## F.Undergrad 9.258974e-02
## P.Undergrad 9.408838e-03
## Outstate -1.083495e-01
## Room.Board 2.115147e-01
## Books 2.912105e-01
## Personal 6.120406e-03
## PhD -1.547200e+01
## Terminal 6.409503e+00
## S.F.Ratio 2.282638e+01
## perc.alumni 1.130498e+00
## Expend 4.856697e-02
## Grad.Rate 7.488081e+00
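#At the chosen λ, none of the 17 predictors is shrunk exactly to zero (only the redundant second intercept column from model.matrix is dropped), so the lasso keeps all of the variables in this fit.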
##(e) Fit a PCR model on the training set, with M chosen by cross-validation. Report the test error obtained, along with the value of M selected by cross-validation.
library(pls)
##
## Attaching package: 'pls'
## The following object is masked from 'package:stats':
##
## loadings
pcr_fit = pcr(Apps~., data=train, scale=TRUE, validation="CV")
validationplot(pcr_fit, val.type="MSEP")
summary(pcr_fit)
## Data: X dimension: 388 17
## Y dimension: 388 1
## Fit method: svdpc
## Number of components considered: 17
##
## VALIDATION: RMSEP
## Cross-validated using 10 random segments.
## (Intercept) 1 comps 2 comps 3 comps 4 comps 5 comps 6 comps
## CV 4288 4027 2351 2355 2046 1965 1906
## adjCV 4288 4031 2347 2353 2014 1955 1899
## 7 comps 8 comps 9 comps 10 comps 11 comps 12 comps 13 comps
## CV 1910 1913 1871 1799 1799 1802 1800
## adjCV 1903 1908 1866 1790 1792 1795 1793
## 14 comps 15 comps 16 comps 17 comps
## CV 1832 1728 1310 1222
## adjCV 1829 1702 1296 1212
##
## TRAINING: % variance explained
## 1 comps 2 comps 3 comps 4 comps 5 comps 6 comps 7 comps 8 comps
## X 32.20 57.78 65.31 70.99 76.37 81.27 84.8 87.85
## Apps 13.44 70.93 71.07 79.87 81.15 82.25 82.3 82.33
## 9 comps 10 comps 11 comps 12 comps 13 comps 14 comps 15 comps
## X 90.62 92.91 94.98 96.74 97.79 98.72 99.42
## Apps 83.38 84.76 84.80 84.84 85.11 85.14 90.55
## 16 comps 17 comps
## X 99.88 100.00
## Apps 93.42 93.89
pcr_pred = predict(pcr_fit, test, ncomp=10)
(pcr_err = mean((test$Apps - pcr_pred)^2))
## [1] 1723100
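#The test error above uses M = 10 components: the cross-validated RMSEP levels off at about 1799 around 10 components and improves substantially again only once nearly all 17 components are included.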
##(f) Fit a PLS model on the training set, with M chosen by cross-validation. Report the test error obtained, along with the value of M selected by cross-validation.
pls_fit = plsr(Apps~., data=train, scale=TRUE, validation="CV")
validationplot(pls_fit, val.type="MSEP")
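#The PLS test error is not computed above; a minimal sketch follows. The ncomp value is a placeholder to be replaced by the number of components read off the validation plot, not the CV-selected value.
pls_pred = predict(pls_fit, test, ncomp = 10)  # ncomp = 10 is an assumed, illustrative choice
(pls_err = mean((test$Apps - pls_pred)^2))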
##(g) Comment on the results obtained. How accurately can we predict the number of college applications received? Is there much difference among the test errors resulting from these five approaches?
test_avg = mean(test$Apps)
lm_r2 =1 - lm_err/mean((test_avg - test$Apps)^2)
ridge_r2 =1 - ridge_err/mean((test_avg - test$Apps)^2)
lasso_r2 =1 - lasso_err/mean((test_avg - test$Apps)^2)
pcr_r2 =1 - pcr_err/mean((test_avg - test$Apps)^2)
pls_r2 =1 - pls_err/mean((test_avg - test$Apps)^2)
all_r2 = c(lm_r2, ridge_r2, lasso_r2, pcr_r2, pls_r2)
names(all_r2) = c("lm", "ridge", "lasso", "pcr", "pls")
barplot(all_r2)
#All of the models except PCR predict the number of college applications with comparable accuracy (similar test R²); the PCR fit with 10 components has a noticeably larger test error.
#11. We will now try to predict per capita crime rate in the Boston data set.
##(a) Try out some of the regression methods explored in this chapter, such as best subset selection, the lasso, ridge regression, and PCR. Present and discuss results for the approaches that you consider.
set.seed(10)
training = sample(1:nrow(Boston), nrow(Boston)/2)
train = Boston[training,]
test = Boston[-training,]
trainset = model.matrix(crim~., data=train)[,-1]
testset = model.matrix(crim~., data=test)[,-1]
lasso.fit = cv.glmnet(trainset, train$crim, alpha=1)
lambda = lasso.fit$lambda.min
lasso.pred = predict(lasso.fit, s=lambda, newx=testset)
lasso.err = mean((test$crim - lasso.pred)^2)
lasso.err
## [1] 55.79952
pcr2.fit = pcr(crim~., data=train, scale=TRUE, validation="CV")
summary(pcr2.fit)
## Data: X dimension: 253 12
## Y dimension: 253 1
## Fit method: svdpc
## Number of components considered: 12
##
## VALIDATION: RMSEP
## Cross-validated using 10 random segments.
## (Intercept) 1 comps 2 comps 3 comps 4 comps 5 comps 6 comps
## CV 7.521 6.213 6.177 5.881 5.745 5.690 5.710
## adjCV 7.521 6.208 6.172 5.874 5.738 5.685 5.703
## 7 comps 8 comps 9 comps 10 comps 11 comps 12 comps
## CV 5.582 5.433 5.425 5.422 5.411 5.373
## adjCV 5.577 5.423 5.417 5.415 5.402 5.363
##
## TRAINING: % variance explained
## 1 comps 2 comps 3 comps 4 comps 5 comps 6 comps 7 comps 8 comps
## X 48.65 61.88 71.98 79.65 86.07 89.97 92.68 94.95
## crim 33.28 34.39 40.90 43.35 44.56 44.71 47.38 50.19
## 9 comps 10 comps 11 comps 12 comps
## X 96.87 98.31 99.47 100.00
## crim 50.21 50.32 50.91 51.88
pcr2.pred = predict(pcr2.fit, test, ncomp=12)
pcr2.err = mean((test$crim - pcr2.pred)^2)
pcr2.err
## [1] 55.17084
ridge.fit = cv.glmnet(trainset, train$crim, alpha=0)
lambda = ridge.fit$lambda.min
ridge.pred = predict(ridge.fit, s=lambda, newx=testset)
ridge.err = mean((test$crim - ridge.pred)^2)
ridge.err
## [1] 56.03528
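#Best subset selection is also listed in the exercise but was not fitted above; a sketch using the leaps package (assumed available), evaluated on the same held-out half, is given below.
library(leaps)
bss.fit = regsubsets(crim ~ ., data = train, nvmax = ncol(Boston) - 1)
bss.err = rep(NA, ncol(Boston) - 1)
for (i in 1:(ncol(Boston) - 1)) {
  coefi = coef(bss.fit, id = i)                                   # size-i model
  pred = cbind(1, testset[, names(coefi)[-1], drop = FALSE]) %*% coefi
  bss.err[i] = mean((test$crim - pred)^2)                         # held-out MSE
}
which.min(bss.err)   # subset size with the lowest held-out error
min(bss.err)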
##(b) Propose a model (or set of models) that seem to perform well on this data set, and justify your answer. Make sure that you are evaluating model performance using validation-set error, cross-validation, or some other reasonable alternative, as opposed to using training error.
#In part (a), the held-out test MSEs were 55.80 for the lasso, 55.17 for PCR, and 56.04 for ridge regression. The three errors are very close, so all of the approaches perform comparably well on this data set. PCR achieves the lowest error, but it uses all 12 components and so offers no dimension reduction; the lasso comes within about 1% of it while also producing a simpler, sparser model, so the lasso is a reasonable choice.
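#For a side-by-side look at the held-out errors computed in part (a):
c(lasso = lasso.err, pcr = pcr2.err, ridge = ridge.err)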
##(c) Does your chosen model involve all of the features in the data set? Why or why not?
#No. The lasso shrinks the coefficients of predictors it deems unimportant exactly to zero, so those variables are dropped from the model and the chosen fit generally does not involve all of the features.
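#To check which features the selected lasso model actually keeps, one can inspect its coefficients at the CV-chosen λ (output not shown above):
predict(lasso.fit, s = "lambda.min", type = "coefficients")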