2. The correct answers are:

(a) The lasso, relative to least squares, is: iii. Less flexible and hence will give improved prediction accuracy when its increase in bias is less than its decrease in variance.

(b) Ridge regression, relative to least squares, is: iii. Less flexible and hence will give improved prediction accuracy when its increase in bias is less than its decrease in variance.

(c) Non-linear methods, relative to least squares, are: ii. More flexible and hence will give improved prediction accuracy when their increase in variance is less than their decrease in bias.

9. In this exercise, we will predict the number of applications received using the other variables in the College data set.

(a) Split the data set into a training set and a test set.

library(ISLR)
set.seed(1)  # fix the seed so the split is reproducible
# Hold out 30% of the observations as a test set
rand_obs <- sample.int(nrow(College), floor(nrow(College) * 0.7))
train_set <- College[rand_obs, ]
test_set <- College[-rand_obs, ]

(b) Fit a linear model using least squares on the training set, and report the test error obtained.

lm.fit <- lm(Apps ~ ., data = train_set)
summary(lm.fit)
## 
## Call:
## lm(formula = Apps ~ ., data = train_set)
## 
## Residuals:
##     Min      1Q  Median      3Q     Max 
## -3541.6  -458.7   -37.4   312.3  7418.8 
## 
## Coefficients:
##               Estimate Std. Error t value Pr(>|t|)    
## (Intercept) -144.31603  512.02034  -0.282 0.778165    
## PrivateYes  -442.87818  172.42573  -2.569 0.010489 *  
## Accept         1.67649    0.04572  36.668  < 2e-16 ***
## Enroll        -0.79910    0.20746  -3.852 0.000132 ***
## Top10perc     48.33376    6.71193   7.201 2.08e-12 ***
## Top25perc    -12.57759    5.36135  -2.346 0.019348 *  
## F.Undergrad    0.01079    0.03771   0.286 0.774956    
## P.Undergrad    0.06310    0.03576   1.764 0.078231 .  
## Outstate      -0.09519    0.02306  -4.127 4.27e-05 ***
## Room.Board     0.12475    0.05684   2.195 0.028622 *  
## Books          0.16668    0.28951   0.576 0.565032    
## Personal       0.01771    0.07062   0.251 0.802134    
## PhD           -8.78289    5.81358  -1.511 0.131453    
## Terminal      -4.33909    6.40877  -0.677 0.498669    
## S.F.Ratio      2.53563   17.12521   0.148 0.882349    
## perc.alumni    1.37125    4.92392   0.278 0.780747    
## Expend         0.07091    0.01602   4.426 1.17e-05 ***
## Grad.Rate      8.46122    3.55393   2.381 0.017630 *  
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 1034 on 525 degrees of freedom
## Multiple R-squared:  0.936,  Adjusted R-squared:  0.934 
## F-statistic: 451.9 on 17 and 525 DF,  p-value: < 2.2e-16
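
The test error is the mean squared error of the fitted model's predictions on the held-out set from part (a); the exact value depends on the random split:

lm.pred <- predict(lm.fit, test_set)
mean((lm.pred - test_set$Apps)^2)  # test MSE for the least-squares fit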

(c) Fit a ridge regression model on the training set, with λ chosen by cross-validation. Report the test error obtained.

# Build the full design matrix for glmnet; [,-1] drops the intercept column
x <- model.matrix(Apps ~ ., College)[,-1]
y <- College$Apps

library(glmnet)
grid=10^seq(10,-2,length=100)       # lambda grid from 10^10 down to 10^-2
set.seed(1)
train=sample(1:nrow(x),nrow(x)/2)   # 50/50 train/test split
test=(-train)
y.test=y[test]
ridge.mod=glmnet(x[train,],y[train],alpha=0,lambda=grid)  # fit ridge on the training rows only
cv.out=cv.glmnet(x[train,],y[train],alpha=0)              # 10-fold CV to choose lambda
plot(cv.out)

bestlam=cv.out$lambda.min
bestlam
## [1] 405.8404
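
With λ chosen by cross-validation, the test error is the MSE of the ridge predictions on the held-out half; the exact value depends on the split:

ridge.pred=predict(ridge.mod,s=bestlam,newx=x[test,])
mean((ridge.pred-y.test)^2)  # test MSE for ridge at lambda = bestlam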

(d) Fit a lasso model on the training set, with λ chosen by cross-validation. Report the test error obtained, along with the number of non-zero coefficient estimates.

lasso.mod=glmnet(x[train,],y[train],alpha=1,lambda=grid)
plot(lasso.mod)

set.seed(1)
cv.out=cv.glmnet(x[train,],y[train],alpha=1)
plot(cv.out)

bestlam=cv.out$lambda.min
lasso.pred=predict(lasso.mod,s=bestlam,newx=x[test,])
mean((lasso.pred - y.test)^2)
## [1] 1140717
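
The exercise also asks for the number of non-zero coefficient estimates. A sketch, counting the non-zero entries at the chosen λ (the exact count depends on the CV run):

lasso.coef=predict(lasso.mod,type="coefficients",s=bestlam)
lasso.coef[lasso.coef!=0]  # the surviving coefficients
sum(lasso.coef!=0)         # their number (including the intercept)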

(e) Fit a PCR model on the training set, with M chosen by crossvalidation. Report the test error obtained, along with the value of M selected by cross-validation.

library(pls)
pcr.fit=pcr(Apps ~ ., data=College, scale=TRUE, validation="CV")  # note: fit on the full data set; a train-only refit follows below
summary(pcr.fit)
## Data:    X dimension: 777 17 
##  Y dimension: 777 1
## Fit method: svdpc
## Number of components considered: 17
## 
## VALIDATION: RMSEP
## Cross-validated using 10 random segments.
##        (Intercept)  1 comps  2 comps  3 comps  4 comps  5 comps  6 comps
## CV            3873     3836     2035     2041     1784     1591     1595
## adjCV         3873     3838     2033     2040     1720     1585     1592
##        7 comps  8 comps  9 comps  10 comps  11 comps  12 comps  13 comps
## CV        1581     1557     1514      1513      1515      1514      1520
## adjCV     1582     1551     1511      1509      1512      1510      1516
##        14 comps  15 comps  16 comps  17 comps
## CV         1521      1466      1176      1139
## adjCV      1518      1450      1169      1133
## 
## TRAINING: % variance explained
##       1 comps  2 comps  3 comps  4 comps  5 comps  6 comps  7 comps  8 comps
## X      31.670    57.30    64.30    69.90    75.39    80.38    83.99    87.40
## Apps    2.316    73.06    73.07    82.08    84.08    84.11    84.32    85.18
##       9 comps  10 comps  11 comps  12 comps  13 comps  14 comps  15 comps
## X       90.50     92.91     95.01     96.81      97.9     98.75     99.36
## Apps    85.88     86.06     86.06     86.10      86.1     86.13     90.32
##       16 comps  17 comps
## X        99.84    100.00
## Apps     92.52     92.92
validationplot(pcr.fit,val.type="MSEP")
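
The CV RMSEP above is minimized at M = 17, i.e. using all of the components, in which case PCR reduces to the least-squares fit. Since the exercise asks for a fit on the training set, here is a sketch that refits on the training rows from part (c) and computes the test MSE (the numbers will differ from the full-data summary above and depend on the split):

set.seed(1)
pcr.fit.tr=pcr(Apps ~ ., data=College, subset=train, scale=TRUE, validation="CV")
pcr.pred=predict(pcr.fit.tr, College[test,], ncomp=17)
mean((pcr.pred - y.test)^2)  # test MSE for PCR with M = 17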

(f) Fit a PLS model on the training set, with M chosen by cross-validation. Report the test error obtained, along with the value of M selected by cross-validation.

set.seed(1)
pls.fit=plsr(Apps ~ ., data=College, subset=train, scale=TRUE, validation="CV")
summary(pls.fit)
## Data:    X dimension: 388 17 
##  Y dimension: 388 1
## Fit method: kernelpls
## Number of components considered: 17
## 
## VALIDATION: RMSEP
## Cross-validated using 10 random segments.
##        (Intercept)  1 comps  2 comps  3 comps  4 comps  5 comps  6 comps
## CV            4288     2217     2019     1761     1630     1533     1347
## adjCV         4288     2211     2012     1749     1605     1510     1331
##        7 comps  8 comps  9 comps  10 comps  11 comps  12 comps  13 comps
## CV        1309     1303     1286      1283      1283      1277      1271
## adjCV     1296     1289     1273      1270      1270      1264      1258
##        14 comps  15 comps  16 comps  17 comps
## CV         1270      1270      1270      1270
## adjCV      1258      1257      1257      1257
## 
## TRAINING: % variance explained
##       1 comps  2 comps  3 comps  4 comps  5 comps  6 comps  7 comps  8 comps
## X       27.21    50.73    63.06    65.52    70.20    74.20    78.62    80.81
## Apps    75.39    81.24    86.97    91.14    92.62    93.43    93.56    93.68
##       9 comps  10 comps  11 comps  12 comps  13 comps  14 comps  15 comps
## X       83.29     87.17     89.15     91.37     92.58     94.42     96.98
## Apps    93.76     93.79     93.83     93.86     93.88     93.89     93.89
##       16 comps  17 comps
## X        98.78    100.00
## Apps     93.89     93.89
validationplot(pls.fit,val.type="MSEP")
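
The CV error levels off after roughly M = 12 components, with essentially no improvement thereafter, so M = 12 is a reasonable choice. A sketch of the test-error computation (the value depends on the split):

pls.pred=predict(pls.fit, College[test,], ncomp=12)
mean((pls.pred - y.test)^2)  # test MSE for PLS with M = 12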

(g) Comment on the results obtained. How accurately can we predict the number of college applications received? Is there much difference among the test errors resulting from these five approaches?
There is not much difference among the test errors of the five approaches, so the number of applications received can be predicted with reasonable, though far from perfect, accuracy. Of the five, ridge regression appears to perform worst on this split; note also that PCR selects M = 17, all of the components, so its fit coincides with ordinary least squares.