---
title: "HW 5"
output: 
  html_notebook:
    toc: True
    toc_float: TRUE
---

# 2

(a) The lasso, relative to least squares, is:


i. More flexible and hence will give improved prediction accuracy when its increase in bias is less than its decrease in variance.

ii. More flexible and hence will give improved prediction accuracy when its increase in variance is less than its decrease in bias.

iii. Less flexible and hence will give improved prediction accuracy when its increase in bias is less than its decrease in variance.

iv. Less flexible and hence will give improved prediction accuracy when its increase in variance is less than its decrease in bias.

iii is true. The lasso is less flexible than least squares: the penalty shrinks coefficients and can set some exactly to zero, so it trades an increase in bias for a reduction in variance and improves prediction accuracy when the increase in bias is less than the decrease in variance.

(b) Repeat (a) for ridge regression relative to least squares.

iii is true for ridge regression as well: ridge is also less flexible than least squares (its coefficients are shrunk toward zero, though, unlike the lasso, never exactly to zero), and it improves prediction accuracy when its increase in bias is smaller than its decrease in variance.


(c) Repeat (a) for non-linear methods relative to least squares.

ii is true for non-linear methods relative to least squares: they are more flexible, so they improve prediction accuracy when their increase in variance is less than their decrease in bias.
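All three answers come down to the bias–variance decomposition of the expected test error at a point $(x_0, y_0)$:

$$
E\big[(y_0 - \hat{f}(x_0))^2\big] = \mathrm{Var}\big(\hat{f}(x_0)\big) + \big[\mathrm{Bias}\big(\hat{f}(x_0)\big)\big]^2 + \mathrm{Var}(\varepsilon),
$$

so a change in flexibility improves prediction exactly when the term it increases grows by less than the term it decreases shrinks.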

# 9

In this exercise, we will predict the number of applications received using the other variables in the College data set.

(a) Split the data set into a training set and a test set.
```{r}
df = read.csv("College.csv") # reading in the data
df = df[-c(1:2)] # drop the first two columns (college name and the Private indicator), keeping the 17 numeric variables
```


```{r}
#checking the structure 
str(df)
```

```{r} 
# creating training and testing data
set.seed(1) # fix the split so the reported test errors are reproducible
index = sample(1:nrow(df), size = 0.7 * nrow(df))
train = df[index,]
test = df[-index,]

x.train = model.matrix(Apps ~ ., train)[,-1]
y.train = train$Apps
x.test = model.matrix(Apps ~ ., test)[,-1] # built from the test set, not the training set
y.test = test$Apps
```


(b) Fit a linear model using least squares on the training set, and
report the test error obtained.
```{r}
lm = lm(Apps ~ .,data = train)
summary(lm)
```

```{r}
# finding the test error by predicting on the test data 
predictions = predict(lm, newdata = test)

actual = test$Apps 
mse = mean((predictions - actual)^2)
cat("MSE:", mse)
```
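This yields a test MSE on the order of $1.3 \times 10^{6}$ (about $1{,}328{,}973$ on our run), i.e., a root-MSE of roughly 1,150 applications.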

(c) Fit a ridge regression model on the training set, with λ chosen
by cross-validation. Report the test error obtained.

```{r}
x = model.matrix(Apps ~.,df)[,-1]
y = df$Apps
```

```{r}
library(glmnet)
grid = 10^seq(10, -2, length = 100) # lambda grid from 10^10 down to 10^-2
ridge.mod = glmnet(x.train, y.train, alpha = 0, lambda = grid) # ridge: alpha = 0, fit on the training set
```

```{r}
# cross-validation on the training set
set.seed(1)
cv.ridge = cv.glmnet(x.train, y.train, alpha = 0, lambda = grid)
best.lambda = cv.ridge$lambda.min 
print(paste("Best lambda:", best.lambda))
```
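On our run, cross-validation chose $\lambda = 0.01$, the smallest value in the grid, i.e., it favors very little shrinkage here.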

```{r}
ridge.model = glmnet(x.train, y.train, alpha = 0, lambda = best.lambda)
ridge.pred = predict(ridge.model, s = best.lambda, newx = x.test)

mse = mean((ridge.pred - y.test)^2)
cat("Test MSE for Ridge Regression:", mse)
```


(d) Fit a lasso model on the training set, with λ chosen by cross-validation. Report the test error obtained, along with the number of non-zero coefficient estimates.
```{r}
# cross-validation on the training set to choose lambda for the lasso
set.seed(1)
cv.lasso = cv.glmnet(x.train, y.train, alpha = 1, lambda = grid)
best.lasso.lam = cv.lasso$lambda.min
print(paste("Best Lasso Lambda:", round(best.lasso.lam,3)))
```
```{r}
lasso.model = glmnet(x.train,y.train,alpha = 1,lambda = best.lasso.lam)
lasso.pred = predict(lasso.model, s = best.lasso.lam, newx = x.test)

mse = mean((lasso.pred-y.test)^2)
cat("Test MSE for Lasso Regression", mse)

```
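
Part (d) also asks for the number of non-zero coefficient estimates. A minimal sketch, reusing `lasso.model` and `best.lasso.lam` from the chunks above:

```{r}
# coefficients at the CV-chosen lambda; a sparse matrix with zeros for dropped predictors
lasso.coef = predict(lasso.model, type = "coefficients", s = best.lasso.lam)
sum(lasso.coef != 0) - 1 # number of non-zero coefficients, excluding the intercept
```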

(e) Fit a PCR model on the training set, with M chosen by cross-
validation. Report the test error obtained, along with the value
of M selected by cross-validation.

```{r}
library(pls)
set.seed(1)

pcr.fit = pcr(Apps ~.,data = df,scale = T, validation ="CV")
summary(pcr.fit)
```

```{r}
validationplot(pcr.fit, val.type = "MSEP")

```
The CV error is lowest when all 16 components are used, i.e., cross-validation selects $M = 16$.
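
To read the selected $M$ off the fit directly rather than from the plot, one can minimize the cross-validated PRESS (the same pattern used for PLS below):

```{r}
# component count with the smallest cross-validated PRESS
which.min(pcr.fit$validation$PRESS)
```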

```{r}
pcr.fit = pcr(Apps ~ ., data = df, subset = index, scale = T, validation = "CV") # subset takes row indices, not a model matrix

pcr.pred = predict(pcr.fit,x.test, ncomp = 16) 
mean((pcr.pred -y.test)^2)
```


```{r}
pcr.fit = pcr(y ~ x, scale = T, ncomp = 16) # refit on the full data with the chosen M
summary(pcr.fit)
```

(f) Fit a PLS model on the training set, with M chosen by cross-
validation. Report the test error obtained, along with the value
of M selected by cross-validation.

```{r}
set.seed(1)
pls.fit = plsr(Apps ~ ., data = df, subset = index, scale = T, validation = "CV") # subset takes row indices
summary(pls.fit)
```

```{r}
validationplot(pls.fit,val.type = "MSEP")
out = which.min(pls.fit$validation$PRESS)
out
```
The full model, $M = 16$, has the lowest CV error.

```{r}
pls.pred = predict(pls.fit,x.test,ncomp = 16)
mean((pls.pred-y.test)^2)
```

```{r}
pls.fit = plsr(Apps ~ ., data = df, scale = T, ncomp = 16) # refit on the full data with the chosen M
summary(pls.fit)
```


(g) Comment on the results obtained. How accurately can we pre-
dict the number of college applications received? Is there much
difference among the test errors resulting from these five approaches?

The five approaches give broadly similar test errors: on our run the test MSEs were all on the order of $1.1$–$1.3$ million, so there is not much difference among the methods. In absolute terms, though, these errors are large (root-MSEs above 1,000 applications), so none of these models predicts the number of applications a college will receive very accurately.



# 11
We will now try to predict per capita crime rate in the Boston data
set.


(a) Try out some of the regression methods explored in this chapter,such as best subset selection, the lasso, ridge regression, and PCR. Present and discuss results for the approaches that you consider.

```{r}
library(MASS)
data(Boston)

set.seed(1) # fix the split for reproducibility
boston.index = sample(1:nrow(Boston), size = 0.7 * nrow(Boston))

boston.train = Boston[boston.index,]
boston.test = Boston[-boston.index,]

bx.train = model.matrix(crim ~.,boston.train)
by.train = boston.train$crim
bx.test =model.matrix(crim ~.,boston.test)
by.test = boston.test$crim

```
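Note that these model matrices keep the intercept column (no `[,-1]` here): the best-subsets prediction loop below relies on it, since the coefficient vector returned by `coef()` includes `(Intercept)`.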
## Best Subsets Method
```{r}
library(leaps)
# Best Subsets Method 
regfit.full = regsubsets(crim ~., boston.train,nvmax = 13)

reg.summary= summary(regfit.full)

names(reg.summary)
```

```{r}
# validation-set MSE for the best model of each size
val.errors = rep(NA, 13)
for (i in 1:13) {
  coefi = coef(regfit.full, id = i)        # coefficients of the best i-variable model
  pred = bx.test[, names(coefi)] %*% coefi # manual prediction: regsubsets has no predict() method
  val.errors[i] = mean((by.test - pred)^2)
}
val.errors
which.min(val.errors)
coef(regfit.full, 1) # the best one-variable model
```
Best subset selection, evaluated on the test set, favors the one-variable model regressing *crim* on *rad*; it yields the lowest test MSE, about $13.3$.
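
Since `regsubsets()` has no `predict()` method, the loop above does the matrix multiplication by hand. A reusable alternative is the small helper from the ISLR lab, sketched here under the assumption that the fit was created with a formula call:

```{r}
# helper: predict() for regsubsets objects (ISLR lab pattern)
predict.regsubsets = function(object, newdata, id, ...) {
  form = as.formula(object$call[[2]]) # recover the formula from the original call
  mat = model.matrix(form, newdata)   # build the test model matrix
  coefi = coef(object, id = id)       # coefficients of the best id-variable model
  mat[, names(coefi)] %*% coefi
}
```

With this helper, the loop body reduces to `predict(regfit.full, boston.test, id = i)`.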


## Lasso Method


```{r}
blasso.model = glmnet(bx.train, by.train, alpha = 1, lambda = grid) # reuse the lambda grid defined above
plot(blasso.model)
```
```{r}
set.seed(1)
cv.out = cv.glmnet(bx.train,by.train,alpha = 1)
bos.lam = cv.out$lambda.min
bos.las.pred = predict(blasso.model, s = bos.lam, newx = bx.test) # use the Boston CV lambda, bos.lam
mean((bos.las.pred -by.test)^2)
```
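Since part (c) asks which features the chosen model uses, it is worth checking which predictors the lasso actually retains; a quick sketch using the objects above:

```{r}
# non-zero entries show the predictors kept at the CV-chosen lambda
predict(cv.out, type = "coefficients", s = bos.lam)
```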
The lasso test MSE ($13.6$) is close to that of the best subsets method.

## Ridge Method

```{r}
# fit ridge over the lambda grid, then choose lambda by cross-validation
boston.ridge = glmnet(bx.train, by.train, alpha = 0, lambda = grid)

cv.bos.rid = cv.glmnet(bx.train, by.train, alpha = 0)
plot(cv.bos.rid)
best.bos.ridge.lam = cv.bos.rid$lambda.min
bos.ridge.pred = predict(boston.ridge, s = best.bos.ridge.lam, newx = bx.test)
mean((bos.ridge.pred - by.test)^2)
```
The ridge test MSE ($18.1$) is higher than those of the lasso and best subsets.


## PCR Method 
```{r}
set.seed(2)
bos.pcr = pcr(crim ~., data = Boston, scale = T, validation = "CV")
summary(bos.pcr)
validationplot(bos.pcr, val.type = "MSEP")
```

```{r}
# Cross Validating to find the best M value 
set.seed(1)
bos.pcr2 = pcr(crim ~ ., data = Boston, subset = boston.index, scale = T, validation = "CV") # subset takes row indices
summary(bos.pcr2)
validationplot(bos.pcr2, val.type = "MSEP")
which.min(bos.pcr2$validation$PRESS)
```
```{r}
# Testing on test data: pls models take newdata, not newx
bos.pcr.pred = predict(bos.pcr2, newdata = boston.test, ncomp = 13)
mean((bos.pcr.pred - by.test)^2)
```
On this run, the PCR test MSE was the highest of the methods considered.

(b) Propose a model (or set of models) that seem to perform well on this data set, and justify your answer. Make sure that you are evaluating model performance using validation set error, cross-validation, or some other reasonable alternative, as opposed to using training error.

The best-performing model was the one chosen by best subset selection, which achieved a test MSE of about $13.3$, slightly better than the lasso ($13.6$) and clearly better than ridge ($18.1$). Since these are validation-set errors rather than training errors, they give a fair comparison of the methods.

(c) Does your chosen model involve all of the features in the data set? Why or why not?

No. The chosen model uses only one feature, *rad*: among all subset sizes, the one-variable model regressing *crim* on *rad* gave the lowest validation-set MSE, so including the other features did not improve test performance.
