Project4

#The data contains information collected by the US Bureau of the Census concerning housing in the area of Boston, Massachusetts. Let's use the code below to display the data.

D= MASS::Boston
#The goal of this preject is to predict the median house price in new tracts based on information such as crime rate, pollution, and number of rooms.

1.Partition the data into a training set and a validation set. Set the seed to the number “3.14” by issuing the code set.seed(3.14).

#Before we partition the data into a training and a validation set, it is important to know why we divide the data. We separate data into training and validation sets because it gives the model something to predict over which it was not shown before. The training set is used to create the model and validation is to predict the accuracy/performance of the model.

set.seed(3.14)

#Here we split the dataset into 80% 'training' and 20% 'validation'.
ind<-sample(2, nrow(D), replace=TRUE, prob=c(0.8, 0.2))
training_data<-D[ind==1,]
validation_data <- D[ind==2,]

#Training set is used to create the model.
head(training_data)

     crim   zn indus chas   nox    rm  age    dis rad tax ptratio  black lstat
1 0.00632 18.0  2.31    0 0.538 6.575 65.2 4.0900   1 296    15.3 396.90  4.98
3 0.02729  0.0  7.07    0 0.469 7.185 61.1 4.9671   2 242    17.8 392.83  4.03
4 0.03237  0.0  2.18    0 0.458 6.998 45.8 6.0622   3 222    18.7 394.63  2.94
5 0.06905  0.0  2.18    0 0.458 7.147 54.2 6.0622   3 222    18.7 396.90  5.33
6 0.02985  0.0  2.18    0 0.458 6.430 58.7 6.0622   3 222    18.7 394.12  5.21
7 0.08829 12.5  7.87    0 0.524 6.012 66.6 5.5605   5 311    15.2 395.60 12.43
  medv
1 24.0
3 34.7
4 33.4
5 36.2
6 28.7
7 22.9

#As we can see from the table, what ever data is missing in tdata has gone to vdata. Validation predicts the accuracy of model.
head(validation_data)

      crim zn indus chas   nox    rm  age    dis rad tax ptratio  black lstat
2  0.02731  0  7.07    0 0.469 6.421 78.9 4.9671   2 242    17.8 396.90  9.14
15 0.63796  0  8.14    0 0.538 6.096 84.5 4.4619   4 307    21.0 380.02 10.26
16 0.62739  0  8.14    0 0.538 5.834 56.5 4.4986   4 307    21.0 395.62  8.47
19 0.80271  0  8.14    0 0.538 5.456 36.6 3.7965   4 307    21.0 288.99 11.69
28 0.95577  0  8.14    0 0.538 6.047 88.8 4.4534   4 307    21.0 306.38 17.28
37 0.09744  0  5.96    0 0.499 5.841 61.4 3.3779   5 279    19.2 377.56 11.41
   medv
2  21.6
15 18.2
16 19.9
19 20.2
28 14.8
37 20.0

#Thing to remember: We develop multiple regression model based on training dataset and we test that model using validation data which is vdata in our case. If the predictions are good, that means our model is doing fine.

2.Use the training data to do stepwise regression with the three options (backward, forward, both). Choose the top model from each regression. Then, use each of these models to make prediction for MEDV based on the validation data. Compare the models in terms of RMSE. If the three models are the same, then no comparison is needed and just report the RMSE of the common model.

m0 = lm(medv ~ 1, data = D) # Fit the intercept-only model (no predictor)
summary(m0)


Call:
lm(formula = medv ~ 1, data = D)

Residuals:
    Min      1Q  Median      3Q     Max 
-17.533  -5.508  -1.333   2.467  27.467 

Coefficients:
            Estimate Std. Error t value Pr(>|t|)    
(Intercept)  22.5328     0.4089   55.11   <2e-16 ***
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Residual standard error: 9.197 on 505 degrees of freedom

m = lm(medv ~ ., data = D) # Fit the model with all predictors in
summary(m)


Call:
lm(formula = medv ~ ., data = D)

Residuals:
    Min      1Q  Median      3Q     Max 
-15.595  -2.730  -0.518   1.777  26.199 

Coefficients:
              Estimate Std. Error t value Pr(>|t|)    
(Intercept)  3.646e+01  5.103e+00   7.144 3.28e-12 ***
crim        -1.080e-01  3.286e-02  -3.287 0.001087 ** 
zn           4.642e-02  1.373e-02   3.382 0.000778 ***
indus        2.056e-02  6.150e-02   0.334 0.738288    
chas         2.687e+00  8.616e-01   3.118 0.001925 ** 
nox         -1.777e+01  3.820e+00  -4.651 4.25e-06 ***
rm           3.810e+00  4.179e-01   9.116  < 2e-16 ***
age          6.922e-04  1.321e-02   0.052 0.958229    
dis         -1.476e+00  1.995e-01  -7.398 6.01e-13 ***
rad          3.060e-01  6.635e-02   4.613 5.07e-06 ***
tax         -1.233e-02  3.760e-03  -3.280 0.001112 ** 
ptratio     -9.527e-01  1.308e-01  -7.283 1.31e-12 ***
black        9.312e-03  2.686e-03   3.467 0.000573 ***
lstat       -5.248e-01  5.072e-02 -10.347  < 2e-16 ***
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Residual standard error: 4.745 on 492 degrees of freedom
Multiple R-squared:  0.7406,    Adjusted R-squared:  0.7338 
F-statistic: 108.1 on 13 and 492 DF,  p-value: < 2.2e-16

#To select a subset of predictors that maximize a criterion (such as R-square) or minimize a criterion , we can use the training data to do stepwise regression with the three options (backward, forward, both). Here, the procedure is provided by the package MASS.

suppressMessages(library(MASS))

m<-stepAIC(m, direction = "backward")

Start:  AIC=1589.64
medv ~ crim + zn + indus + chas + nox + rm + age + dis + rad + 
    tax + ptratio + black + lstat

          Df Sum of Sq   RSS    AIC
- age      1      0.06 11079 1587.7
- indus    1      2.52 11081 1587.8
<none>                 11079 1589.6
- chas     1    218.97 11298 1597.5
- tax      1    242.26 11321 1598.6
- crim     1    243.22 11322 1598.6
- zn       1    257.49 11336 1599.3
- black    1    270.63 11349 1599.8
- rad      1    479.15 11558 1609.1
- nox      1    487.16 11566 1609.4
- ptratio  1   1194.23 12273 1639.4
- dis      1   1232.41 12311 1641.0
- rm       1   1871.32 12950 1666.6
- lstat    1   2410.84 13490 1687.3

Step:  AIC=1587.65
medv ~ crim + zn + indus + chas + nox + rm + dis + rad + tax + 
    ptratio + black + lstat

          Df Sum of Sq   RSS    AIC
- indus    1      2.52 11081 1585.8
<none>                 11079 1587.7
- chas     1    219.91 11299 1595.6
- tax      1    242.24 11321 1596.6
- crim     1    243.20 11322 1596.6
- zn       1    260.32 11339 1597.4
- black    1    272.26 11351 1597.9
- rad      1    481.09 11560 1607.2
- nox      1    520.87 11600 1608.9
- ptratio  1   1200.23 12279 1637.7
- dis      1   1352.26 12431 1643.9
- rm       1   1959.55 13038 1668.0
- lstat    1   2718.88 13798 1696.7

Step:  AIC=1585.76
medv ~ crim + zn + chas + nox + rm + dis + rad + tax + ptratio + 
    black + lstat

          Df Sum of Sq   RSS    AIC
<none>                 11081 1585.8
- chas     1    227.21 11309 1594.0
- crim     1    245.37 11327 1594.8
- zn       1    257.82 11339 1595.4
- black    1    270.82 11352 1596.0
- tax      1    273.62 11355 1596.1
- rad      1    500.92 11582 1606.1
- nox      1    541.91 11623 1607.9
- ptratio  1   1206.45 12288 1636.0
- dis      1   1448.94 12530 1645.9
- rm       1   1963.66 13045 1666.3
- lstat    1   2723.48 13805 1695.0


Call:
lm(formula = medv ~ crim + zn + chas + nox + rm + dis + rad + 
    tax + ptratio + black + lstat, data = D)

Coefficients:
(Intercept)         crim           zn         chas          nox           rm  
  36.341145    -0.108413     0.045845     2.718716   -17.376023     3.801579  
        dis          rad          tax      ptratio        black        lstat  
  -1.492711     0.299608    -0.011778    -0.946525     0.009291    -0.522553

n<-stepAIC(m0, direction = "forward", scope = list(lower = m0, upper = m))

Start:  AIC=2246.51
medv ~ 1

          Df Sum of Sq   RSS    AIC
+ lstat    1   23243.9 19472 1851.0
+ rm       1   20654.4 22062 1914.2
+ ptratio  1   11014.3 31702 2097.6
+ tax      1    9377.3 33339 2123.1
+ nox      1    7800.1 34916 2146.5
+ crim     1    6440.8 36276 2165.8
+ rad      1    6221.1 36495 2168.9
+ zn       1    5549.7 37167 2178.1
+ black    1    4749.9 37966 2188.9
+ dis      1    2668.2 40048 2215.9
+ chas     1    1312.1 41404 2232.7
<none>                 42716 2246.5

Step:  AIC=1851.01
medv ~ lstat

          Df Sum of Sq   RSS    AIC
+ rm       1    4033.1 15439 1735.6
+ ptratio  1    2670.1 16802 1778.4
+ chas     1     786.3 18686 1832.2
+ dis      1     772.4 18700 1832.5
+ tax      1     274.4 19198 1845.8
+ black    1     198.3 19274 1847.8
+ zn       1     160.3 19312 1848.8
+ crim     1     146.9 19325 1849.2
<none>                 19472 1851.0
+ rad      1      25.1 19447 1852.4
+ nox      1       4.8 19468 1852.9

Step:  AIC=1735.58
medv ~ lstat + rm

          Df Sum of Sq   RSS    AIC
+ ptratio  1   1711.32 13728 1678.1
+ chas     1    548.53 14891 1719.3
+ black    1    512.31 14927 1720.5
+ tax      1    425.16 15014 1723.5
+ dis      1    351.15 15088 1725.9
+ crim     1    311.42 15128 1727.3
+ rad      1    180.45 15259 1731.6
<none>                 15439 1735.6
+ zn       1     56.56 15383 1735.7
+ nox      1     14.90 15424 1737.1

Step:  AIC=1678.13
medv ~ lstat + rm + ptratio

        Df Sum of Sq   RSS    AIC
+ dis    1    499.08 13229 1661.4
+ black  1    389.68 13338 1665.6
+ chas   1    377.96 13350 1666.0
+ crim   1    122.52 13606 1675.6
<none>               13728 1678.1
+ tax    1     44.36 13684 1678.5
+ nox    1     24.81 13703 1679.2
+ zn     1     14.96 13713 1679.6
+ rad    1      6.07 13722 1679.9

Step:  AIC=1661.39
medv ~ lstat + rm + ptratio + dis

        Df Sum of Sq   RSS    AIC
+ nox    1    759.56 12469 1633.5
+ black  1    502.64 12726 1643.8
+ chas   1    267.43 12962 1653.1
+ tax    1    240.34 12989 1654.1
+ crim   1    233.54 12995 1654.4
+ zn     1    144.81 13084 1657.8
<none>               13229 1661.4
+ rad    1     22.40 13206 1662.5

Step:  AIC=1633.47
medv ~ lstat + rm + ptratio + dis + nox

        Df Sum of Sq   RSS    AIC
+ chas   1    328.27 12141 1622.0
+ black  1    311.83 12158 1622.7
+ zn     1    151.71 12318 1629.3
+ crim   1    141.43 12328 1629.7
+ rad    1     53.48 12416 1633.3
<none>               12469 1633.5
+ tax    1     10.50 12459 1635.0

Step:  AIC=1621.97
medv ~ lstat + rm + ptratio + dis + nox + chas

        Df Sum of Sq   RSS    AIC
+ black  1   272.837 11868 1612.5
+ zn     1   164.406 11977 1617.1
+ crim   1   116.330 12025 1619.1
+ rad    1    58.556 12082 1621.5
<none>               12141 1622.0
+ tax    1     4.187 12137 1623.8

Step:  AIC=1612.47
medv ~ lstat + rm + ptratio + dis + nox + chas + black

       Df Sum of Sq   RSS    AIC
+ zn    1   189.936 11678 1606.3
+ rad   1   144.320 11724 1608.3
+ crim  1    55.633 11813 1612.1
<none>              11868 1612.5
+ tax   1     2.703 11866 1614.4

Step:  AIC=1606.31
medv ~ lstat + rm + ptratio + dis + nox + chas + black + zn

       Df Sum of Sq   RSS    AIC
+ crim  1    94.712 11584 1604.2
+ rad   1    93.614 11585 1604.2
<none>              11678 1606.3
+ tax   1     3.952 11674 1608.1

Step:  AIC=1604.19
medv ~ lstat + rm + ptratio + dis + nox + chas + black + zn + 
    crim

       Df Sum of Sq   RSS    AIC
+ rad   1   228.604 11355 1596.1
<none>              11584 1604.2
+ tax   1     1.305 11582 1606.1

Step:  AIC=1596.1
medv ~ lstat + rm + ptratio + dis + nox + chas + black + zn + 
    crim + rad

       Df Sum of Sq   RSS    AIC
+ tax   1    273.62 11081 1585.8
<none>              11355 1596.1

Step:  AIC=1585.76
medv ~ lstat + rm + ptratio + dis + nox + chas + black + zn + 
    crim + rad + tax


Call:
lm(formula = medv ~ lstat + rm + ptratio + dis + nox + chas + 
    black + zn + crim + rad + tax, data = D)

Coefficients:
(Intercept)        lstat           rm      ptratio          dis          nox  
  36.341145    -0.522553     3.801579    -0.946525    -1.492711   -17.376023  
       chas        black           zn         crim          rad          tax  
   2.718716     0.009291     0.045845    -0.108413     0.299608    -0.011778

mn<-stepAIC(m, direction = "both", scope = list(lower = m0, upper = m))

Start:  AIC=1585.76
medv ~ crim + zn + chas + nox + rm + dis + rad + tax + ptratio + 
    black + lstat

          Df Sum of Sq   RSS    AIC
<none>                 11081 1585.8
- chas     1    227.21 11309 1594.0
- crim     1    245.37 11327 1594.8
- zn       1    257.82 11339 1595.4
- black    1    270.82 11352 1596.0
- tax      1    273.62 11355 1596.1
- rad      1    500.92 11582 1606.1
- nox      1    541.91 11623 1607.9
- ptratio  1   1206.45 12288 1636.0
- dis      1   1448.94 12530 1645.9
- rm       1   1963.66 13045 1666.3
- lstat    1   2723.48 13805 1695.0

mn


Call:
lm(formula = medv ~ crim + zn + chas + nox + rm + dis + rad + 
    tax + ptratio + black + lstat, data = D)

Coefficients:
(Intercept)         crim           zn         chas          nox           rm  
  36.341145    -0.108413     0.045845     2.718716   -17.376023     3.801579  
        dis          rad          tax      ptratio        black        lstat  
  -1.492711     0.299608    -0.011778    -0.946525     0.009291    -0.522553

summary(m)


Call:
lm(formula = medv ~ crim + zn + chas + nox + rm + dis + rad + 
    tax + ptratio + black + lstat, data = D)

Residuals:
     Min       1Q   Median       3Q      Max 
-15.5984  -2.7386  -0.5046   1.7273  26.2373 

Coefficients:
              Estimate Std. Error t value Pr(>|t|)    
(Intercept)  36.341145   5.067492   7.171 2.73e-12 ***
crim         -0.108413   0.032779  -3.307 0.001010 ** 
zn            0.045845   0.013523   3.390 0.000754 ***
chas          2.718716   0.854240   3.183 0.001551 ** 
nox         -17.376023   3.535243  -4.915 1.21e-06 ***
rm            3.801579   0.406316   9.356  < 2e-16 ***
dis          -1.492711   0.185731  -8.037 6.84e-15 ***
rad           0.299608   0.063402   4.726 3.00e-06 ***
tax          -0.011778   0.003372  -3.493 0.000521 ***
ptratio      -0.946525   0.129066  -7.334 9.24e-13 ***
black         0.009291   0.002674   3.475 0.000557 ***
lstat        -0.522553   0.047424 -11.019  < 2e-16 ***
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Residual standard error: 4.736 on 494 degrees of freedom
Multiple R-squared:  0.7406,    Adjusted R-squared:  0.7348 
F-statistic: 128.2 on 11 and 494 DF,  p-value: < 2.2e-16

summary(n)


Call:
lm(formula = medv ~ lstat + rm + ptratio + dis + nox + chas + 
    black + zn + crim + rad + tax, data = D)

Residuals:
     Min       1Q   Median       3Q      Max 
-15.5984  -2.7386  -0.5046   1.7273  26.2373 

Coefficients:
              Estimate Std. Error t value Pr(>|t|)    
(Intercept)  36.341145   5.067492   7.171 2.73e-12 ***
lstat        -0.522553   0.047424 -11.019  < 2e-16 ***
rm            3.801579   0.406316   9.356  < 2e-16 ***
ptratio      -0.946525   0.129066  -7.334 9.24e-13 ***
dis          -1.492711   0.185731  -8.037 6.84e-15 ***
nox         -17.376023   3.535243  -4.915 1.21e-06 ***
chas          2.718716   0.854240   3.183 0.001551 ** 
black         0.009291   0.002674   3.475 0.000557 ***
zn            0.045845   0.013523   3.390 0.000754 ***
crim         -0.108413   0.032779  -3.307 0.001010 ** 
rad           0.299608   0.063402   4.726 3.00e-06 ***
tax          -0.011778   0.003372  -3.493 0.000521 ***
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Residual standard error: 4.736 on 494 degrees of freedom
Multiple R-squared:  0.7406,    Adjusted R-squared:  0.7348 
F-statistic: 128.2 on 11 and 494 DF,  p-value: < 2.2e-16

summary(mn)


Call:
lm(formula = medv ~ crim + zn + chas + nox + rm + dis + rad + 
    tax + ptratio + black + lstat, data = D)

Residuals:
     Min       1Q   Median       3Q      Max 
-15.5984  -2.7386  -0.5046   1.7273  26.2373 

Coefficients:
              Estimate Std. Error t value Pr(>|t|)    
(Intercept)  36.341145   5.067492   7.171 2.73e-12 ***
crim         -0.108413   0.032779  -3.307 0.001010 ** 
zn            0.045845   0.013523   3.390 0.000754 ***
chas          2.718716   0.854240   3.183 0.001551 ** 
nox         -17.376023   3.535243  -4.915 1.21e-06 ***
rm            3.801579   0.406316   9.356  < 2e-16 ***
dis          -1.492711   0.185731  -8.037 6.84e-15 ***
rad           0.299608   0.063402   4.726 3.00e-06 ***
tax          -0.011778   0.003372  -3.493 0.000521 ***
ptratio      -0.946525   0.129066  -7.334 9.24e-13 ***
black         0.009291   0.002674   3.475 0.000557 ***
lstat        -0.522553   0.047424 -11.019  < 2e-16 ***
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Residual standard error: 4.736 on 494 degrees of freedom
Multiple R-squared:  0.7406,    Adjusted R-squared:  0.7348 
F-statistic: 128.2 on 11 and 494 DF,  p-value: < 2.2e-16

#Findings:

#We can see from the summary that the three models are the same, therefore, there is no need of comparison and we just report the RMSE of the common model. We have the value of multiple R^2 as 0.4505 and RMSE = 5.21.

3.Create a lift chart and a decile-wise lift chart for the best model obtained based on the model using the backward variable selection procedure.

library(gains)

# regression model based on all numerical predictors
reg <- lm(formula = medv ~ crim + zn + chas + nox + rm + dis + rad + 
    tax + ptratio + black + lstat, data = D)

# predictions
#pred_v <- predict(reg, newdata = D[validation_data])

#gain <- gains(D[validation_data,]$medv[!is.na(m)], m[!is.na(m)])

#plot(c(0,gain$cume.pct.of.total*sum(medv))~c(0,gain$cume.obs), 
     #xlab="#", ylab="Cumulative medv", main="Lift Chart", type="l")



# Decile-wise lift chart
#barplot(gain$mean.resp/mean(price), names.arg = gain$depth,
        #xlab = "Percentile", ylab = "Mean Response", main = "Decile-wise lift chart")

4.Fit ridge regression, lasso regression, and elastic net regression with the training data and then compare the performance of the 3 regressions in terms of RMSE.

# Setup a grid range of lambda values:
lambda <- 10^seq(-3, 3, length = 100)

# 1. # Build the model based on ridge regression
ridge <- train(
  medv ~., 
  data = training_data, 
  method = "glmnet",
  tuneGrid = expand.grid(alpha = 0, lambda = lambda)
)
# Model coefficients
coef(ridge$finalModel, ridge$bestTune$lambda)

## 14 x 1 sparse Matrix of class "dgCMatrix"
##                        s1
## (Intercept)  3.223772e+01
## crim        -1.080828e-01
## zn           3.193412e-02
## indus       -4.999724e-02
## chas         3.038451e+00
## nox         -1.423678e+01
## rm           3.651674e+00
## age          8.950671e-04
## dis         -1.215940e+00
## rad          1.874167e-01
## tax         -5.197185e-03
## ptratio     -8.635587e-01
## black        8.894530e-03
## lstat       -5.223572e-01

# Make predictions
predictions <- ridge %>% predict(newdata = validation_data)
# Model prediction performance
data.frame(
  RMSE = RMSE(predictions, validation_data$medv),
  Rsquare = R2(predictions, validation_data$medv)
)

##       RMSE   Rsquare
## 1 4.152015 0.7678127

# 2. Build the model based on lasso regression
lasso <- train(
  medv ~., 
  data = training_data, 
  method = "glmnet",
  tuneGrid = expand.grid(alpha = 1, lambda = lambda)
  )

## Warning in nominalTrainWorkflow(x = x, y = y, wts = weights, info = trainInfo, :
## There were missing values in resampled performance measures.

# Model coefficients
coef(lasso$finalModel, lasso$bestTune$lambda)

## 14 x 1 sparse Matrix of class "dgCMatrix"
##                        s1
## (Intercept)  41.797118782
## crim         -0.128772396
## zn            0.045335521
## indus         0.016465095
## chas          2.745495132
## nox         -21.149205375
## rm            3.408370045
## age           0.009485832
## dis          -1.574481940
## rad           0.351625818
## tax          -0.012180005
## ptratio      -0.972779321
## black         0.008672929
## lstat        -0.589218413

# Make predictions
predictions <- lasso %>% predict(newdata = validation_data)
# Model prediction performance
data.frame(
  RMSE = RMSE(predictions, validation_data$medv),
  Rsquare = R2(predictions, validation_data$medv)
)

##       RMSE   Rsquare
## 1 4.332493 0.7514415

# 3. Build the model based on Elastic net:
elastic <- train(
  medv ~., 
  data = training_data, 
  method = "glmnet"
)
# Model coefficients
coef(elastic$finalModel, elastic$bestTune$lambda)

## 14 x 1 sparse Matrix of class "dgCMatrix"
##                        s1
## (Intercept)  38.811194493
## crim         -0.120488959
## zn            0.040045129
## indus        -0.003379220
## chas          2.840609422
## nox         -19.091811691
## rm            3.504208726
## age           0.005605836
## dis          -1.472310961
## rad           0.288202219
## tax          -0.009268021
## ptratio      -0.941962764
## black         0.008690953
## lstat        -0.572782193

# Make predictions
predictions <- elastic %>% predict(newdata = validation_data)
# Model prediction performance
data.frame(
  RMSE = RMSE(predictions, validation_data$medv),
  Rsquare = R2(predictions, validation_data$medv)
)

##      RMSE   Rsquare
## 1 4.26269 0.7574006

# 4. Comparing performances of the models:
models <- list(ridge = ridge, lasso = lasso, elastic = elastic)

resamples(models) %>% summary( metric = "RMSE")

## 
## Call:
## summary.resamples(object = ., metric = "RMSE")
## 
## Models: ridge, lasso, elastic 
## Number of resamples: 25 
## 
## RMSE 
##             Min.  1st Qu.   Median     Mean  3rd Qu.     Max. NA's
## ridge   3.903173 4.527245 5.245453 5.069848 5.571126 6.098857    0
## lasso   4.371040 4.800932 4.923307 5.117049 5.462554 6.028990    0
## elastic 4.427861 4.723254 5.046577 5.100148 5.322544 6.130705    0

#Here we look at median to compare performance. We pick a small value of median to find the best regression model. From our findings, we can see that lasso has the least median, so it is the best model.

Project4

Santosh Tiwari

2/22/2022