---
title: "LASSO and Ridge Regression - Regularization Techniques"
author: "Praveen Jalaja"
output:
  html_notebook:
    theme: flatly
    toc: yes
    toc_depth: 4
  html_document:
    df_print: paged
    toc: yes
    toc_depth: '4'
---


### **Overview**

In this lesson, we will discuss two alternative forms of linear regression called **ridge regression** and **LASSO regression**. These two methods are examples of **regularization** or **shrinkage** methods, in which model parameters are encouraged to be small. In this report, we will also examine how the choice of lambda affects the fitted regression models.

### **Load Packages**

Tools for creating ridge and LASSO models are provided by the `glmnet` package. The `reshape` and `ggplot2` packages are used for reshaping the coefficient estimates and for plotting.

```{r, message=FALSE}
library(glmnet)  
library(reshape)
library(ggplot2)
```
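
If any of these packages are not already installed, they can be obtained from CRAN first; this is a one-time setup step, shown here only for convenience:

```{r, eval=FALSE}
# One-time setup (skip if the packages are already installed).
install.packages(c("glmnet", "reshape", "ggplot2"))
```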

### **Generate Data**

In this lesson, we will be working with synthetic data. We will now generate the dataset and use it to create a dataframe called `df`. 

```{r, echo=FALSE}
set.seed(125)
n <- 20
x <- runif(n, -4, 4)

# True relationship: a degree-7 polynomial signal plus Gaussian noise
y <- 3 + 0.5 * x + 0.01 * x^7 + rnorm(n, 0, 10)

df <- data.frame(x, y)

summary(df)
```

### **Plot Data**

The cell below generates a scatterplot of the dataset.

```{r}
g <- ggplot(data = df)

g1 <- g + geom_point(aes(x = x, y = y), col = "blue", alpha = 0.7, pch = 19) +
  geom_smooth(aes(x = x, y = y), method = "lm")
g1
```

### **Create Training and Validation Sets**

We will randomly split the dataset into a training set and a validation set. 

```{r}
set.seed(23)
sel <- sample(1:nrow(df), 0.7*nrow(df))

train <- df[sel, ]
valid <- df[-sel, ]

c(nrow(train), nrow(valid))
```

### **Polynomial Regression**

Motivated by the apparent non-linear relationship in the scatterplot, we will fit a degree-9 polynomial model to the training set.

```{r}
#poly_mod <- lm(y ~ poly(x, 9), df)

poly_mod <- lm(y ~ x + I(x^2) + I(x^3) + I(x^4) + I(x^5) +
                 I(x^6) + I(x^7) + I(x^8) + I(x^9), train)
summary(poly_mod)
```
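
The commented-out `poly()` call above would produce an equivalent fit using orthogonal polynomial terms, which are numerically more stable than the raw `I(x^k)` terms. A minimal sketch, assuming it is also fit on the training set:

```{r, eval=FALSE}
# Sketch (not run): equivalent degree-9 fit using orthogonal polynomials.
poly_mod_alt <- lm(y ~ poly(x, 9), data = train)
summary(poly_mod_alt)
```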

#### **Plot Fitted Curve**

The figure below shows the fitted curve, as well as the residuals for both the training and validation sets.

```{r, echo=FALSE}
train_pred_poly <- predict(poly_mod, data.frame(x = train$x))
valid_pred_poly <- predict(poly_mod, data.frame(x = valid$x))

# Grid of x values used to draw the fitted curve
xg <- seq(-4, 4, length = 100)
yg_poly <- predict(poly_mod, data.frame(x = xg))

poly <- ggplot() +
  geom_point(aes(x = x, y = y), col = "blue", data = train) +
  lims(y = c(-80, 140)) +
  labs(x = "x", y = "y", title = "Polynomial Plot") +
  geom_line(aes(x = xg, y = yg_poly)) +
  geom_segment(aes(x = x, y = y, xend = x, yend = train_pred_poly),
               data = train, col = "blue") +
  geom_point(aes(x = x, y = y), data = valid, col = "red") +
  geom_segment(aes(x = x, y = y, xend = x, yend = valid_pred_poly),
               data = valid, col = "red")

poly
```

#### **Calculate Training and Validation r-Squared**

We will now calculate the training and validation r-squared values for the polynomial model. Note that the validation r-squared can be negative, which means the model predicts the validation data worse than simply using the mean of `y` would; paired with a very high training r-squared, this is a clear sign of overfitting.

```{r}
SST_train <- var(train$y) * (nrow(train) - 1)
SST_valid <- var(valid$y) * (nrow(valid) - 1)

SSE_train <- sum((train$y - train_pred_poly)^2)
SSE_valid <- sum((valid$y - valid_pred_poly)^2)

train_r2 <- 1 - SSE_train / SST_train
valid_r2 <- 1 - SSE_valid / SST_valid

cat('Training r-Squared:    ', train_r2, '\n',
    'Validation r-Squared: ', valid_r2, sep='')
```


### **Ridge Regression**

Ridge regression is a regularization (shrinkage) technique that encourages model coefficients to be small by altering the loss function to include a penalty term based on the magnitude of the coefficients. The ridge loss function includes a regularization hyperparameter $\lambda$ that determines how much weight is placed on the penalty term. Different values of this hyperparameter result in different models.

Goal: Minimize $Loss\left(\hat\beta_0, ..., \hat\beta_m\right) = \frac{1}{n} SSE + \lambda \sum_{i=1}^{m} \hat\beta_i^2$
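
To make the loss concrete, here is a minimal sketch (not part of the fitting procedure, which `glmnet` handles internally) of how this quantity could be computed for a candidate set of coefficients; the function name and arguments are illustrative:

```{r}
# Sketch: ridge loss for a given intercept, coefficient vector, and lambda.
# The intercept is not penalized, matching the formula above.
ridge_loss <- function(beta0, betas, X, y, lambda) {
  pred <- beta0 + as.vector(X %*% betas)
  sum((y - pred)^2) / length(y) + lambda * sum(betas^2)
}
```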


#### **Create Matrix of Predictors**

The `glmnet()` function that provides the implementation of ridge regression in R does not use the same formula structure as other modeling functions that we have encountered. It requires the predictor values to be stored in a matrix of their own. 

```{r}
#Xmat <- model.matrix(y ~ x + I(x^2) + I(x^3) + I(x^4) + I(x^5) +
#                     I(x^6) + I(x^7) + I(x^8) + I(x^9))[,-1]

Xmat <- cbind(df$x, df$x^2, df$x^3, df$x^4, df$x^5, 
              df$x^6, df$x^7, df$x^8, df$x^9)
colnames(Xmat) <- c('x1', 'x2', 'x3', 'x4', 'x5', 'x6', 'x7', 'x8', 'x9')
Xmat[,c(1,2,8,9)]
```


We will use the `sel` variable created previously to recreate our train/validation split.

```{r}
Xmat_train <- Xmat[sel,]
Xmat_valid <- Xmat[-sel,]
```


#### **Create Ridge Models for Several Values of Lambda**

When training a ridge model, it is important to select a good value for the regularization hyperparameter lambda. The `glmnet()` function allows us to specify several lambda values, and then simultaneously train ridge models for each value of lambda provided. We can then evaluate each of these models to identify the best value for lambda. 

In the code chunk below, we train 100 ridge models. We will then extract the parameter estimates for each of these models. 

```{r}
# Create grid of lambda values (in decreasing order, as glmnet expects)
log_lambda_grid <- seq(6, -2, length=100)
lambda_grid <- 10^log_lambda_grid

# Create 100 ridge models
ridge_models <- glmnet(Xmat_train, train$y, alpha=0, lambda=lambda_grid)

# Get coefficients of all 100 models
ridge_coef <- coef(ridge_models)

# Display coefficients for 6 models. 
# Ridge Models are displayed in decreasing order of lambda.
round(ridge_coef[, c(1:3, 98:100)], 6)
```

#### **Plot Coefficient Estimates against Lambda**

The plot below shows how the model coefficients change with respect to lambda.

```{r, echo=FALSE}
ridge_coef_Df <- data.frame(grid = log_lambda_grid, 
                            x1 = ridge_coef[2,], x2 = ridge_coef[3,], x3 = ridge_coef[4,], 
                            x4 = ridge_coef[5,], x5 = ridge_coef[6,], x6 = ridge_coef[7,], 
                            x7 = ridge_coef[8,], x8 = ridge_coef[9,], x9 = ridge_coef[10,])
ridge_melted <- melt(ridge_coef_Df, id.vars="grid")
ggplot(ridge_melted, aes(grid, value, col=variable)) + geom_line(size=1) +
  xlab('log(lambda)') + ylab('Coefficients') 
```
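
`glmnet` also provides a built-in coefficient-path plot that shows essentially the same information; a minimal sketch:

```{r}
# Sketch: glmnet's built-in coefficient path, plotted against log(lambda).
plot(ridge_models, xvar = "lambda", label = TRUE)
```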

#### **Generate Predictions**

We can use the `predict()` function to generate predictions according to each of the ridge models that we have trained. 

```{r}
train_pred_ridge <- predict(ridge_models, Xmat_train)
valid_pred_ridge <- predict(ridge_models, Xmat_valid)

valid_pred_ridge[,c(1,2,99,100)]
```
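
If predictions at a single value of lambda are needed, the `predict()` method for `glmnet` objects also accepts an `s` argument giving that value; a minimal sketch with an arbitrary, illustrative value:

```{r}
# Sketch: ridge predictions on the validation set at one illustrative lambda value.
head(predict(ridge_models, Xmat_valid, s = 1))
```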

#### **Calculating Training and Validation r-Squared**

In the cell below, we will calculate training and validation r-squared scores for each of our ridge models. 

```{r}
# Find SSE
SSE_train <- colSums((train_pred_ridge - train$y)^2)
SSE_valid <- colSums((valid_pred_ridge - valid$y)^2)

# Find r-Squared
r2_train_list <- 1 - SSE_train / SST_train
r2_valid_list <- 1 - SSE_valid / SST_valid

round(r2_valid_list[c(1:3, 98:100)],4)
```

#### **Plotting Training and Validation r-Squared as a Function of Lambda**

The figure below shows how the training and validation r-squared values change with respect to lambda.

```{r}
plot(log_lambda_grid, r2_train_list, ylim=c(-0.2,1), pch=".", col="salmon", 
     xlab="log10(lambda)", ylab="r-Squared", main="Training and Validation Scores (Ridge)")

lines(log_lambda_grid, r2_train_list, col="salmon", lwd=2)

lines(log_lambda_grid, r2_valid_list, col="cornflowerblue", lwd=2)

legend("topright", legend=c("Training r-Squared", "Validation r-Squared"),
       col=c("salmon", "cornflowerblue"), lty=1, lwd=2, cex=0.8)
```

#### **Finding Optimal Value of Lambda**

We will now determine the value of lambda that results in the highest validation score.

```{r}
best_valid_r2 <- max(r2_valid_list)
best_valid_r2_ix <- which.max(r2_valid_list)
best_log_lambda <- log_lambda_grid[best_valid_r2_ix]

cat('Index of Optimal r-Squared:    ', best_valid_r2_ix, '\n',
    'Value of Optimal r-Squared:    ', best_valid_r2, '\n',
    'Value of Optimal log(lambda):  ', best_log_lambda, sep='')

```
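
Here lambda is chosen using a single validation split. An alternative that `glmnet` supports directly is k-fold cross-validation via `cv.glmnet()`; a minimal sketch, where `nfolds = 5` is an assumption made because the training set is small:

```{r}
# Sketch: choose lambda by 5-fold cross-validation instead of a validation split.
cv_ridge <- cv.glmnet(Xmat_train, train$y, alpha = 0, lambda = lambda_grid, nfolds = 5)
log10(cv_ridge$lambda.min)  # lambda with the smallest cross-validated error, on the log10 scale
```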

#### **Plot Fitted Curve**

In the figure below, we plot the curve representing the optimal ridge model. 

```{r, echo=FALSE}
Xmat_grid <- cbind(xg, xg^2, xg^3, xg^4, xg^5, xg^6, xg^7, xg^8, xg^9)
colnames(Xmat_grid) <- c('x1', 'x2', 'x3', 'x4', 'x5', 'x6', 'x7', 'x8', 'x9')

grid_pred <- predict(ridge_models, Xmat_grid)[,best_valid_r2_ix]

plot(y ~ x, train, ylim=c(-80, 140), xlab="x", ylab="y", main="Ridge Model")
lines(xg, grid_pred)
segments(train$x, train$y, train$x, train_pred_ridge[,best_valid_r2_ix]) 
points(valid$y ~ valid$x, col="red")
segments(valid$x, valid$y, valid$x, valid_pred_ridge[,best_valid_r2_ix], col="red")
```

### **LASSO Regression**

Like ridge regression, LASSO regression is a regularization method that encourages small parameter values by adding a penalty term to the loss function. Whereas ridge penalizes the sum of the squared coefficients (an L2 penalty), LASSO penalizes the sum of their absolute values (an L1 penalty), which can shrink some coefficients all the way to zero.

Goal: Minimize $Loss\left(\hat\beta_0, ..., \hat\beta_m\right) = \frac{1}{n} SSE + \lambda \sum_{i=1}^{m} | \hat\beta_i |$
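
Compared with the ridge loss sketched earlier, only the penalty term changes; an illustrative counterpart:

```{r}
# Sketch: LASSO loss with an L1 (absolute-value) penalty on the non-intercept coefficients.
lasso_loss <- function(beta0, betas, X, y, lambda) {
  pred <- beta0 + as.vector(X %*% betas)
  sum((y - pred)^2) / length(y) + lambda * sum(abs(betas))
}
```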

#### **Create LASSO Models for Several Values of Lambda**

In the code chunk below, we train 100 LASSO models. We will then extract the parameter estimates for each of these models. 

```{r}
# Grid of lambda values (in decreasing order, as glmnet expects)
log_lambda_grid <- seq(2, -4, length=100)
lambda_grid <- 10^log_lambda_grid

# Create 100 LASSO models
lasso_models <- glmnet(Xmat_train, train$y, alpha=1, lambda=lambda_grid)

# Get coefficients of all 100 models
lasso_coef <- coef(lasso_models)

# Display coefficients for 6 models. 
# LASSO Models are displayed in decreasing order of lambda.
round(lasso_coef[, c(1:3, 98:100)], 6)
```
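
A hallmark of the L1 penalty is that it sets many coefficients to exactly zero at larger values of lambda (the dots in the output above). As a quick check, the sketch below counts the nonzero, non-intercept coefficients at the same six values of lambda:

```{r}
# Sketch: number of predictors with nonzero coefficients at selected lambda values.
colSums(as.matrix(lasso_coef[-1, c(1:3, 98:100)]) != 0)
```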

#### **Plot Coefficient Estimates against Lambda**

The plot below shows how the model coefficients change with respect to lambda.

```{r, echo=FALSE}
lasso_coef_Df <- data.frame(grid = log_lambda_grid, 
                            x1 = lasso_coef[2,], x2 = lasso_coef[3,], x3 = lasso_coef[4,], 
                            x4 = lasso_coef[5,], x5 = lasso_coef[6,], x6 = lasso_coef[7,], 
                            x7 = lasso_coef[8,], x8 = lasso_coef[9,], x9 = lasso_coef[10,])
lasso_melted <- melt(lasso_coef_Df, id.vars="grid")

ggplot(lasso_melted, aes(grid, value, col=variable)) + geom_line(size=1) +
  xlab('log(lambda)') + ylab('Coefficients') 
```

#### **Generate Predictions**

We can use the `predict()` function to generate predictions according to each of the LASSO models that we have trained. 

```{r}
train_pred_lasso <- predict(lasso_models, Xmat_train)
valid_pred_lasso <- predict(lasso_models, Xmat_valid)

valid_pred_lasso[,c(1,2,99,100)]
```

#### **Calculating Training and Validation r-Squared**

In the cell below, we will calculate training and validation r-squared scores for each of our LASSO models. 

```{r}
# Find SSE
SSE_train <- colSums((train_pred_lasso - train$y)^2)
SSE_valid <- colSums((valid_pred_lasso - valid$y)^2)

# Find r-Squared
r2_train_list <- 1 - SSE_train / SST_train
r2_valid_list <- 1 - SSE_valid / SST_valid

round(r2_valid_list[c(1:3, 98:100)],4)
```

#### **Plotting Training and Validation r-Squared as a Function of Lambda**

The figure below shows how the training and validation r-squared values change with respect to lambda.


```{r}
plot(log_lambda_grid, r2_train_list, ylim=c(-0.2,1), pch=".", col="salmon", 
     xlab="log10(lambda)", ylab="r-Squared", main="Training and Validation Scores (LASSO)")

lines(log_lambda_grid, r2_train_list, col="salmon", lwd=2)

lines(log_lambda_grid, r2_valid_list, col="cornflowerblue", lwd=2)

legend("topright", legend=c("Training r-Squared", "Validation r-Squared"),
       col=c("salmon", "cornflowerblue"), lty=1, lwd=2, cex=0.8)
```

#### **Finding Optimal Value of Lambda**

We will now determine the value of lambda that results in the highest validation score.

```{r}
best_valid_r2 <- max(r2_valid_list)
best_valid_r2_ix <- which.max(r2_valid_list)
best_log_lambda <- log_lambda_grid[best_valid_r2_ix]

cat('Index of Optimal r-Squared:    ', best_valid_r2_ix, '\n',
    'Value of Optimal r-Squared:    ', best_valid_r2, '\n',
    'Value of Optimal log(lambda):  ', best_log_lambda, sep='')

```

#### **Plot Fitted Curve**

In the figure below, we plot the curve representing the optimal LASSO model. 

```{r, echo=FALSE}
grid_pred <- predict(lasso_models, Xmat_grid)[,best_valid_r2_ix]

plot(y ~ x, train, ylim=c(-80, 140), xlab="x", ylab="y", main="LASSO Model")
lines(xg, grid_pred)
segments(train$x, train$y, train$x, train_pred_lasso[,best_valid_r2_ix]) 
points(valid$y ~ valid$x, col="red")
segments(valid$x, valid$y, valid$x, valid_pred_lasso[,best_valid_r2_ix], col="red")

```
