What is Lasso Regression?

Lasso Regression is a type of linear regression that uses a regularization term called the L1 penalty, similar to how Ridge Regression uses the L2 penalty. The term “lasso” stands for Least Absolute Shrinkage and Selection Operator, and the method was first introduced by Tibshirani (1996). One advantage of Lasso Regression is its ability to handle multicollinearity: when several variables are highly correlated, it tends to select one of them and shrink the coefficients of the others to zero.
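
In glmnet’s parameterization (the package we use below), the lasso estimate minimizes the penalized least-squares objective

$$\min_{\beta_0,\,\beta} \; \frac{1}{2n} \sum_{i=1}^{n} \left( y_i - \beta_0 - x_i^\top \beta \right)^2 + \lambda \sum_{j=1}^{p} \lvert \beta_j \rvert,$$

where the L1 term is what drives individual coefficients exactly to zero as lambda grows.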

Applied Lasso Regression Example

require(dplyr)
require(data.table)

Data Source: https://www.cs.toronto.edu/~delve/data/boston/bostonDetail.html

data <- fread("https://raw.githubusercontent.com/SpencerPao/Ridge-Lasso-ElasticNet/master/Boston_Housing.csv")

# Drop any rows with missing values
data <- na.omit(data)

# Arrange a 4 x 4 grid for the histograms
par(mfrow = c(4, 4), mar = c(3, 3, 1, 1))

# Plot a histogram of every numeric column
for (col_name in names(data)) {
    if (is.numeric(data[[col_name]])) {
        hist(data[[col_name]], main = col_name, xlab = "Value")
    }
}

par(mfrow = c(1, 1))

# Standardize the 13 predictors; keep the target (MEDV, column 14) unscaled
data_scaled <- data.frame(scale(data[, 1:13]), MEDV = data$MEDV)

Scaled data: from each observation, subtract the column mean, then divide the difference by the column’s standard deviation.
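
As a quick sanity check, the same standardization can be done by hand for a single column (CRIM is one of the predictors in this dataset):

# Hand-rolled z-score for the CRIM column; this should match
# the corresponding column produced by scale(data[, 1:13])
z_crim <- (data$CRIM - mean(data$CRIM)) / sd(data$CRIM)
all.equal(z_crim, data_scaled$CRIM)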

par(mfrow = c(4, 4), mar = c(3, 3, 1, 1))

# Histograms of the standardized columns
for (col_name in names(data_scaled)) {
    if (is.numeric(data_scaled[[col_name]])) {
        hist(data_scaled[[col_name]], main = col_name, xlab = "Value")
    }
}

par(mfrow = c(1, 1))

We create an 80/20 train-test split for our Lasso Regression.

# Train-Test Split
set.seed(123)
size <- floor(0.8 * nrow(data_scaled))

training_ind <- sample(seq_len(nrow(data_scaled)), size = size)

train <- data_scaled[training_ind, ]
xtrain <- train[, 1:13] |>
    as.matrix()
ytrain <- train |>
    select(MEDV) |>
    unlist() |>
    as.numeric()

test <- data_scaled[-training_ind, ]
xtest <- test[, 1:13] |>
    as.matrix()
ytest <- test[, 14] |>
    unlist() |>
    as.numeric()
# Grid of candidate penalty strengths
lambda_array <- seq(from = 0.01, to = 100, by = 0.01)

require(glmnet)

# alpha = 1 selects the lasso (L1) penalty
lassoFit <- glmnet(xtrain, ytrain, alpha = 1, lambda = lambda_array)

Similar to how we used glmnet() for Ridge Regression by setting alpha = 0, here we set alpha = 1 to get the lasso regression. An alpha strictly between 0 and 1 gives a combination of the Ridge and Lasso penalties, which is called Elastic Net Regression.
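
For illustration, here is a minimal sketch of an elastic net fit on the same data (alpha = 0.5 is an arbitrary choice that weights the two penalties equally, and enetFit is our own name):

# Elastic net: alpha = 0.5 mixes the ridge (L2) and lasso (L1) penalties
enetFit <- glmnet(xtrain, ytrain, alpha = 0.5, lambda = lambda_array)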

summary(lassoFit)
##           Length Class     Mode   
## a0         10000 -none-    numeric
## beta      130000 dgCMatrix S4     
## df         10000 -none-    numeric
## dim            2 -none-    numeric
## lambda     10000 -none-    numeric
## dev.ratio  10000 -none-    numeric
## nulldev        1 -none-    numeric
## npasses        1 -none-    numeric
## jerr           1 -none-    numeric
## offset         1 -none-    logical
## call           5 -none-    call   
## nobs           1 -none-    numeric

Next, we take a look at how the coefficients change in relation to lambda.

plot(lassoFit, xvar = "lambda", label = T)

Since this is a lasso regression, once a coefficient reaches zero it remains at zero for all larger values of lambda.

Features 4, 5, and 6 take longer to reach zero than the other features, indicating that these features are more heavily weighted in the model (i.e., more important).
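
We can check which coefficients have been shrunk to zero by inspecting the fit at a moderate penalty; s = 1 below is an arbitrary example value of lambda:

# Coefficients at lambda = 1; features already shrunk to zero
# appear as "." in the sparse matrix output
coef(lassoFit, s = 1)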

Goodness of Fit

plot(lassoFit, xvar = "dev", label = T)

With xvar = "dev", the coefficients are plotted against the fraction of deviance explained by the model, which for a Gaussian fit corresponds to the training R-squared.

Prediction

# Predict on the test set at the smallest lambda in the grid (0.01),
# i.e. the least-penalized fit
y_predictions_lasso <- predict(lassoFit, s = min(lambda_array), newx = xtest)

SST and SSE

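As a reminder, with $\hat{y}_i$ denoting the model predictions, these quantities are

$$\text{SST} = \sum_{i=1}^{n} (y_i - \bar{y})^2, \qquad \text{SSE} = \sum_{i=1}^{n} (y_i - \hat{y}_i)^2, \qquad R^2 = 1 - \frac{\text{SSE}}{\text{SST}}.$$
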
sst <- sum((ytest - mean(ytest))^2)
sse <- sum((y_predictions_lasso - ytest)^2)

r_square_lasso <- 1 - (sse/sst)
print(paste("SSE:", sse))
## [1] "SSE: 9209.97923572005"
print(paste("SST:", sst))
## [1] "SST: 34993.7499115044"
print(paste("R-sqaured:", r_square_lasso))
## [1] "R-sqaured: 0.736810737374213"

MSE and RMSE

mse_lasso <- mean((y_predictions_lasso - ytest)^2)
mse_lasso
## [1] 20.37606
rmse_lasso <- sqrt(mse_lasso)
rmse_lasso
## [1] 4.513985

After reviewing the performance metrics, the lasso regression model’s predictive accuracy is comparable to that of the ridge regression model we built in the prior blog. The lasso model could be further improved by using k-fold cross-validation to find the optimal lambda.
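
A minimal sketch of what that could look like with cv.glmnet(), which runs 10-fold cross-validation by default (the object names below are our own):

# 10-fold cross-validation; glmnet chooses its own lambda grid here
cvFit <- cv.glmnet(xtrain, ytrain, alpha = 1)

# lambda.min is the lambda with the lowest mean cross-validated error
cvFit$lambda.min

# Predict on the test set at the cross-validated lambda
y_pred_cv <- predict(cvFit, s = "lambda.min", newx = xtest)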