Lasso Regression is a type of linear regression that uses a regularization term called the L1 penalty, much as Ridge Regression uses the L2 penalty. The term “lasso” stands for Least Absolute Shrinkage and Selection Operator, and the method was first introduced by Tibshirani. One advantage of Lasso Regression is that it handles multicollinearity by selecting one of a group of correlated variables and shrinking the coefficients of the others to zero.
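For reference, the lasso coefficient estimates minimize the residual sum of squares plus an L1 penalty on the coefficient magnitudes, where lambda controls the amount of shrinkage:

$$\hat{\beta}^{lasso} = \underset{\beta}{\arg\min} \; \sum_{i=1}^{n}\Big(y_i - \beta_0 - \sum_{j=1}^{p} x_{ij}\beta_j\Big)^2 + \lambda \sum_{j=1}^{p} |\beta_j|$$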
Data source: https://www.cs.toronto.edu/~delve/data/boston/bostonDetail.html
library(data.table)  # for fread()

data <- fread("https://raw.githubusercontent.com/SpencerPao/Ridge-Lasso-ElasticNet/master/Boston_Housing.csv")
data <- na.omit(data)

# Histogram of every numeric column in the raw data
par(mfrow = c(4, 4), mar = c(3, 3, 1, 1))
for (col_name in names(data)) {
  if (is.numeric(data[[col_name]])) {
    hist(data[[col_name]], main = paste(col_name), xlab = "Value")
  }
}
par(mfrow = c(1, 1))

To scale the data, we take each observation, subtract the column mean from it, and divide the difference by the column's standard deviation (z-score standardization).
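The code that creates data_scaled is not shown in this section; a minimal sketch of that step, assuming base R's scale() is applied to every column, would be:

# Assumed scaling step (not shown in the original section): z-score every column
data_scaled <- as.data.frame(scale(data))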
par(mfrow = c(4, 4), mar = c(3, 3, 1, 1))
for (col_name in names(data_scaled)) {
  if (is.numeric(data_scaled[[col_name]])) {
    hist(data_scaled[[col_name]], main = paste(col_name), xlab = "Value")
  }
}
par(mfrow = c(1, 1))

We create an 80/20 train-test split for our Lasso Regression.
# Train-Test Split
library(dplyr)  # for select()

set.seed(123)
size <- floor(0.8 * nrow(data_scaled))
training_ind <- sample(seq_len(nrow(data_scaled)), size = size)

train <- data_scaled[training_ind, ]
xtrain <- train[, 1:13] |>
  as.matrix()
ytrain <- train |>
  select(MEDV) |>
  unlist() |>
  as.numeric()

test <- data_scaled[-training_ind, ]
xtest <- test[, 1:13] |>
  as.matrix()
ytest <- test[, 14] |>
  unlist() |>
  as.numeric()

Similar to how we used glmnet() for Ridge Regression by setting alpha = 0, we set alpha = 1 to get the lasso regression. An alpha between 0 and 1 gives a combination of Ridge and Lasso regression, which is known as Elastic Net Regression.
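The fitting call itself is not reproduced here; a minimal sketch of it, assuming a grid of 10,000 lambda values (consistent with the lengths in the summary output below) and the object name lasso_model, would be:

library(glmnet)

# Assumed fit: alpha = 1 selects the L1 (lasso) penalty; the lambda grid is a guess
lambdas <- 10^seq(3, -3, length.out = 10000)
lasso_model <- glmnet(xtrain, ytrain, alpha = 1, lambda = lambdas)
summary(lasso_model)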
## Length Class Mode
## a0 10000 -none- numeric
## beta 130000 dgCMatrix S4
## df 10000 -none- numeric
## dim 2 -none- numeric
## lambda 10000 -none- numeric
## dev.ratio 10000 -none- numeric
## nulldev 1 -none- numeric
## npasses 1 -none- numeric
## jerr 1 -none- numeric
## offset 1 -none- logical
## call 5 -none- call
## nobs 1 -none- numeric
Next, we take a look at the lambdas in relation to the coefficients.
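The plotting code is not shown; one way to produce this view is glmnet's built-in coefficient-path plot, using the lasso_model object assumed above:

# Coefficient paths versus log(lambda); label = TRUE prints the feature indices
plot(lasso_model, xvar = "lambda", label = TRUE)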
Since this is a lasso regression, once a coefficient shrinks to zero it stays at zero for all larger values of lambda.
Features 4, 5, and 6 take longer to reach zero than the other features, indicating that they are more heavily weighted in the model (i.e. more important).
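The prediction step is not shown in this section; a minimal sketch would use predict() on the test matrix, with the lambda value s = 0.1 as a purely illustrative placeholder (the value actually used is not recorded here):

# Assumed prediction step; s is a hypothetical placeholder for the chosen lambda
y_predictions_lasso <- predict(lasso_model, s = 0.1, newx = xtest)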
sst <- sum((ytest - mean(ytest))^2)
sse <- sum((y_predictions_lasso - ytest)^2)
r_square_lasso <- 1 - (sse / sst)
## [1] "SSE: 9209.97923572005"
## [1] "SST: 34993.7499115044"
## [1] "R-sqaured: 0.736810737374213"
## [1] 20.37606
## [1] 4.513985
After reviewing the performance metrics, the lasso regression model's predictive accuracy is comparable to that of the ridge regression model we built in the prior blog post. The lasso model could be improved further by using k-fold cross validation to find the optimal lambda.
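As a sketch of that improvement (not part of the original analysis), cv.glmnet() performs the k-fold search; the 10-fold setting and the choice of lambda.min are assumptions:

# Assumed cross-validation sketch to choose lambda (10 folds, lambda.min rule)
cv_lasso <- cv.glmnet(xtrain, ytrain, alpha = 1, nfolds = 10)
best_lambda <- cv_lasso$lambda.min
y_predictions_cv <- predict(cv_lasso, s = "lambda.min", newx = xtest)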