Lasso regression is a form of linear regression that adds a regularization penalty to the loss function during training. The penalty is proportional to the sum of the absolute values of the coefficients (the L1 norm), so the model is encouraged not only to fit the data but also to shrink the coefficients toward zero. This makes Lasso particularly useful for feature selection, because it can drive the coefficients of less important features exactly to zero, effectively removing them from the model.
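For reference, with alpha = 1 glmnet minimizes a penalized least-squares objective of the form (this is the Gaussian-family objective described in the glmnet documentation):

\min_{\beta_0,\,\beta}\; \frac{1}{2n}\sum_{i=1}^{n}\bigl(y_i - \beta_0 - x_i^\top \beta\bigr)^2 \;+\; \lambda \sum_{j=1}^{p} |\beta_j|

Larger values of lambda apply stronger shrinkage and push more coefficients to exactly zero; lambda is the tuning parameter we choose below by cross-validation.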
Here’s a step-by-step example of how to implement Lasso regression in R, including how to decide on the number of input variables and the optimal lambda value:
# install.packages("glmnet")
library(glmnet)
## Loading required package: Matrix
## Loaded glmnet 4.1-8
set.seed(42) # for reproducibility
n <- 100 # number of samples
p <- 100 # number of variables
X <- matrix(rnorm(n * p), n, p)
beta <- c(rep(1, 10), rep(0, p-10)) # only first 10 are informative
y <- X %*% beta + rnorm(n)
cv.lasso <- cv.glmnet(X, y, alpha=1) # alpha=1 for Lasso
plot(cv.lasso)
This call fits the Lasso over a whole sequence of lambda values and uses k-fold cross-validation (10 folds by default) to estimate prediction error at each value. The resulting plot shows the cross-validated mean squared error, with error bars, against log(lambda).
best.lambda <- cv.lasso$lambda.min
print(best.lambda)
## [1] 0.0324792
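cv.glmnet also stores lambda.1se, the largest lambda whose cross-validated error is within one standard error of the minimum; it usually yields a sparser model. As a minimal sketch using the objects created above (the name lasso.1se is just illustrative), you could compare the two choices like this:

print(cv.lasso$lambda.1se)
lasso.1se <- glmnet(X, y, alpha=1, lambda=cv.lasso$lambda.1se)
sum(coef(lasso.1se)[-1, ] != 0) # number of non-zero coefficients at lambda.1se

For now, the example continues with lambda.min.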
lasso.model <- glmnet(X, y, alpha=1, lambda=best.lambda)
coef(lasso.model)
## 101 x 1 sparse Matrix of class "dgCMatrix"
## s0
## (Intercept) 0.261556932
## V1 0.878224411
## V2 1.163501203
## V3 0.863586153
## V4 0.761606635
## V5 1.141607402
## V6 0.976442884
## V7 0.846018321
## V8 1.199882133
## V9 0.995539757
## V10 1.170741260
## V11 0.030781462
## V12 .
## V13 .
## V14 .
## V15 -0.030427855
## V16 -0.118873704
## V17 .
## V18 .
## V19 .
## V20 .
## V21 -0.149071528
## V22 -0.042924447
## V23 .
## V24 .
## V25 .
## V26 -0.005810407
## V27 -0.080204412
## V28 .
## V29 .
## V30 .
## V31 0.085085985
## V32 .
## V33 -0.171795608
## V34 .
## V35 0.043480946
## V36 -0.151323362
## V37 .
## V38 0.038903816
## V39 0.018480756
## V40 .
## V41 .
## V42 .
## V43 .
## V44 0.038648370
## V45 0.042544526
## V46 0.122027134
## V47 .
## V48 .
## V49 0.044273502
## V50 .
## V51 -0.099218688
## V52 0.105502422
## V53 0.054445439
## V54 0.057229452
## V55 .
## V56 -0.081061285
## V57 .
## V58 0.018283616
## V59 -0.054336189
## V60 0.016794284
## V61 -0.219440660
## V62 .
## V63 .
## V64 .
## V65 .
## V66 0.077581043
## V67 -0.134037701
## V68 0.154062557
## V69 .
## V70 .
## V71 .
## V72 .
## V73 -0.057622789
## V74 0.019749255
## V75 .
## V76 -0.037391945
## V77 -0.089953400
## V78 0.186404534
## V79 0.129575177
## V80 -0.164696368
## V81 .
## V82 .
## V83 .
## V84 .
## V85 -0.113563536
## V86 -0.022514404
## V87 0.084850661
## V88 0.086598338
## V89 0.147645539
## V90 0.228979638
## V91 .
## V92 .
## V93 .
## V94 .
## V95 0.076066563
## V96 .
## V97 .
## V98 -0.075137531
## V99 .
## V100 .
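The first ten coefficients, the truly informative ones in this simulation, are estimated close to 1, while most of the remaining coefficients are either exactly zero (shown as ".") or small. To use the fit for feature selection, you can extract the indices of the non-zero coefficients; a short sketch using the objects above:

coefs <- coef(lasso.model)[-1, ]  # drop the intercept; gives a plain numeric vector
selected <- which(coefs != 0)     # indices of the retained predictors
length(selected)                  # how many variables survive at lambda.min

If you want a sparser model that keeps fewer of the noise variables, refitting at cv.lasso$lambda.1se (as sketched earlier) is a common choice.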