Often considered a better alternative to stepwise regression, Least Absolute Shrinkage and Selection Operator regression (LASSO for short) is a regularization technique that can reduce the number of variables in a regression model. It works for various types of regression (logistic, linear, count, etc.). LASSO penalizes the model by the sum of the absolute values of its coefficients, which pulls coefficients toward zero (shrinkage). Some coefficients shrink all the way to zero, which is how LASSO performs automatic variable selection similar to stepwise regression.
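
For plain linear regression, for example, the LASSO estimate minimizes the usual residual sum of squares plus an \(\ell_1\) penalty on the coefficients:

\[
\hat{\beta} = \arg\min_{\beta} \sum_{i=1}^{n}\Big(y_i - \beta_0 - \sum_{j=1}^{p} x_{ij}\beta_j\Big)^2 + \lambda \sum_{j=1}^{p} \lvert \beta_j \rvert
\]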

A hyper-parameter, \(\lambda\), controls the amount of shrinkage; it can be any non-negative value, with larger values producing more shrinkage (\(\lambda = 0\) recovers the unpenalized fit). An optimal \(\lambda\) can be chosen by fitting LASSO models over a range of \(\lambda\) values and selecting the one that performs best, typically via cross-validation.
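
As a minimal sketch of that search (x and y are placeholder predictor-matrix and response objects, not from the assignment), glmnet() fits the whole \(\lambda\) sequence at once and cv.glmnet() cross-validates over it:

library(glmnet)

#Fits models across a descending sequence of lambda values (alpha = 1 is LASSO)
fit <- glmnet(x, y, alpha = 1)
plot(fit, xvar = "lambda", label = TRUE)  #Coefficient paths: shrinkage grows with lambda

#Cross-validates over the same lambda sequence and records the error at each value
cvfit <- cv.glmnet(x, y, alpha = 1)
cvfit$lambda.min  #The lambda with the lowest mean cross-validated error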

LASSO regression is great when you have many predictors and want a simpler model. However, information about the relationship of each predictor to the response variable can become obscured. Thus, LASSO is a good technique when one wants a simplified model that makes accurate predictions, but isn't concerned with the relationships of the individual independent variables to the response variable.

Below is an example of a LASSO binomial (logistic) regression model from assignment 3.

library(glmnet)  #A helpful guide: https://web.stanford.edu/~hastie/glmnet/glmnet_alpha.html

#Data prep: glmnet needs the predictors in matrix format
#Adapted from:
#https://stackoverflow.com/questions/35437411/error-in-predict-glmnet-function-not-yet-implemented-method
trainx <- model.matrix(~ . - target, data = train)  #Note: model.matrix adds its own intercept column
newx <- model.matrix(~ . - target, data = validation)

#Fits a series of cross-validated glmnet models over up to 100 lambda values (the default)
#lambda values are constants that control the amount of coefficient shrinkage
glmnetmodel <- cv.glmnet(x = trainx,  #Predictor variables
                         y = train[, names(train) == "target"],  #Response variable
                         family = "binomial",  #Fits a logistic regression
                         nfolds = 10,  #10-fold cross-validation
                         type.measure = "class",  #Uses misclassification error as the loss
                         gamma = seq(0, 1, 0.1),  #Mixing values for a relaxed fit
                         relax = FALSE,  #Relaxed fit disabled, so gamma is ignored here
                         alpha = 1)
#Alpha is basically a choice between LASSO, ridge, and elastic net regression; alpha = 1 is LASSO.
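
As a quick, optional check (not part of the original assignment), plotting a cv.glmnet object shows the cross-validated error across the \(\lambda\) sequence, with the best values marked:

plot(glmnetmodel)  #CV misclassification error vs. log(lambda)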



#Predicts the probability that the target variable is 1
predictions <- predict(glmnetmodel, newx = newx, type = "response",
                       s = glmnetmodel$lambda.min)
#s = lambda.min selects the lambda with the minimum mean CV error (i.e., the best model)


#Prints the coefficients the model uses
print(coef.glmnet(glmnetmodel, s = glmnetmodel$lambda.min))
## 14 x 1 sparse Matrix of class "dgCMatrix"
##                         1
## (Intercept) -25.488121792
## (Intercept)   .          
## zn           -0.036655013
## indus        -0.024064734
## chas1         0.712106522
## nox          29.822168595
## rm            .          
## age           0.021102022
## dis           0.347175535
## rad           0.324580442
## tax          -0.004284457
## ptratio       0.211894182
## lstat         0.036257519
## medv          0.094371163

Above, you can see the model has shrunk one coefficient (rm) all the way to 0, shown as a "." in the sparse matrix. (The duplicate (Intercept) row is an artifact of model.matrix adding its own intercept column; its coefficient is zero because glmnet fits its own intercept.) Below, we can see the model is very predictive, although we cannot learn much about the individual relationship of each independent variable to the response variable.
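
The statistics below come from caret's confusionMatrix(); the original call isn't shown in the assignment, but it was presumably something like this sketch (the 0.5 probability cutoff is an assumption):

library(caret)

#Converts predicted probabilities to class labels (the 0.5 cutoff is an assumption)
predclass <- factor(ifelse(as.numeric(predictions) > 0.5, 1, 0))
confusionMatrix(data = predclass,
                reference = factor(validation$target),
                positive = "1")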

## Confusion Matrix and Statistics
## 
##           Reference
## Prediction  0  1
##          0 42  2
##          1  5 43
##                                           
##                Accuracy : 0.9239          
##                  95% CI : (0.8495, 0.9689)
##     No Information Rate : 0.5109          
##     P-Value [Acc > NIR] : <2e-16          
##                                           
##                   Kappa : 0.848           
##                                           
##  Mcnemar's Test P-Value : 0.4497          
##                                           
##             Sensitivity : 0.9556          
##             Specificity : 0.8936          
##          Pos Pred Value : 0.8958          
##          Neg Pred Value : 0.9545          
##              Prevalence : 0.4891          
##          Detection Rate : 0.4674          
##    Detection Prevalence : 0.5217          
##       Balanced Accuracy : 0.9246          
##                                           
##        'Positive' Class : 1               
## 
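
The AUC below comes from the pROC package. The original run passed predict()'s one-column matrix straight to roc(), which triggered a deprecation warning; coercing to a numeric vector avoids it. A sketch:

library(pROC)

#roc() expects a numeric vector; predict() returned a one-column matrix, so coerce it
rocobj <- roc(validation$target, as.numeric(predictions))
auc(rocobj)  #Area under the ROC curve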

## Area under the curve: 0.9858

Some sources of information used here:

https://machinelearningmastery.com/lasso-regression-with-python/

https://en.wikipedia.org/wiki/Lasso_(statistics)

https://www.mygreatlearning.com/blog/understanding-of-lasso-regression/

https://www.statisticshowto.com/lasso-regression/

https://stats.stackexchange.com/questions/17251/what-is-the-lasso-in-regression-analysis