Lasso Regression

In linear regression, the model is not penalized for its choice of weights. As a result, if the model considers one feature particularly important during training, it may place a large weight on that feature, i.e. derive a large value for its coefficient. This can lead to overfitting, especially on small datasets with a large number of variables.

To limit this potential mismatch in the size of the coefficients associated with the different predictor variables, there is another technique called lasso regression. LASSO stands for Least Absolute Shrinkage and Selection Operator. Lasso is a modification of linear regression in which the model is penalized by adding the sum of the absolute values of the coefficients to the objective function. As a result, the absolute values of the weights are generally reduced, and many tend to become exactly zero.
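The shrinkage described above can be seen on a small example. The sketch below assumes scikit-learn is available and uses synthetic data in which only the first 3 of 10 predictors matter; the data, the variable names and the penalty strength `alpha=0.5` are illustrative assumptions, not values from this article. Note that scikit-learn calls the penalty strength `alpha` and divides the squared-error term by twice the number of samples, so its `alpha` is not numerically the same as the λ used in this article.

```python
import numpy as np
from sklearn.linear_model import Lasso, LinearRegression

# Synthetic data (an assumption for illustration): 10 predictors,
# of which only the first 3 actually influence the response.
rng = np.random.default_rng(0)
X = rng.normal(size=(100, 10))
true_coef = np.array([3.0, -2.0, 1.5, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0])
y = X @ true_coef + rng.normal(scale=1.0, size=100)

# Fit an unpenalized linear model and a lasso model on the same data.
ols = LinearRegression().fit(X, y)
lasso = Lasso(alpha=0.5).fit(X, y)  # alpha is an arbitrary illustrative value

# Count coefficients that are exactly zero and compare total absolute weight.
n_zero_ols = int(np.sum(ols.coef_ == 0))
n_zero_lasso = int(np.sum(lasso.coef_ == 0))
l1_ols = float(np.sum(np.abs(ols.coef_)))
l1_lasso = float(np.sum(np.abs(lasso.coef_)))

print("zero coefficients  OLS:", n_zero_ols, " lasso:", n_zero_lasso)
print("sum of |coef|      OLS:", round(l1_ols, 3), " lasso:", round(l1_lasso, 3))
```

On data like this, the lasso fit should report a smaller sum of absolute coefficients than OLS and set several of the irrelevant coefficients to exactly zero.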

Ordinary least squares (OLS) regression tries to find coefficient estimates that minimize the residual sum of squares (RSS) over the n observations:

RSS = \(\sum_{i=1}^{n} (y_{i} - \hat{y}_{i})^2\)

Lasso regression tries to find coefficient estimates that minimize the following objective function:

RSS + \(\lambda\sum_{j=1}^{p}|\beta_{j}|\)

where j ranges from 1 to p (the number of predictors) and λ ≥ 0.
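To make the two objective functions concrete, here is a minimal NumPy sketch that evaluates them for a given coefficient vector. The function names `rss` and `lasso_objective`, the tiny dataset and the choice of a model without an intercept are all assumptions made for illustration.

```python
import numpy as np

def rss(X, y, beta):
    """Residual sum of squares for predictions y_hat = X @ beta."""
    residuals = y - X @ beta
    return float(np.sum(residuals ** 2))

def lasso_objective(X, y, beta, lam):
    """RSS plus the shrinkage penalty lam * sum(|beta_j|)."""
    if lam < 0:
        raise ValueError("lam must be greater than or equal to 0")
    return rss(X, y, beta) + lam * float(np.sum(np.abs(beta)))

# Tiny hand-checkable example (hypothetical numbers).
X = np.array([[1.0, 2.0], [3.0, 4.0], [5.0, 6.0]])
y = np.array([1.0, 2.0, 3.0])
beta = np.array([0.5, -0.25])

print(rss(X, y, beta))                   # residuals are 1.0, 1.5, 2.0 -> 7.25
print(lasso_objective(X, y, beta, 2.0))  # 7.25 + 2.0 * 0.75 -> 8.75
print(lasso_objective(X, y, beta, 0.0))  # with lam = 0 this equals the RSS
```

Setting `lam` to 0 recovers the OLS objective, while larger values make coefficient vectors with large absolute values more costly.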

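The second term in this objective is known as the shrinkage penalty. When λ = 0 the penalty vanishes and lasso reduces to OLS; as λ grows, the coefficients are shrunk more strongly toward zero. In lasso regression, we select the value of λ that produces the lowest test MSE (mean squared error), which in practice is usually estimated by cross-validation.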
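One common way to pick the penalty strength is k-fold cross-validation. The sketch below assumes scikit-learn and uses its `LassoCV` estimator on synthetic data; the data, the 5-fold setting and the 25% hold-out split are illustrative assumptions. As before, scikit-learn's `alpha` plays the role of λ but is scaled differently.

```python
import numpy as np
from sklearn.linear_model import LassoCV
from sklearn.metrics import mean_squared_error
from sklearn.model_selection import train_test_split

# Synthetic data (an assumption for illustration).
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 10))
true_coef = np.array([3.0, -2.0, 1.5, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0])
y = X @ true_coef + rng.normal(scale=1.0, size=200)

# Hold out a test set that plays no part in choosing the penalty.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=0
)

# LassoCV tries a grid of penalty strengths and keeps the one with the
# lowest average validation MSE across the 5 folds.
model = LassoCV(cv=5, random_state=0).fit(X_train, y_train)

best_alpha = float(model.alpha_)
test_mse = float(mean_squared_error(y_test, model.predict(X_test)))

print("selected alpha:", best_alpha)
print("test MSE:", test_mse)
```

The held-out test MSE gives an estimate of how the selected model performs on data that was not used to choose the penalty.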

Because the penalty treats every coefficient in the same way, variables measured on widely different ranges would be penalized unevenly. It is therefore important to scale or normalize all the variables before using them to fit a lasso regression model.
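The effect of scaling can be sketched with two predictors that are equally important but recorded in very different units. This example assumes scikit-learn and uses `StandardScaler` inside a pipeline; the synthetic data, the unit factors and `alpha=0.5` are illustrative assumptions.

```python
import numpy as np
from sklearn.linear_model import Lasso
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

# Two equally important underlying signals (synthetic, for illustration).
rng = np.random.default_rng(0)
Z = rng.normal(size=(200, 2))
y = 2.0 * Z[:, 0] + 2.0 * Z[:, 1] + rng.normal(scale=0.5, size=200)

# Record the same signals in very different units:
# the first column has a large range, the second a tiny one.
X = Z * np.array([1000.0, 0.01])

# Lasso on the raw columns: the small-range predictor needs a huge
# coefficient, so the penalty removes it entirely.
raw = Lasso(alpha=0.5).fit(X, y)

# Lasso after standardizing each column to mean 0 and unit variance.
scaled = make_pipeline(StandardScaler(), Lasso(alpha=0.5)).fit(X, y)
scaled_coef = scaled.named_steps["lasso"].coef_

print("raw coefficients:   ", raw.coef_)
print("scaled coefficients:", scaled_coef)
```

Without scaling, the second predictor is dropped purely because of its units; after standardization both predictors are kept and shrunk by a similar amount.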