Lasso is Scale Variant

Set Up

# Clear the workspace
rm(list = ls()) # Clear environment
gc()            # Clear memory

          used (Mb) gc trigger (Mb) limit (Mb) max used (Mb)
Ncells  597327 32.0    1356528 72.5         NA   700240 37.4
Vcells 1101602  8.5    8388608 64.0      49152  1963592 15.0

cat("\f")       # Clear the console

# Set a random seed for reproducibility
set.seed(123)

# Prepare needed libraries

packages <- c("stargazer", 
              "visdat",
              "skimr",
              "tidyverse",
              "glmnet"
              )  

for (i in 1:length(packages)) {
  if (!packages[i] %in% rownames(installed.packages())) {
    install.packages(packages[i]
                     , repos = "http://cran.rstudio.com/"
                     , dependencies = TRUE
                     )
  }
  library(packages[i], character.only = TRUE)
}


Please cite as:

 Hlavac, Marek (2022). stargazer: Well-Formatted Regression and Summary Statistics Tables.

 R package version 5.2.3. https://CRAN.R-project.org/package=stargazer

── Attaching core tidyverse packages ──────────────────────── tidyverse 2.0.0 ──
✔ dplyr     1.1.4     ✔ readr     2.1.5
✔ forcats   1.0.0     ✔ stringr   1.5.1
✔ ggplot2   3.5.1     ✔ tibble    3.2.1
✔ lubridate 1.9.3     ✔ tidyr     1.3.1
✔ purrr     1.0.2     
── Conflicts ────────────────────────────────────────── tidyverse_conflicts() ──
✖ dplyr::filter() masks stats::filter()
✖ dplyr::lag()    masks stats::lag()
ℹ Use the conflicted package (<http://conflicted.r-lib.org/>) to force all conflicts to become errors
Loading required package: Matrix


Attaching package: 'Matrix'


The following objects are masked from 'package:tidyr':

    expand, pack, unpack


Loaded glmnet 4.1-8

rm(packages)

Data

Description

df <- carData::Salaries

skim(df)

Data summary
Name	df
Number of rows	397
Number of columns	6
_______________________
Column type frequency:
factor	3
numeric	3
________________________
Group variables	None

Variable type: factor

skim_variable	complete_rate	ordered	n_unique	top_counts
rank	1	FALSE	3	Pro: 266, Ass: 67, Ass: 64
discipline	1	FALSE	2	B: 216, A: 181
sex	1	FALSE	2	Mal: 358, Fem: 39

Variable type: numeric

skim_variable	complete_rate	mean	sd	p0	p25	p50	p75	p100	hist
yrs.since.phd	1	22.31	12.89	1	12	21	32	56	▇▇▆▅▁
yrs.service	1	17.61	13.01	0	7	16	27	60	▇▅▃▂▁
salary	1	113706.46	30289.04	57800	91000	107300	134185	231545	▅▇▅▂▁

str(df)

'data.frame':   397 obs. of  6 variables:
 $ rank         : Factor w/ 3 levels "AsstProf","AssocProf",..: 3 3 1 3 3 2 3 3 3 3 ...
 $ discipline   : Factor w/ 2 levels "A","B": 2 2 2 2 2 2 2 2 2 2 ...
 $ yrs.since.phd: int  19 20 4 45 40 6 30 45 21 18 ...
 $ yrs.service  : int  18 16 3 39 41 6 23 45 20 18 ...
 $ sex          : Factor w/ 2 levels "Female","Male": 2 2 2 2 2 2 2 2 2 1 ...
 $ salary       : int  139750 173200 79750 115000 141500 97000 175000 147765 119250 129000 ...

vis_dat(df)

stargazer(df, 
          type="text"
          )


=======================================================
Statistic      N     Mean      St. Dev.   Min     Max  
-------------------------------------------------------
yrs.since.phd 397   22.315      12.887     1      56   
yrs.service   397   17.615      13.006     0      60   
salary        397 113,706.500 30,289.040 57,800 231,545
-------------------------------------------------------

Scaled Data

Lets create two other dataframes where salary is measured in cents and in thousands. All other variables are the same.

df_cents <- df |> mutate(salary = salary * 100 )
df_thousands <- df |> mutate(salary = salary / 1000 )

stargazer(df,df_cents, df_thousands, 
          type="text"
          )


=======================================================
Statistic      N     Mean      St. Dev.   Min     Max  
-------------------------------------------------------
yrs.since.phd 397   22.315      12.887     1      56   
yrs.service   397   17.615      13.006     0      60   
salary        397 113,706.500 30,289.040 57,800 231,545
-------------------------------------------------------

===================================================================
Statistic      N       Mean        St. Dev.       Min       Max    
-------------------------------------------------------------------
yrs.since.phd 397     22.315        12.887         1         56    
yrs.service   397     17.615        13.006         0         60    
salary        397 11,370,646.000 3,028,904.000 5,780,000 23,154,500
-------------------------------------------------------------------

=================================================
Statistic      N   Mean   St. Dev.  Min     Max  
-------------------------------------------------
yrs.since.phd 397 22.315   12.887    1      56   
yrs.service   397 17.615   13.006    0      60   
salary        397 113.706  30.289  57.800 231.545
-------------------------------------------------

OLS (Ordinary Least Squares)

Is scale invariant. The choice of scale is purely for convenience and clarity in interpretation.
- The regression equation:

\[ rank_i = \beta_0 + \beta_1 Years \ Since \ PhD_i + \beta_2 Years \ of \ Service_i + \beta_3 Sex_i + \beta_4 Salary_i + \epsilon _i \tag{1}\]

includes the variable Salary, which could be measured in units such as dollars, thousands of dollars, or cents. Changing the scale of the salary variable (e.g., dividing by 1,000 to express salary in thousands) does not alter the underlying relationship between salary and rank. Instead, it affects the interpretation of the coefficient for Salary.

OLS is Scale Invariant

Scale invariance refers to the property of regression models where changing the scale of an independent variable affects only the coefficient of that variable, not its overall effect on the dependent variable.

Specifically:

If Salary is measured in dollars, the coefficient \(\beta_4\) represents the change in rank for a one-dollar increase in salary.
If Salary is measured in thousands of dollars, the coefficient \(\beta_4\) would adjust accordingly, representing the change in rank for a one-thousand-dollar increase in salary.

The model’s predictions (\(\hat{rank}_i\)) and the overall fit (e.g., \(R^2\), significance of \(\beta_4\)) remain unchanged because the relationship between salary and rank is preserved; only the scale of the coefficient adapts.

reg1 <- 
lm(data = df, 
   formula = rank ~ yrs.since.phd + yrs.service + salary + sex)

reg1_cents <- 
lm(data = df_cents, 
   formula = rank ~ yrs.since.phd + yrs.service + salary + sex)

reg1_thousands <- 
lm(data = df_thousands, 
   formula = rank ~ yrs.since.phd + yrs.service + salary + sex)

stargazer(reg1,reg1_cents, reg1_thousands, 
          type="text", 
          column.labels = c("Original","In Cents","In Thousands"),          
          order = ("salary"),
          digits = 10
          )


=======================================================
                         Dependent variable:           
              -----------------------------------------
                                rank                   
                Original      In Cents    In Thousands 
                   (1)           (2)           (3)     
-------------------------------------------------------
salary        0.0000098328  0.0000000983  0.0098327800 
                                                       
                                                       
yrs.since.phd 0.0355461500  0.0355461500  0.0355461500 
                                                       
                                                       
yrs.service   -0.0038601220 -0.0038601220 -0.0038601220
                                                       
                                                       
sexMale       0.0154936600  0.0154936600  0.0154936600 
                                                       
                                                       
Constant      0.6440243000  0.6440243000  0.6440243000 
                                                       
                                                       
-------------------------------------------------------
Observations       397           397           397     
=======================================================
Note:                       *p<0.1; **p<0.05; ***p<0.01

Lasso (Least Absolute Shrinkage and Selection Operator)

Now, suppose we are interested in the estimating equation:

\[ rank_i = \beta_0 + \beta_1 Years \ Since \ PhD_i + \beta_2 Years \ of \ Service_i + \beta_3 Salary_i + \epsilon _i \tag{2}\]

We will run lasso on this functional form.
We will then show that if you do not scale a variable, it is more likely to get eliminated (if it is relatively large compared to scaled variables), and more unlikely to get eliminated (if it is relatively small compared to scaled variables).

LASSO is not scale-invariant.

This means that the selection of variables and the magnitude of their coefficients can be affected by the scale of the predictors

Lets scale all the independent variables.

?scale

df_scaled <- as.data.frame(scale(x = df[,c(3,4,6)]))
df_scaled <- cbind(df_scaled, df[,c("rank", "discipline", "sex")])

stargazer(df, df_scaled, 
          type="text"
          )


=======================================================
Statistic      N     Mean      St. Dev.   Min     Max  
-------------------------------------------------------
yrs.since.phd 397   22.315      12.887     1      56   
yrs.service   397   17.615      13.006     0      60   
salary        397 113,706.500 30,289.040 57,800 231,545
-------------------------------------------------------

==============================================
Statistic      N   Mean  St. Dev.  Min    Max 
----------------------------------------------
yrs.since.phd 397 0.000   1.000   -1.654 2.614
yrs.service   397 -0.000  1.000   -1.354 3.259
salary        397 0.000   1.000   -1.846 3.890
----------------------------------------------

Run OLS on the estimating equation, which is equivalent to \(\lambda=0\) in the glmnet command. \[ rank_i = \beta_0 + \beta_1 Years \ Since \ PhD_i + \beta_2 Years \ of \ Service_i + \beta_3 Salary_i + \epsilon _i \]

?glmnet
# lasso

scaledX <- df_scaled[,c("salary", "yrs.since.phd", "yrs.service")]
scaledX_cents <- scaledX |> mutate(salary = salary * 100)
scaledX_thousands <- scaledX |> mutate(salary = salary / 1000)

la.eq <- glmnet(x = scaledX,  # "salary", "yrs.since.phd", "yrs.service"
                y = as.numeric(df_scaled$rank), 
                alpha = 1,
                lambda = 0  # OLS
                )

Confirm \(\lambda\)=0 is OLS.

reg2 <- 
lm(data = df, 
   formula = as.numeric(rank) ~ scale(yrs.since.phd) + scale(yrs.service) + scale(salary) )

summary(reg2) # same coefficients as la.eq


Call:
lm(formula = as.numeric(rank) ~ scale(yrs.since.phd) + scale(yrs.service) + 
    scale(salary), data = df)

Residuals:
     Min       1Q   Median       3Q      Max 
-1.30020 -0.37101  0.02817  0.34410  1.02497 

Coefficients:
                     Estimate Std. Error t value Pr(>|t|)    
(Intercept)           2.50126    0.02387 104.790  < 2e-16 ***
scale(yrs.since.phd)  0.45802    0.06018   7.611 2.04e-13 ***
scale(yrs.service)   -0.04959    0.05798  -0.855    0.393    
scale(salary)         0.29829    0.02653  11.244  < 2e-16 ***
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Residual standard error: 0.4756 on 393 degrees of freedom
Multiple R-squared:  0.6189,    Adjusted R-squared:  0.616 
F-statistic: 212.7 on 3 and 393 DF,  p-value: < 2.2e-16

Output matches for reg2 and la.eq.

Implement Lasso

Choosing a non-zero \(\lambda\) ensures we are implementing an Lasso.
- la.eq selects all three vars.
- la.eq_cents selects should eliminate salary variable as it would give us a bigger coefficient corresponding to salary variable. Lasso coefficient on salary in la.eq_cents model should be thus smaller than la.eq model, suggesting we strongly eliminate this variables.
- la.eq_thousands selects should keep salary variable as it would give us a smaller coefficient corresponding to salary variable. Lasso coefficient on salary in la.eq_thousands model should be thus bigger than la.eq model, suggesting we strongly keep this variables.

?glmnet
# lasso

scaledX <- df_scaled[,c("salary", "yrs.since.phd", "yrs.service")]
scaledX_cents <- scaledX |> mutate(salary = salary * 1000000)
scaledX_thousands <- scaledX |> mutate(salary = salary / 10000000)

la.eq <- glmnet(x = scaledX, 
                y = as.numeric(df_scaled$rank), 
                alpha=1,
                lambda = .342  
                ) 

la.eq_cents <- glmnet(x = scaledX_cents, 
                y = as.numeric(df_scaled$rank), 
                alpha=1,
                lambda = .342  
                ) 

la.eq_thousands <- glmnet(x = scaledX_thousands, 
                y = as.numeric(df_scaled$rank), 
                alpha=1,
                lambda = .342  
                ) 

data.frame(
    Lasso_standard    = la.eq$beta[,1],
    Lasso_cents       = la.eq_cents$beta[,1],
    Lasso_thousands   = la.eq_thousands$beta[,1]
)

              Lasso_standard  Lasso_cents Lasso_thousands
salary             0.0598139 5.981390e-08    5.981390e+05
yrs.since.phd      0.1704519 1.704519e-01    1.704519e-01
yrs.service        0.0000000 0.000000e+00    0.000000e+00

LASSO regression (L1-regularization) is scale-dependent, meaning that variables with different scales can affect the results of variable selection.

LASSO applies the same penalty to all predictors, regardless of their scale.

Larger values of income (e.g., in cents) lead to larger coefficients, even if the relationship between salary and y is identical.
LASSO penalizes larger coefficients more heavily, shrinking or eliminating the variable from the model incorrectly.

Without Scaling:

Coefficients for salary in different units vary drastically due to scale differences.
LASSO may incorrectly shrink or eliminate variables like salary in cents because its large magnitude leads to larger penalties.

With Scaling:

Coefficients are adjusted for scale, and variable selection depends purely on predictive power, not units.

Summary

Lasso with \(\lambda=0\) is plain old ordinary lease squares (OLS).
- OLS is scale invariant.
Salary measured in dollars, cents, or thousands affects the coefficients in lasso due to scale differences.
- Standardization ensures fair treatment of variables, avoiding bias in LASSO’s variable selection.
- Always standardize predictors before applying LASSO or any regularized regression technique.