Introduction

As an alternative to winsorization and truncation, recent studies commonly estimate “robust regressions” to address the potential influence of outliers in their tests or to increase the precision of their estimates. Robust regressions are a family of linear regression methods that, like OLS, estimate the parameters of the regression model by minimizing a specific objective function over a sample of data. Most robust regressions do so through an iterative procedure in which observations with large regression residuals are downweighted. Because the various estimators, and their various implementations in Stata, differ in how effectively they control for outliers, we illustrate here how to implement several alternative robust estimators in Stata.
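
The examples below use the community-contributed robreg and combomarginsplot packages, which are not part of official Stata. A minimal setup sketch, assuming both are available from SSC (robreg also relies on the moremata package):

ssc install robreg
ssc install moremata
ssc install combomarginsplot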

Reducing bias with robust estimators

To set the stage, consider the following simulated example dataset of 1,000 observations where variable \(x\) and \(y\) have a simple relation: \(y= x + \epsilon\), where \(x\) and \(\epsilon\) are variables drawn from a standard normal distribution with mean 0 and standard deviation 1. For this sample, a regression of \(y\) on \(x\) should produce a coefficient estimate on \(x\) that is close to 1.

But let’s assume that for a randomly drawn 10 percent of observations, the decimal point of \(x\) was placed incorrectly so that the true values of \(x\) are all multiplied by 10 in the observed data.

clear
set seed 1000
set obs 1000

gen x = rnormal()
gen y = x + rnormal()
reg y x

replace x = x*10 if _n <= 100
Number of observations (_N) was 0, now 1,000.

      Source |       SS           df       MS      Number of obs   =     1,000
-------------+----------------------------------   F(1, 998)       =    941.60
       Model |  931.354225         1  931.354225   Prob > F        =    0.0000
    Residual |  987.136919       998   .98911515   R-squared       =    0.4855
-------------+----------------------------------   Adj R-squared   =    0.4849
       Total |  1918.49114       999  1.92041156   Root MSE        =    .99454

------------------------------------------------------------------------------
           y | Coefficient  Std. err.      t    P>|t|     [95% conf. interval]
-------------+----------------------------------------------------------------
           x |    .996082   .0324609    30.69   0.000     .9323825    1.059782
       _cons |  -.0043599   .0314823    -0.14   0.890     -.066139    .0574193
------------------------------------------------------------------------------

(100 real changes made)

When we estimate a simple regression using OLS on the dataset, we get:

reg y x
avplots
graph export "fig_outlier.png", replace width(800) height(600)
      Source |       SS           df       MS      Number of obs   =     1,000
-------------+----------------------------------   F(1, 998)       =    170.74
       Model |  280.274791         1  280.274791   Prob > F        =    0.0000
    Residual |  1638.21635       998  1.64149935   R-squared       =    0.1461
-------------+----------------------------------   Adj R-squared   =    0.1452
       Total |  1918.49114       999  1.92041156   Root MSE        =    1.2812

------------------------------------------------------------------------------
           y | Coefficient  Std. err.      t    P>|t|     [95% conf. interval]
-------------+----------------------------------------------------------------
           x |   .1719327   .0131579    13.07   0.000     .1461124     .197753
       _cons |  -.0168409   .0405855    -0.41   0.678    -.0964837    .0628018
------------------------------------------------------------------------------


file fig_outlier.png saved as PNG format

The contamination in the \(x\) variable biases the coefficient on \(x\) away from the true value of 1. Using the post-estimation command avplots, we can see how this bias is caused by the unusual values of the \(x\) variable.

Figure: Regression Outliers
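
Other standard OLS diagnostics flag the same observations. A minimal sketch using studentized residuals and leverage after reg (the cutoffs below are arbitrary rules of thumb):

reg y x
predict rstu, rstudent   // studentized residuals
predict lev, leverage    // leverage (hat) values
count if abs(rstu) > 3 | lev > 4/_N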

Let’s have a look at how robust estimators in Stata address this bias. Some studies use Stata’s built-in command rreg, but we can see that this does not solve the problem, as the coefficient is still downward-biased:

rreg y x
   Huber iteration 1:  maximum difference in weights = .61426016
   Huber iteration 2:  maximum difference in weights = .01751717
Biweight iteration 3:  maximum difference in weights = .18892616
Biweight iteration 4:  maximum difference in weights = .00363223

Robust regression                               Number of obs     =      1,000
                                                F(  1,       998) =     159.28
                                                Prob > F          =     0.0000

------------------------------------------------------------------------------
           y | Coefficient  Std. err.      t    P>|t|     [95% conf. interval]
-------------+----------------------------------------------------------------
           x |    .169637   .0134413    12.62   0.000     .1432607    .1960134
       _cons |  -.0338366   .0414595    -0.82   0.415    -.1151943    .0475212
------------------------------------------------------------------------------
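
As an aside, rreg's biweight tuning constant (the tune() option, default 7) can be lowered to downweight large residuals more aggressively. Note, however, that this only tunes the residual-based downweighting; it does not by itself make the estimator robust to unusual values in the regressors. A hedged sketch:

rreg y x, tune(4)   // smaller tuning constant = heavier downweighting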

An alternative program we can use is robreg. Using the default settings of the MM-estimator that is part of robreg (the program offers many forms of robust estimation, including, for example, median regression), we obtain results that look much better than those based on rreg. The coefficient on \(x\) is not significantly different from the true value of 1 at \(p < 0.05\), although the point estimate is still a bit too low at 0.90:

robreg mm y x
preparing data for subsampling ... done
enumerating 20 candidates 0%....20%....40%....60%....80%....100%
refining 5 best candidates ..... done
iterating RLS ........................... done
fitting empty model ... done
computing standard errors ... done

MM regression (85% efficiency)                          Number of obs =  1,000
                                                        Wald chi2(1)  = 272.10
                                                        Prob > chi2   = 0.0000
                                                        Pseudo R2     = 0.1495
                                                Breakdown point   =         50
                                                M-estimate: k     =  3.4436898
                                                S-estimate: k     =   1.547645
                                                Scale             =   1.117712

------------------------------------------------------------------------------
             |               Robust
           y | Coefficient  std. err.      t    P>|t|     [95% conf. interval]
-------------+----------------------------------------------------------------
           x |   .8952919   .0542751    16.50   0.000     .7887854    1.001798
       _cons |  -.0210565   .0366988    -0.57   0.566    -.0930723    .0509592
------------------------------------------------------------------------------
Hausman test of MM against S:    chi2(1) = 6.4322347      Prob > chi2 = 0.0112

The difference between rreg and robreg mm is that the latter implements a form of robust estimation that is resistant to unusual values in the independent variable(s), while the former is not. More precisely, robreg mm is an MM-estimator: it combines a high-breakdown S-estimate of the residual scale with an efficient M-step for the coefficients, which makes it robust to high-leverage observations. rreg, by contrast, is an M-estimator, which downweights observations based on their residuals alone.

We need to keep in mind that, besides these differences between estimators, each estimator can have different implementations. As the output above shows, the default implementation of robreg mm uses an “85% efficiency”, which refers to the “normal efficiency” of the estimator: the precision it attains, relative to OLS, when the errors are in fact normally distributed and OLS would be the appropriate estimator. When OLS is the appropriate estimator but we use an alternative such as an MM-estimator, we lose precision (efficiency) because the robust estimator downweights observations and thereby discards information that is relevant for the estimation. The lower we set the normal efficiency, the more heavily observations are downweighted. The advantage of this downweighting is that in situations such as the example described above, where OLS is not the best estimator, we can obtain estimates that are both less biased and more precise.
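
To see this downweighting at work, the weights that robreg assigns in its final reweighted-least-squares step can be saved with predict and inspected; a minimal sketch (the variable name w is arbitrary):

robreg mm y x
predict w, weights   // weight given to each observation in the final fit
summarize w, detail  // inspect the distribution of the weights
count if w < .01     // observations that are effectively discarded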

What happens when we change the default level of normal efficiency with robreg mm? Let’s see what happens when we use a 95% normal efficiency:

robreg mm y x, eff(95)
preparing data for subsampling ... done
enumerating 20 candidates 0%....20%....40%....60%....80%....100%
refining 5 best candidates ..... done
iterating RLS .................................................................
> .. done
fitting empty model ... done
computing standard errors ... done

MM regression (95% efficiency)                          Number of obs =  1,000
                                                        Wald chi2(1)  =  58.91
                                                        Prob > chi2   = 0.0000
                                                        Pseudo R2     = 0.1185
                                                Breakdown point   =         50
                                                M-estimate: k     =  4.6850649
                                                S-estimate: k     =   1.547645
                                                Scale             =   1.117712

------------------------------------------------------------------------------
             |               Robust
           y | Coefficient  std. err.      t    P>|t|     [95% conf. interval]
-------------+----------------------------------------------------------------
           x |   .1686758   .0219767     7.68   0.000     .1255499    .2118017
       _cons |  -.0408841    .042482    -0.96   0.336    -.1242484    .0424801
------------------------------------------------------------------------------
Hausman test of MM against S:    chi2(1) = 234.00944      Prob > chi2 = 0.0000

Interestingly, the results show that when we increase the normal efficiency to 95% in this particular example, the estimates are as biased as those we saw for OLS. This suggests that it is important to assess the sensitivity of the results to alternative levels of normal efficiency, and that we should not necessarily strive for high normal efficiency levels. When we instead lower the normal efficiency to 70%, the coefficient estimate increases to 0.95 (compared to 0.90 when using 85% normal efficiency). At least as important, the standard error of the coefficient on \(x\) declines (and its \(t\)-value increases), which shows how robust estimation with a lower normal efficiency both reduces bias and increases the precision of the estimation:

robreg mm y x, eff(70)
preparing data for subsampling ... done
enumerating 20 candidates 0%....20%....40%....60%....80%....100%
refining 5 best candidates ..... done
iterating RLS ............................. done
fitting empty model ... done
computing standard errors ... done

MM regression (70% efficiency)                          Number of obs =  1,000
                                                        Wald chi2(1)  = 369.84
                                                        Prob > chi2   = 0.0000
                                                        Pseudo R2     = 0.1673
                                                Breakdown point   =         50
                                                M-estimate: k     =  2.6972206
                                                S-estimate: k     =   1.547645
                                                Scale             =   1.117712

------------------------------------------------------------------------------
             |               Robust
           y | Coefficient  std. err.      t    P>|t|     [95% conf. interval]
-------------+----------------------------------------------------------------
           x |   .9456408   .0491718    19.23   0.000     .8491488    1.042133
       _cons |  -.0215139   .0391472    -0.55   0.583     -.098334    .0553063
------------------------------------------------------------------------------
Hausman test of MM against S:    chi2(1) = 3.6316446      Prob > chi2 = 0.0567
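
A compact way to run this kind of sensitivity check is to loop over several normal-efficiency levels and display the slope estimate and its standard error for each; a minimal sketch:

foreach e in 95 90 85 70 50 {
    quietly robreg mm y x, eff(`e')
    display "eff(`e'): b[x] = " %6.3f _b[x] ", se = " %6.4f _se[x]
}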

In the following figure, we plot the coefficient estimates and their 95% confidence intervals for a broader set of alternatives to OLS. Besides the rreg and robreg mm implementations shown above, we also plot the results from median regression (using qreg), MM-estimations with 50 and 90 percent normal efficiency, and an S-estimator (using robreg s), which can be seen as an MM-estimation with the lowest possible normal efficiency (roughly 28.7 percent).

reg y x
    margins, dydx(*) saving(file0, replace)
qreg y x
    margins, dydx(*) saving(file1, replace)
rreg y x
    margins, dydx(*) saving(file2, replace)
robreg mm y x, eff(95)
    margins, dydx(*) saving(file3, replace)
robreg mm y x, eff(90)
    margins, dydx(*) saving(file4, replace)
robreg mm y x, eff(85)
    margins, dydx(*) saving(file5, replace)
robreg mm y x, eff(70)
    margins, dydx(*) saving(file6, replace)
robreg mm y x, eff(50)
    margins, dydx(*) saving(file7, replace)
robreg s y x
    margins, dydx(*) saving(file8, replace)

combomarginsplot file0 file1 file2 file3 file4 file5 file6 file7 file8, ///
    labels("OLS" "Median" "RREG" "MM95" "MM90" "MM85" "MM70" "MM50" "S") ///
    xtitle("Estimator") ///
    ytitle("Coefficient estimate and 95% confidence interval") ///
    title("")
graph export "fig_outlier_estimations.png", as(png) name("Graph") replace
      Source |       SS           df       MS      Number of obs   =     1,000
-------------+----------------------------------   F(1, 998)       =    170.74
       Model |  280.274791         1  280.274791   Prob > F        =    0.0000
    Residual |  1638.21635       998  1.64149935   R-squared       =    0.1461
-------------+----------------------------------   Adj R-squared   =    0.1452
       Total |  1918.49114       999  1.92041156   Root MSE        =    1.2812

------------------------------------------------------------------------------
           y | Coefficient  Std. err.      t    P>|t|     [95% conf. interval]
-------------+----------------------------------------------------------------
           x |   .1719327   .0131579    13.07   0.000     .1461124     .197753
       _cons |  -.0168409   .0405855    -0.41   0.678    -.0964837    .0628018
------------------------------------------------------------------------------


Average marginal effects                                 Number of obs = 1,000
Model VCE: OLS

Expression: Linear prediction, predict()
dy/dx wrt:  x

------------------------------------------------------------------------------
             |            Delta-method
             |      dy/dx   std. err.      t    P>|t|     [95% conf. interval]
-------------+----------------------------------------------------------------
           x |   .1719327   .0131579    13.07   0.000     .1461124     .197753
------------------------------------------------------------------------------

Iteration  1:  WLS sum of weighted deviations =   510.7951

Iteration  1: sum of abs. weighted deviations =  510.80214
Iteration  2: sum of abs. weighted deviations =  510.51225
Iteration  3: sum of abs. weighted deviations =  510.29312
Iteration  4: sum of abs. weighted deviations =  509.73221
Iteration  5: sum of abs. weighted deviations =  509.71895
Iteration  6: sum of abs. weighted deviations =  509.71841

Median regression                                   Number of obs =      1,000
  Raw sum of deviations 551.1152 (about -.14306737)
  Min sum of deviations 509.7184                    Pseudo R2     =     0.0751

------------------------------------------------------------------------------
           y | Coefficient  Std. err.      t    P>|t|     [95% conf. interval]
-------------+----------------------------------------------------------------
           x |   .1783037   .0170875    10.43   0.000     .1447722    .2118353
       _cons |   -.108289   .0527063    -2.05   0.040    -.2117168   -.0048611
------------------------------------------------------------------------------


Average marginal effects                                 Number of obs = 1,000
Model VCE: IID

Expression: Linear prediction, predict()
dy/dx wrt:  x

------------------------------------------------------------------------------
             |            Delta-method
             |      dy/dx   std. err.      z    P>|z|     [95% conf. interval]
-------------+----------------------------------------------------------------
           x |   .1783037   .0170875    10.43   0.000     .1448129    .2117946
------------------------------------------------------------------------------


   Huber iteration 1:  maximum difference in weights = .61426016
   Huber iteration 2:  maximum difference in weights = .01751717
Biweight iteration 3:  maximum difference in weights = .18892616
Biweight iteration 4:  maximum difference in weights = .00363223

Robust regression                               Number of obs     =      1,000
                                                F(  1,       998) =     159.28
                                                Prob > F          =     0.0000

------------------------------------------------------------------------------
           y | Coefficient  Std. err.      t    P>|t|     [95% conf. interval]
-------------+----------------------------------------------------------------
           x |    .169637   .0134413    12.62   0.000     .1432607    .1960134
       _cons |  -.0338366   .0414595    -0.82   0.415    -.1151943    .0475212
------------------------------------------------------------------------------


Average marginal effects                                 Number of obs = 1,000

Expression: Fitted values, predict()
dy/dx wrt:  x

------------------------------------------------------------------------------
             |            Delta-method
             |      dy/dx   std. err.      z    P>|z|     [95% conf. interval]
-------------+----------------------------------------------------------------
           x |    .169637   .0134413    12.62   0.000     .1432926    .1959814
------------------------------------------------------------------------------

preparing data for subsampling ... done
enumerating 20 candidates 0%....20%....40%....60%....80%....100%
refining 5 best candidates ..... done
iterating RLS .................................................................
> .. done
fitting empty model ... done
computing standard errors ... done

MM regression (95% efficiency)                          Number of obs =  1,000
                                                        Wald chi2(1)  =  58.91
                                                        Prob > chi2   = 0.0000
                                                        Pseudo R2     = 0.1185
                                                Breakdown point   =         50
                                                M-estimate: k     =  4.6850649
                                                S-estimate: k     =   1.547645
                                                Scale             =   1.117712

------------------------------------------------------------------------------
             |               Robust
           y | Coefficient  std. err.      t    P>|t|     [95% conf. interval]
-------------+----------------------------------------------------------------
           x |   .1686758   .0219767     7.68   0.000     .1255499    .2118017
       _cons |  -.0408841    .042482    -0.96   0.336    -.1242484    .0424801
------------------------------------------------------------------------------
Hausman test of MM against S:    chi2(1) = 234.00944      Prob > chi2 = 0.0000


Average marginal effects                                 Number of obs = 1,000
Model VCE: Robust

Expression: Fitted values, predict()
dy/dx wrt:  x

------------------------------------------------------------------------------
             |            Delta-method
             |      dy/dx   std. err.      z    P>|z|     [95% conf. interval]
-------------+----------------------------------------------------------------
           x |   .1686758   .0219767     7.68   0.000     .1256022    .2117494
------------------------------------------------------------------------------

preparing data for subsampling ... done
enumerating 20 candidates 0%....20%....40%....60%....80%....100%
refining 5 best candidates ..... done
iterating RLS ............................... done
fitting empty model ... done
computing standard errors ... done

MM regression (90% efficiency)                          Number of obs =  1,000
                                                        Wald chi2(1)  = 173.89
                                                        Prob > chi2   = 0.0000
                                                        Pseudo R2     = 0.1263
                                                Breakdown point   =         50
                                                M-estimate: k     =  3.8826616
                                                S-estimate: k     =   1.547645
                                                Scale             =   1.117712

------------------------------------------------------------------------------
             |               Robust
           y | Coefficient  std. err.      t    P>|t|     [95% conf. interval]
-------------+----------------------------------------------------------------
           x |   .8610597   .0652965    13.19   0.000     .7329256    .9891939
       _cons |  -.0217583   .0362316    -0.60   0.548    -.0928571    .0493405
------------------------------------------------------------------------------
Hausman test of MM against S:    chi2(1) = 7.0773179      Prob > chi2 = 0.0078


Average marginal effects                                 Number of obs = 1,000
Model VCE: Robust

Expression: Fitted values, predict()
dy/dx wrt:  x

------------------------------------------------------------------------------
             |            Delta-method
             |      dy/dx   std. err.      z    P>|z|     [95% conf. interval]
-------------+----------------------------------------------------------------
           x |   .8610597   .0652965    13.19   0.000      .733081    .9890385
------------------------------------------------------------------------------

preparing data for subsampling ... done
enumerating 20 candidates 0%....20%....40%....60%....80%....100%
refining 5 best candidates ..... done
iterating RLS ........................... done
fitting empty model ... done
computing standard errors ... done

MM regression (85% efficiency)                          Number of obs =  1,000
                                                        Wald chi2(1)  = 272.10
                                                        Prob > chi2   = 0.0000
                                                        Pseudo R2     = 0.1495
                                                Breakdown point   =         50
                                                M-estimate: k     =  3.4436898
                                                S-estimate: k     =   1.547645
                                                Scale             =   1.117712

------------------------------------------------------------------------------
             |               Robust
           y | Coefficient  std. err.      t    P>|t|     [95% conf. interval]
-------------+----------------------------------------------------------------
           x |   .8952919   .0542751    16.50   0.000     .7887854    1.001798
       _cons |  -.0210565   .0366988    -0.57   0.566    -.0930723    .0509592
------------------------------------------------------------------------------
Hausman test of MM against S:    chi2(1) = 6.4322346      Prob > chi2 = 0.0112


Average marginal effects                                 Number of obs = 1,000
Model VCE: Robust

Expression: Fitted values, predict()
dy/dx wrt:  x

------------------------------------------------------------------------------
             |            Delta-method
             |      dy/dx   std. err.      z    P>|z|     [95% conf. interval]
-------------+----------------------------------------------------------------
           x |   .8952919   .0542751    16.50   0.000     .7889146    1.001669
------------------------------------------------------------------------------

preparing data for subsampling ... done
enumerating 20 candidates 0%....20%....40%....60%....80%....100%
refining 5 best candidates ..... done
iterating RLS ............................. done
fitting empty model ... done
computing standard errors ... done

MM regression (70% efficiency)                          Number of obs =  1,000
                                                        Wald chi2(1)  = 369.84
                                                        Prob > chi2   = 0.0000
                                                        Pseudo R2     = 0.1673
                                                Breakdown point   =         50
                                                M-estimate: k     =  2.6972206
                                                S-estimate: k     =   1.547645
                                                Scale             =   1.117712

------------------------------------------------------------------------------
             |               Robust
           y | Coefficient  std. err.      t    P>|t|     [95% conf. interval]
-------------+----------------------------------------------------------------
           x |   .9456408   .0491718    19.23   0.000     .8491488    1.042133
       _cons |  -.0215139   .0391472    -0.55   0.583     -.098334    .0553063
------------------------------------------------------------------------------
Hausman test of MM against S:    chi2(1) = 3.6316446      Prob > chi2 = 0.0567


Average marginal effects                                 Number of obs = 1,000
Model VCE: Robust

Expression: Fitted values, predict()
dy/dx wrt:  x

------------------------------------------------------------------------------
             |            Delta-method
             |      dy/dx   std. err.      z    P>|z|     [95% conf. interval]
-------------+----------------------------------------------------------------
           x |   .9456408   .0491718    19.23   0.000     .8492658    1.042016
------------------------------------------------------------------------------

preparing data for subsampling ... done
enumerating 20 candidates 0%....20%....40%....60%....80%....100%
refining 5 best candidates ..... done
iterating RLS ..................................... done
fitting empty model ... done
computing standard errors ... done

MM regression (50% efficiency)                          Number of obs =  1,000
                                                        Wald chi2(1)  = 358.68
                                                        Prob > chi2   = 0.0000
                                                        Pseudo R2     = 0.1586
                                                Breakdown point   =         50
                                                M-estimate: k     =   2.087132
                                                S-estimate: k     =   1.547645
                                                Scale             =   1.117712

------------------------------------------------------------------------------
             |               Robust
           y | Coefficient  std. err.      t    P>|t|     [95% conf. interval]
-------------+----------------------------------------------------------------
           x |   .9754561   .0515058    18.94   0.000     .8743841    1.076528
       _cons |  -.0219169   .0451865    -0.49   0.628    -.1105883    .0667545
------------------------------------------------------------------------------
Hausman test of MM against S:    chi2(1) = 2.6131929      Prob > chi2 = 0.1060


Average marginal effects                                 Number of obs = 1,000
Model VCE: Robust

Expression: Fitted values, predict()
dy/dx wrt:  x

------------------------------------------------------------------------------
             |            Delta-method
             |      dy/dx   std. err.      z    P>|z|     [95% conf. interval]
-------------+----------------------------------------------------------------
           x |   .9754561   .0515058    18.94   0.000     .8745066    1.076406
------------------------------------------------------------------------------

preparing data for subsampling ... done
enumerating 20 candidates 0%....20%....40%....60%....80%....100%
refining 5 best candidates ..... done
computing standard errors ... done

S regression (28.7% efficiency)                         Number of obs =  1,000
                                                        Wald chi2(1)  = 361.59
                                                        Prob > chi2   = 0.0000
                                                        Pseudo R2     = 0.1362
                                                Breakdown point   =         50
                                                Biweight k        =   1.547645
                                                Scale             =   1.117712

------------------------------------------------------------------------------
             |               Robust
           y | Coefficient  std. err.      t    P>|t|     [95% conf. interval]
-------------+----------------------------------------------------------------
           x |   1.010181    .053124    19.02   0.000     .9059339    1.114429
       _cons |  -.0162354   .0545348    -0.30   0.766    -.1232514    .0907807
------------------------------------------------------------------------------
Hausman test of S against LS:    chi2(1) = 245.15507      Prob > chi2 = 0.0000


Average marginal effects                                 Number of obs = 1,000
Model VCE: Robust

Expression: Fitted values, predict()
dy/dx wrt:  x

------------------------------------------------------------------------------
             |            Delta-method
             |      dy/dx   std. err.      z    P>|z|     [95% conf. interval]
-------------+----------------------------------------------------------------
           x |   1.010181    .053124    19.02   0.000     .9060603    1.114303
------------------------------------------------------------------------------


  Variables that uniquely identify margins: _filenumber

file fig_outlier_estimations.png saved as PNG format
Figure: Estimates based on different estimators

The figure illustrates the difference between the estimators that fail and those that are close to unbiased, as well as the difference between the estimators’ actual efficiency (precision) and their normal efficiency. For this example, the MM-estimation with 70% normal efficiency is the most precise, as evidenced by the narrowest confidence interval.

Increasing precision with robust estimators

Another useful illustration of what robust estimators do relative to OLS is one where OLS is unbiased but simply not as precise as other estimators. This happens when the regression errors are non-normal: OLS is then still the best linear unbiased estimator, but non-linear estimators such as robust regressions can be more precise. In the following lines, we create the values of \(x\) and \(y\) in a way that is similar to the example from the previous section, except that we now do not contaminate the \(x\) variable and instead make the regression errors skewed:

clear
set seed 1234
set obs 1000
gen x = rnormal()
gen y = x + exp(rnormal())
Number of observations (_N) was 0, now 1,000.

Although the regression errors are skewed, the OLS estimate of the coefficient on \(x\) is unbiased (the intercept absorbs the nonzero mean of the skewed errors):

reg y x
      Source |       SS           df       MS      Number of obs   =     1,000
-------------+----------------------------------   F(1, 998)       =    194.97
       Model |  924.363301         1  924.363301   Prob > F        =    0.0000
    Residual |  4731.55711       998  4.74103919   R-squared       =    0.1634
-------------+----------------------------------   Adj R-squared   =    0.1626
       Total |  5655.92041       999  5.66158199   Root MSE        =    2.1774

------------------------------------------------------------------------------
           y | Coefficient  Std. err.      t    P>|t|     [95% conf. interval]
-------------+----------------------------------------------------------------
           x |   .9450322   .0676803    13.96   0.000     .8122202    1.077844
       _cons |   1.516893   .0688552    22.03   0.000     1.381775     1.65201
------------------------------------------------------------------------------

But because of the skewed, non-normal regression errors, OLS is no longer the most precise estimator. Before comparing the full set of estimators, the following lines run an M-estimation with a biweight objective and then store and inspect the weights it assigns; the summary below shows that most observations keep weights close to 1, while a small fraction are heavily downweighted:

robreg m y x, eff(95) biw   // M-estimation with the biweight objective
predict wt1, weights        // store each observation's final RLS weight
sum wt1, d                  // inspect the distribution of the weights

reg y x, r
    margins, dydx(*) saving(file0, replace)
robreg q y x
    margins, dydx(*) saving(file1, replace)
robreg m y x, eff(95)
    margins, dydx(*) saving(file2, replace)
robreg mm y x, eff(95)
    margins, dydx(*) saving(file3, replace)
robreg mm y x, eff(90)
    margins, dydx(*) saving(file4, replace)
robreg mm y x, eff(85)
    margins, dydx(*) saving(file5, replace)
robreg mm y x, eff(70)
    margins, dydx(*) saving(file6, replace)
robreg mm y x, eff(50)
    margins, dydx(*) saving(file7, replace)
robreg s y x
    margins, dydx(*) saving(file8, replace)

combomarginsplot file0 file1 file2 file3 file4 file5 file6 file7 file8, ///
    labels("OLS" "Median" "M95" "MM95" "MM90" "MM85" "MM70" "MM50" "S") ///
    xtitle("Estimator") ///
    ytitle("Coefficient estimate and 95% confidence interval") ///
    title("") ///
    yscale(range(.8 1.1)) ///
    ylabel(0.8(0.1)1.1)
graph export "fig_outlier_estimations2.png", as(png) name("Graph") replace
obtaining LAD starting values ... done
iterating RLS ............. done
fitting empty model ... done
computing standard errors ... done

M regression (95% efficiency)                          Number of obs =   1,000
                                                       Wald chi2(1)  = 1250.60
                                                       Prob > chi2   =  0.0000
                                                       Pseudo R2     =  0.4018
                                                Biweight k        =  4.6850649
                                                Scale             =  .88044721

------------------------------------------------------------------------------
             |               Robust
           y | Coefficient  std. err.      t    P>|t|     [95% conf. interval]
-------------+----------------------------------------------------------------
           x |   1.003459   .0283753    35.36   0.000     .9477771    1.059141
       _cons |   1.073209    .028137    38.14   0.000     1.017995    1.128424
------------------------------------------------------------------------------

                         RLS weights
-------------------------------------------------------------
      Percentiles      Smallest
 1%            0              0
 5%     .0946082              0
10%     .5806968              0       Obs               1,000
25%     .9064368              0       Sum of wgt.       1,000

50%     .9538094                      Mean           .8633503
                        Largest       Std. dev.      .2467352
75%     .9870978       .9999984
90%     .9982703        .999999       Variance       .0608782
95%     .9996591       .9999995       Skewness      -2.566302
99%     .9999715              1       Kurtosis       8.541501


Linear regression                               Number of obs     =      1,000
                                                F(1, 998)         =     274.06
                                                Prob > F          =     0.0000
                                                R-squared         =     0.1634
                                                Root MSE          =     2.1774

------------------------------------------------------------------------------
             |               Robust
           y | Coefficient  std. err.      t    P>|t|     [95% conf. interval]
-------------+----------------------------------------------------------------
           x |   .9450322   .0570852    16.55   0.000     .8330114    1.057053
       _cons |   1.516893   .0688372    22.04   0.000     1.381811    1.651975
------------------------------------------------------------------------------


Average marginal effects                                 Number of obs = 1,000
Model VCE: Robust

Expression: Linear prediction, predict()
dy/dx wrt:  x

------------------------------------------------------------------------------
             |            Delta-method
             |      dy/dx   std. err.      t    P>|t|     [95% conf. interval]
-------------+----------------------------------------------------------------
           x |   .9450322   .0570852    16.55   0.000     .8330114    1.057053
------------------------------------------------------------------------------

iterating Frisch-Newton algorithm ........... done
fitting empty model ... done
computing standard errors ... done

.5 Quantile regression                                  Number of obs =  1,000
                                                        Wald chi2(1)  = 644.80
                                                        Prob > chi2   = 0.0000
                                                        Pseudo R2     = 0.2606
                                                Sum of abs.dev.   =  502.59844
                                                Scale             =  .88044721

------------------------------------------------------------------------------
             |               Robust
           y | Coefficient  std. err.      t    P>|t|     [95% conf. interval]
-------------+----------------------------------------------------------------
           x |   .9826234   .0386967    25.39   0.000     .9066872     1.05856
       _cons |   1.001046   .0396284    25.26   0.000     .9232816    1.078811
------------------------------------------------------------------------------


Average marginal effects                                 Number of obs = 1,000
Model VCE: Robust

Expression: Fitted values, predict()
dy/dx wrt:  x

------------------------------------------------------------------------------
             |            Delta-method
             |      dy/dx   std. err.      z    P>|z|     [95% conf. interval]
-------------+----------------------------------------------------------------
           x |   .9826234   .0386967    25.39   0.000     .9067792    1.058468
------------------------------------------------------------------------------

obtaining LAD starting values ... done
iterating RLS .......... done
fitting empty model ... done
computing standard errors ... done

M regression (95% efficiency)                          Number of obs =   1,000
                                                       Wald chi2(1)  = 1216.64
                                                       Prob > chi2   =  0.0000
                                                       Pseudo R2     =  0.3331
                                                Huber k           =  1.3449975
                                                Scale             =  .88044721

------------------------------------------------------------------------------
             |               Robust
           y | Coefficient  std. err.      t    P>|t|     [95% conf. interval]
-------------+----------------------------------------------------------------
           x |   .9948067   .0285206    34.88   0.000     .9388395    1.050774
       _cons |   1.159885   .0287974    40.28   0.000     1.103375    1.216396
------------------------------------------------------------------------------


Average marginal effects                                 Number of obs = 1,000
Model VCE: Robust

Expression: Fitted values, predict()
dy/dx wrt:  x

------------------------------------------------------------------------------
             |            Delta-method
             |      dy/dx   std. err.      z    P>|z|     [95% conf. interval]
-------------+----------------------------------------------------------------
           x |   .9948067   .0285206    34.88   0.000     .9389074    1.050706
------------------------------------------------------------------------------

preparing data for subsampling ... done
enumerating 20 candidates 0%....20%....40%....60%....80%....100%
refining 5 best candidates ..... done
iterating RLS ............... done
fitting empty model ... done
computing standard errors ... done

MM regression (95% efficiency)                         Number of obs =   1,000
                                                       Wald chi2(1)  = 1302.11
                                                       Prob > chi2   =  0.0000
                                                       Pseudo R2     =  0.4042
                                                Breakdown point   =         50
                                                M-estimate: k     =  4.6850649
                                                S-estimate: k     =   1.547645
                                                Scale             =  .75326883

------------------------------------------------------------------------------
             |               Robust
           y | Coefficient  std. err.      t    P>|t|     [95% conf. interval]
-------------+----------------------------------------------------------------
           x |   1.001039   .0277413    36.08   0.000      .946601    1.055477
       _cons |   1.027756   .0376837    27.27   0.000     .9538078    1.101705
------------------------------------------------------------------------------
Hausman test of MM against S:    chi2(1) = .12693163      Prob > chi2 = 0.7216


Average marginal effects                                 Number of obs = 1,000
Model VCE: Robust

Expression: Fitted values, predict()
dy/dx wrt:  x

------------------------------------------------------------------------------
             |            Delta-method
             |      dy/dx   std. err.      z    P>|z|     [95% conf. interval]
-------------+----------------------------------------------------------------
           x |   1.001039   .0277413    36.08   0.000     .9466671    1.055411
------------------------------------------------------------------------------

preparing data for subsampling ... done
enumerating 20 candidates 0%....20%....40%....60%....80%....100%
refining 5 best candidates ..... done
iterating RLS ................. done
fitting empty model ... done
computing standard errors ... done

MM regression (90% efficiency)                         Number of obs =   1,000
                                                       Wald chi2(1)  = 1358.98
                                                       Prob > chi2   =  0.0000
                                                       Pseudo R2     =  0.3964
                                                Breakdown point   =         50
                                                M-estimate: k     =  3.8826616
                                                S-estimate: k     =   1.547645
                                                Scale             =  .75326883

------------------------------------------------------------------------------
             |               Robust
           y | Coefficient  std. err.      t    P>|t|     [95% conf. interval]
-------------+----------------------------------------------------------------
           x |   .9976769   .0270635    36.86   0.000      .944569    1.050785
       _cons |   .9721504   .0374366    25.97   0.000     .8986869    1.045614
------------------------------------------------------------------------------
Hausman test of MM against S:    chi2(1) = .06168784      Prob > chi2 = 0.8038


Average marginal effects                                 Number of obs = 1,000
Model VCE: Robust

Expression: Fitted values, predict()
dy/dx wrt:  x

------------------------------------------------------------------------------
             |            Delta-method
             |      dy/dx   std. err.      z    P>|z|     [95% conf. interval]
-------------+----------------------------------------------------------------
           x |   .9976769   .0270635    36.86   0.000     .9446334     1.05072
------------------------------------------------------------------------------

preparing data for subsampling ... done
enumerating 20 candidates 0%....20%....40%....60%....80%....100%
refining 5 best candidates ..... done
iterating RLS .................. done
fitting empty model ... done
computing standard errors ... done

MM regression (85% efficiency)                         Number of obs =   1,000
                                                       Wald chi2(1)  = 1426.40
                                                       Prob > chi2   =  0.0000
                                                       Pseudo R2     =  0.3874
                                                Breakdown point   =         50
                                                M-estimate: k     =  3.4436898
                                                S-estimate: k     =   1.547645
                                                Scale             =  .75326883

------------------------------------------------------------------------------
             |               Robust
           y | Coefficient  std. err.      t    P>|t|     [95% conf. interval]
-------------+----------------------------------------------------------------
           x |   .9962669   .0263788    37.77   0.000     .9445026    1.048031
       _cons |    .934497   .0368942    25.33   0.000     .8620978    1.006896
------------------------------------------------------------------------------
Hausman test of MM against S:    chi2(1) = .04208427      Prob > chi2 = 0.8375


Average marginal effects                                 Number of obs = 1,000
Model VCE: Robust

Expression: Fitted values, predict()
dy/dx wrt:  x

------------------------------------------------------------------------------
             |            Delta-method
             |      dy/dx   std. err.      z    P>|z|     [95% conf. interval]
-------------+----------------------------------------------------------------
           x |   .9962669   .0263788    37.77   0.000     .9445654    1.047968
------------------------------------------------------------------------------

preparing data for subsampling ... done
enumerating 20 candidates 0%....20%....40%....60%....80%....100%
refining 5 best candidates ..... done
iterating RLS ..................... done
fitting empty model ... done
computing standard errors ... done

MM regression (70% efficiency)                         Number of obs =   1,000
                                                       Wald chi2(1)  = 1629.16
                                                       Prob > chi2   =  0.0000
                                                       Pseudo R2     =  0.3620
                                                Breakdown point   =         50
                                                M-estimate: k     =  2.6972206
                                                S-estimate: k     =   1.547645
                                                Scale             =  .75326883

------------------------------------------------------------------------------
             |               Robust
           y | Coefficient  std. err.      t    P>|t|     [95% conf. interval]
-------------+----------------------------------------------------------------
           x |   .9957474   .0246699    40.36   0.000     .9473365    1.044158
       _cons |   .8532004   .0361648    23.59   0.000     .7822327    .9241681
------------------------------------------------------------------------------
Hausman test of MM against S:    chi2(1) =  .0536402      Prob > chi2 = 0.8168


Average marginal effects                                 Number of obs = 1,000
Model VCE: Robust

Expression: Fitted values, predict()
dy/dx wrt:  x

------------------------------------------------------------------------------
             |            Delta-method
             |      dy/dx   std. err.      z    P>|z|     [95% conf. interval]
-------------+----------------------------------------------------------------
           x |   .9957474   .0246699    40.36   0.000     .9473952      1.0441
------------------------------------------------------------------------------

preparing data for subsampling ... done
enumerating 20 candidates 0%....20%....40%....60%....80%....100%
refining 5 best candidates ..... done
iterating RLS ........................ done
fitting empty model ... done
computing standard errors ... done

MM regression (50% efficiency)                         Number of obs =   1,000
                                                       Wald chi2(1)  = 1781.73
                                                       Prob > chi2   =  0.0000
                                                       Pseudo R2     =  0.3307
                                                Breakdown point   =         50
                                                M-estimate: k     =   2.087132
                                                S-estimate: k     =   1.547645
                                                Scale             =  .75326883

------------------------------------------------------------------------------
             |               Robust
           y | Coefficient  std. err.      t    P>|t|     [95% conf. interval]
-------------+----------------------------------------------------------------
           x |   .9930466    .023526    42.21   0.000     .9468805    1.039213
       _cons |   .7667032   .0332074    23.09   0.000      .701539    .8318675
------------------------------------------------------------------------------
Hausman test of MM against S:    chi2(1) =  .0130284      Prob > chi2 = 0.9091


Average marginal effects                                 Number of obs = 1,000
Model VCE: Robust

Expression: Fitted values, predict()
dy/dx wrt:  x

------------------------------------------------------------------------------
             |            Delta-method
             |      dy/dx   std. err.      z    P>|z|     [95% conf. interval]
-------------+----------------------------------------------------------------
           x |   .9930466    .023526    42.21   0.000     .9469365    1.039157
------------------------------------------------------------------------------

preparing data for subsampling ... done
enumerating 20 candidates 0%....20%....40%....60%....80%....100%
refining 5 best candidates ..... done
computing standard errors ... done

S regression (28.7% efficiency)                        Number of obs =   1,000
                                                       Wald chi2(1)  = 2003.74
                                                       Prob > chi2   =  0.0000
                                                       Pseudo R2     =  0.2887
                                                Breakdown point   =         50
                                                Biweight k        =   1.547645
                                                Scale             =  .75326883

------------------------------------------------------------------------------
             |               Robust
           y | Coefficient  std. err.      t    P>|t|     [95% conf. interval]
-------------+----------------------------------------------------------------
           x |   .9917874   .0221563    44.76   0.000     .9483091    1.035266
       _cons |   .6658629   .0303878    21.91   0.000     .6062317    .7254942
------------------------------------------------------------------------------
Hausman test of S against LS:    chi2(1) = .65265897      Prob > chi2 = 0.4192


Average marginal effects                                 Number of obs = 1,000
Model VCE: Robust

Expression: Fitted values, predict()
dy/dx wrt:  x

------------------------------------------------------------------------------
             |            Delta-method
             |      dy/dx   std. err.      z    P>|z|     [95% conf. interval]
-------------+----------------------------------------------------------------
           x |   .9917874   .0221563    44.76   0.000     .9483618    1.035213
------------------------------------------------------------------------------


  Variables that uniquely identify margins: _filenumber

file fig_outlier_estimations2.png saved as PNG format
Figure: Precision of estimates with skewed regression errors

As we can see from the differences in confidence intervals, all of the displayed alternative estimators are more precise than OLS. Also, although the differences are relatively small in this example, we can again see that estimators with lower normal efficiencies can have higher actual efficiencies (narrower confidence intervals) in practice.

Getting the standard errors right with robust estimators

Lastly, it is important to keep in mind the clustering of standard errors. An ad-hoc approach in contemporary research has been to exploit the fact that the coefficients obtained from robust estimators can be reproduced using weighted least squares (WLS). By storing the weight that the robust estimator assigns to each observation, some researchers have obtained clustered standard errors by using those weights in a second-step WLS estimation with cluster-robust standard errors.
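
For concreteness, a minimal sketch of how this two-step approach might be coded with robreg's saved weights (the variable w is a placeholder, and firm is the cluster variable constructed in the code below):

robreg mm y x, eff(70)
predict w, weights                  // save the robust estimator's weights
reg y x [aw=w], vce(cluster firm)   // second-step WLS with clustered SEs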

However, this WLS approach is inappropriate. To show why, consider the same data structure as above (with contamination of \(x\)), but now with a dependence in the regression errors that requires us to cluster standard errors by firm (the dataset now consists of 100 hypothetical firms with 10 years of data each).

clear
set seed 1234
set obs 1000
gen firm = ceil(_n/10)         // 100 firms with 10 yearly observations each
bys firm: gen year = _n
gen x = rnormal() if year==1   // firm-level component of x ...
replace x = x[_n-1] if year>1
replace x = x + rnormal()      // ... plus an idiosyncratic component
gen e = rnormal() if year==1   // firm-level error component creates the
replace e = e[_n-1] if year>1  // within-firm dependence that requires
replace e = e + rnormal()      // clustering by firm
gen y = x + e

replace x = x*10 if _n<=100    // contaminate x as before

reg y x
reg y x, cluster(firm)

robreg mm y x, eff(70)
robreg mm y x, eff(70) cluster(firm)
Number of observations (_N) was 0, now 1,000.



(900 missing values generated)

(900 real changes made)

(1,000 real changes made)

(900 missing values generated)

(900 real changes made)

(1,000 real changes made)


(100 real changes made)

      Source |       SS           df       MS      Number of obs   =     1,000
-------------+----------------------------------   F(1, 998)       =    232.52
       Model |  803.592702         1  803.592702   Prob > F        =    0.0000
    Residual |  3449.04318       998  3.45595509   R-squared       =    0.1890
-------------+----------------------------------   Adj R-squared   =    0.1882
       Total |  4252.63588       999  4.25689277   Root MSE        =     1.859

------------------------------------------------------------------------------
           y | Coefficient  Std. err.      t    P>|t|     [95% conf. interval]
-------------+----------------------------------------------------------------
           x |   .1834723    .012032    15.25   0.000     .1598614    .2070832
       _cons |   .1294369   .0589154     2.20   0.028     .0138247    .2450491
------------------------------------------------------------------------------


Linear regression                               Number of obs     =      1,000
                                                F(1, 99)          =      42.11
                                                Prob > F          =     0.0000
                                                R-squared         =     0.1890
                                                Root MSE          =      1.859

                                 (Std. err. adjusted for 100 clusters in firm)
------------------------------------------------------------------------------
             |               Robust
           y | Coefficient  std. err.      t    P>|t|     [95% conf. interval]
-------------+----------------------------------------------------------------
           x |   .1834723   .0282725     6.49   0.000     .1273736     .239571
       _cons |   .1294369   .1399899     0.92   0.357    -.1483333    .4072072
------------------------------------------------------------------------------

preparing data for subsampling ... done
enumerating 20 candidates 0%....20%....40%....60%....80%....100%
refining 5 best candidates ..... done
iterating RLS .................... done
fitting empty model ... done
computing standard errors ... done

MM regression (70% efficiency)                          Number of obs =  1,000
                                                        Wald chi2(1)  = 690.19
                                                        Prob > chi2   = 0.0000
                                                        Pseudo R2     = 0.2316
                                                Breakdown point   =         50
                                                M-estimate: k     =  2.6972206
                                                S-estimate: k     =   1.547645
                                                Scale             =  1.5479873

------------------------------------------------------------------------------
             |               Robust
           y | Coefficient  std. err.      t    P>|t|     [95% conf. interval]
-------------+----------------------------------------------------------------
           x |   1.015867   .0386682    26.27   0.000     .9399868    1.091747
       _cons |   .0073708   .0521136     0.14   0.888     -.094894    .1096356
------------------------------------------------------------------------------
Hausman test of MM against S:    chi2(1) = .51368697      Prob > chi2 = 0.4735

preparing data for subsampling ... done
enumerating 20 candidates 0%....20%....40%....60%....80%....100%
refining 5 best candidates ..... done
iterating RLS .................... done
fitting empty model ... done
computing standard errors ... done

MM regression (70% efficiency)                          Number of obs =  1,000
                                                        Wald chi2(1)  = 300.57
                                                        Prob > chi2   = 0.0000
                                                        Pseudo R2     = 0.2316
                                                Breakdown point   =         50
                                                M-estimate: k     =  2.6972206
                                                S-estimate: k     =   1.547645
                                                Scale             =  1.5479873

                                 (Std. err. adjusted for 100 clusters in firm)
------------------------------------------------------------------------------
             |               Robust
           y | Coefficient  std. err.      t    P>|t|     [95% conf. interval]
-------------+----------------------------------------------------------------
           x |   1.015867   .0585959    17.34   0.000     .8996002    1.132134
       _cons |   .0073708   .1142998     0.06   0.949    -.2194247    .2341663
------------------------------------------------------------------------------
Hausman test of MM against S:    chi2(1) = .55188146      Prob > chi2 = 0.4575

We can see that when we use the appropriate standard error correction, the standard errors are more conservative (i.e., higher) than when we do not account for the clustering of the errors.

The \(t\)-value of the coefficient on \(x\) drops from 26.27 to 17.34 because of the more conservative standard error estimate, and these standard error estimates are unbiased.
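To see the magnitude of the correction, note that clustering inflates the standard error by roughly half, \( .0585959/.0386682 \approx 1.52 \); because the coefficient estimate is unchanged, the \(t\)-value shrinks by the same factor: \( 1.015867/.0585959 \approx 17.34 \) versus \( 1.015867/.0386682 \approx 26.27 \).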

Similarly, the bootstrapped cluster-robust standard errors implemented in the program roboot produce unbiased standard error estimates:

roboot y x, nboot(9999) eff(70) seed(1234) cluster(firm)
STEP 1: Obtaining robust regression estimates, residuals, and scale estimate...

STEP 2: Obtaining cluster-robust bootstrapped standard errors...
 Status: 0%....20%....40%....60%....80%....100%
 
 
MM-estimator with 70% efficiency and bootstrapped clustered SEs
 
Number of bootstrap samples =  9999                    Number of obs =    1000
Number of clusters (firm) =     100                    Pseudo R2 =      0.2316
------------------------------------------------------------------------------
           y | Coefficient  Std. err.      t    P>|t|     [95% conf. interval]
-------------+----------------------------------------------------------------
           x |   1.015867   .0589569    17.23   0.000     .8988839     1.13285
       _cons |   .0073705   .1158184     0.06   0.949    -.2224382    .2371793
------------------------------------------------------------------------------
SE clustered by firm
 
Average weight assigned to observations:                0.7527
Fraction of observations with weight < 0.50:            0.1750
Fraction of observations with weight equal to zero:     0.0790
 

Now let’s see what happens when we use the inappropriate WLS approach. From the following output, we can see that the standard error of the coefficient on \(x\) decreases (and its \(t\)-value increases) relative to the situation where we do not cluster the standard errors at all. In other words, our inferences can become even more biased than when we fail to account for the clustering of the regression errors, because the standard error estimate is too small. Note also that the second-step WLS regression uses only 921 observations: the 79 observations that robreg assigned a weight of exactly zero (7.9 percent of the sample, as reported above) are dropped, which also removes one firm entirely and leaves 99 clusters.

robreg mm y x, eff(70)
predict w70, weights
reg y x [aw=w70], cluster(firm)
preparing data for subsampling ... done
enumerating 20 candidates 0%....20%....40%....60%....80%....100%
refining 5 best candidates ..... done
iterating RLS .................... done
fitting empty model ... done
computing standard errors ... done

MM regression (70% efficiency)                          Number of obs =  1,000
                                                        Wald chi2(1)  = 690.19
                                                        Prob > chi2   = 0.0000
                                                        Pseudo R2     = 0.2316
                                                Breakdown point   =         50
                                                M-estimate: k     =  2.6972206
                                                S-estimate: k     =   1.547645
                                                Scale             =  1.5479873

------------------------------------------------------------------------------
             |               Robust
           y | Coefficient  std. err.      t    P>|t|     [95% conf. interval]
-------------+----------------------------------------------------------------
           x |   1.015867   .0386682    26.27   0.000     .9399868    1.091747
       _cons |   .0073708   .0521136     0.14   0.888     -.094894    .1096356
------------------------------------------------------------------------------
Hausman test of MM against S:    chi2(1) = .51368698      Prob > chi2 = 0.4735


(sum of wgt is 752.7451663073552)

Linear regression                               Number of obs     =        921
                                                F(1, 98)          =     867.00
                                                Prob > F          =     0.0000
                                                R-squared         =     0.6387
                                                Root MSE          =     1.1055

                                  (Std. err. adjusted for 99 clusters in firm)
------------------------------------------------------------------------------
             |               Robust
           y | Coefficient  std. err.      t    P>|t|     [95% conf. interval]
-------------+----------------------------------------------------------------
           x |   1.015867   .0345006    29.44   0.000     .9474017    1.084332
       _cons |   .0073708    .072418     0.10   0.919    -.1363404     .151082
------------------------------------------------------------------------------