As an alternative to the winsorization and truncation procedures, recent studies commonly estimate “robust regressions” to address the potential influence of outliers in their tests or to increase the precision of their estimates. These robust regressions comprise a collection of linear regression methods that, similar to OLS, use a specific objective function to estimate the parameters of the regression model for a sample of data. Most robust regressions perform an iterative procedure in which observations with large regression residuals are downweighted. Because different estimators, and their different implementations in Stata, vary in how effectively they control for outliers, here we illustrate how to implement several alternative robust estimators in Stata.
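To make this concrete, most of these estimators can be written as solving a problem of the form \(\hat{\beta} = \arg\min_{\beta} \sum_{i} \rho\!\left((y_i - x_i'\beta)/\hat{\sigma}\right)\), where \(\rho\) is a loss function that penalizes large scaled residuals less heavily than the squared loss used by OLS and \(\hat{\sigma}\) is a robust estimate of the residual scale; in the iterative solution this amounts to giving observations with large residuals smaller weights. The specific loss function and scale estimate differ across estimators.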
To set the stage, consider the following simulated example dataset of 1,000 observations in which the variables \(x\) and \(y\) have a simple relation: \(y = x + \epsilon\), where \(x\) and \(\epsilon\) are drawn from a standard normal distribution (mean 0, standard deviation 1). For this sample, a regression of \(y\) on \(x\) should produce a coefficient estimate on \(x\) that is close to 1.
But let’s assume that for a randomly drawn 10 percent of observations, the decimal point of \(x\) was placed incorrectly so that the true values of \(x\) are all multiplied by 10 in the observed data.
clear
set seed 1000
set obs 1000
gen x = rnormal()
gen y = x + rnormal()          // true relation: y = x + e
reg y x                        // OLS on the uncontaminated data
replace x = x*10 if _n <= 100  // contaminate x for 10 percent of observations
Number of observations (_N) was 0, now 1,000.
Source | SS df MS Number of obs = 1,000
-------------+---------------------------------- F(1, 998) = 941.60
Model | 931.354225 1 931.354225 Prob > F = 0.0000
Residual | 987.136919 998 .98911515 R-squared = 0.4855
-------------+---------------------------------- Adj R-squared = 0.4849
Total | 1918.49114 999 1.92041156 Root MSE = .99454
------------------------------------------------------------------------------
y | Coefficient Std. err. t P>|t| [95% conf. interval]
-------------+----------------------------------------------------------------
x | .996082 .0324609 30.69 0.000 .9323825 1.059782
_cons | -.0043599 .0314823 -0.14 0.890 -.066139 .0574193
------------------------------------------------------------------------------
(100 real changes made)
Let us now re-estimate the simple regression using OLS, this time on the contaminated data.
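The command itself is unchanged; we simply run the same regression again after the contamination step:

reg y x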
Source | SS df MS Number of obs = 1,000
-------------+---------------------------------- F(1, 998) = 170.74
Model | 280.274791 1 280.274791 Prob > F = 0.0000
Residual | 1638.21635 998 1.64149935 R-squared = 0.1461
-------------+---------------------------------- Adj R-squared = 0.1452
Total | 1918.49114 999 1.92041156 Root MSE = 1.2812
------------------------------------------------------------------------------
y | Coefficient Std. err. t P>|t| [95% conf. interval]
-------------+----------------------------------------------------------------
x | .1719327 .0131579 13.07 0.000 .1461124 .197753
_cons | -.0168409 .0405855 -0.41 0.678 -.0964837 .0628018
------------------------------------------------------------------------------
file fig_outlier.png saved as PNG format
The contamination in the \(x\) variable biases the coefficient on \(x\) away from the true value of 1. Using the post-estimation command avplots, we can see how this bias is caused by the unusual values of the \(x\) variable.
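For reference, the added-variable plot can be produced with the post-estimation command below; the graph export line is an assumption based on the file name reported in the log above:

avplots
graph export "fig_outlier.png", as(png) replace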
Let’s have a look at how robust estimators in Stata address this bias. Some studies use Stata’s built-in command rreg, but we can see that this does not solve the problem, as the coefficient is still downward-biased:
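The command, using rreg with its default options, is:

rreg y x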
Huber iteration 1: maximum difference in weights = .61426016
Huber iteration 2: maximum difference in weights = .01751717
Biweight iteration 3: maximum difference in weights = .18892616
Biweight iteration 4: maximum difference in weights = .00363223
Robust regression Number of obs = 1,000
F( 1, 998) = 159.28
Prob > F = 0.0000
------------------------------------------------------------------------------
y | Coefficient Std. err. t P>|t| [95% conf. interval]
-------------+----------------------------------------------------------------
x | .169637 .0134413 12.62 0.000 .1432607 .1960134
_cons | -.0338366 .0414595 -0.82 0.415 -.1151943 .0475212
------------------------------------------------------------------------------
An alternative program we can use is robreg. Using the default settings of the “MM”-estimator that is part of robreg (the program contains many forms of robust estimation, including, for example, median regression), we obtain results that look much better than those based on rreg. The coefficient on \(x\) is not significantly different from the true value of 1 at \(p\) < 0.05, although the point estimate is still a bit too low at 0.90:
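The corresponding command, relying on robreg mm’s default settings (equivalent to specifying eff(85), as the output header shows), is:

robreg mm y x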
preparing data for subsampling ... done
enumerating 20 candidates 0%....20%....40%....60%....80%....100%
refining 5 best candidates ..... done
iterating RLS ........................... done
fitting empty model ... done
computing standard errors ... done
MM regression (85% efficiency) Number of obs = 1,000
Wald chi2(1) = 272.10
Prob > chi2 = 0.0000
Pseudo R2 = 0.1495
Breakdown point = 50
M-estimate: k = 3.4436898
S-estimate: k = 1.547645
Scale = 1.117712
------------------------------------------------------------------------------
| Robust
y | Coefficient std. err. t P>|t| [95% conf. interval]
-------------+----------------------------------------------------------------
x | .8952919 .0542751 16.50 0.000 .7887854 1.001798
_cons | -.0210565 .0366988 -0.57 0.566 -.0930723 .0509592
------------------------------------------------------------------------------
Hausman test of MM against S: chi2(1) = 6.4322347 Prob > chi2 = 0.0112
The difference between rreg and robreg mm is that the latter implements a form of robust estimation that is resistant to unusual values in the independent variable(s), while the former is not. More precisely, robreg mm is an MM-estimator, while rreg is an M-estimator.
We need to keep in mind that, besides these differences between estimators, the estimators can have different implementations. As the output above shows, the default implementation of robreg mm relies on an “85% efficiency”, which refers to the “normal efficiency” of the estimator. When OLS is the appropriate estimator but we use an alternative such as an MM-estimator, we lose precision (efficiency) because the robust estimator downweights observations. Because of this downweighting, we lose information that could be relevant for the estimation. The lower we set the normal efficiency, the more heavily observations are downweighted. The advantage of this downweighting is that in situations such as the example described above, where OLS is not the best estimator, we can obtain estimates that are less biased and more precise.
What happens when we change the default level of normal efficiency in robreg mm? Let’s first increase the normal efficiency to 95%:
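The command is the same as before, with the efficiency option now set explicitly:

robreg mm y x, eff(95)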
preparing data for subsampling ... done
enumerating 20 candidates 0%....20%....40%....60%....80%....100%
refining 5 best candidates ..... done
iterating RLS .................................................................
> .. done
fitting empty model ... done
computing standard errors ... done
MM regression (95% efficiency) Number of obs = 1,000
Wald chi2(1) = 58.91
Prob > chi2 = 0.0000
Pseudo R2 = 0.1185
Breakdown point = 50
M-estimate: k = 4.6850649
S-estimate: k = 1.547645
Scale = 1.117712
------------------------------------------------------------------------------
| Robust
y | Coefficient std. err. t P>|t| [95% conf. interval]
-------------+----------------------------------------------------------------
x | .1686758 .0219767 7.68 0.000 .1255499 .2118017
_cons | -.0408841 .042482 -0.96 0.336 -.1242484 .0424801
------------------------------------------------------------------------------
Hausman test of MM against S: chi2(1) = 234.00944 Prob > chi2 = 0.0000
Interestingly, the results show that when we increase the normal efficiency to 95% in this particular example, the estimates are about as biased as those we saw for OLS. This suggests that it is important to assess the sensitivity of the results to alternative levels of normal efficiency, and that we should not necessarily strive for high normal efficiency levels. When we instead lower the normal efficiency to 70%, the coefficient estimate increases to 0.95 (compared to 0.90 when using 85% normal efficiency). At least as important, the standard error (\(t\)-value) of the coefficient on \(x\) declines (increases), which shows how robust estimation with a lower normal efficiency both reduces bias and increases the precision of the estimation:
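The corresponding command is:

robreg mm y x, eff(70)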
preparing data for subsampling ... done
enumerating 20 candidates 0%....20%....40%....60%....80%....100%
refining 5 best candidates ..... done
iterating RLS ............................. done
fitting empty model ... done
computing standard errors ... done
MM regression (70% efficiency) Number of obs = 1,000
Wald chi2(1) = 369.84
Prob > chi2 = 0.0000
Pseudo R2 = 0.1673
Breakdown point = 50
M-estimate: k = 2.6972206
S-estimate: k = 1.547645
Scale = 1.117712
------------------------------------------------------------------------------
| Robust
y | Coefficient std. err. t P>|t| [95% conf. interval]
-------------+----------------------------------------------------------------
x | .9456408 .0491718 19.23 0.000 .8491488 1.042133
_cons | -.0215139 .0391472 -0.55 0.583 -.098334 .0553063
------------------------------------------------------------------------------
Hausman test of MM against S: chi2(1) = 3.6316446 Prob > chi2 = 0.0567
In the following figure, we plot the coefficient estimates and their 95% confidence intervals for a broader set of alternatives to OLS. Besides the rreg and robreg mm implementations shown above, we also plot the results from using median regression (using qreg), MM-estimations with 50 and 90 percent efficiency, and an S-estimator (using robreg s) that can be seen as an MM-estimation with the lowest possible normal efficiency (roughly 28.7 percent).
reg y x
margins, dydx(*) saving(file0, replace)
qreg y x
margins, dydx(*) saving(file1, replace)
rreg y x
margins, dydx(*) saving(file2, replace)
robreg mm y x, eff(95)
margins, dydx(*) saving(file3, replace)
robreg mm y x, eff(90)
margins, dydx(*) saving(file4, replace)
robreg mm y x, eff(85)
margins, dydx(*) saving(file5, replace)
robreg mm y x, eff(70)
margins, dydx(*) saving(file6, replace)
robreg mm y x, eff(50)
margins, dydx(*) saving(file7, replace)
robreg s y x
margins, dydx(*) saving(file8, replace)
combomarginsplot file0 file1 file2 file3 file4 file5 file6 file7 file8, ///
labels("OLS" "Median" "RREG" "MM95" "MM90" "MM85" "MM70" "MM50" "S") ///
xtitle("Estimator") ///
ytitle("Coefficient estimate and 95% confidence interval") ///
title("")
graph export "fig_outlier_estimations.png", as(png) name("Graph") replace
Source | SS df MS Number of obs = 1,000
-------------+---------------------------------- F(1, 998) = 170.74
Model | 280.274791 1 280.274791 Prob > F = 0.0000
Residual | 1638.21635 998 1.64149935 R-squared = 0.1461
-------------+---------------------------------- Adj R-squared = 0.1452
Total | 1918.49114 999 1.92041156 Root MSE = 1.2812
------------------------------------------------------------------------------
y | Coefficient Std. err. t P>|t| [95% conf. interval]
-------------+----------------------------------------------------------------
x | .1719327 .0131579 13.07 0.000 .1461124 .197753
_cons | -.0168409 .0405855 -0.41 0.678 -.0964837 .0628018
------------------------------------------------------------------------------
Average marginal effects Number of obs = 1,000
Model VCE: OLS
Expression: Linear prediction, predict()
dy/dx wrt: x
------------------------------------------------------------------------------
| Delta-method
| dy/dx std. err. t P>|t| [95% conf. interval]
-------------+----------------------------------------------------------------
x | .1719327 .0131579 13.07 0.000 .1461124 .197753
------------------------------------------------------------------------------
Iteration 1: WLS sum of weighted deviations = 510.7951
Iteration 1: sum of abs. weighted deviations = 510.80214
Iteration 2: sum of abs. weighted deviations = 510.51225
Iteration 3: sum of abs. weighted deviations = 510.29312
Iteration 4: sum of abs. weighted deviations = 509.73221
Iteration 5: sum of abs. weighted deviations = 509.71895
Iteration 6: sum of abs. weighted deviations = 509.71841
Median regression Number of obs = 1,000
Raw sum of deviations 551.1152 (about -.14306737)
Min sum of deviations 509.7184 Pseudo R2 = 0.0751
------------------------------------------------------------------------------
y | Coefficient Std. err. t P>|t| [95% conf. interval]
-------------+----------------------------------------------------------------
x | .1783037 .0170875 10.43 0.000 .1447722 .2118353
_cons | -.108289 .0527063 -2.05 0.040 -.2117168 -.0048611
------------------------------------------------------------------------------
Average marginal effects Number of obs = 1,000
Model VCE: IID
Expression: Linear prediction, predict()
dy/dx wrt: x
------------------------------------------------------------------------------
| Delta-method
| dy/dx std. err. z P>|z| [95% conf. interval]
-------------+----------------------------------------------------------------
x | .1783037 .0170875 10.43 0.000 .1448129 .2117946
------------------------------------------------------------------------------
Huber iteration 1: maximum difference in weights = .61426016
Huber iteration 2: maximum difference in weights = .01751717
Biweight iteration 3: maximum difference in weights = .18892616
Biweight iteration 4: maximum difference in weights = .00363223
Robust regression Number of obs = 1,000
F( 1, 998) = 159.28
Prob > F = 0.0000
------------------------------------------------------------------------------
y | Coefficient Std. err. t P>|t| [95% conf. interval]
-------------+----------------------------------------------------------------
x | .169637 .0134413 12.62 0.000 .1432607 .1960134
_cons | -.0338366 .0414595 -0.82 0.415 -.1151943 .0475212
------------------------------------------------------------------------------
Average marginal effects Number of obs = 1,000
Expression: Fitted values, predict()
dy/dx wrt: x
------------------------------------------------------------------------------
| Delta-method
| dy/dx std. err. z P>|z| [95% conf. interval]
-------------+----------------------------------------------------------------
x | .169637 .0134413 12.62 0.000 .1432926 .1959814
------------------------------------------------------------------------------
preparing data for subsampling ... done
enumerating 20 candidates 0%....20%....40%....60%....80%....100%
refining 5 best candidates ..... done
iterating RLS .................................................................
> .. done
fitting empty model ... done
computing standard errors ... done
MM regression (95% efficiency) Number of obs = 1,000
Wald chi2(1) = 58.91
Prob > chi2 = 0.0000
Pseudo R2 = 0.1185
Breakdown point = 50
M-estimate: k = 4.6850649
S-estimate: k = 1.547645
Scale = 1.117712
------------------------------------------------------------------------------
| Robust
y | Coefficient std. err. t P>|t| [95% conf. interval]
-------------+----------------------------------------------------------------
x | .1686758 .0219767 7.68 0.000 .1255499 .2118017
_cons | -.0408841 .042482 -0.96 0.336 -.1242484 .0424801
------------------------------------------------------------------------------
Hausman test of MM against S: chi2(1) = 234.00944 Prob > chi2 = 0.0000
Average marginal effects Number of obs = 1,000
Model VCE: Robust
Expression: Fitted values, predict()
dy/dx wrt: x
------------------------------------------------------------------------------
| Delta-method
| dy/dx std. err. z P>|z| [95% conf. interval]
-------------+----------------------------------------------------------------
x | .1686758 .0219767 7.68 0.000 .1256022 .2117494
------------------------------------------------------------------------------
preparing data for subsampling ... done
enumerating 20 candidates 0%....20%....40%....60%....80%....100%
refining 5 best candidates ..... done
iterating RLS ............................... done
fitting empty model ... done
computing standard errors ... done
MM regression (90% efficiency) Number of obs = 1,000
Wald chi2(1) = 173.89
Prob > chi2 = 0.0000
Pseudo R2 = 0.1263
Breakdown point = 50
M-estimate: k = 3.8826616
S-estimate: k = 1.547645
Scale = 1.117712
------------------------------------------------------------------------------
| Robust
y | Coefficient std. err. t P>|t| [95% conf. interval]
-------------+----------------------------------------------------------------
x | .8610597 .0652965 13.19 0.000 .7329256 .9891939
_cons | -.0217583 .0362316 -0.60 0.548 -.0928571 .0493405
------------------------------------------------------------------------------
Hausman test of MM against S: chi2(1) = 7.0773179 Prob > chi2 = 0.0078
Average marginal effects Number of obs = 1,000
Model VCE: Robust
Expression: Fitted values, predict()
dy/dx wrt: x
------------------------------------------------------------------------------
| Delta-method
| dy/dx std. err. z P>|z| [95% conf. interval]
-------------+----------------------------------------------------------------
x | .8610597 .0652965 13.19 0.000 .733081 .9890385
------------------------------------------------------------------------------
preparing data for subsampling ... done
enumerating 20 candidates 0%....20%....40%....60%....80%....100%
refining 5 best candidates ..... done
iterating RLS ........................... done
fitting empty model ... done
computing standard errors ... done
MM regression (85% efficiency) Number of obs = 1,000
Wald chi2(1) = 272.10
Prob > chi2 = 0.0000
Pseudo R2 = 0.1495
Breakdown point = 50
M-estimate: k = 3.4436898
S-estimate: k = 1.547645
Scale = 1.117712
------------------------------------------------------------------------------
| Robust
y | Coefficient std. err. t P>|t| [95% conf. interval]
-------------+----------------------------------------------------------------
x | .8952919 .0542751 16.50 0.000 .7887854 1.001798
_cons | -.0210565 .0366988 -0.57 0.566 -.0930723 .0509592
------------------------------------------------------------------------------
Hausman test of MM against S: chi2(1) = 6.4322346 Prob > chi2 = 0.0112
Average marginal effects Number of obs = 1,000
Model VCE: Robust
Expression: Fitted values, predict()
dy/dx wrt: x
------------------------------------------------------------------------------
| Delta-method
| dy/dx std. err. z P>|z| [95% conf. interval]
-------------+----------------------------------------------------------------
x | .8952919 .0542751 16.50 0.000 .7889146 1.001669
------------------------------------------------------------------------------
preparing data for subsampling ... done
enumerating 20 candidates 0%....20%....40%....60%....80%....100%
refining 5 best candidates ..... done
iterating RLS ............................. done
fitting empty model ... done
computing standard errors ... done
MM regression (70% efficiency) Number of obs = 1,000
Wald chi2(1) = 369.84
Prob > chi2 = 0.0000
Pseudo R2 = 0.1673
Breakdown point = 50
M-estimate: k = 2.6972206
S-estimate: k = 1.547645
Scale = 1.117712
------------------------------------------------------------------------------
| Robust
y | Coefficient std. err. t P>|t| [95% conf. interval]
-------------+----------------------------------------------------------------
x | .9456408 .0491718 19.23 0.000 .8491488 1.042133
_cons | -.0215139 .0391472 -0.55 0.583 -.098334 .0553063
------------------------------------------------------------------------------
Hausman test of MM against S: chi2(1) = 3.6316446 Prob > chi2 = 0.0567
Average marginal effects Number of obs = 1,000
Model VCE: Robust
Expression: Fitted values, predict()
dy/dx wrt: x
------------------------------------------------------------------------------
| Delta-method
| dy/dx std. err. z P>|z| [95% conf. interval]
-------------+----------------------------------------------------------------
x | .9456408 .0491718 19.23 0.000 .8492658 1.042016
------------------------------------------------------------------------------
preparing data for subsampling ... done
enumerating 20 candidates 0%....20%....40%....60%....80%....100%
refining 5 best candidates ..... done
iterating RLS ..................................... done
fitting empty model ... done
computing standard errors ... done
MM regression (50% efficiency) Number of obs = 1,000
Wald chi2(1) = 358.68
Prob > chi2 = 0.0000
Pseudo R2 = 0.1586
Breakdown point = 50
M-estimate: k = 2.087132
S-estimate: k = 1.547645
Scale = 1.117712
------------------------------------------------------------------------------
| Robust
y | Coefficient std. err. t P>|t| [95% conf. interval]
-------------+----------------------------------------------------------------
x | .9754561 .0515058 18.94 0.000 .8743841 1.076528
_cons | -.0219169 .0451865 -0.49 0.628 -.1105883 .0667545
------------------------------------------------------------------------------
Hausman test of MM against S: chi2(1) = 2.6131929 Prob > chi2 = 0.1060
Average marginal effects Number of obs = 1,000
Model VCE: Robust
Expression: Fitted values, predict()
dy/dx wrt: x
------------------------------------------------------------------------------
| Delta-method
| dy/dx std. err. z P>|z| [95% conf. interval]
-------------+----------------------------------------------------------------
x | .9754561 .0515058 18.94 0.000 .8745066 1.076406
------------------------------------------------------------------------------
preparing data for subsampling ... done
enumerating 20 candidates 0%....20%....40%....60%....80%....100%
refining 5 best candidates ..... done
computing standard errors ... done
S regression (28.7% efficiency) Number of obs = 1,000
Wald chi2(1) = 361.59
Prob > chi2 = 0.0000
Pseudo R2 = 0.1362
Breakdown point = 50
Biweight k = 1.547645
Scale = 1.117712
------------------------------------------------------------------------------
| Robust
y | Coefficient std. err. t P>|t| [95% conf. interval]
-------------+----------------------------------------------------------------
x | 1.010181 .053124 19.02 0.000 .9059339 1.114429
_cons | -.0162354 .0545348 -0.30 0.766 -.1232514 .0907807
------------------------------------------------------------------------------
Hausman test of S against LS: chi2(1) = 245.15507 Prob > chi2 = 0.0000
Average marginal effects Number of obs = 1,000
Model VCE: Robust
Expression: Fitted values, predict()
dy/dx wrt: x
------------------------------------------------------------------------------
| Delta-method
| dy/dx std. err. z P>|z| [95% conf. interval]
-------------+----------------------------------------------------------------
x | 1.010181 .053124 19.02 0.000 .9060603 1.114303
------------------------------------------------------------------------------
Variables that uniquely identify margins: _filenumber
file fig_outlier_estimations.png saved as PNG format
The figure illustrates the contrast between the estimators that fail and those that are close to unbiased, as well as the differences between the actual efficiency (precision) and the normal efficiency of the estimators. For this example, we can see that the MM-estimation with 70% normal efficiency is the most precise, as evidenced by the narrowest confidence interval.
Another useful illustration of what robust estimators do relative to OLS is one where OLS is unbiased but simply not as precise as other estimators. This happens when the regression errors are not normally distributed, for example when they are skewed. In the following lines, we create the values of \(x\) and \(y\) in a way that is similar to the example from the previous section, except that we now do not contaminate the \(x\) variable and we make the regression errors skewed:
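The data-generating commands themselves are not echoed in the log below. A minimal sketch of the setup, in which the lognormal (skewed) error term and the seed are assumptions chosen purely for illustration, could look as follows:

clear
set seed 1234                 // seed chosen for illustration; the original is not shown
set obs 1000
gen x = rnormal()
gen y = x + exp(rnormal())    // skewed (lognormal) error term instead of a normal one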
Number of observations (_N) was 0, now 1,000.
Although the regression errors are skewed, OLS estimates are unbiased:
Source | SS df MS Number of obs = 1,000
-------------+---------------------------------- F(1, 998) = 194.97
Model | 924.363301 1 924.363301 Prob > F = 0.0000
Residual | 4731.55711 998 4.74103919 R-squared = 0.1634
-------------+---------------------------------- Adj R-squared = 0.1626
Total | 5655.92041 999 5.66158199 Root MSE = 2.1774
------------------------------------------------------------------------------
y | Coefficient Std. err. t P>|t| [95% conf. interval]
-------------+----------------------------------------------------------------
x | .9450322 .0676803 13.96 0.000 .8122202 1.077844
_cons | 1.516893 .0688552 22.03 0.000 1.381775 1.65201
------------------------------------------------------------------------------
But because the regression errors are skewed rather than normal, OLS is no longer the most precise estimator.
robreg m y x, eff(95) biw
predict wt1, weights
sum wt1, d
pwcorr wt wt1
reg y x, r
margins, dydx(*) saving(file0, replace)
robreg q y x
margins, dydx(*) saving(file1, replace)
robreg m y x, eff(95)
margins, dydx(*) saving(file2, replace)
robreg mm y x, eff(95)
margins, dydx(*) saving(file3, replace)
robreg mm y x, eff(90)
margins, dydx(*) saving(file4, replace)
robreg mm y x, eff(85)
margins, dydx(*) saving(file5, replace)
robreg mm y x, eff(70)
margins, dydx(*) saving(file6, replace)
robreg mm y x, eff(50)
margins, dydx(*) saving(file7, replace)
robreg s y x
margins, dydx(*) saving(file8, replace)
combomarginsplot file0 file1 file2 file3 file4 file5 file6 file7 file8, ///
labels("OLS" "Median" "M95" "MM95" "MM90" "MM85" "MM70" "MM50" "S") ///
xtitle("Estimator") ///
ytitle("Coefficient estimate and 95% confidence interval") ///
title("") ///
yscale(range(.8 1.1)) ///
ylabel(0.8(0.1)1.1)
graph export "fig_outlier_estimations2.png", as(png) name("Graph") replace
obtaining LAD starting values ... done
iterating RLS ............. done
fitting empty model ... done
computing standard errors ... done
M regression (95% efficiency) Number of obs = 1,000
Wald chi2(1) = 1250.60
Prob > chi2 = 0.0000
Pseudo R2 = 0.4018
Biweight k = 4.6850649
Scale = .88044721
------------------------------------------------------------------------------
| Robust
y | Coefficient std. err. t P>|t| [95% conf. interval]
-------------+----------------------------------------------------------------
x | 1.003459 .0283753 35.36 0.000 .9477771 1.059141
_cons | 1.073209 .028137 38.14 0.000 1.017995 1.128424
------------------------------------------------------------------------------
RLS weights
-------------------------------------------------------------
Percentiles Smallest
1% 0 0
5% .0946082 0
10% .5806968 0 Obs 1,000
25% .9064368 0 Sum of wgt. 1,000
50% .9538094 Mean .8633503
Largest Std. dev. .2467352
75% .9870978 .9999984
90% .9982703 .999999 Variance .0608782
95% .9996591 .9999995 Skewness -2.566302
99% .9999715 1 Kurtosis 8.541501
| wt1 wt1
-------------+------------------
wt1 | 1.0000
wt1 | 1.0000 1.0000
Linear regression Number of obs = 1,000
F(1, 998) = 274.06
Prob > F = 0.0000
R-squared = 0.1634
Root MSE = 2.1774
------------------------------------------------------------------------------
| Robust
y | Coefficient std. err. t P>|t| [95% conf. interval]
-------------+----------------------------------------------------------------
x | .9450322 .0570852 16.55 0.000 .8330114 1.057053
_cons | 1.516893 .0688372 22.04 0.000 1.381811 1.651975
------------------------------------------------------------------------------
Average marginal effects Number of obs = 1,000
Model VCE: Robust
Expression: Linear prediction, predict()
dy/dx wrt: x
------------------------------------------------------------------------------
| Delta-method
| dy/dx std. err. t P>|t| [95% conf. interval]
-------------+----------------------------------------------------------------
x | .9450322 .0570852 16.55 0.000 .8330114 1.057053
------------------------------------------------------------------------------
iterating Frisch-Newton algorithm ........... done
fitting empty model ... done
computing standard errors ... done
.5 Quantile regression Number of obs = 1,000
Wald chi2(1) = 644.80
Prob > chi2 = 0.0000
Pseudo R2 = 0.2606
Sum of abs.dev. = 502.59844
Scale = .88044721
------------------------------------------------------------------------------
| Robust
y | Coefficient std. err. t P>|t| [95% conf. interval]
-------------+----------------------------------------------------------------
x | .9826234 .0386967 25.39 0.000 .9066872 1.05856
_cons | 1.001046 .0396284 25.26 0.000 .9232816 1.078811
------------------------------------------------------------------------------
Average marginal effects Number of obs = 1,000
Model VCE: Robust
Expression: Fitted values, predict()
dy/dx wrt: x
------------------------------------------------------------------------------
| Delta-method
| dy/dx std. err. z P>|z| [95% conf. interval]
-------------+----------------------------------------------------------------
x | .9826234 .0386967 25.39 0.000 .9067792 1.058468
------------------------------------------------------------------------------
obtaining LAD starting values ... done
iterating RLS .......... done
fitting empty model ... done
computing standard errors ... done
M regression (95% efficiency) Number of obs = 1,000
Wald chi2(1) = 1216.64
Prob > chi2 = 0.0000
Pseudo R2 = 0.3331
Huber k = 1.3449975
Scale = .88044721
------------------------------------------------------------------------------
| Robust
y | Coefficient std. err. t P>|t| [95% conf. interval]
-------------+----------------------------------------------------------------
x | .9948067 .0285206 34.88 0.000 .9388395 1.050774
_cons | 1.159885 .0287974 40.28 0.000 1.103375 1.216396
------------------------------------------------------------------------------
Average marginal effects Number of obs = 1,000
Model VCE: Robust
Expression: Fitted values, predict()
dy/dx wrt: x
------------------------------------------------------------------------------
| Delta-method
| dy/dx std. err. z P>|z| [95% conf. interval]
-------------+----------------------------------------------------------------
x | .9948067 .0285206 34.88 0.000 .9389074 1.050706
------------------------------------------------------------------------------
preparing data for subsampling ... done
enumerating 20 candidates 0%....20%....40%....60%....80%....100%
refining 5 best candidates ..... done
iterating RLS ............... done
fitting empty model ... done
computing standard errors ... done
MM regression (95% efficiency) Number of obs = 1,000
Wald chi2(1) = 1302.11
Prob > chi2 = 0.0000
Pseudo R2 = 0.4042
Breakdown point = 50
M-estimate: k = 4.6850649
S-estimate: k = 1.547645
Scale = .75326883
------------------------------------------------------------------------------
| Robust
y | Coefficient std. err. t P>|t| [95% conf. interval]
-------------+----------------------------------------------------------------
x | 1.001039 .0277413 36.08 0.000 .946601 1.055477
_cons | 1.027756 .0376837 27.27 0.000 .9538078 1.101705
------------------------------------------------------------------------------
Hausman test of MM against S: chi2(1) = .12693163 Prob > chi2 = 0.7216
Average marginal effects Number of obs = 1,000
Model VCE: Robust
Expression: Fitted values, predict()
dy/dx wrt: x
------------------------------------------------------------------------------
| Delta-method
| dy/dx std. err. z P>|z| [95% conf. interval]
-------------+----------------------------------------------------------------
x | 1.001039 .0277413 36.08 0.000 .9466671 1.055411
------------------------------------------------------------------------------
preparing data for subsampling ... done
enumerating 20 candidates 0%....20%....40%....60%....80%....100%
refining 5 best candidates ..... done
iterating RLS ................. done
fitting empty model ... done
computing standard errors ... done
MM regression (90% efficiency) Number of obs = 1,000
Wald chi2(1) = 1358.98
Prob > chi2 = 0.0000
Pseudo R2 = 0.3964
Breakdown point = 50
M-estimate: k = 3.8826616
S-estimate: k = 1.547645
Scale = .75326883
------------------------------------------------------------------------------
| Robust
y | Coefficient std. err. t P>|t| [95% conf. interval]
-------------+----------------------------------------------------------------
x | .9976769 .0270635 36.86 0.000 .944569 1.050785
_cons | .9721504 .0374366 25.97 0.000 .8986869 1.045614
------------------------------------------------------------------------------
Hausman test of MM against S: chi2(1) = .06168784 Prob > chi2 = 0.8038
Average marginal effects Number of obs = 1,000
Model VCE: Robust
Expression: Fitted values, predict()
dy/dx wrt: x
------------------------------------------------------------------------------
| Delta-method
| dy/dx std. err. z P>|z| [95% conf. interval]
-------------+----------------------------------------------------------------
x | .9976769 .0270635 36.86 0.000 .9446334 1.05072
------------------------------------------------------------------------------
preparing data for subsampling ... done
enumerating 20 candidates 0%....20%....40%....60%....80%....100%
refining 5 best candidates ..... done
iterating RLS .................. done
fitting empty model ... done
computing standard errors ... done
MM regression (85% efficiency) Number of obs = 1,000
Wald chi2(1) = 1426.40
Prob > chi2 = 0.0000
Pseudo R2 = 0.3874
Breakdown point = 50
M-estimate: k = 3.4436898
S-estimate: k = 1.547645
Scale = .75326883
------------------------------------------------------------------------------
| Robust
y | Coefficient std. err. t P>|t| [95% conf. interval]
-------------+----------------------------------------------------------------
x | .9962669 .0263788 37.77 0.000 .9445026 1.048031
_cons | .934497 .0368942 25.33 0.000 .8620978 1.006896
------------------------------------------------------------------------------
Hausman test of MM against S: chi2(1) = .04208427 Prob > chi2 = 0.8375
Average marginal effects Number of obs = 1,000
Model VCE: Robust
Expression: Fitted values, predict()
dy/dx wrt: x
------------------------------------------------------------------------------
| Delta-method
| dy/dx std. err. z P>|z| [95% conf. interval]
-------------+----------------------------------------------------------------
x | .9962669 .0263788 37.77 0.000 .9445654 1.047968
------------------------------------------------------------------------------
preparing data for subsampling ... done
enumerating 20 candidates 0%....20%....40%....60%....80%....100%
refining 5 best candidates ..... done
iterating RLS ..................... done
fitting empty model ... done
computing standard errors ... done
MM regression (70% efficiency) Number of obs = 1,000
Wald chi2(1) = 1629.16
Prob > chi2 = 0.0000
Pseudo R2 = 0.3620
Breakdown point = 50
M-estimate: k = 2.6972206
S-estimate: k = 1.547645
Scale = .75326883
------------------------------------------------------------------------------
| Robust
y | Coefficient std. err. t P>|t| [95% conf. interval]
-------------+----------------------------------------------------------------
x | .9957474 .0246699 40.36 0.000 .9473365 1.044158
_cons | .8532004 .0361648 23.59 0.000 .7822327 .9241681
------------------------------------------------------------------------------
Hausman test of MM against S: chi2(1) = .0536402 Prob > chi2 = 0.8168
Average marginal effects Number of obs = 1,000
Model VCE: Robust
Expression: Fitted values, predict()
dy/dx wrt: x
------------------------------------------------------------------------------
| Delta-method
| dy/dx std. err. z P>|z| [95% conf. interval]
-------------+----------------------------------------------------------------
x | .9957474 .0246699 40.36 0.000 .9473952 1.0441
------------------------------------------------------------------------------
preparing data for subsampling ... done
enumerating 20 candidates 0%....20%....40%....60%....80%....100%
refining 5 best candidates ..... done
iterating RLS ........................ done
fitting empty model ... done
computing standard errors ... done
MM regression (50% efficiency) Number of obs = 1,000
Wald chi2(1) = 1781.73
Prob > chi2 = 0.0000
Pseudo R2 = 0.3307
Breakdown point = 50
M-estimate: k = 2.087132
S-estimate: k = 1.547645
Scale = .75326883
------------------------------------------------------------------------------
| Robust
y | Coefficient std. err. t P>|t| [95% conf. interval]
-------------+----------------------------------------------------------------
x | .9930466 .023526 42.21 0.000 .9468805 1.039213
_cons | .7667032 .0332074 23.09 0.000 .701539 .8318675
------------------------------------------------------------------------------
Hausman test of MM against S: chi2(1) = .0130284 Prob > chi2 = 0.9091
Average marginal effects Number of obs = 1,000
Model VCE: Robust
Expression: Fitted values, predict()
dy/dx wrt: x
------------------------------------------------------------------------------
| Delta-method
| dy/dx std. err. z P>|z| [95% conf. interval]
-------------+----------------------------------------------------------------
x | .9930466 .023526 42.21 0.000 .9469365 1.039157
------------------------------------------------------------------------------
preparing data for subsampling ... done
enumerating 20 candidates 0%....20%....40%....60%....80%....100%
refining 5 best candidates ..... done
computing standard errors ... done
S regression (28.7% efficiency) Number of obs = 1,000
Wald chi2(1) = 2003.74
Prob > chi2 = 0.0000
Pseudo R2 = 0.2887
Breakdown point = 50
Biweight k = 1.547645
Scale = .75326883
------------------------------------------------------------------------------
| Robust
y | Coefficient std. err. t P>|t| [95% conf. interval]
-------------+----------------------------------------------------------------
x | .9917874 .0221563 44.76 0.000 .9483091 1.035266
_cons | .6658629 .0303878 21.91 0.000 .6062317 .7254942
------------------------------------------------------------------------------
Hausman test of S against LS: chi2(1) = .65265897 Prob > chi2 = 0.4192
Average marginal effects Number of obs = 1,000
Model VCE: Robust
Expression: Fitted values, predict()
dy/dx wrt: x
------------------------------------------------------------------------------
| Delta-method
| dy/dx std. err. z P>|z| [95% conf. interval]
-------------+----------------------------------------------------------------
x | .9917874 .0221563 44.76 0.000 .9483618 1.035213
------------------------------------------------------------------------------
Variables that uniquely identify margins: _filenumber
file fig_outlier_estimations2.png saved as PNG format
As we can see from the differences in confidence intervals, all the displayed alternative estimators are more precise than OLS. Also, although the differences are relatively small in this example, we can see that the estimators with lower normal efficiency again achieve higher actual efficiency (narrower confidence intervals) in practice.
Lastly, it is important to keep in mind the clustering of standard errors. In contemporary research, an ad-hoc approach has been to exploit the fact that the coefficients obtained from robust estimators can be retrieved using weighted least squares (WLS). By storing each observation’s weight as determined by the robust estimator, some researchers have obtained clustered standard errors by using those weights in a second-step WLS estimation with cluster-robust standard errors.
However, this WLS approach is inappropriate. To show why, consider the same data structure as above (with contamination of \(x\)), but with a dependence in the regression errors that requires us to cluster standard errors by firm (the dataset now consists of 100 hypothetical firms with 10 years of data each).
clear
set seed 1234
set obs 1000
gen firm=ceil(_n/10)              // 100 firms with 10 yearly observations each
bys firm: gen year=_n
gen x=rnormal() if year==1        // firm-level component of x ...
replace x=x[_n-1] if year>1
replace x= x + rnormal()          // ... plus an observation-level component
gen e=rnormal() if year==1        // firm-level error component ...
replace e= e[_n-1] if year>1
replace e= e+ rnormal()           // ... so errors are correlated within firms
gen y=x + e
replace x=x*10 if _n<=100         // contaminate x as before
reg y x
reg y x, cluster(firm)
robreg mm y x, eff(70)
robreg mm y x, eff(70) cluster(firm)
Number of observations (_N) was 0, now 1,000.
(900 missing values generated)
(900 real changes made)
(1,000 real changes made)
(900 missing values generated)
(900 real changes made)
(1,000 real changes made)
(100 real changes made)
Source | SS df MS Number of obs = 1,000
-------------+---------------------------------- F(1, 998) = 232.52
Model | 803.592702 1 803.592702 Prob > F = 0.0000
Residual | 3449.04318 998 3.45595509 R-squared = 0.1890
-------------+---------------------------------- Adj R-squared = 0.1882
Total | 4252.63588 999 4.25689277 Root MSE = 1.859
------------------------------------------------------------------------------
y | Coefficient Std. err. t P>|t| [95% conf. interval]
-------------+----------------------------------------------------------------
x | .1834723 .012032 15.25 0.000 .1598614 .2070832
_cons | .1294369 .0589154 2.20 0.028 .0138247 .2450491
------------------------------------------------------------------------------
Linear regression Number of obs = 1,000
F(1, 99) = 42.11
Prob > F = 0.0000
R-squared = 0.1890
Root MSE = 1.859
(Std. err. adjusted for 100 clusters in firm)
------------------------------------------------------------------------------
| Robust
y | Coefficient std. err. t P>|t| [95% conf. interval]
-------------+----------------------------------------------------------------
x | .1834723 .0282725 6.49 0.000 .1273736 .239571
_cons | .1294369 .1399899 0.92 0.357 -.1483333 .4072072
------------------------------------------------------------------------------
preparing data for subsampling ... done
enumerating 20 candidates 0%....20%....40%....60%....80%....100%
refining 5 best candidates ..... done
iterating RLS .................... done
fitting empty model ... done
computing standard errors ... done
MM regression (70% efficiency) Number of obs = 1,000
Wald chi2(1) = 690.19
Prob > chi2 = 0.0000
Pseudo R2 = 0.2316
Breakdown point = 50
M-estimate: k = 2.6972206
S-estimate: k = 1.547645
Scale = 1.5479873
------------------------------------------------------------------------------
| Robust
y | Coefficient std. err. t P>|t| [95% conf. interval]
-------------+----------------------------------------------------------------
x | 1.015867 .0386682 26.27 0.000 .9399868 1.091747
_cons | .0073708 .0521136 0.14 0.888 -.094894 .1096356
------------------------------------------------------------------------------
Hausman test of MM against S: chi2(1) = .51368697 Prob > chi2 = 0.4735
preparing data for subsampling ... done
enumerating 20 candidates 0%....20%....40%....60%....80%....100%
refining 5 best candidates ..... done
iterating RLS .................... done
fitting empty model ... done
computing standard errors ... done
MM regression (70% efficiency) Number of obs = 1,000
Wald chi2(1) = 300.57
Prob > chi2 = 0.0000
Pseudo R2 = 0.2316
Breakdown point = 50
M-estimate: k = 2.6972206
S-estimate: k = 1.547645
Scale = 1.5479873
(Std. err. adjusted for 100 clusters in firm)
------------------------------------------------------------------------------
| Robust
y | Coefficient std. err. t P>|t| [95% conf. interval]
-------------+----------------------------------------------------------------
x | 1.015867 .0585959 17.34 0.000 .8996002 1.132134
_cons | .0073708 .1142998 0.06 0.949 -.2194247 .2341663
------------------------------------------------------------------------------
Hausman test of MM against S: chi2(1) = .55188146 Prob > chi2 = 0.4575
We can see that when we use the appropriate standard error correction, the standard errors are more conservative (i.e., higher) than when we do not account for clustering of the errors. The \(t\)-value on the coefficient on \(x\) drops from 26.27 to 17.34 due to the more conservative standard error estimate, and these standard error estimates are unbiased.
Similarly, the program roboot, which implements bootstrapped cluster-robust standard errors, produces comparable unbiased standard error estimates:
STEP 1: Obtaining robust regression estimates, residuals, and scale estimate...
STEP 2: Obtaining cluster-robust bootstrapped standard errors...
Status: 0%....20%....40%....60%....80%....100%
MM-estimator with 70% efficiency and bootstrapped clustered SEs
Number of bootstrap samples = 9999 Number of obs = 1000
Number of clusters (firm) = 100 Pseudo R2 = 0.2316
------------------------------------------------------------------------------
y | Coefficient Std. err. t P>|t| [95% conf. interval]
-------------+----------------------------------------------------------------
x | 1.015867 .0589569 17.23 0.000 .8988839 1.13285
_cons | .0073705 .1158184 0.06 0.949 -.2224382 .2371793
------------------------------------------------------------------------------
SE clustered by firm
Average weight assigned to observations: 0.7527
Fraction of observations with weight < 0.50: 0.1750
Fraction of observations with weight equal to zero: 0.0790
Now let’s see what happens when we use the inappropriate WLS approach. From the following output, we can see that the standard error (\(t\)-value) of the coefficient on \(x\) decreases (increases) relative to the situation where we do not cluster the standard errors. In other words, our inferences can become even more biased than when we fail to account for the clustering of regression errors, because the standard error estimate is too small.
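The commands producing the output below are not echoed in the log; a sketch of the two-step approach being criticized, with the weight variable name and the aweight specification as assumptions, would be:

robreg mm y x, eff(70)
predict w_mm, weights                 // store each observation's robust-regression weight
reg y x [aw=w_mm], cluster(firm)      // second-step WLS with cluster-robust standard errors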
preparing data for subsampling ... done
enumerating 20 candidates 0%....20%....40%....60%....80%....100%
refining 5 best candidates ..... done
iterating RLS .................... done
fitting empty model ... done
computing standard errors ... done
MM regression (70% efficiency) Number of obs = 1,000
Wald chi2(1) = 690.19
Prob > chi2 = 0.0000
Pseudo R2 = 0.2316
Breakdown point = 50
M-estimate: k = 2.6972206
S-estimate: k = 1.547645
Scale = 1.5479873
------------------------------------------------------------------------------
| Robust
y | Coefficient std. err. t P>|t| [95% conf. interval]
-------------+----------------------------------------------------------------
x | 1.015867 .0386682 26.27 0.000 .9399868 1.091747
_cons | .0073708 .0521136 0.14 0.888 -.094894 .1096356
------------------------------------------------------------------------------
Hausman test of MM against S: chi2(1) = .51368698 Prob > chi2 = 0.4735
(sum of wgt is 752.7451663073552)
Linear regression Number of obs = 921
F(1, 98) = 867.00
Prob > F = 0.0000
R-squared = 0.6387
Root MSE = 1.1055
(Std. err. adjusted for 99 clusters in firm)
------------------------------------------------------------------------------
| Robust
y | Coefficient std. err. t P>|t| [95% conf. interval]
-------------+----------------------------------------------------------------
x | 1.015867 .0345006 29.44 0.000 .9474017 1.084332
_cons | .0073708 .072418 0.10 0.919 -.1363404 .151082
------------------------------------------------------------------------------