We know that cluster-robust standard errors can be unreliable with few clusters or when clusters vary substantially in size. One solution to this problem is to use the “restricted wild cluster bootstrap”, or “WCR” bootstrap, which can be implemented with the program boottest. To illustrate this program, we consider a simulated dataset that consists of 1,000 firms and 10 years per firm.
clear
set obs 10000
set seed 2
* 1,000 firms with 10 years each
gen firm=ceil(_n/10)
bys firm: gen year=_n
* x has a firm-level component (x1) plus a year-level component (x2)
* that is common to all firms in the same year
gen x1=rnormal()
gen x2=rnormal() if firm==1
egen max=max(x2), by(year)
replace x2=max
drop max
gen x=x1+x2
* the error term e has the same structure: firm-level (e1) plus year-level (e2)
gen e1=rnormal()
gen e2=rnormal() if firm==1
egen max=max(e2), by(year)
replace e2=max
drop max
gen e=e1+e2
* y equals the error term, so x and y are independent (the true coefficient on x is zero)
gen y=e
Number of observations (_N) was 0, now 10,000.
(9,990 missing values generated)
(9,990 real changes made)
(9,990 missing values generated)
(9,990 real changes made)
The data are simulated such that there is a strong need to cluster by year, as can be seen from the regression output without and with cluster adjustment, respectively:
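For reference, the two output tables below are produced by estimating the regression first without and then with clustering by year:
reg y x
reg y x, cluster(year)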
      Source |       SS           df       MS      Number of obs   =    10,000
-------------+----------------------------------   F(1, 9998)      =    823.35
       Model |  1373.93814         1  1373.93814   Prob > F        =    0.0000
    Residual |  16683.9192     9,998  1.66872567   R-squared       =    0.0761
-------------+----------------------------------   Adj R-squared   =    0.0760
       Total |  18057.8574     9,999  1.80596633   Root MSE        =    1.2918

------------------------------------------------------------------------------
           y | Coefficient  Std. err.      t    P>|t|     [95% conf. interval]
-------------+----------------------------------------------------------------
           x |   .2382644   .0083036    28.69   0.000     .2219876    .2545412
       _cons |   -.174081   .0134251   -12.97   0.000    -.2003968   -.1477652
------------------------------------------------------------------------------
Linear regression                               Number of obs     =     10,000
                                                F(1, 9)           =       3.98
                                                Prob > F          =     0.0771
                                                R-squared         =     0.0761
                                                Root MSE          =     1.2918

                                  (Std. err. adjusted for 10 clusters in year)
------------------------------------------------------------------------------
             |               Robust
           y | Coefficient  std. err.      t    P>|t|     [95% conf. interval]
-------------+----------------------------------------------------------------
           x |   .2382644   .1193802     2.00   0.077    -.0317924    .5083212
       _cons |   -.174081    .291538    -0.60   0.565    -.8335857    .4854238
------------------------------------------------------------------------------
Now, let’s find the true standard error, approximated by the standard deviation of the coefficient on \(x\) that we obtain from 1000 datasets created with the same data-generating process.
local sims = 1000
tempname resultsfile
tempfile results
postfile `resultsfile' b_x using `results', replace
quietly {
    forvalues i = 1/`sims' {
        clear
        set obs 10000
        gen firm = ceil(_n/10)
        bys firm: gen year = _n
        gen x1 = rnormal()
        gen x2 = rnormal() if firm == 1
        egen max = max(x2), by(year)
        replace x2 = max
        drop max
        gen x = x1 + x2
        gen e1 = rnormal()
        gen e2 = rnormal() if firm == 1
        egen max = max(e2), by(year)
        replace e2 = max
        drop max
        gen e = e1 + e2
        gen y = e
        reg y x
        post `resultsfile' (_b[x])
    }
}
postclose `resultsfile'
use `results', clear
summarize b_x, detail
display "Standard deviation of coefficient on x over 1000 simulations: " r(sd)
(file C:\Users\User\AppData\Local\Temp\ST_a5e4_000001.tmp not found)
                             b_x
-------------------------------------------------------------
      Percentiles      Smallest
 1%    -.3631191      -.6092813
 5%    -.2585434      -.4439344
10%    -.1981587      -.4377828       Obs               1,000
25%    -.1097536      -.4128857       Sum of wgt.       1,000

50%    -.0039017                      Mean           -.0042902
                        Largest       Std. dev.       .1562775
75%     .0991302       .4617284
90%     .1913024       .4708806       Variance        .0244227
95%     .2580352       .4801755       Skewness        .0535759
99%      .360173        .494745       Kurtosis        3.093615
Standard deviation of coefficient on x over 1000 simulations: .15627751
We see that the true standard error is about 0.156. Hence, it is clear that the standard error of 0.008 without cluster adjustment is severely understated. However, the 0.119 based on clustering by year is also too low, even though we have correctly identified and specified the dimension in which we should cluster.
The WCR bootstrap is a solution to this problem that provides an alternative \(p\)-value for cluster-robust inference. Similar to the cluster-robust variance estimator that underlies the clustered standard errors we saw above, the WCR bootstrap allows for arbitrary correlation in the regression errors within a cluster. What is different is that the WCR bootstrap generates a distribution of \(t\)-values from a bootstrap procedure in which the bootstrap samples mimic the distribution of the original sample while imposing the null hypothesis. The procedure then compares the original \(t\)-value from the regression (in the above example, \(t\) = 2.00) to this distribution of bootstrap \(t\)-values, and the bootstrapped \(p\)-value is based on the fraction of bootstrap \(t\)-values that are larger in absolute value than the original \(t\)-value.
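To make this logic concrete, here is a minimal do-it-yourself sketch of the WCR procedure for the null hypothesis that the coefficient on \(x\) is zero. This is only an illustration of the idea, not how boottest works internally (boottest uses a much faster algorithm). It assumes the simulated variables y, x, and year from the first code block are in memory, and the variable names fit_r and u_r as well as the seed are arbitrary choices.
* Illustrative WCR bootstrap "by hand" (for intuition only; use boottest in practice)
set seed 1234567
quietly reg y x, cluster(year)
local t_orig = _b[x]/_se[x]            // original t-value (about 2.00)
quietly reg y                          // restricted model imposing the null: coefficient on x = 0
predict double fit_r, xb               // restricted fitted values
predict double u_r, residuals          // restricted residuals
local B = 999
local exceed = 0
forvalues b = 1/`B' {
    tempvar w ystar
    * assign one Rademacher weight (+1 or -1) to every observation in a year cluster
    quietly gen double `w' = 2*(runiform() < .5) - 1
    quietly bysort year: replace `w' = `w'[1]
    * build the bootstrap outcome under the null and re-estimate
    quietly gen double `ystar' = fit_r + `w'*u_r
    quietly reg `ystar' x, cluster(year)
    if abs(_b[x]/_se[x]) >= abs(`t_orig') local exceed = `exceed' + 1
    drop `w' `ystar'
}
display "Bootstrap p-value: " `exceed'/`B'
drop fit_r u_r
Because this sketch draws and enumerates the weights differently (and far less efficiently) than boottest, its \(p\)-value will not match the boottest result below exactly, but it follows the same logic.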
Let’s see what happens when we apply the WCR bootstrap by running boottest after estimating the regression with clustered standard errors (reg y x, cluster(year)). With only 10 clusters, the number of unique draws of the bootstrap weights is limited to 1024 (\(2^{10}\)), hence we use the default of 999 replications here. Also, because there is a random component in the bootstrap procedure, we should set the seed of the random number generator to ensure that the results are replicable.
Before proceeding, we need to re-create our original dataset, since we replaced it with the simulation results above.
clear
set obs 10000
set seed 2
gen firm = ceil(_n/10)
bys firm: gen year = _n
gen x1 = rnormal()
gen x2 = rnormal() if firm == 1
egen max = max(x2), by(year)
replace x2 = max
drop max
gen x = x1 + x2
gen e1 = rnormal()
gen e2 = rnormal() if firm == 1
egen max = max(e2), by(year)
replace e2 = max
drop max
gen e = e1 + e2
gen y = e
* Run regression
reg y x, cluster(year)
* Run wild bootstrap test
boottest x = 0, seed(1234567)
Number of observations (_N) was 0, now 10,000.
(9,990 missing values generated)
(9,990 real changes made)
(9,990 missing values generated)
(9,990 real changes made)
Linear regression                               Number of obs     =     10,000
                                                F(1, 9)           =       3.98
                                                Prob > F          =     0.0771
                                                R-squared         =     0.0761
                                                Root MSE          =     1.2918

                                  (Std. err. adjusted for 10 clusters in year)
------------------------------------------------------------------------------
             |               Robust
           y | Coefficient  std. err.      t    P>|t|     [95% conf. interval]
-------------+----------------------------------------------------------------
           x |   .2382644   .1193802     2.00   0.077    -.0317924    .5083212
       _cons |   -.174081    .291538    -0.60   0.565    -.8335857    .4854238
------------------------------------------------------------------------------
Wild bootstrap-t, null imposed, 999 replications, Wald test, bootstrap clustering by year, Rademacher weights:
x = 0
t(9) = 1.9958
Prob>|t| = 0.1391
95% confidence set for null hypothesis expression: [−.07613, .5128]
Here, the reported t(9) = 1.9958 is the \(t\)-value of 2.00 for \(x\) we saw earlier, now shown with more decimal places (the 9 degrees of freedom correspond to the 10 year clusters minus one).
The results tell us that with the WCR bootstrap, the \(p\)-value is 0.1391 instead of the 0.077 we obtained with standard cluster adjustment in the regression. In other words, compared to the result based on the (understated) cluster-robust standard error estimate, we now obtain a more conservative \(p\)-value and no longer label the coefficient on \(x\) as statistically significantly different from zero.
This is expected, as we have simulated the data based on a data-generating process in which \(x\) and \(y\) were independent. The 95% confidence interval provided by boottest also informs us that the coefficient is not statistically different from zero. Thus, this example illustrates that when clustering on a dimension with few clusters (e.g., ten years), it might be useful to assess whether conclusions about statistical significance are robust to using the WCR bootstrap.