We know that cluster-robust standard errors can be unreliable with few clusters or when clusters vary substantially in size. One solution to this problem is to use the “restricted wild cluster bootstrap”, or “WCR” bootstrap, which can be implemented with the program boottest. To illustrate this program, we consider a simulated dataset that consists of 1000 firms and 10 years per firm.

clear
set obs 10000
set seed 2
 
gen firm=ceil(_n/10)
bys firm: gen year=_n

gen x1=rnormal()
gen x2=rnormal() if firm==1
egen max=max(x2), by(year)
replace x2=max
drop max
gen x=x1+x2


gen e1=rnormal()
gen e2=rnormal() if firm==1
egen max=max(e2), by(year)
replace e2=max
drop max
gen e=e1+e2
gen y=e
Number of observations (_N) was 0, now 10,000.





(9,990 missing values generated)


(9,990 real changes made)




(9,990 missing values generated)


(9,990 real changes made)

The data are simulated such that there is a strong need to cluster by year, as can be seen from the regression output without and with cluster adjustment, respectively:

reg y x
reg y x, cluster(year)
      Source |       SS           df       MS      Number of obs   =    10,000
-------------+----------------------------------   F(1, 9998)      =    823.35
       Model |  1373.93814         1  1373.93814   Prob > F        =    0.0000
    Residual |  16683.9192     9,998  1.66872567   R-squared       =    0.0761
-------------+----------------------------------   Adj R-squared   =    0.0760
       Total |  18057.8574     9,999  1.80596633   Root MSE        =    1.2918

------------------------------------------------------------------------------
           y | Coefficient  Std. err.      t    P>|t|     [95% conf. interval]
-------------+----------------------------------------------------------------
           x |   .2382644   .0083036    28.69   0.000     .2219876    .2545412
       _cons |   -.174081   .0134251   -12.97   0.000    -.2003968   -.1477652
------------------------------------------------------------------------------


Linear regression                               Number of obs     =     10,000
                                                F(1, 9)           =       3.98
                                                Prob > F          =     0.0771
                                                R-squared         =     0.0761
                                                Root MSE          =     1.2918

                                  (Std. err. adjusted for 10 clusters in year)
------------------------------------------------------------------------------
             |               Robust
           y | Coefficient  std. err.      t    P>|t|     [95% conf. interval]
-------------+----------------------------------------------------------------
           x |   .2382644   .1193802     2.00   0.077    -.0317924    .5083212
       _cons |   -.174081    .291538    -0.60   0.565    -.8335857    .4854238
------------------------------------------------------------------------------

Now, let’s find the true standard error, approximated by the standard deviation of the coefficient on \(x\) that we obtain from 1000 datasets created with the same data-generating process.

local sims = 1000
tempname resultsfile
tempfile results
postfile `resultsfile' b_x using `results', replace

quietly {
    forvalues i = 1/`sims' {
        clear
        set obs 10000

        gen firm = ceil(_n/10)
        bys firm: gen year = _n

        gen x1 = rnormal()
        gen x2 = rnormal() if firm == 1
        egen max = max(x2), by(year)
        replace x2 = max
        drop max
        gen x = x1 + x2

        gen e1 = rnormal()
        gen e2 = rnormal() if firm == 1
        egen max = max(e2), by(year)
        replace e2 = max
        drop max
        gen e = e1 + e2
        gen y = e

        reg y x
        post `resultsfile' (_b[x])
    }
}

postclose `resultsfile'

use `results', clear
summarize b_x, detail
display "Standard deviation of coefficient on x over 1000 simulations: " r(sd)
(file C:\Users\User\AppData\Local\Temp\ST_a5e4_000001.tmp not found)

                             b_x
-------------------------------------------------------------
      Percentiles      Smallest
 1%    -.3631191      -.6092813
 5%    -.2585434      -.4439344
10%    -.1981587      -.4377828       Obs               1,000
25%    -.1097536      -.4128857       Sum of wgt.       1,000

50%    -.0039017                      Mean          -.0042902
                        Largest       Std. dev.      .1562775
75%     .0991302       .4617284
90%     .1913024       .4708806       Variance       .0244227
95%     .2580352       .4801755       Skewness       .0535759
99%      .360173        .494745       Kurtosis       3.093615

Standard deviation of coefficient on x over 1000 simulations: .15627751

We see that the true standard error is about 0.155. Hence, it is clear that the standard error of 0.008 without cluster-adjustment is severely understated. However, the 0.119 based on clustering by year is also too low, even though we have correctly identified and specified the dimension in which we should cluster.

The WCR bootstrap is a solution to this problem that provides an alternative p-value for cluster-robust inference. Similar to the cluster-robust variance estimator that underlies the clustered standard errors we saw above, the WCR bootstrap allows for arbitrary correlation in regression errors within a cluster. What is different is that the WCR bootstrap generates a distribution of \(t\)-values based on a bootstrap procedure in which the bootstrap samples mimic the distribution of the original sample. By imposing the null hypothesis when generating the bootstrap samples, the procedure compares the original \(t\)-value from the regression (in the above example, \(t\) = 2.00) to the distribution of \(t\)-values that are obtained after imposing the null on the bootstrap samples.The bootstrapped \(p\)-value is then based on the fraction of \(t\)-values that are larger than the original \(t\)-value.

Let’s see what happens when we apply the WCR bootstrap by using boottest after estimating the regression with clustered standard errors (reg y x, cluster(year)). With only 10 clusters, the number of unique permutations of the bootstraps is limited to 1024 (\(2^{10}\)),hence we use the default of 999 replications here. Also, because there is a random component in the bootstrap procedure, we should set the seed of the random number generator in order to ensure that results are replicable.

Before proceeding, we need to restore our original dataset prior to the bootstrap.


clear
set obs 10000
set seed 2

gen firm = ceil(_n/10)
bys firm: gen year = _n

gen x1 = rnormal()
gen x2 = rnormal() if firm == 1
egen max = max(x2), by(year)
replace x2 = max
drop max
gen x = x1 + x2

gen e1 = rnormal()
gen e2 = rnormal() if firm == 1
egen max = max(e2), by(year)
replace e2 = max
drop max
gen e = e1 + e2
gen y = e

* Run regression
reg y x, cluster(year)

* Run wild bootstrap test
boottest x = 0, seed(1234567)
Number of observations (_N) was 0, now 10,000.





(9,990 missing values generated)


(9,990 real changes made)




(9,990 missing values generated)


(9,990 real changes made)





Linear regression                               Number of obs     =     10,000
                                                F(1, 9)           =       3.98
                                                Prob > F          =     0.0771
                                                R-squared         =     0.0761
                                                Root MSE          =     1.2918

                                  (Std. err. adjusted for 10 clusters in year)
------------------------------------------------------------------------------
             |               Robust
           y | Coefficient  std. err.      t    P>|t|     [95% conf. interval]
-------------+----------------------------------------------------------------
           x |   .2382644   .1193802     2.00   0.077    -.0317924    .5083212
       _cons |   -.174081    .291538    -0.60   0.565    -.8335857    .4854238
------------------------------------------------------------------------------


Wild bootstrap-t, null imposed, 999 replications, Wald test, bootstrap clusteri
> ng by year, Rademacher weights:
  x = 0

                            t(9) =     1.9958
                        Prob>|t| =     0.1391

95% confidence set for null hypothesis expression: [−.07613, .5128]

Here, we see that t(9)=1.9958 refers to the \(t\)-value of 2.00 for \(x\) we saw earlier.

The results tell us that with the WCR bootstrap, the \(p\)-value is 0.1391 instead of the 0.077 we obtained with standard cluster adjustment in the regression. In other words,compared to the result based on the (understated) cluster-robust standard error estimate, we now obtain a more conservative \(p\)-value and no longer label the coefficient on \(x\) as statistically significantly different from zero.

This is expected as we have simulated the data based on a data generating process in which \(x\) and \(y\) were independent.The 95% confidence interval provided by boottest also informs us that the coefficient is not different from zero. Thus, when clustering on a dimension with few clusters (e.g., ten years), this example illustrates that it might be useful to assess whether conclusions about statistical significance are robust to using the WCR bootstrap.