Clustering

N - Number of observations

M - Number of cluster

K - Number of regressors (rank)

Clustering Fixed Effects (FE) estimations

\[\frac{(N-K)}{(N-K)-(M-1)} \cdot (X'X)^{-1} \, \sum_{j=1}^M u_{M}'u_{M} \, (X'X)^{-1} \cdot \frac{M}{M-1} \frac{N-1}{N-K}\] where \(u = X_{j} e_{j}\).

Clustering First Difference (FD) estimations

\[(X'X)^{-1} \, \sum_{j=1}^M u_{M}'u_{M} \, (X'X)^{-1} \cdot \frac{M}{M-1} \frac{N-1}{N-K}\] where \(u = X_{j} e_{j}\).

Greate data frame with a group structure

# packages 
library(dplyr)
library(lfe)
# data structure
N<-100 #number of obeservations
Y<-8 #number of years
G<-20 #number of groups
d <- data.frame(
  id = rep(1:N, each = Y), # individual id
  year = rep(1:Y, N) + 2000, 
  gid = rep(1:G, each = Y * (N/G)) #group id
)
# covariates
d$x1 <- rnorm(N * Y, 0, 1)
d$x2 <- rnorm(N * Y, 0, 1)
# error term 
d$e <- rnorm(N * Y, 0, 12)
# group treatment
g<-unique(d[,c("gid","year")])
g$treat<-sample(x=c(0,1),size=G * Y,replace=T)
d<-merge(d,g,by=c("gid","year"),all.x=T,sort=F)
# coefficients and outcomes
coef.x1 <- 1
coef.x2 <- 2
coef.treat <- 3
d$y<-coef.x1*d$x1 + coef.x2*d$x2 + coef.treat*d$treat +   d$e
# first differences
a<-lapply(
  split(d,d$id),
  function(s){
    # s<-split(d,d$id)[[1]]
    n<-c("y","x1","x2","treat")
    for (v in n){
      # print(v)
      s[,paste0("d.",v)]<-s[,v]-dplyr::lag(s[,v])
    }
    return(s)
  }
)
d<-do.call("rbind",a)

Fixed effects model

f<-"y ~ treat + x1 + x2  | id + year | 0 | gid"
f<-as.formula(f)
e<-felm(formula=f,data=d)
summary(e)
## 
## Call:
##    felm(formula = f, data = d) 
## 
## Residuals:
##     Min      1Q  Median      3Q     Max 
## -32.041  -7.157   0.174   7.651  35.864 
## 
## Coefficients:
##       Estimate Cluster s.e. t value Pr(>|t|)    
## treat   1.3647       0.9462   1.442 0.149674    
## x1      0.4862       0.4453   1.092 0.275280    
## x2      1.8892       0.5125   3.686 0.000246 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 12.07 on 690 degrees of freedom
## Multiple R-squared(full model): 0.1755   Adjusted R-squared: 0.04531 
## Multiple R-squared(proj model): 0.02836   Adjusted R-squared: -0.1251 
## F-statistic(full model, *iid*):1.348 on 109 and 690 DF, p-value: 0.01554 
## F-statistic(proj model):  4.73 on 3 and 19 DF, p-value: 0.01251

First difference model

f<-"d.y ~ d.treat + d.x1 + d.x2  | year | 0 | gid"
f<-as.formula(f)
e<-felm(formula=f,data=d)
summary(e)
## 
## Call:
##    felm(formula = f, data = d) 
## 
## Residuals:
##     Min      1Q  Median      3Q     Max 
## -47.864 -12.015  -0.719  12.037  66.471 
## 
## Coefficients:
##         Estimate Cluster s.e. t value Pr(>|t|)    
## d.treat   1.4694       1.0632   1.382 0.167416    
## d.x1      0.6896       0.4591   1.502 0.133543    
## d.x2      2.0929       0.5594   3.742 0.000198 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 17.27 on 690 degrees of freedom
##   (100 observations deleted due to missingness)
## Multiple R-squared(full model): 0.05065   Adjusted R-squared: 0.03826 
## Multiple R-squared(proj model): 0.0341   Adjusted R-squared: 0.0215 
## F-statistic(full model, *iid*): 4.09 on 9 and 690 DF, p-value: 3.895e-05 
## F-statistic(proj model): 5.644 on 3 and 19 DF, p-value: 0.006127