Difference-in-differences (DiD) is a statistical technique commonly used in econometrics and the social sciences to estimate the causal effect of an intervention or policy change (the “treatment”) on an outcome of interest
Origins of Diff-in-Diff: John Snow’s 1855 hypothesis that cholera¹ is waterborne
We compare the change in the outcome variable from before to after the intervention between a treatment group (those affected by the intervention or policy change) and a control group (those not affected by it)
The logic behind DiD is that if the event had never happened, the difference between the treatment group and the control group would have stayed the same over time
The key identifying assumption is that the average outcome among the treated and comparison populations would have followed ‘parallel trends’ in the absence of treatment
We also assume that the treatment has no causal effect before its implementation (no anticipation)
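The two assumptions above are what turn a double difference of group means into a causal effect. The derivation below uses potential-outcomes notation, which is an assumed, standard framing and is not used elsewhere in these slides. \(Y_{it}(1)\) and \(Y_{it}(0)\) are unit \(i\)'s outcomes at time \(t \in \{0, 1\}\) (pre/post) with and without treatment, and \(D_i = 1\) marks the treatment group. No anticipation means treated units show \(Y_{i0}(0)\) in the pre-period. Parallel trends sets the last bracket to zero, leaving the average treatment effect on the treated (ATT):

\[
\begin{aligned}
\text{DiD} &= \big(E[Y_{i1} \mid D_i=1] - E[Y_{i0} \mid D_i=1]\big) - \big(E[Y_{i1} \mid D_i=0] - E[Y_{i0} \mid D_i=0]\big) \\
&= \big(E[Y_{i1}(1) \mid D_i=1] - E[Y_{i0}(0) \mid D_i=1]\big) - \big(E[Y_{i1}(0) \mid D_i=0] - E[Y_{i0}(0) \mid D_i=0]\big) \\
&= E[Y_{i1}(1) - Y_{i1}(0) \mid D_i=1] + \underbrace{\big(E[Y_{i1}(0) - Y_{i0}(0) \mid D_i=1] - E[Y_{i1}(0) - Y_{i0}(0) \mid D_i=0]\big)}_{=\,0 \text{ under parallel trends}} \\
&= \text{ATT}
\end{aligned}
\]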
Regression & Two-Way Table Setup
\[ y = \beta_0 + \beta_1 \ Time + \beta_2 \ Treated + \beta_3 \ Time * Treated + \epsilon \]
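Coefficient | Calculation | Interpretation |
---|---|---|
\(\beta_0\) | B | Baseline average (control group, pre-intervention) |
\(\beta_1\) | D - B | Time trend in the control group |
\(\beta_2\) | A - B | Difference between the two groups pre-intervention |
\(\beta_3\) | (C - A) - (D - B) | Difference in changes over time (the DiD estimate) |

where A, B, C and D are the mean outcomes in the four cells of the two-way table:

Group | Pre (Time = 0) | Post (Time = 1) |
---|---|---|
Treated = 1 | A | C |
Treated = 0 | B | D |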
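A minimal sketch that checks the table's formulas on simulated data (not the hurricane data used later). The data-generating process, the true effect of 2, and names such as `sim` and `cell_mean` are hypothetical. The model `y ~ Time * Treated` has one parameter per cell, so its OLS coefficients match the cell-mean formulas exactly.

```r
# Simulated 2x2 design: 50 observations in each Time x Treated cell
set.seed(42)
n_per_cell <- 50
sim <- expand.grid(rep = seq_len(n_per_cell), Time = 0:1, Treated = 0:1)

# Hypothetical data-generating process with a true DiD effect of 2
sim$y <- 10 + 1 * sim$Time + 3 * sim$Treated + 2 * sim$Time * sim$Treated +
  rnorm(nrow(sim))

# Time * Treated expands to Time + Treated + Time:Treated
fit <- lm(y ~ Time * Treated, data = sim)

# Cell means: A = treated/pre, B = control/pre, C = treated/post, D = control/post
cell_mean <- function(time, treated) mean(sim$y[sim$Time == time & sim$Treated == treated])
A <- cell_mean(0, 1)
B <- cell_mean(0, 0)
C <- cell_mean(1, 1)
D <- cell_mean(1, 0)

by_formula <- c(B, D - B, A - B, (C - A) - (D - B))
by_ols     <- unname(coef(fit))

comparison <- rbind(by_formula, by_ols)
colnames(comparison) <- c("beta0", "beta1", "beta2", "beta3")
print(round(comparison, 4))
all.equal(by_formula, by_ols)  # TRUE: the saturated 2x2 model reproduces the cell means
```

Because the regression is saturated, you can read \(\beta_3\) either as a regression coefficient or as the double difference of four averages.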
Use the DiD model to estimate the effect of coastal weather events (2005 Atlantic hurricane season) on house prices
The 2005 Atlantic hurricane season was the most active hurricane season in recorded history until it was surpassed in 2020
We create a time dummy equal to 1 for the period when the hurricanes (the treatment) were in effect, and 0 before
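A minimal sketch of building the DiD dummies, assuming a long panel with one row per unit and period. The source data's layout is not shown, so the unit labels `S1` to `S4`, the years 2004/2006, and the `affected` flag are all hypothetical.

```r
# Hypothetical long panel: 4 units observed once before and once after 2005
panel <- data.frame(
  state    = rep(c("S1", "S2", "S3", "S4"), each = 2),
  year     = rep(c(2004, 2006), times = 4),
  affected = rep(c(TRUE, TRUE, FALSE, FALSE), each = 2)  # hypothetical exposure flag
)

# Time dummy: 1 for observations after the 2005 hurricane season, 0 before
panel$Time_Period <- as.integer(panel$year > 2005)

# Treatment dummy: 1 for units hit by the hurricanes, 0 otherwise
panel$Disaster_Affected <- as.integer(panel$affected)

# Interaction: 1 only for treated units in the post period
# (lm's `*` operator creates this term automatically)
panel$Post_x_Treated <- panel$Time_Period * panel$Disaster_Affected

print(panel)
```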
Loading Libraries & Importing Data
```
=====================================================
Statistic         N   Mean  St. Dev.   Min     Max
-----------------------------------------------------
HPI_CHG           48  0.022   0.017  -0.006   0.061
Time_Period       48  0.500   0.505   0       1
Disaster_Affected 48  0.208   0.410   0       1
NUM_DISASTERS     48  3.208   2.143   1      10
NUM_IND_ASSIST    48  8.583  14.946   0      55
-----------------------------------------------------
```
Observation counts by pre/post and treatment/control groups
```r
# Create a two-way table of observation counts with readable labels
raw_table <- table(
  Time_Period      = ifelse(test = df$Time_Period == 0, yes = "Pre", no = "Post"),
  Treatment_Status = ifelse(test = df$Disaster_Affected == 0, yes = "Control", no = "Treatment")
)
raw_table
```

```
           Treatment_Status
Time_Period Control Treatment
       Post      19         5
       Pre       19         5
```
Estimating Equation \(\small HPI\_CHG_{st} = \beta_0 + \beta_1 \ Time\_Period_t + \beta_2 \ Disaster\_Affected_s + \beta_3 \ Time\_Period_t * Disaster\_Affected_s + \epsilon_{st}\)
Implement in R
```
Call:
lm(formula = HPI_CHG ~ Time_Period * Disaster_Affected, data = df)

Residuals:
      Min        1Q    Median        3Q       Max
-0.023081 -0.007610 -0.000171  0.004656  0.035981

Coefficients:
                               Estimate Std. Error t value Pr(>|t|)
(Intercept)                    0.037090   0.002819  13.157  < 2e-16 ***
Time_Period                   -0.027847   0.003987  -6.985  1.2e-08 ***
Disaster_Affected             -0.013944   0.006176  -2.258   0.0290 *
Time_Period:Disaster_Affected  0.019739   0.008734   2.260   0.0288 *
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Residual standard error: 0.01229 on 44 degrees of freedom
Multiple R-squared:  0.5356,  Adjusted R-squared:  0.504
F-statistic: 16.92 on 3 and 44 DF,  p-value: 1.882e-07
```

```
Time_Period:Disaster_Affected
                   0.01973946
```
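```r
# Mean HPI_CHG in each time-period x treatment-status cell
mean_table <- tapply(X = df$HPI_CHG,
                     INDEX = list(Time_Period = ifelse(df$Time_Period == 0, "Pre", "Post"),
                                  Treatment_Status = ifelse(df$Disaster_Affected == 0, "Control", "Treatment")),
                     FUN = mean)
# Display the table
print(mean_table)
```

```
           Treatment_Status
Time_Period     Control  Treatment
       Post 0.009242792 0.01503835
       Pre  0.037090020 0.02314612
```

```r
# DiD = (treatment - control gap after) - (treatment - control gap before)
# Indexing by name avoids relying on the alphabetical row order (Post before Pre)
DiD_effect <- (mean_table["Post", "Treatment"] - mean_table["Post", "Control"]) -
  (mean_table["Pre", "Treatment"] - mean_table["Pre", "Control"])
print(DiD_effect)
```

```
[1] 0.01973946
```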
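Plugging the printed cell means into the coefficient table's formulas recovers every regression estimate, up to rounding to five decimals. Here A = treatment/pre, B = control/pre, C = treatment/post and D = control/post:

\[
\begin{aligned}
\hat\beta_0 &= B = 0.03709 \\
\hat\beta_1 &= D - B = 0.00924 - 0.03709 = -0.02785 \\
\hat\beta_2 &= A - B = 0.02315 - 0.03709 = -0.01394 \\
\hat\beta_3 &= (C - A) - (D - B) = (0.01504 - 0.02315) - (0.00924 - 0.03709) = -0.00811 + 0.02785 = 0.01974
\end{aligned}
\]

In words, HPI_CHG fell in both groups between the two periods. It fell by about 0.0081 in the treatment group and by about 0.0278 in the control group, so relative to the controls the treatment group's change was 0.0197 higher.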
The standard two-group, two-period difference-in-differences setup relies on the assumption of parallel trends: the outcome \(y\) for the treatment and control groups would trend at the same rate in the absence of the intervention
The standard DiD estimator measures the difference between the two groups' estimated trends (their pre-to-post changes)
The DiD estimate equals the coefficient on the interaction of the treatment and time dummies, i.e. the difference between the treatment and control groups in the post-minus-pre change in \(y\) (0.0197 here, from both the regression and the table of means)
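Parallel trends cannot be tested directly, because the treated group's untreated post-period outcome is never observed. When several pre-treatment periods exist, a common informal check compares the groups' pre-treatment slopes. The two-period hurricane data above does not allow this check. The sketch below therefore uses a simulated panel, and every name in it (`panel`, `pre_fit`, the treatment start in period 4) is hypothetical.

```r
# Simulated panel: 40 units over 6 periods, treatment starts in period 4
set.seed(1)
panel <- expand.grid(unit = 1:40, period = 1:6)
panel$treated <- as.integer(panel$unit <= 20)
panel$post    <- as.integer(panel$period >= 4)

# Both groups share the same pre-treatment slope (parallel trends hold by construction)
panel$y <- 5 + 0.5 * panel$period + 2 * panel$treated +
  1.5 * panel$treated * panel$post + rnorm(nrow(panel), sd = 0.5)

# Keep only pre-treatment periods and test whether the slopes differ by group
pre     <- subset(panel, post == 0)
pre_fit <- lm(y ~ period * treated, data = pre)

# Estimate, std. error, t value and p-value of the slope difference
summary(pre_fit)$coefficients["period:treated", ]
```

A small, statistically insignificant `period:treated` coefficient is consistent with parallel trends, but it cannot prove that trends would have stayed parallel after treatment.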
DiD with covariates
DiD with multiple periods
DiD with variation in treatment timing (Staggered DiD²)
Triple Difference (DiDiD)
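For the "multiple periods" and "covariates" extensions, a common starting point is a two-way fixed effects (TWFE) regression. It includes unit dummies, period dummies, covariates, and a treated-and-post indicator. This is a minimal sketch on simulated data with a single, common adoption date; the covariate `x` and all other names are hypothetical. With staggered adoption, the plain TWFE coefficient can be a misleading average when effects vary over time, and dedicated staggered-DiD estimators address exactly this problem.

```r
# Simulated panel: 30 units over 5 periods, half treated from period 3 on
set.seed(7)
units   <- 1:30
periods <- 1:5
panel <- expand.grid(unit = units, period = periods)
panel$treated <- as.integer(panel$unit <= 15)
panel$post    <- as.integer(panel$period >= 3)  # one common adoption date (not staggered)
panel$D       <- panel$treated * panel$post     # 1 for treated units once treated
panel$x       <- rnorm(nrow(panel))             # hypothetical time-varying covariate

# Outcome with unit effects, a common time trend, a covariate effect and a true effect of 1
unit_effect <- rnorm(length(units))
panel$y <- unit_effect[panel$unit] + 0.3 * panel$period + 0.8 * panel$x +
  1.0 * panel$D + rnorm(nrow(panel), sd = 0.5)

# Unit dummies absorb the `treated` main effect; period dummies absorb `post`
twfe <- lm(y ~ D + x + factor(unit) + factor(period), data = panel)
coef(twfe)[c("D", "x")]
```

With only two periods and no covariates, this regression reduces to the 2×2 DiD estimated earlier.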
Boston College
¹ Cholera is a vicious disease that attacks victims suddenly, with acute symptoms such as vomiting and diarrhea. In the nineteenth century, it was usually fatal.
² Staggered DiD: units are exposed to treatment at different points in time.