Difference-in-Differences

Dr. Arvind Sharma

Introduction

  • Difference-in-differences (DiD) is a statistical technique commonly used in econometrics and the social sciences to estimate the causal effect of an intervention or policy change (the “treatment”) on an outcome of interest

  • Origins of Diff-in-Diff: John Snow’s 1855 hypothesis that cholera is water-borne (see footnote 1)

    • Not the fictional character from the medieval fantasy novels “A Song of Ice and Fire” by George R. R. Martin and their HBO television adaptation, Game of Thrones
    • English physician, a founder of early germ theory and a leader in the development of medical hygiene
  • Card and Krueger (1994) popularized the method in economics (classic minimum wage study)

Methodology Intuition

  • We compare the change in the outcome variable between a treatment group (those affected by the intervention or policy change) and a control group (those not affected by the intervention) before and after the intervention i.e. over time

    • By comparing these changes, DiD attempts to isolate the causal effect of the treatment from other factors that might also influence the outcome
  • The logic behind DiD is that, had the intervention never happened, the difference between the treatment group and the control group would have stayed the same over time
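
The comparison described above can be written as a single expression. This is a sketch in assumed notation (not taken from the slides): \(\bar{y}_{g,p}\) is the average outcome of group \(g\) (T = treatment, C = control) in period \(p\) (pre or post).

\[ \widehat{DiD} = \left( \bar{y}_{T,post} - \bar{y}_{T,pre} \right) - \left( \bar{y}_{C,post} - \bar{y}_{C,pre} \right) \]

The second bracket is the change the treatment group would be expected to show without the intervention; subtracting it leaves the part of the change attributed to the treatment.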

Identifying Assumptions

  • In the canonical difference-in-differences model, where two time periods are available, there is a treated population of units that receives a treatment of interest beginning in the second period, and a comparison population that does not receive the treatment in either period
  1. The key identifying assumption is that the average outcome among the treated and comparison populations would have followed ‘parallel trends’ in the absence of treatment

  2. We also assume that the treatment has no causal effect before its implementation (no anticipation)

  • Together, these assumptions allow us to identify the average treatment effect on the treated (ATT)
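
In potential-outcomes notation, the two assumptions and the resulting identification can be sketched as follows. The symbols are assumptions introduced here for illustration: \(Y_{it}(1)\) and \(Y_{it}(0)\) are unit \(i\)'s outcomes in period \(t \in \{1, 2\}\) with and without treatment, and \(D_i = 1\) marks the treated population.

\[
\begin{aligned}
\text{Parallel trends:} \quad & E[Y_{i2}(0) - Y_{i1}(0) \mid D_i = 1] = E[Y_{i2}(0) - Y_{i1}(0) \mid D_i = 0] \\
\text{No anticipation:} \quad & Y_{i1}(1) = Y_{i1}(0) \ \text{ for all } i \text{ with } D_i = 1 \\
\text{ATT:} \quad & E[Y_{i2}(1) - Y_{i2}(0) \mid D_i = 1] = \left( E[Y_{i2} \mid D_i = 1] - E[Y_{i1} \mid D_i = 1] \right) - \left( E[Y_{i2} \mid D_i = 0] - E[Y_{i1} \mid D_i = 0] \right)
\end{aligned}
\]

Everything on the right-hand side of the last line is an observed average, which is what makes the ATT estimable from the data.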

Structure of Diff in Diff Model

Regression & Two Way Table Setup

\[ y = \beta_0 + \beta_1 \ Time + \beta_2 \ Treated + \beta_3 \ (Time \times Treated) + \epsilon \]

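With cell means A (treated, pre), B (control, pre), C (treated, post) and D (control, post):

Coefficient   Calculation         Interpretation
\(\beta_0\)   B                   Baseline average (control group, pre-intervention)
\(\beta_1\)   D - B               Time trend in the control group
\(\beta_2\)   A - B               Difference between the two groups pre-intervention
\(\beta_3\)   (C - A) - (D - B)   Difference in changes over time (the DiD estimate)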
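
The table follows from evaluating the regression at each combination of the two dummies. A short derivation, assuming \(E[\epsilon \mid Time, Treated] = 0\):

\[
\begin{aligned}
E[y \mid Time = 0, Treated = 0] &= \beta_0 \\
E[y \mid Time = 1, Treated = 0] &= \beta_0 + \beta_1 \\
E[y \mid Time = 0, Treated = 1] &= \beta_0 + \beta_2 \\
E[y \mid Time = 1, Treated = 1] &= \beta_0 + \beta_1 + \beta_2 + \beta_3
\end{aligned}
\]

Taking the change over time for the treated group and subtracting the change for the control group gives

\[ \left[ (\beta_0 + \beta_1 + \beta_2 + \beta_3) - (\beta_0 + \beta_2) \right] - \left[ (\beta_0 + \beta_1) - \beta_0 \right] = \beta_3 \]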

Example

Use the DiD model to estimate the effect of coastal weather events (2005 Atlantic hurricane season) on house prices

  • Path of the Hurricane (Gulf of Mexico)

  • Home Damage by Hurricane

  • The Federal Emergency Management Agency (FEMA) released funds to rebuild homes in adversely affected states

\(1^{st}\) Margin: Pre / Post

  • The 2005 Atlantic hurricane season was the most active hurricane season in recorded history until 2020

  • We can create a time dummy equal to 1 for the period after the hurricane (treatment) took effect and 0 for the period before

\(2^{nd}\) Margin: Control / Treatment

  • We can create a treatment dummy equal to 1 for coastal states that were affected by the 2005 hurricanes, and were therefore given FEMA funding for housing reconstruction, and 0 otherwise
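
The two dummies can be built from raw columns along these lines. This is a minimal sketch on a toy data frame: the `state` and `year` columns, the placeholder state names, and the 2005 cutoff are assumptions for illustration, not the structure of the tutorial's CSV file.

```r
# Toy panel: two hypothetical states, each observed in two hypothetical years.
toy <- data.frame(
  state = c("State A", "State A", "State B", "State B"),
  year  = c(2004, 2006, 2004, 2006)
)

treated_states <- c("State A")  # assumption: "State A" received FEMA funding
cutoff_year    <- 2005          # assumption: years after 2005 count as "post"

# 1st margin: pre/post dummy (1 = after the hurricane season)
toy$Time_Period <- as.integer(toy$year > cutoff_year)

# 2nd margin: control/treatment dummy (1 = affected state)
toy$Disaster_Affected <- as.integer(toy$state %in% treated_states)

print(toy)
```

Only the treated state's post-period row has both dummies equal to 1, which is the cell the interaction term in the regression picks out.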

Tutorial

Loading Library & Importing Data

# Load libraries
library(stargazer)  # formatted summary-statistics tables
library(tidyverse)  # data manipulation (attaches dplyr)

# Import data
df <- read.csv("us_fred_coastal_us_states_avg_hpi_before_after_2005.csv")

# Summary statistics for every numeric column, printed as plain text
stargazer(df, type = "text")

================================================
Statistic         N  Mean  St. Dev.  Min    Max 
------------------------------------------------
HPI_CHG           48 0.022  0.017   -0.006 0.061
Time_Period       48 0.500  0.505     0      1  
Disaster_Affected 48 0.208  0.410     0      1  
NUM_DISASTERS     48 3.208  2.143     1     10  
NUM_IND_ASSIST    48 8.583  14.946    0     55  
------------------------------------------------

Observation counts by pre/post and treatment/control groups

# Create a two-way table with labels

raw_table <- table(Time_Period       = ifelse(test = df$Time_Period == 0,       yes = "Pre",     no = "Post"     ),
                   Treatment_Status  = ifelse(test = df$Disaster_Affected == 0, yes = "Control", no = "Treatment")
                   )

raw_table
           Treatment_Status
Time_Period Control Treatment
       Post      19         5
       Pre       19         5
  • Estimating Equation \(\small HPI\_CHG_{st} = \beta_0 + \beta_1 \ Time\_Period_t + \beta_2 \ Disaster\_Affected_s + \beta_3 \ (Time\_Period_t \times Disaster\_Affected_s) + \epsilon_{st}\)

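  • Implement in R

# Fit the DiD regression; `*` expands to both main effects plus their interaction
# (the object name did_model is illustrative)
did_model <- lm(HPI_CHG ~ Time_Period * Disaster_Affected, data = df)
summary(did_model)
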
Call:
lm(formula = HPI_CHG ~ Time_Period * Disaster_Affected, data = df)

Residuals:
      Min        1Q    Median        3Q       Max 
-0.023081 -0.007610 -0.000171  0.004656  0.035981 

Coefficients:
                               Estimate Std. Error t value Pr(>|t|)    
(Intercept)                    0.037090   0.002819  13.157  < 2e-16 ***
Time_Period                   -0.027847   0.003987  -6.985  1.2e-08 ***
Disaster_Affected             -0.013944   0.006176  -2.258   0.0290 *  
Time_Period:Disaster_Affected  0.019739   0.008734   2.260   0.0288 *  
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Residual standard error: 0.01229 on 44 degrees of freedom
Multiple R-squared:  0.5356,    Adjusted R-squared:  0.504 
F-statistic: 16.92 on 3 and 44 DF,  p-value: 1.882e-07
  • DiD estimate
Time_Period:Disaster_Affected 
                   0.01973946 
  • Average house price change (HPI_CHG) by pre/post and treatment/control groups
mean_table <- tapply(X     = df$HPI_CHG, 
                     INDEX = list(Time_Period      = ifelse(df$Time_Period == 0,       "Pre",     "Post"),
                                  Treatment_Status = ifelse(df$Disaster_Affected == 0, "Control", "Treatment") ),
                     FUN   = mean )

# Display the table
print(mean_table)
           Treatment_Status
Time_Period     Control  Treatment
       Post 0.009242792 0.01503835
       Pre  0.037090020 0.02314612
  • Calculate DiD effect
# Rows sort alphabetically: [1, ] = Post, [2, ] = Pre; columns: [, 1] = Control, [, 2] = Treatment
DiD_effect <- ( mean_table[1, 2] - mean_table[1, 1] )  - ( mean_table[2, 2] - mean_table[2, 1] )
print(DiD_effect)
[1] 0.01973946
  • Same as \(\beta_3\) of estimating equation!
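
The equivalence can be checked without the data file by starting from the four cell means printed above. The sketch below hard-codes those means, indexes them by name rather than by position (so the alphabetical Post/Pre ordering cannot cause a mix-up), and rebuilds all four regression coefficients. The object names are illustrative.

```r
# Cell means copied from the mean table printed above.
cell_means <- matrix(
  c(0.009242792, 0.037090020,   # Control:   Post, Pre
    0.01503835,  0.02314612),   # Treatment: Post, Pre
  nrow = 2,
  dimnames = list(Time_Period      = c("Post", "Pre"),
                  Treatment_Status = c("Control", "Treatment"))
)

# Change over time within each group.
change_treatment <- cell_means["Post", "Treatment"] - cell_means["Pre", "Treatment"]
change_control   <- cell_means["Post", "Control"]   - cell_means["Pre", "Control"]

# Difference in differences.
did_from_means <- change_treatment - change_control

# Regression coefficients implied by the same four means.
implied_coefs <- c(
  beta_0 = cell_means["Pre", "Control"],
  beta_1 = change_control,
  beta_2 = cell_means["Pre", "Treatment"] - cell_means["Pre", "Control"],
  beta_3 = did_from_means
)
print(round(implied_coefs, 6))
```

The implied values agree with the estimates in the regression output up to rounding. Both groups saw HPI_CHG fall after 2005, but the fall was smaller in the disaster-affected states, which is why the DiD estimate is positive.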

Tutorial Summary

Summary

  1. The standard two-group, two-period difference-in-differences setup relies on the assumption of parallel trends

    • Parallel trends assumes that, in the absence of the intervention, the outcome y would have changed at the same rate in both groups
    • Prior to the intervention, y should move in the same direction and at a similar rate for both groups
  2. The standard DiD estimator measures the difference between the two groups’ changes in y over time

  3. The DiD estimate is equivalent to the coefficient on the interaction of the treatment and time dummies, i.e. the treatment-control difference in the post-minus-pre change in y

    • If the parallel trends assumption is violated, we cannot tell whether the DiD estimator is identifying the effect of the policy or some other unaccounted-for factor that causes the groups’ trends to differ
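
Parallel trends cannot be tested in a two-period design, but when several pre-intervention periods are available a common informal check is to compare the groups' pre-period slopes. The sketch below uses simulated data only (not the housing data from the tutorial); the variable names, sample sizes, and data-generating process are assumptions chosen so that the two groups share a trend by construction.

```r
set.seed(1)  # simulated data only; not the housing data used in the tutorial

# 20 hypothetical units (10 treated, 10 control) observed in 4 pre-intervention periods.
pre <- expand.grid(unit = 1:20, period = 1:4)
pre$treated <- as.integer(pre$unit <= 10)

# Both groups share the same slope (0.5 per period); treated units start 2 higher.
pre$y <- 1 + 0.5 * pre$period + 2 * pre$treated + rnorm(nrow(pre), sd = 0.1)

# The period:treated coefficient estimates the gap between the groups' pre-period slopes.
pre_fit <- lm(y ~ period * treated, data = pre)
pre_slope_gap <- coef(pre_fit)[["period:treated"]]
print(pre_slope_gap)
```

A gap close to zero is consistent with parallel pre-trends, but it does not prove that the trends would have stayed parallel after the intervention.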

Extensions

  1. DiD with covariates

    • Use available information on observed characteristics (e.g. covariate-specific trends), and absorb unobserved heterogeneity with fixed effects
  2. DiD with multiple periods

    • Estimating the treatment effect over multiple time periods rather than just before and after the intervention
  3. DiD with variation in treatment timing (Staggered DiD; see footnote 2)

  4. Triple Difference (DiDiD)
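
One common way to write the first three extensions is a regression with unit and period fixed effects. This is a sketch in assumed notation, none of which appears in the slides: \(i\) indexes units, \(t\) indexes periods, \(D_{it}\) equals 1 once unit \(i\) is treated, and \(X_{it}\) collects covariates.

\[ y_{it} = \alpha_i + \lambda_t + \delta \, D_{it} + X_{it}' \gamma + \epsilon_{it} \]

With two groups and two periods and no covariates, \(\delta\) coincides with \(\beta_3\) from the basic model. With staggered timing and treatment effects that differ across units or over time, \(\delta\) need not equal a simple average of the unit-level effects, so it should be interpreted with care.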

Appendix

References

  1. https://econ.georgetown.edu/difference-in-differences-methods/

  2. https://ds4ps.org/pe4ps-textbook/docs/p-030-diff-in-diff.html#

  3. https://www.princeton.edu/~otorres/DID101.pdf

John Snow Diff-in-Diff Table

Cases of cholera per 10,000 households

Footnotes

  1. Cholera is a vicious disease that attacks victims suddenly, with acute symptoms such as vomiting and diarrhea. In the nineteenth century, it was usually fatal.

  2. Units being exposed to treatment at different points in time.