What comes after the basic DID model
I’m assuming you already know:
We want: how much more did the treated group change than the untreated group?
\[(\bar{Y}^{T}_{after}-\bar{Y}^{T}_{before}) - (\bar{Y}^{C}_{after}-\bar{Y}^{C}_{before})\]
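As a quick check on the formula above, here is a minimal sketch that computes the 2x2 DID from four group-period samples. The numbers and the `did_2x2` helper are made up for illustration; nothing here comes from real data.

```python
import numpy as np

# Hypothetical outcomes, for illustration only.
treated_before = np.array([10.0, 12.0, 11.0])
treated_after = np.array([15.0, 17.0, 16.0])
control_before = np.array([8.0, 9.0, 10.0])
control_after = np.array([10.0, 11.0, 12.0])


def did_2x2(t_before, t_after, c_before, c_after):
    """Treated group's change in means minus the control group's change in means."""
    treated_change = np.mean(t_after) - np.mean(t_before)
    control_change = np.mean(c_after) - np.mean(c_before)
    return treated_change - control_change


did_estimate = did_2x2(treated_before, treated_after, control_before, control_after)
print(did_estimate)  # treated change is 5, control change is 2
```

The treated mean moves from 11 to 16 and the control mean from 9 to 11, so the estimate is 5 - 2 = 3.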
DID as a research design is wonderfully flexible. The logic — use the control group’s change as the treated group’s counterfactual — survives almost anything you throw at it.
DID as an estimation method (TWFE + OLS) is fragile. It breaks the moment you vary almost anything.
Today: what breaks DID (usually the estimator, sometimes the design), and how do we fix it?
You’d think these would be harmless. They are not:
In ordinary regression, controls close back doors / justify conditional independence.
In DID, the design already handles level differences between groups. So controls are not there to make groups comparable or to handle endogeneity in treatment assignment.
Controls in DID exist to rescue parallel trends:
“I don’t think parallel trends holds raw — but it holds conditional on this variable.”
If that’s not the claim you’re making, you don’t want the control.
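To make the "holds conditional on this variable" claim concrete, here is a small simulated sketch. Everything in it is an assumption chosen for illustration: two periods, a baseline covariate called `income` that is higher for treated units, an outcome trend that is linear in `income`, and a true treatment effect of 0. The adjustment regresses the outcome change on treatment and baseline income; it works here only because the simulated trend is linear and the effect is homogeneous, so read it as a demonstration of the idea, not a general recipe.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 5000

treated = rng.integers(0, 2, size=n)
# Baseline income is higher on average for treated units (assumed setup).
income = rng.normal(loc=treated.astype(float), scale=1.0)
# Outcome change depends on income only; the true treatment effect is 0.
delta_y = 0.5 * income + rng.normal(scale=1.0, size=n)

# Raw DID: difference in mean changes. Parallel trends fails unconditionally.
raw_did = delta_y[treated == 1].mean() - delta_y[treated == 0].mean()

# Conditional version: regress the change on treatment and baseline income.
X = np.column_stack([np.ones(n), treated, income])
coef, *_ = np.linalg.lstsq(X, delta_y, rcond=None)
adjusted_did = coef[1]

print(f"raw DID: {raw_did:.3f}, adjusted DID: {adjusted_did:.3f}")
```

The raw estimate picks up the income-driven trend gap (about 0.5 by construction), while the adjusted one should sit near the true effect of 0.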
No. TWFE-with-controls quietly does the wrong thing:
Parallel trends violated only through income; true treatment effect baked in at 0; income is a clean confounder (not caused by treatment). 1,000 runs:
The density sits well off the truth — and there’s no post-treatment bias anywhere. It still fails.
Post-treatment bias. Any covariate measured after treatment may itself be caused by treatment.
Rule of thumb: use covariates measured at baseline only. Many software packages won’t even let you do otherwise.
Match on a pre-treatment covariate that differs a lot between groups. Say the covariate is low for the treated group and high for the untreated group:
Average DID over 1,000 sims (true effect = 0): full data ≈ 0.006 vs. matched pair ≈ -0.480. Matching the closest pair manufactures a downward “effect” out of pure regression to the mean.
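The mechanism can be reproduced in a few lines. This is a sketch under my own assumptions, not the simulation behind the numbers above: it matches on the pre-period outcome itself, sets group means to 0 (treated) and 1 (untreated), and adds independent standard-normal noise in each period, so its averages will not match the figures quoted.

```python
import numpy as np

rng = np.random.default_rng(42)
n_sims, n_per_group = 1000, 100
full_did, matched_did = [], []

for _ in range(n_sims):
    # Stable group means (treated low, untreated high) plus fresh noise each period.
    # There is no treatment effect and no trend in either group.
    t_pre = 0.0 + rng.normal(size=n_per_group)
    t_post = 0.0 + rng.normal(size=n_per_group)
    c_pre = 1.0 + rng.normal(size=n_per_group)
    c_post = 1.0 + rng.normal(size=n_per_group)

    # DID on the full sample.
    full_did.append((t_post.mean() - t_pre.mean()) - (c_post.mean() - c_pre.mean()))

    # Keep only the treated-untreated pair closest on the pre-period value.
    gaps = np.abs(t_pre[:, None] - c_pre[None, :])
    i, j = np.unravel_index(np.argmin(gaps), gaps.shape)
    matched_did.append((t_post[i] - t_pre[i]) - (c_post[j] - c_pre[j]))

full_mean = float(np.mean(full_did))
matched_mean = float(np.mean(matched_did))
print(f"full data: {full_mean:.3f}, matched pair: {matched_mean:.3f}")
```

The matched treated unit was unusually high for its group and the matched untreated unit unusually low, so both drift back toward their own group means afterward. That drift shows up as a negative DID even though nothing happened.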
You must decide on which scale you believe parallel trends holds, and use the outcome on that scale.
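A tiny arithmetic sketch of why the scale matters, using made-up numbers: both groups double, so trends are exactly parallel in logs but not in levels.

```python
import math

# Hypothetical group means with no treatment effect: both groups double.
control_before, control_after = 10.0, 20.0
treated_before, treated_after = 20.0, 40.0

# DID in levels: the treated group gains 20, the control group gains 10.
did_levels = (treated_after - treated_before) - (control_after - control_before)

# DID in logs: both groups gain log(2).
did_logs = (math.log(treated_after) - math.log(treated_before)) - (
    math.log(control_after) - math.log(control_before)
)

print(did_levels, round(did_logs, 12))
```

The same data give a level "effect" of 10 and a log "effect" of 0. Parallel trends can hold on one scale or the other, but generally not both.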
The design still feels obvious: more dose → more change. Right?
DID watches how a gap in outcome levels changes from before to after.
But who says it has to be a gap in levels? We can run DID on a gap in almost anything:
The big use: find a group that shouldn’t be affected, and subtract its “effect” out.
Marshes vs. parks. A policy funds trash removal from marshes in some prefectures.
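A minimal sketch of the subtraction, assuming parks are the site type the policy should not touch. All cell means, and the names `litter`, `funded`, and `unfunded`, are hypothetical.

```python
# Hypothetical mean litter per site: litter[site][prefecture] = (before, after).
litter = {
    "marsh": {"funded": (50.0, 30.0), "unfunded": (48.0, 44.0)},
    "park": {"funded": (20.0, 17.0), "unfunded": (19.0, 17.0)},
}


def did(cells):
    """Funded prefectures' change minus unfunded prefectures' change."""
    funded_before, funded_after = cells["funded"]
    unfunded_before, unfunded_after = cells["unfunded"]
    return (funded_after - funded_before) - (unfunded_after - unfunded_before)


did_marsh = did(litter["marsh"])  # the DID we care about
did_park = did(litter["park"])    # placebo DID: should be 0 if nothing else changed
ddd = did_marsh - did_park        # triple difference

print(did_marsh, did_park, ddd)
```

Here the marsh DID is -16 and the park DID is -1, so the triple difference is -15. The park DID absorbs whatever else changed in funded prefectures at the same time.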
Maryland 2008 law: mortgage servicers must report their loan-modification activity. Did it change behavior?
ESRR servicers modified more loans and foreclosed more — the second effect being the opposite of the policy’s intent.
We “check” parallel trends with pre-treatment event-study coefficients. But:
“Passing” the pre-trends test does not mean parallel trends holds.
Stop pretending parallel trends is exactly true. Instead, bound how badly it could be violated:
HonestDiD in R / Stata
Medicaid-expansion event study (5 pre-periods, 2 post). Original estimate is a significant positive effect. Relax parallel trends by \(\bar{M}\times\) the max pre-period violation:
Original CI (left) excludes 0; robust intervals stay above 0 until \(\bar{M}=2\).
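To see the shape of the idea, here is a deliberately crude back-of-envelope sketch. It is not the HonestDiD procedure: it treats the estimated pre-period coefficients as if they were the true violations, ignores their sampling error, and just widens an ordinary confidence interval for the first post-period. The coefficients, standard error, and the `crude_robust_interval` helper are all hypothetical.

```python
import numpy as np

# Hypothetical event-study estimates. The period just before treatment is the
# reference period and is normalized to 0.
pre_coefs = np.array([0.010, -0.004, 0.006, 0.002, 0.0])  # event times -5..-1
post_coef, post_se = 0.050, 0.010  # first post-treatment period

# Largest period-to-period movement among the pre-treatment coefficients.
max_pre_step = float(np.max(np.abs(np.diff(pre_coefs))))


def crude_robust_interval(m_bar, z=1.96):
    """Usual CI widened by m_bar times the largest pre-period step (a simplification)."""
    slack = m_bar * max_pre_step
    return post_coef - z * post_se - slack, post_coef + z * post_se + slack


for m_bar in (0.0, 0.5, 1.0, 1.5, 2.0):
    lower, upper = crude_robust_interval(m_bar)
    print(f"M-bar = {m_bar:.1f}: [{lower:.4f}, {upper:.4f}]")
```

With `m_bar = 0` this is the ordinary interval. Each increase in `m_bar` allows a larger post-period violation and widens the interval accordingly. For real work, use the package, which handles the inference properly.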
| Problem | Reach for |
|---|---|
| Controls done right | Callaway-Sant’Anna, ETWFE, doubly-robust |
| Binary / count outcome | LPM, Poisson, nonlinear ETWFE |
| Continuous / dose treatment | de Chaisemartin-D’Haultfœuille, CGS |
| DID on a placebo group | Triple differences (DDD) |
| Unsure about parallel trends | HonestDiD sensitivity analysis |
Please don’t leave thinking “the old way is broken but this new way Just Works.”
Nothing Just Works in DID. Every change in the setting or data probably requires a change of estimator. That estimator usually exists, but you need to find it!
For any DID that isn’t plain-OLS-no-covariates-single-period, ask:
It can work. It just won’t Just Work.
did, etwfe, DRDID, HonestDiD, pretrends
Between DID Concept and DID Execution