Equivalence Tests of Identification Assumptions

Dor Leventer

Fall 2022

Introduction

A Causal Inference Framework

A causal inference framework is made up of the following components:

  • Estimand + Identification Assumptions -> Identification Theorem
  • Id. Theorem -> Estimator
  • Estimator + Statistical Assumptions -> Inference
  • Id. Assumptions -> Validation Tests + Sensitivity Analysis

Today we will focus on validation tests

Motivation

All our causal inference frameworks have assumptions

  • RCT: (conditional) independence of \(Y(w)\) and \(W\)
    • Test: Balance (no mean diff.) on pre-treatment covariates
  • DID: parallel trends of \(Y(0)\) post treatment
    • Test: Trends before treatment aren’t diff.
  • RD: continuity of \(\mathbb{E}[Y(w)]\) around the cutoff
    • Test: Pre-treatment covariates don’t exhibit discon.

Question: how to perform these validation tests?

An Example DID Setup

  • Say we have three time periods \(t\in\{1,2,3\}\)
  • Say we have a treatment group \(G_i=1\) and a control group \(G_i=0\)
  • And for the treatment group, treatment starts at \(t=3\)
  • So we need parallel trends (PT). Define for variable \(X\) \[\Delta X_{g,t}=\mathbb{E}\left[X_{i,t}-X_{i,t-1}\mid G_i=g\right]\]
  • Can write PT of \(Y(0)\) at time \(t=3\) as \[\Delta Y_{1,3}\left(0\right) = \Delta Y_{0,3}\left(0\right)\] (a simulation sketch follows below)
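To make this concrete, here is a minimal simulation sketch (in Python; the numbers and variable names are illustrative assumptions, not taken from the talk's code) of a world where PT holds by construction:

```python
# A minimal sketch: a two-group, three-period panel in which parallel
# trends holds by construction, with treatment switching on at t = 3.
import numpy as np

rng = np.random.default_rng(0)
n = 1_000                                   # units per group (assumed)
tau = 2.0                                   # treatment effect at t = 3 (assumed)

def trends(y):
    """Group-level trends [Delta Y_{g,2}, Delta Y_{g,3}] from an (n, 3) array."""
    return (y[:, 1:] - y[:, :-1]).mean(axis=0)

time_trend = np.array([0.0, 1.0, 2.0])      # common trend -> PT holds for Y(0)
y_control = time_trend + rng.normal(size=(n, 3))
y_treated = 0.5 + time_trend + rng.normal(size=(n, 3))   # group fixed effect
y_treated[:, 2] += tau                      # treatment turns on at t = 3

print("control trends:", trends(y_control))  # approx. [1, 1]
print("treated trends:", trends(y_treated))  # approx. [1, 1 + tau]
```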

Testing PT

  • We can never test PT directly, since \(Y(0)\) is unobserved for \(G_i=1\) at \(t=3\).
  • Hence, we usually conduct a suggestive test on trends before treatment
  • If we assume no anticipation, i.e., \(\forall t<3:\ Y_{i,t}=Y_{i,t}(0)\)
  • Then we can test the analogue of PT at \(t=2\), \[\Delta Y_{1,2}\left(0\right) = \Delta Y_{0,2}\left(0\right)\]
  • Using the observed trends: \[\Delta Y_{1,2} = \Delta Y_{0,2}\]

We usually call such a test pre-trend testing; a small code sketch of the estimate follows.
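A sketch of how one might compute the pre-trend estimate and its standard error (the function and variable names are hypothetical, purely for illustration):

```python
import numpy as np

def pretrend_estimate(y1, y2, g):
    """Estimate Delta Y_{1,2} - Delta Y_{0,2} and its standard error.

    y1, y2 : arrays of outcomes at t = 1 and t = 2 (pre-treatment);
    g      : array of group indicators, 1 = treated, 0 = control.
    """
    d = y2 - y1                             # unit-level first differences
    d1, d0 = d[g == 1], d[g == 0]
    beta_hat = d1.mean() - d0.mean()        # difference in mean trends
    se = np.sqrt(d1.var(ddof=1) / len(d1) + d0.var(ddof=1) / len(d0))
    return beta_hat, se
```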

The Pre-Trend Test

We now turn to the formal statistical test

  • Let's start with the hypotheses (the conventional test is sketched in code below).
  • Usually we write the null and alternative as \[H_0: \Delta Y_{1,2} - \Delta Y_{0,2} = 0\] \[H_1: \Delta Y_{1,2} - \Delta Y_{0,2} \neq 0\]
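For reference, a minimal sketch of this conventional test under a normal approximation (building on the hypothetical pretrend_estimate() above):

```python
from scipy import stats

def conventional_test(beta_hat, se, alpha=0.05):
    """Two-sided test of H0: Delta Y_{1,2} - Delta Y_{0,2} = 0."""
    t_stat = beta_hat / se
    p_value = 2 * stats.norm.sf(abs(t_stat))   # normal approximation
    return t_stat, p_value, p_value < alpha    # True -> reject H0
```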

This has several problems, which we will now go over.

Problems with conventional testing methods

Burden of proof of treatment effects

Say, for a second, we are testing whether some treatment has some effect.

  • Then the hypothesis \[H_0:\theta=0\] puts the burden of proof on the researcher.
  • That is, we assume that there is no treatment effect
    • (usually the opposite of what the researcher wants to show)
  • And say: assuming there is no effect, let's consider your results

That's how we construct the test statistic:

  • We assume a world where the null is true (or, the treatment doesn't work)
  • And then consider the results

Point 1: conventional testing assumes identification holds, when we want to assume it doesn't.

Type I and type II errors of treatment effects

When testing for treatment effects, we (want to?) control the rate of type I error.

  • Consider the type I and II error table
|           | PT true | PT false |
|-----------|---------|----------|
| Reject PT | \(\mathbb{P}(\text{reject}\mid\text{PT holds})=\alpha\) (type I error) | \(\mathbb{P}(\text{reject}\mid\text{PT fails})=1-\beta\) (power) |
| Accept PT | \(\mathbb{P}(\text{accept}\mid\text{PT holds})=1-\alpha\) | \(\mathbb{P}(\text{accept}\mid\text{PT fails})=\beta\) (type II error) |
  • If we want to mirror the logic of the previous slide
  • We want to control the rate at which we find PT when it doesn't hold
  • That's \(\beta\), the type II error rate

Point 2: conventional testing controls for type I when we want to control for type II.

Pre-trend event study plots

A visual example

  • To build intuition
  • Let's consider the confidence intervals of pre-trends in an event study
  • The exercise:
    • We start from a 99% CI
    • Lower it to 95%, and then 90%
    • While we do this, think about when \(\beta = \mathbb{P}(\text{PT doesn't hold and we accept it})\) increases

A visual example

[Figure series: the event study plot redrawn with 99%, then 95%, then 90% confidence intervals on the pre-treatment estimates]

Equivalence testing

Let's see what we can do

Point 1: conventional testing assumes identification holds, when we want to assume it doesn't.

Point 2: conventional testing controls for type I when we want to control for type II.

  • We want to assume that identification doesn’t hold
  • What about simply switching the hypothesis? \[H_0: \Delta Y_{1,2} - \Delta Y_{0,2} \neq 0\] \[H_1: \Delta Y_{1,2} - \Delta Y_{0,2} = 0\]
  • But with real data this null is (almost) always true: an estimated difference is never exactly zero…
  • Also, which value should we assume under the null when constructing a test statistic?

\(\rightarrow\) next approach please

Second approach

Point 1: conventional testing assumes identification holds, when we want to assume it doesn't.

Point 2: conventional testing controls for type I when we want to control for type II.

  • Okay, so not \(\neq0\), but maybe greater than something?
  • Let's set some parameter \(k\), such that \[H_{0}:\Delta Y_{1,2}-\Delta Y_{0,2}>k\] \[H_1: \Delta Y_{1,2}-\Delta Y_{0,2}\leq k\]
  • Almost there
  • Problem: the difference can be negative, so we need a notion of distance

Third approach

Point 1: conventional testing assumes identification holds, when we want to assume it doesn't.

Point 2: conventional testing controls for type I when we want to control for type II.

  • Need to take into account negative values \(\rightarrow\) absolute difference \[H_{0}:\left|\Delta Y_{1,2}-\Delta Y_{0,2}\right|>k\] \[H_1:\left|\Delta Y_{1,2}-\Delta Y_{0,2}\right|\leq k\]

A solution to our problems?

We say that the two trends are equivalent if they are less than \(k\) apart.

  • That is the stated alternative hypothesis, what the researcher assumes. \[H_1:\left|\Delta Y_{1,2}-\Delta Y_{0,2}\right|\leq k\]
  • The null hypothesis is that this is not true \(\rightarrow\) the difference is bigger than \(k\) / not equivalent \[H_{0}:\left|\Delta Y_{1,2}-\Delta Y_{0,2}\right|>k\]

Hence the first point above is solved. What about controlling for the correct error type?

But first, a statistical test.

Two One-Sided Tests (TOST)

We want to test \(H_{0}:\left|\Delta Y_{1,2}-\Delta Y_{0,2}\right|>k\)

  • We accept the null (and reject equivalence) if \[\underbrace{\Delta Y_{1,2}-\Delta Y_{0,2}>k}_{H_{0,A}}\quad\text{or}\quad\underbrace{\Delta Y_{1,2}-\Delta Y_{0,2}<-k}_{H_{0,B}}\]
  • So if we reject both \(H_{0,A}\) and \(H_{0,B}\), we reject the null that PT doesn't hold

Two one-sided \(t\)-tests

  • We can construct a test statistic for each (one-sided) hypothesis.
  • Denote the estimator of the difference by \(\widehat{\beta}_2\), and construct the test statistic \[T_{A}=\frac{\widehat{\beta}_{2}-k}{\sqrt{\mathbb{V}\left(\widehat{\beta}_{2}\right)}}\quad\text{and}\quad T_{B}=\frac{\widehat{\beta}_{2}+k}{\sqrt{\mathbb{V}\left(\widehat{\beta}_{2}\right)}}\]
  • Can use the one-sided critical value \(t_{\alpha}\) for both (since \(H_0\) is the union of \(H_{0,A}\) and \(H_{0,B}\), no multiplicity correction is needed)
  • If both are unlikely, as in \(T_A < -t_{\alpha}\) and \(T_B > t_{\alpha}\), then \(H_0\) is unlikely (see the sketch below)

\(\rightarrow\) \(\alpha\) now controls what used to be the type II error: the rate at which we conclude PT holds when it doesn't
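A minimal sketch of the TOST under a normal approximation (the function name and defaults are illustrative assumptions, not Hartman and Hidalgo's code):

```python
from scipy import stats

def tost(beta_hat, se, k, alpha=0.05):
    """Reject H0 (|Delta Y_{1,2} - Delta Y_{0,2}| > k) iff BOTH one-sided
    tests reject; returns True when equivalence is concluded."""
    crit = stats.norm.ppf(1 - alpha)        # one-sided critical value t_alpha
    t_a = (beta_hat - k) / se               # tests H_{0,A}: difference >  k
    t_b = (beta_hat + k) / se               # tests H_{0,B}: difference < -k
    return (t_a < -crit) and (t_b > crit)
```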

TOST vs. the conventional test

Hartman and Hidalgo (2018) show that we can do the above test

  • By calculating a single test statistic, and comparing it to uncentered \(t\) distributions above and below zero
  • This way of doing the TOST allows a nice comparison to the prior testing method (tentative graph only…)

TOST vs. the conventional test

Let's visualize this

Again, a visual example

  • To build intuition
  • Let's consider the same event study plots as before

Again, a visual example

This was the plot from before, using \(\alpha = 0.1\)

Again, a visual example

Let's focus on the period \(-5\). The estimated difference is \(-1.1\).

Again, a visual example

If we set \(k=2.5\), both one-sided nulls are rejected, and so the difference is deemed equivalent

Again, a visual example

If we set \(k=1\), one of the one-sided nulls is not rejected, and hence the difference is deemed not equivalent (PT fails)

Again, a visual example

To build more intuition, consider \(t=-6\). Zero is not rejected.

Again, a visual example

If we set \(k = 0.5\), zero is not rejected (not significantly different from zero) but equivalence is also not established (we cannot rule out a meaningful difference); the sketch below reproduces these numbers
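To reproduce the logic of these examples, a quick numeric sketch: the point estimate \(-1.1\) is from the plots above, while the standard error is an assumed value purely for illustration.

```python
from scipy import stats

beta_hat, se = -1.1, 0.6      # estimate from the slides; SE is assumed
crit = stats.norm.ppf(1 - 0.1)              # alpha = 0.1, one-sided

for k in (2.5, 1.0):
    t_a = (beta_hat - k) / se               # H_{0,A}: difference >  k
    t_b = (beta_hat + k) / se               # H_{0,B}: difference < -k
    print(f"k = {k}: equivalent = {(t_a < -crit) and (t_b > crit)}")
# k = 2.5 -> True; k = 1.0 -> False (H_{0,B} cannot be rejected)
```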

Next problem: how to choose \(k\)?

Equivalence range and interval

Hartman and Hidalgo (2018) discuss these tests in the context of balance tables for RCTs

  • First, they suggest that expert domain knowledge is best

In most of our contexts, it seems hard to argue for the correct range

  • So they discuss some default values (sketched in code below):
  • The minimal \(k\) that rejects \(H_0\) at the desired level \(\alpha\) – termed the equivalence confidence interval
  • A default pre-specified range of 0.36 standard deviations of the covariate in the control group – termed the equivalence range
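A sketch of both defaults. Inverting the TOST gives a closed form for the smallest \(k\) at which non-equivalence is rejected: we reject iff \(k > \widehat{\beta}_2 + t_{\alpha}\cdot se\) and \(k > -\widehat{\beta}_2 + t_{\alpha}\cdot se\), i.e. iff \(k > |\widehat{\beta}_2| + t_{\alpha}\cdot se\) (the function names are illustrative):

```python
import numpy as np
from scipy import stats

def equivalence_ci(beta_hat, se, alpha=0.05):
    """Smallest k at which non-equivalence is rejected at level alpha."""
    return abs(beta_hat) + stats.norm.ppf(1 - alpha) * se

def default_equivalence_range(y_control):
    """The 0.36-standard-deviation default, computed in the control group."""
    return 0.36 * np.std(y_control, ddof=1)
```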

Equivalence range and interval

Let's again look at a tentative graph

Equivalence range and interval

Let's focus on the pre-trends

Equivalence range and interval

Get rid of conventional error bars

Equivalence range and interval

Add the equivalence confidence interval: the minimal \(k\) that rejects \(H_0\) of non-equivalence at \(\alpha=0.05\)

Equivalence range and interval

Add the default equivalence range: 0.36 standard deviations in the control group

Closing remarks

More to read

There is some interesting stuff going on in this area

  • We already saw equivalence tests for CIA in RCTs
    • Hartman and Hidalgo (2018)
  • Equiv. testing in DID:
    • Bilinski and Hatfield (2018) have a nice discussion on relaxing PT assumptions in the regression
    • Liu, Wang, and Xu (n.d.) discuss applying Hartman and Hidalgo (2018) to DID and PT in more depth, also combining it with new TWFE / imputation estimators
  • Equiv. testing in RD
    • Hartman (2021) discusses tests with a null of discontinuity in covariates, and has a nice application to RD and close elections

Summary: main pros and cons

Main pro:

  • Correct testing procedure!

Main con:

  • Need to argue for your choice of \(k\)

But, maybe not a con? This is more work for researchers (argh, again with the econometricians producing more work for applied people…)

  • But, allows the researcher to transparently encode the identification assumption and validation test.

And that's it!

Thanks for listening.

All code (and hence slides) is available at Git repo https://github.com/dorlev3/equiv_test_and_identification_talk

References

Bilinski, Alyssa, and Laura A. Hatfield. 2018. “Nothing to See Here? Non-Inferiority Approaches to Parallel Trends and Other Model Assumptions.” arXiv preprint arXiv:1805.03273.
Hartman, Erin. 2021. “Equivalence Testing for Regression Discontinuity Designs.” Political Analysis 29 (4): 505–21.
Hartman, Erin, and F. Daniel Hidalgo. 2018. “An Equivalence Approach to Balance and Placebo Tests.” American Journal of Political Science 62 (4): 1000–1013.
Liu, Licheng, Ye Wang, and Yiqing Xu. n.d. “A Practical Guide to Counterfactual Estimators for Causal Inference with Time-Series Cross-Sectional Data.” American Journal of Political Science. https://doi.org/10.1111/ajps.12723.