Equivalence Tests of Identification Assumptions

Dor Leventer

Fall 2022

Introduction

A Causal Inference Framework

A causal inference framework is made up of the following components:

  • Estimand + Identification Assumptions -> Identification Theorem
  • Id. Theorem -> Estimator
  • Estimator + Statistical Assumptions -> Inference
  • Id. Assumptions -> Validation Tests + Sensitivity Analysis

Today we will focus on validation tests

Motivation

All our causal inference frameworks have assumptions

  • RCT: (conditional) independence of \(Y(w)\) and \(W\)
    • Test: Balance (no mean diff.) on pre-treatment covariates
  • DID: parallel trends of \(Y(0)\) post treatment
    • Test: Trends before treatment aren’t diff.
  • RD: continuity of \(\mathbb{E}[Y(w)]\) around the cutoff
    • Test: Pre-treatment covariates don’t exhibit discon.

Question: how to perform these validation tests?

An Example DID Setup

  • Say we have three time periods \(t\in\{1,2,3\}\)
  • Say we have a treatment group \(G_i=1\) and a control group \(G_i=0\)
  • And for the treatment group, treatment starts at \(t=3\)
  • So we need parallel trends (PT). Define for variable \(X\) \[\Delta X_{g,t}=\mathbb{E}\left[X_{i,t}-X_{i,t-1}\mid G_i=g\right]\]
  • Can write PT of \(Y(0)\) at time \(t=3\) as \[\Delta Y_{1,3}\left(0\right) = \Delta Y_{0,3}\left(0\right)\] (a simulation sketch follows below)
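To make this concrete, here is a minimal simulation sketch (in Python; the numbers and variable names are illustrative assumptions, not taken from the talk's code) of a world where PT holds by construction:

```python
# A minimal sketch: a two-group, three-period panel in which parallel
# trends holds by construction, with treatment switching on at t = 3.
import numpy as np

rng = np.random.default_rng(0)
n = 1_000                                   # units per group (assumed)
tau = 2.0                                   # treatment effect at t = 3 (assumed)

def trends(y):
    """Group-level trends [Delta Y_{g,2}, Delta Y_{g,3}] from an (n, 3) array."""
    return (y[:, 1:] - y[:, :-1]).mean(axis=0)

time_trend = np.array([0.0, 1.0, 2.0])      # common trend -> PT holds for Y(0)
y_control = time_trend + rng.normal(size=(n, 3))
y_treated = 0.5 + time_trend + rng.normal(size=(n, 3))   # group fixed effect
y_treated[:, 2] += tau                      # treatment turns on at t = 3

print("control trends:", trends(y_control))  # approx. [1, 1]
print("treated trends:", trends(y_treated))  # approx. [1, 1 + tau]
```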

Testing PT

  • We can never test PT directly, since \(Y(0)\) is unobserved for \(G_i=1\) at \(t=3\).
  • Hence, we usually conduct a suggestive test on trends before treatment
  • If we assume no anticipation, i.e., \(\forall t<3:\ Y_{i,t}=Y_{i,t}(0)\)
  • Then we can test the analogue of PT at \(t=2\), \[\Delta Y_{1,2}\left(0\right) = \Delta Y_{0,2}\left(0\right)\]
  • Using the observed trends: \[\Delta Y_{1,2} = \Delta Y_{0,2}\]

We usually call such a test pre-trend testing; a small code sketch of the estimate follows.
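A sketch of how one might compute the pre-trend estimate and its standard error (the function and variable names are hypothetical, purely for illustration):

```python
import numpy as np

def pretrend_estimate(y1, y2, g):
    """Estimate Delta Y_{1,2} - Delta Y_{0,2} and its standard error.

    y1, y2 : arrays of outcomes at t = 1 and t = 2 (pre-treatment);
    g      : array of group indicators, 1 = treated, 0 = control.
    """
    d = y2 - y1                             # unit-level first differences
    d1, d0 = d[g == 1], d[g == 0]
    beta_hat = d1.mean() - d0.mean()        # difference in mean trends
    se = np.sqrt(d1.var(ddof=1) / len(d1) + d0.var(ddof=1) / len(d0))
    return beta_hat, se
```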

The Pre-Trend Test

We now turn to the formal statistical test

  • Let's start with the hypotheses (the conventional test is sketched in code below).
  • Usually we write the null and alternative as \[H_0: \Delta Y_{1,2} - \Delta Y_{0,2} = 0\] \[H_1: \Delta Y_{1,2} - \Delta Y_{0,2} \neq 0\]
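For reference, a minimal sketch of this conventional test under a normal approximation (building on the hypothetical pretrend_estimate() above):

```python
from scipy import stats

def conventional_test(beta_hat, se, alpha=0.05):
    """Two-sided test of H0: Delta Y_{1,2} - Delta Y_{0,2} = 0."""
    t_stat = beta_hat / se
    p_value = 2 * stats.norm.sf(abs(t_stat))   # normal approximation
    return t_stat, p_value, p_value < alpha    # True -> reject H0
```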

This has several problems, which we will now go over.

Problems with conventional testing methods

Burden of proof of treatment effects

Say, for a second, we are testing whether some treatment has some effect.

  • Then the hypothesis \[H_0:\theta=0\] puts the burden of proof on the researcher.
  • That is, we assume that there is no treatment effect
    • (usually the opposite of what the researcher wants to show)
  • And say: assuming there is no effect, let's consider your results

That's how we construct the test statistic:

  • We assume a world where the null is true (or, the treatment doesn't work)
  • And then consider the results

Point 1: conventional testing assumes identification holds, when we want to assume it doesn't.

Type I and type II errors of treatment effects

When testing for treatment effects, we (want to?) control the rate of type I error.

  • Consider the type I and II error table
|           | PT true | PT false |
|-----------|---------|----------|
| Reject PT | \(\mathbb{P}(\text{reject}\mid\text{PT holds})=\alpha\) (type I error) | \(\mathbb{P}(\text{reject}\mid\text{PT fails})=1-\beta\) (power) |
| Accept PT | \(\mathbb{P}(\text{accept}\mid\text{PT holds})=1-\alpha\) | \(\mathbb{P}(\text{accept}\mid\text{PT fails})=\beta\) (type II error) |
  • If we want to mirror the logic of the previous slide
  • We want to control the rate at which we find PT when it doesn't hold
  • That's \(\beta\), the type II error rate

Point 2: conventional testing controls for type I when we want to control for type II.

Pre-trend event study plots

A visual example

  • To build intuition
  • Let's consider the confidence intervals of pre-trends in an event study
  • The exercise:
    • We start from a 99% CI
    • Lower it to 95%, and then 90%
    • While we do this, think about when \(\beta = \mathbb{P}(\text{PT doesn't hold and we accept it})\) increases

A visual example

[Figure series: the event study plot redrawn with 99%, then 95%, then 90% confidence intervals on the pre-treatment estimates]

Equivalence testing

Let's see what we can do

Point 1: conventional testing assumes identification holds, when we want to assume it doesn't.

Point 2: conventional testing controls for type I when we want to control for type II.

  • We want to assume that identification doesn’t hold
  • What about simply switching the hypothesis? \[H_0: \Delta Y_{1,2} - \Delta Y_{0,2} \neq 0\] \[H_1: \Delta Y_{1,2} - \Delta Y_{0,2} = 0\]
  • But with real data this null is (almost) always true: an estimated difference is never exactly zero…
  • Also, which value should we assume under the null when constructing a test statistic?

\(\rightarrow\) next approach please

Second approach

Point 1: conventional testing assumes identification holds, when we want to assume it doesn't.

Point 2: conventional testing controls for type I when we want to control for type II.

  • Okay, so not \(\neq0\), but maybe greater than something?
  • Let's set some parameter \(k\), such that \[H_{0}:\Delta Y_{1,2}-\Delta Y_{0,2}>k\] \[H_1: \Delta Y_{1,2}-\Delta Y_{0,2}\leq k\]
  • Almost there
  • Problem: the difference can be negative, so we need a notion of distance

Third approach

Point 1: conventional testing assumes identification holds, when we want to assume it doesn't.

Point 2: conventional testing controls for type I when we want to control for type II.

  • Need to take into account negative values \(\rightarrow\) absolute difference \[H_{0}:\left|\Delta Y_{1,2}-\Delta Y_{0,2}\right|>k\] \[H_1:\left|\Delta Y_{1,2}-\Delta Y_{0,2}\right|\leq k\]

A solution to our problems?

We say that the two trends are equivalent if they are less than \(k\) apart.

  • That is the stated alternative hypothesis, what the researcher assumes. \[H_1:\left|\Delta Y_{1,2}-\Delta Y_{0,2}\right|\leq k\]
  • The null hypothesis is that this is not true \(\rightarrow\) the difference is bigger than \(k\) / not equivalent \[H_{0}:\left|\Delta Y_{1,2}-\Delta Y_{0,2}\right|>k\]

Hence the first point above is solved. What about controlling for the correct error type?

But first, a statistical test.

Two One-Sided Tests (TOST)

We want to test \(H_{0}:\left|\Delta Y_{1,2}-\Delta Y_{0,2}\right|>k\)

  • We accept the null (and reject equivalence) if \[\underbrace{\Delta Y_{1,2}-\Delta Y_{0,2}>k}_{H_{0,A}}\quad\text{or}\quad\underbrace{\Delta Y_{1,2}-\Delta Y_{0,2}<-k}_{H_{0,B}}\]
  • So if we reject both \(H_{0,A}\) and \(H_{0,B}\), we reject the null that PT doesn't hold

Two one-sided \(t\)-tests

  • We can construct a test statistic for each (one-sided) hypothesis.
  • Denote the estimator of the difference by \(\widehat{\beta}_2\), and construct the test statistic \[T_{A}=\frac{\widehat{\beta}_{2}-k}{\sqrt{\mathbb{V}\left(\widehat{\beta}_{2}\right)}}\quad\text{and}\quad T_{B}=\frac{\widehat{\beta}_{2}+k}{\sqrt{\mathbb{V}\left(\widehat{\beta}_{2}\right)}}\]
  • Can use the one-sided critical value \(t_{\alpha}\) for both (since \(H_0\) is the union of \(H_{0,A}\) and \(H_{0,B}\), no multiplicity correction is needed)
  • If both are unlikely, as in \(T_A < -t_{\alpha}\) and \(T_B > t_{\alpha}\), then \(H_0\) is unlikely (see the sketch below)

\(\rightarrow\) \(\alpha\) now controls what used to be the type II error: the rate at which we conclude PT holds when it doesn't
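A minimal sketch of the TOST under a normal approximation (the function name and defaults are illustrative assumptions, not Hartman and Hidalgo's code):

```python
from scipy import stats

def tost(beta_hat, se, k, alpha=0.05):
    """Reject H0 (|Delta Y_{1,2} - Delta Y_{0,2}| > k) iff BOTH one-sided
    tests reject; returns True when equivalence is concluded."""
    crit = stats.norm.ppf(1 - alpha)        # one-sided critical value t_alpha
    t_a = (beta_hat - k) / se               # tests H_{0,A}: difference >  k
    t_b = (beta_hat + k) / se               # tests H_{0,B}: difference < -k
    return (t_a < -crit) and (t_b > crit)
```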

TOST vs. the conventional test

Hartman and Hidalgo (2018) show that we can do the above test

  • By calculating a single test statistic, and comparing it to uncentered \(t\) distributions above and below zero
  • This way of doing the TOST allows a nice comparison to the prior testing method (tentative graph only…)

TOST vs. the conventional test

Let's visualize this

Again, a visual example

  • To build intuition
  • Let's consider the same event study plots as before

Again, a visual example

This was the plot from before, using \(\alpha = 0.1\)

Again, a visual example

Let's focus on the period \(-5\). The estimated difference is \(-1.1\).

Again, a visual example

If we set \(k=2.5\), both one-sided nulls are rejected, and so the difference is deemed equivalent

Again, a visual example

If we set \(k=1\), one of the one-sided nulls is not rejected, and hence the difference is deemed not equivalent (PT fails)

Again, a visual example

To build more intuition, consider \(t=-6\). Zero is not rejected.

Again, a visual example

If we set \(k = 0.5\), zero is not rejected (not significantly different from zero) but equivalence is also not established (we cannot rule out a meaningful difference); the sketch below reproduces these numbers
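To reproduce the logic of these examples, a quick numeric sketch: the point estimate \(-1.1\) is from the plots above, while the standard error is an assumed value purely for illustration.

```python
from scipy import stats

beta_hat, se = -1.1, 0.6      # estimate from the slides; SE is assumed
crit = stats.norm.ppf(1 - 0.1)              # alpha = 0.1, one-sided

for k in (2.5, 1.0):
    t_a = (beta_hat - k) / se               # H_{0,A}: difference >  k
    t_b = (beta_hat + k) / se               # H_{0,B}: difference < -k
    print(f"k = {k}: equivalent = {(t_a < -crit) and (t_b > crit)}")
# k = 2.5 -> True; k = 1.0 -> False (H_{0,B} cannot be rejected)
```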

Next problem: how to choose \(k\)?

Equivalence range and interval

Hartman and Hidalgo (2018) discuss these tests in the context of balance tables for RCTs

  • First, they suggest that expert domain knowledge is best

In most of our contexts, it seems hard to argue for the correct range

  • So they discuss some default values (sketched in code below):
  • The minimal \(k\) that rejects \(H_0\) at the desired level \(\alpha\) – termed the equivalence confidence interval
  • A default pre-specified range of 0.36 standard deviations of the covariate in the control group – termed the equivalence range
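A sketch of both defaults. Inverting the TOST gives a closed form for the smallest \(k\) at which non-equivalence is rejected: we reject iff \(k > \widehat{\beta}_2 + t_{\alpha}\cdot se\) and \(k > -\widehat{\beta}_2 + t_{\alpha}\cdot se\), i.e. iff \(k > |\widehat{\beta}_2| + t_{\alpha}\cdot se\) (the function names are illustrative):

```python
import numpy as np
from scipy import stats

def equivalence_ci(beta_hat, se, alpha=0.05):
    """Smallest k at which non-equivalence is rejected at level alpha."""
    return abs(beta_hat) + stats.norm.ppf(1 - alpha) * se

def default_equivalence_range(y_control):
    """The 0.36-standard-deviation default, computed in the control group."""
    return 0.36 * np.std(y_control, ddof=1)
```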

Equivalence range and interval

Let's again look at a tentative graph

Equivalence range and interval

Let's focus on the pre-trends

Equivalence range and interval

Get rid of conventional error bars

Equivalence range and interval

Add the equivalence confidence interval: the minimal \(k\) that rejects \(H_0\) of non-equivalence at \(\alpha=0.05\)

Equivalence range and interval

Add the default equivalence range: 0.36 standard deviations in the control group

Closing remarks

More to read

There is some interesting stuff going on in this area

  • We already saw equivalence tests for CIA in RCTs
    • Hartman and Hidalgo (2018)
  • Equiv. testing in DID:
    • Bilinski and Hatfield (2018) have a nice discussion on relaxing PT assumptions in the regression
    • Liu, Wang, and Xu (n.d.) discuss applying Hartman and Hidalgo (2018) to DID and PT in more depth, also combining it with new TWFE / imputation estimators
  • Equiv. testing in RD
    • Hartman (2021) discusses tests with a null of discontinuity in covariates, and has a nice application to RD and close elections

Summary: main pros and cons

Main pro:

  • Correct testing procedure!

Main con:

  • Need to argue for your choice of \(k\)

But, maybe not a con? This is more work for researchers (argh, again with the econometricians producing more work for applied people…)

  • But, allows the researcher to transparently encode the identification assumption and validation test.

And that's it!

Thanks for listening.

All code (and hence slides) is available at Git repo https://github.com/dorlev3/equiv_test_and_identification_talk

References

Bilinski, Alyssa, and Laura A. Hatfield. 2018. “Nothing to See Here? Non-Inferiority Approaches to Parallel Trends and Other Model Assumptions.” arXiv preprint arXiv:1805.03273.
Hartman, Erin. 2021. “Equivalence Testing for Regression Discontinuity Designs.” Political Analysis 29 (4): 505–21.
Hartman, Erin, and F. Daniel Hidalgo. 2018. “An Equivalence Approach to Balance and Placebo Tests.” American Journal of Political Science 62 (4): 1000–1013.
Liu, Licheng, Ye Wang, and Yiqing Xu. n.d. “A Practical Guide to Counterfactual Estimators for Causal Inference with Time-Series Cross-Sectional Data.” American Journal of Political Science. https://doi.org/10.1111/ajps.12723.