Estimation and Testing of Direct Effects in Randomized Experiments With Non-Smooth General Interference

class: center, top, .title-slide, title-slide

.title[
# Estimation and Testing of Direct Effects in Randomized Experiments With Non-Smooth General Interference
]
.subtitle[
## Preliminary Exam Presentation<br>.small[Slides online at <a href="https://rpubs.com/rmtrane/prelim-presentation">https://rpubs.com/rmtrane/prelim-presentation</a>]
]
.author[
### Ralph Møller Trane
]
.institute[
### University of Wisconsin–Madison<br><br>
]
.date[
### 2022-05-17<br>.small[(last compiled: 2022-05-17)]
]

---

# Introductions

* Ralph Trane

* 4th year PhD student in the Department of Statistics

* Born and raised in Denmark, alumnus of the University of Copenhagen.

* Previously worked as an Assistant Researcher in the Department of Ophthalmology and Visual Sciences here at UW--Madison under Karen Cruickshank.

* Interested in estimation of causal effects under interference, the use of interactive visuals to communicate ideas, and more...

* Previous project: Nonparametric Bounds in Two-Sample Summary-Data Mendelian Randomization Studies.
    * Published earlier this year in Statistics in Medicine [(Trane and Kang, 2022)](#referencescont)

---

# Overview

Part I: Background

1. Setup of Potential Outcomes in the Presence of Interference
    * general potential outcomes, interference graph
    * estimand and estimation

2. Random Graph Asymptotics
    * introduction of the graphon, potential outcomes model as per [Li and Wager (2021)](#referencescont)
    * asymptotic normality

Part II: Original Work

3. Improving/building on [Li and Wager (2021)](#referencescont)
    * more efficient estimator
    * more inclusive model

4. Summary/Future Work

---
layout: true

# Part I: General Setup

---

**Potential Outcomes in the Presence of General Interference**

We are interested in a *population* of subjects `$i = 1, ..., n$`

Each subject is randomly assigned a *treatment* `$W_i \in \{0,1\}$` as `$W_i \overset{\text{iid}}{\sim} \text{Bernoulli}(\pi)$` `$(\pi \in (0,1))$`

For each subject, an *outcome* `$Y_i$` is observed. This observed outcome is one of many *potential outcomes* `$Y_i(\boldsymbol{w}) \in \mathbb{R}$`, which we assume exist and are **fixed** for all possible treatment assignment vectors `$\boldsymbol{w} \in \{0,1\}^n$`.

Assume a *generalized SUTVA*, i.e. `$Y_i = \sum_{\boldsymbol{w'} \in \{0,1\}^n} 1[\boldsymbol{W} = \boldsymbol{w}'] Y_i(\boldsymbol{w}')$`.

We use `$Y_i(w_i, \boldsymbol{w}_{-i})$` to denote the potential outcome for subject `$i$` when subject `$i$` is assigned `$w_i$` and all other subjects are assigned `$\boldsymbol{w}_{-i} \in \{0,1\}^{n-1}$`.

**Interference Graph**

In parallel to the potential outcomes, we introduce the notion of an *interference graph*.

An interference graph `$\mathcal{G}$` consists of one node (vertex) for each of the `$n$` subjects, and an adjacency matrix `$\boldsymbol{E} = \{E_{ij}\}_{i,j=1}^n$`, where `$E_{ij} = 1$` if `$W_j$` influences `$Y_i(\boldsymbol{W})$`, and `$E_{ij} = 0$` otherwise.

(Formally, `$Y_i(w, \boldsymbol{w}_{-i}) = Y_i(w, \boldsymbol{w}_{-i}')$` for all `$\boldsymbol{w}_{-i}, \boldsymbol{w}_{-i}' \in \{0,1\}^{n-1}$` where `$w_j = w_j'$` for all `$j$` with `$E_{ij} = 1$`.)

---

**No Interference Example**: the effect of a drug (for example, aspirin) on disease status (headache).

No interference seems reasonable, so `$E_{ij} = 0$` for all `$i \neq j$`.

Each individual has only two potential outcomes because `$Y_i(w_i, \boldsymbol{w}_{-i}) = Y_i(w_i)$`.

An intuitive estimand: Average Treatment Effect (ATE) = `$\frac{1}{n} \sum_{i=1}^n \left(Y_i(1) - Y_i(0)\right)$`.

---

**Interference Example**: the effect of vaccination status on disease status.

No interference is unlikely. `$E_{ij} = 1$` if `$i$` and `$j$` often spend time together.

Here, the previous definition of the ATE is not directly applicable, because `$Y_i(1)$` and `$Y_i(0)$` are no longer well defined. [Hudgens and Halloran (2008)](#references) provide a nice generalization of the ATE:

`\begin{align}
\bar{\tau}_\text{DIR} = \frac{1}{n} \sum_{i=1}^n \left(\mathbb{E}[Y_i(1, \boldsymbol{W}_{-i})] - \mathbb{E}[Y_i (0, \boldsymbol{W}_{-i})]\right).
\end{align}`

When potential outcomes are considered fixed, the expectation is taken over the treatment assignment.

If there is no interference, `$\bar{\tau}_\text{DIR} = \text{ATE}$` because `$\mathbb{E}[Y_i(1, \boldsymbol{W}_{-i})] = \mathbb{E}[Y_i(1)] = Y_i(1)$`.

Also, in this scenario, we could consider multiple causal estimands, for example the indirect effect of treating a larger or smaller part of the population. We defer this question to another time.

---

## Estimand & Estimation

As hinted at on the previous slide, we are interested in estimating what we will refer to as the *direct effect*:

`\begin{equation}
\bar{\tau}_\text{DIR} = \frac{1}{n} \sum_{i=1}^n \mathbb{E}[Y_i(1, \boldsymbol{W}_{-i}) - Y_i (0, \boldsymbol{W}_{-i})].
\end{equation}`

Nice result: an unbiased estimator for `$\bar{\tau}_\text{DIR}$` is the well-known Horvitz-Thompson estimator:

`\begin{equation}
\hat{\tau}_\text{DIR}^\text{HT} = \frac{1}{n} \sum_{i=1}^n \left(\frac{Y_i W_i}{\pi} - \frac{Y_i(1-W_i)}{1-\pi}\right).
\end{equation}`

This is unbiased for the ATE if no interference is present, and for `$\bar{\tau}_\text{DIR}$` if interference is present.
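As a minimal sketch of the Horvitz-Thompson formula above (the function name `horvitz_thompson` and the toy data are made up for illustration, and `pi` is assumed known by design):

```python
import numpy as np

def horvitz_thompson(y, w, pi):
    """Horvitz-Thompson estimate of the direct effect under Bernoulli(pi) assignment."""
    y = np.asarray(y, dtype=float)
    w = np.asarray(w, dtype=float)
    # Treated outcomes are weighted by 1/pi, control outcomes by 1/(1 - pi).
    return np.mean(y * w / pi - y * (1.0 - w) / (1.0 - pi))

# Toy data (hypothetical values, for illustration only).
y = np.array([3.0, 1.0, 4.0, 2.0])   # observed outcomes Y_i
w = np.array([1, 0, 1, 0])           # treatment assignments W_i
tau_hat = horvitz_thompson(y, w, pi=0.5)
```

The estimator only uses the observed `$Y_i$`, `$W_i$`, and `$\pi$`; it never needs the interference graph.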

---

## Open Questions

So, we have an unbiased estimator for our estimand of interest. What we do not have is

... an unbiased variance estimator of the HT estimator

... asymptotic normality of the HT estimator in general

---
layout: false

# Part I: Possible Answers

[Hudgens and Halloran (2008)](#references): assume partial and stratified interference to get conservative variance estimators and finite-sample confidence intervals.

[Liu, Hudgens, and Becker-Dreps (2016)](#references): assume partial interference with groups being random draws from a superpopulation to get asymptotic normality of the Inverse Probability-Weighted estimator and a Hájek-type estimator.

Both heavily restrict the interference structure.

**Example**: power plant emission data.

[Papadogeorgou, Mealli, and Zigler (2019)](#references) consider power plant emission reduction technology.

One can imagine that power plant A influences power plant B, which in turn influences power plant C.

---
layout: true

# Part I: Random Interference Graph

---

[Li and Wager (2021)](#referencescont) show how viewing the interference graph as a random draw from a graphon can help get asymptotic results.

This is done under the following assumptions on the interference graph.

**Assumption 1**: `$E_{ij} = E_{ji}$`

**Assumption 2**: `$\mathcal{G}$` is randomly generated as follows:
* each individual has a random latent position `$X_i \overset{\text{iid}}{\sim} \text{Uniform}(0,1)$`
* `$G_n: [0,1]^2 \mapsto [0,1]$` is a symmetric function
* probability an edge exists based on the latent positions: `$P(E_{ij} = 1 | X_i, X_j) = G_n(X_i, X_j)$`, `$i < j$`

`$G_n$` is called a *graphon*

**Assumption 3**: `$G_n(X_i, X_j) = \min\{1, \rho_n G(X_i,X_j)\}$` where 
* `$G(\cdot, \cdot): [0,1]^2 \mapsto [0, \infty)$`
* `$0 < \rho_n \le 1$` such that either `$\rho_n = 1$` or `$\rho_n \to 0$` and `$\rho_n n \to \infty$`.
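A minimal sketch of how a graph could be drawn under Assumptions 1-3. The function name, the seed handling, and the example graphon `$G(x, y) = (x + y)/2$` are illustrative assumptions, not taken from Li and Wager (2021):

```python
import numpy as np

def sample_interference_graph(n, graphon, rho_n=1.0, seed=None):
    """Draw latent positions X_i and a symmetric adjacency matrix E."""
    rng = np.random.default_rng(seed)
    x = rng.uniform(0.0, 1.0, size=n)                # X_i ~ iid Uniform(0, 1)
    # Edge probabilities G_n(X_i, X_j) = min{1, rho_n * G(X_i, X_j)}.
    probs = np.minimum(1.0, rho_n * graphon(x[:, None], x[None, :]))
    # Draw edges for i < j only, then mirror them so that E_ij = E_ji.
    upper = np.triu(rng.uniform(size=(n, n)) < probs, k=1)
    e = (upper | upper.T).astype(int)                # diagonal stays zero
    return x, e

# Hypothetical symmetric graphon, used only for illustration.
x, e = sample_interference_graph(50, lambda a, b: (a + b) / 2, rho_n=0.5, seed=1)
```

Smaller values of `rho_n` give sparser graphs, which is the role `$\rho_n$` plays in Assumption 3.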

---

We also need a few assumptions on the potential outcomes.

**Assumption 4**: `$Y_i(w, \boldsymbol{w}_{-i}) = f_i(w, X_i, M_i / N_i; \epsilon_i)$` where `$M_i = \sum_{j \neq i} E_{ij} W_j$` and `$N_i = \sum_{j\neq i}E_{ij}$`.

(This is similar to the stratified interference assumption made by [Hudgens and Halloran (2008)](#references))
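To make the exposure `$M_i / N_i$` concrete, here is a small sketch on a path graph with four nodes. The function name is hypothetical, and returning `NaN` for subjects without neighbours is an assumption made here, since `$M_i / N_i$` is undefined when `$N_i = 0$`:

```python
import numpy as np

def treated_neighbor_fraction(e, w):
    """Return M_i, N_i, and M_i / N_i for every subject."""
    e = np.asarray(e)
    w = np.asarray(w)
    m = e @ w                      # M_i = sum_{j != i} E_ij W_j (diagonal of E is zero)
    n_i = e.sum(axis=1)            # N_i = sum_{j != i} E_ij
    frac = np.full(len(w), np.nan) # NaN where N_i = 0 (assumption, see lead-in)
    has_neighbors = n_i > 0
    frac[has_neighbors] = m[has_neighbors] / n_i[has_neighbors]
    return m, n_i, frac

# Path graph 0 - 1 - 2 - 3 with subjects 0 and 2 treated.
e = np.array([[0, 1, 0, 0],
              [1, 0, 1, 0],
              [0, 1, 0, 1],
              [0, 0, 1, 0]])
w = np.array([1, 0, 1, 0])
m, n_i, frac = treated_neighbor_fraction(e, w)
```

Here `frac` is `[0, 1, 0, 1]`: the untreated subjects 1 and 3 have only treated neighbours, and vice versa.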

**Assumption 5**: `$f_i$` is three times differentiable, and `$|f_i|, |f_i'|, |f_i''|, |f_i'''| \le B$`, where the derivatives are taken with respect to `$M_i / N_i$`.

**Central Limit Theorem**: under Assumptions 1-5 and some mild assumptions on the graphon,

`\begin{equation}
\sqrt{n}(\hat{\tau}_\text{DIR}^\text{HT} - \bar{\tau}_\text{DIR}) \to N(0, \pi(1-\pi)\mathbb{E}[(R_i + Q_i)^2]).
\end{equation}`

`$R_i$`: would show up if no interference is present; depends on `$f_i$`

`$Q_i$`: additional term due to interference; depends on first derivatives of `$f_i$`

Now, a natural question is: can we do better (be more efficient) than the somewhat simple Horvitz-Thompson estimator?

---
layout: true

# Part II: Improving/Building on Li and Wager (2021)

## More Efficient Estimator (?)

---

[Li and Wager (2021)](#referencescont) considered the well-known and intuitive Horvitz-Thompson estimator.

Could there be a more efficient estimator out there?

The idea: lower the variance due to interference by "projecting out" the effect of the interference graph.

Suggestion: fit a model `$\hat{Y}$` that predicts the potential outcomes from latent positions. Then estimate the direct effect `$\bar{\tau}_\text{DIR}$` using

`\begin{equation}
\hat{\tau}^\text{new} = \frac{1}{n} \sum_{i=1}^n \left(\frac{W_i(Y_i - \hat{Y}_i(1))}{\pi} + \hat{Y}_i(1)\right) - \frac{1}{n} \sum_{i=1}^n \left(\frac{(1-W_i)(Y_i - \hat{Y}_i(0))}{1-\pi}   + \hat{Y}_i(0)\right)
\end{equation}`

where `$\hat{Y}_i(w)$` is the predicted potential outcome of individual `$i$` had they received treatment `$w$`.

The latent positions are unknown, so in practice we use estimates of them.
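A minimal sketch of `$\hat{\tau}^\text{new}$`, taking the predictions `$\hat{Y}_i(1)$` and `$\hat{Y}_i(0)$` as given. The function name and the toy numbers are hypothetical:

```python
import numpy as np

def adjusted_estimator(y, w, pi, yhat1, yhat0):
    """Regression-adjusted estimate of the direct effect."""
    y, w = np.asarray(y, dtype=float), np.asarray(w, dtype=float)
    yhat1, yhat0 = np.asarray(yhat1, dtype=float), np.asarray(yhat0, dtype=float)
    # Inverse-probability-weighted residuals plus the model predictions.
    treated = w * (y - yhat1) / pi + yhat1
    control = (1.0 - w) * (y - yhat0) / (1.0 - pi) + yhat0
    return np.mean(treated) - np.mean(control)

# Toy data (hypothetical values, for illustration only).
y = np.array([3.0, 1.0, 4.0, 2.0])
w = np.array([1, 0, 1, 0])
tau_new = adjusted_estimator(y, w, 0.5, yhat1=np.full(4, 3.0), yhat0=np.full(4, 1.0))
tau_no_model = adjusted_estimator(y, w, 0.5, yhat1=np.zeros(4), yhat0=np.zeros(4))
```

With all predictions set to zero the formula reduces to the Horvitz-Thompson estimator, so any difference between the two comes from the fitted model.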

---

### Early Simulation Results

Five different potential outcome models.

Latent positions: `$X_i \overset{\text{iid}}{\sim} \text{Uniform}(0,1)$`.

Graphon: `$G_n(X_i, X_j) = (3/10 + 3/5\cdot 1[X_i > 0.5])\cdot(3/10 + 3/5\cdot 1[X_j > 0.5])$`.

Two predictive models for `$\hat{Y}$`:

1. use the actual potential outcome model with actual latent positions (oracle)
2. linear model with formula `Y ~ W*(E1 + ... + E10)` where `E1, ..., E10` are the first `$10$` eigenvectors of the adjacency matrix.
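The second predictive model can be sketched as follows. This assumes that "first" eigenvectors means those with the largest eigenvalues in absolute value, and that the formula expands to an intercept, `W`, the eigenvectors, and their interactions with `W`; the function names and the toy graph are hypothetical, and `k = 3` is used instead of 10 to keep the example small:

```python
import numpy as np

def leading_eigenvectors(e, k):
    """Eigenvectors of the adjacency matrix with the k largest |eigenvalues|."""
    vals, vecs = np.linalg.eigh(np.asarray(e, dtype=float))
    order = np.argsort(-np.abs(vals))[:k]
    return vecs[:, order]

def fit_predict_potential_outcomes(y, w, features):
    """Least-squares fit of Y ~ W * (E1 + ... + Ek); returns Yhat(1), Yhat(0)."""
    n = len(y)
    ones = np.ones((n, 1))

    def design(w_vec):
        w_col = w_vec.reshape(-1, 1)
        # Columns: intercept, W, eigenvectors, W-by-eigenvector interactions.
        return np.hstack([ones, w_col, features, w_col * features])

    coef, *_ = np.linalg.lstsq(design(w.astype(float)), y, rcond=None)
    return design(np.ones(n)) @ coef, design(np.zeros(n)) @ coef

# Toy graph and outcomes (hypothetical, for illustration only).
rng = np.random.default_rng(0)
n = 40
upper = np.triu(rng.uniform(size=(n, n)) < 0.3, k=1)
e = (upper | upper.T).astype(int)
w = rng.integers(0, 2, size=n)
y = 1.0 + 2.0 * w
features = leading_eigenvectors(e, k=3)
yhat1, yhat0 = fit_predict_potential_outcomes(y, w, features)
```

The resulting `yhat1` and `yhat0` are what would be plugged into `$\hat{\tau}^\text{new}$`.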

---
layout: true

# Part II: Improving/Building on Li and Wager (2021)

## More inclusive model

---

The model proposed by [Li and Wager (2021)](#referencescont) excludes models that might seem plausible.
 
**Example**: power plant emission data.

[Papadogeorgou, Mealli, and Zigler (2019)](#references) consider power plant emission reduction technology.

Distance between power plants might influence the spillover effect.

Peer effects might not be symmetrical.

The spillover effect might not be smooth enough for [Li and Wager (2021)](#referencescont).

---

We propose a slightly tweaked model, and are working to show that the same `$\sqrt{n}$`-convergence result holds.

.pull-left[
For the potential outcomes, assume

`\begin{align}
&Y_i(w, \boldsymbol{w}_{-i}) \\
&\quad = f_i(w, X_i, \pi; \epsilon_i) \\
&\qquad + \frac{1}{n\rho_n} \sum_{j \neq i} A_i(w, \boldsymbol{E}_i, X_i, X_j; \alpha_i) E_{ij} (w_j - \pi) \\
&\qquad + \left(\frac{1}{n\rho_n} \sum_{j \neq i} B_i(w, \boldsymbol{E}_i, X_i, X_j; \beta_i) E_{ij} (w_j - \pi)\right)^2 \\
&\qquad + r_i(w, \boldsymbol{E}_i, (\boldsymbol{EW})_i, X_i; \eta_i)
\end{align}`
]

.pull-right[
Compare to the Taylor expansion
`\begin{align}
&f_i(w, X_i, M_i/N_i; \epsilon_i) \\
&\quad = f_i(w, X_i, \pi; \epsilon_i) \\
&\qquad + f_i'(w, X_i, \pi; \epsilon_i)(M_i / N_i - \pi) \\
&\qquad + \frac{1}{2}f_i''(w, X_i, \pi; \epsilon_i)(M_i / N_i - \pi)^2 \\
&\qquad + \frac{1}{6}f_i'''(w, X_i, \pi_i^*; \epsilon_i)(M_i / N_i - \pi)^3
\end{align}`

Remember: `$M_i = \sum_{j\neq i} E_{ij}W_j$`.
]

---

We propose a slightly tweaked model, and are working to show that the same `$\sqrt{n}$`-convergence result holds.

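.pull-left[
For the potential outcomes, assume

`\begin{align}
&Y_i(w, \boldsymbol{w}_{-i}) \\
&\quad = f_i(w, X_i, \pi; \epsilon_i) \\
&\qquad + \frac{1}{n\rho_n} \sum_{j \neq i} A_i(w, \boldsymbol{E}_i, X_i, X_j; \alpha_i) E_{ij} (w_j - \pi) \\
&\qquad + \left(\frac{1}{n\rho_n} \sum_{j \neq i} B_i(w, \boldsymbol{E}_i, X_i, X_j; \beta_i) E_{ij} (w_j - \pi)\right)^2 \\
&\qquad + r_i(w, \boldsymbol{E}_i, (\boldsymbol{EW})_i, X_i; \eta_i)
\end{align}`
]
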
.pull-right[
Compare to this rewrite of the Taylor expansion
`\begin{align}
&f_i(w, X_i, M_i/N_i; \epsilon_i) \\
&\quad = f_i(w, X_i, \pi; \epsilon_i) \\
&\qquad + \frac{1}{n\rho_n} \sum_{j\neq i} \frac{f_i'(w, X_i, \pi; \epsilon_i)n\rho_n}{N_i} E_{ij}(W_j - \pi) \\
&\qquad + \left(\frac{1}{n\rho_n} \sum_{j\neq i} \frac{\sqrt{\tfrac{1}{2}f_i''(w, X_i, \pi; \epsilon_i)}n\rho_n}{N_i}E_{ij}(W_j - \pi)\right)^2 \\
&\qquad + \frac{1}{6}f_i'''(w, X_i, \pi_i^*; \epsilon_i)(M_i / N_i - \pi)^3
\end{align}`

Remember: `$M_i = \sum_{j\neq i} E_{ij}W_j$`.
]

---

We propose a slightly tweaked model, and are working to show that the same `$\sqrt{n}$`-convergence result holds.

.pull-left[
For the potential outcomes, assume

`\begin{align}
&Y_i(w, \boldsymbol{w}_{-i}) \\
&\quad = f_i(w, X_i, \pi; \epsilon_i) \\
&\qquad + \frac{1}{n\rho_n} \sum_{j \neq i} \color{#56B4E9}{A_i(w, \boldsymbol{E}_i, X_i, X_j; \alpha_i)} E_{ij} (w_j - \pi) \\ 
&\qquad + \left(\frac{1}{n\rho_n} \sum_{j \neq i} \color{#009E73}{B_i(w, \boldsymbol{E}_i, X_i, X_j; \beta_i)} E_{ij} (w_j - \pi)\right)^2 \\
&\qquad + \color{#E69F00}{r_i(w, \boldsymbol{E}_i, (\boldsymbol{EW})_i, X_i; \eta_i)}
\end{align}`

]

.pull-right[
Compare to this rewrite of the Taylor expansion
`\begin{align}
&f_i(w, X_i, M_i/N_i; \epsilon_i) \\
&\quad = f_i(w, X_i, \pi; \epsilon_i) \\
&\qquad + \frac{1}{n\rho_n} \sum_{j\neq i} \color{#56B4E9}{\frac{f_i'(w, X_i, \pi; \epsilon_i)n\rho_n}{N_i}}E_{ij}(W_j - \pi) \\
&\qquad + \left(\frac{1}{n\rho_n} \sum_{j\neq i} \color{#009E73}{\frac{\sqrt{\tfrac{1}{2}f_i''(w, X_i, \pi; \epsilon_i)}n\rho_n}{N_i}}E_{ij}(W_j - \pi)\right)^2 \\
&\qquad + \color{#E69F00}{\frac{1}{6}f_i'''(w, X_i, \pi_i^*; \epsilon_i)(M_i / N_i - \pi)^3}
\end{align}`

Remember: `$M_i = \sum_{j\neq i} E_{ij}W_j$`.
]
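The rewrite rests on one identity, spelled out here as a worked step using `$N_i = \sum_{j\neq i} E_{ij}$` from Assumption 4:

`\begin{align}
\frac{1}{n\rho_n}\sum_{j\neq i}\frac{n\rho_n}{N_i}E_{ij}(W_j - \pi) = \frac{1}{N_i}\left(\sum_{j\neq i}E_{ij}W_j - \pi\sum_{j\neq i}E_{ij}\right) = \frac{M_i}{N_i} - \pi.
\end{align}`

Multiplying by a factor that does not depend on `$j$` (such as `$f_i'$`) moves it inside the sum, which is what lets the Taylor terms be matched against `$A_i$` and `$B_i$`.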

---

**Example**: power plant emission data.

[Papadogeorgou, Mealli, and Zigler (2019)](#references) consider power plant emission reduction technology.

Distance between power plants might influence the spillover effect.

Peer effects might not be symmetrical.

The spillover effect might not be smooth enough for [Li and Wager (2021)](#referencescont).

Our model could allow for all of this:

`\begin{equation}
Y_i(w, \boldsymbol{w}_{-i}) = f_i(w) + \frac{1}{n\rho_n} \sum_{j\neq i} (\max\{0, X_i - X_j\} + \alpha_i) E_{ij}(w_j - \pi) + \eta_i
\end{equation}`
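A small sketch of this example model. The function name, the shared baseline `f(w) = 1 + 2w` (used in place of unit-specific `$f_i$`), and the three-node complete graph are assumptions made only for illustration:

```python
import numpy as np

def example_potential_outcomes(w_vec, e, x, pi, rho_n, f, alpha, eta):
    """Outcomes Y_i(w_i, w_{-i}) under the example model, for a full assignment w_vec."""
    n = len(x)
    # Entry (i, j) is max{0, X_i - X_j} + alpha_i; it is not symmetric in (i, j).
    kernel = np.maximum(0.0, x[:, None] - x[None, :]) + alpha[:, None]
    # E has a zero diagonal, so the sum runs over j != i.
    spillover = (kernel * e) @ (w_vec - pi) / (n * rho_n)
    return f(w_vec) + spillover + eta

x = np.array([0.2, 0.5, 0.9])          # latent positions
e = np.ones((3, 3), dtype=int) - np.eye(3, dtype=int)   # complete graph
w_vec = np.array([1.0, 0.0, 1.0])
y = example_potential_outcomes(
    w_vec, e, x, pi=0.5, rho_n=1.0,
    f=lambda w: 1.0 + 2.0 * w,         # hypothetical baseline f(w)
    alpha=np.zeros(3), eta=np.zeros(3),
)
```

With `alpha = 0`, the subject with the smallest latent position receives no spillover from its neighbours even though the graph is symmetric, and the `max` makes the spillover non-smooth in the latent positions.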

---

We derive the same asymptotic result when `$f_i, A_i, B_i$` are all bounded, and `$r_i$` is "well behaved".

**Lemma**: Under Assumptions 1-3 and the tweaked model,

`\begin{align}
\hat{\tau}_\text{DIR}^\text{HT} - & \bar{\tau}_\text{DIR} = \frac{1}{n}\sum_{i=1}^n \left(\frac{f_i(1, X_i, \pi; \epsilon_i)}{\pi} + \frac{f_i(0, X_i, \pi; \epsilon_i)}{1-\pi}\right)(W_i - \pi) \\
&\quad + \frac{1}{n^2 \rho_n} \sum_{i=1}^n \sum_{j \neq i}\left(A_i(1, \boldsymbol{E}_i, X_i, X_j; \alpha_i) - A_i(0, \boldsymbol{E}_i, X_i, X_j; \alpha_i)\right)E_{ij}(W_j - \pi) \\
&\quad + \mathcal{O}_p\left(\delta \right).
\end{align}`

where

`\begin{align}
\delta = \frac{\sqrt{\max_i N_i}}{n^{3/2} \rho_n} + \frac{\sqrt{n \max_i N_i^2 + \max_i N_i \sum_{i\neq j} \gamma_{ij}}}{n^3 \rho_n^2} + \sqrt{\frac{\sum_{i\neq j} \gamma_{ij}}{n^{5}\rho_n^{3}} + \frac{1}{n^4 \rho_n^3}}
\end{align}`

---

**Conjecture**: Under Assumptions 1-3, the tweaked model, and mild assumptions on the graphon,

`\begin{equation}
\sqrt{n}(\hat{\tau}_\text{DIR}^\text{HT} - \bar{\tau}_\text{DIR}) \to N(0, \pi(1-\pi)\mathbb{E}[(R_i + Q_i)^2]).
\end{equation}`

---
layout: false

# Part II: Summary/Future Work

We have discussed:

* interference is challenging -- especially variance estimation.

* strict assumptions on interference structure not always reasonable.

* interpreting the interference graph as a realization from a graphon model can help us get asymptotic results without interference structure restrictions.

* Li and Wager (2021) exclude non-smooth potential outcomes, and do not allow for asymmetrical or non-smooth interference.
    * We can include both by making minor tweaks

* early indications of potential efficiency gains using more complex estimators.

---
# Part II: Summary/Future Work

In the near future, we hope to

1. complete the proof of asymptotic normality of the Horvitz-Thompson estimator under our slightly tweaked model

2. formalize our newly proposed estimator, and start work on asymptotic results
    * hopefully helps us determine when this is more/less efficient than the Horvitz-Thompson estimator

More long term projects include:

1. individualized propensity scores to maybe extend to observational data

2. heterogeneous treatment effects

3. consider indirect effect estimators

---
layout: false
name: references

# References

Manski, C. F. (1990). "Nonparametric Bounds on Treatment Effects". In:
_The American Economic Review_ 80.2, pp. 319-323. ISSN: 0002-8282.
JSTOR: [2006592](https://www.jstor.org/stable/2006592).

Balke, A. and J. Pearl (1997). "Bounds on Treatment Effects from
Studies with Imperfect Compliance". In: _Journal of the American
Statistical Association_ 92.439, pp. 1171-1176. ISSN: 0162-1459. DOI:
[10.1080/01621459.1997.10474074](https://doi.org/10.1080%2F01621459.1997.10474074).
URL:
[https://doi.org/10.1080/01621459.1997.10474074](https://doi.org/10.1080/01621459.1997.10474074)
(visited on Feb. 05, 2020).

Hudgens, M. G. and M. E. Halloran (2008). "Toward Causal Inference With
Interference". In: _Journal of the American Statistical Association_
103.482, pp. 832-842. ISSN: 0162-1459, 1537-274X. DOI:
[10.1198/016214508000000292](https://doi.org/10.1198%2F016214508000000292).
URL:
[https://www.tandfonline.com/doi/full/10.1198/016214508000000292](https://www.tandfonline.com/doi/full/10.1198/016214508000000292)
(visited on Feb. 22, 2022).

Liu, L., M. G. Hudgens, and S. Becker-Dreps (2016). "On Inverse
Probability-Weighted Estimators in the Presence of Interference". In:
_Biometrika_ 103.4, pp. 829-842. ISSN: 0006-3444. DOI:
[10.1093/biomet/asw047](https://doi.org/10.1093%2Fbiomet%2Fasw047).
URL:
[https://doi.org/10.1093/biomet/asw047](https://doi.org/10.1093/biomet/asw047)
(visited on Feb. 02, 2021).

Papadogeorgou, G., F. Mealli, and C. M. Zigler (2019). "Causal
Inference with Interfering Units for Cluster and Population Level
Treatment Allocation Programs". In: _Biometrics_ 75.3, pp. 778-787.
ISSN: 0006-341X. DOI:
[10.1111/biom.13049](https://doi.org/10.1111%2Fbiom.13049).
URL:
[https://www.ncbi.nlm.nih.gov/pmc/articles/PMC6784535/](https://www.ncbi.nlm.nih.gov/pmc/articles/PMC6784535/)
(visited on Mar. 03, 2022).

---
name: referencescont

# References (cont.)

Li, S. and S. Wager (2021). "Random Graph Asymptotics for Treatment
Effect Estimation under Network Interference". URL:
[http://arxiv.org/abs/2007.13302](http://arxiv.org/abs/2007.13302)
(visited on Mar. 01, 2022).

Trane, R. M. and H. Kang (2022). "Nonparametric Bounds in Two-Sample
Summary-Data Mendelian Randomization: Some Cautionary Tales for
Practice". In: _Statistics in Medicine_. ISSN: 1097-0258. DOI:
[10.1002/sim.9368](https://doi.org/10.1002%2Fsim.9368). URL:
[https://onlinelibrary.wiley.com/doi/abs/10.1002/sim.9368](https://onlinelibrary.wiley.com/doi/abs/10.1002/sim.9368)
(visited on May. 07, 2022).