Understanding negative controls in observational studies

class: center, middle, inverse, title-slide

# Understanding negative controls in observational studies
### Kristen Hunter (Harvard University)
### April 19, 2019 .small-right[<a href='https://cdn3.vectorstock.com/i/1000x1000/48/37/colorful-abstract-background-design-vector-21524837.jpg'>Image: Vector Stock</a>]

---

# Confounding in observational studies

- In analyzing observational studies we rely on the **unconfoundedness** assumption, that the assignment mechanism does not depend on potential outcomes given the covariates:

`$$E(Z \mid X, Y(0), Y(1)) = E(Z \mid X)$$`

Where
- `$Z$` is the treatment
- `$Y(Z)$` are the potential outcomes
- `$X$` are observed covariates

<center>What if we do not have unconfoundedness?</center>
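
A minimal simulation sketch of what goes wrong, not taken from the original slides. The unobserved variable `u`, the 0.8/0.2 treatment probabilities, and the true effect `beta = 1` are all illustrative assumptions. Because `u` drives both treatment and outcome and is not among the covariates, the naive difference in means should overstate the effect.

```python
import random


def simulate(n, beta, seed):
    """Draw (z, y_obs) where an unobserved u affects both treatment and outcome."""
    rng = random.Random(seed)
    z, y_obs = [], []
    for _ in range(n):
        u = rng.gauss(0.0, 1.0)            # unobserved confounder (hypothetical)
        p_treat = 0.8 if u > 0 else 0.2    # assignment depends on u, hence on Y(0)
        zi = 1 if rng.random() < p_treat else 0
        y0 = u + rng.gauss(0.0, 1.0)       # potential outcome under control
        z.append(zi)
        y_obs.append(y0 + beta * zi)       # additive effect: Y(1) = Y(0) + beta
    return z, y_obs


def diff_in_means(z, values):
    """Mean of values among treated units minus mean among control units."""
    treated = [v for zi, v in zip(z, values) if zi == 1]
    control = [v for zi, v in zip(z, values) if zi == 0]
    return sum(treated) / len(treated) - sum(control) / len(control)


true_beta = 1.0
z, y_obs = simulate(20000, true_beta, seed=0)
naive = diff_in_means(z, y_obs)
print(f"true effect = {true_beta}, naive estimate = {naive:.2f}")
```

In this setup the naive estimate lands well above the true effect, since treated units tend to have larger `u`.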

---

# Negative control outcome

- A negative control outcome is not influenced by the **treatment** (but is influenced by confounding variables)
- Can be used to **detect or remove** confounding

If `$Z$` and `$Z^\prime$` are two different treatments, and `$N(Z)$` is a negative control outcome:

`$$N(Z) = N(Z^\prime)$$`

<img src="https://1funny.com/wp-content/uploads/2011/12/irrelephant-2.jpg" class="center-img-small" style="width:450px">
<a href="https://1funny.com/wp-content/uploads/2011/12/irrelephant-2.jpg">
Image: 1Funny</a>

---

# Negative control outcome examples

<table id="nc">
 <tr>
 <th>Causal effect</th>
 <th>Negative control outcome</th>
 <th>Reference</th>
 </tr>
 <tr>
 <td>radon on lung cancer</td>
 <td>Chronic Obstructive Pulmonary Disease (COPD)</td>
 <td>Richardson et al. 2014</td>
 </tr>
 <tr>
 <td>influenza vaccine on mortality rate</td>
 <td>mortality rate in pre-influenza summer season</td>
 <td>Jackson et al. 2006</td>
 </tr>
 <tr>
 <td>breast feeding on child's obesity</td>
 <td>pigeon home invasion v. mice home invasion</td>
 <td>Lawlor et al. 2016</td>
 </tr>
</table>

.pull-left[
<img src="https://www.memesmonkey.com/images/memesmonkey/51/5198b99bf94cd98b05430a8d544004f2.jpeg" style="width:200px;height:auto;"> 
<a href="https://www.memesmonkey.com/images/memesmonkey/51/5198b99bf94cd98b05430a8d544004f2.jpeg">
Image: Meme</a> 
]

.pull-right[
<img src="http://www.quickmeme.com/img/de/dea3b937cdaa3f837382f396f8e5b57e78e8f111cdf7ade912c69c446e20414d.jpg" style="width:250px;height:auto;"> 
<a href="http://www.quickmeme.com/img/de/dea3b937cdaa3f837382f396f8e5b57e78e8f111cdf7ade912c69c446e20414d.jpg">
Image: Quick meme</a>
]

---

# Negative control exposure

.pull-left[
- A negative control **exposure** is a treatment that does not causally affect the outcome of interest
- If `$W$` is the negative control exposure:

`$$Y(W = 1) = Y(W = 0)$$`
]

.pull-right[
<img src="https://pics.me.me/loren-alman-fast-acting-extra-placebos-strength-place-bos-hmm-better-12471271.png" class="center-img-small" style="width:400px">
<a href="https://pics.me.me/loren-alman-fast-acting-extra-placebos-strength-place-bos-hmm-better-12471271.png">
Image: Meme Collection</a>
]
---

# Negative control exposure examples

<table id="nc">
 <tr>
 <th>Causal effect</th>
 <th>Negative control exposure</th>
 <th>Reference</th>
 </tr>
 <tr>
 <td>maternal smoking on low birth weight</td>
 <td>paternal smoking</td>
 <td>Smith 2008</td>
 </tr>
 <tr>
 <td>influenza vaccine on pneumonia</td>
 <td>tetanus vaccine</td>
 <td>Lipsitch et al. 2010</td>
 </tr>
 <tr>
 <td>air pollution on mortality</td>
 <td>air pollution on a future day</td>
 <td>Miao et al. 2018</td>
 </tr>
</table>

.pull-left[
<img src="https://i.imgur.com/ocDphMS.png" style="width:275px;height:auto;"> 
<a href="https://i.imgur.com/ocDphMS.png">
Image: Imgur</a> 
]

.pull-right[
<img src="https://www.google.com/imgres?imgurl=https%3A%2F%2Flookaside.fbsbx.com%2Flookaside%2Fcrawler%2Fmedia%2F%3Fmedia_id%3D506598522729800&imgrefurl=https%3A%2F%2Fwww.facebook.com%2FBaby-Memes-506598522729800%2F&docid=sZxUuoLg5LSCcM&tbnid=AMUv7NTjizlx8M%3A&vet=10ahUKEwiH9djon9rhAhUIm-AKHXPoBNkQMwhFKAkwCQ..i&w=960&h=637&safe=active&bih=728&biw=1399&q=baby%20meme&ved=0ahUKEwiH9djon9rhAhUIm-AKHXPoBNkQMwhFKAkwCQ&iact=mrc&uact=8" style="width:400px;height:auto;"> 
<a href="https://www.google.com/imgres?imgurl=https%3A%2F%2Flookaside.fbsbx.com%2Flookaside%2Fcrawler%2Fmedia%2F%3Fmedia_id%3D506598522729800&imgrefurl=https%3A%2F%2Fwww.facebook.com%2FBaby-Memes-506598522729800%2F&docid=sZxUuoLg5LSCcM&tbnid=AMUv7NTjizlx8M%3A&vet=10ahUKEwiH9djon9rhAhUIm-AKHXPoBNkQMwhFKAkwCQ..i&w=960&h=637&safe=active&bih=728&biw=1399&q=baby%20meme&ved=0ahUKEwiH9djon9rhAhUIm-AKHXPoBNkQMwhFKAkwCQ&iact=mrc&uact=8">
Image: Baby memes</a>
]

---

# Control Outcome Calibration Approach

- COCA (Tchetgen Tchetgen 2013) **calibrates** a treatment effect using a **negative control outcome** in an observational study with unobserved confounding
- Negative control outcome: `$N(1) = N(0)$`

<img src="plots/coke.jpg" class="center-img" style="width:400px;height:auto;"> 
<a href="http://www.megumistbarth.com/71-large_default/coca-zero-33-cl.jpg">
Image: Megumi</a>

---

# Negative control assumptions

- If the potential outcomes are confounded with treatment, so is the negative control, so that the **negative control** actually **detects** the confounding we are interested in:

`$$N {\not\!\perp\!\!\!\perp} Z \mid X \iff \{Y(1), Y(0)\} {\not\!\perp\!\!\!\perp}  Z \mid X$$`
- In terms of assignment mechanism:
`\begin{align}
E(Z \mid N, X) &\neq E(Z \mid X) \iff\\
E(Z \mid X, Y(1), Y(0)) &\neq  E(Z \mid X)
\end{align}`
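
As a rough illustration of detection (a sketch under assumed numbers, not the analysis from any of the cited papers), the simulation below builds a negative control `nc` that never uses the treatment, so `N(1) = N(0)` holds by construction. The confounder `u` and the 0.8/0.2 assignment probabilities are hypothetical. Any gap in the mean of `nc` between treatment groups can then only come from confounding.

```python
import random


def simulate(n, confounded, seed):
    """Draw (z, nc); nc is a negative control that the treatment cannot affect."""
    rng = random.Random(seed)
    z, nc = [], []
    for _ in range(n):
        u = rng.gauss(0.0, 1.0)                 # unobserved confounder (hypothetical)
        if confounded:
            p_treat = 0.8 if u > 0 else 0.2     # assignment depends on u
        else:
            p_treat = 0.5                       # randomized assignment
        zi = 1 if rng.random() < p_treat else 0
        z.append(zi)
        nc.append(u + rng.gauss(0.0, 1.0))      # zi is never used here
    return z, nc


def mean_gap(z, values):
    """Mean of values among treated units minus mean among control units."""
    treated = [v for zi, v in zip(z, values) if zi == 1]
    control = [v for zi, v in zip(z, values) if zi == 0]
    return sum(treated) / len(treated) - sum(control) / len(control)


gap_confounded = mean_gap(*simulate(20000, True, seed=1))
gap_randomized = mean_gap(*simulate(20000, False, seed=1))
print(f"gap in N, confounded assignment: {gap_confounded:.2f}")
print(f"gap in N, randomized assignment: {gap_randomized:.2f}")
```

A clearly nonzero gap under confounded assignment, and a gap near zero under randomization, is the pattern the assumption above relies on.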

<img src="plots/detect.jpg" class="center-img" style="width:300px;height:auto;"> 
<a href="http://www.quickmeme.com/img/13/1313dc2f7f7de6450b7fc28183f01ab345061ba0ece7b6630fa6c21ae19f1527.jpg">
Image: Quick Meme</a>

---

# COCA assumptions

- To go beyond **detecting** confounding, and actually **remove** confounding, we need a strong assumption:

`$$N {\!\perp\!\!\!\perp} Z \mid \{X, Y(0), Y(1)\}$$`

In terms of assignment mechanism:

`$$E(Z \mid X, N, Y(1), Y(0)) = E(Z \mid X, Y(1), Y(0))$$`

- Recall that we had `$N {\not\!\perp\!\!\!\perp} Z \mid X$`; once we also condition on `$Y(0), Y(1)$`, `$N$` and `$Z$` become conditionally independent
- The potential outcomes give us all the information we need to remove any confounding
- There is no extra confounding in the relationship between `$N$` and `$Z$` that is not found in the relationship between `$\{Y(0), Y(1)\}$` and `$Z$`

---

# COCA assumptions

- Final assumption: additive treatment effect:

`$$Y(1) = Y(0) + \beta$$`

- This means that conditioning on `$\{Y(0), Y(1)\}$` is the same as conditioning on `$Y^{obs}$` because `$Y(0)$` and `$Y(1)$` are deterministically related
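
One way to spell out this step under the additive model (a worked line added for clarity, using only the definitions above): the observed outcome is whichever potential outcome matches the treatment received, so

`\begin{align}
Y^{obs} &= Z\,Y(1) + (1 - Z)\,Y(0)\\
&= Z\,(Y(0) + \beta) + (1 - Z)\,Y(0)\\
&= Y(0) + \beta Z
\end{align}`

Hence `$Y(0) = Y^{obs} - \beta Z$` and `$Y(1) = Y^{obs} + \beta(1 - Z)$`: for a unit with known treatment `$Z$`, both potential outcomes are recovered from `$Y^{obs}$` once `$\beta$` is fixed.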

<img src="plots/add_meme.jpg" class="center-img" style="width:225px;height:auto;"> 
<a href="http://memecrunch.com/meme/5NOZF/not-adding-up/image.jpg">
Image: Meme Crunch</a>

---

# COCA assumptions

- So we can convert:

`$$N {\!\perp\!\!\!\perp} Z \mid \{Y(0), Y(1)\}$$`
- To conditioning on what we observe:
`$$N {\!\perp\!\!\!\perp} Z \mid Y^{obs}$$`
- Which we can also write as:

`$$E(N \mid Z = 1, Y^{obs}) = E(N \mid Z = 0, Y^{obs})$$`

---

# COCA method

- We can think of this in terms of a regression:

`$$E(N \mid Z, Y^{obs}) = \alpha_0 + \alpha_{ny.z}Y^{obs} + \alpha_{nz.y} Z$$`

- Notation: `$\alpha_{ab.c}$` is the coefficient on `$b$` from regressing `$a$` on `$b$` while controlling for `$c$`

<img src="plots/regress_meme.jpg" class="center-img" style="width:400px;height:auto;"> 
<a href="https://i.imgflip.com/sy501.jpg">
Image: Img Flip</a>

---

# COCA method

- So we can split this into the different `$Z$` cases:

`\begin{align}
E(N \mid Z = 1, Y^{obs}) &= \alpha_0 + \alpha_{ny.z}Y^{obs}(1) + \alpha_{nz.y}\\
E(N \mid Z = 0, Y^{obs}) &= \alpha_0 + \alpha_{ny.z}Y^{obs}(0)\\
\end{align}`

Where:
- `$Y^{obs}(1)$` is the subset of the vector `$Y^{obs}$` for which `$Z_i = 1$`
- `$Y^{obs}(0)$` is the subset of the vector `$Y^{obs}$` for which `$Z_i = 0$`

---

# COCA method

Using this assumption, we can arrive at an estimate of the causal effect, `$\hat{\beta}$`:

`\begin{align*}
0 &= E\left[N \mid Z = 1,Y^{obs}\right] - E\left[N \mid Z = 0,Y^{obs}\right]\\
&= \left[\alpha_0 + \alpha_{ny.z} Y^{obs}(1) + \alpha_{nz.y}\right] - \left[\alpha_0 + \alpha_{ny.z} Y^{obs}(0)\right]\\
&= \alpha_{ny.z} \left[Y^{obs}(1) - Y^{obs}(0)\right] + \alpha_{nz.y}\\
&= \alpha_{ny.z}\beta + \alpha_{nz.y}\\
\end{align*}`

So we arrive at our estimator of the treatment effect:

`$$\hat{\beta}_{COCA} = -\frac{\hat{\alpha}_{nz.y}}{\hat{\alpha}_{ny.z}}$$`
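
The estimator can be tried on simulated data. The sketch below is illustrative only and is not the implementation from Tchetgen Tchetgen (2013): the data-generating process, the coefficients 0.5 and 2.0, and the helper names `cov` and `coca_estimate` are assumptions. Treatment depends on `Y(0)`, so unconfoundedness fails, while the negative control depends on `Y(0)` only, so the COCA assumptions hold by construction. The two regression coefficients are computed from the closed-form least-squares solution for two regressors plus an intercept.

```python
import random


def simulate(n, beta, seed):
    """Draw (z, y_obs, nc) satisfying the COCA assumptions by construction."""
    rng = random.Random(seed)
    z, y_obs, nc = [], [], []
    for _ in range(n):
        y0 = rng.gauss(0.0, 1.0)                 # Y(0)
        p_treat = 0.8 if y0 > 0 else 0.2         # assignment depends on Y(0)
        zi = 1 if rng.random() < p_treat else 0
        z.append(zi)
        y_obs.append(y0 + beta * zi)             # Y(1) = Y(0) + beta
        # N depends on Y(0) only, so N is independent of Z given Y(0)
        nc.append(0.5 + 2.0 * y0 + rng.gauss(0.0, 1.0))
    return z, y_obs, nc


def cov(a, b):
    """Sample covariance (divides by n; the scaling cancels in the ratios below)."""
    mean_a = sum(a) / len(a)
    mean_b = sum(b) / len(b)
    return sum((x - mean_a) * (y - mean_b) for x, y in zip(a, b)) / len(a)


def coca_estimate(z, y_obs, nc):
    """Regress N on (Y_obs, Z) with an intercept; return -alpha_nz.y / alpha_ny.z."""
    s_yy, s_zz, s_yz = cov(y_obs, y_obs), cov(z, z), cov(y_obs, z)
    s_ny, s_nz = cov(nc, y_obs), cov(nc, z)
    det = s_yy * s_zz - s_yz ** 2
    alpha_ny_z = (s_ny * s_zz - s_nz * s_yz) / det   # coefficient on Y_obs
    alpha_nz_y = (s_nz * s_yy - s_ny * s_yz) / det   # coefficient on Z
    return -alpha_nz_y / alpha_ny_z


true_beta = 1.0
z, y_obs, nc = simulate(20000, true_beta, seed=2)
treated = [y for zi, y in zip(z, y_obs) if zi == 1]
control = [y for zi, y in zip(z, y_obs) if zi == 0]
naive = sum(treated) / len(treated) - sum(control) / len(control)
beta_coca = coca_estimate(z, y_obs, nc)
print(f"true = {true_beta}, naive = {naive:.2f}, COCA = {beta_coca:.2f}")
```

In this setup the naive difference in means is pulled away from the true effect, while the COCA ratio should land close to it. If `nc` also depended on something related to `Z` beyond `Y(0)`, the ratio would no longer be expected to recover `beta`.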

---

# COCA summary

- If we have a negative control outcome that meets certain requirements, we have an **unbiased** estimate of the treatment effect
- This is unbiased even though we know we have **violated** the unconfoundedness assumption!
- Assumptions may be strong and can be **hard to interpret**
- Sensitivity analysis to breaking certain assumptions is possible

<img src="plots/happy_possum.jpg" class="center-img" style="width:500px;height:auto;"> 
<a href="https://memegenerator.net/img/instances/74886307/youve-got-this.jpg">
Image: Meme Generator</a>

---

# Literature review of negative controls in causal inference

- *Lipsitch 2010*: Encourages use of negative controls in observational studies to detect confounding and bias
- *Tchetgen Tchetgen 2013*: COCA
- *Miao 2017*: Identifiability conditions with negative controls
- *Schuemie 2018*: Negative controls to calibrate p-values in observational health data
- *Miao 2018*: Combines negative control outcomes and negative control exposures to calibrate treatment effects

---

# Conclusions

- **Negative controls** are an under-utilized method for **detecting** and sometimes **removing** confounding in observational studies
- A negative control outcome or exposure must be picked carefully by the analyst in order to be valid and useful

<img src="plots/awesome_meme.jpg" class="center-img" style="width:400px;height:auto;"> 
<a href="https://sayingimages.com/wp-content/uploads/so-much-awesome-meme.jpg">
Image: Saying Images</a>

---

# Thank you!

---

# References 1

Jackson, L. A., M. A. Jackson, J. C. Nelson, et al. (2006).
"Evidence of bias in estimates of influenza vaccine effectiveness
in seniors". In: _International Journal of Epidemiology_ 35,
pp. 337-344.

Lawlor, D. A., K. Tilling, and G. D. Smith (2016). "Triangulation
in aetiological epidemiology". In: _International Journal of
Epidemiology_, pp. 1866-1886.

Lipsitch, M., E. T. Tchetgen, and T. Cohen (2010). "Negative
Controls: A Tool for Detecting Confounding and Bias in
Observational Studies". In: _Epidemiology_ 21, pp. 383-388.

Miao, W. and E. T. Tchetgen (2018). "A Confounding Bridge Approach
for Double Negative Control Inference on Causal Effect". In:
_arXiv:1808.04945_. <URL: https://arxiv.org/abs/1808.04945>.

---

# References 2

Richardson, D. B., D. Laurier, M. K. Schubauer-Berigan, et al.
(2014). "Assessment and indirect adjustment for confounding by
smoking in cohort studies using relative hazard models". In:
_American Journal of Epidemiology_ 180.9, pp. 933-940.

Smith, G. D. (2008). "Assessing Intrauterine Influences on
Offspring Health Outcomes: Can Epidemiological Studies Yield
Robust Findings?" In: _Basic & Clinical Pharmacology and
Toxicology_ 102, pp. 245-256.