Nonparametric Bounds in Two-Sample Summary-Data Mendelian Randomization

class: center, top, .title-slide, title-slide

# Nonparametric Bounds in Two-Sample Summary-Data Mendelian Randomization
## Some Cautionary Tales for Practice .vsmall[(slides at <a href="https://rpubs.com/rmtrane/FutureMRpresentation" class="uri">https://rpubs.com/rmtrane/FutureMRpresentation</a>)]
### Ralph Møller Trane, Hyunseung Kang
### University of Wisconsin–Madison 
### 2021-12-16

---

# Highlights

Problem:

* Nonparametric IV bounds have been studied thoroughly for one-sample data, where exposure, outcome, and instrument are measured in the same sample (summarized by <a name=cite-swanson_partial_2018></a>[Swanson, Hernán, Miller, et al. (2018)](#bib-swanson_partial_2018))

* Many MR studies use two-sample data, i.e. data on exposure/instrument are separate from data on outcome/instrument

* We study the behavior of nonparametric bounds from two-sample data

--
 
Take-aways:

* Both simulations and real-data examples show that two-sample bounds are generally much wider than one-sample bounds, which makes them less useful

* Generally, nonparametric bounds by themselves might be of limited use in two-sample MR studies.

---
layout: true

# Setup

---
 
Does some binary exposure `$X$` cause a binary outcome `$Y$`? (We consider only binary `$X$` and `$Y$`.)

Formally, we want to learn something about `$\text{ATE} = E[Y^1 - Y^0] = E[Y^1] - E[Y^0]$`. Note: `$Y$` is binary, so `$-1 \le \text{ATE} \le 1$`.

We will do so using an instrumental variable (IV) `$Z$`, where `$U$` denotes unmeasured confounders of `$X$` and `$Y$`.

Formally, `$Z$` should satisfy

* (A1) `$Z \not\perp X$` *(Relevance)*
* (A2) `$Z \perp U$` *(Independent instrument)*
* (A3) `$Y^{z,x} = Y^{z',x} = Y^{x}$` for all `$x,z,z'$` *(Exclusion restriction)*
* (A4) `$Y^{z,x} \perp Z, X | U$` *(Conditional ignorability of `$X,Z$` given `$U$`)*

---
layout: true

# Non-parametric bounds

---

The IV model itself can be used to obtain firm bounds on the ATE. <a name=cite-manski_nonparametric_1990></a>[Manski (1990)](#bib-manski_nonparametric_1990) showed that for a binary instrument

`$$\small
\max \left\{\begin{array}{c}
\max_z -P(Y = 0, X = 1 | Z = z) - P(Y = 1, X = 0 | Z = z) \\
\max_{z_1 \neq z_2} P(Y = 1 | Z = z_1) - P(Y = 1 | Z = z_2) - P(Y = 1, X = 0 | Z = z_1) - P(Y = 0, X = 1 | Z = z_2)
\end{array}\right\} \\ \\
\small \le \qquad \text{ATE} \qquad \le \qquad \\ \\
\small \min \left\{\begin{array}{c}
\min_z P(Y = 1, X = 1 | Z = z) + P(Y = 0, X = 0 | Z = z) \\
\min_{z_1 \neq z_2} P(Y = 1 | Z = z_1) - P(Y = 1 | Z = z_2) + P(Y = 0, X = 0 | Z = z_1) + P(Y = 1, X = 1 | Z = z_2)
\end{array}\right\}$$`

<a name=cite-balke_bounds_1997></a>[Balke and Pearl (1997)](#bib-balke_bounds_1997) showed that the width of these bounds is always less than `$1 - ST$`, where `$ST = |P(X = 1|Z=1) - P(X = 1|Z=0)|$`.

Bounds for arbitrary categorical instruments are presented in <a name=cite-richardson_ace_2014></a>[Richardson and Robins (2014)](#bib-richardson_ace_2014).
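
As a rough illustration, Manski-type bounds of this kind can be computed directly from a table of `$P(X = x, Y = y | Z = z)$`. The sketch below is not taken from the original analysis: the function name `one_sample_bounds`, the array layout `p[z, x, y]`, and the example numbers are assumptions. It uses the compact form `$\max_{z_1, z_2} \{P(Y = 1, X = 1 | Z = z_1) + P(Y = 0, X = 0 | Z = z_2)\} - 1 \le \text{ATE} \le 1 - \max_{z_1, z_2} \{P(Y = 0, X = 1 | Z = z_1) + P(Y = 1, X = 0 | Z = z_2)\}$`, where `$z_1$` and `$z_2$` may coincide.

```python
import numpy as np


def one_sample_bounds(p):
    """Manski-type bounds on the ATE from p[z, x, y] = P(X=x, Y=y | Z=z)."""
    p = np.asarray(p, dtype=float)
    if p.ndim != 3 or p.shape[1:] != (2, 2):
        raise ValueError("p must have shape (n_z, 2, 2)")
    if not np.allclose(p.sum(axis=(1, 2)), 1.0):
        raise ValueError("each p[z] must sum to 1")
    # E[Y^1] >= max_z P(X=1, Y=1 | z) and E[Y^0] <= 1 - max_z P(X=0, Y=0 | z)
    lower = p[:, 1, 1].max() + p[:, 0, 0].max() - 1.0
    # E[Y^1] <= 1 - max_z P(X=1, Y=0 | z) and E[Y^0] >= max_z P(X=0, Y=1 | z)
    upper = 1.0 - p[:, 1, 0].max() - p[:, 0, 1].max()
    return float(lower), float(upper)


# Hypothetical joint distributions for a binary instrument.
p_example = np.array([
    [[0.56, 0.14], [0.09, 0.21]],  # Z = 0: rows X = 0, 1; columns Y = 0, 1
    [[0.16, 0.04], [0.24, 0.56]],  # Z = 1
])
lower, upper = one_sample_bounds(p_example)
print(f"{lower:.2f} <= ATE <= {upper:.2f}")
```

Because each maximum is taken separately over the instrument levels, the same function handles instruments with more than two levels.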

---
layout: false

# Two-Sample Mendelian Randomization

In some MR analyses, we do not have data on `$(X,Y) | Z$`. Instead, we rely on GWAS results which give information about `$X|Z$` and `$Y|Z$` separately.

Fortunately, bounds using `$P(X|Z)$` and `$P(Y|Z)$` have been derived <a name=cite-ramsahai_causal_2012></a>([Ramsahai, 2012](#bib-ramsahai_causal_2012)), but their behavior is not well understood.

Our main question: **what can we learn from nonparametric bounds of causal effects in two-sample MR studies?**

Two "metrics": (1) the width of the bounds, and (2) whether `$0$` is included in the bounds.

---

# Result 1: Width of Nonparametric Bounds from Two-Sample MR

Width of many two-sample bounds vs. strength of instruments. Each dot represents bounds based on a set of values for `$P(X|Z)$` and `$P(Y|Z)$`. 
Black: simulated values. Colored: real data.

**Result**: under (A1)-(A4), the width is less than `$2(1-\text{ST})$`.

.small[
(For a multi-level IV: `$\text{ST} = \max_{z_1 \neq z_2} | P(X = 1 | Z = z_1) - P(X = 1 | Z = z_2)|$`.)
]
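
To get a feel for this limit, the short sketch below computes `$\text{ST}$` as the largest pairwise difference in `$P(X = 1 | Z = z)$` and then the resulting `$2(1-\text{ST})$` cap. The function names and the example probabilities are made up for illustration and are not taken from the GWAS data.

```python
def instrument_strength(p_x1_given_z):
    """ST: largest absolute pairwise difference in P(X = 1 | Z = z)."""
    probs = list(p_x1_given_z)
    if len(probs) < 2 or any(not 0.0 <= p <= 1.0 for p in probs):
        raise ValueError("need at least two probabilities in [0, 1]")
    # The largest pairwise gap is simply max minus min.
    return max(probs) - min(probs)


def two_sample_width_cap(p_x1_given_z):
    """Width limit 2 * (1 - ST) stated in Result 1."""
    return 2.0 * (1.0 - instrument_strength(p_x1_given_z))


# Hypothetical weak instrument with three genotype levels.
weak = [0.30, 0.32, 0.35]
print(instrument_strength(weak), two_sample_width_cap(weak))
```

With `$\text{ST}$` of about 0.05, the cap is about 1.9, barely below the trivial width of 2 implied by `$-1 \le \text{ATE} \le 1$`.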

---
# Illustration of Result 1

Because the bounds are so wide, we cannot determine the direction of the effect in the real-data examples, and we generally learn very little:

.pull-left[A: Two-sample IV bounds for the ATE of smoking on the incidence of lung cancer.]

.pull-right[B: Two-sample IV bounds for the ATE of high cholesterol on the incidence of heart attack.]

Note: results based on GWAS.

---
# Interpretation of Result 1

Conclusion: we pay a price when using two-sample rather than one-sample data.

Question: how much information is lost due to the two-sample design?

---
layout: true

# Quantifying Information Loss

---

In one-sample data, we get `$\color{blue}{P(X = x, Y = y | Z = z)}$`.

In two-sample data, we get `$\color{red}{P(X = x | Z = z), P(Y = y | Z = z)}$`.

What we really lose is information about `$\text{Cov}(X,Y | Z = z)$`!

**IF** we knew `$\text{Cov}(X, Y | Z = z)$`, we could go from two-sample information to one-sample information:

$$
\color{blue}{P(X = x, Y = y | Z = z)} = \color{red}{P(X = x | Z = z)P(Y = y | Z = z)} + (2\cdot I[x = y] - 1)\text{Cov}(X, Y | Z = z)
$$

We obtain *potential* one-sample bounds based on the two-sample data by randomly drawing valid values of `$\text{Cov}(X,Y|Z=z)$`.

By doing so repeatedly, we get a sense of what information might have been obtained from one-sample data nonparametric bounds.
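
A minimal sketch of this reconstruction for a single instrument level is given below. The helper names, the example marginals, and the choice to draw the covariance uniformly over its valid range are assumptions; the slides do not say which distribution is used, and the sketch does not check whether the reconstructed distributions are compatible with the IV model.

```python
import numpy as np


def cov_range(px, py):
    """Values of Cov(X, Y | Z=z) that keep all four cell probabilities >= 0."""
    low = max(-px * py, -(1 - px) * (1 - py))
    high = min(px * (1 - py), (1 - px) * py)
    return low, high


def joint_from_margins(px, py, cov):
    """Return joint[x, y] = P(X=x | z) P(Y=y | z) + (2 I[x=y] - 1) cov."""
    mx = np.array([1 - px, px])  # P(X=0 | z), P(X=1 | z)
    my = np.array([1 - py, py])  # P(Y=0 | z), P(Y=1 | z)
    sign = np.array([[1.0, -1.0], [-1.0, 1.0]])  # +1 when x == y, else -1
    return np.outer(mx, my) + sign * cov


rng = np.random.default_rng(0)
px, py = 0.3, 0.1  # hypothetical P(X=1 | Z=z) and P(Y=1 | Z=z)
low, high = cov_range(px, py)
draws = rng.uniform(low, high, size=1000)  # assumed uniform draws
joints = [joint_from_margins(px, py, c) for c in draws]
```

Repeating this for every level of `$Z$` and computing one-sample bounds from the stacked tables gives one potential set of one-sample bounds per draw.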

---

Using simulated data, we reconstruct 1000 sets of one-sample bounds from each of nine sets of two-sample bounds.

---

.pull-left[
*Figure: possible one-sample IV bounds reconstructed from two-sample data (panels A and B).*
]

.pull-right[

Possible one-sample IV bounds for the ATE of

A. smoking on the incidence of lung cancer

B. high cholesterol on the incidence of heart attack

]

---
layout: false

# Lessons Learned

Lesson 1: Two-sample data give bounds that are much more conservative than those from one-sample data

Lesson 2: In practice, the genetic markers used as instruments are just too weak to guarantee informative bounds

Lesson 3: Bound-based analysis does not, on its own, seem to be terribly useful in a two-sample MR study

Lesson 4: However, it might be useful in addition to other analyses:

* check if an effect estimate based on a different IV method is within the bounds

* bound effect size if direction is already well known

---
layout: false

# References

<a name=bib-balke_bounds_1997></a>[Balke, A. and J.
Pearl](#cite-balke_bounds_1997) (1997). "Bounds on Treatment Effects
from Studies with Imperfect Compliance". In: _Journal of the American
Statistical Association_ 92.439, pp. 1171-1176. ISSN: 0162-1459. DOI:
[10.1080/01621459.1997.10474074](https://doi.org/10.1080%2F01621459.1997.10474074).

<a name=bib-manski_nonparametric_1990></a>[Manski, C.
F.](#cite-manski_nonparametric_1990) (1990). "Nonparametric Bounds on
Treatment Effects". In: _The American Economic Review_ 80.2, pp.
319-323. ISSN: 0002-8282.

<a name=bib-ramsahai_causal_2012></a>[Ramsahai, R.
R.](#cite-ramsahai_causal_2012) (2012). "Causal Bounds and Observable
Constraints for Non-Deterministic Models". In: _J. Mach. Learn. Res._
13, pp. 829-848. ISSN: 1532-4435.

<a name=bib-richardson_ace_2014></a>[Richardson, T. S. and J. M.
Robins](#cite-richardson_ace_2014) (2014). "ACE Bounds; SEMs with
Equilibrium Conditions". In: _Statistical Science_ 29.3, pp. 363-366.
ISSN: 0883-4237. DOI:
[10.1214/14-STS485](https://doi.org/10.1214%2F14-STS485). arXiv:
[1410.0470](https://arxiv.org/abs/1410.0470).

<a name=bib-swanson_partial_2018></a>[Swanson, S. A., M. A. Hernán, M.
Miller, et al.](#cite-swanson_partial_2018) (2018). "Partial
Identification of the Average Treatment Effect Using Instrumental
Variables: Review of Methods for Binary Instruments, Treatments, and
Outcomes". In: _Journal of the American Statistical Association_
113.522, pp. 933-947. ISSN: 0162-1459, 1537-274X. DOI:
[10.1080/01621459.2018.1434530](https://doi.org/10.1080%2F01621459.2018.1434530).