class: center, top, .title-slide, title-slide

# Nonparametric Bounds in Two-Sample Summary-Data Mendelian Randomization

## Some Cautionary Tales for Practice

.vsmall[(slides at
https://rpubs.com/rmtrane/FutureMRpresentation
)]

### Ralph Møller Trane, Hyunseung Kang

### University of Wisconsin–Madison

### 2021-12-16

---

# Highlights

Problem:

* Nonparametric IV bounds have been thoroughly studied for the case where data on exposure, outcome, and instrument are collected in a single sample (summarized by <a name=cite-swanson_partial_2018></a>[Swanson, Hernán, Miller, et al. (2018)](#bib-swanson_partial_2018))
* Many MR studies use two-sample data, i.e. data on exposure/instrument come from a different sample than data on outcome/instrument
* We study the behavior of nonparametric bounds from two-sample data

--

Take-aways:

* In both simulations and real data examples, two-sample bounds are generally much wider than one-sample bounds, which makes them less useful
* Nonparametric bounds by themselves might be of limited use in two-sample MR studies.

---
layout: true

# Setup

---

Does some (binary) `\(X\)` cause (binary) `\(Y\)`? (We will only consider binary `\(X\)`, `\(Y\)`.)

Formally, we want to learn something about `\(\text{ATE} = E[Y^1 - Y^0] = E[Y^1] - E[Y^0]\)`.

Note: `\(Y\)` is binary, so `\(-1 \le \text{ATE} \le 1\)`.

We will do so using an IV:

<img src="data:image/png;base64,#FutureMRpresentation_files/figure-html/unnamed-chunk-2-1.png" height="200px" style="display: block; margin: auto;" />

Formally, `\(Z\)` should satisfy

(A1) `\(Z \not\perp X\)` *(Relevance)*</br>
(A2) `\(Z \perp U\)` *(Independent instrument)*</br>
(A3) `\(Y^{z,x} = Y^{z',x} = Y^{x}\)` for all `\(x,z,z'\)` *(Exclusion restriction)*</br>
(A4) `\(Y^{z,x} \perp Z, X | U\)` *(Conditional ignorability of `\(X,Z\)` given `\(U\)`)*

---
layout: true

# Non-parametric bounds

---

The IV model itself can be used to obtain firm bounds on the ATE.

<a name=cite-manski_nonparametric_1990></a>[Manski (1990)](#bib-manski_nonparametric_1990) showed that for a binary instrument

`$$\small \max \left\{\begin{array}{c} \max_z -P(Y = 0, X = 1 | Z = z) - P(Y = 1, X = 0 | Z = z) \\ \max_{z_1 \neq z_2} P(Y = 1 | Z = z_1) - P(Y = 1 | Z = z_2) - P(Y = 1, X = 0 | Z = z_1) - P(Y = 0, X = 1 | Z = z_2) \end{array}\right\} \\ \\ \small \le \qquad \text{ATE} \qquad \le \qquad \\ \\ \small \min \left\{\begin{array}{c} \min_z P(Y = 1, X = 1 | Z = z) + P(Y = 0, X = 0 | Z = z) \\ \min_{z_1 \neq z_2} P(Y = 1 | Z = z_1) - P(Y = 1 | Z = z_2) + P(Y = 0, X = 0 | Z = z_1) + P(Y = 1, X = 1 | Z = z_2) \end{array}\right\}$$`

<a name=cite-balke_bounds_1997></a>[Balke and Pearl (1997)](#bib-balke_bounds_1997) showed that the width of these bounds is always less than `\(1 - ST\)`, where `\(ST = |P(X = 1|Z=1) - P(X = 1|Z=0)|\)`.

Bounds for arbitrary categorical instruments are presented in <a name=cite-richardson_ace_2014></a>[Richardson and Robins (2014)](#bib-richardson_ace_2014).

---
layout: false

# Two-Sample Mendelian Randomization

In some MR analyses, we do not have data on `\((X,Y) | Z\)`. Instead, we rely on GWAS results, which give information about `\(X|Z\)` and `\(Y|Z\)` separately.

Fortunately, bounds using `\(P(X|Z)\)` and `\(P(Y|Z)\)` have been derived <a name=cite-ramsahai_causal_2012></a>([Ramsahai, 2012](#bib-ramsahai_causal_2012)), but their behavior is not well known.

--

Our main question: **what can we learn from nonparametric bounds of causal effects in two-sample MR studies?**

Two "metrics": (1) the width of the bounds, and (2) whether `\(0\)` is included in the bounds.

---

# Result 1: Length of Nonparametric Bounds from Two-Sample MR

Width of many two-sample bounds vs. strength of instruments. Each dot represents bounds based on a set of values for `\(P(X|Z)\)` and `\(P(Y|Z)\)`. </br>
Black: simulated values. Colored: real data.

<center> <img src="data:image/png;base64,#/Users/ralphtrane/Documents/ACEBounds/FutureMRpresentation/pip_figure.png" height="375"/> </center>

**Result**: under (A1)-(A4), the width is less than `\(2(1-\text{ST})\)`.

.small[
(For multi-leveled IV: `\(\text{ST} = \max_{z_1 \neq z_2} | P(X = 1 | Z = z_1) - P(X = 1 | Z = z_2)|\)`.)
]

---

# Illustration of Result 1

Because the bounds are very wide, we cannot determine the direction of the effect from real data, and generally learn very little:

.pull-left[A: Two-sample IV bounds for the ATE of smoking on the incidence of lung cancer.]

.pull-right[B: Two-sample IV bounds for the ATE of high cholesterol on the incidence of heart attack.]

<img src="data:image/png;base64,#/Users/ralphtrane/Documents/ACEBounds/figures/png/example_analyses/bivariate_bounds.png" height="400" class="imgcenter"/>

Note: results based on GWAS.

---

# Interpretation of Result 1

Conclusion: we pay a price when using two-sample rather than one-sample data.

Question: how much information is lost due to the two-sample design?

---
layout: true

# Quantifying Information Loss

---

In <span style="color: blue">one-sample</span> data, we get `\(\color{blue}{P(X = x, Y = y | Z = z)}\)`

In <span style="color: red">two-sample</span> data, we get `\(\color{red}{P(X = x | Z = z), P(Y = y | Z = z)}\)`.

--

What we really lose is information about `\(\text{Cov}(X,Y | Z = z)\)`!

**IF** we knew `\(\text{Cov}(X, Y | Z = z)\)`, we could go from two-sample information to one-sample information:

$$ \color{blue}{P(X = x, Y = y | Z = z)} = \color{red}{P(X = x | Z = z)P(Y = y | Z = z)} + (2\cdot I[x = y] - 1)\text{Cov}(X, Y | Z = z) $$

--

We obtain *potential* <span style="color: blue">one-sample</span> bounds based on the <span style="color: red">two-sample</span> data by randomly drawing valid values of `\(\text{Cov}(X,Y|Z=z)\)`. By doing so repeatedly, we get a sense of what information one-sample nonparametric bounds might have provided.

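A minimal Python sketch of this reconstruction idea, under stated assumptions. All function names are hypothetical, and the covariance is drawn uniformly, which the slides do not specify. The sketch only keeps the four cell probabilities inside `\([0, 1]\)`; it does not check any further constraints implied by the IV model. The one-sample bounds are computed by bounding `\(E[Y^1]\)` and `\(E[Y^0]\)` separately at each instrument level and then combining them.

```python
import random


def cov_limits(px1, py1):
    """Range of Cov(X, Y | Z=z) that keeps all four joint cells in [0, 1]."""
    p11_min = max(0.0, px1 + py1 - 1.0)  # smallest feasible P(X=1, Y=1 | z)
    p11_max = min(px1, py1)              # largest feasible P(X=1, Y=1 | z)
    return p11_min - px1 * py1, p11_max - px1 * py1


def joint_from_margins(px1, py1, cov):
    """P(X=x, Y=y | z) = P(X=x | z) P(Y=y | z) + (2*I[x=y] - 1) * cov."""
    px = {0: 1.0 - px1, 1: px1}
    py = {0: 1.0 - py1, 1: py1}
    return {(x, y): px[x] * py[y] + (2 * (x == y) - 1) * cov
            for x in (0, 1) for y in (0, 1)}


def one_sample_bounds(joint_by_z):
    """ATE bounds from {z: {(x, y): P(X=x, Y=y | Z=z)}}."""
    joints = list(joint_by_z.values())
    # P(Y=1, X=1 | z) <= E[Y^1] <= P(Y=1, X=1 | z) + P(X=0 | z) for every z
    lo_y1 = max(j[(1, 1)] for j in joints)
    hi_y1 = min(j[(1, 1)] + j[(0, 0)] + j[(0, 1)] for j in joints)
    # P(Y=1, X=0 | z) <= E[Y^0] <= P(Y=1, X=0 | z) + P(X=1 | z) for every z
    lo_y0 = max(j[(0, 1)] for j in joints)
    hi_y0 = min(j[(0, 1)] + j[(1, 0)] + j[(1, 1)] for j in joints)
    return lo_y1 - hi_y0, hi_y1 - lo_y0


def potential_one_sample_bounds(px1_by_z, py1_by_z, n_draws=1000, seed=0):
    """Draw Cov(X, Y | Z=z) for each z and return the implied bounds."""
    rng = random.Random(seed)
    draws = []
    for _ in range(n_draws):
        joint_by_z = {}
        for z in px1_by_z:
            lo, hi = cov_limits(px1_by_z[z], py1_by_z[z])
            cov = rng.uniform(lo, hi)  # assumption: uniform over the valid range
            joint_by_z[z] = joint_from_margins(px1_by_z[z], py1_by_z[z], cov)
        draws.append(one_sample_bounds(joint_by_z))
    return draws


# Made-up two-sample summaries: P(X=1 | Z=z) and P(Y=1 | Z=z) for z = 0, 1.
bounds = potential_one_sample_bounds({0: 0.2, 1: 0.5}, {0: 0.3, 1: 0.4}, n_draws=5)
for lower, upper in bounds:
    print(round(lower, 3), round(upper, 3))
```

Each printed pair is one potential set of one-sample bounds. Comparing their spread with the two-sample bounds gives a rough sense of the information lost.
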
<!-- From the IV model, we find constraints on `\(\text{Cov}(X,Y|Z)\)` depending on `\(P(X|Z)\)` and `\(P(Y|Z)\)`. Randomly choosing valid values of `\(\text{Cov}(X,Y|Z)\)`, we can reconstruct potential one-sample bounds, and get a sense of the information lost due to two-sample study design. -->

---

We reconstruct 1000 one-sample bounds from each of nine sets of two-sample bounds. Simulated data.

<center> <img src="data:image/png;base64,#/Users/ralphtrane/Documents/ACEBounds/figures/png/trivariate_bounds_subset_plot.png" height="500"/> </center>

---

.pull-left[
<img src="data:image/png;base64,#/Users/ralphtrane/Documents/ACEBounds/figures/png/example_analyses/trivariate_bounds.png" height="550"/>
]

.pull-right[
</br></br>
Possible one-sample IV bounds for the ATE of

A. smoking on the incidence of lung cancer

B. high cholesterol on the incidence of heart attack
]

---
layout: false

# Lessons Learned

Lesson 1: Two-sample data give bounds that are much more conservative than those from one-sample data

Lesson 2: In practice, the genetic markers used as instruments are just too weak to guarantee informative bounds

Lesson 3: Bound-based analysis does not, on its own, seem to be terribly useful in a two-sample MR study

Lesson 4: However, it might be useful in addition to other analyses:

* check if an effect estimate based on a different IV method is within the bounds
* bound the effect size if the direction is already well known

---
layout: false

# References

<a name=bib-balke_bounds_1997></a>[Balke, A. and J. Pearl](#cite-balke_bounds_1997) (1997). "Bounds on Treatment Effects from Studies with Imperfect Compliance". In: _Journal of the American Statistical Association_ 92.439, pp. 1171-1176. ISSN: 0162-1459. DOI: [10.1080/01621459.1997.10474074](https://doi.org/10.1080%2F01621459.1997.10474074). URL: [https://doi.org/10.1080/01621459.1997.10474074](https://doi.org/10.1080/01621459.1997.10474074) (visited on Feb. 05, 2020).

<a name=bib-manski_nonparametric_1990></a>[Manski, C. F.](#cite-manski_nonparametric_1990) (1990). "Nonparametric Bounds on Treatment Effects". In: _The American Economic Review_ 80.2, pp. 319-323. ISSN: 0002-8282.

<a name=bib-ramsahai_causal_2012></a>[Ramsahai, R. R.](#cite-ramsahai_causal_2012) (2012). "Causal Bounds and Observable Constraints for Non-Deterministic Models". In: _Journal of Machine Learning Research_ 13, pp. 829-848. ISSN: 1532-4435.

<a name=bib-richardson_ace_2014></a>[Richardson, T. S. and J. M. Robins](#cite-richardson_ace_2014) (2014). "ACE Bounds; SEMs with Equilibrium Conditions". In: _Statistical Science_ 29.3, pp. 363-366. ISSN: 0883-4237. DOI: [10.1214/14-STS485](https://doi.org/10.1214%2F14-STS485). arXiv: [1410.0470](https://arxiv.org/abs/1410.0470).

<a name=bib-swanson_partial_2018></a>[Swanson, S. A., M. A. Hernán, M. Miller, et al.](#cite-swanson_partial_2018) (2018). "Partial Identification of the Average Treatment Effect Using Instrumental Variables: Review of Methods for Binary Instruments, Treatments, and Outcomes". In: _Journal of the American Statistical Association_ 113.522, pp. 933-947. ISSN: 0162-1459, 1537-274X. DOI: [10.1080/01621459.2018.1434530](https://doi.org/10.1080%2F01621459.2018.1434530).