class: center, top, .title-slide, title-slide

# Nonparametric Bounds in Two-Sample Summary-Data Mendelian Randomization

## Some Cautionary Tales for Practice

.vsmall[(slides at https://rpubs.com/rmtrane/bdo_pres)]

### Ralph Møller Trane

### University of Wisconsin–Madison
### 2021-04-30

---

# Setup

Does `\(X\)` cause `\(Y\)`? (We will only consider binary `\(X,Y\)`.)

Formally, we want to learn something about `\(\text{ATE} = E[Y^1 - Y^0]\)`. (Since `\(Y\)` is binary, `\(-1 \le \text{ATE} \le 1\)`.)

This is a tough question if we cannot rule out the existence of unmeasured confounders.

<img src="data:image/png;base64,#BDOpresentation_files/figure-html/unnamed-chunk-2-1.png" height="400px" style="display: block; margin: auto;" />

---

# Instrumental Variables

.pull-left[

We can estimate the ATE if we can find `\(Z\)` such that

<img src="data:image/png;base64,#BDOpresentation_files/figure-html/unnamed-chunk-3-1.png" style="display: block; margin: auto;" />

Formally, `\(Z\)` should satisfy the following:

(A1) *(Relevance)*: `\(Z \not\perp X\)` </br>
(A2) *(Independent instrument)*: `\(Z \perp U\)` </br>
(A3) *(Exclusion restriction)*:</br> `\(Y^{z,x} = Y^{z',x} = Y^{x}\)` for all `\(x,z,z'\)` </br>
(A4) *(Conditional ignorability of `\(X,Z\)` given `\(U\)`)*: `\(Y^{z,x} \perp Z, X | U\)`

]

.pull-right[

Examples:

* <a name=cite-leigh_instrumental_2004></a>[Leigh and Schembri (2004)](https://linkinghub.elsevier.com/retrieve/pii/S0895435603003214) use cigarette price as an instrument to estimate the causal effect of smoking on SF-12 health scores.

* <a name=cite-bloom_benefits_1997></a>[Bloom, Orr, Bell, Cave, Doolittle, Lin, and Bos (1997)](https://www.jstor.org/stable/146183?origin=crossref) use the random assignment of admission to a training program to assess the causal effect of that program on earnings.

* Many more: <a name=cite-angrist_instrumental_2001></a>[Angrist and Krueger (2001)](https://pubs.aeaweb.org/doi/10.1257/jep.15.4.69)

]

---

# Mendelian Randomization

In recent years, the use of genetic markers as IVs has gained traction. This is called *Mendelian Randomization*.

It builds on Gregor Mendel's observation that alleles are distributed randomly in people at fertilization.
--

For example,

* `\(Z\)` = some SNP
* `\(X\)` = high cholesterol
* `\(Y\)` = incidence of heart attack
* `\(U\)` = environmental risk factors

--

There are many ways of estimating causal effects using IVs. Most rely on additional strong modeling assumptions.

The IV model itself can be used to obtain firm nonparametric bounds on the ATE.

---
layout: true

# Nonparametric bounds

---

<a name=cite-manski_nonparametric_1990></a>[Manski (1990)](#bib-manski_nonparametric_1990) showed that for a binary instrument

`$$\small \max \left\{\begin{array}{c} \max_z -P(Y = 0, X = 1 | Z = z) - P(Y = 1, X = 0 | Z = z) \\ \max_{z_1 \neq z_2} P(Y = 1 | Z = z_1) - P(Y = 1 | Z = z_2) - P(Y = 1, X = 0 | Z = z_1) - P(Y = 0, X = 1 | Z = z_2) \end{array}\right\} \\ \le \text{ATE} \le \\ \small \min \left\{\begin{array}{c} \min_z P(Y = 1, X = 1 | Z = z) + P(Y = 0, X = 0 | Z = z) \\ \min_{z_1 \neq z_2} P(Y = 1 | Z = z_1) - P(Y = 1 | Z = z_2) + P(Y = 0, X = 0 | Z = z_1) + P(Y = 1, X = 1 | Z = z_2) \end{array}\right\}$$`

<a name=cite-balke_bounds_1997></a>[Balke and Pearl (1997)](https://doi.org/10.1080/01621459.1997.10474074) showed that the width of these bounds is always less than `\(1 - ST\)` (important!), where

`$$ST = |P(X = 1|Z=1) - P(X = 1|Z=0)|$$`

(Bounds for arbitrary categorical instruments are presented in <a name=cite-richardson_ace_2014></a>[Richardson and Robins (2014)](https://arxiv.org/abs/1410.0470).)

---

In many MR analyses, we do not have data on `\((X,Y) | Z\)`. Instead, we rely on GWAS results, which give information about `\(X|Z\)` and `\(Y|Z\)` separately.

Fortunately, bounds using `\(P(X|Z)\)` and `\(P(Y|Z)\)` have been derived <a name=cite-ramsahai_causal_2012></a>([Ramsahai, 2012](#bib-ramsahai_causal_2012)), but their behavior is not well known.

--

Our main question: what can we learn about causal effects using nonparametric bounds in two-sample MR studies?

---

Width of many two-sample bounds vs. strength of instruments.
Each dot represents bounds based on a set of values for `\(P(X|Z)\)` and `\(P(Y|Z)\)`. Black: simulated values. Colored: real data.

<center> <img src="data:image/png;base64,#/home/ralphtrane/Documents/RPackages_dev/ACEBounds/BDOpresentation/pip_figure.png" height="400"/> </center>

**Result**: under additional assumptions, width `\(\le 2(1-\text{ST})\)`.

.small[
(For multi-leveled IV: `\(\text{ST} = \max_{z_1 \neq z_2} | P(X = 1 | Z = z_1) - P(X = 1 | Z = z_2)|\)`.)
]

---

Also unable to detect direction when using real data:

.pull-left[A: Two-sample IV bounds for the ATE of smoking on the incidence of lung cancer.]

.pull-right[B: Two-sample IV bounds for the ATE of high cholesterol on the incidence of heart attack.]

<img src="data:image/png;base64,#/home/ralphtrane/Documents/RPackages_dev/ACEBounds/figures/png/example_analyses/bivariate_bounds.png" height="450"/>

---

Conclusion: we pay a price when using two-sample rather than one-sample data.

--

That price is information about `\(\text{Cov}(X,Y | Z = z)\)`:

$$
P(X = x, Y = y | Z = z) = P(X = x | Z = z)P(Y = y | Z = z) + (2\cdot I[x = y] - 1)\text{Cov}(X, Y | Z = z)
$$

--

It is possible to find inequalities that `\(\text{Cov}(X,Y | Z = z)\)` must satisfy, given the observed values of `\(P(X = x | Z = z)\)` and `\(P(Y = y | Z = z)\)`, for the resulting `\((X,Y)|Z\)` to follow the IV model.

--

So we can get a sense of the information lost due to the two-sample design by choosing random, valid values of `\(\text{Cov}(X,Y | Z = z)\)` and reconstructing the corresponding one-sample bounds.

---

Reconstructed one-sample bounds based on two-sample bounds. 1000 one-sample bounds in each panel. Simulated data.
<center> <img src="data:image/png;base64,#/home/ralphtrane/Documents/RPackages_dev/ACEBounds/figures/png/trivariate_bounds_subset_plot.png" height="500"/> </center>

---

.pull-left[
<img src="data:image/png;base64,#/home/ralphtrane/Documents/RPackages_dev/ACEBounds/figures/png/example_analyses/trivariate_bounds.png" height="550"/>
]

.pull-right[
</br></br>

Possible one-sample IV bounds for the ATE of

A. smoking on the incidence of lung cancer

B. high cholesterol on the incidence of heart attack
]

---
layout: false

# Lessons Learned

* Two-sample data result in much more conservative bounds than one-sample data

* In practice, the genetic markers used as instruments are just too weak to give informative bounds

* Bound-based analysis does not, on its own, seem to be terribly useful in a two-sample MR study

* However, it might be useful as an addition to other sorts of analyses:
  - check if an effect estimate based on a different IV method is within the bounds
  - bound effect size if direction is already well known

</br></br></br></br></br></br></br>

.small[
(Slides created using [`xaringan`](https://bookdown.org/yihui/rmarkdown/xaringan.html). Theme available as RStudio skeleton here: https://github.com/rmtrane/XaringanForUWMadison)
]

---
layout: false

# References

<a name=bib-balke_bounds_1997></a>[Balke, A. and J. Pearl](#cite-balke_bounds_1997) (1997). "Bounds on Treatment Effects from Studies with Imperfect Compliance". In: _Journal of the American Statistical Association_ 92.439, pp. 1171-1176. ISSN: 0162-1459. DOI: [10.1080/01621459.1997.10474074](https://doi.org/10.1080%2F01621459.1997.10474074). URL: [https://doi.org/10.1080/01621459.1997.10474074](https://doi.org/10.1080/01621459.1997.10474074) (visited on Feb. 05, 2020).

<a name=bib-bloom_benefits_1997></a>[Bloom, H. S., L. L. Orr, S. H. Bell, et al.](#cite-bloom_benefits_1997) (1997). "The Benefits and Costs of JTPA Title II-A Programs: Key Findings from the National Job Training Partnership Act Study".
In: _The Journal of Human Resources_ 32.3, p. 549. ISSN: 0022166X. DOI: [10.2307/146183](https://doi.org/10.2307%2F146183). URL: [https://www.jstor.org/stable/146183?origin=crossref](https://www.jstor.org/stable/146183?origin=crossref) (visited on Apr. 30, 2021).

<a name=bib-leigh_instrumental_2004></a>[Leigh, J. and M. Schembri](#cite-leigh_instrumental_2004) (2004). "Instrumental variables technique: cigarette price provided better estimate of effects of smoking on SF-12". In: _Journal of Clinical Epidemiology_ 57.3, pp. 284-293. ISSN: 08954356. DOI: [10.1016/j.jclinepi.2003.08.006](https://doi.org/10.1016%2Fj.jclinepi.2003.08.006). URL: [https://linkinghub.elsevier.com/retrieve/pii/S0895435603003214](https://linkinghub.elsevier.com/retrieve/pii/S0895435603003214) (visited on Apr. 30, 2021).

<a name=bib-manski_nonparametric_1990></a>[Manski, C. F.](#cite-manski_nonparametric_1990) (1990). "Nonparametric Bounds on Treatment Effects". In: _The American Economic Review_ 80.2, pp. 319-323. ISSN: 0002-8282.

<a name=bib-ramsahai_causal_2012></a>[Ramsahai, R. R.](#cite-ramsahai_causal_2012) (2012). "Causal Bounds and Observable Constraints for Non-Deterministic Models". In: _J. Mach. Learn. Res._ 13, pp. 829-848. ISSN: 1532-4435.

---

# References (cont.)

<a name=bib-angrist_instrumental_2001></a>[Angrist, J. D. and A. B. Krueger](#cite-angrist_instrumental_2001) (2001). "Instrumental Variables and the Search for Identification: From Supply and Demand to Natural Experiments". In: _Journal of Economic Perspectives_ 15.4, pp. 69-85. ISSN: 0895-3309. DOI: [10.1257/jep.15.4.69](https://doi.org/10.1257%2Fjep.15.4.69). URL: [https://pubs.aeaweb.org/doi/10.1257/jep.15.4.69](https://pubs.aeaweb.org/doi/10.1257/jep.15.4.69) (visited on Apr. 30, 2021).

<a name=bib-richardson_ace_2014></a>[Richardson, T. S. and J. M. Robins](#cite-richardson_ace_2014) (2014). "ACE Bounds; SEMs with Equilibrium Conditions". In: _Statistical Science_ 29.3, pp. 363-366. ISSN: 0883-4237. DOI: [10.1214/14-STS485](https://doi.org/10.1214%2F14-STS485). arXiv: [1410.0470](https://arxiv.org/abs/1410.0470).
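
---

# Sketch: rebuilding one-sample tables

The covariance identity on the "Conclusion" slide can be illustrated with a few lines of Python. This is only a rough sketch, not the code behind the figures. The function names (`joint_from_margins`, `cov_range`, `per_level_bounds`) and the margin values are hypothetical. `cov_range` only enforces that all four cell probabilities are nonnegative; the stricter inequalities needed for `\((X,Y)|Z\)` to follow the IV model are not implemented here. `per_level_bounds` uses only the single-`\(z\)` terms (first row of each bracket) of the one-sample bounds, so it can be wider than the full bounds.

```python
import random


def joint_from_margins(px, py, cov):
    """Rebuild P(X=x, Y=y | Z=z) from P(X=1|z), P(Y=1|z) and Cov(X,Y|z).

    Keys of the returned dict are (x, y) pairs.
    """
    joint = {}
    for x in (0, 1):
        for y in (0, 1):
            p_x = px if x == 1 else 1 - px
            p_y = py if y == 1 else 1 - py
            sign = 1 if x == y else -1  # this is (2 * I[x = y] - 1)
            joint[(x, y)] = p_x * p_y + sign * cov
    return joint


def cov_range(px, py):
    """Covariances that keep all four cells nonnegative (assumption: this
    is weaker than the IV-model inequalities mentioned on the slides)."""
    low = max(-px * py, -(1 - px) * (1 - py))
    high = min(px * (1 - py), (1 - px) * py)
    return low, high


def per_level_bounds(joints):
    """Single-z terms only: max_z of -P(Y=0,X=1|z) - P(Y=1,X=0|z) and
    min_z of P(Y=1,X=1|z) + P(Y=0,X=0|z)."""
    lower = max(-j[(1, 0)] - j[(0, 1)] for j in joints)
    upper = min(j[(1, 1)] + j[(0, 0)] for j in joints)
    return lower, upper


# Hypothetical two-sample margins for a binary instrument (z = 0, 1).
p_x = [0.30, 0.45]  # P(X = 1 | Z = z)
p_y = [0.10, 0.14]  # P(Y = 1 | Z = z)

rng = random.Random(1)
joints = []
for a, b in zip(p_x, p_y):
    low, high = cov_range(a, b)
    joints.append(joint_from_margins(a, b, rng.uniform(low, high)))

print(per_level_bounds(joints))
```

Repeating the last loop many times with fresh draws gives a cloud of candidate one-sample bounds for a single pair of two-sample margins, which is the idea behind the reconstructed-bounds figures.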