cisMR-cML for robust cis Mendelian randomization analysis

This vignette shows an example of applying cisMR-cML to infer causal relationship of LDL on CAD using cis-SNPs on the PCSK9 region.

Example

We provide the pre-processed data for LDL and CAD on the PCSK9 region.

data("mr_dat")
str(mr_dat)
#> List of 7
#>  $ exp_df:Classes 'data.table' and 'data.frame': 9 obs. of  12 variables:
#>   ..$ SNP   : chr [1:9] "rs55862049" "rs187607506" "rs11206510" "rs17111503" ...
#>   ..$ A1    : chr [1:9] "A" "C" "T" "A" ...
#>   ..$ A2    : chr [1:9] "C" "T" "C" "G" ...
#>   ..$ freq  : num [1:9] 0.853 0.755 0.812 0.732 0.982 ...
#>   ..$ b     : num [1:9] -0.0309 0.0419 0.0418 -0.0373 0.3003 ...
#>   ..$ se    : num [1:9] 0.00289 0.00242 0.00262 0.00232 0.00773 ...
#>   ..$ p     : num [1:9] 1.33e-26 4.30e-67 2.23e-57 3.06e-58 0.00 ...
#>   ..$ N     : int [1:9] 343621 343621 343621 343621 343621 343621 343621 343621 343621
#>   ..$ z     : num [1:9] -10.7 17.3 16 -16.1 38.8 ...
#>   ..$ cor   : num [1:9] -0.0182 0.0295 0.0272 -0.0274 0.0661 ...
#>   ..$ se_cor: num [1:9] 0.00171 0.00171 0.00171 0.00171 0.0017 ...
#>   ..$ bJ    : num [1:9] -0.01173 0.01487 -0.00377 -0.01608 0.06235 ...
#>   ..- attr(*, ".internal.selfref")=<externalptr> 
#>  $ out_df:Classes 'data.table' and 'data.frame': 9 obs. of  12 variables:
#>   ..$ SNP   : chr [1:9] "rs55862049" "rs187607506" "rs11206510" "rs17111503" ...
#>   ..$ A1    : chr [1:9] "A" "C" "T" "A" ...
#>   ..$ A2    : chr [1:9] "C" "T" "C" "G" ...
#>   ..$ freq  : num [1:9] 0.868 0.804 0.848 0.736 0.986 ...
#>   ..$ b     : num [1:9] -0.0301 0.0636 0.0745 -0.0349 0.2565 ...
#>   ..$ se    : num [1:9] 0.0143 0.0129 0.0133 0.0113 0.0573 ...
#>   ..$ p     : num [1:9] 3.48e-02 7.67e-07 2.34e-08 2.01e-03 7.47e-06 ...
#>   ..$ N     : int [1:9] 184305 184305 184305 184305 184305 184305 184305 184305 184305
#>   ..$ z     : num [1:9] -2.11 4.94 5.58 -3.09 4.48 ...
#>   ..$ cor   : num [1:9] -0.00492 0.01151 0.01301 -0.00719 0.01043 ...
#>   ..$ se_cor: num [1:9] 0.00233 0.00233 0.00233 0.00233 0.00233 ...
#>   ..$ bJ    : num [1:9] -0.00236 0.00286 0.00777 -0.00347 0.00774 ...
#>   ..- attr(*, ".internal.selfref")=<externalptr> 
#>  $ N1    : int 343621
#>  $ N2    : num 184305
#>  $ LD_mat: num [1:9, 1:9] 1 -0.2278 -0.1979 0.022 -0.0396 ...
#>   ..- attr(*, "dimnames")=List of 2
#>   .. ..$ : chr [1:9] "rs55862049" "rs187607506" "rs11206510" "rs17111503" ...
#>   .. ..$ : chr [1:9] "rs55862049" "rs187607506" "rs11206510" "rs17111503" ...
#>  $ exp_IV: chr [1:8] "rs55862049" "rs187607506" "rs17111503" "rs11591147" ...
#>  $ out_IV: chr "rs11206510"

As shown above, 9 SNPs in the PCSK9 region were used in the analysis (the first real data application in the manuscript), in which 8 SNPs (shown in mr_dat$exp_IV) were jointly associated (selected by GCTA-COJO) with LDL, and 1 SNP (shown in mr_dat$out_IV) was associate with CAD. mr_dat$LD_mat is the LD/correlation matrix among the 9 SNPs, which was estimated based on the UK Biobank White-ancestry individuals.

mr_dat$exp_df and mr_dat$out_df correspond to the GWAS summary statistics for LDL (exposure) and CAD (outcome) respectively. In particular, mr_dat$exp_df$b, mr_dat$exp_df$se, mr_dat$exp_df$p are the raw GWAS marginal summary statistics, and mr_dat$exp_df$bJ is the conditional genetic effect estimates on the exposure (similar for mr_dat$out_df), which can be calculated as follows (see Manuscript Section 4.2):

mr_dat$exp_df$cor = mr_dat$exp_df$b / sqrt(mr_dat$exp_df$b^2 + (mr_dat$exp_df$N - 2) *
                                             mr_dat$exp_df$se^2)
mr_dat$exp_df$bJ = solve(mr_dat$LD_mat) %*% mr_dat$exp_df$cor
mr_dat$out_df$cor = mr_dat$out_df$b / sqrt(mr_dat$out_df$b^2 + (mr_dat$out_df$N - 2) *
                                             mr_dat$out_df$se^2)
mr_dat$out_df$bJ = solve(mr_dat$LD_mat) %*% mr_dat$out_df$cor

Next, we apply cisMR-cML with 100 data perturbations with 5 random starts where $\theta^{(0)}\sim U(-0.1,0.1)$:

b_exp_cond=mr_dat$exp_df$bJ
b_out_cond=mr_dat$out_df$bJ 
Sig_exp1 = solve(mr_dat$LD_mat) %*% (mr_dat$exp_df$se_cor %o% mr_dat$exp_df$se_cor * mr_dat$LD_mat) %*%
  solve(mr_dat$LD_mat)
Sig_out1 = solve(mr_dat$LD_mat) %*% (mr_dat$out_df$se_cor %o% mr_dat$out_df$se_cor * mr_dat$LD_mat) %*%
  solve(mr_dat$LD_mat)
Sig_exp_inv=solve(Sig_exp1)
Sig_out_inv=solve(Sig_out1)

t0 = proc.time()[3]
ciscML_res = cismr_cML_DP(b_exp=b_exp_cond,b_out=b_out_cond,
                          Sig_exp_inv=Sig_exp_inv,Sig_out_inv=Sig_out_inv,maxit=200,
                          n = mr_dat$N1,random_start = 5,
                          min_theta_range=-0.1,max_theta_range=0.1,
                          num_pert=100,random_start_pert=5,random_seed = 12345)
t1 = proc.time()[3]
cat("The running time is: ", t1 - t0, " seconds.")
#> The running time is:  32.88  seconds.

cat("The estimated effect of the exposure on outcome: ", ciscML_res$BIC_DP_theta)
#> The estimated effect of the exposure on outcome:  0.1845757
cat("Standard error of the causal estimate: ", ciscML_res$BIC_DP_se)
#> Standard error of the causal estimate:  0.05461205
cat("P-value for the causal estimate: ", ciscML_res$BIC_DP_p)
#> P-value for the causal estimate:  0.0007254868

At this point, we have reproduced the LDL-CAD result in our manuscript.

If you failed to reproduce the results, please make sure you have the same R packages version (especially MASS_7.3-51.4) as those in the built vignette.

sessionInfo()
#> R version 3.6.3 (2020-02-29)
#> Platform: x86_64-apple-darwin15.6.0 (64-bit)
#> Running under: macOS  10.16
#> 
#> Matrix products: default
#> BLAS:   /Library/Frameworks/R.framework/Versions/3.6/Resources/lib/libRblas.0.dylib
#> LAPACK: /Library/Frameworks/R.framework/Versions/3.6/Resources/lib/libRlapack.dylib
#> 
#> locale:
#> [1] en_US.UTF-8/en_US.UTF-8/en_US.UTF-8/C/en_US.UTF-8/en_US.UTF-8
#> 
#> attached base packages:
#> [1] stats     graphics  grDevices utils     datasets  methods   base     
#> 
#> other attached packages:
#> [1] cisMRcML_0.0.0.9000
#> 
#> loaded via a namespace (and not attached):
#>  [1] digest_0.6.35       MASS_7.3-51.4       R6_2.5.1           
#>  [4] lifecycle_1.0.4     jsonlite_1.8.8      evaluate_0.23      
#>  [7] cachem_1.0.8        rlang_1.1.3         cli_3.6.2          
#> [10] rstudioapi_0.15.0   jquerylib_0.1.4     bslib_0.6.1        
#> [13] rmarkdown_2.26      tools_3.6.3         numDeriv_2016.8-1.1
#> [16] xfun_0.42           yaml_2.3.8          fastmap_1.1.1      
#> [19] compiler_3.6.3      htmltools_0.5.7     knitr_1.45         
#> [22] sass_0.4.8

cisMR-cML for robust cis Mendelian randomization analysis

Zhaotong Lin

March 13, 2024

Example