This vignette shows an example of applying cisMR-cML to infer causal relationship of LDL on CAD using cis-SNPs on the PCSK9 region.
We provide the pre-processed data for LDL and CAD on the PCSK9 region.
data("mr_dat")
str(mr_dat)
#> List of 7
#> $ exp_df:Classes 'data.table' and 'data.frame': 9 obs. of 12 variables:
#> ..$ SNP : chr [1:9] "rs55862049" "rs187607506" "rs11206510" "rs17111503" ...
#> ..$ A1 : chr [1:9] "A" "C" "T" "A" ...
#> ..$ A2 : chr [1:9] "C" "T" "C" "G" ...
#> ..$ freq : num [1:9] 0.853 0.755 0.812 0.732 0.982 ...
#> ..$ b : num [1:9] -0.0309 0.0419 0.0418 -0.0373 0.3003 ...
#> ..$ se : num [1:9] 0.00289 0.00242 0.00262 0.00232 0.00773 ...
#> ..$ p : num [1:9] 1.33e-26 4.30e-67 2.23e-57 3.06e-58 0.00 ...
#> ..$ N : int [1:9] 343621 343621 343621 343621 343621 343621 343621 343621 343621
#> ..$ z : num [1:9] -10.7 17.3 16 -16.1 38.8 ...
#> ..$ cor : num [1:9] -0.0182 0.0295 0.0272 -0.0274 0.0661 ...
#> ..$ se_cor: num [1:9] 0.00171 0.00171 0.00171 0.00171 0.0017 ...
#> ..$ bJ : num [1:9] -0.01173 0.01487 -0.00377 -0.01608 0.06235 ...
#> ..- attr(*, ".internal.selfref")=<externalptr>
#> $ out_df:Classes 'data.table' and 'data.frame': 9 obs. of 12 variables:
#> ..$ SNP : chr [1:9] "rs55862049" "rs187607506" "rs11206510" "rs17111503" ...
#> ..$ A1 : chr [1:9] "A" "C" "T" "A" ...
#> ..$ A2 : chr [1:9] "C" "T" "C" "G" ...
#> ..$ freq : num [1:9] 0.868 0.804 0.848 0.736 0.986 ...
#> ..$ b : num [1:9] -0.0301 0.0636 0.0745 -0.0349 0.2565 ...
#> ..$ se : num [1:9] 0.0143 0.0129 0.0133 0.0113 0.0573 ...
#> ..$ p : num [1:9] 3.48e-02 7.67e-07 2.34e-08 2.01e-03 7.47e-06 ...
#> ..$ N : int [1:9] 184305 184305 184305 184305 184305 184305 184305 184305 184305
#> ..$ z : num [1:9] -2.11 4.94 5.58 -3.09 4.48 ...
#> ..$ cor : num [1:9] -0.00492 0.01151 0.01301 -0.00719 0.01043 ...
#> ..$ se_cor: num [1:9] 0.00233 0.00233 0.00233 0.00233 0.00233 ...
#> ..$ bJ : num [1:9] -0.00236 0.00286 0.00777 -0.00347 0.00774 ...
#> ..- attr(*, ".internal.selfref")=<externalptr>
#> $ N1 : int 343621
#> $ N2 : num 184305
#> $ LD_mat: num [1:9, 1:9] 1 -0.2278 -0.1979 0.022 -0.0396 ...
#> ..- attr(*, "dimnames")=List of 2
#> .. ..$ : chr [1:9] "rs55862049" "rs187607506" "rs11206510" "rs17111503" ...
#> .. ..$ : chr [1:9] "rs55862049" "rs187607506" "rs11206510" "rs17111503" ...
#> $ exp_IV: chr [1:8] "rs55862049" "rs187607506" "rs17111503" "rs11591147" ...
#> $ out_IV: chr "rs11206510"
As shown above, 9 SNPs in the PCSK9 region were used in the analysis
(the first real data application in the manuscript), in which 8 SNPs
(shown in mr_dat$exp_IV
) were jointly associated (selected
by GCTA-COJO)
with LDL, and 1 SNP (shown in mr_dat$out_IV
) was associate
with CAD. mr_dat$LD_mat
is the LD/correlation matrix among
the 9 SNPs, which was estimated based on the UK Biobank White-ancestry
individuals.
mr_dat$exp_df
and mr_dat$out_df
correspond
to the GWAS summary statistics for LDL (exposure) and CAD (outcome)
respectively. In particular, mr_dat$exp_df$b
,
mr_dat$exp_df$se
, mr_dat$exp_df$p
are the raw
GWAS marginal summary statistics, and mr_dat$exp_df$bJ
is
the conditional genetic effect estimates on the
exposure (similar for mr_dat$out_df
), which can be
calculated as follows (see Manuscript Section 4.2):
mr_dat$exp_df$cor = mr_dat$exp_df$b / sqrt(mr_dat$exp_df$b^2 + (mr_dat$exp_df$N - 2) *
mr_dat$exp_df$se^2)
mr_dat$exp_df$bJ = solve(mr_dat$LD_mat) %*% mr_dat$exp_df$cor
mr_dat$out_df$cor = mr_dat$out_df$b / sqrt(mr_dat$out_df$b^2 + (mr_dat$out_df$N - 2) *
mr_dat$out_df$se^2)
mr_dat$out_df$bJ = solve(mr_dat$LD_mat) %*% mr_dat$out_df$cor
Next, we apply cisMR-cML with 100 data perturbations with 5 random starts where \(\theta^{(0)}\sim U(-0.1,0.1)\):
b_exp_cond=mr_dat$exp_df$bJ
b_out_cond=mr_dat$out_df$bJ
Sig_exp1 = solve(mr_dat$LD_mat) %*% (mr_dat$exp_df$se_cor %o% mr_dat$exp_df$se_cor * mr_dat$LD_mat) %*%
solve(mr_dat$LD_mat)
Sig_out1 = solve(mr_dat$LD_mat) %*% (mr_dat$out_df$se_cor %o% mr_dat$out_df$se_cor * mr_dat$LD_mat) %*%
solve(mr_dat$LD_mat)
Sig_exp_inv=solve(Sig_exp1)
Sig_out_inv=solve(Sig_out1)
t0 = proc.time()[3]
ciscML_res = cismr_cML_DP(b_exp=b_exp_cond,b_out=b_out_cond,
Sig_exp_inv=Sig_exp_inv,Sig_out_inv=Sig_out_inv,maxit=200,
n = mr_dat$N1,random_start = 5,
min_theta_range=-0.1,max_theta_range=0.1,
num_pert=100,random_start_pert=5,random_seed = 12345)
t1 = proc.time()[3]
cat("The running time is: ", t1 - t0, " seconds.")
#> The running time is: 32.88 seconds.
cat("The estimated effect of the exposure on outcome: ", ciscML_res$BIC_DP_theta)
#> The estimated effect of the exposure on outcome: 0.1845757
cat("Standard error of the causal estimate: ", ciscML_res$BIC_DP_se)
#> Standard error of the causal estimate: 0.05461205
cat("P-value for the causal estimate: ", ciscML_res$BIC_DP_p)
#> P-value for the causal estimate: 0.0007254868
At this point, we have reproduced the LDL-CAD result in our manuscript.
If you failed to reproduce the results, please make sure you have the same R packages version (especially MASS_7.3-51.4) as those in the built vignette.
sessionInfo()
#> R version 3.6.3 (2020-02-29)
#> Platform: x86_64-apple-darwin15.6.0 (64-bit)
#> Running under: macOS 10.16
#>
#> Matrix products: default
#> BLAS: /Library/Frameworks/R.framework/Versions/3.6/Resources/lib/libRblas.0.dylib
#> LAPACK: /Library/Frameworks/R.framework/Versions/3.6/Resources/lib/libRlapack.dylib
#>
#> locale:
#> [1] en_US.UTF-8/en_US.UTF-8/en_US.UTF-8/C/en_US.UTF-8/en_US.UTF-8
#>
#> attached base packages:
#> [1] stats graphics grDevices utils datasets methods base
#>
#> other attached packages:
#> [1] cisMRcML_0.0.0.9000
#>
#> loaded via a namespace (and not attached):
#> [1] digest_0.6.35 MASS_7.3-51.4 R6_2.5.1
#> [4] lifecycle_1.0.4 jsonlite_1.8.8 evaluate_0.23
#> [7] cachem_1.0.8 rlang_1.1.3 cli_3.6.2
#> [10] rstudioapi_0.15.0 jquerylib_0.1.4 bslib_0.6.1
#> [13] rmarkdown_2.26 tools_3.6.3 numDeriv_2016.8-1.1
#> [16] xfun_0.42 yaml_2.3.8 fastmap_1.1.1
#> [19] compiler_3.6.3 htmltools_0.5.7 knitr_1.45
#> [22] sass_0.4.8