This pipeline is designed to facilitate genome-wide and cis-based two-sample Mendelian randomisation (MR) analyses. In addition to applying various univariable MR methods, the pipeline includes options for data pre-processing, harmonisation, and clumping. The function produces results tables, relevant statistics (e.g. heterogeneity statistics), and a forest plot of the results, as well as additional tables and information to facilitate further analyses.
The main R function to run the pipeline is ‘perform_mr’. The function and its dependencies are available for download from GitHub, and you can also access the function at: K:/isise/Procardis Topics/Proteomics QTLs/Analyses/MRpipeline/Scripts/perform_mr.R. We recommend you use the ‘Examples’ R file (below) to get started. This file will source the function and make sure you have all necessary R packages installed.
You will need to install various R packages to run the function. Some of these are available on CRAN (rio, tidyverse, data.table, MendelianRandomization, testthat, grid, gridExtra, ieugwasr, patchwork); others (TwoSampleMR, gsmr) you will need to install from source. The best way to ensure you have all the necessary packages installed is to use the ‘Examples’ file provided: K:/isise/Procardis Topics/Proteomics QTLs/Analyses/MRpipeline/Scripts/Examples/perform_mr_example.R
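If you want to check your setup before opening the ‘Examples’ file, the following is a minimal sketch that reports which of the packages listed above are not yet installed. The helper name not_installed is hypothetical and is not part of the pipeline; the sketch only reports missing packages and does not install anything.

```r
# Package names copied from the list above.
cran_pkgs  <- c("rio", "tidyverse", "data.table", "MendelianRandomization",
                "testthat", "grid", "gridExtra", "ieugwasr", "patchwork")
other_pkgs <- c("TwoSampleMR", "gsmr")

# Hypothetical helper (not part of the pipeline): returns the names of
# packages whose namespace cannot be loaded, i.e. that are not installed.
not_installed <- function(pkgs) {
  ok <- vapply(pkgs, requireNamespace, logical(1), quietly = TRUE)
  pkgs[!ok]
}

# Packages that still need installing on this machine.
not_installed(c(cran_pkgs, other_pkgs))
```

Note that requireNamespace loads each namespace it finds as a side effect, so the check may take a few seconds on first use.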
The pipeline will implement the following MR methods:
You can read about these methods here: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC9123217/
The function requires the following columns for the exposure and
outcome input data sets: chr, pos, rsid, effect_allele, other_allele,
beta, se, pval, eaf
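A quick way to confirm that an input data set has the required columns is sketched below. The helper missing_columns and the toy data are hypothetical and for illustration only; the pipeline may perform its own checks.

```r
# Column names taken from the requirement stated above.
required_cols <- c("chr", "pos", "rsid", "effect_allele", "other_allele",
                   "beta", "se", "pval", "eaf")

# Hypothetical helper: returns the required columns absent from a data frame.
missing_columns <- function(dat, required = required_cols) {
  setdiff(required, names(dat))
}

# Toy exposure data with 'eaf' deliberately left out.
exposure <- data.frame(chr = 1, pos = 12345, rsid = "rs1",
                       effect_allele = "A", other_allele = "G",
                       beta = 0.12, se = 0.03, pval = 6e-5,
                       stringsAsFactors = FALSE)

missing_columns(exposure)  # returns "eaf"
```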
Harmonisation is straightforward but requires an action for
dealing with palindromic SNPs – SNPs that have the same alleles on the
forward strand as the reverse strand (e.g. C/G on forward and G/C on
reverse). When the reference strands of the exposure and outcome data
sets are unknown, palindromic SNPs can be challenging to harmonise
because the orientation of the alleles is ambiguous. In some cases we
can use the effect allele frequency to resolve the ambiguity. For
example, consider a SNP with alleles A & T, with a frequency of 0.11
for allele A in the exposure data set and 0.91 in the outcome data set
(see Table below). Assume that the effect allele is A in both data sets
and that both come from the same population (meaning they are likely to
have similar minor allele frequencies). Because the minor allele in
the exposure data set is A (frequency 0.11) but the minor allele in
the outcome data set is T (frequency 1-0.91 = 0.09), it is reasonable to
infer that the exposure and outcome data are on different strands.
Therefore, we need to reverse the sign of the outcome estimate (multiply
by -1) to harmonise the estimates across data sets.
However, resolving these ambiguities is much more difficult when the effect allele frequencies are close to 0.5. In such cases the SNPs are considered too ambiguous to resolve and are usually discarded. Note that this only applies when the exposure and outcome strands are uncertain. For more information see https://academic.oup.com/ije/article/45/6/1717/3072174?login=true
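The frequency-based reasoning in the worked example can be sketched in a few lines of R. This illustrates the logic only and is not the pipeline's or TwoSampleMR's implementation: the helper names are hypothetical, and the tolerance tol = 0.08 is an assumed value chosen for the example.

```r
# Hypothetical helper: TRUE when the two alleles are complementary bases
# (A/T or C/G), i.e. the SNP is palindromic.
is_palindromic <- function(a1, a2) {
  pair <- paste0(toupper(a1), toupper(a2))
  pair %in% c("AT", "TA", "CG", "GC")
}

# Hypothetical helper: for a palindromic SNP with the same effect allele label
# in both data sets, compare effect allele frequencies (EAFs).
# 'tol' is an assumed threshold: an EAF within 'tol' of 0.5 is too ambiguous.
infer_palindromic <- function(eaf_exposure, eaf_outcome, tol = 0.08) {
  ambiguous <- abs(eaf_exposure - 0.5) < tol | abs(eaf_outcome - 0.5) < tol
  same_side <- (eaf_exposure < 0.5) == (eaf_outcome < 0.5)
  ifelse(ambiguous, "ambiguous", ifelse(same_side, "keep", "flip"))
}

# Worked example from the text: alleles A/T, EAF 0.11 (exposure), 0.91 (outcome).
is_palindromic("A", "T")               # TRUE
decision <- infer_palindromic(0.11, 0.91)  # "flip"

# Reverse the sign of the outcome estimate when a flip is inferred.
beta_outcome    <- 0.05                # toy value
beta_harmonised <- if (decision == "flip") -beta_outcome else beta_outcome
```

With frequencies near 0.5 (for example 0.48 and 0.52) the same helper returns "ambiguous", which corresponds to the case where the SNP would usually be discarded.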
The function has 3 options for dealing with palindromic SNPs, which are identical to those used in the TwoSampleMR package:
For Generalised IVW and GSMR, you will need to supply an LD
matrix. For smaller numbers of SNPs (i.e. <500), there is an option
to request that the function create an LD matrix for you. However, there
is no guarantee that all SNPs will be found in the reference panel
(1000G V3).
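Because some SNPs may be missing from the reference panel, it can help to check which instruments actually appear in the LD matrix and to align the two before analysis. The sketch below uses a toy matrix with plain rsID row and column names, which is an assumption: a matrix obtained from a real reference panel may label its rows differently (for example with alleles appended), in which case the names would need cleaning first.

```r
# Toy LD matrix standing in for one built from a reference panel;
# rs3 is assumed to be absent from the panel.
ld <- matrix(c(1, 0.2,
               0.2, 1), nrow = 2,
             dimnames = list(c("rs1", "rs2"), c("rs1", "rs2")))

# Toy instrument data using the 'rsid' column name required by the pipeline.
instruments <- data.frame(rsid = c("rs1", "rs2", "rs3"),
                          beta = c(0.10, -0.05, 0.08),
                          stringsAsFactors = FALSE)

# SNPs requested but not found in the LD matrix.
not_found <- setdiff(instruments$rsid, rownames(ld))

# Keep only SNPs present in the matrix, and order the matrix to match the data.
kept       <- instruments[instruments$rsid %in% rownames(ld), ]
ld_aligned <- ld[kept$rsid, kept$rsid, drop = FALSE]

not_found  # returns "rs3"
```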
Please don’t hesitate to contact Adam Von Ende or Elsa Valdes-Marquez