Hopewell Team MR Pipeline

This pipeline is designed to facilitate genome-wide and cis-based two-sample Mendelian randomisation (MR) analyses. In addition to applying various univariable MR methods, the pipeline includes options for data pre-processing, harmonisation, and clumping. The function produces results tables, relevant statistics (e.g. heterogeneity statistics), and a forest plot of the results, as well as additional tables and information to facilitate further analyses.

Installation

The main R function to run the pipeline is ‘perform_mr’. The function and its dependencies are available for download from GitHub, and you can also access the function at: K:/isise/Procardis Topics/Proteomics QTLs/Analyses/MRpipeline/Scripts/perform_mr.R. We recommend you use the ‘Examples’ R file (below) to get started. This file will source the function and make sure you have all necessary R packages installed.

Usage

You will need to install various R packages to run the function. Some of these are available on CRAN (rio, tidyverse, data.table, MendelianRandomization, testthat, gridExtra, ieugwasr, patchwork), grid is included with R itself, and others (TwoSampleMR, gsmr) you will need to install from source. The best way to ensure you have all the necessary packages installed is to use the ‘Examples’ file provided: K:/isise/Procardis Topics/Proteomics QTLs/Analyses/MRpipeline/Scripts/Examples/perform_mr_example.R
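If you want to see which packages are missing before running the ‘Examples’ file, a minimal base R sketch such as the one below may help. The variable names are illustrative only, and the sketch reports missing packages rather than installing them.

```r
# Packages the text lists as available on CRAN. "grid" is left out because
# it ships with R itself and does not need installing.
cran_pkgs <- c("rio", "tidyverse", "data.table", "MendelianRandomization",
               "testthat", "gridExtra", "ieugwasr", "patchwork")

# Packages the text says must be installed from source.
source_pkgs <- c("TwoSampleMR", "gsmr")

# Compare against the packages installed in the current library paths.
installed <- rownames(installed.packages())
missing_cran <- setdiff(cran_pkgs, installed)
missing_source <- setdiff(source_pkgs, installed)

if (length(missing_cran) > 0) {
  message("Missing CRAN packages: ", paste(missing_cran, collapse = ", "))
  # install.packages(missing_cran)  # uncomment to install from CRAN
}
if (length(missing_source) > 0) {
  message("Install from source: ", paste(missing_source, collapse = ", "))
}
```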


Methods

The pipeline will implement the following MR methods:

  • Inverse variance weighted (fixed effects & random effects)
  • Generalised inverse variance weighted (fixed effects & random effects)
  • Weighted median
  • Weighted mode
  • MR Egger
  • MR PRESSO
  • GSMR with HEIDI outlier test

You can read about these methods here: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC9123217/

Specific considerations

Input files


The function expects the following columns in both the exposure and outcome input data sets: chr, pos, rsid, effect_allele, other_allele, beta, se, pval, eaf

Of these, only rsid, effect_allele, other_allele, beta, se, and pval need to contain data. If your analysis does not need one of the other columns (e.g. eaf), the column must still be present, so create it as an empty column in your data set.
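As an illustration, the sketch below adds any absent columns to a toy data set. The helper name add_missing_cols is hypothetical, and filling the new columns with NA is an assumption about what counts as "empty"; check how the pipeline treats NA before relying on it.

```r
# Column names listed in the text for the exposure and outcome data sets.
expected_cols <- c("chr", "pos", "rsid", "effect_allele", "other_allele",
                   "beta", "se", "pval", "eaf")

# Hypothetical helper: add each absent column filled with NA (an assumption),
# then put the expected columns first and keep any extra columns after them.
add_missing_cols <- function(dat, cols = expected_cols) {
  for (col in setdiff(cols, names(dat))) {
    dat[[col]] <- NA
  }
  dat[, c(cols, setdiff(names(dat), cols)), drop = FALSE]
}

# Toy exposure data containing only the columns that must hold data.
exposure <- data.frame(
  rsid = c("rs1", "rs2"),
  effect_allele = c("A", "C"),
  other_allele = c("G", "T"),
  beta = c(0.12, -0.05),
  se = c(0.02, 0.01),
  pval = c(2e-9, 6e-7),
  stringsAsFactors = FALSE
)

exposure <- add_missing_cols(exposure)
names(exposure)
```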

Harmonisation


Harmonisation is straightforward but requires a decision on how to handle palindromic SNPs: SNPs that have the same alleles on the forward strand as on the reverse strand (e.g. C/G on forward and G/C on reverse). When the reference strands of the exposure and outcome data sets are unknown, palindromic SNPs can be challenging to harmonise because the orientation of the alleles is ambiguous.

In some cases we can use the effect allele frequency to resolve the ambiguity. For example, consider a SNP with alleles A and T, with a frequency of 0.11 for allele A in the exposure data set and 0.91 in the outcome data set (see Table below). The effect allele is A in both data sets, and we assume both come from the same population (meaning they are likely to have similar minor allele frequencies). Because the minor allele in the exposure data set is A (frequency 0.11) but the minor allele in the outcome data set is T (frequency 1-0.91 = 0.09), it is reasonable to infer that the exposure and outcome data are on different strands. Therefore, we need to reverse (multiply by -1) the outcome estimate to harmonise the estimates across data sets.

However, resolving these ambiguities is much more difficult when the effect allele frequencies are closer to 0.5. In such cases SNPs are determined to be too ambiguous to resolve, and are usually discarded. Note that this only applies when the exposure and outcome strands are uncertain. For more information see https://academic.oup.com/ije/article/45/6/1717/3072174?login=true

Description

The function has 3 options for dealing with palindromic SNPs, which are identical to those used in the TwoSampleMR package:

  1. Assume SNPs in the exposure and outcome GWAS are on the same strand
  2. Try to infer the effect allele based on the effect allele frequency*
  3. Correct the strand for non-palindromic SNPs, but drop all palindromic SNPs
*Palindromic SNPs will be discarded if the minor allele frequency is >0.42 (i.e. the effect allele frequency is too close to 0.5 to infer the strand)
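The frequency-based inference used by option 2 can be sketched in base R as follows. This is an illustration of the reasoning, not the pipeline's or TwoSampleMR's actual code: the helper names are hypothetical, and treating a SNP as ambiguous when the minor allele frequency exceeds 0.42 in either data set is an assumption.

```r
# A SNP is palindromic when its two alleles are complementary (A/T or C/G).
is_palindromic <- function(a1, a2) {
  paste0(a1, a2) %in% c("AT", "TA", "CG", "GC")
}

# Hypothetical helper for palindromic SNPs whose effect allele has the same
# label in both data sets. Returns "keep", "flip" or "drop" for each SNP.
resolve_palindromic <- function(eaf_exp, eaf_out, maf_threshold = 0.42) {
  maf_exp <- pmin(eaf_exp, 1 - eaf_exp)
  maf_out <- pmin(eaf_out, 1 - eaf_out)
  # Assumption: ambiguous if either frequency is too close to 0.5.
  ambiguous <- maf_exp > maf_threshold | maf_out > maf_threshold
  # TRUE when the effect allele is minor in one data set and major in the other.
  opposite <- (eaf_exp < 0.5) != (eaf_out < 0.5)
  ifelse(ambiguous, "drop", ifelse(opposite, "flip", "keep"))
}

is_palindromic("A", "T")                              # TRUE
resolve_palindromic(eaf_exp = 0.11, eaf_out = 0.91)   # "flip"
resolve_palindromic(eaf_exp = 0.11, eaf_out = 0.13)   # "keep"
resolve_palindromic(eaf_exp = 0.48, eaf_out = 0.53)   # "drop"

# Applying the decision to a toy outcome estimate.
beta_out <- 0.2
decision <- resolve_palindromic(0.11, 0.91)
beta_out_harmonised <- if (decision == "flip") -beta_out else beta_out
```

The first call to resolve_palindromic reproduces the A/T example from the Harmonisation section (0.11 in the exposure data, 0.91 in the outcome data), where the outcome estimate is multiplied by -1.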

Generalised IVW and GSMR


For Generalised IVW and GSMR, you will need to supply an LD matrix. For smaller numbers of SNPs (i.e. <500), there is an option to request that the function create an LD matrix for you. However, there is no guarantee that all SNPs will be found in the reference panel (1000G V3).

Also note that these methods are designed to accommodate correlated SNPs, so you may not need to perform clumping.
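Because some SNPs may be absent from the reference panel, it can be worth checking an LD matrix against your SNP list before supplying it. The sketch below uses a toy matrix and made-up rsids; it assumes the matrix rows and columns are named by rsid, which may not match the format the pipeline expects.

```r
# Toy LD matrix for three SNPs. In practice this would come from a
# reference panel.
snp_ids <- c("rs1", "rs2", "rs3")
ld <- matrix(c(1.0, 0.3, 0.1,
               0.3, 1.0, 0.2,
               0.1, 0.2, 1.0),
             nrow = 3,
             dimnames = list(snp_ids, snp_ids))

# SNPs remaining after harmonisation (rs4 is absent from the toy matrix).
harmonised_snps <- c("rs3", "rs1", "rs4")

found <- intersect(harmonised_snps, rownames(ld))
not_found <- setdiff(harmonised_snps, rownames(ld))

# Subset and reorder so rows and columns follow the order of the SNP list.
ld_aligned <- ld[found, found, drop = FALSE]

if (length(not_found) > 0) {
  message("SNPs not in the LD matrix: ", paste(not_found, collapse = ", "))
}
```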

MR PRESSO

MR PRESSO recommends that the number of bootstraps used to build the empirical distribution equals n_snps/0.05 (e.g. 10,000 bootstraps for 500 SNPs). The function defaults to this recommended number, but a large number of bootstraps can substantially increase runtime. You can choose a smaller number of bootstraps, but MR PRESSO may issue a warning and its results may not be reliable.
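The recommended number of bootstraps is simple to compute in advance, which can help when estimating runtime. The variable names below are illustrative only.

```r
# Number of SNPs used as instruments (toy value from the example above).
n_snps <- 500

# The 0.05 divisor from the rule above.
threshold <- 0.05

# round() guards against floating-point error in the division.
n_bootstraps <- round(n_snps / threshold)
n_bootstraps  # 10000
```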

rsid vs chr:pos

Currently, the pipeline will only harmonise on rsid. However, there is a workaround if you wish to harmonise on chr:pos: create an ‘rsid’ column in the exposure and outcome data sets that contains chr:pos (e.g. chr11:14335353). This works because the harmonisation function simply matches strings from one data set to another, so it does not care whether the value is ‘rs14343’, ‘chr11:14335353’ or ‘flyingmonkey1234’.

There is one catch: clumping and LD matrix creation rely on rsids, so they will not work with chr:pos identifiers. To work around this, we have added an experimental argument (‘map_rsids=T’) that maps chr:pos from the harmonised SNPs to rsids, so that the function can then perform downstream clumping, etc. You will need two additional packages for this: SNPlocs.Hsapiens.dbSNP144.GRCh37 and SNPlocs.Hsapiens.dbSNP155.GRCh38. If you are harmonising on chr:pos, make sure your exposure and outcome data sets are on the same build!
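A minimal sketch of building a chr:pos identifier in the ‘rsid’ column is shown below. The helper name make_chrpos_id and the toy data are hypothetical, and the sketch assumes the chr column does not already carry a "chr" prefix.

```r
# Hypothetical helper: build identifiers of the form "chr11:14335353".
# as.integer() prevents large positions being printed in scientific
# notation (e.g. "1e+08").
make_chrpos_id <- function(chr, pos) {
  paste0("chr", chr, ":", as.integer(pos))
}

# Toy exposure and outcome data sets on the same genome build.
exposure <- data.frame(chr = c(11, 2), pos = c(14335353, 100000000))
outcome <- data.frame(chr = c(2, 11), pos = c(100000000, 14335353))

exposure$rsid <- make_chrpos_id(exposure$chr, exposure$pos)
outcome$rsid <- make_chrpos_id(outcome$chr, outcome$pos)

# SNPs present in both data sets, matched purely as strings.
shared <- intersect(exposure$rsid, outcome$rsid)
shared
```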

Questions? / Suggestions?

Please don’t hesitate to contact Adam Von Ende or Elsa Valdes-Marquez.