The MRPracticals package contains a sample data frame, and two vignette files with code and instructions:
Running the command vignette(“MRBase”) will display a document giving a detailed description of the MR Base practical.
Running the command vignette(“RadialMR”) will display a document giving a detailed description of the RadialMR practical.
Running the command vignette(“MVMR Tutorial”) will display a document giving a detailed description of the MVMR practical.
if(F){
install.packages(c("devtools", "knitr", "rmarkdown"))
library(devtools)
install_github(c("MRCIEU/TwoSampleMR","MRCIEU/MRInstruments"))
library(devtools)
install_github("WSpiller/MRPracticals",build_opts = c("--no-resave-data", "--no-manual"),build_vignettes = TRUE)
}
library(TwoSampleMR)
## TwoSampleMR version 0.5.6
## [>] New: Option to use non-European LD reference panels for clumping etc
## [>] Some studies temporarily quarantined to verify effect allele
## [>] See news(package='TwoSampleMR') and https://gwas.mrcieu.ac.uk for further details
A data frame of the instruments for an exposure is required. Each line has the information for one variant for one exposure. The minimum information required for MR analysis is the following: SNP - rs ID beta - The effect size. If the trait is binary then log(OR) should be used se - The standard error of the effect size effect_allele - The allele of the SNP which has the effect marked in beta
Other information that is useful for MR can also be provided:
other_allele - The non-effect allele eaf - The effect allele frequency Phenotype - The name of the phenotype for which the SNP has an effect You can also provide the following extra information:
chr - Physical position of variant (chromosome) position - Physical position of variant (position) samplesize - Sample size for estimating the effect size ncase - Number of cases ncontrol - Number of controls pval - The P-value for the SNP’s association with the exposure units - The units in which the effects are presented gene - The gene or other annotation for the the SNP
An example of a text file with the default column names is provided as part of the package, the first few rows look like this:
Phenotype SNP beta se effect_allele other_allele eaf pval units gene samplesize BMI rs10767664 0.19 0.0306122448979592 A T 0.78 5e-26 kg/m2 BDNF 225238 BMI rs13078807 0.1 0.0204081632653061 G A 0.2 4e-11 kg/m2 CADM2 221431 BMI rs1514175 0.07 0.0204081632653061 A G 0.43 8e-14 kg/m2 TNNI3K 207641 BMI rs1558902 0.39 0.0204081632653061 A T 0.42 5e-120 kg/m2 FTO 222476 BMI rs10968576 0.11 0.0204081632653061 G A 0.31 3e-13 kg/m2 LRRN6C 247166 BMI rs2241423 0.13 0.0204081632653061 G A 0.78 1e-18 kg/m2 LBXCOR1 227886
bmi_file <- system.file("extdata", "bmi.txt", package="TwoSampleMR")
head(bmi_file)
## [1] "C:/Users/13619/Documents/R/win-library/4.0/TwoSampleMR/extdata/bmi.txt"
bmi_exp_dat <- read_exposure_data(bmi_file)
head(bmi_exp_dat)
## SNP beta.exposure se.exposure effect_allele.exposure
## 1 rs10767664 0.19 0.03061224 A
## 2 rs13078807 0.10 0.02040816 G
## 3 rs1514175 0.07 0.02040816 A
## 4 rs1558902 0.39 0.02040816 A
## 5 rs10968576 0.11 0.02040816 G
## 6 rs2241423 0.13 0.02040816 G
## other_allele.exposure eaf.exposure pval.exposure units.exposure gene.exposure
## 1 T 0.78 5e-26 kg/m2 BDNF
## 2 A 0.20 4e-11 kg/m2 CADM2
## 3 G 0.43 8e-14 kg/m2 TNNI3K
## 4 T 0.42 5e-120 kg/m2 FTO
## 5 A 0.31 3e-13 kg/m2 LRRN6C
## 6 A 0.78 1e-18 kg/m2 LBXCOR1
## samplesize.exposure exposure mr_keep.exposure pval_origin.exposure
## 1 225238 BMI TRUE reported
## 2 221431 BMI TRUE reported
## 3 207641 BMI TRUE reported
## 4 222476 BMI TRUE reported
## 5 247166 BMI TRUE reported
## 6 227886 BMI TRUE reported
## units.exposure_dat id.exposure data_source.exposure
## 1 kg/m2 tyeVoC textfile
## 2 kg/m2 tyeVoC textfile
## 3 kg/m2 tyeVoC textfile
## 4 kg/m2 tyeVoC textfile
## 5 kg/m2 tyeVoC textfile
## 6 kg/m2 tyeVoC textfile
If the text file does not have default column names, this can still be read in as follows. Here are the first few rows of an example:
rsid,effect,SE,a1,a2,a1_freq,p-value,Units,Gene,n rs10767664,0.19,0.030612245,A,T,0.78,5.00E-26,kg/m2,BDNF,225238 rs13078807,0.1,0.020408163,G,A,0.2,4.00E-11,kg/m2,CADM2,221431 rs1514175,0.07,0.020408163,A,G,0.43,8.00E-14,kg/m2,TNNI3K,207641 rs1558902,0.39,0.020408163,A,T,0.42,5.00E-120,kg/m2,FTO,222476 Note that this is a CSV file, with commas separating fields. The file is located here:
bmi2_file <- system.file("extdata/bmi.csv", package = "TwoSampleMR")
bmi_exp_dat <- read_exposure_data(
filename = bmi2_file,
sep = ",",
snp_col = "rsid",
beta_col = "effect",
se_col = "SE",
effect_allele_col = "a1",
other_allele_col = "a2",
eaf_col = "a1_freq",
pval_col = "p-value",
units_col = "Units",
gene_col = "Gene",
samplesize_col = "n"
)
## No phenotype name specified, defaulting to 'exposure'.
#> No phenotype name specified, defaulting to 'exposure'.
head(bmi_exp_dat)
## SNP beta.exposure se.exposure effect_allele.exposure
## 1 rs10767664 0.19 0.03061224 A
## 2 rs13078807 0.10 0.02040816 G
## 3 rs1514175 0.07 0.02040816 A
## 4 rs1558902 0.39 0.02040816 A
## 5 rs10968576 0.11 0.02040816 G
## 6 rs2241423 0.13 0.02040816 G
## other_allele.exposure eaf.exposure pval.exposure units.exposure gene.exposure
## 1 T 0.78 5e-26 kg/m2 BDNF
## 2 A 0.20 4e-11 kg/m2 CADM2
## 3 G 0.43 8e-14 kg/m2 TNNI3K
## 4 T 0.42 5e-120 kg/m2 FTO
## 5 A 0.31 3e-13 kg/m2 LRRN6C
## 6 A 0.78 1e-18 kg/m2 LBXCOR1
## samplesize.exposure exposure mr_keep.exposure pval_origin.exposure
## 1 225238 exposure TRUE reported
## 2 221431 exposure TRUE reported
## 3 207641 exposure TRUE reported
## 4 222476 exposure TRUE reported
## 5 247166 exposure TRUE reported
## 6 227886 exposure TRUE reported
## units.exposure_dat id.exposure data_source.exposure
## 1 kg/m2 UmWLFX textfile
## 2 kg/m2 UmWLFX textfile
## 3 kg/m2 UmWLFX textfile
## 4 kg/m2 UmWLFX textfile
## 5 kg/m2 UmWLFX textfile
## 6 kg/m2 UmWLFX textfile
random_df <- data.frame(
SNP = c("rs1", "rs2"),
beta = c(1, 2),
se = c(1, 2),
effect_allele = c("A", "T")
)
random_df
## SNP beta se effect_allele
## 1 rs1 1 1 A
## 2 rs2 2 2 T
This can be formatted like so:
random_exp_dat <- format_data(random_df, type="exposure")
## No phenotype name specified, defaulting to 'exposure'.
## Warning in format_data(random_df, type = "exposure"): The following columns are not present but are helpful for harmonisation
## other_alleleeaf
## Inferring p-values
#> No phenotype name specified, defaulting to 'exposure'.
#> Warning in format_data(random_df, type = "exposure"): The following columns are not present but are helpful for harmonisation
#> other_alleleeaf
#> Inferring p-values
random_exp_dat
## SNP beta.exposure se.exposure effect_allele.exposure exposure
## 1 rs1 1 1 A exposure
## 2 rs2 2 2 T exposure
## mr_keep.exposure pval.exposure pval_origin.exposure id.exposure
## 1 TRUE 0.3173105 inferred QcOEI6
## 2 TRUE 0.3173105 inferred QcOEI6
## other_allele.exposure eaf.exposure
## 1 NA NA
## 2 NA NA
GWAS catalog * The NHGRI-EBI GWAS catalog contains a catalog of significant associations obtained from GWASs. This version of the data is filtered and harmonised to contain associations that have the required data to perform MR, to ensure that the units used to report effect sizes from a particular study are all the same, and other data cleaning operations. To use the GWAS catalog:
library(MRInstruments)
data(gwas_catalog)
head(gwas_catalog)
## Phenotype_simple
## 1 Eosinophil percentage of white cells
## 2 Eosinophil counts
## 3 Medication use (agents acting on the renin-angiotensin system)
## 4 Post bronchodilator FEV1
## 5 DNA methylation variation (age effect)
## 6 Ankylosing spondylitis
## MAPPED_TRAIT_EFO
## 1 eosinophil percentage of leukocytes
## 2 eosinophil count
## 3 Agents acting on the renin-angiotensin system use measurement
## 4 forced expiratory volume, response to bronchodilator
## 5 DNA methylation
## 6 ankylosing spondylitis
## MAPPED_TRAIT_EFO_URI
## 1 http://www.ebi.ac.uk/efo/EFO_0007991
## 2 http://www.ebi.ac.uk/efo/EFO_0004842
## 3 http://www.ebi.ac.uk/efo/EFO_0009931
## 4 http://www.ebi.ac.uk/efo/EFO_0004314, http://purl.obolibrary.org/obo/GO_0097366
## 5 http://purl.obolibrary.org/obo/GO_0006306
## 6 http://www.ebi.ac.uk/efo/EFO_0003898
## Initial_sample_description
## 1 172,378 European ancestry individuals
## 2 172,275 European ancestry individuals
## 3 62,752 European ancestry cases, 174,778 European ancestry controls
## 4 10,094 European ancestry current and former smoker individuals, 3,260 African American current and former smoker individuals, 178 current and former smoker individuals
## 5 Up to 954 individuals
## 6 921 Turkish ancestry cases, 907 Turkish ancestry controls, 422 Iranian ancestry cases, 754 Iranian ancestry controls
## Replication_sample_description STUDY.ACCESSION
## 1 <NA> GCST004600
## 2 <NA> GCST004606
## 3 <NA> GCST007930
## 4 <NA> GCST003262
## 5 <NA> GCST006660
## 6 <NA> GCST007844
## Phenotype Phenotype_info
## 1 Eosinophil percentage of white cells
## 2 Eosinophil counts
## 3 Medication use (agents acting on the renin-angiotensin system)
## 4 Post bronchodilator FEV1
## 5 DNA methylation variation (age effect)
## 6 Ankylosing spondylitis
## PubmedID Author Year SNP chr bp_ens_GRCh38 Region gene
## 1 27863252 Astle WJ 2016 rs1000005 21 33060745 21q22.11 AP000282.2
## 2 27863252 Astle WJ 2016 rs1000005 21 33060745 21q22.11 AP000282.2
## 3 31015401 Wu Y 2019 rs1000010 3 11562645 3p25.3 VGLL4
## 4 26634245 Lutz SM 2015 rs10000225 4 144312789 4q31.21 Intergenic
## 5 30348214 Zhang Q 2018 rs10000513 4 160334994 4q32.1 NR
## 6 30946743 Li Z 2019 rs10000518 4 11502867 4p15.33 HS3ST1
## Gene_ens effect_allele other_allele beta se pval
## 1 AP000282.2,LINC00945 C G -0.02652552 0.003826531 2e-13
## 2 AP000282.2,LINC00945 C G -0.02481715 0.003571429 7e-12
## 3 G A -0.03724189 0.006377551 6e-09
## 4 Intergenic A T -0.04400000 0.009420188 3e-06
## 5 NR <NA> <NA> NA NA 4e-08
## 6 G A 0.73396926 NA 6e-06
## units eaf date_added_to_MRBASE
## 1 unit decrease 0.589400 2019-08-29
## 2 unit decrease 0.589400 2019-08-29
## 3 unit decrease 0.351806 2019-08-29
## 4 NR unit decrease 0.350000 2019-08-29
## 5 <NA> NA 2019-08-29
## 6 <NA> NA 2019-08-29
For example, to obtain instruments for body mass index using the Speliotes et al 2010 study:
bmi_gwas <-
subset(gwas_catalog,
grepl("Speliotes", Author) & Phenotype == "Body mass index")
bmi_exp_dat <- format_data(bmi_gwas)
## Warning in format_data(bmi_gwas): other_allele column has some values that are
## not A/C/T/G or an indel comprising only these characters or D/I. These SNPs will
## be excluded
## Warning in .fun(piece, ...): More than one type of unit specified for Body mass
## index
## Warning in format_data(bmi_gwas): The following SNP(s) are missing required information for the MR tests and will be excluded
## rs13107325
## rs2444217
## rs255414
## rs2922763
## rs3764400
## rs4771122
## rs4836133
## rs6955651
## rs7138803
## rs867559
## rs887912
Independent top hits from GWASs on 121 metabolites in whole blood are stored in the metab_qtls data object. Use ?metab_qtls to get more information.
data(metab_qtls)
head(metab_qtls)
## phenotype chromosome position SNP effect_allele other_allele eaf
## 1 AcAce 8 9181395 rs2169387 G A 0.870251
## 2 AcAce 11 116648917 rs964184 C G 0.857715
## 3 Ace 6 12042473 rs6933521 C T 0.120256
## 4 Ala 2 27730940 rs1260326 C T 0.638817
## 5 Ala 2 65220910 rs2160387 C T 0.403170
## 6 Ala 12 47201814 rs4554975 G A 0.644059
## beta se pval n_studies n
## 1 0.085630 0.015451 3.61e-08 11 19257
## 2 -0.096027 0.014624 6.71e-11 11 19261
## 3 -0.091667 0.015885 8.10e-09 14 24742
## 4 -0.104582 0.009940 7.40e-26 13 22569
## 5 -0.071001 0.009603 1.49e-13 14 24793
## 6 -0.069135 0.009598 6.12e-13 14 24792
#For example, to obtain instruments for the Alanine:
ala_exp_dat <- format_metab_qtls(subset(metab_qtls, phenotype == "Ala"))
Independent top hits from GWASs on 47 protein levels in whole blood are stored in the proteomic_qtls data object. Use ?proteomic_qtls to get more information.
data(proteomic_qtls)
head(proteomic_qtls)
## analyte chr position SNP gene location annotation other_allele
## 1 CFHR1 1 196698945 rs12144939 CFH cis missense T
## 2 IL6r 1 154425456 rs12126142 IL6R cis missense A
## 3 ApoA4 11 116677723 rs1263167 APOA4 cis intergenic G
## 4 SELE 9 136149399 rs507666 ABO trans intronic A
## 5 FetuinA 3 186335941 rs2070633 AHSG cis missense T
## 6 ACE 17 61566031 rs4343 ACE cis synonymous A
## effect_allele eaf maf pval beta se
## 1 G 0.643 0.357 8.99e-143 -1.108 0.04355258
## 2 G 0.608 0.392 1.81e-106 0.850 0.03878364
## 3 A 0.803 0.197 2.64e-54 -0.919 0.05922332
## 4 G 0.809 0.191 1.01e-52 -0.882 0.05771545
## 5 C 0.676 0.324 2.88e-44 -0.629 0.04506925
## 6 G 0.508 0.492 6.66e-44 0.493 0.03547679
#For example, to obtain instruments for the ApoH protein:
apoh_exp_dat <-
format_proteomic_qtls(subset(proteomic_qtls, analyte == "ApoH"))
## Warning in format_data(proteomic_qtls_subset, type = type, phenotype_col =
## "analyte"): effect_allele column is not character data. Coercing...
## Warning in format_data(proteomic_qtls_subset, type = type, phenotype_col =
## "analyte"): other_allele column is not character data. Coercing...
Independent top hits from GWASs on 32432 gene identifiers and in 44 tissues are available from the GTEX study in gtex_eqtl. Use ?gtex_eqtl to get more information.
data(gtex_eqtl)
head(gtex_eqtl)
## tissue gene_name gene_start SNP snp_position
## 1 Adipose Subcutaneous RP4-669L17.10 1:317720 rs2519065 1:787151
## 2 Adipose Subcutaneous RP11-206L10.1 1:661611 rs11804171 1:723819
## 3 Adipose Subcutaneous RP11-206L10.3 1:677193 rs149110718 1:759227
## 4 Adipose Subcutaneous RP11-206L10.2 1:700306 rs148649543 1:752796
## 5 Adipose Subcutaneous RP11-206L10.9 1:714150 rs12184279 1:717485
## 6 Adipose Subcutaneous RP11-206L10.8 1:736259 rs10454454 1:754954
## effect_allele other_allele beta se pval n
## 1 A G 0.551788 0.0747180 2.14627e-12 298
## 2 A T -0.917475 0.1150060 4.99967e-14 298
## 3 T C 0.807571 0.1776530 8.44694e-06 298
## 4 T C 0.745393 0.0958531 1.82660e-13 298
## 5 A C 1.927250 0.2247390 9.55098e-16 298
## 6 A G 1.000400 0.1787470 5.61079e-08 298
Independent top hits from GWASs on 0 DNA methylation levels in whole blood across 5 time points are available from the ARIES study in aries_mqtl. Use ?aries_mqtl to get more information.
data(aries_mqtl)
head(aries_mqtl)
## SNP timepoint cpg beta pval se snp_chr
## 1 esv2656832 1 cg21826606 0.3459 1.60408e-26 0.03265336 1
## 2 esv2658098 1 cg22681495 -0.6263 1.55765e-66 0.03643240 15
## 3 esv2660043 1 cg24276624 -0.5772 3.16370e-26 0.05481823 11
## 4 esv2660043 1 cg11157765 -0.5423 1.33928e-22 0.05583777 11
## 5 esv2660673 1 cg05832925 -0.5919 2.88011e-50 0.03982467 11
## 6 esv2660769 1 cg05859533 -0.6224 1.49085e-58 0.03868158 16
## snp_pos effect_allele other_allele eaf sex age units
## 1 25591901 I R 0.3974 mixed Birth SD units
## 2 86057007 D R 0.2076 mixed Birth SD units
## 3 69982552 D R 0.1450 mixed Birth SD units
## 4 69982552 D R 0.1450 mixed Birth SD units
## 5 74024905 D R 0.1671 mixed Birth SD units
## 6 57725395 D R 0.2136 mixed Birth SD units
## island_location cpg_chr cpg_pos gene gene_location cis_trans
## 1 N_Shore 1 25593055 cis
## 2 15 86058755 AKAP13 Body cis
## 3 11 69982941 ANO1 Body cis
## 4 11 69982996 ANO1 Body cis
## 5 S_Shelf 11 74026371 cis
## 6 16 57727230 CCDC135 TSS1500 cis
#For example, to obtain instruments for cg25212131 CpG DNA methylation levels in at birth:
cg25212131_exp_dat <-
format_aries_mqtl(subset(aries_mqtl, cpg == "cg25212131" &
age == "Birth"))
The IEU GWAS database contains the entire summary statistics for thousands of GWASs. You can browse them here: https://gwas.mrcieu.ac.uk/
You can use this database to define the instruments for a particular exposure. You can also use this database to obtain the effects for constructing polygenic risk scores using different p-value thresholds.
To obtain a list and details about the available GWASs do the following:
#ao <- available_outcomes()
#head(ao)
#The available_outcomes function returns a table of all the available studies in the database. Each study has a unique ID. e.g.
#head(subset(ao, select = c(trait, id)))
To extract instruments for a particular trait using a particular study, for example to obtain SNPs for body mass index using the Locke et al 2015 GIANT study, you specify the study ID as follows:
bmi2014_exp_dat <- extract_instruments(outcomes = 'ieu-a-2')
## API: public: http://gwas-api.mrcieu.ac.uk/
dim(bmi2014_exp_dat)
## [1] 79 15
bmi2014_exp_dat[1:5,1:5]
## pval.exposure samplesize.exposure chr.exposure se.exposure beta.exposure
## 1 2.18198e-08 339152 1 0.0030 -0.0168
## 2 4.56773e-11 339065 1 0.0031 0.0201
## 3 5.05941e-14 313621 1 0.0087 0.0659
## 4 5.45205e-10 338768 1 0.0029 0.0181
## 5 1.88018e-28 338123 1 0.0030 0.0331
For standard two sample MR it is important to ensure that the instruments for the exposure are independent. Once instruments have been identified for an exposure variable, the IEU GWAS database can be used to perform clumping.
You can provide a list of SNP IDs, the SNPs will be extracted from 1000 genomes data, LD calculated between them, and amongst those SNPs that have LD R-square above the specified threshold only the SNP with the lowest P-value will be retained. To do this, use the following command:
bmi_exp_dat <- clump_data(bmi_exp_dat)
## Please look at vignettes for options on running this locally if you need to run many instances of this command.
## Clumping aYQffE, 32 variants, using EUR population reference
## Removing 4 of 32 variants due to LD with other variants or absence from LD reference panel
## Clumping 7Ch2Q5, 6 variants, using EUR population reference
## Removing 1 of 6 variants due to LD with other variants or absence from LD reference panel
dim(bmi_exp_dat)
## [1] 33 15
bmi2014_exp_dat[1:5,1:5]
## pval.exposure samplesize.exposure chr.exposure se.exposure beta.exposure
## 1 2.18198e-08 339152 1 0.0030 -0.0168
## 2 4.56773e-11 339065 1 0.0031 0.0201
## 3 5.05941e-14 313621 1 0.0087 0.0659
## 4 5.45205e-10 338768 1 0.0029 0.0181
## 5 1.88018e-28 338123 1 0.0030 0.0331
The clump_data command takes any data frame that has been formatted to be an exposure data type of data frame. Note that for the instruments in the R/MRInstruments package the SNPs are already LD clumped.
Note: The LD reference panel only includes SNPs (no INDELs). There are five super-populations from which LD can be calculated, by default European samples are used. Only SNPs with MAF > 0.01 within-population are available.
NOTE: If a variant is dropped from your unclumped data it could be because it is absent from the reference panel. For more flexibility, including using your own LD reference data, see here: https://mrcieu.github.io/ieugwasr/