0.1 Data

0.1.1 UK Biobank Pharma Proteomics Project (UKB-PPP) Proteins

2940 summary statistics; Europeans; both GRCh37 (hg19) and GRCh38

  • Inflammation: 736
  • Cardiometabolic: 736
  • Oncology: 735
  • Neurology: 733

0.1.2 METSIM

1391 summary statistics; Finnish men; GRCh38; tsv files

  • Amino Acid: 215
  • Carbohydrate: 25
  • Cofactors and Vitamins: 38
  • Energy: 10
  • Lipid: 548
  • Nucleotide: 42
  • Partially Characterized: 16
  • Peptide: 42
  • Uncharacterized: 292
  • Xenobiotics: 163

0.1.3 Nightingale

249 summary statistics; Europeans; GRCh37; vcf files

I don’t have a breakdown by class, but I do know the files include:

  • Lipoprotein Subclasses:
    • Includes particle concentrations and composition
  • Lipoprotein Particle Size
  • Apolipoprotein A-I and B
  • Multiple Cholesterol and Triglyceride Measures
  • Albumin
  • Various Fatty Acids
  • Low-Molecular-Weight Metabolites:
    • Amino acids (including branched-chain and aromatic)
    • Glycolysis-related measures
    • Ketone bodies

0.1.4 Clinical Phenotype(s)

  • Coronary Heart Disease (CHD): GRCh37; vcf file; downloaded but not yet formatted

I haven’t yet added CHD to the analysis, nor have I formatted the data. I’m waiting to finish benchmarking proteins and metabolites. But the plan is to add in CHD and other clinical phenotypes.

The CHD data below were generated on the CARDIoGRAMplusC4D chip, which included Europeans and South Asians. (The authors of Hyprcoloc used it while saying that their method relies on the traits in the model being from the same population! That begs some questions in the colloquial sense [vs petitio principii], such as: “Which is it? Can Hyprcoloc handle traits from different ancestries or not?” “Was it an oversight or on purpose?”)

In the R documentation for Hyprcoloc, LD matrices aren’t required. This is because the authors assume the samples behind the traits don’t overlap. There are options for indicating that the samples do overlap, but the analysis gets vastly more complex: we need three additional matrices if samples overlap. So, at least initially, it is best to choose datasets that don’t overlap.

UKB-PPP, METSIM, and the CARDIoGRAMplusC4D CHD data should be reasonably independent. (I may need to change course on the UKB Nightingale idea. There are Nightingale summary statistics from smaller samples that aren’t from the UKB. I might be able to get them instead of the ones I downloaded. METSIM is the priority for now.)

| Phenotype | Abbreviation | Dataset | Author | PubMed ID | Sample Size (N) | Number of Cases (N Cases) | Population |
|---|---|---|---|---|---|---|---|
| Coronary Heart Disease | CHD | ieu-a-7 | Nikpay | 26343387 | 184,305 | 60,801 | Mixed |

NOTE: Remember the Jurgens data…

  • e.g., for BMI: “/Users/charleenadams/acy1_bmi/GWAS_sumstats_EUR__invnorm_bmi__TOTALsample.tsv”

0.2 Directories


0.3 Fetch metabolite summary statistics

0.4 Format UKB-PPP pQTLs

I had previously obtained these; see https://rpubs.com/YodaMendel/1243451 for an example of how to programmatically get files from UKB-PPP.

0.4.1 Untar all 2940

0.4.2 Add rsIDs by CHR

  • ~28 hours run time parallelized, including the Mac sleeping overnight😞
    • Next time use caffeinate -i ./add_rsids.sh
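
A minimal Python sketch of the per-chromosome rsID annotation idea, not the actual add_rsids.sh script. It assumes each summary-statistics row carries a chromosome and a position, and that a lookup table maps (chromosome, position) to an rsID. The column names `CHROM` and `POS`, the function name, and the toy positions and rsIDs are all hypothetical. Matching on position alone ignores alleles, so a real pipeline would likely need to check those too.

```python
def add_rsids(rows, rsid_lookup):
    """Return copies of rows with an 'rsid' key added.

    rows: iterable of dicts with hypothetical keys 'CHROM' and 'POS'.
    rsid_lookup: dict mapping (chrom, pos) -> rsID.
    Variants missing from the lookup get rsid=None.
    """
    annotated = []
    for row in rows:
        key = (str(row["CHROM"]), int(row["POS"]))
        annotated.append({**row, "rsid": rsid_lookup.get(key)})
    return annotated


# Toy data: positions and IDs are made up for illustration only.
lookup = {("1", 1000): "rs_example_1"}
rows = [{"CHROM": "1", "POS": 1000}, {"CHROM": "1", "POS": 2000}]
annotated = add_rsids(rows, lookup)
```

Running this per chromosome keeps each lookup table small, which is one reason to leave the files split by chromosome.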

0.4.3 🚫 DON’T 🚫 merge UKB-PPP chromosome files: draft script!!!

Merging takes too long, so keep the files split by chromosome. The script is retained because it likely works and we might need the chromosomes merged if we try out a non-cis-region approach later.

0.4.4 Create cis-regions

The method I devised below obtains cis-regions (500 kb upstream and downstream of each TSS, using Ensembl) for 2808 of the 2940 files (96%).
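
The window arithmetic behind a cis-region can be sketched in a few lines of Python. This is an illustration under stated assumptions, not the method used here: the function names are hypothetical, the TSS coordinate is a placeholder rather than a value taken from Ensembl, and clamping the start at position 1 is my own choice for genes near a chromosome end.

```python
def cis_region(chrom, tss, flank=500_000):
    """Return (chrom, start, end) spanning `flank` bp either side of the TSS.

    The start is clamped at 1 so regions near the chromosome start stay valid.
    """
    start = max(1, tss - flank)
    end = tss + flank
    return chrom, start, end


def in_region(chrom, pos, region):
    """True if a variant at chrom:pos lies inside the region (inclusive)."""
    r_chrom, start, end = region
    return chrom == r_chrom and start <= pos <= end


# Placeholder TSS for illustration; the real coordinate comes from Ensembl.
region = cis_region("1", 55_000_000)
```

Each protein file can then be subset to the rows for which `in_region` is true.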

0.5 Debugging headers

Adding in the rsIDs created headers with mixed delimiters, so I fixed that.
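
One way to repair a header with mixed delimiters is to split on any run of tabs, spaces, or commas and rejoin with a single delimiter. The Python sketch below is illustrative only: the function name and the example column names are hypothetical, and it assumes no column name itself contains a space or comma.

```python
import re


def normalize_header(line, delimiter="\t"):
    """Split a header on any run of tabs, spaces, or commas and rejoin it."""
    fields = [f for f in re.split(r"[\t ,]+", line.strip()) if f]
    return delimiter.join(fields)


# Hypothetical header mixing tabs, a space, and a comma.
broken = "CHROM\tPOS ID,BETA\tSE"
fixed = normalize_header(broken)
```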

0.6 Format Nightingale VCF files: draft script

1 PCSK9 and METSIM

1.1 Subset METSIM files by PCSK9 cis-region

1.2 Filter METSIM files on at least one SNP with P<5E-08 in PCSK9 cis-region
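
A minimal Python sketch of this filter, assuming each metabolite file has already been parsed into rows with a chromosome, position, and p-value. The key names `chrom`, `pos`, and `p`, the region coordinates, and the metabolite names are hypothetical placeholders.

```python
GENOME_WIDE_P = 5e-8


def has_significant_snp(rows, region, threshold=GENOME_WIDE_P):
    """True if at least one variant inside the region has p below the threshold."""
    chrom, start, end = region
    return any(
        row["chrom"] == chrom
        and start <= row["pos"] <= end
        and row["p"] < threshold
        for row in rows
    )


# Placeholder region and toy files for illustration only.
region = ("1", 54_500_000, 55_500_000)
files = {
    "metabolite_a": [{"chrom": "1", "pos": 55_000_100, "p": 1e-12}],
    "metabolite_b": [{"chrom": "1", "pos": 55_000_100, "p": 1e-3}],
    "metabolite_c": [{"chrom": "1", "pos": 60_000_000, "p": 1e-20}],
}
kept = [name for name, rows in files.items() if has_significant_snp(rows, region)]
```

Only `metabolite_a` passes: `metabolite_b` has no significant variant, and the strong signal in `metabolite_c` lies outside the region.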

1.3 Harmonizing PCSK9 and METSIM metabolites
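
Harmonizing usually means making sure every trait reports its effect for the same allele. The Python sketch below shows only that core step for a single variant and is not the script used here. The key names `ea`, `oa`, and `beta` are hypothetical, and strand flips and palindromic SNPs are deliberately not handled.

```python
def harmonize(ref, other):
    """Align `other` to the effect allele of `ref` for one variant.

    Both arguments are dicts with hypothetical keys 'ea' (effect allele),
    'oa' (other allele), and 'beta'. Returns a harmonized copy of `other`,
    or None when the allele pairs are incompatible.
    """
    if (other["ea"], other["oa"]) == (ref["ea"], ref["oa"]):
        return dict(other)  # already aligned
    if (other["ea"], other["oa"]) == (ref["oa"], ref["ea"]):
        # Alleles are swapped: relabel them and flip the sign of the effect.
        return {**other, "ea": ref["ea"], "oa": ref["oa"], "beta": -other["beta"]}
    return None  # different alleles; drop the variant


protein = {"ea": "T", "oa": "G", "beta": 0.5}
metabolite = {"ea": "G", "oa": "T", "beta": 0.25}
aligned = harmonize(protein, metabolite)
```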

1.4 Prep for Hyprcoloc

1.5 Hyprcoloc of PCSK9 and 184 METSIM metabolites

We conducted a colocalization analysis using HyPrColoc to explore shared genetic loci influencing PCSK9 and 184 METSIM metabolites.

1.6 Hyprcoloc of PCSK9 and top 10 METSIM metabolites

Here, “top” means the 10 metabolites whose files contain the rsIDs with the smallest p-values.
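
One way to pick such a top set is to rank metabolites by the smallest p-value seen in each file and keep the first n. The Python sketch below assumes those minimum p-values have already been collected into a dict. The function name and the toy values are hypothetical.

```python
import heapq


def top_traits_by_min_p(min_p_by_trait, n=10):
    """Return the n trait names with the smallest minimum p-value, best first."""
    best = heapq.nsmallest(n, min_p_by_trait.items(), key=lambda kv: kv[1])
    return [name for name, _ in best]


# Toy minimum p-values for illustration only.
min_p = {"metabolite_a": 1e-30, "metabolite_b": 1e-9, "metabolite_c": 1e-50}
top_two = top_traits_by_min_p(min_p, n=2)
```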


1.7 Summary tables: easier on the eyes

Here I provide James Staley’s (Hyprcoloc package author) example results. Their paper for the package had examined CHD. Their top putatively causal SNP (rs11591147) is the same as ours:

1.7.1 Staley’s table

| Iteration | Traits | Posterior Probability | Regional Probability | Candidate SNP | Posterior Explained by SNP | Dropped Trait |
|---|---|---|---|---|---|---|
| 1 | T1, T2, T3, T4, T5 | 1.0000 | 1 | rs11591147 | 1.0000 | NA |
| 2 | T6, T7, T8 | 0.9164 | 1 | rs12117612 | 0.4197 | NA |
| 3 | T9, T10 | 0.9018 | 1 | rs7524677 | 0.0763 | NA |

1.7.2 Our PCSK9 and 184 metabolites (uniform priors = FALSE)

| Iteration | Traits | Posterior Prob | Regional Prob | Candidate SNP | Posterior Explained by SNP |
|---|---|---|---|---|---|
| 1 | 1,2-dipalmitoyl-GPC (16:0/16:0), 1-(1-enyl-palmitoyl)-2-oleoyl-GPC (P-16:0/18:1), 1-(1-enyl-palmitoyl)-2-palmitoyl-GPC (P-16:0/16:0), 1-palmitoyl-2-stearoyl-GPC (16:0/18:0), 1-palmityl-2-palmitoyl-GPC (O-16:0/16:0), 1-stearoyl-2-oleoyl-GPI (18:0/18:1), 1-stearoyl-GPC (18:0), N-palmitoyl-sphingosine (d18:1/16:0), PCSK9, X - 23641, arachidoylcarnitine (C20), behenoyl sphingomyelin (d18:1/22:0), cerotoylcarnitine (C26), cholesterol, glycosyl ceramide (d18:1/20:0, d16:1/22:0), glycosyl ceramide (d18:2/24:1, d18:1/24:2), glycosyl-N-palmitoyl-sphingosine (d18:1/16:0), hydroxypalmitoyl sphingomyelin (d18:1/16:0(OH)), lactosyl-N-nervonoyl-sphingosine (d18:1/24:1), lignoceroyl sphingomyelin (d18:1/24:0), myristoyl dihydrosphingomyelin (d18:0/14:0), palmitoyl dihydrosphingomyelin (d18:0/16:0), palmitoyl sphingomyelin (d18:1/16:0), palmitoyl-sphingosine-phosphoethanolamine (d18:1/16:0), sphingomyelin (d17:1/16:0, d18:1/15:0, d16:1/17:0), sphingomyelin (d18:0/20:0, d16:0/22:0), sphingomyelin (d18:1/20:0, d16:1/22:0), sphingomyelin (d18:1/24:1, d18:2/24:0), sphingomyelin (d18:1/25:0, d19:0/24:1, d20:1/23:0, d19:1/24:0), sphingomyelin (d18:2/16:0, d18:1/16:1) | 0.4593 | 0.6101 | rs11591147 | 1 |

1.7.3 Our PCSK9 and 184 metabolites (uniform priors = TRUE)

| Iteration | Traits | Posterior Prob | Regional Prob | Candidate SNP | Posterior Explained by SNP |
|---|---|---|---|---|---|
| 1 | 1,2-dipalmitoyl-GPC (16:0/16:0), 1-(1-enyl-palmitoyl)-2-oleoyl-GPC (P-16:0/18:1), 1-(1-enyl-palmitoyl)-2-palmitoyl-GPC (P-16:0/16:0), 1-palmitoyl-2-stearoyl-GPC (16:0/18:0), 1-palmityl-2-palmitoyl-GPC (O-16:0/16:0), 1-stearoyl-2-oleoyl-GPI (18:0/18:1), 1-stearoyl-GPC (18:0), N-palmitoyl-sphingosine (d18:1/16:0), PCSK9, X - 23641, arachidoylcarnitine (C20), behenoyl sphingomyelin (d18:1/22:0), cerotoylcarnitine (C26), cholesterol, glycosyl ceramide (d18:1/20:0, d16:1/22:0), glycosyl ceramide (d18:2/24:1, d18:1/24:2), glycosyl-N-palmitoyl-sphingosine (d18:1/16:0), hydroxypalmitoyl sphingomyelin (d18:1/16:0(OH)), lactosyl-N-nervonoyl-sphingosine (d18:1/24:1), lignoceroyl sphingomyelin (d18:1/24:0), myristoyl dihydrosphingomyelin (d18:0/14:0), palmitoyl dihydrosphingomyelin (d18:0/16:0), palmitoyl sphingomyelin (d18:1/16:0), palmitoyl-sphingosine-phosphoethanolamine (d18:1/16:0), sphingomyelin (d17:1/16:0, d18:1/15:0, d16:1/17:0), sphingomyelin (d18:0/20:0, d16:0/22:0), sphingomyelin (d18:1/20:0, d16:1/22:0), sphingomyelin (d18:1/24:1, d18:2/24:0), sphingomyelin (d18:1/25:0, d19:0/24:1, d20:1/23:0, d19:1/24:0), sphingomyelin (d18:2/16:0, d18:1/16:1) | 0.7712 | 0.7792 | rs11591147 | 1 |
| 2 | (S)-3-hydroxybutyrylcarnitine, 3-hydroxydecanoate, 3-hydroxydecanoylcarnitine, 3-hydroxyhexanoate, 3-hydroxylaurate, 3-hydroxyoctanoate, X - 15469, X - 16397, X - 18921, X - 21353, X - 22519, cis-4-decenoate (10:1n6), dodecadienoate (12:2), tetradecadienoate (14:2) | 0.7638 | 0.7864 | rs1616691 | 1 |
| 3 | 2,3-dihydroxy-5-methylthio-4-pentenoate (DMTPA), 2-palmitoyl-GPE (16:0), X - 15503, adipoylcarnitine (C6-DC), dimethylarginine (SDMA + ADMA), kynurenine, pseudouridine | 0.7604 | 0.7791 | rs141072298 | 0.8198 |
| 4 | N-acetylkynurenine (2), cis-4-decenoylcarnitine (C10:1), decanoylcarnitine (C10), hexanoylcarnitine (C6), octanoylcarnitine (C8) | 0.8019 | 0.8254 | rs148238018 | 0.9829 |

1.7.4 Our PCSK9 and 184 metabolites (uniform priors = FALSE; regional and alignment thresholds = 0.95)

| Iteration | Traits | Posterior Prob | Regional Prob | Candidate SNP | Posterior Explained by SNP |
|---|---|---|---|---|---|
| 1 | 1,2-dipalmitoyl-GPC (16:0/16:0), 1-(1-enyl-palmitoyl)-2-oleoyl-GPC (P-16:0/18:1), 1-(1-enyl-palmitoyl)-2-palmitoyl-GPC (P-16:0/16:0), 1-palmitoyl-2-stearoyl-GPC (16:0/18:0), 1-palmityl-2-palmitoyl-GPC (O-16:0/16:0), N-palmitoyl-sphingosine (d18:1/16:0), PCSK9, behenoyl sphingomyelin (d18:1/22:0), cholesterol, glycosyl ceramide (d18:1/20:0, d16:1/22:0), glycosyl ceramide (d18:2/24:1, d18:1/24:2), glycosyl-N-palmitoyl-sphingosine (d18:1/16:0), hydroxypalmitoyl sphingomyelin (d18:1/16:0(OH)), lactosyl-N-nervonoyl-sphingosine (d18:1/24:1), lignoceroyl sphingomyelin (d18:1/24:0), myristoyl dihydrosphingomyelin (d18:0/14:0), palmitoyl dihydrosphingomyelin (d18:0/16:0), palmitoyl sphingomyelin (d18:1/16:0), palmitoyl-sphingosine-phosphoethanolamine (d18:1/16:0), sphingomyelin (d17:1/16:0, d18:1/15:0, d16:1/17:0), sphingomyelin (d18:0/20:0, d16:0/22:0), sphingomyelin (d18:1/20:0, d16:1/22:0), sphingomyelin (d18:1/24:1, d18:2/24:0), sphingomyelin (d18:2/16:0, d18:1/16:1) | 0.9687 | 0.9808 | rs11591147 | 1 |
| 2 | (S)-3-hydroxybutyrylcarnitine, 3-hydroxydecanoate, 3-hydroxydecanoylcarnitine, 3-hydroxyhexanoate, 3-hydroxylaurate, 3-hydroxyoctanoate, X - 15469, X - 21353, X - 22519 | 0.9832 | 0.9884 | rs1616691 | 1 |

1.7.5 Our PCSK9 and 10 metabolites (uniform priors = FALSE)

| Iteration | Traits | Posterior Prob | Regional Prob | Candidate SNP | Posterior Explained by SNP |
|---|---|---|---|---|---|
| 1 | 1-(1-enyl-palmitoyl)-2-palmitoyl-GPC (P-16:0/16:0), PCSK9, cholesterol, glycosyl ceramide (d18:1/20:0, d16:1/22:0), glycosyl-N-palmitoyl-sphingosine (d18:1/16:0), hydroxypalmitoyl sphingomyelin (d18:1/16:0(OH)), palmitoyl dihydrosphingomyelin (d18:0/16:0), palmitoyl sphingomyelin (d18:1/16:0), palmitoyl-sphingosine-phosphoethanolamine (d18:1/16:0), sphingomyelin (d18:1/24:1, d18:2/24:0) | 0.9994 | 1 | rs11591147 | 1 |

1.7.6 Our PCSK9 and 10 metabolites (uniform priors = TRUE)

| Iteration | Traits | Posterior Prob | Regional Prob | Candidate SNP | Posterior Explained by SNP |
|---|---|---|---|---|---|
| 1 | 1-(1-enyl-palmitoyl)-2-palmitoyl-GPC (P-16:0/16:0), PCSK9, cholesterol, glycosyl ceramide (d18:1/20:0, d16:1/22:0), glycosyl-N-palmitoyl-sphingosine (d18:1/16:0), hydroxypalmitoyl sphingomyelin (d18:1/16:0(OH)), palmitoyl dihydrosphingomyelin (d18:0/16:0), palmitoyl sphingomyelin (d18:1/16:0), palmitoyl-sphingosine-phosphoethanolamine (d18:1/16:0), sphingomyelin (d18:1/24:1, d18:2/24:0) | 1 | 1 | rs11591147 | 1 |

1.7.7 Our PCSK9 and 10 metabolites (uniform priors = FALSE; regional and alignment thresholds = 0.95)

| Iteration | Traits | Posterior Prob | Regional Prob | Candidate SNP | Posterior Explained by SNP |
|---|---|---|---|---|---|
| 1 | 1-(1-enyl-palmitoyl)-2-palmitoyl-GPC (P-16:0/16:0), PCSK9, cholesterol, glycosyl ceramide (d18:1/20:0, d16:1/22:0), glycosyl-N-palmitoyl-sphingosine (d18:1/16:0), hydroxypalmitoyl sphingomyelin (d18:1/16:0(OH)), palmitoyl dihydrosphingomyelin (d18:0/16:0), palmitoyl sphingomyelin (d18:1/16:0), palmitoyl-sphingosine-phosphoethanolamine (d18:1/16:0), sphingomyelin (d18:1/24:1, d18:2/24:0) | 0.9994 | 1 | rs11591147 | 1 |