Meta-analytic and Wald ratio Mendelian randomization of expression of cytosolic and mitochondrial ribosomal proteins on breast cancer

MR-RP-eqtlGen-bc

Charleen D. Adams1 and Bernardo Lemos1,2

1Department of Environmental Health, Program in Molecular and Integrative Physiological Sciences, Harvard T.H. Chan School of Public Health, Boston, Massachusetts 02115, USA
2Broad Institute of Harvard and MIT, Cambridge, Massachusetts 02142, USA

2021-04-15

Abstract

Under conditions of nucleolar stress, various ribosomal proteins (RPs) interact with the MDM2 proto-oncogene (MDM2) to stabilize p53. Dysregulation of the RPs may disturb this pathway, but whether changes in the expression of the RPs influence risk for breast cancer in the general population is unknown. We tested whether expression changes in individual cytosolic and mitochondrial RPs influence risk for breast cancer using inverse-variance weighted (IVW) and Wald ratio Mendelian randomization (MR). Summary statistics for cis expression quantitative trait loci (eQTLs) for SNPs controlling expression of RPs (P<5x10-06) were extracted from the eqtlGen Consortium’s genome-wide association study (GWAS) of expression in blood (n=125 RPs). Summary statistics for 123 of the 125 RPs were extracted from the Breast Cancer Association Consortium’s (BCAC) 2020 genome-wide association study (GWAS) of breast cancer (133,384 cases and 113,789). After linkage disequilibirum (LD)-clumping each set of eQTLs for a given RP, there were 31 RPs available for meta-analysis. While the IVW result for RPL13A was significant (P=0.011), after false-discovery rate multiple-testing correction, there were no significant findings. Seven RPs had at least one significant Wald ratio test after FDR correction (MRPL19, MRPL21, MRPL23, MRPL24, MRPL34, MRPS30, and RPL13). Three of these (MRPL19, MRPL23, and MRPS30) could not be meta-analyzed. For MRPL21,MRPL24, MRPL34, and RPL13, the null IVW results suggest that the RP eQTLs for these RPs influence BC through pathways other than the expression of the RP they control.

0.1 Format eqtlGen GWAS
1 Clean the BCAC data
2 Harmonize
3 Split, loop, MR, & FDR-correct results

0.1 Format eqtlGen GWAS

eqtlGen stored MAFs in a separate file.
MAFs are needed for the calculation of the betas & SEs from the Z-scores.
Methods for this are provided by: Zhu 2016: https://www.nature.com/articles/ng.3538

\[ \beta = \frac{z} {\sqrt2p(1-p)(n + z^2)} \] \[ SE = \frac{1} {\sqrt2p(1-p)(n + z^2)} \] where \[p = MAF\]

# README files for the eQTLGen GWAS
# This README accompanies the files with cis-eQTL results from eQTLGen
 
# Files
# -----
# File with full cis-eQTL results: 2019-12-11-cis-eQTLsFDR-ProbeLevel-CohortInfoRemoved-BonferroniAdded.txt.gz
# File with significant (FDR<0.05) cis-eQTL results: 2019-12-11-cis-eQTLsFDR0.05-ProbeLevel-CohortInfoRemoved-BonferroniAdded.txt.gz
 
# Column Names
# ------------
# Pvalue - P-value
# SNP - SNP rs ID
# SNPChr - SNP chromosome
# SNPPos - SNP position
# AssessedAllele - Assessed allele, the Z-score refers to this allele
# OtherAllele - Not assessed allele
# Zscore - Z-score
# Gene - ENSG name (Ensembl v71) of the eQTL gene
# GeneSymbol - HGNC name of the gene
# GeneChr - Gene chromosome
# GenePos - Centre of gene position
# NrCohorts - Total number of cohorts where this SNP-gene combination was tested
# NrSamples - Total number of samples where this SNP-gene combination was tested
# FDR - False discovery rate estimated based on permutations
# BonferroniP - P-value after Bonferroni correction
# 
# Additional information
# ----------------------
# These files contain all cis-eQTL results from eQTLGen, accompanying the article.
# 19,250 genes that showed expression in blood were tested.
# Every SNP-gene combination with a distance <1Mb from the center of the gene and  tested in at least 2 cohorts was included.
# Associations where SNP/proxy positioned in Illumina probe were not removed from combined analysis.
# UPDATE LOG
# ----------
# 2018-10-19: Initial data release
# 2018-12-19: In the current README, following file names have been fixed and updated:
# 2019-12-20: Cis-eQTLs are now updated to have a 2-cohort filter: every cis-eQTL must be tested in at least 2 cohorts to be reported
# On cluster, I gzipped the eqtlGen file: gzip 2019-12-11-cis-eQTLsFDR-ProbeLevel-CohortInfoRemoved-BonferroniAdded.txt
# this is because I kept getting error messages both here & on the cluster saying ther was no room on the disk.
# I moved unnecessary files to a folder in my personal computer DB: /Users/charleenadams/Dropbox/Harvard/bc/survival/odyssey_transfer/
# This has the gz files for the protein GWAS, the eQTLGen SMR file, and the eQTLGen EAF file.
# I launched an interacted terminal session with increased memory: 
# salloc -p test  --mem 5000 -t 0-06:00
#-----------------------------------------------------------------------------------------------------------------------
# README file for the MAF eQTLGen cis-eQTL data 
# A README accompanies the file with allele frequencies from eQTLGen cis-eQTL analysis: 
# 2018-07-18_SNP_AF_for_AlleleB_combined_allele_counts_and_MAF_pos_added.txt.gz
# Column Names
# ------------
# SNP -         SNP rs ID
# hg19_chr -    chr number
# hg19_pos -    SNP position (hg19)
# AlleleA -     Other allele
# AlleleB -     Assessed allele
# allA_total -  Total allele count of genotype AA
# allAB_total - Total allele count of genotype AB
# allB_total -  Total allele count of genotype BB
# AlleleB_all - Allele frequency of assessed allele
# Additional information
# ----------------------
# The allele frequencies were calculated using reported allele counts from all cohorts except 
# Framingham Heart Study, because the related samples are present in this cohort.
#-----------------------------------------------------------------------------------------------------------------------
# Remember that the eqtlGen file is stored in /n/home04/cdadams/MR-eqtlGen-long
# Don't move to MR-RP-eqtlGen-bc because it will result a storage over-capacity
setwd('/n/home04/cdadams/MR-eqtlGen-long/')
tbl=fread('cat 2019-12-11-cis-eQTLsFDR-ProbeLevel-CohortInfoRemoved-BonferroniAdded.txt.gz | gunzip')

rp_list_expanded=read.csv('/n/home04/cdadams/ld/ldsc/expanded_verified.csv')
rp_list_expanded

# Extract RPs
RPs=tbl[which(tbl$GeneSymbol %in% rp_list_expanded$all_RP),]
freq=table(RPs$GeneSymbol)
rp_freq=as.data.frame(freq)

# Read-in MAFs
eqtl_MAF=read.table(file = "2018-07-18_SNP_AF_for_AlleleB_combined_allele_counts_and_MAF_pos_added.txt", 
                    sep = "\t", header=TRUE)
RPs_total <- merge(RPs,eqtl_MAF,by="SNP")

# Calculate betas and SEs
n=30596
RPs_total$Beta=RPs_total$Zscore/sqrt(2*(RPs_total$AlleleB_all)*
                  (1-RPs_total$AlleleB_all)*(n+RPs_total$Zscore^2))
RPs_total$SE=1/sqrt(2*(RPs_total$AlleleB_all)*
                  (1-RPs_total$AlleleB_all)*(n+RPs_total$Zscore^2))
write.table(RPs_total, file = "/n/home04/cdadams/MR-RP-eqtlGen-bc/RPs_total_maf_beta_se.txt", sep = "\t",
            row.names = FALSE, col.names = TRUE, quote=FALSE)
RPs_total=read.delim("/n/home04/cdadams/MR-RP-eqtlGen-bc/RPs_total_maf_beta_se.txt", check.names = FALSE)

# gzRPs <- gzfile("/n/home04/cdadams/MR-RP-eqtlGen-bc/RPs.gz", "w")
# write.csv(RPs, gzRPs)
# close(gzRPs)
# Read-in for MR formatting

freq=table(RPs_total$GeneSymbol)
rp_freq=as.data.frame(freq)
rp_freq

setwd("/n/home04/cdadams/MR-RP-eqtlGen-bc/")
exposure_RP_eqtlGen <- read_exposure_data(
  filename = "/n/home04/cdadams/MR-RP-eqtlGen-bc/RPs_total_maf_beta_se.txt",
  sep = '\t',
  snp_col = 'SNP',
  beta_col = 'Beta',
  se_col = 'SE',
  effect_allele_col = 'AssessedAllele',
  #phenotype_col = 'tissue',
  #units_col = 'tissue',
  other_allele_col = 'OtherAllele',
  eaf_col = 'AlleleB_all',
  #samplesize_col = 'samplesize',
  #ncase_col = 'ncase',
  #ncontrol_col = 'ncontrol',
  gene_col = 'GeneSymbol',
  pval_col = 'Pvalue', 
  id_col = 'RP eqtlGEN'
)
exposure_RP_eqtlGen_p=exposure_RP_eqtlGen[which(exposure_RP_eqtlGen$pval.exposure<0.000005),]
write.csv(exposure_RP_eqtlGen_p, '/n/home04/cdadams/MR-RP-eqtlGen-bc/exposure_RP_eqtlGen_p.csv')

# Just verifying that genes aren't lost when reading in
exposure_RP_eqtlGen_p=read.csv('/n/home04/cdadams/MR-RP-eqtlGen-bc/exposure_RP_eqtlGen_p.csv')
colnames(exposure_RP_eqtlGen_p)
freq2=table(exposure_RP_eqtlGen_p$gene.exposure)
rp_freq2=as.data.frame(freq2)
rp_freq2

1 Clean the BCAC data

I’ve kept the BCAC data in the ldsr folder since copying the GWAS files to each new folder for each new project will result in over-reaching the storage capacity I’m allotted on Odyssey.

# Read-in the BCAC data
# I had compressed the file before transfering to the cluster: 
# gzip icogs_onco_gwas_meta_overall_breast_cancer_summary_level_statistics.txt
# An Excel data dictionary accompanies the BCAC data
# /n/home04/cdadams/ld/ldsc//data_dictionary_summary_stats_bc_risk_2020.xlsx
#Data dictiony
# iCOGS-Oncoarray-GWAS-Meta-Overall-summary-results 
# Label Meaning
# var_name  The variant name is coded as chromosome_position_non-effect-allele_effect-allele
# Effect.Gwas   The effect allele in GWAS
# Baseline.Gwas The non effect allele in GWAS
# Freq.Gwas Effect allele frequency
# beta.Gwas Log odds ratio (OR) for effect alllele in GWAS data
# SE.Gwas   standard error of the log OR estimate in GWAS data
# P.value.Gwas  P-value of the log OR estimate in GWAS data
# SNP.iCOGS SNPs name in iCOGS panel. For SNPs with rsid, the name is coded as rsid:position:non-effect-allele:effect-allele; 
# For SNPs without rsid, the name is similarly as var_name 

bcac=read.table(gzfile("/n/home04/cdadams/ld/ldsc/icogs_onco_gwas_meta_overall_breast_cancer_summary_level_statistics.txt.gz"), header=TRUE)

bcac$id=gsub( ":.*$", "", bcac$SNP.iCOGs)
bcac2=bcac[which(startsWith(bcac$id, 'rs')),]   
colnames(bcac2)[5]="Beta"
#save.image("longevity_bcac.Rdata")

write.table(bcac2, file = "/n/home04/cdadams/ld/ldsc/bcac_master3.txt", sep = "\t",
            row.names = FALSE, col.names = TRUE, quote=FALSE)
# Read-in the bcac file formated for ldsc
bcac3=read.table(file = "/n/home04/cdadams/ld/ldsc/bcac_master3.txt", sep = "\t", header=TRUE)

2 Harmonize

exposure_RP_eqtlGen_p=read.csv('/n/home04/cdadams/MR-RP-eqtlGen-bc/exposure_RP_eqtlGen_p.csv')
freq=table(exposure_RP_eqtlGen_p$gene.exposure)
freq=as.data.frame(freq)

bc_outcome <- read_outcome_data(
  snps = exposure_RP_eqtlGen_p$SNP,
  filename = "/n/home04/cdadams/ld/ldsc/bcac_master3.txt",
  sep = '\t',
  snp_col = 'id',
  beta_col = 'Beta',
  se_col = 'SE.Gwas',
  effect_allele_col = 'Effect.Gwas',
  #phenotype_col = 'tissue',#tissue
  #units_col = 'tissue',
  other_allele_col = 'Baseline.Gwas',
  eaf_col = 'Freq.Gwas',
  #samplesize_col = 'samplesize',
  #ncase_col = '133384',
  #ncontrol_col = '113789',
  #gene_col = 'gene_name',
  pval_col = 'P.value.Gwas', 
  id_col = 'breast cancer'
)

# Harmonize the alleles btw the two gwas
RP_eqtlGen_bc <- harmonise_data(exposure_RP_eqtlGen_p, bc_outcome, action = 2)
RP_eqtlGen_bc$samplesize.outcome="247173"
RP_eqtlGen_bc$samplesize.exposure="30596"
RP_eqtlGen_bc$id.exposure=RP_eqtlGen_bc$gene.exposure
write.csv(RP_eqtlGen_bc, '/n/home04/cdadams/MR-RP-eqtlGen-bc/RP_eqtlGen_bc.csv') #can simply read this in for the MR

# Descriptives
colnames(RP_eqtlGen_bc)
freq3=table(RP_eqtlGen_bc$gene.exposure)
rp_freq3=as.data.frame(freq3)
rp_freq3
rp_freq3=rp_freq3[order(rp_freq3$Freq),]
rp_freq3

3 Split, loop, MR, & FDR-correct results

setwd('/n/home04/cdadams/MR-RP-eqtlGen-bc')
RP_eqtlGen_bc=read.csv('/n/home04/cdadams/MR-RP-eqtlGen-bc/RP_eqtlGen_bc.csv')

# Command to remove everything except "v"
#rm(list=(ls()[ls()!="v"]))
# Split the harmonized data into separate data frames for each genes

RP_split <- split(RP_eqtlGen_bc, RP_eqtlGen_bc$id.exposure)
list2env(RP_split, envir= .GlobalEnv) #split the list into separate datasets

# Remove the original RP_eqtlGen_bc and RP_split
rm(list= ls()[(ls() %in% c("RP_eqtlGen_bc","RP_split"))])

# Save each harmonized gene as a .csv
# for (i in 1:length(files)){
#   write.csv(files[[i]], paste(names(files[i]), ".csv", sep = ""))
# }
# Get them back together: not used here but useful
# dfs = sapply(.GlobalEnv, is.data.frame) 
# dfs
# 
# df=do.call(rbind, mget(names(dfs)[dfs]))
# head(df)
# dim(df)

# Combine the separated files into a list
files <- mget(ls())
length(files)

# Use a loop to clump each RP dataframe in the list, run MR, & save results
for (i in 1:length(files)){
   files[[i]]=clump_data(files[[i]])
   write.csv(files[[i]], paste(names(files[i]), "_clump.csv", sep = ""))
   files[[i]]=mr(files[[i]])
   write.csv(files[[i]], paste(names(files[i]), "_mr.csv", sep = ""))
}

# Idea for reading all mr results in and saving:
# Idea from: https://statisticsglobe.com/merge-csv-files-in-r
# Sadly, didn't work. 
# data_all <- list.files(path = "/n/home04/cdadams/MR-RP-eqtlGen-bc",  
#   pattern = "*_mr.csv", full.names = TRUE) %>% 
#   lapply(read_csv) %>%                                                    
#   bind_rows                                                                
#   write.csv(data_all, "complete_RP_MR_res.csv")

# Merged all of the clumped data 
tbl_fread_clump <- 
    list.files(pattern = "*_clump.csv") %>% 
    map_df(~fread(.))
tbl_fread_clump_df=as.data.frame(tbl_fread_clump)
head(tbl_fread_clump_df)
colnames(tbl_fread_clump_df)

df <- tbl_fread_clump_df[c(3:34)] #check this
write.csv(df, '/n/home04/cdadams/MR-RP-eqtlGen-bc/df.csv')

# Merge all the MR results and save 
tbl_fread <- 
    list.files(pattern = "*_mr.csv") %>% 
    map_df(~fread(.))
tbl_fread_df=as.data.frame(tbl_fread)
myvars=c('id.exposure','id.outcome','method', 'nsnp','b','se','pval')
res=tbl_fread_df[myvars]  
res$id.outcome="bc"
res=res[order(res$p),]

# FDR correction
p=res$p
fdr=p.adjust(p, method = "fdr", n = length(p))
res$fdr=fdr
res=res[order(res$fdr),]
write.csv(res,'/n/home04/cdadams/MR-RP-eqtlGen-bc/MR-eqtlGEN-RP-bc.csv')

res=res[order(res$p),]

# Select just the set with >2 SNPs
res=res[order(res$fdr),]
res_meta_set=res[which(res$nsnp>2),]
res_meta_set=res_meta_set[which(res_meta_set$method=='Inverse variance weighted'),]
rownames(res_meta_set)

# FDR for res_meta_set (only IVWs)
p2=res_meta_set$p
fdr2=p.adjust(p2, method = "fdr", n = length(p2))
res_meta_set$fdr2=fdr2
res_meta_set$fdr=NULL
res_meta_set=res_meta_set[order(res_meta_set$fdr2),]
write.csv(res_meta_set,'/n/home04/cdadams/MR-RP-eqtlGen-bc/MR-eqtlGEN-RP-bc-meta.csv')

# FDR for res_meta_set: only RP (not MRP) (only IVWs)
res_meta_set=res_meta_set[order(res_meta_set$id.exposure),]

p2=res_meta_set$p
fdr2=p.adjust(p2, method = "fdr", n = length(p2))
res_meta_set$fdr2=fdr2
res_meta_set$fdr=NULL
res_meta_set=res_meta_set[order(res_meta_set$fdr2),]
write.csv(res_meta_set,'/n/home04/cdadams/MR-RP-eqtlGen-bc/MR-eqtlGEN-RP-bc-meta.csv')

# Consideration for later: Find way to merge the df and results

3.1 Results

3.1.1 Top results ordered by p-value (not FDR)

This includes a list of all tests run

includes 1 SNP tests (Wald ratios)
inverse variance weighted (IVW) – main meta-analytic test
sensitivity estimators: weighted median, weighted mode, and MR-Egger

res=read.csv('/n/home04/cdadams/MR-RP-eqtlGen-bc/MR-eqtlGEN-RP-bc.csv')
head(res, n=5)

##     X id.exposure            id.outcome                    method nsnp
## 1 113      MRPS30 overall breast cancer                Wald ratio    1
## 2 156      RPL13A overall breast cancer Inverse variance weighted    4
## 3  57      MRPL41 overall breast cancer                Wald ratio    1
## 4 111      MRPS25 overall breast cancer Inverse variance weighted    2
## 5  14      MRPL19 overall breast cancer Inverse variance weighted    2
##            b         se        pval        fdr
## 1  0.9559052 0.24942715 0.000126898 0.03134381
## 2 -0.1652277 0.06542380 0.011553435 0.90321708
## 3 -0.4873273 0.19655197 0.013161047 0.90321708
## 4 -0.1536628 0.06293826 0.014626997 0.90321708
## 5 -0.1523476 0.06675390 0.022475879 0.97642760

3.1.2 Top IVW results by FDR

res_meta_set=read.csv('/n/home04/cdadams/MR-RP-eqtlGen-bc/MR-eqtlGEN-RP-bc-meta.csv')
dim(res_meta_set)

## [1] 31  9

res_meta_set

##      X id.exposure            id.outcome                    method nsnp
## 1  156      RPL13A overall breast cancer Inverse variance weighted    4
## 2   35      MRPL34 overall breast cancer Inverse variance weighted    3
## 3   46      MRPL37 overall breast cancer Inverse variance weighted    3
## 4  167       RPL17 overall breast cancer Inverse variance weighted    4
## 5   51      MRPL38 overall breast cancer Inverse variance weighted    5
## 6  186       RPL29 overall breast cancer Inverse variance weighted    3
## 7   77      MRPL52 overall breast cancer Inverse variance weighted    4
## 8  117      MRPS33 overall breast cancer Inverse variance weighted    3
## 9  224       RPS23 overall breast cancer Inverse variance weighted    6
## 10 161       RPL14 overall breast cancer Inverse variance weighted    4
## 11 135       MRPS9 overall breast cancer Inverse variance weighted    4
## 12 181       RPL28 overall breast cancer Inverse variance weighted    5
## 13 151       RPL13 overall breast cancer Inverse variance weighted    4
## 14  11      MRPL18 overall breast cancer Inverse variance weighted    4
## 15 140      RPL10A overall breast cancer Inverse variance weighted    4
## 16 130       MRPS7 overall breast cancer Inverse variance weighted    4
## 17  69      MRPL48 overall breast cancer Inverse variance weighted    5
## 18 206        RPL9 overall breast cancer Inverse variance weighted    4
## 19 245        RPS9 overall breast cancer Inverse variance weighted    3
## 20 105      MRPS21 overall breast cancer Inverse variance weighted    3
## 21 240        RPS8 overall breast cancer Inverse variance weighted    3
## 22  82      MRPL53 overall breast cancer Inverse variance weighted    3
## 23 201        RPL8 overall breast cancer Inverse variance weighted    5
## 24  23      MRPL21 overall breast cancer Inverse variance weighted    7
## 25  61      MRPL43 overall breast cancer Inverse variance weighted    4
## 26 125       MRPS6 overall breast cancer Inverse variance weighted    4
## 27 146       RPL12 overall breast cancer Inverse variance weighted    4
## 28  40      MRPL35 overall breast cancer Inverse variance weighted    3
## 29  18      MRPL20 overall breast cancer Inverse variance weighted    3
## 30 176      RPL27A overall breast cancer Inverse variance weighted    3
## 31  87      MRPL54 overall breast cancer Inverse variance weighted    4
##               b         se       pval      fdr2
## 1  -0.165227703 0.06542380 0.01155344 0.3581565
## 2  -0.091211365 0.05148747 0.07647334 0.5897460
## 3  -0.154948725 0.08842939 0.07973408 0.5897460
## 4  -0.088403512 0.05375430 0.10005542 0.5897460
## 5  -0.180656374 0.11557756 0.11803498 0.5897460
## 6   0.451353996 0.30112935 0.13390715 0.5897460
## 7   0.101153377 0.06981795 0.14738892 0.5897460
## 8  -0.114936022 0.08027212 0.15219251 0.5897460
## 9   0.070584901 0.06028215 0.24163539 0.7673316
## 10  0.086374102 0.07652116 0.25899865 0.7673316
## 11  0.063031463 0.05789512 0.27627825 0.7673316
## 12  0.064400510 0.06175610 0.29703157 0.7673316
## 13  0.051175243 0.05621589 0.36264636 0.8173705
## 14  0.052751938 0.05972047 0.37706655 0.8173705
## 15  0.087399102 0.10377422 0.39967357 0.8173705
## 16 -0.041279621 0.05440001 0.44796241 0.8173705
## 17 -0.051595424 0.06803552 0.44823544 0.8173705
## 18  0.067839791 0.10275462 0.50911807 0.8701942
## 19  0.033545757 0.05512650 0.54284051 0.8701942
## 20  0.032298800 0.06113505 0.59727805 0.8701942
## 21  0.031256333 0.06621773 0.63690993 0.8701942
## 22  0.043758890 0.10371184 0.67307786 0.8701942
## 23  0.027220134 0.06487378 0.67478786 0.8701942
## 24  0.011232709 0.02899473 0.69845627 0.8701942
## 25 -0.024865610 0.06493465 0.70176955 0.8701942
## 26  0.023870872 0.08111773 0.76854822 0.9163460
## 27  0.021503697 0.09496661 0.82086367 0.9424731
## 28  0.008020470 0.06813823 0.90629838 0.9668579
## 29  0.017533751 0.20810222 0.93285331 0.9668579
## 30 -0.006321899 0.11646430 0.95671060 0.9668579
## 31 -0.002119708 0.05101649 0.96685786 0.9668579

res_meta_set=res_meta_set[order(res_meta_set$id.exposure), ]
res_meta_set$id.exposure #MRPL19 MRPL21 MRPL23 MRPL24 MRPL34 MRPS30  RPL13

##  [1] "MRPL18" "MRPL20" "MRPL21" "MRPL34" "MRPL35" "MRPL37" "MRPL38" "MRPL43"
##  [9] "MRPL48" "MRPL52" "MRPL53" "MRPL54" "MRPS21" "MRPS33" "MRPS6"  "MRPS7" 
## [17] "MRPS9"  "RPL10A" "RPL12"  "RPL13"  "RPL13A" "RPL14"  "RPL17"  "RPL27A"
## [25] "RPL28"  "RPL29"  "RPL8"   "RPL9"   "RPS23"  "RPS8"   "RPS9"

# Not tested with IVW
# MRPL19 MRPL23 MRPS30

3.2 Wald ratios (all unclumped SNPs)

setwd('/n/home04/cdadams/MR-RP-eqtlGen-bc')
RP_eqtlGen_bc=read.csv('/n/home04/cdadams/MR-RP-eqtlGen-bc/RP_eqtlGen_bc.csv')

# Split the harmonized data into separate data frames for each genes
RP_split <- split(RP_eqtlGen_bc, RP_eqtlGen_bc$id.exposure)
list2env(RP_split, envir= .GlobalEnv) #split the list into separate datasets

# Remove the original RP_eqtlGen_bc and RP_split
rm(list= ls()[(ls() %in% c("RP_eqtlGen_bc","RP_split") )])

# Combine the separated files into a list
files <- mget(ls())
length(files)

# Use a loop to run MR & save results
for (i in 1:length(files)){
   files[[i]]=mr_singlesnp(files[[i]], parameters = default_parameters(), single_method = "mr_wald_ratio")
   write.csv(files[[i]], paste(names(files[i]), "_Walds_mr.csv", sep = ""))
}

# Merge all the MR results and save 
Walds_res_tbl_fread <- 
    list.files(pattern = "*_Walds_mr.csv") %>% 
    map_df(~fread(.))
Walds_res_tbl_fread_df=as.data.frame(Walds_res_tbl_fread)
myvars=c('id.exposure','id.outcome', 'SNP','b','se','p')
Walds_res=Walds_res_tbl_fread_df[myvars]  
Walds_res$id.outcome="bc"
Walds_res=Walds_res[order(Walds_res$p),]
write.csv(Walds_res,"Walds_includes_metas_res.csv")

# Tried  selecting "rs", but the "inverse variance weighted" includes "rs"
exp1 <- Walds_res[Walds_res$SNP %like% "rs", ]  
Walds_res2=Walds_res %>% 
  filter(str_detect(Walds_res$SNP, "^rs"))
# FDR correction
p=Walds_res2$p
fdr=p.adjust(p, method = "fdr", n = length(p))

Walds_res2$fdr=fdr
Walds_res2=Walds_res2[order(Walds_res2$fdr),]
write.csv(Walds_res2,'/n/home04/cdadams/MR-RP-eqtlGen-bc/Walds_MR_eqtlGEN_RP_bc.csv')

3.2.1 Top Wald-ratio results by FDR

Walds_res=read.csv('/n/home04/cdadams/MR-RP-eqtlGen-bc/Walds_MR_eqtlGEN_RP_bc.csv')
dim(Walds_res)

## [1] 53257     8

sig_Walds_res=Walds_res[which(Walds_res$fdr<0.05),]
table(sig_Walds_res$id.exposure)

## 
## MRPL19 MRPL21 MRPL23 MRPL24 MRPL34 MRPS30  RPL13 
##      1      1      4      1     29    156      8

#MRPL19 MRPL21 MRPL23 MRPL24 MRPL34 MRPS30  RPL13

3.3 Meta-analysis for collective RPs

Will make another repo for this, eventually.

# Get the clumped meta-analytic set for the group of RPs
RP_eqtlGen_bc=read.csv('/n/home04/cdadams/MR-RP-eqtlGen-bc/RP_eqtlGen_bc.csv')
RP_eqtlGen_bc$id.exposure="collective RPs"
RP_eqtlGen_bc_clumped=clump_data(RP_eqtlGen_bc)
gene_freq=table(RP_eqtlGen_bc_clumped$gene.exposure)
gene_freq=as.data.frame(gene_freq)
dat=RP_eqtlGen_bc_clumped

# Run MR
mr_results<- mr(dat)
mr_results

##      id.exposure id.outcome outcome exposure                    method nsnp
## 1 collective RPs     WVFiKJ outcome exposure                  MR Egger  186
## 2 collective RPs     WVFiKJ outcome exposure           Weighted median  186
## 3 collective RPs     WVFiKJ outcome exposure Inverse variance weighted  186
## 4 collective RPs     WVFiKJ outcome exposure               Simple mode  186
## 5 collective RPs     WVFiKJ outcome exposure             Weighted mode  186
##              b          se      pval
## 1 -0.023384301 0.016847259 0.1668099
## 2  0.016969311 0.013293702 0.2017815
## 3  0.004060416 0.009288452 0.6620050
## 4  0.026519297 0.022244583 0.2347221
## 5  0.016847867 0.016875652 0.3194120

# Detect and remove outliers (outliers can indicate pleiotropy)
raddat <- format_radial(BXG=dat$beta.exposure, BYG=dat$beta.outcome, 
  seBXG=dat$se.exposure, seBYG=dat$se.outcome, RSID=dat$SNP)
ivwrad <- ivw_radial(raddat, alpha=0.05, weights=1)

## 
## Radial IVW
## 
##                 Estimate   Std.Error   t value  Pr(>|t|)
## Effect (1st) 0.004677796 0.009280510 0.5040451 0.6142297
## Iterative    0.004677794 0.009280511 0.5040449 0.6142298
## Exact (FE)   0.004675870 0.008457056 0.5528958 0.5803348
## Exact (RE)   0.004689604 0.009360280 0.5010110 0.6169494
## 
## 
## Residual standard error: 1.097 on 188 degrees of freedom
## 
## F-statistic: 0.25 on 1 and 188 DF, p-value: 0.615
## Q-Statistic for heterogeneity: 226.394 on 188 DF , p-value: 0.02916883
## 
##  Outliers detected 
## Number of iterations = 2

ivwrad2 <- ivw_radial(raddat, alpha=0.05, weights=1)

## 
## Radial IVW
## 
##                 Estimate   Std.Error   t value  Pr(>|t|)
## Effect (1st) 0.004677796 0.009280510 0.5040451 0.6142297
## Iterative    0.004677794 0.009280511 0.5040449 0.6142298
## Exact (FE)   0.004675870 0.008457056 0.5528958 0.5803348
## Exact (RE)   0.004689604 0.009388550 0.4995024 0.6180095
## 
## 
## Residual standard error: 1.097 on 188 degrees of freedom
## 
## F-statistic: 0.25 on 1 and 188 DF, p-value: 0.615
## Q-Statistic for heterogeneity: 226.394 on 188 DF , p-value: 0.02916883
## 
##  Outliers detected 
## Number of iterations = 2

dim(ivwrad$outliers)[1]

## [1] 14

outliers=ivwrad$outliers[1]
outliers

##            SNP
## 1   rs10039541
## 2   rs10819246
## 3   rs11128729
## 4  rs117605916
## 5   rs13034052
## 6  rs148276897
## 7   rs17305746
## 8  rs181131314
## 9   rs35061212
## 10   rs4979655
## 11   rs6747017
## 12   rs6875287
## 13  rs72950603
## 14  rs76629948

myvars=outliers$SNP
dat2 <- dat[! dat$SNP %in% myvars, ]
dim(dat2)

## [1] 175  34

dim(dat)

## [1] 189  34

dat2$outcome="breast cancer (BCAC)"
dat2$id.outcome="breast cancer (BCAC)"

# Run MR without outliers
mr_results <- mr(dat2)
mr_results

##      id.exposure           id.outcome              outcome exposure
## 1 collective RPs breast cancer (BCAC) breast cancer (BCAC) exposure
## 2 collective RPs breast cancer (BCAC) breast cancer (BCAC) exposure
## 3 collective RPs breast cancer (BCAC) breast cancer (BCAC) exposure
## 4 collective RPs breast cancer (BCAC) breast cancer (BCAC) exposure
## 5 collective RPs breast cancer (BCAC) breast cancer (BCAC) exposure
##                      method nsnp           b          se       pval
## 1                  MR Egger  172 -0.01048461 0.016213793 0.51873198
## 2           Weighted median  172  0.02577580 0.013428342 0.05492015
## 3 Inverse variance weighted  172  0.01066183 0.008888786 0.23034554
## 4               Simple mode  172  0.02396877 0.022460532 0.28740820
## 5             Weighted mode  172  0.02396877 0.015863736 0.13265568

singlesnp_results=mr_singlesnp(dat2, parameters = default_parameters(),
  single_method = "mr_wald_ratio", 
  all_method = c("mr_ivw",
  "mr_egger_regression", 
  "mr_weighted_mode",
  "mr_weighted_median"))
dim(singlesnp_results)

## [1] 176   9

singlesnp_results=singlesnp_results[order(singlesnp_results$SNP),]

# Remove the snps that weren't used in the MR to have a record of the data source
singlesnp_results.2=singlesnp_results[5:176,]
dim(singlesnp_results.2)

## [1] 172   9

dat2.1=dat2[which(dat2$SNP %in% singlesnp_results.2$SNP),]
dim(dat2.1)

## [1] 172  34

dim(dat)

## [1] 189  34

dat=dat2.1
dim(dat)

## [1] 172  34

# r2 and Fstat

n=34684 
k=172
p=dat$eaf.exposure
r2.1<-(2*(dat$beta.exposure^2)*p*(1-p))/1
r2=sum(r2.1, na.rm = TRUE)
r2

## [1] 2.09056

Fstat <- r2*(n-1-k)/((1-r2)*k)
Fstat

## [1] -384.6291

# Combine results
sin=singlesnp_results
res=mr_results
het<-mr_heterogeneity(dat)
plt<-mr_pleiotropy_test(dat)

# Bug in function for combining; save separately
# all_res<-combine_all_mrresults(res,het,plt,sin,ao_slc=F,Exp=T,split.exposure=T,split.outcome=T)
# all_res$lower_intercept=all_res$intercept-1.96*all_res$intercept_se
# all_res$upper_intercept=all_res$intercept+1.96*all_res$intercept_se
# 
# head(all_res[,c("Method","outcome","exposure","nsnp","b","se","or", "or_lci95",  
#   "or_uci95","pval","intercept","intercept_se","intercept_pval",
#   "Q","Q_df","Q_pval")])
#write.csv(all_res,"combined_results_all_RP_MRP_blood_v8.csv")

write.csv(mr_results,'/n/home04/cdadams/MR-RP-eqtlGen-bc/res_META_MR-RP-eqtlGen-bc.csv')
write.csv(singlesnp_results,'/n/home04/cdadams/MR-RP-eqtlGen-bc/singlesnp_META_MR-RP-eqtlGen-bc.csv')
write.csv(dat, '/n/home04/cdadams/MR-RP-eqtlGen-bc/SIMEX_META_MR-RP-eqtlGen-bc.csv')
write.csv(plt,'/n/home04/cdadams/MR-RP-eqtlGen-bc/plt_META_MR-RP-eqtlGen-bc.csv')
write.csv(het,'/n/home04/cdadams/MR-RP-eqtlGen-bc/het_META_MR-RP-eqtlGen-bc.csv')

3.4 Data for collective RP

3.5 Figures

The top plot displays the individual SNPs included in the analysis, their effect estimates and 95% confidence intervals, along with the meta-analyzed “causal” estimates (the term “causal” is used in MR to mean MR is less prone to confounding and reverse causation than observational designs).
The inverse-variance weighted (IVW) estimate is the approach that is used to test the causal effect. The weighted median, weighted mode, and MR-Egger are sensitivity estimators. If the directions and magnitudes of the sensitivity estimators’ findings comport with those of the IVW, this is evidence against pleiotropy. (The sensitivity estimators needn’t all be statistically signficiant. For instance, the MR-Egger estimate isn’t, but this only means that the MR-Egger test doesn’t provide additional support for the IVW. The direction and magnitude of the MR-Egger estimate does, however, comport with the IVW’s.)
The bottom (scatter) plot is a sensitivity analysis for heterogeneity in the effect estimates.

3.6 IVW & sensitivity estimator results

##                      method nsnp    or    se or_lci95 or_uci95  pval
## 1                  MR Egger  172 0.990 0.016    0.959    1.022 0.519
## 2           Weighted median  172 1.026 0.013    0.999    1.053 0.055
## 3 Inverse variance weighted  172 1.011 0.009    0.993    1.028 0.230
## 4               Simple mode  172 1.024 0.022    0.980    1.070 0.287
## 5             Weighted mode  172 1.024 0.016    0.993    1.057 0.133

3.7 Wald ratio results

##        Var1 Freq
## 1     MRPL1    1
## 2    MRPL10    2
## 3    MRPL13    1
## 4    MRPL14    1
## 5    MRPL15    1
## 6    MRPL17    2
## 7    MRPL18    4
## 8    MRPL19    1
## 9     MRPL2    2
## 10   MRPL20    3
## 11   MRPL21    4
## 12   MRPL22    1
## 13   MRPL23    1
## 14   MRPL24    1
## 15   MRPL27    1
## 16   MRPL28    1
## 17   MRPL32    1
## 18   MRPL33    1
## 19   MRPL34    1
## 20   MRPL35    3
## 21   MRPL36    2
## 22   MRPL37    1
## 23   MRPL38    3
## 24   MRPL39    1
## 25    MRPL4    1
## 26   MRPL40    1
## 27   MRPL42    1
## 28   MRPL43    4
## 29   MRPL44    1
## 30   MRPL45    1
## 31   MRPL47    1
## 32   MRPL48    2
## 33   MRPL49    1
## 34   MRPL50    2
## 35   MRPL51    1
## 36   MRPL52    4
## 37   MRPL53    2
## 38   MRPL54    2
## 39   MRPL55    1
## 40   MRPS10    1
## 41   MRPS11    1
## 42   MRPS12    1
## 43   MRPS14    1
## 44   MRPS15    2
## 45   MRPS16    1
## 46   MRPS17    2
## 47  MRPS18B    1
## 48  MRPS18C    1
## 49   MRPS21    2
## 50   MRPS22    1
## 51   MRPS24    1
## 52   MRPS26    1
## 53   MRPS31    1
## 54   MRPS33    3
## 55   MRPS34    1
## 56   MRPS35    2
## 57   MRPS36    1
## 58    MRPS6    3
## 59    MRPS7    2
## 60    MRPS9    1
## 61   RPL10A    1
## 62    RPL11    1
## 63    RPL12    3
## 64    RPL13    4
## 65   RPL13A    1
## 66    RPL14    3
## 67    RPL15    1
## 68    RPL17    4
## 69    RPL21    1
## 70    RPL22    1
## 71   RPL27A    1
## 72    RPL28    2
## 73    RPL29    3
## 74     RPL3    1
## 75    RPL30    1
## 76    RPL31    2
## 77    RPL32    1
## 78   RPL35A    1
## 79    RPL36    2
## 80    RPL37    1
## 81    RPL3L    1
## 82     RPL4    1
## 83     RPL8    5
## 84     RPL9    4
## 85    RPS12    1
## 86    RPS13    1
## 87    RPS14    2
## 88   RPS15A    2
## 89    RPS16    1
## 90    RPS18    1
## 91    RPS20    2
## 92    RPS21    1
## 93    RPS23    4
## 94    RPS24    2
## 95    RPS25    1
## 96    RPS26    1
## 97   RPS27A    1
## 98    RPS28    1
## 99    RPS29    1
## 100   RPS3A    2
## 101    RPS5    1
## 102    RPS6    1
## 103    RPS7    1
## 104    RPS8    2
## 105    RPS9    2

3.8 BCAC 2017

Will create another repo for this. The idea is to see how the BCAC 2017 data perform.

# Try using eqtlGen with BCAC 2017
# Extract the increased expression loci from BCAC
exposure_RP_eqtlGen_p=read.csv('/n/home04/cdadams/MR-RP-eqtlGen-bc/exposure_RP_eqtlGen_p.csv')
exposure_RP_eqtlGen_p_clumped=clump_data(exposure_RP_eqtlGen_p)
dim(exposure_RP_eqtlGen_p_clumped)
outcome_dat <- extract_outcome_data(exposure_RP_eqtlGen_p_clumped$SNP, c('1126'),
  proxies = 1, rsq = 0.8, align_alleles = 1, palindromes = 1, maf_threshold = 0.3)
dat <- harmonise_data(exposure_RP_eqtlGen_p_clumped, outcome_dat, action = 2)
res_17=mr(dat)

# Detect and remove outliers (outliers can indicate pleiotropy)
raddat <- format_radial(BXG=dat$beta.exposure, BYG=dat$beta.outcome, 
  seBXG=dat$se.exposure, seBYG=dat$se.outcome, RSID=dat$SNP)
ivwrad <- ivw_radial(raddat, alpha=0.05, weights=1)
ivwrad2 <- ivw_radial(raddat, alpha=0.05, weights=1)
dim(ivwrad$outliers)[1] 
outliers=ivwrad$outliers[1]
outliers
myvars=outliers$SNP
dat2 <- dat[! dat$SNP %in% myvars, ]
dim(dat2) 
dim(dat)
dat2$outcome="breast cancer (BCAC)"
dat2$id.outcome="breast cancer (BCAC)"

# Run MR without outliers
mr_results_17 <- mr(dat2)
mr_results_17

3.9 Output (eQTLGen on BC)

My Rpubs