Meta-analytic and Wald ratio Mendelian randomization of expression of cytosolic and mitochondrial ribosomal proteins on breast cancer in BCAC (2017)

MR-RP-eqtlGen-bc-17

Charleen D. Adams1 and Bernardo Lemos1,2

1Department of Environmental Health, Program in Molecular and Integrative Physiological Sciences, Harvard T.H. Chan School of Public Health, Boston, Massachusetts 02115, USA
2Broad Institute of Harvard and MIT, Cambridge, Massachusetts 02142, USA

2021-04-15

Abstract

Under conditions of nucleolar stress, various ribosomal proteins (RPs) interact with the MDM2 proto-oncogene (MDM2) to stabilize p53. Dysregulation of the RPs may disturb this pathway, but whether changes in the expression of the RPs influence risk for breast cancer in the general population is unknown. We tested whether expression changes in individual cytosolic and mitochondrial RPs influence risk for breast cancer using inverse-variance weighted (IVW) and Wald ratio Mendelian randomization (MR). Summary statistics for cis expression quantitative trait loci (eQTLs) for SNPs controlling expression of RPs (P<5x10-06) were extracted from the eqtlGen Consortium’s genome-wide association study (GWAS) of expression in blood (n=125 RPs). Summary statistics for 119 of the 125 RPs were extracted from the Breast Cancer Association Consortium’s (BCAC) 2017 genome-wide association study (GWAS) of breast cancer (122977 cases and 105974 controls). After linkage disequilibirum (LD)-clumping each set of eQTLs for a given RP, there were 33 RPs available for meta-analysis.

0.1 Format eqtlGen GWAS
0.2 Harmonize
1 Split, loop, MR, & FDR-correct results

0.1 Format eqtlGen GWAS

eqtlGen stored MAFs in a separate file.
MAFs are needed for the calculation of the betas & SEs from the Z-scores.
Methods for this are provided by: Zhu 2016: https://www.nature.com/articles/ng.3538

\[ \beta = \frac{z} {\sqrt2p(1-p)(n + z^2)} \] \[ SE = \frac{1} {\sqrt2p(1-p)(n + z^2)} \] where \[p = MAF\]

# README files for the eQTLGen GWAS
# This README accompanies the files with cis-eQTL results from eQTLGen
 
# Files
# -----
# File with full cis-eQTL results: 2019-12-11-cis-eQTLsFDR-ProbeLevel-CohortInfoRemoved-BonferroniAdded.txt.gz
# File with significant (FDR<0.05) cis-eQTL results: 2019-12-11-cis-eQTLsFDR0.05-ProbeLevel-CohortInfoRemoved-BonferroniAdded.txt.gz
 
# Column Names
# ------------
# Pvalue - P-value
# SNP - SNP rs ID
# SNPChr - SNP chromosome
# SNPPos - SNP position
# AssessedAllele - Assessed allele, the Z-score refers to this allele
# OtherAllele - Not assessed allele
# Zscore - Z-score
# Gene - ENSG name (Ensembl v71) of the eQTL gene
# GeneSymbol - HGNC name of the gene
# GeneChr - Gene chromosome
# GenePos - Centre of gene position
# NrCohorts - Total number of cohorts where this SNP-gene combination was tested
# NrSamples - Total number of samples where this SNP-gene combination was tested
# FDR - False discovery rate estimated based on permutations
# BonferroniP - P-value after Bonferroni correction
# 
# Additional information
# ----------------------
# These files contain all cis-eQTL results from eQTLGen, accompanying the article.
# 19,250 genes that showed expression in blood were tested.
# Every SNP-gene combination with a distance <1Mb from the center of the gene and  tested in at least 2 cohorts was included.
# Associations where SNP/proxy positioned in Illumina probe were not removed from combined analysis.
# UPDATE LOG
# ----------
# 2018-10-19: Initial data release
# 2018-12-19: In the current README, following file names have been fixed and updated:
# 2019-12-20: Cis-eQTLs are now updated to have a 2-cohort filter: every cis-eQTL must be tested in at least 2 cohorts to be reported
# On cluster, I gzipped the eqtlGen file: gzip 2019-12-11-cis-eQTLsFDR-ProbeLevel-CohortInfoRemoved-BonferroniAdded.txt
# this is because I kept getting error messages both here & on the cluster saying ther was no room on the disk.
# I moved unnecessary files to a folder in my personal computer DB: /Users/charleenadams/Dropbox/Harvard/bc/survival/odyssey_transfer/
# This has the gz files for the protein GWAS, the eQTLGen SMR file, and the eQTLGen EAF file.
# I launched an interacted terminal session with increased memory: 
# salloc -p test  --mem 5000 -t 0-06:00
#-----------------------------------------------------------------------------------------------------------------------
# README file for the MAF eQTLGen cis-eQTL data 
# A README accompanies the file with allele frequencies from eQTLGen cis-eQTL analysis: 
# 2018-07-18_SNP_AF_for_AlleleB_combined_allele_counts_and_MAF_pos_added.txt.gz
# Column Names
# ------------
# SNP -         SNP rs ID
# hg19_chr -    chr number
# hg19_pos -    SNP position (hg19)
# AlleleA -     Other allele
# AlleleB -     Assessed allele
# allA_total -  Total allele count of genotype AA
# allAB_total - Total allele count of genotype AB
# allB_total -  Total allele count of genotype BB
# AlleleB_all - Allele frequency of assessed allele
# Additional information
# ----------------------
# The allele frequencies were calculated using reported allele counts from all cohorts except 
# Framingham Heart Study, because the related samples are present in this cohort.
#-----------------------------------------------------------------------------------------------------------------------
# Remember that the eqtlGen file is stored in /n/home04/cdadams/MR-eqtlGen-long
# Don't move to MR-RP-eqtlGen-bc-17 because it will result a storage over-capacity
setwd('/n/home04/cdadams/MR-eqtlGen-long/')
tbl=fread('cat 2019-12-11-cis-eQTLsFDR-ProbeLevel-CohortInfoRemoved-BonferroniAdded.txt.gz | gunzip')

rp_list_expanded=read.csv('/n/home04/cdadams/ld/ldsc/expanded_verified.csv')
rp_list_expanded

# Extract RPs
RPs=tbl[which(tbl$GeneSymbol %in% rp_list_expanded$all_RP),]
freq=table(RPs$GeneSymbol)
rp_freq=as.data.frame(freq)

# Read-in MAFs
eqtl_MAF=read.table(file = "2018-07-18_SNP_AF_for_AlleleB_combined_allele_counts_and_MAF_pos_added.txt", 
                    sep = "\t", header=TRUE)
RPs_total <- merge(RPs,eqtl_MAF,by="SNP")

# Calculate betas and SEs
n=30596
RPs_total$Beta=RPs_total$Zscore/sqrt(2*(RPs_total$AlleleB_all)*
                  (1-RPs_total$AlleleB_all)*(n+RPs_total$Zscore^2))
RPs_total$SE=1/sqrt(2*(RPs_total$AlleleB_all)*
                  (1-RPs_total$AlleleB_all)*(n+RPs_total$Zscore^2))
write.table(RPs_total, file = "/n/home04/cdadams/MR-RP-eqtlGen-bc-17/RPs_total_maf_beta_se.txt", sep = "\t",
            row.names = FALSE, col.names = TRUE, quote=FALSE)
RPs_total=read.delim("/n/home04/cdadams/MR-RP-eqtlGen-bc-17/RPs_total_maf_beta_se.txt", check.names = FALSE)

# gzRPs <- gzfile("/n/home04/cdadams/MR-RP-eqtlGen-bc-17/RPs.gz", "w")
# write.csv(RPs, gzRPs)
# close(gzRPs)
# Read-in for MR formatting

freq=table(RPs_total$GeneSymbol)
rp_freq=as.data.frame(freq)
rp_freq

setwd("/n/home04/cdadams/MR-RP-eqtlGen-bc-17")

# Read-in the data saved for the 2020 BCAC: MR-RP-eqtlGen-bc
exposure_RP_eqtlGen <- read_exposure_data(
  filename = "/n/home04/cdadams/MR-RP-eqtlGen-bc/RPs_total_maf_beta_se.txt",
  sep = '\t',
  snp_col = 'SNP',
  beta_col = 'Beta',
  se_col = 'SE',
  effect_allele_col = 'AssessedAllele',
  #phenotype_col = 'tissue',
  #units_col = 'tissue',
  other_allele_col = 'OtherAllele',
  eaf_col = 'AlleleB_all',
  #samplesize_col = 'samplesize',
  #ncase_col = 'ncase',
  #ncontrol_col = 'ncontrol',
  gene_col = 'GeneSymbol',
  pval_col = 'Pvalue', 
  id_col = 'RP eqtlGEN'
)

freq=table(exposure_RP_eqtlGen$gene.exposure)
rp_freq=as.data.frame(freq)
rp_freq

exposure_RP_eqtlGen_p=exposure_RP_eqtlGen[which(exposure_RP_eqtlGen$pval.exposure<0.000005),]

freq2=table(exposure_RP_eqtlGen_p$gene.exposure)
rp_freq2=as.data.frame(freq2)
rp_freq2

write.csv(exposure_RP_eqtlGen_p, '/n/home04/cdadams/MR-RP-eqtlGen-bc-17/exposure_RP_eqtlGen_p.csv')

0.2 Harmonize

# Read-in the BCAC 2017 data
exposure_RP_eqtlGen_p=read.csv('/n/home04/cdadams/MR-RP-eqtlGen-bc-17/exposure_RP_eqtlGen_p.csv')
outcome_dat <- extract_outcome_data(exposure_RP_eqtlGen_p$SNP, c('1126'), #1128 ER-; 1127 ER+
  proxies = 1, rsq = 0.8, align_alleles = 1, palindromes = 1, maf_threshold = 0.3)
dat <- harmonise_data(exposure_RP_eqtlGen_p, outcome_dat, action = 2)

# Descriptives
freq3=table(dat$gene.exposure)
rp_freq3=as.data.frame(freq3)
rp_freq3

write.csv(dat, '/n/home04/cdadams/MR-RP-eqtlGen-bc-17/RP_eqtlGen_bc_17.csv')

1 Split, loop, MR, & FDR-correct results

setwd('/n/home04/cdadams/MR-RP-eqtlGen-bc-17')
RP_eqtlGen_bc_17=read.csv('/n/home04/cdadams/MR-RP-eqtlGen-bc-17/RP_eqtlGen_bc_17.csv')
head(RP_eqtlGen_bc_17)
RP_eqtlGen_bc_17$id.exposure=RP_eqtlGen_bc_17$gene.exposure

# Command to remove everything except "v"
#rm(list=(ls()[ls()!="v"]))
# Split the harmonized data into separate data frames for each genes

RP_split <- split(RP_eqtlGen_bc_17, RP_eqtlGen_bc_17$id.exposure)
list2env(RP_split, envir= .GlobalEnv) #split the list into separate datasets

# Remove the original RP_eqtlGen_bc_17 and RP_split
rm(list= ls()[(ls() %in% c("RP_eqtlGen_bc_17","RP_split"))])

# Save each harmonized gene as a .csv
# for (i in 1:length(files)){
#   write.csv(files[[i]], paste(names(files[i]), ".csv", sep = ""))
# }
# Get them back together: not used here but useful
# dfs = sapply(.GlobalEnv, is.data.frame) 
# dfs
# 
# df=do.call(rbind, mget(names(dfs)[dfs]))
# head(df)
# dim(df)

# Combine the separated files into a list
files <- mget(ls())
length(files)

# Use a loop to clump each RP dataframe in the list, run MR, & save results
for (i in 1:length(files)){
   files[[i]]=clump_data(files[[i]])
   write.csv(files[[i]], paste(names(files[i]), "_clump.csv", sep = ""))
   files[[i]]=mr(files[[i]])
   write.csv(files[[i]], paste(names(files[i]), "_mr.csv", sep = ""))
}

# Idea for reading all mr results in and saving:
# Idea from: https://statisticsglobe.com/merge-csv-files-in-r
# Sadly, didn't work. 
# data_all <- list.files(path = "/n/home04/cdadams/MR-RP-eqtlGen-bc-17",  
#   pattern = "*_mr.csv", full.names = TRUE) %>% 
#   lapply(read_csv) %>%                                                    
#   bind_rows                                                                
#   write.csv(data_all, "complete_RP_MR_res.csv")

# Merged all of the clumped data 
tbl_fread_clump <- 
    list.files(pattern = "*_clump.csv") %>% 
    map_df(~fread(.))
tbl_fread_clump_df=as.data.frame(tbl_fread_clump)
head(tbl_fread_clump_df)
colnames(tbl_fread_clump_df)

df <- tbl_fread_clump_df[c(3:43)] #check this
freq_df=table(df$gene.exposure)
freq_df=as.data.frame(freq_df)
freq_df
freq_set=freq_df[which(freq_df$Freq>2),]
dim(freq_set)

write.csv(df, '/n/home04/cdadams/MR-RP-eqtlGen-bc-17/df_17.csv')

# Merge all the MR results and save 
tbl_fread <- 
    list.files(pattern = "*_mr.csv") %>% 
    map_df(~fread(.))
tbl_fread_df=as.data.frame(tbl_fread)
myvars=c('id.exposure','id.outcome','method', 'nsnp','b','se','pval')
res=tbl_fread_df[myvars]  
res$id.outcome="bc 2017"
res=res[order(res$p),]

# FDR correction
p=res$p
fdr=p.adjust(p, method = "fdr", n = length(p))
res$fdr=fdr
res=res[order(res$fdr),]
write.csv(res,'/n/home04/cdadams/MR-RP-eqtlGen-bc-17/MR-eqtlGEN-RP-bc-17.csv')

res=res[order(res$p),]

# Select just the set with >2 SNPs
res=res[order(res$fdr),]
res_meta_set=res[which(res$nsnp>2),]
res_meta_set=res_meta_set[which(res_meta_set$method=='Inverse variance weighted'),]
rownames(res_meta_set)

# FDR for res_meta_set (only IVWs)
p2=res_meta_set$p
fdr2=p.adjust(p2, method = "fdr", n = length(p2))
res_meta_set$fdr2=fdr2
res_meta_set$fdr=NULL
res_meta_set=res_meta_set[order(res_meta_set$fdr2),]
write.csv(res_meta_set,'/n/home04/cdadams/MR-RP-eqtlGen-bc-17/MR-eqtlGEN-RP-bc-17-meta.csv')

# FDR for res_meta_set: only RP (not MRP) (only IVWs)
res_meta_set=res_meta_set[order(res_meta_set$id.exposure),]

p2=res_meta_set$p
fdr2=p.adjust(p2, method = "fdr", n = length(p2))
res_meta_set$fdr2=fdr2
res_meta_set$fdr=NULL
res_meta_set=res_meta_set[order(res_meta_set$fdr2),]
write.csv(res_meta_set,'/n/home04/cdadams/MR-RP-eqtlGen-bc-17/MR-eqtlGEN-RP-bc-17-meta.csv')

# Consideration for later: Find way to merge the df and results

1.1 Results

1.1.1 Top results ordered by p-value (not FDR)

This includes a list of all tests run

includes 1 SNP tests (Wald ratios)
inverse variance weighted (IVW) – main meta-analytic test
sensitivity estimators: weighted median, weighted mode, and MR-Egger

res=read.csv('/n/home04/cdadams/MR-RP-eqtlGen-bc-17/MR-eqtlGEN-RP-bc-17.csv')
head(res, n=5)

##     X id.exposure id.outcome                    method nsnp           b
## 1 111      MRPS30    bc 2017                Wald ratio    1  1.09385651
## 2  99     MRPS18C    bc 2017                Wald ratio    1  0.25057689
## 3  15       MRPL2    bc 2017 Inverse variance weighted    2 -0.10805538
## 4 221       RPS23    bc 2017 Inverse variance weighted    6  0.05960075
## 5 220       RPS23    bc 2017           Weighted median    6  0.06684002
##           se         pval          fdr
## 1 0.08778721 1.228805e-35 2.985996e-33
## 2 0.04327512 7.025137e-09 8.535542e-07
## 3 0.02680083 5.535294e-05 4.483588e-03
## 4 0.01600793 1.967127e-04 1.165190e-02
## 5 0.01819783 2.397510e-04 1.165190e-02

1.1.2 Top IVW results by FDR

res_meta_set=read.csv('/n/home04/cdadams/MR-RP-eqtlGen-bc-17/MR-eqtlGEN-RP-bc-17-meta.csv')
dim(res_meta_set)

## [1] 31  9

res_meta_set

##      X id.exposure id.outcome                    method nsnp            b
## 1  221       RPS23    bc 2017 Inverse variance weighted    6  0.059600750
## 2   51      MRPL38    bc 2017 Inverse variance weighted    5 -0.130870724
## 3   68      MRPL48    bc 2017 Inverse variance weighted    5  0.069526790
## 4  132       MRPS9    bc 2017 Inverse variance weighted    4  0.053323275
## 5  115      MRPS33    bc 2017 Inverse variance weighted    3 -0.063481171
## 6  203        RPL9    bc 2017 Inverse variance weighted    4  0.065877625
## 7   11      MRPL18    bc 2017 Inverse variance weighted    4  0.043641995
## 8   46      MRPL37    bc 2017 Inverse variance weighted    3 -0.074307248
## 9  173      RPL27A    bc 2017 Inverse variance weighted    3 -0.081366040
## 10 148       RPL13    bc 2017 Inverse variance weighted    4 -0.043366995
## 11 158       RPL14    bc 2017 Inverse variance weighted    4 -0.046810993
## 12 127       MRPS7    bc 2017 Inverse variance weighted    4  0.028446597
## 13 143       RPL12    bc 2017 Inverse variance weighted    4 -0.032839781
## 14  35      MRPL34    bc 2017 Inverse variance weighted    4 -0.052629420
## 15 164       RPL17    bc 2017 Inverse variance weighted    4 -0.022264817
## 16  60      MRPL43    bc 2017 Inverse variance weighted    4 -0.028601595
## 17 236        RPS8    bc 2017 Inverse variance weighted    3  0.028849787
## 18 137      RPL10A    bc 2017 Inverse variance weighted    4  0.036865010
## 19 103      MRPS21    bc 2017 Inverse variance weighted    3  0.020403854
## 20 153      RPL13A    bc 2017 Inverse variance weighted    4  0.024847109
## 21  18      MRPL20    bc 2017 Inverse variance weighted    3 -0.031053403
## 22  80      MRPL53    bc 2017 Inverse variance weighted    3  0.018170308
## 23  85      MRPL54    bc 2017 Inverse variance weighted    4 -0.015684836
## 24  23      MRPL21    bc 2017 Inverse variance weighted    7 -0.009288684
## 25  75      MRPL52    bc 2017 Inverse variance weighted    4 -0.020003918
## 26 198        RPL8    bc 2017 Inverse variance weighted    6  0.015722266
## 27 122       MRPS6    bc 2017 Inverse variance weighted    4 -0.013312604
## 28 178       RPL28    bc 2017 Inverse variance weighted    5 -0.015111376
## 29  40      MRPL35    bc 2017 Inverse variance weighted    3 -0.007901239
## 30 183       RPL29    bc 2017 Inverse variance weighted    3 -0.029488294
## 31 241        RPS9    bc 2017 Inverse variance weighted    3  0.002784067
##            se         pval        fdr2
## 1  0.01600793 0.0001967127 0.006098093
## 2  0.04025360 0.0011493690 0.017815219
## 3  0.02402501 0.0038044814 0.039312975
## 4  0.02261295 0.0183694575 0.142363295
## 5  0.02812787 0.0240157181 0.148897452
## 6  0.03030750 0.0297320977 0.153615838
## 7  0.02114069 0.0389836363 0.169318416
## 8  0.03684040 0.0436950751 0.169318416
## 9  0.04246165 0.0553365395 0.190603636
## 10 0.02350268 0.0650094009 0.201529143
## 11 0.02817350 0.0966079093 0.272258654
## 12 0.01913451 0.1371034919 0.354184021
## 13 0.02570746 0.2014463452 0.480372054
## 14 0.04361920 0.2275995888 0.493148798
## 15 0.01889343 0.2386203860 0.493148798
## 16 0.02611256 0.2733767611 0.498510564
## 17 0.02552052 0.2582847308 0.498510564
## 18 0.03684590 0.3170595972 0.546047084
## 19 0.02186937 0.3508262381 0.572400704
## 20 0.02881607 0.3885413723 0.602239127
## 21 0.04227257 0.4625834850 0.643548244
## 22 0.02557845 0.4774712778 0.643548244
## 23 0.02165664 0.4689118566 0.643548244
## 24 0.01501565 0.5361796645 0.666305591
## 25 0.03242991 0.5373432186 0.666305591
## 26 0.03057950 0.6071513608 0.723911238
## 27 0.02919407 0.6483873498 0.744444735
## 28 0.03757594 0.6875694569 0.761237613
## 29 0.02155723 0.7139742659 0.763213870
## 30 0.19420039 0.8793094485 0.908619763
## 31 0.02635004 0.9158544664 0.915854466

res_meta_set=res_meta_set[order(res_meta_set$id.exposure), ]
res_meta_set$id.exposure

##  [1] "MRPL18" "MRPL20" "MRPL21" "MRPL34" "MRPL35" "MRPL37" "MRPL38" "MRPL43"
##  [9] "MRPL48" "MRPL52" "MRPL53" "MRPL54" "MRPS21" "MRPS33" "MRPS6"  "MRPS7" 
## [17] "MRPS9"  "RPL10A" "RPL12"  "RPL13"  "RPL13A" "RPL14"  "RPL17"  "RPL27A"
## [25] "RPL28"  "RPL29"  "RPL8"   "RPL9"   "RPS23"  "RPS8"   "RPS9"

# Not tested with IVW

1.2 Wald ratios (all unclumped SNPs)

setwd('/n/home04/cdadams/MR-RP-eqtlGen-bc-17')
RP_eqtlGen_bc_17=read.csv('/n/home04/cdadams/MR-RP-eqtlGen-bc-17/RP_eqtlGen_bc_17.csv')
RP_eqtlGen_bc_17$id.exposure=RP_eqtlGen_bc_17$gene.exposure

# Split the harmonized data into separate data frames for each genes
RP_split <- split(RP_eqtlGen_bc_17, RP_eqtlGen_bc_17$id.exposure)
list2env(RP_split, envir= .GlobalEnv) #split the list into separate datasets

# Remove the original RP_eqtlGen_bc_17 and RP_split
rm(list= ls()[(ls() %in% c("RP_eqtlGen_bc_17","RP_split") )])

# Combine the separated files into a list
files <- mget(ls())
length(files)

# Use a loop to run MR & save results
for (i in 1:length(files)){
   files[[i]]=mr_singlesnp(files[[i]], parameters = default_parameters(), single_method = "mr_wald_ratio")
   write.csv(files[[i]], paste(names(files[i]), "_Walds_mr.csv", sep = ""))
}

# Merge all the MR results and save 
Walds_res_tbl_fread <- 
    list.files(pattern = "*_Walds_mr.csv") %>% 
    map_df(~fread(.))
Walds_res_tbl_fread_df=as.data.frame(Walds_res_tbl_fread)
myvars=c('id.exposure','id.outcome', 'SNP','b','se','p')
Walds_res=Walds_res_tbl_fread_df[myvars]  
Walds_res$id.outcome="bc 2017"
Walds_res=Walds_res[order(Walds_res$p),]
write.csv(Walds_res,"Walds_includes_metas_res.csv")

# Tried  selecting "rs", but the "inverse variance weighted" includes "rs"
exp1 <- Walds_res[Walds_res$SNP %like% "rs", ]  
Walds_res2=Walds_res %>% 
  filter(str_detect(Walds_res$SNP, "^rs"))
# FDR correction
p=Walds_res2$p
fdr=p.adjust(p, method = "fdr", n = length(p))

Walds_res2$fdr=fdr
Walds_res2=Walds_res2[order(Walds_res2$fdr),]
write.csv(Walds_res2,'/n/home04/cdadams/MR-RP-eqtlGen-bc-17/Walds_MR_eqtlGEN_RP_bc.csv')

1.2.1 Top Wald-ratio results by FDR

Walds_res=read.csv('/n/home04/cdadams/MR-RP-eqtlGen-bc-17/Walds_MR_eqtlGEN_RP_bc.csv')
dim(Walds_res)

## [1] 53579     8

sig_Walds_res=Walds_res[which(Walds_res$fdr<0.05),]
table(sig_Walds_res$id.exposure)

## 
##  MRPL10  MRPL14  MRPL17  MRPL18   MRPL2  MRPL21  MRPL22  MRPL23  MRPL24  MRPL34 
##       8       1       8       2      72     104       7      39       6     108 
##  MRPL36  MRPL38  MRPL39  MRPL48  MRPL52  MRPL53  MRPL55  MRPS16 MRPS18B MRPS18C 
##       1      16       2      56       3       1       1     283       4     313 
##  MRPS21  MRPS22  MRPS30   MRPS6   MRPS7   MRPS9   RPL12   RPL13  RPL13A   RPL22 
##       1       4     280       4      71       5      27     152       2      15 
##   RPL28    RPL3   RPL36    RPL4    RPL9   RPS10   RPS19   RPS23    RPS8 
##       7       3      20       1      13       2      34     325     313

1.3 Meta-analysis for collective RPs

Will make another repo for this, eventually.

# Get the clumped meta-analytic set for the group of RPs
RP_eqtlGen_bc_17=read.csv('/n/home04/cdadams/MR-RP-eqtlGen-bc-17/RP_eqtlGen_bc_17.csv')
RP_eqtlGen_bc_17$id.exposure="collective RPs"
RP_eqtlGen_bc_17_clumped=clump_data(RP_eqtlGen_bc_17)
gene_freq4=table(RP_eqtlGen_bc_17_clumped$gene.exposure)
gene_freq4=as.data.frame(gene_freq4)
gene_freq4

##        Var1 Freq
## 1     MRPL1    1
## 2    MRPL10    2
## 3    MRPL13    1
## 4    MRPL14    1
## 5    MRPL15    1
## 6    MRPL17    2
## 7    MRPL18    4
## 8    MRPL19    2
## 9     MRPL2    2
## 10   MRPL20    3
## 11   MRPL21    4
## 12   MRPL22    1
## 13   MRPL23    1
## 14   MRPL24    1
## 15   MRPL27    2
## 16   MRPL28    1
## 17   MRPL32    1
## 18   MRPL33    1
## 19   MRPL34    4
## 20   MRPL35    3
## 21   MRPL36    2
## 22   MRPL37    1
## 23   MRPL38    4
## 24   MRPL39    1
## 25    MRPL4    1
## 26   MRPL40    1
## 27   MRPL41    1
## 28   MRPL43    4
## 29   MRPL44    1
## 30   MRPL45    1
## 31   MRPL47    1
## 32   MRPL48    2
## 33   MRPL49    1
## 34   MRPL50    2
## 35   MRPL52    4
## 36   MRPL53    2
## 37   MRPL54    2
## 38   MRPL55    1
## 39   MRPS10    1
## 40   MRPS11    1
## 41   MRPS12    1
## 42   MRPS14    1
## 43   MRPS15    2
## 44   MRPS16    1
## 45   MRPS17    2
## 46  MRPS18B    1
## 47  MRPS18C    1
## 48   MRPS21    2
## 49   MRPS22    1
## 50   MRPS24    1
## 51   MRPS25    1
## 52   MRPS26    1
## 53   MRPS30    1
## 54   MRPS31    1
## 55   MRPS33    3
## 56   MRPS34    1
## 57   MRPS36    1
## 58    MRPS6    3
## 59    MRPS7    2
## 60    MRPS9    2
## 61   RPL10A    1
## 62    RPL11    1
## 63    RPL12    4
## 64    RPL13    4
## 65   RPL13A    2
## 66    RPL14    4
## 67    RPL15    1
## 68    RPL17    4
## 69    RPL21    1
## 70    RPL22    1
## 71   RPL23A    1
## 72   RPL27A    1
## 73    RPL28    2
## 74    RPL29    3
## 75     RPL3    1
## 76    RPL30    1
## 77    RPL31    2
## 78    RPL32    1
## 79   RPL35A    1
## 80    RPL36    2
## 81    RPL37    1
## 82   RPL37A    1
## 83    RPL3L    1
## 84     RPL4    1
## 85     RPL8    6
## 86     RPL9    4
## 87    RPS12    1
## 88    RPS13    1
## 89    RPS14    2
## 90   RPS15A    2
## 91    RPS16    1
## 92    RPS18    1
## 93    RPS19    1
## 94    RPS20    2
## 95    RPS21    1
## 96    RPS23    6
## 97    RPS24    2
## 98    RPS25    1
## 99   RPS27A    1
## 100   RPS29    2
## 101   RPS3A    2
## 102    RPS5    1
## 103    RPS6    1
## 104    RPS7    1
## 105    RPS8    2
## 106    RPS9    2

dat=RP_eqtlGen_bc_17_clumped

# Run MR
mr_results<- mr(dat)
mr_results

##      id.exposure id.outcome
## 1 collective RPs ieu-a-1126
## 2 collective RPs ieu-a-1126
## 3 collective RPs ieu-a-1126
## 4 collective RPs ieu-a-1126
## 5 collective RPs ieu-a-1126
##                                                                          outcome
## 1 Breast cancer (Combined Oncoarray; iCOGS; GWAS meta analysis) || id:ieu-a-1126
## 2 Breast cancer (Combined Oncoarray; iCOGS; GWAS meta analysis) || id:ieu-a-1126
## 3 Breast cancer (Combined Oncoarray; iCOGS; GWAS meta analysis) || id:ieu-a-1126
## 4 Breast cancer (Combined Oncoarray; iCOGS; GWAS meta analysis) || id:ieu-a-1126
## 5 Breast cancer (Combined Oncoarray; iCOGS; GWAS meta analysis) || id:ieu-a-1126
##   exposure                    method nsnp            b          se      pval
## 1 exposure                  MR Egger  183 -0.009029368 0.009880328 0.3619992
## 2 exposure           Weighted median  183  0.004184527 0.005924136 0.4799691
## 3 exposure Inverse variance weighted  183  0.003136745 0.005412059 0.5621949
## 4 exposure               Simple mode  183  0.011889978 0.012869143 0.3567546
## 5 exposure             Weighted mode  183  0.004147419 0.006677620 0.5353159

# Detect and remove outliers (outliers can indicate pleiotropy)
raddat <- format_radial(BXG=dat$beta.exposure, BYG=dat$beta.outcome, 
  seBXG=dat$se.exposure, seBYG=dat$se.outcome, RSID=dat$SNP)
ivwrad <- ivw_radial(raddat, alpha=0.05, weights=1)

## 
## Radial IVW
## 
##                 Estimate   Std.Error   t value  Pr(>|t|)
## Effect (1st) 0.003428381 0.005397083 0.6352286 0.5252794
## Iterative    0.003428391 0.005397082 0.6352304 0.5252781
## Exact (FE)   0.003457563 0.003110902 1.1114343 0.2663814
## Exact (RE)   0.003438110 0.005179820 0.6637509 0.5076761
## 
## 
## Residual standard error: 1.735 on 185 degrees of freedom
## 
## F-statistic: 0.4 on 1 and 185 DF, p-value: 0.526
## Q-Statistic for heterogeneity: 556.8334 on 185 DF , p-value: 6.830118e-39
## 
##  Outliers detected 
## Number of iterations = 2

ivwrad2 <- ivw_radial(raddat, alpha=0.05, weights=1)

## 
## Radial IVW
## 
##                 Estimate   Std.Error   t value  Pr(>|t|)
## Effect (1st) 0.003428381 0.005397083 0.6352286 0.5252794
## Iterative    0.003428391 0.005397082 0.6352304 0.5252781
## Exact (FE)   0.003457563 0.003110902 1.1114343 0.2663814
## Exact (RE)   0.003438110 0.005223557 0.6581933 0.5112321
## 
## 
## Residual standard error: 1.735 on 185 degrees of freedom
## 
## F-statistic: 0.4 on 1 and 185 DF, p-value: 0.526
## Q-Statistic for heterogeneity: 556.8334 on 185 DF , p-value: 6.830118e-39
## 
##  Outliers detected 
## Number of iterations = 2

dim(ivwrad$outliers)[1]

## [1] 33

outliers=ivwrad$outliers[1]
outliers

##            SNP
## 1   rs10839596
## 2  rs113368915
## 3  rs114645635
## 4   rs11543947
## 5  rs116538259
## 6   rs12911848
## 7      rs15921
## 8   rs17305746
## 9  rs181131314
## 10   rs3736502
## 11   rs3789524
## 12   rs4130590
## 13   rs4713862
## 14   rs4785606
## 15   rs4979655
## 16  rs55943208
## 17  rs56089672
## 18  rs59922539
## 19  rs62126253
## 20      rs6509
## 21   rs6747017
## 22   rs6788205
## 23   rs6875287
## 24   rs7100800
## 25   rs7252877
## 26  rs75086869
## 27    rs758335
## 28   rs7666601
## 29  rs78505905
## 30    rs801155
## 31   rs9462853
## 32   rs9472022
## 33   rs9493450

myvars=outliers$SNP
dat2 <- dat[! dat$SNP %in% myvars, ]
dim(dat2)

## [1] 153  42

dim(dat)

## [1] 186  42

dat2$outcome="breast cancer (BCAC 2017)"
dat2$id.outcome="breast cancer (BCAC 2017)"

# Run MR without outliers
mr_results <- mr(dat2)
mr_results

##      id.exposure                id.outcome                   outcome exposure
## 1 collective RPs breast cancer (BCAC 2017) breast cancer (BCAC 2017) exposure
## 2 collective RPs breast cancer (BCAC 2017) breast cancer (BCAC 2017) exposure
## 3 collective RPs breast cancer (BCAC 2017) breast cancer (BCAC 2017) exposure
## 4 collective RPs breast cancer (BCAC 2017) breast cancer (BCAC 2017) exposure
## 5 collective RPs breast cancer (BCAC 2017) breast cancer (BCAC 2017) exposure
##                      method nsnp           b          se      pval
## 1                  MR Egger  151 0.001833688 0.006432617 0.7759933
## 2           Weighted median  151 0.006909040 0.006155683 0.2616992
## 3 Inverse variance weighted  151 0.004692680 0.003561069 0.1875797
## 4               Simple mode  151 0.017862858 0.011606378 0.1258976
## 5             Weighted mode  151 0.009603957 0.006543677 0.1442876

singlesnp_results=mr_singlesnp(dat2, parameters = default_parameters(),
  single_method = "mr_wald_ratio", 
  all_method = c("mr_ivw",
  "mr_egger_regression", 
  "mr_weighted_mode",
  "mr_weighted_median"))
dim(singlesnp_results)

## [1] 155   9

singlesnp_results=singlesnp_results[order(singlesnp_results$SNP),]
dim(singlesnp_results)

## [1] 155   9

# Remove the snps that weren't used in the MR to have a record of the data source
singlesnp_results.2=singlesnp_results[5:155,]
dim(singlesnp_results.2)

## [1] 151   9

dat2.1=dat2[which(dat2$SNP %in% singlesnp_results.2$SNP),]
dim(dat2.1)

## [1] 151  42

dim(dat)

## [1] 186  42

dat=dat2.1
dim(dat)

## [1] 151  42

# Descriptives 
freq5=table(dat$gene.exposure)
freq5=as.data.frame(freq5)
freq5

##       Var1 Freq
## 1    MRPL1    1
## 2   MRPL10    2
## 3   MRPL13    1
## 4   MRPL14    1
## 5   MRPL15    1
## 6   MRPL17    1
## 7   MRPL18    4
## 8   MRPL19    1
## 9   MRPL20    3
## 10  MRPL21    4
## 11  MRPL22    1
## 12  MRPL23    1
## 13  MRPL24    1
## 14  MRPL27    1
## 15  MRPL28    1
## 16  MRPL32    1
## 17  MRPL33    1
## 18  MRPL34    2
## 19  MRPL35    3
## 20  MRPL36    2
## 21  MRPL38    4
## 22  MRPL39    1
## 23   MRPL4    1
## 24  MRPL40    1
## 25  MRPL43    4
## 26  MRPL44    1
## 27  MRPL45    1
## 28  MRPL48    2
## 29  MRPL49    1
## 30  MRPL50    2
## 31  MRPL52    3
## 32  MRPL53    2
## 33  MRPL54    2
## 34  MRPL55    1
## 35  MRPS10    1
## 36  MRPS11    1
## 37  MRPS12    1
## 38  MRPS14    1
## 39  MRPS15    2
## 40  MRPS17    2
## 41 MRPS18B    1
## 42  MRPS21    2
## 43  MRPS22    1
## 44  MRPS24    1
## 45  MRPS25    1
## 46  MRPS26    1
## 47  MRPS31    1
## 48  MRPS33    2
## 49  MRPS34    1
## 50  MRPS36    1
## 51   MRPS6    3
## 52   MRPS7    2
## 53   MRPS9    2
## 54   RPL11    1
## 55   RPL12    3
## 56   RPL13    3
## 57  RPL13A    2
## 58   RPL14    3
## 59   RPL15    1
## 60   RPL17    4
## 61   RPL21    1
## 62  RPL27A    1
## 63   RPL28    2
## 64   RPL29    2
## 65   RPL30    1
## 66   RPL31    2
## 67   RPL32    1
## 68   RPL36    1
## 69   RPL37    1
## 70  RPL37A    1
## 71    RPL8    6
## 72    RPL9    4
## 73   RPS13    1
## 74   RPS14    2
## 75  RPS15A    2
## 76   RPS16    1
## 77   RPS20    2
## 78   RPS21    1
## 79   RPS23    4
## 80   RPS24    2
## 81   RPS25    1
## 82  RPS27A    1
## 83   RPS29    1
## 84   RPS3A    1
## 85    RPS5    1
## 86    RPS6    1
## 87    RPS8    2
## 88    RPS9    2

# r2 and Fstat

n=34684 
k=151
p=dat$eaf.exposure
r2.1<-(2*(dat$beta.exposure^2)*p*(1-p))/1
r2=sum(r2.1, na.rm = TRUE)
r2

## [1] 1.777762

Fstat <- r2*(n-1-k)/((1-r2)*k)
Fstat

## [1] -522.7232

# Combine results
sin=singlesnp_results
res=mr_results
het<-mr_heterogeneity(dat)
plt<-mr_pleiotropy_test(dat)

# Bug in function for combining; save separately
# all_res<-combine_all_mrresults(res,het,plt,sin,ao_slc=F,Exp=T,split.exposure=T,split.outcome=T)
# all_res$lower_intercept=all_res$intercept-1.96*all_res$intercept_se
# all_res$upper_intercept=all_res$intercept+1.96*all_res$intercept_se
# 
# head(all_res[,c("Method","outcome","exposure","nsnp","b","se","or", "or_lci95",  
#   "or_uci95","pval","intercept","intercept_se","intercept_pval",
#   "Q","Q_df","Q_pval")])
#write.csv(all_res,"combined_results_all_RP_MRP_blood_v8.csv")

write.csv(mr_results,'/n/home04/cdadams/MR-RP-eqtlGen-bc-17/res_META_MR-RP-eqtlGen-bc-17.csv')
write.csv(singlesnp_results,'/n/home04/cdadams/MR-RP-eqtlGen-bc-17/singlesnp_META_MR-RP-eqtlGen-bc-17.csv')
write.csv(dat, '/n/home04/cdadams/MR-RP-eqtlGen-bc-17/SIMEX_META_MR-RP-eqtlGen-bc-17.csv')
write.csv(plt,'/n/home04/cdadams/MR-RP-eqtlGen-bc-17/plt_META_MR-RP-eqtlGen-bc-17.csv')
write.csv(het,'/n/home04/cdadams/MR-RP-eqtlGen-bc-17/het_META_MR-RP-eqtlGen-bc-17.csv')

1.4 Data for collective RP

1.5 Figures

The top plot displays the individual SNPs included in the analysis, their effect estimates and 95% confidence intervals, along with the meta-analyzed “causal” estimates (the term “causal” is used in MR to mean MR is less prone to confounding and reverse causation than observational designs).
The inverse-variance weighted (IVW) estimate is the approach that is used to test the causal effect. The weighted median, weighted mode, and MR-Egger are sensitivity estimators. If the directions and magnitudes of the sensitivity estimators’ findings comport with those of the IVW, this is evidence against pleiotropy. (The sensitivity estimators needn’t all be statistically signficiant. For instance, the MR-Egger estimate isn’t, but this only means that the MR-Egger test doesn’t provide additional support for the IVW. The direction and magnitude of the MR-Egger estimate does, however, comport with the IVW’s.)
The bottom (scatter) plot is a sensitivity analysis for heterogeneity in the effect estimates.

1.6 IVW & sensitivity estimator results

##                      method nsnp    or    se or_lci95 or_uci95  pval
## 1                  MR Egger  151 1.002 0.006    0.989    1.015 0.776
## 2           Weighted median  151 1.007 0.006    0.995    1.019 0.262
## 3 Inverse variance weighted  151 1.005 0.004    0.998    1.012 0.188
## 4               Simple mode  151 1.018 0.012    0.995    1.041 0.126
## 5             Weighted mode  151 1.010 0.007    0.997    1.023 0.144

1.7 Wald ratio results

##       Var1 Freq
## 1    MRPL1    1
## 2   MRPL10    2
## 3   MRPL13    1
## 4   MRPL14    1
## 5   MRPL15    1
## 6   MRPL17    1
## 7   MRPL18    4
## 8   MRPL19    1
## 9   MRPL20    3
## 10  MRPL21    4
## 11  MRPL22    1
## 12  MRPL23    1
## 13  MRPL24    1
## 14  MRPL27    1
## 15  MRPL28    1
## 16  MRPL32    1
## 17  MRPL33    1
## 18  MRPL34    2
## 19  MRPL35    3
## 20  MRPL36    2
## 21  MRPL38    4
## 22  MRPL39    1
## 23   MRPL4    1
## 24  MRPL40    1
## 25  MRPL43    4
## 26  MRPL44    1
## 27  MRPL45    1
## 28  MRPL48    2
## 29  MRPL49    1
## 30  MRPL50    2
## 31  MRPL52    3
## 32  MRPL53    2
## 33  MRPL54    2
## 34  MRPL55    1
## 35  MRPS10    1
## 36  MRPS11    1
## 37  MRPS12    1
## 38  MRPS14    1
## 39  MRPS15    2
## 40  MRPS17    2
## 41 MRPS18B    1
## 42  MRPS21    2
## 43  MRPS22    1
## 44  MRPS24    1
## 45  MRPS25    1
## 46  MRPS26    1
## 47  MRPS31    1
## 48  MRPS33    2
## 49  MRPS34    1
## 50  MRPS36    1
## 51   MRPS6    3
## 52   MRPS7    2
## 53   MRPS9    2
## 54   RPL11    1
## 55   RPL12    3
## 56   RPL13    3
## 57  RPL13A    2
## 58   RPL14    3
## 59   RPL15    1
## 60   RPL17    4
## 61   RPL21    1
## 62  RPL27A    1
## 63   RPL28    2
## 64   RPL29    2
## 65   RPL30    1
## 66   RPL31    2
## 67   RPL32    1
## 68   RPL36    1
## 69   RPL37    1
## 70  RPL37A    1
## 71    RPL8    6
## 72    RPL9    4
## 73   RPS13    1
## 74   RPS14    2
## 75  RPS15A    2
## 76   RPS16    1
## 77   RPS20    2
## 78   RPS21    1
## 79   RPS23    4
## 80   RPS24    2
## 81   RPS25    1
## 82  RPS27A    1
## 83   RPS29    1
## 84   RPS3A    1
## 85    RPS5    1
## 86    RPS6    1
## 87    RPS8    2
## 88    RPS9    2

1.8 Rpubs

https://rpubs.com/Charleen_D_Adams/754979