Meta-analytic and Wald ratio Mendelian randomization of expression of cytosolic and mitochondrial ribosomal proteins on ER+ breast cancer in BCAC (2017)

MR-RP-eqtlGen-bcp-17

Charleen D. Adams1 and Bernardo Lemos1,2

1Department of Environmental Health, Program in Molecular and Integrative Physiological Sciences, Harvard T.H. Chan School of Public Health, Boston, Massachusetts 02115, USA
2Broad Institute of Harvard and MIT, Cambridge, Massachusetts 02142, USA

2021-04-15

Abstract

Under conditions of nucleolar stress, various ribosomal proteins (RPs) interact with the MDM2 proto-oncogene (MDM2) to stabilize p53. Dysregulation of the RPs may disturb this pathway, but whether changes in the expression of the RPs influence risk for breast cancer in the general population is unknown. We tested whether expression changes in individual cytosolic and mitochondrial RPs influence risk for ER+ breast cancer using inverse-variance weighted (IVW) and Wald ratio Mendelian randomization (MR). Summary statistics for cis expression quantitative trait loci (eQTLs) for SNPs controlling expression of RPs (P<5x10-06) were extracted from the eqtlGen Consortium’s genome-wide association study (GWAS) of expression in blood (n=125 RPs). Summary statistics for 119 of the 125 RPs were extracted from the Breast Cancer Association Consortium’s (BCAC) 2017 genome-wide association study (GWAS) of breast cancer (69501 ER+ cases and 105974 controls). After linkage disequilibirum (LD)-clumping each set of eQTLs for a given RP, there were 31 RPs available for meta-analysis.

0.1 Format eqtlGen GWAS
0.2 Harmonize
1 Split, loop, MR, & FDR-correct results

0.1 Format eqtlGen GWAS

eqtlGen stored MAFs in a separate file.
MAFs are needed for the calculation of the betas & SEs from the Z-scores.
Methods for this are provided by: Zhu 2016: https://www.nature.com/articles/ng.3538

\[ \beta = \frac{z} {\sqrt2p(1-p)(n + z^2)} \] \[ SE = \frac{1} {\sqrt2p(1-p)(n + z^2)} \] where \[p = MAF\]

# README files for the eQTLGen GWAS
# This README accompanies the files with cis-eQTL results from eQTLGen
 
# Files
# -----
# File with full cis-eQTL results: 2019-12-11-cis-eQTLsFDR-ProbeLevel-CohortInfoRemoved-BonferroniAdded.txt.gz
# File with significant (FDR<0.05) cis-eQTL results: 2019-12-11-cis-eQTLsFDR0.05-ProbeLevel-CohortInfoRemoved-BonferroniAdded.txt.gz
 
# Column Names
# ------------
# Pvalue - P-value
# SNP - SNP rs ID
# SNPChr - SNP chromosome
# SNPPos - SNP position
# AssessedAllele - Assessed allele, the Z-score refers to this allele
# OtherAllele - Not assessed allele
# Zscore - Z-score
# Gene - ENSG name (Ensembl v71) of the eQTL gene
# GeneSymbol - HGNC name of the gene
# GeneChr - Gene chromosome
# GenePos - Centre of gene position
# NrCohorts - Total number of cohorts where this SNP-gene combination was tested
# NrSamples - Total number of samples where this SNP-gene combination was tested
# FDR - False discovery rate estimated based on permutations
# BonferroniP - P-value after Bonferroni correction
# 
# Additional information
# ----------------------
# These files contain all cis-eQTL results from eQTLGen, accompanying the article.
# 19,250 genes that showed expression in blood were tested.
# Every SNP-gene combination with a distance <1Mb from the center of the gene and  tested in at least 2 cohorts was included.
# Associations where SNP/proxy positioned in Illumina probe were not removed from combined analysis.
# UPDATE LOG
# ----------
# 2018-10-19: Initial data release
# 2018-12-19: In the current README, following file names have been fixed and updated:
# 2019-12-20: Cis-eQTLs are now updated to have a 2-cohort filter: every cis-eQTL must be tested in at least 2 cohorts to be reported
# On cluster, I gzipped the eqtlGen file: gzip 2019-12-11-cis-eQTLsFDR-ProbeLevel-CohortInfoRemoved-BonferroniAdded.txt
# this is because I kept getting error messages both here & on the cluster saying ther was no room on the disk.
# I moved unnecessary files to a folder in my personal computer DB: /Users/charleenadams/Dropbox/Harvard/bc/survival/odyssey_transfer/
# This has the gz files for the protein GWAS, the eQTLGen SMR file, and the eQTLGen EAF file.
# I launched an interacted terminal session with increased memory: 
# salloc -p test  --mem 5000 -t 0-06:00
#-----------------------------------------------------------------------------------------------------------------------
# README file for the MAF eQTLGen cis-eQTL data 
# A README accompanies the file with allele frequencies from eQTLGen cis-eQTL analysis: 
# 2018-07-18_SNP_AF_for_AlleleB_combined_allele_counts_and_MAF_pos_added.txt.gz
# Column Names
# ------------
# SNP -         SNP rs ID
# hg19_chr -    chr number
# hg19_pos -    SNP position (hg19)
# AlleleA -     Other allele
# AlleleB -     Assessed allele
# allA_total -  Total allele count of genotype AA
# allAB_total - Total allele count of genotype AB
# allB_total -  Total allele count of genotype BB
# AlleleB_all - Allele frequency of assessed allele
# Additional information
# ----------------------
# The allele frequencies were calculated using reported allele counts from all cohorts except 
# Framingham Heart Study, because the related samples are present in this cohort.
#-----------------------------------------------------------------------------------------------------------------------
# Remember that the eqtlGen file is stored in /n/home04/cdadams/MR-eqtlGen-long
# Don't move to MR-RP-eqtlGen-bcp-17 because it will result a storage over-capacity
setwd('/n/home04/cdadams/MR-eqtlGen-long/')
tbl=fread('cat 2019-12-11-cis-eQTLsFDR-ProbeLevel-CohortInfoRemoved-BonferroniAdded.txt.gz | gunzip')

rp_list_expanded=read.csv('/n/home04/cdadams/ld/ldsc/expanded_verified.csv')
rp_list_expanded

# Extract RPs
RPs=tbl[which(tbl$GeneSymbol %in% rp_list_expanded$all_RP),]
freq=table(RPs$GeneSymbol)
rp_freq=as.data.frame(freq)

# Read-in MAFs
eqtl_MAF=read.table(file = "2018-07-18_SNP_AF_for_AlleleB_combined_allele_counts_and_MAF_pos_added.txt", 
                    sep = "\t", header=TRUE)
RPs_total <- merge(RPs,eqtl_MAF,by="SNP")

# Calculate betas and SEs
n=30596
RPs_total$Beta=RPs_total$Zscore/sqrt(2*(RPs_total$AlleleB_all)*
                  (1-RPs_total$AlleleB_all)*(n+RPs_total$Zscore^2))
RPs_total$SE=1/sqrt(2*(RPs_total$AlleleB_all)*
                  (1-RPs_total$AlleleB_all)*(n+RPs_total$Zscore^2))
write.table(RPs_total, file = "/n/home04/cdadams/MR-RP-eqtlGen-bc/RPs_total_maf_beta_se.txt", sep = "\t",
            row.names = FALSE, col.names = TRUE, quote=FALSE)
RPs_total=read.delim("/n/home04/cdadams/MR-RP-eqtlGen/RPs_total_maf_beta_se.txt", check.names = FALSE)

# gzRPs <- gzfile("/n/home04/cdadams/MR-RP-eqtlGen-bcp-17/RPs.gz", "w")
# write.csv(RPs, gzRPs)
# close(gzRPs)
# Read-in for MR formatting

freq=table(RPs_total$GeneSymbol)
rp_freq=as.data.frame(freq)
rp_freq

setwd("/n/home04/cdadams/MR-RP-eqtlGen-bcp-17")

# Read-in the data saved for the 2020 BCAC: MR-RP-eqtlGen-bc
exposure_RP_eqtlGen <- read_exposure_data(
  filename = "/n/home04/cdadams/MR-RP-eqtlGen-bc/RPs_total_maf_beta_se.txt",
  sep = '\t',
  snp_col = 'SNP',
  beta_col = 'Beta',
  se_col = 'SE',
  effect_allele_col = 'AssessedAllele',
  #phenotype_col = 'tissue',
  #units_col = 'tissue',
  other_allele_col = 'OtherAllele',
  eaf_col = 'AlleleB_all',
  #samplesize_col = 'samplesize',
  #ncase_col = 'ncase',
  #ncontrol_col = 'ncontrol',
  gene_col = 'GeneSymbol',
  pval_col = 'Pvalue', 
  id_col = 'RP eqtlGEN'
)

freq=table(exposure_RP_eqtlGen$gene.exposure)
rp_freq=as.data.frame(freq)
rp_freq

exposure_RP_eqtlGen_p=exposure_RP_eqtlGen[which(exposure_RP_eqtlGen$pval.exposure<0.000005),]

freq2=table(exposure_RP_eqtlGen_p$gene.exposure)
rp_freq2=as.data.frame(freq2)
rp_freq2

write.csv(exposure_RP_eqtlGen_p, '/n/home04/cdadams/MR-RP-eqtlGen-bcp-17/exposure_RP_eqtlGen_p.csv')

0.2 Harmonize

# Read-in the BCAC 2017 data
setwd('/n/home04/cdadams/MR-RP-eqtlGen-bcp-17')
exposure_RP_eqtlGen_p=read.csv('/n/home04/cdadams/MR-RP-eqtlGen-bcp-17/exposure_RP_eqtlGen_p.csv')
outcome_dat <- extract_outcome_data(exposure_RP_eqtlGen_p$SNP, c('1127'), #1128 ER-; 1126 overall
  proxies = 1, rsq = 0.8, align_alleles = 1, palindromes = 1, maf_threshold = 0.3)
dat <- harmonise_data(exposure_RP_eqtlGen_p, outcome_dat, action = 2)

# Descriptives
freq3=table(dat$gene.exposure)
rp_freq3=as.data.frame(freq3)
rp_freq3

write.csv(dat, '/n/home04/cdadams/MR-RP-eqtlGen-bcp-17/RP_eqtlGen_bcp_17.csv')

1 Split, loop, MR, & FDR-correct results

setwd('/n/home04/cdadams/MR-RP-eqtlGen-bcp-17')
RP_eqtlGen_bcp_17=read.csv('/n/home04/cdadams/MR-RP-eqtlGen-bcp-17/RP_eqtlGen_bcp_17.csv')
head(RP_eqtlGen_bcp_17)
RP_eqtlGen_bcp_17$id.exposure=RP_eqtlGen_bcp_17$gene.exposure

# Command to remove everything except "v"
#rm(list=(ls()[ls()!="v"]))
# Split the harmonized data into separate data frames for each genes

RP_split <- split(RP_eqtlGen_bcp_17, RP_eqtlGen_bcp_17$id.exposure)
list2env(RP_split, envir= .GlobalEnv) #split the list into separate datasets

# Remove the original RP_eqtlGen_bcp_17 and RP_split
rm(list= ls()[(ls() %in% c("RP_eqtlGen_bcp_17","RP_split"))])

# Save each harmonized gene as a .csv
# for (i in 1:length(files)){
#   write.csv(files[[i]], paste(names(files[i]), ".csv", sep = ""))
# }
# Get them back together: not used here but useful
# dfs = sapply(.GlobalEnv, is.data.frame) 
# dfs
# 
# df=do.call(rbind, mget(names(dfs)[dfs]))
# head(df)
# dim(df)

# Combine the separated files into a list
files <- mget(ls())
length(files)

# Use a loop to clump each RP dataframe in the list, run MR, & save results
for (i in 1:length(files)){
   files[[i]]=clump_data(files[[i]])
   write.csv(files[[i]], paste(names(files[i]), "_clump.csv", sep = ""))
   files[[i]]=mr(files[[i]])
   write.csv(files[[i]], paste(names(files[i]), "_mr.csv", sep = ""))
}

# Idea for reading all mr results in and saving:
# Idea from: https://statisticsglobe.com/merge-csv-files-in-r
# Sadly, didn't work. 
# data_all <- list.files(path = "/n/home04/cdadams/MR-RP-eqtlGen-bcp-17",  
#   pattern = "*_mr.csv", full.names = TRUE) %>% 
#   lapply(read_csv) %>%                                                    
#   bind_rows                                                                
#   write.csv(data_all, "complete_RP_MR_res.csv")

# Merged all of the clumped data 
tbl_fread_clump <- 
    list.files(pattern = "*_clump.csv") %>% 
    map_df(~fread(.))
tbl_fread_clump_df=as.data.frame(tbl_fread_clump)
head(tbl_fread_clump_df)
colnames(tbl_fread_clump_df)

df <- tbl_fread_clump_df[c(3:43)] #check this
freq_df=table(df$gene.exposure)
freq_df=as.data.frame(freq_df)
freq_df
freq_set=freq_df[which(freq_df$Freq>2),]
dim(freq_set)

write.csv(df, '/n/home04/cdadams/MR-RP-eqtlGen-bcp-17/df_17.csv')

# Merge all the MR results and save 
tbl_fread <- 
    list.files(pattern = "*_mr.csv") %>% 
    map_df(~fread(.))
tbl_fread_df=as.data.frame(tbl_fread)
myvars=c('id.exposure','id.outcome','method', 'nsnp','b','se','pval')
res=tbl_fread_df[myvars]  
res$id.outcome="ER+ bc 2017"
res=res[order(res$p),]

# FDR correction
p=res$p
fdr=p.adjust(p, method = "fdr", n = length(p))
res$fdr=fdr
res=res[order(res$fdr),]
write.csv(res,'/n/home04/cdadams/MR-RP-eqtlGen-bcp-17/MR-eqtlGEN-RP-bcp-17.csv')

res=res[order(res$p),]

# Select just the set with >2 SNPs
res=res[order(res$fdr),]
res_meta_set=res[which(res$nsnp>2),]
res_meta_set=res_meta_set[which(res_meta_set$method=='Inverse variance weighted'),]
rownames(res_meta_set)

# FDR for res_meta_set (only IVWs)
p2=res_meta_set$p
fdr2=p.adjust(p2, method = "fdr", n = length(p2))
res_meta_set$fdr2=fdr2
res_meta_set$fdr=NULL
res_meta_set=res_meta_set[order(res_meta_set$fdr2),]
write.csv(res_meta_set,'/n/home04/cdadams/MR-RP-eqtlGen-bcp-17/MR-eqtlGEN-RP-bcp-17-meta.csv')

# FDR for res_meta_set: only RP (not MRP) (only IVWs)
res_meta_set=res_meta_set[order(res_meta_set$id.exposure),]

p2=res_meta_set$p
fdr2=p.adjust(p2, method = "fdr", n = length(p2))
res_meta_set$fdr2=fdr2
res_meta_set$fdr=NULL
res_meta_set=res_meta_set[order(res_meta_set$fdr2),]
write.csv(res_meta_set,'/n/home04/cdadams/MR-RP-eqtlGen-bcp-17/MR-eqtlGEN-RP-bcp-17-meta.csv')

# Consideration for later: Find way to merge the df and results

1.1 Results

1.1.1 Top results ordered by p-value (not FDR)

This includes a list of all tests run

includes 1 SNP tests (Wald ratios)
inverse variance weighted (IVW) – main meta-analytic test
sensitivity estimators: weighted median, weighted mode, and MR-Egger

res=read.csv('/n/home04/cdadams/MR-RP-eqtlGen-bcp-17/MR-eqtlGEN-RP-bcp-17.csv')
head(res, n=5)

##     X id.exposure  id.outcome                    method nsnp           b
## 1 111      MRPS30 ER+ bc 2017                Wald ratio    1  1.36000504
## 2  99     MRPS18C ER+ bc 2017                Wald ratio    1  0.25825473
## 3  15       MRPL2 ER+ bc 2017 Inverse variance weighted    2 -0.12395908
## 4 221       RPS23 ER+ bc 2017 Inverse variance weighted    6  0.07019376
## 5 190      RPL35A ER+ bc 2017                Wald ratio    1 -0.69235417
##           se         pval          fdr
## 1 0.10590203 9.524857e-38 2.314540e-35
## 2 0.05165095 5.733031e-07 6.965633e-05
## 3 0.03209674 1.124417e-04 9.107777e-03
## 4 0.01910243 2.382230e-04 1.447205e-02
## 5 0.19240477 3.201499e-04 1.555929e-02

1.1.2 Top IVW results by FDR

res_meta_set=read.csv('/n/home04/cdadams/MR-RP-eqtlGen-bcp-17/MR-eqtlGEN-RP-bcp-17-meta.csv')
dim(res_meta_set)

## [1] 31  9

res_meta_set

##      X id.exposure  id.outcome                    method nsnp            b
## 1  221       RPS23 ER+ bc 2017 Inverse variance weighted    6  0.070193760
## 2   68      MRPL48 ER+ bc 2017 Inverse variance weighted    5  0.082108561
## 3  148       RPL13 ER+ bc 2017 Inverse variance weighted    4 -0.056833406
## 4   51      MRPL38 ER+ bc 2017 Inverse variance weighted    5 -0.099246046
## 5  236        RPS8 ER+ bc 2017 Inverse variance weighted    3  0.057003623
## 6   46      MRPL37 ER+ bc 2017 Inverse variance weighted    3 -0.069683400
## 7  115      MRPS33 ER+ bc 2017 Inverse variance weighted    3 -0.062020143
## 8  173      RPL27A ER+ bc 2017 Inverse variance weighted    3 -0.086683217
## 9   11      MRPL18 ER+ bc 2017 Inverse variance weighted    4  0.045635808
## 10  75      MRPL52 ER+ bc 2017 Inverse variance weighted    4 -0.039648950
## 11 127       MRPS7 ER+ bc 2017 Inverse variance weighted    4  0.034623396
## 12 132       MRPS9 ER+ bc 2017 Inverse variance weighted    4  0.046895719
## 13 143       RPL12 ER+ bc 2017 Inverse variance weighted    4 -0.029068397
## 14 158       RPL14 ER+ bc 2017 Inverse variance weighted    4 -0.052275808
## 15  23      MRPL21 ER+ bc 2017 Inverse variance weighted    7 -0.019464547
## 16  85      MRPL54 ER+ bc 2017 Inverse variance weighted    4 -0.021513340
## 17  60      MRPL43 ER+ bc 2017 Inverse variance weighted    4 -0.023868518
## 18 183       RPL29 ER+ bc 2017 Inverse variance weighted    3 -0.087949683
## 19  35      MRPL34 ER+ bc 2017 Inverse variance weighted    4  0.012889398
## 20 153      RPL13A ER+ bc 2017 Inverse variance weighted    4  0.022827257
## 21 164       RPL17 ER+ bc 2017 Inverse variance weighted    4 -0.013597557
## 22 203        RPL9 ER+ bc 2017 Inverse variance weighted    4  0.023453545
## 23 137      RPL10A ER+ bc 2017 Inverse variance weighted    4  0.014115894
## 24 122       MRPS6 ER+ bc 2017 Inverse variance weighted    4 -0.009730245
## 25  18      MRPL20 ER+ bc 2017 Inverse variance weighted    3 -0.013188812
## 26  40      MRPL35 ER+ bc 2017 Inverse variance weighted    3 -0.005761123
## 27 178       RPL28 ER+ bc 2017 Inverse variance weighted    5 -0.009670186
## 28 103      MRPS21 ER+ bc 2017 Inverse variance weighted    3  0.004584002
## 29  80      MRPL53 ER+ bc 2017 Inverse variance weighted    3  0.001734184
## 30 198        RPL8 ER+ bc 2017 Inverse variance weighted    6 -0.001497263
## 31 241        RPS9 ER+ bc 2017 Inverse variance weighted    3 -0.003076627
##            se        pval        fdr2
## 1  0.01910243 0.000238223 0.007384913
## 2  0.02852680 0.003998303 0.061973700
## 3  0.02290781 0.013102814 0.135395749
## 4  0.04802896 0.038792689 0.300643339
## 5  0.02898985 0.049260469 0.305414906
## 6  0.03847879 0.070147675 0.310653989
## 7  0.03352201 0.064294547 0.310653989
## 8  0.05069215 0.087267202 0.338160408
## 9  0.02907782 0.116546123 0.362610012
## 10 0.02580286 0.124388738 0.362610012
## 11 0.02278792 0.128668069 0.362610012
## 12 0.04053644 0.247322209 0.567862197
## 13 0.02561515 0.256453896 0.567862197
## 14 0.04288714 0.222876152 0.567862197
## 15 0.01976719 0.324776587 0.642901859
## 16 0.02216833 0.331820314 0.642901859
## 17 0.02894245 0.409548133 0.714777925
## 18 0.10790448 0.415032343 0.714777925
## 19 0.02246856 0.566195323 0.797820682
## 20 0.03615467 0.527793511 0.797820682
## 21 0.02250647 0.545735947 0.797820682
## 22 0.03619654 0.517017363 0.797820682
## 23 0.02932128 0.630216933 0.849422823
## 24 0.02612009 0.709506080 0.916445353
## 25 0.04999126 0.791916809 0.944290119
## 26 0.02567348 0.822446233 0.944290119
## 27 0.03917218 0.805013773 0.944290119
## 28 0.02596783 0.859880579 0.952010641
## 29 0.03032745 0.954400227 0.970182469
## 30 0.04005580 0.970182469 0.970182469
## 31 0.03499394 0.929941161 0.970182469

res_meta_set=res_meta_set[order(res_meta_set$id.exposure), ]
res_meta_set$id.exposure

##  [1] "MRPL18" "MRPL20" "MRPL21" "MRPL34" "MRPL35" "MRPL37" "MRPL38" "MRPL43"
##  [9] "MRPL48" "MRPL52" "MRPL53" "MRPL54" "MRPS21" "MRPS33" "MRPS6"  "MRPS7" 
## [17] "MRPS9"  "RPL10A" "RPL12"  "RPL13"  "RPL13A" "RPL14"  "RPL17"  "RPL27A"
## [25] "RPL28"  "RPL29"  "RPL8"   "RPL9"   "RPS23"  "RPS8"   "RPS9"

# Not tested with IVW

1.2 Wald ratios (all unclumped SNPs)

setwd('/n/home04/cdadams/MR-RP-eqtlGen-bcp-17')
RP_eqtlGen_bcp_17=read.csv('/n/home04/cdadams/MR-RP-eqtlGen-bcp-17/RP_eqtlGen_bcp_17.csv')
RP_eqtlGen_bcp_17$id.exposure=RP_eqtlGen_bcp_17$gene.exposure

# Split the harmonized data into separate data frames for each genes
RP_split <- split(RP_eqtlGen_bcp_17, RP_eqtlGen_bcp_17$id.exposure)
list2env(RP_split, envir= .GlobalEnv) #split the list into separate datasets

# Remove the original RP_eqtlGen_bcp_17 and RP_split
rm(list= ls()[(ls() %in% c("RP_eqtlGen_bcp_17","RP_split") )])

# Combine the separated files into a list
files <- mget(ls())
length(files)

# Use a loop to run MR & save results
for (i in 1:length(files)){
   files[[i]]=mr_singlesnp(files[[i]], parameters = default_parameters(), single_method = "mr_wald_ratio")
   write.csv(files[[i]], paste(names(files[i]), "_Walds_mr.csv", sep = ""))
}

# Merge all the MR results and save 
Walds_res_tbl_fread <- 
    list.files(pattern = "*_Walds_mr.csv") %>% 
    map_df(~fread(.))
Walds_res_tbl_fread_df=as.data.frame(Walds_res_tbl_fread)
myvars=c('id.exposure','id.outcome', 'SNP','b','se','p')
Walds_res=Walds_res_tbl_fread_df[myvars]  
Walds_res$id.outcome="bc 2017"
Walds_res=Walds_res[order(Walds_res$p),]
write.csv(Walds_res,"Walds_includes_metas_res.csv")

# Tried  selecting "rs", but the "inverse variance weighted" includes "rs"
exp1 <- Walds_res[Walds_res$SNP %like% "rs", ]  
Walds_res2=Walds_res %>% 
  filter(str_detect(Walds_res$SNP, "^rs"))
# FDR correction
p=Walds_res2$p
fdr=p.adjust(p, method = "fdr", n = length(p))

Walds_res2$fdr=fdr
Walds_res2=Walds_res2[order(Walds_res2$fdr),]
write.csv(Walds_res2,'/n/home04/cdadams/MR-RP-eqtlGen-bcp-17/Walds_MR_eqtlGEN_RP_bcp.csv')

1.2.1 Top Wald-ratio results by FDR

Walds_res=read.csv('/n/home04/cdadams/MR-RP-eqtlGen-bcp-17/Walds_MR_eqtlGEN_RP_bcp.csv')
dim(Walds_res)

## [1] 242   8

sig_Walds_res=Walds_res[which(Walds_res$fdr<0.05),]
table(sig_Walds_res$id.exposure)

## 
##  MRPL21  MRPS16 MRPS18C  MRPS30  RPL35A   RPS23 
##       1       1       1       1       1       1

1.3 Meta-analysis for collective RPs

Will make another repo for this, eventually.

# Get the clumped meta-analytic set for the group of RPs
RP_eqtlGen_bcp_17=read.csv('/n/home04/cdadams/MR-RP-eqtlGen-bcp-17/RP_eqtlGen_bcp_17.csv')
RP_eqtlGen_bcp_17$id.exposure="collective RPs"
RP_eqtlGen_bcp_17_clumped=clump_data(RP_eqtlGen_bcp_17)
gene_freq4=table(RP_eqtlGen_bcp_17_clumped$gene.exposure)
gene_freq4=as.data.frame(gene_freq4)
gene_freq4

##        Var1 Freq
## 1     MRPL1    1
## 2    MRPL10    2
## 3    MRPL13    1
## 4    MRPL14    1
## 5    MRPL15    1
## 6    MRPL17    2
## 7    MRPL18    4
## 8    MRPL19    2
## 9     MRPL2    2
## 10   MRPL20    3
## 11   MRPL21    4
## 12   MRPL22    1
## 13   MRPL23    1
## 14   MRPL24    1
## 15   MRPL27    2
## 16   MRPL28    1
## 17   MRPL32    1
## 18   MRPL33    1
## 19   MRPL34    4
## 20   MRPL35    3
## 21   MRPL36    2
## 22   MRPL37    1
## 23   MRPL38    4
## 24   MRPL39    1
## 25    MRPL4    1
## 26   MRPL40    1
## 27   MRPL41    1
## 28   MRPL43    4
## 29   MRPL44    1
## 30   MRPL45    1
## 31   MRPL47    1
## 32   MRPL48    2
## 33   MRPL49    1
## 34   MRPL50    2
## 35   MRPL52    4
## 36   MRPL53    2
## 37   MRPL54    2
## 38   MRPL55    1
## 39   MRPS10    1
## 40   MRPS11    1
## 41   MRPS12    1
## 42   MRPS14    1
## 43   MRPS15    2
## 44   MRPS16    1
## 45   MRPS17    2
## 46  MRPS18B    1
## 47  MRPS18C    1
## 48   MRPS21    2
## 49   MRPS22    1
## 50   MRPS24    1
## 51   MRPS25    1
## 52   MRPS26    1
## 53   MRPS30    1
## 54   MRPS31    1
## 55   MRPS33    3
## 56   MRPS34    1
## 57   MRPS36    1
## 58    MRPS6    3
## 59    MRPS7    2
## 60    MRPS9    2
## 61   RPL10A    1
## 62    RPL11    1
## 63    RPL12    4
## 64    RPL13    4
## 65   RPL13A    2
## 66    RPL14    4
## 67    RPL15    1
## 68    RPL17    4
## 69    RPL21    1
## 70    RPL22    1
## 71   RPL23A    1
## 72   RPL27A    1
## 73    RPL28    2
## 74    RPL29    3
## 75     RPL3    1
## 76    RPL30    1
## 77    RPL31    2
## 78    RPL32    1
## 79   RPL35A    1
## 80    RPL36    2
## 81    RPL37    1
## 82   RPL37A    1
## 83    RPL3L    1
## 84     RPL4    1
## 85     RPL8    6
## 86     RPL9    4
## 87    RPS12    1
## 88    RPS13    1
## 89    RPS14    2
## 90   RPS15A    2
## 91    RPS16    1
## 92    RPS18    1
## 93    RPS19    1
## 94    RPS20    2
## 95    RPS21    1
## 96    RPS23    6
## 97    RPS24    2
## 98    RPS25    1
## 99   RPS27A    1
## 100   RPS29    2
## 101   RPS3A    2
## 102    RPS5    1
## 103    RPS6    1
## 104    RPS7    1
## 105    RPS8    2
## 106    RPS9    2

dat=RP_eqtlGen_bcp_17_clumped

# Run MR
mr_results<- mr(dat)
mr_results

##      id.exposure id.outcome
## 1 collective RPs ieu-a-1127
## 2 collective RPs ieu-a-1127
## 3 collective RPs ieu-a-1127
## 4 collective RPs ieu-a-1127
## 5 collective RPs ieu-a-1127
##                                                                              outcome
## 1 ER+ Breast cancer (Combined Oncoarray; iCOGS; GWAS meta analysis) || id:ieu-a-1127
## 2 ER+ Breast cancer (Combined Oncoarray; iCOGS; GWAS meta analysis) || id:ieu-a-1127
## 3 ER+ Breast cancer (Combined Oncoarray; iCOGS; GWAS meta analysis) || id:ieu-a-1127
## 4 ER+ Breast cancer (Combined Oncoarray; iCOGS; GWAS meta analysis) || id:ieu-a-1127
## 5 ER+ Breast cancer (Combined Oncoarray; iCOGS; GWAS meta analysis) || id:ieu-a-1127
##   exposure                    method nsnp            b          se      pval
## 1 exposure                  MR Egger  183 -0.008739882 0.011090503 0.4316977
## 2 exposure           Weighted median  183 -0.001341592 0.006683644 0.8409116
## 3 exposure Inverse variance weighted  183  0.002641145 0.006068528 0.6634027
## 4 exposure               Simple mode  183  0.012273735 0.013629989 0.3690470
## 5 exposure             Weighted mode  183 -0.001114050 0.007773544 0.8862015

# Detect and remove outliers (outliers can indicate pleiotropy)
raddat <- format_radial(BXG=dat$beta.exposure, BYG=dat$beta.outcome, 
  seBXG=dat$se.exposure, seBYG=dat$se.outcome, RSID=dat$SNP)
ivwrad <- ivw_radial(raddat, alpha=0.05, weights=1)

## 
## Radial IVW
## 
##                 Estimate   Std.Error   t value  Pr(>|t|)
## Effect (1st) 0.002950807 0.006045006 0.4881396 0.6254510
## Iterative    0.002950809 0.006045005 0.4881400 0.6254507
## Exact (FE)   0.002972601 0.003709906 0.8012605 0.4229809
## Exact (RE)   0.002959073 0.005600999 0.5283116 0.5979162
## 
## 
## Residual standard error: 1.629 on 185 degrees of freedom
## 
## F-statistic: 0.24 on 1 and 185 DF, p-value: 0.626
## Q-Statistic for heterogeneity: 491.1832 on 185 DF , p-value: 1.361352e-29
## 
##  Outliers detected 
## Number of iterations = 2

ivwrad2 <- ivw_radial(raddat, alpha=0.05, weights=1)

## 
## Radial IVW
## 
##                 Estimate   Std.Error   t value  Pr(>|t|)
## Effect (1st) 0.002950807 0.006045006 0.4881396 0.6254510
## Iterative    0.002950809 0.006045005 0.4881400 0.6254507
## Exact (FE)   0.002972601 0.003709906 0.8012605 0.4229809
## Exact (RE)   0.002959073 0.005571372 0.5311210 0.5959720
## 
## 
## Residual standard error: 1.629 on 185 degrees of freedom
## 
## F-statistic: 0.24 on 1 and 185 DF, p-value: 0.626
## Q-Statistic for heterogeneity: 491.1832 on 185 DF , p-value: 1.361352e-29
## 
##  Outliers detected 
## Number of iterations = 2

dim(ivwrad$outliers)[1]

## [1] 26

outliers=ivwrad$outliers[1]
outliers

##            SNP
## 1      rs10934
## 2  rs111614017
## 3   rs11211024
## 4  rs112595439
## 5  rs113368915
## 6  rs114645635
## 7  rs116037967
## 8  rs117831875
## 9      rs15921
## 10  rs17305746
## 11 rs181131314
## 12   rs3096309
## 13   rs4979655
## 14  rs55849964
## 15  rs59922539
## 16  rs61887067
## 17      rs6509
## 18   rs6875287
## 19   rs7100800
## 20   rs7252877
## 21   rs7666601
## 22   rs7709772
## 23  rs78505905
## 24   rs9462853
## 25   rs9472022
## 26   rs9493450

myvars=outliers$SNP
dat2 <- dat[! dat$SNP %in% myvars, ]
dim(dat2)

## [1] 160  42

dim(dat)

## [1] 186  42

dat2$outcome="breast cancer (BCAC ER+ 2017)"
dat2$id.outcome="breast cancer (BCAC ER+ 2017)"

# Run MR without outliers
mr_results <- mr(dat2)
mr_results

##      id.exposure                    id.outcome                       outcome
## 1 collective RPs breast cancer (BCAC ER+ 2017) breast cancer (BCAC ER+ 2017)
## 2 collective RPs breast cancer (BCAC ER+ 2017) breast cancer (BCAC ER+ 2017)
## 3 collective RPs breast cancer (BCAC ER+ 2017) breast cancer (BCAC ER+ 2017)
## 4 collective RPs breast cancer (BCAC ER+ 2017) breast cancer (BCAC ER+ 2017)
## 5 collective RPs breast cancer (BCAC ER+ 2017) breast cancer (BCAC ER+ 2017)
##   exposure                    method nsnp             b          se      pval
## 1 exposure                  MR Egger  158  0.0017883139 0.007264531 0.8058738
## 2 exposure           Weighted median  158  0.0027383838 0.006900867 0.6915021
## 3 exposure Inverse variance weighted  158  0.0040322372 0.004045085 0.3188501
## 4 exposure               Simple mode  158  0.0122904607 0.012119471 0.3120912
## 5 exposure             Weighted mode  158 -0.0005456549 0.008148852 0.9466980

singlesnp_results=mr_singlesnp(dat2, parameters = default_parameters(),
  single_method = "mr_wald_ratio", 
  all_method = c("mr_ivw",
  "mr_egger_regression", 
  "mr_weighted_mode",
  "mr_weighted_median"))
dim(singlesnp_results)

## [1] 162   9

singlesnp_results=singlesnp_results[order(singlesnp_results$SNP),]
dim(singlesnp_results)

## [1] 162   9

# Remove the snps that weren't used in the MR to have a record of the data source
singlesnp_results.2=singlesnp_results[5:162,]
dim(singlesnp_results.2)

## [1] 158   9

dat2.1=dat2[which(dat2$SNP %in% singlesnp_results.2$SNP),]
dim(dat2.1)

## [1] 158  42

dim(dat)

## [1] 186  42

dat=dat2.1
dim(dat)

## [1] 158  42

# Descriptives 
freq5=table(dat$gene.exposure)
freq5=as.data.frame(freq5)
freq5

##       Var1 Freq
## 1   MRPL10    2
## 2   MRPL13    1
## 3   MRPL14    1
## 4   MRPL15    1
## 5   MRPL17    2
## 6   MRPL18    4
## 7   MRPL19    2
## 8   MRPL20    3
## 9   MRPL21    3
## 10  MRPL22    1
## 11  MRPL23    1
## 12  MRPL24    1
## 13  MRPL27    1
## 14  MRPL28    1
## 15  MRPL32    1
## 16  MRPL33    1
## 17  MRPL34    4
## 18  MRPL35    3
## 19  MRPL36    2
## 20  MRPL38    4
## 21  MRPL39    1
## 22   MRPL4    1
## 23  MRPL40    1
## 24  MRPL43    4
## 25  MRPL44    1
## 26  MRPL45    1
## 27  MRPL47    1
## 28  MRPL48    2
## 29  MRPL49    1
## 30  MRPL50    2
## 31  MRPL52    4
## 32  MRPL53    2
## 33  MRPL54    2
## 34  MRPL55    1
## 35  MRPS10    1
## 36  MRPS11    1
## 37  MRPS12    1
## 38  MRPS14    1
## 39  MRPS15    2
## 40  MRPS17    2
## 41 MRPS18B    1
## 42  MRPS21    2
## 43  MRPS22    1
## 44  MRPS24    1
## 45  MRPS25    1
## 46  MRPS26    1
## 47  MRPS31    1
## 48  MRPS33    3
## 49  MRPS34    1
## 50  MRPS36    1
## 51   MRPS6    3
## 52   MRPS7    2
## 53   MRPS9    2
## 54  RPL10A    1
## 55   RPL11    1
## 56   RPL12    4
## 57   RPL13    3
## 58  RPL13A    2
## 59   RPL14    3
## 60   RPL15    1
## 61   RPL17    4
## 62   RPL21    1
## 63   RPL22    1
## 64  RPL27A    1
## 65   RPL28    2
## 66   RPL29    3
## 67   RPL30    1
## 68   RPL31    1
## 69   RPL32    1
## 70   RPL36    1
## 71   RPL37    1
## 72  RPL37A    1
## 73   RPL3L    1
## 74    RPL4    1
## 75    RPL8    5
## 76    RPL9    4
## 77   RPS13    1
## 78   RPS14    1
## 79  RPS15A    1
## 80   RPS16    1
## 81   RPS18    1
## 82   RPS20    2
## 83   RPS21    1
## 84   RPS23    3
## 85   RPS24    2
## 86   RPS25    1
## 87  RPS27A    1
## 88   RPS29    1
## 89   RPS3A    2
## 90    RPS5    1
## 91    RPS6    1
## 92    RPS8    1
## 93    RPS9    2

# r2 and Fstat

n=34684 
k=158
p=dat$eaf.exposure
r2.1<-(2*(dat$beta.exposure^2)*p*(1-p))/1
r2=sum(r2.1, na.rm = TRUE)
r2

## [1] 1.817757

Fstat <- r2*(n-1-k)/((1-r2)*k)
Fstat

## [1] -485.7223

# Combine results
sin=singlesnp_results
res=mr_results
het<-mr_heterogeneity(dat)
plt<-mr_pleiotropy_test(dat)

# Bug in function for combining; save separately
# all_res<-combine_all_mrresults(res,het,plt,sin,ao_slc=F,Exp=T,split.exposure=T,split.outcome=T)
# all_res$lower_intercept=all_res$intercept-1.96*all_res$intercept_se
# all_res$upper_intercept=all_res$intercept+1.96*all_res$intercept_se
# 
# head(all_res[,c("Method","outcome","exposure","nsnp","b","se","or", "or_lci95",  
#   "or_uci95","pval","intercept","intercept_se","intercept_pval",
#   "Q","Q_df","Q_pval")])
#write.csv(all_res,"combined_results_all_RP_MRP_blood_v8.csv")

write.csv(mr_results,'/n/home04/cdadams/MR-RP-eqtlGen-bcp-17/res_META_MR-RP-eqtlGen-bcp-17.csv')
write.csv(singlesnp_results,'/n/home04/cdadams/MR-RP-eqtlGen-bcp-17/singlesnp_META_MR-RP-eqtlGen-bcp-17.csv')
write.csv(dat, '/n/home04/cdadams/MR-RP-eqtlGen-bcp-17/SIMEX_META_MR-RP-eqtlGen-bcp-17.csv')
write.csv(plt,'/n/home04/cdadams/MR-RP-eqtlGen-bcp-17/plt_META_MR-RP-eqtlGen-bcp-17.csv')
write.csv(het,'/n/home04/cdadams/MR-RP-eqtlGen-bcp-17/het_META_MR-RP-eqtlGen-bcp-17.csv')

1.4 Data for collective RP

1.5 Figures

The top plot displays the individual SNPs included in the analysis, their effect estimates and 95% confidence intervals, along with the meta-analyzed “causal” estimates (the term “causal” is used in MR to mean MR is less prone to confounding and reverse causation than observational designs).
The inverse-variance weighted (IVW) estimate is the approach that is used to test the causal effect. The weighted median, weighted mode, and MR-Egger are sensitivity estimators. If the directions and magnitudes of the sensitivity estimators’ findings comport with those of the IVW, this is evidence against pleiotropy. (The sensitivity estimators needn’t all be statistically signficiant. For instance, the MR-Egger estimate isn’t, but this only means that the MR-Egger test doesn’t provide additional support for the IVW. The direction and magnitude of the MR-Egger estimate does, however, comport with the IVW’s.)
The bottom (scatter) plot is a sensitivity analysis for heterogeneity in the effect estimates.

1.6 IVW & sensitivity estimator results

##                      method nsnp    or    se or_lci95 or_uci95  pval
## 1                  MR Egger  158 1.002 0.007    0.988    1.016 0.806
## 2           Weighted median  158 1.003 0.007    0.989    1.016 0.692
## 3 Inverse variance weighted  158 1.004 0.004    0.996    1.012 0.319
## 4               Simple mode  158 1.012 0.012    0.989    1.037 0.312
## 5             Weighted mode  158 0.999 0.008    0.984    1.016 0.947

1.7 Wald ratio results

##       Var1 Freq
## 1   MRPL10    2
## 2   MRPL13    1
## 3   MRPL14    1
## 4   MRPL15    1
## 5   MRPL17    2
## 6   MRPL18    4
## 7   MRPL19    2
## 8   MRPL20    3
## 9   MRPL21    3
## 10  MRPL22    1
## 11  MRPL23    1
## 12  MRPL24    1
## 13  MRPL27    1
## 14  MRPL28    1
## 15  MRPL32    1
## 16  MRPL33    1
## 17  MRPL34    4
## 18  MRPL35    3
## 19  MRPL36    2
## 20  MRPL38    4
## 21  MRPL39    1
## 22   MRPL4    1
## 23  MRPL40    1
## 24  MRPL43    4
## 25  MRPL44    1
## 26  MRPL45    1
## 27  MRPL47    1
## 28  MRPL48    2
## 29  MRPL49    1
## 30  MRPL50    2
## 31  MRPL52    4
## 32  MRPL53    2
## 33  MRPL54    2
## 34  MRPL55    1
## 35  MRPS10    1
## 36  MRPS11    1
## 37  MRPS12    1
## 38  MRPS14    1
## 39  MRPS15    2
## 40  MRPS17    2
## 41 MRPS18B    1
## 42  MRPS21    2
## 43  MRPS22    1
## 44  MRPS24    1
## 45  MRPS25    1
## 46  MRPS26    1
## 47  MRPS31    1
## 48  MRPS33    3
## 49  MRPS34    1
## 50  MRPS36    1
## 51   MRPS6    3
## 52   MRPS7    2
## 53   MRPS9    2
## 54  RPL10A    1
## 55   RPL11    1
## 56   RPL12    4
## 57   RPL13    3
## 58  RPL13A    2
## 59   RPL14    3
## 60   RPL15    1
## 61   RPL17    4
## 62   RPL21    1
## 63   RPL22    1
## 64  RPL27A    1
## 65   RPL28    2
## 66   RPL29    3
## 67   RPL30    1
## 68   RPL31    1
## 69   RPL32    1
## 70   RPL36    1
## 71   RPL37    1
## 72  RPL37A    1
## 73   RPL3L    1
## 74    RPL4    1
## 75    RPL8    5
## 76    RPL9    4
## 77   RPS13    1
## 78   RPS14    1
## 79  RPS15A    1
## 80   RPS16    1
## 81   RPS18    1
## 82   RPS20    2
## 83   RPS21    1
## 84   RPS23    3
## 85   RPS24    2
## 86   RPS25    1
## 87  RPS27A    1
## 88   RPS29    1
## 89   RPS3A    2
## 90    RPS5    1
## 91    RPS6    1
## 92    RPS8    1
## 93    RPS9    2

1.8 Rpubs

https://rpubs.com/Charleen_D_Adams/754999