Under conditions of nucleolar stress, various ribosomal proteins (RPs) interact with the MDM2 proto-oncogene (MDM2) to stabilize p53. Dysregulation of the RPs may disturb this pathway, but whether changes in the expression of the RPs influence risk for breast cancer in the general population is unknown. We tested whether expression changes in individual cytosolic and mitochondrial RPs influence risk for ER+ breast cancer using inverse-variance weighted (IVW) and Wald ratio Mendelian randomization (MR). Summary statistics for cis expression quantitative trait loci (eQTLs) for SNPs controlling expression of RPs (P<5x10-06) were extracted from the eqtlGen Consortium’s genome-wide association study (GWAS) of expression in blood (n=125 RPs). Summary statistics for 119 of the 125 RPs were extracted from the Breast Cancer Association Consortium’s (BCAC) 2017 genome-wide association study (GWAS) of breast cancer (69501 ER+ cases and 105974 controls). After linkage disequilibirum (LD)-clumping each set of eQTLs for a given RP, there were 31 RPs available for meta-analysis.
\[ \beta = \frac{z} {\sqrt2p(1-p)(n + z^2)} \] \[ SE = \frac{1} {\sqrt2p(1-p)(n + z^2)} \] where \[p = MAF\]
# README files for the eQTLGen GWAS
# This README accompanies the files with cis-eQTL results from eQTLGen
# Files
# -----
# File with full cis-eQTL results: 2019-12-11-cis-eQTLsFDR-ProbeLevel-CohortInfoRemoved-BonferroniAdded.txt.gz
# File with significant (FDR<0.05) cis-eQTL results: 2019-12-11-cis-eQTLsFDR0.05-ProbeLevel-CohortInfoRemoved-BonferroniAdded.txt.gz
# Column Names
# ------------
# Pvalue - P-value
# SNP - SNP rs ID
# SNPChr - SNP chromosome
# SNPPos - SNP position
# AssessedAllele - Assessed allele, the Z-score refers to this allele
# OtherAllele - Not assessed allele
# Zscore - Z-score
# Gene - ENSG name (Ensembl v71) of the eQTL gene
# GeneSymbol - HGNC name of the gene
# GeneChr - Gene chromosome
# GenePos - Centre of gene position
# NrCohorts - Total number of cohorts where this SNP-gene combination was tested
# NrSamples - Total number of samples where this SNP-gene combination was tested
# FDR - False discovery rate estimated based on permutations
# BonferroniP - P-value after Bonferroni correction
#
# Additional information
# ----------------------
# These files contain all cis-eQTL results from eQTLGen, accompanying the article.
# 19,250 genes that showed expression in blood were tested.
# Every SNP-gene combination with a distance <1Mb from the center of the gene and tested in at least 2 cohorts was included.
# Associations where SNP/proxy positioned in Illumina probe were not removed from combined analysis.
# UPDATE LOG
# ----------
# 2018-10-19: Initial data release
# 2018-12-19: In the current README, following file names have been fixed and updated:
# 2019-12-20: Cis-eQTLs are now updated to have a 2-cohort filter: every cis-eQTL must be tested in at least 2 cohorts to be reported
# On cluster, I gzipped the eqtlGen file: gzip 2019-12-11-cis-eQTLsFDR-ProbeLevel-CohortInfoRemoved-BonferroniAdded.txt
# this is because I kept getting error messages both here & on the cluster saying ther was no room on the disk.
# I moved unnecessary files to a folder in my personal computer DB: /Users/charleenadams/Dropbox/Harvard/bc/survival/odyssey_transfer/
# This has the gz files for the protein GWAS, the eQTLGen SMR file, and the eQTLGen EAF file.
# I launched an interacted terminal session with increased memory:
# salloc -p test --mem 5000 -t 0-06:00
#-----------------------------------------------------------------------------------------------------------------------
# README file for the MAF eQTLGen cis-eQTL data
# A README accompanies the file with allele frequencies from eQTLGen cis-eQTL analysis:
# 2018-07-18_SNP_AF_for_AlleleB_combined_allele_counts_and_MAF_pos_added.txt.gz
# Column Names
# ------------
# SNP - SNP rs ID
# hg19_chr - chr number
# hg19_pos - SNP position (hg19)
# AlleleA - Other allele
# AlleleB - Assessed allele
# allA_total - Total allele count of genotype AA
# allAB_total - Total allele count of genotype AB
# allB_total - Total allele count of genotype BB
# AlleleB_all - Allele frequency of assessed allele
# Additional information
# ----------------------
# The allele frequencies were calculated using reported allele counts from all cohorts except
# Framingham Heart Study, because the related samples are present in this cohort.
#-----------------------------------------------------------------------------------------------------------------------
# Remember that the eqtlGen file is stored in /n/home04/cdadams/MR-eqtlGen-long
# Don't move to MR-RP-eqtlGen-bcp-17 because it will result a storage over-capacity
setwd('/n/home04/cdadams/MR-eqtlGen-long/')
tbl=fread('cat 2019-12-11-cis-eQTLsFDR-ProbeLevel-CohortInfoRemoved-BonferroniAdded.txt.gz | gunzip')
rp_list_expanded=read.csv('/n/home04/cdadams/ld/ldsc/expanded_verified.csv')
rp_list_expanded
# Extract RPs
RPs=tbl[which(tbl$GeneSymbol %in% rp_list_expanded$all_RP),]
freq=table(RPs$GeneSymbol)
rp_freq=as.data.frame(freq)
# Read-in MAFs
eqtl_MAF=read.table(file = "2018-07-18_SNP_AF_for_AlleleB_combined_allele_counts_and_MAF_pos_added.txt",
sep = "\t", header=TRUE)
RPs_total <- merge(RPs,eqtl_MAF,by="SNP")
# Calculate betas and SEs
n=30596
RPs_total$Beta=RPs_total$Zscore/sqrt(2*(RPs_total$AlleleB_all)*
(1-RPs_total$AlleleB_all)*(n+RPs_total$Zscore^2))
RPs_total$SE=1/sqrt(2*(RPs_total$AlleleB_all)*
(1-RPs_total$AlleleB_all)*(n+RPs_total$Zscore^2))
write.table(RPs_total, file = "/n/home04/cdadams/MR-RP-eqtlGen-bc/RPs_total_maf_beta_se.txt", sep = "\t",
row.names = FALSE, col.names = TRUE, quote=FALSE)
RPs_total=read.delim("/n/home04/cdadams/MR-RP-eqtlGen/RPs_total_maf_beta_se.txt", check.names = FALSE)
# gzRPs <- gzfile("/n/home04/cdadams/MR-RP-eqtlGen-bcp-17/RPs.gz", "w")
# write.csv(RPs, gzRPs)
# close(gzRPs)
# Read-in for MR formatting
freq=table(RPs_total$GeneSymbol)
rp_freq=as.data.frame(freq)
rp_freq
setwd("/n/home04/cdadams/MR-RP-eqtlGen-bcp-17")
# Read-in the data saved for the 2020 BCAC: MR-RP-eqtlGen-bc
exposure_RP_eqtlGen <- read_exposure_data(
filename = "/n/home04/cdadams/MR-RP-eqtlGen-bc/RPs_total_maf_beta_se.txt",
sep = '\t',
snp_col = 'SNP',
beta_col = 'Beta',
se_col = 'SE',
effect_allele_col = 'AssessedAllele',
#phenotype_col = 'tissue',
#units_col = 'tissue',
other_allele_col = 'OtherAllele',
eaf_col = 'AlleleB_all',
#samplesize_col = 'samplesize',
#ncase_col = 'ncase',
#ncontrol_col = 'ncontrol',
gene_col = 'GeneSymbol',
pval_col = 'Pvalue',
id_col = 'RP eqtlGEN'
)
freq=table(exposure_RP_eqtlGen$gene.exposure)
rp_freq=as.data.frame(freq)
rp_freq
exposure_RP_eqtlGen_p=exposure_RP_eqtlGen[which(exposure_RP_eqtlGen$pval.exposure<0.000005),]
freq2=table(exposure_RP_eqtlGen_p$gene.exposure)
rp_freq2=as.data.frame(freq2)
rp_freq2
write.csv(exposure_RP_eqtlGen_p, '/n/home04/cdadams/MR-RP-eqtlGen-bcp-17/exposure_RP_eqtlGen_p.csv')
# Read-in the BCAC 2017 data
setwd('/n/home04/cdadams/MR-RP-eqtlGen-bcp-17')
exposure_RP_eqtlGen_p=read.csv('/n/home04/cdadams/MR-RP-eqtlGen-bcp-17/exposure_RP_eqtlGen_p.csv')
outcome_dat <- extract_outcome_data(exposure_RP_eqtlGen_p$SNP, c('1127'), #1128 ER-; 1126 overall
proxies = 1, rsq = 0.8, align_alleles = 1, palindromes = 1, maf_threshold = 0.3)
dat <- harmonise_data(exposure_RP_eqtlGen_p, outcome_dat, action = 2)
# Descriptives
freq3=table(dat$gene.exposure)
rp_freq3=as.data.frame(freq3)
rp_freq3
write.csv(dat, '/n/home04/cdadams/MR-RP-eqtlGen-bcp-17/RP_eqtlGen_bcp_17.csv')
setwd('/n/home04/cdadams/MR-RP-eqtlGen-bcp-17')
RP_eqtlGen_bcp_17=read.csv('/n/home04/cdadams/MR-RP-eqtlGen-bcp-17/RP_eqtlGen_bcp_17.csv')
head(RP_eqtlGen_bcp_17)
RP_eqtlGen_bcp_17$id.exposure=RP_eqtlGen_bcp_17$gene.exposure
# Command to remove everything except "v"
#rm(list=(ls()[ls()!="v"]))
# Split the harmonized data into separate data frames for each genes
RP_split <- split(RP_eqtlGen_bcp_17, RP_eqtlGen_bcp_17$id.exposure)
list2env(RP_split, envir= .GlobalEnv) #split the list into separate datasets
# Remove the original RP_eqtlGen_bcp_17 and RP_split
rm(list= ls()[(ls() %in% c("RP_eqtlGen_bcp_17","RP_split"))])
# Save each harmonized gene as a .csv
# for (i in 1:length(files)){
# write.csv(files[[i]], paste(names(files[i]), ".csv", sep = ""))
# }
# Get them back together: not used here but useful
# dfs = sapply(.GlobalEnv, is.data.frame)
# dfs
#
# df=do.call(rbind, mget(names(dfs)[dfs]))
# head(df)
# dim(df)
# Combine the separated files into a list
files <- mget(ls())
length(files)
# Use a loop to clump each RP dataframe in the list, run MR, & save results
for (i in 1:length(files)){
files[[i]]=clump_data(files[[i]])
write.csv(files[[i]], paste(names(files[i]), "_clump.csv", sep = ""))
files[[i]]=mr(files[[i]])
write.csv(files[[i]], paste(names(files[i]), "_mr.csv", sep = ""))
}
# Idea for reading all mr results in and saving:
# Idea from: https://statisticsglobe.com/merge-csv-files-in-r
# Sadly, didn't work.
# data_all <- list.files(path = "/n/home04/cdadams/MR-RP-eqtlGen-bcp-17",
# pattern = "*_mr.csv", full.names = TRUE) %>%
# lapply(read_csv) %>%
# bind_rows
# write.csv(data_all, "complete_RP_MR_res.csv")
# Merged all of the clumped data
tbl_fread_clump <-
list.files(pattern = "*_clump.csv") %>%
map_df(~fread(.))
tbl_fread_clump_df=as.data.frame(tbl_fread_clump)
head(tbl_fread_clump_df)
colnames(tbl_fread_clump_df)
df <- tbl_fread_clump_df[c(3:43)] #check this
freq_df=table(df$gene.exposure)
freq_df=as.data.frame(freq_df)
freq_df
freq_set=freq_df[which(freq_df$Freq>2),]
dim(freq_set)
write.csv(df, '/n/home04/cdadams/MR-RP-eqtlGen-bcp-17/df_17.csv')
# Merge all the MR results and save
tbl_fread <-
list.files(pattern = "*_mr.csv") %>%
map_df(~fread(.))
tbl_fread_df=as.data.frame(tbl_fread)
myvars=c('id.exposure','id.outcome','method', 'nsnp','b','se','pval')
res=tbl_fread_df[myvars]
res$id.outcome="ER+ bc 2017"
res=res[order(res$p),]
# FDR correction
p=res$p
fdr=p.adjust(p, method = "fdr", n = length(p))
res$fdr=fdr
res=res[order(res$fdr),]
write.csv(res,'/n/home04/cdadams/MR-RP-eqtlGen-bcp-17/MR-eqtlGEN-RP-bcp-17.csv')
res=res[order(res$p),]
# Select just the set with >2 SNPs
res=res[order(res$fdr),]
res_meta_set=res[which(res$nsnp>2),]
res_meta_set=res_meta_set[which(res_meta_set$method=='Inverse variance weighted'),]
rownames(res_meta_set)
# FDR for res_meta_set (only IVWs)
p2=res_meta_set$p
fdr2=p.adjust(p2, method = "fdr", n = length(p2))
res_meta_set$fdr2=fdr2
res_meta_set$fdr=NULL
res_meta_set=res_meta_set[order(res_meta_set$fdr2),]
write.csv(res_meta_set,'/n/home04/cdadams/MR-RP-eqtlGen-bcp-17/MR-eqtlGEN-RP-bcp-17-meta.csv')
# FDR for res_meta_set: only RP (not MRP) (only IVWs)
res_meta_set=res_meta_set[order(res_meta_set$id.exposure),]
p2=res_meta_set$p
fdr2=p.adjust(p2, method = "fdr", n = length(p2))
res_meta_set$fdr2=fdr2
res_meta_set$fdr=NULL
res_meta_set=res_meta_set[order(res_meta_set$fdr2),]
write.csv(res_meta_set,'/n/home04/cdadams/MR-RP-eqtlGen-bcp-17/MR-eqtlGEN-RP-bcp-17-meta.csv')
# Consideration for later: Find way to merge the df and results
This includes a list of all tests run
res=read.csv('/n/home04/cdadams/MR-RP-eqtlGen-bcp-17/MR-eqtlGEN-RP-bcp-17.csv')
head(res, n=5)
## X id.exposure id.outcome method nsnp b
## 1 111 MRPS30 ER+ bc 2017 Wald ratio 1 1.36000504
## 2 99 MRPS18C ER+ bc 2017 Wald ratio 1 0.25825473
## 3 15 MRPL2 ER+ bc 2017 Inverse variance weighted 2 -0.12395908
## 4 221 RPS23 ER+ bc 2017 Inverse variance weighted 6 0.07019376
## 5 190 RPL35A ER+ bc 2017 Wald ratio 1 -0.69235417
## se pval fdr
## 1 0.10590203 9.524857e-38 2.314540e-35
## 2 0.05165095 5.733031e-07 6.965633e-05
## 3 0.03209674 1.124417e-04 9.107777e-03
## 4 0.01910243 2.382230e-04 1.447205e-02
## 5 0.19240477 3.201499e-04 1.555929e-02
res_meta_set=read.csv('/n/home04/cdadams/MR-RP-eqtlGen-bcp-17/MR-eqtlGEN-RP-bcp-17-meta.csv')
dim(res_meta_set)
## [1] 31 9
res_meta_set
## X id.exposure id.outcome method nsnp b
## 1 221 RPS23 ER+ bc 2017 Inverse variance weighted 6 0.070193760
## 2 68 MRPL48 ER+ bc 2017 Inverse variance weighted 5 0.082108561
## 3 148 RPL13 ER+ bc 2017 Inverse variance weighted 4 -0.056833406
## 4 51 MRPL38 ER+ bc 2017 Inverse variance weighted 5 -0.099246046
## 5 236 RPS8 ER+ bc 2017 Inverse variance weighted 3 0.057003623
## 6 46 MRPL37 ER+ bc 2017 Inverse variance weighted 3 -0.069683400
## 7 115 MRPS33 ER+ bc 2017 Inverse variance weighted 3 -0.062020143
## 8 173 RPL27A ER+ bc 2017 Inverse variance weighted 3 -0.086683217
## 9 11 MRPL18 ER+ bc 2017 Inverse variance weighted 4 0.045635808
## 10 75 MRPL52 ER+ bc 2017 Inverse variance weighted 4 -0.039648950
## 11 127 MRPS7 ER+ bc 2017 Inverse variance weighted 4 0.034623396
## 12 132 MRPS9 ER+ bc 2017 Inverse variance weighted 4 0.046895719
## 13 143 RPL12 ER+ bc 2017 Inverse variance weighted 4 -0.029068397
## 14 158 RPL14 ER+ bc 2017 Inverse variance weighted 4 -0.052275808
## 15 23 MRPL21 ER+ bc 2017 Inverse variance weighted 7 -0.019464547
## 16 85 MRPL54 ER+ bc 2017 Inverse variance weighted 4 -0.021513340
## 17 60 MRPL43 ER+ bc 2017 Inverse variance weighted 4 -0.023868518
## 18 183 RPL29 ER+ bc 2017 Inverse variance weighted 3 -0.087949683
## 19 35 MRPL34 ER+ bc 2017 Inverse variance weighted 4 0.012889398
## 20 153 RPL13A ER+ bc 2017 Inverse variance weighted 4 0.022827257
## 21 164 RPL17 ER+ bc 2017 Inverse variance weighted 4 -0.013597557
## 22 203 RPL9 ER+ bc 2017 Inverse variance weighted 4 0.023453545
## 23 137 RPL10A ER+ bc 2017 Inverse variance weighted 4 0.014115894
## 24 122 MRPS6 ER+ bc 2017 Inverse variance weighted 4 -0.009730245
## 25 18 MRPL20 ER+ bc 2017 Inverse variance weighted 3 -0.013188812
## 26 40 MRPL35 ER+ bc 2017 Inverse variance weighted 3 -0.005761123
## 27 178 RPL28 ER+ bc 2017 Inverse variance weighted 5 -0.009670186
## 28 103 MRPS21 ER+ bc 2017 Inverse variance weighted 3 0.004584002
## 29 80 MRPL53 ER+ bc 2017 Inverse variance weighted 3 0.001734184
## 30 198 RPL8 ER+ bc 2017 Inverse variance weighted 6 -0.001497263
## 31 241 RPS9 ER+ bc 2017 Inverse variance weighted 3 -0.003076627
## se pval fdr2
## 1 0.01910243 0.000238223 0.007384913
## 2 0.02852680 0.003998303 0.061973700
## 3 0.02290781 0.013102814 0.135395749
## 4 0.04802896 0.038792689 0.300643339
## 5 0.02898985 0.049260469 0.305414906
## 6 0.03847879 0.070147675 0.310653989
## 7 0.03352201 0.064294547 0.310653989
## 8 0.05069215 0.087267202 0.338160408
## 9 0.02907782 0.116546123 0.362610012
## 10 0.02580286 0.124388738 0.362610012
## 11 0.02278792 0.128668069 0.362610012
## 12 0.04053644 0.247322209 0.567862197
## 13 0.02561515 0.256453896 0.567862197
## 14 0.04288714 0.222876152 0.567862197
## 15 0.01976719 0.324776587 0.642901859
## 16 0.02216833 0.331820314 0.642901859
## 17 0.02894245 0.409548133 0.714777925
## 18 0.10790448 0.415032343 0.714777925
## 19 0.02246856 0.566195323 0.797820682
## 20 0.03615467 0.527793511 0.797820682
## 21 0.02250647 0.545735947 0.797820682
## 22 0.03619654 0.517017363 0.797820682
## 23 0.02932128 0.630216933 0.849422823
## 24 0.02612009 0.709506080 0.916445353
## 25 0.04999126 0.791916809 0.944290119
## 26 0.02567348 0.822446233 0.944290119
## 27 0.03917218 0.805013773 0.944290119
## 28 0.02596783 0.859880579 0.952010641
## 29 0.03032745 0.954400227 0.970182469
## 30 0.04005580 0.970182469 0.970182469
## 31 0.03499394 0.929941161 0.970182469
res_meta_set=res_meta_set[order(res_meta_set$id.exposure), ]
res_meta_set$id.exposure
## [1] "MRPL18" "MRPL20" "MRPL21" "MRPL34" "MRPL35" "MRPL37" "MRPL38" "MRPL43"
## [9] "MRPL48" "MRPL52" "MRPL53" "MRPL54" "MRPS21" "MRPS33" "MRPS6" "MRPS7"
## [17] "MRPS9" "RPL10A" "RPL12" "RPL13" "RPL13A" "RPL14" "RPL17" "RPL27A"
## [25] "RPL28" "RPL29" "RPL8" "RPL9" "RPS23" "RPS8" "RPS9"
# Not tested with IVW
setwd('/n/home04/cdadams/MR-RP-eqtlGen-bcp-17')
RP_eqtlGen_bcp_17=read.csv('/n/home04/cdadams/MR-RP-eqtlGen-bcp-17/RP_eqtlGen_bcp_17.csv')
RP_eqtlGen_bcp_17$id.exposure=RP_eqtlGen_bcp_17$gene.exposure
# Split the harmonized data into separate data frames for each genes
RP_split <- split(RP_eqtlGen_bcp_17, RP_eqtlGen_bcp_17$id.exposure)
list2env(RP_split, envir= .GlobalEnv) #split the list into separate datasets
# Remove the original RP_eqtlGen_bcp_17 and RP_split
rm(list= ls()[(ls() %in% c("RP_eqtlGen_bcp_17","RP_split") )])
# Combine the separated files into a list
files <- mget(ls())
length(files)
# Use a loop to run MR & save results
for (i in 1:length(files)){
files[[i]]=mr_singlesnp(files[[i]], parameters = default_parameters(), single_method = "mr_wald_ratio")
write.csv(files[[i]], paste(names(files[i]), "_Walds_mr.csv", sep = ""))
}
# Merge all the MR results and save
Walds_res_tbl_fread <-
list.files(pattern = "*_Walds_mr.csv") %>%
map_df(~fread(.))
Walds_res_tbl_fread_df=as.data.frame(Walds_res_tbl_fread)
myvars=c('id.exposure','id.outcome', 'SNP','b','se','p')
Walds_res=Walds_res_tbl_fread_df[myvars]
Walds_res$id.outcome="bc 2017"
Walds_res=Walds_res[order(Walds_res$p),]
write.csv(Walds_res,"Walds_includes_metas_res.csv")
# Tried selecting "rs", but the "inverse variance weighted" includes "rs"
exp1 <- Walds_res[Walds_res$SNP %like% "rs", ]
Walds_res2=Walds_res %>%
filter(str_detect(Walds_res$SNP, "^rs"))
# FDR correction
p=Walds_res2$p
fdr=p.adjust(p, method = "fdr", n = length(p))
Walds_res2$fdr=fdr
Walds_res2=Walds_res2[order(Walds_res2$fdr),]
write.csv(Walds_res2,'/n/home04/cdadams/MR-RP-eqtlGen-bcp-17/Walds_MR_eqtlGEN_RP_bcp.csv')
Walds_res=read.csv('/n/home04/cdadams/MR-RP-eqtlGen-bcp-17/Walds_MR_eqtlGEN_RP_bcp.csv')
dim(Walds_res)
## [1] 242 8
sig_Walds_res=Walds_res[which(Walds_res$fdr<0.05),]
table(sig_Walds_res$id.exposure)
##
## MRPL21 MRPS16 MRPS18C MRPS30 RPL35A RPS23
## 1 1 1 1 1 1
Will make another repo for this, eventually.
# Get the clumped meta-analytic set for the group of RPs
RP_eqtlGen_bcp_17=read.csv('/n/home04/cdadams/MR-RP-eqtlGen-bcp-17/RP_eqtlGen_bcp_17.csv')
RP_eqtlGen_bcp_17$id.exposure="collective RPs"
RP_eqtlGen_bcp_17_clumped=clump_data(RP_eqtlGen_bcp_17)
gene_freq4=table(RP_eqtlGen_bcp_17_clumped$gene.exposure)
gene_freq4=as.data.frame(gene_freq4)
gene_freq4
## Var1 Freq
## 1 MRPL1 1
## 2 MRPL10 2
## 3 MRPL13 1
## 4 MRPL14 1
## 5 MRPL15 1
## 6 MRPL17 2
## 7 MRPL18 4
## 8 MRPL19 2
## 9 MRPL2 2
## 10 MRPL20 3
## 11 MRPL21 4
## 12 MRPL22 1
## 13 MRPL23 1
## 14 MRPL24 1
## 15 MRPL27 2
## 16 MRPL28 1
## 17 MRPL32 1
## 18 MRPL33 1
## 19 MRPL34 4
## 20 MRPL35 3
## 21 MRPL36 2
## 22 MRPL37 1
## 23 MRPL38 4
## 24 MRPL39 1
## 25 MRPL4 1
## 26 MRPL40 1
## 27 MRPL41 1
## 28 MRPL43 4
## 29 MRPL44 1
## 30 MRPL45 1
## 31 MRPL47 1
## 32 MRPL48 2
## 33 MRPL49 1
## 34 MRPL50 2
## 35 MRPL52 4
## 36 MRPL53 2
## 37 MRPL54 2
## 38 MRPL55 1
## 39 MRPS10 1
## 40 MRPS11 1
## 41 MRPS12 1
## 42 MRPS14 1
## 43 MRPS15 2
## 44 MRPS16 1
## 45 MRPS17 2
## 46 MRPS18B 1
## 47 MRPS18C 1
## 48 MRPS21 2
## 49 MRPS22 1
## 50 MRPS24 1
## 51 MRPS25 1
## 52 MRPS26 1
## 53 MRPS30 1
## 54 MRPS31 1
## 55 MRPS33 3
## 56 MRPS34 1
## 57 MRPS36 1
## 58 MRPS6 3
## 59 MRPS7 2
## 60 MRPS9 2
## 61 RPL10A 1
## 62 RPL11 1
## 63 RPL12 4
## 64 RPL13 4
## 65 RPL13A 2
## 66 RPL14 4
## 67 RPL15 1
## 68 RPL17 4
## 69 RPL21 1
## 70 RPL22 1
## 71 RPL23A 1
## 72 RPL27A 1
## 73 RPL28 2
## 74 RPL29 3
## 75 RPL3 1
## 76 RPL30 1
## 77 RPL31 2
## 78 RPL32 1
## 79 RPL35A 1
## 80 RPL36 2
## 81 RPL37 1
## 82 RPL37A 1
## 83 RPL3L 1
## 84 RPL4 1
## 85 RPL8 6
## 86 RPL9 4
## 87 RPS12 1
## 88 RPS13 1
## 89 RPS14 2
## 90 RPS15A 2
## 91 RPS16 1
## 92 RPS18 1
## 93 RPS19 1
## 94 RPS20 2
## 95 RPS21 1
## 96 RPS23 6
## 97 RPS24 2
## 98 RPS25 1
## 99 RPS27A 1
## 100 RPS29 2
## 101 RPS3A 2
## 102 RPS5 1
## 103 RPS6 1
## 104 RPS7 1
## 105 RPS8 2
## 106 RPS9 2
dat=RP_eqtlGen_bcp_17_clumped
# Run MR
mr_results<- mr(dat)
mr_results
## id.exposure id.outcome
## 1 collective RPs ieu-a-1127
## 2 collective RPs ieu-a-1127
## 3 collective RPs ieu-a-1127
## 4 collective RPs ieu-a-1127
## 5 collective RPs ieu-a-1127
## outcome
## 1 ER+ Breast cancer (Combined Oncoarray; iCOGS; GWAS meta analysis) || id:ieu-a-1127
## 2 ER+ Breast cancer (Combined Oncoarray; iCOGS; GWAS meta analysis) || id:ieu-a-1127
## 3 ER+ Breast cancer (Combined Oncoarray; iCOGS; GWAS meta analysis) || id:ieu-a-1127
## 4 ER+ Breast cancer (Combined Oncoarray; iCOGS; GWAS meta analysis) || id:ieu-a-1127
## 5 ER+ Breast cancer (Combined Oncoarray; iCOGS; GWAS meta analysis) || id:ieu-a-1127
## exposure method nsnp b se pval
## 1 exposure MR Egger 183 -0.008739882 0.011090503 0.4316977
## 2 exposure Weighted median 183 -0.001341592 0.006683644 0.8409116
## 3 exposure Inverse variance weighted 183 0.002641145 0.006068528 0.6634027
## 4 exposure Simple mode 183 0.012273735 0.013629989 0.3690470
## 5 exposure Weighted mode 183 -0.001114050 0.007773544 0.8862015
# Detect and remove outliers (outliers can indicate pleiotropy)
raddat <- format_radial(BXG=dat$beta.exposure, BYG=dat$beta.outcome,
seBXG=dat$se.exposure, seBYG=dat$se.outcome, RSID=dat$SNP)
ivwrad <- ivw_radial(raddat, alpha=0.05, weights=1)
##
## Radial IVW
##
## Estimate Std.Error t value Pr(>|t|)
## Effect (1st) 0.002950807 0.006045006 0.4881396 0.6254510
## Iterative 0.002950809 0.006045005 0.4881400 0.6254507
## Exact (FE) 0.002972601 0.003709906 0.8012605 0.4229809
## Exact (RE) 0.002959073 0.005600999 0.5283116 0.5979162
##
##
## Residual standard error: 1.629 on 185 degrees of freedom
##
## F-statistic: 0.24 on 1 and 185 DF, p-value: 0.626
## Q-Statistic for heterogeneity: 491.1832 on 185 DF , p-value: 1.361352e-29
##
## Outliers detected
## Number of iterations = 2
ivwrad2 <- ivw_radial(raddat, alpha=0.05, weights=1)
##
## Radial IVW
##
## Estimate Std.Error t value Pr(>|t|)
## Effect (1st) 0.002950807 0.006045006 0.4881396 0.6254510
## Iterative 0.002950809 0.006045005 0.4881400 0.6254507
## Exact (FE) 0.002972601 0.003709906 0.8012605 0.4229809
## Exact (RE) 0.002959073 0.005571372 0.5311210 0.5959720
##
##
## Residual standard error: 1.629 on 185 degrees of freedom
##
## F-statistic: 0.24 on 1 and 185 DF, p-value: 0.626
## Q-Statistic for heterogeneity: 491.1832 on 185 DF , p-value: 1.361352e-29
##
## Outliers detected
## Number of iterations = 2
dim(ivwrad$outliers)[1]
## [1] 26
outliers=ivwrad$outliers[1]
outliers
## SNP
## 1 rs10934
## 2 rs111614017
## 3 rs11211024
## 4 rs112595439
## 5 rs113368915
## 6 rs114645635
## 7 rs116037967
## 8 rs117831875
## 9 rs15921
## 10 rs17305746
## 11 rs181131314
## 12 rs3096309
## 13 rs4979655
## 14 rs55849964
## 15 rs59922539
## 16 rs61887067
## 17 rs6509
## 18 rs6875287
## 19 rs7100800
## 20 rs7252877
## 21 rs7666601
## 22 rs7709772
## 23 rs78505905
## 24 rs9462853
## 25 rs9472022
## 26 rs9493450
myvars=outliers$SNP
dat2 <- dat[! dat$SNP %in% myvars, ]
dim(dat2)
## [1] 160 42
dim(dat)
## [1] 186 42
dat2$outcome="breast cancer (BCAC ER+ 2017)"
dat2$id.outcome="breast cancer (BCAC ER+ 2017)"
# Run MR without outliers
mr_results <- mr(dat2)
mr_results
## id.exposure id.outcome outcome
## 1 collective RPs breast cancer (BCAC ER+ 2017) breast cancer (BCAC ER+ 2017)
## 2 collective RPs breast cancer (BCAC ER+ 2017) breast cancer (BCAC ER+ 2017)
## 3 collective RPs breast cancer (BCAC ER+ 2017) breast cancer (BCAC ER+ 2017)
## 4 collective RPs breast cancer (BCAC ER+ 2017) breast cancer (BCAC ER+ 2017)
## 5 collective RPs breast cancer (BCAC ER+ 2017) breast cancer (BCAC ER+ 2017)
## exposure method nsnp b se pval
## 1 exposure MR Egger 158 0.0017883139 0.007264531 0.8058738
## 2 exposure Weighted median 158 0.0027383838 0.006900867 0.6915021
## 3 exposure Inverse variance weighted 158 0.0040322372 0.004045085 0.3188501
## 4 exposure Simple mode 158 0.0122904607 0.012119471 0.3120912
## 5 exposure Weighted mode 158 -0.0005456549 0.008148852 0.9466980
singlesnp_results=mr_singlesnp(dat2, parameters = default_parameters(),
single_method = "mr_wald_ratio",
all_method = c("mr_ivw",
"mr_egger_regression",
"mr_weighted_mode",
"mr_weighted_median"))
dim(singlesnp_results)
## [1] 162 9
singlesnp_results=singlesnp_results[order(singlesnp_results$SNP),]
dim(singlesnp_results)
## [1] 162 9
# Remove the snps that weren't used in the MR to have a record of the data source
singlesnp_results.2=singlesnp_results[5:162,]
dim(singlesnp_results.2)
## [1] 158 9
dat2.1=dat2[which(dat2$SNP %in% singlesnp_results.2$SNP),]
dim(dat2.1)
## [1] 158 42
dim(dat)
## [1] 186 42
dat=dat2.1
dim(dat)
## [1] 158 42
# Descriptives
freq5=table(dat$gene.exposure)
freq5=as.data.frame(freq5)
freq5
## Var1 Freq
## 1 MRPL10 2
## 2 MRPL13 1
## 3 MRPL14 1
## 4 MRPL15 1
## 5 MRPL17 2
## 6 MRPL18 4
## 7 MRPL19 2
## 8 MRPL20 3
## 9 MRPL21 3
## 10 MRPL22 1
## 11 MRPL23 1
## 12 MRPL24 1
## 13 MRPL27 1
## 14 MRPL28 1
## 15 MRPL32 1
## 16 MRPL33 1
## 17 MRPL34 4
## 18 MRPL35 3
## 19 MRPL36 2
## 20 MRPL38 4
## 21 MRPL39 1
## 22 MRPL4 1
## 23 MRPL40 1
## 24 MRPL43 4
## 25 MRPL44 1
## 26 MRPL45 1
## 27 MRPL47 1
## 28 MRPL48 2
## 29 MRPL49 1
## 30 MRPL50 2
## 31 MRPL52 4
## 32 MRPL53 2
## 33 MRPL54 2
## 34 MRPL55 1
## 35 MRPS10 1
## 36 MRPS11 1
## 37 MRPS12 1
## 38 MRPS14 1
## 39 MRPS15 2
## 40 MRPS17 2
## 41 MRPS18B 1
## 42 MRPS21 2
## 43 MRPS22 1
## 44 MRPS24 1
## 45 MRPS25 1
## 46 MRPS26 1
## 47 MRPS31 1
## 48 MRPS33 3
## 49 MRPS34 1
## 50 MRPS36 1
## 51 MRPS6 3
## 52 MRPS7 2
## 53 MRPS9 2
## 54 RPL10A 1
## 55 RPL11 1
## 56 RPL12 4
## 57 RPL13 3
## 58 RPL13A 2
## 59 RPL14 3
## 60 RPL15 1
## 61 RPL17 4
## 62 RPL21 1
## 63 RPL22 1
## 64 RPL27A 1
## 65 RPL28 2
## 66 RPL29 3
## 67 RPL30 1
## 68 RPL31 1
## 69 RPL32 1
## 70 RPL36 1
## 71 RPL37 1
## 72 RPL37A 1
## 73 RPL3L 1
## 74 RPL4 1
## 75 RPL8 5
## 76 RPL9 4
## 77 RPS13 1
## 78 RPS14 1
## 79 RPS15A 1
## 80 RPS16 1
## 81 RPS18 1
## 82 RPS20 2
## 83 RPS21 1
## 84 RPS23 3
## 85 RPS24 2
## 86 RPS25 1
## 87 RPS27A 1
## 88 RPS29 1
## 89 RPS3A 2
## 90 RPS5 1
## 91 RPS6 1
## 92 RPS8 1
## 93 RPS9 2
# r2 and Fstat
n=34684
k=158
p=dat$eaf.exposure
r2.1<-(2*(dat$beta.exposure^2)*p*(1-p))/1
r2=sum(r2.1, na.rm = TRUE)
r2
## [1] 1.817757
Fstat <- r2*(n-1-k)/((1-r2)*k)
Fstat
## [1] -485.7223
# Combine results
sin=singlesnp_results
res=mr_results
het<-mr_heterogeneity(dat)
plt<-mr_pleiotropy_test(dat)
# Bug in function for combining; save separately
# all_res<-combine_all_mrresults(res,het,plt,sin,ao_slc=F,Exp=T,split.exposure=T,split.outcome=T)
# all_res$lower_intercept=all_res$intercept-1.96*all_res$intercept_se
# all_res$upper_intercept=all_res$intercept+1.96*all_res$intercept_se
#
# head(all_res[,c("Method","outcome","exposure","nsnp","b","se","or", "or_lci95",
# "or_uci95","pval","intercept","intercept_se","intercept_pval",
# "Q","Q_df","Q_pval")])
#write.csv(all_res,"combined_results_all_RP_MRP_blood_v8.csv")
write.csv(mr_results,'/n/home04/cdadams/MR-RP-eqtlGen-bcp-17/res_META_MR-RP-eqtlGen-bcp-17.csv')
write.csv(singlesnp_results,'/n/home04/cdadams/MR-RP-eqtlGen-bcp-17/singlesnp_META_MR-RP-eqtlGen-bcp-17.csv')
write.csv(dat, '/n/home04/cdadams/MR-RP-eqtlGen-bcp-17/SIMEX_META_MR-RP-eqtlGen-bcp-17.csv')
write.csv(plt,'/n/home04/cdadams/MR-RP-eqtlGen-bcp-17/plt_META_MR-RP-eqtlGen-bcp-17.csv')
write.csv(het,'/n/home04/cdadams/MR-RP-eqtlGen-bcp-17/het_META_MR-RP-eqtlGen-bcp-17.csv')
The top plot displays the individual SNPs included in the analysis, their effect estimates and 95% confidence intervals, along with the meta-analyzed “causal” estimates (the term “causal” is used in MR to mean MR is less prone to confounding and reverse causation than observational designs).
The inverse-variance weighted (IVW) estimate is the approach that is used to test the causal effect. The weighted median, weighted mode, and MR-Egger are sensitivity estimators. If the directions and magnitudes of the sensitivity estimators’ findings comport with those of the IVW, this is evidence against pleiotropy. (The sensitivity estimators needn’t all be statistically signficiant. For instance, the MR-Egger estimate isn’t, but this only means that the MR-Egger test doesn’t provide additional support for the IVW. The direction and magnitude of the MR-Egger estimate does, however, comport with the IVW’s.)
The bottom (scatter) plot is a sensitivity analysis for heterogeneity in the effect estimates.
## method nsnp or se or_lci95 or_uci95 pval
## 1 MR Egger 158 1.002 0.007 0.988 1.016 0.806
## 2 Weighted median 158 1.003 0.007 0.989 1.016 0.692
## 3 Inverse variance weighted 158 1.004 0.004 0.996 1.012 0.319
## 4 Simple mode 158 1.012 0.012 0.989 1.037 0.312
## 5 Weighted mode 158 0.999 0.008 0.984 1.016 0.947
## Var1 Freq
## 1 MRPL10 2
## 2 MRPL13 1
## 3 MRPL14 1
## 4 MRPL15 1
## 5 MRPL17 2
## 6 MRPL18 4
## 7 MRPL19 2
## 8 MRPL20 3
## 9 MRPL21 3
## 10 MRPL22 1
## 11 MRPL23 1
## 12 MRPL24 1
## 13 MRPL27 1
## 14 MRPL28 1
## 15 MRPL32 1
## 16 MRPL33 1
## 17 MRPL34 4
## 18 MRPL35 3
## 19 MRPL36 2
## 20 MRPL38 4
## 21 MRPL39 1
## 22 MRPL4 1
## 23 MRPL40 1
## 24 MRPL43 4
## 25 MRPL44 1
## 26 MRPL45 1
## 27 MRPL47 1
## 28 MRPL48 2
## 29 MRPL49 1
## 30 MRPL50 2
## 31 MRPL52 4
## 32 MRPL53 2
## 33 MRPL54 2
## 34 MRPL55 1
## 35 MRPS10 1
## 36 MRPS11 1
## 37 MRPS12 1
## 38 MRPS14 1
## 39 MRPS15 2
## 40 MRPS17 2
## 41 MRPS18B 1
## 42 MRPS21 2
## 43 MRPS22 1
## 44 MRPS24 1
## 45 MRPS25 1
## 46 MRPS26 1
## 47 MRPS31 1
## 48 MRPS33 3
## 49 MRPS34 1
## 50 MRPS36 1
## 51 MRPS6 3
## 52 MRPS7 2
## 53 MRPS9 2
## 54 RPL10A 1
## 55 RPL11 1
## 56 RPL12 4
## 57 RPL13 3
## 58 RPL13A 2
## 59 RPL14 3
## 60 RPL15 1
## 61 RPL17 4
## 62 RPL21 1
## 63 RPL22 1
## 64 RPL27A 1
## 65 RPL28 2
## 66 RPL29 3
## 67 RPL30 1
## 68 RPL31 1
## 69 RPL32 1
## 70 RPL36 1
## 71 RPL37 1
## 72 RPL37A 1
## 73 RPL3L 1
## 74 RPL4 1
## 75 RPL8 5
## 76 RPL9 4
## 77 RPS13 1
## 78 RPS14 1
## 79 RPS15A 1
## 80 RPS16 1
## 81 RPS18 1
## 82 RPS20 2
## 83 RPS21 1
## 84 RPS23 3
## 85 RPS24 2
## 86 RPS25 1
## 87 RPS27A 1
## 88 RPS29 1
## 89 RPS3A 2
## 90 RPS5 1
## 91 RPS6 1
## 92 RPS8 1
## 93 RPS9 2