Under conditions of nucleolar stress, various ribosomal proteins (RPs) interact with the MDM2 proto-oncogene (MDM2) to stabilize p53. Dysregulation of the RPs may disturb this pathway, but whether changes in the expression of the RPs influence risk for breast cancer in the general population is unknown. We tested whether expression changes in individual cytosolic and mitochondrial RPs influence risk for breast cancer using inverse-variance weighted (IVW) and Wald ratio Mendelian randomization (MR). Summary statistics for cis expression quantitative trait loci (eQTLs) for SNPs controlling expression of RPs (P<5x10-06) were extracted from the eqtlGen Consortium’s genome-wide association study (GWAS) of expression in blood (n=125 RPs). Summary statistics for 123 of the 125 RPs were extracted from the Breast Cancer Association Consortium’s (BCAC) 2020 genome-wide association study (GWAS) of breast cancer (133,384 cases and 113,789). After linkage disequilibirum (LD)-clumping each set of eQTLs for a given RP, there were 31 RPs available for meta-analysis. While the IVW result for RPL13A was significant (P=0.011), after false-discovery rate multiple-testing correction, there were no significant findings. Seven RPs had at least one significant Wald ratio test after FDR correction (MRPL19, MRPL21, MRPL23, MRPL24, MRPL34, MRPS30, and RPL13). Three of these (MRPL19, MRPL23, and MRPS30) could not be meta-analyzed. For MRPL21,MRPL24, MRPL34, and RPL13, the null IVW results suggest that the RP eQTLs for these RPs influence BC through pathways other than the expression of the RP they control.
\[ \beta = \frac{z} {\sqrt2p(1-p)(n + z^2)} \] \[ SE = \frac{1} {\sqrt2p(1-p)(n + z^2)} \] where \[p = MAF\]
# README files for the eQTLGen GWAS
# This README accompanies the files with cis-eQTL results from eQTLGen
# Files
# -----
# File with full cis-eQTL results: 2019-12-11-cis-eQTLsFDR-ProbeLevel-CohortInfoRemoved-BonferroniAdded.txt.gz
# File with significant (FDR<0.05) cis-eQTL results: 2019-12-11-cis-eQTLsFDR0.05-ProbeLevel-CohortInfoRemoved-BonferroniAdded.txt.gz
# Column Names
# ------------
# Pvalue - P-value
# SNP - SNP rs ID
# SNPChr - SNP chromosome
# SNPPos - SNP position
# AssessedAllele - Assessed allele, the Z-score refers to this allele
# OtherAllele - Not assessed allele
# Zscore - Z-score
# Gene - ENSG name (Ensembl v71) of the eQTL gene
# GeneSymbol - HGNC name of the gene
# GeneChr - Gene chromosome
# GenePos - Centre of gene position
# NrCohorts - Total number of cohorts where this SNP-gene combination was tested
# NrSamples - Total number of samples where this SNP-gene combination was tested
# FDR - False discovery rate estimated based on permutations
# BonferroniP - P-value after Bonferroni correction
#
# Additional information
# ----------------------
# These files contain all cis-eQTL results from eQTLGen, accompanying the article.
# 19,250 genes that showed expression in blood were tested.
# Every SNP-gene combination with a distance <1Mb from the center of the gene and tested in at least 2 cohorts was included.
# Associations where SNP/proxy positioned in Illumina probe were not removed from combined analysis.
# UPDATE LOG
# ----------
# 2018-10-19: Initial data release
# 2018-12-19: In the current README, following file names have been fixed and updated:
# 2019-12-20: Cis-eQTLs are now updated to have a 2-cohort filter: every cis-eQTL must be tested in at least 2 cohorts to be reported
# On cluster, I gzipped the eqtlGen file: gzip 2019-12-11-cis-eQTLsFDR-ProbeLevel-CohortInfoRemoved-BonferroniAdded.txt
# this is because I kept getting error messages both here & on the cluster saying ther was no room on the disk.
# I moved unnecessary files to a folder in my personal computer DB: /Users/charleenadams/Dropbox/Harvard/bc/survival/odyssey_transfer/
# This has the gz files for the protein GWAS, the eQTLGen SMR file, and the eQTLGen EAF file.
# I launched an interacted terminal session with increased memory:
# salloc -p test --mem 5000 -t 0-06:00
#-----------------------------------------------------------------------------------------------------------------------
# README file for the MAF eQTLGen cis-eQTL data
# A README accompanies the file with allele frequencies from eQTLGen cis-eQTL analysis:
# 2018-07-18_SNP_AF_for_AlleleB_combined_allele_counts_and_MAF_pos_added.txt.gz
# Column Names
# ------------
# SNP - SNP rs ID
# hg19_chr - chr number
# hg19_pos - SNP position (hg19)
# AlleleA - Other allele
# AlleleB - Assessed allele
# allA_total - Total allele count of genotype AA
# allAB_total - Total allele count of genotype AB
# allB_total - Total allele count of genotype BB
# AlleleB_all - Allele frequency of assessed allele
# Additional information
# ----------------------
# The allele frequencies were calculated using reported allele counts from all cohorts except
# Framingham Heart Study, because the related samples are present in this cohort.
#-----------------------------------------------------------------------------------------------------------------------
# Remember that the eqtlGen file is stored in /n/home04/cdadams/MR-eqtlGen-long
# Don't move to MR-RP-eqtlGen-bc because it will result a storage over-capacity
setwd('/n/home04/cdadams/MR-eqtlGen-long/')
tbl=fread('cat 2019-12-11-cis-eQTLsFDR-ProbeLevel-CohortInfoRemoved-BonferroniAdded.txt.gz | gunzip')
rp_list_expanded=read.csv('/n/home04/cdadams/ld/ldsc/expanded_verified.csv')
rp_list_expanded
# Extract RPs
RPs=tbl[which(tbl$GeneSymbol %in% rp_list_expanded$all_RP),]
freq=table(RPs$GeneSymbol)
rp_freq=as.data.frame(freq)
# Read-in MAFs
eqtl_MAF=read.table(file = "2018-07-18_SNP_AF_for_AlleleB_combined_allele_counts_and_MAF_pos_added.txt",
sep = "\t", header=TRUE)
RPs_total <- merge(RPs,eqtl_MAF,by="SNP")
# Calculate betas and SEs
n=30596
RPs_total$Beta=RPs_total$Zscore/sqrt(2*(RPs_total$AlleleB_all)*
(1-RPs_total$AlleleB_all)*(n+RPs_total$Zscore^2))
RPs_total$SE=1/sqrt(2*(RPs_total$AlleleB_all)*
(1-RPs_total$AlleleB_all)*(n+RPs_total$Zscore^2))
write.table(RPs_total, file = "/n/home04/cdadams/MR-RP-eqtlGen-bc/RPs_total_maf_beta_se.txt", sep = "\t",
row.names = FALSE, col.names = TRUE, quote=FALSE)
RPs_total=read.delim("/n/home04/cdadams/MR-RP-eqtlGen-bc/RPs_total_maf_beta_se.txt", check.names = FALSE)
# gzRPs <- gzfile("/n/home04/cdadams/MR-RP-eqtlGen-bc/RPs.gz", "w")
# write.csv(RPs, gzRPs)
# close(gzRPs)
# Read-in for MR formatting
freq=table(RPs_total$GeneSymbol)
rp_freq=as.data.frame(freq)
rp_freq
setwd("/n/home04/cdadams/MR-RP-eqtlGen-bc/")
exposure_RP_eqtlGen <- read_exposure_data(
filename = "/n/home04/cdadams/MR-RP-eqtlGen-bc/RPs_total_maf_beta_se.txt",
sep = '\t',
snp_col = 'SNP',
beta_col = 'Beta',
se_col = 'SE',
effect_allele_col = 'AssessedAllele',
#phenotype_col = 'tissue',
#units_col = 'tissue',
other_allele_col = 'OtherAllele',
eaf_col = 'AlleleB_all',
#samplesize_col = 'samplesize',
#ncase_col = 'ncase',
#ncontrol_col = 'ncontrol',
gene_col = 'GeneSymbol',
pval_col = 'Pvalue',
id_col = 'RP eqtlGEN'
)
exposure_RP_eqtlGen_p=exposure_RP_eqtlGen[which(exposure_RP_eqtlGen$pval.exposure<0.000005),]
write.csv(exposure_RP_eqtlGen_p, '/n/home04/cdadams/MR-RP-eqtlGen-bc/exposure_RP_eqtlGen_p.csv')
# Just verifying that genes aren't lost when reading in
exposure_RP_eqtlGen_p=read.csv('/n/home04/cdadams/MR-RP-eqtlGen-bc/exposure_RP_eqtlGen_p.csv')
colnames(exposure_RP_eqtlGen_p)
freq2=table(exposure_RP_eqtlGen_p$gene.exposure)
rp_freq2=as.data.frame(freq2)
rp_freq2
I’ve kept the BCAC data in the ldsr folder since copying the GWAS files to each new folder for each new project will result in over-reaching the storage capacity I’m allotted on Odyssey.
# Read-in the BCAC data
# I had compressed the file before transfering to the cluster:
# gzip icogs_onco_gwas_meta_overall_breast_cancer_summary_level_statistics.txt
# An Excel data dictionary accompanies the BCAC data
# /n/home04/cdadams/ld/ldsc//data_dictionary_summary_stats_bc_risk_2020.xlsx
#Data dictiony
# iCOGS-Oncoarray-GWAS-Meta-Overall-summary-results
# Label Meaning
# var_name The variant name is coded as chromosome_position_non-effect-allele_effect-allele
# Effect.Gwas The effect allele in GWAS
# Baseline.Gwas The non effect allele in GWAS
# Freq.Gwas Effect allele frequency
# beta.Gwas Log odds ratio (OR) for effect alllele in GWAS data
# SE.Gwas standard error of the log OR estimate in GWAS data
# P.value.Gwas P-value of the log OR estimate in GWAS data
# SNP.iCOGS SNPs name in iCOGS panel. For SNPs with rsid, the name is coded as rsid:position:non-effect-allele:effect-allele;
# For SNPs without rsid, the name is similarly as var_name
bcac=read.table(gzfile("/n/home04/cdadams/ld/ldsc/icogs_onco_gwas_meta_overall_breast_cancer_summary_level_statistics.txt.gz"), header=TRUE)
bcac$id=gsub( ":.*$", "", bcac$SNP.iCOGs)
bcac2=bcac[which(startsWith(bcac$id, 'rs')),]
colnames(bcac2)[5]="Beta"
#save.image("longevity_bcac.Rdata")
write.table(bcac2, file = "/n/home04/cdadams/ld/ldsc/bcac_master3.txt", sep = "\t",
row.names = FALSE, col.names = TRUE, quote=FALSE)
# Read-in the bcac file formated for ldsc
bcac3=read.table(file = "/n/home04/cdadams/ld/ldsc/bcac_master3.txt", sep = "\t", header=TRUE)
exposure_RP_eqtlGen_p=read.csv('/n/home04/cdadams/MR-RP-eqtlGen-bc/exposure_RP_eqtlGen_p.csv')
freq=table(exposure_RP_eqtlGen_p$gene.exposure)
freq=as.data.frame(freq)
freq
bc_outcome <- read_outcome_data(
snps = exposure_RP_eqtlGen_p$SNP,
filename = "/n/home04/cdadams/ld/ldsc/bcac_master3.txt",
sep = '\t',
snp_col = 'id',
beta_col = 'Beta',
se_col = 'SE.Gwas',
effect_allele_col = 'Effect.Gwas',
#phenotype_col = 'tissue',#tissue
#units_col = 'tissue',
other_allele_col = 'Baseline.Gwas',
eaf_col = 'Freq.Gwas',
#samplesize_col = 'samplesize',
#ncase_col = '133384',
#ncontrol_col = '113789',
#gene_col = 'gene_name',
pval_col = 'P.value.Gwas',
id_col = 'breast cancer'
)
# Harmonize the alleles btw the two gwas
RP_eqtlGen_bc <- harmonise_data(exposure_RP_eqtlGen_p, bc_outcome, action = 2)
RP_eqtlGen_bc$samplesize.outcome="247173"
RP_eqtlGen_bc$samplesize.exposure="30596"
RP_eqtlGen_bc$id.exposure=RP_eqtlGen_bc$gene.exposure
write.csv(RP_eqtlGen_bc, '/n/home04/cdadams/MR-RP-eqtlGen-bc/RP_eqtlGen_bc.csv') #can simply read this in for the MR
# Descriptives
colnames(RP_eqtlGen_bc)
freq3=table(RP_eqtlGen_bc$gene.exposure)
rp_freq3=as.data.frame(freq3)
rp_freq3
rp_freq3=rp_freq3[order(rp_freq3$Freq),]
rp_freq3
setwd('/n/home04/cdadams/MR-RP-eqtlGen-bc')
RP_eqtlGen_bc=read.csv('/n/home04/cdadams/MR-RP-eqtlGen-bc/RP_eqtlGen_bc.csv')
# Command to remove everything except "v"
#rm(list=(ls()[ls()!="v"]))
# Split the harmonized data into separate data frames for each genes
RP_split <- split(RP_eqtlGen_bc, RP_eqtlGen_bc$id.exposure)
list2env(RP_split, envir= .GlobalEnv) #split the list into separate datasets
# Remove the original RP_eqtlGen_bc and RP_split
rm(list= ls()[(ls() %in% c("RP_eqtlGen_bc","RP_split"))])
# Save each harmonized gene as a .csv
# for (i in 1:length(files)){
# write.csv(files[[i]], paste(names(files[i]), ".csv", sep = ""))
# }
# Get them back together: not used here but useful
# dfs = sapply(.GlobalEnv, is.data.frame)
# dfs
#
# df=do.call(rbind, mget(names(dfs)[dfs]))
# head(df)
# dim(df)
# Combine the separated files into a list
files <- mget(ls())
length(files)
# Use a loop to clump each RP dataframe in the list, run MR, & save results
for (i in 1:length(files)){
files[[i]]=clump_data(files[[i]])
write.csv(files[[i]], paste(names(files[i]), "_clump.csv", sep = ""))
files[[i]]=mr(files[[i]])
write.csv(files[[i]], paste(names(files[i]), "_mr.csv", sep = ""))
}
# Idea for reading all mr results in and saving:
# Idea from: https://statisticsglobe.com/merge-csv-files-in-r
# Sadly, didn't work.
# data_all <- list.files(path = "/n/home04/cdadams/MR-RP-eqtlGen-bc",
# pattern = "*_mr.csv", full.names = TRUE) %>%
# lapply(read_csv) %>%
# bind_rows
# write.csv(data_all, "complete_RP_MR_res.csv")
# Merged all of the clumped data
tbl_fread_clump <-
list.files(pattern = "*_clump.csv") %>%
map_df(~fread(.))
tbl_fread_clump_df=as.data.frame(tbl_fread_clump)
head(tbl_fread_clump_df)
colnames(tbl_fread_clump_df)
df <- tbl_fread_clump_df[c(3:34)] #check this
write.csv(df, '/n/home04/cdadams/MR-RP-eqtlGen-bc/df.csv')
# Merge all the MR results and save
tbl_fread <-
list.files(pattern = "*_mr.csv") %>%
map_df(~fread(.))
tbl_fread_df=as.data.frame(tbl_fread)
myvars=c('id.exposure','id.outcome','method', 'nsnp','b','se','pval')
res=tbl_fread_df[myvars]
res$id.outcome="bc"
res=res[order(res$p),]
# FDR correction
p=res$p
fdr=p.adjust(p, method = "fdr", n = length(p))
res$fdr=fdr
res=res[order(res$fdr),]
write.csv(res,'/n/home04/cdadams/MR-RP-eqtlGen-bc/MR-eqtlGEN-RP-bc.csv')
res=res[order(res$p),]
# Select just the set with >2 SNPs
res=res[order(res$fdr),]
res_meta_set=res[which(res$nsnp>2),]
res_meta_set=res_meta_set[which(res_meta_set$method=='Inverse variance weighted'),]
rownames(res_meta_set)
# FDR for res_meta_set (only IVWs)
p2=res_meta_set$p
fdr2=p.adjust(p2, method = "fdr", n = length(p2))
res_meta_set$fdr2=fdr2
res_meta_set$fdr=NULL
res_meta_set=res_meta_set[order(res_meta_set$fdr2),]
write.csv(res_meta_set,'/n/home04/cdadams/MR-RP-eqtlGen-bc/MR-eqtlGEN-RP-bc-meta.csv')
# FDR for res_meta_set: only RP (not MRP) (only IVWs)
res_meta_set=res_meta_set[order(res_meta_set$id.exposure),]
p2=res_meta_set$p
fdr2=p.adjust(p2, method = "fdr", n = length(p2))
res_meta_set$fdr2=fdr2
res_meta_set$fdr=NULL
res_meta_set=res_meta_set[order(res_meta_set$fdr2),]
write.csv(res_meta_set,'/n/home04/cdadams/MR-RP-eqtlGen-bc/MR-eqtlGEN-RP-bc-meta.csv')
# Consideration for later: Find way to merge the df and results
This includes a list of all tests run
res=read.csv('/n/home04/cdadams/MR-RP-eqtlGen-bc/MR-eqtlGEN-RP-bc.csv')
head(res, n=5)
## X id.exposure id.outcome method nsnp
## 1 113 MRPS30 overall breast cancer Wald ratio 1
## 2 156 RPL13A overall breast cancer Inverse variance weighted 4
## 3 57 MRPL41 overall breast cancer Wald ratio 1
## 4 111 MRPS25 overall breast cancer Inverse variance weighted 2
## 5 14 MRPL19 overall breast cancer Inverse variance weighted 2
## b se pval fdr
## 1 0.9559052 0.24942715 0.000126898 0.03134381
## 2 -0.1652277 0.06542380 0.011553435 0.90321708
## 3 -0.4873273 0.19655197 0.013161047 0.90321708
## 4 -0.1536628 0.06293826 0.014626997 0.90321708
## 5 -0.1523476 0.06675390 0.022475879 0.97642760
res_meta_set=read.csv('/n/home04/cdadams/MR-RP-eqtlGen-bc/MR-eqtlGEN-RP-bc-meta.csv')
dim(res_meta_set)
## [1] 31 9
res_meta_set
## X id.exposure id.outcome method nsnp
## 1 156 RPL13A overall breast cancer Inverse variance weighted 4
## 2 35 MRPL34 overall breast cancer Inverse variance weighted 3
## 3 46 MRPL37 overall breast cancer Inverse variance weighted 3
## 4 167 RPL17 overall breast cancer Inverse variance weighted 4
## 5 51 MRPL38 overall breast cancer Inverse variance weighted 5
## 6 186 RPL29 overall breast cancer Inverse variance weighted 3
## 7 77 MRPL52 overall breast cancer Inverse variance weighted 4
## 8 117 MRPS33 overall breast cancer Inverse variance weighted 3
## 9 224 RPS23 overall breast cancer Inverse variance weighted 6
## 10 161 RPL14 overall breast cancer Inverse variance weighted 4
## 11 135 MRPS9 overall breast cancer Inverse variance weighted 4
## 12 181 RPL28 overall breast cancer Inverse variance weighted 5
## 13 151 RPL13 overall breast cancer Inverse variance weighted 4
## 14 11 MRPL18 overall breast cancer Inverse variance weighted 4
## 15 140 RPL10A overall breast cancer Inverse variance weighted 4
## 16 130 MRPS7 overall breast cancer Inverse variance weighted 4
## 17 69 MRPL48 overall breast cancer Inverse variance weighted 5
## 18 206 RPL9 overall breast cancer Inverse variance weighted 4
## 19 245 RPS9 overall breast cancer Inverse variance weighted 3
## 20 105 MRPS21 overall breast cancer Inverse variance weighted 3
## 21 240 RPS8 overall breast cancer Inverse variance weighted 3
## 22 82 MRPL53 overall breast cancer Inverse variance weighted 3
## 23 201 RPL8 overall breast cancer Inverse variance weighted 5
## 24 23 MRPL21 overall breast cancer Inverse variance weighted 7
## 25 61 MRPL43 overall breast cancer Inverse variance weighted 4
## 26 125 MRPS6 overall breast cancer Inverse variance weighted 4
## 27 146 RPL12 overall breast cancer Inverse variance weighted 4
## 28 40 MRPL35 overall breast cancer Inverse variance weighted 3
## 29 18 MRPL20 overall breast cancer Inverse variance weighted 3
## 30 176 RPL27A overall breast cancer Inverse variance weighted 3
## 31 87 MRPL54 overall breast cancer Inverse variance weighted 4
## b se pval fdr2
## 1 -0.165227703 0.06542380 0.01155344 0.3581565
## 2 -0.091211365 0.05148747 0.07647334 0.5897460
## 3 -0.154948725 0.08842939 0.07973408 0.5897460
## 4 -0.088403512 0.05375430 0.10005542 0.5897460
## 5 -0.180656374 0.11557756 0.11803498 0.5897460
## 6 0.451353996 0.30112935 0.13390715 0.5897460
## 7 0.101153377 0.06981795 0.14738892 0.5897460
## 8 -0.114936022 0.08027212 0.15219251 0.5897460
## 9 0.070584901 0.06028215 0.24163539 0.7673316
## 10 0.086374102 0.07652116 0.25899865 0.7673316
## 11 0.063031463 0.05789512 0.27627825 0.7673316
## 12 0.064400510 0.06175610 0.29703157 0.7673316
## 13 0.051175243 0.05621589 0.36264636 0.8173705
## 14 0.052751938 0.05972047 0.37706655 0.8173705
## 15 0.087399102 0.10377422 0.39967357 0.8173705
## 16 -0.041279621 0.05440001 0.44796241 0.8173705
## 17 -0.051595424 0.06803552 0.44823544 0.8173705
## 18 0.067839791 0.10275462 0.50911807 0.8701942
## 19 0.033545757 0.05512650 0.54284051 0.8701942
## 20 0.032298800 0.06113505 0.59727805 0.8701942
## 21 0.031256333 0.06621773 0.63690993 0.8701942
## 22 0.043758890 0.10371184 0.67307786 0.8701942
## 23 0.027220134 0.06487378 0.67478786 0.8701942
## 24 0.011232709 0.02899473 0.69845627 0.8701942
## 25 -0.024865610 0.06493465 0.70176955 0.8701942
## 26 0.023870872 0.08111773 0.76854822 0.9163460
## 27 0.021503697 0.09496661 0.82086367 0.9424731
## 28 0.008020470 0.06813823 0.90629838 0.9668579
## 29 0.017533751 0.20810222 0.93285331 0.9668579
## 30 -0.006321899 0.11646430 0.95671060 0.9668579
## 31 -0.002119708 0.05101649 0.96685786 0.9668579
res_meta_set=res_meta_set[order(res_meta_set$id.exposure), ]
res_meta_set$id.exposure #MRPL19 MRPL21 MRPL23 MRPL24 MRPL34 MRPS30 RPL13
## [1] "MRPL18" "MRPL20" "MRPL21" "MRPL34" "MRPL35" "MRPL37" "MRPL38" "MRPL43"
## [9] "MRPL48" "MRPL52" "MRPL53" "MRPL54" "MRPS21" "MRPS33" "MRPS6" "MRPS7"
## [17] "MRPS9" "RPL10A" "RPL12" "RPL13" "RPL13A" "RPL14" "RPL17" "RPL27A"
## [25] "RPL28" "RPL29" "RPL8" "RPL9" "RPS23" "RPS8" "RPS9"
# Not tested with IVW
# MRPL19 MRPL23 MRPS30
setwd('/n/home04/cdadams/MR-RP-eqtlGen-bc')
RP_eqtlGen_bc=read.csv('/n/home04/cdadams/MR-RP-eqtlGen-bc/RP_eqtlGen_bc.csv')
# Split the harmonized data into separate data frames for each genes
RP_split <- split(RP_eqtlGen_bc, RP_eqtlGen_bc$id.exposure)
list2env(RP_split, envir= .GlobalEnv) #split the list into separate datasets
# Remove the original RP_eqtlGen_bc and RP_split
rm(list= ls()[(ls() %in% c("RP_eqtlGen_bc","RP_split") )])
# Combine the separated files into a list
files <- mget(ls())
length(files)
# Use a loop to run MR & save results
for (i in 1:length(files)){
files[[i]]=mr_singlesnp(files[[i]], parameters = default_parameters(), single_method = "mr_wald_ratio")
write.csv(files[[i]], paste(names(files[i]), "_Walds_mr.csv", sep = ""))
}
# Merge all the MR results and save
Walds_res_tbl_fread <-
list.files(pattern = "*_Walds_mr.csv") %>%
map_df(~fread(.))
Walds_res_tbl_fread_df=as.data.frame(Walds_res_tbl_fread)
myvars=c('id.exposure','id.outcome', 'SNP','b','se','p')
Walds_res=Walds_res_tbl_fread_df[myvars]
Walds_res$id.outcome="bc"
Walds_res=Walds_res[order(Walds_res$p),]
write.csv(Walds_res,"Walds_includes_metas_res.csv")
# Tried selecting "rs", but the "inverse variance weighted" includes "rs"
exp1 <- Walds_res[Walds_res$SNP %like% "rs", ]
Walds_res2=Walds_res %>%
filter(str_detect(Walds_res$SNP, "^rs"))
# FDR correction
p=Walds_res2$p
fdr=p.adjust(p, method = "fdr", n = length(p))
Walds_res2$fdr=fdr
Walds_res2=Walds_res2[order(Walds_res2$fdr),]
write.csv(Walds_res2,'/n/home04/cdadams/MR-RP-eqtlGen-bc/Walds_MR_eqtlGEN_RP_bc.csv')
Walds_res=read.csv('/n/home04/cdadams/MR-RP-eqtlGen-bc/Walds_MR_eqtlGEN_RP_bc.csv')
dim(Walds_res)
## [1] 53257 8
sig_Walds_res=Walds_res[which(Walds_res$fdr<0.05),]
table(sig_Walds_res$id.exposure)
##
## MRPL19 MRPL21 MRPL23 MRPL24 MRPL34 MRPS30 RPL13
## 1 1 4 1 29 156 8
#MRPL19 MRPL21 MRPL23 MRPL24 MRPL34 MRPS30 RPL13
Will make another repo for this, eventually.
# Get the clumped meta-analytic set for the group of RPs
RP_eqtlGen_bc=read.csv('/n/home04/cdadams/MR-RP-eqtlGen-bc/RP_eqtlGen_bc.csv')
RP_eqtlGen_bc$id.exposure="collective RPs"
RP_eqtlGen_bc_clumped=clump_data(RP_eqtlGen_bc)
gene_freq=table(RP_eqtlGen_bc_clumped$gene.exposure)
gene_freq=as.data.frame(gene_freq)
dat=RP_eqtlGen_bc_clumped
# Run MR
mr_results<- mr(dat)
mr_results
## id.exposure id.outcome outcome exposure method nsnp
## 1 collective RPs WVFiKJ outcome exposure MR Egger 186
## 2 collective RPs WVFiKJ outcome exposure Weighted median 186
## 3 collective RPs WVFiKJ outcome exposure Inverse variance weighted 186
## 4 collective RPs WVFiKJ outcome exposure Simple mode 186
## 5 collective RPs WVFiKJ outcome exposure Weighted mode 186
## b se pval
## 1 -0.023384301 0.016847259 0.1668099
## 2 0.016969311 0.013465603 0.2075984
## 3 0.004060416 0.009288452 0.6620050
## 4 0.026519297 0.022654624 0.2432689
## 5 0.016847867 0.015678964 0.2839747
# Detect and remove outliers (outliers can indicate pleiotropy)
raddat <- format_radial(BXG=dat$beta.exposure, BYG=dat$beta.outcome,
seBXG=dat$se.exposure, seBYG=dat$se.outcome, RSID=dat$SNP)
ivwrad <- ivw_radial(raddat, alpha=0.05, weights=1)
##
## Radial IVW
##
## Estimate Std.Error t value Pr(>|t|)
## Effect (1st) 0.004677796 0.009280510 0.5040451 0.6142297
## Iterative 0.004677794 0.009280511 0.5040449 0.6142298
## Exact (FE) 0.004675870 0.008457056 0.5528958 0.5803348
## Exact (RE) 0.004689604 0.009466780 0.4953747 0.6209141
##
##
## Residual standard error: 1.097 on 188 degrees of freedom
##
## F-statistic: 0.25 on 1 and 188 DF, p-value: 0.615
## Q-Statistic for heterogeneity: 226.394 on 188 DF , p-value: 0.02916883
##
## Outliers detected
## Number of iterations = 2
ivwrad2 <- ivw_radial(raddat, alpha=0.05, weights=1)
##
## Radial IVW
##
## Estimate Std.Error t value Pr(>|t|)
## Effect (1st) 0.004677796 0.009280510 0.5040451 0.6142297
## Iterative 0.004677794 0.009280511 0.5040449 0.6142298
## Exact (FE) 0.004675870 0.008457056 0.5528958 0.5803348
## Exact (RE) 0.004689604 0.009863474 0.4754516 0.6350171
##
##
## Residual standard error: 1.097 on 188 degrees of freedom
##
## F-statistic: 0.25 on 1 and 188 DF, p-value: 0.615
## Q-Statistic for heterogeneity: 226.394 on 188 DF , p-value: 0.02916883
##
## Outliers detected
## Number of iterations = 2
dim(ivwrad$outliers)[1]
## [1] 14
outliers=ivwrad$outliers[1]
outliers
## SNP
## 1 rs10039541
## 2 rs10819246
## 3 rs11128729
## 4 rs117605916
## 5 rs13034052
## 6 rs148276897
## 7 rs17305746
## 8 rs181131314
## 9 rs35061212
## 10 rs4979655
## 11 rs6747017
## 12 rs6875287
## 13 rs72950603
## 14 rs76629948
myvars=outliers$SNP
dat2 <- dat[! dat$SNP %in% myvars, ]
dim(dat2)
## [1] 175 34
dim(dat)
## [1] 189 34
dat2$outcome="breast cancer (BCAC)"
dat2$id.outcome="breast cancer (BCAC)"
# Run MR without outliers
mr_results <- mr(dat2)
mr_results
## id.exposure id.outcome outcome exposure
## 1 collective RPs breast cancer (BCAC) breast cancer (BCAC) exposure
## 2 collective RPs breast cancer (BCAC) breast cancer (BCAC) exposure
## 3 collective RPs breast cancer (BCAC) breast cancer (BCAC) exposure
## 4 collective RPs breast cancer (BCAC) breast cancer (BCAC) exposure
## 5 collective RPs breast cancer (BCAC) breast cancer (BCAC) exposure
## method nsnp b se pval
## 1 MR Egger 172 -0.01048461 0.016213793 0.51873198
## 2 Weighted median 172 0.02577580 0.013611446 0.05826673
## 3 Inverse variance weighted 172 0.01066183 0.008888786 0.23034554
## 4 Simple mode 172 0.02396877 0.022581134 0.28998184
## 5 Weighted mode 172 0.02396877 0.014906642 0.10969633
singlesnp_results=mr_singlesnp(dat2, parameters = default_parameters(),
single_method = "mr_wald_ratio",
all_method = c("mr_ivw",
"mr_egger_regression",
"mr_weighted_mode",
"mr_weighted_median"))
dim(singlesnp_results)
## [1] 176 9
singlesnp_results=singlesnp_results[order(singlesnp_results$SNP),]
# Remove the snps that weren't used in the MR to have a record of the data source
singlesnp_results.2=singlesnp_results[5:176,]
dim(singlesnp_results.2)
## [1] 172 9
dat2.1=dat2[which(dat2$SNP %in% singlesnp_results.2$SNP),]
dim(dat2.1)
## [1] 172 34
dim(dat)
## [1] 189 34
dat=dat2.1
dim(dat)
## [1] 172 34
# r2 and Fstat
summary(dat$beta.exposure)
## Min. 1st Qu. Median Mean 3rd Qu. Max.
## -0.87206 -0.17947 -0.06836 -0.01841 0.14376 0.89985
mean=mean(dat$beta.exposure)
n=34684
test=scale(dat$beta.exposure)
str(test)
## num [1:172, 1] 0.4686 -0.0836 -1.1002 0.9197 -1.0975 ...
## - attr(*, "scaled:center")= num -0.0184
## - attr(*, "scaled:scale")= num 0.241
test <- scale(dat$beta.exposure)
test3=as.data.frame(test)
dim(test3)
## [1] 172 1
dat$scaled.dat=test3$V1
summary(dat$scaled.dat)
## Min. 1st Qu. Median Mean 3rd Qu. Max.
## -3.5421 -0.6683 -0.2073 0.0000 0.6729 3.8101
n=34684
k=172
p=dat$eaf.exposure
r2.1<-(2*(dat$scaled.dat^2)*p*(1-p))/1 # use dat$beta.exposure if already standardized
r2=sum(r2.1, na.rm = TRUE)
r2
## [1] 35.68071
Fstat <- r2*(n-1-k)/((1-r2)*k)
Fstat
## [1] -206.4309
# Combine results
sin=singlesnp_results
res=mr_results
het<-mr_heterogeneity(dat)
plt<-mr_pleiotropy_test(dat)
# Bug in function for combining; save separately
# all_res<-combine_all_mrresults(res,het,plt,sin,ao_slc=F,Exp=T,split.exposure=T,split.outcome=T)
# all_res$lower_intercept=all_res$intercept-1.96*all_res$intercept_se
# all_res$upper_intercept=all_res$intercept+1.96*all_res$intercept_se
#
# head(all_res[,c("Method","outcome","exposure","nsnp","b","se","or", "or_lci95",
# "or_uci95","pval","intercept","intercept_se","intercept_pval",
# "Q","Q_df","Q_pval")])
#write.csv(all_res,"combined_results_all_RP_MRP_blood_v8.csv")
write.csv(mr_results,'/n/home04/cdadams/MR-RP-eqtlGen-bc/res_META_MR-RP-eqtlGen-bc.csv')
write.csv(singlesnp_results,'/n/home04/cdadams/MR-RP-eqtlGen-bc/singlesnp_META_MR-RP-eqtlGen-bc.csv')
write.csv(dat, '/n/home04/cdadams/MR-RP-eqtlGen-bc/SIMEX_META_MR-RP-eqtlGen-bc.csv')
write.csv(plt,'/n/home04/cdadams/MR-RP-eqtlGen-bc/plt_META_MR-RP-eqtlGen-bc.csv')
write.csv(het,'/n/home04/cdadams/MR-RP-eqtlGen-bc/het_META_MR-RP-eqtlGen-bc.csv')
The top plot displays the individual SNPs included in the analysis, their effect estimates and 95% confidence intervals, along with the meta-analyzed “causal” estimates (the term “causal” is used in MR to mean MR is less prone to confounding and reverse causation than observational designs).
The inverse-variance weighted (IVW) estimate is the approach that is used to test the causal effect. The weighted median, weighted mode, and MR-Egger are sensitivity estimators. If the directions and magnitudes of the sensitivity estimators’ findings comport with those of the IVW, this is evidence against pleiotropy. (The sensitivity estimators needn’t all be statistically signficiant. For instance, the MR-Egger estimate isn’t, but this only means that the MR-Egger test doesn’t provide additional support for the IVW. The direction and magnitude of the MR-Egger estimate does, however, comport with the IVW’s.)
The bottom (scatter) plot is a sensitivity analysis for heterogeneity in the effect estimates.
## method nsnp or se or_lci95 or_uci95 pval
## 1 MR Egger 172 0.990 0.016 0.959 1.022 0.519
## 2 Weighted median 172 1.026 0.014 0.999 1.054 0.058
## 3 Inverse variance weighted 172 1.011 0.009 0.993 1.028 0.230
## 4 Simple mode 172 1.024 0.023 0.980 1.071 0.290
## 5 Weighted mode 172 1.024 0.015 0.995 1.055 0.110
## Var1 Freq
## 1 MRPL1 1
## 2 MRPL10 2
## 3 MRPL13 1
## 4 MRPL14 1
## 5 MRPL15 1
## 6 MRPL17 2
## 7 MRPL18 4
## 8 MRPL19 1
## 9 MRPL2 2
## 10 MRPL20 3
## 11 MRPL21 4
## 12 MRPL22 1
## 13 MRPL23 1
## 14 MRPL24 1
## 15 MRPL27 1
## 16 MRPL28 1
## 17 MRPL32 1
## 18 MRPL33 1
## 19 MRPL34 1
## 20 MRPL35 3
## 21 MRPL36 2
## 22 MRPL37 1
## 23 MRPL38 3
## 24 MRPL39 1
## 25 MRPL4 1
## 26 MRPL40 1
## 27 MRPL42 1
## 28 MRPL43 4
## 29 MRPL44 1
## 30 MRPL45 1
## 31 MRPL47 1
## 32 MRPL48 2
## 33 MRPL49 1
## 34 MRPL50 2
## 35 MRPL51 1
## 36 MRPL52 4
## 37 MRPL53 2
## 38 MRPL54 2
## 39 MRPL55 1
## 40 MRPS10 1
## 41 MRPS11 1
## 42 MRPS12 1
## 43 MRPS14 1
## 44 MRPS15 2
## 45 MRPS16 1
## 46 MRPS17 2
## 47 MRPS18B 1
## 48 MRPS18C 1
## 49 MRPS21 2
## 50 MRPS22 1
## 51 MRPS24 1
## 52 MRPS26 1
## 53 MRPS31 1
## 54 MRPS33 3
## 55 MRPS34 1
## 56 MRPS35 2
## 57 MRPS36 1
## 58 MRPS6 3
## 59 MRPS7 2
## 60 MRPS9 1
## 61 RPL10A 1
## 62 RPL11 1
## 63 RPL12 3
## 64 RPL13 4
## 65 RPL13A 1
## 66 RPL14 3
## 67 RPL15 1
## 68 RPL17 4
## 69 RPL21 1
## 70 RPL22 1
## 71 RPL27A 1
## 72 RPL28 2
## 73 RPL29 3
## 74 RPL3 1
## 75 RPL30 1
## 76 RPL31 2
## 77 RPL32 1
## 78 RPL35A 1
## 79 RPL36 2
## 80 RPL37 1
## 81 RPL3L 1
## 82 RPL4 1
## 83 RPL8 5
## 84 RPL9 4
## 85 RPS12 1
## 86 RPS13 1
## 87 RPS14 2
## 88 RPS15A 2
## 89 RPS16 1
## 90 RPS18 1
## 91 RPS20 2
## 92 RPS21 1
## 93 RPS23 4
## 94 RPS24 2
## 95 RPS25 1
## 96 RPS26 1
## 97 RPS27A 1
## 98 RPS28 1
## 99 RPS29 1
## 100 RPS3A 2
## 101 RPS5 1
## 102 RPS6 1
## 103 RPS7 1
## 104 RPS8 2
## 105 RPS9 2
Will create another repo for this. The idea is to see how the BCAC 2017 data perform.
# Try using eqtlGen with BCAC 2017
# Extract the increased expression loci from BCAC
exposure_RP_eqtlGen_p=read.csv('/n/home04/cdadams/MR-RP-eqtlGen-bc/exposure_RP_eqtlGen_p.csv')
exposure_RP_eqtlGen_p_clumped=clump_data(exposure_RP_eqtlGen_p)
dim(exposure_RP_eqtlGen_p_clumped)
outcome_dat <- extract_outcome_data(exposure_RP_eqtlGen_p_clumped$SNP, c('1126'),
proxies = 1, rsq = 0.8, align_alleles = 1, palindromes = 1, maf_threshold = 0.3)
dat <- harmonise_data(exposure_RP_eqtlGen_p_clumped, outcome_dat, action = 2)
res_17=mr(dat)
# Detect and remove outliers (outliers can indicate pleiotropy)
raddat <- format_radial(BXG=dat$beta.exposure, BYG=dat$beta.outcome,
seBXG=dat$se.exposure, seBYG=dat$se.outcome, RSID=dat$SNP)
ivwrad <- ivw_radial(raddat, alpha=0.05, weights=1)
ivwrad2 <- ivw_radial(raddat, alpha=0.05, weights=1)
dim(ivwrad$outliers)[1]
outliers=ivwrad$outliers[1]
outliers
myvars=outliers$SNP
dat2 <- dat[! dat$SNP %in% myvars, ]
dim(dat2)
dim(dat)
dat2$outcome="breast cancer (BCAC)"
dat2$id.outcome="breast cancer (BCAC)"
# Run MR without outliers
mr_results_17 <- mr(dat2)
mr_results_17