Meta-analytic Mendelian randomization (MR) of expression of cytosolic and mitochondrial ribosomal proteins (RPs) on ER- breast cancer (BC)

MR-dysreg-GTEx-all-RP-BC-ERn-breast

Charleen D. Adams1 and Bernardo Lemos1,2

1Department of Environmental Health, Program in Molecular and Integrative Physiological Sciences, Harvard T.H. Chan School of Public Health, Boston, Massachusetts 02115, USA
2Broad Institute of Harvard and MIT, Cambridge, Massachusetts 02142, USA

2021-04-12

Abstract

Under conditions of nucleolar stress, various ribosomal proteins (RPs) interact with the MDM2 proto-oncogene (MDM2) to stabilize p53. Dysregulation of the RPs may disturb this pathway, but whether changes in the expression of the RPs influence risk for breast cancer in the general population is unknown. We tested whether expression changes in cytosolic or mitochondrial RPs influence risk for ER- breast cancer using inverse-variance weighted (IVW) Mendelian randomization (MR)–an instrumental variables technique. Summary statistics for expression quantitative trait loci (eQTLs) for SNPs controlling expression of RPs (P<5x10-06) served as instrumental variables for decreased RP expression. The summary statistics for the SNPs decreasing expression of the RPs were extracted from the Gene-Tissue Expression (GTEx) project’s (version 8 “eGene”) eQTL data in breast tissue. Summary statistics for these same SNPs were extracted from the Breast Cancer Association Consortium’s (BCAC) 2017 genome-wide association study (GWAS) of breast cancer (21468 ER- cases and 105974 controls). After outliers were removed, summary statistics for 35 SNPs were used to ascertain whether decreased expression of RPs increases risk for ER- breast cancer. This was done by calculating and meta-analyzing individual Wald ratios. Wald ratios were obtained by dividing the effect estimate for the impact of a given SNP on breast cancer by the effect estimate for the impact of the same SNP on RP expression. The meta-analyzed Wald ratios yield an overall causal (IVW) estimate that should be free from most sources of environmental confounding, so long as the assumptions for MR are not violated. Chief among the MR assumptions are that the genetic instruments (RP eQTLs) influence breast cancer only through the expression of the RPs and not through pleiotropic pathways. We performed sensitivity analyses, using the MR-Egger, weighted median, and weighted mode MR estimators to test for pleiotropy. The IVW estimate revealed a null effect for ER- breast cancer corresponding to an increase in expression of cytosolic and mitochondrial RPs, regardless of whether the effect alleles increased or decreased expression (odds ratio=1.009; 95% CI=0.996, 1.022; P=0.173). There was no evidence of pleiotropy. The null IVW result implies either that environmental factors that increase the expression of the RPs are not risk factors ER- breast cancer or that we were underpowered to observe an effect with only 21468 cases.

0.1 Overview
0.2 Actions
0.3 Code
0.4 Figures
0.5 IVW & sensitivity estimator results
0.6 Wald ratio results

0.1 Overview

Method: meta-analytic Mendelian randomization (MR)
Instrument data source: Gene-Tissue Expression (GTex) Project
Expression quantitative trait loci (eQTL)
Outcome data source: BCAC 2017 release
Ribosomal protein (RP): small & large subunits only
Significance threshold for analyses here (Q-value for ‘eGene’ status <0.05 & nominal P<5x10-6)

0.2 Actions

# Create the directory to match the repository in GitHub
mkdir /n/home04/cdadams/MR-dysreg-GTEx-all-RP-BC-breast
cd /n/home04/cdadams/MR-dysreg-GTEx-all-RP-BC-breast

Read in the full GTEx v8 “eGene” data
Clean up column names
Extract breast eQTLs
Extract RP genes
Select SNPs at P<5e-06
Format for MR
Merge breast RP eqtls with BCAC
Run MR
Look for outliers & remove them
Test for pleiotropy & heterogeneity
Generate forest plot & scatterplot
Table the data sources

0.3 Code

# MR of decreased cytosolic RP expression in breast on breast cancer
setwd('/n/home04/cdadams/MR-dysreg-GTEx-all-RP-BC-ERn-breast')

# Hand-curated RP list (no pseudogenes): 150
rp_list_expanded=read.csv('/n/home04/cdadams/ld/ldsc/expanded_verified.csv')
rp_list_expanded

# Look and clean: 
master_v8_n=read.csv("/n/home04/cdadams/ld/ldsc/v8_master_tissues_n.csv")
master_v8_n$X=NULL
colnames(master_v8_n)[2] <- "chromosome"
colnames(master_v8_n)[31] <- "MAF"
master_v8_n$eaf=ifelse(master_v8_n$ref_factor=='1', master_v8_n$MAF, 1-master_v8_n$MAF)
gtex_breast=master_v8_n[which(master_v8_n$tissue=="Breast_Mammary_Tissue"),]
write.csv(gtex_breast, '/n/home04/cdadams/MR-dysreg-GTEx-all-RP-BC-breast/gtex_breast.csv')

# Just read in the cleaned data 
gtex_breast=read.csv('/n/home04/cdadams/MR-dysreg-GTEx-all-RP-BC-breast/gtex_breast.csv')
gtex_breast$X=NULL

# Get the RP eqtls 
gtex_breast_RP=gtex_breast[which(gtex_breast$gene_name %in% rp_list_expanded$all_RP),]

# Select the P<5x10-06 eqtls
gtex_breast_RP_p_06=gtex_breast_RP[which(gtex_breast_RP$pval<0.000005),]

# Order
gtex_breast_RP_p_06=gtex_breast_RP_p_06[order(gtex_breast_RP_p_06$gene_name, decreasing=TRUE),]

# Remove duplicate entries
gtex_breast_RP_p_06Clean <- gtex_breast_RP_p_06[!duplicated(gtex_breast_RP_p_06$gene_name),]
gtex_breast_RP_p_06Clean$tissue="Breast"

write.csv(gtex_breast_RP_p_06Clean, '/n/home04/cdadams/MR-dysreg-GTEx-all-RP-BC-breast/gtex_breast_RP_p_06Clean.csv')

# MR format
# Use samplesize_col = 'position' to keep position. Change after.

exposure_dat <- read_exposure_data(
  filename = "/n/home04/cdadams/MR-dysreg-RP-BC/gtex_breast_RP_p_06Clean.csv",
  sep = ',',
  snp_col = 'SNP',
  beta_col = 'beta',
  se_col = 'se',
  effect_allele_col = 'effect_allele',
  phenotype_col = 'tissue',
  other_allele_col = 'other_allele',
  eaf_col = 'eaf',
  samplesize_col = 'position',
  gene_col = 'gene_name',
  pval_col = 'pval', 
)

# Format the colname names
exposure_dat$chr_name=exposure_dat$chr.exposure
exposure_dat$chrom_start=exposure_dat$samplesize.exposure
exposure_dat$samplesize.exposure=NULL
exposure_dat$gene_tissue=exposure_dat$exposure
exposure_dat$exposure="RP expression in breast"
exposure_dat$id.exposure="RP expression in breast"

# # Select the decreased expression eQTLs
# dat_decrease=exposure_dat[which(exposure_dat$beta.exposure<0),]
# 
# # Subset by just the cytosolic RPs
# decrease_RPs <- dat_decrease[startsWith(as.character(dat_decrease$gene.exposure), 'R'),]

# Clump for LD
exposure_dat_clump <- clump_data(exposure_dat)

# Extract the increased expression loci from BCAC
outcome_dat <- extract_outcome_data(exposure_dat_clump$SNP, c('1128'),
  proxies = 1, rsq = 0.8, align_alleles = 1, palindromes = 1, maf_threshold = 0.3)
dat <- harmonise_data(exposure_dat_clump, outcome_dat, action = 2)

# Run MR
mr_results<- mr(dat)
mr_results

# Detect and remove outliers (outliers can indicate pleiotropy)
raddat <- format_radial(BXG=dat$beta.exposure, BYG=dat$beta.outcome, 
  seBXG=dat$se.exposure, seBYG=dat$se.outcome, RSID=dat$SNP)
ivwrad <- ivw_radial(raddat, alpha=0.05, weights=1)
ivwrad2 <- ivw_radial(raddat, alpha=0.05, weights=1)
dim(ivwrad$outliers)[1] 
outliers=ivwrad$outliers[1]
outliers
myvars=outliers$SNP
dat2 <- dat[! dat$SNP %in% myvars, ]
dim(dat2) 
dim(dat)
dat2$outcome="breast cancer (BCAC)"
dat2$id.outcome="breast cancer (BCAC)"

# Run MR without outliers
mr_results <- mr(dat2)
mr_results

singlesnp_results=mr_singlesnp(dat2, parameters = default_parameters(),
  single_method = "mr_wald_ratio", 
  all_method = c("mr_ivw",
  "mr_egger_regression", 
  "mr_weighted_mode",
  "mr_weighted_median"))

# Remove the snps that weren't used in the MR to have a record of the data source
singlesnp_results.2=singlesnp_results[1:35,]
dim(singlesnp_results.2)
dat2.1=dat2[which(dat2$SNP %in% singlesnp_results.2$SNP),]
dim(dat2.1)
dim(dat)
dat=dat2.1
dim(dat)

# r2 and Fstat

n=480
k=35
p=dat$eaf.exposure
r2.1<-(2*(dat$beta.exposure^2)*p*(1-p))/1
r2=sum(r2.1, na.rm = TRUE)
r2 
Fstat <- r2*(n-1-k)/((1-r2)*k)
Fstat

# Combine results
sin=singlesnp_results
res=mr_results
het<-mr_heterogeneity(dat)
plt<-mr_pleiotropy_test(dat)

# Bug in function for combining; save separately
# all_res<-combine_all_mrresults(res,het,plt,sin,ao_slc=F,Exp=T,split.exposure=T,split.outcome=T)
# all_res$lower_intercept=all_res$intercept-1.96*all_res$intercept_se
# all_res$upper_intercept=all_res$intercept+1.96*all_res$intercept_se
# 
# head(all_res[,c("Method","outcome","exposure","nsnp","b","se","or", "or_lci95",  
#   "or_uci95","pval","intercept","intercept_se","intercept_pval",
#   "Q","Q_df","Q_pval")])
#write.csv(all_res,"combined_results_all_RP_MRP_breast_v8.csv")

write.csv(mr_results,'res_all_RP_breast_ERn_v8.csv')
write.csv(singlesnp_results,'singlesnp_all_RP_breast_ERn_v8.csv')
write.csv(dat, 'all_RP_breast_v8_SIMEX_ERn.csv')
write.csv(plt,'plt_all_RP_breast_ERn_v8.csv')
write.csv(het,'het_all_RP_breast_v8_ERn.csv')

0.4 Figures

The top plot displays the individual SNPs included in the analysis, their effect estimates and 95% confidence intervals, along with the meta-analyzed “causal” estimates (the term “causal” is used in MR to mean MR is less prone to confounding and reverse causation than observational designs).
The inverse-variance weighted (IVW) estimate is the approach that is used to test the causal effect. The weighted median, weighted mode, and MR-Egger are sensitivity estimators. If the directions and magnitudes of the sensitivity estimators’ findings comport with those of the IVW, this is evidence against pleiotropy. (The sensitivity estimators needn’t all be statistically signficiant. For instance, the MR-Egger estimate isn’t, but this only means that the MR-Egger test doesn’t provide additional support for the IVW. The direction and magnitude of the MR-Egger estimate does, however, comport with the IVW’s.)
The bottom (scatter) plot is a sensitivity analysis for heterogeneity in the effect estimates.

0.5 IVW & sensitivity estimator results

##                      method nsnp    or    se or_lci95 or_uci95  pval
## 1                  MR Egger   35 1.006 0.012    0.982    1.030 0.641
## 2           Weighted median   35 1.008 0.009    0.990    1.026 0.409
## 3 Inverse variance weighted   35 1.009 0.007    0.996    1.022 0.173
## 4               Simple mode   35 1.009 0.015    0.980    1.040 0.536
## 5             Weighted mode   35 1.006 0.009    0.989    1.023 0.505