Meta-analytic Mendelian randomization (MR) of decreased expression of cytosolic RPs on BC

MR-dysreg-GTEx-decr-RP-BC-breast

Charleen D. Adams1 and Bernardo Lemos1,2

1Department of Environmental Health, Program in Molecular and Integrative Physiological Sciences, Harvard T.H. Chan School of Public Health, Boston, Massachusetts 02115, USA
2Broad Institute of Harvard and MIT, Cambridge, Massachusetts 02142, USA

2021-04-10

Abstract

Under conditions of nucleolar stress, various ribosomal proteins (RPs) interact with the MDM2 proto-oncogene (MDM2) to stabilize p53. Dysregulation of the RPs may disturb this pathway, but whether changes in the expression of the RPs influence risk for breast cancer in the general population is unknown. We tested whether decreased expression of cytosolic RPs increases risk for breast cancer using inverse-variance weighted (IVW) Mendelian randomization (MR)–an instrumental variables technique. Summary statistics for expression quantitative trait loci (eQTLs) for SNPs decreasing expression of RPs (P<5x10-06) served as instrumental variables for decreased RP expression. The summary statistics for the SNPs decreasing expression of the RPs were extracted from the Gene-Tissue Expression (GTEx) project’s (version 8 “eGene”) eQTL data in breast tissue. Summary statistics for these same SNPs were extracted from the Breast Cancer Association Consortium’s (BCAC) 2017 genome-wide association study (GWAS) of breast cancer (122977 cases and 105974 controls). After outliers were removed, summary statistics for 14 SNPs were used to ascertain whether decreased expression of RPs increases risk for breast cancer. This was done by calculating and meta-analyzing individual Wald ratios. Wald ratios were obtained by dividing the effect estimate for the impact of a given SNP on breast cancer by the effect estimate for the impact of the same SNP on RP expression. The meta-analyzed Wald ratios yield an overall causal (IVW) estimate that should be free from most sources of environmental confounding, so long as the assumptions for MR are not violated. Chief among the MR assumptions are that the genetic instruments (RP eQTLs) influence breast cancer only through the expression of the RPs and not through pleiotropic pathways. We performed sensitivity analyses, using the MR-Egger, weighted median, and weighted mode MR estimators to test for pleiotropy. The IVW estimate revealed an increased risk for breast cancer corresponding to an increase in expression of cytosolic RPs with eQTLs that decrease their expression (odds ratio=1.025; 95% CI=1.010, 1.040; P=0.001). There was no evidence of pleiotropy. This implies those with the RP alleles decreasing RP expression may be protected against breast cancer and that environmental factors that increase the expression of the RPs may increase risk for breast cancer.

0.1 Overview
0.2 Actions
0.3 Code
0.4 Figures
0.5 IVW & sensitivity estimator results
0.6 Wald ratio results

0.1 Overview

Method: meta-analytic Mendelian randomization (MR)
Instrument data source: Gene-Tissue Expression (GTex) Project
Expression quantitative trait loci (eQTL)
Outcome data source: BCAC 2017 release
Ribosomal protein (RP): small & large subunits only
Significance threshold for analyses here (Q-value for ‘eGene’ status <0.05 & nominal P<5x10-6)

0.2 Actions

# Create the directory to match the repository in GitHub
mkdir /n/home04/cdadams/MR-dysreg-RP-BC
cd /n/home04/cdadams/MR-dysreg-RP-BC

Read in the full GTEx v8 “eGene” data
Clean up column names
Extract breast eQTLs
Extract RP genes
Select SNPs at P<5e-06
Format for MR
Merge breast RP eqtls with BCAC
Run MR
Look for outliers & remove them
Test for pleiotropy & heterogeneity
Generate forest plot & scatterplot
Table the data sources

0.3 Code

# MR of decreased cytosolic RP expression in breast on breast cancer
setwd('/n/home04/cdadams/MR-dysreg-RP-BC')

# Hand-curated RP list (no pseudogenes): 150
rp_list_expanded=read.csv('/n/home04/cdadams/ld/ldsc/expanded_verified.csv')
rp_list_expanded

# Look and clean: 
master_v8_n=read.csv("/n/home04/cdadams/ld/ldsc/v8_master_tissues_n.csv")
master_v8_n$X=NULL
colnames(master_v8_n)[2] <- "chromosome"
colnames(master_v8_n)[31] <- "MAF"
master_v8_n$eaf=ifelse(master_v8_n$ref_factor=='1', master_v8_n$MAF, 1-master_v8_n$MAF)
gtex_breast=master_v8_n[which(master_v8_n$tissue=="Breast_Mammary_Tissue"),]
write.csv(gtex_breast, '/n/home04/cdadams/MR-dysreg-RP-BC/gtex_breast.csv')

# Just read in the cleaned data 
gtex_breast=read.csv('/n/home04/cdadams/MR-dysreg-RP-BC/gtex_breast.csv')
gtex_breast$X=NULL

# Get the RP eqtls 
gtex_breast_RP=gtex_breast[which(gtex_breast$gene_name %in% rp_list_expanded$all_RP),]

# Select the P<5x10-06 eqtls
gtex_breast_RP_p_06=gtex_breast_RP[which(gtex_breast_RP$pval<0.000005),]

# Order
gtex_breast_RP_p_06=gtex_breast_RP_p_06[order(gtex_breast_RP_p_06$gene_name, decreasing=TRUE),]

# Remove duplicate entries
gtex_breast_RP_p_06Clean <- gtex_breast_RP_p_06[!duplicated(gtex_breast_RP_p_06$gene_name),]
gtex_breast_RP_p_06Clean$tissue="Breast"

write.csv(gtex_breast_RP_p_06Clean, '/n/home04/cdadams/MR-dysreg-RP-BC/gtex_breast_RP_p_06Clean.csv')

# MR format
# Use samplesize_col = 'position' to keep position. Change after.

exposure_dat <- read_exposure_data(
  filename = "/n/home04/cdadams/MR-dysreg-RP-BC/gtex_breast_RP_p_06Clean.csv",
  sep = ',',
  snp_col = 'SNP',
  beta_col = 'beta',
  se_col = 'se',
  effect_allele_col = 'effect_allele',
  phenotype_col = 'tissue',
  other_allele_col = 'other_allele',
  eaf_col = 'eaf',
  samplesize_col = 'position',
  gene_col = 'gene_name',
  pval_col = 'pval', 
)

# Format the colname names
exposure_dat$chr_name=exposure_dat$chr.exposure
exposure_dat$chrom_start=exposure_dat$samplesize.exposure
exposure_dat$samplesize.exposure=NULL
exposure_dat$gene_tissue=exposure_dat$exposure
exposure_dat$exposure="decreased RP expression in breast"
exposure_dat$id.exposure="decreased RP expression in breast"

# Select the decreased expression eQTLs
dat_decrease=exposure_dat[which(exposure_dat$beta.exposure<0),]

# Subset by just the cytosolic RPs
decrease_RPs <- dat_decrease[startsWith(as.character(dat_decrease$gene.exposure), 'R'),]

# Clump for LD
decrease_RPs_clump <- clump_data(decrease_RPs)
#decrease_RPs_clump <- clump_data(decrease_RPs, clump_r2=0.01)

# Extract the increased expression loci from BCAC
outcome_dat <- extract_outcome_data(decrease_RPs_clump$SNP, c('1126'),
  proxies = 1, rsq = 0.8, align_alleles = 1, palindromes = 1, maf_threshold = 0.3)
dat <- harmonise_data(decrease_RPs_clump, outcome_dat, action = 2)

# Run MR
mr_results<- mr(dat)
mr_results

# Detect and remove outliers (outliers can indicate pleiotropy)
raddat <- format_radial(BXG=dat$beta.exposure, BYG=dat$beta.outcome, 
  seBXG=dat$se.exposure, seBYG=dat$se.outcome, RSID=dat$SNP)
ivwrad <- ivw_radial(raddat, alpha=0.05, weights=1)
ivwrad2 <- ivw_radial(raddat, alpha=0.05, weights=1)
dim(ivwrad$outliers)[1] 
outliers=ivwrad$outliers[1]
outliers
myvars=outliers$SNP
dat2 <- dat[! dat$SNP %in% myvars, ]
dim(dat2) 
dim(dat)
dat2$outcome="breast cancer (BCAC)"
dat2$id.outcome="breast cancer (BCAC)"

# Run MR without outliers
mr_results <- mr(dat2)
mr_results

singlesnp_results=mr_singlesnp(dat2, parameters = default_parameters(),
  single_method = "mr_wald_ratio", 
  all_method = c("mr_ivw",
  "mr_egger_regression", 
  "mr_weighted_mode",
  "mr_weighted_median"))

# Remove the snps that weren't used in the MR to have a record of the data source
singlesnp_results.2=singlesnp_results[1:14,]
dim(singlesnp_results.2)
dat2.1=dat2[which(dat2$SNP %in% singlesnp_results.2$SNP),]
dim(dat2.1)
dim(dat)
dat=dat2.1
dim(dat)

# r2 and Fstat

n=480
k=14
p=dat$eaf.exposure
r2.1<-(2*(dat$beta.exposure^2)*p*(1-p))/1
r2=sum(r2.1, na.rm = TRUE)
r2 
Fstat <- r2*(n-1-k)/((1-r2)*k)
Fstat

# Combine results
sin=singlesnp_results
res=mr_results
het<-mr_heterogeneity(dat)
plt<-mr_pleiotropy_test(dat)

# Bug in function for combining; save separately
# all_res<-combine_all_mrresults(res,het,plt,sin,ao_slc=F,Exp=T,split.exposure=T,split.outcome=T)
# all_res$lower_intercept=all_res$intercept-1.96*all_res$intercept_se
# all_res$upper_intercept=all_res$intercept+1.96*all_res$intercept_se
# 
# head(all_res[,c("Method","outcome","exposure","nsnp","b","se","or", "or_lci95",  
#   "or_uci95","pval","intercept","intercept_se","intercept_pval",
#   "Q","Q_df","Q_pval")])
#write.csv(all_res,"combined_results_decreased_cytosolic_breast_v8.csv")

write.csv(mr_results,'res_decreased_cytosolic_breast_v8.csv')
write.csv(singlesnp_results,'singlesnp_decreased_cytosolic_breast_v8.csv')
write.csv(dat, 'decreased_cytosolic_breast_v8_SIMEX.csv')
write.csv(plt,'plt_decreased_cytosolic_breast_v8.csv')
write.csv(het,'het_decreased_cytosolic_breast_v8.csv')

0.4 Figures

The top plot displays the individual SNPs included in the analysis, their effect estimates and 95% confidence intervals, along with the meta-analyzed “causal” estimates (the term “causal” is used in MR to mean MR is less prone to confounding and reverse causation than observational designs).
The inverse-variance weighted (IVW) estimate is the approach that is used to test the causal effect. The weighted median, weighted mode, and MR-Egger are sensitivity estimators. If the directions and magnitudes of the sensitivity estimators’ findings comport with those of the IVW, this is evidence against pleiotropy. (The sensitivity estimators needn’t all be statistically signficiant. For instance, the MR-Egger estimate isn’t, but this only means that the MR-Egger test doesn’t provide additional support for the IVW. The direction and magnitude of the MR-Egger estimate does, however, comport with the IVW’s.)
The bottom (scatter) plot is a sensitivity analysis for heterogeneity in the effect estimates. It shows a) the that an increase in the expression of RPs that have SNPs that decrease their expression increased risk for breast cancer across all estimators (the rising regression lines), b) that the sensitivity estimators comport (the regression lines don’t crisscross), and c) that there aren’t substantial outliers (there are no substantial outliers, since I detected and removed them before running the final MR; outliers can indicate pleiotropy).

0.5 IVW & sensitivity estimator results

##                      method nsnp    or    se or_lci95 or_uci95  pval
## 1                  MR Egger   14 1.029 0.016    0.997    1.061 0.098
## 2           Weighted median   14 1.024 0.009    1.005    1.043 0.013
## 3 Inverse variance weighted   14 1.025 0.008    1.010    1.040 0.001
## 4               Simple mode   14 1.028 0.017    0.994    1.063 0.134
## 5             Weighted mode   14 1.024 0.009    1.006    1.042 0.020