Under conditions of nucleolar stress, various ribosomal proteins (RPs) interact with the MDM2 proto-oncogene (MDM2) to stabilize p53. Dysregulation of the RPs may disturb this pathway, but whether changes in the expression of the RPs influence risk for breast cancer in the general population is unknown. We tested whether expression changes in cytosolic or mitochondrial RPs influence risk for ER+ breast cancer using inverse-variance weighted (IVW) Mendelian randomization (MR)–an instrumental variables technique. Summary statistics for expression quantitative trait loci (eQTLs) for SNPs decreasing expression of RPs (P<5x10-06) served as instrumental variables for decreased RP expression. The summary statistics for the SNPs decreasing expression of the RPs were extracted from the Gene-Tissue Expression (GTEx) project’s (version 8 “eGene”) eQTL data in breast tissue. Summary statistics for these same SNPs were extracted from the Breast Cancer Association Consortium’s (BCAC) 2017 genome-wide association study (GWAS) of breast cancer (69501 ER+ cases and 105974 controls). After outliers were removed, summary statistics for 31 SNPs were used to ascertain whether any expression changes in RPs increase risk for ER+ breast cancer. This was done by calculating and meta-analyzing individual Wald ratios. Wald ratios were obtained by dividing the effect estimate for the impact of a given SNP on breast cancer by the effect estimate for the impact of the same SNP on RP expression. The meta-analyzed Wald ratios yield an overall causal (IVW) estimate that should be free from most sources of environmental confounding, so long as the assumptions for MR are not violated. Chief among the MR assumptions are that the genetic instruments (RP eQTLs) influence breast cancer only through the expression of the RPs and not through pleiotropic pathways. We performed sensitivity analyses, using the MR-Egger, weighted median, and weighted mode MR estimators to test for pleiotropy. The IVW estimate revealed an increased risk for ER+ breast cancer corresponding to an increase in expression of cytosolic and mitochondrial RPs, regardless of whether the effect alleles increased or decreased expression (odds ratio=1.012; 95% CI=1.002, 1.023; P=0.019). There was no evidence of pleiotropy. This implies that environmental factors that increase the expression of the RPs may increase risk for ER+ breast cancer.
# Create the directory to match the repository in GitHub
mkdir /n/home04/cdadams/MR-dysreg-GTEx-all-RP-BC-ERp-breast
cd /n/home04/cdadams/MR-dysreg-GTEx-all-RP-BC-ERp-breast
cd /n/home04/cdadams/Aim2/MR-dysreg-GTEx-all-RP-BC-ERp-breast
# MR any RP expression changes on ER+ BC
setwd('/n/home04/cdadams/Aim2/MR-dysreg-GTEx-all-RP-BC-ERp-breast')
# Hand-curated RP list (no pseudogenes): 150
rp_list_expanded=read.csv('/n/home04/cdadams/ld/ldsc/expanded_verified.csv')
rp_list_expanded
# Look and clean; highlighted out but very important
# master_v8_n=read.csv("/n/home04/cdadams/ld/ldsc/v8_master_tissues_n.csv")
# head(master_v8_n)
# master_v8_n$X=NULL
# colnames(master_v8_n)[2] <- "chromosome"
# colnames(master_v8_n)[31] <- "MAF"
# master_v8_n$eaf=ifelse(master_v8_n$ref_factor=='1', master_v8_n$MAF, 1-master_v8_n$MAF)
# gtex_breast=master_v8_n[which(master_v8_n$tissue=="Breast_Mammary_Tissue"),]
# write.csv(gtex_breast, '/n/home04/cdadams/Aim2/MR-dysreg-GTEx-all-RP-BC-ERp-breast/gtex_breast.csv')
gtex_breast=read.csv('/n/home04/cdadams/Aim2/MR-dysreg-GTEx-all-RP-BC-ERp-breast/gtex_breast.csv')
gtex_breast$X=NULL
str(gtex_breast)
str(rp_list_expanded)
rp_list_expanded$all_RP=as.factor(rp_list_expanded$all_RP)
gtex_breast$gene_name=as.factor(gtex_breast$gene_name )
gtex_breast_RP=gtex_breast[which(gtex_breast$gene_name %in% rp_list_expanded$all_RP),]
dim(gtex_breast_RP)
gtex_breast_RPClean <- gtex_breast_RP[!duplicated(gtex_breast_RP$gene_name),]
dim(gtex_breast_RPClean)
table(gtex_breast_RPClean$SNP)
gtex_breast_RPClean$gene_name=droplevels(gtex_breast_RPClean$gene_name)
freq1=table(gtex_breast_RPClean$gene_name)
freq1=as.data.frame(freq1)
freq1 #150!
dim(freq1)
sum(freq1$Freq) #150
# Get the RP eqtls
# Select the P<5x10-06 eqtls
gtex_breast_RPClean_p_06=gtex_breast_RPClean[which(gtex_breast_RPClean$pval<0.000005),]
freq2=table(gtex_breast_RPClean_p_06$gene_name)
freq2=as.data.frame(freq2)
freq2
sum(freq2$Freq) #69
# Order
gtex_breast_RPClean_p_06=gtex_breast_RPClean_p_06[order(gtex_breast_RPClean_p_06$gene_name, decreasing=TRUE),]
write.csv(gtex_breast_RPClean_p_06, '/n/home04/cdadams/Aim2/MR-dysreg-GTEx-all-RP-BC-ERp-breast/gtex_breast_RPClean_p_06.csv')
# MR format
# Use samplesize_col = 'position' to keep position. Change after.
exposure_dat <- read_exposure_data(
filename = "/n/home04/cdadams/Aim2/MR-dysreg-GTEx-all-RP-BC-ERp-breast/gtex_breast_RPClean_p_06.csv",
sep = ',',
snp_col = 'SNP',
beta_col = 'beta',
se_col = 'se',
effect_allele_col = 'effect_allele',
phenotype_col = 'tissue',
other_allele_col = 'other_allele',
eaf_col = 'eaf',
samplesize_col = 'position',
gene_col = 'gene_name',
pval_col = 'pval',
)
freq3=table(exposure_dat$gene.exposure)
freq3=as.data.frame(freq3)
dim(freq3)
sum(freq3$Freq)
freq3
# Format the colname names
exposure_dat$chr_name=exposure_dat$chr.exposure
exposure_dat$chrom_start=exposure_dat$samplesize.exposure
exposure_dat$samplesize.exposure=NULL
exposure_dat$gene_tissue=exposure_dat$exposure
exposure_dat$exposure="RP expression in breast"
exposure_dat$id.exposure="RP expression in breast"
# # Select the decreased expression eQTLs
# dat_decrease=exposure_dat[which(exposure_dat$beta.exposure<0),]
#
# # Subset by just the cytosolic RPs
# decrease_RPs <- dat_decrease[startsWith(as.character(dat_decrease$gene.exposure), 'R'),]
# Extract the increased expression loci from BCAC
outcome_dat <- extract_outcome_data(exposure_dat$SNP, c('1127'),
proxies = 1, rsq = 0.8, align_alleles = 1, palindromes = 1, maf_threshold = 0.3)
dat <- harmonise_data(exposure_dat, outcome_dat, action = 2)
freq4=table(dat$gene.exposure)
freq4=as.data.frame(freq4)
freq4
sum(freq4$Freq)
dim(freq4)
# Clump for LD
dat_clump <- clump_data(dat)
freq5=table(dat_clump$gene.exposure)
freq5=as.data.frame(freq5)
freq5
sum(freq5$Freq)
dim(freq5)
# Run MR
mr_results<- mr(dat_clump)
mr_results
dat=dat_clump
# Detect and remove outliers (outliers can indicate pleiotropy)
raddat <- format_radial(BXG=dat$beta.exposure, BYG=dat$beta.outcome,
seBXG=dat$se.exposure, seBYG=dat$se.outcome, RSID=dat$SNP)
ivwrad <- ivw_radial(raddat, alpha=0.05, weights=1)
ivwrad2 <- ivw_radial(raddat, alpha=0.05, weights=1)
dim(ivwrad$outliers)[1]
outliers=ivwrad$outliers[1]
outliers
myvars=outliers$SNP
dat2 <- dat[! dat$SNP %in% myvars, ]
dim(dat2)
dim(dat)
dat2$outcome="ER+ breast cancer (2017 BCAC)"
dat2$id.outcome="ER+ breast cancer (2017 BCAC)"
# Run MR without outliers
mr_results <- mr(dat2)
mr_results
singlesnp_results=mr_singlesnp(dat2, parameters = default_parameters(),
single_method = "mr_wald_ratio",
all_method = c("mr_ivw",
"mr_egger_regression",
"mr_weighted_mode",
"mr_weighted_median"))
# Remove the snps that weren't used in the MR to have a record of the data source
singlesnp_results.2=singlesnp_results[1:31,]
dim(singlesnp_results.2)
dat2.1=dat2[which(dat2$SNP %in% singlesnp_results.2$SNP),]
dim(dat2.1)
dim(dat)
dat=dat2.1
dim(dat)
# r2 and Fstat
n=480
k=31
summary(dat$eaf.exposure)
test=scale(dat$beta.exposure)
str(test)
test <- scale(dat$beta.exposure)
test3=as.data.frame(test)
dim(test3)
dat$scaled.dat=test3$V1
summary(dat$scaled.dat)
p=dat$eaf.exposure
r2.1<-(2*(dat$scaled.dat^2)*p*(1-p))/1 #dat$beta.exposure
r2=sum(r2.1, na.rm = TRUE)
r2
#r2=mean(r2.1, na.rm=TRUE)
#r2
Fstat <- r2*(n-1-k)/((1-r2)*k)
Fstat
# Combine results
sin=singlesnp_results
res=mr_results
het<-mr_heterogeneity(dat)
plt<-mr_pleiotropy_test(dat)
# Bug in function for combining; save separately
# all_res<-combine_all_mrresults(res,het,plt,sin,ao_slc=F,Exp=T,split.exposure=T,split.outcome=T)
# all_res$lower_intercept=all_res$intercept-1.96*all_res$intercept_se
# all_res$upper_intercept=all_res$intercept+1.96*all_res$intercept_se
#
# head(all_res[,c("Method","outcome","exposure","nsnp","b","se","or", "or_lci95",
# "or_uci95","pval","intercept","intercept_se","intercept_pval",
# "Q","Q_df","Q_pval")])
#write.csv(all_res,"combined_results_all_RP_MRP_breast_v8.csv")
write.csv(mr_results,'/n/home04/cdadams/Aim2/MR-dysreg-GTEx-all-RP-BC-ERp-breast/res_all_RP_breast_ERp_v8.csv')
write.csv(singlesnp_results,'/n/home04/cdadams/Aim2/MR-dysreg-GTEx-all-RP-BC-ERp-breast/singlesnp_all_RP_breast_ERp_v8.csv')
write.csv(dat, '/n/home04/cdadams/Aim2/MR-dysreg-GTEx-all-RP-BC-ERp-breast/all_RP_breast_v8_SIMEX_ERp.csv')
write.csv(plt,'/n/home04/cdadams/Aim2/MR-dysreg-GTEx-all-RP-BC-ERp-breast/plt_all_RP_breast_ERp_v8.csv')
write.csv(het,'/n/home04/cdadams/Aim2/MR-dysreg-GTEx-all-RP-BC-ERp-breast/het_all_RP_breast_v8_ERp.csv')
The top plot displays the individual SNPs included in the analysis, their effect estimates and 95% confidence intervals, along with the meta-analyzed “causal” estimates (the term “causal” is used in MR to mean MR is less prone to confounding and reverse causation than observational designs).
The inverse-variance weighted (IVW) estimate is the approach that is used to test the causal effect. The weighted median, weighted mode, and MR-Egger are sensitivity estimators. If the directions and magnitudes of the sensitivity estimators’ findings comport with those of the IVW, this is evidence against pleiotropy. (The sensitivity estimators needn’t all be statistically signficiant. For instance, the MR-Egger estimate isn’t, but this only means that the MR-Egger test doesn’t provide additional support for the IVW. The direction and magnitude of the MR-Egger estimate does, however, comport with the IVW’s.)
The bottom (scatter) plot is a sensitivity analysis for heterogeneity in the effect estimates.
FALSE Saving 8 x 5 in image
FALSE Saving 8 x 5 in image
## method nsnp or se or_lci95 or_uci95 pval
## 1 MR Egger 31 1.002 0.009 0.985 1.020 0.808
## 2 Weighted median 31 1.006 0.007 0.993 1.020 0.347
## 3 Inverse variance weighted 31 1.012 0.005 1.002 1.023 0.019
## 4 Simple mode 31 1.015 0.011 0.993 1.038 0.186
## 5 Weighted mode 31 1.009 0.006 0.997 1.022 0.166