Author: Charleen D. Adams

Abbreviations and definitions

Background

GTEx coded their eQTL analysis such that the effect allele is the alternate allele. But the alternate allele is not always the minor allele. The effect sizes (“betas”) for the eQTLs refer to the slope of the linear regressions “computed as the effect of the alternative allele (ALT) relative to the reference allele (REF) in the human genome reference (i.e., the eQTL effect allele is the ALT allele)” – which is not always the minor allele.

In order to know which of the alleles is the true minor allele, I needed to obtain annotation information for the SNPs.

Goal

Total number of eQTLs across tissues (threshold qval<0.05 & P<5x10-6) and number of unique eQTLs

# Total number eQTLs across the RP genes 
dim(stringent_RP_sig_beta)
## [1] 2418   45
# Number of unique eQTLs
snp=unique(stringent_RP_sig_beta$SNP)
length(snp)
## [1] 996

Annotation from SNPnexus

SNPnexus is an online annotation tool for SNPs (with an option for GRCh38/hg38).

I entered our list of 996 unique eQTLs (threshold qval<0.05 & P<5x10-6) into SNPnexus and obtained annotation data for which allele is the reference, alternate, and minor, along with the minor allele frequencey (MAF).

(Side note: Separate SNPnexus annotation files also provide genomic consequence and deleteriosness for these SNPs. I can make plots with this data later…)

Determine EAF

# Read in the dummy copy of the SNPnexus results called my_data2
my_data2 <- read.table("C:/Users/charl/Dropbox/Harvard/ribosomal_proteins/Main_Tables_Paper/v8/gen_coords_v8_unique.txt", 
  header=TRUE, sep='\t',stringsAsFactors = FALSE)
# How many of the unique eQTLs can we annotate?

dim(my_data2)
## [1] 985  11
985/996 #99%
## [1] 0.9889558
# Create a dummy variable called "match" for whether the minor allele is the reference allele (1=yes)
my_data2<-within(my_data2, match <- ifelse (Minor.Allele == REF.Allele,1,0))
table(my_data2$match)
## 
##   0   1 
## 740 245
# Create the EAF variable by reasoning that if the minor allele is the references allele, then
# 1) the EAF is 1-MAF for our data 
# 2) otherwise, the EAF is the MAF

my_data2$Minor.Allele.Global.Frequency=as.numeric(my_data2$Minor.Allele.Global.Frequency)
my_data2$EAF <- my_data2$Minor.Allele.Global.Frequency
my_data2$EAF <- ifelse(my_data2$match=="1", 1-my_data2$Minor.Allele.Global.Frequency, 
  my_data2$Minor.Allele.Global.Frequency)

my_data2$dbSNP=as.character(my_data2$dbSNP)

# sqldf to merge (ignore warning)
merged=sqldf("select * 
  from stringent_RP_sig_beta left join my_data2 
  on stringent_RP_sig_beta.SNP = my_data2.dbSNP", 
  row.names = TRUE)
merged_EAF <- merged[order(merged$SNP),]

Summary of EAF for GTEx ribosomal protein eQTLs

hist(merged$EAF, col = c("green"),
     main="Histogram of EAF values for ribosomal protein eQTLs")

summary(merged$EAF)
##    Min. 1st Qu.  Median    Mean 3rd Qu.    Max.    NA's 
##  0.0024  0.1216  0.2919  0.3279  0.5132  0.9740     139

Interactive table of the primary RP eQTL data (selected variables, including EAF)

In the table, “effect_allele” refers to the alternate allele in GTEx. The “other_allele” is the reference allele. The variable “match” signifies “1” when the reference allele is the minor allele and “0” when the alternate allele is the minor allele.

library(DT)
my_table_cols=c('tissue','n','position','chr','gene_name','SNP', 'EAF','match','effect_allele', 'other_allele','beta','pval', 'qval','se' )
interactive_merged=merged[my_table_cols]

datatable(interactive_merged, options = list(pageLength = 5))

Commentary on eQTL interactive table

Data cleaning: one SNP in GTEx had two rsids

merged_EAF=merged

#### One SNP has two rsids: change to the rsid that has MAF info: rs12012747
merged_EAF$SNP=as.character(merged_EAF$SNP)
merged_EAF$SNP=ifelse(merged_EAF$SNP=='rs4909,rs12012747', 'rs12012747', merged_EAF$SNP)
merged_EAF$match=ifelse(merged_EAF$SNP=='rs12012747', 0, merged_EAF$match)
merged_EAF$EAF=ifelse(merged_EAF$SNP=='rs12012747', 0.331126, merged_EAF$EAF)

### Rename the 'eaf' variable in the master file to reflect that it is the MAF in GTEx (not the actual EAF; the actual EAF is labled 'EAF')
colnames(merged_EAF)[31] <- "maf_in_GTEx"
write.csv(merged_EAF, 'merged_EAF.csv')

Acknowledgement language for manuscript

‘The Genotype-Tissue Expression (GTEx) Project was supported by the Common Fund of the Office of the Director of the National Institutes of Health, and by NCI, NHGRI, NHLBI, NIDA, NIMH, and NINDS. The data used for the analyses described in this manuscript were obtained from: the GTEx Portal on 04/01/2020’

Github

I’ve started a Github repository to organize the files and scripts for this project. If Github refuses to open through the link, copy this link into your browser: https://github.com/charleendadams/ribosomal_protein_eQTLs