library(ChIPpeakAnno)
library(ChIPseeker)
NFE2L2 <- readPeakFile("C:/Users/user/Desktop/BishopLab/ENCFF418TUX.bed.gz", as ="GRanges")
BRCA1 <- readPeakFile("C:/Users/user/Desktop/BishopLab/ENCFF791SNR.bed.gz", as = "GRanges")
Peaks for NFE2L2 and BCRA1 were downloaded from ENCODE.
I’ll create a venn diagram between binding sites of the two sets of data, then annotate the peaks for both and use upset plots and venn pie diagrams to explore the distributions of the kinds of binding sites in each data set.
ol <- findOverlapsOfPeaks(BRCA1, NFE2L2)
duplicated or NA names found.
Rename all the names by numbers.
duplicated or NA names found.
Rename all the names by numbers.
makeVennDiagram(ol,
fill=c("#009E73", "#F0E442"),
col=c("#D55E00", "#0072B2"),
cat.col=c("#D55E00", "#0072B2"))
Missing totalTest! totalTest is required for HyperG test.
If totalTest is missing, pvalue will be calculated by estimating
the total binding sites of encoding region of human.
totalTest = humanGenomeSize * (2%(codingDNA) +
1%(regulationRegion)) / ( 2 * averagePeakWidth )
= 3.3e+9 * 0.03 / ( 2 * averagePeakWidth)
= 5e+7 /averagePeakWidth
$p.value
BRCA1 NFE2L2 pval
[1,] 1 1 1
$vennCounts
BRCA1 NFE2L2 Counts count.BRCA1 count.NFE2L2
[1,] 0 0 0 0 0
[2,] 0 1 7055 0 7055
[3,] 1 0 9317 9317 0
[4,] 1 1 199 237 201
attr(,"class")
[1] "VennCounts"
199 Binding sites are shared amongst the two data sets.
library(TxDb.Hsapiens.UCSC.hg38.knownGene)
txdb <- TxDb.Hsapiens.UCSC.hg38.knownGene
promoter <- getPromoters(TxDb=txdb, upstream=3000, downstream=3000)
tagMatrix_BRCA1 <- getTagMatrix(BRCA1, windows = promoter)
tagMatrix_NFE2L2 <- getTagMatrix(NFE2L2, windows = promoter)
plotAvgProf(tagMatrix_NFE2L2, xlim=c(-3000, 3000), xlab="Genomic Region (5'->3') NFE2L2")
plotAvgProf(tagMatrix_BRCA1, xlim=c(-3000, 3000), xlab="Genomic Region (5'->3') BRCA1")
The Human hg38 genome from UCSC were used for annotations.
Both NFE2L2 and BCRA1 are shown to bind close to the transcription start site(TSS). This makes sense for both of these protiens.
NFE2L2 is a known transcription factor(TF) which regulates protiens that protect against oxidative stress caused by inflammation (1). The peak frequencies would indicate that these cis-acting elements are very close to the TSS.
BCRA1 is another known TF which complexes with other protiens to form the BRCA1-associated genome surveillance complex (BASC). BASC, through the BCRA1 protien then acts upon RNA-poleymerase 2 (RNA-p2), giving the protien a regulatory role in transcription (2).
NFE2L2_anno <- annotatePeak(NFE2L2, tssRegion=c(-750, 750),TxDb=txdb)
>> preparing features information... 2021-01-15 3:16:36 PM
>> identifying nearest features... 2021-01-15 3:16:36 PM
>> calculating distance from peak to TSS... 2021-01-15 3:16:37 PM
>> assigning genomic annotation... 2021-01-15 3:16:37 PM
>> assigning chromosome lengths 2021-01-15 3:16:41 PM
>> done... 2021-01-15 3:16:41 PM
BRCA1_anno <- annotatePeak(BRCA1, tssRegion=c(-750, 750),TxDb=txdb)
>> preparing features information... 2021-01-15 3:16:41 PM
>> identifying nearest features... 2021-01-15 3:16:41 PM
>> calculating distance from peak to TSS... 2021-01-15 3:16:41 PM
>> assigning genomic annotation... 2021-01-15 3:16:41 PM
>> assigning chromosome lengths 2021-01-15 3:16:46 PM
>> done... 2021-01-15 3:16:46 PM
upsetplot(NFE2L2_anno,vennpie=TRUE)
upsetplot(BRCA1_anno,vennpie=TRUE)
TSS regions for NFE2L2 and BCRA1 were chosen based on the metaplot.
Two main groups stand out for NFE2L2; genic intron regions and intergenic regions (including distal ones). Other groupings are much less prevelant compared to these two. Knowing NFE2L2 is a TF and binds to elements close to the TSS would help make sense of this as it would bind between coding regions of the DNA on promoter elements.
BCRA1, by contrast, has several groupings with potential relevance - I will discuss 3 of the most prominent. The first two of these are shared with NFE2L2 and make sense for the previously mentioned reasons. The next most prevelant goup is one which overlaps genic, promoter, exon and five prime UTR regions. Given that BCRA1 has a regulatory role in transcription by binding to RNA-p2, this grouping makes sense, as BCRA1 would initially bind to a promoter very close to the TSS and have some relation to the capping of the synthesized mRNA (which occurs co-transcriptionally). I am not sure of how to interpret groupings which overlap both intron and exonic regions.
library(ReactomePA)
library(org.Hs.eg.db)
NFE2L2_path <- enrichPathway(as.data.frame(NFE2L2_anno)$geneId)
BRCA1_path <- enrichPathway(as.data.frame(BRCA1_anno)$geneId)
dotplot(NFE2L2_path, title = "NFE2L2 Pathways")
wrong orderBy parameter; set to default `orderBy = "x"`
dotplot(BRCA1_path, title = "BRCA1 Pathways")
wrong orderBy parameter; set to default `orderBy = "x"`
Per the dot plots, NFE2L2 and BRCA1 do not share any pathways. Howver, analyzing their individual pathways may provide some insight for where to look when conducting further work into NFE2L2 and BCRA1’s relationship
Two of NFE2L2’s more statistically probable (according to their p-values) pathways are related to MET signalling. The MET signalling pathway is closely associated with oncogenises. Met is a signal transduction protien, which when activaed, sends a signal cascade which promotes cell growth. Upregulation of the Met protien can result in tumor growth. One factor which my lead to this upregulation are hypoxic conditions (3). Knowing that NFE2L2 regulates genes whos products protect against oxidative stress, perhaps NFE2L2 plays a role in combating tumor growth by interfereing with MET signalling - via the protiens which it regulates. Knowing that BRCA1 aids in tumor suppression through transcription regulation, exploring NFE2L2’s relationship to Met signaling and where BRCA1 may interact in this relationship (particularly in breast tissue) may provide an insightful path for further work.
An interesting observation from BRCA1’s plot is the absence of any RNA-p2 interaction and instead, much evidence for RNA polymerase 3 and ribosome interaction. Pathways which are listed here relate to RNA polymerase 3 transcription as well as rRNA processing. If BRCA1 interacts with the latter at the level of transcription, perhaps the protien is able to interact with all three eukaryotic forms of RNA-polymerase. A study on BRCA1 and it’s interactions with RNA polymerase 1 and 3 through ChIP-seq analysis (4) supports the previously mentioned hypothesis and goes further by stating that in certain cell types, depletion of the protien led to a down regulation of rRNA transcription, with the inverse also being true. high levels of ribosome activiy and rRNA transcription are clear indications of tumor growth. From my own quick ChIP-seq analysis and this study, its made clear that BRCA1’s role in oncogenisis extends beyond transcriptional control, going so far as to influence translation.
Now knowing this, another avenue of research could look into the levels of expression for BRCA1 in multiple other tissue types - both healthy and cancerous.