This is a manual for the developmental version of Travis R package
Install exomePeak, this will install a few useful packages from Bioconductor
source("http://bioconductor.org/biocLite.R")
biocLite("exomePeak")
Install ggplot2, devtools and then Travis package
install.packages("ggplot2")
install.packages("devtools")
library(devtools)
install_github('lzcyzm/Travis_Dev')
load Travis package
library(Travis)
## Loading required package: Rsamtools
## Loading required package: S4Vectors
## Loading required package: stats4
## Loading required package: BiocGenerics
## Loading required package: parallel
##
## Attaching package: 'BiocGenerics'
##
## The following objects are masked from 'package:parallel':
##
## clusterApply, clusterApplyLB, clusterCall, clusterEvalQ,
## clusterExport, clusterMap, parApply, parCapply, parLapply,
## parLapplyLB, parRapply, parSapply, parSapplyLB
##
## The following objects are masked from 'package:stats':
##
## IQR, mad, xtabs
##
## The following objects are masked from 'package:base':
##
## anyDuplicated, append, as.data.frame, as.vector, cbind,
## colnames, do.call, duplicated, eval, evalq, Filter, Find, get,
## grep, grepl, intersect, is.unsorted, lapply, Map, mapply,
## match, mget, order, paste, pmax, pmax.int, pmin, pmin.int,
## Position, rank, rbind, Reduce, rep.int, rownames, sapply,
## setdiff, sort, table, tapply, union, unique, unlist, unsplit
## Warning in .recacheSubclasses(def@className, def, doSubclasses, env):
## undefined subclass "externalRefMethod" of class "expressionORfunction";
## definition not updated
## Loading required package: IRanges
## Warning in .recacheSubclasses(def@className, def, doSubclasses, env):
## undefined subclass "externalRefMethod" of class "functionORNULL";
## definition not updated
## Loading required package: GenomeInfoDb
## Loading required package: GenomicRanges
## Loading required package: XVector
## Loading required package: Biostrings
## Loading required package: GenomicFeatures
## Loading required package: AnnotationDbi
## Loading required package: Biobase
## Welcome to Bioconductor
##
## Vignettes contain introductory material; view with
## 'browseVignettes()'. To cite Bioconductor, see
## 'citation("Biobase")', and for packages 'citation("pkgname")'.
##
## Loading required package: rtracklayer
## Loading required package: GenomicAlignments
## Loading required package: SummarizedExperiment
## Loading required package: ggplot2
Find the toy data files came with Travis package
bed3=system.file("extdata", "H3K4me3_mm10_1000peaks.bed", package="Travis")
bed12=system.file("extdata", "m6A_mm10_exomePeak_1000peaks_bed12.bed", package="Travis")
bam=system.file("extdata", "SRR568477_mm10_XIST.bam", package="Travis")
narrowPeak=system.file("extdata", "m6A_hg19_1000peaks_macs2.narrowPeak", package="Travis")
Read 4 different kinds of Genomic Features into R
H3K4me3 <- import.bed(bed3) # bed3 imported as GRanges
m6A_HepG2 <- BED12toGRangesList(bed12) # bed12 imported as GRangesList
## [1] "Converting BED12 to GRangesList"
## [1] "It may take a few minutes"
Chip <- readGAlignments(bam) # bam imported as GAlignments
m6A_Bcell <- narrowPeaktoGRanges(narrowPeak) # bam imported as GAlignments
Put everything in a list with names. Make sure to be aware of the genome assembly used for each genomic feature set. Put different genome assembly features in different lists.
feature_mm10 <- list(H3K4me3,m6A_HepG2,Chip)
names(feature_mm10) <- c("H3K4me3", "m6A_HepG2", "Chip")
feature_hg19 <- list(m6A_Bcell)
names(feature_hg19) <- c("m6A_Bcell")
Show the transcriptomic view of 4 sets of genomic features
TravisPlot(feature_mm10,
genome="mm10",
saveToPDFprefix = "Toy")
Gene annotation will be downloaded automatically, and A PDF figure with prefix “Toy” will be saved in the current working directory, showing the transcriptomic distribution of genomic features.
It takes a few minutes to download Transcriptome information and build TravisCoodinatesFromTxDb, so we may want to avoid them to save time. We may install Transcriptome information database with R package:
source("http://bioconductor.org/biocLite.R")
biocLite("TxDb.Hsapiens.UCSC.hg19.knownGene") # install TranscriptDb for hg19
biocLite("TxDb.Mmusculus.UCSC.mm10.knownGene") # install TranscriptDb for mm10
and then, load the transcriptome information with
library(TxDb.Hsapiens.UCSC.hg19.knownGene)
hg19_txdb <- TxDb.Hsapiens.UCSC.hg19.knownGene
library(TxDb.Mmusculus.UCSC.mm10.knownGene)
mm10_txdb <- TxDb.Mmusculus.UCSC.mm10.knownGene
Alternatively, txdb can be downloaded online as well, which will take a few minutes:
hg19_txdb <- makeTxDbFromUCSC(genome="hg19")
mm10_txdb <- makeTxDbFromUCSC(genome="mm10")
We may then buid Travis coordinates
hg19_TravisCoordsFromTxDb <- makeTravisCoordsFromTxDb(hg19_txdb) # TravisCoordinates generated
## [1] "total 82960 transcripts extracted ..."
## [1] "total 47358 transcripts left after ambiguity filter ..."
## [1] "total 20302 mRNAs left after component length filter ..."
## [1] "total 8184 ncRNAs left after ncRNA length filter ..."
## [1] "Building Travis Coordinates. It may take a few minutes ..."
## [1] "Travis Coordinates Built ..."
mm10_TravisCoordsFromTxDb <- makeTravisCoordsFromTxDb(mm10_txdb) # TravisCoordinates generated
## [1] "total 61642 transcripts extracted ..."
## [1] "total 48644 transcripts left after ambiguity filter ..."
## [1] "total 18933 mRNAs left after component length filter ..."
## [1] "total 6819 ncRNAs left after ncRNA length filter ..."
## [1] "Building Travis Coordinates. It may take a few minutes ..."
## [1] "Travis Coordinates Built ..."
with TravisCoordinatesFromTxDb we can build TravisPlot.
TravisPlot(feature_mm10,
TravisCoordsFromTxDb = mm10_TravisCoordsFromTxDb,
saveToPDFprefix = "mm10")
## [1] "Using provided Travis Coordinates"
## [1] "Figures saved into mm10_Travis.pdf ..."
## Warning: Removed 5793 rows containing non-finite values (stat_density).
## Warning in density.default(data$x, adjust = adjust, kernel = kernel, weight
## = data$weight, : sum(weights) != 1 -- will not get true density
## Warning: Removed 994 rows containing non-finite values (stat_density).
## Warning in density.default(data$x, adjust = adjust, kernel = kernel, weight
## = data$weight, : sum(weights) != 1 -- will not get true density
## Warning: Removed 55 rows containing non-finite values (stat_density).
## Warning in density.default(data$x, adjust = adjust, kernel = kernel, weight
## = data$weight, : sum(weights) != 1 -- will not get true density
## Warning: Removed 1158 rows containing non-finite values (stat_density).
## Warning in density.default(data$x, adjust = adjust, kernel = kernel, weight
## = data$weight, : sum(weights) != 1 -- will not get true density
## Warning: Removed 324 rows containing non-finite values (stat_density).
## Warning in density.default(data$x, adjust = adjust, kernel = kernel, weight
## = data$weight, : sum(weights) != 1 -- will not get true density
## Warning: Removed 181 rows containing non-finite values (stat_density).
## Warning in density.default(data$x, adjust = adjust, kernel = kernel, weight
## = data$weight, : sum(weights) != 1 -- will not get true density
TravisPlot(feature_hg19,
TravisCoordsFromTxDb = hg19_TravisCoordsFromTxDb,
saveToPDFprefix = "hg19")
## [1] "Using provided Travis Coordinates"
## [1] "Figures saved into hg19_Travis.pdf ..."
## Warning: Removed 2115 rows containing non-finite values (stat_density).
## Warning in density.default(data$x, adjust = adjust, kernel = kernel, weight
## = data$weight, : sum(weights) != 1 -- will not get true density
## Warning: Removed 926 rows containing non-finite values (stat_density).
## Warning in density.default(data$x, adjust = adjust, kernel = kernel, weight
## = data$weight, : sum(weights) != 1 -- will not get true density
two PDF figures with prefix “mm10” and “hg19” will be saved in the current working directory.. Make sure the genome assembly used is correct.
The Travis can cover neighborhood DNA regions as well (5k promoter and 5k tail) with option includeNeighborDNA. This option may be useful when examine transcription related genomic features that directly located on DNA rather than RNA.
TravisPlot(feature_mm10,
TravisCoordsFromTxDb = mm10_TravisCoordsFromTxDb,
saveToPDFprefix = "mm10_withDNA",
includeNeighborDNA =TRUE)
## [1] "Using provided Travis Coordinates"
## [1] "Figures saved into mm10_withDNA_Travis.pdf ..."
## Warning in density.default(data$x, adjust = adjust, kernel = kernel, weight
## = data$weight, : sum(weights) != 1 -- will not get true density
## Warning in density.default(data$x, adjust = adjust, kernel = kernel, weight
## = data$weight, : sum(weights) != 1 -- will not get true density
## Warning in density.default(data$x, adjust = adjust, kernel = kernel, weight
## = data$weight, : sum(weights) != 1 -- will not get true density
## Warning in density.default(data$x, adjust = adjust, kernel = kernel, weight
## = data$weight, : sum(weights) != 1 -- will not get true density
## Warning in density.default(data$x, adjust = adjust, kernel = kernel, weight
## = data$weight, : sum(weights) != 1 -- will not get true density
## Warning in density.default(data$x, adjust = adjust, kernel = kernel, weight
## = data$weight, : sum(weights) != 1 -- will not get true density
You may also return the count information and customized the plot; however, this function is not well supported so far by Travis package, so it is not recommended.
f <-system.file("extdata", "RNAm5C_mm10_10000sites_bismark.cov", package="Travis")
a <- read.table(f,header=FALSE,sep="\t")
q <- a[[5]]
size <- a[[5]]+a[[6]]
d0 <- (pbinom(q, size, 0.05, lower.tail = TRUE, log.p = FALSE) < 0.05)
d1 <- (pbinom(q, size, 0.05, lower.tail = FALSE, log.p = FALSE) < 0.05)
GR <- GRanges(seqnames = a[,1], ranges = IRanges(start=a[,2], width=1))
peak <- list(GR[d0],GR[d1])
names(peak) <- c("<5%", ">5%")
ct <- TravisPlot(peak,
TravisCoordsFromTxDb=mm10_TravisCoordsFromTxDb,
saveToPDFprefix="RNAm5C_mm10",
returnCount=TRUE)
## [1] "Using provided Travis Coordinates"
## [1] "Figures saved into RNAm5C_mm10_Travis.pdf ..."
## Warning: Removed 102 rows containing non-finite values (stat_density).
## Warning in density.default(data$x, adjust = adjust, kernel = kernel, weight
## = data$weight, : sum(weights) != 1 -- will not get true density
## Warning: Removed 13 rows containing non-finite values (stat_density).
## Warning in density.default(data$x, adjust = adjust, kernel = kernel, weight
## = data$weight, : sum(weights) != 1 -- will not get true density
## Warning: Removed 40 rows containing non-finite values (stat_density).
## Warning in density.default(data$x, adjust = adjust, kernel = kernel, weight
## = data$weight, : sum(weights) != 1 -- will not get true density
## Warning: Removed 15 rows containing non-finite values (stat_density).
## Warning in density.default(data$x, adjust = adjust, kernel = kernel, weight
## = data$weight, : sum(weights) != 1 -- will not get true density
ct1 <- ct[["mRNA_normalized"]]
p1 <- ggplot(ct1, aes(x=pos, group=Feature, weight=5*weight)) +
ggtitle("Distribution on mRNA") +
theme(axis.ticks = element_blank(), axis.text.x = element_blank()) +
xlab("") +
ylab("Frequency") +
geom_density(adjust=0.5,aes(fill=factor(Feature)),alpha=0.2,position="fill") +
annotate("text", x = 1.5, y = -0.1, label = "5'UTR", size=3) +
annotate("text", x = 0.5, y = -0.1, label = "Promoter (5kb)", size=3) +
annotate("text", x = 2.5, y = -0.1, label = "CDS", size=3) +
annotate("text", x = 3.5, y = -0.1, label = "3'UTR", size=3) +
annotate("text", x = 4.5, y = -0.1, label = "Tail (5kb)", size=3) +
xlim(0,5)
p1
## Warning in density.default(data$x, adjust = adjust, kernel = kernel, weight
## = data$weight, : sum(weights) != 1 -- will not get true density
## Warning in density.default(data$x, adjust = adjust, kernel = kernel, weight
## = data$weight, : sum(weights) != 1 -- will not get true density
ggsave(filename="stack_RNAm5C_mm10.pdf",plot=p1,width=5, height=3)
## Warning in density.default(data$x, adjust = adjust, kernel = kernel, weight
## = data$weight, : sum(weights) != 1 -- will not get true density
## Warning in density.default(data$x, adjust = adjust, kernel = kernel, weight
## = data$weight, : sum(weights) != 1 -- will not get true density
# example 4
f <-system.file("extdata", "DNAm5C_mm10_10000sites_bismark.cov", package="Travis")
a <- read.table(f,header=FALSE,sep="\t")
q <- a[[5]]
size <- a[[5]]+a[[6]]
d0 <- (pbinom(q, size, 0.5, lower.tail = TRUE, log.p = FALSE) < 0.05)
d1 <- (pbinom(q, size, 0.5, lower.tail = FALSE, log.p = FALSE) < 0.05)
GR <- GRanges(seqnames = a[,1], ranges = IRanges(start=a[,2], width=1))
peak <- list(GR[d0],GR[d1])
names(peak) <- c("<50%", ">50%")
ct <- TravisPlot(peak,
TravisCoordsFromTxDb=mm10_TravisCoordsFromTxDb,
saveToPDFprefix="DNAm5C_mm10",
returnCount=TRUE)
## [1] "Using provided Travis Coordinates"
## [1] "Figures saved into DNAm5C_mm10_Travis.pdf ..."
## Warning: Removed 184 rows containing non-finite values (stat_density).
## Warning in density.default(data$x, adjust = adjust, kernel = kernel, weight
## = data$weight, : sum(weights) != 1 -- will not get true density
## Warning: Removed 389 rows containing non-finite values (stat_density).
## Warning in density.default(data$x, adjust = adjust, kernel = kernel, weight
## = data$weight, : sum(weights) != 1 -- will not get true density
## Warning: Removed 46 rows containing non-finite values (stat_density).
## Warning in density.default(data$x, adjust = adjust, kernel = kernel, weight
## = data$weight, : sum(weights) != 1 -- will not get true density
## Warning: Removed 69 rows containing non-finite values (stat_density).
## Warning in density.default(data$x, adjust = adjust, kernel = kernel, weight
## = data$weight, : sum(weights) != 1 -- will not get true density
ct2 <- ct[["ncRNA_normalized"]]
p2 <- ggplot(ct2, aes(x=pos, group=Feature, weight=3*weight)) +
ggtitle("Distribution on mRNA") +
theme(axis.ticks = element_blank(), axis.text.x = element_blank()) +
xlab("") +
ylab("Frequency") +
geom_density(adjust=0.5,aes(fill=factor(Feature)),alpha=0.2,position="fill") +
annotate("text", x = 6.5, y = -0.1, label = "Promoter (5kb)", size=3) +
annotate("text", x = 7.5, y = -0.1, label = "lncRNA", size=3) +
annotate("text", x = 8.5, y = -0.1, label = "Tail (5kb)", size=3) +
xlim(6,9)
p2
## Warning in density.default(data$x, adjust = adjust, kernel = kernel, weight
## = data$weight, : sum(weights) != 1 -- will not get true density
## Warning in density.default(data$x, adjust = adjust, kernel = kernel, weight
## = data$weight, : sum(weights) != 1 -- will not get true density
ggsave(filename="stack_DNAm5C_mm10.pdf",plot=p2,width=5, height=3)
## Warning in density.default(data$x, adjust = adjust, kernel = kernel, weight
## = data$weight, : sum(weights) != 1 -- will not get true density
## Warning in density.default(data$x, adjust = adjust, kernel = kernel, weight
## = data$weight, : sum(weights) != 1 -- will not get true density
These are 8 sections in for an “TravisCoordsFromTxDb” object. Let us see how much they overlap with each other.
TravisCoords <- hg19_TravisCoordsFromTxDb
type <- paste(mcols(TravisCoords)$comp,mcols(TravisCoords)$category)
key <- unique(type)
landmark <- list(1,2,3,4,5,6,7,8)
names(landmark) <- key
for (i in 1:length(key)) {
landmark[[i]] <- TravisCoords[type==key[i]]
}
TravisPlot(landmark[1:5],TravisCoordsFromTxDb=TravisCoords,saveToPDFprefix="Landmark_hg19_mRNA",includeNeighborDNA =TRUE)
## [1] "Using provided Travis Coordinates"
## [1] "Figures saved into Landmark_hg19_mRNA_Travis.pdf ..."
## Warning in density.default(data$x, adjust = adjust, kernel = kernel, weight
## = data$weight, : sum(weights) != 1 -- will not get true density
## Warning in density.default(data$x, adjust = adjust, kernel = kernel, weight
## = data$weight, : sum(weights) != 1 -- will not get true density
## Warning in density.default(data$x, adjust = adjust, kernel = kernel, weight
## = data$weight, : sum(weights) != 1 -- will not get true density
## Warning in density.default(data$x, adjust = adjust, kernel = kernel, weight
## = data$weight, : sum(weights) != 1 -- will not get true density
## Warning in density.default(data$x, adjust = adjust, kernel = kernel, weight
## = data$weight, : sum(weights) != 1 -- will not get true density
## Warning in density.default(data$x, adjust = adjust, kernel = kernel, weight
## = data$weight, : sum(weights) != 1 -- will not get true density
## Warning in density.default(data$x, adjust = adjust, kernel = kernel, weight
## = data$weight, : sum(weights) != 1 -- will not get true density
## Warning in density.default(data$x, adjust = adjust, kernel = kernel, weight
## = data$weight, : sum(weights) != 1 -- will not get true density
## Warning in density.default(data$x, adjust = adjust, kernel = kernel, weight
## = data$weight, : sum(weights) != 1 -- will not get true density
## Warning in density.default(data$x, adjust = adjust, kernel = kernel, weight
## = data$weight, : sum(weights) != 1 -- will not get true density
TravisPlot(landmark[6:8],TravisCoordsFromTxDb=TravisCoords,saveToPDFprefix="Landmark_hg19_ncRNA",includeNeighborDNA =TRUE)
## [1] "Using provided Travis Coordinates"
## [1] "Figures saved into Landmark_hg19_ncRNA_Travis.pdf ..."
## Warning in density.default(data$x, adjust = adjust, kernel = kernel, weight
## = data$weight, : sum(weights) != 1 -- will not get true density
## Warning in density.default(data$x, adjust = adjust, kernel = kernel, weight
## = data$weight, : sum(weights) != 1 -- will not get true density
## Warning in density.default(data$x, adjust = adjust, kernel = kernel, weight
## = data$weight, : sum(weights) != 1 -- will not get true density
## Warning in density.default(data$x, adjust = adjust, kernel = kernel, weight
## = data$weight, : sum(weights) != 1 -- will not get true density
## Warning in density.default(data$x, adjust = adjust, kernel = kernel, weight
## = data$weight, : sum(weights) != 1 -- will not get true density
## Warning in density.default(data$x, adjust = adjust, kernel = kernel, weight
## = data$weight, : sum(weights) != 1 -- will not get true density
We can roughly see that, the mRNA components are evenly distributed on mRNA, and the lncRNA components are evenly distributed on lncRNA. The distribution on mRNA and lncRNA are standardized independently, the above code doesn’t support the overlapping between mRNA components and lncRNA components.
The “TravisCoordsFromTxDb” object contains the genome-projected transcriptome coordinates, which can be valuable for evaluating transcriptomic information related applications, such as checking the quality of MeRIP-Seq data. The Travis coordinates are stored in the following format.
head(TravisCoords)
## GRanges object with 6 ranges and 4 metadata columns:
## seqnames ranges strand | txid pos
## <Rle> <IRanges> <Rle> | <factor> <numeric>
## uc010nxq.1 chr1 [ 6874, 6924] + | uc010nxq.1 0.005
## uc009vjk.2 chr1 [ 317037, 317087] + | uc009vjk.2 0.005
## uc001aau.3 chr1 [ 318892, 318942] + | uc001aau.3 0.005
## uc021oeh.1 chr1 [ 319288, 319338] + | uc021oeh.1 0.005
## uc021oei.1 chr1 [ 322546, 322596] + | uc021oei.1 0.005
## uc001acv.3 chr1 [1067397, 1067447] + | uc001acv.3 0.005
## comp category
## <factor> <factor>
## uc010nxq.1 Front mRNA
## uc009vjk.2 Front mRNA
## uc001aau.3 Front mRNA
## uc021oeh.1 Front mRNA
## uc021oei.1 Front mRNA
## uc001acv.3 Front mRNA
## -------
## seqinfo: 52 sequences from an unspecified genome; no seqlengths
It might be necessary to compare the difference bewteen results returned from makeTravisCoordsFromTxDb and makeTravisCoordsFromGRangesList. We may not need to differentiate different components on mRNA, i.e., 5’UTR, CDS, and 3’UTR.
sessionInfo()
## R version 3.2.1 (2015-06-18)
## Platform: x86_64-w64-mingw32/x64 (64-bit)
## Running under: Windows 8 x64 (build 9200)
##
## locale:
## [1] LC_COLLATE=English_United States.1252
## [2] LC_CTYPE=English_United States.1252
## [3] LC_MONETARY=English_United States.1252
## [4] LC_NUMERIC=C
## [5] LC_TIME=English_United States.1252
##
## attached base packages:
## [1] grid parallel stats4 stats graphics grDevices utils
## [8] datasets methods base
##
## other attached packages:
## [1] TxDb.Mmusculus.UCSC.mm10.knownGene_3.1.3
## [2] TxDb.Hsapiens.UCSC.hg19.knownGene_3.1.3
## [3] Travis_0.2.0
## [4] ggplot2_1.0.1
## [5] GenomicAlignments_1.5.12
## [6] SummarizedExperiment_0.3.2
## [7] rtracklayer_1.29.12
## [8] GenomicFeatures_1.21.13
## [9] AnnotationDbi_1.31.17
## [10] Biobase_2.29.1
## [11] Rsamtools_1.21.14
## [12] Biostrings_2.37.2
## [13] XVector_0.9.1
## [14] GenomicRanges_1.21.17
## [15] GenomeInfoDb_1.5.9
## [16] IRanges_2.3.17
## [17] S4Vectors_0.7.12
## [18] BiocGenerics_0.15.5
##
## loaded via a namespace (and not attached):
## [1] Rcpp_0.12.0 formatR_1.2 futile.logger_1.4.1
## [4] plyr_1.8.3 bitops_1.0-6 futile.options_1.0.0
## [7] tools_3.2.1 zlibbioc_1.15.0 biomaRt_2.25.1
## [10] digest_0.6.8 evaluate_0.7 RSQLite_1.0.0
## [13] gtable_0.1.2 DBI_0.3.1 proto_0.3-10
## [16] stringr_1.0.0 knitr_1.10.5 XML_3.98-1.3
## [19] BiocParallel_1.3.47 rmarkdown_0.7 reshape2_1.4.1
## [22] lambda.r_1.1.7 magrittr_1.5 MASS_7.3-43
## [25] scales_0.2.5 htmltools_0.2.6 colorspace_1.2-6
## [28] labeling_0.3 stringi_0.5-5 munsell_0.4.2
## [31] RCurl_1.95-4.7