Getting Started with Travis - V3

This is a manual for the developmental version of Travis R package

Intall Travis package

Install exomePeak, this will install a few useful packages from Bioconductor

source("http://bioconductor.org/biocLite.R")
biocLite("exomePeak")

Install ggplot2, devtools and then Travis package

install.packages("ggplot2")
install.packages("devtools")
library(devtools)
install_github('lzcyzm/Travis_Bioconductor')

load Travis package

suppressPackageStartupMessages(library(Travis))
## Warning in .recacheSubclasses(def@className, def, doSubclasses, env):
## undefined subclass "externalRefMethod" of class "expressionORfunction";
## definition not updated
## Warning in .recacheSubclasses(def@className, def, doSubclasses, env):
## undefined subclass "externalRefMethod" of class "functionORNULL";
## definition not updated
## Warning: package 'ggplot2' was built under R version 3.2.2

Read all the toy data files into a list with names

Find the toy data files came with Travis package

bed3=system.file("extdata", "H3K4me3_mm10_1000peaks.bed", package="Travis")
bed12=system.file("extdata", "m6A_mm10_exomePeak_1000peaks_bed12.bed", package="Travis")
bam=system.file("extdata", "SRR568477_mm10_XIST.bam", package="Travis")
narrowPeak=system.file("extdata", "m6A_hg19_1000peaks_macs2.narrowPeak", package="Travis")

Read 4 different kinds of Genomic Features into R

H3K4me3 <- import.bed(bed3) # bed3 imported as GRanges
m6A_HepG2 <- BED12toGRangesList(bed12) # bed12 imported as GRangesList
## [1] "Converting BED12 to GRangesList"
## [1] "It may take a few minutes"
Chip <- readGAlignments(bam) # bam imported as GAlignments
m6A_Bcell <- narrowPeaktoGRanges(narrowPeak) # bam imported as GAlignments

Put everything in a list with names. Make sure to be aware of the genome assembly used for each genomic feature set. Put different genome assembly features in different lists.

feature_mm10 <- list(H3K4me3,m6A_HepG2,Chip) 
names(feature_mm10) <- c("H3K4me3", "m6A_HepG2", "Chip")

feature_hg19 <- list(m6A_Bcell) 
names(feature_hg19) <- c("m6A_Bcell")

TravisPlot

Show the transcriptomic view of 4 sets of genomic features

Example 1

TravisPlot(feature_mm10, 
           genome="mm10", 
           saveToPDFprefix = "Toy")

Gene annotation will be downloaded automatically, and A PDF figure with prefix “Toy” will be saved in the current working directory, showing the transcriptomic distribution of genomic features.

Example 2

It takes a few minutes to download Transcriptome information and build TravisCoodinatesFromTxDb, so we may want to avoid them to save time. We may install Transcriptome information database with R package:

source("http://bioconductor.org/biocLite.R")
biocLite("TxDb.Hsapiens.UCSC.hg19.knownGene") # install TranscriptDb for hg19
biocLite("TxDb.Mmusculus.UCSC.mm10.knownGene") # install TranscriptDb for mm10

and then, load the transcriptome information with

library(TxDb.Hsapiens.UCSC.hg19.knownGene)
hg19_txdb <- TxDb.Hsapiens.UCSC.hg19.knownGene
library(TxDb.Mmusculus.UCSC.mm10.knownGene)
mm10_txdb <- TxDb.Mmusculus.UCSC.mm10.knownGene

Alternatively, txdb can be downloaded online as well, which will take a few minutes:

hg19_txdb <- makeTxDbFromUCSC(genome="hg19")
mm10_txdb <- makeTxDbFromUCSC(genome="mm10")

We may then buid Travis coordinates

hg19_TravisCoordsFromTxDb <- makeTravisCoordsFromTxDb(hg19_txdb) # TravisCoordinates generated
## [1] "total 82960 transcripts extracted ..."
## [1] "total 47358 transcripts left after ambiguity filter ..."
## [1] "total 20302 mRNAs left after component length filter ..."
## [1] "total 8184 ncRNAs left after ncRNA length filter ..."
## [1] "Building Travis Coordinates. It may take a few minutes ..."
## [1] "Travis Coordinates Built ..."
mm10_TravisCoordsFromTxDb <- makeTravisCoordsFromTxDb(mm10_txdb) # TravisCoordinates generated
## [1] "total 61642 transcripts extracted ..."
## [1] "total 48644 transcripts left after ambiguity filter ..."
## [1] "total 18933 mRNAs left after component length filter ..."
## [1] "total 6819 ncRNAs left after ncRNA length filter ..."
## [1] "Building Travis Coordinates. It may take a few minutes ..."
## [1] "Travis Coordinates Built ..."

with TravisCoordinatesFromTxDb we can build TravisPlot.

TravisPlot(feature_mm10, 
           TravisCoordsFromTxDb = mm10_TravisCoordsFromTxDb, 
           saveToPDFprefix = "mm10")
## [1] "Using provided Travis Coordinates"
## [1] "resolving ambiguious features ..."
## [1] "Figures saved into mm10_Travis.pdf ..."

TravisPlot(feature_hg19, 
           TravisCoordsFromTxDb = hg19_TravisCoordsFromTxDb, 
           saveToPDFprefix = "hg19")
## [1] "Using provided Travis Coordinates"
## [1] "resolving ambiguious features ..."
## [1] "Figures saved into hg19_Travis.pdf ..."

two PDF figures with prefix “mm10” and “hg19” will be saved in the current working directory.. Make sure the genome assembly number used is matching the data.

Example 3

The Travis can cover neighborhood DNA regions as well (1k promoter and 1k tail) with option includeNeighborDNA. This option may be useful when examine transcription related genomic features that directly located on DNA rather than RNA.

TravisPlot(feature_mm10, 
           TravisCoordsFromTxDb = mm10_TravisCoordsFromTxDb, 
           saveToPDFprefix = "mm10_withDNA",
           includeNeighborDNA =TRUE)
## [1] "Using provided Travis Coordinates"
## [1] "resolving ambiguious features ..."
## [1] "rescaleComponent and includeNeighborDNA are simutaneously supported ..."
## [1] "Using includeNeighborDNA only instead ..."
## [1] "Figures saved into mm10_withDNA_Travis.pdf ..."

TravisPlot(feature_hg19, 
           TravisCoordsFromTxDb = hg19_TravisCoordsFromTxDb, 
           saveToPDFprefix = "hg19_withDNA",
           includeNeighborDNA =TRUE)
## [1] "Using provided Travis Coordinates"
## [1] "resolving ambiguious features ..."
## [1] "rescaleComponent and includeNeighborDNA are simutaneously supported ..."
## [1] "Using includeNeighborDNA only instead ..."
## [1] "Figures saved into hg19_withDNA_Travis.pdf ..."

Example 4

Alternatively, you may cancel the component rescaling, and treat each component equally with option rescaleComponent=FALSE. The figure generated in this way looks clearer, but the relative size of the components will be missing.

TravisPlot(feature_mm10, 
           TravisCoordsFromTxDb = mm10_TravisCoordsFromTxDb, 
           saveToPDFprefix = "mm10_withDNA_equal",
           includeNeighborDNA =TRUE,
           rescaleComponent=FALSE)
## [1] "Using provided Travis Coordinates"
## [1] "resolving ambiguious features ..."
## [1] "Figures saved into mm10_withDNA_equal_Travis.pdf ..."

TravisPlot(feature_hg19, 
           TravisCoordsFromTxDb = hg19_TravisCoordsFromTxDb, 
           saveToPDFprefix = "hg19_withDNA_equal",
           includeNeighborDNA =TRUE,
           rescaleComponent=FALSE)
## [1] "Using provided Travis Coordinates"
## [1] "resolving ambiguious features ..."
## [1] "Figures saved into hg19_withDNA_equal_Travis.pdf ..."

Advanced

You may also return the count information and customized the plot; however, this function is not well supported so far thus not recommended. Please contact the developer (jia.meng@hotmail.com) of Travis package for additional functions.

# example 5
f <-system.file("extdata", "RNAm5C_mm10_10000sites_bismark.cov", package="Travis") 
a <- read.table(f,header=FALSE,sep="\t")
q <- a[[5]]
size <- a[[5]]+a[[6]]
d0 <- (pbinom(q, size, 0.02, lower.tail = TRUE, log.p = FALSE) < 0.05)
d1 <- (pbinom(q, size, 0.02, lower.tail = FALSE, log.p = FALSE) < 0.05)  
GR <- GRanges(seqnames = a[,1], ranges = IRanges(start=a[,2], width=1))
peak <- list(GR[d0],GR[d1])
names(peak) <- c("<2%", ">2%")
ct <- TravisPlot(peak,
           TravisCoordsFromTxDb=mm10_TravisCoordsFromTxDb,
           saveToPDFprefix="RNAm5C_mm10",
           returnCount=TRUE)
## [1] "Using provided Travis Coordinates"
## [1] "resolving ambiguious features ..."
## [1] "Figures saved into RNAm5C_mm10_Travis.pdf ..."

We can then customize the plot with a little effort, e.g., calculate the relative percentage of the highly methylated and lowly methylated residuals at different locations.

ct1 <- ct[["mRNA_normalized"]]
id <- which(match(ct1$comp,c("Front","Back"))>0)
ct1 <- ct1[-id,]
featureSet <- as.character(unique(ct1$Feature))
for (i in 1:length(featureSet)) {
  id <- (ct1$Feature==featureSet[i])
  ct1$weight[id] <- ct1$count[id]/sum(ct1$count[id])
}

p1 <- ggplot(ct1, aes(x=pos, group=Feature, weight=3*weight))  +
  ggtitle("Distribution on mRNA") +
  theme(axis.ticks = element_blank(), axis.text.x = element_blank()) + 
  xlab("") + 
  ylab("Frequency") +
  geom_density(adjust=0.5,aes(fill=factor(Feature)),alpha=0.2,position="fill") +
  annotate("text", x = 1.5, y = -0.1, label = "5'UTR", size=3) +
  annotate("text", x = 2.5, y = -0.1, label = "CDS", size=3) +
  annotate("text", x = 3.5, y = -0.1, label = "3'UTR", size=3) + 
  xlim(1,4)
suppressWarnings(p1)
## Warning in density.default(data$x, adjust = adjust, kernel = kernel, weight
## = data$weight, : sum(weights) != 1 -- will not get true density
## Warning in density.default(data$x, adjust = adjust, kernel = kernel, weight
## = data$weight, : sum(weights) != 1 -- will not get true density

ggsave(filename="stack_RNAm5C_mm10.pdf",plot=p1,width=5, height=3)
## Warning in density.default(data$x, adjust = adjust, kernel = kernel, weight
## = data$weight, : sum(weights) != 1 -- will not get true density
## Warning in density.default(data$x, adjust = adjust, kernel = kernel, weight
## = data$weight, : sum(weights) != 1 -- will not get true density

Checking the overlap between different components in RNA and lncRNA

These are 8 sections in for an “TravisCoordsFromTxDb” object. Let us see how much they overlap with each other.

TravisCoords <- hg19_TravisCoordsFromTxDb
type <- paste(mcols(TravisCoords)$comp,mcols(TravisCoords)$category)
key <- unique(type)
landmark <- list(1,2,3,4,5,6,7,8)
names(landmark) <- key  
for (i in 1:length(key)) {
  landmark[[i]] <- TravisCoords[type==key[i]]
}
TravisPlot(landmark[1:5],
           TravisCoordsFromTxDb=TravisCoords,
           saveToPDFprefix="Landmark_hg19_mRNA",
           includeNeighborDNA =TRUE)
## [1] "Using provided Travis Coordinates"
## [1] "resolving ambiguious features ..."
## [1] "rescaleComponent and includeNeighborDNA are simutaneously supported ..."
## [1] "Using includeNeighborDNA only instead ..."
## [1] "Figures saved into Landmark_hg19_mRNA_Travis.pdf ..."

TravisPlot(landmark[6:8],
           TravisCoordsFromTxDb=TravisCoords,
           saveToPDFprefix="Landmark_hg19_ncRNA",
           includeNeighborDNA =TRUE)
## [1] "Using provided Travis Coordinates"
## [1] "resolving ambiguious features ..."
## [1] "rescaleComponent and includeNeighborDNA are simutaneously supported ..."
## [1] "Using includeNeighborDNA only instead ..."
## [1] "Figures saved into Landmark_hg19_ncRNA_Travis.pdf ..."

We can roughly see that, the mRNA components are evenly distributed on mRNA, and the lncRNA components are evenly distributed on lncRNA. The distribution on mRNA and lncRNA are standardized independently, the above code doesn’t support the overlapping between mRNA components and lncRNA components.

TravisCoords - transcriptomic landmarks projected to genome

The “TravisCoordsFromTxDb” object contains the genome-projected transcriptome coordinates, which can be valuable for evaluating transcriptomic information related applications, such as checking the quality of MeRIP-Seq data. The Travis coordinates are stored in the following format.

head(TravisCoords)
## GRanges object with 6 ranges and 5 metadata columns:
##              seqnames             ranges strand |       txid       pos
##                 <Rle>          <IRanges>  <Rle> |   <factor> <numeric>
##   uc010nxq.1     chr1 [  10854,   10904]      + | uc010nxq.1     0.005
##   uc009vjk.2     chr1 [ 321017,  321067]      + | uc009vjk.2     0.005
##   uc001aau.3     chr1 [ 322872,  322922]      + | uc001aau.3     0.005
##   uc021oeh.1     chr1 [ 323268,  323318]      + | uc021oeh.1     0.005
##   uc021oei.1     chr1 [ 326526,  326576]      + | uc021oei.1     0.005
##   uc001acv.3     chr1 [1071377, 1071427]      + | uc001acv.3     0.005
##               interval     comp category
##              <numeric> <factor> <factor>
##   uc010nxq.1        10    Front     mRNA
##   uc009vjk.2        10    Front     mRNA
##   uc001aau.3        10    Front     mRNA
##   uc021oeh.1        10    Front     mRNA
##   uc021oei.1        10    Front     mRNA
##   uc001acv.3        10    Front     mRNA
##   -------
##   seqinfo: 52 sequences from an unspecified genome; no seqlengths

It might be necessary to compare the difference bewteen results returned from makeTravisCoordsFromTxDb and makeTravisCoordsFromGRangesList. We may not need to differentiate different components on mRNA, i.e., 5’UTR, CDS, and 3’UTR.

Session Information

sessionInfo()
## R version 3.2.1 (2015-06-18)
## Platform: x86_64-w64-mingw32/x64 (64-bit)
## Running under: Windows 8 x64 (build 9200)
## 
## locale:
## [1] LC_COLLATE=English_United States.1252 
## [2] LC_CTYPE=English_United States.1252   
## [3] LC_MONETARY=English_United States.1252
## [4] LC_NUMERIC=C                          
## [5] LC_TIME=English_United States.1252    
## 
## attached base packages:
##  [1] grid      parallel  stats4    stats     graphics  grDevices utils    
##  [8] datasets  methods   base     
## 
## other attached packages:
##  [1] TxDb.Mmusculus.UCSC.mm10.knownGene_3.1.3
##  [2] TxDb.Hsapiens.UCSC.hg19.knownGene_3.1.3 
##  [3] Travis_0.2.0                            
##  [4] ggplot2_1.0.1                           
##  [5] GenomicAlignments_1.5.12                
##  [6] SummarizedExperiment_0.3.2              
##  [7] rtracklayer_1.29.12                     
##  [8] GenomicFeatures_1.21.13                 
##  [9] AnnotationDbi_1.31.17                   
## [10] Biobase_2.29.1                          
## [11] Rsamtools_1.21.14                       
## [12] Biostrings_2.37.2                       
## [13] XVector_0.9.1                           
## [14] GenomicRanges_1.21.17                   
## [15] GenomeInfoDb_1.5.9                      
## [16] IRanges_2.3.17                          
## [17] S4Vectors_0.7.12                        
## [18] BiocGenerics_0.15.5                     
## 
## loaded via a namespace (and not attached):
##  [1] Rcpp_0.12.0          formatR_1.2          futile.logger_1.4.1 
##  [4] plyr_1.8.3           bitops_1.0-6         futile.options_1.0.0
##  [7] tools_3.2.1          zlibbioc_1.15.0      biomaRt_2.25.1      
## [10] digest_0.6.8         evaluate_0.7         RSQLite_1.0.0       
## [13] gtable_0.1.2         DBI_0.3.1            proto_0.3-10        
## [16] stringr_1.0.0        knitr_1.10.5         XML_3.98-1.3        
## [19] BiocParallel_1.3.47  rmarkdown_0.7        reshape2_1.4.1      
## [22] lambda.r_1.1.7       magrittr_1.5         MASS_7.3-43         
## [25] scales_0.3.0         htmltools_0.2.6      colorspace_1.2-6    
## [28] labeling_0.3         stringi_0.5-5        munsell_0.4.2       
## [31] RCurl_1.95-4.7