Getting Started with Travis - V2

This is a manual for the developmental version of Travis R package

Intall Travis package

Install exomePeak, this will install a few useful packages from Bioconductor

source("http://bioconductor.org/biocLite.R")
biocLite("exomePeak")

Install ggplot2, devtools and then Travis package

install.packages("ggplot2")
install.packages("devtools")
library(devtools)
install_github('lzcyzm/Travis_Dev')

load Travis package

library(Travis)

## Loading required package: Rsamtools
## Loading required package: S4Vectors
## Loading required package: stats4
## Loading required package: BiocGenerics
## Loading required package: parallel
## 
## Attaching package: 'BiocGenerics'
## 
## The following objects are masked from 'package:parallel':
## 
##     clusterApply, clusterApplyLB, clusterCall, clusterEvalQ,
##     clusterExport, clusterMap, parApply, parCapply, parLapply,
##     parLapplyLB, parRapply, parSapply, parSapplyLB
## 
## The following objects are masked from 'package:stats':
## 
##     IQR, mad, xtabs
## 
## The following objects are masked from 'package:base':
## 
##     anyDuplicated, append, as.data.frame, as.vector, cbind,
##     colnames, do.call, duplicated, eval, evalq, Filter, Find, get,
##     grep, grepl, intersect, is.unsorted, lapply, Map, mapply,
##     match, mget, order, paste, pmax, pmax.int, pmin, pmin.int,
##     Position, rank, rbind, Reduce, rep.int, rownames, sapply,
##     setdiff, sort, table, tapply, union, unique, unlist, unsplit

## Warning in .recacheSubclasses(def@className, def, doSubclasses, env):
## undefined subclass "externalRefMethod" of class "expressionORfunction";
## definition not updated

## Loading required package: IRanges

## Warning in .recacheSubclasses(def@className, def, doSubclasses, env):
## undefined subclass "externalRefMethod" of class "functionORNULL";
## definition not updated

## Loading required package: GenomeInfoDb
## Loading required package: GenomicRanges
## Loading required package: XVector
## Loading required package: Biostrings
## Loading required package: GenomicFeatures
## Loading required package: AnnotationDbi
## Loading required package: Biobase
## Welcome to Bioconductor
## 
##     Vignettes contain introductory material; view with
##     'browseVignettes()'. To cite Bioconductor, see
##     'citation("Biobase")', and for packages 'citation("pkgname")'.
## 
## Loading required package: rtracklayer
## Loading required package: GenomicAlignments
## Loading required package: SummarizedExperiment
## Loading required package: ggplot2

Read all the toy data files into a list with names

Find the toy data files came with Travis package

bed3=system.file("extdata", "H3K4me3_mm10_1000peaks.bed", package="Travis")
bed12=system.file("extdata", "m6A_mm10_exomePeak_1000peaks_bed12.bed", package="Travis")
bam=system.file("extdata", "SRR568477_mm10_XIST.bam", package="Travis")
narrowPeak=system.file("extdata", "m6A_hg19_1000peaks_macs2.narrowPeak", package="Travis")

Read 4 different kinds of Genomic Features into R

H3K4me3 <- import.bed(bed3) # bed3 imported as GRanges
m6A_HepG2 <- BED12toGRangesList(bed12) # bed12 imported as GRangesList

## [1] "Converting BED12 to GRangesList"
## [1] "It may take a few minutes"

Chip <- readGAlignments(bam) # bam imported as GAlignments
m6A_Bcell <- narrowPeaktoGRanges(narrowPeak) # bam imported as GAlignments

Put everything in a list with names. Make sure to be aware of the genome assembly used for each genomic feature set. Put different genome assembly features in different lists.

feature_mm10 <- list(H3K4me3,m6A_HepG2,Chip) 
names(feature_mm10) <- c("H3K4me3", "m6A_HepG2", "Chip")

feature_hg19 <- list(m6A_Bcell) 
names(feature_hg19) <- c("m6A_Bcell")

TravisPlot

Show the transcriptomic view of 4 sets of genomic features

Example 1

TravisPlot(feature_mm10, 
           genome="mm10", 
           saveToPDFprefix = "Toy")

Gene annotation will be downloaded automatically, and A PDF figure with prefix “Toy” will be saved in the current working directory, showing the transcriptomic distribution of genomic features.

Example 2

It takes a few minutes to download Transcriptome information and build TravisCoodinatesFromTxDb, so we may want to avoid them to save time. We may install Transcriptome information database with R package:

source("http://bioconductor.org/biocLite.R")
biocLite("TxDb.Hsapiens.UCSC.hg19.knownGene") # install TranscriptDb for hg19
biocLite("TxDb.Mmusculus.UCSC.mm10.knownGene") # install TranscriptDb for mm10

and then, load the transcriptome information with

library(TxDb.Hsapiens.UCSC.hg19.knownGene)
hg19_txdb <- TxDb.Hsapiens.UCSC.hg19.knownGene
library(TxDb.Mmusculus.UCSC.mm10.knownGene)
mm10_txdb <- TxDb.Mmusculus.UCSC.mm10.knownGene

Alternatively, txdb can be downloaded online as well, which will take a few minutes:

hg19_txdb <- makeTxDbFromUCSC(genome="hg19")
mm10_txdb <- makeTxDbFromUCSC(genome="mm10")

We may then buid Travis coordinates

hg19_TravisCoordsFromTxDb <- makeTravisCoordsFromTxDb(hg19_txdb) # TravisCoordinates generated

## [1] "total 82960 transcripts extracted ..."
## [1] "total 47358 transcripts left after ambiguity filter ..."
## [1] "total 20302 mRNAs left after component length filter ..."
## [1] "total 8184 ncRNAs left after ncRNA length filter ..."
## [1] "Building Travis Coordinates. It may take a few minutes ..."
## [1] "Travis Coordinates Built ..."

mm10_TravisCoordsFromTxDb <- makeTravisCoordsFromTxDb(mm10_txdb) # TravisCoordinates generated

## [1] "total 61642 transcripts extracted ..."
## [1] "total 48644 transcripts left after ambiguity filter ..."
## [1] "total 18933 mRNAs left after component length filter ..."
## [1] "total 6819 ncRNAs left after ncRNA length filter ..."
## [1] "Building Travis Coordinates. It may take a few minutes ..."
## [1] "Travis Coordinates Built ..."

with TravisCoordinatesFromTxDb we can build TravisPlot.

TravisPlot(feature_mm10, 
           TravisCoordsFromTxDb = mm10_TravisCoordsFromTxDb, 
           saveToPDFprefix = "mm10")

## [1] "Using provided Travis Coordinates"

## [1] "Figures saved into mm10_Travis.pdf ..."

## Warning: Removed 5793 rows containing non-finite values (stat_density).

## Warning in density.default(data$x, adjust = adjust, kernel = kernel, weight
## = data$weight, : sum(weights) != 1 -- will not get true density

## Warning: Removed 994 rows containing non-finite values (stat_density).

## Warning in density.default(data$x, adjust = adjust, kernel = kernel, weight
## = data$weight, : sum(weights) != 1 -- will not get true density

## Warning: Removed 55 rows containing non-finite values (stat_density).

## Warning in density.default(data$x, adjust = adjust, kernel = kernel, weight
## = data$weight, : sum(weights) != 1 -- will not get true density

## Warning: Removed 1158 rows containing non-finite values (stat_density).

## Warning in density.default(data$x, adjust = adjust, kernel = kernel, weight
## = data$weight, : sum(weights) != 1 -- will not get true density

## Warning: Removed 324 rows containing non-finite values (stat_density).

## Warning in density.default(data$x, adjust = adjust, kernel = kernel, weight
## = data$weight, : sum(weights) != 1 -- will not get true density

## Warning: Removed 181 rows containing non-finite values (stat_density).

## Warning in density.default(data$x, adjust = adjust, kernel = kernel, weight
## = data$weight, : sum(weights) != 1 -- will not get true density

TravisPlot(feature_hg19, 
           TravisCoordsFromTxDb = hg19_TravisCoordsFromTxDb, 
           saveToPDFprefix = "hg19")

## [1] "Using provided Travis Coordinates"

## [1] "Figures saved into hg19_Travis.pdf ..."

## Warning: Removed 2115 rows containing non-finite values (stat_density).

## Warning in density.default(data$x, adjust = adjust, kernel = kernel, weight
## = data$weight, : sum(weights) != 1 -- will not get true density

## Warning: Removed 926 rows containing non-finite values (stat_density).

## Warning in density.default(data$x, adjust = adjust, kernel = kernel, weight
## = data$weight, : sum(weights) != 1 -- will not get true density

two PDF figures with prefix “mm10” and “hg19” will be saved in the current working directory.. Make sure the genome assembly used is correct.

Example 3

The Travis can cover neighborhood DNA regions as well (5k promoter and 5k tail) with option includeNeighborDNA. This option may be useful when examine transcription related genomic features that directly located on DNA rather than RNA.

TravisPlot(feature_mm10, 
           TravisCoordsFromTxDb = mm10_TravisCoordsFromTxDb, 
           saveToPDFprefix = "mm10_withDNA",
           includeNeighborDNA =TRUE)

## [1] "Using provided Travis Coordinates"

## [1] "Figures saved into mm10_withDNA_Travis.pdf ..."

## Warning in density.default(data$x, adjust = adjust, kernel = kernel, weight
## = data$weight, : sum(weights) != 1 -- will not get true density

## Warning in density.default(data$x, adjust = adjust, kernel = kernel, weight
## = data$weight, : sum(weights) != 1 -- will not get true density

## Warning in density.default(data$x, adjust = adjust, kernel = kernel, weight
## = data$weight, : sum(weights) != 1 -- will not get true density

## Warning in density.default(data$x, adjust = adjust, kernel = kernel, weight
## = data$weight, : sum(weights) != 1 -- will not get true density

## Warning in density.default(data$x, adjust = adjust, kernel = kernel, weight
## = data$weight, : sum(weights) != 1 -- will not get true density

## Warning in density.default(data$x, adjust = adjust, kernel = kernel, weight
## = data$weight, : sum(weights) != 1 -- will not get true density

Advanced

You may also return the count information and customized the plot; however, this function is not well supported so far by Travis package, so it is not recommended.

f <-system.file("extdata", "RNAm5C_mm10_10000sites_bismark.cov", package="Travis") 
a <- read.table(f,header=FALSE,sep="\t")
q <- a[[5]]
size <- a[[5]]+a[[6]]
d0 <- (pbinom(q, size, 0.05, lower.tail = TRUE, log.p = FALSE) < 0.05)
d1 <- (pbinom(q, size, 0.05, lower.tail = FALSE, log.p = FALSE) < 0.05)  
GR <- GRanges(seqnames = a[,1], ranges = IRanges(start=a[,2], width=1))
peak <- list(GR[d0],GR[d1])
names(peak) <- c("<5%", ">5%")
ct <- TravisPlot(peak,
           TravisCoordsFromTxDb=mm10_TravisCoordsFromTxDb,
           saveToPDFprefix="RNAm5C_mm10",
           returnCount=TRUE)

## [1] "Using provided Travis Coordinates"

## [1] "Figures saved into RNAm5C_mm10_Travis.pdf ..."

## Warning: Removed 102 rows containing non-finite values (stat_density).

## Warning in density.default(data$x, adjust = adjust, kernel = kernel, weight
## = data$weight, : sum(weights) != 1 -- will not get true density

## Warning: Removed 13 rows containing non-finite values (stat_density).

## Warning in density.default(data$x, adjust = adjust, kernel = kernel, weight
## = data$weight, : sum(weights) != 1 -- will not get true density

## Warning: Removed 40 rows containing non-finite values (stat_density).

## Warning in density.default(data$x, adjust = adjust, kernel = kernel, weight
## = data$weight, : sum(weights) != 1 -- will not get true density

## Warning: Removed 15 rows containing non-finite values (stat_density).

## Warning in density.default(data$x, adjust = adjust, kernel = kernel, weight
## = data$weight, : sum(weights) != 1 -- will not get true density

ct1 <- ct[["mRNA_normalized"]]
    
p1 <- ggplot(ct1, aes(x=pos, group=Feature, weight=5*weight))  +
      ggtitle("Distribution on mRNA") +
      theme(axis.ticks = element_blank(), axis.text.x = element_blank()) + 
      xlab("") + 
      ylab("Frequency") +
      geom_density(adjust=0.5,aes(fill=factor(Feature)),alpha=0.2,position="fill") +
      annotate("text", x = 1.5, y = -0.1, label = "5'UTR", size=3) +
  annotate("text", x = 0.5, y = -0.1, label = "Promoter (5kb)", size=3) +
      annotate("text", x = 2.5, y = -0.1, label = "CDS", size=3) +
      annotate("text", x = 3.5, y = -0.1, label = "3'UTR", size=3) + 
  annotate("text", x = 4.5, y = -0.1, label = "Tail (5kb)", size=3) +
      xlim(0,5)
p1

## Warning in density.default(data$x, adjust = adjust, kernel = kernel, weight
## = data$weight, : sum(weights) != 1 -- will not get true density

## Warning in density.default(data$x, adjust = adjust, kernel = kernel, weight
## = data$weight, : sum(weights) != 1 -- will not get true density

ggsave(filename="stack_RNAm5C_mm10.pdf",plot=p1,width=5, height=3)

## Warning in density.default(data$x, adjust = adjust, kernel = kernel, weight
## = data$weight, : sum(weights) != 1 -- will not get true density

## Warning in density.default(data$x, adjust = adjust, kernel = kernel, weight
## = data$weight, : sum(weights) != 1 -- will not get true density

# example 4
f <-system.file("extdata", "DNAm5C_mm10_10000sites_bismark.cov", package="Travis")
a <- read.table(f,header=FALSE,sep="\t")
q <- a[[5]]
size <- a[[5]]+a[[6]]
d0 <- (pbinom(q, size, 0.5, lower.tail = TRUE, log.p = FALSE) < 0.05)
d1 <- (pbinom(q, size, 0.5, lower.tail = FALSE, log.p = FALSE) < 0.05)  
GR <- GRanges(seqnames = a[,1], ranges = IRanges(start=a[,2], width=1))
peak <- list(GR[d0],GR[d1])
names(peak) <- c("<50%", ">50%")
ct <- TravisPlot(peak,
           TravisCoordsFromTxDb=mm10_TravisCoordsFromTxDb,
           saveToPDFprefix="DNAm5C_mm10",
           returnCount=TRUE)

## [1] "Using provided Travis Coordinates"

## [1] "Figures saved into DNAm5C_mm10_Travis.pdf ..."

## Warning: Removed 184 rows containing non-finite values (stat_density).

## Warning in density.default(data$x, adjust = adjust, kernel = kernel, weight
## = data$weight, : sum(weights) != 1 -- will not get true density

## Warning: Removed 389 rows containing non-finite values (stat_density).

## Warning in density.default(data$x, adjust = adjust, kernel = kernel, weight
## = data$weight, : sum(weights) != 1 -- will not get true density

## Warning: Removed 46 rows containing non-finite values (stat_density).

## Warning in density.default(data$x, adjust = adjust, kernel = kernel, weight
## = data$weight, : sum(weights) != 1 -- will not get true density

## Warning: Removed 69 rows containing non-finite values (stat_density).

## Warning in density.default(data$x, adjust = adjust, kernel = kernel, weight
## = data$weight, : sum(weights) != 1 -- will not get true density

ct2 <- ct[["ncRNA_normalized"]]
    
p2 <- ggplot(ct2, aes(x=pos, group=Feature, weight=3*weight))  +
      ggtitle("Distribution on mRNA") +
      theme(axis.ticks = element_blank(), axis.text.x = element_blank()) + 
      xlab("") + 
      ylab("Frequency") +
      geom_density(adjust=0.5,aes(fill=factor(Feature)),alpha=0.2,position="fill") +
      annotate("text", x = 6.5, y = -0.1, label = "Promoter (5kb)", size=3) +
      annotate("text", x = 7.5, y = -0.1, label = "lncRNA", size=3) +
      annotate("text", x = 8.5, y = -0.1, label = "Tail (5kb)", size=3) + 
      xlim(6,9)
p2

## Warning in density.default(data$x, adjust = adjust, kernel = kernel, weight
## = data$weight, : sum(weights) != 1 -- will not get true density

## Warning in density.default(data$x, adjust = adjust, kernel = kernel, weight
## = data$weight, : sum(weights) != 1 -- will not get true density

ggsave(filename="stack_DNAm5C_mm10.pdf",plot=p2,width=5, height=3)

## Warning in density.default(data$x, adjust = adjust, kernel = kernel, weight
## = data$weight, : sum(weights) != 1 -- will not get true density

## Warning in density.default(data$x, adjust = adjust, kernel = kernel, weight
## = data$weight, : sum(weights) != 1 -- will not get true density

Checking the overlap between different components in RNA and lncRNA

These are 8 sections in for an “TravisCoordsFromTxDb” object. Let us see how much they overlap with each other.

TravisCoords <- hg19_TravisCoordsFromTxDb
type <- paste(mcols(TravisCoords)$comp,mcols(TravisCoords)$category)
key <- unique(type)
landmark <- list(1,2,3,4,5,6,7,8)
names(landmark) <- key  
for (i in 1:length(key)) {
  landmark[[i]] <- TravisCoords[type==key[i]]
}
TravisPlot(landmark[1:5],TravisCoordsFromTxDb=TravisCoords,saveToPDFprefix="Landmark_hg19_mRNA",includeNeighborDNA =TRUE)

## [1] "Using provided Travis Coordinates"

## [1] "Figures saved into Landmark_hg19_mRNA_Travis.pdf ..."

## Warning in density.default(data$x, adjust = adjust, kernel = kernel, weight
## = data$weight, : sum(weights) != 1 -- will not get true density

## Warning in density.default(data$x, adjust = adjust, kernel = kernel, weight
## = data$weight, : sum(weights) != 1 -- will not get true density

## Warning in density.default(data$x, adjust = adjust, kernel = kernel, weight
## = data$weight, : sum(weights) != 1 -- will not get true density

## Warning in density.default(data$x, adjust = adjust, kernel = kernel, weight
## = data$weight, : sum(weights) != 1 -- will not get true density

## Warning in density.default(data$x, adjust = adjust, kernel = kernel, weight
## = data$weight, : sum(weights) != 1 -- will not get true density

## Warning in density.default(data$x, adjust = adjust, kernel = kernel, weight
## = data$weight, : sum(weights) != 1 -- will not get true density

## Warning in density.default(data$x, adjust = adjust, kernel = kernel, weight
## = data$weight, : sum(weights) != 1 -- will not get true density

## Warning in density.default(data$x, adjust = adjust, kernel = kernel, weight
## = data$weight, : sum(weights) != 1 -- will not get true density

## Warning in density.default(data$x, adjust = adjust, kernel = kernel, weight
## = data$weight, : sum(weights) != 1 -- will not get true density

## Warning in density.default(data$x, adjust = adjust, kernel = kernel, weight
## = data$weight, : sum(weights) != 1 -- will not get true density

TravisPlot(landmark[6:8],TravisCoordsFromTxDb=TravisCoords,saveToPDFprefix="Landmark_hg19_ncRNA",includeNeighborDNA =TRUE)

## [1] "Using provided Travis Coordinates"

## [1] "Figures saved into Landmark_hg19_ncRNA_Travis.pdf ..."

## Warning in density.default(data$x, adjust = adjust, kernel = kernel, weight
## = data$weight, : sum(weights) != 1 -- will not get true density

## Warning in density.default(data$x, adjust = adjust, kernel = kernel, weight
## = data$weight, : sum(weights) != 1 -- will not get true density

## Warning in density.default(data$x, adjust = adjust, kernel = kernel, weight
## = data$weight, : sum(weights) != 1 -- will not get true density

## Warning in density.default(data$x, adjust = adjust, kernel = kernel, weight
## = data$weight, : sum(weights) != 1 -- will not get true density

## Warning in density.default(data$x, adjust = adjust, kernel = kernel, weight
## = data$weight, : sum(weights) != 1 -- will not get true density

## Warning in density.default(data$x, adjust = adjust, kernel = kernel, weight
## = data$weight, : sum(weights) != 1 -- will not get true density

We can roughly see that, the mRNA components are evenly distributed on mRNA, and the lncRNA components are evenly distributed on lncRNA. The distribution on mRNA and lncRNA are standardized independently, the above code doesn’t support the overlapping between mRNA components and lncRNA components.

TravisCoords - transcriptomic landmarks projected to genome

The “TravisCoordsFromTxDb” object contains the genome-projected transcriptome coordinates, which can be valuable for evaluating transcriptomic information related applications, such as checking the quality of MeRIP-Seq data. The Travis coordinates are stored in the following format.

head(TravisCoords)

## GRanges object with 6 ranges and 4 metadata columns:
##              seqnames             ranges strand |       txid       pos
##                 <Rle>          <IRanges>  <Rle> |   <factor> <numeric>
##   uc010nxq.1     chr1 [   6874,    6924]      + | uc010nxq.1     0.005
##   uc009vjk.2     chr1 [ 317037,  317087]      + | uc009vjk.2     0.005
##   uc001aau.3     chr1 [ 318892,  318942]      + | uc001aau.3     0.005
##   uc021oeh.1     chr1 [ 319288,  319338]      + | uc021oeh.1     0.005
##   uc021oei.1     chr1 [ 322546,  322596]      + | uc021oei.1     0.005
##   uc001acv.3     chr1 [1067397, 1067447]      + | uc001acv.3     0.005
##                  comp category
##              <factor> <factor>
##   uc010nxq.1    Front     mRNA
##   uc009vjk.2    Front     mRNA
##   uc001aau.3    Front     mRNA
##   uc021oeh.1    Front     mRNA
##   uc021oei.1    Front     mRNA
##   uc001acv.3    Front     mRNA
##   -------
##   seqinfo: 52 sequences from an unspecified genome; no seqlengths

It might be necessary to compare the difference bewteen results returned from makeTravisCoordsFromTxDb and makeTravisCoordsFromGRangesList. We may not need to differentiate different components on mRNA, i.e., 5’UTR, CDS, and 3’UTR.

Session Information

sessionInfo()

## R version 3.2.1 (2015-06-18)
## Platform: x86_64-w64-mingw32/x64 (64-bit)
## Running under: Windows 8 x64 (build 9200)
## 
## locale:
## [1] LC_COLLATE=English_United States.1252 
## [2] LC_CTYPE=English_United States.1252   
## [3] LC_MONETARY=English_United States.1252
## [4] LC_NUMERIC=C                          
## [5] LC_TIME=English_United States.1252    
## 
## attached base packages:
##  [1] grid      parallel  stats4    stats     graphics  grDevices utils    
##  [8] datasets  methods   base     
## 
## other attached packages:
##  [1] TxDb.Mmusculus.UCSC.mm10.knownGene_3.1.3
##  [2] TxDb.Hsapiens.UCSC.hg19.knownGene_3.1.3 
##  [3] Travis_0.2.0                            
##  [4] ggplot2_1.0.1                           
##  [5] GenomicAlignments_1.5.12                
##  [6] SummarizedExperiment_0.3.2              
##  [7] rtracklayer_1.29.12                     
##  [8] GenomicFeatures_1.21.13                 
##  [9] AnnotationDbi_1.31.17                   
## [10] Biobase_2.29.1                          
## [11] Rsamtools_1.21.14                       
## [12] Biostrings_2.37.2                       
## [13] XVector_0.9.1                           
## [14] GenomicRanges_1.21.17                   
## [15] GenomeInfoDb_1.5.9                      
## [16] IRanges_2.3.17                          
## [17] S4Vectors_0.7.12                        
## [18] BiocGenerics_0.15.5                     
## 
## loaded via a namespace (and not attached):
##  [1] Rcpp_0.12.0          formatR_1.2          futile.logger_1.4.1 
##  [4] plyr_1.8.3           bitops_1.0-6         futile.options_1.0.0
##  [7] tools_3.2.1          zlibbioc_1.15.0      biomaRt_2.25.1      
## [10] digest_0.6.8         evaluate_0.7         RSQLite_1.0.0       
## [13] gtable_0.1.2         DBI_0.3.1            proto_0.3-10        
## [16] stringr_1.0.0        knitr_1.10.5         XML_3.98-1.3        
## [19] BiocParallel_1.3.47  rmarkdown_0.7        reshape2_1.4.1      
## [22] lambda.r_1.1.7       magrittr_1.5         MASS_7.3-43         
## [25] scales_0.2.5         htmltools_0.2.6      colorspace_1.2-6    
## [28] labeling_0.3         stringi_0.5-5        munsell_0.4.2       
## [31] RCurl_1.95-4.7