There are many interesting approaches to visualizing genome-scale data. Two major packages in Bioconductor are Gviz and ggbio. Both represent significant efforts at bridging the gap between graphics facilities and various genomic data structures.

ggbio’s autoplot method can be very useful for broad overviews. For a GRanges instance, each range for which data exists can be depicted as a band on the chromosome. The karyogram layout gives a genome-wide view, but it can be important to control the handling of extra-chromosomal sequence levels.

## Loading required package: stats4
## Loading required package: BiocGenerics
## Loading required package: parallel
## 
## Attaching package: 'BiocGenerics'
## 
## The following objects are masked from 'package:parallel':
## 
##     clusterApply, clusterApplyLB, clusterCall, clusterEvalQ,
##     clusterExport, clusterMap, parApply, parCapply, parLapply,
##     parLapplyLB, parRapply, parSapply, parSapplyLB
## 
## The following object is masked from 'package:stats':
## 
##     xtabs
## 
## The following objects are masked from 'package:base':
## 
##     anyDuplicated, append, as.data.frame, as.vector, cbind,
##     colnames, do.call, duplicated, eval, evalq, Filter, Find, get,
##     intersect, is.unsorted, lapply, Map, mapply, match, mget,
##     order, paste, pmax, pmax.int, pmin, pmin.int, Position, rank,
##     rbind, Reduce, rep.int, rownames, sapply, setdiff, sort,
##     table, tapply, union, unique, unlist, unsplit
## 
## Loading required package: S4Vectors
## Creating a generic function for 'nchar' from package 'base' in package 'S4Vectors'
## Loading required package: IRanges
## Loading required package: ggplot2
## Need specific help about ggbio? try mailing 
##  the maintainer or visit http://tengfei.github.com/ggbio/
## 
## Attaching package: 'ggbio'
## 
## The following objects are masked from 'package:ggplot2':
## 
##     geom_bar, geom_rect, geom_segment, ggsave, stat_bin,
##     stat_identity, xlim
library(ERBS)
data(HepG2)
library(GenomeInfoDb)  # trim all but autosomal chroms
seqlevels(HepG2, force=TRUE) = paste0("chr", 1:22)
data(GM12878)
seqlevels(GM12878, force=TRUE) = paste0("chr", 1:22)
library(ggbio)
autoplot(HepG2, layout="karyogram", main="ESRRA binding on HepG2")
## Scale for 'x' is already present. Adding another scale for 'x', which will replace the existing scale.
## Scale for 'x' is already present. Adding another scale for 'x', which will replace the existing scale.

Notice that the title is not printed, currently a bug.

autoplot(GM12878, layout="karyogram", main="ESRRA binding on GM12878")
## Scale for 'x' is already present. Adding another scale for 'x', which will replace the existing scale.
## Scale for 'x' is already present. Adding another scale for 'x', which will replace the existing scale.

———————————————–

Question 4.3.1

Elaborate the visualization of GM12878 ESRAA peaks as follows

library(ERBS)
library(ggbio)
library(GenomeInfoDb)
data(GM12878)
seqlevels(GM12878, force=TRUE) = paste0("chr", 1:22)
autoplot(GM12878, layout="karyogram", aes(colour=log(peak)))
## Scale for 'x' is already present. Adding another scale for 'x', which will replace the existing scale.
## Scale for 'x' is already present. Adding another scale for 'x', which will replace the existing scale.

What is the difference between the bars colored in light blue vs those colored in darker blue or black?

The light blue bars correspond to genomic locations with lower probability of ESRRA binding.

The key is based on the aes(color=log(peak)) element of the call to autoplot.

Question 4.3.2

The following code combines information on two cell lines and the measured peak values.

library(ERBS)
data(HepG2)
data(GM12878)
HepG2$cell = "HepG2"
GM12878$cell = "Bcell"
tot = c(GM12878, HepG2)
tot$peak10 = tot$peak/10 # copes with automatic scale of y axis
seqlevels(tot, force=TRUE) = paste0("chr", 1:22)
library(ggbio)
library(scales)
## 
## Attaching package: 'scales'
## 
## The following object is masked from 'package:ggbio':
## 
##     rescale
p = autoplot(seqinfo(tot))
p = p + layout_karyogram(tot, aes(fill=cell, colour=cell), geom="rect") +
    scale_colour_manual(values = alpha(c("green", "red"), .1)) +
    scale_fill_manual(values = alpha(c("green", "red"), .1))
## Scale for 'x' is already present. Adding another scale for 'x', which will replace the existing scale.
p + layout_karyogram(tot, aes(x=start, y=peak10), ylim=c(15,30),
    geom="point", color="blue", size=.8)
## Scale for 'x' is already present. Adding another scale for 'x', which will replace the existing scale.

One chromosome is distinguished in this visualization by exhibiting a unusually dense set of binding events. Perform the following computations adter computing tot

stot = split(tot, as.character(seqnames(tot)))
w = sapply(stot, function(x) sum(width(x)))
sort(w/seqlengths(tot)[names(w)])
##        chr13         chr4         chr5        chr18         chr3 
## 0.0003640188 0.0004506674 0.0006031774 0.0006339747 0.0007702410 
##         chr6         chr8        chr14        chr15        chr21 
## 0.0007922155 0.0007992880 0.0008041301 0.0008168035 0.0008297961 
##         chr2        chr10         chr7         chr9        chr12 
## 0.0008607670 0.0008858319 0.0009302705 0.0009381261 0.0010232055 
##        chr11         chr1        chr22        chr20        chr16 
## 0.0011309380 0.0013064441 0.0017759823 0.0018399690 0.0022556755 
##        chr17        chr19 
## 0.0023735513 0.0050610713

Pick the chromosome with the greatest density of ESRRA binding sites.

chr19