Introduction to ggbio

Leonardo Collado-Torres
October 8, 2013

Setup

## CRAN
install.packages("ggplot2")

## Bioconductor
if (!requireNamespace("BiocManager", quietly = TRUE)) install.packages("BiocManager")
BiocManager::install(c("IRanges", "GenomicRanges", "ggbio", "biovizBase"))

Goal

The goal of ggbio is to make common genomic plots easy to create once your data are in GRanges objects or other objects from GenomicRanges.
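
For readers who have not built a GRanges object before, here is a minimal sketch of one. The coordinates, strands, and the score column are made up for illustration and do not come from the talk.

```r
library("GenomicRanges")

## Three made-up intervals; coordinates and the score column are illustrative only
gr_toy <- GRanges(
    seqnames = c("chr1", "chr1", "chr2"),
    ranges = IRanges(start = c(100, 500, 250), width = 50),
    strand = c("+", "-", "*"),
    score = c(1.2, 3.4, 5.6)
)

## Accessors recover the individual pieces
seqnames(gr_toy)
start(gr_toy)
end(gr_toy)
gr_toy$score
```

Extra named arguments such as score become metadata columns, which plotting code can later map to aesthetics.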

Some plots

  • Manhattan plot (SNPs)
  • Ideograms
  • Tracks (emulate a genome browser)
  • Circular layouts: good for showing rearrangements

How it works

  • The syntax is similar to ggplot2 as ggbio builds on top of it.
  • Plotting functions return ggplot2 objects which you can then modify using ggplot2 code.
  • ggbio figures out how to align all your data along the genome axis for you.
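
The points above can be sketched with a toy object. The intervals and the axis label are made up for illustration, and behavior may vary between ggbio and ggplot2 versions.

```r
library("ggbio")
library("GenomicRanges")

## Toy intervals (made up for illustration)
gr_demo <- GRanges("chr1", IRanges(start = c(101, 301, 601), width = 100))

## ggbio draws the ranges ...
p_demo <- autoplot(gr_demo)

## ... and ordinary ggplot2 components adjust the result
p_demo + theme_bw() + xlab("Position on chr1 (bp)")
```

The same pattern of creating the plot with ggbio and then adding ggplot2 components is used in the tracks example later in the talk.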

Pluses and minuses

  • Produces very nice plots
  • Makes tracks easy to build
  • Many plots are already implemented and documented in its long manual
  • Inherently slow for large data, because ggplot2 is slow in those cases
  • Might require more RAM
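
One common way to soften the speed and memory cost is to summarize the data before plotting. The sketch below is an assumption on my part rather than something from the talk. It uses base R only, and the simulated signal and the 1 kb bin width are arbitrary choices.

```r
## Simulated per-base signal; the size and bin width are arbitrary choices
set.seed(20131008)
signal <- data.frame(pos = seq_len(1e5), value = rnorm(1e5, mean = 10, sd = 2))

## Assign each position to a 1 kb bin and keep one mean per bin
bin_width <- 1000
signal$bin <- (signal$pos - 1) %/% bin_width
binned <- aggregate(value ~ bin, data = signal, FUN = mean)

## Use the bin midpoint as the plotting position
binned$pos <- binned$bin * bin_width + bin_width / 2

## Number of points to draw before and after binning
nrow(signal)
nrow(binned)
```

Plotting binned instead of signal draws 100 points rather than 100,000, at the cost of hiding variation within each bin.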

ggplot2 syntax
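
As a reminder of the grammar that ggbio inherits, here is a minimal ggplot2 sketch. The mtcars data set and the chosen columns are only an illustration and are not part of the talk.

```r
library("ggplot2")

## mtcars ships with R; any data frame works
p_cars <- ggplot(mtcars, aes(x = wt, y = mpg, color = factor(cyl))) +
    geom_point() +
    labs(x = "Weight (1000 lbs)", y = "Miles per gallon", color = "Cylinders")

## Plots are objects: add components later, then print to draw
p_cars <- p_cars + theme_bw()
print(p_cars)
```

The pieces are a data frame, an aes() mapping from columns to visual properties, one or more geoms, and optional labels and themes joined with +.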

Ideogram: code

library("ggbio")
load(system.file("data", "hg19IdeogramCyto.rda", package="biovizBase", mustWork=TRUE))
p.ideo <- plotIdeogram(hg19IdeogramCyto, "chr1")
print(p.ideo)

Ideogram: result

[Figure: ideogram of chromosome 1 (hg19)]

Karyogram: code

library("GenomicRanges")
data(hg19Ideogram, package= "biovizBase")
autoplot(seqinfo(hg19Ideogram)[paste0("chr", 1:13)])

Karyogram: result

[Figure: karyogram of chr1 to chr13 (hg19)]

More interesting: make data

Some random data

## Generate
gr <- GRanges(
    sample(paste0("chr", 1:13), 50, TRUE),
    ranges = IRanges(round(runif(50, 1, 1e8)), width = 1000)
)
## Get chr lengths
seqlengths(gr) <- seqlengths(hg19Ideogram)[names(seqlengths(gr))]
## Add a group variable
gr$group <- factor(sample(letters[1:4], 50, TRUE))

More interesting: data exploration

gr
GRanges with 50 ranges and 1 metadata column:
       seqnames               ranges strand   |    group
          <Rle>            <IRanges>  <Rle>   | <factor>
   [1]     chr2 [96543912, 96544911]      *   |        d
   [2]     chr4 [ 3299785,  3300784]      *   |        d
   [3]     chr8 [65888200, 65889199]      *   |        c
   [4]     chr7 [16710711, 16711710]      *   |        a
   [5]     chr4 [75695759, 75696758]      *   |        d
   ...      ...                  ...    ... ...      ...
  [46]     chr2 [30052086, 30053085]      *   |        c
  [47]     chr1 [31094192, 31095191]      *   |        a
  [48]     chr9 [89971281, 89972280]      *   |        d
  [49]    chr10 [63767875, 63768874]      *   |        a
  [50]     chr3 [ 7818785,  7819784]      *   |        a
  ---
  seqlengths:
        chr1     chr10     chr11     chr12 ...      chr7      chr8      chr9
   249250621 135534747 135006516 133851895 ... 159138663 146364022 141213431
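
Beyond printing the object, a few tabulations give a quick feel for the data. This is a minimal sketch that rebuilds a comparable object. The seed and the name gr_x are additions for illustration, so the values differ from the printout above.

```r
library("GenomicRanges")

## Rebuild a comparable object; the seed is an addition for reproducibility
set.seed(1)
gr_x <- GRanges(
    sample(paste0("chr", 1:13), 50, TRUE),
    ranges = IRanges(round(runif(50, 1, 1e8)), width = 1000)
)
gr_x$group <- factor(sample(letters[1:4], 50, TRUE))

## Ranges per chromosome and per group
table(as.character(seqnames(gr_x)))
table(gr_x$group)

## Cross-tabulate chromosome by group
table(as.character(seqnames(gr_x)), gr_x$group)

## Every range is 1 kb wide
unique(width(gr_x))
```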

More interesting: plot code

autoplot(seqinfo(gr)) + layout_karyogram(gr, aes(fill=group, color=group))

More interesting: result

[Figure: karyogram with the random ranges overlaid, filled and colored by group]

Tracks

First some noisy data

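## Five blocks of 100 normal draws; block z has mean 10 * z * (-1)^z and sd 2 * z
y <- as.vector(sapply(1:5, function(z) rnorm(100, 10 * z * (-1)^z, 2 * z)))
## Shift the values so they are all positive and add an index for the x axis
df <- data.frame(y=y + 1.1 * abs(min(y)), x=seq_along(y))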
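
To see what this simulation produces, the sketch below repeats it with a fixed seed and inspects the result. The seed is an addition for reproducibility and is not part of the original code.

```r
## Rebuild the simulated signal with a fixed seed (the seed is an addition)
set.seed(1)
y <- as.vector(sapply(1:5, function(z) rnorm(100, 10 * z * (-1)^z, 2 * z)))
df <- data.frame(y = y + 1.1 * abs(min(y)), x = seq_along(y))

## Five blocks of 100 points each
nrow(df)

## Block means before the shift: close to -10, 20, -30, 40, -50
block_means <- tapply(y, rep(1:5, each = 100), mean)
block_means

## The shift by 1.1 * abs(min(y)) keeps the whole curve above zero
min(df$y) > 0
```

The alternating sign and the growing standard deviation give a step-like, increasingly noisy curve that stands in for a coverage signal.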

We can visualize it with ggplot2

p.noisy <- ggplot(df, aes(x=x, y=y)) + geom_line()
p.noisy

Tracks: ggplot2

[Figure: line plot of the noisy data (y against x)]

Tracks: fake exons

exons <- GRanges(rep("chr1", 2), IRanges(c(101, 301), width=100))
p.exon <- autoplot(exons)
print(p.exon)

Tracks: fake exons plot

[Figure: the two fake exons on chr1]

Combine

final <- tracks(p.ideo, "Coverage" = p.noisy, "Exons" = p.exon,
    heights = c(2, 6, 3), title = "Tracks plot!") +
    ylab("") + theme_tracks_sunset()
print(final)

Tracks: final plot

[Figure: combined tracks plot with the ideogram, Coverage, and Exons tracks]

More sophisticated cases

install.packages("devtools")
library("devtools")
install_github("lcolladotor/derfinder")
library("derfinder")
?plotOverview
?plotCluster

plotCluster example


End