ggplot2 redux

random things I like and other stuff

Marcus Beck
ORISE post-doc

To cover

  • facet_wrap, facet_grid
  • themes, preset and custom
  • ggmap
  • GGally

What can you do w/ ggplot2 that you can't do w/ base functions?

R Code Chunk Example

  • Faceting is one of the more powerful aspects of ggplot2

  • plot y vs x (or just x) by z, where z is some categorical variable

  • facet_grid or facet_wrap

library(ggplot2) # diamonds ships with ggplot2
data(diamonds)
head(diamonds)
##   carat       cut color clarity depth table price    x    y    z
## 1  0.23     Ideal     E     SI2  61.5    55   326 3.95 3.98 2.43
## 2  0.21   Premium     E     SI1  59.8    61   326 3.89 3.84 2.31
## 3  0.23      Good     E     VS1  56.9    65   327 4.05 4.07 2.31
## 4  0.29   Premium     I     VS2  62.4    58   334 4.20 4.23 2.63
## 5  0.31      Good     J     SI2  63.3    58   335 4.34 4.35 2.75
## 6  0.24 Very Good     J    VVS2  62.8    57   336 3.94 3.96 2.48

Simple scatterplot


p1 <- ggplot(diamonds, aes(x = carat, 
        y = price)) +
    geom_point()
p1


Simple scatterplot with facet_wrap


p1 + facet_wrap(~color)

A simple scatterplot of diamond price by carat, facet_wrap by color


Simple scatterplot with facet_grid


p1 + facet_grid(~color)

A simple scatterplot of diamond price by carat, facet_grid by color

What's the difference?


facet_wrap vs facet_grid

  • more important for multiple facet variables
  • facet_wrap always has one horizontal facet label on the top
p1 + facet_wrap(~ cut + color)


facet_wrap vs facet_grid

  • more important for multiple facet variables
  • facet_grid can have both horizontal, vertical facet labels
p1 + facet_grid(cut ~ color)


facet_wrap vs facet_grid

  • order of variables affects position of facets
  • facet_wrap orders facets by position in the call
p1 + facet_wrap(~ color + cut) # same as facet_wrap(color ~ cut)


facet_wrap vs facet_grid

  • order of variables affects position of facets
  • facet_wrap orders facets by position in the call
p1 + facet_wrap(~ cut + color) # same as facet_wrap(cut ~ color)


facet_wrap vs facet_grid

  • order of variables affects position of facets
  • facet_grid orders vertical/horizontal facets by left/right of tilde
p1 + facet_grid(cut ~ color) # not the same as facet_grid(~ cut + color)


facet_wrap vs facet_grid

  • order of variables affects position of facets
  • facet_grid orders vertical/horizontal facets by left/right of tilde
p1 + facet_grid(color ~ cut) # not the same as facet_grid(~ color + cut)


facet_wrap vs facet_grid

  • both take a scales argument to free the axes; the default is 'fixed' (all panels share the same axes)
p1 + facet_wrap(~ color + cut, scales = 'free') # or 'free_x', 'free_y' 


facet_wrap vs facet_grid

  • facet_grid treats scales differently: free y scales are shared within each row, free x scales within each column
p1 + facet_grid(color ~ cut, scales = 'free') # or 'free_x', 'free_y' 
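
The difference in scale handling can be checked without looking at the plots. The sketch below counts the distinct axis scales that each facet function creates. It assumes a reasonably recent ggplot2 in which `ggplot_build(p)$layout$layout` is a data frame with `SCALE_X` and `SCALE_Y` columns; that is an internal structure, not a documented interface, so it may change between versions. `count_scales` is a made-up helper name.

```r
library(ggplot2)

p1 <- ggplot(diamonds, aes(x = carat, y = price)) +
    geom_point()

# hypothetical helper: number of distinct x and y scales in a faceted plot
# (relies on ggplot_build() internals, see the assumption above)
count_scales <- function(p) {
    lay <- ggplot_build(p)$layout$layout
    c(x = length(unique(lay$SCALE_X)), y = length(unique(lay$SCALE_Y)))
}

wrap_scales <- count_scales(p1 + facet_wrap(~ color + cut, scales = 'free'))
grid_scales <- count_scales(p1 + facet_grid(color ~ cut, scales = 'free'))

wrap_scales # facet_wrap: scales are freed panel by panel
grid_scales # facet_grid: scales are freed by column (x) and by row (y)
```

If the assumption holds, facet_wrap should report many more scales than facet_grid for the same two facet variables.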


facet_wrap vs facet_grid

  • facet_wrap takes the ncol (or nrow) argument to control how panels wrap
p1 + facet_wrap(~ color + cut, ncol = 7)


facet_wrap vs facet_grid

  • facet_grid always creates a full rows-by-columns grid; with no row variable, every panel lands in a single row
p1 + facet_grid(~ color + cut)
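
A quick way to compare the layouts is to read the panel grid dimensions from the built plot. This is only a sketch: it assumes `ggplot_build(p)$layout$layout` has integer `ROW` and `COL` columns, which is true in recent ggplot2 versions but is an internal detail. `panel_dims` is a hypothetical helper.

```r
library(ggplot2)

p1 <- ggplot(diamonds, aes(x = carat, y = price)) +
    geom_point()

# hypothetical helper: rows and columns of the panel layout
# (relies on ggplot_build() internals, see the assumption above)
panel_dims <- function(p) {
    lay <- ggplot_build(p)$layout$layout
    c(rows = max(lay$ROW), cols = max(lay$COL))
}

dims_wrap <- panel_dims(p1 + facet_wrap(~ color + cut, ncol = 7))
dims_grid_one_row <- panel_dims(p1 + facet_grid(~ color + cut))
dims_grid_two_way <- panel_dims(p1 + facet_grid(color ~ cut))

dims_wrap         # panels wrapped into 7 columns
dims_grid_one_row # every color/cut combination side by side in one row
dims_grid_two_way # color levels as rows, cut levels as columns
```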


A huge advantage of facets...

  • Very easy to quickly evaluate a variable by multiple categories
  • Adding some fake variables to diamonds...
# TRUE rather than T: T is an ordinary variable and can be overwritten
diamonds$fake1 <- sample(c('A', 'B'), nrow(diamonds), replace = TRUE)
diamonds$fake2 <- sample(c('C', 'D'), nrow(diamonds), replace = TRUE)
diamonds$fake3 <- sample(c('E', 'F'), nrow(diamonds), replace = TRUE)
diamonds$fake4 <- sample(c('G', 'H'), nrow(diamonds), replace = TRUE)
head(diamonds[, grep('fake', names(diamonds))])
##   fake1 fake2 fake3 fake4
## 1     B     D     F     G
## 2     A     D     E     H
## 3     A     D     F     G
## 4     A     C     E     G
## 5     B     D     E     G
## 6     A     C     F     H

A huge advantage of facets...

p1 + facet_grid(fake1 + fake2 ~ fake3 + fake4)
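
With four facet variables the strip labels get hard to read, since they show only the values. One option, sketched here on a random subsample with its own fake columns (the seed, the sample size and the object names `d` and `p_lab` are arbitrary choices for illustration), is the `label_both` labeller, which prints the variable name next to each value.

```r
library(ggplot2)

set.seed(1) # arbitrary seed so the fake variables are reproducible
d <- as.data.frame(diamonds[sample(nrow(diamonds), 2000), ])
d$fake1 <- sample(c('A', 'B'), nrow(d), replace = TRUE)
d$fake2 <- sample(c('C', 'D'), nrow(d), replace = TRUE)
d$fake3 <- sample(c('E', 'F'), nrow(d), replace = TRUE)
d$fake4 <- sample(c('G', 'H'), nrow(d), replace = TRUE)

# label_both writes strips as "variable: value" instead of just "value"
p_lab <- ggplot(d, aes(x = carat, y = price)) +
    geom_point() +
    facet_grid(fake1 + fake2 ~ fake3 + fake4, labeller = label_both)
p_lab
```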


Facet summary

facet_wrap and facet_grid accomplish similar tasks, with slight differences

facet_wrap

  • only horizontal facet labels
  • wraps panels into rows, layout set with ncol (or nrow)
  • free scales apply to each panel individually

facet_grid

  • horizontal/vertical facet labels
  • always a full rows-by-columns grid
  • free scales are shared within each row (y) and each column (x)

    • Not apparent why you would use one over the other...
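
One practical difference that may help with the choice: facet_grid lays out every row-by-column combination, even ones with no data, while facet_wrap draws only the combinations that occur. The sketch below uses a made-up data frame (`toy`, not from the talk) in which one combination is missing, and counts panels through `ggplot_build(p)$layout$layout`, an internal structure that is assumed to have one row per panel.

```r
library(ggplot2)

# toy data, made up for illustration: the combination a = 'y', b = 'q' never occurs
toy <- data.frame(
    a = c('x', 'x', 'y'),
    b = c('p', 'q', 'p'),
    v = c(1, 2, 3),
    w = c(3, 1, 2)
    )
p_toy <- ggplot(toy, aes(x = v, y = w)) +
    geom_point()

# number of panels each facet function lays out
n_wrap <- nrow(ggplot_build(p_toy + facet_wrap(~ a + b))$layout$layout)
n_grid <- nrow(ggplot_build(p_toy + facet_grid(a ~ b))$layout$layout)

n_wrap # only the combinations present in the data
n_grid # the full 2 x 2 grid, including one empty panel
```

So facet_grid is the better fit when empty cells carry meaning or panels must line up by row and column, and facet_wrap when the goal is to use the plotting area efficiently.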

Themes


  • ggplot2 creates plots with a given theme, default is theme_grey()
data(iris)
p2 <- ggplot(iris, aes(x = Sepal.Length, 
        y = Sepal.Width, 
        colour = Species)) +
    geom_point()
p2


Themes


  • Other pre-loaded themes are theme_bw() and...
p2 + theme_bw() 


Themes


  • theme_classic()
p2 + theme_classic()    


Themes

  • Themes are simply complete calls to the ggplot2::theme function
  • A custom theme is easily made
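library(grid) # provides unit(); recent ggplot2 versions also re-export it
ugly_theme <- theme(
    panel.background = element_rect(fill = "green"), 
    # ggplot2 >= 3.4.0 prefers linewidth = 3 over size = 3 for lines
    axis.line = element_line(size = 3, colour = "red", linetype = "dotted"),
    axis.text = element_text(colour = "blue"),
    axis.ticks.length = unit(.85, "cm")
    )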
  • Complete list of theme options here

Themes


  • Great success!
p2 + ugly_theme


Themes

  • The user-defined theme can also be set as default by updating an existing theme
ugly_default <- function(){
    theme_grey() %+replace%
    theme(
        panel.background = element_rect(fill = "green"), 
        axis.line = element_line(size = 3, colour = "red", linetype = "dotted"),
        axis.text = element_text(colour = "blue"),
        axis.ticks.length = unit(.85, "cm")
        )
    }
theme_set(ugly_default())

p2
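
Two details of the code above are easy to miss, so here is a small sketch. First, `%+replace%` swaps in each named element wholesale, whereas `+` merges the new settings into the existing element. Second, `theme_set()` invisibly returns the previous default, which makes it easy to undo. The object names (`added`, `replaced`, `old`) are arbitrary, and reading element properties with `$` is assumed to work in the installed ggplot2 version.

```r
library(ggplot2)

# `+` merges into the existing element: properties not mentioned are kept
added <- theme_grey() +
    theme(axis.text = element_text(colour = "blue"))

# `%+replace%` replaces the whole element: properties not mentioned become NULL
replaced <- theme_grey() %+replace%
    theme(axis.text = element_text(colour = "blue"))

added$axis.text$size    # size inherited from theme_grey() is still there
replaced$axis.text$size # NULL, the original axis.text settings are gone

# theme_set() returns the previous default invisibly, so keep it to restore later
old <- theme_set(replaced)
theme_set(old)
```

Restoring the old theme matters in a session like this one, since every later plot would otherwise pick up the new default.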

Themes

  • The ggthemes library provides additional themes
library(devtools)
install_github('jrnold/ggthemes') # also on CRAN: install.packages('ggthemes')
library(ggthemes)   
  • Check the repo on GitHub for more info

Themes


  • The ggthemes library provides additional themes
  • Wall Street Journal theme
p2 + theme_wsj()


Themes


  • The ggthemes library provides additional themes
  • Google docs theme
p2 + theme_gdocs()


Themes


  • The ggthemes library provides additional themes
  • Even this old school Excel theme...
p2 + theme_excel()


Easy mapping with ggmap

The basic idea of ggmap is to take a downloaded map image, plot it as a context layer using ggplot2, and then plot additional content layers of data, statistics, or models on top of the map.

Kahle and Wickham 2011

Easy mapping with ggmap

  • install/load ggmap
  • download the images and format for plotting, done with get_map
install.packages('ggmap')
library(ggmap)

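# get map by location
# note: recent ggmap versions need a Google Maps API key for source = 'google',
# registered beforehand with register_google()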
loc <- 'Environmental Protection Agency, 1 Sabine Drive, Gulf Breeze, FL'
my_map <- get_map(
    location = loc, 
    source = 'google', 
    maptype = 'terrain', 
    zoom = 13
    )

Easy mapping with ggmap


  • plot with ggmap
ggmap(my_map, extent = 'panel')


Easy mapping with ggmap


  • Now add some regular ggplot2 content layers
pts <- data.frame(
    lon = c(-87.1930, -87.2050, -87.1571),
    lat = c(30.3473, 30.3406, 30.3380),
    lab = c('Site 1', 'Site 2', 'Home')
    )
ggmap(my_map, extent = 'panel',
    base_layer = ggplot(pts, 
            aes(x = lon, y = lat))) +
        geom_text(aes(label = lab))


Easy mapping with ggmap


  • Additional map types
  • See documentation for full list of options
my_map <- get_map(
    location = loc, 
    source = 'google', 
    maptype = 'satellite', 
    zoom = 13
    )
ggmap(my_map, extent = 'panel')


GGally


  • A helper to ggplot2, available on CRAN: templates for plots that combine into a plot matrix, a parallel coordinate plot function, and a function for making network plots
  • The generalized pairs plot is a plot matrix that builds on the standard pairs plot
data(tips, package = "reshape2")
pairs(tips[, 1:4])


GGally


  • Standard pairs plots are inadequate for exploratory analysis when the variables are a mix of quantitative and categorical
  • ggpairs builds a plot matrix in which each panel's plot type depends on the types of the two variables, using a ggplot2 framework
install.packages('GGally')
library(GGally)
ggpairs(tips[, 1:4])


GGally


  • Information above/below diagonal is not redundant
  • quantitative-quantitative: scatterplot
  • quantitative-categorical: boxplots
  • categorical-categorical: conditional barplots
library(GGally) # already installed above
ggpairs(tips[, 1:4])


GGally


  • Defaults can be customized
library(GGally) # already installed above
ggpairs(
  tips[,1:4],
  upper = list(continuous = "density", 
    combo = "box"),
  lower = list(continuous = "points", 
    combo = "dot")
    )
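
Another customization that is often handy, shown here as a sketch on iris rather than tips so it needs no extra data package: map a categorical column to colour across the whole matrix, then pull out a single panel with `[`. The object name `pm` is arbitrary, and the exact look of the panels depends on the installed GGally version.

```r
library(ggplot2)
library(GGally)

# colour every panel by species; column 5 of iris is the categorical Species
pm <- ggpairs(iris, mapping = aes(colour = Species), columns = 1:5)
pm

# extract one panel (row 2, column 1) to inspect or restyle it on its own
one_panel <- pm[2, 1]
one_panel
```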


GGally


  • Other plots... static parallel coordinate plots.
ggparcoord(data = iris, columns = 1:4, 
    groupColumn = 5, 
    order = "anyClass")


GGally


  • Other plots... network plots.
library(sna)

# the URL has to be one unbroken string; a line break inside the quotes corrupts it
con <- url("http://networkdata.ics.uci.edu/netdata/data/cities.RData")
print(load(con)); close(con)
# plot cities, firms and law firms
type = cities %v% "type"
type = ifelse(grepl("City|Law", type), 
    gsub("I+", "", type), "Firm")
ggnet(cities, mode = "kamadakawai", 
    alpha = .5, node.group = type, 
    label.nodes = c("Paris", "Beijing", 
        "Chicago"), 
    color = "darkred")


What does ggplot2 offer that isn't in base?

  • Many geoms
ls(pattern = '^geom_', envir = as.environment('package:ggplot2'))
##  [1] "geom_abline"     "geom_area"       "geom_bar"       
##  [4] "geom_bin2d"      "geom_blank"      "geom_boxplot"   
##  [7] "geom_contour"    "geom_crossbar"   "geom_density"   
## [10] "geom_density2d"  "geom_dotplot"    "geom_errorbar"  
## [13] "geom_errorbarh"  "geom_freqpoly"   "geom_hex"       
## [16] "geom_histogram"  "geom_hline"      "geom_jitter"    
## [19] "geom_line"       "geom_linerange"  "geom_map"       
## [22] "geom_path"       "geom_point"      "geom_pointrange"
## [25] "geom_polygon"    "geom_quantile"   "geom_raster"    
## [28] "geom_rect"       "geom_ribbon"     "geom_rug"       
## [31] "geom_segment"    "geom_smooth"     "geom_step"      
## [34] "geom_text"       "geom_tile"       "geom_violin"    
## [37] "geom_vline"

What does ggplot2 offer that isn't in base?

  • Easy faceting, pre-loaded and customized themes, spatial data, pairs plots...

  • See online documentation for additional functionality

  • Presentation materials available here

  • EPA, NHEERL slidify template from Jeff's repo