ggpipe overhead

Sergio Oller

2017-11-08

This document benchmarks two simple plotting tasks, each written once with ggplot2 and once with ggpipe, to measure the performance penalty of ggpipe.

ggpipe is somewhat slower than ggplot2 because it uses tryCatch and evaluates one of the arguments twice to determine whether that argument is a ggplot object, and therefore whether the function is being piped or added.
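The piping-versus-adding decision can be illustrated with a small base R sketch. This is not ggpipe's source code: `new_plot`, `add_layer`, `make_pipeable` and `toy_layer` are hypothetical names, and the "plot" is a toy list carrying the class "ggplot". It only shows the general idea of inspecting the first argument inside tryCatch and dispatching on the result.

```r
# Toy stand-in for a plot: a list of layers with class "ggplot".
new_plot <- function() structure(list(layers = list()), class = "ggplot")

# Toy stand-in for ggplot2's `+`: append a layer to the plot.
add_layer <- function(plot, layer) {
  plot$layers <- c(plot$layers, list(layer))
  plot
}

# Hypothetical wrapper (an assumption, not ggpipe's implementation):
# turn a layer constructor into a function that also accepts a plot first.
make_pipeable <- function(layer_fun) {
  function(first, ...) {
    # Guarded inspection of the first argument: if evaluating it fails,
    # treat the call as the ordinary (non-piped) form.
    piped <- tryCatch(inherits(first, "ggplot"), error = function(e) FALSE)
    if (piped) {
      add_layer(first, layer_fun(...))  # piping: plot %>% layer(...)
    } else {
      layer_fun(first, ...)             # adding: plot + layer(...)
    }
  }
}

toy_layer <- function(label = "default") list(label = label)
piped_layer <- make_pipeable(toy_layer)

p <- piped_layer(new_plot(), "points")  # plot passed first, as %>% would do
l <- piped_layer("points")              # plain constructor call
```

In the first call the wrapper returns the plot with one extra layer; in the second it returns the bare layer, which could then be added with `+`. The extra class check on every call is the kind of work that shows up as overhead in Task 1.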

We define two tasks: Task 1 defines a plot, and Task 2 defines and builds it.

Task 1 is where most of the ggpipe overhead is expected to show, because the plot building step is done by ggplot2 in either case. However, Task 2 reflects the most common use case (we usually want to print the plots we define).

Task 1 results: Define a plot

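library(dplyr)   # provides %>%, group_by(), summarize(), bind_rows()
library(ggpipe)  # pipe-friendly versions of the ggplot2 functions

b1 <- microbenchmark::microbenchmark(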
  ggpipe = {p <- ggpipe::ggplot(iris) %>% ggpipe::geom_point(ggpipe::aes(x = Sepal.Length, y = Sepal.Width)) %>% ggpipe::scale_x_continuous("test")},
  ggplot2 = {p <- ggplot2::ggplot(iris) + ggplot2::geom_point(ggplot2::aes(x = Sepal.Length, y = Sepal.Width)) + ggplot2::scale_x_continuous("test")})
b1$task <- "Define Plot"

print(b1)
## Unit: milliseconds
##     expr      min       lq     mean   median       uq       max neval cld
##   ggpipe 1.650381 1.717074 1.885598 1.764209 1.856898  4.842685   100   a
##  ggplot2 1.467166 1.532710 2.227409 1.590813 1.687745 44.903408   100   a

Task 2 results: Define + Build plot

b2 <- microbenchmark::microbenchmark(
  ggpipe = {ggplot2::ggplot_build(ggpipe::ggplot(iris) %>% ggpipe::geom_point(ggpipe::aes(x = Sepal.Length, y = Sepal.Width)) %>% ggpipe::scale_x_continuous("test"))},
  ggplot2 = {ggplot2::ggplot_build(ggplot2::ggplot(iris) + ggplot2::geom_point(ggplot2::aes(x = Sepal.Length, y = Sepal.Width)) + ggplot2::scale_x_continuous("test"))})
b2$task <- "Define+Build Plot"

print(b2)
## Unit: milliseconds
##     expr      min       lq     mean   median       uq      max neval cld
##   ggpipe 10.82524 11.47942 13.30970 11.84620 12.53144 116.1467   100   a
##  ggplot2 10.49926 11.24596 12.90153 11.69261 12.44777 102.9085   100   a
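To put the two tables side by side, the overhead can be worked out from the printed medians. The sketch below copies the median values from the output above into a hypothetical `medians` data frame and computes the absolute and relative difference. With these numbers the difference is about 0.17 ms (roughly 11%) for Task 1 and about 0.15 ms (roughly 1.3%) for Task 2.

```r
# Median times in milliseconds, copied from the two printed tables.
medians <- data.frame(
  task    = c("Define Plot", "Define+Build Plot"),
  ggpipe  = c(1.764209, 11.84620),
  ggplot2 = c(1.590813, 11.69261)
)

# Absolute overhead of ggpipe in milliseconds.
medians$overhead_ms <- medians$ggpipe - medians$ggplot2

# Overhead relative to the ggplot2 median, in percent.
medians$overhead_pct <- 100 * medians$overhead_ms / medians$ggplot2

print(medians)
```

The absolute overhead is similar in both tasks, which is consistent with the cost coming from plot definition rather than from the build step.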

The median times are shown below as a bar plot. microbenchmark stores times in nanoseconds, so they are divided by 1E6 to give milliseconds:

b <- dplyr::bind_rows(b1, b2)

b %>%
  group_by(expr, task) %>%
  summarize(median_time = median(time)/1E6) %>%
  ungroup %>%
  ggplot %>%
  geom_col(aes(x = expr, y = median_time, fill = expr)) %>%
  xlab('Package') %>%
  ylab("Median time (ms) (less is better)") %>%
  facet_wrap(~task) %>%
  guides(fill = "none")
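The summarising step can also be reproduced in base R, which makes the unit conversion explicit. The sketch below uses a hypothetical `toy` data frame with made-up timings in nanoseconds, shaped like the `expr` and `time` columns of microbenchmark output; it is not the benchmark data. One slow run is included on purpose to show why the median is used rather than the mean.

```r
# Hypothetical timings in nanoseconds (made up for illustration).
toy <- data.frame(
  expr = c("ggpipe", "ggpipe", "ggpipe", "ggplot2", "ggplot2", "ggplot2"),
  task = "Define Plot",
  time = c(1.7e6, 1.8e6, 4.8e6, 1.5e6, 1.6e6, 44.9e6)
)

# Median time per package and task, as in the dplyr pipeline.
med <- aggregate(time ~ expr + task, data = toy, FUN = median)
med$median_time <- med$time / 1e6  # nanoseconds to milliseconds

# Mean time for comparison; a single slow run dominates it.
avg <- aggregate(time ~ expr + task, data = toy, FUN = mean)
avg$mean_time <- avg$time / 1e6

print(med)
print(avg)
```

In this toy data the ggplot2 median stays below the ggpipe median, while the ggplot2 mean is pulled far above it by the single slow run. The Task 1 table shows the same pattern: ggplot2 has the lower median but the higher mean.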

The individual measurements are plotted below with the y axis limited to 0 to 20 ms, the region where most of the data points are:

ggplot(b) %>%
  geom_jitter(aes(x = expr, y = time/1E6, color = expr)) %>%
  xlab('Package') %>%
  ylab("Time (ms) (less is better)") %>%
  ylim(0, 20) %>%
  facet_wrap(~task) %>%
  guides(colour = "none")