This document benchmarks two simple plots, each built once with ggplot2 and once with ggpipe, to measure the performance penalty of ggpipe.
ggpipe is somewhat slower than ggplot2 because it uses tryCatch and evaluates one of the arguments twice: it has to check whether that argument is a ggplot object in order to know whether we are piping or adding.
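The ggpipe source is not reproduced in this document, so the following is only a minimal base R sketch of the kind of dispatch described above, using a toy plot class. All names in it (`new_toy_plot`, `toy_layer`, `add_layer`, `pipeable`, `noisy_name`) are hypothetical and are not part of ggpipe or ggplot2.

```r
# Hypothetical sketch, NOT the actual ggpipe implementation.
# A toy "plot" class stands in for a ggplot object.
new_toy_plot <- function(layers = character()) {
  structure(list(layers = layers), class = "toy_plot")
}

# "Adding" style: the function returns a layer to be combined with a plot later.
toy_layer <- function(name) {
  structure(list(name = name), class = "toy_layer")
}

add_layer <- function(plot, layer) {
  new_toy_plot(c(plot$layers, layer$name))
}

# Wrap a layer function so that it also accepts a plot as its first argument.
pipeable <- function(layer_fun) {
  function(...) {
    args <- as.list(substitute(list(...)))[-1]  # unevaluated arguments
    caller <- parent.frame()
    # First evaluation of the first argument, guarded by tryCatch.
    first <- if (length(args) > 0) {
      tryCatch(eval(args[[1]], caller), error = function(e) NULL)
    } else {
      NULL
    }
    if (inherits(first, "toy_plot")) {
      # Piping: build the layer from the remaining arguments and attach it.
      layer <- eval(as.call(c(list(layer_fun), args[-1])), caller)
      add_layer(first, layer)
    } else {
      # Adding: hand everything to the original function, which
      # evaluates the first argument a second time.
      layer_fun(...)
    }
  }
}

toy_point <- pipeable(toy_layer)

# Piping style: x %>% f(y) is equivalent to f(x, y).
p <- toy_point(new_toy_plot(), "point")

# Adding style: the first argument is evaluated twice.
counter <- 0
noisy_name <- function() {
  counter <<- counter + 1
  "point"
}
l <- toy_point(noisy_name())
counter  # 2
```

In this sketch the extra cost per call is one guarded evaluation plus a class check, which is consistent with the overhead being concentrated in the plot definition step.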
We define two tasks:
Task 1: Define a plot
Task 2: Define + Build plot
Task 1 is where most of the ggpipe overhead is expected to be, since the plot building step is done by ggplot2 in either case. Task 2, however, reflects the most common use case (we usually want to print the plots we define).
Task 1 results: Define a plot
library(magrittr)  # provides the %>% pipe used below
b1 <- microbenchmark::microbenchmark(
  ggpipe = {
    p <- ggpipe::ggplot(iris) %>%
      ggpipe::geom_point(ggpipe::aes(x = Sepal.Length, y = Sepal.Width)) %>%
      ggpipe::scale_x_continuous("test")
  },
  ggplot2 = {
    p <- ggplot2::ggplot(iris) +
      ggplot2::geom_point(ggplot2::aes(x = Sepal.Length, y = Sepal.Width)) +
      ggplot2::scale_x_continuous("test")
  }
)
b1$task <- "Define Plot"
print(b1)
## Unit: milliseconds
##     expr      min       lq     mean   median       uq       max neval cld
##   ggpipe 1.650381 1.717074 1.885598 1.764209 1.856898  4.842685   100   a
##  ggplot2 1.467166 1.532710 2.227409 1.590813 1.687745 44.903408   100   a
Task 2 results: Define + Build plot
b2 <- microbenchmark::microbenchmark(
  ggpipe = {
    ggplot2::ggplot_build(
      ggpipe::ggplot(iris) %>%
        ggpipe::geom_point(ggpipe::aes(x = Sepal.Length, y = Sepal.Width)) %>%
        ggpipe::scale_x_continuous("test")
    )
  },
  ggplot2 = {
    ggplot2::ggplot_build(
      ggplot2::ggplot(iris) +
        ggplot2::geom_point(ggplot2::aes(x = Sepal.Length, y = Sepal.Width)) +
        ggplot2::scale_x_continuous("test")
    )
  }
)
b2$task <- "Define+Build Plot"
print(b2)
## Unit: milliseconds
##     expr      min       lq     mean   median       uq      max neval cld
##   ggpipe 10.82524 11.47942 13.30970 11.84620 12.53144 116.1467   100   a
##  ggplot2 10.49926 11.24596 12.90153 11.69261 12.44777 102.9085   100   a
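To put the two tables side by side, the medians printed above can be turned into an absolute and a relative overhead. This is a small sketch that copies the median values by hand from the output. The data frame and its column names are introduced here only for illustration, and the figures come from a single run of 100 evaluations, so they should be read as indicative.

```r
# Medians (ms) copied from the two microbenchmark summaries above.
medians <- data.frame(
  task    = c("Define Plot", "Define+Build Plot"),
  ggpipe  = c(1.764209, 11.84620),
  ggplot2 = c(1.590813, 11.69261)
)

# Absolute overhead in milliseconds and relative overhead in percent.
medians$overhead_ms  <- medians$ggpipe - medians$ggplot2
medians$overhead_pct <- 100 * medians$overhead_ms / medians$ggplot2

print(medians)
```

With these numbers the median overhead is roughly 0.17 ms (about 11%) when only defining the plot and roughly 0.15 ms (about 1.3%) when defining and building it, which matches the expectation that the overhead sits in Task 1 and is diluted by the build step in Task 2.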
This is the median time represented in a bar plot:
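library(dplyr)   # provides %>%, group_by, summarize and ungroup
library(ggpipe)  # the unprefixed, piped plotting calls below rely on ggpipe being attached
b <- dplyr::bind_rows(b1, b2)
b %>%
  group_by(expr, task) %>%
  summarize(median_time = median(time)/1E6) %>%  # microbenchmark stores time in nanoseconds
  ungroup %>%
  ggplot %>%
  geom_col(aes(x = expr, y = median_time, fill = expr)) %>%
  xlab('Package') %>%
  ylab("Median time (ms) (less is better)") %>%
  facet_wrap(~task) %>%
  guides(fill = "none")  # "none" replaces the deprecated FALSE
The next plot zooms in on the region where most of the data points lie: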
ggplot(b) %>%
  geom_jitter(aes(x = expr, y = time/1E6, color = expr)) %>%
  xlab('Package') %>%
  ylab("Time (ms) (less is better)") %>%
  ylim(0, 20) %>%  # points above 20 ms are dropped from the plot
  facet_wrap(~task) %>%
  guides(colour = "none")  # "none" replaces the deprecated FALSE