do.call

do.call() has rather surprising behaviour. You might expect that do.call("f", list(x)) would be equivalent to f(x), but the call that is actually evaluated can look quite different. The following example shows what various calls to do.call() actually generate:
df <- data.frame(x = runif(1), y = runif(1))
f <- function(x) {
  sys.call()
}
# Worst: inlines f and df
do.call(f, list(df))
#> (function (x)
#> {
#>     sys.call()
#> })(list(x = 0.263280282728374, y = 0.70501884073019))
# Only inlines df
do.call("f", list(df))
#> f(list(x = 0.263280282728374, y = 0.70501884073019))
# Only inlines f
do.call(f, list(quote(df)))
#> (function (x)
#> {
#>     sys.call()
#> })(df)
# Best: inlines neither. Equivalent to f(df)
do.call("f", list(quote(df)))
#> f(df)
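To make the difference concrete, here is a minimal sketch that inspects the captured call rather than printing it. It redefines f() and df with fixed values so the output is reproducible; the names inlined and by_name are just illustrative.

```r
# Capture the call exactly as the function sees it.
f <- function(x) sys.call()

# Fixed values keep the output reproducible.
df <- data.frame(x = 1, y = 2)

inlined <- do.call("f", list(df))         # the data frame itself is embedded
by_name <- do.call("f", list(quote(df)))  # only the symbol `df` is embedded

# The second element of each call is the argument as f() received it.
is.data.frame(inlined[[2]])
#> [1] TRUE
is.symbol(by_name[[2]])
#> [1] TRUE
deparse(by_name)
#> [1] "f(df)"
```

In the first case the whole object travels inside the call, so anything that stores, prints or deparses the call has to deal with the full data frame.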
You might expect this to have performance implications: when the call is larger, capturing it (here with sys.call()) should be slower. In this benchmark, though, the difference is negligible:
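library(ggplot2)         # provides the diamonds dataset
library(microbenchmark)  # provides microbenchmark()
object.size(diamonds)
#> 3456256 bytes
microbenchmark(
  do.call(f, list(diamonds)),
  do.call("f", list(diamonds)),
  do.call(f, list(quote(diamonds))),
  do.call("f", list(quote(diamonds)))
)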
#> Unit: microseconds
#>                                 expr  min   lq median   uq   max neval
#>           do.call(f, list(diamonds)) 1.76 1.90   2.09 2.60 46.48   100
#>         do.call("f", list(diamonds)) 1.79 2.02   2.24 2.71 11.72   100
#>    do.call(f, list(quote(diamonds))) 1.76 1.92   2.18 2.45 28.63   100
#>  do.call("f", list(quote(diamonds))) 1.80 2.09   2.27 2.67  9.79   100
This is a somewhat artificial benchmark, because functions that capture their own call with sys.call() or match.call() are not that common. It’s more useful to inspect the performance of a function that works with the data:
microbenchmark(
  do.call(head, list(diamonds)),
  do.call("head", list(diamonds)),
  do.call(head, list(quote(diamonds))),
  do.call("head", list(quote(diamonds)))
)
#> Unit: microseconds
#>                                    expr min   lq median   uq   max neval
#>           do.call(head, list(diamonds)) 559 1393   2077 2349 25044   100
#>         do.call("head", list(diamonds)) 618 1596   2074 2320 25191   100
#>    do.call(head, list(quote(diamonds))) 217  243    297  329  1137   100
#>  do.call("head", list(quote(diamonds))) 215  234    252  319   474   100
However, something strange happens if we wrap head() in another function:
head2 <- function(x) {
  head(x)
}
microbenchmark(
  do.call(head, list(diamonds)),
  do.call(head2, list(diamonds))
)
#> Unit: microseconds
#>                            expr min   lq median   uq   max neval
#>   do.call(head, list(diamonds)) 595 1146   2018 2383 24638   100
#>  do.call(head2, list(diamonds)) 216  234    291  318   391   100
I suspect that this is because head() is an S3 generic, which is likely to do some manipulation of the call before calling the method. If we call a function that isn’t an S3 generic, some of the inlined calls are actually slightly faster, probably because inlining avoids variable lookup:
microbenchmark(
  do.call(nrow, list(diamonds)),
  do.call("nrow", list(diamonds)),
  do.call(nrow, list(quote(diamonds))),
  do.call("nrow", list(quote(diamonds)))
)
#> Unit: microseconds
#>                                    expr  min   lq median   uq  max neval
#>           do.call(nrow, list(diamonds)) 5.29 5.74   5.97 6.34 51.5   100
#>         do.call("nrow", list(diamonds)) 5.56 5.81   6.07 6.39 15.0   100
#>    do.call(nrow, list(quote(diamonds))) 5.53 5.78   5.98 6.32 16.4   100
#>  do.call("nrow", list(quote(diamonds))) 5.58 5.93   6.13 6.42 51.2   100
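The effect of a wrapper like head2() can be seen directly by capturing the call that the inner function receives. This is only a sketch: inner() and wrapper() are hypothetical stand-ins for head() and head2(), and the built-in mtcars dataset stands in for diamonds so the example needs no extra packages.

```r
# inner() reports the call it was invoked with.
inner <- function(x) sys.call()

# wrapper() passes its argument on by name, as head2() does.
wrapper <- function(x) inner(x)

direct  <- do.call(inner, list(mtcars))    # object is inlined into inner's call
wrapped <- do.call(wrapper, list(mtcars))  # object is inlined into wrapper's call only

# Called directly, the argument of inner() is the data frame itself.
is.data.frame(direct[[2]])
#> [1] TRUE

# Called through the wrapper, inner() only sees the symbol x.
deparse(wrapped)
#> [1] "inner(x)"
```

The wrapper absorbs the inlined object, so whatever the inner function does with its own call operates on a small expression.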
The biggest impact of inlining with do.call() shows up when an error occurs. It takes substantially longer for control to return to the console when a large object is inlined:
h <- function(x) {
  stop("!")
}
system.time(do.call(h, list(diamonds)))
#> Timing stopped at: 0.687 0.006 0.693
system.time(do.call("h", list(quote(diamonds))))
#> Timing stopped at: 0.001 0 0.002
(The previous chunk is not evaluated by knitr because I can’t reproduce the problem in a non-interactive setting. Instead I executed the code by hand and copied and pasted the output.)
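One way to explore this non-interactively is to look at the call stored in the error condition instead of timing the console. The sketch below uses the built-in mtcars dataset as a stand-in for diamonds, and the names call_inlined and call_by_name are illustrative. It shows that the inlined object ends up inside the condition's call; that this is what slows down error reporting is an assumption, not something the timings above establish.

```r
h <- function(x) stop("!")

# Return the call recorded in the error condition instead of printing the error.
get_call <- function(e) conditionCall(e)

call_inlined <- tryCatch(do.call(h, list(mtcars)), error = get_call)
call_by_name <- tryCatch(do.call("h", list(quote(mtcars))), error = get_call)

# The by-name call is a single short line of text.
deparse(call_by_name)
#> [1] "h(mtcars)"

# The inlined call carries the whole data frame, so it deparses to many lines.
length(deparse(call_inlined)) > length(deparse(call_by_name))
#> [1] TRUE
```

Anything that needs a text form of the failing call has far more to process in the inlined case.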