do.call
do.call() has rather surprising behaviour. You might expect that do.call("f", list(x)) would be equivalent to f(x), but that’s far from the truth. The following example shows what various calls to do.call() actually generate:
df <- data.frame(x = runif(1), y = runif(1))
f <- function(x) {
sys.call()
}
# Worst: inlines f and df
do.call(f, list(df))
#> (function (x)
#> {
#> sys.call()
#> })(list(x = 0.263280282728374, y = 0.70501884073019))
# Only inlines df
do.call("f", list(df))
#> f(list(x = 0.263280282728374, y = 0.70501884073019))
# Only inlines f
do.call(f, list(quote(df)))
#> (function (x)
#> {
#> sys.call()
#> })(df)
# Best: inlines neither. Equivalent to f(df)
do.call("f", list(quote(df)))
#> f(df)
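The difference can also be checked programmatically. The following is a minimal sketch (the object big and the helper label() are illustrative names, not part of the example above) that inspects the captured call and shows one practical consequence: a function that uses substitute() to build a label sees the inlined data rather than the variable name.

```r
# f() returns the call that was used to invoke it
f <- function(x) sys.call()

# Illustrative data; any moderately large object will do
big <- data.frame(x = runif(1e4), y = runif(1e4))

inlined <- do.call("f", list(big))         # the data frame is embedded in the call
by_name <- do.call("f", list(quote(big)))  # the call only holds the symbol `big`

is.data.frame(inlined[[2]])  # the call's argument is the data itself
is.symbol(by_name[[2]])      # the call's argument is just a name

# The inlined call is roughly as large as the data; the other is tiny
object.size(inlined)
object.size(by_name)

# Hypothetical helper that labels its argument the way many plotting
# functions do
label <- function(x) deparse(substitute(x))

label(big)                          # "big"
do.call("label", list(quote(big)))  # "big"
do.call("label", list(big))[1]      # start of the deparsed data frame
```

If a function relies on substitute() or on the captured call, passing quoted names to do.call() keeps its output readable.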
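This has some obvious performance implications: the larger the inlined call, the more work there is for any function that captures it, such as sys.call() (used in f() above) or match.call(). In this benchmark, though, the four variants take almost the same time: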
library(ggplot2)        # provides the diamonds dataset
library(microbenchmark)
object.size(diamonds)
#> 3456256 bytes
microbenchmark(
do.call(f, list(diamonds)),
do.call("f", list(diamonds)),
do.call(f, list(quote(diamonds))),
do.call("f", list(quote(diamonds)))
)
#> Unit: microseconds
#> expr min lq median uq max neval
#> do.call(f, list(diamonds)) 1.76 1.90 2.09 2.60 46.48 100
#> do.call("f", list(diamonds)) 1.79 2.02 2.24 2.71 11.72 100
#> do.call(f, list(quote(diamonds))) 1.76 1.92 2.18 2.45 28.63 100
#> do.call("f", list(quote(diamonds))) 1.80 2.09 2.27 2.67 9.79 100
This is a bit of an artificial benchmark, because match.call() is not that common. It’s more useful to inspect the performance of a data-modifying function:
microbenchmark(
do.call(head, list(diamonds)),
do.call("head", list(diamonds)),
do.call(head, list(quote(diamonds))),
do.call("head", list(quote(diamonds)))
)
#> Unit: microseconds
#> expr min lq median uq max neval
#> do.call(head, list(diamonds)) 559 1393 2077 2349 25044 100
#> do.call("head", list(diamonds)) 618 1596 2074 2320 25191 100
#> do.call(head, list(quote(diamonds))) 217 243 297 329 1137 100
#> do.call("head", list(quote(diamonds))) 215 234 252 319 474 100
However, something strange happens if we wrap head() in another function:
head2 <- function(x) {
head(x)
}
microbenchmark(
do.call(head, list(diamonds)),
do.call(head2, list(diamonds))
)
#> Unit: microseconds
#> expr min lq median uq max neval
#> do.call(head, list(diamonds)) 595 1146 2018 2383 24638 100
#> do.call(head2, list(diamonds)) 216 234 291 318 391 100
I suspect that this is because head() is an S3 generic, which is likely to do some manipulation of the call before calling the method. If we call a function that isn’t an S3 generic, some of the inlined calls are actually slightly faster, probably because inlining avoids variable lookup.
microbenchmark(
do.call(nrow, list(diamonds)),
do.call("nrow", list(diamonds)),
do.call(nrow, list(quote(diamonds))),
do.call("nrow", list(quote(diamonds)))
)
#> Unit: microseconds
#> expr min lq median uq max neval
#> do.call(nrow, list(diamonds)) 5.29 5.74 5.97 6.34 51.5 100
#> do.call("nrow", list(diamonds)) 5.56 5.81 6.07 6.39 15.0 100
#> do.call(nrow, list(quote(diamonds))) 5.53 5.78 5.98 6.32 16.4 100
#> do.call("nrow", list(quote(diamonds))) 5.58 5.93 6.13 6.42 51.2 100
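Whether a function dispatches through UseMethod() can be checked from its body. This is a rough sketch rather than a general test for S3 generics: the helper name calls_use_method() is made up here, and it only inspects closures, so internal generics implemented as primitives are reported as FALSE.

```r
# Hypothetical helper: TRUE if the function body mentions UseMethod()
calls_use_method <- function(fun) {
  stopifnot(is.function(fun))
  # Primitives have no R-level body to inspect
  if (is.primitive(fun)) return(FALSE)
  "UseMethod" %in% all.names(body(fun))
}

calls_use_method(utils::head)  # head() dispatches via UseMethod()
calls_use_method(nrow)         # nrow() is an ordinary function
```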
The biggest impact of using do.call() is when an error occurs. It takes substantially longer for control to return to the console when a large object is inlined:
h <- function(x) {
stop("!")
}
system.time(do.call(h, list(diamonds)))
#> Timing stopped at: 0.687 0.006 0.693
system.time(do.call("h", list(quote(diamonds))))
#> Timing stopped at: 0.001 0 0.002
(The previous chunk is not evaluated by knitr because I can’t reproduce the problem in a non-interactive setting. Instead I executed the code by hand and copied and pasted the output.)
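When the arguments are only available as values (for example, as elements of a list), one way to get the non-inlined form is to bind each value to a temporary name and pass those names to do.call(). The helper below is a sketch of that idea, not an established API: do_call_by_name() and the .arg1, .arg2, ... names are invented for illustration.

```r
# Hypothetical helper: call `what` (a function name) with `args`, passing
# each argument as a symbol bound in a temporary environment instead of
# inlining its value into the call.
do_call_by_name <- function(what, args, envir = parent.frame()) {
  stopifnot(is.character(what), length(what) == 1, is.list(args))

  # Temporary environment whose parent is the caller's environment, so the
  # function named by `what` is still found by the usual lookup
  env <- new.env(parent = envir)

  # Bind every value to a generated name: .arg1, .arg2, ...
  tmp <- sprintf(".arg%d", seq_along(args))
  for (i in seq_along(args)) {
    assign(tmp[[i]], args[[i]], envir = env)
  }

  # Build the argument list from symbols, keeping any argument names
  sym_args <- lapply(tmp, as.name)
  names(sym_args) <- names(args)

  do.call(what, sym_args, envir = env)
}

f <- function(x) sys.call()
do_call_by_name("f", list(mtcars))
#> f(.arg1)
```

The trade-off is that functions relying on substitute() will see the generated names rather than meaningful ones, so this is mainly useful when the size of the captured call is the concern.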