library(dplyr)
The doublestache syntax of rlang 0.4.0 makes it easier to wrap data-masking functions:
mean_by <- function(data, var, by) {
  data %>%
    group_by({{ by }}) %>%
    summarise(average = mean({{ var }}))
}
{{ reads as an interpolation syntax and suggests to the reader that the code of by and of var is substituted, rather than their values. But how do we parameterise the return name? It is currently hard-coded as average:
mtcars %>% mean_by(disp)
#> # A tibble: 1 x 1
#>   average
#>     <dbl>
#> 1    231.
mtcars %>% mean_by(disp, by = cyl)
#> # A tibble: 3 x 2
#>     cyl average
#>   <dbl>   <dbl>
#> 1     4    105.
#> 2     6    183.
#> 3     8    353.
Tidyverse functions that collect their dots with rlang (dynamic dots) support name-unquoting:
var <- "foo"
rlang::list2(
  var = 1,
  !!var := 2
)
#> $var
#> [1] 1
#>
#> $foo
#> [1] 2
It is not too difficult to add the return name as a parameter:
mean_by <- function(data, var, by, name = "average") {
  data %>%
    group_by({{ by }}) %>%
    summarise(!!name := mean({{ var }}))
}
mtcars %>% mean_by(disp, name = "foo")
#> # A tibble: 1 x 1
#>     foo
#>   <dbl>
#> 1  231.
However, this requires knowing about quasiquotation:
!! unquotes a variable, i.e. substitutes its value. Quasiquotation should probably be considered an advanced subject. Perplexed users can copy and paste patterns, but it doesn’t feel nice to use a language without understanding it.
The second problem is that this syntax uses the vestigial operator := because R doesn’t allow !! on the LHS of =. This makes it even more awkward to teach.
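The parsing restriction can be checked directly. This is a minimal sketch using base R only; `parses()` is a hypothetical helper that reports whether a string is syntactically valid R. Nothing is evaluated, so `name` does not need to exist:

```r
# Hypothetical helper: TRUE if `code` is syntactically valid R, FALSE otherwise.
parses <- function(code) {
  tryCatch(
    {
      parse(text = code)
      TRUE
    },
    error = function(e) FALSE
  )
}

# The LHS of `=` in a call must be a plain name or string, so this is rejected
parses("list(!!name = 1)")
#> [1] FALSE

# `:=` is an ordinary binary operator for the parser, so any expression can sit on its LHS
parses("list(!!name := 1)")
#> [1] TRUE
```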
One useful aspect of the dots syntax in R is that you can pass names with objects and expressions. If your function takes dots, not only will it support multiple inputs, but the caller can give names to those inputs.
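As a small illustration of names travelling with dots, here is a sketch in base R. `input_names()` is a hypothetical helper, not part of any package:

```r
# Hypothetical helper: report the names the caller attached to its inputs.
input_names <- function(...) {
  names(list(...))
}

# The caller chooses both how many inputs to pass and what to call them
input_names(foo = 1, bar = 2 + 2)
#> [1] "foo" "bar"
```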
Unfortunately, it’s not obvious how and where to pass dots in the mean_by() example. If we pass dots to mean(), we are just forwarding arguments to that function. This is useful for parameterising mean(), but not for naming the return value:
mean_by <- function(data, var, ..., by) {
  data %>%
    group_by({{ by }}) %>%
    summarise(average = mean({{ var }}, ...))
}
mtcars %>% mean_by(disp, by = cyl)
#> # A tibble: 3 x 2
#>     cyl average
#>   <dbl>   <dbl>
#> 1     4    105.
#> 2     6    183.
#> 3     8    353.
mtcars %>% mean_by(disp, trim = 0.2, by = cyl)
#> # A tibble: 3 x 2
#>     cyl average
#>   <dbl>   <dbl>
#> 1     4    103.
#> 2     6    176.
#> 3     8    346.
The second option is to pass the dots to summarise(). This works nicely to name the return values, but it changes the purpose of the function. It is now a general variant of summarise() and no longer takes the mean:
mean_by <- function(data, var, ..., by) {
  data %>%
    group_by({{ by }}) %>%
    summarise(...)
}
mtcars %>% mean_by(foo = disp)
#> Error: Column `foo` must be length 1 (a summary value), not 32
mtcars %>% mean_by(foo = mean(disp))
#> # A tibble: 1 x 1
#>     foo
#>   <dbl>
#> 1  231.
Is there a way to take ... inputs but still apply the mean() to each of them?
To get out of this pickle, we need some metaprogramming skills. We’re going to quote the dots, and expand each of the input expressions with mean():
mean_by <- function(data, ..., by) {
  # `.named` makes sure the dots have default names, if not supplied
  dots <- enquos(..., .named = TRUE)

  # Go over all inputs, and wrap them in a call
  dots <- lapply(dots, function(dot) call("mean", dot, na.rm = TRUE))

  # Finally, splice the expressions back into `summarise()`:
  data %>%
    group_by({{ by }}) %>%
    summarise(!!!dots)
}
We can now parameterise the return name:
mtcars %>% mean_by(disp)
#> # A tibble: 1 x 1
#>    disp
#>   <dbl>
#> 1  231.
mtcars %>% mean_by(foo = disp)
#> # A tibble: 1 x 1
#>     foo
#>   <dbl>
#> 1  231.
And our function supports multiple inputs as a bonus:
mtcars %>% mean_by(foo = disp, bar = drat, by = cyl)
#> # A tibble: 3 x 3
#>     cyl   foo   bar
#>   <dbl> <dbl> <dbl>
#> 1     4  105.  4.07
#> 2     6  183.  3.59
#> 3     8  353.  3.23
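To see what the quoting and wrapping steps produce before anything is spliced into summarise(), the two steps can be pulled out on their own. This is only a sketch: `wrap_in_mean()` is a hypothetical helper that mirrors the first half of mean_by(), and it needs rlang but no data:

```r
library(rlang)

# Hypothetical helper: the quoting and wrapping steps of mean_by(),
# without the final splice into summarise().
wrap_in_mean <- function(...) {
  dots <- enquos(..., .named = TRUE)
  lapply(dots, function(dot) call("mean", dot, na.rm = TRUE))
}

wrapped <- wrap_in_mean(foo = disp, drat)

# Supplied names are kept; unnamed inputs get a default name
names(wrapped)
#> [1] "foo"  "drat"

# Each element is a call to mean() wrapped around the quoted input
is_call(wrapped$foo, "mean")
#> [1] TRUE
is_quosure(wrapped$foo[[2]])
#> [1] TRUE
```

Because the inputs stay quoted, `disp` and `drat` are never looked up here. They are only evaluated once the calls are spliced into a data-masking function such as summarise().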
Maybe we could leverage syntax to get there more easily? A triple {{{ would be to ... what {{ is to single inputs: a shortcut for enquos() and !!!. It is normally not needed because you can pass dots straight to other tidy eval functions. The twist is that this operator would support complex expressions that act as a template:
mean_by <- function(data, ..., by) {
  data %>%
    group_by({{ by }}) %>%
    summarise({{{ mean(..., na.rm = TRUE) }}})
}
Whereas the doublestache interpolates inputs, the superstache templates them. This operator would enable new patterns that are currently impossible without learning metaprogramming techniques such as quoting inputs and creating calls.