Parameterising return names

library(dplyr)

The doublestache syntax of rlang 0.4.0 makes it easier to wrap data-masking functions:

mean_by <- function(data, var, by) {
  data %>%
    group_by({{ by }}) %>%
    summarise(average = mean({{ var }}))
}

{{ reads as interpolation syntax: it suggests to the reader that the code of by and var is substituted, rather than their values. But how do we parameterise the return name? It is currently hard-coded as average:

mtcars %>% mean_by(disp)
#> # A tibble: 1 x 1
#>   average
#>     <dbl>
#> 1    231.

mtcars %>% mean_by(disp, by = cyl)
#> # A tibble: 3 x 2
#>     cyl average
#>   <dbl>   <dbl>
#> 1     4    105.
#> 2     6    183.
#> 3     8    353.
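In base R terms, the capture performed by {{ is close in spirit to substitute(): the function receives the code the caller typed and evaluates it with the data columns in scope. The sketch below is a simplified analogue that assumes plain substitute() and eval() are enough for illustration. rlang's quosures also keep track of the caller's environment in cases where this shortcut does not. mean_of() is a hypothetical name:

```r
# Hypothetical base-R analogue of `mean({{ var }})`
mean_of <- function(data, var) {
  expr <- substitute(var)                     # the code the caller typed, e.g. `disp`
  values <- eval(expr, data, parent.frame())  # columns first, then the caller's env
  mean(values)
}

mean_of(mtcars, disp)      # same as mean(mtcars$disp)
mean_of(mtcars, disp / 2)  # any expression involving the columns works
```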

Unquoting names

All tidyverse functions that take dots support name unquoting:

var <- "foo"

rlang::list2(
  var = 1,
  !!var := 2
)
#> $var
#> [1] 1
#> 
#> $foo
#> [1] 2
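Base R has no equivalent of !!var := in a call, because argument names are always taken literally. A common workaround, sketched here only for comparison, is to build the list first and attach the computed names afterwards with setNames():

```r
var <- "foo"

# Argument names are taken literally: this element is called "var"
x <- list(var = 1)
names(x)

# Compute the names separately and attach them afterwards
y <- setNames(list(1, 2), c("var", var))
names(y)
```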

It is not too difficult to add the return name as a parameter:

mean_by <- function(data, var, by, name = "average") {
  data %>%
    group_by({{ by }}) %>%
    summarise(!!name := mean({{ var }}))
}

mtcars %>% mean_by(disp, name = "foo")
#> # A tibble: 1 x 1
#>     foo
#>   <dbl>
#> 1  231.

However, this requires knowing about quasiquotation:

  • Argument names in R are quoted.
  • !! unquotes a variable, i.e. substitutes its value.
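Both ideas can be seen with base R's own quasiquotation function, bquote(), where .() plays the role of !!. This is only an illustration of quoting versus unquoting, not how tidy eval is implemented:

```r
name <- "foo"

quote(f(name))      # quoted: `name` stays a symbol in the code
bquote(f(.(name)))  # unquoted: .() inserts the value, giving f("foo")
```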

Quasiquotation should probably be considered an advanced subject. Perplexed users can copy/paste patterns, but it doesn’t feel nice to use a language without understanding it.

The second problem is that this syntax relies on the vestigial operator := because R doesn’t allow !! on the left-hand side of = in a function call. This makes it even more awkward to teach.
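The parser restriction is easy to check with base R's parse(). In the sketch below, an expression such as !!name on the left of = inside a call is rejected as a syntax error, whereas := is parsed like <-, i.e. as an ordinary call to a function named :=, which tidy eval functions then know how to interpret:

```r
# `=` inside a call only accepts a symbol or a string as the argument name
res <- tryCatch(
  parse(text = "f(!!name = 2)"),
  error = function(e) conditionMessage(e)
)
res  # a syntax error message from the parser

# `:=` is parsed as a binary operator, so any expression may appear on its left
expr <- quote(f(!!name := 2))
expr[[2]][[1]]  # the symbol `:=`
```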

Taking dots

One useful aspect of the dots syntax in R is that you can pass names along with objects and expressions. If your function takes dots, not only will it support multiple inputs, but the caller can also give names to those inputs.
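To see the names travel with the dots, here is a small base-R sketch. dot_names() is a hypothetical helper that captures the dots without evaluating them and returns the names the caller supplied:

```r
dot_names <- function(...) {
  # substitute(list(...)) yields the unevaluated call `list(<expr1>, <expr2>, ...)`,
  # with the caller's names attached to its arguments; drop the name of `list` itself
  names(substitute(list(...)))[-1]
}

dot_names(foo = disp, bar = mean(drat))  # disp and drat are never evaluated
```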

Unfortunately, it’s not obvious how and where to pass dots in the mean_by() example. If we pass dots to mean(), we are just forwarding arguments to that function. This is useful for parameterising mean(), but not for naming the return value:

mean_by <- function(data, var, ..., by) {
  data %>%
    group_by({{ by }}) %>%
    summarise(average = mean({{ var }}, ...))
}

mtcars %>% mean_by(disp, by = cyl)
#> # A tibble: 3 x 2
#>     cyl average
#>   <dbl>   <dbl>
#> 1     4    105.
#> 2     6    183.
#> 3     8    353.

mtcars %>% mean_by(disp, trim = 0.2, by = cyl)
#> # A tibble: 3 x 2
#>     cyl average
#>   <dbl>   <dbl>
#> 1     4    103.
#> 2     6    176.
#> 3     8    346.
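Forwarding dots is plain R argument matching: whatever the caller puts in ... ends up among mean()'s own arguments. A base-R sketch with a hypothetical mean_fwd() shows both sides: trim is honoured, while an arbitrary name such as foo is silently absorbed by mean()'s own ... and never names the result:

```r
mean_fwd <- function(x, ...) mean(x, ...)  # dots go straight to mean()

x <- c(1, 2, 3, 100)
mean_fwd(x)               # ordinary mean
mean_fwd(x, trim = 0.25)  # `trim` reaches mean(): 1 and 100 are trimmed away
mean_fwd(x, foo = 1)      # `foo` is swallowed; the result is still unnamed
```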

The second option is to pass the dots to summarise(). This works nicely for naming the return values, but we have now changed the purpose of the function: it is a general variant of summarise() and no longer takes the mean:

mean_by <- function(data, var, ..., by) {
  data %>%
    group_by({{ by }}) %>%
    summarise(...)
}

mtcars %>% mean_by(foo = disp)
#> Error: Column `foo` must be length 1 (a summary value), not 32

mtcars %>% mean_by(foo = mean(disp))
#> # A tibble: 1 x 1
#>     foo
#>   <dbl>
#> 1  231.

Is there a way to take ... inputs but still apply the mean() to each of them?

Modifying ... inputs

To get out of this pickle, we need some metaprogramming skills. We’re going to quote the dots and wrap each of the input expressions in a call to mean():

mean_by <- function(data, ..., by) {
  # `.named` makes sure the dots have default names, if not supplied
  dots <- enquos(..., .named = TRUE)

  # Go over all inputs, and wrap them in a call
  dots <- lapply(dots, function(dot) call("mean", dot, na.rm = TRUE))

  # Finally, splice the expressions back into `summarise()`:
  data %>%
    group_by({{ by }}) %>%
    summarise(!!!dots)
}

We can now parameterise the return name:

mtcars %>% mean_by(disp)
#> # A tibble: 1 x 1
#>    disp
#>   <dbl>
#> 1  231.

mtcars %>% mean_by(foo = disp)
#> # A tibble: 1 x 1
#>     foo
#>   <dbl>
#> 1  231.

And our function supports multiple inputs as a bonus:

mtcars %>% mean_by(foo = disp, bar = drat, by = cyl)
#> # A tibble: 3 x 3
#>     cyl   foo   bar
#>   <dbl> <dbl> <dbl>
#> 1     4  105.  4.07
#> 2     6  183.  3.59
#> 3     8  353.  3.23
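What enquos(), lapply() and !!! do together can be approximated in base R, which may help demystify the pattern. The sketch below is a rough analogue rather than a reimplementation of summarise(): it assumes a single grouping column, quotes with substitute() instead of quosures, and mean_by_base() is a hypothetical name:

```r
mean_by_base <- function(data, ..., by) {
  env <- parent.frame()

  # Quote the dots: substitute(list(...)) gives `list(<expr1>, <expr2>, ...)`
  dots <- as.list(substitute(list(...)))[-1]

  # Default names for unnamed inputs, by deparsing them (cf. `.named = TRUE`)
  nms <- names(dots)
  if (is.null(nms)) nms <- character(length(dots))
  blank <- nms == ""
  nms[blank] <- vapply(dots[blank], function(d) paste(deparse(d), collapse = " "),
                       character(1))
  names(dots) <- nms

  # Wrap each input expression in a call: mean(<expr>, na.rm = TRUE)
  calls <- lapply(dots, function(dot) call("mean", dot, na.rm = TRUE))

  # Evaluate every call with the columns of `df` in scope
  summarise_one <- function(df) {
    data.frame(lapply(calls, eval, envir = df, enclos = env), check.names = FALSE)
  }

  if (missing(by)) {
    return(summarise_one(data))
  }

  # One row per group, with the sorted grouping values in the first column
  by_expr <- substitute(by)
  keys <- eval(by_expr, data, env)
  vals <- sort(unique(keys))
  rows <- lapply(vals, function(v) summarise_one(data[which(keys == v), , drop = FALSE]))
  out <- cbind(setNames(data.frame(vals), deparse(by_expr)), do.call(rbind, rows))
  rownames(out) <- NULL
  out
}

mean_by_base(mtcars, foo = disp, bar = drat, by = cyl)
```

It returns a plain data frame rather than a tibble, with one row per distinct value of the by column.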

A superstache operator?

Maybe we could leverage syntax to get there more easily? A triple {{{ would be to ... what {{ is to single inputs: a shortcut for enquos() and !!!. On its own that shortcut is rarely needed, because you can pass dots straight to other tidy eval functions. The twist is that this operator would support complex expressions that act as a template:

mean_by <- function(data, ..., by) {
  data %>%
    group_by({{ by }}) %>%
    summarise({{{ mean(..., na.rm = TRUE) }}})
}

Whereas the doublestache interpolates inputs, the superstache templates them. This operator would enable new patterns that currently require learning about metaprogramming, such as quoting inputs and creating calls.
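No {{{ operator exists; it is only a proposal. The templating idea itself can be sketched in base R, assuming a placeholder symbol (.x here, chosen purely for illustration) that marks where each input goes in the template. expand_template() is hypothetical:

```r
# Hypothetical helper: substitute each input expression for the placeholder `.x`
expand_template <- function(template, inputs) {
  lapply(inputs, function(input) {
    do.call(substitute, list(template, list(.x = input)))
  })
}

template <- quote(mean(.x, na.rm = TRUE))
exprs <- expand_template(template, list(foo = quote(disp), bar = quote(drat)))
exprs$foo  # mean(disp, na.rm = TRUE)

# Evaluate the expanded expressions with the columns of mtcars in scope
lapply(exprs, eval, envir = mtcars)
```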