This document outlines a new approach to non-standard evaluation (NSE). There are three key ideas:
Instead of using substitute(), use lazy::lazy() to capture both expression and environment. (To capture the promises in ..., use lazy::lazy_dots(...).)
Every function that uses NSE should have a standard evaluation (SE) escape hatch that does the actual computation. The SE-function name should end with _.
The SE-function has a flexible input specification to make it easy for people to program with.
lazy()
The key tool that makes this approach possible is lazy(), an equivalent to substitute() that captures both expression and environment associated with a function argument:
library(lazy)
f <- function(x = a - b) {
  lazy(x)
}
f()
#> <lazy>
#> code: a - b
#> env: <environment: 0x7fc192537ae8>
f(a + b)
#> <lazy>
#> code: a + b
#> env: <environment: R_GlobalEnv>
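For contrast, here is a minimal base-R sketch of what substitute() returns on its own. The name f_sub is hypothetical and introduced only for this illustration. The result is a bare call with no environment attached, so whoever evaluates it later has to guess where the names should be looked up.

```r
# Hypothetical helper (not part of the lazy package): capture with substitute() only.
f_sub <- function(x) {
  substitute(x)
}

expr <- f_sub(a + b)

# The captured object is a plain call ...
is_call <- is.call(expr)
same_code <- identical(expr, quote(a + b))

# ... and it carries no environment of its own.
has_no_env <- is.null(environment(expr))
```

This is the gap that lazy() fills: the code is the same, but the environment travels with it.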
As a complement to eval(), the lazy package provides lazy_eval(), which uses the environment associated with the lazy object:
a <- 10
b <- 1
lazy_eval(f())
#> [1] 9
lazy_eval(f(a + b))
#> [1] 11
The second argument to lazy_eval() is a list or data frame where names should be looked up first:
lazy_eval(f(), list(a = 1))
#> [1] 0
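The same lookup order can be sketched in base R with eval(), whose envir argument may be a list or data frame and whose enclos argument supplies the fallback environment. The names expr and env below are assumed stand-ins for the code and environment a lazy object would hold; this is an illustration of the idea, not the package's implementation.

```r
# Assumed stand-ins for the expression and environment stored in a lazy object.
expr <- quote(a - b)
env <- new.env()
assign("a", 10, envir = env)
assign("b", 1, envir = env)

# Without data, both names resolve in env.
res_env <- eval(expr, env)

# With data, `a` is found in the list first; `b` falls back to env.
res_data <- eval(expr, list(a = 1), env)
```

Here res_env is 9 and res_data is 0, matching the lazy_eval() results above.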
lazy_eval() also works with formulas, since they contain the same information as a lazy object: an expression (only the RHS is used by convention) and an environment:
lazy_eval(~ a + b)
#> [1] 11
h <- function(i) {
  ~ 10 + i
}
lazy_eval(h(1))
#> [1] 11
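To see why a formula is enough, the two pieces can be pulled apart by hand in base R. This sketch reuses h() from above; fml, rhs and res are names chosen for illustration.

```r
h <- function(i) {
  ~ 10 + i
}
fml <- h(1)

# For a one-sided formula, the RHS is the second element.
rhs <- fml[[2]]

# The formula remembers the environment it was created in: the call frame of h(),
# which is where `i` lives.
res <- eval(rhs, environment(fml))
```

Evaluating the RHS in the formula's own environment gives 11, the same value lazy_eval(h(1)) returns.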
Whenever we need a function that does non-standard evaluation, always write the standard evaluation version first. For example, let’s implement our own version of subset():
subset2_ <- function(df, condition) {
  r <- lazy_eval(condition, df)
  r <- r & !is.na(r)
  df[r, , drop = FALSE]
}
subset2_(mtcars, lazy(mpg > 31))
#> mpg cyl disp hp drat wt qsec vs am gear carb
#> 18 32.4 4 78.7 66 4.08 2.200 19.47 1 1 4 1
#> 20 33.9 4 71.1 65 4.22 1.835 19.90 1 1 4 1
subset2_(mtcars, ~mpg > 31)
#> mpg cyl disp hp drat wt qsec vs am gear carb
#> 18 32.4 4 78.7 66 4.08 2.200 19.47 1 1 4 1
#> 20 33.9 4 71.1 65 4.22 1.835 19.90 1 1 4 1
To make the SE version as flexible as possible, use as.lazy() to coerce the input into a lazy object. In general, this requires an environment, and parent.frame() is a reasonable guess. This allows the user to pass in quoted calls and strings. Doing so is a little risky, because the guessed environment may not be the one the expression was written in, but it provides useful scaffolding when learning how to do NSE the right way.
subset2_ <- function(df, condition) {
  condition <- as.lazy(condition, parent.frame())
  r <- lazy_eval(condition, df)
  r <- r & !is.na(r)
  df[r, , drop = FALSE]
}
subset2_(mtcars, quote(mpg > 31))
#> mpg cyl disp hp drat wt qsec vs am gear carb
#> 18 32.4 4 78.7 66 4.08 2.200 19.47 1 1 4 1
#> 20 33.9 4 71.1 65 4.22 1.835 19.90 1 1 4 1
subset2_(mtcars, "mpg > 31")
#> mpg cyl disp hp drat wt qsec vs am gear carb
#> 18 32.4 4 78.7 66 4.08 2.200 19.47 1 1 4 1
#> 20 33.9 4 71.1 65 4.22 1.835 19.90 1 1 4 1
With the standard evaluation version in hand, writing the non-standard evaluation version is easy. We just use lazy() to capture the unevaluated expression and corresponding environment:
subset2 <- function(df, condition) {
  subset2_(df, lazy(condition))
}
subset2(mtcars, mpg > 31)
#> mpg cyl disp hp drat wt qsec vs am gear carb
#> 18 32.4 4 78.7 66 4.08 2.200 19.47 1 1 4 1
#> 20 33.9 4 71.1 65 4.22 1.835 19.90 1 1 4 1
This standard evaluation escape hatch is very important because it allows us to implement different NSE approaches. For example, we could create a subsetting function that finds all rows where a variable is above a threshold:
above_threshold <- function(df, var, threshold) {
  cond <- substitute(var > threshold)
  subset2_(df, cond)
}
above_threshold(mtcars, mpg, 31)
#> mpg cyl disp hp drat wt qsec vs am gear carb
#> 18 32.4 4 78.7 66 4.08 2.200 19.47 1 1 4 1
#> 20 33.9 4 71.1 65 4.22 1.835 19.90 1 1 4 1
The use of substitute() is appropriate here because var is a variable name and threshold is a value; the environment in which they are defined is not important.
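It can help to look at the call that substitute() builds before it is handed to the SE function. The following base-R sketch uses a hypothetical helper, build_cond(), that returns the condition instead of subsetting with it.

```r
# Hypothetical helper for illustration: return the condition rather than use it.
build_cond <- function(var, threshold) {
  substitute(var > threshold)
}

cond <- build_cond(mpg, 31)

# `var` and `threshold` have been replaced by what the caller supplied,
# so cond is the call mpg > 31.
is_expected <- identical(cond, quote(mpg > 31))

# The call can be evaluated against the data frame directly.
keep <- eval(cond, mtcars)
n_rows <- sum(keep)
```

With the built-in mtcars data, n_rows is 2, the same two rows that above_threshold() returns.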
Because lazy() captures the environment associated with the function argument, we automatically avoid a subtle scoping bug present in subset():
x <- 31
f1 <- function(...) {
  x <- 30
  subset(mtcars, ...)
}
# Uses 30 instead of 31
f1(mpg > x)
#> mpg cyl disp hp drat wt qsec vs am gear carb
#> 18 32.4 4 78.7 66 4.08 2.200 19.47 1 1 4 1
#> 19 30.4 4 75.7 52 4.93 1.615 18.52 1 1 4 2
#> 20 33.9 4 71.1 65 4.22 1.835 19.90 1 1 4 1
#> 28 30.4 4 95.1 113 3.77 1.513 16.90 1 1 5 2
f2 <- function(...) {
  x <- 30
  subset2(mtcars, ...)
}
# Correctly uses 31
f2(mpg > x)
#> mpg cyl disp hp drat wt qsec vs am gear carb
#> 18 32.4 4 78.7 66 4.08 2.200 19.47 1 1 4 1
#> 20 33.9 4 71.1 65 4.22 1.835 19.90 1 1 4 1
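The fix comes from evaluating the condition in the environment where it was written rather than in the frame of whichever function happens to call the subsetter. As a rough base-R sketch of the same idea, without the lazy package, a formula can carry that environment. The names subset_f and f3 are hypothetical and used only here.

```r
# Hypothetical formula-based subsetter: evaluate the RHS in the data frame,
# falling back to the environment the formula was created in.
subset_f <- function(df, cond) {
  r <- eval(cond[[2]], df, environment(cond))
  df[r & !is.na(r), , drop = FALSE]
}

x <- 31
f3 <- function(cond) {
  x <- 30  # local x; the formula's environment does not see it
  subset_f(mtcars, cond)
}

# The formula is created by the caller, so `x` resolves to 31.
res <- f3(~ mpg > x)
n_rows <- nrow(res)
```

Here n_rows is 2, as with f2() above; the local x <- 30 inside f3() is never consulted.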
lazy() has another advantage over substitute(): by default, it follows promises across function invocations. This simplifies the casual use of NSE.
x <- 31
g1 <- function(comp) {
  x <- 30
  subset(mtcars, comp)
}
g1(mpg > x)
#> Error: object 'mpg' not found
g2 <- function(comp) {
  x <- 30
  subset2(mtcars, comp)
}
g2(mpg > x)
#> mpg cyl disp hp drat wt qsec vs am gear carb
#> 18 32.4 4 78.7 66 4.08 2.200 19.47 1 1 4 1
#> 20 33.9 4 71.1 65 4.22 1.835 19.90 1 1 4 1
Note that g2() doesn’t have a standard-evaluation escape hatch, so it’s not suitable for programming with in the same way that subset2_() is. See vignette("chained-promises") for more details on this topic.