This document outlines a new approach to non-standard evaluation (NSE). There are three key ideas:
Instead of using substitute(), use lazy::lazy() to capture both expression and environment. (Or use lazy::lazy_dots(...) to capture promises in ...)
Every function that uses NSE should have a standard evaluation (SE) escape hatch that does the actual computation. The SE-function name should end with _.
The SE-function has a flexible input specification to make it easy for people to program with.
lazy()The key tool that makes this approach possible is lazy(), an equivalent to substitute() that captures both expression and environment associated with a function argument:
library(lazy)
f <- function(x = a - b) {
lazy(x)
}
f()
#> <lazy>
#> code: a - b
#> env: <environment: 0x7fc192537ae8>
f(a + b)
#> <lazy>
#> code: a + b
#> env: <environment: R_GlobalEnv>
As a complement to eval(), the lazy package provides lazy_eval() that uses the environment associated with the lazy object:
a <- 10
b <- 1
lazy_eval(f())
#> [1] 9
lazy_eval(f(a + b))
#> [1] 11
The second argument to lazy eval is a list or data frame where names should be looked up first:
lazy_eval(f(), list(a = 1))
#> [1] 0
lazy_eval() also works with formulas, since they contain the same information as a lazy object: an expression (only the RHS is used by convention) and an environment:
lazy_eval(~ a + b)
#> [1] 11
h <- function(i) {
~ 10 + i
}
lazy_eval(h(1))
#> [1] 11
Whenever we need a function that does non-standard evaluation, always write the standard evaluation version first. For example, let’s implement our own version of subset():
subset2_ <- function(df, condition) {
r <- lazy_eval(condition, df)
r <- r & !is.na(r)
df[r, , drop = FALSE]
}
subset2_(mtcars, lazy(mpg > 31))
#> mpg cyl disp hp drat wt qsec vs am gear carb
#> 18 32.4 4 78.7 66 4.08 2.200 19.47 1 1 4 1
#> 20 33.9 4 71.1 65 4.22 1.835 19.90 1 1 4 1
subset2_(mtcars, ~mpg > 31)
#> mpg cyl disp hp drat wt qsec vs am gear carb
#> 18 32.4 4 78.7 66 4.08 2.200 19.47 1 1 4 1
#> 20 33.9 4 71.1 65 4.22 1.835 19.90 1 1 4 1
To make the NSE version as flexible as possible, use as.lazy() to coerce input into a lazy object. In general, this requires an environment, and the parent.frame() is a reasonable guess. This allows the user to pass in quoted calls and strings. This is a little risky but provides useful scaffolding when learning how to do NSE the right way.
subset2_ <- function(df, condition) {
condition <- as.lazy(condition, parent.frame())
r <- lazy_eval(condition, df)
r <- r & !is.na(r)
df[r, , drop = FALSE]
}
subset2_(mtcars, quote(mpg > 31))
#> mpg cyl disp hp drat wt qsec vs am gear carb
#> 18 32.4 4 78.7 66 4.08 2.200 19.47 1 1 4 1
#> 20 33.9 4 71.1 65 4.22 1.835 19.90 1 1 4 1
subset2_(mtcars, "mpg > 31")
#> mpg cyl disp hp drat wt qsec vs am gear carb
#> 18 32.4 4 78.7 66 4.08 2.200 19.47 1 1 4 1
#> 20 33.9 4 71.1 65 4.22 1.835 19.90 1 1 4 1
With the standard evaluation version in hand, writing the standard evaluation version is easy. We just use lazy() to capture the unevaluated expression and corresponding environment:
subset2 <- function(df, condition) {
subset2_(df, lazy(condition))
}
subset2(mtcars, mpg > 31)
#> mpg cyl disp hp drat wt qsec vs am gear carb
#> 18 32.4 4 78.7 66 4.08 2.200 19.47 1 1 4 1
#> 20 33.9 4 71.1 65 4.22 1.835 19.90 1 1 4 1
This standard evaluation escape hatch is very important because it allows us to implement different NSE approaches. For example, we could create a subsetting function that finds all rows where a variable is above a threshold:
above_threshold <- function(df, var, threshold) {
cond <- substitute(var > threshold)
subset2_(df, cond)
}
above_threshold(mtcars, mpg, 31)
#> mpg cyl disp hp drat wt qsec vs am gear carb
#> 18 32.4 4 78.7 66 4.08 2.200 19.47 1 1 4 1
#> 20 33.9 4 71.1 65 4.22 1.835 19.90 1 1 4 1
The use of substitute() is appropriate here because var is a variable name and threshold is a value - the environment in which they are defined is not important.
Because lazy() captures the environment associated with the function argument, we automatically avoid a subtle scoping bug present in subset():
x <- 31
f1 <- function(...) {
x <- 30
subset(mtcars, ...)
}
# Uses 30 instead of 31
f1(mpg > x)
#> mpg cyl disp hp drat wt qsec vs am gear carb
#> 18 32.4 4 78.7 66 4.08 2.200 19.47 1 1 4 1
#> 19 30.4 4 75.7 52 4.93 1.615 18.52 1 1 4 2
#> 20 33.9 4 71.1 65 4.22 1.835 19.90 1 1 4 1
#> 28 30.4 4 95.1 113 3.77 1.513 16.90 1 1 5 2
f2 <- function(...) {
x <- 30
subset2(mtcars, ...)
}
# Correctly uses 31
f2(mpg > x)
#> mpg cyl disp hp drat wt qsec vs am gear carb
#> 18 32.4 4 78.7 66 4.08 2.200 19.47 1 1 4 1
#> 20 33.9 4 71.1 65 4.22 1.835 19.90 1 1 4 1
lazy() has another advantage over substitute() - by default, it follows promises across function invovations. This simplifies the casual use of NSE.
x <- 31
g1 <- function(comp) {
x <- 30
subset(mtcars, comp)
}
g1(mpg > x)
#> Error: object 'mpg' not found
g2 <- function(comp) {
x <- 30
subset2(mtcars, comp)
}
g2(mpg > x)
#> mpg cyl disp hp drat wt qsec vs am gear carb
#> 18 32.4 4 78.7 66 4.08 2.200 19.47 1 1 4 1
#> 20 33.9 4 71.1 65 4.22 1.835 19.90 1 1 4 1
Note that g2() doesn’t have a standard-evaluation escape hatch, so it’s not suitable for programming with in the same way that subset2_() is. See vignettes("chained-promises") for more details on this topic.