The pipe operator, %>%

One of the main operators you will use is the pipe operator, %>%. It passes the object on its left-hand side into the function on its right-hand side, so you no longer have to write the dataset argument out each time and only need to supply the remaining arguments. It originally comes from the {magrittr} package, but it is re-exported by many {tidyverse} packages, including {dplyr} and {tidyr}.

RStudio keyboard shortcut for inserting the pipe: CTRL + SHIFT + M (Windows) or CMD + SHIFT + M (Mac)

It can be useful to read the pipe operator as saying “and then do this…”

There was a joke going around on social media about the pipe operator:

# DON'T ACTUALLY RUN THIS CODE CHUNK, IT'S FOR ILLUSTRATION ONLY # 

#' How mornings look for most people: 
me %>% 
  wake_up() %>% 
  get_out_of_bed() %>% 
  get_dressed() %>% 
  leave_house()
   
#' How my mornings look most of the time: 
leave_house(get_dressed(get_out_of_bed(wake_up(me))))

As the joke demonstrates, the pipe operator makes R code much clearer and easier to read, especially compared with nested functions. To understand a nested call, you have to read it from the inside out. With a pipe, the code reads sequentially, almost like a sentence.

At each step, the pipe operator takes the output from the preceding step and passes it as the input to the next step (as the first argument of the next function). For this reason, {tidyverse} functions are designed to take the data as their first argument.
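The short sketch below shows this mechanic with ordinary base R functions (the object names values, nested, piped and rounded_* are just illustrative): x %>% f(y) is the same call as f(x, y), and a chain of pipes gives the same result as the equivalent nested call. It assumes {magrittr} is installed; loading {dplyr} instead would provide the same %>%.

```r
library(magrittr)  # provides %>%; {dplyr} and {tidyr} re-export the same operator

# x %>% f(y) is the same as f(x, y): the left-hand side becomes the first argument
rounded_pipe <- 3.14159 %>% round(2)
rounded_call <- round(3.14159, 2)
identical(rounded_pipe, rounded_call)  # TRUE

values <- c(1, 4, 9)

# Nested call: read from the inside out (sqrt first, then sum)
nested <- sum(sqrt(values))

# Piped call: "take values, and then sqrt(), and then sum()"
piped <- values %>%
  sqrt() %>%
  sum()

nested                   # 6
identical(nested, piped) # TRUE
```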

But what if the first argument of a function isn’t data? Can we still use a pipe?

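If we pipe straight into such a function, the data ends up in the wrong argument and the call fails. Instead, we can pass the data explicitly with data = . as our data argument: the dot is a placeholder for whatever is being piped in, and when the dot is used as an argument like this, the pipe does not also insert the data as the first argument. For example, the first argument of the linear model function lm() is formula, written as y ~ x (you can read the ~ as “by”), and data is the second argument: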

data("iris")

# iris %>% lm(Sepal.Length ~ Petal.Length) # Throws an error 

iris %>% lm(Sepal.Length ~ Petal.Length, data = .) # This fixes the problem of the data argument not being first 
## 
## Call:
## lm(formula = Sepal.Length ~ Petal.Length, data = .)
## 
## Coefficients:
##  (Intercept)  Petal.Length  
##       4.3066        0.4089
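To check that the dot placeholder really does send the piped data to the data argument, the sketch below (object names fit_piped and fit_direct are illustrative) compares the piped model with an ordinary explicit call, then shows that the fitted model can itself be piped onward into coef(). It assumes {magrittr} is installed.

```r
library(magrittr)  # provides %>% and the dot placeholder behaviour
data("iris")

# The dot sends iris to the data argument only, not to formula
fit_piped <- iris %>%
  lm(Sepal.Length ~ Petal.Length, data = .)

# The equivalent call written out explicitly
fit_direct <- lm(Sepal.Length ~ Petal.Length, data = iris)

# Same coefficients either way
all.equal(coef(fit_piped), coef(fit_direct))  # TRUE

# The model object can be piped further, e.g. to extract its coefficients
iris %>%
  lm(Sepal.Length ~ Petal.Length, data = .) %>%
  coef()
```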

Good practice:

  • There should always be a space before the pipe operator
  • The pipe operator should always be at the end of the line (new line after the pipe operator)
  • Ideally, no more than one pipe operator in a single line
  • Always include the function brackets (brackets make it obvious it’s a function)

# DON'T ACTUALLY RUN THIS CODE CHUNK, IT'S FOR ILLUSTRATION ONLY # 

# Good practice: 
my_awesome_dataset %>% 
  do_something() %>% 
  do_something_else() 

# Not as clear/easy to read: 
my_awesome_dataset %>% do_something() %>% do_something_else() 
my_awesome_dataset %>% do_something %>% do_something_else # is do_something another data frame? 

The assignment pipe operator, %<>%

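This operator works like a pipe, but it also overwrites the original input object with the final output. It comes from {magrittr} and, unlike %>%, it is not re-exported by {dplyr} or {tidyr}, so you need to load {magrittr} itself (library(magrittr)) to use it. Basically, it does this: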

# DON'T ACTUALLY RUN THIS CODE CHUNK, IT'S FOR ILLUSTRATION ONLY # 

# Not using the assignment pipe: 
my_awesome_dataset <- my_awesome_dataset %>% 
  do_something() %>% 
  do_something_else() 

# Using the assignment pipe: 
my_awesome_dataset %<>% 
  do_something() %>% 
  do_something_else()

# These two are equivalent
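Here is a small runnable version of the two equivalent forms above, using a numeric vector in place of a dataset (scores and scores_a are illustrative names). It assumes {magrittr} is installed and loaded directly, since that is where %<>% lives.

```r
library(magrittr)  # %<>% must be loaded from {magrittr} itself

scores <- c(4, 9, 16)

# Without the assignment pipe: the object is named twice
scores_a <- scores
scores_a <- scores_a %>%
  sqrt() %>%
  sum()

# With the assignment pipe: the final result is written back into scores
scores %<>%
  sqrt() %>%
  sum()

scores                      # 9
identical(scores, scores_a) # TRUE
```

Because %<>% silently replaces the original object, it is worth using only when you are sure you no longer need the unmodified version.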