The two most popular¹ R style guides are Google’s R Style Guide and Hadley Wickham’s Style Guide. Both address only base R code, however, and the Tidyverse comes with its own set of conventions. What follows is an attempt to organize the coding conventions for the Tidyverse.

General Script Structure

Package Declarations

If you’re using several Tidyverse packages, use library(tidyverse) instead of listing them all individually.
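A minimal sketch of what the top of a script might look like, assuming the tidyverse meta-package is installed. The packages in the commented-out lines are only an illustrative subset.

```r
# Preferred: a single call attaches the core Tidyverse packages,
# including ggplot2, dplyr, and tidyr.
library(tidyverse)

# Avoid: attaching each Tidyverse package one by one.
# library(ggplot2)
# library(dplyr)
# library(tidyr)
```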

Line Length

Many style guides recommend never letting a line run past 80 (or 120) characters. If you’re using RStudio, there’s a helpful setting for this. Go to Tools -> Global Options… -> Code -> Display, select the option “Show margin”, and set “margin column” to 80 (or 120).
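For scripts edited outside RStudio, the same limit can be checked with a few lines of base R. This is only a sketch: find_long_lines is a hypothetical helper written for this example, not a function from any package.

```r
# Hypothetical helper: return the line numbers of a script
# whose lines are wider than max_width characters.
find_long_lines <- function(path, max_width = 80) {
  script_lines <- readLines(path, warn = FALSE)
  which(nchar(script_lines) > max_width)
}

# Usage: write a small throwaway script and check it.
example_script <- tempfile(fileext = ".R")
writeLines(c("x <- 1", strrep("a", 81), strrep("b", 80)), example_script)
long_lines <- find_long_lines(example_script)
```

Here only the second line of the throwaway script is over the limit, since a line of exactly 80 characters is still allowed.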

When the arguments to a function overrun the 80-character limit, break them up across multiple lines and use spaces to align them. This applies in particular to pipes, where multiple levels of indentation can quickly lead to overflowing lines.
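As a rough illustration, assuming dplyr is installed, a call with several arguments might be broken up like this. The object and column names (mtcars_by_gear, avg_disp, avg_mpg, n_cars) are made up for the example.

```r
library(dplyr)

# Each argument to summarise() gets its own line, aligned with
# spaces under the first argument.
mtcars_by_gear <- mtcars %>%
  group_by(gear) %>%
  summarise(avg_disp = mean(disp),
            avg_mpg = mean(mpg),
            n_cars = n())
```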

Pipelines

Each step in a pipeline should be on its own line, even for short pipes.

GOOD:

mtcars %>% 
  mutate(cyl = cyl * 2) %>%
  mutate(mpg = mpg + 2)

BAD:

mtcars %>% mutate(cyl = cyl * 2) %>% mutate(mpg = mpg + 2)

Every line past the first line should be indented with two spaces.

GOOD:

mtcars2 <- mtcars %>%
  mutate(cyl = cyl * 2) %>%
  group_by(gear) %>%
  summarise(avg_disp = mean(disp))

BAD:

mtcars2 <- mtcars %>% 
mutate(cyl = cyl * 2) %>%
group_by(gear) %>%
summarise(avg_disp = mean(disp))

Keep your pipelines under ten pipes. If your pipeline is longer than that, break up the pipeline into intermediate objects with meaningful names.²

When breaking up a pipe into multiple intermediate objects, don’t reuse the same name with a suffix attached (e.g. foo_1, foo_2, etc.). Instead, try to come up with a meaningful name that summarizes the pipe’s goal (e.g. foo_summarized_by_bar).
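A short sketch of the idea, assuming dplyr is installed. The pipeline here is far shorter than ten pipes and is split only for illustration, and the object names are invented for the example.

```r
library(dplyr)

# First intermediate object: named for what it contains,
# rather than mtcars_1.
mtcars_with_kml <- mtcars %>%
  mutate(kml = mpg * 0.425)

# Second intermediate object: named for the goal of the pipe,
# rather than mtcars_2.
mtcars_summarised_by_gear <- mtcars_with_kml %>%
  group_by(gear) %>%
  summarise(avg_kml = mean(kml))
```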

@davidjayharris @ucfagls yeah that often trips me up. Accidentally run same line twice and end up in weird state
— Hadley Wickham (@hadleywickham) January 11, 2017

Avoid magrittr’s compound assignment pipe %<>% whenever possible (which is to say, always).³ Instead, use explicit assignment. Also, don’t use the -> assignment operator at the end of a pipe.

GOOD:

mtcars <- mtcars %>% 
  group_by(gear) %>%
  summarise(avg_disp = mean(disp))

BAD:

mtcars %<>% 
  group_by(gear) %>%
  summarise(avg_disp = mean(disp))

BAD:

mtcars %>%
  group_by(gear) %>%
  summarise(avg_disp = mean(disp)) -> mtcars

When adding more than one column in a mutate pipe, put each column on its own line, or just use a separate mutate statement for each column.

GOOD:

mtcars %>%
  mutate(transmission = factor(am, labels = c("automatic", "manual")),
         weight = wt * 1000,
         kml = mpg * 0.425)

GOOD:

mtcars %>%
  mutate(transmission = factor(am, labels = c("automatic", "manual"))) %>%
  mutate(weight = wt * 1000) %>%
  mutate(kml = mpg * 0.425)

BAD:

mtcars %>%
  mutate(transmission = factor(am, labels = c("automatic", "manual")), weight = wt * 1000, kml = mpg * 0.425)

`ggplot` Objects

Each additional geom or similar ggplot2 component gets its own line.

GOOD:

ggplot(mtcars, aes(mpg, cyl)) +
  geom_point()

BAD:

ggplot(mtcars, aes(mpg, cyl)) + geom_point()

In general, the components of a ggplot object should appear in the following order:

geoms and stats
scales, coords, and facets
annotations
themes, labs, etc.
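The ordering above might look like this in practice. This is only a sketch, assuming ggplot2 is installed; the particular scale, facet, annotation, and labels are arbitrary choices for illustration.

```r
library(ggplot2)

fuel_plot <- ggplot(mtcars, aes(wt, mpg)) +
  # geoms and stats
  geom_point() +
  geom_smooth(method = "lm") +
  # scales, coords, and facets
  scale_x_continuous(breaks = 2:5) +
  facet_wrap(~ cyl) +
  # annotations
  annotate("text", x = 5, y = 30, label = "mtcars") +
  # themes, labs, etc.
  theme_minimal() +
  labs(x = "Weight (1000 lbs)", y = "Miles per gallon")
```

Keeping the groups in a fixed order makes it easier to find, say, the facet specification when skimming a long plot definition.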

If possible, declare the aesthetic mappings of a ggplot object in the opening ggplot call, so that the later geoms inherit the same mappings.

GOOD:

ggplot(mtcars, aes(mpg, disp)) +
  geom_point()

BAD:

ggplot(mtcars) +
  geom_point(aes(mpg, disp))
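The benefit is easier to see with more than one layer. In the sketch below, which assumes ggplot2 is installed, both geoms reuse the mappings from the opening call; the object name and the choice of geom_smooth are just for illustration.

```r
library(ggplot2)

# The mappings are declared once in ggplot() and inherited
# by both geom_point() and geom_smooth().
disp_by_mpg_plot <- ggplot(mtcars, aes(mpg, disp)) +
  geom_point() +
  geom_smooth(method = "lm")
```

Had the mappings been declared inside geom_point() instead, geom_smooth() would need its own copy of aes(mpg, disp).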

A Tidyverse Style Guide

Abraham Neuwirth

January 15, 2017
