The two most popular[1] R style guides are Google’s R Style Guide and Hadley Wickham’s Style Guide, but both address only base R code. The Tidyverse comes with its own set of conventions. What follows is an attempt to organize the coding conventions for the Tidyverse.
If you’re using several Tidyverse packages, use library(tidyverse) instead of loading them all individually.
Many style guides recommend never letting a line run past 80 (or 120) characters. If you’re using RStudio, there’s a helpful setting for this: go to Tools -> Global Options… -> Code -> Display, select the option “Show margin”, and set “margin column” to 80 (or 120).
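Outside of RStudio, the same limit can be checked with a few lines of base R. The following is only a sketch: find_long_lines is a hypothetical helper written for this post, not a function from any package, and the temporary file exists purely for illustration.

```r
# Hypothetical helper (not from any package): report lines longer than `limit`.
find_long_lines <- function(path, limit = 80) {
  lines <- readLines(path, warn = FALSE)
  widths <- nchar(lines)
  too_long <- which(widths > limit)
  data.frame(line = too_long, width = widths[too_long])
}

# Try it on a throwaway file with one short and one overlong line.
tmp <- tempfile(fileext = ".R")
writeLines(c("x <- 1", strrep("a", 95)), tmp)
long_lines <- find_long_lines(tmp)
print(long_lines)
```

Passing limit = 120 applies the looser of the two limits instead.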
When the arguments to a function overrun the 80-character limit, break them across multiple lines and use spaces to align them. This applies in particular to pipes, where multiple levels of indentation can quickly lead to overflowing lines.
GOOD:
mtcars %>%
  mutate(cyl = cyl * 2) %>%
  mutate(mpg = mpg + 2)
BAD:
mtcars %>% mutate(cyl = cyl * 2) %>% mutate(mpg = mpg + 2)
GOOD:
mtcars2 <- mtcars %>%
  mutate(cyl = cyl * 2) %>%
  group_by(gear) %>%
  summarise(avg_disp = mean(disp))
BAD:
mtcars2 <- mtcars %>%
mutate(cyl = cyl * 2) %>%
group_by(gear) %>%
summarise(avg_disp = mean(disp))
Keep your pipelines under ten pipes. If a pipeline is longer than that, break it up into intermediate objects with meaningful names.[2]
When breaking up a pipe into multiple intermediate objects, don’t use the same name with a suffix attached (e.g. foo_1, foo_2, etc.). Instead, try to come up with a meaningful name that summarizes the pipe’s goal (e.g. foo_summarized_by_bar).
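As a small illustration of the naming advice above, here is one way a pipeline might be split into two named steps. The object names cars_with_kml and kml_by_gear are made up for this sketch, and it assumes dplyr is installed.

```r
library(dplyr)

# Step 1: add the derived column that later steps depend on.
cars_with_kml <- mtcars %>%
  mutate(kml = mpg * 0.425)

# Step 2: summarise; the name says what the object holds.
kml_by_gear <- cars_with_kml %>%
  group_by(gear) %>%
  summarise(avg_kml = mean(kml))
```

Each intermediate object can be inspected on its own, which is harder to do in the middle of one long pipe.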
@davidjayharris @ucfagls yeah that often trips me up. Accidentally run same line twice and end up in weird state
— Hadley Wickham (@hadleywickham) January 11, 2017
Avoid the assignment operator %<>% whenever possible (which is to say, always).[3] Instead, use explicit assignment. Also, don’t use the -> assignment operator at the end of a pipe.
GOOD:
mtcars <- mtcars %>%
  group_by(gear) %>%
  summarise(avg_disp = mean(disp))
BAD:
mtcars %<>%
  group_by(gear) %>%
  summarise(avg_disp = mean(disp))
BAD:
mtcars %>%
  group_by(gear) %>%
  summarise(avg_disp = mean(disp)) -> mtcars
When adding more than one column in a mutate call, put each column on its own line, or use a separate mutate statement for each column.
GOOD:
mtcars %>%
  mutate(transmission = factor(am, labels = c("automatic", "manual")),
         weight = wt * 1000,
         kml = mpg * 0.425)
GOOD:
mtcars %>%
  mutate(transmission = factor(am, labels = c("automatic", "manual"))) %>%
  mutate(weight = wt * 1000) %>%
  mutate(kml = mpg * 0.425)
BAD:
mtcars %>%
  mutate(transmission = factor(am, labels = c("automatic", "manual")), weight = wt * 1000, kml = mpg * 0.425)
ggplot Objects
Each geom or similar ggplot2 component gets its own line.
GOOD:
ggplot(mtcars, aes(mpg, cyl)) +
  geom_point()
BAD:
ggplot(mtcars, aes(mpg, cyl)) + geom_point()
In general, a ggplot object should have the following order:
1. geoms and stats
2. scales, coords, and facets
3. annotates
4. themes, labs, etc.
If possible, declare the aesthetic mappings of a ggplot object in the opening ggplot call, so that the later geoms inherit the same mappings.
GOOD:
ggplot(mtcars, aes(mpg, disp)) +
  geom_point()
BAD:
ggplot(mtcars) +
  geom_point(aes(mpg, disp))
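To make the component order described earlier more concrete, here is a sketch of a plot that uses one or two components from each group. The particular scale, limits, label text, and theme are arbitrary choices for illustration, and the code assumes ggplot2 is installed.

```r
library(ggplot2)

p <- ggplot(mtcars, aes(wt, mpg)) +
  # 1. geoms and stats (both inherit the mappings declared above)
  geom_point() +
  geom_smooth(method = "lm", formula = y ~ x) +
  # 2. scales, coords, and facets
  scale_x_continuous(breaks = 2:5) +
  coord_cartesian(ylim = c(10, 35)) +
  facet_wrap(~ am) +
  # 3. annotations
  annotate("text", x = 5, y = 33, label = "example label") +
  # 4. themes, labs, etc.
  theme_minimal() +
  labs(x = "Weight (1000 lbs)", y = "Miles per gallon")
```

Because the aesthetics are declared in the opening ggplot call, neither geom_point nor geom_smooth needs its own aes.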