Let’s say we want to use R to find the following:
\[\sqrt{ | \log_{10}{0.75} |}\]
and round the result to two decimal places. How can we achieve that?
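Step-by-step we need to:
1. Take the log base 10 of 0.75
2. Take the absolute value of the result
3. Take the square root
4. Round to two decimal places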
How would that look in R?
Because of how R evaluates nested functions, you actually need to write the last step first and work your way back to the first step!
# Have to write round first, then the square root, then the absolute value, then the log
round(digits = 2, sqrt(abs(log10(0.75))))
## [1] 0.35
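As a sanity check on the output above, working the calculation by hand (intermediate values rounded to five decimal places) gives approximately:

\[\log_{10}{0.75} \approx -0.12494, \quad |-0.12494| = 0.12494, \quad \sqrt{0.12494} \approx 0.35347, \quad 0.35347 \rightarrow 0.35\]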
To improve readability, a common practice is to put each new function on its own line:
round(
  digits = 2,
  sqrt(
    abs(
      log10(
        0.75
      )
    )
  )
)
## [1] 0.35
One reason the code is difficult to read is that the first function written is the one we use in step 4 (round()), and the last function written is the one we want to use first (log10()). So we need to read the code from the center outward (a “middle-out” approach) to understand what is going on. Ideally, we’d start with the first function we need and end with the last function we’d use. So how do we do that?
“Piping” was created as an approach to make code more natural to read and write. The pipe operator (|> in base R since version 4.1, or %>% in the tidyverse) will “pass” the result of one function to the next, in the order we use them. I’ll be using the native pipe, |>, but you can use either!
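As a minimal sketch of what “passing” means (assuming R 4.1 or later, where the native pipe is available): the value on the left of |> becomes the first argument of the function call on the right. The numbers and variable names here are only illustrative.

```r
# x |> f() is evaluated as f(x)
piped_root  <- 16 |> sqrt()
nested_root <- sqrt(16)

# Extra arguments stay inside the parentheses: x |> f(y) is evaluated as f(x, y)
piped_round  <- 3.14159 |> round(digits = 2)
nested_round <- round(3.14159, digits = 2)

# Both ways of writing each call give the same result
identical(piped_root, nested_root)
identical(piped_round, nested_round)
```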
Note
You can insert either pipe with the keyboard shortcut Ctrl+Shift+M on a PC or Cmd+Shift+M on a Mac. It will default to using %>%, but you can change it to use |> by going to Tools > Global Options… > Code > Use Native Pipe Operator.
It’s faster once you get the hang of it!
Using pipes, we can have R calculate the number in the same order of steps we would follow if we did it by hand!
# Start with 0.75
0.75 |>
  # Then pass it to the log10() function
  log10() |>
  # Next, we need the absolute value
  abs() |>
  # Now that it is positive, we can take the square root
  sqrt() |>
  # We can pass it into round, but we'll need to specify how many digits to round
  round(digits = 2)
## [1] 0.35
So what’s the big deal about pipes? We can use them with many functions, not just basic calculations!
# Getting the data set from ggplot2
ggplot2::mpg |>
  # Only keeping the cars from 2008
  dplyr::filter(year == 2008) |>
  # Pulling out the manufacturer column from the data frame and changing it to a vector
  dplyr::pull(manufacturer) |>
  # Counting the number of times each manufacturer appears using table()
  table(dnn = "Manufacturer") |>
  # Converting it from a table type object to a data.frame object
  data.frame() |>
  # Changing the column name from Freq to Count
  dplyr::rename(Count = Freq) |>
  # Making the data.frame look nice for the knitted document with gt()
  gt::gt()
Manufacturer | Count |
---|---|
audi | 9 |
chevrolet | 12 |
dodge | 21 |
ford | 10 |
honda | 4 |
hyundai | 8 |
jeep | 6 |
land rover | 2 |
lincoln | 1 |
mercury | 2 |
nissan | 7 |
pontiac | 2 |
subaru | 8 |
toyota | 14 |
volkswagen | 11 |
When piping a data set into a function, we can only use the column names directly (no data$column) if the function has a data = argument. Sadly, there are many functions that don’t have a data argument, like table():
# Code commented out so the document will knit
#ggplot2::mpg |>
# table(manufacturer, class)
There is a different pipe operator that works with functions missing the data argument, but we won’t worry about that and you aren’t expected to use it.
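One base R workaround (this is not the other pipe operator mentioned above) is with(), which evaluates an expression using a data frame’s columns as variables. A minimal sketch, using the built-in mtcars data set rather than mpg so that it runs without extra packages:

```r
# with(data, expr) makes the columns of data available by name inside expr,
# so table() can refer to cyl and gear without writing mtcars$cyl or mtcars$gear
cyl_by_gear <- mtcars |>
  with(table(cyl, gear))

# Print the cross-tabulation of cylinders by gears
cyl_by_gear
```

The pipe places mtcars in the first argument of with(), so this is evaluated as with(mtcars, table(cyl, gear)).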
Hopefully you can see how piping makes code easier to write and read. But you still need to add descriptive comments!