Pipe operators (%>% and others)

# Pipe operators like %>%, available in magrittr, dplyr, and other packages, have transformed how I process data. By allowing me to pass the output of one function directly into another, they’ve made my code cleaner and more intuitive.
# 
# Basic Use and Chaining
# When I first encountered the %>% operator, I realized it was a game-changer for readability. Instead of nesting functions inside each other, I could write my operations in a straightforward, left-to-right sequence. For example, to calculate the mean of a sequence like 1:10, instead of writing mean(1:10) I could write 1:10 %>% mean, which reads in the order the work actually happens.
# 
# I also explored converting factors to numeric values, something I used to do with nested function calls like as.numeric(as.character(years)). With the pipe operator, I simplified it to years %>% as.character %>% as.numeric. It was a small change but one that significantly improved the readability of my code.
# 
# Using Placeholders with the Pipe
# I found myself in situations where I didn’t want the default behavior of using the left-hand side (LHS) as the first argument on the right-hand side (RHS). Learning about the . placeholder was a revelation. For instance, when I wanted to check if a substring contained a specific pattern, I used substring("Data Science", 6, 12) %>% grepl("Science", .) to make my intent clear. I loved how I could also wrap the RHS in curly braces {}, where the LHS is not inserted automatically and . can be used as many times as needed to manipulate and combine it.
# 
# Functional Sequences
# I often repeat sequences of operations, so discovering that I could store them as a reusable function using pipes was invaluable. I created a function extract_month to extract the month from date strings. By defining it as . %>% as.character %>% as.Date %>% month (with month coming from lubridate), I could apply it easily across different datasets. Whether it was applying it to a single column like df$today %>% extract_month or to an entire dataframe with df %>% mutate_all(funs(extract_month)), it streamlined my workflow. One caveat: funs() has been deprecated since dplyr 0.8.0, so that last call now emits a warning.
# 
# Reassigning with %<>%
# I must admit, I wasn’t thrilled about typing variable names repeatedly. That’s when I stumbled upon %<>%, the compound assignment operator. It allowed me to pipe and reassign in one go. Instead of writing df <- df %>% select(1:3) %>% filter(mpg > 20, cyl == 6), I used df %<>% select(1:3) %>% filter(mpg > 20, cyl == 6). This saved me time and made my code cleaner.
# 
# Exposing Columns with %$%
# Another highlight was %$%, the exposition pipe operator. I often needed to pass columns from a dataframe into functions that don’t accept a data argument. For example, I used mtcars %>% filter(wt > 2) %$% cor.test(hp, mpg) to quickly run a correlation test on filtered data. This operator simplified tasks that previously felt cumbersome, allowing me to focus more on the analysis.
# 
# Handling Side Effects with %T>%
# There were moments when I needed to produce side effects like saving or printing data without disrupting the flow of my code. %T>%, the tee operator, was perfect for this. When working with sorted lists of letters and numbers, I used letters_numbers <- c(letters, 1:10) %>% sort %T>% write.csv(file = "letters_numbers.csv"). It allowed me to save my work without losing the sorted list in the process.
# 
# Visualizing with dplyr and ggplot2
# Piping from dplyr into ggplot2 tied everything together. By chaining data manipulations and visualizations, I could create a seamless exploratory data analysis pipeline. I filtered diamonds for depth values greater than 60, grouped by cut, calculated the mean price, and then plotted it, all in one go. Writing diamonds %>% filter(depth > 60) %>% group_by(cut) %>% summarize(mean_price = mean(price)) %>% ggplot(aes(x = cut, y = mean_price)) + geom_bar(stat = "identity") made my workflow much more efficient. The one thing to remember is that ggplot2 layers are added with +, not %>%.
# 
# Reflecting on Pipe Operators
# Pipe operators have truly transformed how I interact with data in R. They’ve not only made my code more readable and intuitive but also helped me focus on what really matters: the data itself. Whether it’s handling repetitive sequences, managing side effects, or seamlessly moving from data manipulation to visualization, I’ve come to appreciate the elegance and power that pipes bring to my data science projects.

# Load necessary libraries
library(magrittr)
library(dplyr)

## 
## Attaching package: 'dplyr'

## The following objects are masked from 'package:stats':
## 
##     filter, lag

## The following objects are masked from 'package:base':
## 
##     intersect, setdiff, setequal, union

library(ggplot2)
library(lubridate)

## 
## Attaching package: 'lubridate'

## The following objects are masked from 'package:base':
## 
##     date, intersect, setdiff, union

# Basic use and chaining with pipe operator
1:20 %>% sum

## [1] 210
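
# A minimal sketch of the nested-versus-piped contrast described in the prose.
# The sqrt and round steps are illustrative additions, not part of the original
# example; they only make the nesting deep enough to see the difference.
library(magrittr)

1:10 %>% mean                                # same as mean(1:10)

## [1] 5.5

nested <- round(sqrt(sum(1:20)), 2)          # read inside-out
piped  <- 1:20 %>% sum %>% sqrt %>% round(2) # read left to right
identical(nested, piped)

## [1] TRUE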

# Nested function replacement using pipe
categories <- factor(c("A", "B", "C", "D"))
categories %>% as.character %>% toupper

## [1] "A" "B" "C" "D"
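
# The factor-to-numeric conversion mentioned in the prose could look like the
# sketch below. The `years` values are made up for illustration. Calling
# as.numeric() directly on a factor returns its internal level codes, which is
# why the as.character step is needed in the chain.
library(magrittr)

years <- factor(c("2019", "2021", "2024"))
years %>% as.numeric                  # level codes, not the years

## [1] 1 2 3

years %>% as.character %>% as.numeric # the actual year values

## [1] 2019 2021 2024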

# Piping with placeholders
"Data Analysis" %>% substring(6, 13) %>% grepl("Analysis", .)

## [1] TRUE

"Data Analysis" %>% substring(6, 13) %>% { c(paste('Insight:', .)) }

## [1] "Insight: Analysis"

"Data Analysis" %>% substring(6, 13) %>% { c(paste(. ,'Processed', .)) }

## [1] "Analysis Processed Analysis"
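
# A short sketch of where the LHS lands with and without the placeholder. The
# strings and the numeric vector are illustrative. Without a dot, the LHS
# becomes the first argument; with a dot, it goes where the dot is; inside
# braces it is never inserted automatically, so the dot can be reused freely.
library(magrittr)

"Science" %>% grepl("Data Science")      # grepl("Science", "Data Science")

## [1] TRUE

"Data Science" %>% grepl("Science", .)   # grepl("Science", "Data Science")

## [1] TRUE

value_range <- c(4, 9, 1, 7) %>% { c(min(.), max(.)) }
value_range

## [1] 1 9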

# Functional sequences
extract_month <- . %>% as.character %>% as.Date %>% month
df <- data.frame(today = "2024-11-03", earlier = "2022-05-15")
df$today %>% extract_month

## [1] 11

df %>% lapply(extract_month) %>% as.data.frame

##   today earlier
## 1    11       5

df %>% mutate_all(funs(extract_month))

## Warning: `funs()` was deprecated in dplyr 0.8.0.
## ℹ Please use a list of either functions or lambdas:
## 
## # Simple named list: list(mean = mean, median = median)
## 
## # Auto named with `tibble::lst()`: tibble::lst(mean, median)
## 
## # Using lambdas list(~ mean(., trim = .2), ~ median(., na.rm = TRUE))
## Call `lifecycle::last_lifecycle_warnings()` to see where this warning was
## generated.

##   today earlier
## 1    11       5
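
# Since funs() is deprecated (see the warning above), one option is to pass the
# functional sequence directly, because it is an ordinary function. This is a
# sketch that assumes dplyr 1.0.0 or later for across(); on older versions,
# mutate_all(extract_month) is the closest drop-in replacement. The name
# months_df is illustrative.
library(magrittr)
library(dplyr)
library(lubridate)

extract_month <- . %>% as.character %>% as.Date %>% month
df <- data.frame(today = "2024-11-03", earlier = "2022-05-15")
months_df <- df %>% mutate(across(everything(), extract_month))
months_df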

# Compound assignment with %<>%
df_new <- iris
df_new %<>% filter(Sepal.Length > 5, Species == "setosa") %>% select(1:2)
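
# To make the equivalence concrete, this sketch builds the same result with an
# explicit reassignment and with %<>%. The names df_explicit and df_compound
# are illustrative. Note that filter() has to come before select(1:2) here,
# because Species is dropped by the selection.
library(magrittr)
library(dplyr)

df_explicit <- iris
df_explicit <- df_explicit %>%
  filter(Sepal.Length > 5, Species == "setosa") %>%
  select(1:2)

df_compound <- iris
df_compound %<>%
  filter(Sepal.Length > 5, Species == "setosa") %>%
  select(1:2)

identical(df_explicit, df_compound)

## [1] TRUE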

# Exposing contents with %$%
iris %>%
  filter(Sepal.Width > 3) %$%
  cor.test(Sepal.Length, Petal.Length)

## 
##  Pearson's product-moment correlation
## 
## data:  Sepal.Length and Petal.Length
## t = 19.112, df = 65, p-value < 2.2e-16
## alternative hypothesis: true correlation is not equal to 0
## 95 percent confidence interval:
##  0.8747764 0.9510883
## sample estimates:
##       cor 
## 0.9213771
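
# %$% behaves much like base R's with(): the columns of the LHS become visible
# by name on the RHS. A minimal sketch using cor() on mtcars (chosen here only
# for illustration) to show that the three spellings agree.
library(magrittr)

r_exposed <- mtcars %$% cor(hp, mpg)
r_with    <- with(mtcars, cor(hp, mpg))
r_dollar  <- cor(mtcars$hp, mtcars$mpg)
isTRUE(all.equal(r_exposed, r_with)) && isTRUE(all.equal(r_exposed, r_dollar))

## [1] TRUE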

# Creating side effects with %T>%
letters_numbers <- c(letters, 1:10) %>% 
  sort %T>% 
  write.csv(file = "letters_numbers.csv")
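
# A sketch of what the tee operator changes, using cat() as the side effect
# because it returns NULL. The vector and variable names are illustrative.
# With %>% the assignment receives cat()'s return value; with %T>% it receives
# the sorted vector that was fed into cat().
library(magrittr)

with_pipe <- c(3, 1, 2) %>% sort %>% cat("\n")   # prints, then yields NULL
with_tee  <- c(3, 1, 2) %>% sort %T>% cat("\n")  # prints, then yields the LHS
is.null(with_pipe)

## [1] TRUE

with_tee

## [1] 1 2 3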

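# Workaround for saving unnamed object: save() stores objects by name, so the
# piped value is first bound to `name` inside the function and then saved.
save3 <- function(., name, file = stop("'file' must be specified")) {
  assign(name, .)                 # create the named object in this function's environment
  save(list = name, file = file)  # save() looks the name up in the calling (this) environment
}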

letters_numbers <- c(letters, 1:10) %>% 
  sort %T>% 
  save3("letters_numbers", "letters_numbers.RData")

# Using the pipe with dplyr and ggplot2
mtcars %>%
  filter(hp > 100) %>%
  group_by(cyl) %>%
  summarize(mean_mpg = mean(mpg)) %>%
  ggplot(aes(x = factor(cyl), y = mean_mpg)) +
  geom_bar(stat = "identity")
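
# The diamonds pipeline described in the prose, laid out one step per line.
# This is a sketch only: geom_col() is used as shorthand for
# geom_bar(stat = "identity"), and the plot is stored in an illustrative
# variable p. Once the data reaches ggplot(), layers are added with +, not %>%.
library(dplyr)
library(ggplot2)

p <- diamonds %>%
  filter(depth > 60) %>%                  # keep stones with depth above 60
  group_by(cut) %>%                       # one group per cut quality
  summarize(mean_price = mean(price)) %>% # average price within each cut
  ggplot(aes(x = cut, y = mean_price)) +
  geom_col()
p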

# Libraries
library(knitr)
library(kableExtra)

## 
## Attaching package: 'kableExtra'

## The following object is masked from 'package:dplyr':
## 
##     group_rows

# Code
code_steps <- data.frame(
  Step = c(
    "Basic use and chaining with pipe operator",
    "Nested function replacement using pipe",
    "Piping with placeholders",
    "Functional sequences",
    "Compound assignment with %<>%",
    "Exposing contents with %$%",
    "Creating side effects with %T>%",
    "Workaround for saving unnamed object",
    "Using the pipe with dplyr and ggplot2"
  ),
  Code = c(
    '1:20 %>% sum',
    'categories <- factor(c("A", "B", "C", "D")); categories %>% as.character %>% toupper',
    '"Data Analysis" %>% substring(6, 13) %>% grepl("Analysis", .)',
    'extract_month <- . %>% as.character %>% as.Date %>% month; df <- data.frame(today = "2024-11-03", earlier = "2022-05-15"); df$today %>% extract_month',
    'df_new %<>% filter(Sepal.Length > 5, Species == "setosa") %>% select(1:2)',
    'iris %>% filter(Sepal.Width > 3) %$% cor.test(Sepal.Length, Petal.Length)',
    'letters_numbers <- c(letters, 1:10) %>% sort %T>% write.csv(file = "letters_numbers.csv")',
    'letters_numbers %>% sort %T>% save3("letters_numbers", "letters_numbers.RData")',
    'mtcars %>% filter(hp > 100) %>% group_by(cyl) %>% summarize(mean_mpg = mean(mpg)) %>% ggplot(aes(x = factor(cyl), y = mean_mpg)) + geom_bar(stat = "identity")'
  )
)

# Style the table and color its rows and columns
code_steps %>%
  kbl() %>%
  kable_styling(bootstrap_options = c("striped", "hover", "condensed", "responsive")) %>%
  row_spec(1:9, background = c("lightcyan", "lavender", "lightpink", "lightyellow", "lightgreen", "lightblue", "lightcoral", "lightgray", "lightgoldenrod")) %>%
  column_spec(1, bold = TRUE, background = "lightsteelblue") %>%
  column_spec(2, background = "lightgoldenrodyellow")

Step	Code
Basic use and chaining with pipe operator	1:20 %>% sum
Nested function replacement using pipe	categories <- factor(c("A", "B", "C", "D")); categories %>% as.character %>% toupper
Piping with placeholders	"Data Analysis" %>% substring(6, 13) %>% grepl("Analysis", .)
Functional sequences	extract_month <- . %>% as.character %>% as.Date %>% month; df <- data.frame(today = "2024-11-03", earlier = "2022-05-15"); df$today %>% extract_month
Compound assignment with %<>%	df_new %<>% filter(Sepal.Length > 5, Species == "setosa") %>% select(1:2)
Exposing contents with %$%	iris %>% filter(Sepal.Width > 3) %$% cor.test(Sepal.Length, Petal.Length)
Creating side effects with %T>%	letters_numbers <- c(letters, 1:10) %>% sort %T>% write.csv(file = "letters_numbers.csv")
Workaround for saving unnamed object	letters_numbers %>% sort %T>% save3("letters_numbers", "letters_numbers.RData")
Using the pipe with dplyr and ggplot2	mtcars %>% filter(hp > 100) %>% group_by(cyl) %>% summarize(mean_mpg = mean(mpg)) %>% ggplot(aes(x = factor(cyl), y = mean_mpg)) + geom_bar(stat = "identity")

Avery Holloman

2024-11-04