%>% performance demonstration
Let’s use a built-in dataset.
library(dplyr)
starwars
## # A tibble: 87 x 13
## name height mass hair_color skin_color eye_color birth_year gender
## <chr> <int> <dbl> <chr> <chr> <chr> <dbl> <chr>
## 1 Luke Sk… 172 77.0 blond fair blue 19.0 male
## 2 C-3PO 167 75.0 <NA> gold yellow 112 <NA>
## 3 R2-D2 96 32.0 <NA> white, bl… red 33.0 <NA>
## 4 Darth V… 202 136 none white yellow 41.9 male
## 5 Leia Or… 150 49.0 brown light brown 19.0 female
## 6 Owen La… 178 120 brown, gr… light blue 52.0 male
## 7 Beru Wh… 165 75.0 brown light blue 47.0 female
## 8 R5-D4 97 32.0 <NA> white, red red NA <NA>
## 9 Biggs D… 183 84.0 black light brown 24.0 male
## 10 Obi-Wan… 182 77.0 auburn, w… fair blue-gray 57.0 male
## # ... with 77 more rows, and 5 more variables: homeworld <chr>,
## # species <chr>, films <list>, vehicles <list>, starships <list>
Let’s group by species and calculate the mean mass of each group using pipes.
starwars %>%
group_by(species) %>%
summarize(mean(mass))
## # A tibble: 38 x 2
## species `mean(mass)`
## <chr> <dbl>
## 1 Aleena 15.0
## 2 Besalisk 102
## 3 Cerean 82.0
## 4 Chagrian NA
## 5 Clawdite 55.0
## 6 Droid NA
## 7 Dug 40.0
## 8 Ewok 20.0
## 9 Geonosian 80.0
## 10 Gungan NA
## # ... with 28 more rows
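Several species show NA above because mean() returns NA whenever any value in the group is missing. If dropping missing masses is acceptable for your purposes (an assumption, since the original keeps them), a minimal sketch looks like this. The names mass_by_species and mean_mass are only illustrative.

```r
library(dplyr)

# Same grouped summary, but ignore missing masses within each species.
# A species whose masses are all missing yields NaN rather than NA.
mass_by_species <- starwars %>%
  group_by(species) %>%
  summarize(mean_mass = mean(mass, na.rm = TRUE))

mass_by_species
```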
Do it without pipes.
summarize(group_by(starwars, species), mean(mass))
## # A tibble: 38 x 2
## species `mean(mass)`
## <chr> <dbl>
## 1 Aleena 15.0
## 2 Besalisk 102
## 3 Cerean 82.0
## 4 Chagrian NA
## 5 Clawdite 55.0
## 6 Droid NA
## 7 Dug 40.0
## 8 Ewok 20.0
## 9 Geonosian 80.0
## 10 Gungan NA
## # ... with 28 more rows
Do it with a temporary variable.
tmp <- group_by(starwars, species)
summarize(tmp, mean(mass))
## # A tibble: 38 x 2
## species `mean(mass)`
## <chr> <dbl>
## 1 Aleena 15.0
## 2 Besalisk 102
## 3 Cerean 82.0
## 4 Chagrian NA
## 5 Clawdite 55.0
## 6 Droid NA
## 7 Dug 40.0
## 8 Ewok 20.0
## 9 Geonosian 80.0
## 10 Gungan NA
## # ... with 28 more rows
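The three printed results look the same. One way to confirm that they really are equal, rather than comparing the output by eye, is sketched below. The names piped, nested and with_tmp are introduced here only for the check.

```r
library(dplyr)

# Compute the same summary in the three styles shown above.
piped  <- starwars %>% group_by(species) %>% summarize(mean(mass))
nested <- summarize(group_by(starwars, species), mean(mass))

tmp      <- group_by(starwars, species)
with_tmp <- summarize(tmp, mean(mass))

# stopifnot() raises an error if any comparison is not TRUE.
stopifnot(
  isTRUE(all.equal(piped, nested)),
  isTRUE(all.equal(piped, with_tmp))
)
```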
Let’s use the microbenchmark package to run each approach 100 times and compare the timings. Don’t worry about how this code works; it just evaluates each expression 100 times and summarizes how long the runs took.
microbenchmark::microbenchmark(
pipe = starwars %>% group_by(species) %>% summarize(mean(mass)),
nopipe = summarize(group_by(starwars, species), mean(mass)),
temp_variables = {tmp <- group_by(starwars); summarize(starwars, mean(mass))}
)
## Unit: milliseconds
## expr min lq mean median uq max
## pipe 2.020793 2.398608 2.637022 2.494255 2.685257 4.727662
## nopipe 1.863022 2.230478 2.466623 2.330475 2.533246 3.939843
## temp_variables 1.266467 1.458763 2.057072 1.517044 1.649115 42.447261
## neval cld
## 100 a
## 100 a
## 100 a
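Using %>% adds a small performance overhead, but not much: the median for pipe is about 0.16 milliseconds higher than for nopipe (2.49 versus 2.33), and the cld column gives all three expressions the same letter, which indicates that the benchmark found no statistically distinguishable difference between them. The temp_variables row looks faster, but that expression as written calls group_by(starwars) without species and then summarizes starwars rather than tmp, so it skips the grouping work and is not a like-for-like comparison. The equivalent expression would be tmp <- group_by(starwars, species); summarize(tmp, mean(mass)). Either way, a difference on the scale of a millisecond isn’t worth a cluttered workspace or extra cognitive overhead.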
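A like-for-like version of the benchmark can be sketched as follows. It assumes the intent of temp_variables was the two-step version shown earlier (group by species into tmp, then summarize tmp). The name bench and the explicit times = 100 are additions for illustration. No timings are shown because they vary by machine and package version.

```r
library(dplyr)
library(microbenchmark)

bench <- microbenchmark(
  pipe   = starwars %>% group_by(species) %>% summarize(mean(mass)),
  nopipe = summarize(group_by(starwars, species), mean(mass)),
  # Assumed intent: group by species, then summarize the grouped data.
  temp_variables = {
    tmp <- group_by(starwars, species)
    summarize(tmp, mean(mass))
  },
  times = 100  # microbenchmark's default, written out for clarity
)

print(bench)
```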
Also check out code profiling with the profvis package, which is integrated into RStudio. You can highlight particular lines of code and, in the RStudio menu, click Profile > Profile Selected Lines.
https://support.rstudio.com/hc/en-us/articles/218221837-Profiling-with-RStudio
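profvis can also be called directly from code. The sketch below assumes the profvis package is installed. It repeats the piped summary in a loop because a single call finishes too quickly for the sampling profiler to capture anything useful; the count of 200 is arbitrary and the name p is only illustrative.

```r
library(dplyr)
library(profvis)

# Profile many repetitions so the sampling profiler has enough to record.
p <- profvis({
  for (i in 1:200) {
    starwars %>% group_by(species) %>% summarize(mean(mass))
  }
})

# Open the interactive flame graph when running in an interactive session.
if (interactive()) print(p)
```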