purrring in R

Introduction

This short article explains the basic concepts of functional programming, why you would write code in a functional way, and how to do it. The goal is to pique your curiosity about the many ways of achieving the same results with less code.

One library that enables functional programming in R is purrr. In this article, you’ll learn about three of its basic functions:

map()
pmap()
compose()

You’ll also learn some principles of programming and analysis:

DRY (Don’t Repeat Yourself)
Self-documenting code
Thinking at the appropriate level of abstraction

Mapping with `map()`

One of the most useful programming principles is DRY, which stands for Don’t Repeat Yourself. One of the most frequently repeated kinds of code is the loop. Consider the typical code for calculating the sum of each numeric vector in a list:

some_list = list(x = c(1, 2, 3), 
                 y = c(40, 50, 60), 
                 z = c(700, 800, 900))
result = list()

for (sublist in some_list){
  result = append(result, sum(sublist))
}

print(result)

## [[1]]
## [1] 6
## 
## [[2]]
## [1] 150
## 
## [[3]]
## [1] 2400

That was a keyboardful! You are likely to write for (item in list){} so many times that you might wonder whether there’s a better way. The map() function in purrr provides one:

library(purrr)
some_list = list(x = c(1, 2, 3), 
                 y = c(40, 50, 60), 
                 z = c(700, 800, 900))

result = map(some_list, sum)

print(result)

## $x
## [1] 6
## 
## $y
## [1] 150
## 
## $z
## [1] 2400

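The loop has been replaced by a single call to map(). The result is neater code that is easier to grasp at a glance because it has fewer nouns and verbs: the loop variable sublist and the call to append() are no longer needed. As a bonus, map() keeps the names of the input list ($x, $y, $z), whereas the loop returned an unnamed list.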
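To make the connection to the loop concrete, here is a minimal sketch of what map() does in the simple case used above. my_map() is a hypothetical helper written for illustration; it is not purrr's actual implementation, which also supports formula shorthand, extra arguments and more helpful error messages.

```r
library(purrr)

# my_map() is a hypothetical, simplified stand-in for purrr::map():
# it applies .f to each element of .x and returns a list of the results.
my_map <- function(.x, .f) {
  out <- vector("list", length(.x))  # preallocate one slot per element
  for (i in seq_along(.x)) {
    out[[i]] <- .f(.x[[i]])          # apply the function to element i
  }
  names(out) <- names(.x)            # keep the input names, as map() does
  out
}

some_list = list(x = c(1, 2, 3),
                 y = c(40, 50, 60),
                 z = c(700, 800, 900))

identical(my_map(some_list, sum), map(some_list, sum))  # TRUE
```

The preallocated list and the seq_along() loop are exactly the bookkeeping that map() lets you stop writing.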


Self-documenting code

You are also likely to write the above loop several times over, e.g. once each for sum(), mean() and median():

result1 = list()
for (sublist in some_list){
  result1 = append(result1, sum(sublist))
}

result2 = list()
for (sublist in some_list){
  result2 = append(result2, mean(sublist))
}

result3 = list()
for (sublist in some_list){
  result3 = append(result3, median(sublist))
}

At a glance, it’s quite difficult to figure out why there are three for loops. Each loop body here is only one line long; imagine if there were many. To address this, most people would be tempted to add comments:

# Calculating sum
result1 = list()
for (sublist in some_list){
  result1 = append(result1, sum(sublist))
}

# Calculating mean
result2 = list()
for (sublist in some_list){
  result2 = append(result2, mean(sublist))
}

# Calculating median
result3 = list()
for (sublist in some_list){
  result3 = append(result3, median(sublist))
}

This kind of commenting certainly clarifies things. However, consider the effect of using map() instead:

result1 = map(some_list, sum)
result2 = map(some_list, mean)
result3 = map(some_list, median)

To me, this code is so obvious that it doesn’t need comments at all. This is a purrrfect example of self-documenting code.
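map() always returns a list. When each result is a single number, purrr also offers typed variants such as map_dbl(), which return an atomic vector instead. A minimal sketch of the three summaries above in that style:

```r
library(purrr)

some_list = list(x = c(1, 2, 3),
                 y = c(40, 50, 60),
                 z = c(700, 800, 900))

# map_dbl() works like map() but simplifies the output to a named double
# vector; it throws an error if any result is not a single number.
sums    = map_dbl(some_list, sum)     # c(x = 6, y = 150, z = 2400)
means   = map_dbl(some_list, mean)    # c(x = 2, y = 50, z = 800)
medians = map_dbl(some_list, median)  # c(x = 2, y = 50, z = 800)
```

The code stays just as self-documenting, and the results can be used directly wherever a numeric vector is expected.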

`pmap()` (parallel map)

What if you wanted to take the sum along the other dimension, i.e. add up the nth elements of all the vectors in the list? Let’s try a loop:

some_list = list(x = c(1, 2, 3), 
                 y = c(40, 50, 60), 
                 z = c(700, 800, 900))
result = list()

# i indexes positions within each vector, not elements of the list
for (i in seq_along(some_list$x)){
  result = append(result, some_list$x[i] + 
                          some_list$y[i] + 
                          some_list$z[i])
}

print(result)

## [[1]]
## [1] 741
## 
## [[2]]
## [1] 852
## 
## [[3]]
## [1] 963

And now from a functional perspective:

library(purrr)
some_list = list(x = c(1, 2, 3), 
                 y = c(40, 50, 60), 
                 z = c(700, 800, 900))

result = pmap(some_list, sum)

print(result)

## [[1]]
## [1] 741
## 
## [[2]]
## [1] 852
## 
## [[3]]
## [1] 963

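On each step, pmap() takes the nth element of every vector in the list and passes them together, matched by name, to the desired function, e.g. sum(). The first step is therefore equivalent to sum(x = 1, y = 40, z = 700).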
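Because pmap() matches list elements to function arguments by name, you can also pass it a function whose arguments are named after the list elements. A minimal sketch, assuming the names x, y and z from some_list and using the typed variant pmap_dbl() so that the result is a numeric vector:

```r
library(purrr)

some_list = list(x = c(1, 2, 3),
                 y = c(40, 50, 60),
                 z = c(700, 800, 900))

# On step i, pmap_dbl() calls the function with x = some_list$x[i],
# y = some_list$y[i] and z = some_list$z[i], and collects the results.
totals = pmap_dbl(some_list, function(x, y, z) x + y + z)

totals  # 741 852 963
```

Naming the arguments makes the intent explicit, and a mismatch between the argument names and the list names fails loudly instead of silently summing the wrong things.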


Composing your functions, `compose()`

Moving away from moving away from loops (this is apt!), another important concept is working at the appropriate level of abstraction. Let’s illustrate this concept with a simple example. Using the mtcars data, you want to find the three most common numbers of carburetors (the carb column). First, the traditional way:

library(tidyverse)

tail(sort(table(mtcars$carb)), 3)

## 
##  1  2  4 
##  7 10 10

There are three functions involved: first table(), then sort(), and finally tail(). In the traditional way you have to read them inside out, and this is made worse by the ever-widening gap between each function name on the left and its extra arguments on the right (here, the 3 that belongs to tail()). This is not a natural way for humans to read. One quickly becomes lost in the details: counting parentheses and making sure every extra argument sits inside the right pair. This is a hindrance to analytical thinking.
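For comparison, here is a sketch of the same computation unnested into one intermediate variable per step, using base R only. The variable names are illustrative:

```r
# Step 1: count how many cars have each number of carburetors
counts = table(mtcars$carb)

# Step 2: sort the counts in ascending order
sorted_counts = sort(counts)

# Step 3: keep the last three, i.e. the three most frequent values
top_counts = tail(sorted_counts, 3)

top_counts
```

The steps now read top to bottom, at the cost of naming values that are only used once.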

You might ask, “Why don’t I just use the pipe operator %>% from the magrittr package? That would solve the problem, right?” Let’s see how this looks:

library(tidyverse)

mtcars %>% 
  select(carb) %>% 
  table %>%
  sort %>% 
  tail(3)

## .
##  1  2  4 
##  7 10 10

Yes, this is certainly an improvement. However, what if you wanted to apply the same set of operations to several columns? Would you have to write the same five lines again for each one?

mtcars %>% 
  select(carb) %>% 
  table %>%
  sort %>% 
  tail(3)

mtcars %>% 
  select(gear) %>% 
  table %>%
  sort %>% 
  tail(3)

mtcars %>% 
  select(cyl) %>% 
  table %>%
  sort %>% 
  tail(3)

Then you might ask, “Why don’t I just factor the above steps into a function?” Let’s see this in action:

library(tidyverse)

top_3 <- function(data, variable_name){
  # all_of() marks variable_name as a character vector of column names
  res <- data %>% 
          select(all_of(variable_name)) %>% 
          table %>%
          sort %>% 
          tail(3)
  
  return(res)
}

top_3(mtcars, "carb")

## .
##  1  2  4 
##  7 10 10

This is yet another improvement, and it lets you think at the appropriate level of abstraction; for example, the following needs no comments:

top_3(mtcars, "carb")
top_3(mtcars, "gear")
top_3(mtcars, "cyl")
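The two ideas in this article also combine: map() can do the repetition over column names for you. A minimal sketch, assuming a base-R version of top_3 (data[[variable_name]] instead of select()) so that only purrr is needed; set_names() is used so that each result is labelled with its column:

```r
library(purrr)

# Base-R variant of top_3: the three most frequent values of one column
top_3 <- function(data, variable_name) {
  tail(sort(table(data[[variable_name]])), 3)
}

# set_names() with a single character vector uses the values as names,
# so the result is a list with elements carb, gear and cyl
columns = set_names(c("carb", "gear", "cyl"))
results = map(columns, function(column) top_3(mtcars, column))

results$gear
```

Adding another column now means adding one string to columns rather than another call.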

The end result is exactly what we want. Now consider this alternative definition of top_3, where partial(tail, n = 3) returns a version of tail() with n fixed at 3:

library(tidyverse)

top_3 <- compose(partial(tail, n = 3), sort, table, select)

top_3(mtcars, "carb")

## out
##  1  2  4 
##  7 10 10

The verb compose():

expresses the intention to combine multiple functions in a certain order
is syntactically shorter: everything fits on one line, with no pipes and no pile-up of parentheses
is read linearly from right to left, i.e. select(), then table(), then sort() and finally tail().
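compose() applies its functions right to left by default. purrr's compose() also takes a .dir argument; with .dir = "forward" the functions are listed in the order they run. A minimal sketch, applied directly to mtcars$carb so that select() (and therefore dplyr) is not needed:

```r
library(purrr)

# Default direction ("backward"): the last function listed runs first,
# so table() runs first and tail() runs last.
top_3_backward <- compose(partial(tail, n = 3), sort, table)

# .dir = "forward": the same pipeline, listed in the order it runs.
top_3_forward <- compose(table, sort, partial(tail, n = 3), .dir = "forward")

# Both behave like the nested call tail(sort(table(x)), n = 3).
top_3_nested <- function(x) tail(sort(table(x)), n = 3)

top_3_backward(mtcars$carb)
top_3_forward(mtcars$carb)
```

Which direction reads better is a matter of taste; the forward form mirrors the order of a pipe.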

Wrapping up

This article is only the tip of the iceberg that is functional programming. I hope that I have achieved my goal of piquing your curiosity. I would suggest this elucidating six-part series on functional programming in general and this six-part purrr-specific series. Happy analysing and programming!

purrring in R

Roger Yu

31 March 2019

