This document benchmarks a few different ways of assembling a list of values in R. Surprisingly, it is possible to grow a list of values with amortized O(1) append behavior using base R types, although it is not especially idiomatic.

The candidates

Each function accepts an argument f which will be called to generate an empty data structure, so we can test these functions against both atomic vectors and lists. Each function also takes an argument k that describes the size of the final data structure.

append_index uses indexing to extend a list:

append_index = function(f, k) {
  l = f()
  for(i in 1:k) {
    l[length(l) + 1] = i
  }
  l
}

concat_call calls some function g to append an element to the growing list. We’ll try using each of these as g:

concat_call = function(f, g, k) {
  l = f()
  for(i in 1:k) {
    l = g(l, i)
  }
  l
}

Finally, preallocate creates the entire list at once and then fills in the values.

preallocate = function(f, k) {
  l = f(k)
  for(i in seq_along(l)) {
    l[i] = i
  }
}

Atomic vectors

First, let’s look at the results for atomic vectors.

result = seq(from=1000, to=max_k, by=1000) %>%
  lapply(function(k) {
    microbenchmark(append_index(c, k),
                   concat_call(c, c, k),
                   concat_call(c, append, k),
                   preallocate(numeric, k),
                   times=times) %>%
      group_by(expr) %>%
      mutate(time=time/1e6) %>% # ns (-9) to ms (-3)
      summarize(median=median(time), lq=quantile(time, 0.25), uq=quantile(time, 0.75)) %>%
      mutate(k=k)
  }) %>%
  bind_rows()

All solutions except preallocate show superlinear behavior for atomic vectors, implying that each append operation (which we’re calling N times) is O(N). Preallocating vectors can make R code much snappier.

Lists

Now we can try with lists. Notice that here we’re trying concat_call with both c() and list() for extending our list.

Using concat_call with list actually produces a deeply nested list, which isn’t quite the output we want:

concat_call(list, list, 3)
## [[1]]
## [[1]][[1]]
## [[1]][[1]][[1]]
## list()
## 
## [[1]][[1]][[2]]
## [1] 1
## 
## 
## [[1]][[2]]
## [1] 2
## 
## 
## [[2]]
## [1] 3

So we flatten it with unlist and cast it back to a list to get the same result as the other solutions.

empty_list = function(k) {
  vector("list", k)
}

list_result = seq(from=1000, to=max_k, by=1000) %>%
  lapply(function(k) {
    microbenchmark(append_index(list, k),
                   concat_call(list, c, k),
                   concat_call(list, append, k),
                   concat_call(list, list, k) %>% unlist %>% list,
                   preallocate(empty_list, k),
                   times=times) %>%
      group_by(expr) %>%
      mutate(time=time/1e6) %>% # ns (-9) to ms (-3)
      summarize(median=median(time), lq=quantile(time, 0.25), uq=quantile(time, 0.75)) %>%
      mutate(k=k)
  }) %>%
  bind_rows()

One thing to note is that working with lists is generally slower than working with vectors. But hold on a second: concat_call with list is just as fast as preallocating the entire list.

What’s going on here? Each time we call c(old_list, new_element), R creates a new list, copying in the values from old_list and then appending new_element. Because each element gets copied into a new data structure, each append operation is O(N).

But each time we call list(old_list, new_element), R gives us a list of length two. The first element of the list is just a reference to the old list; the second element of the list is the new element. The values don’t get copied, so creating this new list is a O(1) operation. After we repeat this N times, we’ll have N nested lists, each of length two. Flattening the list is a O(N) operation, but if we don’t need random access on the data structure as we build it, we only have to do it once, so we can amortize it over each append operation.

In my humblest of opinions, this isn’t very idiomatic, but if you need to grow a list of values one value at a time, this is at least a concise way to perform that operation using the data structures in base R.