Data Structure Profiling

version 2

Author

Marc Paterno

Published

March 27, 2024

Purpose

This document describes the microbenchmarking work I have done to help choose between possible data structures to use for the SimPhotonsLite class.

Note for version 2: this document corrects an error in the data collection present in version 1. In the earlier version, the apparently good performance of the hashmap solution was caused by an error in the code that filled the collections. The result of this error was that the number of measurements present in the std::map<int,int> and hashmap containers was smaller than expected (and never larger than 1000 elements). The conclusions reached are affected by this change.

Contenders for the design

The original design for SimPhotonsLite is a std::map<int, int>. Possible contenders for a redesign include:

std::unordered_map<int, int>. I call this hashmap.
A container of record objects, each record containing a tick and an nphots value. Possible choices for the container are std::vector, std::deque and std::forward_list. I call these array-of-struct data structures. In the analysis below, they are aosv, aosd and aosl.
a struct containing two containers of int values. The choices for the container are the same as above. I call these struct-of-array data structures. In the analysis below, they are soav, soad and soal.

Generally, the struct-of-array data structures are designed to optimize the performance of iterating over all nphots values, and the array-of-struct data structures are designed to optimize the performance of iterating simultaneously through both ticks and nphots. hashmap is optimized for doing insertions and deletions and direct lookups by ticks value. Note that the data product is created only once, so that optimization for insertions and deletions is not relevant. If lookup by ticks values is needed, the arrays in the SOA and AOS implementations could be sorted, and an interface providing a binary search could be provided.

Performance measurements

I made performance measurements using the nanobench library. In each case, collections of a varying range of sizes were used. Note that the typical size of a SimPhotonsLite object is about 3000 key/value pairs.

I created benchmarks for a variety of different usage patterns, described below. In each case I ran the same code on a Linux box (Intel Skylake-avx512) and a MacBook Pro (Apple M2 Max chip). On the Skylake machine, the code was compiled with GCC 12.3.0. On the M2 machine, the code was compiled with Apple Clang 15.0. I have used two different compilers, standard library implementations, and types of hardware to illustrate what patterns are common across implementations, and where implementation differences cause performance divergences.

Iterating through `nphots` values, ignoring `ticks`

In this test the iteration looks only at the nphots values, summing them.

Figure 1: Time taken to iterate through the structure, summing all `nphots` values. The vertical dotted line shows the typical size of the collection (3000).

The winners are the SOA data structures, which have similar performance. On the Skylake build, the AOS vector also achieves excellent performance; this is not true on the M2 build. The std::map<int, int> data structure is the slowest over the entire range. The hashmap data structure is less bad than the std::map<int,int>, but is still bad. This is a bucket-based data structure that performs approximately as well as the aosl, which is a node-based data structure; both yield poor cache coherency.

Iterating through both `ticks` and `nphots`

In this test, the iteration requires looking at both the ticks and the nphots data. The task we do is to find the tick with the largest value of nphots by a scan through all the data.

Figure 2: Time taken to iterate through the structure, scanning all `nphots` to find the largest, and to record the `ticks` associated with that largest value. In this scan, `ticks` is only read when necessary. The vertical dotted line shows the typical size of the collection (3000).

This time the differences between the SOA and AOS structures are less dramatic, with the SOA structures doing better. Again, the std::map<int,int> is uniformly worst, and the node- and bucket-based stores of records are poor.

Tabular results

If we concentrate only on the size of 3000, we obtain the rankings below.

structure	time
aosv	514.25
soav	514.70
soal	516.60
soad	518.05
aosd	3070.63
hashmap	5735.98
aosl	5808.77
map	15809.36

Table 1: Running times for sums function, Skylake results only.

structure	time
aosv	3072.53
soav	3744.61
soad	3749.71
soal	3750.77
aosd	4705.20
hashmap	5792.76
aosl	5813.63
map	17084.49

Table 2: Running times for scan function, Skylake results only.

Conclusions

For the case if iterating only through the values (not looking at the keys), the struct-of-array choices, and the array-of-struct for an array type of vector are all good. The node- and the bucket-based solutions (std::map<int,int>,hashmap,aosl`) are not contenders for the solution.

For the case of iterating through both keys and values the array-of-struct and struct-of-array, both using vector, are the two best.

The size of the collections in question does not change the decision of which is fastest. In the same of 10 events I have used in my work thus far, the range of sizes was from 1 to 10 thousand measurements; over this range the relative performances are fairly stable.