Verification Analysis
Introduction
The purpose of this document is to verify that our timing measurements are well-understood.
Timestamp analysis
In the new Timestamp service, we record a timestamp before and after the execution of the input and output modules. Each timestamp is the time in nanoseconds since 1970, stored as a 64-bit integer. Along with each timestamp we record the CPU id and NUMA node id on which the main art thread is running.
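As a rough illustration of what one such record contains and how a duration is derived from a before/after pair, here is a minimal Python sketch. It is not the Timestamp service itself: the names TimestampRecord, take_timestamp, elapsed_ns and migrated are hypothetical, and the CPU and NUMA node ids are passed in by the caller because querying them is platform-specific.

```python
import time
from dataclasses import dataclass


@dataclass(frozen=True)
class TimestampRecord:
    """One sample (hypothetical layout, not the service's actual record)."""
    time_ns: int    # nanoseconds since 1970; fits in a signed 64-bit integer
    cpu_id: int     # CPU on which the main thread was running
    numa_node: int  # NUMA node on which the main thread was running


def take_timestamp(cpu_id: int, numa_node: int) -> TimestampRecord:
    # time.time_ns() returns integer nanoseconds since the Unix epoch.
    # The ids are supplied by the caller (assumption: obtained elsewhere).
    return TimestampRecord(time.time_ns(), cpu_id, numa_node)


def elapsed_ns(before: TimestampRecord, after: TimestampRecord) -> int:
    """Duration of the bracketed module call, in nanoseconds."""
    return after.time_ns - before.time_ns


def migrated(before: TimestampRecord, after: TimestampRecord) -> bool:
    """True if the thread changed CPU or NUMA node during the call."""
    return (before.cpu_id, before.numa_node) != (after.cpu_id, after.numa_node)
```

Recording the CPU and NUMA node with each sample makes it possible to flag intervals during which the thread moved, which may help explain outliers in the measured durations.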
The writing times for ROOT and HEPnOS are strongly correlated, presumably because larger events take longer to write. Note that ROOT writes more data than HEPnOS, because the file-based workflow requires “copying forward” data products while the distributed data store model does not.
We can also compare writing times when ROOT writes only the data products that HEPnOS writes. Here we see a stronger correlation (and also evidence that ROOT serialization is faster than Boost serialization).
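One way to put a number on "strongly correlated" is the Pearson correlation coefficient of the per-event writing times. The sketch below assumes the two inputs are equal-length sequences of durations for the same events in the same order. The function name pearson_r is hypothetical, and this is not necessarily how the comparisons above were produced.

```python
import math


def pearson_r(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two sequences of equal length >= 2")
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Sums of products of deviations from the means.
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    if sxx == 0 or syy == 0:
        raise ValueError("correlation is undefined for a constant sequence")
    return sxy / math.sqrt(sxx * syy)
```

A value near 1 indicates that events which are slow to write with one backend are also slow with the other. It says nothing about which backend is faster overall, which is why the absolute times still need to be compared separately.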
Comparing reading times is harder. Direct comparison of the results from the Timestamp service (below) shows that RootInput does not read the data products:
To measure the time taken to read the data products, we need to look at the module execution time, because it is during module execution that data products are read from a ROOT file. This is not directly a feature of ROOT; it is a feature of art’s delayed reading, which supports ROOT but for which we have not built HEPnOS support.
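The effect of delayed reading on where time is charged can be illustrated with a small lazy-loading sketch. This is a generic illustration of the idea, not art's implementation; the class name LazyProduct and its loader argument are hypothetical.

```python
class LazyProduct:
    """Holds a loader and defers the actual read until first access."""

    def __init__(self, loader):
        self._loader = loader   # callable that performs the real read
        self._value = None
        self._loaded = False

    def get(self):
        # The read happens here, on first access, not at construction.
        if not self._loaded:
            self._value = self._loader()
            self._loaded = True
        return self._value
```

In this picture the input step only constructs LazyProduct objects, so timestamps taken around it see almost no read cost. The first call to get(), made inside a module, pays for the read, and that cost appears in the module execution time instead.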
Module timing
The use of HEPnOS or ROOT for I/O should not affect the speed with which the algorithms run. Since the delayed reading done with ROOT happens during module execution, we might be able to observe the time it takes by comparing the ROOT and HEPnOS module execution times. However, the variability of the module execution times, and the apparent nondeterminism of some of the algorithms, may mask this time.
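To judge whether a difference in module execution time stands out from the variability, one option is to compare the difference of the sample means with its standard error. The sketch below uses the unpooled (Welch-style) standard error and assumes each input is a list of per-event execution times for one module under one I/O backend. The function name mean_difference is hypothetical.

```python
import math
import statistics


def mean_difference(times_a, times_b):
    """Return (mean(times_b) - mean(times_a), standard error of that difference).

    Uses the unpooled standard error, so the two samples may have
    different sizes and variances. Each sample needs at least two values.
    """
    diff = statistics.mean(times_b) - statistics.mean(times_a)
    se = math.sqrt(
        statistics.variance(times_a) / len(times_a)
        + statistics.variance(times_b) / len(times_b)
    )
    return diff, se
```

A difference that is small compared with its standard error cannot be distinguished from event-to-event variability. This check treats the samples as independent and does not account for any nondeterminism in the algorithms themselves, so it is only a first-pass screen.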
For each module, execution appears to be slower when using HEPnOS I/O than when using ROOT I/O. This is the opposite of what delayed reading alone would produce, so we need to understand why it happens.