This document has been updated to include two runs of the eventselection program over the same data.
Run 1 used 112 nodes for the eventselection program and 16 nodes running HEPnOS daemon processes, with 32 targets on each HEPnOS node, for a total of 512 targets. The eventselection program ran 63 ranks per node.
Run 2 was similar except that it used 64 ranks per node for the eventselection program.
Our raw timing data consists of a series of timestamps captured using the MPI timer, a step name identifying the kind of timestamp, and an integer datum with a meaning that depends on the step it accompanies.
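As a concrete sketch, the raw data could be read like this; the file name and column names here are assumptions for illustration, not the actual instrumentation format:

```r
library(readr)

# Hypothetical sketch of reading the raw timing records: one row per
# timestamp, with a step name and a step-dependent integer datum.
raw <- read_csv(
  "eventselection_timing.csv",    # assumed file name
  col_types = cols(
    job       = col_character(),
    rank      = col_integer(),
    step      = col_character(),  # the kind of timestamp
    timestamp = col_double(),     # MPI timer value, in seconds
    datum     = col_integer()     # meaning depends on the step
  )
)
```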
## # A tibble: 14,224 x 23
## job rank start post_dataset post_block_conf… post_broadcast pre_decompose
## <chr> <int> <dbl> <dbl> <dbl> <dbl> <dbl>
## 1 1 0 0.276 19.1 19.1 191. 191.
## 2 1 1000 0.244 27.3 27.3 191. 191.
## 3 1 1001 0.103 27.3 27.3 191. 191.
## 4 1 1002 0.266 17.6 17.6 191. 191.
## 5 1 1003 0.258 18.7 18.7 191. 191.
## 6 1 1004 0.221 27.3 27.3 191. 191.
## 7 1 1005 0.125 21.2 21.2 191. 191.
## 8 1 1006 0.107 17.1 17.1 191. 191.
## 9 1 1007 0.275 25.2 25.2 191. 191.
## 10 1 1008 0.109 19.1 19.1 191. 191.
## # … with 14,214 more rows, and 16 more variables: post_decompose <dbl>,
## # pre_execute_block <dbl>, post_execute_block <dbl>, pre_reduction <dbl>,
## # post_reduction <dbl>, finish <dbl>, makeds <dbl>, calcbc <dbl>,
## # broadcast <dbl>, prep <dbl>, makeblocks <dbl>, makelambda <dbl>,
## # executeblock <dbl>, reduction <dbl>, output <dbl>, total <dbl>
We first look at timestamps that are captured for each event. The evts_1 dataframe contains one row per event, with the following columns:
- rank: the MPI rank
- evt: a unique identifier for the event
- precr: pre-create records timestamp
- postcr: post-create records timestamp
- postfr: post-filter records timestamp
- postps: post-process subrun timestamp
- nslices: number of slices in the event
- nbytes: total number of bytes in all slices in the event
- load: duration of loading from HEPnOS, in seconds
- rec: duration of conversion from HEPnOS object to StandardRecord object
- filt: duration of NOvA candidate selection execution

All timestamps are shifted so that the very first timestamp is at \(t = 0\), with times in seconds.
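A minimal sketch of that shift, assuming evts_1 is already loaded and (as an assumption for illustration) that precr contains the earliest timestamp in the data:

```r
library(dplyr)

# Shift every timestamp column so the earliest timestamp becomes t = 0.
t0 <- min(evts_1$precr)
evts_1 <- evts_1 %>%
  mutate(precr  = precr  - t0,
         postcr = postcr - t0,
         postfr = postfr - t0,
         postps = postps - t0)
```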
Looking at the starting times shows how long it takes to start the various ranks in the program. The first rank to start has its start time set to 0.
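Something like the following sketch produces the table below; ranks is a hypothetical name for the per-rank timing dataframe shown at the top of this document:

```r
library(dplyr)

# For each job, find the rank whose shifted start time is the minimum
# (zero by construction).
ranks %>%
  group_by(job) %>%
  filter(start == min(start)) %>%
  select(job, rank, start)
```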
## # A tibble: 2 x 3
## job rank start
## <chr> <int> <dbl>
## 1 1 1077 0
## 2 2 837 0
There are no undue delays in startup.
Job 2 has an enormous delay in the finishing time for rank 0; job 1 does not:
## # A tibble: 2 x 3
## job rank finish
## <chr> <int> <dbl>
## 1 1 0 520.
## 2 2 0 5999.
Leaving off rank 0, the finishing times show clear structure from the reduction. Note that rank 0 is the root to which the reduction is done. Job 2 is clearly slower than job 1.
The reduction task in the eventselection program consists of gathering the SliceIDs (4 integers each) of all the slices that pass the candidate selection criteria at rank 0, where they are then printed. The reduction operation is the concatenation of vectors of SliceIDs.
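Conceptually (this is only an R illustration, not the MPI implementation), the reduction behaves like concatenation over per-rank vectors:

```r
# Each rank contributes a vector of flattened SliceIDs (4 integers per
# SliceID); the values below are made up for illustration.
rank0_ids <- c(1L, 2L, 10L, 3L)                   # one SliceID
rank1_ids <- c(1L, 2L, 11L, 7L, 1L, 3L, 12L, 2L)  # two SliceIDs

# The reduction operation is concatenation; the root (rank 0) ends up
# with all of the passing SliceIDs, which it then prints.
all_ids <- Reduce(c, list(rank0_ids, rank1_ids))
```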
Note that we use *time* to indicate a point in time at which some event happens, and *duration* to mean the span of time taken by some process.
Job 2 shows a huge reduction duration for rank 0; job 1 does not:
## # A tibble: 2 x 3
## job rank reduction
## <chr> <int> <dbl>
## 1 1 0 79.3
## 2 2 0 5477.
Leaving off rank 0, the reduction durations show structure that seems to indicate the organization of the reduction algorithm.
The distribution of reduction durations is best shown on a log scale.
The long reduction duration for rank 0 might be caused by a delay in starting the reduction on some ranks. To see whether there is any indication of this, we look at the time at which each rank began the reduction:
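A sketch of that check, again using the hypothetical ranks dataframe and its pre_reduction column:

```r
library(ggplot2)

# Time at which each rank began the reduction, by job.
ggplot(ranks, aes(x = rank, y = pre_reduction)) +
  geom_point(size = 0.3) +
  facet_wrap(~ job)
```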
Loading data means reading from HEPnOS into the eventselection client.
Recall that we call a hepnos::Event::load function for each data product, and that we are reading several data products.
The size of the read is the sum of the in-memory sizes of the objects as seen by the client code; we are not counting bytes actually transferred.
The duration of the read is the number of seconds from the start of the first hepnos::Event::load to the end of the final hepnos::Event::load for that event.
The median load size is 6396 bytes and the largest is 72660 bytes.
The distributions for the two jobs should be identical, because the same data set is processed in each case.
The duration distribution shows a small but significant tail that is hard to see in this plot.
We call the number of bytes per second read the speed.
We can plot how the loading speed changes with the size of the data loaded. We group the events by size (nbytes) into 200 groups, each containing nearly the same number of events. For each group, we plot the mean speed (in bytes/s) versus the mean size (in bytes). The line is a “lowess” regression (local polynomial fitting).
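A sketch of how such a plot could be built, assuming evts is a hypothetical dataframe holding the combined per-event data for both jobs; cut_number (from ggplot2) makes bins with equal numbers of events:

```r
library(dplyr)
library(ggplot2)

# Bin events into 200 equal-population groups by size, then compute the
# mean speed and mean size within each bin.
binned <- evts %>%
  mutate(speed = nbytes / load,
         bin   = cut_number(nbytes, n = 200)) %>%
  group_by(job, bin) %>%
  summarize(mean_speed = mean(speed),
            mean_size  = mean(nbytes))

ggplot(binned, aes(x = mean_size, y = mean_speed, color = job)) +
  geom_point() +
  geom_smooth(method = "loess")  # local polynomial fit, as in the text
```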
We can make a similar plot of the load duration:
The load duration shows an “overhead” of ~325 milliseconds, with a slow increase in duration for sizes above ~10 kilobytes.
We define a slow load as one that takes more than 10 seconds.
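In code, a sketch of selecting the slow loads (evts again being the hypothetical combined per-event dataframe):

```r
library(dplyr)

# Slow loads: events whose load duration exceeds 10 seconds.
slow <- evts %>% filter(load > 10)
slow
```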
## # A tibble: 18,942 x 12
## job rank evt precr postfr postcr postps nslices nbytes load rec
## <chr> <int> <int> <dbl> <dbl> <dbl> <dbl> <int> <int> <dbl> <dbl>
## 1 1 0 8 208. 225. 225. 225. 2 2616 17.6 1.10e-4
## 2 1 1000 601 201. 218. 218. 218. 6 10848 17.0 1.52e-3
## 3 1 1000 603 218. 232. 232. 232. 6 11856 13.9 1.60e-4
## 4 1 1001 1266 198. 213. 213. 213. 3 4740 14.8 1.30e-4
## 5 1 1001 1270 213. 233. 233. 233. 6 12012 19.3 1.50e-4
## 6 1 1002 1792 201. 218. 218. 218. 2 2856 16.9 1.10e-4
## 7 1 1002 1793 218. 232. 232. 232. 4 11304 13.4 1.40e-4
## 8 1 1002 1794 234. 244. 244. 244. 2 2616 10.7 1.10e-4
## 9 1 1003 2394 207. 225. 225. 225. 2 2856 17.6 1.20e-4
## 10 1 1004 2985 201. 218. 218. 218. 2 2856 16.9 1.53e-3
## # … with 18,932 more rows, and 1 more variable: filt <dbl>
Job 2 has more slow loads than job 1:
## # A tibble: 2 x 2
## job n
## <chr> <int>
## 1 1 8756
## 2 2 10186
The slow loads seem to be spread across all ranks.
We can identify which HEPnOS target is associated with each rank.
We can then summarize the event data by target:
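A sketch of the per-target summary. The mapping from rank to target is an assumption here (one plausible scheme with 512 targets is rank modulo 512); the real mapping comes from the HEPnOS configuration.

```r
library(dplyr)

# Assumed rank-to-target mapping, for illustration only.
evts_by_target <- evts %>%
  mutate(target = rank %% 512) %>%
  group_by(job, target) %>%
  summarize(nevts   = n(),
            nranks  = n_distinct(rank),
            minld   = min(load),
            medld   = median(load),
            meanld  = mean(load),
            maxld   = max(load),
            nbytes  = sum(nbytes),
            nslices = sum(nslices))
```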
## # A tibble: 1,024 x 10
## # Groups: job [2]
## job target nevts nranks minld medld meanld maxld nbytes nslices
## <chr> <dbl> <int> <int> <dbl> <dbl> <dbl> <dbl> <int> <int>
## 1 1 0 7734 14 0.00893 0.185 0.330 27.8 55084068 35861
## 2 1 1 7857 14 0.00893 0.177 0.328 27.0 56090376 36538
## 3 1 2 7746 14 0.00893 0.183 0.332 29.3 53726376 34994
## 4 1 3 7771 14 0.00898 0.183 0.333 30.1 53054172 34555
## 5 1 4 7711 14 0.00893 0.181 0.326 31.6 53682060 35072
## 6 1 5 7678 14 0.00908 0.185 0.331 33.4 52684128 34244
## 7 1 6 7791 14 0.00855 0.176 0.330 29.1 54216888 35284
## 8 1 7 7708 14 0.00862 0.188 0.330 26.3 53174220 34580
## 9 1 8 7658 14 0.00871 0.191 0.336 31.6 51630540 33506
## 10 1 9 7531 14 0.00907 0.186 0.332 29.9 51564792 33480
## # … with 1,014 more rows
## job target nevts nranks
## Length:1024 Min. : 0.0 Min. :6961 Min. :13.00
## Class :character 1st Qu.:127.8 1st Qu.:7538 1st Qu.:14.00
## Mode :character Median :255.5 Median :7644 Median :14.00
## Mean :255.5 Mean :7601 Mean :13.89
## 3rd Qu.:383.2 3rd Qu.:7722 3rd Qu.:14.00
## Max. :511.0 Max. :7900 Max. :14.00
## minld medld meanld maxld
## Min. :0.008000 Min. :0.1652 Min. :0.3184 Min. :17.53
## 1st Qu.:0.008720 1st Qu.:0.1867 1st Qu.:0.3337 1st Qu.:28.75
## Median :0.008870 Median :0.2111 Median :0.3389 Median :30.80
## Mean :0.008839 Mean :0.2092 Mean :0.3387 Mean :29.81
## 3rd Qu.:0.008980 3rd Qu.:0.2320 3rd Qu.:0.3436 3rd Qu.:32.28
## Max. :0.009450 Max. :0.2391 Max. :0.3540 Max. :37.15
## nbytes nslices
## Min. :43585320 Min. :28421
## 1st Qu.:51331257 1st Qu.:33354
## Median :52786092 Median :34342
## Mean :52551737 Mean :34175
## 3rd Qu.:54086640 3rd Qu.:35150
## Max. :56900088 Max. :36925
We show, for every target, the median load duration. We use color to indicate the number of megabytes read from that target, and bin the data by number of (thousands of) slices, each bin plotted in its own panel. There is a tendency for the highest target numbers to have a smaller number of megabytes transferred, along with a slightly smaller median load duration.
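A sketch of this plot, reusing the per-target summary from above; the number of slice bins is an assumption:

```r
library(dplyr)
library(ggplot2)

# Median load duration per target, colored by megabytes read, with one
# panel per bin of (thousands of) slices.
evts_by_target %>%
  mutate(mbytes  = nbytes / 1e6,
         kslices = cut_number(nslices / 1000, n = 4)) %>%  # assumed bins
  ggplot(aes(x = target, y = medld, color = mbytes)) +
  geom_point(size = 0.5) +
  facet_wrap(~ kslices)
```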
The distribution of event load durations for a few selected targets does not seem to indicate that different targets behave differently.
The time evolution of the load duration is complex. We plot here the load duration for each event as a function of the (approximate) time at which that load is done. The time used is the pre-create records timestamp (precr), which is taken shortly before the event is read from HEPnOS.
We plot only a few ranks, because the full plot takes a very long time to produce and is too busy to be useful.
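A sketch of such a plot; the particular ranks chosen are arbitrary:

```r
library(dplyr)
library(ggplot2)

# Load duration versus the approximate time of the load (precr), for a
# handful of ranks, with one panel per rank and job.
evts %>%
  filter(rank %in% c(0, 1000, 3000, 5000)) %>%
  ggplot(aes(x = precr, y = load)) +
  geom_point(size = 0.3) +
  facet_grid(rank ~ job)
```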
We can summarize the data for each rank.
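A sketch of the per-rank summary, using the same assumed rank-to-target mapping as above:

```r
library(dplyr)

# Summarize load durations for each rank; each rank reads from a single
# target, so the target is carried along in the grouping.
evts_by_rank <- evts %>%
  mutate(target = rank %% 512) %>%   # assumed mapping, as above
  group_by(job, rank, target) %>%
  summarize(n      = n(),
            minlt  = min(load),
            meanlt = mean(load),
            medlt  = median(load),
            maxlt  = max(load))
```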
## # A tibble: 14,224 x 8
## # Groups: job [2]
## job rank target n minlt meanlt medlt maxlt
## <chr> <int> <dbl> <int> <dbl> <dbl> <dbl> <dbl>
## 1 1 0 0 600 0.00938 0.341 0.204 17.6
## 2 1 1 1 600 0.00907 0.344 0.168 16.0
## 3 1 2 2 600 0.00893 0.337 0.174 17.0
## 4 1 3 3 573 0.00898 0.343 0.140 18.2
## 5 1 4 4 516 0.00983 0.358 0.225 16.7
## 6 1 5 5 600 0.00915 0.335 0.189 17.6
## 7 1 6 6 600 0.00926 0.338 0.195 17.2
## 8 1 7 7 600 0.00947 0.347 0.191 21.3
## 9 1 8 8 506 0.00953 0.374 0.195 20.8
## 10 1 9 9 600 0.00951 0.336 0.204 23.1
## # … with 14,214 more rows
## job rank target n
## Length:14224 Min. : 0 Min. : 0.0 Min. :329.0
## Class :character 1st Qu.:1778 1st Qu.:126.8 1st Qu.:500.0
## Mode :character Median :3556 Median :253.5 Median :562.0
## Mean :3556 Mean :253.9 Mean :547.2
## 3rd Qu.:5333 3rd Qu.:380.2 3rd Qu.:600.0
## Max. :7167 Max. :511.0 Max. :600.0
## minlt meanlt medlt maxlt
## Min. :0.00800 Min. :0.2181 Min. :0.1029 Min. : 3.82
## 1st Qu.:0.00918 1st Qu.:0.3241 1st Qu.:0.1967 1st Qu.:17.09
## Median :0.00963 Median :0.3461 Median :0.2114 Median :17.73
## Mean :0.00973 Mean :0.3388 Mean :0.2080 Mean :18.89
## 3rd Qu.:0.01024 3rd Qu.:0.3606 3rd Qu.:0.2376 3rd Qu.:18.84
## Max. :0.01204 Max. :0.4425 Max. :0.3198 Max. :37.15
The median load duration per rank shows some structure. The target does not seem to be important in this structure.
The mean load duration per rank is more sensitive to extreme outliers, and also shows some structure.
The maximum load duration per rank shows less structure, but there is still one block of ranks very different from the others.