We read all the data from the HDF5 files into a single dataframe. In addition to the columns in the HDF5 files, we also add rank and nranks columns.
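The chunk that assembles this dataframe is not shown here; the sketch below illustrates one way it could be done. It assumes the rhdf5 package, a hypothetical per-file dataset named "timings" with one row per rank, and a hypothetical file naming scheme, so the real layout may differ.

```r
# Sketch only: read each run's HDF5 timing dataset, stack the runs into one
# tibble, and tag every row with its rank and the run's total rank count.
# "data/run-*.h5" and the dataset name "timings" are assumptions.
library(rhdf5)   # Bioconductor package providing h5read()
library(dplyr)

read_one <- function(path, nranks) {
  h5read(path, "timings") %>%            # assumed to read in as a data frame
    as_tibble() %>%
    mutate(rank   = row_number() - 1L,   # one row per rank, 0-based
           nranks = factor(nranks))
}

paths  <- Sys.glob("data/run-*.h5")                      # hypothetical paths
counts <- as.integer(sub(".*run-(\\d+)\\.h5$", "\\1", paths))
timing <- bind_rows(Map(read_one, paths, counts))
```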
## # A tibble: 1,030 x 21
## birth beforeread afterfirstbc createconfig aftereadosc afterbc
## <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
## 1 3.07e-4 0.00420 0.0857 0.0862 1.11 1.11
## 2 1.38e-3 0.00246 0.0867 0.0876 0.0876 1.11
## 3 1.30e-3 0.00246 0.0867 0.0876 0.0876 1.11
## 4 5.05e-4 0.00144 0.0867 0.0875 0.0875 1.11
## 5 1.12e-3 0.00217 0.0866 0.0876 0.0876 1.11
## 6 6.82e-4 0.00150 0.0866 0.0873 0.0873 1.11
## 7 3.00e-5 0.000819 0.0870 0.0876 0.0876 1.11
## 8 7.22e-4 0.00155 0.0870 0.0877 0.0877 1.11
## 9 1.28e-3 0.00234 0.0865 0.0872 0.0872 1.11
## 10 4.42e-4 0.00133 0.0865 0.0870 0.0870 1.11
## # … with 1,020 more rows, and 15 more variables: afterreadandbcbg <dbl>,
## # aftermatrixinv <dbl>, aftergridcreation <dbl>, aftersiggencreation <dbl>,
## # afterhdf5dscxreation <dbl>, afterdiycompose <dbl>,
## # afterloadbalancecalc <dbl>, beforefc <dbl>, afterfc <dbl>, end <dbl>,
## # work <dbl>, rank <int>, nr <int>, nranks <fct>, fc <dbl>
We have hypothesized that running 69 ranks on a 68-core machine will cause 2 ranks to perform poorly, because those two ranks must share a core. Likewise, we expect running 70 ranks to give 4 poorly performing ranks, since two cores each host two ranks. Here are the data:
This plot shows how well each rank is performing. We note that when more than one rank runs on a core, the velocity of those ranks is almost (but not quite) halved.
The low-velocity ranks stand out clearly: ranks running two-to-a-core are much slower than those running alone on a core, and ranks running four-to-a-core are slower still.
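Returning to the head count hypothesized above: with one rank pinned per core, each rank beyond the 68th forces one core to host two ranks, so two ranks are slowed per extra rank. A minimal sketch of that arithmetic (not part of the original analysis):

```r
# Expected number of slowed ranks when oversubscribing a 68-core node,
# assuming ranks fill cores one-per-core before doubling up.
cores  <- 68
nranks <- c(68, 69, 70)
slowed <- pmax(0, 2 * (nranks - cores))
data.frame(nranks, slowed)
##   nranks slowed
## 1     68      0
## 2     69      2
## 3     70      4
```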