Data

We read all the data from the HDF5 files into a single data frame. In addition to the columns in the HDF5 files, we also add rank and nranks columns.
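A minimal sketch of this step, using rhdf5 and dplyr. The file names, the `/timings` dataset path, and the assumption that rows are ordered by MPI rank are all illustrative, not taken from the source:

```r
library(rhdf5)
library(dplyr)
library(purrr)
library(tibble)

# Hypothetical layout: one HDF5 file per run (file names and the
# "/timings" dataset path are assumptions, not from the source).
files <- Sys.glob("timing_*.h5")

read_one <- function(f) {
  h5read(f, "/timings") %>%
    as_tibble() %>%
    mutate(rank   = row_number() - 1L,  # assume rows are ordered by MPI rank
           nr     = n(),                # number of ranks in this run
           nranks = factor(nr))         # factor copy, convenient for plotting
}

df <- map_dfr(files, read_one)
```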

## # A tibble: 1,030 x 21
##      birth beforeread afterfirstbc createconfig aftereadosc afterbc
##      <dbl>      <dbl>        <dbl>        <dbl>       <dbl>   <dbl>
##  1 3.07e-4   0.00420        0.0857       0.0862      1.11      1.11
##  2 1.38e-3   0.00246        0.0867       0.0876      0.0876    1.11
##  3 1.30e-3   0.00246        0.0867       0.0876      0.0876    1.11
##  4 5.05e-4   0.00144        0.0867       0.0875      0.0875    1.11
##  5 1.12e-3   0.00217        0.0866       0.0876      0.0876    1.11
##  6 6.82e-4   0.00150        0.0866       0.0873      0.0873    1.11
##  7 3.00e-5   0.000819       0.0870       0.0876      0.0876    1.11
##  8 7.22e-4   0.00155        0.0870       0.0877      0.0877    1.11
##  9 1.28e-3   0.00234        0.0865       0.0872      0.0872    1.11
## 10 4.42e-4   0.00133        0.0865       0.0870      0.0870    1.11
## # … with 1,020 more rows, and 15 more variables: afterreadandbcbg <dbl>,
## #   aftermatrixinv <dbl>, aftergridcreation <dbl>, aftersiggencreation <dbl>,
## #   afterhdf5dscxreation <dbl>, afterdiycompose <dbl>,
## #   afterloadbalancecalc <dbl>, beforefc <dbl>, afterfc <dbl>, end <dbl>,
## #   work <dbl>, rank <int>, nr <int>, nranks <fct>, fc <dbl>

Plots

Time to do FC calculation

We have hypothesized that running 69 ranks on a 68-core machine will cause 2 ranks to perform poorly, because two ranks must share a core. Likewise, we expect running 70 ranks to give 4 poorly performing ranks: two cores each host two ranks. Here are the data:
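A sketch of how such a per-rank plot could be produced from the data frame above. Deriving FC time as `afterfc - beforefc` is an assumption; the tibble's `fc` column may already hold this difference, and the units are assumed to be seconds:

```r
library(ggplot2)

# Per-rank FC time; the fc column may already hold this difference,
# in which case the mutate() is redundant.
df %>%
  mutate(fc_time = afterfc - beforefc) %>%
  ggplot(aes(x = rank, y = fc_time, colour = nranks)) +
  geom_point() +
  labs(x = "rank", y = "FC time")
```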

Velocity for FC calculation

This plot shows how well each rank performs. We note that when more than one rank runs on a core, the velocity of those ranks is almost (but not quite) halved.

The low-velocity ranks stand out clearly: ranks running two-to-a-core are much slower than those running alone on a core, and ranks running four-to-a-core are slower still.
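One way the low-velocity ranks might be flagged programmatically. Both the velocity definition (`work / fc`) and the 60%-of-median cutoff are assumptions made for illustration:

```r
# Velocity per rank; velocity = work / fc and the 0.6 * median cutoff
# are assumptions, not definitions from the source.
df %>%
  mutate(velocity = work / fc) %>%
  group_by(nranks) %>%
  mutate(slow = velocity < 0.6 * median(velocity)) %>%
  ungroup() %>%
  filter(slow) %>%
  select(nranks, rank, velocity)
```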