1 The data

library(dplyr)
library(ggplot2)

raw_112_64 <- read_raw_dataframes("campaign_4/112_64/timing*.dat")
raw_008_64 <- read_raw_dataframes("campaign_4/008_64/timing*.dat")
events_112_64 <- make_events_df(raw_112_64)
events_008_64 <- make_events_df(raw_008_64)

We are looking at runs that use a HEPnOS daemon running on 16 nodes, with 512 targets.

We use two runs of the eventselection program:

1. 112 nodes, each running 64 ranks (7168 ranks in total).
2. 8 nodes, each running 64 ranks (512 ranks in total).

The dataset used is the 1929 subrun sample from the NOvA near detector (ND).

There are 3979542 events in the data sample.
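A quick sanity check, assuming make_events_df yields one row per event: the two events data frames should contain the same number of rows.

# Both runs process the same data sample, so the event counts should agree.
stopifnot(nrow(events_112_64) == nrow(events_008_64))
nrow(events_112_64)  # expect 3979542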

2 Total job run time

According to the batch job logs, the total job run time was ~639 seconds for the 112 node run and ~614 seconds for the 8 node run.
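Since the wall-clock times are nearly equal, the difference in cost between the runs is driven almost entirely by node count. A rough comparison, counting only the eventselection nodes (not the 16 daemon nodes):

# Node-seconds consumed by each run, from the batch-log times above.
c(run_112 = 112 * 639,  # 71568 node-seconds
  run_008 = 8 * 614)    # 4912 node-seconds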

Looking only at the timing while the MPI program is running, we see the distribution of total run times by rank:
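A sketch of one way to draw this distribution from the per-event timings, assuming the load, rec, and filt columns together account for each rank's running time:

# Approximate per-rank total run time as the sum of the per-event
# load, rec, and filt timings (assumes other costs are negligible).
totals <- bind_rows("8" = events_008_64, "112" = events_112_64, .id = "run") %>%
  group_by(run, rank) %>%
  summarize(total = sum(load + rec + filt), .groups = "drop")
ggplot(totals, aes(total)) +
  geom_histogram(bins = 50) +
  facet_wrap(vars(run), scales = "free") +
  labs(x = "total run time for the rank (s)",
       title = "Distribution of total run times by rank")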

3 Breakdown of running time

# Roll per-event timings up into per-rank totals; the same summary is
# needed for both runs, so define it once.
summarize_by_rank <- function(events) {
  events %>%
    group_by(rank) %>%
    summarize(nevents = n(),
              load = sum(load),
              rec = sum(rec),
              filt = sum(filt),
              nslices = sum(nslices),
              nbytes = sum(nbytes),
              .groups = "drop")
}
ebr_112_64 <- summarize_by_rank(events_112_64)
ebr_008_64 <- summarize_by_rank(events_008_64)
ebr <- bind_rows("8" = ebr_008_64, "112" = ebr_112_64, .id = "run") %>%
  mutate(run = as.integer(run))
summary(ebr_112_64)
##       rank         nevents           load            rec         
##  Min.   :   0   Min.   :334.0   Min.   :100.5   Min.   :0.03735  
##  1st Qu.:1792   1st Qu.:500.0   1st Qu.:145.6   1st Qu.:0.06932  
##  Median :3584   Median :599.0   Median :172.7   Median :0.08003  
##  Mean   :3584   Mean   :555.2   Mean   :164.5   Mean   :0.07603  
##  3rd Qu.:5375   3rd Qu.:600.0   3rd Qu.:183.7   3rd Qu.:0.08279  
##  Max.   :7167   Max.   :600.0   Max.   :198.1   Max.   :0.09279  
##       filt           nslices         nbytes       
##  Min.   :0.8785   Min.   : 729   Min.   :1104288  
##  1st Qu.:4.1038   1st Qu.:2301   1st Qu.:3537753  
##  Median :4.6966   Median :2591   Median :3980346  
##  Mean   :4.4621   Mean   :2494   Mean   :3835275  
##  3rd Qu.:4.9446   3rd Qu.:2759   3rd Qu.:4245807  
##  Max.   :6.0373   Max.   :3194   Max.   :5154600
summary(ebr_008_64)
##       rank          nevents          load            rec        
##  Min.   :  0.0   Min.   :7182   Min.   :103.7   Min.   :0.9577  
##  1st Qu.:127.8   1st Qu.:7717   1st Qu.:107.6   1st Qu.:1.0537  
##  Median :255.5   Median :7784   Median :109.7   Median :1.0683  
##  Mean   :255.5   Mean   :7773   Mean   :110.9   Mean   :1.0678  
##  3rd Qu.:383.2   3rd Qu.:7832   3rd Qu.:113.8   3rd Qu.:1.0824  
##  Max.   :511.0   Max.   :7972   Max.   :131.6   Max.   :1.2927  
##       filt          nslices          nbytes        
##  Min.   :44.00   Min.   :22521   Min.   :34735164  
##  1st Qu.:65.13   1st Qu.:34510   1st Qu.:53028810  
##  Median :67.04   Median :35562   Median :54653988  
##  Mean   :66.49   Mean   :34919   Mean   :53693848  
##  3rd Qu.:68.73   3rd Qu.:36214   3rd Qu.:55694382  
##  Max.   :96.10   Max.   :38475   Max.   :59072916

3.1 Plots

The filtering step scales nearly perfectly. We can see this by looking at the number of events per second processed by each rank, plotted as a function of the number of events that rank processed, to see whether throughput varies with workload.

ggplot(ebr, aes(nevents, nevents/filt)) +
  geom_smooth(method = "lm", formula = y ~ x) +
  geom_point(alpha = 0.3) +
  facet_wrap(vars(run), scales = "free_x") +
  labs(x = "Number of events processed by the rank",
       y = "events/sec for each rank",
       title = "Per-rank throughput of filtering step for 8 and 112 node runs.")

Data loading does not scale well. Recall that in both cases we are using a HEPnOS daemon with 512 targets. This means the 8 node run has only 1 client rank per target, while the 112 node run has 14 client ranks per target.
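The per-target client counts follow directly from the rank counts:

# Client ranks per HEPnOS target: eventselection ranks / 512 targets.
c(run_112 = (112 * 64) / 512,  # 14 client ranks per target
  run_008 = (8 * 64) / 512)    # 1 client rank per target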

ggplot(ebr, aes(nevents, nevents/load)) +
  geom_smooth(method = "lm", formula = y ~ x) +
  geom_point(alpha = 0.3) +
  facet_wrap(vars(run), scales = "free_x") +
  labs(x = "Number of events loaded by the rank",
       y = "events/sec for each rank",
       title = "Per-rank throughput of data loading step for 8 and 112 node runs.")

We can also look at the aggregate loading rate for the whole program. It is not clear what the best way to define this is; what I have done here is to sum the rates for all the ranks, giving a throughput in events/second. (Summing per-rank rates implicitly assumes that all ranks are loading concurrently.)

ebr %>% group_by(run) %>%
  summarize(throughput=sum(nevents/load), .groups="drop") %>%
  knitr::kable()
| run | throughput |
|----:|-----------:|
|   8 |   35928.85 |
| 112 |   24483.10 |
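An alternative, for comparison, would be to treat loading as fully concurrent across ranks and divide the total number of events by the slowest rank's loading time; a sketch:

# Alternative aggregate rate: total events / slowest per-rank load time.
ebr %>%
  group_by(run) %>%
  summarize(throughput = sum(nevents) / max(load), .groups = "drop") %>%
  knitr::kable()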