We are looking at a run that uses 16 nodes running the HEPnOS daemon (with 512 targets) and 112 nodes running eventselection, with 64 ranks on each eventselection node (7168 ranks in total). The dataset is the 1691-subrun sample from the NOvA Near Detector (ND).
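```r
# Load the saved timing data and build the per-rank (global) and
# per-event (events) data frames.
raw <- readRDS("theta_es_7168_2020-06-30_01.rds")
global <- make_global_df(raw)
events <- make_events_df(raw)
```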
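As a quick consistency check of the configuration described above, the arithmetic can be written out in R: 112 eventselection nodes at 64 ranks each give 7168 ranks, which matches the `7168` in the file name and, with ranks numbered from 0, the largest rank id (7167) in the per-rank summary. The variable names are illustrative only, and the per-node target count assumes the 512 targets are spread evenly over the 16 HEPnOS nodes.

```r
# Illustrative sanity check of the run configuration (names are hypothetical).
es_nodes       <- 112   # nodes running eventselection
ranks_per_node <- 64    # MPI ranks per eventselection node
server_nodes   <- 16    # nodes running the HEPnOS daemon
targets        <- 512   # HEPnOS targets

total_ranks <- es_nodes * ranks_per_node   # 7168, as in the file name
max_rank    <- total_ranks - 1             # ranks are numbered from 0

# Assumes the targets are distributed evenly over the HEPnOS nodes.
targets_per_node <- targets / server_nodes # 32
```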
According to the batch job log, the total job run time is about 574 seconds, of which roughly 100 seconds appears to be consumed by MPI startup and shutdown.
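The time budget implied by these two figures is simple arithmetic; since both inputs are approximate, the results are only rough estimates.

```r
# Rough time budget from the approximate figures quoted above.
job_wall_time <- 574   # seconds, total job time from the batch job log
mpi_overhead  <- 100   # seconds, estimated MPI startup + shutdown

# Time left for the MPI program itself; per-rank running times measured
# inside the program would be expected to be at most roughly this large.
program_time      <- job_wall_time - mpi_overhead   # about 474 s
overhead_fraction <- mpi_overhead / job_wall_time   # about 0.17
```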
Looking only at the time spent while the MPI program is running, the distribution of total run times by rank is:
```r
library(ggplot2)

# Histogram of the total running time of each rank.
ggplot(global, aes(total)) +
  geom_histogram(bins = 50) +
  labs(x = "total running time", y = "number of ranks")
```
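To put numbers on the spread shown in the histogram, one option is to summarize the per-rank totals and report the ratio of the slowest rank to the mean, a common load-imbalance measure (1.0 means perfectly balanced ranks). The helper `summarize_rank_times()` is hypothetical and the example input is made up; with the real data it would be applied to `global$total`.

```r
# Hypothetical helper: summary statistics for a vector of per-rank run times,
# plus max / mean as a simple load-imbalance measure.
summarize_rank_times <- function(total) {
  c(min       = min(total),
    median    = median(total),
    mean      = mean(total),
    max       = max(total),
    imbalance = max(total) / mean(total))
}

# Made-up per-rank times; with the real data: summarize_rank_times(global$total)
toy_times <- c(400, 420, 430, 450, 500)
s <- summarize_rank_times(toy_times)
```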
We can get a more detailed breakdown of the running time per rank from the event-level data by summing, for each rank, the time spent in each step of data processing over all the events that rank handled.
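```r
library(dplyr)

# Per-rank totals: sum each event-level quantity over the events a rank
# handled, and count those events.
ebr <-
  events %>%
  group_by(rank) %>%
  summarize(nevents = n(),
            load    = sum(load),
            rec     = sum(rec),
            filt    = sum(filt),
            nslices = sum(nslices),
            nbytes  = sum(nbytes),
            .groups = "drop")
ebr
summary(ebr)
```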
```
##       rank         nevents           load            rec
##  Min.   :   0   Min.   :244.0   Min.   :109.2   Min.   :0.02844
##  1st Qu.:1792   1st Qu.:500.0   1st Qu.:167.3   1st Qu.:0.06572
##  Median :3584   Median :600.0   Median :185.0   Median :0.07577
##  Mean   :3584   Mean   :555.2   Mean   :181.3   Mean   :0.07190
##  3rd Qu.:5375   3rd Qu.:600.0   3rd Qu.:197.9   3rd Qu.:0.07815
##  Max.   :7167   Max.   :600.0   Max.   :216.9   Max.   :0.08630
##       filt          nslices         nbytes
##  Min.   :1.248   Min.   : 765   Min.   :1169016
##  1st Qu.:4.481   1st Qu.:2294   1st Qu.:3529803
##  Median :5.125   Median :2608   Median :4006770
##  Mean   :4.871   Mean   :2494   Mean   :3835275
##  3rd Qu.:5.409   3rd Qu.:2766   3rd Qu.:4251654
##  Max.   :6.076   Max.   :3131   Max.   :4946472
```
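To make the per-rank summation concrete, the same aggregation can be written without dplyr using base R's `rowsum()`, which sums the rows of a matrix within each group. This sketch runs on a small made-up `events_toy` data frame containing only the columns used in the pipeline; it illustrates what the summation does and is not meant to replace the pipeline.

```r
# Hypothetical toy event-level data: two ranks, five events.
events_toy <- data.frame(
  rank    = c(0, 0, 1, 1, 1),
  load    = c(0.30, 0.32, 0.28, 0.31, 0.29),
  rec     = c(1e-4, 1e-4, 2e-4, 1e-4, 1e-4),
  filt    = c(0.010, 0.008, 0.009, 0.011, 0.007),
  nslices = c(4, 5, 3, 6, 4),
  nbytes  = c(6000, 7000, 5500, 8000, 6500)
)

cols <- c("load", "rec", "filt", "nslices", "nbytes")
# rowsum() returns one row per rank (sorted), holding the column sums.
sums <- rowsum(as.matrix(events_toy[cols]), group = events_toy$rank)

# table() counts events per rank, in the same sorted rank order.
ebr_toy <- data.frame(
  rank    = as.numeric(rownames(sums)),
  nevents = as.vector(table(events_toy$rank)),
  sums,
  row.names = NULL
)
```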
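Loading time dominates the processing: the median per-rank load time is 185.0, compared with 5.125 for filtering (filt). The time (rec) it takes to transform the HEPnOS-related format to the Standard Record format is negligible, never exceeding 0.09 for any rank.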
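To quantify how strongly loading dominates, a hypothetical helper `step_shares()` can express each step as a fraction of the time spent in the three timed steps. Plugging in the medians from the summary gives a load share of about 97%. This is a ratio of medians, used only as a rough illustration, and any time spent outside `load`, `rec`, and `filt` is not counted.

```r
# Hypothetical helper: fraction of each rank's timed work spent in each step.
# Only load, rec, and filt are included; other time is ignored.
step_shares <- function(df) {
  step_total <- df$load + df$rec + df$filt
  data.frame(rank       = df$rank,
             load_share = df$load / step_total,
             rec_share  = df$rec  / step_total,
             filt_share = df$filt / step_total)
}

# A single "typical" rank built from the medians in the summary above.
typical <- data.frame(rank = NA, load = 185.0, rec = 0.07577, filt = 5.125)
shares <- step_shares(typical)
# With the real data: summary(step_shares(ebr))
```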
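```r
# Per-rank step times against the number of events each rank processed,
# each with a least-squares straight-line fit.
ggplot(ebr, aes(nevents, filt)) +
  geom_smooth(method = "lm", formula = y ~ x) +
  geom_point(alpha = 0.3)

ggplot(ebr, aes(nevents, rec)) +
  geom_smooth(method = "lm", formula = y ~ x) +
  geom_point(alpha = 0.3)

ggplot(ebr, aes(nevents, load)) +
  geom_smooth(method = "lm", formula = y ~ x) +
  geom_point(alpha = 0.3)
```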
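The slope of each fitted line is the marginal time per additional event handled by a rank. A minimal sketch of extracting it, assuming the fit is an ordinary least-squares `lm()` of the step time on `nevents` (the model `geom_smooth(method = "lm")` draws). `per_event_cost()` is a hypothetical helper, and the example data are made up and exactly linear so the recovered slope is known in advance.

```r
# Hypothetical helper: slope of an OLS fit of `step` (a column name) on nevents.
per_event_cost <- function(df, step) {
  fit <- lm(reformulate("nevents", response = step), data = df)
  coef(fit)[["nevents"]]
}

# Made-up, exactly linear data: 0.3 per event plus a fixed cost of 2.
toy_ranks <- data.frame(nevents = c(250, 400, 500, 600))
toy_ranks$load <- 2 + 0.3 * toy_ranks$nevents
slope <- per_event_cost(toy_ranks, "load")

# With the real data:
# sapply(c("load", "rec", "filt"), per_event_cost, df = ebr)
```

Comparing the three slopes gives a single number per step for the per-event cost, which complements the visual comparison in the plots.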