1 Introduction

This document presents an analysis of the reading performance of PandAna. PandAna uses the Python package h5py (http://h5py.org) to read tabular data stored in HDF5 (https://portal.hdfgroup.org/display/HDF5/HDF5).
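
PandAna's I/O path is Python, but the same HDF5 files can be spot-checked from R. The sketch below is only an orientation aid: it assumes the Bioconductor package rhdf5 is available, and the file path and dataset name are placeholders, not the actual NOvA file layout.

# Spot-check an HDF5 file from R (sketch; the path and dataset name
# are placeholders, not the real h5caf layout).
library(rhdf5)
h5ls("example.h5")                           # list the groups and datasets
x <- h5read("example.h5", "/group/dataset")  # read one dataset into R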

We have run a sample PandAna application on Haswell nodes of Cori at NERSC, using the same code and a varying number of MPI ranks.

The data being read consist of 1951 subruns of NOvA ND data in h5caf format. We have looked at both compressed and uncompressed files and, for the compressed files, at both striped and unstriped variants. The striped files each had a stripe size of 1 MB and a stripe count of 128.

The analysis code can be found in the PandAna repository (https://bitbucket.org/mpaterno/pandana), in Demos/candidate_selection.py.

2 Experimental setup

# Packages used throughout this analysis.
library(dplyr)
library(ggplot2)

# Read in the data; read_pandana_perf_experiment() is a local helper
# that loads the per-rank timing data for one experiment directory.
run_names <- dir(path = "local-analysis/striping-study",
                 pattern = "cori_[[:digit:]]+_1951")
dirs <- paste("local-analysis/striping-study",
              run_names,
              sep = "/")
frames <- lapply(dirs, read_pandana_perf_experiment)
names(frames) <- run_names

We have 7 experiments, run with varying numbers of ranks. For each rank in each experiment, we record a timestamp at various “events” in the program execution.
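
For orientation, we can peek at the structure of one experiment's frame. The sketch below assumes each frame holds one row per MPI rank, with the cumulative time at each instrumented event stored in columns (go for the read phase, fillSpectra for the in-memory processing, total for the whole run); these column names are the ones used by the code that follows.

# Inspect the assumed per-rank layout: one row per MPI rank, with the
# elapsed seconds at each instrumented event (go, fillSpectra, total).
dplyr::glimpse(frames[[1]])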

The experiments analyzed here are:

names(frames) %>% knitr::kable()

cori_0128_1951_uncomp_striped_01
cori_0256_1951_uncomp_striped_01
cori_0512_1951_striped_01
cori_0512_1951_striped_02
cori_0512_1951_uncomp_striped_01
cori_0512_1951_unstriped_01
cori_0512_1951_unstriped_02

# Combine all experiments into one data frame; make_main_df() is a
# local helper, and .id tags each row with its run name.
main <- bind_rows(lapply(frames, make_main_df), .id = "run")

3 Timing summary data

First we look at the timing summary data for each run, and at how the times vary with the number of ranks used. For each phase we take the maximum time over all ranks, since a phase is not complete until its slowest rank has finished.

main_summary <-
  main %>%
  group_by(nranks, run) %>%
  summarize(read = max(go),               # slowest rank gates the read
            compute = max(fillSpectra),   # and the in-memory processing
            total = max(total),
            .groups = "drop")
main_summary

4 Effect of striping and compression

# Restrict to the 512-rank runs and label each row by file type.
# The patterns are chosen so that "1951_striped" does not match the
# "1951_uncomp_striped" runs.
uncomp <- filter(main, nranks == 512, grepl("1951_uncomp_striped", run))
striped <- filter(main, nranks == 512, grepl("1951_striped", run))
unstriped <- filter(main, nranks == 512, grepl("1951_unstriped", run))
d512 <- bind_rows(uncomp = uncomp,
                  striped = striped,
                  unstriped = unstriped,
                  .id = "filetype")

We have several runs with 512 ranks that we can use to evaluate the effects of compression and of striping the file. Note that we have two runs each on the striped compressed file and on the unstriped compressed file.

ggplot(d512, aes(total, filetype)) +
  geom_boxplot() +
  labs(x = "Total run time (s)", y = "type of file")

Total run time for each rank by file type.

ggplot(d512, aes(go, filetype)) +
  geom_boxplot() +
  labs(x = "Read time (s)", y = "type of file")

Read time for each rank by file type.

Reading the uncompressed file is significantly slower than reading the compressed files. Striping seems to make no difference, except perhaps in yielding more uniform performance; since we have only two runs on each of the compressed files, we do not have enough data to be certain.
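
To put rough numbers on the comparison, we can summarize the per-rank read times by file type; a minimal sketch using dplyr, with the column names from d512 above:

# Median and spread of per-rank read times by file type (sketch).
d512 %>%
  group_by(filetype) %>%
  summarize(median_read = median(go),
            iqr_read = IQR(go),
            .groups = "drop")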

ggplot(d512, aes(fillSpectra, filetype)) +
  geom_boxplot() +
  labs(x = "Total processing time (s)", y = "type of file")

Total in-memory processing time for each rank by file type.

As expected, the in-memory processing times are essentially identical for the different runs.

5 Scaling performance of reads on uncompressed file

Although the reading speed of the uncompressed file seems poor, we want to see whether the scaling is good.

uncomp_summary <- main_summary %>% filter(grepl("uncomp", run))
log2breaks <- c(128, 256, 512)
ggplot(uncomp_summary, aes(nranks, read)) +
  geom_point() +
  scale_x_log10(breaks = log2breaks) +
  scale_y_log10() +
  labs(x = "Total ranks", y = "Read time (s)")

Read time as a function of the number of ranks used.

The time it takes to read the data is essentially constant: it does not improve as more ranks share the reading.
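
One way to quantify this is to fit a log-log regression of read time against rank count: ideal scaling would give a slope of -1, while a slope near zero means adding readers does not help. A minimal sketch:

# Slope of log2(read time) vs. log2(ranks); ~0 means no read speedup.
read_fit <- lm(log2(read) ~ log2(nranks), data = uncomp_summary)
coef(read_fit)["log2(nranks)"]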

ggplot(uncomp_summary, aes(nranks, compute)) +
  geom_smooth(method = "lm", formula = y ~ x) +
  geom_point() +
  scale_x_log10(breaks = log2breaks) +
  scale_y_log10() +
  labs(x = "Total ranks", y = "Total computing time (s)")

In-memory processing time as a function of the number of ranks used.

As expected, the scaling of the in-memory processing is excellent.
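
The same slope check can be applied to the compute times; a slope close to -1 would confirm the near-ideal scaling seen in the plot. A minimal sketch:

# Slope of log2(compute time) vs. log2(ranks); ~-1 means ideal scaling.
compute_fit <- lm(log2(compute) ~ log2(nranks), data = uncomp_summary)
coef(compute_fit)["log2(nranks)"]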