1 Purpose

This document contains analysis of the Phase 1 behavior of the parallel CUHRE algorithm. This analysis is for the DES integrand “SigmaMiscent”, which has 7 variables of integration.

2 Read data

Note: reading the RDS file is relatively slow. use the function rds_to_feather to convert the RDS file into a (larger, but faster to read) Feather file for faster processing. From the R console, run:

library(ldrd2020050) # to make the rds_to_feather function available
rds_to_feather("sigMiscent_1e-3_Phase_1_regions.rds", "sigMiscent_1e-3_Phase_1_regions.feather")

Note: We can not use caching of the read of the raw dataframe because it contains too many rows.

d <- read_feather_or_rds("sigMiscent_1e-3_Phase_1_regions") %>%
  augment_raw_dataframe()

3 Summarize by iteration

The summary dataframe has one row per iteration. The columns whose names start with act contain the summary of the regions that are active in the iteration; this means that these regions will be split in the next iteration. The columns whose names start with fin contain the summary of the regions that are finished in the iteration; these regions will not be split in the next iteration. The columns whose names start with it contain the sum of the act and fin columns. The columns whose names start with tot contain the total of all the finished regions up to and including the current iteration, plus the active regions for this iteration. These columns represent the best answer from the algorithm, up to the given iteration.

by_iter <- make_iteration_dataframe(d)
by_iter

3.1 Evolution of estimate and its error

For the first few iterations, we see the estimate fluctuate widely and the error estimate is very large:

ggplot(by_iter, aes(iteration, tot.est,  ymin=tot.est-tot.err, ymax=tot.est+tot.err)) +
  geom_errorbar(alpha=0.5, size=0.5, width=0.2) +
  geom_point() +
  labs(x="iteration", y = "Integral estimate")  +
  scale_y_continuous(labels = scales::comma)

After iteration 6, the error estimate deceases dramatically.

filter(by_iter, iteration>6) %>%
  ggplot( aes(iteration, tot.est,  ymin=tot.est-tot.err, ymax=tot.est+tot.err)) +
  geom_errorbar(alpha=0.5, size=0.5, width=0.2) +
  geom_point()  +
  labs(x="iteration", y = "Integral estimate") +
  scale_y_continuous(labels = scales::comma)

After about 20 iterations, we can see that the estimate is clearly positive. The error is still steadily decreasing.

filter(by_iter, iteration>20) %>%
  ggplot( aes(iteration, tot.est,  ymin=tot.est-tot.err, ymax=tot.est+tot.err)) +
  geom_errorbar(alpha=0.5, size=0.5, width=0.2) +
  geom_point() +
  labs(x="iteration", y = "Integral estimate") +
  scale_y_continuous(labels = scales::comma)

The final few steps show that we are apparently converging on a single value, but the error estimate has almost stopped decreasing:

filter(by_iter, iteration>30) %>%
  ggplot( aes(iteration, tot.est,  ymin=tot.est-tot.err, ymax=tot.est+tot.err)) +
  geom_errorbar(alpha=0.5, size=0.5, width=0.2) +
  geom_point() +
  labs(x="iteration", y = "Integral estimate") +
  scale_y_continuous(labels = scales::comma)

3.2 Evolution of the error estimate

At the final iteration, the algorithm has converged to the 0.001 fractional error tolerance specified:

by_iter %>%
  select(iteration, starts_with("tot")) %>%
  mutate(frac.err = tot.err/abs(tot.est),
         converged = (tot.err/tot.est) < 0.001)

After about 30 iterations, the rate at which the algorithm is reducing the error estimate clearly decreases.

ggplot(by_iter, aes(iteration, abs(tot.err/tot.est))) +
  geom_point() +
  labs(x = "iteration", y = "fractional error estimate") +
  scale_y_log10(labels = scales::comma)

3.3 Evolution of number of regions in each iteration

vols <-
  by_iter %>%
  select(iteration, act.nreg, fin.nreg, tot.nreg) %>%
  rename(active = act.nreg, finished = fin.nreg, total = tot.nreg) %>% 
  mutate(in.memory = active + finished) %>% 
  pivot_longer(cols=!iteration) %>%
  mutate(name = as.factor(name))

max_in_memory <- filter(vols, name=="in.memory") %>% pull(value) %>% max()
total_regions_calculated <- by_iter$tot.nreg %>% max()

The evolution of the number of regions in each iteration shows several phases. While the total number of regions every used only increases, the maximum number of regions currently in memory peaks at \(4.041916\times 10^{6}\). The total number of regions evaluated is \(2.0379538\times 10^{7}\)

ggplot(vols, aes(iteration, value, color=name)) +
  geom_point() +
  geom_line(alpha=0.3) +
  scale_y_log10(labels = scales::comma) +
  labs(x = "iteration", y = "Number of regions")

Phase 1 Regions

Marc Paterno

2020-11-25