Purpose of this document

This document compares the performance of the serial and parallel implementations of the CUHRE algorithm for a non-negative integrand.

The integrand chosen is: \[ \left| \cos(4v + 5w + 6x + 7y + 8z) / k \right| \] with \(k = 0.6371054\). For this integrand, the normalization is approximate (meaning that the true value of the integral is close to, but not exactly, 1.0).
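As a rough cross-check of the normalization, the integrand can be written down and sampled directly. The sketch below is a plain Monte Carlo estimate, not the CUHRE algorithm. It assumes the integration domain is the unit hypercube \([0, 1]^5\), which this document does not state, and the names `integrand` and `mc_estimate` are illustrative only.

```python
import math
import random

K = 0.6371054  # normalization constant k from the text


def integrand(v, w, x, y, z):
    """Return |cos(4v + 5w + 6x + 7y + 8z) / k|."""
    return abs(math.cos(4 * v + 5 * w + 6 * x + 7 * y + 8 * z) / K)


def mc_estimate(n_samples, seed=0):
    """Plain Monte Carlo mean and standard error over [0, 1]^5 (assumed domain)."""
    rng = random.Random(seed)
    total = 0.0
    total_sq = 0.0
    for _ in range(n_samples):
        f = integrand(*(rng.random() for _ in range(5)))
        total += f
        total_sq += f * f
    mean = total / n_samples
    variance = max(total_sq / n_samples - mean * mean, 0.0)
    return mean, math.sqrt(variance / n_samples)


if __name__ == "__main__":
    estimate, stderr = mc_estimate(100_000)
    print(f"estimate = {estimate:.4f} +/- {stderr:.4f}")
```

The standard error of such an estimate shrinks only as the inverse square root of the sample count, so it is useful as a sanity check on the normalization rather than as a competitor to the adaptive algorithms compared here.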

Testing environment

These tests were run on ibmpower9.fnal.gov.

The Power9 machine used in these tests is a PowerNV 8335-GTG (AC 922) system. It has two processors, each with 8 cores. Each core has 4 slices.

Description of the dataframe

  1. alg: the name of the algorithm
  2. epsrel: the fractional error target
  3. value: the estimated value of the integral
  4. errorest: the estimated error for the result
  5. error: the absolute difference between the estimated value and the true value
  6. neval: the number of function evaluations used
  7. nregions: the number of regions used
  8. time: the time in milliseconds for the calculation
  9. r: the ratio errorest/(epsrel*value); this should be less than 1 if the algorithm has converged
  10. converged: boolean showing whether r < 1

A value of NA indicates that the algorithm did not converge, but rather stopped because the maximum number of function evaluations had been reached.
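The derived columns r and converged follow directly from the definitions in the list. A minimal sketch, using three rows copied from the table in this document; the helper names `convergence_ratio` and `has_converged` are hypothetical, and `None` stands in for NA.

```python
def convergence_ratio(errorest, epsrel, value):
    """Return r = errorest / (epsrel * value), or None if any input is NA (None)."""
    if errorest is None or epsrel is None or value is None:
        return None
    return errorest / (epsrel * value)


def has_converged(r):
    """Return True when r < 1, and None (NA) when r is unavailable."""
    return None if r is None else r < 1.0


# (alg, epsrel, value, errorest) copied from the table in this document.
rows = [
    ("cuhre", 1.000e-03, 1.0048540, 1.004804e-03),
    ("cuhre", 8.000e-06, None, None),  # stopped at the evaluation limit
    ("gpucuhre", 1.600e-06, 0.9993013, 5.847191e-06),
]

for alg, epsrel, value, errorest in rows:
    r = convergence_ratio(errorest, epsrel, value)
    print(alg, epsrel, r, has_converged(r))
```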

alg epsrel value errorest error neval nregions time r converged
cuhre 1.000e-03 1.0048540 1.004804e-03 4.854329e-03 8118201 14869 3.326333e+03 9.999502e-01 TRUE
cuhre 2.000e-04 1.0025390 2.005053e-04 2.539198e-03 42580629 77987 2.125008e+04 9.999875e-01 TRUE
cuhre 4.000e-05 0.9999999 3.999993e-05 5.463197e-08 207258051 379594 4.221983e+05 9.999983e-01 TRUE
cuhre 8.000e-06 NA NA NA 1000000365 1831503 9.132985e+06 NA NA
gpucuhre 1.000e-03 0.9985112 8.233076e-04 1.488768e-03 8945391 32767 1.042083e+01 8.245352e-01 TRUE
gpucuhre 2.000e-04 0.9994064 1.252114e-05 5.935883e-04 12585987141 23034875 2.884576e+03 6.264288e-02 TRUE
gpucuhre 4.000e-05 0.9994064 1.252114e-05 5.935883e-04 273 1 4.719840e-01 3.132144e-01 TRUE
gpucuhre 8.000e-06 0.9993348 7.496914e-06 6.652113e-04 25543365393 46766337 5.521393e+03 9.377380e-01 TRUE
gpucuhre 1.600e-06 0.9993013 5.847191e-06 6.986637e-04 29308169163 53661582 6.283886e+03 3.657050e+00 FALSE
gpucuhre 3.200e-07 0.9992867 4.955706e-06 7.133163e-04 30374713005 55614959 6.504139e+03 1.549764e+01 FALSE
gpucuhre 6.400e-08 0.9992785 4.377391e-06 7.215283e-04 30374713005 55614959 6.519846e+03 6.844612e+01 FALSE
gpucuhre 1.280e-08 0.9992768 3.963703e-06 7.232203e-04 31727254923 58092142 6.787105e+03 3.098884e+02 FALSE
gpucuhre 2.560e-09 0.9992777 3.648882e-06 7.223472e-04 33609219735 61538964 7.172606e+03 1.426375e+03 FALSE
gpucuhre 5.120e-10 0.9992786 3.398863e-06 7.213668e-04 34638790095 63424624 7.380872e+03 6.643197e+03 FALSE
gpucuhre 1.024e-10 0.9992851 9.342901e-06 7.149388e-04 34644706551 63435460 7.380701e+03 9.130454e+04 FALSE
gpucuhre 2.048e-11 0.9992874 9.343265e-06 7.126272e-04 36650385135 67108864 7.790073e+03 4.565394e+05 FALSE
gpucuhre 4.096e-12 0.9992874 6.606686e-06 7.126272e-04 36650385135 67108864 7.787351e+03 1.614111e+06 FALSE

Analysis

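The parallel algorithm (at least as I have used it) does not appear to respond to the user-specified fractional error target. For every target tighter than 8e-06 it fails to converge, the estimate stays near 0.9993, and the run time stays between roughly 6.3 and 7.8 seconds. The plot, and especially the fit, is therefore not currently meaningful.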

Because the fractional error tolerances span a large range, and the run times an even larger one, we use log scales for our plot. The line is a fitted linear model; the fit is clearly poor.

For the serial algorithm, the number of regions needed to reach a given fractional error tolerance appears to follow a power law in the reciprocal of that tolerance. Because the parallel algorithm is not yet working correctly, the plot is currently uninformative for it. The shaded band indicates the linear model’s standard error estimate.
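A power law \( \mathrm{nregions} \approx C\,(1/\mathrm{epsrel})^{p} \) is a straight line in log-log space, so the exponent \(p\) can be estimated with an ordinary least-squares fit to the logarithms. The sketch below does this for the three converged cuhre rows of the table. It is an illustration in Python, not the code that produced the plots in this document, and `fit_power_law` is a hypothetical helper.

```python
import math


def fit_power_law(xs, ys):
    """Least-squares fit of log(y) = log(c) + p * log(x); returns (c, p)."""
    lx = [math.log(x) for x in xs]
    ly = [math.log(y) for y in ys]
    n = len(lx)
    mean_x = sum(lx) / n
    mean_y = sum(ly) / n
    sxx = sum((a - mean_x) ** 2 for a in lx)
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(lx, ly))
    p = sxy / sxx
    c = math.exp(mean_y - p * mean_x)
    return c, p


# Converged cuhre rows from the table: (epsrel, nregions).
cuhre_rows = [(1.000e-03, 14869), (2.000e-04, 77987), (4.000e-05, 379594)]

inv_eps = [1.0 / eps for eps, _ in cuhre_rows]
nregions = [n for _, n in cuhre_rows]
c, p = fit_power_law(inv_eps, nregions)
print(f"nregions ~ {c:.3g} * (1/epsrel)^{p:.3f}")
```

With only three converged points the fit is weakly constrained, so the exponent it reports should be read as indicative rather than as a measured scaling law.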

Does the estimated value of the integral show any trend as we tighten the required fractional error tolerance?

## Warning: Removed 1 rows containing missing values (geom_point).