Purpose of this document

This document compares the performance of the serial and parallel implementations of the CUHRE algorithm for an integrand that is not positive definite (it takes both positive and negative values).

The integrand chosen is \[ \cos(s + 2t + 3u + 4v + 5w + 6x + 7y + 8z) / k, \] with \[ k = \frac{1}{315} \sin(1) \sin(3/2) \sin(2) \sin(5/2) \sin(3) \sin(7/2) \sin(4) \left(\sin(37/2) - \sin(35/2)\right). \]
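The role of \(k\) can be checked with a short derivation. It assumes the integration region is the unit hypercube \([0,1]^8\), as for Cuhre in the Cuba library, and writes the eight variables \(s, t, u, v, w, x, y, z\) as \(x_1, \dots, x_8\), so the numerator of the integrand is \(\cos(\sum_j j\,x_j)\):

\[ \int_{[0,1]^8} \cos\Big(\sum_{j=1}^{8} j\,x_j\Big)\,d^8x = \operatorname{Re} \prod_{j=1}^{8} \int_0^1 e^{i j x_j}\,dx_j = \operatorname{Re} \prod_{j=1}^{8} \frac{e^{ij} - 1}{ij}. \]

Each factor equals \(e^{ij/2} \cdot 2\sin(j/2)/j\). Since \(\sum_{j=1}^{8} j/2 = 18\) and \(2^8/8! = 2/315\),

\[ \operatorname{Re} \prod_{j=1}^{8} \frac{e^{ij} - 1}{ij} = \frac{2}{315}\cos(18)\sin(1/2)\prod_{j=2}^{8}\sin(j/2) = \frac{1}{315}\prod_{j=2}^{8}\sin(j/2)\,\bigl(\sin(37/2) - \sin(35/2)\bigr) = k, \]

using \(2\sin(1/2)\cos(18) = \sin(37/2) - \sin(35/2)\). Dividing by \(k\) therefore makes the exact value of the integral 1, which is why the estimates in the value column sit near 1.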

\(k\) is approximately equal to 3.43955795218325e-05.
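A quick numerical check of \(k\), as a Python sketch. It evaluates the closed form for \(k\) and, independently, the real part of the product of the one-dimensional integrals \(\int_0^1 e^{ijx}\,dx\) for \(j = 1, \dots, 8\); that second route assumes the integration region is the unit hypercube \([0,1]^8\). The function names are illustrative only.

```python
import cmath
import math


def k_closed_form() -> float:
    """k as written above: (1/315) sin(1) sin(3/2) ... sin(4) (sin(37/2) - sin(35/2))."""
    prod = 1.0
    for j in range(2, 9):  # sin(j/2) for j = 2..8, i.e. sin(1) .. sin(4)
        prod *= math.sin(j / 2)
    return prod * (math.sin(37 / 2) - math.sin(35 / 2)) / 315


def k_from_integral() -> float:
    """Real part of prod_j (e^{ij} - 1) / (ij), j = 1..8.

    This is the exact integral of cos(s + 2t + ... + 8z) over [0,1]^8
    (the hypercube domain is an assumption, not stated in the text).
    """
    z = 1 + 0j
    for j in range(1, 9):
        z *= (cmath.exp(1j * j) - 1) / (1j * j)
    return z.real


k1 = k_closed_form()
k2 = k_from_integral()
print(f"closed form:              {k1:.15g}")
print(f"product of 1-D integrals: {k2:.15g}")
```

The two routes should agree with each other and with the quoted approximation to within floating-point rounding.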

Testing environment

These tests were run on ibmpower9.fnal.gov.

The Power9 machine used in these tests is a PowerNV 8335-GTG (AC 922) system. It has two processors, each with 8 cores. Each core has 4 slices.

Description of the dataframe

  1. alg: the name of the algorithm (cuhre is serial; gpucuhre is parallel)
  2. epsrel: the fractional error target
  3. value: the estimated value of the integral
  4. errorest: the estimated error for the result
  5. error: the absolute difference between the estimated value and the true value
  6. neval: the number of function evaluations used
  7. nregions: the number of regions used
  8. time: the time in milliseconds for the calculation
  9. r: the ratio errorest/(epsrel*value); this is at most 1 if the algorithm has converged

A value of NA indicates that the algorithm did not converge, but rather stopped because the maximum number of function evaluations had been reached.
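The derived columns r and error can be recomputed from a single row. The Python sketch below uses a hypothetical `Row` container, and it assumes the exact value of the integral is 1 (which follows from dividing the integrand by \(k\)).

```python
from dataclasses import dataclass


@dataclass
class Row:
    """Hypothetical container for one row of the results dataframe."""
    epsrel: float
    value: float
    errorest: float

    def r(self) -> float:
        # r = errorest / (epsrel * value); converged runs have r <= 1
        return self.errorest / (self.epsrel * self.value)

    def error(self, true_value: float = 1.0) -> float:
        # absolute error against the exact value (1, assuming the 1/k normalization)
        return abs(self.value - true_value)

    def converged(self) -> bool:
        return self.r() <= 1.0


row = Row(epsrel=1.0e-3, value=0.9999127, errorest=0.0009998640)
print(f"r = {row.r():.7f}, error = {row.error():.3e}, converged = {row.converged()}")
```

Recomputing error from the rounded value printed in the table gives about 8.73e-05 rather than the tabulated 8.73389e-05; the table presumably used the unrounded estimate.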

alg    epsrel      value      errorest      error        neval       nregions  time         r
cuhre  1.0000e-03  0.9999127  0.0009998640  8.73389e-05  48526075    21958     26820.71     0.9999513
cuhre  5.0000e-04  1.0000750  0.0005000200  7.50680e-05  79205295    35840     44249.74     0.9999650
cuhre  2.5000e-04  1.0000490  0.0002500074  4.88586e-05  124311395   56250     70582.94     0.9999806
cuhre  1.2500e-04  1.0000140  0.0001250007  1.35957e-05  202753135   91744     121709.60    0.9999916
cuhre  6.2500e-05  1.0000040  0.0000625000  4.13070e-06  335613915   151862    230264.20    0.9999955
cuhre  3.1250e-05  1.0000010  0.0000312499  1.45540e-06  548874495   248360    451179.80    0.9999961
cuhre  1.5625e-05  0.9999997  0.0000156250  2.66500e-07  882520405   399331    935296.90    0.9999971
cuhre  7.8125e-06  1.0000010  0.0000078125  8.65700e-07  1388581675  628318    2060807.00   0.9999987
cuhre  3.9062e-06  1.0000000  0.0000039062  1.85500e-07  2117439675  958118    4446506.00   0.9999990
cuhre  1.9531e-06  1.0000000  0.0000019531  1.61000e-07  3294176275  1490578   10027550.00  0.9999995

Analysis

We do not yet have results from the parallel algorithm.

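Because the fractional error tolerances span a factor of about 500 and the times a factor of about 370, we use log scales on both axes of our plot of time versus epsrel. The line is a linear model fitted on these log scales; the fit is clearly poor because the slope steepens as epsrel decreases: halving epsrel multiplies the time by about 1.6 to 1.7 at the loosest tolerances, but by about 2.2 at the tightest.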
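The poor fit can be reproduced numerically. The Python sketch below assumes the line was an ordinary least-squares fit of log10(time) against log10(epsrel); the document does not say which fitting method was used, and `fit_line` is a hypothetical helper. A straight line in log-log space needs the slopes between neighbouring points to be roughly constant, so the sketch prints those as well.

```python
import math

# epsrel and time (ms) for the serial cuhre runs in the results table
epsrel = [1.0e-3, 5.0e-4, 2.5e-4, 1.25e-4, 6.25e-5,
          3.125e-5, 1.5625e-5, 7.8125e-6, 3.9062e-6, 1.9531e-6]
time_ms = [26820.71, 44249.74, 70582.94, 121709.60, 230264.20,
           451179.80, 935296.90, 2060807.00, 4446506.00, 10027550.00]


def fit_line(xs, ys):
    """Ordinary least-squares fit y = a + b*x; returns (a, b)."""
    n = len(xs)
    mx = sum(xs) / n
    my = sum(ys) / n
    sxx = sum((x - mx) ** 2 for x in xs)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    b = sxy / sxx
    return my - b * mx, b


x = [math.log10(e) for e in epsrel]
y = [math.log10(t) for t in time_ms]
a, b = fit_line(x, y)
residuals = [yi - (a + b * xi) for xi, yi in zip(x, y)]
print(f"slope = {b:.3f}")
print("residuals (decades):", [round(r, 3) for r in residuals])

# Slopes between neighbouring points; constant values would indicate a good line
local_slopes = [(y[i + 1] - y[i]) / (x[i + 1] - x[i]) for i in range(len(x) - 1)]
print("local slopes:", [round(s, 2) for s in local_slopes])
```

The residuals are positive at both ends and negative in the middle, and the local slopes grow steadily more negative, which is the curvature a single straight line cannot capture.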

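The number of regions needed to reach a given fractional error tolerance appears to follow a power law in the reciprocal of the tolerance: nregions grows by a factor of roughly 1.5 to 1.7 each time epsrel is halved, consistent with \[ \mathrm{nregions} \propto (1/\mathrm{epsrel})^{\alpha} \] for some exponent \(\alpha\).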
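The power-law exponent can be estimated with a least-squares fit in log-log space. In the Python sketch below, `fit_line` is a hypothetical helper, and the choice of an unweighted fit over all ten rows is an assumption, not the document's stated method.

```python
import math

# epsrel and nregions for the serial cuhre runs in the results table
epsrel = [1.0e-3, 5.0e-4, 2.5e-4, 1.25e-4, 6.25e-5,
          3.125e-5, 1.5625e-5, 7.8125e-6, 3.9062e-6, 1.9531e-6]
nregions = [21958, 35840, 56250, 91744, 151862,
            248360, 399331, 628318, 958118, 1490578]


def fit_line(xs, ys):
    """Ordinary least-squares fit y = a + b*x; returns (a, b)."""
    n = len(xs)
    mx = sum(xs) / n
    my = sum(ys) / n
    sxx = sum((x - mx) ** 2 for x in xs)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    b = sxy / sxx
    return my - b * mx, b


# nregions = C * (1/epsrel)**alpha is a straight line in log-log space:
# log10(nregions) = log10(C) + alpha * log10(1/epsrel)
x = [math.log10(1 / e) for e in epsrel]
y = [math.log10(n) for n in nregions]
log_c, alpha = fit_line(x, y)
residuals = [yi - (log_c + alpha * xi) for xi, yi in zip(x, y)]
print(f"alpha = {alpha:.3f}, C = {10 ** log_c:.3g}")
print("max |residual| (decades):", round(max(abs(r) for r in residuals), 3))
```

The fitted exponent should come out close to 0.68, and every point lies within a few hundredths of a decade of the line, which supports the power-law reading.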