This document compares the performance of the serial and parallel implementations of the CUHRE algorithm for an integrand that is not positive definite: it takes both positive and negative values.
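The integrand is
\[ f(s, t, u, v, w, x, y, z) = \frac{\cos(s + 2t + 3u + 4v + 5w + 6x + 7y + 8z)}{k}, \]
where
\[ k = \frac{1}{315} \sin(1) \sin(3/2) \sin(2) \sin(5/2) \sin(3) \sin(7/2) \sin(4) \left(\sin(37/2) - \sin(35/2)\right). \]
The constant \(k\) is the exact integral of the cosine over the unit hypercube \([0,1]^8\), so the exact value of the integral of \(f\) is 1.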
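The value of \(k\) can be checked with a short derivation. This sketch assumes the integration region is the unit hypercube \([0,1]^8\), the region CUHRE integrates over, and writes \(x_1, \dots, x_8\) for \(s, \dots, z\) with coefficients \(a_j = j\). The integral of the complex exponential factorizes into one-dimensional integrals:
\[ \int_{[0,1]^8} \exp\Big(i \sum_{j=1}^{8} a_j x_j\Big)\, d^8x = \prod_{j=1}^{8} \frac{e^{i a_j} - 1}{i a_j} = \exp\Big(\frac{i}{2} \sum_{j=1}^{8} a_j\Big) \prod_{j=1}^{8} \frac{2 \sin(a_j/2)}{a_j}. \]
Because \(\sum_j a_j = 36\) and \(\prod_j 2/a_j = 2^8/8! = 2/315\), taking the real part gives
\[ \int_{[0,1]^8} \cos\Big(\sum_{j=1}^{8} a_j x_j\Big)\, d^8x = \frac{2}{315} \cos(18) \prod_{j=1}^{8} \sin(j/2). \]
Finally, \(\sin(37/2) - \sin(35/2) = 2 \cos(18) \sin(1/2)\), which turns this into the expression for \(k\) given above.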
Numerically, \(k \approx 3.43955795218325 \times 10^{-5}\).
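As a numerical cross-check, the sketch below evaluates the closed form for \(k\) and, independently, the product of the eight one-dimensional integrals \(\int_0^1 e^{i a x}\,dx = (e^{i a} - 1)/(i a)\) for \(a = 1, \dots, 8\). It assumes the integration region is the unit hypercube \([0,1]^8\); the variable names are illustrative only.

```python
import cmath
import math

# Closed-form normalization constant k, transcribed from the formula above.
k_formula = (1 / 315) * (
    math.sin(1) * math.sin(3 / 2) * math.sin(2) * math.sin(5 / 2)
    * math.sin(3) * math.sin(7 / 2) * math.sin(4)
    * (math.sin(37 / 2) - math.sin(35 / 2))
)

# Independent check, assuming the domain is the unit hypercube [0, 1]^8:
# the integral of exp(i * sum_j a_j * x_j) with a_j = 1, ..., 8 factorizes
# into 1-D integrals (exp(i * a) - 1) / (i * a); the cosine integral is the real part.
product = complex(1.0, 0.0)
for a in range(1, 9):
    product *= (cmath.exp(1j * a) - 1) / (1j * a)
k_product = product.real

print(f"k from the closed form: {k_formula:.15g}")
print(f"k from the product:     {k_product:.15g}")
```

The two routes agree to floating-point precision, which confirms that dividing by \(k\) normalizes the integral to 1.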
These tests were run on ibmpower9.fnal.gov, a PowerNV 8335-GTG (AC 922) system with two processors, each with 8 cores; each core has 4 slices.
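In the table, epsrel is the requested fractional (relative) error tolerance, value is the estimated integral (whose exact value is 1, since \(k\) normalizes the integrand), errorest is CUHRE's error estimate, error is the actual error \(|\text{value} - 1|\), neval is the number of integrand evaluations, nregions is the number of subregions, and time is the measured running time. The column r equals errorest/(epsrel × value) to the displayed precision, so \(r \le 1\) means the requested tolerance was met. A value of NA indicates that the algorithm did not converge, but instead stopped because the maximum number of function evaluations had been reached.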
| alg | epsrel | value | errorest | error | neval | nregions | time | r |
|---|---|---|---|---|---|---|---|---|
| cuhre | 1.0000e-03 | 0.9999127 | 0.0009998640 | 8.73389e-05 | 48526075 | 21958 | 26820.71 | 0.9999513 |
| cuhre | 5.0000e-04 | 1.0000750 | 0.0005000200 | 7.50680e-05 | 79205295 | 35840 | 44249.74 | 0.9999650 |
| cuhre | 2.5000e-04 | 1.0000490 | 0.0002500074 | 4.88586e-05 | 124311395 | 56250 | 70582.94 | 0.9999806 |
| cuhre | 1.2500e-04 | 1.0000140 | 0.0001250007 | 1.35957e-05 | 202753135 | 91744 | 121709.60 | 0.9999916 |
| cuhre | 6.2500e-05 | 1.0000040 | 0.0000625000 | 4.13070e-06 | 335613915 | 151862 | 230264.20 | 0.9999955 |
| cuhre | 3.1250e-05 | 1.0000010 | 0.0000312499 | 1.45540e-06 | 548874495 | 248360 | 451179.80 | 0.9999961 |
| cuhre | 1.5625e-05 | 0.9999997 | 0.0000156250 | 2.66500e-07 | 882520405 | 399331 | 935296.90 | 0.9999971 |
| cuhre | 7.8125e-06 | 1.0000010 | 0.0000078125 | 8.65700e-07 | 1388581675 | 628318 | 2060807.00 | 0.9999987 |
| cuhre | 3.9062e-06 | 1.0000000 | 0.0000039062 | 1.85500e-07 | 2117439675 | 958118 | 4446506.00 | 0.9999990 |
| cuhre | 1.9531e-06 | 1.0000000 | 0.0000019531 | 1.61000e-07 | 3294176275 | 1490578 | 10027550.00 | 0.9999995 |
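The r column can be reproduced from the other columns. This sketch recomputes errorest/(epsrel·|value|) from the rounded table entries and compares it with r, and it also compares the actual error with CUHRE's estimate. The interpretation of r is inferred from the numbers rather than taken from the analysis code that produced the table.

```python
# Rows transcribed from the table above: (epsrel, value, errorest, error, r).
rows = [
    (1.0000e-03, 0.9999127, 0.0009998640, 8.73389e-05, 0.9999513),
    (5.0000e-04, 1.0000750, 0.0005000200, 7.50680e-05, 0.9999650),
    (2.5000e-04, 1.0000490, 0.0002500074, 4.88586e-05, 0.9999806),
    (1.2500e-04, 1.0000140, 0.0001250007, 1.35957e-05, 0.9999916),
    (6.2500e-05, 1.0000040, 0.0000625000, 4.13070e-06, 0.9999955),
    (3.1250e-05, 1.0000010, 0.0000312499, 1.45540e-06, 0.9999961),
    (1.5625e-05, 0.9999997, 0.0000156250, 2.66500e-07, 0.9999971),
    (7.8125e-06, 1.0000010, 0.0000078125, 8.65700e-07, 0.9999987),
    (3.9062e-06, 1.0000000, 0.0000039062, 1.85500e-07, 0.9999990),
    (1.9531e-06, 1.0000000, 0.0000019531, 1.61000e-07, 0.9999995),
]

r_diffs = []       # |recomputed r - tabulated r|
error_ratios = []  # actual error / estimated error
for epsrel, value, errorest, error, r in rows:
    # Hypothesis: r is the estimated error divided by the requested bound epsrel * |value|.
    r_recomputed = errorest / (epsrel * abs(value))
    r_diffs.append(abs(r_recomputed - r))
    error_ratios.append(error / errorest)

print(f"largest |r difference|: {max(r_diffs):.1e}")
print(f"largest error / errorest: {max(error_ratios):.3f}")
```

With the tabulated values, every recomputed r agrees with the r column to within a few parts in a million, and the actual error is at most about a fifth of CUHRE's estimate, so the error estimate is conservative for this integrand.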
We do not yet have results from the parallel algorithm.
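Because the fractional error tolerances span a large range and the running times span a very large one, we plot time against epsrel on logarithmic scales. The line is a linear model fitted on those log scales; the fit is clearly poor, because the running time grows increasingly steeply as epsrel decreases, so the points do not lie on a straight line.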
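The poor fit can be quantified from the table. This sketch assumes the plotted line is an ordinary least-squares fit of log(time) on log(1/epsrel); the helper `fit_line` is hypothetical and not part of the original analysis. It reports the fitted slope, the residuals, and the local slope between neighbouring tolerances, which a straight line on a log-log plot would keep constant.

```python
import math

# epsrel and time columns transcribed from the table (time in the table's units).
epsrel = [1.0e-03, 5.0e-04, 2.5e-04, 1.25e-04, 6.25e-05,
          3.125e-05, 1.5625e-05, 7.8125e-06, 3.9062e-06, 1.9531e-06]
times = [26820.71, 44249.74, 70582.94, 121709.60, 230264.20,
         451179.80, 935296.90, 2060807.00, 4446506.00, 10027550.00]

def fit_line(xs, ys):
    """Ordinary least-squares fit ys ~ intercept + slope * xs; returns (intercept, slope)."""
    n = len(xs)
    mx = sum(xs) / n
    my = sum(ys) / n
    sxx = sum((x - mx) ** 2 for x in xs)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return my - slope * mx, slope

x = [math.log(1 / e) for e in epsrel]
y = [math.log(t) for t in times]
intercept, slope = fit_line(x, y)
residuals = [yi - (intercept + slope * xi) for xi, yi in zip(x, y)]

# Local slopes between neighbouring points; a straight line would keep these constant.
local_slopes = [(y[i + 1] - y[i]) / (x[i + 1] - x[i]) for i in range(len(x) - 1)]

print(f"fitted slope: {slope:.3f}")
print("residuals:", [round(r, 3) for r in residuals])
print("local slopes:", [round(s, 3) for s in local_slopes])
```

With the table's values the local slope rises from about 0.72 at the loosest tolerances to about 1.17 at the tightest, and the residuals are positive at both ends and negative in the middle: the systematic curvature that a single straight line cannot follow.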
The number of regions needed to reach a given fractional error tolerance appears to follow a power law in the reciprocal of the tolerance, \(\text{nregions} \approx C\,(1/\text{epsrel})^{p}\) for some constants \(C\) and \(p\).
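A power law becomes a straight line on log-log scales, so its exponent can be estimated by least squares on the logarithms. This is a sketch under that assumption, using the nregions and epsrel columns of the table; the names `p`, `C`, and `fit_line` are illustrative, not from the original analysis.

```python
import math

# epsrel and nregions columns transcribed from the table.
epsrel = [1.0e-03, 5.0e-04, 2.5e-04, 1.25e-04, 6.25e-05,
          3.125e-05, 1.5625e-05, 7.8125e-06, 3.9062e-06, 1.9531e-06]
nregions = [21958, 35840, 56250, 91744, 151862,
            248360, 399331, 628318, 958118, 1490578]

def fit_line(xs, ys):
    """Ordinary least-squares fit ys ~ intercept + slope * xs; returns (intercept, slope)."""
    n = len(xs)
    mx = sum(xs) / n
    my = sum(ys) / n
    sxx = sum((x - mx) ** 2 for x in xs)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return my - slope * mx, slope

# log(nregions) = log(C) + p * log(1 / epsrel)
x = [math.log(1 / e) for e in epsrel]
y = [math.log(n) for n in nregions]
log_c, p = fit_line(x, y)
C = math.exp(log_c)
residuals = [yi - (log_c + p * xi) for xi, yi in zip(x, y)]

print(f"exponent p = {p:.3f}, prefactor C = {C:.4g}")
print(f"largest |residual| (natural log): {max(abs(r) for r in residuals):.3f}")
```

With the table's values this gives \(p \approx 0.68\), and the largest residual is about 0.05 in natural-log units, much smaller than for the running time, which is consistent with a power law in \(1/\text{epsrel}\).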