This document shows a performance comparison between the serial and parallel implementations of the CUHRE algorithm for a positive-definite integrand.
The integrand chosen is: \[ \left| \cos(4v + 5w + 6x + 7y + 8z)/k \right| \] with \(k = 0.6371054\). For this integrand, the normalization is approximate, meaning that the true value of the integral is close to, but not exactly, 1.0.
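For concreteness, the integrand can be written as a plain function of five variables. This is a minimal Python sketch, not the code used in these tests; the names `integrand` and `K` are illustrative, and the integration domain is not restated here.

```python
import math

K = 0.6371054  # normalization constant k quoted in the text


def integrand(v, w, x, y, z):
    """Return |cos(4v + 5w + 6x + 7y + 8z) / k| at a single point."""
    return abs(math.cos(4 * v + 5 * w + 6 * x + 7 * y + 8 * z) / K)


# At the origin the cosine is 1, so the value is simply 1/k.
print(integrand(0.0, 0.0, 0.0, 0.0, 0.0))
```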
These tests were run on ibmpower9.fnal.gov.
The Power9 machine used in these tests is a PowerNV 8335-GTG (AC 922) system. It has two processors, each with 8 cores. Each core has 4 slices.
A value of NA indicates that the algorithm did not converge, but rather stopped because the maximum number of function evaluations had been reached.
| alg | epsrel | value | errorest | error | neval | nregions | time | r | converged |
|---|---|---|---|---|---|---|---|---|---|
| cuhre | 1.000e-03 | 1.0048540 | 1.004804e-03 | 4.854329e-03 | 8118201 | 14869 | 3.326333e+03 | 9.999502e-01 | TRUE |
| cuhre | 2.000e-04 | 1.0025390 | 2.005053e-04 | 2.539198e-03 | 42580629 | 77987 | 2.125008e+04 | 9.999875e-01 | TRUE |
| cuhre | 4.000e-05 | 0.9999999 | 3.999993e-05 | 5.463197e-08 | 207258051 | 379594 | 4.221983e+05 | 9.999983e-01 | TRUE |
| cuhre | 8.000e-06 | NA | NA | NA | 1000000365 | 1831503 | 9.132985e+06 | NA | NA |
| gpucuhre | 1.000e-03 | 0.9985112 | 8.233076e-04 | 1.488768e-03 | 8945391 | 32767 | 1.042083e+01 | 8.245352e-01 | TRUE |
| gpucuhre | 2.000e-04 | 0.9994064 | 1.252114e-05 | 5.935883e-04 | 12585987141 | 23034875 | 2.884576e+03 | 6.264288e-02 | TRUE |
| gpucuhre | 4.000e-05 | 0.9994064 | 1.252114e-05 | 5.935883e-04 | 273 | 1 | 4.719840e-01 | 3.132144e-01 | TRUE |
| gpucuhre | 8.000e-06 | 0.9993348 | 7.496914e-06 | 6.652113e-04 | 25543365393 | 46766337 | 5.521393e+03 | 9.377380e-01 | TRUE |
| gpucuhre | 1.600e-06 | 0.9993013 | 5.847191e-06 | 6.986637e-04 | 29308169163 | 53661582 | 6.283886e+03 | 3.657050e+00 | FALSE |
| gpucuhre | 3.200e-07 | 0.9992867 | 4.955706e-06 | 7.133163e-04 | 30374713005 | 55614959 | 6.504139e+03 | 1.549764e+01 | FALSE |
| gpucuhre | 6.400e-08 | 0.9992785 | 4.377391e-06 | 7.215283e-04 | 30374713005 | 55614959 | 6.519846e+03 | 6.844612e+01 | FALSE |
| gpucuhre | 1.280e-08 | 0.9992768 | 3.963703e-06 | 7.232203e-04 | 31727254923 | 58092142 | 6.787105e+03 | 3.098884e+02 | FALSE |
| gpucuhre | 2.560e-09 | 0.9992777 | 3.648882e-06 | 7.223472e-04 | 33609219735 | 61538964 | 7.172606e+03 | 1.426375e+03 | FALSE |
| gpucuhre | 5.120e-10 | 0.9992786 | 3.398863e-06 | 7.213668e-04 | 34638790095 | 63424624 | 7.380872e+03 | 6.643197e+03 | FALSE |
| gpucuhre | 1.024e-10 | 0.9992851 | 9.342901e-06 | 7.149388e-04 | 34644706551 | 63435460 | 7.380701e+03 | 9.130454e+04 | FALSE |
| gpucuhre | 2.048e-11 | 0.9992874 | 9.343265e-06 | 7.126272e-04 | 36650385135 | 67108864 | 7.790073e+03 | 4.565394e+05 | FALSE |
| gpucuhre | 4.096e-12 | 0.9992874 | 6.606686e-06 | 7.126272e-04 | 36650385135 | 67108864 | 7.787351e+03 | 1.614111e+06 | FALSE |
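The table does not define the `error` and `r` columns. The tabulated numbers are consistent with `error` being the absolute difference between `value` and 1, and with `r = errorest / (epsrel * |value|)`, where `converged` is TRUE when `r` is at most 1. The following Python sketch checks that reading against three rows copied from the table; the formula is an inference from the data, not a statement of how the report computed it, and the helper name `ratio` is hypothetical.

```python
def ratio(errorest, epsrel, value):
    """Estimated error divided by the requested absolute tolerance epsrel * |value|."""
    return errorest / (epsrel * abs(value))


# (alg, epsrel, value, errorest, r from table, converged from table)
rows = [
    ("cuhre", 1.000e-03, 1.0048540, 1.004804e-03, 9.999502e-01, True),
    ("gpucuhre", 2.000e-04, 0.9994064, 1.252114e-05, 6.264288e-02, True),
    ("gpucuhre", 1.600e-06, 0.9993013, 5.847191e-06, 3.657050e+00, False),
]

checks = []
for alg, epsrel, value, errorest, r_table, converged in rows:
    r = ratio(errorest, epsrel, value)
    # Compare with the tabulated r, and check that r <= 1 matches the flag.
    close = abs(r - r_table) / r_table < 1e-5
    checks.append((close, (r <= 1.0) == converged))
    print(f"{alg:9s} epsrel={epsrel:.3e} r={r:.6e} table={r_table:.6e}")
```

Under this reading, a run can be flagged as converged even when its true error exceeds the requested tolerance, because the test uses the algorithm's own error estimate rather than the true error.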
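The parallel algorithm (at least as I have used it) does not seem to respond to the user-specified fractional error target: in the table above, its estimate stays near 0.9993, with a true error of about 7e-04, for every tolerance of 8e-06 or tighter. The plot (and especially the fit) is therefore not currently meaningful.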
Because the range of fractional error tolerance values is large, and the range of times is very large, we use log scales for our plot. The line is a fitted linear model; the fit is clearly poor.
For the serial algorithm, the number of regions needed to reach a given fractional error tolerance appears to follow a power law in the reciprocal of that tolerance. The shaded band indicates the linear model’s standard error estimate. Because the parallel algorithm is not yet working correctly, its part of the plot is currently uninformative.
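A power law in 1/epsrel appears as a straight line on log-log axes, so its exponent can be estimated by an ordinary least-squares fit to the logarithms. The sketch below does this in Python for the three converged serial (`cuhre`) rows of the table. It is an illustration under that assumption, not the fit shown in the plot; the function name `fit_power_law` is hypothetical.

```python
import math


def fit_power_law(xs, ys):
    """Least-squares fit of log10(y) = a + b * log10(x); returns (a, b)."""
    lx = [math.log10(x) for x in xs]
    ly = [math.log10(y) for y in ys]
    n = len(lx)
    mean_x = sum(lx) / n
    mean_y = sum(ly) / n
    sxy = sum((u - mean_x) * (v - mean_y) for u, v in zip(lx, ly))
    sxx = sum((u - mean_x) ** 2 for u in lx)
    b = sxy / sxx          # slope on log-log axes = power-law exponent
    a = mean_y - b * mean_x
    return a, b


# Converged serial rows from the table (the epsrel = 8e-06 run is excluded).
epsrel = [1.000e-03, 2.000e-04, 4.000e-05]
nregions = [14869, 77987, 379594]

intercept, exponent = fit_power_law([1.0 / e for e in epsrel], nregions)
print(f"nregions ~ {10 ** intercept:.3g} * (1/epsrel)^{exponent:.3f}")
```

For these three rows the fitted exponent comes out close to 1, meaning the region count grows roughly in proportion to 1/epsrel. With only three points, this is a rough indication rather than a measurement.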
Does the estimated value of the integral show any trend as we tighten the required fractional error tolerance?
One row with a missing value (the serial run that did not converge, marked NA in the table) is omitted from the plot.