1 Read the data
We read the raw data and transform it into the dataframe x:
## Rows: 4,100
## Columns: 21
## $ id <fct> f2 6D, f2 6D, f2 6D, f2 6D, f2 6D, f2 6D, f2 6D, f2 6D…
## $ integrand <fct> f2, f2, f2, f2, f2, f2, f2, f2, f2, f2, f2, f2, f2, f2…
## $ ndim <int> 6, 6, 6, 6, 6, 6, 6, 6, 6, 6, 6, 6, 6, 6, 6, 6, 6, 6, …
## $ epsrel <dbl> 0.001, 0.001, 0.001, 0.001, 0.001, 0.001, 0.001, 0.001…
## $ integral <dbl> 1.28689e+13, 1.28689e+13, 1.28689e+13, 1.28689e+13, 1.…
## $ estimate <dbl> 1.286895e+13, 1.286894e+13, 1.286903e+13, 1.286894e+13…
## $ errorest <dbl> 45245711, 45980789, 45956106, 45980789, 45570908, 4598…
## $ iters <int> 100, 100, 100, 100, 100, 100, 100, 100, 100, 100, 100,…
## $ adj_iters <int> 40, 40, 40, 40, 40, 40, 40, 40, 40, 40, 40, 40, 40, 40…
## $ skip_iters <int> 10, 10, 10, 10, 10, 10, 10, 10, 10, 10, 10, 10, 10, 10…
## $ completed_iters <int> 11, 11, 11, 11, 11, 11, 11, 11, 11, 11, 11, 11, 11, 11…
## $ ncall <dbl> 1e+08, 1e+08, 1e+08, 1e+08, 1e+08, 1e+08, 1e+08, 1e+08…
## $ neval <dbl> 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, …
## $ time <dbl> 826.2411, 455.7271, 455.6021, 459.7445, 460.4974, 455.…
## $ status <int> 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, …
## $ nprec <dbl> 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, …
## $ abserr <dbl> -51757495, -46675631, -134606906, -46675631, -29320070…
## $ fracerr <dbl> -4.021906e-06, -3.627011e-06, -1.045986e-05, -3.627011…
## $ est.fracerr <dbl> 3.515882e-06, 3.573004e-06, 3.571061e-06, 3.573004e-06…
## $ error.ratio <dbl> -1.1439251, -1.0151153, -2.9290626, -1.0151153, -0.643…
## $ chisq <dbl> 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, …
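The reading code is not shown here, so the following is a minimal sketch of what it might look like. The file name results.csv and the use of read_csv are assumptions; the derived columns follow the definitions implied by the glimpse output above (fracerr = abserr/integral, est.fracerr = errorest/integral, error.ratio = abserr/errorest).

```r
library(readr)
library(dplyr)

# A sketch, not the actual reading code; "results.csv" is a placeholder.
x <- read_csv("results.csv") |>
  mutate(
    id          = factor(id),
    integrand   = factor(integrand),
    abserr      = estimate - integral,  # assumed: signed error of the estimate
    fracerr     = abserr / integral,
    est.fracerr = errorest / integral,
    error.ratio = abserr / errorest
  )
glimpse(x)
```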
2 \(\chi^2\) distribution for the data
No records have a status other than 0. However, the \(\chi^2\) distribution indicates that not all runs have reliable results:
The VEGAS paper says that we should distrust results that have \(\chi^2\) greater than completed_iters - 1. We will thus remove all runs with a \(\chi^2\) value above this threshold.
3700 of the 4100 rows pass this requirement. For these records, the distribution of chisq still deviates from the theoretical \(\chi^2\) distribution (shown in red), but it is considerably closer:
For the remaining plots, we will only use records with chisq below our threshold value.
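Concretely, the filtering step might look like this (a sketch, assuming the dataframe x from above; the name good is illustrative):

```r
# Keep only runs satisfying the VEGAS criterion.
good <- filter(x, chisq <= completed_iters - 1)
nrow(good)  # 3700 of the original 4100 rows
```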
3 Error ratio
We are assuming that the error estimate from the m-Cubes algorithm has the same meaning as the error estimate from VEGAS. If this is correct, and our understanding of VEGAS is correct, then the actual error in the calculation should follow a normal distribution with mean 0 and standard deviation equal to the error estimate. In that case, the quantity we call error.ratio, the actual error divided by the error estimate, should be normally distributed with mean 0 and standard deviation 1.
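In symbols, and assuming abserr is the signed difference between the estimate and the true value of the integral (consistent with the glimpse output above):

\[
\text{error.ratio} = \frac{\text{abserr}}{\text{errorest}} = \frac{\text{estimate} - \text{integral}}{\text{errorest}} \sim N(0, 1).
\]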
First, we will group the data by function (the integrand column), and compare the distribution of error.ratio to a standard normal distribution for each function:
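One way to draw this comparison is a density plot per integrand with the standard normal density overlaid in red; a sketch, assuming ggplot2 and the filtered dataframe good from above:

```r
library(ggplot2)

p <- ggplot(good, aes(x = error.ratio)) +
  geom_density() +
  stat_function(fun = dnorm, color = "red") +  # standard normal reference
  facet_wrap(vars(integrand), scales = "free_y") +
  labs(y = "density")
p
```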
Next we do the same, but this time grouping by the user’s requested error tolerance (the epsrel column) rather than by function:
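The same sketch applies, with only the faceting variable swapped:

```r
# Facet over the requested tolerance instead of over the integrand.
p + facet_wrap(vars(epsrel), scales = "free_y")
```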
None of these distributions are well-approximated by the normal distribution.
4 Figure 1 in the paper
Figure 1 in the paper shows the achieved relative error (\(y\)) as a function of the user-specified digits of precision (\(x\)) for each integrand. Because there are many runs shown on each panel, and one ‘dot’ per run, we end up with a great deal of over-plotting, and it is not easy to understand how the distribution of \(y\) varies with \(x\). For this kind of data, I find that a type of box-plot is often the best display:
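A sketch of such a display, assuming nprec holds the user-specified digits of precision and fracerr the achieved fractional error; taking the absolute value is my choice, to allow a log scale:

```r
ggplot(good, aes(x = factor(nprec), y = abs(fracerr))) +
  geom_boxplot() +
  scale_y_log10() +  # fractional errors span several decades
  facet_wrap(vars(integrand)) +
  labs(x = "digits of precision", y = "absolute fractional error")
```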