Dealing with Multiple Local Minima: the Rastrigin Test Function
The problem
The “likelihood function” we will eventually get from DUNE is very likely to be one that has multiple local minima, at least in some of the variables. This certainly was the case for the NOvA likelihood function. So it is important for us to look at how well currently available algorithms are able to find the global minimum in a function with multiple local minima.
The Rastrigin function
One useful test function to evaluate how well an algorithm deals with multiple local minima is the Rastrigin function. This function is the sum of a series of terms, each of which is itself the sum of a parabola and negated cosine wave. The function’s definition is: \[f(\vec{x}) = 10 n + \sum_{i=1}^{n} ({x_i}^2 - 10 \cos(2 \pi x_i))\] where \(\vec{x}\) is the set of values \((x_1, \ldots , x_n)\). The parabola term creates a global minimum at the origin. The cosine term produces local minima at all “half wavelenths”, i.e., whenever the value of a coordinate is integral. Figure 1 shows what the Rastrigin function in one dimenion looks like.
Setup for the experiments
For each of the experiments, we select a dimensionality for the problem. In each experiment, we vary the maximum number of objective function calls we allow the minimizer to use. We are not setting any tolerance parameter for the algorithm. We are interested in looking at the quality of the minimum found, as a function of the allowed number of function calls. We also want to record the location of the minimum that was found, and the amount of time it took the algorithm to complete.
The Rastrigin function is symmetric about zero in every dimension. The objective function we will be using for DUNE (or any other experiment) will not generally have any such symmetr. In order for this symmetry to not give an undue advantage to the search for the function minimum, we select the region so that in each dimension the global minimum is not centered. Furthermore, we make sure it is differently off-center in each dimension.
We found that having large search region makes this problem very hard, and so after a first set of experiments using a search region that is 20 units wide in each coordinate, we switched to using 5 units wide in each coordinate.
In each experiment, we start with a maxcalls value of 1. We try the minimizer with this value of maxcalls, and print out the results. We stop when either of the following conditions holds:
- The result is less than \(10^{-6}\) (the correct answer is exactly 0). In this case, we consider the algorithm to have converged.
- It is taking too long, and we give up. In this case we consider the algorithm not to have converged.
Solving in 2 dimensions
In 2 dimensions, the dlib::find_min_global function is able to find the global minimum in an tolerable amount of time. The raw data are shown in Table 1.
| maxcalls | fmin | x0 | x1 | time |
|---|---|---|---|---|
| 1 | 145.0000000 | -9.0000000 | -8.000000 | 0.000160 |
| 2 | 145.0000000 | -9.0000000 | -8.000000 | 0.000063 |
| 4 | 10.6966950 | 0.9155896 | 1.830911 | 0.000510 |
| 8 | 10.6966950 | 0.9155896 | 1.830911 | 0.006571 |
| 16 | 5.0000000 | 1.0000000 | 2.000000 | 0.002645 |
| 32 | 5.0000000 | 1.0000000 | 2.000000 | 0.004101 |
| 64 | 5.0000000 | 1.0000000 | 2.000000 | 0.005716 |
| 128 | 5.0000000 | 1.0000000 | 2.000000 | 0.010254 |
| 256 | 5.0000000 | 1.0000000 | 2.000000 | 0.022236 |
| 512 | 0.9949591 | -0.9949586 | 0.000000 | 0.053494 |
| 1024 | 0.9949591 | -0.9949586 | 0.000000 | 0.164198 |
| 2048 | 0.9949591 | -0.9949586 | 0.000000 | 0.626109 |
| 4096 | 0.9949591 | -0.9949586 | 0.000000 | 2.331822 |
| 8192 | 0.9949591 | -0.9949586 | 0.000000 | 9.703094 |
| 16384 | 0.9949591 | -0.9949586 | 0.000000 | 41.999079 |
| 32768 | 0.0000000 | 0.0000000 | 0.000000 | 177.139547 |
Figure 2 shows how the value of the minimum found improves with an increasing number of function calls allowed. Note how the algorithm seems to have performance plateaus, where varying the number of function calls allowed does not make any difference – until a new plateau is reached. Eventually, the algorithm is able to find the global minimum.
Figure 3 shows how the time taken to run the algorithm increases with the value of maxcalls. For the larger values of maxcalls, the relationship seems to be an approximate power law. Fitting the data for maxcalls > 1000 gives an exponent of 2.02. It appears from these data that the algorithm is \(\mathcal{O}(n^2)\) in two dimensions. Only a careful analysis of the algorithm would let us be sure.
Solving in 3 dimensions
We repeat the analysis for the function in 3 dimensions. In 3 dimensions, the dlib::find_min_global function is unable to find the global minimum in an acceptable amount of time. The raw data are shown in Table 2.
| maxcalls | fmin | x0 | x1 | x2 | time |
|---|---|---|---|---|---|
| 1 | 194.000000 | -9.0000000 | -8.0000000 | -7.0000000 | 0.000081 |
| 2 | 194.000000 | -9.0000000 | -8.0000000 | -7.0000000 | 0.000088 |
| 4 | 130.716107 | 0.9704965 | -9.5213687 | 2.7241786 | 0.000660 |
| 8 | 25.105858 | 0.8820629 | 0.6702858 | -2.1032346 | 0.005139 |
| 16 | 5.875490 | 1.0000000 | 0.8494613 | -0.0046777 | 0.008988 |
| 32 | 2.153669 | 1.0000000 | 1.0012149 | -0.0276179 | 0.004815 |
| 64 | 1.989918 | 0.9949586 | 0.9949586 | 0.0000000 | 0.007212 |
| 128 | 1.989918 | 0.9949586 | 0.9949586 | 0.0000000 | 0.013036 |
| 256 | 1.989918 | 0.9949586 | 0.9949586 | 0.0000000 | 0.027539 |
| 512 | 1.989918 | 0.9949586 | 0.9949586 | 0.0000000 | 0.064075 |
| 1024 | 1.989918 | 0.9949586 | 0.9949586 | 0.0000000 | 0.182558 |
| 2048 | 1.989918 | 0.9949586 | 0.9949586 | 0.0000000 | 0.639757 |
| 4096 | 1.989918 | 0.9949586 | 0.9949586 | 0.0000000 | 2.611620 |
| 8192 | 1.989918 | 0.9949586 | 0.9949586 | 0.0000000 | 10.525603 |
| 16384 | 1.989918 | 0.9949586 | 0.9949586 | 0.0000000 | 45.571786 |
| 32768 | 1.989918 | 0.9949586 | 0.9949586 | 0.0000000 | 189.861363 |
| 65536 | 1.989918 | 0.9949586 | 0.9949586 | 0.0000000 | 805.874199 |
Figure 4 plots the rate of convergence. It shows some of same plateau structure as did the 2D case.
Figure 5 shows the time taken to execute the algorithm. Fitting the data for maxcalls > 1000 gives an exponent of 2.03. These data are consistent with the algorithm being \(\mathcal{O}(n^2)\) in three dimensions. So, while the algorithm did not converge, at least it does not appear that the complexity of the algorithm is dependent on the dimenstionality of the search space. Note that the algorithm ran for 13.4 minutes without converging.
Solving in 5 dimensions
As we should expect, in 5 dimensions the dlib::find_min_global function is not able to find the global minimum even if allowed to work for hours. Figure 6 shows the rate of rate of convergence.
Figure 7 shows the time taken for the algorithm to run. After 256 function calls, which takes about 0.031 seconds, the algorithm is unable to improve its solution. This is even after setting the maximum allowed number of function calls to 262144, which took 9.74 hours.
Fitting the data for maxcalls > 1000 gives an exponent of 2.15. Perhaps the algorithm is not \(\mathcal{O}(n^2)\), but slightly worse, e.g. \(\mathcal{O}(n^2 \log(n))\)? Only inspecting the code (or documentation) will tell us for sure.
Using a smaller search area
In the fits above, the search area contained \(20^n\) local minima, where \(n\) is the dimensionality of the search. This is apparently too hard for the algorithm to solve in an acceptable amount of time.
Next we restricted the search to a smaller region, with only \(5^n\) minima.
Search in 2 dimensions
In the smaller region, the algorithm find the minimum even more quickly than it did in the larger search space. Figure 8 shows the rate of convergence, and Figure 9 shows the time taken to run the algorithm.
Fitting the data for maxcalls > 500 gives an exponent of 1.71.
Search in 3 dimensions
In the smaller region, the algorithm is able to converge – but it takes 3.31 minutes, even though we’re only working in 3 dimensions. Figure 10 shows the rate of convergence and Figure 11 shows the time taken for the algorithm to run.
Fitting the data for maxcalls > 1000 gives an exponent of 2.04.
Search in 4 dimensions
Even in the smaller region, the algorithm is unable to converge within an acceptable time: the algorithm ran for 8.14 hours without converging. Figure 12 shows the rate of convergence and Figure 13 shows the time it took the algorithm to run.
Fitting the data for maxcalls > 5000 gives an exponent of 2.24.