Dealing with Multiple Local Minima: the Rastrigin Test Function

Author

Marc Paterno

The problem

The “likelihood function” we will eventually get from DUNE is very likely to be one that has multiple local minima, at least in some of the variables. This certainly was the case for the NOvA likelihood function. So it is important for us to look at how well currently available algorithms are able to find the global minimum in a function with multiple local minima.

The Rastrigin function

One useful test function for evaluating how well an algorithm deals with multiple local minima is the Rastrigin function. This function is a sum of terms, each of which is itself the sum of a parabola and a negated cosine wave. The function’s definition is: \[f(\vec{x}) = 10 n + \sum_{i=1}^{n} ({x_i}^2 - 10 \cos(2 \pi x_i))\] where \(\vec{x}\) is the set of values \((x_1, \ldots , x_n)\). The parabolic term creates a global minimum at the origin. The negated cosine term produces a local minimum at each period of the cosine, i.e., wherever the value of a coordinate is near an integer. Figure 1 shows what the Rastrigin function in one dimension looks like.

Figure 1: The Rastrigin function in one dimension.
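
For concreteness, a minimal C++ sketch of this function, written directly from the definition above, might look like the following. (This is an illustration only; the exact implementation used in these experiments is not shown here.)

```cpp
#include <cmath>
#include <vector>

// Rastrigin function: f(x) = 10*n + sum_i (x_i^2 - 10*cos(2*pi*x_i)).
// Its global minimum is f(0, ..., 0) = 0; local minima lie near every
// point whose coordinates are all integers.
double rastrigin(std::vector<double> const& x)
{
  double constexpr pi = 3.141592653589793;
  double sum = 10.0 * x.size();
  for (double xi : x)
    sum += xi * xi - 10.0 * std::cos(2.0 * pi * xi);
  return sum;
}
```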

Setup for the experiments

For each experiment we select a dimensionality for the problem and vary the maximum number of objective function calls (maxcalls) the minimizer is allowed to use. We do not set any tolerance parameter for the algorithm. We are interested in the quality of the minimum found as a function of the allowed number of function calls; we also record the location of the minimum that was found and the time the algorithm took to complete.

The Rastrigin function is symmetric about zero in every dimension. The objective function we will be using for DUNE (or any other experiment) will not generally have any such symmetry. So that this symmetry does not give an undue advantage to the search for the function minimum, we select the search region so that in each dimension the global minimum is not centered. Furthermore, we make sure it is differently off-center in each dimension.

We found that a large search region makes this problem very hard, so after a first set of experiments using a search region 20 units wide in each coordinate, we switched to a region 5 units wide in each coordinate.

In each experiment, we start with a maxcalls value of 1, run the minimizer with this value of maxcalls, and print out the results; we then double maxcalls and repeat (a sketch of this driver loop appears after the list below). We stop when either of the following conditions holds:

  1. The result is less than \(10^{-6}\) (the correct answer is exactly 0). In this case, we consider the algorithm to have converged.
  2. It is taking too long, and we give up. In this case we consider the algorithm not to have converged.
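
The driver loop below sketches this procedure for the two-dimensional case, using dlib’s documented find_min_global interface. The search bounds and the “taking too long” cutoff shown here are illustrative values, not necessarily the ones used in the experiments reported below.

```cpp
#include <dlib/global_optimization.h>
#include <chrono>
#include <cmath>
#include <cstddef>
#include <iostream>

int main()
{
  // 2D Rastrigin function; dlib deduces the number of arguments from the
  // lambda's signature.
  auto rastrigin2 = [](double x0, double x1) {
    double constexpr pi = 3.141592653589793;
    return 20.0 + x0 * x0 - 10.0 * std::cos(2.0 * pi * x0)
                + x1 * x1 - 10.0 * std::cos(2.0 * pi * x1);
  };

  for (std::size_t maxcalls = 1; ; maxcalls *= 2) {
    auto start = std::chrono::steady_clock::now();

    // Search region chosen so the global minimum at the origin is
    // (differently) off-center in each dimension. These bounds are
    // illustrative only.
    auto result = dlib::find_min_global(rastrigin2,
                                        {-9.3, -8.1},  // lower bounds
                                        {10.7, 11.9},  // upper bounds
                                        dlib::max_function_calls(maxcalls));

    std::chrono::duration<double> const elapsed =
      std::chrono::steady_clock::now() - start;

    std::cout << maxcalls << ' ' << result.y << ' '
              << dlib::trans(result.x) << ' ' << elapsed.count() << '\n';

    if (result.y < 1.0e-6) break;        // condition 1: converged
    if (elapsed.count() > 600.0) break;  // condition 2: giving up (illustrative cutoff)
  }
}
```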

Solving in 2 dimensions

In 2 dimensions, the dlib::find_min_global function is able to find the global minimum in a tolerable amount of time. The raw data are shown in Table 1.

Table 1: Minimization of the Rastrigin function in two dimensions.
maxcalls         fmin          x0         x1    time (s)
       1  145.0000000  -9.0000000  -8.000000    0.000160
       2  145.0000000  -9.0000000  -8.000000    0.000063
       4   10.6966950   0.9155896   1.830911    0.000510
       8   10.6966950   0.9155896   1.830911    0.006571
      16    5.0000000   1.0000000   2.000000    0.002645
      32    5.0000000   1.0000000   2.000000    0.004101
      64    5.0000000   1.0000000   2.000000    0.005716
     128    5.0000000   1.0000000   2.000000    0.010254
     256    5.0000000   1.0000000   2.000000    0.022236
     512    0.9949591  -0.9949586   0.000000    0.053494
    1024    0.9949591  -0.9949586   0.000000    0.164198
    2048    0.9949591  -0.9949586   0.000000    0.626109
    4096    0.9949591  -0.9949586   0.000000    2.331822
    8192    0.9949591  -0.9949586   0.000000    9.703094
   16384    0.9949591  -0.9949586   0.000000   41.999079
   32768    0.0000000   0.0000000   0.000000  177.139547

Figure 2 shows how the value of the minimum found improves with an increasing number of function calls allowed. Note how the algorithm seems to have performance plateaus, where varying the number of function calls allowed does not make any difference – until a new plateau is reached. Eventually, the algorithm is able to find the global minimum.

Figure 2: Result of minimization in 2D as a function of the maximum number of function calls allowed. Note that the y-location of the final point is our means of showing y=0 on a plot with a logarithmic y-axis.

Figure 3: Execution time as a function of the maximum number of function calls allowed.

Figure 3 shows how the time taken to run the algorithm increases with the value of maxcalls. For the larger values of maxcalls, the relationship appears to follow an approximate power law. Fitting the data for maxcalls > 1000 gives an exponent of 2.02. It appears from these data that in two dimensions the algorithm is \(\mathcal{O}(m^2)\), where \(m\) is the value of maxcalls. Only a careful analysis of the algorithm would let us be sure.
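
The exponent quoted here (and for the later fits) is the power \(p\) in a power-law model of running time versus maxcalls, presumably obtained as the slope of a straight-line fit in log-log coordinates: \[t(m) \approx A\, m^{p} \quad\Rightarrow\quad \log t \approx \log A + p \log m\] where \(t\) is the running time, \(m\) is the value of maxcalls, and \(p \approx 2.02\) for this fit.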

Solving in 3 dimensions

We repeat the analysis for the function in 3 dimensions. In 3 dimensions, the dlib::find_min_global function is unable to find the global minimum in an acceptable amount of time. The raw data are shown in Table 2.

Table 2: Minimization of the Rastrigin function in three dimensions.
maxcalls        fmin          x0          x1          x2    time (s)
       1  194.000000  -9.0000000  -8.0000000  -7.0000000    0.000081
       2  194.000000  -9.0000000  -8.0000000  -7.0000000    0.000088
       4  130.716107   0.9704965  -9.5213687   2.7241786    0.000660
       8   25.105858   0.8820629   0.6702858  -2.1032346    0.005139
      16    5.875490   1.0000000   0.8494613  -0.0046777    0.008988
      32    2.153669   1.0000000   1.0012149  -0.0276179    0.004815
      64    1.989918   0.9949586   0.9949586   0.0000000    0.007212
     128    1.989918   0.9949586   0.9949586   0.0000000    0.013036
     256    1.989918   0.9949586   0.9949586   0.0000000    0.027539
     512    1.989918   0.9949586   0.9949586   0.0000000    0.064075
    1024    1.989918   0.9949586   0.9949586   0.0000000    0.182558
    2048    1.989918   0.9949586   0.9949586   0.0000000    0.639757
    4096    1.989918   0.9949586   0.9949586   0.0000000    2.611620
    8192    1.989918   0.9949586   0.9949586   0.0000000   10.525603
   16384    1.989918   0.9949586   0.9949586   0.0000000   45.571786
   32768    1.989918   0.9949586   0.9949586   0.0000000  189.861363
   65536    1.989918   0.9949586   0.9949586   0.0000000  805.874199

Figure 4 plots the rate of convergence. It shows some of the same plateau structure as the 2D case did.

Figure 4: Result of minimization in 3D as a function of the maximum number of function calls allowed.

Figure 5 shows the time taken to execute the algorithm. Fitting the data for maxcalls > 1000 gives an exponent of 2.03. These data are consistent with the algorithm also being \(\mathcal{O}(m^2)\) in three dimensions. So, while the algorithm did not converge, at least the complexity of the algorithm does not appear to depend on the dimensionality of the search space. Note that the algorithm ran for 13.4 minutes without converging.

Figure 5: Execution time as a function of the maximum number of function calls allowed.

Solving in 5 dimensions

As we should expect, in 5 dimensions the dlib::find_min_global function is not able to find the global minimum even when allowed to work for hours. Figure 6 shows the rate of convergence.

Figure 6: Result of minimization in 5D as a function of the maximum number of function calls allowed.

Figure 7 shows the time taken for the algorithm to run. After 256 function calls, which take about 0.031 seconds, the algorithm is unable to improve its solution, even when the maximum allowed number of function calls is raised to 262144, a run that took 9.74 hours.

Figure 7: Execution time as a function of the maximum number of function calls allowed.

Fitting the data for maxcalls > 1000 gives an exponent of 2.15. Perhaps the algorithm is not \(\mathcal{O}(m^2)\), but slightly worse, e.g. \(\mathcal{O}(m^2 \log m)\)? Only inspecting the code (or its documentation) will tell us for sure.

Using a smaller search area

In the fits above, the search area contained \(20^n\) local minima, where \(n\) is the dimensionality of the search. This is apparently too hard for the algorithm to solve in an acceptable amount of time.

Next we restricted the search to a smaller region, with only \(5^n\) minima.
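
Only the bounds passed to dlib::find_min_global change for these runs; the rest of the driver loop is unchanged. A sketch of the change, with illustrative bounds only, is shown below: each coordinate spans 5 units, with the origin differently off-center in each dimension.

```cpp
// Illustrative 2D bounds for the smaller search region: each coordinate
// spans 5 units, and the global minimum at the origin is off-center by a
// different amount in each dimension.
auto result = dlib::find_min_global(rastrigin2,
                                    {-1.7, -2.3},  // lower bounds
                                    {3.3, 2.7},    // upper bounds
                                    dlib::max_function_calls(maxcalls));
```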

Search in 2 dimensions

In the smaller region, the algorithm finds the minimum even more quickly than it did in the larger search space. Figure 8 shows the rate of convergence, and Figure 9 shows the time taken to run the algorithm.

Figure 8: Result of minimization in 2D as a function of the maximum number of function calls allowed.

Figure 9: Execution time as a function of the maximum number of function calls allowed.

Fitting the data for maxcalls > 500 gives an exponent of 1.71.

Search in 3 dimensions

In the smaller region, the algorithm is able to converge – but it takes 3.31 minutes, even though we’re only working in 3 dimensions. Figure 10 shows the rate of convergence and Figure 11 shows the time taken for the algorithm to run.

Figure 10: Result of minimization in 3D as a function of the maximum number of function calls allowed.

Figure 11: Execution time as a function of the maximum number of function calls allowed.

Fitting the data for maxcalls > 1000 gives an exponent of 2.04.

Search in 4 dimensions

Even in the smaller region, the algorithm is unable to converge within an acceptable time: it ran for 8.14 hours without converging. Figure 12 shows the rate of convergence and Figure 13 shows the time it took the algorithm to run.

Figure 12: Result of minimization in 4D as a function of the maximum number of function calls allowed.

Figure 13: Execution time as a function of the maximum number of function calls allowed.

Fitting the data for maxcalls > 5000 gives an exponent of 2.24.