Rastrigin Function

Here, the Rastrigin function is implemented for n dimensions, with the default number of dimensions set to 2. Given an n-dimensional input vector \(\mathbf{x} = (x_1, x_2, ..., x_n)\), it outputs a scalar computed using the function:

\(f(\mathbf{x}) = An + \sum_{i=1}^n [x_i^2 - A \cos(2\pi x_i)]\), where \(A = 10\) and \(n = 2\) by default.
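A minimal R implementation is sketched below; the function name rastrigin and the default argument A = 10 are choices made here.

rastrigin <- function(x, A = 10) {
  # x: numeric vector of length n; returns A*n + sum(x_i^2 - A*cos(2*pi*x_i))
  A * length(x) + sum(x^2 - A * cos(2 * pi * x))
}

rastrigin(c(0, 0))  # 0 at the origin
rastrigin(c(1, 1))  # 2: each unit coordinate contributes 1 - 10 + 10 = 1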

3D Plot of the Rastrigin function

I use a perspective 3D plot to visualise the Rastrigin function in three dimensions, with the x and y axes corresponding to \(x_1\) and \(x_2\), and the z coordinate representing the function value \(f(\mathbf{x})\), where \(\mathbf{x} = (x_1, x_2)\).
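A sketch of how such a plot could be produced with base R's persp(); the grid range [-5.12, 5.12] and resolution are assumptions, and rastrigin() is the implementation defined above.

x1 <- seq(-5.12, 5.12, length.out = 100)
x2 <- seq(-5.12, 5.12, length.out = 100)
# evaluate f on the grid (rastrigin() as defined above)
z <- outer(x1, x2, Vectorize(function(a, b) rastrigin(c(a, b))))
persp(x1, x2, z, theta = 30, phi = 30, expand = 0.6, col = "lightblue",
      ticktype = "detailed", xlab = "x1", ylab = "x2", zlab = "f(x)")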

The figure curves downward toward the centre, suggesting that the minimum might be at (0,0). We can confirm this by rewriting the function as \(f(\mathbf{x}) = \sum_{i=1}^n [A(1-\cos(2\pi x_i)) + x_i^2]\); each term is non-negative and equals zero only at \(x_i = 0\). Since \(f(\mathbf{0}) = 0\) is the least value a sum of non-negative terms can attain, the origin is a global minimum.

Optimisation using optim()

Different starting points

Starting from randomly chosen coordinates, we try to find a local minimum of the Rastrigin surface with the optim() optimiser, using its default "Nelder-Mead" method.
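A sketch of this experiment; the seed and the number of starting points are choices made here, so the values will not match the table below exactly.

set.seed(1)
starts <- matrix(runif(10, -5.12, 5.12), ncol = 2)  # 5 random starting points
res_nm <- t(apply(starts, 1, function(p) {
  fit <- optim(p, rastrigin)  # method = "Nelder-Mead" is the default
  c(p1_init = p[1], p2_init = p[2], value_init = rastrigin(p),
    p1_optim = fit$par[1], p2_optim = fit$par[2], value_optim = fit$value)
}))
round(res_nm, 5)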

##      p1_init   p2_init value_init   p1_optim  p2_optim value_optim
## 1 -2.4011913  4.079510   41.76422 -1.9899000  3.979759    19.89908
## 2 -1.3094513  4.553475   55.53880 -0.9949912  4.974703    25.86868
## 3  0.7460184  1.646569   29.56874  0.9949811  1.989927     4.97479
## 4  4.1800478  1.322128   39.34412  3.9797852  1.989949    19.89908
## 5 -3.0547770 -4.487309   50.02228 -2.9848159 -3.979761    24.87385

Different methods

We then vary the method for the same starting point and compare the minima found, the number of function calls needed, and the time taken for 10 replications:
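A sketch of the loop over methods, using the first starting point shown above (to displayed precision); timing via 10 replications with replicate() is one possible way to measure it.

start <- c(-2.4011913, 4.0795100)  # first starting point above
methods <- c("Nelder-Mead", "BFGS", "CG", "L-BFGS-B", "SANN")
res_methods <- sapply(methods, function(m) {
  fit <- optim(start, rastrigin, method = m)
  elapsed <- system.time(replicate(10, optim(start, rastrigin, method = m)))["elapsed"]
  c(p1 = fit$par[1], p2 = fit$par[2], value = fit$value, time = unname(elapsed))
})
t(res_methods)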

##                        p1           p2    value  time
## initial     -2.401191e+00 4.079510e+00 41.76422 0.000
## Nelder-Mead -1.989900e+00 3.979759e+00 19.89908 0.001
## BFGS        -2.984856e+00 2.984854e+00 17.90920 0.001
## CG          -3.441730e-09 8.088158e-10  0.00000 0.002
## L-BFGS-B    -1.989912e+00 3.979784e+00 19.89907 0.001
## SANN        -1.987232e+00 1.991254e+00  7.96144 0.133

We can observe that the methods vary considerably in the local optima they find, as well as in the number of function calls needed. optim() does not accept multiple methods in a single call, so we have to loop over them.

Other local optimisers

ucminf()

We use the ucminf optimiser to find minima starting from the same points as the previous optim() (Nelder-Mead) runs.
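A sketch of the ucminf runs, reusing starts and rastrigin() from above (requires the ucminf package):

library(ucminf)
res_ucminf <- t(apply(starts, 1, function(p) {
  fit <- ucminf(p, rastrigin)
  c(p1_init = p[1], p2_init = p[2], value_init = rastrigin(p),
    p1_optim = fit$par[1], p2_optim = fit$par[2], value_optim = fit$value)
}))
round(res_ucminf, 5)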

##      p1_init   p2_init value_init      p1_optim   p2_optim value_optim
## 1 -2.4011913  4.079510   41.76422 -1.989913e+00  3.9797819    19.89907
## 2 -1.3094513  4.553475   55.53880  1.989911e+00  4.9746889    28.85355
## 3  0.7460184  1.646569   29.56874  9.949581e-01  1.9899112     4.97479
## 4  4.1800478  1.322128   39.34412  3.979782e+00  0.9949581    16.91420
## 5 -3.0547770 -4.487309   50.02228 -1.275794e-08 -3.9797859    15.91924

In three of these five cases, we approach different minima from the ones we reached using optim().

optimx()

We can use optimx() to run several optimisation methods with far fewer lines of code and compare the results:
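A sketch of a single optimx() call covering several methods (assumes the optimx and ucminf packages are installed; rastrigin() as above):

library(optimx)
optimx(par = c(-2.4011913, 4.0795100), fn = rastrigin,
       method = c("Nelder-Mead", "BFGS", "CG", "L-BFGS-B", "ucminf", "nlm", "nlminb"))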

##                        p1           p2     value xtime
## Nelder-Mead -1.989900e+00 3.979759e+00 19.899075 0.000
## BFGS        -2.984856e+00 2.984854e+00 17.909202 0.000
## CG          -3.441730e-09 8.088158e-10  0.000000 0.000
## L-BFGS-B    -1.989912e+00 3.979784e+00 19.899075 0.000
## ucminf      -1.989913e+00 3.979782e+00 19.899075 0.000
## nlm          9.949581e-01 9.949581e-01  1.989918 0.001
## nlminb      -1.989912e+00 3.979784e+00 19.899075 0.001

While four of the methods converge to the same local minimum, BFGS and nlm each find a better local minimum, and the CG method reaches the global minimum. All methods take a negligible amount of time to converge.

Stochastic Optimisation

I use three different global optimisers: DEoptim, GenSA and psoptim, which are listed under different classes of methods in [^2]. The three are based on different algorithms: Differential Evolution, Generalized Simulated Annealing, and Particle Swarm Optimisation.

We compare them in terms of precision achieved and time taken for 10 replications (using default function arguments).
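A sketch of the three runs with default settings (packages DEoptim, GenSA and pso; the search box [-5.12, 5.12]^2 is an assumption made here):

library(DEoptim); library(GenSA); library(pso)
lower <- rep(-5.12, 2); upper <- rep(5.12, 2)

de <- DEoptim(rastrigin, lower, upper, control = DEoptim.control(trace = FALSE))
sa <- GenSA(fn = rastrigin, lower = lower, upper = upper)
ps <- psoptim(rep(NA, 2), rastrigin, lower = lower, upper = upper)

rbind(DEoptim = c(de$optim$bestmem, value = de$optim$bestval),
      GenSA   = c(sa$par, value = sa$value),
      psoptim = c(ps$par, value = ps$value))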

##                    p1            p2       value  time
## DEoptim -1.357104e-06 -8.588545e-06 1.49994e-08 0.133
## GenSA    2.843176e-10  1.177885e-10 0.00000e+00 0.983
## psoptim  3.376046e-09 -6.880894e-10 0.00000e+00 1.802

We can observe that DEoptim is the fastest optimiser under the respective default settings, while GenSA and psoptim are slower but more precise.

DEoptim

For DEoptim, tuning the function arguments NP (number of population members) and itermax (maximum number of iterations) may help to obtain better estimates of the global minimum. The default value of NP is 50 and that of itermax is 250.
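A sketch of two such runs (lower, upper and rastrigin() as above; trace is switched off only to suppress per-iteration output):

de_np <- DEoptim(rastrigin, lower, upper,
                 control = DEoptim.control(NP = 75, trace = FALSE))
de_it <- DEoptim(rastrigin, lower, upper,
                 control = DEoptim.control(itermax = 375, trace = FALSE))
c("NP = 75" = de_np$optim$bestval, "itermax = 375" = de_it$optim$bestval)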

##                         p1            p2        value  time
## default       6.402712e-06 -2.388698e-06 9.265030e-09 0.134
## NP=75         3.107653e-07  8.397239e-08 2.055955e-11 0.436
## NP=200        2.216914e-06 -1.503676e-06 1.423611e-09 1.154
## itermax=375  -9.068795e-10  4.398613e-09 3.552714e-15 0.255
## itermax=1000 -9.581122e-11  4.609981e-10 0.000000e+00 0.672

We can see that the values obtained by increasing either NP or itermax are closer to the origin than under the defaults. However, the optimiser converges better and faster for a change in itermax than for a proportionate change in NP; it is more sensitive to changes in itermax.

GenSA

Here, we can tune maxit (maximum number of iterations) and temperature (a function argument that controls the probability of accepting worse solutions during the search process).
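A sketch of how these are passed through GenSA's control list (lower, upper and rastrigin() as above; the tuning values shown are examples):

sa_maxit <- GenSA(fn = rastrigin, lower = lower, upper = upper,
                  control = list(maxit = 375))
sa_temp  <- GenSA(fn = rastrigin, lower = lower, upper = upper,
                  control = list(temperature = 10))
c("maxit = 375" = sa_maxit$value, "temperature = 10" = sa_temp$value)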

##                              p1            p2     value  time
## default           -2.827916e-12  5.476625e-13 0.0000000 0.984
## temperature = 1    1.391409e-14 -7.015557e-13 0.0000000 1.267
## temperature = 10  -2.460481e-09  5.711768e-10 0.0000000 1.068
## temperature = 100  8.868914e-12 -3.636632e-11 0.0000000 1.064
## maxit = 100        1.918515e-10  9.949586e-01 0.9949591 0.013
## maxit = 375       -1.061417e-12  2.191422e-12 0.0000000 0.042
## maxit = 1000      -7.361776e-13  6.851085e-13 0.0000000 0.108

In the case of GenSA, the estimated optimum produced by the default function arguments is already quite good, yielding a function value that underflows to 0.

It is also quite slow. The package authors note: “The default values of the control components are set for a complex optimization problem. For usual optimization problem with medium complexity, GenSA can find a reasonable solution quickly so the user is recommended to let GenSA stop earlier by setting threshold.stop.”

Evaluating the estimates in terms of their distance from the origin, increasing maxit does seem to improve them. For temperature, however, no consistent (monotonic) pattern is visible. The GenSA package documentation notes: "For very complex optimization problems, the user is recommended to increase maxit and temp." It is possible that the low dimensionality of the space we are considering makes our Rastrigin problem relatively simple, so tweaking the temperature does not yield much benefit.

psoptim

As with the previous two optimisers, maxit (default 1000) can be increased to obtain a better estimate. The swarm size s (default 12) is also a hyperparameter, and we could also try the non-default SPSO2011 method.
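A sketch of the corresponding psoptim() calls (lower, upper and rastrigin() as above; the tuning values shown are examples):

ps_maxit <- psoptim(rep(NA, 2), rastrigin, lower = lower, upper = upper,
                    control = list(maxit = 500))
ps_s     <- psoptim(rep(NA, 2), rastrigin, lower = lower, upper = upper,
                    control = list(s = 6))
ps_2011  <- psoptim(rep(NA, 2), rastrigin, lower = lower, upper = upper,
                    control = list(type = "SPSO2011"))
c("maxit = 500" = ps_maxit$value, "s = 6" = ps_s$value, SPSO2011 = ps_2011$value)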

##                        p1            p2        value  time
## default      2.611627e-10  1.622116e-09 0.000000e+00 1.834
## maxit = 100  1.156110e-05 -2.148004e-07 2.652605e-08 0.181
## maxit = 500 -2.022985e-09 -1.094271e-10 0.000000e+00 0.903
## s = 3       -1.031261e-09  9.949586e-01 9.949591e-01 0.529
## s = 6        8.316656e-10  4.021507e-10 0.000000e+00 0.961
## SPSO2011     3.185646e-09 -9.572976e-11 0.000000e+00 7.426

We can observe that increasing either maxit or s improves the accuracy of the estimate. The SPSO2011 method is not much better here but is significantly slower.

Other functions

Rosenbrock function

We can implement the Rosenbrock function for n dimensions, and plot it for 2D inputs:

\(f(\mathbf{x}) = \sum_{i=1}^{n-1} [ 100(x_{i+1} - x_i^2)^2 + (1-x_i)^2 ]\)
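One possible n-dimensional implementation (the function name is a choice made here):

rosenbrock <- function(x) {
  n <- length(x)
  sum(100 * (x[-1] - x[-n]^2)^2 + (1 - x[-n])^2)
}
rosenbrock(c(1, 1))  # 0 at the global minimum (1, ..., 1)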

Styblinski–Tang function

We now implement the Styblinski-Tang function in n dimensions, given by:

\(f(x) = \frac{1}{2} \sum_{i=1}^n (x_i^4 - 16 x_i^2 + 5x_i)\)
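A corresponding sketch (the function name is a choice made here):

styblinski_tang <- function(x) {
  0.5 * sum(x^4 - 16 * x^2 + 5 * x)
}
styblinski_tang(rep(-2.903534, 2))  # about -78.33, near the global minimum in 2D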

References

[^1] CRAN Task View: Optimization and Mathematical Programming

[^2] Continuous Global Optimization in R