Nelder-Mead algorithm
From Scholarpedia
| This article is undergoing 1 initial review (1 completed); It may contain inaccuracies and unapproved changes made by anonymous reviewers. | ||||||||||||||||||||
Author: Dr. Saša Singer, Department of Mathematics, University of Zagreb, Croatia
Author: Dr. John Nelder, Imperial College, London, UK
The Nelder-Mead algorithm or simplex search algorithm, originally published in 1965 (Nelder and Mead, 1965), is one of the best known algorithms for multidimensional unconstrained optimization without derivatives.
The basic algorithm is quite simple to understand and very easy to use. For these reasons, it is very popular in many fields of science and technology, especially in chemistry and medicine.
The method does not require any derivative information, which makes it suitable for problems with non-smooth functions. It is widely used to solve parameter estimation and similar statistical problems, where the function values are uncertain or subject to noise. It can also be used for problems with discontinuous functions, which occur frequently in statistics and experimental mathematics.
Contents |
Basic description
The Nelder-Mead algorithm is designed to solve the classical
unconstrained optimization problem of minimizing a given nonlinear
function
.
The method
- uses only function values at some points in
,
- without even trying to form an approximate gradient at any of these points.
Hence it belongs to the general class of direct search methods (see Wright, 1996; Powell, 1998; Torczon, 1997).
Like many other direct search methods, the Nelder-Mead method
is simplex-based. A simplex
in
is defined as the convex hull of
vertices
.
For example, a simplex in
is a triangle,
and a simplex in
is a tetrahedron.
|
|
A simplex-based direct search method begins with a set of
points
that are considered as the
vertices of a working simplex
,
and the corresponding set of
function values at the vertices
, for
.
The initial working simplex
has to be nondegenerate, i.e.,
the points
should not lie in the same hyperplane.
The method then performs a sequence of transformations of the working
simplex
, aimed at decreasing the function values at its vertices.
In each step, the transformation is determined by
computing one or more test points, together with their function values,
and by comparison of these function values with those at the vertices.
This process is terminated when the working simplex
becomes
sufficiently small in some sense, or when the function values
are
close enough in some sense (provided
is continuous).
The Nelder-Mead algorithm typically requires only one or
two function evaluations per step, while many other direct search
methods use
or even more function evaluations for the simplex
transformation in each step.
The rest of the article is largely based on Wright (1996) and Lagarias et al. (1998), which are excellent starting references on the Nelder-Mead method.
Origins and history
Direct search methods first appeared in the 1950s and early 1960s with the growing use of computers to fit experimental data. The name “direct search” was introduced in 1961 by Hooke and Jeeves.
The first simplex-based direct search method was proposed by Spendley, Hext and Himsworth in 1962 (Spendley et al., 1962). It uses only two types of transformations to form a new simplex in each step:
- reflection away from the worst vertex (the one with the largest function value), or
- shrinkage towards the best vertex (the one with the smallest function value).
In these transformations, the angles between edges in every simplex remain constant throughout the iterations, so the working simplex can change in size, but not in shape.
In 1965, Nelder and Mead modified the original method of Spendley et al. by including two additional transformations—expansion and contraction, that allow the working simplex to change not only in size, but also in shape.
A figurative motivation is given in the introduction to their paper:
In the method to be described the simplex adapts itself to the local landscape, elongating down long inclined planes, changing direction on encountering a valley at an angle, and contracting in the neighbourhood of a minimum.
The Nelder-Mead simplex method gained popularity very quickly. At that time, due to its simplicity and low storage requirements, it was ideally suited for use on minicomputers, especially in laboratories. In the 1970s, the method became a standard member of several major software libraries. Its popularity grew even more in the 1980s, when it first appeared as the “amoeba algorithm” in the widely used handbook Numerical Recipes (Press et al., second edition, 1992), and in Matlab software package, where it is now called “fminsearch” (Matlab, 2004).
Despite its age and recent advances in direct search methods, the Nelder-Mead method it is still the most popular direct search method in practice.
The Nelder-Mead simplex algorithm
Even though the method is quite simple, it is implemented in many different ways. Apart from some minor computational details in the basic algorithm, the main difference between various implementations lies in the construction of the initial simplex, and in the selection of convergence or termination tests used to break the iteration process. The general algorithm is given below.
- Construct the initial working simplex
.
- Repeat the following steps until the termination test is satisfied:
- calculate the termination test information;
- if the termination test is not satisfied then transform the working simplex.
- Return the best vertex of the current simplex
and the associated function value.
Initial simplex
The initial simplex
is usually constructed by generating
vertices
around a given input
point
.
In practice, the most frequent choice is
to allow
proper restarts of the algorithm. The remaining
vertices are then generated
to obtain one of two standard shapes of
:
-
is right-angled at
, based on coordinate axes, or 
where
is the stepsize in the direction of unit vector
in
.
-
is a regular simplex, where all edges have the same specified length.
Simplex transformation algorithm
One iteration of the Nelder-Mead method consists of the following three steps.
- Ordering: Determine the indices
of the worst, second worst and the best vertex, respectively, in the current working simplex 

In some implementations, the vertices of
are ordered with respect to the function values, to satisfy
. Then
,
, and
. Consistent tie-breaking rules for this ordering were given by Lagarias et al. (1998).
- Centroid: Calculate the centroid
of the best side—this is the one opposite to the worst vertex 
- Transformation: Compute the new working simplex from the current one. First, try to replace only the worst vertex
with a better point by using reflection, expansion or contraction with respect to the best side. All test points lie on the line defined by
and
, and at most two of them are computed in one iteration. If this succeeds, the accepted point becomes the new vertex of the working simplex. If this fails, shrink the simplex towards the best vertex
. In this case,
new vertices are computed.
Simplex transformations in the Nelder-Mead method are controlled by four
parameters:
for reflection,
for contraction,
for expansion and
for shrinkage.
They should satisfy the following constraints
The standard values, used in most implementations, are
A slightly different choice was suggested by Parkinson and Hutchinson (1972).
An alternate notation for these four parameters:
,
,
,
, respectively, is used in
Lagarias et al. (1998) and Matlab implementation.
The following algorithm describes the working simplex transformations in step 3 (see Wright, 1996), and the effects of various transformations are shown in the corresponding figures. The new working simplex is shown in red.
- Reflect: Compute the reflection point
and
. If
, accept
and terminate the iteration.
- Expand: If
, compute the expansion point
and
. If
, accept
and terminate the iteration. Otherwise (if
), accept
and terminate the iteration.
This “greedy minimization” approach includes the better of the two points
,
in the new simplex, and the simplex is expanded only if
. It is used in most implementations, and in theory (Lagarias et al., 1998).
The original Nelder-Mead paper uses “greedy expansion”, where
is accepted if
and
, regardless of the relationship between
and
. It may happen that
, so
would be a better new point than
, and
is still accepted for the new simplex. The working simplex is kept as large as possible, to avoid premature termination of iterations, which is sometimes useful for non-smooth functions (see, for example, Rowan, 1990).
- Contract: If
, compute the contraction point
by using the better of the two points
and
.
- Outside: If
, compute
and
. If
, accept
and terminate the iteration.
Otherwise, perform a shrink transformation. - Inside: If
, compute
and
. If
, accept
and terminate the iteration.
Otherwise, perform a shrink transformation.
- Outside: If
|
|
- Shrink: Compute
new vertices
and
, for
, with
.
Termination tests
The Nelder-Mead algorithm must terminate in a finite number of
steps or iterations. Quite generally, the termination test is composed of
three different parts:
,
and
.
-
is the domain convergence or termination test. It becomes true when the working simplex
is sufficiently small in some sense (some or all vertices
are close enough).
-
is the function value convergence test. It becomes true when (some or all) function values
are close enough in some sense.
-
is the no convergence test. It becomes true if the number of iterations or function evaluations exceeds some prescribed maximum allowed value.
The algorithm terminates as soon as at least one of these tests becomes true.
Various forms of
and
tests have
been used in practice, and some common implementations of the algorithm
have only one of these two tests
(see Singer and Singer, 2001, and Singer and Singer, 2004, for a survey).
If the algorithm is expected to work for discontinuous functions
,
then it must have some form of a
test.
This test is also useful for continuous functions, when a reasonably accurate
minimizing point is required, in addition to the minimal function value.
In such cases, a
test is only a safeguard for “flat”
functions.
Efficient implementation
After so many years of numerical experience with the Nelder-Mead method, there is an overwhelming evidence that slow shrink transformations almost never happen in practice. Therefore, a typical nonshrink iteration of the algorithm is extremely fast, since it computes only one or two test points and the associated function values. Moreover, in this case, the new working simplex contains only one new vertex—the accepted point that replaces the worst vertex in the old simplex. As a consequence, the first two steps in the next iteration can be performed more efficiently than in the obvious implementation:
- the ordering of the vertices can be updated in linear time (at most
comparisons) by one step of straight insertion sort, and
- the new centroid can also be computed by updating the previous one in
operations, with almost no additional storage.
See Singer and Singer (2004) for details of this implementation.
A fairly simple efficiency analysis of one Nelder-Mead iteration by Singer and Singer (2001) revealed a potential computational bottleneck in the domain convergence test. It becomes a serious problem if each function evaluation is reasonably fast with respect to the whole iteration. To circumvent this problem, Singer and Singer (2004) proposed a simple and efficient domain convergence test based on tracking the relative “linearized volume” of the working simplex. This paper also shows how much can be gained by efficient implementation of various steps in the Nelder-Mead algorithm.
Convergence
Rigorous analysis of the Nelder-Mead method seems to be a very hard mathematical problem. Known convergence results for simplex-based direct search methods rely on one or both of the following properties:
- (a) The angles between adjacent edges of the working simplex are uniformly bounded away from
and
throughout the iterations, i.e., the simplex remains uniformly nondegenerate (see Torczon, 1997).
- (b) Some form of “sufficient” descent condition for function values at the vertices is required at each iteration.
In general, the original Nelder-Mead method does not satisfy either of these properties. Deliberately by design, the shape of the working simplex can almost degenerate while “adapting itself to the local landscape”, and the method uses only simple decrease of function values at the vertices to transform the simplex. Hence, very little is known about the convergence properties of the method—with mainly negative results.
McKinnon (1998) gave a family of strictly convex functions and a class of initial simplices in two dimensions for which all vertices of the working simplex converge to a nonminimizing point.
A very readable paper by Lagarias et al. (1998) contains several convergence results in one and two dimensions for strictly convex functions with bounded level sets.
Almost nothing is known about the behaviour of the method for less smooth or discontinuous functions—except, of course, that the method may fail to converge to a minimizing point.
Advantages and disadvantages
In many practical problems, like parameter estimation and process control, the function values are uncertain or subject to noise. Therefore, a highly accurate solution is not necessary, and it may be impossible to compute. All that is desired is an improvement in function value, rather than full optimization.
The Nelder-Mead method frequently gives significant
improvements in the first few iterations and quickly produces
quite satisfactory results.
Also, the method typically requires only one or two
function evaluations per iteration, except in shrink transformations,
which are extremely rare in practice.
This is very important in applications where each function evaluation
is very expensive or time-consuming. For such problems, the method is
substantially faster than many other methods that require
at least
function evaluations in each iteration.
- In its best-case performance, the Nelder-Mead method beats most of its competitors.
Apart from being simple to understand and use, this is the main reason for its popularity in practice.
On the other hand, the lack of convergence theory is often reflected in practice as a numerical breakdown of the algorithm, even for smooth and well-behaved functions.
- The method can take an enormous amount of iterations with negligible improvement in function value, despite being nowhere near to a minimum.
This usually results in premature termination of iterations. A heuristic approach to deal with such cases is to restart the algorithm several times, with reasonably small number of allowed iterations per each run.
Modifications and related methods
There are many modifications of the original Nelder-Mead method. Most of them are aimed at improving the worst-case performance of the method with regards to convergence, while trying to retain some or most of its best-case efficiency.
Typically, such modifications incorporate additional tests at some stages of the algorithm, to ensure that at least one of the properties (a), (b) from above are satisfied. For example, a provably convergent variant of the Nelder-Mead method by Price et al. (2002) uses a combination of (a) and (b). Another convergent variant by Bürmen et al. (2006) uses the principle of grid restrainment, which is a form of (a).
Other simplex-based direct search methods that have strong convergence properties have been proposed since the late 1970s. These include the multidirectional search method (see Torczon, 1991; Dennis and Torczon, 1991), and the whole class of pattern search methods (see Torczon, 1997).
A survey article by Powell (1998) gives the state of the art in general direct search methods at that time.
References
- Bürmen, A., Puhan, J., and Tuma, T. (2006), “Grid Restrained Nelder-Mead Algorithm”, Comput. Optim. Appl. 34, pp. 359–375.
- Dennis, J.E. and Torczon, V.J. (1991), “Direct Search Methods on Parallel Machines”, SIAM. J. Optim. 1, pp. 448–474.
- Hooke, R. and Jeeves, T.A. (1961), “Direct Search Solution of Numerical and Statistical Problems”, J. ACM 8, pp. 212–229.
- Lagarias, J.C., Reeds, J.A., Wright, M.H., and Wright, P.E. (1998), “Convergence Properties of the Nelder-Mead Simplex Method in Low Dimensions”, SIAM. J. Optim. 9, pp. 112–147.
- The MathWorks, Inc. (2004), “MATLAB Language Reference Manual, Version 7”, Natick, Massachusetts.
- McKinnon, K.I.M. (1998), “Convergence of the Nelder-Mead Simplex Method to a Nonstationary Point”, SIAM. J. Optim. 9, pp. 148–158.
- Nelder, J.A. and Mead, R. (1965), “A simplex method for function minimization”, Comput. J., 7, pp. 308–313.
- Parkinson, J.M. and Hutchinson, D. (1972), “An investigation into the efficiency of variants on the simplex method”, in Numerical Methods for Nonlinear Optimization, F.A. Lootsma (Ed.), Academic Press, New York, pp. 115–135.
- Powell, M.J.D. (1998), “Direct search algortihms for optimization calculations”, in Acta Numerica 1998, A. Iserles (Ed.), Cambridge University Press, Cambridge, UK, pp. 287–336.
- Press, W.H., Teukolsky, S.A., Vetterling, W.T., and Flannery, B.P. (1992), Numerical Recipes in FORTRAN: The Art of Scientific Computing, (second ed.), Cambridge University Press, Cambridge, New York.
- Price, C.J., Coope, I.D., and Byatt, D. (2002), “A Convergent Variant of the Nelder-Mead Algorithm”, J. Optim. Theory Appl. 113, No. 1, pp. 5–19.
- Rowan, T.H. (1990), “Functional Stability Analysis of Numerical Algorithms”, Ph.D. thesis, University of Texas, Austin.
- Singer, Sanja and Singer, Saša (2001), “Complexity Analysis of Nelder-Mead Search Iterations”, in Proceedings of the 1. Conference on Applied Mathematics and Computation, Dubrovnik, Croatia, 1999, M. Rogina, V. Hari, N. Limić, and Z. Tutek (Eds.), PMF–Matematički odjel, Zagreb, pp. 185–196.
- Singer, Saša and Singer, Sanja (2004), “Efficient Implementation of the Nelder-Mead Search Algorithm”, Appl. Numer. Anal. Comput. Math.} 1, No. 3, pp. 524–534.
- Spendley, W., Hext, G.R., and Himsworth, F.R. (1962), “Sequential Application of Simplex Designs in Optimisation and Evolutionary Operation”, Technometrics, Vol. 4, pp. 441–461.
- Torczon, V.J. (1991), “On the Convergence of the Multidirectional Search Algorithm”, SIAM. J. Optim. 1, pp. 123–145.
- Torczon, V.J. (1997), “On the Convergence of Pattern Search Algorithms”, SIAM. J. Optim. 7, pp. 1–25.
- Wright, M.H. (1996), “Direct Search Methods: Once Scorned, Now Respectable”, in Numerical Analysis 1995, Proceedings of the 1995 Dundee Biennial Conference in Numerical Analysis, D.F. Griffiths and G.A. Watson (Eds.), Addison Wesley Longman, Harlow, UK, pp. 191–208.







