Nelder-Mead algorithm
From Scholarpedia
| Saša Singer and John Nelder (2009), Scholarpedia, 4(7):2928. | doi:10.4249/scholarpedia.2928 | revision #66073 [link to/cite this article] | |||||||||||||||||||
| Hosting and maintenance of this article is sponsored by Brain Corporation. |
Curator: Dr. Saša Singer, Department of Mathematics, University of Zagreb, Croatia
Curator: Dr. John Nelder, Imperial College, London, UK
The Nelder-Mead algorithm or simplex search algorithm, originally published in 1965 (Nelder and Mead, 1965), is one of the best known algorithms for multidimensional unconstrained optimization without derivatives. This method should not be confused with Dantzig's simplex method for linear programming, which is completely different, as it solves a linearly constrained linear problem.
The basic algorithm is quite simple to understand and very easy to use. For these reasons, it is very popular in many fields of science and technology, especially in chemistry and medicine.
The method does not require any derivative information, which makes it suitable for problems with non-smooth functions. It is widely used to solve parameter estimation and similar statistical problems, where the function values are uncertain or subject to noise. It can also be used for problems with discontinuous functions, which occur frequently in statistics and experimental mathematics.
Contents |
Basic description
The Nelder-Mead algorithm is designed to solve the classical
unconstrained optimization problem of minimizing a given nonlinear
function
.
The method
- uses only function values at some points in
, and
- does not try to form an approximate gradient at any of these points.
Hence it belongs to the general class of direct search methods (see Wright, 1996; Powell, 1998; Kolda et al., 2003).
The Nelder-Mead method is simplex-based. A simplex
in
is defined as the convex hull of
vertices
.
For example, a simplex in
is a triangle,
and a simplex in
is a tetrahedron.
|
|
A simplex-based direct search method begins with a set of
points
that are considered as the
vertices of a working simplex
,
and the corresponding set of
function values at the vertices
, for
.
The initial working simplex
has to be nondegenerate, i.e.,
the points
must not lie in the same hyperplane.
The method then performs a sequence of transformations of the working
simplex
, aimed at decreasing the function values at its vertices.
At each step, the transformation is determined by
computing one or more test points, together with their function values,
and by comparison of these function values with those at the vertices.
This process is terminated when the working simplex
becomes
sufficiently small in some sense, or when the function values
are
close enough in some sense (provided
is continuous).
The Nelder-Mead algorithm typically requires only one or
two function evaluations at each step, while many other direct search
methods use
or even more function evaluations.
The rest of the article is largely based on Wright (1996) and Lagarias et al. (1998), which are excellent starting references for the Nelder-Mead method.
Origins and history
Direct search methods first appeared in the 1950s and early 1960s with the growing use of computers to fit experimental data. The name “direct search” was introduced in 1961 by Hooke and Jeeves.
The first simplex-based direct search method was proposed by Spendley, Hext and Himsworth in 1962 (Spendley et al., 1962). It uses only two types of transformations to form a new simplex in each step:
- reflection away from the worst vertex (the one with the largest function value), or
- shrinkage towards the best vertex (the one with the smallest function value).
In these transformations, the angles between edges in every simplex remain constant throughout the iterations, so the working simplex can change in size, but not in shape.
In 1965, Nelder and Mead modified the original method of Spendley et al. by including two additional transformations—expansion and contraction, that allow the working simplex to change not only its size, but also its shape.
A figurative motivation is given in the introduction to their paper:
In the method to be described the simplex adapts itself to the local landscape, elongating down long inclined planes, changing direction on encountering a valley at an angle, and contracting in the neighbourhood of a minimum.
The Nelder-Mead simplex method gained popularity very quickly. At that time, due to its simplicity and low storage requirements, it was ideally suited for use on minicomputers, especially in laboratories. In the 1970s, the method became a standard member of several major software libraries. Its popularity grew even more in the 1980s, when it first appeared as the “amoeba algorithm” in the widely used handbook Numerical Recipes (Press et al., second edition, 1992), and in Matlab software package, where it is now called “fminsearch” (Matlab, 2008).
Despite its age and recent advances in direct search methods, the Nelder-Mead method is still among the most popular direct search methods in practice.
The Nelder-Mead simplex algorithm
Even though the method is quite simple, it is implemented in many different ways. Apart from some minor computational details in the basic algorithm, the main difference between various implementations lies in the construction of the initial simplex, and in the selection of convergence or termination tests used to end the iteration process. The general algorithm is given below.
- Construct the initial working simplex
.
- Repeat the following steps until the termination test is satisfied:
- calculate the termination test information;
- if the termination test is not satisfied then transform the working simplex.
- Return the best vertex of the current simplex
and the associated function value.
Initial simplex
The initial simplex
is usually constructed by generating
vertices
around a given input
point
.
In practice, the most frequent choice is
to allow
proper restarts of the algorithm. The remaining
vertices are then generated
to obtain one of two standard shapes of
:
-
is right-angled at
, based on coordinate axes, or 
where
is a stepsize in the direction of unit vector
in
.
-
is a regular simplex, where all edges have the same specified length.
Simplex transformation algorithm
One iteration of the Nelder-Mead method consists of the following three steps.
- Ordering: Determine the indices
of the worst, second worst and the best vertex, respectively, in the current working simplex 

In some implementations, the vertices of
are ordered with respect to the function values, to satisfy
. Then
,
, and
. Consistent tie-breaking rules for this ordering were given by Lagarias et al. (1998).
- Centroid: Calculate the centroid
of the best side—this is the one opposite the worst vertex 
- Transformation: Compute the new working simplex from the current one. First, try to replace only the worst vertex
with a better point by using reflection, expansion or contraction with respect to the best side. All test points lie on the line defined by
and
, and at most two of them are computed in one iteration. If this succeeds, the accepted point becomes the new vertex of the working simplex. If this fails, shrink the simplex towards the best vertex
. In this case,
new vertices are computed.
Simplex transformations in the Nelder-Mead method are controlled by four
parameters:
for reflection,
for contraction,
for expansion and
for shrinkage.
They should satisfy the following constraints
The standard values, used in most implementations, are
A slightly different choice was suggested by Parkinson and Hutchinson (1972).
An alternate notation for these four parameters:
,
,
,
, respectively, is used in
Lagarias et al. (1998) and Matlab implementation.
The following algorithm describes the working simplex transformations in step 3 (see Wright, 1996), and the effects of various transformations are shown in the corresponding figures. The new working simplex is shown in red.
- Reflect: Compute the reflection point
and
. If
, accept
and terminate the iteration.
- Expand: If
, compute the expansion point
and
. If
, accept
and terminate the iteration. Otherwise (if
), accept
and terminate the iteration.
This “greedy minimization” approach includes the better of the two points
,
in the new simplex, and the simplex is expanded only if
. It is used in most implementations, and in theory (Lagarias et al., 1998).
The original Nelder-Mead paper uses “greedy expansion”, where
is accepted if
and
, regardless of the relationship between
and
. It may happen that
, so
would be a better new point than
, and
is still accepted for the new simplex. The working simplex is kept as large as possible, to avoid premature termination of iterations, which is sometimes useful for non-smooth functions (see, for example, Rowan, 1990).
- Contract: If
, compute the contraction point
by using the better of the two points
and
.
- Outside: If
, compute
and
. If
, accept
and terminate the iteration.
Otherwise, perform a shrink transformation. - Inside: If
, compute
and
. If
, accept
and terminate the iteration.
Otherwise, perform a shrink transformation.
- Outside: If
|
|
- Shrink: Compute
new vertices
and
, for
, with
.
The shrink transformation was introduced to prevent the algorithm from failing in the following case, described by the quote from the original paper:
A failed contraction is much rarer, but can occur when a valley is curved and one point of the simplex is much farther from the valley bottom than the others; contraction may then cause the reflected point to move away from the valley bottom instead of towards it. Further contractions are then useless. The action proposed contracts the simplex towards the lowest point, and will eventually bring all points into the valley.
Termination tests
A practical implementation of the Nelder-Mead method must include
a test that ensures termination in a finite amount of time.
The termination test is often composed of three different parts:
,
and
.
-
is the domain convergence or termination test. It becomes true when the working simplex
is sufficiently small in some sense (some or all vertices
are close enough).
-
is the function-value convergence test. It becomes true when (some or all) function values
are close enough in some sense.
-
is the no-convergence test. It becomes true if the number of iterations or function evaluations exceeds some prescribed maximum allowed value.
The algorithm terminates as soon as at least one of these tests becomes true.
Various forms of
and
tests have
been used in practice, and some common implementations of the algorithm
have only one of these two tests
(see Singer and Singer, 2001, and Singer and Singer, 2004, for a survey).
If the algorithm is expected to work for discontinuous functions
,
then it must have some form of a
test.
This test is also useful for continuous functions, when a reasonably accurate
minimizing point is required, in addition to the minimal function value.
In such cases, a
test is only a safeguard for “flat”
functions.
Postprocessing in parameter estimation problems
The Nelder-Mead method was primarily designed for statistical parameter estimation problems. The original paper also provides a method to calculate the curvature of the surface in the neighbourhood of the minimum by calculating the function values at the midpoints of the edges of the final simplex. This gives enough information to approximate the Hessian matrix of second derivatives. Its inverse then gives the approximate (co)variances for the parameter estimates.
Efficient implementation
After so many years of numerical experience with the Nelder-Mead method, there is overwhelming evidence that shrink transformations almost never happen in practice. Therefore, a typical nonshrink iteration of the algorithm is extremely fast, since it computes only one or two test points and the associated function values. Moreover, in this case, the new working simplex contains only one new vertex—the accepted point that replaces the worst vertex in the old simplex. As a consequence, the first two steps in the next iteration can be performed more efficiently than in the obvious implementation:
- the ordering of the vertices can be updated in linear time (at most
comparisons) by one step of straight insertion sort, and
- the new centroid can also be computed by updating the previous one in
operations, with almost no additional storage.
See Singer and Singer (2004) for details of this implementation.
An analysis of a single Nelder-Mead iteration by Singer and Singer (2001) revealed a potential computational bottleneck in the domain convergence test. It becomes a serious problem if each function evaluation is reasonably fast with respect to the whole iteration. To circumvent this problem, Singer and Singer (2004) proposed a simple and efficient domain convergence test based on tracking the relative “linearized volume” of the working simplex. This paper also shows how much can be gained by efficient implementation of various steps in the Nelder-Mead algorithm.
Convergence
Rigorous analysis of the Nelder-Mead method seems to be a very hard mathematical problem. Known convergence results for direct search methods (see Audet and Dennis, 2003; Price and Coope, 2003), in simplex terms, rely on one or both of the following properties:
- (a) The angles between adjacent edges of the working simplex are uniformly bounded away from
and
throughout the iterations, i.e., the simplex remains uniformly nondegenerate (see Torczon, 1997).
- (b) Some form of “sufficient” descent condition for function values at the vertices is required at each iteration.
In general, the original Nelder-Mead method does not satisfy either of these properties. By design, the shape of the working simplex can almost degenerate while “adapting itself to the local landscape”, and the method uses only simple decrease of function values at the vertices to transform the simplex. Hence, very little is known about the convergence properties of the method—with mainly negative results.
McKinnon (1998) gave a family of strictly convex functions and a class of initial simplices in two dimensions for which all vertices of the working simplex converge to a nonminimizing point.
A very readable paper by Lagarias et al. (1998) contains several convergence results in one and two dimensions for strictly convex functions with bounded level sets.
Almost nothing is known about the behaviour of the method for less smooth or discontinuous functions—except, of course, that the method may fail to converge to a minimizing point.
Advantages and disadvantages
In many practical problems, like parameter estimation and process control, the function values are uncertain or subject to noise. Therefore, a highly accurate solution is not necessary, and may be impossible to compute. All that is desired is an improvement in function value, rather than full optimization.
The Nelder-Mead method frequently gives significant
improvements in the first few iterations and quickly produces
quite satisfactory results.
Also, the method typically requires only one or two
function evaluations per iteration, except in shrink transformations,
which are extremely rare in practice.
This is very important in applications where each function evaluation
is very expensive or time-consuming. For such problems, the method is
often faster than other methods, especially those that require
at least
function evaluations per iteration.
- In many numerical tests, the Nelder-Mead method succeeds in obtaining a good reduction in the function value using a relatively small number of function evaluations.
Apart from being simple to understand and use, this is the main reason for its popularity in practice.
On the other hand, the lack of convergence theory is often reflected in practice as a numerical breakdown of the algorithm, even for smooth and well-behaved functions.
- The method can take an enormous amount of iterations with negligible improvement in function value, despite being nowhere near to a minimum.
This usually results in premature termination of iterations. A heuristic approach to deal with such cases is to restart the algorithm several times, with reasonably small number of allowed iterations per each run.
Modifications and related methods
There are many modifications of the original Nelder-Mead method. Most of them are aimed at improving the worst-case performance of the method with regards to convergence, while trying to retain some or most of its best-case efficiency.
Typically, such modifications incorporate additional tests at some stages of the algorithm, to ensure that at least one of the properties (a), (b) from above are satisfied. For example, a provably convergent variant of the Nelder-Mead method by Price et al. (2002) uses a combination of (a) and (b). Another convergent variant by Bürmen et al. (2006) uses the principle of grid restrainment, which is a form of (a).
Other direct search methods that have strong convergence properties have been proposed since the late 1970s. These include the multidirectional search method (see Torczon, 1991; Dennis and Torczon, 1991), generalized pattern search methods (see Torczon, 1997; Audet and Dennis, 2003), frame-based methods (see Coope and Price, 2001), and mesh adaptive direct search algorithms (see Audet and Dennis, 2006).
An excellent survey article by Kolda et al. (2003) gives the state of the art in general direct search methods at that time. An older survey is given by Powell (1998).
References
- Audet, C. and Dennis, J.E. Jr. (2003), “Analysis of Generalized Pattern Searches”, SIAM J. Optim. 13, pp. 889–903.
- Audet, C. and Dennis, J.E. Jr. (2006), “Mesh Adaptive Direct Search Algorithms for Constrained Optimization”, SIAM J. Optim. 17, pp. 188–217.
- Bürmen, A., Puhan, J., and Tuma, T. (2006), “Grid Restrained Nelder-Mead Algorithm”, Comput. Optim. Appl. 34, pp. 359–375.
- Coope, I.D. and Price, C.J. (2001), “On the Convergence of Grid-Based Methods for Unconstrained Optimization”, SIAM J. Optim. 11, pp. 859–869.
- Dennis, J.E. Jr. and Torczon, V. (1991), “Direct Search Methods on Parallel Machines”, SIAM J. Optim. 1, pp. 448–474.
- Hooke, R. and Jeeves, T.A. (1961), “Direct Search Solution of Numerical and Statistical Problems”, J. ACM 8, pp. 212–229.
- Kolda, T.G., Lewis, R.M., and Torczon, V. (2003), “Optimization by Direct Search: New Perspectives on Some Classical and Modern Methods”, SIAM Rev. 45, pp. 385–482.
- Lagarias, J.C., Reeds, J.A., Wright, M.H., and Wright, P.E. (1998), “Convergence Properties of the Nelder-Mead Simplex Method in Low Dimensions”, SIAM J. Optim. 9, pp. 112–147.
- The MathWorks, Inc. (2008), “MATLAB 7, Function Reference”, Natick, Massachusetts.
- McKinnon, K.I.M. (1998), “Convergence of the Nelder-Mead Simplex Method to a Nonstationary Point”, SIAM J. Optim. 9, pp. 148–158.
- Nelder, J.A. and Mead, R. (1965), “A simplex method for function minimization”, Comput. J., 7, pp. 308–313.
- Parkinson, J.M. and Hutchinson, D. (1972), “An investigation into the efficiency of variants on the simplex method”, in Numerical Methods for Nonlinear Optimization, F.A. Lootsma (Ed.), Academic Press, New York, pp. 115–135.
- Powell, M.J.D. (1998), “Direct search algortihms for optimization calculations”, in Acta Numerica 1998, A. Iserles (Ed.), Cambridge University Press, Cambridge, UK, pp. 287–336.
- Press, W.H., Teukolsky, S.A., Vetterling, W.T., and Flannery, B.P. (1992), Numerical Recipes in FORTRAN: The Art of Scientific Computing, (second ed.), Cambridge University Press, Cambridge, New York.
- Price, C.J. and Coope, I.D. (2003), “Frames and Grids in Unconstrained and Linearly Constrained Optimization: A Nonsmooth Approach”, SIAM J. Optim. 14, pp. 415–438.
- Price, C.J., Coope, I.D., and Byatt, D. (2002), “A Convergent Variant of the Nelder-Mead Algorithm”, J. Optim. Theory Appl. 113, No. 1, pp. 5–19.
- Rowan, T.H. (1990), “Functional Stability Analysis of Numerical Algorithms”, Ph.D. thesis, University of Texas, Austin.
- Singer, Sanja and Singer, Saša (2001), “Complexity Analysis of Nelder-Mead Search Iterations”, in Proceedings of the 1. Conference on Applied Mathematics and Computation, Dubrovnik, Croatia, 1999, M. Rogina, V. Hari, N. Limić, and Z. Tutek (Eds.), PMF–Matematički odjel, Zagreb, pp. 185–196.
- Singer, Saša and Singer, Sanja (2004), “Efficient Implementation of the Nelder-Mead Search Algorithm”, Appl. Numer. Anal. Comput. Math. 1, No. 3, pp. 524–534.
- Spendley, W., Hext, G.R., and Himsworth, F.R. (1962), “Sequential Application of Simplex Designs in Optimisation and Evolutionary Operation”, Technometrics, Vol. 4, pp. 441–461.
- Torczon, V. (1991), “On the Convergence of the Multidirectional Search Algorithm”, SIAM J. Optim. 1, pp. 123–145.
- Torczon, V. (1997), “On the Convergence of Pattern Search Algorithms”, SIAM J. Optim. 7, pp. 1–25.
- Wright, M.H. (1996), “Direct Search Methods: Once Scorned, Now Respectable”, in Numerical Analysis 1995, Proceedings of the 1995 Dundee Biennial Conference in Numerical Analysis, D.F. Griffiths and G.A. Watson (Eds.), Addison Wesley Longman, Harlow, UK, pp. 191–208.
Internal references
- Rob Schreiber (2007) MATLAB. Scholarpedia, 2(7):2929.
See also
Optimization, Direct search method, Matlab
| Saša Singer, John Nelder (2009) Nelder-Mead algorithm. Scholarpedia, 4(7):2928, (go to the first approved version) Created: 17 January 2007, reviewed: 16 July 2009, accepted: 16 July 2009 |







