Post-publication activity

Curator: John Nelder

The Nelder-Mead algorithm or simplex search algorithm, originally published in 1965 (Nelder and Mead, 1965), is one of the best known algorithms for multidimensional unconstrained optimization without derivatives. This method should not be confused with Dantzig's simplex method for linear programming, which is completely different, as it solves a linearly constrained linear problem.

The basic algorithm is quite simple to understand and very easy to use. For these reasons, it is very popular in many fields of science and technology, especially in chemistry and medicine.

The method does not require any derivative information, which makes it suitable for problems with non-smooth functions. It is widely used to solve parameter estimation and similar statistical problems, where the function values are uncertain or subject to noise. It can also be used for problems with discontinuous functions, which occur frequently in statistics and experimental mathematics.

## Basic description

The Nelder-Mead algorithm is designed to solve the classical unconstrained optimization problem of minimizing a given nonlinear function $$f : {\mathbb R}^n \to {\mathbb R}\ .$$ The method

• uses only function values at some points in $${\mathbb R}^n\ ,$$ and
• does not try to form an approximate gradient at any of these points.

Hence it belongs to the general class of direct search methods (see Wright, 1996; Powell, 1998; Kolda et al., 2003).

The Nelder-Mead method is simplex-based. A simplex $$S$$ in $${\mathbb R}^n$$ is defined as the convex hull of $$n + 1$$ vertices $$x_0, \ldots, x_n \in {\mathbb R}^n\ .$$ For example, a simplex in $${\mathbb R}^2$$ is a triangle, and a simplex in $${\mathbb R}^3$$ is a tetrahedron.

A simplex-based direct search method begins with a set of $$n + 1$$ points $$x_0, \ldots, x_n \in {\mathbb R}^n$$ that are considered as the vertices of a working simplex $$S\ ,$$ and the corresponding set of function values at the vertices $$f_j := f(x_j)\ ,$$ for $$j = 0, \ldots, n\ .$$ The initial working simplex $$S$$ has to be nondegenerate, i.e., the points $$x_0, \ldots, x_n$$ must not lie in the same hyperplane.

The method then performs a sequence of transformations of the working simplex $$S\ ,$$ aimed at decreasing the function values at its vertices. At each step, the transformation is determined by computing one or more test points, together with their function values, and by comparison of these function values with those at the vertices.

This process is terminated when the working simplex $$S$$ becomes sufficiently small in some sense, or when the function values $$f_j$$ are close enough in some sense (provided $$f$$ is continuous).

The Nelder-Mead algorithm typically requires only one or two function evaluations at each step, while many other direct search methods use $$n$$ or even more function evaluations.

The rest of the article is largely based on Wright (1996) and Lagarias et al. (1998), which are excellent starting references for the Nelder-Mead method.

## Origins and history

Direct search methods first appeared in the 1950s and early 1960s with the growing use of computers to fit experimental data. The name “direct search” was introduced in 1961 by Hooke and Jeeves.

The first simplex-based direct search method was proposed by Spendley, Hext and Himsworth in 1962 (Spendley et al., 1962). It uses only two types of transformations to form a new simplex in each step:

• reflection away from the worst vertex (the one with the largest function value), or
• shrinkage towards the best vertex (the one with the smallest function value).

In these transformations, the angles between edges in every simplex remain constant throughout the iterations, so the working simplex can change in size, but not in shape.

In 1965, Nelder and Mead modified the original method of Spendley et al. by including two additional transformations—expansion and contraction, that allow the working simplex to change not only its size, but also its shape.

A figurative motivation is given in the introduction to their paper:

In the method to be described the simplex adapts itself to the local landscape, elongating down long inclined planes, changing direction on encountering a valley at an angle, and contracting in the neighbourhood of a minimum.

The Nelder-Mead simplex method gained popularity very quickly. At that time, due to its simplicity and low storage requirements, it was ideally suited for use on minicomputers, especially in laboratories. In the 1970s, the method became a standard member of several major software libraries. Its popularity grew even more in the 1980s, when it first appeared as the “amoeba algorithm” in the widely used handbook Numerical Recipes (Press et al., second edition, 1992), and in Matlab software package, where it is now called “fminsearch” (Matlab, 2008).

Despite its age and recent advances in direct search methods, the Nelder-Mead method is still among the most popular direct search methods in practice.

Even though the method is quite simple, it is implemented in many different ways. Apart from some minor computational details in the basic algorithm, the main difference between various implementations lies in the construction of the initial simplex, and in the selection of convergence or termination tests used to end the iteration process. The general algorithm is given below.

• Construct the initial working simplex $$S\ .$$
• Repeat the following steps until the termination test is satisfied:
• calculate the termination test information;
• if the termination test is not satisfied then transform the working simplex.
• Return the best vertex of the current simplex $$S$$ and the associated function value.

### Initial simplex

The initial simplex $$S$$ is usually constructed by generating $$n + 1$$ vertices $$x_0, \ldots, x_n$$ around a given input point $$x_{in} \in {\mathbb R}^n\ .$$ In practice, the most frequent choice is $$x_0 = x_{in}$$ to allow proper restarts of the algorithm. The remaining $$n$$ vertices are then generated to obtain one of two standard shapes of $$S\ :$$

• $$S$$ is right-angled at $$x_0\ ,$$ based on coordinate axes, or
$$\qquad x_j := x_0 + h_j e_j, \quad j = 1, \ldots, n,$$
where $$h_j$$ is a stepsize in the direction of unit vector $$e_j$$ in $${\mathbb R}^n\ .$$
• $$S$$ is a regular simplex, where all edges have the same specified length.

### Simplex transformation algorithm

One iteration of the Nelder-Mead method consists of the following three steps.

1. Ordering: Determine the indices $$h, s, l$$ of the worst, second worst and the best vertex, respectively, in the current working simplex $$S$$
$$\qquad f_h = \max_{j} f_j, \quad f_s = \max_{j \neq h} f_j, \quad f_l = \min_{j \neq h} f_j.$$
In some implementations, the vertices of $$S$$ are ordered with respect to the function values, to satisfy$$f_0 \leq f_1 \leq \cdots \leq f_{n-1} \leq f_n\ .$$ Then $$l = 0\ ,$$ $$s = n - 1\ ,$$ and $$h = n\ .$$ Consistent tie-breaking rules for this ordering were given by Lagarias et al. (1998).
2. Centroid: Calculate the centroid $$c$$ of the best side—this is the one opposite the worst vertex $$x_h$$
$$\qquad c := \frac{1}{n} \sum_{j \neq h} x_j.$$
3. Transformation: Compute the new working simplex from the current one. First, try to replace only the worst vertex $$x_h$$ with a better point by using reflection, expansion or contraction with respect to the best side. All test points lie on the line defined by $$x_h$$ and $$c\ ,$$ and at most two of them are computed in one iteration. If this succeeds, the accepted point becomes the new vertex of the working simplex. If this fails, shrink the simplex towards the best vertex $$x_l\ .$$ In this case, $$n$$ new vertices are computed.

Simplex transformations in the Nelder-Mead method are controlled by four parameters$\alpha$ for reflection, $$\beta$$ for contraction, $$\gamma$$ for expansion and $$\delta$$ for shrinkage. They should satisfy the following constraints
$$\qquad \alpha > 0, \quad 0 < \beta < 1, \quad \gamma > 1, \quad \gamma > \alpha, \quad 0 < \delta < 1.$$
The standard values, used in most implementations, are
$$\qquad \alpha = 1, \quad \beta = \frac{1}{2}, \quad \gamma = 2, \quad \delta = \frac{1}{2}.$$
A slightly different choice was suggested by Parkinson and Hutchinson (1972). An alternate notation for these four parameters$\rho\ ,$ $$\gamma\ ,$$ $$\chi\ ,$$ $$\sigma\ ,$$ respectively, is used in Lagarias et al. (1998) and Matlab implementation.

The following algorithm describes the working simplex transformations in step 3 (see Wright, 1996), and the effects of various transformations are shown in the corresponding figures. The new working simplex is shown in red.

• Reflect: Compute the reflection point $$x_r := c + \alpha (c - x_h)$$ and $$f_r := f(x_r)\ .$$ If $$f_l \leq f_r < f_s\ ,$$ accept $$x_r$$ and terminate the iteration.
• Expand: If $$f_r < f_l\ ,$$ compute the expansion point $$x_e := c + \gamma (x_r - c)$$ and $$f_e := f(x_e)\ .$$ If $$f_e < f_r\ ,$$ accept $$x_e$$ and terminate the iteration. Otherwise (if $$f_e \geq f_r$$), accept $$x_r$$ and terminate the iteration.

This “greedy minimization” approach includes the better of the two points $$x_r\ ,$$ $$x_e$$ in the new simplex, and the simplex is expanded only if $$f_e < f_r < f_l\ .$$ It is used in most implementations, and in theory (Lagarias et al., 1998).

The original Nelder-Mead paper uses “greedy expansion”, where $$x_e$$ is accepted if $$f_e < f_l$$ and $$f_r < f_l\ ,$$ regardless of the relationship between $$f_r$$ and $$f_e\ .$$ It may happen that $$f_r < f_e\ ,$$ so $$x_r$$ would be a better new point than $$x_e\ ,$$ and $$x_e$$ is still accepted for the new simplex. The working simplex is kept as large as possible, to avoid premature termination of iterations, which is sometimes useful for non-smooth functions (see, for example, Rowan, 1990).
• Contract: If $$f_r \geq f_s\ ,$$ compute the contraction point $$x_c$$ by using the better of the two points $$x_h$$ and $$x_r\ .$$
• Outside: If $$f_s \leq f_r < f_h\ ,$$ compute $$x_c := c + \beta (x_r - c)$$ and $$f_c := f(x_c)\ .$$ If $$f_c \leq f_r\ ,$$ accept $$x_c$$ and terminate the iteration.
Otherwise, perform a shrink transformation.
• Inside: If $$f_r \geq f_h\ ,$$ compute $$x_c := c + \beta (x_h - c)$$ and $$f_c := f(x_c)\ .$$ If $$f_c < f_h\ ,$$ accept $$x_c$$ and terminate the iteration.
Otherwise, perform a shrink transformation.
• Shrink: Compute $$n$$ new vertices $$x_j := x_l + \delta (x_j - x_l)$$ and $$f_j := f(x_j)\ ,$$ for $$j = 0, \ldots, n\ ,$$ with $$j \neq l\ .$$

The shrink transformation was introduced to prevent the algorithm from failing in the following case, described by the quote from the original paper:

A failed contraction is much rarer, but can occur when a valley is curved and one point of the simplex is much farther from the valley bottom than the others; contraction may then cause the reflected point to move away from the valley bottom instead of towards it. Further contractions are then useless. The action proposed contracts the simplex towards the lowest point, and will eventually bring all points into the valley.

### Termination tests

A practical implementation of the Nelder-Mead method must include a test that ensures termination in a finite amount of time. The termination test is often composed of three different parts$\textit{term\_x}\ ,$ $$\textit{term\_f}$$ and $${\it fail}\ .$$

• $$\textit{term\_x}$$ is the domain convergence or termination test. It becomes true when the working simplex $$S$$ is sufficiently small in some sense (some or all vertices $$x_j$$ are close enough).
• $$\textit{term\_f}$$ is the function-value convergence test. It becomes true when (some or all) function values $$f_j$$ are close enough in some sense.
• $${\it fail}$$ is the no-convergence test. It becomes true if the number of iterations or function evaluations exceeds some prescribed maximum allowed value.

The algorithm terminates as soon as at least one of these tests becomes true.

Various forms of $$\textit{term\_x}$$ and $$\textit{term\_f}$$ tests have been used in practice, and some common implementations of the algorithm have only one of these two tests (see Singer and Singer, 2001, and Singer and Singer, 2004, for a survey).

If the algorithm is expected to work for discontinuous functions $$f\ ,$$ then it must have some form of a $$\textit{term\_x}$$ test. This test is also useful for continuous functions, when a reasonably accurate minimizing point is required, in addition to the minimal function value. In such cases, a $$\textit{term\_f}$$ test is only a safeguard for “flat” functions.

### Postprocessing in parameter estimation problems

The Nelder-Mead method was primarily designed for statistical parameter estimation problems. The original paper also provides a method to calculate the curvature of the surface in the neighbourhood of the minimum by calculating the function values at the midpoints of the edges of the final simplex. This gives enough information to approximate the Hessian matrix of second derivatives. Its inverse then gives the approximate (co)variances for the parameter estimates.

## Efficient implementation

After so many years of numerical experience with the Nelder-Mead method, there is overwhelming evidence that shrink transformations almost never happen in practice. Therefore, a typical nonshrink iteration of the algorithm is extremely fast, since it computes only one or two test points and the associated function values. Moreover, in this case, the new working simplex contains only one new vertex—the accepted point that replaces the worst vertex in the old simplex. As a consequence, the first two steps in the next iteration can be performed more efficiently than in the obvious implementation:

• the ordering of the vertices can be updated in linear time (at most $$n$$ comparisons) by one step of straight insertion sort, and
• the new centroid can also be computed by updating the previous one in $$O(n)$$ operations, with almost no additional storage.

See Singer and Singer (2004) for details of this implementation.

An analysis of a single Nelder-Mead iteration by Singer and Singer (2001) revealed a potential computational bottleneck in the domain convergence test. It becomes a serious problem if each function evaluation is reasonably fast with respect to the whole iteration. To circumvent this problem, Singer and Singer (2004) proposed a simple and efficient domain convergence test based on tracking the relative “linearized volume” of the working simplex. This paper also shows how much can be gained by efficient implementation of various steps in the Nelder-Mead algorithm.

## Convergence

Rigorous analysis of the Nelder-Mead method seems to be a very hard mathematical problem. Known convergence results for direct search methods (see Audet and Dennis, 2003; Price and Coope, 2003), in simplex terms, rely on one or both of the following properties:

• (a) The angles between adjacent edges of the working simplex are uniformly bounded away from $$0$$ and $$\pi$$ throughout the iterations, i.e., the simplex remains uniformly nondegenerate (see Torczon, 1997).
• (b) Some form of “sufficient” descent condition for function values at the vertices is required at each iteration.

In general, the original Nelder-Mead method does not satisfy either of these properties. By design, the shape of the working simplex can almost degenerate while “adapting itself to the local landscape”, and the method uses only simple decrease of function values at the vertices to transform the simplex. Hence, very little is known about the convergence properties of the method—with mainly negative results.

McKinnon (1998) gave a family of strictly convex functions and a class of initial simplices in two dimensions for which all vertices of the working simplex converge to a nonminimizing point.

A very readable paper by Lagarias et al. (1998) contains several convergence results in one and two dimensions for strictly convex functions with bounded level sets.

Almost nothing is known about the behaviour of the method for less smooth or discontinuous functions—except, of course, that the method may fail to converge to a minimizing point.

In many practical problems, like parameter estimation and process control, the function values are uncertain or subject to noise. Therefore, a highly accurate solution is not necessary, and may be impossible to compute. All that is desired is an improvement in function value, rather than full optimization.

The Nelder-Mead method frequently gives significant improvements in the first few iterations and quickly produces quite satisfactory results. Also, the method typically requires only one or two function evaluations per iteration, except in shrink transformations, which are extremely rare in practice. This is very important in applications where each function evaluation is very expensive or time-consuming. For such problems, the method is often faster than other methods, especially those that require at least $$n$$ function evaluations per iteration.

• In many numerical tests, the Nelder-Mead method succeeds in obtaining a good reduction in the function value using a relatively small number of function evaluations.

Apart from being simple to understand and use, this is the main reason for its popularity in practice.

On the other hand, the lack of convergence theory is often reflected in practice as a numerical breakdown of the algorithm, even for smooth and well-behaved functions.

• The method can take an enormous amount of iterations with negligible improvement in function value, despite being nowhere near to a minimum.

This usually results in premature termination of iterations. A heuristic approach to deal with such cases is to restart the algorithm several times, with reasonably small number of allowed iterations per each run.

## Modifications and related methods

There are many modifications of the original Nelder-Mead method. Most of them are aimed at improving the worst-case performance of the method with regards to convergence, while trying to retain some or most of its best-case efficiency.

Typically, such modifications incorporate additional tests at some stages of the algorithm, to ensure that at least one of the properties (a), (b) from above are satisfied. For example, a provably convergent variant of the Nelder-Mead method by Price et al. (2002) uses a combination of (a) and (b). Another convergent variant by Bürmen et al. (2006) uses the principle of grid restrainment, which is a form of (a).

Other direct search methods that have strong convergence properties have been proposed since the late 1970s. These include the multidirectional search method (see Torczon, 1991; Dennis and Torczon, 1991), generalized pattern search methods (see Torczon, 1997; Audet and Dennis, 2003), frame-based methods (see Coope and Price, 2001), and mesh adaptive direct search algorithms (see Audet and Dennis, 2006).

An excellent survey article by Kolda et al. (2003) gives the state of the art in general direct search methods at that time. An older survey is given by Powell (1998).