User:C. T. Kelley/Proposed/Numerical methods for nonlinear equations

Prof. C. T. Kelley, Department of Mathematics, North Carolina State University, USA

Prof. C. T. Kelley accepted the invitation on 4 September 2009 (self-imposed deadline: 4 March 2010).

This article is about numerical methods for nonlinear equations . The nonlinear equations we will consider have the general form \[\tag{1} F(x) = 0 \]

where \(F\) and \(x\) are vectors of the same size \(N\ .\) We will use standard mathematical notation for this and say \( F: R^N \to R^N\ .\) The objective of most numerical methods is to approximate a solution of (1) to acceptable accuracy. This is not the same as asking for all solutions.

We will use \(\| \cdot \|\) to denote a vector norm on \(R^N\) as well as the matrix norm induced by that vector norm. You may think of this as the usual Euclidean norm if you like. We will assume the reader is familiar with elementary ideas from numerical analysis, such as methods for the solution of linear equations, \(O\) notation, and numerical differentiation.

The methods we will consider in this article are iterative. This means that the output of the method is a sequence of vectors \(\{ x_n \}\) which approximate a solution. This is different from giving a numerical formula for the solution. To illustrate the difference we will consider the scalar (ie \(N = 1\) equation \[\tag{2} f(x) = x^2 - 4 = 0. \]

Note that in (2) we write the function as \(f\) and not \(F\ .\) The convention is that vector functions are written with upper case letters and scalar function in lower case. The solutions to (2) are \[ x^* = \pm 2. \] Here we have introduced another convention by letting \(x^*\) denote a solution. Most nonlinear equations cannot be solved by simple formulae, and iterative methods are the only choice.

You may have encountered Newton's method in calculus. For scalar equations the Newton iteration transforms a current approximation of a solution \(x_c\) to a new one \(x_+\) via \[\tag{3} x_+ = x_c - \frac{f(x_c)}{f'(x_c)}. \]

If we let \(x_0 = 1\) be our initial iterate, then the first two Newton iterates are \[ x_1 = 1 - \frac{-3}{2} = 5/2 \mbox{ and } x_2 = \frac{5}{2} - \frac{25/4 - 4}{10/4} = 1.6. \] The iteration will converge (quite rapidly) to \(x^* = +2\ .\) If \(-2\) is the solution you wanted, you will have to start an a different initial iterate.

This article is not about scalar equations and aims to consider the general case where \( N \ge 1 \ .\) The reader who is interested in methods for scalar equations can consult any of the standard references (Dennis et al 1996, Kelley 1995, Kelley 2003, Ortega et al 2000).

We will focus on Newton's method a some of its generalizations. In the next section we will explain Newton's method for \(N \ge 1\ .\) formally state our assumptions, and examine the computational cost. The subsequent sections address ways to reduce that cost.

Newton's method

If \(F:R^N \to R^N\) we will let \(f_j\) be the \(j\)th component of \(F\ .\) The Jacobian matrix of \(F\) is the \(N \times N\) matrix \( F' \) with \(ij\) entry \[ F'_{ij}(x) = \partial f_j/\partial (x)_j \] where \((x)_j\) is the \(j\)th component of \(x\ .\) We use this notation to avoid confusion with a Newton iterate. Newton's method is \[\tag{4} x_+ = x_c - F'(x_c)^{-1} F(x_c). \]

Newton's method models \(F\) at \(x_c\) by the local linear model \[\tag{5} M_c(x) = F(x_c) + F'(x_c) (x - x_c) \]

and then solves \(M_c(x) = 0\) to obtain \(x_+\ .\) One can improve the performance of the iteration by changing the model (replace \( F'(x_c) \) by something easier to process) or changing the way solves \(M_c(x) = 0\ .\)

Be aware that the classic formula (4) does not reflect how one implements the method in practice. The transition from \(x_c\) to \(x_+\) can be broken into four steps

Evaluate \( F(x_c) \) and test for termination
Evaluate \( F'(x_c) \)
Solve the linear system \(F'(x_c) s = - F(x_c)\) for the Newton step
\( x_+ = x_c + s \)

The most costly parts are the evaluation of \(F'(x_c)\) and the solution of the linear equation for the Newton step. The subsequent sections will tell you how to reduce that cost. For now, let's consider the case where you will evaluate \(F\) with a forward difference and will solve the equation for the Newton step by Gaussian elimination. In this case you can approximate the \(j\)th column of the Jacobian with the forward difference \[ \frac{F(x_c + h e_j ) - F(x_c)}{h} \] where \(e_j\) is the unit vector in the \(j\)th coordinate direction and \(h\) the difference increment. Since you'll need to approximate \(N\) columns, the cost of the Jacobian evaluation will be \(N\) evaluations of \(F\ .\) If function evaluations are expensive, computing a finite-difference Jacobian can be the most expensive part of the solve. If you solve the equation for the Newton step with Gaussian elimination, factoring the Jacobian costs \(O(N^3)\) floating point operations, which becomes a problem for large \( N \ .\)

In order for (4) to make sense, \(F\) must be differentiable and the Jacobian matrix must be nonsingular. We will formalize this and make assumptions that are valid for most applications. These standard assumptions are

There is a solution \( x^* \ .\)
\( F'(x^*) \) is nonsingular.
\( F' \) is Lipschitz continuous near \( x^* \ .\)

Lipschtz continuity of \( F' \) means that there is \( \gamma \ge 0 \) such that \[\tag{6} \| F'(x) - F'(y) \| \le \gamma \| x - y \| \]

for all \( x, y \) sufficiently close to \( x^* \ .\)

The local convergence theorem should remind you of the result from calculus.

Theorem: Let the standard assumptions hold. Then if \(x_c\) is sufficiently near \( x^* \) \[\tag{7} \| x_+ - x^* \| < \| x_+ - x^* \|/2 \]

and there is \( K_N > 0 \) such that \[\tag{8} \| x_+ - x^* \| < K_N \| x_+ - x^* \|^2 \]

Theorems about iterative methods for nonlinear equations are often statements about the progress from \(x_c\) to \(x_+\ .\) This is sufficient to describe the convergence of a sequence of iterates. For example suppose \(x_+\) is an initial iterate which satisfies the hypotheses of the theorem. Then (7) implies that \(x_1\ ,\) the subsequent iterate, is twice as close to math> x^* \ .</math> Hence the sequence of Newton iterates \(\{ x_n \}\) exists and converges to \( x^* \ .\) Newton's method does much more, and (8) is the key estimate. (8) tells us that the improvement in accuracy accelerates as the iteration progress, roughly doubling the number of significant figures with each iteration. An iteration, like Newton's method, which satisfies (8) is said to converge q-quadratically. As we discuss modifications to Newton's method, q-quadratic convergence is one thing we'll sacrifice.

With this discussion in mind we will restate the theorem in terms of a sequence of iterates \( \{ x_n \} \ .\)

Theorem: Let the standard assumptions hold. Then if \(x_0\) is sufficiently near \( x^* \)' then the Newton sequence \(\{x_n\}\) exists and converges to \( x^* \ .\) Moreover the convergence is q-quadratic" \[\tag{quad:label exists!} \| x_{n+1} - x^* \| =O(\| x_n - x^* \|^2). \]

In the remainder of this article we will use the \( x_c \) and \( x_+ \) notation for the estimates, but state the theorems in terms of the entire sequence of iterations.

Errors in the Function and Jacobian

Of course, your function and Jacobian will have errors. These can be unavoidable floating-point arithmetic errors, errors in programming, or even errors you intentionally make to improve efficiency. A result from (Kelley 1995) is useful to quantify this. Assume that you make errors in \(F\) and \(F'\ ,\) and that instead of evaluating the Newton iteration you compute the approximation \[ x_+^A = x_c - (F'(x_c) + \Delta_c)^{-1} (F(x_c) + \epsilon_c). \] The difference between the approximate and exact Newton iterate is \[ x_+^A = x_+ + O( \| x_c - x^* \| \| \Delta_c \| + \| \epsilon_c \| ) \]

Inexact Newton methods

Quasi-Newton Methods

Software

The matlab nonlinear solver codes associated with (Kelley 2003) are a significant improvement over the ones from (Kelley 1995). They include Newton's method with direct solvers, Newton-Krylov methods, and Broyden's method.

References

Dennis, J. E. and Schnabel, R. B, (1996) Numerical Methods for Unconstrained Optimization and Nonlinear Equations, Society for Industrial and Applied Mathematics (SIAM), Philadelphia, PA.

Kelley, C. T. (1995),Iterative Methods for Linear and Nonlinear Equations, Society for Industrial and Applied Mathematics (SIAM), Philadelphia, PA.

Kelley, C. T. (2003), Solving Nonlinear Equations with Newton's Method, Society for Industrial and Applied Mathematics (SIAM), Philadelphia, PA.

Ortega, J. M. and Rheinboldt, W. R. (2000), Iterative Solution of Nonlinear Equations in Several Variables, Society for Industrial and Applied Mathematics (SIAM), Philadelphia, PA.

User:C. T. Kelley/Proposed/Numerical methods for nonlinear equations

Contents

Newton's method

Errors in the Function and Jacobian

Inexact Newton methods

Quasi-Newton Methods

Software

References

Recommended Reading

Personal tools

Namespaces

Variants

Views

Actions

Search

Navigation

Focal areas

Activity

Tools