# Initial value problems

Lawrence F. Shampine and Skip Thompson (2007), Scholarpedia, 2(3):2861. | doi:10.4249/scholarpedia.2861 | revision #91375 [link to/cite this article] |

## Contents |

## Initial Value Problems

Most general-purpose programs for the numerical solution of ordinary differential equations expect the equations to be presented as an explicit system of first order equations, \[\tag{1} y' = F(t,y) \]

that are to hold on a *finite* interval \([t_0, t_f]\ .\) An
**initial value problem** specifies the solution of interest by an
initial condition \(y(t_0) = A\ .\) The solvers also expect that
\(F(t,y)\) is continuous in a region that includes \(A\)
and that the partial derivatives \(\partial F_i/\partial y_j\)
are bounded there, assumptions that imply the initial value problem has a solution and only
one.

A boundary value problem specifies a solution of interest by imposing conditions at more than one point. For instance, the Blasius problem is the differential equation \(y' ' ' = - y \, y' '/2\) with boundary conditions \(y(0) = 0, y'(0) = 0, y'(\infty) = 1\ .\) This example is quite unusual in that a transformation of the solution of the initial value problem \[\tag{2} u' ' ' = - u \, u' '/2, \quad u(0) = 0, u'(0) = 0, u' '(0) = 1 \]

provides a solution of the boundary value problem. Generally existence and uniqueness of solutions are much more complicated for boundary value problems than initial value problems, especially because it is not uncommon that the interval is infinite or \(F(t,y)\) is not smooth. Correspondingly, solving boundary value problems numerically is rather different from solving initial value problems.

Differential equations arise in the most diverse forms, so it is necessary to prepare them for solution. The usual way to write a set of equations as a first order system is to introduce an unknown for each dependent variable in the original set of equations plus an unknown for each derivative up to one less than the highest appearing. For the problem (2) we could let \(y_1(t) = u(t), y_2(t) = u'(t), y_3(t) = u' '(t)\ ,\) to obtain \[ y_1' = y_2 \] \[ y_2' = y_3 \] \[ y_3' = - y_1 \, y_3/2 \] with initial conditions \(y_1(0) = 0, y_2(0) = 0, y_3(0) = 1\ .\)

The *van der Pol equation* \(\epsilon y' ' - (1 - y^2)\,y' + y = 0\)
is equivalent to
\[
y_1' = y_2
\]
\[
\epsilon y_2' = -y_1 + (1 - y_1^2)y_2
\]
Clearly the character of this equation changes when the parameter
\(\epsilon\) is set to 0. Although we can solve such a system
numerically for any given \(\epsilon > 0\ ,\) we need singular perturbation theory to understand how the solution depends on
\(\epsilon\ .\) The system with \(\epsilon = 0\) is a
differential-algebraic equation. Differential-algebraic equations
resemble ordinary differential equations, but they
differ in important ways. Nonetheless, there are programs that
accept equations in the implicit form \(F(t,y,y') = 0\) and
solve initial value problems for both ordinary differential equations
and certain kinds of differential-algebraic equations. Van der Pol's
equation for small \(\epsilon > 0\) is an example of a **stiff system**.
Programs intended for non-stiff initial value problems perform very poorly when
applied to a stiff system. Although most initial value problems are not stiff, many
important problems are, so special methods have been developed that solve them
effectively. An initial value problem is stiff in regions where \(y(t)\)
is slowly varying and the differential equation is very stable, i.e., nearby solutions
of the equation converge very rapidly to \(y(t)\ .\)

## Discrete Variable Methods

Discrete variable methods approximate \(y(t)\) on a mesh \(t_0 < t_1 < \ldots < t_f\ .\) They start with the initial value \(y_0 = A\) and on reaching \(t_n\ ,\) step to \(t_{n+1} = t_n + h_n\) by computing \(y_{n+1} \approx y(t_{n+1})\ .\) Methods are often analyzed with the assumption that the step sizes \(h_n\) are constant, but general-purpose solvers vary the step size for reasons discussed below. A great many methods have been proposed, but three kinds dominate: Runge-Kutta, Adams, BDFs (backward differentiation formulas). On reaching \(t_n\) there are available values \(y_n,y_{n-1},\ldots \) and \(F_n = F(t_n,y_n), F_{n-1},\ldots \) that might be exploited. Methods with memory like Adams and BDFs use some of these values; one-step methods like Runge-Kutta evaluate \(F\) only at points in \([t_n,t_{n+1}]\ .\)

### Euler Methods

Any smooth function \(u(t)\) can be expanded in a Taylor series \[ u(t_{n+1}) = u(t_n) + h_n u't_n) + h_n^2 \frac{u^{(2)}(t_n)}{2} + \ldots \] If \(u(t)\) is the local solution defined by \[\tag{3} u' = F(t,u),\quad u(t_n) = y_n \]

this expansion suggests an approximation of \(u(t_{n+1})\) called
**Euler's method** or the **forward Euler method**
\[
y_{n+1} = y_n + h_n F(t_n,y_n)
\]
This is an explicit formula that requires one evaluation of \(F\)
per step. The series shows that the *local error*
\[\tag{4}
le_n = u(t_{n+1}) - y_{n+1}
\]

is \(O(h_n^2)\ .\) When \(u(t)\) in this definition is the solution
\(y(t)\) of the initial value problem,
the local error is called the *truncation error* or *discretization error*.

The **backward Euler method** is
\[
y_{n+1} = y_n + h_n F(t_{n+1},y_{n+1})
\]
It is not obvious why we might be interested in a formula that
defines \(y_{n+1}\) implicitly as the solution of a system of
algebraic equations and has the same accuracy as an explicit
formula, but it turns out that this formula is effective for stiff
systems and the forward Euler method is not. These two one-step
methods can be derived in several ways with extensions that lead to the popular
kinds of methods.

### Backward Differentiation Formulas

The backward Euler method can be derived from a difference
approximation to the derivative in equation (1)
\[
\frac{y_{n+1} - y_n}{h_n} = F(t_{n+1},y_{n+1})
\]
Difference approximations that use more of the values
\(y_n,y_{n-1},\ldots \) result in other
**backward differentiation formulas** (BDFs). Interpolation
theory or Taylor series expansion shows that each additional
\(y_{n-j}\) used raises the order of the truncation error by one.
The backward Euler formula is BDF1, the lowest order member of the
family. These formulas are the most popular for solving stiff
systems.

### Adams Methods

**Adams methods** arise from integrating (3) to get
\[
u(t_{n+1}) = y_n + \int^{t_{n+1}}_{t_n} F(t,u(t))\,dt
\]
If the function \(F(t,u(t))\) here is approximated by a polynomial
interpolating values \(F_n, F_{n-1},\ldots \ ,\) an explicit
**Adams-Bashforth (AB) formula** is obtained. Euler's method is AB1,
the lowest order member of the family. If the polynomial also
interpolates \(F(t_{n+1},y_{n+1})\ ,\) an implicit **Adams-Moulton (AM) formula** is obtained. As with the BDFs, each additional
\(F_{n-j}\) used raises the order of the truncation error by one.
The backward Euler method is AM1. Linear interpolation results in the
second order AM2,
\[
y_{n+1} = y_n + \frac{h_n}{2}\, [F(t_n,y_n) + F(t_{n+1},y_{n+1})]
\]
which is known as the **trapezoidal rule**.

How an implicit formula is evaluated is crucial to its use.
An explicit **predictor** formula is first used to compute a
tentative value \(p_{n+1} \ .\) It is substituted for
\(y_{n+1}\) in the right hand side of the implicit
**corrector** formula and the formula evaluated to get a
better approximation to \(y_{n+1}\ .\)
For example, if the formulas AB1 and AM1 are used as a pair,
this process of *simple iteration* starts with the predicted value
\[
y_{n+1}^{[0]} = y_n + h_n F(t_n, y_{n})
\]
and successively corrects values
\[
y_{n+1}^{[m+1]} = y_n + h_n F(t_{n+1}, y_{n+1}^{[m]})
\]
This process is repeated until the implicit formula is satisfied well
enough or the step is deemed a failure. If the step is a failure, \(h_n\)
is reduced to improve the rate of convergence and the accuracy of the
prediction, and the program again tries to take a step.
Simple iteration is the standard way of evaluating implicit formulas for
non-stiff problems. If a predetermined, *fixed* number of iterations is done,
the resulting method is called a **predictor-corrector** formula.
If the predictor has a truncation error that is of the same order as
the implicit formula, a single iteration produces a result with a
truncation error that agrees to leading order with that of iterating
to completion. If the predictor is of order one lower, the result
has the same order as the corrector, but the leading terms in the
truncation error are different. There are two ways that Adams
formulas are implemented in popular solvers. Both predict with an
Adams-Bashforth formula and correct with an Adams-Moulton method.
One way is to iterate to completion, so that the integration is effectively done
with an implicit Adams-Moulton method. The other is to do a single
correction, which amounts to an explicit formula. For non-stiff
problems there is little difference in practice, but the two
implementations behave very differently when solving stiff problems.
As an example, if we predict with Euler's method (AB1) and correct
once with the trapezoidal rule (AM2), we obtain a formula
\[
p_{n+1} = y_n + h_n F(t_n,y_n)
\]
\[
y_{n+1} = y_n + \frac{h_n}{2}\, [F(t_n,y_n) + F(t_{n+1},p_{n+1})]
\]
that is known as *Heun's* method. It is an explicit method that
costs two evaluations of \(F\) per step and has a local error
that is \(O(h_n^3)\ .\)

### Runge-Kutta Methods

Euler's method, the backward Euler method, the trapezoidal rule, and
Heun's method are all examples of **Runge-Kutta (RK) methods**.
RK methods are often derived by writing down the form of the
one-step method, expanding in Taylor series, and choosing
coefficients to match terms in a Taylor series expansion of the
local solution to as high order as possible. The general form of an explicit Runge-Kutta method involving two evaluations of \(F\ ,\) or as they are called in this context, *stages*, is
\[
y_{n,1} = y_n + \alpha_{1} h_n F(t_n,y_n)
\]
\[
y_{n+1} = y_n + h_n \, [ \beta_{2,0} F(t_n,y_n) +
\beta_{2,1} F(t_{n+1},y_{n,1})]
\]
It turns out that it is not possible to get order three with just two stages, but the coefficients of Heun's method show that it is possible to get order two. Another formula of order two that has been used in some solvers is
\[
y_{n,1} = y_n + \frac{h_n}{2} F(t_n,y_n)
\]
\[
y_{n+1} = y_n + h_n\, F(t_{n+1},y_{n,1})
\]
Explicit RK methods are very popular for solving non-stiff IVPs. Implicit RK methods are very popular for solving BVPs and also used for solving stiff IVPs.

## Convergence

On reaching \(t_n\ ,\) we can write the true, or global, error at \(t_{n+1}\) in terms of the local solution as \[\tag{5} y(t_{n+1}) - y_{n+1} = [y(t_{n+1})-u(t_{n+1})] + [u(t_{n+1})- y_{n+1}] \]

The second term on the right is the local error, a measure of how well the method imitates the behavior of the differential equations. Reducing the step size reduces the local error and the higher the order, the more it is reduced. The first term measures how much two solutions of the differential equations can differ at \(t_n + h_n\) given that they differ at \(t_n\) by \(y(t_n)-u(t_n)=y(t_n)-y_n\ ,\) i.e., it measures the stability of the problem. The argument is repeated at the next step for a different local solution. In this perspective, the numerical method tries at each step to track closely a local solution of the differential equations, but the code moves from one local solution to another and the cumulative effect depends on the stability of the equations near \(y(t)\ .\) With standard assumptions about the initial value problem, the cumulative effect grows no more than linearly for a one-step method. Suppose that a constant step size \(h\) is used with a method having a local error that is \(O(h^{p+1})\ .\) To go from \(t_0\) to \(t_f\) requires \(O(1/h)\) steps, hence the worst error on the interval is \(O(h^p)\ .\) For this reason a method with local error \(O(h^{p+1})\) is said to be of order \(p\ .\)

In this view of the error, the stability of the initial value problem is paramount. A view that is somewhat better suited to methods with memory emphasizes the stability of the numerical method. The values \({y(t_n)}\) satisfy the formula with a perturbation that is the truncation error. If the numerical method is stable, convergence can be established as in the other approach. Using more values \(y_{n-j}\) or slopes \(F_{n-j}\) leads to more accurate BDFs and Adams methods, but we must wonder whether errors in these values might be amplified to the point that a formula is not stable, even as the step size goes to zero. Indeed, BDF7 is not stable in this sense, though BDF1-BDF6 are. The order of accuracy can be nearly doubled by using both values and slopes, but these very accurate formulas are not usable because they are not stable. If the step sizes are sufficiently small, all the popular numerical methods are stable. When solving non-stiff problems it is generally the case that if the step sizes are small enough to provide the desired accuracy, the numerical method has satisfactory stability. On the other hand, if the initial value problem is stiff, the step sizes that would provide the desired accuracy must be reduced greatly to keep the computation stable when using a method and implementation appropriate for non-stiff initial value problems. Some implicit methods have such good stability properties that they can solve stiff initial value problems with step sizes that are appropriate to the behavior of the solution if they are evaluated in a suitable way. The backward Euler method and the trapezoidal rule are examples.

## Error Estimation and Control

General-purpose initial value problem solvers estimate and control the error at each step by adjusting the step size. This approach gives a user confidence that the problem has been solved in a meaningful way. It is inefficient and perhaps impractical to solve with constant step size an initial value problem with a solution that exhibits regions of sharp change. Numerical methods are stable only if the step sizes are sufficiently small and control of the error helps bring this about.

The truncation error of all the popular formulas involves derivatives of the solution or differential equation. To estimate the truncation error, these derivatives are estimated using methods which involve little or no additional computation. For BDF methods the necessary derivative of the solution is commonly estimated using a difference approximation based on values \(y_{n+1},y_n,y_{n-1},\ldots \ .\) Another approach is to take each step with two formulas and estimate the error by comparison. This is commonly done for Runge-Kutta methods by comparing formulas of different orders. That is the approach taken in some Adams codes; others use the fact that the truncation errors of Adams-Bashforth and Adams-Moulton formulas of the same order differ only by constant factors. With an estimate of the error incurred with step size \(h_n\ ,\) it is possible to predict the error that would be made with a different step size. If the estimated error of the current step is bigger than a tolerance specified by the user, the step is rejected and tried again with a step size that it is predicted will succeed. If the estimated error is much smaller than required, the next step size is increased so that the problem can be solved more efficiently.

The Adams methods and BDFs are families of formulas. When taking a step with one of these formulas, it is possible to estimate what the error would have been if the step had been taken with a formula of lower order and in the right circumstances, a formula of order one higher. Using such estimates the popular Adams and BDF solvers try to select not only an efficient step size, but also an efficient order. Variation of order provides a simple and effective way of dealing with the practical issue of starting the integration. The lowest order formulas are one-step, so they are used to get started. As more \(y_{n-j}\) and \(F_{n-j}\) become available, the solver can raise the order as appropriate.

## Further Reading and IVP Solvers

The text Shampine et al. (2003) is an introduction to methods and software for initial value problems and boundary value problems (and delay differential equations) in the problem solving environment MATLAB. Initial value problems and differential-algebraic equations are discussed at a similar level in Ascher and Petzold (1998) and at a higher level in Hairer et al. (2000) and Hairer and Wanner (2004). All these texts provide references to software. GAMS, the Guide to Available Mathematical Software, http://gams.nist.gov/ is a convenient way to locate software for solving IVPs numerically.

## References

- U.M. Ascher and L.R. Petzold, Computer Methods for Ordinary Differential Equations and Differential-Algebraic Equations, SIAM, Philadelphia, 1998.
- E. Hairer, S.P. Norsett, and G. Wanner, Solving Ordinary Differential Equations I, Nonstiff Problems, Springer--Verlag, Berlin, 2000.
- E. Hairer and G. Wanner, Solving Ordinary Differential Equations II, Stiff and Differential--Algebraic Problems, Springer--Verlag, Berlin, 2004.
- L.F. Shampine, I. Gladwell, and S. Thompson,Solving ODEs with MATLAB, Cambridge University Press, Cambridge, 2003.

**Internal references**

- Bill Gear (2007) Backward differentiation formulas. Scholarpedia, 2(8):3162.
- Skip Thompson (2007) Delay-differential equations. Scholarpedia, 2(3):2367.
- Rob Schreiber (2007) MATLAB. Scholarpedia, 2(7):2929.
- Kendall E. Atkinson (2007) Numerical analysis. Scholarpedia, 2(8):3163.
- John Butcher (2007) Runge-Kutta methods. Scholarpedia, 2(9):3147.
- Philip Holmes and Eric T. Shea-Brown (2006) Stability. Scholarpedia, 1(10):1838.
- Lawrence F. Shampine and Skip Thompson (2007) Stiff systems. Scholarpedia, 2(3):2855.
- Takashi Kanamaru (2007) Van der Pol oscillator. Scholarpedia, 2(1):2202.

## External links

## See Also

Boundary Value Problems, Differential-Algebraic Equations, Numerical Analysis, Stiff Systems