Dr. Peter Grassberger

From Scholarpedia
Curator of ScholarpediaCurator Index: 1(Difference between revisions)
Jump to: navigation, search
Line 1: Line 1:
\documentclass[12pt]{report}
+
==Basic Definitions==
\usepackage{epsfig}
+
\parindent=0.cm
+
\parskip=.3cm
+
\textheight=23.cm
+
\newcommand{\be}{:<math>}
+
\newcommand{\ee}{</math>}
+
  
 
 
'''{\Large The Grassberger-Procaccia Algorithm}'''
 
 
\vspace{1.cm}
 
 
 
 
== Basic Definitions ==
 
  
 
The GP algorithm is used for estimating the correlation dimension of some fractal
 
The GP algorithm is used for estimating the correlation dimension of some fractal
 
measure <math>\mu</math> from a given set of points randomly distributed according to <math>\mu</math>. Let
 
measure <math>\mu</math> from a given set of points randomly distributed according to <math>\mu</math>. Let
the <math>N</math> points be denoted by <math>'''x'''_1,... '''x'''_N</math>, in some metric space with distances
+
the <math>N</math> points be denoted by <math>{\mathbf x}_1,\ldots {\mathbf x}_N</math>, in some metric space with distances
<math>|'''x'''_i-'''x'''_j|</math> between any pair of points. For any positive number <math>r</math>, the ''correlation sum''
+
<math>|{\mathbf x}_i-{\mathbf x}_j|</math> between any pair of points. For any positive number <math>r</math>, the ''correlation sum''
 
<math>C(r)</math> is then defined as the fraction of pairs whose distance is smaller than <math>r</math>,
 
<math>C(r)</math> is then defined as the fraction of pairs whose distance is smaller than <math>r</math>,
  
:<math eq1> {\hat C}(r) = {2\over N(N-1)}\sum_{i<j} \theta(r-|'''x'''_i-'''x'''_j|),      </math>
+
:<math eq1> {\hat C}(r) = {2\over N(N-1)}\sum_{i<j} \theta(r-|{\mathbf x}_i-{\mathbf x}_j|),      </math>
  
 
where <math>\theta(x)</math> is the Heaviside step function. It is an unbiased estimator of the
 
where <math>\theta(x)</math> is the Heaviside step function. It is an unbiased estimator of the
 
''correlation integral''
 
''correlation integral''
  
:<math eq2> C(r) = \int d\mu('''x''') \int d\mu('''y''') \theta(r-|'''x'''-'''y'''|).                </math>
+
:<math
 +
eq2> C(r) = \int d\mu({\mathbf x}) \int d\mu({\mathbf y}) \theta(r-|{\mathbf x}-{\mathbf y}|).                </math>
  
 
Both <math>{\hat C}(r)</math> and <math>C(r)</math> are monotonically decreasing to zero as <math>r\to 0</math>. If <math>C(r)</math>
 
Both <math>{\hat C}(r)</math> and <math>C(r)</math> are monotonically decreasing to zero as <math>r\to 0</math>. If <math>C(r)</math>
Line 38: Line 24:
 
and most naive way to estimate <math>D</math> is to plot <math>C(r)</math> against <math>r</math> on a log-log plot and to
 
and most naive way to estimate <math>D</math> is to plot <math>C(r)</math> against <math>r</math> on a log-log plot and to
 
fit a straight line to the small-<math>r</math> tail of the curve. <math>D</math> is then the slope of this line
 
fit a straight line to the small-<math>r</math> tail of the curve. <math>D</math> is then the slope of this line
(see Fig. 1). More sophisticated methods involve e.g. fitting local slopes <math>D_{\rm eff}(r)</math>
+
(see Fig.&nbsp;1). More sophisticated methods involve e.g. fitting local slopes <math>D_{\rm eff}(r)</math>
 
and extrapolating them to <math>r\to 0</math>, or the method proposed in \cite{takens1,Theiler0}.
 
and extrapolating them to <math>r\to 0</math>, or the method proposed in \cite{takens1,Theiler0}.
  
== Main Application: Chaotic Dynamical Systems ==
+
==Main Application: Chaotic Dynamical Systems==
 +
 
  
 
Although the GP algorithm can be used for any measure (the basic idea had been used before
 
Although the GP algorithm can be used for any measure (the basic idea had been used before
 
to estimate dimensions of fractal clusters created by diffusion limited aggregation \cite{a1}),
 
to estimate dimensions of fractal clusters created by diffusion limited aggregation \cite{a1}),
 
it is mostly used to measure the fractal dimensions of a ''strange attractor'' from a
 
it is mostly used to measure the fractal dimensions of a ''strange attractor'' from a
univariate (i.e. scalar) ''time series'' which we denote as <math>x_1,... x_N</math>. Now, <math>x_i
+
univariate (i.e. scalar) ''time series'' which is denoted as <math>x_1,\ldots x_N</math>. Now, <math>x_i</math> represents
</math> represents
+
a measurement of the quantity <math>x</math> at time <math>t_i = t_0 + i\Delta t</math>. We assume stationarity,
a measurement of the quantity <math>x</math> at time <math>t_i = t_0 + i\Delta t</math>. We assume s
+
tationarity,
+
 
i.e. the statistics of the set <math>\{x_i\}</math> is invariant under time translation. Unless the
 
i.e. the statistics of the set <math>\{x_i\}</math> is invariant under time translation. Unless the
 
measurements are i.i.d., there will be correlations between successive measurements. But
 
measurements are i.i.d., there will be correlations between successive measurements. But
 
they will be weak and short-ranged, if the data are produced by a chaotic system, i.e.
 
they will be weak and short-ranged, if the data are produced by a chaotic system, i.e.
 
if they are sampled from a trajectory on a strange attractor (or strange repeller). In that
 
if they are sampled from a trajectory on a strange attractor (or strange repeller). In that
case, and if <math>N</math> is sufficiently big, we can assume that the data are effectively indepen
+
case, and if <math>N</math> is sufficiently big, one can assume that the data are effectively independent
dent
+
and randomly sampled from the ''invariant natural measure'' on the attractor, and one can
and randomly sampled from the ''invariant natural measure'' on the attractor, and we can
+
 
directly take over Eqs.(<ref>eq1</ref>) and (<ref>eq2</ref>). Furthermore, using Takens' time delay
 
directly take over Eqs.(<ref>eq1</ref>) and (<ref>eq2</ref>). Furthermore, using Takens' time delay
embedding theorem \cite{takens2} and its improvements \cite{sauer}, we can replace a series
+
embedding theorem \cite{takens2} and its improvements \cite{sauer}, one can replace a series
 
of <math>N+m-1</math> univariate measurements by a time series of <math>N</math> ''delay vectors''
 
of <math>N+m-1</math> univariate measurements by a time series of <math>N</math> ''delay vectors''
  
:<math delay> '''x'''_i = (x_{i-m+1},x_{i-m+2},... x_i) \in R^m          </math>
+
:<math delay> {\mathbf x}_i = (x_{i-m+1},x_{i-m+2},\ldots x_i) \in R^m          </math>
  
 
where <math>m</math> is the ''embedding dimension''. Estimating the dimension of attractors by
 
where <math>m</math> is the ''embedding dimension''. Estimating the dimension of attractors by
Line 70: Line 54:
 
One of the main applications of the GP algorithm is to distinguish (in principle) between
 
One of the main applications of the GP algorithm is to distinguish (in principle) between
 
stochastic and deterministically chaotic time sequences. For a stochastic signal, <math>C(r,m)
 
stochastic and deterministically chaotic time sequences. For a stochastic signal, <math>C(r,m)
\sim r^m</math> for all <math>m</math>. In contrast, <math>C(r,m) \sim r^D</math> for <math>m</math>
+
\sim r^m</math> for all <math>m</math>. In contrast, <math>C(r,m) \sim r^D</math> for <math>m</math> larger than the attractor
larger than the attractor
+
 
dimension, if the signal is generated by a deterministic system. Notice that in both cases
 
dimension, if the signal is generated by a deterministic system. Notice that in both cases
 
the Fourier spectrum is continuous, and thus cannot be used to make this distinction.
 
the Fourier spectrum is continuous, and thus cannot be used to make this distinction.
 
In practice, the distinction based on <math>C(r,m)</math> is often not possible either, due to
 
In practice, the distinction based on <math>C(r,m)</math> is often not possible either, due to
 
experimental noise, finiteness of <math>N</math>, non-stationarity and intermittency effects, and
 
experimental noise, finiteness of <math>N</math>, non-stationarity and intermittency effects, and
due to the uncertainties involved in the extrapolation <math>r\to 0</math>. It seems fair to say tha
+
due to the uncertainties involved in the extrapolation <math>r\to 0</math>. It seems fair to say that
t
+
 
a large fraction of the relevant literature is obsolete, because authors have underestimated
 
a large fraction of the relevant literature is obsolete, because authors have underestimated
 
these problems in view of the easiness of the implementation of the bare algorithm.
 
these problems in view of the easiness of the implementation of the bare algorithm.
  
== Relations to Other Dynamical Invariants and Multifractality ==
+
==Relations to Other Dynamical Invariants and Multifractality==
  
Let us denote by <math>C(r,m)</math> the correlation integral obtained with embedding dimension <mat
+
 
h>m</math>.
+
Let us denote by <math>C(r,m)</math> the correlation integral obtained with embedding dimension <math>m</math>.
 
Then the typical behavior for <math>r\to 0,\; m\to \infty</math> for a chaotic system with attractor
 
Then the typical behavior for <math>r\to 0,\; m\to \infty</math> for a chaotic system with attractor
dimension <math><m</math> is <math>C(r,m) \sim r^D \exp{(-mK_2\Delta t)}</math> where <math>K_2</mat
+
dimension <math><m</math> is <math>C(r,m) \sim r^D \exp{(-mK_2\Delta t)}</math> where <math>K_2</math> is the order-2 Renyi entropy,
h> is the order-2 Renyi entropy,
+
 
a proxy for the ''Kolmogorov-Sinai'' entropy. Thus the GP algorithm can be used also to
 
a proxy for the ''Kolmogorov-Sinai'' entropy. Thus the GP algorithm can be used also to
estimate dynamical entropies \cite{GP2} (see Fig. 1).
+
estimate dynamical entropies \cite{GP2} (see Fig.&nbsp;1).
  
 
The basic idea of the GP algorithm, namely to estimate a dimension from the statistics
 
The basic idea of the GP algorithm, namely to estimate a dimension from the statistics
 
of near neighbors, can be implemented also in other ways. For instance, one can define
 
of near neighbors, can be implemented also in other ways. For instance, one can define
''pointwise dimensions'' <math>D(i)</math> by counting for each <math>i</math> the fraction <math>n_
+
''pointwise dimensions'' <math>D(i)</math> by counting for each <math>i</math> the fraction <math>n_i(r)/(N-1)</math> of
i(r)/(N-1)</math> of
+
points which are <math>r</math>-close neighbors of <math>{\mathbf x}_i</math> and fitting it to a power law. Alternatively,
points which are <math>r</math>-close neighbors of <math>'''x'''_i</math> and fitting it to a power
+
one obtains the ''information dimension'' <math>D_1</math> by fitting a power law to a geometric average,
law. Alternatively,
+
one obtains the ''information dimension'' <math>D_1</math> by fitting a power law to a geometric ave
+
rage,
+
 
<math>C_1(r,m) = \exp[N^{-1} \sum_i \ln (n_i(r)/(N-1))]</math>. Or, more generally, one can define non-linear
 
<math>C_1(r,m) = \exp[N^{-1} \sum_i \ln (n_i(r)/(N-1))]</math>. Or, more generally, one can define non-linear
 
averages by <math>C_q(r,m) = [N^{-1} \sum_i [(n_i(r)/(N-1)]^{q-1}]^{1/(q-1)}</math>. Notice that <math>C_2(r,m)
 
averages by <math>C_q(r,m) = [N^{-1} \sum_i [(n_i(r)/(N-1)]^{q-1}]^{1/(q-1)}</math>. Notice that <math>C_2(r,m)
Line 112: Line 90:
 
analyses are in general those with <math>q=1</math>.
 
analyses are in general those with <math>q=1</math>.
  
== Computational Complexity Aspects ==
+
==Computational Complexity Aspects==
 +
 
  
 
Typically, one wants to obtain <math>{\hat C}(r,m)</math> for <math>N_r</math> different values of <math>r</math> (equally
 
Typically, one wants to obtain <math>{\hat C}(r,m)</math> for <math>N_r</math> different values of <math>r</math> (equally
Line 124: Line 103:
 
can treat all values of <math>m</math> in a single run, which reduces the <math>M^2</math> dependence to <math>M</math>. This
 
can treat all values of <math>m</math> in a single run, which reduces the <math>M^2</math> dependence to <math>M</math>. This
 
can be further reduced to a weaker than linear increase with <math>M</math> (at least for intermediate
 
can be further reduced to a weaker than linear increase with <math>M</math> (at least for intermediate
values of <math>M</math>), if one replaces the double sum over <math>i</math> and <math>j</math> in Eq. (1) by a sum over <math>i</math> and <math>i-j</math>.
+
values of <math>M</math>), if one replaces the double sum over <math>i</math> and <math>j</math> in Eq.&nbsp;(1) by a sum over <math>i</math> and <math>i-j</math>.
 
For a fast implementation using also some other shortcuts, see \cite{Lehnertz1}.
 
For a fast implementation using also some other shortcuts, see \cite{Lehnertz1}.
  
Line 132: Line 111:
 
pairs with large <math>r</math>, obtaining substantial improvements \cite{boxes}.
 
pairs with large <math>r</math>, obtaining substantial improvements \cite{boxes}.
  
== "Optimal" Choices for Delay and Embedding Dimension ==
+
==``Optimal" Choices for Delay and Embedding Dimension==
 +
 
  
 
There exists a large literature which attempts to determine optimal choices for the delay
 
There exists a large literature which attempts to determine optimal choices for the delay
Line 148: Line 128:
 
other, near-by, choices \cite{GSS}.
 
other, near-by, choices \cite{GSS}.
  
== Non-Stationary Signals and Theiler Correction==
+
==Non-Stationary Signals and Theiler Correction==
 +
 
  
 
When applying the GP method to time sequences, one should remember that its justification
 
When applying the GP method to time sequences, one should remember that its justification
hinges on the assumption that all points <math>'''x'''_i</math> are independent apart from being
+
hinges on the assumption that all points <math>{\mathbf x}_i</math> are independent apart from being
 
distributed according to the same invariant measure. In particular, there should be no
 
distributed according to the same invariant measure. In particular, there should be no
 
significant time correlations.
 
significant time correlations.
  
 
This is manifestly and grossly violated, if the system is not stationary. In that case
 
This is manifestly and grossly violated, if the system is not stationary. In that case
a main reason for two points <math>'''x'''_i</math> and <math>'''x'''_j</math> to be close neighbors in space
+
a main reason for two points <math>{\mathbf x}_i</math> and <math>{\mathbf x}_j</math> to be close neighbors in space
 
might be that they are also close in time, as is most clearly demonstrated by (ordinary
 
might be that they are also close in time, as is most clearly demonstrated by (ordinary
 
or fractal) diffusion. Neglecting this has been one of the most common reasons for wrong
 
or fractal) diffusion. Neglecting this has been one of the most common reasons for wrong
 
claims for small attractor dimensions. Fortunately, there is an easy way to test against
 
claims for small attractor dimensions. Fortunately, there is an easy way to test against
this danger: Plot all pairs <math>(i,j)</math> with <math>|'''x'''_i-'''x'''_j|<r</math>
+
this danger: Plot all pairs <math>(i,j)</math> with <math>|{\mathbf x}_i-{\mathbf x}_j|<r</math>
 
against <math>|i-j|</math>, and check that they don't cluster at small <math>|i-j|</math> (more precisely, the
 
against <math>|i-j|</math>, and check that they don't cluster at small <math>|i-j|</math> (more precisely, the
 
density of these points should be <math>\sim N-|i-j|</math>). More
 
density of these points should be <math>\sim N-|i-j|</math>). More
Line 173: Line 154:
 
:<math thei> {\hat C}(r) = {2\over (N-\tau)(N-\tau-1)}\sum_{i+\tau<j} \theta(r-|x_i-x_j|).      </math>
 
:<math thei> {\hat C}(r) = {2\over (N-\tau)(N-\tau-1)}\sum_{i+\tau<j} \theta(r-|x_i-x_j|).      </math>
  
== Intermittent, Noisy, and Stochastic Time Sequences ==
+
==Intermittent, Noisy, and Stochastic Time Sequences==
 +
 
  
 
Experimental data are usually noisy and often intermittent. Strong intermittency
 
Experimental data are usually noisy and often intermittent. Strong intermittency
Line 203: Line 185:
  
 
\begin{figure}
 
\begin{figure}
 +
\begin{center}
 
\epsfig{file=henon.ps, width=8.3cm, angle=270}
 
\epsfig{file=henon.ps, width=8.3cm, angle=270}
\caption{ Log-log plot of <math>{\hat C}(r,m)</math> for the H{\'e}non map versus r, for <math>m=1,2,... 8</math>.
+
\caption{ Log-log plot of <math>{\hat C}(r,m)</math> for the H{\'e}non map versus r, for <math>m=1,2,\ldots 8</math>.
 
The H{\'e}non map is defined as <math>x_{i+1} = 1 -1.4x_i^2+0.3y_i,\; y_{i+1} = x_i</math>. The metric
 
The H{\'e}non map is defined as <math>x_{i+1} = 1 -1.4x_i^2+0.3y_i,\; y_{i+1} = x_i</math>. The metric
 
used is the maximum norm.
 
used is the maximum norm.
 
A power law <math>{\hat C}(r,m) \sim r^D</math> with <math>D\approx 1.21</math> is seem in the central region
 
A power law <math>{\hat C}(r,m) \sim r^D</math> with <math>D\approx 1.21</math> is seem in the central region
(<math>-9 < \ln r < -2</math>) for all curves with <math>m\geq 2</math>. For smaller <math>r</math>, the data look noisy due
+
(<math>-10 < \ln r < -2</math>) for all curves with <math>m\geq 2</math>. For smaller <math>r</math>, the data look noisy due
to lack of statistics (<math>n=3\times 10^5</math>) and due to the randomness of the chosen starting point <math>x_0</math>.
+
to lack of statistics (<math>n=3\times 10^5</math>). For larger <math>r</math> one sees deviations from scaling
For larger <math>r</math> one sees deviations
+
which would be different if the Euclidean norm had been used. The gaps between
from scaling which would be different had we chosen the Euclidean norm. The gaps between
+
 
the curves for <math>m\geq 2</math> in the scaling region determine the dynamical entropy <math>K_2</math>.}
 
the curves for <math>m\geq 2</math> in the scaling region determine the dynamical entropy <math>K_2</math>.}
 +
\end{center}
 
\end{figure}
 
\end{figure}
  
Line 243: Line 226:
 
(Cambridge University Press, Cambridge 2003).
 
(Cambridge University Press, Cambridge 2003).
 
\bibitem{Tisean} R. Hegger, H. Kantz and T. Schreiber, TISEAN sortware package; URL
 
\bibitem{Tisean} R. Hegger, H. Kantz and T. Schreiber, TISEAN sortware package; URL
{\sf http://www.mpipks-dresden.mpg.de/ tisean} (2007).
+
{\sf http://www.mpipks-dresden.mpg.de/&nbsp;tisean} (2007).
 
\end{thebibliography}
 
\end{thebibliography}

Revision as of 23:10, 13 April 2007

Contents

Basic Definitions

The GP algorithm is used for estimating the correlation dimension of some fractal measure \(\mu\) from a given set of points randomly distributed according to \(\mu\). Let the \(N\) points be denoted by \({\mathbf x}_1,\ldots {\mathbf x}_N\), in some metric space with distances \(|{\mathbf x}_i-{\mathbf x}_j|\) between any pair of points. For any positive number \(r\), the correlation sum \(C(r)\) is then defined as the fraction of pairs whose distance is smaller than \(r\),

<math eq1> {\hat C}(r) = {2\over N(N-1)}\sum_{i<j} \theta(r-|{\mathbf x}_i-{\mathbf x}_j|), </math>

where \(\theta(x)\) is the Heaviside step function. It is an unbiased estimator of the correlation integral

<math

eq2> C(r) = \int d\mu({\mathbf x}) \int d\mu({\mathbf y}) \theta(r-|{\mathbf x}-{\mathbf y}|). </math>

Both \({\hat C}(r)\) and \(C(r)\) are monotonically decreasing to zero as \(r\to 0\). If \(C(r)\) decreases like a power law, \(C(r) \sim r^D\), then \(D\) is called the correlation dimension of \(\mu\). The term ``GP algorithm" is used generically for any algorithm which attempts to estimate \(D\) (and more generally \(C(r)\)) from the small-\(r\) behavior of \({\hat C}(r)\), in particular when the input data are in form of a time series. Because this involves an extrapolation to a limit where the statistics is severely undersampled for any finite \(N\), this is an inherently ill-posed problem. The simplest and most naive way to estimate \(D\) is to plot \(C(r)\) against \(r\) on a log-log plot and to fit a straight line to the small-\(r\) tail of the curve. \(D\) is then the slope of this line (see Fig. 1). More sophisticated methods involve e.g. fitting local slopes \(D_{\rm eff}(r)\) and extrapolating them to \(r\to 0\), or the method proposed in \cite{takens1,Theiler0}.

Main Application: Chaotic Dynamical Systems

Although the GP algorithm can be used for any measure (the basic idea had been used before to estimate dimensions of fractal clusters created by diffusion limited aggregation \cite{a1}), it is mostly used to measure the fractal dimensions of a strange attractor from a univariate (i.e. scalar) time series which is denoted as \(x_1,\ldots x_N\). Now, \(x_i\) represents a measurement of the quantity \(x\) at time \(t_i = t_0 + i\Delta t\). We assume stationarity, i.e. the statistics of the set \(\{x_i\}\) is invariant under time translation. Unless the measurements are i.i.d., there will be correlations between successive measurements. But they will be weak and short-ranged, if the data are produced by a chaotic system, i.e. if they are sampled from a trajectory on a strange attractor (or strange repeller). In that case, and if \(N\) is sufficiently big, one can assume that the data are effectively independent and randomly sampled from the invariant natural measure on the attractor, and one can directly take over Eqs.(<ref>eq1</ref>) and (<ref>eq2</ref>). Furthermore, using Takens' time delay embedding theorem \cite{takens2} and its improvements \cite{sauer}, one can replace a series of \(N+m-1\) univariate measurements by a time series of \(N\) delay vectors

<math delay> {\mathbf x}_i = (x_{i-m+1},x_{i-m+2},\ldots x_i) \in R^m </math>

where \(m\) is the embedding dimension. Estimating the dimension of attractors by using Eqs.(<ref>eq1</ref>) with delay vectors, and using Euclidean distances in delay vector space, was first proposed in \cite{GP}. An equivalent algorithm with maximum instead of Euclidean norm had been proposed independently by Takens \cite{takens3}.

One of the main applications of the GP algorithm is to distinguish (in principle) between stochastic and deterministically chaotic time sequences. For a stochastic signal, \(C(r,m) \sim r^m\) for all \(m\). In contrast, \(C(r,m) \sim r^D\) for \(m\) larger than the attractor dimension, if the signal is generated by a deterministic system. Notice that in both cases the Fourier spectrum is continuous, and thus cannot be used to make this distinction. In practice, the distinction based on \(C(r,m)\) is often not possible either, due to experimental noise, finiteness of \(N\), non-stationarity and intermittency effects, and due to the uncertainties involved in the extrapolation \(r\to 0\). It seems fair to say that a large fraction of the relevant literature is obsolete, because authors have underestimated these problems in view of the easiness of the implementation of the bare algorithm.

Relations to Other Dynamical Invariants and Multifractality

Let us denote by \(C(r,m)\) the correlation integral obtained with embedding dimension \(m\). Then the typical behavior for \(r\to 0,\; m\to \infty\) for a chaotic system with attractor dimension \(<m\) is \(C(r,m) \sim r^D \exp{(-mK_2\Delta t)}\) where \(K_2\) is the order-2 Renyi entropy, a proxy for the Kolmogorov-Sinai entropy. Thus the GP algorithm can be used also to estimate dynamical entropies \cite{GP2} (see Fig. 1).

The basic idea of the GP algorithm, namely to estimate a dimension from the statistics of near neighbors, can be implemented also in other ways. For instance, one can define pointwise dimensions \(D(i)\) by counting for each \(i\) the fraction \(n_i(r)/(N-1)\) of points which are \(r\)-close neighbors of \({\mathbf x}_i\) and fitting it to a power law. Alternatively, one obtains the information dimension \(D_1\) by fitting a power law to a geometric average, \(C_1(r,m) = \exp[N^{-1} \sum_i \ln (n_i(r)/(N-1))]\). Or, more generally, one can define non-linear averages by \(C_q(r,m) = [N^{-1} \sum_i [(n_i(r)/(N-1)]^{q-1}]^{1/(q-1)}\). Notice that \(C_2(r,m) \equiv C(r,m)\). If \(C_q((r,m) \sim r^{D_q} \exp{(-m{K_q}\Delta t)}\), then \(D_q\) and \(K_q\) are called order-\(q\) Renyi dimensions and order-q dynamical entropies. Thus \(D\) is also called \(D_2\), the order-2 Renyi dimension.

Measures for which \(D_q\) is independent of \(q\) are called monofractal, those with non-trivial \(q\)-dependence are called multifractal \cite{G1,HP,Jetal}. All \(D_q\) and \(K_q\) are (metric) invariants, i.e. their values do not change when the metric \(|x-y|\) is replaced by some other metric, or when \(x_i \to f(x_i)\) with smooth and invertible \(f(x)\). While the invariants with \(q=2\) are easiest to measure, the most interesting invariants for theoretical analyses are in general those with \(q=1\).

Computational Complexity Aspects

Typically, one wants to obtain \({\hat C}(r,m)\) for \(N_r\) different values of \(r\) (equally spaced on logarithmic scale) and for \(M\) different values of \(m\). Naive evaluation of Eq.(<ref>eq1</ref>) requires then \(O(N^2N_rM^2)\) operations. With e.g. \(N=10^4, N_r = 10^2, M=10\) (rather modest requirements), this is already a non-trivial task on a fast PC. The most obvious improvement is obtained by binning \(r\) logarithmically and storing \[ {\hat C}(r_k,m)- {\hat C}(r_{k-1},m) = {1\over N(N-1)}\#\{(i,j):\; r_{k-1} < |x_i-x_j| < r_k\} \] in separate entries of a histogram. This reduces the complexity to \(O(N^2M^2)\). Next, one can treat all values of \(m\) in a single run, which reduces the \(M^2\) dependence to \(M\). This can be further reduced to a weaker than linear increase with \(M\) (at least for intermediate values of \(M\)), if one replaces the double sum over \(i\) and \(j\) in Eq. (1) by a sum over \(i\) and \(i-j\). For a fast implementation using also some other shortcuts, see \cite{Lehnertz1}.

For very large \(N\) one can reduce CPU time further by noticing that it is mainly the small \(r\) tail of \({\hat C}(r,m)\) that is of interest. By preprocessing the data (using e.g. grids and taking pairs of points only from neighboring boxes) one can avoid counting pairs with large \(r\), obtaining substantial improvements \cite{boxes}.

``Optimal" Choices for Delay and Embedding Dimension

There exists a large literature which attempts to determine optimal choices for the delay \(\Delta t\) and for \(m\). The delay is often chosen such that some measure of dependence (e.g. mutual information) between successive coordinates \(x_i\) and \(x_{i+1}\) of delay vectors has a local minimum. This is, however, not what one really wants. Instead, one wants to avoid that all \(m\) components are too dependent. These two requirements are in general mutually exclusive \cite{GSS}. Also, there are in general no optimal values of \(m\) and \(\Delta t\) separately, but only for the product \((m+1)\Delta t\). The reason is simply that adding more measured values cannot be detrimental (at least if the data are not too noisy, if the maximum norm is used, and if one has enough computing power). The only general advice one can give for \(\Delta t\) and for \(m\) is to avoid values for which \(D\) has a local minimum, because that means that such a choice cannot resolve all effective degrees of freedom, as they would be seen with other, near-by, choices \cite{GSS}.

Non-Stationary Signals and Theiler Correction

When applying the GP method to time sequences, one should remember that its justification hinges on the assumption that all points \({\mathbf x}_i\) are independent apart from being distributed according to the same invariant measure. In particular, there should be no significant time correlations.

This is manifestly and grossly violated, if the system is not stationary. In that case a main reason for two points \({\mathbf x}_i\) and \({\mathbf x}_j\) to be close neighbors in space might be that they are also close in time, as is most clearly demonstrated by (ordinary or fractal) diffusion. Neglecting this has been one of the most common reasons for wrong claims for small attractor dimensions. Fortunately, there is an easy way to test against this danger: Plot all pairs \((i,j)\) with \(|{\mathbf x}_i-{\mathbf x}_j|<r\) against \(|i-j|\), and check that they don't cluster at small \(|i-j|\) (more precisely, the density of these points should be \(\sim N-|i-j|\)). More common tests for stationarity are less useful, as they are sensitive to the bulk of the data and not only to the tiny fraction of small distance pairs.

Even for stationary systems, pairs with very small \(|i-j|\) will not be independent and should thus be excluded from the analysis. As suggested by Theiler \cite{Theiler}, this is done by defining a generous upper limit \(\tau\) to the correlation time, and replacing Eq.(<ref>eq1</ref>) by

<math thei> {\hat C}(r) = {2\over (N-\tau)(N-\tau-1)}\sum_{i+\tau<j} \theta(r-|x_i-x_j|). </math>

Intermittent, Noisy, and Stochastic Time Sequences

Experimental data are usually noisy and often intermittent. Strong intermittency poses a practical problem, in that it implies a large time scale over which the signal does not look stationary. It also leads often to very inhomogeneous invariant measures, so that any scaling law is likely to show very large corrections. Finally, it usually implies a strong dependence on the order \(q\), so that \(D\) is a bad proxy for the more interesting information dimension \(D_1\).

Low amplitude and high frequency noise (the most common case) leads to deviations from scaling behavior at small \(r\). In the ideal case, it fills the available phase space, and thus \({\hat C}(r,m) \sim r^m\) below the noise level, with \({\hat C}(r,m) \sim r^D\) above. In this case the estimation of \(D\) is more difficult but still possible.

The worst case is when a separation into noise and deterministic signal is no longer possible. In this case, looking for scaling behavior is no longer adequate. But studying \({\hat C}(r,m)\) can still be useful for rejecting null models, such as AR and ARMA models popular e.g. in economy \cite{Brock}. Another application of \({\hat C}(r,m)\) is to EEG analysis. There, even if estimates of ``dimensions" are usually misleading, the shape of \({\hat C}(r,m)\) can systematically depend on mental states which might not be easily distinguished otherwise. For instance, values of \({\hat C}(r,m)\) at small \(r\) are increased (i.e. effective dimensions are reduced) during sleep, under the influence of narcotic drugs, and during epileptic seizures. An interesting suggestion which has stimulated much controversy is that there is also a ``preictal" phase preceding epileptic seizures, during which \(D_{\rm eff}\) is reduced and which could be used to predict seizures \cite{Lehnertz2}.

For further reading, see \cite{Kantz}. For public domain software, see e.g. \cite{Tisean}.

\begin{figure} \begin{center} \epsfig{file=henon.ps, width=8.3cm, angle=270} \caption{ Log-log plot of UNIQ70f5a7665b934b12-MathJax-133-QINU for the H{\'e}non map versus r, for UNIQ70f5a7665b934b12-MathJax-134-QINU. The H{\'e}non map is defined as UNIQ70f5a7665b934b12-MathJax-135-QINU. The metric used is the maximum norm. A power law UNIQ70f5a7665b934b12-MathJax-136-QINU with UNIQ70f5a7665b934b12-MathJax-137-QINU is seem in the central region (UNIQ70f5a7665b934b12-MathJax-138-QINU) for all curves with UNIQ70f5a7665b934b12-MathJax-139-QINU. For smaller UNIQ70f5a7665b934b12-MathJax-140-QINU, the data look noisy due to lack of statistics (UNIQ70f5a7665b934b12-MathJax-141-QINU). For larger UNIQ70f5a7665b934b12-MathJax-142-QINU one sees deviations from scaling which would be different if the Euclidean norm had been used. The gaps between the curves for UNIQ70f5a7665b934b12-MathJax-143-QINU in the scaling region determine the dynamical entropy UNIQ70f5a7665b934b12-MathJax-144-QINU.} \end{center} \end{figure}

\begin{thebibliography}{99} \bibitem{takens1} F. Takens, in: B.L.J. Braaksma et al., eds., "Dynamical Systems and Bifurcations", Lecture Notes in Math. Vol. 1125, Springer, Heidelberg (1985). \bibitem{Theiler0} J. Theiler, Phys. Lett. A 135, 195 (1988). \bibitem{a1} T.A. Witten and L.M. Sander, Phys. Rev. Lett. '''47''', 1400 (1981). \bibitem{takens2} F. Takens, in: Proc. Warwick Symp. 1980, D. Rand and B.S. Young, eds., Lecture Notes in Math. 898 (Springer, Berlin, 1981). \bibitem{sauer} T. Sauer, J.A. Yorke, and M. Casdagli, J. Stat. Phys. '''65''', 579 (1991). \bibitem{GP} P. Grassberger and I. Procaccia, Physica D '''9''', 198 (1983); Phys. Rev. Lett. '''50''', 346 (1983). \bibitem{takens3} F. Takens, ``Invariants related to dimension and entropy", Atas do UNIQ70f5a7665b934b12-MathJax-145-QINU Coloquio Brasileiro de Matematica (1982). \bibitem{GP2} P. Grassberger and I. Procaccia, Phys. Rev. A '''28''', 2591 (1983). \bibitem{G1} P. Grassberger, Phys. Lett. A '''97''', 227 (1983); '''107''', 101 (1985). \bibitem{HP} H.G.E. Hentschel and I. Procaccia, Physica D '''8''', 435 (1983). \bibitem{Jetal} T.C. Halsey, M.H. Jensen, L.P. Kadanoff, I. Procaccia, and B.I. Shraiman, Phys. Rev. A 33, 1141 (1986). \bibitem{Lehnertz1} G. Widmann et al., Physica D '''121''', 65 (1998). \bibitem{boxes} T. Schreiber, Int. J. Bifurcation and Chaos 5, 349 (1995). \bibitem{GSS} P. Grassberger, T. Schreiber, and C. Schaffrath, Int. J. Bifurcation and Chaos 1, 521 (1991). \bibitem{Theiler} J. Theiler, J. Opt. Soc. Amer. A '''7''', 1055 (1990). \bibitem{Brock} W.A. Brock, W.D. Dechert, J.A. Scheinkman, and B. LeBaron, Economic Reviews '''15''', 197 (1996). \bibitem{Lehnertz2} C.E. Elger and K. Lehnertz, ``Prediction of seizure occurrence by chaos analysis: Technique and therapeutic implications", in: F. Rosenow et al., eds., Handbook of Clinical Neurophysiology Vol.3, pp.491-500 (2004). \bibitem{Kantz} H. Kantz and T. Schreiber, "Nonlinear time series analysis", 2nd edition (Cambridge University Press, Cambridge 2003). \bibitem{Tisean} R. Hegger, H. Kantz and T. Schreiber, TISEAN sortware package; URL {\sf http://www.mpipks-dresden.mpg.de/ tisean} (2007). \end{thebibliography}

Personal tools
Namespaces

Variants
Actions
Navigation
Focal areas
Activity
Tools