# Ornstein theory

**Ornstein theory** is about the relationship between coin tossing and dynamical systems (even systems governed by Newton's laws of motion).

## Background

The basic objects of abstract ergodic theory are measure preserving, invertible transformations, \(T\ ,\) on a measure space \(X \!\ ,\) with measure \(\mu\) \((\mu (X) = 1)\ ,\) or a one-parameter family, \(T_{t}\ ,\) of such transformations.

We call the latter a "flow," and denote it by \((T_{t}, X, \mu)\ .\) We call the former a transformation or "shift," and denote it by \((T, X, \mu)\ .\)

### Dynamical systems

Measure preserving flows (or transformations) were originally studied in the context of dynamical systems (e.g., the time evolution of gas in a box or the motion of a billiard ball on a frictionless table).

The state or configuration of the system is represented by a point, \(x\ ,\) in phase space \(M\) (e.g., the position and direction of motion of the ball are represented by a point in 3 dimensions).

Newton's laws determine where \(x\) in \(M\) will be at time \(t\ ;\) we denote this by \(T_{t}(x)\ .\)

There is an invariant probability measure, \(\mu\ ,\) with the same sets of measure \(0\) as Lebesgue measure (Liouville theorem). (For billiards, this is 3-dimensional Lebesgue measure).

\(\mu\) of a set E models the probability that the state of the system is in E. The reasons for adding a probability structure to our model are: In most physical phenomena, time has a preferred direction (e.g., entropy increases). Since Newton’s laws are time reversible, we need a probability structure to explain this. In addition, our system may have configurations that are consistent with Newton’s laws, but have probability zero (or very small probability). We want this to be included in our model.

Ergodic theory studies these systems at a level of abstraction where we regard \(M, \mu\) as an abstract measure space \(X, \mu\) (i.e., we ignore sets of probability \(0\ ,\) and the geometry of \(M\)). We say that two systems are isomorphic if they are the same at this level of abstraction [\( (T_{t}, X, \mu)\) or \((T, X, \mu)\) is isomorphic to \((\hat {T}_{t}, \hat{X}, \hat{\mu})\) or \((\hat {T}, \hat{X}, \hat{\mu})\) if there is a 1-1 invertible measure preserving map between \(X\) and \( \hat{X}\ ,\) that takes the action of \(T_{t}\) (or \(T\)) to the action of \(\hat{T}_{t}\) (or \(\hat {T}\))]. When the meaning is clear, we will replace "is isomorphic to" with "is".

### Stationary processes

Stationary processes are the second major setting for ergodic theory.

A stationary process is a measure, \(\mu\ ,\) on its realizations. Each realization is represented by a point, \(x\ ,\) in \(X\ .\) (In the case of coin flipping, a realization is a doubly infinite sequence of \(H\) and \(T\ .\) If the probability of \(H\) is \(p\) and that of \(T\) is \(q\ ,\) then the probability of all realizations that start with \(H, H, T\) will be \(p \cdot p \cdot q\ ,\) etc. This extends to a measure on \(X\ .\))
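As a small illustration of the coin-flipping measure just described, the following Python sketch computes the probability of the set of realizations that begin with a given finite word. The function name `cylinder_probability` and the value \(p = 0.6\) are assumptions made for illustration; they do not come from the article.

```python
import math
from itertools import product

def cylinder_probability(word, p):
    """Probability that an independent coin with P(H) = p produces `word` first.

    `word` is a string over {"H", "T"}; q = 1 - p is the probability of T.
    """
    q = 1.0 - p
    return math.prod(p if symbol == "H" else q for symbol in word)

# Realizations that start with H, H, T have probability p * p * q.
p = 0.6  # illustrative value, not from the article
prob_hht = cylinder_probability("HHT", p)

# Consistency check: the probabilities of all words of length 3 add up to 1.
total = sum(cylinder_probability("".join(w), p) for w in product("HT", repeat=3))
```

The consistency check reflects the fact that the words of a fixed length partition the space of realizations.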

\(T\) or \(T_t\) is the time shift on these realizations.

We get back the process by a function, \(F\ ,\) on \(X\ ,\) such that \(\lbrace{F(T^{n}(x))}\rbrace_{-\infty}^{\infty}\) or \(\lbrace{F(T_{t}(x))}\rbrace_{-\infty}^{\infty}\) are the realizations we started out with.

If we start with an arbitrary \(F\ ,\) we get a measure on the \(\lbrace{F(T^{n}{(x)})}\rbrace_{-\infty}^{\infty}\) or \(\lbrace{F(T_{t}(x))}\rbrace_{-\infty}^{\infty}\ .\) This is the model for a stationary process, except that many \(x\) can give the same \(\lbrace{F(T^{n}(x))}\rbrace\) or \(\lbrace{F(T_{t}(x))}\rbrace.\) If we lump these points together, we get a new transformation or flow, which we call a **factor** of \((T, X, \mu)\) or \((T_{t}, X, \mu)\ .\)
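The lumping of points can be seen in a toy case. The Python sketch below, under the assumption that a point \(x\) is a two-sided 0/1 sequence represented as a function from integers to bits, compares a function \(F\) that reads the coordinate at time 0 with one that reads the XOR of two adjacent coordinates. The names `iterate_shift`, `process`, `read_zero`, and `xor_pair` are hypothetical helpers, not notation from the article.

```python
def iterate_shift(x, n):
    """Return T^n(x), where T is the time shift on two-sided sequences."""
    return lambda k: x(k + n)

def process(F, x, times):
    """List the outputs F(T^n(x)) for n in `times`."""
    return [F(iterate_shift(x, n)) for n in times]

def read_zero(x):
    """F that reads the time-0 coordinate; it recovers the realization itself."""
    return x(0)

def xor_pair(x):
    """F that does not separate points: it only sees x(0) XOR x(1)."""
    return x(0) ^ x(1)

def x(n):
    """An illustrative periodic point: 1 at multiples of 3, 0 elsewhere."""
    return 1 if n % 3 == 0 else 0

def x_flipped(n):
    """The point obtained from x by exchanging 0 and 1 everywhere."""
    return 1 - x(n)

window = range(-3, 4)
original = process(read_zero, x, window)
seen_from_x = process(xor_pair, x, window)
seen_from_flipped = process(xor_pair, x_flipped, window)
```

Here `x` and `x_flipped` are different points, yet `xor_pair` produces the same outputs along both orbits, so they are lumped together in the factor defined by that choice of \(F\).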

**Every \((T, X, \mu)\) or \((T_{t}, X, \mu)\) can be described as the shift on the realizations of some stationary process** (infinitely many processes give the same \((T, X, \mu)\) or \((T_{t}, X, \mu)\)).

**BERNOULLI SHIFTS are those transformations that come from independent processes.**

If the values of an independent process have probabilities \(p_{i}\ ,\) then the Shannon entropy of the process is \(-\sum_{i} p_{i} \log p_{i}\ .\) Kolmogorov showed that this entropy is an isomorphism invariant of the corresponding Bernoulli shift.

This answered a long-standing question by showing that not all Bernoulli shifts were isomorphic. The question then became: are all Bernoulli shifts of the same entropy isomorphic?

A positive answer to this question was given 12 years later by Ornstein.
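To make the role of entropy concrete, here is a minimal Python sketch that evaluates the standard formula \(-\sum_i p_i \log p_i\) for a few independent processes. The function name `shannon_entropy` and the particular probability vectors are illustrative choices, not taken from the article.

```python
import math

def shannon_entropy(probabilities):
    """Entropy -sum p_i log p_i (natural logarithm) of an independent process."""
    return -sum(p * math.log(p) for p in probabilities if p > 0)

# Different entropies: by Kolmogorov's invariant these shifts are not isomorphic.
h_two = shannon_entropy([1 / 2, 1 / 2])            # equals log 2
h_three = shannon_entropy([1 / 3, 1 / 3, 1 / 3])   # equals log 3

# Same entropy from different probability vectors: this is the case
# that the question above is about.
h_four = shannon_entropy([1 / 4] * 4)
h_five = shannon_entropy([1 / 2, 1 / 8, 1 / 8, 1 / 8, 1 / 8])
```

Both `h_four` and `h_five` equal \(2 \log 2\), so the invariant alone cannot distinguish the two shifts.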

## Ornstein theory

### Theorems 1, 2, & 3

**Theorem 1:** Bernoulli shifts of the same (finite or infinite) entropy are isomorphic.

This is the starting point of the Ornstein theory, which goes well beyond the theorem above.

**Theorem 2:**

(a) Any factor of a Bernoulli shift is a Bernoulli shift

(b) If T has positive entropy \(k\) (or \(\infty\)), then it has a factor which is a Bernoulli shift of any entropy ≤\(k\) (or ≤ \(\infty\)).

__Remark__: Part (b) was proved by Sinai before the Ornstein theorem by a completely different method.

**Theorem 3:**

There exists an abstract finite entropy flow, \(B_{t}\ ,\) with the following properties:

- (a) If we discretize time, the resulting shift, \(B_{t_{0}}\ ,\) is a Bernoulli shift of finite entropy for all \(t_{0}\) (\(t_{0}\) is a constant).
- (b) Every Bernoulli shift of finite entropy is isomorphic to some \(B_{t_{0}}\ .\)
- (c) Uniqueness. If \(T_{t_{0}}\) is a finite entropy Bernoulli shift for some \(t_{0}\ ,\) then \(T_{t}\) is \(B_{ct}\) for some constant, \(c\ .\)
- (d) The only factors of \(B_{ct}\) are the \(B_{at}\) for \(a \le c\ .\)
- (e) If \(T_t\) has finite positive entropy, then \(T_t\) has \(B_{ct}\) as a factor of the same entropy.

There exists an abstract flow of infinite entropy, \(B_{t}^{\infty}\ ,\) with the following properties:

- All \(B_{ct}^{\infty}\) are isomorphic (unlike finite entropy).
- \(B_{t_{0}}^{\infty}\) is the Bernoulli shift of infinite entropy for all \(t_{0}\ .\)
- If \(T_{t_{0}}\) is the Bernoulli shift of infinite entropy for some \(t_{0}\ ,\) then \(T_{t}\) is \(B_{t}^{\infty}\ .\)
- The only factors of \(B_{t}^{\infty}\) are \(B_{t}^{\infty}\) and the \(B_{ct}\ .\) They appear as factors of all flows of infinite entropy.

__Remarks:__

- A basic fact about entropy is: the entropy of the flow \(F_{ct}\) equals \(c\) times the entropy of \(F_{t}\ ,\) and the entropy of the shift \(F_{t_{0}}\) equals \(t_{0}\) times the entropy of the flow \(F_{t}\ .\) Thus (b) above (every Bernoulli shift of finite entropy is isomorphic to some \(B_{t_{0}}\)) contains Theorem 1, but (since \(B_{t_{0}}\) is the time one transformation of \(B_{t_{0}t}\)) it also implies that **even though Bernoulli shifts of different entropies are not isomorphic, they are "essentially" the same in the sense that they only differ by a rescaling of time.**
- Theorem 3 contains the result: Bernoulli flows (\(B_{t}^{\infty}\) and the \(B_{ct}\)) of the same entropy are isomorphic.
- One reason for the interest in Bernoulli shifts is that, at the level of abstraction of measure preserving transformations, they are the "most random" transformations possible in the same sense that independent processes are. (There are more ways to justify the statement above and we will return to this issue when we describe the randomness hierarchy.) One of the main implications of Theorem 3 is a continuous time analog, i.e., **it proves the existence of an object (or objects) (\(B_{t}^{\infty}\) and the \(B_{ct}\)) that can, at our level of abstraction, be considered to be the "most random" flows possible** (we will return to this when we discuss the randomness hierarchy in continuous time).
- It was not previously known if the Bernoulli shift \(\frac {1}{2}\ ,\) \(\frac {1}{2}\) had a square root. Our theorem can be thought of as a very strong answer.
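The entropy scaling in these remarks can be turned into simple arithmetic. The Python sketch below assumes, purely for illustration, that the flow \(B_t\) is normalized to have entropy 1; the function name `discretization_time` is hypothetical. It finds the time \(t_0\) at which \(B_{t_0}\) has a prescribed entropy, and the entropy of the time-\(t_0/2\) map, which is a square root of \(B_{t_0}\) because applying it twice gives \(B_{t_0}\).

```python
import math

def discretization_time(shift_entropy, flow_entropy):
    """Time t0 at which the time-t0 map of a flow has entropy `shift_entropy`.

    Uses the fact that the entropy of the shift F_{t0} is t0 times the
    entropy of the flow F_t.
    """
    return shift_entropy / flow_entropy

flow_entropy = 1.0  # assumed normalization of the entropy of B_t

# Bernoulli shifts (1/2, 1/2) and (1/3, 1/3, 1/3) sit at different times
# of the same flow.
t_half = discretization_time(math.log(2), flow_entropy)
t_third = discretization_time(math.log(3), flow_entropy)

# The time-(t_half / 2) map, applied twice, is the time-t_half map.
t_root = t_half / 2
root_entropy = t_root * flow_entropy
```

Under this normalization the square root of the \(\frac{1}{2}, \frac{1}{2}\) shift has entropy \(\frac{1}{2}\log 2\), and by part (a) of Theorem 3 it is itself a Bernoulli shift.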

### Bernoulli examples

Perhaps the most important parts of the Ornstein theory are criteria for determining whether or not a shift or flow is Bernoulli (a Bernoulli shift, \(B_{ct}\ ,\) or \(B_{t}^{\infty}\)) because they allow us to prove that certain concrete systems are Bernoulli^{1}. These criteria, however, take the longest to describe, so we will start with a sampling of some of their consequences and postpone their description to the end of this article.

- The flow that we get from the following stationary process is isomorphic to \(B_{ct}\ :\) the process has two outputs, A and B. A lasts for time \(t_a\) and B lasts for time \(t_b\ .\) At the end of this time, we flip a coin to determine the next output (\( t_a\) and \(t_b\) are not rationally related).
- The flow we get from Brownian motion on the unit interval with reflecting barriers is isomorphic to \(B_{t}^{\infty}\)^{2}.
- Geodesic flow on a manifold, \(M\ ,\) of negative curvature is isomorphic to \(B_{ct}\)^{3} (the flow is actually on the tangent space to \(M\)).
- The motion of a billiard ball on a table with a finite number of convex obstacles (Sinai billiards) is isomorphic to \(B_{ct}\)^{4}.
- Let \(T_t\) be a smooth flow on a compact 3-dimensional manifold with a smooth invariant measure^{5}; then each ergodic component on which \(T_t\) has positive entropy has positive probability and is either \(B_{ct}\) or \(B_{ct} \times R_{t}\) (\( R_{t}\) is rotation of the circle)^{6}. (The analogous result holds for diffeomorphisms on 2-dimensional manifolds.)
- Mixing Markov processes or mixing multi-step Markov processes are Bernoulli shifts.
- Ergodic automorphisms of compact groups are Bernoulli shifts^{7}.
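For the Markov example, entropy tells us which Bernoulli shift is involved. The Python sketch below computes the entropy of a stationary Markov chain with the standard formula \(-\sum_i \pi_i \sum_j P_{ij} \log P_{ij}\ ,\) where \(\pi\) is the stationary distribution. The transition matrix, the helper names, and the use of simple power iteration are all assumptions made for illustration.

```python
import math

def stationary_distribution(P, iterations=1000):
    """Approximate the stationary row vector pi (pi P = pi) by power iteration."""
    n = len(P)
    pi = [1.0 / n] * n
    for _ in range(iterations):
        pi = [sum(pi[i] * P[i][j] for i in range(n)) for j in range(n)]
    return pi

def markov_entropy(P):
    """Entropy -sum_i pi_i sum_j P_ij log P_ij of the stationary chain."""
    pi = stationary_distribution(P)
    n = len(P)
    return -sum(
        pi[i] * P[i][j] * math.log(P[i][j])
        for i in range(n)
        for j in range(n)
        if P[i][j] > 0
    )

# Illustrative mixing two-state chain (all transition probabilities positive).
P = [[0.9, 0.1],
     [0.5, 0.5]]
pi = stationary_distribution(P)
h = markov_entropy(P)
```

Since a mixing Markov process is a Bernoulli shift, Theorem 1 says that this chain is isomorphic to any independent process whose entropy equals `h`.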

__Remarks:__

- **It was previously believed that flows arising from "random" stationary processes (i.e., the time evolution is governed by coin flipping) and the flows arising from dynamical systems where the time evolution is governed by Newton's law were distinct.**
- This also has relevance to chaos "theory," which is about the ways that some Newtonian or deterministic systems "behave as if they were random."

### "Randomness" hierarchy (discrete time)

We can think of the "randomness"^{8} or "lack of determinism" of an abstract transformation \((T,X)\) in terms of the collection of all finite or countable valued processes \(T, F, X\) (i.e., measurements on the system). From this point of view, we should look at all factors (all \(F\ ,\) not just those that generate the \(\sigma\) algebra).

- Completely deterministic: the past of \(T, F, X\) determines the future. This is the same as zero entropy (all factors have zero entropy).
- Positive entropy: Could have zero entropy factors. Must have a Bernoulli factor of full entropy (Theorem 2b).
- Completely positive entropy (the K-property): Eliminates zero entropy factors.
- Bernoulli shifts: Only Bernoulli factors (Theorem 2a).

### Counter examples

Counter examples flesh out the hierarchy above.

It was once conjectured that the only transformations with the K-property were Bernoulli shifts.^{9} This was shown by Ornstein to be false. There are uncountably many non-isomorphic K transformations of the same entropy^{10}. In fact, we now have a zoo of qualitatively different transformations with the K-property (some of the most ingenious are due to Rudolph and Hoffman).

This is relevant to criteria for Bernoulli because most of the systems that have been shown to be Bernoulli were previously known to have the K-property. We now know that the K-property designates a large and messy class, whereas the "Bernoulli property" is essentially a complete characterization at our level of abstraction.

We should also note that positive entropy has a non-trivial place in our hierarchy in the sense that it involves more than zero-entropy and the K-property. It was conjectured (Pinsker) that every positive entropy transformation was the direct product of zero-entropy and K.^{11} This was shown to be false by Ornstein.

__Remark__: The "randomness hierarchy" gives more meaning to the statement "The Bernoulli shifts are the most random (or least deterministic) possible."

### "Randomness" hierarchy (continuous time)

**Theorem 3 allows us to extend the hierarchy to continuous time by providing a unique top to the hierarchy:** \(B_{ct}\ ,\) \(B_t^{\infty}\ ,\) which is pure in the sense that the only factors of \(B_{ct}\) and \(B_t^{\infty}\) are \(B_{ct}\) and \(B_t^{\infty}\ ,\) and is responsible for all positive entropy (Theorem 3e).

Smorodinsky extended the non-Bernoulli K to continuous time. The counter example to the Pinsker conjecture also extends.

### Stability

The reason that we are able to produce isomorphisms between very different dynamical systems is that we have abstracted out the geometry of the phase space. If, however, we are comparing systems with the same phase space and where the second system is obtained from the first by a small change in the dynamics (laws of motion), then it is reasonable to ask for an **isomorphism that respects the geometry of the phase space**.

This would give us a "stability" result in the sense that a small change in the defining dynamics will produce a small change in the behavior of the system.

**Theorem:** Let \(F_t\) be geodesic flow on a manifold, or surface, \(M\) of negative curvature (the flow is really on the tangent bundle \(M^T\) of \(M\ ,\) the phase space of the system) and define \(\hat{F}_t\) by a small \(C^2\) change in the Riemannian structure of \(M\ .\)

Then, given \(\varepsilon > 0\ ,\) if the change is small enough, \(\hat{F}_t\) will be isomorphic to \(F_{ct}\) with \(|c-1|< \varepsilon\ ,\) by a map \(\psi\) of \(M^T\) to itself that moves all but \(\varepsilon\) of the points by a distance \(< \varepsilon\) (essentially preserving the geometry of \(M^T\)).

We would get the same conclusion if, instead of changing the Riemannian structure, we made a small variable speed change along the orbits of \(F_t\ .\)

We would also get the same conclusion if we made a small \(C^2\) change in the obstacles in Sinai billiards (Eleranta thesis).

The theorems above are examples of what we call statistical stability.

The conclusion of the statistical stability theorem above defines what we mean by a "small change in behavior." This is not the standard meaning, which comes from the much more thoroughly studied "structural stability" (of Peixoto, Anosov, Smale, Mané, etc.)

In the case of geodesic flow, if we define \(\hat{F}_t\) as coming from the above change in Riemannian structure, then structural stability asserts that there is a homeomorphism \(\bar{\psi}\) of \(M^T\) onto itself that takes the orbits of \(F_t\) onto those of \(\hat{F}_t\) and moves all points by \(< \varepsilon\ .\)

Statistical stability corrects the following statistical problems with structural stability.

- It is possible that \(\bar{\psi}\) maps a set of probability one onto a set of probability zero.
- Since \(\bar{\psi}\) does not preserve the speed along orbits, sets that correspond under \(\bar{\psi}\) at time 0 may not correspond to each other at time \(t\neq0\ .\)

If our perturbation were a variable speed change along the same orbits, \(\bar{\psi}\) would be the identity, whereas \(\psi\) must scramble orbits.

Sinai billiards is statistically stable (we make a small change in the curvature of the objects) but not structurally stable.

"Essentially all" systems that are structurally stable are also statistically stable (a precise statement can be found in [OW1]), but statistically stable systems form a larger class, and statistical stability holds for a larger class of perturbations.

### Criteria for Bernoulli

We will call a stationary process \(T,F\ ,\) where \(T\) is a Bernoulli shift, a Bernoulli process. We will now give characterizations of Bernoulli processes.

Before we state these characterizations, we will describe a metric (the \(\bar{d}\) metric) between stationary processes.

Two stationary processes with values in the same metric space M, \(\bar{T}, \bar{X}, \bar{F}, \bar\mu\) and \(\hat{T}, \hat{X}, \hat{F}, \hat\mu\) are closer than \(\alpha\) if there is a third stationary process, \(T, X, F_{1}, F_{2}, \mu\ ,\) such that \(T, X, F_{1}, \mu\) is the same as \(\bar{T}, \bar{X}, \bar{F}, \bar\mu\ ,\) and \(T, X, F_{2}, \mu\) is the same as \(\hat{T}, \hat {X}, \hat {F}, \hat \mu\ ,\) and \(F_{1}\) and \(F_{2}\) differ by \(< \alpha\ ,\) except on a set of measure \(< \alpha\ .\)

(This is not an isomorphism, because neither \(F_{1}\) nor \(F_{2}\) is required to generate the \(\sigma\) algebra. The same definition holds for flows.)

There is a finite non-stationary version of \(\bar{d}\ ,\) in which the measure on the sequences of outputs of length \(n\) is represented by a measure space \(\bar{X}\ ,\) whose points are these outputs. We can make \(\bar{X}\) and \(\hat{X}\) non-discrete. We match \(\bar{X}\) and \(\hat{X}\) in a 1-1 measure preserving way, so that corresponding sequences differ in fewer than \(\alpha n\) places, except on a set of measure \(\leq \alpha\ .\)
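The finite condition can be checked mechanically once a matching is given. In the Python sketch below, a matching is represented, as an assumption made for illustration, by a list of triples `(sequence_a, sequence_b, weight)` whose weights sum to 1, and the function tests whether that particular matching witnesses closeness \(\alpha\ .\) The names `hamming_fraction` and `matching_witnesses_closeness` and the sample data are hypothetical. The sketch does not search for the best matching, so a `False` result says nothing about other matchings.

```python
def hamming_fraction(a, b):
    """Fraction of places in which two equal-length sequences differ."""
    if len(a) != len(b):
        raise ValueError("sequences must have the same length")
    return sum(1 for s, t in zip(a, b) if s != t) / len(a)

def matching_witnesses_closeness(matching, alpha):
    """Check the finite d-bar condition for one given matching.

    Matched sequences must differ in fewer than alpha * n places,
    except on a set of total weight at most alpha.
    """
    exceptional = sum(
        weight for a, b, weight in matching if hamming_fraction(a, b) >= alpha
    )
    return exceptional <= alpha

# Made-up matching of output sequences of length n = 4.
matching = [
    ("HHTT", "HHTT", 0.5),   # identical
    ("HTHT", "HTTT", 0.4),   # differ in 1 of 4 places
    ("TTTT", "HHHH", 0.1),   # differ everywhere
]

close_at_030 = matching_witnesses_closeness(matching, 0.30)
close_at_005 = matching_witnesses_closeness(matching, 0.05)
```

With \(\alpha = 0.30\) only the last pair is exceptional and it has weight \(0.1\ ,\) so the condition holds; with \(\alpha = 0.05\) the exceptional weight is too large.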

#### Finitely determined (FD) systems

\( T, F\) (or \(T, F, X, \mu\)) is said to be finitely determined if, given \(\varepsilon\ ,\) there is an \(n\) and \(\delta > 0\ ,\) such that, if \(\bar{T}, \bar{F}\) satisfies

- \(\bar{d} \left (\bigvee_0^n F(T^{i}(x)), \bigvee_0^n \bar{F}(\bar{T}^{i}(x)) \right ) < \delta\ ,\) and
- the entropies of \(T, F\) and \(\bar{T}, \bar{F}\) are closer than \(\delta\ ,\)

then \(\bar{d} ((T, F), (\bar{T}, \bar{F})) < \varepsilon\ .\)

FD is the crucial idea behind our theorem, since it allows us to control infinite behavior by finite constructions. We prove:

**Theorem:** FD processes of the same entropy are isomorphic.

We then show that a specific process (e.g., an independent process) is FD.

#### Very weak Bernoulli (VWB) systems

\( T, F\) is said to be "very weak Bernoulli" (VWB) if, for every \(\varepsilon > 0\ ,\) there is an \(n\) such that, for all \(m > 0\ ,\) and all but \(\varepsilon\) of the atoms in \(\bigvee_{-m}^{-1} F(T^i(x))\ ,\) the conditional distribution of \(\bigvee_0^n F(T^{i}x)\) is closer than \(\varepsilon\) in \(\bar{d}\) to the unconditional distribution.
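As a sanity check on the definition, consider the simplest case. Assume, for illustration only, that \(T, F\) is an independent process whose outputs \(a\) have probabilities \(p_{a}\) (this notation is introduced here and is not the article's). Then for every \(m\ ,\) every atom \(A\) of \(\bigvee_{-m}^{-1} F(T^i(x))\) of positive measure, and every word \(a_0, \dots, a_n\ :\)

```latex
\[
\mu\Bigl( F(T^{i}x) = a_i \ \text{for}\ 0 \le i \le n \,\Bigm|\, A \Bigr)
\;=\; \prod_{i=0}^{n} p_{a_i}
\;=\; \mu\Bigl( F(T^{i}x) = a_i \ \text{for}\ 0 \le i \le n \Bigr).
\]
```

The conditional and unconditional distributions coincide, so their \(\bar{d}\) distance is \(0\) and the VWB condition holds for every \(\varepsilon\) and every \(n\ .\)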

A process is FD if---and only if---it is VWB (the "only if" is joint with Weiss).

VWB is usually the easiest to check. In examples 3, 4, and 5, the hyperbolic structure allows us to check VWB for the time one map, with respect to a generating partition.

**The mixing hierarchy (discrete time)**

VWB allows us to view the relationship between K and Bernoulli in terms of the dependence of the future on the past. VWB is such a condition; K can be described by an analogous, but weaker, condition, where we condition \(F(T^{n}x)\ ,\) rather than \(\bigvee_0^n F(T^{i}x)\ ,\) on the past.

**VWB completes the traditional mixing hierarchy: ergodicity, weak mixing, mild mixing, mixing, K, Bernoulli**.

The mixing hierarchy, in addition to the randomness hierarchy, gives a sense in which the Bernoullis are the most random possible.

#### Extremal systems

An ergodic finite state process \((P,T)\) is said to be extremal if the following is true: For every \(\varepsilon > 0\) there exist an integer \(n_0\) and \(\delta > 0\) such that, for all \(n > n_0\ ,\) if we consider a partition of the space (of outputs of length \(n\)) into \(2^{n\delta}\) sets \(F_1,F_2,\ldots,F_{2^{n\delta}}\ ,\) then, for all \(F_i\) but a family whose union has measure smaller than \(\varepsilon\ ,\)

\(\bar{d} \left (\bigvee_0^n T^{i}P/F_i, \bigvee_0^n T^iP \right ) < \varepsilon\ .\)

A system is extremal if---and only if---it is FD.

#### Other criteria

The \(\bar{d}\) limit of Bernoulli processes is a Bernoulli process.

\( T\) is a Bernoulli shift if it has an increasing sequence of Bernoulli factors that generate the \(\sigma\) algebra.

### More general groups

The proofs of Theorems 1, 2, and 3 can be refined to work for general amenable groups [OW2]. (A key ingredient is a theorem of Rudolph.) In particular, the theory applies to infinite particle systems and Ising models.

### Orbit equivalence

We can weaken isomorphism by only requiring that orbits map onto orbits with certain restrictions (in continuous time the speed is not preserved and in discrete time certain changes in the order are allowed). FD, VWB, Theorems 1, 2, and 3, and the counterexamples have analogs in this situation.

This is mainly due to Feldman, Ornstein-Weiss, and Rudolph [ORW, KR].

### Thouvenot's Relative theory

This theory gives criteria for a factor, \(T_A\ ,\) of \(T\) to have an independent complement (\(T = T_A \times T_B\)). There is a relative FD and a relative VWB.

Factors \(T_A\) and \(T_{\bar{A}}\) of \(T\) are equivalent if there is an automorphism of \(T\) that takes \(T_A\) onto \(T_{\bar{A}}\ .\) The classification of factors of a Bernoulli shift^{12} under equivalence largely mirrors the classification of transformations under isomorphism. The main examples are due to Hoffman.

## Footnotes

^{1} This area is still active.

^{2} This is due to Shields.

^{3} Joint paper with Weiss.

^{4} Joint paper with Galavotti (using deep results of Sinai to check criteria).

^{5} The double pendulum is a good example of such a \(T_t\ .\)

^{6} This is due to Pesin (using Pesin theory to check criteria).

^{7} This is due to Lind and, independently, Miles and Thomas (checking the criteria requires difficult arguments).

^{8} Except for the fact that factors of Bernoulli shifts are Bernoulli, the randomness hierarchy is due to Kolmogorov, Sinai, and Rokhlin, and predates the Ornstein theory. Kolmogorov introduced the K-property (with a different definition), and Sinai and Rokhlin connected it with entropy.

^{9} Example 5 shows this conjecture to be true for 2-dimensional diffeomorphisms.

^{10} Joint paper with Shields.

^{11} Example 5 shows this conjecture to be true for 2-dimensional diffeomorphisms.

^{12} Before the Kolmogorov-Sinai entropy, it was not known whether or not every Bernoulli shift had even one non-trivial factor.

## References

**Internal references**

- Paul M.B. Vitanyi (2007) Andrey Nikolaevich Kolmogorov. Scholarpedia, 2(2):2798.

- Leonid Bunimovich (2007) Dynamical billiards. Scholarpedia, 2(8):1813.

- James Meiss (2007) Dynamical systems. Scholarpedia, 2(2):1629.

- Tomasz Downarowicz (2007) Entropy. Scholarpedia, 2(11):3901.

- Yakov Pesin and Boris Hasselblatt (2008) Nonuniform hyperbolicity. Scholarpedia, 3(1):4842.

- Philip Holmes and Eric T. Shea-Brown (2006) Stability. Scholarpedia, 1(10):1838.

- David H. Terman and Eugene M. Izhikevich (2008) State space. Scholarpedia, 3(3):1924.

### Expository

Thouvenot, J.-P. "Entropy, Isomorphism and Equivalence in Ergodic Theory." *Handbook of Dynamical Systems, Vol. 1A*, B. Hasselblatt and A. Katok, Editors. pp. 206--238 (2002).

[OW1] Ornstein, D.S. & Weiss, B. "Statistical Properties of Chaotic Systems." *Bulletin of the American Mathematical Society*, **24**:1. pp. 11--116 (January 1991).

### Books

Ornstein, D.S. *Ergodic Theory, Randomness, and Dynamical Systems*. Yale Mathematical Monographs. Yale U Press : New Haven. 144 pages. (1974).

Shields, P. *The Theory of Bernoulli Shifts*. U Chicago Press : Chicago. 128 pages. (1973).

Smorodinsky, M. "Ergodic Theory, Entropy." *Lecture Notes in Math*, **214**. Springer : New York. (1971).

Rudolph, D. *Fundamentals of Measurable Dynamics: Ergodic Theory on Lebesgue Space.* Clarendon/Oxford U Press : New York. 184 pages. (1990).

### Technical sources

[OW2] Ornstein, D.S. & Weiss, B. "Entropy and isomorphism theorems for actions of amenable groups." *J. Analyse Math.*, **48**. pp. 1--141 (1987).

[ORW] Ornstein, D.S., Rudolph, D. & Weiss, B. "Equivalence of measure preserving transformations." *Mem. Amer. Math. Soc.*, **262**. (1984).

[KR] Kammeyer, J. & Rudolph, D. *Restricted Orbit Equivalence of Discrete Amenable Groups*. Cambridge U Press : Cambridge, UK. 203 pages. (2002).

## See also

Ergodic Theory, Invariant Measure, Kolmogorov-Sinai entropy, Structural Stability