Entropy
From Scholarpedia
Tomasz Downarowicz (2007), Scholarpedia, 2(11):3901. Revision #64771.
- In classical physics, the entropy of a physical system is proportional to the quantity of energy no longer available to do physical work. Entropy is central to the second law of thermodynamics, which states that in an isolated system entropy cannot decrease, and any irreversible activity increases it.
- In quantum mechanics, von Neumann entropy extends the notion of entropy to quantum systems by means of the density matrix.
- In probability theory, the entropy of a random variable measures the uncertainty about the value that might be assumed by the variable.
- In information theory, the compression entropy of a message (e.g. a computer file) quantifies the information content carried by the message in terms of the best lossless compression rate.
- In the theory of dynamical systems, entropy quantifies the exponential complexity of a dynamical system or the average flow of information per unit of time.
- In sociology, entropy is the natural decay of structure (such as law, organization, and convention) in a social system.
- In the common sense, entropy means disorder or chaos.
History
The term entropy was coined in 1865 [Cl] by the German physicist Rudolf Clausius from Greek en- = in + trope = a turning (point). The word reveals an analogy to energy, and etymologists believe that it was designed to denote the form that any energy eventually and inevitably turns into: useless heat. The idea was inspired by an earlier formulation by Sadi Carnot [Ca] of what is now known as the second law of thermodynamics.
The Austrian physicist Ludwig Boltzmann [B] and the American scientist Willard Gibbs [G] put entropy into the probabilistic setup of statistical mechanics (around 1875). This idea was later developed by Max Planck. Entropy was generalized to quantum mechanics in 1932 by John von Neumann [N]. Later this led to the invention of entropy as a term in probability theory by Claude Shannon [Sh] (1948), popularized in a joint book [SW] with Warren Weaver, that provided foundations for information theory.
The concept of entropy in dynamical systems was introduced by Andrei Kolmogorov [K] and made precise by Yakov Sinai [Si] in what is now known as the Kolmogorov-Sinai entropy.
The formulation of Maxwell's paradox (the thought experiment known as Maxwell's demon) by James C. Maxwell (around 1871) triggered a search for the physical meaning of information. This search resulted in the finding by Rolf Landauer [L] (1961) of the heat equivalent of the erasure of one bit of information, which brought the notions of entropy in thermodynamics and information theory together.
The term entropy is now used in many other sciences (such as sociology), sometimes distant from physics or mathematics, where it no longer maintains its rigorous quantitative character. Usually, it roughly means disorder, chaos, decay of diversity or tendency toward uniform distribution of kinds.
Entropy in physics
Thermodynamical entropy - macroscopic approach
In thermodynamics, a physical system is a collection of objects (bodies) whose state is parametrized by several characteristics such as the distribution of density, pressure, temperature, velocity, chemical potential, etc. The change of entropy of a physical system when it passes from one state to another equals
$$\Delta S = \int \frac{dQ}{T},$$
where $dQ$ denotes an element of heat being absorbed (or emitted; then it has negative sign) by a body, $T$ is the absolute temperature of that body at that moment, and the integration is over all elements of heat active in the passage. The above formula allows one to compare the entropies of different states of a system, or to compute the entropy of each state up to a constant (which is satisfactory in most cases). The absolute value of entropy is established by the third law of thermodynamics.
Notice that when an element $dQ$ of heat is transmitted from a warmer body at temperature $T_1$ to a cooler one at temperature $T_2$, then the entropy of the first body changes by $-dQ/T_1$, while that of the other rises by $dQ/T_2$. Since $T_2 < T_1$, the absolute value of the latter fraction is larger and jointly the entropy of the two-body system increases (while the global energy remains the same).
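The two-body argument can be checked with a few numbers. The sketch below is only an illustration with hypothetical values (1 J of heat, a warmer body at 400 K and a cooler one at 300 K), and it assumes the element of heat is small enough that neither temperature changes during the transfer; it simply evaluates $-dQ/T_1$ and $dQ/T_2$ and adds them.

```python
def entropy_changes(dq, t_hot, t_cold):
    """Entropy changes (J/K) when heat dq (J) flows from a body at t_hot to one at t_cold (K).

    Assumes dq is small enough that neither temperature changes during the transfer.
    """
    ds_hot = -dq / t_hot    # the warmer body loses the heat dq
    ds_cold = dq / t_cold   # the cooler body receives the same heat dq
    return ds_hot, ds_cold, ds_hot + ds_cold


# Hypothetical example values: 1 J, 400 K, 300 K.
ds_hot, ds_cold, ds_total = entropy_changes(dq=1.0, t_hot=400.0, t_cold=300.0)
print(f"warmer: {ds_hot:+.6f} J/K, cooler: {ds_cold:+.6f} J/K, total: {ds_total:+.6f} J/K")
```

Here the total is $1/300 - 1/400 = 1/1200 \approx 0.000833$ J/K, positive as the argument predicts; it would be zero only if the two temperatures were equal.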
A system is isolated if it does not interact with its surroundings (i.e., is not influenced in any way). In particular, an isolated system does not exchange energy or matter (or even information) with its surroundings. By virtue of the first law of thermodynamics (the conservation of energy principle), an isolated system can pass only between states of the same global energy. The second law of thermodynamics introduces irreversibility of the evolution: an isolated system cannot pass from a state of higher entropy to a state of lower entropy. Equivalently, the second law says that it is impossible to perform a process whose only final effect is the transmission of heat from a cooler medium to a warmer one. Any such transmission must involve outside work; the elements participating in the work will also change their states and the overall entropy will rise.
The first and second laws of thermodynamics together imply that an isolated system will tend to the state of maximal entropy among all states of the same energy. This state is called the equilibrium state and reaching it is interpreted as the thermodynamical death of the system. The energy distributed in this state is incapable of any further activity.
See Example of calculating entropy and finding the equilibrium state.
Boltzmann entropy and Gibbs entropy - microscopic approach
Boltzmann [B] gave another, statistical meaning to entropy. A thermodynamical state $A$ (or macrostate, as described in terms of a distribution of pressure, temperature, etc.) can be realized in many different ways at the microscopic level, corresponding to many points $\omega$ (called microstates) in phase space $\Omega$. For example, the microscopic description $\omega$ may consist of specifying the positions and velocities of all particles, requiring $6N$ real numbers for $N$ particles; in this case, phase space is $\Omega = \mathbb{R}^{6N}$, or the set of those $\omega \in \mathbb{R}^{6N}$ with a certain energy. The Boltzmann entropy of $A$ is defined to be proportional to the logarithm of the phase space volume of the set $\Omega_A$ of all $\omega$ that realize the state $A$:
$$S_B(A) = k \log \mathrm{vol}(\Omega_A) + c. \qquad (1)$$
The constant $c$ depends on the unit of phase space volume, and the proportionality factor $k$ is known as the Boltzmann constant. One also writes $S_B(\omega) = S_B(A)$ for every $\omega \in \Omega_A$.
Analogously, if we take $\Omega$ to be a finite set, and $\#\Omega_A$ denotes the number of elements in the set $\Omega_A$, then the Boltzmann entropy is defined to be
$$S_B(A) = k \log \#\Omega_A. \qquad (2)$$
(The constant $c$ is not needed any more as no choice of volume unit is involved.) The logarithm is used in the formulas (1) and (2) because different macrostates correspond to sets of vastly different sizes in phase space, the largest of which (by far) belongs to the equilibrium state.
The formulas (1) and (2) can be rephrased in probabilistic terms if we imagine that one element of $\Omega$ is chosen at random with uniform distribution. That is, if the set $\Omega$ of all microstates has $M$ elements and the set $\Omega_A$ of those realizing the macrostate $A$ has $M_A$ elements, we assign probability $1/M$ to each microstate and $P(A) = M_A/M$ to the state $A$. Then Boltzmann's formula can be rewritten as
$$S_B(A) = k \log P(A) + S_{\max}, \qquad (3)$$
where $S_{\max} = k \log M$. Practically, $S_{\max}$ is the maximal possible entropy of a macrostate, since $\Omega_A$ cannot have more elements than $\Omega$. (And indeed, the thermal equilibrium state $A_{\mathrm{eq}}$ contains most microstates of a given energy, so that $P(A_{\mathrm{eq}}) \approx 1$ and thus $S_B(A_{\mathrm{eq}}) \approx S_{\max}$. This means that the equilibrium state has the maximal possible entropy.)
Notice that logarithms of probabilities are negative (or zero for probability 1), so $S_B(A) \le S_{\max}$.
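Formulas (2) and (3) can be compared on a toy model. The sketch below assumes a hypothetical system not taken from this article: $N$ distinguishable particles, each sitting in the left or right half of a box, so there are $2^N$ equally likely microstates and the macrostate "exactly $n$ particles on the left" has $\binom{N}{n}$ of them. Units are chosen so that $k = 1$.

```python
from math import comb, log

# Hypothetical toy model: N distinguishable particles, each in the left or right half.
# Microstate = which half each particle occupies (2**N of them, uniformly likely).
# Macrostate A_n = "exactly n particles in the left half". Units with k = 1.

def boltzmann_entropy(n_particles, n_left, k=1.0):
    """Formula (2): S_B(A_n) = k log #Omega_A, where #Omega_A = C(N, n)."""
    return k * log(comb(n_particles, n_left))

def boltzmann_entropy_from_probability(n_particles, n_left, k=1.0):
    """Formula (3): S_B(A_n) = k log P(A_n) + S_max, with S_max = k log 2**N."""
    s_max = k * n_particles * log(2)
    p_a = comb(n_particles, n_left) / 2 ** n_particles
    return k * log(p_a) + s_max

N = 100
entropies = [boltzmann_entropy(N, n) for n in range(N + 1)]
most_likely = max(range(N + 1), key=entropies.__getitem__)
print(most_likely, entropies[most_likely], N * log(2))

# Lumping all counts within 5% of N/2 into one "equilibrium" macrostate:
N_big = 1000
near_half = sum(comb(N_big, n) for n in range(450, 551)) / 2 ** N_big
print(near_half)
```

The even split $n = N/2$ has the largest entropy, and once the equilibrium macrostate is taken to be a coarse band around $N/2$ rather than a single exact count, its probability is already close to 1 for $N = 1000$, in line with $P(A_{\mathrm{eq}}) \approx 1$.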
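Gibbs entropy refines (3) to the case where the microstates $\omega$ realizing the macrostate $A$ may have different probabilities $p_\omega$:
$$S_G(A) = -k \sum_{\omega \in \Omega_A} p_\omega \log p_\omega. \qquad (4)$$
The formula (4) reduces to (2) if $p_\omega = 1/\#\Omega_A$ for all microstates in $\Omega_A$. For a continuous probability distribution with density function $f$, the Gibbs entropy is defined as
$$S_G = -k \int f(\omega) \log f(\omega) \, d\omega.$$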
Probability distributions over phase space arise in particular because thermal equilibrium states are connected with such distributions, known as the canonical, microcanonical, and macrocanonical distributions.
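The reduction of (4) to (2) for equal probabilities can be checked directly. This is a minimal sketch with $k = 1$ and a hypothetical macrostate with 8 microstates; the skewed distribution is an arbitrary example used only to show that unequal probabilities give a smaller value.

```python
from math import log

def gibbs_entropy(probs, k=1.0):
    """Formula (4): S_G = -k * sum p log p; microstates with p == 0 contribute nothing."""
    return -k * sum(p * log(p) for p in probs if p > 0)

# Hypothetical macrostate realized by 8 microstates.
M_A = 8
uniform = [1 / M_A] * M_A
skewed = [0.5, 0.2, 0.1, 0.1, 0.05, 0.05, 0.0, 0.0]

print(gibbs_entropy(uniform), log(M_A))  # equal: (4) reduces to (2) with k = 1
print(gibbs_entropy(skewed))             # smaller than log(M_A)
```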
See Example of calculating both thermodynamical and Boltzmann entropy.
Entropy in quantum mechanics
John von Neumann [N] found analogs in quantum mechanics for both the Boltzmann entropy and the Gibbs entropy. The quantum Boltzmann entropy is defined by
$$S_B(A) = k \log \dim \mathcal{H}_A,$$
where $\mathcal{H}_A$ is the subspace of Hilbert space containing those microstates realizing the macrostate $A$. The Hilbert space plays a role roughly analogous to that of the classical phase space.
The quantum Gibbs entropy, usually called von Neumann entropy, is
$$S(\rho) = -k \, \mathrm{tr}(\rho \log \rho),$$
where $\rho$ is a density matrix, which plays a role roughly analogous to that of a probability distribution over phase space. See also the main article on von Neumann entropy.
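Since $\rho$ is Hermitian with nonnegative eigenvalues summing to 1, $-\mathrm{tr}(\rho \log \rho)$ equals the Gibbs (Shannon-type) sum over the eigenvalues of $\rho$. The sketch below restricts itself, as an assumption for brevity, to a single two-level system (a $2 \times 2$ density matrix), where the eigenvalues have a closed form, and uses $k = 1$.

```python
from math import log, sqrt

def eigenvalues_2x2_hermitian(a, b, d):
    """Eigenvalues of the Hermitian matrix [[a, b], [conj(b), d]] with a, d real."""
    mean = (a + d) / 2
    radius = sqrt(((a - d) / 2) ** 2 + abs(b) ** 2)
    return mean + radius, mean - radius

def von_neumann_entropy_2x2(a, b, d, k=1.0):
    """S = -k tr(rho log rho), evaluated as -k * sum(lam log lam) over the eigenvalues of rho."""
    eigs = eigenvalues_2x2_hermitian(a, b, d)
    return -k * sum(lam * log(lam) for lam in eigs if lam > 1e-12)

print(von_neumann_entropy_2x2(1.0, 0.0, 0.0))  # pure state |0><0|
print(von_neumann_entropy_2x2(0.5, 0.0, 0.5))  # maximally mixed state: log 2
print(von_neumann_entropy_2x2(0.5, 0.5, 0.5))  # pure state |+><+|
```

The last two matrices have the same diagonal $(1/2, 1/2)$, yet only the maximally mixed one has entropy $\log 2$; the off-diagonal entries of $\rho$ matter, which is where the quantum formula goes beyond a probability distribution on a fixed basis.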
Black hole entropy
Einstein's General Theory of Relativity implies that black holes exist and that (under reasonable technical assumptions) a certain quantity about a black hole, the surface area $A$ of its horizon, behaves much like entropy in thermodynamics: for any system of black holes, the sum of their surface areas cannot decrease. This statement was proven by Stephen W. Hawking (around 1972) and is known as the second law of black hole dynamics. Indeed, further considerations pointed out by Jacob Bekenstein (1973) and others suggest calling the quantity
$$S_{BH} = \frac{k c^3}{4 G \hbar} A$$
(rather than $A$ itself) the black hole's entropy, where the prefactor involves only constants of nature: $c$ is the speed of light, $k$ Boltzmann's constant, $G$ Newton's gravitational constant, and $\hbar$ the reduced Planck constant (Planck's constant divided by $2\pi$).
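To get a feel for the size of $S_{BH}$, the sketch below evaluates the formula for a black hole of one solar mass. It assumes a non-rotating (Schwarzschild) black hole, whose horizon has radius $r_s = 2GM/c^2$ and area $4\pi r_s^2$; the constants are CODATA SI values and the solar mass is an approximate figure.

```python
from math import pi

# CODATA SI values; the solar mass is an approximate astronomical value.
K_B = 1.380649e-23        # Boltzmann constant, J/K
C = 299_792_458.0         # speed of light, m/s
G = 6.67430e-11           # Newton's gravitational constant, m^3 kg^-1 s^-2
HBAR = 1.054571817e-34    # reduced Planck constant, J s
M_SUN = 1.989e30          # solar mass, kg (approximate)

def horizon_area(mass):
    """Horizon area 4*pi*r_s**2 of a non-rotating black hole, r_s = 2GM/c^2 (assumption)."""
    r_s = 2 * G * mass / C ** 2
    return 4 * pi * r_s ** 2

def bekenstein_hawking_entropy(area):
    """S_BH = k c^3 A / (4 G hbar), in J/K."""
    return K_B * C ** 3 * area / (4 * G * HBAR)

s = bekenstein_hawking_entropy(horizon_area(M_SUN))
print(f"S = {s:.2e} J/K, S/k = {s / K_B:.2e}")
```

The result is of order $10^{54}$ J/K, i.e., $S/k$ of order $10^{77}$. Because the area grows like $M^2$, doubling the mass quadruples this entropy.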
It is believed that the second law of black hole dynamics can be violated in reality (because of quantum effects not taken into account by General Relativity) using processes (such as Hawking radiation) that increase the thermodynamical entropy. It is further believed that the sum of black hole entropy and thermodynamical entropy cannot decrease, and should thus be regarded as the entropy of a system containing black holes. A proof of this claim would require a quantum theory of gravitation, which is not available to date in a satisfactory way; it is also not clear whether and how the black hole entropy is connected to the number of microstates as involved in Boltzmann's formula (2).
See Bekenstein-Hawking entropy.
Entropy in mathematics
Shannon entropy
In probability theory, a probability vector $p = (p_1, p_2, \dots, p_n)$ (also called a partition of unity) is a collection of finitely many nonnegative numbers whose sum equals 1. The Shannon entropy of a probability vector $p$ is a straightforward adaptation of the Gibbs entropy formula (4) (but leaving out the proportionality factor $k$):
$$H(p) = -\sum_{i=1}^{n} p_i \log_2 p_i, \qquad (5)$$
with the convention $0 \log_2 0 = 0$.
Probability vectors occur naturally in connection with finite partitions of a probability space. Consider an abstract space $X$ equipped with a probability measure $\mu$ assigning probabilities (numbers between 0 and 1) to subsets of $X$ (more precisely, a measure usually does not assign values to all subsets, only to certain selected subsets called measurable sets; such sets form a large family closed under complements and countable unions and intersections, called a sigmafield).
A finite partition $\mathcal{P}$ of $X$ is a collection of pairwise disjoint measurable sets $\{P_1, P_2, \dots, P_n\}$ whose union is $X$. Then the probabilities $p_i = \mu(P_i)$ form a probability vector $p_{\mathcal{P}}$. One associates the entropy of this vector to the partition $\mathcal{P}$:
$$H_\mu(\mathcal{P}) = H(p_{\mathcal{P}}) = -\sum_{i=1}^{n} \mu(P_i) \log_2 \mu(P_i).$$
In this setup entropy can be viewed as a parameter closely related to the notion of information.
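Given a measurable set $A$, the amount of information associated with $A$ is defined as
$$I(A) = -\log_2 \mu(A). \qquad (6)$$
The information function $I_{\mathcal{P}}$ associated with a partition $\mathcal{P}$ is defined on the space $X$ and it assumes the constant value $I(P_i)$ at all points $x$ belonging to the set $P_i$, i.e.,
$$I_{\mathcal{P}}(x) = -\log_2 \mu(P_i) \quad \text{for } x \in P_i. \qquad (7)$$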
One easily verifies that the expected value of the information function with respect to $\mu$ equals the entropy $H_\mu(\mathcal{P})$:
$$\int_X I_{\mathcal{P}} \, d\mu = \sum_{i=1}^{n} \mu(P_i) \left(-\log_2 \mu(P_i)\right) = H_\mu(\mathcal{P}).$$
This explains the relation between information and entropy: while information depends on the points in the probability space, entropy is the constant representing the mean value of the information.
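The relation "entropy is the mean value of the information function" can be seen on a finite example. The sketch below assumes a hypothetical space of 8 equally likely points split into a partition with cells of measure $1/2, 1/4, 1/8, 1/8$; it implements (5), (6) and (7) and averages $I_{\mathcal{P}}$ over the points.

```python
from math import log2

def information(measure):
    """Formula (6): I(A) = -log2 mu(A), in bits."""
    return -log2(measure)

def shannon_entropy(probs):
    """Formula (5): H(p) = -sum p_i log2 p_i, with the convention 0 log 0 = 0."""
    return sum(p * information(p) for p in probs if p > 0)

# Hypothetical finite space X = {0, ..., 7} with the uniform measure mu.
points = range(8)
mu = {x: 1 / 8 for x in points}
partition = [{0, 1, 2, 3}, {4, 5}, {6}, {7}]

def information_function(x):
    """Formula (7): I_P(x) = I(P_i) for the cell P_i that contains x."""
    cell = next(c for c in partition if x in c)
    return information(sum(mu[y] for y in cell))

expected_info = sum(mu[x] * information_function(x) for x in points)
cell_probs = [sum(mu[y] for y in cell) for cell in partition]
print(expected_info, shannon_entropy(cell_probs))  # both 1.75
```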
In some sciences (e.g. in neuroscience) the term information refers to the difference between the entropy of a signal and the entropy of the noise. Information defined this way corresponds to the notion of conditional entropy (as explained below), and the similarity of names with the information function is rather incidental.
Interpretation of Shannon entropy
- The partition $\mathcal{P}$ of the space $X$ associates with each element $x \in X$ the information that answers the question "in which $P_i$ are you?". That is the maximal knowledge about the points depending solely on the partition. One bit of information is equivalent to acquiring an answer to a binary question, i.e., to a question asking for a choice between two possibilities. Unless the partition has two elements, the question "in which $P_i$ are you?" is not binary. But it can be replaced by a series of binary questions, and one is free to use any arrangement (tree) of such questions. In such an arrangement, the number of questions $N(x)$ (i.e., the amount of information in bits) needed to determine the location of the point $x$ within the partition may vary from point to point (see example below). The smaller the expected value of $N$, the better the arrangement. There is always an arrangement satisfying $I_{\mathcal{P}}(x) \le N(x) < I_{\mathcal{P}}(x) + 1$ for almost every $x$, and the best arrangement satisfies $H_\mu(\mathcal{P}) \le \int N \, d\mu < H_\mu(\mathcal{P}) + 1$. The difference between $N$ and $I_{\mathcal{P}}$ results from the crudeness of the measurement of information by counting binary questions: the outcome is always an integer. The function $I_{\mathcal{P}}$ can be interpreted as the precise value. Entropy is the expected amount of information needed to locate a point in the partition.
See Example of calculating and interpreting the information and entropy of a partition.
- Another interpretation of Shannon entropy deals with the notion of uncertainty. Let $f$ be a random variable defined on the probability space $(X, \mu)$ and assuming values in a finite set $\Lambda$. The variable $f$ generates a partition $\mathcal{P}_f$ of $X$ into the sets $f^{-1}(\lambda)$, $\lambda \in \Lambda$ (called the preimage partition). The probabilities $\mu(f^{-1}(\lambda))$ form a partition of unity called the distribution of $f$. Suppose an experimenter knows the distribution of $f$ and tries to guess the outcome of $f$ before performing the experiment, i.e., before picking some $x \in X$ and reading the value $f(x)$. The experimenter's uncertainty about the outcome is the expected value of the information missing to be certain. As explained above, that is exactly the entropy $H_\mu(\mathcal{P}_f)$.
Notice that the entropy does not depend on the metric structure of the set of values of $f$, so entropy cannot be compared to variance. Variance measures a different kind of uncertainty of the outcome of a real random variable $f$, which takes into account the distances between the outcome values.
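Both interpretations can be tried out numerically. For the "binary questions" picture, the sketch below builds an optimal question tree with Huffman's algorithm, a standard technique swapped in here as an assumption (the article does not name a construction), and compares the expected number of questions with the entropy. It also checks the remark on variance: two fair two-valued variables with very different spreads have the same entropy. The probability vectors are hypothetical examples.

```python
import heapq
import itertools
from math import log2

def shannon_entropy(probs):
    return -sum(p * log2(p) for p in probs if p > 0)

def question_counts(probs):
    """Number of binary questions N for each cell in an optimal (Huffman) question tree."""
    if len(probs) == 1:
        return [0]                      # a trivial partition needs no question
    tie = itertools.count()             # tie-breaker so tuples never compare the lists
    heap = [(p, next(tie), [i]) for i, p in enumerate(probs)]
    heapq.heapify(heap)
    depth = [0] * len(probs)
    while len(heap) > 1:
        p1, _, cells1 = heapq.heappop(heap)
        p2, _, cells2 = heapq.heappop(heap)
        for i in cells1 + cells2:
            depth[i] += 1               # one more question is needed to reach these cells
        heapq.heappush(heap, (p1 + p2, next(tie), cells1 + cells2))
    return depth

def expected_questions(probs):
    return sum(p * n for p, n in zip(probs, question_counts(probs)))

for probs in ([0.5, 0.25, 0.125, 0.125], [0.4, 0.3, 0.3]):
    print(probs, question_counts(probs), expected_questions(probs), shannon_entropy(probs))

def variance(values, probs):
    mean = sum(v * p for v, p in zip(values, probs))
    return sum(p * (v - mean) ** 2 for v, p in zip(values, probs))

fair = [0.5, 0.5]
print(shannon_entropy(fair), variance([0, 1], fair), variance([0, 100], fair))
```

For the first vector the questions match the information exactly (1, 2, 3, 3 questions, mean 1.75 = entropy); for the second the mean is 1.6 against an entropy of about 1.571, within the bound of one extra question.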
Properties of the information function and of the Shannon entropy
- (a) The information $I(A)$ (see (6)) associated with a set $A$ is a nonnegative and decreasing function of $\mu(A)$; the smaller the set, the more information is encoded in the fact that $x \in A$. $I(A) = 0$ if and only if $\mu(A) = 1$ (there is no information encoded in an event which is certain).
- (b) The entropy of a partition does not depend on the order in which the elements of the partition are numbered.
- (c) The entropy of a partition is nonnegative and equal to zero if and only if one of the elements $P_i$ of the partition has measure 1 (and all other elements have measure zero).
- (d) The entropy of a partition into $n$ sets is highest for the measure which assigns equal values $1/n$ to these sets. The entropy then equals $\log_2 n$.
- (e) The entropy of a partition into $n$ sets is a continuous function of the measures of these sets.
- (f) If the elements of a partition $\mathcal{P}$ are obtained by uniting elements of the partition $\mathcal{Q}$ (i.e., if $\mathcal{Q}$ is a refinement of $\mathcal{P}$), then $H_\mu(\mathcal{P}) \le H_\mu(\mathcal{Q})$.
- (g) The entropy of the least common refinement $\mathcal{P} \vee \mathcal{Q}$ of two partitions $\mathcal{P}$ and $\mathcal{Q}$ is not larger than the sum of the entropies of $\mathcal{P}$ and $\mathcal{Q}$: $H_\mu(\mathcal{P} \vee \mathcal{Q}) \le H_\mu(\mathcal{P}) + H_\mu(\mathcal{Q})$ (this property is called subadditivity).
- (h) The equality $H_\mu(\mathcal{P} \vee \mathcal{Q}) = H_\mu(\mathcal{P}) + H_\mu(\mathcal{Q})$ holds if and only if $\mathcal{P}$ and $\mathcal{Q}$ are stochastically independent.
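Shannon proved that a short list of such properties, among them (b), (d), (e) and a strengthened form of (h), determine the defining formula (5) up to a constant factor; the strengthened form is the additivity rule $H_\mu(\mathcal{P} \vee \mathcal{Q}) = H_\mu(\mathcal{P}) + H_\mu(\mathcal{Q} \mid \mathcal{P})$, valid for arbitrary (not only independent) partitions, with conditional entropy as defined below.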
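Properties (d), (f), (g) and (h) are easy to test on small examples. The sketch below assumes hypothetical joint measures $\mu(P_i \cap Q_j)$ of two two-cell partitions, one dependent pair and one product (independent) pair, and compares $H_\mu(\mathcal{P})$, $H_\mu(\mathcal{P} \vee \mathcal{Q})$ and $H_\mu(\mathcal{P}) + H_\mu(\mathcal{Q})$.

```python
from math import log2

def H(probs):
    """Shannon entropy (5) in bits, with 0 log 0 = 0."""
    return -sum(p * log2(p) for p in probs if p > 0)

def marginals(joint):
    """joint[i][j] = mu(P_i ∩ Q_j); returns the vectors (mu(P_i))_i and (mu(Q_j))_j."""
    return [sum(row) for row in joint], [sum(col) for col in zip(*joint)]

def refinement_entropy(joint):
    """H(P ∨ Q): the cells of the least common refinement are the sets P_i ∩ Q_j."""
    return H([x for row in joint for x in row])

# Hypothetical joint measures (rows: cells of P, columns: cells of Q).
dependent = [[0.3, 0.1],
             [0.1, 0.5]]
independent = [[0.3 * 0.6, 0.3 * 0.4],   # product measure: P and Q independent
               [0.7 * 0.6, 0.7 * 0.4]]

for name, joint in (("dependent", dependent), ("independent", independent)):
    p, q = marginals(joint)
    # (f): H(P) <= H(P ∨ Q); (g): H(P ∨ Q) <= H(P) + H(Q); (h): equality iff independent
    print(name, H(p), refinement_entropy(joint), H(p) + H(q))

print(H([0.25] * 4), H([0.4, 0.3, 0.2, 0.1]))  # (d): the uniform vector attains log2(4) = 2
```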
Conditional entropy
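Given two finite partitions $\mathcal{P} = \{P_1, \dots, P_n\}$ and $\mathcal{Q} = \{Q_1, \dots, Q_m\}$ of the same probability space $(X, \mu)$, the conditional entropy of $\mathcal{P}$ given $\mathcal{Q}$ is defined as
$$H_\mu(\mathcal{P} \mid \mathcal{Q}) = -\sum_{i=1}^{n} \sum_{j=1}^{m} \mu(P_i \cap Q_j) \log_2 \frac{\mu(P_i \cap Q_j)}{\mu(Q_j)} \qquad (9)$$
(terms with $\mu(P_i \cap Q_j) = 0$ are omitted) and is interpreted as the amount of information added by introducing the partition $\mathcal{P}$ when the partition $\mathcal{Q}$ (and its information) is already known. The following holds:
$$H_\mu(\mathcal{P} \mid \mathcal{Q}) = \sum_{j:\, \mu(Q_j) > 0} \mu(Q_j) \, H_{\mu_{Q_j}}(\mathcal{P}),$$
where $\mu_{Q_j}$ is the conditional probability measure on $Q_j$ obtained by restricting and normalizing $\mu$ (to normalize a finite measure means to multiply it by a constant so that it becomes a probability measure).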
Sometimes the known information comes not from a finite partition, but only from a family of partitions or, more generally, from a family of measurable sets that form a sigmafield $\mathcal{A}$ smaller than the sigmafield of all measurable sets. In such a case one considers the conditional entropy of a partition $\mathcal{P}$ given a sigmafield $\mathcal{A}$, defined as
$$H_\mu(\mathcal{P} \mid \mathcal{A}) = \inf_{\mathcal{Q}} H_\mu(\mathcal{P} \mid \mathcal{Q}), \qquad (10)$$
where the infimum is taken over all finite partitions $\mathcal{Q}$ measurable with respect to $\mathcal{A}$ (the infimum of a set of numbers is its greatest lower bound, i.e., the largest number not exceeding any element of the set; for example, the infimum of all strictly positive numbers is 0).
See also mutual information.
Properties of the conditional entropy
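- (a) $0 \le H_\mu(\mathcal{P} \mid \mathcal{Q}) \le H_\mu(\mathcal{P})$,
- (b) If $\mathcal{Q}$ is a refinement of $\mathcal{P}$ then $H_\mu(\mathcal{P} \mid \mathcal{Q}) = 0$.
- (c) $H_\mu(\mathcal{P} \vee \mathcal{Q}) = H_\mu(\mathcal{Q}) + H_\mu(\mathcal{P} \mid \mathcal{Q})$.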
The property (h) of Shannon entropy can be reformulated as follows:
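- (h') The partitions $\mathcal{P}$ and $\mathcal{Q}$ are stochastically independent if and only if $H_\mu(\mathcal{P} \mid \mathcal{Q}) = H_\mu(\mathcal{P})$ (by symmetry also $H_\mu(\mathcal{Q} \mid \mathcal{P}) = H_\mu(\mathcal{Q})$).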
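The conditional entropy of finite partitions can be computed straight from a table of joint measures. The sketch below assumes hypothetical tables with entries $\mu(P_i \cap Q_j)$ (rows for $\mathcal{P}$, columns for $\mathcal{Q}$); it evaluates $H_\mu(\mathcal{P} \mid \mathcal{Q})$ both as the double sum and as the $\mu(Q_j)$-weighted average of entropies under the normalized restrictions, and compares it with $H_\mu(\mathcal{P} \vee \mathcal{Q}) - H_\mu(\mathcal{Q})$ and with $H_\mu(\mathcal{P})$.

```python
from math import log2

def H(probs):
    """Shannon entropy (5) in bits, with 0 log 0 = 0."""
    return -sum(p * log2(p) for p in probs if p > 0)

def conditional_entropy(joint):
    """H(P | Q) from joint[i][j] = mu(P_i ∩ Q_j); columns index the known partition Q.

    Uses -sum_{i,j} mu(P_i ∩ Q_j) * log2(mu(P_i ∩ Q_j) / mu(Q_j)), skipping empty cells.
    """
    q = [sum(col) for col in zip(*joint)]
    return -sum(joint[i][j] * log2(joint[i][j] / q[j])
                for i in range(len(joint)) for j in range(len(q))
                if joint[i][j] > 0)

def averaged_conditional_entropy(joint):
    """sum_j mu(Q_j) * H(P under mu restricted to Q_j and normalized)."""
    total = 0.0
    for col in zip(*joint):
        mass = sum(col)
        if mass > 0:
            total += mass * H([x / mass for x in col])
    return total

# Hypothetical joint measures (rows: cells of P, columns: cells of Q).
dependent = [[0.3, 0.1],
             [0.1, 0.5]]
independent = [[0.18, 0.12],
               [0.42, 0.28]]
refining = [[0.2, 0.3, 0.0],    # Q = {Q_1, Q_2, Q_3} refines P = {Q_1 ∪ Q_2, Q_3}
            [0.0, 0.0, 0.5]]

for joint in (dependent, independent, refining):
    p = [sum(row) for row in joint]
    q = [sum(col) for col in zip(*joint)]
    chain = H([x for row in joint for x in row]) - H(q)   # H(P ∨ Q) - H(Q)
    print(conditional_entropy(joint), averaged_conditional_entropy(joint), chain, H(p))
```

The refining table gives 0, and the independent one gives $H_\mu(\mathcal{P})$, as properties (b) and (h') state.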
Kolmogorov-Sinai entropy
See the main article on Kolmogorov-Sinai entropy.
This is the key entropy notion in ergodic theory. Let $T: X \to X$ be a measurable transformation of the probability space $(X, \mu)$ which preserves the measure $\mu$, i.e., such that $\mu(T^{-1}A) = \mu(A)$ for every measurable set $A$. (In dynamical systems it is more natural to consider preimages of sets rather than their forward images. For instance, preimages of disjoint sets are disjoint, which is not true for images. The image of a measurable set need not be measurable unless the transformation is invertible, which is not assumed in the setup of the Kolmogorov-Sinai entropy.) Let $\mathcal{P}$ be a finite measurable partition of $X$ and let $\mathcal{P}^n$ denote the least common refinement $\mathcal{P} \vee T^{-1}\mathcal{P} \vee \dots \vee T^{-(n-1)}\mathcal{P}$. By a subadditivity argument, the sequence of normalized Shannon entropies $\frac{1}{n} H_\mu(\mathcal{P}^n)$ converges to its infimum. The entropy of $T$ with respect to the partition $\mathcal{P}$ is defined as the limit
$$h_\mu(T, \mathcal{P}) = \lim_{n \to \infty} \frac{1}{n} H_\mu(\mathcal{P}^n). \qquad (11)$$
The Kolmogorov-Sinai entropy of the measure-preserving system $(X, \mu, T)$ is the supremum
$$h_\mu(T) = \sup_{\mathcal{P}} h_\mu(T, \mathcal{P}), \qquad (12)$$
where $\mathcal{P}$ ranges over all finite measurable partitions of $X$.
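Definition (11) can be evaluated exactly for a simple process. The sketch below assumes a hypothetical stationary Markov measure on two-sided or one-sided sequences over $\{0, 1\}$ with the shift as $T$ and $\mathcal{P}$ the partition according to the symbol at time 0; then the cells of $\mathcal{P}^n$ are the sets of sequences with a prescribed first word of length $n$, and $\frac{1}{n}H_\mu(\mathcal{P}^n)$ can be compared with the known entropy rate $-\sum_i \pi_i \sum_j P_{ij} \log_2 P_{ij}$ of such a chain.

```python
from itertools import product
from math import log2

# Hypothetical stationary Markov measure on sequences over {0, 1}; T is the shift.
TRANSITION = [[0.9, 0.1],
              [0.5, 0.5]]
STATIONARY = [5 / 6, 1 / 6]          # solves pi = pi * TRANSITION

def word_probability(word):
    """mu of the cell of P^n in which x_0 ... x_{n-1} equals word."""
    prob = STATIONARY[word[0]]
    for a, b in zip(word, word[1:]):
        prob *= TRANSITION[a][b]
    return prob

def refined_entropy(n):
    """H_mu(P^n), with P the partition according to the symbol at time 0."""
    total = 0.0
    for word in product(range(2), repeat=n):
        p = word_probability(word)
        if p > 0:
            total -= p * log2(p)
    return total

# Entropy rate of a stationary Markov chain: sum_i pi_i * H(row i).
rate = -sum(STATIONARY[i] * TRANSITION[i][j] * log2(TRANSITION[i][j])
            for i in range(2) for j in range(2))

for n in (1, 2, 4, 8, 16):
    print(n, refined_entropy(n) / n, rate)
```

The averages decrease toward the rate (about 0.557 bits per step), and successive differences $H_\mu(\mathcal{P}^{n+1}) - H_\mu(\mathcal{P}^n)$ already equal it, a special feature of Markov measures.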
A system $(X, \mu, T)$ with a fixed measurable partition $\mathcal{P}$ is called a process.
The most important part of the Kolmogorov-Sinai entropy theory deals with processes.
Interpretation of the Kolmogorov-Sinai entropy of a process
By the definition (11), the entropy $h_\mu(T, \mathcal{P})$ of a process generated by a partition $\mathcal{P}$ can be interpreted as the average gain of information per unit of time delivered by the partition $\mathcal{P}$.
The same entropy can be computed using the notion of conditional entropy (see (10)):
$$h_\mu(T, \mathcal{P}) = H_\mu(\mathcal{P} \mid \mathcal{P}^-),$$
where $\mathcal{P}^-$ is the sigmafield generated by all partitions $T^{-i}\mathcal{P}$ ($i = 1, 2, \dots$) and is sometimes called the past (or future, depending on the interpretation) of the process. This formula provides another interpretation of the Kolmogorov-Sinai entropy: it is the new information obtained in one step of a process given all the information from the past.
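The conditional formula can be checked on the same kind of example. The sketch below again assumes a hypothetical stationary two-symbol Markov measure with the shift $T$; it computes $H_\mu(\mathcal{P} \mid T^{-1}\mathcal{P} \vee \dots \vee T^{-n}\mathcal{P})$, the uncertainty of the symbol at time 0 once the symbols at times $1, \dots, n$ are known, using formula (9) and shift invariance of $\mu$.

```python
from itertools import product
from math import log2

# Hypothetical stationary Markov measure on sequences over {0, 1}; T is the shift.
TRANSITION = [[0.9, 0.1],
              [0.5, 0.5]]
STATIONARY = [5 / 6, 1 / 6]

def word_probability(word):
    """Measure of the cylinder {x : x_0 ... x_{len-1} = word}; the empty word has measure 1."""
    if not word:
        return 1.0
    prob = STATIONARY[word[0]]
    for a, b in zip(word, word[1:]):
        prob *= TRANSITION[a][b]
    return prob

def entropy_given_next(n):
    """H(P | T^-1 P ∨ ... ∨ T^-n P): uncertainty of x_0 when x_1, ..., x_n are known.

    By shift invariance, the cell {x_1 ... x_n = w} has the same measure as the cylinder w.
    """
    total = 0.0
    for word in product(range(2), repeat=n + 1):
        joint = word_probability(word)
        if joint > 0:
            total -= joint * log2(joint / word_probability(word[1:]))
    return total

rate = -sum(STATIONARY[i] * TRANSITION[i][j] * log2(TRANSITION[i][j])
            for i in range(2) for j in range(2))

for n in range(5):
    print(n, entropy_given_next(n), rate)
```

With no conditioning ($n = 0$) the value is $H_\mu(\mathcal{P})$; from $n = 1$ on it equals the entropy of the process, because for a Markov measure one neighbouring symbol already carries all the information the rest of $\mathcal{P}^-$ could provide.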
The main entropy theorems in ergodic theory
In the context of a measure-preserving transformation $T$, a partition $\mathcal{P}$ is called a one-sided generator if the sigmafield generated jointly by the partitions $T^{-n}\mathcal{P}$ ($n \ge 0$) equals the sigmafield of all measurable sets.
If $T$ is invertible one also defines a two-sided generator as a partition $\mathcal{P}$ such that the sigmafield generated jointly by the partitions $T^{n}\mathcal{P}$ ($n \in \mathbb{Z}$) is the sigmafield of all measurable sets.
- The Kolmogorov-Sinai Theorem: If $\mathcal{P}$ is a generator (one-sided or two-sided), then $h_\mu(T) = h_\mu(T, \mathcal{P})$.
- The Krieger Generator Theorem: If $T$ is invertible and ergodic, then $h_\mu(T) < \infty$ if and only if the system has a finite two-sided generator. Moreover, for ergodic systems, $h_\mu(T) = 0$ if and only if both $T$ is invertible and the system has a one-sided generator.
The main theorem concerning entropy of processes is
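- The Shannon-McMillan-Breiman Theorem: Given an ergodic process $(X, \mu, T, \mathcal{P})$, the convergence
$$\frac{1}{n} I_{\mathcal{P}^n}(x) \to h_\mu(T, \mathcal{P}) \quad (n \to \infty)$$
holds at $\mu$-almost every point $x$. (Recall that $I_{\mathcal{P}^n}$ denotes the information function associated with the partition $\mathcal{P}^n$, see (7).)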
The interpretation of this last theorem is that for large $n$ the partition $\mathcal{P}^n$ cuts the major part of the space into sets of roughly equal measures, close to $2^{-n h_\mu(T, \mathcal{P})}$. One has to be careful with the meaning of "roughly": the ratios between measures of the typical sets in $\mathcal{P}^n$ may be far from unity; only their logarithms must have absolute values much smaller than $n$.
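For an i.i.d. (Bernoulli) process with $\mathcal{P}$ the partition by the symbol at time 0, the cell of $\mathcal{P}^n$ containing $x$ has measure $\prod_{k<n} p_{x_k}$, so $\frac{1}{n} I_{\mathcal{P}^n}(x)$ is an average of independent terms and the theorem reduces to the law of large numbers, with limit $H(p)$ (the entropy of a Bernoulli process). The sketch below samples one point $x$ with a seeded pseudo-random generator; the vector $p = (0.7, 0.3)$ is a hypothetical choice.

```python
import random
from math import log2

def smb_estimate(probs, n, seed=0):
    """(1/n) I_{P^n}(x) = -(1/n) log2 mu(P^n(x)) for one sample x of an i.i.d. process."""
    rng = random.Random(seed)
    symbols = rng.choices(range(len(probs)), weights=probs, k=n)
    # For an i.i.d. measure the cell P^n(x) has measure prod_k probs[x_k].
    return -sum(log2(probs[s]) for s in symbols) / n

probs = [0.7, 0.3]                       # hypothetical symbol probabilities
entropy = -sum(p * log2(p) for p in probs)
for n in (10, 100, 10_000, 100_000):
    print(n, smb_estimate(probs, n), entropy)
```

Short samples fluctuate noticeably, while for $n = 100\,000$ the estimate is typically within a few thousandths of $H(p) \approx 0.881$ bits.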
Entropy plays the key role in the theory of Bernoulli processes developed by Donald Ornstein, where it turns out to be a complete invariant of isomorphism:
- The Ornstein Theorem: Two Bernoulli processes are isomorphic if and only if their entropies are equal.
Topological entropy
This is the main entropy notion in topological dynamics. In a topological dynamical system $(X, T)$, topological entropy measures the exponential growth rate of the number of distinguishable $n$-orbits as $n$ grows to infinity.
See the main article on topological entropy.
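As an illustration, take the golden mean shift (binary sequences with no two consecutive 1s), a standard example chosen here as an assumption rather than taken from this article. For a subshift, the number of $n$-orbits distinguishable at a fixed small resolution is comparable to the number of allowed words of length $n$, so the exponential growth rate of word counts gives the topological entropy, known to be $\log$ of the golden ratio for this shift. The sketch counts the words by a simple recursion and measures the rate in bits.

```python
from math import log2, sqrt

def count_allowed_words(n):
    """Number of binary words of length n >= 1 with no two consecutive 1s."""
    ends_in_0, ends_in_1 = 1, 1          # words of length 1: "0" and "1"
    for _ in range(n - 1):
        # a 0 may follow anything; a 1 may only follow a 0
        ends_in_0, ends_in_1 = ends_in_0 + ends_in_1, ends_in_0
    return ends_in_0 + ends_in_1

golden_ratio = (1 + sqrt(5)) / 2
for n in (1, 2, 3, 10, 50, 200):
    print(n, count_allowed_words(n), log2(count_allowed_words(n)) / n)
print("log2(golden ratio) =", log2(golden_ratio))
```

The counts are Fibonacci numbers, and $\frac{1}{n}\log_2$ of the count approaches $\log_2$ of the golden ratio, about 0.694 bits per step.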
Compression entropy
Consider a message in the form of a very long sequence (word) $B$ composed from a finite collection $\Lambda$ of letters. For example, every text or computer file or even TV program has such form. Formally, $B \in \Lambda^n$, where $n$ is the length of the message. In view of what was said about the information associated with a set, all messages of the same length $n$ should contain the same amount of information, $n \log_2 \#\Lambda$ bits, because the uniform measure of every such word is $(\#\Lambda)^{-n}$ (here $\#\Lambda$ denotes the cardinality of the set $\Lambda$, i.e., the number of its elements). The simple example below shows, however, that the amount of information, if properly understood, varies from word to word. For, let $B_1 = 0101\ldots01$ consist of $m$ repetitions of the block $01$, and let $B_2$ be a binary word of the same length $2m$ without any visible regularity (for instance, one produced by tossing a coin). Using obvious notation, $B_1$ can be written as $(01)^m$, while there is no obvious abbreviation for $B_2$. One is inclined to believe that $B_2$ carries more information than $B_1$, because its description cannot be essentially compacted.
This intuition was given a formal shape by scientists such as Shannon and Kolmogorov in the 1960s and later developed by Leonid Levin (1974).
A lossless data compression code is formally any one-to-one map from the set of all finite messages over $\Lambda$ to finite sequences (of various lengths) over another finite alphabet, say, $\{0, 1\}$. Usually one requires that the code also contains all instructions needed to perform the inverse map (i.e., to decode); only the commonly used conventions need not be included. The precise requirements imposed on the code lead to slightly different notions (see for example Kolmogorov complexity).
The information content of a message $B$ equals $\ell(B)$ bits, where $\ell(B)$ is the length of the shortest possible image of $B$ by a lossless data compression code. The ratio
$$\frac{\ell(B)}{n \log_2 \#\Lambda}$$
is called the (best possible) compression rate of $B$ or its compression entropy (it represents the ratio between the length of the compressed and original message if both are transformed to the binary alphabet by a simple binary encoding of the letters).
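The best possible code is not computable in general, but any practical lossless compressor gives an upper bound on $\ell(B)$. The sketch below uses Python's `zlib` (DEFLATE) as a stand-in for the optimal code, which is an assumption of this illustration, and computes compressed bits divided by $n \log_2 \#\Lambda$ for a periodic word like $(01)^m$ and for a coin-tossing word over $\{0, 1\}$ (generated with a seeded pseudo-random generator). Header overhead is included, so the figures slightly overestimate the rate.

```python
import random
import zlib
from math import log2

def compression_rate(message: bytes, alphabet_size: int) -> float:
    """Compressed length in bits / (n * log2 #Lambda); zlib stands in for the best code."""
    compressed_bits = 8 * len(zlib.compress(message, 9))
    return compressed_bits / (len(message) * log2(alphabet_size))

n = 100_000
periodic = b"01" * (n // 2)                                # like (01)^m: short description
rng = random.Random(1)
irregular = bytes(rng.choice(b"01") for _ in range(n))    # coin tossing over {0, 1}

print(compression_rate(periodic, 2), compression_rate(irregular, 2))
```

The periodic word compresses to a tiny fraction of its length, while the irregular one cannot be pushed below about one bit per letter, i.e., a rate near 1.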
See Connections between different meanings of entropy.
See example of a simple compression algorithm
See also Lempel-Ziv algorithm
Entropy as disorder
In nearly all its meanings, entropy can be viewed as a measure of disorder and chaos, as long as by order one understands
segregating things by their kind (e.g. by similar properties or parameter values).
Chaos is the state of a system (physical or dynamical) in which elements of all kinds are mixed evenly throughout
the space, so that the space is homogeneous. For example, a container with gas is in its state of maximal entropy when
the temperature and pressure are constant throughout the volume. That means there is approximately
the same number of particles in every unit of the volume, and the proportion between slow and fast particles is everywhere
the same. States of lower entropy occur when particles are organized, for example: slower ones in one area, faster ones in another.
A message $B$ has high entropy if all short words appear with equal frequencies in all sufficiently long subwords of $B$. Any trace of organization and logic in the structure of the message allows for its compression and hence lowers its compression entropy. These observations lead to the common sense meaning of entropy.
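The criterion "all short words appear with equal frequencies" can be measured directly. The sketch below, an illustration under the assumption that empirical subword frequencies are a fair proxy, computes $\frac{1}{k}$ times the Shannon entropy of the frequencies of length-$k$ subwords for a periodic message and for a coin-tossing message (seeded pseudo-random generator).

```python
import random
from collections import Counter
from math import log2

def block_entropy_rate(message: str, k: int) -> float:
    """(1/k) * Shannon entropy (bits) of the empirical frequencies of length-k subwords."""
    counts = Counter(message[i:i + k] for i in range(len(message) - k + 1))
    total = sum(counts.values())
    return -sum(c / total * log2(c / total) for c in counts.values()) / k

n = 100_000
periodic = "01" * (n // 2)
rng = random.Random(2)
irregular = "".join(rng.choice("01") for _ in range(n))

for k in (1, 3, 8):
    print(k, block_entropy_rate(periodic, k), block_entropy_rate(irregular, k))
```

Single letters look equally balanced in both messages, but the periodic one contains only two distinct subwords of each length, so its value drops like $1/k$, while the irregular message stays close to 1 bit per letter.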
To have order in the house means to have food separated from utensils and plates, clothing arranged in the closet by type, trash deposited in the trash container, etc. When these things get mixed together, entropy increases causing disorder and chaos. In a social system, order is associated with classification of the individuals by their skills and assigning to them appropriate positions in the system. Law and other mechanisms are enforced to keep such order. When this classification and assignment fails, the system falls into chaos.
Connections between different meanings of entropy
See Connections between different meanings of entropy.
References
[B] Ludwig Boltzmann, Lectures on Gas Theory, 1898
[Ca] Sadi Carnot, Reflections on the Motive Power of Fire, 1824
[Cl] Rudolf Clausius, The Mechanical Theory of Heat – with its Applications to the Steam Engine and to Physical Properties of Bodies, London, 1865.
[G] Willard Gibbs, A Method of Geometrical Representation of the Thermodynamic Properties of Substances by Means of Surfaces, 1873
[K] Andrei N. Kolmogorov, New Metric Invariant of Transitive Dynamical Systems and Endomorphisms of Lebesgue Spaces, 1958.
[L] Rolf Landauer, Irreversibility and Heat Generation in the Computing Process, IBM Journal of Research and Development 5, 1961.
[N] John von Neumann, Mathematische Grundlagen der Quantenmechanik, Springer, Berlin, 1932.
[P] Karl Petersen, Ergodic Theory, Cambridge, 1983.
[Sh] Claude E. Shannon, A Mathematical Theory of Communication, 1948.
[Si] Yakov G. Sinai, On the Notion of Entropy of a Dynamical System, 1959.
[SW] Claude E. Shannon and Warren Weaver, The Mathematical Theory of Communication, University of Illinois Press, 1949.
[Wa] Peter Walters, Ergodic Theory: Introductory Lectures, Springer, Berlin, 1975.
Internal references
- Paul M.B. Vitanyi (2007) Andrey Nikolaevich Kolmogorov. Scholarpedia, 2(2):2798.
- Olaf Sporns (2007) Complexity. Scholarpedia, 2(10):1623.
- James Meiss (2007) Dynamical systems. Scholarpedia, 2(2):1629.
- Eugene M. Izhikevich (2007) Equilibrium. Scholarpedia, 2(10):2014.
- Jacob D. Bekenstein (2008), Bekenstein-Hawking entropy, Scholarpedia, 3(10):7375.
External links
- Wolfram Math World: Entropy (and the links therein)
See also
Bekenstein-Hawking entropy, Chaos, Data compression, Entropy in Chaotic Dynamics, Entropy of Spike Trains, Kolmogorov complexity, Kolmogorov-Sinai Entropy, Laws of thermodynamics, Mutual information, Time's arrow and Boltzmann's entropy, Topological entropy, Transfer Entropy, Von Neumann entropy.
Tomasz Downarowicz (2007) Entropy. Scholarpedia, 2(11):3901. Created: 22 May 2007, reviewed: 16 November 2007, accepted: 16 November 2007.




