Cumulants
From Scholarpedia
Peter McCullagh and John Kolassa (2009), Scholarpedia, 4(3):4699. Revision #59691.
Curator: Dr. Peter McCullagh, Department of Statistics, University of Chicago, IL
Curator: Dr. John Kolassa, Department of Statistics, Rutgers University, NJ
This article describes a sequence of numbers, called cumulants, that are used to describe, and in some circumstances approximate, a univariate or multivariate distribution. Cumulants are not unique in this role; other sequences, such as moments and their generalizations, may also be used in both roles. Cumulants have three advantages over these competitors: they change in a very simple way when the underlying random variable is subject to an affine transformation; the cumulants of a sum of independent random variables have a very simple relationship to the cumulants of the addends; and cumulants may be used in a simple way to describe the difference between a distribution and its simplest Gaussian approximation.
Overview and Definitions
Definition
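The moment of order $r$ (or $r$th moment) of a real-valued random variable $X$ is
$$\mu_r = E(X^r)$$
for integer $r = 0, 1, \ldots$.
The value is assumed to be finite.
Provided that it has a Taylor expansion about the origin, the moment generating function (or Fourier--Laplace transform)
$$M(\xi) = E(e^{\xi X}) = E(1 + \xi X + \cdots + \xi^r X^r/r! + \cdots) = \sum_{r=0}^\infty \mu_r \xi^r/r! \tag{1}$$
is an easy way to combine all of the moments into a single expression.
The $r$th moment is hence the $r$th derivative of $M$ at the origin.
When $X$ has a distribution given by a density $f$, then
$$\mu_r = \int_{-\infty}^{\infty} x^r f(x)\,dx \tag{2}$$
and
$$M(\xi) = \int_{-\infty}^{\infty} e^{\xi x} f(x)\,dx. \tag{3}$$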
The cumulants $\kappa_r$ are the coefficients in the Taylor expansion of the cumulant generating function about the origin
$$K(\xi) = \log M(\xi) = \sum_{r} \kappa_r \xi^r/r!.$$
Evidently $\mu_0 = 1$ implies $\kappa_0 = 0$.
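The relationship between the first few moments and cumulants, obtained by extracting coefficients from the expansion, is as follows:
$$\begin{aligned}
\kappa_1 &= \mu_1 \\
\kappa_2 &= \mu_2 - \mu_1^2 \\
\kappa_3 &= \mu_3 - 3\mu_2\mu_1 + 2\mu_1^3 \\
\kappa_4 &= \mu_4 - 4\mu_3\mu_1 - 3\mu_2^2 + 12\mu_2\mu_1^2 - 6\mu_1^4.
\end{aligned} \tag{4}$$
In the reverse direction,
$$\begin{aligned}
\mu_2 &= \kappa_2 + \kappa_1^2 \\
\mu_3 &= \kappa_3 + 3\kappa_2\kappa_1 + \kappa_1^3 \\
\mu_4 &= \kappa_4 + 4\kappa_3\kappa_1 + 3\kappa_2^2 + 6\kappa_2\kappa_1^2 + \kappa_1^4.
\end{aligned} \tag{5}$$
In particular, $\kappa_1 = \mu_1$ is the mean of $X$, $\kappa_2$ is the variance, and $\kappa_3 = E((X - \mu_1)^3)$.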
Higher-order cumulants are not the same as moments about the mean.
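The conversion from moments to cumulants can be sketched in a few lines of Python. The sketch below does not extract Taylor coefficients directly; it assumes the standard recursion $\mu_n = \sum_{k=1}^{n} \binom{n-1}{k-1}\kappa_k\mu_{n-k}$, which follows from differentiating $M = e^{K}$. The function name `cumulants_from_moments` is an illustrative choice, not part of any library.

```python
from fractions import Fraction
from math import comb


def cumulants_from_moments(moments):
    """Convert raw moments [mu_1, ..., mu_n] to cumulants [kappa_1, ..., kappa_n].

    Uses the recursion mu_n = sum_{k=1}^{n} C(n-1, k-1) * kappa_k * mu_{n-k},
    solved for kappa_n, with mu_0 = 1.
    """
    mu = [1] + list(moments)   # mu[0] = 1
    kappa = [0]                # kappa[0] = 0, matching K(0) = 0
    for n in range(1, len(mu)):
        lower_terms = sum(
            comb(n - 1, k - 1) * kappa[k] * mu[n - k] for k in range(1, n)
        )
        kappa.append(mu[n] - lower_terms)
    return kappa[1:]


# Raw moments of a standard normal variable: 0, 1, 0, 3.
normal_cumulants = cumulants_from_moments([0, 1, 0, 3])

# Raw moments of a Poisson variable with mean 2: 2, 6, 22, 94.
poisson_cumulants = cumulants_from_moments([Fraction(m) for m in (2, 6, 22, 94)])
```

For the first four orders the recursion reproduces the relations in (4); exact arithmetic with `Fraction` avoids rounding error when the moments are rational.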
Hald (2000) credits Thiele (1889) with the first derivation of cumulants.
Fisher (1929) called the quantities $\kappa_r$ cumulative moment functions; Hotelling (1933) claims credit for the simpler term cumulants.
Lauritzen (2002) presents an overview, translation, and reprinting of much of this early work.
Examples
As above, let $\mathcal{R}$ denote the real numbers.
Let $\mathcal{R}^+$ represent the positive reals, and let $\mathcal{N} = \{0, 1, \ldots\}$ be the natural numbers.
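| Distribution | Density | CGF | Cumulants |
| Normal | $e^{-x^2/2}/\sqrt{2\pi}$, $x \in \mathcal{R}$ | $\xi^2/2$ | $\kappa_1 = 0$, $\kappa_2 = 1$, $\kappa_n = 0$ for $n > 2$ |
| Bernoulli | $(1-\pi)^{1-x}\pi^x$, $x \in \{0, 1\}$ | $\log(1 - \pi + \pi e^\xi)$ | $\kappa_1 = \pi$, $\kappa_2 = \pi(1-\pi)$, $\kappa_3 = 2\pi^3 - 3\pi^2 + \pi$ |
| Poisson | $e^{-\eta}\eta^x/x!$, $x \in \mathcal{N}$ | $(e^\xi - 1)\eta$ | $\kappa_n = \eta$ for all $n$ |
| Exponential | $e^{-x/\lambda}/\lambda$, $x \in \mathcal{R}^+$ | $-\log(1 - \lambda\xi)$ | $\kappa_n = \lambda^n (n-1)!$ for all $n$ |
| Geometric | $(1-\pi)\pi^x$, $x \in \mathcal{N}$ | $\log(1-\pi) - \log(1 - \pi e^\xi)$ | $\kappa_1 = \pi/(1-\pi)$, $\kappa_2 = \pi/(1-\pi)^2$ |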
Definitions under less restrictive conditions
The Cauchy distribution with density $1/(\pi(1 + x^2))$ has no moments because the integral (2) does not converge for any integer $r \ge 1$; Student's $t$ distribution on five degrees of freedom is symmetric with density
$$f(x) = \frac{8}{3\pi\sqrt{5}\,(1 + x^2/5)^3}.$$
The first four moments are $0, 5/3, 0, 25$. Higher-order moments are not defined.
The cumulants up to order four are defined by (4) even though the moment generating function (1) does not exist for any real $\xi \neq 0$.
In both of these cases, the characteristic function $M(i\xi)$ is well-defined for real $\xi$: it is $\exp(-|\xi|)$ for the Cauchy distribution, and $\exp(-\sqrt{5}|\xi|)\,(1 + \sqrt{5}|\xi| + 5\xi^2/3)$ for $t_5$.
In the latter case, both $M(i\xi)$ and $K(i\xi)$ have Taylor expansions up to order four only, so the moments and cumulants are defined only up to this order.
The infinite expansion (1) is justified when the radius of convergence is positive, in which case $M(\xi)$ is finite on an open set containing zero, and all moments and cumulants are finite.
However, finiteness of the moments does not imply that $M(\xi)$ exists for any $\xi > 0$.
The log normal distribution provides a counterexample.
It has finite moments $\mu_r = e^{r^2/2}$ of all orders, but (1) diverges for every $\xi \neq 0$.
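The divergence of (1) for the log normal distribution can be illustrated numerically. The sketch below assumes the standard log normal moments $\mu_r = e^{r^2/2}$ and evaluates the logarithm of the absolute value of the general term $\mu_r\xi^r/r!$ for a small value of $\xi$; the helper name `log_abs_term` and the choice $\xi = 0.01$ are illustrative.

```python
import math


def log_abs_term(r, xi):
    """Return log|mu_r * xi**r / r!| for log normal moments mu_r = exp(r**2 / 2)."""
    return r * r / 2 + r * math.log(abs(xi)) - math.lgamma(r + 1)


xi = 0.01
# Logarithms of the first 60 terms of the series (1), r = 1, ..., 60.
log_terms = [log_abs_term(r, xi) for r in range(1, 61)]
```

The terms shrink at first and then grow without bound, because $r^2/2$ eventually dominates both $r\log|\xi|$ and $\log r!$. A series whose terms do not tend to zero cannot converge, whatever the sign of $\xi$.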
Uniqueness
The normal distribution $N(\mu, \sigma^2)$ has cumulant generating function $\xi\mu + \xi^2\sigma^2/2$, a quadratic polynomial implying that all cumulants of order three and higher are zero.
Marcinkiewicz (1939) showed that the normal distribution is the only distribution whose cumulant generating function is a polynomial, i.e. the only distribution having a finite number of non-zero cumulants.
The Poisson distribution with mean $\mu$ has moment generating function $\exp(\mu(e^\xi - 1))$ and cumulant generating function $\mu(e^\xi - 1)$.
Consequently all the cumulants are equal to the mean.
Two distinct distributions may have the same moments, and hence the same cumulants.
This statement is fairly obvious for distributions whose moments are all infinite, or even for distributions having infinite higher-order moments.
But it is much less obvious for distributions having finite moments of all orders.
Heyde (1963) gave one such pair of distributions with densities
$$f_1(x) = \frac{\exp(-(\log x)^2/2)}{x\sqrt{2\pi}}$$
and
$$f_2(x) = f_1(x)\,[1 + \sin(2\pi\log x)/2]$$
for $x > 0$.
The first of these is called the log normal distribution.
To show that these distributions have the same moments it suffices to show that
$$\int_0^\infty x^k f_1(x)\sin(2\pi\log x)\,dx = 0$$
for integer $k \ge 1$, which can be shown by making the substitution $\log x = y + k$.
If the sequence of moments is such that (1) has a positive radius of convergence, the distribution is uniquely determined.
Properties
Cumulants of order $r \ge 2$ are called semi-invariant on account of their behavior under affine transformation of variables (Thiele, 1903; Dressel, 1940).
If $\kappa_r$ is the $r$th cumulant of $X$, the $r$th cumulant of the affine transformation $a + bX$ is $b^r\kappa_r$, independent of $a$.
This behavior is considerably simpler than that of moments.
However, moments about the mean are also semi-invariant, so this property alone does not explain why cumulants are useful for statistical purposes.
The term cumulant reflects their behavior under addition of random variables.
Let $S = X + Y$ be the sum of two independent random variables.
The moment generating function of the sum is the product
$$M_S(\xi) = M_X(\xi)\,M_Y(\xi),$$
and the cumulant generating function is the sum
$$K_S(\xi) = K_X(\xi) + K_Y(\xi).$$
Consequently, the $r$th cumulant of the sum is the sum of the $r$th cumulants.
By extension, if $X_1, \ldots, X_n$ are independent and identically distributed, the $r$th cumulant of the sum is $n\kappa_r$.
Let $\kappa_r^{(n)}$ be the cumulant of order $r$ of the standardized sum $n^{-1/2}(X_1 + \cdots + X_n)$; then
$$\kappa_r^{(n)} = n^{1 - r/2}\kappa_r. \tag{6}$$
Provided that the cumulants are finite, all cumulants of order $r \ge 3$ of the standardized sum tend to zero, which is a simple demonstration of the central limit theorem.
Good (1977) obtained an expression for the $r$th cumulant of $X$ as the $r$th moment of the discrete Fourier transform of an independent and identically distributed sequence as follows.
Let $X_1, X_2, \ldots$ be independent copies of $X$ with $r$th cumulant $\kappa_r$, and let $\omega = e^{2\pi i/n}$ be a primitive $n$th root of unity.
The discrete Fourier combination
$$Z = X_1 + \omega X_2 + \cdots + \omega^{n-1}X_n$$
is a complex-valued random variable whose distribution is invariant under rotation through multiples of $2\pi/n$.
The $r$th cumulant of the sum is $\kappa_r\sum_{j=1}^{n}\omega^{r(j-1)}$, which is equal to $n\kappa_r$ if $r$ is a multiple of $n$, and zero otherwise.
Consequently $E(Z^r) = 0$ for integer $r < n$ and $E(Z^n) = n\kappa_n$.
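Good's identity can be checked by brute force for a discrete distribution. The sketch below forms $Z = \sum_j \omega^{j-1}X_j$ with $\omega = e^{2\pi i/n}$ over every outcome of $n$ independent copies and returns $E(Z^n)/n$. The function name `good_cumulant` and the Bernoulli example with $\pi = 0.3$ are illustrative assumptions; the closed forms used for comparison are the standard Bernoulli cumulants.

```python
import cmath
from itertools import product


def good_cumulant(values, probs, n):
    """n-th cumulant of a discrete variable, computed as E(Z**n) / n.

    Z = X_1 + omega * X_2 + ... + omega**(n-1) * X_n, with omega a primitive
    n-th root of unity and X_1, ..., X_n independent copies of the variable.
    """
    omega = cmath.exp(2j * cmath.pi / n)
    expectation = 0
    # Enumerate every joint outcome of the n independent copies.
    for outcome in product(range(len(values)), repeat=n):
        weight = 1.0
        z = 0
        for j, index in enumerate(outcome):
            weight *= probs[index]
            z += omega ** j * values[index]
        expectation += weight * z ** n
    return expectation / n


p = 0.3
kappa2 = good_cumulant([0, 1], [1 - p, p], 2)  # compare with p(1-p)
kappa3 = good_cumulant([0, 1], [1 - p, p], 3)  # compare with p(1-p)(1-2p)
kappa4 = good_cumulant([0, 1], [1 - p, p], 4)  # compare with p(1-p)(1-6p(1-p))
```

The enumeration costs $m^n$ terms for a variable with $m$ support points, so this is a check of the identity rather than a practical algorithm.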
Multivariate cumulants
Somewhat surprisingly, the relation between moments and cumulants is simpler and
more transparent in the multivariate case than in the univariate case.
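Let $X = (X^1, \ldots, X^p)$ be the components of a random vector.
In a departure from the univariate notation, we write $\kappa^i = E(X^i)$ for the components of the mean vector, $\kappa^{ij} = E(X^iX^j)$ for the components of the second moment matrix, $\kappa^{ijk} = E(X^iX^jX^k)$ for the third moments, and so on.
It is convenient notationally to adopt Einstein's summation convention, so $\xi_iX^i$ denotes the linear combination $\xi_1X^1 + \cdots + \xi_pX^p$, the square of the linear combination is $(\xi_iX^i)^2 = \xi_i\xi_jX^iX^j$, a sum of $p^2$ terms, and so on for higher powers.
The Taylor expansion of the moment generating function $M(\xi) = E(\exp(\xi_iX^i))$ is
$$M(\xi) = 1 + \xi_i\kappa^i + \tfrac{1}{2!}\xi_i\xi_j\kappa^{ij} + \tfrac{1}{3!}\xi_i\xi_j\xi_k\kappa^{ijk} + \cdots.$$
The cumulants are defined as the coefficients $\kappa^{i,j}, \kappa^{i,j,k}, \ldots$ in the Taylor expansion
$$\log M(\xi) = \xi_i\kappa^i + \tfrac{1}{2!}\xi_i\xi_j\kappa^{i,j} + \tfrac{1}{3!}\xi_i\xi_j\xi_k\kappa^{i,j,k} + \cdots.$$
This notation does not distinguish first-order moments from first-order cumulants, but commas separating the superscripts serve to distinguish higher-order cumulants from moments.
Comparison of coefficients reveals that each moment $\kappa^{ij}, \kappa^{ijk}, \ldots$ is a sum over partitions of the superscripts, each term in the sum being a product of cumulants:
$$\begin{aligned}
\kappa^{ij} &= \kappa^{i,j} + \kappa^i\kappa^j \\
\kappa^{ijk} &= \kappa^{i,j,k} + \kappa^{i,j}\kappa^k[3] + \kappa^i\kappa^j\kappa^k \\
\kappa^{ijkl} &= \kappa^{i,j,k,l} + \kappa^{i,j,k}\kappa^l[4] + \kappa^{i,j}\kappa^{k,l}[3] + \kappa^{i,j}\kappa^k\kappa^l[6] + \kappa^i\kappa^j\kappa^k\kappa^l.
\end{aligned}$$
Each bracketed number indicates a sum over distinct partitions having the same block sizes, so the fourth-order moment is a sum of 15 distinct cumulant products.
In the reverse direction, each cumulant is also a sum over partitions of the indices.
Each term in the sum is a product of moments, but with coefficient $(-1)^{\nu-1}(\nu-1)!$ where $\nu$ is the number of blocks:
$$\begin{aligned}
\kappa^{i,j} &= \kappa^{ij} - \kappa^i\kappa^j \\
\kappa^{i,j,k} &= \kappa^{ijk} - \kappa^{ij}\kappa^k[3] + 2\kappa^i\kappa^j\kappa^k \\
\kappa^{i,j,k,l} &= \kappa^{ijkl} - \kappa^{ijk}\kappa^l[4] - \kappa^{ij}\kappa^{kl}[3] + 2\kappa^{ij}\kappa^k\kappa^l[6] - 6\kappa^i\kappa^j\kappa^k\kappa^l.
\end{aligned}$$
These relationships are an instance of Möbius inversion on the partition lattice.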
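Partition notation serves one additional purpose.
It establishes moments and cumulants as special cases of generalized cumulants, which include objects of the type $\kappa^{i,jk} = \operatorname{cov}(X^i, X^jX^k)$, $\kappa^{ij,kl} = \operatorname{cov}(X^iX^j, X^kX^l)$, and $\kappa^{i,j,kl}$ with incompletely partitioned indices.
These objects arise very naturally in statistical work involving asymptotic approximation of distributions.
They are intermediate between moments and cumulants, and have characteristics of both.
Every generalized cumulant can be expressed as a sum of certain products of ordinary cumulants. Some examples are as follows:
$$\begin{aligned}
\kappa^{i,jk} &= \kappa^{i,j,k} + \kappa^{i,j}\kappa^k + \kappa^{i,k}\kappa^j \\
\kappa^{ij,kl} &= \kappa^{i,j,k,l} + \kappa^{i,j,k}\kappa^l[4] + \kappa^{i,k}\kappa^{j,l}[2] + \kappa^{i,k}\kappa^j\kappa^l[4] \\
\kappa^{i,j,kl} &= \kappa^{i,j,k,l} + \kappa^{i,j,k}\kappa^l[2] + \kappa^{i,k}\kappa^{j,l}[2].
\end{aligned}$$
Each generalized cumulant is associated with a partition $\tau$ of the given set of indices.
For example, $\kappa^{i,j,kl}$ is associated with the partition $\tau = i|j|kl$ of four indices into three blocks.
Each term on the right is a cumulant product associated with a partition $\sigma$ of the same indices.
The coefficient is one if the least upper bound $\sigma \vee \tau$ has a single block, otherwise zero.
Thus, with $\tau = i|j|kl$, the product $\kappa^{i,j}\kappa^{k,l}$ does not appear on the right because $\sigma \vee \tau = ij|kl$ has two blocks.
As an example of the way these formulae may be used, let $X$ be a scalar random variable with cumulants $\kappa_1, \kappa_2, \kappa_3, \ldots$.
By translating the second formula in the preceding list, we find that the variance of the squared variable is
$$\operatorname{var}(X^2) = \kappa_4 + 4\kappa_3\kappa_1 + 2\kappa_2^2 + 4\kappa_2\kappa_1^2,$$
reducing to $\kappa_4 + 2\kappa_2^2$ if the mean is zero.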
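The partition counts behind these formulae can be verified by enumeration. The sketch below lists all partitions of four indices, groups them by block sizes, and then keeps only those partitions $\sigma$ whose least upper bound with $\tau = ij|kl$ is a single block, the rule for the generalized cumulant of two products. The function names and the index labels `i, j, k, l` are illustrative; the join is computed with a small union-find.

```python
from collections import Counter


def set_partitions(items):
    """Yield every partition of a list, each as a list of blocks."""
    if not items:
        yield []
        return
    first, rest = items[0], items[1:]
    for partition in set_partitions(rest):
        # Add `first` to each existing block in turn ...
        for i in range(len(partition)):
            yield partition[:i] + [[first] + partition[i]] + partition[i + 1:]
        # ... or give it a block of its own.
        yield [[first]] + partition


def join_block_count(sigma, tau):
    """Number of blocks in the least upper bound of partitions sigma and tau."""
    parent = {x: x for block in sigma for x in block}

    def find(x):
        while parent[x] != x:
            x = parent[x]
        return x

    # Merge the elements of every block of either partition.
    for block in list(sigma) + list(tau):
        for x in block[1:]:
            parent[find(x)] = find(block[0])
    return len({find(x) for x in parent})


def shape(partition):
    """Block sizes in decreasing order, e.g. (2, 1, 1)."""
    return tuple(sorted((len(block) for block in partition), reverse=True))


indices = ["i", "j", "k", "l"]
all_shapes = Counter(shape(p) for p in set_partitions(indices))

tau = [["i", "j"], ["k", "l"]]
connected_shapes = Counter(
    shape(p) for p in set_partitions(indices) if join_block_count(p, tau) == 1
)
```

Here `all_shapes` gives the multiplicities 1, 4, 3, 6, 1 that add up to the 15 terms of the fourth-order moment, and `connected_shapes` gives the multiplicities 1, 4, 2, 4 that translate into the coefficients of $\kappa_4$, $\kappa_3\kappa_1$, $\kappa_2^2$ and $\kappa_2\kappa_1^2$ in the variance of the squared variable.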
Exponential families
Let $f$ be a probability distribution on an arbitrary measurable space $(\mathcal{X}, \nu)$, and let $t\colon \mathcal{X} \to \mathcal{R}$ be a real-valued random variable with cumulant generating function $K(\cdot)$, finite in a set $\Theta$ containing zero in the interior.
The family of distributions on $\mathcal{X}$ with density
$$f_\theta(x) = e^{\theta t(x) - K(\theta)} f(x)$$
indexed by $\theta \in \Theta$ is called the exponential family associated with $f$ and the canonical statistic $t$.
In statistical physics, the normalizing constant $e^{K(\theta)}$ is called the partition function.
Two examples suffice to illustrate the idea.
In the first example, $\mathcal{X} = \{1, 2, \ldots\}$ is the set of natural numbers, $f(x) \propto x^{-2}$, and $t(x) = -\log x$.
The associated exponential family is
$$f_\theta(x) = \frac{x^{-(\theta+2)}}{\zeta(\theta+2)},$$
where $\zeta$ is the Riemann zeta function with real argument $\theta + 2 > 1$.
In the second example, $\mathcal{X} = S_n$ is the symmetric group or the set of permutations of $n$ letters, $x$ is a permutation, $t(x)$ is the number of cycles, $f(x) = 1/n!$ is the uniform distribution, and $K(\theta) = \log\Gamma(n + e^\theta) - \log\Gamma(e^\theta) - \log n!$ for all real $\theta$.
The exponential family of distributions on permutations of $[n]$ is
$$f_\theta(x) = \frac{\Gamma(e^\theta)\,e^{\theta t(x)}}{\Gamma(n + e^\theta)},$$
the same as the distribution generated by the Chinese restaurant process with parameter $\lambda = e^\theta$.
The associated marginal distribution on partitions, the Ewens distribution on partitions of $[n]$, is also of the exponential-family form with canonical statistic equal to the number of blocks or cycles. This distribution is also the same as the distribution generated by the Dirichlet process.
The number of blocks or cycles $t(x)$ is a random variable whose cumulants are the derivatives of $K$ evaluated at the parameter $\theta$.
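The permutation example can be checked by brute force for small $n$. The sketch below assumes that a permutation with $c$ cycles receives weight $\lambda^c$ with $\lambda = e^\theta$, so the normalizing constant is $\lambda(\lambda+1)\cdots(\lambda+n-1)$ and the first two cumulants of the number of cycles are $\sum_{j=0}^{n-1}\lambda/(\lambda+j)$ and $\sum_{j=0}^{n-1}\lambda j/(\lambda+j)^2$, obtained by differentiating $\sum_j\log(e^\theta + j)$ with respect to $\theta$. The values $n = 4$ and $\lambda = 3/2$ are arbitrary illustrative choices.

```python
from fractions import Fraction
from itertools import permutations


def cycle_count(perm):
    """Number of cycles of a permutation given as a tuple mapping i -> perm[i]."""
    seen, cycles = set(), 0
    for start in range(len(perm)):
        if start not in seen:
            cycles += 1
            j = start
            while j not in seen:
                seen.add(j)
                j = perm[j]
    return cycles


n = 4
lam = Fraction(3, 2)  # lambda = exp(theta), chosen rational for exact arithmetic

counts = [cycle_count(p) for p in permutations(range(n))]
weights = [lam ** c for c in counts]
total = sum(weights)  # normalizing constant (partition function, up to n!)

mean = sum(w * c for w, c in zip(weights, counts)) / total
variance = sum(w * c * c for w, c in zip(weights, counts)) / total - mean ** 2

# First and second derivatives of K with respect to theta.
mean_formula = sum(lam / (lam + j) for j in range(n))
variance_formula = sum(lam * j / (lam + j) ** 2 for j in range(n))
```

Exact rational arithmetic makes the agreement between the enumerated moments and the derivatives of $K$ an identity rather than a numerical approximation.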
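In the multi-parameter case, $t\colon \mathcal{X} \to \mathcal{R}^p$ is a random vector, $\xi$ is a linear functional, and $M(\xi) = E(e^{\xi(t)})$ is the joint moment generating function.
It is sometimes convenient to employ Einstein's implicit summation convention in the form $\xi(t) = \xi_it^i$, where $t^1, \ldots, t^p$ are the components of $t$, and $\xi_1, \ldots, \xi_p$ are the coefficients of the linear functional.
For simplicity of notation in what follows, $\mathcal{X} = \mathcal{R}^p$ and $t$ is the identity function.
An exponential-family distribution in $\mathcal{R}^p$ has the form
$$f_\theta(x) = \exp(\theta_ix^i - K(\theta))\,f(x)$$
for given functions $K$ and $f$.
Integration shows that the distribution $f_\theta$ has cumulant generating function $K(\theta + \xi) - K(\theta)$.
The cumulants of $X$ are equal to the derivatives of $K$ at the parameter $\theta$.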
Calculus of cumulants
Consider descriptions of the sampling distribution of estimates of cumulants.
Such calculations are notationally complicated, and may be simplified by a tool called umbral calculus.
The umbral calculus is a syntax or formal system consisting of
certain operations on objects called umbrae,
mimicking addition and multiplication of independent real-valued random
variables. Rota and Taylor (1994) review this calculus.
To each real-valued sequence $1, a_1, a_2, \ldots$ there corresponds an umbra $\alpha$ such that $E(\alpha^r) = a_r$.
This definition goes beyond the random variable context to allow for special umbrae, the singleton and Bell umbra, corresponding to no real-valued random variable.
Using these special umbrae, one develops the notion of an $\alpha$-cumulant umbra by formal product operations in the syntax.
Properties of cumulants,
k-statistics
and other polynomial functions
are then derived by purely combinatorial operations.
Di Nardo et al. (2008) present details.
Streitberg (1990) presents parallels between the calculus of cumulants and the calculus of certain decompositions of multivariate cumulative distribution functions into independent segments; these characterizations in terms of independent segments are called Lancaster interactions.
Moment and Cumulant Measures for Random Measures
Moments and cumulants extend quite naturally to random distributions.
Let $\Lambda$ be a random measure on a space $\mathcal{X}$.
Then the expectation of $\Lambda$ is defined as that measure $M_1$ such that $M_1(A) = E(\Lambda(A))$, for $A$ in a suitable sigma field. Higher-order moments then translate to expectations of product measures.
Let $\Lambda^{(k)}$ be the measure defined on $\mathcal{X}^k$, such that $\Lambda^{(k)}(A_1 \times \cdots \times A_k) = \prod_{j=1}^{k}\Lambda(A_j)$.
Then the moment of order $k$ of $\Lambda$ is $E(\Lambda^{(k)})$.
A moment generating functional can similarly be defined for $\Lambda$; a heuristic definition may be constructed through analogy with (1): Let $M(h) = E(\exp(\int h\,d\Lambda))$ for certain functions $h$ on $\mathcal{X}$, and moments can be recovered from $M$ via Fréchet differentiation.
Cumulants can then be defined as in (4), using the obvious analogy.
These moments and cumulants have application to the theory of point processes.
The above exposition, and applications to the theory of point processes,
can be found in Daley and Vere-Jones (1988).
Approximation of distributions
Edgeworth approximation
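Suppose that $Y$ is a random variable that arises as the sum of $n$ independent and identically-distributed summands, each of which has mean $0$, unit variance, and cumulants $\kappa_3$ and $\kappa_4$.
For ease of exposition, assume that cumulants of all orders exist.
Then, using (6), the cumulant generating function of $Y/\sqrt{n}$ is given by $\xi^2/2 + \kappa_3\xi^3/(6\sqrt{n}) + \kappa_4\xi^4/(24n) + O(n^{-3/2})$, and the moment generating function of $Y/\sqrt{n}$ is given by
$$e^{\xi^2/2}\exp\{\kappa_3\xi^3/(6\sqrt{n}) + \kappa_4\xi^4/(24n) + O(n^{-3/2})\}.$$
Expanding the second factor gives
$$e^{\xi^2/2}\{1 + [\kappa_3\xi^3/(6\sqrt{n}) + \kappa_4\xi^4/(24n)] + \tfrac{1}{2}[\kappa_3\xi^3/(6\sqrt{n}) + \kappa_4\xi^4/(24n)]^2 + \cdots\}.$$
Reordering terms in powers of sample size,
$$e^{\xi^2/2}\{1 + \kappa_3\xi^3/(6\sqrt{n}) + [\kappa_4\xi^4/24 + \kappa_3^2\xi^6/72]/n + O(n^{-3/2})\}. \tag{7}$$
Repeated application of integration by parts to (3) shows that
$$\int_{-\infty}^{\infty} e^{\xi x}(-1)^r f^{(r)}(x)\,dx = \xi^r M(\xi), \tag{8}$$
where $f^{(r)}$ denotes the derivative of $f$ of order $r$. Relation (8) holds if $f$ and its derivatives go to zero quickly as $|x| \to \infty$. Applying (8) to the normal density $\phi(x) = \exp(-x^2/2)/\sqrt{2\pi}$, and applying the result to (7), gives
$$\int_{-\infty}^{\infty} e^{\xi x}\phi(x)\{1 + \kappa_3h_3(x)/(6\sqrt{n}) + [\kappa_4h_4(x)/24 + \kappa_3^2h_6(x)/72]/n\}\,dx = e^{\xi^2/2}\{1 + \kappa_3\xi^3/(6\sqrt{n}) + [\kappa_4\xi^4/24 + \kappa_3^2\xi^6/72]/n\}$$
for $h_r(x) = (-1)^r\phi^{(r)}(x)/\phi(x)$. Since the relationship giving the moment generating function in terms of the density is invertible, and since the inversion process is properly smooth, Edgeworth (1907) approximates the density of $Y/\sqrt{n}$ by
$$e_4(y) = \phi(y)\{1 + \kappa_3h_3(y)/(6\sqrt{n}) + [\kappa_4h_4(y)/24 + \kappa_3^2h_6(y)/72]/n\}. \tag{9}$$
In fact, when the summands contributing to $Y$ have a density and cumulants of order at least 5, the error in the approximation, multiplied by $n^{3/2}$, remains bounded.
The functions $h_r$ defined above are the Hermite polynomials.
The approximation (9) is known as the Edgeworth series.
The subscript refers to the number of cumulants used in its definition.
This series can be used to approximate either the cumulative distribution function or survival function through term-wise integration.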
The preceding discussion is intended to be heuristic; Kolassa (2006) presents a rigorous derivation, along with the natural extension to random vectors.
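A small numerical comparison gives a feel for the size of the Edgeworth correction. The sketch below assumes summands of the form $E - 1$ with $E$ standard exponential, so that the mean is zero, the variance is one, $\kappa_3 = 2$ and $\kappa_4 = 6$, and the exact density of the standardized sum follows from the gamma density. The function names and the choice $n = 10$ are illustrative.

```python
import math


def phi(z):
    """Standard normal density."""
    return math.exp(-z * z / 2) / math.sqrt(2 * math.pi)


def edgeworth_density(z, n, k3, k4):
    """Four-cumulant Edgeworth approximation to the density of a standardized sum."""
    h3 = z ** 3 - 3 * z
    h4 = z ** 4 - 6 * z ** 2 + 3
    h6 = z ** 6 - 15 * z ** 4 + 45 * z ** 2 - 15
    correction = k3 * h3 / (6 * math.sqrt(n)) + (k4 * h4 / 24 + k3 ** 2 * h6 / 72) / n
    return phi(z) * (1 + correction)


def exact_density(z, n):
    """Exact density of (S - n) / sqrt(n), where S is a sum of n standard exponentials."""
    s = n + z * math.sqrt(n)
    if s <= 0:
        return 0.0
    log_gamma_density = (n - 1) * math.log(s) - s - math.lgamma(n)
    return math.sqrt(n) * math.exp(log_gamma_density)


n = 10
points = (-1.0, 0.0, 1.0)
normal_errors = [abs(phi(z) - exact_density(z, n)) for z in points]
edgeworth_errors = [
    abs(edgeworth_density(z, n, k3=2.0, k4=6.0) - exact_density(z, n)) for z in points
]
```

At each of these points the Edgeworth error is markedly smaller than the error of the plain normal approximation, in line with the stated $O(n^{-3/2})$ behavior; far in the tails the polynomial factor can make the approximation negative, which is one motivation for the saddlepoint method.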
Saddlepoint approximation
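The approximation (9) to the density $f_n(y)$ of $Y/\sqrt{n}$ has the property that $|f_n(y) - e_4(y)| \le Cn^{-3/2}$, for some constant $C$, when the cumulant of order $5$ exists; $C$ does not depend on $y$.
A similar bound holds for the relative error $|f_n(y) - e_4(y)|/f_n(y)$, but only when $y$ is restricted to a finite interval.
Because of the polynomial factor multiplying the first omitted term in (9), the relative error can be expected to behave poorly.
One might prefer an approximation that maintains good behavior for values of $y$ in a range that increases as $n$ increases; specifically, one might prefer an approximation that performs well for values of $\bar{x} = y/\sqrt{n}$ in a fixed interval.
Assume again that random variables $X_1, \ldots, X_n$ are independent and identically distributed, each with density $f$ and a cumulant generating function $K(\xi)$ finite for $\xi$ in a neighborhood of $0$. As above, define the exponential family
$$f_\theta(x) = e^{\theta x - K(\theta)}f(x),$$
under which the mean $\bar{X} = (X_1 + \cdots + X_n)/n$ has density
$$f_{\bar{X}}(\bar{x}; \theta) = \exp\{n[\theta\bar{x} - K(\theta)]\}\,f_{\bar{X}}(\bar{x}; 0).$$
One can then choose a value of $\theta$ depending on $\bar{x}$ that makes $f_{\bar{X}}(\bar{x}; \theta)$ easy to approximate, and use the exponential family relationship to derive an approximation for $f_{\bar{X}}(\bar{x}; 0)$. Conventionally we choose $\hat\theta$ to satisfy
$$K'(\hat\theta) = \bar{x}; \tag{10}$$
this makes the expectation of the distribution with density $f_{\hat\theta}$ equal to the observed value.
One then applies (9), with the scale of the ordinate changed to reflect the fact that we are approximating the distribution of $\bar{X}$, to obtain
$$f_{\bar{X}}(\bar{x}; 0) \approx \exp\{n[K(\hat\theta) - \hat\theta\bar{x}]\}\sqrt{n/K''(\hat\theta)}\;\phi(0)\{1 + \hat\kappa_3h_3(0)/(6\sqrt{n}) + [\hat\kappa_4h_4(0)/24 + \hat\kappa_3^2h_6(0)/72]/n\}.$$
Using the fact that $h_3(0) = 0$, $h_4(0) = 3$, and $h_6(0) = -15$,
$$f_{\bar{X}}(\bar{x}; 0) \approx \sqrt{\frac{n}{2\pi K''(\hat\theta)}}\exp\{n[K(\hat\theta) - \hat\theta\bar{x}]\}\{1 + [\hat\kappa_4/8 - 5\hat\kappa_3^2/24]/n\}. \tag{11}$$
Here $\hat\kappa_r = K^{(r)}(\hat\theta)/K''(\hat\theta)^{r/2}$ are calculated from the derivatives of $K$ in the preceding manner, but in this case evaluated at $\hat\theta$.
This approximation may only be applied to values of $\bar{x}$ for which (10) has solutions in an open neighborhood of 0.
Expression (11) represents the saddlepoint approximation to the density of the mean $\bar{X}$. Since $f_{\hat\theta}$ has a cumulant generating function defined on an open set containing $0$, cumulants of all orders exist and the Edgeworth series including $\hat\kappa_5$ may be applied to $f_{\bar{X}}(\bar{x}; \hat\theta)$; its term of order $n^{-3/2}$ involves only odd-order Hermite polynomials, which vanish at the origin, and so the error in the Edgeworth series is of order $O(n^{-2})$. Hence the error in (11) is of the same order, and in this case, is relative and uniform for values of $\bar{x}$ in a bounded subset of an open subset on which (10) has a solution.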
This approximation was introduced to the statistics literature by
Daniels (1954).
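The leading term of the saddlepoint approximation, $\sqrt{n/(2\pi K''(\hat\theta))}\exp\{n[K(\hat\theta) - \hat\theta\bar{x}]\}$ with $K'(\hat\theta) = \bar{x}$, can be tried on the mean of $n$ standard exponential variables. The sketch below assumes $K(\theta) = -\log(1 - \theta)$, so that the saddlepoint equation has the explicit solution $\hat\theta = 1 - 1/\bar{x}$, and compares the result with the exact gamma density of the mean. The function names and the choice $n = 10$ are illustrative.

```python
import math


def saddlepoint_density(xbar, n):
    """Leading-term saddlepoint density for the mean of n standard exponentials."""
    theta_hat = 1 - 1 / xbar                   # solves K'(theta) = 1/(1 - theta) = xbar
    K = -math.log(1 - theta_hat)               # K(theta_hat)
    K2 = 1 / (1 - theta_hat) ** 2              # K''(theta_hat)
    return math.sqrt(n / (2 * math.pi * K2)) * math.exp(n * (K - theta_hat * xbar))


def exact_density(xbar, n):
    """Exact density of the mean: n**n * xbar**(n-1) * exp(-n*xbar) / Gamma(n)."""
    log_f = n * math.log(n) + (n - 1) * math.log(xbar) - n * xbar - math.lgamma(n)
    return math.exp(log_f)


n = 10
ratios = [saddlepoint_density(x, n) / exact_density(x, n) for x in (0.5, 1.0, 2.0, 4.0)]
```

In this example the ratio of approximate to exact density is the same at every $\bar{x}$, because the two differ only through Stirling's approximation to $\Gamma(n)$. The relative error is therefore uniform over the whole range, including the tails where an Edgeworth expansion deteriorates.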
The Edgeworth series for the density was trivially integrated to obtain an approximation to tail probabilities. Integration of the saddlepoint approximation is more delicate. Two main approaches have been investigated.
Daniels (1987) expresses $f_{\bar{X}}(\bar{x})$ exactly as a complex integral involving $K(\xi)$, integrates with respect to $\bar{x}$ to obtain another complex integral, and reviews techniques for approximating the resulting integrals.
Robinson (1982) and Lugannani and Rice (1980) derive tail probability approximations based on approximately integrating (11) with respect to $\bar{x}$ directly.
These saddlepoint and Edgeworth approximations have multivariate and conditional extensions. Davison (1988) exploits the conditional saddlepoint tail probability approximation to perform inference in canonical exponential families.
Samples and sub-samples
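A function $f\colon \mathcal{R}^n \to \mathcal{R}$ is symmetric if $f(x_1, \ldots, x_n) = f(x_{\pi(1)}, \ldots, x_{\pi(n)})$ for each permutation $\pi$ of the arguments.
For example, the total $T_n = x_1 + \cdots + x_n$, the average $T_n/n$, the min, max and median are symmetric functions, as are the sum of squares $S_n = \sum x_i^2$, the sample variance $s_n^2 = (S_n - T_n^2/n)/(n-1)$ and the mean absolute deviation $\sum|x_i - x_j|/(n(n-1))$.
A vector $x$ in $\mathcal{R}^n$ is an ordered list of $n$ real numbers $(x_1, \ldots, x_n)$ or a function $x\colon [n] \to \mathcal{R}$ where $[n] = \{1, \ldots, n\}$.
For $m \le n$, a 1--1 function $\varphi\colon [m] \to [n]$ is a sample of size $m$, the sampled values being $x\varphi = (x_{\varphi(1)}, \ldots, x_{\varphi(m)})$.
All told, there are $n(n-1)\cdots(n-m+1)$ distinct samples of size $m$ that can be taken from a list of length $n$.
A sequence of functions $f_n\colon \mathcal{R}^n \to \mathcal{R}$ is consistent under sub-sampling if, for each $m \le n$,
$$f_n(x) = \operatorname{ave}_\varphi f_m(x\varphi),$$
where $\operatorname{ave}_\varphi$ denotes the average over samples of size $m$.
For $m = n$, this condition implies only that $f_n$ is a symmetric function.
Although the total and the median are both symmetric functions, neither is consistent under sub-sampling.
For example, the median of the numbers $(0, 1, 3)$ is one, but the average of the medians of samples of size two is 4/3.
However, the average $\bar{x}_n = T_n/n$ is sampling consistent.
Likewise the sample variance $s_n^2 = \sum(x_i - \bar{x}_n)^2/(n-1)$ with divisor $n-1$ is sampling consistent, but the mean squared deviation $\sum(x_i - \bar{x}_n)^2/n$ with divisor $n$ is not.
Other sampling consistent functions include Fisher's $k$-statistics, the first few of which are $k_{1,n} = T_n/n$, $k_{2,n} = s_n^2$ for $n \ge 2$,
$$k_{3,n} = \frac{n\sum(x_i - \bar{x}_n)^3}{(n-1)(n-2)}$$
defined for $n \ge 3$.
For a sequence of independent and identically distributed random variables, the $k$-statistic of order $r \le n$ is the unique symmetric function $k_{r,n}$ such that $E(k_{r,n}(X_1, \ldots, X_n)) = \kappa_r$.
Fisher (1929) derived the variances and covariances.
The connection with finite-population sub-sampling was developed by Tukey (1950).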
The class of statistics called U-statistics is consistent under sub-sampling.
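Consistency under sub-sampling is easy to test by enumeration. The sketch below averages a statistic over all ordered samples of size $m$ drawn without replacement from a short list and compares the result with the statistic on the full list. The list $(0, 1, 3)$ and the helper names are illustrative choices, and exact rational arithmetic is used so that equalities can be checked exactly.

```python
from fractions import Fraction
from itertools import permutations
from statistics import median


def subsample_average(statistic, x, m):
    """Average of `statistic` over all ordered samples of size m without replacement."""
    samples = list(permutations(x, m))
    return sum(statistic(sample) for sample in samples) / len(samples)


def sample_variance(x):
    """Sum of squared deviations with divisor n - 1."""
    n = len(x)
    mean = sum(x) / n
    return sum((xi - mean) ** 2 for xi in x) / (n - 1)


def mean_squared_deviation(x):
    """Sum of squared deviations with divisor n."""
    n = len(x)
    mean = sum(x) / n
    return sum((xi - mean) ** 2 for xi in x) / n


x = [Fraction(v) for v in (0, 1, 3)]

median_full = median(x)
median_sub = subsample_average(median, x, 2)

variance_full = sample_variance(x)
variance_sub = subsample_average(sample_variance, x, 2)

msd_full = mean_squared_deviation(x)
msd_sub = subsample_average(mean_squared_deviation, x, 2)
```

For this list the sample variance agrees with the average of its values over sub-samples of size two, while the median and the mean squared deviation do not.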
References
- D. J. Daley and D. Vere-Jones. An Introduction to the Theory of Point Processes. Springer-Verlag, New York, 1988.
- H. E. Daniels. Saddlepoint approximations in statistics. The Annals of Mathematical Statistics, 25 (4): 631--650, 1954.
- H. E. Daniels. Tail probability approximations. International Statistical Review, 55: 37--46, 1987.
- A. C. Davison. Approximate conditional inference in generalized linear models. Journal of the Royal Statistical Society Series B, 50: 445--461, 1988.
- E. Di Nardo, G. Guarino, and D. Senato. A unifying framework for $k$-statistics, polykays and their multivariate generalizations. Bernoulli, 14: 440--468, 2008.
- P. L. Dressel. Statistical seminvariants and their estimates with particular emphasis on their relation to algebraic invariants. The Annals of Mathematical Statistics, 11 (1): 33--57, 1940.
- F. Y. Edgeworth. On the representation of statistical frequency by a series. Journal of the Royal Statistical Society, 70 (1): 102--106, 1907.
- R. A. Fisher. Moments and product moments of sampling distributions. Proceedings of the London Mathematical Society, Series 2, 30: 199--238, 1929.
- I. J. Good. A new formula for k-statistics. The Annals of Statistics, 5 (1): 224--228, 1977.
- A. Hald. The early history of cumulants and the Gram-Charlier series. International Statistical Review, 68: 137--153, 2000.
- C. C. Heyde. On a property of the lognormal distribution. Journal of the Royal Statistical Society. Series B (Methodological), 25 (2): 392--393, 1963.
- Harold Hotelling. Review: [untitled]. Journal of the American Statistical Association, 28 (183): 374--375, 1933. ISSN 01621459. URL http://www.jstor.org/stable/2278451.
- J. E. Kolassa. Series Approximation Methods in Statistics. Springer--Verlag, New York, 2006.
- S.L. Lauritzen, editor. Thiele: pioneer in statistics. Oxford University Press, New York, 2002.
- R. Lugannani and S. Rice. Saddle point approximation for the distribution of the sum of independent random variables. Advances in Applied Probability, 12: 475--490, 1980.
- J. Marcinkiewicz. Sur une propriété de la loi de Gauss. Mathematische Zeitschrift, 44: 612--618, 1939.
- J. Robinson. Saddlepoint approximations for permutation tests and confidence intervals. Journal of the Royal Statistical Society. Series B (Methodological), 44 (1): 91--101, 1982.
- G.-C. Rota and B. D. Taylor. The classical umbral calculus. SIAM J. Math. Anal, 25 (2): 694--711, 1994.
- B. Streitberg. Lancaster interactions revisited. The Annals of Statistics, 18 (4): 1878--1885, 1990.
- T. N. Thiele. Almindelig Iagttagelseslaere: Sandsynlighedsregning og mindste Kvadraters Methode. C. A. Reitzel, Copenhagen, 1889.
- T. N. Thiele. Theory of Observations. C. & E. Layton, London, 1903.
- J. W. Tukey. Some sampling simplified. Journal of the American Statistical Association, 45 (252): 501--519, 1950.
See also
Peter McCullagh, John Kolassa (2009) Cumulants. Scholarpedia, 4(3):4699. Created: 13 August 2007, reviewed: 3 March 2009, accepted: 12 March 2009.