Dr. John Kolassa

From Scholarpedia
This is a draft, 1st author Peter McCullagh
 
 
 
This article describes a sequence of numbers, called '''cumulants''', that are used
[...]
be used in a simple way to describe the difference between a distribution and
its simplest Gaussian approximation.
==Overview and Definitions==
+
 
===Definition===
+
The moment of order
+
<math>r</math> (or
+
<math>r</math>th moment) of a real-valued random variable
+
<math>X</math> is
+
:<math>
+
\mu_r = E(X^r)
+
</math>
+
for integer
+
<math>r=0,1,\ldots</math>.
+
The value is assumed to be finite.
+
Provided that it has a Taylor expansion about the origin,
the moment generating function (or Fourier–Laplace transform)
:<math  powerseries>
M(\xi) = E(e^{\xi X})
= E(1 + \xi X +\cdots + \xi^r X^r/r!+\cdots)
= \sum_{r=0}^\infty \mu_r \xi^r/r!
</math>
is an easy way to combine all of the moments into a single expression.
The
<math>r</math>th moment is hence the
<math>r</math>th derivative of
<math>M</math> at the origin.
This definition is due to Fisher (1929).
+
 
+
When
+
<math>X</math> has a distribution given by a density
+
<math>f</math>, then
+
:<math  ctsmomdef>
+
\mu_r = \int_{-\infty}^\infty x^r f(x)\,dx,</math> and
+
:<math  mgfdef>
+
M(\xi) = E(e^{\xi X}) =\int_{-\infty}^\infty\exp(\xi x) f(x) d x.
+
</math>
+
 
+
The cumulants
+
<math>\kappa_r</math> are the coefficients in the Taylor expansion of
+
the cumulant generating function about the origin
+
:<math>
+
K(\xi) = \log M(\xi) = \sum_{r} \kappa_r \xi^r/r!.
+
</math>
+
Evidently
+
<math>\mu_0 = 1</math> implies
+
<math>\kappa_0 = 0</math>.
+
The relationship between the first few moments and cumulants,
+
obtained by extracting coefficients from the expansion, is as follows
+
:<math  forward>\begin{array}{lcl}
+
\kappa_1 &=& \mu_1 \\
+
\kappa_2 &=& \mu_2 - \mu_1^2\\
+
\kappa_3 &=& \mu_3 - 3\mu_2\mu_1 + 2\mu_1^3\\
+
\kappa_4 &=& \mu_4 - 4\mu_3\mu_1 - 3\mu_2^2 + 12\mu_2\mu_1^2 -6\mu_1^4.
+
\end{array}</math>
+
In the reverse direction
+
:<math  reverse>\begin{array}{lcl}
+
\mu_2 &=& \kappa_2 + \kappa_1^2\\
+
\mu_3 &=& \kappa_3 + 3\kappa_2\kappa_1 + \kappa_1^3\\
+
\mu_4 &=& \kappa_4 + 4\kappa_3\kappa_1 + 3\kappa_2^2 + 6\kappa_2\kappa_1^2 + \kappa_1^4.
+
\end{array}</math>
+
In particular,
+
<math>\kappa_1 = \mu_1</math> is the mean of
+
<math>X</math>,
+
<math>\kappa_2</math> is the
+
variance, and
+
<math>\kappa_3 = E((X - \mu_1)^3)</math>.
+
Higher-order cumulants are not the same as moments about the mean.
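The conversion from moments to cumulants can be automated. The following is a minimal sketch, not part of the original article: it assumes the recursion <math>\kappa_r = \mu_r - \sum_{m=1}^{r-1}\binom{r-1}{m-1}\kappa_m\mu_{r-m}</math>, obtained by differentiating <math>M = \exp(K)</math>, and the function name <code>cumulants_from_moments</code> is hypothetical.

```python
from fractions import Fraction
from math import comb

def cumulants_from_moments(moments):
    """Convert raw moments [mu_1, ..., mu_n] into cumulants [kappa_1, ..., kappa_n].

    Uses the recursion
        kappa_r = mu_r - sum_{m=1}^{r-1} C(r-1, m-1) * kappa_m * mu_{r-m},
    which follows from differentiating M = exp(K).
    """
    mu = [Fraction(1)] + [Fraction(m) for m in moments]  # mu[0] = 1
    kappa = [Fraction(0)]                                 # kappa[0] = 0
    for r in range(1, len(mu)):
        correction = sum(comb(r - 1, m - 1) * kappa[m] * mu[r - m]
                         for m in range(1, r))
        kappa.append(mu[r] - correction)
    return kappa[1:]

# Illustrative input: moments mu_r = r! for r = 1..4.
print(cumulants_from_moments([1, 2, 6, 24]))
```

For this input the first four values agree with the explicit formulae for <math>\kappa_1,\ldots,\kappa_4</math> in terms of moments.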
+
Hald (2000) credits Thiele (1889) with the first derivation of cumulants.
+
Lauritzen (2002) presents an overview, translation, and reprinting of much of this early work.
+
===Examples===
+
As above, let <math> {\mathcal R}</math> denote the real numbers.
+
Let <math> {\mathcal R}^+</math> represent the positive reals, and let <math> {\mathcal N}=\{0,1,\ldots\}</math> be the natural numbers.
+
 
+
 
+
<table><tr><td>Distribution</td><td>Density </td><td>CGF</td><td>Cumulants</td></tr>
+
<tr><td>Normal</td><td><math> \frac{\exp(-x^2/2)}{\sqrt{2\pi}}, x\in{\mathcal R}</math></td><td><math> \xi^2/2</math></td><td><math> \kappa_1=0</math>, <math> \kappa_2=1</math>, <math> \kappa_r=0</math> for <math>r>2</math></td></tr>
+
<tr><td>Bernoulli</td><td><math> \pi^x(1-\pi)^{1-x}, x\in\{0,1\}</math></td><td><math> \log(1-\pi+\pi\exp(\xi))</math></td><td><math> \kappa_1=\pi</math>, <math> \kappa_2=\pi(1-\pi)</math>, <math> \kappa_3=[2 \pi ^3-3 \pi ^2+\pi]</math></td></tr>
+
<tr><td>Poisson</td><td><math> \frac{\exp(-\lambda)\lambda^x}{x!}, x\in{\mathcal N}      </math></td><td><math> (e^{\xi }-1)\lambda</math></td><td><math> \kappa_r=\lambda \ \forall r</math> </td></tr>
+
<tr><td>Exponential</td><td><math> \frac{\exp(-x/\lambda)}{\lambda}, x\in{\mathcal R}^+</math></td><td><math> -\log(1-\lambda\xi)</math></td><td><math> \kappa_r=\lambda^r(r-1)!  \ \forall r</math> </td></tr>
+
<tr><td>Geometric</td><td><math> (1-\pi)\pi^x, x\in{\mathcal N}</math></td><td><math> \log(1-\pi)-\log(1-\pi\exp(\xi)) </math></td><td> <math> \kappa_1=\rho</math>, <math> \kappa_2=\rho^2+\rho</math>,<math> \kappa_3=2 \rho ^3+3 \rho ^2+\rho</math> for <math> \rho=\pi/(1-\pi)</math>.</td></tr>
+
</table>
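Entries of the table can be checked directly for discrete distributions with finitely many values. The sketch below is an illustration only: it uses the facts that <math>\kappa_2</math> is the variance and <math>\kappa_3</math> is the third central moment, and the helper name <code>first_three_cumulants</code> and the value <math>\pi = 1/4</math> are arbitrary choices.

```python
from fractions import Fraction

def first_three_cumulants(pmf):
    """pmf: dict mapping value -> probability. Returns (kappa_1, kappa_2, kappa_3),
    i.e. the mean, the variance and the third central moment."""
    mean = sum(x * p for x, p in pmf.items())
    var = sum((x - mean) ** 2 * p for x, p in pmf.items())
    third = sum((x - mean) ** 3 * p for x, p in pmf.items())
    return mean, var, third

pi = Fraction(1, 4)  # illustrative success probability
bernoulli = {0: 1 - pi, 1: pi}
k1, k2, k3 = first_three_cumulants(bernoulli)
# Compare with the Bernoulli row of the table.
print(k1 == pi, k2 == pi * (1 - pi), k3 == 2 * pi**3 - 3 * pi**2 + pi)
```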
+
===Definitions under less restrictive conditions===
+
The Cauchy distribution with density <math> \pi^{-1}/(1+x^2)</math> has no moments because
the integral (<ref>ctsmomdef</ref>) does not converge for any integer <math> r\ge 1</math>.
Student's <math> t</math> distribution on five degrees of freedom is symmetric with density
<math> 8/(3\pi\surd5\,(1 + x^2/5)^3)</math>.
The first four moments are <math> 0, 5/3, 0, 25</math>; higher-order moments are
not defined.
The cumulants up to order four are defined by (<ref>forward</ref>)
even though the moment generating function (<ref>powerseries</ref>) does not exist
for any real <math> \xi\neq 0</math>.
  
In both of these cases, the characteristic function <math> M(i\xi)</math> is
 
well-defined for real <math> \xi</math> ,
 
<math> \exp(-|\xi|)</math> for the Cauchy distribution,
 
and <math> \exp(-|\xi|\surd 5)(1 + |\xi|\surd5 + 5\xi^2/3)</math>  for <math> t_5</math> .
 
In the latter case, both <math> M(i\xi)</math> and <math> K(i\xi)</math>
 
have Taylor expansions up to order four only, so the moments and
 
cumulants are defined only up to this order.
 
The infinite expansion (<ref>powerseries</ref>) is justified when
 
the radius of convergence is positive, in which case <math> M(\xi)</math> is finite on
 
an open set containing zero, and all moments and cumulants are finite.
 
However, finiteness of the moments does not imply that <math> M(\xi)</math>
 
exists for any <math> \xi\neq 0</math> .
 
The log normal distribution provides a counterexample.
 
 
It has finite moments <math> \mu_r = e^{r^2/2}</math> of all orders,
but equation (<ref>powerseries</ref>) diverges for every <math> \xi\neq 0</math>.
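The divergence can be seen numerically from the size of the individual terms <math>\mu_r\xi^r/r!</math>. The following sketch works on the log scale to avoid overflow; the value <math>\xi = 0.01</math> and the function name <code>log_term</code> are illustrative assumptions, and only positive <math>\xi</math> is handled.

```python
import math

def log_term(r, xi):
    """Natural log of the r-th term mu_r * xi**r / r! for the standard
    log normal distribution, where mu_r = exp(r**2 / 2) and xi > 0."""
    return r * r / 2 + r * math.log(xi) - math.lgamma(r + 1)

xi = 0.01  # illustrative small positive argument
for r in (1, 10, 50, 100, 200):
    print(r, log_term(r, xi))
```

Even for small <math>\xi</math> the quadratic growth of <math>r^2/2</math> eventually dominates both <math>r\log\xi</math> and <math>\log r!</math>, so the terms grow without bound.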
===Uniqueness===
+
The normal distribution
+
<math>N(\mu, \sigma^2)</math> has cumulant generating function
+
<math>\xi\mu + \xi^2 \sigma^2/2</math>, a quadratic polynomial implying that all cumulants
+
of order three and higher are zero.
+
Marcinkiewicz (1939) showed that the normal distribution is the only distribution
+
whose cumulant generating function is a polynomial, i.e. the only distribution
+
having a finite number of non-zero cumulants.
+
The Poisson distribution with mean
+
<math>\mu</math> has moment generating function
+
<math>\exp(\mu(e^\xi - 1))</math> and cumulant generating function
+
<math>\mu(e^\xi -1)</math>.
+
Consequently all the cumulants are equal to the mean.
+
 
+
Two distinct distributions may have the same moments, and hence the same cumulants.
+
This statement is fairly obvious for distributions whose moments are all infinite,
+
or even for distributions having infinite higher-order moments.
+
But it is much less obvious for distributions having finite moments of all orders.
+
Heyde (1963) gave one such pair of distributions with densities
+
<math>
+
f_1(x) = \exp(-(\log x)^2/2) / (x\sqrt{2\pi})
+
</math>
+
and <math>
+
f_2(x) = f_1(x) [1 + \sin(2\pi\log x)/2] 
+
</math>
+
for
+
<math>x > 0</math>.
+
The first of these is called the log normal distribution.
+
To show that these distributions have the same moments it suffices to show that
+
:<math>
+
\int_0^\infty x^k f_1(x) \sin(2\pi\log x)\, dx = 0
+
</math>
+
for integer
+
<math>k\ge 1</math>, which can be shown by making the substitution
+
<math>\log x = y+k</math>.
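The vanishing of this integral can also be checked numerically. The sketch below is an illustration, not a proof: it works in the variable <math>y = \log x</math>, in which the integrand becomes <math>e^{ky}\phi(y)\sin(2\pi y)</math> with <math>\phi</math> the standard normal density, and applies the trapezoidal rule on a grid centred at <math>y = k</math>. The grid width and step count are arbitrary choices.

```python
import math

def moment_difference(k, half_width=10.0, steps=4000):
    """Trapezoidal approximation to the integral of
    x**k * f_1(x) * sin(2*pi*log x) over x > 0, computed in the
    variable y = log x, where the integrand becomes
    exp(k*y) * phi(y) * sin(2*pi*y) with phi the standard normal density."""
    h = 2 * half_width / steps
    total = 0.0
    for i in range(steps + 1):
        y = k - half_width + i * h          # grid centred at y = k
        weight = 0.5 if i in (0, steps) else 1.0
        total += weight * math.exp(k * y - y * y / 2) * math.sin(2 * math.pi * y)
    return total * h / math.sqrt(2 * math.pi)

for k in (1, 2, 3):
    # The k-th log normal moment exp(k**2 / 2) sets the scale for comparison.
    print(k, moment_difference(k), math.exp(k * k / 2))
```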
+
 
+
If the sequence of moments is such that (<ref>powerseries</ref>)
has a positive radius of convergence, the distribution is uniquely determined.
+
 
+
===Properties===
+
Cumulants of order
+
<math>r \ge 2</math> are called semi-invariant on account of their
+
behavior under affine transformation of variables (Thiele, 1903; Dressel, 1940).
+
If
+
<math>\kappa_r</math> is the
+
<math>r</math>th cumulant of
+
<math>X</math>,
+
the
+
<math>r</math>th cumulant of the affine transformation
+
<math>a + b X</math> is
+
<math>b^r \kappa_r</math>,
+
independent of
+
<math>a</math>.
+
This behavior is considerably simpler than that of moments.
+
However, moments about the mean are also semi-invariant, so this property alone
+
does not explain why cumulants are useful for statistical purposes.
+
 
+
The term cumulant was coined by Fisher (1929) on account of their behavior under
+
addition of random variables.
+
Let
+
<math>S = X+Y</math> be the sum of two independent random variables.
+
The moment generating function of the sum is the product
+
:<math>
+
M_S(\xi) =  M_X(\xi) M_Y(\xi),
+
</math>
+
and the cumulant generating function is the sum
+
:<math>
+
K_S(\xi) = K_X(\xi) + K_Y(\xi).
+
</math>
+
Consequently, the
+
<math>r</math>th cumulant of the sum is the sum of the
+
<math>r</math>th cumulants.
+
By extension, if
+
<math>X_1,\ldots, X_n</math> are independent and identically distributed,
the
<math>r</math>th cumulant of the sum is
<math>n\kappa_r</math>.
Let
<math>\kappa_{n;r}</math> be the
cumulant of order
+
<math>r</math> of the standardized sum
+
<math>n^{-1/2}(X_1+\cdots + X_n)</math>;
+
then
+
:<math  ndep>
+
\kappa_{n;r}=n^{1-r/2} \kappa_r.
+
</math>
+
Provided that the cumulants are finite, all cumulants of order
+
<math>r\ge 3</math>
+
of the standardized sum tend to zero, which is a simple demonstration of the central limit theorem.
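The rate at which the higher-order cumulants of the standardized sum decay is easy to tabulate. The sketch below is illustrative only: the function name is hypothetical, and the cumulants <math>\kappa_2 = 1</math>, <math>\kappa_3 = 2</math>, <math>\kappa_4 = 6</math> are those of the exponential distribution with <math>\lambda = 1</math>.

```python
def standardized_sum_cumulant(kappa_r, r, n):
    """Cumulant of order r of n**(-1/2) * (X_1 + ... + X_n) for i.i.d. X_i:
    the sum has cumulant n * kappa_r, and scaling by n**(-1/2) multiplies
    it by n**(-r/2)."""
    return n ** (1 - r / 2) * kappa_r

# Illustrative choice: unit exponential summands, kappa_r = (r-1)!.
for n in (1, 10, 100, 1000):
    print(n, [standardized_sum_cumulant(k, r, n) for r, k in ((2, 1), (3, 2), (4, 6))])
```

The second cumulant is unchanged by <math>n</math>, while the third and fourth shrink like <math>n^{-1/2}</math> and <math>n^{-1}</math>.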
+
 
+
Good (1977) obtained an expression for the
+
<math>r</math>th cumulant of
+
<math>X</math> as
+
the
+
<math>r</math>th moment of the discrete Fourier transform of an independent and
+
identically distributed sequence as follows.
+
Let
+
<math>X_1, X_2,\ldots</math> be independent copies of
+
<math>X</math> with
+
<math>r</math>th cumulant
+
<math>\kappa_r</math>,
+
and let
+
<math>\omega = e^{2\pi i/n}</math> be a primitive
+
<math>n</math>th root of unity.
+
The discrete Fourier combination
+
:<math>
+
Z = X_1 + \omega X_2 + \cdots + \omega^{n-1} X_n
+
</math>
+
is a complex-valued random variable whose distribution is invariant under
+
rotation
+
<math>Z\sim \omega Z</math> through multiples of
+
<math>2\pi /n</math>.
+
The
+
<math>r</math>th cumulant of the sum is
+
<math>\kappa_r \sum_{j=1}^n \omega^{r j}</math>,
+
which is equal to
+
<math>n\kappa_r</math> if
+
<math>r</math> is a multiple of
+
<math>n</math>, and zero otherwise.
+
Consequently
+
<math>E(Z^r) = 0</math> for integer
+
<math>r < n</math> and
+
<math>E(Z^n) = n\kappa_n</math>.
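The key step is the behaviour of the power sums of the roots of unity, which can be confirmed numerically. This is a small sketch with a hypothetical function name; <math>n = 5</math> is an arbitrary choice.

```python
import cmath

def root_of_unity_power_sum(r, n):
    """Sum of omega**(r*j) for j = 1..n, with omega = exp(2*pi*i/n)."""
    omega = cmath.exp(2j * cmath.pi / n)
    return sum(omega ** (r * j) for j in range(1, n + 1))

n = 5
for r in range(1, 2 * n + 1):
    print(r, round(abs(root_of_unity_power_sum(r, n)), 9))
```

Up to rounding error, the modulus is <math>n</math> when <math>r</math> is a multiple of <math>n</math> and zero otherwise.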
+
 
+
 
+
===Multivariate cumulants===
+
Somewhat surprisingly, the relation between moments and cumulants is simpler and
+
more transparent in the multivariate case than in the univariate case.
+
Let
+
<math>X = (X^1,\ldots, X^k)</math> be the components of a random vector.
+
In a departure from the univariate notation, we write
+
<math>\kappa^r = E(X^r)</math> for the components of the mean vector,
+
<math>\kappa^{rs} = E(X^r X^s)</math> for the components of the second moment matrix,
+
<math>\kappa^{r s t} = E(X^r X^s X^t)</math> for the third moments, and so on.
+
It is convenient notationally to adopt Einstein's summation convention,
+
so
+
<math>\xi_r X^r</math> denotes the linear combination
+
<math>\xi_1 X^1 + \cdots + \xi_k X^k</math>,
+
the square of the linear combination is
+
<math>(\xi_r X^r)^2 = \xi_r\xi_s X^r X^s</math>,
+
a sum of
+
<math>k^2</math> terms, and so on for higher powers.
+
The Taylor expansion of the moment generating function
+
<math>M(\xi) = E(\exp(\xi_r X^r))</math>
+
is
+
:<math>
+
M(\xi) = 1 + \xi_r \kappa^r
+
+ \textstyle{\frac1{2!}} \xi_r\xi_s \kappa^{rs}
+
+ \textstyle{\frac1{3!}} \xi_r\xi_s \xi_t \kappa^{r s t} +\cdots.
+
</math>
+
The cumulants are defined as the coefficients
+
<math>\kappa^{r,s}, \kappa^{r,s,t},\ldots</math>
+
in the Taylor expansion
+
:<math>
+
\log M(\xi) = \xi_r \kappa^r
+
+ \textstyle{\frac1{2!}} \xi_r\xi_s \kappa^{r,s}
+
+ \textstyle{\frac1{3!}} \xi_r\xi_s \xi_t \kappa^{r,s,t} +\cdots.
+
</math>
+
This notation does not distinguish first-order moments from first-order cumulants,
+
but commas separating the superscripts serve to distinguish higher-order cumulants from moments.
+
 
+
Comparison of coefficients reveals that each moment
+
<math>\kappa^{rs}, \kappa^{r s t},\ldots</math>
+
is a sum over partitions of the superscripts, each term in the sum being a
+
product of cumulants:
+
:<math>\begin{array}{lcl}
+
\kappa^{rs}&=&\kappa^{r,s} + \kappa^r\kappa^s\\
+
\kappa^{r s t}&=&\kappa^{r,s,t} + \kappa^{r,s}\kappa^t + \kappa^{r,t}\kappa^s + \kappa^{s,t}\kappa^r
+
+ \kappa^r\kappa^s\kappa^t\\
+
&=&\kappa^{r,s,t} + \kappa^{r,s}\kappa^t[3] + \kappa^r\kappa^s\kappa^t\\
+
\kappa^{r s t u}&=&\kappa^{r,s,t,u} + \kappa^{r,s,t}\kappa^u[4] + \kappa^{r,s}\kappa^{t,u}[3]
+
+ \kappa^{r,s}\kappa^t\kappa^u[6] + \kappa^r\kappa^s\kappa^t\kappa^u.
+
\end{array}</math>
+
Each bracketed number indicates a sum over distinct partitions
+
having the same block sizes, so the fourth-order moment is a sum of 15 distinct cumulant products.
+
In the reverse direction, each cumulant is also a sum over partitions of the indices.
+
Each term in the sum is a product of moments, but with coefficient
+
<math>(-1)^{\nu-1} (\nu-1)!</math>
+
where
+
<math>\nu</math> is the number of blocks:
+
:<math>\begin{array}{lcl}
+
\kappa^{r,s} &=& \kappa^{rs} - \kappa^r\kappa^s\\
+
\kappa^{r,s,t} &=& \kappa^{r s t} - \kappa^{rs}\kappa^t[3] + 2 \kappa^r\kappa^s\kappa^t\\
+
\kappa^{r,s,t,u} &=& \kappa^{r s t u} - \kappa^{r s t}\kappa^u[4] - \kappa^{rs}\kappa^{t u}[3]
+
+ 2 \kappa^{rs}\kappa^t\kappa^u[6] - 6 \kappa^r\kappa^s\kappa^t\kappa^u
+
\end{array}</math>
+
 
+
These relationships are an instance of Möbius inversion on the partition lattice.
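The partition sums can be enumerated mechanically. The following sketch, with hypothetical function names, generates all set partitions and evaluates the cumulant formula in the univariate case, where every index refers to the same variable and a block of size <math>j</math> contributes the moment <math>\mu_j</math>. The moments <math>\mu_j = j!</math> are an illustrative input.

```python
from math import factorial, prod

def set_partitions(items):
    """Yield every partition of the list `items` as a list of blocks."""
    if not items:
        yield []
        return
    first, rest = items[0], items[1:]
    for partition in set_partitions(rest):
        # Put `first` into each existing block in turn ...
        for i in range(len(partition)):
            yield partition[:i] + [[first] + partition[i]] + partition[i + 1:]
        # ... or into a new block of its own.
        yield [[first]] + partition

def cumulant_from_moments(r, mu):
    """Univariate r-th cumulant as a signed sum over partitions of r indices.
    `mu[j]` is the moment of order j; each partition with nu blocks contributes
    (-1)**(nu-1) * (nu-1)! times the product of the block moments."""
    total = 0
    for partition in set_partitions(list(range(r))):
        nu = len(partition)
        total += (-1) ** (nu - 1) * factorial(nu - 1) * prod(mu[len(b)] for b in partition)
    return total

print(sum(1 for _ in set_partitions(list("rstu"))))        # number of partitions of four indices
print(cumulant_from_moments(4, {1: 1, 2: 2, 3: 6, 4: 24}))  # illustrative moments mu_j = j!
```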
+
 
+
Partition notation serves one additional purpose.
+
It establishes moments and cumulants as special cases of generalized cumulants,
+
which include objects of the type
+
<math>\kappa^{r,st} = {\rm cov}(X^r, X^s X^t)</math>,
+
<math>\kappa^{rs, t u} = {\rm cov}(X^r X^s, X^t X^u)</math>, and
+
<math>\kappa^{rs, t, u}</math> with incompletely partitioned indices.
+
These objects arise very naturally in statistical work involving asymptotic
+
approximation of distributions.
+
They are intermediate between moments and cumulants, and have characteristics of both.
+
 
+
Every generalized cumulant can be expressed as a sum of certain products of ordinary cumulants.
+
Some examples are as follows:
+
:<math>\begin{array}{lcl}
+
\kappa^{rs, t} &=& \kappa^{r,s,t} + \kappa^r\kappa^{s,t} + \kappa^s \kappa^{r,t}\\
+
&=& \kappa^{r,s,t} + \kappa^r\kappa^{s,t}[2]\\
+
\kappa^{rs,t u} &=& \kappa^{r,s,t,u} + \kappa^{r,s,t}\kappa^u[4] + \kappa^{r,t}\kappa^{s,u}[2]
+
+ \kappa^{r,t}\kappa^s\kappa^u[4]\\
+
\kappa^{rs,t,u} &=& \kappa^{r,s,t,u} + \kappa^{r,t,u}\kappa^s[2] + \kappa^{r,t}\kappa^{s,u}[2]
+
\end{array}</math>
+
Each generalized cumulant is associated with a partition
+
<math>\tau</math> of the given set of indices.
+
For example,
+
<math>\kappa^{rs,t,u}</math> is associated with the partition
+
<math>\tau=rs|t|u</math> of four indices
+
into three blocks.
+
Each term on the right is a cumulant product associated with a partition
+
<math>\sigma</math> of the same indices.
+
The coefficient is one if the least upper bound
+
<math>\sigma\vee\tau</math> has a single block,
+
otherwise zero.
+
Thus, with
+
<math>\tau=rs|t|u</math>, the product
+
<math>\kappa^{r,s}\kappa^{t,u}</math> does not appear
+
on the right because
+
<math>\sigma\vee\tau = rs|t u</math> has two blocks.
+
 
+
As an example of the way these formulae may be used,
+
let
+
<math>X</math> be a scalar random variable with cumulants
+
<math>\kappa_1,\kappa_2,\kappa_3,\ldots</math>.
+
By translating the second formula in the preceding list, we find that
+
the variance of the squared variable is
+
:<math>
+
{\rm var}(X^2) = \kappa_4 + 4\kappa_3\kappa_1 + 2\kappa_2^2 + 4\kappa_2\kappa_1^2,
+
</math>
+
reducing to
+
<math>\kappa_4 + 2\kappa_2^2</math> if the mean is zero.
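The formula for <math>{\rm var}(X^2)</math> can be checked against the direct calculation <math>\mu_4 - \mu_2^2</math>. The sketch below uses two illustrative cases: the exponential distribution with <math>\lambda = 1</math>, and a Poisson distribution whose mean 3 is an arbitrary choice.

```python
def var_of_square(k1, k2, k3, k4):
    """Variance of X**2 in terms of the first four cumulants of X."""
    return k4 + 4 * k3 * k1 + 2 * k2 ** 2 + 4 * k2 * k1 ** 2

# Unit exponential: cumulants (r-1)! and moments r!, so var(X^2) = mu_4 - mu_2^2.
print(var_of_square(1, 1, 2, 6), 24 - 2 ** 2)

# Poisson with mean lam (illustrative value): every cumulant equals lam,
# mu_2 = lam + lam^2 and mu_4 = lam^4 + 6 lam^3 + 7 lam^2 + lam.
lam = 3
mu2 = lam + lam ** 2
mu4 = lam ** 4 + 6 * lam ** 3 + 7 * lam ** 2 + lam
print(var_of_square(lam, lam, lam, lam), mu4 - mu2 ** 2)
```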
+
 
+
===Exponential families===
+
Let <math> f</math> be a probability distribution on an arbitrary measurable space <math> ({\mathcal X},\nu)</math> ,
+
and let <math> t\colon{\mathcal X}\to{\mathcal R}</math> be a real-valued random variable
+
with cumulant generating function
+
<math> K(\cdot)</math> , finite in a set <math> \Theta</math> containing zero in the interior.
+
The family of distributions on <math> {\mathcal X}</math> with density
+
:<math>
+
f_\theta(x) = e^{\theta t(x)} f(x) / M(\theta) = e^{\theta t(x) - K(\theta)} f(x)
+
</math>
+
indexed by <math> \theta\in\Theta</math> is called the exponential family
+
associated with <math> f</math> and the canonical statistic <math> t</math> .
+
In statistical physics, the normalizing constant <math> M(\theta)</math> is called the
+
partition function.
+
 
+
Two examples suffice to illustrate the idea.
+
In the first example, <math> {\mathcal X} = \{1,2,\ldots\}</math> is the set of positive integers,
+
<math> f(x) \propto 1/x^2</math> and <math> t(x) = -\log(x)</math> .
+
After relabelling <math> \theta + 2</math> as <math> \theta</math>, the associated exponential family is
+
<math> f_\theta(x) = x^{-\theta}/\zeta(\theta)</math> ,
+
where <math> \zeta(\theta)</math> is the Riemann zeta function with real argument <math> \theta > 1</math> .
+
 
+
In the second example, <math> {\mathcal X}={\mathcal X}_n</math> is the symmetric group or the set of
+
permutations of <math> n</math> letters,
+
<math> x\in{\mathcal X}_n</math> is a permutation, <math> t(x)</math> is the number of cycles,
+
<math> f(x) = 1/n!</math> is the uniform distribution,
+
and <math> M_n(\xi) = \Gamma(n+e^\xi)/(n!\, \Gamma(e^\xi))</math> for all real <math> \xi</math> .
+
The exponential family of distributions on permutations of <math> [n]</math> is
+
:<math>
+
f_{n,\theta}(x) = \frac{\Gamma(\lambda)\, \lambda^{t(x)}} {\Gamma(n+\lambda)},
+
</math>
+
the same as the distribution generated by the Chinese restaurant process
+
with parameter <math> \lambda = e^\theta</math> .
+
The associated marginal distribution on partitions,
+
the Ewens distribution on partitions of <math> [n]</math> ,
+
is also of the exponential-family form with canonical statistic equal
+
to the number of blocks or cycles.
+
This number <math> t(x)</math> is a random variable whose cumulants are the
+
derivatives of <math> \log M(\cdot)</math> evaluated at the parameter <math> \theta</math> .
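For small <math>n</math> the first two cumulants of the number of cycles can be compared with brute-force enumeration. The sketch below is an illustration under stated assumptions: it uses <math>\log M_n(\theta) = \sum_{j=0}^{n-1}\log(\lambda + j) - \log n!</math> with <math>\lambda = e^\theta</math>, whose first two derivatives in <math>\theta</math> are <math>\sum_j \lambda/(\lambda+j)</math> and <math>\sum_j \lambda j/(\lambda+j)^2</math>. The function names are hypothetical, and <code>lam</code> is taken to be a positive integer or <code>Fraction</code>.

```python
from fractions import Fraction
from itertools import permutations

def cycle_count(perm):
    """Number of cycles of a permutation given as a tuple with perm[i] the image of i."""
    seen, cycles = set(), 0
    for start in range(len(perm)):
        if start not in seen:
            cycles += 1
            j = start
            while j not in seen:
                seen.add(j)
                j = perm[j]
    return cycles

def cycle_mean_and_variance(n, lam):
    """First two cumulants of the number of cycles, obtained by differentiating
    log M_n(theta) = sum_j log(lam + j) - log n!  with lam = exp(theta)."""
    mean = sum(Fraction(lam, lam + j) for j in range(n))
    variance = sum(Fraction(lam * j, (lam + j) ** 2) for j in range(n))
    return mean, variance

n = 4
counts = [cycle_count(p) for p in permutations(range(n))]
brute_mean = Fraction(sum(counts), len(counts))
brute_var = Fraction(sum(c * c for c in counts), len(counts)) - brute_mean ** 2
print(brute_mean, brute_var)
print(cycle_mean_and_variance(n, 1))  # lam = 1 is the uniform distribution
```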
+
 
+
 
+
In the multi-parameter case,
+
<math> t\colon{\mathcal X}\to{\mathcal R}^p</math> is a random vector
+
and <math> \xi\colon{\mathcal R}^p\to{\mathcal R}</math> is a linear functional,
+
<math> M(\xi) = E(e^{\xi(t)})</math> is the joint moment generating function.
+
It is sometimes convenient to employ Einstein's implicit summation convention
+
in the form <math> \theta(t) = \theta_i t^i</math> where <math> t^1,\ldots, t^p</math> are
+
the components of <math> t(x)</math> , and <math> \theta_1,\ldots, \theta_p</math> are the coefficients
+
of the linear functional.
+
For simplicity of notation in what follows, <math> {\mathcal X}={\mathcal R}^p</math> and <math> t(x) = x</math>
+
is the identity function.
+
An exponential-family distribution in <math> {\mathcal R}^p</math> has the form
+
:<math>
+
f_\theta(x)=\exp(x^j\theta_j-g(x)-\varphi(\theta))
+
</math>
+
for given functions <math> g</math> and <math> \varphi</math> .
+
Integration shows that the distribution <math> f_\theta</math> has
+
cumulant generating function <math> K_\theta(\xi)=\varphi(\theta+\xi)-\varphi(\theta)</math> .
+
The cumulants of <math> X\sim f_\theta</math> are equal to the derivatives of <math> \varphi</math>
+
at the parameter <math> \theta</math> .
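In the one-parameter case with a finitely supported base distribution, the statement about the first cumulant can be checked numerically: the mean of the tilted distribution should equal the derivative of the cumulant generating function of the base distribution at <math>\theta</math>. The sketch below uses an arbitrary three-point base distribution and a central difference, so agreement is only approximate; all names and values are illustrative.

```python
import math

def cgf(pmf, xi):
    """K(xi) = log E(exp(xi * X)) for a finite pmf given as {value: probability}."""
    return math.log(sum(p * math.exp(xi * x) for x, p in pmf.items()))

def tilt(pmf, theta):
    """Exponentially tilted pmf: f_theta(x) = exp(theta * x - K(theta)) * f(x)."""
    k = cgf(pmf, theta)
    return {x: p * math.exp(theta * x - k) for x, p in pmf.items()}

base = {0: 0.5, 1: 0.3, 2: 0.2}   # illustrative base distribution f
theta, eps = 0.7, 1e-5            # illustrative parameter and step size

tilted = tilt(base, theta)
tilted_mean = sum(x * p for x, p in tilted.items())
# Central difference approximation to K'(theta).
numeric_derivative = (cgf(base, theta + eps) - cgf(base, theta - eps)) / (2 * eps)
print(tilted_mean, numeric_derivative)
```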
+
 
+
===Calculus of cumulants===
+
The umbral calculus is a syntax or formal system consisting of
+
certain operations on objects called umbrae,
+
mimicking addition and multiplication of independent real-valued random
+
variables.  Rota and Taylor (1994) review this calculus.
+
To each real-valued sequence <math> 1, a_1, a_2,\ldots</math>
+
there corresponds an umbra <math> \alpha</math> such that <math> E(\alpha^r) = a_r</math> .
+
This freedom gives rise to special umbrae, the singleton and Bell umbrae,
+
corresponding to no real-valued random variable.
+
Using these special umbrae, one develops the formal notion of an
+
<math>\alpha</math>-cumulant umbra
+
<math>\chi\cdot\alpha</math>
+
by formal product operations in the syntax.
+
Properties of cumulants, <math> k</math> -statistics and other polynomial functions
+
are then derived by purely formal combinatorial operations.
+
Di Nardo et al. (2008) present details.
+
 
+
Streitberg (1990) presents parallels between the calculus of cumulants and the
+
calculus of certain decompositions of multivariate cumulative distribution
+
functions into independent segments; these characterizations in terms of
+
independent segments are called Lancaster interactions.
+
===Moment and Cumulant Measures for Random Measures===
+
Moments and cumulants extend quite naturally to random distributions. 
+
Let <math>\upsilon</math> be a random measure on a space <math>\Upsilon</math>. 
+
Then the expectation of <math>\upsilon</math> is
+
defined as that measure such that <math>E(\upsilon)(A)=E(\upsilon(A))</math>, for <math>A</math> in a suitable sigma field.  Higher-order
+
moments then translate to expectations of product measures.
+
Let <math>\upsilon^{(k)}</math> be the measure defined on
+
<math>\Upsilon^k</math>, such that
+
<math>\upsilon^{(k)}(A_1\times\cdots\times A_k)=\prod_{j=1}^k\upsilon(A_j)</math>. 
+
Then the moment of order <math>k</math> of <math>\upsilon</math> is <math>E(\upsilon^{(k)})</math>.
+
A moment generating functional can similarly be defined for <math>\upsilon</math>; a heuristic definition may be constructed through analogy with
+
(<ref>powerseries</ref>): Let
+
:<math>
+
\Phi(f)=\sum_{r=0}^\infty \int f(x_1)\cdots f(x_r)\,E(\upsilon^{(r)})(d x_1\cdots d x_r)/r!,
+
</math>
+
for certain functions <math>f</math> on <math>\Upsilon</math>,
+
and moments can be recovered from <math>\Phi(f)</math> via Fréchet
+
differentiation. 
+
Cumulants can then be defined as in (<ref>forward</ref>), using the obvious analogy.
+
These moments and cumulants have application to the theory of point processes.
+
The above exposition, and applications to the theory of point processes,
+
can be found in Daley and Vere-Jones (1988).
+
==Approximation of distributions==
+
===Edgeworth approximation===
+
Suppose that
+
<math>Y</math> is a random variable that arises as the sum
+
of
+
<math>n</math> independent and identically-distributed summands, each of which has
+
mean
+
<math>0</math>, unit variance, and
+
cumulants
+
<math>\kappa_r</math>, and
+
<math>X=Y/\sqrt{n}</math>.
+
For ease of exposition, assume that cumulants of all orders exist.
+
Then, using (<ref>ndep</ref>), the cumulant generating function of
+
<math>X</math> is given by
+
<math>K(\xi)=\xi^2/2 +\kappa_3\xi^3/(6\sqrt{n}) +\kappa_4\xi^4/(24 n) +\cdots</math>,
+
and the moment generating function of
<math>X</math> is given by
:<math>
M(\xi)=\exp(\xi^2/2)\exp(\kappa_3\xi^3/(6\sqrt{n})+\kappa_4\xi^4/(24 n)+\cdots).
</math>
Expanding the exponential in the second factor gives
:<math>
M(\xi)=\exp(\xi^2/2)\left(1\!+\!{{\kappa_3\xi^3}\over{6\sqrt{n}}}\!+\!{{\kappa_4\xi^4}\over{24 n}}\!+\!\cdots\!+\! {\textstyle{\frac12}} \left[
{{\kappa_3\xi^3}\over{6\sqrt{n}}}\!+\!{{\kappa_4\xi^4}\over{24 n}}\!+\!\cdots\right]^2\!+\!\!\cdots\right).
</math>
Reordering terms in powers of sample size,
:<math  kseries>
M(\xi)=\exp(\xi^2/2)\left(1+{{\kappa_3\xi^3}\over{6\sqrt{n}}}+{{\kappa_4\xi^4}\over{24 n}}+
{{\kappa_3^2\xi^6}\over{72 n}}+\cdots\right).
</math>
+
Repeated application of integration by parts to (<ref>mgfdef</ref>) shows that
+
:<math  mgfderiv>
+
\xi^r M(\xi) =\int_{-\infty}^\infty\exp(\xi x)(-1)^r f^{(r)}(x) d x,
+
</math>
+
where
+
<math>f^{(r)}</math> denotes the derivative of
+
<math>f</math> of order
+
<math>r</math>.  Relation
+
(<ref>mgfderiv</ref>) holds if
+
<math>f</math> and its derivatives go to zero quickly
+
as
+
<math>\vert x\vert\to\infty</math>.  Applying (<ref>mgfderiv</ref>) to the normal
+
density
+
<math>\phi(x)=\exp(-x^2/2)/\sqrt{2\pi}</math>, and applying the result to
+
(<ref>kseries</ref>), gives
+
:<math>
+
M(\xi)\approx\int_{-\infty}^\infty\exp(\xi x)\phi(x)\left[1+{{\kappa_3 h^3(x)}\over{6\sqrt{n}}}+{{\kappa_4h^4(x)}\over{24 n}}+
+
{{\kappa_3^2h^6(x)}\over{72 n}}\right] d x
+
</math>
+
for
+
<math>h^r(x)=(-1)^r\phi^{(r)}(x)/\phi(x)</math>. Since the relationship
giving the moment generating function in terms of the density is invertible,
and provided that the inversion process is suitably smooth,
+
Edgeworth (1907) approximates the density of
+
<math>X</math> by
+
:<math  edser>
+
e_4(x)=\phi(x)\left[1+{{\kappa_3 h^3(x)}\over{6\sqrt{n}}}+{{\kappa_4h^4(x)}\over{24 n}}+
+
{{\kappa_3^2h^6(x)}\over{72 n}}\right].
+
</math> 
+
In fact, when the summands contributing to
+
<math>Y</math> have a density and finite cumulants up to order 5, the error in the
+
approximation, multiplied by
+
<math>n^{3/2}</math>, remains bounded. 
+
The functions
+
<math>h^r</math> defined above are the Hermite polynomials.
+
The approximation (<ref>edser</ref>) is known as the Edgeworth series.
+
The subscript refers to the number of cumulants used in its definition.
+
This series can be used to approximate either the cumulative distribution function or survival function through term-wise integration.
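As a numerical illustration of (<ref>edser</ref>), the sketch below compares <math>e_4(x)</math> with the exact density of a standardized sum of centred exponential variables with <math>\lambda = 1</math>, for which <math>\kappa_3 = 2</math> and <math>\kappa_4 = 6</math> and the sum has a gamma distribution. The sample size <math>n = 10</math>, the evaluation points and the function names are illustrative assumptions.

```python
import math

def phi(x):
    return math.exp(-x * x / 2) / math.sqrt(2 * math.pi)

# Hermite polynomials h^3, h^4, h^6 (written h3, h4, h6 here).
def h3(x): return x**3 - 3 * x
def h4(x): return x**4 - 6 * x**2 + 3
def h6(x): return x**6 - 15 * x**4 + 45 * x**2 - 15

def edgeworth_e4(x, n, kappa3, kappa4):
    """Edgeworth approximation e_4(x) to the density of the standardized sum."""
    return phi(x) * (1 + kappa3 * h3(x) / (6 * math.sqrt(n))
                       + kappa4 * h4(x) / (24 * n)
                       + kappa3**2 * h6(x) / (72 * n))

def exact_density(x, n):
    """Exact density of X = (G - n)/sqrt(n), where G is a sum of n unit
    exponentials, i.e. a gamma variable with shape n."""
    t = n + x * math.sqrt(n)
    if t <= 0:
        return 0.0
    return math.sqrt(n) * math.exp((n - 1) * math.log(t) - t - math.lgamma(n))

n = 10  # illustrative sample size
for x in (-1.0, 0.0, 1.0, 2.0):
    print(x, exact_density(x, n), edgeworth_e4(x, n, 2, 6), phi(x))
```

Each printed row gives the exact density, the Edgeworth approximation and the plain normal approximation at the same point.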
+
 
+
The preceding discussion is intended to be heuristic; Kolassa (2006) presents
+
a rigorous derivation, along with the natural extension to random vectors.
+
===Saddlepoint approximation===
+
The approximation (<ref>edser</ref>) to the density
+
<math>f(x)</math> has the property that
+
<math>|f(x)-e_r(x)|\leq C n^{-(r-1)/2}</math>, for some constant
+
<math>C</math>,
+
when the cumulant of order
+
<math>r+1</math> exists;
+
<math>C</math> does not depend on
+
<math>x</math>.
+
A similar bound holds for the relative error
+
<math>(f(x)-e_r(x))/f(x)</math>, only when
+
<math>x</math> is restricted to a finite interval.
+
Because of the polynomial factor multiplying the first omitted term in
+
(<ref>edser</ref>), the relative error can be expected to behave poorly.
+
One might prefer an approximation that maintains good behavior for
+
values of
+
<math>X</math> in a range that increases as
+
<math>n</math> increases; specifically,
+
one might prefer an approximation that performs well for values of
+
<math>\bar Y=X/\sqrt{n}</math> in a fixed interval.
+
 
+
Assume again that random variables
+
<math>Y_j</math> are independent and identically distributed, each with a cumulant generating function
+
<math>K(\xi)</math> finite for
+
<math>\xi</math>
+
in a neighborhood of
+
<math>0</math>.  As above, define the exponential family
:<math>
f_{\bar Y}(\bar y;\theta)=\exp(n[\theta\bar y-K(\theta)])f_{\bar Y}(\bar y)
</math>
for the mean <math>\bar Y</math> of <math>n</math> such variables.
One can then choose a value of
<math>\theta</math> depending on
<math>\bar y</math>
that makes
<math>f_{\bar Y}(\bar y;\theta)</math> easy to approximate, and use
the exponential family relationship to derive an approximation for
<math>f_{\bar Y}(\bar y)</math>.  Conventionally we choose
<math>\hat\theta</math> to
satisfy
:<math  speqn>
K'(\hat\theta)=\bar y;
</math>
this makes the expectation of the distribution
with density
<math>f_{\bar Y}(\cdot;\hat\theta)</math> equal to the observed value.
One then applies (<ref>edser</ref>) at the origin, with the scale of the ordinate changed
to reflect the fact that, under <math>\hat\theta</math>, the mean <math>\bar Y</math> has variance <math>K''(\hat\theta)/n</math>,
to obtain
:<math>
f_{\bar Y}(\bar y)\approx\exp(n[K(\hat\theta)-\hat\theta\bar y])
\sqrt{n/K''(\hat\theta)}\,\phi(0)\left[1+{{\hat\kappa_3 h^3(0)}\over{6\sqrt{n}}}+{{\hat\kappa_4h^4(0)}\over{24 n}}+
{{\hat\kappa_3^2h^6(0)}\over{72 n}}\right].
</math>
Using the fact that <math>h^3(0)=0</math>,
<math>h^4(0)=3</math>, and <math>h^6(0)=-15</math>,
we obtain
:<math  spser>
f_{\bar Y}(\bar y)\approx\sqrt{{n}\over{2\pi K''(\hat\theta)}}
\exp(n[K(\hat\theta)-\hat\theta\bar y])
\left[1+{{\hat\kappa_4}\over{8 n}}-
{{5\hat\kappa_3^2}\over{24 n}}\right].
</math>
Here
<math>\hat\kappa_j=K^{(j)}(\hat\theta)/K''(\hat\theta)^{j/2}</math> are the standardized cumulants, calculated from the derivatives of
<math>K</math> evaluated at
<math>\hat\theta</math>.
+
This approximation may only be applied to values of
+
<math>\bar y</math> for which
+
(<ref>speqn</ref>) has solutions in an open neighborhood of 0.
+
Expression (<ref>spser</ref>) represents the saddlepoint approximation to
+
the density of the mean
+
<math>\bar Y</math>; since
+
<math>f_{\bar Y}(\bar y;\theta)</math>
+
has a cumulant generating function defined on an open set containing
+
<math>0</math>,
+
cumulants of all orders exist, the Edgeworth series including
+
<math>\kappa_6</math>
+
may be applied to
+
<math>f_{\bar Y}(\bar y;\theta)</math>, and so the error in the
+
Edgeworth series is of order
+
<math>O(1/n^2)</math>.  Hence the error in (<ref>spser</ref>)
+
is of the same order, and in this case, is relative and uniform for values of
+
<math>\bar y</math> in a bounded subset of an open subset on which (<ref>speqn</ref>)
+
has a solution.
+
This approximation was introduced to the statistics literature by
+
Daniels (1954).
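A worked case helps to show the relative accuracy. The sketch below assumes the saddlepoint density in its usual form, <math>\sqrt{n/(2\pi K''(\hat\theta))}\exp(n[K(\hat\theta)-\hat\theta\bar y])</math> times the correction factor <math>1+\hat\kappa_4/(8n)-5\hat\kappa_3^2/(24n)</math> with standardized cumulants <math>\hat\kappa_j = K^{(j)}(\hat\theta)/K''(\hat\theta)^{j/2}</math>, and applies it to the mean of <math>n</math> exponential variables with <math>\lambda = 1</math>, whose exact density is a gamma density. The sample size and evaluation points are arbitrary choices.

```python
import math

def saddlepoint_density(ybar, n, corrected=True):
    """Saddlepoint approximation to the density of the mean of n unit
    exponentials, for which K(xi) = -log(1 - xi)."""
    theta_hat = 1 - 1 / ybar                 # solves K'(theta) = 1/(1 - theta) = ybar
    k0 = -math.log(1 - theta_hat)            # K(theta_hat)
    k2 = 1 / (1 - theta_hat) ** 2            # K''(theta_hat)
    k3 = 2 / (1 - theta_hat) ** 3            # K'''(theta_hat)
    k4 = 6 / (1 - theta_hat) ** 4            # K''''(theta_hat)
    rho3, rho4 = k3 / k2 ** 1.5, k4 / k2 ** 2  # standardized cumulants
    leading = math.sqrt(n / (2 * math.pi * k2)) * math.exp(n * (k0 - theta_hat * ybar))
    factor = 1 + rho4 / (8 * n) - 5 * rho3 ** 2 / (24 * n) if corrected else 1.0
    return leading * factor

def exact_density(ybar, n):
    """Exact density of the mean: gamma with shape n and rate n."""
    return math.exp(n * math.log(n) + (n - 1) * math.log(ybar) - n * ybar - math.lgamma(n))

n = 5  # illustrative sample size
for ybar in (0.5, 1.0, 1.3, 2.0):
    exact = exact_density(ybar, n)
    print(ybar, saddlepoint_density(ybar, n, corrected=False) / exact,
          saddlepoint_density(ybar, n) / exact)
```

In this example the ratio of approximate to exact density does not depend on <math>\bar y</math>, which illustrates the uniform relative error of the approximation.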
+
 
+
The Edgeworth series for the density was trivially integrated to obtain an
+
approximation to tail probabilities.  Integration of the saddlepoint
+
approximation is more delicate.  Two main approaches have been investigated.
+
Daniels (1987) expresses
+
<math>f_{\bar Y}(\bar y)</math> exactly as a complex integral
+
involving
+
<math>K(\xi)</math>, integrates with respect to
+
<math>\bar y</math> to obtain another
+
complex integral, and reviews techniques for approximating the resulting
+
integrals.
+
Robinson (1982) and Lugannani and Rice (1980) derive tail probability approximations based
+
on approximately integrating (<ref>spser</ref>) with respect to
+
<math>\bar y</math> directly.
+
 
+
These saddlepoint and Edgeworth approximations have multivariate and
+
conditional extensions.  Davison (1988) exploits the conditional saddlepoint tail probability approximation to perform inference in canonical exponential families.
+
==Samples and sub-samples==
+
A function
+
<math>f\colon{\mathcal R}^n\to{\mathcal R}</math> is symmetric if
+
<math>f(x_1 ,\ldots, x_n) = f(x_{\pi(1)} ,\ldots, x_{\pi(n)})</math>
+
for each permutation
+
<math>\pi</math> of the arguments.
+
For example, the total
+
<math>T_n = x_1 + \cdots + x_n</math>, the average
+
<math>T_n/n</math>,
+
the min, max and median are symmetric functions, as are the sum of squares
+
<math>S_n = \sum x_i^2</math>, the sample variance
+
<math>s_n^2 = (S_n - T_n^2/n)/(n-1)</math>
+
and the mean absolute deviation
+
<math>\sum |x_i - x_j|/(n(n-1))</math>.
+
 
+
A vector
+
<math>x</math> in
+
<math>{\mathcal R}^n</math> is an ordered list of
+
<math>n</math> real numbers
+
<math>(x_1 ,\ldots, x_n)</math>
+
or a function
+
<math>x\colon[n]\to{\mathcal R}</math> where
+
<math>[n]=\{1 ,\ldots, n\}</math>.
+
For
+
<math>m \le n</math>, a one-to-one function
+
<math>\varphi\colon[m]\to[n]</math> is a sample of size
+
<math>m</math>,
+
the sampled values being
+
<math>x\varphi = (x_{\varphi(1)} ,\ldots, x_{\varphi(m)})</math>.
+
All told, there are
+
<math>n(n-1)\cdots(n-m+1)</math> distinct samples of size
+
<math>m</math>
+
that can be taken from a list of length
+
<math>n</math>.
+
A ''sequence'' of functions
+
<math>f_n\colon{\mathcal R}^n\to{\mathcal R}</math> is
+
consistent under sub-sampling if, for each pair
+
<math>f_m, f_n</math> with <math>m \le n</math>,
+
:<math>
+
f_n(x) = {\rm ave} _\varphi f_m(x\varphi),
+
</math>
+
where
+
<math>{\rm ave} _\varphi</math> denotes the average over samples of size
+
<math>m</math>.
+
For
+
<math>m=n</math>, this condition implies only that
+
<math>f_n</math> is a symmetric function.
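The definition of a sample as a 1--1 function can be made concrete with a short sketch. It assumes that Python's <code>itertools.permutations</code>, which selects by position rather than by value, is an acceptable stand-in for the maps <math>\varphi\colon[m]\to[n]</math>; the helper name <code>samples</code> is hypothetical.

```python
from itertools import permutations
from math import perm

def samples(x, m):
    """All sampled vectors x∘phi for 1-1 maps phi: [m] -> [n], as tuples."""
    return list(permutations(x, m))

x = (0, 1, 3)
pairs = samples(x, 2)
# Ordered pairs are listed, so (0, 1) and (1, 0) both appear.
assert len(pairs) == perm(len(x), 2)  # n(n-1)...(n-m+1) = 3 * 2
```

Because selection is by position, repeated values in <math>x</math> still give <math>n(n-1)\cdots(n-m+1)</math> samples.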
+
 
+
Although the total and the median are both symmetric functions, neither is consistent
+
under sub-sampling.
+
For example, the median of the numbers
+
<math>(0,1,3)</math> is one,
+
but the average of the medians of samples of size two is 4/3.
+
However, the average
+
<math>\bar x_n = T_n/n</math> is sampling consistent.
+
Likewise the sample variance
+
<math>s_n^2 = \sum(x_i - \bar x)^2/(n-1)</math> with divisor
+
<math>n-1</math>
+
is sampling consistent,
+
but the mean squared deviation
+
<math>\sum(x_i - \bar x_n)^2/n</math> with divisor
+
<math>n</math> is not.
+
Other sampling consistent functions include Fisher's
+
<math>k</math>-statistics,
+
the first few of which are
+
<math>k_{1,n} = \bar x_n</math>,
+
<math>k_{2,n} = s_n^2</math> for
+
<math>n\ge 2</math>,
+
<math>
+
k_{3,n} = n\sum(x_i - \bar x_n)^3/((n-1)(n-2)),
+
</math>
+
defined for
+
<math>n\ge 3</math>.
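The consistency claims above can be checked by brute force on small lists. The following sketch uses exact rational arithmetic and averages over every sample of size <math>m</math>; all function names are hypothetical, and the check covers only the particular lists supplied, so it illustrates the property rather than proving it.

```python
from fractions import Fraction
from itertools import permutations

def ave_over_samples(f, x, m):
    """Average of f over all samples of size m drawn from the list x."""
    values = [f(s) for s in permutations(x, m)]
    return sum(values, Fraction(0)) / len(values)

def total(x):
    return Fraction(sum(x))

def mean(x):
    return Fraction(sum(x), len(x))

def median(x):
    s, n = sorted(x), len(x)
    return Fraction(s[n // 2]) if n % 2 else Fraction(s[n // 2 - 1] + s[n // 2], 2)

def sample_variance(x):
    """Divisor n - 1."""
    xbar = mean(x)
    return sum((xi - xbar) ** 2 for xi in x) / (len(x) - 1)

def mean_squared_deviation(x):
    """Divisor n."""
    xbar = mean(x)
    return sum((xi - xbar) ** 2 for xi in x) / len(x)

def k3(x):
    """Third k-statistic, defined for n >= 3."""
    n, xbar = len(x), mean(x)
    return n * sum((xi - xbar) ** 3 for xi in x) / ((n - 1) * (n - 2))

def is_consistent(f, x, m):
    """True when f on the full list equals the average of f over samples of size m."""
    return f(x) == ave_over_samples(f, x, m)
```

With <code>x = (0, 1, 3)</code> and <code>m = 2</code>, <code>is_consistent</code> holds for <code>mean</code> and <code>sample_variance</code> and fails for <code>total</code>, <code>median</code> and <code>mean_squared_deviation</code>; with <code>x = (0, 1, 3, 7)</code> and <code>m = 3</code> it holds for <code>k3</code>.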
+
 
+
For a sequence of independent and identically distributed random variables,
+
the
+
<math>k</math>-statistic of order
+
<math>r\le n</math> is the unique symmetric function
+
such that
+
<math>E(k_{r,n}) = \kappa_r</math>.
+
Fisher (1929) derived the variances and covariances.
+
The connection with finite-population sub-sampling was developed by
+
Tukey (1950).
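The unbiasedness property <math>E(k_{r,n}) = \kappa_r</math> can be verified exactly for a small discrete case. The sketch below assumes a Bernoulli distribution with success probability <math>p</math>, whose first three cumulants are <math>p</math>, <math>p(1-p)</math> and <math>p(1-p)(1-2p)</math>, and enumerates all <math>2^n</math> outcomes; the distribution and the function names are chosen for illustration only.

```python
from fractions import Fraction
from itertools import product

def k_statistic(x, r):
    """Fisher's k-statistic of order r = 1, 2 or 3 for the list x."""
    n = len(x)
    xbar = Fraction(sum(x), n)
    if r == 1:
        return xbar
    if r == 2:
        return sum((xi - xbar) ** 2 for xi in x) / (n - 1)
    if r == 3:
        return n * sum((xi - xbar) ** 3 for xi in x) / ((n - 1) * (n - 2))
    raise ValueError("only orders 1 to 3 are implemented in this sketch")

def expected_k(r, n, p):
    """Exact E(k_{r,n}) for n independent Bernoulli(p) variables."""
    expectation = Fraction(0)
    for x in product((0, 1), repeat=n):
        ones = sum(x)
        probability = p ** ones * (1 - p) ** (n - ones)
        expectation += probability * k_statistic(x, r)
    return expectation

def bernoulli_cumulant(r, p):
    """First three cumulants of the Bernoulli(p) distribution."""
    return {1: p, 2: p * (1 - p), 3: p * (1 - p) * (1 - 2 * p)}[r]
```

For example, with <code>p = Fraction(1, 3)</code> and <code>n = 3</code>, <code>expected_k(3, 3, p)</code> and <code>bernoulli_cumulant(3, p)</code> both equal <code>Fraction(2, 27)</code>.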
+
 
+
 
+
==References==
+
*D. J. Daley and D. Vere-Jones. ''An Introduction to the Theory of Point Processes''. Springer-Verlag, New York, 1988.
+
 
+
*H. E. Daniels. Saddlepoint approximations in statistics. ''The Annals of Mathematical Statistics'', 25  (4): 631--650, 1954.
+
 
+
*H. E. Daniels. Tail probability approximations. ''Review of the International Statistical Institute'', 55:  37--46, 1987.
+
 
+
*A. C. Davison. Approximate conditional inference in generalized linear models. ''Journal of the Royal Statistical Society Series B'', 50:  445--461, 1988.
+
 
+
*E. Di Nardo, G. Guarino, and D. Senato. A unifying framework for <math>k</math>-statistics, polykays and their multivariate generalizations. ''Bernoulli'', 14: 440--468, 2008.
+
 
+
*P. L. Dressel. Statistical seminvariants and their estimates with particular emphasis on their relation to algebraic invariants. ''The Annals of Mathematical Statistics'', 11 (1): 33--57, 1940.
+
 
+
*F. Y. Edgeworth. On the representation of statistical frequency by a series. ''Journal of the Royal Statistical Society'', 70  (1): 102--106, 1907.
+
 
+
*R. A. Fisher. Moments and product moments of sampling distributions. ''Proceedings of the London Mathematical Society, Series 2'',  30: 199--238, 1929.
+
 
+
*I. J. Good. A new formula for k-statistics. ''The Annals of Statistics'', 5 (1): 224--228,  1977.
+
 
+
*A. Hald. The early history of cumulants and the Gram-Charlier series. ''International Statistical Review'', 68: 137--153, 2000.
+
 
+
*C. C. Heyde. On a property of the lognormal distribution. ''Journal of the Royal Statistical Society. Series B  (Methodological)'', 25 (2): 392--393, 1963.
+
 
+
*J. E. Kolassa. ''Series Approximation Methods in Statistics''. Springer-Verlag, New York, 2006.
+
 
+
*S. L. Lauritzen, editor. ''Thiele: Pioneer in Statistics''. Oxford University Press, New York, 2002.
+
 
+
*R. Lugannani and S. Rice. Saddle point approximation for the distribution of the sum of  independent random variables. ''Advances in Applied Probability'', 12: 475--490, 1980.
+
 
+
*J. Marcinkiewicz. Sur une propriété de la loi de Gauss. ''Mathematische Zeitschrift'', 44: 612--618, 1939.
+
 
+
*J. Robinson. Saddlepoint approximations for permutation tests and confidence  intervals. ''Journal of the Royal Statistical Society. Series B  (Methodological)'', 44 (1): 91--101, 1982.
+
 
+
*G.-C. Rota and B. D. Taylor. The classical umbral calculus. ''SIAM Journal on Mathematical Analysis'', 25 (2): 694--711, 1994.
+
 
+
*B. Streitberg. Lancaster interactions revisited. ''The Annals of Statistics'', 18 (4): 1878--1885,  1990.
+
 
+
*T. N. Thiele. ''Almindelig Iagttagelseslaere: Sandsynlighedsregning og mindste  Kvadraters Methode''. C. A. Reitzel, Copenhagen, 1889.
+
 
+
*T. N. Thiele. ''Theory of Observations''. C. & E. Layton, London, 1903.
+
 
+
*J. W. Tukey. Some sampling simplified. ''Journal of the American Statistical Association'', 45  (252): 501--519, 1950.
+

Revision as of 15:53, 9 December 2008

This article describes a sequence of numbers, called cumulants, that are used to describe, and in some circumstances approximate, a univariate or multivariate distribution. Cumulants are not unique in this role; other sequences, such as moments and their generalizations, may also be used in both roles. Cumulants have multiple advantages over competitors, in that cumulants change in a very simple way when the underlying random variable is subject to an affine transformation, cumulants for sums of independent random variables have a very simple relationship to the cumulants of the addends, and cumulants may be used in a simple way to describe the difference between a distribution and its simplest Gaussian approximation.

The moment generating function (or Fourier--Laplace transform) <math powerseries> M(\xi) = E(e^{\xi X}) </math> is an easy way to combine all of the moments into a single expression. The cumulants up to order four are defined even though the moment generating function (<ref>powerseries</ref>) does not exist for any real \( \xi\neq 0\) .

The standard lognormal distribution has finite moments \( \mu_r = e^{r^2/2}\) of all orders, but the series in equation (<ref>powerseries</ref>) diverges for every \( \xi\neq 0\).
