Dr. John Kolassa

Revision as of 15:53, 9 December 2008

This article describes a sequence of numbers, called cumulants, that are used to describe, and in some circumstances approximate, a univariate or multivariate distribution. Cumulants are not unique in this role; other sequences, such as moments and their generalizations, may also be used in both roles. Cumulants have multiple advantages over competitors, in that cumulants change in a very simple way when the underlying random variable is subject to an affine transformation, cumulants for sums of independent random variables have a very simple relationship to the cumulants of the addends, and cumulants may be used in a simple way to describe the difference between a distribution and its simplest Gaussian approximation.
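The affine and additive behaviour described above can be checked numerically. The following is a minimal sketch, not part of the original article: it assumes the standard expressions for the first four cumulants in terms of raw moments, \( \kappa_1=\mu_1\), \( \kappa_2=\mu_2-\mu_1^2\), \( \kappa_3=\mu_3-3\mu_2\mu_1+2\mu_1^3\), \( \kappa_4=\mu_4-4\mu_3\mu_1-3\mu_2^2+12\mu_2\mu_1^2-6\mu_1^4\), and uses exact rational arithmetic on two small discrete distributions. The helper names (`raw_moments`, `cumulants`, `affine`, `independent_sum`) and the example distributions are illustrative choices.

```python
from fractions import Fraction
from itertools import product


def raw_moments(dist, order=4):
    """Raw moments mu_1..mu_order of a finite distribution {value: probability}."""
    return [sum(p * x**r for x, p in dist.items()) for r in range(1, order + 1)]


def cumulants(dist):
    """First four cumulants, computed from the raw moments."""
    m1, m2, m3, m4 = raw_moments(dist)
    return [
        m1,
        m2 - m1**2,
        m3 - 3 * m2 * m1 + 2 * m1**3,
        m4 - 4 * m3 * m1 - 3 * m2**2 + 12 * m2 * m1**2 - 6 * m1**4,
    ]


def affine(dist, a, b):
    """Distribution of a + b*X."""
    out = {}
    for x, p in dist.items():
        out[a + b * x] = out.get(a + b * x, 0) + p
    return out


def independent_sum(dist_x, dist_y):
    """Distribution of X + Y for independent X and Y."""
    out = {}
    for (x, p), (y, q) in product(dist_x.items(), dist_y.items()):
        out[x + y] = out.get(x + y, 0) + p * q
    return out


# Illustrative distributions: X is Bernoulli(1/3), Y is an arbitrary three-point law.
X = {Fraction(0): Fraction(2, 3), Fraction(1): Fraction(1, 3)}
Y = {Fraction(0): Fraction(1, 2), Fraction(1): Fraction(1, 4), Fraction(3): Fraction(1, 4)}

kx, ky = cumulants(X), cumulants(Y)
a, b = Fraction(5), Fraction(-2)

# Affine map: the mean becomes a + b*kappa_1, higher cumulants scale as b^r * kappa_r.
print(cumulants(affine(X, a, b)))
print([a + b * kx[0]] + [b**r * kx[r - 1] for r in (2, 3, 4)])

# Independent sum: cumulants add term by term.
print(cumulants(independent_sum(X, Y)))
print([u + v for u, v in zip(kx, ky)])
```

Each pair of printed lists should agree. Exact fractions are used so that the comparison is not blurred by floating-point rounding.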

The moment generating function (or Fourier-Laplace transform) <math powerseries> M(\xi) = E(e^{\xi X}) </math> is an easy way to combine all of the moments into a single expression. For some distributions, such as Student's \( t\) distribution on five degrees of freedom, the cumulants up to order four are defined even though the moment generating function (<ref>powerseries</ref>) does not exist for any real \( \xi\neq 0\).

The log normal distribution has finite moments \( \mu_r = e^{r^2/2}\) of all orders, but the power series \( \sum_r \mu_r \xi^r/r!\) obtained by expanding equation (<ref>powerseries</ref>) diverges for every \( \xi\neq 0\).
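The divergence can be seen from the size of the individual terms. The moments \( \mu_r = e^{r^2/2}\) quoted above are those of the standard log normal distribution, and the ratio of successive terms of \( \sum_r \mu_r\xi^r/r!\) is \( e^{r+1/2}\,\xi/(r+1)\), which is unbounded in \( r\) for any fixed \( \xi\neq 0\). The sketch below is an illustration added here, not part of the original article: it works on the log scale to avoid overflow, and the function names are hypothetical.

```python
import math


def log_abs_term(r, xi):
    """log |mu_r xi^r / r!| with mu_r = exp(r^2 / 2)."""
    return r * r / 2 + r * math.log(abs(xi)) - math.lgamma(r + 1)


def log_abs_ratio(r, xi):
    """log |term_{r+1} / term_r| = (r + 1/2) + log|xi| - log(r + 1)."""
    return (r + 0.5) + math.log(abs(xi)) - math.log(r + 1)


def first_growing_index(xi):
    """Smallest r >= 0 from which successive terms grow in absolute value."""
    r = 0
    while log_abs_ratio(r, xi) <= 0:
        r += 1
    return r


# However small |xi| is, the terms eventually grow without bound,
# so they cannot tend to zero and the series cannot converge.
for xi in (1.0, 0.01, -1e-6):
    r0 = first_growing_index(xi)
    print(xi, r0, log_abs_term(r0 + 50, xi))
```

Because \( r + 1/2 - \log(r+1)\) is nondecreasing for \( r\ge 0\), once the log-ratio becomes positive it stays positive, so the index returned by `first_growing_index` marks the point from which the terms grow in absolute value.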


@@ Line 1: / Line 1: @@
-This is a draft, 1st author Peter McCullagh
 This article describes a sequence of numbers, called ''' cumulants''',
 that are used
@@ Line 12: / Line 10: @@
 be used in a simple way to describe the difference between a distribution and
 its simplest Gaussian approximation.
-==Overview and Definitions==
-===Definition===
-The moment of order
-<math>r</math> (or
-<math>r</math>th moment) of a real-valued random variable
-<math>X</math> is
-:<math>
-\mu_r = E(X^r)
-</math>
-for integer
-<math>r=0,1,\ldots</math>.
-The value is assumed to be finite.
-Provided that it has a Taylor expansion about the origin,
 The moment generating function (or Fourier--Laplace transform)
-:<math  powerseries>
+<math  powerseries>
 M(\xi) = E(e^{\xi X})
-= E(1 + \xi X +\cdots + \xi^r X^r/r!+\cdots)
-	= \sum_{r=0}^\infty \mu_r \xi^r/r!
 </math>
 is an easy way to combine all of the moments into a single expression.
-The
+The cumulants up to order four are defined
-<math>r</math>th moment is hence the
-<math>r</math>th derivative of
-<math>M</math> at the origin.
-This definition is due to Fisher (1929).
-When
-<math>X</math> has a distribution given by a density
-<math>f</math>, then
-:<math  ctsmomdef>
-\mu_r = \int_{-\infty}^\infty x^r f(x)\,dx,</math> and
-:<math  mgfdef>
-M(\xi) = E(e^{\xi X}) =\int_{-\infty}^\infty\exp(\xi x) f(x) d x.
-</math>
-The cumulants
-<math>\kappa_r</math> are the coefficients in the Taylor expansion of
-the cumulant generating function about the origin
-:<math>
-K(\xi) = \log M(\xi) = \sum_{r} \kappa_r \xi^r/r!.
-</math>
-Evidently
-<math>\mu_0 = 1</math> implies
-<math>\kappa_0 = 0</math>.
-The relationship between the first few moments and cumulants,
-obtained by extracting coefficients from the expansion, is as follows
-:<math  forward>\begin{array}{lcl}
-\kappa_1 &=& \mu_1 \\
-\kappa_2 &=& \mu_2 - \mu_1^2\\
-\kappa_3 &=& \mu_3 - 3\mu_2\mu_1 + 2\mu_1^3\\
-\kappa_4 &=& \mu_4 - 4\mu_3\mu_1 - 3\mu_2^2 + 12\mu_2\mu_1^2 -6\mu_1^4.
-\end{array}</math>
-In the reverse direction
-:<math  reverse>\begin{array}{lcl}
-\mu_2 &=& \kappa_2 + \kappa_1^2\\
-\mu_3 &=& \kappa_3 + 3\kappa_2\kappa_1 + \kappa_1^3\\
-\mu_4 &=& \kappa_4 + 4\kappa_3\kappa_1 + 3\kappa_2^2 + 6\kappa_2\kappa_1^2 + \kappa_1^4.
-\end{array}</math>
-In particular,
-<math>\kappa_1 = \mu_1</math> is the mean of
-<math>X</math>,
-<math>\kappa_2</math> is the
-variance, and
-<math>\kappa_3 = E((X - \mu_1)^3)</math>.
-Higher-order cumulants are not the same as moments about the mean.
-Hald (2000) credits Thiele (1889) with the first derivation of cumulants.
-Lauritzen (2002) presents an overview, translation, and reprinting of much of this early work.
-===Examples===
-As above, let <math> {\mathcal R}</math> denote the real numbers.
-Let <math> {\mathcal R}^+</math> represent the positive reals, and let <math> {\mathcal N}=\{0,1,\ldots\}</math> be the natural numbers.
-<table><tr><td>Distribution</td><td>Density </td><td>CGF</td><td>Cumulants</td></tr>
-<tr><td>Normal</td><td><math> \frac{\exp(-x^2/2)}{\sqrt{2\pi}}, x\in{\mathcal R}</math></td><td><math> \xi^2/2</math></td><td><math> \kappa_1=0</math>, <math> \kappa_2=1</math>, <math> \kappa_r=0</math> for <math>r>2</math></td></tr>
-<tr><td>Bernoulli</td><td><math> \pi^x(1-\pi)^{1-x}, x\in\{0,1\}</math></td><td><math> \log(1-\pi+\pi\exp(\xi))</math></td><td><math> \kappa_1=\pi</math>, <math> \kappa_2=\pi(1-\pi)</math>, <math> \kappa_3=[2 \pi ^3-3 \pi ^2+\pi]</math></td></tr>
-<tr><td>Poisson</td><td><math> \frac{\exp(-\lambda)\lambda^x}{x!}, x\in{\mathcal N}      </math></td><td><math> (e^{\xi }-1)\lambda</math></td><td><math> \kappa_r=\lambda \ \forall r</math> </td></tr>
-<tr><td>Exponential</td><td><math> \frac{\exp(-x/\lambda)}{\lambda}, x\in{\mathcal R}^+</math></td><td><math> -\log(1-\lambda\xi)</math></td><td><math> \kappa_r=\lambda^r(r-1)!  \ \forall r</math> </td></tr>
-<tr><td>Geometric</td><td><math> (1-\pi)\pi^x, x\in{\mathcal N}</math></td><td><math> \log(1-\pi)-\log(1-\pi\exp(\xi)) </math></td><td> <math> \kappa_1=\rho</math>, <math> \kappa_2=\rho^2+\rho</math>,<math> \kappa_3=2 \rho ^3+3 \rho ^2+\rho</math> for <math> \rho=\pi/(1-\pi)</math>.</td></tr>
-</table>
-===Definitions under less restrictive conditions===
-The Cauchy distribution with density <math> \pi^{-1}/(1+x^2)</math> has no moments because
-the integral (<ref>ctsmomdef</ref>) does not converge for any integer <math> r\ge 1</math>.
-Student's <math> t</math> distribution on five degrees of freedom is symmetric with density
-<math> 8/(3\pi\surd5\,(1 + x^2/5)^3)</math>.
-The first four moments are <math> 0, 5/3, 0, 25</math> : higher-order moments are
-not defined.
-The cumulants up to order four are defined by (<ref>forward</ref>)
 even though the moment generating function (<ref>powerseries</ref>) does not exist
 for any real <math> \xi\neq 0</math> .
-In both of these cases, the characteristic function <math> M(i\xi)</math> is
-well-defined for real <math> \xi</math> ,
-<math> \exp(-|\xi|)</math> for the Cauchy distribution,
-and <math> \exp(-|\xi|\surd 5)(1 + |\xi|\surd5 + 5\xi^2/3)</math>  for <math> t_5</math> .
-In the latter case, both <math> M(i\xi)</math> and <math> K(i\xi)</math>
-have Taylor expansions up to order four only, so the moments and
-cumulants are defined only up to this order.
-The infinite expansion (<ref>powerseries</ref>) is justified when
-the radius of convergence is positive, in which case <math> M(\xi)</math> is finite on
-an open set containing zero, and all moments and cumulants are finite.
-However, finiteness of the moments does not imply that <math> M(\xi)</math>
-exists for any <math> \xi\neq 0</math> .
-The log normal distribution provides a counterexample.
 It has finite moments <math> \mu_r = e^{r^2/2}</math> of all orders,
-but (<ref>powerseries</ref>) diverges for every <math> \xi\neq 0</math>.
+but equation (<ref>powerseries</ref>) diverges for every <math> \xi\neq 0</math>.
-===Uniqueness===
-The normal distribution
-<math>N(\mu, \sigma^2)</math> has cumulant generating function
-<math>\xi\mu + \xi^2 \sigma^2/2</math>, a quadratic polynomial implying that all cumulants
-of order three and higher are zero.
-Marcinkiewicz (1939) showed that the normal distribution is the only distribution
-whose cumulant generating function is a polynomial, i.e. the only distribution
-having a finite number of non-zero cumulants.
-The Poisson distribution with mean
-<math>\mu</math> has moment generating function
-<math>\exp(\mu(e^\xi - 1))</math> and cumulant generating function
-<math>\mu(e^\xi -1)</math>.
-Consequently all the cumulants are equal to the mean.
-Two distinct distributions may have the same moments, and hence the same cumulants.
-This statement is fairly obvious for distributions whose moments are all infinite,
-or even for distributions having infinite higher-order moments.
-But it is much less obvious for distributions having finite moments of all orders.
-Heyde (1963) gave one such pair of distributions with densities
-<math>
-f_1(x) = \exp(-(\log x)^2/2) / (x\sqrt{2\pi})
-</math>
-and <math>
-f_2(x) = f_1(x) [1 + \sin(2\pi\log x)/2]
-</math>
-for
-<math>x > 0</math>.
-The first of these is called the log normal distribution.
-To show that these distributions have the same moments it suffices to show that
-:<math>
-\int_0^\infty x^k f_1(x) \sin(2\pi\log x)\, dx = 0
-</math>
-for integer
-<math>k\ge 1</math>, which can be shown by making the substitution
-<math>\log x = y+k</math>.
-If the sequence of moments is such that (<ref>powerseries</ref>)
-has a positive radius of convergence, the distribution is uniquely determined.
-===Properties===
-Cumulants of order
-<math>r \ge 2</math> are called semi-invariant on account of their
-behavior under affine transformation of variables (Thiele, 1903; Dressel, 1940).
-If
-<math>\kappa_r</math> is the
-<math>r</math>th cumulant of
-<math>X</math>,
-the
-<math>r</math>th cumulant of the affine transformation
-<math>a + b X</math> is
-<math>b^r \kappa_r</math>,
-independent of
-<math>a</math>.
-This behavior is considerably simpler than that of moments.
-However, moments about the mean are also semi-invariant, so this property alone
-does not explain why cumulants are useful for statistical purposes.
-The term cumulant was coined by Fisher (1929) on account of their behavior under
-addition of random variables.
-Let
-<math>S = X+Y</math> be the sum of two independent random variables.
-The moment generating function of the sum is the product
-:<math>
-M_S(\xi) =  M_X(\xi) M_Y(\xi),
-</math>
-and the cumulant generating function is the sum
-:<math>
-K_S(\xi) = K_X(\xi) + K_Y(\xi).
-</math>
-Consequently, the
-<math>r</math>th cumulant of the sum is the sum of the
-<math>r</math>th cumulants.
-By extension, if
-<math>X_1,\ldots X_n</math> are independent and identically distributed,
-the
-<math>r</math>th cumulant of the sum is
-<math>n\kappa_r</math>.
-Let
-<math>\kappa_{n;r}</math> be
-the cumulant of order
-<math>r</math> of the standardized sum
-<math>n^{-1/2}(X_1+\cdots + X_n)</math>;
-then
-:<math  ndep>
-\kappa_{n;r}=n^{1-r/2} \kappa_r.
-</math>
-Provided that the cumulants are finite, all cumulants of order
-<math>r\ge 3</math>
-of the standardized sum tend to zero, which is a simple demonstration of the central limit theorem.
-Good (1977) obtained an expression for the
-<math>r</math>th cumulant of
-<math>X</math> as
-the
-<math>r</math>th moment of the discrete Fourier transform of an independent and
-identically distributed sequence as follows.
-Let
-<math>X_1, X_2,\ldots</math> be independent copies of
-<math>X</math> with
-<math>r</math>th cumulant
-<math>\kappa_r</math>,
-and let
-<math>\omega = e^{2\pi i/n}</math> be a primitive
-<math>n</math>th root of unity.
-The discrete Fourier combination
-:<math>
-Z = X_1 + \omega X_2 + \cdots + \omega^{n-1} X_n
-</math>
-is a complex-valued random variable whose distribution is invariant under
-rotation
-<math>Z\sim \omega Z</math> through multiples of
-<math>2\pi /n</math>.
-The
-<math>r</math>th cumulant of the sum is
-<math>\kappa_r \sum_{j=1}^n \omega^{r j}</math>,
-which is equal to
-<math>n\kappa_r</math> if
-<math>r</math> is a multiple of
-<math>n</math>, and zero otherwise.
-Consequently
-<math>E(Z^r) = 0</math> for integer
-<math>r < n</math> and
-<math>E(Z^n) = n\kappa_n</math>.
-===Multivariate cumulants===
-Somewhat surprisingly, the relation between moments and cumulants is simpler and
-more transparent in the multivariate case than in the univariate case.
-Let
-<math>X = (X^1,\ldots, X^k)</math> be the components of a random vector.
-In a departure from the univariate notation, we write
-<math>\kappa^r = E(X^r)</math> for the components of the mean vector,
-<math>\kappa^{rs} = E(X^r X^s)</math> for the components of the second moment matrix,
-<math>\kappa^{r s t} = E(X^r X^s X^t)</math> for the third moments, and so on.
-It is convenient notationally to adopt Einstein's summation convention,
-so
-<math>\xi_r X^r</math> denotes the linear combination
-<math>\xi_1 X^1 + \cdots + \xi_k X^k</math>,
-the square of the linear combination is
-<math>(\xi_r X^r)^2 = \xi_r\xi_s X^r X^s</math>
-a sum of
-<math>k^2</math> terms, and so on for higher powers.
-The Taylor expansion of the moment generating function
-<math>M(\xi) = E(\exp(\xi_r X^r))</math>
-is
-:<math>
-M(\xi) = 1 + \xi_r \kappa^r
-	+ \textstyle{\frac1{2!}} \xi_r\xi_s \kappa^{rs}
-	+ \textstyle{\frac1{3!}} \xi_r\xi_s \xi_t \kappa^{r s t} +\cdots.
-</math>
-The cumulants are defined as the coefficients
-<math>\kappa^{r,s}, \kappa^{r,s,t},\ldots</math>
-in the Taylor expansion
-:<math>
-\log M(\xi) = \xi_r \kappa^r
-+ \textstyle{\frac1{2!}} \xi_r\xi_s \kappa^{r,s}
-+ \textstyle{\frac1{3!}} \xi_r\xi_s \xi_t \kappa^{r,s,t} +\cdots.
-</math>
-This notation does not distinguish first-order moments from first-order cumulants,
-but commas separating the superscripts serve to distinguish higher-order cumulants from moments.
-Comparison of coefficients reveals that each moment
-<math>\kappa^{rs}, \kappa^{r s t},\ldots</math>
-is a sum over partitions of the superscripts, each term in the sum being a
-product of cumulants:
-:<math>\begin{array}{lcl}
-\kappa^{rs}&=&\kappa^{r,s} + \kappa^r\kappa^s\\
-\kappa^{r s t}&=&\kappa^{r,s,t} + \kappa^{r,s}\kappa^t + \kappa^{r,t}\kappa^s + \kappa^{s,t}\kappa^r
-	+ \kappa^r\kappa^s\kappa^t\\
-	&=&\kappa^{r,s,t} + \kappa^{r,s}\kappa^t[3] + \kappa^r\kappa^s\kappa^t\\
-\kappa^{r s t u}&=&\kappa^{r,s,t,u} + \kappa^{r,s,t}\kappa^u[4] + \kappa^{r,s}\kappa^{t,u}[3]
-	+ \kappa^{r,s}\kappa^t\kappa^u[6] + \kappa^r\kappa^s\kappa^t\kappa^u.
-\end{array}</math>
-Each bracketed number indicates a sum over distinct partitions
-having the same block sizes, so the fourth-order moment is a sum of 15 distinct cumulant products.
-In the reverse direction, each cumulant is also a sum over partitions of the indices.
-Each term in the sum is a product of moments, but with coefficient
-<math>(-1)^{\nu-1} (\nu-1)!</math>
-where
-<math>\nu</math> is the number of blocks:
-:<math>\begin{array}{lcl}
-\kappa^{r,s} &=& \kappa^{rs} - \kappa^r\kappa^s\\
-\kappa^{r,s,t} &=& \kappa^{r s t} - \kappa^{rs}\kappa^t[3] + 2 \kappa^r\kappa^s\kappa^t\\
-\kappa^{r,s,t,u} &=& \kappa^{r s t u} - \kappa^{r s t}\kappa^u[4] - \kappa^{rs}\kappa^{t u}[3]
-+ 2 \kappa^{rs}\kappa^t\kappa^u[6] - 6 \kappa^r\kappa^s\kappa^t\kappa^u
-\end{array}</math>
-These relationships are an instance of M\"obius inversion on the partition lattice.
-Partition notation serves one additional purpose.
-It establishes moments and cumulants as special cases of generalized cumulants,
-which include objects of the type
-<math>\kappa^{r,st} = {\rm cov}(X^r, X^s X^t)</math>,
-<math>\kappa^{rs, t u} = {\rm cov}(X^r X^s, X^t X^u)</math>, and
-<math>\kappa^{rs, t, u}</math> with incompletely partitioned indices.
-These objects arise very naturally in statistical work involving asymptotic
-approximation of distributions.
-They are intermediate between moments and cumulants, and have characteristics of both.
-Every generalized cumulant can be expressed as a sum of certain products of ordinary cumulants.
-Some examples are as follows:
-:<math>\begin{array}{lcl}
-\kappa^{rs, t} &=& \kappa^{r,s,t} + \kappa^r\kappa^{s,t} + \kappa^s \kappa^{r,t}\\
-	&=& \kappa^{r,s,t} + \kappa^r\kappa^{s,t}[2]\\
-\kappa^{rs,t u} &=& \kappa^{r,s,t,u} + \kappa^{r,s,t}\kappa^u[4] + \kappa^{r,t}\kappa^{s,u}[2]
-	+ \kappa^{r,t}\kappa^s\kappa^u[4]\\
-\kappa^{rs,t,u} &=& \kappa^{r,s,t,u} + \kappa^{r,t,u}\kappa^s[2] + \kappa^{r,t}\kappa^{s,u}[2]
-\end{array}</math>
-Each generalized cumulant is associated with a partition
-<math>\tau</math> of the given set of indices.
-For example,
-<math>\kappa^{rs,t,u}</math> is associated with the partition
-<math>\tau=rs|t|u</math> of four indices
-into three blocks.
-Each term on the right is a cumulant product associated with a partition
-<math>\sigma</math> of the same indices.
-The coefficient is one if the least upper bound
-<math>\sigma\vee\tau</math> has a single block,
-otherwise zero.
-Thus, with
-<math>\tau=rs|t|u</math>, the product
-<math>\kappa^{r,s}\kappa^{t,u}</math> does not appear
-on the right because
-<math>\sigma\vee\tau = rs|t u</math> has two blocks.
-As an example of the way these formulae may be used,
-let
-<math>X</math> be a scalar random variable with cumulants
-<math>\kappa_1,\kappa_2,\kappa_3,\ldots</math>.
-By translating the second formula in the preceding list, we find that
-the variance of the squared variable is
-:<math>
-{\rm var}(X^2) = \kappa_4 + 4\kappa_3\kappa_1 + 2\kappa_2^2 + 4\kappa_2\kappa_1^2,
-</math>
-reducing to
-<math>\kappa_4 + 2\kappa_2^2</math> if the mean is zero.
-===Exponential families===
-Let <math> f</math> be a probability distribution on an arbitrary measurable space <math> ({\mathcal X},\nu)</math> ,
-and let <math> t\colon{\mathcal X}\to{\mathcal R}</math> be a real-valued random variable
-with cumulant generating function
-<math> K(\cdot)</math> , finite in a set <math> \Theta</math> containing zero in the interior.
-The family of distributions on <math> {\mathcal X}</math> with density
-:<math>
-f_\theta(x) = e^{\theta t(x)} f(x) / M(\theta) = e^{\theta t(x) - K(\theta)} f(x)
-</math>
-indexed by <math> \theta\in\Theta</math> is called the exponential family
-associated with <math> f</math> and the canonical statistic <math> t</math> .
-In statistical physics, the normalizing constant <math> M(\theta)</math> is called the
-partition function.
-Two examples suffice to illustrate the idea.
-In the first example, <math> {\mathcal X} = \{1,2,\ldots\}</math> is the set of natural numbers,
-<math> f(x) \propto 1/x^2</math> and <math> t(x) = -\log(x)</math> .
-The associated exponential family is
-<math> f_\theta(x) = x^{-\theta}/\zeta(\theta)</math> ,
-where <math> \zeta(\theta)</math> is the Riemann zeta function with real argument <math> \theta > 1</math> .
-In the second example, <math> {\mathcal X}={\mathcal X}_n</math> is the symmetric group or the set of
-permutations of <math> n</math> letters,
-<math> x\in{\mathcal X}_n</math> is a permutation, <math> t(x)</math> is the number of cycles,
-<math> f(x) = 1/n!</math> is the uniform distribution,
-and <math> M_n(\xi) = \Gamma(n+e^\xi)/(n!\, \Gamma(e^\xi))</math> for all real <math> \xi</math> .
-The exponential family of distributions on permutations of <math> [n]</math> is
-:<math>
-f_{n,\theta}(x) = \frac{\Gamma(\lambda)\, \lambda^{t(x)}} {\Gamma(n+\lambda)},
-</math>
-the same as the distribution generated by the Chinese restaurant process
-with parameter <math> \lambda = e^\theta</math> .
-The associated marginal distribution on partitions,
-the Ewens distribution on partitions of <math> [n]</math> ,
-is also of the exponential-family form with canonical statistic equal
-to the number of blocks or cycles.
-This number <math> t(x)</math> is a random variable whose cumulants are the
-derivatives of <math> \log M(\cdot)</math> evaluated at the parameter <math> \theta</math> .
-In the multi-parameter case,
-<math> t\colon{\mathcal X}\to{\mathcal R}^p</math> is a random vector
-and <math> \xi\colon{\mathcal R}^p\to{\mathcal R}</math> is a linear functional,
-<math> M(\xi) = E(e^{\xi(t)})</math> is the joint moment generating function.
-It is sometimes convenient to employ Einstein's implicit summation convention
-in the form <math> \theta(t) = \theta_i t^i</math> where <math> t^1,\ldots, t^p</math> are
-the components of <math> t(x)</math> , and <math> \theta_1,\ldots, \theta_p</math> are the coefficients
-of the linear functional.
-For simplicity of notation in what follows, <math> {\mathcal X}={\mathcal R}^p</math> and <math> t(x) = x</math>
-is the identity function.
-An exponential-family distribution in <math> {\mathcal R}^p</math> has the form
-:<math>
-f_\theta(x)=\exp(x^j\theta_j-g(x)-\varphi(\theta))
-</math>
-for given functions <math> g</math> and <math> \varphi</math> .
-Integration shows that the distribution <math> f_\theta</math> has
-cumulant generating function <math> K_\theta(\xi)=\varphi(\theta+\xi)-\varphi(\theta)</math> .
-The cumulants of <math> X\sim f_\theta</math> are equal to the derivatives of <math> \varphi</math>
-at the parameter <math> \theta</math> .
-===Calculus of cumulants===
-The umbral calculus is a syntax or formal system consisting of
-certain operations on objects called umbrae,
-mimicking addition and multiplication of independent real-valued random
-variables.  Rota and Taylor (1994) review this calculus.
-To each real-valued sequence <math> 1, a_1, a_2,\ldots</math>
-there corresponds an umbra <math> \alpha</math> such that <math> E(\alpha^r) = a_r</math> .
-This freedom gives rise to special umbrae, the singleton and Bell umbra,
-corresponding to no real-valued random variable.
-Using these special umbrae, one develops the formal notion of an
-<math>\alpha</math>-cumulant umbra
-<math>\chi\cdot\alpha</math>
-by formal product operations in the syntax.
-Properties of cumulants, <math> k</math> -statistics and other polynomial functions
-are then derived by purely formal combinatorial operations.
-Di Nardo et al. (2008) present details.
-Streitberg (1990) presents parallels between the calculus of cumulants and the
-calculus of certain decompositions of multivariate cumulative distribution
-functions into independent segments; these characterizations in terms of
-independent segments are called Lancaster interactions.
-===Moment and Cumulant Measures for Random Measures===
-Moments and cumulants extend quite naturally to random distributions.
-Let <math>\upsilon</math> be a random measure on a space <math>\Upsilon</math>.
-Then the expectation of <math>\upsilon</math> is
-defined as that measure such that <math>E(\upsilon)(A)=E(\upsilon(A))</math>, for <math>A</math> in a suitable sigma field.  Higher--order
-moments then translate to expectations of product measures.
-Let <math>\upsilon^{(k)}</math> be the measure defined on
-<math>\Upsilon^k</math>, such that
-<math>\upsilon^{(k)}(A_1\times\cdots\times A_k)=\prod_{j=1}^k\upsilon(A_j)</math>.
-Then the moment of order <math>k</math> of <math>\upsilon</math> is <math>E(\upsilon^{(k)})</math>.
-A moment generating functional can similarly be defined for <math>\upsilon</math>; a heuristic definition may be constructed through analogy with
-(<ref>powerseries</ref>): Let
-:<math>
-\Phi(f)=\sum_{r=0}^\infty f(x_1)\ldots f(x_r)\upsilon^{(r)}(d x_1\cdots d x_r)/r!,
-</math>
-for certain functions <math>f</math> on <math>\Upsilon</math>,
-and moments can be recovered from <math>\Phi(f)</math> via Fr\'echet
-differentiation.
-Cumulants can then be defined as in (<ref>forward</ref>), using the obvious analogy.
-These moments and cumulants have application to the theory of point processes.
-The above exposition, and applications to the theory of point processes,
-can be found in Daley and Vere-Jones (1988).
-==Approximation of distributions==
-===Edgeworth approximation===
-Suppose that
-<math>Y</math> is a random variable that arises as the sum
-of
-<math>n</math> independent and identically-distributed summands, each of which has
-mean
-<math>0</math>, unit variance, and
-cumulants
-<math>\kappa_r</math>, and
-<math>X=Y/\sqrt{n}</math>.
-For ease of exposition, assume that cumulants of all orders exist.
-Then, using (<ref>ndep</ref>), the cumulant generating function of
-<math>X</math> is given by
-<math>K(\xi)=\xi^2/2 +\kappa_3\xi^3/(6\sqrt{n}) +\kappa_4\xi^4/(24 n) +\cdots</math>,
-and the moment generating function of
-<math>X</math> is given by
-:<math>
-M(\xi)=\exp(\xi^2/2)\exp(\kappa_3\xi^3/(6\sqrt{n})+\kappa_4\xi^4/(24 n)+\cdots)
-</math>
-Exponentiating the second factor gives
-:<math>
-M(\xi)=\exp(\xi^2/2)\left(1\!+\!{{\kappa_3\xi^3}\over{6\sqrt{n}}}\!+\!{{\kappa_4\xi^4}\over{24 n}}\!+\!\cdots\!+\! {\textstyle{\frac12}} \left[
-{{\kappa_3\xi^3}\over{6\sqrt{n}}}\!+\!{{\kappa_4\xi^4}\over{24 n}}\!+\!\cdots\right]^2\!+\!\!\cdots\right).
-</math>
-Reordering terms in powers of sample size,
-:<math  kseries>
-=\exp(\xi^2/2)\left(1+{{\kappa_3\xi^3}\over{6\sqrt{n}}}+{{\kappa_4\xi^4}\over{24 n}}+
-{{\kappa_3^2\xi^6}\over{72 n}}+\cdots\right).
-</math>
-Repeated application of integration by parts to (<ref>mgfdef</ref>) shows that
-:<math  mgfderiv>
-\xi^r M(\xi) =\int_{-\infty}^\infty\exp(\xi x)(-1)^r f^{(r)}(x) d x,
-</math>
-where
-<math>f^{(r)}</math> denotes the derivative of
-<math>f</math> of order
-<math>r</math>.  Relation
-(<ref>mgfderiv</ref>) holds if
-<math>f</math> and its derivatives go to zero quickly
-as
-<math>\vert x\vert\to\infty</math>.  Applying (<ref>mgfderiv</ref>) to the normal
-density
-<math>\phi(x)=\exp(-x^2/2)/\sqrt{2\pi}</math>, and applying the result to
-(<ref>kseries</ref>), gives
-:<math>
-M(\xi)\approx\int_{-\infty}^\infty\exp(\xi x)\phi(x)\left[1+{{\kappa_3 h^3(x)}\over{6\sqrt{n}}}+{{\kappa_4h^4(x)}\over{24 n}}+
-{{\kappa_3^2h^6(x)}\over{72 n}}\right] d x
-</math>
-for
-<math>h^r(x)=(-1)^r\phi^{(r)}(x)/\phi(x)</math>, and, since the relationship
-giving the moment generating function in terms of the density is invertible,
-and the inversion process is suitably smooth,
-Edgeworth (1907) approximates the density of
-<math>X</math> by
-:<math  edser>
-e_4(x)=\phi(x)\left[1+{{\kappa_3 h^3(x)}\over{6\sqrt{n}}}+{{\kappa_4h^4(x)}\over{24 n}}+
-{{\kappa_3^2h^6(x)}\over{72 n}}\right].
-</math>
-In fact, when the summands contributing to
-<math>Y</math> have a density and cumulants of order at least 5, the error in the
-approximation, multiplied by
-<math>n^{3/2}</math>, remains bounded.
-The functions
-<math>h^r</math> defined above are the Hermite polynomials.
-The approximation (<ref>edser</ref>) is known as the Edgeworth series.
-The subscript refers to the number of cumulants used in its definition.
-This series can be used to approximate either the cumulative distribution function or survival function through term-wise integration.
-The preceding discussion is intended to be heuristic; Kolassa (2006) presents
-a rigorous derivation, along with the natural extension to random vectors.
-===Saddlepoint approximation===
-The approximation (<ref>edser</ref>) to the density
-<math>f(x)</math> has the property that
-<math>|f(x)-e_r(x)|\leq C n^{-(r-1)/2}</math>, for some constant
-<math>C</math>,
-when the cumulant of order
-<math>r+1</math> exists;
-<math>C</math> does not depend on
-<math>x</math>.
-A similar bound holds for the relative error
-<math>(f(x)-e_r(x))/f(x)</math>, only when
-<math>x</math> is restricted to a finite interval.
-Because of the polynomial factor multiplying the first omitted term in
-(<ref>edser</ref>), the relative error can be expected to behave poorly.
-One might prefer an approximation that maintains good behavior for
-values of
-<math>X</math> in a range that increases as
-<math>n</math> increases; specifically,
-one might prefer an approximation that performs well for values of
-<math>\bar Y=X/\sqrt{n}</math> in a fixed interval.
-Assume again that random variables
-<math>Y_j</math> are independent and identically distributed, each with a cumulant generating function
-<math>K(\xi)</math> finite for
-<math>\xi</math>
-in a neighborhood of
-<math>0</math>.  As above, define the exponential family
-:<math>
-f_{\bar Y}(\bar y;\theta)=\exp(\theta\bar y-K(\theta))f_{\bar Y}(\bar y).
-</math>
-One can then choose a value of
-<math>\theta</math> depending on
-<math>\bar y</math>
-that makes
-<math>f_{\bar Y}(\bar y;\theta)</math> easy to approximate, and
-the exponential family relationship to derive an approximation for
-<math>f_{\bar Y}(\bar y)</math>.  Conventionally we choose
-<math>\hat\theta</math> to
-satisfy
-:<math  speqn>
-K'(\hat\theta)=\bar y;
-</math>
-this makes the expectation of the distribution
-with density
-<math>f_{\bar Y}(\cdot;\hat\theta)</math> equal to the observed value.
-One then applies (<ref>edser</ref>), with the scale of the ordinate changed
-to reflect the fact that we are approximating the distribution of
-<math>X/\sqrt{n}</math>,
-to obtain
-:<math>
-f_{\bar Y}(\bar y)\approx\exp(-\hat\theta\bar y+K(\hat\theta))
-n\phi(0)\left[1+{{\kappa_3 h^3(0)}\over{6\sqrt{n}}}+{{\kappa_4h^4(0)}\over{24 n}}+
-{{\kappa_3^2h^6(0)}\over{72 n}}\right].
-</math>
-Using the fact that <math>h^3(0)=0</math>,
-<math>h^4(0)=3</math>, and <math>h^6(0)=-15</math>,
-we obtain :<math  spser>
-f_{\bar Y}(\bar y)\approx{{n}\over{\sqrt{2\pi}}}
-\exp(K(\hat\theta)-\hat\theta\bar y)
-\left[1+{{\hat\kappa_4}\over{8 n}}-
-{{5\hat\kappa_3^2}\over{24 n}}\right].
-</math>
-Here
-<math>\hat\kappa_j</math> are calculated from the derivatives of
-<math>K</math> in the preceding manner, but in this case evaluated at
-<math>\hat\theta</math>.
-This approximation may only be applied to values of
-<math>\bar y</math> for which
-(<ref>speqn</ref>) has solutions in an open neighborhood of 0.
-Expression (<ref>spser</ref>) represents the saddlepoint approximation to
-the density of the mean
-<math>\bar Y</math>; since
-<math>f_{\bar Y}(\bar y;\theta)</math>
-has a cumulant generating function defined on an open set containing
-<math>0</math>,
-cumulants of all orders exist, the Edgeworth series including
-<math>\kappa_6</math>
-may be applied to
-<math>f_{\bar Y}(\bar y;\theta)</math>, and so the error in the
-Edgeworth series is of order
-<math>O(1/n^2)</math>.  Hence the error in (<ref>spser</ref>)
-is of the same order, and in this case, is relative and uniform for values of
-<math>\bar y</math> in a bounded subset of an open subset on which (<ref>speqn</ref>)
-has a solution.
-This approximation was introduced to the statistics literature by
-Daniels (1954).
-The Edgeworth series for the density was trivially integrated to obtain an
-approximation to tail probabilities.  Integration of the saddlepoint
-approximation is more delicate.  Two main approaches have been investigated.
-Daniels (1987) expresses
-<math>f_{\bar Y}(\bar y)</math> exactly as a complex integral
-involving
-<math>K(\xi)</math>, integrates with respect to
-<math>\bar y</math> to obtain another
-complex integral, and reviews techniques for approximating the resulting
-integrals.
-Robinson (1982) and Lugannani and Rice (1980) derive tail probability approximations based
-on approximately integrating (<ref>spser</ref>) with respect to
-<math>\bar y</math> directly.
-These saddlepoint and Edgeworth approximations have multivariate and
-conditional extensions.  Davison (1988) exploits the conditional saddlepoint tail probability approximation to perform inference in canonical exponential families.
-==Samples and sub-samples==
-A function
-<math>f\colon{\mathcal R}^n\to{\mathcal R}</math> is symmetric if
-<math>f(x_1 ,\ldots, x_n) = f(x_{\pi(1)} ,\ldots, x_{\pi(n)})</math>
-for each permutation
-<math>\pi</math> of the arguments.
-For example, the total
-<math>T_n = x_1 + \cdots + x_n</math>, the average
-<math>T_n/n</math>,
-the min, max and median are symmetric functions, as are the sum of squares
-<math>S_n = \sum x_i^2</math>, the sample variance
-<math>s_n^2 = (S_n - T_n^2/n)/(n-1)</math>
-and the mean absolute deviation
-<math>\sum |x_i - x_j|/(n(n-1))</math>.
-A vector
-<math>x</math> in
-<math>{\mathcal R}^n</math> is an ordered list of
-<math>n</math> real numbers
-<math>(x_1 ,\ldots, x_n)</math>
-or a function
-<math>x\colon[n]\to{\mathcal R}</math> where
-<math>[n]=\{1 ,\ldots, n\}</math>.
-For
-<math>m \le n</math>, a 1--1 function
-<math>\varphi\colon[m]\to[n]</math> is a sample of size
-<math>m</math>,
-the sampled values being
-<math>x\varphi = (x_{\varphi(1)} ,\ldots, x_{\varphi(m)})</math>.
-All told, there are
-<math>n(n-1)\cdots(n-m+1)</math> distinct samples of size
-<math>m</math>
-that can be taken from a list of length
-<math>n</math>.
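The count of samples can be checked by brute force. The following is a minimal sketch in Python, not part of the original article; the helper name `samples` is hypothetical. It treats a sample as an ordered selection of positions without replacement, which is what a one-to-one map from [m] to [n] amounts to.

```python
from itertools import permutations
from math import perm  # available in Python 3.8+


def samples(x, m):
    """Return every sampled tuple (x[phi(1)], ..., x[phi(m)]) for one-to-one phi.

    itertools.permutations selects by position, so repeated values in x
    still give distinct samples.
    """
    return list(permutations(x, m))


x = (0, 1, 3)
pairs = samples(x, 2)
print(pairs)
# The number of samples equals n(n-1)...(n-m+1), here 3 * 2.
print(len(pairs), perm(len(x), 2))
```

Ordered samples are used here to match the definition above; for a symmetric function, averaging over unordered subsets gives the same result.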
-A ''sequence'' of functions
-<math>f_n\colon{\mathcal R}^n\to{\mathcal R}</math> is
-consistent under sub-sampling if, for each pair
-<math>m \le n</math>,
-:<math>
-f_n(x) = {\rm ave} _\varphi f_m(x\varphi),
-</math>
-where
-<math>{\rm ave} _\varphi</math> denotes the average over samples of size
-<math>m</math>.
-For
-<math>m=n</math>, this condition implies only that
-<math>f_n</math> is a symmetric function.
-Although the total and the median are both symmetric functions, neither is consistent
-under sub-sampling.
-For example, the median of the numbers
-<math>(0,1,3)</math> is one,
-but the average of the medians of samples of size two is 4/3.
-However, the average
-<math>\bar x_n = T_n/n</math> is sampling consistent.
-Likewise the sample variance
-<math>s_n^2 = \sum(x_i - \bar x_n)^2/(n-1)</math> with divisor
-<math>n-1</math>
-is sampling consistent,
-but the mean squared deviation
-<math>\sum(x_i - \bar x_n)^2/n</math> with divisor
-<math>n</math> is not.
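These claims can be verified exactly on the list (0, 1, 3). The sketch below is illustrative only; all function names are hypothetical, and exact rational arithmetic via `fractions.Fraction` is an implementation choice made here to avoid rounding error.

```python
from fractions import Fraction
from itertools import permutations


def ave_over_samples(f, x, m):
    """Average f over all ordered samples of size m drawn without replacement."""
    values = [f(s) for s in permutations(x, m)]
    return sum(values, Fraction(0)) / len(values)


def mean(x):
    return Fraction(sum(x), len(x))


def median(x):
    s, n = sorted(x), len(x)
    mid = n // 2
    return Fraction(s[mid]) if n % 2 else Fraction(s[mid - 1] + s[mid], 2)


def sample_variance(x):
    """Sum of squared deviations with divisor n - 1."""
    xbar = mean(x)
    return sum((xi - xbar) ** 2 for xi in x) / (len(x) - 1)


def mean_squared_deviation(x):
    """Sum of squared deviations with divisor n."""
    xbar = mean(x)
    return sum((xi - xbar) ** 2 for xi in x) / len(x)


x = (0, 1, 3)
for name, f in [("total", sum), ("median", median), ("mean", mean),
                ("sample variance", sample_variance),
                ("mean squared deviation", mean_squared_deviation)]:
    full = Fraction(f(x))
    averaged = ave_over_samples(f, x, 2)
    # A statistic is sampling consistent only if the two values agree.
    print(f"{name}: f_3(x) = {full}, ave f_2 = {averaged}, equal: {full == averaged}")
```

For this list the mean and the sample variance agree with their sub-sample averages, while the total, the median and the mean squared deviation do not. A single example can refute consistency but cannot prove it in general.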
-Other sampling consistent functions include Fisher's
-<math>k</math>-statistics,
-the first few of which are
-<math>k_{1,n} = \bar x_n</math>,
-<math>k_{2,n} = s_n^2</math> for
-<math>n\ge 2</math>, and
-<math>k_{3,n} = n\sum(x_i - \bar x_n)^3/((n-1)(n-2))</math>,
-defined for
-<math>n\ge 3</math>.
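As an illustration, the third k-statistic can be computed directly from the formula above and its sampling consistency checked on a small list. This is a sketch under the assumption that the formula is applied as written; the function names `k3` and `ave_over_samples` and the test list (0, 1, 3, 7) are hypothetical choices made for the example.

```python
from fractions import Fraction
from itertools import permutations


def k3(x):
    """Third k-statistic: n * sum((x_i - xbar)^3) / ((n - 1)(n - 2)), for n >= 3."""
    n = len(x)
    if n < 3:
        raise ValueError("k3 requires at least three observations")
    xbar = Fraction(sum(x), n)
    return n * sum((xi - xbar) ** 3 for xi in x) / ((n - 1) * (n - 2))


def ave_over_samples(f, x, m):
    """Average f over all ordered samples of size m drawn without replacement."""
    values = [f(s) for s in permutations(x, m)]
    return sum(values, Fraction(0)) / len(values)


x = (0, 1, 3, 7)
print(k3(x))                       # k_{3,4} computed on the full list
print(ave_over_samples(k3, x, 3))  # average of k_{3,3} over sub-samples of size 3
```

Both printed values equal 135/4 for this list, in agreement with the consistency property.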
-For a sequence of independent and identically distributed random variables,
-the
-<math>k</math>-statistic of order
-<math>r\le n</math> is the unique symmetric function
-such that
-<math>E(k_{r,n}) = \kappa_r</math>.
-Fisher (1929) derived the variances and covariances.
-The connection with finite-population sub-sampling was developed by
-Tukey (1950).
-==References==
-*D. J. Daley and D. Vere-Jones. ''An Introduction to the Theory of Point Processes''. Springer-Verlag, New York, 1988.
-*H. E. Daniels. Saddlepoint approximations in statistics. ''The Annals of Mathematical Statistics'', 25  (4): 631--650, 1954.
-*H. E. Daniels. Tail probability approximations. ''Review of the International Statistical Institute'', 55:  37--46, 1987.
-*A. C. Davison. Approximate conditional inference in generalized linear models. ''Journal of the Royal Statistical Society Series B'', 50:  445--461, 1988.
-*E. Di Nardo, G. Guarino, and D. Senato. A unifying framework for ''k''-statistics, polykays and their multivariate generalizations. ''Bernoulli'', 14: 440--468, 2008.
-*P. L. Dressel. Statistical seminvariants and their estimates with particular emphasis on their relation to algebraic invariants. ''The Annals of Mathematical Statistics'', 11  (1): 33--57, 1940.
-*F. Y. Edgeworth. On the representation of statistical frequency by a series. ''Journal of the Royal Statistical Society'', 70  (1): 102--106, 1907.
-*R. A. Fisher. Moments and product moments of sampling distributions. ''Proceedings of the London Mathematical Society, Series 2'',  30: 199--238, 1929.
-*I. J. Good. A new formula for k-statistics. ''The Annals of Statistics'', 5 (1): 224--228,  1977.
-*A. Hald. The early history of cumulants and the Gram-Charlier series. ''International Statistical Review'', 68: 137--153, 2000.
-*C. C. Heyde. On a property of the lognormal distribution. ''Journal of the Royal Statistical Society. Series B  (Methodological)'', 25 (2): 392--393, 1963.
-*J. E. Kolassa. ''Series Approximation Methods in Statistics''. Springer-Verlag, New York, 2006.
-*S. L. Lauritzen, editor. ''Thiele: pioneer in statistics''. Oxford University Press, New York, 2002.
-*R. Lugannani and S. Rice. Saddle point approximation for the distribution of the sum of  independent random variables. ''Advances in Applied Probability'', 12: 475--490, 1980.
-*J. Marcinkiewicz. Sur une propriété de la loi de Gauss. ''Mathematische Zeitschrift'', 44: 612--618, 1939.
-*J. Robinson. Saddlepoint approximations for permutation tests and confidence  intervals. ''Journal of the Royal Statistical Society. Series B  (Methodological)'', 44 (1): 91--101, 1982.
-*G.-C. Rota and B. D. Taylor. The classical umbral calculus. ''SIAM Journal on Mathematical Analysis'', 25 (2): 694--711, 1994.
-*B. Streitberg. Lancaster interactions revisited. ''The Annals of Statistics'', 18 (4): 1878--1885,  1990.
-*T. N. Thiele. ''Almindelig Iagttagelseslaere: Sandsynlighedsregning og mindste  Kvadraters Methode''. C. A. Reitzel, Copenhagen, 1889.
-*T. N. Thiele. ''Theory of Observations''. C. & E. Layton, London, 1903.
-*J. W. Tukey. Some sampling simplified. ''Journal of the American Statistical Association'', 45  (252): 501--519, 1950.