|
|
| Line 1: |
Line 1: |
| − | This article describes a sequence of numbers, called <strong> cumulants</strong>,
| |
| − | that are used
| |
| − | to describe, and in some circumstances approximate, a univariate or multivariate
| |
| − | distribution. Cumulants are not unique in this role; other sequences, such as
| |
| − | moments and their generalizations, may also be used in both roles.
| |
| − | Cumulants have multiple advantages over competitors, in that cumulants change
| |
| − | in a very simple way when the underlying random variable is subject to an
| |
| − | affine transformation, cumulants for sums of independent random variables have
| |
| − | a very simple relationship to the cumulants of the addends, and cumulants may
| |
| − | be used in a simple way to describe the difference between a distribution and
| |
| − | its simplest Gaussian approximation.
| |
| − | ==Overview and Definitions==
| |
| − | ===Definition===
| |
| − | The moment of order
| |
| − | <math>r</math> (or
| |
| − | <math>r</math>th moment) of a real-valued random variable
| |
| − | <math>X</math> is
| |
| | <math> | | <math> |
| − | \mu_r = E(X^r) | + | \begin{array} |
| | + | a&=&b |
| | + | \end{array} |
| | </math> | | </math> |
| − | for integer
| |
| − | <math>r=0,1,\ldots</math>.
| |
| − | The value is assumed to be finite.
| |
| − | Provided that it has a Taylor expansion about the origin,
| |
| − | the moment generating function (or Fourier--Laplace transform)
| |
| − | <math powerseries>
| |
| − | M(\xi) = E(e^{\xi X})
| |
| − | = E(1 + \xi X +\cdots + \xi^r X^r/r!+\cdots)
| |
| − | = \sum_{r=0}^\infty \mu_r \xi^r/r!
| |
| − | </math>
| |
| − | is an easy way to combine all of the moments into a single expression.
| |
| − | The
| |
| − | <math>r</math>th moment is hence the
| |
| − | <math>r</math>th derivative of
| |
| − | <math>M</math> at the origin.
| |
| − | This definition is due to Fisher (1929).
| |
| − |
| |
| − | When
| |
| − | <math>X</math> has a distribution given by a density
| |
| − | <math>f</math>, then
| |
| − | <math ctsmomdef>
| |
| − | \mu_r = \int_{-\infty}^\infty x^r f(x)\, dx </math>, and
| |
| − | <math mgfdef>
| |
| − | M(\xi) = E(e^{\xi X}) =\int_{-\infty}^\infty\exp(\xi x) f(x)~d x.
| |
| − | </math>
| |
| − |
| |
| − | The cumulants
| |
| − | <math>\kappa_r</math> are the coefficients in the Taylor expansion of
| |
| − | the cumulant generating function about the origin
| |
| − | <math>
| |
| − | K(\xi) = \log M(\xi) = \sum_{r} \kappa_r \xi^r/r!.
| |
| − | </math>
| |
| − | Evidently
| |
| − | <math>\mu_0 = 1</math> implies
| |
| − | <math>\kappa_0 = 0</math>.
| |
| − | The relationship between the first few moments and cumulants,
| |
| − | obtained by extracting coefficients from the expansion, is as follows
| |
| − | \begin{eqnarray}\label{forward}
| |
| − | \left.
| |
| − | \parbox{10cm}{
| |
| − | \begin{eqnarray*}
| |
| − | \kappa_1 &=& \mu_1 \\
| |
| − | \kappa_2 &=& \mu_2 - \mu_1^2\\
| |
| − | \kappa_3 &=& \mu_3 - 3\mu_2\mu_1 + 2\mu_1^3\\
| |
| − | \kappa_4 &=& \mu_4 - 4\mu_3\mu_1 - 3\mu_2^2 + 12\mu_2\mu_1^2 -6\mu_1^4.
| |
| − | \end{eqnarray*}}\right\}
| |
| − | \end{eqnarray}
| |
| − | In the reverse direction
| |
| − | \begin{eqnarray}\label{reverse}
| |
| − | \left.
| |
| − | \parbox{10cm}{
| |
| − | \begin{eqnarray*}
| |
| − | \mu_2 &=& \kappa_2 + \kappa_1^2\\
| |
| − | \mu_3 &=& \kappa_3 + 3\kappa_2\kappa_1 + \kappa_1^3\\
| |
| − | \mu_4 &=& \kappa_4 + 4\kappa_3\kappa_1 + 3\kappa_2^2 + 6\kappa_2\kappa_1^2 + \kappa_1^4.
| |
| − | \end{eqnarray*}}\right\}
| |
| − | \end{eqnarray}
| |
| − | In particular,
| |
| − | <math>\kappa_1 = \mu_1</math> is the mean of~
| |
| − | <math>X</math>,
| |
| − | <math>\kappa_2</math>~is the
| |
| − | variance, and
| |
| − | <math>\kappa_3 = E((X - \mu_1)^3)</math>.
| |
| − | Higher-order cumulants are not the same as moments about the mean.
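The conversions in both directions can be automated. The following is a minimal sketch, assuming the standard recursion <math>\mu_r = \sum_{j=1}^{r} \binom{r-1}{j-1} \kappa_j \mu_{r-j}</math> obtained by differentiating <math>M = \exp(K)</math>; the function names are illustrative and not part of the article.

```python
from math import comb


def cumulants_from_moments(mu):
    """Given mu[r] = E(X^r) for r = 0..R (mu[0] == 1), return kappa[0..R]."""
    kappa = [0.0] * len(mu)
    for r in range(1, len(mu)):
        # Solve mu_r = sum_{j=1}^{r} C(r-1, j-1) kappa_j mu_{r-j} for kappa_r.
        kappa[r] = mu[r] - sum(
            comb(r - 1, j - 1) * kappa[j] * mu[r - j] for j in range(1, r)
        )
    return kappa


def moments_from_cumulants(kappa):
    """Given kappa[0..R] with kappa[0] == 0, return the moments mu[0..R]."""
    mu = [1.0] * len(kappa)
    for r in range(1, len(kappa)):
        mu[r] = sum(
            comb(r - 1, j - 1) * kappa[j] * mu[r - j] for j in range(1, r + 1)
        )
    return mu


# Moments of the Poisson distribution with mean 2, orders 0 to 4.
poisson_moments = [1, 2, 6, 22, 94]
print(cumulants_from_moments(poisson_moments))
```

For the Poisson moments used here, the explicit formulae above give the same values, for instance <math>\kappa_2 = 6 - 2^2 = 2</math>.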
| |
| − |
| |
| − | ===Definitions under less restrictive conditions===
| |
| − | The Cauchy distribution with density <math> \pi^{-1}/(1+x^2)</math> has no moments because
| |
| − | the integral (<ref>ctsmomdef</ref>) does not converge for any integer~<math> r\ge 1</math>.
| |
| − | Student's~<math> t</math> distribution on five degrees of freedom is symmetric with density
| |
| − | <math> (3\pi\surd5/8)^{-1}/(1 + x^2/5)^3</math>.
| |
| − | The first four moments are <math> 0, 5/3, 0, 25</math> : higher-order moments are
| |
| − | not defined.
| |
| − | The cumulants up to order four are defined by (<ref>forward</ref>)
| |
| − | even though the moment generating function (<ref>powerseries</ref>) does not exist
| |
| − | for any real <math> \xi\neq 0</math> .
| |
| − |
| |
| − | In both of these cases, the characteristic function <math> M(i\xi)</math> is
| |
| − | well-defined for real <math> \xi</math> ,
| |
| − | <math> \exp(-|\xi|)</math> for the Cauchy distribution,
| |
| − | and <math> \exp(-|\xi|\surd 5)(1 + |\xi|\surd5 + 5\xi^2/3)</math> for <math> t_5</math> .
| |
| − | In the latter case, both <math> M(i\xi)</math> and <math> K(i\xi)</math>
| |
| − | have Taylor expansions up to order four only, so the moments and
| |
| − | cumulants are defined only up to this order.
| |
| − | The infinite expansion (<ref>powerseries</ref>) is justified when
| |
| − | the radius of convergence is positive, in which case <math> M(\xi)</math> is finite on
| |
| − | an open set containing zero, and all moments and cumulants are finite.
| |
| − | However, finiteness of the moments does not imply that <math> M(\xi)</math>
| |
| − | exists for any <math> \xi\neq 0</math> .
| |
| − | The log normal distribution provides a counterexample.
| |
| − | It has finite moments <math> \mu_r = e^{r^2/2}</math> of all orders,
| |
| − | but (<ref>powerseries</ref>) diverges for every~<math> \xi\neq 0</math>.
| |
| − | ===Uniqueness===
| |
| − | The normal distribution
| |
| − | <math>N(\mu, \sigma^2)</math> has cumulant generating function
| |
| − | <math>\xi\mu + \xi^2 \sigma^2/2</math>, a quadratic polynomial implying that all cumulants
| |
| − | of order three and higher are zero.
| |
| − | Marcinkiewicz (1939) showed that the normal distribution is the only distribution
| |
| − | whose cumulant generating function is a polynomial, i.e.~the only distribution
| |
| − | having a finite number of non-zero cumulants.
| |
| − | The Poisson distribution with mean
| |
| − | <math>\mu</math> has moment generating function
| |
| − | <math>\exp(\mu(e^\xi - 1))</math> and cumulant generating function
| |
| − | <math>\mu(e^\xi -1)</math>.
| |
| − | Consequently all the cumulants are equal to the mean.
| |
| − |
| |
| − | Two distinct distributions may have the same moments, and hence the same cumulants.
| |
| − | This statement is fairly obvious for distributions whose moments are all infinite,
| |
| − | or even for distributions having infinite higher-order moments.
| |
| − | But it is much less obvious for distributions having finite moments of all orders.
| |
| − | Heyde (1963) gave one such pair of distributions with densities
| |
| − | <math>
| |
| − | f_1(x) = \exp(-(\log x)^2/2) / (x\sqrt{2\pi})
| |
| − | </math>
| |
| − | and <math>
| |
| − | f_2(x) = f_1(x) [1 + \sin(2\pi\log x)/2]
| |
| − | </math>
| |
| − | for
| |
| − | <math>x > 0</math>.
| |
| − | The first of these is called the log normal distribution.
| |
| − | To show that these distributions have the same moments it suffices to show that
| |
| − | <math>
| |
| − | \int_0^\infty x^k f_1(x) \sin(2\pi\log x)\, dx = 0
| |
| − | </math>
| |
| − | for integer
| |
| − | <math>k\ge 1</math>, which can be shown by making the substitution
| |
| − | <math>\log x = y+k</math>.
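The vanishing of this integral can also be checked numerically. The sketch below works in the variable <math>y = \log x</math>, where the integral becomes <math>\int e^{ky}\phi(y)\sin(2\pi y)\,dy</math> with <math>\phi</math> the standard normal density, and applies the trapezoidal rule; the function name, the integration window and the step count are assumptions made for illustration.

```python
import math


def perturbation_moment(k, half_width=12.0, steps=20_000):
    """Trapezoidal estimate of the integral of x^k f_1(x) sin(2 pi log x), x > 0.

    In the variable y = log x the integrand is
    exp(k y) * phi(y) * sin(2 pi y), which is concentrated near y = k.
    """
    lo, hi = k - half_width, k + half_width
    h = (hi - lo) / steps
    total = 0.0
    for i in range(steps + 1):
        y = lo + i * h
        weight = 0.5 if i in (0, steps) else 1.0
        total += weight * math.exp(k * y - 0.5 * y * y) * math.sin(2 * math.pi * y)
    return total * h / math.sqrt(2 * math.pi)


for k in range(5):
    # Compare with the size of the log normal moment exp(k^2 / 2).
    print(k, perturbation_moment(k), math.exp(k * k / 2))
```

The estimates should be negligible relative to the log normal moments <math>e^{k^2/2}</math>, in agreement with the substitution argument.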
| |
| − |
| |
| − | If the sequence of moments is such that (<ref>powerseries</ref>)
| |
| − | has a positive radius of convergence, the distribution is uniquely determined by its moments.
| |
| − |
| |
| − | ===Properties===
| |
| − | Cumulants of order
| |
| − | <math>r \ge 2</math> are called semi-invariant on account of their
| |
| − | behavior under affine transformation of variables (Thiele, 1903; Dressel, 1940).
| |
| − | If
| |
| − | <math>\kappa_r</math> is the
| |
| − | <math>r</math>th cumulant of
| |
| − | <math>X</math>,
| |
| − | the
| |
| − | <math>r</math>th cumulant of the affine transformation
| |
| − | <math>a + b X</math> is
| |
| − | <math>b^r \kappa_r</math>,
| |
| − | independent of~
| |
| − | <math>a</math>.
| |
| − | This behavior is considerably simpler than that of moments.
| |
| − | However, moments about the mean are also semi-invariant, so this property alone
| |
| − | does not explain why cumulants are useful for statistical purposes.
| |
| − |
| |
| − | The term cumulant was coined by Fisher (1929) on account of their behavior under
| |
| − | addition of random variables.
| |
| − | Let
| |
| − | <math>S = X+Y</math> be the sum of two independent random variables.
| |
| − | The moment generating function of the sum is the product
| |
| − | <math>
| |
| − | M_S(\xi) = M_X(\xi) M_Y(\xi),
| |
| − | </math>
| |
| − | and the cumulant generating function is the sum
| |
| − | <math>
| |
| − | K_S(\xi) = K_X(\xi) + K_Y(\xi).
| |
| − | </math>
| |
| − | Consequently, the
| |
| − | <math>r</math>th cumulant of the sum is the sum of the
| |
| − | <math>r</math>th cumulants.
| |
| − | By extension, if
| |
| − | <math>X_1,\ldots, X_n</math> are independent and identically distributed,
| |
| − | the
| |
| − | <math>r</math>th cumulant of the sum is
| |
| − | <math>n\kappa_r</math>.
| |
| − | Let
| |
| − | <math>\kappa_{n;r}</math> be
| |
| − | the cumulant of order
| |
| − | <math>r</math> of the standardized sum
| |
| − | <math>n^{-1/2}(X_1+\cdots + X_n)</math>;
| |
| − | then
| |
| − | <math ndep>
| |
| − | \kappa_{n;r}=n^{1-r/2} \kappa_r.
| |
| − | </math>
| |
| − | Provided that the cumulants are finite, all cumulants of order
| |
| − | <math>r\ge 3</math>
| |
| − | of the standardized sum tend to zero, which is a simple demonstration of the central limit theorem.
| |
| − |
| |
| − | Good (1977) obtained an expression for the
| |
| − | <math>r</math>th cumulant of
| |
| − | <math>X</math> as
| |
| − | the
| |
| − | <math>r</math>th moment of the discrete Fourier transform of an independent and
| |
| − | identically distributed sequence as follows.
| |
| − | Let
| |
| − | <math>X_1, X_2,\ldots</math> be independent copies of~
| |
| − | <math>X</math> with
| |
| − | <math>r</math>th cumulant~
| |
| − | <math>\kappa_r</math>,
| |
| − | and let
| |
| − | <math>\omega = e^{2\pi i/n}</math> be a primitive
| |
| − | <math>n</math>th root of unity.
| |
| − | The discrete Fourier combination
| |
| − | <math>
| |
| − | Z = X_1 + \omega X_2 + \cdots + \omega^{n-1} X_n
| |
| − | </math>
| |
| − | is a complex-valued random variable whose distribution is invariant under
| |
| − | rotation
| |
| − | <math>Z\sim \omega Z</math> through multiples of~
| |
| − | <math>2\pi /n</math>.
| |
| − | The
| |
| − | <math>r</math>th cumulant of the sum is
| |
| − | <math>\kappa_r \sum_{j=1}^n \omega^{r j}</math>,
| |
| − | which is equal to
| |
| − | <math>n\kappa_r</math> if
| |
| − | <math>r</math> is a multiple of
| |
| − | <math>n</math>, and zero otherwise.
| |
| − | Consequently
| |
| − | <math>E(Z^r) = 0</math> for integer
| |
| − | <math>r < n</math> and
| |
| − | <math>E(Z^n) = n\kappa_n</math>.
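Good's identity can be verified exactly for a small discrete distribution. The following sketch enumerates all outcomes of <math>n</math> independent Bernoulli variables, an illustrative choice not taken from the article, and computes <math>E(Z^r)</math> directly; the function name is hypothetical.

```python
import cmath
from itertools import product


def dft_moment(p, n, r):
    """E(Z^r) for Z = X_1 + w X_2 + ... + w^(n-1) X_n, X_j iid Bernoulli(p).

    Here w = exp(2 pi i / n); the expectation is computed by enumerating
    all 2^n outcomes.
    """
    omega = cmath.exp(2j * cmath.pi / n)
    total = 0j
    for xs in product((0, 1), repeat=n):
        prob = 1.0
        for x in xs:
            prob *= p if x else 1 - p
        z = sum(x * omega**j for j, x in enumerate(xs))
        total += prob * z**r
    return total


p, n = 0.3, 3
kappa3 = p * (1 - p) * (1 - 2 * p)  # third cumulant of Bernoulli(p)
print(dft_moment(p, n, 1), dft_moment(p, n, 2))  # both numerically zero
print(dft_moment(p, n, 3), n * kappa3)           # agree up to rounding
```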
| |
| − |
| |
| − |
| |
| − | ===Multivariate cumulants===
| |
| − | Somewhat surprisingly, the relation between moments and cumulants is simpler and
| |
| − | more transparent in the multivariate case than in the univariate case.
| |
| − | Let
| |
| − | <math>X = (X^1,\ldots, X^k)</math> be the components of a random vector.
| |
| − | In a departure from the univariate notation, we write
| |
| − | <math>\kappa^r = E(X^r)</math> for the components of the mean vector,
| |
| − | <math>\kappa^{rs} = E(X^r X^s)</math> for the components of the second moment matrix,
| |
| − | <math>\kappa^{r s t} = E(X^r X^s X^t)</math> for the third moments, and so on.
| |
| − | It is convenient notationally to adopt Einstein's summation convention,
| |
| − | so
| |
| − | <math>\xi_r X^r</math> denotes the linear combination
| |
| − | <math>\xi_1 X^1 + \cdots + \xi_k X^k</math>,
| |
| − | the square of the linear combination is
| |
| − | <math>(\xi_r X^r)^2 = \xi_r\xi_s X^r X^s</math>,
| |
| − | a sum of
| |
| − | <math>k^2</math> terms, and so on for higher powers.
| |
| − | The Taylor expansion of the moment generating function
| |
| − | <math>M(\xi) = E(\exp(\xi_r X^r))</math>
| |
| − | is
| |
| − | <math>
| |
| − | M(\xi) = 1 + \xi_r \kappa^r
| |
| − | + \textstyle{\frac1{2!}} \xi_r\xi_s \kappa^{rs}
| |
| − | + \textstyle{\frac1{3!}} \xi_r\xi_s \xi_t \kappa^{r s t} +\cdots.
| |
| − | </math>
| |
| − | The cumulants are defined as the coefficients
| |
| − | <math>\kappa^{r,s}, \kappa^{r,s,t},\ldots</math>
| |
| − | in the Taylor expansion
| |
| − | <math>
| |
| − | \log M(\xi) = \xi_r \kappa^r
| |
| − | + \textstyle{\frac1{2!}} \xi_r\xi_s \kappa^{r,s}
| |
| − | + \textstyle{\frac1{3!}} \xi_r\xi_s \xi_t \kappa^{r,s,t} +\cdots.
| |
| − | </math>
| |
| − | This notation does not distinguish first-order moments from first-order cumulants,
| |
| − | but commas separating the superscripts serve to distinguish higher-order cumulants from moments.
| |
| − |
| |
| − | Comparison of coefficients reveals that each moment
| |
| − | <math>\kappa^{rs}, \kappa^{r s t},\ldots</math>
| |
| − | is a sum over partitions of the superscripts, each term in the sum being a
| |
| − | product of cumulants:
| |
| − | \begin{eqnarray*}
| |
| − | \kappa^{rs}&=&\kappa^{r,s} + \kappa^r\kappa^s\\
| |
| − | \kappa^{r s t}&=&\kappa^{r,s,t} + \kappa^{r,s}\kappa^t + \kappa^{r,t}\kappa^s + \kappa^{s,t}\kappa^r
| |
| − | + \kappa^r\kappa^s\kappa^t\\
| |
| − | &=&\kappa^{r,s,t} + \kappa^{r,s}\kappa^t[3] + \kappa^r\kappa^s\kappa^t\\
| |
| − | \kappa^{r s t u}&=&\kappa^{r,s,t,u} + \kappa^{r,s,t}\kappa^u[4] + \kappa^{r,s}\kappa^{t,u}[3]
| |
| − | + \kappa^{r,s}\kappa^t\kappa^u[6] + \kappa^r\kappa^s\kappa^t\kappa^u.
| |
| − | \end{eqnarray*}
| |
| − | Each bracketed number indicates a sum over that many distinct partitions
| |
| − | having the same block sizes, so the fourth-order moment is a sum of 15 distinct cumulant products.
| |
| − | In the reverse direction, each cumulant is also a sum over partitions of the indices.
| |
| − | Each term in the sum is a product of moments, but with coefficient
| |
| − | <math>(-1)^{\nu-1} (\nu-1)!</math>
| |
| − | where
| |
| − | <math>\nu</math> is the number of blocks:
| |
| − | \begin{eqnarray*}
| |
| − | \kappa^{r,s} &=& \kappa^{rs} - \kappa^r\kappa^s\\
| |
| − | \kappa^{r,s,t} &=& \kappa^{r s t} - \kappa^{rs}\kappa^t[3] + 2 \kappa^r\kappa^s\kappa^t\\
| |
| − | \kappa^{r,s,t,u} &=& \kappa^{r s t u} - \kappa^{r s t}\kappa^u[4] - \kappa^{rs}\kappa^{t u}[3]
| |
| − | + 2 \kappa^{rs}\kappa^t\kappa^u[6] - 6 \kappa^r\kappa^s\kappa^t\kappa^u
| |
| − | \end{eqnarray*}
| |
| − |
| |
| − | These relationships are an instance of Möbius inversion on the partition lattice.
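The sum over partitions translates directly into code. The sketch below enumerates set partitions recursively and applies the coefficient <math>(-1)^{\nu-1}(\nu-1)!</math> to each product of moments; the callable `moment`, which returns the joint moment of the variables in a block, is a hypothetical interface chosen for illustration.

```python
from math import factorial, prod


def set_partitions(items):
    """Yield every partition of the list `items` as a list of blocks."""
    if not items:
        yield []
        return
    first, rest = items[0], items[1:]
    for partition in set_partitions(rest):
        # Put `first` into each existing block in turn, or into a new block.
        for i in range(len(partition)):
            yield partition[:i] + [[first] + partition[i]] + partition[i + 1:]
        yield [[first]] + partition


def joint_cumulant(indices, moment):
    """Joint cumulant of the variables in `indices`.

    `moment(block)` must return the expected product of the variables
    whose indices are listed in `block`.
    """
    total = 0.0
    for partition in set_partitions(list(indices)):
        nu = len(partition)
        sign = (-1) ** (nu - 1) * factorial(nu - 1)
        total += sign * prod(moment(block) for block in partition)
    return total


# Scalar illustration: all four indices refer to one Poisson variable with
# mean 2, whose moments of orders 0..4 are 1, 2, 6, 22, 94.
poisson = [1, 2, 6, 22, 94]
print(joint_cumulant("rstu", lambda block: poisson[len(block)]))
```

For four indices the generator produces 15 partitions, matching the count of cumulant products in the fourth-order moment.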
| |
| − |
| |
| − | Partition notation serves one additional purpose.
| |
| − | It establishes moments and cumulants as special cases of generalized cumulants,
| |
| − | which include objects of the type
| |
| − | <math>\kappa^{r,st} = {\rm cov}(X^r, X^s X^t)</math>,
| |
| − | <math>\kappa^{rs, t u} = {\rm cov}(X^r X^s, X^t X^u)</math>, and
| |
| − | <math>\kappa^{rs, t, u}</math> with incompletely partitioned indices.
| |
| − | These objects arise very naturally in statistical work involving asymptotic
| |
| − | approximation of distributions.
| |
| − | They are intermediate between moments and cumulants, and have characteristics of both.
| |
| − |
| |
| − | Every generalized cumulant can be expressed as a sum of certain products of ordinary cumulants.
| |
| − | Some examples are as follows:
| |
| − | \begin{eqnarray*}
| |
| − | \kappa^{rs, t} &=& \kappa^{r,s,t} + \kappa^r\kappa^{s,t} + \kappa^s \kappa^{r,t}\\
| |
| − | &=& \kappa^{r,s,t} + \kappa^r\kappa^{s,t}[2]\\
| |
| − | \kappa^{rs,t u} &=& \kappa^{r,s,t,u} + \kappa^{r,s,t}\kappa^u[4] + \kappa^{r,t}\kappa^{s,u}[2]
| |
| − | + \kappa^{r,t}\kappa^s\kappa^u[4]\\
| |
| − | \kappa^{rs,t,u} &=& \kappa^{r,s,t,u} + \kappa^{r,t,u}\kappa^s[2] + \kappa^{r,t}\kappa^{s,u}[2]
| |
| − | \end{eqnarray*}
| |
| − | Each generalized cumulant is associated with a partition
| |
| − | <math>\tau</math> of the given set of indices.
| |
| − | For example,
| |
| − | <math>\kappa^{rs,t,u}</math> is associated with the partition
| |
| − | <math>\tau=rs|t|u</math> of four indices
| |
| − | into three blocks.
| |
| − | Each term on the right is a cumulant product associated with a partition
| |
| − | <math>\sigma</math> of the same indices.
| |
| − | The coefficient is one if the least upper bound
| |
| − | <math>\sigma\vee\tau</math> has a single block,
| |
| − | otherwise zero.
| |
| − | Thus, with
| |
| − | <math>\tau=rs|t|u</math>, the product
| |
| − | <math>\kappa^{r,s}\kappa^{t,u}</math> does not appear
| |
| − | on the right because
| |
| − | <math>\sigma\vee\tau = rs|t u</math> has two blocks.
| |
| − |
| |
| − | As an example of the way these formulae may be used,
| |
| − | let
| |
| − | <math>X</math> be a scalar random variable with cumulants
| |
| − | <math>\kappa_1,\kappa_2,\kappa_3,\ldots</math>.
| |
| − | By translating the second formula in the preceding list, we find that
| |
| − | the variance of the squared variable is
| |
| − | <math>
| |
| − | {\rm var}(X^2) = \kappa_4 + 4\kappa_3\kappa_1 + 2\kappa_2^2 + 4\kappa_2\kappa_1^2,
| |
| − | </math>
| |
| − | reducing to
| |
| − | <math>\kappa_4 + 2\kappa_2^2</math> if the mean is zero.
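The rule that a cumulant product appears exactly when <math>\sigma\vee\tau</math> has a single block can be applied mechanically. The following sketch, restricted to a scalar variable so that each block of <math>\sigma</math> contributes the cumulant of order equal to its size, enumerates the qualifying partitions; all function names are illustrative.

```python
from math import prod


def set_partitions(items):
    """Yield every partition of the list `items` as a list of blocks."""
    if not items:
        yield []
        return
    first, rest = items[0], items[1:]
    for partition in set_partitions(rest):
        for i in range(len(partition)):
            yield partition[:i] + [[first] + partition[i]] + partition[i + 1:]
        yield [[first]] + partition


def joins_to_one_block(sigma, tau):
    """True if the least upper bound of sigma and tau is a single block."""
    blocks = [set(b) for b in list(sigma) + list(tau)]
    merged = blocks.pop()
    changed = True
    while changed:
        changed = False
        for b in blocks[:]:
            if merged & b:      # overlapping blocks belong to one component
                merged |= b
                blocks.remove(b)
                changed = True
    return not blocks           # nothing left over means one component


def generalized_cumulant(tau, kappa):
    """Generalized cumulant of a scalar variable for the partition tau.

    kappa[r] is the r-th ordinary cumulant.
    """
    items = [i for block in tau for i in block]
    return sum(
        prod(kappa[len(b)] for b in sigma)
        for sigma in set_partitions(items)
        if joins_to_one_block(sigma, tau)
    )


# Poisson variable with mean 2: every cumulant equals 2.
kappa = [0, 2, 2, 2, 2]
print(generalized_cumulant([["r", "s"], ["t", "u"]], kappa))
```

With <math>\tau = rs|tu</math> and all cumulants equal to 2, the result agrees with <math>\mu_4 - \mu_2^2 = 94 - 36 = 58</math> computed from the Poisson moments.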
| |
| − |
| |
| − | ===Exponential families===
| |
| − | Let <math> f</math> be a probability distribution on an arbitrary measurable space <math> ({\mathcal X},\nu)</math> ,
| |
| − | and let <math> t\colon{\mathcal X}\to{\mathcal R}</math> be a real-valued random variable
| |
| − | with cumulant generating function
| |
| − | <math> K(\cdot)</math> , finite in a set <math> \Theta</math> containing zero in the interior.
| |
| − | The family of distributions on <math> {\mathcal X}</math> with density
| |
| − | <math>
| |
| − | f_\theta(x) = e^{\theta t(x)} f(x) / M(\theta) = e^{\theta t(x) - K(\theta)} f(x)
| |
| − | </math>
| |
| − | indexed by <math> \theta\in\Theta</math> is called the exponential family
| |
| − | associated with <math> f</math> and the canonical statistic~<math> t</math> .
| |
| − | In statistical physics, the normalizing constant <math> M(\theta)</math> is called the
| |
| − | partition function.
| |
| − |
| |
| − | Two examples suffice to illustrate the idea.
| |
| − | In the first example, <math> {\mathcal X} = \{1,2,\ldots\}</math> is the set of natural numbers,
| |
| − | <math> f(x) \propto 1/x^2</math> and <math> t(x) = -\log(x)</math> .
| |
| − | The associated exponential family is
| |
| − | <math> f_\theta(x) = x^{-(\theta+2)}/\zeta(\theta+2)</math> for <math> \theta > -1</math> ,
| |
| − | where <math> \zeta(s)</math> is the Riemann zeta function with real argument <math> s > 1</math> .
| |
| − |
| |
| − | In the second example, <math> {\mathcal X}={\mathcal X}_n</math> is the symmetric group or the set of
| |
| − | permutations of <math> n</math> letters,
| |
| − | <math> x\in{\mathcal X}_n</math> is a permutation, <math> t(x)</math> is the number of cycles,
| |
| − | <math> f(x) = 1/n!</math> is the uniform distribution,
| |
| − | and <math> M_n(\xi) = \Gamma(n+e^\xi)/(n!\, \Gamma(e^\xi))</math> for all real~<math> \xi</math> .
| |
| − | The exponential family of distributions on permutations of <math> [n]</math> is
| |
| − | <math>
| |
| − | f_{n,\theta}(x) = \frac{\Gamma(\lambda)\, \lambda^{t(x)}} {\Gamma(n+\lambda)},
| |
| − | </math>
| |
| − | the same as the distribution generated by the Chinese restaurant process
| |
| − | with parameter <math> \lambda = e^\theta</math> .
| |
| − | The associated marginal distribution on partitions,
| |
| − | the Ewens distribution on partitions of <math> [n]</math> ,
| |
| − | is also of the exponential-family form with canonical statistic equal
| |
| − | to the number of blocks or cycles.
| |
| − | This number <math> t(x)</math> is a random variable whose cumulants are the
| |
| − | derivatives of <math> \log M(\cdot)</math> evaluated at the parameter~<math> \theta</math> .
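The moment generating function of the number of cycles can be checked by brute force for small <math>n</math>. This sketch enumerates all permutations of <math>n</math> letters and compares the average of <math>\lambda^{t(x)}</math>, with <math>\lambda = e^\xi</math>, against the closed form; the function names are assumptions made for illustration.

```python
from itertools import permutations
from math import factorial, gamma


def cycle_count(perm):
    """Number of cycles of a permutation given as a tuple of images of 0..n-1."""
    seen, cycles = set(), 0
    for start in range(len(perm)):
        if start not in seen:
            cycles += 1
            j = start
            while j not in seen:
                seen.add(j)
                j = perm[j]
    return cycles


def mgf_by_enumeration(n, lam):
    """E(lam^t) under the uniform distribution on permutations of n letters."""
    return sum(lam ** cycle_count(p) for p in permutations(range(n))) / factorial(n)


def mgf_closed_form(n, lam):
    """Gamma(n + lam) / (n! Gamma(lam)), with lam = exp(xi)."""
    return gamma(n + lam) / (factorial(n) * gamma(lam))


print(mgf_by_enumeration(5, 1.7), mgf_closed_form(5, 1.7))
```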
| |
| − |
| |
| − |
| |
| − | In the multi-parameter case,
| |
| − | <math> t\colon{\mathcal X}\to{\mathcal R}^p</math> is a random vector
| |
| − | and <math> \xi\colon{\mathcal R}^p\to{\mathcal R}</math> is a linear functional,
| |
| − | and <math> M(\xi) = E(e^{\xi(t)})</math> is the joint moment generating function.
| |
| − | It is sometimes convenient to employ Einstein's implicit summation convention
| |
| − | in the form <math> \theta(t) = \theta_i t^i</math> where <math> t^1,\ldots, t^p</math> are
| |
| − | the components of <math> t(x)</math> , and <math> \theta_1,\ldots, \theta_p</math> are the coefficients
| |
| − | of the linear functional.
| |
| − | For simplicity of notation in what follows, <math> {\mathcal X}={\mathcal R}^p</math> and <math> t(x) = x</math>
| |
| − | is the identity function.
| |
| − | An exponential-family distribution in <math> {\mathcal R}^p</math> has the form
| |
| − | <math>
| |
| − | f_\theta(x)=\exp(x^j\theta_j-g(x)-\varphi(\theta))
| |
| − | </math>
| |
| − | for given functions <math> g</math> and <math> \varphi</math> .
| |
| − | Integration shows that the distribution <math> f_\theta</math> has
| |
| − | cumulant generating function <math> K_\theta(\xi)=\varphi(\theta+\xi)-\varphi(\theta)</math> .
| |
| − | The cumulants of <math> X\sim f_\theta</math> are equal to the derivatives of <math> \varphi</math>
| |
| − | at the parameter~<math> \theta</math> .
| |
| − |
| |
| − | ===Calculus of cumulants===
| |
| − | The umbral calculus is a syntax or formal system consisting of
| |
| − | certain operations on objects called umbrae,
| |
| − | mimicking addition and multiplication of independent real-valued random variables
| |
| − | (Rota and Taylor, 1994).
| |
| − | To each real-valued sequence <math> 1, a_1, a_2,\ldots</math>
| |
| − | there corresponds an umbra <math> \alpha</math> such that <math> E(\alpha^r) = a_r</math> .
| |
| − | This freedom gives rise to special umbrae, the singleton and Bell umbra,
| |
| − | corresponding to no real-valued random variable.
| |
| − | Using these special umbrae, one develops the formal notion of
| |
| − | an <math> \alpha</math>-cumulant umbra <math> \chi\cdot\alpha</math> by formal product operations in the syntax.
| |
| − | Properties of cumulants, <math> k</math> -statistics and other polynomial functions
| |
| − | are then derived by purely formal combinatorial operations.
| |
| − | Di~Nardo et~al. (2008) present details.
| |
| − | ==Approximation of distributions==
| |
| − | ===Edgeworth approximation===
| |
| − | Suppose that
| |
| − | <math>Y</math> is a random variable that arises as the sum
| |
| − | of
| |
| − | <math>n</math> independent and identically-distributed summands, each of which has
| |
| − | mean
| |
| − | <math>0</math>, unit variance, and
| |
| − | cumulants
| |
| − | <math>\kappa_r</math>, and
| |
| − | <math>X=Y/\sqrt{n}</math>.
| |
| − | For ease of exposition, assume that cumulants of all orders exist.
| |
| − | Then, using (<ref>ndep</ref>), the cumulant generating function of
| |
| − | <math>X</math> is given by
| |
| − | <math>K(\xi)=\xi^2/2 +\kappa_3\xi^3/(6\sqrt{n}) +\kappa_4\xi^4/(24 n) +\cdots</math>,
| |
| − | and the moment generating function of
| |
| − | <math>X</math> is given by
| |
| − | <math>
| |
| − | M(\xi)=\exp(\xi^2/2)\exp(\kappa_3\xi^3/(6\sqrt{n})+\kappa_4\xi^4/(24 n)+\cdots)
| |
| − | </math>
| |
| − | Exponentiating the second factor gives
| |
| − | <math>
| |
| − | M(\xi)=\exp(\xi^2/2)\left(1\!+\!{{\kappa_3\xi^3}\over{6\sqrt{n}}}\!+\!{{\kappa_4\xi^4}\over{24 n}}\!+\!\cdots\!+\! {\textstyle{\frac12}} \left[
| |
| − | {{\kappa_3\xi^3}\over{6\sqrt{n}}}\!+\!{{\kappa_4\xi^4}\over{24 n}}\!+\!\cdots\right]^2\!+\!\!\cdots\right).
| |
| − | </math>
| |
| − | Reordering terms in powers of sample size,
| |
| − | <math kseries>
| |
| − | M(\xi)=\exp(\xi^2/2)\left(1+{{\kappa_3\xi^3}\over{6\sqrt{n}}}+{{\kappa_4\xi^4}\over{24 n}}+
| |
| − | {{\kappa_3^2\xi^6}\over{72 n}}+\cdots\right).
| |
| − | </math>
| |
| − | Repeated application of integration by parts to (<ref>mgfdef</ref>) shows that
| |
| − | <math mgfderiv>
| |
| − | \xi^r M(\xi) =\int_{-\infty}^\infty\exp(\xi x)(-1)^r f^{(r)}(x)~d x,
| |
| − | </math>
| |
| − | where
| |
| − | <math>f^{(r)}</math> denotes the derivative of
| |
| − | <math>f</math> of order
| |
| − | <math>r</math>. Relation
| |
| − | (<ref>mgfderiv</ref>) holds if
| |
| − | <math>f</math> and its derivatives go to zero quickly
| |
| − | as
| |
| − | <math>\vert x\vert\to\infty</math>. Applying (<ref>mgfderiv</ref>) to the normal
| |
| − | density
| |
| − | <math>\phi(x)=\exp(-x^2/2)/\sqrt{2\pi}</math>, and applying the result to
| |
| − | (<ref>kseries</ref>), gives
| |
| − | <math>
| |
| − | M(\xi)\approx\int_{-\infty}^\infty\exp(\xi x)\phi(x)\left[1+{{\kappa_3 h^3(x)}\over{6\sqrt{n}}}+{{\kappa_4h^4(x)}\over{24 n}}+
| |
| − | {{\kappa_3^2h^6(x)}\over{72 n}}\right]~d x
| |
| − | </math>
| |
| − | for
| |
| − | <math>h^r(x)=(-1)^r\phi^{(r)}(x)/\phi(x)</math>, and, since the relationship
| |
| − | giving the moment generating function in terms of the density is invertible,
| |
| − | and the inversion process is suitably smooth,
| |
| − | Edgeworth (1907) approximates the density of
| |
| − | <math>X</math> by
| |
| − | <math edser>
| |
| − | e_4(x)=\phi(x)\left[1+{{\kappa_3 h^3(x)}\over{6\sqrt{n}}}+{{\kappa_4h^4(x)}\over{24 n}}+
| |
| − | {{\kappa_3^2h^6(x)}\over{72 n}}\right].
| |
| − | </math>
| |
| − | In fact, when the summands contributing to
| |
| − | <math>Y</math> have a density and finite cumulants up to order at least 5, the error in the
| |
| − | approximation, multiplied by
| |
| − | <math>n^{3/2}</math>, remains bounded.
| |
| − | The functions
| |
| − | <math>h^r</math> defined above are the Hermite polynomials.
| |
| − | The approximation (<ref>edser</ref>) is known as the Edgeworth series.
| |
| − | The subscript refers to the number of cumulants used in its definition.
| |
| − | This series can be used to approximate either the cumulative distribution function or survival function through term-wise integration.
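The Edgeworth density (<ref>edser</ref>) is straightforward to evaluate. The sketch below computes the Hermite polynomials by the three-term recurrence <math>h^{k+1}(x) = x h^k(x) - k h^{k-1}(x)</math>, a standard property assumed here rather than derived in the article, and then assembles <math>e_4(x)</math>; the function names are illustrative.

```python
import math


def hermite(r, x):
    """Hermite polynomial h^r(x) = (-1)^r phi^(r)(x) / phi(x), by recurrence."""
    prev, cur = 1.0, x
    if r == 0:
        return prev
    for k in range(1, r):
        prev, cur = cur, x * cur - k * prev
    return cur


def edgeworth_density(x, n, kappa3, kappa4):
    """Edgeworth approximation e_4(x) to the density of the standardized sum.

    kappa3 and kappa4 are the cumulants of a single summand, which is
    assumed to have mean zero and unit variance.
    """
    phi = math.exp(-0.5 * x * x) / math.sqrt(2 * math.pi)
    correction = (
        1.0
        + kappa3 * hermite(3, x) / (6 * math.sqrt(n))
        + kappa4 * hermite(4, x) / (24 * n)
        + kappa3 ** 2 * hermite(6, x) / (72 * n)
    )
    return phi * correction


# Centred unit-exponential summands have kappa3 = 2 and kappa4 = 6.
print(edgeworth_density(0.5, 20, 2.0, 6.0))
```

Because each correction term is a Hermite polynomial multiplied by <math>\phi</math>, the approximation integrates to one, although it is not guaranteed to be non-negative.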
| |
| − |
| |
| − | The preceding discussion is intended to be heuristic; Kolassa (2006) presents
| |
| − | a rigorous derivation, along with the natural extension to random vectors.
| |
| − | ===Saddlepoint approximation===
| |
| − | The approximation (<ref>edser</ref>) to the density
| |
| − | <math>f(x)</math> has the property that
| |
| − | <math>|f(x)-e_r(x)|\leq C n^{-(r-1)/2}</math>, for some constant
| |
| − | <math>C</math>,
| |
| − | when the cumulant of order
| |
| − | <math>r+1</math> exists;
| |
| − | <math>C</math> does not depend on
| |
| − | <math>x</math>.
| |
| − | A similar bound holds for the relative error
| |
| − | <math>(f(x)-e_r(x))/f(x)</math>, only when
| |
| − | <math>x</math> is restricted to a finite interval.
| |
| − | Because of the polynomial factor multiplying the first omitted term in
| |
| − | (<ref>edser</ref>), the relative error can be expected to behave poorly.
| |
| − | One might prefer an approximation that maintains good behavior for
| |
| − | values of
| |
| − | <math>X</math> in a range that increases as
| |
| − | <math>n</math> increases; specifically,
| |
| − | one might prefer an approximation that performs well for values of
| |
| − | <math>\bar Y=X/\sqrt{n}</math> in a fixed interval.
| |
| − |
| |
| − | Assume again that random variables
| |
| − | <math>Y_j</math> are independent and identically distributed, each with a cumulant generating function
| |
| − | <math>K(\xi)</math> finite for
| |
| − | <math>\xi</math>
| |
| − | in a neighborhood of
| |
| − | <math>0</math>. As above, define the exponential family
| |
| − | <math>
| |
| − | f_{\bar Y}(\bar y;\theta)=\exp(n[\theta\bar y-K(\theta)])f_{\bar Y}(\bar y).
| |
| − | </math>
| |
| − | One can then choose a value of
| |
| − | <math>\theta</math> depending on
| |
| − | <math>\bar y</math>
| |
| − | that makes
| |
| − | <math>f_{\bar Y}(\bar y;\theta)</math> easy to approximate, and
| |
| − | the exponential family relationship to derive an approximation for
| |
| − | <math>f_{\bar Y}(\bar y)</math>. Conventionally we choose
| |
| − | <math>\hat\theta</math> to
| |
| − | satisfy
| |
| − | <math speqn>
| |
| − | K'(\hat\theta)=\bar y;
| |
| − | </math>
| |
| − | this makes the expectation of the distribution
| |
| − | with density
| |
| − | <math>f_{\bar Y}(\cdot;\hat\theta)</math> equal to the observed value.
| |
| − | One then applies (<ref>edser</ref>), with the scale of the ordinate changed
| |
| − | to reflect the fact that we are approximating the distribution of
| |
| − | <math>X/\sqrt{n}</math>,
| |
| − | to obtain
| |
| − | <math>
| |
| − | f_{\bar Y}(\bar y)\approx\exp(n[K(\hat\theta)-\hat\theta\bar y])
| |
| − | \sqrt{n/K''(\hat\theta)}\,\phi(0)\left[1+{{\hat\kappa_3 h^3(0)}\over{6\sqrt{n}}}+{{\hat\kappa_4h^4(0)}\over{24 n}}+
| |
| − | {{\hat\kappa_3^2h^6(0)}\over{72 n}}\right].
| |
| − | </math>
| |
| − | Using the fact that <math>h^3(0)=0</math>,
| |
| − | <math>h^4(0)=3</math>, and <math>h^6(0)=-15</math>,
| |
| − | we obtain <math spser>
| |
| − | f_{\bar Y}(\bar y)\approx\sqrt{{n}\over{2\pi K''(\hat\theta)}}
| |
| − | \exp(n[K(\hat\theta)-\hat\theta\bar y])
| |
| − | \left[1+{{\hat\kappa_4}\over{8 n}}-
| |
| − | {{5\hat\kappa_3^2}\over{24 n}}\right].
| |
| − | </math>
| |
| − | Here
| |
| − | <math>\hat\kappa_j = K^{(j)}(\hat\theta)/K''(\hat\theta)^{j/2}</math> are the standardized cumulants calculated from the derivatives of
| |
| − | <math>K</math> in the preceding manner, but in this case evaluated at
| |
| − | <math>\hat\theta</math>.
| |
| − | This approximation may only be applied to values of
| |
| − | <math>\bar y</math> for which
| |
| − | (<ref>speqn</ref>) has solutions in an open neighborhood of 0.
| |
| − | Expression (<ref>spser</ref>) represents the saddlepoint approximation to
| |
| − | the density of the mean
| |
| − | <math>\bar Y</math>; since
| |
| − | <math>f_{\bar Y}(\bar y;\theta)</math>
| |
| − | has a cumulant generating function defined on an open set containing
| |
| − | <math>0</math>,
| |
| − | cumulants of all orders exist, the Edgeworth series including
| |
| − | <math>\kappa_6</math>
| |
| − | may be applied to
| |
| − | <math>f_{\bar Y}(\bar y;\theta)</math>, and so the error in the
| |
| − | Edgeworth series is of order
| |
| − | <math>O(1/n^2)</math>. Hence the error in (<ref>spser</ref>)
| |
| − | is of the same order, and in this case, is relative and uniform for values of
| |
| − | <math>\bar y</math> in a bounded subset of an open subset on which (<ref>speqn</ref>)
| |
| − | has a solution.
| |
| − | This approximation was introduced to the statistics literature by
| |
| − | Daniels (1954).
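A worked example may help fix ideas. The sketch below treats the mean of <math>n</math> independent unit-exponential variables, an illustrative choice for which <math>K(\theta) = -\log(1-\theta)</math> and the exact density of the mean is a gamma density. It assumes the standard form of the saddlepoint density, <math>\sqrt{n/(2\pi K''(\hat\theta))}\,\exp(n[K(\hat\theta)-\hat\theta\bar y])</math>, multiplied by the correction <math>1+\hat\kappa_4/(8n)-5\hat\kappa_3^2/(24n)</math> with standardized cumulants; the function names are hypothetical.

```python
import math


def saddlepoint_density_exp_mean(ybar, n, second_order=True):
    """Saddlepoint approximation to the density of the mean of n iid Exp(1)."""
    theta_hat = 1.0 - 1.0 / ybar        # solves K'(theta) = 1/(1 - theta) = ybar
    K = -math.log(1.0 - theta_hat)      # K(theta) = -log(1 - theta), theta < 1
    K2 = 1.0 / (1.0 - theta_hat) ** 2   # second derivative K''(theta_hat)
    rho3, rho4 = 2.0, 6.0               # K'''/K''^(3/2) and K''''/K''^2
    value = math.sqrt(n / (2 * math.pi * K2)) * math.exp(n * (K - theta_hat * ybar))
    if second_order:
        value *= 1 + rho4 / (8 * n) - 5 * rho3 ** 2 / (24 * n)
    return value


def exact_density_exp_mean(ybar, n):
    """Exact density of the mean: gamma with shape n and rate n."""
    log_density = (
        n * math.log(n) + (n - 1) * math.log(ybar) - n * ybar - math.lgamma(n)
    )
    return math.exp(log_density)


for ybar in (0.5, 1.0, 2.0):
    exact = exact_density_exp_mean(ybar, 10)
    print(ybar, saddlepoint_density_exp_mean(ybar, 10) / exact)
```

In this example the ratio of approximate to exact density does not depend on <math>\bar y</math>, which illustrates the uniform relative error described above.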
| |
| − |
| |
| − | The Edgeworth series for the density was trivially integrated to obtain an
| |
| − | approximation to tail probabilities. Integration of the saddlepoint
| |
| − | approximation is more delicate. Two main approaches have been investigated.
| |
| − | Daniels (1987) expresses
| |
| − | <math>f_{\bar Y}(\bar y)</math> exactly as a complex integral
| |
| − | involving
| |
| − | <math>K(\xi)</math>, integrates with respect to
| |
| − | <math>\bar y</math> to obtain another
| |
| − | complex integral, and reviews techniques for approximating the resulting
| |
| − | integrals.
| |
| − | Robinson (1982) and Lugannani and Rice (1980) derive tail probability approximations based
| |
| − | on approximately integrating (<ref>spser</ref>) with respect to
| |
| − | <math>\bar y</math> directly.
| |
| − |
| |
| − | These saddlepoint and Edgeworth approximations have multivariate and
| |
| − | conditional extensions. Davison (1988) exploits the conditional saddlepoint tail probability approximation to perform inference in canonical exponential families.
| |
| − | ==Samples and sub-samples==
| |
| − | A function
| |
| − | <math>f\colon{\mathcal R}^n\to{\mathcal R}</math> is symmetric if
| |
| − | <math>f(x_1 ,\ldots, x_n) = f(x_{\pi(1)} ,\ldots, x_{\pi(n)})</math>
| |
| − | for each permutation
| |
| − | <math>\pi</math> of the arguments.
| |
| − | For example, the total
| |
| − | <math>T_n = x_1 + \cdots + x_n</math>, the average
| |
| − | <math>T_n/n</math>,
| |
| − | the min, max and median are symmetric functions, as are the sum of squares
| |
| − | <math>S_n = \sum x_i^2</math>, the sample variance
| |
| − | <math>s_n^2 = (S_n - T_n^2/n)/(n-1)</math>
| |
| − | and the mean absolute difference
| |
| − | <math>\sum_{i\ne j} |x_i - x_j|/(n(n-1))</math>.
| |
| − |
| |
| − | A vector
| |
| − | <math>x</math> in
| |
| − | <math>{\mathcal R}^n</math> is an ordered list of
| |
| − | <math>n</math> real numbers
| |
| − | <math>(x_1 ,\ldots, x_n)</math>
| |
| − | or a function
| |
| − | <math>x\colon[n]\to{\mathcal R}</math> where
| |
| − | <math>[n]=\{1 ,\ldots, n\}</math>.
| |
| − | For
| |
| − | <math>m \le n</math>, a one-to-one function
| |
| − | <math>\varphi\colon[m]\to[n]</math> is a sample of size
| |
| − | <math>m</math>,
| |
| − | the sampled values being
| |
| − | <math>x\varphi = (x_{\varphi(1)} ,\ldots, x_{\varphi(m)})</math>.
| |
| − | All told, there are
| |
| − | <math>n(n-1)\cdots(n-m+1)</math> distinct samples of size
| |
| − | <math>m</math>
| |
| − | that can be taken from a list of length
| |
| − | <math>n</math>.
| |
| − | A <em>sequence</em> of functions
| |
| − | <math>f_n\colon{\mathcal R}^n\to{\mathcal R}</math> is
| |
| − | consistent under sub-sampling if, for each
| |
| − | <math>m \le n</math>,
| |
| − | <math>
| |
| − | f_n(x) = {\rm ave} _\varphi f_m(x\varphi),
| |
| − | </math>
| |
| − | where
| |
| − | <math>{\rm ave} _\varphi</math> denotes the average over samples of size
| |
| − | <math>m</math>.
| |
| − | For
| |
| − | <math>m=n</math>, this condition implies only that
| |
| − | <math>f_n</math> is a symmetric function.
| |
| − |
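The definition above can be checked by brute-force enumeration. The sketch below lists every one-to-one sample of size <math>m</math>, confirms the count <math>n(n-1)\cdots(n-m+1)</math>, and compares a function's value on the full list with the average of its values over samples. The helper names and the data are hypothetical; exact rational arithmetic is used so that equality tests are meaningful.

```python
from fractions import Fraction
from itertools import permutations


def samples(x, m):
    """All sampled lists x∘phi for one-to-one maps phi: [m] -> [n]."""
    return list(permutations(x, m))


def subsample_average(f, x, m):
    """Average of f over all samples of size m drawn from x."""
    values = [f(s) for s in samples(x, m)]
    return sum(values, Fraction(0)) / len(values)


def mean(v):
    return sum(v, Fraction(0)) / len(v)


def total(v):
    return sum(v, Fraction(0))


# Illustrative data, not taken from the article.
x = [Fraction(v) for v in (0, 1, 3, 7)]

if __name__ == "__main__":
    print(len(samples(x, 2)))                      # number of samples of size 2
    print(subsample_average(mean, x, 2), mean(x))  # agree: the average is consistent
    print(subsample_average(total, x, 2), total(x))  # differ: the total is not
```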
| |
| − | Although the total and the median are both symmetric functions, neither is consistent
| |
| − | under sub-sampling.
| |
| − | For example, the median of the numbers
| |
| − | <math>(0,1,3)</math> is one,
| |
| − | but the average of the medians of samples of size two is 4/3.
| |
| − | However, the average
| |
| − | <math>\bar x_n = T_n/n</math> is sampling consistent.
| |
| − | Likewise the sample variance
| |
| − | <math>s_n^2 = \sum(x_i - \bar x)^2/(n-1)</math> with divisor
| |
| − | <math>n-1</math>
| |
| − | is sampling consistent,
| |
| − | but the mean squared deviation
| |
| − | <math>\sum(x_i - \bar x_n)^2/n</math> with divisor
| |
| − | <math>n</math> is not.
| |
| − | Other sampling consistent functions include Fisher's
| |
| − | <math>k</math>-statistics,
| |
| − | the first few of which are
| |
| − | <math>k_{1,n} = \bar x_n</math>,
| |
| − | <math>k_{2,n} = s_n^2</math> for
| |
| − | <math>n\ge 2</math>,
| |
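| − | <math>
| |
| − | \begin{array}{rcl}
| |
| − | k_{3,n} &=& n\sum(x_i - \bar x_n)^3/((n-1)(n-2))\\
| |
| − | k_{4,n} &=& \left[n(n+1)\sum(x_i - \bar x_n)^4 - 3(n-1)\left(\sum(x_i - \bar x_n)^2\right)^2\right]/((n-1)(n-2)(n-3))
| |
| − | \end{array}
| |
| − | </math>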
| |
| − | defined for
| |
| − | <math>n\ge 3</math> and
| |
| − | <math>n\ge 4</math> respectively.
| |
| − |
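The claims in this paragraph can be verified numerically. The sketch below reproduces the median example and checks sampling consistency of the first four <math>k</math>-statistics on a small list. The expression used for <math>k_{4,n}</math> is the standard one, <math>[n(n+1)\sum(x_i-\bar x_n)^4 - 3(n-1)(\sum(x_i-\bar x_n)^2)^2]/((n-1)(n-2)(n-3))</math>, taken here as an assumption; function names and data are hypothetical.

```python
from fractions import Fraction
from itertools import permutations


def ave(f, x, m):
    """Average of f over all one-to-one samples of size m from x."""
    values = [f(s) for s in permutations(x, m)]
    return sum(values, Fraction(0)) / len(values)


def central_sum(v, r):
    """Sum of r-th powers of deviations from the mean."""
    xbar = sum(v, Fraction(0)) / len(v)
    return sum(((xi - xbar) ** r for xi in v), Fraction(0))


def median(v):
    s = sorted(v)
    n = len(s)
    return s[n // 2] if n % 2 else (s[n // 2 - 1] + s[n // 2]) / 2


def k2(v):
    """Sample variance with divisor n - 1."""
    return central_sum(v, 2) / (len(v) - 1)


def msd(v):
    """Mean squared deviation with divisor n."""
    return central_sum(v, 2) / len(v)


def k3(v):
    n = len(v)
    return n * central_sum(v, 3) / ((n - 1) * (n - 2))


def k4(v):
    # Assumed standard form of Fisher's fourth k-statistic.
    n = len(v)
    num = n * (n + 1) * central_sum(v, 4) - 3 * (n - 1) * central_sum(v, 2) ** 2
    return num / ((n - 1) * (n - 2) * (n - 3))


y = [Fraction(v) for v in (0, 1, 3)]          # the median example from the text
x = [Fraction(v) for v in (0, 1, 3, 7, 12)]   # illustrative data

if __name__ == "__main__":
    print(median(y), ave(median, y, 2))       # 1 versus 4/3
    print(ave(k2, x, 3) == k2(x), ave(msd, x, 3) == msd(x))
    print(ave(k3, x, 4) == k3(x), ave(k4, x, 4) == k4(x))
```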
| |
| − | For a sequence of independent and identically distributed random variables,
| |
| − | the
| |
| − | <math>k</math>-statistic of order
| |
| − | <math>r\le n</math> is the unique symmetric function
| |
| − | such that
| |
| − | <math>E(k_{r,n}) = \kappa_r</math>.
| |
| − | Fisher (1929) derived the variances and covariances.
| |
| − | The connection with finite-population sub-sampling was developed by
| |
| − | Tukey (1950).
| |
| − |
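The unbiasedness property <math>E(k_{r,n}) = \kappa_r</math> can be confirmed exactly for a small discrete distribution by enumerating all outcomes. The sketch below does this for <math>n</math> independent Bernoulli variables with success probability <math>p</math>, whose first four cumulants are <math>p</math>, <math>pq</math>, <math>pq(1-2p)</math> and <math>pq(1-6pq)</math> with <math>q=1-p</math>. The choice of distribution, the function names, and the form of <math>k_{4,n}</math> used in the code are assumptions made for illustration.

```python
from fractions import Fraction
from itertools import product


def central_sum(v, r):
    """Sum of r-th powers of deviations from the mean."""
    xbar = sum(v, Fraction(0)) / len(v)
    return sum(((xi - xbar) ** r for xi in v), Fraction(0))


def k_stat(v, r):
    """First four k-statistics (k4 in its assumed standard form)."""
    n = len(v)
    if r == 1:
        return sum(v, Fraction(0)) / n
    if r == 2:
        return central_sum(v, 2) / (n - 1)
    if r == 3:
        return n * central_sum(v, 3) / ((n - 1) * (n - 2))
    if r == 4:
        num = n * (n + 1) * central_sum(v, 4) - 3 * (n - 1) * central_sum(v, 2) ** 2
        return num / ((n - 1) * (n - 2) * (n - 3))
    raise ValueError("only orders 1 to 4 are implemented in this sketch")


def expected_k_stat(r, p, n):
    """Exact E(k_{r,n}) for n iid Bernoulli(p) variables, by enumeration."""
    total = Fraction(0)
    for outcome in product((0, 1), repeat=n):
        ones = sum(outcome)
        weight = p ** ones * (1 - p) ** (n - ones)   # probability of this outcome
        total += weight * k_stat([Fraction(o) for o in outcome], r)
    return total


def bernoulli_cumulant(r, p):
    """Cumulants of orders 1 to 4 of a Bernoulli(p) variable."""
    q = 1 - p
    return [p, p * q, p * q * (1 - 2 * p), p * q * (1 - 6 * p * q)][r - 1]


if __name__ == "__main__":
    p = Fraction(1, 3)
    for r in (1, 2, 3, 4):
        print(r, expected_k_stat(r, p, 5), bernoulli_cumulant(r, p))
```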
| |
| − |
| |
| − | ==References==
| |
| − | *
| |
| − | H. E. Daniels.
| |
| − | Saddlepoint approximations in statistics.
| |
| − | <em>The Annals of Mathematical Statistics</em>, 25
| |
| − | (4): 631--650, 1954.
| |
| − |
| |
| − | *
| |
| − | H. E. Daniels.
| |
| − | Tail probability approximations.
| |
| − | <em>International Statistical Review</em>,
| |
| − | 55: 37--46, 1987.
| |
| − |
| |
| − | *
| |
| − | A. C. Davison.
| |
| − | Approximate conditional inference in generalized linear models.
| |
| − | <em>Journal of the Royal Statistical Society Series B</em>,
| |
| − | 50: 445--461, 1988.
| |
| − |
| |
| − | *
| |
| − | E. Di Nardo, G. Guarino, and D. Senato.
| |
| − | A unifying framework for <math>k</math>-statistics, polykays and their
| |
| − | multivariate generalizations.
| |
| − | <em>Bernoulli</em>, 14: 440--468, 2008.
| |
| − |
| |
| − | *
| |
| − | P. L. Dressel.
| |
| − | Statistical seminvariants and their estimates with particular
| |
| − | emphasis on their relation to algebraic invariants.
| |
| − | <em>The Annals of Mathematical Statistics</em>, 11
| |
| − | (1): 33--57, 1940.
| |
| − |
| |
| − | *
| |
| − | F. Y. Edgeworth.
| |
| − | On the representation of statistical frequency by a series.
| |
| − | <em>Journal of the Royal Statistical Society</em>, 70
| |
| − | (1): 102--106, 1907.
| |
| − |
| |
| − | *
| |
| − | R. A. Fisher.
| |
| − | Moments and product moments of sampling distributions.
| |
| − | <em>Proceedings of the London Mathematical Society, Series 2</em>,
| |
| − | 30: 199--238, 1929.
| |
| − |
| |
| − | *
| |
| − | I. J. Good.
| |
| − | A new formula for k-statistics.
| |
| − | <em>The Annals of Statistics</em>, 5 (1): 224--228,
| |
| − | 1977.
| |
| − |
| |
| − | *
| |
| − | C. C. Heyde.
| |
| − | On a property of the lognormal distribution.
| |
| − | <em>Journal of the Royal Statistical Society. Series B
| |
| − | (Methodological)</em>, 25 (2): 392--393, 1963.
| |
| − |
| |
| − | *
| |
| − | J. E. Kolassa.
| |
| − | <em>Series Approximation Methods in Statistics</em>.
| |
| − | Springer--Verlag, New York, 2006.
| |
| − |
| |
| − | *
| |
| − | R. Lugannani and S. Rice.
| |
| − | Saddle point approximation for the distribution of the sum of
| |
| − | independent random variables.
| |
| − | <em>Advances in Applied Probability</em>, 12: 475--490, 1980.
| |
| − |
| |
| − | *
| |
| − | J. Marcinkiewicz.
| |
| − | Sur une propriété de la loi de Gauss.
| |
| − | <em>Mathematische Zeitschrift</em>, 44: 612--618, 1939.
| |
| − |
| |
| − | *
| |
| − | J. Robinson.
| |
| − | Saddlepoint approximations for permutation tests and confidence
| |
| − | intervals.
| |
| − | <em>Journal of the Royal Statistical Society. Series B
| |
| − | (Methodological)</em>, 44 (1): 91--101, 1982.
| |
| − |
| |
| − | *
| |
| − | G.-C. Rota and B. D. Taylor.
| |
| − | The classical umbral calculus.
| |
| − | <em>SIAM Journal on Mathematical Analysis</em>, 25: 694--711, 1994.
| |
| − |
| |
| − | *
| |
| − | T. N. Thiele.
| |
| − | <em>Theory of Observations</em>.
| |
| − | C. & E. Layton, London, 1903.
| |
| − |
| |
| − | *
| |
| − | J. W. Tukey.
| |
| − | Some sampling simplified.
| |
| − | <em>Journal of the American Statistical Association</em>, 45
| |
| − | (252): 501--519, 1950.
| |