|
|
| Line 1: |
Line 1: |
| − | This is a draft, 1st author Peter McCullagh
| |
| − |
| |
| | This article describes a sequence of numbers, called '''cumulants''', | | This article describes a sequence of numbers, called '''cumulants''', |
| | that are used | | that are used |
| Line 12: |
Line 10: |
| | be used in a simple way to describe the difference between a distribution and | | be used in a simple way to describe the difference between a distribution and |
| | its simplest Gaussian approximation. | | its simplest Gaussian approximation. |
| − | ==Overview and Definitions==
| + | |
| − | ===Definition===
| + | |
| − | The moment of order
| + | |
| − | <math>r</math> (or
| + | |
| − | <math>r</math>th moment) of a real-valued random variable
| + | |
| − | <math>X</math> is
| + | |
| − | :<math>
| + | |
| − | \mu_r = E(X^r)
| + | |
| − | </math>
| + | |
| − | for integer
| + | |
| − | <math>r=0,1,\ldots</math>.
| + | |
| − | The value is assumed to be finite.
| + | |
| − | Provided that it has a Taylor expansion about the origin,
| + | |
| | The moment generating function (or Fourier–Laplace transform) | | The moment generating function (or Fourier–Laplace transform) |
| − | :<math powerseries>
| + | <math powerseries> |
| | M(\xi) = E(e^{\xi X}) | | M(\xi) = E(e^{\xi X}) |
| − | = E(1 + \xi X +\cdots + \xi^r X^r/r!+\cdots)
| |
| − | = \sum_{r=0}^\infty \mu_r \xi^r/r!
| |
| | </math> | | </math> |
| | is an easy way to combine all of the moments into a single expression. | | is an easy way to combine all of the moments into a single expression. |
| − | The
| + | The cumulants up to order four are defined |
| − | <math>r</math>th moment is hence the
| + | |
| − | <math>r</math>th derivative of
| + | |
| − | <math>M</math> at the origin.
| + | |
| − | This definition is due to Fisher (1929).
| + | |
| − | | + | |
| − | When
| + | |
| − | <math>X</math> has a distribution given by a density
| + | |
| − | <math>f</math>, then
| + | |
| − | :<math ctsmomdef>
| + | |
| − | \mu_r = \int_{-\infty}^\infty x^r f(x)\,dx,</math> and
| + | |
| − | :<math mgfdef>
| + | |
| − | M(\xi) = E(e^{\xi X}) =\int_{-\infty}^\infty\exp(\xi x) f(x) d x.
| + | |
| − | </math>
| + | |
| − | | + | |
| − | The cumulants
| + | |
| − | <math>\kappa_r</math> are the coefficients in the Taylor expansion of
| + | |
| − | the cumulant generating function about the origin
| + | |
| − | :<math>
| + | |
| − | K(\xi) = \log M(\xi) = \sum_{r} \kappa_r \xi^r/r!.
| + | |
| − | </math>
| + | |
| − | Evidently
| + | |
| − | <math>\mu_0 = 1</math> implies
| + | |
| − | <math>\kappa_0 = 0</math>.
| + | |
| − | The relationship between the first few moments and cumulants,
| + | |
| − | obtained by extracting coefficients from the expansion, is as follows
| + | |
| − | :<math forward>\begin{array}{lcl}
| + | |
| − | \kappa_1 &=& \mu_1 \\
| + | |
| − | \kappa_2 &=& \mu_2 - \mu_1^2\\
| + | |
| − | \kappa_3 &=& \mu_3 - 3\mu_2\mu_1 + 2\mu_1^3\\
| + | |
| − | \kappa_4 &=& \mu_4 - 4\mu_3\mu_1 - 3\mu_2^2 + 12\mu_2\mu_1^2 -6\mu_1^4.
| + | |
| − | \end{array}</math>
| + | |
| − | In the reverse direction
| + | |
| − | :<math reverse>\begin{array}{lcl}
| + | |
| − | \mu_2 &=& \kappa_2 + \kappa_1^2\\
| + | |
| − | \mu_3 &=& \kappa_3 + 3\kappa_2\kappa_1 + \kappa_1^3\\
| + | |
| − | \mu_4 &=& \kappa_4 + 4\kappa_3\kappa_1 + 3\kappa_2^2 + 6\kappa_2\kappa_1^2 + \kappa_1^4.
| + | |
| − | \end{array}</math>
| + | |
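The two tables above can be reproduced mechanically. The following is a minimal sketch, not part of the original text, that assumes the standard recursion <math>\mu_r = \sum_{m=1}^{r} \binom{r-1}{m-1}\kappa_m\mu_{r-m}</math>; the function names are hypothetical. It is checked on the Poisson distribution with mean 2, whose raw moments are 1, 2, 6, 22, 94.

```python
from math import comb

def cumulants_from_moments(mu):
    """mu[r] is the r-th raw moment with mu[0] == 1; returns [kappa_0, ..., kappa_n]."""
    n = len(mu) - 1
    kappa = [0] * (n + 1)  # kappa_0 = 0 because mu_0 = 1
    for r in range(1, n + 1):
        # kappa_r = mu_r - sum_{m=1}^{r-1} C(r-1, m-1) * kappa_m * mu_{r-m}
        correction = sum(comb(r - 1, m - 1) * kappa[m] * mu[r - m]
                         for m in range(1, r))
        kappa[r] = mu[r] - correction
    return kappa

def moments_from_cumulants(kappa):
    """Inverse map: kappa[r] is the r-th cumulant (kappa[0] is ignored)."""
    n = len(kappa) - 1
    mu = [1] + [0] * n
    for r in range(1, n + 1):
        # mu_r = sum_{m=1}^{r} C(r-1, m-1) * kappa_m * mu_{r-m}
        mu[r] = sum(comb(r - 1, m - 1) * kappa[m] * mu[r - m]
                    for m in range(1, r + 1))
    return mu

# Poisson distribution with mean 2: every cumulant should equal 2.
poisson_moments = [1, 2, 6, 22, 94]
poisson_cumulants = cumulants_from_moments(poisson_moments)
print(poisson_cumulants)
```

Integer arithmetic is used so that the round trip between moments and cumulants is exact for this example.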
| − | In particular,
| + | |
| − | <math>\kappa_1 = \mu_1</math> is the mean of
| + | |
| − | <math>X</math>,
| + | |
| − | <math>\kappa_2</math> is the
| + | |
| − | variance, and
| + | |
| − | <math>\kappa_3 = E((X - \mu_1)^3)</math>.
| + | |
| − | Higher-order cumulants are not the same as moments about the mean.
| + | |
| − | Hald (2000) credits Thiele (1889) with the first derivation of cumulants.
| + | |
| − | Lauritzen (2002) presents an overview, translation, and reprinting of much of this early work.
| + | |
| − | ===Examples===
| + | |
| − | Let <math> {\mathcal R}</math> denote the real numbers.
| + | |
| − | Let <math> {\mathcal R}^+</math> represent the positive reals, and let <math> {\mathcal N}=\{0,1,\ldots\}</math> be the natural numbers.
| + | |
| − | | + | |
| − | | + | |
| − | <table><tr><td>Distribution</td><td>Density </td><td>CGF</td><td>Cumulants</td></tr>
| + | |
| − | <tr><td>Normal</td><td><math> \frac{\exp(-x^2/2)}{\sqrt{2\pi}}, x\in{\mathcal R}</math></td><td><math> \xi^2/2</math></td><td><math> \kappa_1=0</math>, <math> \kappa_2=1</math>, <math> \kappa_r=0</math> for <math>r>2</math></td></tr>
| + | |
| − | <tr><td>Bernoulli</td><td><math> \pi^x(1-\pi)^{1-x}, x\in\{0,1\}</math></td><td><math> \log(1-\pi+\pi\exp(\xi))</math></td><td><math> \kappa_1=\pi</math>, <math> \kappa_2=\pi(1-\pi)</math>, <math> \kappa_3=[2 \pi ^3-3 \pi ^2+\pi]</math></td></tr>
| + | |
| − | <tr><td>Poisson</td><td><math> \frac{\exp(-\lambda)\lambda^x}{x!}, x\in{\mathcal N} </math></td><td><math> (e^{\xi }-1)\lambda</math></td><td><math> \kappa_r=\lambda \ \forall r</math> </td></tr>
| + | |
| − | <tr><td>Exponential</td><td><math> \frac{\exp(-x/\lambda)}{\lambda}, x\in{\mathcal R}^+</math></td><td><math> -\log(1-\lambda\xi)</math></td><td><math> \kappa_r=\lambda^r(r-1)! \ \forall r</math> </td></tr>
| + | |
| − | <tr><td>Geometric</td><td><math> (1-\pi)\pi^x, x\in{\mathcal N}</math></td><td><math> \log(1-\pi)-\log(1-\pi\exp(\xi)) </math></td><td> <math> \kappa_1=\rho</math>, <math> \kappa_2=\rho^2+\rho</math>,<math> \kappa_3=2 \rho ^3+3 \rho ^2+\rho</math> for <math> \rho=\pi/(1-\pi)</math>.</td></tr>
| + | |
| − | </table>
| + | |
| − | ===Definitions under less restrictive conditions===
| + | |
| − | The Cauchy distribution with density <math> \pi^{-1}/(1+x^2)</math> has no moments because
| + | |
| − | the integral (<ref>ctsmomdef</ref>) does not converge for any integer <math> r\ge 1</math>.
| + | |
| − | Student's <math> t</math> distribution on five degrees of freedom is symmetric with density
| + | |
| − | <math> (8/(3\pi\surd5))/(1 + x^2/5)^3</math>.
| + | |
| − | The first four moments are <math> 0, 5/3, 0, 25</math> : higher-order moments are
| + | |
| − | not defined.
| + | |
| − | The cumulants up to order four are defined by (<ref>forward</ref>) | + | |
| | even though the moment generating function (<ref>powerseries</ref>) does not exist | | even though the moment generating function (<ref>powerseries</ref>) does not exist |
| | for any real <math> \xi\neq 0</math> . | | for any real <math> \xi\neq 0</math> . |
| | | | |
| − | In both of these cases, the characteristic function <math> M(i\xi)</math> is
| |
| − | well-defined for real <math> \xi</math> ,
| |
| − | <math> \exp(-|\xi|)</math> for the Cauchy distribution,
| |
| − | and <math> \exp(-|\xi|\surd 5)(1 + |\xi|\surd5 + 5\xi^2/3)</math> for <math> t_5</math> .
| |
| − | In the latter case, both <math> M(i\xi)</math> and <math> K(i\xi)</math>
| |
| − | have Taylor expansions up to order four only, so the moments and
| |
| − | cumulants are defined only up to this order.
| |
| − | The infinite expansion (<ref>powerseries</ref>) is justified when
| |
| − | the radius of convergence is positive, in which case <math> M(\xi)</math> is finite on
| |
| − | an open set containing zero, and all moments and cumulants are finite.
| |
| − | However, finiteness of the moments does not imply that <math> M(\xi)</math>
| |
| − | exists for any <math> \xi\neq 0</math> .
| |
| − | The log normal distribution provides a counterexample.
| |
| | It has finite moments <math> \mu_r = e^{r^2/2}</math> of all orders, | | It has finite moments <math> \mu_r = e^{r^2/2}</math> of all orders, |
| − | but (<ref>powerseries</ref>) diverges for every <math> \xi\neq 0</math>. | + | but equation (<ref>powerseries</ref>) diverges for every <math> \xi\neq 0</math>. |
| − | ===Uniqueness===
| + | |
| − | The normal distribution
| + | |
| − | <math>N(\mu, \sigma^2)</math> has cumulant generating function
| + | |
| − | <math>\xi\mu + \xi^2 \sigma^2/2</math>, a quadratic polynomial implying that all cumulants
| + | |
| − | of order three and higher are zero.
| + | |
| − | Marcinkiewicz (1939) showed that the normal distribution is the only distribution
| + | |
| − | whose cumulant generating function is a polynomial, i.e. the only distribution
| + | |
| − | having a finite number of non-zero cumulants.
| + | |
| − | The Poisson distribution with mean
| + | |
| − | <math>\mu</math> has moment generating function
| + | |
| − | <math>\exp(\mu(e^\xi - 1))</math> and cumulant generating function
| + | |
| − | <math>\mu(e^\xi -1)</math>.
| + | |
| − | Consequently all the cumulants are equal to the mean.
| + | |
| − | | + | |
| − | Two distinct distributions may have the same moments, and hence the same cumulants.
| + | |
| − | This statement is fairly obvious for distributions whose moments are all infinite,
| + | |
| − | or even for distributions having infinite higher-order moments.
| + | |
| − | But it is much less obvious for distributions having finite moments of all orders.
| + | |
| − | Heyde (1963) gave one such pair of distributions with densities
| + | |
| − | <math>
| + | |
| − | f_1(x) = \exp(-(\log x)^2/2) / (x\sqrt{2\pi})
| + | |
| − | </math>
| + | |
| − | and <math>
| + | |
| − | f_2(x) = f_1(x) [1 + \sin(2\pi\log x)/2]
| + | |
| − | </math>
| + | |
| − | for
| + | |
| − | <math>x > 0</math>.
| + | |
| − | The first of these is called the log normal distribution.
| + | |
| − | To show that these distributions have the same moments it suffices to show that
| + | |
| − | :<math>
| + | |
| − | \int_0^\infty x^k f_1(x) \sin(2\pi\log x)\, dx = 0
| + | |
| − | </math>
| + | |
| − | for integer
| + | |
| − | <math>k\ge 1</math>, which can be shown by making the substitution
| + | |
| − | <math>\log x = y+k</math>.
| + | |
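The substitution can be written out as follows; this is a worked sketch added for illustration rather than a quotation from Heyde (1963). With <math>x = e^{y+k}</math> we have <math>dx = x\,dy</math> and <math>\sin(2\pi(y+k)) = \sin(2\pi y)</math> for integer <math>k</math>, so
:<math>
\int_0^\infty x^k f_1(x) \sin(2\pi\log x)\, dx
= \frac{e^{k^2/2}}{\sqrt{2\pi}} \int_{-\infty}^\infty e^{-y^2/2} \sin(2\pi y)\, dy = 0,
</math>
because <math>k(y+k) - (y+k)^2/2 = k^2/2 - y^2/2</math> and the remaining integrand is an odd function of <math>y</math>.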
| − | | + | |
| − | If the sequence of moments is such that (<ref>powerseries</ref>)
| + | |
| − | has a positive radius of convergence, the distribution is uniquely determined.
| + | |
| − | | + | |
| − | ===Properties===
| + | |
| − | Cumulants of order
| + | |
| − | <math>r \ge 2</math> are called semi-invariant on account of their
| + | |
| − | behavior under affine transformation of variables (Thiele, 1903; Dressel, 1940).
| + | |
| − | If
| + | |
| − | <math>\kappa_r</math> is the
| + | |
| − | <math>r</math>th cumulant of
| + | |
| − | <math>X</math>,
| + | |
| − | the
| + | |
| − | <math>r</math>th cumulant of the affine transformation
| + | |
| − | <math>a + b X</math> is
| + | |
| − | <math>b^r \kappa_r</math>,
| + | |
| − | independent of
| + | |
| − | <math>a</math>.
| + | |
| − | This behavior is considerably simpler than that of moments.
| + | |
| − | However, moments about the mean are also semi-invariant, so this property alone
| + | |
| − | does not explain why cumulants are useful for statistical purposes.
| + | |
| − | | + | |
| − | The term cumulant was coined by Fisher (1929) on account of their behavior under
| + | |
| − | addition of random variables.
| + | |
| − | Let
| + | |
| − | <math>S = X+Y</math> be the sum of two independent random variables.
| + | |
| − | The moment generating function of the sum is the product
| + | |
| − | :<math>
| + | |
| − | M_S(\xi) = M_X(\xi) M_Y(\xi),
| + | |
| − | </math>
| + | |
| − | and the cumulant generating function is the sum
| + | |
| − | :<math>
| + | |
| − | K_S(\xi) = K_X(\xi) + K_Y(\xi).
| + | |
| − | </math>
| + | |
| − | Consequently, the
| + | |
| − | <math>r</math>th cumulant of the sum is the sum of the
| + | |
| − | <math>r</math>th cumulants.
| + | |
| − | By extension, if
| + | |
| − | <math>X_1,\ldots X_n</math> are independent and identically distributed,
| + | |
| − | the
| + | |
| − | <math>r</math>th cumulant of the sum is
| + | |
| − | <math>n\kappa_r</math>.
| + | |
| − | Let
| + | |
| − | <math>\kappa_{n;r}</math> be
| + | |
| − | the cumulant of order
| + | |
| − | <math>r</math> of the standardized sum
| + | |
| − | <math>n^{-1/2}(X_1+\cdots + X_n)</math>;
| + | |
| − | then
| + | |
| − | :<math ndep>
| + | |
| − | \kappa_{n;r}=n^{1-r/2} \kappa_r.
| + | |
| − | </math>
| + | |
| − | Provided that the cumulants are finite, all cumulants of order
| + | |
| − | <math>r\ge 3</math>
| + | |
| − | of the standardized sum tend to zero, which is a simple demonstration of the central limit theorem.
| + | |
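The rate at which the higher-order cumulants vanish can be tabulated directly from the scaling relation above. The sketch below is illustrative only; the function name is hypothetical, and the summands are taken to be exponential with mean 1, so that <math>\kappa_r = (r-1)!</math> as in the table of examples.

```python
from math import factorial

def standardized_sum_cumulant(kappa_r, r, n):
    """Cumulant of order r of n**(-1/2) * (X_1 + ... + X_n) for i.i.d. summands."""
    return n ** (1 - r / 2) * kappa_r

# Exponential summands with mean 1 have kappa_r = (r - 1)!.
for n in (1, 100, 10000):
    row = [standardized_sum_cumulant(factorial(r - 1), r, n) for r in (2, 3, 4)]
    print(n, row)
```

The second cumulant is unchanged by the scaling, while the third and fourth decay like <math>n^{-1/2}</math> and <math>n^{-1}</math> respectively.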
| − | | + | |
| − | Good (1977) obtained an expression for the
| + | |
| − | <math>r</math>th cumulant of
| + | |
| − | <math>X</math> as
| + | |
| − | the
| + | |
| − | <math>r</math>th moment of the discrete Fourier transform of an independent and
| + | |
| − | identically distributed sequence as follows.
| + | |
| − | Let
| + | |
| − | <math>X_1, X_2,\ldots</math> be independent copies of
| + | |
| − | <math>X</math> with
| + | |
| − | <math>r</math>th cumulant
| + | |
| − | <math>\kappa_r</math>,
| + | |
| − | and let
| + | |
| − | <math>\omega = e^{2\pi i/n}</math> be a primitive
| + | |
| − | <math>n</math>th root of unity.
| + | |
| − | The discrete Fourier combination
| + | |
| − | :<math>
| + | |
| − | Z = X_1 + \omega X_2 + \cdots + \omega^{n-1} X_n
| + | |
| − | </math>
| + | |
| − | is a complex-valued random variable whose distribution is invariant under
| + | |
| − | rotation
| + | |
| − | <math>Z\sim \omega Z</math> through multiples of
| + | |
| − | <math>2\pi /n</math>.
| + | |
| − | The
| + | |
| − | <math>r</math>th cumulant of the sum is
| + | |
| − | <math>\kappa_r \sum_{j=1}^n \omega^{r j}</math>,
| + | |
| − | which is equal to
| + | |
| − | <math>n\kappa_r</math> if
| + | |
| − | <math>r</math> is a multiple of
| + | |
| − | <math>n</math>, and zero otherwise.
| + | |
| − | Consequently
| + | |
| − | <math>E(Z^r) = 0</math> for integer
| + | |
| − | <math>r < n</math> and
| + | |
| − | <math>E(Z^n) = n\kappa_n</math>.
| + | |
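The key step in this argument is the behavior of the power sum of the roots of unity. A small numerical sketch, with a hypothetical function name, shows the sum vanishing unless <math>r</math> is a multiple of <math>n</math>; the index runs over <math>j = 0,\ldots,n-1</math> to match the coefficients <math>1, \omega, \ldots, \omega^{n-1}</math> of <math>Z</math>, which gives the same total as the sum over <math>j = 1,\ldots,n</math>.

```python
import cmath

def root_of_unity_power_sum(r, n):
    """Sum of omega**(r*j) for j = 0..n-1, where omega = exp(2*pi*i/n)."""
    omega = cmath.exp(2j * cmath.pi / n)
    return sum(omega ** (r * j) for j in range(n))

n = 5
for r in range(1, 11):
    total = root_of_unity_power_sum(r, n)
    # Round away floating-point noise before printing.
    print(r, round(total.real, 9), round(total.imag, 9))
```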
| − | | + | |
| − | | + | |
| − | ===Multivariate cumulants===
| + | |
| − | Somewhat surprisingly, the relation between moments and cumulants is simpler and
| + | |
| − | more transparent in the multivariate case than in the univariate case.
| + | |
| − | Let
| + | |
| − | <math>X = (X^1,\ldots, X^k)</math> be the components of a random vector.
| + | |
| − | In a departure from the univariate notation, we write
| + | |
| − | <math>\kappa^r = E(X^r)</math> for the components of the mean vector,
| + | |
| − | <math>\kappa^{rs} = E(X^r X^s)</math> for the components of the second moment matrix,
| + | |
| − | <math>\kappa^{r s t} = E(X^r X^s X^t)</math> for the third moments, and so on.
| + | |
| − | It is convenient notationally to adopt Einstein's summation convention,
| + | |
| − | so
| + | |
| − | <math>\xi_r X^r</math> denotes the linear combination
| + | |
| − | <math>\xi_1 X^1 + \cdots + \xi_k X^k</math>,
| + | |
| − | the square of the linear combination is
| + | |
| − | <math>(\xi_r X^r)^2 = \xi_r\xi_s X^r X^s</math>
| + | |
| − | a sum of
| + | |
| − | <math>k^2</math> terms, and so on for higher powers.
| + | |
| − | The Taylor expansion of the moment generating function
| + | |
| − | <math>M(\xi) = E(\exp(\xi_r X^r))</math>
| + | |
| − | is
| + | |
| − | :<math>
| + | |
| − | M(\xi) = 1 + \xi_r \kappa^r
| + | |
| − | + \textstyle{\frac1{2!}} \xi_r\xi_s \kappa^{rs}
| + | |
| − | + \textstyle{\frac1{3!}} \xi_r\xi_s \xi_t \kappa^{r s t} +\cdots.
| + | |
| − | </math>
| + | |
| − | The cumulants are defined as the coefficients
| + | |
| − | <math>\kappa^{r,s}, \kappa^{r,s,t},\ldots</math>
| + | |
| − | in the Taylor expansion
| + | |
| − | :<math>
| + | |
| − | \log M(\xi) = \xi_r \kappa^r
| + | |
| − | + \textstyle{\frac1{2!}} \xi_r\xi_s \kappa^{r,s}
| + | |
| − | + \textstyle{\frac1{3!}} \xi_r\xi_s \xi_t \kappa^{r,s,t} +\cdots.
| + | |
| − | </math>
| + | |
| − | This notation does not distinguish first-order moments from first-order cumulants,
| + | |
| − | but commas separating the superscripts serve to distinguish higher-order cumulants from moments.
| + | |
| − | | + | |
| − | Comparison of coefficients reveals that each moment
| + | |
| − | <math>\kappa^{rs}, \kappa^{r s t},\ldots</math>
| + | |
| − | is a sum over partitions of the superscripts, each term in the sum being a
| + | |
| − | product of cumulants:
| + | |
| − | :<math>\begin{array}{lcl}
| + | |
| − | \kappa^{rs}&=&\kappa^{r,s} + \kappa^r\kappa^s\\
| + | |
| − | \kappa^{r s t}&=&\kappa^{r,s,t} + \kappa^{r,s}\kappa^t + \kappa^{r,t}\kappa^s + \kappa^{s,t}\kappa^r
| + | |
| − | + \kappa^r\kappa^s\kappa^t\\
| + | |
| − | &=&\kappa^{r,s,t} + \kappa^{r,s}\kappa^t[3] + \kappa^r\kappa^s\kappa^t\\
| + | |
| − | \kappa^{r s t u}&=&\kappa^{r,s,t,u} + \kappa^{r,s,t}\kappa^u[4] + \kappa^{r,s}\kappa^{t,u}[3]
| + | |
| − | + \kappa^{r,s}\kappa^t\kappa^u[6] + \kappa^r\kappa^s\kappa^t\kappa^u.
| + | |
| − | \end{array}</math>
| + | |
| − | Each bracketed number indicates a sum over distinct partitions
| + | |
| − | having the same block sizes, so the fourth-order moment is a sum of 15 distinct cumulant products.
| + | |
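The count of 15 and the bracketed multiplicities can be checked by enumerating set partitions. The sketch below is an illustration with hypothetical function names; it also evaluates the univariate fourth moment as a sum over partitions, using the Poisson distribution with mean 2, for which every cumulant equals 2 and <math>\mu_4 = 94</math>.

```python
from collections import Counter
from math import prod

def set_partitions(items):
    """Yield every partition of the list `items` as a list of blocks."""
    if not items:
        yield []
        return
    first, rest = items[0], items[1:]
    for partition in set_partitions(rest):
        # Put `first` into each existing block in turn ...
        for i in range(len(partition)):
            yield partition[:i] + [[first] + partition[i]] + partition[i + 1:]
        # ... or into a new block of its own.
        yield [[first]] + partition

def moment_from_cumulants(order, kappa):
    """Univariate moment of the given order; kappa[r] is the r-th cumulant."""
    return sum(prod(kappa[len(block)] for block in partition)
               for partition in set_partitions(list(range(order))))

# Block-size patterns of the partitions of the four indices r, s, t, u.
shapes = Counter(tuple(sorted((len(b) for b in p), reverse=True))
                 for p in set_partitions(list("rstu")))
print(dict(shapes), sum(shapes.values()))
print(moment_from_cumulants(4, [0, 2, 2, 2, 2]))
```

The pattern counts correspond to the multipliers [4], [3] and [6] in the fourth-order formula.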
| − | In the reverse direction, each cumulant is also a sum over partitions of the indices.
| + | |
| − | Each term in the sum is a product of moments, but with coefficient
| + | |
| − | <math>(-1)^{\nu-1} (\nu-1)!</math>
| + | |
| − | where
| + | |
| − | <math>\nu</math> is the number of blocks:
| + | |
| − | :<math>\begin{array}{lcl}
| + | |
| − | \kappa^{r,s} &=& \kappa^{rs} - \kappa^r\kappa^s\\
| + | |
| − | \kappa^{r,s,t} &=& \kappa^{r s t} - \kappa^{rs}\kappa^t[3] + 2 \kappa^r\kappa^s\kappa^t\\
| + | |
| − | \kappa^{r,s,t,u} &=& \kappa^{r s t u} - \kappa^{r s t}\kappa^u[4] - \kappa^{rs}\kappa^{t u}[3]
| + | |
| − | + 2 \kappa^{rs}\kappa^t\kappa^u[6] - 6 \kappa^r\kappa^s\kappa^t\kappa^u
| + | |
| − | \end{array}</math>
| + | |
| − | | + | |
| − | These relationships are an instance of Möbius inversion on the partition lattice.
| + | |
| − | | + | |
| − | Partition notation serves one additional purpose.
| + | |
| − | It establishes moments and cumulants as special cases of generalized cumulants,
| + | |
| − | which include objects of the type
| + | |
| − | <math>\kappa^{r,st} = {\rm cov}(X^r, X^s X^t)</math>,
| + | |
| − | <math>\kappa^{rs, t u} = {\rm cov}(X^r X^s, X^t X^u)</math>, and
| + | |
| − | <math>\kappa^{rs, t, u}</math> with incompletely partitioned indices.
| + | |
| − | These objects arise very naturally in statistical work involving asymptotic
| + | |
| − | approximation of distributions.
| + | |
| − | They are intermediate between moments and cumulants, and have characteristics of both.
| + | |
| − | | + | |
| − | Every generalized cumulant can be expressed as a sum of certain products of ordinary cumulants.
| + | |
| − | Some examples are as follows:
| + | |
| − | :<math>\begin{array}{lcl}
| + | |
| − | \kappa^{rs, t} &=& \kappa^{r,s,t} + \kappa^r\kappa^{s,t} + \kappa^s \kappa^{r,t}\\
| + | |
| − | &=& \kappa^{r,s,t} + \kappa^r\kappa^{s,t}[2]\\
| + | |
| − | \kappa^{rs,t u} &=& \kappa^{r,s,t,u} + \kappa^{r,s,t}\kappa^u[4] + \kappa^{r,t}\kappa^{s,u}[2]
| + | |
| − | + \kappa^{r,t}\kappa^s\kappa^u[4]\\
| + | |
| − | \kappa^{rs,t,u} &=& \kappa^{r,s,t,u} + \kappa^{r,t,u}\kappa^s[2] + \kappa^{r,t}\kappa^{s,u}[2]
| + | |
| − | \end{array}</math>
| + | |
| − | Each generalized cumulant is associated with a partition
| + | |
| − | <math>\tau</math> of the given set of indices.
| + | |
| − | For example,
| + | |
| − | <math>\kappa^{rs,t,u}</math> is associated with the partition
| + | |
| − | <math>\tau=rs|t|u</math> of four indices
| + | |
| − | into three blocks.
| + | |
| − | Each term on the right is a cumulant product associated with a partition
| + | |
| − | <math>\sigma</math> of the same indices.
| + | |
| − | The coefficient is one if the least upper bound
| + | |
| − | <math>\sigma\vee\tau</math> has a single block,
| + | |
| − | otherwise zero.
| + | |
| − | Thus, with
| + | |
| − | <math>\tau=rs|t|u</math>, the product
| + | |
| − | <math>\kappa^{r,s}\kappa^{t,u}</math> does not appear
| + | |
| − | on the right because
| + | |
| − | <math>\sigma\vee\tau = rs|t u</math> has two blocks.
| + | |
| − | | + | |
| − | As an example of the way these formulae may be used,
| + | |
| − | let
| + | |
| − | <math>X</math> be a scalar random variable with cumulants
| + | |
| − | <math>\kappa_1,\kappa_2,\kappa_3,\ldots</math>.
| + | |
| − | By translating the second formula in the preceding list, we find that
| + | |
| − | the variance of the squared variable is
| + | |
| − | :<math>
| + | |
| − | {\rm var}(X^2) = \kappa_4 + 4\kappa_3\kappa_1 + 2\kappa_2^2 + 4\kappa_2\kappa_1^2,
| + | |
| − | </math>
| + | |
| − | reducing to
| + | |
| − | <math>\kappa_4 + 2\kappa_2^2</math> if the mean is zero.
| + | |
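As a cross-check, offered here as a sketch rather than as part of the original derivation, the same expression follows from the univariate relations between moments and cumulants given earlier:
:<math>\begin{array}{lcl}
{\rm var}(X^2) &=& \mu_4 - \mu_2^2\\
&=& (\kappa_4 + 4\kappa_3\kappa_1 + 3\kappa_2^2 + 6\kappa_2\kappa_1^2 + \kappa_1^4) - (\kappa_2 + \kappa_1^2)^2\\
&=& \kappa_4 + 4\kappa_3\kappa_1 + 2\kappa_2^2 + 4\kappa_2\kappa_1^2.
\end{array}</math>
For the Poisson distribution with mean 2, where every cumulant equals 2, both routes give <math>94 - 36 = 2 + 16 + 8 + 32 = 58</math>.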
| − | | + | |
| − | ===Exponential families===
| + | |
| − | Let <math> f</math> be a probability distribution on an arbitrary measurable space <math> ({\mathcal X},\nu)</math> ,
| + | |
| − | and let <math> t\colon{\mathcal X}\to{\mathcal R}</math> be a real-valued random variable
| + | |
| − | with cumulant generating function
| + | |
| − | <math> K(\cdot)</math> , finite in a set <math> \Theta</math> containing zero in the interior.
| + | |
| − | The family of distributions on <math> {\mathcal X}</math> with density
| + | |
| − | :<math>
| + | |
| − | f_\theta(x) = e^{\theta t(x)} f(x) / M(\theta) = e^{\theta t(x) - K(\theta)} f(x)
| + | |
| − | </math>
| + | |
| − | indexed by <math> \theta\in\Theta</math> is called the exponential family
| + | |
| − | associated with <math> f</math> and the canonical statistic <math> t</math> .
| + | |
| − | In statistical physics, the normalizing constant <math> M(\theta)</math> is called the
| + | |
| − | partition function.
| + | |
| − | | + | |
| − | Two examples suffice to illustrate the idea.
| + | |
| − | In the first example, <math> {\mathcal X} = \{1,2,\ldots\}</math> is the set of positive integers,
| + | |
| − | <math> f(x) \propto 1/x^2</math> and <math> t(x) = -\log(x)</math> .
| + | |
| − | The associated exponential family is
| + | |
| − | <math> f_\theta(x) = x^{-\theta}/\zeta(\theta)</math> ,
| + | |
| − | where <math> \zeta(\theta)</math> is the Riemann zeta function with real argument <math> \theta > 1</math> .
| + | |
| − | | + | |
| − | In the second example, <math> {\mathcal X}={\mathcal X}_n</math> is the symmetric group or the set of
| + | |
| − | permutations of <math> n</math> letters,
| + | |
| − | <math> x\in{\mathcal X}_n</math> is a permutation, <math> t(x)</math> is the number of cycles,
| + | |
| − | <math> f(x) = 1/n!</math> is the uniform distribution,
| + | |
| − | and <math> M_n(\xi) = \Gamma(n+e^\xi)/(n!\, \Gamma(e^\xi))</math> for all real <math> \xi</math> .
| + | |
| − | The exponential family of distributions on permutations of <math> [n]</math> is
| + | |
| − | :<math>
| + | |
| − | f_{n,\theta}(x) = \frac{\Gamma(\lambda)\, \lambda^{t(x)}} {\Gamma(n+\lambda)},
| + | |
| − | </math>
| + | |
| − | the same as the distribution generated by the Chinese restaurant process
| + | |
| − | with parameter <math> \lambda = e^\theta</math> .
| + | |
| − | The associated marginal distribution on partitions,
| + | |
| − | the Ewens distribution on partitions of <math> [n]</math> ,
| + | |
| − | is also of the exponential-family form with canonical statistic equal
| + | |
| − | to the number of blocks or cycles.
| + | |
| − | This number <math> t(x)</math> is a random variable whose cumulants are the
| + | |
| − | derivatives of <math> \log M(\cdot)</math> evaluated at the parameter <math> \theta</math> .
| + | |
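For the permutation example, the first cumulant at <math>\theta = 0</math> can be checked numerically against the known mean number of cycles of a uniform random permutation, the harmonic number <math>1 + 1/2 + \cdots + 1/n</math>. The sketch below uses a central finite difference in place of an exact derivative; the function names and the step size are assumptions made for illustration.

```python
from math import exp, lgamma

def cgf_cycles(xi, n):
    """K_n(xi) = log M_n(xi), with M_n(xi) = Gamma(n + e^xi) / (n! Gamma(e^xi))."""
    lam = exp(xi)
    return lgamma(n + lam) - lgamma(lam) - lgamma(n + 1)

def derivative(func, x, h=1e-4):
    """Central finite-difference approximation to the first derivative."""
    return (func(x + h) - func(x - h)) / (2 * h)

n = 10
mean_cycles = derivative(lambda xi: cgf_cycles(xi, n), 0.0)
harmonic = sum(1 / j for j in range(1, n + 1))
print(mean_cycles, harmonic)
```

Evaluating the derivative at a nonzero <math>\theta</math> instead of 0 gives the mean number of cycles under the tilted distribution with <math>\lambda = e^\theta</math>.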
| − | | + | |
| − | | + | |
| − | In the multi-parameter case,
| + | |
| − | <math> t\colon{\mathcal X}\to{\mathcal R}^p</math> is a random vector
| + | |
| − | and <math> \xi\colon{\mathcal R}^p\to{\mathcal R}</math> is a linear functional,
| + | |
| − | <math> M(\xi) = E(e^{\xi(t)})</math> is the joint moment generating function.
| + | |
| − | It is sometimes convenient to employ Einstein's implicit summation convention
| + | |
| − | in the form <math> \theta(t) = \theta_i t^i</math> where <math> t^1,\ldots, t^p</math> are
| + | |
| − | the components of <math> t(x)</math> , and <math> \theta_1,\ldots, \theta_p</math> are the coefficients
| + | |
| − | of the linear functional.
| + | |
| − | For simplicity of notation in what follows, <math> {\mathcal X}={\mathcal R}^p</math> and <math> t(x) = x</math>
| + | |
| − | is the identity function.
| + | |
| − | An exponential-family distribution in <math> {\mathcal R}^p</math> has the form
| + | |
| − | :<math>
| + | |
| − | f_\theta(x)=\exp(x^j\theta_j-g(x)-\varphi(\theta))
| + | |
| − | </math>
| + | |
| − | for given functions <math> g</math> and <math> \varphi</math> .
| + | |
| − | Integration shows that the distribution <math> f_\theta</math> has
| + | |
| − | cumulant generating function <math> K_\theta(\xi)=\varphi(\theta+\xi)-\varphi(\theta)</math> .
| + | |
| − | The cumulants of <math> X\sim f_\theta</math> are equal to the derivatives of <math> \varphi</math>
| + | |
| − | at the parameter <math> \theta</math> .
| + | |
| − | | + | |
| − | ===Calculus of cumulants===
| + | |
| − | The umbral calculus is a syntax or formal system consisting of
| + | |
| − | certain operations on objects called umbrae,
| + | |
| − | mimicking addition and multiplication of independent real-valued random
| + | |
| − | variables. Rota and Taylor (1994) review this calculus.
| + | |
| − | To each real-valued sequence <math> 1, a_1, a_2,\ldots</math>
| + | |
| − | there corresponds an umbra <math> \alpha</math> such that <math> E(\alpha^r) = a_r</math> .
| + | |
| − | This freedom gives rise to special umbrae, the singleton and Bell umbra,
| + | |
| − | corresponding to no real-valued random variable.
| + | |
| − | Using these special umbrae, one develops the formal notion of an
| + | |
| − | <math>\alpha</math>-cumulant umbra
| + | |
| − | <math>\chi\cdot\alpha</math>
| + | |
| − | by formal product operations in the syntax.
| + | |
| − | Properties of cumulants, <math>k</math>-statistics and other polynomial functions
| + | |
| − | are then derived by purely formal combinatorial operations.
| + | |
| − | Di Nardo et al. (2008) present details.
| + | |
| − | | + | |
| − | Streitberg (1990) presents parallels between the calculus of cumulants and the
| + | |
| − | calculus of certain decompositions of multivariate cumulative distribution
| + | |
| − | functions into independent segments; these characterizations in terms of
| + | |
| − | independent segments are called Lancaster interactions.
| + | |
| − | ===Moment and Cumulant Measures for Random Measures===
| + | |
| − | Moments and cumulants extend quite naturally to random distributions.
| + | |
| − | Let <math>\upsilon</math> be a random measure on a space <math>\Upsilon</math>.
| + | |
| − | Then the expectation of <math>\upsilon</math> is
| + | |
| − | defined as that measure such that <math>E(\upsilon)(A)=E(\upsilon(A))</math>, for <math>A</math> in a suitable sigma field. Higher-order
| + | |
| − | moments then translate to expectations of product measures.
| + | |
| − | Let <math>\upsilon^{(k)}</math> be the measure defined on
| + | |
| − | <math>\Upsilon^k</math>, such that
| + | |
| − | <math>\upsilon^{(k)}(A_1\times\cdots\times A_k)=\prod_{j=1}^k\upsilon(A_j)</math>.
| + | |
| − | Then the moment of order <math>k</math> of <math>\upsilon</math> is <math>E(\upsilon^{(k)})</math>.
| + | |
| − | A moment generating functional can similarly be defined for <math>\upsilon</math>; a heuristic definition may be constructed through analogy with
| + | |
| − | (<ref>powerseries</ref>): Let
| + | |
| − | :<math>
| + | |
| − | \Phi(f)=\sum_{r=0}^\infty \int_{\Upsilon^r} f(x_1)\cdots f(x_r)\,E(\upsilon^{(r)})(d x_1\cdots d x_r)/r!,
| + | |
| − | </math>
| + | |
| − | for certain functions <math>f</math> on <math>\Upsilon</math>,
| + | |
| − | and moments can be recovered from <math>\Phi(f)</math> via Fréchet
| + | |
| − | differentiation.
| + | |
| − | Cumulants can then be defined as in (<ref>forward</ref>), using the obvious analogy.
| + | |
| − | These moments and cumulants have application to the theory of point processes.
| + | |
| − | The above exposition, and applications to the theory of point processes,
| + | |
| − | can be found in Daley and Vere-Jones (1988).
| + | |
| − | ==Approximation of distributions==
| + | |
| − | ===Edgeworth approximation===
| + | |
| − | Suppose that
| + | |
| − | <math>Y</math> is a random variable that arises as the sum
| + | |
| − | of
| + | |
| − | <math>n</math> independent and identically distributed summands, each of which has
| + | |
| − | mean
| + | |
| − | <math>0</math>, unit variance, and
| + | |
| − | cumulants
| + | |
| − | <math>\kappa_r</math>, and
| + | |
| − | <math>X=Y/\sqrt{n}</math>.
| + | |
| − | For ease of exposition, assume that cumulants of all orders exist.
| + | |
| − | Then, using (<ref>ndep</ref>), the cumulant generating function of
| + | |
| − | <math>X</math> is given by
| + | |
| − | <math>K(\xi)=\xi^2/2 +\kappa_3\xi^3/(6\sqrt{n}) +\kappa_4\xi^4/(24 n) +\cdots</math>,
| + | |
| − | and the moment generating function of
| + | |
| − | <math>X</math> is given by
| + | |
| − | :<math>
| + | |
| − | M(\xi)=\exp(\xi^2/2)\exp(\kappa_3\xi^3/(6\sqrt{n})+\kappa_4\xi^4/(24 n)+\cdots)
| + | |
| − | </math>
| + | |
| − | Exponentiating the second factor gives
| + | |
| − | :<math>
| + | |
| − | M(\xi)=\exp(\xi^2/2)\left(1\!+\!{{\kappa_3\xi^3}\over{6\sqrt{n}}}\!+\!{{\kappa_4\xi^4}\over{24 n}}\!+\!\cdots\!+\! {\textstyle{\frac12}} \left[
| + | |
| − | {{\kappa_3\xi^3}\over{6\sqrt{n}}}\!+\!{{\kappa_4\xi^4}\over{24 n}}\!+\!\cdots\right]^2\!+\!\!\cdots\right).
| + | |
| − | </math>
| + | |
| − | Reordering terms in powers of sample size,
| + | |
| − | :<math kseries>
| + | |
| − | M(\xi)=\exp(\xi^2/2)\left(1+{{\kappa_3\xi^3}\over{6\sqrt{n}}}+{{\kappa_4\xi^4}\over{24 n}}+
| + | |
| − | {{\kappa_3^2\xi^6}\over{72 n}}+\cdots\right).
| + | |
| − | </math>
| + | |
| − | Repeated application of integration by parts to (<ref>mgfdef</ref>) shows that
| + | |
| − | :<math mgfderiv>
| + | |
| − | \xi^r M(\xi) =\int_{-\infty}^\infty\exp(\xi x)(-1)^r f^{(r)}(x) d x,
| + | |
| − | </math>
| + | |
| − | where
| + | |
| − | <math>f^{(r)}</math> denotes the derivative of
| + | |
| − | <math>f</math> of order
| + | |
| − | <math>r</math>. Relation
| + | |
| − | (<ref>mgfderiv</ref>) holds if
| + | |
| − | <math>f</math> and its derivatives go to zero quickly
| + | |
| − | as
| + | |
| − | <math>\vert x\vert\to\infty</math>. Applying (<ref>mgfderiv</ref>) to the normal
| + | |
| − | density
| + | |
| − | <math>\phi(x)=\exp(-x^2/2)/\sqrt{2\pi}</math>, and applying the result to
| + | |
| − | (<ref>kseries</ref>), gives
| + | |
| − | :<math>
| + | |
| − | M(\xi)\approx\int_{-\infty}^\infty\exp(\xi x)\phi(x)\left[1+{{\kappa_3 h^3(x)}\over{6\sqrt{n}}}+{{\kappa_4h^4(x)}\over{24 n}}+
| + | |
| − | {{\kappa_3^2h^6(x)}\over{72 n}}\right] d x
| + | |
| − | </math>
| + | |
| − | for
| + | |
| − | <math>h^r(x)=(-1)^r\phi^{(r)}(x)/\phi(x)</math>, and, since the relationship
| + | |
| − | giving the moment generating function in terms of the density is invertible,
| + | |
| − | and the inversion process is suitably smooth,
| + | |
| − | Edgeworth (1907) approximates the density of
| + | |
| − | <math>X</math> by
| + | |
| − | :<math edser>
| + | |
| − | e_4(x)=\phi(x)\left[1+{{\kappa_3 h^3(x)}\over{6\sqrt{n}}}+{{\kappa_4h^4(x)}\over{24 n}}+
| + | |
| − | {{\kappa_3^2h^6(x)}\over{72 n}}\right].
| + | |
| − | </math>
| + | |
| − | In fact, when the summands contributing to
| + | |
| − | <math>Y</math> have a density and cumulants of order at least 5, the error in the
| + | |
| − | approximation, multiplied by
| + | |
| − | <math>n^{3/2}</math>, remains bounded.
| + | |
| − | The functions
| + | |
| − | <math>h^r</math> defined above are the Hermite polynomials.
| + | |
| − | The approximation (<ref>edser</ref>) is known as the Edgeworth series.
| + | |
| − | The subscript refers to the number of cumulants used in its definition.
| + | |
| − | This series can be used to approximate either the cumulative distribution function or survival function through term-wise integration.
| + | |
| − | | + | |
| − | The preceding discussion is intended to be heuristic; Kolassa (2006) presents
| + | |
| − | a rigorous derivation, along with the natural extension to random vectors.
| + | |
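The four-term series (<ref>edser</ref>) can be evaluated numerically. The following Python sketch is illustrative only: the function names `hermite`, `phi` and `edgeworth_e4` are hypothetical, and the arguments `kappa3` and `kappa4` are assumed to be the third and fourth cumulants of a single standardized summand. The example takes summands that are unit exponential, for which these cumulants are 2 and 6, and compares the series with the exact density of the standardized sum at the origin.

```python
import math

def hermite(r, x):
    """Hermite polynomial h^r(x) = (-1)^r phi^(r)(x) / phi(x), computed with
    the recurrence h^{k+1}(x) = x h^k(x) - k h^{k-1}(x)."""
    prev, cur = 1.0, x
    if r == 0:
        return prev
    for k in range(1, r):
        prev, cur = cur, x * cur - k * prev
    return cur

def phi(x):
    """Standard normal density."""
    return math.exp(-x * x / 2) / math.sqrt(2 * math.pi)

def edgeworth_e4(x, kappa3, kappa4, n):
    """Edgeworth approximation e_4(x) to the density of a standardized sum
    of n independent, identically distributed summands."""
    correction = (1
                  + kappa3 * hermite(3, x) / (6 * math.sqrt(n))
                  + kappa4 * hermite(4, x) / (24 * n)
                  + kappa3 ** 2 * hermite(6, x) / (72 * n))
    return phi(x) * correction

def exact_exponential_sum_density(x, n):
    """Exact density of (S - n)/sqrt(n), where S is a sum of n unit
    exponentials, so that S has a gamma distribution with shape n."""
    s = n + x * math.sqrt(n)
    if s <= 0:
        return 0.0
    return math.exp(0.5 * math.log(n) + (n - 1) * math.log(s) - s - math.lgamma(n))

if __name__ == "__main__":
    n = 20
    for x in (-1.0, 0.0, 1.0, 2.0):
        print(x, phi(x), edgeworth_e4(x, 2.0, 6.0, n),
              exact_exponential_sum_density(x, n))
```

The recurrence avoids symbolic differentiation of the normal density, and the same `hermite` function supplies the values at the origin used in the saddlepoint calculation.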
| − | ===Saddlepoint approximation===
| + | |
| − | The approximation (<ref>edser</ref>) to the density
| + | |
| − | <math>f(x)</math> has the property that
| + | |
| − | <math>|f(x)-e_r(x)|\leq C n^{-(r-1)/2}</math>, for some constant
| + | |
| − | <math>C</math>,
| + | |
| − | when the cumulant of order
| + | |
| − | <math>r+1</math> exists;
| + | |
| − | <math>C</math> does not depend on
| + | |
| − | <math>x</math>.
| + | |
| − | A similar bound holds for the relative error
| + | |
| − | <math>(f(x)-e_r(x))/f(x)</math>, only when
| + | |
| − | <math>x</math> is restricted to a finite interval.
| + | |
| − | Because of the polynomial factor multiplying the first omitted term in
| + | |
| − | (<ref>edser</ref>), the relative error can be expected to behave poorly.
| + | |
| − | One might prefer an approximation that maintains good behavior for
| + | |
| − | values of
| + | |
| − | <math>X</math> in a range that increases as
| + | |
| − | <math>n</math> increases; specifically,
| + | |
| − | one might prefer an approximation that performs well for values of
| + | |
| − | <math>\bar Y=X/\sqrt{n}</math> in a fixed interval.
| + | |
| − | | + | |
| − | Assume again that random variables
| + | |
| − | <math>Y_j</math> are independent and identically distributed, each with a cumulant generating function
| + | |
| − | <math>K(\xi)</math> finite for
| + | |
| − | <math>\xi</math>
| + | |
| − | in a neighborhood of
| + | |
| − | <math>0</math>. As above, define the exponential family
| + | |
| − | :<math>
| + | |
| − | f_{\bar Y}(\bar y;\theta)=\exp(\theta\bar y-K(\theta))f_{\bar Y}(\bar y).
| + | |
| − | </math>
| + | |
| − | One can then choose a value of
| + | |
| − | <math>\theta</math> depending on
| + | |
| − | <math>\bar y</math>
| + | |
| − | that makes
| + | |
| − | <math>f_{\bar Y}(\bar y;\theta)</math> easy to approximate, and
| + | |
| − | the exponential family relationship to derive an approximation for
| + | |
| − | <math>f_{\bar Y}(\bar y)</math>. Conventionally we choose
| + | |
| − | <math>\hat\theta</math> to
| + | |
| − | satisfy
| + | |
| − | :<math speqn>
| + | |
| − | K'(\hat\theta)=\bar y;
| + | |
| − | </math>
| + | |
| − | this makes the expectation of the distribution
| + | |
| − | with density
| + | |
| − | <math>f_{\bar Y}(\cdot;\hat\theta)</math> equal to the observed value.
| + | |
| − | One then applies (<ref>edser</ref>), with the scale of the ordinate changed
| + | |
| − | to reflect the fact that we are approximating the distribution of
| + | |
| − | <math>X/\sqrt{n}</math>,
| + | |
| − | to obtain
| + | |
| − | :<math>
| + | |
| − | f_{\bar Y}(\bar y)\approx\exp(-\hat\theta\bar y+K(\hat\theta))
| + | |
| − | n\phi(0)\left[1+{{\kappa_3 h^3(0)}\over{6\sqrt{n}}}+{{\kappa_4h^4(0)}\over{24 n}}+
| + | |
| − | {{\kappa_3^2h^6(0)}\over{72 n}}\right].
| + | |
| − | </math>
| + | |
| − | Using the fact that <math>h^3(0)=0</math>,
| + | |
| − | <math>h^4(0)=3</math>, and <math>h^6(0)=-15</math>,
| + | |
| − | we obtain
| − | :<math spser>
| + | |
| − | f_{\bar Y}(\bar y)\approx{{n}\over{\sqrt{2\pi}}}
| + | |
| − | \exp(K(\hat\theta)-\hat\theta\bar y)
| + | |
| − | \left[1+{{\hat\kappa_4}\over{8 n}}-
| + | |
| − | {{5\hat\kappa_3^2}\over{24 n}}\right].
| + | |
| − | </math>
| + | |
| − | Here
| + | |
| − | <math>\hat\kappa_j</math> are calculated from the derivatives of
| + | |
| − | <math>K</math> in the preceding manner, but in this case evaluated at
| + | |
| − | <math>\hat\theta</math>.
| + | |
| − | This approximation may only be applied to values of
| + | |
| − | <math>\bar y</math> for which
| + | |
| − | (<ref>speqn</ref>) has solutions in an open neighborhood of 0.
| + | |
| − | Expression (<ref>spser</ref>) represents the saddlepoint approximation to
| + | |
| − | the density of the mean
| + | |
| − | <math>\bar Y</math>; since
| + | |
| − | <math>f_{\bar Y}(\bar y;\theta)</math>
| + | |
| − | has a cumulant generating function defined on an open set containing
| + | |
| − | <math>0</math>,
| + | |
| − | cumulants of all orders exist, the Edgeworth series including
| + | |
| − | <math>\kappa_6</math>
| + | |
| − | may be applied to
| + | |
| − | <math>f_{\bar Y}(\bar y;\theta)</math>, and so the error in the
| + | |
| − | Edgeworth series is of order
| + | |
| − | <math>O(1/n^2)</math>. Hence the error in (<ref>spser</ref>)
| + | |
| − | is of the same order, and in this case, is relative and uniform for values of
| + | |
| − | <math>\bar y</math> in a bounded subset of an open subset on which (<ref>speqn</ref>)
| + | |
| − | has a solution.
| + | |
| − | This approximation was introduced to the statistics literature by
| + | |
| − | Daniels (1954).
| + | |
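A numerical illustration of the leading term may help. The Python sketch below does not use the standardized notation of (<ref>spser</ref>); it assumes the commonly quoted unstandardized form of the leading term, <math>\sqrt{n/(2\pi K''(\hat\theta))}\exp\{n[K(\hat\theta)-\hat\theta\bar y]\}</math>, and omits the correction terms in <math>\hat\kappa_3</math> and <math>\hat\kappa_4</math>. All function names are hypothetical. The example uses unit exponential summands, for which <math>K(\theta)=-\log(1-\theta)</math> and the exact density of the mean is a gamma density.

```python
import math

def saddlepoint_density_mean(ybar, n, K, K1, K2, lo, hi):
    """Leading-order saddlepoint approximation to the density of the mean of
    n i.i.d. variables with cumulant generating function K (K1 and K2 are
    its first two derivatives). The interval [lo, hi] is assumed to bracket
    the solution of the saddlepoint equation K'(theta) = ybar."""
    # K is convex, so K' is increasing and bisection applies.
    for _ in range(200):
        mid = 0.5 * (lo + hi)
        if K1(mid) < ybar:
            lo = mid
        else:
            hi = mid
    theta = 0.5 * (lo + hi)
    return (math.sqrt(n / (2 * math.pi * K2(theta)))
            * math.exp(n * (K(theta) - theta * ybar)))

# Cumulant generating function of the unit exponential, finite for theta < 1.
def exp_K(theta):
    return -math.log(1 - theta)

def exp_K1(theta):
    return 1 / (1 - theta)

def exp_K2(theta):
    return 1 / (1 - theta) ** 2

def gamma_mean_density(ybar, n):
    """Exact density of the mean of n unit exponentials."""
    return math.exp(n * math.log(n) + (n - 1) * math.log(ybar)
                    - n * ybar - math.lgamma(n))

if __name__ == "__main__":
    n = 10
    for ybar in (0.5, 1.0, 2.0):
        approx = saddlepoint_density_mean(ybar, n, exp_K, exp_K1, exp_K2,
                                          -50.0, 1 - 1e-12)
        print(ybar, approx, gamma_mean_density(ybar, n))
```

For this family the ratio of the approximation to the exact density does not depend on <math>\bar y</math>: it equals the ratio of <math>\Gamma(n)</math> to its Stirling approximation. This illustrates the relative, rather than absolute, nature of the saddlepoint error.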
| − | | + | |
| − | The Edgeworth series for the density was trivially integrated to obtain an
| + | |
| − | approximation to tail probabilities. Integration of the saddlepoint
| + | |
| − | approximation is more delicate. Two main approaches have been investigated.
| + | |
| − | Daniels (1987) expresses
| + | |
| − | <math>f_{\bar Y}(\bar y)</math> exactly as a complex integral
| + | |
| − | involving
| + | |
| − | <math>K(\xi)</math>, integrates with respect to
| + | |
| − | <math>\bar y</math> to obtain another
| + | |
| − | complex integral, and reviews techniques for approximating the resulting
| + | |
| − | integrals.
| + | |
| − | Robinson (1982) and Lugannani and Rice (1980) derive tail probability approximations based
| + | |
| − | on approximately integrating (<ref>spser</ref>) with respect to
| + | |
| − | <math>\bar y</math> directly.
| + | |
| − | | + | |
| − | These saddlepoint and Edgeworth approximations have multivariate and
| + | |
| − | conditional extensions. Davison (1988) exploits the conditional saddlepoint tail probability approximation to perform inference in canonical exponential families.
| + | |
| − | ==Samples and sub-samples==
| + | |
| − | A function
| + | |
| − | <math>f\colon{\mathcal R}^n\to{\mathcal R}</math> is symmetric if
| + | |
| − | <math>f(x_1 ,\ldots, x_n) = f(x_{\pi(1)} ,\ldots, x_{\pi(n)})</math>
| + | |
| − | for each permutation
| + | |
| − | <math>\pi</math> of the arguments.
| + | |
| − | For example, the total
| + | |
| − | <math>T_n = x_1 + \cdots + x_n</math>, the average
| + | |
| − | <math>T_n/n</math>,
| + | |
| − | the min, max and median are symmetric functions, as are the sum of squares
| + | |
| − | <math>S_n = \sum x_i^2</math>, the sample variance
| + | |
| − | <math>s_n^2 = (S_n - T_n^2/n)/(n-1)</math>
| + | |
| − | and the mean absolute difference
| + | |
| − | <math>\sum |x_i - x_j|/(n(n-1))</math>.
| + | |
| − | | + | |
| − | A vector
| + | |
| − | <math>x</math> in
| + | |
| − | <math>{\mathcal R}^n</math> is an ordered list of
| + | |
| − | <math>n</math> real numbers
| + | |
| − | <math>(x_1 ,\ldots, x_n)</math>
| + | |
| − | or a function
| + | |
| − | <math>x\colon[n]\to{\mathcal R}</math> where
| + | |
| − | <math>[n]=\{1 ,\ldots, n\}</math>.
| + | |
| − | For
| + | |
| − | <math>m \le n</math>, a 1--1 function
| + | |
| − | <math>\varphi\colon[m]\to[n]</math> is a sample of size
| + | |
| − | <math>m</math>,
| + | |
| − | the sampled values being
| + | |
| − | <math>x\varphi = (x_{\varphi(1)} ,\ldots, x_{\varphi(m)})</math>.
| + | |
| − | All told, there are
| + | |
| − | <math>n(n-1)\cdots(n-m+1)</math> distinct samples of size
| + | |
| − | <math>m</math>
| + | |
| − | that can be taken from a list of length
| + | |
| − | <math>n</math>.
| + | |
| − | A ''sequence'' of functions
| + | |
| − | <math>f_n\colon{\mathcal R}^n\to{\mathcal R}</math> is
| + | |
| − | consistent under sub-sampling if, for each
| + | |
| − | <math>f_m, f_n</math>,
| + | |
| − | :<math>
| + | |
| − | f_n(x) = {\rm ave} _\varphi f_m(x\varphi),
| + | |
| − | </math>
| + | |
| − | where
| + | |
| − | <math>{\rm ave} _\varphi</math> denotes the average over samples of size
| + | |
| − | <math>m</math>.
| + | |
| − | For
| + | |
| − | <math>m=n</math>, this condition implies only that
| + | |
| − | <math>f_n</math> is a symmetric function.
| + | |
| − | | + | |
| − | Although the total and the median are both symmetric functions, neither is consistent
| + | |
| − | under sub-sampling.
| + | |
| − | For example, the median of the numbers
| + | |
| − | <math>(0,1,3)</math> is one,
| + | |
| − | but the average of the medians of samples of size two is 4/3.
| + | |
| − | However, the average
| + | |
| − | <math>\bar x_n = T_n/n</math> is sampling consistent.
| + | |
| − | Likewise the sample variance
| + | |
| − | <math>s_n^2 = \sum(x_i - \bar x)^2/(n-1)</math> with divisor
| + | |
| − | <math>n-1</math>
| + | |
| − | is sampling consistent,
| + | |
| − | but the mean squared deviation
| + | |
| − | <math>\sum(x_i - \bar x_n)^2/n</math> with divisor
| + | |
| − | <math>n</math> is not.
| + | |
| − | Other sampling consistent functions include Fisher's
| + | |
| − | <math>k</math>-statistics,
| + | |
| − | the first few of which are
| + | |
| − | <math>k_{1,n} = \bar x_n</math>,
| + | |
| − | <math>k_{2,n} = s_n^2</math> for
| + | |
| − | <math>n\ge 2</math>,
| + | |
| − | <math>
| + | |
| − | k_{3,n} = n\sum(x_i - \bar x_n)^3/((n-1)(n-2)),
| + | |
| − | </math>
| + | |
| − | defined for
| + | |
| − | <math>n\ge 3</math>.
| + | |
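Consistency under sub-sampling can be checked by brute force for short lists. The Python sketch below enumerates every 1--1 sample of size <math>m</math> and averages a statistic over the samples; the helper names `subsample_average` and `k3` are hypothetical, and the standard-library functions `median`, `variance` (divisor <math>n-1</math>) and `pvariance` (divisor <math>n</math>) stand in for the statistics discussed above.

```python
from itertools import permutations
from statistics import median, pvariance, variance

def subsample_average(f, x, m):
    """Average of f over all n(n-1)...(n-m+1) ordered samples of size m,
    that is, over all 1-1 maps from [m] into [n]."""
    values = [f(list(sample)) for sample in permutations(x, m)]
    return sum(values) / len(values)

def k3(x):
    """Fisher's third k-statistic, defined for lists of length at least 3."""
    n = len(x)
    xbar = sum(x) / n
    return n * sum((xi - xbar) ** 3 for xi in x) / ((n - 1) * (n - 2))

if __name__ == "__main__":
    x = [0.0, 1.0, 3.0]
    # The median is not consistent: compare the full-list value with the
    # average over samples of size two.
    print(median(x), subsample_average(median, x, 2))
    # Divisor n - 1 is consistent; divisor n is not.
    print(variance(x), subsample_average(variance, x, 2))
    print(pvariance(x), subsample_average(pvariance, x, 2))
    # The third k-statistic, checked on a longer list with samples of size 3.
    y = [0.0, 1.0, 3.0, 7.0, 8.0]
    print(k3(y), subsample_average(k3, y, 3))
```

Since every statistic here is symmetric, averaging over ordered samples gives the same result as averaging over unordered subsets; ordered samples are used only to match the definition in the text.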
| − | | + | |
| − | For a sequence of independent and identically distributed random variables,
| + | |
| − | the
| + | |
| − | <math>k</math>-statistic of order
| + | |
| − | <math>r\le n</math> is the unique symmetric function
| + | |
| − | such that
| + | |
| − | <math>E(k_{r,n}) = \kappa_r</math>.
| + | |
| − | Fisher (1929) derived the variances and covariances.
| + | |
| − | The connection with finite-population sub-sampling was developed by
| + | |
| − | Tukey (1950).
| + | |
| − | | + | |
| − | | + | |
| − | ==References==
| + | |
| − | *D. J. Daley and D. Vere-Jones. ''An Introduction to the Theory of Point Processes''. Springer-Verlag, New York, 1988.
| + | |
| − | | + | |
| − | *H. E. Daniels. Saddlepoint approximations in statistics. ''The Annals of Mathematical Statistics'', 25 (4): 631--650, 1954.
| + | |
| − | | + | |
| − | *H. E. Daniels. Tail probability approximations. ''International Statistical Review'', 55: 37--46, 1987.
| + | |
| − | | + | |
| − | *A. C. Davison. Approximate conditional inference in generalized linear models. ''Journal of the Royal Statistical Society Series B'', 50: 445--461, 1988.
| + | |
| − | | + | |
| − | *E. Di Nardo, G. Guarino, and D. Senato. A unifying framework for k-statistics, polykays and their multivariate generalizations. ''Bernoulli'', 14: 440--468, 2008.
| + | |
| − | | + | |
| − | *P. L. Dressel. Statistical seminvariants and their estimates with particular emphasis on their relation to algebraic invariants. ''The Annals of Mathematical Statistics'', 11 (1): 33--57, 1940.
| + | |
| − | | + | |
| − | *F. Y. Edgeworth. On the representation of statistical frequency by a series. ''Journal of the Royal Statistical Society'', 70 (1): 102--106, 1907.
| + | |
| − | | + | |
| − | *R. A. Fisher. Moments and product moments of sampling distributions. ''Proceedings of the London Mathematical Society, Series 2'', 30: 199--238, 1929.
| + | |
| − | | + | |
| − | *I. J. Good. A new formula for k-statistics. ''The Annals of Statistics'', 5 (1): 224--228, 1977.
| + | |
| − | | + | |
| − | *A. Hald. The early history of cumulants and the Gram-Charlier series. ''International Statistical Review'', 68: 137--153, 2000.
| + | |
| − | | + | |
| − | *C. C. Heyde. On a property of the lognormal distribution. ''Journal of the Royal Statistical Society. Series B (Methodological)'', 25 (2): 392--393, 1963.
| + | |
| − | | + | |
| − | *J. E. Kolassa. ''Series Approximation Methods in Statistics''. Springer-Verlag, New York, 2006.
| + | |
| − | | + | |
| − | *S. L. Lauritzen, editor. ''Thiele: Pioneer in Statistics''. Oxford University Press, New York, 2002.
| + | |
| − | | + | |
| − | *R. Lugannani and S. Rice. Saddle point approximation for the distribution of the sum of independent random variables. ''Advances in Applied Probability'', 12: 475--490, 1980.
| + | |
| − | | + | |
| − | *J. Marcinkiewicz. Sur une propriété de la loi de Gauss. ''Mathematische Zeitschrift'', 44: 612--618, 1939.
| + | |
| − | | + | |
| − | *J. Robinson. Saddlepoint approximations for permutation tests and confidence intervals. ''Journal of the Royal Statistical Society. Series B (Methodological)'', 44 (1): 91--101, 1982.
| + | |
| − | | + | |
| − | *G.-C. Rota and B. D. Taylor. The classical umbral calculus. ''SIAM Journal on Mathematical Analysis'', 25 (2): 694--711, 1994.
| + | |
| − | | + | |
| − | *B. Streitberg. Lancaster interactions revisited. ''The Annals of Statistics'', 18 (4): 1878--1885, 1990.
| + | |
| − | | + | |
| − | *T. N. Thiele. ''Almindelig Iagttagelseslaere: Sandsynlighedsregning og mindste Kvadraters Methode''. C. A. Reitzel, Copenhagen, 1889.
| + | |
| − | | + | |
| − | *T. N. Thiele. ''Theory of Observations''. C. & E. Layton, London, 1903.
| + | |
| − | | + | |
| − | *J. W. Tukey. Some sampling simplified. ''Journal of the American Statistical Association'', 45 (252): 501--519, 1950.
| + | |
The moment generating function (or Fourier--Laplace transform)
<math powerseries>
M(\xi) = E(e^{\xi X})
</math>
is an easy way to combine all of the moments into a single expression.
The cumulants up to order four are defined
even though the moment generating function (<ref>powerseries</ref>) does not exist
for any real \( \xi\neq 0\).
It has finite moments \( \mu_r = e^{r^2/2}\) of all orders,
but equation (<ref>powerseries</ref>) diverges for every \( \xi\neq 0\).