@@ Line 1: / Line 1: @@
-This article describes a sequence of numbers, called <strong> cumulants</strong>,
-that are used
-to describe, and in some circumstances approximate, a univariate or multivariate
-distribution.  Cumulants are not unique in this role; other sequences, such as
-moments and their generalizations, may also be used in both roles.
-Cumulants have multiple advantages over competitors, in that cumulants change
-in a very simple way when the underlying random variable is subject to an
-affine transformation, cumulants for sums of independent random variables have
-a very simple relationship to the cumulants of the addends, and cumulants may
-be used in a simple way to describe the difference between a distribution and
-its simplest Gaussian approximation.
-==Overview and Definitions==
-===Definition===
-The moment of order
-<math>r</math> (or
-<math>r</math>th moment) of a real-valued random variable
-<math>X</math> is
 <math>
-\mu_r = E(X^r)
+\begin{array}
+a&=&b
+\end{array}
 </math>
-for integer
-<math>r=0,1,\ldots</math>.
-The value is assumed to be finite.
-Provided that it has a Taylor expansion about the origin,
-the moment generating function (or Fourier--Laplace transform)
-<math  powerseries>
-M(\xi) = E(e^{\xi X})
-= E(1 + \xi X +\cdots + \xi^r X^r/r!+\cdots)
-	= \sum_{r=0}^\infty \mu_r \xi^r/r!
-</math>
-is an easy way to combine all of the moments into a single expression.
-The
-<math>r</math>th moment is hence the
-<math>r</math>th derivative of
-<math>M</math> at the origin.
-This definition is due to Fisher (1929).
-When
-<math>X</math> has a distribution given by a density
-<math>f</math>, then
-<math  ctsmomdef>
-\mu_r = \int_{-\infty}^\infty x^r f(x)\, dx </math>, and
-<math  mgfdef>
-M(\xi) = E(e^{\xi X}) =\int_{-\infty}^\infty\exp(\xi x) f(x)~d x.
-</math>
-The cumulants
-<math>\kappa_r</math> are the coefficients in the Taylor expansion of
-the cumulant generating function about the origin
-<math>
-K(\xi) = \log M(\xi) = \sum_{r} \kappa_r \xi^r/r!.
-</math>
-Evidently
-<math>\mu_0 = 1</math> implies
-<math>\kappa_0 = 0</math>.
-The relationship between the first few moments and cumulants,
-obtained by extracting coefficients from the expansion, is as follows
-<math  forward>
-\begin{array}{lcl}
-\kappa_1 &=& \mu_1 \\
-\kappa_2 &=& \mu_2 - \mu_1^2\\
-\kappa_3 &=& \mu_3 - 3\mu_2\mu_1 + 2\mu_1^3\\
-\kappa_4 &=& \mu_4 - 4\mu_3\mu_1 - 3\mu_2^2 + 12\mu_2\mu_1^2 -6\mu_1^4.
-\end{array}
-</math>
-In the reverse direction
-<math  reverse>
-\begin{array}{lcl}
-\mu_2 &=& \kappa_2 + \kappa_1^2\\
-\mu_3 &=& \kappa_3 + 3\kappa_2\kappa_1 + \kappa_1^3\\
-\mu_4 &=& \kappa_4 + 4\kappa_3\kappa_1 + 3\kappa_2^2 + 6\kappa_2\kappa_1^2 + \kappa_1^4.
-\end{array}
-</math>
-In particular,
-<math>\kappa_1 = \mu_1</math> is the mean of~
-<math>X</math>,
-<math>\kappa_2</math>~is the
-variance, and
-<math>\kappa_3 = E((X - \mu_1)^3)</math>.
-Higher-order cumulants are not the same as moments about the mean.
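The conversion from moments to cumulants can be sketched in a few lines of Python. The sketch below is illustrative only: the function name `cumulants_from_moments` is hypothetical, and it uses the standard recursion <math>\mu_r = \sum_{m=1}^{r} \binom{r-1}{m-1}\kappa_m\mu_{r-m}</math>, which is equivalent to extracting coefficients from <math>K = \log M</math> but is not the route taken in the text.

```python
from fractions import Fraction
from math import comb

def cumulants_from_moments(mu):
    """Return [kappa_0, ..., kappa_n] given raw moments mu[0..n] with mu[0] == 1."""
    n = len(mu) - 1
    kappa = [Fraction(0)] * (n + 1)
    for r in range(1, n + 1):
        # mu_r = sum_{m=1}^{r} C(r-1, m-1) * kappa_m * mu_{r-m}; solve for kappa_r
        tail = sum(comb(r - 1, m - 1) * kappa[m] * mu[r - m] for m in range(1, r))
        kappa[r] = Fraction(mu[r]) - tail
    return kappa

# Raw moments of a Poisson variable with mean 2: every cumulant should equal 2.
poisson_moments = [1, 2, 6, 22, 94]
print(cumulants_from_moments(poisson_moments))
```

Exact rational arithmetic is used so that the output can be compared term by term with the displayed formulae for the first four cumulants.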
-===Definitions under less restrictive conditions===
-The Cauchy distribution with density <math> \pi^{-1}/(1+x^2)</math> has no moments because
-the integral (<ref>ctsmomdef</ref>) does not converge for any integer~<math> r\ge 1</math>.
-Student's~<math> t</math> distribution on five degrees of freedom is symmetric with density
-<math> (8/(3\pi\surd5))/(1 + x^2/5)^3</math>.
-The first four moments are <math> 0, 5/3, 0, 25</math>; higher-order moments are
-not defined.
-The cumulants up to order four are defined by (<ref>forward</ref>)
-even though the moment generating function (<ref>powerseries</ref>) does not exist
-for any real <math> \xi\neq 0</math> .
-In both of these cases, the characteristic function <math> M(i\xi)</math> is
-well-defined for real <math> \xi</math> ,
-<math> \exp(-|\xi|)</math> for the Cauchy distribution,
-and <math> \exp(-|\xi|\surd 5)(1 + |\xi|\surd5 + 5\xi^2/3)</math>  for <math> t_5</math> .
-In the latter case, both <math> M(i\xi)</math> and <math> K(i\xi)</math>
-have Taylor expansions up to order four only, so the moments and
-cumulants are defined only up to this order.
-The infinite expansion (<ref>powerseries</ref>) is justified when
-the radius of convergence is positive, in which case <math> M(\xi)</math> is finite on
-an open set containing zero, and all moments and cumulants are finite.
-However, finiteness of the moments does not imply that <math> M(\xi)</math>
-exists for any <math> \xi\neq 0</math> .
-The log normal distribution provides a counterexample.
-It has finite moments <math> \mu_r = e^{r^2/2}</math> of all orders,
-but (<ref>powerseries</ref>) diverges for every~<math> \xi\neq 0</math>.
-===Uniqueness===
-The normal distribution
-<math>N(\mu, \sigma^2)</math> has cumulant generating function
-<math>\xi\mu + \xi^2 \sigma^2/2</math>, a quadratic polynomial implying that all cumulants
-of order three and higher are zero.
-Marcinkiewicz (1939) showed that the normal distribution is the only distribution
-whose cumulant generating function is a polynomial, i.e.~the only distribution
-having a finite number of non-zero cumulants.
-The Poisson distribution with mean
-<math>\mu</math> has moment generating function
-<math>\exp(\mu(e^\xi - 1))</math> and cumulant generating function
-<math>\mu(e^\xi -1)</math>.
-Consequently all the cumulants are equal to the mean.
-Two distinct distributions may have the same moments, and hence the same cumulants.
-This statement is fairly obvious for distributions whose moments are all infinite,
-or even for distributions having infinite higher-order moments.
-But it is much less obvious for distributions having finite moments of all orders.
-Heyde (1963) gave one such pair of distributions with densities
-<math>
-f_1(x) = \exp(-(\log x)^2/2) / (x\sqrt{2\pi})
-</math>
-and <math>
-f_2(x) = f_1(x) [1 + \sin(2\pi\log x)/2]
-</math>
-for
-<math>x > 0</math>.
-The first of these is called the log normal distribution.
-To show that these distributions have the same moments it suffices to show that
-<math>
-\int_0^\infty x^k f_1(x) \sin(2\pi\log x)\, dx = 0
-</math>
-for integer
-<math>k\ge 1</math>, which can be shown by making the substitution
-<math>\log x = y+k</math>.
-If the sequence of moments is such that (<ref>powerseries</ref>)
-has a positive radius of convergence, the distribution is uniquely determined.
-===Properties===
-Cumulants of order
-<math>r \ge 2</math> are called semi-invariant on account of their
-behavior under affine transformation of variables (Thiele, 1903; Dressel, 1940).
-If
-<math>\kappa_r</math> is the
-<math>r</math>th cumulant of
-<math>X</math>,
-the
-<math>r</math>th cumulant of the affine transformation
-<math>a + b X</math> is
-<math>b^r \kappa_r</math>,
-independent of~
-<math>a</math>.
-This behavior is considerably simpler than that of moments.
-However, moments about the mean are also semi-invariant, so this property alone
-does not explain why cumulants are useful for statistical purposes.
-The term cumulant was coined by Fisher (1929) on account of their behavior under
-addition of random variables.
-Let
-<math>S = X+Y</math> be the sum of two independent random variables.
-The moment generating function of the sum is the product
-<math>
-M_S(\xi) =  M_X(\xi) M_Y(\xi),
-</math>
-and the cumulant generating function is the sum
-<math>
-K_S(\xi) = K_X(\xi) + K_Y(\xi).
-</math>
-Consequently, the
-<math>r</math>th cumulant of the sum is the sum of the
-<math>r</math>th cumulants.
-By extension, if
-<math>X_1,\ldots, X_n</math> are independent and identically distributed,
-the
-<math>r</math>th cumulant of the sum is
-<math>n\kappa_r</math>.
-Let
-<math>\kappa_{n;r}</math> be
-the cumulant of order
-<math>r</math> of the standardized sum
-<math>n^{-1/2}(X_1+\cdots + X_n)</math>;
-then
-<math  ndep>
-\kappa_{n;r}=n^{1-r/2} \kappa_r.
-</math>
-Provided that the cumulants are finite, all cumulants of order
-<math>r\ge 3</math>
-of the standardized sum tend to zero, which is a simple demonstration of the central limit theorem.
-Good (1977) obtained an expression for the
-<math>r</math>th cumulant of
-<math>X</math> as
-the
-<math>r</math>th moment of the discrete Fourier transform of an independent and
-identically distributed sequence as follows.
-Let
-<math>X_1, X_2,\ldots</math> be independent copies of~
-<math>X</math> with
-<math>r</math>th cumulant~
-<math>\kappa_r</math>,
-and let
-<math>\omega = e^{2\pi i/n}</math> be a primitive
-<math>n</math>th root of unity.
-The discrete Fourier combination
-<math>
-Z = X_1 + \omega X_2 + \cdots + \omega^{n-1} X_n
-</math>
-is a complex-valued random variable whose distribution is invariant under
-rotation
-<math>Z\sim \omega Z</math> through multiples of~
-<math>2\pi /n</math>.
-The
-<math>r</math>th cumulant of the sum is
-<math>\kappa_r \sum_{j=1}^n \omega^{r j}</math>,
-which is equal to
-<math>n\kappa_r</math> if
-<math>r</math> is a multiple of
-<math>n</math>, and zero otherwise.
-Consequently
-<math>E(Z^r) = 0</math> for integer
-<math>r < n</math> and
-<math>E(Z^n) = n\kappa_n</math>.
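Good's identity can be checked exactly for a variable with finite support by enumerating all outcomes of <math>(X_1,\ldots,X_n)</math>. The sketch below is an illustration under that assumption; the function name `dft_moment` and the Bernoulli example are not from the source.

```python
import cmath
from itertools import product

def dft_moment(values, probs, n, r):
    """Exact E(Z^r) for Z = X_1 + w X_2 + ... + w^(n-1) X_n, X_j i.i.d. on `values`."""
    omega = cmath.exp(2j * cmath.pi / n)  # primitive n-th root of unity
    total = 0j
    for idx in product(range(len(values)), repeat=n):
        weight = 1.0
        z = 0j
        for j, i in enumerate(idx):
            weight *= probs[i]
            z += omega ** j * values[i]
        total += weight * z ** r
    return total

# Bernoulli(p): the third cumulant is p(1-p)(1-2p), so E(Z^3) should equal 3*kappa_3.
p = 0.3
kappa3 = p * (1 - p) * (1 - 2 * p)
print(dft_moment([0, 1], [1 - p, p], 3, 3), 3 * kappa3)
```

The enumeration grows as the support size raised to the power <math>n</math>, so this is only practical for small examples.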
-===Multivariate cumulants===
-Somewhat surprisingly, the relation between moments and cumulants is simpler and
-more transparent in the multivariate case than in the univariate case.
-Let
-<math>X = (X^1,\ldots, X^k)</math> be the components of a random vector.
-In a departure from the univariate notation, we write
-<math>\kappa^r = E(X^r)</math> for the components of the mean vector,
-<math>\kappa^{rs} = E(X^r X^s)</math> for the components of the second moment matrix,
-<math>\kappa^{r s t} = E(X^r X^s X^t)</math> for the third moments, and so on.
-It is convenient notationally to adopt Einstein's summation convention,
-so
-<math>\xi_r X^r</math> denotes the linear combination
-<math>\xi_1 X^1 + \cdots + \xi_k X^k</math>,
-the square of the linear combination is
-<math>(\xi_r X^r)^2 = \xi_r\xi_s X^r X^s</math>,
-a sum of
-<math>k^2</math> terms, and so on for higher powers.
-The Taylor expansion of the moment generating function
-<math>M(\xi) = E(\exp(\xi_r X^r))</math>
-is
-<math>
-M(\xi) = 1 + \xi_r \kappa^r
-	+ \textstyle{\frac1{2!}} \xi_r\xi_s \kappa^{rs}
-	+ \textstyle{\frac1{3!}} \xi_r\xi_s \xi_t \kappa^{r s t} +\cdots.
-</math>
-The cumulants are defined as the coefficients
-<math>\kappa^{r,s}, \kappa^{r,s,t},\ldots</math>
-in the Taylor expansion
-<math>
-\log M(\xi) = \xi_r \kappa^r
-        + \textstyle{\frac1{2!}} \xi_r\xi_s \kappa^{r,s}
-        + \textstyle{\frac1{3!}} \xi_r\xi_s \xi_t \kappa^{r,s,t} +\cdots.
-</math>
-This notation does not distinguish first-order moments from first-order cumulants,
-but commas separating the superscripts serve to distinguish higher-order cumulants from moments.
-Comparison of coefficients reveals that each moment
-<math>\kappa^{rs}, \kappa^{r s t},\ldots</math>
-is a sum over partitions of the superscripts, each term in the sum being a
-product of cumulants:
-<math>
-\begin{array}{lcl}
-\kappa^{rs}&=&\kappa^{r,s} + \kappa^r\kappa^s\\
-\kappa^{r s t}&=&\kappa^{r,s,t} + \kappa^{r,s}\kappa^t + \kappa^{r,t}\kappa^s + \kappa^{s,t}\kappa^r
-	+ \kappa^r\kappa^s\kappa^t\\
-	&=&\kappa^{r,s,t} + \kappa^{r,s}\kappa^t[3] + \kappa^r\kappa^s\kappa^t\\
-\kappa^{r s t u}&=&\kappa^{r,s,t,u} + \kappa^{r,s,t}\kappa^u[4] + \kappa^{r,s}\kappa^{t,u}[3]
-	+ \kappa^{r,s}\kappa^t\kappa^u[6] + \kappa^r\kappa^s\kappa^t\kappa^u.
-\end{array}
-</math>
-Each bracketed number indicates a sum over distinct partitions
-having the same block sizes, so the fourth-order moment is a sum of 15 distinct cumulant products.
-In the reverse direction, each cumulant is also a sum over partitions of the indices.
-Each term in the sum is a product of moments, but with coefficient
-<math>(-1)^{\nu-1} (\nu-1)!</math>
-where
-<math>\nu</math> is the number of blocks:
-<math>
-\begin{array}{lcl}
-\kappa^{r,s} &=& \kappa^{rs} - \kappa^r\kappa^s\\
-\kappa^{r,s,t} &=& \kappa^{r s t} - \kappa^{rs}\kappa^t[3] + 2 \kappa^r\kappa^s\kappa^t\\
-\kappa^{r,s,t,u} &=& \kappa^{r s t u} - \kappa^{r s t}\kappa^u[4] - \kappa^{rs}\kappa^{t u}[3]
-        + 2 \kappa^{rs}\kappa^t\kappa^u[6] - 6 \kappa^r\kappa^s\kappa^t\kappa^u.
-\end{array}
-</math>
-These relationships are an instance of Möbius inversion on the partition lattice.
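The sum over partitions with coefficient <math>(-1)^{\nu-1}(\nu-1)!</math> translates directly into code. The following is a minimal sketch, assuming the joint moments are supplied by a user-written function; the names `set_partitions`, `joint_cumulant` and `moment` are hypothetical.

```python
from math import factorial, prod

def set_partitions(items):
    """Yield every partition of the list `items` as a list of blocks."""
    if not items:
        yield []
        return
    first, rest = items[0], items[1:]
    for partition in set_partitions(rest):
        # put `first` in a block of its own ...
        yield [[first]] + partition
        # ... or add it to each existing block in turn
        for i in range(len(partition)):
            yield partition[:i] + [[first] + partition[i]] + partition[i + 1:]

def joint_cumulant(indices, moment):
    """Cumulant of the given indices; `moment(block)` returns the joint moment of a block."""
    total = 0
    for partition in set_partitions(list(indices)):
        nu = len(partition)
        coeff = (-1) ** (nu - 1) * factorial(nu - 1)
        total += coeff * prod(moment(tuple(block)) for block in partition)
    return total

# Scalar check: all indices refer to one Poisson variable with mean 2,
# so a block's moment depends only on its size and every cumulant equals 2.
mu = [1, 2, 6, 22, 94]
print(joint_cumulant(range(4), lambda block: mu[len(block)]))
```

For four indices the generator produces the 15 partitions counted in the text; the number of partitions grows quickly with the order, so this direct enumeration suits low orders only.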
-Partition notation serves one additional purpose.
-It establishes moments and cumulants as special cases of generalized cumulants,
-which include objects of the type
-<math>\kappa^{r,st} = {\rm cov}(X^r, X^s X^t)</math>,
-<math>\kappa^{rs, t u} = {\rm cov}(X^r X^s, X^t X^u)</math>, and
-<math>\kappa^{rs, t, u}</math> with incompletely partitioned indices.
-These objects arise very naturally in statistical work involving asymptotic
-approximation of distributions.
-They are intermediate between moments and cumulants, and have characteristics of both.
-Every generalized cumulant can be expressed as a sum of certain products of ordinary cumulants.
-Some examples are as follows:
-<math>
-\begin{array}{lcl}
-\kappa^{rs, t} &=& \kappa^{r,s,t} + \kappa^r\kappa^{s,t} + \kappa^s \kappa^{r,t}\\
-	&=& \kappa^{r,s,t} + \kappa^r\kappa^{s,t}[2]\\
-\kappa^{rs,t u} &=& \kappa^{r,s,t,u} + \kappa^{r,s,t}\kappa^u[4] + \kappa^{r,t}\kappa^{s,u}[2]
-	+ \kappa^{r,t}\kappa^s\kappa^u[4]\\
-\kappa^{rs,t,u} &=& \kappa^{r,s,t,u} + \kappa^{r,t,u}\kappa^s[2] + \kappa^{r,t}\kappa^{s,u}[2]
-\end{array}
-</math>
-Each generalized cumulant is associated with a partition
-<math>\tau</math> of the given set of indices.
-For example,
-<math>\kappa^{rs,t,u}</math> is associated with the partition
-<math>\tau=rs|t|u</math> of four indices
-into three blocks.
-Each term on the right is a cumulant product associated with a partition
-<math>\sigma</math> of the same indices.
-The coefficient is one if the least upper bound
-<math>\sigma\vee\tau</math> has a single block,
-otherwise zero.
-Thus, with
-<math>\tau=rs|t|u</math>, the product
-<math>\kappa^{r,s}\kappa^{t,u}</math> does not appear
-on the right because
-<math>\sigma\vee\tau = rs|t u</math> has two blocks.
-As an example of the way these formulae may be used,
-let
-<math>X</math> be a scalar random variable with cumulants
-<math>\kappa_1,\kappa_2,\kappa_3,\ldots</math>.
-By translating the second formula in the preceding list, we find that
-the variance of the squared variable is
-<math>
-{\rm var}(X^2) = \kappa_4 + 4\kappa_3\kappa_1 + 2\kappa_2^2 + 4\kappa_2\kappa_1^2,
-</math>
-reducing to
-<math>\kappa_4 + 2\kappa_2^2</math> if the mean is zero.
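As a quick numerical check of the variance formula, one can compare it with a direct moment calculation. The sketch below uses a Poisson variable with mean 2 as an assumed test case (all cumulants equal to 2); the function name `var_of_square` is hypothetical.

```python
def var_of_square(k1, k2, k3, k4):
    """var(X^2) expressed through the first four cumulants of X."""
    return k4 + 4 * k3 * k1 + 2 * k2 ** 2 + 4 * k2 * k1 ** 2

lam = 2
mu2 = lam + lam ** 2                               # E(X^2) for Poisson(lam)
mu4 = lam ** 4 + 6 * lam ** 3 + 7 * lam ** 2 + lam  # E(X^4) for Poisson(lam)
print(var_of_square(lam, lam, lam, lam), mu4 - mu2 ** 2)
```

Both numbers agree, as they should, since <math>{\rm var}(X^2) = \mu_4 - \mu_2^2</math>.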
-===Exponential families===
-Let <math> f</math> be a probability distribution on an arbitrary measurable space <math> ({\mathcal X},\nu)</math> ,
-and let <math> t\colon{\mathcal X}\to{\mathcal R}</math> be a real-valued random variable
-with cumulant generating function
-<math> K(\cdot)</math> , finite in a set <math> \Theta</math> containing zero in the interior.
-The family of distributions on <math> {\mathcal X}</math> with density
-<math>
-f_\theta(x) = e^{\theta t(x)} f(x) / M(\theta) = e^{\theta t(x) - K(\theta)} f(x)
-</math>
-indexed by <math> \theta\in\Theta</math> is called the exponential family
-associated with <math> f</math> and the canonical statistic~<math> t</math> .
-In statistical physics, the normalizing constant <math> M(\theta)</math> is called the
-partition function.
-Two examples suffice to illustrate the idea.
-In the first example, <math> {\mathcal X} = \{1,2,\ldots\}</math> is the set of natural numbers,
-<math> f(x) \propto 1/x^2</math> and <math> t(x) = -\log(x)</math> .
-The associated exponential family is
-<math> f_\theta(x) = x^{-\theta}/\zeta(\theta)</math> ,
-where <math> \zeta(\theta)</math> is the Riemann zeta function with real argument <math> \theta > 1</math> .
-In the second example, <math> {\mathcal X}={\mathcal X}_n</math> is the symmetric group or the set of
-permutations of <math> n</math> letters,
-<math> x\in{\mathcal X}_n</math> is a permutation, <math> t(x)</math> is the number of cycles,
-<math> f(x) = 1/n!</math> is the uniform distribution,
-and <math> M_n(\xi) = \Gamma(n+e^\xi)/(n!\, \Gamma(e^\xi))</math> for all real~<math> \xi</math> .
-The exponential family of distributions on permutations of <math> [n]</math> is
-<math>
-f_{n,\theta}(x) = \frac{\Gamma(\lambda)\, \lambda^{t(x)}} {\Gamma(n+\lambda)},
-</math>
-the same as the distribution generated by the Chinese restaurant process
-with parameter <math> \lambda = e^\theta</math> .
-The associated marginal distribution on partitions,
-the Ewens distribution on partitions of <math> [n]</math> ,
-is also of the exponential-family form with canonical statistic equal
-to the number of blocks or cycles.
-This number <math> t(x)</math> is a random variable whose cumulants are the
-derivatives of <math> \log M(\cdot)</math> evaluated at the parameter~<math> \theta</math> .
-In the multi-parameter case,
-<math> t\colon{\mathcal X}\to{\mathcal R}^p</math> is a random vector,
-<math> \xi\colon{\mathcal R}^p\to{\mathcal R}</math> is a linear functional,
-and <math> M(\xi) = E(e^{\xi(t)})</math> is the joint moment generating function.
-It is sometimes convenient to employ Einstein's implicit summation convention
-in the form <math> \theta(t) = \theta_i t^i</math> where <math> t^1,\ldots, t^p</math> are
-the components of <math> t(x)</math> , and <math> \theta_1,\ldots, \theta_p</math> are the coefficients
-of the linear functional.
-For simplicity of notation in what follows, <math> {\mathcal X}={\mathcal R}^p</math> and <math> t(x) = x</math>
-is the identity function.
-An exponential-family distribution in <math> {\mathcal R}^p</math> has the form
-<math>
-f_\theta(x)=\exp(x^j\theta_j-g(x)-\varphi(\theta))
-</math>
-for given functions <math> g</math> and <math> \varphi</math> .
-Integration shows that the distribution <math> f_\theta</math> has
-cumulant generating function <math> K_\theta(\xi)=\varphi(\theta+\xi)-\varphi(\theta)</math> .
-The cumulants of <math> X\sim f_\theta</math> are equal to the derivatives of <math> \varphi</math>
-at the parameter~<math> \theta</math> .
-===Calculus of cumulants===
-The umbral calculus is a syntax or formal system consisting of
-certain operations on objects called umbrae,
-mimicking addition and multiplication of independent real-valued random variables
-(Rota and Taylor, 1994).
-To each real-valued sequence <math> 1, a_1, a_2,\ldots</math>
-there corresponds an umbra <math> \alpha</math> such that <math> E(\alpha^r) = a_r</math> .
-This freedom gives rise to special umbrae, the singleton and Bell umbra,
-corresponding to no real-valued random variable.
-Using these special umbrae, one develops the formal notion of
-an <math> \alpha</math>-cumulant umbra <math> \chi\cdot\alpha</math> by formal product operations in the syntax.
-Properties of cumulants, <math> k</math> -statistics and other polynomial functions
-are then derived by purely formal combinatorial operations.
-Di~Nardo et~al. (2008) present details.
-==Approximation of distributions==
-===Edgeworth approximation===
-Suppose that
-<math>Y</math> is a random variable that arises as the sum
-of
-<math>n</math> independent and identically-distributed summands, each of which has
-mean
-<math>0</math>, unit variance, and
-cumulants
-<math>\kappa_r</math>, and
-<math>X=Y/\sqrt{n}</math>.
-For ease of exposition, assume that cumulants of all orders exist.
-Then, using (<ref>ndep</ref>), the cumulant generating function of
-<math>X</math> is given by
-<math>K(\xi)=\xi^2/2 +\kappa_3\xi^3/(6\sqrt{n}) +\kappa_4\xi^4/(24 n) +\cdots</math>,
-and the moment generating function of
-<math>X</math> is given by
-<math>
-M(\xi)=\exp(\xi^2/2)\exp(\kappa_3\xi^3/(6\sqrt{n})+\kappa_4\xi^4/(24 n)+\cdots).
-</math>
-Expanding the second exponential as a power series gives
-<math>
-M(\xi)=\exp(\xi^2/2)\left(1\!+\!{{\kappa_3\xi^3}\over{6\sqrt{n}}}\!+\!{{\kappa_4\xi^4}\over{24 n}}\!+\!\cdots\!+\! {\textstyle{\frac12}} \left[
-{{\kappa_3\xi^3}\over{6\sqrt{n}}}\!+\!{{\kappa_4\xi^4}\over{24 n}}\!+\!\cdots\right]^2\!+\!\!\cdots\right).
-</math>
-Reordering terms in powers of sample size,
-<math  kseries>
-M(\xi)=\exp(\xi^2/2)\left(1+{{\kappa_3\xi^3}\over{6\sqrt{n}}}+{{\kappa_4\xi^4}\over{24 n}}+
-{{\kappa_3^2\xi^6}\over{72 n}}+\cdots\right).
-</math>
-Repeated application of integration by parts to (<ref>mgfdef</ref>) shows that
-<math  mgfderiv>
-\xi^r M(\xi) =\int_{-\infty}^\infty\exp(\xi x)(-1)^r f^{(r)}(x)~d x,
-</math>
-where
-<math>f^{(r)}</math> denotes the derivative of
-<math>f</math> of order
-<math>r</math>.  Relation
-(<ref>mgfderiv</ref>) holds if
-<math>f</math> and its derivatives go to zero quickly
-as
-<math>\vert x\vert\to\infty</math>.  Applying (<ref>mgfderiv</ref>) to the normal
-density
-<math>\phi(x)=\exp(-x^2/2)/\sqrt{2\pi}</math>, and applying the result to
-(<ref>kseries</ref>), gives
-<math>
-M(\xi)\approx\int_{-\infty}^\infty\exp(\xi x)\phi(x)\left[1+{{\kappa_3 h^3(x)}\over{6\sqrt{n}}}+{{\kappa_4h^4(x)}\over{24 n}}+
-{{\kappa_3^2h^6(x)}\over{72 n}}\right]~d x
-</math>
-for
-<math>h^r(x)=(-1)^r\phi^{(r)}(x)/\phi(x)</math>. Since the relationship
-giving the moment generating function in terms of the density is invertible,
-and assuming that the inversion process is suitably smooth,
-Edgeworth (1907) approximates the density of
-<math>X</math> by
-<math  edser>
-e_4(x)=\phi(x)\left[1+{{\kappa_3 h^3(x)}\over{6\sqrt{n}}}+{{\kappa_4h^4(x)}\over{24 n}}+
-{{\kappa_3^2h^6(x)}\over{72 n}}\right].
-</math>
-In fact, when the summands contributing to
-<math>Y</math> have a density and finite cumulants up to order 5, the error in the
-approximation, multiplied by
-<math>n^{3/2}</math>, remains bounded.
-The functions
-<math>h^r</math> defined above are the Hermite polynomials.
-The approximation (<ref>edser</ref>) is known as the Edgeworth series.
-The subscript refers to the number of cumulants used in its definition.
-This series can be used to approximate either the cumulative distribution function or survival function through term-wise integration.
-The preceding discussion is intended to be heuristic; Kolassa (2006) presents
-a rigorous derivation, along with the natural extension to random vectors.
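The Edgeworth density with four cumulants is straightforward to evaluate numerically. The following sketch is illustrative: the function names `hermite` and `edgeworth_density` are hypothetical, and the Hermite polynomials are computed by the standard three-term recurrence rather than by differentiating <math>\phi</math>.

```python
import math

def hermite(r, x):
    """Probabilists' Hermite polynomial via He_{k+1}(x) = x He_k(x) - k He_{k-1}(x)."""
    prev, cur = 1.0, x
    if r == 0:
        return prev
    for k in range(1, r):
        prev, cur = cur, x * cur - k * prev
    return cur

def edgeworth_density(x, n, kappa3, kappa4):
    """Edgeworth approximation e_4(x) for a standardized sum of n summands."""
    phi = math.exp(-x * x / 2) / math.sqrt(2 * math.pi)
    correction = (kappa3 * hermite(3, x) / (6 * math.sqrt(n))
                  + kappa4 * hermite(4, x) / (24 * n)
                  + kappa3 ** 2 * hermite(6, x) / (72 * n))
    return phi * (1 + correction)

# Centred unit exponential summands have kappa3 = 2 and kappa4 = 6.
print(edgeworth_density(0.5, 20, 2.0, 6.0))
```

With both cumulants set to zero the function returns the standard normal density, which gives a simple sanity check.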
-===Saddlepoint approximation===
-The approximation (<ref>edser</ref>) to the density
-<math>f(x)</math> has the property that
-<math>|f(x)-e_r(x)|\leq C n^{-(r-1)/2}</math>, for some constant
-<math>C</math>,
-when the cumulant of order
-<math>r+1</math> exists;
-<math>C</math> does not depend on
-<math>x</math>.
-A similar bound holds for the relative error
-<math>(f(x)-e_r(x))/f(x)</math>, only when
-<math>x</math> is restricted to a finite interval.
-Because of the polynomial factor multiplying the first omitted term in
-(<ref>edser</ref>), the relative error can be expected to behave poorly.
-One might prefer an approximation that maintains good behavior for
-values of
-<math>X</math> in a range that increases as
-<math>n</math> increases; specifically,
-one might prefer an approximation that performs well for values of
-<math>\bar Y=X/\sqrt{n}</math> in a fixed interval.
-Assume again that random variables
-<math>Y_j</math> are independent and identically distributed, each with a cumulant generating function
-<math>K(\xi)</math> finite for
-<math>\xi</math>
-in a neighborhood of
-<math>0</math>.  As above, define the exponential family
-<math>
-f_{\bar Y}(\bar y;\theta)=\exp(n[\theta\bar y-K(\theta)])f_{\bar Y}(\bar y);
-</math>
-the factor <math>n</math> appears because <math>n\bar Y</math> is a sum of <math>n</math> summands, each with cumulant generating function <math>K</math>.
-One can then choose a value of
-<math>\theta</math> depending on
-<math>\bar y</math>
-that makes
-<math>f_{\bar Y}(\bar y;\theta)</math> easy to approximate, and use
-the exponential family relationship to derive an approximation for
-<math>f_{\bar Y}(\bar y)</math>.  Conventionally we choose
-<math>\hat\theta</math> to
-satisfy
-<math  speqn>
-K'(\hat\theta)=\bar y;
-</math>
-this makes the expectation of the distribution
-with density
-<math>f_{\bar Y}(\cdot;\hat\theta)</math> equal to the observed value.
-Under this distribution each summand has variance <math>K''(\hat\theta)</math>, so the standardized variable is <math>\sqrt{n}(\bar Y-\bar y)/\sqrt{K''(\hat\theta)}</math>.
-One then applies (<ref>edser</ref>) at the origin, with the scale changed
-to reflect the fact that we are approximating the distribution of
-<math>\bar Y</math> rather than that of the standardized variable,
-to obtain
-<math>
-f_{\bar Y}(\bar y)\approx\exp(n[K(\hat\theta)-\hat\theta\bar y])
-\sqrt{{n}\over{K''(\hat\theta)}}\,\phi(0)\left[1+{{\hat\kappa_3 h^3(0)}\over{6\sqrt{n}}}+{{\hat\kappa_4h^4(0)}\over{24 n}}+
-{{\hat\kappa_3^2h^6(0)}\over{72 n}}\right].
-</math>
-Using the fact that <math>h^3(0)=0</math>,
-<math>h^4(0)=3</math>, and <math>h^6(0)=-15</math>,
-we obtain <math  spser>
-f_{\bar Y}(\bar y)\approx\sqrt{{n}\over{2\pi K''(\hat\theta)}}
-\exp(n[K(\hat\theta)-\hat\theta\bar y])
-\left[1+{{\hat\kappa_4}\over{8 n}}-
-{{5\hat\kappa_3^2}\over{24 n}}\right].
-</math>
-Here
-<math>\hat\kappa_j=K^{(j)}(\hat\theta)/K''(\hat\theta)^{j/2}</math> are the standardized cumulants, calculated from the derivatives of
-<math>K</math> evaluated at
-<math>\hat\theta</math>.
-This approximation may only be applied to values of
-<math>\bar y</math> for which
-(<ref>speqn</ref>) has solutions in an open neighborhood of 0.
-Expression (<ref>spser</ref>) represents the saddlepoint approximation to
-the density of the mean
-<math>\bar Y</math>; since
-<math>f_{\bar Y}(\bar y;\theta)</math>
-has a cumulant generating function defined on an open set containing
-<math>0</math>,
-cumulants of all orders exist, the Edgeworth series including
-<math>\kappa_6</math>
-may be applied to
-<math>f_{\bar Y}(\bar y;\theta)</math>, and so the error in the
-Edgeworth series is of order
-<math>O(1/n^2)</math>.  Hence the error in (<ref>spser</ref>)
-is of the same order, and in this case, is relative and uniform for values of
-<math>\bar y</math> in a bounded subset of an open subset on which (<ref>speqn</ref>)
-has a solution.
-This approximation was introduced to the statistics literature by
-Daniels (1954).
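A worked case helps to show how accurate the saddlepoint approximation can be. The sketch below is an assumed example, not taken from the source: the summands are unit exponential, so <math>K(\theta) = -\log(1-\theta)</math>, the saddlepoint is <math>\hat\theta = 1 - 1/\bar y</math>, and the exact density of the mean is a gamma density. It uses the standard form of Daniels' approximation, with leading factor <math>\sqrt{n/(2\pi K''(\hat\theta))}</math>, exponent <math>n[K(\hat\theta)-\hat\theta\bar y]</math>, and standardized cumulants in the correction term. The function names are hypothetical.

```python
import math

def saddlepoint_density_exp_mean(ybar, n, second_order=True):
    """Saddlepoint approximation to the density of the mean of n Exp(1) variables."""
    theta_hat = 1.0 - 1.0 / ybar           # solves K'(theta) = 1/(1 - theta) = ybar
    K = -math.log(1.0 - theta_hat)         # K(theta) = -log(1 - theta)
    K2 = 1.0 / (1.0 - theta_hat) ** 2      # K''(theta_hat)
    rho3, rho4 = 2.0, 6.0                  # standardized cumulants of the tilted law
    value = math.sqrt(n / (2 * math.pi * K2)) * math.exp(n * (K - theta_hat * ybar))
    if second_order:
        value *= 1 + (rho4 / 8 - 5 * rho3 ** 2 / 24) / n
    return value

def exact_density_exp_mean(ybar, n):
    """Exact gamma density of the mean of n Exp(1) variables."""
    return math.exp(n * math.log(n) + (n - 1) * math.log(ybar)
                    - n * ybar - math.lgamma(n))

ratio = saddlepoint_density_exp_mean(1.3, 10) / exact_density_exp_mean(1.3, 10)
print(ratio)
```

In this example the ratio of approximate to exact density does not depend on <math>\bar y</math>, which illustrates the uniform relative error described above.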
-The Edgeworth series for the density was trivially integrated to obtain an
-approximation to tail probabilities.  Integration of the saddlepoint
-approximation is more delicate.  Two main approaches have been investigated.
-Daniels (1987) expresses
-<math>f_{\bar Y}(\bar y)</math> exactly as a complex integral
-involving
-<math>K(\xi)</math>, integrates with respect to
-<math>\bar y</math> to obtain another
-complex integral, and reviews techniques for approximating the resulting
-integrals.
-Robinson (1982) and Lugannani and Rice (1980) derive tail probability approximations based
-on approximately integrating (<ref>spser</ref>) with respect to
-<math>\bar y</math> directly.
-These saddlepoint and Edgeworth approximations have multivariate and
-conditional extensions.  Davison (1988) exploits the conditional saddlepoint tail probability approximation to perform inference in canonical exponential families.
-==Samples and sub-samples==
-A function
-<math>f\colon{\mathcal R}^n\to{\mathcal R}</math> is symmetric if
-<math>f(x_1 ,\ldots, x_n) = f(x_{\pi(1)} ,\ldots, x_{\pi(n)})</math>
-for each permutation
-<math>\pi</math> of the arguments.
-For example, the total
-<math>T_n = x_1 + \cdots + x_n</math>, the average
-<math>T_n/n</math>,
-the min, max and median are symmetric functions, as are the sum of squares
-<math>S_n = \sum x_i^2</math>, the sample variance
-<math>s_n^2 = (S_n - T_n^2/n)/(n-1)</math>
-and the mean absolute deviation
-<math>\sum |x_i - x_j|/(n(n-1))</math>.
-A vector
-<math>x</math> in
-<math>{\mathcal R}^n</math> is an ordered list of
-<math>n</math> real numbers
-<math>(x_1 ,\ldots, x_n)</math>
-or a function
-<math>x\colon[n]\to{\mathcal R}</math> where
-<math>[n]=\{1 ,\ldots, n\}</math>.
-For
-<math>m \le n</math>, a 1--1 function
-<math>\varphi\colon[m]\to[n]</math> is a sample of size~
-<math>m</math>,
-the sampled values being
-<math>x\varphi = (x_{\varphi(1)} ,\ldots, x_{\varphi(m)})</math>.
-All told, there are
-<math>n(n-1)\cdots(n-m+1)</math> distinct samples of size~
-<math>m</math>
-that can be taken from a list of length~
-<math>n</math>.
-A <em>sequence</em> of functions
-<math>f_n\colon{\mathcal R}^n\to{\mathcal R}</math> is
-consistent under sub-sampling if, for each
-<math>f_m, f_n</math>,
-<math>
-f_n(x) = {\rm ave} _\varphi f_m(x\varphi),
-</math>
-where
-<math>{\rm ave} _\varphi</math> denotes the average over samples of size~
-<math>m</math>.
-For
-<math>m=n</math>, this condition implies only that
-<math>f_n</math> is a symmetric function.
-Although the total and the median are both symmetric functions, neither is consistent
-under sub-sampling.
-For example, the median of the numbers
-<math>(0,1,3)</math> is one,
-but the average of the medians of samples of size two is 4/3.
-However, the average
-<math>\bar x_n = T_n/n</math> is sampling consistent.
-Likewise the sample variance
-<math>s_n^2 = \sum(x_i - \bar x)^2/(n-1)</math> with divisor
-<math>n-1</math>
-is sampling consistent,
-but the mean squared deviation
-<math>\sum(x_i - \bar x_n)^2/n</math> with divisor
-<math>n</math> is not.
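Consistency under sub-sampling can be verified directly on a small list by averaging over all 1--1 maps from <math>[m]</math> to <math>[n]</math>. The sketch below is illustrative; the function names `sample_variance` and `subsample_average` are hypothetical, and exact fractions are used so that equalities can be checked without rounding.

```python
from fractions import Fraction
from itertools import permutations
from statistics import median

def sample_variance(xs):
    """Sample variance with divisor n - 1."""
    n = len(xs)
    mean = Fraction(sum(xs), n)
    return sum((x - mean) ** 2 for x in xs) / (n - 1)

def subsample_average(f, xs, m):
    """Average of f over all ordered samples of size m drawn without replacement."""
    values = [f(list(s)) for s in permutations(xs, m)]
    return sum(values, Fraction(0)) / len(values)

xs = [Fraction(0), Fraction(1), Fraction(3)]
print(subsample_average(median, xs, 2), median(xs))                    # 4/3 versus 1
print(subsample_average(sample_variance, xs, 2), sample_variance(xs))  # equal values
```

Because the functions are symmetric, averaging over unordered subsets would give the same result; ordered samples are used here only to match the definition in the text.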
-Other sampling consistent functions include Fisher's
-<math>k</math>-statistics,
-the first few of which are
-<math>k_{1,n} = \bar x_n</math>,
-<math>k_{2,n} = s_n^2</math> for
-<math>n\ge 2</math>,
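-<math>
-\begin{array}{lcl}
-k_{3,n} &=& n\sum(x_i - \bar x_n)^3/((n-1)(n-2))\\
-k_{4,n} &=& \left[n(n+1)\sum(x_i - \bar x_n)^4 - 3(n-1)\left(\sum(x_i - \bar x_n)^2\right)^2\right]/((n-1)(n-2)(n-3))
-\end{array}
-</math>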
-defined for
-<math>n\ge 3</math> and
-<math>n\ge 4</math> respectively.
-For a sequence of independent and identically distributed random variables,
-the
-<math>k</math>-statistic of order~
-<math>r\le n</math> is the unique symmetric function
-such that
-<math>E(k_{r,n}) = \kappa_r</math>.
-Fisher (1929) derived the variances and covariances.
-The connection with finite-population sub-sampling was developed by
-Tukey (1950).
-==References==
-*
-H. E. Daniels.
-Saddlepoint approximations in statistics.
-<em>The Annals of Mathematical Statistics</em>, 25
-(4): 631--650, 1954.
-*
-H. E. Daniels.
-Tail probability approximations.
-<em>Review of the International Statistical Institute</em>,
-: 37--46, 1987.
-*
-A. C. Davison.
-Approximate conditional inference in generalized linear models.
-<em>Journal of the Royal Statistical Society Series B</em>,
-: 445--461, 1988.
-*
-E. Di Nardo, G. Guarino, and D. Senato.
-A unifying framework for <math>k</math>-statistics, polykays and their
-multivariate generalizations.
-<em>Bernoulli</em>, 14: 440--468, 2008.
-*
-P. L. Dressel.
-Statistical seminvariants and their estimates with particular
-emphasis on their relation to algebraic invariants.
-<em>The Annals of Mathematical Statistics</em>, 11
-(1): 33--57, 1940.
-*
-F. Y. Edgeworth.
-On the representation of statistical frequency by a series.
-<em>Journal of the Royal Statistical Society</em>, 70
-(1): 102--106, 1907.
-*
-R. A. Fisher.
-Moments and product moments of sampling distributions.
-<em>Proceedings of the London Mathematical Society, Series 2</em>,
-: 199--238, 1929.
-*
-I. J. Good.
-A new formula for k-statistics.
-<em>The Annals of Statistics</em>, 5 (1): 224--228,
-1977.
-*
-C. C. Heyde.
-On a property of the lognormal distribution.
-<em>Journal of the Royal Statistical Society. Series B
-(Methodological)</em>, 25 (2): 392--393, 1963.
-*
-J. E. Kolassa.
-<em>Series Approximation Methods in Statistics</em>.
-Springer--Verlag, New York, 2006.
-*
-R. Lugannani and S. Rice.
-Saddle point approximation for the distribution of the sum of
-independent random variables.
-<em>Advances in Applied Probability</em>, 12: 475--490, 1980.
-*
-J. Marcinkiewicz.
-Sur une propriété de la loi de Gauss.
-<em>Mathematische Zeitschrift</em>, 44: 612--618, 1939.
-*
-J. Robinson.
-Saddlepoint approximations for permutation tests and confidence
-intervals.
-<em>Journal of the Royal Statistical Society. Series B
-(Methodological)</em>, 44 (1): 91--101, 1982.
-*
-G.-C. Rota and B. D. Taylor.
-The classical umbral calculus.
-<em>SIAM J. Math. Anal.</em>,  (25): 694--711, 1994.
-*
-T. N. Thiele.
-<em>Theory of Observations</em>.
-C. & E. Layton, London, 1903.
-*
-J. W. Tukey.
-Some sampling simplified.
-<em>Journal of the American Statistical Association</em>, 45
-(252): 501--519, 1950.

Dr. John Kolassa

Revision as of 21:08, 5 September 2008
