Dr. John Kolassa

Revision as of 19:18, 5 September 2008

This article describes a sequence of numbers, called cumulants, that are used to describe, and in some circumstances approximate, a univariate or multivariate distribution. Cumulants are not unique in this role; other sequences, such as moments and their generalizations, may also be used in both roles. Cumulants have multiple advantages over competitors, in that cumulants change in a very simple way when the underlying random variable is subject to an affine transformation, cumulants for sums of independent random variables have a very simple relationship to the cumulants of the addends, and cumulants may be used in a simple way to describe the difference between a distribution and its simplest Gaussian approximation.

Overview and Definitions

Definition

The moment of order $r$ (or $r$th moment) of a real-valued random variable $X$ is $ \mu_r = E(X^r) $ for integer $r=0,1,\ldots$. The value is assumed to be finite. Provided that it has a Taylor expansion about the origin, The moment generating function (or Fourier--Laplace transform) <math powerseries> M(\xi) = E(e^{\xi X}) = E(1 + \xi X +\cdots + \xi^r X^r/r!+\cdots) = \sum_{r=0}^\infty \mu_r \xi^r/r! </math> is an easy way to combine all of the moments into a single expression. The $r$th moment is hence the $r$th derivative of $M$ at the origin. This definition is due to Fisher (1929).

When $X$ has a distribution given by a density $f$, then <math ctsmomdef> \mu_r = \int_{-\infty}^\infty x^r f(x)\, dx </math>, and <math mgfdef> M(\xi) = E(e^{\xi X}) =\int_{-\infty}^\infty\exp(\xi x) f(x)~d x. </math>

The cumulants $\kappa_r$ are the coefficients in the Taylor expansion of the cumulant generating function about the origin $ K(\xi) = \log M(\xi) = \sum_{r} \kappa_r \xi^r/r!. $ Evidently $\mu_0 = 1$ implies $\kappa_0 = 0$. The relationship between the first few moments and cumulants, obtained by extracting coefficients from the expansion, is as follows \begin{eqnarray}\tag{1} \left. \parbox{10cm}{ \begin{eqnarray*} \kappa_1 &=& \mu_1 \\ \kappa_2 &=& \mu_2 - \mu_1^2\\ \kappa_3 &=& \mu_3 - 3\mu_2\mu_1 + 2\mu_1^3\\ \kappa_4 &=& \mu_4 - 4\mu_3\mu_1 - 3\mu_2^2 + 12\mu_2\mu_1^2 -6\mu_1^4. \end{eqnarray*}}\right\} \end{eqnarray} In the reverse direction \begin{eqnarray}\tag{2} \left. \parbox{10cm}{ \begin{eqnarray*} \mu_2 &=& \kappa_2 + \kappa_1^2\\ \mu_3 &=& \kappa_3 + 3\kappa_2\kappa_1 + \kappa_1^3\\ \mu_4 &=& \kappa_4 + 4\kappa_3\kappa_1 + 3\kappa_2^2 + 6\kappa_2\kappa_1^2 + \kappa_1^4. \end{eqnarray*}}\right\} \end{eqnarray} In particular, $\kappa_1 = \mu_1$ is the mean of~ $X$, $\kappa_2$~is the variance, and $\kappa_3 = E((X - \mu_1)^3)$. Higher-order cumulants are not the same as moments about the mean.

Definitions under less restrictive conditions

The Cauchy distribution with density $ \pi^{-1}/(1+x^2)$ has no moments because the integral (<ref>ctsmomdef</ref>) does not converge for any integer~$ r\ge 1$ Student's~$ t$ distribution on five degrees of freedom is symmetric with density $ (3\pi\surd5/8)/(1 + x^2/5)^3$ The first four moments are $ 0, 5/3, 0, 25$ : higher-order moments are not defined. The cumulants up to order four are defined by (<ref>forward</ref>) even though the moment generating function (<ref>powerseries</ref>) does not exist for any real $ \xi\neq 0$ .

In both of these cases, the characteristic function $ M(i\xi)$ is well-defined for real $ \xi$ , $ \exp(-|\xi|)$ for the Cauchy distribution, and $ \exp(-|\xi|\surd 5)(1 + |\xi|\surd5 + 5\xi^2/3)$ for $ t_5$ . In the latter case, both $ M(i\xi)$ and $ K(i\xi)$ have Taylor expansions up to order four only, so the moments and cumulants are defined only up to this order. The infinite expansion (<ref>powerseries</ref>) is justified when the radius of convergence is positive, in which case $ M(\xi)$ is finite on an open set containing zero, and all moments and cumulants are finite. However, finiteness of the moments does not imply that $ M(\xi)$ exists for any $ \xi\neq 0$ . The log normal distribution provides a counterexample. It has finite moments $ \mu_r = e^{r^2/2}$ of all orders, but (<ref>powerseries</ref>) diverges for every~$ \xi\neq 0$.

Uniqueness

The normal distribution $N(\mu, \sigma^2)$ has cumulant generating function $\xi\mu + \xi^2 \sigma^2/2$, a quadratic polynomial implying that all cumulants of order three and higher are zero. Marcinkiewicz (1939) showed that the normal distribution is the only distribution whose cumulant generating function is a polynomial, i.e.~the only distribution having a finite number of non-zero cumulants. The Poisson distribution with mean $\mu$ has moment generating function $\exp(\mu(e^\xi - 1))$ and cumulant generating function $\mu(e^\xi -1)$. Consequently all the cumulants are equal to the mean.

Two distinct distributions may have the same moments, and hence the same cumulants. This statement is fairly obvious for distributions whose moments are all infinite, or even for distributions having infinite higher-order moments. But it is much less obvious for distributions having finite moments of all orders. Heyde (1963) gave one such pair of distributions with densities $ f_1(x) = \exp(-(\log x)^2/2) / (x\sqrt{2\pi}) $ and $ f_2(x) = f_1(x) [1 + \sin(2\pi\log x)/2] $ for $x > 0$. The first of these is called the log normal distribution. To show that these distributions have the same moments it suffices to show that $ \int_0^\infty x^k f_1(x) \sin(2\pi\log x)\, dx = 0 $ for integer $k\ge 1$, which can be shown by making the substitution $\log x = y+k$.

If the sequence of moments is such that (<ref>powerseries</ref>) has a finite radius of convergence, the distribution is uniquely determined.

Properties

Cumulants of order $r \ge 2$ are called semi-invariant on account of their behavior under affine transformation of variables (Thiele ,1903, Dressel ,1940). If $\kappa_r$ is the $r$th cumulant of $X$, the $r$th cumulant of the affine transformation $a + b X$ is $b^r \kappa_r$, independent of~ $a$. This behavior is considerably simpler than that of moments. However, moments about the mean are also semi-invariant, so this property alone does not explain why cumulants are useful for statistical purposes.

The term cumulant was coined by Fisher (1929) on account of their behavior under addition of random variables. Let $S = X+Y$ be the sum of two independent random variables. The moment generating function of the sum is the product $ M_S(\xi) = M_X(\xi) M_Y(\xi), $ and the cumulant generating function is the sum $ K_S(\xi) = K_X(\xi) + K_Y(\xi). $ Consequently, the $r$th cumulant of the sum is the sum of the $r$th cumulants. By extension, if $X_1,\ldots X_n$ are independent and identically distributed, the $r$th cumulant of the sum is $n\kappa_r$. Let $\kappa_{n;r}$ be cumulant of order $r$ of the standardized sum $n^{-1/2}(X_1+\cdots + X_n)$; then <math ndep> \kappa_{n;r}=n^{1-r/2} \kappa_r. </math> Provided that the cumulants are finite, all cumulants of order $r\ge 3$ of the standardized sum tend to zero, which is a simple demonstration of the central limit theorem.

Good (1977) obtained an expression for the $r$th cumulant of $X$ as the $r$th moment of the discrete Fourier transform of an independent and identically distributed sequence as follows. Let $X_1, X_2,\ldots$ be independent copies of~ $X$ with $r$th cumulant~ $\kappa_r$, and let $\omega = e^{2\pi i/n}$ be a primitive $n$th root of unity. The discrete Fourier combination $ Z = X_1 + \omega X_2 + \cdots + \omega^{n-1} X_n $ is a complex-valued random variable whose distribution is invariant under rotation $Z\sim \omega Z$ through multiples of~ $2\pi /n$. The $r$th cumulant of the sum is $\kappa_r \sum_{j=1}^n \omega^{r j}$, which is equal to $n\kappa_r$ if $r$ is a multiple of $n$, and zero otherwise. Consequently $E(Z^r) = 0$ for integer $r < n$ and $E(Z^n) = n\kappa_n$.

Multivariate cumulants

Somewhat surprisingly, the relation between moments and cumulants is simpler and more transparent in the multivariate case than in the univariate case. Let $X = (X^1,\ldots, X^k)$ be the components of a random vector. In a departure from the univariate notation, we write $\kappa^r = E(X^r)$ for the components of the mean vector, $\kappa^{rs} = E(X^r X^s)$ for the components of the second moment matrix, $\kappa^{r s t} = E(X^r X^s X^t)$ for the third moments, and so on. It is convenient notationally to adopt Einstein's summation convention, so $\xi_r X^r$ denotes the linear combination $\xi_1 X^1 + \cdots + \xi_k X^k$, the square of the linear combination is $(\xi_r X^r)^2 = \xi_r\xi_s X^r X^s$ a sum of $k^2$ terms, and so on for higher powers. The Taylor expansion of the moment generating function $M(\xi) = E(\exp(\xi_r X^r)$ is $ M(\xi) = 1 + \xi_r \kappa^r + \textstyle{\frac1{2!}} \xi_r\xi_s \kappa^{rs} + \textstyle{\frac1{3!}} \xi_r\xi_s \xi_t \kappa^{r s t} +\cdots. $ The cumulants are defined as the coefficients $\kappa^{r,s}, \kappa^{r,s,t},\ldots$ in the Taylor expansion $ \log M(\xi) = \xi_r \kappa^r + \textstyle{\frac1{2!}} \xi_r\xi_s \kappa^{r,s} + \textstyle{\frac1{3!}} \xi_r\xi_s \xi_t \kappa^{r,s,t} +\cdots. $ This notation does not distinguish first-order moments from first-order cumulants, but commas separating the superscripts serve to distinguish higher-order cumulants from moments.

Comparison of coefficients reveals that the each moment $\kappa^{rs}, \kappa^{r s t},\ldots$ is a sum over partitions of the superscripts, each term in the sum being a product of cumulants: \begin{eqnarray*} \kappa^{rs}&=&\kappa^{r,s} + \kappa^r\kappa^s\\ \kappa^{r s t}&=&\kappa^{r,s,t} + \kappa^{r,s}\kappa^t + \kappa^{r,t}\kappa^s + \kappa^{s,t}\kappa^r + \kappa^r\kappa^s\kappa^t\\ &=&\kappa^{r,s,t} + \kappa^{r,s}\kappa^t[3] + \kappa^r\kappa^s\kappa^t\\ \kappa^{r s t u}&=&\kappa^{r,s,t,u} + \kappa^{r,s,t}\kappa^u[4] + \kappa^{r,s}\kappa^{t,u}[3] + \kappa^{r,s}\kappa^t\kappa^u[6] + \kappa^r\kappa^s\kappa^t\kappa^u. \end{eqnarray*} Each parenthetical number indicates a sum over distinct partitions having the same block sizes, so the fourth-order moment is a sum of 15 distinct cumulant products. In the reverse direction, each cumulant is also a sum over partitions of the indices. Each term in the sum is a product of moments, but with coefficient $(-1)^{\nu-1} (\nu-1)!$ where $\nu$ is the number of blocks: \begin{eqnarray*} \kappa^{r,s} &=& \kappa^{rs} - \kappa^r\kappa^s\\ \kappa^{r,s,t} &=& \kappa^{r s t} - \kappa^{rs}\kappa^t[3] + 2 \kappa^r\kappa^s\kappa^t\\ \kappa^{r,s,t,u} &=& \kappa^{r s t u} - \kappa^{r s t}\kappa^u[4] - \kappa^{rs}\kappa^{t u}[3] + 2 \kappa^{rs}\kappa^t\kappa^u[6] - 6 \kappa^r\kappa^s\kappa^t\kappa^u \end{eqnarray*}

These relationships are an instance of M\"obius inversion on the partition lattice.

Partition notation serves one additional purpose. It establishes moments and cumulants as special cases of generalized cumulants, which includes objects of the type $\kappa^{r,st} = {\rm cov}(X^r, X^s X^t)$, $\kappa^{rs, t u} = {\rm cov}(X^r X^s, X^t X^u)$, and $\kappa^{rs, t, u}$ with incompletely partitioned indices. These objects arise very naturally in statistical work involving asymptotic approximation of distributions. They are intermediate between moments and cumulants, and have characteristics of both.

Every generalized cumulant can be expressed as a sum of certain products of ordinary cumulants. Some examples are as follows: \begin{eqnarray*} \kappa^{rs, t} &=& \kappa^{r,s,t} + \kappa^r\kappa^{s,t} + \kappa^s \kappa^{r,t}\\ &=& \kappa^{r,s,t} + \kappa^r\kappa^{s,t}[2]\\ \kappa^{rs,t u} &=& \kappa^{r,s,t,u} + \kappa^{r,s,t}\kappa^u[4] + \kappa^{r,t}\kappa^{s,u}[2] + \kappa^{r,t}\kappa^s\kappa^u[4]\\ \kappa^{rs,t,u} &=& \kappa^{r,s,t,u} + \kappa^{r,t,u}\kappa^s[2] + \kappa^{r,t}\kappa^{s,u}[2] \end{eqnarray*} Each generalized cumulant is associated with a partition $\tau$ of the given set of indices. For example, $\kappa^{rs,t,u}$ is associated with the partition $\tau=rs|t|u$ of four indices into three blocks. Each term on the right is a cumulant product associated with a partition $\sigma$ of the same indices. The coefficient is one if the least upper bound $\sigma\vee\tau$ has a single block, otherwise zero. Thus, with $\tau=rs|t|u$, the product $\kappa^{r,s}\kappa^{t,u}$ does not appear on the right because $\sigma\vee\tau = rs|t u$ has two blocks.

As an example of the way these formulae may be used, let $X$ be a scalar random variable with cumulants $\kappa_1,\kappa_2,\kappa_3,\ldots$. By translating the second formula in the preceding list, we find that the variance of the squared variable is $ {\rm var}(X^2) = \kappa_4 + 4\kappa_3\kappa_1 + 2\kappa_2^2 + 4\kappa_2\kappa_1^2, $ reducing to $\kappa_4 + 2\kappa_2^2$ if the mean is zero.

Exponential families

Let $ f$ be a probability distribution on an arbitrary measurable space $ ({\mathcal X},\nu)$ , and let $ t\colon{\mathcal X}\to{\mathcal R}$ be a real-valued random variable with cumulant generating function $ K(\cdot)$ , finite in a set $ \Theta$ containing zero in the interior. The family of distributions on $ {\mathcal X}$ with density $ f_\theta(x) = e^{\theta t(x)} f(x) / M(\theta) = e^{\theta t(x) - K(\theta)} f(x) $ indexed by $ \theta\in\Theta$ is called the exponential family associated with $ f$ and the canonical statistic~$ t$ . In statistical physics, the normalizing constant $ M(\theta)$ is called the partition function.

Two examples suffice to illustrate the idea. In the first example, $ {\mathcal X} = \{1,2,\ldots\}$ is the set of natural numbers, $ f(x) \propto 1/x^2$ and $ t(x) = -\log(x)$ . The associated exponential family is $ f_\theta(x) = x^{-\theta}/\zeta(\theta)$ , where $ \zeta(\theta)$ is the Riemann zeta function with real argument $ \theta > 1$ .

In the second example, $ {\mathcal X}={\mathcal X}_n$ is the symmetric group or the set of permutations of $ n$ letters, $ x\in{\mathcal X}_n$ is a permutation, $ t(x)$ is the number of cycles, $ f(x) = 1/n!$ is the uniform distribution, and $ M_n(\xi) = \Gamma(n+e^\xi)/(n!\, \Gamma(e^\xi))$ for all real~$ \xi$ . The exponential family of distributions on permutations of $ [n]$ is $ f_{n,\theta}(x) = \frac{\Gamma(\lambda)\, \lambda^{t(x)}} {\Gamma(n+\lambda)}, $ the same as the the distribution generated by the Chinese restaurant process with parameter $ \lambda = e^\theta$ . The associated marginal distribution on partitions, the Ewens distribution on partitions of $ [n]$ , is also of the exponential-family form with canonical statistic equal to the number of blocks or cycles. This number $ t(x)$ is a random variable whose cumulants are the derivatives of $ \log M(\cdot)$ evaluated at the parameter~$ \theta$ .

In the multi-parameter case, $ t\colon{\mathcal X}\to{\mathcal R}^p$ is a random vector and $ \xi\colon{\mathcal R}^p\to{\mathcal R}$ is a linear functional, $ M(\xi) = E(e^{\xi(t)})$ is the joint moment generating function. It is sometimes convenient to employ Einstein's implicit summation convention in the form $ \theta(t) = \theta_i t^i$ where $ t^1,\ldots, t^p$ are the components of $ t(x)$ , and $ \theta_1,\ldots, \theta_p$ are the coefficients of the linear functional. For simplicity of notation in what follows, $ {\mathcal X}={\mathcal R}^p$ and $ t(x) = x$ is the identity function. An exponential-family distribution in $ {\mathcal R}^p$ has the form $ f_\theta(x)=\exp(x^j\theta_j-g(x)-\varphi(\theta)) $ for given functions $ g$ and $ \varphi$ . Integration shows that the distribution $ f_\theta$ has cumulant generating function $ K_\theta(\xi)=\varphi(\theta+\xi)-\varphi(\theta)$ . The cumulants of $ X\sim f_\theta$ are equal to the derivatives of $ \varphi$ at the parameter~$ \theta$ .

Calculus of cumulants

The umbral calculus is a syntax or formal system consisting of certain operations on objects called umbrae, mimicking addition and multiplication of independent real-valued random variables (Rota and Taylor ,1994). To each real-valued sequence $ 1, a_1, a_2,\ldots$ there corresponds an umbra $ \alpha$ such that $ E(\alpha^r) = a_r$ . This freedom gives rise to special umbrae, the singleton and Bell umbra, corresponding to no real-valued random variable. Using these special umbrae, one develops the formal notion of an $ \alpha$-cumulant umbra $ \chi\cdot\alpha$ by formal product operations in the syntax. Properties of cumulants, $ k$ -statistics and other polynomial functions are then derived by purely formal combinatorial operations. Di~Nardo et~al. (2008) present details.

Approximation of distributions

Edgeworth approximation

Suppose that $Y$ is a random variable that arises as the sum of $n$ independent and identically-distributed summands, each of which has mean $0$, unit variance, and cumulants $\kappa_r$, and $X=Y/\sqrt{n}$. For ease of exposition, assume that cumulants of all orders exist. Then, using (<ref>ndep</ref>), the cumulant generating function of $X$ is given by $K(\xi)=\xi^2/2 +\kappa_3\xi^3/(6\sqrt{n}) +\kappa_4\xi^4/(24 n) +\cdots$, and the moment generating function of $X$ is given by $ K(\xi)=\exp(\xi^2/2)\exp(\kappa_3\xi^3/(6\sqrt{n})+\kappa_4\xi^4/(24 n)+\cdots) $ Exponentiating the second factor gives $ K(\xi)=\exp(\xi^2/2)\left(1\!+\!{{\kappa_3\xi^3}\over{6\sqrt{n}}}\!+\!{{\kappa_4\xi^4}\over{24 n}}\!+\!\cdots\!+\! {\textstyle{\frac12}} \left[ {{\kappa_3\xi^3}\over{6\sqrt{n}}}\!+\!{{\kappa_4\xi^4}\over{24 n}}\!+\!\cdots\right]^2\!+\!\!\cdots\right). $ Reordering terms in powers of sample size, <math kseries> =\exp(\xi^2/2)\left(1+{{\kappa_3\xi^3}\over{6\sqrt{n}}}+{{\kappa_4\xi^4}\over{24 n}}+ {{\kappa_3^2\xi^6}\over{72 n}}+\cdots\right). </math> Repeated application of integration by parts to (<ref>mgfdef</ref>) shows that <math mgfderiv> \xi^r M(\xi) =\int_{-\infty}^\infty\exp(\xi x)(-1)^r f^{(r)}(x)~d x, </math> where $f^{(r)}$ denotes the derivative of $f$ of order $r$. Relation (<ref>mgfderiv</ref>) holds if $f$ and its derivatives go to zero quickly as $\vert x\vert\to\infty$. Applying (<ref>mgfderiv</ref>) to the normal density $\phi(x)=\exp(-x^2/2)/\sqrt{2\pi}$, and applying the result to (<ref>kseries</ref>), gives $ M(\xi)\approx\int_{-\infty}^\infty\exp(\xi x)\phi(x)\left[1+{{\kappa_3 h^3(x)}\over{6\sqrt{n}}}+{{\kappa_4h^4(x)}\over{24 n}}+ {{\kappa_3^2h^6(x)}\over{72 n}}\right]~d x $ for $h^r(x)=(-1)^r\phi^{(r)}(x)/\phi(x)$, and, since the relationship giving the moment generating function in terms of the density is invertible, and that the inversion process is properly smooth, Edgeworth (1907) approximates the density of $X$ by <math edser> e_4(x)=\phi(x)\left[1+{{\kappa_3 h^3(x)}\over{6\sqrt{n}}}+{{\kappa_4h^4(x)}\over{24 n}}+ {{\kappa_3^2h^6(x)}\over{72 n}}\right]. </math> In fact, when the summands contributing to $S$ have a density and cumulants of order at least 5, the error in the approximation, multiplied by $n^{3/2}$, remains bounded. The functions $h^r$ defined above are the Hermite polynomials. The approximation (<ref>edser</ref>) is known as the Edgeworth series. The subscript refers to the number of cumulants used in its definition. This series can be used to approximate either the cumulative distribution function or survival function through term-wise integration.

The preceding discussion is intended to be heuristic; Kolassa (2006) presents a rigorous derivation, along with the natural extension to random vectors.

Saddlepoint approximation

The approximation (<ref>edser</ref>) to the density $f(x)$ has the property that $|f(x)-e_r(x)|\leq C n^{-(r-1)/2}$, for some constant $C$, when the cumulant of order $r+1$ exists; $C$ does not depend on $x$. A similar bound holds for the relative error $(f(x)-e_r(x))/f(x)$, only when $x$ is restricted to a finite interval. Because of the polynomial factor multiplying the first omitted term in (<ref>edser</ref>), the relative error can be expected to behave poorly. One might prefer an approximation that maintains good behavior for values of $X$ in a range that increases as $n$ increases; specifically, one might prefer an approximation that performs well for values of $\bar Y=X/\sqrt{n}$ in a fixed interval.

Assume again that random variables $Y_j$ are independent and identically distributed, each with a cumulant generating function $K(\xi)$ finite for $\xi$ in a neighborhood of $0$. As above, define the exponential family $ f_{\bar Y}(\bar y;\theta)=\exp(\theta\bar y-K(\theta))f_{\bar Y}(\bar y). $ One can then choose a value of $\theta$ depending on $\bar y$ that makes $f_{\bar Y}(\bar y;\theta)$ easy to approximate, and the exponential family relationship to derive an approximation for $f_{\bar Y}(\bar y)$. Conventionally we choose $\hat\theta$ to satisfy <math speqn> K'(\hat\theta)=\bar y; </math> this makes the expectation of the distribution with density $f_{\bar Y}(\cdot;\hat\theta)$ equal to the observed value. One then applies (<ref>edser</ref>), with the scale of the ordinate changed to reflect the fact that we are approximating the distribution of $X/\sqrt{n}$, to obtain $ f_{\bar Y}(\bar y)\approx\exp(-\hat\theta\bar y+K(\hat\theta)) n\phi(0)\left[1+{{\kappa_3 h^3(0)}\over{6\sqrt{n}}}+{{\kappa_4h^4(0)}\over{24 n}}+ {{\kappa_3^2h^6(0)}\over{72 n}}\right]. $ Using the fact that $h^3(0)=0$, $h^4(0)=3$, and $h^6(0)=-15$, we obtain <math spser> f_{\bar Y}(\bar y)\approx{{n}\over{\sqrt{2\pi}}} \exp(K(\hat\theta)-\hat\theta\bar y) \left[1+{{\hat\kappa_4}\over{8 n}}- {{5\hat\kappa_3^2}\over{24 n}}\right]. </math> Here $\hat\kappa_j$ are calculated from the derivatives of $K$ in the preceding manner, but in this case evaluated at $\hat\theta$. This approximation may only be applied to values of $\bar y$ for which (<ref>speqn</ref>) has solutions in an open neighborhood of 0. Expression (<ref>spser</ref>) represents the saddlepoint approximation to the density of the mean $\bar Y$; since $f_{\bar Y}(\bar y;\theta)$ has a cumulant generating function defined on an open set containing $0$, cumulants of all orders exist, the Edgeworth series including $\kappa_6$ may be applied to $f_{\bar Y}(\bar y;\theta)$, and so the error in the Edgeworth series is of order $O(1/n^2)$. Hence the error in (<ref>spser</ref>) is of the same order, and in this case, is relative and uniform for values of $\bar y$ in a bounded subset of an open subset on which (<ref>speqn</ref>) has a solution. This approximation was introduced to the statistics literature by Daniels (1954).

The Edgeworth series for the density was trivially integrated to obtain an approximation to tail probabilities. Integration of the saddlepoint approximation is more delicate. Two main approaches have been investigated. Daniels (1987) expresses $f_{\bar Y}(\bar y)$ exactly as a complex integral involving $K(\xi)$, integrates with respect to $\bar y$ to obtain another complex integral, and reviews techniques for approximating the resulting integrals. Robinson (1982) and Lugannani and Rice (1980) derive tail probability approximations based on approximately integrating (<ref>spser</ref>) with respect to $\bar y$ directly.

These saddlepoint and Edgeworth approximations have multivariate and conditional extensions. Davison (1988) exploits the conditional saddlepoint tail probability approximation to perform inference in canonical exponential families.

Samples and sub-samples

A function $f\colon{\mathcal R}^n\to{\mathcal R}$ is symmetric if $f(x_1 ,\ldots, x_n) = f(x_{\pi(1)} ,\ldots, x_{\pi(n)})$ for each permutation $\pi$ of the arguments. For example, the total $T_n = x_1 + \cdots + x_n$, the average $T_n/n$, the min, max and median are symmetric functions, as are the sum of squares $S_n = \sum x_i^2$, the sample variance $s_n^2 = (S_n - T_n^2/n)/(n-1)$ and the mean absolute deviation $\sum |x_i - x_j|/(n(n-1))$.

A vector $x$ in ${\mathcal R}^n$ is an ordered list of $n$ real numbers $(x_1 ,\ldots, x_n)$ or a function $x\colon[n]\to{\mathcal R}$ where $[n]=\{1 ,\ldots, n\}$. For $m \le n$, a 1--1 function $\varphi\colon[m]\to[n]$ is a sample of size~ $m$, the sampled values being $x\varphi = (x_{\varphi(1)} ,\ldots, x_{\varphi(m)})$. All told, there are $n(n-1)\cdots(n-m+1)$ distinct samples of size~ $m$ that can be taken from a list of length~ $n$. A \emph{sequence} of functions $f_n\colon{\mathcal R}^n\to{\mathcal R}$ is consistent under sub-sampling if, for each $f_m, f_n$, $ f_n(x) = {\rm ave} _\varphi f_m(x\varphi), $ where ${\rm ave} _\varphi$ denotes the average over samples of size~ $m$. For $m=n$, this condition implies only that $f_n$ is a symmetric function.

Although the total and the median are both symmetric functions, neither is consistent under sub-sampling. For example, the median of the numbers $(0,1,3)$ is one, but the average of the medians of samples of size two is 4/3. However, the average $\bar x_n = T_n/n$ is sampling consistent. Likewise the sample variance $s_n^2 = \sum(x_i - \bar x)^2/(n-1)$ with divisor $n-1$ is sampling consistent, but the mean squared deviation $\sum(x_i - \bar x_n)^2/n$ with divisor $n$ is not. Other sampling consistent functions include Fisher's $k$-statistics, the first few of which are $k_{1,n} = \bar x_n$, $k_{2,n} = s_n^2$ for $n\ge 2$, \begin{eqnarray*} k_{3,n} &=& n\sum(x_i - \bar x_n)^3/((n-1)(n-2))\\ k_{4,n} &=& \end{eqnarray*} defined for $n\ge 3$ and $n\ge 4$ respectively.

For a sequence of independent and identically distributed random variables, the $k$-statistic of order~ $r\le n$ is the unique symmetric function such that $E(k_{r,n}) = \kappa_r$. Fisher (1929) derived the variances and covariances. The connection with finite-population sub-sampling was developed by Tukey (1950).

References

H. E. Daniels. Saddlepoint approximations in statistics. The Annals of Mathematical Statistics<\em>, 25 (4): 631--650, 1954.

H. E. Daniels. Tail probability approximations. Review of the International Statistical Institute<\em>, 55: 37--46, 1987.

A. C. Davison. Approximate conditional inference in generalized linear models. Journal of the Royal Statistical Society Series B<\em>, 50: 445--461, 1988.

E. Di Nardo, G. Guarino, and D. Senato. A unifying framework for $k$-statistics, polykays and their multivariate generalizations. Bernoulli<\em>, 14: 440--468, 2008.

P. L. Dressel. Statistical seminvariants and their setimates with particular emphasis on their relation to algebraic invariants. The Annals of Mathematical Statistics<\em>, 11 (1): 33--57, 1940.

F. Y. Edgeworth. On the representation of statistical frequency by a series. Journal of the Royal Statistical Society<\em>, 70 (1): 102--106, 1907.

R. A. Fisher. Moments and product moments of sampling distributions. Proceedings of the London Mathematical Society, Series 2<\em>, 30: 199--238, 1929.

I. J. Good. A new formula for k-statistics. The Annals of Statistics<\em>, 5 (1): 224--228, 1977.

C. C. Heyde. On a property of the lognormal distribution. Journal of the Royal Statistical Society. Series B (Methodological)}, 25 (2): 392--393, 1963.

J. E. Kolassa. Series Approximation Methods in Statistics<\em>. Springer--Verlag, New York, 2006.

R. Lugannani and S. Rice. Saddle point approximation for the distribution of the sum of independent random variables. Advances in Applied Probability<\em>, 12: 475--490, 1980.

J. Marcinkiewicz. Sur une peropri\'et\'e de la loi de {G}auss. Mathematische Zeitschrift<\em>, 44: 612--618, 1939.

J. Robinson. Saddlepoint approximations for permutation tests and confidence intervals. Journal of the Royal Statistical Society. Series B (Methodological)}, 44 (1): 91--101, 1982.

G.-C. Rota and B. D. Taylor. The classical umbral calculus. SIAM J. Math. Anal<\em>, (25): 694--711, 1994.

T. N. Thiele. Theory of Observations<\em>. C. & E. Layton, London, 1903.

J. W. Tukey. Some sampling simplified. Journal of the American Statistical Association<\em>, 45 (252): 501--519, 1950.

Dr. John Kolassa

Revision as of 19:18, 5 September 2008

Contents

Overview and Definitions

Definition

Definitions under less restrictive conditions

Uniqueness

Properties

Multivariate cumulants

Exponential families

Calculus of cumulants

Approximation of distributions

Edgeworth approximation

Saddlepoint approximation

Samples and sub-samples

References

Personal tools

Namespaces

Variants

Views

Actions

Search

Navigation

Focal areas

Activity

Tools

@@ Line 34: / Line 34: @@
 <math>r</math>th derivative of
 <math>M</math> at the origin.
-This definition is due to \citet{1929}.
+This definition is due to Fisher (1929).
 When
@@ Line 86: / Line 86: @@
 ===Definitions under less restrictive conditions===
 The Cauchy distribution with density <math> \pi^{-1}/(1+x^2)</math> has no moments because
-the integral (\ref{ctsmomdef}) does not converge for any integer~<math> r\ge 1</math>
+the integral (<ref>ctsmomdef</ref>) does not converge for any integer~<math> r\ge 1</math>
 Student's~<math> t</math> distribution on five degrees of freedom is symmetric with density
 <math> (3\pi\surd5/8)/(1 + x^2/5)^3</math>
 The first four moments are <math> 0, 5/3, 0, 25</math> : higher-order moments are
 not defined.
-The cumulants up to order four are defined by (\ref{forward})
+The cumulants up to order four are defined by (<ref>forward</ref>)
-even though the moment generating function (\ref{powerseries}) does not exist
+even though the moment generating function (<ref>powerseries</ref>) does not exist
 for any real <math> \xi\neq 0</math> .
@@ Line 102: / Line 102: @@
 have Taylor expansions up to order four only, so the moments and
 cumulants are defined only up to this order.
-The infinite expansion (\ref{powerseries}) is justified when
+The infinite expansion (<ref>powerseries</ref>) is justified when
 the radius of convergence is positive, in which case <math> M(\xi)</math> is finite on
 an open set containing zero, and all moments and cumulants are finite.
@@ Line 109: / Line 109: @@
 The log normal distribution provides a counterexample.
 It has finite moments <math> \mu_r = e^{r^2/2}</math> of all orders,
-but (\ref{powerseries}) diverges for every~<math> \xi\neq 0</math>.
+but (<ref>powerseries</ref>) diverges for every~<math> \xi\neq 0</math>.
 ===Uniqueness===
 The normal distribution
@@ Line 115: / Line 115: @@
 <math>\xi\mu + \xi^2 \sigma^2/2</math>, a quadratic polynomial implying that all cumulants
 of order three and higher are zero.
-\citet{1939} showed that the normal distribution is the only distribution
+Marcinkiewicz (1939) showed that the normal distribution is the only distribution
 whose cumulant generating function is a polynomial, i.e.~the only distribution
 having a finite number of non-zero cumulants.
@@ Line 128: / Line 128: @@
 or even for distributions having infinite higher-order moments.
 But it is much less obvious for distributions having finite moments of all orders.
-\citet{1963} gave one such pair of distributions with densities
+Heyde (1963) gave one such pair of distributions with densities
 <math>
 f_1(x) = \exp(-(\log x)^2/2) / (x\sqrt{2\pi})
@@ Line 146: / Line 146: @@
 <math>\log x = y+k</math>.
-If the sequence of moments is such that (\ref{powerseries})
+If the sequence of moments is such that (<ref>powerseries</ref>)
 has a finite radius of convergence, the distribution is uniquely determined.
@@ Line 152: / Line 152: @@
 Cumulants of order
 <math>r \ge 2</math> are called semi-invariant on account of their
-behavior under affine transformation of variables (\cite{1903}, \cite{1940}).
+behavior under affine transformation of variables (Thiele ,1903, Dressel ,1940).
 If
 <math>\kappa_r</math> is the
@@ Line 167: / Line 167: @@
 does not explain why cumulants are useful for statistical purposes.
-The term cumulant was coined by \citet{1929} on account of their behavior under
+The term cumulant was coined by Fisher (1929) on account of their behavior under
 addition of random variables.
 Let
@@ Line 200: / Line 200: @@
 of the standardized sum tend to zero, which is a simple demonstration of the central limit theorem.
-\citet{1977} obtained an expression for the
+Good (1977) obtained an expression for the
 <math>r</math>th cumulant of
 <math>X</math> as
@@ Line 411: / Line 411: @@
 certain operations on objects called umbrae,
 mimicking addition and multiplication of independent real-valued random variables
-\citep{1994}.
+(Rota and Taylor ,1994).
 To each real-valued sequence <math> 1, a_1, a_2,\ldots</math>
 there corresponds an umbra <math> \alpha</math> such that <math> E(\alpha^r) = a_r</math> .
@@ Line 420: / Line 420: @@
 Properties of cumulants, <math> k</math> -statistics and other polynomial functions
 are then derived by purely formal combinatorial operations.
-\citet{2008} present details.
+Di~Nardo et~al. (2008) present details.
 ==Approximation of distributions==
 ===Edgeworth approximation===
@@ Line 433: / Line 433: @@
 <math>X=Y/\sqrt{n}</math>.
 For ease of exposition, assume that cumulants of all orders exist.
-Then, using (\ref{ndep}), the cumulant generating function of
+Then, using (<ref>ndep</ref>), the cumulant generating function of
 <math>X</math> is given by
 <math>K(\xi)=\xi^2/2 +\kappa_3\xi^3/(6\sqrt{n}) +\kappa_4\xi^4/(24 n) +\cdots</math>,
@@ Line 451: / Line 451: @@
 {{\kappa_3^2\xi^6}\over{72 n}}+\cdots\right).
 </math>
-Repeated application of integration by parts to (\ref{mgfdef}) shows that
+Repeated application of integration by parts to (<ref>mgfdef</ref>) shows that
 <math  mgfderiv>
 \xi^r M(\xi) =\int_{-\infty}^\infty\exp(\xi x)(-1)^r f^{(r)}(x)~d x,
@@ Line 459: / Line 459: @@
 <math>f</math> of order
 <math>r</math>.  Relation
-(\ref{mgfderiv}) holds if
+(<ref>mgfderiv</ref>) holds if
 <math>f</math> and its derivatives go to zero quickly
 as
-<math>\vert x\vert\to\infty</math>.  Applying (\ref{mgfderiv}) to the normal
+<math>\vert x\vert\to\infty</math>.  Applying (<ref>mgfderiv</ref>) to the normal
 density
 <math>\phi(x)=\exp(-x^2/2)/\sqrt{2\pi}</math>, and applying the result to
-(\ref{kseries}), gives
+(<ref>kseries</ref>), gives
 <math>
 M(\xi)\approx\int_{-\infty}^\infty\exp(\xi x)\phi(x)\left[1+{{\kappa_3 h^3(x)}\over{6\sqrt{n}}}+{{\kappa_4h^4(x)}\over{24 n}}+
@@ Line 474: / Line 474: @@
 giving the moment generating function in terms of the density is invertible,
 and that the inversion process is properly smooth,
-\citet{1907} approximates the density of
+Edgeworth (1907) approximates the density of
 <math>X</math> by
 <math  edser>
@@ Line 486: / Line 486: @@
 The functions
 <math>h^r</math> defined above are the Hermite polynomials.
-The approximation (\ref{edser}) is known as the Edgeworth series.
+The approximation (<ref>edser</ref>) is known as the Edgeworth series.
 The subscript refers to the number of cumulants used in its definition.
 This series can be used to approximate either the cumulative distribution function or survival function through term-wise integration.
-The preceding discussion is intended to be heuristic; \citet{2006} presents
+The preceding discussion is intended to be heuristic; Kolassa (2006) presents
 a rigorous derivation, along with the natural extension to random vectors.
 ===Saddlepoint approximation===
-The approximation (\ref{edser}) to the density
+The approximation (<ref>edser</ref>) to the density
 <math>f(x)</math> has the property that
 <math>|f(x)-e_r(x)|\leq C n^{-(r-1)/2}</math>, for some constant
@@ Line 505: / Line 505: @@
 <math>x</math> is restricted to a finite interval.
 Because of the polynomial factor multiplying the first omitted term in
-(\ref{edser}), the relative error can be expected to behave poorly.
+(<ref>edser</ref>), the relative error can be expected to behave poorly.
 One might prefer an approximation that maintains good behavior for
 values of
@@ Line 537: / Line 537: @@
 with density
 <math>f_{\bar Y}(\cdot;\hat\theta)</math> equal to the observed value.
-One then applies (\ref{edser}), with the scale of the ordinate changed
+One then applies (<ref>edser</ref>), with the scale of the ordinate changed
 to reflect the fact that we are approximating the distribution of
 <math>X/\sqrt{n}</math>,
@@ Line 560: / Line 560: @@
 This approximation may only be applied to values of
 <math>\bar y</math> for which
-(\ref{speqn}) has solutions in an open neighborhood of 0.
+(<ref>speqn</ref>) has solutions in an open neighborhood of 0.
-Expression (\ref{spser}) represents the saddlepoint approximation to
+Expression (<ref>spser</ref>) represents the saddlepoint approximation to
 the density of the mean
 <math>\bar Y</math>; since
@@ Line 572: / Line 572: @@
 <math>f_{\bar Y}(\bar y;\theta)</math>, and so the error in the
 Edgeworth series is of order
-<math>O(1/n^2)</math>.  Hence the error in (\ref{spser})
+<math>O(1/n^2)</math>.  Hence the error in (<ref>spser</ref>)
 is of the same order, and in this case, is relative and uniform for values of
-<math>\bar y</math> in a bounded subset of an open subset on which (\ref{speqn})
+<math>\bar y</math> in a bounded subset of an open subset on which (<ref>speqn</ref>)
 has a solution.
 This approximation was introduced to the statistics literature by
-\citet{1954}.
+Daniels (1954).
 The Edgeworth series for the density was trivially integrated to obtain an
 approximation to tail probabilities.  Integration of the saddlepoint
 approximation is more delicate.  Two main approaches have been investigated.
-\citet{1987} expresses
+Daniels (1987) expresses
 <math>f_{\bar Y}(\bar y)</math> exactly as a complex integral
 involving
@@ Line 589: / Line 589: @@
 complex integral, and reviews techniques for approximating the resulting
 integrals.
-\citet{1982} and \citet{1980} derive tail probability approximations based
+Robinson (1982) and Lugannani and Rice (1980) derive tail probability approximations based
-on approximately integrating (\ref{spser}) with respect to
+on approximately integrating (<ref>spser</ref>) with respect to
 <math>\bar y</math> directly.
 These saddlepoint and Edgeworth approximations have multivariate and
-conditional extensions.  \citet{1988} exploits the conditional saddlepoint tail probability approximation to perform inference in canonical exponential families.
+conditional extensions.  Davison (1988) exploits the conditional saddlepoint tail probability approximation to perform inference in canonical exponential families.
 ==Samples and sub-samples==
 A function
@@ Line 671: / Line 671: @@
 <math>n\ge 4</math> respectively.
+For a sequence of independent and identically distributed random variables,
+the
+<math>k</math>-statistic of order~
+<math>r\le n</math> is the unique symmetric function
+such that
+<math>E(k_{r,n}) = \kappa_r</math>.
+Fisher (1929) derived the variances and covariances.
+The connection with finite-population sub-sampling was developed by
+Tukey (1950).
 ==References==
-<references/>
+*
-\bibliography{cumulants}
+H. E. Daniels.
+Saddlepoint approximations in statistics.
+<em>The Annals of Mathematical Statistics<\em>, 25
+(4): 631--650, 1954.
+*
+H. E. Daniels.
+Tail probability approximations.
+<em>Review of the International Statistical Institute<\em>,
+: 37--46, 1987.
+*
+A. C. Davison.
+Approximate conditional inference in generalized linear models.
+<em>Journal of the Royal Statistical Society Series B<\em>,
+: 445--461, 1988.
+*
+E. Di Nardo, G. Guarino, and D. Senato.
+A unifying framework for $k$-statistics, polykays and their
+multivariate generalizations.
+<em>Bernoulli<\em>, 14: 440--468, 2008.
+*
+P. L. Dressel.
+Statistical seminvariants and their setimates with particular
+emphasis on their relation to algebraic invariants.
+<em>The Annals of Mathematical Statistics<\em>, 11
+(1): 33--57, 1940.
+*
+F. Y. Edgeworth.
+On the representation of statistical frequency by a series.
+<em>Journal of the Royal Statistical Society<\em>, 70
+(1): 102--106, 1907.
+*
+R. A. Fisher.
+Moments and product moments of sampling distributions.
+<em>Proceedings of the London Mathematical Society, Series 2<\em>,
+: 199--238, 1929.
+*
+I. J. Good.
+A new formula for k-statistics.
+<em>The Annals of Statistics<\em>, 5 (1): 224--228,
+.
+*
+C. C. Heyde.
+On a property of the lognormal distribution.
+Journal of the Royal Statistical Society. Series B
+(Methodological)}, 25 (2): 392--393, 1963.
+*
+J. E. Kolassa.
+<em>Series Approximation Methods in Statistics<\em>.
+Springer--Verlag, New York, 2006.
+*
+R. Lugannani and S. Rice.
+Saddle point approximation for the distribution of the sum of
+independent random variables.
+<em>Advances in Applied Probability<\em>, 12: 475--490, 1980.
+*
+J. Marcinkiewicz.
+Sur une peropri\'et\'e de la loi de {G}auss.
+<em>Mathematische Zeitschrift<\em>, 44: 612--618, 1939.
+*
+J. Robinson.
+Saddlepoint approximations for permutation tests and confidence
+intervals.
+Journal of the Royal Statistical Society. Series B
+(Methodological)}, 44 (1): 91--101, 1982.
+*
+G.-C. Rota and B. D. Taylor.
+The classical umbral calculus.
+<em>SIAM J. Math. Anal<\em>,  (25): 694--711, 1994.
+*
+T. N. Thiele.
+<em>Theory of Observations<\em>.
+C. & E. Layton, London, 1903.
+*
+J. W. Tukey.
+Some sampling simplified.
+<em>Journal of the American Statistical Association<\em>, 45
+(252): 501--519, 1950.