# Second order efficiency


The Fisher–Rao Theorem provides an asymptotic bound on the loss of information incurred in replacing the sample by an estimator of the unknown parameter. The Rao Theorem provides a lower bound on the asymptotic variance of an estimator up to terms of $$O(1/n^2)\ .$$

## Definition

Denote by $$L(X,\theta)$$ the likelihood based on a sample of $$n$$ independent observations $$x_1,\cdots,x_n\ ,$$ which we represent by $$X\ .$$ Further let

$Z(\theta) = \frac{\partial \log L}{\partial \theta}, \quad n\,i(\theta) = V \left[Z(\theta) \right]$

where $$n\,i(\theta)$$ is the Fisher information in the sample, $$i(\theta)$$ being the information per observation. Let $$T$$ be an estimator of $$\theta$$ and $$M(T,\theta)$$ be the likelihood based on $$T\ .$$ Define:

$n\,i_T = V \left[ \frac{\partial \log M}{\partial \theta} \right]$

as the information contained in $$T\ .$$ Fisher (1925) considered

$E^\prime = \lim_{n\to\infty} n(i-i_T)$

as a measure of the efficiency of $$T$$ and showed that the maximum likelihood estimator attains the minimum value of $$E^\prime$$ in the estimation of the parameter of a $$k$$–category multinomial distribution, with cell probabilities as functions of $$\theta\ .$$
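To make these definitions concrete, here is a minimal numerical sketch of the per-observation information $$i(\theta)\ .$$ The Hardy–Weinberg trinomial $$\pi(\theta)=(\theta^2,\,2\theta(1-\theta),\,(1-\theta)^2)$$ used below is an illustrative choice, not an example drawn from the papers cited.

```python
import numpy as np

# Illustrative model (an assumption, not from the article): Hardy-Weinberg trinomial
def pi(theta):
    return np.array([theta**2, 2 * theta * (1 - theta), (1 - theta)**2])

def dpi(theta, h=1e-6):
    # central finite-difference approximation to pi'(theta)
    return (pi(theta + h) - pi(theta - h)) / (2 * h)

def fisher_info(theta):
    # per-observation information: i(theta) = sum_j pi_j'(theta)^2 / pi_j(theta)
    p, dp = pi(theta), dpi(theta)
    return np.sum(dp**2 / p)

theta = 0.3
print(fisher_info(theta))         # ~9.5238
print(2 / (theta * (1 - theta)))  # closed form for this model: 2/(theta(1-theta))
```

The total information in the sample is then $$n\,i(\theta)\ ,$$ and the deficiency $$n(i - i_T)$$ measures what is lost in passing from $$X$$ to $$T\ .$$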

Rao (1961) defined first order efficiency of an estimator $$T$$ as the property

$\left| n^{-1/2} Z(\theta) - \alpha - \beta\, n^{1/2} (T-\theta) \right| \to 0$ in probability as $$n \to \infty\ ,$$ for an appropriate choice of $$\alpha$$ and $$\beta\ ,$$ which implies, as Doob (1934) showed, that $$i_T \to i$$ as $$n \to \infty\ .$$
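As a worked illustration (continuing the hypothetical Hardy–Weinberg trinomial above), the maximum likelihood estimator $$T = (2n_1+n_2)/2n\ ,$$ where $$n_j$$ is the count in cell $$j\ ,$$ satisfies

$Z(\theta) = \frac{2n(T-\theta)}{\theta(1-\theta)} = n\,i(\theta)\,(T-\theta)$

exactly, so that $$n^{-1/2}Z(\theta) = i(\theta)\, n^{1/2}(T-\theta)$$ and the first order efficiency condition holds with $$\alpha = 0$$ and $$\beta = i(\theta)\ .$$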

Rao (1961) defined the second order efficiency as

$E = \min_{\lambda} V_a \left[ Z(\theta) - n^{1/2}\alpha - n\beta(T-\theta) -n\lambda(T-\theta)^2 \right]$

where $$V_a$$ stands for asymptotic variance.
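In the Hardy–Weinberg illustration above, the score is exactly linear in $$(T-\theta)$$ for the maximum likelihood estimator, so the residual in the display vanishes with $$\lambda = 0$$ and $$E = 0\ ;$$ this agrees with Efron's identification, noted below, of $$E$$ with the curvature of the family, which is zero for a one-parameter exponential family.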

The two concepts of Fisher (1925) and of Rao (1961) are similar, but in particular cases $$E$$ and $$E^\prime$$ may not be the same, as pointed out by Efron (1975). However, Fisher (1925) reported $$E$$ as his computation of $$E^\prime\ .$$ In the case of a multinomial distribution with class probabilities $$\pi_1(\theta),\pi_2(\theta),\cdots,\pi_k(\theta)\ ,$$ $$E$$ has the lower bound, obtained by Fisher and Rao,

$\tag{1} \frac{\mu_{02}-2\mu_{21}+\mu_{40}}{i} - i - \frac{\mu_{11}^2 + \mu_{30}^2 - 2\mu_{11}\mu_{30}}{i^2}$

where

$\mu_{rs} = \sum_j \pi_j \left( \frac{\pi^\prime_j}{\pi_j} \right)^r\left( \frac{\pi^{\prime\prime}_j}{\pi_j} \right)^s$

which is attained by the maximum likelihood estimator; here $$\pi^\prime_j$$ and $$\pi^{\prime\prime}_j$$ denote the first and second derivatives of $$\pi_j(\theta)$$ with respect to $$\theta\ .$$ Efron (1975) called the result (1) the Fisher–Rao Theorem. He extended the computations to the exponential family and identified the expression $$E$$ as the curvature of the family of distributions at $$\theta\ .$$ In another paper, Rao (1961) obtained the expansion of the asymptotic variance of a consistent estimator, corrected for bias of $$O(1/n)\ ,$$ up to terms of $$O(1/n^2)$$ as

$\tag{2} \frac{1}{ni} + \frac{\phi}{n^2} + o(1/n^2)$

and showed that the minimum value of $$\phi$$ is

$\tag{3} \frac{E}{i^2} + \frac{\mu_{11}^2}{2i^4}$

which is attained by the maximum likelihood estimator (MLE). Ghosh and Subramanyam (1974) identified the results (2) and (3) as the Rao Theorem. They clarified the computations of $$E^\prime$$ and $$E$$ and extended the results to the exponential family of distributions.
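The bound (1) and the minimum value (3) can be evaluated numerically by coding the displayed formulas directly. In the sketch below, the curved trinomial $$\pi(\theta) = (\theta,\,\theta^2,\,1-\theta-\theta^2)$$ is an illustrative assumption, not a model from the papers cited; substituting an exponential-family model such as the Hardy–Weinberg trinomial gives $$E = 0\ ,$$ in line with Efron's curvature interpretation.

```python
import numpy as np

theta = 0.4  # illustrative parameter value
# Hypothetical curved trinomial (an assumption, not from the article)
p   = np.array([theta, theta**2, 1 - theta - theta**2])  # pi_j(theta)
dp  = np.array([1.0, 2 * theta, -1 - 2 * theta])         # pi_j'(theta)
ddp = np.array([0.0, 2.0, -2.0])                         # pi_j''(theta)

def mu(r, s):
    # mu_rs = sum_j pi_j (pi_j'/pi_j)^r (pi_j''/pi_j)^s, as defined in the text
    return np.sum(p * (dp / p)**r * (ddp / p)**s)

i = mu(2, 0)  # per-observation Fisher information, i = mu_20
# lower bound (1) for E; (mu_11 - mu_30)^2 = mu_11^2 + mu_30^2 - 2 mu_11 mu_30
E = (mu(0, 2) - 2 * mu(2, 1) + mu(4, 0)) / i - i - (mu(1, 1) - mu(3, 0))**2 / i**2
# minimum value (3) of phi in the variance expansion (2), attained by the MLE
phi_min = E / i**2 + mu(1, 1)**2 / (2 * i**4)
print(i, E, phi_min)  # approximately 13.86, 0.91, 0.0092
```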

## History

After Fisher (1922) introduced maximum likelihood as a general method of estimation of unknown parameters, asserting that it provides estimators which are consistent and have the least asymptotic variance, several papers appeared questioning Fisher's claims. Examples were given of other methods of estimation which yield estimators with the same or better properties. This motivated the author to make a deeper investigation of the properties of estimators and of methods of estimation. In a series of papers, Rao (1960, 1961, 1962, 1963) introduced the concepts of Fisher consistency (which places a restriction on the estimating function), first order efficiency, and correction for bias up to $$O(1/n)\ .$$ These concepts bring out maximum likelihood estimates as having better properties than those obtained by other proposed methods.

## Applications

Second order efficiency (SOE) provides an effective measure for choosing an estimator that gives the best possible summary of the data for drawing inference. Berkson (1955) claimed that the minimum logit chi-square estimator performs better than the maximum likelihood (ML) estimator. Ghosh and Subramanyam (1974) showed that the ML estimator corrected for bias has the better performance in terms of SOE.
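As a small numerical experiment in this direction, the sketch below simulates a logistic dose-response model and compares the ML estimator with a Berkson-style minimum logit chi-square estimator (weighted least squares on empirical logits) by mean squared error. The design (doses, group size, the $$1/2$$ adjustment of the counts to keep the logits finite) is an illustrative assumption, and the sketch omits the bias correction of order $$1/n$$ on which the second order comparison of Ghosh and Subramanyam (1974) rests.

```python
import numpy as np

rng = np.random.default_rng(0)
doses = np.array([-2.0, -1.0, 0.0, 1.0, 2.0])  # assumed design points
m = 50                                          # subjects per dose (assumed)
a_true, b_true = 0.0, 1.0                       # logit p = a + b * dose
X = np.column_stack([np.ones_like(doses), doses])

def mle(y, iters=25):
    # Newton-Raphson for the logistic maximum likelihood estimate of (a, b)
    beta = np.zeros(2)
    for _ in range(iters):
        p = 1 / (1 + np.exp(-X @ beta))
        w = m * p * (1 - p)
        beta += np.linalg.solve(X.T @ (w[:, None] * X), X.T @ (y - m * p))
    return beta

def min_logit_chisq(y):
    # Berkson-style estimator: weighted least squares on empirical logits,
    # counts adjusted by 1/2 so the logits stay finite
    phat = (y + 0.5) / (m + 1.0)
    z = np.log(phat / (1 - phat))
    w = m * phat * (1 - phat)
    return np.linalg.solve(X.T @ (w[:, None] * X), X.T @ (w * z))

p_true = 1 / (1 + np.exp(-(a_true + b_true * doses)))
err_ml, err_bk = [], []
for _ in range(2000):
    y = rng.binomial(m, p_true)
    err_ml.append((mle(y) - [a_true, b_true])**2)
    err_bk.append((min_logit_chisq(y) - [a_true, b_true])**2)
print("MSE of MLE:  ", np.mean(err_ml, axis=0))
print("MSE of MLCS: ", np.mean(err_bk, axis=0))
```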