Completeness (statistics)

From Wikipedia, the free encyclopedia

In statistics, completeness is a property of a statistic in relation to a model for a set of observed data. In essence, it ensures that the distributions corresponding to different values of the parameters are distinct.

It is closely related to the idea of identifiability, but in statistical theory it is often found as a condition imposed on a sufficient statistic from which certain optimality results are derived.


Consider a random variable X whose probability distribution belongs to a parametric family of probability distributions Pθ parametrized by θ.

Formally, a statistic s is a measurable function of X; thus, a statistic s is evaluated on a random variable X, taking the value s(X), which is itself a random variable. A given realization of the random variable X(ω) is a data-point (datum), on which the statistic s takes the value s(X(ω)).

The statistic s is said to be complete for the distribution of X if, for every measurable function g,[1]

if Eθ(g(s(X))) = 0 for all θ, then Pθ(g(s(X)) = 0) = 1 for all θ.

The statistic s is said to be boundedly complete for the distribution of X if this implication holds for every measurable function g that is also bounded.

Example 1: Bernoulli model

The Bernoulli model admits a complete statistic.[2] Let X be a random sample of size n such that each Xi has the same Bernoulli distribution with parameter p. Let T be the number of 1s observed in the sample. T is a statistic of X that has a binomial distribution with parameters (n, p). If the parameter space for p is (0,1), then T is a complete statistic. To see this, note that

$$\operatorname{E}_p(g(T)) = \sum_{t=0}^{n} g(t) \binom{n}{t} p^t (1-p)^{n-t} = (1-p)^n \sum_{t=0}^{n} g(t) \binom{n}{t} \left(\frac{p}{1-p}\right)^t.$$

Observe also that neither p nor 1 − p can be 0. Hence E_p(g(T)) = 0 if and only if:

$$\sum_{t=0}^{n} g(t) \binom{n}{t} \left(\frac{p}{1-p}\right)^t = 0.$$

On denoting p/(1 − p) by r, one gets:

$$\sum_{t=0}^{n} g(t) \binom{n}{t} r^t = 0.$$

First, observe that the range of r is the positive reals. Also, the left-hand side is a polynomial in r and can therefore vanish identically on the positive reals only if all its coefficients are 0, that is, g(t) = 0 for all t.
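The polynomial argument can also be read as linear algebra: requiring E(g(T)) = 0 at n + 1 distinct values of p gives n + 1 linear equations in the n + 1 unknowns g(0), ..., g(n). The sketch below checks, in exact arithmetic, that the coefficient matrix has full rank, so only g = 0 solves the system. The helper names `expectation_matrix` and `rank`, the choice n = 3 and the grid of parameter values are illustrative assumptions, not part of the original argument.

```python
from fractions import Fraction
from math import comb


def expectation_matrix(n, ps):
    """Row i holds the Binomial(n, p_i) probabilities P(T = t) for t = 0..n."""
    return [[comb(n, t) * p**t * (1 - p)**(n - t) for t in range(n + 1)]
            for p in ps]


def rank(rows):
    """Rank of a matrix of Fractions, by exact Gaussian elimination."""
    m = [row[:] for row in rows]
    r = 0
    for col in range(len(m[0])):
        # Find a row at or below position r with a non-zero entry in this column.
        pivot = next((i for i in range(r, len(m)) if m[i][col] != 0), None)
        if pivot is None:
            continue
        m[r], m[pivot] = m[pivot], m[r]
        for i in range(r + 1, len(m)):
            factor = m[i][col] / m[r][col]
            m[i] = [a - factor * b for a, b in zip(m[i], m[r])]
        r += 1
        if r == len(m):
            break
    return r


n = 3
# Hypothetical grid of n + 1 distinct parameter values in (0, 1).
ps = [Fraction(k, n + 2) for k in range(1, n + 2)]

# Full rank means E(g(T)) = 0 at these n + 1 points already forces g = 0.
full_rank = rank(expectation_matrix(n, ps)) == n + 1
print(full_rank)
```

Dropping one of the parameter values leaves n equations in n + 1 unknowns, so a non-zero g with zero expectation at every remaining value then exists.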

It is important to notice that the result that all coefficients must be 0 was obtained because of the range of r. Had the parameter space been finite with no more than n elements, the linear equations in g(t) obtained by substituting the corresponding values of r would have solutions other than 0. For example, if n = 1 and the parameter space is {0.5}, then T, a single observation, is not complete. Observe that, with the definition

$$g(t) = 2(t - 0.5),$$

E(g(T)) = 0 although g(t) is not 0 for t = 0 nor for t = 1.
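The counterexample is small enough to verify by direct computation. The sketch below takes g(t) = 2(t − 0.5) as the function in question and evaluates its expectation under the single distribution in the parameter space {0.5}. The variable names are illustrative.

```python
from fractions import Fraction

p = Fraction(1, 2)        # the only parameter value in the space {0.5}
pmf = {0: 1 - p, 1: p}    # distribution of T for a single Bernoulli observation


def g(t):
    """A non-zero function of T whose expectation vanishes at p = 0.5."""
    return 2 * (t - Fraction(1, 2))


expectation = sum(g(t) * prob for t, prob in pmf.items())
print(expectation, g(0), g(1))
```

The expectation is zero while g takes the non-zero values −1 and 1, so the defining implication of completeness fails.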

Example 2: Sum of normals

This example will show that, in a sample X1, X2 of size 2 from a normal distribution with known variance, the statistic X1 + X2 is complete and sufficient. Suppose X1 and X2 are independent, identically distributed random variables, normally distributed with expectation θ and variance 1. The sum

$$s((X_1, X_2)) = X_1 + X_2$$

is a complete statistic for θ.

To show this, it is sufficient to demonstrate that there is no non-zero function g such that the expectation of

$$g(s(X_1, X_2)) = g(X_1 + X_2)$$

remains zero regardless of the value of θ.

That fact may be seen as follows. The probability distribution of X1 + X2 is normal with expectation 2θ and variance 2. Its probability density function in x is therefore proportional to

$$\exp\left(-\frac{(x - 2\theta)^2}{4}\right).$$

The expectation of g above would therefore be a constant times

$$\int_{-\infty}^{\infty} g(x) \exp\left(-\frac{(x - 2\theta)^2}{4}\right) dx.$$

A bit of algebra reduces this to

$$k(\theta) \int_{-\infty}^{\infty} h(x)\, e^{x\theta}\, dx,$$

where k(θ) is nowhere zero and

$$h(x) = g(x)\, e^{-x^2/4}.$$

As a function of θ, this is a two-sided Laplace transform of h(x), and it cannot be identically zero unless h(x) is zero almost everywhere.[3] The exponential is not zero, so this can only happen if g(x) is zero almost everywhere.
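The "bit of algebra" amounts to expanding the square in the exponent. A sketch of that step, starting from the density kernel exp(−(x − 2θ)²/4) of a normal distribution with expectation 2θ and variance 2:

```latex
\begin{aligned}
-\frac{(x-2\theta)^2}{4} &= -\frac{x^2}{4} + x\theta - \theta^2, \\
\int_{-\infty}^{\infty} g(x)\, e^{-(x-2\theta)^2/4}\, dx
  &= e^{-\theta^2} \int_{-\infty}^{\infty} g(x)\, e^{-x^2/4}\, e^{x\theta}\, dx
   = k(\theta) \int_{-\infty}^{\infty} h(x)\, e^{x\theta}\, dx .
\end{aligned}
```

Under this reading, k(θ) = exp(−θ²), which is indeed nowhere zero, and h(x) = g(x) exp(−x²/4).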

Relation to sufficient statistics

For some parametric families, a complete sufficient statistic does not exist (for example, see Galili and Meilijson 2016[4]). Also, a minimal sufficient statistic need not exist. (A case in which there is no minimal sufficient statistic was shown by Bahadur in 1957.[citation needed]) Under mild conditions, a minimal sufficient statistic does always exist. In particular, these conditions always hold if the random variables (associated with Pθ) are all discrete or are all continuous.[citation needed]

Importance of completeness

The notion of completeness has many applications in statistics, particularly in the following theorems of mathematical statistics.

Lehmann–Scheffé theorem

Completeness occurs in the Lehmann–Scheffé theorem,[5] which states that if a statistic is unbiased, complete and sufficient for some parameter θ, then it is the best mean-unbiased estimator for θ. In other words, this statistic has a smaller expected loss for any convex loss function; in many practical applications with the squared loss function, it has a smaller mean squared error than any other estimator with the same expected value.
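As an illustration in the Bernoulli model of Example 1, T/n is an unbiased function of the complete sufficient statistic T, so the theorem identifies it as the best unbiased estimator of p. The sketch below compares it with the cruder unbiased estimator X1 by enumerating every possible sample exactly. The helper `exact_moments`, the sample size and the value of p are illustrative assumptions.

```python
from fractions import Fraction
from itertools import product


def exact_moments(estimator, n, p):
    """Exact mean and variance of estimator(sample) over all 2**n Bernoulli samples."""
    mean = Fraction(0)
    second_moment = Fraction(0)
    for sample in product((0, 1), repeat=n):
        ones = sum(sample)
        prob = p**ones * (1 - p)**(n - ones)
        value = Fraction(estimator(sample))
        mean += prob * value
        second_moment += prob * value**2
    return mean, second_moment - mean**2


n, p = 4, Fraction(3, 10)    # hypothetical sample size and parameter value


def first_observation(sample):
    """Unbiased for p, but ignores all observations except the first."""
    return sample[0]


def sample_mean(sample):
    """T/n, a function of the complete sufficient statistic T."""
    return Fraction(sum(sample), len(sample))


mean_first, var_first = exact_moments(first_observation, n, p)
mean_avg, var_avg = exact_moments(sample_mean, n, p)
print(mean_first, var_first)
print(mean_avg, var_avg)
```

Both estimators have expectation p, but the variance of T/n is p(1 − p)/n against p(1 − p) for X1, which is consistent with the theorem for squared loss.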

Examples exist in which the minimal sufficient statistic is not complete, so that several alternative statistics are available for unbiased estimation of θ, some of them with lower variance than others.[6]

See also minimum-variance unbiased estimator.

Basu's theorem

Bounded completeness occurs in Basu's theorem,[7] which states that a statistic that is both boundedly complete and sufficient is independent of any ancillary statistic.
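A simulation can serve as a sanity check in the setting of Example 2. There X1 + X2 is complete and sufficient, while |X1 − X2| is ancillary because X1 − X2 is normal with expectation 0 and variance 2 whatever the value of θ. The sketch below estimates the correlation between the two from seeded draws. The seed, the value of θ and the sample count are arbitrary assumptions, and a correlation near zero is only consistent with independence, not a proof of it.

```python
import random
from statistics import mean


def correlation(xs, ys):
    """Sample Pearson correlation of two equally long sequences."""
    mx, my = mean(xs), mean(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    var_x = sum((x - mx) ** 2 for x in xs)
    var_y = sum((y - my) ** 2 for y in ys)
    return cov / (var_x * var_y) ** 0.5


rng = random.Random(0)    # fixed seed so the run is reproducible
theta = 1.5               # hypothetical true mean; the variance is known to be 1
pairs = [(rng.gauss(theta, 1), rng.gauss(theta, 1)) for _ in range(20000)]

sums = [x1 + x2 for x1, x2 in pairs]           # complete sufficient statistic
spreads = [abs(x1 - x2) for x1, x2 in pairs]   # ancillary statistic
corr = correlation(sums, spreads)
print(round(corr, 3))
```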

Bahadur's theorem

Bounded completeness also occurs in Bahadur's theorem. In the case where there exists at least one minimal sufficient statistic, a statistic that is sufficient and boundedly complete is necessarily minimal sufficient.


  1. Young, G. A. and Smith, R. L. (2005). Essentials of Statistical Inference (p. 94). Cambridge University Press.
  2. Casella, G. and Berger, R. L. (2001). Statistical Inference (pp. 285–286). Duxbury Press.
  3. Orloff, Jeremy. "Uniqueness of Laplace Transform" (PDF).
  4. Galili, Tal and Meilijson, Isaac (31 Mar 2016). "An Example of an Improvable Rao–Blackwell Improvement, Inefficient Maximum Likelihood Estimator, and Unbiased Generalized Bayes Estimator". The American Statistician. 70 (1): 108–113. doi:10.1080/00031305.2015.1100683. PMC 4960505.
  5. Casella, G. and Berger, R. L. (2001). Statistical Inference (2nd ed.). Duxbury Press. ISBN 978-0534243128.
  6. Galili, Tal and Meilijson, Isaac (31 Mar 2016). "An Example of an Improvable Rao–Blackwell Improvement, Inefficient Maximum Likelihood Estimator, and Unbiased Generalized Bayes Estimator". The American Statistician. 70 (1): 108–113. doi:10.1080/00031305.2015.1100683. PMC 4960505.
  7. Casella, G. and Berger, R. L. (2001). Statistical Inference (p. 287). Duxbury Press.