Abstract
Blackwell’s 1947 paper “Conditional Expectation and Unbiased Sequential Estimation” (Annals of Mathematical Statistics, 18(1), 105-110) gives a sharp result about how to improve an estimator: given any unbiased estimator T of a parameter θ, and a sufficient statistic S for θ, the conditional expectation E[T | S] is also unbiased for θ and has variance no greater than that of T. Together with C. R. Rao’s independent 1945 result, this is now universally known as the Rao-Blackwell theorem. The improvement is strict whenever T is not already a function of S. This note states the theorem, sketches the proof, walks through one concrete example, and points to its role in modern statistics.
Setting and notation
Let X₁, …, Xₙ be a sample drawn from a distribution indexed by an unknown parameter θ ∈ Θ. Write P_θ for the probability measure on the sample space under parameter θ, and E_θ for the expectation under P_θ.
An estimator T(X₁, …, Xₙ) is a measurable function of the sample. It is an unbiased estimator of θ when E_θ[T] = θ for every θ. We compare two unbiased estimators by their variance: the smaller the better.
A statistic S(X₁, …, Xₙ) is a sufficient statistic for θ when the conditional distribution of the sample given S does not depend on θ. Informally: once we know S, the original data carry no further information about θ.
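As a small illustration of this definition, the sketch below enumerates a Bernoulli sample and computes the exact conditional law of the sample given its sum for two different values of the success probability p. The Bernoulli model is an assumed example that is not used elsewhere in this note, and the function name conditional_law_given_sum and the values of n, p and s are illustrative choices.

```python
from fractions import Fraction
from itertools import product


def conditional_law_given_sum(n, p, s):
    """Exact law of an i.i.d. Bernoulli(p) sample of size n, given that its sum equals s."""
    weights = {}
    for x in product((0, 1), repeat=n):
        if sum(x) == s:
            # Joint probability of this particular 0/1 sequence under parameter p.
            weights[x] = p ** s * (1 - p) ** (n - s)
    total = sum(weights.values())  # this is P(S = s)
    return {x: w / total for x, w in weights.items()}


# Two quite different parameter values, same conditioning event S = 2.
law_a = conditional_law_given_sum(3, Fraction(1, 5), 2)
law_b = conditional_law_given_sum(3, Fraction(7, 10), 2)

# If the sum is sufficient, the two conditional laws must coincide.
print(law_a == law_b)
print(law_a)
```

The factor p^s (1 - p)^(n - s) is common to every sequence with sum s, so it cancels in the normalisation and the conditional law is uniform over those sequences whatever p is.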
Statement of the theorem
Rao-Blackwell. Let T be an unbiased estimator of θ with finite variance, and let S be a sufficient statistic for θ. Define T* := E[T | S]. Then T* is itself an unbiased estimator of θ, and
Var_θ(T*) ≤ Var_θ(T) for every θ,
with equality if and only if T is already a function of S (up to a P_θ-null set).
The conditional expectation T* does not depend on θ because S is sufficient: the distribution of T given S, and hence its mean, is the same under every P_θ. So T* is a legitimate estimator: its evaluation requires only the data, not the unknown θ.
Proof sketch
Two ingredients. First, the tower property of conditional expectation:
E_θ[T*] = E_θ[E[T | S]] = E_θ[T] = θ,
which gives unbiasedness.
Second, the variance decomposition: for any square-integrable random variable T and any σ-algebra G,
Var(T) = E[Var(T | G)] + Var(E[T | G]).
Applied with G = σ(S) this gives
Var_θ(T) = E_θ[Var(T | S)] + Var_θ(T*).
Both terms on the right are non-negative, so Var_θ(T*) ≤ Var_θ(T). Equality holds iff E_θ[Var(T | S)] = 0, which forces Var(T | S) = 0 almost surely — i.e. T is determined by S.
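The variance decomposition can be checked exactly on a small discrete example. The sketch below is only an illustration under assumed inputs: it takes two i.i.d. Bernoulli(1/3) observations, sets T = X₁ and S = X₁ + X₂, and computes the three terms with exact rational arithmetic. The helper name variance_decomposition and the joint table are hypothetical, chosen for this example.

```python
from collections import defaultdict
from fractions import Fraction


def variance_decomposition(joint):
    """joint maps (t, s) -> probability.

    Returns (Var(T), E[Var(T | S)], Var(E[T | S])).
    """
    mean_t = sum(p * t for (t, _), p in joint.items())
    var_t = sum(p * (t - mean_t) ** 2 for (t, _), p in joint.items())

    p_s = defaultdict(Fraction)  # marginal probability of each value of S
    m1 = defaultdict(Fraction)   # sum of p * t within each value of S
    m2 = defaultdict(Fraction)   # sum of p * t^2 within each value of S
    for (t, s), p in joint.items():
        p_s[s] += p
        m1[s] += p * t
        m2[s] += p * t * t

    within = Fraction(0)   # E[Var(T | S)]
    between = Fraction(0)  # Var(E[T | S])
    for s, ps in p_s.items():
        cond_mean = m1[s] / ps
        cond_var = m2[s] / ps - cond_mean ** 2
        within += ps * cond_var
        between += ps * (cond_mean - mean_t) ** 2
    return var_t, within, between


# Joint law of (T, S) = (X1, X1 + X2) for X1, X2 i.i.d. Bernoulli(1/3).
q = Fraction(1, 3)
joint = {
    (0, 0): (1 - q) * (1 - q),
    (0, 1): (1 - q) * q,
    (1, 1): q * (1 - q),
    (1, 2): q * q,
}

var_t, within, between = variance_decomposition(joint)
print(var_t, within, between)
print(var_t == within + between)
```

Here the second returned term is the part of the variance that conditioning removes, and the third is the variance of the Rao-Blackwellised estimator.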
The same argument extends to any loss function L(t, θ) that is convex in its first argument: by the conditional form of Jensen’s inequality, E_θ[L(T*, θ)] ≤ E_θ[L(T, θ)]. Squared-error loss, which gives the variance statement above, is the special case usually stated.
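Written out, the convex-loss extension is a three-line chain. The block below is a sketch in LaTeX notation, assuming that L(·, θ) is convex in its first argument and that L(T, θ) is integrable.

```latex
\begin{align*}
E_\theta\bigl[L(T^*, \theta)\bigr]
  &= E_\theta\bigl[L\bigl(E[T \mid S], \theta\bigr)\bigr] \\
  &\le E_\theta\bigl[E[L(T, \theta) \mid S]\bigr]
     && \text{(conditional Jensen, $L(\cdot, \theta)$ convex)} \\
  &= E_\theta\bigl[L(T, \theta)\bigr]
     && \text{(tower property)}.
\end{align*}
```

Taking L(t, θ) = (t - θ)² recovers the variance inequality, since both T and T* are unbiased.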
A concrete example: estimating exp(-λ) from a Poisson sample
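Let X₁, …, Xₙ be i.i.d. Poisson with mean λ. Suppose we want to estimate τ(λ) = exp(-λ), the probability that a single Poisson observation is zero. The theorem was stated for estimating θ itself, but the statement and the proof go through unchanged for an unbiased estimator of any function τ(λ) of the parameter.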
A simple unbiased estimator: let T := 1 if X₁ = 0, else 0. Then E_λ[T] = P_λ(X₁ = 0) = exp(-λ), so T is unbiased. But T uses only the first observation and ignores the rest of the sample; its variance is exp(-λ) (1 - exp(-λ)), which is as large as 1/4 (at λ = ln 2) and does not decrease as n grows.
The sample sum S = X₁ + … + Xₙ is sufficient for λ (this is a standard exponential-family fact). The Rao-Blackwellised estimator is
T* = E[1_{X₁ = 0} | S = s] = P(X₁ = 0 | X₁ + … + Xₙ = s).
Given S = s, the joint distribution of (X₁, …, Xₙ) is multinomial(s; 1/n, …, 1/n) — every Poisson sample of size n with fixed sum has this conditional law, regardless of λ. So
P(X₁ = 0 | S = s) = (1 - 1/n)^s.
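The formula can be checked directly from Poisson probabilities, without invoking the multinomial law. The sketch below is a minimal check under assumed values of n, λ and s: it computes P(X₁ = 0 | S = s) as a ratio of Poisson pmfs, using the facts that X₂ + … + Xₙ is Poisson with mean (n-1)λ and independent of X₁, and that S is Poisson with mean nλ. The function names are illustrative.

```python
import math


def poisson_pmf(k, mu):
    """P(N = k) for N ~ Poisson(mu)."""
    return math.exp(-mu) * mu ** k / math.factorial(k)


def prob_first_is_zero_given_sum(n, lam, s):
    """P(X1 = 0 | X1 + ... + Xn = s) for an i.i.d. Poisson(lam) sample, n >= 2."""
    # Numerator: P(X1 = 0) * P(X2 + ... + Xn = s), by independence.
    joint = poisson_pmf(0, lam) * poisson_pmf(s, (n - 1) * lam)
    # Denominator: P(S = s), with S ~ Poisson(n * lam).
    return joint / poisson_pmf(s, n * lam)


n = 5
for lam in (0.3, 2.0, 5.0):
    for s in (0, 1, 4, 7):
        direct = prob_first_is_zero_given_sum(n, lam, s)
        formula = (1 - 1 / n) ** s
        # The two columns should agree for every lam, up to rounding.
        print(lam, s, direct, formula)
```

The exponential factors and the s! cancel in the ratio, which is why λ drops out.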
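The improved estimator is therefore T* = ((n-1)/n)^S, which uses all of the data. Using the Poisson generating function E[a^S] = exp(nλ(a - 1)) for S ~ Poisson(nλ), its variance is exp(-2λ) (exp(λ/n) - 1), roughly λ exp(-2λ)/n for large n, whereas Var_λ(T) = exp(-λ) (1 - exp(-λ)) does not shrink with n. The theorem guarantees Var_λ(T*) ≤ Var_λ(T), and the inequality is strict for n ≥ 2, since T is then not a function of S.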
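To put numbers on the comparison, the sketch below computes the mean and variance of T* = ((n-1)/n)^S by summing against the Poisson(nλ) pmf of S, and compares them with the closed forms exp(-λ) and exp(-2λ) (exp(λ/n) - 1). The closed form for the variance follows from the Poisson generating function E[a^S] = exp(nλ(a - 1)); the truncation point s_max and the values n = 10, λ = 1 are assumptions made for this illustration.

```python
import math


def poisson_pmf(k, mu):
    """P(N = k) for N ~ Poisson(mu), mu > 0, computed in log space to avoid overflow."""
    return math.exp(-mu + k * math.log(mu) - math.lgamma(k + 1))


def moments_of_rb_estimator(n, lam, s_max=400):
    """Mean and variance of T* = ((n-1)/n)**S with S ~ Poisson(n * lam).

    The infinite sums are truncated at s_max, which is ample when n * lam is small
    compared with s_max.
    """
    a = (n - 1) / n
    mu = n * lam
    mean = sum(a ** s * poisson_pmf(s, mu) for s in range(s_max + 1))
    second = sum(a ** (2 * s) * poisson_pmf(s, mu) for s in range(s_max + 1))
    return mean, second - mean ** 2


def var_crude(lam):
    """Variance of the indicator T = 1{X1 = 0}."""
    return math.exp(-lam) * (1 - math.exp(-lam))


n, lam = 10, 1.0
mean_rb, var_rb = moments_of_rb_estimator(n, lam)
var_rb_closed = math.exp(-2 * lam) * (math.exp(lam / n) - 1)

print("E[T*]   :", mean_rb, "target:", math.exp(-lam))
print("Var(T*) :", var_rb, "closed form:", var_rb_closed)
print("Var(T)  :", var_crude(lam))
```

Setting n = 1 in the closed form gives exp(-2λ) (exp(λ) - 1) = exp(-λ) (1 - exp(-λ)), the variance of T, as it should.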
Why this matters
The theorem gives a recipe: take any unbiased estimator and condition it on a sufficient statistic to get one that is at least as good. Combined with completeness of the sufficient statistic (Lehmann-Scheffé, 1950), it yields the minimum-variance unbiased estimator (MVUE): when S is complete, every unbiased estimator Rao-Blackwellises to the same function of S, up to almost-sure equality, and that function has variance no larger than that of any competing unbiased estimator, for every θ simultaneously.
Three consequences are worth pulling out.
A uniqueness criterion. Two unbiased estimators that are both functions of a complete sufficient statistic S must be equal almost surely, because their difference is a function of S with expectation zero for every θ. This is how textbooks prove uniqueness of an MVUE once one is constructed.
A construction recipe. To find an MVUE for τ(θ), find any unbiased estimator and take its conditional expectation given a complete sufficient statistic, that is, project it onto the σ-algebra that statistic generates. The Poisson exp(-λ) example above is the classroom illustration.
A decision-theoretic upgrade. Since the result holds for any loss that is convex in the estimate, the same idea works in statistical decision theory more broadly: the risk of any estimator is at least the risk of its conditional expectation given a sufficient statistic, so we can restrict attention to rules that are functions of S without increasing risk.
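The construction recipe can be sketched in code for the Poisson model of the example. The block below is an illustration, not a general-purpose routine: it relies on the fact stated earlier that, given S = s, an i.i.d. Poisson sample is multinomial(s; 1/n, …, 1/n), and it computes E[T | S = s] exactly by enumerating every sample with that sum. The names compositions and rao_blackwellize, and the two crude estimators, are hypothetical choices for this sketch.

```python
from fractions import Fraction
from math import factorial, prod


def compositions(total, parts):
    """Yield every tuple of `parts` non-negative integers that sums to `total`."""
    if parts == 1:
        yield (total,)
        return
    for first in range(total + 1):
        for rest in compositions(total - first, parts - 1):
            yield (first,) + rest


def rao_blackwellize(crude, n, s):
    """Exact E[crude(X) | S = s] for an i.i.d. Poisson sample X of size n.

    Uses the multinomial(s; 1/n, ..., 1/n) conditional law, so no value of
    lambda is needed.
    """
    value = Fraction(0)
    for x in compositions(s, n):
        # Multinomial coefficient s! / (x1! * ... * xn!).
        coef = factorial(s) // prod(factorial(xi) for xi in x)
        value += Fraction(coef, n ** s) * crude(x)
    return value


def indicator_first_zero(x):
    """Crude unbiased estimator of exp(-lambda): 1 if the first observation is 0."""
    return 1 if x[0] == 0 else 0


def first_observation(x):
    """Crude unbiased estimator of lambda: the first observation alone."""
    return x[0]


n, s = 4, 3
print(rao_blackwellize(indicator_first_zero, n, s), Fraction(n - 1, n) ** s)
print(rao_blackwellize(first_observation, n, s), Fraction(s, n))
```

The first line reproduces ((n-1)/n)^s from the example; the second shows that Rao-Blackwellising the single observation X₁ gives the sample mean S/n. Enumeration grows quickly with n and s, so this is suited only to small cases.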
Relation to Rao’s earlier work
C. R. Rao’s 1945 “Information and accuracy attainable in the estimation of statistical parameters” (Bulletin of the Calcutta Mathematical Society, 37, 81-91) had proved a closely related lower bound — the Cramér-Rao inequality — and noted that conditioning on a sufficient statistic could not increase variance. Blackwell’s 1947 paper gives the cleanest modern statement and the variance-decomposition proof. The combined name first appears in Lehmann’s textbook of 1959.
References
C. R. Rao. Information and accuracy attainable in the estimation of statistical parameters. Bulletin of the Calcutta Mathematical Society, 37:81-91, 1945.
D. Blackwell. Conditional expectation and unbiased sequential estimation. Annals of Mathematical Statistics, 18(1):105-110, 1947.
E. L. Lehmann and H. Scheffé. Completeness, similar regions, and unbiased estimation — Part I. Sankhyā, 10:305-340, 1950.
E. L. Lehmann. Testing Statistical Hypotheses. Wiley, 1959. (Section 1.6 gives the modern textbook treatment.)