Sufficient Statistics and Likelihood Functions

Introduction

When a statistic $T(\mathbf{Y})$ is sufficient for a parameter $\theta$, we can use its probability density function (pdf) or probability mass function (pmf) as the likelihood function for $\theta$, up to a constant of proportionality. This document summarizes the key theorems and concepts.

The Core Result

If $T$ is sufficient for $\theta$, then:

\[L(\theta \mid \mathbf{y}) \propto f_T(t \mid \theta)\]

where:
- $L(\theta \mid \mathbf{y})$ is the likelihood based on the full data
- $f_T(t \mid \theta)$ is the pdf/pmf of $T$ evaluated at the observed value $t = T(\mathbf{y})$
- $\propto$ means "proportional to, as a function of $\theta$"

Key Theorems

1. Factorization Theorem (Fisher-Neyman)

Statement: $T(\mathbf{Y})$ is sufficient for $\theta$ if and only if

\[f(\mathbf{y} \mid \theta) = h(\mathbf{y}) \times g(T(\mathbf{y}), \theta)\]

where:
- $h(\mathbf{y})$ does not depend on $\theta$
- $g(T(\mathbf{y}), \theta)$ depends on $\mathbf{y}$ only through $T(\mathbf{y})$

Implication for likelihood:

\[L(\theta \mid \mathbf{y}) = h(\mathbf{y}) \times g(t, \theta) \propto g(t, \theta) \propto f_T(t \mid \theta)\]

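The last step holds because $f_T(t \mid \theta)$ is obtained by summing (or integrating) $f(\mathbf{y} \mid \theta)$ over all $\mathbf{y}$ with $T(\mathbf{y}) = t$, which gives $g(t, \theta)$ times a factor that does not depend on $\theta$. In practice this is the theorem most often used to identify a sufficient statistic, because it only requires inspecting the form of the joint density.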
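
As a worked illustration (a standard textbook case chosen here for concreteness, not one taken from the source), assume $Y_1, \ldots, Y_n$ are iid Bernoulli$(\theta)$ and let $T = \sum Y_i$ with observed value $t$. The joint pmf factors as

\[f(\mathbf{y} \mid \theta) = \prod_{i=1}^{n} \theta^{y_i}(1-\theta)^{1-y_i} = \underbrace{1}_{h(\mathbf{y})} \times \underbrace{\theta^{t}(1-\theta)^{n-t}}_{g(t, \theta)}\]

Since $T \sim \text{Binomial}(n, \theta)$, its pmf is

\[f_T(t \mid \theta) = \binom{n}{t}\theta^{t}(1-\theta)^{n-t}\]

which differs from $g(t, \theta)$ only by the factor $\binom{n}{t}$, free of $\theta$. Both therefore give the same likelihood for $\theta$ up to proportionality.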

2. Conditional Distribution Definition (Original Fisher Definition)

Statement: $T$ is sufficient for $\theta$ if the conditional distribution of $\mathbf{Y}$ given $T = t$ does not depend on $\theta$.

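Derivation: because $t = T(\mathbf{y})$ is determined by $\mathbf{y}$, the density of $\mathbf{Y}$ can be written as the conditional density given $T$ times the density of $T$: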

\[f_{\mathbf{Y}}(\mathbf{y} \mid \theta) = f_{\mathbf{Y}|T}(\mathbf{y} \mid t) \times f_T(t \mid \theta)\]

Since $f_{\mathbf{Y}|T}(\mathbf{y} \mid t)$ is free of $\theta$ (by sufficiency):

\[L(\theta \mid \mathbf{y}) \propto f_T(t \mid \theta)\]
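
The definition can be checked by brute force in a small discrete case. The sketch below assumes iid Bernoulli$(\theta)$ data with $T = \sum Y_i$ (an illustrative choice, not from the source; the function names are hypothetical) and computes the conditional probability of one sample given its sum for several values of $\theta$.

```python
from itertools import product
from math import comb, isclose

def joint_pmf(y, theta):
    """P(Y = y | theta) for iid Bernoulli(theta) observations."""
    t = sum(y)
    return theta ** t * (1 - theta) ** (len(y) - t)

def conditional_given_sum(y, theta):
    """P(Y = y | T = sum(y), theta), computed directly from the definition."""
    n, t = len(y), sum(y)
    # Marginal P(T = t): add up the joint pmf over every sample with the same sum.
    marginal = sum(
        joint_pmf(z, theta) for z in product((0, 1), repeat=n) if sum(z) == t
    )
    return joint_pmf(y, theta) / marginal

y = (1, 0, 1, 1, 0)
cond_probs = [conditional_given_sum(y, theta) for theta in (0.2, 0.5, 0.9)]
print(cond_probs)  # the same value for every theta, up to floating-point rounding
```

Under these assumptions every entry equals $1/\binom{5}{3}$: given the sum, all arrangements of the ones are equally likely regardless of $\theta$.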

3. Likelihood Ratio Criterion for Sufficiency

Statement: $T$ is sufficient for $\theta$ if and only if for every pair of sample points $\mathbf{y}_1$ and $\mathbf{y}_2$ with $T(\mathbf{y}_1) = T(\mathbf{y}_2)$:

\[L(\theta \mid \mathbf{y}_1) \propto L(\theta \mid \mathbf{y}_2) \quad \text{(as functions of $\theta$)}\]

Implication: If $T$ is sufficient, all sample points yielding the same $T$ give proportional likelihoods, so we can replace $\mathbf{y}$ with $t$.
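
A small numerical check of this criterion, assuming iid Poisson$(\theta)$ data with $T = \sum Y_i$ (an illustrative model, not from the source; the sample values are made up): two samples with the same sum should have a likelihood ratio that is constant in $\theta$, while samples with different sums should not.

```python
from math import exp, factorial, isclose, prod

def poisson_likelihood(theta, y):
    """Full-data likelihood for iid Poisson(theta) observations."""
    return prod(exp(-theta) * theta ** yi / factorial(yi) for yi in y)

y1 = (2, 0, 4)   # sum = 6
y2 = (1, 3, 2)   # sum = 6, same value of T as y1
y3 = (1, 1, 1)   # sum = 3, different value of T

thetas = (0.5, 1.0, 2.0, 3.5)
ratios_same_t = [poisson_likelihood(th, y1) / poisson_likelihood(th, y2) for th in thetas]
ratios_diff_t = [poisson_likelihood(th, y1) / poisson_likelihood(th, y3) for th in thetas]

print(ratios_same_t)  # constant across theta
print(ratios_diff_t)  # changes with theta
```

The constant in the first list is the ratio of the $\prod 1/y_i!$ terms, which plays the role of $h(\mathbf{y})$ in the factorization.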

4. Neyman Sufficiency Principle

Statement: If $T$ is sufficient for $\theta$, then there exists a function $c(\mathbf{y}, t)$ such that for all $\theta$:

\[L(\theta \mid \mathbf{y}) = c(\mathbf{y}, t) \times L^*(\theta \mid t)\]

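where $c(\mathbf{y}, t)$ does not depend on $\theta$, and $L^*(\theta \mid t) = f_T(t \mid \theta)$ is the likelihood based on $T$ alone. By the decomposition in result 2, one valid choice is $c(\mathbf{y}, t) = f_{\mathbf{Y}|T}(\mathbf{y} \mid t)$.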

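5. Rao-Blackwell Theorem (Indirect)

Statement: If $T$ is sufficient for $\theta$ and $\hat{\theta}$ is an estimator of $\theta$, then $\mathbb{E}[\hat{\theta} \mid T]$ has mean squared error less than or equal to that of $\hat{\theta}$. Sufficiency is what makes $\mathbb{E}[\hat{\theta} \mid T]$ a usable estimator: the conditional expectation does not depend on the unknown $\theta$.

Implication: Any estimator can be matched or improved by one that depends on the data only through $T$, which supports basing likelihood-based inference on $T$ alone.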
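
The conditioning step in the theorem above can be illustrated exactly in a small case. This sketch assumes iid Bernoulli$(\theta)$ data, the crude estimator $\hat{\theta} = Y_1$, and $T = \sum Y_i$ (all illustrative choices, not from the source; function names are hypothetical). It enumerates every outcome to compute $\mathbb{E}[Y_1 \mid T]$ and both mean squared errors.

```python
from itertools import product
from math import isclose

def pmf(y, theta):
    """P(Y = y | theta) for iid Bernoulli(theta) observations."""
    t = sum(y)
    return theta ** t * (1 - theta) ** (len(y) - t)

def conditional_mean_of_first(n, theta):
    """Return {t: E[Y_1 | T = t]} by enumerating all 2^n outcomes."""
    num = {t: 0.0 for t in range(n + 1)}
    den = {t: 0.0 for t in range(n + 1)}
    for y in product((0, 1), repeat=n):
        p = pmf(y, theta)
        num[sum(y)] += y[0] * p
        den[sum(y)] += p
    return {t: num[t] / den[t] for t in num}

def mse(estimator, n, theta):
    """Exact mean squared error of an estimator, by enumeration."""
    return sum(
        pmf(y, theta) * (estimator(y) - theta) ** 2
        for y in product((0, 1), repeat=n)
    )

n, theta = 4, 0.3
cond = conditional_mean_of_first(n, theta)       # equals t / n for each t
mse_raw = mse(lambda y: y[0], n, theta)          # estimator Y_1
mse_rb = mse(lambda y: cond[sum(y)], n, theta)   # estimator E[Y_1 | T]
print(cond, mse_raw, mse_rb)
```

In this setting the conditioned estimator is the sample mean $T/n$, and its mean squared error is smaller than that of $Y_1$ by a factor of $n$.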

6. Likelihood Principle (Foundational)

Statement: All evidence from data about $\theta$ is contained in the likelihood function.

Combined with sufficiency: If $T$ is sufficient, the likelihood based on $\mathbf{y}$ and the likelihood based on $T$ are proportional, so inference should be identical.

Example: Normal Distribution with Known Variance

Let $Y_i \stackrel{\text{iid}}{\sim} N(\mu, \sigma^2)$ with $\sigma^2$ known.

Full likelihood: \[L(\mu \mid y_1, \ldots, y_n) \propto \exp\left[-\frac{\sum (y_i - \mu)^2}{2\sigma^2}\right]\]

Sufficient statistic: $\bar{Y} = \frac{1}{n}\sum Y_i$

Distribution of $\bar{Y}$: $\bar{Y} \mid \mu \sim N(\mu, \sigma^2/n)$

Likelihood based on $\bar{Y}$: \[L(\mu \mid \bar{y}) \propto \exp\left[-\frac{n(\bar{y} - \mu)^2}{2\sigma^2}\right]\]

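These are proportional as functions of $\mu$: since $\sum (y_i - \mu)^2 = \sum (y_i - \bar{y})^2 + n(\bar{y} - \mu)^2$, the full likelihood equals the likelihood based on $\bar{Y}$ times $\exp\left[-\sum (y_i - \bar{y})^2 / (2\sigma^2)\right]$, which is free of $\mu$. This confirms the core result.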
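
The proportionality in this example can be verified numerically: if the two likelihoods are proportional, their log-likelihoods differ by the same constant at every $\mu$. The sketch below uses made-up data and an assumed known variance (both illustrative, not from the source).

```python
from math import isclose

def full_loglik(mu, y, sigma2):
    """Log of the full-data likelihood kernel for N(mu, sigma2), sigma2 known."""
    return -sum((yi - mu) ** 2 for yi in y) / (2 * sigma2)

def mean_loglik(mu, ybar, n, sigma2):
    """Log of the likelihood kernel based on the sample mean alone."""
    return -n * (ybar - mu) ** 2 / (2 * sigma2)

y = [4.1, 5.3, 3.8, 6.0, 4.9]   # hypothetical observations
sigma2 = 2.0                     # assumed known variance
n = len(y)
ybar = sum(y) / n

# Gap between the two log-likelihoods at several values of mu.
gaps = [
    full_loglik(mu, y, sigma2) - mean_loglik(mu, ybar, n, sigma2)
    for mu in (-1.0, 0.0, 4.8, 10.0)
]
# The gap should equal -sum((y_i - ybar)^2) / (2 * sigma2), whatever mu is.
expected_gap = -sum((yi - ybar) ** 2 for yi in y) / (2 * sigma2)
print(gaps, expected_gap)
```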

Important Caveat

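The pdf of a statistic reproduces the full-data likelihood (up to a factor free of $\theta$) if and only if that statistic is sufficient. The pdf of a non-sufficient statistic is not proportional to the full-data likelihood, so it cannot stand in for it without losing information about $\theta$.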

Statistic	Sufficient?	Can use its pdf as likelihood?
$\bar{Y}$ for $N(\mu, \sigma^2)$ with $\sigma^2$ known	✅ Yes	✅ Yes
$\max Y_i$ for Uniform$(0, \theta)$	✅ Yes	✅ Yes
Sample median for $N(\mu, \sigma^2)$	❌ No (not sufficient)	❌ No
$\bar{Y}$ for Uniform$(0, \theta)$	❌ No	❌ No
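
The two Uniform$(0, \theta)$ rows can be checked with a short sketch. It assumes positive observations and uses made-up samples (illustrative only; the function names are hypothetical). The density of the maximum is proportional to the full likelihood, whereas two samples sharing the same mean can have non-proportional likelihoods, so the mean cannot be sufficient.

```python
from math import isclose

def uniform_likelihood(theta, y):
    """Full-data likelihood for iid Uniform(0, theta), assuming all y_i > 0."""
    return theta ** (-len(y)) if theta >= max(y) else 0.0

def max_pdf(theta, m, n):
    """Density of M = max(Y_i) at m: n * m^(n-1) / theta^n for 0 < m <= theta."""
    return n * m ** (n - 1) / theta ** n if 0 < m <= theta else 0.0

# Sufficient case: pdf of the maximum vs. full likelihood, for theta >= max(y).
y = (1.0, 1.0, 4.0)
n, m = len(y), max(y)
ratios = [max_pdf(th, m, n) / uniform_likelihood(th, y) for th in (4.0, 5.0, 10.0)]
print(ratios)  # constant in theta: n * m^(n-1)

# Non-sufficient case: same sample mean (2.0), different maxima.
y_a, y_b = (1.0, 1.0, 4.0), (2.0, 2.0, 2.0)
like_a = [uniform_likelihood(th, y_a) for th in (3.0, 5.0)]
like_b = [uniform_likelihood(th, y_b) for th in (3.0, 5.0)]
print(like_a, like_b)  # y_a rules out theta = 3, y_b does not
```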

Practical Takeaway

1. Find a sufficient statistic $T$ for $\theta$ (often using the Factorization Theorem)
2. Derive the sampling distribution of $T$ (pdf/pmf $f_T(t \mid \theta)$)
3. Use $f_T(t \mid \theta)$ as the likelihood for Bayesian or frequentist inference
4. Work with $t$ alone: under the assumed model, no information about $\theta$ is lost (the full data are still needed to check the model itself)
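
The steps above can be sketched end to end for one simple case. This example assumes iid Bernoulli$(\theta)$ data, so $T = \sum Y_i \sim \text{Binomial}(n, \theta)$, and a Beta$(a, b)$ prior for the Bayesian branch. The data, the prior, and the grid search are illustrative assumptions, not part of the source.

```python
from math import comb, isclose

def binomial_likelihood(theta, t, n):
    """Likelihood for theta based on T alone: the Binomial(n, theta) pmf at t."""
    return comb(n, t) * theta ** t * (1 - theta) ** (n - t)

data = [1, 0, 1, 1, 0, 1, 1, 0]   # hypothetical Bernoulli observations

# Steps 1-2: reduce the data to the sufficient statistic and its distribution.
n, t = len(data), sum(data)

# Step 3 (frequentist): maximize the likelihood of T over a grid of theta values.
grid = [i / 1000 for i in range(1, 1000)]
mle = max(grid, key=lambda th: binomial_likelihood(th, t, n))

# Step 3 (Bayesian): a Beta(a, b) prior is conjugate, giving Beta(a + t, b + n - t).
a, b = 1.0, 1.0
posterior = (a + t, b + n - t)
posterior_mean = posterior[0] / (posterior[0] + posterior[1])

print(t, n, mle, posterior, posterior_mean)
```

A grid search is used only to keep the sketch dependency-free; here the maximizer is also available in closed form as $t/n$.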

This reduction underlies both frequentist methods (maximum likelihood estimation, Rao-Blackwell improvement of estimators) and Bayesian methods (conjugate updating through sufficient statistics).

References

Fisher, R. A. (1922). On the mathematical foundations of theoretical statistics.
Neyman, J. (1935). Su un teorema concernente le cosiddette statistiche sufficienti.
Halmos, P. R., & Savage, L. J. (1949). Application of the Radon-Nikodym theorem to the theory of sufficient statistics.
Birnbaum, A. (1962). On the foundations of statistical inference.