Introduction

When a statistic \(T(\mathbf{Y})\) is sufficient for a parameter \(\theta\), we can use its probability density function (pdf) or probability mass function (pmf) as the likelihood function for \(\theta\), up to a constant of proportionality. This document summarizes the key theorems and concepts.


The Core Result

If \(T\) is sufficient for \(\theta\), then:

\[L(\theta \mid \mathbf{y}) \propto f_T(t \mid \theta)\]

where:

  • \(L(\theta \mid \mathbf{y})\) is the likelihood based on the full data
  • \(f_T(t \mid \theta)\) is the pdf/pmf of \(T\) evaluated at the observed value \(t = T(\mathbf{y})\)
  • \(\propto\) means “proportional to, as a function of \(\theta\)”
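A minimal numerical sketch of this proportionality, assuming an iid Bernoulli model with \(T = \sum Y_i \sim \text{Binomial}(n, \theta)\). The model, the sample `y`, and the function names are illustrative choices, not part of the result itself:

```python
from math import comb

def full_likelihood(y, theta):
    """Joint pmf of an iid Bernoulli(theta) sample y (a list of 0/1 values)."""
    t = sum(y)
    return theta ** t * (1 - theta) ** (len(y) - t)

def statistic_likelihood(t, n, theta):
    """Binomial(n, theta) pmf of T = sum(Y), evaluated at the observed t."""
    return comb(n, t) * theta ** t * (1 - theta) ** (n - t)

y = [1, 0, 1, 1, 0]                  # hypothetical observed sample
n, t = len(y), sum(y)
thetas = [0.1, 0.3, 0.5, 0.7, 0.9]

# If the two likelihoods are proportional, this ratio is the same for every theta.
ratios = [statistic_likelihood(t, n, th) / full_likelihood(y, th) for th in thetas]
print(ratios)  # every entry is comb(5, 3) = 10, up to floating-point rounding
```

The constant ratio is the number of arrangements of the sample that share the same \(t\), which is exactly the factor that does not involve \(\theta\).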


Key Theorems

1. Factorization Theorem (Fisher-Neyman)

Statement: \(T(\mathbf{Y})\) is sufficient for \(\theta\) if and only if

\[f(\mathbf{y} \mid \theta) = h(\mathbf{y}) \times g(T(\mathbf{y}), \theta)\]

where:

  • \(h(\mathbf{y})\) does not depend on \(\theta\)
  • \(g(T(\mathbf{y}), \theta)\) depends on \(\mathbf{y}\) only through \(T(\mathbf{y})\)

Implication for likelihood:

\[L(\theta \mid \mathbf{y}) = h(\mathbf{y}) \times g(t, \theta) \propto g(t, \theta) \propto f_T(t \mid \theta)\]

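The last step holds because summing the factorization over all \(\mathbf{y}\) with \(T(\mathbf{y}) = t\) (or taking the corresponding integral in the continuous case) gives \(f_T(t \mid \theta) = g(t, \theta) \times \sum_{\mathbf{y}: T(\mathbf{y}) = t} h(\mathbf{y})\), and the sum is free of \(\theta\).

In practice, the factorization is usually the easiest way to identify a sufficient statistic.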
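As a worked illustration, assume \(Y_i \stackrel{\text{iid}}{\sim} \text{Poisson}(\theta)\) (a model chosen here for concreteness). With \(t = \sum_{i=1}^{n} y_i\), the joint pmf factors as:

\[f(\mathbf{y} \mid \theta) = \prod_{i=1}^{n} \frac{e^{-\theta}\theta^{y_i}}{y_i!} = \underbrace{\frac{1}{\prod_{i=1}^{n} y_i!}}_{h(\mathbf{y})} \times \underbrace{e^{-n\theta}\theta^{t}}_{g(t, \theta)}\]

Since \(T = \sum Y_i \sim \text{Poisson}(n\theta)\), its pmf is

\[f_T(t \mid \theta) = \frac{e^{-n\theta}(n\theta)^{t}}{t!} = \frac{n^{t}}{t!} \times g(t, \theta)\]

so \(f_T(t \mid \theta) \propto g(t, \theta)\) as a function of \(\theta\), matching the implication above.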


2. Conditional Distribution Definition (Original Fisher Definition)

Statement: \(T\) is sufficient for \(\theta\) if the conditional distribution of \(\mathbf{Y}\) given \(T = t\) does not depend on \(\theta\).

Derivation:

\[f_{\mathbf{Y}}(\mathbf{y} \mid \theta) = f_{\mathbf{Y}|T}(\mathbf{y} \mid t) \times f_T(t \mid \theta)\]

Since \(f_{\mathbf{Y}|T}(\mathbf{y} \mid t)\) is free of \(\theta\) (by sufficiency):

\[L(\theta \mid \mathbf{y}) \propto f_T(t \mid \theta)\]
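The conditional-distribution definition can be checked by brute force in a small discrete case. The sketch below assumes an iid Bernoulli model with \(T = \sum Y_i\); the sample size, the value of \(t\), and the two values of \(\theta\) are arbitrary illustrative choices:

```python
from itertools import product

def joint_pmf(y, theta):
    """Joint pmf of an iid Bernoulli(theta) sample y (a tuple of 0/1 values)."""
    t = sum(y)
    return theta ** t * (1 - theta) ** (len(y) - t)

def conditional_given_t(n, t, theta):
    """P(Y = y | T = t) for every 0/1 vector y of length n with sum(y) == t."""
    support = [y for y in product([0, 1], repeat=n) if sum(y) == t]
    p_t = sum(joint_pmf(y, theta) for y in support)   # P(T = t)
    return {y: joint_pmf(y, theta) / p_t for y in support}

cond_a = conditional_given_t(n=4, t=2, theta=0.2)
cond_b = conditional_given_t(n=4, t=2, theta=0.8)

# Under sufficiency the two conditional distributions coincide:
# each of the 6 arrangements with two ones has probability 1/6 for any theta.
for y in cond_a:
    print(y, round(cond_a[y], 6), round(cond_b[y], 6))
```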


3. Likelihood Ratio Criterion for Sufficiency

Statement: \(T\) is sufficient for \(\theta\) if and only if for every pair of sample points \(\mathbf{y}_1\) and \(\mathbf{y}_2\) with \(T(\mathbf{y}_1) = T(\mathbf{y}_2)\):

\[L(\theta \mid \mathbf{y}_1) \propto L(\theta \mid \mathbf{y}_2) \quad \text{(as functions of $\theta$)}\]

Implication: If \(T\) is sufficient, all sample points yielding the same \(T\) give proportional likelihoods, so we can replace \(\mathbf{y}\) with \(t\).
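A small sketch of this criterion, assuming an iid Poisson model with \(T = \sum Y_i\). The three samples are hypothetical: `y1` and `y2` share the same value of \(T\), while `y3` does not:

```python
from math import exp, factorial, prod

def poisson_likelihood(y, theta):
    """Joint pmf of an iid Poisson(theta) sample y."""
    return prod(exp(-theta) * theta ** yi / factorial(yi) for yi in y)

y1 = [0, 2, 4]   # T = 6
y2 = [3, 1, 2]   # T = 6
y3 = [1, 1, 1]   # T = 3

thetas = [0.5, 1.0, 2.0, 4.0]

# Same T: the likelihood ratio is constant in theta (here 12/48 = 0.25).
same_t = [poisson_likelihood(y1, th) / poisson_likelihood(y2, th) for th in thetas]

# Different T: the ratio varies with theta (here theta**3 / 48).
diff_t = [poisson_likelihood(y1, th) / poisson_likelihood(y3, th) for th in thetas]

print(same_t)
print(diff_t)
```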


4. Neyman Sufficiency Principle

Statement: If \(T\) is sufficient for \(\theta\), then there exists a function \(c(\mathbf{y}, t)\) such that for all \(\theta\):

\[L(\theta \mid \mathbf{y}) = c(\mathbf{y}, t) \times L^*(\theta \mid t)\]

where \(c(\mathbf{y}, t)\) does not depend on \(\theta\), and \(L^*(\theta \mid t)\) is the likelihood based on \(T\) alone.


5. Blackwell-Rao Theorem (Indirect)

Statement: If \(T\) is sufficient for \(\theta\) and \(\hat{\theta}\) is an estimator of \(\theta\), then \(\mathbb{E}[\hat{\theta} \mid T]\) is itself an estimator (sufficiency makes this conditional expectation free of \(\theta\)) and has mean squared error less than or equal to that of \(\hat{\theta}\).

Implication: All information about \(\theta\) is contained in \(T\), so likelihood-based inference can be based solely on \(T\).
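To make the theorem concrete, the sketch below assumes an iid Bernoulli model, takes the crude estimator \(\hat{\theta} = Y_1\), and uses the fact that \(\mathbb{E}[Y_1 \mid T = t] = t/n\) for \(T = \sum Y_i\). Mean squared errors are computed exactly by enumerating all \(2^n\) samples; the values of `n` and `theta` are arbitrary:

```python
from itertools import product

def mse(estimator, n, theta):
    """Exact MSE under iid Bernoulli(theta), enumerating all 2**n samples."""
    total = 0.0
    for y in product([0, 1], repeat=n):
        t = sum(y)
        prob = theta ** t * (1 - theta) ** (n - t)
        total += prob * (estimator(y) - theta) ** 2
    return total

def crude(y):
    """Unbiased but wasteful estimator: the first observation only."""
    return y[0]

def rao_blackwellized(y):
    """E[Y_1 | T = t] = t / n, i.e. the sample mean."""
    return sum(y) / len(y)

n, theta = 5, 0.3
mse_crude = mse(crude, n, theta)            # theta * (1 - theta) = 0.21
mse_rb = mse(rao_blackwellized, n, theta)   # theta * (1 - theta) / n = 0.042
print(mse_crude, mse_rb)
```

Conditioning on \(T\) replaces an estimator that looks at one observation with one that depends on the data only through the sufficient statistic, and the error shrinks by a factor of \(n\) in this example.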


6. Likelihood Principle (Foundational)

Statement: All evidence from data about \(\theta\) is contained in the likelihood function.

Combined with sufficiency: If \(T\) is sufficient, the likelihood based on \(\mathbf{y}\) and the likelihood based on \(T\) are proportional, so inference should be identical.


Example: Normal Distribution with Known Variance

Let \(Y_i \stackrel{\text{iid}}{\sim} N(\mu, \sigma^2)\) with \(\sigma^2\) known.

Full likelihood: \[L(\mu \mid y_1, \ldots, y_n) \propto \exp\left[-\frac{\sum (y_i - \mu)^2}{2\sigma^2}\right]\]

Sufficient statistic: \(\bar{Y} = \frac{1}{n}\sum Y_i\)

Distribution of \(\bar{Y}\): \(\bar{Y} \mid \mu \sim N(\mu, \sigma^2/n)\)

Likelihood based on \(\bar{Y}\): \[L(\mu \mid \bar{y}) \propto \exp\left[-\frac{n(\bar{y} - \mu)^2}{2\sigma^2}\right]\]

These are proportional as functions of \(\mu\), confirming the theorem.
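The algebra behind this proportionality is the standard sum-of-squares decomposition, sketched here for completeness. Because \(\sum (y_i - \bar{y}) = 0\), the cross term vanishes and

\[\sum_{i=1}^{n}(y_i - \mu)^2 = \sum_{i=1}^{n}(y_i - \bar{y})^2 + n(\bar{y} - \mu)^2\]

Substituting into the full likelihood gives

\[\exp\left[-\frac{\sum (y_i - \mu)^2}{2\sigma^2}\right] = \exp\left[-\frac{\sum (y_i - \bar{y})^2}{2\sigma^2}\right] \times \exp\left[-\frac{n(\bar{y} - \mu)^2}{2\sigma^2}\right]\]

where the first factor does not involve \(\mu\) and the second is the likelihood based on \(\bar{Y}\).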


Important Caveat

Using the pdf/pmf of a statistic in place of the full-data likelihood loses no information about \(\theta\) if and only if that statistic is sufficient.

| Statistic | Sufficient? | Can use its pdf as likelihood? |
|---|---|---|
| \(\bar{Y}\) for \(N(\mu, \sigma^2)\) with \(\sigma^2\) known | ✅ Yes | ✅ Yes |
| \(\max Y_i\) for Uniform\((0, \theta)\) | ✅ Yes | ✅ Yes |
| Sample median for \(N(\mu, \sigma^2)\) | ❌ No (not sufficient) | ❌ No |
| \(\bar{Y}\) for Uniform\((0, \theta)\) | ❌ No | ❌ No |
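The two Uniform\((0, \theta)\) rows can be illustrated with a short sketch. The three samples are hypothetical: `ya` and `yb` share the same mean but not the same maximum, while `ya` and `yc` share the same maximum but not the same mean:

```python
def uniform_likelihood(y, theta):
    """Joint pdf of an iid Uniform(0, theta) sample: theta**(-n) if theta >= max(y), else 0."""
    return theta ** (-len(y)) if theta >= max(y) else 0.0

ya = [1.0, 2.0, 3.0]   # mean 2.0, max 3.0
yb = [2.0, 2.0, 2.0]   # mean 2.0, max 2.0
yc = [3.0, 0.5, 1.0]   # mean 1.5, max 3.0

thetas = [2.5, 3.0, 4.0, 10.0]

# Same mean, different max: theta = 2.5 is ruled out by ya but allowed by yb,
# so the two likelihoods cannot be proportional.
print(uniform_likelihood(ya, 2.5), uniform_likelihood(yb, 2.5))

# Same max, different mean: the likelihoods agree at every theta.
same_max = all(uniform_likelihood(ya, th) == uniform_likelihood(yc, th) for th in thetas)
print(same_max)
```

Knowing only \(\bar{y}\) therefore cannot recover the full-data likelihood in this model, whereas knowing \(\max y_i\) (together with \(n\)) can.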

Practical Takeaway

  1. Find a sufficient statistic \(T\) for \(\theta\) (often using the Factorization Theorem)
  2. Derive the sampling distribution of \(T\) (pdf/pmf \(f_T(t \mid \theta)\))
  3. Use \(f_T(t \mid \theta)\) as the likelihood for Bayesian or frequentist inference
  4. For inference about \(\theta\) under the assumed model, the full data can be discarded: no information is lost

This principle is fundamental to both frequentist (MLE, sufficiency) and Bayesian (conjugate priors) inference.
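The steps above can be sketched end to end for a Bayesian analysis. The example assumes an iid Bernoulli model, a flat prior on a grid of \(\theta\) values, and a hypothetical sample `y`; it compares the posterior computed from the full data with the one computed from the Binomial pmf of \(T = \sum Y_i\):

```python
from math import comb, prod

def normalize(weights):
    """Rescale non-negative weights so they sum to 1."""
    total = sum(weights)
    return [w / total for w in weights]

y = [1, 1, 0, 1, 0, 0, 1, 1]              # hypothetical Bernoulli sample
n, t = len(y), sum(y)
grid = [i / 100 for i in range(1, 100)]   # theta values 0.01, ..., 0.99

# Likelihood from the full data: product of the individual Bernoulli pmfs.
full_lik = [prod(th if yi == 1 else 1 - th for yi in y) for th in grid]

# Likelihood from the sufficient statistic: Binomial(n, theta) pmf at t.
stat_lik = [comb(n, t) * th ** t * (1 - th) ** (n - t) for th in grid]

# With a flat prior, the posterior on the grid is the normalized likelihood.
post_full = normalize(full_lik)
post_stat = normalize(stat_lik)

max_gap = max(abs(a - b) for a, b in zip(post_full, post_stat))
print(max_gap)  # zero up to floating-point rounding
```

The constant \(\binom{n}{t}\) cancels in the normalization, which is why the two posteriors agree.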


References

  • Fisher, R. A. (1922). On the mathematical foundations of theoretical statistics.
  • Neyman, J. (1935). Su un teorema concernente le cosiddette statistiche sufficienti.
  • Halmos, P. R., & Savage, L. J. (1949). Application of the Radon-Nikodym theorem to the theory of sufficient statistics.
  • Birnbaum, A. (1962). On the foundations of statistical inference.