When a statistic \(T(\mathbf{Y})\) is sufficient for a parameter \(\theta\), we can use its probability density function (pdf) or probability mass function (pmf) as the likelihood function for \(\theta\), up to a constant of proportionality. This document summarizes the key theorems and concepts.
If \(T\) is sufficient for \(\theta\), then:
\[L(\theta \mid \mathbf{y}) \propto f_T(t \mid \theta)\]
where:

- \(L(\theta \mid \mathbf{y})\) is the likelihood based on the full data
- \(f_T(t \mid \theta)\) is the pdf/pmf of \(T\) evaluated at the observed value \(t = T(\mathbf{y})\)
- \(\propto\) means “proportional to as a function of \(\theta\)”
Statement: \(T(\mathbf{Y})\) is sufficient for \(\theta\) if and only if
\[f(\mathbf{y} \mid \theta) = h(\mathbf{y}) \times g(T(\mathbf{y}), \theta)\]
where:

- \(h(\mathbf{y})\) does not depend on \(\theta\)
- \(g(T(\mathbf{y}), \theta)\) depends on \(\mathbf{y}\) only through \(T(\mathbf{y})\)
Implication for likelihood:
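\[L(\theta \mid \mathbf{y}) = h(\mathbf{y}) \times g(t, \theta) \propto g(t, \theta) \propto f_T(t \mid \theta)\]
The final step holds because, in the discrete case, summing the factorization over all \(\mathbf{y}\) with \(T(\mathbf{y}) = t\) gives \(f_T(t \mid \theta) = g(t, \theta) \sum_{\mathbf{y}: T(\mathbf{y}) = t} h(\mathbf{y})\), and the sum does not depend on \(\theta\). In the continuous case an integral takes the place of the sum.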
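This result, the factorization theorem (often called the Fisher-Neyman factorization theorem), is the most practical and commonly used of the theorems summarized here.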
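As a rough numerical illustration of the factorization result, the sketch below uses iid Bernoulli(\(\theta\)) data, where \(T = \sum Y_i\) is sufficient with \(h(\mathbf{y}) = 1\) and \(g(t, \theta) = \theta^t (1-\theta)^{n-t}\). The data vector `y`, the grid `thetas`, and the function names are hypothetical choices made for this example only.

```python
import math

def full_likelihood(theta, y):
    """Joint pmf of iid Bernoulli(theta) data y, viewed as a function of theta."""
    n, t = len(y), sum(y)
    return theta**t * (1 - theta)**(n - t)

def binomial_pmf(t, n, theta):
    """pmf of T = sum(Y_i) ~ Binomial(n, theta), evaluated at t."""
    return math.comb(n, t) * theta**t * (1 - theta)**(n - t)

y = [1, 0, 1, 1, 0]                 # hypothetical observed data
n, t = len(y), sum(y)
thetas = [0.1, 0.3, 0.5, 0.7, 0.9]  # hypothetical grid of parameter values

# If L(theta | y) is proportional to f_T(t | theta), this ratio is the same
# constant for every theta (here it should be close to C(n, t)).
ratios = [binomial_pmf(t, n, th) / full_likelihood(th, y) for th in thetas]
print(ratios)
```

The ratio being constant in \(\theta\) is exactly what proportionality means here. The constant itself, \(\binom{n}{t}\), counts the samples that share the same value of \(T\).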
Statement: \(T\) is sufficient for \(\theta\) if the conditional distribution of \(\mathbf{Y}\) given \(T = t\) does not depend on \(\theta\).
Derivation:
\[f_{\mathbf{Y}}(\mathbf{y} \mid \theta) = f_{\mathbf{Y}|T}(\mathbf{y} \mid t) \times f_T(t \mid \theta)\]
Since \(f_{\mathbf{Y}|T}(\mathbf{y} \mid t)\) is free of \(\theta\) (by sufficiency):
\[L(\theta \mid \mathbf{y}) \propto f_T(t \mid \theta)\]
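The conditional-distribution argument can be checked by brute force in a small discrete model. The sketch below assumes iid Bernoulli(\(\theta\)) data with \(T = \sum Y_i\) and computes \(P(\mathbf{Y} = \mathbf{y} \mid T = t)\) by enumerating every binary vector of the same length. The sample `y` and the helper names are hypothetical.

```python
import itertools

def joint_pmf(y, theta):
    """Joint pmf of iid Bernoulli(theta) observations y."""
    t = sum(y)
    return theta**t * (1 - theta)**(len(y) - t)

def conditional_pmf_given_sum(y, theta):
    """P(Y = y | T = sum(y)), computed by enumerating all binary vectors."""
    n, t = len(y), sum(y)
    # P(T = t) is the total probability of all samples with the same sum.
    p_t = sum(joint_pmf(z, theta)
              for z in itertools.product([0, 1], repeat=n)
              if sum(z) == t)
    return joint_pmf(y, theta) / p_t

y = (1, 0, 1, 0)  # hypothetical sample with n = 4, t = 2
cond_probs = [conditional_pmf_given_sum(y, th) for th in (0.2, 0.5, 0.8)]
print(cond_probs)  # each value should be close to 1 / C(4, 2) = 1/6
```

Because the conditional probability comes out the same for every \(\theta\), the factor \(f_{\mathbf{Y}|T}\) in the decomposition acts as a constant in the likelihood.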
Statement: \(T\) is sufficient for \(\theta\) if and only if for every pair of sample points \(\mathbf{y}_1\) and \(\mathbf{y}_2\) with \(T(\mathbf{y}_1) = T(\mathbf{y}_2)\):
\[L(\theta \mid \mathbf{y}_1) \propto L(\theta \mid \mathbf{y}_2) \quad \text{(as functions of $\theta$)}\]
Implication: If \(T\) is sufficient, all sample points yielding the same \(T\) give proportional likelihoods, so we can replace \(\mathbf{y}\) with \(t\).
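A small check of this statement, assuming iid Poisson(\(\theta\)) data with \(T = \sum Y_i\): two samples with the same sum should give a likelihood ratio that is constant in \(\theta\), while a sample with a different sum should not. The samples `y1`, `y2`, `y3` and the grid `thetas` are made up for illustration.

```python
import math

def poisson_likelihood(theta, y):
    """Joint pmf of iid Poisson(theta) observations y, as a function of theta."""
    return math.prod(math.exp(-theta) * theta**yi / math.factorial(yi) for yi in y)

y1 = [0, 2, 4]   # hypothetical sample, sum = 6
y2 = [3, 3, 0]   # different sample, same sum = 6
y3 = [1, 1, 1]   # sum = 3, so T differs
thetas = [0.5, 1.0, 2.0, 4.0]

# Same T: the ratio does not change with theta.
same_t_ratios = [poisson_likelihood(th, y1) / poisson_likelihood(th, y2)
                 for th in thetas]
# Different T: the ratio changes with theta.
diff_t_ratios = [poisson_likelihood(th, y1) / poisson_likelihood(th, y3)
                 for th in thetas]

print(same_t_ratios)
print(diff_t_ratios)
```

In the first list the constant is the ratio of the \(\prod y_i!\) terms, which is the only place the two samples differ once their sums agree.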
Statement: If \(T\) is sufficient for \(\theta\), then there exists a function \(c(\mathbf{y}, t)\) such that for all \(\theta\):
\[L(\theta \mid \mathbf{y}) = c(\mathbf{y}, t) \times L^*(\theta \mid t)\]
where \(c(\mathbf{y}, t)\) does not depend on \(\theta\), and \(L^*(\theta \mid t)\) is the likelihood based on \(T\) alone.
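To make the factor \(c(\mathbf{y}, t)\) concrete, the sketch below works it out for iid Poisson(\(\theta\)) data, where \(T = \sum Y_i \sim \text{Poisson}(n\theta)\). Under that assumption \(c(\mathbf{y}, t) = t! / (n^t \prod y_i!)\), which is the multinomial probability of \(\mathbf{y}\) given \(T = t\). The data and function names are hypothetical.

```python
import math

def full_likelihood(theta, y):
    """L(theta | y): joint pmf of iid Poisson(theta) observations."""
    return math.prod(math.exp(-theta) * theta**yi / math.factorial(yi) for yi in y)

def likelihood_from_t(theta, t, n):
    """L*(theta | t): pmf of T = sum(Y_i) ~ Poisson(n * theta) at t."""
    return math.exp(-n * theta) * (n * theta)**t / math.factorial(t)

def c_factor(y):
    """c(y, t) = t! / (n**t * prod(y_i!)), which is free of theta."""
    n, t = len(y), sum(y)
    return math.factorial(t) / (n**t * math.prod(math.factorial(yi) for yi in y))

y = [2, 0, 3, 1]  # hypothetical counts
n, t = len(y), sum(y)

for theta in (0.5, 1.5, 3.0):
    lhs = full_likelihood(theta, y)
    rhs = c_factor(y) * likelihood_from_t(theta, t, n)
    print(theta, lhs, rhs)  # the two columns should agree up to rounding
```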
Statement: If \(T\) is sufficient for \(\theta\) and \(\hat{\theta}\) is an estimator of \(\theta\), then \(\mathbb{E}[\hat{\theta} \mid T]\) has mean squared error less than or equal to that of \(\hat{\theta}\).
Implication: All information about \(\theta\) is contained in \(T\), so likelihood-based inference can be based solely on \(T\).
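The mean-squared-error claim can be verified exactly in a toy case. The sketch below assumes iid Bernoulli(\(\theta\)) data, takes the crude estimator \(\hat{\theta} = Y_1\), and compares it with \(\mathbb{E}[Y_1 \mid T] = T/n\) (by symmetry) for \(T = \sum Y_i\). MSEs are computed by enumerating all \(2^n\) samples rather than by simulation. The estimator names and the choice \(n = 5\) are illustrative.

```python
import itertools

def exact_mse(estimator, theta, n):
    """Exact MSE of estimator(y) under iid Bernoulli(theta), by enumeration."""
    total = 0.0
    for y in itertools.product([0, 1], repeat=n):
        t = sum(y)
        prob = theta**t * (1 - theta)**(n - t)
        total += prob * (estimator(y) - theta)**2
    return total

def crude(y):
    """Uses only the first observation."""
    return y[0]

def rao_blackwellized(y):
    """E[Y_1 | T] = T / n, a function of the sufficient statistic only."""
    return sum(y) / len(y)

n = 5  # hypothetical sample size
for theta in (0.2, 0.5, 0.8):
    print(theta, exact_mse(crude, theta, n), exact_mse(rao_blackwellized, theta, n))
```

In this example the two MSEs should come out as \(\theta(1-\theta)\) and \(\theta(1-\theta)/n\), so conditioning on \(T\) cuts the error by a factor of \(n\).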
Statement: All evidence from data about \(\theta\) is contained in the likelihood function.
Combined with sufficiency: If \(T\) is sufficient, the likelihood based on \(\mathbf{y}\) and the likelihood based on \(T\) are proportional, so inference should be identical.
Let \(Y_i \stackrel{\text{iid}}{\sim} N(\mu, \sigma^2)\) with \(\sigma^2\) known.
Full likelihood: \[L(\mu \mid y_1, \ldots, y_n) \propto \exp\left[-\frac{\sum (y_i - \mu)^2}{2\sigma^2}\right]\]
Sufficient statistic: \(\bar{Y} = \frac{1}{n}\sum Y_i\)
Distribution of \(\bar{Y}\): \(\bar{Y} \mid \mu \sim N(\mu, \sigma^2/n)\)
Likelihood based on \(\bar{Y}\): \[L(\mu \mid \bar{y}) \propto \exp\left[-\frac{n(\bar{y} - \mu)^2}{2\sigma^2}\right]\]
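These are proportional as functions of \(\mu\), confirming the theorem: since \(\sum (y_i - \mu)^2 = \sum (y_i - \bar{y})^2 + n(\bar{y} - \mu)^2\) and the first term does not involve \(\mu\), the two likelihoods differ only by a multiplicative constant.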
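The normal example can also be checked numerically. On the log scale, proportional likelihoods differ by an additive constant, so the sketch below compares the two log-likelihood kernels over a grid of \(\mu\) values. The data `y`, the variance `sigma2`, and the grid `mus` are hypothetical.

```python
def loglik_full(mu, y, sigma2):
    """Log-likelihood kernel from the full sample (known variance sigma2)."""
    return -sum((yi - mu)**2 for yi in y) / (2 * sigma2)

def loglik_mean(mu, ybar, n, sigma2):
    """Log-likelihood kernel from the sample mean, ybar ~ N(mu, sigma2 / n)."""
    return -n * (ybar - mu)**2 / (2 * sigma2)

y = [4.1, 5.3, 3.8, 6.0, 4.9]  # hypothetical data
sigma2 = 2.0                   # assumed known variance
n = len(y)
ybar = sum(y) / n
mus = [3.0, 4.0, 5.0, 6.0, 7.0]

# The difference should be the same for every mu:
# -sum((y_i - ybar)^2) / (2 * sigma2).
diffs = [loglik_full(mu, y, sigma2) - loglik_mean(mu, ybar, n, sigma2) for mu in mus]
expected = -sum((yi - ybar)**2 for yi in y) / (2 * sigma2)
print(diffs, expected)
```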
Using the pdf of a statistic as the likelihood is valid **if and only if that statistic is sufficient.**
| Statistic | Sufficient? | Can use its pdf as likelihood? |
|---|---|---|
| \(\bar{Y}\) for \(N(\mu, \sigma^2)\) with \(\sigma^2\) known | ✅ Yes | ✅ Yes |
| \(\max Y_i\) for Uniform\((0, \theta)\) | ✅ Yes | ✅ Yes |
| Sample median for \(N(\mu, \sigma^2)\) | ❌ No | ❌ No |
| \(\bar{Y}\) for Uniform\((0, \theta)\) | ❌ No | ❌ No |
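The two Uniform\((0, \theta)\) rows can be illustrated with a short sketch. It assumes the standard results that the full likelihood is \(\theta^{-n}\) for \(\theta \ge \max y_i\) (zero otherwise) and that \(\max Y_i\) has pdf \(n t^{n-1} / \theta^n\) on \(0 \le t \le \theta\). All sample values are made up.

```python
def full_likelihood(theta, y):
    """L(theta | y) for iid Uniform(0, theta): theta**(-n) if theta >= max(y)."""
    return theta**(-len(y)) if theta >= max(y) else 0.0

def max_pdf(t, n, theta):
    """pdf of T = max(Y_i) for n iid Uniform(0, theta) draws."""
    return n * t**(n - 1) / theta**n if 0 <= t <= theta else 0.0

# max(Y_i) is sufficient: its pdf is proportional to the full likelihood.
y = [0.7, 2.1, 1.4]  # hypothetical sample
n, t = len(y), max(y)
thetas = [2.5, 3.0, 5.0]
ratios = [max_pdf(t, n, th) / full_likelihood(th, y) for th in thetas]
print(ratios)  # constant in theta, close to n * t**(n - 1)

# The sample mean is not sufficient: same mean, non-proportional likelihoods.
y_a = [1.0, 1.0, 1.0]  # mean 1.0, max 1.0
y_b = [0.5, 0.5, 2.0]  # mean 1.0, max 2.0
print(full_likelihood(1.5, y_a), full_likelihood(1.5, y_b))  # positive vs 0.0
```

At \(\theta = 1.5\) the first sample is possible and the second is not, so no constant can make the two likelihoods proportional even though the sample means agree.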
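This principle underlies both frequentist inference (for example, the MLE depends on the data only through a sufficient statistic) and Bayesian inference (for example, conjugate-prior updates depend on the data only through a sufficient statistic).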