A statistic \(T(X_1, \dots, X_n)\) is sufficient for a parameter \(\theta\) if the conditional distribution of the sample given \(T = t\) does not depend on \(\theta\).
Informally, once \(T\) is known, the rest of the sample carries no further information about \(\theta\).
Let \(X_1, \dots, X_n \stackrel{\text{i.i.d.}}{\sim} \text{Bernoulli}(p)\).
The joint pmf is:
\[ f(x_1,\dots,x_n; p) = p^{\sum x_i} (1-p)^{n - \sum x_i} \]
Let \(T = \sum_{i=1}^n x_i\). Then:
\[ f(x_1,\dots,x_n; p) = \underbrace{p^T (1-p)^{n-T}}_{g(T,p)} \cdot \underbrace{1}_{h(x)} \]
By the Factorization Theorem, \(T\) is sufficient for \(p\).
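The factorization above can be checked numerically for a small sample size. The following is a minimal sketch, not part of the original derivation: the helper names `joint_pmf`, `g`, and `factorization_holds` are hypothetical, and exact rational arithmetic is used so that equality comparisons are not affected by rounding.

```python
from fractions import Fraction
from itertools import product
from math import prod


def joint_pmf(x, p):
    """Joint pmf of an i.i.d. Bernoulli(p) sample, built factor by factor."""
    return prod(p if xi == 1 else 1 - p for xi in x)


def g(t, n, p):
    """The factor g(T, p) = p^T (1 - p)^(n - T) from the factorization."""
    return p**t * (1 - p) ** (n - t)


def factorization_holds(n, p):
    """Check f(x; p) == g(sum(x), p) * h(x) with h(x) = 1 for every 0/1 sequence."""
    return all(
        joint_pmf(x, p) == g(sum(x), n, p) * 1
        for x in product((0, 1), repeat=n)
    )


# Illustrative values of n and p (assumptions, chosen only for the demo).
for p in (Fraction(1, 5), Fraction(1, 2), Fraction(9, 10)):
    assert factorization_holds(4, p)
```

The check passes because the joint pmf depends on the data only through the number of ones, which is what the factor \(g(T, p)\) expresses.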
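To verify this directly from the definition, fix a sequence \((x_1, \dots, x_n)\) with \(\sum x_i = t\). The event \(\{X_1 = x_1, \dots, X_n = x_n\}\) is then contained in \(\{T = t\}\), so:
\[ P(X_1 = x_1, \dots, X_n = x_n \mid T = t) = \frac{P(X_1 = x_1, \dots, X_n = x_n,\, T = t)}{P(T = t)} = \frac{P(X_1 = x_1, \dots, X_n = x_n)}{P(T = t)} \]
(If \(\sum x_i \neq t\), the conditional probability is \(0\) for every \(p\).)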
Numerator (since \(\sum x_i = t\)):
\[ P(X_1 = x_1, \dots, X_n = x_n) = p^t (1-p)^{n-t} \]
Denominator (since \(T \sim \text{Binomial}(n, p)\)):
\[ P(T = t) = \binom{n}{t} p^t (1-p)^{n-t} \]
Cancel \(p^t (1-p)^{n-t}\) (valid for \(0 < p < 1\)):
\[ P(X_1 = x_1, \dots, X_n = x_n \mid T = t) = \frac{1}{\binom{n}{t}} \]
The right-hand side does not depend on \(p\): given \(T = t\), every sequence with \(t\) ones is equally likely. Once \(T\) is known, the individual \(X_i\) therefore carry no further information about \(p\), which is exactly what the definition of sufficiency requires.
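The conditional distribution can also be computed by brute-force enumeration for small \(n\). The sketch below is illustrative only: the function name `conditional_given_t` and the chosen values of \(n\), \(t\), and \(p\) are assumptions made for the example. It computes \(P(T = t)\) by summing the joint pmf over all sequences with \(t\) ones, rather than using the Binomial formula, so it serves as an independent check.

```python
from fractions import Fraction
from itertools import product
from math import comb, prod


def conditional_given_t(n, t, p):
    """Return {x: P(X = x | T = t)} over all 0/1 sequences x with sum(x) == t."""
    seqs = [x for x in product((0, 1), repeat=n) if sum(x) == t]
    # Joint probability of each sequence under i.i.d. Bernoulli(p).
    joint = {x: prod(p if xi == 1 else 1 - p for xi in x) for x in seqs}
    # P(T = t), obtained by summing over the sequences with exactly t ones.
    p_t = sum(joint.values())
    return {x: pr / p_t for x, pr in joint.items()}


n, t = 5, 2  # illustrative values
for p in (Fraction(1, 10), Fraction(1, 2), Fraction(4, 5)):
    cond = conditional_given_t(n, t, p)
    # There are C(n, t) sequences, each with conditional probability 1 / C(n, t).
    assert len(cond) == comb(n, t)
    assert all(v == Fraction(1, comb(n, t)) for v in cond.values())
```

The assertions hold for each value of \(p\) tried, consistent with the conditional distribution being uniform and free of \(p\).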
| Concept | Implication |
|---|---|
| Sufficiency | \(P(\text{sample} \mid T)\) is free of \(\theta\) |
| Bernoulli example | \(T = \sum X_i\) is sufficient for \(p\) |
| Conditional distribution given \(T=t\) | Uniform over all sequences with \(t\) ones |
| Key benefit | Inference about \(p\) can be based entirely on \(T\) |