Definition

A statistic \(T(X_1, \dots, X_n)\) is sufficient for a parameter \(\theta\) if the conditional distribution of the sample given \(T = t\) does not depend on \(\theta\).

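Informally, \(T\) captures all the information about \(\theta\) that the sample contains: once \(T\) is known, the remaining randomness in the sample carries no information about \(\theta\).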


Why is \(T = \sum_{i=1}^n X_i\) sufficient for \(p\) in a Bernoulli model?

Let \(X_1, \dots, X_n \stackrel{\text{i.i.d.}}{\sim} \text{Bernoulli}(p)\).

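Factorization Theorem (Neyman–Fisher)

A statistic \(T\) is sufficient for \(\theta\) if and only if the joint pmf can be written as \(f(x_1,\dots,x_n;\theta) = g(T(x_1,\dots,x_n),\theta)\, h(x_1,\dots,x_n)\), where \(h\) does not depend on \(\theta\).

For \(x_i \in \{0,1\}\), the joint pmf of the Bernoulli sample is: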

\[ f(x_1,\dots,x_n; p) = p^{\sum x_i} (1-p)^{n - \sum x_i} \]

Let \(T = \sum_{i=1}^n x_i\). Then:

\[ f(x_1,\dots,x_n; p) = \underbrace{p^T (1-p)^{n-T}}_{g(T,p)} \cdot \underbrace{1}_{h(x)} \]

  • \(g(T,p)\) depends on the data only through \(T\)
  • \(h(x) = 1\) does not depend on \(p\)

By the Factorization Theorem, \(T\) is sufficient for \(p\).
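
As a quick numerical check of the factorization, the sketch below compares the joint pmf of two different 0/1 patterns that share the same sum. The function names `joint_pmf` and `g`, the value \(p = 3/10\), and the two sample patterns are illustrative choices, not part of the derivation; exact rational arithmetic is used so that equality can be tested without rounding error.

```python
from fractions import Fraction

def joint_pmf(xs, p):
    """Joint pmf of i.i.d. Bernoulli(p) observations xs (each 0 or 1)."""
    prob = Fraction(1)
    for x in xs:
        prob *= p if x == 1 else 1 - p
    return prob

def g(t, n, p):
    """The factor g(T, p) = p^T (1 - p)^(n - T); here h(x) = 1."""
    return p**t * (1 - p)**(n - t)

p = Fraction(3, 10)          # illustrative parameter value
a = (1, 1, 0, 0, 0)          # T = 2
b = (0, 0, 1, 0, 1)          # different pattern, same T = 2

# Both patterns have the same joint pmf, and it equals g(2, 5, p).
same_joint = joint_pmf(a, p) == joint_pmf(b, p) == g(2, 5, p)
print(same_joint)
```

Any other pair of patterns with equal sums should behave the same way, since the joint pmf sees the data only through \(T\).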


Conditional distribution of the sample given \(T = t\)

We compute:

\[ P(X_1 = x_1, \dots, X_n = x_n \mid T = t) \]

  • If \(\sum x_i \neq t\), the probability is 0.
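  • If \(\sum x_i = t\), the event \(\{X_1 = x_1, \dots, X_n = x_n\}\) is contained in \(\{T = t\}\), so the joint probability of both events is just the probability of the sample, and: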

\[ P(\text{sample} \mid T = t) = \frac{P(X_1 = x_1, \dots, X_n = x_n)}{P(T = t)} \]

Numerator:

\[ P(X_1 = x_1, \dots, X_n = x_n) = p^t (1-p)^{n-t} \]

Denominator (since \(T \sim \text{Binomial}(n, p)\)):

\[ P(T = t) = \binom{n}{t} p^t (1-p)^{n-t} \]

Cancel \(p^t (1-p)^{n-t}\) (valid for \(0 < p < 1\)):

\[ P(X_1 = x_1, \dots, X_n = x_n \mid T = t) = \frac{1}{\binom{n}{t}} \]
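
The cancellation can also be checked by brute force for a small case. The following is a minimal sketch, assuming \(n = 4\) and \(t = 2\) and three arbitrary values of \(p\); the helper name `conditional_given_sum` is hypothetical. It enumerates every 0/1 sequence with sum \(t\), divides its joint probability by \(P(T = t)\), and compares the result with \(1/\binom{n}{t}\).

```python
from fractions import Fraction
from itertools import product
from math import comb

def conditional_given_sum(n, t, p):
    """Return {sequence: P(X = sequence | T = t)} for i.i.d. Bernoulli(p)."""
    joint = {}
    for xs in product((0, 1), repeat=n):
        if sum(xs) == t:
            joint[xs] = p**t * (1 - p)**(n - t)
    total = sum(joint.values())  # this is P(T = t)
    return {xs: pr / total for xs, pr in joint.items()}

n, t = 4, 2  # illustrative small case
for p in (Fraction(1, 10), Fraction(1, 2), Fraction(9, 10)):
    cond = conditional_given_sum(n, t, p)
    # Every sequence with sum t should get probability 1 / C(n, t).
    uniform = set(cond.values()) == {Fraction(1, comb(n, t))}
    print(p, len(cond), uniform)
```

For each value of \(p\) the conditional probabilities are identical, which is the sense in which the conditional distribution is free of \(p\).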


Interpretation

  • The conditional distribution does not depend on \(p\).
  • Given \(T = t\), every specific sequence with exactly \(t\) ones has probability \(1 / \binom{n}{t}\).
  • Once \(T = t\) is known, the pattern of zeros and ones is distributed as if the \(t\) success positions had been chosen uniformly at random from all \(\binom{n}{t}\) possible subsets of positions, regardless of \(p\).

Therefore, no additional information about \(p\) can be obtained from the individual \(X_i\) beyond what \(T\) already provides. That is precisely the definition of sufficiency.
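
A simulation gives the same picture empirically. The sketch below is only an illustration under assumed settings (\(n = 4\), \(t = 2\), 20000 draws, a fixed seed, and the hypothetical helper `pattern_frequencies`): it draws Bernoulli samples for two quite different values of \(p\), keeps only those with \(T = t\), and reports how often each pattern occurs among the kept samples.

```python
import random
from collections import Counter

def pattern_frequencies(n, t, p, draws, seed=0):
    """Empirical frequencies of 0/1 patterns among simulated samples with sum t."""
    rng = random.Random(seed)
    counts = Counter()
    for _ in range(draws):
        xs = tuple(1 if rng.random() < p else 0 for _ in range(n))
        if sum(xs) == t:          # condition on T = t
            counts[xs] += 1
    kept = sum(counts.values())   # number of samples with T = t
    return {xs: c / kept for xs, c in counts.items()}

for p in (0.2, 0.7):
    freqs = pattern_frequencies(n=4, t=2, p=p, draws=20000)
    # Report the number of distinct patterns and the spread of their frequencies.
    print(p, len(freqs), round(min(freqs.values()), 3), round(max(freqs.values()), 3))
```

With these settings all \(\binom{4}{2} = 6\) patterns should appear with frequency close to \(1/6\) for both values of \(p\), up to sampling noise; what changes with \(p\) is only how often \(T = 2\) occurs in the first place.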


Summary

| Concept | Implication |
| --- | --- |
| Sufficiency | \(P(\text{sample} \mid T)\) is free of \(\theta\) |
| Bernoulli example | \(T = \sum X_i\) is sufficient for \(p\) |
| Conditional distribution given \(T = t\) | Uniform over all sequences with \(t\) ones |
| Key benefit | Inference about \(p\) can be based entirely on \(T\) |