8.4 - Sufficiency

Conceptual description of sufficiency

  • Suppose interest lies in estimating \(p\), the proportion of Minnesotan adults who still have their wisdom teeth.
  • A sample of \(n\) Minnesotan adults is obtained; \(Y_1,...,Y_n \sim\) i.i.d. \(BERN(p)\), where \(Y_i = 1\) if that Minnesotan still has their wisdom teeth and 0 otherwise.
  • Two statisticians have the following pieces of information:
  1. Statistician 1 has the complete data values: \(\{Y_1=1,Y_2=0,Y_3=0,Y_4=1,...,Y_n=0\}\)
  2. Statistician 2 knows only the total number out of \(n\) who still have their wisdom teeth: \(\sum_{i=1}^n Y_i.\)
  • Sufficiency is the idea that Statistician 2 knows just as much about \(p\) as Statistician 1.

Formal definition of sufficiency

  • Let \(Y_1,...,Y_n\) be an i.i.d. sample from a distribution with parameter \(\theta\).
  • Let \(U = g(Y_1,...,Y_n)\) represent a statistic computed from the sample.
  • \(U\) is said to be sufficient if the distribution of the sample given \(U\) does not depend on \(\theta\).
  • Discrete:

\[P(Y_1 = y_1, Y_2 = y_2,...,Y_n = y_n| U=u) \mbox{ is } \theta\mbox{-free}\]

  • Continuous:

\[f(y_1,...,y_n| U=u) \mbox{ is } \theta\mbox{-free}\]

  • I.e., once we know \(U\), there is no other information about \(\theta\) left in the individual data values.

“Sufficient for sufficiency”

  • Note that, in the discrete case:

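\[P(Y_1 = y_1, Y_2 = y_2,...,Y_n = y_n| U=u) = \frac{P(Y_1 = y_1, Y_2 = y_2,...,Y_n = y_n \cap U=u)}{P(U=u)}\]

\[= \frac{P(Y_1 = y_1, Y_2 = y_2,...,Y_n = y_n )}{P(U=u)}=\frac{\prod_{i=1}^nP(Y_i=y_i)}{P(U=u)}\]

  • The second equality holds because \(U\) is a function of the sample: whenever \(u = g(y_1,...,y_n)\), the event \(\{Y_1=y_1,...,Y_n=y_n\}\) already implies \(\{U=u\}\). The last equality uses independence.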

  • So, to establish that \(U\) is sufficient, it is enough to show that the following ratio is \(\theta\)-free:
    • Discrete: \(\frac{\prod_{i=1}^nP(Y_i=y_i)}{P(U=u)}\mbox{ is } \theta\mbox{-free}\)
    • Continuous: \(\frac{\prod_{i=1}^nf_Y(y_i)}{f_U(u)} \mbox{ is } \theta\mbox{-free}\)

Functions of sufficient statistics

  • Important fact: Any one-to-one function of a sufficient statistic is also sufficient.
  • E.g., if \(\sum_{i=1}^n Y_i\) is a sufficient statistic, so is \(\bar Y\).
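
One way to see why this holds: a one-to-one function of \(U\) groups the possible samples into exactly the same sets as \(U\) does, so conditioning on either is the same. The sketch below checks this by brute force for all Bernoulli samples with \(n = 4\); the helper name `partition` and the contrasting statistic (the first observation) are illustrative assumptions, not part of the notes.

```python
from itertools import product
from collections import defaultdict

def partition(samples, statistic):
    """Group samples by the value of the statistic; return the set of groups."""
    groups = defaultdict(list)
    for y in samples:
        groups[statistic(y)].append(y)
    return {frozenset(g) for g in groups.values()}

samples = list(product([0, 1], repeat=4))        # all 0/1 samples with n = 4
by_sum = partition(samples, sum)                  # U = sum of Y_i
by_mean = partition(samples, lambda y: sum(y) / len(y))  # Ybar = U / n, one-to-one in U
by_first = partition(samples, lambda y: y[0])     # a different statistic, for contrast

print(by_sum == by_mean)    # sum and mean induce the same grouping of samples
print(by_sum == by_first)   # the first observation induces a different grouping
```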

Sum of Bernoullis example

  • Recall the previous example, where \(Y_1,...,Y_n\stackrel{i.i.d.}{\sim}BERN(p)\).
  • Show that \(U = \sum_{i=1}^n Y_i\) is sufficient for \(p\).

Proof:

  • We have shown that \(U\sim BIN(n,p)\) (MGF method)
  • Thus:

\[\scriptsize \frac{\prod_{i=1}^nP(Y_i=y_i)}{P(U=u)} = \frac{\prod_{i=1}^n p^{y_i}(1-p)^{1-y_i}}{{n\choose u}p^u (1-p)^{n-u}} = \frac{ p^{\sum_{i=1}^n y_i}(1-p)^{n-\sum_{i=1}^n y_i}}{{n\choose u}p^u (1-p)^{n-u}}\]

\[\scriptsize = \frac{ p^{ u}(1-p)^{n-u}}{{n\choose u}p^u (1-p)^{n-u}} = \frac{1}{{n\choose u}} \Leftarrow p\mbox{-free}\]

\[\scriptsize \therefore U = \sum_{i=1}^n Y_i \mbox{ is sufficient for } p.\]
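
The result can also be checked numerically. The following is a minimal sketch, assuming a small hypothetical sample with \(n = 5\): it computes \(P(U=u)\) by brute-force summation over all 0/1 sequences rather than by the binomial formula, so the check does not presuppose the answer. The function name `conditional_prob` is illustrative.

```python
from itertools import product
from math import comb, isclose

def conditional_prob(y, p):
    """P(Y_1=y_1,...,Y_n=y_n | U=u) for i.i.d. Bernoulli(p), where u = sum(y)."""
    n, u = len(y), sum(y)
    joint = p**u * (1 - p)**(n - u)   # product of the Bernoulli pmfs
    # P(U = u): sum the joint pmf over every 0/1 sequence whose total is u
    p_u = sum(
        p**sum(z) * (1 - p)**(n - sum(z))
        for z in product([0, 1], repeat=n)
        if sum(z) == u
    )
    return joint / p_u

y = (1, 0, 0, 1, 0)   # hypothetical sample: n = 5, u = 2
for p in (0.1, 0.5, 0.9):
    # Each value should agree with 1 / comb(5, 2), whatever p is
    print(p, conditional_prob(y, p), 1 / comb(len(y), sum(y)))
```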

Normal example

  • Suppose \(Y_1,...,Y_n\stackrel{i.i.d.}{\sim}N(\mu,\sigma^2)\).
  • Show that \(U = \sum_{i=1}^n Y_i\) is sufficient for \(\mu\). Is it sufficient for \(\sigma^2\)?
  • We have shown that \(U\sim N(n\mu,n\sigma^2)\) (MGF method)
  • Thus:

\[\scriptsize \frac{\prod_{i=1}^nf_Y(y_i)}{f_U(u)} = \frac{\prod_{i=1}^n \frac{1}{\sqrt{2\pi\sigma^2}} e^{-\frac{(y_i-\mu)^2}{2\sigma^2}}}{\frac{1}{\sqrt{2\pi n\sigma^2}} e^{-\frac{(u-n\mu)^2}{2n\sigma^2}}} = \frac{\left(\frac{1}{\sqrt{2\pi\sigma^2}}\right)^n e^{-\sum_{i=1}^n\frac{(y_i-\mu)^2}{2\sigma^2}}}{\frac{1}{\sqrt{2\pi n\sigma^2}} e^{-\frac{(u-n\mu)^2}{2n\sigma^2}}}\]

\[\scriptsize = \frac{\left(\frac{1}{\sqrt{2\pi\sigma^2}}\right)^n e^{-\frac{\sum_i y_i^2-2\mu u +n\mu^2 }{2\sigma^2}}}{\frac{1}{\sqrt{2\pi n\sigma^2}} e^{-\frac{u^2 -2un \mu + n^2\mu^2}{2n\sigma^2}}} =\sqrt{n} \left(\frac{1}{\sqrt{2\pi\sigma^2}}\right)^{n-1} e^{-\frac{\sum_i y_i^2 }{2\sigma^2}+ \frac{u^2}{2n\sigma^2}}\Leftarrow \mu\mbox{-free only!}\]

So \(U=\sum_i Y_i\) is sufficient for \(\mu\) (treating \(\sigma^2\) as known), but not for \(\sigma^2\)!
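
As a numerical sanity check on this algebra, the sketch below evaluates the log of the ratio \(\prod_i f_Y(y_i)/f_U(u)\) on a small hypothetical data set, first varying \(\mu\) with \(\sigma^2\) fixed and then varying \(\sigma^2\) with \(\mu\) fixed. The data values and function names are assumptions made for illustration.

```python
import math

def log_normal_pdf(x, mean, var):
    """Log density of N(mean, var) evaluated at x."""
    return -0.5 * math.log(2 * math.pi * var) - (x - mean) ** 2 / (2 * var)

def log_ratio(y, mu, sigma2):
    """log of prod_i f_Y(y_i) / f_U(u), with u = sum(y) and U ~ N(n*mu, n*sigma2)."""
    n, u = len(y), sum(y)
    log_joint = sum(log_normal_pdf(yi, mu, sigma2) for yi in y)
    log_f_u = log_normal_pdf(u, n * mu, n * sigma2)
    return log_joint - log_f_u

y = [1.2, -0.4, 2.5, 0.9]   # hypothetical data

# Varying mu (sigma2 fixed at 1): the ratio should not change
print([round(log_ratio(y, mu, 1.0), 6) for mu in (-3.0, 0.0, 2.0)])

# Varying sigma2 (mu fixed at 0): the ratio does change
print([round(log_ratio(y, 0.0, s2), 6) for s2 in (0.5, 1.0, 4.0)])
```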

The Fisher-Neyman factorization theorem

[Image: R.A. Fisher. Source: University of Adelaide]

[Image: Jerzy Neyman. Source: UC Berkeley]

  • Fisher: invented the notion and definition of sufficiency (1922)
  • Neyman: formalized approach to finding a sufficient statistic and establishing its sufficiency (1935)

Factorization theorem

Let \(f(y_1,...,y_n;\vec{\theta})\) represent the joint probability density function of a random sample drawn from a population governed by parameter vector \(\vec{\theta}\). Then a statistic \(\vec{U}\) (also possibly vector valued) is sufficient for \(\vec{\theta}\) if and only if there exist functions \(g(\vec{u};\vec\theta)\), which depends on the data only through \(\vec{u}\), and \(h(y_1,...,y_n)\), which does not depend on \(\vec\theta\), such that:

\[f(y_1,...,y_n;\vec{\theta}) = g(\vec{u};\vec{\theta}) h(y_1,...,y_n).\]

  • Analogously for jointly discrete data, with joint pmf, need to show:

\[p(y_1,...,y_n;\vec{\theta}) = g(\vec{u};\vec{\theta}) h(y_1,...,y_n).\]

  • The factorization theorem suggests a way to find a “good” (sufficient) statistic and establishes its sufficiency, all in one step.

Example: Bernoulli

  • Suppose \(Y_1,...,Y_n\stackrel{i.i.d.}{\sim}BERN(p)\)
  • Use factorization theorem to find sufficient statistic for \(p\).

\[ p(y_1,y_2,...,y_n;p) =\prod_{i=1}^n p^{y_i}(1-p)^{1-y_i} = p^{\sum_{i=1}^n y_i}(1-p)^{n-\sum_{i=1}^n y_i}= \underbrace{p^u(1-p)^{n-u}}_{g(u;p)} \times \underbrace{1}_{h(y_1,...,y_n)} \]

\[\therefore U = \sum_{i=1}^n Y_i \mbox{ is sufficient for }p\]

Example: Normal

  • Suppose \(Y_1,...,Y_n\stackrel{i.i.d.}{\sim}N(\mu,\sigma^2)\)
  • Use factorization theorem to find sufficient statistic for \(\{\mu,\sigma^2\}\).

\[ \prod_{i=1}^nf_Y(y_i;\mu,\sigma^2) = \prod_{i=1}^n \frac{1}{\sqrt{2\pi\sigma^2}} e^{-\frac{(y_i-\mu)^2}{2\sigma^2}} =\left(\frac{1}{\sqrt{2\pi\sigma^2}}\right)^n e^{-\sum_{i=1}^n\frac{(y_i-\mu)^2}{2\sigma^2}} \]

\[ = \underbrace{(\sigma^2)^{-n/2} e^{-\frac{\sum_i y_i^2-2\mu\sum_i y_i +n\mu^2 }{2\sigma^2}}}_{g\left(u = (\sum_i y_i,\sum_i y_i^2);\theta=(\mu,\sigma^2)\right)} \underbrace{(2\pi)^{-n/2}}_{h(y_1,...,y_n)}\] \[\therefore U = \left(\sum_i Y_i,\sum_i Y_i^2\right) \mbox{ is jointly sufficient for }(\mu,\sigma^2)\]

Is \(\sum_{i=1}^n Y_i\) alone sufficient for \(\mu\)?

\[ \prod_{i=1}^nf_Y(y_i;\mu)= \underbrace{e^{-\frac{-2\mu\sum_i y_i +n\mu^2 }{2\sigma^2}}}_{g(u = \sum_i y_i;\mu)} \underbrace{e^{-\frac{\sum_i y_i^2}{2\sigma^2}}(2\pi\sigma^2)^{-n/2}}_{h(y_1,...,y_n)}\]

\[\therefore U = \sum_i Y_i \mbox{ is sufficient for }\mu\]

Is \(\sum_{i=1}^n Y_i^2\) alone sufficient for \(\sigma^2\)?

\[ \prod_{i=1}^nf_Y(y_i;\sigma^2)=e^{-\frac{\sum_i y_i^2}{2\sigma^2}} e^{-\frac{n\mu^2 }{2\sigma^2}}(2\pi\sigma^2)^{-n/2} \times \underbrace{e^{-\frac{-2\mu\sum_i y_i }{2\sigma^2}}}_{\mbox{cannot factor } \sum_i y_i \mbox{ away from }\sigma^2}\]

\[\therefore U = \sum_i Y_i^2 \mbox{ alone is NOT sufficient for }\sigma^2\]

This makes sense, given that the classic sample variance estimator

\[\hat\sigma^2 = \frac{\sum_{i=1}^n (Y_i -\bar Y)^2}{n-1}=\frac{\sum_{i=1}^n Y_i^2 -n\bar Y^2}{n-1}\]

makes use of both \(\sum_{i=1}^n Y_i^2\) and \(\sum_{i=1}^n Y_i\).
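
To illustrate the joint sufficiency of \(\left(\sum_i Y_i,\sum_i Y_i^2\right)\), the sketch below uses two hypothetical data sets that differ but share the same sum (12) and the same sum of squares (62). Under the factorization above, their normal likelihoods should coincide for every \((\mu,\sigma^2)\), and the sample variance should be recoverable from the two sums alone. The function names are illustrative assumptions.

```python
import math
import statistics

def normal_log_lik(y, mu, sigma2):
    """Log of the joint N(mu, sigma2) density of the sample y."""
    n = len(y)
    sq_dev = sum((yi - mu) ** 2 for yi in y)
    return -0.5 * n * math.log(2 * math.pi * sigma2) - sq_dev / (2 * sigma2)

def sample_var_from_sums(n, s1, s2):
    """Sample variance computed only from n, s1 = sum(y) and s2 = sum(y^2)."""
    ybar = s1 / n
    return (s2 - n * ybar ** 2) / (n - 1)

a = [1.0, 5.0, 6.0]
b = [2.0, 3.0, 7.0]   # different data, same sum and same sum of squares as a

# The two log-likelihoods should agree at every parameter value tried
for mu, sigma2 in [(0.0, 1.0), (4.0, 2.5), (-1.0, 9.0)]:
    print(mu, sigma2, math.isclose(normal_log_lik(a, mu, sigma2),
                                   normal_log_lik(b, mu, sigma2)))

# The variance from the two sums should match the direct calculation
print(sample_var_from_sums(len(a), sum(a), sum(x * x for x in a)),
      statistics.variance(a))
```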

Example: uniform

  • When \(\theta\) governs the support, we need to incorporate this with indicator functions.

    \[1(\mbox{condition}) = \begin{cases} 1 & \mbox{condition satisfied} \\ 0 & \mbox{otherwise} \end{cases}\]

  • Suppose \(Y_1,...,Y_n\stackrel{i.i.d.}{\sim}UNIF(0,\theta)\)

  • Use factorization theorem to find sufficient statistic for \(\theta\).

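\[ \prod_{i=1}^nf_Y(y_i;\theta) = \prod_{i=1}^n \frac{1}{\theta}\cdot 1(0<y_i<\theta) = \underbrace{\frac{1}{\theta^n}\cdot 1(y_{(n)}< \theta)}_{g(u;\theta)}\cdot \underbrace{1(y_{(1)}>0)}_{h(y_1,...,y_n)} \]

where \(y_{(1)}\) and \(y_{(n)}\) denote the sample minimum and maximum: all \(n\) indicators equal 1 exactly when \(y_{(1)}>0\) and \(y_{(n)}<\theta\).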

\[\therefore U = Y_{(n)}\mbox{ is sufficient for }\theta\]

  • Note that the indicator function for the support is always there in practice, but it is automatically part of the \(h(\cdot)\) function if the support does not depend on \(\theta\).
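
The uniform factorization can be sketched in code as follows. This is a minimal illustration, assuming two hypothetical samples that share the same maximum; the function name `uniform_lik` is not from the notes. Because \(g\) depends on the data only through the maximum, the two samples should give the same likelihood at every \(\theta\), including the values of \(\theta\) at which the indicator switches the likelihood to zero.

```python
def uniform_lik(y, theta):
    """Joint density of an i.i.d. UNIF(0, theta) sample, written with indicators."""
    n = len(y)
    g = (1.0 / theta ** n) * (max(y) < theta)   # depends on the data only through max(y)
    h = float(min(y) > 0)                       # theta-free part
    return g * h

a = [0.3, 2.1, 1.7, 0.9]
b = [2.1, 0.1, 0.2, 0.05]   # different sample, same maximum as a

for theta in (1.5, 2.1, 2.5, 4.0):
    # The likelihood is zero unless theta exceeds the sample maximum
    print(theta, uniform_lik(a, theta), uniform_lik(b, theta))
```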