MLE and sufficient statistics

The theorem states:

Let \(X_1, X_2,...,X_n\) denote a random sample from a distribution that has pdf or pmf \(f(x; \theta), \theta \in \Omega.\) If a sufficient statistic \(Y_1 = u_1(X_1, X_2,...,X_n)\) for \(\theta\) exists and if a maximum likelihood estimator \(\hat{\theta}\) of \(\theta\) also exists uniquely, then \(\hat{\theta}\) is a function of \(Y_1 = u_1(X_1, X_2,...,X_n)\).

Proof:

Since \(Y_1\) is a sufficient statistic, let \(f_{Y_1}(y_1; \theta)\) be the pdf or pmf of \(Y_1\). By the Neyman factorization theorem we have:

\[\prod_{i=1}^nf(x_i;\theta)=f_{Y_1}(y_1;\theta)H(x_1,x_2,...,x_n)\] Since \(H(x_1,x_2,...,x_n)\) does not depend on \(\theta\), maximizing the left side over \(\theta\) for fixed \(x_1,...,x_n\) is equivalent to maximizing \(f_{Y_1}(y_1;\theta)\), which involves the data only through the value of the sufficient statistic \(y_1\). The unique maximizer \(\hat{\theta}\) therefore depends on \(x_1,...,x_n\) only through \(y_1\); that is, \(\hat{\theta}\) is a function of the sufficient statistic \(Y_1=u_1(X_1, X_2,...,X_n)\). This proves the theorem.
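As a standard illustration of the theorem, suppose \(X_1, X_2,...,X_n\) is a random sample from an exponential distribution with pdf \(f(x;\theta)=\theta e^{-\theta x}\) for \(x>0\). The joint pdf is

\[\prod_{i=1}^n f(x_i;\theta)=\theta^n e^{-\theta\sum_{i=1}^n x_i},\]

which depends on the data only through \(y_1=\sum_{i=1}^n x_i\), so \(Y_1=\sum_{i=1}^n X_i\) is a sufficient statistic. The log-likelihood \(n\log\theta-\theta y_1\) has the unique maximizer \(\hat{\theta}=n/y_1\), so the MLE is indeed a function of the sufficient statistic \(Y_1\), as the theorem asserts.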

Order statistics and sufficient statistics

Suppose we have a random sample \(X_1,X_2,X_3\) from a distribution with pdf

\[f(x;\theta)=e^{-(x-\theta)}I_{(\theta,\infty)}(x)\] Then we can show that \(\min(x_i)\) is a sufficient statistic for \(\theta\) while \(\max(x_i)\) is not sufficient for \(\theta\):

First, we write the joint pdf of \(X_1,X_2,X_3\) as

\[\begin{align*} f(x_1;\theta)f(x_2;\theta)f(x_3;\theta)&=e^{-(x_1-\theta)}I_{(\theta,\infty)}(x_1)\times e^{-(x_2-\theta)}I_{(\theta,\infty)}(x_2)\times e^{-(x_3-\theta)}I_{(\theta,\infty)}(x_3)\\&=e^{-(x_1-\theta)}e^{-(x_2-\theta)}e^{-(x_3-\theta)}I_{(\theta,\infty)}(x_1)I_{(\theta,\infty)}(x_2)I_{(\theta,\infty)}(x_3) \end{align*}\]

The key step is that we can write \[I_{(\theta,\infty)}(x_1)I_{(\theta,\infty)}(x_2)I_{(\theta,\infty)}(x_3)=I_{(\theta,\infty)}(\min(x_i))\tag{1}\]

Notice that the indicator function \(I\) only takes the values 0 and 1.

Situation 1:

If \(\theta<\min(x_i)<\infty\), then the right side of equation \((1)\) is 1. The left side is then guaranteed to be 1 as well: since \(\min(x_i)\) lies in \((\theta,\infty)\), all of \(x_1,x_2,x_3\) lie in the interval, so each indicator in the product is 1.

Situation 2:

If \(\min(x_i)\) is not within the interval \((\theta,\infty)\), the right side of equation \((1)\) is 0. The left side of equation \((1)\) is also 0: the \(x_i\) that equals \(\min(x_i)\) lies outside \((\theta,\infty)\), so its indicator is 0 and the whole product of indicator functions is 0.

Therefore, equation \((1)\) holds in both situations.

However, we cannot write the product of the indicator functions as:

\[I_{(\theta,\infty)}(x_1)I_{(\theta,\infty)}(x_2)I_{(\theta,\infty)}(x_3)=I_{(\theta,\infty)}(\max(x_i))\tag{2}\] because the right side of equation \((2)\) does not always equal the left side.

If \(\max(x_i)\) lies in the interval \((\theta,\infty)\), the right side of \((2)\) is 1, but this does not guarantee that \(x_1,x_2,x_3\) are all inside the interval, so the left side may still be 0. For instance, with \(\theta=0\) and \((x_1,x_2,x_3)=(-1,1,2)\), the right side of \((2)\) is 1 while the left side is 0.

Therefore, the joint pdf factors as \[\prod_{i=1}^3 f(x_i;\theta)=\left[e^{3\theta}I_{(\theta,\infty)}(\min(x_i))\right]\times e^{-(x_1+x_2+x_3)},\] where the second factor does not depend on \(\theta\), so by the factorization theorem \(\min(x_i)\) is a sufficient statistic for this distribution, while \(\max(x_i)\) is not.

The key to understanding this example is how the product of the indicator functions collapses to a single indicator.
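The collapse of the indicator product can also be checked numerically. The following is a minimal sketch in Python (the use of NumPy, standard normal draws for \(\theta\) and the \(x_i\), and the number of trials are illustrative choices): it confirms that identity \((1)\) holds on every draw, while identity \((2)\) fails on some draws.

```python
import numpy as np

rng = np.random.default_rng(0)

def ind(x, theta):
    """Indicator of the interval (theta, infinity), applied elementwise."""
    return (np.asarray(x) > theta).astype(float)

# Compare the product of indicators with the single indicator evaluated
# at min(x_i) (equation (1)) and at max(x_i) (equation (2))
# over many randomly chosen theta and (x_1, x_2, x_3).
eq1_holds, eq2_holds = True, True
for _ in range(10_000):
    theta = rng.normal()
    x = rng.normal(size=3)
    lhs = ind(x, theta).prod()
    eq1_holds &= (lhs == ind(x.min(), theta))
    eq2_holds &= (lhs == ind(x.max(), theta))

print(bool(eq1_holds))  # True: equation (1) holds for every draw
print(bool(eq2_holds))  # False (with overwhelming probability): equation (2)
                        # fails whenever min(x_i) <= theta < max(x_i)
```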

Another example

Suppose the \(X_i\), \(i=1,...,n\), are uniformly distributed on \([0, \theta]\), where \(\theta\) is unknown. We show that \(\max(x_i)\) is a sufficient statistic for \(\theta\).

The pdf of the uniform distribution on \([0, \theta]\) is \(f(x;\theta)=\frac{1}{\theta}I_{[0, \theta]}(x)\).

Then the joint density is

\[\begin{align*} f(x_1, ..., x_n;\theta) &=\frac{1}{\theta}I_{[0, \theta]}(x_1)\frac{1}{\theta}I_{[0, \theta]}(x_2)\cdots\frac{1}{\theta}I_{[0, \theta]}(x_n)\\&=\frac{1}{\theta^n}I_{[0, \theta]}(x_1)I_{[0, \theta]}(x_2)\cdots I_{[0, \theta]}(x_n)\\&=\frac{1}{\theta^n}I_{[0,\theta]}(\max(x_i))\times 1 \end{align*}\]

The key is that, since each \(x_i\ge 0\), we can write \[\frac{1}{\theta^n}I_{[0, \theta]}(x_1)I_{[0, \theta]}(x_2)\cdots I_{[0, \theta]}(x_n)=\frac{1}{\theta^n}I_{[0,\theta]}(\max(x_i)) \tag{3}\]

For \(x_i \ge 0\): if \(\max(x_i)\) lies in the interval \([0,\theta]\), the indicator on the right side of \((3)\) is 1, and since every \(x_i\) then lies in \([0,\theta]\), each indicator on the left side is 1 as well. If \(\max(x_i)\) is not in \([0,\theta]\), the indicator on the right side is 0, and the indicator for the \(x_i\) attaining the maximum is 0 on the left side, so the product of indicator functions is 0. Therefore, equation \((3)\) always holds.

Therefore, by the factorization theorem, we have shown that \(\max(x_i)\) is a sufficient statistic for \(\theta\) in this distribution.
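Tying this back to the first theorem: for the uniform model the MLE is \(\hat{\theta}=\max(x_i)\), which is a function of the sufficient statistic. The following minimal Python sketch illustrates this numerically (the grid search and the particular sample values are illustrative choices, not part of the example above); two samples with different observations but the same maximum give the same maximizer.

```python
import numpy as np

def likelihood(theta, x):
    """Joint density of a Uniform[0, theta] sample, viewed as a function of theta."""
    x = np.asarray(x)
    if theta <= 0 or x.min() < 0 or x.max() > theta:
        return 0.0                 # some observation falls outside [0, theta]
    return theta ** (-len(x))      # (1/theta)^n on the admissible region

grid = np.linspace(0.01, 10.0, 2000)

sample_a = [0.3, 4.2, 1.1, 2.5]
sample_b = [4.2, 0.9, 3.8, 0.1]    # different observations, same maximum 4.2

mle_a = grid[np.argmax([likelihood(t, sample_a) for t in grid])]
mle_b = grid[np.argmax([likelihood(t, sample_b) for t in grid])]

print(mle_a, mle_b)  # both are approximately 4.2 = max(x_i): the maximizer
                     # depends on the data only through max(x_i)
```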

Factorization theorem for several parameters

Let \(X_1, X_2, \ldots, X_n\) denote random variables with a joint p.d.f. (or joint p.m.f.):

\[f(x_1,x_2, ... ,x_n; \theta_1, \theta_2)\] which depends on the parameters \(\theta_1\) and \(\theta_2\). Then, the statistics \(Y_1=u_1(X_1, X_2, ... , X_n)\) and \(Y_2=u_2(X_1, X_2, ... , X_n)\) are joint sufficient statistics for \(\theta_1\) and \(\theta_2\) if and only if:

\[f(x_1, x_2, ... , x_n;\theta_1, \theta_2) =\phi\left[u_1(x_1, ... , x_n), u_2(x_1, ... , x_n);\theta_1, \theta_2 \right] h(x_1, ... , x_n)\]

where:

\(\phi\) is a function that depends on the data \((x_1, x_2, ... , x_n)\) only through the functions \(u_1(x_1, x_2, ... , x_n)\) and \(u_2(x_1, x_2, ... , x_n)\), and the function \(h(x_1, ... , x_n)\) does not depend on either of the parameters \(\theta_1\) or \(\theta_2\).
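For example (a standard illustration of this theorem), suppose \(X_1, X_2, \ldots, X_n\) is a random sample from a \(N(\theta_1, \theta_2)\) distribution with mean \(\theta_1\) and variance \(\theta_2\). Expanding \(\sum_{i=1}^n(x_i-\theta_1)^2=\sum_{i=1}^n x_i^2-2\theta_1\sum_{i=1}^n x_i+n\theta_1^2\), the joint pdf can be written as

\[f(x_1, ..., x_n;\theta_1,\theta_2)=\left(2\pi\theta_2\right)^{-n/2}\exp\!\left(-\frac{\sum_{i=1}^n x_i^2-2\theta_1\sum_{i=1}^n x_i+n\theta_1^2}{2\theta_2}\right)\times 1,\]

so with \(h(x_1, ..., x_n)=1\) the factor \(\phi\) depends on the data only through \(u_1=\sum_{i=1}^n x_i\) and \(u_2=\sum_{i=1}^n x_i^2\). Hence \(Y_1=\sum_{i=1}^n X_i\) and \(Y_2=\sum_{i=1}^n X_i^2\) are joint sufficient statistics for \(\theta_1\) and \(\theta_2\).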

References

1. Moore, D. S. (1971). Maximum likelihood and sufficient statistics. The American Mathematical Monthly, 78(1), 50–52.

2. Hogg, R. V., McKean, J. W., & Craig, A. T. Introduction to Mathematical Statistics, 8th edition, p. 425.

3. https://online.stat.psu.edu/stat415/lesson/24/24.4