Statistics 2: Topic 2 (continued)

Author

Minerva Mukhopadhyay

Published

February 13, 2025

So far we have been introduced to the concepts of unbiasedness and efficiency.

Naturally, one would ask whether an (unbiased and) efficient estimator of \(\psi(\boldsymbol{\theta})\) is the best possible estimator of \(\psi(\boldsymbol{\theta})\). To answer this, we first need to say what we mean by the best estimator of \(\psi(\boldsymbol{\theta})\).

An estimator of \(\psi(\boldsymbol{\theta})\) is best if it minimizes the error in estimating \(\psi(\boldsymbol{\theta})\) on average (for all \(\boldsymbol{\theta}\)).

Definition [Mean-Squared Error (MSE)]

Let \(T({\bf X})\) be an estimator of \(\psi(\boldsymbol{\theta})\). The MSE of \(T({\bf X})\) in estimating \(\psi(\boldsymbol{\theta})\) is given by

\[ E_{\boldsymbol{\theta}} \left[ \left\{ T({\bf X}) - \psi(\boldsymbol{\theta}) \right\}^{2} \right] . \]

An estimator of \(\psi(\boldsymbol{\theta})\) is best if it minimizes the MSE in estimating \(\psi(\boldsymbol{\theta})\) (for all \(\boldsymbol{\theta}\)).
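Recall the standard bias-variance decomposition of the MSE (a worked identity added here for completeness): writing \(b(\boldsymbol{\theta}) = E_{\boldsymbol{\theta}}\{T({\bf X})\} - \psi(\boldsymbol{\theta})\) for the bias,

\[ E_{\boldsymbol{\theta}} \left[ \left\{ T({\bf X}) - \psi(\boldsymbol{\theta}) \right\}^{2} \right] = \mathrm{var}_{\boldsymbol{\theta}}\{T({\bf X})\} + b(\boldsymbol{\theta})^{2}. \]

So for an unbiased estimator (\(b(\boldsymbol{\theta}) = 0\) for all \(\boldsymbol{\theta}\)) the MSE is just the variance, and among unbiased estimators the best one is the one with uniformly smallest variance. This motivates the following definition.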

Definition [UMVUE]

An estimator \(T({\bf X})\) of \(\theta\) is called a uniformly minimum variance unbiased estimator (UMVUE) if

  1. \(E_{\theta}\{ T({\bf X})\}=\theta\) for all \(\theta\), and

  2. for any estimator \(T^{\prime}({\bf X})\) with \(E_{\theta}\{ T^{\prime}({\bf X})\}=\theta\), \(\mathrm{var}_{\theta}\{T({\bf X})\}\leq \mathrm{var}_{\theta}\{T^{\prime}({\bf X})\}\) for all \(\theta\).

    Note: The UMVUE, when it exists, is unique.

    Note: If an (unbiased and) efficient estimator exists, then it must be the UMVUE. However, the converse is not true: a UMVUE need not be efficient, i.e., it need not attain the Cramér-Rao lower bound.

How to find the UMVUE?

One way to find the UMVUE is through a complete-sufficient statistic (CSS).

Result: Let \(T({\bf X})\) be a complete-sufficient statistic, and suppose there exists a function \(g\) such that \(E_{\boldsymbol{\theta}} \left[ g(T({\bf X})) \right] = \psi(\boldsymbol{\theta})\) for all \(\boldsymbol{\theta}\), i.e., \(g(T({\bf X}))\) is an unbiased estimator of \(\psi(\boldsymbol{\theta})\). Then \(g(T({\bf X}))\) is the UMVUE of \(\psi(\boldsymbol{\theta})\).

Example: Let \(X_{1}, \ldots, X_{n}\) be a random sample from \(\mathrm{uniform}(0, \theta)\). Suppose it is known that the largest order statistic \(X_{(n)} = \max\{ X_{1}, \ldots, X_{n}\}\) is a complete-sufficient statistic.

  • Verify that \(E_{\theta} (X_{(n)}) = n \theta/ (n+1)\), for all \(\theta\), which implies
    \[ E_{\theta} \left[\frac{n+1}{n} X_{(n)} \right] = \theta.\]

  • Therefore, by the above result, \(T({\bf X}) = (n+1) X_{(n)}/n\) is the UMVUE of \(\theta\) (a worked verification and a simulation check follow below).
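The expectation in the first bullet can be verified directly (a short derivation added for completeness): the pdf of \(X_{(n)}\) is \(f_{X_{(n)}}(x) = n x^{n-1}/\theta^{n}\) for \(0 < x < \theta\), so

\[ E_{\theta} (X_{(n)}) = \int_{0}^{\theta} x \cdot \frac{n x^{n-1}}{\theta^{n}} \, dx = \frac{n}{\theta^{n}} \cdot \frac{\theta^{n+1}}{n+1} = \frac{n \theta}{n+1}. \]

The following minimal simulation sketch (not from the notes; the values of \(n\), \(\theta\), and the number of replications are arbitrary choices) checks the unbiasedness of \(T({\bf X})\) and compares its variance with that of \(2\bar{X}\), another unbiased estimator of \(\theta\):

```python
import numpy as np

rng = np.random.default_rng(0)
n, theta, reps = 10, 2.0, 100_000           # arbitrary illustrative choices

samples = rng.uniform(0.0, theta, size=(reps, n))
umvue = (n + 1) / n * samples.max(axis=1)   # T(X) = (n+1) X_(n) / n
mom = 2.0 * samples.mean(axis=1)            # 2 * Xbar, also unbiased

print("mean of UMVUE :", umvue.mean())   # close to theta = 2.0
print("mean of 2*Xbar:", mom.mean())     # close to theta = 2.0
print("var of UMVUE  :", umvue.var())    # close to theta^2/(n(n+2)) ~ 0.033
print("var of 2*Xbar :", mom.var())      # close to theta^2/(3n) ~ 0.133, larger
```

As expected, both estimators are (approximately) unbiased, but the UMVUE has a much smaller variance.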

How to identify a complete-sufficient statistic?

  • In general, it is not easy to find a complete-sufficient statistic (the general theory is beyond the scope of this course).

  • However, when the class of distributions under consideration belongs to the exponential family, there is a simple way out.

Definition [Exponential family]:

A family of pmfs or pdfs is called a \(k\)-parameter exponential family if it can be expressed as

\[ f_{\boldsymbol{\theta}}(x) = \exp \left\{h(x) + c(\boldsymbol{\theta}) + \sum_{i=1}^{k} w_{i} (\boldsymbol{\theta}) T_{i}(x)\right\} . \] Here \(h, T_{1}, \ldots, T_{k}\) are real-valued functions of \(x\), not depending on \(\boldsymbol{\theta}\). Further, \(c(\boldsymbol{\theta}), w_{1} (\boldsymbol{\theta}), \ldots, w_{k} (\boldsymbol{\theta})\) are real-valued functions of \(\boldsymbol{\theta}\), not depending on \(x\).
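For instance (a standard illustration added here; it is not worked out in the notes), the \(\mathrm{Poisson}(\lambda)\) pmf forms a one-parameter exponential family:

\[ f_{\lambda}(x) = \frac{e^{-\lambda} \lambda^{x}}{x!} = \exp \left\{ -\log (x!) - \lambda + x \log \lambda \right\}, \quad x = 0, 1, 2, \ldots, \]

with \(h(x) = -\log (x!)\), \(c(\lambda) = -\lambda\), \(w_{1}(\lambda) = \log \lambda\), and \(T_{1}(x) = x\).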

Result: Let \(X_{1},\ldots, X_{n}\) be a random sample from a distribution with pmf or pdf \(f_{\boldsymbol{\theta}}, ~ \boldsymbol{\theta} \in \boldsymbol{\Theta}\), belonging to a \(k\)-parameter exponential family for which the set \(\{(w_{1}(\boldsymbol{\theta}), \ldots, w_{k}(\boldsymbol{\theta})) : \boldsymbol{\theta} \in \boldsymbol{\Theta}\}\) contains an open subset of \(\mathbb{R}^{k}\). Then the statistic \({\bf T}({\bf X})\) is jointly complete-sufficient for \(\boldsymbol{\theta}\), where \[ {\bf T}({\bf X}) = \left(\sum_{i=1}^{n} T_{1} (X_{i}), \cdots, \sum_{i=1}^{n} T_{k} (X_{i}) \right). \]

Note: A one-one function of a complete-sufficient statistic is also a complete-sufficient statistic.
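Continuing the Poisson illustration above (again an added example): there \(T_{1}(x) = x\), and \(w_{1}(\lambda) = \log \lambda\) ranges over an open subset of \(\mathbb{R}\) as \(\lambda > 0\) varies, so \(\sum_{i=1}^{n} X_{i}\) is complete-sufficient for \(\lambda\); by the note, so is the one-one function \(\bar{X} = \sum_{i=1}^{n} X_{i}/n\). Since \(E_{\lambda}(\bar{X}) = \lambda\) for all \(\lambda\), the earlier result on complete-sufficient statistics shows that \(\bar{X}\) is the UMVUE of \(\lambda\).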

Example: Let \(X_{1}, \ldots, X_{n}\) be a random sample from the \(\mathrm{normal}(\mu, \sigma^{2})\) distribution.

  • Express the joint pdf of \(X_{1}, \ldots, X_{n}\) in the form of an exponential family.

  • Hence show that \((\sum_{i=1}^{n} X_{i} , \sum_{i=1}^{n} {X^{2}_{i}})\) is jointly complete-sufficient.

  • Using the above result, find the UMVUEs of \(\mu\) and \(\sigma^{2}\) (a simulation sketch for checking the answers follows below).
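For a numerical check of the answers (an added sketch, not part of the notes): the UMVUEs turn out to be \(\bar{X}\) and the sample variance \(S^{2} = \sum_{i=1}^{n} (X_{i} - \bar{X})^{2}/(n-1)\), since both are unbiased and are functions of the complete-sufficient statistic \((\sum_{i=1}^{n} X_{i}, \sum_{i=1}^{n} X_{i}^{2})\). The values of \(n\), \(\mu\), \(\sigma\), and the number of replications below are arbitrary choices:

```python
import numpy as np

rng = np.random.default_rng(1)
n, mu, sigma, reps = 8, 1.5, 2.0, 200_000   # arbitrary illustrative choices

samples = rng.normal(mu, sigma, size=(reps, n))
xbar = samples.mean(axis=1)          # sample mean, UMVUE of mu
s2 = samples.var(axis=1, ddof=1)     # sample variance (n-1 divisor), UMVUE of sigma^2

print("mean of Xbar:", xbar.mean())  # close to mu = 1.5
print("mean of S^2 :", s2.mean())    # close to sigma^2 = 4.0
```

Both averages settle near the true parameter values, consistent with unbiasedness.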