Statistics 2 Topic 2
Statistical Inference: Point Estimation
- We learn some desirable properties of estimators.
(I) Key Takeaways
Let \(X_{1}, \ldots, X_{n}\) be a random sample from some population characterized by \(F_{\boldsymbol{\theta}}\) (i.e., \(X_{1}, \ldots, X_{n}\) are iid samples from \(F_{\boldsymbol{\theta}}\)), and we are interested in estimating a function of \(\boldsymbol{\theta}\), say \(\psi = \psi(\boldsymbol{\theta})\).
Definition (Statistic): Any function of the random sample \(X_{1}, \ldots, X_{n}\) that does not depend on the unknown parameter \(\boldsymbol{\theta}\) is called a statistic.
Definition (Estimator): If the statistic \(T({\bf X})\) is used to estimate \(\psi = \psi(\boldsymbol{\theta})\), then \(T({\bf X})\) is called an estimator of \(\psi(\boldsymbol{\theta})\).
For a particular realization of \(X_{1}, \ldots, X_{n}\), say, \(x_{1}, \ldots, x_{n}\), the value of \(T({\bf x})\) is called an estimate of \(\psi(\boldsymbol{\theta})\).
Note: Like the random sample, a statistic (in particular, an estimator) \(T({\bf X})\) is also a random variable. The distribution of \(T({\bf X})\) is called the sampling distribution of \(T({\bf X})\).
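To make these definitions concrete, here is a minimal Python sketch (illustrative only; the population parameters, sample size, and the function name `T` are my own choices, not part of the notes) showing a statistic evaluated on one realization to produce an estimate:

```python
import numpy as np

rng = np.random.default_rng(2024)

# One realization x_1, ..., x_n from an assumed Binomial(m = 20, theta = 0.3) population
x = rng.binomial(n=20, p=0.3, size=15)

def T(sample):
    """A statistic: the sample mean divided by m = 20 (a function of the sample only)."""
    return sample.mean() / 20

estimate = T(x)  # the estimate: one realized value of the estimator T(X)
print(estimate)
```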
Example: Let \(X_{1}, \ldots, X_{n}\) be a random sample of size \(n\) from the \(\mathtt{Binomial}(m,\theta)\) distribution. Consider the following three estimators of \(\theta\):
\[ T_{1} ({\bf X}) = \frac{1}{m}\bar{X}_{n}, \quad T_{2} ({\bf X}) = \frac{S_{n}^{2}}{m} +\frac{\bar{X}_{n}^{2}}{m^{2}}, \quad \text{and} \quad T_{3} ({\bf X}) = \frac{1}{m}X_{1}.\]
We can obtain the sampling distributions of these statistics analytically as well as by simulation (a simulation sketch is given after the list below).
- For \(m=20\) and different choices of \(\theta\), the sampling distribution of \(T_{1}({\bf X})\) (calculated based on \(n = 15\) random samples) is described in the following histograms.
- For \(m=20\) and different choices of \(\theta\), the sampling distribution of \(T_{2}({\bf X})\) (calculated based on \(n = 15\) random samples) is described in the following histograms.
- For \(m=20\) and different choices of \(\theta\), the sampling distribution of \(T_{3}({\bf X})\) (calculated based on \(n = 15\) random samples) is described in the following histograms.
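Such histograms can be generated by simulation. Below is a minimal sketch of the simulation just described (the function name, the number of replications, and the plotting choices are my own; \(S_{n}^{2}\) is taken to be the divisor-\(n\) sample variance):

```python
import numpy as np
import matplotlib.pyplot as plt

def sampling_distribution(theta, m=20, n=15, reps=10_000, seed=0):
    """Simulate the sampling distributions of T1, T2, T3 for Binomial(m, theta) samples."""
    rng = np.random.default_rng(seed)
    X = rng.binomial(m, theta, size=(reps, n))  # reps independent samples of size n
    xbar = X.mean(axis=1)
    s2 = X.var(axis=1)                          # divisor-n sample variance S_n^2
    T1 = xbar / m
    T2 = s2 / m + xbar**2 / m**2
    T3 = X[:, 0] / m
    return T1, T2, T3

theta = 0.5
T1, T2, T3 = sampling_distribution(theta)
fig, axes = plt.subplots(1, 3, figsize=(12, 3), sharex=True)
for ax, T, name in zip(axes, (T1, T2, T3), ("T1", "T2", "T3")):
    ax.hist(T, bins=30)
    ax.axvline(theta, color="red")              # true value of theta for reference
    ax.set_title(name)
plt.show()
```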
The above figures clearly indicate the following:
- \(T_{1}\) and \(T_{3}\) neither overestimate nor underestimate \(\theta\) on average.
- \(T_{2}\) tends to underestimate \(\theta\). The amount of bias increases as \(\theta(1-\theta)\) increases, so the bias is largest when \(\theta = 0.5\).
- The variance of \(T_{1}\) is much smaller than the variances of \(T_{2}\) and \(T_{3}\).
Therefore, based on this behavior of the sampling distributions, one would choose \(T_{1}\) over \(T_{2}\) and \(T_{3}\).
Definition (Unbiasedness): Let \(T({\bf X})\) be an estimator of \(\psi(\boldsymbol{\theta})\). Then the bias in estimating \(\psi(\boldsymbol{\theta})\) by \(T({\bf X})\) is defined as
\[\mathcal{B}_{T,n}(\psi(\boldsymbol{\theta})) = E_{\boldsymbol{\theta}} (T({\bf X})) - \psi(\boldsymbol{\theta}).\]
An estimator \(T({\bf X})\) is called unbiased if \(\mathcal{B}_{T,n} (\psi(\boldsymbol{\theta})) = 0\) for all \(\boldsymbol{\theta}\).
If \(\mathcal{B}_{T,n} (\psi(\boldsymbol{\theta}))>0\) (or, \(<0\)), then \(T({\bf X})\) tends to overestimate (or, underestimate) \(\psi(\boldsymbol{\theta})\).
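For instance, in the binomial example above, taking \(S_{n}^{2}\) to be the divisor-\(n\) sample variance (an assumption about the notation), a direct calculation explains the underestimation seen in the histograms of \(T_{2}\):
\[
E_{\theta}(S_{n}^{2}) = \frac{n-1}{n}\, m\theta(1-\theta), \qquad
E_{\theta}(\bar{X}_{n}^{2}) = \frac{m\theta(1-\theta)}{n} + m^{2}\theta^{2},
\]
so that
\[
E_{\theta}\{T_{2}({\bf X})\} = \frac{n-1}{n}\,\theta(1-\theta) + \frac{\theta(1-\theta)}{mn} + \theta^{2}
= \theta - \frac{(m-1)\,\theta(1-\theta)}{mn},
\qquad
\mathcal{B}_{T_{2},n}(\theta) = -\frac{(m-1)\,\theta(1-\theta)}{mn} < 0.
\]
The bias is proportional to \(\theta(1-\theta)\) and hence largest in magnitude at \(\theta = 0.5\).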
Among unbiased estimators, one would like to choose the one with the smallest variance.
Definition (Relative Efficiency): Let \(T_{1}({\bf X})\) and \(T_{2}({\bf X})\) be two unbiased estimators of \(\psi(\boldsymbol{\theta})\). Then the relative efficiency of \(T_{1}\) with respect to \(T_{2}\) is
\[ \mathcal{RE}_{T_{1}, T_{2}} ( \psi(\boldsymbol{\theta})) = \frac{\text{var}_{\boldsymbol{\theta}}(T_{1})}{\text{var}_{\boldsymbol{\theta}}(T_{2})}.\]
- If \(\mathcal{RE}_{T_{1}, T_{2}} ( \psi(\boldsymbol{\theta}))<1\) (or, \(>1\)) then the estimator \(T_{1}\) is preferred (or, not preferred) over \(T_{2}\).
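For example, in the binomial example above, both \(T_{1}\) and \(T_{3}\) are unbiased for \(\theta\), and
\[
\text{var}_{\theta}(T_{1}) = \frac{\theta(1-\theta)}{mn}, \qquad
\text{var}_{\theta}(T_{3}) = \frac{\theta(1-\theta)}{m}, \qquad
\mathcal{RE}_{T_{1}, T_{3}}(\theta) = \frac{1}{n} < 1,
\]
so \(T_{1}\) is preferred over \(T_{3}\) whenever \(n > 1\).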
What is the minimum possible value of the variance of an unbiased estimator?
- If the underlying distribution \(F_{\boldsymbol{\theta}}\) satisfies some regularity conditions, then it is possible to obtain a lower bound on the variance of any estimator \(T({\bf X})\) of \(\psi(\boldsymbol{\theta})\).
- Suppose \(E_{\theta}\left\{ T({\bf X})\right\}=\psi(\theta)\), and define \(\psi^{\prime}(\theta)= \displaystyle\frac{\partial}{\partial\theta} \psi(\theta)\) and \(I(\theta) = E_{\theta} \left[\left\{ \displaystyle\frac{\partial}{\partial\theta} \log f_{X_{1}} (X_{1} ; \theta) \right\}^{2} \right]\).
- If \(0<I(\theta)<\infty\), then \(T({\bf X})\) satisfies
\[ \text{var}_{\theta}\left\{T({\bf X})\right\} \geq \frac{[\psi^{\prime}(\theta)]^{2}}{n I(\theta)}.\]
- The quantity on the RHS of the above expression is called the Cramér-Rao lower bound (CRLB).
- \(I(\theta)\) is called the Fisher information of \(\theta\) based on one sample. \(n I(\theta)\) is the Fisher information of \(\theta\) based on \(n\) iid samples.
- For the case where \(\psi (\theta) = \theta\) and \(T({\bf X})\) is an unbiased estimator of \(\theta\), the CRLB reduces to \([n I(\theta)]^{-1}\).
- If an unbiased estimator attains the CRLB, then it is the best (minimum-variance) unbiased estimator.
Example (continued): In the binomial example (stated above), it can be shown that \(T_{1}({\bf X})\) attains the CRLB.
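A sketch of the calculation: for a single observation from \(\mathtt{Binomial}(m,\theta)\),
\[
\frac{\partial}{\partial\theta} \log f_{X_{1}}(x ; \theta)
= \frac{x}{\theta} - \frac{m-x}{1-\theta}
= \frac{x - m\theta}{\theta(1-\theta)},
\qquad
I(\theta) = \frac{E_{\theta}\left\{(X_{1}-m\theta)^{2}\right\}}{\theta^{2}(1-\theta)^{2}}
= \frac{m}{\theta(1-\theta)},
\]
so the CRLB equals \([n I(\theta)]^{-1} = \theta(1-\theta)/(mn) = \text{var}_{\theta}(T_{1})\). Hence \(T_{1}\) attains the CRLB.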
Definition (Efficient estimator): An unbiased estimator of \(\psi(\theta)\) that attains the CRLB is called an efficient estimator.