Theory of Point Estimation

Introduction

Let \(X_1, X_2, \dots, X_n\) be a random sample from a normal distribution with mean \(\mu\)
and variance \(\sigma^2\), denoted \(X_i \sim N(\mu, \sigma^2)\).

The problem of point estimation is to pick a statistic \(T(X_1,\dots,X_n)\)
that “best” estimates the parameter \(\mu\).

  • An estimator is a statistic (random variable).
  • An estimate is its realized numerical value (constant).

Definition 1.
The set of all possible values of the parameter of a distribution is called the parameter space \(\Omega\).


Examples

  1. If \(X_i \sim \text{Poisson}(\lambda)\), then \(\Omega = (0,\infty)\).
    Possible estimators of \(\lambda\) (both the mean and the variance of a Poisson distribution equal \(\lambda\)):
    \[ T_1 = \frac{1}{n}\sum_{i=1}^n X_i, \quad T_2 = \frac{1}{n-1}\sum_{i=1}^n (X_i - \bar{X})^2 \]

  2. If \(X_i \sim \text{Bernoulli}(\theta)\), then \(\Omega = (0,1)\).
    Possible estimators of \(\theta\):
    \[ T_1 = \frac{1}{n}\sum_{i=1}^n X_i, \quad T_2 = X_1, \quad T_3 = \frac{X_1+X_2}{2} \]

  3. If \(X_i \sim N(\mu, \sigma^2)\), then \(\Omega = \{(\mu,\sigma^2): -\infty < \mu < \infty, \sigma^2 > 0\}\).
    Possible estimators of \(\mu\) and \(\sigma^2\), respectively:
    \[ T_1 = \bar{X}, \quad T_2 = S^2 = \frac{1}{n-1}\sum_{i=1}^n (X_i - \bar{X})^2 \]
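
The three Bernoulli estimators in Example 2 can be compared by simulation. The following is a minimal sketch, not part of the original notes: the function name `simulate_bernoulli_estimators`, the seed, and the values \(\theta = 0.3\), \(n = 25\) are illustrative assumptions. It records the realized estimates over many simulated samples and prints their average and variance.

```python
import random
import statistics

def simulate_bernoulli_estimators(theta, n, reps, seed=0):
    """Return lists of realized estimates (T1, T2, T3) over `reps` simulated samples."""
    rng = random.Random(seed)
    t1, t2, t3 = [], [], []
    for _ in range(reps):
        # One Bernoulli(theta) sample of size n (n >= 2 is assumed for T3).
        x = [1 if rng.random() < theta else 0 for _ in range(n)]
        t1.append(sum(x) / n)          # T1: sample mean
        t2.append(x[0])                # T2: first observation only
        t3.append((x[0] + x[1]) / 2)   # T3: mean of the first two observations
    return t1, t2, t3

t1, t2, t3 = simulate_bernoulli_estimators(theta=0.3, n=25, reps=20000)
for name, values in (("T1", t1), ("T2", t2), ("T3", t3)):
    # Average and spread of the realized estimates for each estimator.
    print(name, round(statistics.mean(values), 3), round(statistics.pvariance(values), 4))
```

All three averages should land near \(\theta\), while the variances should differ in line with the theoretical values \(\theta(1-\theta)/n\), \(\theta(1-\theta)\) and \(\theta(1-\theta)/2\).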


Properties of Estimators and Statistics

Estimator properties:

  • Consistency
  • Unbiasedness
  • Minimum variance
  • Efficiency

Statistic properties:

  • Sufficiency
  • Completeness

Every estimator is a statistic, so sufficiency and completeness apply to estimators as well.


Consistency

Definition.
A sequence \(\{X_n\}\) converges in probability to \(b\) if
\[ \Pr(|X_n - b| > \epsilon) \to 0 \quad \text{as } n \to \infty, \; \forall \epsilon > 0. \]
Denoted \(X_n \xrightarrow{p} b\).

Definition.
An estimator sequence \(\{T_n\}\) is consistent for \(\theta\) if \(T_n \xrightarrow{p} \theta\).


Theorem 1: Weak Law of Large Numbers

If \(X_i\) are iid with \(E[X_i] = \mu\) and \(\text{Var}(X_i)=\sigma^2 < \infty\), then
\[\bar{X}_n = \frac{1}{n}\sum_{i=1}^n X_i \xrightarrow{p} \mu.\]

Proof.
Since \(E[\bar{X}_n] = \mu\) and \(\text{Var}(\bar{X}_n) = \sigma^2/n\), Chebyshev’s inequality gives, for every \(\epsilon>0\):
\[ \Pr(|\bar{X}_n - \mu| > \epsilon) \le \frac{\sigma^2}{n\epsilon^2} \to 0. \]
Thus \(\bar{X}_n \xrightarrow{p} \mu\).
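
The bound in the proof can be checked numerically. The sketch below is an illustration under stated assumptions, not part of the original notes: it uses Uniform(0,1) draws (so \(\mu = 1/2\) and \(\sigma^2 = 1/12\)), and the helper names `tail_probability` and `chebyshev_bound` are hypothetical. It compares a Monte Carlo estimate of \(\Pr(|\bar{X}_n - \mu| > \epsilon)\) with \(\sigma^2/(n\epsilon^2)\).

```python
import random

def tail_probability(n, eps, reps=2000, seed=1):
    """Monte Carlo estimate of Pr(|mean of n Uniform(0,1) draws - 0.5| > eps)."""
    rng = random.Random(seed)
    exceed = 0
    for _ in range(reps):
        xbar = sum(rng.random() for _ in range(n)) / n
        if abs(xbar - 0.5) > eps:
            exceed += 1
    return exceed / reps

def chebyshev_bound(n, eps, variance=1 / 12):
    """Chebyshev upper bound sigma^2 / (n * eps^2), capped at 1."""
    return min(1.0, variance / (n * eps ** 2))

for n in (10, 100, 1000):
    # Columns: n, simulated tail probability, Chebyshev bound.
    print(n, tail_probability(n, eps=0.05), round(chebyshev_bound(n, eps=0.05), 4))
```

The simulated probabilities should shrink as \(n\) grows and stay below the bound, which is typically far from tight.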


Theorem 2

If \(\lim_{n\to\infty} E[T_n] = \theta\) and \(\lim_{n\to\infty} \text{Var}(T_n) = 0\), then \(T_n\) is consistent for \(\theta\).

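Proof. By Markov’s inequality applied to \((T_n - \theta)^2\), for every \(\epsilon > 0\):
\[ \Pr(|T_n - \theta| > \epsilon) \le \frac{E[(T_n - \theta)^2]}{\epsilon^2} = \frac{\text{Var}(T_n) + (E[T_n] - \theta)^2}{\epsilon^2}. \]
Both terms in the numerator tend to 0 by assumption, so \(T_n \xrightarrow{p} \theta\).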
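
As an illustration of Theorem 2 (an example added here, assuming \(X_i \sim N(\mu,\sigma^2)\)), consider the variance estimator with divisor \(n\):
\[ \hat{\sigma}^2_n = \frac{1}{n}\sum_{i=1}^n (X_i - \bar{X})^2, \qquad E[\hat{\sigma}^2_n] = \frac{n-1}{n}\sigma^2 \to \sigma^2, \qquad \text{Var}(\hat{\sigma}^2_n) = \frac{2(n-1)}{n^2}\sigma^4 \to 0. \]
Both conditions of the theorem hold, so \(\hat{\sigma}^2_n\) is consistent for \(\sigma^2\) even though it is biased for every finite \(n\).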


Sufficiency

Definition.
A statistic \(T(X)\) is sufficient for \(\theta\) if the conditional distribution of the sample given \(T(X) = t\) does not depend on \(\theta\), for every value \(t\).


Factorization Theorem

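\(T(X)\) is sufficient for \(\theta\) iff the joint density (or pmf) factors, for all \(x = (x_1,\dots,x_n)\) and all \(\theta \in \Omega\), as
\[ f(x_1,\dots,x_n;\theta) = g(T(x);\theta)\,h(x), \]
where \(g\) depends on the sample only through \(T(x)\) and \(h(x)\) does not depend on \(\theta\).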


Example 1: Binomial

If \(X_1,\dots,X_n\) are iid \(\text{Bernoulli}(p)\), then \(T=\sum X_i\), which has a \(\text{Binomial}(n,p)\) distribution, is sufficient for \(p\).

Proof.
Joint pmf:
\[ f(x_1,\dots,x_n;p) = p^{\sum x_i}(1-p)^{n-\sum x_i}. \]
This has the form \(g(T;p)h(x)\) with \(g(t;p) = p^{t}(1-p)^{n-t}\) and \(h(x) = 1\), so \(T\) is sufficient by the Factorization Theorem.
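
The same conclusion can be checked directly from the definition of sufficiency. The sketch below is an illustration, not part of the original notes: the function name `conditional_given_sum` and the choices \(n = 4\), \(t = 2\) are assumptions. It enumerates every 0/1 sample with \(\sum x_i = t\) and computes its conditional probability exactly with rational arithmetic for several values of \(p\).

```python
from fractions import Fraction
from itertools import product

def conditional_given_sum(n, t, p):
    """Conditional pmf of each 0/1 sample of length n, given that it sums to t."""
    def joint(x):
        s = sum(x)
        return p ** s * (1 - p) ** (n - s)   # Bernoulli joint pmf
    matching = [x for x in product((0, 1), repeat=n) if sum(x) == t]
    total = sum(joint(x) for x in matching)  # Pr(T = t)
    return {x: joint(x) / total for x in matching}

for p in (Fraction(1, 5), Fraction(1, 2), Fraction(9, 10)):
    cond = conditional_given_sum(n=4, t=2, p=p)
    # The set of distinct conditional probabilities, for this value of p.
    print(p, sorted(set(cond.values())))
```

For every \(p\), each of the \(\binom{4}{2} = 6\) matching samples has conditional probability \(1/6\), so the conditional distribution carries no information about \(p\).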


Example 2: Poisson

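If \(X_1, X_2\) are iid \(\text{Poisson}(\lambda)\), then \(T=X_1+X_2\) is sufficient for \(\lambda\): the joint pmf
\[ f(x_1,x_2;\lambda) = \frac{e^{-2\lambda}\lambda^{x_1+x_2}}{x_1!\,x_2!} \]
factors as \(g(T;\lambda)h(x)\) with \(g(t;\lambda) = e^{-2\lambda}\lambda^{t}\) and \(h(x) = 1/(x_1!\,x_2!)\).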
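
The sufficiency of \(T = X_1 + X_2\) can also be seen from the definition: given \(T = t\), \(X_1\) follows a \(\text{Binomial}(t, 1/2)\) distribution whatever the value of \(\lambda\). The sketch below checks this numerically, assuming \(X_1\) and \(X_2\) are independent; the helper names `poisson_pmf` and `conditional_x1_given_sum` and the values of \(\lambda\) are illustrative.

```python
import math

def poisson_pmf(k, lam):
    """Poisson(lam) probability of the value k."""
    return math.exp(-lam) * lam ** k / math.factorial(k)

def conditional_x1_given_sum(t, lam):
    """Pr(X1 = k | X1 + X2 = t) for k = 0..t, with X1, X2 iid Poisson(lam)."""
    joint = [poisson_pmf(k, lam) * poisson_pmf(t - k, lam) for k in range(t + 1)]
    total = sum(joint)  # Pr(X1 + X2 = t)
    return [j / total for j in joint]

for lam in (0.5, 2.0, 7.0):
    # The conditional pmf should be the same for every lam.
    print(lam, [round(q, 4) for q in conditional_x1_given_sum(3, lam)])
```

Each row should match the \(\text{Binomial}(3, 1/2)\) probabilities \(1/8, 3/8, 3/8, 1/8\) up to rounding.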


Minimal Sufficiency

Definition (Lehmann-Scheffé).
A sufficient statistic is minimal sufficient if it is a function of every other sufficient statistic.

Construction (likelihood ratio method).
Two sample points \(x = (x_1,\dots,x_n)\) and \(y = (y_1,\dots,y_n)\) are called equivalent if
\[ \frac{L(x;\theta)}{L(y;\theta)} \text{ does not depend on } \theta. \]
The partition of the sample space into these equivalence classes corresponds to a minimal sufficient statistic.
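
As a worked illustration of the likelihood ratio method (assuming an iid \(\text{Bernoulli}(p)\) sample with \(0 < p < 1\)), write \(x = (x_1,\dots,x_n)\) and \(y = (y_1,\dots,y_n)\) for two sample points. Then
\[ \frac{L(x;p)}{L(y;p)} = \frac{p^{\sum x_i}(1-p)^{n-\sum x_i}}{p^{\sum y_i}(1-p)^{n-\sum y_i}} = \left(\frac{p}{1-p}\right)^{\sum x_i - \sum y_i}. \]
This ratio is free of \(p\) exactly when \(\sum x_i = \sum y_i\), so the equivalence classes are the level sets of \(T = \sum X_i\), and \(T\) is minimal sufficient for \(p\).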


Completeness

Definition.
A family \(\{f(x;\theta): \theta \in \Omega\}\) is complete if, for every function \(g\),
\[ E_\theta[g(X)] = 0 \; \forall \theta \in \Omega \;\; \Rightarrow \;\; \Pr_\theta(g(X)=0) = 1 \; \forall \theta \in \Omega. \]

Example 1. The family \(\text{Binomial}(n,\theta)\) is complete.

Example 2. The family \(N(\theta,\theta)\) is not complete.

Example 3. The family \(\text{Uniform}(0,\theta)\), \(\theta > 0\), is complete.
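
To see why Example 1 holds (a sketch; the symbol \(\rho\) is introduced here for convenience), let \(X \sim \text{Binomial}(n,\theta)\) with \(\Omega = (0,1)\) and suppose \(E_\theta[g(X)] = 0\) for all \(\theta \in \Omega\). Then
\[ 0 = \sum_{k=0}^n g(k)\binom{n}{k}\theta^k(1-\theta)^{n-k} = (1-\theta)^n \sum_{k=0}^n g(k)\binom{n}{k}\rho^k, \qquad \rho = \frac{\theta}{1-\theta} \in (0,\infty). \]
A polynomial in \(\rho\) that vanishes on all of \((0,\infty)\) has all coefficients equal to zero, so \(g(k) = 0\) for \(k = 0,\dots,n\).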


Exponential Families

Definition.
A family of distributions is a (one-parameter) exponential family if its density or pmf can be written as
\[ f(x;\theta) = \exp\{Q(\theta)T(x) + D(\theta) + S(x)\} \]
on a support that does not depend on \(\theta\).


Examples

  • Binomial: exponential family with \(T(x)=x\).
  • Normal with known mean: exponential family in \(\sigma^2\).
  • Gamma: exponential family in \(\theta\).
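
For the Binomial case, one choice of the four components is \(Q(\theta) = \log\frac{\theta}{1-\theta}\), \(T(x) = x\), \(D(\theta) = n\log(1-\theta)\) and \(S(x) = \log\binom{n}{x}\). The sketch below, with illustrative function names and parameter values, checks numerically that this form reproduces the usual pmf.

```python
import math

def binomial_pmf(x, n, theta):
    """Binomial(n, theta) pmf in its usual form."""
    return math.comb(n, x) * theta ** x * (1 - theta) ** (n - x)

def binomial_exp_family(x, n, theta):
    """Binomial(n, theta) pmf written as exp{Q(theta) T(x) + D(theta) + S(x)}."""
    Q = math.log(theta / (1 - theta))   # natural parameter (log-odds)
    T = x                               # canonical statistic
    D = n * math.log(1 - theta)         # normalizing term
    S = math.log(math.comb(n, x))       # part that is free of theta
    return math.exp(Q * T + D + S)

for x in range(6):
    # Columns: x, usual pmf, exponential-family form.
    print(x, round(binomial_pmf(x, 5, 0.3), 6), round(binomial_exp_family(x, 5, 0.3), 6))
```

The two columns should agree up to floating-point rounding for every \(x\).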

Main Theorem

In an exponential family, the canonical statistic \(T\) (for an iid sample, \(\sum_i T(X_i)\)) is sufficient.
If the set of values \(\{Q(\theta): \theta \in \Omega\}\) contains an open interval, then \(T\) is also complete.

