8.3 - Consistency

Consistency: an asymptotic property

  • An estimator \(\hat\theta\) of \(\theta\) is said to be a consistent estimator of \(\theta\) if, for any \(\epsilon>0\):

\[\lim_{n\rightarrow \infty} P(|\hat\theta-\theta| < \epsilon) = 1.\]

  • An equivalent statement:

\[\lim_{n\rightarrow \infty} P(|\hat\theta-\theta| > \epsilon) = 0.\]

  • Consistency is also known as convergence in probability. If \(\hat\theta\) is a consistent estimator of \(\theta\), \(\hat\theta\) is said to converge in probability to \(\theta\) and we can write:

\[\hat \theta \rightarrow_p \theta\]

  • Intuitively: when \(n\) is large, \(\hat \theta\) is arbitrarily close to \(\theta\) with probability approaching 1.

Example: consistency of sample mean of normals

  • Suppose \(Y_1,...,Y_n\) are i.i.d. \(\sim N(\mu,\sigma^2)\). Show that \(\bar Y\) is a consistent estimator of \(\mu\).

Proof:

For any \(\epsilon >0\):

\[\scriptsize P(|\bar Y -\mu| < \epsilon) = P\left((\bar Y -\mu)^2 < \epsilon^2\right)=P\left(\left(\frac{\bar Y -\mu}{\sigma/\sqrt{n}}\right)^2 < \left(\frac{\epsilon}{\sigma/\sqrt{n}}\right)^2\right)\]

\[\scriptsize=P\left(Z^2 < \frac{n\epsilon^2}{\sigma^2}\right), \mbox{ where } Z \sim N(0,1) \Rightarrow Z^2 \sim \chi^2_1\]

\[\scriptsize \mbox{As } n\rightarrow \infty, \,\, P\left(Z^2 < \frac{n\epsilon^2}{\sigma^2}\right) \rightarrow P\left(Z^2 < \infty \right) = 1\]

[Figure: \(\chi^2_1\) density with the cut point \(n\epsilon^2/\sigma^2\) marked]
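  • This limit can also be checked numerically (a quick check, not part of the original derivation): \(P\left(Z^2 < n\epsilon^2/\sigma^2\right)\) is the \(\chi^2_1\) CDF evaluated at \(n\epsilon^2/\sigma^2\), which approaches 1 as \(n\) grows. The values of \(\epsilon\) and \(\sigma\) below are illustrative.

eps <- 0.1
sigma <- 1
n <- c(10, 100, 1000, 10000)

# Exact probability P(Z^2 < n * eps^2 / sigma^2) from the chi-square(1) CDF
pchisq(n * eps^2 / sigma^2, df = 1)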

Simulating consistency

To demonstrate consistency via simulation study, we will:

  • Show how \(\bar Y\) approaches \(\mu\) for one “run” of \(n \in \{2,4,6, ..., 1000\}\);
  • Do this for many runs;
  • For each \(n\), find the proportion of runs for which \(\bar Y\) is within \(\epsilon\) of \(\mu.\)
  • One “run” of \(\bar Y\):
library(purrrfect)
library(tidyverse)

set.seed(1234)
mu <- 3
(ybars_one_run <- parameters(~n,
            seq(2,1000, by = 2))
 %>% mutate(y_bar = map_dbl(n, \(n) mean(rnorm(n, mu, sd = 1))),
            mu = mu,
           )
) %>% head()
# A tibble: 6 × 3
      n y_bar    mu
  <dbl> <dbl> <dbl>
1     2  2.54     3
2     4  2.92     3
3     6  2.32     3
4     8  3.04     3
5    10  2.61     3
6    12  2.39     3

Plotting one simulated “run”

ggplot(data = ybars_one_run, aes(x = n, y = y_bar))+
  geom_line()+
  geom_hline(yintercept = mu, color = "red") + 
  ylab(expression(bar(Y)))+
  theme_classic()

Simulating 5000 runs

  • Now simulating 5000 runs, and determining whether each \(\bar Y\) is within \(\epsilon = 0.1\) of \(\mu=3\):
epsilon <- 0.1

(ybars_many_runs <- parameters(~n,   seq(2,1000, by = 2))
 %>% add_trials(5000)
 %>% mutate(y_bar = map_dbl(n, \(n) mean(rnorm(n, mu, sd = 1))),
            mu = mu,
           )
 %>% mutate(within_epsilon = ifelse(abs(y_bar-mu) < epsilon, 1, 0))
) %>% head
# A tibble: 6 × 5
      n .trial y_bar    mu within_epsilon
  <dbl>  <dbl> <dbl> <dbl>          <dbl>
1     2      1  2.70     3              0
2     2      2  2.95     3              1
3     2      3  1.78     3              0
4     2      4  2.80     3              0
5     2      5  3.51     3              0
6     2      6  3.18     3              0

Plotting the first 16 runs
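  • The original plotting code is omitted here; a minimal sketch of one way to produce it, assuming the first 16 values of .trial are kept and shown one panel per run:

# Sketch (assumption): keep trials 1-16 and facet the y_bar-versus-n paths by .trial
ggplot(data = filter(ybars_many_runs, .trial <= 16),
       aes(x = n, y = y_bar)) +
  geom_line() +
  geom_hline(yintercept = mu, color = "red") +
  facet_wrap(~ .trial) +
  ylab(expression(bar(Y))) +
  theme_classic()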

Finding simulated \(P(|\bar Y - \mu|<\epsilon)\)

  • For each \(n\), find the proportion of runs for which \(\bar Y\) is within \(\epsilon\) of \(\mu\)
  • Plot vs \(n\):
(ybars_many_runs 
  %>% summarize(prob_within_eps = mean(within_epsilon), .by = n) 
) -> prop_within_epsilon

ggplot(data = prop_within_epsilon) + 
  geom_line(aes(x = n, y = prob_within_eps)) + 
  labs(x='n',
    y=expression(P(abs(bar(Y) - mu) < epsilon))
    ) +
  theme_classic(base_size = 16)
[Figure: simulated \(P(|\bar Y - \mu| < \epsilon)\) versus \(n\)]

Not all statistics \(\rightarrow_p\) a parameter

  • A consistent estimator \(\hat \theta\) of \(\theta\) will converge in probability to \(\theta\).
  • However, not all statistics converge in probability to a parameter.
  • If \(\hat\theta\rightarrow_p c\) for some \(c\ne \theta\), then \(\hat\theta\) is clearly not a consistent estimator of \(\theta\).

Sample max of Betas converges in probability to a constant

  • Let \(Y_{(n)}\) represent the maximum order statistic from a sample of size \(n\) drawn i.i.d. from a \(BETA(\alpha,1)\) parent population
  • Parent population CDF:

\[F_Y(y) = \begin{cases} 0 & y < 0 \\ y^{\alpha} & 0\le y \le 1 \\ 1 & y > 1 \end{cases}\]

  • Show that \(Y_{(n)}\rightarrow_p 1\), and thus that \(Y_{(n)}\) is not a consistent estimator of \(\alpha\) (unless \(\alpha = 1\)).

Convergence of \(Y_{(n)}\) to 1

For \(\epsilon >0\):

\[P(|Y_{(n)}-1| < \epsilon)=P(-\epsilon< Y_{(n)}-1 < \epsilon)=P(1-\epsilon< Y_{(n)} < 1+\epsilon)\]

\[=F_{Y_{(n)}}(1+\epsilon) -F_{Y_{(n)}}(1-\epsilon) \]

Recall that:

\[F_{Y_{(n)}}(y) = P(Y_{(n)}\le y ) = P(Y_i\le y )^n = F_Y(y)^n\]

\[\Rightarrow F_{Y_{(n)}}(1+\epsilon) -F_{Y_{(n)}}(1-\epsilon)=F_{Y}(1+\epsilon)^n -F_{Y}(1-\epsilon)^n\]

\[ = 1^n - \left((1-\epsilon)^\alpha\right)^n = 1-(1-\epsilon)^{\alpha n}\]

\[ \rightarrow 1 \mbox{ as } n\rightarrow \infty\]

\[\therefore Y_{(n)} \rightarrow_p 1\]
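  • A small numerical check (not part of the original derivation): the exact probability is \(1-(1-\epsilon)^{\alpha n}\), and a Monte Carlo estimate based on sample maxima of \(BETA(\alpha,1)\) draws behaves accordingly. The values of \(\alpha\) and \(\epsilon\) below are illustrative.

alpha <- 2
eps <- 0.05

# Exact P(|Y_(n) - 1| < eps) alongside a Monte Carlo estimate using sample maxima
tibble(n = c(10, 50, 100, 500)) %>%
  mutate(exact = 1 - (1 - eps)^(alpha * n),
         simulated = map_dbl(n, \(n) mean(replicate(2000, abs(max(rbeta(n, alpha, 1)) - 1) < eps))))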

Establishing consistency using asymptotic bias and variance

  • One can also establish consistency (or lack thereof) by investigating the asymptotic bias and variance of an estimator.
  • Theorem: Let \(\hat\theta\) be an estimator of a parameter \(\theta\). If:

\[\lim_{n\rightarrow \infty} B(\hat\theta) = 0 \mbox{ and } \lim_{n\rightarrow \infty} Var(\hat\theta) = 0,\]

then \(\hat\theta\) is a consistent estimator of \(\theta\).

  • Corollary: If an estimator \(\hat\theta\) is unbiased for \(\theta\), then it is also consistent for \(\theta\) if \(\lim_{n\rightarrow \infty} Var(\hat\theta) = 0\).

Proof: practice! It makes use of Markov’s Inequality, which states that if \(Y\) is any non-negatively-valued random variable and \(k > 0\) is a constant, then:

\[P(Y > k) \leq \frac{E(Y)}{k}.\]
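  • A numerical illustration (not from the original slides) of how the inequality drives the theorem: applying it to \((\bar Y - \mu)^2\) with \(k = \epsilon^2\) bounds \(P(|\bar Y - \mu| > \epsilon)\) by \(MSE(\bar Y)/\epsilon^2 = \sigma^2/(n\epsilon^2)\), which shrinks to 0. The values below are illustrative.

mu <- 3
sigma <- 1
eps <- 0.1

# Bound sigma^2 / (n * eps^2) versus a simulated P(|Y-bar - mu| > eps)
tibble(n = c(100, 400, 1600)) %>%
  mutate(bound = sigma^2 / (n * eps^2),
         simulated = map_dbl(n, \(n) mean(replicate(2000, abs(mean(rnorm(n, mu, sigma)) - mu) > eps))))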

Example: Sample variance of normals

  • Suppose \(Y_1,...,Y_n\) are i.i.d. \(\sim N(\mu,\sigma^2)\)
  • Let:

\[\hat\sigma_1^2 = \frac{\sum_{i=1}^n (Y_i-\bar Y)^2}{n-1}\]

\[\hat\sigma_2^2 = \frac{\sum_{i=1}^n (Y_i-\bar Y)^2}{n}\]

  • Show that both estimators are consistent for \(\sigma^2\).

Investigating \(\hat\sigma_1^2\)

  • Note that:

\[\hat\sigma_1^2 = \frac{\sum_{i=1}^n (Y_i-\bar Y)^2}{n-1} = S^2\]

  • Since the sample comes from a normal population,

\[\frac{(n-1)\hat\sigma_1^2}{\sigma^2}\sim \chi^2_{n-1}\Rightarrow E\left[\frac{(n-1)\hat\sigma_1^2}{\sigma^2}\right] = n-1, Var\left[\frac{(n-1)\hat\sigma_1^2}{\sigma^2}\right] = 2(n-1)\]

\[\Rightarrow E\left(\hat\sigma_1^2\right) = \sigma^2, Var\left(\hat\sigma_1^2\right) = \frac{2\sigma^4}{n-1}\]

\[\Rightarrow B(\hat\sigma_1^2)=0\ \ \forall\ n, Var(\hat\sigma^2_1) \rightarrow 0 \mbox{ as }n\rightarrow \infty\]

\[\therefore \hat\sigma_1^2 \rightarrow_p \sigma^2\]

Investigating \(\hat\sigma_2^2\)

  • Note that:

\[\scriptsize \hat\sigma_2^2 = \frac{\sum_{i=1}^n (Y_i-\bar Y)^2}{n} = \frac{n-1}{n}\hat\sigma^2_1\]

  • Thus:

\[\scriptsize E(\hat\sigma^2_2) = E\left(\frac{n-1}{n}\hat\sigma^2_1\right) = \frac{n-1}{n}\sigma^2\Rightarrow B(\hat\sigma_2^2) = \frac{n-1}{n}\sigma^2-\sigma^2 = -\frac{\sigma^2}{n}\]

\[\scriptsize Var(\hat\sigma^2_2) = Var\left(\frac{n-1}{n}\hat\sigma^2_1\right) = \left(\frac{n-1}{n}\right)^2\frac{2\sigma^4}{n-1} = \frac{2\sigma^4(n-1)}{n^2}\]

  • As \(n\rightarrow \infty\) , \(B(\hat\sigma_2^2) = -\frac{\sigma^2}{n}\rightarrow 0\) and \(Var(\hat\sigma^2_2) = \frac{2\sigma^4(n-1)}{n^2} \rightarrow 0\)

\[\therefore \hat\sigma^2_2 \rightarrow_p \sigma^2\]
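  • A simulation sketch (not in the original slides) comparing the two estimators: with \(\sigma^2 = 1\), both averages settle near 1 and both spreads shrink as \(n\) grows, matching the derivations above.

sigma <- 1

# For each n and trial, compute both variance estimators from a N(0, sigma^2) sample
(parameters(~n, c(10, 100, 1000, 10000))
 %>% add_trials(2000)
 %>% mutate(s2_1 = map_dbl(n, \(n) var(rnorm(n, mean = 0, sd = sigma))),  # divides by n - 1
            s2_2 = s2_1 * (n - 1) / n)                                    # divides by n
 %>% summarize(mean_s2_1 = mean(s2_1), sd_s2_1 = sd(s2_1),
               mean_s2_2 = mean(s2_2), sd_s2_2 = sd(s2_2), .by = n)
)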

The Weak Law of Large Numbers

  • A general law that establishes consistency of sample means
  • Distribution-agnostic: it holds regardless of the parent population’s distribution (under mild regularity conditions)
  • Informally: the sample mean converges in probability to (is consistent for) the population mean, whatever it may be.

Formal statement of WLLN

  • If \(Y_1,...,Y_n\) are i.i.d. from a population with finite mean and variance, i.e. 
    • \(\mu< \infty\)
    • \(\sigma^2 < \infty\)
  • Then \(\bar Y\) is a consistent estimator of \(\mu\); that is, \(\bar Y \rightarrow_p \mu\).

Proof:

Using what we’ve previously shown algebraically:

\[E(\bar Y) = \mu \Rightarrow B(\bar Y) = 0\ \ \forall n\]

\[Var(\bar Y) = \frac{\sigma^2}{n} \Rightarrow \lim_{n\rightarrow \infty} Var(\bar Y) = 0\]

\[\therefore \bar Y \rightarrow_p \mu\]

Application of WLLN: sample means of raw Gammas

  • Suppose \(Y_1,Y_2,...,Y_n \stackrel{i.i.d}{\sim} GAM(\alpha,\lambda)\).
  • Consider \(\bar Y\), the sample mean
  • To what does \(\bar Y\) converge in probability?
  • By the WLLN, since this parent population has a finite mean and variance:

\[\bar Y \rightarrow_p \mu = \frac{\alpha}{\lambda}\]

I.e., for any \(\epsilon > 0\),

\[\lim_{n\rightarrow \infty}P\left(\left|\bar Y - \frac{\alpha}{\lambda}\right| < \epsilon\right) =1\]
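  • A quick illustration (the values of \(\alpha\) and \(\lambda\) below are arbitrary choices, not from the slides): sample means of \(GAM(\alpha,\lambda)\) draws settle near \(\alpha/\lambda = 4\) as \(n\) increases.

alpha <- 2
lambda <- 0.5

# Sample means of gamma draws for increasing n, compared to the target alpha / lambda
tibble(n = c(10, 100, 1000, 100000)) %>%
  mutate(y_bar = map_dbl(n, \(n) mean(rgamma(n, shape = alpha, rate = lambda))),
         target = alpha / lambda)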

Application of WLLN: sample means of squared Gammas

  • Suppose \(Y_1,Y_2,...,Y_n \stackrel{i.i.d}{\sim} GAM(\alpha,\lambda)\).
  • Consider \(\bar W = \frac{\sum_{i=1}^n Y_i^2}{n}\), the sample second moment
  • To what does \(\bar W\) converge in probability?

Letting \(W_i = Y_i^2\), \(W_i \not \sim GAM(\alpha,\lambda)\), but:

\[ E(W_i) = \mu_W = E(Y_i^2) = Var(Y_i) + E(Y_i)^2 = \frac{\alpha}{\lambda^2} + \frac{\alpha^2}{\lambda^2} = \frac{\alpha(\alpha+1)}{\lambda^2},\]

\[ Var(W_i) = \sigma^2_W = E(W_i^2)-E(W_i)^2 = E(Y_i^4)- \left[\frac{\alpha(\alpha+1)}{\lambda^2}\right]^2<\infty,\]

since

\[ E(Y_i^4) = \frac{\Gamma(\alpha+4)}{\Gamma(\alpha)\lambda^4}<\infty.\]

Thus by the WLLN, since \(\bar W\) is a sample mean from a (not-Gamma!) parent population with finite mean and variance:

\[ \bar W \rightarrow_p \mu_W = \frac{\alpha(\alpha+1)}{\lambda^2}.\]
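  • The analogous check for the sample second moment (same illustrative \(\alpha\) and \(\lambda\) as above, not from the slides): \(\bar W\) settles near \(\alpha(\alpha+1)/\lambda^2 = 24\).

alpha <- 2
lambda <- 0.5

# Sample means of squared gamma draws, compared to the target alpha * (alpha + 1) / lambda^2
tibble(n = c(10, 100, 1000, 100000)) %>%
  mutate(w_bar = map_dbl(n, \(n) mean(rgamma(n, shape = alpha, rate = lambda)^2)),
         target = alpha * (alpha + 1) / lambda^2)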