Convergence, CLT, WLLN, CMT, Slutsky theorem. Simulations.

This document defines some concepts from probability theory and shows examples of their use in econometrics. Many of the definitions are based on Bruce Hansen's Econometrics book, which is still freely available online and a very useful reference.

Convergence in Probability

A random variable \( z_n \in \mathbb{R} \) converges in probability to a constant \( c \in \mathbb{R} \), denoted \( z_n \overset{p}{\longrightarrow} c \) as \( n \to \infty \), if for all \( \delta > 0 \), \[ \lim_{n\to\infty} \Pr (|z_n - c | \leq \delta) = 1. \]

In simple words, this means that the distribution of \( z_n \) concentrates around the point \( c \) as \( n \) increases. We call \( c \) the probability limit of \( z_n \).
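
To make this concrete, here is a minimal simulation sketch (my own illustration, not part of the original example code): take \( z_n \) to be the mean of \( n \) Uniform(0,1) draws, so that \( c = 0.5 \), and estimate \( \Pr(|z_n - c| \leq \delta) \) for growing \( n \).

# a minimal sketch: estimate Pr(|z_n - 0.5| <= delta) by simulation
set.seed(42)
delta <- 0.05
for (n in c(10, 100, 1000)) {
    zn <- replicate(5000, mean(runif(n)))  # 5000 realizations of z_n
    cat("n =", n, ": Pr(|z_n - 0.5| <=", delta, ") ~", 
        mean(abs(zn - 0.5) <= delta), "\n")
}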

Almost sure convergence

This is very similar to convergence in probability. “Almost sure” means “with probability equal to one”. It is a stronger convergence concept than the previous one:

A random variable \( z_n \in \mathbb{R} \) converges almost surely to a constant \( c \in \mathbb{R} \), denoted \( z_n \overset{a.s.}{\longrightarrow} c \) as \( n \to \infty \), if for all \( \delta > 0 \), \[ \Pr ( \lim_{n\to\infty} |z_n - c | \leq \delta) = 1. \]
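
The difference is easiest to see along a single sample path: almost sure convergence says that the whole trajectory \( n \mapsto z_n \) settles down to \( c \) with probability one. A minimal sketch (again my own illustration): the running mean of one single realization of Uniform(0,1) draws, which by the strong law of large numbers converges to 0.5 along the path itself.

library(ggplot2)
set.seed(42)
x <- runif(10000)  # one single realization of the sequence
df <- data.frame(n = seq_along(x), rmean = cumsum(x)/seq_along(x))
ggplot(df, aes(x = n, y = rmean)) + geom_line() + 
    geom_hline(yintercept = 0.5, color = "red")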

The Weak Law of Large Numbers (WLLN)

This is an application of Convergence in Probability.

For \( y_i \) from an i.i.d. sample, if \( E|y_i|<\infty \), then as \( n \to \infty \), \[ \overline{y} = \frac{1}{n} \sum_{i=1}^n y_i \overset{p}{\longrightarrow} E(y_i). \]

WLLN example

Say we have a random sample \( X_i \) drawn from the normal distribution \( N(3,5) \), i.e. with mean 3 and standard deviation 5. We can see how the sample mean concentrates around the population mean as we increase the sample size. For each sample size \( n \), we'll compute 30 sample means and investigate how they are distributed around the population mean. To understand the resulting graph, think about what it means to compute a mean based on only a few observations: if you draw an \( x \) far away from the mean of 3, this will have a big impact, because you are not dividing by a large \( n \).

# this example is inspired by Yihui Xie's animation package. Example here
# http://animation.yihui.name/prob:law_of_large_numbers
library(ggplot2)
set.seed(1234)
mu <- 3  # mean
sdev <- 5  # standard deviation
ntrials <- 30  # number of means per sample size
ssize <- 80  # maximal sample size
df <- NULL
for (i in 1:ssize) {
    # ntrials sample means, each computed from i iid N(mu, sdev^2) draws
    m <- rowMeans(matrix(replicate(ntrials, rnorm(n = i, mean = mu, sd = sdev)), 
        ncol = i))
    mdf <- data.frame(means = m, sid = rep(i, times = ntrials))
    df <- rbind(df, mdf)
}
# ribbon spans the min-max range of the 30 means at each sample size
ggplot(data = df, aes(x = sid, y = means)) + stat_summary(geom = "ribbon", 
    fun.min = min, fun.max = max, alpha = 0.5, fill = "blue") + geom_point() + 
    scale_x_continuous("sample size") + scale_y_continuous("sample means") + 
    geom_hline(yintercept = mu, color = "red")

[Figure: sample means (points) and their min-max range (shaded) against sample size; the red line marks the population mean 3]

Notice that this is true for any i.i.d. sequence. Take for example \( z_i = x_i^3 \). The result above tells us that \( \overline{x} = \frac{1}{n} \sum_{i=1}^n x_i \overset{p}{\longrightarrow} E(x_i) \); since the \( z_i \) are themselves i.i.d. (with \( E|z_i| < \infty \)), the WLLN applies to them as well, so \( \overline{z} = \frac{1}{n} \sum_{i=1}^n z_i \overset{p}{\longrightarrow} E(z_i) = E(x_i^3) \). Let's try this out here:

set.seed(1234)
mu <- 3  # mean
sdev <- 5  # standard deviation
ntrials <- 30  # number of means per sample size
ssize <- 80  # maximal sample size
# simulate E(x^3); the exact value is mu^3 + 3*mu*sdev^2 = 252
third <- mean(rnorm(n = 10000, mean = mu, sd = sdev)^3)
df <- NULL
for (i in 1:ssize) {
    # note: z_i = x_i^3, cubed element-wise before averaging
    m <- rowMeans(matrix(replicate(ntrials, rnorm(n = i, mean = mu, sd = sdev))^3, 
        ncol = i))
    mdf <- data.frame(means = m, sid = rep(i, times = ntrials))
    df <- rbind(df, mdf)
}
ggplot(data = df, aes(x = sid, y = means)) + stat_summary(geom = "ribbon", 
    fun.min = min, fun.max = max, alpha = 0.5, fill = "blue") + geom_point() + 
    scale_x_continuous("sample size") + scale_y_continuous("sample means") + 
    geom_hline(yintercept = third, color = "red")  # red line at third moment of x

[Figure: sample means of \( z_i = x_i^3 \) against sample size; the red line marks the third moment of x]

Notice that the convergence goes towards \( E(x_i^3) \), which we estimated above to be 250.0712; the exact value is \( \mu^3 + 3\mu\sigma^2 = 252 \).

Convergence in Distribution

To derive asymptotic distributions of estimators, we use this concept.

Let \( z_n \) be a random vector with distribution \( F_n (u) = \Pr (z_n \leq u) \). We say that \( z_n \) converges in distribution to \( z \) as \( n\to \infty \), denoted \( z_n \overset{d}{\longrightarrow} z \), if for all \( u \) at which \( F(u)=\Pr(z\leq u) \) is continuous, \( F_n(u)\to F(u) \) as \( n\to \infty \).

We say that \( z \) is the limiting distribution, or the asymptotic distribution of \( z_n \).
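
As a quick numerical illustration of this definition (a sketch of mine that anticipates the CLT below), we can compare the empirical CDF \( F_n \) of a standardized sample mean with the CDF of its normal limit at a few points \( u \):

# z_n = sqrt(n)(mean - mu)/sigma for exponential draws with mean 3
# (so sigma = 3); F_n should be close to the standard normal CDF
set.seed(1)
n <- 500
zn <- replicate(2000, sqrt(n) * (mean(rexp(n, rate = 1/3)) - 3)/3)
u <- -2:2
round(cbind(u, Fn = ecdf(zn)(u), Phi = pnorm(u)), 3)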

Central Limit Theorem

For \( y_i \) iid, if \( E||y||^2 < \infty \), then as \( n\to \infty \) \[ \sqrt{n} (\overline{y}_n - \mu) = \frac{1}{\sqrt{n}} \sum_{i=1}^n (y_i - \mu) \overset{d}{\longrightarrow} N(0,V) \] where \( \mu=E(y) \) and \( V=E((y-\mu)(y-\mu)') \).

In other words, the standardized sum \( z_n = \sqrt{n} (\overline{y}_n - \mu) \) has mean zero and variance \( V \) and is approximately normally distributed. Please take a moment to appreciate that this is true for any distribution that the sequence \( y_i \) might have been drawn from, as long as its second moment is finite.

CLT examples

Suppose we draw random numbers from the exponential distribution with mean 3. This gives rise to the following histogram:
[Figure: histogram of draws from the exponential distribution with mean 3]
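
The code for this chunk is not shown in the original; a minimal sketch that produces such a histogram could look like this:

# exponential with mean 3, i.e. rate 1/3
ggplot(data.frame(x = rexp(5000, rate = 1/3)), aes(x = x)) + 
    geom_histogram(bins = 50)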

We can now watch the central limit theorem at work as we increase the sample size \( n \) on which we compute the statistic \( z_n = \sqrt{n} (\overline{y}_n - \mu) \):

set.seed(12)
lambda <- 1/3
muexp <- 1/lambda  # mean of the exponential
varexp <- 1/(lambda^2)  # variance of the exponential
ntrials <- 100
ssizes <- c(3, 30, 1000, 1e+05)  # sample sizes
df <- NULL
for (i in 1:length(ssizes)) {
    # ntrials sample means, each from ssizes[i] exponential draws
    m <- rowMeans(matrix(replicate(ntrials, rexp(n = ssizes[i], rate = 1/muexp)), 
        ncol = ssizes[i]))
    m <- sqrt(ssizes[i]) * (m - muexp)  # center and scale by sqrt(n)
    mdf <- data.frame(means = m, sid = factor(paste("sample size", ssizes[i])))
    df <- rbind(df, mdf)
}
ggplot(data = df, aes(x = means, group = sid)) + geom_density() + 
    facet_wrap(~sid, scales = "free_y")

[Figure: densities of \( z_n \) for sample sizes 3, 30, 1000 and 100000]

Note how the distributions get closer to \( N(0,9) \) with increasing sample size:

[Figure: the limiting \( N(0,9) \) density for comparison]
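
The code for this comparison is hidden as well; a minimal sketch of what it plots:

# the N(0,9) density, i.e. mean 0 and standard deviation 3
ggplot(data.frame(x = c(-12, 12)), aes(x = x)) + 
    stat_function(fun = dnorm, args = list(mean = 0, sd = 3))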

A slight variation on this theme concerns standardization: if we derive the asymptotic distribution of \( z_n = \sqrt{n} \frac{\overline{y}_n - \mu}{\sigma} \), we find that it converges to a standard normal. In our experiment:

set.seed(12)
lambda <- 1/3
muexp <- 1/lambda  # mean
varexp <- 1/(lambda^2)  # variance
ntrials <- 100
ssizes <- c(3, 30, 1000, 1e+05)  # sample sizes
df <- NULL
for (i in 1:length(ssizes)) {
    m <- rowMeans(matrix(replicate(ntrials, rexp(n = ssizes[i], rate = 1/muexp)), 
        ncol = ssizes[i]))
    m <- sqrt(ssizes[i]) * (m - muexp)/sqrt(varexp)  # note standardization here
    mdf <- data.frame(means = m, sid = factor(paste("sample size", ssizes[i])))
    df <- rbind(df, mdf)
}
ggplot(data = df, aes(x = means, group = sid)) + geom_density() + 
    facet_wrap(~sid, scales = "free_y")

[Figure: densities of the standardized statistic, approaching \( N(0,1) \) as the sample size grows]

Also remember that this is not limited to the exponential distribution used in this example. We could have drawn the random numbers from any distribution with a finite variance, with the same result.
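
To drive this home, here is a sketch of mine repeating the standardized experiment with a very different distribution: Bernoulli draws with success probability \( p = 0.1 \), which have mean \( p \) and variance \( p(1-p) \).

set.seed(12)
p <- 0.1
n <- 1000
zn <- replicate(2000, sqrt(n) * (mean(rbinom(n, size = 1, prob = p)) - p)/
    sqrt(p * (1 - p)))
ggplot(data.frame(zn = zn), aes(x = zn)) + geom_density() + 
    stat_function(fun = dnorm, color = "red")  # standard normal overlay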

Continuous Mapping Theorem for convergence in probability (CMTP)

If \( z_n \overset{p}{\longrightarrow} c \) as \( n\to \infty \) and \( g() \) is continuous at \( c \), then \[ g(z_n) \overset{p}{\longrightarrow} g(c) \] as \( n\to \infty \).

This implies in particular that if \( z_n \overset{p}{\longrightarrow} c \) as \( n\to \infty \) then
\[ \begin{aligned} z_n + a &\overset{p}{\longrightarrow} c + a \\ z_n a &\overset{p}{\longrightarrow} a c \\ z_n^2 &\overset{p}{\longrightarrow} c^2 \end{aligned} \]
because the functions \( g(u)=u+a \), \( g(u)=au \), and \( g(u)=u^2 \) are continuous. For \( c\neq0 \) we also have \( \frac{a}{z_n} \overset{p}{\longrightarrow} \frac{a}{c} \), because \( g(u)=a/u \) is continuous at every \( u \neq 0 \).
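
A quick numerical check (my own sketch): the sample mean of \( N(3, 5^2) \) draws converges in probability to 3, so by the CMT its square should concentrate around 9 and its reciprocal around \( 1/3 \).

set.seed(99)
for (n in c(10, 1000, 1e+05)) {
    zn <- mean(rnorm(n, mean = 3, sd = 5))  # zn ->p 3
    cat("n =", n, ": zn^2 =", round(zn^2, 4), ", 1/zn =", round(1/zn, 4), "\n")
}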

Continuous Mapping Theorem for convergence in distribution (CMTD)

If \( z_n \overset{d}{\longrightarrow} z \) as \( n\to \infty \) and \( g: \mathbb{R}^m \to \mathbb{R}^k \) has a set of discontinuity points \( D_g \) such that \( \Pr(z\in D_g)=0 \), then \[ g(z_n) \overset{d}{\longrightarrow} g(z) \] as \( n\to \infty \).

The discontinuity remark can be illustrated with \( g(u) = u^{-1} \), which is discontinuous at \( u=0 \). But if \( z_n \overset{d}{\longrightarrow} z\sim N(0,1) \), then \( \Pr(z=0)=0 \), and so \( z_n^{-1} \overset{d}{\longrightarrow} z^{-1} \).
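
A small sketch of this point (mine, reusing the standardized statistic from the CLT section): since \( z_n \) is approximately \( N(0,1) \) for large \( n \), the quantiles of \( z_n^{-1} \) should match those of \( z^{-1} \) for a standard normal \( z \).

set.seed(7)
n <- 1000
zn <- replicate(5000, sqrt(n) * (mean(rexp(n, rate = 1/3)) - 3)/3)
probs <- c(0.25, 0.5, 0.75)
rbind(simulated = quantile(1/zn, probs), 
    limit = quantile(1/rnorm(5000), probs))  # quantiles of 1/z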

Slutsky Theorem

If \( z_n \overset{d}{\longrightarrow} z \) and \( c_n \overset{p}{\longrightarrow} c \), where \( c \) is a constant, as \( n\to \infty \), then
\[ \begin{aligned} z_n + c_n &\overset{d}{\longrightarrow} z + c \\ z_n c_n &\overset{d}{\longrightarrow} z c \\ \frac{z_n}{c_n} &\overset{d}{\longrightarrow} \frac{z}{c} \quad \text{if } c\neq 0 \end{aligned} \]
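
The classic application combines the CLT with the WLLN: replace the unknown \( \sigma \) in the standardized statistic by the sample standard deviation \( s_n \). Since \( s_n \overset{p}{\longrightarrow} \sigma \), the Slutsky theorem gives \( \sqrt{n}(\overline{y}_n - \mu)/s_n \overset{d}{\longrightarrow} N(0,1) \). A minimal sketch:

set.seed(5)
n <- 1000
t_ratio <- replicate(5000, {
    x <- rexp(n, rate = 1/3)
    sqrt(n) * (mean(x) - 3)/sd(x)  # sigma replaced by its estimate
})
c(mean = mean(t_ratio), var = var(t_ratio))  # approximately 0 and 1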

Delta Method

If \( \sqrt{n} (\theta_n - \theta_0) \overset{d}{\longrightarrow} \xi \), where \( \theta_0 \) is \( m \times 1 \) and \( g: \mathbb{R}^m \to \mathbb{R}^k \), \( k\leq m \), is continuously differentiable in a neighborhood of \( \theta_0 \), then as \( n\to \infty \)
\[ \sqrt{n} \left( g(\theta_n) - g(\theta_0) \right) \overset{d}{\longrightarrow} G'\xi \]
where \( G(\theta) = \frac{\partial}{\partial \theta} g(\theta)' \) and \( G=G(\theta_0) \). In particular, if
\[ \sqrt{n} (\theta_n - \theta_0) \overset{d}{\longrightarrow} N(0,V) \]
where \( V \) is \( m \times m \), then as \( n\to \infty \)
\[ \sqrt{n} \left( g(\theta_n) - g(\theta_0) \right) \overset{d}{\longrightarrow} N(0,G'VG). \]
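
As a final sketch (mine, reusing the exponential example): take \( \theta_n = \overline{y}_n \) with \( \theta_0 = 3 \) and \( V = 9 \), and let \( g(\theta) = \theta^2 \). Then \( G = g'(\theta_0) = 6 \), so the limiting variance of \( \sqrt{n}(g(\theta_n) - g(\theta_0)) \) is \( G^2 V = 324 \).

set.seed(23)
n <- 1000
gz <- replicate(5000, {
    m <- mean(rexp(n, rate = 1/3))
    sqrt(n) * (m^2 - 9)  # g(theta) = theta^2, g(theta_0) = 9
})
var(gz)  # should be close to G^2 * V = 36 * 9 = 324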