This document defines some concepts from probability theory and shows examples of their use in econometrics. Many of the definitions are based on Bruce Hansen's econometrics textbook, which is freely available online and a very useful reference.
A random variable \( z_n \in \mathbb{R} \) converges in probability to a constant \( c \in \mathbb{R} \), denoted \( z_n \overset{p}{\longrightarrow} c \) as \( n \to \infty \), if for all \( \delta > 0 \), \[ \lim_{n\to\infty} \Pr (|z_n - c | \leq \delta) = 1. \]
In simple words, this means that the distribution of \( z_n \) concentrates around the point \( c \) as \( n \) increases. We call \( c \) the probability limit of \( z_n \).
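The probability in this definition can be approximated by simulation. The sketch below is only an illustration under assumed choices that are not part of the text: \( z_n \) is taken to be the mean of \( n \) Uniform(0,1) draws, so that \( c = 0.5 \), and the tolerance \( \delta = 0.05 \), the sample sizes and the number of replications are arbitrary.

```r
# Monte Carlo estimate of Pr(|z_n - c| <= delta) for growing n.
# Assumed setup (illustrative only): z_n = mean of n Uniform(0, 1) draws, c = 0.5.
set.seed(1234)
c0    <- 0.5             # probability limit of z_n
delta <- 0.05            # tolerance from the definition
nreps <- 2000            # Monte Carlo replications per sample size
ns    <- c(10, 100, 1000)

prob_close <- sapply(ns, function(n) {
  zn <- replicate(nreps, mean(runif(n)))  # nreps independent copies of z_n
  mean(abs(zn - c0) <= delta)             # share of copies within delta of c
})
names(prob_close) <- paste0("n=", ns)
print(prob_close)
```

The estimated probabilities should rise towards one as \( n \) grows, which is what the definition requires for every fixed \( \delta \).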
Almost sure convergence is very similar to convergence in probability. “Almost sure” means “with probability equal to one”. It is a stronger convergence concept than the previous one:
A random variable \( z_n \in \mathbb{R} \) converges almost surely to a constant \( c \in \mathbb{R} \), denoted \( z_n \overset{a.s.}{\longrightarrow} c \) as \( n \to \infty \), if for all \( \delta > 0 \), \[ \Pr ( \lim_{n\to\infty} |z_n - c | \leq \delta) = 1. \]
The weak law of large numbers is an application of convergence in probability.
For \( y_i \) from an i.i.d. sample, if \( E|y|<\infty \), as \( n \to \infty \), \[ \overline{y} = \frac{1}{n} \sum_{i=1}^n y_i \overset{p}{\longrightarrow} E(y_i). \]
Say we have a random sample \( X_i \) drawn from the normal distribution with mean 3 and standard deviation 5, \( N(3,5^2) \). We can see how the sample mean concentrates around the population mean as we increase the sample size. For each sample size \( n \), we'll compute 30 sample means and investigate how they are distributed around the population mean. To understand this graph you only have to think about what it means to compute a mean based on only a few observations: if you draw an \( x \) far away from the mean 3, this will have a big impact on the sample mean (because you are not dividing by a large \( n \)).
# this example is inspired by Yihui Xie's animation package. Example here
# http://animation.yihui.name/prob:law_of_large_numbers
library(ggplot2)
set.seed(1234) # call set.seed(); assigning to the name would not seed the generator
mu <- 3 # mean
sdev <- 5 # standard deviation
ntrials <- 30 # number of means per sample size
ssize <- 80 # maximal sample size
df <- NULL
for (i in 1:ssize) {
  # ntrials sample means, each computed from i draws
  m <- replicate(ntrials, mean(rnorm(n = i, mean = mu, sd = sdev)))
  mdf <- data.frame(means = m, sid = rep(i, times = ntrials))
  df <- rbind(df, mdf)
}
# fun.min / fun.max replace the deprecated fun.ymin / fun.ymax (ggplot2 >= 3.3.0)
ggplot(data = df, aes(x = sid, y = means)) +
  stat_summary(geom = "ribbon", fun.min = "min", fun.max = "max",
               alpha = 0.5, fill = "blue") +
  geom_point() +
  scale_x_continuous("sample size") + scale_y_continuous("sample means") +
  geom_hline(yintercept = mu, color = "red")
Notice that this is true for any iid sequence with a finite mean. Take for example \( z_i = x_i^3 \). The result above tells us that \( \overline{x} = \frac{1}{n} \sum_{i=1}^n x_i \overset{p}{\longrightarrow} E(x_i) \), so it must also be the case that \( \overline{z} = \frac{1}{n} \sum_{i=1}^n z_i \overset{p}{\longrightarrow} E(z_i) \), where \( E(z_i)=E(x_i ^3) \). Let's try this out here:
set.seed(1234) # call set.seed(); assigning to the name would not seed the generator
mu <- 3 # mean
sdev <- 5 # standard deviation
ntrials <- 30 # number of means per sample size
ssize <- 80 # maximal sample size
third <- mean(rnorm(n = 10000, mean = mu, sd = sdev)^3) # estimate the third moment
df <- NULL
for (i in 1:ssize) {
  # ntrials means of z_i = x_i^3, each computed from i draws
  m <- replicate(ntrials, mean(rnorm(n = i, mean = mu, sd = sdev)^3))
  mdf <- data.frame(means = m, sid = rep(i, times = ntrials))
  df <- rbind(df, mdf)
}
# fun.min / fun.max replace the deprecated fun.ymin / fun.ymax (ggplot2 >= 3.3.0)
ggplot(data = df, aes(x = sid, y = means)) +
  stat_summary(geom = "ribbon", fun.min = "min", fun.max = "max",
               alpha = 0.5, fill = "blue") +
  geom_point() +
  scale_x_continuous("sample size") + scale_y_continuous("sample means") +
  geom_hline(yintercept = third, color = "red") # note: red line at third moment of x
Notice that the sample means converge towards \( E(x_i ^3) \), which we estimated to be 250.0712. For a normal distribution the exact value is \( \mu^3 + 3\mu\sigma^2 = 27 + 225 = 252 \).
To derive asymptotic distributions of estimators, we use the concept of convergence in distribution.
Let \( z_n \) be a random vector with distribution \( F_n (u) = \Pr (z_n \leq u) \). We say that \( z_n \) converges in distribution to \( z \) as \( n\to \infty \), denoted \( z_n \overset{d}{\longrightarrow} z \), if for all \( u \) at which \( F(u)=\Pr(z\leq u) \) is continuous, \( F_n(u)\to F(u) \) as \( n\to \infty \).
We say that \( z \) is the limiting distribution, or the asymptotic distribution of \( z_n \).
The central limit theorem is the main application: for \( y_i \) iid, if \( E||y||^2 < \infty \), then as \( n\to \infty \) \[ \sqrt{n} (\overline{y}_n - \mu) = \frac{1}{\sqrt{n}} \sum_{i=1}^n (y_i - \mu) \overset{d}{\longrightarrow} N(0,V) \] where \( \mu=E(y) \) and \( V=E((y-\mu)(y-\mu)') \).
In other words, the centered and scaled sample mean \( z_n = \sqrt{n} (\overline{y}_n - \mu) \) has mean zero and variance \( V \) and is approximately normally distributed in large samples. Please take a moment to appreciate that this is true for any distribution with a finite second moment that the sequence \( y_i \) might have been drawn from.
Suppose we draw random numbers from the exponential distribution with mean 3. This gives rise to the following histogram:
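A histogram of this kind can be drawn with a few lines of base R. The sketch below is an assumption-laden illustration: the number of draws (10,000), the seed and the number of bins are not given in the text, and base graphics are used instead of ggplot2 only to keep the snippet free of dependencies.

```r
# Histogram of draws from the exponential distribution with mean 3.
# The number of draws, the seed and the bin count are illustrative assumptions.
set.seed(12)
muexp <- 3                                   # mean of the distribution
draws <- rexp(n = 10000, rate = 1 / muexp)   # rate = 1 / mean
hist(draws, breaks = 50,
     main = "Draws from the exponential distribution (mean 3)",
     xlab = "y")
abline(v = muexp, col = "red")               # mark the population mean
```

The shape is strongly right-skewed and far from a bell curve, which is what makes the convergence to a normal distribution worth looking at.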
We can now watch the central limit theorem at work for this distribution, as we increase the sample size on which we compute the statistic \( z_n = \sqrt{n} (\overline{y}_n - \mu) \):
set.seed(12) # call set.seed(); assigning to the name would not seed the generator
lambda <- 1/3
muexp <- 1/lambda # mean
varexp <- 1/(lambda^2)
ntrials <- 100
ssizes <- c(3, 30, 1000, 1e+05) # sample sizes
df <- NULL
for (i in seq_along(ssizes)) {
  # ntrials sample means, each computed from ssizes[i] draws
  m <- replicate(ntrials, mean(rexp(n = ssizes[i], rate = 1/muexp)))
  m <- sqrt(ssizes[i]) * (m - muexp) # z_n = sqrt(n) * (mean - mu)
  mdf <- data.frame(means = m, sid = factor(paste("sample size", ssizes[i])))
  df <- rbind(df, mdf)
}
ggplot(data = df, aes(x = means, group = sid)) + geom_density() +
facet_wrap(~sid, scales = "free_y")
Note how the distributions get closer to \( N(0,9) \) with increasing sample size, where 9 is the variance \( 1/\lambda^2 \) of the exponential distribution.
A slight variation on this theme concerns standardization of this result: if we derive the asymptotic distribution of \( z_n = \sqrt{n} \frac{\overline{y}_n - \mu}{\sigma} \) we find that this converges to a standard normal. In our experiment:
set.seed(12) # call set.seed(); assigning to the name would not seed the generator
lambda <- 1/3
muexp <- 1/lambda # mean
varexp <- 1/(lambda^2)
ntrials <- 100
ssizes <- c(3,30,1000,100000) # sample sizes
df <- NULL
for (i in seq_along(ssizes)) {
  # ntrials sample means, each computed from ssizes[i] draws
  m <- replicate(ntrials, mean(rexp(n = ssizes[i], rate = 1/muexp)))
  m <- sqrt(ssizes[i]) * (m - muexp)/sqrt(varexp) # note standardization here
  mdf <- data.frame(means = m, sid = factor(paste("sample size", ssizes[i])))
  df <- rbind(df, mdf)
}
ggplot(data = df, aes(x = means,group=sid)) + geom_density() + facet_wrap(~sid,scales="free_y")
Also remember that this is not limited to the exponential distribution used in this example. We could have drawn random numbers from any distribution with finite variance, with the same result.
If \( z_n \overset{p}{\longrightarrow} c \) as \( n\to \infty \) and \( g(\cdot) \) is continuous at \( c \), then \[ g(z_n) \overset{p}{\longrightarrow} g(c) \] as \( n\to \infty \).
This implies in particular that if \( z_n \overset{p}{\longrightarrow} c \) as \( n\to \infty \) then
\[ \begin{aligned}
z_n + a \overset{p}{\longrightarrow} & c + a \\
z_n a \overset{p}{\longrightarrow} & a c \\
z_n^2 \overset{p}{\longrightarrow} & c^2 \\
\end{aligned}
\]
because the functions \( g(u)=u+a \), \( g(u)=au \) and \( g(u)=u^2 \) are continuous. For \( c\neq0 \), the same holds for division: \( \frac{a}{z_n} \overset{p}{\longrightarrow} \frac{a}{c} \).
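The result for \( g(u)=u^2 \) can be checked by simulation. The sketch below assumes, purely for illustration, that \( z_n \) is the mean of \( n \) draws from the normal distribution with mean 3 and standard deviation 5 used earlier, so that \( c = 3 \) and \( g(c) = 9 \); the tolerance \( \delta = 1 \), the sample sizes and the number of replications are arbitrary choices.

```r
# Continuous mapping in probability: if z_n ->p c then g(z_n) ->p g(c).
# Assumed setup (illustrative only): z_n = mean of n draws from N(3, 5^2), g(u) = u^2.
set.seed(1234)
mu    <- 3
sdev  <- 5
g     <- function(u) u^2
delta <- 1               # tolerance around g(mu) = 9
nreps <- 1000            # Monte Carlo replications per sample size
ns    <- c(10, 100, 10000)

prob_close <- sapply(ns, function(n) {
  gz <- replicate(nreps, g(mean(rnorm(n, mean = mu, sd = sdev))))
  mean(abs(gz - g(mu)) <= delta)   # share of g(z_n) within delta of g(c)
})
names(prob_close) <- paste0("n=", ns)
print(prob_close)
```

The share of replications with \( g(z_n) \) close to \( g(c) \) should approach one as \( n \) increases.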
If \( z_n \overset{d}{\longrightarrow} z \) as \( n\to \infty \) and \( g: \mathbb{R}^m \to \mathbb{R}^k \) has a set of discontinuity points \( D_g \) s.t. \( \Pr(z\in D_g)=0 \), then \[ g(z_n) \overset{d}{\longrightarrow} g(z) \] as \( n\to \infty \).
The discontinuity remark can be illustrated with \( g(u) = u^{-1} \), which is discontinuous at \( u=0 \). But if \( z_n \overset{d}{\longrightarrow} z\sim N(0,1) \) then \( \Pr(z=0)=0 \) and so \( z_n^{-1} \overset{d}{\longrightarrow} z^{-1} \).
If \( z_n \overset{d}{\longrightarrow} z \) and \( c_n \overset{p}{\longrightarrow} c \), where \( c \) is a constant, as \( n\to \infty \) then
\[ \begin{aligned} z_n + c_n \overset{d}{\longrightarrow} & z + c \\ z_n c_n \overset{d}{\longrightarrow} & z c \\ \frac{z_n}{c_n} \overset{d}{\longrightarrow} & \frac{z}{c}, \quad \text{if } c\neq 0 \end{aligned} \]
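A common use of the ratio result is the t-statistic. As a sketch under assumed settings, take \( z_n = \sqrt{n}(\overline{y}_n - \mu) \), which converges in distribution to \( N(0,\sigma^2) \), and \( c_n = s_n \), the sample standard deviation, which converges in probability to \( \sigma \). The ratio should then be approximately standard normal. The exponential distribution with mean 3, the sample size and the number of replications below are illustrative choices.

```r
# t-statistic: sqrt(n) * (ybar - mu) / s_n should be close to N(0, 1).
# Assumed setup (illustrative only): exponential draws with mean 3, n = 500.
set.seed(12)
muexp <- 3
n     <- 500
nreps <- 5000

tstat <- replicate(nreps, {
  y <- rexp(n, rate = 1 / muexp)
  sqrt(n) * (mean(y) - muexp) / sd(y)   # z_n divided by c_n
})

# Compare with N(0, 1): mean 0, variance 1, Pr(|t| <= 1.96) = 0.95
summary_t <- c(mean     = mean(tstat),
               variance = var(tstat),
               coverage = mean(abs(tstat) <= 1.96))
print(summary_t)
```

The three numbers should be close to 0, 1 and 0.95, even though the population standard deviation was replaced by an estimate.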
If \( \sqrt{n} (\theta_n - \theta_0) \overset{d}{\longrightarrow} \xi \), where \( \theta_n \) is (m,1) and \( g(\theta): \mathbb{R}^m \to \mathbb{R}^k,k\leq m \), is continuously differentiable in a neighborhood of \( \theta_0 \), then as \( n\to \infty \)
\[ \sqrt{n} (g(\theta_n) - g(\theta_0)) \overset{d}{\longrightarrow} G'\xi \]
where \( G(\theta) = \frac{\partial}{\partial \theta} g(\theta)' \) is the (m,k) matrix of derivatives and \( G=G(\theta_0) \). In particular, if
\[ \sqrt{n} (\theta_n - \theta_0) \overset{d}{\longrightarrow} N(0,V) \]
where \( V \) is (m,m), then as \( n\to \infty \)
\[ \sqrt{n} (g(\theta_n) - g(\theta_0)) \overset{d}{\longrightarrow} N(0,G'VG) \]
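The variance formula \( G'VG \) can be checked in a scalar case. The sketch below assumes, for illustration only, that \( \theta_n \) is the sample mean of exponential draws with mean 3 (so \( \theta_0 = 3 \) and \( V = 9 \)) and that \( g(\theta) = \theta^2 \), which gives \( G = 2\theta_0 = 6 \) and \( G'VG = 324 \). The sample size and number of replications are arbitrary.

```r
# Delta method check: Var( sqrt(n) * (g(theta_n) - g(theta_0)) ) ~ G' V G.
# Assumed setup (illustrative only): theta_n = mean of exponential draws, g(u) = u^2.
set.seed(12)
muexp <- 3               # theta_0 = E(y)
V     <- muexp^2         # variance of the exponential with mean 3
g     <- function(u) u^2
G     <- 2 * muexp       # derivative of g evaluated at theta_0
n     <- 2000
nreps <- 5000

stat <- replicate(nreps, {
  theta_n <- mean(rexp(n, rate = 1 / muexp))
  sqrt(n) * (g(theta_n) - g(muexp))
})

delta_check <- c(simulated = var(stat), delta_method = G * V * G)
print(delta_check)
```

The simulated variance should be close to 324, up to Monte Carlo error and a small finite-sample term.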