In econometrics, it is important to provide information to the reader about how uncertain we are about the estimates.
There are two main approaches used in statistics: classical statistics and Bayesian statistics. There is also a third way, called empirical Bayesian statistics.
The approach you are most familiar with from economics is called classical statistics.
Classical statistics uses the analogy principle. This principle states that if the econometrician is interested in some characteristic of the population, then she should use the analogous characteristic of the sample.
This section shows how the analogy principle can be used to provide information about the uncertainty surrounding our estimate.
Consider a case where we observe a sample \(x = \{x_1, \ldots, x_N\}\) with \(N\) observations. We assume the sample comes from a normal distribution, \(x \sim N(\mu, \sigma^2)\). Typically, we are interested in estimating the mean, \(\mu\). Let that estimate be denoted by \(\hat\mu\). We know that \(\hat\mu \neq \mu\), but by how much do they differ?
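Following the analogy principle, the natural estimate of the population mean is the sample mean,
\[
\hat\mu = \bar x = \frac{1}{N}\sum_{n=1}^{N} x_n,
\]
and, by the same logic, the sample standard deviation is used to estimate \(\sigma\).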
Consider the simulation of a sample of size 10 drawn from a normal distribution with mean -3 and standard deviation of 5.
# Simulate a sample of size 10 from N(-3, 5^2)
set.seed(2021)
N <- 10
mu <- -3
sigma <- 5
x <- rnorm(N, mean = mu, sd = sigma)
# Kernel density estimate of the sample distribution
dens_x <- density(x)
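One way to visualize how the sample compares to the true distribution is to plot the estimated density against the true normal density (a minimal sketch; the line types and widths are arbitrary choices):
# Estimated density of the sample
plot(dens_x, lwd = 2, main = "", xlab = "x")
# Overlay the true N(-3, 5^2) density for comparison
curve(dnorm(x, mean = mu, sd = sigma), add = TRUE, lty = 2, lwd = 2)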
The estimated density of this sample differs substantially from the true density. However, we are not interested in recovering the entire underlying distribution. Rather, we are interested in some aggregate characteristic of it, such as the mean.
mean(x)
## [1] -1.482596
sd(x)
## [1] 4.729517
The sample mean is -1.48, which is substantially different from the true value of -3. The sample standard deviation is 4.73, which also differs from the true value of 5.
How do we convey to the reader that our estimate of various statistics may not be accurate?
One idea is to think about how our estimate would vary if we had a large number of different samples, all drawn from the same true distribution. For each imaginary sample we draw from the true distribution, we get a different estimate. As the number of samples gets large, we obtain a distribution of these estimates. This distribution provides information about the uncertainty of our sample estimate.
# Simulate M = 500 samples of size N = 10 and store the mean of each
M <- 500
sample_est <- rep(NA, M)
for (m in 1:M){
  sample_est[m] <- mean(rnorm(N, mean = -3, sd = 5))
}
hist(sample_est)
The figure presents the histogram of the distribution of sample means. The distribution is centered around the true mean of -3, but individual sample means can fall roughly 3 above or below it. This illustrates that if we took a large number of samples, our estimate would be correct on average.
We say that our estimator of the mean is unbiased.
More importantly, most of the weight of the distribution is near the true value. However, we cannot rule out the possibility of our mean estimate being as low as -6 or as high as 0. The extent of this dispersion is determined by the size of the sample.
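As a quick check on both claims, we can look at the average and spread of the simulated sample means (a small sketch, reusing sample_est from the chunk above):
# Average of the simulated sample means: should be close to the true mean of -3
mean(sample_est)
# Spread of the simulated sample means: should be close to 5/sqrt(10), about 1.58
sd(sample_est)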
We can try running the same experiment with a larger sample, say \(N=100\).
# Repeat the experiment with a larger sample size of N = 100
N <- 100
M <- 500
sample_est <- rep(NA, M)
for (m in 1:M){
  sample_est[m] <- mean(rnorm(N, mean = -3, sd = 5))
}
hist(sample_est)
With \(N=100\), the distribution of sample means is much more tightly concentrated around the true value. Another idea is to consider what happens if we have a very large number of imaginary observations. If we had a very large sample, our estimate of the mean would be very close to the true value.
With 100,000 observations the sample mean is -2.98, which is quite close to -3. An estimator with this property is called consistent.
# Draw one very large sample and compute its mean
set.seed(2021)
N <- 100000
x <- rnorm(N, mean = -3, sd = 5)
mean(x)
## [1] -2.978944
The law of large numbers (LLN) indicates that the sample mean \(\bar x\) is a consistent estimator of \(\mu\). The LLN states that as the sample size gets large, the sample estimate converges to the true value. It suggests that if our sample is “large enough”, then our estimate is likely to be close to the true value.
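One way to visualize the LLN is to plot the running sample mean as the number of observations grows (a minimal sketch, reusing the large sample x drawn above; the running_mean name and the log scale on the horizontal axis are just presentation choices):
# Running mean after 1, 2, ..., N observations
running_mean <- cumsum(x) / (1:N)
plot(running_mean, type = "l", lwd = 2, log = "x",
     xlab = "Number of observations", ylab = "Sample mean")
# Horizontal reference line at the true mean of -3
abline(h = -3, lty = 2)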
This seems nice, but it may not be that useful if our sample size is 10. It also does not provide any information about how uncertain our estimate actually is.
The Central Limit Theorem (CLT) states that as the number of observations gets large, the sample mean is approximately normally distributed, with a mean equal to the true mean and a standard deviation equal to the true standard deviation divided by the square root of the number of observations.
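In the notation used above, for large \(N\),
\[
\bar x \overset{a}{\sim} N\!\left(\mu, \frac{\sigma^2}{N}\right),
\]
where \(\overset{a}{\sim}\) denotes “approximately distributed as”. Equivalently, \(\sqrt{N}(\bar x - \mu)/\sigma\) is approximately standard normal, which is the normalization used in the simulation below.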
The following is a simulation of the CLT. It presents the density of the distribution of estimated sample means as the size of the sample increases. Note that the distributions are normalized so that they will be standard normal (if the sample size is large enough).
# Simulate 1000 sample means for sample sizes 10, 100, 1000 and 10000
M <- 1000
sample_means <- matrix(NA, M, 4)
for (m in 1:M){
  sample_means[m, 1] <- mean(rnorm(10, mean = -3, sd = 5))
  sample_means[m, 2] <- mean(rnorm(100, mean = -3, sd = 5))
  sample_means[m, 3] <- mean(rnorm(1000, mean = -3, sd = 5))
  sample_means[m, 4] <- mean(rnorm(10000, mean = -3, sd = 5))
}
# Standardize each column: subtract the true mean (-3) and
# multiply by sqrt(N)/sigma, where N = 10^i and sigma = 5
plot(density((sample_means[, 1] + 3)*(10^(0.5)/5)),
     type = "l", lwd = 3, lty = 1, xlab = "x", main = "")
for (i in 2:4){
  lines(density((sample_means[, i] + 3)*(10^(i*0.5)/5)),
        lwd = 3, lty = i)
}
print("Densities of samples means for various sample sizes with known true mean and variance")
## [1] "Densities of samples means for various sample sizes with known true mean and variance"
The interesting part of the figure above is that even for small samples the density is close to a standard normal distribution.
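To make this comparison concrete, we can overlay the standard normal density on the same figure (a small sketch; it assumes the plot from the chunk above is still the active graphics device):
# Add the standard normal density as a gray reference curve
curve(dnorm(x), add = TRUE, col = "gray", lwd = 2)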
The CLT suggests a way of determining the uncertainty associated with the estimate. It gives a distribution for the sample mean. Moreover, the variance of that distribution is determined by the true variance and the sample size. The uncertainty associated with the estimate is smaller when the variance is smaller and when the sample size is larger.
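As an illustration, here is a minimal sketch that uses the CLT to quantify the uncertainty of the mean estimate from the original sample of 10 observations; it plugs the sample standard deviation in for the unknown true value, and the object names (x_small, se_mean) are just for illustration:
# Re-create the original sample of 10 observations
set.seed(2021)
x_small <- rnorm(10, mean = -3, sd = 5)
# Standard error of the sample mean implied by the CLT: sd / sqrt(N)
se_mean <- sd(x_small) / sqrt(10)
se_mean
# Approximate 95% interval for the mean based on the normal distribution
mean(x_small) + c(-1.96, 1.96) * se_mean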
The CLT states that as the sample size tends to infinity, the distribution of the sample mean approaches a normal distribution. This is a statement about the shape of the distribution. The LLN tells us where the center of the bell is located: as the sample size approaches infinity, the center of the distribution of sample means gets very close to the population mean.