Asymptotics is the term for the behavior of statistics as the sample size (or some other relevant quantity) tends to infinity (or some other relevant number).
Asymptopia is the name of a land where everything works well, as it should, because there is an infinite amount of data there.
Asymptotics are incredibly useful for simple statistical inference and approximations. They are like a little Swiss Army knife that you can pull out to investigate the statistical properties of many statistics without having to do much computing.
Asymptotics form the basis for the frequency interpretation of probabilities (the long-run proportion of times an event occurs).
Fortunately, instead of diving into the mathematics of the limits of random variables, we can rely on a set of powerful tools.
These results allow us to talk about the large-sample distribution of sample means of a collection of \(iid\) observations.
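The first of these results, the Law of Large Numbers (LLN), is one we intuitively know: the sample mean of a large number of \(iid\) observations gets close to the population mean. We can see this by simulating \(n = 1000\) standard normal observations, whose population mean is zero, and tracking the running mean.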
set.seed(1234)
n <- 1000
means <- cumsum(rnorm(n))/(1:n)  # running means of n iid N(0, 1) draws
Each entry of the vector means is a cumulative mean: the first is the first observation by itself, the second is the mean of the first and second observations, the third is the mean of the first three observations, and so on.
plot(1:n, means, type = "l", xlab = "n", ylab = "Cumulative mean", main = "Cumulative means series")
abline(h = 0, col = "red")
Notice that there is a lot of variability in the cumulative means at first, but as the number of simulated observations grows, they get closer and closer to the true population mean, which is zero.
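The plot follows a single simulated sequence. To see the LLN as a statement about probabilities, one can repeat the experiment many times and record how often the sample mean lands within a small tolerance of the true mean, zero; by the LLN that fraction should approach 1 as n grows. This is a sketch under assumed settings: the tolerance eps = 0.1, the 1000 replicates per sample size, and the helper name prop_close are arbitrary illustrative choices, not part of the original notes.

```r
set.seed(1234)

# Hypothetical helper (illustrative name): fraction of `reps` simulated
# sample means of n iid standard normals that fall within `eps` of 0
prop_close <- function(n, reps = 1000, eps = 0.1) {
  sample_means <- replicate(reps, mean(rnorm(n)))
  mean(abs(sample_means) < eps)
}

sizes <- c(10, 100, 1000)
close_frac <- sapply(sizes, prop_close)
names(close_frac) <- sizes
close_frac
```

Because the standard deviation of the mean of n standard normals is \(1/\sqrt{n}\), the fraction should climb from roughly a quarter at n = 10 to nearly 1 at n = 1000.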
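The same thing happens with coin flips. Coding tails as 0 and heads as 1, the cumulative mean is the running proportion of heads, which for a fair coin converges to 0.5.
means <- cumsum(sample(0:1, n, replace = TRUE))/(1:n)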
plot(1:n, means, type = "l", xlab = "n", ylab = "Cumulative mean", main = "Coin Flip Series")
abline(h = 0.5, col = "red")
An estimator is consistent if it converges to what you want to estimate. For example, the sample proportion from \(iid\) coin flips is consistent for the true success probability of the coin: as you flip the coin over and over, the sample proportion of heads converges to the probability of getting heads on that coin.
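In symbols, using the standard definition via convergence in probability, an estimator \(\hat\theta_n\) based on n observations is consistent for \(\theta\) if the probability that it misses \(\theta\) by more than any fixed \(\epsilon\) goes to zero. For the coin, with \(X_i\) equal to 1 for heads and 0 for tails and \(p\) the true probability of heads, Chebyshev's inequality together with \(p(1-p) \le 1/4\) gives a short proof:

```latex
\begin{aligned}
&\text{Consistency: } \lim_{n \to \infty} P\left(|\hat\theta_n - \theta| > \epsilon\right) = 0 \text{ for every } \epsilon > 0, \\
&\text{Coin flips: } \hat p_n = \frac{1}{n}\sum_{i=1}^{n} X_i, \quad E[\hat p_n] = p, \quad \operatorname{Var}(\hat p_n) = \frac{p(1-p)}{n}, \\
&P\left(|\hat p_n - p| > \epsilon\right) \le \frac{p(1-p)}{n\epsilon^2} \le \frac{1}{4n\epsilon^2} \xrightarrow[n \to \infty]{} 0 .
\end{aligned}
```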
The LLN says that the sample mean of \(iid\) samples is consistent for the population mean. Good estimators are typically consistent; it's not too much to ask that, if we go to the trouble of collecting an infinite amount of data, we get the right answer. Consistency is not automatic, however, and has to be checked for each estimator.
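To see that consistency is a real requirement, consider an estimator that ignores most of the data: using only the first observation, \(X_1\), to estimate the mean. It is unbiased, but its spread does not shrink as n grows, so it is not consistent. The sketch below compares it with the sample mean over repeated samples of size n = 1000 from a standard normal; the number of replicates (2000) and the variable names are illustrative assumptions.

```r
set.seed(1234)
n <- 1000
reps <- 2000

# Each column is one simulated sample of n iid standard normals (true mean 0)
samples <- matrix(rnorm(n * reps), nrow = n)

first_obs   <- samples[1, ]        # estimator 1: first observation only
sample_mean <- colMeans(samples)   # estimator 2: sample mean (consistent by the LLN)

# Spread of each estimator around the true mean across the replicates
c(sd_first_obs = sd(first_obs), sd_sample_mean = sd(sample_mean))
```

Both estimators average to about zero across replicates, but only the sample mean concentrates: the first-observation estimator keeps a standard deviation of about 1 however large n is, while the sample mean's is about \(1/\sqrt{1000} \approx 0.03\).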
The sample variance and the sample standard deviation of \(iid\) random variables are consistent as well, for the population variance and population standard deviation respectively.
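A quick check in the same spirit as the cumulative-mean plots: draw \(iid\) normals with an assumed standard deviation of 2 (so a population variance of 4) and compute var() and sd() on growing prefixes of the data. The distribution, its parameters, and the prefix sizes are arbitrary choices for illustration.

```r
set.seed(1234)
x <- rnorm(1e5, mean = 0, sd = 2)  # true variance 4, true sd 2

sizes <- c(10, 100, 1000, 1e4, 1e5)
# Sample variance and standard deviation of the first k observations
est <- t(sapply(sizes, function(k) c(n = k, var = var(x[1:k]), sd = sd(x[1:k]))))
est
```

The rows for small prefixes can be noticeably off, while the rows for large prefixes should sit close to 4 and 2.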