Asymptotics
- Asymptotics is the term for the behavior of statistics as the sample size (or some other relevant quantity) tends to infinity (or some other relevant number)
- (Asymptopia is my name for the land of asymptotics, where everything works out well and there are no messes. The land of infinite data is nice that way.)
- Asymptotics are incredibly useful for simple statistical inference and approximations
- (Not covered in this class) Asymptotics often lead to nice understanding of procedures
- Asymptotics generally give no assurances about finite sample performance
- The kinds of asymptotics that do are orders of magnitude more difficult to work with
- Asymptotics form the basis for frequency interpretation of probabilities
(the long run proportion of times an event occurs)
- To understand asymptotics, we need a very basic understanding of limits.
Numerical limits
Imagine a sequence
- \( a_1 = .9 \),
- \( a_2 = .99 \),
- \( a_3 = .999 \), …
Clearly this sequence converges to \( 1 \)
Definition of a limit: for any fixed distance, we can find a point in the sequence beyond which every term is closer to the limit than that distance
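As a quick illustration (added here, not part of the original notes), the sequence above is \( a_n = 1 - 10^{-n} \), and its distance from \( 1 \) drops below any fixed tolerance from some point on:

n <- 1 : 8
a <- 1 - 10 ^ (-n)        # .9, .99, .999, ...
abs(a - 1)                # distance from the limit 1
which(abs(a - 1) < .005)  # within .005 of the limit from n = 3 on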
Limits of random variables
- The problem is harder for random variables
Consider \( \bar X_n \), the sample average of the first \( n \) of a collection of iid observations
- Example \( \bar X_n \) could be the average of the result of \( n \) coin flips (i.e. the sample proportion of heads)
We say that \( \bar X_n \) converges in probability to a limit if for any fixed distance the probability of \( \bar X_n \) being closer (further away) than that distance from the limit converges to one (zero)
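As an illustrative sketch (not from the original notes; the sample size grid and tolerance are arbitrary choices), we can estimate \( P(|\bar X_n - \mu| > \epsilon) \) by simulation for standard normal data, where \( \mu = 0 \):

eps <- .1; nosim <- 1000
## for each n, simulate nosim sample means and estimate P(|Xbar_n - 0| > eps)
sapply(c(10, 100, 1000), function(n)
  mean(abs(rowMeans(matrix(rnorm(nosim * n), nosim)) - 0) > eps))
## the estimated probabilities head toward zero as n grows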
The Law of Large Numbers
- Establishing that a random sequence converges to a limit is hard
- Fortunately, we have a theorem that does all the work for us, called
the Law of Large Numbers
- The law of large numbers states that if \( X_1, \ldots, X_n \) are iid from a population with mean \( \mu \) and variance \( \sigma^2 \), then \( \bar X_n \) converges in probability to \( \mu \)
- (There are many variations on the LLN; we are using a particularly lazy version, my favorite kind of version)
Law of large numbers in action
n <- 10000
## cumulative means of n simulated standard normals
means <- cumsum(rnorm(n)) / (1 : n)
plot(1 : n, means, type = "l", lwd = 2,
frame = FALSE, ylab = "cumulative means", xlab = "sample size")
abline(h = 0)  # the true mean
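The same picture for coin flips (a variant sketch, fair coin assumed) shows the sample proportion of heads settling at \( .5 \):

n <- 10000
means <- cumsum(sample(0 : 1, n, replace = TRUE)) / (1 : n)
plot(1 : n, means, type = "l", lwd = 2,
frame = FALSE, ylab = "cumulative means", xlab = "sample size")
abline(h = .5)  # the true proportion of heads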
Discussion
- An estimator is consistent if it converges to what you want to estimate
- Consistency is neither necessary nor sufficient for one estimator to be better than another
- Typically, good estimators are consistent; it's not too much to ask that if we go to the trouble of collecting an infinite amount of data that we get the right answer
- The LLN basically states that the sample mean is consistent
- The sample variance and the sample standard deviation are consistent as well (a quick simulation check follows below)
- Recall that the sample mean and the sample variance are also unbiased
- (The sample standard deviation is biased, by the way)
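A minimal simulation check of the variance claim (an added sketch; standard normal data, so the true variance is \( 1 \)):

## sample variances of standard normal samples of increasing size;
## consistency says these should settle near the true variance, 1
sapply(c(10, 100, 1000, 10000), function(n) var(rnorm(n)))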
The Central Limit Theorem
- The Central Limit Theorem (CLT) is one of the most important theorems in statistics
- For our purposes, the CLT states that the distribution of averages of iid variables, properly normalized, becomes that of a standard normal as the sample size increases
- The CLT applies in an endless variety of settings
- Let \( X_1,\ldots,X_n \) be a collection of iid random variables with mean \( \mu \) and variance \( \sigma^2 \)
- Let \( \bar X_n \) be their sample average
- Then \( \frac{\bar X_n - \mu}{\sigma / \sqrt{n}} \) has a distribution like that of a standard normal for large \( n \).
- Remember the form
\[ \frac{\bar X_n - \mu}{\sigma / \sqrt{n}} =
\frac{\mbox{Estimate} - \mbox{Mean of estimate}}{\mbox{Std. Err. of estimate}}.
\]
- Usually, replacing the standard error by its estimated value doesn't change the CLT
Example
- Simulate a standard normal random variable by rolling \( n \) (six sided) dice
- Let \( X_i \) be the outcome for die \( i \)
- Then note that \( \mu = E[X_i] = 3.5 \)
- \( Var(X_i) = 2.92 \)
- SE of the mean is \( \sqrt{2.92 / n} = 1.71 / \sqrt{n} \)
- Standardized mean
\[
\frac{\bar X_n - 3.5}{1.71/\sqrt{n}}
\]
Simulation of mean of \( n \) dice
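The figure is omitted here; a minimal base-R sketch that reproduces it (the sample sizes and histogram settings are assumptions) is:

nosim <- 1000
## standardized mean of n die rolls, simulated nosim times for each n
cfunc <- function(x, n) sqrt(n) * (mean(x) - 3.5) / 1.71
par(mfrow = c(1, 3))
for (n in c(10, 20, 30)) {
  z <- apply(matrix(sample(1 : 6, nosim * n, replace = TRUE), nosim), 1, cfunc, n)
  hist(z, breaks = 20, freq = FALSE, main = paste("n =", n), xlab = "")
  curve(dnorm(x), add = TRUE, lwd = 2)  # standard normal density overlay
}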
Coin CLT
- Let \( X_i \) be the \( 0 \) or \( 1 \) result of the \( i^{th} \) flip of a possibly unfair coin
- The sample proportion, say \( \hat p \), is the average of the coin flips
- \( E[X_i] = p \) and \( Var(X_i) = p(1-p) \)
- Standard error of the mean is \( \sqrt{p(1-p)/n} \)
- Then
\[
\frac{\hat p - p}{\sqrt{p(1-p)/n}}
\]
will be approximately normally distributed
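A quick simulation sketch of this (fair coin assumed, so \( p = .5 \); the settings are arbitrary):

nosim <- 1000; n <- 100; p <- .5
## nosim standardized sample proportions, each from n coin flips
phat <- rowMeans(matrix(sample(0 : 1, nosim * n, replace = TRUE), nosim))
z <- (phat - p) / sqrt(p * (1 - p) / n)
hist(z, breaks = 20, freq = FALSE, main = "standardized sample proportion", xlab = "")
curve(dnorm(x), add = TRUE, lwd = 2)  # standard normal density overlay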
CLT in practice
- In practice the CLT is mostly useful as an approximation
\[
P\left( \frac{\bar X_n - \mu}{\sigma / \sqrt{n}} \leq z \right) \approx \Phi(z).
\]
- Recall \( 1.96 \) is a good approximation to the \( .975^{th} \) quantile of the standard normal
- Consider
\[
\begin{eqnarray*}
.95 & \approx & P\left( -1.96 \leq \frac{\bar X_n - \mu}{\sigma / \sqrt{n}} \leq 1.96 \right)\\
& = & P\left(\bar X_n + 1.96 \sigma/\sqrt{n} \geq \mu \geq \bar X_n - 1.96\sigma/\sqrt{n} \right)
\end{eqnarray*}
\]
Confidence intervals
- Therefore, according to the CLT, the probability that the random interval \[ \bar X_n \pm z_{1-\alpha/2}\sigma / \sqrt{n} \] contains \( \mu \) is approximately 100\( (1-\alpha) \)%, where \( z_{1-\alpha/2} \) is the \( 1-\alpha/2 \) quantile of the standard normal distribution
- This is called a \( 100(1 - \alpha) \)% confidence interval for \( \mu \)
- We can replace the unknown \( \sigma \) with \( s \)
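As a sketch of what "approximately 95%" means (a simulation added here; standard normal data, so \( \mu = 0 \), with \( s \) replacing \( \sigma \)):

nosim <- 1000; n <- 50
## proportion of simulated intervals Xbar +/- 1.96 s / sqrt(n) that contain mu = 0
mean(replicate(nosim, {
  x <- rnorm(n)
  ci <- mean(x) + c(-1, 1) * qnorm(.975) * sd(x) / sqrt(n)
  ci[1] < 0 & ci[2] > 0
}))
## should come out near .95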
Give a confidence interval for the average height of sons in Galton's data
library(UsingR); data(father.son); x <- father.son$sheight
## 95% interval for the mean son's height, converted from inches to feet
(mean(x) + c(-1, 1) * qnorm(.975) * sd(x) / sqrt(length(x))) / 12
[1] 5.710 5.738
Sample proportions
- In the event that each \( X_i \) is \( 0 \) or \( 1 \) with common success probability \( p \) then \( \sigma^2 = p(1 - p) \)
- The interval takes the form
\[
\hat p \pm z_{1 - \alpha/2} \sqrt{\frac{p(1 - p)}{n}}
\]
- Replacing \( p \) by \( \hat p \) in the standard error results in what is called a Wald confidence interval for \( p \)
- Also note that \( p(1-p) \leq 1/4 \) for \( 0 \leq p \leq 1 \)
- Let \( \alpha = .05 \) so that \( z_{1-\alpha/2} = 1.96 \approx 2 \); then
\[
2 \sqrt{\frac{p(1 - p)}{n}} \leq 2 \sqrt{\frac{1}{4n}} = \frac{1}{\sqrt{n}}
\]
- Therefore \( \hat p \pm \frac{1}{\sqrt{n}} \) is a quick CI estimate for \( p \)
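To see how little is lost, a quick comparison of the Wald interval and the \( \hat p \pm 1/\sqrt{n} \) shortcut (a sketch; \( \hat p = .6 \) and \( n = 100 \) are made-up values):

phat <- .6; n <- 100
## Wald interval vs the quick +/- 1/sqrt(n) interval; they barely differ
rbind(wald  = phat + c(-1, 1) * qnorm(.975) * sqrt(phat * (1 - phat) / n),
      quick = phat + c(-1, 1) / sqrt(n))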
Example
- Your campaign advisor told you that in a random sample of 100 likely voters, 56 intend to vote for you.
- Can you relax? Do you have this race in the bag?
- Without access to a computer or calculator, how precise is this estimate?
- \( 1/\sqrt{100} = .1 \), so a back-of-the-envelope calculation gives an approximate 95% interval of \( (0.46, 0.66) \)
- Not enough for you to relax, better go do more campaigning!
- Rough guidelines: 100 for 1 decimal place, 10,000 for 2, 1,000,000 for 3.
round(1 / sqrt(10 ^ (1 : 6)), 3)
[1] 0.316 0.100 0.032 0.010 0.003 0.001
Poisson interval
- A nuclear pump failed 5 times in 94.32 days; give a 95% confidence interval for the failure rate per day
- \( X \sim Poisson(\lambda t) \).
- Estimate \( \hat \lambda = X/t \)
- \( Var(\hat \lambda) = \lambda / t \)
\[
\frac{\hat \lambda - \lambda}{\sqrt{\hat \lambda / t}}
=
\frac{X - t \lambda}{\sqrt{X}}
\rightarrow N(0,1)
\]
- This isn't the best interval.
- There are better asymptotic intervals.
- You can get an exact CI in this case.
R code
x <- 5; t <- 94.32; lambda <- x / t
## large-sample (Wald-style) interval from the asymptotic normality above
round(lambda + c(-1, 1) * qnorm(.975) * sqrt(lambda / t), 3)
[1] 0.007 0.099
## exact Poisson interval
poisson.test(x, T = 94.32)$conf
[1] 0.01721 0.12371
attr(,"conf.level")
[1] 0.95
In the regression class
## likelihood-based interval via a Poisson GLM with a log(t) offset
exp(confint(glm(x ~ 1 + offset(log(t)), family = poisson(link = log))))
2.5 % 97.5 %
0.01901 0.11393