October 17, 2016

Asymptotics

What we've been looking at

  • We've seen features of OLS that are true for any sample:
    • We get unbiased estimates when Assumptions 1-4 hold, and
    • Assumption 5 means our usual methods of estimating variance are valid
    • etc., etc.

What we'll be looking at

  • With large samples:
    • our estimators are consistent (even if they're biased in small samples), and
    • we can rely on asymptotic normality for our hypothesis tests without assuming the error term is normally distributed.

Law of Large Numbers

The biggest and strongest numbers tell the other numbers what to do.

  • The average from a large number of trials (e.g. dice rolls) will tend to converge to the expected value (a quick simulation follows this list).
    • This means that your hot streak on the roulette wheel can't be permanent… in the long run.
    • This does not mean that if "we're due" for a 13 you should put all your money on it… one more trial is not a large number.
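
A quick sketch in R (my own simulation, not tied to any course data): the running average of simulated dice rolls drifts toward the expected value of 3.5 as the number of rolls grows.

# Law of Large Numbers sketch: running mean of fair dice rolls
set.seed(1)                                 # for reproducibility
rolls <- sample(1:6, 10000, replace = TRUE) # 10,000 fair dice rolls
running_mean <- cumsum(rolls) / seq_along(rolls)
running_mean[c(10, 100, 1000, 10000)]       # should get closer and closer to 3.5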

Consistency

  • With really big samples, a consistent estimator approaches the true value of the underlying parameter. It is asymptotically unbiased.

As \(n \rightarrow \infty\)

\[ P(|\hat{\theta}_n - \theta| < \varepsilon) \rightarrow 1\]

for any arbitrarily small \(\varepsilon > 0\).

Consistency

As \(n \rightarrow \infty\)

\[ P(|\hat{\theta}_n - \theta| < \varepsilon) \rightarrow 1\]

for any arbitrarily small \(\varepsilon > 0\).

In English:

  • \(\varepsilon\) is any number you want it to be, and what you want it to be is some teeny tiny number that's practically zero.
  • You're using a random sample of \(n\) observations to estimate some parameter \(\theta\) (e.g. a slope coefficient)

Consistency

As \(n \rightarrow \infty\)

\[ P(|\hat{\theta}_n - \theta| < \varepsilon) \rightarrow 1\]

for any arbitrarily small \(\varepsilon > 0\).

  • With consistent estimators, the bigger your sample, the less likely it is that your estimate will miss the true value by more than \(\varepsilon\) (the smallest number that makes a difference to you).
    • Even if your estimator is biased in small samples, if it's consistent, it will (probabilistically) get closer and closer to the truth as your sample gets bigger.

Consistency

The sampling distribution is biased in small samples, but the estimator converges on the true value for large (enough) samples. (A simulation sketch follows.)
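
A simulation sketch of a biased-but-consistent estimator (my own example, not from the slides): the variance estimator that divides by \(n\) instead of \(n - 1\) underestimates the truth in small samples, but homes in on it as \(n\) grows.

# Biased-but-consistent sketch: variance estimator that divides by n
set.seed(1)
biased_var <- function(x) mean((x - mean(x))^2)         # divides by n, not n - 1
for (n in c(5, 50, 500, 5000)) {
  est <- replicate(2000, biased_var(rnorm(n, sd = 2)))  # true variance is 4
  cat("n =", n, " average estimate =", round(mean(est), 3), "\n")
}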

Why is consistency important?

  • Any estimator that isn't consistent is failing an important test
  • But unbiasedness might be too high a bar for some situations
    • in situations where our best available estimator is biased, we might still be okay as long as it's consistent.
    • These aren't situations I've come across in my work, but it's a big statistical world and I live in a small part of it.

Do we have to worry about OLS?

No. Under Assumptions 1-4, OLS is unbiased and consistent: not only is \(\hat{\beta}\) an unbiased estimator for \(\beta\), it also converges to that value as \(n\) approaches infinity.

What this means: We can get arbitrarily close to the true value of \(\beta\) by increasing our sample size.
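
A rough simulation of that claim (the data-generating process here is my own, with a true slope of 2): the OLS slope estimate lands closer and closer to the truth as the sample grows.

# Consistency of OLS: single draws at increasing sample sizes
set.seed(1)
for (n in c(20, 200, 2000, 20000)) {
  x <- rnorm(n)
  y <- 1 + 2 * x + rnorm(n)                # true model: intercept 1, slope 2
  cat("n =", n, " slope estimate =", round(coef(lm(y ~ x))[2], 4), "\n")
}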

Some math (hold on to your hats!)

For the simple regression case (you don't need to remember this, but it's helpful to see it… you can always look it up when you forget):

\[ \hat{\beta_1} = \frac{\sum(x_{i1} - \bar{x}_1) y_i }{\sum(x_{i1} - \bar{x}_1)^2} \] \[ \hat{\beta_1} = \beta_1 + \frac{\sum(x_{i1} - \bar{x}_1) u_i }{\sum(x_{i1} - \bar{x}_1)^2} = \beta_1 + \frac{\frac{1}{n} \sum(x_{i1} - \bar{x}_1) u_i }{\frac{1}{n} \sum(x_{i1} - \bar{x}_1)^2} \]

This second equation is derived by substituting \(\beta_0 + \beta_1 x_{i1} + u_i\) for \(y_i\) in the first equation and doing algebra.
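
Spelling out that substitution in one step (the \(\beta_0\) term drops out because \(\sum(x_{i1} - \bar{x}_1) = 0\), and \(\sum(x_{i1} - \bar{x}_1) x_{i1} = \sum(x_{i1} - \bar{x}_1)^2\)):

\[ \hat{\beta_1} = \frac{\sum(x_{i1} - \bar{x}_1)(\beta_0 + \beta_1 x_{i1} + u_i)}{\sum(x_{i1} - \bar{x}_1)^2} = \beta_1 + \frac{\sum(x_{i1} - \bar{x}_1) u_i}{\sum(x_{i1} - \bar{x}_1)^2} \]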

More math

Now that we've got \(n\) in the equation, we can use the Law of Large Numbers and make a statement in terms of probability limits. (Remember limits?)

\[plim \hat{\beta_1} = \beta_1 + cov(x_1,u)/var(x_1)\] \[plim \hat{\beta_1} = \beta_1 + 0/var(x_1) = \beta_1\]

because the sample averages in the numerator and denominator converge to \(cov(x_1,u)\) and \(var(x_1)\), and assumption 4 (zero mean and zero correlation) makes that covariance zero. OLS is consistent!

Careful

  • There are wild random distributions out there that don't behave as nicely…
  • When a distribution has fat tails there's a relatively high probability of events far from the mean (a quick simulation follows this list).
  • If a process has positive feedback loops it might have infinite variance.
    • e.g. an unexpected change in the price of some share garners increased attention, which garners even more…
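
Here's the promised sketch (my own example): the Cauchy distribution has tails so fat that it has no finite mean or variance, so the Law of Large Numbers has nothing to converge to and the running average never settles down.

# Fat tails breaking the LLN: the Cauchy distribution has no finite mean
set.seed(1)
x <- rcauchy(100000)
running_mean <- cumsum(x) / seq_along(x)
running_mean[c(100, 1000, 10000, 100000)]   # keeps lurching around instead of converging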

Update to Assumption 4

Zero conditional mean assumption: \(E(u|x_1,x_2,...,x_k) = 0\)

New assumption: \(E(u) = 0\) and \(cov(x_j,u) = 0\) for \(j=1,2,...,k\).

That is, zero mean and zero correlation

The first implies the second: zero conditional mean is the stronger assumption (and harder to check).

Zero mean/correlation

Zero conditional mean assumption means that we can't have some function \(f(x_1,x_2,...,x_k)\) that is correlated with \(u\).

Zero mean/correlation only requires that each \(x_j\) is uncorrelated with \(u\), even if there's some convoluted combination of \(x\) variables that is correlated with \(u\).
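
A concrete sketch of the gap between the two assumptions (my own toy example): let \(u = x_1^2 - 1\) with \(x_1\) standard normal. Then \(E(u) = 0\) and \(cov(x_1,u) = 0\), so the weaker assumption holds, but \(E(u|x_1) = x_1^2 - 1 \neq 0\), so zero conditional mean fails.

# Zero mean and zero correlation without zero conditional mean
set.seed(1)
x1 <- rnorm(100000)
u  <- x1^2 - 1                                        # depends on x1, but E(x1^3) = 0
round(c(mean_u = mean(u), cor_x1_u = cor(x1, u)), 3)  # both essentially zero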

Zero mean/correlation

  • This weaker assumption also frees us up to consider cases where our linear model isn't the true population regression function.

  • The zero conditional mean assumption requires that we've built a model with all appropriate non-linearities accounted for with things like quadratic terms.

  • In practice we don't want a true but messy model,
    • we want a simplified model that reflects the truth in a way we can digest (interpretability!).
  • Our gentler assumption frees us up to estimate the best linear approximation of more complex processes. (But beware supramarginal changes!)

Problems with regressions

If \(cov(x_j,u) \neq 0\) for any \(j=1,2,...,k\) then OLS is biased and inconsistent.

If we could observe \(u\) we could write the asymptotic bias as

\[ plim \hat{\beta_1} - \beta_1 = \frac{cov(x_1,u)}{var(x_1)}\]

Omitted variable bias with math

  • Let the true model be \(y = \beta_0 + \beta_1 x_1 + \beta_2 x_2 + v\), and let it follow our standard assumptions.

  • If we estimate \(\tilde y = \tilde\beta_0 + \tilde\beta_1 x_1 +u\) then we're going to over- or under-state the effect of \(x_1\) (depending on \(cov(x_1,x_2)\)). \[ plim \tilde\beta_1 = \beta_1 + \beta_2 \frac{cov(x_1,x_2)}{var(x_1)}\]

  • This is because the error in the short regression is \(u = \beta_2 x_2 + v\), and \(v\) is uncorrelated with both \(x\) variables.
    • If \(x_1\) and \(x_2\) are correlated, then \(x_1\) is correlated with \(u\), so \(\tilde\beta_1\) is biased and inconsistent.
    • If they're uncorrelated, then \(\tilde\beta_1\) is at least consistent. (A simulation sketch follows this list.)
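
A simulation sketch of that formula (true values chosen arbitrarily: \(\beta_1 = 1\), \(\beta_2 = 3\), and \(cov(x_1,x_2)/var(x_1) = 0.5\)): omitting \(x_2\) should push the short-regression slope toward \(1 + 3 \times 0.5 = 2.5\).

# Omitted variable bias sketch
set.seed(1)
n  <- 100000
x1 <- rnorm(n)
x2 <- 0.5 * x1 + rnorm(n)                 # correlated with x1
y  <- 2 + 1 * x1 + 3 * x2 + rnorm(n)      # true model
coef(lm(y ~ x1 + x2))["x1"]               # close to the true value of 1
coef(lm(y ~ x1))["x1"]                    # close to 1 + 3 * 0.5 = 2.5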

Returning to the normality assumption

  • We require Assumption 6 (errors are normally distributed and independent of our \(x\) variables) to do hypothesis testing.
    • We can't see \(u\) but we can see \(y\), so we'll treat this assumption as \(y\) being normally distributed around our regression line.
  • But frequently we're dealing with situations where that assumption isn't justified.

Example

load("../../Wooldridge Material/Data Sets- R/401k.RData")
ex4.6 <- lm(prate ~ mrate + age + totemp, data = data)
ggplot(data,aes(x = prate)) + geom_histogram(binwidth = 5)

Example

(Figure slides: output from the code above, not reproduced here.)

So are we out of luck?

  • No! Asymptotically we're fine.
  • With a large enough sample, our OLS estimators are asymptotically normally distributed.
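
A closing sketch (my own simulation, with deliberately skewed errors): even when the errors are drawn from a centered exponential distribution rather than a normal one, the OLS slope estimates pile up in a roughly bell-shaped curve around the true value once \(n\) is reasonably large.

# Asymptotic normality sketch: skewed errors, approximately normal slope estimates
set.seed(1)
slopes <- replicate(2000, {
  x <- rnorm(500)
  y <- 1 + 2 * x + (rexp(500) - 1)        # errors: centered exponential, not normal
  coef(lm(y ~ x))[2]
})
hist(slopes, breaks = 40)                 # roughly bell-shaped around the true slope of 2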