October 20, 2016

Alternative to Assumption 4

Zero conditional mean assumption (MLR.4): \(E(u|x_1,x_2,...,x_k) = 0\)

Alternative assumption (MLR.4'): \(E(u) = 0\) and \(cov(x_j,u) = 0\) for \(j=1,2,...,k\).

That is, zero mean and zero correlation.

The first implies the second: zero conditional mean is the stronger assumption (and the harder one to check), while MLR.4' is the weaker, more general requirement.

Zero mean/correlation

The zero conditional mean assumption means that no function \(f(x_1,x_2,...,x_k)\) can be correlated with \(u\).

Zero mean/correlation only requires that each \(x_j\) itself is uncorrelated with \(u\), even if some convoluted combination of the \(x\) variables is correlated with \(u\).
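
To see the distinction concretely, here is a minimal simulation sketch (my own illustration, with assumed numbers): if \(x\) is symmetric around zero and \(u\) depends on \(x\) only through \(x^2\), then \(cov(x,u)=0\) and \(E(u)=0\) (MLR.4' holds) even though \(E(u|x) \neq 0\) (MLR.4 fails).

# Sketch: zero correlation between x and u, but E(u|x) != 0
set.seed(123)
n <- 1e6
x <- rnorm(n)                 # symmetric around zero
u <- x^2 - 1 + rnorm(n)       # E(u) = 0, but E(u|x) = x^2 - 1
cov(x, u)                     # ~0: MLR.4' holds
mean(u)                       # ~0
tapply(u, cut(x, 5), mean)    # conditional means vary with x: MLR.4 fails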

Zero mean/correlation

  • This weaker assumption also frees us up to consider cases where our linear model isn't the true population regression function.

  • The zero conditional mean assumption, by contrast, presumes we've specified a model in which all the appropriate non-linearities are already accounted for (e.g., with quadratic terms).

Zero mean/correlation

  • In practice we don't want a true but messy model,
    • we want a simplified model that reflects the truth in a way we can digest (interpretability!).
  • Our gentler assumption frees us up to estimate the best linear approximation of more complex processes; a simulation sketch follows below. (But beware supramarginal changes!)

  • Importantly, this assumption is all that's required for OLS to be consistent.
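
As a rough sketch of that "best linear approximation" idea (illustrative code with an assumed nonlinear truth, not from the text): the OLS slope settles down to a stable value as \(n\) grows even though the true conditional mean is nonlinear.

# Sketch: OLS converges to the best linear approximation of a nonlinear CEF
set.seed(456)
slope_at <- function(n) {
  x <- runif(n, 0, 2)
  y <- log(1 + x) + rnorm(n, sd = 0.1)   # nonlinear "truth"
  unname(coef(lm(y ~ x))["x"])
}
sapply(c(100, 1e4, 1e6), slope_at)       # slope estimates settle down as n grows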

Omitted variable bias (asymptotically)

If \(cov(x_j,u) \neq 0\) for any \(j=1,2,...,k\), then OLS is biased and inconsistent.

If we could observe \(u\), we could compute the inconsistency (asymptotic bias) as

\[ plim \hat{\beta_1} - \beta_1 = \frac{cov(x_1,u)}{var(x_1)} \]
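
As a hedged check of this formula (a simulation with assumed numbers, since in practice \(u\) is unobserved): generate \(x_1\) and \(u\) with a known covariance and compare the OLS slope's drift to \(cov(x_1,u)/var(x_1)\).

# Sketch: inconsistency equals cov(x1, u) / var(x1) when u can be simulated
set.seed(789)
n  <- 1e6
x1 <- rnorm(n)
u  <- 0.5 * x1 + rnorm(n)       # cov(x1, u) = 0.5, var(x1) = 1
y  <- 2 + 3 * x1 + u            # true beta1 = 3
coef(lm(y ~ x1))["x1"] - 3      # ~0.5 = cov(x1, u) / var(x1)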

Omitted variable bias with math

  • Let the true model be \(y = \beta_0 + \beta_1 x_1 + \beta_2 x_2 + v\), and let it follow our standard assumptions.

  • If we instead estimate the short regression \(\tilde y = \tilde\beta_0 + \tilde\beta_1 x_1\), omitting \(x_2\), then we're going to over- or under-state the effect of \(x_1\) (depending on \(cov(x_1,x_2)\)). \[ plim \tilde\beta_1 = \beta_1 + \beta_2 \frac{cov(x_1,x_2)}{var(x_1)}\]

  • This is because the error of the short regression is \(u = \beta_2 x_2 + v\), and \(v\) is uncorrelated with both \(x\) variables.
    • If \(x_1\) and \(x_2\) are correlated, then \(x_1\) is correlated with \(u\), so \(\tilde\beta_1\) is biased and inconsistent (a simulation sketch follows this list).
    • If they're uncorrelated, then \(\tilde\beta_1\) is at least consistent.
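
Here is a minimal simulation sketch (assumed coefficients, my own illustration) of the formula above: omitting \(x_2\) shifts the slope on \(x_1\) by \(\beta_2 \, cov(x_1,x_2)/var(x_1)\).

# Sketch: plim(beta1_tilde) = beta1 + beta2 * cov(x1, x2) / var(x1)
set.seed(101)
n  <- 1e6
x1 <- rnorm(n)
x2 <- 0.6 * x1 + rnorm(n)            # cov(x1, x2) = 0.6, var(x1) = 1
v  <- rnorm(n)
y  <- 1 + 2 * x1 + 5 * x2 + v        # true model: beta1 = 2, beta2 = 5
coef(lm(y ~ x1))["x1"]               # ~ 2 + 5 * 0.6 = 5 (omitted variable bias)
coef(lm(y ~ x1 + x2))["x1"]          # ~ 2 once x2 is included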

Returning to the normality assumption

  • We require Assumption 6 (errors are normally distributed and independent of our \(x\) variables) to do hypothesis testing.
    • We can't see \(u\) but we can see \(y\), so we'll treat this assumption as \(y\) being normally distributed around our regression line.
  • But frequently we're dealing with situations where that assumption isn't justified.

Example

load("../../Wooldridge Material/Data Sets- R/401k.RData")
ex4.6 <- lm(prate ~ mrate + age + totemp, data = data)
ggplot(data,aes(x = prate)) + geom_histogram(binwidth = 5)

OLS Asymptotic variance

  • Under Assumptions 1-5, \(\hat{\beta_j}\) is asymptotically normally distributed…

  • Even with non-normally distributed errors, with a large enough sample size, we can use methods based on normality.

  • We just need to assume finite variance.

OLS Asymptotic variance

\[ (\hat{\beta_j}-\beta_j)/se(\hat{\beta_j}) \stackrel{a}{\sim} Normal(0,1) \]

  • And since we're dealing with asymptotics, and the \(t\) distribution converges to the normal, we can use our usual t-tests (and F-tests).
  • The question (still) is: how many observations are enough for this normal approximation to be legitimate? (A simulation sketch follows this list.)
    • If the data aren't too weird, then \(n=30\) is considered an okay rule of thumb.
    • I would consider any regression with so few observations to be more exploratory/indicative of the truth than definitive.
  • The homoskedasticity assumption becomes especially important.
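
A small simulation sketch (assumed data-generating process, not from the slides) of why the normal approximation works: with skewed, non-normal errors, the standardized slope estimate still behaves roughly like a standard normal in moderately sized samples.

# Sketch: standardized OLS slope is approximately N(0,1) with skewed errors
set.seed(202)
t_stat <- replicate(2000, {
  n <- 200
  x <- rnorm(n)
  y <- 1 + 2 * x + (rchisq(n, df = 2) - 2)   # skewed, non-normal errors with mean zero
  s <- coef(summary(lm(y ~ x)))
  (s["x", "Estimate"] - 2) / s["x", "Std. Error"]
})
c(mean = mean(t_stat), sd = sd(t_stat))       # roughly 0 and 1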

OLS Asymptotic efficiency

  • We like OLS because it's BLUE.
  • Asymptotically it is still efficient: under the Gauss-Markov assumptions, \(\hat{\beta_j}\) has the smallest asymptotic variance among a broad class of consistent estimators.