October 20, 2016

Alternative to Assumption 4

Zero conditional mean assumption (MLR.4): \(E(u|x_1,x_2,...,x_k) = 0\)

Alternative assumption (MLR.4'): \(E(u) = 0\) and \(cov(x_j,u) = 0\) for \(j=1,2,...,k\).

That is, zero mean and zero correlation.

The first implies the second: zero conditional mean is the stronger assumption (and the harder one to check), while MLR.4' is the weaker, more general requirement.

Zero mean/correlation

The zero conditional mean assumption means that no function \(f(x_1,x_2,...,x_k)\) can be correlated with \(u\).

Zero mean/correlation only requires that each \(x_j\) itself is uncorrelated with \(u\), even if some convoluted combination of the \(x\) variables is correlated with \(u\).
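
To see the distinction concretely, here is a minimal simulation sketch (my own illustration, with assumed numbers): if \(x\) is symmetric around zero and \(u\) depends on \(x\) only through \(x^2\), then \(cov(x,u)=0\) and \(E(u)=0\) (MLR.4' holds) even though \(E(u|x) \neq 0\) (MLR.4 fails).

# Sketch: zero correlation between x and u, but E(u|x) != 0
set.seed(123)
n <- 1e6
x <- rnorm(n)                 # symmetric around zero
u <- x^2 - 1 + rnorm(n)       # E(u) = 0, but E(u|x) = x^2 - 1
cov(x, u)                     # ~0: MLR.4' holds
mean(u)                       # ~0
tapply(u, cut(x, 5), mean)    # conditional means vary with x: MLR.4 fails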

Zero mean/correlation

  • This weaker assumption also frees us up to consider cases where our linear model isn't the true population regression function.

  • The zero conditional mean assumption, by contrast, presumes we've specified a model in which all the appropriate non-linearities are already accounted for (e.g., with quadratic terms).

Zero mean/correlation

  • In practice we don't want a true but messy model,
    • we want a simplified model that reflects the truth in a way we can digest (interpretability!).
  • Our gentler assumption frees us up to estimate the best linear approximation of more complex processes; a simulation sketch follows below. (But beware supramarginal changes!)

  • Importantly, this assumption is all that's required for OLS to be consistent.
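
As a rough sketch of that "best linear approximation" idea (illustrative code with an assumed nonlinear truth, not from the text): the OLS slope settles down to a stable value as \(n\) grows even though the true conditional mean is nonlinear.

# Sketch: OLS converges to the best linear approximation of a nonlinear CEF
set.seed(456)
slope_at <- function(n) {
  x <- runif(n, 0, 2)
  y <- log(1 + x) + rnorm(n, sd = 0.1)   # nonlinear "truth"
  unname(coef(lm(y ~ x))["x"])
}
sapply(c(100, 1e4, 1e6), slope_at)       # slope estimates settle down as n grows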

Omitted variable bias (asymptotically)

If \(cov(x_j,u) \neq 0\) for any \(j=1,2,...,k\), then OLS is biased and inconsistent.

If we could observe \(u\), we could compute the inconsistency (asymptotic bias) as

\[ plim \hat{\beta_1} - \beta_1 = \frac{cov(x_1,u)}{var(x_1)} \]
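
As a hedged check of this formula (a simulation with assumed numbers, since in practice \(u\) is unobserved): generate \(x_1\) and \(u\) with a known covariance and compare the OLS slope's drift to \(cov(x_1,u)/var(x_1)\).

# Sketch: inconsistency equals cov(x1, u) / var(x1) when u can be simulated
set.seed(789)
n  <- 1e6
x1 <- rnorm(n)
u  <- 0.5 * x1 + rnorm(n)       # cov(x1, u) = 0.5, var(x1) = 1
y  <- 2 + 3 * x1 + u            # true beta1 = 3
coef(lm(y ~ x1))["x1"] - 3      # ~0.5 = cov(x1, u) / var(x1)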

Omitted variable bias with math

  • Let the true model be \(y = \beta_0 + \beta_1 x_1 + \beta_2 x_2 + v\), and let it follow our standard assumptions.

  • If we instead estimate the short regression \(\tilde y = \tilde\beta_0 + \tilde\beta_1 x_1\), omitting \(x_2\), then we're going to over- or under-state the effect of \(x_1\) (depending on \(cov(x_1,x_2)\)). \[ plim \tilde\beta_1 = \beta_1 + \beta_2 \frac{cov(x_1,x_2)}{var(x_1)}\]

  • This is because the error of the short regression is \(u = \beta_2 x_2 + v\), and \(v\) is uncorrelated with both \(x\) variables.
    • If \(x_1\) and \(x_2\) are correlated, then \(x_1\) is correlated with \(u\), so \(\tilde\beta_1\) is biased and inconsistent (a simulation sketch follows this list).
    • If they're uncorrelated, then \(\tilde\beta_1\) is at least consistent.
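
Here is a minimal simulation sketch (assumed coefficients, my own illustration) of the formula above: omitting \(x_2\) shifts the slope on \(x_1\) by \(\beta_2 \, cov(x_1,x_2)/var(x_1)\).

# Sketch: plim(beta1_tilde) = beta1 + beta2 * cov(x1, x2) / var(x1)
set.seed(101)
n  <- 1e6
x1 <- rnorm(n)
x2 <- 0.6 * x1 + rnorm(n)            # cov(x1, x2) = 0.6, var(x1) = 1
v  <- rnorm(n)
y  <- 1 + 2 * x1 + 5 * x2 + v        # true model: beta1 = 2, beta2 = 5
coef(lm(y ~ x1))["x1"]               # ~ 2 + 5 * 0.6 = 5 (omitted variable bias)
coef(lm(y ~ x1 + x2))["x1"]          # ~ 2 once x2 is included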

Returning to the normality assumption

  • We require Assumption 6 (errors are normally distributed and independent of our \(x\) variables) to do hypothesis testing.
    • We can't see \(u\) but we can see \(y\), so we'll treat this assumption as \(y\) being normally distributed around our regression line.
  • But frequently we're dealing with situations where that assumption isn't justified.

Example

load("../../Wooldridge Material/Data Sets- R/401k.RData")
ex4.6 <- lm(prate ~ mrate + age + totemp, data = data)
ggplot(data,aes(x = prate)) + geom_histogram(binwidth = 5)

OLS Asymptotic variance

  • Under Assumptions 1-5, \(\hat{\beta_j}\) is asymptotically normally distributed…

  • Even with non-normally distributed errors, with a large enough sample size, we can use methods based on normality.

  • We just need to assume finite variance.

OLS Asymptotic variance

\[ (\hat{\beta_j}-\beta_j)/se(\hat{\beta_j}) \stackrel{a}{\sim} Normal(0,1) \]

  • And since we're dealing with asymptotics, and the \(t\) distribution converges to the normal, we can use our usual t-tests (and F-tests).
  • The question (still) is: how many observations are enough for this normal approximation to be legitimate? (A simulation sketch follows this list.)
    • If the data aren't too weird, then \(n=30\) is considered an okay rule of thumb.
    • I would consider any regression with so few observations to be more exploratory/indicative of the truth than definitive.
  • The homoskedasticity assumption becomes especially important.
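
A small simulation sketch (assumed data-generating process, not from the slides) of why the normal approximation works: with skewed, non-normal errors, the standardized slope estimate still behaves roughly like a standard normal in moderately sized samples.

# Sketch: standardized OLS slope is approximately N(0,1) with skewed errors
set.seed(202)
t_stat <- replicate(2000, {
  n <- 200
  x <- rnorm(n)
  y <- 1 + 2 * x + (rchisq(n, df = 2) - 2)   # skewed, non-normal errors with mean zero
  s <- coef(summary(lm(y ~ x)))
  (s["x", "Estimate"] - 2) / s["x", "Std. Error"]
})
c(mean = mean(t_stat), sd = sd(t_stat))       # roughly 0 and 1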

OLS Asymptotic efficiency

  • We like OLS because it's BLUE.
  • Asymptotically it is still efficient: under the Gauss-Markov assumptions, \(\hat{\beta_j}\) has the smallest asymptotic variance among a broad class of consistent estimators.