October 17, 2016
The biggest and strongest numbers tell the other numbers what to do.
As \(n \rightarrow \infty\)
\[ P(|\hat{\theta}_n - \theta| < \varepsilon) \rightarrow 1\]
for any \(\varepsilon > 0\), no matter how small.
In English: as the sample size grows, the probability that our estimate \(\hat{\theta}_n\) falls within any fixed distance \(\varepsilon\) of the true value \(\theta\) goes to 1.
This is not the same thing as unbiasedness. OLS is both unbiased and consistent (under assumptions 1-4): \(\hat{\beta}\) is an unbiased estimator of \(\beta\) at any sample size, and as \(n\) approaches infinity it also converges in probability to that value.
What this means: We can get arbitrarily close to the true value of \(\beta\) by increasing our sample size.
For the simple regression case (you don't need to remember this, but it's helpful to see it… you can always look it up when you forget):
\[ \hat{\beta_1} = \frac{\sum_{i=1}^n (x_{i1} - \bar{x}_1)\, y_i }{\sum_{i=1}^n (x_{i1} - \bar{x}_1)^2} \]
\[ \hat{\beta_1} = \beta_1 + \frac{\sum_{i=1}^n (x_{i1} - \bar{x}_1)\, u_i }{\sum_{i=1}^n (x_{i1} - \bar{x}_1)^2} = \beta_1 + \frac{\frac{1}{n}\sum_{i=1}^n (x_{i1} - \bar{x}_1)\, u_i }{\frac{1}{n}\sum_{i=1}^n (x_{i1} - \bar{x}_1)^2} \]
This second equation is derived by substituting \(\beta_0 + \beta_1 x_{i1} + u_i\) for \(y_i\) in the first equation and doing algebra.
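A quick sketch of that algebra: substitute \(y_i = \beta_0 + \beta_1 x_{i1} + u_i\) into the numerator and use the fact that deviations from a mean sum to zero,
\[ \sum_{i=1}^n (x_{i1} - \bar{x}_1)\, y_i = \beta_0 \sum_{i=1}^n (x_{i1} - \bar{x}_1) + \beta_1 \sum_{i=1}^n (x_{i1} - \bar{x}_1)\, x_{i1} + \sum_{i=1}^n (x_{i1} - \bar{x}_1)\, u_i . \]
The first term is zero, and \(\sum (x_{i1} - \bar{x}_1) x_{i1} = \sum (x_{i1} - \bar{x}_1)^2\), so dividing by the denominator leaves \(\beta_1\) plus the error term shown above.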
Now that we've got \(n\) in the equation, the numerator and denominator are sample averages, so we can use the Law of Large Numbers and make a statement in terms of probability limits. (Remember limits?)
\[ plim\ \hat{\beta_1} = \beta_1 + \frac{cov(x_1,u)}{var(x_1)} \] \[ plim\ \hat{\beta_1} = \beta_1 + \frac{0}{var(x_1)} = \beta_1 \]
because of assumption 4 (zero mean and zero correlation with \(u\); more on this below). OLS is consistent!
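To see consistency in action, here's a small simulation sketch (the model, parameter values, and variable names are invented for illustration, not from the notes): draw ever-larger samples from a model satisfying assumptions 1-4 and watch \(\hat{\beta}_1\) settle in on the true slope of 2.

# Simulation sketch: the OLS slope concentrates around the true value as n grows.
# Invented model: y = 1 + 2*x + u, with u drawn independently of x.
set.seed(42)
true_beta1 <- 2
sample_sizes <- c(10, 100, 1000, 10000, 100000)

beta1_hat <- sapply(sample_sizes, function(n) {
  x <- rnorm(n)
  u <- rnorm(n)                    # E(u) = 0 and cov(x, u) = 0 by construction
  y <- 1 + true_beta1 * x + u
  coef(lm(y ~ x))["x"]             # OLS slope estimate at this sample size
})

round(beta1_hat, 3)                # estimates move closer to 2 as n increases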
Zero conditional mean assumption: \(E(u|x_1,x_2,...,x_k) = 0\)
New assumption: \(E(u) = 0\) and \(cov(x_j,u) = 0\) for \(j=1,2,...,k\).
That is, zero mean and zero correlation
The first implies the second; the zero conditional mean assumption is the stronger one (and harder to check).
The zero conditional mean assumption means we can't have any function \(f(x_1,x_2,...,x_k)\) that is correlated with \(u\).
Zero mean / zero correlation only requires that each \(x_j\) itself be uncorrelated with \(u\); it can still hold even if some convoluted combination of the \(x\) variables is correlated with \(u\).
This weaker assumption also frees us up to consider cases where our linear model isn't the true population regression function.
The zero conditional mean assumption presumes we've built a model with all appropriate non-linearities accounted for, using things like quadratic terms.
Our gentler assumption frees us up to estimate the best linear approximation of more complex processes. (But beware supramarginal changes!)
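As a concrete (invented) illustration of the best-linear-approximation idea: if the true relationship is quadratic, regressing \(y\) on \(x\) alone still consistently estimates the slope of the linear projection of \(y\) on \(x\), which is a useful local summary but a poor guide far from the center of the data, hence the warning about supramarginal changes.

# Best-linear-approximation sketch (invented example): the true relationship is quadratic.
set.seed(1)
n <- 100000
x <- runif(n, 0, 2)                # x uniform on [0, 2]
y <- x^2 + rnorm(n)                # true E(y|x) = x^2, so the model y ~ x is "wrong"

coef(lm(y ~ x))                    # approx intercept -2/3 and slope 2:
                                   # the linear projection of x^2 on x for this x

# The fitted line tracks small changes near the middle of the data but misses
# badly for big ("supramarginal") changes, e.g. predictions near x = 0 or x = 2.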
If \(cov(x_j,u) \neq 0\) for any \(j=1,2,...,k\) then OLS is biased and inconsistent.
If we could observe \(u\), we could compute this asymptotic bias (here for the simple regression case) as
\[ plim\ \hat{\beta_1} - \beta_1 = \frac{cov(x_1,u)}{var(x_1)}. \]
Let the true model be \(y = \beta_0 + \beta_1 x_1 + \beta_2 x_2 + v\), and let it follow our standard assumptions.
If we estimate \(\tilde y = \tilde\beta_0 + \tilde\beta_1 x_1 + u\) then we're going to over- or under-state the effect of \(x_1\), depending on the signs of \(\beta_2\) and \(cov(x_1,x_2)\): \[ plim\ \tilde\beta_1 = \beta_1 + \beta_2 \frac{cov(x_1,x_2)}{var(x_1)}\]
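Here is a simulation sketch of that formula (the data-generating process and numbers are invented for illustration): with \(\beta_2 = 3\) and positively correlated regressors, the short regression's slope settles near \(\beta_1 + \beta_2\, cov(x_1,x_2)/var(x_1)\) rather than near \(\beta_1\), no matter how large \(n\) gets.

# Omitted-variable inconsistency sketch (invented example).
# True model: y = 1 + 2*x1 + 3*x2 + v, with x2 correlated with x1.
set.seed(7)
n  <- 100000
x1 <- rnorm(n)
x2 <- 0.5 * x1 + rnorm(n)          # cov(x1, x2) = 0.5, var(x1) = 1
v  <- rnorm(n)
y  <- 1 + 2 * x1 + 3 * x2 + v

coef(lm(y ~ x1))["x1"]             # short regression: approx 2 + 3 * 0.5 / 1 = 3.5
coef(lm(y ~ x1 + x2))["x1"]        # long regression: approx 2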
load("../../Wooldridge Material/Data Sets- R/401k.RData") ex4.6 <- lm(prate ~ mrate + age + totemp, data = data) ggplot(data,aes(x = prate)) + geom_histogram(binwidth = 5)