Thanks guys for being here, and I hope that you’ve had a great quarter so far.
This week, we’re going to go over some practice problems for the final. The format will be: I give you a question, you work together to try to solve it.
Before we do that, I want to remind you to fill out your surveys. They help us get better at instruction and, for these newer-format courses, figure out what you guys like and don’t like about the class.
Ok. Onto the show. Let’s start with a couple of longer-form questions from last quarter’s final exam. Then, we can do some questions from past practice question sets.
i.) What is the causal effect of education on income for Ali?
5,000
ii.) Calculate the causal effect of education for Bob.
15,000
iii.) Is the treatment effect constant? Briefly explain.
No. Ali’s effect is 5,000 while Bob’s is 15,000, so the treatment effect varies across individuals.
iv.) Bob went to college; Ali did not. Estimate the effect of college using the difference estimator: the difference in means between the treated group and the untreated group.
Ali did not go to college, so no-college = 70,000
Bob did go to college, so college = 65,000
Thus, the treatment effect is estimated at -5,000
v.) Do we have selection bias?
Yes! Bob is likely in the college group because he stood to gain more from college than Ali did, and Ali selected out of college because she was going to get little benefit from it anyway (5,000).
This is a rather simple one, and is explained in the last problem. Primarily, it’s that we can never see all of the entries in the Rubin causal model matrix. That is, we can only ever observe one ‘state’ of the world for each person, and thus we need to choose the right causal approximation to do this well. To do well on a question like this, I’d recommend reading Ed’s notes on causality.
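To make the “one state of the world” point concrete, here’s a minimal Python sketch of the potential-outcomes matrix for Ali and Bob. The unobserved incomes (Ali with college, Bob without) are reconstructed from the effects above; with real data, we never get to see them.

```python
# Potential-outcomes ("Rubin") matrix for this problem. The unobserved
# incomes are reconstructed from the answers above; in practice we only
# ever observe one potential outcome per person.
potential = {
    #        (no college, college)
    "Ali": (70_000, 75_000),   # individual effect:  5,000
    "Bob": (50_000, 65_000),   # individual effect: 15,000
}

for person, (y0, y1) in potential.items():
    print(f"{person}: treatment effect = {y1 - y0:,}")

# What we actually observe: Bob went to college, Ali did not.
naive = potential["Bob"][1] - potential["Ali"][0]
print(f"difference estimator = {naive:,}")   # -5,000, thanks to selection
```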
Write down an ADL(1,1) model for the effect of income on births
What does ADL stand for?
Autoregressive Distributed Lag. So, autoregressive means we have a lag of the dependent variable in our equation, and distributed lag means we have lags of our explanatory variable as well. Let’s set this up:
\[Births_{t} = \beta_0 + \beta_1 Income_t + \alpha_1 Births_{t-1} + \beta_2 Income_{t-1} + u_t\]
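Here’s a minimal simulation sketch of this ADL(1,1) model in Python, with invented coefficients. (OLS recovers them here because the error is iid; contrast that with the autocorrelated-error question below.)

```python
# A simulation sketch of the ADL(1,1) model above. All coefficient values
# are invented for illustration.
import numpy as np

rng = np.random.default_rng(5)
T = 500
b0, b1, a1, b2 = 1.0, 0.4, 0.5, 0.2

income = rng.normal(size=T)
births = np.zeros(T)
for t in range(1, T):
    births[t] = (b0 + b1 * income[t] + a1 * births[t - 1]
                 + b2 * income[t - 1] + rng.normal())

# OLS: regress births_t on income_t, births_{t-1}, income_{t-1}
X = np.column_stack([np.ones(T - 1), income[1:], births[:-1], income[:-1]])
print(np.linalg.lstsq(X, births[1:], rcond=None)[0])  # ~ (1.0, 0.4, 0.5, 0.2)
```

Ok, now let’s do a few of the weirder T/F questions.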
Prediction focuses on estimating \(\beta\) while causal inference focuses on estimating \(\hat{y}\).
False. It’s the opposite: prediction focuses on estimating \(\hat{y}\), while causal inference focuses on estimating \(\beta\).
In the following dynamic time-series model, \(u_t\) is first-order autocorrelated, i.e.
\[Health_t = \beta_0 + \beta_1 Income_t + \beta_2 Health_{t-1} + u_t\] \[Health_{t-1} = \beta_0 + \beta_1 Income_{t-1} + \beta_2 Health_{t-2} + u_{t-1}\] \[u_t = \rho u_{t-1} + \varepsilon_t\]
where \(\varepsilon_t\) is white noise: iid with mean 0 and constant variance.
Explain why OLS will likely be biased for \(\beta_2\), even with no omitted variables.
Ok. Well, why might we detect bias in an estimator? What is an estimator equal to? Let’s think about our error term a bit here. Plugging the expressions for \(Health_{t-1}\) and \(u_t\) into the first equation, we get
\[Health_t = \beta_0 + \beta_1 Income_t + \beta_2(\beta_0 + \beta_1 Income_{t-1} + \beta_2 Health_{t-2} + u_{t-1}) + (\rho u_{t-1} + \varepsilon_t)\]
Let me ask you this: is \(\frac{Cov(Health_{t-1},Health_t)}{Var(Health_{t-1})} = \beta_2\)?
You can see that exogeneity is violated here: \(Health_{t-1}\) contains \(u_{t-1}\), and the composite error \(\rho u_{t-1} + \varepsilon_t\) also contains \(u_{t-1}\), so \(Cov(X,u) \neq 0\).
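Here’s a hedged Monte Carlo sketch of that logic, with made-up parameter values: the OLS coefficient on the lagged dependent variable drifts above the true \(\beta_2 = 0.6\) because \(Cov(Health_{t-1}, u_t) > 0\) when \(\rho > 0\).

```python
# Monte Carlo sketch: OLS on a dynamic model with AR(1) errors is biased
# for beta_2. All parameter values are invented for illustration.
import numpy as np

rng = np.random.default_rng(0)
T, reps = 500, 2000
beta0, beta1, beta2, rho = 1.0, 0.5, 0.6, 0.7

estimates = []
for _ in range(reps):
    income = rng.normal(size=T)
    eps = rng.normal(size=T)
    u = np.zeros(T)
    health = np.zeros(T)
    for t in range(1, T):
        u[t] = rho * u[t - 1] + eps[t]
        health[t] = beta0 + beta1 * income[t] + beta2 * health[t - 1] + u[t]
    # OLS of health_t on income_t and health_{t-1}
    X = np.column_stack([np.ones(T - 1), income[1:], health[:-1]])
    b = np.linalg.lstsq(X, health[1:], rcond=None)[0]
    estimates.append(b[2])

print(f"true beta_2 = {beta2}, mean OLS estimate = {np.mean(estimates):.3f}")
# The mean estimate lands noticeably above 0.6: Cov(health_{t-1}, u_t) > 0.
```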
If the covariance between \(u_t\) and \(u_{t-1}\) increases as t gets larger, then is \(u_t\) stationary?
No. Maybe we need to think about why.
If the covariance between two error terms is changing, what does that mean?
Well, it means that as t increases into the future, our error process is changing. The covariance between \(u_t\) and \(u_{t-k}\) then depends not only on the lag k (1 in our case) but also on t. That’s bad, and it breaks assumption 3 of our stationarity rules: autocovariances may depend only on the lag, not on t.
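For intuition, suppose (hypothetically) that \(u_t\) were itself a random walk. Then \(Cov(u_t, u_{t-1})\) grows linearly with t, which is exactly the dependence on t that stationarity rules out. A quick check in Python:

```python
# Hypothetical worst case: u_t is a random walk, so shocks accumulate
# and the lag-1 autocovariance depends on t.
import numpy as np

rng = np.random.default_rng(1)
reps, T = 20000, 200
u = np.cumsum(rng.normal(size=(reps, T)), axis=1)  # random-walk errors

for t in (50, 100, 150):
    # u has mean zero, so E[u_t * u_{t-1}] is the covariance we want.
    cov = np.mean(u[:, t] * u[:, t - 1])
    print(f"t = {t}: Cov(u_t, u_(t-1)) ~ {cov:.1f}")  # grows linearly with t
```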
How are random walks and autocorrelated variables similar? How are they different?
Well, let’s recall what exactly a random walk is. At its most basic, a random walk is equal to:
\[X_t = X_{t-1} + u_t\]
A random walk, therefore, is a special case of an autocorrelated variable. It’s important to note that there are two real model restrictions:
1.) No ‘intercept’, or really that \(\beta_0 = 0\)
2.) And our ‘AR’ part is equal to 1. That is, our coefficient on the autocorrelated portion is equal to one.
It’s different because a random walk is non-stationary, and therefore you can’t trust OLS when estimating with it. And, unlike a merely autocorrelated (but stationary) variable, you need to do a special transformation before you can work with a random walk. Do you remember what that is?
That’s right: first differences. Taking \(\Delta X_t = X_t - X_{t-1} = u_t\) ‘deletes’ your \(X_{t-1}\), and so long as \(u_t\) is stationary you should be ok.
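A small demonstration on a simulated walk:

```python
# First-differencing a simulated random walk recovers a stationary series
# (here, exactly the white-noise shocks).
import numpy as np

rng = np.random.default_rng(2)
u = rng.normal(size=1000)   # stationary white-noise shocks
x = np.cumsum(u)            # random walk: x_t = x_{t-1} + u_t
dx = np.diff(x)             # first differences: x_t - x_{t-1}

print(np.allclose(dx, u[1:]))   # True: differencing recovers the shocks
print(x.var(), dx.var())        # the walk's variance dwarfs the differences'
```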
Why is physical fitness a bad instrument for military service, but a military draft likely a valid instrument, when trying to estimate the effect of military service on income?
Well, recall our requirements for instruments. We need two things from them: they need to be exogenous and relevant. I buy relevance for all the variables involved here, but what about exogeneity?
If someone is physically fit, they likely have a lot of grit, which may correlate with their lifetime income in a positive way.
If someone is drafted, they simply were selected by lottery. That’s not likely to be due to some underlying factor related to income, unless you believe in luck or any of that nonsense.
The probability limit of the instrumental variables estimator is
\[plim(\widehat{\beta}_1^{IV}) = \beta_1 + \frac{Cov(z,u)}{Cov(z,x)}\]
where z is our instrument, u is our disturbance, and x is our endogenous variable.
i.) How do the two requirements of a valid instrument enter into this equation?
ii.) (T/F) If we have a valid instrument, then \(\widehat{\beta}_1^{IV}\) is a consistent estimator of \(\beta_1\).
Let’s start with part i.
Ok, so what exactly are our requirements for IV? We know we need:
Exogeneity
Relevance
But what, really, do those things mean for us? Well, let’s start with exogeneity of the instrument. If our instrument is exogenous, that means it won’t be correlated with the error term in our DGP. What is the mathematical expression for a correlation coefficient?
\[\rho_{z,u} = \frac{cov(z,u)}{\sigma_u \sigma_z}\]
Great. So how do we get our correlation coefficient REALLY REALLY small? We can try to get the covariance of z and u to be small, or make the population standard deviations really huge. Realistically though, unless those standard deviations are going to infinity, we won’t get that second case.
If you look above, our plim will equal the true \(\beta_1\) if \(Cov(z,u) = 0\). Thus, we need our covariance term to be zero.
What about relevance? Well, what does relevance mean? It means that our z variable is correlated with our X variable. Or, that our z-variable ‘explains’ X sufficiently.
Let’s assume we have just a teensy bit of endogeneity. That is, \(Cov(z,u)\) is a little bigger than 0. As we send \(Cov(z,x)\) to zero,
\[\lim_{Cov(z,x) \rightarrow 0}\frac{Cov(z,u)}{Cov(z,x)} = \infty,\ \text{if}\ Cov(z,u) \neq 0\] So if \(Cov(z,x)\) isn’t some sizable number, then even a tiny bit of endogeneity can badly bias our estimates.
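Here’s a hedged simulation sketch of that blow-up, with invented numbers: even a small exogeneity violation (gamma = 0.05) wrecks the IV estimate once the instrument gets weak.

```python
# Weak-instrument sketch: a tiny Cov(z,u) gets amplified when Cov(z,x)
# is small. All parameter values are invented for illustration.
import numpy as np

rng = np.random.default_rng(3)
n, beta1 = 100_000, 2.0

def iv_estimate(pi, gamma=0.05):
    """Just-identified IV slope, Cov(z, y) / Cov(z, x).
    pi = instrument strength; gamma = a small exogeneity violation."""
    z = rng.normal(size=n)
    u = rng.normal(size=n) + gamma * z   # z leaks slightly into the error
    x = pi * z + rng.normal(size=n)      # relevance: Cov(z, x) = pi
    y = beta1 * x + u
    return np.cov(z, y)[0, 1] / np.cov(z, x)[0, 1]

for pi in (1.0, 0.1, 0.01):
    print(f"pi = {pi}: IV estimate = {iv_estimate(pi):.2f}")
# The estimate drifts from ~2.05 toward wild values as pi shrinks,
# matching plim = beta1 + gamma / pi.
```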
Now, part ii.
Ok. You guys should be able to do this part. But where you might fall down is the ‘consistent estimator’ portion here. What exactly is a consistent estimator?
From Wikipedia (but really, you should know this): a consistent estimator \(T_n\) of the parameter \(\theta\) satisfies:
\[plim_{n \rightarrow \infty}(T_n) = \theta\]
Once you know that, if we did our IV correctly, we can figure this out: a valid instrument means \(Cov(z,u) = 0\), so the plim above collapses to \(\beta_1\), and \(\widehat{\beta}_1^{IV}\) is consistent. True.
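And a last sketch of consistency itself: with a genuinely valid instrument (\(Cov(z,u) = 0\)), the same simple IV formula homes in on the true \(\beta_1 = 2\) as n grows. (The data-generating process below is invented for illustration.)

```python
# Consistency sketch: the IV estimate converges to beta_1 as n grows,
# provided the instrument is valid. DGP values are invented.
import numpy as np

rng = np.random.default_rng(4)
beta1 = 2.0

for n in (100, 10_000, 1_000_000):
    z = rng.normal(size=n)
    u = rng.normal(size=n)                        # valid: Cov(z, u) = 0
    x = 0.8 * z + 0.5 * u + rng.normal(size=n)    # x endogenous (loads on u)
    y = beta1 * x + u
    b_iv = np.cov(z, y)[0, 1] / np.cov(z, x)[0, 1]
    print(f"n = {n:>9,}: IV estimate = {b_iv:.3f}")  # converges to 2.0
```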
Good luck on the final exam!!