Omitted Variable Bias in Short, Medium and Long Regressions

Author

Dor Leventer

Published

October 18, 2022

This note discusses omitted variable bias within the linear regression framework, in the slightly richer context of three covariates rather than the two covariates usually discussed in econometrics classes. The idea is taken from chapter 2 of Hansen (2021), Econometrics. All errors are my own.

Setup

Denote by X_1, X_2, and X_3 three random variables, and by Y an outcome variable. Say we are interested in the mean effect of X_1 on Y. We consider three regression equations, adding one covariate at a time.

\begin{align} Y & = \alpha_S + X_{1}\beta^S_{1}+u_{S}\\ Y & = \alpha_M + X_{1}\beta^M_{1}+X_{2}\beta^M_{2}+u_{M}\\ Y & = \alpha_L + X_{1}\beta^L_{1}+X_{2}\beta^L_{2}+X_{3}\beta^L_{3}+u_{L} \end{align} which we will term, from top to bottom, the Short (S), Medium (M), and Long (L) regressions. For our purposes, we treat the Long equation as the population regression; that is, we are interested in learning \beta^L_1. The question of this note is which coefficient we prefer to use in its place, in terms of bias, when X_3 cannot be observed: \beta^S_{1} or \beta^M_{1}.

For simplicity, assume all the linear projection assumptions hold, e.g. mean independence of errors and so on.
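To make the setup concrete, here is a minimal simulation sketch in Python; the covariate relationships, coefficient values, and sample size are invented purely for illustration. We draw data from the Long equation and fit the Short, Medium, and Long regressions by least squares.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 1_000_000

# Invented covariate structure (not from the note)
x1 = rng.normal(size=n)
x2 = 0.5 * x1 + rng.normal(size=n)
x3 = 0.3 * x1 + 0.4 * x2 + rng.normal(size=n)

# Long (population) equation with made-up coefficients
b1, b2, b3 = 1.0, 2.0, -1.5
y = 0.7 + b1 * x1 + b2 * x2 + b3 * x3 + rng.normal(size=n)

def ols(target, *regressors):
    """Least-squares slopes of target on the regressors (with an intercept)."""
    X = np.column_stack([np.ones_like(target), *regressors])
    return np.linalg.lstsq(X, target, rcond=None)[0][1:]

beta_S = ols(y, x1)          # Short regression
beta_M = ols(y, x1, x2)      # Medium regression
beta_L = ols(y, x1, x2, x3)  # Long regression

# The three estimates of the coefficient on X1 generally differ;
# the rest of the note quantifies these gaps.
print(beta_S[0], beta_M[0], beta_L[0])
```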

Bias in the Short Equation

We begin by formulating the bias in the Short equation. First, we calculate \beta^S_1 using the best linear predictor, and then substitute in Y from the Long equation.

\begin{align*} \beta^S_{1} & =V\left(X_{1}\right)^{-1}\times Cov\left(X_{1},Y\right)\\ & =V\left(X_{1}\right)^{-1}\times Cov\left(X_{1},X_{1}\beta^L_{1}+X_{2}\beta^L_{2}+X_{3}\beta^L_{3}\right)\\ & =\beta^L_{1}+V\left(X_{1}\right)^{-1}Cov\left(X_{1},X_{2}\right)\beta^L_{2}+V\left(X_{1}\right)^{-1}Cov\left(X_{1},X_{3}\right)\beta^L_{3} \end{align*}

In the second line above, the intercept \alpha_L and the error u_L drop out of the covariance, since the former is a constant and the latter is uncorrelated with X_1. In words, the regression coefficient from the Short equation, \beta^S_1, is equal to \beta^L_1, the coefficient we are interested in, plus a bias of magnitude

Bias\left(\beta_{1}^{S}\right) = V\left(X_{1}\right)^{-1}Cov\left(X_{1},X_{2}\right)\beta_{2}^{L}+V\left(X_{1}\right)^{-1}Cov\left(X_{1},X_{3}\right)\beta_{3}^{L}

Each element of Bias(\beta^S_1) has a well-defined interpretation: \beta^L_2 and \beta^L_3 are coefficients in the population regression, V\left(X_{1}\right)^{-1}Cov\left(X_{1},X_{2}\right) is the coefficient from the linear projection of X_2 on X_1, and V\left(X_{1}\right)^{-1}Cov\left(X_{1},X_{3}\right) is the coefficient from the linear projection of X_3 on X_1.

So, to simplify, we write \begin{align*} X_{2}=\gamma_{1}^{X_{2}:X_{1}}X_{1}+u & \rightarrow\gamma_{1}^{X_{2}:X_{1}}=\frac{Cov\left(X_{1},X_{2}\right)}{V\left(X_{1}\right)}\\ X_{3}=\gamma_{1}^{X_{3}:X_{1}}X_{1}+u & \rightarrow\gamma_{1}^{X_{3}:X_{1}}=\frac{Cov\left(X_{1},X_{3}\right)}{V\left(X_{1}\right)}\\ Bias\left(\beta_{1}^{S}\right) & =\gamma_{1}^{X_{2}:X_{1}}\beta_{2}^{L}+\gamma_{1}^{X_{3}:X_{1}}\beta_{3}^{L} \end{align*}
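As a sanity check on this expression, here is a short simulation sketch (the data-generating process and coefficient values are made up): the sample analogue of \beta^S_1 should equal \beta^L_1 plus the two bias terms, up to simulation noise.

```python
import numpy as np

rng = np.random.default_rng(1)
n = 1_000_000

# Made-up data-generating process consistent with the Long equation
x1 = rng.normal(size=n)
x2 = 0.5 * x1 + rng.normal(size=n)
x3 = -0.8 * x1 + 0.4 * x2 + rng.normal(size=n)
b1, b2, b3 = 1.0, 2.0, -1.5
y = b1 * x1 + b2 * x2 + b3 * x3 + rng.normal(size=n)

C = np.cov(np.vstack([x1, x2, x3, y]))  # sample covariance matrix
gamma_21 = C[0, 1] / C[0, 0]            # projection coefficient of X2 on X1
gamma_31 = C[0, 2] / C[0, 0]            # projection coefficient of X3 on X1
beta_S1 = C[0, 3] / C[0, 0]             # Short-regression slope

# Should agree up to simulation noise: beta_1^S = beta_1^L + Bias(beta_1^S)
print(beta_S1, b1 + gamma_21 * b2 + gamma_31 * b3)
```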

Bias in the Medium Equation

We continue with the bias in the Medium equation. Writing out the mean squared error for the Medium regression, taking derivatives, and setting them to zero, one can show that

\beta^M_{1}=\frac{Cov\left(X_{1},Y\right)V\left(X_{2}\right)-Cov\left(X_{2},Y\right)Cov\left(X_{1},X_{2}\right)}{V\left(X_{1}\right)V\left(X_{2}\right)-Cov\left(X_{1},X_{2}\right)^{2}}
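As a quick check of this closed-form expression, the sketch below (with made-up data) compares it to the coefficient on X_1 from an ordinary least-squares fit of Y on X_1 and X_2; the two should match up to floating-point error.

```python
import numpy as np

rng = np.random.default_rng(2)
n = 100_000

# Illustrative data with arbitrary coefficients (not from the note)
x1 = rng.normal(size=n)
x2 = 0.5 * x1 + rng.normal(size=n)
y = 1.0 * x1 + 2.0 * x2 + rng.normal(size=n)

# Closed-form expression for the coefficient on X1 in the Medium regression
C = np.cov(np.vstack([x1, x2, y]))
beta_M1_formula = (C[0, 2] * C[1, 1] - C[1, 2] * C[0, 1]) / (
    C[0, 0] * C[1, 1] - C[0, 1] ** 2
)

# The same coefficient from a least-squares fit with an intercept
X = np.column_stack([np.ones(n), x1, x2])
beta_M1_ols = np.linalg.lstsq(X, y, rcond=None)[0][1]

print(beta_M1_formula, beta_M1_ols)
```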

Hence, in the Medium equation we get a bias of

Bias\left(\beta_{1}^{M}\right)=\beta_{3}^{L}\left[\frac{Cov\left(X_{1},X_{3}\right)V\left(X_{2}\right)-Cov\left(X_{2},X_{3}\right)Cov\left(X_{1},X_{2}\right)}{V\left(X_{1}\right)V\left(X_{2}\right)-Cov\left(X_{1},X_{2}\right)^{2}}\right]

(the algebra is deferred to the last section, for readers who want to see the work)

Note that we can simplify this further:

\begin{align*} X_{3} & =\gamma_{1}^{X_{3}:X_{1}+X_{2}}X_{1}+\gamma_{2}^{X_{3}:X_{1}+X_{2}}X_{2}+u\\ & \rightarrow\gamma_{1}^{X_{3}:X_{1}+X_{2}}=\frac{Cov\left(X_{1},X_{3}\right)V\left(X_{2}\right)-Cov\left(X_{2},X_{3}\right)Cov\left(X_{1},X_{2}\right)}{V\left(X_{1}\right)V\left(X_{2}\right)-Cov\left(X_{1},X_{2}\right)^{2}}\\ Bias\left(\beta_{1}^{M}\right) & =\beta_{3}^{L}\gamma_{1}^{X_{3}:X_{1}+X_{2}} \end{align*}
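A small simulation sketch can confirm this form of the bias (again with invented coefficients): the gap \beta^M_1-\beta^L_1 should match \beta^L_3 times the coefficient on X_1 from projecting X_3 on X_1 and X_2, up to simulation noise.

```python
import numpy as np

rng = np.random.default_rng(3)
n = 1_000_000

# Made-up Long-equation coefficients and covariate structure
x1 = rng.normal(size=n)
x2 = 0.5 * x1 + rng.normal(size=n)
x3 = -0.8 * x1 + 0.4 * x2 + rng.normal(size=n)
b1, b2, b3 = 1.0, 2.0, -1.5
y = b1 * x1 + b2 * x2 + b3 * x3 + rng.normal(size=n)

def slopes(target, *regressors):
    """Least-squares slopes of target on the regressors (with an intercept)."""
    X = np.column_stack([np.ones_like(target), *regressors])
    return np.linalg.lstsq(X, target, rcond=None)[0][1:]

beta_M1 = slopes(y, x1, x2)[0]   # Medium-regression coefficient on X1
gamma_1 = slopes(x3, x1, x2)[0]  # coefficient on X1 when projecting X3 on X1, X2

# Should agree up to simulation noise: Bias(beta_1^M) = beta_3^L * gamma_1
print(beta_M1 - b1, b3 * gamma_1)
```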

Comparing the Bias terms

When is it better to use the Medium regression rather than the Short one? In terms of bias, one can write this as the inequality Bias\left(\beta^M_{1}\right)\geq Bias\left(\beta^S_{1}\right) and substitute in the expressions derived above \beta_{3}^{L}\gamma_{1}^{X_{3}:X_{1}+X_{2}} \geq\gamma_{1}^{X_{2}:X_{1}}\beta_{2}^{L}+\gamma_{1}^{X_{3}:X_{1}}\beta_{3}^{L} Re-arranging, we get

\begin{align*} \beta_{3}^{L}\gamma_{1}^{X_{3}:X_{1}+X_{2}}-\gamma_{1}^{X_{2}:X_{1}}\beta_{2}^{L}-\gamma_{1}^{X_{3}:X_{1}}\beta_{3}^{L} & \geq0\\ \beta_{3}^{L}\left(\gamma_{1}^{X_{3}:X_{1}+X_{2}}-\gamma_{1}^{X_{3}:X_{1}}\right)-\gamma_{1}^{X_{2}:X_{1}}\beta_{2}^{L} & \geq0 \end{align*}

First take-away: this does not always hold! That is, adding another control does not magically erase bias; whether it helps depends on the signs and magnitudes of the terms above.

Second take-away: we need to be careful about what we control for. Say that we have a hypothesis about \beta^L_2 and \beta^L_3 (which are unknown; otherwise we could go after \beta_1^L directly). One can then draw a simple table showing when adding X_2 as a control is good (i.e., the Medium regression) and when using only X_1 is preferred (i.e., the Short regression).

Illustration

Assume that \beta_2^L,\beta_3^L>0. Then we can write out the conditions under which we prefer using \beta_1^S to estimate \beta_1^L, and the conditions under which \beta_1^M is preferable. In terms of bias, of course.

| Sign of \gamma_{1}^{X_{2}:X_{1}} | Sign of \gamma_{1}^{X_{3}:X_{1}} | Sign of \gamma_{1}^{X_{3}:X_{1}+X_{2}} | Condition | Use Regression |
|---|---|---|---|---|
| - | - | + | None | M |
| - | + | + | \gamma_{1}^{X_{3}:X_{1}+X_{2}} > \gamma_{1}^{X_{3}:X_{1}} | M |
| - | + | + | \gamma_{1}^{X_{3}:X_{1}+X_{2}} < \gamma_{1}^{X_{3}:X_{1}} | S |
| - | - | - | \gamma_{1}^{X_{3}:X_{1}+X_{2}} > \gamma_{1}^{X_{3}:X_{1}} | M |
| - | - | - | \gamma_{1}^{X_{3}:X_{1}+X_{2}} < \gamma_{1}^{X_{3}:X_{1}} | S |
| - | + | - | -\gamma_{1}^{X_{2}:X_{1}}\beta_{2}^{L}>\beta_{3}^{L}\left(\gamma_{1}^{X_{3}:X_{1}+X_{2}}-\gamma_{1}^{X_{3}:X_{1}}\right) | M |

As we can see, whether the Short or the Medium regression is preferable depends on the signs of the projection coefficients and, in some cases, on their relative magnitudes.
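To see how this comparison plays out numerically, here is a small sketch; the helper function, covariance values, and coefficients are all invented for illustration. It computes both bias terms for one parameterization, and varying the invented numbers changes which term is smaller.

```python
import numpy as np

def bias_terms(cov, b2, b3):
    """Bias of the Short and Medium coefficients on X1, given a 3x3
    covariance matrix of (X1, X2, X3) and the Long-equation coefficients."""
    v1, v2 = cov[0, 0], cov[1, 1]
    c12, c13, c23 = cov[0, 1], cov[0, 2], cov[1, 2]
    gamma_21 = c12 / v1                                         # X2 on X1
    gamma_31 = c13 / v1                                         # X3 on X1
    gamma_3_12 = (c13 * v2 - c23 * c12) / (v1 * v2 - c12 ** 2)  # X3 on X1, X2
    bias_S = gamma_21 * b2 + gamma_31 * b3
    bias_M = gamma_3_12 * b3
    return bias_S, bias_M

# Invented covariance structure and Long-equation coefficients
cov = np.array([[1.0, -0.5, 0.3],
                [-0.5, 1.0, 0.6],
                [0.3, 0.6, 1.0]])
print(bias_terms(cov, b2=2.0, b3=1.5))
```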

Final Lap

So, as we have seen, to determine which regression is preferable we need a hypothesis about the unknown coefficients in the Long equation, and a hypothesis about the partially unknown coefficients in the linear projections of the covariates on one another. We say unknown because, if it were possible, we would simply use the Long equation; instead we assume that X_3 is not observed. And we say partially unknown because we can learn \gamma_1^{X_2:X_1}, but since X_3 is unobserved, the coefficients of any projection involving it remain unknown as well.

Lastly, using a decomposition of the covariates' linear projections, and using notation similar to the above, it may be useful to note that \gamma_{1}^{X_{3}:X_{1}}=\gamma_{1}^{X_{3}:X_{1}+X_{2}}+\gamma_{2}^{X_{3}:X_{1}+X_{2}}\gamma_{1}^{X_{2}:X_{1}}
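This decomposition can also be confirmed numerically. In the sketch below (with an arbitrary, made-up covariate structure), the identity holds exactly in the sample analogues, up to floating-point error.

```python
import numpy as np

rng = np.random.default_rng(4)
n = 100_000

# Arbitrary covariate structure, for illustration only
x1 = rng.normal(size=n)
x2 = 0.5 * x1 + rng.normal(size=n)
x3 = -0.8 * x1 + 0.4 * x2 + rng.normal(size=n)

C = np.cov(np.vstack([x1, x2, x3]))
gamma_31 = C[0, 2] / C[0, 0]  # X3 on X1
gamma_21 = C[0, 1] / C[0, 0]  # X2 on X1
den = C[0, 0] * C[1, 1] - C[0, 1] ** 2
gamma_3_12_1 = (C[0, 2] * C[1, 1] - C[1, 2] * C[0, 1]) / den  # coef on X1, X3 on (X1, X2)
gamma_3_12_2 = (C[1, 2] * C[0, 0] - C[0, 2] * C[0, 1]) / den  # coef on X2, X3 on (X1, X2)

# The decomposition holds exactly in the sample analogues
print(gamma_31, gamma_3_12_1 + gamma_3_12_2 * gamma_21)
```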

That's it for today.

Algebra!

Medium Equation

Substituting Y from the Long equation we get

\begin{align*} \beta^M_{1} & =\frac{Cov\left(X_{1},\beta^L_{1}X_{1}+\beta^L_{2}X_{2}+\beta^L_{3}X_{3}\right)V\left(X_{2}\right)}{V\left(X_{1}\right)V\left(X_{2}\right)-Cov\left(X_{1},X_{2}\right)^{2}}\\ & -\frac{Cov\left(X_{2},\beta^L_{1}X_{1}+\beta^L_{2}X_{2}+\beta^L_{3}X_{3}\right)Cov\left(X_{1},X_{2}\right)}{V\left(X_{1}\right)V\left(X_{2}\right)-Cov\left(X_{1},X_{2}\right)^{2}}\\ & =\frac{\left[V\left(X_{1}\right)\beta^L_{1}+Cov\left(X_{1},X_{2}\right)\beta^L_{2}+Cov\left(X_{1},X_{3}\right)\beta^L_{3}\right]V\left(X_{2}\right)}{V\left(X_{1}\right)V\left(X_{2}\right)-Cov\left(X_{1},X_{2}\right)^{2}}\\ & -\frac{\left[Cov\left(X_{1},X_{2}\right)\beta^L_{1}+V\left(X_{2}\right)\beta^L_{2}+Cov\left(X_{2},X_{3}\right)\beta^L_{3}\right]Cov\left(X_{1},X_{2}\right)}{V\left(X_{1}\right)V\left(X_{2}\right)-Cov\left(X_{1},X_{2}\right)^{2}}\\ & =\beta^L_{1}\left[\frac{V\left(X_{1}\right)V\left(X_{2}\right)-Cov\left(X_{1},X_{2}\right)^{2}}{V\left(X_{1}\right)V\left(X_{2}\right)-Cov\left(X_{1},X_{2}\right)^{2}}\right]\\ & +\beta^L_{2}\left[\frac{Cov\left(X_{1},X_{2}\right)V\left(X_{2}\right)-V\left(X_{2}\right)Cov\left(X_{1},X_{2}\right)}{V\left(X_{1}\right)V\left(X_{2}\right)-Cov\left(X_{1},X_{2}\right)^{2}}\right]\\ & +\beta^L_{3}\left[\frac{Cov\left(X_{1},X_{3}\right)V\left(X_{2}\right)-Cov\left(X_{2},X_{3}\right)Cov\left(X_{1},X_{2}\right)}{V\left(X_{1}\right)V\left(X_{2}\right)-Cov\left(X_{1},X_{2}\right)^{2}}\right]\\ & =\beta^L_{1}+\beta^L_{3}\left[\frac{Cov\left(X_{1},X_{3}\right)V\left(X_{2}\right)-Cov\left(X_{2},X_{3}\right)Cov\left(X_{1},X_{2}\right)}{V\left(X_{1}\right)V\left(X_{2}\right)-Cov\left(X_{1},X_{2}\right)^{2}}\right] \end{align*}