class: center, middle

# Conditional Expectation Function (CEF)

## Econometría

#### Dr. Francisco J. Cabrera-Hernández
#### Maestría en Economía Primavera 2025
##### CIDE Santa Fe, Ciudad de México.

---

<style>
.centered-word {
  position: absolute;
  top: 50%;
  left: 50%;
  transform: translate(-50%, -50%);
}
</style>

<div class="centered-word">
  <h1>Introduction</h1>
</div>

---

## Econometrics

Sir Clive Granger, Nobel Laureate, noted:

*“We need a special field called econometrics, and textbooks about it, because it is generally accepted that economic data possess certain properties that are not considered in standard statistics texts”*

---

## What is it for?

**Prediction (Time Series):**

- Uses past data and studies its behavior: trend or seasonality.
- Uses the model/algorithm that best fits that "shape" of the data (curve fitting).
- From this it predicts, **assuming that everything else keeps changing in the same way.**

---

## Time Series

<div class="figure" style="text-align: center">
<img src="data:image/png;base64,#mtimeseries.png" alt=" " width="80%" />
<p class="caption"> </p>
</div>

---

## Machine Learning (Data Science)

- In essence it is the same procedure as time-series modeling (curve fitting).
- It can incorporate "predictors": age, schooling, gender...
- Its use is increasingly widespread, complemented with other (computer-intensive) methods.

<div class="figure" style="text-align: center">
<img src="data:image/png;base64,#ML.jpg" alt=" " width="50%" />
<p class="caption"> </p>
</div>

---

## What is it for?

**Causal inference:**

- The most important one for establishing causal relationships between variables.
- It brings **unobservable factors** into the analysis.
- It does not seek to predict (we economists have been rather bad at that). It asks "what if?" questions.
- For example: [Does studying at a private university improve people's earnings?](https://www.nber.org/papers/w7322).
- Causal analysis can be complemented with *Machine Learning*.

---

## Reduced-Form Causal Analysis

- Mexico is the world's second-largest recipient of remittances, but do remittances help reduce poverty?
- Classical theory says that a minimum wage above the equilibrium generates unemployment. Did it in Mexico?
- Theory says that international trade generates comparative advantages that benefit workers. Is that true for Mexico?

---

## Data

- Ideally, we would use experimental data to answer these questions. How?
- Most economic data is **observational**. This means that all variables must be treated as random and possibly jointly determined.
- We can measure the joint distribution of variables and assess their joint dependence.
- But it is difficult to infer **causality** from observational data. It requires **identification**.

---

## Data

- Cross-sectional: one observation per individual. Typically surveys or administrative data.
- We often assume that cross-sectional observations are mutually independent.
- We mean the `\(i^{th}\)` observation `\((Y_i,X_i)\)` is independent of the `\(j^{th}\)` observation `\((Y_j,X_j)\)` for `\(i\ne j\)`.
- *Independently distributed. This relates to the relationship between `\(i\)` and `\(j\)`, not between `\(Y\)` and `\(X\)`.*
- If the data are randomly gathered, drawn from the same probability distribution `\(F(y,x)\)` (the population), they are identically distributed.

---

## Data

- Hence the data we will work with are **i.i.d.**, or a **random sample.**
- However, the data can appear in clusters (schools, firms, municipalities, states).
- This violates independence between observations within clusters.
- We will eventually deal with these types of data in this course.
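---

## Data

A minimal simulation sketch of the difference (the data-generating process below is hypothetical, only for illustration): in an i.i.d. sample each `\((Y_i, X_i)\)` is an independent draw from the same `\(F(y,x)\)`, while a shared cluster shock makes errors correlated within clusters.


``` r
# Hypothetical DGP: i.i.d. cross-section vs. clustered data
set.seed(1)
n <- 1000
G <- 50                               # number of clusters (e.g., schools)
cluster <- rep(1:G, each = n / G)

# i.i.d. sample: each (Y_i, X_i) is an independent draw from the same F(y, x)
x_iid <- rnorm(n)
e_iid <- rnorm(n)
y_iid <- 1 + 0.5 * x_iid + e_iid

# Clustered sample: observations share a cluster-level shock u_g,
# so errors are correlated within clusters and independence fails
u_g  <- rnorm(G)
x_cl <- rnorm(n)
e_cl <- u_g[cluster] + rnorm(n)
y_cl <- 1 + 0.5 * x_cl + e_cl

# Correlation between neighbouring observations (same cluster by construction):
# roughly 0 for the i.i.d. errors, roughly 0.5 for the clustered errors
odd <- seq(1, n, by = 2); even <- seq(2, n, by = 2)
c(iid = cor(e_iid[odd], e_iid[even]), clustered = cor(e_cl[odd], e_cl[even]))
```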
---

## Notation

- Variables are denoted with capital letters: `\(Y\)`, `\(X\)`, `\(Z\)`.
- Random variables and vectors (mathematically): `\(Y\)`, `\(X\)`, `\(Z\)`. Except equation errors: `\(e\)`, `\(u\)` or `\(v\)`.
- Real numbers (elements of the real line `\(\mathbb{R}\)`), or scalars, are lower case: `\(x\)`.
- Vectors (elements of `\(\mathbb{R}^k\)`) are also lower case:

`$$\mathbf{x} = \begin{pmatrix} x_1 \\ x_2 \\ \vdots \\ x_k \end{pmatrix}$$`

- Matrices are bold capital letters (***X***).
- Unknown parameters are Greek letters: `\(\beta\)`, `\(\theta\)`, etc.

---

<style>
.centered-word {
  position: absolute;
  top: 50%;
  left: 50%;
  transform: translate(-50%, -50%);
}
</style>

<div class="centered-word">
  <h1>Conditional Expectation Function (CEF)</h1>
</div>

---

## Introduction to CEF

- Least squares is a tool to estimate the conditional mean of one variable.
- Here we abstract from estimation and focus on the probabilistic foundation.
- This implies a quick review of theory.
- Later we recap LS's algebra and parameter estimation.

---

## Distribution of wages (Example)

Wage is a random variable (we don't know the wage before measuring it) with **probability distribution**:

`$$F(u)=\mathbb{P}[wage \le u ]$$`

Observed wages are realizations from the distribution `\(F\)`. If it is differentiable, the PDF is:

$$ f(u) = {d \over du} F(u) $$

---

## Distribution of wages (Example)

The mean or expectation of a random variable Y with discrete support is:

`$$\mu = E[Y] = \sum_{j=1}^{\infty} \tau_{j}\mathbb{P}[Y=\tau_{j}]$$`

For continuous random variables with density `\(f(y)\)`:

$$ \mu = E[Y] = \int_{-\infty}^{\infty} yf(y)dy $$

---

## Conditional Expectation Function

The conditional expectation of log(wage) given gender, race, and education is:

$$ E[log(wage) | gender = man, race = white, education = 12 ] $$

<div class="figure" style="text-align: center">
<img src="data:image/png;base64,#CEF.png" alt=" " width="65%" />
<p class="caption"> </p>
</div>

---

## Conditional Expectation Function

In general:

`$$E[Y|X_1 = x_1 , X_2 = x_2, ..., X_k = x_k] = m(x_1,x_2, ...,x_k )$$`

We can write the conditioning variables as a vector in `\(\mathbb{R}^k\)`:

`$$X= \left( \begin{array}{c} X_1\\ X_2\\ \vdots\\ X_k \end{array} \right)$$`

The CEF:

$$m(x) = E[Y|X=x], \quad x \in \mathbb{R}^{k} $$

When `\(X=x\)`, the average value of `\(Y\)` is `\(m(x)\)`.

---

## Continuous CEF

In the previous example the conditioning variables are discrete. Yet many are continuous.

Suppose `\((Y,X)\)` are continuously distributed with a **joint density** function `\(f(y,x)\)`, with marginal density of `\(x\)`:

`$$f_X(x) = \int_{-\infty}^{\infty} f(y,x)dy$$`

The **conditional density** of Y given X is:

$$ \color{green} {f_{Y|X} (y|x)} = {f(y,x) \over f_X(x)}$$

for any `\(x\)` such that `\(f_X(x)>0\)`.

*It is divided by the marginal density `\(f_X(x)\)` so that it integrates to one.*

---

## Continuous CEF

The **conditional density** is a renormalized slice of the joint density `\(f(y,x)\)` holding `\(x\)` fixed.

<div class="figure" style="text-align: center">
<img src="data:image/png;base64,#CEFC.png" alt=" " width="90%" />
<p class="caption"> </p>
</div>

[An example code here!](https://github.com/fcabrerahz/EconometricsME/blob/main/Code/1_CEF_joint_density_routine.R)
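---

## Continuous CEF

A minimal numerical sketch of this "renormalized slice" idea (this is not the linked routine; the joint density below is an assumed example with `\(X \sim N(0,1)\)` and `\(Y \mid X = x \sim N(x,1)\)`):


``` r
# Assumed example joint density: f(y, x) = f_X(x) * f_{Y|X}(y | x)
# with X ~ N(0, 1) and Y | X = x ~ N(x, 1)
f_joint <- function(y, x) dnorm(x) * dnorm(y - x)

x0     <- 1                                # fix the slice at X = x0
y_grid <- seq(-4, 6, length.out = 2001)
dy     <- diff(y_grid)[1]

slice  <- f_joint(y_grid, x0)              # un-normalized slice of the joint density
f_X_x0 <- sum(slice) * dy                  # numeric marginal density f_X(x0)
f_cond <- slice / f_X_x0                   # renormalized slice: the conditional density

sum(f_cond) * dy                           # ~ 1: it integrates to one
sum(y_grid * f_cond) * dy                  # CEF at x0: m(x0) ~ 1 = E[Y | X = 1] here
```

The last line anticipates the next slide: integrating `\(y\)` against the conditional density gives the CEF at `\(x_0\)`.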
---

## Continuous CEF

The CEF of `\(Y\)` given `\(X=x\)` is the expectation of the conditional density (the sliced joint density):

`$$m(x) = E[Y|X=x] = \int_{-\infty}^{\infty} y \color{green}{ f_{Y|X}(y|x)} \, dy$$`

`\(m(x)\)` is the expectation of Y for the subpopulation where the conditioning variables are fixed at `\(x\)`. When X is continuously distributed this subpopulation is infinitely small.

Again: this definition is appropriate when the **marginal density of X** is "well defined". It won't work when the event `\(X=x\)` has zero probability (`\(f_X(x)=0\)`).

---

## Continuous CEF

So the CEF is the solid line. **Note it is a smooth but nonlinear function!**

<div class="figure" style="text-align: center">
<img src="data:image/png;base64,#CEFC.png" alt=" " width="90%" />
<p class="caption"> </p>
</div>

---

<style>
.centered-word {
  position: absolute;
  top: 50%;
  left: 50%;
  transform: translate(-50%, -50%);
}
</style>

<div class="centered-word">
  <h1>CEF Error</h1>
</div>

---

## CEF error

The difference between Y and the CEF evaluated at X:

`$$e = Y- m(X) \\ Y = m(X) + e$$`

*Remember the conditioning theorem: `\(E[X|X]=X\)`; `\(E[g(X)|X]=g(X)\)` for any `\(g(.)\)`.*

Conditional mean of zero:

`$$E[e|X] = E[Y- m(X)|X] \\ = E[Y|X] - E[m(X)|X] \\ = m(X) - m(X) = 0$$`

---

## CEF error

The unconditional mean is also zero.

*Remember the simple law of iterated expectations: `\(E[E[Y|X]]=E[Y]\)`.*

$$ E[E[e|X]] = E[e] = E[0] = 0$$

For any function `\(h(x)\)`, `\(E[h(X)e]=0\)`, so `\(e\)` is uncorrelated with any function of the vector X.

---

## CEF error

Considering: `\(Y = m(X) + e\)`; and `\(E[e|X]=0\)`

If `\(E[e|X]=0\)` (conditional mean restriction), then `\(m(X)\)` is the CEF of `\(Y\)` given `\(X\)`.

The conditional mean of `\(e\)` is zero and thus does not vary with X. This implies `\(e\)` is **mean independent** of `\(X\)`, whatever the value of `\(X\)`.

It does not imply that the **distribution of `\(e\)`** is independent of `\(X\)`.

*Generally, `\(e\)` and `\(X\)` are jointly dependent even when the conditional mean of `\(e\)` is 0.*

---

## CEF error

e.g. the shape of the conditional distribution varies with the level of experience.

<div class="figure" style="text-align: center">
<img src="data:image/png;base64,#CEFe.png" alt=" " width="85%" />
<p class="caption"> </p>
</div>

---

## Variance of error

The unconditional variance of the CEF error is:

`$$\sigma^{2} = var[e] = E[(e-E[e])^{2}] = E[e^{2}]$$`

`\(\sigma^{2}\)` is the variance of the regression error. This is the variation in `\(Y\)` not accounted for by `\(E[Y|X]\)`, as in `\(e=Y-E[Y|X]\)`:

<!-- **Theorem 2.4** If `\(E|Y|^{r} < \infty\)` for `\(r \ge 1\)` then `\(E|e|^{r} < \infty\)` **Theorem 2.5** if `\(E[Y^{2}] < \infty\)` then `\(\sigma^{2} < \infty\)` -->

---

## Variance of error

The error variance depends on the information in X, e.g. consider:

`$$Y = E[Y|X_1] + e_1 \\ Y = E[Y|X_1,X_2] + e_2$$`

**Theorem 2.6:**

`$$var[Y] \ge var[Y-E[Y|X_1]] \ge var[Y-E[Y|X_1, X_2]].$$`

Increasing the number of elements in the vector (X) decreases the variance of the unexplained portion of Y. Why?

---

## Variance of error

**Jensen's Inequality:** If `\(g(x)\)` is convex, for any random vector `\(X \in \mathbb{R}^k\)`:

`\(g (\mathbb{E}[X]) \le \mathbb{E}[g(X)]\)`

If `\(g(x)\)` is concave, this reverses.

*Example:* Let `\(X\)` be a random variable, and let `\(g(x) = x^2\)`, which is a convex function. We have for this specific case:

$$ g(\mathbb{E}[X]) = (\mathbb{E}[X])^2 \quad \text{and} \quad \mathbb{E}[g(X)] = \mathbb{E}[X^2] $$

The inequality becomes:

`$$(\mathbb{E}[X])^2 \leq \mathbb{E}[X^2]$$`
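---

## Variance of error

A quick simulation check of the example above (the distribution of `\(X\)` is assumed only for illustration; this is not the linked course routine):


``` r
# Minimal simulation sketch: Jensen's inequality for the convex function g(x) = x^2
set.seed(42)
x <- rexp(1e5, rate = 1)      # any random variable works; here X ~ Exponential(1)

g_of_mean <- mean(x)^2        # g(E[X]) = (E[X])^2, ~ 1 for this distribution
mean_of_g <- mean(x^2)        # E[g(X)] = E[X^2],   ~ 2 for this distribution

c(g_of_mean = g_of_mean, mean_of_g = mean_of_g,
  gap = mean_of_g - g_of_mean)   # the gap equals var(X) >= 0, so g(E[X]) <= E[g(X)]
```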
---

## Variance of error

**Some Proof**: The variance of `\(X\)` is always non-negative:

`$$\text{Var}(X) = E[X^2] - (E[X])^2 \geq 0$$`

Hence:

`$$({E}[\color{green}{X}])^2 \leq {E}[\color{green}{X}^2]$$`

[some code here!](https://github.com/fcabrerahz/EconometricsME/blob/main/Code/2_CEF_jensens_routine.R)

---

## Variance of error

Combining iterated expectations and Jensen's Inequality:

- *law of iterated expectations: `\(E[E[Y|X_1,X_2]|X_1]=E[Y|X_1]\)`*
- *conditioning theorem: `\(E[g(X)|X]=g(X)\)`*

`$$(E[\color{green}{E[Y|X_1, X_2]}|X_1])^2 \leq E[\color{green}{({E}[Y|X_1, X_2])}^2|X_1]$$`

Taking unconditional expectations:

`$$E[(\color{green}{E[Y|X_1]})^2] \leq E[\color{green}{({E}[Y|X_1, X_2])}^2]$$`

Hence:

`$$var[Y] \ge var[Y-E[Y|X_1]] \ge var[Y-E[Y|X_1, X_2]].$$`

[more code!](https://github.com/fcabrerahz/EconometricsME/blob/main/Code/3_variance_theorem.R)

---

## Conditional Variance

The conditional variance is a function of the conditioning variables. For a random variable `\(W\)` given `\(X=x\)` it is:

`$$\sigma^{2}(x) = var[W|X=x] = E[(W-E[W|X=x])^{2} | X=x]$$`

The conditional variance of a random variable `\(W\)` is the conditional second moment centered around the conditional first moment.

---

## Conditional Variance

Yet, given `\(E[e|X]=0\)`, the conditional variance of `\(e\)` is:

`\(\sigma^{2}(x) = var[e|X=x] = \color{green}{E[e^{2}|X=x]}\)`

This is the conditional mean of `\(e^2\)` given X. If `\(X\)` is treated as random: `\(var [e|X] = \sigma^{2}(X)\)`

---

## Conditional Variance

The conditional variance of `\(e\)` is equivalent to `\(\sigma^2(x) = var[Y|X=x]\)` (the conditional variance of the dependent variable).

An important special case for the variance of `\(e\)` is when `\(\sigma^{2}(x)\)` is constant and independent of x (homoskedasticity): `\(\sigma^{2}(x)= \sigma^{2}\)`

The general case is heteroskedasticity (depending on x).

---

## Conditional Variance

The difference between the log-wage densities of men and women is not purely a location shift; it is also a difference in spread.

<div class="figure" style="text-align: center">
<img src="data:image/png;base64,#CEFC.png" alt=" " width="90%" />
<p class="caption"> </p>
</div>

---

## Unconditional and Conditional Variance

As is commonly known, we can decompose the variance as:

`$$var[X] = \color{green}{E[var[X|W]]}+var[E[X|W]]$$`

`\(\color{green}{E[var[X|W]]}\)` relates to how much `\(X\)` fluctuates within each fixed value of `\(W\)`.

`\(var[E[X|W]]\)` relates to how the mean of `\(X\)` changes between different values of `\(W\)`.

Yet in the case of the error, its conditional mean is `\(E[e|X]=0\)`. Hence: `\(var[e]= \color{green}{E[var[e|X]]}\)`

---

## Unconditional and Conditional Variance

Given:

`$$var[e]= \color{green}{E[var[e|X]]}$$`

We have shown:

`$$var[e]=\sigma^2=\color{green}{E[e^2]}$$`

So `\(var[e]= E[\color{green}{E[e^2}|X]]=E[e^2]\)`, by the law of iterated expectations.

The average of the conditional error variance is the unconditional error variance.

*.blue[*Ask GPT for a data demonstration in R.*]
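---

## Unconditional and Conditional Variance

A minimal sketch along those lines, with an assumed heteroskedastic data-generating process (not official course code): since `\(E[e|X]=0\)`, the unconditional variance of `\(e\)` equals the average of the conditional variances.


``` r
# Assumed DGP: E[e | X] = 0 but var(e | X = x) depends on x (heteroskedasticity)
set.seed(7)
n <- 1e5
x <- sample(0:2, n, replace = TRUE)          # discrete conditioning variable
e <- rnorm(n, mean = 0, sd = c(0.5, 1, 2)[x + 1])

cond_var <- tapply(e, x, var)                # var(e | X = x) for x = 0, 1, 2
p_x      <- as.numeric(table(x)) / n         # P(X = x)

c(var_e      = var(e),                       # unconditional var[e]
  E_cond_var = sum(cond_var * p_x),          # E[ var(e | X) ]
  E_e2       = mean(e^2))                    # E[e^2]; all three are ~ 1.75 here
```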
---

<style>
.centered-word {
  position: absolute;
  top: 50%;
  left: 50%;
  transform: translate(-50%, -50%);
}
</style>

<div class="centered-word">
  <h1>CEF and Linear Regression</h1>
</div>

---

## Derivative

We can express `\(m(x) = E[Y|X=x]\)` in terms of marginal changes `\(\Delta m(x)\)`.

The marginal effect of a change in `\(X_1\)`, holding the rest constant:

`$${\partial \over \partial x_1} m(x_1,...,x_k)$$`

When `\(X_1\)` is discrete (binary):

`$$m(1, x_2,...,x_k) - m(0,x_2,..., x_k)$$`

---

## Derivative

We can collect the `\(k\)` effects into one `\(k \times 1\)` vector of derivatives:

`$$\Delta m(x) = \left( \begin{array}{c} \Delta_1 m(x)\\ \Delta_2 m(x)\\ \vdots\\ \Delta_k m(x) \end{array} \right)$$`

This holds constant the variables included in the conditional mean (not quite the same as *ceteris paribus*).

**The regression derivative is the change in E(Y|X).**

The change in **the actual value of Y** happens only if `\(e\)` is unaffected by the change in X: `\(E(eX)=0\)`

---

## Linear CEF (linear regression)

If `\(m(x)= E(Y|X=x)\)` is linear in `\(x\)` we can write:

`$$m(x) = x_1 \beta_1 + x_2 \beta_2 + ... + x_k \beta_k + \beta_{k+1}$$`

Same as with the `\(k \times 1\)` vector:

`$$X = \left( \begin{array}{c} X_1 \\ \vdots\\ X_{k-1}\\ 1\\ \end{array} \right)$$`

With this definition: `\(m(x)= x_1\beta_1 + x_2\beta_2 + ... + \beta_k = x'\beta\)` (*)

---

## Linear CEF (linear regression)

with a `\(k \times 1\)` coefficient vector (the regression derivative):

`$$\beta = \left( \begin{array}{c} \beta_1 \\ \beta_2 \\ \vdots\\ \beta_k \end{array} \right)$$`

This is the **linear CEF model**, better known as the linear regression model, or the regression of Y on X.

In the linear CEF model the regression derivative is simply the coefficient vector.

---

## Linear CEF (linear regression)

Linear Homoskedastic CEF model:

`\(Y= X' \beta + e\)`; `\(E[e|X] = 0\)`; `\(E[e^2|X] = \sigma^2\)`

Linear Heteroskedastic CEF model:

`\(Y= X' \beta + e\)`; `\(E[e|X] = 0\)`; `\(E[e^2|X] = \sigma^2(X)\)`

---

## CEF as best Predictor

Any predictor of Y can be expressed as a function `\(g(X)\)` of `\(X\)`. The prediction error: `\(Y - g(X)\)`.

A non-stochastic measure of the magnitude of the prediction error is the MSE:

`$$E[(Y - g(X))^2]$$`

The best predictor is the function `\(g(X)\)` minimizing the MSE, regardless of the joint distribution of `\((Y,X)\)`.

---

## CEF as best Predictor (Demonstration)

$$E [(\color{green}{Y} - g(X))^2] = $$

$$E[(\color{green}{e + m(X)} - g(X))^2] = $$

`$$E[e^2] + 2E[e(m(X)-g(X))]+E[(m(X)-g(X))^2]$$`

By the property of the CEF error, `\(E[h(X)e]=0\)`, the middle term vanishes:

`$$E[e^2] + E[(m(X)-g(X))^2] \ge \color{green}{E[e^2]}$$`

where `\(E[e^2] \equiv E[(Y - m(X))^2]\)` by definition.

If `\(E[Y^2]<\infty\)` then for any `\(g(X)\)`, `\(E[(Y - g(X))^2] \ge \color{green}{E[(Y - m(X))^2]}\)`

Only when `\(g(x)=m(x)\)` is the bound attained: the CEF is the best predictor.

---

## Best Linear Predictor

`\(m(x) = E(Y|X)\)` is the best predictor of Y among all functions of X. Yet **its functional form is unknown**.

The CEF is unlikely to be estimated accurately unless `\(X\)` is discrete, low-dimensional, and *all interactions* are included.

**It is more realistic to assume a linear predictor as an approximation.**

So the linear CEF model is a linear approximation of the population change in actual Y given X, provided `\(E(e)=0\)`, `\(E(Xe)=0\)`, and `\(E(h(X)e)=0\)`.

We then estimate a linear projection, which we can model with data via OLS, MLE, or another method.
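---

## Best Linear Predictor

A simulation sketch of both ideas (the quadratic data-generating process is assumed for illustration): the CEF attains the lowest MSE among all functions of `\(X\)`, and the linear projection is the best we can do within linear functions, even though it may remain far from the CEF.


``` r
# Assumed DGP: the true CEF is m(x) = x^2 and the error variance is 1
set.seed(2025)
n <- 1e5
x <- runif(n, -2, 2)
y <- x^2 + rnorm(n)

mse <- function(pred) mean((y - pred)^2)     # mean squared prediction error

c(cef          = mse(x^2),                   # ~ 1.0: the CEF attains the minimum MSE
  best_linear  = mse(fitted(lm(y ~ x))),     # ~ 2.4: best linear predictor of Y given X
  other_linear = mse(0.5 + x),               # ~ 4.5: any other linear function does worse
  other_g      = mse(abs(x)))                # ~ 1.5: a nonlinear g(X) that is not the CEF
```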
---

## Best Linear Predictor

We can define an approximation to the CEF as the linear function with the lowest MSE.

**Assumptions**:

1. `\(E[Y^2] < \infty\)`
2. `\(E[||X||^2] < \infty\)`
3. `\(Q_{XX} = E[XX']\)` is positive definite

These imply that Y and X have finite means, variances and covariances, and that `\(Q_{XX}\)` is invertible.

---

## Best Linear Predictor

A linear predictor for `\(Y\)` is a function `\(X' \beta\)` for some `\(\beta \in \mathbb{R}^k\)`.

The best linear predictor of Y given X is `\(\mathcal{P}[Y|X]=X'\beta\)`

`\(\beta\)` minimizes the mean squared prediction error, MSPE:

`$$S(\beta)=E[(Y-X'\beta)^2]$$`

The minimizer `\(\beta = \underset{b \in \mathbb{R}^k}{argmin} \ S(b)\)` is called the **linear projection coefficient**.

---

## Best Linear Predictor

Calculating an explicit expression for its value:

- We can write the MSPE as a quadratic function of `\(\beta\)`, to solve explicitly for the minimizer:

`$$S(\beta) = E[Y^2] - 2\beta'E[XY] + \beta'E[XX']\beta$$`

- FOC for minimization:

`$$0 = {\partial \over \partial \beta} S(\beta) = -2E[XY] + 2E[XX']\beta$$`

`$$2E[XY] = 2E[XX']\beta$$`

`$$Q_{XY} = Q_{XX}\beta$$`

`$$\color{green}{\beta = Q_{XX}^{-1} Q_{XY}}$$`

---

## Best Linear Predictor

`$$\color{green}{\beta = Q_{XX}^{-1} Q_{XY}}$$`

Equivalently:

`$$\beta=(E[XX'])^{-1} E[XY]$$`

`\(Q_{XX}\)` is a `\(k\times k\)` matrix and `\(Q_{XY}\)` is a `\(k \times 1\)` column vector.

If `\(Q_{XX}\)` is not invertible, there are multiple solutions for `\(\beta\)`.

The best linear predictor is thus:

`$$\mathcal{P}[Y|X]=X'\beta= X'(E[XX'])^{-1} E[XY]$$`

Also known as the **linear projection** of Y on X.

---

## Best Linear Predictor

The projection error is: `\(e = Y - X'\beta\)` ; `\(Y = X'\beta + e\)`

This equation is the best linear predictor of Y given X, or the linear projection of Y on X.

Economists call it: **"The Regression"**.

So the linear CEF model, in theory, is called the linear regression model, in practice.

---

## Best Linear Predictor

`\(Y = X'\beta + e\)`

An important property is: `\(E[Xe] = 0\)`

Proof: using the properties `\(AA^{-1} = I\)` and `\(Ia=a\)`

`$$E[Xe] = E[X(Y-X'\beta)]$$`

`$$= E[XY] - E[XX'](E[XX'])^{-1}E[XY] = 0$$`

---

## Best Linear Predictor

This is a set of `\(k\)` equations, one for each regressor: `\(E[X_{j} e] = 0\)` for `\(j= 1, ..., k\)`; with the constant `\(X_k=1\)`: `\(E[e]=0\)`

**The projection error is mean-zero when a constant is included.**

Since `\(cov(X_j, e) = E[X_{j} e] - E[X_{j}] E[e]=0\)`: `\(X_j\)` and `\(e\)` are uncorrelated.

Summing up, for any random (Y,X) with *finite* variances, we can write `\(Y = X'\beta + e\)`

.red[Remember: this `\(\beta\)` is defined as the best linear **predictor**; it is NOT necessarily a parameter of a structural or causal economic model.]

---

## Illustration for Best Linear Predictor

The (full) CEF of log(wage) as a function of black and female:

`$$\small E[log(wage) | black, female] = -0.20 black - 0.24 female + 0.10 black \times female +3.06$$`

Now consider the linear projection:

$$ \mathcal{P}[log(wage) | black, female] = -0.15 black - 0.23 female + 3.06$$

The (full) CEF shows that the race gap varies by gender: 20% for black men and 10% for black women. The projection model approximates this with an average gap of 15% for black workers, regardless of gender.
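---

## Illustration for Best Linear Predictor

A quick sketch with simulated data (the wage-style coefficients and variable names below are made up for illustration) showing that the population formula `\(\beta = Q_{XX}^{-1}Q_{XY}\)`, computed with sample moments, is what `lm()` delivers:


``` r
# Simulated data; the DGP is assumed only to illustrate beta = Qxx^{-1} Qxy
set.seed(11)
n <- 10000
educ  <- rnorm(n, 12, 3)
exper <- rnorm(n, 10, 5)
y     <- 1.5 + 0.10 * educ + 0.02 * exper + rnorm(n)

X   <- cbind(educ, exper, const = 1)   # regressor vector with the constant included
Qxx <- crossprod(X) / n                # sample analogue of E[X X']
Qxy <- crossprod(X, y) / n             # sample analogue of E[X Y]
beta <- solve(Qxx, Qxy)                # beta = Qxx^{-1} Qxy

drop(beta)                             # educ ~ 0.10, exper ~ 0.02, constant ~ 1.5
coef(lm(y ~ educ + exper))             # same values; lm() lists the intercept first
```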
---

## Illustration for Best Linear Predictor

Linear projection of wages on years of education:

$$\mathcal{P}[log(wage) | education] = 0.11 education + 1.5 $$

<div class="figure" style="text-align: center">
<img src="data:image/png;base64,#BLP_ex.png" alt=" " width="33%" />
<p class="caption"> </p>
</div>

Spline approximation:

`$$\mathcal{P}[log(wage)|education, (education-9) \times \mathbb{1} \{education > 9\}]$$`

`$$=0.02 education +0.10 (education-9) \times \mathbb{1} \{education > 9\} +2.3$$`

---

## Illustration for Best Linear Predictor

A linear projection can be a poor approximation to the CEF.

<div class="figure" style="text-align: center">
<img src="data:image/png;base64,#bad_LP.png" alt=" " width="60%" />
<p class="caption"> </p>
</div>

*Write the estimated equations, depicted in the figure, for 12 years of education.*

---

## Illustration for Best Linear Predictor


``` r
set.seed(123)                 # For reproducibility
n <- 500
X <- runif(n, -2, 2)          # Generate X uniformly from -2 to 2
epsilon <- rnorm(n)           # Random noise
Y <- X^2 + epsilon            # Nonlinear relationship: the CEF is X^2

true_cef <- function(x) x^2
linear_model <- lm(Y ~ X)

# Create a grid for plotting
grid_X <- seq(min(X), max(X), length.out = 500)
pred_linear <- predict(linear_model, newdata = data.frame(X = grid_X))

# Plot the results
plot(X, Y, pch = 16, col = rgb(0, 0, 1, 0.5), xlab = "X", ylab = "Y",
     main = "CEF vs Linear Projection")
lines(grid_X, true_cef(grid_X), col = "green", lwd = 2, lty = 2)
lines(grid_X, pred_linear, col = "red", lwd = 2, lty = 1)
legend("topright", legend = c("True CEF: E[Y|X]", "Linear Projection"),
       col = c("green", "red"), lty = c(2, 1), lwd = 2, bty = "n")
```

---

## Illustration for Best Linear Predictor

<img src="data:image/png;base64,#CEF_v1_files/figure-html/unnamed-chunk-11-1.png" style="display: block; margin: auto;" />

---

## Linear predictor error variance

As in the CEF, `\(\sigma^{2} = E[e^{2}]\)`:

`$$\sigma^{2} = E[(Y - X'\beta)^{2}]$$`

`$$= E[Y^{2}] - 2E[YX'] \color{Green}\beta + \color{blue}\beta'E[XX'] \color{Green}\beta$$`

`$$=Q_{YY}-2Q_{YX} \color{Green}{ Q_{XX}^{-1}Q_{XY}}+\color{Blue}{Q_{YX}Q_{XX}^{-1}}Q_{XX}\color{Green}{Q_{XX}^{-1}Q_{XY}}$$`

`$$=Q_{YY}-Q_{YX}Q_{XX}^{-1}Q_{XY}$$`

`\(Q_{YY}\)` represents the total variability of Y.

`\(Q_{YX}Q_{XX}^{-1}Q_{XY}\)` represents the variability explained by the linear projection of Y on X.

---

## Omitted Variable Bias (OVB)

The projection of Y on X is: `\(Y=\color{Green}{X'_1 \beta_1 + X'_2 \beta_2 + e}\)`; `\(E[Xe]=0\)`

Consider now the projection of `\(Y\)` on `\(X_1\)` only: `\(Y=X'_1 \gamma_1+ u\)`; `\(E[X_1u]=0\)`

We calculate:

`$$\gamma_1=(E[X_1X'_1])^{-1}E[X_1Y]$$`

`$$=(E[X_1X'_1])^{-1}E[X_1(\color{Green}{X'_1\beta_1 + X'_2\beta_2 + e})]$$`

`$$=\beta_1 + (E[X_1X'_1])^{-1} E[X_1 X'_2]\beta_2$$`

Where `\((E[X_1X'_1])^{-1} E[X_1 X'_2] = \Gamma_{12}\)` is the coefficient matrix from a projection of `\(X_2\)` on `\(X_1\)`.

Unless `\(\Gamma_{12} = 0\)` or `\(\beta_2=0\)`, `\(\beta_1 \ne \gamma_1\)` and there is an OVB.

---

<style>
.centered-word {
  position: absolute;
  top: 50%;
  left: 50%;
  transform: translate(-50%, -50%);
}
</style>

<div class="centered-word">
  <h1>The End</h1>
</div>