class: center, middle

# Conditional Expectation Function (CEF)

## Econometría

#### Dr. Francisco J. Cabrera-Hernández
#### Maestría en Economía Primavera 2025
##### CIDE Santa Fe, Ciudad de México.

---

<style>
.centered-word {
  position: absolute;
  top: 50%;
  left: 50%;
  transform: translate(-50%, -50%);
}
</style>

<div class="centered-word">
  <h1>Introduction</h1>
</div>

---

## Econometrics

Sir Clive Granger, Nobel Laureate, noted:

*“We need a special field called econometrics, and textbooks about it, because it is generally accepted that economic data possess certain properties that are not considered in standard statistics texts”*

---

## What is it for?

**Prediction (Time Series):**

- Uses past data and studies its behavior: trend or seasonality.
- Uses the model/algorithm that best fits that "shape" of the data (curve fitting).
- From this it predicts, **assuming that everything else keeps changing in the same way.**

---

## Time Series

<div class="figure" style="text-align: center">
<img src="data:image/png;base64,#mtimeseries.png" alt=" " width="80%" />
<p class="caption"> </p>
</div>

---

## Machine Learning (Data Science)

- In essence it is the same procedure as time-series modeling (curve fitting).
- It can incorporate "predictors": age, schooling, gender...
- Its use is increasingly widespread, complemented with other (computer-intensive) methods.

<div class="figure" style="text-align: center">
<img src="data:image/png;base64,#ML.jpg" alt=" " width="50%" />
<p class="caption"> </p>
</div>

---

## What is it for?

**Causal inference:**

- The most important one for establishing causal relationships between variables.
- It brings **unobservable factors** into the analysis.
- It does not seek to predict (we economists have been rather bad at that). It asks "what if?" questions.
- For example: [Does studying at a private university improve people's earnings?](https://www.nber.org/papers/w7322).
- Causal analysis can be complemented with *Machine Learning*.

---

## Reduced-Form Causal Analysis

- Mexico is the world's second-largest recipient of remittances, but do remittances help reduce poverty?
- Classical theory says that a minimum wage above the equilibrium generates unemployment. Did it in Mexico?
- Theory says that international trade generates comparative advantages that benefit workers. Is that true for Mexico?

---

## Data

- Ideally, we would use experimental data to answer these questions. How?
- Most economic data is **observational**. This means that all variables must be treated as random and possibly jointly determined.
- We can measure the joint distribution of variables and assess their joint dependence.
- But it is difficult to infer **causality** from observational data. It requires **identification**.

---

## Data

- Cross-sectional: one observation per individual. Typically surveys or administrative data.
- We often assume that cross-sectional observations are mutually independent.
- We mean the `\(i^{th}\)` observation `\((Y_i,X_i)\)` is independent of the `\(j^{th}\)` observation `\((Y_j,X_j)\)` for `\(i\ne j\)`.
- *Independently distributed. This relates to the relationship between `\(i\)` and `\(j\)`, not between `\(Y\)` and `\(X\)`.*
- If the data are randomly gathered, drawn from the same probability distribution `\(F(y,x)\)` (the population), they are identically distributed.

---

## Data

- Hence the data we will work with are **i.i.d.**, or a **random sample.**
- However, the data can appear in clusters (schools, firms, municipalities, states).
- This violates independence between observations within clusters.
- We will eventually deal with these types of data in this course.
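---

## Data

A minimal simulation sketch of the difference (the data-generating process below is hypothetical, only for illustration): in an i.i.d. sample each `\((Y_i, X_i)\)` is an independent draw from the same `\(F(y,x)\)`, while a shared cluster shock makes errors correlated within clusters.


``` r
# Hypothetical DGP: i.i.d. cross-section vs. clustered data
set.seed(1)
n <- 1000
G <- 50                               # number of clusters (e.g., schools)
cluster <- rep(1:G, each = n / G)

# i.i.d. sample: each (Y_i, X_i) is an independent draw from the same F(y, x)
x_iid <- rnorm(n)
e_iid <- rnorm(n)
y_iid <- 1 + 0.5 * x_iid + e_iid

# Clustered sample: observations share a cluster-level shock u_g,
# so errors are correlated within clusters and independence fails
u_g  <- rnorm(G)
x_cl <- rnorm(n)
e_cl <- u_g[cluster] + rnorm(n)
y_cl <- 1 + 0.5 * x_cl + e_cl

# Correlation between neighbouring observations (same cluster by construction):
# roughly 0 for the i.i.d. errors, roughly 0.5 for the clustered errors
odd <- seq(1, n, by = 2); even <- seq(2, n, by = 2)
c(iid = cor(e_iid[odd], e_iid[even]), clustered = cor(e_cl[odd], e_cl[even]))
```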
---

## Notation

- Variables are denoted with capital letters: `\(Y\)`, `\(X\)`, `\(Z\)`.
- Random variables and vectors (mathematically): `\(Y\)`, `\(X\)`, `\(Z\)`. Except equation errors: `\(e\)`, `\(u\)` or `\(v\)`.
- Real numbers (elements of the real line `\(\mathbb{R}\)`), or scalars, are lower case: `\(x\)`.
- Vectors (elements of `\(\mathbb{R}^k\)`) are also lower case:

`$$\mathbf{x} = \begin{pmatrix} x_1 \\ x_2 \\ \vdots \\ x_k \end{pmatrix}$$`

- Matrices are bold capital letters (***X***).
- Unknown parameters are Greek letters: `\(\beta\)`, `\(\theta\)`, etc.

---

<style>
.centered-word {
  position: absolute;
  top: 50%;
  left: 50%;
  transform: translate(-50%, -50%);
}
</style>

<div class="centered-word">
  <h1>Conditional Expectation Function (CEF)</h1>
</div>

---

## Introduction to CEF

- Least squares is a tool to estimate the conditional mean of one variable.
- Here we abstract from estimation and focus on the probabilistic foundation.
- This implies a quick review of theory.
- Later we recap LS's algebra and parameter estimation.

---

## Distribution of wages (Example)

Wage is a random variable (we don't know the wage before measuring it) with **probability distribution**:

`$$F(u)=\mathbb{P}[wage \le u ]$$`

Observed wages are realizations from the distribution `\(F\)`. If it is differentiable, the PDF is:

$$ f(u) = {d \over du} F(u) $$

---

## Distribution of wages (Example)

The mean or expectation of a random variable Y with discrete support is:

`$$\mu = E[Y] = \sum_{j=1}^{\infty} \tau_{j}\mathbb{P}[Y=\tau_{j}]$$`

For continuous random variables with density `\(f(y)\)`:

$$ \mu = E[Y] = \int_{-\infty}^{\infty} yf(y)dy $$

---

## Conditional Expectation Function

The conditional expectation of log(wage) given gender, race, and education is:

$$ E[log(wage) | gender = man, race = white, education = 12 ] $$

<div class="figure" style="text-align: center">
<img src="data:image/png;base64,#CEF.png" alt=" " width="65%" />
<p class="caption"> </p>
</div>

---

## Conditional Expectation Function

In general:

`$$E[Y|X_1 = x_1 , X_2 = x_2, ..., X_k = x_k] = m(x_1,x_2, ...,x_k )$$`

We can write the conditioning variables as a vector in `\(\mathbb{R}^k\)`:

`$$X= \left( \begin{array}{c} X_1\\ X_2\\ \vdots\\ X_k \end{array} \right)$$`

The CEF:

$$m(x) = E[Y|X=x], \quad x \in \mathbb{R}^{k} $$

When `\(X=x\)`, the average value of `\(Y\)` is `\(m(x)\)`.

---

## Continuous CEF

In the previous example the conditioning variables are discrete. Yet many are continuous.

Suppose `\((Y,X)\)` are continuously distributed with a **joint density** function `\(f(y,x)\)`, with marginal density of `\(x\)`:

`$$f_X(x) = \int_{-\infty}^{\infty} f(y,x)dy$$`

The **conditional density** of Y given X is:

$$ \color{green} {f_{Y|X} (y|x)} = {f(y,x) \over f_X(x)}$$

for any `\(x\)` such that `\(f_X(x)>0\)`.

*It is divided by the marginal density `\(f_X(x)\)` so that it integrates to one.*

---

## Continuous CEF

The **conditional density** is a renormalized slice of the joint density `\(f(y,x)\)` holding `\(x\)` fixed.

<div class="figure" style="text-align: center">
<img src="data:image/png;base64,#CEFC.png" alt=" " width="90%" />
<p class="caption"> </p>
</div>

[An example code here!](https://github.com/fcabrerahz/EconometricsME/blob/main/Code/1_CEF_joint_density_routine.R)
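---

## Continuous CEF

A minimal numerical sketch of this "renormalized slice" idea (this is not the linked routine; the joint density below is an assumed example with `\(X \sim N(0,1)\)` and `\(Y \mid X = x \sim N(x,1)\)`):


``` r
# Assumed example joint density: f(y, x) = f_X(x) * f_{Y|X}(y | x)
# with X ~ N(0, 1) and Y | X = x ~ N(x, 1)
f_joint <- function(y, x) dnorm(x) * dnorm(y - x)

x0     <- 1                                # fix the slice at X = x0
y_grid <- seq(-4, 6, length.out = 2001)
dy     <- diff(y_grid)[1]

slice  <- f_joint(y_grid, x0)              # un-normalized slice of the joint density
f_X_x0 <- sum(slice) * dy                  # numeric marginal density f_X(x0)
f_cond <- slice / f_X_x0                   # renormalized slice: the conditional density

sum(f_cond) * dy                           # ~ 1: it integrates to one
sum(y_grid * f_cond) * dy                  # CEF at x0: m(x0) ~ 1 = E[Y | X = 1] here
```

The last line anticipates the next slide: integrating `\(y\)` against the conditional density gives the CEF at `\(x_0\)`.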
---

## Continuous CEF

The CEF of `\(Y\)` given `\(X=x\)` is the expectation of the conditional density (the sliced joint density):

`$$m(x) = E[Y|X=x] = \int_{-\infty}^{\infty} y \color{green}{ f_{Y|X}(y|x)} \, dy$$`

`\(m(x)\)` is the expectation of Y for the subpopulation where the conditioning variables are fixed at `\(x\)`. When X is continuously distributed this subpopulation is infinitely small.

Again: this definition is appropriate when the **marginal density of X** is "well defined". It won't work when the event `\(X=x\)` has zero probability (`\(f_X(x)=0\)`).

---

## Continuous CEF

So the CEF is the solid line. **Note it is a smooth but nonlinear function!**

<div class="figure" style="text-align: center">
<img src="data:image/png;base64,#CEFC.png" alt=" " width="90%" />
<p class="caption"> </p>
</div>

---

<style>
.centered-word {
  position: absolute;
  top: 50%;
  left: 50%;
  transform: translate(-50%, -50%);
}
</style>

<div class="centered-word">
  <h1>CEF Error</h1>
</div>

---

## CEF error

The difference between Y and the CEF evaluated at X:

`$$e = Y- m(X) \\ Y = m(X) + e$$`

*Remember the conditioning theorem: `\(E[X|X]=X\)`; `\(E[g(X)|X]=g(X)\)` for any `\(g(.)\)`.*

Conditional mean of zero:

`$$E[e|X] = E[Y- m(X)|X] \\ = E[Y|X] - E[m(X)|X] \\ = m(X) - m(X) = 0$$`

---

## CEF error

The unconditional mean is also zero.

*Remember the simple law of iterated expectations: `\(E[E[Y|X]]=E[Y]\)`.*

$$ E[E[e|X]] = E[e] = E[0] = 0$$

For any function `\(h(x)\)`, `\(E[h(X)e]=0\)`, so `\(e\)` is uncorrelated with any function of the vector X.

---

## CEF error

Considering: `\(Y = m(X) + e\)`; and `\(E[e|X]=0\)`

If `\(E[e|X]=0\)` (conditional mean restriction), then `\(m(X)\)` is the CEF of `\(Y\)` given `\(X\)`.

The conditional mean of `\(e\)` is zero and thus does not vary with X. This implies `\(e\)` is **mean independent** of `\(X\)`, whatever the value of `\(X\)`.

It does not imply that the **distribution of `\(e\)`** is independent of `\(X\)`.

*Generally, `\(e\)` and `\(X\)` are jointly dependent even when the conditional mean of `\(e\)` is 0.*

---

## CEF error

e.g. the shape of the conditional distribution varies with the level of experience.

<div class="figure" style="text-align: center">
<img src="data:image/png;base64,#CEFe.png" alt=" " width="85%" />
<p class="caption"> </p>
</div>

---

## Variance of error

The unconditional variance of the CEF error is:

`$$\sigma^{2} = var[e] = E[(e-E[e])^{2}] = E[e^{2}]$$`

`\(\sigma^{2}\)` is the variance of the regression error. This is the variation in `\(Y\)` not accounted for by `\(E[Y|X]\)`, as in `\(e=Y-E[Y|X]\)`:

<!-- **Theorem 2.4** If `\(E|Y|^{r} < \infty\)` for `\(r \ge 1\)` then `\(E|e|^{r} < \infty\)` **Theorem 2.5** if `\(E[Y^{2}] < \infty\)` then `\(\sigma^{2} < \infty\)` -->

---

## Variance of error

The error variance depends on the information in X, e.g. consider:

`$$Y = E[Y|X_1] + e_1 \\ Y = E[Y|X_1,X_2] + e_2$$`

**Theorem 2.6:**

`$$var[Y] \ge var[Y-E[Y|X_1]] \ge var[Y-E[Y|X_1, X_2]].$$`

Increasing the number of elements in the vector (X) decreases the variance of the unexplained portion of Y. Why?

---

## Variance of error

**Jensen's Inequality:** If `\(g(x)\)` is convex, for any random vector `\(X \in \mathbb{R}^k\)`:

`\(g (\mathbb{E}[X]) \le \mathbb{E}[g(X)]\)`

If `\(g(x)\)` is concave, this reverses.

*Example:* Let `\(X\)` be a random variable, and let `\(g(x) = x^2\)`, which is a convex function. We have for this specific case:

$$ g(\mathbb{E}[X]) = (\mathbb{E}[X])^2 \quad \text{and} \quad \mathbb{E}[g(X)] = \mathbb{E}[X^2] $$

The inequality becomes:

`$$(\mathbb{E}[X])^2 \leq \mathbb{E}[X^2]$$`
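---

## Variance of error

A quick simulation check of the example above (the distribution of `\(X\)` is assumed only for illustration; this is not the linked course routine):


``` r
# Minimal simulation sketch: Jensen's inequality for the convex function g(x) = x^2
set.seed(42)
x <- rexp(1e5, rate = 1)      # any random variable works; here X ~ Exponential(1)

g_of_mean <- mean(x)^2        # g(E[X]) = (E[X])^2, ~ 1 for this distribution
mean_of_g <- mean(x^2)        # E[g(X)] = E[X^2],   ~ 2 for this distribution

c(g_of_mean = g_of_mean, mean_of_g = mean_of_g,
  gap = mean_of_g - g_of_mean)   # the gap equals var(X) >= 0, so g(E[X]) <= E[g(X)]
```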
---

## Variance of error

**Some Proof**: The variance of `\(X\)` is always non-negative:

`$$\text{Var}(X) = E[X^2] - (E[X])^2 \geq 0$$`

Hence:

`$$({E}[\color{green}{X}])^2 \leq {E}[\color{green}{X}^2]$$`

[some code here!](https://github.com/fcabrerahz/EconometricsME/blob/main/Code/2_CEF_jensens_routine.R)

---

## Variance of error

Combining iterated expectations and Jensen's Inequality:

- *law of iterated expectations: `\(E[E[Y|X_1,X_2]|X_1]=E[Y|X_1]\)`*
- *conditioning theorem: `\(E[g(X)|X]=g(X)\)`*

`$$(E[\color{green}{E[Y|X_1, X_2]}|X_1])^2 \leq E[\color{green}{({E}[Y|X_1, X_2])}^2|X_1]$$`

Taking unconditional expectations:

`$$E[(\color{green}{E[Y|X_1]})^2] \leq E[\color{green}{({E}[Y|X_1, X_2])}^2]$$`

Hence:

`$$var[Y] \ge var[Y-E[Y|X_1]] \ge var[Y-E[Y|X_1, X_2]].$$`

[more code!](https://github.com/fcabrerahz/EconometricsME/blob/main/Code/3_variance_theorem.R)

---

## Conditional Variance

The conditional variance is a function of the conditioning variables. For a random variable `\(W\)` given `\(X=x\)` it is:

`$$\sigma^{2}(x) = var[W|X=x] = E[(W-E[W|X=x])^{2} | X=x]$$`

The conditional variance of a random variable `\(W\)` is the conditional second moment centered around the conditional first moment.

---

## Conditional Variance

Yet, given `\(E[e|X]=0\)`, the conditional variance of `\(e\)` is:

`\(\sigma^{2}(x) = var[e|X=x] = \color{green}{E[e^{2}|X=x]}\)`

This is the conditional mean of `\(e^2\)` given X. If `\(X\)` is treated as random: `\(var [e|X] = \sigma^{2}(X)\)`

---

## Conditional Variance

The conditional variance of `\(e\)` is equivalent to `\(\sigma^2(x) = var[Y|X=x]\)` (the conditional variance of the dependent variable).

An important special case for the variance of `\(e\)` is when `\(\sigma^{2}(x)\)` is constant and independent of x (homoskedasticity): `\(\sigma^{2}(x)= \sigma^{2}\)`

The general case is heteroskedasticity (depending on x).

---

## Conditional Variance

The difference between the log-wage densities of men and women is not purely a location shift; it is also a difference in spread.

<div class="figure" style="text-align: center">
<img src="data:image/png;base64,#CEFC.png" alt=" " width="90%" />
<p class="caption"> </p>
</div>

---

## Unconditional and Conditional Variance

As is commonly known, we can decompose the variance as:

`$$var[X] = \color{green}{E[var[X|W]]}+var[E[X|W]]$$`

`\(\color{green}{E[var[X|W]]}\)` relates to how much `\(X\)` fluctuates within each fixed value of `\(W\)`.

`\(var[E[X|W]]\)` relates to how the mean of `\(X\)` changes between different values of `\(W\)`.

Yet in the case of the error, its conditional mean is `\(E[e|X]=0\)`. Hence: `\(var[e]= \color{green}{E[var[e|X]]}\)`

---

## Unconditional and Conditional Variance

Given:

`$$var[e]= \color{green}{E[var[e|X]]}$$`

We have shown:

`$$var[e]=\sigma^2=\color{green}{E[e^2]}$$`

So `\(var[e]= E[\color{green}{E[e^2}|X]]=E[e^2]\)`, by the law of iterated expectations.

The average of the conditional error variance is the unconditional error variance.

*.blue[*Ask GPT for a data demonstration in R.*]
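---

## Unconditional and Conditional Variance

A minimal sketch along those lines, with an assumed heteroskedastic data-generating process (not official course code): since `\(E[e|X]=0\)`, the unconditional variance of `\(e\)` equals the average of the conditional variances.


``` r
# Assumed DGP: E[e | X] = 0 but var(e | X = x) depends on x (heteroskedasticity)
set.seed(7)
n <- 1e5
x <- sample(0:2, n, replace = TRUE)          # discrete conditioning variable
e <- rnorm(n, mean = 0, sd = c(0.5, 1, 2)[x + 1])

cond_var <- tapply(e, x, var)                # var(e | X = x) for x = 0, 1, 2
p_x      <- as.numeric(table(x)) / n         # P(X = x)

c(var_e      = var(e),                       # unconditional var[e]
  E_cond_var = sum(cond_var * p_x),          # E[ var(e | X) ]
  E_e2       = mean(e^2))                    # E[e^2]; all three are ~ 1.75 here
```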
---

<style>
.centered-word {
  position: absolute;
  top: 50%;
  left: 50%;
  transform: translate(-50%, -50%);
}
</style>

<div class="centered-word">
  <h1>CEF and Linear Regression</h1>
</div>

---

## Derivative

We can express `\(m(x) = E[Y|X=x]\)` in terms of marginal changes `\(\Delta m(x)\)`.

The marginal effect of a change in `\(X_1\)`, holding the rest constant:

`$${\partial \over \partial x_1} m(x_1,...,x_k)$$`

When `\(X_1\)` is discrete (binary):

`$$m(1, x_2,...,x_k) - m(0,x_2,..., x_k)$$`

---

## Derivative

We can collect the `\(k\)` effects into one `\(k \times 1\)` vector of derivatives:

`$$\Delta m(x) = \left( \begin{array}{c} \Delta_1 m(x)\\ \Delta_2 m(x)\\ \vdots\\ \Delta_k m(x) \end{array} \right)$$`

This holds constant the variables included in the conditional mean (not quite the same as *ceteris paribus*).

**The regression derivative is the change in E(Y|X).**

The change in **the actual value of Y** happens only if `\(e\)` is unaffected by the change in X: `\(E(eX)=0\)`

---

## Linear CEF (linear regression)

If `\(m(x)= E(Y|X=x)\)` is linear in `\(x\)` we can write:

`$$m(x) = x_1 \beta_1 + x_2 \beta_2 + ... + x_k \beta_k + \beta_{k+1}$$`

Same as with the `\(k \times 1\)` vector:

`$$X = \left( \begin{array}{c} X_1 \\ \vdots\\ X_{k-1}\\ 1\\ \end{array} \right)$$`

With this definition: `\(m(x)= x_1\beta_1 + x_2\beta_2 + ... + \beta_k = x'\beta\)` (*)

---

## Linear CEF (linear regression)

with a `\(k \times 1\)` coefficient vector (the regression derivative):

`$$\beta = \left( \begin{array}{c} \beta_1 \\ \beta_2 \\ \vdots\\ \beta_k \end{array} \right)$$`

This is the **linear CEF model**, better known as the linear regression model, or the regression of Y on X.

In the linear CEF model the regression derivative is simply the coefficient vector.

---

## Linear CEF (linear regression)

Linear Homoskedastic CEF model:

`\(Y= X' \beta + e\)`; `\(E[e|X] = 0\)`; `\(E[e^2|X] = \sigma^2\)`

Linear Heteroskedastic CEF model:

`\(Y= X' \beta + e\)`; `\(E[e|X] = 0\)`; `\(E[e^2|X] = \sigma^2(X)\)`

---

## CEF as best Predictor

Any predictor of Y can be expressed as a function `\(g(X)\)` of `\(X\)`. The prediction error: `\(Y - g(X)\)`.

A non-stochastic measure of the magnitude of the prediction error is the MSE:

`$$E[(Y - g(X))^2]$$`

The best predictor is the function `\(g(X)\)` minimizing the MSE, regardless of the joint distribution of `\((Y,X)\)`.

---

## CEF as best Predictor (Demonstration)

$$E [(\color{green}{Y} - g(X))^2] = $$

$$E[(\color{green}{e + m(X)} - g(X))^2] = $$

`$$E[e^2] + 2E[e(m(X)-g(X))]+E[(m(X)-g(X))^2]$$`

By the property of the CEF error, `\(E[h(X)e]=0\)`, the middle term vanishes:

`$$E[e^2] + E[(m(X)-g(X))^2] \ge \color{green}{E[e^2]}$$`

where `\(E[e^2] \equiv E[(Y - m(X))^2]\)` by definition.

If `\(E[Y^2]<\infty\)` then for any `\(g(X)\)`, `\(E[(Y - g(X))^2] \ge \color{green}{E[(Y - m(X))^2]}\)`

Only when `\(g(x)=m(x)\)` is the bound attained: the CEF is the best predictor.

---

## Best Linear Predictor

`\(m(x) = E(Y|X)\)` is the best predictor of Y among all functions of X. Yet **its functional form is unknown**.

The CEF is unlikely to be estimated accurately unless `\(X\)` is discrete, low-dimensional, and *all interactions* are included.

**It is more realistic to assume a linear predictor as an approximation.**

So the linear CEF model is a linear approximation of the population change in actual Y given X, provided `\(E(e)=0\)`, `\(E(Xe)=0\)`, and `\(E(h(X)e)=0\)`.

We then estimate a linear projection, which we can model with data via OLS, MLE, or another method.
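---

## Best Linear Predictor

A simulation sketch of both ideas (the quadratic data-generating process is assumed for illustration): the CEF attains the lowest MSE among all functions of `\(X\)`, and the linear projection is the best we can do within linear functions, even though it may remain far from the CEF.


``` r
# Assumed DGP: the true CEF is m(x) = x^2 and the error variance is 1
set.seed(2025)
n <- 1e5
x <- runif(n, -2, 2)
y <- x^2 + rnorm(n)

mse <- function(pred) mean((y - pred)^2)     # mean squared prediction error

c(cef          = mse(x^2),                   # ~ 1.0: the CEF attains the minimum MSE
  best_linear  = mse(fitted(lm(y ~ x))),     # ~ 2.4: best linear predictor of Y given X
  other_linear = mse(0.5 + x),               # ~ 4.5: any other linear function does worse
  other_g      = mse(abs(x)))                # ~ 1.5: a nonlinear g(X) that is not the CEF
```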
---

## Best Linear Predictor

We can define an approximation to the CEF as the linear function with the lowest MSE.

**Assumptions**:

1. `\(E[Y^2] < \infty\)`
2. `\(E[||X||^2] < \infty\)`
3. `\(Q_{XX} = E[XX']\)` is positive definite

These imply that Y and X have finite means, variances and covariances, and that `\(Q_{XX}\)` is invertible.

---

## Best Linear Predictor

A linear predictor for `\(Y\)` is a function `\(X' \beta\)` for some `\(\beta \in \mathbb{R}^k\)`.

The best linear predictor of Y given X is `\(\mathcal{P}[Y|X]=X'\beta\)`

`\(\beta\)` minimizes the mean squared prediction error, MSPE:

`$$S(\beta)=E[(Y-X'\beta)^2]$$`

The minimizer `\(\beta = \underset{b \in \mathbb{R}^k}{argmin} \ S(b)\)` is called the **linear projection coefficient**.

---

## Best Linear Predictor

Calculating an explicit expression for its value:

- We can write the MSPE as a quadratic function of `\(\beta\)`, to solve explicitly for the minimizer:

`$$S(\beta) = E[Y^2] - 2\beta'E[XY] + \beta'E[XX']\beta$$`

- FOC for minimization:

`$$0 = {\partial \over \partial \beta} S(\beta) = -2E[XY] + 2E[XX']\beta$$`

`$$2E[XY] = 2E[XX']\beta$$`

`$$Q_{XY} = Q_{XX}\beta$$`

`$$\color{green}{\beta = Q_{XX}^{-1} Q_{XY}}$$`

---

## Best Linear Predictor

`$$\color{green}{\beta = Q_{XX}^{-1} Q_{XY}}$$`

Equivalently:

`$$\beta=(E[XX'])^{-1} E[XY]$$`

`\(Q_{XX}\)` is a `\(k\times k\)` matrix and `\(Q_{XY}\)` is a `\(k \times 1\)` column vector.

If `\(Q_{XX}\)` is not invertible, there are multiple solutions for `\(\beta\)`.

The best linear predictor is thus:

`$$\mathcal{P}[Y|X]=X'\beta= X'(E[XX'])^{-1} E[XY]$$`

Also known as the **linear projection** of Y on X.

---

## Best Linear Predictor

The projection error is: `\(e = Y - X'\beta\)` ; `\(Y = X'\beta + e\)`

This equation is the best linear predictor of Y given X, or the linear projection of Y on X.

Economists call it: **"The Regression"**.

So the linear CEF model, in theory, is called the linear regression model, in practice.

---

## Best Linear Predictor

`\(Y = X'\beta + e\)`

An important property is: `\(E[Xe] = 0\)`

Proof: using the properties `\(AA^{-1} = I\)` and `\(Ia=a\)`

`$$E[Xe] = E[X(Y-X'\beta)]$$`

`$$= E[XY] - E[XX'](E[XX'])^{-1}E[XY] = 0$$`

---

## Best Linear Predictor

This is a set of `\(k\)` equations, one for each regressor: `\(E[X_{j} e] = 0\)` for `\(j= 1, ..., k\)`; with the constant `\(X_k=1\)`: `\(E[e]=0\)`

**The projection error is mean-zero when a constant is included.**

Since `\(cov(X_j, e) = E[X_{j} e] - E[X_{j}] E[e]=0\)`: `\(X_j\)` and `\(e\)` are uncorrelated.

Summing up, for any random (Y,X) with *finite* variances, we can write `\(Y = X'\beta + e\)`

.red[Remember: this `\(\beta\)` is defined as the best linear **predictor**; it is NOT necessarily a parameter of a structural or causal economic model.]

---

## Illustration for Best Linear Predictor

The (full) CEF of log(wage) as a function of black and female:

`$$\small E[log(wage) | black, female] = -0.20 black - 0.24 female + 0.10 black \times female +3.06$$`

Now consider the linear projection:

$$ \mathcal{P}[log(wage) | black, female] = -0.15 black - 0.23 female + 3.06$$

The (full) CEF shows that the race gap varies by gender: 20% for black men and 10% for black women. The projection model approximates this with an average gap of 15% for black workers, regardless of gender.
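---

## Illustration for Best Linear Predictor

A quick sketch with simulated data (the wage-style coefficients and variable names below are made up for illustration) showing that the population formula `\(\beta = Q_{XX}^{-1}Q_{XY}\)`, computed with sample moments, is what `lm()` delivers:


``` r
# Simulated data; the DGP is assumed only to illustrate beta = Qxx^{-1} Qxy
set.seed(11)
n <- 10000
educ  <- rnorm(n, 12, 3)
exper <- rnorm(n, 10, 5)
y     <- 1.5 + 0.10 * educ + 0.02 * exper + rnorm(n)

X   <- cbind(educ, exper, const = 1)   # regressor vector with the constant included
Qxx <- crossprod(X) / n                # sample analogue of E[X X']
Qxy <- crossprod(X, y) / n             # sample analogue of E[X Y]
beta <- solve(Qxx, Qxy)                # beta = Qxx^{-1} Qxy

drop(beta)                             # educ ~ 0.10, exper ~ 0.02, constant ~ 1.5
coef(lm(y ~ educ + exper))             # same values; lm() lists the intercept first
```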
---

## Illustration for Best Linear Predictor

Linear projection of wages on years of education:

$$\mathcal{P}[log(wage) | education] = 0.11 education + 1.5 $$

<div class="figure" style="text-align: center">
<img src="data:image/png;base64,#BLP_ex.png" alt=" " width="33%" />
<p class="caption"> </p>
</div>

Spline approximation:

`$$\mathcal{P}[log(wage)|education, (education-9) \times \mathbb{1} \{education > 9\}]$$`

`$$=0.02 education +0.10 (education-9) \times \mathbb{1} \{education > 9\} +2.3$$`

---

## Illustration for Best Linear Predictor

A linear projection can be a poor approximation to the CEF.

<div class="figure" style="text-align: center">
<img src="data:image/png;base64,#bad_LP.png" alt=" " width="60%" />
<p class="caption"> </p>
</div>

*Write the estimated equations, depicted in the figure, for 12 years of education.*

---

## Illustration for Best Linear Predictor


``` r
set.seed(123)                 # For reproducibility
n <- 500
X <- runif(n, -2, 2)          # Generate X uniformly from -2 to 2
epsilon <- rnorm(n)           # Random noise
Y <- X^2 + epsilon            # Nonlinear relationship: the CEF is X^2

true_cef <- function(x) x^2
linear_model <- lm(Y ~ X)

# Create a grid for plotting
grid_X <- seq(min(X), max(X), length.out = 500)
pred_linear <- predict(linear_model, newdata = data.frame(X = grid_X))

# Plot the results
plot(X, Y, pch = 16, col = rgb(0, 0, 1, 0.5), xlab = "X", ylab = "Y",
     main = "CEF vs Linear Projection")
lines(grid_X, true_cef(grid_X), col = "green", lwd = 2, lty = 2)
lines(grid_X, pred_linear, col = "red", lwd = 2, lty = 1)
legend("topright", legend = c("True CEF: E[Y|X]", "Linear Projection"),
       col = c("green", "red"), lty = c(2, 1), lwd = 2, bty = "n")
```

---

## Illustration for Best Linear Predictor

<img src="data:image/png;base64,#CEF_v1_files/figure-html/unnamed-chunk-11-1.png" style="display: block; margin: auto;" />

---

## Linear predictor error variance

As in the CEF, `\(\sigma^{2} = E[e^{2}]\)`:

`$$\sigma^{2} = E[(Y - X'\beta)^{2}]$$`

`$$= E[Y^{2}] - 2E[YX'] \color{Green}\beta + \color{blue}\beta'E[XX'] \color{Green}\beta$$`

`$$=Q_{YY}-2Q_{YX} \color{Green}{ Q_{XX}^{-1}Q_{XY}}+\color{Blue}{Q_{YX}Q_{XX}^{-1}}Q_{XX}\color{Green}{Q_{XX}^{-1}Q_{XY}}$$`

`$$=Q_{YY}-Q_{YX}Q_{XX}^{-1}Q_{XY}$$`

`\(Q_{YY}\)` represents the total variability of Y.

`\(Q_{YX}Q_{XX}^{-1}Q_{XY}\)` represents the variability explained by the linear projection of Y on X.

---

## Omitted Variable Bias (OVB)

The projection of Y on X is: `\(Y=\color{Green}{X'_1 \beta_1 + X'_2 \beta_2 + e}\)`; `\(E[Xe]=0\)`

Consider now the projection of `\(Y\)` on `\(X_1\)` only: `\(Y=X'_1 \gamma_1+ u\)`; `\(E[X_1u]=0\)`

We calculate:

`$$\gamma_1=(E[X_1X'_1])^{-1}E[X_1Y]$$`

`$$=(E[X_1X'_1])^{-1}E[X_1(\color{Green}{X'_1\beta_1 + X'_2\beta_2 + e})]$$`

`$$=\beta_1 + (E[X_1X'_1])^{-1} E[X_1 X'_2]\beta_2$$`

Where `\((E[X_1X'_1])^{-1} E[X_1 X'_2] = \Gamma_{12}\)` is the coefficient matrix from a projection of `\(X_2\)` on `\(X_1\)`.

Unless `\(\Gamma_{12} = 0\)` or `\(\beta_2=0\)`, `\(\beta_1 \ne \gamma_1\)` and there is an OVB.

---

<style>
.centered-word {
  position: absolute;
  top: 50%;
  left: 50%;
  transform: translate(-50%, -50%);
}
</style>

<div class="centered-word">
  <h1>The End</h1>
</div>