
class: center, middle
# Topic 6. Heteroskedasticity
### Econometrics
#### B.A. in Economics
#### Dr. Francisco J. Cabrera-Hernández
Fall 2025
##### CIDE Santa Fe, Ciudad de México.

---
## Outline

- **.blue[Heteroskedasticity-Robust Inference]**

- Tests for Heteroskedasticity.

- Weighted Least Squares.

- Feasible Generalized Least Squares.

- Cluster-robust standard errors.

---
## Recall:

- Under MLR1 to MLR4, `$\beta$` can be estimated without bias.

- Under MLR5: `$var(u|x_1, x_2, \ldots, x_k) = \sigma^2$`

- Under MLR6: the errors are normally distributed, `$u \sim N(0,\sigma^2)$`, independently of `$x_1, x_2, \ldots, x_k$`

- **The F-test and t-test are valid under MLR5 and MLR6.**

- Critical values are obtained under the null hypothesis; in large samples they come from a standard normal distribution.

---
## Pure and Impure Heteroskedasticity

**Pure heteroskedasticity:**

- Occurs when MLR5 is violated *while MLR1-MLR4 hold* (i.e., the model is correctly specified).

**Impure heteroskedasticity:**

- Arises from a specification error (e.g., omitted variable bias).

- For example, think of the relationship between income and education when the industry of employment is not controlled for.

- "The portion of the omitted effect *not represented by one of the included explanatory variables* must be absorbed by the error term".

---
## Consequences of heteroskedasticity

- OLS is still unbiased and consistent under heteroskedasticity.

- `$R^2$` and its interpretation do not change:

`$$R^2 = 1 - \frac {\sigma^2_u} {\sigma^2_y}$$`
- `$R^2$` uses `$\sigma^2_u$`, the variance of the error unconditional on `$x_j$`, which is not affected by heteroskedasticity.

`$$SSR/n \to \sigma^2_u; \quad SST/n \to \sigma^2_y$$`

- The conditional variance `$var(u|x_1, x_2, \ldots, x_k)$` is affected by heteroskedasticity: it is no longer equal to a constant `$\sigma^2$`.

---
## Consequences of heteroskedasticity

- Heteroskedasticity affects the computation of `$var(\hat\beta_1)$`.

- As mentioned, the F-test and t-test are not valid if MLR5 is violated.

- Without MLR5, OLS is no longer **BLUE**: it is not the most efficient linear unbiased estimator.

- OLS is no longer asymptotically efficient.

---
## Consequences of heteroskedasticity

Recall that, as `$n \to \infty$` and under MLR5:

`$$\hat{var}(\hat \beta_1) = \frac {\hat\sigma^2} {\sum_{i=1}^n (x_i - \bar x)^2}$$`

- `$\hat{var}(\hat\beta_1)$` shrinks at rate `$1/n$`.

- `$\hat\beta_1$` converges to `$\beta_1$` with more or less variance: this is what the **efficiency of the estimator** refers to.

Without MLR5 and conditional on `$x_i$`:

`$$var(\hat \beta_1) = \frac {\sum_{i=1}^n (x_i - \bar x)^2 \sigma_i^2} {[\sum_{i=1}^n (x_i - \bar x)^2]^2}$$`

---
## Efficiency under homoskedasticity

``` r
library(sandwich)  # provides vcovHC()

repet <- 5000
beta_1_true <- 2
running_variances <- rep(NA_real_, repet)

# Set seed for reproducibility
set.seed(123456)

# For each sample size i, simulate data and store the robust variance of the slope
for (i in 50:repet) {
  x <- rnorm(i)  # Regressor x
  u <- rnorm(i, sd = 2)  # Homoskedastic error with constant sd = 2
  y <- 2 + beta_1_true * x + u      # Define y, with beta_1 = 2
  model <- lm(y ~ x)  # Fit OLS on a sample of size i
  # Heteroskedasticity-robust (HC1) variance of the slope estimate
  robust_vcov <- vcovHC(model, type = "HC1")
  running_variances[i] <- diag(robust_vcov)["x"]
}

# we then plot the variance estimates against the sample size...
```

---
## Efficiency under homoskedasticity

---
## Efficiency under heteroskedasticity

``` r
library(sandwich)  # provides vcovHC()

repet <- 5000
beta_1_true <- 2
running_variances <- rep(NA_real_, repet)

# Set seed for reproducibility
set.seed(123456)

# For each sample size i, simulate data and store the robust variance of the slope
for (i in 50:repet) {
  x <- rnorm(i)  # Regressor x
  u <- rnorm(i, sd = 2 + abs(x))  # Heteroskedastic error: sd grows with |x|
  y <- 2 + beta_1_true * x + u      # Define y, with beta_1 = 2
  model <- lm(y ~ x)  # Fit OLS on a sample of size i
  # Heteroskedasticity-robust (HC1) variance of the slope estimate
  robust_vcov <- vcovHC(model, type = "HC1")
  running_variances[i] <- diag(robust_vcov)["x"]
}

# we then plot the robust variance estimates against the sample size...
```

---
## Efficiency under heteroskedasticity

---
## Heteroskedasticity-Robust SE

- With small sample sizes, robust t statistics can have distributions that are not well approximated by the t distribution.

- Robust standard errors and robust t statistics are only justified when the sample size is large.

- The "usual" (non-robust) standard errors are still used because, under MLR5 and MLR6, the usual t statistics follow a t distribution exactly.

- The heteroskedasticity-robust version of the F test has no simple closed form, although it can be computed with statistical software.

---
## Derivation of `$\hat{var}(\hat\beta_1)$`

`$$\hat\beta_1= \frac{\sum_{i=1}^n (x_i-\bar x)(y_i-\bar y)}{\sum_{i=1}^n (x_i-\bar x)^2}$$`
`$$\hat\beta_1= \frac{\sum_{i=1}^n (x_i-\bar x)y_i}{\sum_{i=1}^n (x_i-\bar x)^2}$$`
`$$\hat\beta_1= \frac{\sum_{i=1}^n (x_i-\bar x)(\beta_0+\beta_1x_i+\mu_i)}{\sum_{i=1}^n (x_i-\bar x)^2}$$`
`$$\hat\beta_1= \frac{\beta_1\sum_{i=1}^n (x_i-\bar x)x_i+ \sum_{i=1}^n (x_i-\bar x)\mu_i}{\sum_{i=1}^n (x_i-\bar x)^2}$$`
`$$var(\hat\beta_1)= var \left( \beta_1 + \frac{\sum_{i=1}^n (x_i-\bar x)\mu_i}{\sum_{i=1}^n (x_i-\bar x)^2} \right)$$`

---
## Derivation of `$\hat{var}(\hat\beta_1)$`

`$$var(\hat\beta_1)=  \frac{var\left(\sum_{i=1}^n (x_i-\bar x)\mu_i\right)}{[\sum_{i=1}^n (x_i-\bar x)^2]^2}$$`
With independent errors...

`$$var(\hat\beta_1)=  \frac{\sum_{i=1}^n var\left((x_i-\bar x)\mu_i\right)}{[\sum_{i=1}^n (x_i-\bar x)^2]^2}$$`
Under heteroskedasticity...
`$$var(\hat\beta_1)=  \frac{\sum_{i=1}^n (x_i-\bar x)^2var(\mu_i)}{[\sum_{i=1}^n (x_i-\bar x)^2]^2}$$`
`$$\hat{var}(\hat\beta_1)=  \frac{\sum_{i=1}^n (x_i-\bar x)^2 \hat\mu_i^2}{[\sum_{i=1}^n (x_i-\bar x)^2]^2}$$`
---
## Derivation of `$\hat{var}(\hat\beta_1)$`

`$$\hat{var}(\hat\beta_1)=  \frac{\sum_{i=1}^n (x_i-\bar x)^2 \hat\mu_i^2}{[\sum_{i=1}^n (x_i-\bar x)^2]^2}$$`

- In matrix terms:

`$$\widehat{Avar}(\hat\beta) = (X'X)^{-1} \left(\sum_{i=1}^n\hat\mu_i^2 x_i'x_i \right)(X'X)^{-1}$$`
- This is the "sandwich" estimator (White, HC0).

---
## Heteroskedasticity-Robust SE

- White (1980) showed that a valid estimator of `$var(\hat\beta_1)$`, under heteroskedasticity of any form (including homoskedasticity), is:

`$$var(\hat \beta_1) = \frac {\sum_{i=1}^n (x_i - \bar x)^2 \sigma_i^2} {[\sum_{i=1}^n (x_i - \bar x)^2]^2} \to \hat{var}(\hat \beta_1) = \frac {\sum_{i=1}^n (x_i - \bar x)^2 \hat\mu_i^2} {SST^2_x}$$`

- It is computed from the data after the OLS regression.

---
## White SE in matrix form

`$y \sim N(\mu, \sigma^2) ; \mu = X\beta ; y \sim N(X\beta, \Sigma)$`, where `$\Sigma=\sigma^2I$` under homoskedasticity.

`$y: n\times1$`; `$X: n\times k$`; `$\beta: k\times 1$`; `$\Sigma:n \times n$`

- Variance-covariance matrix (without assumptions)

`$$\Sigma = \begin{pmatrix}
\sigma_{11}^2 & \sigma_{12} & \sigma_{13} & \cdots & \sigma_{1n} \\
\sigma_{21} & \sigma_{22}^2 & \sigma_{23} & \cdots & \sigma_{2n} \\
\sigma_{31} & \sigma_{32} & \sigma_{33}^2 & \cdots & \sigma_{3n} \\
\vdots & \vdots & \vdots & \ddots & \vdots \\
\sigma_{n1} & \sigma_{n2} & \sigma_{n3} & \cdots & \sigma_{nn}^2
\end{pmatrix}$$`

---
## White SE in matrix form

- Assuming independence:
`$$\Sigma = \begin{pmatrix}
\sigma_{1}^2 & 0 & 0 & \cdots & 0 \\
0 & \sigma_{2}^2 & 0 & \cdots & 0 \\
0 & 0 & \sigma_{3}^2 & \cdots & 0 \\
\vdots & \vdots & \vdots & \ddots & \vdots \\
0 & 0 & 0 & \cdots & \sigma_{n}^2
\end{pmatrix}$$`

- Assuming homoskedasticity:

`$$\Sigma = \begin{pmatrix}
\sigma^2 & 0 & 0 & \cdots & 0 \\
0 & \sigma^2 & 0 & \cdots & 0 \\
0 & 0 & \sigma^2 & \cdots & 0 \\
\vdots & \vdots & \vdots & \ddots & \vdots \\
0 & 0 & 0 & \cdots & \sigma^2
\end{pmatrix} = \sigma^2
\begin{pmatrix}
1 & 0 & 0 & \cdots & 0 \\
0 & 1 & 0 & \cdots & 0 \\
0 & 0 & 1 & \cdots & 0 \\
\vdots & \vdots & \vdots & \ddots & \vdots \\
0 & 0 & 0 & \cdots & 1
\end{pmatrix} = \sigma^2 I$$`

---

## Variance of `$\hat\beta$` in matrix form:

- Recall:

$$
\hat{\beta} = (X'X)^{-1}X'y
$$

- Substituting `$y = X\beta + u$`:

$$
\hat{\beta} - \beta = (X'X)^{-1}X'u
$$

- Variance conditional on `$X$`:

$$
Var(\hat{\beta}|X) = Var\left[(X'X)^{-1}X'u \,\big|\, X\right]
$$

- Since `$Var(AZ) = A\,Var(Z)\,A'$`, we obtain:

$$
Var(\hat{\beta}|X) = [(X'X)^{-1} X'] Var(u|X) [(X'X)^{-1}X']'
$$

---

## Variance of `$\hat\beta$` in matrix form:

Under homoskedasticity and independent errors:

`$Var(u|X) = \sigma^2 I$`, so:

$$
Var(\hat{\beta}|X) = [(X'X)^{-1} X'] \sigma^2 I [(X'X)^{-1}\color{green}{X'}]'
$$
- Since `$[A\color{green}{B}]' = [\color{green}{B'}A']$` and `$[B']' = B$`:  
$$
Var(\hat{\beta}|X) = [(X'X)^{-1} X'] \sigma^2 I [\color{green}{X}[(X'X)^{-1}]']
$$
- Since `$[(X'X)^{-1}]'= (X'X)^{-1}$`:

$$
Var(\hat{\beta}|X) = \sigma^2 (X'X)^{-1} X' I X(X'X)^{-1}
$$
Therefore:
`$$Var(\hat\beta|X) = \sigma^2(X'X)^{-1}$$`

---
## White SE in matrix form

Under heteroskedasticity:

`$$var(\hat\beta) = [(X'X)^{-1}X']\Sigma[X(X'X)^{-1}]$$`

- `$\Sigma$` is `$n \times n$`, and its `$n \times n$` elements cannot be estimated with `$n$` observations.

But `$X'\Sigma X$` is `$(k \times n)(n \times n)(n \times k) = k \times k$`, which can be estimated.

- Instead of estimating the full variance-covariance matrix, its diagonal is approximated with the squared residuals, weighted by each `$x_i'x_i$`.

`$$\hat{var}(\hat\beta) = (X'X)^{-1}\left(\sum_{i=1}^n\hat\mu_i^2x_i'x_i\right)(X'X)^{-1}$$`
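
The sandwich formula can be sketched in a few lines of base R. This is only an illustration on simulated data: the data-generating process, the seed and all variable names are assumptions, not part of the course material. It builds the HC0 matrix by hand and compares the slope entry with the scalar formula from the derivation.

``` r
# Minimal sketch (base R only): HC0 sandwich estimator computed by hand.
set.seed(1)                          # hypothetical simulated data
n <- 500
x <- rnorm(n)
u <- rnorm(n, sd = 1 + abs(x))       # error sd depends on x (heteroskedastic)
y <- 2 + 2 * x + u

fit   <- lm(y ~ x)
X     <- model.matrix(fit)           # n x k design matrix (includes intercept)
u_hat <- resid(fit)

bread <- solve(crossprod(X))         # (X'X)^{-1}
meat  <- crossprod(X * u_hat)        # sum_i u_hat_i^2 * x_i' x_i
V_hc0 <- bread %*% meat %*% bread    # the "sandwich"

# Scalar formula for the slope, for comparison
sst_x   <- sum((x - mean(x))^2)
v_slope <- sum((x - mean(x))^2 * u_hat^2) / sst_x^2

se_robust <- sqrt(diag(V_hc0))       # robust standard errors
se_usual  <- sqrt(diag(vcov(fit)))   # usual standard errors (assume MLR5)
```

In this simple regression the `[2, 2]` entry of `V_hc0` coincides with `v_slope`, which is the scalar expression derived earlier.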

---
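## Robust SE for `$\beta_j$`:

The White, Huber, Eicker standard errors:

`$$\hat{var}(\hat\beta_j)= \frac{\sum_{i=1}^n \hat{r}_{ij}^2 \hat\mu_i^2}{SSR_j^2}$$` 
- `$\hat{r}_{ij}$`: the `$i$`-th residual from regressing `$x_j$` on all the other regressors (partialling out); `$SSR_j$` is the sum of squared residuals from that regression.

- Each squared residual `$\hat\mu_i^2$` is weighted by the variation in `$x_j$` that is not explained by the other regressors.

- Since `$SSR_j = SST_j(1-R_j^2)$`, the denominator can be written as `$[SST_j(1-R_j^2)]^2$`, which makes the role of multicollinearity explicit.

- HC1 multiplies this expression by the degrees-of-freedom correction `$n/(n-k-1)$`.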
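
The partialling-out formula can be checked numerically. The sketch below uses simulated data with two regressors (all names and parameter values are illustrative assumptions) and compares the formula for `$\hat\beta_1$` with the corresponding diagonal entry of the matrix sandwich estimator, without any degrees-of-freedom correction.

``` r
# Minimal sketch (base R only): robust variance of one coefficient
# via partialling out, compared with the matrix sandwich (HC0).
set.seed(2)                              # hypothetical simulated data
n  <- 400
x1 <- rnorm(n)
x2 <- 0.5 * x1 + rnorm(n)                # x2 is correlated with x1
u  <- rnorm(n, sd = exp(0.3 * x1))       # heteroskedastic error
y  <- 1 + 0.5 * x1 - 0.3 * x2 + u

fit   <- lm(y ~ x1 + x2)
u_hat <- resid(fit)

# Residuals from regressing x1 on the other regressors (here only x2)
aux_1 <- lm(x1 ~ x2)
r1    <- resid(aux_1)
SSR_1 <- sum(r1^2)
v_fwl <- sum(r1^2 * u_hat^2) / SSR_1^2   # formula on the slide, j = 1

# Matrix sandwich for comparison
X     <- model.matrix(fit)
bread <- solve(crossprod(X))
V     <- bread %*% crossprod(X * u_hat) %*% bread

# SSR_1 = SST_1 * (1 - R_1^2)
SST_1 <- sum((x1 - mean(x1))^2)
R2_1  <- summary(aux_1)$r.squared
```

Here `v_fwl` equals `V["x1", "x1"]`, and `SSR_1` equals `SST_1 * (1 - R2_1)`, so a high `$R_j^2$` inflates the variance.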

---
## Example: Heteroskedasticity-Robust SE

- Ideally, both sets of standard errors (under MLR5 and HC1) should be estimated and reported, although in practice this is rarely done.

- If they differ substantially, there are problems of heteroskedasticity and/or sample size.

---
## Outline

- Heteroskedasticity-Robust Inference

- **.blue[Tests for Heteroskedasticity.]**

- Weighted Least Squares.

- Feasible Generalized Least Squares.

- Cluster-robust standard errors.

---
## Tests for heteroskedasticity

**Heteroskedasticity can take many forms**, so there is no single definitive test.

**These tests check for pure heteroskedasticity.**

Before applying a test:

- Are there obvious specification errors?

- Is the phenomenon under study likely to exhibit heteroskedasticity?  
 
- Does the residual plot show evidence of heteroskedasticity?
 
**Homoskedasticity is the exception.**

---

## Breusch-Pagan

The **Breusch–Pagan** test evaluates whether the error variance depends on the explanatory variables:

$$
H_0: Var(u|x_1, x_2, \ldots, x_k) = Var(u|x) = \sigma^2
$$

Recall that, since `$E(u|x) = 0$`:

$$
Var(u|x) = E(u^2|x) - [E(u|x)]^2 = E(u^2|x)
$$

Therefore, under the null hypothesis:

$$
E(u^2|x_1, \ldots, x_k) = E(u^2) = \sigma^2
$$

That is, the expected value of `$u^2$` **must not vary with** the explanatory variables.

---
## Breusch–Pagan test

$$
H_0: E(u^2|x_1, x_2, \ldots, x_k) = E(u^2) = \sigma^2
$$

We estimate:

`$$\hat{u}_i^2 = \delta_0 + \delta_1 x_{1i} + \cdots + \delta_k x_{ki} + error_i$$`

and test:

$$
H_0: \delta_1 = \delta_2 = \cdots = \delta_k = 0
$$
---
## Breusch–Pagan test

The **F** statistic is computed as:

`$$F = \frac{R_{\hat{u}^2}^2 / k}{(1 - R_{\hat{u}^2}^2) / (n - k - 1)}$$`

If the F statistic is large (i.e., a high `$R^2$` in the auxiliary regression), `$H_0$` is rejected.

An alternative is the **LM (Lagrange Multiplier) statistic**:

$$
LM = n \cdot R_{\hat{u}^2}^2 \sim \chi_k^2
$$

Large values of `$LM$` also lead to rejecting the null hypothesis of homoskedasticity.
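
Both versions of the test can be computed by hand from the auxiliary regression. The following is a minimal base R sketch on simulated data; the data-generating process and variable names are assumptions made only for illustration.

``` r
# Minimal sketch (base R only): Breusch-Pagan F and LM statistics by hand.
set.seed(3)                              # hypothetical simulated data
n  <- 300
k  <- 2                                  # number of explanatory variables
x1 <- rnorm(n)
x2 <- runif(n)
u  <- rnorm(n, sd = 0.5 + x2)            # error sd increases with x2
y  <- 1 + x1 + x2 + u

u2  <- resid(lm(y ~ x1 + x2))^2          # squared OLS residuals
aux <- lm(u2 ~ x1 + x2)                  # auxiliary regression
R2  <- summary(aux)$r.squared

F_stat <- (R2 / k) / ((1 - R2) / (n - k - 1))
LM     <- n * R2

p_F  <- pf(F_stat, df1 = k, df2 = n - k - 1, lower.tail = FALSE)
p_LM <- pchisq(LM, df = k, lower.tail = FALSE)
```

The hand-computed `F_stat` is the overall F statistic that `summary(aux)` reports for the auxiliary regression.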

---
## White test

An auxiliary regression of the squared residuals on the explanatory variables, their squares and their interactions is estimated (here with three regressors):

`$$\hat{u}_i^2 = \delta_0 + \delta_1x_{1i} + \delta_2x_{2i} + \delta_3x_{3i} + \delta_4x_{1i}^2 + \\ \delta_5x_{2i}^2 + \delta_6x_{3i}^2 + \delta_7x_{1i}x_{2i} + \delta_8x_{1i}x_{3i} + \delta_9x_{2i}x_{3i} + error_i$$`

and we test:

`$$H_0: \delta_1 = \delta_2 = \cdots = \delta_9 = 0$$`

The test statistic is:

`$$LM = n \cdot R_{\hat{u}^2}^2 \sim \chi^2_9$$`

It can detect more general forms of heteroskedasticity than Breusch–Pagan.

Including squares and interactions generates a large number of parameters.

---
## White test

A simplified version of the White test consists of estimating:

`$$\hat{u}_i^2 = \delta_0 + \delta_1\hat{y}_i + \delta_2\hat{y}_i^2 + error_i$$`

This regression tests whether the error variance depends on the explanatory variables, their squares or their interactions, since `$\hat{y}$` and `$\hat{y}^2$` contain them implicitly.

**Null hypothesis:**

$$
H_0: \delta_1 = \delta_2 = 0
$$

The test statistic is:

$$
LM = n \cdot R_{\hat{u}^2}^2 \sim \chi_2^2
$$
---
## White test

**Example:**

Housing price equation in logarithms:

`$$R_{\hat{u}^2}^2 = 0.0392,\quad LM = 88(0.0392) = 3.45,\quad p\text{-value}_{LM} = 0.178$$`

`$H_0$` is not rejected: there is no evidence of heteroskedasticity.
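
The simplified White test and the p-value in this example can be reproduced with base R. In the sketch below the first part uses simulated data (an assumption for illustration only), while the last two lines use just the `$n = 88$` and `$R^2 = 0.0392$` reported above.

``` r
# Minimal sketch (base R only): simplified White test with fitted values.
set.seed(4)                              # hypothetical simulated data
n  <- 200
x1 <- rnorm(n)
x2 <- rnorm(n)
y  <- 1 + x1 + x2 + rnorm(n, sd = exp(0.4 * x1))

fit   <- lm(y ~ x1 + x2)
u2    <- resid(fit)^2                    # squared residuals
y_hat <- fitted(fit)

aux     <- lm(u2 ~ y_hat + I(y_hat^2))   # auxiliary regression on y_hat, y_hat^2
LM      <- n * summary(aux)$r.squared
p_value <- pchisq(LM, df = 2, lower.tail = FALSE)

# p-value of the housing price example: LM = 88 * 0.0392 with 2 df
LM_slide <- 88 * 0.0392
p_slide  <- pchisq(LM_slide, df = 2, lower.tail = FALSE)
```

`p_slide` rounds to the 0.178 reported in the example.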

---
## Outline

- Heteroskedasticity-Robust Inference

- Tests for Heteroskedasticity.

- **.blue[Weighted Least Squares.]**

- Feasible Generalized Least Squares.

- Cluster-robust standard errors.

---

## Weighted Least Squares (WLS)

WLS is efficient under heteroskedasticity of known form; OLS is not.

WLS is a special case of **generalized least squares (GLS)**.

We assume that MLR1–MLR4 hold, but not MLR5 (homoskedasticity).

- Model:

`$$y_i = \beta_0 + \beta_1x_{1i} + \beta_2x_{2i} + \cdots + \beta_kx_{ki} + u_i$$`
- Where the error variance depends on the explanatory variables:

`$$Var(u_i|x_{1i}, \ldots, x_{ki}) = \sigma^2 h(x_{1i}, \ldots, x_{ki})$$`

with a known function `$h(\cdot)$`, for example: `$h(x_{1i}, \ldots, x_{ki}) = x_{1i}$`

---
## Weighted Least Squares (WLS)

If the form of heteroskedasticity is known up to a multiplicative constant:

`$$Var(u_i|x_i) = \sigma^2 h(x_i), \quad h(x_i) = h_i > 0$$`

Then, for each observation:

`$$\sigma_i^2 = Var(u_i|x_{i1}, \ldots, x_{ik}) = \sigma^2 h_i$$`

We start from the model:

`$$y_i = \beta_0 + \beta_1x_{i1} + \beta_2x_{i2} + \cdots + \beta_kx_{ik} + u_i$$`

If we divide the whole equation by `$\sqrt{h_i}$`:

`$$\frac{y_i}{\sqrt{h_i}} = \beta_0 \frac{1}{\sqrt{h_i}} + \beta_1 \frac{x_{i1}}{\sqrt{h_i}} + \cdots + \beta_k \frac{x_{ik}}{\sqrt{h_i}} + \frac{u_i}{\sqrt{h_i}}$$`

Thus, the new error `$\tilde{u}_i = u_i / \sqrt{h_i}$` has constant variance.

---
## Weighted Least Squares (WLS)

**Proof:**

`$$Var(\tilde{u}_i|x_i) = \sigma^2$$`

`$$Var\!\left(\frac{u_i}{\sqrt{h_i}} \,\middle|\, x_i\right) = E\!\left[\left(\frac{u_i}{\sqrt{h_i}}\right)^2 \middle| x_i \right] = \frac{E(u_i^2|x_i)}{h_i} = \frac{\sigma^2 h_i}{h_i} = \sigma^2$$`
Dividing the whole original equation by `$\sqrt{h_i}$`, we obtain:

`$$\frac{y_i}{\sqrt{h_i}} = \beta_0 \frac{1}{\sqrt{h_i}} +
\beta_1 \frac{x_{i1}}{\sqrt{h_i}} + \cdots + \beta_k \frac{x_{ik}}{\sqrt{h_i}} + \frac{u_i}{\sqrt{h_i}}$$`

The weighted model can be written as follows, where each starred variable is the original divided by `$\sqrt{h_i}$` and `$x_{i0}^* = 1/\sqrt{h_i}$`:

`$$y_i^* = \beta_0 x_{i0}^* + \beta_1 x_{i1}^* + \cdots + \beta_k x_{ik}^* + u_i^*$$`

---
## Weighted Least Squares (WLS)

**For example:**

`$$savings_i = \beta_0 + \beta_1 income_i + u_i, \qquad Var(u_i|income_i) = \sigma^2 income_i$$`

`$$\Rightarrow \frac{savings_i}{\sqrt{income_i}} = \beta_0 \left(\frac{1}{\sqrt{income_i}}\right) + \beta_1 \left(\frac{income_i}{\sqrt{income_i}}\right) + u_i^*$$`
*Note that this regression has no intercept.*

If the other Gauss-Markov assumptions hold, OLS applied to the transformed model is **BLUE**.

---
## Weighted Least Squares (WLS)

Minimization:

`$$\min \sum_{i=1}^n \left(\frac{y_i}{\sqrt{h_i}}- b_0 \frac{1}{\sqrt{h_i}}- b_1 \frac{x_{i1}}{\sqrt{h_i}}- \cdots- b_k \frac{x_{ik}}{\sqrt{h_i}} \right)^2$$`

This is equivalent to:

`$$\min \sum_{i=1}^n ( y_i - b_0 - b_1x_{i1} - \cdots - b_kx_{ik} )^2 \left(\frac{1}{h_i}\right)$$`

Observations with large `$h_i$` receive **less weight** in the estimation.

OLS is the special case in which all weights are equal, `$h_i = 1$`.
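
The equivalence between the two minimization problems can be illustrated in base R. The sketch below simulates the savings example with `$h_i = income_i$` (the parameter values and the simulated data are assumptions for illustration) and compares `lm()` with `weights = 1 / h` against OLS on the transformed variables.

``` r
# Minimal sketch (base R only): WLS via weights vs. OLS on transformed data.
set.seed(5)                              # hypothetical simulated data
n       <- 300
income  <- runif(n, min = 1, max = 10)
u       <- rnorm(n, sd = sqrt(income))   # Var(u | income) = income
savings <- 1 + 0.2 * income + u

h <- income                              # assumed known variance function

# (a) Weighted least squares: minimizes sum of squared residuals times 1/h
wls <- lm(savings ~ income, weights = 1 / h)

# (b) OLS on the transformed model, without an intercept
y_star  <- savings / sqrt(h)
x0_star <- 1 / sqrt(h)                   # transformed "constant"
x1_star <- income / sqrt(h)
ols_t   <- lm(y_star ~ 0 + x0_star + x1_star)

cbind(wls = coef(wls), transformed = coef(ols_t))
```

Both approaches return the same coefficient estimates; only the labels of the coefficients differ.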

---
## Notes

- Econometrics packages that have a built-in WLS option will report an R-squared (and adjusted R-squared) along with WLS estimates and standard errors.

- Typically, the WLS R-squared is obtained from the weighted SSR.

- As a goodness-of-fit measure, this R-squared is not especially useful, as it effectively measures explained variation in `$y_i^*$` rather than `$y_i$`.

- Nevertheless, the WLS R-squared computed as described is appropriate for computing F statistics for exclusion restrictions.

---
## Outline

- Heteroskedasticity-Robust Inference

- Tests for Heteroskedasticity.

- Weighted Least Squares.

- **.blue[Feasible Generalized Least Squares.]**

- Cluster-robust standard errors.

---
## Feasible GLS

When the form of heteroskedasticity **is not known**, it can be modeled as an exponential function (always positive):

`$$Var(u|x) = \sigma^2 \exp(\delta_0 + \delta_1x_1 + \cdots + \delta_kx_k)
= \sigma^2 h(x)$$`

Assuming a multiplicative error `$v$` that is independent of the explanatory variables:

$$
u^2 = \sigma^2 \exp(\delta_0 + \delta_1x_1 + \cdots + \delta_kx_k) {\times}  v
$$
Taking logarithms:

$$
\log(u^2) = \alpha_0 + \delta_1x_1 + \cdots + \delta_kx_k + e
$$

---
## Feasible GLS

$$
\log(u^2) = \alpha_0 + \delta_1x_1 + \cdots + \delta_kx_k + e
$$

Using the OLS residuals `$\hat{u}_i$`, we run the regression:

$$
\log(\hat{u}_i^2) = \hat{\alpha}_0 + \hat{\delta}_1x_1 + \cdots + \hat{\delta}_kx_k + \epsilon
$$

Then the estimated heteroskedasticity function is:

$$
\hat{h}_i = \exp(\hat{\alpha}_0 + \hat{\delta}_1x_1 + \cdots + \hat{\delta}_kx_k)
$$

and we can estimate **WLS with weights `$1/\hat{h}_i$`**.
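
The FGLS steps can be sketched in base R as follows. The simulated data, the chosen coefficients and the object names are illustrative assumptions; only the sequence of steps follows the procedure described here.

``` r
# Minimal sketch (base R only): feasible GLS with an exponential variance model.
set.seed(6)                                    # hypothetical simulated data
n  <- 500
x1 <- rnorm(n)
x2 <- rnorm(n)
u  <- rnorm(n, sd = sqrt(exp(0.5 + 0.8 * x1))) # Var(u|x) = exp(0.5 + 0.8 * x1)
y  <- 1 + 2 * x1 - x2 + u

# Step 1: OLS and its residuals
ols    <- lm(y ~ x1 + x2)
log_u2 <- log(resid(ols)^2)

# Step 2: regress log squared residuals on the explanatory variables
aux <- lm(log_u2 ~ x1 + x2)

# Step 3: estimated variance function, positive by construction
h_hat <- exp(fitted(aux))

# Step 4: WLS with weights 1 / h_hat
fgls <- lm(y ~ x1 + x2, weights = 1 / h_hat)

cbind(OLS = coef(ols), FGLS = coef(fgls))
```

Because `$\hat{h}_i$` is estimated, the resulting estimator is consistent but no longer exactly unbiased or BLUE in finite samples.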

---
## Feasible GLS

- If the heteroskedasticity function is misspecified, WLS is still consistent under **MLR.4**, but **robust standard errors** should be computed.

-  Any function of `$x$` is uncorrelated with `$u$`, so the weighted error `$u / \sqrt{h(x)}$` is also uncorrelated with the regressors.

- If there is strong **pure** heteroskedasticity, it is often better to use an **incorrect form** of heteroskedasticity than to ignore it, in order to **gain efficiency**.

- **Severe heteroskedasticity problems are often related to a failure of MLR4.**

- If the **OLS** and **FGLS** estimates differ substantially, other assumptions (for example, **MLR.4**) are likely **not satisfied**.

---
## Outline

- Heteroskedasticity-Robust Inference

- Tests for Heteroskedasticity.

- Weighted Least Squares.

- Feasible Generalized Least Squares.

- **.blue[Cluster-robust standard errors.]**

---
## Clustered Sampling

Observations may be correlated within groups (but not across groups), for example when studying schools, firms, households or localities.

The data are `$Y_{ig}, X_{ig}$`, where `$g= 1,\ldots,G$` indexes the cluster. The number of observations per cluster is `$n_g$` and `$n=\sum_{g = 1}^{G}n_g$`.

A model is:

`$$Y_{ig} = X'_{ig}\beta + e_{ig}$$`

Or, in cluster notation, `$Y_g= X_g\beta+e_g$`, where `$X_g$` stacks the rows `$X'_{ig}$` and `$e_g = (e_{1g},\ldots, e_{n_g g})'$` is an `$n_g\times1$` error vector.

We can write sums over observations as `$\sum_{g = 1}^{G}\sum_{i=1}^{n_g}$`.

This is the sum across clusters of the sum across observations within each cluster.

---
## Clustered Sampling

The OLS estimator is: 
`$$\hat\beta=  \left(\sum_{g = 1}^{G}X'_gX_{g}\right)^{-1} \left(\sum_{g = 1}^{G}X'_{g}Y_{g}\right)$$`
`$$=(X'X)^{-1}(X'Y)$$`
With residuals `$\hat{e}_{ig}= Y_{ig}-X'_{ig}\hat{\beta}$`  or  `$\hat{e}_{g}= Y_{g}-X_{g}\hat{\beta}$` (in cluster-level notation).

---
## Clustered Sampling

The assumptions are that clusters are mutually independent and that the errors have conditional mean zero: `$E[e_{g}|X_{g}]=0$`.

That is, all interaction effects within clusters have been accounted for in the specification of the individual regressors `$X_{ig}$`.

For example, the achievement of any student is unaffected by the individual characteristics `$x_i$` (e.g., age, gender and test scores) of other students within the same school.

---
## Clustered Sampling (Example)

From Duflo et al. (2011), using 121 primary schools in Kenya.
Students are randomly assigned to "tracking" classrooms or heterogeneous classrooms.

`$$TestScore_{ig} = -0.071 + 0.138Tracking_{g} + e_{ig}$$`
and

`$$TestScore_{ig} = \alpha + \gamma Tracking_{g} + X'_{ig}\beta + e_{ig}$$`

---
## Variance with clusters

Let `$\Sigma_g= E[e_ge'_g|X_g]$` denote the `$n_g \times n_g$` conditional covariance matrix of the errors within the `$g$`-th cluster.

`$$var[(\sum_{g=1}^G X'_ge_g)|X]=\sum_{g=1}^G var [X'_ge_g|X_g]$$`
`$$= \sum_{g=1}^G X'_g E[e_ge'_g|X_g]X_g$$`
`$$= \sum_{g=1}^G X'_g \Sigma_g X_g =_{def} \Omega_n$$`
Hence: `$V_{\hat{\beta}}= var[\hat{\beta}|X] = (X'X)^{-1} \Omega_n(X'X)^{-1}$`

This differs from the formula of the independent case, due to correlation within clusters.

---
## Variance with clusters

The difference in variance depends on the degree of correlation between observations within clusters.

For example, suppose every cluster has the same number of observations `$n_g = N$`, `$E[e^2_{ig}|X] = \sigma^2$`, and `$E[e_{ig}e_{lg}|X] = \sigma^2\rho$` for `$i\ne l$`.

Suppose also that the regressors are the same within clusters. Hence:

`$$V_{\hat{\beta}} = (X'X)^{-1} \sigma^2  (1 + \rho(N-1))$$`

For `$\rho>0$` this is approximately a multiple `$\rho N$` of the conventional formula.

If the cluster size is 100 and `$\rho = 0.25$`, the exact variance is about 25 times bigger, with SE about five times bigger.

**As we are weighting error variance by `$\rho$` this is FGLS.**
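
As a quick check of the arithmetic in this example, using the inflation factor `$1 + \rho(N-1)$` from the formula above with `$N = 100$` and `$\rho = 0.25$`:

`$$1 + \rho(N-1) = 1 + 0.25\,(100-1) = 25.75 \approx \rho N = 25, \qquad \sqrt{25.75} \approx 5.07$$`

So the variance is inflated by a factor of roughly 25 and the standard error by a factor of roughly 5, as stated.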

---
## Variance with clusters

Arellano (1987) gives the cluster-robust covariance matrix, which extends White:

For White, the squared error `$e^2_i$` is unbiased for `$E[e^2_i|X_i]=\sigma^2_i$`.

With cluster dependence, the matrix `$e_ge'_g$` is unbiased for `$E[e_ge'_g|X_g]=\Sigma_g$`.

An unbiased estimator for `${\Omega_n}$` is `$\tilde{\Omega}_{n} = \sum^G_{g=1} X'_ge_ge'_gX_g$`; replacing the errors with residuals:

`$$\hat{\Omega}_n = \sum^G_{g=1} X'_g\hat{e}_g\hat{e}'_gX_g$$`
 `$$= \sum_{g=1}^G  \sum_{i=1}^{n_g}  \sum_{l=1}^{n_g} X_{ig} X'_{lg} \hat{e}_{ig} \hat{e}_{lg}$$`
`$$= \sum_{g=1}^G(\sum_{i=1}^{n_g}X_{ig}\hat{e}_{ig})  (\sum_{l=1}^{n_g}X_{lg}\hat{e}_{lg})'$$` 
---
## Variance with clusters

A finite-sample adjustment is: `$a_n(X'X)^{-1}\hat{\Omega}_n(X'X)^{-1}$`, where `$a_n = \left({n-1 \over n-k}\right)\left({G \over G-1}\right)$`, which improves performance when `$G$` is small.

Example:

`$$TestScore_{ig} = -0.071 + 0.138 Tracking_g + e_{ig}$$`
         `$$\quad        (0.019)   \quad   (0.026)$$`
         `$$\quad        [0.054]   \quad   [0.054]$$`     
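
The cluster-robust estimator with the adjustment `$a_n$` can be sketched in base R. The data below are simulated with a cluster-level component in both the regressor and the error; they are an assumption for illustration and are unrelated to the Duflo et al. data.

``` r
# Minimal sketch (base R only): cluster-robust covariance matrix by hand.
set.seed(7)                                # hypothetical simulated data
G   <- 50                                  # number of clusters
n_g <- 20                                  # observations per cluster
n   <- G * n_g
g   <- rep(seq_len(G), each = n_g)         # cluster identifier

x <- rnorm(G)[g] + rnorm(n)                # regressor with a cluster component
e <- rnorm(G)[g] + rnorm(n)                # error with a cluster component
y <- 1 + 0.5 * x + e

fit   <- lm(y ~ x)
X     <- model.matrix(fit)
e_hat <- resid(fit)
k     <- ncol(X)

# Row g of `scores` is (X_g' e_hat_g)': the within-cluster sum of x_ig * e_hat_ig
scores    <- rowsum(X * e_hat, group = g)
Omega_hat <- crossprod(scores)             # sum_g X_g' e_hat_g e_hat_g' X_g

bread <- solve(crossprod(X))               # (X'X)^{-1}
a_n   <- ((n - 1) / (n - k)) * (G / (G - 1))
V_cl  <- a_n * bread %*% Omega_hat %*% bread

se_cluster <- sqrt(diag(V_cl))             # cluster-robust standard errors
se_usual   <- sqrt(diag(vcov(fit)))        # conventional standard errors
```

With positive within-cluster correlation in both `x` and `e`, `se_cluster` would typically exceed `se_usual`.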
         
---
## Variance with clusters

Scalar approximation:

- Variance of `$\hat{\beta}_1$` with correlation or heteroskedasticity:

`$$Var(\hat{\beta}_1|x_i) = \frac{var ( \sum_{i=1}^{n}(x_i - \bar{x})u_i )}{SST_x^2}$$`

- The **independence assumption no longer holds.** Then:

`$$Var(\hat{\beta}_1|x_i) =
\frac{\sum_{i=1}^{n}\sum_{j=1}^{n} Cov\big((x_i - \bar{x})u_i, (x_j - \bar{x})u_j\big)}{SST_x^2}$$`

Empirical estimator, restricting the double sum to pairs `$(i, j)$` in the same cluster `$g_i = g_j$` (over all pairs it would be identically zero, since `$\sum_i (x_i - \bar{x})\hat{u}_i = 0$`):

`$$\widehat{Var}(\hat{\beta}_1) =
\frac{\sum_{i=1}^{n}\sum_{j=1}^{n} 1\{g_i = g_j\}(x_i - \bar{x})\hat{u}_i \hat{u}_j (x_j - \bar{x})}{SST_x^2}$$`

---
## Variance with clusters

**Matrix form:**

`$$\widehat{V} = \left(\sum_{i=1}^{N} X_i'X_i\right)^{-1}
\left(\sum_{i=1}^{N} X_i'\hat{u}_i \hat{u}_i'X_i\right)
\left(\sum_{i=1}^{N} X_i'X_i\right)^{-1}$$`

- The errors can be **heteroskedastic** and/or **autocorrelated**.

- The consistent variance estimator in this case is the **HAC** estimator:

- *Heteroskedasticity and Autocorrelation-Consistent*, from **Newey and West (1987)**.

- More on: Wooldridge (2012), *Econometric Analysis of Cross-Section and Panel Data*.

---
## Variance with clusters (Example)

Consider the employee-level model within firms `$i$`:

`$$y_{i,e} = \beta_0 + \beta_1 x_{i,e,1} + \beta_2 x_{i,e,2} + \cdots + \beta_k x_{i,e,k} + f_i + \nu_{i,e}$$`

where `$f_i$` is an unobserved **firm effect** shared by all employees of firm `$i$`, and `$\nu_{i,e}$` is an error specific to each employee `$e$`.

The **composite error** is: `$u_{i,e} = f_i + \nu_{i,e}$`

The variance of the composite error, when `$f_i$` and `$\nu_{i,e}$` are uncorrelated, is:

`$$Var(f_i + \nu_{i,e}) = Var(f_i) + Var(\nu_{i,e}) = \sigma_f^2 + \sigma_\nu^2 = \sigma^2$$`

---
## Variance with clusters (Example)

We compute the **covariance between two composite errors** within the same firm:

`$$Cov(u_{i,e}, u_{i,g}) = Cov(f_i + \nu_{i,e}, f_i + \nu_{i,g}) = Cov(f_i, f_i) \\ + Cov(f_i, \nu_{i,g}) + Cov(\nu_{i,e}, f_i) + Cov(\nu_{i,e}, \nu_{i,g})$$`

If `$f_i$`, `$\nu_{i,e}$` and `$\nu_{i,g}$` are mutually **independent**, the last three terms vanish, so that:

`$$Cov(u_{i,e}, u_{i,g}) = Var(f_i) = \sigma_f^2$$`
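
As a worked step under the same independence assumption, dividing this covariance by the variance of the composite error gives the within-firm correlation of the errors. The symbol `$\rho$` is borrowed from the earlier cluster slides as a label; the example itself does not define it.

`$$Corr(u_{i,e}, u_{i,g}) = \frac{Cov(u_{i,e}, u_{i,g})}{\sqrt{Var(u_{i,e})\,Var(u_{i,g})}} = \frac{\sigma_f^2}{\sigma_f^2 + \sigma_\nu^2} = \rho$$`

The larger the share of the firm-level variance `$\sigma_f^2$` in the total, the stronger the correlation between employees of the same firm.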
---
## In summary

Clustered standard errors are usually **larger**, and more so when:

- The **regressors are more correlated** within the group.  
- The **errors are more correlated** across observations in the same cluster.  
- There are **more observations per cluster**.

If the errors are correlated within the group, OLS loses efficiency.

An additional observation in the same cluster **does not contribute independent information**.

Ignoring clustering leads to **underestimated standard errors** and a **higher probability of false rejections** of `$H_0$`.

---

<div class="centered-word">
  <h3>.black[Questions?]</h3>
  <h3>.black[francisco.cabrera@cide.edu]</h3>
</div>