class: center, middle

# Censoring and Selection

### Dr. Francisco J. Cabrera-Hernández
#### Econometría
#### Maestría en Economía Primavera 2025
##### CIDE Santa Fe, Ciudad de México.

---

## Introduction

- Censored regression occurs when the dependent variable is constrained: observations pile up on a boundary, for example zero.

- Selection occurs when sampling is endogenous.

- Conventional (e.g. least squares) estimators are biased for the population parameters of the uncensored/unselected distributions.

---

## Censored Distributions

<img src="data:image/png;base64,#censored.png" width="90%" style="display: block; margin: auto;" />

---

## Tobit Model (Censored Regression)

`$$Y^* = X' \beta + e \\ e \sim \mathcal{N}(0, \sigma^2) \\ Y = \max(Y^*, 0)$$`

The variable `\(Y^*\)` is latent (unobserved). The observed variable `\(Y\)` is censored **from below** at zero: positive values are uncensored and negative values are set to 0.

The individual's optimal (unconstrained), continuously distributed choice is `\(Y^*\)`.

Assumptions: `\(e\)` is independent of `\(X\)`, normal, and homoskedastic, and censoring happens at a known threshold.

---

## Tobit Model (Censored Regression)

To justify this interpretation of the model we need to envisage a context where desired choices include negative values.

Observations pile up at zero due to reporting constraints, social desirability bias, or survey design, e.g. business profits.

The location of the density and the **degree of censoring** are controlled by the conditional mean `\(X'\beta\)`.

As `\(X'\beta\)` **moves to the right**, the amount of censoring **decreases**. As `\(X'\beta\)` **moves to the left**, the amount of censoring **increases**.

---

## Censored Distributions

A common “remedy” to the censoring problem is deletion of the censored observations. This creates a truncated distribution.

<img src="data:image/png;base64,#truncated.png" width="75%" style="display: block; margin: auto;" />

We distinguish between three distributions and variables: uncensored (`\(Y^*\)`), censored (`\(Y\)`), and truncated (`\(Y^\#\)`).

---

## Censored Regression Functions

We want to know: what is the probability that `\(Y^*\)` is less than 0, given `\(X\)`?

Note: `\(Y^* < 0\)` ⟺ `\(X'\beta + e < 0\)` ⟺ `\(e < -X'\beta\)`

So the conditional probability of censoring is:

$$ \mathbb{P} \left[ Y^* < 0 \mid X \right] = \mathbb{P} \left[ e < -X' \beta \mid X \right] = \Phi \left( \frac{ -X' \beta }{ \sigma } \right). $$

*Because `\(e\)` is normal with mean 0 and variance `\(\sigma^2\)`, we standardize.*

---

## Censored Regression Functions

$$ \mathbb{P} \left[ Y^* < 0 \mid X \right] = \mathbb{P} \left[ e < -X' \beta \mid X \right] = \Phi \left( \frac{ -X' \beta }{ \sigma } \right). $$

**If `\(X'\beta\)` is large and positive:** `\(-X'\beta\)` is large and negative, and `\(\Phi(-X'\beta/\sigma)\)` is close to 0. So there is a low chance of censoring (most `\(Y^*\)` are positive).

In the earlier graph, the censoring probability is 98% for `\(X = -3\)`, 50% for `\(X = -1\)`, and 2% for `\(X = 1\)`.

---

## Conditional Means

Since `\(Y^* \leq Y \leq Y^{\#}\)` it follows that:

`$$m^*(x) \leq m(x) \leq m^{\#}(x)$$`

with strict inequality if the censoring probability is positive.

This shows that the conditional means of the truncated and censored distributions are biased for the uncensored conditional mean.

In the earlier graph, the uncensored mean `\(m^*(x)\)` is marked by the straight line.

An estimator that is consistent for the conditional mean will therefore estimate the biased censored mean `\(m(x)\)` or the truncated mean `\(m^\#(x)\)`.
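---

## Conditional Means: A Quick Simulation

A minimal simulation sketch (the DGP `\(Y^* = 1 + 2X + e\)` is an assumption for illustration, not from the notes): least squares on the censored or truncated sample targets the biased means `\(m(x)\)` or `\(m^\#(x)\)`, so its slope is attenuated relative to least squares on the latent `\(Y^*\)`.

```r
# Illustrative DGP (assumed): Y* = 1 + 2x + e, censored from below at zero
set.seed(123)
n     <- 10000
x     <- rnorm(n)
e     <- rnorm(n)                       # normal, homoskedastic error
ystar <- 1 + 2 * x + e                  # latent outcome Y*
y     <- pmax(ystar, 0)                 # censored outcome Y
keep  <- ystar > 0                      # truncated sample Y#

coef(lm(ystar ~ x))                     # infeasible benchmark: slope near 2
coef(lm(y ~ x))                         # censored sample: slope shrunk toward 0
coef(lm(ystar ~ x, subset = keep))      # truncated sample: also biased
```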
---

## OLS Bias

Greene (1981) wrote the model with an explicit intercept as:

$$ Y^* = \alpha + X' \beta + e, \quad \text{and assume } X \sim \mathcal{N}(0, \Sigma). $$

He showed that the best linear predictor slope coefficient is:

$$ \beta_{\text{BLP}} = \beta (1 - \pi) $$

where `\(\pi = \mathbb{P}(Y = 0)\)` is the **censoring probability**.

The **least squares slope coefficients are shrunk toward zero** proportionately with the censoring percentage.

While this is specific to normally distributed regressors, it highlights the **bias introduced by censoring**.

It also allows a quick calculation of the expected bias due to censoring: if the bias is small, it may not be worth correcting for censoring.

---

## Tobit Estimator

Tobin (1958) proposed estimation of the censored regression model by maximum likelihood.

The censored variable `\(Y\)` has a conditional distribution function which is a mixture of continuous and discrete components:

$$ F(y \mid x) = `\begin{cases} 0, & y < 0 \\ \Phi\left( \frac{y - x'\beta}{\sigma} \right), & y \geq 0. \end{cases}` $$

The associated density function is:

$$ f(y \mid x) = \Phi\left( \frac{ -x'\beta }{ \sigma } \right)^{\mathbb{1}\{y=0\}} \left[ \sigma^{-1} \phi\left( \frac{ y - x'\beta }{ \sigma } \right) \right]^{\mathbb{1}\{y>0\}}. $$

The first component is the probability of censoring, and the second component is the normal regression density.

---

## Tobit Estimator

The log-likelihood is the sum of the log density functions evaluated at the observations:

`$$\ell_n(\beta, \sigma^2) = \sum_{i=1}^n \log f(Y_i \mid X_i)$$`

`$$= \sum_{i=1}^n \left[ \mathbb{1}\{Y_i = 0\} \log \Phi\left( \frac{ -X_i'\beta }{ \sigma } \right) + \mathbb{1}\{Y_i > 0\} \log \left( \sigma^{-1} \phi\left( \frac{ Y_i - X_i'\beta }{ \sigma } \right) \right) \right]$$`

`$$= \color{green}{ \sum_{Y_i = 0} \log \Phi\left( \frac{ -X_i'\beta }{ \sigma } \right)} - \frac{1}{2} \sum_{Y_i > 0} \left( \log(2\pi \sigma^2) + \frac{1}{\sigma^2} (Y_i - X_i'\beta)^2 \right)$$`

The first component is the same as in a probit model, and the second component is the same as for the normal regression model.

---

## Tobit Estimator

The MLE is the pair `\((\hat\beta, \hat\sigma^2)\)` that maximizes the log-likelihood `\(\ell_n(\beta, \sigma^2)\)`. **It is asymptotically normal.**

The density is discontinuous in the measure-theoretic sense (because of the mixture of a point mass at zero and a continuous density). Yet the log-likelihood is continuous and differentiable, ensuring global convergence of the optimizer (see BH p. 847).

Still, the Tobit log-likelihood is awkward for full Newton methods because it mixes two components, making the second derivatives cumbersome.

Quasi-Newton methods (like BFGS) approximate the Hessian matrix (of second derivatives) instead of computing it fully.

[Coding](https://github.com/fcabrerahz/EconometricsME/blob/main/Code/21_tobit.R)

---

## Tobit Estimator

What if normality/homoskedasticity fails? Tobit estimates become inconsistent: the Tobit model is **nonlinear in the likelihood**.

- The MLE of `\(\beta\)` is not consistent.
- Predicted means of `\(Y\)` are biased.
- Marginal effects (which depend on `\(\Phi\)` and `\(\phi\)`) are misleading.
- Standard errors are invalid.

**OLS is more robust in such cases.**

With high censoring, you are estimating from **very limited information**. Even small deviations from normality can lead to **large distortions**.
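---

## Tobit in R (Sketch)

A minimal sketch of the Tobit MLE, assuming the `AER` package is available (its `tobit()` wraps `survival::survreg()`). It reuses `x` and `y` from the earlier simulation sketch; see the linked `21_tobit.R` for the course code.

```r
# Minimal sketch (assumes the AER package is installed); reuses the simulated
# data with Y* = 1 + 2x + e censored from below at zero.
library(AER)                                # tobit() is a survreg() wrapper

dat <- data.frame(y = y, x = x)
fit <- tobit(y ~ x, left = 0, data = dat)   # Tobit MLE with censoring point 0
summary(fit)                                # slope estimate close to 2

coef(lm(y ~ x, data = dat))                 # compare: attenuated OLS slope
```

Under the normality and homoskedasticity assumptions above, the Tobit slope is consistent, while the OLS slope on `y` is shrunk toward zero.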
---

## Alternatives to the Tobit Model

Semi-parametric & robust approaches:

**Censored Quantile Regression** (Powell, 1986):
- No normality assumption
- Robust to skewness and heavy tails

**Two-part / Hurdle models**:
- One model for `\(\mathbb{P}(Y > 0)\)`
- Another for `\(\mathbb{E}[Y \mid Y > 0]\)`

---

<style>
.centered-word {
  position: absolute;
  top: 50%;
  left: 50%;
  transform: translate(-50%, -50%);
}
</style>

<div class="centered-word">
  <h1>Selection</h1>
</div>

---

## Sample Selection Bias

Most econometric models assume **random sampling**. But in practice, samples are often **non-random**, creating **bias** if selection is endogenous.

1. **Wage regression.** Wages are only observed for those working. Selection depends on **labor force participation**.
2. **Program evaluation.** Volunteers to programs may have unobserved traits.
3. **Surveys.** Low response rates may bias results if response is related to the variable of interest.
4. **Ratings.** Only some users rate products; those who choose to may differ systematically (e.g. they disliked the product).

---

## Sample Selection Bias

Let:

- `\((Y, X)\)` be drawn from the population

Then:

- `\(S = 1\)` if the pair is **included** in the sample
- `\(S = 0\)` otherwise

Hence:

$$ \mathbb{E}[Y \mid X, S = 1] = X' \beta + \mathbb{E}[e \mid X, S = 1] $$

Selection bias occurs when the second term is non-zero.

---

## Sample Selection Bias

Suppose selection follows:

$$ S = 1 \{ X'\gamma + u > 0 \} $$

This is consistent with a latent variable framework. Then:

$$ \mathbb{E}[Y \mid X, S = 1] = X'\beta + \mathbb{E}[e \mid u > -X'\gamma] $$

Selection bias appears because selection depends on **unobserved components** of the model.

---

## Sample Selection Bias

<img src="data:image/png;base64,#sample_selection_plot.png" width="60%" style="display: block; margin: auto;" />

Sample selection rule (latent): `\(S = \mathbf{1}\{X'\gamma + u > 0\}\)`

---

## Heckman Model "Correction"

It consists of two equations:

1. **Outcome equation** (latent variable): `\(Y^* = X'\beta + \varepsilon\)`
2. **Selection equation**: `\(S = 1\{ Z'\gamma + u > 0 \}\)`

- We only observe `\(Y = Y^*\)` when `\(S = 1\)`.
- The error terms `\(\varepsilon\)` and `\(u\)` are assumed to be **jointly normally distributed**.

If `\(\varepsilon\)` and `\(u\)` are **correlated**, then: `\(\mathbb{E}[\varepsilon \mid S = 1] \neq 0\)`

This violates a key assumption of regression and leads to **selection bias** in OLS.

---

## Conditional Expectation

In the selected sample:

`\(\mathbb{E}[Y \mid X, S = 1] = X'\beta + \mathbb{E}[\varepsilon \mid u > -Z'\gamma]\)`

The bias term is:

`\(\mathbb{E}[\varepsilon \mid u > -Z'\gamma] = \rho \sigma_\varepsilon \lambda(Z'\gamma)\)`

where `\(\lambda(\cdot)\)` is the **inverse Mills ratio** (recall from Tobit).

$$ \lambda(x) = \frac{\phi(x)}{\Phi(x)} $$

---

## Heckman Two-Step Estimator

**Step 1**: Estimate the selection equation using a **probit** model. Compute the inverse Mills ratio: `\(\lambda(Z'\hat{\gamma}) = \frac{\phi(Z'\hat{\gamma})}{\Phi(Z'\hat{\gamma})}\)`

**The selection equation should include an exclusion restriction:** a variable that affects selection but not the outcome.

**Step 2**: Include `\(\lambda\)` as an additional regressor in the outcome equation: `\(Y = X'\beta + \rho \sigma_\varepsilon \lambda + \text{error}\)`

- The inverse Mills ratio `\(\lambda\)` captures the **likelihood of selection**.
- By including it, we control for **selection bias**, just like controlling for a confounder.
- It corrects for the fact that the observed sample is **non-random** (see the sketch on the next slide).
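---

## Heckman Two-Step in R (Sketch)

A minimal two-step sketch on simulated data (the DGP, variable names, and coefficient values are assumptions for illustration); `z` plays the role of the exclusion restriction.

```r
# Illustrative DGP (assumed): z affects selection but not the outcome
set.seed(456)
n   <- 5000
x   <- rnorm(n); z <- rnorm(n)
u   <- rnorm(n)                                    # selection error
eps <- 0.7 * u + rnorm(n)                          # outcome error, correlated with u
s     <- as.numeric(1 + x + 2 * z + u > 0)         # selection: S = 1{Z'gamma + u > 0}
y_obs <- ifelse(s == 1, 1 + 2 * x + eps, NA_real_) # Y observed only when S = 1

# Step 1: probit for selection, then the inverse Mills ratio
probit <- glm(s ~ x + z, family = binomial(link = "probit"))
xb     <- predict(probit, type = "link")           # Z'gamma-hat
imr    <- dnorm(xb) / pnorm(xb)                    # lambda(Z'gamma-hat)

# Step 2: outcome regression on the selected sample, adding the IMR
coef(lm(y_obs ~ x))                                # naive OLS: biased slope on x
coef(lm(y_obs ~ x + imr))                          # two-step: slope on x close to 2
```

The manual second step gives consistent point estimates, but its standard errors ignore that `\(\lambda\)` is estimated; packages such as `sampleSelection` (`heckit()`) implement the appropriate correction.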
---

## Heckman Two-Step Estimator

Heavily criticized because it assumes joint normality of the errors: the error in the outcome equation (`\(\varepsilon\)`) and the error in the selection equation (`\(u\)`) must be jointly normally distributed. This assumption is not testable with the data.

The inverse Mills ratio comes from a probit, again tied to normality.

The exclusion restriction requires finding a variable that selects you into the sample but does not affect the outcome (a conditional independence assumption). **Finding it is hard.**

If you have one, IV estimation (again based on least squares) may be preferable.

---

<style>
.centered-word {
  position: absolute;
  top: 50%;
  left: 50%;
  transform: translate(-50%, -50%);
}
</style>

<div class="centered-word">
  <h2>The End</h2>
</div>