Potential.knit

class: center, middle
# Potential Outcomes
### Dr. Francisco J. Cabrera-Hernández
#### Econometrics
#### Master's in Economics
Spring 2024
#####CIDE Santa Fe, Ciudad de México.

---
##Introduction

- **Questions about questions**

- Selection Bias

- Random Assignment into treatment

- Regression analysis of experiments

- Bad Controls

---

## What is the causal relationship of interest?

- A coherent, interesting, and doable research agenda is the solid foundation on which useful
statistical analyses are built.

- In the beginning, we should ask: **What is the causal relationship of interest?**

- **Descriptive** research has a place in the world, but the most interesting questions are causal.

- A causal answer tells us what would happen in alternative (or "counterfactual") worlds.

---

## Experiments

- The description of an ideal experiment helps you formulate causal questions precisely.

- The mechanics of an ideal experiment highlight the forces you'd like to manipulate: "Find your source of variation".

- In the case of schooling and wages, for example, we can imagine offering students a reward for staying in school.

- In the case of political institutions, we might like to go back in time and randomly assign different government structures to former colonies!

- Most experiments are hypothetical, but they are worth contemplating because they help us pick fruitful research topics.

---

## Fundamentally Unidentified Questions (FUQ'd)

- Questions about the causal effect of race or gender are FUQ'd because these attributes are hard to manipulate in isolation.

- "Imagine your chromosomes were switched at birth." This is not possible, but some studies approximate it with fake job applicants and resumes.

<img src="data:image/png;base64,#cvs.jpg" width="40%" style="display: block; margin: auto;" />
---

## Really FUQ'd!

- Do children do better in primary school if they start school earlier?

- A 7-year-old's brain is better prepared. If some children start at age 6 and others at age 7, we cannot compare them!

- If we wait until they both are 7, some will be in second grade.

- The effect of start age on elementary school test scores is FUQ'd.

---

## What is your identification strategy?

- The manner in which a researcher uses observational data
(i.e., data not generated by a randomized trial) to approximate a real experiment.

- Quarter of birth is related to years of education! Angrist & Krueger (1991)

- Credible identification strategies are **emblematic** of modern empirical work.

---

## What is your mode of statistical inference?

- The population to be studied.

- The sample to be used.

- The estimator (OLS, LAD, MLE, Non-parametric options)

- The assumptions made when constructing standard errors (clustered or grouped).

---

class: center, middle
#The experimental ideal
<img src="data:image/png;base64,#earth.png" width="80%" style="display: block; margin: auto;" />
---

#Overview
- Questions about questions

- **Selection Bias**

- Random Assignment into treatment

- Regression analysis of experiments

- Bad Controls

---

## Selection Bias

- The most credible research designs use random assignment.

- Why are they so powerful?

Imagine we want to know whether hospital visits in the last year make people healthier:
  `$$Health_{it} = \alpha + \beta_1 Hospital_{it} + ... + U_{it}$$`

What is the problem?

<img src="data:image/png;base64,#health.png" width="115%" style="display: block; margin: auto;" />
- Taken at face value, this suggests that going to the hospital makes people sicker. More likely, people who go to the hospital were less healthy to begin with (**self-selection problem**).

---
## Formally

- Hospital: `$D_i \in \{0,1\}$`
- Potential outcomes:
`$$Y_{1i} \text{ if } D_i = 1$$`
`$$Y_{0i} \text{ if } D_i = 0$$`

<img src="data:image/png;base64,#ATE1.png" width="50%" style="display: block; margin: auto;" />
- `$Y_{1i} - Y_{0i}$` is the causal effect of hospitalization for an individual.
- But we never see both potential outcomes for any one person.

---

##Naive Comparison

A naive comparison of averages by hospitalization:

<img src="data:image/png;base64,#ATE2.png" width="125%" style="display: block; margin: auto;" />
Where: `$E[Y_{1i}|D_i=1]-E[Y_{0i}|D_i=1]=E[Y_{1i}-Y_{0i}|D_i = 1]$`

---
##Naive Comparison

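- The average causal effect on the hospitalized compares their observed health, `$E[Y_{1i}|D_i=1]$`, with what their health would have been had they NOT been hospitalized, `$E[Y_{0i}|D_i=1]$`.

- *Selection bias* is the difference in average `$Y_{0i}$` between those who were and were NOT hospitalized: `$E[Y_{0i}|D_i=1]-E[Y_{0i}|D_i=0]$`.

- Because the sick are more likely to seek treatment, those who were not hospitalized have better health without treatment, `$E[Y_{0i}|D_i=0] > E[Y_{0i}|D_i=1]$`, making selection bias negative.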

- The selection bias may be so large (in absolute value) that it completely masks a positive treatment effect.
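
---

## Naive Comparison: a simulated sketch

The decomposition above can be checked numerically. This is a minimal sketch under invented assumptions: the `frailty` variable, the selection rule, and the effect size of 0.5 are hypothetical and do not come from the hospital data shown earlier.

```python
import random
from statistics import fmean

random.seed(0)

# Hypothetical data-generating process: sicker people are more likely to go
# to the hospital, and hospitalization improves everyone's health by `effect`.
n, effect = 50_000, 0.5
rows = []
for _ in range(n):
    frailty = random.gauss(0, 1)                       # unobserved sickness
    y0 = 4.0 - frailty + random.gauss(0, 0.5)          # health if not hospitalized
    y1 = y0 + effect                                   # health if hospitalized
    d = 1 if frailty + random.gauss(0, 1) > 1 else 0   # self-selection on frailty
    rows.append((y0, y1, d))

treated = [(y0, y1) for y0, y1, d in rows if d == 1]
control = [(y0, y1) for y0, y1, d in rows if d == 0]

# Observed comparison: Y1 for the hospitalized versus Y0 for everyone else.
naive = fmean(y1 for _, y1 in treated) - fmean(y0 for y0, _ in control)
# Average effect on the treated (only computable because this is a simulation).
att = fmean(y1 - y0 for y0, y1 in treated)
# Selection bias: difference in average Y0 between the two groups.
selection_bias = fmean(y0 for y0, _ in treated) - fmean(y0 for y0, _ in control)

print(f"naive difference: {naive:+.3f}")
print(f"ATT             : {att:+.3f}")
print(f"selection bias  : {selection_bias:+.3f}")
```

In this setup the naive difference is negative even though the true effect is positive, because the negative selection bias outweighs it.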

---

#Overview
- Questions about questions

- Selection Bias

- **Random Assignment into treatment**

- Regression analysis of experiments

- Bad Controls

---

##St. Random Assignment into treatment

- Makes `$D_i$` independent of potential outcomes:

- Independence of `$Y_{0i}$` and `$D_i$` means `$E[Y_{0i} | D_i = 1] = E[Y_{0i}|D_i = 0]$`, so we can swap one for the other in the naive comparison and the selection-bias term drops out.

Simplifies to: 
<img src="data:image/png;base64,#ATE4.png" width="85%" style="display: block; margin: auto;" />

This is the **Average Treatment Effect of hospitalization on randomly chosen patients**.
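
---

## Random Assignment: a simulated sketch

A quick simulation illustrates why a coin flip removes selection bias. This is a sketch only: the health scale, the constant effect of 0.5, and the sample size are assumptions made for illustration.

```python
import random
from statistics import fmean

random.seed(1)

n, effect = 50_000, 0.5
y0 = [4.0 - random.gauss(0, 1) for _ in range(n)]   # health without hospital
y1 = [v + effect for v in y0]                       # constant effect (assumption)
d = [random.randint(0, 1) for _ in range(n)]        # coin-flip assignment

# Average Y0 in each arm: these should be nearly equal under randomization.
y0_treated = fmean(v for v, di in zip(y0, d) if di == 1)
y0_control = fmean(v for v, di in zip(y0, d) if di == 0)

# Observed difference in means: Y1 for the treated minus Y0 for the controls.
diff_in_means = fmean(v for v, di in zip(y1, d) if di == 1) - y0_control

print(f"selection bias: {y0_treated - y0_control:+.3f}")
print(f"diff in means : {diff_in_means:+.3f} (true effect {effect})")
```

Because `d` is drawn independently of `y0`, the two arms have almost the same average `y0`, and the difference in means is close to the true effect.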

---

## More on selection bias

- An iconic example from labor economics is the evaluation of government-subsidized training programs.

- The idea is to increase employment and earnings. Yet **non-experimental** evidence shows that the trainees
earn less than plausible comparison groups.

- Another important question has been "What is the effect of class size on students' achievement?" What is the identification problem?

- How would you solve this experimentally? How would you do it non-experimentally?

---

##Internal Validity vs. External Validity

- Experiments have strong *internal validity*, yet they are often impractical.

- They can be costly and slow, and may have low *external validity* (results may not generalize).

- There are other problems, such as the "Hawthorne effect".

- Nevertheless, a notional randomized trial is our benchmark. Not all researchers
share this view, but many do.

---

## Distribution of an RCT

- Tutors in secondary school

<img src="data:image/png;base64,#tutores.jpg" width="80%" style="display: block; margin: auto;" />
---

#Distribution of an RCT

- Tutors in secondary school

<img src="data:image/png;base64,#notas1.png" width="100%" style="display: block; margin: auto;" />
---

#Distribution of an RCT

- Tutors in secondary school

<img src="data:image/png;base64,#notas2.png" width="70%" style="display: block; margin: auto;" />
---

#Distribution of an RCT (with small n)

- Tutors in secondary school

<img src="data:image/png;base64,#notas3.png" width="70%" style="display: block; margin: auto;" />
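---

## Distribution of an RCT: a simulated sketch

How much a single experiment's estimate can move around with small n can be illustrated by repeating a simulated trial many times. This is a sketch only: the standardized scores, the effect of 0.3, and the sample sizes are hypothetical and are not taken from the tutoring study.

```python
import random
from statistics import fmean, stdev

random.seed(2)

def one_rct(n, effect=0.3):
    """Difference in mean scores from one simulated RCT with n students."""
    assignment = [1] * (n // 2) + [0] * (n - n // 2)
    random.shuffle(assignment)                      # random assignment to tutoring
    scores = [random.gauss(0, 1) + effect * d for d in assignment]
    treated = [y for y, d in zip(scores, assignment) if d == 1]
    control = [y for y, d in zip(scores, assignment) if d == 0]
    return fmean(treated) - fmean(control)

# Repeat the experiment many times at two sample sizes.
small = [one_rct(20) for _ in range(300)]
large = [one_rct(1000) for _ in range(300)]

print(f"n=20  : mean {fmean(small):+.3f}, sd {stdev(small):.3f}")
print(f"n=1000: mean {fmean(large):+.3f}, sd {stdev(large):.3f}")
```

Both designs are centered on the true effect, but with n = 20 any single trial can land far from it.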
---

#Overview
- Questions about questions

- Selection Bias

- Random Assignment into treatment

- **Regression analysis of experiments**

- Bad Controls

---

##Regression Analysis of Experiments

- Suppose that the treatment effect is the same for everyone: `$Y_{1i} - Y_{0i} = \rho$`
<img src="data:image/png;base64,#reg1.png" width="90%" style="display: block; margin: auto;" />
- Where `$\eta_i$` is the random part of `$Y_{0i}$`.

`$$E[Y_{i}|D_i =1] = \alpha + \rho + E[\eta_i|D_i=1]$$`
`$$E[Y_{i}|D_i =0] = \alpha + E[\eta_i|D_i=0]$$`
hence:

`$$E[Y_{i}|D_i =1]-E[Y_{i}|D_i =0] = \rho + E[\eta_i|D_i=1] - E[\eta_i|D_i=0]$$`

---

##Regression Analysis of Experiments

- Where: `$E[\eta_i|D_i=1] - E[\eta_i|D_i=0]$` is the selection bias. It reflects correlation between the regression error `$\eta_i$` and the regressor `$D_i$`.

- If `$D_i$` is randomly assigned, the selection term disappears and `$\rho$` is the causal effect.

Hence:

`$$Y_i = \alpha + \rho D_i + X'_i\sigma + \eta_i$$`
- Adding covariates should not change `$\rho$` as these are "balanced" between treated and untreated `$i$`.

- But they reduce the residual variance.
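
---

## Regression Analysis of Experiments: a simulated sketch

The two claims above (covariates leave `$\rho$` roughly unchanged but shrink its standard error) can be illustrated with a small simulation. This is a sketch under assumed values: one hypothetical pre-treatment covariate `x`, a true effect of 1.0, and classical homoskedastic standard errors. The long regression is computed with the Frisch-Waugh partialling-out step rather than a regression library.

```python
import random
from statistics import fmean

random.seed(3)

def residualize(v, x):
    """Residuals from an OLS regression of v on a constant and x."""
    vb, xb = fmean(v), fmean(x)
    sxx = sum((xi - xb) ** 2 for xi in x)
    slope = sum((xi - xb) * (vi - vb) for xi, vi in zip(x, v)) / sxx
    return [(vi - vb) - slope * (xi - xb) for xi, vi in zip(x, v)]

def slope_and_se(y, d, k):
    """OLS slope of y on d (both already partialled out) and its classical SE.
    k is the number of parameters estimated in the full regression."""
    sdd = sum(di * di for di in d)
    rho = sum(di * yi for di, yi in zip(d, y)) / sdd
    rss = sum((yi - rho * di) ** 2 for di, yi in zip(d, y))
    return rho, (rss / (len(y) - k) / sdd) ** 0.5

n, rho_true = 2000, 1.0
x = [random.gauss(0, 1) for _ in range(n)]      # pre-treatment covariate
d = [random.randint(0, 1) for _ in range(n)]    # randomized treatment
y = [2.0 + rho_true * di + 1.5 * xi + random.gauss(0, 1) for di, xi in zip(d, x)]

# Short regression: Y on a constant and D (demeaning removes the constant).
yb, db = fmean(y), fmean(d)
rho_short, se_short = slope_and_se([yi - yb for yi in y], [di - db for di in d], k=2)

# Long regression: Y on a constant, D, and X, via Frisch-Waugh.
rho_long, se_long = slope_and_se(residualize(y, x), residualize(d, x), k=3)

diff_means = (fmean(yi for yi, di in zip(y, d) if di == 1)
              - fmean(yi for yi, di in zip(y, d) if di == 0))

print(f"difference in means: {diff_means:.3f}")
print(f"short regression   : {rho_short:.3f} (se {se_short:.3f})")
print(f"long regression    : {rho_long:.3f} (se {se_long:.3f})")
```

The short-regression coefficient equals the difference in means, and adding `x` leaves the point estimate close to it while reducing the standard error.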

---

##The STAR Experiment
<img src="data:image/png;base64,#star1.png" width="75%" style="display: block; margin: auto;" />

---
##Regression and Causality

- When can we think of a regression coefficient as approximating the causal effect that might be revealed in
an experiment?

- A regression is causal when the conditional expectation function (CEF) it approximates is causal.

- The CEF is causal when it describes differences in average potential outcomes for a fixed reference population.

---

##Regression and Causality

- The causal connection between schooling and earnings can be defined as the functional relationship that describes what a given individual `$i$` would earn if she obtained different levels of education.

- In empirical work, the causal relationship between schooling and earnings tells us what people would earn *on average* if we could change their schooling keeping the rest fixed.

- **Or change their schooling randomly so that those with different levels of schooling would be comparable**.

- This leads to the *conditional independence assumption (CIA)*, which provides the justification for the causal interpretation of regression.

- This assumption is sometimes called *selection-on-observables* because the covariates to be held fixed are observed.

---

##Regression and Causality

- The causal relationship between college attendance and future earnings can be described using the potential-outcomes notation.

- In this case, `$Y_{0i}$` is earnings with no college, while `$Y_{1i}$` is earnings with college.

---
## Regression and Causality

- We get to see one of `$Y_{0i}$` or `$Y_{1i}$`, but never both.

- We therefore hope to measure the average of `$Y_{1i} - Y_{0i}$`, yet the naive comparison is biased.

- It seems likely that those who go to college would have earned more anyway.

- If so, selection bias is positive, and the naive comparison exaggerates the "college premium".

---
##Regression and Causality

- Hence, the **CIA** states that: `$\{Y_{0i}, Y_{1i}\} \perp C_i \mid X_i$`

- Or: `$E[Y_i | X_i, C_i = 1] - E[Y_i|X_i, C_i=0] = E[Y_{1i}-Y_{0i}|X_i]$`

- That is, once we condition on `$X_i$`, potential outcomes (wages with and without college) are independent of whether someone actually went to college.

- This `$X_i$` is "the door" that fully explains why someone "is treated". This is the basis of estimators such as Propensity Score Matching (PSM) or Heckman Probit.

- Such estimators are no longer widely used in modern econometrics: the **CIA is often highly implausible**.
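
---

## Regression and Causality: a simulated sketch

What the CIA buys us can be seen in a simulation where it holds by construction. This is a sketch only: the binary covariate `x`, the college probabilities, and the effect of 2.0 are hypothetical. In real data the CIA cannot be imposed this way, which is why it is often implausible.

```python
import random
from statistics import fmean

random.seed(4)

n, effect = 40_000, 2.0
data = []
for _ in range(n):
    x = random.randint(0, 1)                                  # observed covariate
    c = 1 if random.random() < (0.7 if x else 0.2) else 0     # college depends only on x
    y0 = 10.0 + 5.0 * x + random.gauss(0, 1)                  # earnings without college
    data.append((x, c, y0 + effect * c))                      # observed earnings

def mean_y(c, x=None):
    """Average observed earnings for college status c, optionally within a value of x."""
    return fmean(y for xi, ci, y in data if ci == c and (x is None or xi == x))

# Naive comparison ignores x and picks up selection on x.
naive = mean_y(1) - mean_y(0)

# Compare within each value of x, then average using the share of each x.
within = {x: mean_y(1, x) - mean_y(0, x) for x in (0, 1)}
share = {x: fmean(1.0 if xi == x else 0.0 for xi, _, _ in data) for x in (0, 1)}
conditional = sum(share[x] * within[x] for x in (0, 1))

print(f"naive comparison      : {naive:.3f}")
print(f"conditional on x      : {conditional:.3f} (true effect {effect})")
```

The naive comparison overstates the effect, while the within-`x` comparison recovers it, but only because selection here runs entirely through the observed `x`.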

---

#Overview

- Questions about questions

- Selection Bias

- Random Assignment into treatment

- Regression analysis of experiments

- **Bad Controls**

---
##Bad Controls: More is better?

- Controlling for covariates can make the CIA more plausible, but more is not always better.

- Bad controls are variables that are themselves *outcome variables* in the notional experiment at hand.

- Good controls are variables that we can think of as having been fixed at the time the regressor of interest was determined.

- The essence of the bad control problem is a version of **selection bias**.

---
##Bad Controls

- For example, a college degree opens the door to higher-paying white-collar jobs.

- Should occupation therefore be seen as an omitted variable in a regression of wages on schooling?

- If college affects occupation, comparisons of wages by college degree status within an occupation are no longer apples-to-apples.

- ***This holds even if college degree completion is randomly assigned***.

---

## Bad Controls Formally

- Let `$W_i$` be a dummy for white-collar workers and `$Y_i$` the wage (outcome). Assume college `$C_i$` is randomly assigned.

- We can estimate the average treatment effects of college on `$Y_i$` and `$W_i$` by simply regressing each on `$C_i$`.

- Bad control: a comparison of earnings conditional on `$W_i$` does not have a causal interpretation.

- Consider the difference in mean earnings between college graduates and others conditional on working at a
white-collar job...

---
## Bad Controls Formally

- The estimand for people with a white-collar job is the difference in means with `$C_i$` switched on and
off, conditional on `$W_i = 1$`:

<img src="data:image/png;base64,#bad2.png" width="120%" style="display: block; margin: auto;" />
By the joint independence of `$\{Y_{1i}, W_{1i}, Y_{0i}, W_{0i}\}$` and `$C_i$` we have:
<img src="data:image/png;base64,#bad3.png" width="120%" style="display: block; margin: auto;" />
- This expression denotes an apples-to-oranges comparison:
<img src="data:image/png;base64,#bad4.png" width="100%" style="display: block; margin: auto;" />
- Someone who gets a white-collar job without the benefit of a college degree (i.e., `$W_{0i} = 1$`) is probably special, i.e., has a better than average `$Y_{0i}$`.
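---
## Bad Controls: a simulated sketch

The apples-to-oranges problem can be reproduced in a simulation with randomly assigned college. This is a sketch only: the `ability` variable, the thresholds for getting a white-collar job, and the wage effect of 2.0 are hypothetical choices.

```python
import random
from statistics import fmean

random.seed(5)

n, effect = 40_000, 2.0
people = []
for _ in range(n):
    ability = random.gauss(0, 1)                   # unobserved ability
    c = random.randint(0, 1)                       # college, randomly assigned
    y0 = 10.0 + 2.0 * ability + random.gauss(0, 1) # wage without college
    y1 = y0 + effect                               # wage with college
    w0 = 1 if ability > 1.0 else 0                 # white collar without college
    w1 = 1 if ability > -0.5 else 0                # white collar with college
    y, w = (y1, w1) if c else (y0, w0)             # observed wage and occupation
    people.append((c, w, y, y0, w1, w0))

# Unconditional comparison: causal, because c is randomized.
unconditional = (fmean(y for c, w, y, y0, w1, w0 in people if c == 1)
                 - fmean(y for c, w, y, y0, w1, w0 in people if c == 0))

# Comparison among white-collar workers only: conditions on an outcome.
conditional = (fmean(y for c, w, y, y0, w1, w0 in people if c == 1 and w == 1)
               - fmean(y for c, w, y, y0, w1, w0 in people if c == 0 and w == 1))

# Selection term: E[Y0 | W1 = 1] - E[Y0 | W0 = 1], visible only in a simulation.
selection = (fmean(y0 for c, w, y, y0, w1, w0 in people if w1 == 1)
             - fmean(y0 for c, w, y, y0, w1, w0 in people if w0 == 1))

print(f"unconditional difference: {unconditional:.3f} (true effect {effect})")
print(f"within white collar     : {conditional:.3f}")
print(f"selection term          : {selection:.3f}")
```

Even with randomized college, the within-occupation comparison is far below the true effect, because non-graduates who reach white-collar jobs have unusually high `y0`.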
---
## Bad Controls

- In Column (5) we no longer know if the reduction in the coefficient is due to omitted variable bias, or due to (self)selection into occupations affected by college.

---
## Bad Controls: Proxy Variable.

- A second version of the bad control scenario involves proxy controls that are affected by the variable of interest.

- It is OK to control for IQ in a regression of wages on education if IQ is measured before schooling. Otherwise, it is a bad control:

- Let `$a_{i}$` be innate ability and `$a_{li}$` ability measured later, after schooling. Substituting `$a_{li}$` for `$a_{i}$` as the control attenuates the estimated effect `$\rho$`, unless `$\pi_1 = 0$`, that is, unless schooling has no effect on late-measured ability.

- Clear reasoning about causal channels requires explicit assumptions about what happened first, or the
assertion that none of the control variables are themselves caused by the regressor of interest.
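
---
## Bad Controls: Proxy Variable, a worked sketch

The attenuation result can be written out in a few lines. This is a sketch following the standard textbook setup, not necessarily the exact equations intended for the slide: `$S_i$` (schooling), `$\gamma$`, `$\pi_0$`, `$\pi_2$`, and `$\varepsilon_i$` are notation assumed here for illustration.

Suppose wages depend on schooling and innate ability, and late-measured ability depends on both:

`$$Y_i = \alpha + \rho S_i + \gamma a_i + \varepsilon_i$$`
`$$a_{li} = \pi_0 + \pi_1 S_i + \pi_2 a_i$$`

Solving the second equation for `$a_i$` and substituting into the first:

`$$Y_i = \left(\alpha - \gamma\frac{\pi_0}{\pi_2}\right) + \left(\rho - \gamma\frac{\pi_1}{\pi_2}\right) S_i + \frac{\gamma}{\pi_2} a_{li} + \varepsilon_i$$`

If `$\gamma$`, `$\pi_1$`, and `$\pi_2$` are all positive, the coefficient on `$S_i$` is smaller than `$\rho$`. It equals `$\rho$` only when `$\pi_1 = 0$`.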

---
class: center, middle
#THE END