Core Challenge

Correlation vs. Causation

In impact assessment, our primary goal is to determine whether a specific intervention (Policy, Program, or “Treatment”) caused a change in an outcome.

  • The Intuitive Trap: It is natural to compare those who participated in a program with those who did not.

  • The Error: “Correlation is not causation.” Just because \(X\) and \(Y\) move together does not mean \(X\) causes \(Y\).

  • The Goal: To isolate the effect of the intervention from other confounding factors.

Potential Outcomes Framework

Rubin Causal Model

To talk about causality mathematically, we use the Potential Outcomes Framework, often attributed to Donald Rubin. This framework forces us to imagine two parallel universes for every individual.

A. Notation and Definitions

For any individual unit \(i\) (e.g., a person, a firm, a school):

  • Treatment Status (\(D_i\)):

    • \(D_i = 1\) if unit \(i\) receives the treatment.
    • \(D_i = 0\) if unit \(i\) does not receive the treatment.
  • Potential Outcomes:

    • \(Y_{1i}\): The outcome for unit \(i\) if they were treated.
    • \(Y_{0i}\): The outcome for unit \(i\) if they were not treated.

Crucial Concept:

These potential outcomes (\(Y_{1i}\) and \(Y_{0i}\)) exist in theory for everyone, regardless of whether they actually received the treatment or not.

B. The Causal Effect

The causal effect of the treatment on an individual \(i\) (\(\tau_i\)) is the difference between these two potential outcomes: \[\tau_i = Y_{1i} - Y_{0i}\]

Problem of Causal Inference

The problem is that for any individual \(i\), we cannot observe both \(Y_{1i}\) and \(Y_{0i}\) simultaneously.

  • If we treat person \(A\), we observe \(Y_{1A}\), but \(Y_{0A}\) becomes the Counterfactual.

  • If we do not treat person \(A\), we observe \(Y_{0A}\), but \(Y_{1A}\) becomes the Counterfactual.

Therefore, the Observed Outcome (\(Y_i\)) is realized as:

\[Y_i = D_i Y_{1i} + (1 - D_i) Y_{0i}\]

Because we never observe both potential outcomes for the same unit, we can never calculate the individual treatment effect \(\tau_i\) directly. This is known as the fundamental problem of causal inference. We must instead rely on Average Treatment Effects across a population.
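The notation above can be made concrete with a small sketch. The five units and their numbers below are hypothetical and chosen only for illustration; in real data we would never have both `y1` and `y0` for the same unit.

```python
import numpy as np

# Hypothetical potential outcomes for five units (illustrative numbers only).
y1 = np.array([7.0, 5.0, 9.0, 6.0, 8.0])  # Y_{1i}: outcome if treated
y0 = np.array([4.0, 5.0, 6.0, 2.0, 7.0])  # Y_{0i}: outcome if not treated
d = np.array([1, 0, 1, 0, 0])             # D_i: treatment status

# Individual effects tau_i = Y_{1i} - Y_{0i}.
# Computable here only because the data are made up.
tau = y1 - y0

# Observed outcome: Y_i = D_i * Y_{1i} + (1 - D_i) * Y_{0i}
y_obs = d * y1 + (1 - d) * y0

# What a researcher actually sees: the counterfactual is missing (NaN).
y1_seen = np.where(d == 1, y1, np.nan)
y0_seen = np.where(d == 0, y0, np.nan)

print(y_obs)    # [7. 5. 9. 2. 7.]
print(y1_seen)  # [ 7. nan  9. nan nan]
print(y0_seen)  # [nan  5. nan  2.  7.]
```

Every row has exactly one missing entry, which is why \(\tau_i\) cannot be computed from observed data alone.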

Selection Bias

Since we cannot see the counterfactual, researchers often try to estimate the impact by comparing the average outcome of the treated group to the average outcome of the untreated group.

This is called the Simple Difference in Mean Outcomes (SDO):

\[SDO = E[Y_i | D_i = 1] - E[Y_i | D_i = 0]\]

However, this comparison is usually misleading due to Selection Bias.

Selection bias is a systematic error that occurs when individuals selected into a treatment or study differ in meaningful ways from those not selected, causing the estimated effect to reflect pre-existing differences rather than the true causal effect of the treatment.

A. Decomposition of the Difference in Means

To understand why the simple comparison fails, we can mathematically decompose the SDO.

1. Start with the SDO:

\[E[Y_i | D_i = 1] - E[Y_i | D_i = 0]\]

2. Substitute potential outcomes:

\[E[Y_{1i} | D_i = 1] - E[Y_{0i} | D_i = 0]\] (Note: We observe \(Y_{1}\) for the treated and \(Y_{0}\) for the untreated)

3. Add and subtract the counterfactual: \(E[Y_{0i} | D_i = 1]\)

\[= \underbrace{E[Y_{1i} | D_i = 1] - E[Y_{0i} | D_i = 1]}_{\text{ATT}} + \underbrace{E[Y_{0i} | D_i = 1] - E[Y_{0i} | D_i = 0]}_{\text{Selection Bias}}\]

  • ATT (Average Treatment Effect on the Treated): The actual benefit the treated group received. This is what we want to measure.

  • Selection Bias: The difference in average untreated potential outcomes (\(Y_0\)) between the two groups. This captures how the treated group differs from the untreated group even in the absence of treatment.
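The decomposition can be checked numerically. The following is a minimal simulation sketch, not a method from these notes: the data-generating process (normally distributed \(Y_0\), a constant effect of 5, and a selection rule that favors units with high \(Y_0\)) is entirely assumed for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 10_000

# Assumed data-generating process (illustrative only).
y0 = rng.normal(50, 10, n)   # untreated potential outcome
y1 = y0 + 5                  # treated potential outcome, constant effect of 5

# Assumed selection rule: units with higher y0 are more likely to be treated.
d = (y0 + rng.normal(0, 10, n) > 50).astype(int)

# Observed outcome
y = d * y1 + (1 - d) * y0

# Simple difference in means, from observed data only
sdo = y[d == 1].mean() - y[d == 0].mean()

# The two pieces, computable only because we simulated both potential outcomes
att = (y1[d == 1] - y0[d == 1]).mean()
selection_bias = y0[d == 1].mean() - y0[d == 0].mean()

print(f"SDO            = {sdo:.2f}")
print(f"ATT            = {att:.2f}")
print(f"Selection bias = {selection_bias:.2f}")
print("SDO == ATT + bias:", np.isclose(sdo, att + selection_bias))
```

Because selection here favors units with high \(Y_0\), the bias term is positive and the SDO overstates the ATT.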

Examples

1. Do hospitals improve health? (Negative Selection Bias)

Imagine you are an alien visiting Earth. You want to know if Hospitals (\(D\)) are good for Human Health (\(Y\)).

You collect data and compare two groups:

  • Group A: People currently in the hospital (\(D=1\)).
  • Group B: People currently not in the hospital (\(D=0\)).

The Observation: When you measure their average health, you find that Group A (Hospitalized) is in much worse physical shape than Group B (Not Hospitalized).

The Naive Conclusion (Simple Difference in Means):

“Going to the hospital causes poor health. Therefore, hospitals are dangerous.”

Why this is Selection Bias (The Math)

The conclusion is wrong because of Selection Bias. People do not go to the hospital at random; they “select” into the hospital because they are already sick.

Let’s look at the Selection Bias term from our equation:

\[\text{Selection Bias} = \underbrace{E[Y_{0i} | D_i = 1]}_{\text{Baseline health of the Hospitalized}} - \underbrace{E[Y_{0i} | D_i = 0]}_{\text{Baseline health of the General Public}}\]

1. The First Term: \(E[Y_{0i} | D_i = 1]\)

  • Who are they? These are the people who did go to the hospital (\(D=1\)).
  • What are we measuring? Their potential outcome \(Y_0\). This represents their health if they had stayed home and not received care.
  • Reality: Their health would be terrible. They are likely injured or ill.
    • Value: Low (e.g., 2/10 health score).

2. The Second Term: \(E[Y_{0i} | D_i = 0]\)

  • Who are they? These are the people who did not go to the hospital (\(D=0\)).
  • What are we measuring? Their potential outcome \(Y_0\). This represents their health without care.
  • Reality: They are walking around, working, and generally fine.
    • Value: High (e.g., 9/10 health score).

3. The Calculation

\[\text{Bias} = (\text{Low}) - (\text{High}) = \text{Negative}\]

Because the people who go to the hospital start with a “health deficit,” the Selection Bias is Negative.

The “Masking” Effect

This negative bias is so strong that it mathematically “masks” or hides the positive effect of the hospital.

  • True Effect (ATT): The hospital improves a sick person’s health by +5 points.

  • Selection Bias: Sick people are 7 points less healthy than the general public (-7).

\[\text{Observed Difference} = \text{True Effect} (+5) + \text{Selection Bias} (-7)\]

\[\text{Observed Difference} = -2\]

  • Conclusion: Even though the hospital helped (+5), the data make it look like the hospital hurt (-2) because the patients were so sick to begin with.
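The hospital arithmetic can be reproduced in a few lines. This sketch uses only the illustrative health scores from the example (2/10 and 9/10 baselines, a +5 treatment effect); the variable names are made up for this sketch.

```python
# Illustrative health scores taken from the hospital example.
y0_hospitalized = 2.0   # E[Y0 | D=1]: health of the hospitalized without care
y0_public = 9.0         # E[Y0 | D=0]: health of the general public
att = 5.0               # true effect of hospital care on the hospitalized

# What we actually observe for the treated group: baseline plus treatment effect
y1_hospitalized = y0_hospitalized + att

selection_bias = y0_hospitalized - y0_public
observed_difference = y1_hospitalized - y0_public

print(selection_bias)       # -7.0
print(observed_difference)  # -2.0
assert observed_difference == att + selection_bias
```

The naive comparison returns -2 even though every hospitalized person gained 5 points.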

2. The “Go-Getter” or Positive Bias (Job Training)

Scenario: You want to know if a Voluntary Job Training Program (\(D\)) increases Future Wages (\(Y\)).

You compare two groups:

  • Group A (\(D=1\)): People who signed up for the training.

  • Group B (\(D=0\)): People who did not sign up.

The Observation: Group A earns significantly more money after the program than Group B.

The Naive Conclusion:

“The training program is a miracle! It created massive wage gains.”

Why this is Positive Selection Bias

Unlike the hospital example (where the treated were “weaker”), here the treated group is “stronger.”

People who voluntarily sign up for job training are often more motivated, ambitious, and organized (“Go-Getters”). People who don’t sign up might be less career-focused or disorganized.

\[\text{Selection Bias} = \underbrace{E[Y_{0i} | D_i = 1]}_{\text{Base earning potential of Trainees}} - \underbrace{E[Y_{0i} | D_i = 0]}_{\text{Base earning potential of Non-Trainees}}\]

1. The First Term: \(E[Y_{0i} | D_i = 1]\)

  • Who are they? The “Go-Getters” who signed up (\(D=1\)).

  • What are we measuring? Their earnings \(Y_0\) if the program never existed.

  • Reality: Because they are ambitious and hardworking, they would likely find a way to get a promotion or a better job anyway.

    • Value: High (e.g., $50k/year potential).

2. The Second Term: \(E[Y_{0i} | D_i = 0]\)

  • Who are they? The people who didn’t bother to sign up (\(D=0\)).

  • What are we measuring? Their earnings \(Y_0\) without the program.

  • Reality: They might remain in their current roles.

    • Value: Average (e.g., $40k/year potential).

3. The Calculation

\[\text{Bias} = (\text{High}) - (\text{Average}) = \text{Positive}\]

The “Inflation” Effect

In this case, the bias works in the same direction as the treatment, causing us to overestimate the impact.

  • True Effect (ATT): The training teaches skills worth +$5k/year.

  • Selection Bias: The “Go-Getters” would naturally earn $10k/year more than the others because of their drive (+$10k).

\[\text{Observed Difference} = \text{True Effect} (+5k) + \text{Selection Bias} (+10k)\]

\[\text{Observed Difference} = +15k\]

  • Conclusion: You measure a $15k wage gap and attribute it all to your program. In reality, the program contributed only one third of that gap ($5k). The rest reflects the participants’ higher baseline earning potential.
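The same bookkeeping can be wrapped in a small helper. The function name `decompose` and its arguments are hypothetical, introduced only for this sketch; the dollar figures are the illustrative ones from the job training example.

```python
def decompose(y1_treated, y0_treated, y0_untreated):
    """Split the simple difference in means into ATT and selection bias.

    All three inputs are group averages. y0_treated is the counterfactual,
    so in practice it is unknown; here we plug in the illustrative value.
    """
    att = y1_treated - y0_treated
    selection_bias = y0_treated - y0_untreated
    observed = y1_treated - y0_untreated
    return att, selection_bias, observed


# Trainees: $50k baseline plus a $5k training effect; non-trainees: $40k.
att, bias, observed = decompose(55_000, 50_000, 40_000)

print(att, bias, observed)          # 5000 10000 15000
print(round(att / observed, 3))     # 0.333
```

Only a third of the observed gap is the program's effect; the remainder is selection bias.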

Key Takeaway for Students

In social science, Selection Bias occurs when the treated group differs from the untreated group in a way that affects the outcome regardless of whether the treatment happened.

  • Positive Selection Bias: The treated group would have done better anyway (e.g., highly motivated students attending a tutoring program).

  • Negative Selection Bias: The treated group would have done worse anyway (e.g., sick people going to a hospital, or struggling regions receiving financial aid).

Solutions to Selection Bias

How do we eliminate the bias term (\(E[Y_{0i} | D_i = 1] - E[Y_{0i} | D_i = 0]\)) so we can see the true causal effect?

We generally categorize solutions into two “buckets”:

  • Randomization (Randomized Controlled Trials, or RCTs: the gold standard), and

  • Quasi-Experimental Methods (QEM). This category covers Difference-in-Differences (DiD), Regression Discontinuity Design (RDD), Propensity Score Matching (PSM), and Instrumental Variables (IV).
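To see why randomization addresses the bias term, here is a minimal simulation sketch. It reuses an assumed data-generating process (normally distributed \(Y_0\), constant effect of 5) that is not from these notes, and assigns treatment by a fair coin flip so that \(D_i\) is independent of \(Y_{0i}\).

```python
import numpy as np

rng = np.random.default_rng(42)
n = 100_000

# Assumed data-generating process (illustrative only).
y0 = rng.normal(50, 10, n)   # untreated potential outcome
y1 = y0 + 5                  # constant treatment effect of 5

# Random assignment: a fair coin flip, unrelated to y0.
d = rng.integers(0, 2, size=n)

# Observed outcome
y = d * y1 + (1 - d) * y0

selection_bias = y0[d == 1].mean() - y0[d == 0].mean()
sdo = y[d == 1].mean() - y[d == 0].mean()

print(f"Selection bias = {selection_bias:.3f}")  # close to 0
print(f"SDO            = {sdo:.3f}")             # close to the true effect of 5
```

With random assignment the two groups have the same expected \(Y_0\), so the bias term is zero in expectation and the simple difference in means recovers the treatment effect up to sampling noise.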