**Endogeneity and 2SLS in GRETL**

A Step-by-Step Guide for Applied Econometrics Lab
- GPA Data Set


Part 1: Why OLS Requires Exogeneity

1.1 The Fundamental OLS Assumption

OLS regression relies on the exogeneity assumption:
\[ \text{Cov}(X, \varepsilon) = 0 \]
This means:
- Your explanatory variables (X) must not correlate with unobserved factors in the error term (ε).
- If they do, OLS estimates become biased (wrong on average) and inconsistent (they do not converge to the true value even as the sample grows).
- Invalid inference: Confidence intervals and hypothesis tests become unreliable.

Math Behind the Bias

The OLS estimator is:
\[ \hat{\beta} = \beta + \underbrace{(X'X)^{-1}X'\varepsilon}_{\text{Bias term}} \]
If \(X\) and \(\varepsilon\) are correlated, the bias term doesn’t vanish, even as \(n \to \infty\).
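
For the simple regression \(y = \beta_0 + \beta_1 x + \varepsilon\), the limit of the OLS slope can be written out explicitly. This is the standard textbook result, restated here in the notation of this section rather than taken from the lab material:
\[ \operatorname{plim}_{n \to \infty} \hat{\beta}_1 = \beta_1 + \frac{\text{Cov}(x, \varepsilon)}{\text{Var}(x)} \]
The second term is zero only when \(\text{Cov}(x, \varepsilon) = 0\). A positive covariance pushes the estimate above \(\beta_1\) no matter how large the sample is.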

1.2 What Happens When Exogeneity Fails?

Example: Suppose we estimate:
\[ \text{GPA} = \beta_0 + \beta_1 \text{StudyHours} + \varepsilon \]
- If ε includes “natural ability,” and smarter students study more, then:
\[ \text{Cov}(\text{StudyHours}, \varepsilon) > 0 \]

  • OLS attributes both study effort AND ability to β₁

  • Result: OLS overestimates the effect of studying because it conflates study hours with innate ability.

  • Analogy: Using a thermometer affected by sunlight → biased temperature readings.

  • Exercise: Model students' grades using attendance at lectures.

    • Which omitted factor would make attendance endogenous?
    • Three candidate omitted factors:
    1. Difficulty of the exam
    2. Motivation of the students
    3. Whether attendance is compulsory

  Answers:
  1. Difficulty of the exam: NO. It is not correlated with attendance.
  2. Motivation of the students: YES. It correlates with attendance and affects the grade.
  3. Compulsory attendance: NO. It does not directly affect the grade.
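
Returning to the study-hours example at the start of this subsection, a small simulation can make the bias visible. The sketch below is illustrative only: the data are simulated, and every number in it (a true effect of 0.10, the ability coefficients, the noise levels) is an assumption chosen for the demonstration, not a value from the lab data set.

```python
import random

def ols_slope(x, y):
    """Slope of a simple regression of y on x (with intercept)."""
    n = len(x)
    mx = sum(x) / n
    my = sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    return sxy / sxx

def simulate_gpa(n=20000, beta1=0.10, seed=1):
    """Simulate GPA where unobserved ability raises both study hours and GPA."""
    rng = random.Random(seed)
    hours, gpa = [], []
    for _ in range(n):
        ability = rng.gauss(0, 1)                # unobserved, ends up in the error term
        h = 10 + 2 * ability + rng.gauss(0, 2)   # abler students study more (assumed)
        g = 5 + beta1 * h + 0.5 * ability + rng.gauss(0, 0.5)
        hours.append(h)
        gpa.append(g)
    return hours, gpa

hours, gpa = simulate_gpa()
slope = ols_slope(hours, gpa)
print(f"true effect = 0.10, OLS estimate = {slope:.3f}")
```

Under these assumed parameters the large-sample limit of the OLS slope is 0.10 + (0.5 × 2) / (4 + 4) = 0.225, more than double the true effect, because the regression credits study hours with the contribution of ability.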

Part 2: The GPA and Prep Course Example

2.1 Introducing the Problem

Research Question: Does taking a preparatory math course (participation) improve GPA in an engineering MOOC?

The Data (from TrainExer45.gdt):

| Variable | Description | Role |
|---|---|---|
| GPA | Grade Point Average (0-10) | Outcome |
| Participation | 1 if took prep course (voluntary) | Endogenous X |
| Gender | Control variable | Exogenous |
| Email | 1 if received invitation | Instrument (Z) |

In our GPA study:
- Prep course participation is voluntary (self-selection)
- Motivated students (high ε) are more likely to participate
- OLS conflates course effect with student motivation

Analogy: Measuring a drug’s effect when only healthy patients take it.

2.2 The Self-Selection Problem

  • Participation is voluntary: Students decide based on:
    • Motivation
    • Prior knowledge
    • Time availability
  • These unobserved factors (ε) affect both participation and GPA → endogeneity

Instrumental Variables - The Solution

2.3 What Makes a Good Instrument?

A valid instrument Z must satisfy:
1. Relevance: Correlated with endogenous X (participation)
   - Test: Strong first-stage relationship (F-stat > 10)
2. Exogeneity: Uncorrelated with ε (affects y only through X)
   - No statistical test - must argue conceptually
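
The role of the two conditions can be seen in the IV estimator for the simplest case of one regressor, one instrument, and no controls. The expression below is the standard textbook form, given here as a sketch in the notation of this guide:
\[ \hat{\beta}_{IV} = \frac{\widehat{\text{Cov}}(Z, y)}{\widehat{\text{Cov}}(Z, X)}, \qquad \operatorname{plim}_{n \to \infty} \hat{\beta}_{IV} = \beta_1 + \frac{\text{Cov}(Z, \varepsilon)}{\text{Cov}(Z, X)} \]
Exogeneity makes the numerator of the last term zero, and relevance keeps its denominator away from zero. When the instrument is weak, the denominator is close to zero and even a small violation of exogeneity is magnified.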

2.4 Our Instrument: Email Invitation

 - **Nature**: Random technical issue caused some students to **not receive** the invitation
 - **Why valid?**
  1. Relevance: Receiving email increases participation chance
  2. Exogeneity: Email delivery was random (uncorrelated with student ability/motivation)

Endogeneity Risk:
- Motivated students (high ε) are more likely to take the course.
- OLS conflates the course effect with motivation bias.
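
To see how a randomly delivered email can undo this bias, the following sketch simulates a setting loosely modelled on the one described above and compares OLS with the simple IV ratio estimator. All parameters (a true course effect of 0.25, the strength of motivation, a 50% delivery rate) are assumptions made for illustration; none come from TrainExer45.gdt, and the Gender control is left out to keep the code short.

```python
import random

def mean(v):
    return sum(v) / len(v)

def cov(a, b):
    """Sample covariance (divides by n; the scaling cancels in the ratios below)."""
    ma, mb = mean(a), mean(b)
    return sum((x - ma) * (y - mb) for x, y in zip(a, b)) / len(a)

def simulate(n=50000, effect=0.25, seed=2):
    """Simulated students: motivation drives both participation and GPA."""
    rng = random.Random(seed)
    email, part, gpa = [], [], []
    for _ in range(n):
        motivation = rng.gauss(0, 1)             # unobserved by the analyst
        z = 1.0 if rng.random() < 0.5 else 0.0   # email delivered at random
        # invited students and motivated students participate more often
        latent = 0.8 * z + 0.7 * motivation + rng.gauss(0, 1)
        p = 1.0 if latent > 0.5 else 0.0
        g = 6 + effect * p + 0.6 * motivation + rng.gauss(0, 1)
        email.append(z)
        part.append(p)
        gpa.append(g)
    return email, part, gpa

email, part, gpa = simulate()
b_ols = cov(part, gpa) / cov(part, part)   # OLS slope, contaminated by motivation
b_iv = cov(email, gpa) / cov(email, part)  # IV (Wald) estimator using the email
print(f"true effect = 0.25, OLS = {b_ols:.3f}, IV = {b_iv:.3f}")
```

In this simulated setting the OLS slope lands well above the assumed true effect, while the IV estimate is centred on it but noisier, which mirrors the pattern the lab asks you to look for in GRETL.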


Part 3: Step-by-Step GRETL Analysis

3.1 Step 1: Run OLS (Biased Estimates)

  1. Menu: Model → Ordinary Least Squares
  2. Dependent: GPA
  3. Regressors: const Gender Participation
  4. Result: β_participation = 0.82 (SE = 0.047)

Interpretation: Likely overestimated due to self-selection.

3.2 Step 2: First-Stage Regression

  1. Menu: Model → Ordinary Least Squares (the first stage is itself an OLS regression)
  2. Dependent: Participation
  3. Regressors: const Gender Email
  4. Save residuals: In the model window, use Save → Residuals and name the series V

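Key Output:
- Check that Email is significant (|t-stat| > 2)
- The first-stage F-statistic on the instrument should exceed 10 (rule-of-thumb weak instrument test). With a single instrument, F = t², so this corresponds to |t| above roughly 3.2.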
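
The relationship between the first-stage t-statistic and F-statistic can be checked by hand. The sketch below runs a first stage on simulated data with a single instrument and no controls; the participation rates (30% without the email, 55% with it) are hypothetical, and the standard error is the conventional homoskedastic one.

```python
import math
import random

def first_stage(z, x):
    """Regress x on a constant and z; return slope, standard error, t and F."""
    n = len(z)
    mz = sum(z) / n
    mx = sum(x) / n
    szz = sum((a - mz) ** 2 for a in z)
    szx = sum((a - mz) * (b - mx) for a, b in zip(z, x))
    slope = szx / szz
    intercept = mx - slope * mz
    rss = sum((b - intercept - slope * a) ** 2 for a, b in zip(z, x))
    se = math.sqrt(rss / (n - 2) / szz)   # conventional (homoskedastic) SE
    t = slope / se
    return slope, se, t, t * t            # with one instrument, F = t squared

rng = random.Random(5)
email = [1.0 if rng.random() < 0.5 else 0.0 for _ in range(1000)]
# hypothetical participation rates: 30% without the email, 55% with it
part = [1.0 if rng.random() < 0.30 + 0.25 * z else 0.0 for z in email]

slope, se, t, f_stat = first_stage(email, part)
print(f"slope = {slope:.3f}, se = {se:.3f}, t = {t:.2f}, F = {f_stat:.1f}")
```

With the Gender control included, as in the lab, the same logic applies to the t-statistic on Email in the GRETL output.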

3.3 Step 3: Hausman Test

  1. Re-run the OLS regression of GPA from Step 1 and save the residuals as e_OLS
  2. Menu: Model → Ordinary Least Squares
    • Dependent: e_OLS
    • Regressors: const Gender Participation V
  3. Test: If V is significant (p < 0.05), reject the exogeneity of Participation; OLS is then inconsistent

Intuition: V captures the “self-selection” part of participation that correlates with GPA errors.
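
The three regressions behind this test can be sketched outside GRETL as follows. This is a minimal illustration on simulated data with assumed parameters, not the lab data. It uses the n·R² form of the test, compared with the 5% critical value of 3.84 for a chi-squared distribution with one degree of freedom, in place of the t-test on V described above. The helper functions `solve` and `ols` are written for this example.

```python
import random

def solve(a, b):
    """Solve the linear system a x = b by Gaussian elimination with pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]
    for c in range(n):
        piv = max(range(c, n), key=lambda r: abs(m[r][c]))
        m[c], m[piv] = m[piv], m[c]
        for r in range(c + 1, n):
            f = m[r][c] / m[c][c]
            for k in range(c, n + 1):
                m[r][k] -= f * m[c][k]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        x[r] = (m[r][n] - sum(m[r][k] * x[k] for k in range(r + 1, n))) / m[r][r]
    return x

def ols(y, cols):
    """OLS of y on the given columns; returns (coefficients, residuals)."""
    k = len(cols)
    xtx = [[sum(a * b for a, b in zip(cols[i], cols[j])) for j in range(k)]
           for i in range(k)]
    xty = [sum(a * b for a, b in zip(cols[i], y)) for i in range(k)]
    beta = solve(xtx, xty)
    resid = [yi - sum(beta[j] * cols[j][i] for j in range(k))
             for i, yi in enumerate(y)]
    return beta, resid

# Simulated data with assumed parameters (motivation is the omitted factor)
rng = random.Random(3)
n = 20000
const = [1.0] * n
gender, email, part, gpa = [], [], [], []
for _ in range(n):
    motivation = rng.gauss(0, 1)
    g = 1.0 if rng.random() < 0.5 else 0.0
    z = 1.0 if rng.random() < 0.5 else 0.0
    p = 1.0 if 0.8 * z + 0.7 * motivation + rng.gauss(0, 1) > 0.5 else 0.0
    y = 6 + 0.25 * p + 0.1 * g + 0.6 * motivation + rng.gauss(0, 1)
    gender.append(g)
    email.append(z)
    part.append(p)
    gpa.append(y)

_, v = ols(part, [const, gender, email])      # first-stage residuals V
_, e = ols(gpa, [const, gender, part])        # OLS residuals e_OLS
_, u = ols(e, [const, gender, part, v])       # auxiliary regression of e_OLS on X and V
# e_OLS has mean zero (the OLS model has a constant), so its total sum of squares is sum(e^2)
r2 = 1 - sum(a * a for a in u) / sum(a * a for a in e)
lm = n * r2
print(f"n*R^2 = {lm:.1f} (reject exogeneity at 5% if above 3.84)")
```

Because e_OLS is orthogonal to const, Gender, and Participation by construction, any explanatory power in the auxiliary regression comes from V alone.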

3.4 Step 4: Run 2SLS

  1. Menu: Model → Two-Stage Least Squares
  2. Dependent: GPA
  3. Regressors: const Gender Participation
  4. Instruments: const Gender Email
  5. Result: β_participation = 0.24 (SE = 0.115)

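Interpretation: The 2SLS estimate is much smaller than the OLS estimate (0.24 versus 0.82), which suggests that OLS overstated the effect of the prep course because of self-selection. The price of consistency is a larger standard error.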
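
As a worked check on the two results, approximate 95% confidence intervals can be computed from the reported estimates and standard errors, assuming the usual normal approximation:
\[ \text{2SLS: } 0.24 \pm 1.96 \times 0.115 = [0.015,\ 0.465], \qquad \text{OLS: } 0.82 \pm 1.96 \times 0.047 = [0.728,\ 0.912] \]
The intervals do not overlap. The 2SLS interval still excludes zero, but only narrowly, and its standard error is about 2.4 times the OLS one (0.115 / 0.047 ≈ 2.45).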


Part 4: Key Takeaways

  1. Self-selection creates endogeneity → OLS fails
  2. Valid instruments must:
    • Affect participation (relevance)
    • Be uncorrelated with the error term, e.g. through random assignment (exogeneity)
  3. 2SLS workflow:
    • First stage: Regress endogenous X on Z and the exogenous controls
    • Second stage: Regress y on the predicted X and the controls
  4. Always test:
    • Instrument strength (first-stage F-stat)
    • Endogeneity (Hausman test)
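
The two-stage workflow in takeaway 3 can also be carried out by hand. The sketch below does so on simulated data without controls (all parameters are assumptions for illustration) and confirms that the manual second-stage slope equals the ratio of the reduced-form slope to the first-stage slope.

```python
import random

def fit(x, y):
    """Simple regression of y on x; returns (intercept, slope)."""
    n = len(x)
    mx = sum(x) / n
    my = sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    slope = sxy / sxx
    return my - slope * mx, slope

# Simulated data: z is the instrument, x the endogenous regressor, y the outcome
rng = random.Random(4)
z, x, y = [], [], []
for _ in range(5000):
    motivation = rng.gauss(0, 1)                 # unobserved confounder
    zi = 1.0 if rng.random() < 0.5 else 0.0
    xi = 1.0 if 0.8 * zi + 0.7 * motivation + rng.gauss(0, 1) > 0.5 else 0.0
    yi = 6 + 0.25 * xi + 0.6 * motivation + rng.gauss(0, 1)
    z.append(zi)
    x.append(xi)
    y.append(yi)

# First stage: regress x on z and form the predicted values
a0, a1 = fit(z, x)
x_hat = [a0 + a1 * zi for zi in z]

# Second stage: regress y on the predicted x
_, b_2sls = fit(x_hat, y)

# Equivalent ratio form: reduced-form slope divided by first-stage slope
_, reduced_form = fit(z, y)
b_ratio = reduced_form / a1
print(f"manual 2SLS = {b_2sls:.4f}, ratio form = {b_ratio:.4f}")
```

The coefficient from the manual second stage is correct, but the standard errors an OLS routine reports for that stage are not, because they are based on residuals computed with the predicted X instead of the actual X. This is one reason to use the dedicated Two-Stage Least Squares routine in GRETL for the final estimate.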

GRETL Quick Guide

| Task | Menu Path | Key Check |
|---|---|---|
| OLS | Model → Ordinary Least Squares | Compare with 2SLS |
| First-stage regression | Model → Ordinary Least Squares | F-stat > 10 |
| Hausman test | Regress OLS residuals on X and V | p-value of V |
| 2SLS | Model → Two-Stage Least Squares | Coefficient and SE versus OLS |

Final Advice: “Finding good instruments is like detective work - look for natural experiments and argue carefully for exogeneity!”