Teaching Notes: Day 5 - Multiple Regression Application Lab

Session Overview

Duration: 2 × 45-minute sessions
Objective: Apply multiple regression concepts to analyze wage determinants, test model specifications, and interpret results.
Tools: GRETL, Wage Dataset


Session 1: Model Estimation & Interpretation (45 min)

1. Review of Key Concepts

1. The Fundamental Problem: Why Simple Regression Isn’t Enough

A. The Naïve Approach: Wage vs. Gender (Simple Regression)

  • Question: “Do women earn less than men?”
  • Simple Regression Approach:
    \[ \log(\text{Wage}) = \beta_0 + \beta_1 \text{Female} + \epsilon \]
    • Finds: Women earn 22% less on average.
    • But is this all discrimination?
    • Problem: This bundles all gender-related differences (education, job type, experience) into one coefficient.

B. The Reality: Gender Affects Wage Through Multiple Channels

  • Structural Pathways:

    Gender → Education → Wage  
    Gender → Part-Time Status → Wage  
    Gender → (Direct Discrimination) → Wage
  • Example:

    • Suppose:
      • Women, on average, have less education (48% in Level 1 vs. 34% for men).
      • More education → higher wages, so leaving education out biases the gender coefficient (omitted variable bias).
    • Result: Simple regression overstates discrimination because it doesn’t account for education differences.

C. The Need for Multiple Regression

  • Goal: Isolate the direct effect of gender, holding other factors constant.
  • Analogy:
    • “Comparing wages of men and women with the same education and job status.”
    • Like a controlled experiment, where we “adjust” for the factors that affect wage (dependent variable).

2. Total Effect vs. Partial Effect: Two Different Research Questions

(Policy vs. Economic Analysis)

A. When to Use Simple Regression (Total Effect)

  • Question: “What is the overall wage gap society observes?”
    • Useful for policy debates (e.g., “Do we need gender equity laws?”).
    • Example: If women earn less because they choose lower-paying careers, should policymakers intervene?

B. When to Use Multiple Regression (Partial Effect)

  • Question: “Is there wage discrimination after accounting for qualifications?”
    • Used in legal cases (e.g., suing for pay discrimination).
    • Example: If women with the same education and job status still earn less, this suggests bias.

C. Real-World Parallel: The “College Premium” Debate

  • Simple Regression: College grads earn more.
  • Multiple Regression: But what if they also come from wealthier families?
  • Key Insight: Multiple regression helps separate causation from correlation.

3. Omitted Variable Bias: The Core Issue

A. Visualizing Bias

  1. Draw:
    • X-axis: Female (0 = Male, 1 = Female)
    • Y-axis: Log(Wage)
    • Regression line: Downward slope (-0.25).
  2. Now overlay:
    • Education: Women cluster at lower education → pulls their wages down further.
    • Part-Time: More women work part-time → also lowers wages.

B. The Two Biases at Play

  1. Downward Bias from Education
    • Women have less education → lowers wages.
    • If we don’t control for education, the gender gap looks worse than it is.
  2. Downward Bias from Part-Time Work
    • Women work part-time more often → lowers wages.
    • If we don’t control for part-time status, the gender gap looks worse than it is.

C. Net Effect in Simple Regression

  • The observed -0.25 is a mixture:
    • True discrimination effect (unknown).
    • Minus education bias (makes gap seem bigger).
    • Minus part-time bias (makes gap seem bigger).
  • Conclusion: We can’t trust the simple regression!
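
The mixture above can be written out with the standard omitted variable decomposition for OLS. This is a sketch: the auxiliary slopes \(\delta\) are not reported in these notes and are introduced only for illustration. Each \(\delta\) is the slope from a simple regression of the omitted variable on Female, and the \(b\) terms on the right are the multiple-regression coefficients (Age is included because the lab model later in these notes contains it):

\[ b_{\text{simple}} = b_{\text{Female}} + b_{\text{Age}}\,\delta_{\text{Age}} + b_{\text{Educ}}\,\delta_{\text{Educ}} + b_{\text{Parttime}}\,\delta_{\text{Parttime}} \]

If women have less education (\(\delta_{\text{Educ}} < 0\), \(b_{\text{Educ}} > 0\)) and work part-time more often (\(\delta_{\text{Parttime}} > 0\), \(b_{\text{Parttime}} < 0\)), both products are negative, so the simple coefficient is more negative than the direct effect \(b_{\text{Female}}\).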

4. Preparing for GRETL: What We’ll Test

(Linking Theory to Lab)

A. Residual Analysis: Detecting Omitted Variables

  • If residuals correlate with education → education was omitted!
  • If residuals correlate with part-time → part-time was omitted!

B. Next Steps in Lab

  1. Run simple regression (gender only).
  2. Regress the residuals on education and part-time status. Interpret the coefficients and discuss.
  3. If patterns exist → multiple regression is needed!
  • Each extra level of education raises the unexplained (residual) wage by about 22% (expected).
  • Having a part-time job raises the unexplained wage by about 10% (unexpected).
  • Why? This can happen if part-time jobs are more common among employees with higher education.
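
The residual check can be rehearsed before opening GRETL with a small simulation. The sketch below uses synthetic data, not the Wage Dataset: the education shares (apart from the 48% and 34% Level 1 figures quoted earlier), the noise level, and the "true" coefficients are all assumptions chosen for illustration, and the helper names are hypothetical.

```python
import random


def ols_slope_intercept(x, y):
    """Simple OLS of y on x with a constant: returns (intercept, slope)."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    slope = sxy / sxx
    return mean_y - slope * mean_x, slope


def simulate(n=5000, seed=1):
    """Synthetic sample: women are less educated, education raises log wage."""
    rng = random.Random(seed)
    levels = [1, 2, 3, 4]
    shares_female = [0.48, 0.30, 0.15, 0.07]  # assumed, except the 48%
    shares_male = [0.34, 0.30, 0.22, 0.14]    # assumed, except the 34%
    female, educ, logwage = [], [], []
    for _ in range(n):
        f = 1 if rng.random() < 0.5 else 0
        e = rng.choices(levels, weights=shares_female if f else shares_male)[0]
        # Assumed "true" model: direct gender effect -0.04, education 0.23.
        y = 3.0 - 0.04 * f + 0.23 * e + rng.gauss(0, 0.3)
        female.append(f)
        educ.append(e)
        logwage.append(y)
    return female, educ, logwage


female, educ, logwage = simulate()

# Step 1: simple regression of log wage on gender only.
intercept, b_female = ols_slope_intercept(female, logwage)

# Step 2: regress the residuals on the omitted variable (education).
residuals = [y - (intercept + b_female * f) for f, y in zip(female, logwage)]
_, b_resid_educ = ols_slope_intercept(educ, residuals)

print(f"simple-regression gender coefficient: {b_female:.3f}")
print(f"slope of residuals on education:      {b_resid_educ:.3f}")
```

In this setup the simple gender coefficient comes out more negative than the assumed direct effect of -0.04, and the residuals slope upward in education, which is the pattern that signals an omitted variable.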

Summary

| Concept | Simple Regression | Multiple Regression |
|---------|-------------------|---------------------|
| What it measures | Total gender gap | Partial (direct) effect |
| Use case | Policy discussions | Legal discrimination |
| Main problem | Omitted variable bias | Requires more data |
| Gender gap interpretation | “Women earn 22% less” | “Women earn X% less for the same qualifications” |

Transition to GRETL:
“Now, let’s see this in action with real data!”

2. Lab Exercise 1: Multiple Regression in GRETL (20 min)

Task: Run the model:
\[ \log(\text{Wage}) = 3.05 - 0.04\text{Female} + 0.03\text{Age} + 0.23\text{Educ} - 0.37\text{Parttime} + e \]
Steps:
1. Load dataset in GRETL.
2. Model > Ordinary Least Squares > log(Wage) ~ Female + Age + Educ + Parttime.

Regression Outcomes Discussion:
| Variable | Coeff. | Interpretation (Relative Effect) |
|----------|--------|----------------------------------|
| Female | -0.041 | Women earn 4% less (p=0.097, NS at 5%) |
| Age | 0.031 | +3% per year (p=0.000) |
| Educ | 0.233 | +26% per level (p=0.000) |
| Parttime | -0.365 | -31% (p=0.000) |

Key Points:
- Gender coefficient shrinks from -0.25 (simple regression, about 22% lower wages) to -0.04 (multiple regression, about 4% lower wages).
- Education and part-time are highly significant confounders.
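
The relative effects in the table follow from the log-wage coefficients via \(e^{\beta} - 1\). A minimal sketch of that conversion, using the coefficients reported above (the function name `pct_effect` is just an illustrative choice):

```python
import math


def pct_effect(beta):
    """Exact percentage wage change implied by a log-wage coefficient."""
    return 100.0 * (math.exp(beta) - 1.0)


# Coefficients from the regression table in these notes.
coefs = {"Female": -0.041, "Age": 0.031, "Educ": 0.233, "Parttime": -0.365}
effects = {name: pct_effect(b) for name, b in coefs.items()}

for name, eff in effects.items():
    print(f"{name:9s} beta={coefs[name]:+.3f}  exact effect={eff:+.1f}%")
```

For small coefficients such as Female and Age, the exact effect is almost the same as 100 times the coefficient; for larger ones such as Educ and Parttime the two differ noticeably, which is why the table reports 26% and -31% rather than 23% and -37%.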


Session 2: Model Extensions & Testing (45 min)

3. Absolute vs. Relative Effects (10 min)

Mathematical Interpretation:
- Log-wage model (relative effects):
\[ \beta_j = \frac{\partial \log(\text{Wage})}{\partial x_j} = \frac{\partial \text{Wage}}{\partial x_j} \cdot \frac{1}{\text{Wage}} \approx \% \Delta \text{Wage} \]
- e.g., \(e^{0.233} - 1 \approx 26\%\) wage increase per education level.
- Linear wage model (absolute effects):
\[ \text{Wage} = -77.87 - 2.12\text{Female} + 29.47\text{Educ} + \dots \]
- Education adds $29.47 per level (vs. 26% in log model).

Conceptual Discussion:
- “Use log models for % or relative effects, linear for $ effects.”

4. Testing Education Effects (20 min)

Task: Test if education returns are constant across levels.

Steps:
1. Create dummies (DE2, DE3, DE4) for education levels.
2. Estimate:
\[ \log(\text{Wage}) = 3.32 - 0.03\text{Female} + 0.03\text{Age} + 0.17\text{DE2} + 0.38\text{DE3} + 0.77\text{DE4} - 0.37\text{Parttime} + e \]
3. F-test:
- \(H_0\): Linear education effects (\(\beta_5 = 2\beta_4, \beta_6 = 3\beta_4\), where \(\beta_4, \beta_5, \beta_6\) are the coefficients on DE2, DE3, DE4).
- Compute \(F = \frac{(R_1^2 - R_0^2)/g}{(1 - R_1^2)/(n-k)} = \frac{(0.716 - 0.704)/2}{(1-0.716)/493} = 10.4\) > 3.0 (approximate 5% critical value of \(F(2, 493)\)).
- Reject \(H_0\): Returns are non-linear (47% gain for highest level).
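
The F computation can be checked by hand or with a few lines of code. This sketch assumes 0.716 is the R² of the unrestricted model with education dummies, 0.704 is the R² of the restricted model with a single Educ variable, and 493 is the residual degrees of freedom of the unrestricted model; the function name is hypothetical.

```python
def f_statistic(r2_unrestricted, r2_restricted, n_restrictions, df_resid):
    """F statistic for linear restrictions, computed from the two R-squared values."""
    numerator = (r2_unrestricted - r2_restricted) / n_restrictions
    denominator = (1.0 - r2_unrestricted) / df_resid
    return numerator / denominator


# Values taken from the notes: 2 restrictions, 493 residual degrees of freedom.
f_value = f_statistic(0.716, 0.704, n_restrictions=2, df_resid=493)
print(f"F = {f_value:.1f}")
```

The resulting value is then compared with the critical value of the F distribution with 2 and 493 degrees of freedom, taken from a table or from GRETL.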

Wage Effects by Education Level:
- Level 1 → 2: +19%
- Level 2 → 3: +23%
- Level 3 → 4: +47%
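
The step-by-step gains come from differences between adjacent dummy coefficients. The sketch below uses the rounded coefficients from the estimated equation (0.17, 0.38, 0.77), so its output is close to, but not exactly, the percentages listed above. It also shows what the dummy coefficients would have to be if the linear restriction held.

```python
import math

# Dummy coefficients from the estimated model; Level 1 is the baseline.
dummy_coefs = {1: 0.0, 2: 0.17, 3: 0.38, 4: 0.77}


def step_gain(lower, upper):
    """Percentage wage gain from moving between two education levels."""
    diff = dummy_coefs[upper] - dummy_coefs[lower]
    return 100.0 * (math.exp(diff) - 1.0)


gains = {(k, k + 1): step_gain(k, k + 1) for k in (1, 2, 3)}
for (lower, upper), gain in gains.items():
    print(f"Level {lower} -> {upper}: {gain:+.1f}%")

# Under H0 (equal steps), DE3 and DE4 would be 2x and 3x the DE2 coefficient.
implied_de3 = 2 * dummy_coefs[2]
implied_de4 = 3 * dummy_coefs[2]
print(f"DE3 estimated {dummy_coefs[3]:.2f} vs implied under H0 {implied_de3:.2f}")
print(f"DE4 estimated {dummy_coefs[4]:.2f} vs implied under H0 {implied_de4:.2f}")
```

The gap between the estimated and implied DE4 coefficient is the informal counterpart of the F-test rejection: the top education level pays more than equal steps would predict.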

Discussion:
- “Higher education yields increasing returns, especially at top levels.”