Teaching Notes: Econometrics Lab Session on Nonlinear Regression Functions

Introduction

This lab session focuses on detecting and modeling nonlinear relationships in regression analysis using the California Test Score dataset. We’ll explore two main groups of methods:

  1. When the effect of X₁ on Y depends on the value of X₁ itself.
  2. When the effect of X₁ on Y depends on another variable X₂.

Today we’ll focus on the first group - nonlinear relationships where the effect of a predictor depends on its own value.

1. Visualizing Nonlinearity: Test Scores vs. District Income

Scatterplot with Linear OLS Regression

Let’s begin by examining the relationship between test scores (Y) and district average income (X) with a simple linear regression:

View → Graph Specified Vars → X-Y Scatter

Observation: The linear fit doesn’t capture the apparent curvature in the data - most points are below the line at very low and very high incomes, but above the line in the middle range.
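
The pattern described above can be reproduced in a small sketch. It does not use the real dataset: the income grid is made up, and the scores are generated (without noise) from the quadratic fit reported in Section 2, so it only illustrates why a straight line leaves negative residuals at both ends and positive residuals in the middle when the true curve is concave.

```python
# Illustrative incomes in $1000s (a made-up grid, NOT the real districts).
incomes = [float(x) for x in range(5, 56, 5)]

# Scores generated from the quadratic fit quoted in Section 2, with no noise.
scores = [607.3 + 3.85 * x - 0.0423 * x ** 2 for x in incomes]


def fit_line(xs, ys):
    """Simple OLS of y on x; returns (intercept, slope)."""
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    sxx = sum((x - x_bar) ** 2 for x in xs)
    slope = sxy / sxx
    return y_bar - slope * x_bar, slope


intercept, slope = fit_line(incomes, scores)

# Residual = actual score minus the score predicted by the straight line.
residuals = [y - (intercept + slope * x) for x, y in zip(incomes, scores)]

for x, r in zip(incomes, residuals):
    side = "above" if r > 0 else "below"
    print(f"income {x:4.0f}: point is {side} the line (residual {r:+.2f})")
```

A run of same-signed residuals at each end of the income range, with the opposite sign in the middle, is the numerical counterpart of the curvature visible in the scatterplot.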

2. Quadratic Regression Model

Specification

To model this curvature, we’ll estimate a quadratic regression: TestScoreᵢ = β₀ + β₁Incomeᵢ + β₂Incomeᵢ² + uᵢ

avginc  → Add → square of selected variables 
Model  → ordinary least squares  → testscr avginc avginc_sq

Estimated equation: TestScore = 607.3 + 3.85 Income - 0.0423 Income²

Testing Nonlinearity

We can test whether the quadratic term is needed:

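H0: β₂ = 0 (the relationship is linear) versus H1: β₂ ≠ 0 (the relationship is quadratic)

The t-statistic on the quadratic term (-8.81) is large in absolute value and its p-value is below 0.01%, so we reject H0: the quadratic term belongs in the model.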
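
As a rough check on the reported p-value, the sketch below converts the t-statistic into a two-sided p-value. It assumes the large-sample standard normal approximation (an assumption of this sketch; the software may use a t distribution), and the function name is hypothetical.

```python
import math


def two_sided_p_value(t_stat):
    """Two-sided p-value under the standard normal approximation.

    Uses the identity 2 * (1 - Phi(|t|)) = erfc(|t| / sqrt(2)).
    """
    return math.erfc(abs(t_stat) / math.sqrt(2.0))


t_quadratic = -8.81  # t-statistic on the squared income term, from the notes
p_value = two_sided_p_value(t_quadratic)

print(f"|t| = {abs(t_quadratic):.2f}, two-sided p-value = {p_value:.2e}")
print("reject H0 at the 1% level:", abs(t_quadratic) > 2.58)
```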

Interpreting Effects

The effect of a change in income depends on the initial income level (income is measured in $1000s):

  1. Increase from $10K to $11K: ΔTestScore = [607.3 + 3.85×11 - 0.0423×11²] - [607.3 + 3.85×10 - 0.0423×10²] = 2.96 points

  2. Increase from $40K to $41K: ΔTestScore = [607.3 + 3.85×41 - 0.0423×41²] - [607.3 + 3.85×40 - 0.0423×40²] = 0.42 points

Key Insight: A $1,000 increase in district income is associated with a larger change in predicted test scores in poorer districts than in wealthy ones.
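
The two calculations above can be wrapped in a small helper so that the effect can be evaluated at any starting income. This is a sketch using the estimated coefficients quoted in the notes; the function names are hypothetical, and the derivative-based marginal effect is added only for comparison.

```python
# Estimated quadratic coefficients from the notes (income in $1000s).
B0, B1, B2 = 607.3, 3.85, -0.0423


def predicted_score(income):
    """Predicted test score from the estimated quadratic equation."""
    return B0 + B1 * income + B2 * income ** 2


def effect_of_change(start, end):
    """Predicted change in test score when income moves from start to end."""
    return predicted_score(end) - predicted_score(start)


def marginal_effect(income):
    """Slope of the estimated quadratic at a given income: B1 + 2 * B2 * income."""
    return B1 + 2 * B2 * income


for start in (10, 40):
    print(
        f"{start} -> {start + 1}: change = {effect_of_change(start, start + 1):.2f}, "
        f"slope at {start} = {marginal_effect(start):.2f}"
    )
```

The discrete change and the slope differ slightly because the slope is evaluated at the starting point, while the discrete change averages over the whole $1,000 step.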

3. General Approach to Modeling Nonlinearities (Recipe)

  1. Identify possible nonlinearity: Examine scatterplots and consider theoretical relationships
  2. Specify nonlinear function: Start with quadratic terms
  3. Test against linear model: Use t-tests on nonlinear terms
  4. Plot estimated function: Visualize fit
  5. Interpret effects: Calculate effects at meaningful X values

4. Cubic Regression Model

For more flexibility, we can estimate a cubic model: TestScoreᵢ = β₀ + β₁Incomeᵢ + β₂Incomeᵢ² + β₃Incomeᵢ³ + uᵢ

Add → Define new variable → avginc_cube = avginc^3

Estimated equation: TestScore = 600.1 + 5.02 Income - 0.096 Income² + 0.00069 Income³
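
To see how the cubic estimate changes the implied income effects, the sketch below evaluates both estimated equations from the notes at the same two starting incomes used in Section 2. The function names are hypothetical; only the coefficients come from the notes.

```python
# Estimated coefficients from the notes, lowest order first (income in $1000s).
QUADRATIC = (607.3, 3.85, -0.0423)
CUBIC = (600.1, 5.02, -0.096, 0.00069)


def predict(coefs, income):
    """Evaluate a polynomial regression function at a given income."""
    return sum(b * income ** power for power, b in enumerate(coefs))


def effect_of_change(coefs, start, end):
    """Predicted change in test score when income moves from start to end."""
    return predict(coefs, end) - predict(coefs, start)


for start in (10, 40):
    quad = effect_of_change(QUADRATIC, start, start + 1)
    cub = effect_of_change(CUBIC, start, start + 1)
    print(f"{start} -> {start + 1}: quadratic {quad:.2f}, cubic {cub:.2f}")
```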

Testing Higher-Order Terms

H0: β₃ = 0 (the quadratic model is adequate) versus H1: β₃ ≠ 0

Polynomial Modeling Recipe

  1. Start with quadratic (X²) terms
  2. Add cubic (X³) if theory suggests or data shows more complexity
  3. Test significance of highest-order term
  4. If insignificant, remove and use lower-order model
  5. Continue until highest-order term is significant
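
The recipe above can be sketched as a simple selection rule. This is only an illustration under stated assumptions: the function name is hypothetical, the 1.96 cutoff assumes a 5% two-sided test with a large sample, and the cubic t-statistics in the example are made-up placeholders (only the quadratic value of -8.81 comes from the notes).

```python
def choose_polynomial_degree(t_stats, critical_value=1.96):
    """Pick the polynomial degree by testing the highest-order term first.

    t_stats maps a polynomial degree to the t-statistic on the highest-order
    term of the model of that degree. Starting from the largest degree, drop
    down one degree each time the top term is insignificant. Falls back to 1
    (the linear model) if no higher-order term is significant.
    """
    for degree in sorted(t_stats, reverse=True):
        if abs(t_stats[degree]) > critical_value:
            return degree
    return 1


# Degree 2 uses the t-statistic from the notes; degree 3 values are placeholders.
print(choose_polynomial_degree({2: -8.81, 3: 1.20}))  # cubic term dropped
print(choose_polynomial_degree({2: -8.81, 3: 2.50}))  # cubic term kept
```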

5. Logarithmic Regression Models

Three Cases:

  1. Linear-log: Y = β1 + β2ln(X) + u

    Interpretation: 1% increase in X → 0.01β2 change in Y

  2. Log-linear: ln(Y) = β1 + β2X + u

    Interpretation: 1-unit increase in X → 100β2% change in Y

  3. Log-log: ln(Y) = β1 + β2ln(X) + u

    Interpretation: 1% increase in X → β2% change in Y (elasticity)

Interpretation Recipe

Case       | Specification        | Interpretation of β2
Linear-log | Y = β1 + β2ln(X)     | 1% ΔX → 0.01β2 ΔY
Log-linear | ln(Y) = β1 + β2X     | 1-unit ΔX → 100β2% ΔY
Log-log    | ln(Y) = β1 + β2ln(X) | 1% ΔX → β2% ΔY (elasticity)
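
The three interpretations above are approximations that work well for small changes. The sketch below compares each rule of thumb with the exact change implied by the functional form. The slope values are hypothetical placeholders chosen for illustration, not estimates from the dataset, and the function names are assumptions of this sketch.

```python
import math


def linear_log_change(beta2, pct_change_x):
    """Exact change in Y when X rises by pct_change_x percent (Y = b1 + b2 ln X)."""
    return beta2 * math.log(1 + pct_change_x / 100)


def log_linear_pct_change(beta2, delta_x=1.0):
    """Exact % change in Y when X rises by delta_x units (ln Y = b1 + b2 X)."""
    return 100 * (math.exp(beta2 * delta_x) - 1)


def log_log_pct_change(beta2, pct_change_x):
    """Exact % change in Y when X rises by pct_change_x percent (ln Y = b1 + b2 ln X)."""
    return 100 * ((1 + pct_change_x / 100) ** beta2 - 1)


# Hypothetical slope values, for illustration only.
print("linear-log:", linear_log_change(30.0, 1), "vs rule of thumb", 0.01 * 30.0)
print("log-linear:", log_linear_pct_change(0.03), "vs rule of thumb", 100 * 0.03)
print("log-log:   ", log_log_pct_change(0.05, 1), "vs rule of thumb", 0.05)
```

The gap between the exact value and the rule of thumb grows with the size of the change in X and, in the log-linear case, with the size of β2.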

Conclusion of First Group Methods

We’ve explored several approaches to modeling when the effect of X₁ on Y depends on X₁’s value:

  - Polynomial regressions (quadratic, cubic)
  - Logarithmic transformations

All methods showed that the test score-income relationship is indeed nonlinear, with diminishing returns to higher income.

Econometrics Lab Session on Interaction Effects (Second Group of Methods)

Introduction to Interaction Effects

We now turn to the second group of methods for modeling nonlinear relationships - when the effect of one independent variable (X₁) depends on the value of another variable (X₂). These are called interaction effects and are crucial for understanding how relationships change across different subgroups or conditions.

1. Interactions Between Two Binary Variables

Example Setup:

Let’s create binary variables for our dataset:

  - DSTR: 1 if student-teacher ratio ≥ 20, 0 otherwise
  - DEL: 1 if % English learners ≥ 10%, 0 otherwise

Regression with Interaction:

Estimated equation: TestScore = 664.1 - 1.9 DSTR - 18.2 DEL - 3.5(DSTR × DEL)

Interpretation:

The effect of high STR depends on English learner status:

  - For low EL districts (DEL=0): Effect = -1.9 points
  - For high EL districts (DEL=1): Effect = -1.9 - 3.5 = -5.4 points
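
With two binary regressors and their interaction, the estimated equation implies one predicted score for each of the four groups. The sketch below tabulates them from the coefficients in the notes and recovers the two conditional effects of a high student-teacher ratio; the function names are hypothetical.

```python
# Estimated coefficients from the notes.
COEF = {"const": 664.1, "DSTR": -1.9, "DEL": -18.2, "DSTRxDEL": -3.5}


def predicted_score(dstr, del_):
    """Predicted test score for a district with the given binary indicators."""
    return (
        COEF["const"]
        + COEF["DSTR"] * dstr
        + COEF["DEL"] * del_
        + COEF["DSTRxDEL"] * dstr * del_
    )


def high_str_effect(del_):
    """Effect of moving from DSTR=0 to DSTR=1, holding DEL fixed."""
    return predicted_score(1, del_) - predicted_score(0, del_)


for del_ in (0, 1):
    for dstr in (0, 1):
        print(f"DSTR={dstr}, DEL={del_}: predicted score {predicted_score(dstr, del_):.1f}")
    print(f"  effect of high STR when DEL={del_}: {high_str_effect(del_):.1f}")
```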

Testing the Interaction:

The t-statistic on the interaction term (-3.5/3.1 ≈ -1.13) is below 1.96 in absolute value, so the interaction is not statistically significant at conventional levels (p ≈ 0.26).
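
The same conclusion can be read off a confidence interval. The sketch below computes the t-statistic and an approximate 95% interval from the coefficient and standard error quoted above. The 1.96 critical value assumes a large-sample normal approximation, and the function name is hypothetical.

```python
def t_stat_and_interval(coefficient, std_error, critical_value=1.96):
    """Return (t-statistic, lower bound, upper bound) for a single coefficient."""
    t_stat = coefficient / std_error
    lower = coefficient - critical_value * std_error
    upper = coefficient + critical_value * std_error
    return t_stat, lower, upper


# Interaction coefficient and standard error from the notes.
t_stat, lower, upper = t_stat_and_interval(-3.5, 3.1)

print(f"t = {t_stat:.2f}, approximate 95% interval = [{lower:.2f}, {upper:.2f}]")
print("interval contains zero:", lower < 0 < upper)
```

An interval that contains zero corresponds to failing to reject the hypothesis that the interaction coefficient is zero at the 5% level.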

2. Interactions Between Two Continuous Variables

Specification:

TestScore = β1 + β2STR + β3PctEL + β4(STR × PctEL) + u

Estimated equation: TestScore = 686.3 - 1.12 STR - 0.67 PctEL + 0.0012(STR × PctEL)

Interpretation:

The marginal effect of STR is: ∂TestScore/∂STR = -1.12 + 0.0012 PctEL

At the median PctEL (8.85): Effect = -1.12 + 0.0012×8.85 ≈ -1.11

At the 75th percentile of PctEL (23.0): Effect = -1.12 + 0.0012×23.0 ≈ -1.09
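
The marginal effect formula above can be evaluated at any value of the moderating variable. This is a minimal sketch using the estimated coefficients from the notes; the function name is hypothetical.

```python
# Estimated coefficients from the notes.
B_STR = -1.12           # coefficient on STR
B_INTERACTION = 0.0012  # coefficient on STR x PctEL


def str_marginal_effect(pct_el):
    """Estimated effect of a one-unit increase in STR at a given PctEL."""
    return B_STR + B_INTERACTION * pct_el


for label, pct_el in (("median", 8.85), ("75th percentile", 23.0)):
    print(f"PctEL at {label} ({pct_el}): effect = {str_marginal_effect(pct_el):.2f}")
```

Because the interaction coefficient is small, the estimated STR effect barely moves across the range of PctEL considered here.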

Key Findings:

  1. The STR effect shows nonlinearity (significant cubic terms)
  2. The interaction between STR and % English learners is marginally significant
  3. Economic controls (meal eligibility, income) are important confounders

General Recipe for Interaction Analysis:

  1. Theoretical justification: Identify plausible interactions
  2. Estimate models: With and without interaction terms
  3. Test significance: Of interaction terms
  4. Interpret carefully: Effects are now conditional
  5. Visualize: Plot regression lines for key subgroups
  6. Calculate effects: At meaningful values of moderating variable

Conclusion

Interaction effects allow us to model how relationships change across different contexts. Key takeaways:

  - Binary × binary interactions create four distinct groups
  - Continuous × binary interactions allow different slopes
  - Continuous × continuous interactions make effects depend on values
  - Always include main effects when adding interactions
  - Visualizations are crucial for interpretation

Next Steps: Students should practice with different interaction specifications and test whether the effects they find are statistically and substantively significant.