Teaching Notes: Econometrics Lab Session on Nonlinear Regression Functions

Introduction

This lab session focuses on detecting and modeling nonlinear relationships in regression analysis using the California Test Score dataset. We’ll explore two main groups of methods:

1. When the effect of X₁ on Y depends on the value of X₁ itself.
2. When the effect of X₁ on Y depends on another variable X₂.

Today we’ll focus on the first group - nonlinear relationships where the effect of a predictor depends on its own value.

1. Visualizing Nonlinearity: Test Scores vs. District Income

Scatterplot with Linear OLS Regression

Let’s begin by examining the relationship between test scores (Y) and district average income (X) with a simple linear regression:

View → Graph Specified Vars → X-Y Scatter

Observation: The linear fit doesn’t capture the apparent curvature in the data - most points are below the line at very low and very high incomes, but above the line in the middle range.
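
The residual pattern described in the observation can be reproduced with a small sketch. It does not use the California data: the income grid and the concave curve below are illustrative assumptions, and `ols_line` is a hypothetical helper written for this example. Fitting a straight line to points that lie on a concave curve leaves negative residuals at both ends and positive residuals in the middle.

```python
# Synthetic illustration (NOT the California dataset): points lie exactly
# on a concave curve, and a straight line is fitted by OLS.

def ols_line(xs, ys):
    """Closed-form simple OLS; returns (intercept, slope)."""
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    sxx = sum((x - x_bar) ** 2 for x in xs)
    slope = sxy / sxx
    return y_bar - slope * x_bar, slope

# Assumed income grid (thousands of dollars) and an assumed concave curve.
incomes = list(range(5, 56, 5))
scores = [607.3 + 3.85 * x - 0.0423 * x ** 2 for x in incomes]

intercept, slope = ols_line(incomes, scores)
residuals = [y - (intercept + slope * x) for x, y in zip(incomes, scores)]

for x, r in zip(incomes, residuals):
    position = "above" if r > 0 else "below"
    print(f"income={x:>2}: residual={r:+.2f} (point is {position} the line)")
```

A systematic sign pattern like this in the residuals is the numerical counterpart of the curvature visible in the scatterplot.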

2. Quadratic Regression Model

Specification

To model this curvature, we’ll estimate a quadratic regression: TestScoreᵢ = β₀ + β₁Incomeᵢ + β₂Incomeᵢ² + uᵢ

avginc → Add → square of selected variables
Model → ordinary least squares → dependent variable: testscr; regressors: avginc, avginc_sq

Estimated equation: TestScore = 607.3 + 3.85 Income - 0.0423 Income²

Testing Nonlinearity

We can test whether the quadratic term is needed:

H₀: β₂ = 0 (the relationship is linear) against H₁: β₂ ≠ 0

The t-statistic on the quadratic term is -8.81, with a p-value below 0.01%, so we reject the null hypothesis of linearity: the quadratic term belongs in the model.

Interpreting Effects

Because the model is quadratic, the effect of a change in income depends on the initial income level (income is measured in thousands of dollars):

Increase from $10K to $11K: ΔTestScore = [607.3 + 3.85×11 - 0.0423×11²] - [607.3 + 3.85×10 - 0.0423×10²] = 2.96 points
Increase from $40K to $41K: ΔTestScore = [607.3 + 3.85×41 - 0.0423×41²] - [607.3 + 3.85×40 - 0.0423×40²] = 0.42 points

Key Insight: A $1,000 increase in district income is associated with a larger change in predicted test scores in poorer districts than in wealthy ones.
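
The two calculations above can be checked with a short sketch that uses only the estimated quadratic coefficients reported in these notes. The function names are hypothetical helpers chosen for this example.

```python
def predicted_score(income):
    """Predicted test score from the estimated quadratic; income in $1000s."""
    return 607.3 + 3.85 * income - 0.0423 * income ** 2

def effect_of_change(income_from, income_to):
    """Change in predicted score when income moves between two levels."""
    return predicted_score(income_to) - predicted_score(income_from)

low = effect_of_change(10, 11)
high = effect_of_change(40, 41)
print(f"$10K -> $11K: {low:.2f} points")   # 2.96
print(f"$40K -> $41K: {high:.2f} points")  # 0.42
```

Note that the intercept cancels in the difference, so only the slope terms matter for the predicted change.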

3. General Approach to Modeling Nonlinearities (Recipe)

Identify possible nonlinearity: Examine scatterplots and consider theoretical relationships
Specify nonlinear function: Start with quadratic terms
Test against linear model: Use a t-test on a single nonlinear term, or an F-test when several nonlinear terms are added together
Plot estimated function: Visualize fit
Interpret effects: Calculate effects at meaningful X values

4. Cubic Regression Model

For more flexibility, we can estimate a cubic model: TestScoreᵢ = β₀ + β₁Incomeᵢ + β₂Incomeᵢ² + β₃Incomeᵢ³ + uᵢ

Add → Define new variable → avginc_cube = avginc^3

Estimated equation: TestScore = 600.1 + 5.02 Income - 0.096 Income² + 0.00069 Income³
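
As a rough check on what the cubic estimates imply, the sketch below evaluates the predicted change for a $1,000 increase at two income levels, together with the slope (derivative) of the fitted curve at the starting point. It uses only the coefficients in the estimated equation above; the function names are hypothetical.

```python
def cubic_score(income):
    """Predicted test score from the estimated cubic; income in $1000s."""
    return 600.1 + 5.02 * income - 0.096 * income ** 2 + 0.00069 * income ** 3

def cubic_slope(income):
    """Derivative of the fitted cubic with respect to income."""
    return 5.02 - 2 * 0.096 * income + 3 * 0.00069 * income ** 2

for start in (10, 40):
    change = cubic_score(start + 1) - cubic_score(start)
    print(f"income {start} -> {start + 1}: change = {change:.2f}, "
          f"slope at {start} = {cubic_slope(start):.2f}")
```

The discrete change and the derivative are close but not identical, because the slope itself changes over the one-unit interval.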

Testing Higher-Order Terms

H₀: β₃ = 0 (the quadratic model is adequate) against H₁: β₃ ≠ 0

Polynomial Modeling Recipe

Start with quadratic (X²) terms
Add cubic (X³) if theory suggests or data shows more complexity
Test significance of highest-order term
If insignificant, remove and use lower-order model
Continue until highest-order term is significant
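
The recipe above can be sketched as a simple selection rule. This is only an illustration of the logic, under two assumptions: the 1.96 critical value (a two-sided 5% test with a large-sample normal approximation) and the cubic t-statistic of 1.2, which is a made-up number and not a result from the lab. The function name `choose_degree` is hypothetical.

```python
def choose_degree(t_stats, critical_value=1.96):
    """Pick a polynomial degree by testing the highest-order term.

    t_stats maps degree r to the t-statistic on the X^r term in the
    degree-r model. Starting from the highest degree, step down until
    the highest-order term is significant; fall back to linear (1).
    """
    for degree in sorted(t_stats, reverse=True):
        if abs(t_stats[degree]) > critical_value:
            return degree
    return 1

# -8.81 is the quadratic t-statistic reported in these notes;
# 1.2 for the cubic term is a hypothetical value for illustration only.
example = {2: -8.81, 3: 1.2}
print(choose_degree(example))  # 2
```

In practice each entry would come from re-estimating the model at that degree, ideally with heteroskedasticity-robust standard errors.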

5. Logarithmic Regression Models

Three Cases:

Linear-log: Y = β₁ + β₂ln(X) + u
Interpretation: 1% increase in X → 0.01β₂ change in Y

Log-linear: ln(Y) = β₁ + β₂X + u
Interpretation: 1-unit increase in X → 100β₂% change in Y

Log-log: ln(Y) = β₁ + β₂ln(X) + u
Interpretation: 1% increase in X → β₂% change in Y (elasticity)

Interpretation Recipe

Case	Specification	Interpretation of β₂
Linear-log	Y = β₁ + β₂ln(X)	1% ΔX → 0.01β₂ ΔY
Log-linear	ln(Y) = β₁ + β₂X	1-unit ΔX → 100β₂% ΔY
Log-log	ln(Y) = β₁ + β₂ln(X)	1% ΔX → β₂% ΔY (elasticity)
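
The three interpretations are approximations that work well for small changes. The sketch below compares each rule of thumb with the exact change implied by the functional form. The slope value `b2 = 0.05` is a hypothetical number chosen for illustration, not an estimate from the lab.

```python
import math

b2 = 0.05  # hypothetical slope coefficient, reused in all three cases

# Linear-log: Y = b1 + b2*ln(X). A 1% rise in X changes Y by b2*ln(1.01).
linear_log_exact = b2 * math.log(1.01)
linear_log_approx = 0.01 * b2

# Log-linear: ln(Y) = b1 + b2*X. One more unit of X multiplies Y by exp(b2).
log_linear_exact = 100 * (math.exp(b2) - 1)
log_linear_approx = 100 * b2

# Log-log: ln(Y) = b1 + b2*ln(X). A 1% rise in X multiplies Y by 1.01**b2.
log_log_exact = 100 * (1.01 ** b2 - 1)
log_log_approx = b2

print(f"linear-log: exact {linear_log_exact:.6f} vs rule {linear_log_approx:.6f} (units of Y)")
print(f"log-linear: exact {log_linear_exact:.3f}% vs rule {log_linear_approx:.3f}%")
print(f"log-log:    exact {log_log_exact:.5f}% vs rule {log_log_approx:.5f}%")
```

The gap between the exact value and the rule of thumb grows with the size of the coefficient or of the change in X, so the rules are best used for small changes.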

Conclusion of First Group Methods

We’ve explored several approaches to modeling when the effect of X₁ on Y depends on X₁’s value:

- Polynomial regressions (quadratic, cubic)
- Logarithmic transformations

All methods showed that the test score-income relationship is indeed nonlinear, with diminishing returns to higher income.

Econometrics Lab Session on Interaction Effects (Second Group of Methods)

Introduction to Interaction Effects

We now turn to the second group of methods for modeling nonlinear relationships - when the effect of one independent variable (X₁) depends on the value of another variable (X₂). These are called interaction effects and are crucial for understanding how relationships change across different subgroups or conditions.

1. Interactions Between Two Binary Variables

Example Setup:

Let’s create binary variables for our dataset:

- DSTR: 1 if student-teacher ratio ≥ 20, 0 otherwise
- DEL: 1 if % English learners ≥ 10%, 0 otherwise

Regression with Interaction:

Estimated equation: TestScore = 664.1 - 1.9 DSTR - 18.2 DEL - 3.5(DSTR × DEL)

Interpretation:

The effect of a high student-teacher ratio (DSTR = 1) depends on English learner status:

- For low EL districts (DEL=0): Effect = -1.9 points
- For high EL districts (DEL=1): Effect = -1.9 - 3.5 = -5.4 points
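
With two binary regressors and their interaction, the regression implies one predicted mean for each of the four groups. The sketch below tabulates them from the estimated coefficients above; the function and argument names are hypothetical.

```python
def group_mean(dstr, high_el):
    """Predicted test score for a (DSTR, DEL) group; both arguments are 0 or 1."""
    return 664.1 - 1.9 * dstr - 18.2 * high_el - 3.5 * dstr * high_el

for high_el in (0, 1):
    low_str = group_mean(0, high_el)
    high_str = group_mean(1, high_el)
    print(f"DEL={high_el}: DSTR=0 -> {low_str:.1f}, DSTR=1 -> {high_str:.1f}, "
          f"effect of high STR = {high_str - low_str:.1f}")
```

The difference between the two STR effects (-5.4 versus -1.9) is exactly the interaction coefficient, -3.5.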

Testing the Interaction:

The t-statistic on the interaction term is the coefficient divided by its standard error, -3.5/3.1 ≈ -1.13, so the interaction is not statistically significant at conventional levels (p ≈ 0.26).
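
The p-value quoted above can be approximated by hand. The sketch below assumes the large-sample standard normal distribution for the t-statistic, which is an approximation and not necessarily the exact distribution gretl reports; `two_sided_p_value` is a hypothetical helper.

```python
import math

def two_sided_p_value(t_stat):
    """Two-sided p-value under the standard normal approximation."""
    return math.erfc(abs(t_stat) / math.sqrt(2))

t_interaction = -3.5 / 3.1  # coefficient divided by its standard error
p_interaction = two_sided_p_value(t_interaction)
print(f"t = {t_interaction:.2f}, p = {p_interaction:.2f}")  # t = -1.13, p = 0.26
```

The same function applied to the quadratic term's t-statistic of -8.81 gives a p-value far below 0.0001.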

2. Interactions Between Two Continuous Variables

Specification:

TestScoreᵢ = β₁ + β₂STRᵢ + β₃PctELᵢ + β₄(STRᵢ × PctELᵢ) + uᵢ

Estimated equation: TestScore = 686.3 - 1.12 STR - 0.67 PctEL + 0.0012(STR × PctEL)

Interpretation:

The marginal effect of STR is: ∂TestScore/∂STR = -1.12 + 0.0012 PctEL

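At the median PctEL (8.85): Effect = -1.12 + 0.0012×8.85 ≈ -1.11

At the 75th percentile of PctEL (23.0): Effect = -1.12 + 0.0012×23.0 ≈ -1.09

Because the interaction coefficient is small, the estimated effect of STR barely changes with the share of English learners.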
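
The marginal effect formula can be evaluated at any value of PctEL. A minimal sketch using the estimated coefficients above (the function name is hypothetical):

```python
def str_effect(pct_el):
    """Estimated effect on test scores of one more student per teacher."""
    return -1.12 + 0.0012 * pct_el

for label, pct_el in (("median", 8.85), ("75th percentile", 23.0)):
    print(f"PctEL at {label} ({pct_el}): effect = {str_effect(pct_el):.2f}")
```

This prints -1.11 and -1.09, matching the hand calculations.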

Key Findings:

The STR effect shows nonlinearity (significant cubic terms)
The interaction between STR and % English learners is marginally significant
Economic controls (meal eligibility, income) are important confounders

General Recipe for Interaction Analysis:

Theoretical justification: Identify plausible interactions
Estimate models: With and without interaction terms
Test significance: Of interaction terms
Interpret carefully: Effects are now conditional
Visualize: Plot regression lines for key subgroups
Calculate effects: At meaningful values of moderating variable

Conclusion

Interaction effects allow us to model how relationships change across different contexts. Key takeaways:

- Binary × binary interactions create four distinct groups
- Continuous × binary interactions allow different slopes
- Continuous × continuous interactions make effects depend on values
- Always include main effects when adding interactions
- Visualizations are crucial for interpretation

Next Steps: Students should practice with different interaction specifications and test whether the effects they find are statistically and substantively significant.

Teaching Note for Applied Econometrics and Economic Modelling Lab Session - Day 6

2025-03-31
