Duration: 2 × 45-minute sessions
Objective: Apply multiple regression concepts to
analyze wage determinants, test model specifications, create dummies for
education levels, test the linearity of education levels and interpret
results.
Tools: GRETL
Question: “Do women earn less than men?”
Simple Regression Approach:
\[
\log(\text{Wage}) = \beta_0 + \beta_1 \text{Female} + \epsilon
\]
Structural Pathways:
Gender → Education → Wage
Gender → Part-Time Status → Wage
Gender → (Direct Discrimination) → Wage
Example: Policy vs. Economic Analysis (Linking Theory to Lab)
| Concept | Simple Regression | Multiple Regression |
|---|---|---|
| What it measures | Total gender gap | Partial (direct) effect |
| Use case | Policy discussions | Legal discrimination |
| Main problem | Omitted variable bias | Requires more data |
| Gender gap interpretation | “Women earn 22% less” | “Women earn X% less for the same qualifications” |
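The difference between the two columns can be written out with the standard omitted-variable-bias decomposition. This is a sketch that assumes a single omitted control, Educ, linked to gender through an auxiliary regression; the symbols γ₀, γ₁, α₁ and v are notation introduced here for illustration and do not come from the lab output.
\[
\log(\text{Wage}) = \beta_0 + \beta_1 \text{Female} + \beta_2 \text{Educ} + \epsilon,
\qquad
\text{Educ} = \gamma_0 + \gamma_1 \text{Female} + v
\]
Substituting the second equation into the first gives the model that the simple regression actually estimates:
\[
\log(\text{Wage}) = (\beta_0 + \beta_2 \gamma_0) + \underbrace{(\beta_1 + \beta_2 \gamma_1)}_{\alpha_1} \text{Female} + (\epsilon + \beta_2 v)
\]
Under these assumptions the simple-regression slope α₁ is the total gap: the direct effect β₁ plus the indirect pathway β₂γ₁ (Gender → Education → Wage). Multiple regression isolates β₁ by holding education fixed.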
Task: Run the model:
\[
\log(\text{Wage}) = 4.27 - 0.09\text{Female} + 0.22\text{Educ} -
0.14\text{Parttime} + e
\]
Steps:
1. Load dataset in GRETL.
2. Model > Ordinary Least Squares > log(Wage) ~ Female + Age + Educ + Parttime.
Regression Outcomes Discussion:
| Variable | Coeff. | Interpretation (Relative Effect) |
|---|---|---|
| Female | -0.095 | Women earn about 9% less (p < 0.001) |
| Educ | 0.219 | About +24.5% per additional level (p < 0.001) |
| Parttime | -0.142 | Part-time workers earn about 13% less (p < 0.001) |
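The relative effects in the table follow from the usual conversion for a log-dependent-variable model, 100 × (e^b − 1). A minimal sketch of that arithmetic in Python (the function name `pct_effect` is a hypothetical helper, not part of GRETL):

```python
import math

def pct_effect(coef):
    """Percentage change in the wage implied by a coefficient in a log-wage regression."""
    return 100.0 * (math.exp(coef) - 1.0)

# Coefficients taken from the regression table above.
coefficients = {"Female": -0.095, "Educ": 0.219, "Parttime": -0.142}

effects = {name: pct_effect(b) for name, b in coefficients.items()}
for name, effect in effects.items():
    print(f"{name:9s} {effect:+.1f}%")
```

For small coefficients the percentage is close to 100 × b, which is why -0.095 reads as roughly 9%; the approximation gets worse as the coefficient grows.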
Key Points:
Mathematical Interpretation:
Conceptual Discussion:
Task: Test whether the returns to education are constant across levels. Entering educ as a single number forces the relationship to be linear: it assumes that moving from level 1→2 has the same effect as moving from level 2→3 or from 3→4, so each additional education level adds the same proportional wage increase. That is a strong assumption, and often wrong.
Steps:
- Add → Dummies for selected discrete variable.
- Select education as the variable.
- Choose “Create dummies for all levels except the first”.
- Run OLS with all dummies and other controls.
\[
\log(\text{Wage}) = 4.52 - 0.08\text{Female} + 0.03\text{Age} + 0.11\text{DE2} + 0.34\text{DE3} +
0.73\text{DE4} - 0.15\text{Parttime} + e
\]
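What “all levels except the first” means for the data can be sketched in a few lines of Python. This assumes education is coded as the integers 1 to 4, which matches the DE2 to DE4 names above; the function `education_dummies` and the sample values are hypothetical and only illustrate the coding, not GRETL's internals.

```python
def education_dummies(levels, reference=1, categories=(1, 2, 3, 4)):
    """Return one dict of 0/1 dummies per observation, omitting the reference level."""
    rows = []
    for level in levels:
        row = {f"DE{c}": int(level == c) for c in categories if c != reference}
        rows.append(row)
    return rows

# Hypothetical education levels for five workers.
sample = [1, 2, 3, 4, 2]
dummies = education_dummies(sample)
for level, row in zip(sample, dummies):
    print(level, row)
```

A worker at level 1 has all three dummies equal to zero, so level 1 is absorbed by the intercept and every dummy coefficient is read as a difference relative to level 1.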
DE2 coefficient (0.11): Moving from level 1 to level 2 increases log wage by 0.11 → approximately 12% higher wage (since e^0.11 ≈ 1.12).
DE3 coefficient (0.34): Level 3 vs. level 1 → 0.34 log points → about 41% higher wage.
DE4 coefficient (0.73): Level 4 vs. level 1 → 0.73 log points → about 108% higher wage (more than double!).
But the incremental gains are:
- Level 1 → 2: 0.11 log points → about +12%.
- Level 2 → 3: 0.34 - 0.11 = 0.23 log points → about +26% extra. (Subtracting the percentages, 41% - 12% = 29%, is not correct: log-point differences add, while percentage effects compound.)
- Level 3 → 4: 0.73 - 0.34 = 0.39 log points → about +47% extra.
So the returns increase as education level rises – the highest level gives a huge jump.
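The step-by-step gains can be reproduced by differencing the dummy coefficients in log points and converting each difference to a percentage. The sketch below uses the two-decimal coefficients printed above, so its output can differ by about a percentage point from figures computed with unrounded estimates.

```python
import math

# Dummy coefficients relative to level 1 (the reference level has coefficient 0).
coef = {1: 0.0, 2: 0.11, 3: 0.34, 4: 0.73}

def pct(log_points):
    """Convert a log-point difference into a percentage change."""
    return 100.0 * (math.exp(log_points) - 1.0)

steps = {}
for level in (2, 3, 4):
    diff = coef[level] - coef[level - 1]
    steps[level] = pct(diff)
    print(f"Level {level - 1} -> {level}: {diff:.2f} log points, {steps[level]:+.1f}%")

# Percentage steps compound multiplicatively back to the level 4 vs. level 1 effect.
compounded = 1.0
for level in (2, 3, 4):
    compounded *= 1.0 + steps[level] / 100.0
print(f"Level 1 -> 4 overall: {100.0 * (compounded - 1.0):+.1f}%")
```

The last line shows why the percentages cannot simply be subtracted from one another: the three steps multiply together to give the total effect of level 4 relative to level 1.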
The F-test: Testing if the “step pattern” is just a straight line
The null hypothesis (linear model) says: The increase from 1→2 = some amount δ.
Then 1→3 must be 2δ, and 1→4 must be 3δ.
In terms of dummy coefficients: β₃ = 2×β₂ and β₄ = 3×β₂.
If that’s true, then the three dummy coefficients lie exactly on a straight line when plotted against education level.
The F-test quantifies whether the deviation is larger than what random chance would produce. Here F=8.98, much larger than the critical value of 2.6, so we reject linearity.
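An informal way to see what the test is picking up is to compare the estimated coefficients with the values the linear restrictions would imply. The sketch below takes the estimated β₂ as the step size δ, which is a simplification: the restricted regression would choose δ to fit all levels at once, and only the F-test says whether the gaps are statistically significant.

```python
# Estimated dummy coefficients from the regression with education dummies.
b2, b3, b4 = 0.11, 0.34, 0.73

# Values implied by the linear restrictions beta3 = 2*beta2 and beta4 = 3*beta2.
implied_b3 = 2 * b2
implied_b4 = 3 * b2

gap_b3 = b3 - implied_b3
gap_b4 = b4 - implied_b4

print(f"DE3: estimated {b3:.2f}, implied by linearity {implied_b3:.2f}, gap {gap_b3:+.2f}")
print(f"DE4: estimated {b4:.2f}, implied by linearity {implied_b4:.2f}, gap {gap_b4:+.2f}")
```

Both gaps are positive and the second is larger, which is the “steeper at the top” pattern that the rejection of linearity reflects.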
Why does this matter for policy?
Summary:
Dummies let each education level have its own wage effect.
The pattern shows increasing returns: the jump from high school to college is much larger than from primary to secondary.
The F-test confirms that this pattern is not a straight line – it’s steeper at the top.
Wage Effects by Education Level:
- Level 1 → 2: +12%
- Level 2 → 3: +26%
- Level 3 → 4: +47%
Discussion:
- “Higher education yields increasing returns, especially at top
levels.”
This lab session uses the California Test Score dataset to detect and model nonlinear relationships in regression analysis.
We’ll concentrate on nonlinear relationships where the effect of a predictor depends on its own value.
Let’s begin by examining the relationship between test scores (Y) and district average income (X) with a simple linear regression:
View → Graph Specified Vars → X-Y Scatter
Observation: The linear fit doesn’t capture the apparent curvature in the data - most points are below the line at very low and very high incomes, but above the line in the middle range.
To model this curvature, we’ll estimate a quadratic regression: TestScoreᵢ = β₀ + β₁Incomeᵢ + β₂Incomeᵢ² + uᵢ
avginc → Add → square of selected variables
Model → ordinary least squares → testscr avginc avginc_sq
Estimated equation: TestScore = 607.3 + 3.85 Income - 0.0423 Income²
We can test whether the quadratic term is needed:
H₀: β₂ = 0 versus H₁: β₂ ≠ 0
The significant t-statistic (-8.81) and small p-value (<0.01%) suggest the quadratic term improves the model.
The effect of income changes depends on the initial income level:
Increase from $10K to $11K: ΔTestScore = [607.3 + 3.85×11 - 0.0423×11²] - [607.3 + 3.85×10 - 0.0423×10²] = 2.96 points
Increase from $40K to $41K: ΔTestScore = [607.3 + 3.85×41 - 0.0423×41²] - [607.3 + 3.85×40 - 0.0423×40²] = 0.42 points
Key Insight: A $1,000 increase in district income has a larger effect on test scores in poorer districts than in wealthy ones.
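The two calculations above can be checked with a short Python sketch that evaluates the estimated quadratic at both income levels. Income is in thousands of dollars, as in the worked examples; the function names are hypothetical helpers.

```python
def predicted_score(income):
    """Predicted test score from the estimated quadratic (income in $1,000s)."""
    return 607.3 + 3.85 * income - 0.0423 * income ** 2

def effect_of_increase(start, step=1.0):
    """Change in the predicted score when income rises from start to start + step."""
    return predicted_score(start + step) - predicted_score(start)

low = effect_of_increase(10)   # $10K -> $11K
high = effect_of_increase(40)  # $40K -> $41K

print(f"$10K -> $11K: {low:.2f} points")
print(f"$40K -> $41K: {high:.2f} points")
```

Because the predicted change depends on the starting income, a single slope coefficient can no longer summarize the effect; it has to be reported at chosen income levels.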
For more flexibility, we can estimate a cubic model: TestScoreᵢ = β₀ + β₁Incomeᵢ + β₂Incomeᵢ² + β₃Incomeᵢ³ +uᵢ
Add → Define new variable → avginc_cube = avginc^3
Estimated equation: TestScore = 600.1 + 5.02 Income - 0.096 Income² + 0.00069 Income³
H₀: β₃ = 0 versus H₁: β₃ ≠ 0
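To see what the cubic fit implies in practice, a $1,000 income increase can be evaluated at a low and a high starting income using the estimated cubic equation. This is a sketch based on the rounded coefficients printed above, so the figures are approximate; `poly_eval` is a hypothetical helper using Horner's rule.

```python
def poly_eval(coefs, x):
    """Evaluate a polynomial with coefficients ordered from constant to highest power."""
    result = 0.0
    for c in reversed(coefs):
        result = result * x + c
    return result

# Estimated cubic: constant, Income, Income^2, Income^3 (income in $1,000s).
cubic = [600.1, 5.02, -0.096, 0.00069]

cubic_low = poly_eval(cubic, 11) - poly_eval(cubic, 10)   # $10K -> $11K
cubic_high = poly_eval(cubic, 41) - poly_eval(cubic, 40)  # $40K -> $41K

print(f"Cubic, $10K -> $11K: {cubic_low:.2f} points")
print(f"Cubic, $40K -> $41K: {cubic_high:.2f} points")
```

The cubic keeps the same qualitative pattern of a larger effect at low incomes; whether the extra term is warranted is what the test of β₃ = 0 decides.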
We’ve explored several approaches to modeling when the effect of X₁ on Y depends on X₁’s value: