Discussion Question:
“Why might women’s earnings have less variance than men’s? How would
this affect OLS results?”
Key Points:
- Heteroskedasticity → Biased conventional standard errors → Wrong conclusions about
significance. The OLS coefficient estimates themselves remain unbiased.
- Example: If high earners have more volatile salaries, conventional OLS standard errors
understate uncertainty for top income brackets.
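To make the key points concrete, here is a minimal sketch that compares conventional and heteroskedasticity-robust standard errors on simulated data. The data-generating process, the helper name `ols_with_se`, and the use of the HC1 variant are assumptions for illustration; they are not taken from the course dataset or from Gretl's output.

```python
import numpy as np

def ols_with_se(X, y):
    """Return OLS coefficients with conventional and HC1 robust standard errors."""
    n, k = X.shape
    XtX_inv = np.linalg.inv(X.T @ X)
    beta = XtX_inv @ X.T @ y
    resid = y - X @ beta
    # Conventional SEs assume a single common error variance.
    sigma2 = resid @ resid / (n - k)
    se_conv = np.sqrt(np.diag(sigma2 * XtX_inv))
    # HC1 SEs: sandwich estimator with a degrees-of-freedom correction.
    meat = (X * resid[:, None] ** 2).T @ X
    cov_hc1 = n / (n - k) * XtX_inv @ meat @ XtX_inv
    se_hc1 = np.sqrt(np.diag(cov_hc1))
    return beta, se_conv, se_hc1

# Simulated data (an assumption): the error spread grows with x,
# mimicking more volatile salaries among high earners.
rng = np.random.default_rng(0)
n = 20_000
x = rng.uniform(0, 10, n)
y = 1.0 + 2.0 * x + x * rng.standard_normal(n)
X = np.column_stack([np.ones(n), x])

beta, se_conv, se_hc1 = ols_with_se(X, y)
print("slope estimate:", beta[1])
print("conventional SE:", se_conv[1], "robust SE:", se_hc1[1])
```

In this design the slope estimate stays close to the true value of 2, while the conventional standard error comes out smaller than the robust one: the estimate is fine, but the conventional measure of its uncertainty is not.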
Model > Ordinary Least Squares: regress AHE on Age, Female, Bachelor.
Graph > Fitted, actual plot > Residual plot.
White Test:
1. After regression: Tests > Heteroskedasticity > White’s test.
2. Criterion: Reject null (homoskedasticity) if p-value < 0.05.
Breusch-Pagan Test:
1. Tests > Heteroskedasticity > Breusch-Pagan.
2. Criterion: Reject null if p-value < 0.05.
Discussion Question:
“Why might White’s test be preferred over Breusch-Pagan?”
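- Answer: White’s test detects more general forms of heteroskedasticity because
its auxiliary regression includes squares and cross-products of the regressors; BP
assumes the error variance is a linear function of the regressors.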
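The difference between the two tests can be sketched with an auxiliary regression of the squared residuals. The example below is an illustration on simulated data, not Gretl's implementation: it uses the n·R² (studentized, Koenker-style) form of the Breusch-Pagan statistic, and the helper `lm_stat` and the data-generating process are assumptions. The error variance is U-shaped in x, so a regression of squared residuals on x alone has little to pick up.

```python
import numpy as np

def lm_stat(resid_sq, Z):
    """n * R^2 from regressing squared residuals on Z (Z must include a constant)."""
    coef, *_ = np.linalg.lstsq(Z, resid_sq, rcond=None)
    fitted = Z @ coef
    ss_res = np.sum((resid_sq - fitted) ** 2)
    ss_tot = np.sum((resid_sq - resid_sq.mean()) ** 2)
    return len(resid_sq) * (1.0 - ss_res / ss_tot)

# Simulated data (an assumption): Var(error | x) = x**2, symmetric around zero.
rng = np.random.default_rng(1)
n = 5_000
x = rng.uniform(-2, 2, n)
y = 1.0 + 2.0 * x + x * rng.standard_normal(n)

X = np.column_stack([np.ones(n), x])
beta, *_ = np.linalg.lstsq(X, y, rcond=None)
e2 = (y - X @ beta) ** 2

# Breusch-Pagan style: squared residuals on the regressors (1 restriction).
bp_lm = lm_stat(e2, X)
# White style: add the square of x (2 restrictions; no cross-products with one regressor).
white_lm = lm_stat(e2, np.column_stack([X, x ** 2]))

CHI2_05_DF1, CHI2_05_DF2 = 3.841, 5.991  # 5% chi-square critical values
print("BP LM:", bp_lm, "critical value:", CHI2_05_DF1)
print("White LM:", white_lm, "critical value:", CHI2_05_DF2)
```

With one regressor, White's auxiliary regression adds only the squared term; with several regressors it also adds their cross-products, which is why it uses more degrees of freedom.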
Discussion Question:
“When would you use WLS instead of robust SEs?”
- Answer: If you know the variance structure (e.g.,
variance ∝ education level). With correct weights, WLS is more efficient than OLS
with robust SEs.
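A minimal sketch of WLS under the variance structure named in the answer, assuming the error variance is proportional to education. The variable names and simulated numbers are hypothetical. WLS here is just OLS after dividing each observation by the standard deviation of its error.

```python
import numpy as np

# Simulated data (an assumption): Var(error | educ) is proportional to educ.
rng = np.random.default_rng(2)
n = 5_000
educ = rng.integers(8, 21, n).astype(float)  # hypothetical years of schooling, 8..20
y = 5.0 + 1.5 * educ + np.sqrt(educ) * rng.standard_normal(n)
X = np.column_stack([np.ones(n), educ])

# Weight each row by 1 / sqrt(educ) so the transformed errors are homoskedastic.
w = 1.0 / np.sqrt(educ)
beta_wls, *_ = np.linalg.lstsq(X * w[:, None], y * w, rcond=None)

# Plain OLS for comparison: still unbiased, but less efficient.
beta_ols, *_ = np.linalg.lstsq(X, y, rcond=None)

print("WLS slope:", beta_wls[1])
print("OLS slope:", beta_ols[1])
```

If the assumed variance structure is wrong, the weights no longer help, which is the usual argument for defaulting to robust SEs when the structure is unknown.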
Discussion Question:
“Why can’t we include all region dummies (Northeast, Midwest, South,
West) plus an intercept?”
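- Answer: Perfect multicollinearity (the dummy variable trap): the four region
dummies sum to 1 for every observation, which duplicates the intercept column.
Omit one region as the baseline category.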
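The dummy variable trap can be checked directly from the rank of the design matrix. This sketch uses 12 hypothetical observations spread evenly across the four regions; the data are made up for illustration.

```python
import numpy as np

regions = np.array(["Northeast", "Midwest", "South", "West"])
region_of_obs = regions[np.arange(12) % 4]  # 12 hypothetical observations
dummies = (region_of_obs[:, None] == regions[None, :]).astype(float)
const = np.ones((12, 1))

X_trap = np.hstack([const, dummies])       # intercept + all four dummies
X_ok = np.hstack([const, dummies[:, 1:]])  # drop Northeast as the baseline

rank_trap = np.linalg.matrix_rank(X_trap)
rank_ok = np.linalg.matrix_rank(X_ok)
print(X_trap.shape[1], rank_trap)  # 5 columns, rank 4: X'X is singular
print(X_ok.shape[1], rank_ok)      # 4 columns, rank 4: full column rank
```

Because the five-column matrix has rank 4, the OLS normal equations have no unique solution; dropping one dummy restores full column rank.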
View > Correlation matrix > Select Age, Age², Female, Bachelor.
Model > OLS > Include Age, Age², Female, Bachelor.
Tests > Variance inflation factors.
Discussion Question:
“Why does multicollinearity inflate standard errors but not bias
coefficients?”
- Answer: OLS estimates remain unbiased, but
uncertainty increases because predictors “compete” to explain the same
variation.
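The variance inflation factor behind this answer is VIF_j = 1 / (1 − R_j²), where R_j² comes from regressing regressor j on the other regressors. The sketch below computes it by hand on simulated data; the helper `vif`, the age range 25 to 34, and the random Female and Bachelor indicators are assumptions for illustration.

```python
import numpy as np

def vif(X, names):
    """VIF_j = 1 / (1 - R_j^2), regressing column j on the other columns plus a constant."""
    out = {}
    n = X.shape[0]
    for j, name in enumerate(names):
        target = X[:, j]
        others = np.column_stack([np.ones(n), np.delete(X, j, axis=1)])
        coef, *_ = np.linalg.lstsq(others, target, rcond=None)
        resid = target - others @ coef
        r2 = 1.0 - resid @ resid / np.sum((target - target.mean()) ** 2)
        out[name] = 1.0 / (1.0 - r2)
    return out

# Simulated regressors (an assumption): ages 25..34, random binary indicators.
rng = np.random.default_rng(3)
age = np.tile(np.arange(25.0, 35.0), 100)
female = rng.integers(0, 2, age.size).astype(float)
bachelor = rng.integers(0, 2, age.size).astype(float)

X = np.column_stack([age, age ** 2, female, bachelor])
vifs = vif(X, ["Age", "Age2", "Female", "Bachelor"])
for name, value in vifs.items():
    print(name, round(value, 2))
```

Age and Age² explain each other almost perfectly over such a narrow age range, so their VIFs are far above the usual threshold of 10, while the indicators stay close to 1.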
Northeast (baseline category).
Add > Define new variable: Age_centered = Age - mean(Age), and Age_centered².
Discussion Question:
“Why does centering reduce multicollinearity between Age and
Age²?”
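- Answer: Centering sharply reduces the correlation between the linear and
quadratic terms; when Age is roughly symmetric around its mean, Age_centered and
Age_centered² are nearly uncorrelated.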
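A short numerical check of the centering argument, assuming ages spread evenly over 25 to 34 (a hypothetical, perfectly symmetric sample). With real data the centered correlation is typically small but not exactly zero.

```python
import numpy as np

# Hypothetical sample: 100 people at each age from 25 to 34.
age = np.tile(np.arange(25.0, 35.0), 100)
age_c = age - age.mean()  # Age_centered = Age - mean(Age)

corr_raw = np.corrcoef(age, age ** 2)[0, 1]
corr_centered = np.corrcoef(age_c, age_c ** 2)[0, 1]

print("corr(Age, Age^2):", corr_raw)
print("corr(Age_centered, Age_centered^2):", corr_centered)
```

The raw terms are almost perfectly correlated because Age² is nearly linear in Age over a narrow, strictly positive range; after centering, the squared term is symmetric around zero and carries no linear information about Age_centered.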
Task: Diagnose and fix issues in:
\[ \text{ln(AHE)} = \beta_0 + \beta_1
\text{Age} + \beta_2 \text{Age}^2 + \beta_3 \text{Female} + \beta_4
\text{Bachelor} \]
Steps:
1. Check heteroskedasticity (White’s test).
2. Check multicollinearity (VIF).
3. Apply fixes (centering + robust SEs).
Issue | Test | Solution | Gretl Path |
---|---|---|---|
Heteroskedasticity | White’s test (p < 0.05) | Robust SEs | Tests > Heteroskedasticity > White’s test |
Multicollinearity | VIF > 10 | Centering or dropping a variable | Tests > Variance inflation factors |
a. Linear Regression of AHE on Age, Gender, and Education
- Run a regression of average hourly earnings (AHE) on:
  - Age (continuous)
  - Female (binary: 1 if female, 0 if male)
  - Bachelor (binary: 1 if holds a bachelor’s degree, 0 otherwise).
- Interpretation:
  - If Age increases from 25 to 26, by how much do earnings change?
  - If Age increases from 33 to 34, by how much do earnings change?
b. Log-Linear Regression of ln(AHE) on Age, Gender, and Education
- Run a regression of ln(AHE) on the same variables (Age, Female, Bachelor).
- Interpretation:
  - For a one-year increase in Age from 25 to 26, what is the expected
    percentage change in earnings?
  - For a one-year increase in Age from 33 to 34, what is the expected
    percentage change in earnings?
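As a guide to the arithmetic in part (b), the sketch below converts a log-linear coefficient into a percentage change. The value of `beta_age` is a hypothetical placeholder, not an estimate from the data; replace it with the coefficient from your own regression.

```python
import math

beta_age = 0.03  # hypothetical coefficient on Age, NOT an estimate from the data

def pct_change_log_linear(beta, delta_age=1.0):
    """Percentage change in AHE implied by a log-linear model for a change in Age."""
    approx = 100.0 * beta * delta_age                   # usual approximation
    exact = 100.0 * (math.exp(beta * delta_age) - 1.0)  # exact change
    return approx, exact

approx, exact = pct_change_log_linear(beta_age)
print("approximate % change:", approx)
print("exact % change:", exact)
```

The starting age does not enter the function: in a log-linear model the implied percentage change is the same from 25 to 26 as from 33 to 34.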
c. Log-Log Regression of ln(AHE) on ln(Age), Gender, and Education
- Run a regression of ln(AHE) on:
  - ln(Age) (natural log of Age)
  - Female
  - Bachelor.
- Interpretation:
  - If Age increases from 25 to 26, what is the expected
    percentage change in earnings?
  - If Age increases from 33 to 34, what is the expected
    percentage change in earnings?
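For part (c), the coefficient on ln(Age) is an elasticity, so the effect of one extra year depends on the starting age. This sketch uses a hypothetical `beta_ln_age`; it is a placeholder, not an estimate.

```python
import math

beta_ln_age = 0.8  # hypothetical elasticity of AHE with respect to Age, NOT an estimate

def pct_change_log_log(beta, age_from, age_to):
    """Percentage change in AHE implied by a log-log model when Age moves from age_from to age_to."""
    delta_ln_ahe = beta * math.log(age_to / age_from)
    return 100.0 * (math.exp(delta_ln_ahe) - 1.0)

chg_25 = pct_change_log_log(beta_ln_age, 25, 26)
chg_33 = pct_change_log_log(beta_ln_age, 33, 34)
print("25 -> 26:", chg_25)
print("33 -> 34:", chg_33)
```

One year is a 4% increase in Age at 25 but only about a 3% increase at 33, so with a positive elasticity the implied earnings change is larger at the younger age.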
d. Quadratic (Polynomial) Regression of ln(AHE) on Age and Age²
- Run a regression of ln(AHE) on:
  - Age
  - Age² (Age squared)
  - Female
  - Bachelor.
- Interpretation:
  - If Age increases from 25 to 26, what is the expected
    percentage change in earnings?
  - If Age increases from 33 to 34, what is the expected
    percentage change in earnings?
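For part (d), the change in ln(AHE) combines both age terms: Δln(AHE) = β₁(Age₁ − Age₀) + β₂(Age₁² − Age₀²). The coefficients below are hypothetical placeholders chosen only to show the arithmetic.

```python
b_age, b_age2 = 0.08, -0.001  # hypothetical coefficients on Age and Age^2, NOT estimates

def delta_ln_ahe(age_from, age_to):
    """Change in predicted ln(AHE) in the quadratic model; Female and Bachelor are held fixed."""
    return b_age * (age_to - age_from) + b_age2 * (age_to ** 2 - age_from ** 2)

d25 = delta_ln_ahe(25, 26)  # 0.08 - 0.001 * 51
d33 = delta_ln_ahe(33, 34)  # 0.08 - 0.001 * 67
print("25 -> 26: about", round(100 * d25, 1), "percent")
print("33 -> 34: about", round(100 * d33, 1), "percent")
```

With a negative coefficient on Age², the implied gain from an extra year shrinks as Age rises, which is the pattern the quadratic term is meant to capture.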
e. Model Comparison (c vs. b)
- Do you prefer the log-log model (c) over the
log-linear model (b)? Explain why.
f. Model Comparison (d vs. b)
- Do you prefer the quadratic model (d) over the
log-linear model (b)? Explain why.
g. Model Comparison (d vs. c)
- Do you prefer the quadratic model (d) over the
log-log model (c)? Explain why.
h. Graphical Analysis of Regression Functions
- Plot the estimated relationship between Age and ln(AHE) for:
  - Males with only a high school diploma (no bachelor’s
    degree), using models (b), (c), and (d).
- Describe:
  - Similarities and differences between the three regression
    curves.
- Additional Consideration:
  - Would the results differ if plotted for females with a
    bachelor’s degree?