Wage_GenderDS.xls (500 observations on wage, gender, age, education, part-time status)

Wage Gender.xls file: obs, wage, female, age, educ, parttime.

Variable definitions:
- wage: hourly wage (in some currency unit)
- female: 1 if woman, 0 if man
- age: age in years
- educ: education level (1 = low, 2 = medium, 3 = high, 4 = very high)
- parttime: 1 if part-time worker, 0 if full-time

GRETL: View → Summary statistics (select all variables).
female).
educ and parttime by gender.
Exercise:
female as group variable.
Expected observation:
Intuitive explanation:
Create log(wage) in GRETL:
wage → natural log.
l_wage appears.
Instructions:
wage.
l_wage.
Discussion questions:
GRETL: View → Graph → Factorized Boxplots → check
“Draw boxplots for each level of a categorical variable” → select
female as factor.
Observe:
Expected findings:
Model:
\[
\ln(\text{wage}_i) = \beta_1 + \beta_2 \text{female}_i + \varepsilon_i
\]
GRETL: Model → Ordinary Least Squares
- Dependent variable: l_wage
- Independent variable: female
- Read the estimated coefficient on female.

Interpretation:
On average, women earn about 22% less than men in this
sample.
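When the dependent variable is \(\ln(\text{wage})\), the coefficient on female is a difference in log points, and reading it directly as a percentage is an approximation. For a 0/1 regressor the exact percentage difference is \(100 \cdot (e^{\beta_2} - 1)\). A minimal Python sketch of that conversion follows; the coefficient values are hypothetical illustrations, not estimates from the course dataset, and `log_gap_to_percent` is a name introduced here.

```python
import math


def log_gap_to_percent(beta: float) -> float:
    """Convert the coefficient on a 0/1 dummy in a log-wage regression
    into the exact percentage difference in wage levels."""
    return 100.0 * (math.exp(beta) - 1.0)


# Hypothetical coefficients for illustration only (NOT estimates from the data).
for beta in (-0.05, -0.25, -0.50):
    print(f"beta = {beta:+.2f} -> exact gap = {log_gap_to_percent(beta):+.1f}%")
```

For small coefficients the two readings nearly coincide; the further the coefficient is from zero, the more the exact figure differs from the log-point value.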
Expected answers:
Key insight: The simple regression measures the total wage gap (including all differences), not the causal effect of gender holding other factors constant.
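Because female is a 0/1 variable, the OLS slope in the simple regression is exactly the difference between women's and men's mean log wage, and the intercept is men's mean log wage. This is why the coefficient captures the total gap. The sketch below checks this on a few made-up numbers (not the course data); `ols_simple` is a helper written for this illustration.

```python
from statistics import mean


def ols_simple(x, y):
    """Return (intercept, slope) of the OLS line y = a + b*x."""
    x_bar, y_bar = mean(x), mean(y)
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    slope = sxy / sxx
    return y_bar - slope * x_bar, slope


# Made-up toy values for illustration only (not the course data).
female = [0, 0, 0, 0, 1, 1, 1, 1]
l_wage = [2.9, 3.1, 3.3, 3.5, 2.6, 2.8, 3.0, 3.2]

intercept, slope = ols_simple(female, l_wage)
mean_men = mean([lw for f, lw in zip(female, l_wage) if f == 0])
mean_women = mean([lw for f, lw in zip(female, l_wage) if f == 1])

print(f"intercept = {intercept:.3f}, mean l_wage of men = {mean_men:.3f}")
print(f"slope = {slope:.3f}, women minus men = {mean_women - mean_men:.3f}")
```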
GRETL: View → Summary statistics → factorized by
female, select educ.
Create a cross‑tabulation manually or use Data → Frequency
tables → two variables: educ and
female.
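For readers who want to see what the cross-tabulation computes, here is a minimal Python sketch that counts (female, educ) pairs and turns them into within-gender shares. The records are made up for illustration and are not taken from the course dataset; `column_share` is a hypothetical helper name.

```python
from collections import Counter

# Made-up toy records (female, educ) for illustration only.
records = [(0, 1), (0, 2), (0, 3), (0, 3), (0, 4), (0, 4),
           (1, 1), (1, 1), (1, 2), (1, 2), (1, 3), (1, 4)]

counts = Counter(records)                 # cell counts of the cross-tab
totals = Counter(f for f, _ in records)   # number of men and of women


def column_share(educ_level, female):
    """Share of one gender that has the given education level."""
    return counts[(female, educ_level)] / totals[female]


print("educ    men  women")
for level in (1, 2, 3, 4):
    print(f"{level:>4}  {column_share(level, 0):5.2f}  {column_share(level, 1):5.2f}")
```

Comparing shares within each gender, rather than raw counts, keeps the comparison valid when the sample contains unequal numbers of men and women.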
Expected pattern:
- Lower education levels (educ=1) more frequent among women.
- Higher education levels (educ=4) less frequent among women.

Discussion:
- “If women have less education on average, and education raises wages, then part of the raw wage gap is due to education differences, not discrimination.”
GRETL: Summary statistics for parttime
by female.
Expected result:
- Proportion of part‑time workers among women: ~56%
- Among men: ~22%

Discussion:
- Part‑time jobs typically pay less per hour (fewer benefits, less seniority).
- Women may choose part‑time work for family reasons, but the wage penalty should not be ignored.

Factorized box-plots:
- l_wage vs. educ – shows positive relationship.
- l_wage vs. parttime – shows negative relationship.
GRETL: View → Graph Specified Vars → Factorized Box-plots
Summary table:
| Variable | Men (female=0) | Women (female=1) | Relationship with wage |
|---|---|---|---|
| Education | Higher average | Lower average | Positive |
| Part‑time | Lower share | Higher share | Negative |
Conclusion: Both confounders likely exaggerate the raw gender gap. A multiple regression can estimate the direct effect of gender after controlling for these factors.
Key takeaways:
- The simple regression shows a 22% total wage gap.
- But women also have lower education and higher part‑time rates.
- To isolate discrimination, we need to control for these confounders.
Preview of Session 2:
“We will diagnose omitted variable bias by analyzing residuals, then
learn how multiple regression solves the problem.”
Definition: The total effect of gender on wage includes all pathways:
Formula:
\[
\text{Total gap} = \underbrace{\text{Direct effect}}_{\text{causal}} +
\underbrace{\text{Indirect via confounders}}_{\text{non‑discriminatory}}
\]
Policy use: Total gap measures overall inequality, useful for broad social policy.
Definition: The effect of gender holding other variables constant (e.g., same education, same part‑time status).
Interpretation: The partial effect is often interpreted as the upper bound of discrimination (if all relevant confounders are controlled).
Legal use: Courts often require evidence of discrimination after accounting for qualifications.
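For education:
- If women have less education on average (negative correlation between female and educ), and education increases wage (positive effect), then omitting educ causes the female coefficient to be more negative (biased downward).

For part‑time:
- If women work part‑time more often (positive correlation between parttime and female), and part‑time work lowers the hourly wage (negative effect), then omitting parttime causes the female coefficient to be more negative (biased downward).

Conclusion: The simple regression overestimates the gender gap (makes it look larger) because it fails to control for factors that are correlated with gender and also affect wages.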
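The sign arguments for education and part‑time can be condensed into the standard omitted variable bias identity. As a sketch, assume the fuller model adds educ and parttime linearly, with coefficients \(\beta_3\) and \(\beta_4\) and error term \(u_i\) (notation introduced here for illustration):

\[
\ln(\text{wage}_i) = \beta_1 + \beta_2 \text{female}_i + \beta_3 \text{educ}_i + \beta_4 \text{parttime}_i + u_i
\]

Write \(\hat{\beta}_2\) for the female coefficient from the simple regression and \(\tilde{\beta}_2, \tilde{\beta}_3, \tilde{\beta}_4\) for the OLS estimates from the fuller model. Let \(\hat{\delta}_{\text{educ}}\) and \(\hat{\delta}_{\text{parttime}}\) be the OLS slopes from regressing educ and parttime on female, that is, the women-minus-men differences in mean education and in part‑time share. Then:

\[
\hat{\beta}_2 = \tilde{\beta}_2 + \tilde{\beta}_3 \, \hat{\delta}_{\text{educ}} + \tilde{\beta}_4 \, \hat{\delta}_{\text{parttime}}
\]

With \(\tilde{\beta}_3 > 0\) and \(\hat{\delta}_{\text{educ}} < 0\), the education term is negative; with \(\tilde{\beta}_4 < 0\) and \(\hat{\delta}_{\text{parttime}} > 0\), the part‑time term is negative as well. Both push \(\hat{\beta}_2\) below \(\tilde{\beta}_2\), which is the downward bias described above.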
Residual \(e_i = \ln(\text{wage}_i) - \hat{\beta}_1 - \hat{\beta}_2 \text{female}_i\).
Interpretation: The part of log‑wage not explained by gender alone.
If the model is correct, residuals should be random (no correlation with other variables). If residuals correlate with education or part‑time, that means those variables belong in the model.
GRETL steps:
- From the model window (regression of l_wage on female), click Save → Residuals.
- Name the new variable e_simple.

Scatterplot:
- X variable: educ, Y variable: e_simple. Add jitter.

Expected pattern: Positive correlation.
Calculate correlation coefficient:
- View → Summary statistics → select
e_simple and educ → check “Correlation
matrix”.
Expected correlation: roughly +0.53 (positive).
Interpretation:
“Residuals are not random – they increase with education. This means our
model is missing education. Women have less education, so the model
incorrectly attributes the lower wage from less education to
gender.”
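The diagnostic itself is simple to reproduce outside GRETL. The Python sketch below fits the simple regression, computes the residuals, and correlates them with the included and the omitted variable. All numbers are made up for illustration (they will not reproduce the +0.53 expected from the course data), and `ols_simple` and `pearson` are helpers written for this example.

```python
from statistics import mean


def ols_simple(x, y):
    """Return (intercept, slope) of the OLS line y = a + b*x."""
    x_bar, y_bar = mean(x), mean(y)
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    slope = sxy / sxx
    return y_bar - slope * x_bar, slope


def pearson(a, b):
    """Pearson correlation coefficient between two equal-length lists."""
    a_bar, b_bar = mean(a), mean(b)
    cov = sum((ai - a_bar) * (bi - b_bar) for ai, bi in zip(a, b))
    var_a = sum((ai - a_bar) ** 2 for ai in a)
    var_b = sum((bi - b_bar) ** 2 for bi in b)
    return cov / (var_a * var_b) ** 0.5


# Made-up toy values for illustration only (not the course data).
female = [0, 0, 0, 0, 1, 1, 1, 1]
educ = [2, 3, 3, 4, 1, 1, 2, 3]
l_wage = [3.0, 3.2, 3.3, 3.5, 2.6, 2.7, 2.9, 3.2]

b1, b2 = ols_simple(female, l_wage)
e_simple = [lw - b1 - b2 * f for f, lw in zip(female, l_wage)]

# Residuals are uncorrelated with the included regressor by construction...
print("corr(e_simple, female):", round(pearson(e_simple, female), 6))
# ...but can still be correlated with a variable left out of the model.
print("corr(e_simple, educ):  ", round(pearson(e_simple, educ), 3))
```

The first correlation is zero for any dataset, because OLS forces residuals to be orthogonal to the regressors it includes. Only the second correlation carries diagnostic information.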
Scatterplot: e_simple
vs. parttime (part‑time is 0/1, so use boxplot or scatter
with jitter).
Expected pattern: Negative relationship.
Calculate correlation: should be negative (e.g., –0.2 to –0.3).
Interpretation:
“Residuals are lower for part‑time workers. Since women are more likely
to work part‑time, the simple regression mistakenly treats part‑time
wage penalty as part of the gender gap.”
Multiple regression of residuals on educ and parttime:
- Dependent: e_simple, independent: educ, parttime.

Expected output: Both coefficients significant, \(R^2\) around 0.2–0.3.
Interpretation:
“The omitted variables explain a substantial portion of the residuals.
This confirms that the simple regression suffers from omitted variable
bias.”
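A minimal Python sketch of this auxiliary regression, using only the standard library, is given below. It builds the residuals as deviations from each gender's mean log wage (which is what the simple regression on a 0/1 variable produces), then regresses them on educ and parttime via the normal equations. The data are made up for illustration, so the coefficients and \(R^2\) will not match the course dataset; `solve`, `ols`, and `group_mean` are helpers written for this example, and the sketch reports no standard errors or significance tests.

```python
def solve(a, b):
    """Solve the linear system a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for r in reversed(range(n)):
        x[r] = (m[r][n] - sum(m[r][c] * x[c] for c in range(r + 1, n))) / m[r][r]
    return x


def ols(columns, y):
    """OLS of y on a constant plus the given regressor columns.
    Returns (coefficients, r_squared); coefficients[0] is the intercept."""
    rows = [[1.0, *vals] for vals in zip(*columns)]
    k = len(rows[0])
    xtx = [[sum(r[i] * r[j] for r in rows) for j in range(k)] for i in range(k)]
    xty = [sum(r[i] * yi for r, yi in zip(rows, y)) for i in range(k)]
    beta = solve(xtx, xty)
    fitted = [sum(b * v for b, v in zip(beta, r)) for r in rows]
    y_bar = sum(y) / len(y)
    ss_res = sum((yi - fi) ** 2 for yi, fi in zip(y, fitted))
    ss_tot = sum((yi - y_bar) ** 2 for yi in y)
    return beta, 1.0 - ss_res / ss_tot


def group_mean(values, groups, g):
    """Mean of the values whose group label equals g."""
    selected = [v for v, grp in zip(values, groups) if grp == g]
    return sum(selected) / len(selected)


# Made-up toy values for illustration only (not the course data).
female = [0, 0, 0, 0, 1, 1, 1, 1]
educ = [2, 3, 3, 4, 1, 1, 2, 3]
parttime = [0, 0, 1, 0, 1, 1, 0, 1]
l_wage = [3.0, 3.3, 2.9, 3.6, 2.6, 2.7, 3.1, 3.0]

# Residuals of l_wage on female: deviation from the own-gender mean.
means = {g: group_mean(l_wage, female, g) for g in (0, 1)}
e_simple = [lw - means[f] for lw, f in zip(l_wage, female)]

beta, r2 = ols([educ, parttime], e_simple)
print(f"const = {beta[0]:.3f}, educ = {beta[1]:.3f}, parttime = {beta[2]:.3f}")
print(f"R-squared = {r2:.3f}")
```

An \(R^2\) clearly above zero in this regression means the omitted variables carry information that the simple model pushed into its residuals.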
Key message:
Practical rule:
| Concept | Simple Regression (l_wage ~ female) | Multiple Regression (l_wage ~ female + educ + parttime) |
|---|---|---|
| What it measures | Total wage gap (including confounders) | Direct gender effect (holding education and part‑time constant) |
| Interpretation of female coefficient | –22% (biased downward) | Expected to be smaller (closer to zero) |
| Key limitation | Omitted variable bias | Requires that no other confounders exist |
| Use case | Describing overall inequality | Estimating discrimination (partial effect) |
| Residual diagnostics | Correlated with educ and parttime | Should be uncorrelated with included variables |

| Task | GRETL Menu Path |
|---|---|
| Load CSV / XLSX / XLS | File → Open data → Import → CSV or |
| Summary statistics | View → Summary statistics |
| Grouped stats | View → Summary statistics → factorized |
| Histogram | Variable → Frequency distribution → Normal distribution |
| Boxplot | View → Graph → Boxplot → select categorical variable (factor) |
| OLS regression | Model → Ordinary Least Squares |
| Save residuals | From model window: Save → Residuals |
| Correlation matrix | View → Summary statistics → check “Correlation matrix” |
| Create log variable | Add → Logs of selected variables → natural log |