PART 1: Fertility and Labor Supply: Guided In-Class Exercise

Dataset Introduction & Variable Descriptions

We’ll use the Fertility_Small dataset (30,000 married women, 1980 Census) with these key variables:
- Dependent variable: weeksm1 (mother’s weeks worked in 1979)
- Independent variable: morekids (=1 if mother has >2 children)
- Instrumental variable: samesex (=1 if first two children are same sex)
- Controls: agem1 (age), black, hispan, othrace

Introduction to the Problem

We want to estimate how having additional children affects mothers’ labor supply. At first glance, we might simply compare weeks worked between mothers with 2 children and those with 3 or more children. But is this comparison reliable?

ANSWER Q0: Key Questions to Consider
a) What is the expected sign of the coefficient on having more children (morekids)? Why?
b) What are the data types of weeksm1, morekids, and samesex?
c) Why might a simple correlation overstate the true causal effect? Why might having more children be correlated with other factors?

Step 1: Run OLS Regression (Naive Estimate)

  1. In GRETL: Model → Ordinary Least Squares
  2. Set:
    • Dependent variable: weeksm1
    • Regressors: const morekids

ANSWER Q1

  1. Record and interpret:
    • Coefficient on morekids = ____
    • Standard error = ____
    • Is it statistically significant? (p < 0.05)
    • Comment on the results. What does this initial estimate suggest about the relationship between family size and labor supply? What is the marginal effect of having 3+ children rather than 2 children on labor supply of women? Why might this be misleading?

Understanding Endogeneity

The OLS estimate may be biased because:
a) Omitted variables: Mothers who prefer larger families might have different work preferences, and women with strong career prospects might delay having children.
b) Reverse causality: Women who work less may choose to have more children.
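The concern can be written compactly in regression notation. This is a standard textbook sketch; the symbols \beta_0, \beta_1, and u_i are introduced here for illustration and do not appear in the dataset.

```latex
\begin{aligned}
\text{weeksm1}_i &= \beta_0 + \beta_1\,\text{morekids}_i + u_i \\
\hat{\beta}_1^{\,OLS} &\xrightarrow{\;p\;} \beta_1
  + \frac{\operatorname{Cov}(\text{morekids}_i,\,u_i)}{\operatorname{Var}(\text{morekids}_i)}
\end{aligned}
```

If unobserved preferences or career prospects sit in u_i and are correlated with morekids, the second term is nonzero and OLS does not recover \beta_1.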

ANSWER Q2

Can you think of other factors that might simultaneously affect fertility decisions and labor supply?

Instrumental Variables Approach

We’ll use samesex (whether the first two children are the same sex) as an instrument because:
a) Relevance: Parents whose first two children are the same sex are more likely to have a third child.
b) Exogeneity: A child’s sex is essentially random.
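Stated formally, for the simple model with no controls, the two conditions and the estimand they identify look as follows. The error term u_i and the coefficient \beta_1 are notation assumed here for illustration.

```latex
\begin{aligned}
&\text{Model: } \text{weeksm1}_i = \beta_0 + \beta_1\,\text{morekids}_i + u_i \\
&\text{Relevance: } \operatorname{Cov}(\text{samesex}_i,\,\text{morekids}_i) \neq 0 \\
&\text{Exogeneity: } \operatorname{Cov}(\text{samesex}_i,\,u_i) = 0 \\
&\Rightarrow\quad \beta_1 =
  \frac{\operatorname{Cov}(\text{samesex}_i,\,\text{weeksm1}_i)}
       {\operatorname{Cov}(\text{samesex}_i,\,\text{morekids}_i)}
\end{aligned}
```

Relevance keeps the denominator away from zero; exogeneity removes the u_i term from the numerator.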

Step 2: First-Stage Regression

  1. In GRETL: Model → Ordinary Least Squares
  2. Set:
    • Dependent: morekids
    • Regressors: const samesex
    • Save the residuals as V
  3. Check:
    • Is the coefficient on samesex positive and significant? (t-stat > 2)
    • Is the F-statistic > 10? (Weak instrument test)

PS: Model → Two-Stage Least Squares reproduces the OLS results if you enter the same variables as both regressors and instruments.

ANSWER Q3 Report the results. Comment on the coefficient on samesex. Why is it important that our instrument is both relevant and exogenous? Is the F-statistic > 10? (Weak instrument test).
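The first-stage checks can be sketched outside GRETL as well. The example below runs on simulated data, not on Fertility_Small: the probabilities 0.35 and 0.07 are made-up values chosen only to produce a relevant instrument, and first_stage is a hypothetical helper. It uses classical (non-robust) standard errors and relies on the fact that, with a single instrument, the first-stage F-statistic equals the squared t-statistic.

```python
import math
import random

def first_stage(z, x):
    """Regress x on a constant and z; return slope, classical SE, t and F."""
    n = len(z)
    mz, mx = sum(z) / n, sum(x) / n
    szz = sum((a - mz) ** 2 for a in z)
    szx = sum((a - mz) * (b - mx) for a, b in zip(z, x))
    slope = szx / szz
    intercept = mx - slope * mz
    ssr = sum((b - intercept - slope * a) ** 2 for a, b in zip(z, x))
    se = math.sqrt(ssr / (n - 2) / szz)
    t = slope / se
    return slope, se, t, t * t  # one instrument: F = t^2

# Simulated stand-ins for samesex and morekids (assumed values, not real data).
rng = random.Random(0)
samesex = [1.0 if rng.random() < 0.5 else 0.0 for _ in range(30000)]
morekids = [1.0 if rng.random() < 0.35 + 0.07 * z else 0.0 for z in samesex]

slope, se, t_stat, f_stat = first_stage(samesex, morekids)
print(f"slope={slope:.3f}  se={se:.4f}  t={t_stat:.1f}  F={f_stat:.1f}")
print("passes F > 10 rule of thumb:", f_stat > 10)
```

The numbers printed here say nothing about the real dataset; they only show which quantities to read off the GRETL output.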

Step 3: Hausman Test
“Is OLS really biased?”
- Run Step 1 OLS again → Save residuals as e_OLS
- GRETL: Model → OLS → Dependent: e_OLS, Regressors: const morekids V

ANSWER Q4:
Is V significant? (p < 0.05) ✓/✗ → If yes, this is evidence that morekids is endogenous, so OLS is biased!
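The residual-based check in Step 3 can be sketched on simulated data. Everything in the data-generating part is an assumption made for illustration (the unobserved taste variable, the true effect of -5, the instrument strength), and ols is a small hypothetical helper with classical standard errors; results on Fertility_Small will differ.

```python
import math
import random

def ols(y, columns):
    """OLS of y on the listed regressor columns.
    Returns (coefficients, classical standard errors, residuals)."""
    n, k = len(y), len(columns)
    rows = list(zip(*columns))
    xtx = [[sum(r[i] * r[j] for r in rows) for j in range(k)] for i in range(k)]
    xty = [sum(r[i] * yi for r, yi in zip(rows, y)) for i in range(k)]
    # Invert X'X by Gauss-Jordan elimination on the augmented matrix [X'X | I].
    aug = [xtx[i] + [1.0 if i == j else 0.0 for j in range(k)] for i in range(k)]
    for c in range(k):
        p = max(range(c, k), key=lambda r: abs(aug[r][c]))
        aug[c], aug[p] = aug[p], aug[c]
        pivot = aug[c][c]
        aug[c] = [v / pivot for v in aug[c]]
        for r in range(k):
            if r != c:
                f = aug[r][c]
                aug[r] = [vr - f * vc for vr, vc in zip(aug[r], aug[c])]
    inv = [row[k:] for row in aug]
    beta = [sum(inv[i][j] * xty[j] for j in range(k)) for i in range(k)]
    resid = [yi - sum(b * v for b, v in zip(beta, r)) for r, yi in zip(rows, y)]
    s2 = sum(e * e for e in resid) / (n - k)
    se = [math.sqrt(s2 * inv[i][i]) for i in range(k)]
    return beta, se, resid

# Simulated data with an unobserved confounder (all numbers are assumptions).
rng = random.Random(1)
n = 20000
const = [1.0] * n
samesex, morekids, weeks = [], [], []
for _ in range(n):
    z = 1.0 if rng.random() < 0.5 else 0.0
    taste = rng.gauss(0, 1)  # unobserved; affects both fertility and work
    x = 1.0 if 0.8 * z + taste + rng.gauss(0, 1) > 0.4 else 0.0
    samesex.append(z)
    morekids.append(x)
    weeks.append(30 - 5 * x - 4 * taste + rng.gauss(0, 5))

_, _, v = ols(morekids, [const, samesex])         # Step 2: first-stage residuals V
_, _, e_ols = ols(weeks, [const, morekids])       # Step 1: OLS residuals e_OLS
beta, se, _ = ols(e_ols, [const, morekids, v])    # Step 3: auxiliary regression
t_v = beta[2] / se[2]
print(f"coefficient on V = {beta[2]:.2f}, t = {t_v:.1f}")
```

Because the simulation builds in endogeneity, the coefficient on V should come out clearly significant. Regressing weeks directly on const, morekids, and V gives the same coefficient on V, which is the more common way this test is presented.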

Step 4: 2SLS
“Get the causal effect”
- GRETL: Model → Two-Stage Least Squares
- Dependent: weeksm1
- Regressors: const morekids
- Instruments: const samesex
ANSWER Q5:
a) 2SLS β = ______ vs OLS β = ______
b) Which estimate is more credible? Why?
c) What does this difference suggest about the original OLS bias?
d) Compare the standard errors of the two estimators. Comment.
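With one binary instrument and no controls, the 2SLS estimate reduces to a ratio of differences in means (the Wald estimator). The sketch below compares it with OLS on simulated data where the true effect is known. The data-generating process, including TRUE_EFFECT and the unobserved taste variable, is assumed purely for illustration, and simulated weeks are not restricted to the 0 to 52 range.

```python
import random

def mean(values):
    return sum(values) / len(values)

def diff_in_means(outcome, group):
    """Mean of outcome where group == 1 minus mean where group == 0."""
    ones = [o for o, g in zip(outcome, group) if g == 1]
    zeros = [o for o, g in zip(outcome, group) if g == 0]
    return mean(ones) - mean(zeros)

TRUE_EFFECT = -5.0  # assumed causal effect of morekids on weeks worked

rng = random.Random(42)
samesex, morekids, weeks = [], [], []
for _ in range(20000):
    z = 1 if rng.random() < 0.5 else 0
    taste = rng.gauss(0, 1)  # unobserved; raises fertility and lowers work
    x = 1 if 1.2 * z + taste + rng.gauss(0, 1) > 0.6 else 0
    y = 30 + TRUE_EFFECT * x - 4 * taste + rng.gauss(0, 5)
    samesex.append(z)
    morekids.append(x)
    weeks.append(y)

# OLS with a single binary regressor is the difference in group means.
beta_ols = diff_in_means(weeks, morekids)
# Wald / IV: reduced-form difference divided by first-stage difference.
beta_iv = diff_in_means(weeks, samesex) / diff_in_means(morekids, samesex)

print(f"true effect = {TRUE_EFFECT}")
print(f"OLS         = {beta_ols:.2f}")
print(f"IV (Wald)   = {beta_iv:.2f}")
```

In this setup OLS overstates the negative effect because taste is omitted, while the IV estimate lands near the assumed true value but with more sampling noise.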

Step 5: Adding Control Variables

  1. In GRETL: Model → Two-Stage Least Squares
  2. Set:
    • Dependent: weeksm1
    • Regressors: const morekids agem1 black hispan othrace
    • Instruments: const samesex agem1 black hispan othrace
  3. Examine how results change

ANSWER Q6
a) Why might we want to include these control variables?
b) How robust are our findings to these specifications?

Expected Results Comparison

Method | morekids coefficient | Interpretation
OLS    | ____                 | ____
IV     | ____                 | ____

Fill in the blanks above: (Hint for interpretation: Potentially biased/Causal estimate)

ANSWER Q7: Discussion Questions

  1. Why is OLS inappropriate here?
  2. How does samesex address endogeneity?
  3. What if samesex were only weakly correlated with morekids?

PART 2: Economic Growth Determinants: Guided In-Class Exercise

Data Introduction

Using the Growth dataset (excluding Malta), we’ll examine relationships between:

  • Growth: Annual GDP growth rate (%)
  • TradeShare: Measure of trade openness
  • YearsSchool: Average years of schooling
  • Rev_Coups: Number of revolutionary coups
  • Assassinations: Number of assassinations
  • RGDP60: Real GDP per capita in 1960

Exercise 1: Linear vs. Log Specification

  1. Scatterplot Analysis
    • Create a scatterplot of Growth against YearsSchool
    • GRETL command: View → Graph specified vars → X-Y scatter

Questions:

ANSWER Q1 Does the relationship appear linear or nonlinear?

ANSWER Q2 Why might a regression using ln(YearsSchool), regression (2) in Exercise 2, fit better than one using YearsSchool, regression (1)?


Exercise 2: Policy Prediction

  1. Education Policy Impact
    • Regression (1): Growth ~ TradeShare + YearsSchool
    • Regression (2): Growth ~ TradeShare + ln(YearsSchool)

ANSWER Q3 Predict the increase in growth when schooling rises from 4 to 6 years, using both regressions.

ANSWER Q4 Compare results. Which specification seems more plausible?
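The arithmetic behind the two predictions can be sketched as below. Holding TradeShare fixed, the linear specification predicts a change of the schooling coefficient times (6 - 4), while the log specification predicts the coefficient times (ln 6 - ln 4). The coefficient values used here are placeholders, not estimates from the Growth dataset; substitute the ones from your own GRETL output.

```python
import math

def growth_change_linear(beta_school, s0, s1):
    """Predicted change in Growth under regression (1): beta * (s1 - s0)."""
    return beta_school * (s1 - s0)

def growth_change_log(beta_ln_school, s0, s1):
    """Predicted change in Growth under regression (2): beta * (ln s1 - ln s0)."""
    return beta_ln_school * (math.log(s1) - math.log(s0))

# Placeholder coefficients (assumed for illustration only).
beta_school = 0.25
beta_ln_school = 1.0

change_linear = growth_change_linear(beta_school, 4, 6)
change_log = growth_change_log(beta_ln_school, 4, 6)
print(f"linear spec: {change_linear:.3f} percentage points")
print(f"log spec:    {change_log:.3f} percentage points")
```

Under the log specification the same two-year increase matters less at higher starting levels of schooling, which is one thing to keep in mind when judging plausibility.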


Exercise 3: Nonlinear Trade Effects

  1. TradeShare Nonlinearity
    • Regression (5): Growth ~ TradeShare + TradeShare² + TradeShare³ + controls

ANSWER Q5 Are the quadratic/cubic terms jointly significant? (Test → Omit variables)

ANSWER Q6 What does this suggest about the trade-growth relationship?
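For reference, the homoskedasticity-only F-statistic for testing q joint restrictions compares the sum of squared residuals (SSR) from the restricted and unrestricted regressions. The sketch below uses made-up SSR values, sample size, and regressor count; joint_f_stat is a hypothetical helper, and the statistic GRETL reports can differ if robust standard errors are selected.

```python
def joint_f_stat(ssr_restricted, ssr_unrestricted, q, n, k_unrestricted):
    """Homoskedasticity-only F-statistic for q joint restrictions.

    k_unrestricted counts the regressors in the unrestricted model,
    excluding the constant.
    """
    numerator = (ssr_restricted - ssr_unrestricted) / q
    denominator = ssr_unrestricted / (n - k_unrestricted - 1)
    return numerator / denominator

# Made-up inputs for illustration: dropping 2 terms from a 6-regressor model.
f_stat = joint_f_stat(ssr_restricted=230.0, ssr_unrestricted=200.0,
                      q=2, n=64, k_unrestricted=6)
print(f"F = {f_stat:.3f}")
```

A large F relative to the critical value of the F(q, n - k - 1) distribution indicates that the omitted terms are jointly significant. The same formula applies to the joint test in Exercise 4.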


Exercise 4: Joint Significance

  1. Omitted Variable Test
    • Full regression: Growth ~ TradeShare + YearsSchool + Rev_Coups + Assassinations + RGDP60

ANSWER Q7 Test if YearsSchool, Rev_Coups, Assassinations, RGDP60 can be omitted