In-Class Notes: Panel Data Models

Author

AS

Worksheet: Fixed Effects, Random Effects, or Mixed Effects?

Background

Below are three models describing the same idea:

  • We observe repeated data for individuals (i) over time (t)
  • We want to study how \(X_{it}\) affects \(Y_{it}\)

Each equation treats the unit-specific effect differently.

Your task: Identify whether each model represents a Fixed Effects, Random Effects, or Mixed Effects model.


Model Specifications

Model A

\[Y_{it} = \alpha_i + \beta X_{it} + u_{it}\]

Model B

\[Y_{it} = \alpha + \beta X_{it} + \mu_i + u_{it}, \quad \mu_i \sim N(0, \sigma_\mu^2)\]

Model C

\[Y_{it} = (\alpha + \mu_i) + (\beta + v_i) X_{it} + u_{it}\]

Where: \[\begin{cases} \mu_i \sim N(0, \sigma_\mu^2)\\ v_i \sim N(0, \sigma_v^2) \end{cases}\]


Questions for Students

  1. Which model allows each individual to have their own intercept (but a common slope)?

  2. Which model assumes that the unobserved effect is random and uncorrelated with \(X_{it}\)?

  3. Which model allows both intercepts and slopes to vary across individuals?

  4. Which model would you expect to be most common in econometrics and why?


Answer Key and Explanations

Model A → Fixed Effects (FE)

\[Y_{it} = \alpha_i + \beta X_{it} + u_{it}\]

  • Each unit (i) has its own intercept \(\alpha_i\) (fixed constant)
  • The slope \(\beta\) is the same for all units
  • Removes time-invariant unobserved heterogeneity, even when it is correlated with \(X_{it}\)
  • Estimated using within transformation or dummy variables

Econometric insight: Most common for causal inference when unobserved heterogeneity is correlated with regressors.
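The two estimation routes named above (within transformation and dummy variables) give the same slope. The sketch below illustrates this on simulated data; the panel size, the true slope, and the way \(X_{it}\) is made to depend on \(\alpha_i\) are all assumptions chosen for illustration, not part of the worksheet.

```python
import numpy as np

rng = np.random.default_rng(0)
N, T, beta = 50, 6, 2.0                      # hypothetical panel size and true slope

unit = np.repeat(np.arange(N), T)            # unit index i for each row
alpha = rng.normal(size=N)                   # unit intercepts alpha_i
# X is built to be correlated with alpha_i, the case FE is designed for
x = 0.8 * alpha[unit] + rng.normal(size=N * T)
y = alpha[unit] + beta * x + rng.normal(scale=0.5, size=N * T)

def unit_means(v):
    """Return, for every row, the mean of v within that row's unit."""
    return (np.bincount(unit, weights=v) / np.bincount(unit))[unit]

# Within transformation: subtract unit means, then OLS without an intercept
x_w, y_w = x - unit_means(x), y - unit_means(y)
beta_within = (x_w @ y_w) / (x_w @ x_w)

# LSDV: regress y on x plus one dummy column per unit
D = (unit[:, None] == np.arange(N)).astype(float)
coef = np.linalg.lstsq(np.column_stack([x, D]), y, rcond=None)[0]
beta_lsdv = coef[0]

print(f"within = {beta_within:.4f}, LSDV = {beta_lsdv:.4f}, true beta = {beta}")
```

The two estimates agree up to floating-point error. The dummy-variable route also returns one estimated intercept per unit (the remaining entries of `coef`), which the within route never computes.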


Model B → Random Effects (RE)

\[Y_{it} = \alpha + \beta X_{it} + \mu_i + u_{it}\]

  • \(\mu_i\) is a random intercept drawn from a distribution
  • Assumes \(\text{Cov}(\mu_i, X_{it}) = 0\)
  • More efficient than FE if assumption holds
  • Estimated using Generalized Least Squares (GLS)

Econometric insight: Used when time-invariant regressors matter and unobserved effects are uncorrelated with regressors.
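For a balanced panel, GLS on Model B can be computed as OLS on "quasi-demeaned" data, where a fraction \(\theta = 1 - \sqrt{\sigma_u^2 / (\sigma_u^2 + T\sigma_\mu^2)}\) of each unit mean is subtracted. A minimal sketch follows. It assumes the variance components are known, whereas in practice they are estimated first. The sample sizes, parameter values, and the symbol \(\sigma_u^2\) for \(\text{Var}(u_{it})\) are hypothetical.

```python
import numpy as np

rng = np.random.default_rng(1)
N, T = 200, 5
alpha, beta = 1.0, 2.0                       # hypothetical true parameters
sigma_mu, sigma_u = 1.0, 1.0                 # treated as known here (an assumption)

unit = np.repeat(np.arange(N), T)
mu = rng.normal(scale=sigma_mu, size=N)      # random intercepts, independent of x
x = rng.normal(size=N * T)
y = alpha + beta * x + mu[unit] + rng.normal(scale=sigma_u, size=N * T)

# theta = 0 reproduces pooled OLS; theta = 1 reproduces the within (FE) estimator
theta = 1.0 - np.sqrt(sigma_u**2 / (sigma_u**2 + T * sigma_mu**2))

def unit_means(v):
    """Return, for every row, the mean of v within that row's unit."""
    return (np.bincount(unit, weights=v) / np.bincount(unit))[unit]

# Quasi-demean every column, including the constant
y_q = y - theta * unit_means(y)
X_q = np.column_stack([np.full(N * T, 1.0 - theta), x - theta * unit_means(x)])

alpha_re, beta_re = np.linalg.lstsq(X_q, y_q, rcond=None)[0]
print(f"theta = {theta:.3f}, alpha_RE = {alpha_re:.3f}, beta_RE = {beta_re:.3f}")
```

Because \(\theta\) lies strictly between 0 and 1, RE keeps part of the between-unit variation that FE discards. That is the source of its efficiency gain when \(\text{Cov}(\mu_i, X_{it}) = 0\).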


Model C → Mixed Effects (ME)

\[Y_{it} = (\alpha + \mu_i) + (\beta + v_i) X_{it} + u_{it}\]

  • Both intercepts \((\alpha + \mu_i)\) and slopes \((\beta + v_i)\) vary across units
  • Random deviations \(\mu_i, v_i\) capture group-specific patterns
  • Used in hierarchical or multilevel settings (e.g., students within schools)
  • Estimated by Maximum Likelihood / REML

Econometric insight: Common in applied micro (education, health, labor) when relationships differ by group.
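Collecting the random terms of Model C into one composite error shows why OLS standard errors are not appropriate here. This is a sketch that assumes \(\mu_i\), \(v_i\), and \(u_{it}\) are mutually independent and that \(\text{Var}(u_{it}) = \sigma_u^2\); the worksheet does not state either assumption.

\[Y_{it} = \alpha + \beta X_{it} + \varepsilon_{it}, \qquad \varepsilon_{it} = \mu_i + v_i X_{it} + u_{it}\]

\[\text{Var}(\varepsilon_{it} \mid X) = \sigma_\mu^2 + \sigma_v^2 X_{it}^2 + \sigma_u^2\]

\[\text{Cov}(\varepsilon_{it}, \varepsilon_{is} \mid X) = \sigma_\mu^2 + \sigma_v^2 X_{it} X_{is}, \quad t \neq s\]

The error variance depends on \(X_{it}\), and errors within a unit are correlated. ML and REML model this covariance structure explicitly. Setting \(\sigma_v^2 = 0\) recovers the Random Effects structure of Model B.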


Discussion Prompts

  1. Which model would be most appropriate if you believe unobserved ability is correlated with education?
    • Answer: Fixed Effects, as it can handle correlation between unobserved heterogeneity and regressors
  2. Which model best captures the idea that different firms respond differently to market changes?
    • Answer: Mixed Effects, as it allows slopes to vary across units
  3. How would you test whether Random Effects is valid?
    • Answer: The Hausman test compares the FE and RE estimates; a statistically significant difference is evidence against the RE assumption

Quick Summary Table

Model | Intercept | Slope | Key Assumption | Estimation Method
Fixed Effects | Unit-specific (\(\alpha_i\)) | Common (\(\beta\)) | Correlation with \(X_{it}\) allowed | Within transformation, LSDV
Random Effects | Random (\(\alpha + \mu_i\)) | Common (\(\beta\)) | \(\text{Cov}(\mu_i, X_{it}) = 0\) | GLS
Mixed Effects | Random (\(\alpha + \mu_i\)) | Random (\(\beta + v_i\)) | Hierarchical structure | ML/REML

Key Takeaways

  1. Fixed Effects is the workhorse of causal inference in econometrics
  2. Random Effects is more efficient but requires stronger assumptions
  3. Mixed Effects is flexible for hierarchical data structures
  4. Choice depends on:
    • Nature of unobserved heterogeneity
    • Research question (causal vs. predictive)
    • Data structure (panel vs. hierarchical)

How to Think About Random Effects vs Fixed Effects

The Common Misconception

NOT quite right:

  • “Random Effects = randomly sampled observations”
  • “Fixed Effects = all observations from population”

The real distinction is about:

  • How unobserved heterogeneity relates to your regressors
  • What assumptions you’re willing to make
  • What type of variation identifies your effect


The Core Conceptual Difference

Random Effects Mindset

“The unobserved differences between units are random draws from a distribution, independent of the explanatory variables”

Example: 1000 households randomly surveyed
- Each household has unobserved preferences (μᵢ)
- These preferences are like random "lottery tickets" from a distribution
- Key: Household income doesn't determine these random preferences
- We can use BOTH within and between household variation

Fixed Effects Mindset

“The unobserved differences between units are fixed parameters that may be correlated with explanatory variables”

Example: Same 1000 households
- Each household has unobserved ability/preferences (αᵢ)
- These might be correlated with income (ability → higher income)
- Key: We DON'T assume independence
- We use ONLY within-household variation over time

A Concrete Example: Studying Income → Consumption

Dataset: 500 households observed for 5 years

Scenario A: Use Random Effects

Your belief: Unobserved household traits are unrelated to income
- Unobserved frugality (μᵢ) is randomly distributed
- Frugal and spendthrift households equally likely to be rich or poor
- Income doesn't determine baseline consumption preferences

Model: Consumptionᵢₜ = α + β·Incomeᵢₜ + μᵢ + uᵢₜ
       where E[μᵢ|Incomeᵢₜ] = 0

Scenario B: Use Fixed Effects

Your belief: Unobserved traits correlate with income
- More educated households have both:
  * Higher income (education → better jobs)
  * Different consumption patterns (education → preferences)
- Can't assume independence

Model: Consumptionᵢₜ = αᵢ + β·Incomeᵢₜ + uᵢₜ
       where αᵢ can be correlated with Incomeᵢₜ
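The two scenarios can be compared in a small simulation. The sketch below uses made-up parameter values and a hypothetical `rho` knob that controls how strongly the unobserved household trait enters income. It contrasts the between estimator (cross-household variation only) with the within estimator (changes over time inside each household).

```python
import numpy as np

def simulate(rho, seed, N=500, T=5, beta=0.6):
    """Return (between, within) slope estimates for one simulated panel.

    rho = 0 mimics Scenario A (trait unrelated to income);
    rho > 0 mimics Scenario B (trait raises income).
    """
    rng = np.random.default_rng(seed)
    trait = rng.normal(size=N)                        # unobserved household effect
    income = rho * trait[:, None] + rng.normal(size=(N, T))
    cons = trait[:, None] + beta * income + rng.normal(scale=0.5, size=(N, T))

    # Between estimator: OLS on household means
    xb = income.mean(axis=1) - income.mean()
    yb = cons.mean(axis=1) - cons.mean()
    between = (xb @ yb) / (xb @ xb)

    # Within estimator: OLS on deviations from household means
    xw = income - income.mean(axis=1, keepdims=True)
    yw = cons - cons.mean(axis=1, keepdims=True)
    within = (xw * yw).sum() / (xw * xw).sum()
    return between, within

for label, rho in [("Scenario A", 0.0), ("Scenario B", 1.0)]:
    b, w = simulate(rho, seed=42)
    print(f"{label}: between = {b:.3f}, within = {w:.3f} (true beta = 0.6)")
```

In Scenario A both sources of variation point to the same slope, so combining them (as RE does) is safe. In Scenario B the between estimate absorbs the effect of the trait and overstates the slope, while the within estimate stays close to the true value.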

It’s NOT About Sample Size or Selection!

Both RE and FE can use:

  • The exact same dataset
  • All observations or a subset
  • Randomly selected or complete population data

Example: Company Employee Data

Full census of 10,000 employees over 10 years:

Could use RE if:

  • Studying effect of training on productivity
  • You believe innate ability is uncorrelated with training participation
  • (Maybe training is randomly assigned)

Should use FE if:

  • You believe high-ability workers select into training
  • Ability affects both training participation AND productivity
  • You want to control for this selection

Practical Thinking Guide

Think Fixed Effects when:

  1. Selection concerns: “Better units might select into treatment”
  2. Omitted variables: “Unobserved quality affects both X and Y”
  3. Causal focus: “I want within-unit variation only”
  4. Skeptical stance: “I don’t trust independence assumptions”

Mental model: “Compare each unit to itself over time”

Think Random Effects when:

  1. Exogenous unit effects: “Unit-level differences are unrelated to my regressors”
  2. Experimental setting: “Treatment randomly assigned”
  3. Efficiency matters: “I want to use all variation in the data”
  4. Time-invariant variables: “I need to estimate effects of gender, race, geography”

Mental model: “Pool all information, weighting by reliability”
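"Weighting by reliability" can be made concrete. In a random-intercept model with known variance components, the predicted unit effect is the unit's average residual multiplied by a weight \(\lambda = T\sigma_\mu^2 / (T\sigma_\mu^2 + \sigma_u^2)\). The symbol \(\lambda\), the function name, and the variance values below are illustrative assumptions, not notation from the worksheet.

```python
def reliability(T, sigma_mu2, sigma_u2):
    """Weight placed on a unit's own average residual when predicting mu_i."""
    return T * sigma_mu2 / (T * sigma_mu2 + sigma_u2)

# Hypothetical variance components: equal between-unit and noise variance
for T in (1, 5, 50):
    lam = reliability(T, sigma_mu2=1.0, sigma_u2=1.0)
    print(f"T = {T:>2}: weight on the unit's own data = {lam:.3f}")
# T =  1: weight on the unit's own data = 0.500
# T =  5: weight on the unit's own data = 0.833
# T = 50: weight on the unit's own data = 0.980
```

Units observed for only a few periods are pulled strongly toward the overall mean. Units observed for many periods are mostly left alone.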


The “Random” in Random Effects

The word “random” refers to how the unit effect is modeled, not to how the data were sampled:

  • Random Effects: Treats the unit effect (μᵢ in Model B) as a random variable drawn from a distribution, uncorrelated with the regressors
  • Fixed Effects: Treats the unit effect (αᵢ in Model A) as fixed parameters to be estimated

Both can use the same data! The difference is the assumption about these unit effects.


The Hausman Test Logic

H₀: RE and FE estimates are both consistent (RE assumptions hold)

H₁: Only FE is consistent (unobserved effects correlated with regressors)

If RE assumptions are true:
- Both RE and FE should give similar estimates
- RE is more efficient (smaller standard errors)

If RE assumptions are violated:
- RE is biased and inconsistent
- FE remains consistent
- Estimates will differ significantly
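For a single coefficient, the test statistic takes the form \(H = (\hat\beta_{FE} - \hat\beta_{RE})^2 / [\text{Var}(\hat\beta_{FE}) - \text{Var}(\hat\beta_{RE})]\), which is compared with a chi-square distribution with one degree of freedom. The sketch below covers only this scalar case. The function name and the input numbers are made up for illustration and do not come from any real study.

```python
import math

def hausman_scalar(b_fe, b_re, var_fe, var_re):
    """Hausman statistic and p-value for a single coefficient.

    H = (b_FE - b_RE)^2 / (Var(b_FE) - Var(b_RE)), compared with chi-square(1).
    """
    diff_var = var_fe - var_re
    if diff_var <= 0:
        raise ValueError("Var(b_FE) - Var(b_RE) must be positive to form the statistic")
    h = (b_fe - b_re) ** 2 / diff_var
    # Survival function of chi-square(1): P(chi2 > h) = erfc(sqrt(h / 2))
    p_value = math.erfc(math.sqrt(h / 2.0))
    return h, p_value

# Hypothetical estimates and variances, for illustration only
h, p = hausman_scalar(b_fe=0.60, b_re=0.75, var_fe=0.004, var_re=0.001)
print(f"H = {h:.2f}, p = {p:.4f}")
```

A small p-value rejects H₀ and points toward FE. The denominator relies on RE being efficient under H₀. In finite samples the estimated variance difference can come out negative, which is why the function refuses to proceed in that case.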

Quick Mental Checks

Use FE when you think:

“The unobserved stuff that makes units different is probably related to my X variables”

Use RE when you think:

“The unobserved differences between units are just random noise, unrelated to my X variables”

The key question:

“Is there something unobserved about these units that affects both X and Y?”

  • YES → Fixed Effects
  • NO → Random Effects (if more efficient)


Final Insight

It’s not about the data you have, it’s about the assumption you’re willing to make:

  • Same exact dataset can be analyzed with FE or RE
  • Choice depends on your beliefs about unobserved heterogeneity
  • When in doubt → Use FE (it’s more robust to correlation)
  • RE is a bonus → When its assumptions hold, you get more efficient estimates

Remember: In economics, we usually worry about selection and unobserved heterogeneity, which is why FE is the “workhorse” model!

Economics Examples: Fixed, Random, and Mixed Effects Models

Fixed Effects (FE) Examples

1. Labor Market Returns to Job Training

Research Question: What is the effect of job training programs on individual wages?

\[\text{Wage}_{it} = \alpha_i + \beta \cdot \text{Training}_{it} + \gamma \cdot \text{Experience}_{it} + u_{it}\]

  • Why FE? Unobserved ability (\(\alpha_i\)) is likely correlated with participation in training programs
  • What FE controls for: Time-invariant individual characteristics (innate ability, motivation, family background)
  • Key insight: Identifies within-person variation - comparing the same person before/after training
  • Real study example: Card & Sullivan (1988) on subsidized training programs
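The before/after logic is easiest to see with two periods, where the within estimator coincides with a regression in first differences. The sketch below is a simplified version of the wage equation: it drops Experience, has no period effect, and uses a made-up training effect of 1.5 with selection on ability built in by assumption.

```python
import numpy as np

rng = np.random.default_rng(7)
N, effect = 300, 1.5                           # hypothetical workers and training effect
ability = rng.normal(size=N)                   # unobserved, constant over time

# Nobody is trained in period 0; higher-ability workers are more likely to train by period 1
trained = (ability + rng.normal(size=N) > 0).astype(float)
training = np.column_stack([np.zeros(N), trained])            # shape (N, 2)
wage = ability[:, None] + effect * training + rng.normal(scale=0.3, size=(N, 2))

# First differences: ability cancels because it does not change over time
d_wage = wage[:, 1] - wage[:, 0]
d_train = training[:, 1] - training[:, 0]
beta_fd = (d_train @ d_wage) / (d_train @ d_train)

# Within (FE) estimator on the same data
x_w = training - training.mean(axis=1, keepdims=True)
y_w = wage - wage.mean(axis=1, keepdims=True)
beta_fe = (x_w * y_w).sum() / (x_w * x_w).sum()

# Naive comparison of trained and untrained workers in period 1 only
naive = wage[trained == 1, 1].mean() - wage[trained == 0, 1].mean()

print(f"first difference = {beta_fd:.3f}, within = {beta_fe:.3f}, naive = {naive:.3f}")
```

The first-difference and within estimates are identical with two periods. The naive cross-sectional gap is larger because it also picks up the ability difference between workers who train and workers who do not.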

2. International Trade: Effects of Free Trade Agreements

Research Question: How do FTAs affect bilateral trade flows between countries?

\[\ln(\text{Trade}_{ijt}) = \alpha_{ij} + \beta \cdot \text{FTA}_{ijt} + \gamma \cdot \ln(\text{GDP}_{it} \times \text{GDP}_{jt}) + u_{ijt}\]

  • Why FE? Country-pair fixed effects (\(\alpha_{ij}\)) capture time-invariant factors like distance, language, colonial history
  • What FE controls for: All time-invariant bilateral characteristics that affect trade
  • Key insight: Exploits within country-pair variation when FTA status changes
  • Real study example: Baier & Bergstrand (2007) on FTA effects

3. Public Finance: Corporate Tax Effects on Investment

Research Question: How do corporate tax rates affect firm investment decisions?

\[\text{Investment}_{it} = \alpha_i + \beta \cdot \text{TaxRate}_{it} + \gamma \cdot \text{CashFlow}_{it} + u_{it}\]

  • Why FE? Firm fixed effects (\(\alpha_i\)) control for time-invariant firm characteristics (management quality, industry position)
  • What FE controls for: Unobserved firm heterogeneity that affects both tax planning and investment
  • Key insight: Uses within-firm variation from tax reforms over time
  • Real study example: Giroud & Rauh (2019) on state corporate taxes

Random Effects (RE) Examples

1. Household Consumption Patterns

Research Question: How does household income affect consumption across different categories?

\[\text{Consumption}_{it} = \alpha + \beta \cdot \text{Income}_{it} + \gamma \cdot \text{FamilySize}_{it} + \mu_i + u_{it}\]

  • Why RE? Household preferences (\(\mu_i\)) are assumed to be uncorrelated with income
  • RE advantage: Can include time-invariant regressors (education, urban/rural location)
  • Key assumption: \(\text{Cov}(\mu_i, \text{Income}_{it}) = 0\); random sampling alone does not guarantee this
  • Real study example: Blundell et al. (1994) on UK household expenditure

2. Cross-Country Growth Regressions

Research Question: What factors drive economic growth across countries?

\[\text{GrowthRate}_{it} = \alpha + \beta \cdot \text{Investment}_{it} + \gamma \cdot \text{Education}_{it} + \mu_i + u_{it}\]

  • Why RE? Country-specific growth potential (\(\mu_i\)) treated as random draws from global distribution
  • RE advantage: Preserves cross-country variation; can include geography, institutions
  • Key assumption: Initial conditions uncorrelated with policy variables
  • Real study example: Islam (1995) on convergence in growth models

3. Agricultural Production Functions

Research Question: How do inputs affect crop yields across randomly selected farms?

\[\ln(\text{Yield}_{it}) = \alpha + \beta_1 \ln(\text{Fertilizer}_{it}) + \beta_2 \ln(\text{Labor}_{it}) + \mu_i + u_{it}\]

  • Why RE? Farm-specific productivity (\(\mu_i\)) assumed uncorrelated with input choices in extension programs
  • RE advantage: More efficient estimates when farms are randomly selected for programs
  • Key assumption: No selection bias in input use
  • Real study example: Battese & Coelli (1995) on technical efficiency

Mixed Effects (ME) Examples

1. Education: Student Achievement with School Effects

Research Question: How does class size affect student test scores, accounting for school and student heterogeneity?

\[\text{TestScore}_{ijt} = (\alpha + \mu_j) + (\beta + v_j) \cdot \text{ClassSize}_{jt} + \gamma \cdot \text{StudentChar}_{ijt} + u_{ijt}\]

Where: \(i\) = student, \(j\) = school, \(t\) = time

  • Random intercepts (\(\mu_j\)): School quality varies (resources, location, leadership)
  • Random slopes (\(v_j\)): Class size effects differ by school (some schools handle large classes better)
  • Why ME? Captures both between-school and within-school variation
  • Real study example: Hanushek et al. (2003) on class size effects

2. Health Economics: Patient Outcomes Across Hospitals

Research Question: How do treatment protocols affect patient recovery, varying by hospital?

\[\text{Recovery}_{ijt} = (\alpha + \mu_j) + (\beta + v_j) \cdot \text{Treatment}_{ijt} + \gamma \cdot \text{PatientRisk}_{ijt} + u_{ijt}\]

Where: \(i\) = patient, \(j\) = hospital, \(t\) = time

  • Random intercepts (\(\mu_j\)): Baseline hospital quality differs
  • Random slopes (\(v_j\)): Treatment effectiveness varies by hospital (staff expertise, equipment)
  • Why ME? Accounts for hospital clustering and heterogeneous treatment effects
  • Real study example: Gatsonis et al. (1993) on cardiac catheterization

3. Regional Economics: Firm Productivity Across Cities

Research Question: How do agglomeration economies affect firm productivity differently across cities?

\[\ln(\text{TFP}_{ijt}) = (\alpha + \mu_j) + (\beta + v_j) \cdot \ln(\text{CitySize}_{jt}) + \gamma \cdot \text{FirmChar}_{ijt} + u_{ijt}\]

Where: \(i\) = firm, \(j\) = city, \(t\) = time

  • Random intercepts (\(\mu_j\)): City-specific advantages (infrastructure, institutions)
  • Random slopes (\(v_j\)): Agglomeration benefits vary by city (specialized vs. diverse cities)
  • Why ME? Captures how urban economies function differently across cities
  • Real study example: Henderson (2003) on urban agglomeration effects

Quick Decision Guide

Choose Fixed Effects when:

  • Unobserved heterogeneity is likely correlated with regressors
  • You want causal identification from within-unit variation
  • Time-invariant factors are nuisance parameters
  • Examples: Ability bias, firm quality, country-pair characteristics

Choose Random Effects when:

  • Units can be viewed as draws from a larger population (random sampling alone is not sufficient)
  • Unobserved effects are uncorrelated with regressors
  • You need to include time-invariant variables
  • Examples: Random household surveys, experimental settings

Choose Mixed Effects when:

  • Data has hierarchical/nested structure
  • Both intercepts AND slopes vary across groups
  • You care about group-level heterogeneity
  • Examples: Students in schools, patients in hospitals, firms in regions

Empirical Testing Strategy

  1. Start with Mixed Effects if you have hierarchical data
  2. Test RE vs FE using Hausman test for non-hierarchical panel data
  3. Test for random slopes using likelihood ratio tests in ME models
  4. Consider Mundlak approach as a middle ground (RE with group means of regressors)
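The Mundlak idea in step 4 can be checked numerically: adding the unit mean of the regressor to the regression makes the coefficient on \(X_{it}\) equal to the within estimate. The sketch below uses plain OLS instead of RE/GLS to stay short, and all simulation settings are hypothetical.

```python
import numpy as np

rng = np.random.default_rng(3)
N, T, beta = 100, 4, 1.0                       # hypothetical panel and true slope
unit = np.repeat(np.arange(N), T)
c = rng.normal(size=N)                         # unit effect
x = 0.7 * c[unit] + rng.normal(size=N * T)     # regressor correlated with the unit effect
y = c[unit] + beta * x + rng.normal(size=N * T)

def unit_means(v):
    """Return, for every row, the mean of v within that row's unit."""
    return (np.bincount(unit, weights=v) / np.bincount(unit))[unit]

x_bar = unit_means(x)

# Mundlak regression: y on a constant, x, and the unit mean of x
Z = np.column_stack([np.ones(N * T), x, x_bar])
const, b_x, b_xbar = np.linalg.lstsq(Z, y, rcond=None)[0]

# Within estimator for comparison
x_w, y_w = x - x_bar, y - unit_means(y)
b_within = (x_w @ y_w) / (x_w @ x_w)

print(f"Mundlak b_x = {b_x:.4f}, within = {b_within:.4f}, coefficient on x_bar = {b_xbar:.3f}")
```

The coefficient on the unit mean estimates the gap between the between and within slopes. A value far from zero signals correlation between the unit effect and the regressor, so testing whether it equals zero serves the same purpose as the Hausman test in step 2.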

Modern Extensions

  • Correlated Random Effects (Mundlak): Allows correlation through group means
  • Fixed Effects with Individual Slopes: Recent work on heterogeneous trends
  • Bayesian Mixed Effects: Better handling of small groups and complex hierarchies