In-Class Notes: Panel Data Models

Worksheet: Fixed Effects, Random Effects, or Mixed Effects?

Background

Below are three models describing the same idea:
- We observe repeated data for individuals (i) over time (t)
- We want to study how \(X_{it}\) affects \(Y_{it}\)

Each equation treats the unit-specific effect differently.

Your task: Identify whether each model represents a Fixed Effects, Random Effects, or Mixed Effects model.

Model Specifications

Model A

\[Y_{it} = \alpha_i + \beta X_{it} + u_{it}\]

Model B

\[Y_{it} = \alpha + \beta X_{it} + \mu_i + u_{it}, \quad \mu_i \sim N(0, \sigma_\mu^2)\]

Model C

\[Y_{it} = (\alpha + \mu_i) + (\beta + v_i) X_{it} + u_{it}\]

Where: \[\begin{cases} \mu_i \sim N(0, \sigma_\mu^2)\\ v_i \sim N(0, \sigma_v^2) \end{cases}\]

Questions for Students

Which model allows each individual to have their own intercept (but a common slope)?
Which model assumes that the unobserved effect is random and uncorrelated with \(X_{it}\)?
Which model allows both intercepts and slopes to vary across individuals?
Which model would you expect to be most common in econometrics and why?

Answer Key and Explanations

Model A → Fixed Effects (FE)

\[Y_{it} = \alpha_i + \beta X_{it} + u_{it}\]

Each unit (i) has its own intercept \(\alpha_i\) (fixed constant)
The slope \(\beta\) is the same for all units
Removes time-invariant unobserved heterogeneity, even when it is correlated with \(X_{it}\)
Estimated using within transformation or dummy variables

Econometric insight: Most common for causal inference when unobserved heterogeneity is correlated with regressors.
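
The two estimation routes listed for Model A can be checked against each other numerically. The following is a minimal sketch on simulated data, not a production estimator. The panel dimensions, the true slope of 2.0, and the helper name `group_mean` are assumptions made for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)
n_units, n_periods, beta = 50, 6, 2.0  # hypothetical sizes and true slope

# Simulated panel in which alpha_i is correlated with x by construction.
alpha = rng.normal(size=n_units)
unit = np.repeat(np.arange(n_units), n_periods)
x = alpha[unit] + rng.normal(size=n_units * n_periods)
y = alpha[unit] + beta * x + rng.normal(size=n_units * n_periods)

def group_mean(v, g, n_groups):
    """Return, for each observation, the mean of v within its group."""
    sums = np.bincount(g, weights=v, minlength=n_groups)
    counts = np.bincount(g, minlength=n_groups)
    return (sums / counts)[g]

# Within transformation: subtract unit means, then OLS without an intercept.
x_w = x - group_mean(x, unit, n_units)
y_w = y - group_mean(y, unit, n_units)
beta_within = (x_w @ y_w) / (x_w @ x_w)

# LSDV: regress y on x plus one dummy variable per unit.
dummies = (unit[:, None] == np.arange(n_units)[None, :]).astype(float)
design = np.column_stack([x, dummies])
coef, *_ = np.linalg.lstsq(design, y, rcond=None)
beta_lsdv = coef[0]

print(beta_within, beta_lsdv)
```

The two slopes agree up to floating-point error, which is why the within transformation is usually preferred when the number of units is large: it avoids building one dummy column per unit.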

Model B → Random Effects (RE)

\[Y_{it} = \alpha + \beta X_{it} + \mu_i + u_{it}\]

\(\mu_i\) is a random intercept drawn from a distribution
Assumes \(\text{Cov}(\mu_i, X_{it}) = 0\)
More efficient than FE if assumption holds
Estimated using Generalized Least Squares (GLS)

Econometric insight: Used when time-invariant regressors matter and unobserved effects are uncorrelated with regressors.
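
For a balanced panel, the GLS estimator of Model B can be computed as OLS on quasi-demeaned data, where each variable has a fraction \(\theta\) of its unit mean subtracted. The sketch below assumes the variance components \(\sigma_\mu^2\) and \(\sigma_u^2\) are known, which is a simplification; in practice they are estimated first (feasible GLS). The data, the parameter values, and the function names are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(1)
n_units, n_periods = 200, 5
alpha, beta = 1.0, 2.0
sigma_mu, sigma_u = 1.0, 1.0  # assumed known here; estimated in practice

unit = np.repeat(np.arange(n_units), n_periods)
mu = rng.normal(scale=sigma_mu, size=n_units)  # drawn independently of x (RE assumption)
x = rng.normal(size=n_units * n_periods)
y = alpha + beta * x + mu[unit] + rng.normal(scale=sigma_u, size=x.size)

def unit_mean(v):
    """Unit mean of v, repeated for every period (balanced, sorted panel)."""
    return v.reshape(n_units, n_periods).mean(axis=1)[unit]

def quasi_demeaned_ols(theta):
    """OLS of (y - theta*ybar_i) on (1 - theta) and (x - theta*xbar_i)."""
    y_t = y - theta * unit_mean(y)
    x_t = x - theta * unit_mean(x)
    const = np.full_like(x, 1.0 - theta)
    coef, *_ = np.linalg.lstsq(np.column_stack([const, x_t]), y_t, rcond=None)
    return coef  # [intercept, slope]

# GLS weight for a balanced panel with T = n_periods.
theta = 1.0 - np.sqrt(sigma_u**2 / (sigma_u**2 + n_periods * sigma_mu**2))
alpha_re, beta_re = quasi_demeaned_ols(theta)

print(theta, alpha_re, beta_re)
```

Setting `theta = 0` reproduces pooled OLS, and as `theta` approaches 1 the slope approaches the within (FE) estimate, so RE sits between the two.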

Model C → Mixed Effects (ME)

\[Y_{it} = (\alpha + \mu_i) + (\beta + v_i) X_{it} + u_{it}\]

Both intercepts \((\alpha + \mu_i)\) and slopes \((\beta + v_i)\) vary across units
Random deviations \(\mu_i, v_i\) capture group-specific patterns
Used in hierarchical or multilevel settings (e.g., students within schools)
Estimated by Maximum Likelihood / REML

Econometric insight: Common in applied micro (education, health, labor) when relationships differ by group.
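
To see what Model C implies for data, it can help to simulate it. The sketch below draws unit-specific intercepts and slopes and then fits a separate OLS line to each unit. This is not ML/REML estimation; unit-by-unit OLS is only a crude way to make the slope heterogeneity visible, and a mixed-model estimator would shrink these unit slopes toward the common \(\beta\). All parameter values are assumptions.

```python
import numpy as np

rng = np.random.default_rng(2)
n_units, n_periods = 300, 20
alpha, beta = 1.0, 2.0
sigma_mu, sigma_v, sigma_u = 1.0, 0.5, 1.0  # hypothetical standard deviations

mu = rng.normal(scale=sigma_mu, size=n_units)  # random intercept deviations
v = rng.normal(scale=sigma_v, size=n_units)    # random slope deviations
x = rng.normal(size=(n_units, n_periods))
u = rng.normal(scale=sigma_u, size=(n_units, n_periods))
y = (alpha + mu)[:, None] + (beta + v)[:, None] * x + u

# Unit-by-unit OLS slope: within-unit cov(x, y) divided by within-unit var(x).
x_c = x - x.mean(axis=1, keepdims=True)
y_c = y - y.mean(axis=1, keepdims=True)
slopes = (x_c * y_c).sum(axis=1) / (x_c ** 2).sum(axis=1)

print(slopes.mean(), slopes.std(ddof=1))
```

The average of the unit slopes is close to \(\beta\), while their spread exceeds \(\sigma_v\) because each unit slope also carries sampling noise.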

Discussion Prompts

Which model would be most appropriate if you believe unobserved ability is correlated with education?
- Answer: Fixed Effects, as it can handle correlation between unobserved heterogeneity and regressors
Which model best captures the idea that different firms respond differently to market changes?
- Answer: Mixed Effects, as it allows slopes to vary across units
How would you test whether Random Effects is valid?
- Answer: Hausman test compares FE and RE estimates; if they differ significantly, RE assumption is violated

Quick Summary Table

Model	Intercept	Slope	Key Assumption	Estimation Method
Fixed Effects	Unit-specific (\(\alpha_i\))	Common (\(\beta\))	Correlation with \(X_{it}\) allowed	Within transformation, LSDV
Random Effects	Random (\(\alpha + \mu_i\))	Common (\(\beta\))	\(\text{Cov}(\mu_i, X_{it}) = 0\)	GLS
Mixed Effects	Random (\(\alpha + \mu_i\))	Random (\(\beta + v_i\))	Hierarchical structure	ML/REML

Key Takeaways

Fixed Effects is the workhorse of causal inference in econometrics
Random Effects is more efficient but requires stronger assumptions
Mixed Effects is flexible for hierarchical data structures
Choice depends on:
- Nature of unobserved heterogeneity
- Research question (causal vs. predictive)
- Data structure (panel vs. hierarchical)

How to Think About Random Effects vs Fixed Effects

The Common Misconception

NOT quite right:
- “Random Effects = randomly sampled observations”
- “Fixed Effects = all observations from population”

The real distinction is about:
- How unobserved heterogeneity relates to your regressors
- What assumptions you’re willing to make
- What type of variation identifies your effect

The Core Conceptual Difference

Random Effects Mindset

“The unobserved differences between units are random draws from a distribution, independent of the explanatory variables”

Example: 1000 households randomly surveyed
- Each household has unobserved preferences (μᵢ)
- These preferences are like random "lottery tickets" from a distribution
- Key: Household income doesn't determine these random preferences
- We can use BOTH within and between household variation

Fixed Effects Mindset

“The unobserved differences between units are fixed parameters that may be correlated with explanatory variables”

Example: Same 1000 households
- Each household has unobserved ability/preferences (αᵢ)
- These might be correlated with income (ability → higher income)
- Key: We DON'T assume independence
- We use ONLY within-household variation over time

A Concrete Example: Studying Income → Consumption

Dataset: 500 households observed for 5 years

Scenario A: Use Random Effects

Your belief: Unobserved household traits are unrelated to income
- Unobserved frugality (μᵢ) is randomly distributed
- Frugal and spendthrift households equally likely to be rich or poor
- Income doesn't determine baseline consumption preferences

Model: Consumptionᵢₜ = α + β·Incomeᵢₜ + μᵢ + uᵢₜ
       where E[μᵢ|Incomeᵢₜ] = 0

Scenario B: Use Fixed Effects

Your belief: Unobserved traits correlate with income
- More educated households have both:
  * Higher income (education → better jobs)
  * Different consumption patterns (education → preferences)
- Can't assume independence

Model: Consumptionᵢₜ = αᵢ + β·Incomeᵢₜ + uᵢₜ
       where αᵢ can be correlated with Incomeᵢₜ
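
The difference between the two scenarios can be illustrated with a small simulation. This is a sketch under stated assumptions: the true income effect of 0.6 is made up, and pooled OLS stands in for RE because RE mixes within and between variation and so inherits a bias in the same direction when the unit effect is correlated with income.

```python
import numpy as np

rng = np.random.default_rng(3)
n_units, n_periods, beta = 500, 5, 0.6  # 500 households, 5 years; beta is hypothetical

def simulate(correlated):
    """Return (pooled OLS slope, within slope) for one simulated panel."""
    trait = rng.normal(size=(n_units, 1))            # unobserved household effect
    noise = rng.normal(size=(n_units, n_periods))
    income = noise + (trait if correlated else 0.0)  # Scenario B vs Scenario A
    consumption = trait + beta * income + rng.normal(size=(n_units, n_periods))

    # Pooled OLS slope: uses both between- and within-household variation.
    xc = income - income.mean()
    yc = consumption - consumption.mean()
    pooled = (xc * yc).sum() / (xc ** 2).sum()

    # Within slope: uses only deviations from each household's own mean.
    xw = income - income.mean(axis=1, keepdims=True)
    yw = consumption - consumption.mean(axis=1, keepdims=True)
    within = (xw * yw).sum() / (xw ** 2).sum()
    return pooled, within

pooled_a, within_a = simulate(correlated=False)
pooled_b, within_b = simulate(correlated=True)
print(pooled_a, within_a, pooled_b, within_b)
```

In Scenario A both slopes are close to the true value. In Scenario B the pooled slope is pushed upward by the household effect, while the within slope stays close to the truth.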

It’s NOT About Sample Size or Selection!

Both RE and FE can use:

The exact same dataset
All observations or a subset
Randomly selected or complete population data

Example: Company Employee Data

Full census of 10,000 employees over 10 years:

Could use RE if:

Studying effect of training on productivity
You believe innate ability is uncorrelated with training participation
(Maybe training is randomly assigned)

Should use FE if:

You believe high-ability workers select into training
Ability affects both training participation AND productivity
You want to control for this selection

Practical Thinking Guide

Think Fixed Effects when:

Selection concerns: “Better units might select into treatment”
Omitted variables: “Unobserved quality affects both X and Y”
Causal focus: “I want within-unit variation only”
Skeptical stance: “I don’t trust independence assumptions”

Mental model: “Compare each unit to itself over time”

Think Random Effects when:

Plausibly exogenous unit effects: “Unobserved unit differences are unrelated to X” (random sampling alone does not guarantee this)
Experimental setting: “Treatment randomly assigned”
Efficiency matters: “I want to use all variation in the data”
Time-invariant variables: “I need to estimate effects of gender, race, geography”

Mental model: “Pool all information, weighting by reliability”

The “Random” in Random Effects

The word “random” refers to the error structure, not the sampling:

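Random Effects: Treats the unit effect (μᵢ) as a random variable drawn from a distribution, uncorrelated with the regressors
Fixed Effects: Treats the unit effect (αᵢ) as a fixed parameter to be estimated, with no restriction on its correlation with the regressors

Both can use the same data! The difference is the assumption made about the unit effect.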

The Hausman Test Logic

H₀: RE and FE estimates are both consistent (RE assumptions hold)

H₁: Only FE is consistent (unobserved effects correlated with regressors)

If RE assumptions are true:
- Both RE and FE should give similar estimates
- RE is more efficient (smaller standard errors)

If RE assumptions are violated:
- RE is biased and inconsistent
- FE remains consistent
- Estimates will differ significantly
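
This comparison is usually formalized with the Hausman statistic. As a sketch, let \(\hat\beta_{FE}\) and \(\hat\beta_{RE}\) be the two estimates of the coefficients on the time-varying regressors and let \(k\) be the number of those coefficients (this notation is introduced here for illustration):

\[H = (\hat\beta_{FE} - \hat\beta_{RE})' \left[\widehat{\text{Var}}(\hat\beta_{FE}) - \widehat{\text{Var}}(\hat\beta_{RE})\right]^{-1} (\hat\beta_{FE} - \hat\beta_{RE}) \;\sim\; \chi^2_k \quad \text{under } H_0\]

A large value of \(H\) relative to the \(\chi^2_k\) critical value is evidence against H₀, and therefore against RE. In finite samples the estimated variance difference is not guaranteed to be positive definite, in which case the statistic is unreliable.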

Quick Mental Checks

Use FE when you think:

“The unobserved stuff that makes units different is probably related to my X variables”

Use RE when you think:

“The unobserved differences between units are just random noise, unrelated to my X variables”

The key question:

“Is there something unobserved about these units that affects both X and Y?”
- YES → Fixed Effects
- NO → Random Effects (if more efficient)

Final Insight

It’s not about the data you have, it’s about the assumption you’re willing to make:

Same exact dataset can be analyzed with FE or RE
Choice depends on your beliefs about unobserved heterogeneity
When in doubt → Use FE (it’s more robust to correlation)
RE is a bonus → When its assumptions hold, you get more efficient estimates

Remember: In economics, we usually worry about selection and unobserved heterogeneity, which is why FE is the “workhorse” model!

Economics Examples: Fixed, Random, and Mixed Effects Models

Fixed Effects (FE) Examples

1. Labor Market Returns to Job Training

Research Question: What is the effect of job training programs on individual wages?

\[\text{Wage}_{it} = \alpha_i + \beta \cdot \text{Training}_{it} + \gamma \cdot \text{Experience}_{it} + u_{it}\]

Why FE? Unobserved ability (\(\alpha_i\)) is likely correlated with participation in training programs
What FE controls for: Time-invariant individual characteristics (innate ability, motivation, family background)
Key insight: Identifies within-person variation - comparing the same person before/after training
Real study example: Card & Sullivan (1988) on subsidized training programs and employment

2. International Trade: Effects of Free Trade Agreements

Research Question: How do FTAs affect bilateral trade flows between countries?

\[\ln(\text{Trade}_{ijt}) = \alpha_{ij} + \beta \cdot \text{FTA}_{ijt} + \gamma \cdot \ln(\text{GDP}_{it} \times \text{GDP}_{jt}) + u_{ijt}\]

Why FE? Country-pair fixed effects (\(\alpha_{ij}\)) capture time-invariant factors like distance, language, colonial history
What FE controls for: All time-invariant bilateral characteristics that affect trade
Key insight: Exploits within country-pair variation when FTA status changes
Real study example: Baier & Bergstrand (2007) on FTA effects

3. Public Finance: Corporate Tax Effects on Investment

Research Question: How do corporate tax rates affect firm investment decisions?

\[\text{Investment}_{it} = \alpha_i + \beta \cdot \text{TaxRate}_{it} + \gamma \cdot \text{CashFlow}_{it} + u_{it}\]

Why FE? Firm fixed effects (\(\alpha_i\)) control for time-invariant firm characteristics (management quality, industry position)
What FE controls for: Unobserved firm heterogeneity that affects both tax planning and investment
Key insight: Uses within-firm variation from tax reforms over time
Real study example: Giroud & Rauh (2019) on state corporate taxes

Random Effects (RE) Examples

1. Household Consumption Patterns

Research Question: How does household income affect consumption across different categories?

\[\text{Consumption}_{it} = \alpha + \beta \cdot \text{Income}_{it} + \gamma \cdot \text{FamilySize}_{it} + \mu_i + u_{it}\]

Why RE? Household preferences (\(\mu_i\)) are assumed to be uncorrelated with income
RE advantage: Can include time-invariant regressors (education, urban/rural location)
Key assumption: \(\text{Cov}(\mu_i, \text{Income}_{it}) = 0\); random sampling alone does not ensure this
Real study example: Blundell et al. (1994) on UK household expenditure

2. Cross-Country Growth Regressions

Research Question: What factors drive economic growth across countries?

\[\text{GrowthRate}_{it} = \alpha + \beta \cdot \text{Investment}_{it} + \gamma \cdot \text{Education}_{it} + \mu_i + u_{it}\]

Why RE? Country-specific growth potential (\(\mu_i\)) treated as random draws from global distribution
RE advantage: Preserves cross-country variation; can include geography, institutions
Key assumption: Initial conditions uncorrelated with policy variables
Real study example: Islam (1995) on convergence in growth models

3. Agricultural Production Functions

Research Question: How do inputs affect crop yields across randomly selected farms?

\[\ln(\text{Yield}_{it}) = \alpha + \beta_1 \ln(\text{Fertilizer}_{it}) + \beta_2 \ln(\text{Labor}_{it}) + \mu_i + u_{it}\]

Why RE? Farm-specific productivity (\(\mu_i\)) assumed uncorrelated with input choices in extension programs
RE advantage: More efficient estimates when farms are randomly selected for programs
Key assumption: No selection bias in input use
Real study example: Battese & Coelli (1995) on technical efficiency

Mixed Effects (ME) Examples

1. Education: Student Achievement with School Effects

Research Question: How does class size affect student test scores, accounting for school and student heterogeneity?

\[\text{TestScore}_{ijt} = (\alpha + \mu_j) + (\beta + v_j) \cdot \text{ClassSize}_{jt} + \gamma \cdot \text{StudentChar}_{ijt} + u_{ijt}\]

Where: \(i\) = student, \(j\) = school, \(t\) = time

Random intercepts (\(\mu_j\)): School quality varies (resources, location, leadership)
Random slopes (\(v_j\)): Class size effects differ by school (some schools handle large classes better)
Why ME? Captures both between-school and within-school variation
Real study example: Hanushek et al. (2003) on class size effects

2. Health Economics: Patient Outcomes Across Hospitals

Research Question: How do treatment protocols affect patient recovery, varying by hospital?

\[\text{Recovery}_{ijt} = (\alpha + \mu_j) + (\beta + v_j) \cdot \text{Treatment}_{ijt} + \gamma \cdot \text{PatientRisk}_{ijt} + u_{ijt}\]

Where: \(i\) = patient, \(j\) = hospital, \(t\) = time

Random intercepts (\(\mu_j\)): Baseline hospital quality differs
Random slopes (\(v_j\)): Treatment effectiveness varies by hospital (staff expertise, equipment)
Why ME? Accounts for hospital clustering and heterogeneous treatment effects
Real study example: Gatsonis et al. (1993) on cardiac catheterization

3. Regional Economics: Firm Productivity Across Cities

Research Question: How do agglomeration economies affect firm productivity differently across cities?

\[\ln(\text{TFP}_{ijt}) = (\alpha + \mu_j) + (\beta + v_j) \cdot \ln(\text{CitySize}_{jt}) + \gamma \cdot \text{FirmChar}_{ijt} + u_{ijt}\]

Where: \(i\) = firm, \(j\) = city, \(t\) = time

Random intercepts (\(\mu_j\)): City-specific advantages (infrastructure, institutions)
Random slopes (\(v_j\)): Agglomeration benefits vary by city (specialized vs. diverse cities)
Why ME? Captures how urban economies function differently across cities
Real study example: Henderson (2003) on urban agglomeration effects

Quick Decision Guide

Choose Fixed Effects when:

Unobserved heterogeneity is likely correlated with regressors
You want causal identification from within-unit variation
Time-invariant factors are nuisance parameters
Examples: Ability bias, firm quality, country-pair characteristics

Choose Random Effects when:

Units can be viewed as draws from a larger population (not sufficient on its own)
Unobserved effects are uncorrelated with regressors
You need to include time-invariant variables
Examples: Random household surveys, experimental settings

Choose Mixed Effects when:

Data has hierarchical/nested structure
Both intercepts AND slopes vary across groups
You care about group-level heterogeneity
Examples: Students in schools, patients in hospitals, firms in regions

Empirical Testing Strategy

Start with Mixed Effects if you have hierarchical data
Test RE vs FE using Hausman test for non-hierarchical panel data
Test for random slopes using likelihood ratio tests in ME models
Consider Mundlak approach as a middle ground (RE with group means of regressors)
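
The Mundlak idea can be sketched in a few lines: add each unit's mean of the regressor to the regression. The example below uses pooled OLS instead of GLS to keep the code short, which is a simplification; the coefficient on \(X_{it}\) then coincides with the within estimator, and a nonzero coefficient on the unit mean signals that the unit effect is correlated with the regressor. The simulated data and the variable names are assumptions.

```python
import numpy as np

rng = np.random.default_rng(4)
n_units, n_periods, beta = 100, 4, 1.5  # hypothetical sizes and true slope

trait = rng.normal(size=(n_units, 1))
x = trait + rng.normal(size=(n_units, n_periods))  # x correlated with the unit effect
y = trait + beta * x + rng.normal(size=(n_units, n_periods))

# Unit mean of x, repeated for every period of that unit.
x_bar = np.broadcast_to(x.mean(axis=1, keepdims=True), x.shape)

# Mundlak regression: y on a constant, x, and the unit mean of x.
design = np.column_stack([np.ones(x.size), x.ravel(), x_bar.ravel()])
coef, *_ = np.linalg.lstsq(design, y.ravel(), rcond=None)
beta_mundlak, gamma_mean = coef[1], coef[2]

# Within (FE) estimator for comparison.
xw = x - x.mean(axis=1, keepdims=True)
yw = y - y.mean(axis=1, keepdims=True)
beta_within = (xw * yw).sum() / (xw ** 2).sum()

print(beta_mundlak, beta_within, gamma_mean)
```

Testing whether the coefficient on the unit mean is zero gives a regression-based alternative to the Hausman comparison.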

Modern Extensions

Correlated Random Effects (Mundlak): Allows correlation through group means
Fixed Effects with Individual Slopes: Recent work on heterogeneous trends
Bayesian Mixed Effects: Better handling of small groups and complex hierarchies