In-Class Notes: Panel Data Models
Worksheet: Fixed Effects, Random Effects, or Mixed Effects?
Background
Below are three models describing the same idea:
- We observe repeated data for individuals (i) over time (t)
- We want to study how \(X_{it}\) affects \(Y_{it}\)
Each equation treats the unit-specific effect differently.
Your task: Identify whether each model represents a Fixed Effects, Random Effects, or Mixed Effects model.
Model Specifications
Model A
\[Y_{it} = \alpha_i + \beta X_{it} + u_{it}\]
Model B
\[Y_{it} = \alpha + \beta X_{it} + \mu_i + u_{it}, \quad \mu_i \sim N(0, \sigma_\mu^2)\]
Model C
\[Y_{it} = (\alpha + \mu_i) + (\beta + v_i) X_{it} + u_{it}\]
Where: \[\begin{cases} \mu_i \sim N(0, \sigma_\mu^2)\\ v_i \sim N(0, \sigma_v^2) \end{cases}\]
Questions for Students
Which model allows each individual to have their own intercept (but a common slope)?
Which model assumes that the unobserved effect is random and uncorrelated with \(X_{it}\)?
Which model allows both intercepts and slopes to vary across individuals?
Which model would you expect to be most common in econometrics and why?
Answer Key and Explanations
Model A → Fixed Effects (FE)
\[Y_{it} = \alpha_i + \beta X_{it} + u_{it}\]
- Each unit (i) has its own intercept \(\alpha_i\) (fixed constant)
- The slope \(\beta\) is the same for all units
- Removes time-invariant unobserved heterogeneity, even when it is correlated with \(X_{it}\)
- Estimated using within transformation or dummy variables
Econometric insight: Most common for causal inference when unobserved heterogeneity is correlated with regressors.
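A minimal sketch of the within transformation, using a made-up three-unit panel (the numbers and function names are illustrative assumptions, not taken from any study). The data are constructed so that \(Y_{it} = \alpha_i + 2 X_{it}\) exactly, with \(\alpha_i\) falling as the level of \(X_{it}\) rises, so ignoring the unit effect gets even the sign wrong.

```python
# Toy balanced panel: unit -> list of (x, y) over time.
# Assumption: y = alpha_i + 2*x with alpha = 10, 0, -5 (illustrative values).
panel = {
    "A": [(1, 12), (2, 14), (3, 16)],
    "B": [(2, 4), (3, 6), (4, 8)],
    "C": [(4, 3), (5, 5), (6, 7)],
}


def slope(xs, ys):
    """OLS slope of y on x with an intercept."""
    n = len(xs)
    x_bar, y_bar = sum(xs) / n, sum(ys) / n
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    sxx = sum((x - x_bar) ** 2 for x in xs)
    return sxy / sxx


def pooled_slope(panel):
    """Ignore the unit structure and regress y on x."""
    xs = [x for obs in panel.values() for x, _ in obs]
    ys = [y for obs in panel.values() for _, y in obs]
    return slope(xs, ys)


def within_slope(panel):
    """Subtract each unit's own means, then regress; alpha_i drops out."""
    xs, ys = [], []
    for obs in panel.values():
        x_bar = sum(x for x, _ in obs) / len(obs)
        y_bar = sum(y for _, y in obs) / len(obs)
        xs += [x - x_bar for x, _ in obs]
        ys += [y - y_bar for _, y in obs]
    return slope(xs, ys)


print(round(pooled_slope(panel), 2))  # -1.25: wrong sign
print(round(within_slope(panel), 2))  # 2.0: the slope used to build the data
```

Including a dummy for each unit (LSDV) would give the same slope as the demeaned regression; the sketch shows only the demeaning route.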
Model B → Random Effects (RE)
\[Y_{it} = \alpha + \beta X_{it} + \mu_i + u_{it}\]
- \(\mu_i\) is a random intercept drawn from a distribution
- Assumes \(\text{Cov}(\mu_i, X_{it}) = 0\)
- More efficient than FE if assumption holds
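- Estimated using feasible Generalized Least Squares (FGLS), since the variance components \(\sigma_\mu^2\) and \(\sigma_u^2\) must themselves be estimated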
Econometric insight: Used when time-invariant regressors matter and unobserved effects are uncorrelated with regressors.
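One way to see what GLS does in Model B: in a balanced panel with \(T\) periods, RE is equivalent to OLS on quasi-demeaned data, \(Y_{it} - \theta \bar{Y}_i\) and \(X_{it} - \theta \bar{X}_i\), with \(\theta = 1 - \sqrt{\sigma_u^2 / (\sigma_u^2 + T\sigma_\mu^2)}\). The sketch below assumes the variance components are known and uses a made-up panel that deliberately violates the RE assumption, so the two limiting cases are far apart. Function names are hypothetical.

```python
import math

# Toy balanced panel (illustrative numbers): unit -> list of (x, y).
panel = {
    "A": [(1, 12), (2, 14), (3, 16)],
    "B": [(2, 4), (3, 6), (4, 8)],
    "C": [(4, 3), (5, 5), (6, 7)],
}


def re_theta(sigma_u2, sigma_mu2, periods):
    """Quasi-demeaning weight for a balanced panel with known variances."""
    return 1 - math.sqrt(sigma_u2 / (sigma_u2 + periods * sigma_mu2))


def quasi_demeaned_slope(panel, theta):
    """OLS slope (with intercept) after subtracting theta times each unit mean."""
    xs, ys = [], []
    for obs in panel.values():
        x_bar = sum(x for x, _ in obs) / len(obs)
        y_bar = sum(y for _, y in obs) / len(obs)
        xs += [x - theta * x_bar for x, _ in obs]
        ys += [y - theta * y_bar for _, y in obs]
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    return sxy / sxx


print(re_theta(1.0, 0.0, 3))  # 0.0: no unit effect, RE collapses to pooled OLS
print(re_theta(1.0, 5.0, 3))  # 0.75: most of each unit mean is removed
for theta in (0.0, 0.75, 1.0):
    print(theta, round(quasi_demeaned_slope(panel, theta), 3))
```

With \(\theta = 0\) the slope is the pooled OLS value (-1.25 here), with \(\theta = 1\) it is the within value (2.0), and intermediate values of \(\theta\) land in between (1.409 at \(\theta = 0.75\)). RE is therefore a weighted compromise between between-unit and within-unit variation.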
Model C → Mixed Effects (ME)
\[Y_{it} = (\alpha + \mu_i) + (\beta + v_i) X_{it} + u_{it}\]
- Both intercepts \((\alpha + \mu_i)\) and slopes \((\beta + v_i)\) vary across units
- Random deviations \(\mu_i, v_i\) capture group-specific patterns
- Used in hierarchical or multilevel settings (e.g., students within schools)
- Estimated by Maximum Likelihood / REML
Econometric insight: Common in applied micro (education, health, labor) when relationships differ by group.
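To see why Model C is usually fit by ML/REML rather than by a simple demeaning step, it can help to collect the random terms into one composite error. The sketch below assumes, purely for illustration, that \(\mu_i\), \(v_i\), and \(u_{it}\) are mutually independent with \(\text{Var}(u_{it}) = \sigma_u^2\); the model above does not specify the covariance between \(\mu_i\) and \(v_i\).

\[Y_{it} = \alpha + \beta X_{it} + \varepsilon_{it}, \qquad \varepsilon_{it} = \mu_i + v_i X_{it} + u_{it}\]
\[\text{Var}(\varepsilon_{it} \mid X_{it}) = \sigma_\mu^2 + \sigma_v^2 X_{it}^2 + \sigma_u^2\]
\[\text{Cov}(\varepsilon_{it}, \varepsilon_{is} \mid X_{it}, X_{is}) = \sigma_\mu^2 + \sigma_v^2 X_{it} X_{is}, \quad t \neq s\]

Under these assumptions the error variance and within-unit correlation both depend on \(X_{it}\), so the variance parameters have to be estimated jointly with \(\alpha\) and \(\beta\).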
Discussion Prompts
- Which model would be most appropriate if you believe unobserved ability is correlated with education?
  - Answer: Fixed Effects, as it can handle correlation between unobserved heterogeneity and regressors
- Which model best captures the idea that different firms respond differently to market changes?
  - Answer: Mixed Effects, as it allows slopes to vary across units
- How would you test whether Random Effects is valid?
  - Answer: Hausman test compares FE and RE estimates; if they differ significantly, RE assumption is violated
Quick Summary Table
| Model | Intercept | Slope | Key Assumption | Estimation Method |
|---|---|---|---|---|
| Fixed Effects | Unit-specific (\(\alpha_i\)) | Common (\(\beta\)) | Correlation with \(X_{it}\) allowed | Within transformation, LSDV |
| Random Effects | Random (\(\alpha + \mu_i\)) | Common (\(\beta\)) | \(\text{Cov}(\mu_i, X_{it}) = 0\) | GLS |
| Mixed Effects | Random (\(\alpha + \mu_i\)) | Random (\(\beta + v_i\)) | Hierarchical structure; \(\mu_i, v_i\) uncorrelated with \(X_{it}\) | ML/REML |
Key Takeaways
- Fixed Effects is the workhorse of causal inference in econometrics
- Random Effects is more efficient but requires stronger assumptions
- Mixed Effects is flexible for hierarchical data structures
- Choice depends on:
  - Nature of unobserved heterogeneity
  - Research question (causal vs. predictive)
  - Data structure (panel vs. hierarchical)
How to Think About Random Effects vs Fixed Effects
The Common Misconception
NOT quite right:
- “Random Effects = randomly sampled observations”
- “Fixed Effects = all observations from population”
The real distinction is about:
- How unobserved heterogeneity relates to your regressors
- What assumptions you’re willing to make
- What type of variation identifies your effect
The Core Conceptual Difference
Random Effects Mindset
“The unobserved differences between units are random draws from a distribution, independent of the explanatory variables”
Example: 1000 households randomly surveyed
- Each household has unobserved preferences (μᵢ)
- These preferences are like random "lottery tickets" from a distribution
- Key: These preferences are assumed to be uncorrelated with household income
- We can use BOTH within and between household variation
Fixed Effects Mindset
“The unobserved differences between units are fixed parameters that may be correlated with explanatory variables”
Example: Same 1000 households
- Each household has unobserved ability/preferences (αᵢ)
- These might be correlated with income (ability → higher income)
- Key: We DON'T assume independence
- We use ONLY within-household variation over time
A Concrete Example: Studying Income → Consumption
Dataset: 500 households observed for 5 years
Scenario A: Use Random Effects
Your belief: Unobserved household traits are unrelated to income (random sampling alone does not guarantee this)
- Unobserved frugality (μᵢ) is randomly distributed
- Frugal and spendthrift households equally likely to be rich or poor
- Baseline consumption preferences are unrelated to income
Model: Consumptionᵢₜ = α + β·Incomeᵢₜ + μᵢ + uᵢₜ
where E[μᵢ|Incomeᵢₜ] = 0
Scenario B: Use Fixed Effects
Your belief: Unobserved traits correlate with income
- More educated households have both:
* Higher income (education → better jobs)
* Different consumption patterns (education → preferences)
- Can't assume independence
Model: Consumptionᵢₜ = αᵢ + β·Incomeᵢₜ + uᵢₜ
where αᵢ can be correlated with Incomeᵢₜ
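The two scenarios can be contrasted with a small simulation. This is a sketch under an assumed data-generating process, not a description of any real dataset: the parameter `rho` (hypothetical) controls how strongly income loads on the unobserved household trait, with `rho = 0` standing in for Scenario A and `rho = 1` for Scenario B. Pooled OLS is used as a simple stand-in for estimators that, like RE, also use between-household variation.

```python
import random


def simulate(n_units, n_periods, rho, beta=0.6, seed=0):
    """Simulate consumption = alpha_i + beta * income + noise (assumed DGP)."""
    rng = random.Random(seed)
    panel = []
    for _ in range(n_units):
        alpha = rng.gauss(0, 1)  # unobserved household trait
        obs = []
        for _ in range(n_periods):
            income = rho * alpha + rng.gauss(0, 1)
            consumption = alpha + beta * income + rng.gauss(0, 1)
            obs.append((income, consumption))
        panel.append(obs)
    return panel


def slope(xs, ys):
    """OLS slope of y on x with an intercept."""
    n = len(xs)
    x_bar, y_bar = sum(xs) / n, sum(ys) / n
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    sxx = sum((x - x_bar) ** 2 for x in xs)
    return sxy / sxx


def pooled_slope(panel):
    """Uses both between- and within-household variation."""
    xs = [x for obs in panel for x, _ in obs]
    ys = [y for obs in panel for _, y in obs]
    return slope(xs, ys)


def within_slope(panel):
    """Uses only within-household variation over time."""
    xs, ys = [], []
    for obs in panel:
        x_bar = sum(x for x, _ in obs) / len(obs)
        y_bar = sum(y for _, y in obs) / len(obs)
        xs += [x - x_bar for x, _ in obs]
        ys += [y - y_bar for _, y in obs]
    return slope(xs, ys)


for rho in (0.0, 1.0):
    data = simulate(500, 5, rho)  # 500 households, 5 years
    print(f"rho={rho}: pooled={pooled_slope(data):.2f}, within={within_slope(data):.2f}")
```

Under this assumed process both estimators should land near the true \(\beta = 0.6\) when `rho = 0`. When `rho = 1`, the pooled slope drifts toward \(0.6 + \rho/(\rho^2 + 1) = 1.1\) in large samples, while the within slope stays near 0.6.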
It’s NOT About Sample Size or Selection!
Both RE and FE can use:
- The exact same dataset
- All observations or a subset
- Randomly selected or complete population data
Example: Company Employee Data
Full census of 10,000 employees over 10 years:
Could use RE if:
- Studying effect of training on productivity
- You believe innate ability is uncorrelated with training participation
- (Maybe training is randomly assigned)
Should use FE if:
- You believe high-ability workers select into training
- Ability affects both training participation AND productivity
- You want to control for this selection
Practical Thinking Guide
Think Fixed Effects when:
- Selection concerns: “Better units might select into treatment”
- Omitted variables: “Unobserved quality affects both X and Y”
- Causal focus: “I want within-unit variation only”
- Skeptical stance: “I don’t trust independence assumptions”
Mental model: “Compare each unit to itself over time”
Think Random Effects when:
- Exogenous unit effects: “Unit differences behave like random draws unrelated to X”
- Experimental setting: “Treatment randomly assigned”
- Efficiency matters: “I want to use all variation in the data”
- Time-invariant variables: “I need to estimate effects of gender, race, geography”
Mental model: “Pool all information, weighting by reliability”
The “Random” in Random Effects
The word “random” refers to the error structure, not the sampling:
- Random Effects: Treats αᵢ as a random variable drawn from a distribution
- Fixed Effects: Treats αᵢ as fixed parameters to be estimated
Both can use the same data! The difference is the assumption about these αᵢ.
The Hausman Test Logic
H₀: RE and FE estimates are both consistent (RE assumptions hold)
H₁: Only FE is consistent (unobserved effects correlated with regressors)
If RE assumptions are true:
- Both RE and FE should give similar estimates
- RE is more efficient (smaller standard errors)
If RE assumptions are violated:
- RE is biased and inconsistent
- FE remains consistent
- Estimates will differ significantly
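For a single coefficient, the Hausman logic reduces to one line of arithmetic: \(H = (\hat\beta_{FE} - \hat\beta_{RE})^2 / (\widehat{\text{Var}}(\hat\beta_{FE}) - \widehat{\text{Var}}(\hat\beta_{RE}))\), compared against a chi-square distribution with 1 degree of freedom. The sketch below is a hedged illustration with hypothetical inputs; the function name and the numbers are assumptions, and with several regressors the statistic uses vectors and covariance matrices instead.

```python
import math


def hausman_scalar(b_fe, b_re, var_fe, var_re):
    """Hausman statistic and p-value for a single coefficient.

    Under H0 the statistic is approximately chi-square with 1 degree of
    freedom. It needs var_fe > var_re, which holds in theory when RE is
    efficient but can fail in finite samples.
    """
    diff_var = var_fe - var_re
    if diff_var <= 0:
        raise ValueError("var_fe must exceed var_re for the statistic to be defined")
    stat = (b_fe - b_re) ** 2 / diff_var
    p_value = math.erfc(math.sqrt(stat / 2))  # chi-square(1) upper tail
    return stat, p_value


# Hypothetical estimates, chosen only to illustrate the arithmetic.
stat, p = hausman_scalar(b_fe=2.0, b_re=1.5, var_fe=0.05, var_re=0.03)
print(round(stat, 2), p < 0.05)  # 12.5 True
```

A statistic of 12.5 is well above the 5% critical value of about 3.84, so in this made-up case the RE assumption would be rejected in favor of FE.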
Quick Mental Checks
Use FE when you think:
“The unobserved stuff that makes units different is probably related to my X variables”
Use RE when you think:
“The unobserved differences between units are just random noise, unrelated to my X variables”
The key question:
“Is there something unobserved about these units that affects both X and Y?”
- YES → Fixed Effects
- NO → Random Effects (if more efficient)
Final Insight
It’s not about the data you have, it’s about the assumption you’re willing to make:
- Same exact dataset can be analyzed with FE or RE
- Choice depends on your beliefs about unobserved heterogeneity
- When in doubt → Use FE (it’s more robust to correlation)
- RE is a bonus → When its assumptions hold, you get more efficient estimates
Remember: In economics, we usually worry about selection and unobserved heterogeneity, which is why FE is the “workhorse” model!
Economics Examples: Fixed, Random, and Mixed Effects Models
Fixed Effects (FE) Examples
1. Labor Market Returns to Job Training
Research Question: What is the effect of job training programs on individual wages?
\[\text{Wage}_{it} = \alpha_i + \beta \cdot \text{Training}_{it} + \gamma \cdot \text{Experience}_{it} + u_{it}\]
- Why FE? Unobserved ability (\(\alpha_i\)) is likely correlated with participation in training programs
- What FE controls for: Time-invariant individual characteristics (innate ability, motivation, family background)
- Key insight: Identifies within-person variation - comparing the same person before/after training
- Real study example: Card & Sullivan (1988) on subsidized training programs and employment
2. International Trade: Effects of Free Trade Agreements
Research Question: How do FTAs affect bilateral trade flows between countries?
\[\ln(\text{Trade}_{ijt}) = \alpha_{ij} + \beta \cdot \text{FTA}_{ijt} + \gamma \cdot \ln(\text{GDP}_{it} \times \text{GDP}_{jt}) + u_{ijt}\]
- Why FE? Country-pair fixed effects (\(\alpha_{ij}\)) capture time-invariant factors like distance, language, colonial history
- What FE controls for: All time-invariant bilateral characteristics that affect trade
- Key insight: Exploits within country-pair variation when FTA status changes
- Real study example: Baier & Bergstrand (2007) on FTA effects
3. Public Finance: Corporate Tax Effects on Investment
Research Question: How do corporate tax rates affect firm investment decisions?
\[\text{Investment}_{it} = \alpha_i + \beta \cdot \text{TaxRate}_{it} + \gamma \cdot \text{CashFlow}_{it} + u_{it}\]
- Why FE? Firm fixed effects (\(\alpha_i\)) control for time-invariant firm characteristics (management quality, industry position)
- What FE controls for: Unobserved firm heterogeneity that affects both tax planning and investment
- Key insight: Uses within-firm variation from tax reforms over time
- Real study example: Giroud & Rauh (2019) on state corporate taxes
Random Effects (RE) Examples
1. Household Consumption Patterns
Research Question: How does household income affect consumption across different categories?
\[\text{Consumption}_{it} = \alpha + \beta \cdot \text{Income}_{it} + \gamma \cdot \text{FamilySize}_{it} + \mu_i + u_{it}\]
- Why RE? Household preferences (\(\mu_i\)) are assumed to be uncorrelated with income
- RE advantage: Can include time-invariant regressors (education, urban/rural location)
- Key assumption: \(\text{Cov}(\mu_i, \text{Income}_{it}) = 0\); random sampling of households does not by itself guarantee this
- Real study example: Blundell et al. (1994) on UK household expenditure
2. Cross-Country Growth Regressions
Research Question: What factors drive economic growth across countries?
\[\text{GrowthRate}_{it} = \alpha + \beta \cdot \text{Investment}_{it} + \gamma \cdot \text{Education}_{it} + \mu_i + u_{it}\]
- Why RE? Country-specific growth potential (\(\mu_i\)) treated as random draws from global distribution
- RE advantage: Preserves cross-country variation; can include geography, institutions
- Key assumption: Initial conditions uncorrelated with policy variables
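- Real study example: Islam (1995) on convergence in panel growth models (note that Islam's own estimators allow the country effects to be correlated with the regressors)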
3. Agricultural Production Functions
Research Question: How do inputs affect crop yields across randomly selected farms?
\[\ln(\text{Yield}_{it}) = \alpha + \beta_1 \ln(\text{Fertilizer}_{it}) + \beta_2 \ln(\text{Labor}_{it}) + \mu_i + u_{it}\]
- Why RE? Farm-specific productivity (\(\mu_i\)) assumed uncorrelated with input choices in extension programs
- RE advantage: More efficient estimates when farms are randomly selected for programs
- Key assumption: No selection bias in input use
- Real study example: Battese & Coelli (1995) on technical efficiency
Mixed Effects (ME) Examples
1. Education: Student Achievement with School Effects
Research Question: How does class size affect student test scores, accounting for school and student heterogeneity?
\[\text{TestScore}_{ijt} = (\alpha + \mu_j) + (\beta + v_j) \cdot \text{ClassSize}_{jt} + \gamma \cdot \text{StudentChar}_{ijt} + u_{ijt}\]
Where: \(i\) = student, \(j\) = school, \(t\) = time
- Random intercepts (\(\mu_j\)): School quality varies (resources, location, leadership)
- Random slopes (\(v_j\)): Class size effects differ by school (some schools handle large classes better)
- Why ME? Captures both between-school and within-school variation
- Real study example: Hanushek et al. (2003) on class size effects
2. Health Economics: Patient Outcomes Across Hospitals
Research Question: How do treatment protocols affect patient recovery, varying by hospital?
\[\text{Recovery}_{ijt} = (\alpha + \mu_j) + (\beta + v_j) \cdot \text{Treatment}_{ijt} + \gamma \cdot \text{PatientRisk}_{ijt} + u_{ijt}\]
Where: \(i\) = patient, \(j\) = hospital, \(t\) = time
- Random intercepts (\(\mu_j\)): Baseline hospital quality differs
- Random slopes (\(v_j\)): Treatment effectiveness varies by hospital (staff expertise, equipment)
- Why ME? Accounts for hospital clustering and heterogeneous treatment effects
- Real study example: Gatsonis et al. (1993) on cardiac catheterization
3. Regional Economics: Firm Productivity Across Cities
Research Question: How do agglomeration economies affect firm productivity differently across cities?
\[\ln(\text{TFP}_{ijt}) = (\alpha + \mu_j) + (\beta + v_j) \cdot \ln(\text{CitySize}_{jt}) + \gamma \cdot \text{FirmChar}_{ijt} + u_{ijt}\]
Where: \(i\) = firm, \(j\) = city, \(t\) = time
- Random intercepts (\(\mu_j\)): City-specific advantages (infrastructure, institutions)
- Random slopes (\(v_j\)): Agglomeration benefits vary by city (specialized vs. diverse cities)
- Why ME? Captures how urban economies function differently across cities
- Real study example: Henderson (2003) on urban agglomeration effects
Quick Decision Guide
Choose Fixed Effects when:
- Unobserved heterogeneity is likely correlated with regressors
- You want causal identification from within-unit variation
- Time-invariant factors are nuisance parameters
- Examples: Ability bias, firm quality, country-pair characteristics
Choose Random Effects when:
- Unit effects can plausibly be treated as random draws from a population distribution
- Unobserved effects are uncorrelated with regressors
- You need to include time-invariant variables
- Examples: Random household surveys, experimental settings
Choose Mixed Effects when:
- Data has hierarchical/nested structure
- Both intercepts AND slopes vary across groups
- You care about group-level heterogeneity
- Examples: Students in schools, patients in hospitals, firms in regions
Empirical Testing Strategy
- Start with Mixed Effects if you have hierarchical data
- Test RE vs FE using Hausman test for non-hierarchical panel data
- Test for random slopes using likelihood ratio tests in ME models
- Consider Mundlak approach as a middle ground (RE with group means of regressors)
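The likelihood ratio test for random slopes can be sketched as follows. Because the null hypothesis puts a variance on the boundary of its parameter space (\(\sigma_v^2 = 0\)), the usual reference distribution is a 50:50 mixture of chi-squares rather than a single chi-square. The function name and the log-likelihood values are hypothetical, and both fits are assumed to share the same fixed part.

```python
import math


def lr_random_slope(loglik_intercept_only, loglik_with_slope, correlated=True):
    """LR test of a random-intercept model against intercept plus random slope.

    correlated=True: the slope adds a variance and a covariance with the
    intercept; reference distribution is a 50:50 mix of chi2(1) and chi2(2).
    correlated=False: only a variance is added; the mix is chi2(0) and chi2(1).
    """
    stat = 2 * (loglik_with_slope - loglik_intercept_only)
    if stat <= 0:
        return 0.0, 1.0
    sf_chi2_1 = math.erfc(math.sqrt(stat / 2))  # upper tail of chi2(1)
    sf_chi2_2 = math.exp(-stat / 2)             # upper tail of chi2(2)
    if correlated:
        p_value = 0.5 * sf_chi2_1 + 0.5 * sf_chi2_2
    else:
        p_value = 0.5 * sf_chi2_1  # chi2(0) puts all its mass at zero
    return stat, p_value


# Hypothetical log-likelihoods from two fits with the same fixed part.
stat, p = lr_random_slope(-1045.0, -1040.0, correlated=True)
print(stat, round(p, 4))  # 10.0 0.0042
```

Comparing the statistic to an unadjusted chi-square instead would give a larger p-value, so the naive test tends to be conservative about adding random slopes.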
Modern Extensions
- Correlated Random Effects (Mundlak): Allows correlation through group means
- Fixed Effects with Individual Slopes: Adds unit-specific slopes or trends to the FE model to absorb heterogeneous trends
- Bayesian Mixed Effects: Better handling of small groups and complex hierarchies
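A minimal sketch of the Mundlak idea, on a made-up balanced panel: add each unit's mean of \(X\) as an extra regressor. For simplicity the sketch uses pooled OLS on the augmented regression rather than GLS; in a balanced panel the coefficient on \(X_{it}\) then equals the within (FE) estimate, and a coefficient on \(\bar{X}_i\) far from zero signals that the unit effects are correlated with \(X\). The data and function names are illustrative assumptions.

```python
# Toy balanced panel (illustrative numbers): unit -> list of (x, y).
# Assumption: y = alpha_i + 2*x with alpha = 10, 0, -5.
panel = {
    "A": [(1, 12), (2, 14), (3, 16)],
    "B": [(2, 4), (3, 6), (4, 8)],
    "C": [(4, 3), (5, 5), (6, 7)],
}


def ols(rows, y):
    """Solve the normal equations (X'X) b = X'y by Gauss-Jordan elimination."""
    k = len(rows[0])
    # Augmented matrix [X'X | X'y].
    a = [
        [sum(r[i] * r[j] for r in rows) for j in range(k)]
        + [sum(r[i] * yi for r, yi in zip(rows, y))]
        for i in range(k)
    ]
    for col in range(k):
        pivot = max(range(col, k), key=lambda r: abs(a[r][col]))
        a[col], a[pivot] = a[pivot], a[col]
        for r in range(k):
            if r != col:
                factor = a[r][col] / a[col][col]
                a[r] = [v - factor * p for v, p in zip(a[r], a[col])]
    return [a[i][k] / a[i][i] for i in range(k)]


def mundlak_coefficients(panel):
    """Regress y on a constant, x, and the unit mean of x."""
    rows, y = [], []
    for obs in panel.values():
        x_bar = sum(x for x, _ in obs) / len(obs)
        for x, yi in obs:
            rows.append([1.0, x, x_bar])
            y.append(yi)
    return ols(rows, y)


const, beta_x, beta_xbar = mundlak_coefficients(panel)
print(round(beta_x, 6))     # 2.0, the same as the within estimator
print(round(beta_xbar, 3))  # -4.643, far from zero
```

The coefficient on the unit mean equals the between slope minus the within slope, which is why testing it against zero serves a similar purpose to the Hausman comparison.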