Panel Data Models: A Comprehensive Guide
Pooled OLS, Fixed Effects, and Random Effects Estimators
1 Introduction and Setup
1.1 What is Panel Data?
Panel data (also called longitudinal data) combines cross-sectional and time-series dimensions. We observe multiple entities (individuals, firms, countries, regions) over multiple time periods.
Key advantages:
- Controls for unobserved heterogeneity
- More degrees of freedom and efficiency
- Can study dynamics and causality better than pure cross-section
- Reduces collinearity between variables
1.2 Notation and Data Structure
1.2.1 Basic Setup
Consider a balanced panel with:
- N = number of cross-sectional units (e.g., regions, individuals)
- T = number of time periods
- n = NT = total number of observations
1.2.2 Variables
- y_{it} — dependent variable for unit i at time t
- x_{it} — (K \times 1) vector of explanatory variables (regressors)
- \beta — (K \times 1) parameter vector (coefficients of interest)
- \alpha_i — individual-specific effect (unobserved heterogeneity)
- \varepsilon_{it} — idiosyncratic error term (time-varying shock)
1.2.3 Stacking Convention
We stack observations in a specific order: all time periods for unit 1, then all time periods for unit 2, etc.
Stacked dependent variable: y = \begin{bmatrix} y_1 \\ y_2 \\ \vdots \\ y_N \end{bmatrix}_{NT \times 1}, \quad \text{where} \quad y_i = \begin{bmatrix} y_{i1} \\ y_{i2} \\ \vdots \\ y_{iT} \end{bmatrix}_{T \times 1}
Stacked regressor matrix: X = \begin{bmatrix} X_1 \\ X_2 \\ \vdots \\ X_N \end{bmatrix}_{NT \times K}, \quad \text{where} \quad X_i = \begin{bmatrix} x_{i1}' \\ x_{i2}' \\ \vdots \\ x_{iT}' \end{bmatrix}_{T \times K}
1.2.4 Useful Matrix Notation
- I_N — (N \times N) identity matrix
- I_T — (T \times T) identity matrix
- \mathbf{1}_T — (T \times 1) vector of ones
- J_T = \mathbf{1}_T \mathbf{1}_T' / T — (T \times T) averaging matrix
- \otimes — Kronecker product operator
2 Pooled OLS Model
2.1 The Basic Idea
Pooled OLS treats all observations as if they were independent, ignoring the panel structure. It assumes no unobserved heterogeneity across units.
2.2 Model Specification
2.2.1 Scalar Form
For unit i at time t: y_{it} = x_{it}'\beta + u_{it}, \quad i=1,\ldots,N, \quad t=1,\ldots,T
where u_{it} is a composite error term that pools all unobserved effects.
2.2.2 Matrix Form
Stacking all observations: y = X\beta + u
where:
- y is (NT \times 1)
- X is (NT \times K)
- \beta is (K \times 1)
- u is (NT \times 1)
2.3 Pooled OLS Estimator
2.3.1 Derivation
Minimize the sum of squared residuals: \min_{\beta} \quad (y - X\beta)'(y - X\beta)
First-order condition: -2X'(y - X\hat{\beta}_P) = 0
Solving for \hat{\beta}_P: X'X\hat{\beta}_P = X'y
2.3.2 The Estimator
\boxed{\hat{\beta}_P = (X'X)^{-1}X'y}
This is just standard OLS applied to the entire stacked dataset.
2.4 Assumptions
2.4.1 Classical Assumptions
- Strict exogeneity: E[u_{it} \mid X] = 0
- Homoskedasticity: \text{Var}(u_{it}) = \sigma_u^2 for all i,t
- No serial correlation: \text{Cov}(u_{it}, u_{is}) = 0 for t \neq s
- No cross-sectional correlation: \text{Cov}(u_{it}, u_{jt}) = 0 for i \neq j
2.4.2 Variance-Covariance Matrix
Under these assumptions: \text{Var}(u) = \sigma_u^2 I_{NT}
where I_{NT} is the (NT \times NT) identity matrix.
2.5 Variance of the Estimator
Under classical assumptions: \text{Var}(\hat{\beta}_P \mid X) = \sigma_u^2 (X'X)^{-1}
Estimation of \sigma_u^2: \hat{\sigma}_u^2 = \frac{\hat{u}'\hat{u}}{NT - K} = \frac{\sum_{i=1}^N \sum_{t=1}^T \hat{u}_{it}^2}{NT - K}
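The estimator and its variance formula translate almost line for line into NumPy. The sketch below is illustrative only: the function name `pooled_ols` and the simulated design (intercept 1.0, slope 2.0, no unit effects) are assumptions made for the example.

```python
import numpy as np

def pooled_ols(X, y):
    """OLS on stacked panel data; returns coefficients, standard errors, sigma^2."""
    n, K = X.shape
    XtX_inv = np.linalg.inv(X.T @ X)
    beta = XtX_inv @ X.T @ y                  # (X'X)^{-1} X'y
    resid = y - X @ beta
    sigma2 = resid @ resid / (n - K)          # u'u / (NT - K)
    se = np.sqrt(np.diag(sigma2 * XtX_inv))   # sqrt of diag of sigma^2 (X'X)^{-1}
    return beta, se, sigma2

# Simulated data (assumed design): intercept 1.0, slope 2.0, no unit effects.
rng = np.random.default_rng(0)
N, T = 200, 5
x = rng.normal(size=N * T)
X = np.column_stack([np.ones(N * T), x])
y = 1.0 + 2.0 * x + rng.normal(size=N * T)

beta_p, se_p, sigma2_p = pooled_ols(X, y)
```

Here the constant is counted as one of the K columns of X, so the divisor NT - K already accounts for it.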
2.6 When Does Pooled OLS Work?
2.6.1 Consistency Condition
Pooled OLS is consistent if: E[u_{it} \mid X] = 0
This requires: No unobserved unit-specific effects that correlate with X.
2.6.2 Why Pooled OLS Usually Fails
In reality, u_{it} often contains unobserved individual effects: u_{it} = \alpha_i + \varepsilon_{it}
If \alpha_i is correlated with x_{it}, then: E[u_{it} \mid X] = E[\alpha_i \mid X] \neq 0
This causes omitted variable bias.
2.6.3 Example: Returns to Education
Model: \ln(\text{wage}_{it}) = \beta_0 + \beta_1 \text{education}_{it} + u_{it}
Problem: u_{it} contains unobserved ability \alpha_i
- High ability individuals may get more education
- \text{Cov}(\text{education}_{it}, \alpha_i) > 0
- Pooled OLS overestimates returns to education
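A small simulation can make the direction of the bias concrete. Everything in the sketch below is assumed for illustration (the true return of 0.08, the strength of the ability channel, the sample size). It compares pooled OLS with the estimate obtained after subtracting each individual's time average, the within transformation developed in Section 3.

```python
import numpy as np

rng = np.random.default_rng(1)
N, T = 2000, 4
true_beta = 0.08                                   # assumed true return to education

ability = rng.normal(size=N)                       # unobserved alpha_i
# Education varies over time and is positively correlated with ability.
educ = 12 + 1.5 * ability[:, None] + rng.normal(size=(N, T))
log_wage = (1.0 + true_beta * educ + 0.3 * ability[:, None]
            + 0.2 * rng.normal(size=(N, T)))

# Pooled OLS on the stacked data (ability is omitted)
x, y = educ.ravel(), log_wage.ravel()
X = np.column_stack([np.ones_like(x), x])
beta_pooled = np.linalg.lstsq(X, y, rcond=None)[0][1]

# Same regression after removing each individual's time average
x_w = (educ - educ.mean(axis=1, keepdims=True)).ravel()
y_w = (log_wage - log_wage.mean(axis=1, keepdims=True)).ravel()
beta_within = (x_w @ y_w) / (x_w @ x_w)
```

In this design the pooled slope absorbs part of the ability effect and lands well above 0.08, while the demeaned regression stays close to it.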
2.7 Advantages and Disadvantages
2.7.1 Advantages
- Simple to compute
- Efficient if assumptions hold
- Can include time-invariant variables
2.7.2 Disadvantages
- Biased and inconsistent if \alpha_i exists and correlates with X
- Ignores panel structure
- Standard errors wrong if serial correlation or heteroskedasticity present
3 Fixed Effects Model
3.1 The Core Insight
Fixed effects allows each unit to have its own intercept \alpha_i, which can be correlated with the regressors. This controls for all time-invariant unobserved heterogeneity.
3.2 Model Specification
3.2.1 Scalar Form
y_{it} = x_{it}'\beta + \alpha_i + \varepsilon_{it}
where:
- \alpha_i = individual-specific effect (time-invariant)
- \varepsilon_{it} = idiosyncratic error (time-varying)
3.2.2 Decomposition of Error
u_{it} = \alpha_i + \varepsilon_{it}
Key difference from pooled OLS: We explicitly model \alpha_i and allow \text{Cov}(\alpha_i, x_{it}) \neq 0.
3.2.3 Matrix Form (LSDV)
Stack all observations and include unit dummies: y = X\beta + D\alpha + \varepsilon
where:
- D is the (NT \times N) matrix of unit dummy variables
- \alpha = (\alpha_1, \ldots, \alpha_N)' is the (N \times 1) vector of fixed effects
3.2.4 Structure of the Dummy Matrix D
D = I_N \otimes \mathbf{1}_T = \begin{bmatrix} \mathbf{1}_T & 0 & \cdots & 0 \\ 0 & \mathbf{1}_T & \cdots & 0 \\ \vdots & \vdots & \ddots & \vdots \\ 0 & 0 & \cdots & \mathbf{1}_T \end{bmatrix}_{NT \times N}
For unit 1, the first T rows have 1 in column 1 and 0 elsewhere, etc.
3.3 Two Equivalent Estimators
3.3.1 LSDV (Least Squares Dummy Variable) Estimator
Run OLS on the augmented model: \begin{bmatrix} \hat{\beta}_{LSDV} \\ \hat{\alpha} \end{bmatrix} = \left( \begin{bmatrix} X & D \end{bmatrix}' \begin{bmatrix} X & D \end{bmatrix} \right)^{-1} \begin{bmatrix} X & D \end{bmatrix}' y
Problem: Computationally expensive when N is large (adds N dummy variables).
3.3.2 Within (Demeaned) Estimator
Computationally cheaper approach: Transform the data to eliminate \alpha_i, so that no dummy variables have to be estimated.
3.3.2.1 Unit-Specific Time Averages
For each unit i, compute: \bar{y}_i = \frac{1}{T}\sum_{t=1}^T y_{it}, \quad \bar{x}_i = \frac{1}{T}\sum_{t=1}^T x_{it}
3.3.2.2 Taking Deviations from Means
Average the model over time: \bar{y}_i = \bar{x}_i'\beta + \alpha_i + \bar{\varepsilon}_i
Subtract this from the original equation: y_{it} - \bar{y}_i = (x_{it} - \bar{x}_i)'\beta + (\varepsilon_{it} - \bar{\varepsilon}_i)
Notice: The fixed effect \alpha_i cancels out! This is the “within transformation.”
3.3.2.3 Demeaned Variables
Define: \tilde{y}_{it} = y_{it} - \bar{y}_i, \quad \tilde{x}_{it} = x_{it} - \bar{x}_i, \quad \tilde{\varepsilon}_{it} = \varepsilon_{it} - \bar{\varepsilon}_i
The within-transformed model: \tilde{y}_{it} = \tilde{x}_{it}'\beta + \tilde{\varepsilon}_{it}
Apply OLS to this demeaned data.
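The demeaning steps above can be sketched for a balanced panel stacked unit by unit. The helper name `within_transform` and the simulated data, in which the regressors are deliberately correlated with \alpha_i, are assumptions for illustration.

```python
import numpy as np

def within_transform(a, N, T):
    """Subtract unit-specific time means from data stacked unit by unit.

    `a` has shape (N*T,) or (N*T, K); a balanced panel is assumed.
    """
    a3 = a.reshape(N, T, -1)
    demeaned = a3 - a3.mean(axis=1, keepdims=True)
    return demeaned.reshape(a.shape)

rng = np.random.default_rng(2)
N, T, K = 500, 6, 2
beta = np.array([1.0, -0.5])                              # assumed true slopes
alpha = rng.normal(size=N)
X3 = rng.normal(size=(N, T, K)) + alpha[:, None, None]    # correlated with alpha_i
y2 = X3 @ beta + alpha[:, None] + rng.normal(size=(N, T))
X_s, y_s = X3.reshape(N * T, K), y2.reshape(N * T)

X_t = within_transform(X_s, N, T)
y_t = within_transform(y_s, N, T)
beta_fe = np.linalg.lstsq(X_t, y_t, rcond=None)[0]        # OLS on demeaned data
```

No intercept is included in the last step because demeaning removes any constant along with \alpha_i.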
3.3.3 Matrix Form of Within Transformation
Define the demeaning matrix for each unit: M_T = I_T - \frac{1}{T}\mathbf{1}_T\mathbf{1}_T' = I_T - J_T
This matrix, when applied to any (T \times 1) vector, subtracts its mean from each element.
For the full panel: M = I_N \otimes M_T = I_{NT} - I_N \otimes J_T
Properties of M:
- Symmetric: M = M'
- Idempotent: M^2 = M (key for projection matrices)
- M removes unit-specific means
3.3.4 Apply the Transformation
\tilde{y} = My, \quad \tilde{X} = MX
Note that M \cdot D = 0 (the demeaning matrix eliminates all unit dummy variables).
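These properties are easy to confirm numerically for a small panel. The dimensions N = 3 and T = 4 below are arbitrary choices for the check.

```python
import numpy as np

N, T = 3, 4                                   # small illustrative dimensions
ones_T = np.ones((T, 1))

M_T = np.eye(T) - ones_T @ ones_T.T / T       # I_T - J_T
M = np.kron(np.eye(N), M_T)                   # I_N ⊗ M_T, shape (NT, NT)
D = np.kron(np.eye(N), ones_T)                # I_N ⊗ 1_T, shape (NT, N)

is_symmetric = np.allclose(M, M.T)
is_idempotent = np.allclose(M @ M, M)
annihilates_dummies = np.allclose(M @ D, 0.0)
```

Forming M explicitly costs memory of order (NT)^2, so in applied work the demeaning is normally done group by group rather than by matrix multiplication.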
3.4 Fixed Effects Estimator
3.4.1 The Within Estimator Formula
\boxed{\hat{\beta}_{FE} = (\tilde{X}'\tilde{X})^{-1}\tilde{X}'\tilde{y} = (X'MX)^{-1}X'My}
This is OLS on demeaned data.
3.4.2 Relationship to LSDV
Theorem: \hat{\beta}_{FE} = \hat{\beta}_{LSDV}
Both approaches give identical coefficient estimates, but the within estimator is computationally much faster.
3.4.3 Recovering the Fixed Effects
After estimating \hat{\beta}_{FE}, we can recover: \hat{\alpha}_i = \bar{y}_i - \bar{x}_i'\hat{\beta}_{FE}
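The equivalence with LSDV and the recovery formula can both be checked on simulated data. The sketch below uses an assumed design (30 units, 5 periods, slopes 1.0 and 2.0) and forms M explicitly, which is only sensible for small panels.

```python
import numpy as np

rng = np.random.default_rng(3)
N, T, K = 30, 5, 2
alpha = rng.normal(size=N)
a_stack = np.repeat(alpha, T)                       # alpha_i stacked unit by unit
X = rng.normal(size=(N * T, K)) + a_stack[:, None]  # regressors correlated with alpha_i
y = X @ np.array([1.0, 2.0]) + a_stack + rng.normal(size=N * T)

# LSDV: OLS of y on [X, D]
D = np.kron(np.eye(N), np.ones((T, 1)))
coef = np.linalg.lstsq(np.hstack([X, D]), y, rcond=None)[0]
beta_lsdv, alpha_lsdv = coef[:K], coef[K:]

# Within estimator: (X'MX)^{-1} X'My, using M = I_NT - D D'/T
M = np.eye(N * T) - D @ D.T / T
beta_fe = np.linalg.solve(X.T @ M @ X, X.T @ M @ y)

# Recover the fixed effects from unit means
X_bar, y_bar = D.T @ X / T, D.T @ y / T
alpha_hat = y_bar - X_bar @ beta_fe
```

Both routes return the same slopes and the same N intercepts up to floating-point error.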
3.5 Assumptions
3.5.1 Key Assumptions for FE
- Strict exogeneity (conditional): E[\varepsilon_{it} \mid X_i, \alpha_i] = 0
- Homoskedasticity: \text{Var}(\varepsilon_{it}) = \sigma_\varepsilon^2
- No serial correlation: \text{Cov}(\varepsilon_{it}, \varepsilon_{is}) = 0 for t \neq s
- No perfect collinearity in \tilde{X}
Crucially: We do NOT require \text{Cov}(\alpha_i, x_{it}) = 0. This is the main advantage!
3.6 Variance of the FE Estimator
\text{Var}(\hat{\beta}_{FE} \mid X, \alpha) = \sigma_\varepsilon^2 (X'MX)^{-1}
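Estimation of \sigma_\varepsilon^2: \hat{\sigma}_\varepsilon^2 = \frac{\tilde{\varepsilon}'\tilde{\varepsilon}}{NT - N - K} = \frac{\sum_{i=1}^N\sum_{t=1}^T \tilde{\varepsilon}_{it}^2}{NT - N - K}
where \tilde{\varepsilon}_{it} here denotes the within residual \tilde{y}_{it} - \tilde{x}_{it}'\hat{\beta}_{FE}.
Degrees of freedom: NT - N - K, because the N fixed effects (equivalently, the N unit means removed by demeaning) use N degrees of freedom and the slopes use K. OLS software run on manually demeaned data divides by NT - K instead and therefore understates the standard errors.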
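A minimal sketch of the degrees-of-freedom correction, assuming a balanced panel stacked unit by unit. The function name `fe_estimate` and the simulated design (with \sigma_\varepsilon^2 = 1) are hypothetical; the naive divisor NT - K is returned alongside for comparison.

```python
import numpy as np

def fe_estimate(X, y, N, T):
    """Within estimator with the NT - N - K degrees-of-freedom correction.

    X is (N*T, K) and y is (N*T,), stacked unit by unit; balanced panel assumed.
    """
    K = X.shape[1]
    X_t = X - X.reshape(N, T, K).mean(axis=1).repeat(T, axis=0)
    y_t = y - y.reshape(N, T).mean(axis=1).repeat(T)
    A_inv = np.linalg.inv(X_t.T @ X_t)              # (X'MX)^{-1}
    beta = A_inv @ X_t.T @ y_t
    resid = y_t - X_t @ beta
    ssr = resid @ resid
    sigma2 = ssr / (N * T - N - K)                  # corrected divisor
    sigma2_naive = ssr / (N * T - K)                # what plain OLS would report
    se = np.sqrt(np.diag(sigma2 * A_inv))
    return beta, se, sigma2, sigma2_naive

rng = np.random.default_rng(4)
N, T = 400, 5
alpha = np.repeat(rng.normal(size=N), T)
X = rng.normal(size=(N * T, 2)) + alpha[:, None]
y = X @ np.array([1.0, -1.0]) + alpha + rng.normal(size=N * T)   # sigma_eps^2 = 1
beta_fe, se_fe, s2, s2_naive = fe_estimate(X, y, N, T)
```

With T = 5 the naive estimate is too small by a factor of roughly (T - 1)/T, which is why the correction matters most in short panels.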
3.7 Interpretation and Properties
3.7.1 What FE Identifies
Fixed effects uses within-unit variation over time: \hat{\beta}_{FE} = \text{Cov}(\tilde{x}_{it}, \tilde{y}_{it}) / \text{Var}(\tilde{x}_{it})
It answers: “When x_{it} changes within unit i over time, how does y_{it} change?”
3.7.2 What FE Cannot Identify
Time-invariant variables are eliminated by demeaning:
- If x_{it} = x_i (no time variation), then \tilde{x}_{it} = x_i - x_i = 0
- Cannot estimate effects of gender, race, country of birth, etc.
3.7.3 R^2 in Fixed Effects
Three types of R^2:
Within R^2: Fit of demeaned model R^2_{\text{within}} = 1 - \frac{\sum \tilde{\varepsilon}_{it}^2}{\sum \tilde{y}_{it}^2}
Between R^2: Fit of unit means R^2_{\text{between}} = 1 - \frac{\sum (\bar{y}_i - \bar{x}_i'\hat{\beta}_{FE})^2}{\sum (\bar{y}_i - \bar{y})^2}
Overall R^2: Total fit including fixed effects
3.8 Advantages and Disadvantages
3.8.1 Advantages
- Consistent even when \text{Cov}(\alpha_i, x_{it}) \neq 0
- Eliminates omitted variable bias from time-invariant factors
- Natural for policy evaluation (before-after comparisons)
- No distributional assumptions on \alpha_i
3.8.2 Disadvantages
- Cannot estimate time-invariant effects
- Less efficient than RE if \text{Cov}(\alpha_i, x_{it}) = 0 actually holds
- May exacerbate measurement error in x_{it}
- Requires sufficient within-unit variation
4 Random Effects Model
4.1 The Core Idea
Random effects treats \alpha_i as a random variable drawn from a distribution, uncorrelated with the regressors. This allows more efficient estimation than FE.
4.2 Model Specification
4.2.1 Scalar Form
y_{it} = x_{it}'\beta + \alpha_i + \varepsilon_{it}
Same structure as FE, but with different assumptions on \alpha_i.
4.2.2 Composite Error Structure
u_{it} = \alpha_i + \varepsilon_{it}
where:
- \alpha_i \sim (0, \sigma_\alpha^2): random individual effect
- \varepsilon_{it} \sim (0, \sigma_\varepsilon^2): idiosyncratic error
- \text{Cov}(\alpha_i, \varepsilon_{jt}) = 0 for all i,j,t
- KEY ASSUMPTION: \text{Cov}(\alpha_i, x_{it}) = 0 (orthogonality)
4.3 Variance-Covariance Structure
4.3.1 Variance of Composite Error
\text{Var}(u_{it}) = \text{Var}(\alpha_i) + \text{Var}(\varepsilon_{it}) = \sigma_\alpha^2 + \sigma_\varepsilon^2
4.3.2 Serial Correlation Within Units
For the same unit i at different times: \text{Cov}(u_{it}, u_{is}) = \text{Cov}(\alpha_i + \varepsilon_{it}, \alpha_i + \varepsilon_{is}) = \sigma_\alpha^2
This creates positive serial correlation within units.
4.3.3 Intraclass Correlation Coefficient
The correlation between any two observations from the same unit: \rho = \frac{\sigma_\alpha^2}{\sigma_\alpha^2 + \sigma_\varepsilon^2} = \frac{\sigma_\alpha^2}{\sigma_u^2}
This measures the fraction of total variance due to unit-specific effects.
4.3.4 Variance-Covariance Matrix for Unit i
\text{Var}(u_i) = \Omega_i = \sigma_\varepsilon^2 I_T + \sigma_\alpha^2 \mathbf{1}_T\mathbf{1}_T' = \sigma_\varepsilon^2 I_T + \sigma_\alpha^2 J_T \cdot T
In expanded form: \Omega_i = \begin{bmatrix} \sigma_\alpha^2 + \sigma_\varepsilon^2 & \sigma_\alpha^2 & \cdots & \sigma_\alpha^2 \\ \sigma_\alpha^2 & \sigma_\alpha^2 + \sigma_\varepsilon^2 & \cdots & \sigma_\alpha^2 \\ \vdots & \vdots & \ddots & \vdots \\ \sigma_\alpha^2 & \sigma_\alpha^2 & \cdots & \sigma_\alpha^2 + \sigma_\varepsilon^2 \end{bmatrix}_{T \times T}
Structure: Constant variance on diagonal, constant covariance off-diagonal (equicorrelation).
4.3.5 Full Panel Variance-Covariance Matrix
\Omega = \text{Var}(u) = I_N \otimes \Omega_i = \sigma_\varepsilon^2 I_{NT} + T\sigma_\alpha^2 (I_N \otimes J_T)
The factor T appears because J_T is the averaging matrix, so \mathbf{1}_T\mathbf{1}_T' = T J_T.
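The block structure can be inspected directly. The sketch below builds \Omega_i from the ones-vector form in Section 4.3.4 with assumed values \sigma_\alpha^2 = 2 and \sigma_\varepsilon^2 = 1, then stacks it with a Kronecker product.

```python
import numpy as np

N, T = 3, 4
sigma_a2, sigma_e2 = 2.0, 1.0                 # assumed variance components

ones_T = np.ones((T, 1))
Omega_i = sigma_e2 * np.eye(T) + sigma_a2 * (ones_T @ ones_T.T)
Omega = np.kron(np.eye(N), Omega_i)           # block diagonal, (NT x NT)

rho = sigma_a2 / (sigma_a2 + sigma_e2)        # intraclass correlation
corr_within = Omega_i[0, 1] / Omega_i[0, 0]   # correlation of two periods, same unit
```

Entries linking different units are zero, and every off-diagonal entry inside a block equals \sigma_\alpha^2.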
4.4 Random Effects Estimator
4.4.1 GLS (Generalized Least Squares) Approach
Since \Omega \neq \sigma^2 I, OLS is inefficient. The efficient estimator is GLS:
\boxed{\hat{\beta}_{RE} = (X'\Omega^{-1}X)^{-1}X'\Omega^{-1}y}
Problem: This requires knowing \Omega (which depends on \sigma_\alpha^2 and \sigma_\varepsilon^2).
4.4.2 Feasible GLS (FGLS)
In practice:
1. Estimate \sigma_\alpha^2 and \sigma_\varepsilon^2 from data
2. Construct \hat{\Omega}
3. Use FGLS: \hat{\beta}_{RE} = (X'\hat{\Omega}^{-1}X)^{-1}X'\hat{\Omega}^{-1}y
4.4.3 Quasi-Demeaning Transformation (Practical Implementation)
Rather than compute \Omega^{-1} directly, RE can be implemented via partial demeaning.
4.4.3.1 The Transformation
Define the quasi-demeaning factor: \theta = 1 - \sqrt{\frac{\sigma_\varepsilon^2}{\sigma_\varepsilon^2 + T\sigma_\alpha^2}}
Transform the data: y_{it}^* = y_{it} - \theta\bar{y}_i, \quad x_{it}^* = x_{it} - \theta\bar{x}_i
Random effects estimator: \hat{\beta}_{RE} = \left(\sum_{i=1}^N\sum_{t=1}^T x_{it}^* x_{it}^{*'}\right)^{-1} \left(\sum_{i=1}^N\sum_{t=1}^T x_{it}^* y_{it}^*\right)
This is just OLS on quasi-demeaned data.
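That quasi-demeaning reproduces GLS exactly can be verified numerically. The sketch treats the variance components as known (assumed values 1.5 and 1.0) and includes a constant column in X, which the transformation turns into 1 - \theta.

```python
import numpy as np

rng = np.random.default_rng(10)
N, T = 40, 4
sigma_a2, sigma_e2 = 1.5, 1.0                        # treated as known here

x = rng.normal(size=N * T)
X = np.column_stack([np.ones(N * T), x])
u = (np.repeat(rng.normal(scale=np.sqrt(sigma_a2), size=N), T)
     + rng.normal(scale=np.sqrt(sigma_e2), size=N * T))
y = X @ np.array([1.0, 2.0]) + u

# Direct GLS with the full (NT x NT) covariance matrix
ones_T = np.ones((T, 1))
Omega = np.kron(np.eye(N), sigma_e2 * np.eye(T) + sigma_a2 * (ones_T @ ones_T.T))
Omega_inv = np.linalg.inv(Omega)
beta_gls = np.linalg.solve(X.T @ Omega_inv @ X, X.T @ Omega_inv @ y)

# OLS on quasi-demeaned data
theta = 1.0 - np.sqrt(sigma_e2 / (sigma_e2 + T * sigma_a2))
P = np.kron(np.eye(N), ones_T @ ones_T.T / T)        # I_N ⊗ J_T: unit means
X_star = X - theta * (P @ X)
y_star = y - theta * (P @ y)
beta_qd = np.linalg.lstsq(X_star, y_star, rcond=None)[0]
```

The two coefficient vectors agree to floating-point precision because \sigma_\varepsilon \Omega_i^{-1/2} = I_T - \theta J_T.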
4.4.3.2 Understanding \theta
Interpretation: \theta determines how much of the unit mean to subtract.
0 \leq \theta \leq 1
Special cases:
- If \theta = 0: y_{it}^* = y_{it} → No demeaning → Pooled OLS
- Occurs when \sigma_\alpha^2 = 0 (no random effects)
- If \theta = 1: y_{it}^* = y_{it} - \bar{y}_i → Full demeaning → Fixed Effects
- Occurs when \sigma_\alpha^2 \to \infty or T \to \infty
- If 0 < \theta < 1: Partial demeaning → True Random Effects
- Weighted average of pooled OLS and FE
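A few evaluations of the formula for \theta illustrate the special cases. The function name `quasi_demeaning_factor` and the input values are chosen only for this example.

```python
import math

def quasi_demeaning_factor(sigma_a2, sigma_e2, T):
    """theta = 1 - sqrt(sigma_e^2 / (sigma_e^2 + T * sigma_a^2))."""
    return 1.0 - math.sqrt(sigma_e2 / (sigma_e2 + T * sigma_a2))

theta_pooled = quasi_demeaning_factor(0.0, 1.0, 5)       # no unit effects
theta_mid = quasi_demeaning_factor(1.0, 1.0, 5)          # equal variance components
theta_long = quasi_demeaning_factor(1.0, 1.0, 10**6)     # very long panel
theta_big_a = quasi_demeaning_factor(1e12, 1.0, 5)       # dominant unit effects
```

The first call returns exactly 0, the second about 0.59, and the last two are indistinguishable from 1 for practical purposes.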
4.4.3.3 Why Partial Demeaning?
Full demeaning (FE) removes all between-unit variation. RE uses both:
- Within variation: Changes over time within units
- Between variation: Differences across units
By only partially demeaning, RE preserves some between-unit information while still accounting for the serial correlation induced by \alpha_i.
4.5 Estimating Variance Components
Several methods to estimate \sigma_\alpha^2 and \sigma_\varepsilon^2:
4.5.1 Method 1: From Fixed Effects and Pooled OLS Residuals
- Estimate FE model, get \hat{\sigma}_\varepsilon^2
- Estimate pooled OLS, get \hat{\sigma}_u^2
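- Calculate: \hat{\sigma}_\alpha^2 = \hat{\sigma}_u^2 - \hat{\sigma}_\varepsilon^2, which follows from \text{Var}(u_{it}) = \sigma_\alpha^2 + \sigma_\varepsilon^2 (set to zero if the difference is negative)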
4.5.2 Method 2: ANOVA-type Estimator
- Run fixed effects, compute \hat{\sigma}_\varepsilon^2 from within residuals
- Compute between variation from unit means
- Estimate \hat{\sigma}_\alpha^2 from between-group sum of squares
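One common way to carry out the ANOVA-type steps is in the spirit of the Swamy-Arora approach: the regression on unit means has error variance \sigma_\alpha^2 + \sigma_\varepsilon^2 / T, so \sigma_\alpha^2 can be backed out once \sigma_\varepsilon^2 is known from the within residuals. The sketch below assumes a single regressor, a balanced panel, and simulated data with true values 2.0 and 1.0.

```python
import numpy as np

rng = np.random.default_rng(5)
N, T = 1000, 5
sigma_a2_true, sigma_e2_true = 2.0, 1.0
alpha = rng.normal(scale=np.sqrt(sigma_a2_true), size=N)
x = rng.normal(size=(N, T))                    # uncorrelated with alpha_i (RE assumption)
y = (0.5 + 1.0 * x + alpha[:, None]
     + rng.normal(scale=np.sqrt(sigma_e2_true), size=(N, T)))

# Step 1: sigma_eps^2 from within residuals (K = 1 slope)
x_w = x - x.mean(axis=1, keepdims=True)
y_w = y - y.mean(axis=1, keepdims=True)
b_w = (x_w * y_w).sum() / (x_w ** 2).sum()
sigma_e2_hat = ((y_w - b_w * x_w) ** 2).sum() / (N * T - N - 1)

# Step 2: between regression on unit means (intercept + slope)
Xb = np.column_stack([np.ones(N), x.mean(axis=1)])
yb = y.mean(axis=1)
coef_b = np.linalg.lstsq(Xb, yb, rcond=None)[0]
sigma_b2_hat = ((yb - Xb @ coef_b) ** 2).sum() / (N - 2)

# Step 3: back out sigma_alpha^2, truncating at zero
sigma_a2_hat = max(0.0, sigma_b2_hat - sigma_e2_hat / T)
```

The truncation at zero is a practical convention; a negative raw estimate is usually read as evidence that unit effects are negligible.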
4.5.3 Method 3: Maximum Likelihood
Assume normality and maximize: \mathcal{L}(\beta, \sigma_\alpha^2, \sigma_\varepsilon^2) = \prod_{i=1}^N f(y_i \mid X_i; \beta, \Omega_i)
4.6 Variance of the RE Estimator
Under RE assumptions: \text{Var}(\hat{\beta}_{RE} \mid X) = (X'\Omega^{-1}X)^{-1}
This is smaller than (more efficient than) FE variance when the RE assumption \text{Cov}(\alpha_i, x_{it}) = 0 holds.
4.7 Assumptions
4.7.1 Critical RE Assumptions
- Orthogonality: E[\alpha_i \mid X_i] = 0 (which implies \text{Cov}(\alpha_i, x_{it}) = 0)
- Random effects distribution: \alpha_i \sim (0, \sigma_\alpha^2)
- Idiosyncratic errors: \varepsilon_{it} \sim (0, \sigma_\varepsilon^2)
- Strict exogeneity: E[\varepsilon_{it} \mid X_i, \alpha_i] = 0
- No correlation: \text{Cov}(\alpha_i, \varepsilon_{jt}) = 0 for all i,j,t
The orthogonality assumption (the first in the list) is the most restrictive and is what differentiates RE from FE.
4.8 Advantages and Disadvantages
4.8.1 Advantages
- More efficient than FE when assumptions hold
- Can estimate coefficients on time-invariant variables
- Uses both within and between variation
- Better for generalization to population
- Computationally simpler than LSDV
4.8.2 Disadvantages
- Inconsistent if \text{Cov}(\alpha_i, x_{it}) \neq 0
- Requires strong orthogonality assumption
- Sensitive to model specification
- Less robust than FE
5 Comparing the Three Models
5.1 Summary Table
| Aspect | Pooled OLS | Fixed Effects | Random Effects |
|---|---|---|---|
| Treatment of \alpha_i | Ignored | Fixed parameters | Random draws |
| Estimator formula | (X'X)^{-1}X'y | (X'MX)^{-1}X'My | (X'\Omega^{-1}X)^{-1}X'\Omega^{-1}y |
| Demeaning | None | Full (\theta=1) | Partial (0<\theta<1) |
| Consistency requires | \text{Cov}(\alpha_i, x_{it})=0 | Strict exogeneity of \varepsilon_{it} only (allows \text{Cov}(\alpha_i, x_{it}) \neq 0) | \text{Cov}(\alpha_i, x_{it})=0 |
| Time-invariant X | Can estimate | Cannot estimate | Can estimate |
| Efficiency | Low (if panel structure) | Medium | High (if assumptions hold) |
| Variation used | Total | Within only | Within + Between |
| Robustness | Low | High | Medium |
5.2 Relationships Between Estimators
5.2.1 Nested Structure
\text{Pooled OLS} \xleftarrow[\theta=0]{\text{special case}} \text{Random Effects} \xrightarrow[\theta=1]{\text{special case}} \text{Fixed Effects}
5.2.2 Algebraic Relationships
FE and RE converge as T \to \infty: \lim_{T \to \infty} \theta = 1 \implies \hat{\beta}_{RE} \to \hat{\beta}_{FE}
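RE combines within and between information: in the scalar-regressor case, \hat{\beta}_{RE} = \lambda \hat{\beta}_{FE} + (1-\lambda)\hat{\beta}_{B} for some 0 < \lambda < 1, where \hat{\beta}_{B} is the between estimator (OLS on unit means). Pooled OLS is also a weighted average of \hat{\beta}_{FE} and \hat{\beta}_{B}, with less weight on \hat{\beta}_{FE}, so informally RE lies between pooled OLS and FE.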
5.3 Bias-Efficiency Trade-off
5.3.1 When \text{Cov}(\alpha_i, x_{it}) = 0 (RE assumption holds)
- Pooled OLS: Consistent but inefficient (ignores serial correlation)
- FE: Consistent but inefficient (throws away between variation)
- RE: Consistent AND efficient ✓
5.3.2 When \text{Cov}(\alpha_i, x_{it}) \neq 0 (RE assumption fails)
- Pooled OLS: Biased and inconsistent ✗
- FE: Consistent ✓
- RE: Biased and inconsistent ✗
Lesson: If in doubt, use FE for robustness.
6 Statistical Tests for Model Selection
6.1 Test 1: F-test for Fixed Effects vs Pooled OLS
6.1.1 Null and Alternative Hypotheses
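H_0: \alpha_1 = \alpha_2 = \cdots = \alpha_N \quad (\text{Pooled OLS with a common intercept is adequate})
H_1: \text{At least two } \alpha_i \text{ differ} \quad (\text{Fixed effects needed})
6.1.2 Test Statistic
F = \frac{(SSR_P - SSR_{FE})/(N-1)}{SSR_{FE}/(NT - N - K)} \sim F(N-1, NT-N-K)
Here K counts the slope coefficients only. The null imposes N - 1 restrictions because the pooled model keeps one common intercept.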
where:
- SSR_P = sum of squared residuals from pooled OLS
- SSR_{FE} = sum of squared residuals from fixed effects
6.1.3 Decision Rule
- If F > F_{\alpha} (or p < 0.05): Reject H_0 → Use fixed effects
- If F \leq F_{\alpha} (or p \geq 0.05): Fail to reject → Pooled OLS adequate
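The statistic can be computed from two residual sums of squares. The sketch below assumes the pooled model includes a common intercept, so that the null imposes N - 1 restrictions, and uses a simulated panel in which unit effects are present. Only the statistic is computed; the critical value or p-value would come from F tables or a statistics library.

```python
import numpy as np

rng = np.random.default_rng(6)
N, T, K = 50, 5, 1
alpha = rng.normal(scale=np.sqrt(2.0), size=N)      # unit effects are present
x = rng.normal(size=(N, T))
y = 1.0 + 0.5 * x + alpha[:, None] + rng.normal(size=(N, T))

# Restricted model: pooled OLS with a common intercept
Xp = np.column_stack([np.ones(N * T), x.ravel()])
bp = np.linalg.lstsq(Xp, y.ravel(), rcond=None)[0]
ssr_p = ((y.ravel() - Xp @ bp) ** 2).sum()

# Unrestricted model: fixed effects via the within transformation
x_w = x - x.mean(axis=1, keepdims=True)
y_w = y - y.mean(axis=1, keepdims=True)
b_fe = (x_w * y_w).sum() / (x_w ** 2).sum()
ssr_fe = ((y_w - b_fe * x_w) ** 2).sum()

df1, df2 = N - 1, N * T - N - K
F_stat = ((ssr_p - ssr_fe) / df1) / (ssr_fe / df2)
```

In this simulated design the statistic is far above typical critical values, consistent with the unit effects built into the data.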
6.2 Test 2: Breusch-Pagan LM Test for Random Effects
6.2.1 Null and Alternative Hypotheses
H_0: \sigma_\alpha^2 = 0 \quad (\text{No random effects, pooled OLS is adequate})
H_1: \sigma_\alpha^2 > 0 \quad (\text{Random effects exist})
6.2.2 Test Statistic
LM = \frac{NT}{2(T-1)} \left[ \frac{\sum_{i=1}^N (\sum_{t=1}^T \hat{u}_{it})^2}{\sum_{i=1}^N\sum_{t=1}^T \hat{u}_{it}^2} - 1 \right]^2 \sim \chi^2(1)
where \hat{u}_{it} are pooled OLS residuals.
6.2.3 Intuition
The test checks if residuals are more correlated within units than expected by chance.
6.2.4 Decision Rule
- If LM > \chi^2_{\alpha}(1) (or p < 0.05): Reject H_0 → Random effects needed
- If LM \leq \chi^2_{\alpha}(1) (or p \geq 0.05): Fail to reject → Pooled OLS adequate
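The LM statistic needs only the pooled OLS residuals. A minimal sketch for a balanced panel follows; the function name `breusch_pagan_lm` is hypothetical, and the p-value uses the identity that the upper tail of a \chi^2(1) variable at x equals erfc(\sqrt{x/2}).

```python
import math
import numpy as np

def breusch_pagan_lm(resid, N, T):
    """Breusch-Pagan LM statistic from pooled OLS residuals stacked unit by unit."""
    u = resid.reshape(N, T)
    ratio = (u.sum(axis=1) ** 2).sum() / (u ** 2).sum()
    lm = N * T / (2.0 * (T - 1)) * (ratio - 1.0) ** 2
    p_value = math.erfc(math.sqrt(lm / 2.0))     # chi-square(1) upper tail
    return lm, p_value

# Simulated panel with unit effects (assumed design)
rng = np.random.default_rng(11)
N, T = 200, 5
x = rng.normal(size=N * T)
alpha = np.repeat(rng.normal(size=N), T)
y = 1.0 + 0.5 * x + alpha + rng.normal(size=N * T)

X = np.column_stack([np.ones(N * T), x])
resid = y - X @ np.linalg.lstsq(X, y, rcond=None)[0]
lm_stat, lm_p = breusch_pagan_lm(resid, N, T)
```

When residuals within a unit share a common component, the squared unit sums in the numerator grow relative to the sum of squares, which pushes the ratio above 1.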
6.3 Test 3: Hausman Specification Test (Fixed vs Random Effects)
6.3.1 The Key Question
Should we use fixed effects or random effects?
6.3.2 Null and Alternative Hypotheses
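H_0: \text{Cov}(\alpha_i, x_{it}) = 0 \quad (\text{Random effects is appropriate})
H_1: \text{Cov}(\alpha_i, x_{it}) \neq 0 \quad (\text{Fixed effects needed})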
6.3.3 Intuition
- Under H_0: Both FE and RE are consistent, but RE is more efficient
- Under H_1: Only FE is consistent, RE is biased
If the two estimators differ significantly, it suggests H_1 is true.
6.3.4 Test Statistic
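H = (\hat{\beta}_{FE} - \hat{\beta}_{RE})' \left[\text{Var}(\hat{\beta}_{FE}) - \text{Var}(\hat{\beta}_{RE})\right]^{-1} (\hat{\beta}_{FE} - \hat{\beta}_{RE}) \sim \chi^2(K)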
where K is the number of time-varying regressors.
6.3.5 Components
Difference vector: \hat{q} = \hat{\beta}_{FE} - \hat{\beta}_{RE}
Variance of difference: \text{Var}(\hat{q}) = \text{Var}(\hat{\beta}_{FE}) - \text{Var}(\hat{\beta}_{RE})
Under H_0, \text{Var}(\hat{q}) is positive semi-definite.
6.3.6 Decision Rule
- If H > \chi^2_{\alpha}(K) (or p < 0.05): Reject H_0 → Use Fixed Effects
- If H \leq \chi^2_{\alpha}(K) (or p \geq 0.05): Fail to reject → Use Random Effects
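The test can be assembled from the pieces already introduced. The sketch below is one possible implementation, not a reference one: it assumes a balanced panel, estimates the variance components from the within and between regressions, uses the same \hat{\sigma}_\varepsilon^2 in both variance matrices, and compares only the slopes on time-varying regressors. The function name `hausman` and the simulated data are illustrative.

```python
import numpy as np

def hausman(x, y):
    """Hausman statistic for FE vs RE. x: (N, T, K) regressors, y: (N, T)."""
    N, T, K = x.shape
    xm, ym = x.mean(axis=1, keepdims=True), y.mean(axis=1, keepdims=True)

    # Fixed effects (within) estimator and its variance
    Xw, yw = (x - xm).reshape(N * T, K), (y - ym).ravel()
    b_fe = np.linalg.solve(Xw.T @ Xw, Xw.T @ yw)
    s2_e = ((yw - Xw @ b_fe) ** 2).sum() / (N * T - N - K)
    V_fe = s2_e * np.linalg.inv(Xw.T @ Xw)

    # Variance components via the between regression on unit means
    Xb = np.column_stack([np.ones(N), xm.reshape(N, K)])
    yb = ym.ravel()
    cb = np.linalg.lstsq(Xb, yb, rcond=None)[0]
    s2_b = ((yb - Xb @ cb) ** 2).sum() / (N - K - 1)
    s2_a = max(0.0, s2_b - s2_e / T)
    theta = 1.0 - np.sqrt(s2_e / (s2_e + T * s2_a))

    # Random effects via quasi-demeaning (constant column becomes 1 - theta)
    Xs = np.column_stack([np.full(N * T, 1.0 - theta),
                          (x - theta * xm).reshape(N * T, K)])
    ys = (y - theta * ym).ravel()
    XtX_inv = np.linalg.inv(Xs.T @ Xs)
    b_re = (XtX_inv @ Xs.T @ ys)[1:]
    V_re = (s2_e * XtX_inv)[1:, 1:]

    q = b_fe - b_re
    H = float(q @ np.linalg.solve(V_fe - V_re, q))
    return H, b_fe, b_re

# Simulated data in which x is correlated with alpha_i, so H_0 is false
rng = np.random.default_rng(8)
N, T, K = 500, 5, 2
alpha = rng.normal(size=N)
x = rng.normal(size=(N, T, K)) + alpha[:, None, None]
y = x @ np.array([1.0, 1.0]) + alpha[:, None] + rng.normal(size=(N, T))
H_stat, b_fe, b_re = hausman(x, y)
```

With K = 2 the 5% critical value of \chi^2(2) is about 5.99. In applied work the estimated variance difference is not guaranteed to be positive definite when the two variances are computed with different \hat{\sigma}^2 estimates, which is one reason software output for this test sometimes looks odd.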
6.3.7 Practical Interpretation
Small Hausman statistic / Large p-value:
- \hat{\beta}_{FE} \approx \hat{\beta}_{RE}
- Suggests orthogonality assumption may hold
- RE preferred for efficiency
Large Hausman statistic / Small p-value:
- \hat{\beta}_{FE} and \hat{\beta}_{RE} differ substantially
- Suggests correlation between \alpha_i and x_{it}
- FE preferred for consistency
6.4 Complete Decision Framework
6.4.1 Step-by-Step Model Selection
START
│
├─→ Run Pooled OLS
│
├─→ Test 1: F-test for Fixed Effects
│ ├─ Reject H₀? → Panel structure exists, continue
│ └─ Fail to reject? → Use Pooled OLS (with robust SE)
│
├─→ Test 2: Breusch-Pagan LM test
│ ├─ Reject H₀? → Random effects exist, continue
│ └─ Fail to reject? → Use Pooled OLS
│
├─→ Test 3: Hausman Test
│ ├─ Reject H₀? → Use FIXED EFFECTS ✓
│ └─ Fail to reject? → Use RANDOM EFFECTS ✓
│
END
6.4.2 Theoretical Considerations
Beyond statistical tests, consider:
- Nature of sample:
- Specific units of interest (e.g., G7 countries) → FE
- Random sample from population → RE
- Research question:
- Need time-invariant effects? → RE or Pooled
- Focus on within-unit changes? → FE
- Data structure:
- Large N, small T → FE often appropriate
- Small N, large T → Both work, but consider dynamics
- Plausibility of orthogonality:
- Can \alpha_i realistically be uncorrelated with X? Usually no → FE
- Example: Individual ability affects both education and wages → FE
7 Additional Diagnostic Tests
7.1 Serial Correlation Tests
7.1.1 Why It Matters
Panel data often exhibits serial correlation in \varepsilon_{it}, violating the assumption: \text{Cov}(\varepsilon_{it}, \varepsilon_{is}) = 0 \quad \text{for } t \neq s
7.1.2 Wooldridge Test for Serial Correlation
Null hypothesis: H_0: No first-order serial correlation
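Test procedure:
1. Estimate the first-differenced model \Delta y_{it} = \Delta x_{it}'\beta + \Delta\varepsilon_{it} and obtain the residuals \widehat{\Delta\varepsilon}_{it}
2. Regress \widehat{\Delta\varepsilon}_{it} on \widehat{\Delta\varepsilon}_{i,t-1}
3. Test whether the coefficient equals -0.5, the value implied when \varepsilon_{it} is serially uncorrelated (differencing alone induces this negative correlation, so a test against zero would be misleading)
Decision rule:
- If p < 0.05 → Serial correlation present
- Use clustered standard errors or Driscoll-Kraay SEs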
7.2 Heteroskedasticity Tests
7.2.1 Modified Wald Test for Groupwise Heteroskedasticity
Null hypothesis: H_0: \sigma_{\varepsilon_i}^2 = \sigma_\varepsilon^2 for all i
Tests whether error variance differs across units.
Decision rule: - If p < 0.05 → Use robust/clustered standard errors
7.3 Cross-Sectional Dependence
7.3.1 Pesaran CD Test
Tests whether residuals are correlated across different units: H_0: \text{Cov}(\varepsilon_{it}, \varepsilon_{jt}) = 0 \quad \text{for } i \neq j
Implications if rejected:
- Spatial correlation or common shocks
- May need Driscoll-Kraay SEs or spatial panel models
8 Robust Inference in Panel Data
8.1 Clustered Standard Errors
8.1.1 Why Cluster?
Even after accounting for \alpha_i, errors may be correlated within units: \text{Cov}(\varepsilon_{it}, \varepsilon_{is}) \neq 0
8.1.2 Cluster-Robust Variance Estimator
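For FE: \widehat{\text{Var}}_{cluster}(\hat{\beta}_{FE}) = (X'MX)^{-1} \left(\sum_{i=1}^N X_i' M_T \hat{\varepsilon}_i \hat{\varepsilon}_i' M_T X_i\right) (X'MX)^{-1}
where \hat{\varepsilon}_i is the (T \times 1) vector of FE residuals for unit i.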
This allows arbitrary correlation within clusters (units).
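A minimal sketch of this sandwich estimator for a balanced panel follows. It omits the finite-sample scaling factors that statistical packages usually apply, so its numbers will differ slightly from software output. The function name `fe_cluster_vcov` and the AR(1) error design are assumptions for illustration.

```python
import numpy as np

def fe_cluster_vcov(X, y, N, T):
    """FE coefficients and cluster-robust covariance (clusters = units).

    X is (N*T, K), y is (N*T,), stacked unit by unit; no small-sample correction.
    """
    K = X.shape[1]
    X_t = X - X.reshape(N, T, K).mean(axis=1).repeat(T, axis=0)
    y_t = y - y.reshape(N, T).mean(axis=1).repeat(T)
    bread = np.linalg.inv(X_t.T @ X_t)                   # (X'MX)^{-1}
    beta = bread @ X_t.T @ y_t
    resid = y_t - X_t @ beta
    # Row i of `scores` is the K-vector X_i' M_T e_i for unit i
    scores = (X_t * resid[:, None]).reshape(N, T, K).sum(axis=1)
    meat = scores.T @ scores                             # sum of outer products over units
    return beta, bread @ meat @ bread

# Simulated panel with AR(1) idiosyncratic errors (assumed design)
rng = np.random.default_rng(12)
N, T = 300, 6
eps = np.empty((N, T))
eps[:, 0] = rng.normal(size=N)
for t in range(1, T):
    eps[:, t] = 0.6 * eps[:, t - 1] + rng.normal(size=N)
alpha = rng.normal(size=N)
x = rng.normal(size=(N, T, 2)) + alpha[:, None, None]
y = x @ np.array([1.0, -1.0]) + alpha[:, None] + eps

beta_fe, V_cluster = fe_cluster_vcov(x.reshape(N * T, 2), y.ravel(), N, T)
se_cluster = np.sqrt(np.diag(V_cluster))
```

Summing the scores within each unit before taking outer products is what lets arbitrary within-unit correlation enter the middle term.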
8.1.3 When to Use
- Default for panel data
- Most reliable when the number of units N is large, since the estimator is justified asymptotically in the number of clusters
- Conservative approach
8.2 Driscoll-Kraay Standard Errors
8.2.1 For Spatial and Temporal Correlation
Allows for:
- Serial correlation within units
- Cross-sectional correlation across units
- Useful when T is large
8.2.2 Variance Estimator
\widehat{\text{Var}}_{DK}(\hat{\beta}) = (X'X)^{-1} \hat{S}_{DK} (X'X)^{-1}
where \hat{S}_{DK} accounts for both serial and spatial correlation.
9 Dynamic Panel Data Models
9.1 The Basic Dynamic Model
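y_{it} = \rho y_{i,t-1} + x_{it}'\beta + \alpha_i + \varepsilon_{it}
Problem: The lagged dependent variable y_{i,t-1} is correlated with \alpha_i, which makes pooled OLS and RE inconsistent. FE removes \alpha_i, but the demeaned lag is then correlated with the demeaned error, so FE is also inconsistent for fixed T.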
9.2 Why Standard Estimators Fail
9.2.1 Correlation Structure
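y_{i,t-1} = \rho y_{i,t-2} + x_{i,t-1}'\beta + \alpha_i + \varepsilon_{i,t-1}
Since y_{i,t-1} contains \alpha_i: \text{Cov}(y_{i,t-1}, \alpha_i) \neq 0
Since y_{i,t-1} also contains \varepsilon_{i,t-1}, it is correlated with \bar{\varepsilon}_i and hence with the demeaned error \varepsilon_{it} - \bar{\varepsilon}_i. This violates strict exogeneity even for FE!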
9.2.2 The FE Bias
With fixed T, the FE estimator of \rho is biased: \hat{\rho}_{FE} - \rho = O(1/T)
For small T (common in micro panels), this bias can be large.
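A short simulation can show the size of this bias in a pure autoregressive model without additional regressors. The design below is assumed for illustration: \rho = 0.5, five usable periods, and initial observations drawn from the stationary distribution.

```python
import numpy as np

rng = np.random.default_rng(7)
N, T, rho = 2000, 5, 0.5                     # assumed design: short panel

alpha = rng.normal(size=N)
y = np.empty((N, T + 1))                     # periods 0, 1, ..., T
y[:, 0] = alpha / (1 - rho) + rng.normal(size=N) / np.sqrt(1 - rho ** 2)
for t in range(1, T + 1):
    y[:, t] = rho * y[:, t - 1] + alpha + rng.normal(size=N)

y_lag, y_cur = y[:, :-1], y[:, 1:]           # regressor and outcome, T periods each

# Within (FE) estimator of rho
yl = y_lag - y_lag.mean(axis=1, keepdims=True)
yc = y_cur - y_cur.mean(axis=1, keepdims=True)
rho_fe = (yl * yc).sum() / (yl ** 2).sum()
```

Even with 2000 units the estimate stays well below 0.5, because increasing N does not remove a bias that depends on T.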
9.3 Solutions: GMM Estimators
9.3.1 Arellano-Bond (First-Difference GMM)
First-difference the equation: \Delta y_{it} = \rho \Delta y_{i,t-1} + \Delta x_{it}'\beta + \Delta\varepsilon_{it}
Use y_{i,t-2}, y_{i,t-3}, \ldots as instruments for \Delta y_{i,t-1}.
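The instrumenting idea can be illustrated with its simplest version: a just-identified IV estimator that uses only y_{i,t-2} as the instrument, in the style of Anderson and Hsiao. This is not the full Arellano-Bond GMM estimator, which stacks all available lags and applies an optimal weighting matrix. The pure autoregressive design without regressors is an assumption made to keep the example short.

```python
import numpy as np

rng = np.random.default_rng(9)
N, T, rho = 10000, 6, 0.5

alpha = rng.normal(size=N)
y = np.empty((N, T + 1))                      # periods 0, 1, ..., T
y[:, 0] = alpha / (1 - rho) + rng.normal(size=N) / np.sqrt(1 - rho ** 2)
for t in range(1, T + 1):
    y[:, t] = rho * y[:, t - 1] + alpha + rng.normal(size=N)

dy = np.diff(y, axis=1)                       # dy[:, j] = y_{j+1} - y_j
dep = dy[:, 1:]                               # Δy_t     for t = 2, ..., T
lag = dy[:, :-1]                              # Δy_{t-1} for t = 2, ..., T
inst = y[:, :-2]                              # y_{t-2}  for t = 2, ..., T

# OLS on first differences is inconsistent: Δy_{t-1} is correlated with Δε_t
rho_fd_ols = (lag * dep).sum() / (lag ** 2).sum()

# IV with y_{t-2} as the instrument for Δy_{t-1}
rho_iv = (inst * dep).sum() / (inst * lag).sum()
```

The instrument is valid here because y_{i,t-2} is determined before \varepsilon_{i,t-1} and \varepsilon_{it}, the two shocks that make up \Delta\varepsilon_{it}.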
9.3.2 Arellano-Bover/Blundell-Bond (System GMM)
Combines:
- First-differenced equations (with lagged levels as instruments)
- Level equations (with lagged differences as instruments)
More efficient than AB when \rho is close to 1.
10 Extensions and Advanced Topics
10.1 Two-Way Fixed Effects
10.1.1 Model with Time Fixed Effects
y_{it} = x_{it}'\beta + \alpha_i + \lambda_t + \varepsilon_{it}
where:
- \alpha_i = unit fixed effects
- \lambda_t = time fixed effects (common shocks)
10.1.2 Estimation
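Include both unit and time dummies, or double-demean: \ddot{y}_{it} = y_{it} - \bar{y}_i - \bar{y}_t + \bar{\bar{y}}, where \bar{y}_t is the cross-sectional mean in period t and \bar{\bar{y}} is the grand mean.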
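For a balanced panel the two routes give the same slope, which can be checked numerically. The sketch below assumes one regressor and a simulated design; one time dummy is dropped in the dummy-variable regression to avoid perfect collinearity with the unit dummies.

```python
import numpy as np

rng = np.random.default_rng(13)
N, T = 20, 6
alpha = rng.normal(size=N)                     # unit effects
lam = rng.normal(size=T)                       # time effects
x = rng.normal(size=(N, T)) + alpha[:, None] + lam[None, :]
y = 1.5 * x + alpha[:, None] + lam[None, :] + rng.normal(size=(N, T))

def double_demean(a):
    """Subtract unit means and period means, then add back the grand mean."""
    return a - a.mean(axis=1, keepdims=True) - a.mean(axis=0, keepdims=True) + a.mean()

xd, yd = double_demean(x), double_demean(y)
beta_dd = (xd * yd).sum() / (xd ** 2).sum()

# Dummy-variable route: x, N unit dummies, T - 1 time dummies
D_unit = np.kron(np.eye(N), np.ones((T, 1)))
D_time = np.kron(np.ones((N, 1)), np.eye(T))[:, 1:]
Z = np.hstack([x.reshape(-1, 1), D_unit, D_time])
beta_dummy = np.linalg.lstsq(Z, y.ravel(), rcond=None)[0][0]
```

The equality relies on the panel being balanced; with missing observations the simple double-demeaning formula no longer reproduces the dummy-variable estimate.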
10.1.3 When to Use
- Control for aggregate time shocks (recessions, policy changes)
- Standard in difference-in-differences applications
10.2 Unbalanced Panels
10.2.1 Missing Data Structure
Not all units observed in all time periods:
- T_i varies across units
- Total observations: n = \sum_{i=1}^N T_i
10.2.2 Implications
- Pooled OLS and RE: Straightforward to adapt
- FE: Still consistent, but demeaning uses T_i for each unit
- Hausman test may be affected
10.2.3 Handling
Most software handles automatically, but check:
- Missingness mechanism (MAR vs MNAR)
- Impact on variance estimates
10.3 Instrumental Variables in Panel Data
10.3.1 Model
y_{it} = x_{it}'\beta + z_{it}'\gamma + \alpha_i + \varepsilon_{it}
where z_{it} is endogenous: \text{Cov}(z_{it}, \varepsilon_{it}) \neq 0
10.3.2 FE-2SLS
- Demean all variables (including instruments)
- Apply 2SLS to demeaned data
10.3.3 Requires
Valid instruments w_{it} such that:
- Relevance: \text{Cov}(w_{it}, z_{it}) \neq 0
- Exogeneity: \text{Cov}(w_{it}, \varepsilon_{it}) = 0
11 Practical Implementation Guide
11.1 Model Selection Checklist
11.1.1 1. Preliminary Analysis
11.1.2 2. Estimate All Models
11.1.3 3. Run Specification Tests
11.1.4 4. Check Diagnostics
11.1.5 5. Robust Inference
11.1.6 6. Report Results
11.2 Common Pitfalls and How to Avoid Them
11.2.1 Pitfall 1: Using Pooled OLS When Panel Structure Exists
Problem: Biased estimates and wrong standard errors
Solution: Always test with F-test and BP-LM test
11.2.2 Pitfall 2: Using RE When \text{Cov}(\alpha_i, x_{it}) \neq 0
Problem: Inconsistent estimates
Solution:
- Run Hausman test
- Consider theoretical plausibility
- When in doubt, use FE
11.2.3 Pitfall 3: Ignoring Serial Correlation
Problem: Standard errors too small, over-rejection of hypotheses
Solution: Always use cluster-robust SEs
11.2.4 Pitfall 4: Trying to Estimate Time-Invariant Effects with FE
Problem: Cannot identify these coefficients
Solution: Use RE or pooled OLS if these variables are crucial
11.2.5 Pitfall 5: Not Checking Within Variation
Problem: FE may not be identified if no within variation
Solution: Check \text{Var}(\tilde{x}_{it}) > 0 before using FE
12 Comparison with Alternative Methods
12.1 Panel Data vs Cross-Sectional Regression
| Aspect | Cross-Section | Panel Data |
|---|---|---|
| Observations | N units, 1 time | N units, T times |
| Unobserved heterogeneity | Cannot control | FE/RE control |
| Efficiency | Lower (less data) | Higher |
| Dynamics | Cannot study | Can model |
| Identification | Weaker | Stronger |
12.2 Panel Data vs Time Series
| Aspect | Time Series | Panel Data |
|---|---|---|
| Units | 1 unit, T times | N units, T times |
| Asymptotics | T → ∞ | N → ∞ (usually) |
| Unobserved effects | Cannot separate from trend | Can separate |
| Degrees of freedom | Limited | Abundant |
13 Summary and Recommendations
13.1 When to Use Each Model
13.1.1 Use Pooled OLS When:
- No panel structure detected (F-test and BP-LM both fail to reject)
- Truly independent observations
- Time-invariant effects are crucial and plausibly exogenous
13.1.2 Use Fixed Effects When:
- \text{Cov}(\alpha_i, x_{it}) likely non-zero (most cases)
- Focus on within-unit changes
- Robustness is priority
- Don’t need time-invariant coefficients
13.1.3 Use Random Effects When:
- Hausman test fails to reject
- Need time-invariant coefficients
- Sample is random draw from population
- \text{Cov}(\alpha_i, x_{it}) = 0 is plausible
13.2 General Advice
- Default to FE for robustness in most economic applications
- Always use robust/clustered standard errors
- Run all three models and specification tests for comparison
- Consider the economics, not just the statistics
- Check diagnostics after selecting model
14 Mathematical Appendix
14.1 Kronecker Product Properties
For matrices A (m × n) and B (p × q):
A \otimes B = \begin{bmatrix} a_{11}B & a_{12}B & \cdots & a_{1n}B \\ a_{21}B & a_{22}B & \cdots & a_{2n}B \\ \vdots & \vdots & \ddots & \vdots \\ a_{m1}B & a_{m2}B & \cdots & a_{mn}B \end{bmatrix}_{mp \times nq}
Properties:
1. (A \otimes B)(C \otimes D) = (AC) \otimes (BD)
2. (A \otimes B)' = A' \otimes B'
3. (A \otimes B)^{-1} = A^{-1} \otimes B^{-1} (if invertible)
14.2 Matrix Differentiation Rules
For \beta as a column vector:
- \frac{\partial (a'\beta)}{\partial \beta} = a
- \frac{\partial (\beta'A\beta)}{\partial \beta} = (A + A')\beta
- \frac{\partial (y - X\beta)'(y - X\beta)}{\partial \beta} = -2X'(y - X\beta)
14.3 Projection Matrix Properties
For any orthogonal projection matrix P:
- Symmetric: P = P'
- Idempotent: P^2 = P
- Eigenvalues: Either 0 or 1
- Complement: M = I - P is also a projection matrix
15 References and Further Reading
15.1 Essential Textbooks
- Wooldridge, J. M. (2010). Econometric Analysis of Cross Section and Panel Data (2nd ed.). MIT Press.
- Comprehensive treatment of panel data methods
- Focuses on microeconometrics applications
- Baltagi, B. H. (2021). Econometric Analysis of Panel Data (6th ed.). Springer.
- Detailed mathematical exposition
- Covers advanced topics
- Hsiao, C. (2014). Analysis of Panel Data (3rd ed.). Cambridge University Press.
- Theoretical foundations
- Economic applications
15.2 Software Resources
- Stata: xtreg, xttest0, hausman, xtserial
- R: plm package, lfe package for high-dimensional FE
- Python: linearmodels package
- EViews: Built-in panel data procedures
15.3 Online Resources
- Econometrics Academy (YouTube channel)
- NBER Summer Institute lectures
- World Bank’s Impact Evaluation resources
16 Glossary of Terms
Balanced panel: Every unit observed in every time period
Composite error: u_{it} = \alpha_i + \varepsilon_{it}
Demeaning: Subtracting group means from variables
Endogeneity: Correlation between regressor and error term
Fixed effect: Unit-specific intercept treated as parameter
Idiosyncratic error: Time-varying, unit-specific shock \varepsilon_{it}
Intraclass correlation: Correlation of observations within same unit
Panel data: Dataset with both cross-sectional and time dimensions
Quasi-demeaning: Partial demeaning with factor 0 < \theta < 1
Random effect: Unit-specific term treated as random draw
Strict exogeneity: E[\varepsilon_{it} \mid X_i, \alpha_i] = 0 for all t
Unbalanced panel: Units observed different numbers of times
Within transformation: Demeaning data to remove fixed effects