Panel Data Models: A Comprehensive Guide

Pooled OLS, Fixed Effects, and Random Effects Estimators

Author

Oliver James

Published

December 2, 2025

1 Introduction and Setup

1.1 What is Panel Data?

Panel data (also called longitudinal data) combines cross-sectional and time-series dimensions. We observe multiple entities (individuals, firms, countries, regions) over multiple time periods.

Key advantages:

  • Controls for unobserved heterogeneity
  • More degrees of freedom and efficiency
  • Can study dynamics and causality better than pure cross-section
  • Reduces collinearity between variables

1.2 Notation and Data Structure

1.2.1 Basic Setup

Consider a balanced panel with:

  • N = number of cross-sectional units (e.g., regions, individuals)
  • T = number of time periods
  • n = NT = total number of observations

1.2.2 Variables

  • y_{it} — dependent variable for unit i at time t
  • x_{it}, a (K \times 1) vector of explanatory variables (regressors)
  • \beta, a (K \times 1) parameter vector (coefficients of interest)
  • \alpha_i — individual-specific effect (unobserved heterogeneity)
  • \varepsilon_{it} — idiosyncratic error term (time-varying shock)

1.2.3 Stacking Convention

We stack observations in a specific order: all time periods for unit 1, then all time periods for unit 2, etc.

Stacked dependent variable: y = \begin{bmatrix} y_1 \\ y_2 \\ \vdots \\ y_N \end{bmatrix}_{NT \times 1}, \quad \text{where} \quad y_i = \begin{bmatrix} y_{i1} \\ y_{i2} \\ \vdots \\ y_{iT} \end{bmatrix}_{T \times 1}

Stacked regressor matrix: X = \begin{bmatrix} X_1 \\ X_2 \\ \vdots \\ X_N \end{bmatrix}_{NT \times K}, \quad \text{where} \quad X_i = \begin{bmatrix} x_{i1}' \\ x_{i2}' \\ \vdots \\ x_{iT}' \end{bmatrix}_{T \times K}

1.2.4 Useful Matrix Notation

  • I_N, the (N \times N) identity matrix
  • I_T, the (T \times T) identity matrix
  • \mathbf{1}_T, the (T \times 1) vector of ones
  • J_T = \mathbf{1}_T \mathbf{1}_T' / T, the (T \times T) averaging matrix
  • \otimes — Kronecker product operator

2 Pooled OLS Model

2.1 The Basic Idea

Pooled OLS treats all observations as if they were independent, ignoring the panel structure. It assumes no unobserved heterogeneity across units.

2.2 Model Specification

2.2.1 Scalar Form

For unit i at time t: y_{it} = x_{it}'\beta + u_{it}, \quad i=1,\ldots,N, \quad t=1,\ldots,T

where u_{it} is a composite error term that pools all unobserved effects.

2.2.2 Matrix Form

Stacking all observations: y = X\beta + u

where:

  • y is (NT \times 1)
  • X is (NT \times K)
  • \beta is (K \times 1)
  • u is (NT \times 1)

2.3 Pooled OLS Estimator

2.3.1 Derivation

Minimize the sum of squared residuals: \min_{\beta} \quad (y - X\beta)'(y - X\beta)

First-order condition: -2X'(y - X\hat{\beta}_P) = 0

Solving for \hat{\beta}_P: X'X\hat{\beta}_P = X'y

2.3.2 The Estimator

\boxed{\hat{\beta}_P = (X'X)^{-1}X'y}

This is just standard OLS applied to the entire stacked dataset.
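As a minimal sketch of the formula above, the following NumPy snippet computes \hat{\beta}_P on a simulated balanced panel. The panel dimensions, coefficient values, and the helper name pooled_ols are illustrative assumptions, not part of the guide.

```python
import numpy as np

def pooled_ols(X, y):
    """Return the pooled OLS coefficients (X'X)^{-1} X'y."""
    # Solve the normal equations X'X b = X'y instead of forming the inverse.
    return np.linalg.solve(X.T @ X, X.T @ y)

# Hypothetical balanced panel, stacked unit by unit (all T rows of unit 1 first).
rng = np.random.default_rng(0)
N, T, K = 50, 4, 2
X = rng.normal(size=(N * T, K))
beta_true = np.array([1.0, -0.5])
y = X @ beta_true + rng.normal(scale=0.1, size=N * T)

beta_p = pooled_ols(X, y)
print(beta_p)
```

Solving the normal equations directly is numerically safer than inverting X'X explicitly, although both express the same estimator.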

2.4 Assumptions

2.4.1 Classical Assumptions

  1. Strict exogeneity: E[u_{it} \mid X] = 0
  2. Homoskedasticity: \text{Var}(u_{it}) = \sigma_u^2 for all i,t
  3. No serial correlation: \text{Cov}(u_{it}, u_{is}) = 0 for t \neq s
  4. No cross-sectional correlation: \text{Cov}(u_{it}, u_{jt}) = 0 for i \neq j

2.4.2 Variance-Covariance Matrix

Under these assumptions: \text{Var}(u) = \sigma_u^2 I_{NT}

where I_{NT} is the (NT \times NT) identity matrix.

2.5 Variance of the Estimator

Under classical assumptions: \text{Var}(\hat{\beta}_P \mid X) = \sigma_u^2 (X'X)^{-1}

Estimation of \sigma_u^2: \hat{\sigma}_u^2 = \frac{\hat{u}'\hat{u}}{NT - K} = \frac{\sum_{i=1}^N \sum_{t=1}^T \hat{u}_{it}^2}{NT - K}

2.6 When Does Pooled OLS Work?

2.6.1 Consistency Condition

Pooled OLS is consistent if: E[u_{it} \mid X] = 0

This requires: No unobserved unit-specific effects that correlate with X.

2.6.2 Why Pooled OLS Usually Fails

In reality, u_{it} often contains unobserved individual effects: u_{it} = \alpha_i + \varepsilon_{it}

If \alpha_i is correlated with x_{it}, then: E[u_{it} \mid X] = E[\alpha_i \mid X] \neq 0

This causes omitted variable bias.

2.6.3 Example: Returns to Education

Model: \ln(\text{wage}_{it}) = \beta_0 + \beta_1 \text{education}_{it} + u_{it}

Problem: u_{it} contains unobserved ability \alpha_i

  • High ability individuals may get more education
  • \text{Cov}(\text{education}_{it}, \alpha_i) > 0
  • Pooled OLS overestimates returns to education
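The upward bias can be illustrated with a small simulation. This is only a sketch: the data-generating process, the parameter values, and the variable names (ability, educ, log_wage) are invented for illustration and are not estimates from real wage data.

```python
import numpy as np

rng = np.random.default_rng(1)
N, T = 2000, 5
beta_1 = 0.10                          # true coefficient on education (assumed)

ability = rng.normal(size=N)           # alpha_i, unobserved by the researcher
alpha = np.repeat(ability, T)          # stacked unit by unit

# Education loads positively on ability, so Cov(education_it, alpha_i) > 0.
educ = 12 + 2.0 * alpha + rng.normal(size=N * T)
log_wage = 1.0 + beta_1 * educ + 0.3 * alpha + rng.normal(scale=0.2, size=N * T)

# Pooled OLS of log wage on a constant and education, omitting ability.
X = np.column_stack([np.ones(N * T), educ])
b_pooled = np.linalg.solve(X.T @ X, X.T @ log_wage)
print("true:", beta_1, "pooled OLS:", b_pooled[1])
```

Under these assumed values the pooled slope absorbs part of the ability effect, so it lands well above the true coefficient.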

2.7 Advantages and Disadvantages

2.7.1 Advantages

  • Simple to compute
  • Efficient if assumptions hold
  • Can include time-invariant variables

2.7.2 Disadvantages

  • Biased and inconsistent if \alpha_i exists and correlates with X
  • Ignores panel structure
  • Standard errors wrong if serial correlation or heteroskedasticity present

3 Fixed Effects Model

3.1 The Core Insight

Fixed effects allows each unit to have its own intercept \alpha_i, which can be correlated with the regressors. This controls for all time-invariant unobserved heterogeneity.

3.2 Model Specification

3.2.1 Scalar Form

y_{it} = x_{it}'\beta + \alpha_i + \varepsilon_{it}

where:

  • \alpha_i = individual-specific effect (time-invariant)
  • \varepsilon_{it} = idiosyncratic error (time-varying)

3.2.2 Decomposition of Error

u_{it} = \alpha_i + \varepsilon_{it}

Key difference from pooled OLS: We explicitly model \alpha_i and allow \text{Cov}(\alpha_i, x_{it}) \neq 0.

3.2.3 Matrix Form (LSDV)

Stack all observations and include unit dummies: y = X\beta + D\alpha + \varepsilon

where:

  • D is the (NT \times N) matrix of unit dummy variables
  • \alpha = (\alpha_1, \ldots, \alpha_N)' is the (N \times 1) vector of fixed effects

3.2.4 Structure of the Dummy Matrix D

D = I_N \otimes \mathbf{1}_T = \begin{bmatrix} \mathbf{1}_T & 0 & \cdots & 0 \\ 0 & \mathbf{1}_T & \cdots & 0 \\ \vdots & \vdots & \ddots & \vdots \\ 0 & 0 & \cdots & \mathbf{1}_T \end{bmatrix}_{NT \times N}

For unit 1, the first T rows have 1 in column 1 and 0 elsewhere, etc.

3.3 Two Equivalent Estimators

3.3.1 LSDV (Least Squares Dummy Variable) Estimator

Run OLS on the augmented model: \begin{bmatrix} \hat{\beta}_{LSDV} \\ \hat{\alpha} \end{bmatrix} = \left( \begin{bmatrix} X & D \end{bmatrix}' \begin{bmatrix} X & D \end{bmatrix} \right)^{-1} \begin{bmatrix} X & D \end{bmatrix}' y

Problem: Computationally expensive when N is large (adds N dummy variables).

3.3.2 Within (Demeaned) Estimator

Computationally cheaper approach: Transform the data to eliminate \alpha_i.

3.3.2.1 Unit-Specific Time Averages

For each unit i, compute: \bar{y}_i = \frac{1}{T}\sum_{t=1}^T y_{it}, \quad \bar{x}_i = \frac{1}{T}\sum_{t=1}^T x_{it}

3.3.2.2 Taking Deviations from Means

Average the model over time: \bar{y}_i = \bar{x}_i'\beta + \alpha_i + \bar{\varepsilon}_i

Subtract this from the original equation: y_{it} - \bar{y}_i = (x_{it} - \bar{x}_i)'\beta + (\varepsilon_{it} - \bar{\varepsilon}_i)

Notice: The fixed effect \alpha_i cancels out! This is the “within transformation.”

3.3.2.3 Demeaned Variables

Define: \tilde{y}_{it} = y_{it} - \bar{y}_i, \quad \tilde{x}_{it} = x_{it} - \bar{x}_i, \quad \tilde{\varepsilon}_{it} = \varepsilon_{it} - \bar{\varepsilon}_i

The within-transformed model: \tilde{y}_{it} = \tilde{x}_{it}'\beta + \tilde{\varepsilon}_{it}

Apply OLS to this demeaned data.

3.3.3 Matrix Form of Within Transformation

Define the demeaning matrix for each unit: M_T = I_T - \frac{1}{T}\mathbf{1}_T\mathbf{1}_T' = I_T - J_T

This matrix, when applied to any (T \times 1) vector, subtracts its mean from each element.

For the full panel: M = I_N \otimes M_T = I_{NT} - I_N \otimes J_T

Properties of M:

  • Symmetric: M = M'
  • Idempotent: M^2 = M (key for projection matrices)
  • M removes unit-specific means

3.3.4 Apply the Transformation

\tilde{y} = My, \quad \tilde{X} = MX

Note that M \cdot D = 0 (the demeaning matrix eliminates all unit dummy variables).
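The properties of M can be checked numerically. The sketch below builds J_T, M_T, M, and D with Kronecker products for a small hypothetical panel (N = 3, T = 4) and confirms that applying M is the same as subtracting unit means.

```python
import numpy as np

N, T = 3, 4
ones_T = np.ones((T, 1))
J_T = ones_T @ ones_T.T / T            # averaging matrix
M_T = np.eye(T) - J_T                  # demeaning matrix for one unit
M = np.kron(np.eye(N), M_T)            # demeaning matrix for the full panel
D = np.kron(np.eye(N), ones_T)         # unit dummies, (NT x N)

rng = np.random.default_rng(0)
y = rng.normal(size=N * T)             # stacked unit by unit
y_tilde = M @ y
unit_means = y.reshape(N, T).mean(axis=1)

print(np.allclose(M, M.T), np.allclose(M @ M, M), np.allclose(M @ D, 0.0))
```

In practice one would demean group by group rather than build the (NT \times NT) matrix, which is only feasible for small panels.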

3.4 Fixed Effects Estimator

3.4.1 The Within Estimator Formula

\boxed{\hat{\beta}_{FE} = (\tilde{X}'\tilde{X})^{-1}\tilde{X}'\tilde{y} = (X'MX)^{-1}X'My}

This is OLS on demeaned data.

3.4.2 Relationship to LSDV

Theorem: \hat{\beta}_{FE} = \hat{\beta}_{LSDV}

Both approaches give identical coefficient estimates, but the within estimator is computationally much faster.

3.4.3 Recovering the Fixed Effects

After estimating \hat{\beta}_{FE}, we can recover: \hat{\alpha}_i = \bar{y}_i - \bar{x}_i'\hat{\beta}_{FE}
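The equivalence of the two estimators, and the recovery of the fixed effects from unit means, can be verified on simulated data. This is a sketch under assumed dimensions and coefficients; the helper demean is a hypothetical name.

```python
import numpy as np

rng = np.random.default_rng(2)
N, T, K = 30, 5, 2
alpha = rng.normal(size=N)
# Regressors are deliberately correlated with alpha_i.
X = rng.normal(size=(N * T, K)) + np.repeat(alpha, T)[:, None]
beta_true = np.array([0.5, -1.0])
y = X @ beta_true + np.repeat(alpha, T) + rng.normal(scale=0.5, size=N * T)

# LSDV: OLS on [X, D] with one dummy per unit and no common intercept.
D = np.kron(np.eye(N), np.ones((T, 1)))
Z = np.hstack([X, D])
coef_lsdv = np.linalg.solve(Z.T @ Z, Z.T @ y)
beta_lsdv, alpha_lsdv = coef_lsdv[:K], coef_lsdv[K:]

def demean(a):
    """Subtract unit-specific time means; rows are stacked unit by unit."""
    shaped = a.reshape(N, T, -1)
    return (shaped - shaped.mean(axis=1, keepdims=True)).reshape(a.shape)

# Within estimator: OLS on demeaned data.
X_til, y_til = demean(X), demean(y)
beta_fe = np.linalg.solve(X_til.T @ X_til, X_til.T @ y_til)

# Recover the fixed effects from unit means.
alpha_fe = y.reshape(N, T).mean(axis=1) - X.reshape(N, T, K).mean(axis=1) @ beta_fe

print(np.allclose(beta_fe, beta_lsdv), np.allclose(alpha_fe, alpha_lsdv))
```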

3.5 Assumptions

3.5.1 Key Assumptions for FE

  1. Strict exogeneity (conditional): E[\varepsilon_{it} \mid X_i, \alpha_i] = 0
  2. Homoskedasticity: \text{Var}(\varepsilon_{it}) = \sigma_\varepsilon^2
  3. No serial correlation: \text{Cov}(\varepsilon_{it}, \varepsilon_{is}) = 0 for t \neq s
  4. No perfect collinearity in \tilde{X}

Crucially: We do NOT require \text{Cov}(\alpha_i, x_{it}) = 0. This is the main advantage!

3.6 Variance of the FE Estimator

\text{Var}(\hat{\beta}_{FE} \mid X, \alpha) = \sigma_\varepsilon^2 (X'MX)^{-1}

Estimation of \sigma_\varepsilon^2: \hat{\sigma}_\varepsilon^2 = \frac{\tilde{\varepsilon}'\tilde{\varepsilon}}{NT - N - K} = \frac{\sum_{i=1}^N\sum_{t=1}^T \tilde{\varepsilon}_{it}^2}{NT - N - K}

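Degrees of freedom: NT - N - K, because N degrees of freedom are absorbed by the fixed effects (the unit means) and K by the slopes. Here \tilde{\varepsilon} denotes the residuals from the within regression. OLS software run naively on demeaned data divides by NT - K instead, which understates the standard errors.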

3.7 Interpretation and Properties

3.7.1 What FE Identifies

Fixed effects uses within-unit variation over time. With a single regressor, the estimator is the ratio of sample moments of the demeaned data: \hat{\beta}_{FE} = \text{Cov}(\tilde{x}_{it}, \tilde{y}_{it}) / \text{Var}(\tilde{x}_{it})

It answers: “When x_{it} changes within unit i over time, how does y_{it} change?”

3.7.2 What FE Cannot Identify

Time-invariant variables are eliminated by demeaning:

  • If x_{it} = x_i (no time variation), then \tilde{x}_{it} = x_i - x_i = 0
  • Cannot estimate effects of gender, race, country of birth, etc.

3.7.3 R^2 in Fixed Effects

Three types of R^2:

  1. Within R^2: Fit of demeaned model R^2_{\text{within}} = 1 - \frac{\sum \tilde{\varepsilon}_{it}^2}{\sum \tilde{y}_{it}^2}

  2. Between R^2: Fit of unit means R^2_{\text{between}} = 1 - \frac{\sum (\bar{y}_i - \bar{x}_i'\hat{\beta}_{FE})^2}{\sum (\bar{y}_i - \bar{y})^2}

  3. Overall R^2: Total fit including fixed effects

3.8 Advantages and Disadvantages

3.8.1 Advantages

  • Consistent even when \text{Cov}(\alpha_i, x_{it}) \neq 0
  • Eliminates omitted variable bias from time-invariant factors
  • Natural for policy evaluation (before-after comparisons)
  • No distributional assumptions on \alpha_i

3.8.2 Disadvantages

  • Cannot estimate time-invariant effects
  • Less efficient than RE if \text{Cov}(\alpha_i, x_{it}) = 0 actually holds
  • May exacerbate measurement error in x_{it}
  • Requires sufficient within-unit variation

4 Random Effects Model

4.1 The Core Idea

Random effects treats \alpha_i as a random variable drawn from a distribution, uncorrelated with the regressors. This allows more efficient estimation than FE.

4.2 Model Specification

4.2.1 Scalar Form

y_{it} = x_{it}'\beta + \alpha_i + \varepsilon_{it}

Same structure as FE, but with different assumptions on \alpha_i.

4.2.2 Composite Error Structure

u_{it} = \alpha_i + \varepsilon_{it}

where:

  • \alpha_i \sim (0, \sigma_\alpha^2): random individual effect
  • \varepsilon_{it} \sim (0, \sigma_\varepsilon^2): idiosyncratic error
  • \text{Cov}(\alpha_i, \varepsilon_{jt}) = 0 for all i,j,t
  • KEY ASSUMPTION: \text{Cov}(\alpha_i, x_{it}) = 0 (orthogonality)

4.3 Variance-Covariance Structure

4.3.1 Variance of Composite Error

\text{Var}(u_{it}) = \text{Var}(\alpha_i) + \text{Var}(\varepsilon_{it}) = \sigma_\alpha^2 + \sigma_\varepsilon^2

4.3.2 Serial Correlation Within Units

For the same unit i at different times: \text{Cov}(u_{it}, u_{is}) = \text{Cov}(\alpha_i + \varepsilon_{it}, \alpha_i + \varepsilon_{is}) = \sigma_\alpha^2

This creates positive serial correlation within units.

4.3.3 Intraclass Correlation Coefficient

The correlation between any two observations from the same unit: \rho = \frac{\sigma_\alpha^2}{\sigma_\alpha^2 + \sigma_\varepsilon^2} = \frac{\sigma_\alpha^2}{\sigma_u^2}

This measures the fraction of total variance due to unit-specific effects.

4.3.4 Variance-Covariance Matrix for Unit i

\text{Var}(u_i) = \Omega_i = \sigma_\varepsilon^2 I_T + \sigma_\alpha^2 \mathbf{1}_T\mathbf{1}_T' = \sigma_\varepsilon^2 I_T + \sigma_\alpha^2 J_T \cdot T

In expanded form: \Omega_i = \begin{bmatrix} \sigma_\alpha^2 + \sigma_\varepsilon^2 & \sigma_\alpha^2 & \cdots & \sigma_\alpha^2 \\ \sigma_\alpha^2 & \sigma_\alpha^2 + \sigma_\varepsilon^2 & \cdots & \sigma_\alpha^2 \\ \vdots & \vdots & \ddots & \vdots \\ \sigma_\alpha^2 & \sigma_\alpha^2 & \cdots & \sigma_\alpha^2 + \sigma_\varepsilon^2 \end{bmatrix}_{T \times T}

Structure: Constant variance on diagonal, constant covariance off-diagonal (equicorrelation).
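The equicorrelated structure can be made concrete with a short sketch. The variance components below (\sigma_\alpha^2 = 2, \sigma_\varepsilon^2 = 1) and the panel size are arbitrary illustrative values.

```python
import numpy as np

T = 4
sigma_alpha2, sigma_eps2 = 2.0, 1.0    # hypothetical variance components
ones_T = np.ones((T, 1))

# Omega_i = sigma_eps^2 I_T + sigma_alpha^2 1_T 1_T'
Omega_i = sigma_eps2 * np.eye(T) + sigma_alpha2 * (ones_T @ ones_T.T)

# Convert to a correlation matrix: every off-diagonal entry equals rho.
sd = np.sqrt(np.diag(Omega_i))
corr = Omega_i / np.outer(sd, sd)
rho = sigma_alpha2 / (sigma_alpha2 + sigma_eps2)

# Units are independent of each other, so the full matrix is block diagonal.
N = 3
Omega = np.kron(np.eye(N), Omega_i)
print(corr[0, 1], rho, Omega.shape)
```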

4.3.5 Full Panel Variance-Covariance Matrix

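\Omega = \text{Var}(u) = I_N \otimes \Omega_i = \sigma_\varepsilon^2 I_{NT} + T\sigma_\alpha^2 (I_N \otimes J_T)

The factor T appears because J_T = \mathbf{1}_T\mathbf{1}_T'/T, so \sigma_\alpha^2 \mathbf{1}_T\mathbf{1}_T' = T\sigma_\alpha^2 J_T.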

4.4 Random Effects Estimator

4.4.1 GLS (Generalized Least Squares) Approach

Since \Omega \neq \sigma^2 I, OLS is inefficient. The efficient estimator is GLS:

\boxed{\hat{\beta}_{RE} = (X'\Omega^{-1}X)^{-1}X'\Omega^{-1}y}

Problem: This requires knowing \Omega (which depends on \sigma_\alpha^2 and \sigma_\varepsilon^2).

4.4.2 Feasible GLS (FGLS)

In practice:

  1. Estimate \sigma_\alpha^2 and \sigma_\varepsilon^2 from data
  2. Construct \hat{\Omega}
  3. Use FGLS: \hat{\beta}_{RE} = (X'\hat{\Omega}^{-1}X)^{-1}X'\hat{\Omega}^{-1}y

4.4.3 Quasi-Demeaning Transformation (Practical Implementation)

Rather than compute \Omega^{-1} directly, RE can be implemented via partial demeaning.

4.4.3.1 The Transformation

Define the quasi-demeaning factor: \theta = 1 - \sqrt{\frac{\sigma_\varepsilon^2}{\sigma_\varepsilon^2 + T\sigma_\alpha^2}}

Transform the data: y_{it}^* = y_{it} - \theta\bar{y}_i, \quad x_{it}^* = x_{it} - \theta\bar{x}_i

Random effects estimator: \hat{\beta}_{RE} = \left(\sum_{i=1}^N\sum_{t=1}^T x_{it}^* x_{it}^{*'}\right)^{-1} \left(\sum_{i=1}^N\sum_{t=1}^T x_{it}^* y_{it}^*\right)

This is just OLS on quasi-demeaned data.
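That OLS on quasi-demeaned data reproduces GLS can be checked numerically. The sketch below treats the variance components as known (an assumption made to isolate the algebra; in practice they are estimated) and compares direct GLS with the quasi-demeaning shortcut on simulated data.

```python
import numpy as np

rng = np.random.default_rng(3)
N, T, K = 40, 5, 2
sigma_alpha2, sigma_eps2 = 1.5, 1.0    # treated as known in this sketch

X = np.column_stack([np.ones(N * T), rng.normal(size=(N * T, K - 1))])
alpha = np.repeat(rng.normal(scale=np.sqrt(sigma_alpha2), size=N), T)
y = X @ np.array([1.0, 2.0]) + alpha + rng.normal(scale=np.sqrt(sigma_eps2), size=N * T)

# Direct GLS with the block-diagonal Omega^{-1}.
ones_T = np.ones((T, 1))
Omega_i = sigma_eps2 * np.eye(T) + sigma_alpha2 * (ones_T @ ones_T.T)
Omega_inv = np.kron(np.eye(N), np.linalg.inv(Omega_i))
beta_gls = np.linalg.solve(X.T @ Omega_inv @ X, X.T @ Omega_inv @ y)

# Quasi-demeaning: subtract theta times the unit mean, then run OLS.
theta = 1.0 - np.sqrt(sigma_eps2 / (sigma_eps2 + T * sigma_alpha2))

def quasi_demean(a):
    """Subtract theta * unit mean; rows are stacked unit by unit."""
    shaped = a.reshape(N, T, -1)
    return (shaped - theta * shaped.mean(axis=1, keepdims=True)).reshape(a.shape)

X_star, y_star = quasi_demean(X), quasi_demean(y)
beta_qd = np.linalg.solve(X_star.T @ X_star, X_star.T @ y_star)

print(np.allclose(beta_gls, beta_qd))
```

The two agree because (I_T - \theta J_T) is proportional to \Omega_i^{-1/2}; note that the constant column is quasi-demeaned too, becoming 1 - \theta.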

4.4.3.2 Understanding \theta

Interpretation: \theta determines how much of the unit mean to subtract.

0 \leq \theta \leq 1

Special cases:

  1. If \theta = 0: y_{it}^* = y_{it} → No demeaning → Pooled OLS
    • Occurs when \sigma_\alpha^2 = 0 (no random effects)
  2. If \theta = 1: y_{it}^* = y_{it} - \bar{y}_i → Full demeaning → Fixed Effects
    • Occurs when \sigma_\alpha^2 \to \infty or T \to \infty
  3. If 0 < \theta < 1: Partial demeaning → True Random Effects
    • Weighted average of pooled OLS and FE
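The special cases can be seen by evaluating \theta directly. The function name quasi_demeaning_factor and the input values below are illustrative assumptions.

```python
import numpy as np

def quasi_demeaning_factor(sigma_eps2, sigma_alpha2, T):
    """theta = 1 - sqrt(sigma_eps^2 / (sigma_eps^2 + T * sigma_alpha^2))."""
    return 1.0 - np.sqrt(sigma_eps2 / (sigma_eps2 + T * sigma_alpha2))

# No unit effects: theta is exactly 0, so RE collapses to pooled OLS.
theta_no_effects = quasi_demeaning_factor(1.0, 0.0, T=5)

# Equal variance components with T = 5: theta = 1 - sqrt(1/6), about 0.59.
theta_typical = quasi_demeaning_factor(1.0, 1.0, T=5)

# Very long panel: theta approaches 1, so RE approaches FE.
theta_long_panel = quasi_demeaning_factor(1.0, 1.0, T=10_000)

print(theta_no_effects, theta_typical, theta_long_panel)
```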

4.4.3.3 Why Partial Demeaning?

Full demeaning (FE) removes all between-unit variation. RE uses both:

  • Within variation: Changes over time within units
  • Between variation: Differences across units

By only partially demeaning, RE preserves some between-unit information while still accounting for the serial correlation induced by \alpha_i.

4.5 Estimating Variance Components

Several methods to estimate \sigma_\alpha^2 and \sigma_\varepsilon^2:

4.5.1 Method 1: From Fixed Effects and Pooled OLS Residuals

  1. Estimate FE model, get \hat{\sigma}_\varepsilon^2
  2. Estimate pooled OLS, get \hat{\sigma}_u^2
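  3. Calculate: \hat{\sigma}_\alpha^2 = \hat{\sigma}_u^2 - \hat{\sigma}_\varepsilon^2, which follows from \text{Var}(u_{it}) = \sigma_\alpha^2 + \sigma_\varepsilon^2 (set the estimate to zero if the difference is negative)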

4.5.2 Method 2: ANOVA-type Estimator

  1. Run fixed effects, compute \hat{\sigma}_\varepsilon^2 from within residuals
  2. Compute between variation from unit means
  3. Estimate \hat{\sigma}_\alpha^2 from between-group sum of squares
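One common ANOVA-type variant (in the style of Swamy and Arora) can be sketched as follows. It relies on the fact that the residual variance of the between regression on unit means estimates \sigma_\alpha^2 + \sigma_\varepsilon^2/T. The simulated data, the true variance components, and the variable names are assumptions made for illustration.

```python
import numpy as np

rng = np.random.default_rng(4)
N, T, K = 1000, 5, 1
sigma_alpha2_true, sigma_eps2_true = 2.0, 1.0   # assumed for the simulation
x = rng.normal(size=(N, T))
alpha = rng.normal(scale=np.sqrt(sigma_alpha2_true), size=(N, 1))
y = 1.0 + 0.5 * x + alpha + rng.normal(scale=np.sqrt(sigma_eps2_true), size=(N, T))

# Step 1: within regression, residual variance with NT - N - K degrees of freedom.
x_til = x - x.mean(axis=1, keepdims=True)
y_til = y - y.mean(axis=1, keepdims=True)
beta_fe = (x_til * y_til).sum() / (x_til ** 2).sum()
resid_w = y_til - beta_fe * x_til
sigma_eps2_hat = (resid_w ** 2).sum() / (N * T - N - K)

# Step 2: between regression of unit means on a constant and mean regressors.
Xb = np.column_stack([np.ones(N), x.mean(axis=1)])
yb = y.mean(axis=1)
coef_b = np.linalg.lstsq(Xb, yb, rcond=None)[0]
resid_b = yb - Xb @ coef_b
sigma_b2_hat = (resid_b ** 2).sum() / (N - K - 1)

# Step 3: back out sigma_alpha^2, truncating at zero.
sigma_alpha2_hat = max(sigma_b2_hat - sigma_eps2_hat / T, 0.0)
print(sigma_eps2_hat, sigma_alpha2_hat)
```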

4.5.3 Method 3: Maximum Likelihood

Assume normality and maximize: \mathcal{L}(\beta, \sigma_\alpha^2, \sigma_\varepsilon^2) = \prod_{i=1}^N f(y_i \mid X_i; \beta, \Omega_i)

4.6 Variance of the RE Estimator

Under RE assumptions: \text{Var}(\hat{\beta}_{RE} \mid X) = (X'\Omega^{-1}X)^{-1}

This is smaller than (more efficient than) FE variance when the RE assumption \text{Cov}(\alpha_i, x_{it}) = 0 holds.

4.7 Assumptions

4.7.1 Critical RE Assumptions

  1. Orthogonality: E[\alpha_i \mid X_i] = 0 (which implies \text{Cov}(\alpha_i, x_{it}) = 0)
  2. Random effects distribution: \alpha_i \sim (0, \sigma_\alpha^2)
  3. Idiosyncratic errors: \varepsilon_{it} \sim (0, \sigma_\varepsilon^2)
  4. Strict exogeneity: E[\varepsilon_{it} \mid X_i, \alpha_i] = 0
  5. No correlation: \text{Cov}(\alpha_i, \varepsilon_{jt}) = 0 for all i,j,t

Assumption 1 is the most restrictive and differentiates RE from FE.

4.8 Advantages and Disadvantages

4.8.1 Advantages

  • More efficient than FE when assumptions hold
  • Can estimate coefficients on time-invariant variables
  • Uses both within and between variation
  • Better for generalization to population
  • Computationally simpler than LSDV

4.8.2 Disadvantages

  • Inconsistent if \text{Cov}(\alpha_i, x_{it}) \neq 0
  • Requires strong orthogonality assumption
  • Sensitive to model specification
  • Less robust than FE

5 Comparing the Three Models

5.1 Summary Table

Aspect | Pooled OLS | Fixed Effects | Random Effects
Treatment of \alpha_i | Ignored | Fixed parameters | Random draws
Estimator formula | (X'X)^{-1}X'y | (X'MX)^{-1}X'My | (X'\Omega^{-1}X)^{-1}X'\Omega^{-1}y
Demeaning | None | Full (\theta=1) | Partial (0<\theta<1)
Consistency requires | \text{Cov}(\alpha_i, x_{it})=0 | None (allows correlation) | \text{Cov}(\alpha_i, x_{it})=0
Time-invariant X | Can estimate | Cannot estimate | Can estimate
Efficiency | Low (if panel structure) | Medium | High (if assumptions hold)
Variation used | Total | Within only | Within + Between
Robustness | Low | High | Medium

5.2 Relationships Between Estimators

5.2.1 Nested Structure

\text{Pooled OLS} \xleftarrow[\theta=0]{\text{special case}} \text{Random Effects} \xrightarrow[\theta=1]{\text{special case}} \text{Fixed Effects}

Both pooled OLS and FE are limiting cases of the quasi-demeaned RE estimator.

5.2.2 Algebraic Relationships

  1. FE and RE converge as T \to \infty: \lim_{T \to \infty} \theta = 1 \implies \hat{\beta}_{RE} \to \hat{\beta}_{FE}

  2. RE is between pooled and FE: \hat{\beta}_{RE} = \lambda \hat{\beta}_{FE} + (1-\lambda)\hat{\beta}_{P} for some 0 < \lambda < 1 (informally)

5.3 Bias-Efficiency Trade-off

5.3.1 When \text{Cov}(\alpha_i, x_{it}) = 0 (RE assumption holds)

  • Pooled OLS: Consistent but inefficient (ignores serial correlation)
  • FE: Consistent but inefficient (throws away between variation)
  • RE: Consistent AND efficient ✓

5.3.2 When \text{Cov}(\alpha_i, x_{it}) \neq 0 (RE assumption fails)

  • Pooled OLS: Biased and inconsistent ✗
  • FE: Consistent ✓
  • RE: Biased and inconsistent ✗

Lesson: If in doubt, use FE for robustness.


6 Statistical Tests for Model Selection

6.1 Test 1: F-test for Fixed Effects vs Pooled OLS

6.1.1 Null and Alternative Hypotheses

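H_0: \alpha_1 = \alpha_2 = \cdots = \alpha_N \quad (\text{Pooled OLS with a common intercept is adequate})

H_1: \alpha_i \neq \alpha_j \text{ for some } i \neq j \quad (\text{Fixed effects needed})

The null imposes N - 1 restrictions, because the pooled model keeps one common intercept.

6.1.2 Test Statistic

F = \frac{(SSR_P - SSR_{FE})/(N-1)}{SSR_{FE}/(NT - N - K)} \sim F(N-1, NT-N-K)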

where:

  • SSR_P = sum of squared residuals from pooled OLS
  • SSR_{FE} = sum of squared residuals from fixed effects
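A sketch of the computation on simulated data follows. It assumes the pooled regression includes a common intercept, so the numerator uses N - 1 restrictions, and it takes K to count slope coefficients only. All data and names are illustrative.

```python
import numpy as np

rng = np.random.default_rng(5)
N, T, K = 100, 6, 1
alpha = rng.normal(size=(N, 1))        # unit effects are present by construction
x = rng.normal(size=(N, T))
y = 0.5 * x + alpha + rng.normal(size=(N, T))

# Restricted model: pooled OLS with a common intercept.
Xp = np.column_stack([np.ones(N * T), x.ravel()])
bp = np.linalg.lstsq(Xp, y.ravel(), rcond=None)[0]
ssr_p = ((y.ravel() - Xp @ bp) ** 2).sum()

# Unrestricted model: within (fixed effects) regression.
x_til = x - x.mean(axis=1, keepdims=True)
y_til = y - y.mean(axis=1, keepdims=True)
b_fe = (x_til * y_til).sum() / (x_til ** 2).sum()
ssr_fe = ((y_til - b_fe * x_til) ** 2).sum()

df1, df2 = N - 1, N * T - N - K
F_stat = ((ssr_p - ssr_fe) / df1) / (ssr_fe / df2)
print(F_stat, df1, df2)
```

If SciPy is available, a p-value could be obtained with scipy.stats.f.sf(F_stat, df1, df2); that dependency is left out here to keep the sketch NumPy-only.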

6.1.3 Decision Rule

  • If F > F_{\alpha} (or p < 0.05): Reject H_0 → Use fixed effects
  • If F \leq F_{\alpha} (or p \geq 0.05): Fail to reject → Pooled OLS adequate

6.2 Test 2: Breusch-Pagan LM Test for Random Effects

6.2.1 Null and Alternative Hypotheses

H_0: \sigma_\alpha^2 = 0 \quad (\text{No random effects, pooled OLS is adequate})

H_1: \sigma_\alpha^2 > 0 \quad (\text{Random effects exist})

6.2.2 Test Statistic

LM = \frac{NT}{2(T-1)} \left[ \frac{\sum_{i=1}^N (\sum_{t=1}^T \hat{u}_{it})^2}{\sum_{i=1}^N\sum_{t=1}^T \hat{u}_{it}^2} - 1 \right]^2 \sim \chi^2(1)

where \hat{u}_{it} are pooled OLS residuals.
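The statistic is straightforward to compute from an (N \times T) array of pooled OLS residuals. The sketch below uses simulated data with unit effects built in; the function name breusch_pagan_lm is a hypothetical helper.

```python
import numpy as np

def breusch_pagan_lm(resid):
    """LM statistic from an (N, T) array of pooled OLS residuals."""
    N, T = resid.shape
    # Ratio of summed squared unit totals to the total sum of squares.
    ratio = (resid.sum(axis=1) ** 2).sum() / (resid ** 2).sum()
    return N * T / (2.0 * (T - 1)) * (ratio - 1.0) ** 2

rng = np.random.default_rng(6)
N, T = 200, 5
x = rng.normal(size=(N, T))
alpha = rng.normal(size=(N, 1))        # unit effects with sigma_alpha^2 = 1
y = 1.0 + 0.5 * x + alpha + rng.normal(size=(N, T))

X = np.column_stack([np.ones(N * T), x.ravel()])
b = np.linalg.lstsq(X, y.ravel(), rcond=None)[0]
u_hat = (y.ravel() - X @ b).reshape(N, T)

lm = breusch_pagan_lm(u_hat)
print(lm, lm > 3.84)                   # 3.84 is the 5% critical value of chi^2(1)
```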

6.2.3 Intuition

The test checks if residuals are more correlated within units than expected by chance.

6.2.4 Decision Rule

  • If LM > \chi^2_{\alpha}(1) (or p < 0.05): Reject H_0 → Random effects needed
  • If LM \leq \chi^2_{\alpha}(1) (or p \geq 0.05): Fail to reject → Pooled OLS adequate

6.3 Test 3: Hausman Specification Test (Fixed vs Random Effects)

6.3.1 The Key Question

Should we use fixed effects or random effects?

6.3.2 Null and Alternative Hypotheses

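H_0: \text{Cov}(\alpha_i, x_{it}) = 0 \quad (\text{RE is consistent and efficient})

H_1: \text{Cov}(\alpha_i, x_{it}) \neq 0 \quad (\text{RE is inconsistent, FE is needed})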

6.3.3 Intuition

  • Under H_0: Both FE and RE are consistent, but RE is more efficient
  • Under H_1: Only FE is consistent, RE is biased

If the two estimators differ significantly, it suggests H_1 is true.

6.3.4 Test Statistic

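H = (\hat{\beta}_{FE} - \hat{\beta}_{RE})' \left[\text{Var}(\hat{\beta}_{FE}) - \text{Var}(\hat{\beta}_{RE})\right]^{-1} (\hat{\beta}_{FE} - \hat{\beta}_{RE}) \sim \chi^2(K)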

where K is the number of time-varying regressors.

6.3.5 Components

Difference vector: \hat{q} = \hat{\beta}_{FE} - \hat{\beta}_{RE}

Variance of difference: \text{Var}(\hat{q}) = \text{Var}(\hat{\beta}_{FE}) - \text{Var}(\hat{\beta}_{RE})

Under H_0, \text{Var}(\hat{q}) is positive semi-definite.
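Given FE and RE estimates for the time-varying regressors and their variance matrices, the Hausman statistic is a quadratic form in the difference between the two coefficient vectors, weighted by the inverse of the difference of their variances. The sketch below uses made-up numbers purely to show the arithmetic; hausman_statistic is a hypothetical helper.

```python
import numpy as np

def hausman_statistic(b_fe, b_re, V_fe, V_re):
    """Quadratic form q' [V_fe - V_re]^{-1} q with q = b_fe - b_re."""
    q = np.asarray(b_fe, dtype=float) - np.asarray(b_re, dtype=float)
    V_q = np.asarray(V_fe, dtype=float) - np.asarray(V_re, dtype=float)
    # In finite samples the estimated V_q need not be positive definite.
    return float(q @ np.linalg.solve(V_q, q))

# Hypothetical estimates for K = 2 time-varying regressors.
b_fe = [1.0, 2.0]
b_re = [0.9, 2.1]
V_fe = np.diag([0.02, 0.03])
V_re = np.diag([0.01, 0.01])

H = hausman_statistic(b_fe, b_re, V_fe, V_re)
print(H, H > 5.99)                     # 5.99 is the 5% critical value of chi^2(2)
```

With these numbers H = 0.1^2/0.01 + 0.1^2/0.02 = 1.5, which is below the critical value, so the null would not be rejected.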

6.3.6 Decision Rule

  • If H > \chi^2_{\alpha}(K) (or p < 0.05): Reject H_0 → Use Fixed Effects
  • If H \leq \chi^2_{\alpha}(K) (or p \geq 0.05): Fail to reject → Use Random Effects

6.3.7 Practical Interpretation

Small Hausman statistic / Large p-value:

  • \hat{\beta}_{FE} \approx \hat{\beta}_{RE}
  • Suggests orthogonality assumption may hold
  • RE preferred for efficiency

Large Hausman statistic / Small p-value:

  • \hat{\beta}_{FE} and \hat{\beta}_{RE} differ substantially
  • Suggests correlation between \alpha_i and x_{it}
  • FE preferred for consistency

6.4 Complete Decision Framework

6.4.1 Step-by-Step Model Selection

START
  │
  ├─→ Run Pooled OLS
  │
  ├─→ Test 1: F-test for Fixed Effects
  │    ├─ Reject H₀? → Panel structure exists, continue
  │    └─ Fail to reject? → Use Pooled OLS (with robust SE)
  │
  ├─→ Test 2: Breusch-Pagan LM test
  │    ├─ Reject H₀? → Random effects exist, continue
  │    └─ Fail to reject? → Use Pooled OLS
  │
  ├─→ Test 3: Hausman Test
  │    ├─ Reject H₀? → Use FIXED EFFECTS ✓
  │    └─ Fail to reject? → Use RANDOM EFFECTS ✓
  │
END

6.4.2 Theoretical Considerations

Beyond statistical tests, consider:

  1. Nature of sample:
    • Specific units of interest (e.g., G7 countries) → FE
    • Random sample from population → RE
  2. Research question:
    • Need time-invariant effects? → RE or Pooled
    • Focus on within-unit changes? → FE
  3. Data structure:
    • Large N, small T → FE often appropriate
    • Small N, large T → Both work, but consider dynamics
  4. Plausibility of orthogonality:
    • Can \alpha_i realistically be uncorrelated with X? Usually no → FE
    • Example: Individual ability affects both education and wages → FE

7 Additional Diagnostic Tests

7.1 Serial Correlation Tests

7.1.1 Why It Matters

Panel data often exhibits serial correlation in \varepsilon_{it}, violating the assumption: \text{Cov}(\varepsilon_{it}, \varepsilon_{is}) = 0 \quad \text{for } t \neq s

7.1.2 Wooldridge Test for Serial Correlation

Null hypothesis: H_0: No first-order serial correlation

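Test procedure:

  1. Estimate FE model, obtain residuals \hat{\varepsilon}_{it}
  2. Regress \hat{\varepsilon}_{it} on \hat{\varepsilon}_{i,t-1}
  3. Test whether the coefficient on \hat{\varepsilon}_{i,t-1} equals -1/(T-1), the value implied by serially uncorrelated \varepsilon_{it} (demeaning alone makes FE residuals negatively correlated, so zero is not the right benchmark)

Stata's xtserial implements the first-difference variant: it regresses first-differenced residuals on their lag and tests whether the coefficient equals -0.5.

Decision rule:

  • If p < 0.05 → Serial correlation present
  • Use clustered standard errors or Driscoll-Kraay SEs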

7.2 Heteroskedasticity Tests

7.2.1 Modified Wald Test for Groupwise Heteroskedasticity

Null hypothesis: H_0: \sigma_{\varepsilon_i}^2 = \sigma_\varepsilon^2 for all i

Tests whether error variance differs across units.

Decision rule: If p < 0.05 → Use robust/clustered standard errors

7.3 Cross-Sectional Dependence

7.3.1 Pesaran CD Test

Tests whether residuals are correlated across different units: H_0: \text{Cov}(\varepsilon_{it}, \varepsilon_{jt}) = 0 \quad \text{for } i \neq j

Implications if rejected:

  • Spatial correlation or common shocks
  • May need Driscoll-Kraay SEs or spatial panel models


8 Robust Inference in Panel Data

8.1 Clustered Standard Errors

8.1.1 Why Cluster?

Even after accounting for \alpha_i, errors may be correlated within units: \text{Cov}(\varepsilon_{it}, \varepsilon_{is}) \neq 0 \quad \text{for } t \neq s

8.1.2 Cluster-Robust Variance Estimator

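For FE: \widehat{\text{Var}}_{cluster}(\hat{\beta}_{FE}) = (X'MX)^{-1} \left(\sum_{i=1}^N X_i'M_T \hat{\varepsilon}_i \hat{\varepsilon}_i' M_T X_i\right) (X'MX)^{-1}

where \hat{\varepsilon}_i is the (T \times 1) vector of FE residuals for unit i.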

This allows arbitrary correlation within clusters (units).
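A minimal sketch of the sandwich estimator for the within estimator, clustering by unit, is given below. It omits finite-sample corrections that statistical packages typically apply, and the simulated AR(1)-type errors, dimensions, and function name are assumptions for illustration.

```python
import numpy as np

def fe_cluster_robust(X, y, N, T):
    """Within estimator and unit-clustered variance; rows stacked unit by unit."""
    K = X.shape[1]
    Xs, ys = X.reshape(N, T, K), y.reshape(N, T)
    X_til = Xs - Xs.mean(axis=1, keepdims=True)
    y_til = ys - ys.mean(axis=1, keepdims=True)
    XtX = np.einsum("itk,itl->kl", X_til, X_til)        # X'MX
    Xty = np.einsum("itk,it->k", X_til, y_til)          # X'My
    beta = np.linalg.solve(XtX, Xty)
    resid = y_til - X_til @ beta                        # (N, T) FE residuals
    scores = np.einsum("itk,it->ik", X_til, resid)      # one K-vector per unit
    meat = scores.T @ scores                            # sum of outer products
    bread = np.linalg.inv(XtX)
    return beta, bread @ meat @ bread

rng = np.random.default_rng(7)
N, T, K = 300, 6, 2
X = rng.normal(size=(N * T, K))
alpha = np.repeat(rng.normal(size=N), T)
eps = np.zeros((N, T))                                  # serially correlated errors
eps[:, 0] = rng.normal(size=N)
for t in range(1, T):
    eps[:, t] = 0.6 * eps[:, t - 1] + rng.normal(size=N)
y = X @ np.array([1.0, -1.0]) + alpha + eps.ravel()

beta_hat, V_cluster = fe_cluster_robust(X, y, N, T)
se_cluster = np.sqrt(np.diag(V_cluster))
print(beta_hat, se_cluster)
```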

8.1.3 When to Use

  • Default for panel data
  • Especially important if T is small
  • Conservative approach

8.2 Driscoll-Kraay Standard Errors

8.2.1 For Spatial and Temporal Correlation

Allows for:

  • Serial correlation within units
  • Cross-sectional correlation across units

Most useful when T is large.

8.2.2 Variance Estimator

\widehat{\text{Var}}_{DK}(\hat{\beta}) = (X'X)^{-1} \hat{S}_{DK} (X'X)^{-1}

where \hat{S}_{DK} accounts for both serial and spatial correlation.


9 Dynamic Panel Data Models

9.1 The Basic Dynamic Model

y_{it} = \rho y_{i,t-1} + x_{it}'\beta + \alpha_i + \varepsilon_{it}

Problem: The lagged dependent variable y_{i,t-1} is correlated with \alpha_i, making both FE and RE inconsistent.

9.2 Why Standard Estimators Fail

9.2.1 Correlation Structure

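Lagging the model one period: y_{i,t-1} = \rho y_{i,t-2} + x_{i,t-1}'\beta + \alpha_i + \varepsilon_{i,t-1}

Since y_{i,t-1} contains \alpha_i: \text{Cov}(y_{i,t-1}, \alpha_i) \neq 0

This rules out pooled OLS and RE. Because y_{i,t-1} also contains \varepsilon_{i,t-1}, the demeaned lag is correlated with the demeaned error (through \bar{\varepsilon}_i), so strict exogeneity fails even for FE!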

9.2.2 The FE Bias

With fixed T, the FE estimator of \rho is biased: \hat{\rho}_{FE} - \rho = O(1/T)

For small T (common in micro panels), this bias can be large.
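The size of this bias in a short panel can be illustrated by simulation. The sketch below uses a pure autoregressive panel with no x_{it} regressors (a simplifying assumption), with illustrative values \rho = 0.5 and T = 5 and a large N so that sampling noise is small.

```python
import numpy as np

rng = np.random.default_rng(8)
N, T, rho = 2000, 5, 0.5
burn_in = 50                              # discard early periods

alpha = rng.normal(size=N)
y = alpha / (1 - rho)                     # start each unit at its long-run mean
path = []
for t in range(burn_in + T + 1):
    y = rho * y + alpha + rng.normal(size=N)
    path.append(y)
Y = np.column_stack(path[burn_in:])       # (N, T + 1): y_i0, ..., y_iT

# Within regression of y_it on y_{i,t-1} using T periods per unit.
y_cur, y_lag = Y[:, 1:], Y[:, :-1]
y_cur_til = y_cur - y_cur.mean(axis=1, keepdims=True)
y_lag_til = y_lag - y_lag.mean(axis=1, keepdims=True)
rho_fe = (y_lag_til * y_cur_til).sum() / (y_lag_til ** 2).sum()

print("true rho:", rho, "FE estimate:", rho_fe)
```

Increasing N does not help here; the estimate stays well below the true value until T grows.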

9.3 Solutions: GMM Estimators

9.3.1 Arellano-Bond (First-Difference GMM)

First-difference the equation: \Delta y_{it} = \rho \Delta y_{i,t-1} + \Delta x_{it}'\beta + \Delta\varepsilon_{it}

Use y_{i,t-2}, y_{i,t-3}, \ldots as instruments for \Delta y_{i,t-1}.

9.3.2 Arellano-Bover/Blundell-Bond (System GMM)

Combines:

  • First-differenced equations (with lagged levels as instruments)
  • Level equations (with lagged differences as instruments)

More efficient than AB when \rho is close to 1.


10 Extensions and Advanced Topics

10.1 Two-Way Fixed Effects

10.1.1 Model with Time Fixed Effects

y_{it} = x_{it}'\beta + \alpha_i + \lambda_t + \varepsilon_{it}

where:

  • \alpha_i = unit fixed effects
  • \lambda_t = time fixed effects (common shocks)

10.1.2 Estimation

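Include both unit and time dummies, or double-demean: \ddot{y}_{it} = y_{it} - \bar{y}_i - \bar{y}_t + \bar{\bar{y}}

where \bar{y}_i is the time mean for unit i, \bar{y}_t is the cross-sectional mean in period t, and \bar{\bar{y}} is the grand mean. The two approaches give identical slope estimates in balanced panels.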
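For a balanced panel, the agreement between double-demeaning and the dummy-variable regression can be verified numerically. The following sketch uses a single regressor and simulated unit and time effects; all values and names are illustrative.

```python
import numpy as np

rng = np.random.default_rng(9)
N, T = 20, 6
alpha = rng.normal(size=(N, 1))           # unit effects
lam = rng.normal(size=(1, T))             # time effects
x = rng.normal(size=(N, T)) + alpha + lam
y = 2.0 * x + alpha + lam + rng.normal(scale=0.5, size=(N, T))

def double_demean(a):
    """Subtract unit means and period means, then add back the grand mean."""
    return a - a.mean(axis=1, keepdims=True) - a.mean(axis=0, keepdims=True) + a.mean()

x_dd, y_dd = double_demean(x), double_demean(y)
beta_dd = (x_dd * y_dd).sum() / (x_dd ** 2).sum()

# Dummy-variable version: N unit dummies plus T - 1 time dummies
# (one period dropped to avoid perfect collinearity).
D_unit = np.kron(np.eye(N), np.ones((T, 1)))
D_time = np.kron(np.ones((N, 1)), np.eye(T))[:, 1:]
Z = np.column_stack([x.ravel(), D_unit, D_time])
beta_dummies = np.linalg.lstsq(Z, y.ravel(), rcond=None)[0][0]

print(beta_dd, beta_dummies)
```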

10.1.3 When to Use

  • Control for aggregate time shocks (recessions, policy changes)
  • Standard in difference-in-differences applications

10.2 Unbalanced Panels

10.2.1 Missing Data Structure

Not all units are observed in all time periods:

  • T_i varies across units
  • Total observations: n = \sum_{i=1}^N T_i

10.2.2 Implications

  • Pooled OLS and RE: Straightforward to adapt
  • FE: Still consistent, but demeaning uses T_i for each unit
  • Hausman test may be affected

10.2.3 Handling

Most software handles unbalanced panels automatically, but check:

  • Missingness mechanism (MAR vs MNAR)
  • Impact on variance estimates

10.3 Instrumental Variables in Panel Data

10.3.1 Model

y_{it} = x_{it}'\beta + z_{it}'\gamma + \alpha_i + \varepsilon_{it}

where z_{it} is endogenous: \text{Cov}(z_{it}, \varepsilon_{it}) \neq 0

10.3.2 FE-2SLS

  1. Demean all variables (including instruments)
  2. Apply 2SLS to demeaned data

10.3.3 Requires

Valid instruments w_{it} such that:

  • Relevance: \text{Cov}(w_{it}, z_{it}) \neq 0
  • Exogeneity: \text{Cov}(w_{it}, \varepsilon_{it}) = 0


11 Practical Implementation Guide

11.1 Model Selection Checklist

11.1.1 1. Preliminary Analysis

11.1.2 2. Estimate All Models

11.1.3 3. Run Specification Tests

11.1.4 4. Check Diagnostics

11.1.5 5. Robust Inference

11.1.6 6. Report Results

11.2 Common Pitfalls and How to Avoid Them

11.2.1 Pitfall 1: Using Pooled OLS When Panel Structure Exists

Problem: Biased estimates and wrong standard errors

Solution: Always test with F-test and BP-LM test

11.2.2 Pitfall 2: Using RE When \text{Cov}(\alpha_i, x_{it}) \neq 0

Problem: Inconsistent estimates

Solution:

  • Run Hausman test
  • Consider theoretical plausibility
  • When in doubt, use FE

11.2.3 Pitfall 3: Ignoring Serial Correlation

Problem: Standard errors too small, over-rejection of hypotheses

Solution: Always use cluster-robust SEs

11.2.4 Pitfall 4: Trying to Estimate Time-Invariant Effects with FE

Problem: Cannot identify these coefficients

Solution: Use RE or pooled OLS if these variables are crucial

11.2.5 Pitfall 5: Not Checking Within Variation

Problem: FE may not be identified if no within variation

Solution: Check \text{Var}(\tilde{x}_{it}) > 0 before using FE


12 Comparison with Alternative Methods

12.1 Panel Data vs Cross-Sectional Regression

Aspect | Cross-Section | Panel Data
Observations | N units, 1 time | N units, T times
Unobserved heterogeneity | Cannot control | FE/RE control
Efficiency | Lower (less data) | Higher
Dynamics | Cannot study | Can model
Identification | Weaker | Stronger

12.2 Panel Data vs Time Series

Aspect | Time Series | Panel Data
Units | 1 unit, T times | N units, T times
Asymptotics | T → ∞ | N → ∞ (usually)
Unobserved effects | Cannot separate from trend | Can separate
Degrees of freedom | Limited | Abundant

13 Summary and Recommendations

13.1 When to Use Each Model

13.1.1 Use Pooled OLS When:

  • No panel structure detected (F-test and BP-LM both fail to reject)
  • Truly independent observations
  • Time-invariant effects are crucial and plausibly exogenous

13.1.2 Use Fixed Effects When:

  • \text{Cov}(\alpha_i, x_{it}) likely non-zero (most cases)
  • Focus on within-unit changes
  • Robustness is priority
  • Don’t need time-invariant coefficients

13.1.3 Use Random Effects When:

  • Hausman test fails to reject
  • Need time-invariant coefficients
  • Sample is random draw from population
  • \text{Cov}(\alpha_i, x_{it}) = 0 is plausible

13.2 General Advice

  1. Default to FE for robustness in most economic applications
  2. Always use robust/clustered standard errors
  3. Run all three models and specification tests for comparison
  4. Consider the economics not just the statistics
  5. Check diagnostics after selecting model

14 Mathematical Appendix

14.1 Kronecker Product Properties

For matrices A (m × n) and B (p × q):

A \otimes B = \begin{bmatrix} a_{11}B & a_{12}B & \cdots & a_{1n}B \\ a_{21}B & a_{22}B & \cdots & a_{2n}B \\ \vdots & \vdots & \ddots & \vdots \\ a_{m1}B & a_{m2}B & \cdots & a_{mn}B \end{bmatrix}_{mp \times nq}

Properties:

  1. (A \otimes B)(C \otimes D) = (AC) \otimes (BD)
  2. (A \otimes B)' = A' \otimes B'
  3. (A \otimes B)^{-1} = A^{-1} \otimes B^{-1} (if invertible)

14.2 Matrix Differentiation Rules

For \beta as a column vector:

  1. \frac{\partial (a'\beta)}{\partial \beta} = a
  2. \frac{\partial (\beta'A\beta)}{\partial \beta} = (A + A')\beta
  3. \frac{\partial (y - X\beta)'(y - X\beta)}{\partial \beta} = -2X'(y - X\beta)

14.3 Projection Matrix Properties

For any orthogonal projection matrix P:

  1. Symmetric: P = P'
  2. Idempotent: P^2 = P
  3. Eigenvalues: Either 0 or 1
  4. Complement: M = I - P is also a projection matrix

15 References and Further Reading

15.1 Essential Textbooks

  1. Wooldridge, J. M. (2010). Econometric Analysis of Cross Section and Panel Data (2nd ed.). MIT Press.
    • Comprehensive treatment of panel data methods
    • Focuses on microeconometrics applications
  2. Baltagi, B. H. (2021). Econometric Analysis of Panel Data (6th ed.). Springer.
    • Detailed mathematical exposition
    • Covers advanced topics
  3. Hsiao, C. (2014). Analysis of Panel Data (3rd ed.). Cambridge University Press.
    • Theoretical foundations
    • Economic applications

15.2 Software Resources

  • Stata: xtreg, xttest0, hausman, xtserial
  • R: plm package, lfe package for high-dimensional FE
  • Python: linearmodels package
  • EViews: Built-in panel data procedures

15.3 Online Resources

  • Econometrics Academy (YouTube channel)
  • NBER Summer Institute lectures
  • World Bank’s Impact Evaluation resources

16 Glossary of Terms

Balanced panel: Every unit observed in every time period

Composite error: u_{it} = \alpha_i + \varepsilon_{it}

Demeaning: Subtracting group means from variables

Endogeneity: Correlation between regressor and error term

Fixed effect: Unit-specific intercept treated as parameter

Idiosyncratic error: Time-varying, unit-specific shock \varepsilon_{it}

Intraclass correlation: Correlation of observations within same unit

Panel data: Dataset with both cross-sectional and time dimensions

Quasi-demeaning: Partial demeaning with factor 0 < \theta < 1

Random effect: Unit-specific term treated as random draw

Strict exogeneity: E[\varepsilon_{it} \mid X_i] = 0 for all t

Unbalanced panel: Units observed different numbers of times

Within transformation: Demeaning data to remove fixed effects

