Panel Data Models: A Comprehensive Guide

Pooled OLS, Fixed Effects, and Random Effects Estimators

Author

Oliver James

Published

December 2, 2025

1 Introduction and Setup

1.1 What is Panel Data?

Panel data (also called longitudinal data) combines cross-sectional and time-series dimensions. We observe multiple entities (individuals, firms, countries, regions) over multiple time periods.

Key advantages:

Controls for unobserved heterogeneity
More degrees of freedom and efficiency
Can study dynamics and causality better than pure cross-section
Reduces collinearity between variables

1.2 Notation and Data Structure

1.2.1 Basic Setup

Consider a balanced panel with:

N = number of cross-sectional units (e.g., regions, individuals)
T = number of time periods
n = NT = total number of observations

1.2.2 Variables

y_{it} — dependent variable for unit i at time t
x_{it} — (K \times 1) vector of explanatory variables (regressors)
\beta — (K \times 1) parameter vector (coefficients of interest)
\alpha_i — individual-specific effect (unobserved heterogeneity)
\varepsilon_{it} — idiosyncratic error term (time-varying shock)

1.2.3 Stacking Convention

We stack observations in a specific order: all time periods for unit 1, then all time periods for unit 2, etc.

Stacked dependent variable: y = \begin{bmatrix} y_1 \\ y_2 \\ \vdots \\ y_N \end{bmatrix}_{NT \times 1}, \quad \text{where} \quad y_i = \begin{bmatrix} y_{i1} \\ y_{i2} \\ \vdots \\ y_{iT} \end{bmatrix}_{T \times 1}

Stacked regressor matrix: X = \begin{bmatrix} X_1 \\ X_2 \\ \vdots \\ X_N \end{bmatrix}_{NT \times K}, \quad \text{where} \quad X_i = \begin{bmatrix} x_{i1}' \\ x_{i2}' \\ \vdots \\ x_{iT}' \end{bmatrix}_{T \times K}

1.2.4 Useful Matrix Notation

I_N — (N \times N) identity matrix
I_T — (T \times T) identity matrix
\mathbf{1}_T — (T \times 1) vector of ones
J_T = \mathbf{1}_T \mathbf{1}_T' / T — (T \times T) averaging matrix
\otimes — Kronecker product operator

2 Pooled OLS Model

2.1 The Basic Idea

Pooled OLS treats all observations as if they were independent, ignoring the panel structure. It assumes no unobserved heterogeneity across units.

2.2 Model Specification

2.2.1 Scalar Form

For unit i at time t: y_{it} = x_{it}'\beta + u_{it}, \quad i=1,\ldots,N, \quad t=1,\ldots,T

where u_{it} is a composite error term that pools all unobserved effects.

2.2.2 Matrix Form

Stacking all observations: y = X\beta + u

where:
- y is (NT \times 1)
- X is (NT \times K)
- \beta is (K \times 1)
- u is (NT \times 1)

2.3 Pooled OLS Estimator

2.3.1 Derivation

Minimize the sum of squared residuals: \min_{\beta} \quad (y - X\beta)'(y - X\beta)

First-order condition: -2X'(y - X\hat{\beta}_P) = 0

Solving for \hat{\beta}_P: X'X\hat{\beta}_P = X'y

2.3.2 The Estimator

\boxed{\hat{\beta}_P = (X'X)^{-1}X'y}

This is just standard OLS applied to the entire stacked dataset.
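
As a minimal sketch of the formula above, assuming simulated data and NumPy (the panel dimensions, seed, and coefficient values are illustrative assumptions, not taken from the text), the estimator can be computed by solving the normal equations directly:

```python
import numpy as np

rng = np.random.default_rng(0)
N, T, K = 50, 5, 2                      # hypothetical panel dimensions
beta_true = np.array([1.0, -0.5])       # hypothetical coefficients

# Stacked data, ordered unit by unit (all T periods of unit 1, then unit 2, ...)
X = rng.normal(size=(N * T, K))
y = X @ beta_true + rng.normal(size=N * T)   # no unit effect in this simulation

# Solve the normal equations X'X b = X'y instead of forming an explicit inverse
beta_pooled = np.linalg.solve(X.T @ X, X.T @ y)

# First-order condition: residuals are orthogonal to every column of X
resid = y - X @ beta_pooled
foc = X.T @ resid
```

Here `foc` should be numerically zero, which is the first-order condition from the derivation. Solving the linear system is usually preferred to inverting X'X explicitly for numerical stability.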

2.4 Assumptions

2.4.1 Classical Assumptions

Strict exogeneity: E[u_{it} \mid X] = 0
Homoskedasticity: \text{Var}(u_{it}) = \sigma_u^2 for all i,t
No serial correlation: \text{Cov}(u_{it}, u_{is}) = 0 for t \neq s
No cross-sectional correlation: \text{Cov}(u_{it}, u_{jt}) = 0 for i \neq j

2.4.2 Variance-Covariance Matrix

Under these assumptions: \text{Var}(u) = \sigma_u^2 I_{NT}

where I_{NT} is the (NT \times NT) identity matrix.

2.5 Variance of the Estimator

Under classical assumptions: \text{Var}(\hat{\beta}_P \mid X) = \sigma_u^2 (X'X)^{-1}

Estimation of \sigma_u^2: \hat{\sigma}_u^2 = \frac{\hat{u}'\hat{u}}{NT - K} = \frac{\sum_{i=1}^N \sum_{t=1}^T \hat{u}_{it}^2}{NT - K}

2.6 When Does Pooled OLS Work?

2.6.1 Consistency Condition

Pooled OLS is consistent if: E[u_{it} \mid X] = 0

This requires: No unobserved unit-specific effects that correlate with X.

2.6.2 Why Pooled OLS Usually Fails

In reality, u_{it} often contains unobserved individual effects: u_{it} = \alpha_i + \varepsilon_{it}

If \alpha_i is correlated with x_{it}, then: E[u_{it} \mid X] = E[\alpha_i \mid X] \neq 0

This causes omitted variable bias.

2.6.3 Example: Returns to Education

Model: \ln(\text{wage}_{it}) = \beta_0 + \beta_1 \text{education}_{it} + u_{it}

Problem: u_{it} contains unobserved ability \alpha_i

High ability individuals may get more education
\text{Cov}(\text{education}_{it}, \alpha_i) > 0
Pooled OLS overestimates returns to education
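
The direction of this bias can be illustrated with a small simulation. This is only a sketch under assumed values: the true coefficient (0.10), the strength of the link between the regressor and \alpha_i (0.8), and the sample size are all hypothetical choices.

```python
import numpy as np

rng = np.random.default_rng(1)
N, T = 2000, 4
beta_true = 0.10                       # hypothetical true coefficient

alpha = rng.normal(size=N)             # unobserved unit effect ("ability")
# The regressor is constructed to be positively correlated with alpha
x = 0.8 * alpha[:, None] + rng.normal(size=(N, T))
eps = rng.normal(size=(N, T))
y = beta_true * x + alpha[:, None] + eps

# Pooled OLS slope with an intercept, ignoring alpha
xc = x.ravel() - x.mean()
yc = y.ravel() - y.mean()
beta_pooled = (xc @ yc) / (xc @ xc)

# Probability limit implied by the design: beta + Cov(x, alpha) / Var(x)
plim_pooled = beta_true + 0.8 / (0.8 ** 2 + 1.0)
```

In this design the pooled slope settles near `plim_pooled`, well above `beta_true`, because the composite error contains \alpha_i and \alpha_i moves with the regressor.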

2.7 Advantages and Disadvantages

2.7.1 Advantages

Simple to compute
Efficient if assumptions hold
Can include time-invariant variables

2.7.2 Disadvantages

Biased and inconsistent if \alpha_i exists and correlates with X
Ignores panel structure
Standard errors wrong if serial correlation or heteroskedasticity present

3 Fixed Effects Model

3.1 The Core Insight

Fixed effects allows each unit to have its own intercept \alpha_i, which can be correlated with the regressors. This controls for all time-invariant unobserved heterogeneity.

3.2 Model Specification

3.2.1 Scalar Form

y_{it} = x_{it}'\beta + \alpha_i + \varepsilon_{it}

where:
- \alpha_i = individual-specific effect (time-invariant)
- \varepsilon_{it} = idiosyncratic error (time-varying)

3.2.2 Decomposition of Error

u_{it} = \alpha_i + \varepsilon_{it}

Key difference from pooled OLS: We explicitly model \alpha_i and allow \text{Cov}(\alpha_i, x_{it}) \neq 0.

3.2.3 Matrix Form (LSDV)

Stack all observations and include unit dummies: y = X\beta + D\alpha + \varepsilon

where:
- D is the (NT \times N) matrix of unit dummy variables
- \alpha = (\alpha_1, \ldots, \alpha_N)' is the (N \times 1) vector of fixed effects

3.2.4 Structure of the Dummy Matrix D

D = I_N \otimes \mathbf{1}_T = \begin{bmatrix} \mathbf{1}_T & 0 & \cdots & 0 \\ 0 & \mathbf{1}_T & \cdots & 0 \\ \vdots & \vdots & \ddots & \vdots \\ 0 & 0 & \cdots & \mathbf{1}_T \end{bmatrix}_{NT \times N}

For unit 1, the first T rows have 1 in column 1 and 0 elsewhere, etc.

3.3 Two Equivalent Estimators

3.3.1 LSDV (Least Squares Dummy Variable) Estimator

Run OLS on the augmented model: \begin{bmatrix} \hat{\beta}_{LSDV} \\ \hat{\alpha} \end{bmatrix} = \left( \begin{bmatrix} X & D \end{bmatrix}' \begin{bmatrix} X & D \end{bmatrix} \right)^{-1} \begin{bmatrix} X & D \end{bmatrix}' y

Problem: Computationally expensive when N is large (adds N dummy variables).

3.3.2 Within (Demeaned) Estimator

A computationally cheaper approach: transform the data to eliminate \alpha_i.

3.3.2.1 Unit-Specific Time Averages

For each unit i, compute: \bar{y}_i = \frac{1}{T}\sum_{t=1}^T y_{it}, \quad \bar{x}_i = \frac{1}{T}\sum_{t=1}^T x_{it}

3.3.2.2 Taking Deviations from Means

Average the model over time: \bar{y}_i = \bar{x}_i'\beta + \alpha_i + \bar{\varepsilon}_i

Subtract this from the original equation: y_{it} - \bar{y}_i = (x_{it} - \bar{x}_i)'\beta + (\varepsilon_{it} - \bar{\varepsilon}_i)

Notice: The fixed effect \alpha_i cancels out! This is the “within transformation.”

3.3.2.3 Demeaned Variables

Define: \tilde{y}_{it} = y_{it} - \bar{y}_i, \quad \tilde{x}_{it} = x_{it} - \bar{x}_i, \quad \tilde{\varepsilon}_{it} = \varepsilon_{it} - \bar{\varepsilon}_i

The within-transformed model: \tilde{y}_{it} = \tilde{x}_{it}'\beta + \tilde{\varepsilon}_{it}

Apply OLS to this demeaned data.

3.3.3 Matrix Form of Within Transformation

Define the demeaning matrix for each unit: M_T = I_T - \frac{1}{T}\mathbf{1}_T\mathbf{1}_T' = I_T - J_T

This matrix, when applied to any (T \times 1) vector, subtracts its mean from each element.

For the full panel: M = I_N \otimes M_T = I_{NT} - I_N \otimes J_T

Properties of M:
- Symmetric: M = M'
- Idempotent: M^2 = M (key for projection matrices)
- M removes unit-specific means

3.3.4 Apply the Transformation

\tilde{y} = My, \quad \tilde{X} = MX

Note that M \cdot D = 0 (the demeaning matrix eliminates all unit dummy variables).
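
These properties are easy to check numerically. The sketch below builds M_T, M, and D with Kronecker products for a small hypothetical panel (N = 3, T = 4) and compares My with demeaning by group.

```python
import numpy as np

N, T = 3, 4                             # small hypothetical panel
ones_T = np.ones((T, 1))
J_T = ones_T @ ones_T.T / T             # averaging matrix
M_T = np.eye(T) - J_T                   # per-unit demeaning matrix
M = np.kron(np.eye(N), M_T)             # full-panel demeaning matrix
D = np.kron(np.eye(N), ones_T)          # unit dummies, shape (NT, N)

rng = np.random.default_rng(2)
y = rng.normal(size=N * T)              # stacked unit by unit

y_tilde = M @ y                         # within transformation via M
# The same transformation computed directly from unit means
y_by_unit = y.reshape(N, T)
y_demeaned = (y_by_unit - y_by_unit.mean(axis=1, keepdims=True)).ravel()
```

In practice the (NT \times NT) matrix M is never formed for real data. Subtracting group means, as in the last two lines, gives the same result at a fraction of the memory cost.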

3.4 Fixed Effects Estimator

3.4.1 The Within Estimator Formula

\boxed{\hat{\beta}_{FE} = (\tilde{X}'\tilde{X})^{-1}\tilde{X}'\tilde{y} = (X'MX)^{-1}X'My}

This is OLS on demeaned data.

3.4.2 Relationship to LSDV

Theorem: \hat{\beta}_{FE} = \hat{\beta}_{LSDV}

Both approaches give identical coefficient estimates, but the within estimator is computationally much faster.

3.4.3 Recovering the Fixed Effects

After estimating \hat{\beta}_{FE}, we can recover: \hat{\alpha}_i = \bar{y}_i - \bar{x}_i'\hat{\beta}_{FE}
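
The equivalence with LSDV and the recovery of the fixed effects can be verified on simulated data. This is a sketch assuming NumPy and an illustrative design (seed, dimensions, and coefficients are hypothetical) in which the regressors are correlated with \alpha_i.

```python
import numpy as np

rng = np.random.default_rng(3)
N, T, K = 30, 6, 2
alpha = rng.normal(size=N)
# Regressors shifted by alpha, so Cov(alpha_i, x_it) != 0
X = rng.normal(size=(N, T, K)) + alpha[:, None, None]
beta_true = np.array([1.0, 2.0])
y = X @ beta_true + alpha[:, None] + rng.normal(size=(N, T))

# Within estimator: OLS on unit-demeaned data
Xd = (X - X.mean(axis=1, keepdims=True)).reshape(N * T, K)
yd = (y - y.mean(axis=1, keepdims=True)).ravel()
beta_fe = np.linalg.solve(Xd.T @ Xd, Xd.T @ yd)

# LSDV: OLS on [X, D] with one dummy per unit
D = np.kron(np.eye(N), np.ones((T, 1)))
Z = np.hstack([X.reshape(N * T, K), D])
coef = np.linalg.lstsq(Z, y.ravel(), rcond=None)[0]
beta_lsdv, alpha_lsdv = coef[:K], coef[K:]

# Fixed effects recovered from the within estimates
alpha_recovered = y.mean(axis=1) - X.mean(axis=1) @ beta_fe
```

Both the slope vectors and the recovered intercepts agree up to floating-point error, while the within route only ever solves a (K \times K) system.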

3.5 Assumptions

3.5.1 Key Assumptions for FE

Strict exogeneity (conditional): E[\varepsilon_{it} \mid X_i, \alpha_i] = 0
Homoskedasticity: \text{Var}(\varepsilon_{it}) = \sigma_\varepsilon^2
No serial correlation: \text{Cov}(\varepsilon_{it}, \varepsilon_{is}) = 0 for t \neq s
No perfect collinearity in \tilde{X}

Crucially: We do NOT require \text{Cov}(\alpha_i, x_{it}) = 0. This is the main advantage!

3.6 Variance of the FE Estimator

\text{Var}(\hat{\beta}_{FE} \mid X, \alpha) = \sigma_\varepsilon^2 (X'MX)^{-1}

Estimation of \sigma_\varepsilon^2: \hat{\sigma}_\varepsilon^2 = \frac{\tilde{\varepsilon}'\tilde{\varepsilon}}{NT - N - K} = \frac{\sum_{i=1}^N\sum_{t=1}^T \tilde{\varepsilon}_{it}^2}{NT - N - K}

Degrees of freedom: NT - N - K because we lose N fixed effects and estimate K slopes.

3.7 Interpretation and Properties

3.7.1 What FE Identifies

Fixed effects uses within-unit variation over time. With a single regressor: \hat{\beta}_{FE} = \text{Cov}(\tilde{x}_{it}, \tilde{y}_{it}) / \text{Var}(\tilde{x}_{it})

It answers: “When x_{it} changes within unit i over time, how does y_{it} change?”

3.7.2 What FE Cannot Identify

Time-invariant variables are eliminated by demeaning:
- If x_{it} = x_i (no time variation), then \tilde{x}_{it} = x_i - x_i = 0
- Cannot estimate effects of gender, race, country of birth, etc.

3.7.3 R^2 in Fixed Effects

Three types of R^2:

Within R^2: Fit of demeaned model R^2_{\text{within}} = 1 - \frac{\sum \tilde{\varepsilon}_{it}^2}{\sum \tilde{y}_{it}^2}
Between R^2: Fit of unit means R^2_{\text{between}} = 1 - \frac{\sum (\bar{y}_i - \bar{x}_i'\hat{\beta}_{FE})^2}{\sum (\bar{y}_i - \bar{y})^2}
Overall R^2: Total fit including fixed effects

3.8 Advantages and Disadvantages

3.8.1 Advantages

Consistent even when \text{Cov}(\alpha_i, x_{it}) \neq 0
Eliminates omitted variable bias from time-invariant factors
Natural for policy evaluation (before-after comparisons)
No distributional assumptions on \alpha_i

3.8.2 Disadvantages

Cannot estimate time-invariant effects
Less efficient than RE if \text{Cov}(\alpha_i, x_{it}) = 0 actually holds
May exacerbate measurement error in x_{it}
Requires sufficient within-unit variation

4 Random Effects Model

4.1 The Core Idea

Random effects treats \alpha_i as a random variable drawn from a distribution, uncorrelated with the regressors. This allows more efficient estimation than FE.

4.2 Model Specification

4.2.1 Scalar Form

y_{it} = x_{it}'\beta + \alpha_i + \varepsilon_{it}

Same structure as FE, but with different assumptions on \alpha_i.

4.2.2 Composite Error Structure

u_{it} = \alpha_i + \varepsilon_{it}

where:
- \alpha_i \sim (0, \sigma_\alpha^2): random individual effect
- \varepsilon_{it} \sim (0, \sigma_\varepsilon^2): idiosyncratic error
- \text{Cov}(\alpha_i, \varepsilon_{jt}) = 0 for all i,j,t
- KEY ASSUMPTION: \text{Cov}(\alpha_i, x_{it}) = 0 (orthogonality)

4.3 Variance-Covariance Structure

4.3.1 Variance of Composite Error

\text{Var}(u_{it}) = \text{Var}(\alpha_i) + \text{Var}(\varepsilon_{it}) = \sigma_\alpha^2 + \sigma_\varepsilon^2

4.3.2 Serial Correlation Within Units

For the same unit i at different times: \text{Cov}(u_{it}, u_{is}) = \text{Cov}(\alpha_i + \varepsilon_{it}, \alpha_i + \varepsilon_{is}) = \sigma_\alpha^2

This creates positive serial correlation within units.

4.3.3 Intraclass Correlation Coefficient

The correlation between any two observations from the same unit: \rho = \frac{\sigma_\alpha^2}{\sigma_\alpha^2 + \sigma_\varepsilon^2} = \frac{\sigma_\alpha^2}{\sigma_u^2}

This measures the fraction of total variance due to unit-specific effects.

4.3.4 Variance-Covariance Matrix for Unit i

\text{Var}(u_i) = \Omega_i = \sigma_\varepsilon^2 I_T + \sigma_\alpha^2 \mathbf{1}_T\mathbf{1}_T' = \sigma_\varepsilon^2 I_T + \sigma_\alpha^2 J_T \cdot T

In expanded form: \Omega_i = \begin{bmatrix} \sigma_\alpha^2 + \sigma_\varepsilon^2 & \sigma_\alpha^2 & \cdots & \sigma_\alpha^2 \\ \sigma_\alpha^2 & \sigma_\alpha^2 + \sigma_\varepsilon^2 & \cdots & \sigma_\alpha^2 \\ \vdots & \vdots & \ddots & \vdots \\ \sigma_\alpha^2 & \sigma_\alpha^2 & \cdots & \sigma_\alpha^2 + \sigma_\varepsilon^2 \end{bmatrix}_{T \times T}

Structure: Constant variance on diagonal, constant covariance off-diagonal (equicorrelation).
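
The equicorrelated structure can be checked against simulated composite errors. The sketch below assumes hypothetical variance components (\sigma_\alpha^2 = 2, \sigma_\varepsilon^2 = 1) and normal draws. Neither choice comes from the text.

```python
import numpy as np

T = 4
sigma2_alpha, sigma2_eps = 2.0, 1.0     # hypothetical variance components
ones_T = np.ones((T, 1))
Omega_i = sigma2_eps * np.eye(T) + sigma2_alpha * (ones_T @ ones_T.T)

# Intraclass correlation and the implied correlation matrix
rho = sigma2_alpha / (sigma2_alpha + sigma2_eps)
sd = np.sqrt(np.diag(Omega_i))
Corr_i = Omega_i / np.outer(sd, sd)     # ones on the diagonal, rho elsewhere

# Monte Carlo check: u_it = alpha_i + eps_it for many independent units
rng = np.random.default_rng(4)
n_units = 200_000
alpha = rng.normal(scale=np.sqrt(sigma2_alpha), size=(n_units, 1))
eps = rng.normal(scale=np.sqrt(sigma2_eps), size=(n_units, T))
u = alpha + eps
Omega_mc = u.T @ u / n_units            # sample second-moment matrix (mean is zero)
```

Every off-diagonal entry of `Corr_i` equals `rho`, and `Omega_mc` should be close to `Omega_i` up to simulation noise.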

4.3.5 Full Panel Variance-Covariance Matrix

\Omega = \text{Var}(u) = I_N \otimes \Omega_i = \sigma_\varepsilon^2 I_{NT} + T\sigma_\alpha^2 (I_N \otimes J_T)

4.4 Random Effects Estimator

4.4.1 GLS (Generalized Least Squares) Approach

Since \Omega \neq \sigma^2 I, OLS is inefficient. The efficient estimator is GLS:

\boxed{\hat{\beta}_{RE} = (X'\Omega^{-1}X)^{-1}X'\Omega^{-1}y}

Problem: This requires knowing \Omega (which depends on \sigma_\alpha^2 and \sigma_\varepsilon^2).

4.4.2 Feasible GLS (FGLS)

In practice:
1. Estimate \sigma_\alpha^2 and \sigma_\varepsilon^2 from data
2. Construct \hat{\Omega}
3. Use FGLS: \hat{\beta}_{RE} = (X'\hat{\Omega}^{-1}X)^{-1}X'\hat{\Omega}^{-1}y

4.4.3 Quasi-Demeaning Transformation (Practical Implementation)

Rather than compute \Omega^{-1} directly, RE can be implemented via partial demeaning.

4.4.3.1 The Transformation

Define the quasi-demeaning factor: \theta = 1 - \sqrt{\frac{\sigma_\varepsilon^2}{\sigma_\varepsilon^2 + T\sigma_\alpha^2}}

Transform the data: y_{it}^* = y_{it} - \theta\bar{y}_i, \quad x_{it}^* = x_{it} - \theta\bar{x}_i

Random effects estimator: \hat{\beta}_{RE} = \left(\sum_{i=1}^N\sum_{t=1}^T x_{it}^* x_{it}^{*'}\right)^{-1} \left(\sum_{i=1}^N\sum_{t=1}^T x_{it}^* y_{it}^*\right)

This is just OLS on quasi-demeaned data.
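
To see that quasi-demeaning reproduces GLS, the sketch below computes both on the same simulated panel. It treats the variance components as known, which is an assumption made only for this check. In practice they are estimated first.

```python
import numpy as np

rng = np.random.default_rng(5)
N, T, K = 40, 5, 2
sigma2_alpha, sigma2_eps = 1.5, 1.0     # treated as known for this check

X = rng.normal(size=(N, T, K))
alpha = rng.normal(scale=np.sqrt(sigma2_alpha), size=(N, 1))
eps = rng.normal(scale=np.sqrt(sigma2_eps), size=(N, T))
y = X @ np.array([1.0, -1.0]) + alpha + eps

# (a) Direct GLS with the block-diagonal Omega
ones_T = np.ones((T, 1))
Omega_i = sigma2_eps * np.eye(T) + sigma2_alpha * (ones_T @ ones_T.T)
Omega_inv = np.kron(np.eye(N), np.linalg.inv(Omega_i))
Xs, ys = X.reshape(N * T, K), y.ravel()
beta_gls = np.linalg.solve(Xs.T @ Omega_inv @ Xs, Xs.T @ Omega_inv @ ys)

# (b) OLS on quasi-demeaned data
theta = 1 - np.sqrt(sigma2_eps / (sigma2_eps + T * sigma2_alpha))
X_star = (X - theta * X.mean(axis=1, keepdims=True)).reshape(N * T, K)
y_star = (y - theta * y.mean(axis=1, keepdims=True)).ravel()
beta_qd = np.linalg.solve(X_star.T @ X_star, X_star.T @ y_star)
```

The two estimates coincide because I_T - \theta J_T is proportional to \Omega_i^{-1/2}, so quasi-demeaning is exactly the GLS transformation without ever forming the (NT \times NT) inverse.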

4.4.3.2 Understanding \theta

Interpretation: \theta determines how much of the unit mean to subtract.

0 \leq \theta \leq 1

Special cases:

If \theta = 0: y_{it}^* = y_{it} → No demeaning → Pooled OLS
- Occurs when \sigma_\alpha^2 = 0 (no random effects)
If \theta = 1: y_{it}^* = y_{it} - \bar{y}_i → Full demeaning → Fixed Effects
- Occurs when \sigma_\alpha^2 \to \infty or T \to \infty
If 0 < \theta < 1: Partial demeaning → True Random Effects
- Weighted average of pooled OLS and FE
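
The limiting cases can be read off a small helper. The function name below is a hypothetical choice for illustration.

```python
import math

def quasi_demeaning_theta(sigma2_alpha: float, sigma2_eps: float, T: int) -> float:
    """theta = 1 - sqrt(sigma_eps^2 / (sigma_eps^2 + T * sigma_alpha^2))."""
    return 1.0 - math.sqrt(sigma2_eps / (sigma2_eps + T * sigma2_alpha))

theta_pooled = quasi_demeaning_theta(0.0, 1.0, T=5)       # no unit effects
theta_mid = quasi_demeaning_theta(1.0, 1.0, T=3)          # equal variances, T = 3
theta_long_panel = quasi_demeaning_theta(1.0, 1.0, T=10_000)
```

With no unit-level variance the factor is exactly 0, and it approaches 1 as T grows for any positive \sigma_\alpha^2.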

4.4.3.3 Why Partial Demeaning?

Full demeaning (FE) removes all between-unit variation. RE uses both:
- Within variation: Changes over time within units
- Between variation: Differences across units

By only partially demeaning, RE preserves some between-unit information while still accounting for the serial correlation induced by \alpha_i.

4.5 Estimating Variance Components

Several methods to estimate \sigma_\alpha^2 and \sigma_\varepsilon^2:

4.5.1 Method 1: From Fixed Effects and Pooled OLS Residuals

Estimate FE model, get \hat{\sigma}_\varepsilon^2
Estimate pooled OLS, get \hat{\sigma}_u^2
Calculate: \hat{\sigma}_\alpha^2 = \hat{\sigma}_u^2 - \hat{\sigma}_\varepsilon^2, which follows from \sigma_u^2 = \sigma_\alpha^2 + \sigma_\varepsilon^2

4.5.2 Method 2: ANOVA-type Estimator

Run fixed effects, compute \hat{\sigma}_\varepsilon^2 from within residuals
Compute between variation from unit means
Estimate \hat{\sigma}_\alpha^2 from between-group sum of squares

4.5.3 Method 3: Maximum Likelihood

Assume normality and maximize: \mathcal{L}(\beta, \sigma_\alpha^2, \sigma_\varepsilon^2) = \prod_{i=1}^N f(y_i \mid X_i; \beta, \Omega_i)

4.6 Variance of the RE Estimator

Under RE assumptions: \text{Var}(\hat{\beta}_{RE} \mid X) = (X'\Omega^{-1}X)^{-1}

This is smaller than (more efficient than) FE variance when the RE assumption \text{Cov}(\alpha_i, x_{it}) = 0 holds.

4.7 Assumptions

4.7.1 Critical RE Assumptions

Orthogonality: E[\alpha_i \mid X_i] = 0 (equivalently, \text{Cov}(\alpha_i, x_{it}) = 0)
Random effects distribution: \alpha_i \sim (0, \sigma_\alpha^2)
Idiosyncratic errors: \varepsilon_{it} \sim (0, \sigma_\varepsilon^2)
Strict exogeneity: E[\varepsilon_{it} \mid X_i, \alpha_i] = 0
No correlation: \text{Cov}(\alpha_i, \varepsilon_{jt}) = 0 for all i,j,t

The orthogonality assumption (the first in the list) is the most restrictive and is what differentiates RE from FE.

4.8 Advantages and Disadvantages

4.8.1 Advantages

More efficient than FE when assumptions hold
Can estimate coefficients on time-invariant variables
Uses both within and between variation
Better for generalization to population
Computationally simpler than LSDV

4.8.2 Disadvantages

Inconsistent if \text{Cov}(\alpha_i, x_{it}) \neq 0
Requires strong orthogonality assumption
Sensitive to model specification
Less robust than FE

5 Comparing the Three Models

5.1 Summary Table

Aspect	Pooled OLS	Fixed Effects	Random Effects
Treatment of \alpha_i	Ignored	Fixed parameters	Random draws
Estimator formula	(X'X)^{-1}X'y	(X'MX)^{-1}X'My	(X'\Omega^{-1}X)^{-1}X'\Omega^{-1}y
Demeaning	None	Full (\theta=1)	Partial (0<\theta<1)
Consistency requires	\text{Cov}(\alpha_i, x_{it})=0	None (allows correlation)	\text{Cov}(\alpha_i, x_{it})=0
Time-invariant X	Can estimate	Cannot estimate	Can estimate
Efficiency	Low (if panel structure)	Medium	High (if assumptions hold)
Variation used	Total	Within only	Within + Between
Robustness	Low	High	Medium

5.2 Relationships Between Estimators

5.2.1 Nested Structure

\text{Pooled OLS} \xleftarrow[\theta=0]{\text{special case}} \text{Random Effects} \xrightarrow[\theta=1]{\text{special case}} \text{Fixed Effects}

5.2.2 Algebraic Relationships

FE and RE converge as T \to \infty: \lim_{T \to \infty} \theta = 1 \implies \hat{\beta}_{RE} \to \hat{\beta}_{FE}
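RE lies between pooled OLS and FE: with a single regressor, \hat{\beta}_{RE} = \lambda \hat{\beta}_{FE} + (1-\lambda)\hat{\beta}_{P} for some 0 < \lambda < 1. With several regressors this holds only informally, since RE is a matrix-weighted average of the within and between estimators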

5.3 Bias-Efficiency Trade-off

5.3.1 When \text{Cov}(\alpha_i, x_{it}) = 0 (RE assumption holds)

Pooled OLS: Consistent but inefficient (ignores serial correlation)
FE: Consistent but inefficient (throws away between variation)
RE: Consistent AND efficient ✓

5.3.2 When \text{Cov}(\alpha_i, x_{it}) \neq 0 (RE assumption fails)

Pooled OLS: Biased and inconsistent ✗
FE: Consistent ✓
RE: Biased and inconsistent ✗

Lesson: If in doubt, use FE for robustness.

6 Statistical Tests for Model Selection

6.1 Test 1: F-test for Fixed Effects vs Pooled OLS

6.1.1 Null and Alternative Hypotheses

H_0: \alpha_1 = \alpha_2 = \cdots = \alpha_N = 0 \quad (\text{Pooled OLS is adequate})
H_1: \text{At least one } \alpha_i \neq 0 \quad (\text{Fixed effects needed})

6.1.2 Test Statistic

F = \frac{(SSR_P - SSR_{FE})/N}{SSR_{FE}/(NT - N - K)} \sim F(N, NT-N-K)

where:
- SSR_P = sum of squared residuals from pooled OLS
- SSR_{FE} = sum of squared residuals from fixed effects

6.1.3 Decision Rule

If F > F_{\alpha} (or p < 0.05): Reject H_0 → Use fixed effects
If F \leq F_{\alpha} (or p \geq 0.05): Fail to reject → Pooled OLS adequate
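
A sketch of the computation on simulated data follows. It assumes NumPy and a design with sizeable unit effects (all numbers are illustrative), and it fits pooled OLS without an intercept so that the restriction count matches the N used in the statistic above.

```python
import numpy as np

rng = np.random.default_rng(6)
N, T, K = 25, 8, 2
alpha = rng.normal(scale=2.0, size=(N, 1))      # sizeable unit effects
X = rng.normal(size=(N, T, K))
y = X @ np.array([0.5, 1.0]) + alpha + rng.normal(size=(N, T))

Xs, ys = X.reshape(N * T, K), y.ravel()

# Restricted model: pooled OLS (all alpha_i = 0)
b_pooled = np.linalg.lstsq(Xs, ys, rcond=None)[0]
ssr_pooled = np.sum((ys - Xs @ b_pooled) ** 2)

# Unrestricted model: fixed effects via the within transformation
Xd = (X - X.mean(axis=1, keepdims=True)).reshape(N * T, K)
yd = (y - y.mean(axis=1, keepdims=True)).ravel()
b_fe = np.linalg.lstsq(Xd, yd, rcond=None)[0]
ssr_fe = np.sum((yd - Xd @ b_fe) ** 2)

df_num, df_den = N, N * T - N - K
F_stat = ((ssr_pooled - ssr_fe) / df_num) / (ssr_fe / df_den)
```

A p-value can then be obtained from the F(N, NT-N-K) distribution, for example with `scipy.stats.f.sf(F_stat, df_num, df_den)` if SciPy is available.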

6.2 Test 2: Breusch-Pagan LM Test for Random Effects

6.2.1 Null and Alternative Hypotheses

H_0: \sigma_\alpha^2 = 0 \quad (\text{No random effects, pooled OLS is adequate}) H_1: \sigma_\alpha^2 > 0 \quad (\text{Random effects exist})

6.2.2 Test Statistic

LM = \frac{NT}{2(T-1)} \left[ \frac{\sum_{i=1}^N (\sum_{t=1}^T \hat{u}_{it})^2}{\sum_{i=1}^N\sum_{t=1}^T \hat{u}_{it}^2} - 1 \right]^2 \sim \chi^2(1)

where \hat{u}_{it} are pooled OLS residuals.

6.2.3 Intuition

The test checks if residuals are more correlated within units than expected by chance.

6.2.4 Decision Rule

If LM > \chi^2_{\alpha}(1) (or p < 0.05): Reject H_0 → Random effects needed
If LM \leq \chi^2_{\alpha}(1) (or p \geq 0.05): Fail to reject → Pooled OLS adequate
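
The statistic is simple to compute from an (N, T) array of pooled OLS residuals. The sketch below uses hypothetical helper names and a simulated panel with one regressor and an intercept, and compares a design with unit effects to one without.

```python
import numpy as np

def breusch_pagan_lm(resid: np.ndarray) -> float:
    """LM statistic from an (N, T) array of pooled OLS residuals."""
    N, T = resid.shape
    ratio = np.sum(resid.sum(axis=1) ** 2) / np.sum(resid ** 2)
    return float(N * T / (2 * (T - 1)) * (ratio - 1) ** 2)

def pooled_residuals(x: np.ndarray, y: np.ndarray) -> np.ndarray:
    """Residuals from pooled OLS of y on a constant and x, reshaped to (N, T)."""
    Z = np.column_stack([np.ones(x.size), x.ravel()])
    coef = np.linalg.lstsq(Z, y.ravel(), rcond=None)[0]
    return (y.ravel() - Z @ coef).reshape(x.shape)

rng = np.random.default_rng(7)
N, T = 100, 5
x = rng.normal(size=(N, T))
eps = rng.normal(size=(N, T))
alpha = rng.normal(size=(N, 1))         # unit effects with variance 1

lm_no_effects = breusch_pagan_lm(pooled_residuals(x, 1.0 + 0.5 * x + eps))
lm_with_effects = breusch_pagan_lm(pooled_residuals(x, 1.0 + 0.5 * x + alpha + eps))
```

When unit effects are present, the within-unit residual sums are large relative to the total sum of squares, so the ratio inside the brackets moves away from 1 and the statistic grows.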

6.3 Test 3: Hausman Specification Test (Fixed vs Random Effects)

6.3.1 The Key Question

Should we use fixed effects or random effects?

6.3.2 Null and Alternative Hypotheses

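H_0: \text{Cov}(\alpha_i, x_{it}) = 0 \quad (\text{FE and RE both consistent; RE efficient})
H_1: \text{Cov}(\alpha_i, x_{it}) \neq 0 \quad (\text{only FE consistent})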

6.3.3 Intuition

Under H_0: Both FE and RE are consistent, but RE is more efficient
Under H_1: Only FE is consistent, RE is biased

If the two estimators differ significantly, it suggests H_1 is true.

6.3.4 Test Statistic

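H = (\hat{\beta}_{FE} - \hat{\beta}_{RE})' [\text{Var}(\hat{\beta}_{FE}) - \text{Var}(\hat{\beta}_{RE})]^{-1} (\hat{\beta}_{FE} - \hat{\beta}_{RE}) \sim \chi^2(K)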

where K is the number of time-varying regressors.

6.3.5 Components

Difference vector: \hat{q} = \hat{\beta}_{FE} - \hat{\beta}_{RE}

Variance of difference: \text{Var}(\hat{q}) = \text{Var}(\hat{\beta}_{FE}) - \text{Var}(\hat{\beta}_{RE})

Under H_0, \text{Var}(\hat{q}) is positive semi-definite.

6.3.6 Decision Rule

If H > \chi^2_{\alpha}(K) (or p < 0.05): Reject H_0 → Use Fixed Effects
If H \leq \chi^2_{\alpha}(K) (or p \geq 0.05): Fail to reject → Use Random Effects

6.3.7 Practical Interpretation

Small Hausman statistic / Large p-value:
- \hat{\beta}_{FE} \approx \hat{\beta}_{RE}
- Suggests orthogonality assumption may hold
- RE preferred for efficiency

Large Hausman statistic / Small p-value:
- \hat{\beta}_{FE} and \hat{\beta}_{RE} differ substantially
- Suggests correlation between \alpha_i and x_{it}
- FE preferred for consistency
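
The behaviour of the statistic can be sketched on two simulated panels, one where \alpha_i is independent of the regressors and one where it is not. The helper names and the design are hypothetical, and the variance components are treated as known (both equal to 1) to keep the example short. A real application would estimate them.

```python
import numpy as np

rng = np.random.default_rng(8)
N, T, K = 200, 5, 2

def simulate(corr: float):
    """Panel with x_it = noise + corr * alpha_i; variance components both equal 1."""
    alpha = rng.normal(size=(N, 1))
    X = rng.normal(size=(N, T, K)) + corr * alpha[:, :, None]
    y = X @ np.array([1.0, 1.0]) + alpha + rng.normal(size=(N, T))
    return X, y

def hausman(X, y, sigma2_alpha=1.0, sigma2_eps=1.0) -> float:
    n, t, k = X.shape
    theta = 1 - np.sqrt(sigma2_eps / (sigma2_eps + t * sigma2_alpha))
    est = {}
    for name, w in (("fe", 1.0), ("re", theta)):
        # w = 1 gives full demeaning (FE); w = theta gives quasi-demeaning (RE)
        Xw = (X - w * X.mean(axis=1, keepdims=True)).reshape(n * t, k)
        yw = (y - w * y.mean(axis=1, keepdims=True)).ravel()
        A = Xw.T @ Xw
        est[name] = (np.linalg.solve(A, Xw.T @ yw), sigma2_eps * np.linalg.inv(A))
    q = est["fe"][0] - est["re"][0]
    V_q = est["fe"][1] - est["re"][1]
    return float(q @ np.linalg.solve(V_q, q))

H_null = hausman(*simulate(corr=0.0))   # orthogonality holds
H_alt = hausman(*simulate(corr=0.5))    # alpha_i correlated with x_it
```

Under orthogonality the statistic behaves like a \chi^2(K) draw, while in the correlated design the gap between the two estimators drives it far above the 5% critical value of about 5.99 for K = 2.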

6.4 Complete Decision Framework

6.4.1 Step-by-Step Model Selection

START
  │
  ├─→ Run Pooled OLS
  │
  ├─→ Test 1: F-test for Fixed Effects
  │    ├─ Reject H₀? → Panel structure exists, continue
  │    └─ Fail to reject? → Use Pooled OLS (with robust SE)
  │
  ├─→ Test 2: Breusch-Pagan LM test
  │    ├─ Reject H₀? → Random effects exist, continue
  │    └─ Fail to reject? → Use Pooled OLS
  │
  ├─→ Test 3: Hausman Test
  │    ├─ Reject H₀? → Use FIXED EFFECTS ✓
  │    └─ Fail to reject? → Use RANDOM EFFECTS ✓
  │
END

6.4.2 Theoretical Considerations

Beyond statistical tests, consider:

Nature of sample:
- Specific units of interest (e.g., G7 countries) → FE
- Random sample from population → RE
Research question:
- Need time-invariant effects? → RE or Pooled
- Focus on within-unit changes? → FE
Data structure:
- Large N, small T → FE often appropriate
- Small N, large T → Both work, but consider dynamics
Plausibility of orthogonality:
- Can \alpha_i realistically be uncorrelated with X? Usually no → FE
- Example: Individual ability affects both education and wages → FE

7 Additional Diagnostic Tests

7.1 Serial Correlation Tests

7.1.1 Why It Matters

Panel data often exhibits serial correlation in \varepsilon_{it}, violating the assumption: \text{Cov}(\varepsilon_{it}, \varepsilon_{is}) = 0 \text{ for } t \neq s

7.1.2 Wooldridge Test for Serial Correlation

Null hypothesis: H_0: No first-order serial correlation

Test procedure:
1. Estimate the FE model and obtain the within residuals \hat{\varepsilon}_{it}
2. Regress \hat{\varepsilon}_{it} on \hat{\varepsilon}_{i,t-1}
3. Test whether the coefficient on \hat{\varepsilon}_{i,t-1} equals -1/(T-1), its value under H_0 (within residuals are negatively correlated by construction, so the benchmark is not zero)

Decision rule:
- If p < 0.05 → Serial correlation present
- Use clustered standard errors or Driscoll-Kraay SEs

7.2 Heteroskedasticity Tests

7.2.1 Modified Wald Test for Groupwise Heteroskedasticity

Null hypothesis: H_0: \sigma_{\varepsilon_i}^2 = \sigma_\varepsilon^2 for all i

Tests whether error variance differs across units.

Decision rule: - If p < 0.05 → Use robust/clustered standard errors

7.3 Cross-Sectional Dependence

7.3.1 Pesaran CD Test

Tests whether residuals are correlated across different units: H_0: \text{Cov}(\varepsilon_{it}, \varepsilon_{jt}) = 0 \text{ for } i \neq j

Implications if rejected:
- Spatial correlation or common shocks
- May need Driscoll-Kraay SEs or spatial panel models

8 Robust Inference in Panel Data

8.1 Clustered Standard Errors

8.1.1 Why Cluster?

Even after accounting for \alpha_i, errors may be correlated within units: \text{Cov}(\varepsilon_{it}, \varepsilon_{is}) \neq 0

8.1.2 Cluster-Robust Variance Estimator

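For FE: \widehat{\text{Var}}_{cluster}(\hat{\beta}_{FE}) = (X'MX)^{-1} \left(\sum_{i=1}^N X_i' M_T \hat{\varepsilon}_i \hat{\varepsilon}_i' M_T X_i\right) (X'MX)^{-1}

where \hat{\varepsilon}_i is the (T \times 1) vector of FE residuals for unit i.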

This allows arbitrary correlation within clusters (units).
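
A minimal sketch of the sandwich formula for the FE estimator, clustering by unit, is given below. The function name is hypothetical, the data are simulated, and no finite-sample correction is applied, although software packages typically add one.

```python
import numpy as np

def fe_cluster_vcov(X: np.ndarray, y: np.ndarray):
    """X: (N, T, K), y: (N, T). Returns (beta_fe, cluster-robust covariance)."""
    Xd = X - X.mean(axis=1, keepdims=True)          # M_T X_i for every unit
    yd = y - y.mean(axis=1, keepdims=True)
    bread = np.einsum("itk,itl->kl", Xd, Xd)        # X'MX
    beta = np.linalg.solve(bread, np.einsum("itk,it->k", Xd, yd))
    e = yd - Xd @ beta                              # within residuals, shape (N, T)
    scores = np.einsum("itk,it->ik", Xd, e)         # row i holds X_i' M_T e_i
    meat = scores.T @ scores                        # sum over units of s_i s_i'
    bread_inv = np.linalg.inv(bread)
    return beta, bread_inv @ meat @ bread_inv

rng = np.random.default_rng(9)
N, T, K = 100, 6, 2
X = rng.normal(size=(N, T, K))
alpha = rng.normal(size=(N, 1))
y = X @ np.array([1.0, -1.0]) + alpha + rng.normal(size=(N, T))

beta_fe, V_cluster = fe_cluster_vcov(X, y)
se_cluster = np.sqrt(np.diag(V_cluster))
```

Summing the outer products of per-unit scores, rather than per-observation scores, is what lets the errors be arbitrarily correlated within a unit.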

8.1.3 When to Use

Default for panel data
Especially important if T is small
Conservative approach

8.2 Driscoll-Kraay Standard Errors

8.2.1 For Spatial and Temporal Correlation

Allows for:
- Serial correlation within units
- Cross-sectional correlation across units

Most useful when T is large.

8.2.2 Variance Estimator

\widehat{\text{Var}}_{DK}(\hat{\beta}) = (X'X)^{-1} \hat{S}_{DK} (X'X)^{-1}

where \hat{S}_{DK} accounts for both serial and spatial correlation.

9 Dynamic Panel Data Models

9.1 The Basic Dynamic Model

y_{it} = \rho y_{i,t-1} + x_{it}'\beta + \alpha_i + \varepsilon_{it}

Problem: The lagged dependent variable y_{i,t-1} is correlated with \alpha_i, making both FE and RE inconsistent.

9.2 Why Standard Estimators Fail

9.2.1 Correlation Structure

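y_{i,t-1} = \rho y_{i,t-2} + x_{i,t-1}'\beta + \alpha_i + \varepsilon_{i,t-1}

Since y_{i,t-1} contains \alpha_i: \text{Cov}(y_{i,t-1}, \alpha_i) \neq 0

This makes RE inconsistent. FE removes \alpha_i, but the demeaned lag is correlated with the demeaned error \varepsilon_{it} - \bar{\varepsilon}_i, because \bar{\varepsilon}_i contains \varepsilon_{i,t-1}. Strict exogeneity is therefore violated even for FE.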

9.2.2 The FE Bias

With fixed T, the FE estimator of \rho is biased: \hat{\rho}_{FE} - \rho = O(1/T)

For small T (common in micro panels), this bias can be large.
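
The size of this bias for short panels can be seen in a simulation. The sketch assumes a pure autoregressive model without x_{it}, \rho = 0.5, T = 5, and a burn-in period so that the process starts near its stationary distribution. All of these are illustrative choices.

```python
import numpy as np

rng = np.random.default_rng(10)
N, T, rho = 5000, 5, 0.5
burn = 50                               # burn-in periods, discarded below

alpha = rng.normal(size=N)
y = np.zeros((N, burn + T + 1))
for t in range(1, burn + T + 1):
    y[:, t] = rho * y[:, t - 1] + alpha + rng.normal(size=N)
y = y[:, burn:]                         # columns are y_0, ..., y_T

y_lag, y_cur = y[:, :-1], y[:, 1:]      # T usable periods per unit

# Within (FE) estimator of rho: demean both series by unit, then OLS
yl = y_lag - y_lag.mean(axis=1, keepdims=True)
yc = y_cur - y_cur.mean(axis=1, keepdims=True)
rho_fe = np.sum(yl * yc) / np.sum(yl ** 2)
```

Even with N = 5000 the estimate stays well below 0.5, which illustrates that the bias depends on T and does not shrink as N grows.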

9.3 Solutions: GMM Estimators

9.3.1 Arellano-Bond (First-Difference GMM)

First-difference the equation: \Delta y_{it} = \rho \Delta y_{i,t-1} + \Delta x_{it}'\beta + \Delta\varepsilon_{it}

Use y_{i,t-2}, y_{i,t-3}, \ldots as instruments for \Delta y_{i,t-1}.
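
The instrumenting idea can be sketched with the simplest just-identified version, which uses only y_{i,t-2} as an instrument for \Delta y_{i,t-1}. This is the Anderson-Hsiao IV estimator rather than the full Arellano-Bond GMM estimator, which stacks all available lags and uses a weighting matrix. The simulated design (no x_{it}, \rho = 0.5) is an assumption for illustration.

```python
import numpy as np

rng = np.random.default_rng(13)
N, T, rho = 5000, 6, 0.5
burn = 50                               # burn-in periods, discarded below

alpha = rng.normal(size=N)
y = np.zeros((N, burn + T + 1))
for t in range(1, burn + T + 1):
    y[:, t] = rho * y[:, t - 1] + alpha + rng.normal(size=N)
y = y[:, burn:]                         # columns are y_0, ..., y_T

dy = np.diff(y, axis=1)                 # column j holds Delta y_{j+1}
dep = dy[:, 1:]                         # Delta y_t      for t = 2, ..., T
endog = dy[:, :-1]                      # Delta y_{t-1}  for t = 2, ..., T
instr = y[:, :-2]                       # y_{t-2}        for t = 2, ..., T

# Just-identified IV: sum(z * Delta y_t) / sum(z * Delta y_{t-1})
rho_iv = np.sum(instr * dep) / np.sum(instr * endog)
```

First-differencing removes \alpha_i, and y_{i,t-2} is uncorrelated with \Delta\varepsilon_{it} when the errors are serially uncorrelated, so the IV estimate should land near the true \rho.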

9.3.2 Arellano-Bover/Blundell-Bond (System GMM)

Combines:
- First-differenced equations (with lagged levels as instruments)
- Level equations (with lagged differences as instruments)

More efficient than AB when \rho is close to 1.

10 Extensions and Advanced Topics

10.1 Two-Way Fixed Effects

10.1.1 Model with Time Fixed Effects

y_{it} = x_{it}'\beta + \alpha_i + \lambda_t + \varepsilon_{it}

where:
- \alpha_i = unit fixed effects
- \lambda_t = time fixed effects (common shocks)

10.1.2 Estimation

Include both unit and time dummies, or double-demean: \ddot{y}_{it} = y_{it} - \bar{y}_i - \bar{y}_t + \bar{\bar{y}}
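
For a balanced panel the two routes give the same slope, which the sketch below checks on simulated data with one regressor. The design is hypothetical, and one time dummy is dropped in the dummy-variable regression to avoid perfect collinearity with the unit dummies.

```python
import numpy as np

rng = np.random.default_rng(11)
N, T = 20, 6
alpha = rng.normal(size=(N, 1))         # unit effects
lam = rng.normal(size=(1, T))           # time effects
x = rng.normal(size=(N, T)) + alpha + lam
y = 2.0 * x + alpha + lam + rng.normal(size=(N, T))

def double_demean(a: np.ndarray) -> np.ndarray:
    """Subtract unit means and time means, then add back the grand mean."""
    return a - a.mean(axis=1, keepdims=True) - a.mean(axis=0, keepdims=True) + a.mean()

xdd, ydd = double_demean(x), double_demean(y)
beta_double_demean = np.sum(xdd * ydd) / np.sum(xdd ** 2)

# Dummy-variable route: N unit dummies plus T - 1 time dummies
D_unit = np.kron(np.eye(N), np.ones((T, 1)))
D_time = np.kron(np.ones((N, 1)), np.eye(T))[:, 1:]
Z = np.column_stack([x.ravel(), D_unit, D_time])
beta_dummies = np.linalg.lstsq(Z, y.ravel(), rcond=None)[0][0]
```

The equality relies on the panel being balanced. With missing cells, simple double-demeaning no longer reproduces the dummy-variable estimate.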

10.1.3 When to Use

Control for aggregate time shocks (recessions, policy changes)
Standard in difference-in-differences applications

10.2 Unbalanced Panels

10.2.1 Missing Data Structure

Not all units observed in all time periods:
- T_i varies across units
- Total observations: n = \sum_{i=1}^N T_i

10.2.2 Implications

Pooled OLS and RE: Straightforward to adapt
FE: Still consistent, but demeaning uses T_i for each unit
Hausman test may be affected

10.2.3 Handling

Most software handles automatically, but check:
- Missingness mechanism (MAR vs MNAR)
- Impact on variance estimates

10.3 Instrumental Variables in Panel Data

10.3.1 Model

y_{it} = x_{it}'\beta + z_{it}'\gamma + \alpha_i + \varepsilon_{it}

where z_{it} is endogenous: \text{Cov}(z_{it}, \varepsilon_{it}) \neq 0

10.3.2 FE-2SLS

Demean all variables (including instruments)
Apply 2SLS to demeaned data

10.3.3 Requires

Valid instruments w_{it} such that:
- Relevance: \text{Cov}(w_{it}, z_{it}) \neq 0
- Exogeneity: \text{Cov}(w_{it}, \varepsilon_{it}) = 0

12 Comparison with Alternative Methods

12.1 Panel Data vs Cross-Sectional Regression

Aspect	Cross-Section	Panel Data
Observations	N units, 1 time	N units, T times
Unobserved heterogeneity	Cannot control	FE/RE control
Efficiency	Lower (less data)	Higher
Dynamics	Cannot study	Can model
Identification	Weaker	Stronger

12.2 Panel Data vs Time Series

Aspect	Time Series	Panel Data
Units	1 unit, T times	N units, T times
Asymptotics	T → ∞	N → ∞ (usually)
Unobserved effects	Cannot separate from trend	Can separate
Degrees of freedom	Limited	Abundant

13 Summary and Recommendations

13.1 When to Use Each Model

13.1.1 Use Pooled OLS When:

No panel structure detected (F-test and BP-LM both fail to reject)
Truly independent observations
Time-invariant effects are crucial and plausibly exogenous

13.1.2 Use Fixed Effects When:

\text{Cov}(\alpha_i, x_{it}) likely non-zero (most cases)
Focus on within-unit changes
Robustness is priority
Don’t need time-invariant coefficients

13.1.3 Use Random Effects When:

Hausman test fails to reject
Need time-invariant coefficients
Sample is random draw from population
\text{Cov}(\alpha_i, x_{it}) = 0 is plausible

13.2 General Advice

Default to FE for robustness in most economic applications
Always use robust/clustered standard errors
Run all three models and specification tests for comparison
Consider the economics not just the statistics
Check diagnostics after selecting model

14 Mathematical Appendix

14.1 Kronecker Product Properties

For matrices A (m × n) and B (p × q):

A \otimes B = \begin{bmatrix} a_{11}B & a_{12}B & \cdots & a_{1n}B \\ a_{21}B & a_{22}B & \cdots & a_{2n}B \\ \vdots & \vdots & \ddots & \vdots \\ a_{m1}B & a_{m2}B & \cdots & a_{mn}B \end{bmatrix}_{mp \times nq}

Properties:
1. (A \otimes B)(C \otimes D) = (AC) \otimes (BD)
2. (A \otimes B)' = A' \otimes B'
3. (A \otimes B)^{-1} = A^{-1} \otimes B^{-1} (if invertible)
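
The three properties can be spot-checked with `numpy.kron` on small random matrices. The shapes below are arbitrary choices that make the products conformable.

```python
import numpy as np

rng = np.random.default_rng(12)
A, C = rng.normal(size=(2, 3)), rng.normal(size=(3, 2))
B, D = rng.normal(size=(4, 2)), rng.normal(size=(2, 4))

# 1. Mixed-product property
mixed_ok = np.allclose(np.kron(A, B) @ np.kron(C, D), np.kron(A @ C, B @ D))

# 2. Transpose distributes over the Kronecker product
transpose_ok = np.allclose(np.kron(A, B).T, np.kron(A.T, B.T))

# 3. Inverse, using well-conditioned square matrices
P = 2.0 * np.eye(3) + 0.1 * rng.normal(size=(3, 3))
Q = 2.0 * np.eye(2) + 0.1 * rng.normal(size=(2, 2))
inverse_ok = np.allclose(
    np.linalg.inv(np.kron(P, Q)),
    np.kron(np.linalg.inv(P), np.linalg.inv(Q)),
)
```

The mixed-product property is the one used repeatedly in the main text, for example to show that M = I_N \otimes M_T is idempotent.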

14.2 Matrix Differentiation Rules

For \beta as a column vector:

\frac{\partial (a'\beta)}{\partial \beta} = a
\frac{\partial (\beta'A\beta)}{\partial \beta} = (A + A')\beta
\frac{\partial (y - X\beta)'(y - X\beta)}{\partial \beta} = -2X'(y - X\beta)

14.3 Projection Matrix Properties

For any orthogonal projection matrix P:

Symmetric: P = P'
Idempotent: P^2 = P
Eigenvalues: Either 0 or 1
Complement: M = I - P is also a projection matrix

15 References and Further Reading

15.1 Essential Textbooks

Wooldridge, J. M. (2010). Econometric Analysis of Cross Section and Panel Data (2nd ed.). MIT Press.
- Comprehensive treatment of panel data methods
- Focuses on microeconometrics applications
Baltagi, B. H. (2021). Econometric Analysis of Panel Data (6th ed.). Springer.
- Detailed mathematical exposition
- Covers advanced topics
Hsiao, C. (2014). Analysis of Panel Data (3rd ed.). Cambridge University Press.
- Theoretical foundations
- Economic applications

15.2 Software Resources

Stata: xtreg, xttest0, hausman, xtserial
R: plm package, lfe package for high-dimensional FE
Python: linearmodels package
Eviews: Built-in panel data procedures

15.3 Online Resources

Econometrics Academy (YouTube channel)
NBER Summer Institute lectures
World Bank’s Impact Evaluation resources

16 Glossary of Terms

Balanced panel: Every unit observed in every time period

Composite error: u_{it} = \alpha_i + \varepsilon_{it}

Demeaning: Subtracting group means from variables

Endogeneity: Correlation between regressor and error term

Fixed effect: Unit-specific intercept treated as parameter

Idiosyncratic error: Time-varying, unit-specific shock \varepsilon_{it}

Intraclass correlation: Correlation of observations within same unit

Panel data: Dataset with both cross-sectional and time dimensions

Quasi-demeaning: Partial demeaning with factor 0 < \theta < 1

Random effect: Unit-specific term treated as random draw

Strict exogeneity: E[\varepsilon_{it} \mid X_i, \alpha_i] = 0 for all t

Unbalanced panel: Units observed different numbers of times

Within transformation: Demeaning data to remove fixed effects
