1 Introduction

This document demonstrates the use of the plm package in R for panel data analysis. Panel data, also known as longitudinal data, contains observations on multiple entities (individuals, firms, countries) over multiple time periods.

2 Required Libraries

First, let’s load the required packages:

library(plm)
library(stargazer)
library(sjPlot)
library(ggplot2)
library(dplyr)
library(gt)

3 Data

We’ll use the Grunfeld dataset from the plm package, which contains annual investment data for 10 US firms from 1935 to 1954.

data("Grunfeld", package = "plm")
gt(head(Grunfeld))
firm  year    inv   value  capital
   1  1935  317.6  3078.5      2.8
   1  1936  391.8  4661.7     52.6
   1  1937  410.6  5387.1    156.9
   1  1938  257.7  2792.2    209.2
   1  1939  330.8  4313.2    203.4
   1  1940  461.2  4643.9    207.2

# Dataset structure
cat("Dataset dimensions:", nrow(Grunfeld), "rows x", ncol(Grunfeld), "columns\n")

Dataset dimensions: 200 rows x 5 columns

cat("Number of firms:", length(unique(Grunfeld$firm)), "\n")

Number of firms: 10

cat("Time period:", min(Grunfeld$year), "-", max(Grunfeld$year), "\n")

Time period: 1935 - 1954

cat("Number of time periods:", length(unique(Grunfeld$year)), "\n")

Number of time periods: 20
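
The counts above imply a balanced panel (10 firms x 20 years = 200 rows). As a minimal sketch, assuming the firm and year columns identify the entity and time dimensions, this can be checked directly with pdata.frame(), pdim() and is.pbalanced() from plm. The object names grunfeld_panel and dims are illustrative only.

```r
library(plm)
data("Grunfeld", package = "plm")

# Declare the panel structure explicitly: firm = entity, year = time
grunfeld_panel <- pdata.frame(Grunfeld, index = c("firm", "year"))

# Panel dimensions: number of entities (n), time periods (T), observations (N)
dims <- pdim(grunfeld_panel)
print(dims)

# TRUE when every firm is observed in every year
is.pbalanced(grunfeld_panel)
```

Declaring the index explicitly avoids relying on plm's default of treating the first two columns as the entity and time identifiers.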

4 Model Specification

We’ll estimate several panel data models to examine the relationship between investment (inv) and its determinants: the market value of the firm (value) and its capital stock (capital).

  1. Pooled OLS Model

The pooled OLS (Ordinary Least Squares) model treats all observations as independent, ignoring the panel structure of the data. It assumes no individual or time-specific effects.

Key characteristics:

  • Combines all data into a single pool
  • Assumes constant intercept across all entities and time periods
  • Ignores heterogeneity between entities
  • Uses standard OLS estimation

Model specification:

\(y_{it} = \alpha + \beta X_{it} + \varepsilon_{it}\)

Where:

\(y_{it}\) is the dependent variable for entity \(i\) at time \(t\)

\(\alpha\) is the common intercept

\(X_{it}\) are the explanatory variables

\(\varepsilon_{it}\) is the error term
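
Because pooled OLS ignores the panel structure, it should give the same coefficients as an ordinary lm() fit on the stacked data. The sketch below illustrates this on the Grunfeld data; the object names ols_fit and pooled_fit are illustrative.

```r
library(plm)
data("Grunfeld", package = "plm")

# Ordinary least squares on the stacked data, ignoring firm and year
ols_fit <- lm(inv ~ value + capital, data = Grunfeld)

# The same specification estimated as a pooling model in plm
pooled_fit <- plm(inv ~ value + capital, data = Grunfeld, model = "pooling")

# Compare intercept and slopes from the two fits
cbind(lm = coef(ols_fit), plm_pooling = coef(pooled_fit))
all.equal(unname(coef(ols_fit)), unname(coef(pooled_fit)))
```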

  2. Fixed Effects Model (FEM)

The fixed effects model accounts for individual-specific characteristics that may be correlated with the explanatory variables. It allows each entity to have its own intercept.

Key characteristics:

  • Controls for time-invariant individual characteristics

  • Uses within-estimator (de-meaning approach)
  • Eliminates bias from omitted variables that are constant over time
  • Cannot estimate coefficients for time-invariant variables

Model specification:

\(y_{it} = \alpha_i + \beta X_{it} + \varepsilon_{it}\)

Where:

\(\alpha_i\) represents entity-specific fixed effects
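
The within estimator can be reproduced by hand: subtract each firm's mean from every variable, then run OLS without an intercept on the demeaned data. The following is a minimal sketch of that de-meaning step; the helper demean() and the object names are illustrative and not part of plm.

```r
library(plm)
data("Grunfeld", package = "plm")

# Subtract the group (firm) mean from each observation
demean <- function(x, group) x - ave(x, group)

grunfeld_within <- data.frame(
  inv     = demean(Grunfeld$inv,     Grunfeld$firm),
  value   = demean(Grunfeld$value,   Grunfeld$firm),
  capital = demean(Grunfeld$capital, Grunfeld$firm)
)

# OLS on demeaned data, no intercept: the firm effects have been swept out
fe_manual <- lm(inv ~ value + capital - 1, data = grunfeld_within)

# Within estimator from plm for comparison
fe_plm <- plm(inv ~ value + capital, data = Grunfeld, model = "within")

all.equal(unname(coef(fe_manual)), unname(coef(fe_plm)))

# The estimated firm-specific intercepts (alpha_i) can be recovered with fixef()
fixef(fe_plm)
```

The slope estimates agree, but the standard errors reported by lm() on the demeaned data are not directly comparable because lm() does not account for the degrees of freedom used by the firm means.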

  3. Random Effects Model (REM)

The random effects model treats individual-specific effects as random draws from a larger population. It assumes these effects are uncorrelated with the explanatory variables.

Key characteristics:

  • More efficient than fixed effects when assumptions are met
  • Allows estimation of time-invariant variables
  • Uses generalized least squares (GLS) estimation
  • Assumes individual effects are uncorrelated with explanatory variables

Model specification:

\(y_{it} = \alpha + \beta X_{it} + \mu_i + \varepsilon_{it}\)

Where:

\(\mu_i\) represents random individual effects
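
The GLS estimation behind the random effects model amounts to a partial de-meaning of the data, governed by a weight theta that depends on the two error variance components. As a hedged sketch, assuming plm's default random effects method and a balanced panel, the components can be inspected with ercomp() and theta recomputed from them. The object names re_fit, ec and theta_manual are illustrative.

```r
library(plm)
data("Grunfeld", package = "plm")

re_fit <- plm(inv ~ value + capital, data = Grunfeld, model = "random")

# Estimated variance components: idiosyncratic (epsilon_it) and individual (mu_i)
ec <- ercomp(re_fit)
print(ec)

# For a balanced panel with T periods per firm:
# theta = 1 - sqrt(sigma2_idios / (sigma2_idios + T * sigma2_id))
n_periods <- length(unique(Grunfeld$year))
sigma2_idios <- unname(ec$sigma2["idios"])
sigma2_id    <- unname(ec$sigma2["id"])
theta_manual <- 1 - sqrt(sigma2_idios / (sigma2_idios + n_periods * sigma2_id))

c(theta_plm = as.numeric(ec$theta), theta_manual = theta_manual)
```

A theta near 0 makes the estimator behave like pooled OLS, while a theta near 1 makes it behave like the within estimator.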

pooled <- plm(inv ~ value + capital, data = Grunfeld, model = "pooling")

fixed <- plm(inv ~ value + capital, data = Grunfeld, model = "within")

random <- plm(inv ~ value + capital, data = Grunfeld, model = "random")

5 Results

5.1 Model Comparison Table

stargazer(pooled, fixed, random, 
          type = "html",
          title = "Panel Data Model Comparison",
          column.labels = c("Pooled OLS", "Fixed Effects", "Random Effects"),
          dep.var.labels = "Investment",
          covariate.labels = c("Value", "Capital Stock"))

Panel Data Model Comparison

Dependent variable: Investment

                 Pooled OLS                 Fixed Effects              Random Effects
                 (1)                        (2)                        (3)
Value            0.116***                   0.110***                   0.110***
                 (0.006)                    (0.012)                    (0.010)
Capital Stock    0.231***                   0.310***                   0.308***
                 (0.025)                    (0.017)                    (0.017)
Constant         -42.714***                                            -57.834**
                 (9.512)                                               (28.899)
Observations     200                        200                        200
R2               0.812                      0.767                      0.770
Adjusted R2      0.811                      0.753                      0.767
F Statistic      426.576*** (df = 2; 197)   309.014*** (df = 2; 188)   657.674***

Note: *p<0.1; **p<0.05; ***p<0.01

5.2 Fixed Effects Model Details

# Summary of fixed effects model
summary(fixed)

Oneway (individual) effect Within Model

Call:
plm(formula = inv ~ value + capital, data = Grunfeld, model = "within")

Balanced Panel: n = 10, T = 20, N = 200

Residuals:
      Min.    1st Qu.     Median    3rd Qu.       Max.
-184.00857  -17.64316    0.56337   19.19222  250.70974

Coefficients:
        Estimate Std. Error t-value  Pr(>|t|)
value   0.110124   0.011857  9.2879 < 2.2e-16 ***
capital 0.310065   0.017355 17.8666 < 2.2e-16 ***
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Total Sum of Squares:    2244400
Residual Sum of Squares: 523480
R-Squared:      0.76676
Adj. R-Squared: 0.75311
F-statistic: 309.014 on 2 and 188 DF, p-value: < 2.22e-16

5.3 Diagnostic Tests

# Hausman test to choose between fixed and random effects
hausman_test <- phtest(fixed, random)
cat("Hausman Test p-value:", round(hausman_test$p.value, 4), "\n")

Hausman Test p-value: 0.3119
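
The Hausman statistic compares the fixed and random effects slope estimates, weighting their difference by the difference of the two covariance matrices. The sketch below recomputes it by hand for the coefficients the two models share; the object names fe_fit, re_fit, b_diff and v_diff are illustrative.

```r
library(plm)
data("Grunfeld", package = "plm")

fe_fit <- plm(inv ~ value + capital, data = Grunfeld, model = "within")
re_fit <- plm(inv ~ value + capital, data = Grunfeld, model = "random")

# Coefficients present in both models (the within model has no intercept)
common <- names(coef(fe_fit))

# Difference in estimates and in their covariance matrices
b_diff <- coef(fe_fit) - coef(re_fit)[common]
v_diff <- vcov(fe_fit) - vcov(re_fit)[common, common]

# Quadratic form, chi-squared with one degree of freedom per shared coefficient
h_stat <- as.numeric(t(b_diff) %*% solve(v_diff) %*% b_diff)
h_pval <- pchisq(h_stat, df = length(b_diff), lower.tail = FALSE)

c(statistic = h_stat, p_value = h_pval)
```

A large statistic would indicate that the two sets of estimates differ systematically, which is evidence against the random effects assumption.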


# Breusch-Pagan test for random effects
bp_test <- plmtest(pooled, type = "bp")
cat("Breusch-Pagan Test p-value:", round(bp_test$p.value, 4), "\n")

Breusch-Pagan Test p-value: 0

6 Visualization

# Create enhanced effect plot function
plot_plm_effect <- function(model, term, data = NULL) {
  if(is.null(data)) data <- model.frame(model)
  
  response_var <- all.vars(formula(model))[1]
  term_vals <- seq(min(data[[term]], na.rm = TRUE),
                  max(data[[term]], na.rm = TRUE),
                  length.out = 100)
  
  # Create prediction grid
  pred_grid <- data.frame(
    temp_var = term_vals
  )
  names(pred_grid) <- term
  
  # Add other variables at their means/modes
  other_vars <- setdiff(all.vars(formula(model))[-1], term)
  for(var in other_vars) {
    if(is.numeric(data[[var]])) {
      pred_grid[[var]] <- mean(data[[var]], na.rm = TRUE)
    } else {
      # For factors, use the most common level
      pred_grid[[var]] <- names(sort(table(data[[var]]), decreasing = TRUE))[1]
    }
  }
  
  # Predict
  pred_grid$prediction <- predict(model, newdata = pred_grid)
  
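  # Create plot. The ribbon spans +/- one standard deviation of the predicted
  # values; it is a visual guide only, not a confidence interval.
  # aes_string() is deprecated in ggplot2, so the .data pronoun is used instead.
  p <- ggplot(pred_grid, aes(x = .data[[term]], y = .data[["prediction"]])) +
    geom_line(color = "darkblue", linewidth = 1.2) +
    geom_ribbon(aes(ymin = prediction - sd(prediction, na.rm = TRUE),
                    ymax = prediction + sd(prediction, na.rm = TRUE)),
                alpha = 0.2, fill = "blue") +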
    labs(title = paste("Marginal Effect of", term, "on", response_var),
         x = term,
         y = paste("Predicted", response_var)) +
    theme_minimal() +
    theme(plot.title = element_text(hjust = 0.5))
  
  return(p)
}

# Generate the plot
effect_plot <- plot_plm_effect(fixed, "value")
print(effect_plot)

7 Interpretation of Results

7.1 Model Selection

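The Hausman test (p-value = 0.3119) does not reject the null hypothesis that the individual effects are uncorrelated with the explanatory variables. This suggests the random effects estimator is consistent and, being more efficient than fixed effects, may be the appropriate choice.

The Breusch-Pagan Lagrange multiplier test (p-value below 0.0001, printed as 0 after rounding to four decimals) rejects the null hypothesis of no individual effects, so random effects is preferred over pooled OLS.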

7.2 Coefficient Interpretation

In the fixed effects model:

  • A one-unit increase in value is associated with a 0.1101 unit increase in investment, holding capital stock constant.

  • A one-unit increase in capital stock is associated with a 0.3101 unit increase in investment, holding value constant.

The fixed effects model explains approximately 76.7% of the within variance.

7.3 Model Fit

The F-statistic for the fixed effects model is 309.01 on 2 and 188 degrees of freedom, with a p-value below 2.2e-16, indicating that value and capital are jointly statistically significant.
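
The within R-squared and the F-statistic quoted above can be recomputed from the sums of squares of the firm-demeaned data. This is a minimal sketch of that arithmetic; the object names fe_fit, tss, rss and the derived quantities are illustrative.

```r
library(plm)
data("Grunfeld", package = "plm")

fe_fit <- plm(inv ~ value + capital, data = Grunfeld, model = "within")

# Total sum of squares of investment after removing each firm's mean
inv_within <- Grunfeld$inv - ave(Grunfeld$inv, Grunfeld$firm)
tss <- sum(inv_within^2)

# Residual sum of squares from the within model
rss <- sum(as.numeric(residuals(fe_fit))^2)

k <- length(coef(fe_fit))        # number of slope coefficients
df_resid <- df.residual(fe_fit)  # N - n - k = 200 - 10 - 2

# Share of within-firm variation explained by the regressors
r2_within <- 1 - rss / tss

# Joint test that all slope coefficients are zero
f_stat <- ((tss - rss) / k) / (rss / df_resid)
f_pval <- pf(f_stat, k, df_resid, lower.tail = FALSE)

c(r2_within = r2_within, f_stat = f_stat, f_pval = f_pval)
```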

8 Conclusion

This analysis demonstrates the use of plm for panel data regression. The random effects model appears to be the most appropriate for this dataset: the Breusch-Pagan test favors it over pooled OLS, and the Hausman test does not favor fixed effects over it. The results show that both value and capital stock are positively and significantly associated with investment.

The plm package provides comprehensive tools for panel data analysis, including various estimation methods and diagnostic tests to help select the most appropriate model for the data structure.