1 Introduction

This document demonstrates the use of the plm package in R for panel data analysis. Panel data, also known as longitudinal data, contains observations on multiple entities (individuals, firms, countries) over multiple time periods.

2 Required Libraries

First, let’s load the required packages:

library(plm)
library(stargazer)
library(sjPlot)
library(ggplot2)
library(dplyr)
library(gt)

3 Data

We’ll use the Grunfeld dataset from the plm package, which contains annual investment data for 10 US firms from 1935 to 1954.

data("Grunfeld", package = "plm")
gt(head(Grunfeld))

firm	year	inv	value	capital
1	1935	317.6	3078.5	2.8
1	1936	391.8	4661.7	52.6
1	1937	410.6	5387.1	156.9
1	1938	257.7	2792.2	209.2
1	1939	330.8	4313.2	203.4
1	1940	461.2	4643.9	207.2

# Dataset structure
cat("Dataset dimensions:", nrow(Grunfeld), "rows x", ncol(Grunfeld), "columns\n")

Dataset dimensions: 200 rows x 5 columns

cat("Number of firms:", length(unique(Grunfeld$firm)), "\n")

Number of firms: 10

cat("Time period:", min(Grunfeld$year), "-", max(Grunfeld$year), "\n")

Time period: 1935 - 1954

cat("Number of time periods:", length(unique(Grunfeld$year)), "\n")

Number of time periods: 20
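The same structure can be checked with plm's own helpers. The following is a minimal sketch, assuming the panel index is given by the firm and year columns shown above; the object names grunfeld_panel, panel_dims, and is_balanced are illustrative only.

```r
library(plm)

data("Grunfeld", package = "plm")

# Declare the panel index explicitly instead of relying on column order
grunfeld_panel <- pdata.frame(Grunfeld, index = c("firm", "year"))

# pdim() reports the number of entities (n), time periods (T),
# and total observations (N)
panel_dims <- pdim(grunfeld_panel)
print(panel_dims)

# A balanced panel has every firm observed in every year
is_balanced <- is.pbalanced(grunfeld_panel)
```

Declaring the index up front makes the entity and time dimensions explicit, which helps when a dataset does not store them in its first two columns.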

4 Model Specification

We’ll estimate several panel data models to examine the relationship between investment (inv) and its determinants: the firm's market value (value) and its capital stock (capital).

Pooled OLS Model

The pooled OLS (Ordinary Least Squares) model treats all observations as independent, ignoring the panel structure of the data. It assumes no individual or time-specific effects.

Key characteristics:

Combines all data into a single pool
Assumes constant intercept across all entities and time periods
Ignores heterogeneity between entities
Uses standard OLS estimation

Model specification:

\(y_{it} = \alpha + \beta X_{it} + \varepsilon_{it}\)

Where:

\(y_{it}\) is the dependent variable for entity \(i\) at time \(t\)

\(\alpha\) is the common intercept

\(X_{it}\) are the explanatory variables

\(\varepsilon_{it}\) is the error term
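Because pooled OLS ignores the panel structure, it should give the same coefficients as an ordinary lm() fit on the stacked data. A short sketch to check this on Grunfeld (the object names pooled_plm and pooled_lm are illustrative):

```r
library(plm)

data("Grunfeld", package = "plm")

# Pooled OLS through plm
pooled_plm <- plm(inv ~ value + capital, data = Grunfeld, model = "pooling")

# Plain OLS on the same stacked observations
pooled_lm <- lm(inv ~ value + capital, data = Grunfeld)

# Compare the coefficient vectors (intercept, value, capital)
same_coefs <- all.equal(unname(coef(pooled_plm)),
                        unname(coef(pooled_lm)),
                        tolerance = 1e-6)
print(same_coefs)
```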

Fixed Effects Model (FEM)

The fixed effects model accounts for individual-specific characteristics that may be correlated with the explanatory variables. It allows each entity to have its own intercept.

Key characteristics:

Controls for time-invariant individual characteristics
Uses within-estimator (de-meaning approach)
Eliminates bias from omitted variables that are constant over time
Cannot estimate coefficients for time-invariant variables

Model specification:

\(y_{it} = \alpha_i + \beta X_{it} + \varepsilon_{it}\)

Where:

\(\alpha_i\) represents entity-specific fixed effects
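The de-meaning approach can be reproduced by hand: subtract each firm's mean from every variable, then run OLS without an intercept. The sketch below is one way to do that and compare against plm; the helper demean() and the object names are hypothetical, not part of plm.

```r
library(plm)

data("Grunfeld", package = "plm")

# Subtract the group (firm) mean from each observation
demean <- function(x, group) x - ave(x, group)

demeaned <- data.frame(
  inv     = demean(Grunfeld$inv, Grunfeld$firm),
  value   = demean(Grunfeld$value, Grunfeld$firm),
  capital = demean(Grunfeld$capital, Grunfeld$firm)
)

# OLS on de-meaned data, no intercept: the firm means absorb alpha_i
within_manual <- lm(inv ~ value + capital - 1, data = demeaned)

# The within estimator as computed by plm
within_plm <- plm(inv ~ value + capital, data = Grunfeld, model = "within")

# Entity-specific intercepts recovered after estimation
firm_effects <- fixef(within_plm)

print(coef(within_manual))
print(coef(within_plm))
```

The slope estimates agree, but the standard errors from the manual lm() fit do not account for the 10 estimated firm means, so plm's output should be used for inference.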

Random Effects Model (REM)

The random effects model treats individual-specific effects as random draws from a larger population. It assumes these effects are uncorrelated with the explanatory variables.

Key characteristics:

More efficient than fixed effects when assumptions are met
Allows estimation of time-invariant variables
Uses generalized least squares (GLS) estimation
Assumes individual effects are uncorrelated with explanatory variables

Model specification:

\(y_{it} = \alpha + \beta X_{it} + \mu_i + \varepsilon_{it}\)

Where:

\(\mu_i\) represents random individual effects
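GLS estimation of the random effects model amounts to partial de-meaning: a fraction theta of each firm's mean is subtracted from the data, where theta = 0 corresponds to pooled OLS and theta = 1 to the within estimator. A sketch for inspecting the estimated variance components and theta, assuming plm's default random effects method:

```r
library(plm)

data("Grunfeld", package = "plm")

random <- plm(inv ~ value + capital, data = Grunfeld, model = "random")

# Estimated variances of the idiosyncratic and individual error components
variance_components <- ercomp(random)
print(variance_components)

# Share of the firm mean removed in the quasi-demeaning step
theta <- variance_components$theta
```

A theta close to 1 means the random effects estimates will sit close to the fixed effects ones.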

pooled <- plm(inv ~ value + capital, data = Grunfeld, model = "pooling")

fixed <- plm(inv ~ value + capital, data = Grunfeld, model = "within")

random <- plm(inv ~ value + capital, data = Grunfeld, model = "random")

5 Results

5.1 Model Comparison Table

stargazer(pooled, fixed, random, 
          type = "html",
          title = "Panel Data Model Comparison",
          column.labels = c("Pooled OLS", "Fixed Effects", "Random Effects"),
          dep.var.labels = "Investment",
          covariate.labels = c("Value", "Capital Stock"))

Panel Data Model Comparison

Dependent variable: Investment

	Pooled OLS (1)	Fixed Effects (2)	Random Effects (3)
Value	0.116*** (0.006)	0.110*** (0.012)	0.110*** (0.010)
Capital Stock	0.231*** (0.025)	0.310*** (0.017)	0.308*** (0.017)
Constant	-42.714*** (9.512)		-57.834** (28.899)
Observations	200	200	200
R²	0.812	0.767	0.770
Adjusted R²	0.811	0.753	0.767
F Statistic	426.576*** (df = 2; 197)	309.014*** (df = 2; 188)	657.674***

Note: standard errors in parentheses; *p<0.1; **p<0.05; ***p<0.01

5.2 Fixed Effects Model Details

# Summary of fixed effects model
summary(fixed)

Oneway (individual) effect Within Model

Call:
plm(formula = inv ~ value + capital, data = Grunfeld, model = "within")

Balanced Panel: n = 10, T = 20, N = 200

Residuals:
      Min.    1st Qu.     Median    3rd Qu.       Max.
-184.00857  -17.64316    0.56337   19.19222  250.70974

Coefficients:
        Estimate Std. Error t-value  Pr(>|t|)
value   0.110124   0.011857  9.2879 < 2.2e-16 ***
capital 0.310065   0.017355 17.8666 < 2.2e-16 ***
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Total Sum of Squares:    2244400
Residual Sum of Squares: 523480
R-Squared:      0.76676
Adj. R-Squared: 0.75311
F-statistic: 309.014 on 2 and 188 DF, p-value: < 2.22e-16

5.3 Diagnostic Tests

# Hausman test to choose between fixed and random effects
hausman_test <- phtest(fixed, random)
cat("Hausman Test p-value:", round(hausman_test$p.value, 4), "\n")

Hausman Test p-value: 0.3119


# Breusch-Pagan test for random effects
bp_test <- plmtest(pooled, type = "bp")
cat("Breusch-Pagan Test p-value:", round(bp_test$p.value, 4), "\n")

Breusch-Pagan Test p-value: 0
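As a complement to the two tests above, plm also provides pFtest(), an F test that compares the fixed effects model against pooled OLS. The sketch below is an optional extra check rather than part of the original analysis; f_test is an illustrative name.

```r
library(plm)

data("Grunfeld", package = "plm")

pooled <- plm(inv ~ value + capital, data = Grunfeld, model = "pooling")
fixed  <- plm(inv ~ value + capital, data = Grunfeld, model = "within")

# H0: no individual (firm) effects, i.e. pooled OLS is adequate
f_test <- pFtest(fixed, pooled)
cat("F test for individual effects, p-value:",
    format.pval(f_test$p.value), "\n")
```

A small p-value here points the same way as the Breusch-Pagan test: the firms differ systematically and pooled OLS is too restrictive.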

6 Visualization

# Create enhanced effect plot function
plot_plm_effect <- function(model, term, data = NULL) {
  if(is.null(data)) data <- model.frame(model)
  
  response_var <- all.vars(formula(model))[1]
  term_vals <- seq(min(data[[term]], na.rm = TRUE),
                  max(data[[term]], na.rm = TRUE),
                  length.out = 100)
  
  # Create prediction grid
  pred_grid <- data.frame(
    temp_var = term_vals
  )
  names(pred_grid) <- term
  
  # Add other variables at their means/modes
  other_vars <- setdiff(all.vars(formula(model))[-1], term)
  for(var in other_vars) {
    if(is.numeric(data[[var]])) {
      pred_grid[[var]] <- mean(data[[var]], na.rm = TRUE)
    } else {
      # For factors, use the most common level
      pred_grid[[var]] <- names(sort(table(data[[var]]), decreasing = TRUE))[1]
    }
  }
  
  # Predict
  pred_grid$prediction <- predict(model, newdata = pred_grid)
  
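  # Create plot (aes_string() is deprecated, so use the .data pronoun instead)
  p <- ggplot(pred_grid, aes(x = .data[[term]], y = .data[["prediction"]])) +
    geom_line(color = "darkblue", linewidth = 1.2) +
    # Band is +/- 1 SD of the predicted values: a rough visual guide,
    # not a confidence interval for the fitted line
    geom_ribbon(aes(ymin = prediction - sd(prediction, na.rm = TRUE),
                    ymax = prediction + sd(prediction, na.rm = TRUE)),
                alpha = 0.2, fill = "blue") +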
    labs(title = paste("Marginal Effect of", term, "on", response_var),
         x = term,
         y = paste("Predicted", response_var)) +
    theme_minimal() +
    theme(plot.title = element_text(hjust = 0.5))
  
  return(p)
}

# Generate the plot
effect_plot <- plot_plm_effect(fixed, "value")
print(effect_plot)
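For a linear model the same line can be built directly from the estimated coefficients, without calling predict(). The sketch below is one possible alternative, assuming we evaluate the line at the average firm intercept and at the sample mean of capital; line_grid, avg_intercept, and manual_plot are illustrative names.

```r
library(plm)
library(ggplot2)

data("Grunfeld", package = "plm")

fixed <- plm(inv ~ value + capital, data = Grunfeld, model = "within")

# Slopes from the within model and the average of the firm intercepts
beta <- coef(fixed)
avg_intercept <- mean(as.numeric(fixef(fixed)))

# Grid over the observed range of value, holding capital at its mean
line_grid <- data.frame(
  value = seq(min(Grunfeld$value), max(Grunfeld$value), length.out = 100)
)
line_grid$prediction <- avg_intercept +
  beta[["value"]] * line_grid$value +
  beta[["capital"]] * mean(Grunfeld$capital)

manual_plot <- ggplot(line_grid, aes(x = value, y = prediction)) +
  geom_line(color = "darkblue", linewidth = 1.2) +
  labs(x = "value", y = "Predicted inv") +
  theme_minimal()

print(manual_plot)
```

The slope of this line is simply the estimated coefficient on value, so the plot conveys the same information as the coefficient table in graphical form.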

7 Interpretation of Results

7.1 Model Selection

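The Hausman test (p-value = 0.3119) does not reject the null hypothesis that the individual effects are uncorrelated with the explanatory variables. Under that null the random effects estimator is consistent and more efficient than fixed effects, so random effects is the preferred specification at conventional significance levels.

The Breusch-Pagan Lagrange multiplier test (p-value below 0.0001, printed as 0 after rounding) rejects the null hypothesis of no individual effects, so pooled OLS is not adequate for these data.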
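The selection logic can be written down as a small decision rule. This is a sketch under stated assumptions: the 5% significance level and the helper choose_model() are hypothetical choices made for illustration, not part of plm.

```r
library(plm)

data("Grunfeld", package = "plm")

pooled <- plm(inv ~ value + capital, data = Grunfeld, model = "pooling")
fixed  <- plm(inv ~ value + capital, data = Grunfeld, model = "within")
random <- plm(inv ~ value + capital, data = Grunfeld, model = "random")

bp_p      <- plmtest(pooled, type = "bp")$p.value
hausman_p <- phtest(fixed, random)$p.value

# Hypothetical helper: alpha = 0.05 is an assumed significance level
choose_model <- function(bp_p, hausman_p, alpha = 0.05) {
  # No evidence of individual effects: pooled OLS is sufficient
  if (bp_p >= alpha) return("pooled OLS")
  # Effects correlated with regressors: random effects is inconsistent
  if (hausman_p < alpha) return("fixed effects")
  # Individual effects present and uncorrelated with regressors
  "random effects"
}

selected <- choose_model(bp_p, hausman_p)
cat("Selected model:", selected, "\n")
```

A rule like this is only a starting point; the choice should also reflect whether treating firm effects as uncorrelated with the regressors is plausible on substantive grounds.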

7.2 Coefficient Interpretation

In the fixed effects model:

A one-unit increase in value is associated with a 0.1101 unit increase in investment, holding capital stock constant.
A one-unit increase in capital stock is associated with a 0.3101 unit increase in investment, holding value constant.

The fixed effects model explains approximately 76.7% of the within variance.
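Point estimates are easier to interpret alongside their uncertainty. A brief sketch that extracts 95% confidence intervals for the two slopes using the generic confint() function (ci is an illustrative name):

```r
library(plm)

data("Grunfeld", package = "plm")

fixed <- plm(inv ~ value + capital, data = Grunfeld, model = "within")

# 95% confidence intervals for the slope coefficients
ci <- confint(fixed, level = 0.95)
print(ci)
```

If both intervals exclude zero, that agrees with the significance stars reported in the model summary.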

7.3 Model Fit

The F-statistic for the fixed effects model is 309.01 on 2 and 188 degrees of freedom, with a p-value of < 2.2e-16, indicating that value and capital are jointly statistically significant.
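These fit statistics can also be pulled out of the summary object programmatically rather than read off the printed output. A minimal sketch, assuming the summary stores the F test and R-squared values under the element names used below:

```r
library(plm)

data("Grunfeld", package = "plm")

fixed <- plm(inv ~ value + capital, data = Grunfeld, model = "within")
fixed_summary <- summary(fixed)

# F test of joint significance of the slope coefficients
f_stat <- fixed_summary$fstatistic
cat("F statistic:", round(as.numeric(f_stat$statistic), 3), "\n")

# First element is the R-squared, second the adjusted R-squared
within_r2 <- unname(fixed_summary$r.squared[1])
cat("Within R-squared:", round(within_r2, 5), "\n")
```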

8 Conclusion

This analysis demonstrates the use of plm for panel data regression. The random effects model appears to be the most appropriate for this dataset based on statistical tests. The results show that both value and capital stock are positively and significantly associated with investment.

The plm package provides comprehensive tools for panel data analysis, including various estimation methods and diagnostic tests to help select the most appropriate model for the data structure.

Panel Data Analysis

Using `plm` R function

Amir Freund

2025-08-22
