plm R function
This document demonstrates the use of the plm package in R for panel data analysis. Panel data, also known as longitudinal data, contains observations on multiple entities (individuals, firms, countries) over multiple time periods.
First, let’s load the required packages:
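A minimal setup sketch follows. The package set is an assumption inferred from the functions used later in this document (plm for estimation and tests, stargazer for the comparison table, ggplot2 for the effect plot), and all three are assumed to be installed already.

```r
# Assumed package set, inferred from the functions called later in this document
library(plm)        # plm(), phtest(), plmtest(), and the Grunfeld data
library(stargazer)  # regression comparison table
library(ggplot2)    # effect plot

# Load the Grunfeld data shipped with plm and preview the first rows
data("Grunfeld", package = "plm")
head(Grunfeld)
```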
We’ll use the Grunfeld dataset from the plm package, which contains investment data for 10 US firms from 1935 to 1954.
firm | year | inv | value | capital |
---|---|---|---|---|
1 | 1935 | 317.6 | 3078.5 | 2.8 |
1 | 1936 | 391.8 | 4661.7 | 52.6 |
1 | 1937 | 410.6 | 5387.1 | 156.9 |
1 | 1938 | 257.7 | 2792.2 | 209.2 |
1 | 1939 | 330.8 | 4313.2 | 203.4 |
1 | 1940 | 461.2 | 4643.9 | 207.2 |
# Dataset structure
cat("Dataset dimensions:", nrow(Grunfeld), "rows x", ncol(Grunfeld), "columns\n")
Dataset dimensions: 200 rows x 5 columns
Number of firms: 10
Time period: 1935 - 1954
Number of time periods: 20
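The firm and period counts shown above can be reproduced with base R. This is a sketch rather than the original chunk; the variable names `n_firms`, `year_range`, and `n_periods` are illustrative.

```r
# Load the Grunfeld data shipped with plm (the package must be installed)
data("Grunfeld", package = "plm")

# Count distinct firms and years, and find the first and last year
n_firms    <- length(unique(Grunfeld$firm))
year_range <- range(Grunfeld$year)
n_periods  <- length(unique(Grunfeld$year))

cat("Number of firms:", n_firms, "\n")
cat("Time period:", year_range[1], "-", year_range[2], "\n")
cat("Number of time periods:", n_periods, "\n")
```

Because the panel is balanced, the number of firms multiplied by the number of periods equals the row count.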
We’ll estimate several panel data models to examine the relationship between investment (inv) and its determinants: market value (value) and capital stock (capital).
The pooled OLS (Ordinary Least Squares) model treats all observations as independent, ignoring the panel structure of the data. It assumes no individual or time-specific effects.
Key characteristics:
Model specification:
\(y_{it} = \alpha + \beta X_{it} + \varepsilon_{it}\)
Where:
\(y_{it}\) is the dependent variable for entity \(i\) at time \(t\)
\(\alpha\) is the common intercept
\(X_{it}\) are the explanatory variables
\(\varepsilon_{it}\) is the error term
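The pooled specification can be fitted with `plm()` using `model = "pooling"`. The sketch below spells out the `index` argument explicitly, which is a choice made here for clarity; the object name `pooled` matches the one used in the comparison table later on. Since pooling ignores the panel structure, the coefficients should coincide with those from a plain `lm()` fit.

```r
library(plm)
data("Grunfeld", package = "plm")

# Pooled OLS: one common intercept and common slopes for all firms
pooled <- plm(inv ~ value + capital,
              data  = Grunfeld,
              index = c("firm", "year"),  # entity and time identifiers
              model = "pooling")
coef(pooled)

# Cross-check against ordinary least squares on the stacked data
ols <- lm(inv ~ value + capital, data = Grunfeld)
all.equal(unname(coef(pooled)), unname(coef(ols)))
```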
The fixed effects model accounts for individual-specific characteristics that may be correlated with the explanatory variables. It allows each entity to have its own intercept.
Key characteristics:
Controls for time-invariant individual characteristics
Model specification:
\(y_{it} = \alpha_i + \beta X_{it} + \varepsilon_{it}\)
Where:
\(\alpha_i\) represents entity-specific fixed effects
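A sketch of the corresponding fit, using `model = "within"` as in the summary call reported further down. The `index` argument is written out here for clarity. `fixef()` recovers the estimated firm-specific intercepts \(\alpha_i\), one per firm.

```r
library(plm)
data("Grunfeld", package = "plm")

# Fixed effects (within) estimator: each firm gets its own intercept
fixed <- plm(inv ~ value + capital,
             data  = Grunfeld,
             index = c("firm", "year"),
             model = "within")
coef(fixed)

# Estimated firm-specific intercepts (alpha_i)
firm_effects <- fixef(fixed)
firm_effects
```

Note that the within estimator reports no common constant, which is why the fixed effects column of a comparison table has no intercept row.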
The random effects model treats individual-specific effects as random draws from a larger population. It assumes these effects are uncorrelated with the explanatory variables.
Key characteristics:
Model specification:
\(y_{it} = \alpha + \beta X_{it} + \mu_i + \varepsilon_{it}\)
Where:
\(\mu_i\) represents random individual effects
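A sketch of the random effects fit, assuming plm's default variance-component method (the original chunk is not shown, so the exact options used are an assumption). `ercomp()` reports the estimated variances of the idiosyncratic error \(\varepsilon_{it}\) and the individual effect \(\mu_i\).

```r
library(plm)
data("Grunfeld", package = "plm")

# Random effects estimator with plm's default settings (assumed here)
random <- plm(inv ~ value + capital,
              data  = Grunfeld,
              index = c("firm", "year"),
              model = "random")
coef(random)

# Estimated variance components of the error term
variance_components <- ercomp(random)
variance_components
```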
stargazer(pooled, fixed, random,
          type = "html",
          title = "Panel Data Model Comparison",
          column.labels = c("Pooled OLS", "Fixed Effects", "Random Effects"),
          dep.var.labels = "Investment",
          covariate.labels = c("Value", "Capital Stock"))
Dependent variable: Investment
 | Pooled OLS (1) | Fixed Effects (2) | Random Effects (3) |
---|---|---|---|
Value | 0.116*** | 0.110*** | 0.110*** |
 | (0.006) | (0.012) | (0.010) |
Capital Stock | 0.231*** | 0.310*** | 0.308*** |
 | (0.025) | (0.017) | (0.017) |
Constant | -42.714*** | | -57.834** |
 | (9.512) | | (28.899) |
Observations | 200 | 200 | 200 |
R² | 0.812 | 0.767 | 0.770 |
Adjusted R² | 0.811 | 0.753 | 0.767 |
F Statistic | 426.576*** (df = 2; 197) | 309.014*** (df = 2; 188) | 657.674*** |
Note: *p<0.1; **p<0.05; ***p<0.01
Oneway (individual) effect Within Model

Call:
plm(formula = inv ~ value + capital, data = Grunfeld, model = "within")

Balanced Panel: n = 10, T = 20, N = 200

Residuals:
      Min.    1st Qu.     Median    3rd Qu.       Max.
-184.00857  -17.64316    0.56337   19.19222  250.70974

Coefficients:
        Estimate Std. Error t-value  Pr(>|t|)
value   0.110124   0.011857  9.2879 < 2.2e-16 ***
capital 0.310065   0.017355 17.8666 < 2.2e-16 ***
---
Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

Total Sum of Squares:    2244400
Residual Sum of Squares: 523480
R-Squared:      0.76676
Adj. R-Squared: 0.75311
F-statistic: 309.014 on 2 and 188 DF, p-value: < 2.22e-16
# Hausman test to choose between fixed and random effects
hausman_test <- phtest(fixed, random)
cat("Hausman Test p-value:", round(hausman_test$p.value, 4), "\n")
Hausman Test p-value: 0.3119
# Breusch-Pagan test for random effects
bp_test <- plmtest(pooled, type = "bp")
cat("Breusch-Pagan Test p-value:", round(bp_test$p.value, 4), "\n")
Breusch-Pagan Test p-value: 0
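The printed value of 0 is a rounding artifact: `round(p, 4)` turns any p-value below 0.00005 into 0. A sketch of one alternative is base R's `format.pval()`, which reports such values as falling below a threshold instead. The threshold `eps = 1e-4` and the name `bp_p` are choices made for this illustration.

```r
library(plm)
data("Grunfeld", package = "plm")

pooled <- plm(inv ~ value + capital, data = Grunfeld,
              index = c("firm", "year"), model = "pooling")
bp_test <- plmtest(pooled, type = "bp")

# Report p-values below eps as "<eps" instead of rounding them to 0
bp_p <- format.pval(bp_test$p.value, digits = 4, eps = 1e-4)
cat("Breusch-Pagan Test p-value:", bp_p, "\n")
```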
# Effect plot helper: predicted response as one regressor varies,
# with the remaining regressors held fixed
plot_plm_effect <- function(model, term, data = NULL) {
  if (is.null(data)) data <- model.frame(model)
  response_var <- all.vars(formula(model))[1]

  # Grid of 100 evenly spaced values across the observed range of `term`
  term_vals <- seq(min(data[[term]], na.rm = TRUE),
                   max(data[[term]], na.rm = TRUE),
                   length.out = 100)
  pred_grid <- data.frame(temp_var = term_vals)
  names(pred_grid) <- term

  # Hold the other variables at their means (numeric) or modes (factor)
  other_vars <- setdiff(all.vars(formula(model))[-1], term)
  for (var in other_vars) {
    if (is.numeric(data[[var]])) {
      pred_grid[[var]] <- mean(data[[var]], na.rm = TRUE)
    } else {
      # For factors, use the most common level
      pred_grid[[var]] <- names(sort(table(data[[var]]), decreasing = TRUE))[1]
    }
  }

  # Predict over the grid; how predict() treats the fixed effects when given
  # newdata depends on the installed plm version
  pred_grid$prediction <- predict(model, newdata = pred_grid)

  # Build the plot. The ribbon spans +/- one standard deviation of the
  # predicted values across the grid; it is a visual band only, not a
  # confidence interval. aes(.data[[term]]) replaces the deprecated aes_string().
  p <- ggplot(pred_grid, aes(x = .data[[term]], y = prediction)) +
    geom_line(color = "darkblue", linewidth = 1.2) +
    geom_ribbon(aes(ymin = prediction - sd(prediction, na.rm = TRUE),
                    ymax = prediction + sd(prediction, na.rm = TRUE)),
                alpha = 0.2, fill = "blue") +
    labs(title = paste("Marginal Effect of", term, "on", response_var),
         x = term,
         y = paste("Predicted", response_var)) +
    theme_minimal() +
    theme(plot.title = element_text(hjust = 0.5))
  return(p)
}

# Generate the plot
effect_plot <- plot_plm_effect(fixed, "value")
print(effect_plot)
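Based on the Hausman test (p-value = 0.3119), we cannot reject the null hypothesis that the individual effects are uncorrelated with the regressors, suggesting that the random effects estimator is consistent here and, being more efficient than fixed effects, may be appropriate.
The Breusch-Pagan test p-value rounds to 0 at four decimal places (that is, it is below 0.0001), which is strong evidence of individual effects and favors random effects over pooled OLS.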
In the fixed effects model:
A one-unit increase in value is associated with a 0.1101 unit increase in investment, holding capital stock constant.
A one-unit increase in capital stock is associated with a 0.3101 unit increase in investment, holding value constant.
The fixed effects model explains approximately 76.7% of the within variance.
The F-statistic for the fixed effects model is 309.01 (on 2 and 188 degrees of freedom) with a p-value below 2.2e-16, indicating that value and capital stock are jointly statistically significant.
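The figures quoted in this interpretation can be pulled from the fitted model rather than copied by hand. The sketch below refits the fixed effects model so that it runs on its own; the names `fixed_summary`, `slopes`, `within_r2`, and `f_stat` are illustrative.

```r
library(plm)
data("Grunfeld", package = "plm")

fixed <- plm(inv ~ value + capital, data = Grunfeld,
             index = c("firm", "year"), model = "within")
fixed_summary <- summary(fixed)

# Slope estimates, within R-squared, and the overall F-statistic
slopes    <- coef(fixed)
within_r2 <- fixed_summary$r.squared[["rsq"]]
f_stat    <- unname(fixed_summary$fstatistic$statistic)

cat("Effect of value:", round(slopes[["value"]], 4), "\n")
cat("Effect of capital:", round(slopes[["capital"]], 4), "\n")
cat("Within R-squared:", round(within_r2, 3), "\n")
cat("F-statistic:", round(f_stat, 2), "\n")
```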
This analysis demonstrates the use of plm for panel data regression. The random effects model appears to be the most appropriate for this dataset based on statistical tests. The results show that both value and capital stock are positively and significantly associated with investment.
The plm package provides comprehensive tools for panel data analysis, including various estimation methods and diagnostic tests to help select the most appropriate model for the data structure.