Panel Data Regression

Panel data (also known as longitudinal or cross-sectional time-series data) is a dataset in which the behavior of entities is observed across time. These entities could be states, companies, individuals, countries, etc.

Fixed and Random Effects

The terms fixed effects and random effects are used frequently in multilevel modeling, in the context of ANOVA and regression models, where they refer to particular types of statistical model. In practice, researchers most often use fixed effects regression or ANOVA and are less often faced with a situation that calls for random effects analysis. In panel regression, a fixed effects model controls for characteristics of each entity that do not change over time, such as sex or ethnicity, or that change at a constant rate, such as age. Because these characteristics are constant within an entity, whatever effect they have on that entity is the same in every period; for example, any effect of being a woman or a person of color is assumed not to change over time. The model absorbs all such characteristics into a separate intercept for each entity and estimates the remaining coefficients only from changes within an entity over time. A random effects model instead treats the differences between entities as random draws from a common distribution that are uncorrelated with the explanatory variables.
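
In equation form, the distinction can be sketched as follows. The notation (entity index i, period t, entity intercept \alpha_i, slope \beta, error u_{it}) is an assumption introduced here for illustration and follows the usual textbook presentation:

```latex
\[
  y_{it} = \alpha_i + \beta x_{it} + u_{it},
  \qquad i = 1, \dots, n, \quad t = 1, \dots, T
\]
```

Under fixed effects, each \alpha_i is treated as a separate parameter and may be correlated with x_{it}. Under random effects, \alpha_i is treated as a random draw that is assumed to be uncorrelated with x_{it}.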

In the example below, we look at a data set of countries observed over several years, with an artificial dependent variable y and an artificial explanatory variable x1. To fit the fixed effects model, we use the plm function from the plm package in R. The call to plm has three parts: the model formula, which is y regressed on x1; the index variables; and the model type.

library(foreign)  # provides read.dta() for Stata files
library(plm)      # panel data estimators
# Load the example panel and preview the first rows
Panel <- read.dta("http://dss.princeton.edu/training/Panel101.dta")
head(Panel)
##   country year           y y_bin        x1         x2          x3   opinion op
## 1       A 1990  1342787840     1 0.2779036 -1.1079559  0.28255358 Str agree  1
## 2       A 1991 -1899660544     0 0.3206847 -0.9487200  0.49253848     Disag  0
## 3       A 1992   -11234363     0 0.3634657 -0.7894840  0.70252335     Disag  0
## 4       A 1993  2645775360     1 0.2461440 -0.8855330 -0.09439092     Disag  0
## 5       A 1994  3008334848     1 0.4246230 -0.7297683  0.94613063     Disag  0
## 6       A 1995  3229574144     1 0.4772141 -0.7232460  1.02968037 Str agree  1

The index argument names the variables that identify the entity and the time period, here country and year. By default, plm fits one-way individual effects, so this call estimates a fixed effect for each country only, as the "Oneway (individual) effect" line in the output confirms; year fixed effects can be added with effect = "twoways". The model argument tells R whether to use fixed or random effects; in this case, "within" means fixed effects. The parameter estimate for x1 indicates the average change in y within a country over time associated with a one-unit increase in x1.

# Fixed effects (within) estimator with country as the entity index
fixed <- plm(y ~ x1, data = Panel, index = c("country", "year"), model = "within")
summary(fixed)
## Oneway (individual) effect Within Model
## 
## Call:
## plm(formula = y ~ x1, data = Panel, model = "within", index = c("country", 
##     "year"))
## 
## Balanced Panel: n = 7, T = 10, N = 70
## 
## Residuals:
##      Min.   1st Qu.    Median      Mean   3rd Qu.      Max. 
## -8.63e+09 -9.70e+08  5.40e+08  0.00e+00  1.39e+09  5.61e+09 
## 
## Coefficients:
##      Estimate Std. Error t-value Pr(>|t|)  
## x1 2475617827 1106675594   2.237  0.02889 *
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Total Sum of Squares:    5.2364e+20
## Residual Sum of Squares: 4.8454e+20
## R-Squared:      0.074684
## Adj. R-Squared: -0.029788
## F-statistic: 5.00411 on 1 and 62 DF, p-value: 0.028892
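
To make concrete what the "within" transformation does, here is a minimal base-R sketch on a small simulated panel. The data, the variable names (id, t, x, y, alpha) and the true slope of 2 are all hypothetical and are not the Panel101 data. The sketch demeans y and x by entity and compares the resulting slope with a regression that includes one dummy variable per entity; the two should agree up to floating-point error.

```r
# Hypothetical simulated panel: 5 entities observed over 8 periods
set.seed(1)
n_id <- 5
n_t  <- 8
d <- data.frame(id = rep(letters[1:n_id], each = n_t),
                t  = rep(seq_len(n_t), times = n_id))
alpha <- rep(rnorm(n_id), each = n_t)        # entity-specific intercepts
d$x <- alpha + rnorm(nrow(d))                # regressor correlated with alpha
d$y <- 2 * d$x + 3 * alpha + rnorm(nrow(d))  # true slope on x is 2

# Within transformation: subtract each entity's own mean
d$x_dm <- d$x - ave(d$x, d$id)
d$y_dm <- d$y - ave(d$y, d$id)

# Slope from the demeaned regression (no intercept needed after demeaning)
b_within <- coef(lm(y_dm ~ x_dm - 1, data = d))[["x_dm"]]

# Slope from the dummy-variable regression (one intercept per entity)
b_lsdv <- coef(lm(y ~ x + factor(id), data = d))[["x"]]

all.equal(b_within, b_lsdv)
```

Only the slope is compared here; the standard errors of the demeaned regression are not adjusted for the estimated entity means, which is one reason to use plm rather than demeaning by hand.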

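In a random effects model, the differences between entities (here, countries) are treated as random draws from a common distribution and are assumed to be uncorrelated with the explanatory variables. Each entity still has its own starting value (i.e. intercept), but because these intercepts are not estimated as separate parameters, time-invariant variables can also be included in the model. The R code is almost identical, except that in the model argument we change "within" to "random".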

# Random effects estimator on the same formula and index
random <- plm(y ~ x1, data = Panel, index = c("country", "year"), model = "random")
summary(random)
## Oneway (individual) effect Random Effect Model 
##    (Swamy-Arora's transformation)
## 
## Call:
## plm(formula = y ~ x1, data = Panel, model = "random", index = c("country", 
##     "year"))
## 
## Balanced Panel: n = 7, T = 10, N = 70
## 
## Effects:
##                     var   std.dev share
## idiosyncratic 7.815e+18 2.796e+09 0.873
## individual    1.133e+18 1.065e+09 0.127
## theta: 0.3611
## 
## Residuals:
##      Min.   1st Qu.    Median      Mean   3rd Qu.      Max. 
## -8.94e+09 -1.51e+09  2.82e+08  0.00e+00  1.56e+09  6.63e+09 
## 
## Coefficients:
##               Estimate Std. Error z-value Pr(>|z|)
## (Intercept) 1037014284  790626206  1.3116   0.1896
## x1          1247001782  902145601  1.3823   0.1669
## 
## Total Sum of Squares:    5.6595e+20
## Residual Sum of Squares: 5.5048e+20
## R-Squared:      0.02733
## Adj. R-Squared: 0.013026
## Chisq: 1.91065 on 1 DF, p-value: 0.16689
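
The theta value in the Effects block can be checked by hand. The sketch below assumes the standard balanced-panel formula for the quasi-demeaning weight, theta = 1 - sqrt(sigma2_idios / (T * sigma2_indiv + sigma2_idios)), and plugs in the rounded variance components printed above; the variable names are illustrative.

```r
# Variance components and panel length copied from the summary(random) output
sigma2_idios <- 7.815e18  # idiosyncratic variance
sigma2_indiv <- 1.133e18  # individual (country) variance
T_periods    <- 10        # years per country in the balanced panel

# Quasi-demeaning weight: 0 corresponds to pooled OLS, 1 to the within estimator
theta <- 1 - sqrt(sigma2_idios / (T_periods * sigma2_indiv + sigma2_idios))
round(theta, 3)  # about 0.361, in line with the reported theta
```

A theta of roughly 0.36 means the random effects estimator subtracts about 36% of each country's mean from y and x1, which places it between pooled OLS and the fixed effects estimator.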

To decide between fixed and random effects, you can run a Hausman test, where the null hypothesis is that the preferred model is random effects and the alternative is fixed effects. If the p-value is significant (for example, < 0.05), use fixed effects; if not, use random effects. In this example, the p-value is 0.055, just above 0.05, so the test does not reject random effects at the 5% level, although the result is borderline.

phtest(random, fixed)
## 
##  Hausman Test
## 
## data:  y ~ x1
## chisq = 3.674, df = 1, p-value = 0.05527
## alternative hypothesis: one model is inconsistent
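
With a single slope coefficient, the Hausman statistic reduces to simple arithmetic, so the result can be reproduced from the two summaries. This is a sketch that assumes the usual one-coefficient form, the squared difference in estimates divided by the difference in their variances; the variable names are illustrative and the inputs are copied from the printed output.

```r
# Estimates and standard errors for x1 copied from the two summaries
b_fe  <- 2475617827  # fixed effects (within) estimate
se_fe <- 1106675594
b_re  <- 1247001782  # random effects estimate
se_re <- 902145601

# Hausman statistic for one common coefficient, chi-squared with 1 df under the null
h <- (b_fe - b_re)^2 / (se_fe^2 - se_re^2)
p <- pchisq(h, df = 1, lower.tail = FALSE)
round(c(chisq = h, p.value = p), 3)
```

The values should come out near 3.67 and 0.055, matching the phtest output up to rounding.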

References

https://www.princeton.edu/~otorres/Panel101R.pdf

http://dss.princeton.edu/training/Panel101.pdf