Panel data (also known as longitudinal or cross-sectional time-series data) is a dataset in which the behavior of entities is observed across time. These entities could be states, companies, individuals, countries, etc.
The terms fixed and random are used frequently in multilevel modeling. They appear in the context of ANOVA and regression models and refer to particular types of statistical model. Most researchers use fixed effects regression or ANOVA and are rarely faced with a situation that calls for random effects analysis. Fixed effects are variables that are constant within an individual; these variables, like age, sex, or ethnicity, either don't change or change at a constant rate over time. In other words, whatever effect they have on an individual is the same in every period. For example, any effects from being a woman, a person of color, or a 17-year-old will not change over time. The opposite of fixed effects are random effects. As the name suggests, these are treated as random draws that vary unpredictably from one individual to the next.
In the example below, we look at a data set of countries observed over several years, with an artificial dependent variable y and an artificial predictor x1. To fit the fixed effects model, we use the plm function from the plm package in R. To call plm, we specify the model formula (y regressed on x1), the index variables, and the model type.
library(foreign)
library(plm)
Panel <- read.dta("http://dss.princeton.edu/training/Panel101.dta")
head(Panel)
##   country year           y y_bin        x1         x2          x3   opinion op
## 1       A 1990  1342787840     1 0.2779036 -1.1079559  0.28255358 Str agree  1
## 2       A 1991 -1899660544     0 0.3206847 -0.9487200  0.49253848     Disag  0
## 3       A 1992   -11234363     0 0.3634657 -0.7894840  0.70252335     Disag  0
## 4       A 1993  2645775360     1 0.2461440 -0.8855330 -0.09439092     Disag  0
## 5       A 1994  3008334848     1 0.4246230 -0.7297683  0.94613063     Disag  0
## 6       A 1995  3229574144     1 0.4772141 -0.7232460  1.02968037 Str agree  1
The index variables identify the panel structure: the first (country) is the entity and the second (year) is the time period. By default, plm fits a one-way model with individual effects only, so here the fixed effects are the countries (setting effect = "twoways" would add year effects as well). The model argument tells R whether to use fixed or random effects; in this case "within" means fixed effects. We then get the parameter estimate for x1, which indicates the average change in y within a country over time for a one-unit increase in x1.
fixed = plm(y ~ x1, data = Panel, index = c("country", "year"), model = "within")
summary(fixed)
## Oneway (individual) effect Within Model
##
## Call:
## plm(formula = y ~ x1, data = Panel, model = "within", index = c("country",
## "year"))
##
## Balanced Panel: n = 7, T = 10, N = 70
##
## Residuals:
## Min. 1st Qu. Median Mean 3rd Qu. Max.
## -8.63e+09 -9.70e+08 5.40e+08 0.00e+00 1.39e+09 5.61e+09
##
## Coefficients:
## Estimate Std. Error t-value Pr(>|t|)
## x1 2475617827 1106675594 2.237 0.02889 *
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Total Sum of Squares: 5.2364e+20
## Residual Sum of Squares: 4.8454e+20
## R-Squared: 0.074684
## Adj. R-Squared: -0.029788
## F-statistic: 5.00411 on 1 and 62 DF, p-value: 0.028892
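To make the "within" idea more concrete, here is a minimal base-R sketch of what a one-way fixed effects estimate does mechanically. It does not use plm or the Panel101 data: the panel is simulated, and every name and number in it (id, alpha, the true slope of 2) is an assumption made up for illustration. It shows that subtracting each entity's mean before running OLS gives the same slope as including one dummy variable per entity.

```r
# Simulated balanced panel: 7 entities observed over 10 periods.
# All names and values are illustrative; this is NOT the Panel101 data.
set.seed(1)
n_id <- 7
n_t  <- 10
id    <- rep(LETTERS[1:n_id], each = n_t)
alpha <- rep(rnorm(n_id, sd = 5), each = n_t)  # entity-specific intercepts
x     <- alpha / 5 + rnorm(n_id * n_t)         # x is correlated with the intercepts
y     <- 2 * x + alpha + rnorm(n_id * n_t)     # assumed true slope: 2

# (a) Within transformation: subtract each entity's mean, then OLS without intercept
x_dm <- x - ave(x, id)
y_dm <- y - ave(y, id)
b_within <- unname(coef(lm(y_dm ~ x_dm - 1)))

# (b) Least-squares dummy variables: one intercept per entity
b_lsdv <- unname(coef(lm(y ~ x + factor(id)))["x"])

# (c) Pooled OLS, which ignores the entity intercepts
b_pooled <- unname(coef(lm(y ~ x))["x"])

print(c(within = b_within, lsdv = b_lsdv, pooled = b_pooled))
```

The within and dummy-variable slopes agree, while the pooled slope generally differs because it ignores the entity-level intercepts.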
In a random effects model, we do not estimate a separate fixed intercept for each entity (here, each country). Instead, the entity-specific starting values (i.e. intercepts) are treated as random draws from a common distribution and are assumed to be uncorrelated with the predictors. The R code is almost identical, except that in the model argument we change "within" to "random".
random = plm(y ~ x1, data = Panel, index = c("country", "year"), model = "random")
summary(random)
## Oneway (individual) effect Random Effect Model
## (Swamy-Arora's transformation)
##
## Call:
## plm(formula = y ~ x1, data = Panel, model = "random", index = c("country",
## "year"))
##
## Balanced Panel: n = 7, T = 10, N = 70
##
## Effects:
## var std.dev share
## idiosyncratic 7.815e+18 2.796e+09 0.873
## individual 1.133e+18 1.065e+09 0.127
## theta: 0.3611
##
## Residuals:
## Min. 1st Qu. Median Mean 3rd Qu. Max.
## -8.94e+09 -1.51e+09 2.82e+08 0.00e+00 1.56e+09 6.63e+09
##
## Coefficients:
## Estimate Std. Error z-value Pr(>|z|)
## (Intercept) 1037014284 790626206 1.3116 0.1896
## x1 1247001782 902145601 1.3823 0.1669
##
## Total Sum of Squares: 5.6595e+20
## Residual Sum of Squares: 5.5048e+20
## R-Squared: 0.02733
## Adj. R-Squared: 0.013026
## Chisq: 1.91065 on 1 DF, p-value: 0.16689
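The theta value in the output above can be checked by hand from the two variance components in the Effects block. The sketch below assumes the standard formula for a balanced one-way random effects model, theta = 1 - sqrt(sigma2_e / (sigma2_e + T * sigma2_u)); the variable names are made up for illustration, and the inputs are copied from the printed summary.

```r
# Variance components as printed in the "Effects" block of summary(random)
sigma2_e <- 7.815e18  # idiosyncratic variance
sigma2_u <- 1.133e18  # individual (country) variance
T_len    <- 10        # periods per country (balanced panel, T = 10)

# Quasi-demeaning weight: 0 would be pooled OLS, 1 would be the within estimator
theta <- 1 - sqrt(sigma2_e / (sigma2_e + T_len * sigma2_u))
print(round(theta, 4))

# Share of the total error variance attributed to the country effect
share_u <- sigma2_u / (sigma2_e + sigma2_u)
print(round(share_u, 3))
```

With these inputs the result agrees with the reported theta of 0.3611 and the individual share of 0.127, so the random effects fit sits between pooled OLS and the within model.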
To decide between fixed and random effects, you can run a Hausman test, where the null hypothesis is that the random effects model is preferred and the alternative is that the fixed effects model is preferred. If the p-value is significant (for example, < 0.05), use fixed effects; if not, use random effects. In this example, the p-value (0.055) is above 0.05, though only narrowly, so at the 5% level the test favors the random effects model for this data.
phtest(random, fixed)
##
## Hausman Test
##
## data: y ~ x1
## chisq = 3.674, df = 1, p-value = 0.05527
## alternative hypothesis: one model is inconsistent
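With a single slope coefficient, the Hausman statistic can be reproduced from the two summaries above. This sketch assumes the usual single-coefficient form, H = (b_FE - b_RE)^2 / (Var_FE - Var_RE), compared against a chi-squared distribution with 1 degree of freedom. The variable names are illustrative, and the estimates and standard errors are copied from the printed output.

```r
# Estimates and standard errors for x1, copied from summary(fixed) and summary(random)
b_fe  <- 2475617827
se_fe <- 1106675594
b_re  <- 1247001782
se_re <- 902145601

# Hausman statistic for one coefficient: squared difference over difference in variances
H <- (b_fe - b_re)^2 / (se_fe^2 - se_re^2)

# Upper-tail p-value from a chi-squared distribution with 1 degree of freedom
p <- pchisq(H, df = 1, lower.tail = FALSE)

print(round(c(chisq = H, p.value = p), 4))
```

The result agrees with the phtest output (chisq = 3.674, p-value = 0.05527), which shows that the test is driven by the gap between the two x1 estimates relative to their difference in precision.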