With panel data, we normally use the pooled OLS (POLS), fixed effects (FE) or random effects (RE) model. In some cases, more advanced models such as IV, GMM, Hausman-Taylor or FGLS can be used.
The POLS, FE and RE models are shown below.
First, install the package “plm”, which is used to estimate panel data models. The installation step is not shown below because it is left out of the R Markdown output.
library(foreign)
setwd("/Users/vancam/KTLR")
library(car)
# The data used here is Panel1.dta, which was first collected by Professor Moshe Kim and later reused by Greene (2008). It contains information about 6 different airlines (N=6) over the period from 1970 to 1984 (T=15). Thus, we have a long panel with 6x15=90 observations.
van <- read.dta("Panel1.dta")
head(van)
## i t C Q PF LF
## 1 1 1 1140640 0.952757 106650 0.534487
## 2 1 2 1215690 0.986757 110307 0.532328
## 3 1 3 1309570 1.091980 110574 0.547736
## 4 1 4 1511530 1.175780 121974 0.540846
## 5 1 5 1676730 1.160170 196606 0.591167
## 6 1 6 1823740 1.173760 265609 0.575417
tail(van)
## i t C Q PF LF
## 85 6 10 276797 0.092749 564867 0.554276
## 86 6 11 381478 0.112640 874818 0.517766
## 87 6 12 506969 0.154154 1013170 0.580049
## 88 6 13 633388 0.186461 930477 0.556024
## 89 6 14 804388 0.246847 851676 0.537791
## 90 6 15 1009500 0.304013 819476 0.525775
# Draw a scatterplot of C (total cost) over time for the 6 different airlines in Panel1
scatterplot(C~t|i, reg.line=FALSE, data=van)
# Correlation between variables
cor(van)
## i t C Q PF LF
## i 1.00000000 0.0000000 -0.7086242 -0.8679359 0.01329393 -0.3399570
## t 0.00000000 1.0000000 0.5000271 0.2711141 0.93118760 0.6001491
## C -0.70862418 0.5000271 1.0000000 0.9263269 0.47904374 0.4143377
## Q -0.86793588 0.2711141 0.9263269 1.0000000 0.22761248 0.4248100
## PF 0.01329393 0.9311876 0.4790437 0.2276125 1.00000000 0.4867001
## LF -0.33995702 0.6001491 0.4143377 0.4248100 0.48670008 1.0000000
POLS is the simplest model for analysing panel data. It assumes that the intercept and the slopes are the same across the different airlines and over time.
# Pooled OLS: one common intercept and common slopes for all airlines
ols <- lm(C ~ Q + PF + LF, data = van)
summary(ols)
##
## Call:
## lm(formula = C ~ Q + PF + LF, data = van)
##
## Residuals:
## Min 1Q Median 3Q Max
## -520654 -250270 37333 208690 849700
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 1.159e+06 3.606e+05 3.213 0.00185 **
## Q 2.026e+06 6.181e+04 32.781 < 2e-16 ***
## PF 1.225e+00 1.037e-01 11.814 < 2e-16 ***
## LF -3.066e+06 6.963e+05 -4.403 3.06e-05 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 281600 on 86 degrees of freedom
## Multiple R-squared: 0.9461, Adjusted R-squared: 0.9442
## F-statistic: 503.1 on 3 and 86 DF, p-value: < 2.2e-16
The regression result is:
C = 1159000 + 2026000 Q + 1.225 PF - 3066000 LF, R-squared = 0.9461 (very high)
This specification forces the 6 airlines to share the same intercept and the same slopes, which is a restrictive assumption and the main weakness of POLS. That is why we next try the fixed effects model.
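The restriction imposed by POLS can be written out explicitly. The following is a sketch of the specification behind the `lm` call above; the symbols $\beta_0,\dots,\beta_3$ and $\varepsilon_{it}$ are notation assumed here for illustration and do not appear in the original output.

```latex
% Pooled OLS: airline i = 1,...,6, period t = 1,...,15
% One intercept (beta_0) and one set of slopes shared by every airline.
\[
  C_{it} = \beta_0 + \beta_1 Q_{it} + \beta_2 PF_{it} + \beta_3 LF_{it} + \varepsilon_{it}
\]
```

Because $\beta_0$ carries no airline index, any persistent cost difference between airlines is pushed into the error term.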
library(plm)
## Loading required package: Formula
FEM <- plm(C~Q+PF+LF, data = van, index = c("i", "t"), model = "within")
summary(FEM)
## Oneway (individual) effect Within Model
##
## Call:
## plm(formula = C ~ Q + PF + LF, data = van, model = "within",
## index = c("i", "t"))
##
## Balanced Panel: n=6, T=15, N=90
##
## Residuals :
## Min. 1st Qu. Median 3rd Qu. Max.
## -552000 -159000 1800 137000 499000
##
## Coefficients :
## Estimate Std. Error t-value Pr(>|t|)
## Q 3.3190e+06 1.7135e+05 19.3694 < 2.2e-16 ***
## PF 7.7307e-01 9.7319e-02 7.9437 9.698e-12 ***
## LF -3.7974e+06 6.1377e+05 -6.1869 2.375e-08 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Total Sum of Squares: 5.0776e+13
## Residual Sum of Squares: 3.5865e+12
## R-Squared: 0.92937
## Adj. R-Squared: 0.92239
## F-statistic: 355.254 on 3 and 81 DF, p-value: < 2.22e-16
The output of this regression does not report the intercepts of the 6 airlines. To display them, we can use:
fixef(FEM)
## 1 2 3 4 5 6
## -131236.0 470497.3 1205944.6 1646356.2 1697016.5 1575238.4
Now we should test whether POLS or FE is more appropriate. We use an F test with H0: the intercepts of the 6 airlines are all equal (there are no individual effects), and H1: at least one intercept differs from the others.
pFtest(FEM, ols)
##
## F test for individual effects
##
## data: C ~ Q + PF + LF
## F = 14.595, df1 = 5, df2 = 81, p-value = 3.467e-10
## alternative hypothesis: significant effects
# The results show F = 14.595 with a p-value of 3.467e-10 (approximately 0), so we reject H0 in favour of H1, which means FE is preferred over POLS.
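As a rough check, the F statistic can be reproduced by hand from the two summaries printed earlier. The sketch below assumes the usual restricted-versus-unrestricted F formula, with $n = 6$ airlines, $T = 15$ periods and $K = 3$ regressors. The pooled residual sum of squares is not printed directly, so it is backed out here from the residual standard error (281600 on 86 degrees of freedom). Because these inputs are rounded, the result only approximates the reported value.

```latex
% Pooled RSS recovered from the lm() residual standard error
\[
  RSS_{P} \approx 281600^2 \times 86 \approx 6.820 \times 10^{12},
  \qquad
  RSS_{FE} = 3.5865 \times 10^{12}
\]
% F test for individual effects: df1 = n - 1 = 5, df2 = nT - n - K = 81
\[
  F = \frac{(RSS_{P} - RSS_{FE}) / (n - 1)}{RSS_{FE} / (nT - n - K)}
    \approx \frac{(6.820 - 3.5865) \times 10^{12} / 5}{3.5865 \times 10^{12} / 81}
    \approx 14.6
\]
```

This agrees with the F = 14.595 on 5 and 81 degrees of freedom reported by `pFtest`.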
REM <- plm(C~Q+PF+LF, data = van, index = c("i", "t"), model ="random")
summary(REM)
## Oneway (individual) effect Random Effect Model
## (Swamy-Arora's transformation)
##
## Call:
## plm(formula = C ~ Q + PF + LF, data = van, model = "random",
## index = c("i", "t"))
##
## Balanced Panel: n=6, T=15, N=90
##
## Effects:
## var std.dev share
## idiosyncratic 4.428e+10 2.104e+05 0.906
## individual 4.615e+09 6.793e+04 0.094
## theta: 0.3754
##
## Residuals :
## Min. 1st Qu. Median 3rd Qu. Max.
## -531000 -242000 50200 204000 783000
##
## Coefficients :
## Estimate Std. Error t-value Pr(>|t|)
## (Intercept) 1.0952e+06 3.7697e+05 2.9052 0.004665 **
## Q 2.1446e+06 8.8171e+04 24.3228 < 2.2e-16 ***
## PF 1.1757e+00 1.0356e-01 11.3531 < 2.2e-16 ***
## LF -3.0261e+06 7.2713e+05 -4.1616 7.466e-05 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Total Sum of Squares: 8.0306e+13
## Residual Sum of Squares: 6.2698e+12
## R-Squared: 0.92193
## Adj. R-Squared: 0.9192
## F-statistic: 338.508 on 3 and 86 DF, p-value: < 2.22e-16
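The `theta` value in the output above links the RE estimator to the other two models. This is a sketch assuming the standard quasi-demeaning form for a balanced one-way random effects model. Here $\sigma_{\varepsilon}^2$ and $\sigma_{\alpha}^2$ are notation assumed for the "idiosyncratic" and "individual" variances in the Effects table.

```latex
% Quasi-demeaning weight for a balanced panel with T = 15
\[
  \theta = 1 - \sqrt{\frac{\sigma_{\varepsilon}^2}{\sigma_{\varepsilon}^2 + T \sigma_{\alpha}^2}}
         = 1 - \sqrt{\frac{4.428 \times 10^{10}}{4.428 \times 10^{10} + 15 \times 4.615 \times 10^{9}}}
         \approx 0.375
\]
% RE estimates come from OLS on partially demeaned data
\[
  C_{it} - \theta \bar{C}_i
  = (1 - \theta) \beta_0
  + \beta_1 (Q_{it} - \theta \bar{Q}_i)
  + \beta_2 (PF_{it} - \theta \bar{PF}_i)
  + \beta_3 (LF_{it} - \theta \bar{LF}_i)
  + \text{error}
\]
```

With $\theta = 0$ this reduces to POLS and with $\theta = 1$ to the within (FE) estimator. At roughly 0.375, the RE estimates here sit closer to the POLS results than to the FE results.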
Now we can compare the results of the FE and RE models.
library(stargazer)
##
## Please cite as:
## Hlavac, Marek (2015). stargazer: Well-Formatted Regression and Summary Statistics Tables.
## R package version 5.2. http://CRAN.R-project.org/package=stargazer
stargazer(FEM, REM, title = "A comparison between FE and RE models", type = "text")
##
## A comparison between FE and RE models
## ============================================================
## Dependent variable:
## -----------------------------------------------
## C
## (1) (2)
## ------------------------------------------------------------
## Q 3,319,023.000*** 2,144,561.000***
## (171,354.100) (88,170.630)
##
## PF 0.773*** 1.176***
## (0.097) (0.104)
##
## LF -3,797,368.000*** -3,026,060.000***
## (613,773.100) (727,130.000)
##
## Constant 1,095,172.000***
## (376,967.000)
##
## ------------------------------------------------------------
## Observations 90 90
## R2 0.929 0.922
## Adjusted R2 0.922 0.919
## F Statistic 355.254*** (df = 3; 81) 338.508*** (df = 3; 86)
## ============================================================
## Note: *p<0.1; **p<0.05; ***p<0.01
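# Both models give the same signs and significance levels for Q, PF and LF, but the magnitudes differ (for example, the Q coefficient is about 3.32 million under FE and 2.14 million under RE), so a formal comparison such as the Hausman test (phtest() in plm) is needed before concluding that the two models agree on this data.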