Panel data analysis

With panel data, we normally use a pooled OLS (POLS), fixed effect (FE) or random effect (RE) model. In some cases, more advanced estimators are needed, such as instrumental variables (IV), GMM, Hausman-Taylor or FGLS.

This note covers the POLS, FE and RE models.

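First, install the package "plm", which is used to estimate the panel data models (install.packages("plm")). The packages foreign, car and stargazer loaded below must also be installed. The installation step is not shown in the code below because it only needs to be done once, not every time the R Markdown file is processed.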

library(foreign)
setwd("/Users/vancam/KTLR")
library(car)
# The data file used here is Panel1.dta. It was first collected by Professor Moshe Kim and later reused by Greene (2008).
# It covers 6 airlines (N = 6) over 1970-1984 (T = 15), so the panel is balanced and long, with 6 x 15 = 90 observations.
# Variables: i = airline, t = year, C = total cost (in $1,000), Q = output (revenue passenger miles, index number),
# PF = fuel price, LF = load factor (average capacity utilization of the fleet).
van <- read.dta("Panel1.dta")
head(van)

##   i t       C        Q     PF       LF
## 1 1 1 1140640 0.952757 106650 0.534487
## 2 1 2 1215690 0.986757 110307 0.532328
## 3 1 3 1309570 1.091980 110574 0.547736
## 4 1 4 1511530 1.175780 121974 0.540846
## 5 1 5 1676730 1.160170 196606 0.591167
## 6 1 6 1823740 1.173760 265609 0.575417

tail(van)

##    i  t       C        Q      PF       LF
## 85 6 10  276797 0.092749  564867 0.554276
## 86 6 11  381478 0.112640  874818 0.517766
## 87 6 12  506969 0.154154 1013170 0.580049
## 88 6 13  633388 0.186461  930477 0.556024
## 89 6 14  804388 0.246847  851676 0.537791
## 90 6 15 1009500 0.304013  819476 0.525775

# Scatterplot of total cost C over time t, one group per airline (in car versions before 3.0 the argument was reg.line)
scatterplot(C ~ t | i, regLine = FALSE, data = van)

# Correlation matrix of all variables (i and t are identifiers, so their correlations have no substantive meaning)
cor(van)

##              i         t          C          Q         PF         LF
## i   1.00000000 0.0000000 -0.7086242 -0.8679359 0.01329393 -0.3399570
## t   0.00000000 1.0000000  0.5000271  0.2711141 0.93118760  0.6001491
## C  -0.70862418 0.5000271  1.0000000  0.9263269 0.47904374  0.4143377
## Q  -0.86793588 0.2711141  0.9263269  1.0000000 0.22761248  0.4248100
## PF  0.01329393 0.9311876  0.4790437  0.2276125 1.00000000  0.4867001
## LF -0.33995702 0.6001491  0.4143377  0.4248100 0.48670008  1.0000000

Pooled OLS

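POLS is the simplest model for panel data. It assumes that the intercept and all slopes are the same for every airline and every year. The 90 observations are therefore treated as one stacked cross-section and estimated by ordinary least squares:

ols <- lm(C ~ Q + PF + LF, data = van)
summary(ols)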

## 
## Call:
## lm(formula = C ~ Q + PF + LF, data = van)
## 
## Residuals:
##     Min      1Q  Median      3Q     Max 
## -520654 -250270   37333  208690  849700 
## 
## Coefficients:
##               Estimate Std. Error t value Pr(>|t|)    
## (Intercept)  1.159e+06  3.606e+05   3.213  0.00185 ** 
## Q            2.026e+06  6.181e+04  32.781  < 2e-16 ***
## PF           1.225e+00  1.037e-01  11.814  < 2e-16 ***
## LF          -3.066e+06  6.963e+05  -4.403 3.06e-05 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 281600 on 86 degrees of freedom
## Multiple R-squared:  0.9461, Adjusted R-squared:  0.9442 
## F-statistic: 503.1 on 3 and 86 DF,  p-value: < 2.2e-16

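The estimated regression is:

$$\hat{C} = 1{,}159{,}000 + 2{,}026{,}000\,Q + 1.225\,PF - 3{,}066{,}000\,LF$$

with $R^2 = 0.9461$. This is a very good fit, although a high R-squared alone does not make the specification correct.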
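It can help to plug one observation into the fitted equation. The sketch below uses the rounded POLS coefficients and the first row of the data (airline 1, t = 1) as printed by head(van) above. It is only an arithmetic illustration, not a call to predict(), and the names b, x1, C_hat and residual_1 are hypothetical.

```r
# Rounded POLS coefficients from the lm() output above
b <- c(const = 1159000, Q = 2026000, PF = 1.225, LF = -3066000)
# First observation (airline 1, t = 1), copied from head(van); 1 multiplies the constant
x1 <- c(const = 1, Q = 0.952757, PF = 106650, LF = 0.534487)
C_hat <- sum(b * x1)            # fitted total cost for this observation
residual_1 <- 1140640 - C_hat   # actual C minus fitted C
print(c(fitted = C_hat, residual = residual_1))
```

The fitted cost is about 1,581,000 against an actual 1,140,640. The single pooled intercept leaves a large gap for this airline.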

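This specification forces all 6 airlines to share the same intercept and slopes, which is hard to believe: airlines differ in unobserved, time-invariant ways that affect their costs. If these airline-specific effects are correlated with Q, PF or LF, the POLS estimates are biased. This is the main weakness of POLS, and it is why we next try the fixed effect model.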
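A small simulated panel can illustrate this weakness (hypothetical data, not the airline file). In the sketch below, six units have intercepts that rise with the unit index. The regressor x is also higher for higher-index units, so the unit effects are correlated with x. Pooled OLS then attributes part of the intercept differences to x. A regression with one dummy per unit (the least-squares dummy variable, or LSDV, form of FE) recovers the true slope of 2.

```r
# Hypothetical simulated panel: 6 units observed over 15 periods
set.seed(1)
n_units <- 6
n_periods <- 15
sim <- data.frame(
  i = rep(seq_len(n_units), each = n_periods),
  t = rep(seq_len(n_periods), times = n_units)
)
alpha <- 10 * sim$i                              # unit intercepts grow with i
sim$x <- sim$i + rnorm(nrow(sim))                # x is also larger for larger i
sim$y <- alpha + 2 * sim$x + rnorm(nrow(sim))    # true slope on x is 2

# Pooled OLS: one common intercept and slope for all units
pooled <- lm(y ~ x, data = sim)
# LSDV: a separate intercept for each unit, equivalent to the FE (within) estimator
lsdv <- lm(y ~ x + factor(i), data = sim)

print(coef(pooled)["x"])  # biased upward: x partly proxies for the omitted unit effects
print(coef(lsdv)["x"])    # close to the true value of 2
```

Here pooled OLS returns a slope several times larger than 2, while the LSDV slope stays close to 2.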

Fixed effect model

library(plm)

## Loading required package: Formula

FEM <- plm(C~Q+PF+LF, data = van, index = c("i", "t"), model = "within")
summary(FEM)

## Oneway (individual) effect Within Model
## 
## Call:
## plm(formula = C ~ Q + PF + LF, data = van, model = "within", 
##     index = c("i", "t"))
## 
## Balanced Panel: n=6, T=15, N=90
## 
## Residuals :
##    Min. 1st Qu.  Median 3rd Qu.    Max. 
## -552000 -159000    1800  137000  499000 
## 
## Coefficients :
##       Estimate  Std. Error t-value  Pr(>|t|)    
## Q   3.3190e+06  1.7135e+05 19.3694 < 2.2e-16 ***
## PF  7.7307e-01  9.7319e-02  7.9437 9.698e-12 ***
## LF -3.7974e+06  6.1377e+05 -6.1869 2.375e-08 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Total Sum of Squares:    5.0776e+13
## Residual Sum of Squares: 3.5865e+12
## R-Squared:      0.92937
## Adj. R-Squared: 0.92239
## F-statistic: 355.254 on 3 and 81 DF, p-value: < 2.22e-16

The within regression reports no intercept. The within transformation subtracts each airline's mean from every variable, which removes the six airline-specific intercepts. They can be recovered with fixef():

fixef(FEM)

##         1         2         3         4         5         6 
## -131236.0  470497.3 1205944.6 1646356.2 1697016.5 1575238.4
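The mechanics can be checked in base R on a small simulated panel (hypothetical data with a single regressor x, not the airline file). The sketch runs OLS on unit-demeaned data and compares it with LSDV, which includes one dummy per unit. The two give the same slope. Each unit's intercept equals its mean of y minus the slope times its mean of x. This mirrors the level intercepts that plm's fixef() reports by default.

```r
# Hypothetical simulated panel: 6 units, 15 periods each
set.seed(2)
sim <- data.frame(i = rep(1:6, each = 15))
sim$x <- sim$i + rnorm(90)
sim$y <- 10 * sim$i + 2 * sim$x + rnorm(90)

# Within transformation: subtract each unit's mean (ave() uses mean by default)
demean <- function(v, g) v - ave(v, g)
y_w <- demean(sim$y, sim$i)
x_w <- demean(sim$x, sim$i)
# No intercept, because demeaned variables average zero within every unit
within_fit <- lm(y_w ~ x_w - 1)
b_within <- unname(coef(within_fit)["x_w"])

# LSDV with one dummy per unit and no common intercept gives the same slope
lsdv <- lm(y ~ x + factor(i) - 1, data = sim)
b_lsdv <- unname(coef(lsdv)["x"])

# Unit intercepts recovered from group means: alpha_i = mean(y_i) - b * mean(x_i)
alpha_hat <- tapply(sim$y, sim$i, mean) - b_within * tapply(sim$x, sim$i, mean)
alpha_lsdv <- coef(lsdv)[paste0("factor(i)", 1:6)]
print(cbind(within = as.numeric(alpha_hat), lsdv = as.numeric(alpha_lsdv)))
```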

Now we test whether FE is preferred to POLS, using an F test for individual effects:

H0: the intercepts of all 6 airlines are equal (no airline effects, so POLS is adequate)
H1: at least one airline intercept differs from the others

pFtest(FEM, ols)

## 
##  F test for individual effects
## 
## data:  C ~ Q + PF + LF
## F = 14.595, df1 = 5, df2 = 81, p-value = 3.467e-10
## alternative hypothesis: significant effects

# The results show F = 14.595 with p-value = 3.467e-10 (essentially 0), so we reject H0: the airline effects are jointly significant and FE is preferred to POLS.
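The F statistic can be reproduced by hand from the two residual sums of squares. The sketch takes the POLS residual sum of squares as the squared residual standard error times its 86 degrees of freedom. The FE residual sum of squares comes from summary(FEM). All variable names are illustrative.

```r
# Figures copied from the POLS and FE outputs above
rss_pooled <- 281600^2 * 86   # POLS: residual standard error squared times residual df
rss_fe <- 3.5865e12           # FE: residual sum of squares
n_airlines <- 6
n_years <- 15
k <- 3                        # slopes: Q, PF, LF

df1 <- n_airlines - 1                      # restrictions: all airline intercepts equal
df2 <- n_airlines * n_years - n_airlines - k  # residual df of the FE model (81)
F_stat <- ((rss_pooled - rss_fe) / df1) / (rss_fe / df2)
p_value <- pf(F_stat, df1, df2, lower.tail = FALSE)
print(c(F = F_stat, p = p_value))
```

This gives F of about 14.60 on (5, 81) degrees of freedom. The small gap from 14.595 comes from the rounding of the printed residual standard error.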

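Random effect model

Now we try the RE model. Unlike FE, it treats the airline effects as random and uncorrelated with the regressors. When that assumption holds, RE is more efficient than FE and can also estimate the effects of time-invariant variables.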

REM <- plm(C~Q+PF+LF, data = van, index = c("i", "t"), model ="random")
summary(REM)

## Oneway (individual) effect Random Effect Model 
##    (Swamy-Arora's transformation)
## 
## Call:
## plm(formula = C ~ Q + PF + LF, data = van, model = "random", 
##     index = c("i", "t"))
## 
## Balanced Panel: n=6, T=15, N=90
## 
## Effects:
##                     var   std.dev share
## idiosyncratic 4.428e+10 2.104e+05 0.906
## individual    4.615e+09 6.793e+04 0.094
## theta:  0.3754  
## 
## Residuals :
##    Min. 1st Qu.  Median 3rd Qu.    Max. 
## -531000 -242000   50200  204000  783000 
## 
## Coefficients :
##                Estimate  Std. Error t-value  Pr(>|t|)    
## (Intercept)  1.0952e+06  3.7697e+05  2.9052  0.004665 ** 
## Q            2.1446e+06  8.8171e+04 24.3228 < 2.2e-16 ***
## PF           1.1757e+00  1.0356e-01 11.3531 < 2.2e-16 ***
## LF          -3.0261e+06  7.2713e+05 -4.1616 7.466e-05 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Total Sum of Squares:    8.0306e+13
## Residual Sum of Squares: 6.2698e+12
## R-Squared:      0.92193
## Adj. R-Squared: 0.9192
## F-statistic: 338.508 on 3 and 86 DF, p-value: < 2.22e-16
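The theta reported by summary(REM) shows where RE sits between POLS and FE. In the Swamy-Arora random effects estimator for a balanced panel, each variable is quasi-demeaned as y_it - theta * mean(y_i), with theta = 1 - sqrt(sigma_e^2 / (sigma_e^2 + T * sigma_u^2)). Setting theta = 0 gives POLS and theta = 1 gives FE. The sketch below recomputes theta and the individual share from the printed variance components.

```r
# Variance components copied from the "Effects" table above
sigma2_e <- 4.428e10    # idiosyncratic variance
sigma2_u <- 4.615e9     # individual (airline) variance
n_years <- 15           # T, the number of periods per airline

theta <- 1 - sqrt(sigma2_e / (sigma2_e + n_years * sigma2_u))
share_u <- sigma2_u / (sigma2_e + sigma2_u)   # share of the individual effect
print(c(theta = theta, share_individual = share_u))
```

This returns theta of about 0.375 and an individual share of about 0.094, matching the output. Theta is much closer to 0 than to 1, so the RE estimates stay close to the POLS ones. The Q coefficient shows this: about 2.14 million under RE, 2.03 million under POLS and 3.32 million under FE.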

Now we can compare the results of the FE and RE models:

library(stargazer)

## 
## Please cite as:
## 
##  Hlavac, Marek (2015). stargazer: Well-Formatted Regression and Summary Statistics Tables.
##  R package version 5.2. http://CRAN.R-project.org/package=stargazer

stargazer(FEM, REM, title = "A comparison between FE and RE models", type = "text")

## 
## A comparison between FE and RE models
## ============================================================
##                            Dependent variable:              
##              -----------------------------------------------
##                                     C                       
##                        (1)                     (2)          
## ------------------------------------------------------------
## Q               3,319,023.000***        2,144,561.000***    
##                   (171,354.100)           (88,170.630)      
##                                                             
## PF                  0.773***                1.176***        
##                      (0.097)                 (0.104)        
##                                                             
## LF              -3,797,368.000***       -3,026,060.000***   
##                   (613,773.100)           (727,130.000)     
##                                                             
## Constant                                1,095,172.000***    
##                                           (376,967.000)     
##                                                             
## ------------------------------------------------------------
## Observations           90                      90           
## R2                    0.929                   0.922         
## Adjusted R2           0.922                   0.919         
## F Statistic  355.254*** (df = 3; 81) 338.508*** (df = 3; 86)
## ============================================================
## Note:                            *p<0.1; **p<0.05; ***p<0.01

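# The FE and RE estimates have the same signs and are all significant at the 1% level, so both models tell a similar story for this data. They are not identical, though: the coefficient on Q is about 3.32 million under FE and 2.14 million under RE. Whether this gap is statistically significant is what the Hausman test (phtest() in plm) checks.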
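The usual formal check of whether FE and RE differ is the Hausman test. In plm it is phtest(FEM, REM), which uses all three slopes and their full covariance matrices. The sketch below is only a rough one-coefficient version: it applies the Hausman formula to the coefficient on Q, using the rounded estimates and standard errors printed above. Like the test itself under H0, it assumes that RE is efficient, so the variance of the difference equals the difference of the variances.

```r
# Coefficient on Q and its standard error, copied from the FE and RE outputs above
b_fe <- 3.3190e6
se_fe <- 1.7135e5
b_re <- 2.1446e6
se_re <- 8.8171e4

# Under H0 (airline effects uncorrelated with the regressors) RE is efficient,
# so Var(b_fe - b_re) = Var(b_fe) - Var(b_re)
H <- (b_fe - b_re)^2 / (se_fe^2 - se_re^2)
p_value <- pchisq(H, df = 1, lower.tail = FALSE)
print(c(H = H, p = p_value))
```

With these figures H is roughly 64 on 1 degree of freedom, far above the 5% critical value of 3.84. This suggests the Q estimates do differ. The phtest() result on the full models is the one to rely on.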


Ha Cam Van

6/15/2017