In this section, I show how to estimate panel regression equations by hand. Because the topics are getting more advanced, I use basic R commands such as “lm” to handle the computation, so that we can focus on the panel analysis itself. The main reference is Greene (2007), “Econometric Analysis”, 6th edition. Special thanks to my colleague, Youngjune Kim, for lending me the hard copy!

Panel data models fall into two categories: fixed effects and random effects. The essential distinction comes from the exogeneity assumption, that is, whether the effects are correlated with the independent variables. The fixed effects model allows for this correlation, so the effects have to be included in the regression. Otherwise they are hidden in the “noise” and cause an endogeneity problem that biases the estimates. The random effects model assumes that the effects are not correlated with the independent variables. In principle, one does not have to model these effects: leaving them in the residuals does not bias the coefficient estimates. The estimates remain unbiased, but efficiency suffers, because the error terms still contain information about the effects.
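To make the distinction concrete, the model can be written as follows. The notation is my assumption, chosen to match the \(\mu\) and \(\epsilon\) used later in this post: \(\mu_i\) is the effect of group \(i\) and \(\epsilon_{it}\) is the idiosyncratic error.

\[
y_{it} = x_{it}'\beta + \mu_i + \epsilon_{it}, \qquad i = 1, \dots, n, \quad t = 1, \dots, T
\]

\[
\text{fixed effects: } \operatorname{Cov}(x_{it}, \mu_i) \neq 0 \text{ is allowed}, \qquad
\text{random effects: } \operatorname{Cov}(x_{it}, \mu_i) = 0 \text{ is assumed}
\]

Under the random effects assumption the composite error \(\mu_i + \epsilon_{it}\) is uncorrelated with \(x_{it}\), but it is correlated across the observations of the same group, which is where the efficiency loss comes from.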

1. Fixed Effect model

The fixed effects model is easy to estimate. Least squares dummy variable (LSDV) estimation is one of the most convenient ways to obtain the results. In R, you can just wrap the group variable in “factor” in the regression formula.

data("Grunfeld", package="plm")
group_reg <- lm(inv ~ value + capital  + factor(firm), data = Grunfeld) # group effect only
time_reg <- lm(inv ~ value + capital  + factor(year), data = Grunfeld) # time effect only
group_time_reg <- lm(inv ~ value + capital  + factor(year) + factor(firm), data = Grunfeld) # group and time effect
ols_reg <- lm(inv ~ value + capital, data = Grunfeld) # pooled result

summary(ols_reg)

## 
## Call:
## lm(formula = inv ~ value + capital, data = Grunfeld)
## 
## Residuals:
##     Min      1Q  Median      3Q     Max 
## -291.68  -30.01    5.30   34.83  369.45 
## 
## Coefficients:
##               Estimate Std. Error t value Pr(>|t|)    
## (Intercept) -42.714369   9.511676  -4.491 1.21e-05 ***
## value         0.115562   0.005836  19.803  < 2e-16 ***
## capital       0.230678   0.025476   9.055  < 2e-16 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 94.41 on 197 degrees of freedom
## Multiple R-squared:  0.8124, Adjusted R-squared:  0.8105 
## F-statistic: 426.6 on 2 and 197 DF,  p-value: < 2.2e-16

summary(group_reg)

## 
## Call:
## lm(formula = inv ~ value + capital + factor(firm), data = Grunfeld)
## 
## Residuals:
##      Min       1Q   Median       3Q      Max 
## -184.009  -17.643    0.563   19.192  250.710 
## 
## Coefficients:
##                  Estimate Std. Error t value Pr(>|t|)    
## (Intercept)     -70.29672   49.70796  -1.414    0.159    
## value             0.11012    0.01186   9.288  < 2e-16 ***
## capital           0.31007    0.01735  17.867  < 2e-16 ***
## factor(firm)2   172.20253   31.16126   5.526 1.08e-07 ***
## factor(firm)3  -165.27512   31.77556  -5.201 5.14e-07 ***
## factor(firm)4    42.48742   43.90988   0.968    0.334    
## factor(firm)5   -44.32010   50.49226  -0.878    0.381    
## factor(firm)6    47.13542   46.81068   1.007    0.315    
## factor(firm)7     3.74324   50.56493   0.074    0.941    
## factor(firm)8    12.75106   44.05263   0.289    0.773    
## factor(firm)9   -16.92555   48.45327  -0.349    0.727    
## factor(firm)10   63.72887   50.33023   1.266    0.207    
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 52.77 on 188 degrees of freedom
## Multiple R-squared:  0.9441, Adjusted R-squared:  0.9408 
## F-statistic: 288.5 on 11 and 188 DF,  p-value: < 2.2e-16
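The LSDV slopes can also be obtained without any dummies, by subtracting group means from every variable (the within transformation). The sketch below checks this on a small simulated panel rather than on Grunfeld, so that it runs with base R only; all variable names and numbers in it are illustrative assumptions, not part of the analysis above.

```r
# Simulated balanced panel: 5 firms observed for 8 years (illustrative values)
set.seed(1)
n_firm <- 5
n_year <- 8
firm <- rep(seq_len(n_firm), each = n_year)
alpha <- rnorm(n_firm, sd = 3)[firm]       # firm-specific intercepts
x <- alpha + rnorm(n_firm * n_year)        # regressor correlated with the effects
y <- 1 + 2 * x + alpha + rnorm(n_firm * n_year)

# LSDV: one dummy per firm
lsdv <- lm(y ~ x + factor(firm))

# Within transformation: subtract each firm's mean from y and x
y_within <- y - ave(y, firm)
x_within <- x - ave(x, firm)
within <- lm(y_within ~ x_within - 1)

slope_lsdv <- unname(coef(lsdv)["x"])
slope_within <- unname(coef(within)["x_within"])
c(lsdv = slope_lsdv, within = slope_within)  # the two slopes coincide
```

The two slopes agree up to floating-point error. The standard errors from the demeaned regression are not directly comparable, because `lm` does not know that the group means used up degrees of freedom.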

One can test the validity of the fixed effects with an F-test (as below), since the pooled regression is the restricted version of the LSDV regression. In large samples, the Wald test works as well.

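n <- 10     # number of firms
time <- 20  # number of years
k <- 2      # number of slope coefficients (value, capital)
r2_pool <- summary(ols_reg)$r.squared
r2_group <- summary(group_reg)$r.squared
r2_time <- summary(time_reg)$r.squared
# F = [(R2_unrestricted - R2_restricted) / J] / [(1 - R2_unrestricted) / residual df],
# where J is the number of dummies added to the pooled model
group_f_stat <- ((r2_group - r2_pool)/(n - 1)) / ((1 - r2_group)/(n*time - n - k))
time_f_stat <- ((r2_time - r2_pool)/(time - 1)) / ((1 - r2_time)/(n*time - time - k))
group_f_stat  # roughly 49 given the R-squared values printed above (0.9441 vs. 0.8124)
time_f_stat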
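As a cross-check on the R-squared form of the F-test, the same statistic can be obtained from `anova`, which compares the residual sums of squares of two nested `lm` fits. The sketch below uses a small simulated panel so that it is self-contained; the data and names are assumptions for illustration only.

```r
# Simulated balanced panel: 6 groups, 10 periods, one regressor (illustrative)
set.seed(2)
n <- 6
time <- 10
k <- 1
firm <- rep(seq_len(n), each = time)
x <- rnorm(n * time)
y <- 0.5 * x + rnorm(n, sd = 2)[firm] + rnorm(n * time)

pooled <- lm(y ~ x)                 # restricted model
lsdv <- lm(y ~ x + factor(firm))    # unrestricted model with group dummies

r2_pooled <- summary(pooled)$r.squared
r2_lsdv <- summary(lsdv)$r.squared

# F = [(R2_u - R2_r) / (n - 1)] / [(1 - R2_u) / (n*T - n - k)]
f_manual <- ((r2_lsdv - r2_pooled) / (n - 1)) /
  ((1 - r2_lsdv) / (n * time - n - k))

# Same test via the nested-model comparison built into R
f_anova <- anova(pooled, lsdv)$F[2]

# p-value of the manual statistic under F(n - 1, n*T - n - k)
p_value <- pf(f_manual, n - 1, n * time - n - k, lower.tail = FALSE)
c(manual = f_manual, anova = f_anova, p = p_value)
```

With the objects from this post, `anova(ols_reg, group_reg)` should play the same role for the firm effects.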

2. Random Effect model

The trouble with the random effects model is that the covariance matrix of the errors, and hence the efficiency of OLS, is strongly affected. It is natural to use FGLS estimation to take such “effects” into account. Given homoskedasticity and no autocorrelation, the covariance matrix \(\omega\) of the error terms takes a particular structure (see page 202 in Greene (2007)). \(\omega\) depends only on \(var(\epsilon)\) and \(var(\mu)\). The mission is clear: estimate the covariance matrix!
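For reference, here is a sketch of that structure in the standard textbook form for a balanced panel. The symbols \(\sigma_\epsilon^2 = var(\epsilon)\), \(\sigma_\mu^2 = var(\mu)\), \(\Sigma\) and \(i_T\) (a \(T \times 1\) vector of ones) are my notation and may differ from the book's.

\[
\Sigma = \sigma_\epsilon^2 I_T + \sigma_\mu^2 \, i_T i_T', \qquad \omega = I_n \otimes \Sigma
\]

FGLS then amounts to OLS on partially demeaned data, \(y_{it} - \theta \bar{y}_i\) on \(x_{it} - \theta \bar{x}_i\), with

\[
\theta = 1 - \frac{\sigma_\epsilon}{\sqrt{\sigma_\epsilon^2 + T \sigma_\mu^2}}
\]

so only the two variance components need to be estimated.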

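# LSDV residual variance estimates var(epsilon)
square_lsdv <- sum((group_reg$residuals - mean(group_reg$residuals))^2)/(n*time - n - k)
# pooled residual variance estimates var(epsilon) + var(mu)
square_pool <- sum((ols_reg$residuals - mean(ols_reg$residuals))^2)/(n*time - k - 1)
# var(mu) is estimated by the difference square_pool - square_lsdv
theta <- 1 - sqrt(square_lsdv / (square_lsdv + time*(square_pool - square_lsdv)))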
theta

## [1] 0.8509607

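The value of theta is informative. If it is zero, \(var(\mu)\) is equal to 0, meaning that there are no random effects, and the FGLS estimator collapses to pooled OLS. As theta approaches one, the transformation subtracts the full group means, and the random effects estimator approaches the fixed effects (within) estimator. This is why the Hausman test compares the fixed effects and random effects coefficient estimates: if the effects are uncorrelated with the regressors, both estimators are consistent and the two sets of coefficients should be close.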
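A small helper makes these limiting cases easy to check. This is a minimal sketch; `theta_fun` and its argument names are hypothetical, and the last call plugs in the variance components that plm prints in its random effects summary further down in this post.

```r
# theta as a function of the two variance components and the panel length
theta_fun <- function(var_eps, var_mu, time) {
  1 - sqrt(var_eps / (var_eps + time * var_mu))
}

theta_none <- theta_fun(var_eps = 1, var_mu = 0, time = 20)    # no group effects
theta_big <- theta_fun(var_eps = 1, var_mu = 1e8, time = 20)   # effects dominate
theta_plm <- theta_fun(var_eps = 2784.46, var_mu = 7089.80, time = 20)

theta_none  # exactly 0: the transformation leaves the data untouched (pooled OLS)
theta_big   # very close to 1: almost full demeaning (within estimator)
theta_plm   # about 0.861, in line with the theta reported by plm
```

Note that theta also moves toward one as the number of periods grows, even for a fixed ratio of the two variances.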
In the following code, I compute the group means of the data and subtract theta times these means from each observation.

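library(dplyr)  # provides %>%, left_join and mutate
# firm-level means of inv, value and capital (drop the Group.1 and year columns)
firm_means <- aggregate(Grunfeld, by=list(Grunfeld$firm), FUN=mean)[,-c(1,3)]
dat <- Grunfeld %>%
  left_join(firm_means, by = 'firm') %>%  # .x = original value, .y = firm mean
  mutate(inv_new = (inv.x - theta*inv.y)/sqrt(square_lsdv), cap_new = (capital.x - theta*capital.y)/sqrt(square_lsdv), 
         value_new = (value.x - theta*value.y)/sqrt(square_lsdv))
summary(lm(inv_new ~ cap_new + value_new, data = dat))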

## 
## Call:
## lm(formula = inv_new ~ cap_new + value_new, data = dat)
## 
## Residuals:
##     Min      1Q  Median      3Q     Max 
## -3.3966 -0.3846  0.0970  0.3681  4.8008 
## 
## Coefficients:
##             Estimate Std. Error t value Pr(>|t|)    
## (Intercept) -0.16301    0.07687  -2.121   0.0352 *  
## cap_new      0.30781    0.01722  17.877   <2e-16 ***
## value_new    0.10975    0.01036  10.594   <2e-16 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 1.004 on 197 degrees of freedom
## Multiple R-squared:  0.7699, Adjusted R-squared:  0.7676 
## F-statistic: 329.6 on 2 and 197 DF,  p-value: < 2.2e-16

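Note that if theta is one, the group means are subtracted in full, and the estimation becomes the within-group estimation, which is equivalent to LSDV estimation, or the fixed effects model! See below. The coefficients match the LSDV output above; the standard errors are slightly smaller because lm counts 197 residual degrees of freedom rather than the 188 left after estimating the ten firm effects.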

theta <- 1  # full demeaning: the within transformation
dat <- Grunfeld %>%
  left_join(., aggregate(Grunfeld, by=list(Grunfeld$firm),FUN=mean)[,-c(1,3)], by = c('firm')) %>% 
  mutate(inv_new = (inv.x - theta*inv.y)/sqrt(square_lsdv), cap_new = (capital.x - theta*capital.y)/sqrt(square_lsdv), 
         value_new = (value.x - theta*value.y)/sqrt(square_lsdv))
summary(lm(inv_new ~ cap_new + value_new, data = dat))

## 
## Call:
## lm(formula = inv_new ~ cap_new + value_new, data = dat)
## 
## Residuals:
##     Min      1Q  Median      3Q     Max 
## -3.4871 -0.3344  0.0107  0.3637  4.7512 
## 
## Coefficients:
##              Estimate Std. Error t value Pr(>|t|)    
## (Intercept) 4.976e-16  6.908e-02   0.000        1    
## cap_new     3.101e-01  1.695e-02  18.289   <2e-16 ***
## value_new   1.101e-01  1.158e-02   9.508   <2e-16 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 0.9769 on 197 degrees of freedom
## Multiple R-squared:  0.7668, Adjusted R-squared:  0.7644 
## F-statistic: 323.8 on 2 and 197 DF,  p-value: < 2.2e-16

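The results from the “plm” package are shown below for comparison. The within estimates match the LSDV results. The random effects estimates are close to, but not identical with, the manual FGLS results, because plm uses the Swamy-Arora variance estimators by default and reports theta = 0.8612 rather than the 0.8510 computed above.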

library(plm)
grun.fe <- plm(inv~value+capital,data=Grunfeld,model="within")
grun.re <- plm(inv~value+capital,data=Grunfeld,model="random")
summary(grun.fe)

## Oneway (individual) effect Within Model
## 
## Call:
## plm(formula = inv ~ value + capital, data = Grunfeld, model = "within")
## 
## Balanced Panel: n=10, T=20, N=200
## 
## Residuals :
##     Min.  1st Qu.   Median  3rd Qu.     Max. 
## -184.000  -17.600    0.563   19.200  251.000 
## 
## Coefficients :
##         Estimate Std. Error t-value  Pr(>|t|)    
## value   0.110124   0.011857  9.2879 < 2.2e-16 ***
## capital 0.310065   0.017355 17.8666 < 2.2e-16 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Total Sum of Squares:    2244400
## Residual Sum of Squares: 523480
## R-Squared:      0.76676
## Adj. R-Squared: 0.72075
## F-statistic: 309.014 on 2 and 188 DF, p-value: < 2.22e-16

summary(grun.re)

## Oneway (individual) effect Random Effect Model 
##    (Swamy-Arora's transformation)
## 
## Call:
## plm(formula = inv ~ value + capital, data = Grunfeld, model = "random")
## 
## Balanced Panel: n=10, T=20, N=200
## 
## Effects:
##                   var std.dev share
## idiosyncratic 2784.46   52.77 0.282
## individual    7089.80   84.20 0.718
## theta:  0.8612  
## 
## Residuals :
##    Min. 1st Qu.  Median 3rd Qu.    Max. 
## -178.00  -19.70    4.69   19.50  253.00 
## 
## Coefficients :
##               Estimate Std. Error t-value Pr(>|t|)    
## (Intercept) -57.834415  28.898935 -2.0013  0.04674 *  
## value         0.109781   0.010493 10.4627  < 2e-16 ***
## capital       0.308113   0.017180 17.9339  < 2e-16 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Total Sum of Squares:    2381400
## Residual Sum of Squares: 548900
## R-Squared:      0.7695
## Adj. R-Squared: 0.75796
## F-statistic: 328.837 on 2 and 197 DF, p-value: < 2.22e-16

Panel Data Analysis, part 1

Bowen Chen

September 9, 2016
