In this section, I show how to estimate panel regression models by hand. Because the topics are becoming more advanced, I rely on basic R commands such as “lm” for the computation, so that we can focus on the panel analysis itself. The main reference is Greene (2007), “Econometric Analysis”, 6th edition. Special thanks to my colleague, Youngjune Kim, for lending me the hard copy!
Panel data models fall into fixed effects and random effects models. The essential distinction originates from the exogeneity assumption, that is, whether the unobserved effects are correlated with the independent variables. The fixed effects model allows this correlation and therefore has to include the effects in the regression. Otherwise the effects would be hidden in the “noise” and cause an endogeneity problem that biases the estimates. The random effects model assumes that the effects are uncorrelated with the independent variables. In that case one does not have to model the effects explicitly, because leaving them in the error term does not bias the coefficient estimates. The price is efficiency: the composite error contains the effect, so errors within the same group are correlated, OLS is inefficient, and its usual standard errors are unreliable.
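To make the distinction concrete, here is a small simulation sketch. It is not from Greene; the data-generating process, the sample sizes and all variable names are assumptions chosen for illustration. The regressor is built to be correlated with the group effect and the true slope is 1, so pooled OLS should be biased while the regression with group dummies should not.

```r
set.seed(42)
n_groups  <- 500                       # number of groups (assumed)
n_periods <- 5                         # periods per group (assumed)
n_obs     <- n_groups * n_periods
group     <- rep(seq_len(n_groups), each = n_periods)

mu <- rnorm(n_groups)[group]           # group effect, constant within a group
x  <- mu + rnorm(n_obs)                # regressor correlated with the effect
y  <- 1 * x + mu + rnorm(n_obs)        # true slope is 1

# Pooled OLS leaves the effect in the error term
pooled_slope <- unname(coef(lm(y ~ x))["x"])

# Group dummies absorb the effect
fe_slope <- unname(coef(lm(y ~ x + factor(group)))["x"])

c(pooled = pooled_slope, fixed_effects = fe_slope)
```

In this design the pooled slope converges to 1.5 rather than 1, because the covariance between the regressor and the omitted effect is half the variance of the regressor.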
The fixed effects model is easy to estimate. Least squares dummy variable (LSDV) estimation is one of the most convenient ways to obtain the results. In R, you can simply wrap the group variable in “factor” in the regression formula.
data("Grunfeld", package="plm")
group_reg <- lm(inv ~ value + capital + factor(firm), data = Grunfeld) # group effect only
time_reg <- lm(inv ~ value + capital + factor(year), data = Grunfeld) # time effect only
group_time_reg <- lm(inv ~ value + capital + factor(year) + factor(firm), data = Grunfeld) # group and time effect
ols_reg <- lm(inv ~ value + capital, data = Grunfeld) # pooled result
summary(ols_reg)
##
## Call:
## lm(formula = inv ~ value + capital, data = Grunfeld)
##
## Residuals:
## Min 1Q Median 3Q Max
## -291.68 -30.01 5.30 34.83 369.45
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) -42.714369 9.511676 -4.491 1.21e-05 ***
## value 0.115562 0.005836 19.803 < 2e-16 ***
## capital 0.230678 0.025476 9.055 < 2e-16 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 94.41 on 197 degrees of freedom
## Multiple R-squared: 0.8124, Adjusted R-squared: 0.8105
## F-statistic: 426.6 on 2 and 197 DF, p-value: < 2.2e-16
summary(group_reg)
##
## Call:
## lm(formula = inv ~ value + capital + factor(firm), data = Grunfeld)
##
## Residuals:
## Min 1Q Median 3Q Max
## -184.009 -17.643 0.563 19.192 250.710
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) -70.29672 49.70796 -1.414 0.159
## value 0.11012 0.01186 9.288 < 2e-16 ***
## capital 0.31007 0.01735 17.867 < 2e-16 ***
## factor(firm)2 172.20253 31.16126 5.526 1.08e-07 ***
## factor(firm)3 -165.27512 31.77556 -5.201 5.14e-07 ***
## factor(firm)4 42.48742 43.90988 0.968 0.334
## factor(firm)5 -44.32010 50.49226 -0.878 0.381
## factor(firm)6 47.13542 46.81068 1.007 0.315
## factor(firm)7 3.74324 50.56493 0.074 0.941
## factor(firm)8 12.75106 44.05263 0.289 0.773
## factor(firm)9 -16.92555 48.45327 -0.349 0.727
## factor(firm)10 63.72887 50.33023 1.266 0.207
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 52.77 on 188 degrees of freedom
## Multiple R-squared: 0.9441, Adjusted R-squared: 0.9408
## F-statistic: 288.5 on 11 and 188 DF, p-value: < 2.2e-16
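The dummies are not the only route to these slopes. As a minimal sketch (the helper name `demean` is hypothetical), subtracting firm means from every variable and running OLS without an intercept gives the same coefficients on value and capital as the LSDV regression:

```r
data("Grunfeld", package = "plm")

# Subtract the group mean from each observation
demean <- function(x, g) x - ave(x, g)

within_dat <- data.frame(
  inv     = demean(Grunfeld$inv, Grunfeld$firm),
  value   = demean(Grunfeld$value, Grunfeld$firm),
  capital = demean(Grunfeld$capital, Grunfeld$firm)
)

# Within estimator: OLS on demeaned data, no intercept
within_reg <- lm(inv ~ value + capital - 1, data = within_dat)

# LSDV estimator for comparison
lsdv_reg <- lm(inv ~ value + capital + factor(firm), data = Grunfeld)

cbind(within = coef(within_reg),
      lsdv   = coef(lsdv_reg)[c("value", "capital")])
```

The standard errors reported by `lm` for the demeaned regression are slightly too small, since `lm` does not know that ten group means were estimated and therefore overstates the residual degrees of freedom.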
One can test the validity of the fixed effects with an F-test (as below), since the pooled regression is a restricted version of the LSDV regression. In large samples, the Wald test works as well.
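For reference, the F statistic for the group effects can be written in terms of R-squared. The subscripts \(u\) and \(r\) are my notation for the unrestricted (LSDV) and restricted (pooled) regressions, \(n\) is the number of groups, \(T\) the number of periods and \(k\) the number of slope regressors:

\[
F(n-1,\; nT-n-k) = \frac{(R_u^2 - R_r^2)/(n-1)}{(1-R_u^2)/(nT-n-k)}
\]

For time effects, replace \(n-1\) by \(T-1\) and \(nT-n-k\) by \(nT-T-k\).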
n <- 10
time <- 20
k <- 2
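# F = [(R2_u - R2_r) / J] / [(1 - R2_u) / df_u]: J restrictions, df_u residual df of the unrestricted model
r2_pool <- summary(ols_reg)$r.squared
group_f_stat <- ((summary(group_reg)$r.squared - r2_pool)/(n - 1)) / ((1 - summary(group_reg)$r.squared)/(n*time - n - k))
time_f_stat <- ((summary(time_reg)$r.squared - r2_pool)/(time - 1)) / ((1 - summary(time_reg)$r.squared)/(n*time - time - k))
group_f_stat # about 49 given the R-squared values printed above (0.9441 vs 0.8124), on 9 and 188 df
time_f_stat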
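The restricted-versus-unrestricted comparison can also be cross-checked with base R's `anova()`, which computes the F statistic for nested `lm` fits from their residual sums of squares. A minimal sketch, refitting the two models so that the block runs on its own:

```r
data("Grunfeld", package = "plm")

pooled <- lm(inv ~ value + capital, data = Grunfeld)                  # restricted
lsdv   <- lm(inv ~ value + capital + factor(firm), data = Grunfeld)   # unrestricted

# F-test of the nested models
f_tab   <- anova(pooled, lsdv)
f_anova <- f_tab$F[2]

# The same statistic from the R-squared formula
r2_u <- summary(lsdv)$r.squared
r2_r <- summary(pooled)$r.squared
f_r2 <- ((r2_u - r2_r) / f_tab$Df[2]) / ((1 - r2_u) / lsdv$df.residual)

c(anova = f_anova, r_squared_formula = f_r2)
```

Both routes give the same number because the two regressions share the same dependent variable and hence the same total sum of squares.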
The trouble with the random effects model is that the error covariance matrix, and hence efficiency, is affected by the group effects. It is natural to use FGLS estimation to take the “effect” into account. Given homoscedasticity and no autocorrelation, the covariance matrix \(\Omega\) of the error terms takes a particular structure (see page 202 in Greene (2007)). \(\Omega\) depends only on \(var(\epsilon)\) and \(var(\mu)\). The mission is clear: estimate the covariance matrix!
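Written out for a balanced panel, the structure looks as follows. This is a sketch in my own notation, with \(\sigma_\epsilon^2 = var(\epsilon)\), \(\sigma_\mu^2 = var(\mu)\), \(i_T\) a \(T \times 1\) vector of ones, and \(\Sigma\) the covariance matrix of the \(T\) composite errors of one group:

\[
\Sigma = \sigma_\epsilon^2 I_T + \sigma_\mu^2\, i_T i_T', \qquad
\Omega = I_n \otimes \Sigma, \qquad
\theta = 1 - \frac{\sigma_\epsilon}{\sqrt{\sigma_\epsilon^2 + T\sigma_\mu^2}}
\]

GLS then amounts to OLS on the quasi-demeaned data \((y_{it} - \theta\bar{y}_{i})/\sigma_\epsilon\) and \((x_{it} - \theta\bar{x}_{i})/\sigma_\epsilon\), where the bars denote group means.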
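# residual variance of the LSDV regression: estimate of var(epsilon)
square_lsdv <- sum((group_reg$residuals - mean(group_reg$residuals))^2)/(n*time - n - k)
# residual variance of the pooled regression: used here as an estimate of var(epsilon) + var(mu)
square_pool <- sum((ols_reg$residuals - mean(ols_reg$residuals))^2)/(n*time - k - 1)
# theta = 1 - sigma_eps / sqrt(sigma_eps^2 + T * sigma_mu^2)
theta <- 1 - sqrt(square_lsdv / (square_lsdv + time*(square_pool - square_lsdv)))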
theta
## [1] 0.8509607
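The value of theta is informative. If it is zero, \(var(\mu)\) is equal to 0, meaning that there are no group effects and the FGLS estimator collapses to pooled OLS. As theta approaches one, which happens when \(var(\mu)\) is large relative to \(var(\epsilon)\) or when the time dimension is long, the estimator approaches the within (fixed effects) estimator. The random effects estimator thus sits between the pooled and the fixed effects estimators. This helps explain why the Hausman test compares the fixed effects and the random effects coefficient estimates: under the random effects assumption both are consistent and should be close, whereas correlation between the effects and the regressors drives them apart.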
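The limiting cases are easy to check numerically. A minimal sketch in base R; the function name `theta_fun` and its argument names are hypothetical, and the last call plugs in the variance components that the plm output later in this section reports (2784.46 and 7089.80, with 20 periods):

```r
# theta = 1 - sigma_eps / sqrt(sigma_eps^2 + T * sigma_mu^2)
theta_fun <- function(var_eps, var_mu, n_periods) {
  1 - sqrt(var_eps / (var_eps + n_periods * var_mu))
}

theta_none  <- theta_fun(1, 0, 20)      # no group effect: theta is exactly 0
theta_large <- theta_fun(1, 1e6, 20)    # dominant group effect: theta close to 1
theta_long  <- theta_fun(1, 1, 1e6)     # very long panel: theta close to 1

# Variance components as printed by plm for the Grunfeld data
theta_plm <- theta_fun(2784.46, 7089.80, 20)

c(none = theta_none, large = theta_large, long = theta_long, plm = theta_plm)
```

The last value reproduces the theta of about 0.861 that plm prints, which confirms that the manual formula and plm use the same definition and differ only in how the variance components are estimated.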
In the following code, I subtract theta times the group mean from each variable (quasi-demeaning) and scale the result by the estimated standard deviation of the idiosyncratic error.
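library(dplyr) # provides %>%, left_join() and mutate()
# firm-level means; drop the grouping column and year, keep firm, inv, value, capital
firm_means <- aggregate(Grunfeld, by = list(Grunfeld$firm), FUN = mean)[, -c(1, 3)]
dat <- Grunfeld %>%
  left_join(firm_means, by = "firm") %>% # suffix .x = original value, .y = firm mean
  mutate(inv_new = (inv.x - theta*inv.y)/sqrt(square_lsdv),
         cap_new = (capital.x - theta*capital.y)/sqrt(square_lsdv),
         value_new = (value.x - theta*value.y)/sqrt(square_lsdv))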
summary(lm(inv_new ~ cap_new + value_new, data = dat))
##
## Call:
## lm(formula = inv_new ~ cap_new + value_new, data = dat)
##
## Residuals:
## Min 1Q Median 3Q Max
## -3.3966 -0.3846 0.0970 0.3681 4.8008
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) -0.16301 0.07687 -2.121 0.0352 *
## cap_new 0.30781 0.01722 17.877 <2e-16 ***
## value_new 0.10975 0.01036 10.594 <2e-16 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 1.004 on 197 degrees of freedom
## Multiple R-squared: 0.7699, Adjusted R-squared: 0.7676
## F-statistic: 329.6 on 2 and 197 DF, p-value: < 2.2e-16
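One detail worth noting: the intercept in this output is not on the original scale, because the constant is transformed as well and becomes \((1-\theta)/\sigma_\epsilon\) rather than 1. A minimal sketch, assuming the same variance estimates as above and using base R's `ave()` instead of the join (the helper name `quasi_demean` and the column `const_new` are hypothetical), that includes the transformed constant explicitly:

```r
data("Grunfeld", package = "plm")
n <- 10; time <- 20; k <- 2

# Variance estimates as in the text
lsdv   <- lm(inv ~ value + capital + factor(firm), data = Grunfeld)
pooled <- lm(inv ~ value + capital, data = Grunfeld)
s2_lsdv <- sum(residuals(lsdv)^2) / (n*time - n - k)
s2_pool <- sum(residuals(pooled)^2) / (n*time - k - 1)
theta   <- 1 - sqrt(s2_lsdv / (s2_lsdv + time * (s2_pool - s2_lsdv)))

# Subtract theta times the firm mean, then scale by sigma_eps
quasi_demean <- function(x, g) (x - theta * ave(x, g)) / sqrt(s2_lsdv)

dat <- data.frame(
  inv_new   = quasi_demean(Grunfeld$inv, Grunfeld$firm),
  value_new = quasi_demean(Grunfeld$value, Grunfeld$firm),
  cap_new   = quasi_demean(Grunfeld$capital, Grunfeld$firm),
  const_new = (1 - theta) / sqrt(s2_lsdv)   # transformed column of ones
)

# No automatic intercept: const_new plays that role on the original scale
fgls_reg <- lm(inv_new ~ 0 + const_new + value_new + cap_new, data = dat)
coef(fgls_reg)
```

The slopes are unchanged, and the coefficient on `const_new` equals the intercept printed above multiplied by \(\sigma_\epsilon/(1-\theta)\). With the rounded values printed above that is roughly -57.7, close to the intercept that plm reports for the random effects model.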
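Note that if theta is one, each variable is fully demeaned, and the estimation turns into the within-group estimation, which is equivalent to LSDV estimation, that is, the fixed effects model! The slopes below match the LSDV results above; the standard errors differ slightly because “lm” uses 197 rather than 188 residual degrees of freedom. See below.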
theta = 1
dat <- Grunfeld %>%
left_join(., aggregate(Grunfeld, by=list(Grunfeld$firm),FUN=mean)[,-c(1,3)], by = c('firm')) %>%
mutate(inv_new = (inv.x - theta*inv.y)/sqrt(square_lsdv), cap_new = (capital.x - theta*capital.y)/sqrt(square_lsdv),
value_new = (value.x - theta*value.y)/sqrt(square_lsdv))
summary(lm(inv_new ~ cap_new + value_new, data = dat))
##
## Call:
## lm(formula = inv_new ~ cap_new + value_new, data = dat)
##
## Residuals:
## Min 1Q Median 3Q Max
## -3.4871 -0.3344 0.0107 0.3637 4.7512
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 4.976e-16 6.908e-02 0.000 1
## cap_new 3.101e-01 1.695e-02 18.289 <2e-16 ***
## value_new 1.101e-01 1.158e-02 9.508 <2e-16 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 0.9769 on 197 degrees of freedom
## Multiple R-squared: 0.7668, Adjusted R-squared: 0.7644
## F-statistic: 323.8 on 2 and 197 DF, p-value: < 2.2e-16
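Below are the results from the “plm” package. Its theta (0.8612) differs slightly from the manual value (0.8510) because plm's default Swamy-Arora method estimates the variance components differently.
library(plm)
grun.fe <- plm(inv~value+capital,data=Grunfeld,model="within")
grun.re <- plm(inv~value+capital,data=Grunfeld,model="random")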
summary(grun.fe)
## Oneway (individual) effect Within Model
##
## Call:
## plm(formula = inv ~ value + capital, data = Grunfeld, model = "within")
##
## Balanced Panel: n=10, T=20, N=200
##
## Residuals :
## Min. 1st Qu. Median 3rd Qu. Max.
## -184.000 -17.600 0.563 19.200 251.000
##
## Coefficients :
## Estimate Std. Error t-value Pr(>|t|)
## value 0.110124 0.011857 9.2879 < 2.2e-16 ***
## capital 0.310065 0.017355 17.8666 < 2.2e-16 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Total Sum of Squares: 2244400
## Residual Sum of Squares: 523480
## R-Squared: 0.76676
## Adj. R-Squared: 0.72075
## F-statistic: 309.014 on 2 and 188 DF, p-value: < 2.22e-16
summary(grun.re)
## Oneway (individual) effect Random Effect Model
## (Swamy-Arora's transformation)
##
## Call:
## plm(formula = inv ~ value + capital, data = Grunfeld, model = "random")
##
## Balanced Panel: n=10, T=20, N=200
##
## Effects:
## var std.dev share
## idiosyncratic 2784.46 52.77 0.282
## individual 7089.80 84.20 0.718
## theta: 0.8612
##
## Residuals :
## Min. 1st Qu. Median 3rd Qu. Max.
## -178.00 -19.70 4.69 19.50 253.00
##
## Coefficients :
## Estimate Std. Error t-value Pr(>|t|)
## (Intercept) -57.834415 28.898935 -2.0013 0.04674 *
## value 0.109781 0.010493 10.4627 < 2e-16 ***
## capital 0.308113 0.017180 17.9339 < 2e-16 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Total Sum of Squares: 2381400
## Residual Sum of Squares: 548900
## R-Squared: 0.7695
## Adj. R-Squared: 0.75796
## F-statistic: 328.837 on 2 and 197 DF, p-value: < 2.22e-16
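With both fits available, the Hausman comparison mentioned earlier can be sketched by hand and compared with plm's `phtest()`. This is a minimal illustration under the textbook form of the statistic, not necessarily the exact computation inside plm; the object names are my own:

```r
library(plm)
data("Grunfeld", package = "plm")

grun_fe <- plm(inv ~ value + capital, data = Grunfeld, model = "within")
grun_re <- plm(inv ~ value + capital, data = Grunfeld, model = "random")

# Compare only the coefficients both models share (the slopes)
slopes <- names(coef(grun_fe))
d      <- coef(grun_fe) - coef(grun_re)[slopes]
v_diff <- vcov(grun_fe) - vcov(grun_re)[slopes, slopes]

# Hausman statistic: d' [Var(b_FE) - Var(b_RE)]^{-1} d, chi-squared with 2 df
hausman_manual <- as.numeric(t(d) %*% solve(v_diff) %*% d)
p_manual <- pchisq(hausman_manual, df = length(d), lower.tail = FALSE)

# Packaged version for comparison
hausman_plm <- phtest(grun_fe, grun_re)

c(manual = hausman_manual, p_value = p_manual)
hausman_plm
```

A large statistic (small p-value) speaks against the random effects assumption. In finite samples the variance difference need not be positive definite, so the hand-computed statistic should be read with some care.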