1. What is panel data?
- Structure in data: Individual & Time
- Prepare Data
library(plm)
mydata<- read.csv("panel_wage.csv")
attach(mydata) # use this data frame from here on, so variables can be referenced without the 'mydata$' prefix
formula = lwage ~ exp + exp2 + wks + ed
- Variable List
| Variable | Description | Role |
|---|---|---|
| lwage | log(Wage) | Dependent Variable |
| exp | Experience | Varying Regressor |
| exp2 | Experience^2 | Varying Regressor |
| wks | Weeks worked | Varying Regressor |
| ed | Education | Time-invariant Regressor |
- Set as Panel Data
pdata <- pdata.frame(mydata, index=c("id","t"))
2. How to deal with correlations among errors: Data Transformation
I. Estimators
- OLS Pooled
- Transformation: Nothing
- No. obs: NT
- Ignores the unobserved heterogeneity of individuals (possible correlation within groups)
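As a sketch of the setup behind the estimators in this section, assume the usual individual-effects model. The symbols $x_{it}$, $\beta$, $\varepsilon_{it}$ and $u_{it}$ are notation introduced here for illustration; $\alpha_i$ is the individual-specific effect referred to below.

$$
y_{it} = \alpha_i + x_{it}'\beta + \varepsilon_{it}, \qquad i = 1,\dots,N,\quad t = 1,\dots,T
$$

Pooled OLS stacks all $NT$ observations and replaces $\alpha_i$ with a common intercept $\alpha$, so the remaining heterogeneity moves into the error term:

$$
y_{it} = \alpha + x_{it}'\beta + u_{it}, \qquad u_{it} = (\alpha_i - \alpha) + \varepsilon_{it}
$$

Because $(\alpha_i - \alpha)$ is shared by every observation of individual $i$, the errors $u_{it}$ are correlated within individuals, which is the correlation the transformations below try to deal with.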
pooling <- plm(formula, data=pdata, model= "pooling")
summary(pooling)
- Between (individual)
- Transformation: Time average of all variables
- No. obs: N
- Loses information: variation over time within each individual is discarded
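The time-average transformation can be sketched by hand in base R. This is a minimal illustration on simulated data (the data frame `toy` and all of its columns are hypothetical, not `panel_wage.csv`), not the code `plm` runs internally.

```r
# Simulated balanced panel (hypothetical data, not panel_wage.csv)
set.seed(1)
N <- 50
T_periods <- 5
id <- rep(1:N, each = T_periods)
alpha_i <- rnorm(N)[id]                 # individual-specific effect
x <- rnorm(N * T_periods)
y <- 1 + 2 * x + alpha_i + rnorm(N * T_periods)
toy <- data.frame(id, x, y)

# Between transformation: one row of time averages per individual
toy_means <- aggregate(cbind(y, x) ~ id, data = toy, FUN = mean)

# OLS on the N averaged rows is the between estimator
between_by_hand <- lm(y ~ x, data = toy_means)
nrow(toy_means)        # N rows, not N * T_periods
coef(between_by_hand)
```

Only one row per individual survives the averaging, which is why the number of observations falls from NT to N.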
between <- plm(formula, data=pdata, model= "between")
summary(between)
- Within (individual) across time, Fixed Effect (FE)
- Transformation: Time-demean
- No. obs: NT
- Individual specific effect (𝜶i) cancelled
- Time-invariant variables are dropped
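A small base-R sketch of time-demeaning, assuming simulated data (all variable names below are hypothetical). It checks two of the points above: demeaning gives the same slope as a regression with one dummy per individual, and a time-invariant regressor becomes identically zero.

```r
set.seed(2)
N <- 40
T_periods <- 6
id <- rep(1:N, each = T_periods)
alpha_i <- rnorm(N, sd = 2)[id]              # individual-specific effect
x <- 0.5 * alpha_i + rnorm(N * T_periods)    # regressor correlated with alpha_i
z <- rnorm(N)[id]                            # time-invariant regressor
y <- 1.5 * x + 0.7 * z + alpha_i + rnorm(N * T_periods)

# Time-demean: subtract each individual's own mean
demean <- function(v) v - ave(v, id)
within_by_hand <- lm(demean(y) ~ demean(x) - 1)

# Dummy-variable (LSDV) regression for comparison
lsdv <- lm(y ~ x + factor(id))

coef(within_by_hand)[1]          # slope from the demeaned regression
coef(lsdv)["x"]                  # slope from the dummy-variable regression
all(abs(demean(z)) < 1e-12)      # z is wiped out by demeaning
```

The standard errors of the hand-demeaned `lm` are not adjusted for the N estimated means, so this sketch is only meant to show the point estimates.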
fixed <- plm(formula, data=pdata, model= "within")
summary(fixed)
Illustration of the difference between Between and Within
- First-Diff (FD)
- Transformation: One period difference
- No. obs: N(T-1)
- Individual specific effect (𝜶i) cancelled
- Time-invariant variables are dropped
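Written out, and assuming the individual-effects model $y_{it} = \alpha_i + x_{it}'\beta + \varepsilon_{it}$ (notation introduced here for illustration), the one-period difference is:

$$
y_{it} - y_{i,t-1} = (x_{it} - x_{i,t-1})'\beta + (\varepsilon_{it} - \varepsilon_{i,t-1}), \qquad t = 2,\dots,T
$$

The effect $\alpha_i$ appears in both periods and cancels. Each individual loses the first period, which gives $N(T-1)$ observations, and any regressor with $x_{it} = x_{i,t-1}$ in every period differences to zero and drops out.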
firstdiff <- plm(formula, data=pdata, model= "fd")
summary(firstdiff) # no separate intercept: the coefficient on exp is reported under (Intercept)
- Random Effect (RE)
- Transformation: Weighted average of between & within estimates
- No. obs: NT
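- The closer Lambda is to 0, the closer the estimates are to pooled OLS; the closer it is to 1, the closer they are to within (plm's summary reports this value as theta)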
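One common way to write this transformation for a balanced panel is as partial (quasi) demeaning. The notation is assumed here for illustration: $\bar{y}_i$ and $\bar{x}_i$ are time averages, $\alpha$ is the common intercept, $\sigma_\alpha^2$ is the variance of the individual effect and $\sigma_\varepsilon^2$ the variance of the idiosyncratic error.

$$
y_{it} - \lambda \bar{y}_i = (1-\lambda)\,\alpha + (x_{it} - \lambda \bar{x}_i)'\beta + v_{it},
\qquad
\lambda = 1 - \sqrt{\frac{\sigma_\varepsilon^2}{\sigma_\varepsilon^2 + T\sigma_\alpha^2}}
$$

With $\lambda = 0$ nothing is subtracted and the regression is pooled OLS; with $\lambda = 1$ the full individual mean is subtracted and the regression is the within estimator.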
random <- plm(formula, data=pdata, model= "random")
summary(random)
II. Comparison
| Estimator \ True model | Pooled model | RE model | FE model |
|---|---|---|---|
| Pooled OLS estimator | Consistent | Consistent | Inconsistent |
| Between estimator | Consistent | Consistent | Inconsistent |
| Within or FE estimator | Consistent | Consistent | Consistent |
| RE estimator | Consistent | Consistent | Inconsistent |
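The "FE model" column can be illustrated with a small simulation. This is a sketch under assumed values (true slope 1, individual effect correlated with the regressor; all names are hypothetical), not a result from `panel_wage.csv`.

```r
set.seed(3)
N <- 500
T_periods <- 5
id <- rep(1:N, each = T_periods)
a <- rnorm(N)                                  # individual effects
x <- 0.8 * a[id] + rnorm(N * T_periods)        # FE world: x is correlated with the effect
y <- 1.0 * x + a[id] + rnorm(N * T_periods)    # true slope is 1

# Pooled OLS ignores the individual effect
pooled_slope <- coef(lm(y ~ x))["x"]

# Within estimator: regress demeaned y on demeaned x
within_slope <- coef(lm(I(y - ave(y, id)) ~ I(x - ave(x, id)) - 1))[1]

c(pooled = unname(pooled_slope), within = unname(within_slope))
```

Under these assumptions the pooled slope converges to roughly 1 + 0.8/1.64 ≈ 1.49 rather than 1, while the within slope stays close to the true value.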
3. Choose a Model
Flowchart for Choosing a Model
- Heteroscedasticity: BP test
library(lmtest)
bptest(pooling)
##
## studentized Breusch-Pagan test
##
## data: pooling
## BP = 40.252, df = 4, p-value = 3.838e-08
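The studentized statistic reported above can be reproduced by hand as n times the R-squared of an auxiliary regression of the squared residuals on the regressors. The sketch below uses simulated data with hypothetical variable names, so the numbers will not match the output above. Its degrees of freedom equal the number of regressors, which is why the wage model with four regressors reports df = 4.

```r
set.seed(4)
n <- 400
x1 <- rnorm(n)
x2 <- rnorm(n)
y <- 1 + x1 + x2 + rnorm(n, sd = exp(0.5 * x1))   # error variance depends on x1
fit <- lm(y ~ x1 + x2)

# Auxiliary regression of squared residuals on the regressors
aux <- lm(resid(fit)^2 ~ x1 + x2)
bp_stat <- n * summary(aux)$r.squared     # studentized (Koenker) statistic
bp_df <- length(coef(aux)) - 1            # number of regressors, intercept excluded
bp_pvalue <- pchisq(bp_stat, df = bp_df, lower.tail = FALSE)

c(BP = bp_stat, df = bp_df, p.value = bp_pvalue)
```

A small p-value rejects homoscedasticity, as in the output above.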
- Other statistical tests
# LM test for random effects versus OLS
plmtest(pooling)
##
## Lagrange Multiplier Test - (Honda) for balanced panels
##
## data: formula
## normal = 72.056, p-value < 2.2e-16
## alternative hypothesis: significant effects
# F test for fixed effects versus OLS
pFtest(fixed, pooling)
##
## F test for individual effects
##
## data: formula
## F = 40.239, df1 = 593, df2 = 3567, p-value < 2.2e-16
## alternative hypothesis: significant effects
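The degrees of freedom in this F test can be checked with a little arithmetic. The panel dimensions below (N = 595 individuals, T = 7 periods) are an assumption: they are not stated in the text, but they are consistent with the reported values.

```r
# Assumed panel dimensions (not stated in the text, but consistent with the output)
N <- 595
T_periods <- 7
k_pooled <- 4   # exp, exp2, wks, ed
k_within <- 3   # ed is time-invariant and drops out of the within model

df_resid_pooled <- N * T_periods - k_pooled - 1    # slopes plus one intercept
df_resid_within <- N * T_periods - N - k_within    # N individual effects plus slopes

df1 <- df_resid_pooled - df_resid_within   # extra parameters in the within model
df2 <- df_resid_within
c(df1 = df1, df2 = df2)
```

Under these assumptions df1 is 593 rather than N - 1 = 594, because the within model adds 594 individual effects but loses the coefficient on `ed`.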
- Hausman test: FE vs. RE
- Can be calculated only for the time-varying regressors.
- Significant: use the fixed effects.
- Insignificant: use the random effects.
phtest(random, fixed)
##
## Hausman Test
##
## data: formula
## chisq = 6191.4, df = 3, p-value < 2.2e-16
## alternative hypothesis: one model is inconsistent
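In its textbook form, the statistic compares the two coefficient vectors on the regressors that both models estimate. The notation here is assumed for illustration: $\hat\beta_{FE}$ and $\hat\beta_{RE}$ are the coefficients on the time-varying regressors and $k$ is their number.

$$
H = (\hat\beta_{FE} - \hat\beta_{RE})'
\left[\widehat{\mathrm{Var}}(\hat\beta_{FE}) - \widehat{\mathrm{Var}}(\hat\beta_{RE})\right]^{-1}
(\hat\beta_{FE} - \hat\beta_{RE}) \;\sim\; \chi^2_k
$$

In the output above, df = 3 corresponds to the three time-varying regressors `exp`, `exp2` and `wks`; `ed` has no FE coefficient, so it cannot enter the comparison.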
4. Explanation
Comparing Estimators
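- All estimators indicate that more experience and more education are associated with higher wages (education only where it can be estimated, since the within and first-difference estimators drop it)
- Model by model
- [Pooled OLS] Across individuals and time, one additional year of work experience is associated with about 4% higher wages
- [Between] Individuals with one more year of work experience have average wages about 3% higher than others
- [Within] When an individual's experience is one year above that individual's own average, wages are about 11% higher
- [First differences] From one year to the next, each additional year of work experience raises wages by about 11%
- [Random] Each additional year of work experience, measured against a partially demeaned individual average, is associated with about 8% higher wages
- Because the Hausman test shows that the FE and RE coefficients differ significantly, we choose the FE model
- Rho is the share of the error variance that comes from the individual-specific term. Here that share is very high (FE: 98%, RE: 81%); the remainder is due to the idiosyncratic error
- Lambda is 82%, so the RE estimates are closer to the within estimates than to the pooled estimates
- FE removes all individual-specific effects, so its R-squared is larger
- The R-squared values show that the between estimator explains 32% of the between variation, while the FE and RE estimators explain 66% and 63% of the within variation, respectively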
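The RE values of Rho (81%) and Lambda (82%) quoted above are linked by the balanced-panel formula for Lambda. The sketch below is a rough check that assumes T = 7 periods (not stated in the text) and uses the rounded Rho, so it only reproduces Lambda to about two decimals.

```r
rho <- 0.81        # share of variance due to the individual effect (RE, rounded)
T_periods <- 7     # assumed number of periods per individual

# Normalise the total variance to 1, so that
# sigma_alpha^2 = rho and sigma_eps^2 = 1 - rho
sigma2_alpha <- rho
sigma2_eps <- 1 - rho

lambda <- 1 - sqrt(sigma2_eps / (sigma2_eps + T_periods * sigma2_alpha))
round(lambda, 2)
```

A larger Rho or a longer panel pushes Lambda toward 1, which moves the RE estimates toward the within estimates.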