L5_1

Phần 1: Các ước lượng cơ bản

Linear panel-data models

Library

library(tidyverse)

Warning: package 'tidyverse' was built under R version 4.4.1

── Attaching core tidyverse packages ──────────────────────── tidyverse 2.0.0 ──
✔ dplyr     1.1.4     ✔ readr     2.1.5
✔ forcats   1.0.0     ✔ stringr   1.5.1
✔ ggplot2   3.5.1     ✔ tibble    3.2.1
✔ lubridate 1.9.3     ✔ tidyr     1.3.1
✔ purrr     1.0.2     
── Conflicts ────────────────────────────────────────── tidyverse_conflicts() ──
✖ dplyr::filter() masks stats::filter()
✖ dplyr::lag()    masks stats::lag()
ℹ Use the conflicted package (<http://conflicted.r-lib.org/>) to force all conflicts to become errors

library(plm)


Attaching package: 'plm'

The following objects are masked from 'package:dplyr':

    between, lag, lead

library(lmtest)

Loading required package: zoo

Attaching package: 'zoo'

The following objects are masked from 'package:base':

    as.Date, as.Date.numeric

Data

Sử dụng bộ dữ liệu mus08psidextract.dta

Pdata <- haven::read_dta("C:\\Users\\Huynh Chuong\\Desktop\\University\\UEL\\Class_QuantMethods\\Giáo trình\\Data\\mus\\mus08psidextract.dta")

Mô hình OLS, POOLED OLS

Biến phụ thuộc: tiền lương (log) lwage Biến độc lập: Kinh nghiệm: exp exp2 ; Số giờ làm việc: wks ; Trình độ học vấn: ed

variable.names(Pdata)

 [1] "exp"   "wks"   "occ"   "ind"   "south" "smsa"  "ms"    "fem"   "union"
[10] "ed"    "blk"   "lwage" "id"    "t"     "tdum1" "tdum2" "tdum3" "tdum4"
[19] "tdum5" "tdum6" "tdum7" "exp2"

POOLED_OLS<-lm(lwage~exp+exp2+wks+ed,data=Pdata)

summary(POOLED_OLS)


Call:
lm(formula = lwage ~ exp + exp2 + wks + ed, data = Pdata)

Residuals:
     Min       1Q   Median       3Q      Max 
-2.16058 -0.25035  0.00027  0.26792  2.12969 

Coefficients:
              Estimate Std. Error t value Pr(>|t|)    
(Intercept)  4.908e+00  6.733e-02  72.894  < 2e-16 ***
exp          4.468e-02  2.393e-03  18.670  < 2e-16 ***
exp2        -7.156e-04  5.279e-05 -13.555  < 2e-16 ***
wks          5.827e-03  1.183e-03   4.927 8.67e-07 ***
ed           7.604e-02  2.227e-03  34.151  < 2e-16 ***
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Residual standard error: 0.3908 on 4160 degrees of freedom
Multiple R-squared:  0.2836,    Adjusted R-squared:  0.2829 
F-statistic: 411.6 on 4 and 4160 DF,  p-value: < 2.2e-16

Kiểm tra mô hình tuyến tính thông thường

Đa cộng tuyến

car::vif(POOLED_OLS)

      exp      exp2       wks        ed 
18.771894 18.768104  1.003096  1.050545

Phương sai sai số thay đổi

lmtest::bptest(lwage~exp+exp2+wks+ed,data=Pdata)


    studentized Breusch-Pagan test

data:  lwage ~ exp + exp2 + wks + ed
BP = 40.252, df = 4, p-value = 3.838e-08

Cài đặt dữ liệu bảng

Xem dữ liệu cài đặt

Pdata %>% pdata.frame(index=c("id","t")) %>% head()

    exp wks occ ind south smsa ms fem union ed blk   lwage id t tdum1 tdum2
1-1   3  32   0   0     1    0  1   0     0  9   0 5.56068  1 1     1     0
1-2   4  43   0   0     1    0  1   0     0  9   0 5.72031  1 2     0     1
1-3   5  40   0   0     1    0  1   0     0  9   0 5.99645  1 3     0     0
1-4   6  39   0   0     1    0  1   0     0  9   0 5.99645  1 4     0     0
1-5   7  42   0   1     1    0  1   0     0  9   0 6.06146  1 5     0     0
1-6   8  35   0   1     1    0  1   0     0  9   0 6.17379  1 6     0     0
    tdum3 tdum4 tdum5 tdum6 tdum7 exp2
1-1     0     0     0     0     0    9
1-2     0     0     0     0     0   16
1-3     1     0     0     0     0   25
1-4     0     1     0     0     0   36
1-5     0     0     1     0     0   49
1-6     0     0     0     1     0   64

Cài đặt và gán tên

Pdata %>% pdata.frame(index=c("id","t")) -> Pdata

Xem thông tin dữ liệu bảng

Pdata |> pdim()

Balanced Panel: n = 595, T = 7, N = 4165

Thống kê chung

Pdata %>%
  group_by(t) %>%
  summarise(mean=mean(exp(lwage)),
            min= min(exp(lwage)),
            max=max(exp(lwage)))

# A tibble: 7 × 4
  t      mean   min   max
  <fct> <dbl> <dbl> <dbl>
1 1      629.  150.  998.
2 2      682.  150.  998.
3 3      811.  100. 3900.
4 4      894.  161. 5000.
5 5      969.  195. 3300.
6 6     1047.  287. 3500.
7 7     1148.  292. 5100.

Ước lượng mô hình ảnh hưởng cố định - FEM

Gán tên cho hàm hồi quy ước lượng

reg <- "lwage~exp+exp2+wks+ed"

Ước lượng với 3 tiếp cận: individual, time, twoway.

Mô hình ước lượng có dạng

FEM1 <- plm(reg, Pdata, effect = "individual", model="within")

FEM2 <- plm(reg, Pdata, effect = "time", model="within")

FEM3 <- plm(reg, Pdata, effect = "twoway", model="within")

Xem kết quả tổng hợp và giải thích

Models <- list(FEM1, FEM2, FEM3)
modelsummary::modelsummary(Models,
                           statistic = c('{statistic} '),
                           stars = c("*"=0.1, "**"=0.05, "***"=0.01))

tinytable_a6bzpa0otkdi2khe9x4i

	(1)	(2)	(3)
* p < 0.1, p < 0.05, * p < 0.01
exp	0.114***	0.037***
	46.089	16.812
exp2	0.000***	-0.001***	0.000***
	-7.768	-12.546	-7.423
wks	0.001	0.006***	0.001
	1.394	5.312	1.135
ed		0.074***
		36.424
Num.Obs.	4165	4165	4165
R2	0.657	0.285	0.016
R2 Adj.	0.599	0.283	-0.150
AIC	-4499.3	3177.2	-4540.8
BIC	-4474.0	3208.9	-4521.8
RMSE	0.14	0.35	0.14

Ước lượng mô hình ảnh hưởng ngẫu nhiên

Ước lượng với 3 mô hình random: individual, time, twoway. Mô hình ước lượng có dạng:

REM1 <- plm(reg, Pdata, effect = "individual", model="random")

REM2 <- plm(reg, Pdata, effect = "time", model="random")

REM3 <- plm(reg, Pdata, effect = "twoway", model="random")

Models <- list(REM1, REM2, REM3)
modelsummary::modelsummary(Models,
                           statistic = c('{statistic} '),
                           stars = c("*"=0.1, "**"=0.05, "***"=0.01))

tinytable_0bq47biyoo709c5jyfnz

	(1)	(2)	(3)
* p < 0.1, p < 0.05, * p < 0.01
(Intercept)	3.829***	4.908***	4.396***
	40.897	72.894	48.922
exp	0.089***	0.045***	0.066***
	31.536	18.670	24.095
exp2	-0.001***	-0.001***	-0.001***
	-12.408	-13.555	-11.109
wks	0.001	0.006***	0.001
	1.299	4.927	1.343
ed	0.112***	0.076***	0.097***
	18.443	34.151	17.193
Num.Obs.	4165	4165	4165
R2	0.420	0.284	0.238
R2 Adj.	0.419	0.283	0.238
AIC	-1974.8	4000.7	-2610.1
BIC	-1936.8	4038.7	-2572.0
RMSE	0.19	0.39	0.18

Kết quả tổng hợp POOLED, FEM, REM

Models <- list("POOLED" =POOLED_OLS,"FEMid"=FEM1, "REMid"=REM1,
               "FEMt"=FEM2,"REMt"=REM2,"FEMtwoway"=FEM3,
               "REMtwoway"=REM3)
modelsummary::modelsummary(Models,
                           statistic = c('{statistic} '),
                           stars = c("*"=0.1, "**"=0.05, "***"=0.01))

tinytable_z2les79g7z5g6w5vfm5e

	POOLED	FEMid	REMid	FEMt	REMt	FEMtwoway	REMtwoway
* p < 0.1, p < 0.05, * p < 0.01
(Intercept)	4.908***		3.829***		4.908***		4.396***
	72.894		40.897		72.894		48.922
exp	0.045***	0.114***	0.089***	0.037***	0.045***		0.066***
	18.670	46.089	31.536	16.812	18.670		24.095
exp2	-0.001***	0.000***	-0.001***	-0.001***	-0.001***	0.000***	-0.001***
	-13.555	-7.768	-12.408	-12.546	-13.555	-7.423	-11.109
wks	0.006***	0.001	0.001	0.006***	0.006***	0.001	0.001
	4.927	1.394	1.299	5.312	4.927	1.135	1.343
ed	0.076***		0.112***	0.074***	0.076***		0.097***
	34.151		18.443	36.424	34.151		17.193
Num.Obs.	4165	4165	4165	4165	4165	4165	4165
R2	0.284	0.657	0.420	0.285	0.284	0.016	0.238
R2 Adj.	0.283	0.599	0.419	0.283	0.283	-0.150	0.238
AIC	4000.7	-4499.3	-1974.8	3177.2	4000.7	-4540.8	-2610.1
BIC	4038.7	-4474.0	-1936.8	3208.9	4038.7	-4521.8	-2572.0
Log.Lik.	-1994.372
F	411.623
RMSE	0.39	0.14	0.19	0.35	0.39	0.14	0.18

Phần 2: Các kiểm định cơ bản

Đa cộng tuyến

car::vif(POOLED_OLS)

      exp      exp2       wks        ed 
18.771894 18.768104  1.003096  1.050545

Kiểm định có ảnh hưởng “cá nhân”, “thời gian”

POOLED_OLS<- plm(reg,Pdata, model="pooling" )

plmtest(POOLED_OLS)


    Lagrange Multiplier Test - (Honda)

data:  reg
normal = 72.056, p-value < 2.2e-16
alternative hypothesis: significant effects

plmtest(POOLED_OLS, effect = "individual")


    Lagrange Multiplier Test - (Honda)

data:  reg
normal = 72.056, p-value < 2.2e-16
alternative hypothesis: significant effects

plmtest(POOLED_OLS, effect = "time")


    Lagrange Multiplier Test - time effects (Honda)

data:  reg
normal = 189.44, p-value < 2.2e-16
alternative hypothesis: significant effects

plmtest(POOLED_OLS, effect = "twoway")


    Lagrange Multiplier Test - two-ways effects (Honda)

data:  reg
normal = 184.9, p-value < 2.2e-16
alternative hypothesis: significant effects

pwtest(POOLED_OLS,effect ="individual")


    Wooldridge's test for unobserved individual effects

data:  formula
z = 13.713, p-value < 2.2e-16
alternative hypothesis: unobserved effect

pwtest(POOLED_OLS,effect ="time")


    Wooldridge's test for unobserved time effects

data:  formula
z = 2.0142, p-value = 0.04399
alternative hypothesis: unobserved effect

pwtest(POOLED_OLS,effect = c("individual", "time"))


    Wooldridge's test for unobserved individual effects

data:  formula
z = 13.713, p-value < 2.2e-16
alternative hypothesis: unobserved effect

Kiểm định lựa chọn FEM, REM

Nguyên tắc kiểm định lựa chọn mô hình FEM và REM, lưu ý rằng, mô hình FEM là mô hình đặt ở trước. Nguyên nhân là: Nếu mô hình FEM mà phù hợp thì REM ước lượng không còn nhất quán nữa.

phtest(FEM3, REM3)


    Hausman Test

data:  reg
chisq = 150.46, df = 2, p-value < 2.2e-16
alternative hypothesis: one model is inconsistent

Kiểm định Tương quan chuỗi

pbltest(REM3)


    Baltagi and Li two-sided LM test

data:  formula(x$formula)
chisq = 410.1, df = 1, p-value < 2.2e-16
alternative hypothesis: AR(1)/MA(1) errors in RE panel model

# pbltest(FEM3) Không sử dụng được trong FEM

bgtest(FEM3)


    Breusch-Godfrey test for serial correlation of order up to 1

data:  FEM3
LM test = 1860.5, df = 1, p-value < 2.2e-16

Heteroskedasticity

lmtest::bptest(POOLED_OLS)


    studentized Breusch-Pagan test

data:  POOLED_OLS
BP = 40.252, df = 4, p-value = 3.838e-08

lmtest::bptest(FEM3)


    studentized Breusch-Pagan test

data:  FEM3
BP = 40.252, df = 4, p-value = 3.838e-08

lmtest::bptest(REM3)


    studentized Breusch-Pagan test

data:  REM3
BP = 40.252, df = 4, p-value = 3.838e-08

Tests for Cross-sectional Dependence

pcdtest(FEM3)


    Pesaran CD test for cross-sectional dependence in panels

data:  lwage ~ exp + exp2 + wks + ed
z = 18.476, p-value < 2.2e-16
alternative hypothesis: cross-sectional dependence

pcdtest(REM3)


    Pesaran CD test for cross-sectional dependence in panels

data:  lwage ~ exp + exp2 + wks + ed
z = 351.52, p-value < 2.2e-16
alternative hypothesis: cross-sectional dependence

Khắc phục hay phát triển mô hình.

Ước lượng tổng quát GLS

* Mô hình POOL và REM sử dụng chung 1 kiểu ước lượng GLS. * Điểm hạn chế của ước lượng này là chỉ chọn được tích hợp ảnh hưởng của “individual” hoặc “time”. * Muốn phát triển thêm thì cần mở rộng các hướng ở các mô hình khác như: dữ liệu bảng theo thời gian (Dùng trong long-panel), hoặc các mô hình khác có liên quan. Nếu không chúng ta chỉ cần hiệu chỉnh sai số của mô hình được chọn.

Pooled_GLSid <- pggls(reg,Pdata, model="pooling", effect= "individual")
# Pooled_GLSt <- pggls(reg,Pdata, model="pooling", effect="time") : ưu tiên sử dụng individual hơn time. Nếu dùng time thường dữ liệu thuộc dạng Long-panel, khi đó nên chuyển sang các dạng mô hình long-panel.

Xử lý khi mô hình REM có hiện tượng tự tương quan

library(nlme)


Attaching package: 'nlme'

The following object is masked from 'package:dplyr':

    collapse

# AR(1) errors within each id
REM3_robust <-  gls(lwage ~ exp + exp2 + wks + ed,Pdata,correlation = corAR1(form =~1|id))
summary(REM3_robust)

Generalized least squares fit by REML
  Model: lwage ~ exp + exp2 + wks + ed 
  Data: Pdata 
        AIC       BIC   logLik
  -1035.995 -991.6621 524.9975

Correlation Structure: AR(1)
 Formula: ~1 | id 
 Parameter estimate(s):
      Phi 
0.9005671 

Coefficients:
                Value  Std.Error  t-value p-value
(Intercept)  4.676658 0.09315787 50.20143  0.0000
exp          0.067315 0.00438772 15.34157  0.0000
exp2        -0.000900 0.00009680 -9.29954  0.0000
wks          0.000073 0.00060226  0.12077  0.9039
ed           0.086918 0.00566914 15.33177  0.0000

 Correlation: 
     (Intr) exp    exp2   wks   
exp  -0.441                     
exp2  0.326 -0.954              
wks  -0.291 -0.029  0.031       
ed   -0.829  0.036  0.024  0.001

Standardized residuals:
       Min         Q1        Med         Q3        Max 
-5.5168482 -0.6209991  0.0109206  0.7137432  5.5267909 

Residual standard error: 0.4329754 
Degrees of freedom: 4165 total; 4160 residual

Đối với mô hình FEM:

FEM_GLS_id <- pggls(reg,Pdata, model="within", effect= "individual") 
FEM_GLS_t <- pggls(reg,Pdata, model="within", effect=  "time")

Nếu mô hình FEM có hiện tượng/ nghi ngờ có hiện tượng tương quan chuỗi thì có thể ước lượng mô hình GLS_FD (first difference - sai phân bậc 1)

FEM_GLS_fd <- pggls(reg,Pdata, model="fd" )

Hiệu chỉnh sai số

Cách tiếp cận thứ 2: Hiệu chỉnh sai số: Nghĩa là vẫn giữ ước lượng của mô hình cũ, chỉ hiệu chỉnh sai số do các hiện tượng trong mô hình.

FEM

model <- FEM3
# Original coefficients
coeftest(model)


t test of coefficients:

        Estimate  Std. Error t value  Pr(>|t|)    
exp2 -4.0505e-04  5.4568e-05 -7.4230 1.425e-13 ***
wks   6.7996e-04  5.9893e-04  1.1353    0.2563    
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

# Heteroskedasticity consistent coefficients
coeftest(model, vcovHC)


t test of coefficients:

        Estimate  Std. Error t value Pr(>|t|)    
exp2 -4.0505e-04  8.3219e-05 -4.8673 1.18e-06 ***
wks   6.7996e-04  8.7958e-04  0.7730   0.4395    
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

# Heteroskedasticity consistent coefficients (Arellano)
coeftest(model, vcovHC(model, method = "arellano"))


t test of coefficients:

        Estimate  Std. Error t value Pr(>|t|)    
exp2 -4.0505e-04  8.3219e-05 -4.8673 1.18e-06 ***
wks   6.7996e-04  8.7958e-04  0.7730   0.4395    
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

# Heteroskedasticity consistent coefficients, type 3
coeftest(model, vcovHC(model, type = "HC3"))


t test of coefficients:

        Estimate  Std. Error t value  Pr(>|t|)    
exp2 -4.0505e-04  8.3296e-05 -4.8628 1.207e-06 ***
wks   6.7996e-04  8.8458e-04  0.7687    0.4421    
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

# The following shows the HC standard errors of the coefficients
t(sapply(c("HC0", "HC1", "HC2", "HC3", "HC4"), function(x) sqrt(diag(vcovHC(model, type = x)))))

            exp2          wks
HC0 8.321890e-05 0.0008795785
HC1 8.323888e-05 0.0008797897
HC2 8.325748e-05 0.0008820718
HC3 8.329611e-05 0.0008845782
HC4 8.331737e-05 0.0008894645

REM

model <- REM3
# Original coefficients
coeftest(model)


t test of coefficients:

               Estimate  Std. Error  t value Pr(>|t|)    
(Intercept)  4.3958e+00  8.9852e-02  48.9224   <2e-16 ***
exp          6.6382e-02  2.7550e-03  24.0952   <2e-16 ***
exp2        -6.4421e-04  5.7992e-05 -11.1086   <2e-16 ***
wks          9.2658e-04  6.8987e-04   1.3431   0.1793    
ed           9.7363e-02  5.6630e-03  17.1928   <2e-16 ***
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

# Heteroskedasticity consistent coefficients
coeftest(model, vcovHC)


t test of coefficients:

               Estimate  Std. Error t value  Pr(>|t|)    
(Intercept)  4.3958e+00  1.1568e-01 37.9990 < 2.2e-16 ***
exp          6.6382e-02  4.1368e-03 16.0467 < 2.2e-16 ***
exp2        -6.4421e-04  8.9574e-05 -7.1919 7.543e-13 ***
wks          9.2658e-04  9.1953e-04  1.0077    0.3137    
ed           9.7363e-02  6.6234e-03 14.6999 < 2.2e-16 ***
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

# Heteroskedasticity consistent coefficients (Arellano)
coeftest(model, vcovHC(model, method = "arellano"))


t test of coefficients:

               Estimate  Std. Error t value  Pr(>|t|)    
(Intercept)  4.3958e+00  1.1568e-01 37.9990 < 2.2e-16 ***
exp          6.6382e-02  4.1368e-03 16.0467 < 2.2e-16 ***
exp2        -6.4421e-04  8.9574e-05 -7.1919 7.543e-13 ***
wks          9.2658e-04  9.1953e-04  1.0077    0.3137    
ed           9.7363e-02  6.6234e-03 14.6999 < 2.2e-16 ***
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

# Heteroskedasticity consistent coefficients, type 3
coeftest(model, vcovHC(model, type = "HC3"))


t test of coefficients:

               Estimate  Std. Error t value  Pr(>|t|)    
(Intercept)  4.3958e+00  1.1600e-01 37.8956 < 2.2e-16 ***
exp          6.6382e-02  4.1447e-03 16.0159 < 2.2e-16 ***
exp2        -6.4421e-04  8.9764e-05 -7.1767 8.418e-13 ***
wks          9.2658e-04  9.2551e-04  1.0012    0.3168    
ed           9.7363e-02  6.6339e-03 14.6765 < 2.2e-16 ***
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

# The following shows the HC standard errors of the coefficients
t(sapply(c("HC0", "HC1", "HC2", "HC3", "HC4"), function(x) sqrt(diag(vcovHC(model, type = x)))))

    (Intercept)         exp         exp2          wks          ed
HC0   0.1156810 0.004136785 8.957427e-05 0.0009195295 0.006623354
HC1   0.1157505 0.004139271 8.962808e-05 0.0009200819 0.006627333
HC2   0.1158385 0.004140755 8.966895e-05 0.0009225119 0.006628625
HC3   0.1159968 0.004144733 8.976388e-05 0.0009255131 0.006633906
HC4   0.1161463 0.004145660 8.980416e-05 0.0009309792 0.006633085