The dataset traffic.csv consists of
1982–1988 state-level data for 48 U.S. states on traffic fatality rate
(deaths per 100,000). We model the highway fatality rates as a function
of several common factors:
beertax — the tax on a case of beer;
spircons — a measure of spirits consumption;
unrate — the state unemployment rate;
perincK — state per capita income, in thousands.
# init
# setwd("C:\\Users\\Rafal\\WNE\\Advanced_Econometrics\\AE_Lab_02")
Sys.setenv(LANG = "en")
options(scipen=999)
library("MASS")
library("sandwich")
library("zoo")
library("car")
library("lmtest")
library("Formula")
library("plm")
library("stargazer")
# ------------------------------------------------------------------------------
# Exercise 1
# Fixed effects model
# ------------------------------------------------------------------------------
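traffic = read.csv(file="traffic.csv", sep=",", header=TRUE)
# plm is the panel-data package. The call below fits a fixed effects model.
# fatal: dependent variable, the traffic fatality rate.
# beertax, spircons, unrate, perincK: regressors (beer tax, spirits consumption,
# unemployment rate, per capita income).
# data=traffic: use the traffic data set.
# index=c("state", "year"): the two panel dimensions, unit (state) and time (year).
# model="within": the key argument. It selects the within estimator, i.e. the fixed effects model,
# which removes each state's time-invariant characteristics (geography, culture, etc.).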
fixed <-plm(fatal~beertax+spircons+unrate+perincK, data=traffic,
index=c("state", "year"), model="within")
summary(fixed)
## Oneway (individual) effect Within Model
##
## Call:
## plm(formula = fatal ~ beertax + spircons + unrate + perincK,
## data = traffic, model = "within", index = c("state", "year"))
##
## Balanced Panel: n = 48, T = 7, N = 336
##
## Residuals:
## Min. 1st Qu. Median 3rd Qu. Max.
## -0.44378893 -0.07922879 0.00078842 0.06761301 0.56861722
##
## Coefficients:
## Estimate Std. Error t-value Pr(>|t|)
## beertax -0.4840728 0.1625106 -2.9787 0.003145 **
## spircons 0.8169651 0.0792118 10.3137 < 0.00000000000000022 ***
## unrate -0.0290499 0.0090274 -3.2180 0.001441 **
## perincK 0.1047103 0.0205986 5.0834 0.0000006738 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Total Sum of Squares: 10.785
## Residual Sum of Squares: 6.9816
## R-Squared: 0.35265
## Adj. R-Squared: 0.2364
## F-statistic: 38.6774 on 4 and 284 DF, p-value: < 0.000000000000000222
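| Coefficient | Estimate | Significant? | Quantitative meaning | Intuition |
|---|---|---|---|---|
| beertax | -0.4841 | Yes (p < 0.01) | A one-unit increase in the beer tax lowers the fatality rate by about 0.484 | Higher beer tax → less alcohol consumption → less drunk driving → fewer deaths |
| spircons | 0.8170 | Yes (p < 0.001) | A one-unit increase in spirits consumption raises the fatality rate by about 0.817 | More alcohol consumption → more drunk driving → more accidents and deaths |
| unrate | -0.0290 | Yes (p < 0.01) | A one-percentage-point increase in unemployment lowers the fatality rate by about 0.029 | Weak economy → less travel → fewer accidents |
| perincK | 0.1047 | Yes (p < 0.001) | A $1,000 increase in per capita income raises the fatality rate by about 0.105 | Higher income → more travel and car ownership → more exposure → more deaths |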
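Notes:
In the code, index=c("state", "year") tells plm which variable identifies the unit and which identifies time. With model="within", the default is a one-way (individual) effect.
This means the model assumes that each state has its own time-invariant background characteristic (\(u_i\)).
Model information: balanced panel with n = 48 states, T = 7 years, N = 336 observations.
Coefficients: see the table above.
Goodness of fit: R-Squared (0.35265): the model explains about 35.3% of the within variation in the fatality rate (the variation over time inside each state).
The analysis addresses a policy question: "If a state raises its beer tax, does its traffic fatality rate fall?"
Controlling for state characteristics: the model uses model="within", so it removes pre-existing differences between states (for example, Texas is large with varied road conditions, while New York has extensive public transit). It only compares the same state across years and asks whether the fatality rate moves when the tax moves.
Effect of the beer tax:
The beer tax coefficient is negative and significant at the 1% level (**). Within a state, years with a higher beer tax have a lower fatality rate.
This is consistent with the tax discouraging drunk driving and thereby saving lives.
Other findings: spirits consumption and per capita income enter positively and the unemployment rate negatively; all three are significant at the 1% level.
Summary: using 7 years of data for 48 U.S. states, the estimates give statistical support for raising alcohol taxes as a traffic-safety measure.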
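The within estimator described above can be illustrated without plm. The following is a minimal sketch on simulated data (all variable names and numbers are hypothetical, not taken from traffic.csv): subtracting each unit's own mean from y and x and running OLS gives the same slope as a regression with one dummy per unit.

```r
# Simulated panel; names and parameter values are illustrative assumptions.
set.seed(1)
n_units <- 6
n_periods <- 5
id <- rep(seq_len(n_units), each = n_periods)
u  <- rnorm(n_units)[id]                        # time-invariant unit effect
x  <- 0.8 * u + rnorm(n_units * n_periods)      # regressor correlated with the effect
y  <- 2 * x + u + rnorm(n_units * n_periods, sd = 0.1)

# Within transformation: subtract each unit's own mean from y and x.
y_dm <- y - ave(y, id)
x_dm <- x - ave(x, id)
beta_within <- unname(coef(lm(y_dm ~ x_dm - 1)))

# Equivalent least-squares dummy-variable (LSDV) regression.
beta_lsdv <- unname(coef(lm(y ~ x + factor(id)))["x"])

print(c(within = beta_within, lsdv = beta_lsdv))
```

The two slopes coincide, which is why the unit dummies never need to be estimated explicitly.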
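In plm, the effect argument selects which dimension is controlled for. The default is "individual".
| Option (effect) | Statistical name | Plain-language reading |
|---|---|---|
| "individual" (default) | One-way individual effects | Controls for state characteristics only: removes each state's fixed background (geography, culture). |
| "time" | One-way time effects | Controls for year characteristics only: removes shocks common to all states in a given year (a nationwide snowstorm, a nationwide recession). |
| "twoways" | Two-way fixed effects | Controls for both: removes fixed differences between states and year shocks shared by all states. This specification is widely used in applied work. |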
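Written as equations, the three options correspond to the following specifications. This is a sketch in the notation used in these notes (\(u_i\) for the state effect); the symbol \(\lambda_t\) for the year effect is an assumption introduced here.

\[
\begin{aligned}
\text{individual:} \quad & y_{it} = x_{it}\beta + u_i + \varepsilon_{it} \\
\text{time:} \quad & y_{it} = x_{it}\beta + \lambda_t + \varepsilon_{it} \\
\text{twoways:} \quad & y_{it} = x_{it}\beta + u_i + \lambda_t + \varepsilon_{it}
\end{aligned}
\]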
Code example:
# To move to a two-way fixed effects model:
fixed_2way <- plm(fatal ~ beertax + spircons, data = traffic,
index = c("state", "year"),
model = "within",
effect = "twoways") # add this argument
summary(fixed_2way)
## Twoways effects Within Model
##
## Call:
## plm(formula = fatal ~ beertax + spircons, data = traffic, effect = "twoways",
## model = "within", index = c("state", "year"))
##
## Balanced Panel: n = 48, T = 7, N = 336
##
## Residuals:
## Min. 1st Qu. Median 3rd Qu. Max.
## -0.4900568 -0.0763304 0.0048405 0.0835691 0.7479948
##
## Coefficients:
## Estimate Std. Error t-value Pr(>|t|)
## beertax -0.49242 0.17820 -2.7632 0.006103 **
## spircons 1.02729 0.12438 8.2593 0.000000000000005877 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Total Sum of Squares: 10.29
## Residual Sum of Squares: 7.9761
## R-Squared: 0.2249
## Adj. R-Squared: 0.07265
## F-statistic: 40.6222 on 2 and 280 DF, p-value: 0.00000000000000032358
## AL AR AZ CA CO CT DE FL
## 1.269227 0.926946 0.123137 -1.209680 -1.089790 -2.248239 -1.352593 -0.214829
## GA IA ID IL IN KS KY LA
## 0.849896 -0.242856 0.793980 -1.393393 -0.282001 -0.156068 0.260491 0.278784
## MA MD ME MI MN MO MS MT
## -2.272930 -1.702579 -0.459730 -0.769796 -1.457475 -0.203825 1.431560 0.668828
## NC ND NE NH NJ NM NV NY
## 0.630467 -0.854888 -0.543657 -2.957510 -2.155776 1.722967 -2.164578 -1.877086
## OH OK OR PA RI SC SD TN
## -0.420526 0.584941 -0.118840 -0.651213 -1.828010 1.244031 -0.076547 0.488462
## TX UT VA VT WA WI WV WY
## 0.152122 0.502898 -0.622678 -0.575079 -0.906985 -1.147033 1.055006 0.550873
## NM MS AL SC WV AR
## 1.72296667 1.43156033 1.26922654 1.24403149 1.05500578 0.92694626
## GA ID MT NC OK WY
## 0.84989551 0.79398048 0.66882836 0.63046722 0.58494127 0.55087312
## UT TN LA KY TX AZ
## 0.50289787 0.48846245 0.27878366 0.26049062 0.15212238 0.12313743
## SD OR KS MO FL IA
## -0.07654684 -0.11883962 -0.15606808 -0.20382454 -0.21482860 -0.24285625
## IN OH ME NE VT VA
## -0.28200110 -0.42052601 -0.45973049 -0.54365750 -0.57507925 -0.62267775
## PA MI ND WA CO WI
## -0.65121302 -0.76979577 -0.85488808 -0.90698507 -1.08978969 -1.14703336
## CA DE IL MN MD RI
## -1.20967986 -1.35259271 -1.39339310 -1.45747549 -1.70257930 -1.82801024
## NY NJ NV CT MA NH
## -1.87708572 -2.15577594 -2.16457801 -2.24823877 -2.27292980 -2.95750969
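In one sentence: fixef extracts each unit's (here, each state's) own time-invariant baseline level.
In panel-data analysis, after fitting a fixed effects regression (model="within"), fixef(fixed) retrieves the individual fixed effects that the estimation absorbed.
What fixef(fixed) does: the earlier summary(fixed) output shows the effects of the regressors (beer tax, unemployment rate, and so on) on the fatality rate. To remove fixed differences between states, the estimation subtracts each state's own baseline fatality level.
fixef(fixed) recovers these state-specific baselines.
In the equation \(y_{it} = x_{it}\beta + u_i + \varepsilon_{it}\):
summary(fixed) reports \(\beta\) (the effects of the regressors).
fixef(fixed) reports \(u_i\) (each unit's fixed intercept). These \(u_i\) are each state's own fatality level after the effects of the regressors are removed.
A high fixef value means that, for reasons the model does not capture (for example rugged terrain or a drinking-and-driving culture), the state has more traffic deaths than other states at the same values of the regressors.
A low fixef value means the state is safer at baseline.
An analogy is a weight-loss contest:
summary tells you how much weight an extra hour of exercise takes off;
fixef tells you that, exercise aside, one contestant gains weight easily while another naturally stays lean.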
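The relationship between \(\beta\) and \(u_i\) can be sketched in base R. The example below uses simulated data (the unit effects in u_true and all other values are hypothetical): once the within slope is known, each unit's intercept is its mean outcome minus the part of that mean explained by the regressor.

```r
# Simulated panel with four units; all values are illustrative assumptions.
set.seed(2)
id <- rep(1:4, each = 6)
u_true <- c(-1, 0, 1, 2)
x <- rnorm(24) + u_true[id]
y <- 1.5 * x + u_true[id] + rnorm(24, sd = 0.05)

# Step 1: slope from the within (demeaned) regression.
beta <- unname(coef(lm(I(y - ave(y, id)) ~ I(x - ave(x, id)) - 1)))

# Step 2: each unit's intercept is its mean outcome minus the part explained by x.
u_hat <- tapply(y, id, mean) - beta * tapply(x, id, mean)

print(round(u_hat, 2))
print(sort(u_hat, decreasing = TRUE))  # rank units from highest to lowest baseline
```

Sorting the estimated intercepts, as in the last line, is one simple way to compare baselines across units.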
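Running this command typically returns 48 values (one per state). Researchers usually sort or compare the fixef values across states.
ols <- lm(fatal ~ beertax + spircons + unrate + perincK, data = traffic)
summary(ols)
##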
## Call:
## lm(formula = fatal ~ beertax + spircons + unrate + perincK, data = traffic)
##
## Residuals:
## Min 1Q Median 3Q Max
## -1.22581 -0.35100 -0.05238 0.27829 1.94364
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 4.11867 0.29700 13.868 < 0.0000000000000002 ***
## beertax 0.09720 0.06155 1.579 0.115256
## spircons 0.16235 0.04325 3.754 0.000206 ***
## unrate -0.02910 0.01272 -2.289 0.022731 *
## perincK -0.15843 0.01699 -9.327 < 0.0000000000000002 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 0.4793 on 331 degrees of freedom
## Multiple R-squared: 0.3019, Adjusted R-squared: 0.2934
## F-statistic: 35.78 on 4 and 331 DF, p-value: < 0.00000000000000022
Compare pooled OLS with fixed effects (FE). The key point is that pooled OLS ignores the state fixed effects (\(u_i\)): it assumes that all states are homogeneous, with no unobserved differences. That rarely holds in practice (culture, roads, safety regulation, and so on).
| Variable | OLS | FE | Conclusion |
|---|---|---|---|
| beertax | positive, not significant | negative, significant | OLS likely biased |
| spircons | positive | positive | robust |
| unrate | negative | negative | robust |
| perincK | negative | positive | sign reversal, strong bias |
The pooled OLS estimator ignores unobserved state-specific effects. If these effects are correlated with the regressors, the OLS estimator is biased and inconsistent.
The fixed effects estimator controls for time-invariant heterogeneity across states. The substantial differences between the OLS and FE estimates, including sign changes for beertax and perincK, suggest that such heterogeneity is present.
Therefore, the fixed effects model is more appropriate in this case.
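The bias argument can be checked on simulated data. This is a sketch under stated assumptions (true slope of 1, unit effect entering x with coefficient 1; none of these values come from traffic.csv): pooled OLS picks up the unit effect through x, while the within estimator does not.

```r
# Simulated panel in which the regressor is correlated with the unit effect.
set.seed(3)
n_units <- 200
n_periods <- 5
id <- rep(seq_len(n_units), each = n_periods)
u  <- rnorm(n_units)[id]
x  <- u + rnorm(n_units * n_periods)      # x is correlated with u by construction
y  <- 1 * x + u + rnorm(n_units * n_periods)

# Pooled OLS ignores u, so part of its effect is attributed to x.
beta_pooled <- unname(coef(lm(y ~ x))["x"])

# The within estimator removes u before estimating the slope.
beta_fe <- unname(coef(lm(I(y - ave(y, id)) ~ I(x - ave(x, id)) - 1)))

print(c(true = 1, pooled = beta_pooled, fe = beta_fe))
```

In this design the pooled slope converges to 1.5 rather than 1, because the covariance between x and u is half the variance of x.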
# Pooled OLS == POLS
# Simple regression model used for panel data set.
# Biased and inconsistent estimates!
# Autocorrelation and heteroskedasticity problems
# tests for poolability
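# pFtest chooses between the fixed effects model and the pooled OLS model.
# fixed: the fixed effects model estimated earlier (allows for state-specific differences).
# ols: the ordinary regression that treats all states as identical (ignores those differences).
# The logic of pFtest: test whether all individual fixed effects (u_i) are equal to 0.
# If they are not, each state has its own characteristics and the data cannot simply be pooled.
pFtest(fixed, ols)
##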
## F test for individual effects
##
## data: fatal ~ beertax + spircons + unrate + perincK
## F = 59.768, df1 = 47, df2 = 284, p-value < 0.00000000000000022
## alternative hypothesis: significant effects
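The F statistic can be approximately reproduced by hand from the two outputs above. This is a sketch of the standard restricted-versus-unrestricted F test; the pooled residual sum of squares is backed out from the rounded residual standard error, so small rounding differences are expected.

```r
# Figures copied from the summary(fixed) and lm outputs above.
rss_fe     <- 6.9816          # residual sum of squares, within model
df_fe      <- 284             # residual degrees of freedom, within model
rse_pooled <- 0.4793          # residual standard error, pooled OLS
df_pooled  <- 331             # residual degrees of freedom, pooled OLS

rss_pooled <- rse_pooled^2 * df_pooled
df1 <- df_pooled - df_fe      # number of restrictions: 48 state intercepts minus 1

f_stat  <- ((rss_pooled - rss_fe) / df1) / (rss_fe / df_fe)
p_value <- pf(f_stat, df1, df_fe, lower.tail = FALSE)
print(c(F = f_stat, df1 = df1, df2 = df_fe))
```

The result should land close to the reported F = 59.768 with df1 = 47 and df2 = 284.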
pbgtest (Breusch-Godfrey/Wooldridge test) checks a panel-data model for serial correlation (autocorrelation), that is, whether the residuals are correlated over time within a unit.
##
## Breusch-Godfrey/Wooldridge test for serial correlation in panel models
##
## data: fatal ~ beertax + spircons + unrate + perincK
## chisq = 25.11, df = 7, p-value = 0.0007256
## alternative hypothesis: serial correlation in idiosyncratic errors
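The test asks whether a unit's error this year depends on its error last year.
Null hypothesis (\(H_0\)): no serial correlation; the idiosyncratic errors are independent over time.
Alternative hypothesis (\(H_1\)): serial correlation; the errors are persistent or otherwise related over time.
Reading the result: the statistic chisq = 25.11 follows a chi-squared distribution (df = 7) under \(H_0\). The p-value (0.0007256) is far below 0.05.
Conclusion: reject \(H_0\); there is significant serial correlation in the model.
In short, pbgtest asks whether the errors have memory. If \(p < 0.05\), last year's shock carries over into this year, and the standard errors should be made robust to serial correlation; otherwise the reported significance may be overstated.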
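The idea of errors with memory can be illustrated on simulated data. The sketch below is not the Breusch-Godfrey/Wooldridge procedure itself, only a simplified check under assumed values (AR(1) coefficient of 0.6, 50 units, 7 periods): it regresses each error on its own lag within the same unit.

```r
# Simulated AR(1) errors within each unit; all values are illustrative assumptions.
set.seed(5)
n_units <- 50
n_periods <- 7
id <- rep(seq_len(n_units), each = n_periods)
e <- unlist(lapply(seq_len(n_units), function(i) {
  as.numeric(stats::filter(rnorm(n_periods), 0.6, method = "recursive"))
}))

# Lag within each unit: the first period of every unit has no predecessor.
e_lag <- ave(e, id, FUN = function(z) c(NA, head(z, -1)))

# Slope of e on its own lag; a value far from 0 signals serial correlation.
rho_hat <- unname(coef(lm(e ~ e_lag))["e_lag"])
print(rho_hat)
```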
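bptest (Breusch-Pagan test) checks the model for heteroskedasticity. The argument studentize=T selects Koenker's studentized version, which is more robust when the errors are not normally distributed.
In short, bptest asks whether the error variance is the same across observations. If \(p < 0.05\), the precision of the model varies from case to case, and robust standard errors should be used so that the p-values are not overstated. Note for panel data: since pbgtest found serial correlation and bptest finds heteroskedasticity, the final results should be reported with cluster-robust standard errors.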
# Testing for heteroskedasticity
bptest(fatal~beertax+spircons+unrate+perincK, data=traffic, studentize=T)
##
## studentized Breusch-Pagan test
##
## data: fatal ~ beertax + spircons + unrate + perincK
## BP = 30.917, df = 4, p-value = 0.000003184
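# Robust standard errors for the FE model.
# method="white1" corrects for heteroskedasticity; method="arellano" would also allow serial correlation within states.
coeftest(fixed, vcov.=vcovHC(fixed, method="white1", type="HC0", cluster="group"))
##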
## t test of coefficients:
##
## Estimate Std. Error t value Pr(>|t|)
## beertax -0.4840728 0.1513486 -3.1984 0.0015383 **
## spircons 0.8169651 0.1058757 7.7163 0.0000000000002046 ***
## unrate -0.0290499 0.0076355 -3.8046 0.0001739 ***
## perincK 0.1047103 0.0229624 4.5601 0.0000076111116621 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
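coeftest combined with vcovHC re-tests the coefficients with a robust covariance matrix. With method="white1", as in the call above, the standard errors are robust to heteroskedasticity; to allow for serial correlation within states as well, plm's vcovHC offers method="arellano". All four coefficients remain significant at the 1% level after the correction, which supports the conclusions drawn from the FE model.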
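The difference between a heteroskedasticity-only correction and a cluster-robust one lies in the middle term of the sandwich formula. The following base R sketch uses simulated data (all names and values are hypothetical) and omits small-sample corrections; it is written in the spirit of the White and Arellano estimators and is not plm's implementation.

```r
# Simulated panel with serially correlated errors inside each unit.
set.seed(4)
n_units <- 30
n_periods <- 7
id <- rep(seq_len(n_units), each = n_periods)
x  <- rnorm(n_units * n_periods)
e  <- unlist(lapply(seq_len(n_units), function(i) {
  as.numeric(stats::filter(rnorm(n_periods), 0.7, method = "recursive"))
}))
y  <- 0.5 * x + rnorm(n_units)[id] + e

# Within transformation, then OLS on the demeaned data.
xd  <- cbind(x = x - ave(x, id))
yd  <- y - ave(y, id)
b   <- solve(crossprod(xd), crossprod(xd, yd))
res <- as.numeric(yd - xd %*% b)

bread <- solve(crossprod(xd))

# Clustered middle term: sum over units of X_i' e_i e_i' X_i.
meat_cluster <- Reduce(`+`, lapply(split(seq_along(id), id), function(rows) {
  s <- crossprod(xd[rows, , drop = FALSE], res[rows])
  s %*% t(s)
}))

# White-type middle term: squared residuals only, no within-unit cross products.
meat_white <- crossprod(xd * res)

se_cluster <- sqrt(diag(bread %*% meat_cluster %*% bread))
se_white   <- sqrt(diag(bread %*% meat_white %*% bread))
print(c(cluster = se_cluster, white = se_white))
```

The clustered version keeps the cross products of residuals from the same unit, which is what makes it robust to serial correlation.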
The data set dancingwiththestars.csv
consists of judges’ ratings of professional dance competitors across 20
seasons of a popular television series. Set team and
time as indices for the panel data model.
| variable | description |
|---|---|
| serial | observation number |
| season | 1–20 – the IV |
| episode | 1–12 within each season (varies) |
| judgenum | judge number (judges 1–3 are permanent) |
| judgexp | number of episodes the judge has attended |
| dancenum | dance within each episode |
| score | the judge’s evaluation of the performance – the DV |
| finalist | whether or not the team made the top 3 that season |
| teamID | initials of the performers |
| ppepisodexp | number of episodes the professional partner has been on the show |
Sys.setenv(LANG = "en")
options(scipen=999)
library("MASS")
library("sandwich")
library("zoo")
library("car")
library("lmtest")
library("Formula")
library("plm")
library("stargazer")
data <- read.csv('./dancingwiththestars.csv')
fixed <- plm(score ~ season + judgexp + ppepisodexp, data=data,
index=c("team", "time"), model="within")
random <- plm(score ~ season + judgexp + ppepisodexp, data=data,
index=c("team", "time"), model="random")
pooling <- plm(score ~ season + judgexp + ppepisodexp, data=data,
index=c("team", "time"), model="pooling")
pFtest(fixed, pooling) # p-value < 0.05 -> use FE
##
## F test for individual effects
##
## data: score ~ season + judgexp + ppepisodexp
## F = 12.512, df1 = 256, df2 = 1310, p-value < 0.00000000000000022
## alternative hypothesis: significant effects
##
## Lagrange Multiplier Test - (Breusch-Pagan)
##
## data: score ~ season + judgexp + ppepisodexp
## chisq = 1472.7, df = 1, p-value < 0.00000000000000022
## alternative hypothesis: significant effects
##
## Hausman Test
##
## data: score ~ season + judgexp + ppepisodexp
## chisq = 40.657, df = 2, p-value = 0.000000001484
## alternative hypothesis: one model is inconsistent
First, the Breusch–Pagan LM test (plmtest) is conducted to check for panel effects. The p-value is extremely small (p < 0.001), so we reject the null hypothesis of no panel effects, indicating that panel data methods are appropriate over pooled OLS.
Second, the F-test (pFtest) is used to compare the fixed effects model with pooled OLS. The p-value is very small (p < 0.001), so we reject the null hypothesis that all individual effects are zero. This suggests that the fixed effects model is preferred over pooled OLS.
Finally, the Hausman test (phtest) is used to choose between fixed and random effects models. The p-value is very small (p < 0.001), so we reject the null hypothesis that the random effects estimator is consistent. This indicates that the random effects model is inappropriate.
Therefore, the fixed effects (FE) model is the most appropriate specification for explaining score.
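The three-step selection logic can be written down as a small helper. This is a simplified sketch: the function name, the 0.05 threshold, and the rule of treating either poolability test as sufficient are assumptions made for illustration, not part of plm.

```r
# Hypothetical helper; name, threshold, and decision rule are assumptions.
choose_panel_model <- function(p_lm, p_f, p_hausman, alpha = 0.05) {
  # Neither test finds unit effects: pooled OLS is adequate.
  if (p_lm >= alpha && p_f >= alpha) return("pooled OLS")
  # Unit effects are present: the Hausman test decides between FE and RE.
  if (p_hausman < alpha) "FE" else "RE"
}

# p-values reported above (values below machine precision entered as 2.2e-16).
choice <- choose_panel_model(p_lm = 2.2e-16, p_f = 2.2e-16, p_hausman = 0.000000001484)
print(choice)
```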
library(lmtest)
library(sandwich)
se_pool <- sqrt(diag(vcovHC(pooling, type="HC1")))
se_fe <- sqrt(diag(vcovHC(fixed, type="HC1")))
se_re <- sqrt(diag(vcovHC(random, type="HC1")))
stargazer(pooling, fixed, random,
type = "text",
se = list(se_pool, se_fe, se_re),
title = "Panel Data Models with Robust SE",
dep.var.labels = "Score",
column.labels = c("POLS", "FE", "RE"),
digits = 3)
##
## Panel Data Models with Robust SE
## ===========================================================================
## Dependent variable:
## --------------------------------------------------------------
## Score
## POLS FE RE
## (1) (2) (3)
## ---------------------------------------------------------------------------
## season -2.501*** -2.112***
## (0.162) (0.074)
##
## judgexp 0.229*** 0.198*** 0.194***
## (0.015) (0.040) (0.007)
##
## ppepisodexp 0.005*** -0.005 0.006***
## (0.002) (0.041) (0.002)
##
## Constant 11.116*** 10.162***
## (0.308) (0.202)
##
## ---------------------------------------------------------------------------
## Observations 1,570 1,570 1,570
## R2 0.329 0.388 0.446
## Adjusted R2 0.328 0.267 0.445
## F Statistic 256.127*** (df = 3; 1566) 414.956*** (df = 2; 1310) 938.489***
## ===========================================================================
## Note: *p<0.1; **p<0.05; ***p<0.01
Table X reports the estimation results for the pooled OLS, fixed effects, and random effects models. Based on the Hausman test, the fixed effects model is preferred.
In the fixed effects model, judge experience (judgexp) has a positive and statistically significant effect on scores. Holding other factors constant, a one-unit increase in judge experience raises the score by approximately 0.198 points.
In contrast, the variable ppepisodexp is not statistically significant in the fixed effects model, although it appears significant in the pooled OLS and random effects models. This suggests that the latter two models may suffer from omitted variable bias.
Overall, the results indicate that controlling for unobserved individual heterogeneity is important when modeling judges' scores.
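Notes:
season is absorbed in this FE model and needs no interpretation. The variable season is not estimated in the fixed effects model because the within transformation removes any regressor that does not vary within a team, which is most likely the case for season.
*** denotes p < 0.01 (highly significant); the value in parentheses, for example (0.040), is the standard error.
The factors affecting the investment behavior by firms were studied by Grunfeld using a panel of data.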
Investment demand is the purchase of durable goods by both households and firms. In terms of total spending, investment spending is the volatile component. Therefore, understanding what determines investment is crucial to understanding the sources of fluctuations in aggregate demand. In addition, a firm’s net fixed investment, which is the flow of additions to capital stock or replacements for worn-out capital, is important because it determines the future value of the capital stock and thus affects future labor productivity and aggregate supply.
There are several interesting and elaborate theories that seek to describe the determinants of the investment process for the firm. Most of these theories evolve to the conclusion that perceived profit opportunities (expected profits or present discounted value of future earnings) and desired capital stock are two important determinants of a firm’s fixed business investment. Unfortunately, neither of these variables is directly observable. Therefore, in formulating our economic model, we use observable proxies for these variables instead.
In terms of expected profits, one alternative is to identify the present discounted value of future earnings as the market value of the firm’s securities. The price of a firm’s stock represents and contains information about these expected profits. Consequently, the stock market value of the firm at the beginning of the year, denoted for firm i in time period t as \(V_{it}\), may be used as a proxy for expected profits.
In terms of desired capital stock, expectations play a definite role. To catch these expectations effects, one possibility is to use a model that recognizes that actual capital stock in any period is the sum of a large number of past desired capital stocks. Thus, we use the beginning of the year actual capital stock, denoted for the ith firm as \(K_{it}\), as a proxy for permanent desired capital stock.
Focusing on these explanatory variables, an economic model for describing gross firm investment for the ith firm in the tth time period, denoted \(INV_{it}\), may be expressed as:
\[INV_{it} = f(V_{it}, K_{it}) \tag{1}\]
Our concern is how we might take this general economic model and specify an econometric model that adequately represents a panel of real-world data. The data consist of \(T = 20\) years of data (1935–1954) for \(N = 10\) large firms.
var name variable label
------------------------------------------------------------
i GM=1 USS=2 GE=3 Chr=4 Rich=5 IBM=6 UnOil=7 West=8 Goodyr=9 Match=10
t year, t=1 is 1935; t=20 is 1954
inv = gross investment in plant and equipment, millions of $1947
v = value of common and preferred stock, millions of $1947
k = stock of capital, millions of $1947
------------------------------------------------------------
Questions
Sys.setenv(LANG = "en")
options(scipen=999)
library("MASS")
library("sandwich")
library("zoo")
library("car")
library("lmtest")
library("Formula")
library("plm")
library("stargazer")
data("Grunfeld", package="plm")
fixed <- plm(inv ~ value + capital, data=Grunfeld,
index=c("firm", "year"), model="within")
summary(fixed)
## Oneway (individual) effect Within Model
##
## Call:
## plm(formula = inv ~ value + capital, data = Grunfeld, model = "within",
## index = c("firm", "year"))
##
## Balanced Panel: n = 10, T = 20, N = 200
##
## Residuals:
## Min. 1st Qu. Median 3rd Qu. Max.
## -184.00857 -17.64316 0.56337 19.19222 250.70974
##
## Coefficients:
## Estimate Std. Error t-value Pr(>|t|)
## value 0.110124 0.011857 9.2879 < 0.00000000000000022 ***
## capital 0.310065 0.017355 17.8666 < 0.00000000000000022 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Total Sum of Squares: 2244400
## Residual Sum of Squares: 523480
## R-Squared: 0.76676
## Adj. R-Squared: 0.75311
## F-statistic: 309.014 on 2 and 188 DF, p-value: < 0.000000000000000222
random <- plm(inv ~ value + capital, data=Grunfeld,
index=c("firm", "year"), model="random")
pooling <- plm(inv ~ value + capital, data=Grunfeld,
index=c("firm", "year"), model="pooling")
plmtest(pooling, type="bp") # p-value < 0.05 -> panel effects are present
##
## Lagrange Multiplier Test - (Breusch-Pagan)
##
## data: inv ~ value + capital
## chisq = 798.16, df = 1, p-value < 0.00000000000000022
## alternative hypothesis: significant effects
##
## F test for individual effects
##
## data: inv ~ value + capital
## F = 49.177, df1 = 9, df2 = 188, p-value < 0.00000000000000022
## alternative hypothesis: significant effects
##
## Hausman Test
##
## data: inv ~ value + capital
## chisq = 2.3304, df = 2, p-value = 0.3119
## alternative hypothesis: one model is inconsistent
The Breusch-Pagan Lagrange Multiplier test strongly rejects the null hypothesis of no panel effects (p-value < 0.001), indicating that pooled OLS is inappropriate.
The F-test for individual effects also rejects the null hypothesis (p-value < 0.001), suggesting that the fixed effects model is preferred over pooled OLS.
Finally, the Hausman test yields a p-value of 0.3119, which is greater than 0.05. Therefore, we fail to reject the null hypothesis that the random effects estimator is consistent.
Conclusion: The random effects (RE) model is the most appropriate specification for explaining investment (inv): with the Hausman null not rejected, it is treated as consistent, and it is more efficient than the fixed effects model.
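For reference, the Hausman statistic compares the two sets of slope estimates directly. The formula below is the standard textbook form; the notation (\(\hat\beta_{FE}\), \(\hat\beta_{RE}\), \(k\)) is introduced here and does not appear in the output above.

\[
H = (\hat\beta_{FE} - \hat\beta_{RE})' \left[ \widehat{\mathrm{Var}}(\hat\beta_{FE}) - \widehat{\mathrm{Var}}(\hat\beta_{RE}) \right]^{-1} (\hat\beta_{FE} - \hat\beta_{RE}) \;\sim\; \chi^2_k \quad \text{under } H_0
\]

Here \(k = 2\) (the coefficients on value and capital), which matches df = 2 in the test output. A small \(H\), as in this case (chisq = 2.3304), means the FE and RE estimates are close, so the more efficient RE estimator can be used.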
library(lmtest)
library(sandwich)
se_pool <- sqrt(diag(vcovHC(pooling, type="HC1")))
se_fe <- sqrt(diag(vcovHC(fixed, type="HC1")))
se_re <- sqrt(diag(vcovHC(random, type="HC1")))
stargazer(pooling, fixed, random,
type = "text",
se = list(se_pool, se_fe, se_re),
title = "Panel Data Models with Robust SE",
dep.var.labels = "Investment",
column.labels = c("POLS", "FE", "RE"),
digits = 3)
##
## Panel Data Models with Robust SE
## =========================================================================
## Dependent variable:
## ------------------------------------------------------------
## Investment
## POLS FE RE
## (1) (2) (3)
## -------------------------------------------------------------------------
## value 0.116*** 0.110*** 0.110***
## (0.015) (0.014) (0.013)
##
## capital 0.231*** 0.310*** 0.308***
## (0.081) (0.050) (0.052)
##
## Constant -42.714** -57.834**
## (19.426) (23.628)
##
## -------------------------------------------------------------------------
## Observations 200 200 200
## R2 0.812 0.767 0.770
## Adjusted R2 0.811 0.753 0.767
## F Statistic 426.576*** (df = 2; 197) 309.014*** (df = 2; 188) 657.674***
## =========================================================================
## Note: *p<0.1; **p<0.05; ***p<0.01
The estimation results show that both explanatory variables, value and capital, are statistically significant at the 1% level across all three models (POLS, FE, and RE).
The coefficient on value is positive and stable across specifications (0.110 to 0.116), indicating that higher market valuation is associated with higher investment.
Similarly, capital has a positive and significant effect on investment, with a somewhat larger magnitude in the FE and RE models (0.310 and 0.308) than in the pooled model (0.231).
Although all models yield similar qualitative results, the previous model selection tests (Breusch-Pagan, F-test, and Hausman test) favor the random effects (RE) model, which is treated as consistent and is more efficient than FE.
The similarity of the coefficients across models suggests that the results are robust.
The panel is balanced. A panel is considered balanced when each cross-sectional unit is observed for the same number of time periods.
In this dataset, there are \(N=10\) firms and \(T=20\) time periods, resulting in a total of 200 observations, which equals \(N \times T\). This indicates that each firm is observed in every year from 1935 to 1954, with no missing data.
Therefore, the panel is balanced.
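The balance check can also be done programmatically. The sketch below uses a hypothetical helper, is_balanced, and a toy grid with the same dimensions as the data (10 firms, 20 years) rather than the Grunfeld data itself.

```r
# Hypothetical helper: a panel is balanced if every unit appears exactly once in every period.
is_balanced <- function(unit, period) {
  counts <- table(unit, period)
  all(counts == 1)
}

# Toy panel with the same dimensions as the data above: 10 firms, 20 years (1935-1954).
toy <- expand.grid(firm = 1:10, year = 1935:1954)
print(nrow(toy))
print(is_balanced(toy$firm, toy$year))

# Dropping a single firm-year observation makes the panel unbalanced.
print(is_balanced(toy$firm[-1], toy$year[-1]))
```

With real data the same function could be applied to the firm and year columns of the data frame.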