1 Exercise 1

The dataset traffic.csv consists of 1982–1988 state-level data for 48 U.S. states on traffic fatality rate (deaths per 100,000). We model the highway fatality rates as a function of several common factors:

  • beertax — the tax on a case of beer;

  • spircons — a measure of spirits consumption;

  • unrate — the state unemployment rate;

  • perincK — state per capita income, in thousands.

    a) Estimate a model for the fatality rate using the fixed effects estimator.
    b) Interpret the parameters of the model.
    c) Are the individual effects significant?
    d) Is there autocorrelation in the residuals?
    e) Is there heteroskedasticity in the residuals?
    f) Apply a robust variance-covariance matrix estimator.

1.1 a) Estimate model for fatality rate using fixed effects estimator.

# init
# setwd("C:\\Users\\Rafal\\WNE\\Advanced_Econometrics\\AE_Lab_02")
Sys.setenv(LANG = "en")
options(scipen=999)

library("MASS")
library("sandwich")
library("zoo")
library("car")
library("lmtest")
library("Formula")
library("plm")
library("stargazer")

# ------------------------------------------------------------------------------
# Exercise 1
# Fixed effects model
# ------------------------------------------------------------------------------

traffic = read.csv(file="traffic.csv", sep=",", header=TRUE)
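# plm is the panel-data package. The call below fits a fixed effects model.
# fatal: dependent variable, the traffic fatality rate.
# beertax, spircons, unrate, perincK: regressors (beer tax, spirits consumption,
#   unemployment rate, per capita income).
# data=traffic: use the traffic data set.
# index=c("state", "year"): the two panel dimensions, state and year.
# model="within": the key argument; it selects the within estimator, i.e. fixed effects.
# It removes each state's time-invariant characteristics (geography, culture, etc.).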
fixed <-plm(fatal~beertax+spircons+unrate+perincK, data=traffic, 
            index=c("state", "year"), model="within")
summary(fixed)
## Oneway (individual) effect Within Model
## 
## Call:
## plm(formula = fatal ~ beertax + spircons + unrate + perincK, 
##     data = traffic, model = "within", index = c("state", "year"))
## 
## Balanced Panel: n = 48, T = 7, N = 336
## 
## Residuals:
##        Min.     1st Qu.      Median     3rd Qu.        Max. 
## -0.44378893 -0.07922879  0.00078842  0.06761301  0.56861722 
## 
## Coefficients:
##            Estimate Std. Error t-value              Pr(>|t|)    
## beertax  -0.4840728  0.1625106 -2.9787              0.003145 ** 
## spircons  0.8169651  0.0792118 10.3137 < 0.00000000000000022 ***
## unrate   -0.0290499  0.0090274 -3.2180              0.001441 ** 
## perincK   0.1047103  0.0205986  5.0834          0.0000006738 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Total Sum of Squares:    10.785
## Residual Sum of Squares: 6.9816
## R-Squared:      0.35265
## Adj. R-Squared: 0.2364
## F-statistic: 38.6774 on 4 and 284 DF, p-value: < 0.000000000000000222

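1.1.1 Brief interpretation

  • beertax: estimate -0.4841, significant (p < 0.01). A one-unit increase in the beer tax lowers the traffic fatality rate by about 0.484. Intuition: a higher beer tax reduces alcohol consumption, so there is less drunk driving and fewer deaths.
  • spircons: estimate 0.8170, significant (p < 0.001). A one-unit increase in spirits consumption raises the fatality rate by about 0.817. Intuition: more alcohol consumption means more drunk driving, more accidents and more deaths.
  • unrate: estimate -0.0290, significant (p < 0.01). A one-percentage-point increase in the unemployment rate lowers the fatality rate by about 0.029. Intuition: in a weak economy people travel less, so there are fewer accidents.
  • perincK: estimate 0.1047, significant (p < 0.001). A 1,000-dollar increase in per capita income raises the fatality rate by about 0.105. Intuition: higher income means more travel and more car ownership, hence more exposure to traffic risk.

Notes:

  • All coefficients are fixed effects (within) estimates. They describe the marginal effect of changes over time within the same state, not differences between states.
  • Each effect is an absolute change in the fatality rate, measured in deaths per 100,000 people. For example, a one-unit increase in the beer tax lowers the fatality rate by about 0.484 deaths per 100,000.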

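1.1.2 What "Oneway (individual) effect Within Model" means

In the code, index=c("state", "year") tells plm which variable identifies the individual and which identifies time. With model="within", plm estimates a one-way (individual) effect by default.

The model therefore assumes that each state has its own time-invariant background characteristics (\(u_i\)).

  • Example: Montana is vast and sparsely populated, and its road conditions are nothing like California's. Such physical and cultural characteristics are fixed.
  • How it is handled: the within transformation subtracts each state's own mean from its observations. The model then asks: "When California's beer tax in a given year is above California's own average, is California's fatality rate that year below its own average?"
  • Limitation: shocks common to all states in a given year are not accounted for. In 2020, for example, the pandemic may have kept people at home nationwide and lowered fatality rates everywhere; a one-way model does not remove such a year-specific common effect.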
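
The within transformation described above can be sketched in base R. This is a minimal illustration on simulated data, not on traffic.csv: the panel size, the true slope of 1.5, and the names x, y, x_dm and y_dm are all assumptions made for the example. It checks that OLS on state-demeaned data gives the same slope as a regression with one dummy variable per state.

```r
# Simulated panel (assumption: 5 states, 7 years, true slope 1.5)
set.seed(1)
n_states <- 5
n_years  <- 7
state    <- rep(paste0("S", seq_len(n_states)), each = n_years)
state_id <- as.integer(factor(state))

u <- rnorm(n_states, sd = 2)                      # time-invariant state effects
x <- rnorm(n_states * n_years) + u[state_id]      # regressor correlated with u
y <- 1.5 * x + u[state_id] + rnorm(n_states * n_years, sd = 0.1)
panel <- data.frame(state, x, y)

# Within transformation: subtract each state's own mean
panel$x_dm <- panel$x - ave(panel$x, panel$state)
panel$y_dm <- panel$y - ave(panel$y, panel$state)

# Slope from the demeaned regression (no intercept needed after demeaning)
beta_within <- coef(lm(y_dm ~ x_dm - 1, data = panel))[["x_dm"]]

# Slope from the dummy-variable (LSDV) regression
beta_lsdv <- coef(lm(y ~ x + factor(state), data = panel))[["x"]]

print(c(within = beta_within, lsdv = beta_lsdv))
```

The two slopes coincide up to floating-point error, which is why plm can estimate the model by demeaning instead of adding 48 state dummies.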

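1.1.3 Reading the output

Basic model information:

  • Balanced Panel: \(n = 48, T = 7, N = 336\). The data cover 48 states, each observed in 7 consecutive years, for 336 observations in total.
  • Model type: Oneway (individual) effect. The model controls only for state-level fixed effects.

Coefficients:

  • beertax (-0.4841): significantly negative (\(p < 0.01\)). A higher beer tax is associated with a significantly lower traffic fatality rate. After controlling for state characteristics, a one-unit increase in the beer tax lowers the fatality rate by about 0.48 on average.
  • spircons (0.8170): highly significant and positive. Higher spirits consumption goes with a higher fatality rate, as one would expect.
  • unrate (-0.0290): significantly negative. This may reflect that people drive less when the economy is weak (high unemployment), which reduces accidents.
  • perincK (0.1047): highly significant and positive. Higher per capita income goes with a higher fatality rate, possibly because of more vehicles and more frequent travel.

Goodness of fit:

  • R-Squared (0.35265): the model explains about 35.3% of the within variation in the fatality rate, that is, the variation over time within states.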
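
As a quick check on the reported figure (a worked calculation from the rounded sums of squares printed in the summary above), the within R-squared is one minus the ratio of the residual sum of squares to the total sum of squares of the demeaned data:

\[R^2_{\text{within}} = 1 - \frac{RSS}{TSS} = 1 - \frac{6.9816}{10.785} \approx 0.353\]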


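1.1.3.1 Plain-language explanation

This analysis addresses a policy question: "If a state raises its beer tax, does that really reduce traffic deaths?"

  1. Controlling for state characteristics: the model uses model="within", so it removes differences that exist between states anyway (Texas is large with varied road conditions, while New York State has well-developed public transport). It compares only the same state across years: when the tax changes, does the fatality rate change with it?

  2. The effect of the beer tax: the beer tax coefficient is negative and carries two stars (**). In plain words, a higher beer tax is associated with fewer deaths, consistent with the tax discouraging drunk driving.

  3. Other findings:

    • more spirits consumption goes with more deaths;
    • when the economy is good (higher income), more people drive and there are more accidents;
    • when more people are out of work (higher unemployment), people may stay home and crashes fall.

Summary: based on 7 years of data for 48 U.S. states, there is statistical evidence that raising alcohol taxes can improve traffic safety.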

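In plm, the effect argument changes which dimension is controlled for. The default is "individual".

  • "individual" (default), one-way individual effects: controls only for state characteristics, removing each state's fixed background (geography, culture, and so on).
  • "time", one-way time effects: controls only for year characteristics, removing what is specific to each year (a nationwide snowstorm, a nationwide economic crisis).
  • "twoways", two-way fixed effects: controls for both, removing fixed differences between states and year effects common to all states. This is the specification most often used in applied papers.

Code example:

# To upgrade to a two-way fixed effects model (this example keeps only beertax and spircons):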
fixed_2way <- plm(fatal ~ beertax + spircons, data = traffic, 
                  index = c("state", "year"), 
                  model = "within", 
                  effect = "twoways") # add this argument
summary(fixed_2way)                  
## Twoways effects Within Model
## 
## Call:
## plm(formula = fatal ~ beertax + spircons, data = traffic, effect = "twoways", 
##     model = "within", index = c("state", "year"))
## 
## Balanced Panel: n = 48, T = 7, N = 336
## 
## Residuals:
##       Min.    1st Qu.     Median    3rd Qu.       Max. 
## -0.4900568 -0.0763304  0.0048405  0.0835691  0.7479948 
## 
## Coefficients:
##          Estimate Std. Error t-value             Pr(>|t|)    
## beertax  -0.49242    0.17820 -2.7632             0.006103 ** 
## spircons  1.02729    0.12438  8.2593 0.000000000000005877 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Total Sum of Squares:    10.29
## Residual Sum of Squares: 7.9761
## R-Squared:      0.2249
## Adj. R-Squared: 0.07265
## F-statistic: 40.6222 on 2 and 280 DF, p-value: 0.00000000000000032358

1.1.4 fixef: extracting each state's time-invariant baseline level

fixef(fixed)
##        AL        AR        AZ        CA        CO        CT        DE        FL 
##  1.269227  0.926946  0.123137 -1.209680 -1.089790 -2.248239 -1.352593 -0.214829 
##        GA        IA        ID        IL        IN        KS        KY        LA 
##  0.849896 -0.242856  0.793980 -1.393393 -0.282001 -0.156068  0.260491  0.278784 
##        MA        MD        ME        MI        MN        MO        MS        MT 
## -2.272930 -1.702579 -0.459730 -0.769796 -1.457475 -0.203825  1.431560  0.668828 
##        NC        ND        NE        NH        NJ        NM        NV        NY 
##  0.630467 -0.854888 -0.543657 -2.957510 -2.155776  1.722967 -2.164578 -1.877086 
##        OH        OK        OR        PA        RI        SC        SD        TN 
## -0.420526  0.584941 -0.118840 -0.651213 -1.828010  1.244031 -0.076547  0.488462 
##        TX        UT        VA        VT        WA        WI        WV        WY 
##  0.152122  0.502898 -0.622678 -0.575079 -0.906985 -1.147033  1.055006  0.550873
sort(fixef(fixed), decreasing = TRUE)
##          NM          MS          AL          SC          WV          AR 
##  1.72296667  1.43156033  1.26922654  1.24403149  1.05500578  0.92694626 
##          GA          ID          MT          NC          OK          WY 
##  0.84989551  0.79398048  0.66882836  0.63046722  0.58494127  0.55087312 
##          UT          TN          LA          KY          TX          AZ 
##  0.50289787  0.48846245  0.27878366  0.26049062  0.15212238  0.12313743 
##          SD          OR          KS          MO          FL          IA 
## -0.07654684 -0.11883962 -0.15606808 -0.20382454 -0.21482860 -0.24285625 
##          IN          OH          ME          NE          VT          VA 
## -0.28200110 -0.42052601 -0.45973049 -0.54365750 -0.57507925 -0.62267775 
##          PA          MI          ND          WA          CO          WI 
## -0.65121302 -0.76979577 -0.85488808 -0.90698507 -1.08978969 -1.14703336 
##          CA          DE          IL          MN          MD          RI 
## -1.20967986 -1.35259271 -1.39339310 -1.45747549 -1.70257930 -1.82801024 
##          NY          NJ          NV          CT          MA          NH 
## -1.87708572 -2.15577594 -2.16457801 -2.24823877 -2.27292980 -2.95750969

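Summary: fixef extracts the time-invariant baseline level specific to each individual (here, each state). After a fixed effects regression (model="within"), fixef(fixed) recovers the individual fixed effects that the within transformation absorbed.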


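1.1.5 1. What fixef(fixed) does

The summary(fixed) output above shows how the regressors (beer tax, unemployment rate, and so on) affect the fatality rate. To remove the states' inherent differences, however, the estimation subtracts each state's own baseline fatality level.

fixef(fixed) recovers these subtracted, state-specific baselines.


1.1.6 2. Statistical meaning

In the equation \(y_{it} = x_{it}\beta + u_i + \varepsilon_{it}\):

  • summary(fixed) reports \(\beta\) (the effects of the regressors).
  • fixef(fixed) reports \(u_i\) (the fixed intercept of each individual).

Each \(u_i\) is the state's own fatality level after the effects of all regressors have been removed.

  • A high fixef value means that, for reasons the model does not capture (rugged terrain, a prevalent drunk-driving culture, and so on), the state inherently has more traffic deaths than other states.
  • A low fixef value means that the state is inherently safer.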
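
One way to see where the two sets of estimates come from is the standard within derivation, written with the symbols of the equation above. The bars, a notation introduced here, denote each state's average over time. Averaging the equation over time and subtracting removes \(u_i\); the fixed intercepts are then recovered from the state means:

\[\bar{y}_i = \bar{x}_i\beta + u_i + \bar{\varepsilon}_i\]

\[y_{it} - \bar{y}_i = (x_{it} - \bar{x}_i)\beta + (\varepsilon_{it} - \bar{\varepsilon}_i)\]

\[\hat{u}_i = \bar{y}_i - \bar{x}_i\hat{\beta}\]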

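1.1.7 3. Plain-language reading

Think of a weight-loss contest:

  • Regressors (\(\beta\)): the effect of exercise time and dieting on weight loss.
  • Fixed effects (\(u_i\)): each contestant's natural tendency to gain weight.

summary tells you how much weight an extra hour of exercise takes off. fixef tells you that, exercise aside, one contestant gains weight easily while another naturally stays slim.


1.1.8 4. Typical uses

The command returns 48 values, one per state. Researchers typically:

  1. Rank them: which states have the highest inherent fatality risk?
  2. Check hypotheses: if a state's fixed effect is unusually large, background research may show, for example, that its traffic laws have long been lax.
  3. Compute predictions: to predict a specific state's fatality rate, add that state's fixef value to \(x_{it}\beta\).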
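
The prediction rule in the last step can be sketched with plm. Because traffic.csv is not bundled with any package, this sketch uses the Grunfeld data that ships with plm as a stand-in; the object names fe, alpha_by_row and fitted_manual are illustrative assumptions. It rebuilds the fitted values as the regressors times the slopes plus each firm's fixed effect, and compares the implied residuals with those stored in the model.

```r
library(plm)

# Grunfeld ships with plm; it stands in for traffic.csv here (assumption)
data("Grunfeld", package = "plm")
fe <- plm(inv ~ value + capital, data = Grunfeld,
          index = c("firm", "year"), model = "within")

beta  <- coef(fe)     # slope estimates
alpha <- fixef(fe)    # one intercept per firm, named by firm id

# Attach each observation's own firm intercept
alpha_by_row <- as.numeric(alpha)[match(as.character(Grunfeld$firm), names(alpha))]

# Fitted value = x_it * beta + firm-specific intercept
X <- as.matrix(Grunfeld[, names(beta)])
fitted_manual <- as.numeric(X %*% beta) + alpha_by_row
resid_manual  <- Grunfeld$inv - fitted_manual

# Largest gap between manual and stored residuals
print(max(abs(resid_manual - as.numeric(residuals(fe)))))
```

For the traffic model the same steps would apply with fixed, the four regressors, and the state column, assuming the data are sorted by state and year.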

1.1.9 OLS vs FE

ols<-lm(fatal~beertax+spircons+unrate+perincK, data=traffic)
summary(ols)
## 
## Call:
## lm(formula = fatal ~ beertax + spircons + unrate + perincK, data = traffic)
## 
## Residuals:
##      Min       1Q   Median       3Q      Max 
## -1.22581 -0.35100 -0.05238  0.27829  1.94364 
## 
## Coefficients:
##             Estimate Std. Error t value             Pr(>|t|)    
## (Intercept)  4.11867    0.29700  13.868 < 0.0000000000000002 ***
## beertax      0.09720    0.06155   1.579             0.115256    
## spircons     0.16235    0.04325   3.754             0.000206 ***
## unrate      -0.02910    0.01272  -2.289             0.022731 *  
## perincK     -0.15843    0.01699  -9.327 < 0.0000000000000002 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 0.4793 on 331 degrees of freedom
## Multiple R-squared:  0.3019, Adjusted R-squared:  0.2934 
## F-statistic: 35.78 on 4 and 331 DF,  p-value: < 0.00000000000000022

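Comparing pooled OLS with fixed effects (FE): the key point is that OLS ignores the state fixed effects (\(u_i\)). It assumes that all states are homogeneous, with no unobserved differences, which rarely holds in practice (culture, roads, safety regulations, and so on).

Variable   Pooled OLS                       FE                               Conclusion
------------------------------------------------------------------------------------------------
beertax    0.097, positive, not significant -0.484, negative, significant    OLS is biased
spircons   0.162, positive, significant     0.817, positive, significant     sign is robust
unrate     -0.029, negative, significant    -0.029, negative, significant    robust
perincK    -0.158, negative, significant    0.105, positive, significant     severe bias (sign flips)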

The pooled OLS estimator ignores unobserved state-specific effects. If these effects are correlated with the regressors, the OLS estimator is biased and inconsistent.

The fixed effects estimator controls for time-invariant heterogeneity across states. The substantial differences between OLS and FE estimates suggest that such heterogeneity is present.

Therefore, the fixed effects model is more appropriate in this case.
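
The bias argument can be illustrated with a small base-R simulation. This is a sketch under stated assumptions, not a re-analysis of the traffic data: the true slope of -0.5, the error scales, and the variable names are invented for the example. The unobserved state effect is built to be correlated with the regressor, so pooled OLS picks up part of it while the within estimator does not.

```r
# Simulated panel (assumption: 48 states, 7 years, true slope -0.5)
set.seed(42)
n_states <- 48
n_years  <- 7
state <- rep(seq_len(n_states), each = n_years)

u <- rnorm(n_states)                                   # unobserved state effect
x <- u[state] + rnorm(n_states * n_years, sd = 0.5)    # regressor correlated with u
y <- -0.5 * x + u[state] + rnorm(n_states * n_years, sd = 0.2)

# Pooled OLS ignores u, so its slope absorbs the correlation between x and u
beta_pooled <- coef(lm(y ~ x))[["x"]]

# Fixed effects: demean by state, then regress
x_dm <- x - ave(x, state)
y_dm <- y - ave(y, state)
beta_fe <- coef(lm(y_dm ~ x_dm - 1))[["x_dm"]]

print(c(true = -0.5, pooled = beta_pooled, fe = beta_fe))
```

In this design the pooled slope is pushed upward, far enough to change sign, while the FE slope stays close to the true value. The pattern resembles what happens to beertax in the table above, although the simulation says nothing about the actual cause in the traffic data.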

1.1.10 pFtest: deciding between pooled OLS and fixed effects

# Pooled OLS == POLS
# Simple regression model used for panel data set.
# Biased and inconsistent estimates!
# Autocorrelation and heteroskedasticity problems

# tests for poolability

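# pFtest chooses between the fixed effects model and the pooled OLS model.
# fixed: the fixed effects model estimated above (allows state-specific intercepts).
# ols: the ordinary regression that treats all states as identical (ignores individual differences).

# Core logic of pFtest: are the individual fixed effects (u_i) all equal to zero?
# If they are not, each state has its own level and the data cannot simply be pooled.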
pFtest(fixed, ols)
## 
##  F test for individual effects
## 
## data:  fatal ~ beertax + spircons + unrate + perincK
## F = 59.768, df1 = 47, df2 = 284, p-value < 0.00000000000000022
## alternative hypothesis: significant effects
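
The reported statistic can be reproduced by hand as a rough check, using rounded figures from the outputs above. The pooled residual sum of squares is not printed directly; here it is recovered from the OLS residual standard error (0.4793 on 331 degrees of freedom), which is an approximation. With \(n = 48\) states, \(N = 336\) observations and \(K = 4\) regressors:

\[RSS_{\text{pooled}} \approx 0.4793^2 \times 331 \approx 76.04\]

\[F = \frac{(RSS_{\text{pooled}} - RSS_{\text{FE}})/(n-1)}{RSS_{\text{FE}}/(N-n-K)} = \frac{(76.04 - 6.9816)/47}{6.9816/284} \approx 59.77\]

This agrees with F = 59.768 on 47 and 284 degrees of freedom, so the null hypothesis of no individual effects is rejected and the individual effects are significant.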

1.1.11 pbgtest: testing for serial correlation

pbgtest (Breusch-Godfrey/Wooldridge test) checks a panel model for serial correlation (autocorrelation), that is, whether the residuals of the same unit are correlated over time.

# Testing for serial correlation
pbgtest(fixed)
## 
##  Breusch-Godfrey/Wooldridge test for serial correlation in panel models
## 
## data:  fatal ~ beertax + spircons + unrate + perincK
## chisq = 25.11, df = 7, p-value = 0.0007256
## alternative hypothesis: serial correlation in idiosyncratic errors

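1.1.11.1 Core logic

The test asks: "Does this year's error for a given unit depend on last year's error?"

  • Null hypothesis (\(H_0\)): no serial correlation; the idiosyncratic errors are independent over time.
  • Alternative hypothesis (\(H_1\)): serial correlation; the errors are persistent or related over time.
  • Test statistic: chisq = 25.11, a chi-squared statistic with 7 degrees of freedom.
  • p-value: 0.0007256, far below 0.05.
  • Conclusion: reject the null hypothesis; the model shows significant serial correlation.

In plain words, pbgtest asks whether the errors "have a memory". If \(p < 0.05\), last year's disturbance carries over into this year, the usual standard errors are unreliable, and robust standard errors should be used; otherwise the significance of the results may be overstated.

1.1.12 bptest: testing for heteroskedasticity

bptest (Breusch-Pagan test) checks a model for heteroskedasticity. The argument studentize=T selects Koenker's studentized version, which is more robust when the errors are not normally distributed.

In plain words, bptest asks whether the error variance is the same for all observations. If \(p < 0.05\), the precision of the model varies across observations and robust standard errors are needed to correct the p-values. In this exercise pbgtest finds serial correlation and bptest finds heteroskedasticity (p-value = 0.000003184 in the output below), so the final results should be reported with cluster-robust standard errors.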

# Testing for heteroskedasticity
bptest(fatal~beertax+spircons+unrate+perincK, data=traffic, studentize=T)
## 
##  studentized Breusch-Pagan test
## 
## data:  fatal ~ beertax + spircons + unrate + perincK
## BP = 30.917, df = 4, p-value = 0.000003184
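# Robust covariance matrix. Note: method="white1" corrects for heteroskedasticity only;
# method="arellano" would also allow for serial correlation within each state.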
coeftest(fixed, vcov.=vcovHC(fixed, method="white1", type="HC0", cluster="group"))
## 
## t test of coefficients:
## 
##            Estimate Std. Error t value           Pr(>|t|)    
## beertax  -0.4840728  0.1513486 -3.1984          0.0015383 ** 
## spircons  0.8169651  0.1058757  7.7163 0.0000000000002046 ***
## unrate   -0.0290499  0.0076355 -3.8046          0.0001739 ***
## perincK   0.1047103  0.0229624  4.5601 0.0000076111116621 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

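coeftest combined with vcovHC re-tests the coefficients with a robust covariance matrix. Note that method="white1" corrects for heteroskedasticity only; allowing for serial correlation within states as well requires method="arellano", which clusters by group. Here all four coefficients remain significant after the correction, so the conclusions do not depend on the classical standard errors.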
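
The difference between the two vcovHC methods can be sketched as follows. Since traffic.csv is not bundled with any package, the example uses the Grunfeld data from plm as a stand-in, and the object names are illustrative assumptions. It computes a heteroskedasticity-only (white1) covariance matrix and a cluster-robust (arellano) one for the same fixed effects model.

```r
library(plm)
library(lmtest)

# Grunfeld ships with plm; it stands in for traffic.csv here (assumption)
data("Grunfeld", package = "plm")
fe <- plm(inv ~ value + capital, data = Grunfeld,
          index = c("firm", "year"), model = "within")

# Heteroskedasticity-robust only
vc_white <- vcovHC(fe, method = "white1", type = "HC0")

# Robust to heteroskedasticity and serial correlation within each firm
vc_arellano <- vcovHC(fe, method = "arellano", type = "HC0", cluster = "group")

# Standard errors side by side
print(cbind(white1   = sqrt(diag(vc_white)),
            arellano = sqrt(diag(vc_arellano))))

# Coefficient tests with the cluster-robust matrix
print(coeftest(fe, vcov. = vc_arellano))
```

When serial correlation is present, as pbgtest indicates for the traffic model, the arellano version is the safer choice for the reported tests.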


2 Exercise 3

The data set dancingwiththestars.csv consists of judges’ ratings of professional dance competitors across 20 seasons of a popular television series. Set team and time as indices for the panel data model.


2.0.1 Variables

var name      variable label
------------------------------------------------------------
serial        observation number
season        1–20 (the IV)
episode       1–12 within each season (varies)
judgenum      judge number (judges 1–3 are permanent)
judgexp       number of episodes the judge has attended
dancenum      dance within each episode
score         the judge's evaluation of the performance (the DV)
finalist      whether or not the team made the top 3 that season
teamID        initials of the performers
ppepisodexp   number of episodes the professional partner has been on the show
------------------------------------------------------------

2.0.2 Questions

    a) Decide which of the POLS, FE and RE models is appropriate for score.
    b) Create a publication-quality table based on the models' results.

2.1 a) Decide which of POLS, FE and RE model is appropriate for score

Sys.setenv(LANG = "en")
options(scipen=999)

library("MASS")
library("sandwich")
library("zoo")
library("car")
library("lmtest")
library("Formula")
library("plm")
library("stargazer")

data <- read.csv('./dancingwiththestars.csv')

fixed <- plm(score ~ season + judgexp + ppepisodexp, data=data, 
             index=c("team", "time"), model="within")


random <- plm(score ~ season + judgexp + ppepisodexp, data=data, 
             index=c("team", "time"), model="random")


pooling <- plm(score ~ season + judgexp + ppepisodexp, data=data, 
              index=c("team", "time"), model="pooling")

pFtest(fixed, pooling) # p-value < 0.05 -> use FE
## 
##  F test for individual effects
## 
## data:  score ~ season + judgexp + ppepisodexp
## F = 12.512, df1 = 256, df2 = 1310, p-value < 0.00000000000000022
## alternative hypothesis: significant effects
plmtest(pooling, type="bp") # p-value < 0.05 -> panel effects are present
## 
##  Lagrange Multiplier Test - (Breusch-Pagan)
## 
## data:  score ~ season + judgexp + ppepisodexp
## chisq = 1472.7, df = 1, p-value < 0.00000000000000022
## alternative hypothesis: significant effects
phtest(fixed, random) # p-value < 0.05, -> use FE
## 
##  Hausman Test
## 
## data:  score ~ season + judgexp + ppepisodexp
## chisq = 40.657, df = 2, p-value = 0.000000001484
## alternative hypothesis: one model is inconsistent

First, the Breusch–Pagan LM test (plmtest) is conducted to check for panel effects. The p-value is extremely small (p < 0.001), so we reject the null hypothesis of no panel effects, indicating that panel data methods are appropriate over pooled OLS.

Second, the F-test (pFtest) is used to compare the fixed effects model with pooled OLS. The p-value is very small (p < 0.001), so we reject the null hypothesis that all individual effects are zero. This suggests that the fixed effects model is preferred over pooled OLS.

Finally, the Hausman test (phtest) is used to choose between fixed and random effects models. The p-value is very small (p < 0.001), so we reject the null hypothesis that the random effects estimator is consistent. This indicates that the random effects model is inappropriate.

Therefore, the fixed effects (FE) model is the most appropriate specification for explaining score.
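
The three-step decision rule used above can be written as a small helper. This is a sketch: the function name choose_panel_model and its arguments are hypothetical, and the rule is the simple textbook sequence described in the text, not a plm feature. The p-values reported as "< 0.00000000000000022" are entered as that bound.

```r
# Hypothetical helper: pick a specification from the three test p-values
choose_panel_model <- function(p_ftest, p_lm, p_hausman, alpha = 0.05) {
  # Neither the F test nor the LM test finds individual effects -> pooled OLS
  has_effects <- (p_ftest < alpha) || (p_lm < alpha)
  if (!has_effects) {
    return("POLS")
  }
  # Hausman test rejects -> RE is inconsistent, use FE; otherwise RE
  if (p_hausman < alpha) "FE" else "RE"
}

# p-values from the outputs above (upper bounds where only a bound is printed)
choice <- choose_panel_model(p_ftest = 2.2e-16, p_lm = 2.2e-16,
                             p_hausman = 0.000000001484)
print(choice)
```

With these inputs the helper returns "FE", matching the conclusion above. A Hausman p-value above 0.05 would lead to "RE" instead.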

2.2 b) Create a publication-quality table based on the models' results.

library(lmtest)
library(sandwich)

se_pool <- sqrt(diag(vcovHC(pooling, type="HC1")))
se_fe   <- sqrt(diag(vcovHC(fixed, type="HC1")))
se_re   <- sqrt(diag(vcovHC(random, type="HC1")))

stargazer(pooling, fixed, random,
          type = "text",
          se = list(se_pool, se_fe, se_re),
          title = "Panel Data Models with Robust SE",
          dep.var.labels = "Score",
          column.labels = c("POLS", "FE", "RE"),
          digits = 3)
## 
## Panel Data Models with Robust SE
## ===========================================================================
##                                   Dependent variable:                      
##              --------------------------------------------------------------
##                                          Score                             
##                        POLS                       FE                 RE    
##                         (1)                       (2)               (3)    
## ---------------------------------------------------------------------------
## season               -2.501***                                   -2.112*** 
##                       (0.162)                                     (0.074)  
##                                                                            
## judgexp              0.229***                  0.198***           0.194*** 
##                       (0.015)                   (0.040)           (0.007)  
##                                                                            
## ppepisodexp          0.005***                   -0.005            0.006*** 
##                       (0.002)                   (0.041)           (0.002)  
##                                                                            
## Constant             11.116***                                   10.162*** 
##                       (0.308)                                     (0.202)  
##                                                                            
## ---------------------------------------------------------------------------
## Observations           1,570                     1,570             1,570   
## R2                     0.329                     0.388             0.446   
## Adjusted R2            0.328                     0.267             0.445   
## F Statistic  256.127*** (df = 3; 1566) 414.956*** (df = 2; 1310) 938.489***
## ===========================================================================
## Note:                                           *p<0.1; **p<0.05; ***p<0.01

Table X reports the estimation results for pooled OLS, fixed effects, and random effects models. Based on the Hausman test, the fixed effects model is preferred.

In the fixed effects model, judge experience (judgexp) has a positive and statistically significant effect on scores. A one-unit increase in judge experience increases the score by approximately 0.198 points, holding other factors constant.

In contrast, the variable ppepisodexp is not statistically significant in the fixed effects model, although it appears significant in the pooled OLS and random effects models. This suggests that the latter models may suffer from omitted variable bias.

Overall, the results indicate that controlling for unobserved individual heterogeneity is important when modeling judges’ scores.

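Notes:

  • Because the Hausman test has already been run, it is enough to interpret column (2), the FE model.
  • season drops out of the FE model and needs no interpretation. The within transformation removes it, which happens when a variable does not vary over time within a team.
  • judgexp 0.198*** (0.040): in the FE model, a one-unit increase in judgexp raises score by 0.198 on average; *** means p < 0.01 (highly significant); (0.040) is the robust standard error.
  • ppepisodexp -0.005 (0.041): in the FE model, a one-unit increase in ppepisodexp lowers score by 0.005 on average; the absence of stars means the estimate is not statistically significant, so there is no evidence that it has an effect.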
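
The stars in column (2) follow from the ratio of each estimate to its standard error. As a worked check with the rounded figures from the table (so the ratios are approximate):

\[t_{\text{judgexp}} = \frac{0.198}{0.040} \approx 4.95, \qquad t_{\text{ppepisodexp}} = \frac{-0.005}{0.041} \approx -0.12\]

The first ratio is well above 2.58 in absolute value, the approximate two-sided 1% critical value, which gives three stars. The second is far below 1.64 in absolute value, the approximate 10% critical value, so it gets no star.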

3 Exercise 4

The factors affecting the investment behavior by firms were studied by Grunfeld using a panel of data.

Investment demand is the purchase of durable goods by both households and firms. In terms of total spending, investment spending is the volatile component. Therefore, understanding what determines investment is crucial to understanding the sources of fluctuations in aggregate demand. In addition, a firm’s net fixed investment, which is the flow of additions to capital stock or replacements for worn-out capital, is important because it determines the future value of the capital stock and thus affects future labor productivity and aggregate supply.

There are several interesting and elaborate theories that seek to describe the determinants of the investment process for the firm. Most of these theories evolve to the conclusion that perceived profit opportunities (expected profits or present discounted value of future earnings) and desired capital stock are two important determinants of a firm’s fixed business investment. Unfortunately, neither of these variables is directly observable. Therefore, in formulating our economic model, we use observable proxies for these variables instead.

In terms of expected profits, one alternative is to identify the present discounted value of future earnings as the market value of the firm’s securities. The price of a firm’s stock represents and contains information about these expected profits. Consequently, the stock market value of the firm at the beginning of the year, denoted for firm i in time period t as \(V_{it}\), may be used as a proxy for expected profits.

In terms of desired capital stock, expectations play a definite role. To catch these expectations effects, one possibility is to use a model that recognizes that actual capital stock in any period is the sum of a large number of past desired capital stocks. Thus, we use the beginning of the year actual capital stock, denoted for the ith firm as \(K_{it}\), as a proxy for permanent desired capital stock.

Focusing on these explanatory variables, an economic model for describing gross firm investment for the ith firm in the tth time period, denoted \(INV_{it}\), may be expressed as:

\[INV_{it} = f(V_{it}, K_{it}) \tag{1}\]

Our concern is how we might take this general economic model and specify an econometric model that adequately represents a panel of real-world data. The data consist of \(T = 20\) years of data (1935–1954) for \(N = 10\) large firms.


3.1 Data Description

var name     variable label
------------------------------------------------------------
i            GM=1 USS=2 GE=3 Chr=4 Rich=5 IBM=6 UnOil=7 West=8 Goodyr=9 Match=10
t            year, t=1 is 1935; t=20 is 1954
inv          = gross investment in plant and equipment, millions of $1947
v            = value of common and preferred stock, millions of $1947
k            = stock of capital, millions of $1947
------------------------------------------------------------

Questions

    a) Decide which one of the POLS, FE and RE models is appropriate for INV. Explain your answer.
    b) Create a publication-quality table based on the models' results.
    c) Are the models balanced? Explain.

3.2 a) Decide which one of POLS, FE and RE model is appropriate for INV. Explain your answer.

Sys.setenv(LANG = "en")
options(scipen=999)

library("MASS")
library("sandwich")
library("zoo")
library("car")
library("lmtest")
library("Formula")
library("plm")
library("stargazer")

data("Grunfeld", package="plm")

fixed <- plm(inv ~ value + capital, data=Grunfeld, 
             index=c("firm", "year"), model="within")
summary(fixed)
## Oneway (individual) effect Within Model
## 
## Call:
## plm(formula = inv ~ value + capital, data = Grunfeld, model = "within", 
##     index = c("firm", "year"))
## 
## Balanced Panel: n = 10, T = 20, N = 200
## 
## Residuals:
##       Min.    1st Qu.     Median    3rd Qu.       Max. 
## -184.00857  -17.64316    0.56337   19.19222  250.70974 
## 
## Coefficients:
##         Estimate Std. Error t-value              Pr(>|t|)    
## value   0.110124   0.011857  9.2879 < 0.00000000000000022 ***
## capital 0.310065   0.017355 17.8666 < 0.00000000000000022 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Total Sum of Squares:    2244400
## Residual Sum of Squares: 523480
## R-Squared:      0.76676
## Adj. R-Squared: 0.75311
## F-statistic: 309.014 on 2 and 188 DF, p-value: < 0.000000000000000222
random <- plm(inv ~ value + capital, data=Grunfeld, 
              index=c("firm", "year"), model="random")


pooling <- plm(inv ~ value + capital, data=Grunfeld, 
               index=c("firm", "year"), model="pooling")


plmtest(pooling, type="bp") # p-value < 0.05 -> panel effects are present
## 
##  Lagrange Multiplier Test - (Breusch-Pagan)
## 
## data:  inv ~ value + capital
## chisq = 798.16, df = 1, p-value < 0.00000000000000022
## alternative hypothesis: significant effects
pFtest(fixed, pooling) # p-value < 0.05 -> use FE
## 
##  F test for individual effects
## 
## data:  inv ~ value + capital
## F = 49.177, df1 = 9, df2 = 188, p-value < 0.00000000000000022
## alternative hypothesis: significant effects
phtest(fixed, random) # p-value < 0.05 -> use FE; otherwise RE is acceptable
## 
##  Hausman Test
## 
## data:  inv ~ value + capital
## chisq = 2.3304, df = 2, p-value = 0.3119
## alternative hypothesis: one model is inconsistent

The Breusch–Pagan Lagrange Multiplier test strongly rejects the null hypothesis of no panel effects (p-value < 0.001), indicating that pooled OLS is inappropriate.

The F-test for individual effects also rejects the null hypothesis (p-value < 0.001), suggesting that fixed effects are preferred over pooling.

Finally, the Hausman test yields a p-value of 0.3119, which is greater than 0.05. Therefore, we fail to reject the null hypothesis that the random effects estimator is consistent.

Conclusion: The random effects (RE) model is the most appropriate specification for explaining investment (inv), as it is both consistent and more efficient than the fixed effects model.
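
For reference, the Hausman statistic compares the two sets of slope estimates. The formula below is the standard textbook form, with \(\hat{\beta}_{FE}\) and \(\hat{\beta}_{RE}\) introduced here for the FE and RE coefficient vectors on value and capital:

\[H = (\hat{\beta}_{FE} - \hat{\beta}_{RE})' \left[\widehat{\mathrm{Var}}(\hat{\beta}_{FE}) - \widehat{\mathrm{Var}}(\hat{\beta}_{RE})\right]^{-1} (\hat{\beta}_{FE} - \hat{\beta}_{RE}) \sim \chi^2(K)\]

With \(K = 2\) slopes, the 5% critical value of \(\chi^2(2)\) is about 5.99. The reported statistic of 2.3304 is below it, in line with the p-value of 0.3119.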

3.3 b) Create a publication-quality table based on the models' results.

library(lmtest)
library(sandwich)

se_pool <- sqrt(diag(vcovHC(pooling, type="HC1")))
se_fe   <- sqrt(diag(vcovHC(fixed, type="HC1")))
se_re   <- sqrt(diag(vcovHC(random, type="HC1")))

stargazer(pooling, fixed, random,
          type = "text",
          se = list(se_pool, se_fe, se_re),
          title = "Panel Data Models with Robust SE",
          dep.var.labels = "inv",
          column.labels = c("POLS", "FE", "RE"),
          digits = 3)
## 
## Panel Data Models with Robust SE
## =========================================================================
##                                  Dependent variable:                     
##              ------------------------------------------------------------
##                                          inv                             
##                        POLS                      FE                RE    
##                        (1)                      (2)               (3)    
## -------------------------------------------------------------------------
## value                0.116***                 0.110***          0.110*** 
##                      (0.015)                  (0.014)           (0.013)  
##                                                                          
## capital              0.231***                 0.310***          0.308*** 
##                      (0.081)                  (0.050)           (0.052)  
##                                                                          
## Constant            -42.714**                                  -57.834** 
##                      (19.426)                                   (23.628) 
##                                                                          
## -------------------------------------------------------------------------
## Observations           200                      200               200    
## R2                    0.812                    0.767             0.770   
## Adjusted R2           0.811                    0.753             0.767   
## F Statistic  426.576*** (df = 2; 197) 309.014*** (df = 2; 188) 657.674***
## =========================================================================
## Note:                                         *p<0.1; **p<0.05; ***p<0.01

The estimation results show that both explanatory variables, value and capital, are statistically significant at the 1% level across all three models (POLS, FE, and RE).

The coefficient of value is positive and stable across specifications (around 0.11), indicating that higher market valuation is associated with higher investment.

Similarly, capital has a positive and significant effect on investment, with a somewhat larger magnitude in the FE and RE models compared to the pooling model.

Although all models yield similar qualitative results, based on the previous model selection tests (Breusch–Pagan, F-test, and Hausman test), the random effects (RE) model is preferred, as it is both consistent and more efficient.

The similarity of coefficients across models suggests robustness of the results.

3.4 c) Are the models balanced? Explain.

The panel is balanced. A panel is considered balanced when each cross-sectional unit is observed for the same number of time periods.

In this dataset, there are \(N=10\) firms and \(T=20\) time periods, resulting in a total of 200 observations, which equals \(N \times T\). This indicates that each firm is observed in every year from 1935 to 1954, with no missing data.

Therefore, the panel is balanced.
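
The definition can also be checked mechanically. The helper below is a base-R sketch; its name is_balanced_panel and the toy data are assumptions for illustration. It treats a panel as balanced when every unit appears exactly once in every time period.

```r
# Hypothetical helper: TRUE if each id is observed exactly once in each period
is_balanced_panel <- function(df, id, time) {
  counts <- table(df[[id]], df[[time]])
  all(counts == 1)
}

# Toy panel with the same shape as the data above: 10 firms x 20 years
balanced <- expand.grid(year = 1935:1954, firm = 1:10)

# Dropping one firm-year makes the panel unbalanced
unbalanced <- balanced[-1, ]

print(is_balanced_panel(balanced, "firm", "year"))
print(is_balanced_panel(unbalanced, "firm", "year"))
```

For a fitted plm model the same information appears in the summary line "Balanced Panel: n = 10, T = 20, N = 200".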