These study notes cover the essential material from Lecture 11: Distributed Lag (DL) and Autoregressive Distributed Lag (ARDL) Models. These models are ==the workhorses of time series analysis==, used to understand how economic variables react to changes over time.
The central question of this chapter is: How do we model the dynamic relationship between variables?
In basic cross-sectional econometrics, we assume \(x\) affects \(y\) immediately. However, in the real world—especially with time series data—actions have delayed effects. For example, an increase in the money supply today might not affect inflation until several months later. (在基础的横截面计量经济学中,我们假设 \(x\) 对 \(y\) 产生即时影响。然而,在现实世界中——尤其是涉及时间序列数据时——行动往往具有滞后效应。例如,今天的货币供应量增加可能要到几个月后才会对通货膨胀产生影响。)
This chapter introduces:
Distributed Lag (DL) models: Where current \(y\) is explained by current and past values of \(x\).
Autoregressive Distributed Lag (ARDL) models: Where current \(y\) is explained by past values of \(y\) AND current/past values of \(x\).
Long-term Equilibrium: How to calculate the “steady state” where a system settles after a shock.
==分布滞后==(DL)模型: 当前的 \(y\) 由 \(x\) 的当前值和过去值共同解释。
==自回归分布滞后==(ARDL)模型: 当前的 \(y\) 由过去的 \(y\) 值以及当前/过去的 \(x\) 值共同解释。
长期均衡: 如何计算系统在经历冲击后所达到的“稳态”。
A “lag” is simply the value of a variable from a previous time period (\(t-1, t-2, \text{etc.}\)). Time series models use these to capture persistence (the tendency of a variable to stay where it is) and delayed reactions. 所谓“滞后”,即指前一时间段(\(t-1, t-2, \text{等}\))中某个变量的数值。时间序列模型利用这些滞后项来捕捉滞后性(变量保持当前水平的倾向)和延迟反应。
Before estimating these models, variables must be stationary (their mean and variance do not change over time).(在估计这些模型之前,变量必须是平稳的(其均值和方差不随时间变化))
This is a theoretical situation where nothing changes—there are no shocks, and variables remain constant (\(y_t = y_{t-1} = y^*\)). It allows economists to see the “long-run” relationship between variables once all temporary fluctuations have died out.(这是一个理论上的情况,其中没有任何变化——没有冲击,变量保持不变。它使经济学家能够看到变量之间的”长期”关系,一旦所有临时波动都消退)
Formula: \(y_t = \mu + \beta_0 x_t + \beta_1 x_{t-1} + ... + \beta_p x_{t-p} + \epsilon_t\).
Formula: \(y_t = \alpha_1 y_{t-1} + ... + \alpha_p y_{t-p} + \mu + \beta_0 x_t + ... + \beta_s x_{t-s} + \epsilon_t\).
In an exam, follow this logic to determine the appropriate model:(在考试中,按照以下逻辑来确定合适的模型)
In time series, interpreting a single lagged coefficient (like \(\beta_2\)) in isolation makes “little sense” because a change in \(x_{t-2}\) usually implies \(x\) was also changing in \(t-1\) and \(t\). Instead, focus on the multipliers.(在时间序列中,单独解释单个滞后系数(如\(\beta_2\))“意义不大”,因为\(x_{t-2}\)的变化通常意味着\(x\)在\(t-1\)和\(t\)也在变化。相反,应该关注乘数)
This is the most likely exam question.(这是最可能出现的考试题目)
Used to choose the number of lags. Usually, we choose the model that provides the lowest AIC or BIC value.(用于选择滞后项的数量。通常,我们选择提供最低AIC或BIC值的模型)
动态关系:在时间序列中,变量的影响往往具有滞后效应(例如:货币供应量对通胀的影响) 。
滞后项 (Lag):指前一时间段(\(t-1, t-2\)等)的数值,用于捕捉持久性和延迟反应 。
平稳性 (Stationarity):估计模型的前提。非平稳序列需通过“差分”处理 。
稳态 (Steady State):不存在冲击、变量保持不变的理论状态(\(y_t = y_{t-1} = y^*\)),用于观察长期均衡关系 。
分布滞后 (DL) 模型:\(y_t = \mu + \beta_0 x_t + \beta_1 x_{t-1} + \dots + \beta_p x_{t-p} + \epsilon_t\) 。
自回归分布滞后 (ARDL) 模型:\(y_t = \alpha_1 y_{t-1} + \dots + \alpha_p y_{t-p} + \mu + \beta_0 x_t + \dots + \beta_s x_{t-s} + \epsilon_t\) 。
短期(即期)乘数:当期系数 \(\beta_0\) 。
长期(累积)乘数 (LTM):
DL模型:\(LTM = \sum \beta_i\) 。
ARDL模型:\(LTM = \frac{\sum \beta_i}{1 - \sum \alpha_i}\) 。
平均反应滞后:\(\bar{\omega} = \frac{\sum (i \cdot \beta_i)}{LTM}\),衡量反应速度 。
本部分根据白板照片整理,对模型估计、诊断检验及复杂场景进行了深入补充。
根据样本量和平稳性,估计路径分为“简单路径”与“困难路径”:
当所有变量均为平稳 (Stationary) 时,使用 OLS (普通最小二乘法) 进行估计。需满足:
函数形式设定正确:通过 Ramsey RESET 检验。
同方差性 (Homoskedasticity):通过 Breusch-Pagan (BP) 检验。
无自相关 (Lack of autocorrelation):
DL模型使用 Breusch-Godfrey (BG) 检验。
ARDL模型使用 Ljung-Box (Portmanteau) 检验。
大样本:拥有足够多的观测值。
如果无法确保平稳性或处于小样本环境,估计变得更加困难,需依赖严格外生性 (Strict Exogeneity):
当期外生性 (Contemporaneous Exogeneity):\(Cov(x_t, \epsilon_t) = 0\)。
强/严格外生性 (Strong/Strict Exogeneity):\(\epsilon_t\) 与任何时期的 \(x\) 变量都不相关。
适用场景:即使没有平稳性假设或样本量较小,若满足严格外生性,模型逻辑依然成立。
在处理多个解释变量(如 \(x, z, w\))时,模型形式如下:
\[y_t = \beta_0 + (\beta_1 x_t + \beta_2 x_{t-1}) + (\alpha_1 z_t + \alpha_2 z_{t-1} + \alpha_3 z_{t-2}) + (\delta_1 w_{t-1} + \delta_2 w_{t-12}) + (\gamma_1 y_{t-1} + \dots + \gamma_s y_{t-s}) + \epsilon_t\]
稳态 (Steady State) 求解逻辑:
假设系统达到均衡,所有时间下标消失:\(y_t = y_{t-1} = y^*\),\(x_t = x_{t-1} = x^*\),以此类推。
\[y^* = \beta_0 + \beta_1 x^* + \beta_2 x^* + \alpha_1 z^* + \alpha_2 z^* + \alpha_3 z^* + \delta_1 w^* + \delta_2 w^* + \gamma_1 y^* + \dots + \gamma_s y^*\]
通过代数移项,即可解出长期均衡值 \(y^*\)。
针对永久冲击 (Permanent Shock),即变量增加并保持在水平值(如 \(x_{15}=x_{16}=\dots=7\)),各变量的 LTM 计算如下:
| 模型类型 | 变量 x 的 LTM | 变量 z 的 LTM |
|---|---|---|
| DL 模型 | \(LTM_x = \beta_1 + \beta_2\) | \(LTM_z = \alpha_1 + \alpha_2 + \alpha_3\) |
| ARDL 模型 | \(LTM_x = \frac{\beta_1 + \beta_2}{1 - (\gamma_1 + \gamma_2 + \dots + \gamma_s)}\) | \(LTM_z = \frac{\alpha_1 + \alpha_2 + \alpha_3}{1 - (\gamma_1 + \dots + \gamma_s)}\) |
| 检验项目 | 工具 | 零假设 (H0) | 失败后的补救措施 |
|---|---|---|---|
| 函数形式 | Ramsey RESET | 模型设定正确/反映现实 | 增加变量、添加滞后项、取对数 |
| 同方差性 | Breusch-Pagan | 误差项为同方差 | 使用对数形式、使用稳健标准误 |
| 自相关 | BG / Ljung-Box | 残差项无自相关 | 增加滞后项(尤其是 \(y\) 的滞后项) |
关键结论:
在 ARDL 模型中,自相关会导致内生性偏误,必须通过增加滞后项使 \(p-value > 0.05\)。
如果无法完美解决同方差或自相关问题,应使用 稳健协方差矩阵估计量 (Robust Standard Errors) 来确保推断的有效性。
###########################################################################
# Advanced Econometrics #
# Spring semester #
# dr Marcin Chlebus, dr Rafał Woźniak #
# University of Warsaw, Faculty of Economic Sciences #
# #
# #
# Lab 12: (AR)DL Linear regression #
# #
###########################################################################
if(!require(lmtest)){install.packages("lmtest")}
library(lmtest)
if(!require(dynlm)){install.packages("dynlm")}
library(dynlm)
# install.packages("devtools")
# library(devtools)
# install_github("fcbarbi/ardl")
# library(ardl)
if(!require(AER)){install.packages("AER")}
library(AER)
if(!require(zoo)){install.packages("zoo")}
library(zoo)
if(!require(xts)){install.packages("xts")}
library(xts)
if(!require(fBasics)){install.packages("fBasics")}
library(fBasics)
Sys.setenv(LANG = "en")
options(scipen=5)##########################################################
# Expolratory data analysis
#########################################################
# Quarterly US macroeconomic data from 1950(1) – 2000(4)
# provided by USMacroG, a “ts” time series. Contains
# dpi --> disposable income
# consumption --> consumption (in billion USD).
data(USMacroG)
data(package = "AER")
USA<-as.zoo(USMacroG)
USA[1:5,]## gdp consumption invest government dpi cpi m1 tbill unemp
## 1950 Q1 1610.5 1058.9 198.1 361.0 1186.1 70.6 110.20 1.12 6.4
## 1950 Q2 1658.8 1075.9 220.4 366.4 1178.1 71.4 111.75 1.17 5.6
## 1950 Q3 1723.0 1131.0 239.7 359.6 1196.5 73.2 112.95 1.23 4.6
## 1950 Q4 1753.9 1097.6 271.8 382.5 1210.0 74.9 113.93 1.35 4.2
## 1951 Q1 1773.5 1122.8 242.9 421.9 1207.9 77.3 115.08 1.40 3.5
## population inflation interest
## 1950 Q1 149.461 NA NA
## 1950 Q2 150.260 4.5071 -3.3404
## 1950 Q3 151.064 9.9590 -8.7290
## 1950 Q4 151.871 9.1834 -7.8301
## 1951 Q1 152.393 12.6160 -11.2160
#plotting DPI and consuption
colors_ <- c("black", "red")
plot(USA[,c("dpi","consumption")], plot.type = "single",
col = colors_, # colors for subsequent lines
ylab = "dpi/consumption", xlab = "time", # axes labels
main = "DPI & Consumption in the USA 1950 - 2000") # title above the plot
# separately we add a legend definition
legend("topleft", # legend position - combination of top, bottom and left, right
names(USA[,c("dpi","consumption")]), # legend elements (names of the series)
text.col = colors_) # colors (the same as in plot() function above)#Conclusions?
# 1. Those 2 variables follows probably the same trend, in regression we would find this relation, but probably nothing more
# What to do?
# differentiation - is there relation between deviance from the trend?
# we want to work on stationary TS #plotting first difference of DPI and consuption
plot(diff(USA[,c("dpi","consumption")]), plot.type = "single",
col = colors_, # colors for subsequent lines
ylab = "dpi/consumtion", xlab = "time", # axes labels
main = "DPI & Consumption in the USA 1950 - 2000") # title above the plot
# separately we add a legend definition
legend("topleft", # legend position - combination of top, bottom and left, right
names(USA[,c("dpi","consumption")]), # legend elements (names of the series)
text.col = colors_) # colors (the same as in plot() function above) #stationarity testing
source("function_testdf2.R")
options(scipen=999)
testdf2(variable = USA[,c("dpi")], # vector tested
test.type = "c", # test.type = "c",
max.augmentations = 5, # maximum number of augmentations added
max.order=5) # maximum order of residual lags for BG test## augmentations adf p_adf p_bg.p.value.1 p_bg.p.value.2 p_bg.p.value.3
## 1 0 4.520147 0.99 0.4757627 0.1466866 0.2377082
## 2 1 4.417310 0.99 0.9225736 0.1784467 0.2924392
## 3 2 3.639854 0.99 0.9354447 0.9963847 0.7998528
## 4 3 3.342271 0.99 0.9843572 0.9972283 0.9836444
## 5 4 3.069740 0.99 0.9103627 0.9851835 0.9822840
## 6 5 3.715439 0.99 0.9635154 0.9980164 0.9962116
## p_bg.p.value.4 p_bg.p.value.5
## 1 0.3438192 0.01927990
## 2 0.4280401 0.02312387
## 3 0.8876616 0.08151424
## 4 0.9893585 0.10515041
## 5 0.9961649 0.10115539
## 6 0.9919605 0.99814270
testdf2(variable = diff(USA[,c("dpi")]), # vector tested
test.type = "c", # test.type = "c",
max.augmentations = 5, # maximum number of augmentations added
max.order=5) # maximum order of residual lags for BG test## augmentations adf p_adf p_bg.p.value.1 p_bg.p.value.2 p_bg.p.value.3
## 1 0 -13.460411 0.01 0.8720424 0.01570519 0.0149188
## 2 1 -7.813974 0.01 0.7532990 0.89567380 0.3446841
## 3 2 -6.038376 0.01 0.8871046 0.98923951 0.9932058
## 4 3 -5.022296 0.01 0.8511516 0.97342644 0.9908236
## 5 4 -5.491999 0.01 0.9090418 0.99051048 0.9943576
## 6 5 -4.799076 0.01 0.9367024 0.98509353 0.9882262
## p_bg.p.value.4 p_bg.p.value.5
## 1 0.01496486 0.004641123
## 2 0.32977415 0.113457768
## 3 0.90527668 0.239907589
## 4 0.99687858 0.225223810
## 5 0.99900087 0.998999470
## 6 0.99643513 0.998707633
testdf2(variable = USA[,c("consumption")], # vector tested
test.type = "c", # test.type = "c",
max.augmentations = 5, # maximum number of augmentations added
max.order=5) # maximum order of residual lags for BG test## augmentations adf p_adf p_bg.p.value.1 p_bg.p.value.2 p_bg.p.value.3
## 1 0 8.296398 0.99 0.005546307 0.0001103848 0.000008381582
## 2 1 5.771917 0.99 0.485949219 0.0208831243 0.000140733984
## 3 2 4.367355 0.99 0.755707867 0.9043717071 0.006860732824
## 4 3 3.066683 0.99 0.750462543 0.9496740461 0.832868347784
## 5 4 3.252579 0.99 0.991992033 0.9729771490 0.908205859670
## 6 5 3.202110 0.99 0.854199944 0.9826726715 0.998104122232
## p_bg.p.value.4 p_bg.p.value.5
## 1 0.00001907399 0.00003700224
## 2 0.00041767804 0.00065979348
## 3 0.01362072624 0.02641457522
## 4 0.79058700070 0.83992186062
## 5 0.96783873221 0.97249863759
## 6 0.99938185774 0.99929469234
testdf2(variable = diff(USA[,c("consumption")]), # vector tested
test.type = "c", # test.type = "c",
max.augmentations = 5, # maximum number of augmentations added
max.order=5) # maximum order of residual lags for BG test## augmentations adf p_adf p_bg.p.value.1 p_bg.p.value.2
## 1 0 -9.184518 0.01000000 0.04956141 0.003167801
## 2 1 -5.237949 0.01000000 0.30523939 0.246359385
## 3 2 -3.629948 0.01000000 0.87947985 0.926376184
## 4 3 -3.554668 0.01000000 0.97024924 0.906962022
## 5 4 -3.607831 0.01000000 0.86171475 0.984715987
## 6 5 -3.126966 0.02712406 0.98008540 0.998966222
## p_bg.p.value.3 p_bg.p.value.4 p_bg.p.value.5
## 1 0.0000001672182 0.0000001920137 0.0000006372337
## 2 0.0017024608681 0.0042263851347 0.0089973989528
## 3 0.9548186200858 0.9089565420640 0.9419401977419
## 4 0.9599028563715 0.9609444552864 0.9748349856447
## 5 0.9981523313452 0.9903607166502 0.9970624226973
## 6 0.9995951693166 0.9996078091443 0.9999463124934
# first-differences seem to have increasing variance
# thus, let us analyse the first difference of the log-transformed
# first differences
testdf2(variable = diff(log(USA[,c("dpi")])), # vector tested
test.type = "c", # test.type = "nc",
max.augmentations = 5, # maximum number of augmentations added
max.order=5) # maximum order of residual lags for BG test## augmentations adf p_adf p_bg.p.value.1 p_bg.p.value.2 p_bg.p.value.3
## 1 0 -13.218923 0.01 0.9662664 0.6590377 0.4960841
## 2 1 -8.916091 0.01 0.9100174 0.9923266 0.6236955
## 3 2 -6.824515 0.01 0.9450289 0.9383219 0.9883654
## 4 3 -6.336003 0.01 0.9707415 0.9714525 0.9958945
## 5 4 -7.033210 0.01 0.8707947 0.9863307 0.9962280
## 6 5 -5.947960 0.01 0.9861110 0.9986554 0.9985395
## p_bg.p.value.4 p_bg.p.value.5
## 1 0.6648265 0.04837356
## 2 0.7793006 0.07764483
## 3 0.9973459 0.14685535
## 4 0.9989003 0.17154874
## 5 0.9994993 0.99966040
## 6 0.9998475 0.99914533
testdf2(variable = diff(log(USA[,c("consumption")])), # vector tested
test.type = "c", # test.type = "nc",
max.augmentations = 5, # maximum number of augmentations added
max.order=5) # maximum order of residual lags for BG test## augmentations adf p_adf p_bg.p.value.1 p_bg.p.value.2 p_bg.p.value.3
## 1 0 -13.791807 0.01 0.71451406 0.0002341137 0.0007533067
## 2 1 -8.134460 0.01 0.07989654 0.1829103993 0.1138040669
## 3 2 -6.268350 0.01 0.71253996 0.4629639320 0.1105162682
## 4 3 -6.777003 0.01 0.88973894 0.1969371662 0.0370375602
## 5 4 -6.450499 0.01 0.56548398 0.8323252837 0.8093948218
## 6 5 -6.174957 0.01 0.91795679 0.9945301814 0.9996765731
## p_bg.p.value.4 p_bg.p.value.5
## 1 0.0004893063 0.0008670902
## 2 0.0405561599 0.0464505336
## 3 0.0493164293 0.0478830419
## 4 0.0730829520 0.0966908739
## 5 0.9059957973 0.9334284979
## 6 0.9997289343 0.9857120180
##########################################################
# Autoregressive Distributed Lagged Model
#########################################################
# Distirbuted Lagged Model
# As regressors we use lags of explanatory variables
# start = c(1960, 1) is set to make comparison between models with different lag order possible
dl_1 <- dynlm(d(consumption) ~ d(dpi) + L(d(dpi)), data = USA, start = c(1960, 1))
summary(dl_1)##
## Time series regression with "zooreg" data:
## Start = 1960 Q1, End = 2000 Q4
##
## Call:
## dynlm(formula = d(consumption) ~ d(dpi) + L(d(dpi)), data = USA,
## start = c(1960, 1))
##
## Residuals:
## Min 1Q Median 3Q Max
## -64.413 -12.797 -1.915 12.594 53.529
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 12.60067 2.69478 4.676 0.00000615612291 ***
## d(dpi) 0.37915 0.04942 7.673 0.00000000000152 ***
## L(d(dpi)) 0.18116 0.04957 3.655 0.000348 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 21.06 on 161 degrees of freedom
## Multiple R-squared: 0.3074, Adjusted R-squared: 0.2988
## F-statistic: 35.73 on 2 and 161 DF, p-value: 0.0000000000001437
# alternative model for log-transformed variables
USA$ln_consumption = log(USA$consumption)
USA$ln_dpi = log(USA$dpi)
dl_ln1 <- dynlm(d(ln_consumption) ~ d(ln_dpi) + L(d(ln_dpi)), data = USA, start = c(1960, 1))
summary(dl_ln1)##
## Time series regression with "zooreg" data:
## Start = 1960 Q1, End = 2000 Q4
##
## Call:
## dynlm(formula = d(ln_consumption) ~ d(ln_dpi) + L(d(ln_dpi)),
## data = USA, start = c(1960, 1))
##
## Residuals:
## Min 1Q Median 3Q Max
## -0.0202245 -0.0035871 -0.0002189 0.0040243 0.0132598
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 0.0038889 0.0007819 4.973 0.000001672323833 ***
## d(ln_dpi) 0.4165784 0.0530758 7.849 0.000000000000554 ***
## L(d(ln_dpi)) 0.1655767 0.0530807 3.119 0.00215 **
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 0.005956 on 161 degrees of freedom
## Multiple R-squared: 0.3121, Adjusted R-squared: 0.3035
## F-statistic: 36.52 on 2 and 161 DF, p-value: 0.00000000000008359
#in order to draw a conclusions based on the model we would like to sort out a problem with autocorrelation
#has this model sorted out autocorrelation issue? - nope
resids_ <- as.data.frame(dl_1$residuals)
colnames(resids_)<-c("res")
bgtest(res~1,data=resids_, order = 1)##
## Breusch-Godfrey test for serial correlation of order up to 1
##
## data: res ~ 1
## LM test = 2.8539, df = 1, p-value = 0.09116
##
## Breusch-Godfrey test for serial correlation of order up to 2
##
## data: res ~ 1
## LM test = 11.199, df = 2, p-value = 0.0037
##
## Breusch-Godfrey test for serial correlation of order up to 3
##
## data: res ~ 1
## LM test = 22.658, df = 3, p-value = 0.00004758
##
## Breusch-Godfrey test for serial correlation of order up to 4
##
## data: res ~ 1
## LM test = 23.717, df = 4, p-value = 0.00009101
##
## Breusch-Godfrey test for serial correlation of order up to 5
##
## data: res ~ 1
## LM test = 24.111, df = 5, p-value = 0.0002067
使用 c(1:5) 来显示 5 个lags.
dl_5_all <- dynlm(d(consumption) ~ d(dpi) + L(d(dpi),c(1:5)), data = USA,start = c(1960, 1))
summary(dl_5_all)##
## Time series regression with "zooreg" data:
## Start = 1960 Q1, End = 2000 Q4
##
## Call:
## dynlm(formula = d(consumption) ~ d(dpi) + L(d(dpi), c(1:5)),
## data = USA, start = c(1960, 1))
##
## Residuals:
## Min 1Q Median 3Q Max
## -64.047 -11.994 -1.912 12.038 49.236
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 5.64149 3.56385 1.583 0.11544
## d(dpi) 0.36923 0.05050 7.311 0.0000000000127 ***
## L(d(dpi), c(1:5))1 0.16208 0.04976 3.257 0.00138 **
## L(d(dpi), c(1:5))2 0.08255 0.05033 1.640 0.10295
## L(d(dpi), c(1:5))3 0.05801 0.05077 1.143 0.25497
## L(d(dpi), c(1:5))4 0.04099 0.05024 0.816 0.41585
## L(d(dpi), c(1:5))5 0.08489 0.05103 1.664 0.09821 .
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 20.74 on 157 degrees of freedom
## Multiple R-squared: 0.3451, Adjusted R-squared: 0.3201
## F-statistic: 13.79 on 6 and 157 DF, p-value: 0.000000000001476
dl_5 <- dynlm(d(consumption) ~ d(dpi) + L(d(dpi),c(1,2,5)), data = USA,start = c(1960, 1))
summary(dl_5)##
## Time series regression with "zooreg" data:
## Start = 1960 Q1, End = 2000 Q4
##
## Call:
## dynlm(formula = d(consumption) ~ d(dpi) + L(d(dpi), c(1, 2, 5)),
## data = USA, start = c(1960, 1))
##
## Residuals:
## Min 1Q Median 3Q Max
## -65.371 -12.682 -1.721 11.804 48.567
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 7.58732 3.25397 2.332 0.02097 *
## d(dpi) 0.37793 0.05009 7.545 0.00000000000328 ***
## L(d(dpi), c(1, 2, 5))1 0.17365 0.04902 3.543 0.00052 ***
## L(d(dpi), c(1, 2, 5))2 0.08627 0.04967 1.737 0.08436 .
## L(d(dpi), c(1, 2, 5))5 0.09302 0.05030 1.849 0.06626 .
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 20.73 on 159 degrees of freedom
## Multiple R-squared: 0.3372, Adjusted R-squared: 0.3205
## F-statistic: 20.22 on 4 and 159 DF, p-value: 0.0000000000001754
## Analysis of Variance Table
##
## Model 1: d(consumption) ~ d(dpi) + L(d(dpi), c(1:5))
## Model 2: d(consumption) ~ d(dpi) + L(d(dpi), c(1, 2, 5))
## Res.Df RSS Df Sum of Sq F Pr(>F)
## 1 157 67530
## 2 159 68350 -2 -819.71 0.9529 0.3879
#has this model sorted out autocorrelation issue? - nope
resids_ <- as.data.frame(dl_5$residuals)
colnames(resids_)<-c("res")
bgtest(res~1,data=resids_, order = 1)##
## Breusch-Godfrey test for serial correlation of order up to 1
##
## data: res ~ 1
## LM test = 1.2403, df = 1, p-value = 0.2654
##
## Breusch-Godfrey test for serial correlation of order up to 2
##
## data: res ~ 1
## LM test = 8.7087, df = 2, p-value = 0.01285
##
## Breusch-Godfrey test for serial correlation of order up to 3
##
## data: res ~ 1
## LM test = 16.047, df = 3, p-value = 0.001109
##
## Breusch-Godfrey test for serial correlation of order up to 4
##
## data: res ~ 1
## LM test = 16.918, df = 4, p-value = 0.002005
##
## Breusch-Godfrey test for serial correlation of order up to 5
##
## data: res ~ 1
## LM test = 17.682, df = 5, p-value = 0.003373
==TODO==: explain coeftest
# Autocorrelation issue has beed not solved, but in DL models the problem is only in inconsistent Variance - Covariance matrix
# Newey - Davis robust estimator can be used to update the results and draw conclusions
library(lmtest)
library(sandwich)
coeftest(dl_5, vcov=vcovHAC(dl_5))##
## t test of coefficients:
##
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 7.587319 3.566781 2.1272 0.0349445 *
## d(dpi) 0.377935 0.063942 5.9106 0.00000002012 ***
## L(d(dpi), c(1, 2, 5))1 0.173651 0.046350 3.7465 0.0002504 ***
## L(d(dpi), c(1, 2, 5))2 0.086269 0.051880 1.6629 0.0983093 .
## L(d(dpi), c(1, 2, 5))5 0.093020 0.061803 1.5051 0.1342819
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
# Maybe Autoregressive Distirbuted Lagged Model would help?
# Note: Autocorrelation in residuals for ARDL lead to inconsistency of estimators!
ardl_5_all <- dynlm(d(consumption) ~ L(d(consumption),c(1:5)) + d(dpi) + L(d(dpi),c(1,2,5)), data = USA, start = c(1960, 1))
summary(ardl_5_all)##
## Time series regression with "zooreg" data:
## Start = 1960 Q1, End = 2000 Q4
##
## Call:
## dynlm(formula = d(consumption) ~ L(d(consumption), c(1:5)) +
## d(dpi) + L(d(dpi), c(1, 2, 5)), data = USA, start = c(1960,
## 1))
##
## Residuals:
## Min 1Q Median 3Q Max
## -69.815 -10.597 0.132 11.497 46.341
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 4.402872 3.170389 1.389 0.166915
## L(d(consumption), c(1:5))1 0.069549 0.082560 0.842 0.400871
## L(d(consumption), c(1:5))2 0.240945 0.082132 2.934 0.003863 **
## L(d(consumption), c(1:5))3 0.251578 0.074329 3.385 0.000904 ***
## L(d(consumption), c(1:5))4 -0.080380 0.075909 -1.059 0.291300
## L(d(consumption), c(1:5))5 -0.006211 0.084126 -0.074 0.941237
## d(dpi) 0.300907 0.051961 5.791 0.0000000381 ***
## L(d(dpi), c(1, 2, 5))1 0.079843 0.056471 1.414 0.159414
## L(d(dpi), c(1, 2, 5))2 -0.047063 0.055210 -0.852 0.395299
## L(d(dpi), c(1, 2, 5))5 0.041920 0.054999 0.762 0.447111
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 19.41 on 154 degrees of freedom
## Multiple R-squared: 0.4371, Adjusted R-squared: 0.4042
## F-statistic: 13.29 on 9 and 154 DF, p-value: 0.00000000000000135
ardl_5 <- dynlm(d(consumption) ~ L(d(consumption),c(2,3)) + d(dpi) + L(d(dpi),c(1)), data = USA, start = c(1960, 1))
summary(ardl_5)##
## Time series regression with "zooreg" data:
## Start = 1960 Q1, End = 2000 Q4
##
## Call:
## dynlm(formula = d(consumption) ~ L(d(consumption), c(2, 3)) +
## d(dpi) + L(d(dpi), c(1)), data = USA, start = c(1960, 1))
##
## Residuals:
## Min 1Q Median 3Q Max
## -70.136 -11.432 -0.785 11.774 44.642
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 4.44202 2.84257 1.563 0.12012
## L(d(consumption), c(2, 3))2 0.20552 0.06809 3.018 0.00296 **
## L(d(consumption), c(2, 3))3 0.25150 0.06820 3.688 0.00031 ***
## d(dpi) 0.29335 0.04776 6.142 0.00000000627 ***
## L(d(dpi), c(1)) 0.10020 0.04790 2.092 0.03806 *
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 19.27 on 159 degrees of freedom
## Multiple R-squared: 0.4276, Adjusted R-squared: 0.4132
## F-statistic: 29.7 on 4 and 159 DF, p-value: < 0.00000000000000022
## Analysis of Variance Table
##
## Model 1: d(consumption) ~ L(d(consumption), c(1:5)) + d(dpi) + L(d(dpi),
## c(1, 2, 5))
## Model 2: d(consumption) ~ L(d(consumption), c(2, 3)) + d(dpi) + L(d(dpi),
## c(1))
## Res.Df RSS Df Sum of Sq F Pr(>F)
## 1 154 58046
## 2 159 59023 -5 -976.74 0.5183 0.7622
#has this model sorted out autocorrelation issue? - Yes!!
resids_ <- as.data.frame(ardl_5$residuals)
colnames(resids_)<-c("res")
bgtest(res~1,data=resids_, order = 1)##
## Breusch-Godfrey test for serial correlation of order up to 1
##
## data: res ~ 1
## LM test = 0.82863, df = 1, p-value = 0.3627
##
## Breusch-Godfrey test for serial correlation of order up to 2
##
## data: res ~ 1
## LM test = 1.1992, df = 2, p-value = 0.549
##
## Breusch-Godfrey test for serial correlation of order up to 3
##
## data: res ~ 1
## LM test = 1.2328, df = 3, p-value = 0.7452
##
## Breusch-Godfrey test for serial correlation of order up to 4
##
## data: res ~ 1
## LM test = 2.5794, df = 4, p-value = 0.6305
##
## Breusch-Godfrey test for serial correlation of order up to 5
##
## data: res ~ 1
## LM test = 2.7137, df = 5, p-value = 0.744
###################################################################################################
#Diagnostics other than Autocorrelation
###################################################################################################
# Diagnostics plots
par(mfrow=c(2,2))
plot(ardl_5) par(mfrow=c(1,1))
# 1. Residuals vs Fitted
#
# This plot shows if residuals have non-linear patterns.
# There could be a non-linear relationship between predictor variables and an outcome variable
# and the pattern could show up in this plot if the model doesn’t capture the non-linear relationship.
# If you find equally spread residuals around a horizontal line without distinct patterns,
# that is a good indication you don’t have non-linear relationships.
# 2. Normal Q-Q
#
# This plot shows if residuals are normally distributed.
# Do residuals follow a straight line well or do they deviate severely?
# It’s good if residuals are lined well on the straight dashed line.
#residuals normality test
if(!require(interp)){install.packages("interp")}
library(interp)
jbTest(as.matrix(residuals(ardl_5)))##
## Title:
## Jarque - Bera Normality Test
##
## Test Results:
## PARAMETER:
## Sample Size: 164
## STATISTIC:
## LM: 4.04
## ALM: 4.688
## P VALUE:
## LM p-value: 0.096
## ALM p-value: 0.084
## Asymptotic: 0.133
# 3. Scale-Location
#
# It’s also called Spread-Location plot.
# This plot shows if residuals are spread equally along the ranges of predictors.
# This is how you can check the assumption of equal variance (homoscedasticity).
# It’s good if you see a horizontal line with equally (randomly) spread points.
# 4. Residuals vs Leverage
#
# This plot helps us to find influential cases (i.e., subjects) if any.
# Not all outliers are influential in linear regression analysis (whatever outliers mean).
# Even though data have extreme values, they might not be influential to determine a regression line.
# That means, the results wouldn’t be much different if we either include or exclude them from analysis.
# They follow the trend in the majority of cases and they don’t really matter; they are not influential.
# On the other hand, some cases could be very influential even if they look to be within a reasonable range of the values.
# They could be extreme cases against a regression line and can alter the results if we exclude them from analysis.
# Another way to put it is that they don’t get along with the trend in the majority of the cases.
#
# Unlike the other plots, this time patterns are not relevant.
# We watch out for outlying values at the upper right corner or at the lower right corner.
# Those spots are the places where cases can be influential against a regression line.
# Look for cases outside of a dashed line, Cook’s distance.
# When cases are outside of the Cook’s distance (meaning they have high Cook’s distance scores),
# the cases are influential to the regression results. The regression results will be altered if we exclude those cases.
#
# http://data.library.virginia.edu/diagnostic-plots/##
## Breusch-Pagan test
##
## data: ardl_5
## BP = 20.181, df = 4, p-value = 0.00046
##
## studentized Breusch-Pagan test
##
## data: ardl_5
## BP = 15.421, df = 4, p-value = 0.003902
## GVIF Df GVIF^(1/(2*Df))
## L(d(consumption), c(2, 3)) 1.235631 2 1.054319
## d(dpi) 1.116469 1 1.056631
## L(d(dpi), c(1)) 1.116288 1 1.056545
############################################################
# upload a dataset form AER package named USConsump1993
# in the dataset relation between income and expenditure
# in the USE years 1950-1993 are stored
##########################################################
data(USConsump1993)
USA<-as.zoo(USConsump1993)
# Excersie 4
# 4.1
# Asses wheter time series are stationary or not
# and prepared data to (AR)DL model for expenditure
# 4.2
# Choose the best model from (AR)DL family
# 4.3
# Perform diagnostics analysis to check wheter the chosen model fulfills assumptions