These study notes cover the essential material from Lecture 11: Distributed Lag (DL) and Autoregressive Distributed Lag (ARDL) Models. These models are ==the workhorses of time series analysis==, used to understand how economic variables react to changes over time.

0.1 Chapter Overview

The central question of this chapter is: How do we model the dynamic relationship between variables?

In basic cross-sectional econometrics, we assume \(x\) affects \(y\) immediately. However, in the real world—especially with time series data—actions have delayed effects. For example, an increase in the money supply today might not affect inflation until several months later. (在基础的横截面计量经济学中，我们假设 \(x\) 对 \(y\) 产生即时影响。然而，在现实世界中——尤其是涉及时间序列数据时——行动往往具有滞后效应。例如，今天的货币供应量增加可能要到几个月后才会对通货膨胀产生影响。)

This chapter introduces:

Distributed Lag (DL) models: Where current \(y\) is explained by current and past values of \(x\).
Autoregressive Distributed Lag (ARDL) models: Where current \(y\) is explained by past values of \(y\) AND current/past values of \(x\).
Long-term Equilibrium: How to calculate the “steady state” where a system settles after a shock.
==分布滞后==（DL）模型： 当前的 \(y\) 由 \(x\) 的当前值和过去值共同解释。
==自回归分布滞后==（ARDL）模型： 当前的 \(y\) 由过去的 \(y\) 值以及当前/过去的 \(x\) 值共同解释。
长期均衡： 如何计算系统在经历冲击后所达到的“稳态”。

0.2 Core Concepts

0.2.1 1. Lags and Dynamics

A “lag” is simply the value of a variable from a previous time period (\(t-1, t-2, \text{etc.}\)). Time series models use these to capture persistence (the tendency of a variable to stay where it is) and delayed reactions. 所谓“滞后”，即指前一时间段（\(t-1, t-2, \text{等}\)）中某个变量的数值。时间序列模型利用这些滞后项来捕捉滞后性（变量保持当前水平的倾向）和延迟反应。

0.2.2 2. Stationarity: The Prerequisite（平稳性：前提条件）

Before estimating these models, variables must be stationary (their mean and variance do not change over time).（在估计这些模型之前，变量必须是平稳的（其均值和方差不随时间变化））

If a series is non-stationary, you must “difference” it (calculate the change from one period to the next) until it becomes stationary.（如果一个序列是非平稳的，你必须”差分”它（计算一个时期与下一个时期的变化），直到它变成平稳的）
If variables are stationary, you can proceed with simple OLS estimation.（如果变量是平稳的，你可以进行简单的OLS估计）

0.2.3 3. The Steady State (Equilibrium)（稳态（均衡））

This is a theoretical situation where nothing changes—there are no shocks, and variables remain constant (\(y_t = y_{t-1} = y^*\)). It allows economists to see the “long-run” relationship between variables once all temporary fluctuations have died out.（这是一个理论上的情况，其中没有任何变化——没有冲击，变量保持不变。它使经济学家能够看到变量之间的”长期”关系，一旦所有临时波动都消退）

0.2.4 4. Short-term vs. Long-term Multipliers（短期乘数与长期乘数）

Short-term (Immediate) Multiplier: The impact of a one-unit change in \(x\) on \(y\) in the exact same time period.（短期（即期）乘数： 在完全相同的时间段内，\(x\)增加一个单位对\(y\)的影响）
Long-term (Total) Multiplier: The total cumulative impact of that change on \(y\) after all delayed effects have occurred.（长期（总）乘数： 在所有延迟效应都发生后，该变化对\(y\)的总累积影响）

0.3 Important Models or Methods（重要的模型或方法）

0.3.1 1. Distributed Lag (DL) Model（分布滞后（DL）模型）

Formula: \(y_t = \mu + \beta_0 x_t + \beta_1 x_{t-1} + ... + \beta_p x_{t-p} + \epsilon_t\).

Intuition: The current value of \(y\) depends on what \(x\) is doing now and what it did in the past.（直观理解： 当前的\(y\)值取决于\(x\)现在正在做什么以及它过去做过什么）
Limitation: If you need many lags to explain the relationship, you run out of degrees of freedom and encounter multicollinearity between the lags.（局限性： 如果需要许多滞后项来解释关系，你会耗尽自由度，并遇到滞后项之间的多重共线性）

0.3.2 2. Autoregressive Distributed Lag (ARDL) Model（自回归分布滞后（ARDL）模型）

Formula: \(y_t = \alpha_1 y_{t-1} + ... + \alpha_p y_{t-p} + \mu + \beta_0 x_t + ... + \beta_s x_{t-s} + \epsilon_t\).

Intuition: By adding the past of \(y\) (the Autoregressive part), the model becomes much more efficient. One or two lags of \(y\) can often replace a dozen lags of \(x\), resulting in a better-fitting model with fewer variables.（直观理解： 通过添加\(y\)的过去值（自回归部分），模型变得更加高效。\(y\)的一个或两个滞后项通常可以替代\(x\)的十几个滞后项，从而产生一个拟合更好、变量更少的模型）
Benefit: ARDL models are highly effective at removing autocorrelation in the residuals, which is a major problem in time series.（优势： ARDL模型在消除残差中的自相关方面非常有效，这是时间序列中的主要问题）

0.4 How to Choose the Right Model（如何选择合适的模型）

In an exam, follow this logic to determine the appropriate model:（在考试中，按照以下逻辑来确定合适的模型）

Check Stationarity: Use tests like ADF (Augmented Dickey-Fuller). If the variables are stationary, you can use OLS for DL or ARDL.（检查平稳性： 使用ADF（增强的迪基-富勒）等测试。如果变量是平稳的，你可以对DL或ARDL使用OLS）
Evaluate the Residuals: Start with a simple model. Run the Breusch-Godfrey test for autocorrelation.（评估残差： 从简单的模型开始。运行Breusch-Godfrey测试来检测自相关）
- If there is no autocorrelation, a simple DL model may be sufficient.（如果没有自相关，简单的DL模型可能就足够了）
- If there is autocorrelation, you must add lags—usually lags of the dependent variable \(y\)—moving you into an ARDL framework.（如果存在自相关，你_必须_添加滞后项——通常是因变量\(y\)的滞后项——将你推入ARDL框架）
Evaluate Sample Size:（评估样本量：）
- Large Sample (Hundreds/Thousands of observations): You are safe using OLS with stationary variables and robust standard errors.（大样本（数百/数千个观测值）： 使用OLS与平稳变量和稳健标准误是安全的）
- Small Sample: You may need to rely on the Strict Exogeneity assumption (where current shocks don’t affect future values of \(x\)), which is harder to prove and has no specific test.（小样本： 你可能需要依赖严格外生性假设（当前冲击不影响\(x\)的未来值），这更难证明，没有特定的检验）

0.5 Interpreting Model Output (Important)（解释模型输出（重要））

0.5.1 1. Individual Coefficients（个别系数）

In time series, interpreting a single lagged coefficient (like \(\beta_2\)) in isolation makes “little sense” because a change in \(x_{t-2}\) usually implies \(x\) was also changing in \(t-1\) and \(t\). Instead, focus on the multipliers.（在时间序列中，单独解释单个滞后系数（如\(\beta_2\)）“意义不大”，因为\(x_{t-2}\)的变化通常意味着\(x\)在\(t-1\)和\(t\)也在变化。相反，应该关注乘数）

0.5.2 2. Short-term (Impact) Multiplier（短期（冲击）乘数）

This is simply the coefficient of the current-period variable (\(\beta_0\)).（这只是当期变量的系数（\(\beta_0\)））
Interpretation: “If \(x\) increases by 1 unit today, \(y\) increases by \(\beta_0\) units today, holding all else constant”.（解释： “如果今天\(x\)增加1个单位，那么今天\(y\)增加\(\beta_0\)个单位，其他条件不变”）

0.5.3 3. Long-term (Cumulative) Multiplier (LTM)（长期（累积）乘数（LTM））

This is the most likely exam question.（这是最可能出现的考试题目）

For DL Model: \(LTM = \sum \beta_i\) (the sum of all \(x\) coefficients).（对于DL模型： \(LTM = \sum \beta_i\)（所有\(x\)系数的和））
For ARDL Model: \(LTM = \frac{\sum \beta_i}{1 - \sum \alpha_i}\) (sum of \(x\) coefficients divided by 1 minus the sum of \(y\) coefficients).（对于ARDL模型： \(LTM = \frac{\sum \beta_i}{1 - \sum \alpha_i}\)（\(x\)系数的和除以1减去\(y\)系数的和））
Interpretation: This represents the total reaction of the system.（解释： 这表示系统的总反应）
- Scenario A (One-period shock): If \(x\) increases for only one period and then returns to normal, the LTM is the total “excessive” reaction \(y\) will have before it also returns to normal.（场景A（单期冲击）： 如果\(x\)仅增加一个时期然后恢复正常，LTM是\(y\)在也恢复正常之前的总”过度”反应）
- Scenario B (Permanent shock): If \(x\) increases and stays at the new level, the LTM is the difference between the new steady state and the old steady state.（场景B（永久冲击）： 如果\(x\)增加并_保持_在新的水平，LTM是新稳态和旧稳态之间的差异）

0.5.4 4. Average Reaction Lag（平均反应滞后）

Formula: \(\bar{\omega} = \frac{\sum (i \cdot \beta_i)}{LTM}\).（公式： \(\bar{\omega} = \frac{\sum (i \cdot \beta_i)}{LTM}\)）
Interpretation: Measures how long it takes for \(y\) to react to \(x\). A smaller value means a faster reaction.（解释： 衡量\(y\)对\(x\)反应需要多长时间。较小的值意味着反应更快）

0.6 Model Comparison（模型比较）

0.6.1 1. Ramsey RESET Test（Ramsey RESET测试）

Null Hypothesis (\(H_0\)): The functional form is correct (the model reflects reality).（零假设（\(H_0\)）： 函数形式是正确的（模型反映现实））
Decision: If \(p > 0.05\), the model is “okay.” If \(p < 0.05\), the model is invalid, and you should add more lags or variables.（决策： 如果\(p > 0.05\)，模型是”可以的”。如果\(p < 0.05\)，模型无效，你应该添加更多滞后项或变量）

0.6.2 2. Breusch-Godfrey (BG) Test（Breusch-Godfrey（BG）测试）

Null Hypothesis (\(H_0\)): No autocorrelation in the residuals.（零假设（\(H_0\)）： 残差中没有自相关）
Importance: In ARDL models, autocorrelation leads to endogeneity, meaning your estimates will be biased. You must achieve a non-significant BG test (\(p > 0.05\)).（重要性： 在ARDL模型中，自相关导致内生性，意味着你的估计会产生偏差。你_必须_获得非显著的BG测试（\(p > 0.05\)））

0.6.3 3. Information Criteria (AIC/BIC)（信息准则（AIC/BIC））

Used to choose the number of lags. Usually, we choose the model that provides the lowest AIC or BIC value.（用于选择滞后项的数量。通常，我们选择提供最低AIC或BIC值的模型）

0.7 Key Takeaways for Exams（考试的关键要点）

DL vs. ARDL: DL has lags of \(x\); ARDL has lags of \(x\) AND \(y\). ARDL is generally better at handling autocorrelation.（DL对比ARDL： DL有\(x\)的滞后项；ARDL有\(x\)和\(y\)的滞后项。ARDL通常更善于处理自相关）
Autocorrelation is the Enemy: If you have it, add more lags or use robust standard errors.（自相关是敌人： 如果有自相关，添加更多滞后项或使用稳健标准误）
LTM Calculation: Remember the formula difference for ARDL (\(\frac{\sum \beta}{1 - \sum \alpha}\)) vs. DL (\(\sum \beta\)).（LTM计算： 记住ARDL（\(\frac{\sum \beta}{1 - \sum \alpha}\)）与DL（\(\sum \beta\)）的公式差异）
Steady State Logic: To calculate a steady state value (\(y^*\)), assume all \(y_t = y_{t-1} = y^*\) and all \(x_t = x_{t-1} = x^*\), then solve the equation algebraically.（稳态逻辑： 要计算稳态值（\(y^*\)），假设所有\(y_t = y_{t-1} = y^*\)和所有\(x_t = x_{t-1} = x^*\)，然后用代数方法求解方程）
Strict Exogeneity: Only relevant for small samples where you cannot ensure stationarity. It means shocks today don’t affect \(x\) in the future (e.g., rainfall is strictly exogenous; hiring police officers after a crime spike is not).（严格外生性： 仅与小样本相关，在小样本中你无法确保平稳性。它意味着今天的冲击不会影响未来的\(x\)（例如，降雨是严格外生的；犯罪激增后雇用警察则不是））

1 学习笔记：分布滞后 (DL) 与自回归分布滞后 (ARDL) 模型

1.1 第一部分：讲义核心内容回顾

1.1.1 1. 核心概念与动机

动态关系：在时间序列中，变量的影响往往具有滞后效应（例如：货币供应量对通胀的影响）。
滞后项 (Lag)：指前一时间段（\(t-1, t-2\)等）的数值，用于捕捉持久性和延迟反应。
平稳性 (Stationarity)：估计模型的前提。非平稳序列需通过“差分”处理。
稳态 (Steady State)：不存在冲击、变量保持不变的理论状态（\(y_t = y_{t-1} = y^*\)），用于观察长期均衡关系。

1.1.2 2. 模型定义

分布滞后 (DL) 模型：\(y_t = \mu + \beta_0 x_t + \beta_1 x_{t-1} + \dots + \beta_p x_{t-p} + \epsilon_t\) 。
自回归分布滞后 (ARDL) 模型：\(y_t = \alpha_1 y_{t-1} + \dots + \alpha_p y_{t-p} + \mu + \beta_0 x_t + \dots + \beta_s x_{t-s} + \epsilon_t\) 。
- 优势：ARDL通过加入\(y\)的滞后项，能更高效地捕捉动态，减少变量数量并消除残差自相关。

1.1.3 3. 乘数 (Multipliers) 与解释

短期（即期）乘数：当期系数 \(\beta_0\) 。
长期（累积）乘数 (LTM)：
- DL模型：\(LTM = \sum \beta_i\) 。
- ARDL模型：\(LTM = \frac{\sum \beta_i}{1 - \sum \alpha_i}\) 。
平均反应滞后：\(\bar{\omega} = \frac{\sum (i \cdot \beta_i)}{LTM}\)，衡量反应速度。

1.2 第二部分：课堂板书补充内容 (整理版)

本部分根据白板照片整理，对模型估计、诊断检验及复杂场景进行了深入补充。

1.2.1 1. 模型估计的前提条件与路径选择

根据样本量和平稳性，估计路径分为“简单路径”与“困难路径”：

1.2.1.1 A. 简单路径：基于平稳性与大样本 (Whiteboard 1728, 1715)

当所有变量均为平稳 (Stationary) 时，使用 OLS (普通最小二乘法) 进行估计。需满足：

函数形式设定正确：通过 Ramsey RESET 检验。
同方差性 (Homoskedasticity)：通过 Breusch-Pagan (BP) 检验。
无自相关 (Lack of autocorrelation)：
- DL模型使用 Breusch-Godfrey (BG) 检验。
- ARDL模型使用 Ljung-Box (Portmanteau) 检验。
大样本：拥有足够多的观测值。

1.2.1.2 B. 困难路径：小样本或非平稳性 (Whiteboard 1730)

如果无法确保平稳性或处于小样本环境，估计变得更加困难，需依赖严格外生性 (Strict Exogeneity)：

当期外生性 (Contemporaneous Exogeneity)：\(Cov(x_t, \epsilon_t) = 0\)。
强/严格外生性 (Strong/Strict Exogeneity)：\(\epsilon_t\) 与任何时期的 \(x\) 变量都不相关。
适用场景：即使没有平稳性假设或样本量较小，若满足严格外生性，模型逻辑依然成立。

1.2.2 2. 多变量 ARDL 模型与稳态代数求解 (Whiteboard 1727)

在处理多个解释变量（如 \(x, z, w\)）时，模型形式如下：

\[y_t = \beta_0 + (\beta_1 x_t + \beta_2 x_{t-1}) + (\alpha_1 z_t + \alpha_2 z_{t-1} + \alpha_3 z_{t-2}) + (\delta_1 w_{t-1} + \delta_2 w_{t-12}) + (\gamma_1 y_{t-1} + \dots + \gamma_s y_{t-s}) + \epsilon_t\]

稳态 (Steady State) 求解逻辑：

假设系统达到均衡，所有时间下标消失：\(y_t = y_{t-1} = y^*\)，\(x_t = x_{t-1} = x^*\)，以此类推。

\[y^* = \beta_0 + \beta_1 x^* + \beta_2 x^* + \alpha_1 z^* + \alpha_2 z^* + \alpha_3 z^* + \delta_1 w^* + \delta_2 w^* + \gamma_1 y^* + \dots + \gamma_s y^*\]

通过代数移项，即可解出长期均衡值 \(y^*\)。

1.2.3 3. 多变量长期乘数 (LTM) 详尽公式 (Whiteboard 1723)

针对永久冲击 (Permanent Shock)，即变量增加并保持在水平值（如 \(x_{15}=x_{16}=\dots=7\)），各变量的 LTM 计算如下：

模型类型	变量 x 的 LTM	变量 z 的 LTM
DL 模型	\(LTM_x = \beta_1 + \beta_2\)	\(LTM_z = \alpha_1 + \alpha_2 + \alpha_3\)
ARDL 模型	\(LTM_x = \frac{\beta_1 + \beta_2}{1 - (\gamma_1 + \gamma_2 + \dots + \gamma_s)}\)	\(LTM_z = \frac{\alpha_1 + \alpha_2 + \alpha_3}{1 - (\gamma_1 + \dots + \gamma_s)}\)

1.2.4 4. 诊断检验与补救措施 (Whiteboard 1716, 1717)

检验项目	工具	零假设 (H0)	失败后的补救措施
函数形式	Ramsey RESET	模型设定正确/反映现实	增加变量、添加滞后项、取对数
同方差性	Breusch-Pagan	误差项为同方差	使用对数形式、使用稳健标准误
自相关	BG / Ljung-Box	残差项无自相关	增加滞后项（尤其是 \(y\) 的滞后项）

关键结论：

在 ARDL 模型中，自相关会导致内生性偏误，必须通过增加滞后项使 \(p-value > 0.05\)。
如果无法完美解决同方差或自相关问题，应使用 稳健协方差矩阵估计量 (Robust Standard Errors) 来确保推断的有效性。

###########################################################################
#       Advanced Econometrics                                                 #
#   Spring semester                                                       #
#   dr Marcin Chlebus, dr Rafał Woźniak                                   #
#   University of Warsaw, Faculty of Economic Sciences                    #
#                                                                         #
#                                                                         #
#                 Lab 12: (AR)DL Linear regression                        #
#                                                                         #
###########################################################################

if(!require(lmtest)){install.packages("lmtest")}
library(lmtest) 

if(!require(dynlm)){install.packages("dynlm")}
library(dynlm)

# install.packages("devtools")
# library(devtools)
# install_github("fcbarbi/ardl")
# library(ardl)

if(!require(AER)){install.packages("AER")}
library(AER)

if(!require(zoo)){install.packages("zoo")}
library(zoo)

if(!require(xts)){install.packages("xts")}
library(xts)

if(!require(fBasics)){install.packages("fBasics")}
library(fBasics)

Sys.setenv(LANG = "en")
options(scipen=5)

##########################################################
# Expolratory data analysis
#########################################################

# Quarterly US macroeconomic data from 1950(1) – 2000(4)
# provided by USMacroG, a “ts” time series. Contains 
# dpi --> disposable income
# consumption --> consumption (in billion USD).

data(USMacroG)

data(package = "AER")

USA<-as.zoo(USMacroG)
USA[1:5,]

##            gdp consumption invest government    dpi  cpi     m1 tbill unemp
## 1950 Q1 1610.5      1058.9  198.1      361.0 1186.1 70.6 110.20  1.12   6.4
## 1950 Q2 1658.8      1075.9  220.4      366.4 1178.1 71.4 111.75  1.17   5.6
## 1950 Q3 1723.0      1131.0  239.7      359.6 1196.5 73.2 112.95  1.23   4.6
## 1950 Q4 1753.9      1097.6  271.8      382.5 1210.0 74.9 113.93  1.35   4.2
## 1951 Q1 1773.5      1122.8  242.9      421.9 1207.9 77.3 115.08  1.40   3.5
##         population inflation interest
## 1950 Q1    149.461        NA       NA
## 1950 Q2    150.260    4.5071  -3.3404
## 1950 Q3    151.064    9.9590  -8.7290
## 1950 Q4    151.871    9.1834  -7.8301
## 1951 Q1    152.393   12.6160 -11.2160

#plotting DPI and consuption
  colors_ <- c("black", "red")
  
  plot(USA[,c("dpi","consumption")], plot.type = "single",
       col = colors_,   # colors for subsequent lines
       ylab = "dpi/consumption", xlab = "time", # axes labels
       main = "DPI & Consumption in the USA 1950 - 2000") # title above the plot
  
  # separately we add a legend definition
  legend("topleft",     # legend position - combination of top, bottom and left, right
         names(USA[,c("dpi","consumption")]), # legend elements (names of the series)
         text.col = colors_)     # colors (the same as in plot() function above)

#Conclusions?
  # 1. Those 2 variables follows probably the same trend, in regression we would find this relation, but probably nothing more
  # What to do?
    # differentiation - is there relation between deviance from the trend?
    # we want to work on stationary TS

    #plotting first difference of DPI and consuption
    plot(diff(USA[,c("dpi","consumption")]), plot.type = "single",
         col = colors_,   # colors for subsequent lines
         ylab = "dpi/consumtion", xlab = "time", # axes labels
         main = "DPI & Consumption in the USA 1950 - 2000") # title above the plot
    
    # separately we add a legend definition
    legend("topleft",     # legend position - combination of top, bottom and left, right
           names(USA[,c("dpi","consumption")]), # legend elements (names of the series)
           text.col = colors_)     # colors (the same as in plot() function above)

  #stationarity testing
    
    source("function_testdf2.R")
    options(scipen=999)
    testdf2(variable = USA[,c("dpi")], # vector tested
       test.type = "c", # test.type = "c",
       max.augmentations = 5, # maximum number of augmentations added
       max.order=5) # maximum order of residual lags for BG test

##   augmentations      adf p_adf p_bg.p.value.1 p_bg.p.value.2 p_bg.p.value.3
## 1             0 4.520147  0.99      0.4757627      0.1466866      0.2377082
## 2             1 4.417310  0.99      0.9225736      0.1784467      0.2924392
## 3             2 3.639854  0.99      0.9354447      0.9963847      0.7998528
## 4             3 3.342271  0.99      0.9843572      0.9972283      0.9836444
## 5             4 3.069740  0.99      0.9103627      0.9851835      0.9822840
## 6             5 3.715439  0.99      0.9635154      0.9980164      0.9962116
##   p_bg.p.value.4 p_bg.p.value.5
## 1      0.3438192     0.01927990
## 2      0.4280401     0.02312387
## 3      0.8876616     0.08151424
## 4      0.9893585     0.10515041
## 5      0.9961649     0.10115539
## 6      0.9919605     0.99814270

    testdf2(variable = diff(USA[,c("dpi")]), # vector tested
           test.type = "c", # test.type = "c",
           max.augmentations = 5, # maximum number of augmentations added
           max.order=5) # maximum order of residual lags for BG test

##   augmentations        adf p_adf p_bg.p.value.1 p_bg.p.value.2 p_bg.p.value.3
## 1             0 -13.460411  0.01      0.8720424     0.01570519      0.0149188
## 2             1  -7.813974  0.01      0.7532990     0.89567380      0.3446841
## 3             2  -6.038376  0.01      0.8871046     0.98923951      0.9932058
## 4             3  -5.022296  0.01      0.8511516     0.97342644      0.9908236
## 5             4  -5.491999  0.01      0.9090418     0.99051048      0.9943576
## 6             5  -4.799076  0.01      0.9367024     0.98509353      0.9882262
##   p_bg.p.value.4 p_bg.p.value.5
## 1     0.01496486    0.004641123
## 2     0.32977415    0.113457768
## 3     0.90527668    0.239907589
## 4     0.99687858    0.225223810
## 5     0.99900087    0.998999470
## 6     0.99643513    0.998707633

   testdf2(variable = USA[,c("consumption")], # vector tested
          test.type = "c", # test.type = "c",
          max.augmentations = 5,  # maximum number of augmentations added
          max.order=5) # maximum order of residual lags for BG test

##   augmentations      adf p_adf p_bg.p.value.1 p_bg.p.value.2 p_bg.p.value.3
## 1             0 8.296398  0.99    0.005546307   0.0001103848 0.000008381582
## 2             1 5.771917  0.99    0.485949219   0.0208831243 0.000140733984
## 3             2 4.367355  0.99    0.755707867   0.9043717071 0.006860732824
## 4             3 3.066683  0.99    0.750462543   0.9496740461 0.832868347784
## 5             4 3.252579  0.99    0.991992033   0.9729771490 0.908205859670
## 6             5 3.202110  0.99    0.854199944   0.9826726715 0.998104122232
##   p_bg.p.value.4 p_bg.p.value.5
## 1  0.00001907399  0.00003700224
## 2  0.00041767804  0.00065979348
## 3  0.01362072624  0.02641457522
## 4  0.79058700070  0.83992186062
## 5  0.96783873221  0.97249863759
## 6  0.99938185774  0.99929469234

    testdf2(variable = diff(USA[,c("consumption")]), # vector tested
            test.type = "c", # test.type = "c",
            max.augmentations = 5,  # maximum number of augmentations added
            max.order=5) # maximum order of residual lags for BG test

##   augmentations       adf      p_adf p_bg.p.value.1 p_bg.p.value.2
## 1             0 -9.184518 0.01000000     0.04956141    0.003167801
## 2             1 -5.237949 0.01000000     0.30523939    0.246359385
## 3             2 -3.629948 0.01000000     0.87947985    0.926376184
## 4             3 -3.554668 0.01000000     0.97024924    0.906962022
## 5             4 -3.607831 0.01000000     0.86171475    0.984715987
## 6             5 -3.126966 0.02712406     0.98008540    0.998966222
##    p_bg.p.value.3  p_bg.p.value.4  p_bg.p.value.5
## 1 0.0000001672182 0.0000001920137 0.0000006372337
## 2 0.0017024608681 0.0042263851347 0.0089973989528
## 3 0.9548186200858 0.9089565420640 0.9419401977419
## 4 0.9599028563715 0.9609444552864 0.9748349856447
## 5 0.9981523313452 0.9903607166502 0.9970624226973
## 6 0.9995951693166 0.9996078091443 0.9999463124934

 # first-differences seem to have increasing variance
    # thus, let us analyse the first difference of the log-transformed
    # first differences
    
    testdf2(variable = diff(log(USA[,c("dpi")])), # vector tested
            test.type = "c", # test.type = "nc",
            max.augmentations = 5, # maximum number of augmentations added
            max.order=5) # maximum order of residual lags for BG test

##   augmentations        adf p_adf p_bg.p.value.1 p_bg.p.value.2 p_bg.p.value.3
## 1             0 -13.218923  0.01      0.9662664      0.6590377      0.4960841
## 2             1  -8.916091  0.01      0.9100174      0.9923266      0.6236955
## 3             2  -6.824515  0.01      0.9450289      0.9383219      0.9883654
## 4             3  -6.336003  0.01      0.9707415      0.9714525      0.9958945
## 5             4  -7.033210  0.01      0.8707947      0.9863307      0.9962280
## 6             5  -5.947960  0.01      0.9861110      0.9986554      0.9985395
##   p_bg.p.value.4 p_bg.p.value.5
## 1      0.6648265     0.04837356
## 2      0.7793006     0.07764483
## 3      0.9973459     0.14685535
## 4      0.9989003     0.17154874
## 5      0.9994993     0.99966040
## 6      0.9998475     0.99914533

    testdf2(variable = diff(log(USA[,c("consumption")])), # vector tested
            test.type = "c", # test.type = "nc",
            max.augmentations = 5,  # maximum number of augmentations added
            max.order=5) # maximum order of residual lags for BG test

##   augmentations        adf p_adf p_bg.p.value.1 p_bg.p.value.2 p_bg.p.value.3
## 1             0 -13.791807  0.01     0.71451406   0.0002341137   0.0007533067
## 2             1  -8.134460  0.01     0.07989654   0.1829103993   0.1138040669
## 3             2  -6.268350  0.01     0.71253996   0.4629639320   0.1105162682
## 4             3  -6.777003  0.01     0.88973894   0.1969371662   0.0370375602
## 5             4  -6.450499  0.01     0.56548398   0.8323252837   0.8093948218
## 6             5  -6.174957  0.01     0.91795679   0.9945301814   0.9996765731
##   p_bg.p.value.4 p_bg.p.value.5
## 1   0.0004893063   0.0008670902
## 2   0.0405561599   0.0464505336
## 3   0.0493164293   0.0478830419
## 4   0.0730829520   0.0966908739
## 5   0.9059957973   0.9334284979
## 6   0.9997289343   0.9857120180

##########################################################
# Autoregressive Distributed Lagged Model
#########################################################
  
  #  Distirbuted Lagged Model
  # As regressors we use lags of explanatory variables
  # start = c(1960, 1) is set to make comparison between models with different lag order possible

  dl_1 <- dynlm(d(consumption) ~ d(dpi) + L(d(dpi)), data = USA, start = c(1960, 1))
  summary(dl_1)

## 
## Time series regression with "zooreg" data:
## Start = 1960 Q1, End = 2000 Q4
## 
## Call:
## dynlm(formula = d(consumption) ~ d(dpi) + L(d(dpi)), data = USA, 
##     start = c(1960, 1))
## 
## Residuals:
##     Min      1Q  Median      3Q     Max 
## -64.413 -12.797  -1.915  12.594  53.529 
## 
## Coefficients:
##             Estimate Std. Error t value         Pr(>|t|)    
## (Intercept) 12.60067    2.69478   4.676 0.00000615612291 ***
## d(dpi)       0.37915    0.04942   7.673 0.00000000000152 ***
## L(d(dpi))    0.18116    0.04957   3.655         0.000348 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 21.06 on 161 degrees of freedom
## Multiple R-squared:  0.3074, Adjusted R-squared:  0.2988 
## F-statistic: 35.73 on 2 and 161 DF,  p-value: 0.0000000000001437

# alternative model for log-transformed variables
  USA$ln_consumption = log(USA$consumption)
  USA$ln_dpi = log(USA$dpi)
  dl_ln1 <- dynlm(d(ln_consumption) ~ d(ln_dpi) + L(d(ln_dpi)), data = USA, start = c(1960, 1))
  summary(dl_ln1)

## 
## Time series regression with "zooreg" data:
## Start = 1960 Q1, End = 2000 Q4
## 
## Call:
## dynlm(formula = d(ln_consumption) ~ d(ln_dpi) + L(d(ln_dpi)), 
##     data = USA, start = c(1960, 1))
## 
## Residuals:
##        Min         1Q     Median         3Q        Max 
## -0.0202245 -0.0035871 -0.0002189  0.0040243  0.0132598 
## 
## Coefficients:
##               Estimate Std. Error t value          Pr(>|t|)    
## (Intercept)  0.0038889  0.0007819   4.973 0.000001672323833 ***
## d(ln_dpi)    0.4165784  0.0530758   7.849 0.000000000000554 ***
## L(d(ln_dpi)) 0.1655767  0.0530807   3.119           0.00215 ** 
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 0.005956 on 161 degrees of freedom
## Multiple R-squared:  0.3121, Adjusted R-squared:  0.3035 
## F-statistic: 36.52 on 2 and 161 DF,  p-value: 0.00000000000008359

  acf(dl_1$residuals, type='correlation')

  #in order to draw a conclusions based on the model we would like to sort out a problem with autocorrelation
  #has this model sorted out autocorrelation issue? - nope
  resids_ <- as.data.frame(dl_1$residuals)
  colnames(resids_)<-c("res")
  bgtest(res~1,data=resids_, order = 1)

## 
##  Breusch-Godfrey test for serial correlation of order up to 1
## 
## data:  res ~ 1
## LM test = 2.8539, df = 1, p-value = 0.09116

  bgtest(res~1,data=resids_, order = 2)

## 
##  Breusch-Godfrey test for serial correlation of order up to 2
## 
## data:  res ~ 1
## LM test = 11.199, df = 2, p-value = 0.0037

  bgtest(res~1,data=resids_, order = 3)

## 
##  Breusch-Godfrey test for serial correlation of order up to 3
## 
## data:  res ~ 1
## LM test = 22.658, df = 3, p-value = 0.00004758

  bgtest(res~1,data=resids_, order = 4)

## 
##  Breusch-Godfrey test for serial correlation of order up to 4
## 
## data:  res ~ 1
## LM test = 23.717, df = 4, p-value = 0.00009101

  bgtest(res~1,data=resids_, order = 5)

## 
##  Breusch-Godfrey test for serial correlation of order up to 5
## 
## data:  res ~ 1
## LM test = 24.111, df = 5, p-value = 0.0002067

使用 c(1:5) 来显示 5 个lags.

  dl_5_all <- dynlm(d(consumption) ~ d(dpi) + L(d(dpi),c(1:5)), data = USA,start = c(1960, 1))
  summary(dl_5_all)

## 
## Time series regression with "zooreg" data:
## Start = 1960 Q1, End = 2000 Q4
## 
## Call:
## dynlm(formula = d(consumption) ~ d(dpi) + L(d(dpi), c(1:5)), 
##     data = USA, start = c(1960, 1))
## 
## Residuals:
##     Min      1Q  Median      3Q     Max 
## -64.047 -11.994  -1.912  12.038  49.236 
## 
## Coefficients:
##                    Estimate Std. Error t value        Pr(>|t|)    
## (Intercept)         5.64149    3.56385   1.583         0.11544    
## d(dpi)              0.36923    0.05050   7.311 0.0000000000127 ***
## L(d(dpi), c(1:5))1  0.16208    0.04976   3.257         0.00138 ** 
## L(d(dpi), c(1:5))2  0.08255    0.05033   1.640         0.10295    
## L(d(dpi), c(1:5))3  0.05801    0.05077   1.143         0.25497    
## L(d(dpi), c(1:5))4  0.04099    0.05024   0.816         0.41585    
## L(d(dpi), c(1:5))5  0.08489    0.05103   1.664         0.09821 .  
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 20.74 on 157 degrees of freedom
## Multiple R-squared:  0.3451, Adjusted R-squared:  0.3201 
## F-statistic: 13.79 on 6 and 157 DF,  p-value: 0.000000000001476

  dl_5 <- dynlm(d(consumption) ~ d(dpi) + L(d(dpi),c(1,2,5)), data = USA,start = c(1960, 1))
  summary(dl_5)

## 
## Time series regression with "zooreg" data:
## Start = 1960 Q1, End = 2000 Q4
## 
## Call:
## dynlm(formula = d(consumption) ~ d(dpi) + L(d(dpi), c(1, 2, 5)), 
##     data = USA, start = c(1960, 1))
## 
## Residuals:
##     Min      1Q  Median      3Q     Max 
## -65.371 -12.682  -1.721  11.804  48.567 
## 
## Coefficients:
##                        Estimate Std. Error t value         Pr(>|t|)    
## (Intercept)             7.58732    3.25397   2.332          0.02097 *  
## d(dpi)                  0.37793    0.05009   7.545 0.00000000000328 ***
## L(d(dpi), c(1, 2, 5))1  0.17365    0.04902   3.543          0.00052 ***
## L(d(dpi), c(1, 2, 5))2  0.08627    0.04967   1.737          0.08436 .  
## L(d(dpi), c(1, 2, 5))5  0.09302    0.05030   1.849          0.06626 .  
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 20.73 on 159 degrees of freedom
## Multiple R-squared:  0.3372, Adjusted R-squared:  0.3205 
## F-statistic: 20.22 on 4 and 159 DF,  p-value: 0.0000000000001754

  #check which model is better:
    anova(dl_5_all,dl_5)

## Analysis of Variance Table
## 
## Model 1: d(consumption) ~ d(dpi) + L(d(dpi), c(1:5))
## Model 2: d(consumption) ~ d(dpi) + L(d(dpi), c(1, 2, 5))
##   Res.Df   RSS Df Sum of Sq      F Pr(>F)
## 1    157 67530                           
## 2    159 68350 -2   -819.71 0.9529 0.3879

    #conclusion? both are statistically equal so we prefer with more DF (less regressors)

    acf(dl_5$residuals, type='correlation')

    #has this model sorted out autocorrelation issue? - nope
  resids_ <- as.data.frame(dl_5$residuals)
  colnames(resids_)<-c("res")
  bgtest(res~1,data=resids_, order = 1)

## 
##  Breusch-Godfrey test for serial correlation of order up to 1
## 
## data:  res ~ 1
## LM test = 1.2403, df = 1, p-value = 0.2654

  bgtest(res~1,data=resids_, order = 2)

## 
##  Breusch-Godfrey test for serial correlation of order up to 2
## 
## data:  res ~ 1
## LM test = 8.7087, df = 2, p-value = 0.01285

  bgtest(res~1,data=resids_, order = 3)

## 
##  Breusch-Godfrey test for serial correlation of order up to 3
## 
## data:  res ~ 1
## LM test = 16.047, df = 3, p-value = 0.001109

  bgtest(res~1,data=resids_, order = 4)

## 
##  Breusch-Godfrey test for serial correlation of order up to 4
## 
## data:  res ~ 1
## LM test = 16.918, df = 4, p-value = 0.002005

  bgtest(res~1,data=resids_, order = 5)

## 
##  Breusch-Godfrey test for serial correlation of order up to 5
## 
## data:  res ~ 1
## LM test = 17.682, df = 5, p-value = 0.003373

==TODO==: explain coeftest

# Autocorrelation issue has beed not solved, but in DL models the problem is only in inconsistent Variance - Covariance matrix
    # Newey - Davis robust estimator can be used to update the results and draw conclusions
    
    library(lmtest)
    library(sandwich)
    coeftest(dl_5, vcov=vcovHAC(dl_5))

## 
## t test of coefficients:
## 
##                        Estimate Std. Error t value      Pr(>|t|)    
## (Intercept)            7.587319   3.566781  2.1272     0.0349445 *  
## d(dpi)                 0.377935   0.063942  5.9106 0.00000002012 ***
## L(d(dpi), c(1, 2, 5))1 0.173651   0.046350  3.7465     0.0002504 ***
## L(d(dpi), c(1, 2, 5))2 0.086269   0.051880  1.6629     0.0983093 .  
## L(d(dpi), c(1, 2, 5))5 0.093020   0.061803  1.5051     0.1342819    
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

   # Maybe Autoregressive Distirbuted Lagged Model would help?
   # Note: Autocorrelation in residuals for ARDL lead to inconsistency of estimators!
    
   
    ardl_5_all <- dynlm(d(consumption) ~ L(d(consumption),c(1:5)) + d(dpi) + L(d(dpi),c(1,2,5)), data = USA, start = c(1960, 1))
    summary(ardl_5_all)

## 
## Time series regression with "zooreg" data:
## Start = 1960 Q1, End = 2000 Q4
## 
## Call:
## dynlm(formula = d(consumption) ~ L(d(consumption), c(1:5)) + 
##     d(dpi) + L(d(dpi), c(1, 2, 5)), data = USA, start = c(1960, 
##     1))
## 
## Residuals:
##     Min      1Q  Median      3Q     Max 
## -69.815 -10.597   0.132  11.497  46.341 
## 
## Coefficients:
##                             Estimate Std. Error t value     Pr(>|t|)    
## (Intercept)                 4.402872   3.170389   1.389     0.166915    
## L(d(consumption), c(1:5))1  0.069549   0.082560   0.842     0.400871    
## L(d(consumption), c(1:5))2  0.240945   0.082132   2.934     0.003863 ** 
## L(d(consumption), c(1:5))3  0.251578   0.074329   3.385     0.000904 ***
## L(d(consumption), c(1:5))4 -0.080380   0.075909  -1.059     0.291300    
## L(d(consumption), c(1:5))5 -0.006211   0.084126  -0.074     0.941237    
## d(dpi)                      0.300907   0.051961   5.791 0.0000000381 ***
## L(d(dpi), c(1, 2, 5))1      0.079843   0.056471   1.414     0.159414    
## L(d(dpi), c(1, 2, 5))2     -0.047063   0.055210  -0.852     0.395299    
## L(d(dpi), c(1, 2, 5))5      0.041920   0.054999   0.762     0.447111    
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 19.41 on 154 degrees of freedom
## Multiple R-squared:  0.4371, Adjusted R-squared:  0.4042 
## F-statistic: 13.29 on 9 and 154 DF,  p-value: 0.00000000000000135

   ardl_5 <- dynlm(d(consumption) ~ L(d(consumption),c(2,3)) + d(dpi) + L(d(dpi),c(1)), data = USA, start = c(1960, 1))
    summary(ardl_5)

## 
## Time series regression with "zooreg" data:
## Start = 1960 Q1, End = 2000 Q4
## 
## Call:
## dynlm(formula = d(consumption) ~ L(d(consumption), c(2, 3)) + 
##     d(dpi) + L(d(dpi), c(1)), data = USA, start = c(1960, 1))
## 
## Residuals:
##     Min      1Q  Median      3Q     Max 
## -70.136 -11.432  -0.785  11.774  44.642 
## 
## Coefficients:
##                             Estimate Std. Error t value      Pr(>|t|)    
## (Intercept)                  4.44202    2.84257   1.563       0.12012    
## L(d(consumption), c(2, 3))2  0.20552    0.06809   3.018       0.00296 ** 
## L(d(consumption), c(2, 3))3  0.25150    0.06820   3.688       0.00031 ***
## d(dpi)                       0.29335    0.04776   6.142 0.00000000627 ***
## L(d(dpi), c(1))              0.10020    0.04790   2.092       0.03806 *  
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 19.27 on 159 degrees of freedom
## Multiple R-squared:  0.4276, Adjusted R-squared:  0.4132 
## F-statistic:  29.7 on 4 and 159 DF,  p-value: < 0.00000000000000022

    #check which model is better:
    anova(ardl_5_all,ardl_5)

## Analysis of Variance Table
## 
## Model 1: d(consumption) ~ L(d(consumption), c(1:5)) + d(dpi) + L(d(dpi), 
##     c(1, 2, 5))
## Model 2: d(consumption) ~ L(d(consumption), c(2, 3)) + d(dpi) + L(d(dpi), 
##     c(1))
##   Res.Df   RSS Df Sum of Sq      F Pr(>F)
## 1    154 58046                           
## 2    159 59023 -5   -976.74 0.5183 0.7622

    #conclusion? both are statistically equal so we prefer with more DF (less regressors)

    acf(ardl_5$residuals, type='correlation')

    #has this model sorted out autocorrelation issue? - Yes!!
   resids_ <- as.data.frame(ardl_5$residuals)
  colnames(resids_)<-c("res")
  bgtest(res~1,data=resids_, order = 1)

## 
##  Breusch-Godfrey test for serial correlation of order up to 1
## 
## data:  res ~ 1
## LM test = 0.82863, df = 1, p-value = 0.3627

  bgtest(res~1,data=resids_, order = 2)

## 
##  Breusch-Godfrey test for serial correlation of order up to 2
## 
## data:  res ~ 1
## LM test = 1.1992, df = 2, p-value = 0.549

  bgtest(res~1,data=resids_, order = 3)

## 
##  Breusch-Godfrey test for serial correlation of order up to 3
## 
## data:  res ~ 1
## LM test = 1.2328, df = 3, p-value = 0.7452

  bgtest(res~1,data=resids_, order = 4)

## 
##  Breusch-Godfrey test for serial correlation of order up to 4
## 
## data:  res ~ 1
## LM test = 2.5794, df = 4, p-value = 0.6305

  bgtest(res~1,data=resids_, order = 5)

## 
##  Breusch-Godfrey test for serial correlation of order up to 5
## 
## data:  res ~ 1
## LM test = 2.7137, df = 5, p-value = 0.744

###################################################################################################      
    #Diagnostics other than Autocorrelation
    ###################################################################################################
    
    # Diagnostics plots    
    
    par(mfrow=c(2,2))
    plot(ardl_5)

    par(mfrow=c(1,1))
    
    # 1. Residuals vs Fitted
    # 
    # This plot shows if residuals have non-linear patterns. 
    # There could be a non-linear relationship between predictor variables and an outcome variable 
    # and the pattern could show up in this plot if the model doesn’t capture the non-linear relationship. 
    # If you find equally spread residuals around a horizontal line without distinct patterns,
    # that is a good indication you don’t have non-linear relationships.
    
    # 2. Normal Q-Q
    # 
    # This plot shows if residuals are normally distributed. 
    # Do residuals follow a straight line well or do they deviate severely? 
    # It’s good if residuals are lined well on the straight dashed line.
    
    #residuals normality test
    if(!require(interp)){install.packages("interp")}
    library(interp)
    jbTest(as.matrix(residuals(ardl_5)))

## 
## Title:
##  Jarque - Bera Normality Test
## 
## Test Results:
##   PARAMETER:
##     Sample Size: 164
##   STATISTIC:
##     LM: 4.04
##     ALM: 4.688
##   P VALUE:
##     LM p-value: 0.096 
##     ALM p-value: 0.084 
##     Asymptotic: 0.133

    # 3. Scale-Location
    # 
    # It’s also called Spread-Location plot. 
    # This plot shows if residuals are spread equally along the ranges of predictors. 
    # This is how you can check the assumption of equal variance (homoscedasticity). 
    # It’s good if you see a horizontal line with equally (randomly) spread points.
    
    # 4. Residuals vs Leverage
    # 
    # This plot helps us to find influential cases (i.e., subjects) if any. 
    # Not all outliers are influential in linear regression analysis (whatever outliers mean). 
    # Even though data have extreme values, they might not be influential to determine a regression line. 
    # That means, the results wouldn’t be much different if we either include or exclude them from analysis. 
    # They follow the trend in the majority of cases and they don’t really matter; they are not influential. 
    # On the other hand, some cases could be very influential even if they look to be within a reasonable range of the values. 
    # They could be extreme cases against a regression line and can alter the results if we exclude them from analysis. 
    # Another way to put it is that they don’t get along with the trend in the majority of the cases.
    # 
    # Unlike the other plots, this time patterns are not relevant. 
    # We watch out for outlying values at the upper right corner or at the lower right corner. 
    # Those spots are the places where cases can be influential against a regression line. 
    # Look for cases outside of a dashed line, Cook’s distance.
    # When cases are outside of the Cook’s distance (meaning they have high Cook’s distance scores), 
    # the cases are influential to the regression results. The regression results will be altered if we exclude those cases.
    # 
    
    # http://data.library.virginia.edu/diagnostic-plots/

    ## Breusch-Pagan
    bptest(ardl_5,data=USA, studentize=FALSE)

## 
##  Breusch-Pagan test
## 
## data:  ardl_5
## BP = 20.181, df = 4, p-value = 0.00046

    bptest(ardl_5,data=USA)

## 
##  studentized Breusch-Pagan test
## 
## data:  ardl_5
## BP = 15.421, df = 4, p-value = 0.003902

    ## Varaince Inflation Ratio
    vif(ardl_5,data=USA)

##                                GVIF Df GVIF^(1/(2*Df))
## L(d(consumption), c(2, 3)) 1.235631  2        1.054319
## d(dpi)                     1.116469  1        1.056631
## L(d(dpi), c(1))            1.116288  1        1.056545

############################################################
    
#    upload a dataset form AER package named USConsump1993
#    in the dataset relation between income and expenditure
#    in the USE years 1950-1993 are stored
    
##########################################################
  
    data(USConsump1993)
    USA<-as.zoo(USConsump1993)    
    
     
#    Excersie 4
    
    
    

# 4.1
# Asses wheter time series are stationary or not 
# and prepared data to (AR)DL model for expenditure 
  
    
# 4.2 
# Choose the best model from (AR)DL family
    
    
    
    
# 4.3
# Perform diagnostics analysis to check wheter the chosen model fulfills assumptions

Lab 12: Autoregressive Distributed Lag (ARDL) Models