Last lecture:
- ARIMA: removal of non-stationary patterns (unit roots and seasonality) recovers stationarity.
This lecture:
- Extracting trends, checking for unit roots.
- Studying a system of two time series.
11/12/2017
We have seen that a time series \(Y_t\) is generated from a series of independent noise terms \(\varepsilon_t\).
In other words, a noise process \(\varepsilon_t\) is transformed to the process \(Y_t\) by a so-called linear filter \[Y_t = \Psi(\mathbb{L}) \varepsilon_t\] where \(\Psi (\cdot)\) is called the transfer function of the filter.
ARIMA(p,1,q) model \[\phi(\mathbb{L})(1-\mathbb{L}) Y_t =\theta(\mathbb{L})\varepsilon_t \\ \mbox{or: } (1-\mathbb{L}) Y_t = \phi(\mathbb{L})^{-1}\theta(\mathbb{L})\varepsilon_t \] where \((1-\mathbb{L})\) is the difference filter, \(\theta(\mathbb{L})\) is the MA filter, and \(\phi(\mathbb{L})\) is the AR filter.
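For illustration, a minimal sketch (simulated data, with arbitrarily chosen coefficients 0.6 and 0.4): applying the difference filter to an ARIMA(1,1,1) series leaves a stationary ARMA(1,1) series.
set.seed(1)
y  = arima.sim(n = 500, list(order = c(1, 1, 1), ar = 0.6, ma = 0.4))
dy = diff(y)                   # the difference filter (1 - L)
arima(dy, order = c(1, 0, 1))  # AR and MA estimates should be near 0.6 and 0.4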
Similarly, for a seasonal ARIMA\((p,1,q)(P, 1, Q)_{d}\) model (with seasonal period \(d\)): \[ \phi (\mathbb{L}) \Phi (\mathbb{L}^d) (1 - \mathbb{L}) (1 - \mathbb{L}^d) Y_t = \theta (\mathbb{L}) \Theta (\mathbb{L}^d) \varepsilon_t\] where \(\Phi (\mathbb{L}^d)\), \((1 - \mathbb{L}^d)\), \(\Theta (\mathbb{L}^d)\) are the seasonal filters.
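For example, with monthly data the two difference filters \((1-\mathbb{L})\) and \((1-\mathbb{L}^{12})\) can be applied with diff(); a minimal sketch on a built-in monthly series (AirPassengers, used here only for illustration):
y  = log(AirPassengers)        # built-in monthly series with trend and seasonality
dy = diff(diff(y), lag = 12)   # apply (1 - L), then the seasonal filter (1 - L^12)
ts.plot(dy)                    # trend and annual cycle are removed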
The linear filter is an additive linear function of previous (or future) variables.
If the filter uses only previous (lagged) variables, it is a one-sided filter. The AR and MA filters are one-sided filters.
A simple two-sided filter \(S(\mathbb{L})\): \[S(\mathbb{L})Y_t = \sum_{j=-k}^{k} a_j Y_{t-j},\] where \(k\) is chosen according to the periodicity of the seasonality. For example, to remove the annual cycle in monthly data we take \(k=6\): \[S(\mathbb{L})Y_t = \sum_{j=-6}^{6} a_j Y_{t-j}\] where \(a_j = 1/12\) for \(j = 0, \pm 1,\dots, \pm 5\) and \(a_{\pm 6} = 1/24\).
library(Quandl); library(xts); library(lubridate); library(forecast)
# Monthly Belgian unemployment series from the ECB, retrieved via Quandl
Bel = Quandl("ECB/STS_M_BE_N_UNEH_LTT000_4_000", type = "xts",
             collapse = "monthly", start_date="2000-01-31")
# Two-sided moving-average weights: 1/24, 1/12 (eleven times), 1/24
a = rep(1,11)
a = c(0.5, a, 0.5)
a = a/sum(a)
trend = filter(Bel, sides=2, a)   # two-sided filter extracts the trend
ts.plot(Bel); lines(trend, col="red")
ts.plot(Bel - trend)              # detrended series
A unit root test determines whether a stochastic trend is present. The most popular test is the augmented Dickey-Fuller (ADF) test.
Suppose that \(Y_{t}\) is difference stationary: \[\nabla Y_{t} = (1- \mathbb{L})Y_t = \mbox{Stationary Components}. \]
The intuition of the ADF test is to express the original process using \(p\) stationary components \(\nabla Y_{t-j}\), \(j=1,\dots, p\). The ADF test uses this parametric autoregressive structure to capture serial correlation.
Test regression \[Y_{t}= \beta D_{t}+\phi Y_{t-1}+\sum_{j=1}^{p}\psi_{j}\nabla Y_{t-j}+\varepsilon_{t}\] \[D_{t}= \mbox{Deterministic terms}\] \[\nabla Y_{t-j}\: \mbox{captures serial correlation}\]
The terms \(\nabla Y_{t-j}\) are stationary. If \(Y_t\) is I(1), then \(\nabla Y_t\) is stationary, which is the case only if \(\phi=1\).
ADF t-statistic is \[\mbox{ADF}_{t}= t_{\phi=1}=\frac{\hat{\phi}-1}{\mbox{s.e.}(\hat{\phi})}\]
The ADF test tests the null hypothesis that a time series \(Y_t\) is I(1) against the alternative that it is I(0): \[H_{0}: \quad\phi=1\quad(\phi(z)=0\mbox{ has a unit root})\] \[H_{1}: \quad|\phi|<1\quad(\phi(z)=0\mbox{ has roots outside the unit circle})\]
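Before using a packaged test, the test regression above can also be fitted directly by OLS. A minimal sketch with only a constant as the deterministic term and \(p = 2\) lagged differences (a simulated random walk, so the null of a unit root should typically not be rejected):
set.seed(1)
y  = cumsum(rnorm(500))        # a random walk, i.e. an I(1) series
dy = c(NA, diff(y))            # dy[t] = y[t] - y[t-1]
t0 = 4:length(y)               # indices with p = 2 lagged differences available
ylag1 = y[t0 - 1]; dlag1 = dy[t0 - 1]; dlag2 = dy[t0 - 2]
fit = lm(y[t0] ~ ylag1 + dlag1 + dlag2)
phi.hat = coef(fit)["ylag1"]
se.phi  = coef(summary(fit))["ylag1", "Std. Error"]
(phi.hat - 1)/se.phi           # ADF t-statistic; compare with Dickey-Fuller critical values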
library(tseries)
y = arima.sim(n=1000, list(order = c(0, 1, 0)))   # simulate a pure random walk (one unit root)
adf.test(y)
## 
##  Augmented Dickey-Fuller Test
## 
## data:  y
## Dickey-Fuller = -2.2119, Lag order = 9, p-value = 0.4886
## alternative hypothesis: stationary
adf.test(Bel)
## 
##  Augmented Dickey-Fuller Test
## 
## data:  Bel
## Dickey-Fuller = -2.2102, Lag order = 6, p-value = 0.4877
## alternative hypothesis: stationary
adf.test(na.omit(diff(Bel))) # difference filter
## 
##  Augmented Dickey-Fuller Test
## 
## data:  na.omit(diff(Bel))
## Dickey-Fuller = -6.7769, Lag order = 6, p-value = 0.01
## alternative hypothesis: stationary
ARIMA: a single variable model.
A system: joint behavior of multiple variables.
Multiple time series.
It is only when the dynamic characteristics of a system are understood that intelligent direction, manipulation, and control of the system become possible.
In static cases, regression explores the relationship between two variables. Predictor variables either directly cause the response or provide a plausible explanation of it.
We can represent a dynamic relationship connecting two series by a linear filter.
Pairs of time series \((X_t, Y_t)\), e.g. a production system with one input resource \(X\) and one output product \(Y\). A steady-state relationship \[ Y_t = \Psi (\mathbb{L}) X_t \] \(Y_t\) is a function of current and past values of \(X_t\). The sum of the filter weights, \(\Psi(1)\), is called the steady-state gain of the filter \(\Psi (\mathbb{L})\).
Consider a linear filter of the form: \[ \begin{align} Y_t =& \psi_0 X_t + \psi_1 X_{t-1} +\cdots \\ =& (\psi_0 + \psi_1 \mathbb{L} + \psi_2 \mathbb{L}^2 +\cdots) X_t = \Psi (\mathbb{L}) X_t \end{align} \]
\(\Psi (\cdot )\) is the transfer function of the filter.
\(\psi_0,\psi_1,\psi_2, \dots\) form the impulse response function of the system. (Recall the analogous expansion in AR models.)
\(\psi_j\) may be regarded as \(Y\)'s response at time \(t\) to a unit pulse input in \(X\) at time \(t-j\).
A more general form \[ \begin{align} \delta_0 Y_t + \delta_1 Y_{t-1} + \cdots =& \omega_0 X_t + \omega_1 X_{t-1} +\cdots \\ \delta(\mathbb{L}) Y_t= & \omega(\mathbb{L}) X_t \\ Y_t =& \delta^{-1}(\mathbb{L}) \omega(\mathbb{L}) X_t = \Psi(\mathbb{L}) X_t \end{align} \]
An even more general form, where the system is also corrupted by noise: \[ \begin{align} Y_t= & \delta^{-1}(\mathbb{L}) \omega(\mathbb{L}) X_t + n_0 \varepsilon_t + n_1 \varepsilon_{t-1} + \cdots \\ Y_t =& \Psi (\mathbb{L}) X_t + N(\mathbb{L})\varepsilon_t. \end{align} \]
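To see the impulse response weights concretely, one can feed a unit pulse through a rational filter. A minimal sketch for the filter \((1 - 0.5\mathbb{L})Y_t = (5 + 2\mathbb{L})X_t\) (the same coefficients as the worked example later in this lecture, with the constant and the noise omitted):
x = c(1, rep(0, 9))            # unit pulse in X at time 1
y = numeric(10)
y[1] = 5*x[1]
for (t in 2:10) y[t] = 0.5*y[t-1] + 5*x[t] + 2*x[t-1]
round(y, 4)                    # psi_0, psi_1, psi_2, ... = 5, 4.5, 2.25, 1.125, ...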
The cross-correlation function between the input and the output plays the role of the autocorrelation function for a univariate time series.
If \((X_t, Y_t)\) are stationary, then cross-covariance \[ \gamma_{XY} (k) =\mathbb{E} \left[(X_t - \mathbb{E}[X_t]) (Y_{t + k} - \mathbb{E}[Y_{t + k}]) \right]\]
CCF is \[ \rho_{XY} (k) = \frac{\gamma_{XY} (k)}{\sigma_X \sigma_Y}. \]
Note: In general, \(\rho_{XY} (k)\) is not equal to \(\rho_{XY}(-k)\). However \(\gamma_X (k)\), as the covariance between \(X_t\) and \(X_{t-k}\), is symmetric \(\gamma_X (k)=\gamma_X (-k)\).
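A minimal sketch of this asymmetry: if \(Y\) responds to \(X\) with a one-period delay, the correlation is strong in one lag direction only. (R's ccf(x, y) reports, at lag \(k\), the correlation between x[t+k] and y[t], so the sign convention should be checked against the definition above.)
set.seed(1)
x = rnorm(500)
y = c(0, 0.9*x[-500]) + rnorm(500, sd = 0.1)   # Y_t responds to X_{t-1}
cor(x[-500], y[-1])    # cor(X_t, Y_{t+1}): strong
cor(x[-1],  y[-500])   # cor(X_t, Y_{t-1}): approximately zero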
set.seed(2018); X = arima.sim(n=1000, list(ar=0.2, ma=0))   # input: an AR(1) process
Y = rep(0,1000); Y[1] = X[1]
# output: Y_t = 2 + 0.5 Y_{t-1} + 5 X_t + 2 X_{t-1} + noise
for (t in 2:1000){Y[t] = 2 + 0.5*Y[t-1] + 5*X[t] + 2*X[t-1] + rnorm(1)}
ccf(Y,X)
acf(X)
acf(Y)
In the static case, a regression of \(Y\) on \(X\) often implies a causal effect (changes in \(X\) cause changes in \(Y\)).
Does high correlation imply causal effects between two time series?
For time series variables, an apparent causal relationship can be induced by underlying common trends, especially unit roots.
In this case, standard regression analysis does not work.
Suppose \(Y_t = C(\mathbb{L})\varepsilon_t\) and \(X_t = C(\mathbb{L})e_t\) where \(e_t\) and \(\varepsilon_t\) are two independent white noises.
The common term \(C(\mathbb{L})\) in \(Y_t\) and \(X_t\) can induce a significant cross-correlation.
Let \(C(\mathbb{L})= (1- \mathbb{L})^{-1}\), i.e. \((1-\mathbb{L})Y_t = \varepsilon_t\) and \((1-\mathbb{L})X_t = e_t\); then a unit root exists for both \(Y_t\) and \(X_t\).
In other words, two independent processes \(X_t\) and \(Y_t\) may show a high cross-correlation when such transfer functions are present.
x = rnorm(100); y = rnorm(100); cor(x,y)
## [1] 0.07911594
for(i in 2:100) {
  x[i] = x[i-1] + rnorm(1)   # turn x and y into two independent random walks
  y[i] = y[i-1] + rnorm(1)}; cor(x,y)
## [1] -0.7669698
summary(lm(y~ x))$coef
##              Estimate Std. Error   t value     Pr(>|t|)
## (Intercept) 12.473138 1.41098050   8.84005 3.968860e-14
## x           -1.031262 0.08715624 -11.83234 1.364193e-20
plot.ts(x, ylim=c(-30,30)); lines(y)
X = arima.sim(n=1000, list(order = c(1,1,0), ar = 0.7))   # an I(1) process
Y = arima.sim(n=1000, list(order = c(0,1,1), ma = 0.3))   # an independent I(1) process
cor(X,Y)
## [1] -0.4342586
summary(lm(Y~ X))$coef
##               Estimate Std. Error    t value     Pr(>|t|)
## (Intercept) -5.2096738 2.20921501  -2.358156 1.855786e-02
## X           -0.3055914 0.02005546 -15.237317 2.694838e-47
ccf(Y,X)
When \(X_t\) and \(Y_t\) are two independent processes, the observed correlation is induced by the transfer functions, not by a genuine relationship.
Standard linear regression is therefore not appropriate. One needs to identify the transfer function \[Y_t= \Psi(\mathbb{L})X_t + N_t = \delta^{-1}(\mathbb{L}) \omega(\mathbb{L}) X_t + N_t\] where the system is corrupted by noise \(N_t\), and \(Y_t\) and \(X_t\) are ARIMA processes.
Note \(\delta(\mathbb{L}) = 1 - \delta_1 \mathbb{L} -\cdots - \delta_r \mathbb{L}^r\) and \(\omega(\mathbb{L}) = \omega_0 + \omega_1 \mathbb{L} +\cdots + \omega_s \mathbb{L}^s\).
Our target: identify \(\delta_1,\dots \delta_r\) and \(\omega_1,\dots, \omega_s\).
\((1 - 0.5\mathbb{L}) Y_t = 2 + 5 X_t + 2X_{t-1} + \varepsilon_t\) or, equivalently,
\(Y_t = 4 + \frac{(5 + 2\mathbb{L})}{(1 - 0.5\mathbb{L})} X_t + N_t\) (the constant becomes \(2/(1-0.5) = 4\)).
set.seed(2018); X = arima.sim(n=1000, list(ar=0.2,ma=0))
Y=rep(0,1000); Y[1] = X[1];
for (t in 2:1000){Y[t] = 2 + 0.5*Y[t-1] + 5*X[t] + 2*X[t-1] + rnorm(1)}
# naive static regression; note lag(X) on a ts is not aligned inside lm(),
# so it is collinear with X and dropped: the true dynamics are not recovered
summary(lm(Y~ 1 + X + lag(X)))$coef
##             Estimate Std. Error  t value      Pr(>|t|)
## (Intercept) 3.911712  0.1870019 20.91803  7.432406e-81
## X           5.866740  0.1801060 32.57381 3.906933e-159
\(Y_t = 4 + \frac{(5 + 2\mathbb{L})}{(1 - 0.5\mathbb{L})} X_t + N_t\)
library(MTS)   # MTS: multivariate time series
transf = tfm1(Y, X, orderX = c(1,1,0), orderN = c(0,0,0))   # fit the transfer function model
## Delay:  0
## Transfer function coefficients & s.e.:
## in the order: constant, omega, and delta:  1 2 1
##        [,1]   [,2]   [,3]    [,4]
## v    3.9142 5.0061 1.9735 0.49791
## se.v 0.0368 0.0361 0.0569 0.00482
# orderX = c(r, s, b): r and s are the degrees of the denominator (delta)
# and numerator (omega) polynomials, and b is the delay
acf(transf$residuals)
ccf(transf$residuals, X)
Detrend a trending process.
Test for the existence of a unit root.
Relation between two time series.
Identify the transfer function of this relation.