多元时间序列分析的目的可以概括为分析不同时间序列的动态相关性, 从而得到单元序列分析更准确的预测。首先,我们生成k = 3个时间序列,\(y_1, y_2, y_3\), 每个序列长度T为500。从单元时间序列分析得知,分析的重点在残差,即矫正过的时间序列,比较宽松但是很实际的假设是时间序列是一个white noise, 所以这里模拟的序列从简单出发也是white noise。 假定我们已经知道三个序列的Covariance Matrix \(cm\) 为
## z1 z2 z3
## z1 1.2 -0.30 0.80
## z2 -0.3 0.85 1.45
## z3 0.8 1.45 0.68
三个时间序列的平均数 \(mn = (0, 0, 0)'\), 那么我们可以模拟得到这个三时间序列
require(mvtnorm)
cm = matrix(c(1.2, -0.3, 0.1, -.3, 1.5, 0.2, .1, .2, .68),3,3)
mn = c(0, 0, 0)
set.seed(1000) # for replication
zr = rmvnorm(500, mean = mn, sigma = cm)
colnames(zr) = c("z1","z2", "z3")
dim(zr)
## [1] 500 3
head(zr, n = 5)
## z1 z2 z3
## [1,] -0.3209363 -1.3993240 -0.11665802
## [2,] 0.7758741 -1.0788685 -0.35697389
## [3,] -0.6135904 0.9346472 0.03022249
## [4,] -1.3930749 -1.0658197 -0.63522585
## [5,] 0.0680348 -0.2997222 -1.09540040
var(zr) # check covariance matrix
## z1 z2 z3
## z1 1.14415813 -0.2536987 0.07239649
## z2 -0.25369865 1.3557670 0.11946303
## z3 0.07239649 0.1194630 0.65732968
apply(zr, 2, mean) # check mean
## z1 z2 z3
## -0.04239829 0.02280032 0.01503880
我们可以从下图中看到,很明显,他们都是stationary的。 因为先验的设定了他们的covariance和mean,他们的走势没有特别相似或不相似。
library(MTS)
MTSplot(zr)
很自然,我们会考虑到这些数列和自己的lag value 以及其他序列的lag vlaue 是否确有联系,这个问题可以用cross covariance/correlation matrices 中找到提示, 这两个matrices,相对于lag \(\ell\), 依序为如下定义:
\[ \hat{\Gamma_\ell}=\frac{1}{T-1}\sum_{t = \ell+1}^T(z_t -\hat{\mu}_z)(z_{t-\ell} -\hat{\mu}_z)'\] \[ \hat{\rho_\ell}=\hat{D}^{-1}\hat{\Gamma_\ell}\hat{D}^{-1}, \qquad \hat{D} = diag(\hat{\Gamma_0}^{0.5}) =diag(\hat{\sigma}_{ii})\]
用上面的多元序列zr为基础,下面举例计算 \(\ell\) 为0 和 2 的情况, 首先 \(\ell\) 是0的计算过程
nT = nrow(zr)
nT
## [1] 500
nV = ncol(zr)
nV
## [1] 3
tem_zr = scale(zr, center = T, scale = F) # zr - mean(zr) by col
head(tem_zr)
## z1 z2 z3
## [1,] -0.2785380 -1.4221243 -0.13169682
## [2,] 0.8182723 -1.1016688 -0.37201269
## [3,] -0.5711921 0.9118469 0.01518369
## [4,] -1.3506766 -1.0886200 -0.65026464
## [5,] 0.1104331 -0.3225226 -1.11043920
## [6,] 0.2078648 0.1452449 0.03136217
g0 = (t(tem_zr) %*% tem_zr) / (nT - 1)
g0 # Gamma_0
## z1 z2 z3
## z1 1.14415813 -0.2536987 0.07239649
## z2 -0.25369865 1.3557670 0.11946303
## z3 0.07239649 0.1194630 0.65732968
se = (diag(diag(g0)))^0.5
se
## [,1] [,2] [,3]
## [1,] 1.069653 0.000000 0.0000000
## [2,] 0.000000 1.164374 0.0000000
## [3,] 0.000000 0.000000 0.8107587
seinv = solve(se)
rho0 = seinv %*% g0 %*% seinv
rho0
## [,1] [,2] [,3]
## [1,] 1.00000000 -0.2036960 0.08348008
## [2,] -0.20369605 1.0000000 0.12654627
## [3,] 0.08348008 0.1265463 1.00000000
从 \(\rho_0\) 可知,相关性很弱,接着着计算 \(\ell\) 为2的例子
lag = 2
zr1 = tem_zr[(lag + 1):nT,]
zr2 = tem_zr[1:(nT - lag), ]
g2 = (t(zr1) %*% zr2)/(nT - 1)
g2
## z1 z2 z3
## z1 -0.0007782111 -0.0007902644 0.055793009
## z2 -0.0555908528 -0.0452666599 -0.047337904
## z3 0.0354941321 -0.0056331215 0.008798018
rho2 = seinv %*% g2 %*% seinv
rho2
## [,1] [,2] [,3]
## [1,] -0.0006801605 -0.0006345077 0.06433468
## [2,] -0.0446342021 -0.0333882288 -0.05014468
## [3,] 0.0409281311 -0.0059671226 0.01338448
显然,几乎没有相关性存在,以上计算可以很容易的使用MTS包中的ccm函数得出
ccm(zr,2)$ccm
## [1] "Covariance matrix:"
## z1 z2 z3
## z1 1.1442 -0.254 0.0724
## z2 -0.2537 1.356 0.1195
## z3 0.0724 0.119 0.6573
## CCM at lag: 0
## [,1] [,2] [,3]
## [1,] 1.0000 -0.204 0.0835
## [2,] -0.2037 1.000 0.1265
## [3,] 0.0835 0.127 1.0000
## Simplified matrix:
## CCM at lag: 1
## . . .
## . . .
## . . .
## CCM at lag: 2
## . . .
## . . .
## . . .
## Hit Enter for p-value plot of individual ccm:
## [,1] [,2] [,3]
## [1,] 1.00000000 0.0400708655 -0.0006788001
## [2,] -0.20369605 -0.0104526411 -0.0445449337
## [3,] 0.08348008 0.0008291153 0.0408462748
## [4,] -0.20369605 -0.0023695134 -0.0006332387
## [5,] 1.00000000 0.0244876772 -0.0333214523
## [6,] 0.12654627 0.0424406888 -0.0059551884
## [7,] 0.08348008 0.0836350689 0.0642060093
## [8,] 0.12654627 -0.0580104573 -0.0500443914
## [9,] 1.00000000 0.0441315719 0.0133577139
这个包中使用了判别时代替具体数值,具体地,根据某个cross correlation matrices \(\rho_{\ell}\)的中的元素\(\rho_{\ell, ij}\)生成新的matrix S: \[S_{\ell, ij}= \begin{cases}+ & \rho_{\ell, ij} \geq 2/\sqrt{T}\\- & \rho_{\ell, ij} \leq -2/\sqrt{T}\\. & \rho_{\ell, ij} <2/\sqrt{T}\end{cases}\]
类似单元时间序列,在多元情况下,我们也可以用Ljung-Box Test来检验 \(H_0: \rho_1 = \rho_2 = ... \rho_m = 0\), 同样检验的基准为 \(\chi^{2}\)分布: \[Q_k(m) = T^2\sum_{\ell = 1}^m\frac{1}{T-\ell}tr(\hat\Gamma_{\ell}'\hat\Gamma_0^{-1}\Gamma_{\ell}\hat\Gamma_0^{-1})\]
对于序列zr,下面举例计算\(Q_3(2)\)
g1 = t(tem_zr[(1+1): nT, ]) %*% tem_zr[1:(nT-1),]
g1 = g1/(nT -1)
g1
## z1 z2 z3
## z1 0.0459392852 -0.002957088 0.07267625
## z2 -0.0130446087 0.033266118 -0.05487315
## z3 0.0007204752 0.040145423 0.02906713
trpd1 = t(g1) %*% solve(g0) %*% g1 %*% solve(g0)
trs = sum(diag(trpd1)) / (nT - 1)
Q31 = trs * (nT^2)
Q31
## [1] 7.021982
trpd2 = t(g2) %*% solve(g0) %*% g2 %*% solve(g0)
trs2 = sum(diag(trpd2)) / (nT - 2) + trs
Q32 = trs2 * (nT^2)
Q32
## [1] 12.85992
pv = 1 - pchisq(c(Q31, Q32), df = c(nV^2, 2*nV^2))
pv
## [1] 0.6348307 0.7998270
显然,在百分之五的confidence-level的状况下,可以拒绝说有在\(\rho_1, \rho_2\)中的元素都不相关, LJung Box Test 同样可以使用MTS 中的mq函数轻易实现:
mq(zr, lag = 2)
## Ljung-Box Statistics:
## m Q(m) df p-value
## [1,] 1.00 7.02 9.00 0.63
## [2,] 2.00 12.86 18.00 0.80
Tsay, Ruey S(2014): Multivariate Time Series Analysis With R and Financial Applications*, Chapter 1.