Introduction

Causality is the agency of efficacy that connects one process (the cause) with another process or state (the effect), where the first is understood to be partly repsonsible for the second, and the second is dependent on the first. In gegeral, a process has many causes, which are said to be causal factors for it, and all lie in its past. An effect can in turn be a cause of many other effects.

The Granger causality is a statistical hypothesis test for determing whether one time series is useful in forecasting another. A time series X is said to Granger-cause Y if it can be shown that those X values provide statistically significant information about future values of Y through a series of t-tests and F-tests on lagged values of X. * * * ## Principle and theory

Principle

There are two underlying principles: 1. The cause happens prior to its effect. 2. THe cause has unique information about the future values of its effect.

\[P[Y(t+1) \in A | I(t)] \neq P[Y(t+1) \in I_{-X}(t)]\] where P referes to probability, A is an arbitrary non-empty set and I(t) and \(I_{-X}(t)\) respectively denote the information available as of time t in the entire universe, and that in the modified universe in which X is executed. If the above hypothesis is accepted, we say that X Granger-causes Y.

Mathmatical statement and null hypothesis

Let y and x be stationary time series. To test null hypothesis that x does not Granger-cause y, one first finds the proper lagged values of y to include in a univariate autoregression of y: \[y_t = a_0 + a_1y_{t-1} + a_2y_{t-2} + ... + a_my_{t-m} +error_t\] next, the autoregression is augmented by including lagged values of x: \[y_t = a_0 + a_1y_{t-1} + a_2y_{t-2} + ... + a_my_{t-m} +error_t + b_px_{t-p}+...+b_qx_{t-q}+error_t.\] The null hypothesis that x does not Granger-cause y is not rejected f and only if no lagged values of x are retained in the regression.

Multivariate analysis (conditionaly Granger causality test)

Multivariate Granger causality analysis is usually performed by fitting a vector autoregressive model (VAR) to the time series. \[X(t) = \sum_{\tau=1}^{L}{A_\tau X(t-\tau) + \epsilon(t)}\] where \(\epsilon(t)\) is a white Gaussian random vector and \(A_\tau\) is a mtrix for every \(\tau\). If at least one of \(A_\tau(j,i)\) for \(\tau = 1 ... L\) is significantly larger than zero (in absolute value), \(X_i\) is called Granger causes of another time series \(X_j\).

F-statistic

The test is usually done in Wald or F test.

\[F = \frac{\frac{RSS_1-RSS_2}{P_2-P_1}}{\frac{RSS_2}{n-p_2}}\] The null hypothesis (model 2 is not better than model 1) is rejected if the F calculated from the data is greater than the critical value of the F-distribution for some desired false-rejection probability (e.g. 0.05). The F-test is a Wald test.


In the following example, a granger-causality test using the Toda-Yamamoto method are implemented. ref: https://www.christophpfeiffer.org/2012/11/07/toda-yamamoto-implementation-in-r/

The recommended steps: * 1. find the integration order I(k) using ndiffs * 2. select optimal lag order p(1…m) using VARselect which may return more than one,max =4 values * 3. Build the VAR model, do serial.test on the VAR.p models and select the one with the most serial correlation for VAR models * 4. Build an augumented VAR (p+k) * 5. Run causality test from VARS model * 6. Save the two-way pairwise results to an dataframe

library(fUnitRoots)
## Loading required package: urca
## Loading required package: timeDate
## Loading required package: timeSeries
## Loading required package: fBasics
## 
## Rmetrics Package fBasics
## Analysing Markets and calculating Basic Statistics
## Copyright (C) 2005-2014 Rmetrics Association Zurich
## Educational Software for Financial Engineering and Computational Science
## Rmetrics is free software and comes with ABSOLUTELY NO WARRANTY.
## https://www.rmetrics.org --- Mail to: info@rmetrics.org
## 
## Attaching package: 'fUnitRoots'
## The following objects are masked from 'package:urca':
## 
##     punitroot, qunitroot, unitrootTable
library(urca)
library(vars)
## Loading required package: MASS
## Loading required package: strucchange
## Loading required package: zoo
## 
## Attaching package: 'zoo'
## The following object is masked from 'package:timeSeries':
## 
##     time<-
## The following objects are masked from 'package:base':
## 
##     as.Date, as.Date.numeric
## Loading required package: sandwich
## Loading required package: lmtest
library(aod)
library(zoo)
library(tseries)

Load data

cof <- read.csv("coffee_data.csv", header=T,sep=";")
names(cof)
## [1] "Date"    "Arabica" "Robusta"
#Adjust Date format
cof["Date"]<-paste(sub("M","-",cof$Date),"-01",sep="")
#Visualize
plot(as.Date(cof$Date),cof$Arabica,type="l",col="black",lwd=2)
lines(as.Date(cof$Date),cof$Robusta,col="blue",lty=2,lwd=1)
legend("topleft",c("Arabica","Robusta"),col=c("black","blue"),lty=c(1,2),lwd=c(2,1),bty="n")

Possible structural break in 1970s. Therefore only values from 1976:01 onwards are regarded

cof1<-cof[193:615,]
#Visualize
plot(as.Date(cof1$Date),cof1$Arabica,type="l",col="black",lwd=2,ylim=range(cof1$Robusta))
lines(as.Date(cof1$Date),cof1$Robusta,col="blue",lty=2,lwd=1)
legend("topright",c("Arabica","Robusta"),col=c("black","blue"),lty=c(1,2),lwd=c(2,1),bty="n")

#Test for unit roots
summary(ur.df(cof$Arabica))
## 
## ############################################### 
## # Augmented Dickey-Fuller Test Unit Root Test # 
## ############################################### 
## 
## Test regression none 
## 
## 
## Call:
## lm(formula = z.diff ~ z.lag.1 - 1 + z.diff.lag)
## 
## Residuals:
##      Min       1Q   Median       3Q      Max 
## -136.541   -5.926    0.128    7.416  153.326 
## 
## Coefficients:
##             Estimate Std. Error t value Pr(>|t|)    
## z.lag.1    -0.001184   0.003383  -0.350    0.726    
## z.diff.lag  0.282005   0.039025   7.226 1.49e-12 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 21.43 on 611 degrees of freedom
## Multiple R-squared:  0.07893,    Adjusted R-squared:  0.07591 
## F-statistic: 26.18 on 2 and 611 DF,  p-value: 1.236e-11
## 
## 
## Value of test-statistic is: -0.35 
## 
## Critical values for test statistics: 
##       1pct  5pct 10pct
## tau1 -2.58 -1.95 -1.62
#Test for unit roots
summary(ur.df(cof$Robusta))
## 
## ############################################### 
## # Augmented Dickey-Fuller Test Unit Root Test # 
## ############################################### 
## 
## Test regression none 
## 
## 
## Call:
## lm(formula = z.diff ~ z.lag.1 - 1 + z.diff.lag)
## 
## Residuals:
##     Min      1Q  Median      3Q     Max 
## -96.278  -3.314   0.416   5.359 109.203 
## 
## Coefficients:
##             Estimate Std. Error t value Pr(>|t|)    
## z.lag.1    -0.004149   0.003236  -1.282      0.2    
## z.diff.lag  0.372268   0.037648   9.888   <2e-16 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 15.59 on 611 degrees of freedom
## Multiple R-squared:  0.1386, Adjusted R-squared:  0.1358 
## F-statistic: 49.15 on 2 and 611 DF,  p-value: < 2.2e-16
## 
## 
## Value of test-statistic is: -1.2821 
## 
## Critical values for test statistics: 
##       1pct  5pct 10pct
## tau1 -2.58 -1.95 -1.62
#Test for unit root using kpss
summary(ur.kpss(cof$Arabica))
## 
## ####################### 
## # KPSS Unit Root Test # 
## ####################### 
## 
## Test is of type: mu with 6 lags. 
## 
## Value of test-statistic is: 2.2933 
## 
## Critical value for a significance level of: 
##                 10pct  5pct 2.5pct  1pct
## critical values 0.347 0.463  0.574 0.739
summary(ur.kpss(cof$Arabica))
## 
## ####################### 
## # KPSS Unit Root Test # 
## ####################### 
## 
## Test is of type: mu with 6 lags. 
## 
## Value of test-statistic is: 2.2933 
## 
## Critical value for a significance level of: 
##                 10pct  5pct 2.5pct  1pct
## critical values 0.347 0.463  0.574 0.739
ur.df(diff(cof$Arabica,1))
## 
## ############################################################### 
## # Augmented Dickey-Fuller Test Unit Root / Cointegration Test # 
## ############################################################### 
## 
## The value of the test statistic is: -13.6431
ur.df(diff(cof$Robusta,1))
## 
## ############################################################### 
## # Augmented Dickey-Fuller Test Unit Root / Cointegration Test # 
## ############################################################### 
## 
## The value of the test statistic is: -14.6622
ur.kpss(diff(cof$Arabica,1))
## 
## ####################################### 
## # KPSS Unit Root / Cointegration Test # 
## ####################################### 
## 
## The value of the test statistic is: 0.0897
ur.kpss(diff(cof$Robusta,1))
## 
## ####################################### 
## # KPSS Unit Root / Cointegration Test # 
## ####################################### 
## 
## The value of the test statistic is: 0.042

Since first order differencing eliminates the unit root, the maximum order of integration is concluded to be I(1).

Set up VAR-Model

select lag order // either 2 or 6

VARselect(cof1[,2:3],lag=20,type="both")
## $selection
## AIC(n)  HQ(n)  SC(n) FPE(n) 
##      6      2      2      6 
## 
## $criteria
##                  1           2           3           4           5
## AIC(n)    11.07263    10.94173    10.94208    10.95668    10.95945
## HQ(n)     11.10405    10.98887    11.00493    11.03525    11.05373
## SC(n)     11.15201    11.06080    11.10085    11.15513    11.19760
## FPE(n) 64384.40808 56485.15349 56505.29955 57336.74180 57496.76634
##                  6           7           8           9          10
## AIC(n)    10.93581    10.95072    10.95527    10.95470    10.94995
## HQ(n)     11.04580    11.07643    11.09669    11.11184    11.12280
## SC(n)     11.21365    11.26825    11.31249    11.35162    11.38656
## FPE(n) 56154.60934 56999.54484 57261.61258 57231.72097 56963.70710
##                 11          12          13          14          15
## AIC(n)    10.95840    10.97315    10.99035    11.00656    11.00542
## HQ(n)     11.14696    11.17743    11.21035    11.24227    11.25684
## SC(n)     11.43470    11.48914    11.54604    11.60194    11.64049
## FPE(n) 57450.43593 58308.89763 59325.87222 60301.75016 60240.00400
##                 16          17          18          19          20
## AIC(n)    11.01678    11.02608    11.02947    11.04578    11.05028
## HQ(n)     11.28391    11.30893    11.32804    11.36006    11.38027
## SC(n)     11.69154    11.74053    11.78362    11.83961    11.88380
## FPE(n) 60936.28891 61514.98344 61734.49667 62761.22078 63057.02316
##VAR Model, lag=2
V.2<-VAR(cof1[,2:3],p=2,type="both")
st <- serial.test(V.2)
str(st)
## List of 2
##  $ resid : num [1:421, 1:2] -6.63 47 9.04 20.53 -36.04 ...
##   ..- attr(*, "dimnames")=List of 2
##   .. ..$ : chr [1:421] "1" "2" "3" "4" ...
##   .. ..$ : chr [1:2] "Arabica" "Robusta"
##  $ serial:List of 5
##   ..$ statistic: Named num 106
##   .. ..- attr(*, "names")= chr "Chi-squared"
##   ..$ parameter: Named num 56
##   .. ..- attr(*, "names")= chr "df"
##   ..$ p.value  : Named num 5.64e-05
##   .. ..- attr(*, "names")= chr "Chi-squared"
##   ..$ method   : chr "Portmanteau Test (asymptotic)"
##   ..$ data.name: chr "Residuals of VAR object V.2"
##   ..- attr(*, "class")= chr "htest"
##  - attr(*, "class")= chr "varcheck"
##VAR-Model, lag=6
V.6<-VAR(cof1[,2:3],p=6,type="both")
serial.test(V.6)
## 
##  Portmanteau Test (asymptotic)
## 
## data:  Residuals of VAR object V.6
## Chi-squared = 37.242, df = 40, p-value = 0.5951
#Stability analysis
1/roots(V.6)[[1]] # ">1"
## [1] 1.033862
1/roots(V.6)[[2]] # ">1"
## [1] 1.15703
#Alternative stability analyis
plot(stability(V.6)) ## looks fine

Model with p=6 is less likely to be serially correlated. Thus model with p=6 is selected.

Model with additional lag is set up.

V.7<-VAR(cof1[,2:3],p=7,type="both")
V.7$varresult
## $Arabica
## 
## Call:
## lm(formula = y ~ -1 + ., data = datamat)
## 
## Coefficients:
## Arabica.l1  Robusta.l1  Arabica.l2  Robusta.l2  Arabica.l3  Robusta.l3  
##   1.150779    0.132064   -0.036325   -0.204472   -0.232307    0.273024  
## Arabica.l4  Robusta.l4  Arabica.l5  Robusta.l5  Arabica.l6  Robusta.l6  
##   0.090439   -0.279599   -0.052212    0.008847    0.185130   -0.119799  
## Arabica.l7  Robusta.l7       const       trend  
##  -0.139132    0.196647    8.097807    0.003104  
## 
## 
## $Robusta
## 
## Call:
## lm(formula = y ~ -1 + ., data = datamat)
## 
## Coefficients:
## Arabica.l1  Robusta.l1  Arabica.l2  Robusta.l2  Arabica.l3  Robusta.l3  
##   -0.04887     1.42130     0.20041    -0.65652    -0.22869     0.29542  
## Arabica.l4  Robusta.l4  Arabica.l5  Robusta.l5  Arabica.l6  Robusta.l6  
##    0.07800    -0.12070    -0.09070     0.12401     0.20479    -0.24496  
## Arabica.l7  Robusta.l7       const       trend  
##   -0.09932     0.12843    11.11522    -0.02227
summary(V.7)
## 
## VAR Estimation Results:
## ========================= 
## Endogenous variables: Arabica, Robusta 
## Deterministic variables: both 
## Sample size: 416 
## Log Likelihood: -3496.75 
## Roots of the characteristic polynomial:
## 0.9593 0.8435 0.8435 0.7787 0.7787 0.6966 0.6966 0.6274 0.6274 0.6102 0.6102 0.5987 0.5756 0.1638
## Call:
## VAR(y = cof1[, 2:3], p = 7, type = "both")
## 
## 
## Estimation results for equation Arabica: 
## ======================================== 
## Arabica = Arabica.l1 + Robusta.l1 + Arabica.l2 + Robusta.l2 + Arabica.l3 + Robusta.l3 + Arabica.l4 + Robusta.l4 + Arabica.l5 + Robusta.l5 + Arabica.l6 + Robusta.l6 + Arabica.l7 + Robusta.l7 + const + trend 
## 
##             Estimate Std. Error t value Pr(>|t|)    
## Arabica.l1  1.150779   0.084136  13.678   <2e-16 ***
## Robusta.l1  0.132064   0.116569   1.133   0.2579    
## Arabica.l2 -0.036325   0.131807  -0.276   0.7830    
## Robusta.l2 -0.204472   0.194898  -1.049   0.2948    
## Arabica.l3 -0.232307   0.134396  -1.729   0.0847 .  
## Robusta.l3  0.273024   0.204930   1.332   0.1835    
## Arabica.l4  0.090439   0.134762   0.671   0.5025    
## Robusta.l4 -0.279599   0.205428  -1.361   0.1743    
## Arabica.l5 -0.052212   0.134478  -0.388   0.6980    
## Robusta.l5  0.008847   0.203831   0.043   0.9654    
## Arabica.l6  0.185130   0.130472   1.419   0.1567    
## Robusta.l6 -0.119799   0.193642  -0.619   0.5365    
## Arabica.l7 -0.139132   0.084410  -1.648   0.1001    
## Robusta.l7  0.196647   0.116618   1.686   0.0925 .  
## const       8.097807   6.108936   1.326   0.1857    
## trend       0.003104   0.016641   0.187   0.8521    
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## 
## Residual standard error: 25.2 on 400 degrees of freedom
## Multiple R-Squared: 0.9461,  Adjusted R-squared: 0.9441 
## F-statistic: 468.1 on 15 and 400 DF,  p-value: < 2.2e-16 
## 
## 
## Estimation results for equation Robusta: 
## ======================================== 
## Robusta = Arabica.l1 + Robusta.l1 + Arabica.l2 + Robusta.l2 + Arabica.l3 + Robusta.l3 + Arabica.l4 + Robusta.l4 + Arabica.l5 + Robusta.l5 + Arabica.l6 + Robusta.l6 + Arabica.l7 + Robusta.l7 + const + trend 
## 
##            Estimate Std. Error t value Pr(>|t|)    
## Arabica.l1 -0.04887    0.06093  -0.802   0.4230    
## Robusta.l1  1.42130    0.08442  16.835  < 2e-16 ***
## Arabica.l2  0.20041    0.09546   2.099   0.0364 *  
## Robusta.l2 -0.65652    0.14115  -4.651 4.49e-06 ***
## Arabica.l3 -0.22869    0.09733  -2.350   0.0193 *  
## Robusta.l3  0.29542    0.14842   1.990   0.0472 *  
## Arabica.l4  0.07800    0.09760   0.799   0.4247    
## Robusta.l4 -0.12070    0.14878  -0.811   0.4177    
## Arabica.l5 -0.09070    0.09739  -0.931   0.3523    
## Robusta.l5  0.12401    0.14762   0.840   0.4014    
## Arabica.l6  0.20479    0.09449   2.167   0.0308 *  
## Robusta.l6 -0.24496    0.14024  -1.747   0.0815 .  
## Arabica.l7 -0.09932    0.06113  -1.625   0.1050    
## Robusta.l7  0.12843    0.08446   1.521   0.1291    
## const      11.11522    4.42435   2.512   0.0124 *  
## trend      -0.02227    0.01205  -1.848   0.0654 .  
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## 
## Residual standard error: 18.25 on 400 degrees of freedom
## Multiple R-Squared: 0.9702,  Adjusted R-squared: 0.9691 
## F-statistic: 867.7 on 15 and 400 DF,  p-value: < 2.2e-16 
## 
## 
## 
## Covariance matrix of residuals:
##         Arabica Robusta
## Arabica   634.9   370.5
## Robusta   370.5   333.0
## 
## Correlation matrix of residuals:
##         Arabica Robusta
## Arabica  1.0000  0.8057
## Robusta  0.8057  1.0000

Wald-test (H0: Robusta does not Granger-cause Arabica)

wald.test(b=coef(V.7$varresult[[1]]), Sigma=vcov(V.7$varresult[[1]]), Terms=c(2,4,6,8,10,12))
## Wald test:
## ----------
## 
## Chi-squared test:
## X2 = 8.6, df = 6, P(> X2) = 0.2

Could not be rejected (X2=8.6; p=0.2)

Wald.test (H0: Arabica does not Granger-cause Robusta)

wald.test(b=coef(V.7$varresult[[2]]), Sigma=vcov(V.7$varresult[[2]]), Terms= c(1,3,5,7,9,11))
## Wald test:
## ----------
## 
## Chi-squared test:
## X2 = 12.3, df = 6, P(> X2) = 0.056

Could be rejected at 10% (X2=12.3; p=0.056)

It seems that Arabica Granger-causes Robusta prices, but not the other way around.


Discussion with alternative

While using the causality function from library {vars},the granger causality test shows slightly different results but reach same statistical conclusion.

# use causality function under vars
causality(V.7,'Robusta')
## $Granger
## 
##  Granger causality H0: Robusta do not Granger-cause Arabica
## 
## data:  VAR object V.7
## F-Test = 1.2416, df1 = 7, df2 = 800, p-value = 0.2771
## 
## 
## $Instant
## 
##  H0: No instantaneous causality between: Robusta and Arabica
## 
## data:  VAR object V.7
## Chi-squared = 163.76, df = 1, p-value < 2.2e-16

H0: Robusta do not Granger-cause Arabica will not be rejected since p-value > 0.1 at 10%. So Robusta does not Granger-cause Arabica.

# use causality function under vars
causality(V.7,'Arabica')
## $Granger
## 
##  Granger causality H0: Arabica do not Granger-cause Robusta
## 
## data:  VAR object V.7
## F-Test = 1.8527, df1 = 7, df2 = 800, p-value = 0.07446
## 
## 
## $Instant
## 
##  H0: No instantaneous causality between: Arabica and Robusta
## 
## data:  VAR object V.7
## Chi-squared = 163.76, df = 1, p-value < 2.2e-16

H0: Arabica do not Granger-cause Robusta will be rejected at 10% (p = 0.07). Same conclusion as using Wald-test for Granger test.

# test for var(6) which does not count the cointegration
causality(V.6,'Robusta')
## $Granger
## 
##  Granger causality H0: Robusta do not Granger-cause Arabica
## 
## data:  VAR object V.6
## F-Test = 0.85024, df1 = 6, df2 = 806, p-value = 0.5313
## 
## 
## $Instant
## 
##  H0: No instantaneous causality between: Robusta and Arabica
## 
## data:  VAR object V.6
## Chi-squared = 164.49, df = 1, p-value < 2.2e-16

Robusta do not Granger-cause Arabica

# test for var(6)
causality(V.6,'Arabica')
## $Granger
## 
##  Granger causality H0: Arabica do not Granger-cause Robusta
## 
## data:  VAR object V.6
## F-Test = 1.7365, df1 = 6, df2 = 806, p-value = 0.1096
## 
## 
## $Instant
## 
##  H0: No instantaneous causality between: Arabica and Robusta
## 
## data:  VAR object V.6
## Chi-squared = 164.49, df = 1, p-value < 2.2e-16

Arabica do not Granger-cause Robusta at 10% with p-value(0.109).


The order of integration can have some impact on the granger causality test as seen from the difference between var(6) and var(7).