This simulation shows the importance of pre-whitening time series before trying to identify significant cross-correlations between a target and a predictor.
library(forecast)
source("setPowerPointStyle.R")
setPowerPointStyle()
#Simulates a random predictor x from an ARIMA(1,1,0) process with AR coefficient 0.7
x = arima.sim(list(order = c(1,1,0), ar = 0.7), n = 200)
#Creates a matrix z with columns x_t, x_{t-3} and x_{t-4}
z = ts.intersect(x, lag(x,-3), lag(x,-4))
#Creates y (our target) from lags 3 and 4 of the randomly generated x
y = 15+0.8*z[,2]+1.5*z[,3]
#CCF between x and y doesn't show any relationship
ccf(z[,1],y,na.action = na.omit,main='Cross-correlogram without pre-whitening')
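It is worth checking the plot numerically as well. The sketch below (re-using the z and y objects above; cc and bound are just names for this check) stores the CCF instead of only drawing it and compares each value with the approximate 95% bounds that plot.acf draws, since ccf() returns an object whose acf, lag and n.used components hold the values behind the plot.

cc = ccf(z[,1], y, na.action = na.omit, plot = FALSE)
bound = qnorm(0.975)/sqrt(cc$n.used)   #the dashed lines drawn in the plot
#lags whose absolute cross-correlation exceeds the approximate 95% bound
cc$lag[,1,1][abs(cc$acf[,1,1]) > bound]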
Here we cannot see any significant lag, even though y was generated as a linear function of the predictor (at lags -3 and -4). Let's see what happens with pre-whitening.
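Pre-whitening fits an ARIMA model to the predictor x, keeps the residuals as the filtered predictor, and passes y through exactly the same filter. For an ARIMA(1,1,0), differencing and the AR(1) part collapse into a single convolution filter, and the short standalone sketch below (phi = 0.65 is only a stand-in value, roughly the estimate obtained further down) checks the algebra behind the coefficients c(1, -(1+phi), phi) used in the code that follows.

phi = 0.65   #stand-in value; the actual estimate comes from auto.arima() below
#(1 - B)(1 - phi*B) = 1 - (1 + phi)*B + phi*B^2, so differencing followed by
#an AR(1) filter is a single convolution filter
w = c(1, -(1 + phi), phi)
u = cumsum(rnorm(10))                               #arbitrary toy series
combined = stats::filter(u, filter = w, sides = 1)  #one-pass version
d = diff(u)                                         #first difference...
twostep = c(NA, NA, d[-1] - phi*d[-length(d)])      #...then the AR(1) filter
all.equal(as.numeric(combined)[-(1:2)], twostep[-(1:2)])  #should be TRUE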
source("setPowerPointStyle.R")
setPowerPointStyle()
#Determine an ARIMA model for the predictor x
aa=auto.arima(x)
coef(aa)
## ar1
## 0.6498489
#Pre-whiten x: the residuals of the ARIMA model fitted to x
pwx = aa$residuals
#Pre-whiten y by pushing it through the same (1 - B)(1 - phi*B) filter,
#i.e. the coefficients c(1, -(1 + phi), phi) derived above
newpwy = filter(y, filter = c(1, -(1 + coef(aa)[1]), coef(aa)[1]), sides = 1)
ccf(pwx,newpwy,na.action=na.omit,main='Cross-correlogram after pre-whitening',xlim=c(-20,20))
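As a cross-check of the hand-built filter, the forecast package can apply the model fitted to x directly to y: Arima() accepts a model argument that reuses an already-estimated model without refitting it, and its residuals are then the pre-whitened target. This is only a sketch assuming the aa, y and newpwy objects above (pwy2 is just a name for the check); apart from start-up effects at the beginning of the series, the two versions should essentially agree.

#Alternative pre-whitening of y: apply the model estimated on x, without refitting
pwy2 = residuals(Arima(y, model = aa))
#Apart from the first couple of observations the two versions should agree closely
cor(as.numeric(newpwy), as.numeric(pwy2), use = "complete.obs")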
Now the significant lags stand out clearly in the pre-whitened cross-correlogram.
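The same numerical check as before, applied to the pre-whitened pair, should confirm this: given how y was constructed, the dominant lags crossing the bound should be -3 and -4 (again a sketch based on the components of the ccf object).

cc2 = ccf(pwx, newpwy, na.action = na.omit, plot = FALSE)
bound2 = qnorm(0.975)/sqrt(cc2$n.used)
#lags exceeding the approximate 95% bound; the dominant ones should be -3 and -4
cc2$lag[,1,1][abs(cc2$acf[,1,1]) > bound2]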