Summary

This simulation shows why time series should be pre-whitened before looking for significant cross-correlations between a target and a predictor.

library(forecast)
source("setPowerPointStyle.R")
setPowerPointStyle()

#Generates random data from ARIMA(1,1,0).
x = arima.sim(list(order = c(1,1,0), ar = 0.7), n = 200)   


#Creates a matrix z with columns x_t, x_{t-3}, and x_{t-4}
z = ts.intersect(x, lag(x,-3), lag(x,-4)) 

#Creates y (our target) from lags 3 and 4 of randomly generated x
y = 15+0.8*z[,2]+1.5*z[,3]  

#CCF between x and y does not single out any particular lag
ccf(z[,1],y,na.action = na.omit,main='Cross-correlogram without pre-whitening')  

Here no individual lag stands out, even though y was generated as a linear function of the predictor (with lags -3, -4). Because x is strongly autocorrelated, the cross-correlation is spread over many lags. Let’s see what happens with pre-whitening.
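The pre-whitening step can be written out explicitly. This is a sketch that assumes the fitted model for x is an ARIMA(1,1,0) without drift, which is what the coef(aa) output in this example suggests. The backshift operator B, the fitted ar1 coefficient \phi, the model residuals \varepsilon_t, and the filtered target \tilde{y}_t are notation introduced here for illustration.

```latex
\begin{aligned}
(1-\phi B)(1-B)\,x_t &= \varepsilon_t
  && \text{(fitted model for the predictor)} \\
(1-\phi B)(1-B) &= 1-(1+\phi)B+\phi B^2
  && \text{(expanded filter)} \\
\tilde{y}_t &= y_t-(1+\phi)\,y_{t-1}+\phi\,y_{t-2}
  && \text{(same filter applied to the target)} \\
\tilde{y}_t &\approx 0.8\,\varepsilon_{t-3}+1.5\,\varepsilon_{t-4}
  && \text{(because } y_t = 15+0.8\,x_{t-3}+1.5\,x_{t-4}\text{)}
\end{aligned}
```

The coefficients 1, -(1+\phi), \phi are the ones passed to filter() with sides = 1. They sum to zero, so the constant 15 drops out. Since \varepsilon_t is approximately white noise, the filtered target should correlate with the filtered predictor only at the two lags used to build y.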

source("setPowerPointStyle.R")
setPowerPointStyle()


#determine ARIMA model for predictor
aa=auto.arima(x)

coef(aa)
##       ar1 
## 0.6498489
#pre-whitening x
pwx=aa$residuals

#pre-whitening y with the same filter fitted to x: (1 - ar1*B)(1 - B)
newpwy = filter(y, filter = c(1,-(1+coef(aa)[1]),coef(aa)[1]), sides =1)

ccf(pwx,newpwy,na.action=na.omit,main='Cross-correlogram after pre-whitening',xlim=c(-20,20))

Now the correlogram clearly shows the significant cross-correlations at lags -3 and -4.
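The significant lags can also be read off numerically instead of from the plot. The following is a minimal, self-contained sketch, not the code used above. It assumes a fixed seed, fits the ARIMA(1,1,0) order directly with stats::arima instead of auto.arima (so the forecast package is not needed), and uses the usual approximate 95% bound qnorm(0.975)/sqrt(n). The names fit, phi, pwy, threshold, significant_lags and strongest_lag are hypothetical.

```r
# Rebuild the simulation (seed is an assumption, only for reproducibility)
set.seed(123)
x <- arima.sim(list(order = c(1, 1, 0), ar = 0.7), n = 200)
z <- ts.intersect(x, stats::lag(x, -3), stats::lag(x, -4))
y <- 15 + 0.8 * z[, 2] + 1.5 * z[, 3]

# Assumption: the order is fixed at ARIMA(1,1,0) rather than chosen by auto.arima
fit <- arima(x, order = c(1, 1, 0))
phi <- coef(fit)[["ar1"]]

# Pre-whiten x (model residuals) and y (same filter applied to the target)
pwx <- residuals(fit)
pwy <- stats::filter(y, filter = c(1, -(1 + phi), phi), sides = 1)

# Compute the cross-correlations without plotting
cc <- ccf(pwx, pwy, na.action = na.omit, plot = FALSE)

# Approximate 95% significance bound, as drawn on the correlogram
threshold <- qnorm(0.975) / sqrt(cc$n.used)

lags <- drop(cc$lag)
r <- drop(cc$acf)

# Lags whose cross-correlation exceeds the bound, and the strongest one
significant_lags <- lags[abs(r) > threshold]
strongest_lag <- lags[which.max(abs(r))]

print(significant_lags)
print(strongest_lag)
```

With this setup, significant_lags should contain -3 and -4, and the strongest correlation should sit at lag -4, which carries the larger coefficient (1.5). A few other lags may cross the bound by chance, since roughly 5% of pure-noise lags are expected to do so.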