This document is designed as a ‘backbone’ tutorial to explained with examples some of the time series stationarity analyses that we performed elsewhere on real data. The general motivation is that even state of the art methods for stationarity are not completely general: some of them are ‘good’ for detecting long non-stationary trends in the data and other are able to detect local non-stationarity even in globally stationary data. In the end, only the combination of both types of methods will give us a better general understanding of our data.
We will use here three sources ofdata:
In this section we create a custom function able to generate several types of random walk data according to the specified input parameters:
#Function to generate random walks
# Npoints: Number of points in the time series
#Amplitude: range of values selected in the sampling for the ramdom walk (from -Amplitude to +Amplitude)
#Slope: decide to include a linear trend in the random walk (Slope!=0), no trend (Slope=0) or a time-variable slope (Slope="variable")
randomWalk<-function(Npoints,Amplitude,Slope){
rwt<-0
for (i in 1:Npoints){
if (Slope=="variable"){
m<-rnorm(1)*i
} else{
m<-Slope
}
x<-cumsum(sample(c(-Amplitude, Amplitude), 10, TRUE)) + m*i
rwt<-c(rwt, x)
}
rwt
}
#Create random walk data frame + plot
library(ggplot2)
library(gridExtra)
FALSE Loading required package: grid
N=300
rwN<-randomWalk(N,50,0.0)
rwT<-randomWalk(N,50,0.9)
rwV<-randomWalk(N,50,"variable")
t<-1:length(rwN)
RW<-data.frame(t,rwN,rwT,rwV)
q1<-qplot(t, rwN,data=RW, geom="line",ylab="RW")
q2<-qplot(t, rwT,data=RW, geom="line",ylab="RW + Trend")
q3<-qplot(t, rwV,data=RW, geom="line",ylab="RW + Variable Slope")
grid.arrange(q1, q2, q3, nrow=3)
Although in or analysis for the string method calculation we are interested in many images (several time series), here, for simplicity we will concentrate our attention on the image 1 from a data.framed already saved.
#load MD quaternion data
md.df<-read.table("md.csv",sep=",",header=TRUE)
md <- as.matrix(md.df[-1])
ST<-md[,1] # 'ST' = STationary
plot(ST,type="l",xlab="time",ylab="Quaternion-Quaternion distances")
Now we load the AirPassengers dataset and we plot it.
#load AirPassengers data
data(AirPassengers)
nonST<-as.vector(AirPassengers) # 'nonST' = nonSTationary
plot(nonST,type="l",xlab="time",ylab="Monthly airline passengers")
The Augmented Dickey-Fuller Test (ADFT) is extensively used to test the stationarity of time series. This test has been documented elsewhere( http://en.wikipedia.org/wiki/Augmented_Dickey%E2%80%93Fuller_test) and we also created a python notebook on the corresponding theory (link here!!).
Resuming, for an input time series, ADFT returns its statistics; if this number is lower than the test critical values (99%,95%, and 90%) which are (-2.58, -1.95, and -1.62), then the time series is stationary.
Now lets apply the test for our time series:
# Augmented Dickey-Fuller Test (ADFT) Random walk slope = 0.0
library(urca)
test_rwN<-ur.df(randomWalk(1000,50,0.0),type="none",selectlags="AIC",lags=10)
c("Stationary", slot(test_rwN,"teststat")[1],"---> good results")
## [1] "Stationary" "-24.3514618203009" "---> good results"
# Augmented Dickey-Fuller Test (ADFT) Random walk with trend: slope = 1.0
test_rwT<-ur.df(randomWalk(1000,50,1.0),type="none",selectlags="AIC",lags=10)
c("Stationary",slot(test_rwT,"teststat")[1], "----> wrong result (but almost good!!)")
## [1] "Stationary"
## [2] "-2.84100490779498"
## [3] "----> wrong result (but almost good!!)"
# Augmented Dickey-Fuller Test (ADFT) Random walk with trend: slope = 10.0
test_rwT<-ur.df(randomWalk(1000,50,10.0),type="none",selectlags="AIC",lags=10)
c("non-Stationary",slot(test_rwT,"teststat")[1],"---> good results (but insensitive to small slopes)")
## [1] "non-Stationary"
## [2] "2.00902586991942"
## [3] "---> good results (but insensitive to small slopes)"
# Augmented Dickey-Fuller Test (ADFT) Random walk with variable slope
test_rwV<-ur.df(randomWalk(1000,50,"variable"),type="none",selectlags="AIC",lags=10)
c("Stationary",slot(test_rwV,"teststat")[1],"---> good result (but insensive to data complexity)")
## [1] "Stationary"
## [2] "-18.8203143941033"
## [3] "---> good result (but insensive to data complexity)"
ADFT correctly identify the random walk with slope = 0.0 (stationaty non trending data). However, in our example ADFT failed to recognize trends when the slope is small or time-varying which is not an optimal result. Now we will apply the method to the rest of our data.
# Augmented Dickey-Fuller Test (ADFT) for ST time series
test_ST<-ur.df(ST,type="none",lags=0)
summary(test_ST)
##
## ###############################################
## # Augmented Dickey-Fuller Test Unit Root Test #
## ###############################################
##
## Test regression none
##
##
## Call:
## lm(formula = z.diff ~ z.lag.1 - 1)
##
## Residuals:
## Min 1Q Median 3Q Max
## -0.0072959 -0.0013415 0.0000736 0.0015173 0.0085751
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## z.lag.1 -0.0041385 0.0009149 -4.523 6.16e-06 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 0.002134 on 9998 degrees of freedom
## Multiple R-squared: 0.002042, Adjusted R-squared: 0.001942
## F-statistic: 20.46 on 1 and 9998 DF, p-value: 6.16e-06
##
##
## Value of test-statistic is: -4.5232
##
## Critical values for test statistics:
## 1pct 5pct 10pct
## tau1 -2.58 -1.95 -1.62
As we can see, the value of the ADFT statistics -4.5232429 is lower than any of the critical values, -2.58, -1.95 or -1.62; therefore we can conclude that this time series is stationary.
# Augmented Dickey-Fuller Test (ADFT) for nonST time series
test_nonST<-ur.df(nonST,type="none",lags=0)
summary(test_nonST)
##
## ###############################################
## # Augmented Dickey-Fuller Test Unit Root Test #
## ###############################################
##
## Test regression none
##
##
## Call:
## lm(formula = z.diff ~ z.lag.1 - 1)
##
## Residuals:
## Min 1Q Median 3Q Max
## -101.222 -16.181 3.853 22.388 86.765
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## z.lag.1 0.000439 0.009316 0.047 0.962
##
## Residual standard error: 33.83 on 142 degrees of freedom
## Multiple R-squared: 1.564e-05, Adjusted R-squared: -0.007027
## F-statistic: 0.00222 on 1 and 142 DF, p-value: 0.9625
##
##
## Value of test-statistic is: 0.0471
##
## Critical values for test statistics:
## 1pct 5pct 10pct
## tau1 -2.58 -1.95 -1.62
In this case, the value of the ADFT statistics 0.0471197 is higher than any of the critical values, -2.58, -1.95 or -1.62; therefore we can conclude that this time series is non-stationary.
In spite of it extended use in time series analysis, ADFT should be interpreted carefully on real data. In our examples, the method did not showred a great reliability as it failed to recognized trending data or data that is not trending but has complex non-stationary patterns in the data (simulated here using rendom walks with added time varying slopes). Only in very clear cases (such as the AirPassengers dataset or the classic non-trending random walk data), ADTF showed good results, which, then again is not very usefull.
There is actually a family of “root”methods like ADFT with several adjustable parameters. We leave for another opportunity the empirical testing of some of them in search for a better reliability.
Now, even when a test like ADFT can show that the general trend of the time series is stationary there is no reason why the time series could not present ‘local’ irregularites that are in general overlooked by this method by construction. In this sense we can turn to a method that can considered the same global stationary data on different time scales to detect such irregularities. This is the first general intuition we can give on a wavelet spectrum test (WST).
WST made heavy use of wavelet spectral analysis (https://en.wikipedia.org/wiki/Wavelet_transform), in which a function that represents a small wave-like oscillation (i.e., a wavelet) is repeatedly used in the time domain to filter (i.e., transform) the signal. Information at different resolutions of the signal can be obtained by using several streached versions of the wavelet. The result is a spectral analysis technique that keeps time and frequency information.
Several functions can be used to define a wavelet. In the method that we are describing in this document the Haar rectangular wavelet is used. A plot of this wavelet is shown bellow:
# example of a Haar wavelet
h_1_0 = function(t){
if ((t >= 0) & (t <= 1/2)){return(1)}
else if ((t >= 1/2) & (t <= 1)){return(-1)}
else{return(0)}
}
h = function(t,n,k){return(2^((n-1)/2) * h_1_0((2^(n-1)) * t - k))}
x <- seq(.5, 2.5, .001)
plot(x, sapply(x, function(x) h(x,1,1)), pch='.', type='l',ylab="Haar wavelet")
WST uses a particular property of the Haar wavelet: the haar wavelet transform for a constant signal is zero:
# Discrete wavelet transform using the 'Haar' wavelet
library(wavelets)
signal1<-c(1,1,1,1)
signal2<-c(1,2,3,4)
HaarT_signal1<-dwt(as.matrix(signal1), filter="haar")
HaarT_signal2<-dwt(as.matrix(signal2), filter="haar")
print(c("Haar wavelet coefficients for stationary signal",slot(HaarT_signal1,"W")))
## [[1]]
## [1] "Haar wavelet coefficients for stationary signal"
##
## $W1
## [,1]
## [1,] 0
## [2,] 0
##
## $W2
## [,1]
## [1,] 0
print(c("Haar wavelet coefficients for NON-stationary signal",slot(HaarT_signal2,"W")))
## [[1]]
## [1] "Haar wavelet coefficients for NON-stationary signal"
##
## $W1
## [,1]
## [1,] 0.7071068
## [2,] 0.7071068
##
## $W2
## [,1]
## [1,] 2
Therefore, unlike several traditional methods that study the stationarity of a signal based on the stability of some descriptive statistics (e.g., mean, variance, etc.); WST studies the stability of a haar wavelet based transform of the signal.
Because for a long signal the wavelet transform is applied several times along the x axis (time information) and y axis (scale information); a hypothesis stationarity test should be performed for each ‘transformation’. Failure to provide a stationarity result can therefore tell us at what moment in time and on which signal scale the non-stationarity took place.
We can find a user-friendly implementation of this approach in the package locits:
# WST as implemented in the "locits" package
library("locits")
hwtos2(ST[1:2^13]) #the length of the signal has to be a power of 2.
FALSE 12 11 10 9 8 7 6 5 4 3
FALSE Class 'tos' : Stationarity Object :
FALSE ~~~~ : List with 9 components with names
FALSE nreject rejpval spvals sTS AllTS AllPVal alpha x xSD
FALSE
FALSE
FALSE summary(.):
FALSE ----------
FALSE There are 1270 hypothesis tests altogether
FALSE There were 5 FDR rejects
FALSE The rejection p-value was 0.0001750913
FALSE Using Bonferroni rejection p-value is 3.937008e-05
FALSE And there would be 0 rejections.
FALSE Listing FDR rejects... (thanks Y&Y!)
FALSE P: 6 HWTlev: 6 indices on next line...[1] 40
FALSE P: 7 HWTlev: 6 indices on next line...[1] 14 40
FALSE P: 8 HWTlev: 6 indices on next line...[1] 64
FALSE P: 9 HWTlev: 2 indices on next line...[1] 4
In the tool provided by locits the assesment of the signifficance of each test is performed my two methods: the false discovery rate (FDR) https://en.wikipedia.org/wiki/False_discovery_rate and the Bonferroni correctionhttps://en.wikipedia.org/wiki/Bonferroni_correction.
In this case we can see that we perform a total of 1270 statistical tests, FDR rejected 5 of this tests and Bonferroni did not reject any test. Therefore, if we want to be careful we could conclude that there are 5 local non-stationarities in this time series.
Conveniently, locits provide us also with a graphical representation of the signal and the found non-stationarities (in scale and time).
# Plot of the local non-stationarities
plot(hwtos2(ST[1:2^13]),main="WST for a 'globally' stationary signal")
FALSE 12 11 10 9 8 7 6 5 4 3
In this plot 5 the red double arrows represent the detected non-stationarities. Particularly, the length of the double arrow corresponds with the scale of the wavelet and the specific non-stationary wavelet coefficient is at the middle point of the double arrow. The numbers on the right y axis are the values of the scale of the wavelet periodogram.
First let’s see empirically the effect of this method in out random walk data. In the next figure we show two panels with the results of WST for random walk with slope 0.0 and 1.0
par(mfrow=c(1,2))
plot(hwtos2(randomWalk(5000,50,0.0)[1:2^12]),main="WST for RW with slope 0.0")
FALSE 11 10 9 8 7 6 5 4 3
FALSE NULL
plot(hwtos2(randomWalk(5000,50,1.0)[1:2^12]),main="WST for RW with slope 1.0")
FALSE 11 10 9 8 7 6 5 4 3
FALSE NULL
As we can see WST if not able to detect any internal nonstationarities in these time series. To be precise, the direct application of WST to treding data is not a good option and one fast fix to this would be to detrend the data. However,as these plots show, itis likely that if there is nothing to detect in the treding data, the WST results with detrended data will behave in a similar way.
On the other hand, WST is really valuable to obtain information from more complex time series like the one generated using a random walk generator with a tipe-varying slope. In this case, we see that in fact WST is able to describe the rich dynamic behaviour of this time series.
par(mfrow=c(1,1))
plot(hwtos2(randomWalk(5000,50,"variable")[1:2^12]),main="WST for RW with time-varying slope")
FALSE 11 10 9 8 7 6 5 4 3
Interestingly enough, the capacity of WST to detect these time-changing patterns in the data does not fade away even when we apply some transformation to incorporate trends in the data as in the next figure:
tmp<-randomWalk(5000,50,"variable")
rwTV<-0
for (i in 1:5000){
rwTV<-c(rwTV, abs(tmp[i])+100*i)
}
par(mfrow=c(1,1))
plot(hwtos2(rwTV[1:2^12]),main="WST for trending RW with time-varying slope")
FALSE 11 10 9 8 7 6 5 4 3
The Wavelet Spectrum Test (WST) seems to be a very powerfull method to detect complext patterns in the data as it in fact shoued superior sensibility to detect hidden features in the random walk data generated using a time varying slope with or without a trend. The method correctly failed to find patterns in the noise-like simple random walk models (trending or not).