^{^{1.- Overview}}

This document is designed as a ‘backbone’ tutorial to explained with examples some of the time series stationarity analyses that we performed elsewhere on real data. The general motivation is that even state of the art methods for stationarity are not completely general: some of them are ‘good’ for detecting long non-stationary trends in the data and other are able to detect local non-stationarity even in globally stationary data. In the end, only the combination of both types of methods will give us a better general understanding of our data.

^{PART I}

^{^{2.- Data}}

We will use here three sources ofdata:

Random walk data including: with/without trend, with time varying slope.
Globally stationary data. Quaternion-quaternion distances to measure the rotations/translations of an object (in this case the protein PNP) around a fixed frame (actually we are writting this tutorial ‘a posteriori’; we already by using these methods that these signal under our simulation conditions are stationary).
Globally non-stationary data. The AirPassengers dataset (https://stat.ethz.ch/R-manual/R-devel/library/datasets/html/AirPassengers.html), which contains the monthly airline passenger numbers from 1949 to 1960.

^{^{^{Ramdom walk data}}}

In this section we create a custom function able to generate several types of random walk data according to the specified input parameters:

Stationary random wall (slope = 0.0)
Random walk with constant trend (slope = real numbers)
Random walk with time-varyings stationarity (slope=“variable”). Here the slope is generated for each point as the product of a random number and the index of the “x” axis.

#Function to generate random walks
# Npoints: Number of points in the time series
#Amplitude: range of values selected in the sampling for the ramdom walk (from -Amplitude to +Amplitude)
#Slope: decide to include a linear trend in the random walk (Slope!=0), no trend (Slope=0) or a time-variable slope (Slope="variable")
randomWalk<-function(Npoints,Amplitude,Slope){
        rwt<-0
        for (i in 1:Npoints){
                if (Slope=="variable"){
                        m<-rnorm(1)*i
                } else{
                        m<-Slope
                }
                        
                x<-cumsum(sample(c(-Amplitude, Amplitude), 10, TRUE))  + m*i 
                rwt<-c(rwt, x)
        }
        rwt
}

#Create random walk data frame + plot
library(ggplot2)
library(gridExtra)

FALSE Loading required package: grid

N=300
rwN<-randomWalk(N,50,0.0)
rwT<-randomWalk(N,50,0.9)
rwV<-randomWalk(N,50,"variable")
t<-1:length(rwN)
RW<-data.frame(t,rwN,rwT,rwV)
q1<-qplot(t, rwN,data=RW, geom="line",ylab="RW")
q2<-qplot(t, rwT,data=RW, geom="line",ylab="RW + Trend")
q3<-qplot(t, rwV,data=RW, geom="line",ylab="RW + Variable Slope")
grid.arrange(q1, q2, q3, nrow=3)

^{^{^{Quaternion-quaternion distances}}}

Although in or analysis for the string method calculation we are interested in many images (several time series), here, for simplicity we will concentrate our attention on the image 1 from a data.framed already saved.

#load MD quaternion data
md.df<-read.table("md.csv",sep=",",header=TRUE)
md <- as.matrix(md.df[-1]) 
ST<-md[,1]  # 'ST' = STationary
plot(ST,type="l",xlab="time",ylab="Quaternion-Quaternion distances")

^{^{^{AirPassengers dataset}}}

Now we load the AirPassengers dataset and we plot it.

#load AirPassengers data
data(AirPassengers)
nonST<-as.vector(AirPassengers)  # 'nonST' = nonSTationary
plot(nonST,type="l",xlab="time",ylab="Monthly airline passengers")

^{PART II}

^{^{3.- Global stationarity: Augmented Dickey-Fuller Test (ADFT)}}

The Augmented Dickey-Fuller Test (ADFT) is extensively used to test the stationarity of time series. This test has been documented elsewhere( http://en.wikipedia.org/wiki/Augmented_Dickey%E2%80%93Fuller_test) and we also created a python notebook on the corresponding theory (link here!!).

Resuming, for an input time series, ADFT returns its statistics; if this number is lower than the test critical values (99%,95%, and 90%) which are (-2.58, -1.95, and -1.62), then the time series is stationary.

Now lets apply the test for our time series:

^{^{^{Random Walk data}}}

# Augmented Dickey-Fuller Test (ADFT) Random walk slope = 0.0
library(urca)
test_rwN<-ur.df(randomWalk(1000,50,0.0),type="none",selectlags="AIC",lags=10)
c("Stationary", slot(test_rwN,"teststat")[1],"---> good results")

## [1] "Stationary"        "-24.3514618203009" "---> good results"

# Augmented Dickey-Fuller Test (ADFT) Random walk with trend: slope = 1.0
test_rwT<-ur.df(randomWalk(1000,50,1.0),type="none",selectlags="AIC",lags=10)
c("Stationary",slot(test_rwT,"teststat")[1], "----> wrong result (but almost good!!)")

## [1] "Stationary"                            
## [2] "-2.84100490779498"                     
## [3] "----> wrong result (but almost good!!)"

# Augmented Dickey-Fuller Test (ADFT) Random walk with trend: slope = 10.0
test_rwT<-ur.df(randomWalk(1000,50,10.0),type="none",selectlags="AIC",lags=10)
c("non-Stationary",slot(test_rwT,"teststat")[1],"---> good results (but insensitive to small slopes)")

## [1] "non-Stationary"                                     
## [2] "2.00902586991942"                                   
## [3] "---> good results (but insensitive to small slopes)"

# Augmented Dickey-Fuller Test (ADFT) Random walk with variable slope
test_rwV<-ur.df(randomWalk(1000,50,"variable"),type="none",selectlags="AIC",lags=10)
c("Stationary",slot(test_rwV,"teststat")[1],"---> good result (but insensive to data complexity)")

## [1] "Stationary"                                         
## [2] "-18.8203143941033"                                  
## [3] "---> good result (but insensive to data complexity)"

ADFT correctly identify the random walk with slope = 0.0 (stationaty non trending data). However, in our example ADFT failed to recognize trends when the slope is small or time-varying which is not an optimal result. Now we will apply the method to the rest of our data.

^{^{^{ADFT on stationary (ST) data}}}

# Augmented Dickey-Fuller Test (ADFT) for ST time series
test_ST<-ur.df(ST,type="none",lags=0)
summary(test_ST)

## 
## ############################################### 
## # Augmented Dickey-Fuller Test Unit Root Test # 
## ############################################### 
## 
## Test regression none 
## 
## 
## Call:
## lm(formula = z.diff ~ z.lag.1 - 1)
## 
## Residuals:
##        Min         1Q     Median         3Q        Max 
## -0.0072959 -0.0013415  0.0000736  0.0015173  0.0085751 
## 
## Coefficients:
##           Estimate Std. Error t value Pr(>|t|)    
## z.lag.1 -0.0041385  0.0009149  -4.523 6.16e-06 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 0.002134 on 9998 degrees of freedom
## Multiple R-squared:  0.002042,   Adjusted R-squared:  0.001942 
## F-statistic: 20.46 on 1 and 9998 DF,  p-value: 6.16e-06
## 
## 
## Value of test-statistic is: -4.5232 
## 
## Critical values for test statistics: 
##       1pct  5pct 10pct
## tau1 -2.58 -1.95 -1.62

As we can see, the value of the ADFT statistics -4.5232429 is lower than any of the critical values, -2.58, -1.95 or -1.62; therefore we can conclude that this time series is stationary.

^{^{^{ADFT on non-stationary (nonST) data}}}

# Augmented Dickey-Fuller Test (ADFT) for nonST time series
test_nonST<-ur.df(nonST,type="none",lags=0)
summary(test_nonST)

## 
## ############################################### 
## # Augmented Dickey-Fuller Test Unit Root Test # 
## ############################################### 
## 
## Test regression none 
## 
## 
## Call:
## lm(formula = z.diff ~ z.lag.1 - 1)
## 
## Residuals:
##      Min       1Q   Median       3Q      Max 
## -101.222  -16.181    3.853   22.388   86.765 
## 
## Coefficients:
##         Estimate Std. Error t value Pr(>|t|)
## z.lag.1 0.000439   0.009316   0.047    0.962
## 
## Residual standard error: 33.83 on 142 degrees of freedom
## Multiple R-squared:  1.564e-05,  Adjusted R-squared:  -0.007027 
## F-statistic: 0.00222 on 1 and 142 DF,  p-value: 0.9625
## 
## 
## Value of test-statistic is: 0.0471 
## 
## Critical values for test statistics: 
##       1pct  5pct 10pct
## tau1 -2.58 -1.95 -1.62

In this case, the value of the ADFT statistics 0.0471197 is higher than any of the critical values, -2.58, -1.95 or -1.62; therefore we can conclude that this time series is non-stationary.

^{^{Conclusions (ADFT)}}

In spite of it extended use in time series analysis, ADFT should be interpreted carefully on real data. In our examples, the method did not showred a great reliability as it failed to recognized trending data or data that is not trending but has complex non-stationary patterns in the data (simulated here using rendom walks with added time varying slopes). Only in very clear cases (such as the AirPassengers dataset or the classic non-trending random walk data), ADTF showed good results, which, then again is not very usefull.

There is actually a family of “root”methods like ADFT with several adjustable parameters. We leave for another opportunity the empirical testing of some of them in search for a better reliability.

^{PART III}

^{^{4.- Local non-stationarity on stationary data: Wavelet Spectrum Test (WST)}}

Now, even when a test like ADFT can show that the general trend of the time series is stationary there is no reason why the time series could not present ‘local’ irregularites that are in general overlooked by this method by construction. In this sense we can turn to a method that can considered the same global stationary data on different time scales to detect such irregularities. This is the first general intuition we can give on a wavelet spectrum test (WST).

WST made heavy use of wavelet spectral analysis (https://en.wikipedia.org/wiki/Wavelet_transform), in which a function that represents a small wave-like oscillation (i.e., a wavelet) is repeatedly used in the time domain to filter (i.e., transform) the signal. Information at different resolutions of the signal can be obtained by using several streached versions of the wavelet. The result is a spectral analysis technique that keeps time and frequency information.

Several functions can be used to define a wavelet. In the method that we are describing in this document the Haar rectangular wavelet is used. A plot of this wavelet is shown bellow:

# example of a Haar wavelet
h_1_0 = function(t){
    if ((t >= 0) & (t <= 1/2)){return(1)}
    else if ((t >= 1/2) & (t <= 1)){return(-1)}
    else{return(0)}
}
h = function(t,n,k){return(2^((n-1)/2) * h_1_0((2^(n-1)) * t - k))}
x <- seq(.5, 2.5, .001)
plot(x, sapply(x, function(x) h(x,1,1)), pch='.', type='l',ylab="Haar wavelet")

WST uses a particular property of the Haar wavelet: the haar wavelet transform for a constant signal is zero:

# Discrete wavelet transform using the 'Haar' wavelet
library(wavelets)
signal1<-c(1,1,1,1)
signal2<-c(1,2,3,4)
HaarT_signal1<-dwt(as.matrix(signal1), filter="haar")
HaarT_signal2<-dwt(as.matrix(signal2), filter="haar")  
print(c("Haar wavelet coefficients for stationary signal",slot(HaarT_signal1,"W")))

## [[1]]
## [1] "Haar wavelet coefficients for stationary signal"
## 
## $W1
##      [,1]
## [1,]    0
## [2,]    0
## 
## $W2
##      [,1]
## [1,]    0

print(c("Haar wavelet coefficients for NON-stationary signal",slot(HaarT_signal2,"W")))

## [[1]]
## [1] "Haar wavelet coefficients for NON-stationary signal"
## 
## $W1
##           [,1]
## [1,] 0.7071068
## [2,] 0.7071068
## 
## $W2
##      [,1]
## [1,]    2

Therefore, unlike several traditional methods that study the stationarity of a signal based on the stability of some descriptive statistics (e.g., mean, variance, etc.); WST studies the stability of a haar wavelet based transform of the signal.

Because for a long signal the wavelet transform is applied several times along the x axis (time information) and y axis (scale information); a hypothesis stationarity test should be performed for each ‘transformation’. Failure to provide a stationarity result can therefore tell us at what moment in time and on which signal scale the non-stationarity took place.

We can find a user-friendly implementation of this approach in the package locits:

# WST as implemented in the "locits" package
library("locits")
hwtos2(ST[1:2^13])  #the length of the signal has to be a power of 2.

FALSE   12    11    10    9    8    7    6    5    4    3

FALSE Class 'tos' : Stationarity Object :
FALSE        ~~~~  : List with 9 components with names
FALSE               nreject rejpval spvals sTS AllTS AllPVal alpha x xSD 
FALSE 
FALSE 
FALSE summary(.):
FALSE ----------
FALSE There are  1270  hypothesis tests altogether
FALSE There were  5  FDR rejects
FALSE The rejection p-value was  0.0001750913 
FALSE Using Bonferroni rejection p-value is  3.937008e-05 
FALSE And there would be  0  rejections.
FALSE Listing FDR rejects... (thanks Y&Y!)
FALSE P: 6 HWTlev:  6  indices on next line...[1] 40
FALSE P: 7 HWTlev:  6  indices on next line...[1] 14 40
FALSE P: 8 HWTlev:  6  indices on next line...[1] 64
FALSE P: 9 HWTlev:  2  indices on next line...[1] 4

In the tool provided by locits the assesment of the signifficance of each test is performed my two methods: the false discovery rate (FDR) https://en.wikipedia.org/wiki/False_discovery_rate and the Bonferroni correctionhttps://en.wikipedia.org/wiki/Bonferroni_correction.

In this case we can see that we perform a total of 1270 statistical tests, FDR rejected 5 of this tests and Bonferroni did not reject any test. Therefore, if we want to be careful we could conclude that there are 5 local non-stationarities in this time series.

Conveniently, locits provide us also with a graphical representation of the signal and the found non-stationarities (in scale and time).

# Plot of the local non-stationarities
plot(hwtos2(ST[1:2^13]),main="WST for a 'globally' stationary signal")

FALSE   12    11    10    9    8    7    6    5    4    3

In this plot 5 the red double arrows represent the detected non-stationarities. Particularly, the length of the double arrow corresponds with the scale of the wavelet and the specific non-stationary wavelet coefficient is at the middle point of the double arrow. The numbers on the right y axis are the values of the scale of the wavelet periodogram.

^{^{5.- WST on random walk data}}

First let’s see empirically the effect of this method in out random walk data. In the next figure we show two panels with the results of WST for random walk with slope 0.0 and 1.0

par(mfrow=c(1,2))
plot(hwtos2(randomWalk(5000,50,0.0)[1:2^12]),main="WST for RW with slope 0.0")

FALSE   11    10    9    8    7    6    5    4    3

FALSE NULL

plot(hwtos2(randomWalk(5000,50,1.0)[1:2^12]),main="WST for RW with slope 1.0")

FALSE   11    10    9    8    7    6    5    4    3

FALSE NULL

As we can see WST if not able to detect any internal nonstationarities in these time series. To be precise, the direct application of WST to treding data is not a good option and one fast fix to this would be to detrend the data. However,as these plots show, itis likely that if there is nothing to detect in the treding data, the WST results with detrended data will behave in a similar way.

On the other hand, WST is really valuable to obtain information from more complex time series like the one generated using a random walk generator with a tipe-varying slope. In this case, we see that in fact WST is able to describe the rich dynamic behaviour of this time series.

par(mfrow=c(1,1))
plot(hwtos2(randomWalk(5000,50,"variable")[1:2^12]),main="WST for RW with time-varying slope")

FALSE   11    10    9    8    7    6    5    4    3

Interestingly enough, the capacity of WST to detect these time-changing patterns in the data does not fade away even when we apply some transformation to incorporate trends in the data as in the next figure:

tmp<-randomWalk(5000,50,"variable")
rwTV<-0
for (i in 1:5000){
        rwTV<-c(rwTV, abs(tmp[i])+100*i)
}
par(mfrow=c(1,1))
plot(hwtos2(rwTV[1:2^12]),main="WST for trending RW with time-varying slope")

FALSE   11    10    9    8    7    6    5    4    3

^{^{Conclusions (WST)}}

The Wavelet Spectrum Test (WST) seems to be a very powerfull method to detect complext patterns in the data as it in fact shoued superior sensibility to detect hidden features in the random walk data generated using a time varying slope with or without a trend. The method correctly failed to find patterns in the noise-like simple random walk models (trending or not).

^{^References}

Wavelet processes and adaptive estimation of the evolutionary wavelet spectrum.(2000)

A wavelet-based test for stationarity.(2002)

A test for second-order stationarity and approximate confidence intervals for localized autocovariances for locally stationary time series.(2013)

Mini-tutorial: an empirical analysis of global/local time series stationarity.

Rolando P. Hong Enriquez

28 September, 2015