We can generate a random walk in R with a single command (the length 2049 is chosen so that the differenced series has dyadic length 2^11 = 2048, which the wavelet test used later requires):
random_walk <- cumsum(rnorm(2049))
The following code defines a function that simulates TVAR (time-varying autoregressive) processes:
# generates a time-varying AR(1) process with AR coefficients ~ Uniform(1 - w/2, 1 + w/2)
tv_ar <- function(n, w = 0.2)
{
  # n = length of series, w = width of the coefficient window around the unit root
  # Gaussian innovations
  e <- rnorm(n, 0, 1)
  # time-varying AR coefficients, uniform in a window of width w centered at 1
  a <- (1 - w/2) + runif(n, 0, w)
  # x[1] = 0, since the vector is initialized with zeros
  x <- vector("double", length = n)
  # AR(1) recursion with time-varying coefficient
  for (t in 2:n)
  {
    x[t] <- a[t-1]*x[t-1] + e[t]
  }
  x
}
To simulate a TVAR process of length 2049 with the default window width w = 0.2 around the unit root, we simply execute:
TVAR <- tv_ar(2049)
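As a sanity check on the construction: with w = 0 every AR coefficient equals exactly 1, so the same recursion collapses to a plain random walk. A minimal sketch (rw_check is my own name):
# with w = 0, a[t] = 1 for all t, so this is a random walk started at 0
rw_check <- tv_ar(2049, w = 0)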
Below you can see a sample path of each process.
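The plots can be reproduced with something like the following sketch (the panel titles are my own):
# sample paths of the two simulated processes
par(mfrow = c(2, 1))
plot(random_walk, type = "l", main = "random_walk")
plot(TVAR, type = "l", main = "TVAR")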
These are the autocorrelation estimates for both series (top row) as well as for their first differences (bottom row):
par(mfrow=c(2,2))
acf(random_walk)
acf(TVAR)
acf(diff(random_walk))
acf(diff(TVAR))
Now let's run some regressions: we regress each series on its own values lagged by one period.
reg1 <-lm(random_walk[length(random_walk):2] ~ random_walk[(length(random_walk)-1):1])
reg2 <-lm(TVAR[length(TVAR):2] ~ TVAR[(length(TVAR)-1):1])
summary(reg1)
##
## Call:
## lm(formula = random_walk[length(random_walk):2] ~ random_walk[(length(random_walk) -
## 1):1])
##
## Residuals:
## Min 1Q Median 3Q Max
## -2.9763 -0.6745 -0.0201 0.6917 3.7468
##
## Coefficients:
## Estimate Std. Error t value
## (Intercept) 0.004640 0.022128 0.21
## random_walk[(length(random_walk) - 1):1] 0.998003 0.001652 603.98
## Pr(>|t|)
## (Intercept) 0.834
## random_walk[(length(random_walk) - 1):1] <2e-16 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 0.9836 on 2046 degrees of freedom
## Multiple R-squared: 0.9944, Adjusted R-squared: 0.9944
## F-statistic: 3.648e+05 on 1 and 2046 DF, p-value: < 2.2e-16
The slope is very close to one, consistent with the random walk hypothesis.
summary(reg2)
##
## Call:
## lm(formula = TVAR[length(TVAR):2] ~ TVAR[(length(TVAR) - 1):1])
##
## Residuals:
## Min 1Q Median 3Q Max
## -12.3086 -1.2392 -0.0149 1.2954 10.8956
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 0.098121 0.078777 1.246 0.213
## TVAR[(length(TVAR) - 1):1] 0.996405 0.001895 525.673 <2e-16 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 2.632 on 2046 degrees of freedom
## Multiple R-squared: 0.9927, Adjusted R-squared: 0.9926
## F-statistic: 2.763e+05 on 1 and 2046 DF, p-value: < 2.2e-16
The slope is again very close to one.
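One way to make "very close to one" concrete is to look at confidence intervals for the slopes; a sketch (the exact bounds depend on the simulated draws):
# 95% confidence intervals for intercept and slope; given the estimates
# and standard errors reported above, both slope intervals contain 1
confint(reg1)
confint(reg2)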
Let's run Augmented Dickey-Fuller tests:
library(tseries)
adf.test(random_walk,k=1)
##
## Augmented Dickey-Fuller Test
##
## data: random_walk
## Dickey-Fuller = -1.2995, Lag order = 1, p-value = 0.8749
## alternative hypothesis: stationary
and for the second series
adf.test(TVAR,k=1)
##
## Augmented Dickey-Fuller Test
##
## data: TVAR
## Dickey-Fuller = -1.6986, Lag order = 1, p-value = 0.7059
## alternative hypothesis: stationary
In both cases, the unit root hypothesis was NOT rejected.
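Fixing k = 1 is quite restrictive. As a robustness sketch, one could also let adf.test pick its default lag order (the conclusions for any particular draw may of course differ):
# default lag order is trunc((n - 1)^(1/3)), i.e. 12 for n = 2049
adf.test(random_walk)
adf.test(TVAR)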
Let's see what the periodograms look like.
par(mfrow=c(2,2))
spec.pgram(random_walk)
spec.pgram(TVAR)
spec.pgram(diff(random_walk))
spec.pgram(diff(TVAR))
The periodograms look very similar, and those of the first differences look very much like white noise: we don't see any dominant frequencies.
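Raw periodograms are noisy; smoothing makes the flatness of the differenced series easier to judge. A sketch (the span values are arbitrary choices):
# smoothed periodograms of the differences, modified Daniell kernel
par(mfrow = c(1, 2))
spec.pgram(diff(random_walk), spans = c(11, 11))
spec.pgram(diff(TVAR), spans = c(11, 11))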
But remember that ADF tests have low power. Let's try the KPSS test, whose null hypothesis is stationarity. Rejection of that null could then be viewed as convincing evidence in favor of a unit root (or could it?).
require(tseries)
kpss.test(random_walk)
## Warning in kpss.test(random_walk): p-value smaller than printed p-value
##
## KPSS Test for Level Stationarity
##
## data: random_walk
## KPSS Level = 3.357, Truncation lag parameter = 8, p-value = 0.01
kpss.test(TVAR)
## Warning in kpss.test(TVAR): p-value smaller than printed p-value
##
## KPSS Test for Level Stationarity
##
## data: TVAR
## KPSS Level = 5.6447, Truncation lag parameter = 8, p-value = 0.01
KPSS rejects the null of stationarity for both series…
Let's see whether a plot could help: a scatter plot of consecutive observations for each series, together with the corresponding regression lines.
plot(random_walk[length(random_walk):2] ~ random_walk[(length(random_walk)-1):1],col=3)
abline(reg1)
plot(TVAR[length(TVAR):2] ~ TVAR[(length(TVAR)-1):1],col=4)
abline(reg2)
So far, everything seems to suggest that both series are random walks.
To sum up:

- The ACFs show no autocorrelation in the differenced series
- The periodograms of the differenced series look (somewhat) like white noise
- The regressions have slopes very close to 1
- Both unit root tests are compatible with the random walk hypothesis
But I’ve got more tests for you.
Box.test(diff(random_walk), type = "Box-Pierce")
##
##  Box-Pierce test
##
## data:  diff(random_walk)
## X-squared = 0.65167, df = 1, p-value = 0.4195
Box.test(diff(TVAR), type = "Box-Pierce")
##
##  Box-Pierce test
##
## data:  diff(TVAR)
## X-squared = 0.10883, df = 1, p-value = 0.7415
Box.test(diff(random_walk), type = "Ljung-Box")
##
##  Box-Ljung test
##
## data:  diff(random_walk)
## X-squared = 0.65263, df = 1, p-value = 0.4192
Box.test(diff(TVAR), type = "Ljung-Box")
##
##  Box-Ljung test
##
## data:  diff(TVAR)
## X-squared = 0.10899, df = 1, p-value = 0.7413
The portmanteau tests (Box-Pierce, Ljung-Box) do not reject the null of no serial correlation in the differenced series (the alternative being serially correlated data).
Hence, everything is aligned with the preceding tests.
The differenced series of the second process has a noticeably higher variance.
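A quick numeric check of this claim; a sketch (the exact values depend on the random draw):
# sample variances of the two differenced series; from the residual
# standard errors reported above, the TVAR differences should be much larger
var(diff(random_walk))
var(diff(TVAR))
The density plots below, each with a normal overlay, make the same point visually: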
plot(density(diff(random_walk)))
density_series_1 <- density(diff(random_walk))
lines(density_series_1$x, dnorm(density_series_1$x),col=3)
plot(density(diff(TVAR)))
density_series_2 <- density(diff(TVAR))
lines(density_series_2$x, dnorm(density_series_2$x, mean(diff(TVAR)), sd(diff(TVAR))),col=4)
But now look at this:
par(mfrow=c(2,4))
plot(reg1)
plot(reg2)
Fat tails in the second series, scale-location and residuals-vs-leverage lines that are not flat, heteroscedastic residuals, and so on.
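The heteroscedasticity visible in these diagnostics can also be checked formally, for instance with a Breusch-Pagan test from the lmtest package; a sketch (lmtest is not used elsewhere in this post, and the outcome stated in the comment is an expectation, not a verified result):
library(lmtest)
# studentized Breusch-Pagan test for heteroscedastic residuals;
# one would expect reg2 to be flagged rather than reg1
bptest(reg1)
bptest(reg2)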
Note the structure in the regression residuals of the second series; those of the first look entirely like white noise.
par(mfrow=c(2,1))
plot(residuals(reg1),type="l")
plot(residuals(reg2),type="l")
Similarly, there is no structure in the differences of the first series, but the same cannot be said of the second.
par(mfrow=c(2,1))
plot(diff(random_walk),type="l")
plot(diff(TVAR),type="l")
And here is a plot of the squares of the differenced series: an indication of how much “power” each observation contributes to the variance.
“Power” is allocated uniformly over time in the first series, but not in the second.
par(mfrow=c(2,1))
plot(diff(random_walk)^2,type="l")
plot(diff(TVAR)^2,type="l")
And guess what? The squared differences are autocorrelated in the second series but not in the first.
par(mfrow=c(2,1))
acf(diff(random_walk)^2)
acf(diff(TVAR)^2)
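A formal counterpart to these ACF plots is a portmanteau test applied to the squared differences, in the spirit of the McLeod-Li test; a sketch (the lag of 10 is an arbitrary choice, and the p-values depend on the draw):
# Ljung-Box test on squared differences: a standard check for
# ARCH-type dependence that portmanteau tests on the levels miss
Box.test(diff(random_walk)^2, lag = 10, type = "Ljung-Box")
Box.test(diff(TVAR)^2, lag = 10, type = "Ljung-Box")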
Note that the Priestley-Subba Rao stationarity test (https://www.jstor.org/stable/2984336) can detect that the second series is not stationary. That is because, unlike the previous tests, it does NOT ignore the local structure of the series.
library("fractal")
## Loading required package: splus2R
## Loading required package: ifultools
##
## Attaching package: 'fractal'
## The following object is masked from 'package:tseries':
##
## surrogate
library("astsa")
stationarity(TVAR)
## N = 2049, nblock = 11, n_block_max = 187, dt = 1.0000
## Priestley-Subba Rao stationarity Test for TVAR
## ----------------------------------------------
## Samples used : 2049
## Samples available : 2046
## Sampling interval : 1
## SDF estimator : Multitaper
## Number of (sine) tapers : 5
## Centered : TRUE
## Recentered : FALSE
## Number of blocks : 11
## Block size : 186
## Number of blocks : 11
## p-value for T : 0
## p-value for I+R : 0.2372223
## p-value for T+I+R : 0
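One could also apply the same test to the differenced series. A hedged sketch (not run here): one would expect no rejection for the white-noise differences of the random walk, but a rejection for the TVAR differences, whose variance changes over time.
# PSR test on the differences (expected outcomes, not verified here)
stationarity(diff(random_walk))
stationarity(diff(TVAR))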
Stationarity is also rejected by the von Sachs and Neumann (2000) wavelet stationarity test, which likewise does NOT ignore local structure.
library("locits")
## Loading required package: wavethresh
## Loading required package: MASS
## WaveThresh: R wavelet software, release 4.6.8, installed
## Copyright Guy Nason and others 1993-2016
## Note: nlevels has been renamed to nlevelsWT
## Loading required package: igraph
##
## Attaching package: 'igraph'
## The following objects are masked from 'package:stats':
##
## decompose, spectrum
## The following object is masked from 'package:base':
##
## union
ans <- hwtos2(diff(TVAR))
## 10 9 8 7 6 5 4 3
plot(ans)
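For comparison, one could run the same wavelet test on the differenced random walk; since that series is white noise, one would expect (up to the size of the test) no rejections. A sketch (ans_rw is my own name, and the outcome is an expectation, not a verified output):
# same test on the differenced random walk; diff(random_walk) has
# length 2048 = 2^11, the dyadic sample size hwtos2 requires
ans_rw <- hwtos2(diff(random_walk))
plot(ans_rw)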
So the first series is indeed a Gaussian random walk. But what about the second? It is actually a time-varying autoregressive process of order 1, with AR coefficients drawn randomly from a uniform distribution around the unit root.
If you do the maths, you won't find the results surprising. But what is worth highlighting is that global tests (unit root tests, portmanteau tests, and so on) often perform terribly here: not only are they inappropriate for testing stationarity, they can also generate too many false positives.