Using LeastFracDiff to analyse the stationarity of the S&P 500 stock market index.

Luis Damiano

2018-07-12

LeastFracDiff provides minimal tools for finding an approximation to the least fractional difference order $d$ that transforms a time series into a stationary one, as suggested by de Prado (2018)¹. As we typically work with log prices, we rely on quantmod to download data for our example.

library(LeastFracDiff)
suppressPackageStartupMessages(library(quantmod))

## Warning: package 'quantmod' was built under R version 3.4.4

## Warning: package 'xts' was built under R version 3.4.4

## Warning: package 'zoo' was built under R version 3.4.4

penv <- new.env()
getSymbols(
  Symbols = c("SPY", "QQQ"),
  src     = "yahoo",
  from    = "1990-01-01", 
  to      = "2018-06-30",
  env     = penv
)

## 'getSymbols' currently uses auto.assign=TRUE by default, but will
## use auto.assign=FALSE in 0.5-0. You will still be able to use
## 'loadSymbols' to automatically load data. getOption("getSymbols.env")
## and getOption("getSymbols.auto.assign") will still be checked for
## alternate defaults.
## 
## This message is shown once per session and may be disabled by setting 
## options("getSymbols.warning4.0"=FALSE). See ?getSymbols for details.

## 
## WARNING: There have been significant changes to Yahoo Finance data.
## Please see the Warning section of '?getSymbols.yahoo' for details.
## 
## This message is shown once per session and may be disabled by setting
## options("getSymbols.yahoo.warning"=FALSE).

## [1] "SPY" "QQQ"

p    <- penv$SPY
x    <- log(Cl(na.omit(p)))

We approximate the minimal fractional differentiation coefficient $d$ so that the p-value from the Augmented Dickey Fuller test is below an arbitrary threshold (say, $0.05$). The function min_d takes three arguments: the time series x, a function fun taking a time series and returning a p-value, and a threshold. The package includes a function adf relying on the package tseries for convenience.

print(adf)

## function (x) 
## {
##     suppressWarnings(tseries::adf.test(x)$p.value)
## }
## <environment: namespace:LeastFracDiff>

d   <- min_d(x, adf, 0.05)
print(d)

## [1] 0.2

A series can be transformed via the function diff_series, which in turns relies on the package fracdiff.

dx  <- diff_series(x, d)
plot(
  cbind(
    "Log Price" = x, 
    "Differenced Log Price" = dx
  ),
  main       = colnames(x),
  grid.col   = "white",
  legend.loc = "right"
)

Instead of keeping only the least $d$, we may explore the relationship between the order of fractional differentiation, some measures of stationarity, and a metric for loss of information with respect to the original series. The function grid_d takes the already discussed arguments x and fun to produce a grid for values of $d$ from dFrom to dTo increase by dBy each step.

g   <- grid_d(x, adf, dFrom = 0, dTo = 1, dBy = 0.1)

knitr::kable(g)

d	pValue	pearsonCor	spearmanCorrelation
0.0	0.6615901	1.0000000	1.0000000
0.1	0.1617780	0.9888802	0.9916126
0.2	0.0100000	0.9492172	0.9635403
0.3	0.0100000	0.8684203	0.9053865
0.4	0.0100000	0.7333514	0.7968286
0.5	0.0100000	0.5481407	0.6239581
0.6	0.0100000	0.3542314	0.4172065
0.7	0.0100000	0.2019763	0.2469496
0.8	0.0100000	0.1060958	0.1274735
0.9	0.0100000	0.0533047	0.0544853
1.0	0.0100000	0.0263716	0.0126136

plot(g, ylab = "ADF p-value")
abline(h = 0.05)

matplot(
  g[, 1], g[, -1], type = "l", ylab = "", xlab = "d",
  col = 1:3, lty = 1, lwd = 1
)
legend(
  x      = "topright",
  legend = c("p-value", "Pearson", "Spearman"),
  bty    = "n",
  cex    = 0.7,
  lty    = 1,
  col    = 1:3
)

We can also compute min $d$. The function roll_min_d takes three mandatory arguments: x a time series, width the number of observations per rolling window, and by the number of time steps between each window. Additionally, the user may pass other arguments for min_d or zoo::rollapply.

For example, the following snippet minimizes $d$ so that the p-value of the ADF statistic is below a threshold of $0.05$. The minimization is run on a rolling window with approximately five years of daily close prices. Each run is separated by a month approximately.

dRolling <- roll_min_d(x, fun = adf, threshold = 0.05, width = 5 * 252, by = 21)

plot(
  na.omit(dRolling),
  main       = sprintf(
    "Auto d for %s (5ys daily obs per window - one window per month)", 
    colnames(x)
  ),
  ylab       = "min d s.t. ADF p-value = 0.05",
  grid.col   = "white",
  legend.loc = "topleft"
)

Compare the fractional differentiation order found using the entire time series $d^ = $ versus the time-varying quantity observed for each window. We observe that min $d$ is approximately zero for the rolling window ending on March 2015, i.e. there was not enough evidence to reject the null hypothesis that log prices had a non-stationary distribution according to ADF.

subx   <- last(x["/2015-04-01"], 5 * 252)
plot(
  subx,
  main       = colnames(x),
  grid.col   = "white",
  legend.loc = "right"
)

print(tseries::adf.test(subx))

## 
##  Augmented Dickey-Fuller Test
## 
## data:  subx
## Dickey-Fuller = -3.4945, Lag order = 10, p-value = 0.04283
## alternative hypothesis: stationary

The code to expand this analysis to other assets is trivial:

dMat <- do.call(
  cbind,
  lapply(penv, function(p) {
    x    <- log(Cl(na.omit(p)))
    as.numeric(roll_min_d(x, fun = adf, threshold = 0.05, width = 5 * 252, by = 21))
  })
)

## Warning in (function (..., deparse.level = 1) : number of rows of result is
## not a multiple of vector length (arg 1)

matplot(
  na.omit(dMat), type = "l", ylab = "d", xlab = "Time",
  col = 1:2, lty = 1, lwd = 1
)
legend(
  x      = "topright",
  legend = colnames(dMat),
  bty    = "n",
  cex    = 0.7,
  lty    = 1,
  col    = 1:2
)

Final remarks

A more granular value for $d$ may be find at the cost of computational complexity by setting a smaller value for dBy.
P-values from other tests, or completely different metrics, may be used by changing the argument fun. For example, set fun = kpss to run the KPSS test.

de Prado, M. L. (2018). Advances in financial machine learning. John Wiley & Sons.↩