This publication is a companion to the talk presented at R/Finance 2013. The goal here is to show a sample of the code used to generate the results in the talk. We post here the code for one of the main empirical results (“Average over all regimes” - page 7). Most of the theoretical results are either given by close form solutions or by Monte Carlo simulating the random process and applying the code in this publication.
Load DJIA weekly returns from FRED using quandmod. Then normalize the data to be closer to constant volatility data. That is closer to a normal random variable.
require(quantmod)
require(TTR)
require(xts)
require(ggplot2)
# Load data
getSymbols("DJIA", src = "FRED")
## [1] "DJIA"
# Use weekly data
S0 <- na.locf(log(DJIA))
S <- to.weekly(S0)[, 4]
x <- diff(S)
x[1] <- 0
x[is.na(x)] <- 0
# Normalized data used to create spectrum Use simple measure of volatility
# to make the Data more gaussian.
mvol = SMA(abs(x), 8)
mvol.lag = lag(mvol, 1)
mvol.lag[is.na(mvol.lag)] = 1
# Gaussian RV: approx
y = x/mvol.lag
Next we list one of the key functions used in the calculations and empirical analysis:
# Given input data R (log-returns), we generate the spectrum: Sharpe (mean
# and standad deviation) vs Lag Used for all pictures and calculations
# with empirical data (or simulations)
generalTsMaUsual <- function(N = 200, step = 5, R) {
Nt <- dim(R)[1]
lookbc <- seq(step, N, step)
sig <- lapply(lookbc, function(n, y) {
apply(y, 2, function(z, p) SMA(z, p), p = n)
}, y = R)
a <- sapply(sig, function(x, y) {
Nt <- dim(y)[1]
aux0 <- x[1:(Nt - 1), , drop = FALSE] * y[2:Nt, , drop = FALSE]
sma1 <- mean(apply(aux0, 2, mean, na.rm = TRUE))
sma2 <- mean(apply(aux0, 2, sd, na.rm = TRUE))
return(cbind(sma1, sma2))
}, y = R)
sma <- cbind(lookbc, t(a))
return(sma)
}
Empirical analysis starts by first finding regimes in the 100+ years of the DJIA that can be considered approximately stationary. That is we want data that has a constant standard deviation and a constant drift. We assume that the auto-correlation depends only on time lag and it is the same during this stationary regime. Based on our results these are reasonable assumptions.
In order to get approximately constant variance we renormalize the weekly returns by a measure of standard deviation. In order to have constant drift we take a simple approach and try to find “regimes” in the re-normalized DJIA by fitting a piece wise constant linear function to the DJIA index. That is done using the package bfast (by Jan Verbesselt, Achim Zeileis, Rob Hyndman and Rogier De Jong). In reality one can do it also visually with better results since bfast is sometimes early or late when inflections take place.
In what follows we have run bfast with “h=0.015”(see bfast package:http://cran.r-project.org/web/packages/bfast/index.html) applied to DJIA monthly cumsum returns (not normalized).
Thereafter we split the normalized weekly returns (y) into list of normalized y returns per regime. Finally we apply the function generalTsMaUsual to calculate the spectrum (Sharpe vs Lag) of each regime. We expect to have pictures that approximate the “Case1” and “Case2” of page 4 and 5 of the presentation. The average over all regimes is finally calculated to result in the data points of figure in page 7. This is also replicated here in what follows. Note: the figures will not exactly match (here vs presentation) because the figure here uses additional data points.
The theoretical line in the presentation is calculated by either simulating ARMA process with drift (equal to the mean drift of y) and auto-correlation selected to match the data points or using the exact formula (page 4 of presentation). We do not present the code here but it is essentially the same code applied to simulated data, for instance.
load("../Data/yR")
# Apply a spectrum calculation per regim Spectrum = Sharpe vs Lag (N)
av <- sapply(yR, function(x) {
if (dim(x)[1] < 70)
return(rep(NA, 43))
a <- generalTsMaUsual(N = 43, step = 1, x)
return(a[, 2]/a[, 3])
})
av <- t(av)
df.full = data.frame(Lag = 1:43, Sharpe = sqrt(52) * apply(av, 2, mean, na.rm = TRUE))
pF = ggplot() + geom_point(data = df.full, aes(x = Lag, y = Sharpe, size = 2)) +
theme(legend.position = "none")
print(pF)