Benchmarking merged.stack, reshape, and melt.data.table+dcast.data.table

The required packages

library(data.table)
library(ggplot2)
library(microbenchmark)
library(reshape2)
library(splitstackshape)
library(zoo)

## 
## Attaching package: 'zoo'
## 
## The following objects are masked from 'package:base':
## 
##     as.Date, as.Date.numeric

The sample data

dates <- seq(as.Date("2004-01-01"), by = 7, len = 52 * 10)
tickers <- letters
my.df <- data.frame(dates = as.Date(rep_len(dates, length(dates) * length(tickers))), 
    tickers = rep(tickers, each = length(dates)), SVI = runif(length(dates) * 
        length(tickers), min = 0, max = 100))
svi <- read.zoo(my.df, index.column = "dates", split = "tickers")
meansvi <- rollmean(svi, 52, align = "right")
my.zoo <- merge(svi, meansvi)

The reshaping functions

AM_1 <- function() {
    dt <- data.table(index = index(my.zoo), coredata(my.zoo), key = "index")
    setnames(dt, gsub("([a-z].*)\\.([a-z].*)", "\\2_\\1", names(dt)))
    reshape(dt, idvar = "index", varying = 2:ncol(dt), sep = "_", direction = "long")
}
AM_2 <- function() {
    merged.stack(data.frame(index = index(my.zoo), my.zoo), id.vars = "index", 
        var.stubs = c("\\.svi$", "\\.meansvi$"), sep = "\\.svi|\\.meansvi")
}
agstudy <- function() {
    A <- melt(data.table(index = index(my.zoo), coredata(my.zoo)), id = "index")
    A <- A[, `:=`(c("tickers", "variables", "variable"), {
        vv <- unlist(strsplit(as.character(variable), ".", fixed = TRUE))
        list(unlist(vv)[c(TRUE, FALSE)], unlist(vv)[c(FALSE, TRUE)], NULL)
    })]
    dcast.data.table(A, index + tickers ~ variables, value.var = "value")
}
orig <- function() {
    df <- data.frame(index = index(my.zoo), my.zoo)
    df <- stack(df, select = -index)
    df$index <- as.Date(rep_len(index(my.zoo), nrow(df)))
    df <- cbind(df, do.call(rbind, strsplit(as.character(df$ind), ".", fixed = TRUE)))
    reshape(df, direction = "wide", idvar = c("index", "1"), timevar = "2", 
        drop = "ind")
}

The benchmarks

x <- microbenchmark(AM_1(), AM_2(), agstudy(), orig(), times = 20)
x

## Unit: milliseconds
##       expr   min     lq median     uq   max neval
##     AM_1() 229.2 236.47 244.66 255.91 334.1    20
##     AM_2()  84.0  86.69  90.08  93.60 176.0    20
##  agstudy()  52.5  52.95  56.09  60.36 133.5    20
##     orig() 268.7 324.58 350.45 440.32 588.6    20

autoplot(x)

plot of chunk unnamed-chunk-4

Summary:

reshape is faster than your current approach, which requires stacking before you can reshape, so there's an unnecessary extra step.
merged.stack (AM_2) is not as quick as the bleeding-edge melt.data.table + dcast.data.table approaches, but (in my opinion) benefits from fairly straightforward syntax.

A quick explanation of the `merged.stack` solution.

The solution looks like this:

merged.stack(data.frame(index = index(my.zoo), my.zoo), 
             id.vars = "index", 
             var.stubs = c("\\.svi$", "\\.meansvi$"), 
             sep = "\\.svi$|\\.meansvi$")

Here, the var.stubs argument is a regular expression or a character string that uniquely identifies the column groups. For this question, I've set it to be “names that end with .svi and names that end with .meansvi”.

Presently, that is what gets used for the resulting column names in the reshaped data, but I may add an argument later to allow that to be overridden.

Generally, the sep argument would take a character. However, since the names are somewhat backwards from what reshape (and merged.stack) expects, we just “strip out” the part that represents the groups and assume the rest to be the “time” variable. (reshape expects names to be in the form of “group+sep+time”, so “varX_1”, “varX_2”, and so on. The names in this dataset, however, are in the form of “1_varX”, “2_varX”, which needs a little more special treatment.)