library(data.table)
library(ggplot2)
library(microbenchmark)
library(reshape2)
library(splitstackshape)
library(zoo)
##
## Attaching package: 'zoo'
##
## The following objects are masked from 'package:base':
##
## as.Date, as.Date.numeric
dates <- seq(as.Date("2004-01-01"), by = 7, len = 52 * 10)
tickers <- letters
my.df <- data.frame(dates = as.Date(rep_len(dates, length(dates) * length(tickers))),
tickers = rep(tickers, each = length(dates)), SVI = runif(length(dates) *
length(tickers), min = 0, max = 100))
svi <- read.zoo(my.df, index.column = "dates", split = "tickers")
meansvi <- rollmean(svi, 52, align = "right")
my.zoo <- merge(svi, meansvi)
AM_1 <- function() {
dt <- data.table(index = index(my.zoo), coredata(my.zoo), key = "index")
setnames(dt, gsub("([a-z].*)\\.([a-z].*)", "\\2_\\1", names(dt)))
reshape(dt, idvar = "index", varying = 2:ncol(dt), sep = "_", direction = "long")
}
AM_2 <- function() {
merged.stack(data.frame(index = index(my.zoo), my.zoo), id.vars = "index",
var.stubs = c("\\.svi$", "\\.meansvi$"), sep = "\\.svi|\\.meansvi")
}
agstudy <- function() {
A <- melt(data.table(index = index(my.zoo), coredata(my.zoo)), id = "index")
A <- A[, `:=`(c("tickers", "variables", "variable"), {
vv <- unlist(strsplit(as.character(variable), ".", fixed = TRUE))
list(unlist(vv)[c(TRUE, FALSE)], unlist(vv)[c(FALSE, TRUE)], NULL)
})]
dcast.data.table(A, index + tickers ~ variables, value.var = "value")
}
orig <- function() {
df <- data.frame(index = index(my.zoo), my.zoo)
df <- stack(df, select = -index)
df$index <- as.Date(rep_len(index(my.zoo), nrow(df)))
df <- cbind(df, do.call(rbind, strsplit(as.character(df$ind), ".", fixed = TRUE)))
reshape(df, direction = "wide", idvar = c("index", "1"), timevar = "2",
drop = "ind")
}
x <- microbenchmark(AM_1(), AM_2(), agstudy(), orig(), times = 20)
x
## Unit: milliseconds
## expr min lq median uq max neval
## AM_1() 229.2 236.47 244.66 255.91 334.1 20
## AM_2() 84.0 86.69 90.08 93.60 176.0 20
## agstudy() 52.5 52.95 56.09 60.36 133.5 20
## orig() 268.7 324.58 350.45 440.32 588.6 20
autoplot(x)
Summary:
reshape is faster than your current approach, which requires stacking before you can reshape, so there's an unnecessary extra step.merged.stack (AM_2) is not as quick as the bleeding-edge melt.data.table + dcast.data.table approaches, but (in my opinion) benefits from fairly straightforward syntax.merged.stack solution.The solution looks like this:
merged.stack(data.frame(index = index(my.zoo), my.zoo),
id.vars = "index",
var.stubs = c("\\.svi$", "\\.meansvi$"),
sep = "\\.svi$|\\.meansvi$")
Here, the var.stubs argument is a regular expression or a character string that uniquely identifies the column groups. For this question, I've set it to be “names that end with .svi and names that end with .meansvi”.
Presently, that is what gets used for the resulting column names in the reshaped data, but I may add an argument later to allow that to be overridden.
Generally, the sep argument would take a character. However, since the names are somewhat backwards from what reshape (and merged.stack) expects, we just “strip out” the part that represents the groups and assume the rest to be the “time” variable. (reshape expects names to be in the form of “group+sep+time”, so “varX_1”, “varX_2”, and so on. The names in this dataset, however, are in the form of “1_varX”, “2_varX”, which needs a little more special treatment.)
By the way, I do plan to update merged.stack to benefit from the perks that Arun has brought to “data.table” with respect to reshaping data, but I'm waiting until it's a stable release that's out on CRAN.