Last Updated: 2021-03-14 15:33:16 UTC
- added macOS benchmarks (KVM virtual machine for now)
Benchmark computing Matrix Profile
This benchmark uses the current Rcpp implementation and a real dataset, the Italian power demand, which contains almost 30k observations.
# Download the dataset (a single unnamed column, which readr names X1)
csv <- readr::read_csv("https://raw.githubusercontent.com/matrix-profile-foundation/mpf-datasets/05efe885cff4b2266067ad62c4f6fa2b537ad2a2/real/italianpowerdemand.csv", col_names = FALSE)
# Keep the observations as a plain numeric vector
dataset <- as.numeric(csv$X1)
The dataset
Here is a plot of the dataset:

Let’s run the benchmark using the bench package instead of microbenchmark, for two reasons: it stays compatible with the Tidyverse, and it reports additional information about memory allocation and garbage collection.
The method is to run each algorithm over a grid of data sizes and window sizes, so we can compare performance in more than one scenario. Let’s warm up with a sample test:
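To illustrate the memory and garbage collection columns mentioned above, here is a minimal sketch. It assumes the bench package is installed and times a stand-in expression (`sort()` on a synthetic vector `x`), not one of the matrix profile algorithms; the names `x` and `res` are only illustrative.

```r
# Synthetic input, used only to have something to time
x <- runif(1e5)

# bench::mark() returns a tibble with one row per expression
res <- bench::mark(
  sorted = sort(x),
  min_iterations = 3
)

# Timing (median), allocated memory (mem_alloc) and the number of
# garbage collections seen during the run (n_gc)
res[, c("expression", "median", "mem_alloc", "n_gc")]
```

The same columns are available in the benchmark results discussed in the rest of this article.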
sample <- head(dataset, 1000)
w_size <- 100
bench::mark(stomp = stomp(sample, w_size, progress = FALSE))
So it works.
Now let’s start the main (and intense) task.
Desktop:
- Intel(R) Core(TM) i7-7700 CPU @ 3.60GHz
- 32 GB RAM
- Windows 10 64-bit build 10.0.18363.1316
- WSL2 Ubuntu 20.04.1 LTS (GNU/Linux 5.4.72-microsoft-standard-WSL2 x86_64)
Raspberry Pi 3 B:
- Quad Core 1.2GHz Cortex-A53 ARMv8 64bit CPU
- 1 GB RAM
- Raspberry Pi OS
Algorithms to be evaluated:
- STAMP (single and with 4 threads)
- STOMP (single and with 4 threads)
- SCRIMP (single and with 4 threads)
- MPX (single and with 4 threads)
The outputs are not compared at first (hence check = FALSE), to avoid wasting CPU time on the small variations that may occur between algorithms. The code below is the one used to compute the results. They were saved, and this article now loads the saved data to speed up rendering.
The multithreaded implementation uses Intel TBB; some systems may fall back to TinyThread++ (at least for now, TBB has worked on all tested platforms, including Solaris and ARMv8). When that happens, the main speed issue is probably the mutex implementation of TinyThread++, which is not as efficient as TBB's.
This is the code used to benchmark:
# changing n_workers to 4 will use 4 threads to compute
results <- bench::press(
  d_size = c(5000, 10000, 15000, 20000, 25000),
  w_size = c(100, 300, 500, 700, 900),
  {
    data <- head(dataset, d_size)
    bench::mark(
      stamp = stamp(data, w_size, progress = FALSE, n_workers = 1),
      stomp = stomp(data, w_size, progress = FALSE, n_workers = 1),
      scrimp = scrimp(data, w_size, progress = FALSE, n_workers = 1),
      mpx = mpx(data, w_size, progress = FALSE, n_workers = 1),
      check = FALSE,
      min_iterations = 3
    )
  }
)
save(results, file = "bench.rda")
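To get a feel for how much work each cell of this grid represents, the sketch below enumerates the same 25 combinations with base R and computes the number of subsequences in each one (data size minus window size plus one, which is also the length of the resulting matrix profile). The `n_pairs` column is only a rough upper bound on the pairwise distances involved: it ignores the exclusion zone and any symmetry an algorithm may exploit. The names `grid`, `n_subseq` and `n_pairs` are illustrative.

```r
# Same parameter values as in the bench::press() call above
grid <- expand.grid(
  d_size = c(5000, 10000, 15000, 20000, 25000),
  w_size = c(100, 300, 500, 700, 900)
)

# Number of subsequences of length w_size in a series of length d_size
grid$n_subseq <- grid$d_size - grid$w_size + 1

# Rough upper bound on the number of subsequence pairs
grid$n_pairs <- grid$n_subseq^2

# Lightest scenarios first
grid[order(grid$n_pairs), ]
```

Note that, for a fixed data size, a larger window slightly reduces the number of subsequences, so the data size dominates the amount of work.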
Summary of benchmarks


A curious comparison: the desktop with a single thread versus the Raspberry Pi 3 B with four threads:

Detailed benchmarks
Single thread experiments
Four threads experiments
ARM


The Cubietruck Plus (ARM) with eight threads is depicted below. I found out that R sometimes does not have all cores available, and this behavior is unpredictable [link].
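Given that the available cores may vary, one defensive option is to cap the requested number of workers before starting a run. The sketch below uses `parallel::detectCores()` from base R; `safe_workers()` is a hypothetical helper, not part of the benchmarked package. Keep in mind that `detectCores()` reports what the machine has, which is not necessarily what the current R process is allowed to use, so the result is only an upper bound.

```r
# Hypothetical helper: never request more workers than R reports
safe_workers <- function(requested) {
  available <- parallel::detectCores()  # may be NA on some platforms
  if (is.na(available)) {
    return(1L)                          # fall back to a single thread
  }
  as.integer(max(1L, min(requested, available)))
}

# Ask for eight threads, get at most what is reported as available
n_workers <- safe_workers(8)
n_workers
```

The resulting value could then be passed as the `n_workers` argument used in the benchmark code.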


