To start, choose three main parameters.
To finish, evaluate the results, for example with cluster validity indices (CVIs).
\[ DTW(x,y) = \min_{\pi \in \Lambda(n,m)} D_{x,y}(\pi) \] (1.1)
\[ D_{x,y}(\pi) = \sum_{i=1}^{|\pi|} \varphi(x_{\pi_1(i)}, y_{\pi_2(i)}) \] (1.2)
\[ \kappa_{GA}(x,y) = \sum_{\pi \in \Lambda(n,m)} \prod_{i=1}^{|\pi|} K(x_{\pi_1(i)}, y_{\pi_2(i)}) \] (2)
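To make the DTW definition concrete, here is a minimal base-R sketch that finds the cheapest alignment by dynamic programming. It assumes the local cost φ is the absolute difference and applies no window constraint. The function name dtw_distance is hypothetical, and this is not how dtwclust or dtw implement the distance.

```r
# Minimal DTW sketch (assumption: phi is the absolute difference, no window)
dtw_distance <- function(x, y, phi = function(a, b) abs(a - b)) {
  n <- length(x)
  m <- length(y)
  # cost[i + 1, j + 1] holds the cheapest alignment of x[1:i] with y[1:j]
  cost <- matrix(Inf, nrow = n + 1, ncol = m + 1)
  cost[1, 1] <- 0
  for (i in seq_len(n)) {
    for (j in seq_len(m)) {
      # allowed predecessors: diagonal step, step in x only, step in y only
      step <- min(cost[i, j], cost[i, j + 1], cost[i + 1, j])
      cost[i + 1, j + 1] <- phi(x[i], y[j]) + step
    }
  }
  cost[n + 1, m + 1]
}

x <- c(0, 1, 2, 1, 0)
y <- c(0, 0, 1, 2, 1, 0)   # same shape as x, delayed by one time point
dtw_distance(x, y)                            # 0: warping absorbs the delay
dtw_distance(c(1, 2, 3), c(2, 2, 2, 3, 4))    # 2: only the end points mismatch
```

The two series in the first call have different lengths, which a lock-step distance such as the Euclidean distance cannot handle without resampling.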
Note
For hierarchical clustering with DTW, a prototype of the series for further characterization must be computed separately, using the mean or, preferably, another shape-based approach. (*)
May require z-normalization so that each series has mean 0 and standard deviation 1; recommended when the goal is to detect structural rather than amplitude-driven changes (among other exceptions, always include it when the function output is normalized). See the zscore function.
May require that the time series have equal length. See the reinterpolate function for a possible solution using linear interpolation.
Note
(*) generally preferred; compact clusters
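The two preprocessing steps mentioned in the notes above, z-normalization and reinterpolation to equal length, can be sketched in base R as follows. The helper names z_normalize and reinterpolate_linear are hypothetical, and the toy series are made up for illustration. dtwclust's own zscore and reinterpolate functions may differ in details.

```r
# z-normalize one series: subtract the mean, divide by the sample s.d.
z_normalize <- function(x) (x - mean(x)) / sd(x)

# Linearly reinterpolate a series onto new_length equally spaced points
reinterpolate_linear <- function(x, new_length) {
  stats::approx(seq_along(x), x, n = new_length)$y
}

# Hypothetical toy series of unequal length
series <- list(a = c(1, 2, 3, 4, 5), b = c(10, 30, 20))

equal_length <- lapply(series, reinterpolate_linear, new_length = 5)
equal_length$b   # 10 20 30 25 20

normalized <- lapply(equal_length, z_normalize)
sapply(normalized, mean)   # approximately 0 for both series
sapply(normalized, sd)     # 1 for both series
```

Reinterpolating before normalizing keeps the mean and standard deviation exact on the final series.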
A good clustering result would look like this:
The code below demonstrates the potential of time-series clustering with the R package dtwclust by Alexis Sarda-Espinosa, which was also the main reference for most of the preceding notes and figures. The R script is adapted from exercises in the dtwclust vignette. dtwclust also builds on the flexclust and dtw packages, which are worth a look.
# synthetically generated control charts
library(tidyr)    # gather()
library(dtwclust) # tsclust(), zscore()
library(dplyr)    # %>%
library(ggplot2)  # plotting
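df <- read.table("http://kdd.ics.uci.edu/databases/synthetic_control/synthetic_control.data",
                 header = FALSE)
# wide to long: keep the first 50 rows and 20 columns; each column (V1..V20) is treated as one series
df_long <- gather(df[1:50, 1:20])
# make a timepoint column (1..50, repeated for each of the 20 series)
df_long$time <- rep(1:50, 20)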
# plot by key
df_long %>%
  ggplot(aes(x = time, y = value, color = key)) +
  geom_line(size = 0.2) +
  ggtitle("Control chart sequences") +
  facet_wrap(~ key, scales = "free_x", nrow = 2)
df_list <- as.list(utils::unstack(df_long, value ~ key))
df_list_z <- dtwclust::zscore(df_list)
# hierarchical clustering with a window size of 5 (10% of the series length) for k = 2 to 20 clusters
cluster_dtw_h <- list()
for (i in 2:20) {
  cluster_dtw_h[[i]] <- tsclust(df_list_z, type = "h", k = i, distance = "dtw",
                                control = hierarchical_control(method = "complete"),
                                seed = 390, preproc = NULL,
                                args = tsclust_args(dist = list(window.size = 5L)))
}
# take a look at the object
cluster_dtw_h[[20]]
## hierarchical clustering with 20 clusters
## Using dtw distance
## Using PAM (Hierarchical) centroids
## Using method complete
##
## Time required for analysis:
## user system elapsed
## 0.998 0.065 1.065
##
## Cluster sizes with average intra-cluster distance:
##
## size av_dist
## 1 1 0
## 2 1 0
## 3 1 0
## 4 1 0
## 5 1 0
## 6 1 0
## 7 1 0
## 8 1 0
## 9 1 0
## 10 1 0
## 11 1 0
## 12 1 0
## 13 1 0
## 14 1 0
## 15 1 0
## 16 1 0
## 17 1 0
## 18 1 0
## 19 1 0
## 20 1 0
# some cluster information
cluster_dtw_h[[4]]@clusinfo
## size av_dist
## 1 6 33.11645
## 2 3 28.26739
## 3 2 22.15062
## 4 9 35.22409
# plot dendrogram for k= 4
plot(cluster_dtw_h[[4]])
# The series and the obtained prototypes can be plotted too
plot(cluster_dtw_h[[4]], type = "sc")
# the representative prototype
plot(cluster_dtw_h[[4]], type = "centroid")
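The final step, evaluating the results, can be sketched with the cvi function from dtwclust (see the references). The example below is a minimal, self-contained sketch on hypothetical toy data, not the control chart data used above. It compares the Silhouette index, which is to be maximized, across a few values of k. The object names toy, fits and sil are made up for illustration.

```r
library(dtwclust)

set.seed(42)
# Hypothetical toy data: five noisy sine waves and five noisy linear trends
tp <- seq(0, 2 * pi, length.out = 30)
toy <- c(
  lapply(1:5, function(i) sin(tp) + rnorm(30, sd = 0.1)),
  lapply(1:5, function(i) seq(-1, 1, length.out = 30) + rnorm(30, sd = 0.1))
)
names(toy) <- paste0("s", 1:10)
toy_z <- zscore(toy)

# One hierarchical clustering per candidate k, mirroring the loop above
ks <- 2:4
fits <- lapply(ks, function(k) {
  tsclust(toy_z, type = "h", k = k, distance = "dtw",
          control = hierarchical_control(method = "complete"),
          args = tsclust_args(dist = list(window.size = 3L)))
})

# Silhouette index per k; higher values indicate better separated clusters
sil <- vapply(fits, function(f) unname(cvi(f, type = "Sil")), numeric(1))
names(sil) <- paste0("k_", ks)
sil
names(which.max(sil))
```

A single index rarely settles the choice of k, so it is worth comparing several indices (for example with type = "internal") and inspecting the cluster plots as well.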
References
https://cran.r-project.org/web/packages/dtwclust/vignettes/dtwclust.pdf
http://www.sthda.com/english/wiki/print.php?id=237
https://cran.r-project.org/web/packages/dtwclust/dtwclust.pdf
https://cran.r-project.org/web/packages/flexclust/flexclust.pdf
https://cran.r-project.org/web/packages/dtw/dtw.pdf
https://rdrr.io/cran/dtwclust/man/cvi.html
https://www.rdocumentation.org/packages/dtwclust/versions/3.1.1/topics/reinterpolate
Arbelaitz O, Gurrutxaga I, Muguerza J. An extensive comparative study of cluster validity indices. Pattern Recognition. 2013;46:243–256. doi:10.1016/j.patcog.2012.07.021.
Cuturi M. Fast Global Alignment Kernels. In: Proceedings of the 28th International Conference on Machine Learning (ICML-11). 2011. p. 929–936.
Ratanamahatana CA, Keogh E. Everything you know about Dynamic Time Warping is Wrong. In: Proc. of KDD Workshop on Mining Temporal and Sequential Data. 2004.
Sarda-Espinosa A. Comparing Time-Series Clustering Algorithms in R Using the dtwclust Package. 2017; p. 1–41.
Sarda-Espinosa A. dtwclust: Time Series Clustering Along with Optimizations for the Dynamic Time Warping Distance version 5.1.0. 2017.