Time-series clustering

Clustering is an unsupervised data mining technique.
The goal is to form homogeneous groups (clusters) of objects, with maximum similarity within clusters and minimum similarity between clusters.
It is often used as an exploratory technique in time-series visualization.
Clusters are constructed by considering each series as a whole.

To start, choose three main parameters:

Distance measure (quantifies dissimilarity)
Prototype (summarizes the characteristics of all series in a cluster)
Clustering algorithm (most commonly partitional or hierarchical)

To finish, evaluate the results:

Cluster validity indices (CVIs)

I. Distance measures

Euclidean distance

most commonly used distance measure
main limitations:
- only for series of equal length
- sensitive to time shifts
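
Both limitations can be seen in a few lines of base R. This is only a sketch: the helper name euclidean_dist and the sine-wave data are illustrative assumptions, not part of dtwclust. Two series with the same shape but a small time shift get a non-zero distance, and series of unequal length are rejected.

```r
# Euclidean distance between two numeric series (illustrative helper)
euclidean_dist <- function(x, y) {
  stopifnot(length(x) == length(y))  # not defined for unequal lengths
  sqrt(sum((x - y)^2))
}

t_idx <- seq(0, 2 * pi, length.out = 50)
a <- sin(t_idx)
b <- sin(t_idx - 0.5)  # same shape as a, shifted in time

d_same  <- euclidean_dist(a, a)  # identical series
d_shift <- euclidean_dist(a, b)  # penalized although the shape is the same

# unequal lengths raise an error instead of returning a distance
unequal <- try(euclidean_dist(a, a[-1]), silent = TRUE)
```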

Dynamic time warping (DTW) distance

shape-based; overcomes the limitations of Euclidean distance
compares two series by finding the optimal warping path between them
builds a local cost matrix (LCM) and traverses it to find the path with the lowest cumulative cost
needs/constraints:
- choice of step pattern
- choice of window that limits the area of the LCM that is searched (best size unknown a priori, so test several)
- can handle series of unequal length (slanted band vs. Sakoe-Chiba window)
- minimum alignment and divergence functions, Eq. 1.1 and 1.2
- computationally expensive
- (*)
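
The LCM traversal can be sketched in base R as follows. This is an illustrative implementation, not the optimized one in dtwclust or dtw. The function name dtw_dist is hypothetical, the local cost is assumed to be the absolute difference, the step pattern is the simple unweighted one (called symmetric1 in the dtw package), and the optional window is a plain Sakoe-Chiba band.

```r
# Naive DTW distance (illustrative; O(n * m) time and memory)
dtw_dist <- function(x, y, window = NULL) {
  n <- length(x)
  m <- length(y)
  lcm <- abs(outer(x, y, "-"))      # local cost matrix (assumed L1 cost)
  acc <- matrix(Inf, n + 1, m + 1)  # accumulated cost, padded with Inf
  acc[1, 1] <- 0
  for (i in seq_len(n)) {
    for (j in seq_len(m)) {
      # Sakoe-Chiba band: skip cells too far from the diagonal
      if (!is.null(window) && abs(i - j) > window) next
      acc[i + 1, j + 1] <- lcm[i, j] +
        min(acc[i, j + 1], acc[i + 1, j], acc[i, j])
    }
  }
  acc[n + 1, m + 1]  # cost of the optimal warping path
}

d_free     <- dtw_dist(c(1, 2, 3), c(2, 3, 4))              # warping allowed
d_diagonal <- dtw_dist(c(1, 2, 3), c(2, 3, 4), window = 0)  # diagonal only
d_unequal  <- dtw_dist(c(0, 1, 1, 0), c(0, 1, 0))           # unequal lengths
```

Here d_free is 2 and d_diagonal is 3: a narrower window can only increase the distance. Note that this plain band returns Inf when the difference in lengths exceeds the window.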

Global alignment kernel (GAK) distance

measures similarity between time series using kernels
uses a local similarity function (K in Eq. 2)
instead of keeping only the best path at the lowest cost, combines the scores of all alignments (soft-minimum of all alignment scores)

\[ DTW(x,y) = \min_{\pi \in \Lambda(n,m)} D_{x,y}(\pi) \] (1.1)

\[ D_{x,y}(\pi) = \sum_{i=1}^{|\pi|} \varphi(x_{\pi_1(i)}, y_{\pi_2(i)}) \] (1.2)

\[ \kappa_{GA}(x,y) = \sum_{\pi \in \Lambda(n,m)} \prod_{i=1}^{|\pi|} K(x_{\pi_1(i)}, y_{\pi_2(i)}) \] (2)

where \pi is an alignment between two time series x and y of lengths n and m, and \Lambda(n,m) is the set of all such alignments
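
Eq. 2 sums over every alignment, which can be computed with a recursion over the same grid used for warping paths, taking a sum where DTW takes a minimum and a product where DTW adds costs. The base R sketch below is a naive illustration only. The function name gak_naive is hypothetical, the local kernel K is assumed to be a plain Gaussian with bandwidth sigma, and there is no normalization or log-space arithmetic, so it should not be read as the GAK implementation in dtwclust.

```r
# Naive global alignment kernel (illustrative; values can under/overflow
# for long series because no log-space arithmetic is used)
gak_naive <- function(x, y, sigma = 1) {
  n <- length(x)
  m <- length(y)
  k_local <- exp(-outer(x, y, "-")^2 / (2 * sigma^2))  # assumed Gaussian K
  M <- matrix(0, n + 1, m + 1)
  M[1, 1] <- 1
  for (i in seq_len(n)) {
    for (j in seq_len(m)) {
      # every alignment reaching (i, j) arrives from one of three neighbours
      M[i + 1, j + 1] <- k_local[i, j] *
        (M[i, j + 1] + M[i + 1, j] + M[i, j])
    }
  }
  M[n + 1, m + 1]
}

k_identical <- gak_naive(c(0, 0), c(0, 0))  # all K = 1, so this counts alignments
k_single    <- gak_naive(c(0, 1), 0)        # only one alignment is possible
```

For two constant series of length 2 every local kernel value is 1, so the result equals the number of alignments in \Lambda(2,2), which is 3.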

II. Prototype

Mean or median

the mean (or median) across series at each time point
often a poor choice (can affect convergence)
only for series of equal length

Partitioning around medoids (PAM)

the prototype is a medoid: one of the time series in the cluster itself
facilitates computation (the distance matrix is “re-usable”)
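
The difference between a mean prototype and a medoid can be sketched in base R. The three toy series and the use of Euclidean distance are assumptions made for illustration. With a DTW distance matrix the medoid step would be the same.

```r
# three toy series of equal length, one per row
series <- rbind(a = c(1, 2, 3, 4),
                b = c(2, 3, 4, 5),
                c = c(9, 9, 9, 9))

# mean prototype: average at each time point (a new, synthetic series)
mean_prototype <- colMeans(series)

# medoid: the existing series with the smallest total distance to the others
d <- as.matrix(dist(series))  # pairwise distances, computed once
medoid_name <- names(which.min(rowSums(d)))
medoid_prototype <- series[medoid_name, ]
```

The medoid is picked directly from the precomputed distance matrix, so no new distances are needed when the prototype changes.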

DTW barycenter averaging
Shape extraction
Fuzzy-based prototypes

Note

for hierarchical clustering with DTW, a prototype of the series needed for further characterization must be computed separately, using the mean or, preferably, a shape-based approach.(*)
may require z-normalization (output with mean 0 and s.d. 1); recommended when detecting structural rather than amplitude-driven changes (with some exceptions; always include it when function output is normalized). See the zscore function.
may require time series of equal length. See the reinterpolate function for a possible solution using linear interpolation.
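
What these two preprocessing steps do can be illustrated in base R. The helper names z_normalize and reinterp are hypothetical stand-ins written for this sketch. In practice the zscore and reinterpolate functions from dtwclust would be used instead.

```r
# z-normalization: rescale a series to mean 0 and s.d. 1
# (undefined for a constant series, where sd(x) is 0)
z_normalize <- function(x) (x - mean(x)) / sd(x)

# linear interpolation of a series to a new length
reinterp <- function(x, new_length) {
  approx(seq_along(x), x, n = new_length)$y
}

z <- z_normalize(c(1, 2, 3, 4, 5))

short_series <- c(0, 10)
stretched <- reinterp(short_series, 3)  # midpoint is interpolated linearly
resized   <- reinterp(1:7, 10)          # 7 points resampled to 10
```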

III. Time-series clustering algorithms

Hierarchical clustering

hierarchical grouping to form clusters
agglomerative
need to specify the pairwise similarities (e.g. DTW) and the similarity measure between groups (linkage method)
linkage methods: e.g. single (closest pair), complete (furthest pair)(*), Ward's; DIANA is a divisive alternative
unlike partitional clustering, no need to pre-specify the k clusters
produces a dendrogram from which the best k clusters can be deduced (vs. CVIs)
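
The agglomerative procedure can be sketched with base R's hclust. The toy series and the Euclidean distance are assumptions for illustration. Since hclust accepts any dist object, a DTW distance matrix converted with as.dist could be substituted.

```r
# five toy series, one per row: two flat-low and three flat-high shapes
x <- rbind(s1 = c(0, 1, 0, 1),
           s2 = c(0, 1, 0, 1.2),
           s3 = c(5, 6, 5, 6),
           s4 = c(5, 6, 5, 6.1),
           s5 = c(5.2, 6, 5, 6))

d  <- dist(x)                         # pairwise dissimilarities
hc <- hclust(d, method = "complete")  # complete linkage (furthest pair)

# k is chosen after the tree is built, by cutting the dendrogram
groups <- cutree(hc, k = 2)

# plot(hc) would draw the dendrogram
```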

Partitional clustering

k random centroids are initialized, the distance from each centroid to all data is computed, and objects are assigned to the closest cluster (iterative)
main disadvantages:
1. must pre-specify k
2. stochastic: random start, and may converge to a local optimum
advantages: best for larger data sets.
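
Base R's kmeans is a simple partitional algorithm and shows both disadvantages: k must be given up front, and the result depends on the random start. It is used here only as a stand-in, with Euclidean distance and mean centroids on toy data. For time series, tsclust with type = "partitional" in dtwclust plays this role.

```r
# five toy series, one per row
x <- rbind(s1 = c(0, 1, 0, 1),
           s2 = c(0, 1, 0, 1.2),
           s3 = c(5, 6, 5, 6),
           s4 = c(5, 6, 5, 6.1),
           s5 = c(5.2, 6, 5, 6))

set.seed(390)  # fix the random initialization for reproducibility

# k is pre-specified; nstart repeats the random start and keeps the best
# run, which reduces the risk of ending in a poor local optimum
km <- kmeans(x, centers = 2, nstart = 10)

km$cluster  # cluster label per series
km$centers  # mean prototype of each cluster
```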

Note

(*) generally preferred; compact clusters
[Figure: example of a clean clustering result]

IV. Cluster evaluation with CVIs

to choose the best number of clusters k (undetermined a priori)
internal CVIs (measures of cluster purity):
1. Silhouette index (+)
2. Dunn index (+)
3. Calinski-Harabasz index (+)
4. COP index
5. Davies-Bouldin index
- indices marked (+) should be maximized; the others should be minimized
choose the k favored by the majority of indices (majority vote)
see the cvi documentation listed in the References for detailed CVI descriptions and specific usages.
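
As an illustration of how an internal CVI selects k, the Silhouette index can be computed from a distance matrix in base R. The function silhouette_index and the toy data are assumptions written for this sketch, and singleton clusters are scored 0 by convention. With dtwclust, the cvi function would be applied to the clustering objects instead.

```r
# mean silhouette width from a distance object and cluster labels
# (illustrative; expects at least two clusters)
silhouette_index <- function(d, cl) {
  d <- as.matrix(d)
  s <- vapply(seq_along(cl), function(i) {
    own <- cl == cl[i]
    if (sum(own) == 1) return(0)          # singleton cluster
    a <- sum(d[i, own]) / (sum(own) - 1)  # mean distance within own cluster
    others <- setdiff(unique(cl), cl[i])
    b <- min(vapply(others, function(k) mean(d[i, cl == k]), numeric(1)))
    (b - a) / max(a, b)
  }, numeric(1))
  mean(s)
}

x <- rbind(s1 = c(0, 1, 0, 1),
           s2 = c(0, 1, 0, 1.2),
           s3 = c(5, 6, 5, 6),
           s4 = c(5, 6, 5, 6.1),
           s5 = c(5.2, 6, 5, 6))
d  <- dist(x)
hc <- hclust(d, method = "complete")

# evaluate k = 2..4 and keep the k with the highest index (to be maximized)
ks  <- 2:4
sil <- vapply(ks, function(k) silhouette_index(d, cutree(hc, k)), numeric(1))
best_k <- ks[which.max(sil)]
```

With several indices, the same loop would be repeated per index and the k chosen by majority vote.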

The code below demonstrates time-series clustering with the R package dtwclust by Alexis Sarda-Espinosa, which was also the main reference for most of the preceding notes and figures. The R script is adapted from exercises in the dtwclust vignette. dtwclust has notable dependencies on the flexclust and dtw packages.

# synthetically generated control charts
library(tidyr)     # gather()
library(dtwclust)  # tsclust(), zscore()
library(dplyr)     # %>%
library(ggplot2)   # plotting

df <- read.table("http://kdd.ics.uci.edu/databases/synthetic_control/synthetic_control.data", 
                 header = FALSE)

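# wide to long: keep 50 rows and 20 columns; each column (V1..V20) is
# treated as one series of 50 points
# (note: in the UCI file itself, each row is one 60-point control chart)
df_long <- gather(df[c(1:50),c(1:20)]) 

# make a timepoint column (1..50 within each of the 20 keys)
df_long$time <- rep(1:50,20)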

# plot by key
df_long  %>% 
  ggplot(aes(x= time, y= value, color= key)) +
  geom_line(linewidth = 0.2) +  # use size = 0.2 on ggplot2 < 3.4.0
  ggtitle("Control chart sequences") + 
  facet_wrap(~ key , scales = 'free_x', nrow= 2)

# back to a list with one numeric vector per series
df_list <- as.list(utils::unstack(df_long, value ~ key))

# z-normalize each series (mean 0, s.d. 1)
df_list_z <- dtwclust::zscore(df_list)

# hierarchical clustering with a 10% window size (5 of 50 time points) for k = 2 to 20 clusters
cluster_dtw_h <- list()
for (i in 2:20)
{
  cluster_dtw_h[[i]] <- tsclust(df_list_z, type = "h", k = i, distance = "dtw",
                                control = hierarchical_control(method = "complete"),
                                seed = 390, preproc = NULL,
                                args = tsclust_args(dist = list(window.size = 5L)))
}

# take a look at the object
cluster_dtw_h[[20]]

## hierarchical clustering with 20 clusters
## Using dtw distance
## Using PAM (Hierarchical) centroids
## Using method complete 
## 
## Time required for analysis:
##    user  system elapsed 
##   0.998   0.065   1.065 
## 
## Cluster sizes with average intra-cluster distance:
## 
##    size av_dist
## 1     1       0
## 2     1       0
## 3     1       0
## 4     1       0
## 5     1       0
## 6     1       0
## 7     1       0
## 8     1       0
## 9     1       0
## 10    1       0
## 11    1       0
## 12    1       0
## 13    1       0
## 14    1       0
## 15    1       0
## 16    1       0
## 17    1       0
## 18    1       0
## 19    1       0
## 20    1       0

# some cluster information
cluster_dtw_h[[4]]@clusinfo

##   size  av_dist
## 1    6 33.11645
## 2    3 28.26739
## 3    2 22.15062
## 4    9 35.22409

# plot dendrogram for k= 4
plot(cluster_dtw_h[[4]])

# the series and the obtained prototypes can be plotted too
plot(cluster_dtw_h[[4]], type = "sc")

# the representative prototypes only
plot(cluster_dtw_h[[4]], type = "centroids")

References

https://cran.r-project.org/web/packages/dtwclust/vignettes/dtwclust.pdf

http://www.sthda.com/english/wiki/print.php?id=237

https://cran.r-project.org/web/packages/dtwclust/dtwclust.pdf

https://cran.r-project.org/web/packages/flexclust/flexclust.pdf

https://cran.r-project.org/web/packages/dtw/dtw.pdf

https://rdrr.io/cran/dtwclust/man/cvi.html

https://www.rdocumentation.org/packages/dtwclust/versions/3.1.1/topics/reinterpolate

Arbelaitz O, Gurrutxaga I, Muguerza J. An extensive comparative study of cluster validity indices. Pattern Recognition. 2013;46:243–256. doi:10.1016/j.patcog.2012.07.021.

Cuturi M (2011). “Fast Global Alignment Kernels.” In Proceedings of the 28th international conference on machine learning (ICML-11), pp. 929–936.

Ratanamahatana, C.A., Keogh, E. Everything you know about Dynamic Time Warping is Wrong. In: Proc. of KDD Workshop on Mining Temporal and Sequential Data. 2004.

Sarda-Espinosa A. Comparing Time-Series Clustering Algorithms in R Using the dtwclust Package. 2017; p. 1–41.

Sarda-Espinosa A. dtwclust: Time Series Clustering Along with Optimizations for the Dynamic Time Warping Distance version 5.1.0. 2017.

Dr. Ana Rita Marques amarques@upei.ca

UPEI EPI on the Island Module 2- 2018
