Thanks to the ability to collect information over long periods of time and to the growth of computing power, data is often recorded as points in time. In that way one observation can actually be a sequence of values, called a univariate time series. Such a data structure is very popular and can be found in various fields. The majority of socio-economic (GDP, employment, birth rates, education), financial (stock prices, exchange rates) and environmental (pollution, energy consumption) indicators are collected over time. Time series data is also broadly used in medicine (blood pressure, EKG), IoT (monitoring body activity, energy management) and meteorology (weather, earthquake analysis). Analysis of this type of data is essential in Data Science.
As mentioned before, a univariate time series, although consisting of many values, can be treated as one observation. Clustering can be useful to find patterns in datasets consisting of many time series and to simplify them (Aghabozorgi, Shirkhorshidi, and Wah (2015)).
When clustering time series data, several approaches to treating that data can be taken. Overall, three main types of approaches have been defined (Guijo-Rubio et al. (2018), p. 2; Aghabozorgi, Shirkhorshidi, and Wah (2015), pp. 18-19). The first is whole time series clustering, where one time series is treated as one observation. As an example one can take economic databases such as Eurostat: observations are usually countries, and each column holds the value of a chosen indicator at a specific point in time. To cluster those countries based on the similarity of their datapoints over time, the distance has to be calculated between all points. The second approach is called subsequence clustering. It takes a single time series, divides it into sections and clusters those sections by similarity. It can be used on data collected by gyroscopes, for example in smartwatches, to determine the types of activities performed over time; that way we can find moments when a person was sitting, standing etc. The last approach is time point clustering, which is also applied to a single time series but clusters individual points. It treats the time series as if the datapoints were not ordered but independent values. Although all three approaches have important applications, this paper concentrates only on whole time series analysis.
Several distance measures have been applied to time series (Aghabozorgi, Shirkhorshidi, and Wah (2015), p. 22). For example, there is the probability-based distance, which takes into account the seasonality of data; the Hausdorff distance, defined as “the maximum of the distances from a point in any of the sets to the nearest point in the other set” (Rote (1991)); and the Hidden Markov model based distance used in modelling complex time series (Zeng, Duan, and Wu (2010)). The most popular, however, are the Euclidean distance and the Dynamic Time Warping (DTW) distance. The Euclidean distance has been shown to be the most efficient, but it forces both time series to be of the same length (Guijo-Rubio et al. (2018); Wang et al. (2013)). The DTW method, on the other hand, is known to be the most accurate (Wang et al. (2013)).
Methods of measuring distance between time series have been grouped by approach into shape-based, feature-based and model-based (Aghabozorgi, Shirkhorshidi, and Wah (2015)). Some researchers also define edit-based and compression-based distances (Liao (2005)). The Euclidean and DTW distances are both classified as shape-based: this type considers the overall shape and matches time series based on that aspect.
The main difference between the two distances is best understood graphically. The picture below shows an example of matched points of two data vectors. The points were connected based on the minimal distance between them according to the DTW (black) and Euclidean (red) distance. It can be seen that with DTW the 11th blue point matches 4 green points. With the Euclidean distance, the assigned points 9-to-9 and 10-to-10 are visibly further apart than 9-to-11 and 10-to-11, which can significantly impact the overall distance between the series. DTW reflects the shape of both time series much better.
Fig. 1: Visual comparison of matched points based on DTW (black) and Euclidean (red) distance
Dynamic Time Warping is a method of calculating distance that is more accurate than the Euclidean distance. It has an advantage over the Euclidean distance when datapoints are shifted relative to each other and we want to compare shapes. Additionally, the two time series do not have to be of equal length, which is a requirement of the Euclidean distance. The Euclidean distance takes pairs of datapoints and compares them to each other, while DTW finds the smallest distance between all points, which enables one-to-many matching. In the literature, dynamic time warping is most often paired with k-medoids and hierarchical methods. In some papers it is paired with k-means, though this is a controversial matter (Stack Exchange (2015)). Among non-classical methods, it has also been paired with random swap and hybrid approaches (Aghabozorgi, Shirkhorshidi, and Wah (2015)).
To demonstrate this method, I created two vectors a1 and a2, each consisting of 9 datapoints. They are plotted below.
a1 <- c(7,9,6,9,12,6,4,6,8)
a2 <- c(5,6,4,3,9,5,6,8,9)
xrange <- range(1:9)
yrange <- range(c(a1,a2))
plot(xrange, yrange, type="n", xlab="time",
ylab="value", xaxp = c(0,10,10), yaxp = c(3,12,9))
lines(a1, col='blue', type='l')
lines(a2, col='magenta', type='l')
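To make the two series easier to tell apart, a legend can be added (an optional extra, not part of the original plot):
legend("topleft", legend=c("a1", "a2"), col=c("blue", "magenta"), lty=1)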
Using the dtw function from the dtw package to calculate the distance between the two vectors, we also get two index vectors that describe the best possible matching between datapoints. index1 holds the indices for the x axis:
library(dtw)
dtw(a1,a2)$index1
## [1] 1 2 3 3 3 4 5 6 7 8 9 9
and index2 the indices for the y axis:
dtw(a1,a2)$index2
## [1] 1 1 2 3 4 5 5 6 6 7 8 9
The distance and alignment can also be calculated using dtwclust::dtw_basic, which is a simpler and faster version of dtw without the additional functionality.
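A minimal sketch of its use (assuming the dtwclust interface, where backtrack=TRUE additionally returns the matched indices):
library(dtwclust)
dtw_basic(a1, a2)                         # just the distance
str(dtw_basic(a1, a2, backtrack = TRUE))  # distance plus index1/index2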
The matching calculated above can also be portrayed on a graph. Diagonal lines portray a situation where datapoints have been matched one-to-one. Vertical and horizontal lines mean that one point on one axis is matched to more than one point on the other.
plot(dtw(a1,a2), xlab="a1 - blue", ylab="a2 - magenta", xaxp = c(0,10,10), yaxp = c(0,10,10))
The same plot can be made using type=“threeway”, which additionally shows the series along the x and y axes. The dtw call also has to have keep=TRUE set.
plot(dtw(a1,a2, keep=TRUE), xlab="a1 - blue", ylab="a2 - magenta", xaxp = c(0,10,10), yaxp = c(0,10,10), type="threeway")
When the matched points on both plots are joined, it looks like this:
One can also plot the connections as below using type=“twoway”.
plot(dtw(a1,a2, keep=TRUE), xaxp = c(0,10,10), yaxp = c(0,10,10), type="twoway", col=c('blue', 'magenta'))
By default, dtw uses the symmetric2 step pattern. The tabs contain the calculations performed for the symmetric2 and symmetric1 step patterns.
Because an array has its point [1,1] in the upper left corner, all matrices below are rotated 90 degrees to the right compared to the results from dtw. Traditionally the ‘beginning’ of the dtw matrix is in the lower left corner.
Equation:
dtw(a1, a2)$stepPattern
## Step pattern recursion:
## g[i,j] = min(
## g[i-1,j-1] + 2 * d[i ,j ] ,
## g[i ,j-1] + d[i ,j ] ,
## g[i-1,j ] + d[i ,j ] ,
## )
##
## Normalization hint: N+M
Alignment plot
Creating a matrix of zeroes:
dtw_matrix = matrix(0, nrow=9, ncol=9)
dtw_matrix
## [,1] [,2] [,3] [,4] [,5] [,6] [,7] [,8] [,9]
## [1,] 0 0 0 0 0 0 0 0 0
## [2,] 0 0 0 0 0 0 0 0 0
## [3,] 0 0 0 0 0 0 0 0 0
## [4,] 0 0 0 0 0 0 0 0 0
## [5,] 0 0 0 0 0 0 0 0 0
## [6,] 0 0 0 0 0 0 0 0 0
## [7,] 0 0 0 0 0 0 0 0 0
## [8,] 0 0 0 0 0 0 0 0 0
## [9,] 0 0 0 0 0 0 0 0 0
The matrix is calculated according to the step pattern formula, starting from \(g(1,1)=d(1,1)\) and with the boundary conditions \(g(i,0)=g(0,j)=\infty\). The infinite values can never be chosen by the min() function, which is why they are not included in the calculations. I will be using the default Euclidean distance between points, which for univariate series is \(d(i,j)=\sqrt{(a1_i-a2_j)^2}=|a1_i-a2_j|\).
First element:
dtw_matrix[1,1] = sqrt((a1[1] - a2[1])^2)
dtw_matrix
##      [,1] [,2] [,3] [,4] [,5] [,6] [,7] [,8] [,9]
## [1,]    2    0    0    0    0    0    0    0    0
## [2,]    0    0    0    0    0    0    0    0    0
## [3,]    0    0    0    0    0    0    0    0    0
## [4,]    0    0    0    0    0    0    0    0    0
## [5,]    0    0    0    0    0    0    0    0    0
## [6,]    0    0    0    0    0    0    0    0    0
## [7,]    0    0    0    0    0    0    0    0    0
## [8,]    0    0    0    0    0    0    0    0    0
## [9,]    0    0    0    0    0    0    0    0    0
First column:
for (i in 2:9){
dtw_matrix[i,1] = sqrt((a1[i] - a2[1])^2) + dtw_matrix[i-1,1]
}
dtw_matrix
##      [,1] [,2] [,3] [,4] [,5] [,6] [,7] [,8] [,9]
## [1,]    2    0    0    0    0    0    0    0    0
## [2,]    6    0    0    0    0    0    0    0    0
## [3,]    7    0    0    0    0    0    0    0    0
## [4,]   11    0    0    0    0    0    0    0    0
## [5,]   18    0    0    0    0    0    0    0    0
## [6,]   19    0    0    0    0    0    0    0    0
## [7,]   20    0    0    0    0    0    0    0    0
## [8,]   21    0    0    0    0    0    0    0    0
## [9,]   24    0    0    0    0    0    0    0    0
First row:
for (j in 2:9){
dtw_matrix[1,j] = sqrt((a1[1] - a2[j])^2) + dtw_matrix[1,j-1]
}
dtw_matrix
##      [,1] [,2] [,3] [,4] [,5] [,6] [,7] [,8] [,9]
## [1,]    2    3    6   10   12   14   15   16   18
## [2,]    6    0    0    0    0    0    0    0    0
## [3,]    7    0    0    0    0    0    0    0    0
## [4,]   11    0    0    0    0    0    0    0    0
## [5,]   18    0    0    0    0    0    0    0    0
## [6,]   19    0    0    0    0    0    0    0    0
## [7,]   20    0    0    0    0    0    0    0    0
## [8,]   21    0    0    0    0    0    0    0    0
## [9,]   24    0    0    0    0    0    0    0    0
The rest of the matrix:
for (i in 2:9){
for (j in 2:9){
dtw_matrix[i,j] = sqrt((a1[i] - a2[j])^2) + min(dtw_matrix[i,j-1], dtw_matrix[i-1,j], dtw_matrix[i-1,j-1] + sqrt((a1[i] - a2[j])^2))
}
}
dtw_matrix
##      [,1] [,2] [,3] [,4] [,5] [,6] [,7] [,8] [,9]
## [1,]    2    3    6   10   12   14   15   16   18
## [2,]    6    6   11   16   10   14   17   17   16
## [3,]    7    6    8   11   13   12   12   14   17
## [4,]   11    9   13   17   11   15   15   14   14
## [5,]   18   15   21   26   14   21   21   18   17
## [6,]   19   15   17   20   17   16   16   18   20
## [7,]   20   17   15   16   21   17   18   22   25
## [8,]   21   17   17   19   22   18   17   19   22
## [9,]   24   19   21   24   21   21   19   17   18
Find the optimal alignment:
The source of the translated algorithm, slightly modified because of a different approach to decisions in the case of equal values: https://nipunbatra.github.io/blog/2014/dtw.html
The change is: if \(g(i-1, j-1)=g(i, j-1)\) we choose \(g(i, j-1)\), and if \(g(i-1, j-1)=g(i-1, j)\) we choose \(g(i-1, j-1)\).
path = c(9,9) # starting from the furthest place in the matrix
i = 9
j = 9
while(i>1 | j>1){      # walk back until we reach [1,1]
  if (i == 1) {        # in the first row we can only move left
    j = j - 1
  } else if (j == 1) { # in the first column we can only move up
    i = i - 1
  } else if (dtw_matrix[i,j-1] == min(dtw_matrix[i-1, j-1], dtw_matrix[i-1, j], dtw_matrix[i, j-1])){
    j = j - 1
  } else if (dtw_matrix[i-1,j-1] == min(dtw_matrix[i-1, j-1], dtw_matrix[i-1, j], dtw_matrix[i, j-1])){
    i = i - 1
    j = j - 1
  } else {
    i = i - 1
  }
  path = rbind(path, c(i,j))
}
plot(dtw(a1,a2))
points(path[,1], path[,2], type="l")
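As a sanity check, reversing the hand-computed path should reproduce the alignment returned by dtw (given the tie-breaking rules above):
path[rev(seq_len(nrow(path))), ]              # our path, from [1,1] to [9,9]
cbind(dtw(a1, a2)$index1, dtw(a1, a2)$index2) # the alignment found by dtw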
Equation:
dtw(a1, a2, step.pattern = symmetric1)$stepPattern
## Step pattern recursion:
## g[i,j] = min(
## g[i-1,j-1] + d[i ,j ] ,
## g[i ,j-1] + d[i ,j ] ,
## g[i-1,j ] + d[i ,j ] ,
## )
##
## Normalization hint: NA
Alignment plot
Creating a matrix of zeroes:
dtw_matrix = matrix(0, nrow=9, ncol=9)
dtw_matrix
## [,1] [,2] [,3] [,4] [,5] [,6] [,7] [,8] [,9]
## [1,] 0 0 0 0 0 0 0 0 0
## [2,] 0 0 0 0 0 0 0 0 0
## [3,] 0 0 0 0 0 0 0 0 0
## [4,] 0 0 0 0 0 0 0 0 0
## [5,] 0 0 0 0 0 0 0 0 0
## [6,] 0 0 0 0 0 0 0 0 0
## [7,] 0 0 0 0 0 0 0 0 0
## [8,] 0 0 0 0 0 0 0 0 0
## [9,] 0 0 0 0 0 0 0 0 0
The matrix is calculated according to the step pattern formula, starting from \(g(1,1)=d(1,1)\) and with the boundary conditions \(g(i,0)=g(0,j)=\infty\). The infinite values can never be chosen by the min() function, which is why they are not included in the calculations. I will be using the default Euclidean distance between points, which for univariate series is \(d(i,j)=\sqrt{(a1_i-a2_j)^2}=|a1_i-a2_j|\).
First element:
dtw_matrix[1,1] = sqrt((a1[1] - a2[1])^2)
dtw_matrix
##      [,1] [,2] [,3] [,4] [,5] [,6] [,7] [,8] [,9]
## [1,]    2    0    0    0    0    0    0    0    0
## [2,]    0    0    0    0    0    0    0    0    0
## [3,]    0    0    0    0    0    0    0    0    0
## [4,]    0    0    0    0    0    0    0    0    0
## [5,]    0    0    0    0    0    0    0    0    0
## [6,]    0    0    0    0    0    0    0    0    0
## [7,]    0    0    0    0    0    0    0    0    0
## [8,]    0    0    0    0    0    0    0    0    0
## [9,]    0    0    0    0    0    0    0    0    0
First column:
for (i in 2:9){
dtw_matrix[i,1] = sqrt((a1[i] - a2[1])^2) + dtw_matrix[i-1,1]
}
dtw_matrix
##      [,1] [,2] [,3] [,4] [,5] [,6] [,7] [,8] [,9]
## [1,]    2    0    0    0    0    0    0    0    0
## [2,]    6    0    0    0    0    0    0    0    0
## [3,]    7    0    0    0    0    0    0    0    0
## [4,]   11    0    0    0    0    0    0    0    0
## [5,]   18    0    0    0    0    0    0    0    0
## [6,]   19    0    0    0    0    0    0    0    0
## [7,]   20    0    0    0    0    0    0    0    0
## [8,]   21    0    0    0    0    0    0    0    0
## [9,]   24    0    0    0    0    0    0    0    0
First row:
for (j in 2:9){
dtw_matrix[1,j] = sqrt((a1[1] - a2[j])^2) + dtw_matrix[1,j-1]
}
dtw_matrix
##      [,1] [,2] [,3] [,4] [,5] [,6] [,7] [,8] [,9]
## [1,]    2    3    6   10   12   14   15   16   18
## [2,]    6    0    0    0    0    0    0    0    0
## [3,]    7    0    0    0    0    0    0    0    0
## [4,]   11    0    0    0    0    0    0    0    0
## [5,]   18    0    0    0    0    0    0    0    0
## [6,]   19    0    0    0    0    0    0    0    0
## [7,]   20    0    0    0    0    0    0    0    0
## [8,]   21    0    0    0    0    0    0    0    0
## [9,]   24    0    0    0    0    0    0    0    0
The rest of the matrix:
for (i in 2:9){
for (j in 2:9){
dtw_matrix[i,j] = sqrt((a1[i] - a2[j])^2) + min(dtw_matrix[i,j-1], dtw_matrix[i-1,j], dtw_matrix[i-1,j-1])
}
}
dtw_matrix
##      [,1] [,2] [,3] [,4] [,5] [,6] [,7] [,8] [,9]
## [1,]    2    3    6   10   12   14   15   16   18
## [2,]    6    5    8   12   10   14   17   16   16
## [3,]    7    5    7   10   13   11   11   13   16
## [4,]   11    8   10   13   10   14   14   12   12
## [5,]   18   14   16   19   13   17   20   16   15
## [6,]   19   14   16   19   16   14   14   16   18
## [7,]   20   16   14   15   20   15   16   18   21
## [8,]   21   16   16   17   18   16   15   17   20
## [9,]   24   18   20   21   18   19   17   15   16
Find the optimal alignment. The source of the translated algorithm, slightly modified because of a different approach to decisions in the case of equal values: https://nipunbatra.github.io/blog/2014/dtw.html The change is: if \(g(i-1, j-1)=g(i-1, j)\) we choose \(g(i-1, j)\), and if \(g(i-1, j-1)=g(i, j-1)\) we choose \(g(i-1, j-1)\).
path = c(9,9) # starting from the furthest place in the matrix
i = 9
j = 9
while(i>1 | j>1){      # walk back until we reach [1,1]
  if (i == 1) {        # in the first row we can only move left
    j = j - 1
  } else if (j == 1) { # in the first column we can only move up
    i = i - 1
  } else if (dtw_matrix[i-1,j] == min(dtw_matrix[i-1, j-1], dtw_matrix[i-1, j], dtw_matrix[i, j-1])){
    i = i - 1
  } else if (dtw_matrix[i-1,j-1] == min(dtw_matrix[i-1, j-1], dtw_matrix[i-1, j], dtw_matrix[i, j-1])){
    i = i - 1
    j = j - 1
  } else {
    j = j - 1
  }
  path = rbind(path, c(i,j))
}
plot(dtw(a1,a2, step.pattern = symmetric1))
points(path[,1], path[,2], type = "l")
The dtwclust package provides a function tsclust dedicated to clustering time series. Its arguments are presented below.
tsclust(series = NULL, type = "partitional", k = 2L, ..., preproc = NULL, distance = "dtw_basic", centroid = ifelse(type == "fuzzy", "fcm", "pam"), control = do.call(paste0(type, "_control"), list()), args = tsclust_args(), seed = NULL, trace = FALSE, error.check = TRUE)
Arguments that are useful to modify:
As stated at the beginning, the clustering methods most often paired with DTW are PAM and hierarchical, so this is what I will be using. I will perform the analysis on data collected from Eurostat on greenhouse gas emissions per capita (Eurostat (2014)). The structure of this data is visible below; each column represents a different year.
head(emissions)
## X2000 X2001 X2002 X2003 X2004 X2005 X2006 X2007 X2008 X2009 X2010 X2011
## AT 10.3 10.7 10.9 11.5 11.4 11.5 11.1 10.8 10.7 9.8 10.4 10.1
## BE 15.1 14.8 14.6 14.6 14.7 14.2 13.9 13.5 13.4 12.1 12.6 11.5
## BG 7.3 7.9 7.7 8.3 8.3 8.4 8.6 9.1 9.0 7.8 8.3 9.0
## CH 7.9 8.1 7.7 7.8 7.8 7.8 7.8 7.5 7.6 7.3 7.5 6.9
## CY 13.1 13.1 13.2 13.7 13.7 13.6 13.7 13.8 13.8 13.1 12.4 11.7
## CZ 14.7 14.7 14.4 14.7 14.8 14.6 14.7 14.8 14.2 13.3 13.5 13.3
## X2012 X2013 X2014 X2015 X2016
## AT 9.7 9.7 9.2 9.4 9.4
## BE 11.1 11.1 10.5 10.8 10.8
## BG 8.4 7.7 8.2 8.7 8.4
## CH 7.0 7.1 6.5 6.4 6.4
## CY 10.9 10.0 10.6 10.7 11.3
## CZ 12.9 12.4 12.2 12.3 12.4
First I normalise the chosen data and perform clustering using PAM and the DTW distance for each number of clusters from 2 to 17.
library(dtwclust)
emissions.norm <- BBmisc::normalize(emissions, method="standardize")
clust.pam <- tsclust(emissions.norm, type="partitional", k=2L:17L, distance="dtw", centroid="pam")
Next I evaluate the clusterings depending on the number of clusters. Cluster validity indices for a tsclust object can be calculated with the cvi function; by applying it over the list of clusterings, cvi is computed for all chosen numbers of clusters. This is a workaround to get the kind of output normally given by, for example, fviz_nbclust. The tabs contain plots of the individual statistics against the number of clusters.
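The table below was produced along these lines (a sketch; the k_2, ..., k_17 column labels are set manually):
cvis <- sapply(clust.pam, cvi)         # one column of indices per clustering
colnames(cvis) <- paste0("k_", 2L:17L) # label columns by number of clusters
cvis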
## k_2 k_3 k_4 k_5 k_6
## Sil 6.219367e-01 4.804408e-01 3.496008e-01 3.591146e-01 2.978441e-01
## SF 8.580315e-06 3.522597e-08 6.769466e-09 8.090195e-12 1.975753e-12
## CH 4.235012e+01 2.981205e+01 2.316044e+01 2.474140e+01 1.937403e+01
## DB 5.375327e-01 8.667175e-01 9.520239e-01 8.326791e-01 9.168558e-01
## DBstar 5.375327e-01 9.850792e-01 1.789908e+00 2.086180e+00 3.194902e+00
## D 3.031549e-02 1.423910e-02 9.053286e-03 2.084148e-02 2.084148e-02
## COP 2.066182e-01 1.095197e-01 1.019924e-01 6.217320e-02 5.845241e-02
## k_7 k_8 k_9 k_10 k_11
## Sil 3.329783e-01 4.169944e-01 2.385236e-01 2.486685e-01 0.28305440
## SF 1.176836e-14 4.129316e-07 2.220446e-16 6.661338e-16 0.00000000
## CH 1.763049e+01 2.769270e+01 1.211769e+01 1.220693e+01 10.33570198
## DB 1.323454e+00 5.597688e-01 1.059226e+00 1.353445e+00 1.72345891
## DBstar 3.669975e+00 7.364860e-01 4.231454e+00 6.513487e+00 12.43928247
## D 2.716034e-02 1.628321e-01 2.143362e-02 2.716034e-02 0.01307001
## COP 5.248977e-02 2.349145e-02 5.175937e-02 4.834615e-02 0.04264722
## k_12 k_13 k_14 k_15 k_16
## Sil 0.14288643 1.542856e-01 2.507844e-01 2.100680e-01 1.769669e-01
## SF 0.00000000 1.085094e-07 4.008583e-07 8.074189e-08 9.923017e-08
## CH 6.80744274 1.718176e+01 1.952996e+01 1.583676e+01 1.341178e+01
## DB 3.12852806 9.254162e-01 8.237494e-01 7.953705e-01 1.007985e+00
## DBstar 9.33335201 2.083942e+00 1.749763e+00 1.967089e+00 2.724805e+00
## D 0.01577096 5.372844e-02 1.753211e-01 1.705979e-01 5.877414e-02
## COP 0.06281129 2.132327e-02 1.485456e-02 1.677851e-02 1.788782e-02
## k_17
## Sil 6.945246e-02
## SF 3.237554e-07
## CH 1.393392e+01
## DB 7.515518e-01
## DBstar 2.182193e+00
## D 8.606837e-02
## COP 1.293244e-02
Of these statistics, Sil, SF, CH and D are to be maximized, while DB, DBstar and COP are to be minimized.
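The plots in the tabs can be reproduced with a simple sketch like the one below; any row of the cvis matrix computed above can be substituted for "Sil":
plot(2:17, cvis["Sil", ], type="b", xlab="number of clusters", ylab="Sil")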
The best results across the statistics were almost uniformly obtained for 6 clusters. Next I perform the clustering with 6 clusters only.
clust.pam <- tsclust(emissions.norm, type="partitional", k=6L, distance="dtw", clustering="pam")
Using type=“sc”, I plot the clusters together with the time series that belong to them. The dashed line represents the medoid time series.
plot(clust.pam, type = "sc")
Below is a list of countries and which cluster they belong to.
t(cbind(emissions[,0], cluster = clust.pam@cluster))
## AT BE BG CH CY CZ DE DK EE EL ES FI FR HR HU IE IS IT LI LT LU LV
## cluster 4 4 1 1 4 5 4 4 3 4 2 5 2 6 6 5 3 2 1 6 3 6
## MT NL NO PL PT RO SE SI SK TR UK
## cluster 1 5 4 4 1 6 1 2 2 6 2
Next, in each of the tabs, I plot each of the individual clusters from above separately.
Here I plot the clusters in three ways. series shows the members of the chosen cluster (selected with clus = kL, where k is the cluster number). centroids shows the time series that was chosen as the medoid. The plot with type=“sc” is in fact a combination of series and centroids.
plot(clust.pam, type = "sc", clus = 1L)
plot(clust.pam, type = "series", clus = 1L)
plot(clust.pam, type = "centroids", clus = 1L)
plot(clust.pam, type = "sc", clus = 2L)
plot(clust.pam, type = "sc", clus = 3L)
plot(clust.pam, type = "sc", clus = 4L)
plot(clust.pam, type = "sc", clus = 5L)
plot(clust.pam, type = "sc", clus = 6L)
With hierarchical clustering the syntax is the same as before; the only change is that the type is now “h”.
clust.hier <- tsclust(emissions.norm, type = "h", k = 6L, distance = "dtw")
Below is the hierarchical tree. The boxes mark the countries that are clustered together in terms of emissions.
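The tree can be drawn along these lines (a sketch; it assumes the hierarchical result can be coerced to an hclust object, with rect.hclust drawing the boxes):
hc <- as.hclust(clust.hier) # coercion is an assumption about the dtwclust object
plot(hc, xlab="", sub="")
rect.hclust(hc, k=6)        # boxes around the 6 clusters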
Here are the time series divided into clusters, this time using the hierarchical method.
plot(clust.hier, type="sc")
Below is a list of countries and which cluster they belong to.
cutree(clust.hier, k=6L)
## AT BE BG CH CY CZ DE DK EE EL ES FI FR HR HU IE IS IT LI LT LU LV MT NL NO
## 1 2 1 3 4 2 4 4 5 4 1 2 1 3 3 2 5 1 3 3 6 3 3 2 4
## PL PT RO SE SI SK TR UK
## 1 3 3 3 1 1 3 4
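Before comparing validity indices, the two partitions can be cross-tabulated to see how far they overlap (a quick sketch; cluster labels are arbitrary across methods, so only the grouping pattern matters):
table(pam = clust.pam@cluster, hier = cutree(clust.hier, k=6L))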
Comparing the cluster validity indices of both methods for k=6, the hierarchical method performs better on every computed index: silhouette, Score Function, Caliński-Harabasz and Dunn (to be maximized), as well as the classic and modified Davies-Bouldin and COP (to be minimized).
cvi(clust.hier)
## Sil SF CH DB DBstar
## 4.707510e-01 1.069388e-06 3.200694e+01 5.473567e-01 5.853817e-01
## D COP
## 1.961133e-01 3.267657e-02
cvi(clust.pam)
## Sil SF CH DB DBstar
## 3.242636e-01 3.197442e-14 2.112754e+01 1.385326e+00 3.210130e+00
## D COP
## 2.723451e-02 5.796156e-02
Aghabozorgi, Saeed, Ali Seyed Shirkhorshidi, and Teh Ying Wah. 2015. “Time-Series Clustering–A Decade Review.” Information Systems 53. Elsevier: 16–38.
Eurostat. 2014. “Greenhouse Gas Emissions Per Capita (T2020_rd300).” https://ec.europa.eu/eurostat/tgm/table.do?tab=table&init=1&language=en&pcode=t2020_rd300&plugin=1.
Guijo-Rubio, David, Antonio Manuel Durán-Rosal, Pedro Antonio Gutiérrez, Alicia Troncoso, and César Hervás-Martínez. 2018. “Time Series Clustering Based on the Characterisation of Segment Typologies.” arXiv Preprint arXiv:1810.11624.
Liao, T Warren. 2005. “Clustering of Time Series Data—a Survey.” Pattern Recognition 38 (11). Elsevier: 1857–74.
Rote, Günter. 1991. “Computing the Minimum Hausdorff Distance Between Two Point Sets on a Line Under Translation.” Information Processing Letters 38 (3). Elsevier: 123–27.
Stack Exchange. 2015. “Dynamic Time Warping Using K-Means.” https://stats.stackexchange.com/questions/131281/dynamic-time-warping-clustering.
Wang, Xiaoyue, Abdullah Mueen, Hui Ding, Goce Trajcevski, Peter Scheuermann, and Eamonn Keogh. 2013. “Experimental Comparison of Representation Methods and Distance Measures for Time Series Data.” Data Mining and Knowledge Discovery 26 (2). Springer: 275–309.
Zeng, Jianping, Jiangjiao Duan, and Chengrong Wu. 2010. “A New Distance Measure for Hidden Markov Models.” Expert Systems with Applications 37 (2). Elsevier: 1550–5.