Thanks to the ability to collect information over long periods of time and to the growth of computing power, data is often recorded as points in time. In that way one observation can actually be a sequence of values, called a univariate time series. Such a data structure is very popular and can be found in various fields. The majority of socio-economic (GDP, employment, birth rates, education), financial (stock prices, exchange rates) and environmental (pollution, energy consumption) indicators are collected over time. Time series data is also broadly used in medicine (blood pressure, EKG), IoT (monitoring body activity, energy management) and meteorology (weather, earthquake analysis). Analysis of this type of data is essential in Data Science.
As mentioned before, a univariate time series, although consisting of many values, can be treated as one observation. Clustering can be useful to find patterns in datasets consisting of many time series and to simplify them (Aghabozorgi, Shirkhorshidi, and Wah (2015)).
When clustering time series data, several approaches to treating that data can be taken. Overall, three main types of approaches have been defined (Guijo-Rubio et al. (2018), p. 2; Aghabozorgi, Shirkhorshidi, and Wah (2015), pp. 18-19). The first is whole time series clustering, where one time series is treated as one observation. As an example one can take economic databases such as Eurostat: observations are usually countries, and each column holds the value of a chosen indicator at a specific point in time. To cluster those countries based on the similarity of their datapoints over time, the distance has to be calculated between all points. The second approach is called subsequence clustering. It takes a single time series, divides it into sections and clusters those sections by similarity. It can be used on data collected by gyroscopes, for example in smartwatches, to determine the types of activities performed over time; that way we can find moments when a person was sitting, standing etc. The last approach is time point clustering, which is also applied to a single time series but clusters individual points. It treats the time series as if the datapoints were not ordered but independent values. Although all three approaches have important applications, this paper concentrates only on whole time series analysis.
Several distance measures have been applied to time series (Aghabozorgi, Shirkhorshidi, and Wah (2015), p. 22). For example, there is the probability-based distance, which takes into account the seasonality of data; the Hausdorff distance, defined as “the maximum of the distances from a point in any of the sets to the nearest point in the other set” (Rote (1991)); and the Hidden Markov model based distance used in modelling complex time series (Zeng, Duan, and Wu (2010)). The most popular, however, are the Euclidean distance and the Dynamic Time Warping (DTW) distance. The Euclidean distance has been shown to be the most efficient, but it forces both time series to be of the same length (Guijo-Rubio et al. (2018); Wang et al. (2013)). The DTW method, on the other hand, is known to be the most accurate (Wang et al. (2013)).
Methods of measuring distance between time series have been grouped by approach into shape-based, feature-based and model-based (Aghabozorgi, Shirkhorshidi, and Wah (2015)). Some researchers also define edit-based and compression-based distances (Liao (2005)). The Euclidean and DTW distances are both classified as shape-based: this type considers the overall shape and matches time series based on that aspect.
The main difference between the two distances is best understood graphically. The picture below shows an example of matched points of two data vectors. The points were connected based on the minimal distance between them according to the DTW (black) and Euclidean (red) distance. It can be seen that with DTW the 11th blue point matches 4 green points. With the Euclidean distance, the assigned points 9-to-9 and 10-to-10 are visibly further apart than 9-to-11 and 10-to-11, which can significantly impact the overall distance between the series. DTW reflects the shape of both time series much better.
Fig. 1: Visual comparison of matched points based on DTW (black) and Euclidean (red) distance
Dynamic Time Warping is a method of calculating distance that is more accurate than the Euclidean distance. It has an advantage over the Euclidean distance when datapoints are shifted relative to each other and we want to compare shapes. Additionally, the two time series do not have to be of equal length, which is a requirement of the Euclidean distance. The Euclidean distance takes pairs of datapoints and compares them to each other, while DTW finds the smallest distance between all points, which enables one-to-many matching. In the literature, dynamic time warping is most often paired with k-medoids and hierarchical methods. In some papers it is paired with k-means, though this is a controversial matter (Stack Exchange (2015)). Among non-classical methods, it has also been paired with random swap and hybrid approaches (Aghabozorgi, Shirkhorshidi, and Wah (2015)).
To demonstrate this method, I created two vectors a1 and a2, each consisting of 9 datapoints. They are plotted below.
a1 <- c(7,9,6,9,12,6,4,6,8)
a2 <- c(5,6,4,3,9,5,6,8,9)
xrange <- range(1:9)
yrange <- range(c(a1,a2))
plot(xrange, yrange, type="n", xlab="time",
ylab="value", xaxp = c(0,10,10), yaxp = c(3,12,9))
lines(a1, col='blue', type='l')
lines(a2, col='magenta', type='l')
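To make the two series easier to tell apart, a legend can be added (an optional extra, not part of the original plot):
legend("topleft", legend=c("a1", "a2"), col=c("blue", "magenta"), lty=1)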
Using the dtw function from the dtw package to calculate the distance between the two vectors, we also get two index vectors that describe the best possible matching between datapoints. index1 holds the indices for the x axis:
library(dtw)
dtw(a1,a2)$index1
## [1] 1 2 3 3 3 4 5 6 7 8 9 9
and index2 the indices for the y axis:
dtw(a1,a2)$index2
## [1] 1 1 2 3 4 5 5 6 6 7 8 9
The distance and alignment can also be calculated using dtwclust::dtw_basic, which is a simpler and faster version of dtw without the additional functionality.
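A minimal sketch of its use (assuming the dtwclust interface, where backtrack=TRUE additionally returns the matched indices):
library(dtwclust)
dtw_basic(a1, a2)                         # just the distance
str(dtw_basic(a1, a2, backtrack = TRUE))  # distance plus index1/index2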
The matching calculated above can also be portrayed on a graph. Diagonal lines portray a situation where datapoints have been matched one-to-one. Vertical and horizontal lines mean that one point on one axis is matched to more than one point on the other.
plot(dtw(a1,a2), xlab="a1 - blue", ylab="a2 - magenta", xaxp = c(0,10,10), yaxp = c(0,10,10))
The same plot can be made using type=“threeway”, which additionally shows the series along the x and y axes. The dtw call also has to have keep=TRUE set.
plot(dtw(a1,a2, keep=TRUE), xlab="a1 - blue", ylab="a2 - magenta", xaxp = c(0,10,10), yaxp = c(0,10,10), type="threeway")
When the matched points on both plots are joined, it looks like this:
One can also plot the connections as below using type=“twoway”.
plot(dtw(a1,a2, keep=TRUE), xaxp = c(0,10,10), yaxp = c(0,10,10), type="twoway", col=c('blue', 'magenta'))
By default, dtw uses the symmetric2 step pattern. The tabs contain the calculations performed for the symmetric2 and symmetric1 step patterns.
Because an array has its point [1,1] in the upper left corner, all matrices below are rotated 90 degrees to the right compared to the results from dtw. Traditionally the ‘beginning’ of the dtw matrix is in the lower left corner.
Equation:
dtw(a1, a2)$stepPattern
## Step pattern recursion:
## g[i,j] = min(
## g[i-1,j-1] + 2 * d[i ,j ] ,
## g[i ,j-1] + d[i ,j ] ,
## g[i-1,j ] + d[i ,j ] ,
## )
##
## Normalization hint: N+M
Alignment plot
Creating a matrix of zeroes:
dtw_matrix = matrix(0, nrow=9, ncol=9)
dtw_matrix
## [,1] [,2] [,3] [,4] [,5] [,6] [,7] [,8] [,9]
## [1,] 0 0 0 0 0 0 0 0 0
## [2,] 0 0 0 0 0 0 0 0 0
## [3,] 0 0 0 0 0 0 0 0 0
## [4,] 0 0 0 0 0 0 0 0 0
## [5,] 0 0 0 0 0 0 0 0 0
## [6,] 0 0 0 0 0 0 0 0 0
## [7,] 0 0 0 0 0 0 0 0 0
## [8,] 0 0 0 0 0 0 0 0 0
## [9,] 0 0 0 0 0 0 0 0 0
The matrix is calculated according to the step pattern formula, starting from \(g(1,1)=d(1,1)\) and with the boundary conditions \(g(i,0)=g(0,j)=\infty\). The infinite values can never be chosen by the min() function, which is why they are not included in the calculations. I will be using the default Euclidean distance between points, which for univariate series is \(d(i,j)=\sqrt{(a1_i-a2_j)^2}=|a1_i-a2_j|\).
First element:
dtw_matrix[1,1] = sqrt((a1[1] - a2[1])^2)
dtw_matrix
##      [,1] [,2] [,3] [,4] [,5] [,6] [,7] [,8] [,9]
## [1,]    2    0    0    0    0    0    0    0    0
## [2,]    0    0    0    0    0    0    0    0    0
## [3,]    0    0    0    0    0    0    0    0    0
## [4,]    0    0    0    0    0    0    0    0    0
## [5,]    0    0    0    0    0    0    0    0    0
## [6,]    0    0    0    0    0    0    0    0    0
## [7,]    0    0    0    0    0    0    0    0    0
## [8,]    0    0    0    0    0    0    0    0    0
## [9,]    0    0    0    0    0    0    0    0    0
First column:
for (i in 2:9){
dtw_matrix[i,1] = sqrt((a1[i] - a2[1])^2) + dtw_matrix[i-1,1]
}
dtw_matrix
##      [,1] [,2] [,3] [,4] [,5] [,6] [,7] [,8] [,9]
## [1,]    2    0    0    0    0    0    0    0    0
## [2,]    6    0    0    0    0    0    0    0    0
## [3,]    7    0    0    0    0    0    0    0    0
## [4,]   11    0    0    0    0    0    0    0    0
## [5,]   18    0    0    0    0    0    0    0    0
## [6,]   19    0    0    0    0    0    0    0    0
## [7,]   20    0    0    0    0    0    0    0    0
## [8,]   21    0    0    0    0    0    0    0    0
## [9,]   24    0    0    0    0    0    0    0    0
First row:
for (j in 2:9){
dtw_matrix[1,j] = sqrt((a1[1] - a2[j])^2) + dtw_matrix[1,j-1]
}
dtw_matrix
##      [,1] [,2] [,3] [,4] [,5] [,6] [,7] [,8] [,9]
## [1,]    2    3    6   10   12   14   15   16   18
## [2,]    6    0    0    0    0    0    0    0    0
## [3,]    7    0    0    0    0    0    0    0    0
## [4,]   11    0    0    0    0    0    0    0    0
## [5,]   18    0    0    0    0    0    0    0    0
## [6,]   19    0    0    0    0    0    0    0    0
## [7,]   20    0    0    0    0    0    0    0    0
## [8,]   21    0    0    0    0    0    0    0    0
## [9,]   24    0    0    0    0    0    0    0    0
The rest of the matrix:
for (i in 2:9){
for (j in 2:9){
dtw_matrix[i,j] = sqrt((a1[i] - a2[j])^2) + min(dtw_matrix[i,j-1], dtw_matrix[i-1,j], dtw_matrix[i-1,j-1] + sqrt((a1[i] - a2[j])^2))
}
}
dtw_matrix
##      [,1] [,2] [,3] [,4] [,5] [,6] [,7] [,8] [,9]
## [1,]    2    3    6   10   12   14   15   16   18
## [2,]    6    6   11   16   10   14   17   17   16
## [3,]    7    6    8   11   13   12   12   14   17
## [4,]   11    9   13   17   11   15   15   14   14
## [5,]   18   15   21   26   14   21   21   18   17
## [6,]   19   15   17   20   17   16   16   18   20
## [7,]   20   17   15   16   21   17   18   22   25
## [8,]   21   17   17   19   22   18   17   19   22
## [9,]   24   19   21   24   21   21   19   17   18
Find the optimal alignment:
The source of the translated algorithm, slightly modified because of a different approach to decisions in the case of equal values: https://nipunbatra.github.io/blog/2014/dtw.html
The change is: if \(g(i-1, j-1)=g(i, j-1)\) we choose \(g(i, j-1)\), and if \(g(i-1, j-1)=g(i-1, j)\) we choose \(g(i-1, j-1)\).
path = c(9,9) # starting from the furthest place in the matrix
i = 9
j = 9
while(i>1 | j>1){      # walk back until we reach [1,1]
  if (i == 1) {        # in the first row we can only move left
    j = j - 1
  } else if (j == 1) { # in the first column we can only move up
    i = i - 1
  } else if (dtw_matrix[i,j-1] == min(dtw_matrix[i-1, j-1], dtw_matrix[i-1, j], dtw_matrix[i, j-1])){
    j = j - 1
  } else if (dtw_matrix[i-1,j-1] == min(dtw_matrix[i-1, j-1], dtw_matrix[i-1, j], dtw_matrix[i, j-1])){
    i = i - 1
    j = j - 1
  } else {
    i = i - 1
  }
  path = rbind(path, c(i,j))
}
plot(dtw(a1,a2))
points(path[,1], path[,2], type="l")
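As a sanity check, reversing the hand-computed path should reproduce the alignment returned by dtw (given the tie-breaking rules above):
path[rev(seq_len(nrow(path))), ]              # our path, from [1,1] to [9,9]
cbind(dtw(a1, a2)$index1, dtw(a1, a2)$index2) # the alignment found by dtw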
Equation:
dtw(a1, a2, step.pattern = symmetric1)$stepPattern
## Step pattern recursion:
## g[i,j] = min(
## g[i-1,j-1] + d[i ,j ] ,
## g[i ,j-1] + d[i ,j ] ,
## g[i-1,j ] + d[i ,j ] ,
## )
##
## Normalization hint: NA
Alignment plot
Creating a matrix of zeroes:
dtw_matrix = matrix(0, nrow=9, ncol=9)
dtw_matrix
## [,1] [,2] [,3] [,4] [,5] [,6] [,7] [,8] [,9]
## [1,] 0 0 0 0 0 0 0 0 0
## [2,] 0 0 0 0 0 0 0 0 0
## [3,] 0 0 0 0 0 0 0 0 0
## [4,] 0 0 0 0 0 0 0 0 0
## [5,] 0 0 0 0 0 0 0 0 0
## [6,] 0 0 0 0 0 0 0 0 0
## [7,] 0 0 0 0 0 0 0 0 0
## [8,] 0 0 0 0 0 0 0 0 0
## [9,] 0 0 0 0 0 0 0 0 0
The matrix is calculated according to the step pattern formula, starting from \(g(1,1)=d(1,1)\) and with the boundary conditions \(g(i,0)=g(0,j)=\infty\). The infinite values can never be chosen by the min() function, which is why they are not included in the calculations. I will be using the default Euclidean distance between points, which for univariate series is \(d(i,j)=\sqrt{(a1_i-a2_j)^2}=|a1_i-a2_j|\).
First element:
dtw_matrix[1,1] = sqrt((a1[1] - a2[1])^2)
dtw_matrix
##      [,1] [,2] [,3] [,4] [,5] [,6] [,7] [,8] [,9]
## [1,]    2    0    0    0    0    0    0    0    0
## [2,]    0    0    0    0    0    0    0    0    0
## [3,]    0    0    0    0    0    0    0    0    0
## [4,]    0    0    0    0    0    0    0    0    0
## [5,]    0    0    0    0    0    0    0    0    0
## [6,]    0    0    0    0    0    0    0    0    0
## [7,]    0    0    0    0    0    0    0    0    0
## [8,]    0    0    0    0    0    0    0    0    0
## [9,]    0    0    0    0    0    0    0    0    0
First column:
for (i in 2:9){
dtw_matrix[i,1] = sqrt((a1[i] - a2[1])^2) + dtw_matrix[i-1,1]
}
dtw_matrix
##      [,1] [,2] [,3] [,4] [,5] [,6] [,7] [,8] [,9]
## [1,]    2    0    0    0    0    0    0    0    0
## [2,]    6    0    0    0    0    0    0    0    0
## [3,]    7    0    0    0    0    0    0    0    0
## [4,]   11    0    0    0    0    0    0    0    0
## [5,]   18    0    0    0    0    0    0    0    0
## [6,]   19    0    0    0    0    0    0    0    0
## [7,]   20    0    0    0    0    0    0    0    0
## [8,]   21    0    0    0    0    0    0    0    0
## [9,]   24    0    0    0    0    0    0    0    0
First row:
for (j in 2:9){
dtw_matrix[1,j] = sqrt((a1[1] - a2[j])^2) + dtw_matrix[1,j-1]
}
dtw_matrix
##      [,1] [,2] [,3] [,4] [,5] [,6] [,7] [,8] [,9]
## [1,]    2    3    6   10   12   14   15   16   18
## [2,]    6    0    0    0    0    0    0    0    0
## [3,]    7    0    0    0    0    0    0    0    0
## [4,]   11    0    0    0    0    0    0    0    0
## [5,]   18    0    0    0    0    0    0    0    0
## [6,]   19    0    0    0    0    0    0    0    0
## [7,]   20    0    0    0    0    0    0    0    0
## [8,]   21    0    0    0    0    0    0    0    0
## [9,]   24    0    0    0    0    0    0    0    0
The rest of the matrix:
for (i in 2:9){
for (j in 2:9){
dtw_matrix[i,j] = sqrt((a1[i] - a2[j])^2) + min(dtw_matrix[i,j-1], dtw_matrix[i-1,j], dtw_matrix[i-1,j-1])
}
}
dtw_matrix
##      [,1] [,2] [,3] [,4] [,5] [,6] [,7] [,8] [,9]
## [1,]    2    3    6   10   12   14   15   16   18
## [2,]    6    5    8   12   10   14   17   16   16
## [3,]    7    5    7   10   13   11   11   13   16
## [4,]   11    8   10   13   10   14   14   12   12
## [5,]   18   14   16   19   13   17   20   16   15
## [6,]   19   14   16   19   16   14   14   16   18
## [7,]   20   16   14   15   20   15   16   18   21
## [8,]   21   16   16   17   18   16   15   17   20
## [9,]   24   18   20   21   18   19   17   15   16
Find the optimal alignment. The source of the translated algorithm, slightly modified because of a different approach to decisions in the case of equal values: https://nipunbatra.github.io/blog/2014/dtw.html The change is: if \(g(i-1, j-1)=g(i-1, j)\) we choose \(g(i-1, j)\), and if \(g(i-1, j-1)=g(i, j-1)\) we choose \(g(i-1, j-1)\).
path = c(9,9) # starting from the furthest place in the matrix
i = 9
j = 9
while(i>1 | j>1){      # walk back until we reach [1,1]
  if (i == 1) {        # in the first row we can only move left
    j = j - 1
  } else if (j == 1) { # in the first column we can only move up
    i = i - 1
  } else if (dtw_matrix[i-1,j] == min(dtw_matrix[i-1, j-1], dtw_matrix[i-1, j], dtw_matrix[i, j-1])){
    i = i - 1
  } else if (dtw_matrix[i-1,j-1] == min(dtw_matrix[i-1, j-1], dtw_matrix[i-1, j], dtw_matrix[i, j-1])){
    i = i - 1
    j = j - 1
  } else {
    j = j - 1
  }
  path = rbind(path, c(i,j))
}
plot(dtw(a1,a2, step.pattern = symmetric1))
points(path[,1], path[,2], type = "l")
The dtwclust package provides a function tsclust dedicated to clustering time series. Its arguments are presented below.
tsclust(series = NULL, type = "partitional", k = 2L, ..., preproc = NULL, distance = "dtw_basic", centroid = ifelse(type == "fuzzy", "fcm", "pam"), control = do.call(paste0(type, "_control"), list()), args = tsclust_args(), seed = NULL, trace = FALSE, error.check = TRUE)
Arguments that are useful to modify:
As stated at the beginning, the clustering methods most often paired with DTW are PAM and hierarchical, so this is what I will be using. I will perform the analysis on data collected from Eurostat on greenhouse gas emissions per capita (Eurostat (2014)). The structure of this data is visible below; each column represents a different year.
head(emissions)
## X2000 X2001 X2002 X2003 X2004 X2005 X2006 X2007 X2008 X2009 X2010 X2011
## AT 10.3 10.7 10.9 11.5 11.4 11.5 11.1 10.8 10.7 9.8 10.4 10.1
## BE 15.1 14.8 14.6 14.6 14.7 14.2 13.9 13.5 13.4 12.1 12.6 11.5
## BG 7.3 7.9 7.7 8.3 8.3 8.4 8.6 9.1 9.0 7.8 8.3 9.0
## CH 7.9 8.1 7.7 7.8 7.8 7.8 7.8 7.5 7.6 7.3 7.5 6.9
## CY 13.1 13.1 13.2 13.7 13.7 13.6 13.7 13.8 13.8 13.1 12.4 11.7
## CZ 14.7 14.7 14.4 14.7 14.8 14.6 14.7 14.8 14.2 13.3 13.5 13.3
## X2012 X2013 X2014 X2015 X2016
## AT 9.7 9.7 9.2 9.4 9.4
## BE 11.1 11.1 10.5 10.8 10.8
## BG 8.4 7.7 8.2 8.7 8.4
## CH 7.0 7.1 6.5 6.4 6.4
## CY 10.9 10.0 10.6 10.7 11.3
## CZ 12.9 12.4 12.2 12.3 12.4
First I normalise the chosen data and perform clustering using PAM and the DTW distance for each number of clusters from 2 to 17.
library(dtwclust)
emissions.norm <- BBmisc::normalize(emissions, method="standardize")
clust.pam <- tsclust(emissions.norm, type="partitional", k=2L:17L, distance="dtw", centroid="pam")
Next I evaluate the clusterings depending on the number of clusters. Cluster validity indices for a tsclust object can be calculated with the cvi function; by applying it over the list of clusterings, cvi is computed for all chosen numbers of clusters. This is a workaround to get the kind of output normally given by, for example, fviz_nbclust. The tabs contain plots of the individual statistics against the number of clusters.
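The table below was produced along these lines (a sketch; the k_2, ..., k_17 column labels are set manually):
cvis <- sapply(clust.pam, cvi)         # one column of indices per clustering
colnames(cvis) <- paste0("k_", 2L:17L) # label columns by number of clusters
cvis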
## k_2 k_3 k_4 k_5 k_6
## Sil 6.219367e-01 4.804408e-01 3.496008e-01 3.591146e-01 2.978441e-01
## SF 8.580315e-06 3.522597e-08 6.769466e-09 8.090195e-12 1.975753e-12
## CH 4.235012e+01 2.981205e+01 2.316044e+01 2.474140e+01 1.937403e+01
## DB 5.375327e-01 8.667175e-01 9.520239e-01 8.326791e-01 9.168558e-01
## DBstar 5.375327e-01 9.850792e-01 1.789908e+00 2.086180e+00 3.194902e+00
## D 3.031549e-02 1.423910e-02 9.053286e-03 2.084148e-02 2.084148e-02
## COP 2.066182e-01 1.095197e-01 1.019924e-01 6.217320e-02 5.845241e-02
## k_7 k_8 k_9 k_10 k_11
## Sil 3.329783e-01 4.169944e-01 2.385236e-01 2.486685e-01 0.28305440
## SF 1.176836e-14 4.129316e-07 2.220446e-16 6.661338e-16 0.00000000
## CH 1.763049e+01 2.769270e+01 1.211769e+01 1.220693e+01 10.33570198
## DB 1.323454e+00 5.597688e-01 1.059226e+00 1.353445e+00 1.72345891
## DBstar 3.669975e+00 7.364860e-01 4.231454e+00 6.513487e+00 12.43928247
## D 2.716034e-02 1.628321e-01 2.143362e-02 2.716034e-02 0.01307001
## COP 5.248977e-02 2.349145e-02 5.175937e-02 4.834615e-02 0.04264722
## k_12 k_13 k_14 k_15 k_16
## Sil 0.14288643 1.542856e-01 2.507844e-01 2.100680e-01 1.769669e-01
## SF 0.00000000 1.085094e-07 4.008583e-07 8.074189e-08 9.923017e-08
## CH 6.80744274 1.718176e+01 1.952996e+01 1.583676e+01 1.341178e+01
## DB 3.12852806 9.254162e-01 8.237494e-01 7.953705e-01 1.007985e+00
## DBstar 9.33335201 2.083942e+00 1.749763e+00 1.967089e+00 2.724805e+00
## D 0.01577096 5.372844e-02 1.753211e-01 1.705979e-01 5.877414e-02
## COP 0.06281129 2.132327e-02 1.485456e-02 1.677851e-02 1.788782e-02
## k_17
## Sil 6.945246e-02
## SF 3.237554e-07
## CH 1.393392e+01
## DB 7.515518e-01
## DBstar 2.182193e+00
## D 8.606837e-02
## COP 1.293244e-02
Of these statistics, Sil, SF, CH and D are to be maximized, while DB, DBstar and COP are to be minimized.
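The plots in the tabs can be reproduced with a simple sketch like the one below; any row of the cvis matrix computed above can be substituted for "Sil":
plot(2:17, cvis["Sil", ], type="b", xlab="number of clusters", ylab="Sil")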
The best results across the statistics were almost uniformly obtained for 6 clusters. Next I perform the clustering with 6 clusters only.
clust.pam <- tsclust(emissions.norm, type="partitional", k=6L, distance="dtw", clustering="pam")
Using type=“sc”, I plot the clusters together with the time series that belong to them. The dashed line represents the medoid time series.
plot(clust.pam, type = "sc")
Below is a list of countries and which cluster they belong to.
t(cbind(emissions[,0], cluster = clust.pam@cluster))
## AT BE BG CH CY CZ DE DK EE EL ES FI FR HR HU IE IS IT LI LT LU LV
## cluster 4 4 1 1 4 5 4 4 3 4 2 5 2 6 6 5 3 2 1 6 3 6
## MT NL NO PL PT RO SE SI SK TR UK
## cluster 1 5 4 4 1 6 1 2 2 6 2
Next, in each of the tabs, I plot each of the individual clusters from above separately.
Here I plot the clusters in three ways. series shows the members of the chosen cluster (selected with clus = kL, where k is the cluster number). centroids shows the time series that was chosen as the medoid. The plot with type=“sc” is in fact a combination of series and centroids.
plot(clust.pam, type = "sc", clus = 1L)
plot(clust.pam, type = "series", clus = 1L)
plot(clust.pam, type = "centroids", clus = 1L)
plot(clust.pam, type = "sc", clus = 2L)
plot(clust.pam, type = "sc", clus = 3L)
plot(clust.pam, type = "sc", clus = 4L)
plot(clust.pam, type = "sc", clus = 5L)
plot(clust.pam, type = "sc", clus = 6L)
With hierarchical clustering the syntax is the same as before; the only change is that the type is now “h”.
clust.hier <- tsclust(emissions.norm, type = "h", k = 6L, distance = "dtw")
Below is the hierarchical tree. The boxes mark the countries that are clustered together in terms of emissions.
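The tree can be drawn along these lines (a sketch; it assumes the hierarchical result can be coerced to an hclust object, with rect.hclust drawing the boxes):
hc <- as.hclust(clust.hier) # coercion is an assumption about the dtwclust object
plot(hc, xlab="", sub="")
rect.hclust(hc, k=6)        # boxes around the 6 clusters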
Here are the time series divided into clusters, this time using the hierarchical method.
plot(clust.hier, type="sc")
Below is a list of countries and which cluster they belong to.
cutree(clust.hier, k=6L)
## AT BE BG CH CY CZ DE DK EE EL ES FI FR HR HU IE IS IT LI LT LU LV MT NL NO
## 1 2 1 3 4 2 4 4 5 4 1 2 1 3 3 2 5 1 3 3 6 3 3 2 4
## PL PT RO SE SI SK TR UK
## 1 3 3 3 1 1 3 4
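Before comparing validity indices, the two partitions can be cross-tabulated to see how far they overlap (a quick sketch; cluster labels are arbitrary across methods, so only the grouping pattern matters):
table(pam = clust.pam@cluster, hier = cutree(clust.hier, k=6L))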
Comparing the cluster validity indices of both methods for k=6, the hierarchical method performs better on every computed index: silhouette, Score Function, Caliński-Harabasz and Dunn (to be maximized), as well as the classic and modified Davies-Bouldin and COP (to be minimized).
cvi(clust.hier)
## Sil SF CH DB DBstar
## 4.707510e-01 1.069388e-06 3.200694e+01 5.473567e-01 5.853817e-01
## D COP
## 1.961133e-01 3.267657e-02
cvi(clust.pam)
## Sil SF CH DB DBstar
## 3.242636e-01 3.197442e-14 2.112754e+01 1.385326e+00 3.210130e+00
## D COP
## 2.723451e-02 5.796156e-02
Aghabozorgi, Saeed, Ali Seyed Shirkhorshidi, and Teh Ying Wah. 2015. “Time-Series Clustering–A Decade Review.” Information Systems 53. Elsevier: 16–38.
Eurostat. 2014. “Greenhouse Gas Emissions Per Capita (T2020_rd300).” https://ec.europa.eu/eurostat/tgm/table.do?tab=table&init=1&language=en&pcode=t2020_rd300&plugin=1.
Guijo-Rubio, David, Antonio Manuel Durán-Rosal, Pedro Antonio Gutiérrez, Alicia Troncoso, and César Hervás-Martínez. 2018. “Time Series Clustering Based on the Characterisation of Segment Typologies.” arXiv Preprint arXiv:1810.11624.
Liao, T Warren. 2005. “Clustering of Time Series Data—a Survey.” Pattern Recognition 38 (11). Elsevier: 1857–74.
Rote, Günter. 1991. “Computing the Minimum Hausdorff Distance Between Two Point Sets on a Line Under Translation.” Information Processing Letters 38 (3). Elsevier: 123–27.
Stack Exchange. 2015. “Dynamic Time Warping Using K-Means.” https://stats.stackexchange.com/questions/131281/dynamic-time-warping-clustering.
Wang, Xiaoyue, Abdullah Mueen, Hui Ding, Goce Trajcevski, Peter Scheuermann, and Eamonn Keogh. 2013. “Experimental Comparison of Representation Methods and Distance Measures for Time Series Data.” Data Mining and Knowledge Discovery 26 (2). Springer: 275–309.
Zeng, Jianping, Jiangjiao Duan, and Chengrong Wu. 2010. “A New Distance Measure for Hidden Markov Models.” Expert Systems with Applications 37 (2). Elsevier: 1550–5.