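# Read the raw data; the first row holds the column names
df <- read.csv('./dataset.csv', header = TRUE)
# Parse timestamps as POSIXct, which is safer than POSIXlt inside a data frame
df$time <- as.POSIXct(df$time, format = '%m/%d/%Y %H:%M')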
library(lubridate)
##
## Attaching package: 'lubridate'
## The following object is masked from 'package:base':
##
## date
As a first step, we check whether the data are clean. Specifically, we inspect how the records are distributed over time (by year, month, day, hour, and minute) so that any sampling bias can be identified and removed.
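Besides the histograms, a few basic record-level checks can support the cleanliness assessment. The following is a minimal sketch on a toy data frame; the column names and values are assumptions made for illustration and do not come from the actual dataset.

```r
# Toy stand-in for the raw file (hypothetical values and column names)
raw <- data.frame(
  time = c("02/01/2017 00:05", "02/01/2017 00:05",
           "13/45/2017 00:10", "03/02/2017 10:15"),
  d5   = c(1.2, 1.2, NA, 0.7),
  stringsAsFactors = FALSE
)

# Timestamps that do not match the expected format become NA
parsed <- as.POSIXct(raw$time, format = "%m/%d/%Y %H:%M", tz = "UTC")

n_bad_time   <- sum(is.na(parsed))       # unparseable timestamps
n_missing    <- colSums(is.na(raw[-1]))  # missing readings per sensor column
n_duplicated <- sum(duplicated(raw))     # fully duplicated records

print(c(bad_time = n_bad_time, duplicated = n_duplicated))
print(n_missing)
```

Counts above zero point to records that may need to be dropped or repaired before looking at the distributions.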
hist(year(df$time), main = 'Annual distribution', breaks = 2)
hist(month(df$time), main = 'Monthly distribution', breaks = 12)
hist(day(df$time), main = 'Daily distribution')
hist(hour(df$time), main = 'Hourly distribution')
hist(minute(df$time), main = 'By Minute distribution')
As the figures above show, the readings are not evenly distributed across months. We therefore sample from the months of February, March, July, and August.
Given the distribution of the sensor readings, a subset of the sensors (i.e., d5-d9) can be selected for clustering.
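The subsetting step itself is not shown here. One possible way to restrict the data to the chosen months and devices is sketched below on a toy data frame; the object names (toy, smp_example) and the reduced device list are hypothetical, and base R is used so that the sketch has no package dependencies (lubridate's month() would give the same month numbers).

```r
# Toy data: one reading per day for 2017, three hypothetical devices
set.seed(1)
toy <- data.frame(
  time = seq(as.POSIXct("2017-01-01 00:00", tz = "UTC"),
             by = "day", length.out = 365),
  d4 = runif(365), d5 = runif(365), d6 = runif(365)
)

keep_months  <- c(2, 3, 7, 8)   # Feb., Mar., Jul., Aug.
keep_devices <- c("d5", "d6")   # stand-in for the selected devices

# POSIXlt stores months as 0-11, hence the + 1
month_num <- as.POSIXlt(toy$time)$mon + 1

# Keep the time column plus the selected devices, for the selected months only
smp_example <- toy[month_num %in% keep_months, c("time", keep_devices)]
print(table(as.POSIXlt(smp_example$time)$mon + 1))
```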
Next, we cluster the readings using dynamic time warping. For this purpose, the data first need to be pivoted so that each day forms one series.
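To make the pivoting step concrete, the sketch below turns a long table of readings into a matrix with one row per day, padding shorter days with zeros. It is a simplified base-R illustration on made-up values, not the code used in this analysis; the names long, by_day, and series are hypothetical.

```r
# Long format: several readings per day (hypothetical values)
long <- data.frame(
  day   = as.Date(c("2017-02-01", "2017-02-01", "2017-02-01",
                    "2017-02-02", "2017-02-02")),
  value = c(1, 2, 3, 4, 5)
)

# Group the readings by day
by_day <- split(long$value, long$day)

# Pad every day to the length of the longest one, using 0 as filler
n_max <- max(lengths(by_day))
wide  <- vapply(by_day,
                function(v) c(v, rep(0, n_max - length(v))),
                numeric(n_max))

# One row per day: the layout expected by row-wise series clustering
series <- t(wide)
print(series)
```

Zero padding changes the shape of the shorter series, so it is worth checking how many days are affected before clustering.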
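In this case, fuzzy clustering with the chord distance (tsclust from the dtwclust package) has been applied. For fuzzy clustering, the following cluster validity indices are used:
– "MPC" (modified partition coefficient): to be maximized.
– "K" (~) (Kwon index): to be minimized.
– "T" (Tang index): to be minimized.
– "SC" (~) (index of Zahid et al.): to be maximized.
– "PBMF" (~) (Pakhira-Bandyopadhyay-Maulik fuzzy index): to be maximized.
In the dtwclust documentation, the tilde (~) marks indices that require a global centroid to be computed.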
Wang, W., & Zhang, Y. (2007). On fuzzy cluster validity indices. Fuzzy Sets and Systems, 158(19), 2095-2117.
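As an illustration of how one of these indices behaves, the modified partition coefficient can be computed directly from a fuzzy membership matrix. The sketch below uses the standard definition, MPC = 1 - k/(k-1) * (1 - PC) with PC the mean of the squared memberships summed per object; the function name mpc and the example matrices are hypothetical, and the dtwclust implementation may differ in detail.

```r
# u: membership matrix with one row per series and one column per cluster;
#    each row is assumed to sum to 1
mpc <- function(u) {
  k  <- ncol(u)
  pc <- sum(u^2) / nrow(u)      # partition coefficient
  1 - k / (k - 1) * (1 - pc)    # rescaled so that uniform memberships give 0
}

crisp   <- diag(3)                          # every series fully in one cluster
uniform <- matrix(1 / 3, nrow = 4, ncol = 3) # every series equally in all clusters

print(mpc(crisp))    # upper end of the scale
print(mpc(uniform))  # lower end of the scale (zero up to rounding)
```

Values close to 0 therefore indicate memberships that are spread almost evenly over the clusters.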
library(dtw)
## Loading required package: proxy
##
## Attaching package: 'proxy'
## The following objects are masked from 'package:stats':
##
## as.dist, dist
## The following object is masked from 'package:base':
##
## as.matrix
## Loaded dtw v1.21-3. See ?dtw for help, citation("dtw") for use in publication.
library(dtwclust)
## dtwclust:
## Setting random number generator to L'Ecuyer-CMRG (see RNGkind()).
## To read the included vignettes type: browseVignettes("dtwclust").
## See news(package = "dtwclust") after package updates.
library(ggplot2)
library(reshape2)
library(rowr)
smp_devices <- smp_months
smp_devices$time <- date(smp_devices$time)  # keep only the calendar date
# NOTE: tmp is initialised once, outside the device loop, so the daily series
# of each device are appended to those of the devices processed before it.
tmp <- NULL
for (col in names(smp_devices)[-1]) {  # every column except time
  print(paste("Pivoting", col))
  # Pivot: one column per day; shorter days are padded with 0
  for (tm in unique(as.character(smp_devices$time))) {
    if (is.null(tmp)) {
      tmp <- as.data.frame(smp_devices[smp_devices$time == tm, col])
    } else {
      tmp <- cbind.fill(tmp, smp_devices[smp_devices$time == tm, col], fill = 0)
    }
    colnames(tmp)[ncol(tmp)] <- tm
  }
  # Fuzzy clustering of the daily series (rows of t(tmp)) with chord distance
  clust <- tsclust(t(tmp), type = 'fuzzy', k = 14, distance = 'chord', seed = 74638, trace = TRUE)
  print(paste("Device = ", col, " CVIs and Cluster's members plot"))
  print(cvi(clust))
  print(plot(clust) + labs(title = paste(col, " Clusters' members"), x = " "))
}
[1] "Pivoting d5"
Iteration 1: Objective = 1.8616
Iteration 2: Objective = 1.8415
Iteration 3: Objective = 1.8188
Iteration 4: Objective = 1.7563
Iteration 5: Objective = 1.6407
Iteration 6: Objective = 1.5583
Iteration 7: Objective = 1.5175
Iteration 8: Objective = 1.5013
Iteration 9: Objective = 1.4946
Iteration 10: Objective = 1.4914
Iteration 11: Objective = 1.4900
Iteration 12: Objective = 1.4890
Elapsed time is 0.33 seconds.
[1] "Device = d5 CVIs and Cluster's members plot"
         MPC            K            T           SC         PBMF
   0.1165810 3783.4513246   24.4268177  -11.7873827    0.9127634
[1] "Pivoting d6"
Iteration 1: Objective = 3.4965
Iteration 2: Objective = 3.4796
Iteration 3: Objective = 3.4525
Iteration 4: Objective = 3.3898
Iteration 5: Objective = 3.2673
Iteration 6: Objective = 3.1247
Iteration 7: Objective = 3.0249
Iteration 8: Objective = 2.9695
Iteration 9: Objective = 2.9431
Iteration 10: Objective = 2.9334
Iteration 11: Objective = 2.9308
Iteration 12: Objective = 2.9305
Elapsed time is 0.64 seconds.
[1] "Device = d6 CVIs and Cluster's members plot"
          MPC             K             T            SC          PBMF
 7.304120e-02  2.602616e+05  4.437732e+01 -1.880990e+01  9.040046e-01
[1] "Pivoting d7"
Iteration 1: Objective = 4.9364
Iteration 2: Objective = 4.8990
Iteration 3: Objective = 4.8195
Iteration 4: Objective = 4.6844
Iteration 5: Objective = 4.5529
Iteration 6: Objective = 4.4443
Iteration 7: Objective = 4.3561
Iteration 8: Objective = 4.2943
Iteration 9: Objective = 4.2662
Iteration 10: Objective = 4.2552
Iteration 11: Objective = 4.2492
Iteration 12: Objective = 4.2462
Iteration 13: Objective = 4.2453
Elapsed time is 0.87 seconds.
[1] "Device = d7 CVIs and Cluster's members plot"
          MPC             K             T            SC          PBMF
 9.256198e-02  5.536410e+04  6.244920e+01 -1.341852e+01  8.663691e-01
[1] "Pivoting d8"
Iteration 1: Objective = 6.6242
Iteration 2: Objective = 6.5937
Iteration 3: Objective = 6.5205
Iteration 4: Objective = 6.3728
Iteration 5: Objective = 6.2149
Iteration 6: Objective = 6.0875
Iteration 7: Objective = 5.9780
Iteration 8: Objective = 5.8684
Iteration 9: Objective = 5.7868
Iteration 10: Objective = 5.7332
Iteration 11: Objective = 5.7055
Iteration 12: Objective = 5.6978
Iteration 13: Objective = 5.6987
Elapsed time is 1.17 seconds.
[1] "Device = d8 CVIs and Cluster's members plot"
          MPC             K             T            SC          PBMF
 9.217722e-02  9.287959e+04  8.304655e+01 -1.342204e+01  8.299724e-01
[1] "Pivoting d9"
Iteration 1: Objective = 8.9323
Iteration 2: Objective = 8.9088
Iteration 3: Objective = 8.8521
Iteration 4: Objective = 8.7110
Iteration 5: Objective = 8.4965
Iteration 6: Objective = 8.3037
Iteration 7: Objective = 8.1821
Iteration 8: Objective = 8.0813
Iteration 9: Objective = 7.9813
Iteration 10: Objective = 7.8756
Iteration 11: Objective = 7.7921
Iteration 12: Objective = 7.7437
Iteration 13: Objective = 7.7246
Iteration 14: Objective = 7.7191
Iteration 15: Objective = 7.7146
Iteration 16: Objective = 7.7021
Iteration 17: Objective = 7.6791
Iteration 18: Objective = 7.6709
Iteration 19: Objective = 7.6732
Iteration 20: Objective = 7.6742
Elapsed time is 2.14 seconds.
[1] "Device = d9 CVIs and Cluster's members plot"
          MPC             K             T            SC          PBMF
    0.1074794 15181.8335651   110.9195130   -11.6246174     0.8911483
#interactive_clustering(t(tmp))