Clustering BoP representations of heart rates

This time we generated bag-of-patterns (BoP) representations of the heart rate time series from several PhysioNet databases: sddb, nsrdb, chf2db, and ltafdb.

The representations are generated from the whole time series available (including events such as cardiac deaths), so the time series are of different lengths. They are, however, all sampled at the same constant rate of 4 heart rate measurements per second (one every quarter of a second). All the representations are calculated using the same parameters: subsequence size = 4*60 = 240 samples (a one-minute interval), alphabet size = 4, word size = 4. These seem reasonable values to start experimenting with.

Each representation (histogram) is normalized before clustering, in the sense that the values in the histogram are relative frequencies and add up to one for each representation. (Well, it’s probably not really a histogram, since we don’t have a continuous variable, but let’s just call it a histogram anyway.) Without this normalization, time series of different lengths would differ simply because of their length. This might still not be completely OK for some reason.
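To make the construction concrete, here is a minimal base R sketch of how one such normalized representation could be computed with SAX words. The function name `bop_histogram`, the window step of one sample, and the absence of numerosity reduction are assumptions for illustration, and the input is a synthetic series, not PhysioNet data; it is not necessarily the code used for this analysis.

```r
# Hypothetical helper: normalized bag-of-patterns vector for one series.
# Assumes a sliding step of 1 and no numerosity reduction (not stated in the text).
bop_histogram <- function(x, window = 4 * 60, word_size = 4, alphabet_size = 4) {
  stopifnot(length(x) >= window, window %% word_size == 0)
  symbols <- letters[seq_len(alphabet_size)]
  # Equiprobable breakpoints under a standard normal distribution
  breaks <- qnorm(seq_len(alphabet_size - 1) / alphabet_size)
  # Fixed vocabulary so that all records share the same bins in the same order
  grid <- expand.grid(rep(list(symbols), word_size), stringsAsFactors = FALSE)
  vocab <- sort(do.call(paste0, unname(as.list(grid))))

  n_windows <- length(x) - window + 1
  words <- character(n_windows)
  for (i in seq_len(n_windows)) {
    w <- x[i:(i + window - 1)]
    s <- sd(w)
    # z-normalize the window; a flat window maps to all zeros
    z <- if (s > 0) (w - mean(w)) / s else rep(0, window)
    # Piecewise aggregate approximation: mean of each consecutive segment
    paa <- colMeans(matrix(z, ncol = word_size))
    words[i] <- paste(symbols[findInterval(paa, breaks) + 1], collapse = "")
  }
  counts <- table(factor(words, levels = vocab))
  # Relative frequencies, so series of different lengths are comparable
  setNames(as.numeric(counts) / sum(counts), vocab)
}

set.seed(1)
hr <- 70 + cumsum(rnorm(1000, sd = 0.2))  # synthetic heart rate series
h <- bop_histogram(hr)
length(h)  # 4^4 = 256 possible words
sum(h)     # relative frequencies add up to one (up to rounding)
```

With alphabet size 4 and word size 4 there are 256 possible words, so every record becomes a vector of 256 relative frequencies regardless of its length.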

The goal: cluster all the representations and see whether the ones from the same database cluster together. This is just to see how different the databases are according to our representations, and maybe to get some ideas.

## 
## Call:
## hclust(d = distMatrix)
## 
## Cluster method   : complete 
## Distance         : euclidean 
## Number of objects: 154
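For reference, a summary like the one above can be reproduced along the following lines. This is a sketch on synthetic data: `bopMatrix` is a hypothetical name for a matrix with one normalized representation per row, and the random values stand in for the real 154 records.

```r
# Synthetic stand-in: 154 records, 256 relative frequencies each
set.seed(42)
bopMatrix <- matrix(rexp(154 * 256), nrow = 154)  # hypothetical name
bopMatrix <- bopMatrix / rowSums(bopMatrix)       # each row sums to one

# Pairwise Euclidean distances between records
distMatrix <- dist(bopMatrix, method = "euclidean")

# hclust() uses complete linkage unless another method is given
hc <- hclust(distMatrix)
print(hc)
```

Calling `plot(hc)` on the result draws the dendrogram.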

We have 4 different databases of records, so let’s look at the point where we have 4 clusters (at a distance of roughly 0.2).

The cluster on the right contains only sddb records (note: the numbers do not map to record numbers). The second cluster from the right contains a mixture of records from all three other databases (but no sddb records). The third cluster from the right again contains only sddb records. The leftmost cluster contains mostly ltafdb records, but also some chf2db records. So there seems to be some discriminatory potential in the method for these data, which is good. Something to print out and put on the wall for thought.
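Reading cluster membership off a dendrogram by eye is error-prone, so the same comparison could also be made numerically by cutting the tree and cross-tabulating clusters against database labels. The sketch below uses toy data; the label vector `db`, the group sizes, and the feature matrix are all made up for illustration.

```r
set.seed(7)
# Hypothetical labels: 10 toy records per database (not the real counts)
db <- rep(c("sddb", "nsrdb", "chf2db", "ltafdb"), each = 10)

# Toy features: one random center per database plus small noise
centers <- matrix(runif(4 * 8), nrow = 4)
x <- centers[match(db, unique(db)), ] + matrix(rnorm(40 * 8, sd = 0.05), nrow = 40)

hc <- hclust(dist(x))

# Cut the tree into exactly 4 clusters
clusters_k <- cutree(hc, k = 4)

# Alternatively, cut at a fixed height; the number of clusters then depends on the tree
clusters_h <- cutree(hc, h = 0.2)

# Rows are clusters, columns are databases
tab <- table(cluster = clusters_k, database = db)
print(tab)
```

A cluster that contains records from a single database shows up as a row with one nonzero entry.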

Some other possible clusterings, using different linkage methods.

## 
## Call:
## hclust(d = distMatrix, method = "single")
## 
## Cluster method   : single 
## Distance         : euclidean 
## Number of objects: 154

## 
## Call:
## hclust(d = distMatrix, method = "average")
## 
## Cluster method   : average 
## Distance         : euclidean 
## Number of objects: 154

## 
## Call:
## hclust(d = distMatrix, method = "ward.D")
## 
## Cluster method   : ward.D 
## Distance         : euclidean 
## Number of objects: 154

## 
## Call:
## hclust(d = distMatrix, method = "ward.D2")
## 
## Cluster method   : ward.D2 
## Distance         : euclidean 
## Number of objects: 154
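The five linkage methods shown here can be run in one loop and compared at the same cut. The sketch below does this on synthetic data and reports the cluster sizes at 4 clusters for each method; the data and the names `trees` and `sizes` are assumptions. One point from the `hclust` documentation is worth keeping in mind: `"ward.D2"` implements Ward's criterion on unsquared distances, while `"ward.D"` does so only if it is given squared distances.

```r
# Synthetic stand-in for the 154 normalized representations
set.seed(42)
bopMatrix <- matrix(rexp(154 * 256), nrow = 154)  # hypothetical name
bopMatrix <- bopMatrix / rowSums(bopMatrix)
distMatrix <- dist(bopMatrix)

methods <- c("complete", "single", "average", "ward.D", "ward.D2")
trees <- lapply(methods, function(m) hclust(distMatrix, method = m))
names(trees) <- methods

# Cluster sizes at k = 4, largest first; one column per linkage method
sizes <- sapply(trees, function(tr) {
  sort(tabulate(cutree(tr, k = 4)), decreasing = TRUE)
})
print(sizes)
```

Very unbalanced sizes for a method (for example one huge cluster and a few singletons) suggest that its 4-cluster cut is not directly comparable with the complete linkage result.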


Crt Ahlin