Inspired by John W. Foreman book "Data Smart".
Clustering - detection groups of similar cases in data. Classical approach is to compute pairwise distances between cases and with various method combine nearest cases. Above mentioned book suppose approach when we considered our data as graph. If distance between two cases is less than threshold value than these two cases considered as connected nodes. Further we could apply to this graph various methods of communities detection. Each discovered community corresponds to cluster. So we have following algorithm:
- Compute pairwise distance
- Build adjacency matrix - (matrix with 0/1 values - is this cases connected or not)
- Build graph with adjacency matrix
- Detect community on this graph