- K-means is a distance-based method for cluster analysis in data mining
- It partitions a set of data points into K groups so that points in the same group are as similar as possible
- Each group, called a cluster, is represented by its center (centroid), the mean of the points assigned to it
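The following small sketch shows what "represented by its center" means in k-means: the center is the coordinate-wise mean of the cluster's points. The helper name `centroid` is hypothetical and chosen only for illustration.

```python
from typing import List, Sequence

Point = Sequence[float]


def centroid(points: List[Point]) -> List[float]:
    """Return the center (coordinate-wise mean) of a non-empty list of points."""
    if not points:
        raise ValueError("cannot compute the centroid of an empty cluster")
    # zip(*points) groups the d-th coordinate of every point together
    return [sum(coords) / len(points) for coords in zip(*points)]


cluster = [(1.0, 2.0), (3.0, 4.0), (5.0, 0.0)]
print(centroid(cluster))  # [3.0, 2.0]
```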
Algorithm
Given K, the number of clusters, k-means clustering works as follows:
- Select K points as initial centroids
- Repeat
  - Form K clusters by assigning each point to its closest centroid
  - Recompute the centroid of each cluster
- Until a convergence criterion is satisfied (e.g., the assignments no longer change)
- Different distance or similarity measures can be used to find the closest centroid (L1 norm, L2 norm, cosine similarity, ...)
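The loop above can be sketched in pure Python as follows. This is a minimal illustration, not a reference implementation. It assumes that the initial centroids are K distinct data points chosen at random, that convergence means "no assignment changed", and that an empty cluster keeps its previous centroid. The names `kmeans`, `l1`, `l2`, `cosine_distance` and the `dist` parameter are hypothetical. The update step always uses the mean, which is the standard pairing for the L2 norm. With the L1 norm, a median-based update (k-medians) is the usual choice.

```python
import math
import random
from typing import Callable, List, Sequence, Tuple

Point = Sequence[float]
Distance = Callable[[Point, Point], float]


def l2(a: Point, b: Point) -> float:
    """Euclidean (L2) distance."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def l1(a: Point, b: Point) -> float:
    """Manhattan (L1) distance."""
    return sum(abs(x - y) for x, y in zip(a, b))


def cosine_distance(a: Point, b: Point) -> float:
    """1 - cosine similarity; assumes neither vector is all zeros."""
    dot = sum(x * y for x, y in zip(a, b))
    norms = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return 1.0 - dot / norms


def mean(points: List[Point]) -> List[float]:
    """Coordinate-wise mean of a non-empty list of points."""
    return [sum(coords) / len(points) for coords in zip(*points)]


def kmeans(points: List[Point], k: int, dist: Distance = l2,
           max_iter: int = 100, seed: int = 0) -> Tuple[List[List[float]], List[int]]:
    rng = random.Random(seed)
    # Select K distinct data points as the initial centroids (assumed strategy).
    centroids = [list(p) for p in rng.sample(points, k)]
    labels: List[int] = [-1] * len(points)
    for _ in range(max_iter):
        # Assign each point to its closest centroid under `dist`.
        new_labels = [min(range(k), key=lambda j: dist(p, centroids[j]))
                      for p in points]
        # Convergence criterion (assumed): no assignment changed.
        if new_labels == labels:
            break
        labels = new_labels
        # Recompute each centroid as the mean of the points assigned to it.
        for j in range(k):
            members = [p for p, lab in zip(points, labels) if lab == j]
            if members:  # keep the old centroid if the cluster is empty
                centroids[j] = mean(members)
    return centroids, labels


if __name__ == "__main__":
    data = [(1.0, 1.0), (1.5, 2.0), (1.0, 1.5), (8.0, 8.0), (8.5, 9.0), (9.0, 8.0)]
    centroids, labels = kmeans(data, k=2)
    # The first three points share one label and the last three share the other.
    print(labels, centroids)
```

Passing a different function as `dist`, for example `kmeans(data, k=2, dist=l1)`, changes only how "closest centroid" is decided. Because the `max_iter` cap bounds the loop, the function always returns, even if the assignments never settle.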