k-means clustering using the 1974 Motor Trend US (mtcars) dataset

Juande
Dec 20 2014

k-means

k-means is a popular clustering method used in data mining. It works by pairing each point or observation with any of the n centroids which represent an individual cluster.

At the initialization part, the k initial centroids are randomly generated within the data domain. At each iteration each observation is paired with its nearest centroid, then for each centroid, the mean is calculated and its position is updated. These 2 steps are repeated until convergence has been reached.

More information available at: k-means clustering

Example

We will present an example to see how the centroids converge after a number of iterations. To show this, we will use 3 centroids. Iteration 1

plot of chunk unnamed-chunk-1

Example

Iteration 2. Notice how the rightmost centroid updates its position

plot of chunk unnamed-chunk-2

Example

At the third iteration the algorithm converges because none of the centroids was updated.

plot of chunk unnamed-chunk-3

Shiny App

Click the following link to see the webapp build in Shiny: k-means clustering in Shiny