Kernel K-Means, K-Means++ and Cluster Evaluation

In this article, the R / Python implementations of KMeans Clustering and Kernel KMeans Clustering algorithms will be used to cluster a few datasets. Then the cluster assignments are compared with the ground truths w.r.t. (R implementations of) the following supervised clustering evaluation metrics: purity and NMI.

KMeans and Kernel KMeans Clustering

The following two animations show the partitions induced in by KMeans clustering in the following two datasets. On the first dataset, KMeans does a good job, but on the second dataset it performs poorly.

Data File Format

The next animation shows that the Kernel KMeans Clustering (implemented in R with Gaussian Kernel) performs considerably better on the second dataset, starting with the Gram matrix \(\phi\), with \(\phi_{(i,j)}=e^{-\frac{(X_i-X_j)^2}{2\sigma^2}}\), with \(\sigma=5\).

Data File Format

The next couple of animations show the clusters obtained at diffetent iterations with the algorithms KMean and Kernel KMeans clustering (with \(\sigma=4\)) respectively, on the same dataset, but this time implemented in python.
The next couple of animations show the clusters obtained at diffetent iterations with the algorithms KMean and Kernel KMeans clustering (with \(\sigma=4\)) respectively, on another dataset.
The next couple of animations show the clusters obtained at diffetent iterations with the algorithms KMean and Kernel KMeans clustering (with \(\sigma=4\)) respectively, on yet another dataset.

KMeans++ Initialization

Instead of random initialization for the KMeans clustering to start with the initial centroids, KMeans++ uses a smart initialization to probabilisitically assure that the centroids are far from each other. The following figure describes the smart initialization.
The following figures / animations show the comparative results in between KMeans with and without smart KMeans++ initialization, in terms of initialization time taken, the iterative convergence of the algorithm to a local optimum and cluster quality (in terms of heterogenity) on a dataset with k=7 clusters. As can be seen, in general KMeans++ smart initialization takes more time, but generally tends to converge faster and to better quality clusters.
The next figures / animations show the comparative results in between KMeans with and without smart KMeans++ initialization, in terms of initialization time taken, the iterative convergence of the algorithm to a local optimum and cluster quality (in terms of heterogenity) on another dataset with k=5 clusters.

Cluster Evaluations

The following figure shows the mathematical formulations of two supervised clustering evaluation metrics, purity and NMI that will be used to evaluate the clusters obtained.
As can be seen from the following evaluation results by running the algorithms on the following dataset, Kernel KMeans with K=2 performs the best (keeping \(\sigma\) of the kernel fixed at 4) in terms of the purity and NMI, when compared to the ground truth shown next.
Again, as can be seen from the following evaluation results by running the algorithms on another dataset, Kernel KMeans with K=2 performs the best (keeping \(\sigma\) of the kernel fixed at 4) in terms of the purity and NMI, when compared to the ground truth shown next.
Now, keeping the number of clusters fixed at K=2, the \(\sigma\) for the kernel is varied for both the datasets. As shown in the next figures / animations and results, for the kernel higher \(\sigma\) values, Kernel KMeans performs better in terms of the purity and NMI.