Clustering App: Choose the number of clusters

May 2020

The app relevance

The partitioning obtained with different numbers of clusters can be explored by hand, by repeatedly running code like this and changing N each time:

    library(ggplot2)

    # load iris data: four numeric measurements plus the species label
    x <- iris[, -5]
    y <- iris$Species

    # k-means for 3 clusters (the initial centers are random, so fix a seed)
    set.seed(1)
    N <- 3
    kc <- kmeans(x, centers = N)

    # plot the k-means output: colour = assigned cluster,
    # shape = true species, large points = cluster centers
    ggplot(x, aes(x = Sepal.Length, y = Sepal.Width)) +
        geom_point(col = kc$cluster, shape = as.numeric(y), size = 2) +
        geom_point(data = as.data.frame(kc$centers), col = 1:N, size = 5) +
        labs(x = "Sepal length", y = "Sepal width",
             title = paste(N, "clusters")) +
        theme_classic()


Running it once for each value of N produces several plots, which then have to be compared with each other.
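As a rough illustration of that manual workflow, the sketch below fits k-means once for each number of clusters from 1 to 6 (the range assumed from the app slider) and draws the panels in a 2 x 3 grid. It uses base R graphics instead of ggplot2 so it needs no extra packages; the seed and `nstart = 10` are arbitrary choices, not taken from the app.

```r
set.seed(42)                        # arbitrary seed: k-means starts are random
x  <- iris[, -5]                    # the four numeric measurements
ks <- 1:6                           # same range as the app slider (assumed)
fits <- lapply(ks, function(k) kmeans(x, centers = k, nstart = 10))

op <- par(mfrow = c(2, 3))          # 2 x 3 grid of panels
for (i in seq_along(ks)) {
  km <- fits[[i]]
  # colour = assigned cluster, symbol = true species
  plot(x$Sepal.Length, x$Sepal.Width, col = km$cluster,
       pch = as.numeric(iris$Species),
       xlab = "Sepal length", ylab = "Sepal width",
       main = paste(ks[i], "clusters"))
  # cluster centers drawn as large stars in the matching colour
  points(km$centers[, "Sepal.Length"], km$centers[, "Sepal.Width"],
         col = seq_len(ks[i]), pch = 8, cex = 2)
}
par(op)                             # restore the previous layout
```

Even with all panels on one page, the comparison stays static: every change means editing and rerunning the script.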

How to use the app

An interactive plot, in which the number of clusters can be varied and the partitioning of the data updates accordingly, simplifies this task.

The app uses the Iris dataset. Moving the slider in the left panel sets the number of clusters, from 1 to 6, into which the observations are classified.
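The app's source code is not shown here, so the following is only a hypothetical sketch of how a Shiny app with this layout could be written: a slider from 1 to 6 in the left (sidebar) panel, the cluster plot at the top right and the centroid table at the bottom right. The helper `fit_clusters()`, the input ID `k`, the seed and `nstart` are illustrative assumptions, and the plot uses base graphics instead of ggplot2.

```r
library(shiny)

# Hypothetical helper: k-means on the four iris measurements
fit_clusters <- function(k, seed = 42) {
  set.seed(seed)                    # reproducible random starts
  kmeans(iris[, -5], centers = k, nstart = 10)
}

ui <- fluidPage(
  titlePanel("Choose the number of clusters"),
  sidebarLayout(
    sidebarPanel(                   # left panel: the slider
      sliderInput("k", "Number of clusters",
                  min = 1, max = 6, value = 3, step = 1)
    ),
    mainPanel(                      # right side, stacked top to bottom
      plotOutput("clusterPlot"),    # top: clustering plot
      tableOutput("centroids")      # bottom: centroid values
    )
  )
)

server <- function(input, output) {
  # one reactive fit shared by the plot and the table
  fit <- reactive(fit_clusters(input$k))

  output$clusterPlot <- renderPlot({
    km <- fit()
    plot(iris$Sepal.Length, iris$Sepal.Width,
         col = km$cluster, pch = as.numeric(iris$Species),
         xlab = "Sepal length", ylab = "Sepal width",
         main = paste(input$k, "clusters"))
    points(km$centers[, "Sepal.Length"], km$centers[, "Sepal.Width"],
           col = seq_len(input$k), pch = 8, cex = 2)
  })

  output$centroids <- renderTable(as.data.frame(fit()$centers),
                                  rownames = TRUE)
}

# shinyApp(ui, server)  # uncomment to launch the app locally
```

Wrapping the fit in a single `reactive()` means the plot and the table are computed from the same k-means run, so the colours in the plot and the rows of the centroid table always refer to the same clusters.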

To try it, visit https://bhenning.shinyapps.io/bhApp/

Outputs

The top-right panel shows the cluster classification plots.

The bottom-right panel lists the centroid values of each cluster.
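Each centroid is the mean of the observations assigned to that cluster. The sketch below, assuming k = 3 and an arbitrary seed, builds a centroid table of the kind the panel reports (the app's exact columns are not known) and checks it against per-cluster means computed with `aggregate()`.

```r
set.seed(42)                         # arbitrary seed: k-means starts are random
k  <- 3
km <- kmeans(iris[, -5], centers = k, nstart = 10)

# Centroid table: one row per cluster, with its size and mean measurements
centroids <- data.frame(cluster = seq_len(k), size = km$size,
                        round(km$centers, 2))
print(centroids)

# Each centroid equals the mean of the points assigned to that cluster
means <- aggregate(iris[, -5], by = list(cluster = km$cluster), FUN = mean)
stopifnot(isTRUE(all.equal(as.matrix(means[, -1]), km$centers,
                           check.attributes = FALSE)))
```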