January 10, 2018

Finding Clusters using K-Means

The app I developed for this assignment lets users experiment with k-means clustering through a Shiny GUI and plotly visualizations. Users can see how the dispersion of the data, the number of clusters, and the clusters created affect the performance of the stats::kmeans() function.

The Shiny app accepts parameters for number of clusters (using well dropdowns), standard deviation (using sliders), and axis titles (using text input boxes). It uses the given inputs to generate data, calculate clusters, and plot the results. Because of the number of inputs, I opted to include a button to submit all changes simultaneously, rather than re-plotting with every tick of the slider or every keystroke in a text box.
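
The submit-button design described above can be sketched roughly as follows. This is a minimal illustration, not the app's actual source. The input IDs (k, sd, xlab, ylab, go), the slider range, the single dropdown, the 50 points per cluster, and the normally distributed data are all assumptions made for the example.

```r
library(shiny)
library(plotly)

# UI: one dropdown, one slider, two text boxes, and a submit button.
# All input IDs and ranges here are hypothetical.
ui <- fluidPage(
  sidebarLayout(
    sidebarPanel(
      selectInput("k", "Number of clusters", choices = 2:6, selected = 3),
      sliderInput("sd", "Standard deviation",
                  min = 0.1, max = 2, value = 0.5, step = 0.1),
      textInput("xlab", "X-axis title", value = "x"),
      textInput("ylab", "Y-axis title", value = "y"),
      actionButton("go", "Submit")
    ),
    mainPanel(plotlyOutput("scatter"))
  )
)

server <- function(input, output, session) {
  # eventReactive() re-evaluates only when the button is clicked, so moving
  # the slider or typing in a text box does not trigger a re-plot by itself.
  settings <- eventReactive(input$go, {
    list(k = as.integer(input$k), sd = input$sd,
         xlab = input$xlab, ylab = input$ylab)
  })

  output$scatter <- renderPlotly({
    s <- settings()
    # Assumed data generator: 50 normal points per cluster,
    # with the i-th cluster centered at (i + 1, 2 * i).
    i <- rep(seq_len(s$k), each = 50)
    pts <- data.frame(x = rnorm(length(i), mean = i + 1, sd = s$sd),
                      y = rnorm(length(i), mean = 2 * i, sd = s$sd))
    fit <- stats::kmeans(pts, centers = s$k)
    pts$cluster <- factor(fit$cluster)

    p <- plot_ly(pts, x = ~x, y = ~y, color = ~cluster,
                 type = "scatter", mode = "markers")
    plotly::layout(p,
                   xaxis = list(title = s$xlab),
                   yaxis = list(title = s$ylab))
  })
}

# Build the app object; call shiny::runApp(app) to launch it.
app <- shinyApp(ui, server)
```

With this pattern nothing is drawn until the first click on Submit, because eventReactive() ignores the button's initial state by default.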

Note that the true center of the i-th cluster is located at approximately (i+1, 2i). For example, the first cluster is centered near (2, 2) and the second near (3, 4).
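
As a rough sketch of how data with these centers could be generated and clustered outside the app, the snippet below uses base R only. The helper simulate_clusters(), the choice of normally distributed points, the 50 points per cluster, and the nstart value are assumptions for illustration; the app's own generator may differ.

```r
# Hypothetical helper: simulate k clusters, the i-th centered at (i + 1, 2 * i).
simulate_clusters <- function(k, sd = 0.5, n_per_cluster = 50) {
  i <- rep(seq_len(k), each = n_per_cluster)
  data.frame(
    x = rnorm(length(i), mean = i + 1, sd = sd),
    y = rnorm(length(i), mean = 2 * i, sd = sd),
    true_cluster = i
  )
}

set.seed(42)
sim <- simulate_clusters(k = 3, sd = 0.3)

# Fit k-means on the coordinates only; nstart = 10 tries several random starts.
fit <- stats::kmeans(sim[, c("x", "y")], centers = 3, nstart = 10)

# Estimated centers, sorted by x so the rows line up with i = 1, 2, 3.
fit$centers[order(fit$centers[, "x"]), ]

# Cross-tabulate true labels against the labels assigned by k-means.
table(true = sim$true_cluster, fitted = fit$cluster)
```

Raising sd in the call to simulate_clusters() is a quick way to see the clusters start to overlap.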

Example 1:

Example 2:

Conclusions

  • K-means works very well if the number of clusters is known.
  • There seems to be an ideal level of dispersion for identifying clusters.
    • Too much, and the groups overlap. Too little, and the random starting points don't catch all the data.
  • Shiny and plotly work well together.

Thank you!