K-means clustering is a widely used machine learning algorithm that groups data points into k distinct, non-overlapping clusters. The goal is to partition the data so that the points within each cluster are as similar as possible, while the clusters themselves are as different from one another as possible. It achieves this iteratively: each data point is assigned to its nearest centroid, and each centroid is then recomputed as the mean of the points assigned to it. Each iteration yields centroids that fit the data at least as well as the previous ones. After enough iterations, the centroids stop changing (they converge), and the resulting groups are the final clusters.
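To make the iteration concrete, here is a minimal sketch in plain Python. It rests on a few assumptions that the description above does not specify: similarity is measured with squared Euclidean distance, the initial centroids are k points sampled at random from the data, and a cluster that ends up empty keeps its previous centroid. The names `k_means`, `squared_distance`, and `mean_point` are illustrative and do not come from any particular library, and the sample points are made up for the demonstration.

```python
import random


def squared_distance(p, q):
    """Squared Euclidean distance between two equal-length points."""
    return sum((a - b) ** 2 for a, b in zip(p, q))


def mean_point(points):
    """Coordinate-wise mean of a non-empty list of points."""
    n = len(points)
    return tuple(sum(coords) / n for coords in zip(*points))


def k_means(points, k, max_iters=100, seed=0):
    """Cluster `points` into k groups and return (centroids, labels)."""
    rng = random.Random(seed)
    # Assumption: start from k distinct points sampled from the data.
    centroids = rng.sample(points, k)
    labels = None
    for _ in range(max_iters):
        # Assignment step: each point joins its nearest centroid.
        new_labels = [
            min(range(k), key=lambda j: squared_distance(p, centroids[j]))
            for p in points
        ]
        if new_labels == labels:
            break  # assignments stopped changing, so the centroids have converged
        labels = new_labels
        # Update step: move each centroid to the mean of its assigned points.
        for j in range(k):
            members = [p for p, lab in zip(points, labels) if lab == j]
            if members:  # an empty cluster keeps its previous centroid
                centroids[j] = mean_point(members)
    return centroids, labels


# Made-up 2-D points forming two visibly separate groups.
points = [(1.0, 1.0), (1.5, 2.0), (1.0, 1.5), (8.0, 8.0), (9.0, 8.5), (8.5, 9.0)]
centroids, labels = k_means(points, k=2)
print(centroids, labels)
```

On this toy data the first three points should end up sharing one label and the last three the other. Which group is numbered 0 and which is numbered 1 depends on the random initialization, so the labels themselves carry no meaning beyond group membership.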
K-means clustering is helpful when you want to discover natural groupings in a dataset. For example, let’s say you have a dataset from a pet store that consists of the following features: ‘Purchase Date’, ‘Purchase Amount’, and ‘Gender’. Using these features, you can apply k-means clustering with k = 2 to split the customers into two groups, which you might then interpret as dog owners and cat owners. The algorithm does not name the groups for you, so that interpretation comes from inspecting what the customers in each cluster have in common.