K-means clustering is a technique that takes unlabeled data and groups the observations together based on their similarity. The Ruspini data set, which ships with R's cluster package and is read here from a CSV file, is used for the K-means analysis, and a summary of the data is provided below.
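# Set the working directory and load the Ruspini data from a CSV file
setwd("~/CST-425")
rus <- read.csv("ruspini.csv")
# Summarize the x and y columns
summary(rus)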
##        x                y
##  Min.   :  4.00   Min.   :  4.00
##  1st Qu.: 31.50   1st Qu.: 56.50
##  Median : 52.00   Median : 96.00
##  Mean   : 54.88   Mean   : 92.03
##  3rd Qu.: 76.50   3rd Qu.:141.50
##  Max.   :117.00   Max.   :156.00
Based on the scatter plot, the best value of k for this analysis is four, because the points form four visually distinct groups.
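The choice of k can be illustrated with a short sketch. It assumes the data has the two columns x and y shown in the summary; so that the block runs on its own, it loads the copy of the Ruspini data from the cluster package rather than ruspini.csv. The seed, the range of k values, and the nstart setting are illustrative choices, not part of the original analysis. The second plot is an optional cross-check of the visual impression, using the total within-cluster sum of squares.

```r
# Assumption: the cluster package's copy of the Ruspini data stands in for ruspini.csv
data(ruspini, package = "cluster")
rus <- ruspini

# Scatter plot of the raw points, used to judge the number of groups by eye
plot(rus$x, rus$y, xlab = "x", ylab = "y", main = "Ruspini data")

# Optional cross-check: total within-cluster sum of squares for k = 1 to 8
set.seed(425)  # arbitrary seed, only for reproducibility
wss <- sapply(1:8, function(k) {
  kmeans(rus[, c("x", "y")], centers = k, nstart = 25)$tot.withinss
})
plot(1:8, wss, type = "b",
     xlab = "Number of clusters k",
     ylab = "Total within-cluster sum of squares")
```

If the curve flattens noticeably after k = 4, that would be consistent with the visual choice of four clusters.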
Finally, each cluster is given a distinct color so that the groups of data points are easy to tell apart.
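A minimal sketch of the clustering and coloring step follows. It is not necessarily the code used for the original figure. It assumes k = 4 and columns named x and y, and it loads the Ruspini data from the cluster package so the block is self-contained. The seed, nstart value, and plotting symbols are illustrative.

```r
# Assumption: the cluster package's copy of the Ruspini data stands in for ruspini.csv
data(ruspini, package = "cluster")
rus <- ruspini

# Fit k-means with four clusters; nstart = 25 tries several random starts
set.seed(425)  # arbitrary seed, only for reproducibility
fit <- kmeans(rus[, c("x", "y")], centers = 4, nstart = 25)

# Color each point by its assigned cluster (labels 1 to 4 index the default palette)
plot(rus$x, rus$y, col = fit$cluster, pch = 19,
     xlab = "x", ylab = "y", main = "K-means clusters (k = 4)")

# Mark the fitted cluster centers
points(fit$centers, pch = 8, cex = 2)
```

Here fit$cluster holds one integer label per observation, and passing it to col maps each label to a color in R's default palette.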