Data Mining

K-means clustering is a technique that takes unlabeled data and groups the observations together based on similarity. The Ruspini data set, which ships with R's cluster package and is read here from a CSV file, is used for the K-means analysis; a summary of the data follows.

setwd("~/CST-425")
rus <- read.csv("ruspini.csv")

summary(rus)
##        x                y         
##  Min.   :  4.00   Min.   :  4.00  
##  1st Qu.: 31.50   1st Qu.: 56.50  
##  Median : 52.00   Median : 96.00  
##  Mean   : 54.88   Mean   : 92.03  
##  3rd Qu.: 76.50   3rd Qu.:141.50  
##  Max.   :117.00   Max.   :156.00

Scatter Plot

Based on the scatter plot, four is the best value of k for this analysis because the points fall into four visually distinct groups.
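The scatter plot referred to above can be reproduced with a few lines of base R. This is a minimal sketch, not necessarily the code used for the original figure: it assumes the copy of the Ruspini data bundled with the cluster package stands in for the CSV file, and the axis labels and title are illustrative choices.

```r
# Assumption: use the ruspini data frame from the cluster package
# instead of the local ruspini.csv read earlier in the report.
library(cluster)
data(ruspini)

# Plot y against x; distinct groupings should be visible by eye.
plot(ruspini$x, ruspini$y,
     pch = 19,
     xlab = "x", ylab = "y",
     main = "Ruspini data")
```

If the CSV has the same x and y columns, replacing `ruspini` with `rus` should give the same picture.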

Choosing the K-value

Finally, each cluster is assigned a distinct color so that the groups of data points can be told apart at a glance.
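One way to fit the model and color the points by cluster is sketched below. It assumes the cluster package's copy of the data, an arbitrary seed, and `nstart = 25` random restarts; none of these settings come from the original analysis, and the marker used for the cluster centers is purely illustrative.

```r
# Assumption: ruspini from the cluster package stands in for ruspini.csv.
library(cluster)
data(ruspini)

set.seed(425)  # arbitrary seed so the random starts are repeatable

# Fit k-means with k = 4; nstart = 25 keeps the best of 25 random starts.
km <- kmeans(ruspini, centers = 4, nstart = 25)

# Color each point by its assigned cluster (integer labels 1 to 4 index
# the default palette), then mark the cluster centers with crosses.
plot(ruspini$x, ruspini$y,
     col = km$cluster, pch = 19,
     xlab = "x", ylab = "y",
     main = "K-means clusters (k = 4)")
points(km$centers, pch = 4, cex = 2, lwd = 2)
```

Because k-means starts from random centers, the cluster numbering (and therefore which color each group receives) can change between runs unless the seed is fixed.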