#Import Ruspini Data Set
library(cluster)
data(ruspini)
summary(ruspini)
##        x                y
##  Min.   :  4.00   Min.   :  4.00
##  1st Qu.: 31.50   1st Qu.: 56.50
##  Median : 52.00   Median : 96.00
##  Mean   : 54.88   Mean   : 92.03
##  3rd Qu.: 76.50   3rd Qu.:141.50
##  Max.   :117.00   Max.   :156.00
#Cluster Ruspini Data Set
Here is a scatter plot of the raw data before the k-means analysis. Based on the plot below, the data appear to form four groups.
set.seed(1)
ruspini = read.csv("~/CST-425/ruspini.csv")
ruspini = ruspini[,2:3]
plot(ruspini)
#K-means Analysis
The graph below shows four distinct groups. k is set to 4 because the scatter plot above suggested four groups that can be identified. Fixing k at 4 also means that k-means cannot split any of these groups further, even if a finer grouping is needed.
ruspiniKM = kmeans(ruspini, 4)
colors = c("red", "turquoise", "purple","green")
plot(ruspini[,1:2], pch = 19, col = colors[ruspiniKM$cluster])
ruspiniKM
## K-means clustering with 4 clusters of sizes 15, 17, 20, 23
##
## Cluster means:
##          x        y
## 1 68.93333  19.4000
## 2 98.17647 114.8824
## 3 20.15000  64.9500
## 4 43.91304 146.0435
##
## Clustering vector:
## [1] 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4
## [39] 4 4 4 4 4 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1
##
## Within cluster sum of squares by cluster:
## [1] 1456.533 4558.235 3689.500 3176.783
## (between_SS / total_SS = 94.7 %)
##
## Available components:
##
## [1] "cluster"      "centers"      "totss"        "withinss"     "tot.withinss"
## [6] "betweenss"    "size"         "iter"         "ifault"
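
The choice of k = 4 above rests on a visual reading of the scatter plot. One common way to cross-check it is an elbow plot of the total within-cluster sum of squares over several values of k. The sketch below is not part of the original analysis. It assumes the `ruspini` data set from the `cluster` package (rather than the CSV file read above), and the range `1:8`, the seed, and `nstart = 25` are arbitrary choices made for illustration. It also recomputes the `between_SS / total_SS` ratio from the `betweenss` and `totss` components listed in the output.

```r
# Assumption: the cluster package is installed; it ships the ruspini data set.
library(cluster)
data(ruspini)

set.seed(1)  # seed chosen only so this sketch is repeatable

# Total within-cluster sum of squares for k = 1..8 (range is an assumption).
# nstart = 25 reruns k-means from 25 random starts and keeps the best fit.
ks <- 1:8
wss <- sapply(ks, function(k) {
  kmeans(ruspini, centers = k, nstart = 25)$tot.withinss
})

# Elbow plot: look for the k after which the curve flattens out.
plot(ks, wss, type = "b", pch = 19,
     xlab = "Number of clusters k",
     ylab = "Total within-cluster sum of squares")

# Share of total variation captured by a k = 4 fit (between_SS / total_SS).
fit4 <- kmeans(ruspini, centers = 4, nstart = 25)
ratio4 <- fit4$betweenss / fit4$totss
ratio4
```

If the curve bends sharply at k = 4 and flattens afterwards, that supports the visual impression of four groups. Using several random starts also reduces the chance that a single unlucky initialization produces a poor partition.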