Data Mining in R

#Import Ruspini Data Set

library(cluster)  # the ruspini data set ships with the cluster package
data(ruspini)

summary(ruspini)

##        x                y         
##  Min.   :  4.00   Min.   :  4.00  
##  1st Qu.: 31.50   1st Qu.: 56.50  
##  Median : 52.00   Median : 96.00  
##  Mean   : 54.88   Mean   : 92.03  
##  3rd Qu.: 76.50   3rd Qu.:141.50  
##  Max.   :117.00   Max.   :156.00

#Cluster Ruspini Data Set

Here is a scatter plot of the raw data before the k-means analysis. Based on the plot below, the points appear to form four groups.

set.seed(1)  # fix the random seed so the k-means result below is reproducible
ruspini = read.csv("~/CST-425/ruspini.csv")
ruspini = ruspini[,2:3]  # keep only the x and y columns
plot(ruspini)

#K-means Analysis

The graph below shows four distinct groups. k is set to 4 because the scatter plot above suggested four groups that can be identified. Fixing k at 4 also means that none of these groups is split further, even if a finer subdivision would be useful.

ruspiniKM = kmeans(ruspini, 4)
colors = c("red", "turquoise", "purple","green")
plot(ruspini[,1:2], pch = 19, col = colors[ruspiniKM$cluster])
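
One way to check the visual choice of k = 4 is an elbow plot of the total within-cluster sum of squares against k. The sketch below is not part of the original analysis. The names `pts`, `ks`, and `wss` are illustrative, the `nstart = 25` restarts are an added assumption, and it assumes the CSV read above holds the same points as the `ruspini` data set in the cluster package.

```r
# Assumption: the packaged ruspini data matches the CSV used in this report.
pts <- cluster::ruspini

set.seed(1)
ks <- 1:8

# Total within-cluster sum of squares for each candidate k.
# nstart = 25 reruns k-means from 25 random starts and keeps the best fit,
# which reduces the chance of reporting a poor local optimum.
wss <- sapply(ks, function(k) kmeans(pts, centers = k, nstart = 25)$tot.withinss)

# Look for the "elbow": the k after which adding clusters helps much less.
plot(ks, wss, type = "b", pch = 19,
     xlab = "Number of clusters k",
     ylab = "Total within-cluster sum of squares")
```

If the curve flattens noticeably after k = 4, that would support the choice made here.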

ruspiniKM

## K-means clustering with 4 clusters of sizes 15, 17, 20, 23
## 
## Cluster means:
##          x        y
## 1 68.93333  19.4000
## 2 98.17647 114.8824
## 3 20.15000  64.9500
## 4 43.91304 146.0435
## 
## Clustering vector:
##  [1] 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4
## [39] 4 4 4 4 4 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1
## 
## Within cluster sum of squares by cluster:
## [1] 1456.533 4558.235 3689.500 3176.783
##  (between_SS / total_SS =  94.7 %)
## 
## Available components:
## 
## [1] "cluster"      "centers"      "totss"        "withinss"     "tot.withinss"
## [6] "betweenss"    "size"         "iter"         "ifault"
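
The `between_SS / total_SS` figure in the output above can be recomputed from the listed components. The sketch below refits the model on the packaged data, so `pts` and `km` are illustrative names, `nstart = 25` is an added assumption, and the fit may label the clusters in a different order than the run printed above.

```r
# Assumption: the packaged ruspini data matches the CSV used in this report.
pts <- cluster::ruspini

set.seed(1)
km <- kmeans(pts, centers = 4, nstart = 25)

# The total sum of squares splits into a within-cluster and a between-cluster part.
print(all.equal(km$totss, km$tot.withinss + km$betweenss))

# tot.withinss is the sum of the per-cluster values reported as withinss.
print(all.equal(km$tot.withinss, sum(km$withinss)))

# Share of the total sum of squares explained by the cluster assignment, in percent.
print(round(100 * km$betweenss / km$totss, 1))
```

If the refit finds the same four clusters as the run printed above, the last line should agree with the percentage reported there.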


Ben Hebbel

10/4/2020