# From the book by Robert Kabacoff
# R in Action Chapter 16
# Cluster analysis is a data-reduction technique designed to uncover subgroups of
# observations within a dataset.
# A cluster is defined as a group
# of observations that are more similar to each other than they are to the observations
# in other groups.
# Hierarchical clustering is applied to the nutrient dataset contained in the
# flexclust package to answer the following questions:
# ■ What are the similarities and differences among 27 types of fish, fowl, and meat,
#   based on 5 nutrient measures?
# ■ Is there a smaller number of groups into which these foods can be meaningfully
#   clustered?
# Partitioning methods will be used to evaluate 13 chemical analyses of 178 Italian wine
# samples.
library(cluster)
library(NbClust)
library(flexclust)
## Loading required package: grid
## Loading required package: lattice
## Loading required package: modeltools
## Loading required package: stats4
library(fMultivar)
## Loading required package: timeDate
## Loading required package: timeSeries
## Loading required package: fBasics
## Attaching package: 'fBasics'
## The following object is masked from 'package:flexclust':
##
## getModel
## The following object is masked from 'package:modeltools':
##
## getModel
library(rattle)
## Rattle: A free graphical interface for data mining with R.
## Version 4.1.0 Copyright (c) 2006-2015 Togaware Pty Ltd.
## Type 'rattle()' to shake, rattle, and roll your data.
# 16.1 Common steps in cluster analysis
# 1 Choose appropriate attributes.
# 2 Scale the data (see the standardization sketch after this list).
# 3 Screen for outliers.
# 4 Calculate distances.
# 5 Select a clustering algorithm.
# 6 Obtain one or more cluster solutions.
# 7 Determine the number of clusters present.
# 8 Obtain a final clustering solution.
# 9 Visualize the results.
# 10 Interpret the clusters.
# 11 Validate the results.
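# Step 2 (scaling) is most often handled by standardizing each variable to a
# mean of 0 and a standard deviation of 1, which is what scale() does. A minimal
# sketch of the equivalent calculation on a toy matrix (illustrative only, not
# part of the book's code):
x <- matrix(rnorm(20, mean = 5, sd = 2), ncol = 2)
x.scaled <- apply(x, 2, function(col) (col - mean(col)) / sd(col))
all.equal(as.vector(x.scaled), as.vector(scale(x)))  # should be TRUE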
# Because the calculation of distances between observations is such an integral part of
# cluster analysis, it's described next in some detail.
# 16.2 Calculating distances
# Consider the nutrient dataset provided with the flexclust package. The dataset
# contains measurements on the nutrients of 27 types of meat, fish, and fowl. The first
# few observations are given by
data("nutrient",package = "flexclust")
str(nutrient)
## 'data.frame': 27 obs. of 5 variables:
## $ energy : int 340 245 420 375 180 115 170 160 265 300 ...
## $ protein: int 20 21 15 19 22 20 25 26 20 18 ...
## $ fat : int 28 17 39 32 10 3 7 5 20 25 ...
## $ calcium: int 9 9 7 9 17 8 12 14 9 9 ...
## $ iron : num 2.6 2.7 2 2.6 3.7 1.4 1.5 5.9 2.6 2.3 ...
head(nutrient,4)
## energy protein fat calcium iron
## BEEF BRAISED 340 20 28 9 2.6
## HAMBURGER 245 21 17 9 2.7
## BEEF ROAST 420 15 39 7 2.0
## BEEF STEAK 375 19 32 9 2.6
# The dist() function in the base R installation can be used to calculate the distances
# between all rows (observations) of a matrix or data frame. The format is
# dist(x, method=), where x is the input data and method="euclidean" is the default.
d <- dist(nutrient)
as.matrix(d)[1:4,1:4]
## BEEF BRAISED HAMBURGER BEEF ROAST BEEF STEAK
## BEEF BRAISED 0.00000 95.6400 80.93429 35.24202
## HAMBURGER 95.64000 0.0000 176.49218 130.87784
## BEEF ROAST 80.93429 176.4922 0.00000 45.76418
## BEEF STEAK 35.24202 130.8778 45.76418 0.00000
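# As a quick check on what dist() computes, the Euclidean distance between the
# first two rows can be reproduced by hand; this sketch should match the 95.64
# entry in the matrix above:
sqrt(sum((nutrient[1, ] - nutrient[2, ])^2))  # beef braised vs. hamburger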
# Listing 16.1 Average-linkage clustering of the nutrient data
row.names(nutrient) <- tolower(row.names(nutrient))  # simplify the row labels
nutrient.scaled <- scale(nutrient)                   # standardize to mean 0, sd 1
d <- dist(nutrient.scaled)                           # Euclidean distance matrix
fit.average <- hclust(d, method = "average")         # average-linkage clustering
plot(fit.average, hang = -1, cex = 0.8, main = "Average Linkage Clustering")
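# Average linkage defines the distance between two clusters as the mean of all
# pairwise distances between their members. A small illustrative check on two
# hand-picked groups (the grouping is hypothetical, not a cluster solution):
dm <- as.matrix(d)
mean(dm[c("beef braised", "beef roast"), c("hamburger", "beef steak")])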

# Listing 16.2 Selecting the number of clusters
library(NbClust)
devAskNewPage(TRUE)
nc <- NbClust(nutrient.scaled, distance = "euclidean",
              min.nc = 2, max.nc = 15, method = "average")
## Warning in pf(beale, pp, df2): NaNs produced
## Warning in pf(beale, pp, df2): NaNs produced

## *** : The Hubert index is a graphical method of determining the number of clusters.
## In the plot of Hubert index, we seek a significant knee that corresponds to a
## significant increase of the value of the measure i.e the significant peak in Hubert
## index second differences plot.
##

## *** : The D index is a graphical method of determining the number of clusters.
## In the plot of D index, we seek a significant knee (the significant peak in Dindex
## second differences plot) that corresponds to a significant increase of the value of
## the measure.
##
## *******************************************************************
## * Among all indices:
## * 4 proposed 2 as the best number of clusters
## * 4 proposed 3 as the best number of clusters
## * 2 proposed 4 as the best number of clusters
## * 4 proposed 5 as the best number of clusters
## * 1 proposed 9 as the best number of clusters
## * 1 proposed 10 as the best number of clusters
## * 2 proposed 13 as the best number of clusters
## * 1 proposed 14 as the best number of clusters
## * 4 proposed 15 as the best number of clusters
##
## ***** Conclusion *****
##
## * According to the majority rule, the best number of clusters is 2
##
##
## *******************************************************************
summary(nc)
## Length Class Mode
## All.index 364 -none- numeric
## All.CriticalValues 42 -none- numeric
## Best.nc 52 -none- numeric
## Best.partition 27 -none- numeric
# Row 1 of Best.nc holds the number of clusters proposed by each criterion
t <- table(nc$Best.nc[1, ])
barplot(t, xlab = "Number of Clusters", ylab = "Number of Criteria",
        main = "Number of Clusters Chosen by 26 Criteria")
# Listing 16.3 Obtaining the final cluster solution
# 1 Assigns cases
clusters <- cutree(fit.average,k=5)
table(clusters)
## clusters
## 1 2 3 4 5
## 7 16 1 2 1
# 2 Describes clusters
aggregate(nutrient,by=list(clusters=clusters),median)
## clusters energy protein fat calcium iron
## 1 1 340.0 19 29 9 2.50
## 2 2 170.0 20 8 13 1.45
## 3 3 160.0 26 5 14 5.90
## 4 4 57.5 9 1 78 5.70
## 5 5 180.0 22 9 367 2.50
aggregate(as.data.frame(nutrient.scaled),by=list(clusters=clusters),median)
## clusters energy protein fat calcium iron
## 1 1 1.3101024 0.0000000 1.3785620 -0.4480464 0.08110456
## 2 2 -0.3696099 0.2352002 -0.4869384 -0.3967868 -0.63743114
## 3 3 -0.4684165 1.6464016 -0.7534384 -0.3839719 2.40779157
## 4 4 -1.4811842 -2.3520023 -1.1087718 0.4361807 2.27092763
## 5 5 -0.2708033 0.7056007 -0.3981050 4.1396825 0.08110456
# 3 Plots results
plot(fit.average, hang = -1, cex = 0.8,
     main = "Average Linkage Clustering\n5 Cluster Solution")
rect.hclust(fit.average,k=5)

# The cutree() function is used to cut the tree into five clusters (1). The first cluster
# has 7 observations, the second cluster has 16 observations, and so on. The aggregate()
# function is then used to obtain the median profile for each cluster (2). The
# results are reported in both the original metric and in standardized form. Finally, the
# dendrogram is replotted, and the rect.hclust() function is used to superimpose
# the five-cluster solution (3). The results are displayed in the figure above.
# Sardines form their own cluster and are much higher in calcium than the other
# food groups. Beef heart is also a singleton and is high in protein and iron. The clam
# cluster is low in protein and high in iron. The items in the cluster containing beef
# roast to pork simmered are high in energy and fat. Finally, the largest group (mackerel
# to bluefish) is relatively low in iron.
# Hierarchical clustering can be particularly useful when you expect nested clustering
# and a meaningful hierarchy. This is often the case in the biological sciences. But the
# hierarchical algorithms are greedy in the sense that once an observation is assigned to
# a cluster, it can’t be reassigned later in the process. Additionally, hierarchical clustering
# is difficult to apply in large samples, where there may be hundreds or even thousands
# of observations.
# Listing 16.4 K-means clustering of wine data
# Partitioning methods can work well in these situations. For k-means, a plot of the
# total within-groups sum of squares against the number of clusters can be helpful.
# A bend in the graph (similar to the bend in a Scree test) can suggest the appropriate
# number of clusters. The graph can be produced with the following function:
wssplot <- function(data, nc = 15, seed = 1234) {
  # Within-groups sum of squares for a one-cluster solution
  wss <- (nrow(data) - 1) * sum(apply(data, 2, var))
  # Total within-groups sum of squares for 2..nc clusters
  for (i in 2:nc) {
    set.seed(seed)
    wss[i] <- sum(kmeans(data, centers = i)$withinss)
  }
  plot(1:nc, wss, type = "b", xlab = "Number of Clusters",
       ylab = "Within groups sum of squares")
}
data(wine,package = "rattle")
head(wine)
## Type Alcohol Malic Ash Alcalinity Magnesium Phenols Flavanoids
## 1 1 14.23 1.71 2.43 15.6 127 2.80 3.06
## 2 1 13.20 1.78 2.14 11.2 100 2.65 2.76
## 3 1 13.16 2.36 2.67 18.6 101 2.80 3.24
## 4 1 14.37 1.95 2.50 16.8 113 3.85 3.49
## 5 1 13.24 2.59 2.87 21.0 118 2.80 2.69
## 6 1 14.20 1.76 2.45 15.2 112 3.27 3.39
## Nonflavanoids Proanthocyanins Color Hue Dilution Proline
## 1 0.28 2.29 5.64 1.04 3.92 1065
## 2 0.26 1.28 4.38 1.05 3.40 1050
## 3 0.30 2.81 5.68 1.03 3.17 1185
## 4 0.24 2.18 7.80 0.86 3.45 1480
## 5 0.39 1.82 4.32 1.04 2.93 735
## 6 0.34 1.97 6.75 1.05 2.85 1450
df <- scale(wine[-1])  # 1. Standardizes the data
# 2. Determines the number of clusters
wssplot(df)
library(NbClust)
set.seed(1234)
devAskNewPage(ask=TRUE)
nc <- NbClust(df,min.nc = 2,max.nc = 15,method = "kmeans")


## *** : The Hubert index is a graphical method of determining the number of clusters.
## In the plot of Hubert index, we seek a significant knee that corresponds to a
## significant increase of the value of the measure i.e the significant peak in Hubert
## index second differences plot.
##

## *** : The D index is a graphical method of determining the number of clusters.
## In the plot of D index, we seek a significant knee (the significant peak in Dindex
## second differences plot) that corresponds to a significant increase of the value of
## the measure.
##
## *******************************************************************
## * Among all indices:
## * 4 proposed 2 as the best number of clusters
## * 15 proposed 3 as the best number of clusters
## * 1 proposed 10 as the best number of clusters
## * 1 proposed 12 as the best number of clusters
## * 1 proposed 14 as the best number of clusters
## * 1 proposed 15 as the best number of clusters
##
## ***** Conclusion *****
##
## * According to the majority rule, the best number of clusters is 3
##
##
## *******************************************************************
summary(nc)
## Length Class Mode
## All.index 364 -none- numeric
## All.CriticalValues 42 -none- numeric
## Best.nc 52 -none- numeric
## Best.partition 178 -none- numeric
ls(nc)
## [1] "All.CriticalValues" "All.index" "Best.nc"
## [4] "Best.partition"
t <- table(nc$Best.nc[1,])
t
##
## 0 1 2 3 10 12 14 15
## 2 1 4 15 1 1 1 1
barplot(t, xlab = "Number of Clusters", ylab = "Number of Criteria",
        main = "Number of Clusters Chosen by 26 Criteria")
# 3. Performs the k-means cluster analysis
set.seed(1234)
fit.km <- kmeans(df,3,nstart = 25)
fit.km
## K-means clustering with 3 clusters of sizes 62, 65, 51
##
## Cluster means:
## Alcohol Malic Ash Alcalinity Magnesium Phenols
## 1 0.8328826 -0.3029551 0.3636801 -0.6084749 0.57596208 0.88274724
## 2 -0.9234669 -0.3929331 -0.4931257 0.1701220 -0.49032869 -0.07576891
## 3 0.1644436 0.8690954 0.1863726 0.5228924 -0.07526047 -0.97657548
## Flavanoids Nonflavanoids Proanthocyanins Color Hue
## 1 0.97506900 -0.56050853 0.57865427 0.1705823 0.4726504
## 2 0.02075402 -0.03343924 0.05810161 -0.8993770 0.4605046
## 3 -1.21182921 0.72402116 -0.77751312 0.9388902 -1.1615122
## Dilution Proline
## 1 0.7770551 1.1220202
## 2 0.2700025 -0.7517257
## 3 -1.2887761 -0.4059428
##
## Clustering vector:
## [1] 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1
## [36] 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 2 2 3 2 2 2 2 2 2 2 2
## [71] 2 2 2 1 2 2 2 2 2 2 2 2 2 3 2 2 2 2 2 2 2 2 2 2 2 1 2 2 2 2 2 2 2 2 2
## [106] 2 2 2 2 2 2 2 2 2 2 2 2 2 3 2 2 1 2 2 2 2 2 2 2 2 3 3 3 3 3 3 3 3 3 3
## [141] 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3
## [176] 3 3 3
##
## Within cluster sum of squares by cluster:
## [1] 385.6983 558.6971 326.3537
## (between_SS / total_SS = 44.8 %)
##
## Available components:
##
## [1] "cluster" "centers" "totss" "withinss"
## [5] "tot.withinss" "betweenss" "size" "iter"
## [9] "ifault"
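# Because k-means starts from randomly chosen centers, nstart = 25 above runs the
# algorithm 25 times and keeps the best solution. An illustrative sketch of why
# this matters: with nstart = 1, different seeds can converge to solutions with
# different total within-cluster sums of squares (results vary by seed):
set.seed(1)
kmeans(df, centers = 3, nstart = 1)$tot.withinss
set.seed(42)
kmeans(df, centers = 3, nstart = 1)$tot.withinss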
# Because the variables vary in range, they're standardized prior to clustering (1). Next,
# the number of clusters is determined using the wssplot() and NbClust() functions
# (2). The wssplot() figure indicates that there is a distinct drop in the within-groups
# sum of squares when moving from one to three clusters. After three clusters, the
# decrease levels off, suggesting that a three-cluster solution may be a good fit to the
# data. Here, 15 of the 26 criteria reported by the NbClust package suggest a
# three-cluster solution. Note that not all 30 criteria can be calculated for every dataset.
# A final cluster solution is obtained with the kmeans() function, and the cluster
# centroids are printed (3). Because the centroids provided by the function are based on
# standardized data, the aggregate() function can be used along with the cluster
# memberships to determine variable means for each cluster in the original metric:
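# A sketch of that call (not output captured in this session):
aggregate(wine[-1], by = list(cluster = fit.km$cluster), mean)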
# How well did k-means clustering uncover the actual structure of the data contained
# in the Type variable? A cross-tabulation of Type (wine varietal) and cluster
# membership is given by
ct.km <- table(wine$Type,fit.km$cluster)
ct.km
##
## 1 2 3
## 1 59 0 0
## 2 3 65 3
## 3 0 0 48
# You can quantify the agreement between type and cluster using an adjusted Rand
# index, provided by the flexclust package:
library(flexclust)
randIndex(ct.km)
## ARI
## 0.897495
# The adjusted Rand index provides a measure of the agreement between two partitions,
# adjusted for chance. It ranges from -1 (no agreement) to 1 (perfect agreement).
# Agreement between the wine varietal type and the cluster solution is 0.9. Not
# bad. Shall we have some wine?
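# As a sanity check on the chance adjustment, comparing the wine types against a
# purely random partition should yield an ARI near 0 (an illustrative sketch,
# not part of the book's code):
set.seed(1234)
random.clusters <- sample(1:3, nrow(wine), replace = TRUE)
randIndex(table(wine$Type, random.clusters))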
# Listing 16.5 Partitioning around medoids for the wine data
library(rattle)
set.seed(1234)
fit.pam <- pam(wine[-1], k = 3, stand = TRUE)  # Clusters the standardized data
ls(fit.pam)
## [1] "call" "clusinfo" "clustering" "data" "diss"
## [6] "id.med" "isolation" "medoids" "objective" "silinfo"
fit.pam$medoids  # Prints the medoids
## Alcohol Malic Ash Alcalinity Magnesium Phenols Flavanoids
## [1,] 13.48 1.81 2.41 20.5 100 2.70 2.98
## [2,] 12.25 1.73 2.12 19.0 80 1.65 2.03
## [3,] 13.40 3.91 2.48 23.0 102 1.80 0.75
## Nonflavanoids Proanthocyanins Color Hue Dilution Proline
## [1,] 0.26 1.86 5.1 1.04 3.47 920
## [2,] 0.37 1.63 3.4 1.00 3.17 510
## [3,] 0.43 1.41 7.3 0.70 1.56 750
summary(fit.pam)
## Medoids:
## ID Alcohol Malic Ash Alcalinity Magnesium Phenols Flavanoids
## [1,] 36 13.48 1.81 2.41 20.5 100 2.70 2.98
## [2,] 107 12.25 1.73 2.12 19.0 80 1.65 2.03
## [3,] 175 13.40 3.91 2.48 23.0 102 1.80 0.75
## Nonflavanoids Proanthocyanins Color Hue Dilution Proline
## [1,] 0.26 1.86 5.1 1.04 3.47 920
## [2,] 0.37 1.63 3.4 1.00 3.17 510
## [3,] 0.43 1.41 7.3 0.70 1.56 750
## Clustering vector:
## [1] 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1
## [36] 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 2 2 2 2 1 2 1 2 2 2 1
## [71] 2 1 2 1 1 2 2 2 1 1 2 2 2 3 2 2 2 2 2 2 2 2 2 2 2 1 1 2 1 2 2 2 2 2 2
## [106] 2 2 2 2 1 1 2 2 2 2 2 2 2 2 2 1 1 3 2 1 2 2 2 2 2 3 3 3 3 2 3 3 3 3 3
## [141] 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3
## [176] 3 3 3
## Objective function:
## build swap
## 3.593378 3.476783
##
## Numerical information per cluster:
## size max_diss av_diss diameter separation
## [1,] 75 7.164558 3.611501 11.255700 2.206761
## [2,] 54 6.473747 3.511231 10.901681 2.142040
## [3,] 49 5.781398 3.232620 9.027462 2.142040
##
## Isolated clusters:
## L-clusters: character(0)
## L*-clusters: character(0)
##
## Silhouette plot information:
## cluster neighbor sil_width
## 6 1 2 0.431345868
## 59 1 2 0.422985546
## 4 1 2 0.416433098
## 57 1 2 0.409209912
## 11 1 2 0.404957450
## 1 1 2 0.404822997
## 53 1 2 0.401782997
## 54 1 2 0.399156150
## 32 1 2 0.397323685
## 50 1 2 0.391734022
## 17 1 3 0.390594451
## 43 1 2 0.389290640
## 56 1 2 0.388713717
## 19 1 2 0.388330581
## 58 1 2 0.384584163
## 16 1 2 0.376489540
## 49 1 2 0.373222506
## 15 1 2 0.366725498
## 52 1 2 0.365109439
## 10 1 2 0.363428415
## 7 1 2 0.361644804
## 21 1 2 0.361525995
## 47 1 2 0.360930973
## 8 1 2 0.357461533
## 3 1 2 0.356729004
## 18 1 2 0.356002905
## 55 1 2 0.354923111
## 41 1 2 0.354841346
## 48 1 2 0.346432247
## 20 1 2 0.341821413
## 31 1 2 0.336983341
## 40 1 2 0.320916416
## 13 1 2 0.317531621
## 29 1 2 0.314698409
## 9 1 2 0.311149467
## 23 1 2 0.302217790
## 35 1 2 0.292034174
## 27 1 2 0.291653497
## 30 1 2 0.289798645
## 14 1 2 0.286894798
## 36 1 2 0.281579856
## 34 1 2 0.280814801
## 12 1 2 0.263951882
## 45 1 2 0.247640531
## 51 1 2 0.245465996
## 46 1 3 0.244331574
## 37 1 2 0.243850068
## 33 1 2 0.220417637
## 5 1 2 0.217148834
## 25 1 2 0.180261892
## 2 1 2 0.179671746
## 22 1 2 0.177533123
## 38 1 2 0.176263168
## 74 1 2 0.163570041
## 96 1 2 0.162864422
## 24 1 2 0.153481022
## 26 1 2 0.142592535
## 28 1 2 0.120492756
## 44 1 2 0.083800762
## 122 1 2 0.070664813
## 42 1 2 0.066062884
## 39 1 2 0.053810392
## 70 1 2 0.043660345
## 79 1 2 0.041360020
## 75 1 2 0.033247500
## 99 1 2 0.023496427
## 72 1 2 0.011201672
## 111 1 2 -0.003771621
## 110 1 2 -0.006731855
## 66 1 2 -0.026519834
## 97 1 2 -0.042458279
## 125 1 2 -0.054549029
## 64 1 2 -0.068583474
## 80 1 2 -0.118332689
## 121 1 2 -0.126937263
## 104 2 3 0.401723703
## 117 2 1 0.397043036
## 107 2 1 0.390905484
## 90 2 1 0.374991329
## 102 2 1 0.369490423
## 87 2 3 0.361997963
## 126 2 1 0.345318733
## 114 2 1 0.343147752
## 81 2 1 0.340189953
## 83 2 1 0.335844624
## 129 2 1 0.334780355
## 112 2 1 0.333346092
## 115 2 1 0.332731684
## 120 2 1 0.331189400
## 91 2 3 0.328135316
## 109 2 1 0.322857036
## 68 2 1 0.322395964
## 76 2 3 0.322243985
## 105 2 1 0.313857021
## 89 2 3 0.307182791
## 118 2 1 0.301742050
## 88 2 1 0.285114110
## 65 2 1 0.277539120
## 92 2 3 0.275248918
## 116 2 1 0.269892231
## 98 2 1 0.266889975
## 77 2 1 0.251060937
## 86 2 1 0.247081846
## 94 2 1 0.240197122
## 101 2 1 0.231525203
## 127 2 1 0.220323215
## 103 2 1 0.218235709
## 63 2 1 0.217959365
## 108 2 3 0.208692385
## 60 2 3 0.206061297
## 73 2 3 0.205788908
## 93 2 3 0.199525708
## 106 2 3 0.193048654
## 95 2 1 0.181584956
## 78 2 3 0.154485347
## 61 2 3 0.150120762
## 82 2 1 0.147005421
## 100 2 1 0.099919557
## 128 2 3 0.093070765
## 85 2 1 0.082184071
## 130 2 3 0.077159390
## 71 2 3 0.075987518
## 69 2 3 0.073687442
## 124 2 3 0.073646368
## 113 2 3 0.073572399
## 62 2 3 0.057882441
## 67 2 1 0.038670327
## 119 2 3 0.030009169
## 135 2 3 -0.062088852
## 175 3 2 0.466247395
## 149 3 2 0.458375372
## 150 3 2 0.444264747
## 174 3 2 0.439164897
## 178 3 2 0.438308568
## 148 3 2 0.436671474
## 167 3 2 0.434263598
## 176 3 2 0.429929697
## 157 3 2 0.427701659
## 156 3 2 0.418804029
## 165 3 2 0.418418468
## 173 3 2 0.416421149
## 169 3 2 0.401255418
## 177 3 2 0.397849022
## 168 3 2 0.394065026
## 166 3 2 0.390276509
## 170 3 1 0.376722018
## 154 3 2 0.376068795
## 152 3 2 0.372960669
## 151 3 2 0.365392096
## 161 3 2 0.364939118
## 172 3 2 0.354281662
## 164 3 2 0.346440381
## 158 3 2 0.343333161
## 162 3 2 0.328858654
## 147 3 2 0.314841977
## 145 3 2 0.313628497
## 163 3 2 0.311711278
## 138 3 2 0.310813701
## 137 3 2 0.297694984
## 146 3 2 0.293305453
## 139 3 2 0.292344946
## 133 3 2 0.280830852
## 143 3 2 0.277634029
## 132 3 2 0.269428806
## 153 3 2 0.268825785
## 136 3 2 0.267183656
## 171 3 2 0.261912495
## 141 3 2 0.260333129
## 160 3 1 0.257977193
## 144 3 2 0.246255528
## 155 3 2 0.234979198
## 142 3 2 0.233332916
## 134 3 2 0.230544406
## 140 3 2 0.222955957
## 159 3 1 0.164630354
## 84 3 2 0.159307477
## 131 3 2 0.074198339
## 123 3 2 -0.057130312
## Average silhouette width per cluster:
## [1] 0.2421838 0.2328185 0.3230317
## Average silhouette width of total data set:
## [1] 0.2615985
##
## Available components:
## [1] "medoids" "id.med" "clustering" "objective" "isolation"
## [6] "clusinfo" "silinfo" "diss" "call" "data"
clusplot(fit.pam, main = "Bivariate Cluster Plot")  # Plots the cluster solution

ct.pam <- table(wine$Type,fit.pam$clustering)
ct.pam
##
## 1 2 3
## 1 59 0 0
## 2 16 53 2
## 3 0 1 47
# Also note that PAM didn't perform as well as k-means in this instance:
randIndex(ct.pam)  # The adjusted Rand index has decreased from 0.9 (for k-means) to 0.7
## ARI
## 0.6994957