# From the book by Robert Kabacoff
# R in Action Chapter 16
# Cluster analysis is a data-reduction technique designed to uncover subgroups of
# observations within a dataset.
# A cluster is defined as a group
# of observations that are more similar to each other than they are to the observations
# in other groups.
# Hierarchical clustering is applied to the nutrient dataset contained in the
# flexclust package to answer the following questions:
# ■ What are the similarities and differences among 27 types of fish, fowl, and meat,
#   based on 5 nutrient measures?
# ■ Is there a smaller number of groups into which these foods can be meaningfully
#   clustered?
# Partitioning methods will be used to evaluate 13 chemical analyses of 178 Italian wine
# samples.
library(cluster)
library(NbClust)
library(flexclust)
## Loading required package: grid
## Loading required package: lattice
## Loading required package: modeltools
## Loading required package: stats4
library(fMultivar)
## Loading required package: timeDate
## Loading required package: timeSeries
## Loading required package: fBasics
## Attaching package: 'fBasics'
## The following object is masked from 'package:flexclust':
##
## getModel
## The following object is masked from 'package:modeltools':
##
## getModel
library(rattle)
## Rattle: A free graphical interface for data mining with R.
## Version 4.1.0 Copyright (c) 2006-2015 Togaware Pty Ltd.
## Type 'rattle()' to shake, rattle, and roll your data.
# 16.1 Common steps in cluster analysis
# 1 Choose appropriate attributes.
# 2 Scale the data (see the standardization sketch after this list).
# 3 Screen for outliers.
# 4 Calculate distances.
# 5 Select a clustering algorithm.
# 6 Obtain one or more cluster solutions.
# 7 Determine the number of clusters present.
# 8 Obtain a final clustering solution.
# 9 Visualize the results.
# 10 Interpret the clusters.
# 11 Validate the results.
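# Step 2 (scaling) is most often handled by standardizing each variable to a
# mean of 0 and a standard deviation of 1, which is what scale() does. A minimal
# sketch of the equivalent calculation on a toy matrix (illustrative only, not
# part of the book's code):
x <- matrix(rnorm(20, mean = 5, sd = 2), ncol = 2)
x.scaled <- apply(x, 2, function(col) (col - mean(col)) / sd(col))
all.equal(as.vector(x.scaled), as.vector(scale(x)))  # should be TRUE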
# Because the calculation of distances between observations is such an integral part of
# cluster analysis, it's described next in some detail.
# 16.2 Calculating distances
# Consider the nutrient dataset provided with the flexclust package. The dataset
# contains measurements on the nutrients of 27 types of meat, fish, and fowl. The first
# few observations are given by
data("nutrient",package = "flexclust")
str(nutrient)
## 'data.frame': 27 obs. of 5 variables:
## $ energy : int 340 245 420 375 180 115 170 160 265 300 ...
## $ protein: int 20 21 15 19 22 20 25 26 20 18 ...
## $ fat : int 28 17 39 32 10 3 7 5 20 25 ...
## $ calcium: int 9 9 7 9 17 8 12 14 9 9 ...
## $ iron : num 2.6 2.7 2 2.6 3.7 1.4 1.5 5.9 2.6 2.3 ...
head(nutrient,4)
## energy protein fat calcium iron
## BEEF BRAISED 340 20 28 9 2.6
## HAMBURGER 245 21 17 9 2.7
## BEEF ROAST 420 15 39 7 2.0
## BEEF STEAK 375 19 32 9 2.6
# The dist() function in the base R installation can be used to calculate the distances
# between all rows (observations) of a matrix or data frame. The format is
# dist(x, method=), where x is the input data and method="euclidean" is the default.
d <- dist(nutrient)
as.matrix(d)[1:4,1:4]
## BEEF BRAISED HAMBURGER BEEF ROAST BEEF STEAK
## BEEF BRAISED 0.00000 95.6400 80.93429 35.24202
## HAMBURGER 95.64000 0.0000 176.49218 130.87784
## BEEF ROAST 80.93429 176.4922 0.00000 45.76418
## BEEF STEAK 35.24202 130.8778 45.76418 0.00000
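# As a quick check on what dist() computes, the Euclidean distance between the
# first two rows can be reproduced by hand; this sketch should match the 95.64
# entry in the matrix above:
sqrt(sum((nutrient[1, ] - nutrient[2, ])^2))  # beef braised vs. hamburger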
# Listing 16.1 Average-linkage clustering of the nutrient data
row.names(nutrient) <- tolower(row.names(nutrient))  # simplify the row labels
nutrient.scaled <- scale(nutrient)                   # standardize to mean 0, sd 1
d <- dist(nutrient.scaled)                           # Euclidean distance matrix
fit.average <- hclust(d, method = "average")         # average-linkage clustering
plot(fit.average, hang = -1, cex = 0.8, main = "Average Linkage Clustering")
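# Average linkage defines the distance between two clusters as the mean of all
# pairwise distances between their members. A small illustrative check on two
# hand-picked groups (the grouping is hypothetical, not a cluster solution):
dm <- as.matrix(d)
mean(dm[c("beef braised", "beef roast"), c("hamburger", "beef steak")])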

# Listing 16.2 Selecting the number of clusters
library(NbClust)
devAskNewPage(TRUE)
nc <- NbClust(nutrient.scaled, distance = "euclidean",
              min.nc = 2, max.nc = 15, method = "average")
## Warning in pf(beale, pp, df2): NaNs produced
## Warning in pf(beale, pp, df2): NaNs produced

## *** : The Hubert index is a graphical method of determining the number of clusters.
## In the plot of Hubert index, we seek a significant knee that corresponds to a
## significant increase of the value of the measure i.e the significant peak in Hubert
## index second differences plot.
##

## *** : The D index is a graphical method of determining the number of clusters.
## In the plot of D index, we seek a significant knee (the significant peak in Dindex
## second differences plot) that corresponds to a significant increase of the value of
## the measure.
##
## *******************************************************************
## * Among all indices:
## * 4 proposed 2 as the best number of clusters
## * 4 proposed 3 as the best number of clusters
## * 2 proposed 4 as the best number of clusters
## * 4 proposed 5 as the best number of clusters
## * 1 proposed 9 as the best number of clusters
## * 1 proposed 10 as the best number of clusters
## * 2 proposed 13 as the best number of clusters
## * 1 proposed 14 as the best number of clusters
## * 4 proposed 15 as the best number of clusters
##
## ***** Conclusion *****
##
## * According to the majority rule, the best number of clusters is 2
##
##
## *******************************************************************
summary(nc)
## Length Class Mode
## All.index 364 -none- numeric
## All.CriticalValues 42 -none- numeric
## Best.nc 52 -none- numeric
## Best.partition 27 -none- numeric
# Row 1 of Best.nc holds the number of clusters proposed by each criterion
t <- table(nc$Best.nc[1, ])
barplot(t, xlab = "Number of Clusters", ylab = "Number of Criteria",
        main = "Number of Clusters Chosen by 26 Criteria")
# Listing 16.3 Obtaining the final cluster solution
# 1 Assigns cases
clusters <- cutree(fit.average,k=5)
table(clusters)
## clusters
## 1 2 3 4 5
## 7 16 1 2 1
# 2 Describes clusters
aggregate(nutrient,by=list(clusters=clusters),median)
## clusters energy protein fat calcium iron
## 1 1 340.0 19 29 9 2.50
## 2 2 170.0 20 8 13 1.45
## 3 3 160.0 26 5 14 5.90
## 4 4 57.5 9 1 78 5.70
## 5 5 180.0 22 9 367 2.50
aggregate(as.data.frame(nutrient.scaled),by=list(clusters=clusters),median)
## clusters energy protein fat calcium iron
## 1 1 1.3101024 0.0000000 1.3785620 -0.4480464 0.08110456
## 2 2 -0.3696099 0.2352002 -0.4869384 -0.3967868 -0.63743114
## 3 3 -0.4684165 1.6464016 -0.7534384 -0.3839719 2.40779157
## 4 4 -1.4811842 -2.3520023 -1.1087718 0.4361807 2.27092763
## 5 5 -0.2708033 0.7056007 -0.3981050 4.1396825 0.08110456
# 3 Plots results
plot(fit.average, hang = -1, cex = 0.8,
     main = "Average Linkage Clustering\n5 Cluster Solution")
rect.hclust(fit.average,k=5)

# The cutree() function is used to cut the tree into five clusters (1). The first cluster
# has 7 observations, the second cluster has 16 observations, and so on. The aggregate()
# function is then used to obtain the median profile for each cluster (2). The
# results are reported in both the original metric and in standardized form. Finally, the
# dendrogram is replotted, and the rect.hclust() function is used to superimpose
# the five-cluster solution (3). The results are displayed in the figure above.
# Sardines form their own cluster and are much higher in calcium than the other
# food groups. Beef heart is also a singleton and is high in protein and iron. The clam
# cluster is low in protein and high in iron. The items in the cluster containing beef
# roast to pork simmered are high in energy and fat. Finally, the largest group (mackerel
# to bluefish) is relatively low in iron.
# Hierarchical clustering can be particularly useful when you expect nested clustering
# and a meaningful hierarchy. This is often the case in the biological sciences. But the
# hierarchical algorithms are greedy in the sense that once an observation is assigned to
# a cluster, it can’t be reassigned later in the process. Additionally, hierarchical clustering
# is difficult to apply in large samples, where there may be hundreds or even thousands
# of observations.
# Listing 16.4 K-means clustering of wine data
# Partitioning methods can work well in these situations. For k-means, a plot of the
# total within-groups sum of squares against the number of clusters can be helpful.
# A bend in the graph (similar to the bend in a Scree test) can suggest the appropriate
# number of clusters. The graph can be produced with the following function:
wssplot <- function(data, nc = 15, seed = 1234) {
  # Within-groups sum of squares for a one-cluster solution
  wss <- (nrow(data) - 1) * sum(apply(data, 2, var))
  # Total within-groups sum of squares for 2..nc clusters
  for (i in 2:nc) {
    set.seed(seed)
    wss[i] <- sum(kmeans(data, centers = i)$withinss)
  }
  plot(1:nc, wss, type = "b", xlab = "Number of Clusters",
       ylab = "Within groups sum of squares")
}
data(wine,package = "rattle")
head(wine)
## Type Alcohol Malic Ash Alcalinity Magnesium Phenols Flavanoids
## 1 1 14.23 1.71 2.43 15.6 127 2.80 3.06
## 2 1 13.20 1.78 2.14 11.2 100 2.65 2.76
## 3 1 13.16 2.36 2.67 18.6 101 2.80 3.24
## 4 1 14.37 1.95 2.50 16.8 113 3.85 3.49
## 5 1 13.24 2.59 2.87 21.0 118 2.80 2.69
## 6 1 14.20 1.76 2.45 15.2 112 3.27 3.39
## Nonflavanoids Proanthocyanins Color Hue Dilution Proline
## 1 0.28 2.29 5.64 1.04 3.92 1065
## 2 0.26 1.28 4.38 1.05 3.40 1050
## 3 0.30 2.81 5.68 1.03 3.17 1185
## 4 0.24 2.18 7.80 0.86 3.45 1480
## 5 0.39 1.82 4.32 1.04 2.93 735
## 6 0.34 1.97 6.75 1.05 2.85 1450
df <- scale(wine[-1])  # 1. Standardizes the data
# 2. Determines the number of clusters
wssplot(df)
library(NbClust)
set.seed(1234)
devAskNewPage(ask=TRUE)
nc <- NbClust(df,min.nc = 2,max.nc = 15,method = "kmeans")


## *** : The Hubert index is a graphical method of determining the number of clusters.
## In the plot of Hubert index, we seek a significant knee that corresponds to a
## significant increase of the value of the measure i.e the significant peak in Hubert
## index second differences plot.
##

## *** : The D index is a graphical method of determining the number of clusters.
## In the plot of D index, we seek a significant knee (the significant peak in Dindex
## second differences plot) that corresponds to a significant increase of the value of
## the measure.
##
## *******************************************************************
## * Among all indices:
## * 4 proposed 2 as the best number of clusters
## * 15 proposed 3 as the best number of clusters
## * 1 proposed 10 as the best number of clusters
## * 1 proposed 12 as the best number of clusters
## * 1 proposed 14 as the best number of clusters
## * 1 proposed 15 as the best number of clusters
##
## ***** Conclusion *****
##
## * According to the majority rule, the best number of clusters is 3
##
##
## *******************************************************************
summary(nc)
## Length Class Mode
## All.index 364 -none- numeric
## All.CriticalValues 42 -none- numeric
## Best.nc 52 -none- numeric
## Best.partition 178 -none- numeric
ls(nc)
## [1] "All.CriticalValues" "All.index" "Best.nc"
## [4] "Best.partition"
t <- table(nc$Best.nc[1,])
t
##
## 0 1 2 3 10 12 14 15
## 2 1 4 15 1 1 1 1
barplot(t, xlab = "Number of Clusters", ylab = "Number of Criteria",
        main = "Number of Clusters Chosen by 26 Criteria")
# 3. Performs the k-means cluster analysis
set.seed(1234)
fit.km <- kmeans(df,3,nstart = 25)
fit.km
## K-means clustering with 3 clusters of sizes 62, 65, 51
##
## Cluster means:
## Alcohol Malic Ash Alcalinity Magnesium Phenols
## 1 0.8328826 -0.3029551 0.3636801 -0.6084749 0.57596208 0.88274724
## 2 -0.9234669 -0.3929331 -0.4931257 0.1701220 -0.49032869 -0.07576891
## 3 0.1644436 0.8690954 0.1863726 0.5228924 -0.07526047 -0.97657548
## Flavanoids Nonflavanoids Proanthocyanins Color Hue
## 1 0.97506900 -0.56050853 0.57865427 0.1705823 0.4726504
## 2 0.02075402 -0.03343924 0.05810161 -0.8993770 0.4605046
## 3 -1.21182921 0.72402116 -0.77751312 0.9388902 -1.1615122
## Dilution Proline
## 1 0.7770551 1.1220202
## 2 0.2700025 -0.7517257
## 3 -1.2887761 -0.4059428
##
## Clustering vector:
## [1] 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1
## [36] 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 2 2 3 2 2 2 2 2 2 2 2
## [71] 2 2 2 1 2 2 2 2 2 2 2 2 2 3 2 2 2 2 2 2 2 2 2 2 2 1 2 2 2 2 2 2 2 2 2
## [106] 2 2 2 2 2 2 2 2 2 2 2 2 2 3 2 2 1 2 2 2 2 2 2 2 2 3 3 3 3 3 3 3 3 3 3
## [141] 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3
## [176] 3 3 3
##
## Within cluster sum of squares by cluster:
## [1] 385.6983 558.6971 326.3537
## (between_SS / total_SS = 44.8 %)
##
## Available components:
##
## [1] "cluster" "centers" "totss" "withinss"
## [5] "tot.withinss" "betweenss" "size" "iter"
## [9] "ifault"
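# Because k-means starts from randomly chosen centers, nstart = 25 above runs the
# algorithm 25 times and keeps the best solution. An illustrative sketch of why
# this matters: with nstart = 1, different seeds can converge to solutions with
# different total within-cluster sums of squares (results vary by seed):
set.seed(1)
kmeans(df, centers = 3, nstart = 1)$tot.withinss
set.seed(42)
kmeans(df, centers = 3, nstart = 1)$tot.withinss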
# Because the variables vary in range, they're standardized prior to clustering (1). Next,
# the number of clusters is determined using the wssplot() and NbClust() functions
# (2). The wssplot() figure indicates that there is a distinct drop in the within-groups
# sum of squares when moving from one to three clusters. After three clusters, the
# decrease levels off, suggesting that a three-cluster solution may be a good fit to the
# data. Here, 15 of the 26 criteria reported by the NbClust package suggest a
# three-cluster solution. Note that not all 30 criteria can be calculated for every dataset.
# A final cluster solution is obtained with the kmeans() function, and the cluster
# centroids are printed (3). Because the centroids provided by the function are based on
# standardized data, the aggregate() function can be used along with the cluster
# memberships to determine variable means for each cluster in the original metric:
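# A sketch of that call (not output captured in this session):
aggregate(wine[-1], by = list(cluster = fit.km$cluster), mean)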
# How well did k-means clustering uncover the actual structure of the data contained
# in the Type variable? A cross-tabulation of Type (wine varietal) and cluster
# membership is given by
ct.km <- table(wine$Type,fit.km$cluster)
ct.km
##
## 1 2 3
## 1 59 0 0
## 2 3 65 3
## 3 0 0 48
# You can quantify the agreement between type and cluster using an adjusted Rand
# index, provided by the flexclust package:
library(flexclust)
randIndex(ct.km)
## ARI
## 0.897495
# The adjusted Rand index provides a measure of the agreement between two partitions,
# adjusted for chance. It ranges from -1 (no agreement) to 1 (perfect agreement).
# Agreement between the wine varietal type and the cluster solution is 0.9. Not
# bad. Shall we have some wine?
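# As a sanity check on the chance adjustment, comparing the wine types against a
# purely random partition should yield an ARI near 0 (an illustrative sketch,
# not part of the book's code):
set.seed(1234)
random.clusters <- sample(1:3, nrow(wine), replace = TRUE)
randIndex(table(wine$Type, random.clusters))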
# Listing 16.5 Partitioning around medoids for the wine data
library(rattle)
set.seed(1234)
fit.pam <- pam(wine[-1], k = 3, stand = TRUE)  # Clusters the standardized data
ls(fit.pam)
## [1] "call" "clusinfo" "clustering" "data" "diss"
## [6] "id.med" "isolation" "medoids" "objective" "silinfo"
fit.pam$medoids  # Prints the medoids
## Alcohol Malic Ash Alcalinity Magnesium Phenols Flavanoids
## [1,] 13.48 1.81 2.41 20.5 100 2.70 2.98
## [2,] 12.25 1.73 2.12 19.0 80 1.65 2.03
## [3,] 13.40 3.91 2.48 23.0 102 1.80 0.75
## Nonflavanoids Proanthocyanins Color Hue Dilution Proline
## [1,] 0.26 1.86 5.1 1.04 3.47 920
## [2,] 0.37 1.63 3.4 1.00 3.17 510
## [3,] 0.43 1.41 7.3 0.70 1.56 750
summary(fit.pam)
## Medoids:
## ID Alcohol Malic Ash Alcalinity Magnesium Phenols Flavanoids
## [1,] 36 13.48 1.81 2.41 20.5 100 2.70 2.98
## [2,] 107 12.25 1.73 2.12 19.0 80 1.65 2.03
## [3,] 175 13.40 3.91 2.48 23.0 102 1.80 0.75
## Nonflavanoids Proanthocyanins Color Hue Dilution Proline
## [1,] 0.26 1.86 5.1 1.04 3.47 920
## [2,] 0.37 1.63 3.4 1.00 3.17 510
## [3,] 0.43 1.41 7.3 0.70 1.56 750
## Clustering vector:
## [1] 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1
## [36] 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 2 2 2 2 1 2 1 2 2 2 1
## [71] 2 1 2 1 1 2 2 2 1 1 2 2 2 3 2 2 2 2 2 2 2 2 2 2 2 1 1 2 1 2 2 2 2 2 2
## [106] 2 2 2 2 1 1 2 2 2 2 2 2 2 2 2 1 1 3 2 1 2 2 2 2 2 3 3 3 3 2 3 3 3 3 3
## [141] 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3
## [176] 3 3 3
## Objective function:
## build swap
## 3.593378 3.476783
##
## Numerical information per cluster:
## size max_diss av_diss diameter separation
## [1,] 75 7.164558 3.611501 11.255700 2.206761
## [2,] 54 6.473747 3.511231 10.901681 2.142040
## [3,] 49 5.781398 3.232620 9.027462 2.142040
##
## Isolated clusters:
## L-clusters: character(0)
## L*-clusters: character(0)
##
## Silhouette plot information:
## cluster neighbor sil_width
## 6 1 2 0.431345868
## 59 1 2 0.422985546
## 4 1 2 0.416433098
## 57 1 2 0.409209912
## 11 1 2 0.404957450
## 1 1 2 0.404822997
## 53 1 2 0.401782997
## 54 1 2 0.399156150
## 32 1 2 0.397323685
## 50 1 2 0.391734022
## 17 1 3 0.390594451
## 43 1 2 0.389290640
## 56 1 2 0.388713717
## 19 1 2 0.388330581
## 58 1 2 0.384584163
## 16 1 2 0.376489540
## 49 1 2 0.373222506
## 15 1 2 0.366725498
## 52 1 2 0.365109439
## 10 1 2 0.363428415
## 7 1 2 0.361644804
## 21 1 2 0.361525995
## 47 1 2 0.360930973
## 8 1 2 0.357461533
## 3 1 2 0.356729004
## 18 1 2 0.356002905
## 55 1 2 0.354923111
## 41 1 2 0.354841346
## 48 1 2 0.346432247
## 20 1 2 0.341821413
## 31 1 2 0.336983341
## 40 1 2 0.320916416
## 13 1 2 0.317531621
## 29 1 2 0.314698409
## 9 1 2 0.311149467
## 23 1 2 0.302217790
## 35 1 2 0.292034174
## 27 1 2 0.291653497
## 30 1 2 0.289798645
## 14 1 2 0.286894798
## 36 1 2 0.281579856
## 34 1 2 0.280814801
## 12 1 2 0.263951882
## 45 1 2 0.247640531
## 51 1 2 0.245465996
## 46 1 3 0.244331574
## 37 1 2 0.243850068
## 33 1 2 0.220417637
## 5 1 2 0.217148834
## 25 1 2 0.180261892
## 2 1 2 0.179671746
## 22 1 2 0.177533123
## 38 1 2 0.176263168
## 74 1 2 0.163570041
## 96 1 2 0.162864422
## 24 1 2 0.153481022
## 26 1 2 0.142592535
## 28 1 2 0.120492756
## 44 1 2 0.083800762
## 122 1 2 0.070664813
## 42 1 2 0.066062884
## 39 1 2 0.053810392
## 70 1 2 0.043660345
## 79 1 2 0.041360020
## 75 1 2 0.033247500
## 99 1 2 0.023496427
## 72 1 2 0.011201672
## 111 1 2 -0.003771621
## 110 1 2 -0.006731855
## 66 1 2 -0.026519834
## 97 1 2 -0.042458279
## 125 1 2 -0.054549029
## 64 1 2 -0.068583474
## 80 1 2 -0.118332689
## 121 1 2 -0.126937263
## 104 2 3 0.401723703
## 117 2 1 0.397043036
## 107 2 1 0.390905484
## 90 2 1 0.374991329
## 102 2 1 0.369490423
## 87 2 3 0.361997963
## 126 2 1 0.345318733
## 114 2 1 0.343147752
## 81 2 1 0.340189953
## 83 2 1 0.335844624
## 129 2 1 0.334780355
## 112 2 1 0.333346092
## 115 2 1 0.332731684
## 120 2 1 0.331189400
## 91 2 3 0.328135316
## 109 2 1 0.322857036
## 68 2 1 0.322395964
## 76 2 3 0.322243985
## 105 2 1 0.313857021
## 89 2 3 0.307182791
## 118 2 1 0.301742050
## 88 2 1 0.285114110
## 65 2 1 0.277539120
## 92 2 3 0.275248918
## 116 2 1 0.269892231
## 98 2 1 0.266889975
## 77 2 1 0.251060937
## 86 2 1 0.247081846
## 94 2 1 0.240197122
## 101 2 1 0.231525203
## 127 2 1 0.220323215
## 103 2 1 0.218235709
## 63 2 1 0.217959365
## 108 2 3 0.208692385
## 60 2 3 0.206061297
## 73 2 3 0.205788908
## 93 2 3 0.199525708
## 106 2 3 0.193048654
## 95 2 1 0.181584956
## 78 2 3 0.154485347
## 61 2 3 0.150120762
## 82 2 1 0.147005421
## 100 2 1 0.099919557
## 128 2 3 0.093070765
## 85 2 1 0.082184071
## 130 2 3 0.077159390
## 71 2 3 0.075987518
## 69 2 3 0.073687442
## 124 2 3 0.073646368
## 113 2 3 0.073572399
## 62 2 3 0.057882441
## 67 2 1 0.038670327
## 119 2 3 0.030009169
## 135 2 3 -0.062088852
## 175 3 2 0.466247395
## 149 3 2 0.458375372
## 150 3 2 0.444264747
## 174 3 2 0.439164897
## 178 3 2 0.438308568
## 148 3 2 0.436671474
## 167 3 2 0.434263598
## 176 3 2 0.429929697
## 157 3 2 0.427701659
## 156 3 2 0.418804029
## 165 3 2 0.418418468
## 173 3 2 0.416421149
## 169 3 2 0.401255418
## 177 3 2 0.397849022
## 168 3 2 0.394065026
## 166 3 2 0.390276509
## 170 3 1 0.376722018
## 154 3 2 0.376068795
## 152 3 2 0.372960669
## 151 3 2 0.365392096
## 161 3 2 0.364939118
## 172 3 2 0.354281662
## 164 3 2 0.346440381
## 158 3 2 0.343333161
## 162 3 2 0.328858654
## 147 3 2 0.314841977
## 145 3 2 0.313628497
## 163 3 2 0.311711278
## 138 3 2 0.310813701
## 137 3 2 0.297694984
## 146 3 2 0.293305453
## 139 3 2 0.292344946
## 133 3 2 0.280830852
## 143 3 2 0.277634029
## 132 3 2 0.269428806
## 153 3 2 0.268825785
## 136 3 2 0.267183656
## 171 3 2 0.261912495
## 141 3 2 0.260333129
## 160 3 1 0.257977193
## 144 3 2 0.246255528
## 155 3 2 0.234979198
## 142 3 2 0.233332916
## 134 3 2 0.230544406
## 140 3 2 0.222955957
## 159 3 1 0.164630354
## 84 3 2 0.159307477
## 131 3 2 0.074198339
## 123 3 2 -0.057130312
## Average silhouette width per cluster:
## [1] 0.2421838 0.2328185 0.3230317
## Average silhouette width of total data set:
## [1] 0.2615985
##
## Available components:
## [1] "medoids" "id.med" "clustering" "objective" "isolation"
## [6] "clusinfo" "silinfo" "diss" "call" "data"
clusplot(fit.pam, main = "Bivariate Cluster Plot")  # Plots the cluster solution

ct.pam <- table(wine$Type,fit.pam$clustering)
ct.pam
##
## 1 2 3
## 1 59 0 0
## 2 16 53 2
## 3 0 1 47
# Also note that PAM didn't perform as well as k-means in this instance:
randIndex(ct.pam)  # The adjusted Rand index has decreased from 0.9 (for k-means) to 0.7
## ARI
## 0.6994957