## # A tibble: 2 x 3
## aye nay other
## <dbl> <dbl> <dbl>
## 1 80 141 92
## 2 70 121 122
## party_clusters_Rep
## 1 1
## 2 1
## 3 1
## 4 1
## 5 2
## 6 2
## 7 2
## 8 1
## 9 2
## 10 1
# Evaluate the quality of the clustering
## [1] 0.7193405
## *** : The Hubert index is a graphical method of determining the number of clusters.
## In the plot of Hubert index, we seek a significant knee that corresponds to a
## significant increase of the value of the measure i.e the significant peak in Hubert
## index second differences plot.
##
## *** : The D index is a graphical method of determining the number of clusters.
## In the plot of D index, we seek a significant knee (the significant peak in Dindex
## second differences plot) that corresponds to a significant increase of the value of
## the measure.
##
## *******************************************************************
## * Among all indices:
## * 14 proposed 2 as the best number of clusters
## * 3 proposed 3 as the best number of clusters
## * 1 proposed 4 as the best number of clusters
## * 3 proposed 6 as the best number of clusters
## * 1 proposed 9 as the best number of clusters
## * 1 proposed 10 as the best number of clusters
## * 1 proposed 15 as the best number of clusters
##
## ***** Conclusion *****
##
## * According to the majority rule, the best number of clusters is 2
##
##
## *******************************************************************
## KL CH Hartigan CCC Scott Marriot TrCovW
## Number_clusters 9.0000 2.000 6.0000 2.0000 10.0000 6.0000 3
## Value_Index 709.5603 1089.291 145.5246 75.5765 687.1084 698.5414 2100043324
## TraceW Friedman Rubin Cindex DB Silhouette Duda
## Number_clusters 3.00 4.000000e+00 6.000 2.0000 2.0000 2.0000 2.0000
## Value_Index 18253.48 3.179308e+16 -75.767 0.1794 0.6185 0.6445 0.7949
## PseudoT2 Beale Ratkowsky Ball PtBiserial Frey McClain
## Number_clusters 2.0000 2.0000 2.000 3.0 2.0000 2.0000 2.0000
## Value_Index 72.7763 0.4372 0.556 28896.5 0.8192 1.4948 0.3468
## Dunn Hubert SDindex Dindex SDbw
## Number_clusters 2.0000 0 2.0000 0 15.0000
## Value_Index 0.0477 0 0.0938 0 0.0892
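The majority rule reported above can be checked by hand from the `Number_clusters` row of the index table. The following is a minimal sketch in base R: the vector `best_k` is transcribed from that row, and the names `best_k`, `tally`, and `majority_k` are illustrative, not part of the original analysis. With an actual NbClust result object (here hypothetically called `res`), the same row should be available as `res$Best.nc["Number_clusters", ]`.

```r
# Recommended number of clusters per index, transcribed from the
# Number_clusters row of the table above.
best_k <- c(
  KL = 9, CH = 2, Hartigan = 6, CCC = 2, Scott = 10, Marriot = 6, TrCovW = 3,
  TraceW = 3, Friedman = 4, Rubin = 6, Cindex = 2, DB = 2, Silhouette = 2,
  Duda = 2, PseudoT2 = 2, Beale = 2, Ratkowsky = 2, Ball = 3, PtBiserial = 2,
  Frey = 2, McClain = 2, Dunn = 2, Hubert = 0, SDindex = 2, Dindex = 0,
  SDbw = 15
)

# Hubert and Dindex are graphical methods and report 0,
# so they are dropped before counting votes.
tally <- table(best_k[best_k > 0])
tally

# Majority rule: the cluster count proposed by the most indices.
majority_k <- as.integer(names(tally)[which.max(tally)])
majority_k
```

The tally matches the summary printed by NbClust: 14 of the 24 voting indices propose 2 clusters.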
# Using the recommended number of clusters, compare the quality of the model with 2 clusters
For the Republicans, moving from a 2-cluster to a 3-cluster k-means solution yields a larger gain in variance explained than it does for the Democrats. For the Democrats, the 3-cluster solution explains about 5.1% more of the variance than the 2-cluster solution; for the Republicans, it explains about 7.6% more. However, for both data sets the NbClust procedure recommends the 2-cluster solution.
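A comparison of this kind can be sketched as follows. This is a minimal example, assuming that "variance explained" is measured as the between-cluster sum of squares divided by the total sum of squares from `kmeans()`. The matrix `dat` is simulated stand-in data (the voting data themselves are not reproduced here), and `variance_explained` is a hypothetical helper, not code from the original analysis.

```r
# Simulated stand-in data: two loose groups of 20 observations each,
# with columns named after the vote tallies shown earlier.
set.seed(42)
dat <- rbind(
  matrix(rnorm(60, mean = 0), ncol = 3),
  matrix(rnorm(60, mean = 4), ncol = 3)
)
colnames(dat) <- c("aye", "nay", "other")

# Share of total variance explained by a k-means fit:
# between-cluster sum of squares / total sum of squares.
variance_explained <- function(x, k, nstart = 25) {
  fit <- kmeans(x, centers = k, nstart = nstart)
  fit$betweenss / fit$totss
}

ve2 <- variance_explained(dat, 2)
ve3 <- variance_explained(dat, 3)

# Gain from adding a third cluster, in percentage points.
gain <- 100 * (ve3 - ve2)
round(c(k2 = ve2, k3 = ve3, gain_pct_points = gain), 3)
```

Because adding a cluster almost always raises the explained share, a gain of a few percentage points does not by itself justify the larger model, which is consistent with NbClust still recommending 2 clusters.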