Select the variables to include in the clustering

## # A tibble: 2 x 3
##     aye   nay other
##   <dbl> <dbl> <dbl>
## 1    80   141    92
## 2    70   121   122

Run the k-means clustering algorithm with 2 centers
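
The code for this step is not shown in the report, so the following is only a minimal sketch of a 2-center k-means fit using base R's `kmeans()`. The `votes` data frame is made up for illustration (only the column names `aye`, `nay`, and `other` come from the output above), and `nstart = 25` is an assumed setting, not necessarily the one used in the lab.

```r
# Hypothetical vote counts for illustration; these are NOT the lab's data.
set.seed(1)
votes <- data.frame(
  aye   = c(rpois(20, 80), rpois(20, 20)),
  nay   = c(rpois(20, 20), rpois(20, 80)),
  other = rpois(40, 10)
)

# Fit k-means with 2 centers. nstart = 25 (an assumption) reruns the
# algorithm from 25 random starts and keeps the best solution.
set.seed(1)
fit <- kmeans(votes, centers = 2, nstart = 25)

fit$centers  # one row of column means per cluster
fit$size     # number of rows assigned to each cluster
```

The per-row cluster assignments are stored in `fit$cluster`, which is the kind of vector printed in the results below.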

View the results

##    party_clusters_Rep
## 1                   1
## 2                   1
## 3                   1
## 4                   1
## 5                   2
## 6                   2
## 7                   2
## 8                   1
## 9                   2
## 10                  1

Visualize the output

# Evaluate the quality of the clustering

## [1] 0.7193405
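
The report does not show how this value was computed. One common quality measure for k-means is the share of total variance explained by the clusters, that is, the between-cluster sum of squares divided by the total sum of squares. Assuming that is the measure used here, a sketch on made-up data (not the lab's) looks like this:

```r
# Hypothetical vote counts for illustration; these are NOT the lab's data.
set.seed(1)
votes <- data.frame(
  aye   = c(rpois(20, 80), rpois(20, 20)),
  nay   = c(rpois(20, 20), rpois(20, 80)),
  other = rpois(40, 10)
)

set.seed(1)
fit <- kmeans(votes, centers = 2, nstart = 25)

# Share of total variance explained by the clustering:
# between-cluster sum of squares / total sum of squares.
explained <- fit$betweenss / fit$totss
explained
```

Values closer to 1 mean the clusters account for more of the spread in the data.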

Create an elbow chart of the output, using the function created earlier to evaluate several different numbers of clusters
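
The helper function itself is not included in the report. As a rough sketch, the hypothetical `explained_variance()` below fits k-means for a given number of clusters and returns the variance-explained ratio, which is then plotted against the number of clusters. The data, the function name, and the range of 1 to 6 clusters are all assumptions made for illustration.

```r
# Hypothetical vote counts for illustration; these are NOT the lab's data.
set.seed(1)
votes <- data.frame(
  aye   = c(rpois(20, 80), rpois(20, 20)),
  nay   = c(rpois(20, 20), rpois(20, 80)),
  other = rpois(40, 10)
)

# Hypothetical helper: variance explained by a k-means fit with k clusters.
explained_variance <- function(data, k) {
  fit <- kmeans(data, centers = k, nstart = 25)
  fit$betweenss / fit$totss
}

# Evaluate several cluster counts (the range 1:6 is an assumption).
set.seed(1)
ks <- 1:6
ev <- sapply(ks, function(k) explained_variance(votes, k))

# Elbow chart: look for the k after which the gains level off.
plot(ks, ev, type = "b",
     xlab = "Number of clusters",
     ylab = "Variance explained (between SS / total SS)")
```

With one cluster the ratio is zero, since a single cluster explains none of the variance; the "elbow" is the point where adding another cluster stops producing a large jump.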

Use NbClust to select a number of clusters

## *** : The Hubert index is a graphical method of determining the number of clusters.
##                 In the plot of Hubert index, we seek a significant knee that corresponds to a 
##                 significant increase of the value of the measure i.e the significant peak in Hubert
##                 index second differences plot. 
##

## *** : The D index is a graphical method of determining the number of clusters. 
##                 In the plot of D index, we seek a significant knee (the significant peak in Dindex
##                 second differences plot) that corresponds to a significant increase of the value of
##                 the measure. 
##  
## ******************************************************************* 
## * Among all indices:                                                
## * 14 proposed 2 as the best number of clusters 
## * 3 proposed 3 as the best number of clusters 
## * 1 proposed 4 as the best number of clusters 
## * 3 proposed 6 as the best number of clusters 
## * 1 proposed 9 as the best number of clusters 
## * 1 proposed 10 as the best number of clusters 
## * 1 proposed 15 as the best number of clusters 
## 
##                    ***** Conclusion *****                            
##  
## * According to the majority rule, the best number of clusters is  2 
##  
##  
## *******************************************************************

##                       KL       CH Hartigan     CCC    Scott  Marriot     TrCovW
## Number_clusters   9.0000    2.000   6.0000  2.0000  10.0000   6.0000          3
## Value_Index     709.5603 1089.291 145.5246 75.5765 687.1084 698.5414 2100043324
##                   TraceW     Friedman   Rubin Cindex     DB Silhouette   Duda
## Number_clusters     3.00 4.000000e+00   6.000 2.0000 2.0000     2.0000 2.0000
## Value_Index     18253.48 3.179308e+16 -75.767 0.1794 0.6185     0.6445 0.7949
##                 PseudoT2  Beale Ratkowsky    Ball PtBiserial   Frey McClain
## Number_clusters   2.0000 2.0000     2.000     3.0     2.0000 2.0000  2.0000
## Value_Index      72.7763 0.4372     0.556 28896.5     0.8192 1.4948  0.3468
##                   Dunn Hubert SDindex Dindex    SDbw
## Number_clusters 2.0000      0  2.0000      0 15.0000
## Value_Index     0.0477      0  0.0938      0  0.0892
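
The majority-rule conclusion can be checked against the `Number_clusters` row of the table above. The sketch below types those values in by hand (it does not call NbClust) and tallies them, leaving out Hubert and Dindex, which are graphical methods and are reported as 0.

```r
# Number_clusters row copied by hand from the printed table above.
best_nc <- c(
  KL = 9, CH = 2, Hartigan = 6, CCC = 2, Scott = 10, Marriot = 6,
  TrCovW = 3, TraceW = 3, Friedman = 4, Rubin = 6, Cindex = 2, DB = 2,
  Silhouette = 2, Duda = 2, PseudoT2 = 2, Beale = 2, Ratkowsky = 2,
  Ball = 3, PtBiserial = 2, Frey = 2, McClain = 2, Dunn = 2,
  Hubert = 0, SDindex = 2, Dindex = 0, SDbw = 15
)

# Drop the graphical indices (reported as 0) and count the votes.
votes_by_k <- table(best_nc[best_nc > 0])
votes_by_k

# Majority rule: the cluster count proposed by the most indices.
winner <- names(votes_by_k)[which.max(votes_by_k)]
winner
```

The tally gives 14 votes for 2 clusters, 3 each for 3 and 6, and 1 each for 4, 9, 10, and 15, matching the summary printed by NbClust.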

Display the results visually

# Using the recommended number of clusters, compare the quality of the model with 2 clusters

What differences and similarities did you see between how the clustering worked for the datasets?

Moving from 2 to 3 clusters yields a larger gain in variance explained for the Republican data than for the Democratic data. For the Democrats, the 3-cluster k-means explains about 5.1% more of the variance than the 2-cluster k-means; for the Republicans, it explains about 7.6% more. For both data sets, however, the NbClust procedure recommends 2 clusters.

Clustering Lab

Jack McGrath