Repeat the clustering process using only the Rep House votes dataset - What differences and similarities did you see between how the clustering worked for the Dem and Rep House datasets?
For the Democrat data, clustering accounted for 79.5% of the variance, whereas for the Republican data it accounted for 71.9%. This indicates that the clusters formed for Republican-introduced bills account for less of the variance than the clusters formed for Democrat-introduced bills.
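To make the percentage concrete, here is a minimal sketch of how a "variance accounted for" figure can be computed. It assumes the figure is the between-cluster sum of squares divided by the total sum of squares, which is a common way k-means output reports it; the write-up does not state the exact formula. The points, labels, and function names are made up for illustration and are not the House votes data.

```python
# Toy illustration only: the points and labels are invented,
# not taken from the House votes data.

def centroid(points):
    """Mean of a list of equal-length coordinate tuples."""
    n = len(points)
    return tuple(sum(p[d] for p in points) / n for d in range(len(points[0])))

def sum_of_squares(points, center):
    """Sum of squared Euclidean distances from each point to `center`."""
    return sum(sum((x - c) ** 2 for x, c in zip(p, center)) for p in points)

def variance_explained(points, labels):
    """Share of total variance accounted for by a clustering:
    1 - (within-cluster SS / total SS), i.e. between SS / total SS."""
    total_ss = sum_of_squares(points, centroid(points))
    within_ss = 0.0
    for cluster in set(labels):
        members = [p for p, lab in zip(points, labels) if lab == cluster]
        within_ss += sum_of_squares(members, centroid(members))
    return 1.0 - within_ss / total_ss

points = [(0.0, 0.0), (0.0, 2.0), (10.0, 0.0), (10.0, 2.0)]
labels = [1, 1, 2, 2]
share = variance_explained(points, labels)
print(f"{share:.1%}")  # prints 96.2%
```

Tight, well-separated clusters push this share toward 100%, while outliers and spread inside a cluster pull it down.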
Visually, there are a few takeaways from comparing the two graphs:
1. Republican-introduced bills show more outliers and spread, which is also reflected in the lower share of variance that the clustering accounts for.
2. Republicans tend to vote together more strongly on Democrat-introduced bills than on Republican-introduced bills (the points are more densely consolidated).
3. On Democrat-introduced bills, 3 Democrat data points were included in cluster 2, the cluster predominantly covering Republican ayes/nays. On Republican-introduced bills, 10 Democrat data points were included in cluster 1, the cluster predominantly covering Republican decisions. This shows that on Republican-introduced bills, Democrats tend to be less predictable in their votes.
4. No Republican votes were included in the predominantly Democrat cluster in either graph, meaning Republican votes have less variance and can be better accounted for by clustering.
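The counts in point 3 can be read off a cross-tabulation of party against cluster. A minimal sketch follows, using made-up (party, cluster) pairs rather than the actual assignments; `assignments` and `minority_members` are hypothetical names.

```python
from collections import Counter

# Hypothetical (party, cluster) pairs, invented for illustration;
# these are not the assignments from the report's graphs.
assignments = [("D", 1), ("D", 1), ("D", 2), ("R", 2), ("R", 2), ("R", 2)]

def minority_members(pairs):
    """For each cluster, count the members who do not belong to
    that cluster's majority party."""
    table = Counter(pairs)  # (party, cluster) -> count
    result = {}
    for cluster in sorted({c for _, c in pairs}):
        counts = {party: n for (party, c), n in table.items() if c == cluster}
        majority = max(counts, key=counts.get)
        result[cluster] = sum(n for party, n in counts.items() if party != majority)
    return result

crossovers = minority_members(assignments)
print(crossovers)  # prints {1: 0, 2: 1}
```

In this toy example, one Democrat lands in the predominantly Republican cluster and no Republican lands in the predominantly Democrat one, which is the same pattern as in points 3 and 4.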
Clustering accounts for 71.9% of the variance, which is a good amount but still less than for the Democrat-introduced bill data. Thus, Republican-introduced bills can be said to have more variance that clustering leaves unaccounted for.
As the number of clusters goes up, the percentage of variance the clustering accounts for generally increases, with the largest jump between 1 and 2 clusters.
For 1-10 clusters, the Republican-introduced bills follow a similar upward trend, but the amount of variance their clusters cover is lower overall than for the Democrat-introduced bills, as shown in the graph. The gap gradually narrows toward k=10, but the amount of variance covered for Democrat-introduced bills is always higher. Both curves start to flatten out at around 95% of variance covered.
Overall, both groups gain the most benefit at 2 clusters. A case could be made for 3 clusters in both, but Republican-introduced bills have a slightly stronger argument for moving up to three because the added benefit is larger than for Democrat-introduced bills.
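The "added benefit" of each extra cluster is the difference in variance covered between k and k-1. A minimal sketch of that comparison is below. The percentages are placeholders invented for illustration and are NOT the values read from the graph; `explained` and `marginal_gains` are hypothetical names.

```python
# Placeholder percentages of variance covered for k = 1..5, invented for
# illustration only; they are NOT the values from the report's graph.
# (With a single cluster, the variance covered is 0 by definition.)
explained = {1: 0.0, 2: 70.0, 3: 78.0, 4: 82.0, 5: 85.0}

def marginal_gains(explained_by_k):
    """Gain in variance covered from adding one more cluster,
    keyed by the new number of clusters."""
    ks = sorted(explained_by_k)
    return {k: explained_by_k[k] - explained_by_k[prev]
            for prev, k in zip(ks, ks[1:])}

gains = marginal_gains(explained)
best_k = max(gains, key=gains.get)
print(gains)   # prints {2: 70.0, 3: 8.0, 4: 4.0, 5: 3.0}
print(best_k)  # prints 2
```

Running the same calculation on each group's curve and comparing the gain at k=3 is one way to put a number on which group has the stronger case for a third cluster.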
For both Democrat-introduced and Republican-introduced bills, the tests show that the number of clusters recommended by most methods is 2. The Republican-introduced bills have a slightly larger number of votes for 2 clusters, at 14, while the Democrat-introduced bills have 12. For Democrat-introduced bills, the methods' recommendations are a bit more spread out across different numbers of clusters, whereas for Republican-introduced bills they are more consolidated around 2-6. Surprisingly, given the previous evaluation, the Democrat-introduced bills actually receive more votes for 3 clusters than the Republican-introduced bills do.
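The vote count works like a simple majority rule: each method recommends one number of clusters, and the recommendations are tallied. A minimal sketch is below, with a made-up list of recommendations (not the 14 and 12 votes reported above); `recommended_k` and `tally` are hypothetical names.

```python
from collections import Counter

# Hypothetical recommendations, one suggested k per method; invented for
# illustration and not the report's actual vote counts.
recommended_k = [2, 2, 3, 2, 4, 2, 3, 6, 2]

def tally(recommendations):
    """Count votes per k and return (counts, winning k, votes for the winner)."""
    counts = Counter(recommendations)
    winner, votes = counts.most_common(1)[0]
    return counts, winner, votes

counts, winner, votes = tally(recommended_k)
print(dict(counts))   # prints {2: 5, 3: 2, 4: 1, 6: 1}
print(winner, votes)  # prints 2 5
```

Comparing the full tallies for the two groups, not just the winners, is what shows how spread out or consolidated the recommendations are.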