No solution required.
No solution required.
No solution required.
Several of the scatter plots of the continuous variables show visible clustering - e.g. flipper_length_mm vs bill_depth_mm suggests that there are two distinct clusters. Applying a clustering technique therefore seems warranted.
No solution required.
No solution required.
No solution required.
For the between_ss / total_ss value, we have 958.347 / 1328 ≈ 0.722, i.e. about 72%.
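The ratio above can also be computed directly from a fitted k-means object. The following is a minimal sketch only: it uses R's built-in iris measurements as a stand-in for the penguins data (an assumption, so that the code runs without extra packages), and the object names x, km and ratio are illustrative. It also checks that, for standardised data, the total sum of squares equals \((n - 1) \times p\); the value 1328 = 332 × 4 in the solution would be consistent with 333 complete rows and four standardised variables.

```r
# Stand-in data (assumption): built-in iris measurements instead of the penguins data.
x <- scale(iris[, 1:4])   # standardise each column to mean 0 and sd 1

set.seed(1)               # k-means starts from random centres
km <- kmeans(x, centers = 3, nstart = 25)

# Proportion of the total sum of squares accounted for by the clusters
ratio <- km$betweenss / km$totss
round(ratio, 3)

# For standardised data, total_ss equals (n - 1) * p
all.equal(km$totss, (nrow(x) - 1) * ncol(x))
```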
As we can see, the clusters are actually doing a pretty good job of separating penguins by species!
Based on these plots, it seems that the \(k\)-means clustering has done a great job of grouping the penguins by species. If you look carefully, there is some error in distinguishing between Chinstrap and Adelie penguins. If we also included more data, such as sex or island, we might be able to distinguish more clearly between these two species.
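One way to put numbers on how well the clusters line up with known groups is a cross-tabulation of cluster membership against the true labels. The sketch below again uses the built-in iris data as a stand-in (an assumption); with the penguins data, the species column and the cluster vector from the lab would be used instead. Off-diagonal-style counts in a row indicate a group that has been split across clusters.

```r
# Stand-in data (assumption): iris measurements and species labels.
x <- scale(iris[, 1:4])

set.seed(1)
km <- kmeans(x, centers = 3, nstart = 25)

# Rows are the known groups, columns are the k-means clusters.
# Cluster numbers are arbitrary, so look for one dominant count per row.
tab <- table(species = iris$Species, cluster = km$cluster)
tab
```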
The between_ss / total_ss values are:
The between_ss / total_ss value increases as we increase the number of clusters, so a higher value does not necessarily indicate a better clustering - we know e.g. that there are not four species of penguin in our data (what happens if we use \(k=10\)?!).
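To see this behaviour for a range of \(k\), the ratio can be computed in a loop. This is a hedged sketch using the built-in iris data as a stand-in for the penguins data (an assumption); the names ks and ratios are illustrative. The ratio should climb towards 1 as \(k\) grows, which is why it cannot be used on its own to choose the number of clusters.

```r
# Stand-in data (assumption): built-in iris measurements.
x <- scale(iris[, 1:4])

set.seed(1)
ks <- 2:10

# Fit k-means for each k and record between_ss / total_ss
ratios <- sapply(ks, function(k) {
  km <- kmeans(x, centers = k, nstart = 25, iter.max = 50)
  km$betweenss / km$totss
})

names(ratios) <- paste0("k=", ks)
round(ratios, 3)
```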
No solution required.
Based on this plot, it seems that the optimal number of clusters is two.
Based on this plot, it seems that the optimal number of clusters is four.
\(k = 2\):
\(k = 3\):
\(k = 4\):
Answers will vary.
Answers may vary here. Three or four clusters seem reasonable.
No solution required.
Check with your computer lab demonstrator if you are not sure.
The dendrograms all look quite different. The one produced by the ward.D2 method looks perhaps the cleanest and easiest to assess.
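For reference, dendrograms from several linkage methods can be drawn side by side as in the sketch below. The particular set of methods shown is an assumption (only ward.D2 is named in the solution), and the built-in iris data is again a stand-in for the penguins data.

```r
# Stand-in data (assumption): built-in iris measurements.
x <- scale(iris[, 1:4])
d <- dist(x)   # Euclidean distances between standardised observations

# Linkage methods to compare (assumed set; only ward.D2 is named in the solution)
methods <- c("single", "complete", "average", "ward.D2")
fits <- lapply(methods, function(m) hclust(d, method = m))
names(fits) <- methods

# Draw the four dendrograms in a 2 x 2 grid, without leaf labels
op <- par(mfrow = c(2, 2))
for (m in methods) {
  plot(fits[[m]], labels = FALSE, main = m, xlab = "", sub = "")
}
par(op)
```

Cutting a tree with, for example, cutree(fits[["ward.D2"]], k = 3) returns cluster labels that can be compared with the k-means results.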
These notes have been prepared by Amanda Shaker and Rupert Kuveke. Please note that some of the content in these notes has been developed from content in Thulin (2021). The copyright for the material in these notes resides with the authors named above, with the Department of Mathematical and Physical Sciences and with La Trobe University. Copyright in this work is vested in La Trobe University including all La Trobe University branding and naming. Unless otherwise stated, material within this work is licensed under a Creative Commons Attribution-Non Commercial-Non Derivatives License BY-NC-ND.