SUBMISSION INSTRUCTIONS
Consider the three data sets below. Each data set contains three “clusters” in two dimensions. In the first, the clusters are three convex spheres. (A convex cluster is one where any two points in the cluster can be joined by a straight line segment that does not leave the cluster.) In the second, one cluster is a sphere, one is a ring, and one is a half-moon. In the third, two clusters are spirals and one is a sphere. Our goal is to compare how well various hierarchical methods recover clusters of different shapes.
Scatterplots of each data set:
[Figure: three scatterplots titled "3 Spheres Scatterplot", "Ring Moon Scatterplot", and "2 Spirals Scatterplot"]
Perform agglomerative clustering with single, complete, average, and Ward linkages. Cut each tree to produce three clusters. Produce 12 scatterplots, one per data set/linkage combination, showing the 3-cluster solution. Title each graph with the linkage used, as well as the average silhouette width (\(\bar s\)) for that clustering solution. Use patchwork to create a nice 4x3 grid of your plots.
Sphere
Distance Matrix
Agglomerative Clustering.
Agglomerative coefficients (the $ac element of each cluster_<data>_<linkage> fit):

Data set   single      complete    average     ward
three      0.8953964   0.9792809   0.9667276   0.9967758
ring       0.9304074   0.9911578   0.9852207   0.9991226
spiral     0.8646828   0.9912231   0.9810023   0.998266
Cutting Tree
Discuss the following:
three_spheres: Complete, average, and Ward give very similar results; single would not be a good choice.
Ring_moon_spheres: Single would probably be best, or average, depending on what we want. Based on the silhouette scores, though, I think average would be better.
two_spirals_spheres: Single and average have the same average silhouette width. Given how the points were clustered, and depending on which route we want to take, I would choose average over single.
It scores highly each time, but visually it splits the points into sections, like slices of a pie.
(Hint: you have a lot of repetitive code to write. You may find it helpful to write a function that takes a data set and a linkage method as arguments, does the clustering and computes average silhouette width, and produces the desired plot.)
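One way the helper described in the hint might look is sketched below. This is only an illustration: the function name `cluster_plot`, the column names `x` and `y`, and the toy data are all assumptions, not the assignment's actual objects.

```r
library(cluster)   # agnes(), silhouette()
library(ggplot2)

# Hypothetical helper: `dat` is assumed to have numeric columns `x` and `y`.
cluster_plot <- function(dat, linkage, k = 3) {
  d   <- dist(dat[, c("x", "y")])        # Euclidean distance matrix
  fit <- agnes(d, method = linkage)      # "single", "complete", "average", or "ward"
  cl  <- cutree(as.hclust(fit), k = k)   # cut the tree into k clusters
  s_bar <- mean(silhouette(cl, d)[, "sil_width"])  # average silhouette width

  dat$cluster <- factor(cl)
  p <- ggplot(dat, aes(x = x, y = y, colour = cluster)) +
    geom_point() +
    labs(title = sprintf("%s linkage (avg. silhouette = %.2f)", linkage, s_bar))

  list(plot = p, clusters = cl, avg_sil = s_bar)
}

# Toy stand-in data (not the assignment data): three well-separated blobs.
set.seed(1)
toy <- data.frame(
  x = c(rnorm(30, 0, 0.3), rnorm(30, 5, 0.3), rnorm(30, 0, 0.3)),
  y = c(rnorm(30, 0, 0.3), rnorm(30, 0, 0.3), rnorm(30, 5, 0.3))
)

linkages <- c("single", "complete", "average", "ward")
results  <- lapply(linkages, function(m) cluster_plot(toy, m))
names(results) <- linkages
```

Calling the helper once per data set and linkage gives the 12 plots; the list of `plot` elements can then be passed to `patchwork::wrap_plots()` with `ncol = 4` to arrange the grid.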
Consider the data set below on milk content of 25 mammals. The variables have been pre-scaled to z-scores, hence no additional standardizing is necessary. (Data source: Everitt et al. Cluster analysis 4ed)
mammals <- read.csv('/Users/rosagomez/Desktop/DSCI 415/Activities/Data/mammal_milk.csv') %>%
  column_to_rownames('Mammal')
Perform agglomerative clustering with single, complete, average, and Ward linkages. Which has the best agglomerative coefficient?
Agglomerative coefficients (the $ac element of each mammals_<linkage> fit):

Linkage    AC
single     0.7875718
complete   0.8985539
average    0.8706571
ward       0.9413994
Ward linkage has the best agglomerative coefficient (0.9414) because it is the closest to 1.
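The linkage comparison can be done in one pass rather than four separate fits. The sketch below assumes the built-in `USArrests` data (standardised) as a stand-in, since the mammal milk file is not reproduced here; the object names are illustrative.

```r
library(cluster)   # agnes()

# Stand-in data: built-in USArrests, scaled to z-scores (not the mammals data).
toy <- scale(USArrests)

linkages <- c("single", "complete", "average", "ward")

# Fit agnes() once per linkage and keep only the agglomerative coefficient.
ac <- sapply(linkages, function(m) agnes(toy, method = m)$ac)

round(ac, 3)               # one AC per linkage
best <- names(which.max(ac))
best                       # linkage with the AC closest to 1
```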
Plot a dendrogram of the method with the highest AC. Which mammals cluster together first?
The mammals that cluster together first are deer and reindeer.
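The first merge can also be read directly from the fitted tree instead of from the picture. A minimal sketch, again assuming standardised `USArrests` as stand-in data rather than the mammals file:

```r
library(cluster)   # agnes()

toy <- scale(USArrests)                        # stand-in data
hc  <- as.hclust(agnes(toy, method = "ward"))

# In an hclust merge matrix, negative entries refer to original observations,
# so row 1 identifies the two observations joined at the first step.
first_pair <- hc$labels[-hc$merge[1, ]]
first_pair

# Base-graphics dendrogram; the first pair sits at the lowest join.
plot(hc, cex = 0.6, main = "Ward linkage (stand-in data)")
```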
If the tree is cut at a height of 4, how many clusters will form? Which cluster will have the fewest mammals, and which mammals will they be?
The cluster with the fewest mammals contains the dolphin and the seal.
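Cutting at a fixed height can be checked in code as well as by eye. The sketch below uses standardised `USArrests` as stand-in data, so the cluster counts it prints are not those of the mammals tree; only the cut height of 4 comes from the question.

```r
library(cluster)   # agnes()

toy <- scale(USArrests)                        # stand-in data
hc  <- as.hclust(agnes(toy, method = "ward"))

h_cut   <- 4                                   # cut height from the question
members <- cutree(hc, h = h_cut)               # cluster label per observation
sizes   <- table(members)
sizes                                          # number and size of clusters

# Observations belonging to the smallest cluster.
smallest_id <- as.integer(names(sizes)[which.min(sizes)])
smallest    <- names(members)[members == smallest_id]
smallest

plot(hc, cex = 0.6)
abline(h = h_cut, lty = 2)                     # show where the tree is cut
```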
Use WSS and average silhouette method to suggest the optimal number of clusters. Re-create the dendrogram with the cluster memberships indicated.
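Both criteria can be computed by hand from the cut tree, which makes it clear what each one measures. This sketch assumes standardised `USArrests` as stand-in data and an arbitrary range of k from 2 to 8; `wss_for` is a hypothetical helper name.

```r
library(cluster)   # agnes(), silhouette()

toy <- scale(USArrests)                  # stand-in data
d   <- dist(toy)
hc  <- as.hclust(agnes(d, method = "ward"))

# Total within-cluster sum of squares for a given cluster assignment.
wss_for <- function(x, cl) {
  sum(sapply(split(as.data.frame(x), cl), function(g) {
    sum(scale(g, center = TRUE, scale = FALSE)^2)
  }))
}

ks  <- 2:8
wss <- sapply(ks, function(k) wss_for(toy, cutree(hc, k = k)))
sil <- sapply(ks, function(k) {
  mean(silhouette(cutree(hc, k = k), d)[, "sil_width"])
})

data.frame(k = ks, wss = round(wss, 2), avg_sil = round(sil, 3))
best_k <- ks[which.max(sil)]             # k with the largest average silhouette

# Dendrogram with the chosen cluster memberships boxed.
plot(hc, cex = 0.6)
rect.hclust(hc, k = best_k, border = "red")
```

The WSS curve is read for an elbow, while the silhouette curve is read for its maximum; the two need not agree. `factoextra::fviz_nbclust()` offers plots of the same two criteria if that package is preferred.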
Use suitable visualizations, including dimension reduction techniques, to explore the different milk characteristics of the assigned clusters. Discuss.
(K2_mammals + K3_mammals) / (K4_mammals + K5_mammals)
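One possible way to explore what distinguishes the clusters is to project the data onto its first two principal components and then compare cluster means on the original variables. The sketch assumes standardised `USArrests` as stand-in data and an arbitrary choice of k = 3; neither comes from the mammals analysis.

```r
library(cluster)   # agnes()
library(ggplot2)

toy <- scale(USArrests)                         # stand-in data, already z-scored
cl  <- cutree(as.hclust(agnes(toy, method = "ward")), k = 3)  # k = 3 is an assumption

# PCA for dimension reduction; the data are already standardised.
pca    <- prcomp(toy)
scores <- data.frame(pca$x[, 1:2],
                     cluster = factor(cl),
                     label   = rownames(toy))

pca_plot <- ggplot(scores, aes(PC1, PC2, colour = cluster, label = label)) +
  geom_text(size = 3) +
  labs(title = "Clusters on the first two principal components")
pca_plot

# Mean z-score of each variable within each cluster.
cluster_means <- aggregate(as.data.frame(toy), by = list(cluster = cl), FUN = mean)
cluster_means
```

Because the variables are z-scores, a cluster mean well above or below 0 marks a variable on which that cluster differs from the overall average.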