Hierarchical clustering is a clustering method that does not require the number of clusters to be specified in advance.
It produces a tree-based representation of the observations, called a dendrogram. The clusters obtained by cutting the dendrogram at a given height are nested within the clusters obtained by cutting at any greater height. When the data do not actually have such a nested structure, hierarchical clustering can yield worse (i.e. less accurate) results than K-means clustering for a given number of clusters.
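The nesting property can be checked directly. The following is a minimal sketch, assuming the built-in USArrests data and complete linkage; the variable names (`coarse`, `fine`, `nesting`) and the choice of 2 and 4 clusters are illustrative only.

```r
# Build one tree on the built-in data set (complete linkage).
hc <- hclust(dist(USArrests), method = "complete")

# Cut the same tree into 2 clusters (a higher cut) and 4 clusters (a lower cut).
coarse <- cutree(hc, k = 2)
fine   <- cutree(hc, k = 4)

# Cross-tabulate the two labelings: each row (a fine cluster) should have
# exactly one non-zero entry, i.e. it sits inside a single coarse cluster.
nesting <- table(fine = fine, coarse = coarse)
print(nesting)
```

If the cuts are nested, every fine cluster falls entirely within one coarse cluster, so no row of the table has counts in more than one column.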
Hierarchical clustering algorithm:
1. Start with n observations and a dissimilarity measure (e.g. Euclidean distance) for all pairs of observations. Treat each observation as its own cluster.
2. For i = n, n − 1, ..., 2:
(a) Examine all pairwise inter-cluster dissimilarities among the i clusters and identify the pair of clusters that are least dissimilar (that is, most similar). Fuse these two clusters. The dissimilarity between these two clusters indicates the height in the dendrogram at which the fusion should be placed.
(b) Compute the new pairwise inter-cluster dissimilarities among the i − 1 remaining clusters.
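The steps above can be sketched as a naive implementation in R. This is only an illustration under stated assumptions, not how hclust is implemented: the function name `naive_agglomerate` and its `linkage` argument are hypothetical, Euclidean distance is assumed, and the inter-cluster dissimilarities are recomputed from scratch on every pass rather than updated incrementally.

```r
# Naive agglomerative clustering; returns the fusion heights in merge order.
# `naive_agglomerate` and `linkage` are illustrative names, not a library API.
naive_agglomerate <- function(x, linkage = max) {
  d <- as.matrix(dist(x))                  # pairwise Euclidean distances
  clusters <- as.list(seq_len(nrow(d)))    # step 1: each observation alone
  heights <- numeric(0)
  while (length(clusters) > 1) {           # step 2: i = n, n - 1, ..., 2
    best <- c(NA, NA)
    best_d <- Inf
    # (a) examine all pairwise inter-cluster dissimilarities
    for (a in seq_len(length(clusters) - 1)) {
      for (b in (a + 1):length(clusters)) {
        d_ab <- linkage(d[clusters[[a]], clusters[[b]]])
        if (d_ab < best_d) {
          best_d <- d_ab
          best <- c(a, b)
        }
      }
    }
    # fuse the most similar pair and record the fusion height
    clusters[[best[1]]] <- c(clusters[[best[1]]], clusters[[best[2]]])
    clusters[[best[2]]] <- NULL
    heights <- c(heights, best_d)
    # (b) the dissimilarities among the remaining clusters are
    #     recomputed on the next pass of the loop
  }
  heights
}

# Compare against hclust on a small subset (max = complete linkage).
x <- USArrests[1:10, ]
h_naive <- naive_agglomerate(x, linkage = max)
h_ref <- hclust(dist(x), method = "complete")$height
print(all.equal(h_naive, h_ref))
```

Passing `min` or `mean` as `linkage` would correspond to single or average linkage. The subset of 10 states is used only to keep the nested loops fast.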
The “USArrests” data set contains the number of arrests per 100,000 residents for assault, murder, and rape in each of the 50 US states in 1973, along with the percentage of the population living in urban areas. The code below draws three dendrograms for “USArrests”, using average, complete, and single linkage.
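# Pairwise Euclidean distances between states (variables left unscaled)
d <- dist(USArrests)
# Average linkage
hc <- hclust(d, method = "average")
plot(hc)
# Complete linkage
hc <- hclust(d, method = "complete")
plot(hc)
# Single linkage
hc <- hclust(d, method = "single")
plot(hc)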
Figure: 1973 USArrests data for 50 states, hclust with "complete" linkage
Figure: 1973 USArrests data for 50 states, hclust with "average" linkage
Figure: 1973 USArrests data for 50 states, hclust with "single" linkage
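Complete and average linkage yield more balanced trees than single linkage does. The single-linkage dendrogram also has the smallest overall height of the three, because single linkage defines the dissimilarity between two clusters as the minimum pairwise dissimilarity between their members.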
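Both observations can be checked numerically rather than read off the plots. The sketch below assumes the same unscaled Euclidean distances on USArrests; the names `trees`, `top_height`, and `sizes`, and the choice of cutting at 4 clusters, are illustrative.

```r
d <- dist(USArrests)
linkages <- c("complete", "average", "single")

# Fit one tree per linkage method.
trees <- lapply(linkages, function(m) hclust(d, method = m))
names(trees) <- linkages

# Height of the final fusion, i.e. the top of each dendrogram.
top_height <- sapply(trees, function(hc) max(hc$height))
print(top_height)

# Cluster sizes after cutting each tree into 4 clusters; very uneven
# sizes suggest an unbalanced tree.
sizes <- lapply(trees, function(hc) table(cutree(hc, k = 4)))
print(sizes)
```

A tree in which one cluster holds nearly all states and the others hold only one or two is the kind of unbalanced, trailing structure usually associated with single linkage.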