Hierarchical clustering is a technique that groups closely related observations, revealing patterns in data that may initially look dissimilar. The method builds clusters successively, merging or splitting them to construct a nested arrangement. Unlike partitioning methods, where the number of groups must be set beforehand, hierarchical methods let the innate structure of the data emerge through the bottom-up or top-down coupling of items. Bottom-up (agglomerative) methods first define a similarity measure between two objects and then extend it to a similarity between clusters; top-down (divisive) methods define similarity between clusters directly. This step-wise cluster-forming process provides a flexible means of exploring complicated relationships hidden within complex datasets, allowing an improved understanding of the underlying structure.
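To make the bottom-up idea concrete, here is a minimal sketch using scikit-learn's AgglomerativeClustering, one common agglomerative implementation. The toy data points are made up purely for illustration.

```python
# A minimal bottom-up (agglomerative) clustering sketch using scikit-learn.
# The toy data below is invented purely for illustration.
import numpy as np
from sklearn.cluster import AgglomerativeClustering

X = np.array([
    [1.0, 1.1], [1.2, 0.9], [0.9, 1.0],   # one tight group
    [8.0, 8.2], [8.1, 7.9], [7.9, 8.0],   # another tight group
])

# Each observation starts as its own cluster; the closest pairs of clusters
# (here measured with Ward linkage on Euclidean distances) are merged step by step.
model = AgglomerativeClustering(n_clusters=2, linkage="ward")
labels = model.fit_predict(X)
print(labels)  # e.g. [0 0 0 1 1 1] -- the two tight groups are recovered
```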
Intuitive tree-based representation: The result of hierarchical clustering reveals the inherent connections between observations in the data. It is depicted as a tree structure known as a dendrogram, and a quick look at this diagram shows how similarities drew portions of the data together into clusters.
No need to specify the number of clusters: Unlike k-means, hierarchical clustering does not require the user to set the number of clusters in advance. This is particularly helpful when the number of natural groupings is unknown.
Flexibility in cluster shapes: Hierarchical clustering can capture clusters of different shapes and sizes more readily than k-means, which favours roughly spherical clusters (see the sketch after this list).
Easy to implement and understand: The algorithm is straightforward, making it easy to implement and understand, even for beginners.
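The following sketch illustrates the point about flexible cluster shapes, assuming scikit-learn is available; make_moons generates a synthetic non-convex dataset used here only for demonstration.

```python
# Sketch of the "flexible shapes" point: single-linkage agglomerative clustering
# can follow the two interlocking half-moons, while k-means (which favours
# roughly spherical clusters) tends to split them incorrectly.
from sklearn.datasets import make_moons
from sklearn.cluster import AgglomerativeClustering, KMeans

X, y = make_moons(n_samples=300, noise=0.05, random_state=0)

hier_labels = AgglomerativeClustering(n_clusters=2, linkage="single").fit_predict(X)
kmeans_labels = KMeans(n_clusters=2, n_init=10, random_state=0).fit_predict(X)

# Comparing hier_labels and kmeans_labels against y (e.g. with adjusted_rand_score)
# typically shows the single-linkage result matching the true moons far better.
```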
Scalability and computational complexity: One of the main disadvantages of hierarchical clustering is its computational cost, which is typically quadratic or worse in the number of observations, making it poorly suited to large datasets.
Sensitivity to outliers and noise: Hierarchical clustering is sensitive to outliers and noise, which can significantly warp the structure of the dendrogram and lead to misleading interpretations.
Difficulty in identifying the optimal number of clusters: While not having to specify the number of clusters upfront is an advantage, determining the optimal number from the dendrogram can be a challenge, since the choice of cut is ultimately a subjective call by the user.
The dendrogram is the primary tool for interpreting hierarchical clustering results. The height at which two clusters merge represents the distance between them. Analysts often cut the dendrogram at a threshold height to determine the number of clusters to consider. The choice of distance metric (e.g., Euclidean, Manhattan) and linkage criterion (e.g., complete, average, single) also affects both the clustering outcome and how analysts interpret it.
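A sketch of building and cutting a dendrogram with SciPy follows; the synthetic data and the distance threshold of 5.0 are arbitrary illustrative choices, not recommendations.

```python
# Build a dendrogram with SciPy and cut it at a chosen height.
import numpy as np
import matplotlib.pyplot as plt
from scipy.cluster.hierarchy import linkage, dendrogram, fcluster

rng = np.random.default_rng(0)
# Three synthetic blobs, invented for illustration.
X = np.vstack([rng.normal(loc=c, scale=0.5, size=(20, 2)) for c in ((0, 0), (5, 5), (0, 5))])

# Ward linkage on Euclidean distances; other linkages (single, complete, average)
# and other metrics would generally produce a different tree.
Z = linkage(X, method="ward")

dendrogram(Z)                      # merge heights on the y-axis show cluster distances
plt.axhline(y=5.0, color="red")    # a horizontal cut line at the chosen threshold
plt.show()

labels = fcluster(Z, t=5.0, criterion="distance")  # flat clusters below the cut height
print(np.unique(labels))
```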
Hierarchical clustering example from https://towardsdatascience.com/hierarchical-clustering-explained-e59b13846da8
For example, cutting the dendrogram in this graph at the cluster distance marked by the red line would produce 4 clusters.
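The same cut can be made programmatically by asking for a fixed number of flat clusters rather than eyeballing a height. This assumes Z is a linkage matrix such as the one computed in the previous snippet.

```python
# Ask SciPy directly for a fixed number of flat clusters.
from scipy.cluster.hierarchy import fcluster

labels = fcluster(Z, t=4, criterion="maxclust")  # cut the tree into (at most) 4 clusters
```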
Hierarchical clustering has a wide range of applications across various domains:
Biology: For genetic and species classification based on evolutionary distances.
Finance: To cluster stocks with similar price movements.
Marketing: For customer segmentation based on purchasing behavior.
Information Retrieval: To organize and cluster documents or websites based on content similarity.
Hierarchical clustering offers a versatile and intuitive approach to understanding the underlying structure of data without the need to pre-specify the number of clusters.
Despite its computational limitations and sensitivity to noise, its applications across different fields showcase its utility in extracting meaningful patterns from data.
Analysts can gain a deeper understanding of complex datasets through careful selection of distance measures and linkage criteria for hierarchical clustering.