Introduction
The previous post provided a practical application of using K-means Clustering to real-world dataset. Hierarchical clustering is an alternative approach to k-means clustering for identifying groups in the dataset. It does not require us to pre-specify the number of clusters to be generated as is required by the k-means approach. Furthermore, hierarchical clustering has an added advantage over K-means clustering in that it results in an attractive tree-based representation of the observations, called a dendrogram.
R Packages Required and Data Preparation
To perform a clustering algorithm for both Hierarchical and K-means clustering in R, generally, the data should be prepared as follows:
Rows are observations (individuals) and columns are variables.
Any missing value in the data must be removed or imputed.
The data must be standardized (scaled) to make variables comparable.
This post uses USArrests dataset, which contains statistics in arrests per 100,000 residents for assault, murder, and rape in each of the 50 US states in 1973. It includes also the percent of the population living in urban areas.
The required packages are:
library(tidyverse) # Data manipulation.
library(factoextra) # Clustering visualization.
R Codes
# Clear our workspace:
rm(list = ls())
# Load data
data(USArrests)
# Normalize 0-1 datasets:
df <- USArrests %>% mutate_all(function(x) {(x - min(x)) / (max(x) - min(x))})
# Set rowname:
row.names(df) <- row.names(USArrests)
# Compute distances:
dd <- dist(df, method = "euclidean")
# Visualize the dissimilarity:
fviz_dist(dd, lab_size = 7)

# Perform hierarchical clustering:
hc <- hclust(dd, method = "ward.D2")
# Set theme for all plots:
library(extrafont)
my_font <- "Roboto Condensed"
# Create a draft of dendrogram by using fviz_dend() function:
fviz_dend(hc,
k = 4,
cex = 0.5,
rect = TRUE,
rect_fill = TRUE,
horiz = FALSE,
palette = "jco",
rect_border = "jco",
color_labels_by_k = TRUE) -> basic_plot
# Decorate the draft:
basic_plot +
theme_gray() +
theme(plot.margin = unit(rep(0.7, 4), "cm")) +
theme(text = element_text(family = my_font)) +
labs(title = "Dendrogram based on Hierarchical Clustering (Version 1)",
caption = "Source: World Almanac and Book of facts 1975")

In the dendrogram displayed above, each leaf corresponds to one observation. As we move up the tree, observations that are similar to each other are combined into branches, which are themselves fused at a higher height.
The height of the fusion, provided on the vertical axis, indicates the (dis)similarity between two observations. The higher the height of the fusion, the less similar the observations are. Note that, conclusions about the proximity of two observations can be drawn only based on the height where branches containing those two observations first are fused. We cannot use the proximity of two observations along the horizontal axis as a criteria of their similarity.
The height of the cut to the dendrogram controls the number of clusters obtained. It plays the same role as the k in k-means clustering. In order to identify sub-groups (clusters), we can cut the dendrogram with cutree()
function:
# Cut tree into 4 groups:
sub_grp <- cutree(hc, k = 4)
# Create plot of clusters:
fviz_cluster(list(data = df, cluster = paste0("Group", sub_grp)),
alpha = 1,
palette = "jco",
labelsize = 9,
ellipse.type = "norm") -> cluster_plot
# Decorate the plot:
cluster_plot +
theme(legend.position = c(0.1, 0.8)) +
theme(plot.margin = unit(rep(0.5, 4), "cm")) +
theme(text = element_text(family = my_font)) +
labs(title = "Cluster based on Hierarchical Clustering",
caption = "Source: World Almanac and Book of facts 1975")

And we can extract the clusters and add to our initial data to do some descriptive statistics at the cluster level:
USArrests %>%
mutate(Cluster = paste0("Group", sub_grp)) %>%
group_by(Cluster) %>%
summarise_all(mean) %>%
mutate_if(is.numeric, function(x) {round(x, 2)}) %>%
knitr::kable()
Group1 |
14.67 |
251.29 |
54.29 |
21.69 |
Group2 |
10.82 |
257.38 |
76.00 |
33.19 |
Group3 |
6.06 |
140.06 |
71.33 |
18.68 |
Group4 |
3.09 |
76.00 |
52.08 |
11.83 |
LS0tDQp0aXRsZTogJ0hpZXJhcmNoaWNhbCBDbHVzdGVyaW5nJw0KYXV0aG9yOiAnQXV0aG9yOiBOZ3V5ZW4gQ2hpIER1bmcnDQpzdWJ0aXRsZTogIlIgTWFjaGluZSBMZWFybmluZyBTZXJpZXMiDQpvdXRwdXQ6DQogIGh0bWxfZG9jdW1lbnQ6IA0KICAgIGNvZGVfZG93bmxvYWQ6IHRydWUNCiAgICAjIGNvZGVfZm9sZGluZzogaGlkZQ0KICAgIGhpZ2hsaWdodDogemVuYnVybg0KICAgICMgbnVtYmVyX3NlY3Rpb25zOiB5ZXMNCiAgICB0aGVtZTogImZsYXRseSINCiAgICB0b2M6IFRSVUUNCiAgICB0b2NfZmxvYXQ6IFRSVUUNCi0tLQ0KDQpgYGB7ciBzZXR1cCxpbmNsdWRlPUZBTFNFfQ0Ka25pdHI6Om9wdHNfY2h1bmskc2V0KGVjaG8gPSBUUlVFLCB3YXJuaW5nID0gRkFMU0UsIG1lc3NhZ2UgPSBGQUxTRSwgY2FjaGUgPSBUUlVFKQ0KDQpgYGANCg0KDQojIEludHJvZHVjdGlvbg0KDQpUaGUgW3ByZXZpb3VzIHBvc3RdKGh0dHBzOi8vcnB1YnMuY29tL2NoaWR1bmdrdC81Mzk3NTApIHByb3ZpZGVkIGEgcHJhY3RpY2FsIGFwcGxpY2F0aW9uIG9mIHVzaW5nIEstbWVhbnMgQ2x1c3RlcmluZyB0byByZWFsLXdvcmxkIGRhdGFzZXQuIFtIaWVyYXJjaGljYWwgY2x1c3RlcmluZ10oaHR0cHM6Ly9lbi53aWtpcGVkaWEub3JnL3dpa2kvSGllcmFyY2hpY2FsX2NsdXN0ZXJpbmcpIGlzIGFuIGFsdGVybmF0aXZlIGFwcHJvYWNoIHRvIGstbWVhbnMgY2x1c3RlcmluZyBmb3IgaWRlbnRpZnlpbmcgZ3JvdXBzIGluIHRoZSBkYXRhc2V0LiBJdCBkb2VzIG5vdCByZXF1aXJlIHVzIHRvIHByZS1zcGVjaWZ5IHRoZSBudW1iZXIgb2YgY2x1c3RlcnMgdG8gYmUgZ2VuZXJhdGVkIGFzIGlzIHJlcXVpcmVkIGJ5IHRoZSBrLW1lYW5zIGFwcHJvYWNoLiBGdXJ0aGVybW9yZSwgaGllcmFyY2hpY2FsIGNsdXN0ZXJpbmcgaGFzIGFuIGFkZGVkIGFkdmFudGFnZSBvdmVyIEstbWVhbnMgY2x1c3RlcmluZyBpbiB0aGF0IGl0IHJlc3VsdHMgaW4gYW4gYXR0cmFjdGl2ZSB0cmVlLWJhc2VkIHJlcHJlc2VudGF0aW9uIG9mIHRoZSBvYnNlcnZhdGlvbnMsIGNhbGxlZCBhIFtkZW5kcm9ncmFtXShodHRwczovL2VuLndpa2lwZWRpYS5vcmcvd2lraS9EZW5kcm9ncmFtKS4gDQoNCiMgUiBQYWNrYWdlcyBSZXF1aXJlZCBhbmQgRGF0YSBQcmVwYXJhdGlvbg0KDQpUbyBwZXJmb3JtIGEgY2x1c3RlcmluZyBhbGdvcml0aG0gZm9yIGJvdGggSGllcmFyY2hpY2FsIGFuZCBLLW1lYW5zIGNsdXN0ZXJpbmcgaW4gUiwgZ2VuZXJhbGx5LCB0aGUgZGF0YSBzaG91bGQgYmUgcHJlcGFyZWQgYXMgZm9sbG93czoNCg0KMS4gUm93cyBhcmUgb2JzZXJ2YXRpb25zIChpbmRpdmlkdWFscykgYW5kIGNvbHVtbnMgYXJlIHZhcmlhYmxlcy4gDQoNCjIuIEFueSBtaXNzaW5nIHZhbHVlIGluIHRoZSBkYXRhIG11c3QgYmUgcmVtb3ZlZCBvciBpbXB1dGVkLg0KDQozLiBUaGUgZGF0YSBtdXN0IGJlIHN0YW5kYXJkaXplZCAoc2NhbGVkKSB0byBtYWtlIHZhcmlhYmxlcyBjb21wYXJhYmxlLg0KDQoNClRoaXMgcG9zdCB1c2VzICoqVVNBcnJlc3RzKiogZGF0YXNldCwgd2hpY2ggY29udGFpbnMgc3RhdGlzdGljcyBpbiBhcnJlc3RzIHBlciAxMDAsMDAwIHJlc2lkZW50cyBmb3IgYXNzYXVsdCwgbXVyZGVyLCBhbmQgcmFwZSBpbiBlYWNoIG9mIHRoZSA1MCBVUyBzdGF0ZXMgaW4gMTk3My4gSXQgaW5jbHVkZXMgYWxzbyB0aGUgcGVyY2VudCBvZiB0aGUgcG9wdWxhdGlvbiBsaXZpbmcgaW4gdXJiYW4gYXJlYXMuIA0KDQoNClRoZSByZXF1aXJlZCBwYWNrYWdlcyBhcmU6DQoNCmBgYHtyfQ0KbGlicmFyeSh0aWR5dmVyc2UpICAjIERhdGEgbWFuaXB1bGF0aW9uLiANCmxpYnJhcnkoZmFjdG9leHRyYSkgIyBDbHVzdGVyaW5nIHZpc3VhbGl6YXRpb24uIA0KYGBgDQoNCiMgUiBDb2Rlcw0KDQoNCmBgYHtyfQ0KIyBDbGVhciBvdXIgd29ya3NwYWNlOiANCnJtKGxpc3QgPSBscygpKQ0KDQojIExvYWQgZGF0YQ0KZGF0YShVU0FycmVzdHMpDQoNCiMgTm9ybWFsaXplIDAtMSBkYXRhc2V0czoNCg0KZGYgPC0gVVNBcnJlc3RzICU+JSBtdXRhdGVfYWxsKGZ1bmN0aW9uKHgpIHsoeCAtIG1pbih4KSkgLyAobWF4KHgpIC0gbWluKHgpKX0pDQoNCiMgU2V0IHJvd25hbWU6IA0Kcm93Lm5hbWVzKGRmKSA8LSByb3cubmFtZXMoVVNBcnJlc3RzKQ0KDQojIENvbXB1dGUgZGlzdGFuY2VzOiANCmRkIDwtIGRpc3QoZGYsIG1ldGhvZCA9ICJldWNsaWRlYW4iKQ0KDQojIFZpc3VhbGl6ZSB0aGUgZGlzc2ltaWxhcml0eTogDQpmdml6X2Rpc3QoZGQsIGxhYl9zaXplID0gNykNCg0KIyBQZXJmb3JtIGhpZXJhcmNoaWNhbCBjbHVzdGVyaW5nOiANCmhjIDwtIGhjbHVzdChkZCwgbWV0aG9kID0gIndhcmQuRDIiKQ0KDQojIFNldCB0aGVtZSBmb3IgYWxsIHBsb3RzOiANCg0KbGlicmFyeShleHRyYWZvbnQpDQpteV9mb250IDwtICJSb2JvdG8gQ29uZGVuc2VkIg0KDQojIENyZWF0ZSBhIGRyYWZ0IG9mIGRlbmRyb2dyYW0gYnkgdXNpbmcgZnZpel9kZW5kKCkgZnVuY3Rpb246IA0KDQpmdml6X2RlbmQoaGMsIA0KICAgICAgICAgIGsgPSA0LCAgIA0KICAgICAgICAgIGNleCA9IDAuNSwgDQogICAgICAgICAgcmVjdCA9IFRSVUUsIA0KICAgICAgICAgIHJlY3RfZmlsbCA9IFRSVUUsIA0KICAgICAgICAgIGhvcml6ID0gRkFMU0UsIA0KICAgICAgICAgIHBhbGV0dGUgPSAiamNvIiwgDQogICAgICAgICAgcmVjdF9ib3JkZXIgPSAiamNvIiwgDQogICAgICAgICAgY29sb3JfbGFiZWxzX2J5X2sgPSBUUlVFKSAtPiBiYXNpY19wbG90DQoNCiMgRGVjb3JhdGUgdGhlIGRyYWZ0OiANCg0KYmFzaWNfcGxvdCArIA0KICB0aGVtZV9ncmF5KCkgKyANCiAgdGhlbWUocGxvdC5tYXJnaW4gPSB1bml0KHJlcCgwLjcsIDQpLCAiY20iKSkgKyANCiAgdGhlbWUodGV4dCA9IGVsZW1lbnRfdGV4dChmYW1pbHkgPSBteV9mb250KSkgKyANCiAgbGFicyh0aXRsZSA9ICJEZW5kcm9ncmFtIGJhc2VkIG9uIEhpZXJhcmNoaWNhbCBDbHVzdGVyaW5nIChWZXJzaW9uIDEpIiwgDQogICAgICAgY2FwdGlvbiA9ICJTb3VyY2U6IFdvcmxkIEFsbWFuYWMgYW5kIEJvb2sgb2YgZmFjdHMgMTk3NSIpDQoNCg0KYGBgDQoNCg0KSW4gdGhlIGRlbmRyb2dyYW0gZGlzcGxheWVkIGFib3ZlLCBlYWNoIGxlYWYgY29ycmVzcG9uZHMgdG8gb25lIG9ic2VydmF0aW9uLiBBcyB3ZSBtb3ZlIHVwIHRoZSB0cmVlLCBvYnNlcnZhdGlvbnMgdGhhdCBhcmUgc2ltaWxhciB0byBlYWNoIG90aGVyIGFyZSBjb21iaW5lZCBpbnRvIGJyYW5jaGVzLCB3aGljaCBhcmUgdGhlbXNlbHZlcyBmdXNlZCBhdCBhIGhpZ2hlciBoZWlnaHQuDQoNClRoZSBoZWlnaHQgb2YgdGhlIGZ1c2lvbiwgcHJvdmlkZWQgb24gdGhlIHZlcnRpY2FsIGF4aXMsIGluZGljYXRlcyB0aGUgKGRpcylzaW1pbGFyaXR5IGJldHdlZW4gdHdvIG9ic2VydmF0aW9ucy4gVGhlIGhpZ2hlciB0aGUgaGVpZ2h0IG9mIHRoZSBmdXNpb24sIHRoZSBsZXNzIHNpbWlsYXIgdGhlIG9ic2VydmF0aW9ucyBhcmUuIE5vdGUgdGhhdCwgY29uY2x1c2lvbnMgYWJvdXQgdGhlIHByb3hpbWl0eSBvZiB0d28gb2JzZXJ2YXRpb25zIGNhbiBiZSBkcmF3biBvbmx5IGJhc2VkIG9uIHRoZSBoZWlnaHQgd2hlcmUgYnJhbmNoZXMgY29udGFpbmluZyB0aG9zZSB0d28gb2JzZXJ2YXRpb25zIGZpcnN0IGFyZSBmdXNlZC4gV2UgY2Fubm90IHVzZSB0aGUgcHJveGltaXR5IG9mIHR3byBvYnNlcnZhdGlvbnMgYWxvbmcgdGhlIGhvcml6b250YWwgYXhpcyBhcyBhIGNyaXRlcmlhIG9mIHRoZWlyIHNpbWlsYXJpdHkuDQoNClRoZSBoZWlnaHQgb2YgdGhlIGN1dCB0byB0aGUgZGVuZHJvZ3JhbSBjb250cm9scyB0aGUgbnVtYmVyIG9mIGNsdXN0ZXJzIG9idGFpbmVkLiBJdCBwbGF5cyB0aGUgc2FtZSByb2xlIGFzIHRoZSBrIGluIGstbWVhbnMgY2x1c3RlcmluZy4gSW4gb3JkZXIgdG8gaWRlbnRpZnkgc3ViLWdyb3VwcyAoY2x1c3RlcnMpLCB3ZSBjYW4gY3V0IHRoZSBkZW5kcm9ncmFtIHdpdGggYGN1dHJlZSgpYCBmdW5jdGlvbjogDQoNCg0KYGBge3J9DQojIEN1dCB0cmVlIGludG8gNCBncm91cHM6IA0Kc3ViX2dycCA8LSBjdXRyZWUoaGMsIGsgPSA0KQ0KDQojIENyZWF0ZSBwbG90IG9mIGNsdXN0ZXJzOiANCmZ2aXpfY2x1c3RlcihsaXN0KGRhdGEgPSBkZiwgY2x1c3RlciA9IHBhc3RlMCgiR3JvdXAiLCBzdWJfZ3JwKSksIA0KICAgICAgICAgICAgIGFscGhhID0gMSwgDQogICAgICAgICAgICAgcGFsZXR0ZSA9ICJqY28iLCANCiAgICAgICAgICAgICBsYWJlbHNpemUgPSA5LCANCiAgICAgICAgICAgICBlbGxpcHNlLnR5cGUgPSAibm9ybSIpIC0+IGNsdXN0ZXJfcGxvdA0KDQojIERlY29yYXRlIHRoZSBwbG90OiANCmNsdXN0ZXJfcGxvdCArIA0KICB0aGVtZShsZWdlbmQucG9zaXRpb24gPSBjKDAuMSwgMC44KSkgKyANCiAgdGhlbWUocGxvdC5tYXJnaW4gPSB1bml0KHJlcCgwLjUsIDQpLCAiY20iKSkgKyANCiAgdGhlbWUodGV4dCA9IGVsZW1lbnRfdGV4dChmYW1pbHkgPSBteV9mb250KSkgKyANCiAgbGFicyh0aXRsZSA9ICJDbHVzdGVyIGJhc2VkIG9uIEhpZXJhcmNoaWNhbCBDbHVzdGVyaW5nIiwgDQogICAgICAgY2FwdGlvbiA9ICJTb3VyY2U6IFdvcmxkIEFsbWFuYWMgYW5kIEJvb2sgb2YgZmFjdHMgMTk3NSIpIA0KYGBgDQoNCg0KQW5kIHdlIGNhbiBleHRyYWN0IHRoZSBjbHVzdGVycyBhbmQgYWRkIHRvIG91ciBpbml0aWFsIGRhdGEgdG8gZG8gc29tZSBkZXNjcmlwdGl2ZSBzdGF0aXN0aWNzIGF0IHRoZSBjbHVzdGVyIGxldmVsOg0KDQpgYGB7cn0NClVTQXJyZXN0cyAlPiUNCiAgbXV0YXRlKENsdXN0ZXIgPSBwYXN0ZTAoIkdyb3VwIiwgc3ViX2dycCkpICU+JQ0KICBncm91cF9ieShDbHVzdGVyKSAlPiUNCiAgc3VtbWFyaXNlX2FsbChtZWFuKSAlPiUgDQogIG11dGF0ZV9pZihpcy5udW1lcmljLCBmdW5jdGlvbih4KSB7cm91bmQoeCwgMil9KSAlPiUgDQogIGtuaXRyOjprYWJsZSgpDQpgYGANCg0KIyBSZWZlcmVuY2VzDQoNCjEuIGh0dHBzOi8vd3d3LmRhdGFub3ZpYS5jb20vZW4vYmxvZy9jbHVzdGVyLWFuYWx5c2lzLWluLXItc2ltcGxpZmllZC1hbmQtZW5oYW5jZWQvDQoyLiBodHRwOi8vd3d3LnN0aGRhLmNvbS9lbmdsaXNoL2FydGljbGVzLzMxLXByaW5jaXBhbC1jb21wb25lbnQtbWV0aG9kcy1pbi1yLXByYWN0aWNhbC1ndWlkZS8xMTItcGNhLXByaW5jaXBhbC1jb21wb25lbnQtYW5hbHlzaXMtZXNzZW50aWFscy8NCjMuIGh0dHA6Ly93d3cuc3RoZGEuY29tL2VuZ2xpc2gvYXJ0aWNsZXMvMzEtcHJpbmNpcGFsLWNvbXBvbmVudC1tZXRob2RzLWluLXItcHJhY3RpY2FsLWd1aWRlLzExNy1oY3BjLWhpZXJhcmNoaWNhbC1jbHVzdGVyaW5nLW9uLXByaW5jaXBhbC1jb21wb25lbnRzLWVzc2VudGlhbHMvDQo0LiBodHRwOi8vd3d3LnN0aGRhLmNvbS9lbmdsaXNoL3dpa2kvZmFjdG9leHRyYS1yLXBhY2thZ2UtZWFzeS1tdWx0aXZhcmlhdGUtZGF0YS1hbmFseXNlcy1hbmQtZWxlZ2FudC12aXN1YWxpemF0aW9uDQo1LiBodHRwczovL3VjLXIuZ2l0aHViLmlvL2hjX2NsdXN0ZXJpbmcNCjYuIGh0dHA6Ly93d3cuY29va2Jvb2stci5jb20vR3JhcGhzL0xlZ2VuZHNfKGdncGxvdDIpLw0KNy4gaHR0cHM6Ly9ibG9nLmV4cGxvcmF0b3J5LmlvL3Zpc3VhbGl6aW5nLWstbWVhbnMtY2x1c3RlcmluZy1yZXN1bHRzLXRvLXVuZGVyc3RhbmQtdGhlLWNoYXJhY3RlcmlzdGljcy1vZi1jbHVzdGVycy1iZXR0ZXItYjAyMjZmYjNkZDEwDQo4LiBodHRwczovL3d3dy5mcmVlY29kZWNhbXAub3JnL25ld3MvY2x1c3RlcmluZy10aGUtdG9wLTEtYXNzZXQtYW5hbHlzaXMtaW4tci02YzUyOWIzODJiNDIvDQo5LiBodHRwczovL3d3dy5odWJlci5lbWJsLmRlL21zbWIvQ2hhcC1NdWx0aXZhSGV0ZXJvLmh0bWwNCg==