1 Introduction

Nowadays, most industries are driven by customer behavior and data analysis. For any organization, these two factors help decide budgets, marketing schemes, and promotional campaigns, and help steer the business in a lucrative direction. For example, suppose you work for an online fashion retail store and would like to decide which discount offers to target at different types of customers to boost sales. It would be beneficial to have a tool in your arsenal that segments customers into groups based on their likes and dislikes, previous shopping history, and search queries, helping you design strategies to boost sales.


Figure 1.1: An example of boosting product sales by finding the target audience using clustering

In this tutorial, we will learn about Hierarchical clustering— the tool that will help us cluster data into groups based on the dissimilarity between the observations (see Figure 1.1 (Ali 2022)). To begin with, we place clustering within the broader categorization of machine learning algorithms— supervised and unsupervised. Then, we will focus on Hierarchical clustering, delving into the parameters that govern its behavior. Finally, we will polish our concepts through a demonstration using R on publicly available data.

2 Types of Machine Learning Algorithms

Machine learning, a sub-field of Artificial Intelligence (AI), refers to a computer’s ability to solve a problem without explicit instructions. There are many different types of learning in machine learning. In the context of Hierarchical clustering, it is sufficient to understand the difference between two types— supervised and unsupervised. The tutorial (available here) (Brownlee 2019) provides a detailed description of all types of learning with examples.

To understand supervised and unsupervised learning, we first define two terms: the input variables, which are the attributes of the data used to identify patterns or predict an outcome, and the label, which is the target to be predicted and may contain binary or multi-category information.

2.1 Supervised learning

In supervised learning, the dataset has labels to “supervise” the algorithm as it identifies patterns or learns relationships between the input variables and the label. Supervised learning subsumes different types of problems, not limited to the ones mentioned below:

  • Classification— categorizing a data point into predefined classes. For example, classifying an email as spam or not. Algorithms such as random forests and support vector machines are commonly used for classification.
  • Regression— predicting a continuous label value. For example, house prices can be predicted using attributes such as the number of bedrooms, square-foot area, availability of a garage, etc. Regression algorithms include linear regression and polynomial regression.

2.2 Unsupervised learning

In unsupervised learning, the dataset doesn’t have labels to “supervise” the algorithms, leaving them to discover patterns in the dataset on their own. Unsupervised learning subsumes different types of problems, not limited to the ones mentioned below:

  • Clustering— grouping data points based on their similarities or differences. Algorithms such as K-means and Hierarchical clustering are examples of clustering.

  • Dimensionality reduction— reducing the number of features (or dimensions) in the dataset when it is too high. Algorithms include principal component analysis and manifold learning.

3 Importance of Hierarchical Clustering

As discussed before, clustering helps in grouping data based on similarities or dissimilarities. It can be used in situations where we want to segment customers or recommend similar movies. Before learning specifically about Hierarchical clustering, it is important to understand why and when it is required by comparing it with K-means clustering. Figure 3.1 (Traverso A 2019) shows the steps involved in implementing the K-means algorithm to cluster the data. A brief procedure is mentioned below:

  1. Select the number of clusters, k, for clustering data and randomly pick k data points as cluster centroids.
  2. Assign data points to the k clusters— each data point is assigned to its nearest cluster centroid.
  3. Update the cluster centroids— take the mean of the data points in a cluster.
  4. Repeat steps 2 and 3 until the cluster centroids do not change.

Figure 3.1: K-means clustering algorithm
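
To make the procedure above concrete, here is a minimal sketch using base R’s kmeans() on simulated two-dimensional data. The toy data, the choice of k = 2, and the variable names are purely illustrative.

set.seed(42)
# two blobs of toy points, one around (0, 0) and one around (3, 3)
toyData <- rbind(matrix(rnorm(40, mean = 0), ncol = 2),
                 matrix(rnorm(40, mean = 3), ncol = 2));

# step 1: k must be chosen up front; nstart repeats the random initialization
km <- kmeans(toyData, centers = 2, nstart = 25);

km$centers   # final cluster centroids once steps 2 and 3 stop changing (step 4)
km$cluster   # nearest-centroid assignment for every data point (step 2)

Note that trying a different k requires rerunning the whole procedure, which is exactly the limitation discussed next.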

The K-means algorithm relies on predefining the number of clusters. Although there are ways of finding the optimal number of clusters (refer to this tutorial) (Banerjee 2022), they are not ideal since the decision is based on the sample data at hand and, hence, might not be robust to new data. Additionally, visualizing high-dimensional clustered data obtained from K-means requires an additional step— dimensionality reduction.

Hierarchical clustering avoids predefining the number of clusters and makes it easy to visualize data in high-dimensional space. Sounds like the perfect clustering tool? Let’s learn more about Hierarchical clustering to understand its advantages and disadvantages over K-means clustering.

4 Hierarchical Clustering

Hierarchical clustering merges unlabelled data into a hierarchy of clusters, forming a tree-shaped structure called a dendrogram. Before discussing how to build and interpret a dendrogram, it’s important to understand the two types of Hierarchical clustering— Agglomerative and Divisive (see Figure 4.1) (Jarman 2020).


Figure 4.1: Types of Hierarchical clustering

4.1 Agglomerative Clustering

It is a bottom-up approach in which every data point starts as its own cluster (a leaf of the tree). At each step, the two least dissimilar clusters are merged, and the process continues until all data points come under a single cluster.

4.2 Divisive Clustering

It is a top-down approach in which all data points are part of one single cluster (the root of the tree). At each step, the most heterogeneous cluster is divided into two clusters, and the process continues until all data points become individual clusters. In this tutorial, we will focus only on Agglomerative clustering.

5 Building blocks of Hierarchical Clustering

A few questions need to be answered before one can fully grasp the Agglomerative clustering algorithm, such as:

  • How do we calculate the dissimilarity between clusters?
  • How do we visualize the clustering?

The answers to these questions lie in understanding three topics— distance measure, linkage methods, and dendrogram.

5.1 Distance measures and Linkage methods

The dissimilarity between clusters can be measured by calculating the distance between them. Some of the commonly used distance measures are mentioned below.

  • Euclidean distance— It is the straight-line distance between two data points. In n-dimensional space, the Euclidean distance between two vectors, p and q, is \(d(p,q) = \sqrt{\sum_{i=1}^n (p_i - q_i)^2}\).

  • Manhattan distance— It is the distance between two data points measured along the axes at right angles. In n-dimensional space, the Manhattan distance between two vectors, p and q, is \(d(p,q) = \sum_{i=1}^n |p_i - q_i|\).

  • Cosine distance— It is based on the cosine similarity between two data points. In n-dimensional space, the cosine distance between two vectors, p and q, is \(d(p,q) = 1 - cosine\_similarity\), where the cosine similarity is \(cosine\_similarity = \cos(\theta) = \frac{\sum_{i=1}^n p_i q_i}{\sqrt{\sum_{i=1}^n p_i^2}\,\sqrt{\sum_{i=1}^n q_i^2}}\).

Figure 5.1 (Das 2020) illustrates the difference between the distance measures in a two-dimensional space.

Figure 5.1: An example illustrating the difference between Euclidean, Manhattan and Cosine distance between points A and B in a two-dimensional space
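
As a quick check on these formulas, the sketch below computes all three distances between two small vectors in R; the vectors p and q are made up for illustration.

# toy vectors, made up for illustration
p <- c(1, 2, 3);
q <- c(4, 0, 3);

# Euclidean and Manhattan distances via dist() on a two-row matrix
dist(rbind(p, q), method = "euclidean")
dist(rbind(p, q), method = "manhattan")

# cosine distance computed directly from the formula above
cosineSimilarity <- sum(p * q) / (sqrt(sum(p^2)) * sqrt(sum(q^2)));
1 - cosineSimilarity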

The distance measures mentioned above explain how to calculate the distance between two points. But how do we calculate the distance between two clusters? There are several ways to do so, called linkage methods, and the choice affects the type of clusters we get. The common linkage methods are given below.

  • Single Linkage— It is the shortest distance between two clusters, A and B. To compute it, we calculate the pairwise distances between data points in clusters A and B and use the smallest distance.

  • Complete Linkage— It is the farthest distance between two clusters, A and B. To compute it, we calculate the pairwise distances between data points in clusters A and B and use the largest distance.

  • Average Linkage— It is the average of the distances between all pairs of points in the two clusters, A and B. To compute it, we calculate the pairwise distances between data points in clusters A and B and take their average.

  • Centroid Linkage— It is the distance between the centroids (mean) of clusters A and B.

Figure 5.2 (Bonthu 2021) illustrates the different types of linkage methods between two clusters, A and B, in a two-dimensional space.


Figure 5.2: An example illustrating the different types of linkage methods in a two-dimensional space
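
To make the linkage definitions concrete, the sketch below computes the single, complete, average, and centroid linkage distances between two tiny, made-up clusters from their pairwise Euclidean distances.

# two tiny, made-up clusters in two-dimensional space
clusterA <- matrix(c(0, 0, 1, 0), ncol = 2, byrow = TRUE);
clusterB <- matrix(c(4, 3, 5, 4), ncol = 2, byrow = TRUE);

# pairwise distances between every point in A (rows) and every point in B (columns)
pairwise <- as.matrix(dist(rbind(clusterA, clusterB)))[1:2, 3:4];

min(pairwise)    # Single linkage: smallest pairwise distance
max(pairwise)    # Complete linkage: largest pairwise distance
mean(pairwise)   # Average linkage: mean of all pairwise distances

# Centroid linkage: distance between the cluster means
dist(rbind(colMeans(clusterA), colMeans(clusterB)))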

5.2 Understanding the Dendrogram

A dendrogram is a diagram that summarizes the distances between clusters. It helps visualize the different clusters in the data, as seen in Figure 5.3 (Polamuri 2020). The left-hand side of the figure shows a scatter plot of 7 data points, P1 to P7, in a two-dimensional space. The right-hand side shows a dendrogram for the data points, where the y-axis represents the distance measure.

To interpret the dendrogram, we focus on the y-axis. The dendrogram shows that P6 and P7 are the most similar clusters since the height of the link joining them is the shortest (as is evident from the scatter plot as well). The cluster containing P5 and the newly merged cluster containing P6 and P7 form the second most similar pair. Furthermore, the cluster containing P5, P6, and P7 and the one containing P4 are joined at the maximum height, showing that they are the most dissimilar. By drawing a horizontal line at a given height, we can group the data into clusters based on the formed (linked) clusters below the line.

There are several misconceptions related to the interpretation of a dendrogram such as:

  1. The closeness of two clusters along the horizontal axis reflects their similarity/dissimilarity. It does not; only the height at which clusters merge carries information about dissimilarity.
  2. It is tempting to use the shape of the dendrogram to determine the number of clusters. However, there are cases when the dendrogram can depict an incorrect number of clusters, for instance when there is a large amount of data.

Hence, we should verify the optimal number of clusters obtained from the shape of the dendrogram against methods such as the elbow and average silhouette methods (refer to this tutorial) (Hierarchical Cluster Analysis, n.d.).


Figure 5.3: An example of a dendrogram (right) based on a scatter plot (left)
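
The sketch below reproduces this idea on a handful of made-up two-dimensional points: it builds a dendrogram with hclust() and then “draws the horizontal line” with cutree(), which returns the cluster membership at a chosen height. The data and the cut height are illustrative.

set.seed(1)
# seven made-up points, labelled P1 to P7
pts <- matrix(rnorm(14), ncol = 2, dimnames = list(paste0("P", 1:7), c("x", "y")));

# agglomerative clustering and its dendrogram
hcToy <- hclust(dist(pts), method = "average");
plot(as.dendrogram(hcToy))

cutree(hcToy, h = 1.5)   # memberships below a horizontal line at height 1.5
cutree(hcToy, k = 3)     # equivalently, request a fixed number of clusters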

6 Working of Agglomerative Clustering

The steps involved in implementing Agglomerative clustering are as follows (a small sketch tracing them on toy data is given after the list):

  1. Make each point a cluster.
  2. Calculate the distance matrix using the appropriate distance measure and linkage method. A distance matrix is a square matrix containing distances between each pair of clusters.
  3. Using the distance matrix, merge the clusters with the smallest distance between them.
  4. Repeat steps 2 and 3 until only one cluster is left.
  5. Find the number of clusters using a dendrogram and verify it using the elbow method (or average silhouette).
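
As a small trace of these steps on illustrative toy data, the sketch below prints the initial distance matrix and then inspects the merge history recorded by hclust().

set.seed(7)
# five made-up points, labelled P1 to P5
fivePts <- matrix(rnorm(10), ncol = 2, dimnames = list(paste0("P", 1:5), c("x", "y")));

# step 2: the distance matrix between the current clusters (initially, single points)
d <- dist(fivePts, method = "euclidean");
round(as.matrix(d), 2)

# steps 3-4: hclust() repeatedly merges the closest pair until one cluster remains
hcTrace <- hclust(d, method = "average");
hcTrace$merge    # which clusters were merged at each step
hcTrace$height   # the distance at which each merge happened

Negative entries in hcTrace$merge refer to the original points, while positive entries refer to clusters formed at earlier steps.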

7 Hands-on Demonstration using R

It’s time to implement Hierarchical Clustering in R! We will be utilizing the Formula 1 World Championship dataset available for download here (Vopani 2022). The Formula 1 World Championship, one of the premier forms of auto racing, involves a series of races held across the globe annually, each race called a Grand Prix. Participating F1 teams are known as constructors, and each constructor team has two racers.

The purpose of using the dataset is to cluster constructors based on their performance during the era of hybrid engines, 2014 to 2021, and to provide valuable insights based on points earned, fastest lap speed and fastest lap time in each race. If you are new to the world of F1 and curious about it, I would highly recommend watching “Formula 1: Drive to Survive” available on Netflix.

Let’s load the required packages in R.

####################
# Load libraries 
####################
library(tidyverse)
library(dplyr)
library(lubridate)
library(magrittr)
library(ggfortify)

####################
# Clear environment
####################
rm(list = ls());

7.1 Data Curation

The F1 dataset on Kaggle contains multiple CSV files with varied information. To gauge the constructors’ performance, we will require only 3 CSV files— races.csv, constructors.csv, and results.csv.

# load files
races <- read.csv2("./data/races.csv", sep = ',', na.strings = "\\N");
constructors <- read.csv2('./data/constructors.csv', sep = ',', na.strings = "\\N");
results <-read.csv2('./data/results.csv', sep = ',', na.strings = "\\N");

In the above code, na.strings specifies a string that gets replaced with NA. The results.csv file contains information about each constructor’s performance for both of its drivers in every race from 1950 to 2022. The code below selects the required columns from results.csv, merges in the constructor’s name and the year from the other 2 CSV files, and filters the data for the period— 2014 to 2021.

# select the attributes to use.
data <- results %>%
  select(raceId, constructorId, points, fastestLapTime, fastestLapSpeed);

# Add the year of the grand prix based on raceId.
data<-data%>% 
  inner_join(races[,c("raceId", "year","name")], by = "raceId")%>%
  rename(grandPrix = `name`)%>%
  select(-c("raceId"));
# Add constructor's name based on constructorId
data<-data%>%
  inner_join(constructors[,c("constructorId","name")], by = "constructorId")%>%
  rename(constructorName = `name`) %>%
  select(-c("constructorId"));

# select race data between 2014 and 2021
data <- data[data$year>=2014 & data$year<=2021,];

head(data)
##       points fastestLapTime fastestLapSpeed year             grandPrix
## 22128     25       1:32.478         206.436 2014 Australian Grand Prix
## 22129     18       1:33.066         205.131 2014 Australian Grand Prix
## 22130     15       1:32.917         205.460 2014 Australian Grand Prix
## 22131     12       1:33.186         204.867 2014 Australian Grand Prix
## 22132     10       1:32.616         206.128 2014 Australian Grand Prix
## 22133      8       1:32.568         206.235 2014 Australian Grand Prix
##       constructorName
## 22128        Mercedes
## 22129         McLaren
## 22130         McLaren
## 22131         Ferrari
## 22132        Williams
## 22133     Force India

The final data frame, data, contains 6 attributes:

  • year— the year in which the race was held.
  • constructorName— the name of the constructor.
  • points— points earned by a driver in a race.
  • fastestLapSpeed— the fastest lap speed for a driver in a race (in km/h).
  • fastestLapTime— the fastest lap time for a driver in a race (in minutes:seconds.milliseconds).
  • grandPrix— circuit location for the race.

7.2 Data Analysis

The constructors’ team names have changed multiple times over the years. Hence, we replace the names representing the same constructor with the most recent name.

# create a new variable to store the analysis results.
dataAnalyzed = data.frame(data);
replaceConstructorName <- c('Force India'= "Aston Martin", 'Racing Point' = "Aston Martin", 
                            'Renault' = "Alpine F1 Team", 'Lotus F1' = "Alpine F1 Team",
                            'Toro Rosso' = "AlphaTauri", 'Sauber' = "Alfa Romeo",
                            'Haas F1 Team' = "Marussia", 'Haas F1 Team' = "Manor Marussia");
dataAnalyzed$constructorName <- str_replace_all(dataAnalyzed$constructorName, replaceConstructorName)

To perform a fair comparison between constructors, we select only those constructors that participated in every year from 2014 to 2021. Furthermore, we select only those Grand Prix circuits that hosted a race every year from 2014 to 2021.

# use constructors present in all years for comparison
numYears<- unique(dataAnalyzed$year);
commonConstructors<-dataAnalyzed[dataAnalyzed$year==numYears[1],c("constructorName")];
for (i in 2:length(numYears)){
  temp<-dataAnalyzed[dataAnalyzed$year==numYears[i],c("constructorName")];
  commonConstructors<-intersect(commonConstructors, temp);
}
commonConstructors    
## [1] "Mercedes"       "McLaren"        "Ferrari"        "Williams"      
## [5] "Aston Martin"   "AlphaTauri"     "Alfa Romeo"     "Alpine F1 Team"
## [9] "Red Bull"
# select only the common constructors
dataAnalyzed<- dataAnalyzed[dataAnalyzed$constructorName %in% commonConstructors,];

# since different circuits can affect the teams' performance, we take only those
# circuits that hosted a race every year from 2014 to 2021.
commonPrix <- dataAnalyzed[dataAnalyzed$year==numYears[1],c("grandPrix")];
for (i in 2:length(numYears)){
  temp<-dataAnalyzed[dataAnalyzed$year==numYears[i],c("grandPrix")];
  commonPrix<-intersect(commonPrix, temp);
}
commonPrix
## [1] "Bahrain Grand Prix"   "Spanish Grand Prix"   "Austrian Grand Prix" 
## [4] "British Grand Prix"   "Hungarian Grand Prix" "Belgian Grand Prix"  
## [7] "Italian Grand Prix"   "Russian Grand Prix"   "Abu Dhabi Grand Prix"
# select only the common circuits
dataAnalyzed<- dataAnalyzed[dataAnalyzed$grandPrix %in% commonPrix,];

7.3 Data Preprocessing

Let’s check for missing values in the data.

# create a new variable to store the preprocessing results.
dataPreprocessed <- data.frame(dataAnalyzed);

# check if there are missing values in the data
sapply(dataPreprocessed, function(x) sum(is.na(x)))
##          points  fastestLapTime fastestLapSpeed            year       grandPrix 
##               0              81              81               0               0 
## constructorName 
##               0

There are 81 rows with missing values in columns fastestLapSpeed and fastestLapTime. We simply remove the rows with missing values and confirm the removal.

# removing rows with NA
dataPreprocessed <- na.omit(dataPreprocessed);
# confirm removal of NA
sapply(dataPreprocessed, function(x) sum(is.na(x)))
##          points  fastestLapTime fastestLapSpeed            year       grandPrix 
##               0               0               0               0               0 
## constructorName 
##               0

Let’s check the data types for the columns.

glimpse(dataPreprocessed)
## Rows: 1,215
## Columns: 6
## $ points          <chr> "25", "18", "15", "12", "10", "8", "6", "4", "2", "1",…
## $ fastestLapTime  <chr> "1:37.108", "1:37.020", "1:39.320", "1:39.269", "1:38.…
## $ fastestLapSpeed <chr> "200.634", "200.816", "196.165", "196.266", "197.228",…
## $ year            <int> 2014, 2014, 2014, 2014, 2014, 2014, 2014, 2014, 2014, …
## $ grandPrix       <chr> "Bahrain Grand Prix", "Bahrain Grand Prix", "Bahrain G…
## $ constructorName <chr> "Mercedes", "Mercedes", "Aston Martin", "Red Bull", "A…

To process the data for clustering, we need to make the following changes in terms of data types:

  1. Convert points from character to integer.
  2. Convert fastestLapTime from character to double and change units to milliseconds.
  3. Convert fastestLapSpeed from character to double.
  4. Convert constructorName and grandPrix to factor.
# change column types 
# 1. 
dataPreprocessed$points <- sapply(dataPreprocessed$points, as.integer);

#2. 
my_options <- options(digits.secs = 3)  ;   # Modify and save default global options
dataPreprocessed <- dataPreprocessed %>%
  mutate(temp = strptime(fastestLapTime,format = "%M:%OS"))%>%
  select(-c(fastestLapTime))%>%
  mutate(fastestLapTime = minute(temp)*60*1000 + second(temp)*1000)%>%
  select(-c(temp));

# 3.
dataPreprocessed$fastestLapSpeed = as.numeric(dataPreprocessed$fastestLapSpeed);

# 4.
chr2factorCols <-c("constructorName", "grandPrix");
dataPreprocessed[chr2factorCols] <- lapply(dataPreprocessed[chr2factorCols], as.factor);   # lapply keeps each column as a factor

# checkout the data attributes and types after processing
glimpse(dataPreprocessed)
## Rows: 1,215
## Columns: 6
## $ points          <int> 25, 18, 15, 12, 10, 8, 6, 4, 2, 1, 0, 0, 0, 0, 0, 0, 0…
## $ fastestLapSpeed <dbl> 200.634, 200.816, 196.165, 196.266, 197.228, 196.181, …
## $ year            <int> 2014, 2014, 2014, 2014, 2014, 2014, 2014, 2014, 2014, …
## $ grandPrix       <fct> Bahrain Grand Prix, Bahrain Grand Prix, Bahrain Gran…
## $ constructorName <fct> Mercedes, Mercedes, Aston Martin, Red Bull, Aston Ma…
## $ fastestLapTime  <dbl> 97108, 97020, 99320, 99269, 98785, 99312, 99272, 99762…

7.4 Modelling

We will now cluster the constructors based on three features— points, fastestLapTime, and fastestLapSpeed. In order to do this, the following steps are performed.

  1. Calculate the sum of points, the maximum of fastestLapTime, and the minimum of fastestLapSpeed for each constructor in each year.
intermediateResults <- dataPreprocessed%>%
  select(c(year, points, fastestLapSpeed, fastestLapTime, constructorName)) %>%
  group_by(year, constructorName) %>%
  summarise(totalPoints = sum(points), min_fastestLapSpeed = min(fastestLapSpeed), 
            max_fastestLapTime = max(fastestLapTime),.groups = 'drop')%>%
  as.data.frame();
head(intermediateResults)
##   year constructorName totalPoints min_fastestLapSpeed max_fastestLapTime
## 1 2014      Alfa Romeo           0             174.300             114000
## 2 2014      AlphaTauri           7             176.414             114159
## 3 2014  Alpine F1 Team           4             148.352             115649
## 4 2014    Aston Martin          82             155.921             114532
## 5 2014         Ferrari          98             179.257             114090
## 6 2014         McLaren          80             176.898             114203
  2. Find the average of totalPoints, min_fastestLapSpeed, and max_fastestLapTime across all the years.
inputData<- intermediateResults%>%
  group_by(constructorName) %>%
  summarise(avg_totalPoints = mean(totalPoints), 
            avg_minFastestLapSpeed = mean(min_fastestLapSpeed),
            avg_maxFastestLapTime = mean(max_fastestLapTime)) %>%
  as.data.frame()%>%
  set_rownames(.$constructorName) %>%
  select(-c(constructorName)) ;
inputData
##                avg_totalPoints avg_minFastestLapSpeed avg_maxFastestLapTime
## Alfa Romeo               7.875               185.2259              112947.6
## AlphaTauri              28.125               183.2430              113121.6
## Alpine F1 Team          40.750               184.4110              110569.8
## Aston Martin            61.250               184.7014              109880.0
## Ferrari                173.125               189.6946              110108.8
## McLaren                 61.500               182.4089              110111.6
## Mercedes               309.875               192.0040              108147.2
## Red Bull               158.500               189.4468              109780.0
## Williams                55.375               186.9545              110388.0

Let’s analyze the constructor teams’ performance based on intermediateResults. The code below plots the density of min_fastestLapSpeed across the years for each constructor.

ggplot(data = intermediateResults, aes(x =  min_fastestLapSpeed))+
  geom_density() +
  facet_wrap(~constructorName);

Teams such as Mercedes, Red Bull, and Ferrari show high performance, as evident from their heavy-tailed distributions. The rest of the teams show indications of variable performance. It won’t be a surprise if the Hierarchical clustering algorithm groups these teams together.

Let’s get to clustering now! The data frame inputData contains three features for each constructor. Clustering it involves three easy steps in R:

  1. Scale features.
  2. Calculate the distance matrix based on the selected distance measure.
  3. Use hclust() to calculate clusters (specify the linkage method).

The code below implements the above 3 steps. For clustering the constructors’ performance, we will be using Euclidean distance and Average linkage.

# feature scaling
inputData <- inputData%>%
  scale %>%
  as.data.frame();

# calculate the distance matrix and hierarchical clusters.
distance_mat <- dist(inputData, method = 'euclidean');
hc<- hclust(distance_mat, method = "average");

7.5 Visualization

Let’s visualize the clusters using a dendrogram and a scatter plot. To make the scatter plot, we use prcomp() to project the three features onto two principal components.

# plot dendrogram
dend <- as.dendrogram(hc)
plot(dend)

# plot scatter plot
pca <- prcomp(inputData);
ggplot(data = as.data.frame(pca$x), mapping = aes(x = PC1, y = PC2))+
  geom_point()+
  geom_text(aes(label= rownames(pca$x)), vjust = -1)+
  ylim(c(-1,1.2))+
  xlim(c(-2.2,3.5))

The clusters shown in the dendrogram can be verified using the scatter plot. For instance, the pairs Red Bull and Ferrari, and AlphaTauri and Alfa Romeo, show similar performances, as seen from their early merging. Mercedes joins the hierarchy last, which can be confirmed by its distance from the other constructors on the scatter plot.

By gauging the shape of the dendrogram, one might be tempted to draw a horizontal line around 0.75 on the y-axis to create 5 clusters, indicating 5 levels of performance during the era of hybrid engines. The 5 clusters can be:

  • Group 1— Mercedes
  • Group 2— Red Bull and Ferrari
  • Group 3— McLaren, Aston Martin, and Alpine F1 Team
  • Group 4— AlphaTauri and Alfa Romeo
  • Group 5— Williams

8 Conclusion

The formed clusters give quite an accurate picture of F1 team performance from 2014 to 2021, with Group 1 leading the Championship for 7 consecutive years and performance decreasing progressively from Group 1 to Group 5.

Yay! We have successfully implemented Hierarchical Clustering. There are a few more steps that one can perform to improve and validate the clustering, such as:

  • Verify the number of clusters using the elbow or average silhouette method (a short sketch of the silhouette check is given below).
  • Analyze the performance of clustering based on different linkage methods and distance measures.
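
As an example of the first suggestion, here is a minimal sketch of the average silhouette check, reusing the hc and distance_mat objects created earlier. It assumes the cluster package is installed, and the range of k values is arbitrary.

library(cluster)

# average silhouette width for k = 2 to 8 clusters cut from the existing dendrogram
avgSilWidth <- sapply(2:8, function(k) {
  clusters <- cutree(hc, k = k);
  mean(silhouette(clusters, distance_mat)[, "sil_width"])
});
names(avgSilWidth) <- 2:8;
avgSilWidth   # a larger average width indicates a better-separated clustering

If the largest average silhouette width does not occur at 5 clusters, that would be a good reason to revisit the grouping proposed above.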

Hope you had fun while learning how to implement Hierarchical Clustering!

References

Ali, Moez. 2022. Clustering in Machine Learning: 5 Essential Clustering Algorithms. datacamp. https://www.datacamp.com/blog/clustering-in-machine-learning-5-essential-clustering-algorithms.
Banerjee, Ankita. 2022. K-Mean: Getting the Optimal Number of Clusters. Analytics Vidhya. https://www.analyticsvidhya.com/blog/2021/05/k-mean-getting-the-optimal-number-of-clusters/#.
Bonthu, Harika. 2021. Single-Link Hierarchical Clustering Clearly Explained! Analytics Vidhya. https://www.analyticsvidhya.com/blog/2021/06/single-link-hierarchical-clustering-clearly-explained/.
Brownlee, Jason. 2019. 14 Different Types of Learning in Machine Learning. Machine Learning Mastery. https://machinelearningmastery.com/types-of-learning-in-machine-learning/.
Das, Shubhobrata. 2020. Different Types of Distances Used in Machine Learning. Medium. https://medium.com/@shubhobrata.das.fiem.cse17/different-types-of-distances-used-in-machine-learning-45b18a2282d8.
Hierarchical Cluster Analysis. n.d. UC Business Analytics R Programming Guide. https://uc-r.github.io/hc_clustering.
Jarman, Angur. 2020. Hierarchical Cluster Analysis: Comparison of Single Linkage, Complete Linkage, Average Linkage and Centroid Linkage Method. https://www.ncbi.nlm.nih.gov/books/NBK543520/figure/ch9.Fig4/ doi: 10.1007/978-3-319-99713-1_9.
Polamuri, Saimadhu. 2020. HIERARCHICAL CLUSTERING DENDROGRAM. Dataaspirant. https://dataaspirant.com/16-hierarchical-clustering-dendrogram/.
Traverso A, Osong B, Dankers FJWM. 2019. Diving Deeper into Models. https://www.ncbi.nlm.nih.gov/books/NBK543520/figure/ch9.Fig4/ doi: 10.1007/978-3-319-99713-1_9.
Vopani. 2022. Formula 1 World Championship (1950 - 2022). Kaggle. https://www.kaggle.com/datasets/rohanrao/formula-1-world-championship-1950-2020.