2. Hierarchical Clustering Analysis Using Dissimilarity Matrix
We have 4 observations: A, B, C, D (label them as 1, 2, 3, 4).
Dissimilarity matrix:
     1     2     3     4
1    0     0.3   0.4   0.7
2    0.3   0     0.5   0.8
3    0.4   0.5   0     0.45
4    0.7   0.8   0.45  0
(a) Complete Linkage Dendrogram
Complete linkage: distance between two clusters = maximum pairwise
distance between elements in the clusters.
Step-by-step:
The closest pair is (1,2) at 0.3, so merge (1,2) first.
Recompute the distances involving the new cluster (1,2):
(1,2)–3: max(0.4, 0.5) = 0.5
(1,2)–4: max(0.7, 0.8) = 0.8
3–4: 0.45
The smallest of these is 0.45, so merge (3,4) next.
The remaining clusters are (1,2) and (3,4). Their distance is the largest of the four cross-cluster distances: max(d13, d14, d23, d24) = max(0.4, 0.7, 0.5, 0.8) = 0.8.
Final merges:
Merge (1,2) at height 0.3
Merge (3,4) at height 0.45
Merge ((1,2), (3,4)) at height 0.8
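The hand calculation above can be reproduced with a few lines of R. This is a minimal sketch: `cluster_dist` is a hypothetical helper written for this illustration, not a base R function.

```r
# Dissimilarity matrix from the problem statement
d <- matrix(c(
  0,   0.3, 0.4,  0.7,
  0.3, 0,   0.5,  0.8,
  0.4, 0.5, 0,    0.45,
  0.7, 0.8, 0.45, 0
), nrow = 4, byrow = TRUE)

# Hypothetical helper: linkage distance between clusters a and b,
# given as vectors of observation indices.
# fun = max gives complete linkage, fun = min gives single linkage.
cluster_dist <- function(d, a, b, fun = max) {
  fun(d[a, b, drop = FALSE])
}

cluster_dist(d, c(1, 2), 3)        # (1,2) vs 3
cluster_dist(d, c(1, 2), 4)        # (1,2) vs 4
cluster_dist(d, c(1, 2), c(3, 4))  # (1,2) vs (3,4)
```

Passing `fun = min` to the same helper gives the corresponding single linkage distances.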
Dendrogram sketch:
# Define the dissimilarity matrix
d <- matrix(c(
0, 0.3, 0.4, 0.7,
0.3, 0, 0.5, 0.8,
0.4, 0.5, 0, 0.45,
0.7, 0.8, 0.45, 0
), nrow = 4, byrow = TRUE)
# Assign names to the observations
rownames(d) <- colnames(d) <- c("1", "2", "3", "4")
# Convert to a dist object
dist_matrix <- as.dist(d)
# Perform hierarchical clustering using complete linkage
hc_complete <- hclust(dist_matrix, method = "complete")
# Plot the dendrogram
plot(hc_complete, main = "Complete Linkage Dendrogram", xlab = "", sub = "", ylab = "Height")
# Optional: reorder leaves to match a hand-drawn layout (3-4 on the left, 1-2 on the right).
# reorder() on a dendrogram sorts leaves by weight wherever the tree structure allows it.
# wts[i] is the sort weight of observation i; smaller weights are drawn further left.
dend_complete <- reorder(as.dendrogram(hc_complete), wts = c(3, 4, 1, 2), agglo.FUN = mean)
plot(dend_complete, main = "Reordered Dendrogram", ylab = "Height")
(b) Single Linkage Dendrogram
Single linkage: distance between two clusters = minimum pairwise
distance between elements in the clusters.
Step-by-step:
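The closest pair is (1,2) at 0.3, so merge (1,2) first.
Recompute the distances involving the new cluster (1,2):
(1,2)–3: min(0.4, 0.5) = 0.4
(1,2)–4: min(0.7, 0.8) = 0.7
3–4: 0.45
The smallest of these is 0.4, so merge (1,2) and 3.
The remaining clusters are (1,2,3) and 4. Their distance is min(0.7, 0.8, 0.45) = 0.45, so everything merges at 0.45.
Final merges:
Merge (1,2) at height 0.3
Merge (1,2,3) at height 0.4
Merge (1,2,3,4) at height 0.45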
# Define the dissimilarity matrix
d <- matrix(c(
0, 0.3, 0.4, 0.7,
0.3, 0, 0.5, 0.8,
0.4, 0.5, 0, 0.45,
0.7, 0.8, 0.45, 0
), nrow = 4, byrow = TRUE)
# Assign names to the observations
rownames(d) <- colnames(d) <- c("1", "2", "3", "4")
# Convert to dist object
dist_matrix <- as.dist(d)
# Perform hierarchical clustering using single linkage
hc_single <- hclust(dist_matrix, method = "single")
# Plot the dendrogram
plot(hc_single, main = "Single Linkage Dendrogram", xlab = "", sub = "", ylab = "Height")
# Optional: reorder leaves to 1, 2, 3, 4 to match a hand-drawn layout.
# wts[i] is the sort weight of observation i; smaller weights are drawn further left.
dend_single <- reorder(as.dendrogram(hc_single), wts = c(1, 2, 3, 4), agglo.FUN = mean)
plot(dend_single, main = "Reordered Single Linkage", ylab = "Height")
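As a cross-check on the hand-computed merge heights in (a) and (b), the `height` component of each `hclust` result can be inspected directly. The snippet below is a small sketch that rebuilds the matrix so it runs on its own; the variable name `heights` is chosen only for this illustration.

```r
# Dissimilarity matrix from the problem statement
d <- matrix(c(
  0,   0.3, 0.4,  0.7,
  0.3, 0,   0.5,  0.8,
  0.4, 0.5, 0,    0.45,
  0.7, 0.8, 0.45, 0
), nrow = 4, byrow = TRUE)
d_dist <- as.dist(d)

# One column of merge heights per linkage method, in merge order
heights <- sapply(c(complete = "complete", single = "single"),
                  function(m) hclust(d_dist, method = m)$height)
heights
```

The `complete` column should match the heights listed in (a) and the `single` column those listed in (b).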
(c) Two clusters from (a) [Complete linkage]
Cut at any height between 0.45 and 0.8 (for example, just below 0.8):
Cluster 1: (1,2)
Cluster 2: (3,4)
(d) Two clusters from (b) [Single linkage]
Cut at any height between 0.4 and 0.45 (for example, just below 0.45):
Cluster 1: (1,2,3)
Cluster 2: (4)
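The cuts in (c) and (d) can also be made programmatically with `cutree()`, either by asking for a number of clusters (`k`) or by giving a cut height (`h`). This is a self-contained sketch; the cut heights 0.6 and 0.42 are arbitrary values picked from inside the ranges discussed above.

```r
# Dissimilarity matrix from the problem statement
d <- matrix(c(
  0,   0.3, 0.4,  0.7,
  0.3, 0,   0.5,  0.8,
  0.4, 0.5, 0,    0.45,
  0.7, 0.8, 0.45, 0
), nrow = 4, byrow = TRUE)
rownames(d) <- colnames(d) <- c("1", "2", "3", "4")

hc_complete <- hclust(as.dist(d), method = "complete")
hc_single   <- hclust(as.dist(d), method = "single")

# Ask for two clusters directly
clusters_complete <- cutree(hc_complete, k = 2)
clusters_single   <- cutree(hc_single, k = 2)

# Equivalent cuts by height
clusters_complete_h <- cutree(hc_complete, h = 0.6)
clusters_single_h   <- cutree(hc_single, h = 0.42)

clusters_complete
clusters_single
```

Each result is a vector of cluster labels, one per observation, so observations that share a label belong to the same cluster.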
(e) Equivalent dendrogram to (a) with different leaf positions
We can swap subtrees or reorder leaves, e.g., place (2) before (1),
or (4) before (3):
# Define the dissimilarity matrix
d <- matrix(c(
  0, 0.3, 0.4, 0.7,
  0.3, 0, 0.5, 0.8,
  0.4, 0.5, 0, 0.45,
  0.7, 0.8, 0.45, 0
), nrow = 4, byrow = TRUE)
# Assign names to the observations
rownames(d) <- colnames(d) <- c("1", "2", "3", "4")
# Convert to dist object
dist_matrix <- as.dist(d)
# Perform complete linkage hierarchical clustering
hc_complete <- hclust(dist_matrix, method = "complete")
# Reorder leaves to 4, 3, 2, 1 using base R's reorder() for dendrograms.
# wts[i] is the sort weight of observation i; smaller weights are drawn further left.
dend_flipped <- reorder(as.dendrogram(hc_complete), wts = c(4, 3, 2, 1), agglo.FUN = mean)
plot(dend_flipped, main = "Complete Linkage (4, 3, 2, 1)", ylab = "Height")
With this ordering, observations 4 and 3 appear on the left side of the dendrogram and observations 2 and 1 on the right. Only the left-to-right layout changes: the merge heights (0.3, 0.45, 0.8) are the same as in (a).
3. Manual K-Means Clustering on a Small 2D Dataset (K = 2, n = 6)
Given Data Table
Obs  X1  X2
1    1   4
2    2   3
3    3   0
4    4   5
5    6   2
6    6   4
(a) Plot the observations
data <- data.frame(
Obs = 1:6,
X1 = c(1, 2, 3, 4, 6, 6),
X2 = c(4, 3, 0, 5, 2, 4)
)
plot(data$X1, data$X2, pch = 19, xlab = "X1", ylab = "X2", main = "Observations")
text(data$X1, data$X2, labels = data$Obs, pos = 3)
We plotted the 6 observations on a 2D scatterplot using X1 and X2 as
coordinates. Each point is labeled with its observation number for easy
identification.
(b) Randomly assign a cluster label
set.seed(123) # for reproducibility
data$Cluster <- sample(c(1, 2), size = 6, replace = TRUE)
data[, c("Obs", "Cluster")]
(c) Compute centroid for each cluster
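Suppose the random assignment puts Obs 1 and 4 in Cluster 1 and Obs 2, 3, 5, 6 in Cluster 2. (The labels that sample() actually returns depend on the seed and the R version, so substitute your own if they differ.)
Cluster 1: Obs 1 and 4 → points (1,4) and (4,5)
Centroid = ((1+4)/2, (4+5)/2) = (2.5, 4.5)
Cluster 2: Obs 2, 3, 5, 6 → points (2,3), (3,0), (6,2), (6,4)
Centroid = ((2+3+6+6)/4, (3+0+2+4)/4) = (4.25, 2.25)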
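The centroid arithmetic above can be checked with `aggregate()`. This is a sketch under the same assumption as the text, namely that the initial labels put Obs 1 and 4 in Cluster 1; the vector `cluster` hard-codes that assumption rather than using the output of `sample()`.

```r
data <- data.frame(
  X1 = c(1, 2, 3, 4, 6, 6),
  X2 = c(4, 3, 0, 5, 2, 4)
)

# Assumed initial labels (Obs 1 and 4 in cluster 1, the rest in cluster 2)
cluster <- c(1, 2, 2, 1, 2, 2)

# Mean of X1 and X2 within each cluster gives one centroid per row
centroids <- aggregate(data[, c("X1", "X2")],
                       by = list(Cluster = cluster), FUN = mean)
centroids
```

Each row of `centroids` holds a cluster label followed by the coordinates of that cluster's centroid.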
(d) Reassign Each Observation
Euclidean distances to the centroids C1 = (2.5, 4.5) and C2 = (4.25, 2.25):
Obs | Coordinates (X1, X2) | Distance to C1 | Distance to C2 | Assigned Cluster
1 | (1, 4) | 1.58 | 3.69 | 1
2 | (2, 3) | 1.58 | 2.37 | 1
3 | (3, 0) | 4.53 | 2.57 | 2
4 | (4, 5) | 1.58 | 2.76 | 1
5 | (6, 2) | 4.30 | 1.77 | 2
6 | (6, 4) | 3.54 | 2.47 | 2
Reassigned clusters:
Cluster 1: Obs 1, 2, 4
Cluster 2: Obs 3, 5, 6
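The reassignment step can be sketched in R as follows. The centroid values are taken from part (c); the names `X`, `dists`, and `new_cluster` are illustrative choices, not from the original notebook.

```r
# Observations as a 6 x 2 matrix
X <- cbind(X1 = c(1, 2, 3, 4, 6, 6), X2 = c(4, 3, 0, 5, 2, 4))

# Centroids from part (c)
centroids <- rbind(C1 = c(2.5, 4.5), C2 = c(4.25, 2.25))

# Euclidean distance from every observation to each centroid (6 x 2 matrix).
# sweep() subtracts the centroid coordinates from each row of X.
dists <- sapply(1:2, function(k) sqrt(rowSums(sweep(X, 2, centroids[k, ])^2)))
colnames(dists) <- c("C1", "C2")

# Assign each observation to the nearest centroid
new_cluster <- apply(dists, 1, which.min)

round(dists, 2)
new_cluster
```

Using `which.min` row by row resolves any tie in favour of the lower-numbered cluster, which does not matter here because no observation is equidistant from both centroids.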
(e) Repeat (c) and (d) until convergence
You keep recalculating centroids and reassigning until the cluster
labels stop changing.
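One way to write this loop in R is shown below. It is a minimal sketch rather than a general K-means implementation: it assumes K = 2, starts from the same assumed labels as part (c), and does not handle the case where a cluster becomes empty.

```r
# Observations as a 6 x 2 matrix
X <- cbind(X1 = c(1, 2, 3, 4, 6, 6), X2 = c(4, 3, 0, 5, 2, 4))

# Assumed initial labels (Obs 1 and 4 in cluster 1, the rest in cluster 2)
cluster <- c(1, 2, 2, 1, 2, 2)

repeat {
  # Step (c): recompute the centroid of each cluster
  centroids <- rbind(colMeans(X[cluster == 1, , drop = FALSE]),
                     colMeans(X[cluster == 2, , drop = FALSE]))

  # Step (d): reassign each observation to its nearest centroid
  dists <- sapply(1:2, function(k) sqrt(rowSums(sweep(X, 2, centroids[k, ])^2)))
  new_cluster <- apply(dists, 1, which.min)

  # Stop once no label changes
  if (all(new_cluster == cluster)) break
  cluster <- new_cluster
}

cluster    # converged labels, one per observation
centroids  # centroids at convergence
```

Because the loop exits only when a full pass changes no labels, the `centroids` left over at the end are the centroids of the converged clusters.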
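Final cluster assignment (after convergence):
Starting from the clusters found in (d), the centroids become (2.33, 4) and (5, 2). Reassigning every observation to its nearest centroid changes no labels, so the algorithm has converged.
Obs  Cluster
1    1
2    1
3    2
4    1
5    2
6    2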
(f) Plot with final cluster coloring
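# Final labels after convergence: Obs 1, 2, 4 in cluster 1; Obs 3, 5, 6 in cluster 2
data$Cluster <- c(1, 1, 2, 1, 2, 2)
plot(data$X1, data$X2, col = data$Cluster, pch = 19,
     xlab = "X1", ylab = "X2", main = "Final Clusters")
text(data$X1, data$X2, labels = data$Obs, pos = 3)
legend("topright", legend = c("Cluster 1", "Cluster 2"), col = 1:2, pch = 19)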

4. Hierarchical Clustering Comparisons
(a) Fusion of {1,2,3} and {4,5}
In single linkage, the distance between two clusters is the minimum distance between any pair of points, one from each cluster. In complete linkage, it is the maximum distance between such pairs.
So, for the same two clusters {1,2,3} and {4,5}:
Single linkage fuses them at the smallest distance between any point in {1,2,3} and any point in {4,5}.
Complete linkage fuses them at the largest such distance.
Since a maximum can never be smaller than the corresponding minimum, the complete linkage fusion is at least as high as the single linkage fusion. The two heights coincide only if all six cross-cluster distances are equal.
Answer: The fusion occurs at least as high in the complete linkage dendrogram, and strictly higher unless all cross-cluster distances are equal.
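A small numeric illustration of the comparison in (a) follows. The positions in `x` are hypothetical values invented for this sketch, since the exercise gives no actual data; the code only compares the two linkage distances between {1,2,3} and {4,5} and does not run a full clustering.

```r
# Hypothetical 1-D positions for six observations (not from the exercise)
x <- c(0, 1, 2, 6, 7, 20)
d <- as.matrix(dist(x))

# All cross-cluster distances between {1,2,3} and {4,5}
between <- d[1:3, 4:5]

single_height   <- min(between)  # single linkage distance
complete_height <- max(between)  # complete linkage distance

c(single = single_height, complete = complete_height)
```

Whatever values are placed in `x`, `single_height` cannot exceed `complete_height`, because both are taken over the same set of cross-cluster distances.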
(b) Fusion of {5} and {6}
Since these are single-element clusters, the distance between them is
simply the Euclidean distance (or whatever metric is used) between point
5 and point 6.
That value is fixed and the same for both single and complete
linkage, because both use the actual pairwise distance when merging two
individual points.
Answer: They will fuse at the same height in both dendrograms.
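The singleton case in (b) can be confirmed with `hclust()` on just two points. The coordinates below are hypothetical and chosen only so the distance is easy to read off.

```r
# Two hypothetical points standing in for observations 5 and 6
pts <- rbind(c(0, 0), c(3, 4))
d_pts <- dist(pts)  # a single pairwise Euclidean distance

h_single   <- hclust(d_pts, method = "single")$height
h_complete <- hclust(d_pts, method = "complete")$height

c(single = h_single, complete = h_complete)
```

With one point in each cluster there is only one pairwise distance, so its minimum and maximum are the same number and both methods report the same fusion height.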
