Suppose that we cut the dendrogram obtained in (a) such that two clusters result. Which observations are in each cluster?
A: Cutting just below the final merge (at height 0.8) gives two clusters: Cluster 1: {1, 2} and Cluster 2: {3, 4}.
Suppose that we cut the dendrogram obtained in (b) such that two clusters result. Which observations are in each cluster?
A: Cutting at any height between 0.4 and 0.45 gives two clusters: Cluster 1: {1, 2, 3} and Cluster 2: {4}.
It is mentioned in the chapter that at each fusion in the dendrogram, the position of the two clusters being fused can be swapped without changing the meaning of the dendrogram. Draw a dendrogram that is equivalent to the dendrogram in (a), for which two or more of the leaves are repositioned, but for which the meaning of the dendrogram is the same.
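The cuts and the leaf reordering can be checked numerically. The sketch below is only an illustration: the dissimilarity matrix is not reproduced in this section, so the values in `d` are an assumption taken from the exercise statement (they are consistent with the fusion heights 0.8, 0.4 and 0.45 quoted in the answers). The names `d`, `hc_complete`, `hc_single` and `flipped` are introduced here for illustration.

```r
# Assumed dissimilarity matrix for observations 1..4 (not shown in this section)
d <- as.dist(matrix(c(0,   0.3, 0.4,  0.7,
                      0.3, 0,   0.5,  0.8,
                      0.4, 0.5, 0,    0.45,
                      0.7, 0.8, 0.45, 0), nrow = 4))

hc_complete <- hclust(d, method = "complete")  # dendrogram for part (a)
hc_single   <- hclust(d, method = "single")    # dendrogram for part (b)

# Cluster membership when each dendrogram is cut into two clusters
cutree(hc_complete, k = 2)
cutree(hc_single, k = 2)

# An equivalent dendrogram for (a): reversing the leaf order swaps the
# left/right position at every fusion but keeps all fusion heights
flipped <- rev(as.dendrogram(hc_complete))
plot(flipped, main = "Complete linkage, leaves reordered")
```

Reversing the whole leaf order is just one of several valid reorderings; swapping the two branches at any single fusion gives another equivalent dendrogram.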
x1 <- c(1, 1, 0, 5, 6, 4)
x2 <- c(4, 3, 4, 1, 2, 0)
df <- data.frame(Obs = 1:6, x1 = x1, x2 = x2)
library(ggplot2)
ggplot(data.frame(x1, x2), aes(x = x1, y = x2)) + geom_point()
set.seed(1234)
df$cl <- sample(1:2, nrow(df), replace = TRUE)
df[, c("Obs", "x1", "x2", "cl")]
library(dplyr)
centroids <- df %>%
  group_by(cl) %>%
  summarise(cx = mean(x1), cy = mean(x2), .groups = "drop")
centroids
Each point is now reassigned to the cluster whose centroid is closest, using Euclidean distance.
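# Assign each observation to the cluster whose centroid is nearest (Euclidean distance).
assign_clusters <- function(data, centers) {
  apply(data, 1, function(row) {
    # Distance from this observation to every centroid
    dists <- sqrt((row[["x1"]] - centers$cx)^2 + (row[["x2"]] - centers$cy)^2)
    # Return the cluster label rather than the row index, in case a label is missing
    centers$cl[which.min(dists)]
  })
}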
df$newCl <- assign_clusters(df[, c("x1", "x2")], centroids)
df[, c("Obs", "x1", "x2", "cl", "newCl")]
while (!all(df$cl == df$newCl)) {
  df$cl <- df$newCl
  # Recalculate centroids
  centroids <- df %>%
    group_by(cl) %>%
    summarise(cx = mean(x1), cy = mean(x2), .groups = "drop")
  # Reassign clusters
  df$newCl <- assign_clusters(df[, c("x1", "x2")], centroids)
}
df$final_cl <- df$cl
df
ggplot(df, aes(x = x1, y = x2, color = factor(final_cl))) +
  geom_point(size = 4) +
  labs(title = "Final K-means Clustering", x = "X1", y = "X2", color = "Cluster") +
  theme_minimal()
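As a cross-check on the manual iteration, the same six observations can be passed to the built-in `kmeans()` function from the stats package. This is a sketch rather than part of the original solution: the names `X` and `km` are introduced here, and `nstart = 20` is an arbitrary choice to reduce the chance of stopping in a poor local optimum.

```r
# Same observations as in the manual K-means steps
x1 <- c(1, 1, 0, 5, 6, 4)
x2 <- c(4, 3, 4, 1, 2, 0)
X <- cbind(x1, x2)

set.seed(1234)
# K = 2 clusters, 20 random starts; the best solution is kept
km <- kmeans(X, centers = 2, nstart = 20)

km$cluster  # cluster label for each observation (the labels 1/2 are arbitrary)
km$centers  # centroid coordinates of the two clusters
```

The cluster numbers themselves carry no meaning, so the comparison with `df$final_cl` should be made on which observations are grouped together, not on the label values.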
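A: Complete linkage uses the maximum pairwise dissimilarity between the two clusters, whereas single linkage uses the minimum. Because the maximum can never be smaller than the minimum, the fusion in the complete linkage dendrogram occurs at least as high as in the single linkage dendrogram. It is strictly higher unless all pairwise dissimilarities between the two clusters are equal, in which case the two heights coincide.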
A: There is only one pair to compare, (5, 6). With a single pair the maximum and the minimum dissimilarity are the same, so the fusion height is identical under both linkages.
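The two answers above can be illustrated with a small numeric sketch. The dissimilarity values are hypothetical, made up purely for illustration, and the names `between` and `between_56` are not from the exercise.

```r
# Hypothetical dissimilarities between members of {1, 2, 3} (rows)
# and members of {4, 5} (columns)
between <- matrix(c(0.9, 0.6, 0.7,
                    0.8, 0.5, 1.0),
                  nrow = 3,
                  dimnames = list(c("1", "2", "3"), c("4", "5")))

complete_height <- max(between)  # complete linkage: largest between-cluster value
single_height   <- min(between)  # single linkage: smallest between-cluster value
complete_height >= single_height # holds for any set of values

# Two singleton clusters {5} and {6}: only one dissimilarity exists
between_56 <- matrix(0.4, nrow = 1, dimnames = list("5", "6"))
max(between_56) == min(between_56)  # both linkages give the same height
```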