Exercise 8

2.)

2c.)

{1,2} and {3,4}

2d.)

{1,2,3} and {4}
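
The two-cluster cuts in 2(c) and 2(d) can be reproduced with `hclust()` and `cutree()`. This is only a sketch: the dissimilarity matrix `m` below is an assumption, since the exercise statement is not reproduced in this write-up, and should be replaced with the matrix actually given in the problem.

```r
# ASSUMPTION: dissimilarity matrix for the four observations; it is not
# reproduced in this write-up, so replace it with the one from the exercise.
m <- matrix(c(0.00, 0.30, 0.40, 0.70,
              0.30, 0.00, 0.50, 0.80,
              0.40, 0.50, 0.00, 0.45,
              0.70, 0.80, 0.45, 0.00),
            nrow = 4, byrow = TRUE)
d <- as.dist(m)

# Build both dendrograms from the same dissimilarities
hc_complete <- hclust(d, method = "complete")
hc_single   <- hclust(d, method = "single")

# Cut each dendrogram into two clusters
cut_complete <- cutree(hc_complete, k = 2)
cut_single   <- cutree(hc_single, k = 2)

print(cut_complete)  # membership of observations 1..4 under complete linkage
print(cut_single)    # membership of observations 1..4 under single linkage
```

Under the assumed matrix, the complete-linkage cut groups observations 1 and 2 together and 3 and 4 together, while the single-linkage cut isolates observation 4.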

2e.)

3.)

# Define the data
dat <- data.frame(
  obs     = 1:6,
  X1      = c(1, 1, 0, 5, 6, 4),
  X2      = c(4, 3, 4, 1, 2, 0)
)

# Quick raw scatterplot (3a)
plot(dat$X1, dat$X2,
     xlab = "X1", ylab = "X2",
     main = "3(a): Observations 1–6",
     pch  = 19, col = "black")
text(dat$X1, dat$X2, labels = dat$obs, pos = 3)

# Randomly initialize two clusters (3b)
set.seed(42)
dat$cluster <- sample(1:2, nrow(dat), replace = TRUE)

# Function to compute centroids (3c)
compute_centroids <- function(df) {
  aggregate(df[, c("X1","X2")],
            by = list(cluster = df$cluster),
            FUN = mean)
}

# K-means loop: reassign until labels stop changing (3d, 3e)
repeat {
  cents <- compute_centroids(dat)
  
  # compute Euclidean distance to each centroid
  d1 <- sqrt((dat$X1 - cents$X1[1])^2 + (dat$X2 - cents$X2[1])^2)
  d2 <- sqrt((dat$X1 - cents$X1[2])^2 + (dat$X2 - cents$X2[2])^2)
  
  # assign to the nearer centroid
  new.cluster <- ifelse(d1 < d2, cents$cluster[1], cents$cluster[2])
  
  # if nothing changed, break
  if (all(new.cluster == dat$cluster)) break
  
  # otherwise update and repeat
  dat$cluster <- new.cluster
}

# Print final centroids and assignments
cat("3(c)-(e) Final centroids:\n")

## 3(c)-(e) Final centroids:

print(cents)

##   cluster        X1       X2
## 1       1 0.6666667 3.666667
## 2       2 5.0000000 1.000000

cat("\n3(d)-(e) Final cluster assignment:\n")

## 
## 3(d)-(e) Final cluster assignment:

print(dat[, c("obs", "cluster")])

##   obs cluster
## 1   1       1
## 2   2       1
## 3   3       1
## 4   4       2
## 5   5       2
## 6   6       2

# Final colored plot (3f)
palette(c("steelblue", "tomato"))
plot(dat$X1, dat$X2,
     col   = dat$cluster,
     pch   = 19,
     xlab  = "X1", ylab = "X2",
     main  = "3(f): Final K-means Clustering")
text(dat$X1, dat$X2, labels = dat$obs, pos = 3)
legend("topright",
       legend = paste("Cluster", sort(unique(dat$cluster))),
       col    = sort(unique(dat$cluster)),
       pch    = 19)

# Second panel: centroids table
# Third panel: cluster assignment table
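
As a cross-check on the hand-written loop, the same grouping can be recovered with the built-in `kmeans()` function from the stats package. This is a sketch added for verification, not part of the original solution; `set.seed(1)` and `nstart = 20` are arbitrary choices. Because cluster labels are arbitrary, the centroids are sorted by X1 before being inspected.

```r
# Same six observations as in the exercise, rebuilt here so the block is self-contained
X <- cbind(X1 = c(1, 1, 0, 5, 6, 4),
           X2 = c(4, 3, 4, 1, 2, 0))

set.seed(1)                                # arbitrary seed, for reproducibility only
km <- kmeans(X, centers = 2, nstart = 20)  # 20 random restarts, keep the best fit

# Labels 1/2 may be swapped between runs, so order centroids by X1
ctr <- km$centers[order(km$centers[, "X1"]), ]

print(ctr)         # centroids, lower-X1 cluster first
print(km$cluster)  # cluster label for each observation
```

Observations 1 to 3 should share one label and observations 4 to 6 the other, matching the final assignment printed above.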

4.)

For 4(a), there is not enough information to tell. When fusing the clusters {1,2,3} and {4,5}, single linkage uses the minimum cross-cluster dissimilarity while complete linkage uses the maximum, so the complete-linkage fusion occurs at a height greater than or equal to that of the single-linkage fusion. It is strictly higher unless all six cross-cluster dissimilarities are equal, in which case the two heights coincide. In 4(b), because each cluster is a singleton ({5} and {6}), both methods use the same single dissimilarity d(5,6), so they fuse at exactly the same height.
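
The comparison between the two linkages can be checked numerically. The sketch below uses a hypothetical helper, `fuse_height()`, and made-up data (six points on a line, plus a matrix with every dissimilarity equal to 1); none of these come from the exercise, they only illustrate the min-versus-max rule.

```r
# Hypothetical helper: height at which clusters a and b would fuse,
# given a full dissimilarity matrix d and a linkage rule.
fuse_height <- function(d, a, b, linkage = c("single", "complete")) {
  linkage <- match.arg(linkage)
  cross <- d[a, b, drop = FALSE]          # all cross-cluster dissimilarities
  if (linkage == "single") min(cross) else max(cross)
}

# Made-up example 1: six points on a line
x <- c(0, 1, 2, 10, 12, 13)
d <- as.matrix(dist(x))

h_single   <- fuse_height(d, 1:3, 4:5, "single")    # smallest cross distance
h_complete <- fuse_height(d, 1:3, 4:5, "complete")  # largest cross distance

# Singletons {5} and {6}: only one cross distance exists
h56_single   <- fuse_height(d, 5, 6, "single")
h56_complete <- fuse_height(d, 5, 6, "complete")

# Made-up example 2: every pairwise dissimilarity equals 1
d_eq <- matrix(1, 5, 5)
diag(d_eq) <- 0
heq_single   <- fuse_height(d_eq, 1:3, 4:5, "single")
heq_complete <- fuse_height(d_eq, 1:3, 4:5, "complete")

print(c(single = h_single, complete = h_complete))
print(c(single = h56_single, complete = h56_complete))
print(c(single = heq_single, complete = heq_complete))
```

In the first example the complete-linkage height exceeds the single-linkage height; for the singletons, and when all cross-cluster dissimilarities are equal, the two heights agree.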

2025-05-03
