Evaluating Clustering Methods for Image Segmentation: A Comparative Study Using Elbow, Silhouette, and Calinski-Harabasz Metrics

Introduction

This research evaluates the effectiveness and consistency of three clustering evaluation metrics (the Elbow Method, the Silhouette Score, and the Calinski-Harabasz Index) by applying them to three images with distinct visual and structural properties:
- a part of the Diego Rivera mural “Dream of a Sunday Afternoon in the Alameda Central” (a highly complex, colorful image);
- a photo of pierogi (the author's own photo, with distinct object boundaries);
- a view of the Baltic Sea (a natural image dominated by gradients and subtle variations).

By analyzing the results from these metrics and visually inspecting the segmented images, this study seeks to identify patterns in how the metrics perform across different types of images and explore the conditions under which one method may be more suitable than others.

Methodology

For this study, the clustering algorithms K-means and CLARA (Clustering Large Applications) were selected for their effectiveness and scalability in handling image data. K-means is computationally efficient and well suited to image clustering, where each pixel can be represented as a point in RGB space. It produces compact clusters, which suits segmenting an image into distinct regions. CLARA is a variant of k-medoids designed for large datasets. Instead of processing the entire dataset, it fits cluster medoids on random subsamples, which makes it faster and more memory-efficient. All clusterings in the code below are computed with clara() from the cluster package.
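As a minimal sketch of how the two algorithms are called, the snippet below clusters a small synthetic set of RGB-like points with both kmeans() and clara(). The data, the seed, and the sampling parameters (samples, sampsize) are illustrative assumptions, not the settings used in this study.

```r
library(cluster)  # provides clara()

set.seed(1)
# Synthetic "pixels": three colour groups in RGB space, clipped to [0, 1]
centres <- rbind(c(0.9, 0.1, 0.1), c(0.1, 0.8, 0.2), c(0.1, 0.2, 0.9))
toy_rgb <- centres[rep(1:3, each = 200), ] +
  matrix(rnorm(600 * 3, sd = 0.03), ncol = 3)
toy_rgb <- pmin(pmax(toy_rgb, 0), 1)
colnames(toy_rgb) <- c("r.value", "g.value", "b.value")

# k-means: cluster centres are means and need not coincide with an observed pixel
km <- kmeans(toy_rgb, centers = 3, nstart = 10)

# CLARA: k-medoids fitted on random subsamples; medoids are actual observations
cl <- clara(toy_rgb, k = 3, samples = 10, sampsize = 100)

km$centers   # 3 x 3 matrix of mean colours
cl$medoids   # 3 x 3 matrix of representative pixels
```

Because each medoid is a row of the input, it can be passed to rgb() directly as a colour that actually occurs in the image.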

To determine the optimal number of clusters (k), three widely used metrics were employed: the Elbow Method, Silhouette Score, and Calinski-Harabasz Index. Each metric evaluates clustering quality using different criteria.
- Silhouette Score evaluates the quality of clustering by measuring how similar a data point is to its own cluster (cohesion) compared to other clusters (separation). A higher Silhouette Score indicates better-defined clusters, and the number of clusters with the highest score is considered optimal.
- Elbow Method involves plotting the Within-Cluster Sum of Squares (WCSS), also known as the Sum of Squared Errors (SSE), against the number of clusters. The optimal number of clusters corresponds to the “elbow point,” where the SSE starts to decrease more gradually, indicating diminishing returns in clustering quality with additional clusters.
- Calinski-Harabasz Index (Variance Ratio Criterion) calculates the ratio of between-cluster variance to within-cluster variance. Like the Silhouette Score, the CH index can be used to find the optimal number of clusters k in algorithms such as k-means, where the value of k is not known a priori. Higher CH values indicate better-defined clusters.
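
For reference, the three criteria can be written in their standard textbook form. The notation is introduced here and is not taken from the study: n observations are split into clusters C_1, ..., C_k with sizes n_j and centroids \mu_j, \mu is the overall centroid, a(i) is the mean dissimilarity of point i to the other members of its own cluster, and b(i) is the smallest mean dissimilarity of point i to any other cluster.

```latex
\begin{align*}
\mathrm{WCSS}(k) &= \sum_{j=1}^{k} \sum_{x_i \in C_j} \lVert x_i - \mu_j \rVert^2 \\
s(i) &= \frac{b(i) - a(i)}{\max\{a(i),\, b(i)\}}, \qquad -1 \le s(i) \le 1 \\
\mathrm{CH}(k) &= \frac{\mathrm{BCSS}(k)/(k-1)}{\mathrm{WCSS}(k)/(n-k)},
\qquad \mathrm{BCSS}(k) = \sum_{j=1}^{k} n_j \lVert \mu_j - \mu \rVert^2
\end{align*}
```

The Silhouette Score reported for a clustering is the average of s(i) over the points considered.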

Libraries

library(jpeg)
library(plotrix)
library(rasterImage)
library(imager)     
library(ggplot2)    
library(Rtsne)      
library(cluster)    
library(gridExtra)
library(fpc)

Diego Rivera mural

setwd("C:/Users/ydmar/Documents/UW/UW - 1 semester/UL")
mural<-readJPEG("diego_rivera.jpg")
class(mural)
## [1] "array"

Plot raster image.

plot(1, type="n")
rasterImage(mural, 0.6, 0.6, 1.4, 1.4)

Inspect the dimensions of the object.

dm1 <-dim(mural)
dm1
## [1]  675 1200    3

The image is a 675 x 1200 pixel RGB image. It has 675 rows and 1200 columns, making up a total of 810,000 pixels. Each pixel has 3 color values: red, green and blue.

To cluster the image and plot the result, the image array must first be reshaped into a data frame with one row per pixel, holding the pixel's coordinates and its RGB values.

rgbMural<-data.frame(x=rep(1:dm1[2], each=dm1[1]),  
                      y=rep(dm1[1]:1, dm1[2]), 
                      r.value=as.vector(mural[,,1]),  
                      g.value=as.vector(mural[,,2]), 
                      b.value=as.vector(mural[,,3]))
head(rgbMural)
##   x   y   r.value   g.value   b.value
## 1 1 675 0.9960784 1.0000000 0.9764706
## 2 1 674 1.0000000 0.9921569 0.9450980
## 3 1 673 0.9921569 0.9960784 0.9333333
## 4 1 672 1.0000000 0.9960784 0.9254902
## 5 1 671 0.9921569 1.0000000 0.9176471
## 6 1 670 0.9372549 0.8705882 0.7019608
plot(y~x, 
     data= rgbMural,
     main="Diego Rivera mural", 
     col=rgb(rgbMural[c("r.value", "g.value", "b.value")]), 
     asp=1, 
     pch="."
)


Each row now represents a pixel with its coordinates (x, y) and corresponding RGB color values.
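
The layout of this data frame can be checked on a tiny array. The sketch below uses a made-up 2 x 3 "image" (the values are arbitrary) and relies on as.vector() reading a matrix column by column, which is why x is repeated with each= and y is recycled.

```r
# Toy 2 x 3 "image" with 3 channels; values are arbitrary illustrations
toy <- array(seq(0, 1, length.out = 18), dim = c(2, 3, 3))
d <- dim(toy)

toy_df <- data.frame(
  x = rep(1:d[2], each = d[1]),   # column index, repeated once per image row
  y = rep(d[1]:1, d[2]),          # row index flipped so image row 1 plots at the top
  r.value = as.vector(toy[, , 1]),
  g.value = as.vector(toy[, , 2]),
  b.value = as.vector(toy[, , 3])
)

# The pixel in image row 2, column 3 ends up at x = 3, y = 1
px <- toy_df[toy_df$x == 3 & toy_df$y == 1, ]
px$r.value == toy[2, 3, 1]
```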
Determine the optimal number of clusters. Start with Silhouette Score.

n1 <- rep(NA_real_, 10)  # silhouette is undefined for k = 1, so n1[1] stays NA
for (i in 2:10) {
  clS_mural <- clara(rgbMural[, c("r.value", "g.value", "b.value")], i)
  n1[i] <- clS_mural$silinfo$avg.width
}

plot(n1, type='l', main="Optimal number of clusters (Silhouette score)", xlab="Number of clusters", ylab="Average silhouette", col="blue")
points(n1, pch=21, bg="navyblue")
abline(h=(1:30)*5/100, lty=3, col="grey50")

clara_mural<-clara(rgbMural[,3:5], 4) 
plot(silhouette(clara_mural))


Based on the Silhouette Method, the average silhouette width peaks at 4 clusters, indicating that this configuration has the best-defined clustering structure. Clusters 1, 3, and 4 have higher silhouette widths, indicating better-defined groupings. Cluster 2, however, has a low silhouette width, implying that its points may not be well separated from points in other clusters. Overall, an average silhouette width above 0.5 is reasonable, but the low value for Cluster 2 suggests that a second method should also be applied.
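
The per-cluster widths read off the silhouette plot are also stored in the fitted object. The sketch below assumes synthetic data and an arbitrary cut-off of 0.5. It reads the silinfo component returned by clara(), which is computed on CLARA's best subsample, not on every observation.

```r
library(cluster)

set.seed(2)
# Synthetic data: two tight groups and one diffuse group (illustrative only)
toy <- rbind(
  matrix(rnorm(300, mean = 0.2, sd = 0.02), ncol = 3),
  matrix(rnorm(300, mean = 0.8, sd = 0.02), ncol = 3),
  matrix(rnorm(300, mean = 0.5, sd = 0.10), ncol = 3)
)

fit <- clara(toy, k = 3, samples = 10, sampsize = 150)

# Average silhouette width per cluster and overall
per_cluster <- fit$silinfo$clus.avg.widths
overall     <- fit$silinfo$avg.width

# Clusters whose average width falls below the chosen cut-off (0.5 is an assumption)
weak <- which(per_cluster < 0.5)
```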

Elbow Method.

n2<-c() 
for (h in 1:10) {
  clE_mural<-clara(rgbMural[, c("r.value", "g.value", "b.value")], h)
  n2[h]<-clE_mural$objective
}

plot(n2, type = 'l', main = "Optimal Number of Clusters (Elbow Method)",
     xlab = "Number of Clusters", ylab = "CLARA objective function", col = "blue")
points(n2, pch = 21, bg = "navyblue")
# Mark the sharpest bend: the largest second difference, shifted by 1 to map back to k
abline(v = which.max(diff(diff(n2))) + 1, lty = 3, col = "red")


According to the Elbow Method, 4 clusters is also the optimal number, because it marks the point beyond which adding more clusters no longer substantially reduces the within-cluster dispersion. Since both evaluation metrics suggest the same number of clusters, this number is applied.
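
One common heuristic for locating the elbow numerically is to pick the k with the largest second difference of the objective curve. The helper elbow_k() below is hypothetical and the curve is made up. It is a sketch of the heuristic, not a replacement for inspecting the plot.

```r
# Hypothetical helper: returns the k at which the curve bends most sharply.
# `obj` is assumed to hold the objective for k = 1, 2, ..., length(obj).
elbow_k <- function(obj) {
  curvature <- diff(diff(obj))   # entry j is centred on k = j + 1
  which.max(curvature) + 1
}

# Illustrative curve: steep drops up to k = 4, then nearly flat
toy_obj <- c(100, 60, 35, 15, 13, 12, 11, 10.5, 10, 9.8)
elbow_k(toy_obj)   # 4
```

The heuristic is sensitive to noise in the curve, so a near-tie between neighbouring values of k is best resolved by eye or with a second metric.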
Prepare the color representation of the clustered image.

coloursMural <-rgb(clara_mural$medoids[clara_mural$clustering, ])

Plot pixels in the new colours.

plot(rgbMural$y~rgbMural$x, col=coloursMural, pch=".", cex=2, asp=1, main="4 colours")
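
The recolouring step works by replacing every pixel with the medoid of its cluster. A minimal sketch on synthetic pixels (the values and seed are assumptions) shows the lookup and confirms that at most k distinct colours remain:

```r
library(cluster)

set.seed(3)
# Synthetic pixels: one dark and one light group (illustrative values)
toy <- rbind(
  matrix(runif(150, 0.0, 0.2), ncol = 3),
  matrix(runif(150, 0.8, 1.0), ncol = 3)
)
colnames(toy) <- c("r.value", "g.value", "b.value")

fit <- clara(toy, k = 2)

# Look up each pixel's medoid row, then convert the RGB triplets to hex colours
pixel_medoids <- fit$medoids[fit$clustering, , drop = FALSE]
hex <- rgb(pixel_medoids)

table(hex)   # one entry per remaining colour
```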

Pierogi

pierogi<-readJPEG("C:/Users/ydmar/Documents/UW/UW - 1 semester/UL/pierogi.jpg")
class(pierogi)
## [1] "array"

Repeat the same steps as were described above.

plot(1, type="n")
rasterImage(pierogi, 0.6, 0.6, 1.4, 1.4)

dm2 <-dim(pierogi)
dm2
## [1] 874 954   3

The image is an 874 x 954 pixel RGB image. It has 874 rows and 954 columns, making up a total of 833,796 pixels. Each pixel has 3 color values: red, green and blue.

rgbPierogi<-data.frame(x=rep(1:dm2[2], each=dm2[1]),  
                      y=rep(dm2[1]:1, dm2[2]), 
                      r.value=as.vector(pierogi[,,1]),  
                      g.value=as.vector(pierogi[,,2]), 
                      b.value=as.vector(pierogi[,,3]))
head(rgbPierogi)
##   x   y   r.value   g.value   b.value
## 1 1 874 0.3803922 0.2352941 0.1333333
## 2 1 873 0.3803922 0.2352941 0.1333333
## 3 1 872 0.3725490 0.2274510 0.1254902
## 4 1 871 0.3803922 0.2352941 0.1333333
## 5 1 870 0.3686275 0.2235294 0.1215686
## 6 1 869 0.3686275 0.2078431 0.1215686
plot(y~x, 
     data=rgbPierogi, 
     main="Pierogi", 
     col=rgb(rgbPierogi[c("r.value", "g.value", "b.value")]), 
     asp=1, 
     pch="."
)


Silhouette score.

n3 <- rep(NA_real_, 10)  # silhouette is undefined for k = 1, so n3[1] stays NA
for (p in 2:10) {
  clS_pierogi <- clara(rgbPierogi[, c("r.value", "g.value", "b.value")], p)
  n3[p] <- clS_pierogi$silinfo$avg.width
}

plot(n3, type='l', main="Optimal number of clusters (Silhouette score)", xlab="Number of clusters", ylab="Average silhouette", col="blue")
points(n3, pch=21, bg="navyblue")
abline(h=(1:30)*5/100, lty=3, col="grey50")

clara_pierogi2<-clara(rgbPierogi[,3:5], 2) 
plot(silhouette(clara_pierogi2))


Based on the Silhouette Method, the average silhouette width peaks at 2 clusters, indicating that this configuration has the best-defined clustering structure. An average silhouette width above 0.7 is reasonable. Check whether the results obtained using the Elbow Method remain consistent.
Elbow Method.

n4<-c() 
for (d in 1:10) {
  clE_pierogi<-clara(rgbPierogi[, c("r.value", "g.value", "b.value")], d)
  n4[d]<-clE_pierogi$objective
}

plot(n4, type = 'l', main = "Optimal Number of Clusters (Elbow Method)",
     xlab = "Number of Clusters", ylab = "CLARA objective function", col = "blue")
points(n4, pch = 21, bg = "navyblue")
# Mark the sharpest bend: the largest second difference, shifted by 1 to map back to k
abline(v = which.max(diff(diff(n4))) + 1, lty = 3, col = "red")


Based on the Elbow Method, the optimal number of clusters for the “Pierogi” image is 8. Because the Silhouette Score and the Elbow Method give different recommendations, the Calinski-Harabasz (CH) Index is used as an additional metric to make the choice of the number of clusters more reliable.
Calinski-Harabasz index
The Calinski-Harabasz index evaluates cluster quality by measuring the ratio of between-cluster dispersion to within-cluster dispersion. Because cluster.stats() takes a full pairwise distance matrix, which is not feasible in memory for all 833,796 pixels, the index is computed on a random sample of 10,000 observations. A sample of this size is assumed to preserve the essential structure of the dataset: given the dataset’s uniform patterns and consistent distribution of RGB values, the sample is expected to be representative of the entire dataset. Setting a fixed random seed with set.seed() makes the sampling reproducible.
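
To make the ratio concrete, the sketch below computes the CH index by hand from within-cluster and between-cluster sums of squares. The helper ch_index() is hypothetical, and the data are synthetic RGB-like points, not the image sample. For Euclidean distances, the value should agree with the one reported by cluster.stats().

```r
library(cluster)

# Hypothetical helper: Calinski-Harabasz index from raw coordinates and labels
ch_index <- function(x, labels) {
  x <- as.matrix(x)
  n <- nrow(x)
  k <- length(unique(labels))
  total_ss <- sum(scale(x, scale = FALSE)^2)          # squared distances to the overall centroid
  within_ss <- sum(sapply(split(as.data.frame(x), labels), function(g) {
    sum(scale(as.matrix(g), scale = FALSE)^2)         # squared distances to the cluster centroid
  }))
  between_ss <- total_ss - within_ss
  (between_ss / (k - 1)) / (within_ss / (n - k))
}

set.seed(4)
# Three synthetic colour groups, 200 points each (illustrative only)
toy <- rbind(
  matrix(rnorm(600, mean = 0.2, sd = 0.03), ncol = 3),
  matrix(rnorm(600, mean = 0.5, sd = 0.03), ncol = 3),
  matrix(rnorm(600, mean = 0.8, sd = 0.03), ncol = 3)
)

ch_by_k <- sapply(2:6, function(k) ch_index(toy, clara(toy, k)$clustering))
names(ch_by_k) <- 2:6
ch_by_k
```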

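set.seed(123)
# Draw 10,000 pixels at random; the sample keeps all five columns (x, y, r, g, b)
sampled_pierogi <- rgbPierogi[sample(1:nrow(rgbPierogi), size = 10000), ]
ch_indices_pierogi <- sapply(2:10, function(k) {
  # clara() and dist() both receive x and y as well as the RGB values,
  # so pixel position contributes to the distances here
  clara_pierogich <- clara(sampled_pierogi, k)
  cluster.stats(dist(sampled_pierogi), clara_pierogich$clustering)$ch
})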
ch_indices_pierogi
## [1] 5233.343 7281.453 7156.269 9114.986 8814.928 8870.446 9158.971 9087.849
## [9] 9197.451
plot(2:10, ch_indices_pierogi, type = "b", 
     xlab = "Number of Clusters (k)", ylab = "Calinski-Harabasz Index",
     main = "Calinski-Harabasz Index for CLARA Clustering", 
    col = "navyblue",  pch = 16)


The highest CH index value, 9197.451, occurs at 10 clusters, which differs from the number suggested by the other methods. The next highest value, 9158.971, occurs at 8 clusters and is only marginally lower.
Given how close these CH values are, and because the Elbow Method also identifies 8 clusters as optimal, 8 clusters are selected as the most appropriate number for this image. This choice favors agreement between methods.
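
The near-tie argument can be made explicit by listing every k whose CH value lies within a small relative tolerance of the maximum. The helper near_best_k() and the 1% tolerance are assumptions chosen for illustration. The CH values are the ones printed above.

```r
# CH values for k = 2..10 as printed above for the sampled Pierogi data
k_values <- 2:10
ch_pierogi <- c(5233.343, 7281.453, 7156.269, 9114.986, 8814.928,
                8870.446, 9158.971, 9087.849, 9197.451)

# Hypothetical helper: all k whose score is within `tol` (relative) of the best
near_best_k <- function(k, score, tol = 0.01) {
  k[score >= max(score) * (1 - tol)]
}

k_values[which.max(ch_pierogi)]     # 10
near_best_k(k_values, ch_pierogi)   # 5 8 10
```

With this tolerance, 8 clusters is among the near-optimal candidates, which matches the choice made here.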

clara_pierogi8<-clara(rgbPierogi[,3:5], 8) 
coloursPierogi8<-rgb(clara_pierogi8$medoids[clara_pierogi8$clustering, ])
plot(rgbPierogi$y~rgbPierogi$x, col=coloursPierogi8, pch=".", cex=2, asp=1, main="8 colours")

Gdansk, Baltic Sea

setwd("C:/Users/ydmar/Documents/UW/UW - 1 semester/UL")
gdansk_sea <-readJPEG("Gdansk_Sea.jpg")
class(gdansk_sea)
## [1] "array"

Repeat the same steps as were described above.

plot(1, type="n")
rasterImage(gdansk_sea, 0.6, 0.6, 1.4, 1.4)

dm3 <-dim(gdansk_sea)
dm3
## [1]  675 1200    3

The image is a 675 x 1200 pixel RGB image. It has 675 rows and 1200 columns, making up a total of 810,000 pixels. Each pixel has 3 color values: red, green and blue.

rgbGdansk<-data.frame(x=rep(1:dm3[2], each=dm3[1]),  
                      y=rep(dm3[1]:1, dm3[2]), 
                      r.value=as.vector(gdansk_sea[,,1]),  
                      g.value=as.vector(gdansk_sea[,,2]), 
                      b.value=as.vector(gdansk_sea[,,3]))
head(rgbGdansk)
##   x   y   r.value   g.value   b.value
## 1 1 675 0.8470588 0.8980392 0.9725490
## 2 1 674 0.8470588 0.8980392 0.9725490
## 3 1 673 0.8509804 0.9019608 0.9764706
## 4 1 672 0.8549020 0.9058824 0.9803922
## 5 1 671 0.8588235 0.9098039 0.9764706
## 6 1 670 0.8627451 0.9137255 0.9803922
plot(y~x, 
     data=rgbGdansk, 
     main="Gdansk_BalticSea", 
     col=rgb(rgbGdansk[c("r.value", "g.value", "b.value")]), 
     asp=1, 
     pch="."
)


Silhouette score.

n5 <- rep(NA_real_, 10)  # silhouette is undefined for k = 1, so n5[1] stays NA
for (g in 2:10) {
  clS_Gdansk <- clara(rgbGdansk[, c("r.value", "g.value", "b.value")], g)
  n5[g] <- clS_Gdansk$silinfo$avg.width
}

plot(n5, type='l', main="Optimal number of clusters (Silhouette score)", xlab="Number of clusters", ylab="Average silhouette", col="blue")
points(n5, pch=21, bg="navyblue")
abline(h=(1:30)*5/100, lty=3, col="grey50")

claraS_Gdansk <-clara(rgbGdansk[,3:5], 2) 
plot(silhouette(claraS_Gdansk))


Based on the Silhouette Method, the average silhouette width peaks at 2 clusters. An average silhouette width of 0.6 is reasonable. Check whether the results obtained using the Elbow Method remain consistent.
Elbow Method.

n6<-c() 
for (s in 1:10) {
  clE_Gdansk<-clara(rgbGdansk[, c("r.value", "g.value", "b.value")], s)
  n6[s]<-clE_Gdansk$objective
}

plot(n6, type = 'l', main = "Optimal Number of Clusters (Elbow Method)",
     xlab = "Number of Clusters", ylab = "CLARA objective function", col = "blue")
points(n6, pch = 21, bg = "navyblue")
# Mark the sharpest bend: the largest second difference, shifted by 1 to map back to k
abline(v = which.max(diff(diff(n6))) + 1, lty = 3, col = "red")


Similar to the previous image, the Elbow Method indicates that 8 clusters are optimal. To ensure a comprehensive evaluation, apply the CH index.
Calinski-Harabasz index.

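set.seed(123)
# Draw 10,000 pixels at random; the sample keeps all five columns (x, y, r, g, b)
sampled_Gdansk <- rgbGdansk[sample(1:nrow(rgbGdansk), size = 10000), ]
ch_indices_Gdansk <- sapply(2:10, function(v) {
  # clara() and dist() both receive x and y as well as the RGB values,
  # so pixel position contributes to the distances here
  clara_GdanskCh <- clara(sampled_Gdansk, v)
  cluster.stats(dist(sampled_Gdansk), clara_GdanskCh$clustering)$ch
})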
ch_indices_Gdansk
## [1] 13044.503 10133.821  9690.864  8753.373 10943.635 10493.068 10325.053
## [8]  9619.297  9994.434
plot(2:10, ch_indices_Gdansk, type = "b", 
     xlab = "Number of Clusters (k)", ylab = "Calinski-Harabasz Index",
     main = "Calinski-Harabasz Index for CLARA Clustering",
     col = "navyblue",  pch = 16)


The highest CH index (13044.503) indicates that the optimal number of clusters for the Gdansk image is 2. This conclusion aligns with the recommendation from the Silhouette Score, which also suggests that 2 clusters provide the best balance between cluster cohesion and separation. Based on the results from these two methods, a 2-cluster configuration is applied to the image.

coloursGdansk<-rgb(claraS_Gdansk$medoids[claraS_Gdansk$clustering, ])
plot(rgbGdansk$y~rgbGdansk$x, col=coloursGdansk, pch=".", cex=2, asp=1, main="2 colours")


Conclusion

The study shows that the Elbow Method tended to suggest higher numbers of clusters, particularly for images with gradual transitions or intricate details (the “Pierogi” and “Baltic Sea” images). This is because the Elbow Method focuses on the reduction in within-cluster variance (WCSS), and adding more clusters inherently reduces this value. For images like the “Baltic Sea”, dominated by gradients and with fewer distinct regions, a simpler clustering (2 clusters) provides a more meaningful segmentation. The Elbow Method’s suggestion of 8 clusters for this image reflects its tendency to over-segment gradient-heavy datasets, capturing insignificant pixel variations rather than the broader structure of the image.
Images like the Diego Rivera mural, which feature intricate patterns and diverse color palettes, require more clusters to accurately represent their complexity. Both the Elbow Method and the Silhouette Score identified 4 clusters as optimal, demonstrating their ability to capture the distinct regions and details in such images.
Agreement between the Silhouette Score and the CH Index indicates strong clustering quality, as both metrics consider cluster separation and compactness. When methods disagree, their differences provide insight into the nature of the data. For example, the disagreement in the “Pierogi” image highlights a trade-off between simplicity (fewer clusters, Silhouette Score) and granularity (more clusters, CH index).
In summary, the choice of evaluation metrics can be guided by the characteristics of the image:
- Silhouette Score and Calinski-Harabasz Index for images with color gradients;
- Elbow Method and Calinski-Harabasz Index for images with distinct objects that vary in size, shape, or position but have fewer distinct colors;
- Silhouette Score, Elbow Method, and Calinski-Harabasz Index for images with patterns, diverse colors, and detailed structure.