Introduction

Clustering colors in pictures has been a very interesting application of clustering for me. Therefore, for my final project, I decided to dig deeper into this area and look for interesting packages for processing and visualizing such applications. After some research I stumbled upon a package called recolorize and a really insightful blog post written by Hannah Weller, the creator of this package. You can access it here. By connecting knowledge from class, ideas proposed in the blog post and my own concepts, I achieved some nice results. I hope you will find them interesting!

Dataset

For my analysis, I used a dataset found on Kaggle. It consists of around 200 pictures of flowers in .png format, each 128x128 pixels in size. I chose 15 of them (5 purple-ish, 5 pink-ish and 5 orange-ish flowers) and converted them from .png to .jpg format.

Basic color clustering

A first step is basic color clustering using standard clustering packages. This lets us specify the number of colors (clusters) with which each picture is “repainted”. It is an effective way to reduce picture size and create visually appealing graphics.

For this you need to have the jpeg, rasterImage, cluster and factoextra packages installed.

First I’m setting my working directory to the place in which I’m storing the pictures and I’m loading the necessary packages.

library(jpeg)
library(rasterImage)
library(factoextra)
library(cluster)

Then I’m looking at one of the pictures from the dataset.

image1 <- readJPEG("selected_flowers/001.jpg") 
dim(image1)
## [1] 128 128   3

So every picture has 128x128 pixels and three color layers (red, green and blue - RGB scale).
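
To make the array layout concrete, here is a minimal sketch on a made-up 2x2 image (the array toy is hypothetical and not part of the dataset). It assumes the same layout as the output of readJPEG(): values between 0 and 1, with the third dimension holding the red, green and blue layers. Base R's rgb() then turns each pixel into a hex color.

```r
# Hypothetical 2x2 image: values in [0, 1], third dimension = R, G, B layers
toy <- array(0, dim = c(2, 2, 3))
toy[1, 1, ] <- c(1, 0, 0)   # top-left pixel: pure red
toy[1, 2, ] <- c(0, 1, 0)   # top-right pixel: pure green
toy[2, 1, ] <- c(0, 0, 1)   # bottom-left pixel: pure blue
toy[2, 2, ] <- c(1, 1, 1)   # bottom-right pixel: white

dim(toy)                     # 2 rows, 2 columns, 3 layers

# Combine the three layers into one hex color per pixel
toy_hex <- matrix(rgb(as.vector(toy[, , 1]),
                      as.vector(toy[, , 2]),
                      as.vector(toy[, , 3])),
                  nrow = 2)
toy_hex
```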

We can also plot the picture:

plot(1, type="n",
     xaxt = "n", yaxt = "n",
     xlab = "", ylab = "")
rasterImage(image1, 0.6, 0.6, 1.4, 1.4)

Let’s do this for all 15 pictures. We are making a list that will store the information about pictures.

# Code generation with the help of AI

image_data_list <- list()
for (i in 1:15) {
  file_name <- sprintf("selected_flowers/%03d.jpg", i)
  image <- readJPEG(file_name)
  dm <- dim(image)
  
  # Dataframe for rgb values
  rgb_data <- data.frame(
    x = rep(1:dm[2], each = dm[1]),
    y = rep(dm[1]:1, dm[2]),
    r.value = as.vector(image[,,1]),
    g.value = as.vector(image[,,2]),
    b.value = as.vector(image[,,3])
  )
  
  image_data_list[[paste0("rgbImage", i)]] <- rgb_data
}
#Image 1 - r, g and b values for every pixel
head(image_data_list$rgbImage1)
##   x   y    r.value    g.value     b.value
## 1 1 128 0.05882353 0.08235294 0.074509804
## 2 1 127 0.05882353 0.08627451 0.054901961
## 3 1 126 0.04705882 0.09019608 0.023529412
## 4 1 125 0.05490196 0.10980392 0.007843137
## 5 1 124 0.14901961 0.21176471 0.070588235
## 6 1 123 0.22745098 0.30196078 0.121568627
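
The x and y columns in the loop above rely on R storing matrices column by column. A small sketch with a made-up 2x3 color layer (layer and toy_df are hypothetical names) shows why x repeats each column index once per row, and why y counts down so that the first image row ends up at the top of the plot.

```r
# Hypothetical single color layer with 2 rows and 3 columns
layer <- matrix(c(0.1, 0.2,
                  0.3, 0.4,
                  0.5, 0.6), nrow = 2)
dm <- dim(layer)

# as.vector() walks down each column in turn (column-major order)
toy_df <- data.frame(
  x = rep(1:dm[2], each = dm[1]),  # column index, repeated once per row
  y = rep(dm[1]:1, dm[2]),         # row index flipped so row 1 plots at the top
  value = as.vector(layer)
)
toy_df
```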

Let’s plot all 15 flowers to get a closer look.

par(mfrow = c(3, 5), mar = c(1, 1, 1, 1))
for (i in 1:15) {
  plot(y ~ x, data = image_data_list[[i]], main = paste0("Flower ", i),
       col = rgb(image_data_list[[i]][c("r.value", "g.value", "b.value")]),
       asp = 1, pch = 15,
       xaxt = "n", yaxt = "n",
       xlab = "", ylab = "")
}

For each flower we have information about the r, g and b layers for every pixel. This information can be used for clustering. We take the columns that store the RGB data (r.value, g.value and b.value), set the number of clusters and run the algorithm. Because each picture has 16,384 pixels, we use CLARA here, which clusters subsamples of the data instead of the full set. The number of clusters is the number of colors that every picture will be painted with.

# Code generation with the help of AI

clara_list <- list()
for (i in 1:length(image_data_list)) {
  rgb_data <- image_data_list[[paste0("rgbImage", i)]]
  # Using 3 columns for clustering and setting the number of clusters
  clara_result <- clara(rgb_data[, 3:5], 4)
  
  clara_list[[paste0("clara", i)]] <- clara_result
}
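
To show what clara() returns, here is a minimal sketch on made-up pixel data rather than the flower pictures. The names base_colors, pixels and toy_clara are hypothetical, and the values chosen for samples and sampsize are only illustrative. The medoids are the representative colors, and clustering holds the cluster index of every pixel.

```r
library(cluster)

# Hypothetical pixel data: 3,000 pixels scattered around three base colors
set.seed(1)
base_colors <- rbind(c(0.9, 0.2, 0.2),   # reddish
                     c(0.2, 0.7, 0.3),   # greenish
                     c(0.2, 0.3, 0.8))   # bluish
pixels <- base_colors[sample(1:3, 3000, replace = TRUE), ] +
  matrix(rnorm(3000 * 3, sd = 0.03), ncol = 3)
pixels <- pmin(pmax(pixels, 0), 1)        # keep values inside [0, 1]
colnames(pixels) <- c("r.value", "g.value", "b.value")

# samples = number of subsamples drawn, sampsize = size of each subsample
toy_clara <- clara(pixels, k = 3, samples = 10, sampsize = 200)

toy_clara$medoids              # one representative pixel per cluster
table(toy_clara$clustering)    # number of pixels assigned to each cluster
```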

With this code we clustered the values in every picture to get only 4 colors. Now I will use these clustered colors to paint the flower pictures.

par(mfrow = c(3, 5), mar = c(1, 1, 1, 1))

for (i in 1:length(image_data_list)) {
  image_data <- image_data_list[[paste0("rgbImage", i)]]
  clara_result <- clara_list[[paste0("clara", i)]]

  plot(image_data$y ~ image_data$x,
       col = rgb(clara_result$medoids[clara_result$clustering, , drop = FALSE]),
       pch = 15, cex = 2, asp = 1,
       main = paste("Flower", i),
       xaxt = "n", yaxt = "n",
       xlab = "", ylab = "")
}

We can also look at the outcome with 8 clusters.
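
Changing the number of colors only requires changing the second argument of clara(). One way to make that easy to experiment with is a small helper, sketched here under the assumption that the pixel data has the same three RGB columns as above. The function repaint_colors and the data toy_pixels are hypothetical and not part of the original code.

```r
library(cluster)

# Hypothetical helper: cluster RGB columns with CLARA and return one hex color
# per pixel, taken from the medoid of the cluster the pixel belongs to
repaint_colors <- function(rgb_values, k) {
  fit <- clara(rgb_values, k)
  rgb(fit$medoids[fit$clustering, , drop = FALSE])
}

# Made-up pixel data standing in for image_data_list[[i]][, 3:5]
set.seed(42)
toy_pixels <- data.frame(r.value = runif(2000),
                         g.value = runif(2000),
                         b.value = runif(2000))

colors_4 <- repaint_colors(toy_pixels, 4)
colors_8 <- repaint_colors(toy_pixels, 8)

length(unique(colors_4))   # at most 4 distinct colors
length(unique(colors_8))   # at most 8 distinct colors
```

The returned vector can be passed to the col argument of plot() in the same way as in the plotting loop above.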

Recolorize package

This part of the paper is my take on the code provided in the blog post I mentioned earlier. I think recolorize is an excellent package for performing color clustering that gives us many interesting visualizations and options.

First I’m loading the recolorize package.

library(recolorize)

Then I’m loading the flower pictures again and storing them in a list called flowers with the use of readImage(). After that I’m plotting the pictures with plotImageArray(). Thanks to the package, loading and plotting the pictures is much quicker and cleaner.

# Loading pictures of flowers
flowers <- list()
for (i in 1:15) {
  file_name <- sprintf("/Users/monikakot/Desktop/UL/datasets/selected_flowers/%03d.jpg", i)
  flowers[[paste0("flower", i)]] <- readImage(file_name)
}

# Plotting each picture
par(mfrow = c(3, 5), mar = c(1, 1, 1, 1))
for (i in 1:length(flowers)) {
  plotImageArray(flowers[[i]], main = paste0("flower ", i))
}

Color palettes

One of the interesting ways to use this package is to get palettes of colors based on k-means clustering. With the use of recolorize() you can quickly get color palettes for your pictures. I set the number of clusters for each picture to 5, so I have 5 colors for each picture.

# Plotting flower colors
par(mfrow = c(3, 5), mar = c(1, 1, 1, 1))
for (i in 1:length(flowers)) {
  rc <- recolorize(flowers[[i]], method = "k", n = 5, plotting = FALSE)
  plotColorPalette(rc$centers)
}
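
What a k-means palette consists of can be sketched with base R's kmeans() on made-up pixels. This is only an illustration of the idea of cluster centers and sizes, not a reproduction of how recolorize() works internally; toy_rgb and km are hypothetical names, and the nstart and iter.max values are arbitrary choices.

```r
# Made-up pixel data: 1,000 random RGB triplets in [0, 1]
set.seed(123)
toy_rgb <- matrix(runif(1000 * 3), ncol = 3,
                  dimnames = list(NULL, c("r", "g", "b")))

# k-means with 5 centers; nstart reruns the algorithm from several random starts
km <- kmeans(toy_rgb, centers = 5, nstart = 10, iter.max = 50)

km$centers          # 5 x 3 matrix of cluster centers, i.e. the palette
km$size             # number of pixels assigned to each palette color
rgb(km$centers)     # the same palette as hex codes
```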

For each picture we have colors assigned by k-means clustering. Using recolorize2() we can get even more data. I’m making a list that will store the data for each flower picture. For each picture we get: the colors of each pixel in the original image, the centers of the clusters (the assigned colors) and the color assigned to each pixel after clustering. I will also use the number of pixels assigned to each color later on.

# Making the list
rc_list <- vector("list", length(flowers))
names(rc_list) <- names(flowers)

for (i in 1:length(flowers)) {
  # With method = 'k' the number of colors is set by n; bins applies to the histogram method
  rc_list[[i]] <- recolorize2(flowers[[i]], n = 5, method = 'k', plotting = FALSE)
}

# Storing info about all the palettes and number of pixels with clustered colors for each picture
all_palettes <- do.call(rbind, lapply(rc_list, function(i) i$centers))
all_sizes <- do.call(c, lapply(rc_list, function(i) i$sizes))

Hierarchical color clustering

Now the cool stuff: using hclust_color() we can plot a dendrogram of the colors assigned by the clustering algorithms, in which similar colors are grouped together. For each flower picture we got 5 colors, so for 15 pictures we have 75 colors in total.

par(mfrow = c(1, 1))
cluster_list <- hclust_color(all_palettes, n_final = 5)

Universal color palette

Next we can do something even better: we can make one universal palette from all of these colors. As stated in the blog post: “rest of the hclust_color() options are various ways to combine colors by similarity — by default, it calculates the Euclidean distance matrix between all provided color centers in CIE Lab color space”. For every group of similar colors we compute an average color, weighted by the number of pixels behind each color, and then plot the palette to see what colors were assigned. We do this with the following code:

flower_palette <- matrix(NA, ncol = 3, nrow = length(cluster_list))

for (i in 1:length(cluster_list)) {
  idx <- cluster_list[[i]]
  ctr <- apply(all_palettes, 2, 
               function(j) weighted.mean(j[idx], 
                                         w = all_sizes[idx]))
  flower_palette[i, ] <- ctr
}

# Plotting the universal palette
par(mar = rep(0, 4))
plotColorPalette(flower_palette)
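
The weighted average used in the loop above can be checked by hand on a tiny example. The sketch below uses two made-up colors and made-up pixel counts (toy_centers, toy_sizes and merged are hypothetical names).

```r
# Two hypothetical palette colors (rows; columns are r, g, b) from the same group
toy_centers <- rbind(c(0.8, 0.2, 0.2),
                     c(0.4, 0.2, 0.6))
toy_sizes <- c(300, 100)   # number of pixels behind each color

# Size-weighted average per channel: the larger patch pulls the result towards itself
merged <- apply(toy_centers, 2, function(ch) weighted.mean(ch, w = toy_sizes))
merged
```

For the red channel this gives (0.8 * 300 + 0.4 * 100) / 400 = 0.7, so the merged color sits closer to the first color, which covers three times as many pixels.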

We can see that the colors seem accurate compared to what we saw in the initial color palettes for the flower pictures. With the use of imposeColors() we can paint each picture with the one universal palette. Below is the code for applying it to all the images, followed by the outcome for some of them.

impose_list <- lapply(flowers, function(i) imposeColors(i, flower_palette,
                                                        adjust_centers = FALSE,
                                                        plotting = FALSE))
# flower 1
imposeColors(flowers[[1]], flower_palette,
             adjust_centers = FALSE, plotting = TRUE)

# flower 6
imposeColors(flowers[[6]], flower_palette,
             adjust_centers = FALSE, plotting = TRUE)

# flower 11
imposeColors(flowers[[11]], flower_palette,
             adjust_centers = FALSE, plotting = TRUE)

With this we paint the pictures in the same colors and we see the share of each color in the pictures.
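
Conceptually, painting a picture with a fixed palette means giving every pixel the palette color closest to it. Here is a minimal base R sketch of that idea on made-up data. It assumes plain Euclidean distance in RGB, which may differ from the details of imposeColors(); two_colors, three_pixels, assignment and sizes are hypothetical names.

```r
# Hypothetical palette (rows) and pixels (rows), all as RGB values in [0, 1]
two_colors <- rbind(c(1, 0, 0),    # red
                    c(0, 0, 1))    # blue
three_pixels <- rbind(c(0.9, 0.1, 0.0),
                      c(0.8, 0.0, 0.2),
                      c(0.1, 0.2, 0.9))

# Squared Euclidean distance from every pixel (rows) to every palette color (columns)
d2 <- sapply(seq_len(nrow(two_colors)), function(j) {
  rowSums(sweep(three_pixels, 2, two_colors[j, ])^2)
})

# Index of the closest palette color for each pixel
assignment <- max.col(-d2, ties.method = "first")
assignment

# Pixel count per palette color, including colors that were never used
sizes <- tabulate(assignment, nbins = nrow(two_colors))
sizes
```

The first two pixels are closest to red and the third to blue, so the counts are 2 and 1.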

Clustering photos by color similarity

This part of the code was my idea for using these packages and the universal colors for painting each flower. My initial idea for this project was to cluster the flower pictures by their color similarity - so that orange flowers would be in one cluster, pink flowers in another cluster etc. With this outcome I thought I could make it work.

The idea was to make a data frame that stores, for each flower, how many pixels were painted with each of the five colors. With the help of AI I made it work. For this I loaded the dplyr library.

# Code generation with the help of AI

library(dplyr)

flower_df <- bind_rows(
  lapply(1:15, function(i) {
    flower_name <- paste0("flower", i)
    sizes_vector <- impose_list[[flower_name]]$sizes
    data.frame(flower = flower_name, t(sizes_vector))
  })
)
row.names(flower_df) <- flower_df[,1]
flower_df <- flower_df[,-1]
flower_df
##             X1   X2    X3   X4   X5
## flower1   1042 7776  7470   38   58
## flower2   2085 4237 10055    7    0
## flower3    448 7538  8381   17    0
## flower4   2589 6897  6874    0   24
## flower5   5446 9891   958   11   78
## flower6   8461 7014     0    6  903
## flower7   5584 9900   174    0  726
## flower8  10153 6030   185    0   16
## flower9  11378 4814   192    0    0
## flower10  7145 8776     0   25  438
## flower11     0 9234     0 6923  227
## flower12     0 8835     7 7443   99
## flower13     0 9298     0 6344  742
## flower14     0 9002    12 4464 2906
## flower15     0 8552     0 1971 5861

After this I applied the k-means algorithm to cluster the flowers. I set the number of clusters to three because I have flowers in three colors in my dataset.

flower_clusters <- eclust(flower_df, "kmeans", hc_metric = "euclidean", k=3)
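
The same step can also be sketched with base R's kmeans(), using the pixel counts copied from the flower_df table printed above. This is a hedged illustration rather than a replacement for the eclust() call: the seed and nstart value are arbitrary choices, counts and km are hypothetical names, and the cluster labels and memberships may differ from the eclust() result.

```r
# Pixel counts copied from the flower_df table printed above
counts <- matrix(c(
   1042, 7776,  7470,   38,   58,
   2085, 4237, 10055,    7,    0,
    448, 7538,  8381,   17,    0,
   2589, 6897,  6874,    0,   24,
   5446, 9891,   958,   11,   78,
   8461, 7014,     0,    6,  903,
   5584, 9900,   174,    0,  726,
  10153, 6030,   185,    0,   16,
  11378, 4814,   192,    0,    0,
   7145, 8776,     0,   25,  438,
      0, 9234,     0, 6923,  227,
      0, 8835,     7, 7443,   99,
      0, 9298,     0, 6344,  742,
      0, 9002,    12, 4464, 2906,
      0, 8552,     0, 1971, 5861
), ncol = 5, byrow = TRUE,
dimnames = list(paste0("flower", 1:15), paste0("X", 1:5)))

# Every picture has 128 * 128 = 16384 pixels, so the rows are directly comparable
rowSums(counts)

set.seed(123)                                   # k-means starts from random centers
km <- kmeans(counts, centers = 3, nstart = 25)  # keep the best of 25 random starts
split(rownames(counts), km$cluster)             # flowers grouped by cluster
```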

The outcome is really good! We can see that eclust() grouped flowers 1 to 4 together, flowers 5 to 10 together and flowers 11 to 15 together. That is in line with what we see in the pictures :) Below you can find the clusters.

Clusters with assigned flowers

Pink-ish cluster

Violet-ish cluster

Orange-ish cluster

Summary

This project showed my findings on color clustering in pictures. The algorithms and packages used here aren’t that advanced, but they give really interesting and eye-pleasing results. The recolorize package allows for many interesting analyses. Other packages I plan to use in the future are colordistance and patternize. These three together allow for fascinating work with color in pictures.

Read more about applying them: