Clustering the colors in pictures is one of the applications of
clustering that I find most interesting. For my final project I
therefore decided to dig deeper into this area and look for packages
that help with processing and visualizing such data. After some research
I came across a package called recolorize and a really insightful blog
post written by Hannah Weller, the creator of the package. You can
access it here.
By combining knowledge from class, ideas proposed in the blog post and
my own concepts, I achieved some nice results. I hope you will find
them interesting!
For my analysis, I used a dataset
found on Kaggle. It consists of around 200 pictures of flowers in
.png format, each 128x128 pixels in size. I chose 15 of them:
5 purple-ish flowers, 5 pink-ish flowers and 5 orange-ish
flowers, which I converted from .png to
.jpg format.
A first step is basic color clustering in pictures using standard clustering packages. This lets us specify the number of colors (clusters) with which each picture is "repainted". It is an effective way to reduce the number of colors in a picture, which can also reduce its file size, and to create visually appealing graphics.
For this you need to have the jpeg,
rasterImage, cluster and
factoextra packages installed.
First I’m setting my working directory to the folder where the pictures are stored and loading the necessary packages.
library(jpeg)
library(rasterImage)
library(factoextra)
library(cluster)
Then I’m looking at one of the pictures from the dataset.
image1 <- readJPEG("selected_flowers/001.jpg")
dim(image1)
## [1] 128 128 3
So every picture has 128x128 pixels and three color layers (red, green and blue - RGB scale).
We can also plot the picture:
# Empty plot without axes, used as a canvas for the image
plot(1, type = "n",
     xaxt = "n", yaxt = "n",
     xlab = "", ylab = "")
rasterImage(image1, 0.6, 0.6, 1.4, 1.4)
Let’s read all 15 pictures in the same way. We build a list that stores, for each picture, the position and RGB values of every pixel.
# Code generation with the help of AI
image_data_list <- list()
for (i in 1:15) {
  file_name <- sprintf("selected_flowers/%03d.jpg", i)
  image <- readJPEG(file_name)
  dm <- dim(image)
  # Data frame with pixel coordinates and RGB values
  rgb_data <- data.frame(
    x = rep(1:dm[2], each = dm[1]),
    y = rep(dm[1]:1, dm[2]),
    r.value = as.vector(image[, , 1]),
    g.value = as.vector(image[, , 2]),
    b.value = as.vector(image[, , 3])
  )
  image_data_list[[paste0("rgbImage", i)]] <- rgb_data
}
#Image 1 - r, g and b values for every pixel
head(image_data_list$rgbImage1)
##   x   y    r.value    g.value     b.value
## 1 1 128 0.05882353 0.08235294 0.074509804
## 2 1 127 0.05882353 0.08627451 0.054901961
## 3 1 126 0.04705882 0.09019608 0.023529412
## 4 1 125 0.05490196 0.10980392 0.007843137
## 5 1 124 0.14901961 0.21176471 0.070588235
## 6 1 123 0.22745098 0.30196078 0.121568627
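To make the reshaping step easier to follow, here is a minimal sketch on a toy 2x3 "image" instead of a real flower. The array values are made up for illustration; only the way the rows and columns are unrolled mirrors the loop above.

```r
# Toy 2 x 3 "image" with 3 channels (made-up values, not a real picture)
toy <- array(seq(0, 1, length.out = 18), dim = c(2, 3, 3))
dm <- dim(toy)

toy_df <- data.frame(
  x = rep(1:dm[2], each = dm[1]),  # column index, repeated once per row
  y = rep(dm[1]:1, dm[2]),         # row index, flipped so that row 1 ends up on top
  r.value = as.vector(toy[, , 1]), # as.vector() unrolls each channel column by column
  g.value = as.vector(toy[, , 2]),
  b.value = as.vector(toy[, , 3])
)

print(toy_df)
```

Each row of the data frame is one pixel, so a 128x128 picture gives 16,384 rows.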
Let’s plot all 15 flowers to get a closer look.
par(mfrow = c(3, 5), mar = c(1, 1, 1, 1))
for (i in 1:15) {
plot(y ~ x, data = image_data_list[[i]], main = paste0("Flower ", i),
col = rgb(image_data_list[[i]][c("r.value", "g.value", "b.value")]),
asp = 1, pch = 15,
xaxt = "n", yaxt = "n",
xlab = "", ylab = "")
}
For each flower we have the r, g and b values of every pixel. This information can be used for clustering. We take the columns that store the RGB data (r.value, g.value and b.value), set the number of clusters and run the algorithm. Because each picture has 16,384 pixels (128 x 128), we use CLARA here, a sampling-based version of k-medoids clustering designed for larger datasets. The number of clusters is the number of colors with which each picture will be painted.
# Code generation with the help of AI
clara_list <- list()
for (i in seq_along(image_data_list)) {
  rgb_data <- image_data_list[[paste0("rgbImage", i)]]
  # Using the 3 RGB columns for clustering and setting the number of clusters
  clara_result <- clara(rgb_data[, 3:5], 4)
  clara_list[[paste0("clara", i)]] <- clara_result
}
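As a small illustration of what clara() returns, here is a sketch on synthetic pixels rather than on the flower data. The three color groups, the seed and the `samples = 20` setting are my own assumptions, chosen only to keep the example repeatable.

```r
library(cluster)

set.seed(42)  # CLARA draws random sub-samples, so fix the seed for repeatability

# Synthetic "pixels": three loose groups around dark green, pink and orange
n <- 300
centers <- rbind(c(0.1, 0.4, 0.1), c(0.9, 0.4, 0.7), c(0.9, 0.6, 0.1))
pixels <- centers[rep(1:3, each = n), ] + matrix(rnorm(3 * n * 3, sd = 0.03), ncol = 3)
pixels <- pmin(pmax(pixels, 0), 1)  # keep values inside [0, 1]
colnames(pixels) <- c("r.value", "g.value", "b.value")

fit <- clara(pixels, k = 3, samples = 20)

fit$medoids            # one representative pixel (a row of RGB values) per cluster
table(fit$clustering)  # how many pixels fall into each cluster
```

The medoids are actual pixels from the data, which is why they can be used directly as the repainting colors.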
With this code we reduced every picture to only 4 colors, the medoids of its clusters. Now I will use these colors to repaint the flower pictures.
par(mfrow = c(3, 5), mar = c(1, 1, 1, 1))
for (i in seq_along(image_data_list)) {
  image_data <- image_data_list[[paste0("rgbImage", i)]]
  clara_result <- clara_list[[paste0("clara", i)]]
  # Each pixel gets the color of the medoid of its cluster
  plot(image_data$y ~ image_data$x,
       col = rgb(clara_result$medoids[clara_result$clustering, , drop = FALSE]),
       pch = 15, cex = 2, asp = 1,
       main = paste("Flower", i),
       xaxt = "n", yaxt = "n",
       xlab = "", ylab = "")
}
We can also look at the outcome with 8 clusters.
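Changing the number of colors only means changing the number of clusters. A rough sketch of this, assuming a hypothetical helper `repaint_colors()` (my own name, not part of any package) and random stand-in pixels instead of a real flower:

```r
library(cluster)

# Hypothetical helper: cluster the RGB columns and return one hex color per pixel
repaint_colors <- function(rgb_values, k) {
  fit <- clara(rgb_values, k = k)
  rgb(fit$medoids[fit$clustering, , drop = FALSE])
}

set.seed(1)
# Stand-in for one flower: 2,000 random pixels (made-up data, not a real image)
fake_pixels <- matrix(runif(2000 * 3), ncol = 3,
                      dimnames = list(NULL, c("r.value", "g.value", "b.value")))

cols4 <- repaint_colors(fake_pixels, 4)
cols8 <- repaint_colors(fake_pixels, 8)

length(unique(cols4))  # number of distinct colors used with 4 clusters
length(unique(cols8))  # number of distinct colors used with 8 clusters
```

With the real data, the returned vector could be passed as `col` in the plotting loop shown earlier.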
This part of the paper is my take on the code provided in the blog post I mentioned earlier. I think recolorize is an excellent package for color clustering that offers many interesting visualizations and options.
First I’m loading the recolorize package.
library(recolorize)
Then I’m loading the flower pictures again and storing them in a list called flowers using readImage(). Then I’m plotting the pictures with plotImageArray(). Thanks to the package, loading and plotting the pictures is much quicker and cleaner.
# Loading pictures of flowers
flowers <- list()
for (i in 1:15) {
  file_name <- sprintf("/Users/monikakot/Desktop/UL/datasets/selected_flowers/%03d.jpg", i)
  flowers[[paste0("flower", i)]] <- readImage(file_name)
}
# Plotting each picture
par(mfrow = c(3, 5), mar = c(1, 1, 1, 1))
for (i in seq_along(flowers)) {
  plotImageArray(flowers[[i]], main = paste0("flower ", i))
}
One of the interesting ways to use this package is to get color palettes based on k-means clustering. With recolorize() you can quickly get color palettes for your pictures. I set the number of clusters for each picture to 5, so I have 5 colors for each picture.
# Plotting flower colors
par(mfrow = c(3, 5), mar = c(1, 1, 1, 1))
for (i in seq_along(flowers)) {
  rc <- recolorize(flowers[[i]], method = "k", n = 5, plotting = FALSE)
  plotColorPalette(rc$centers)
}
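To show what a k-means color palette amounts to, here is a minimal sketch that uses base R's kmeans() on made-up pixels. It is not the recolorize implementation, only the same idea on a small scale: the cluster centers become the palette and the cluster sizes give each color's share.

```r
set.seed(7)  # k-means starts from random centers

# Made-up pixels standing in for an image: rows are pixels, columns are R, G, B in [0, 1]
px <- rbind(
  matrix(rep(c(0.8, 0.2, 0.6), each = 200), ncol = 3),  # pink-ish
  matrix(rep(c(0.2, 0.5, 0.2), each = 200), ncol = 3)   # green-ish
) + matrix(rnorm(400 * 3, sd = 0.02), ncol = 3)
px <- pmin(pmax(px, 0), 1)  # keep values inside [0, 1]

km <- kmeans(px, centers = 2, nstart = 10)

palette_hex <- rgb(km$centers)            # cluster centers as hex colors
palette_share <- km$size / sum(km$size)   # fraction of pixels per color

palette_hex
palette_share
```

Unlike the CLARA medoids used earlier, k-means centers are averages, so a palette color does not have to occur in the picture itself.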
For each picture we now have the colors assigned by k-means clustering. Using recolorize2() we can get even more data. I’m making a list that stores the result for each flower picture. For each picture we get: the color of each pixel in the original image, the cluster centers (the assigned colors) and the color assigned to each pixel after clustering. We also get the number of pixels assigned to each color, which I will use later.
# Making the list
rc_list <- vector("list", length(flowers))
names(rc_list) <- names(flowers)
for (i in seq_along(flowers)) {
  # With method = "k" the number of clusters is set by n (bins applies to the histogram method)
  rc_list[[i]] <- recolorize2(flowers[[i]], n = 5, method = "k", plotting = FALSE)
}
# Storing info about all the palettes and number of pixels with clustered colors for each picture
all_palettes <- do.call(rbind, lapply(rc_list, function(i) i$centers))
all_sizes <- do.call(c, lapply(rc_list, function(i) i$sizes))
Now the cool stuff: using hclust_color() we can plot a dendrogram of the colors found by the clustering algorithm, in which similar colors are grouped together. For each flower picture we got 5 colors, so for 15 pictures we have 75 colors in total.
par(mfrow = c(1, 1))
cluster_list <- hclust_color(all_palettes, n_final = 5)
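The idea behind grouping colors by similarity can be sketched in base R. This is only an approximation of the concept under my own assumptions (five made-up colors, complete linkage), not the internals of hclust_color(): convert the colors to CIE Lab, compute Euclidean distances and cut the tree into groups.

```r
# Five made-up palette colors as sRGB rows in [0, 1] (illustrative values only)
centers_rgb <- rbind(
  c(0.90, 0.40, 0.70),  # pink
  c(0.85, 0.45, 0.75),  # similar pink
  c(0.95, 0.55, 0.10),  # orange
  c(0.90, 0.60, 0.15),  # similar orange
  c(0.10, 0.30, 0.10)   # dark green
)

# Convert to CIE Lab, where Euclidean distance follows perceived difference more closely
centers_lab <- grDevices::convertColor(centers_rgb, from = "sRGB", to = "Lab")

# Hierarchical clustering on the Lab distances (complete linkage is an assumption here)
tree <- hclust(dist(centers_lab), method = "complete")
groups <- cutree(tree, k = 3)

groups  # group label for each of the five colors
```

Calling `plot(tree, labels = rgb(centers_rgb))` would draw the corresponding dendrogram with hex codes as labels.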
Next we can do something even better: we can build one universal palette from all of these colors. As stated in the blog post: “rest of the hclust_color() options are various ways to combine colors by similarity — by default, it calculates the Euclidean distance matrix between all provided color centers in CIE Lab color space”. For each group of similar colors returned by hclust_color(), we take the mean of its colors, weighted by the number of pixels assigned to each color. After that we plot the palette to see which colors were assigned. We do this with the following code:
flower_palette <- matrix(NA, ncol = 3, nrow = length(cluster_list))
for (i in seq_along(cluster_list)) {
  idx <- cluster_list[[i]]
  # Size-weighted mean of each color channel within the group
  ctr <- apply(all_palettes, 2,
               function(j) weighted.mean(j[idx],
                                         w = all_sizes[idx]))
  flower_palette[i, ] <- ctr
}
# Plotting the universal palette
par(mar = rep(0, 4))
plotColorPalette(flower_palette)
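A tiny worked example of the weighted mean used above, with two made-up colors and pixel counts. The color that covers more pixels pulls the merged color toward itself.

```r
# Two made-up cluster centers (RGB) and their pixel counts
centers <- rbind(c(0.9, 0.4, 0.6),
                 c(0.7, 0.2, 0.4))
sizes <- c(300, 100)

# Size-weighted average per channel, e.g. red: (0.9 * 300 + 0.7 * 100) / 400 = 0.85
merged <- apply(centers, 2, function(ch) weighted.mean(ch, w = sizes))
merged
```

Here the merged color is (0.85, 0.35, 0.55), closer to the first color because it accounts for three quarters of the pixels.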
We can see that the colors seem accurate compared to what we saw in the initial color palettes for the flower pictures. With imposeColors() we can paint each picture with the one universal palette. Below is the code for applying it to all the images; then I show the outcome for some of them.
impose_list <- lapply(flowers, function(i) imposeColors(i, flower_palette,
                                                        adjust_centers = FALSE,
                                                        plotting = FALSE))
# flower 1
imposeColors(flowers[[1]], flower_palette,
             adjust_centers = FALSE, plotting = TRUE)
# flower 6
imposeColors(flowers[[6]], flower_palette,
             adjust_centers = FALSE, plotting = TRUE)
# flower 11
imposeColors(flowers[[11]], flower_palette,
             adjust_centers = FALSE, plotting = TRUE)
With this we paint the pictures in the same colors and we see the share of each color in the pictures.
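Conceptually, imposing a fixed palette means giving every pixel its closest palette color and counting how many pixels each color receives. A minimal base R sketch of that idea, with a made-up palette and four made-up pixels; it measures distance in plain RGB and is not the imposeColors() implementation:

```r
# Hypothetical universal palette (rows are RGB colors) and a few made-up pixels
palette <- rbind(c(0.9, 0.4, 0.7),   # pink
                 c(0.9, 0.6, 0.1),   # orange
                 c(0.1, 0.3, 0.1))   # dark green
pixels <- rbind(c(0.85, 0.45, 0.65),
                c(0.15, 0.25, 0.05),
                c(0.95, 0.55, 0.20),
                c(0.05, 0.35, 0.15))

# Squared Euclidean distance from every pixel (row) to every palette color (column)
d2 <- sapply(seq_len(nrow(palette)), function(j) {
  rowSums(sweep(pixels, 2, palette[j, ])^2)
})

assignment <- apply(d2, 1, which.min)                 # closest palette color per pixel
sizes <- tabulate(assignment, nbins = nrow(palette))  # pixels per palette color

assignment
sizes
```

The `sizes` vector plays the same role as the per-color pixel counts used in the next step.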
This part of the code was my idea for using these packages and the universal colors for painting each flower. My initial idea for this project was to cluster the flower pictures by their color similarity - so that orange flowers would be in one cluster, pink flowers in another cluster etc. With this outcome I thought I could make it work.
The idea was to make a data frame that stores, for each flower, how
many pixels were painted with each of the five colors.
With the help of AI I made it work. For this I loaded the dplyr
package.
# Code generation with the help of AI
library(dplyr)
flower_df <- bind_rows(
  lapply(1:15, function(i) {
    flower_name <- paste0("flower", i)
    sizes_vector <- impose_list[[flower_name]]$sizes
    # One row per flower: its name and the pixel count for each palette color
    data.frame(flower = flower_name, t(sizes_vector))
  })
)
# Use the flower names as row names and drop the name column
row.names(flower_df) <- flower_df[, 1]
flower_df <- flower_df[, -1]
flower_df
##             X1   X2    X3   X4   X5
## flower1   1042 7776  7470   38   58
## flower2   2085 4237 10055    7    0
## flower3    448 7538  8381   17    0
## flower4   2589 6897  6874    0   24
## flower5   5446 9891   958   11   78
## flower6   8461 7014     0    6  903
## flower7   5584 9900   174    0  726
## flower8  10153 6030   185    0   16
## flower9  11378 4814   192    0    0
## flower10  7145 8776     0   25  438
## flower11     0 9234     0 6923  227
## flower12     0 8835     7 7443   99
## flower13     0 9298     0 6344  742
## flower14     0 9002    12 4464 2906
## flower15     0 8552     0 1971 5861
After this I applied the k-means algorithm to cluster the flowers. I set the number of clusters to three because my dataset contains flowers in three colors.
flower_clusters <- eclust(flower_df, "kmeans", hc_metric = "euclidean", k=3)
The outcome is really good! We can see that flowers 1 to 4 were grouped together, flowers 5 to 10 together and flowers 11 to 15 together. That is in line with what we see in the pictures :) Below you can find the clusters.
Pink-ish cluster
Violet-ish cluster
Orange-ish cluster
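For anyone who wants to re-run the grouping without the image files, here is a sketch that rebuilds the count table printed above and clusters it with base R's kmeans(). The seed and `nstart = 25` are my own assumptions for repeatability, so the cluster labels (and possibly the split) may differ from the eclust() run.

```r
# Pixel counts per palette color, copied from the table above
counts <- matrix(c(
   1042, 7776,  7470,   38,   58,
   2085, 4237, 10055,    7,    0,
    448, 7538,  8381,   17,    0,
   2589, 6897,  6874,    0,   24,
   5446, 9891,   958,   11,   78,
   8461, 7014,     0,    6,  903,
   5584, 9900,   174,    0,  726,
  10153, 6030,   185,    0,   16,
  11378, 4814,   192,    0,    0,
   7145, 8776,     0,   25,  438,
      0, 9234,     0, 6923,  227,
      0, 8835,     7, 7443,   99,
      0, 9298,     0, 6344,  742,
      0, 9002,    12, 4464, 2906,
      0, 8552,     0, 1971, 5861
), ncol = 5, byrow = TRUE,
dimnames = list(paste0("flower", 1:15), paste0("X", 1:5)))

# Sanity check: every row sums to 128 * 128 pixels
stopifnot(all(rowSums(counts) == 128 * 128))

set.seed(123)  # assumption: any fixed seed, so the random starts are repeatable
km <- kmeans(counts, centers = 3, nstart = 25)

split(rownames(counts), km$cluster)  # flowers listed per cluster
```

Because every picture has the same number of pixels, the raw counts can be compared directly without converting them to proportions first.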
This project showed my findings on color clustering in pictures. The
algorithms and packages used here aren’t that advanced, but they produce
really interesting and eye-pleasing results. The recolorize package
allows for many interesting analyses. Other packages I plan to use in
the future are colordistance and patternize. These three together allow
for fascinating work with color in pictures.
Read more about applying them: