In this article I will show how to use unsupervised learning methods for image clustering. The idea is to cluster two images and find the differences between them. I will compare two mesmerising pictures that I made in Paint.
library(jpeg)     # readJPEG()
library(cluster)  # clara(), silhouette()
library(plotrix)  # draw.circle()
The first step is to read the pictures. To open a .jpg file we need the jpeg package from CRAN, which provides readJPEG().
image1 <- readJPEG("test1.jpg")
image2 <- readJPEG("test2.jpg")
To process an image, we need to represent it as numbers. The most suitable format is RGB, which gives the proportion of red, green and blue in each pixel of the image. readJPEG() returns a three-dimensional array (height × width × 3) whose three layers hold the red, green and blue intensities, scaled to the range 0 to 1 rather than the 0 to 255 range of 8-bit colour. Reshaping this array gives a table that is easy to process: one row per pixel, with one column each for red, green and blue. Because the pictures will be compared pixel by pixel, we need to check the dimensions first; both pictures must be the same size, and here both are 646 × 521 pixels (height × width). Then we can convert each image into a data frame of pixel coordinates and RGB values and display it in a plot.
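As a minimal sketch of this layout, the toy array below is made up for illustration (it is not read from a file) but has the same height × width × channel shape as the array returned by readJPEG(), assuming values between 0 and 1. Multiplying by 255 gives the familiar 0 to 255 scale.

```r
# Hypothetical 2 x 2 RGB image with the same height x width x channel layout
# as the array returned by readJPEG(); values lie between 0 and 1.
img <- array(c(1, 0, 0, 1,   # red layer (filled column by column)
               0, 1, 0, 1,   # green layer
               0, 0, 1, 1),  # blue layer
             dim = c(2, 2, 3))

dim(img)                     # height, width, number of channels
img255 <- round(img * 255)   # the same colours on the 0-255 scale
img255[1, 1, ]               # top-left pixel: pure red
img255[2, 2, ]               # bottom-right pixel: white
```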
dm1 <- dim(image1);dm1[1:2]
## [1] 646 521
dm2 <- dim(image2);dm2[1:2]
## [1] 646 521
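# One row per pixel. as.vector() reads each colour layer column by column,
# so x repeats each column index once per row, and y counts down from the
# top row so that the picture stays upright when plotted.
rgbImage1 <- data.frame(
x=rep(1:dm1[2], each=dm1[1]),
y=rep(dm1[1]:1, dm1[2]),
r.value=as.vector(image1[,,1]),
g.value=as.vector(image1[,,2]),
b.value=as.vector(image1[,,3]))
rgbImage2 <- data.frame(
x=rep(1:dm2[2], each=dm2[1]),
y=rep(dm2[1]:1, dm2[2]),
r.value=as.vector(image2[,,1]),
g.value=as.vector(image2[,,2]),
b.value=as.vector(image2[,,3]))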
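To see why the rep() calls and as.vector() line up, here is a small sketch on a made-up 2 × 3 image with distinct red values: the top-left pixel ends up at x = 1, y = 2 (the top of the plot) and the bottom-right pixel at x = 3, y = 1.

```r
# Made-up 2-row x 3-column image; every red value is distinct so it can be traced
h <- 2
w <- 3
red <- matrix(1:6 / 6, nrow = h)
img <- array(c(red, red / 2, red / 3), dim = c(h, w, 3))

df <- data.frame(
  x = rep(1:w, each = h),        # column index, repeated once per row
  y = rep(h:1, w),               # row index, flipped so row 1 is at the top
  r.value = as.vector(img[, , 1]))

df
```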
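The helper round_df() below rounds every numeric column of a data frame to a given number of decimal places, so that the RGB values of both images are stored with the same precision (four decimal places here).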
round_df <- function(x, digits) {
numeric_columns <- sapply(x, mode) == 'numeric'
x[numeric_columns] <- round(x[numeric_columns], digits)
x
}
rgbImage1 <- round_df(rgbImage1, 4)
rgbImage2 <- round_df(rgbImage2, 4)
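Assuming the pictures are 8-bit JPEGs, the values from readJPEG() are multiples of 1/255, about 0.0039 apart, so rounding to four decimal places never merges two different colour levels. The sketch below checks this for all 256 levels; in other words, round_df() tidies the numbers without changing which pixels later compare as equal.

```r
# Assumed 8-bit colour levels as readJPEG() would return them: k / 255
levels8 <- (0:255) / 255
rounded <- round(levels8, 4)

# Still 256 distinct values after rounding to 4 decimals
length(unique(rounded))
```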
plot(y ~ x, data=rgbImage1,
col = rgb(rgbImage1[c("r.value", "g.value", "b.value")]),
asp = 1, pch = ".")
plot(y ~ x, data=rgbImage2,
col = rgb(rgbImage2[c("r.value", "g.value", "b.value")]),
asp = 1, pch = ".")
## Running the CLARA algorithm
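CLARA (Clustering Large Applications) is an extension of the k-medoids (PAM) method for data containing a large number of objects (more than several thousand observations). Instead of running PAM on the whole data set, it runs PAM on several random subsamples, assigns every object to its nearest medoid and keeps the subsample whose medoids give the lowest average dissimilarity, which reduces computing time and memory use. Each of our images has 646 × 521 = 336,566 pixels, so this matters here. It's time to run clara() on the first image, grouping its pixel colours into 10 clusters.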
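A minimal sketch of clara() on made-up data: 6000 synthetic "pixels" drawn around three colours (reddish, greenish, bluish). The data and the choice of k = 3 are assumptions for illustration; `samples` is clara()'s argument for the number of subsamples.

```r
library(cluster)

# Synthetic pixels: three colour groups with a little noise, 2000 each
set.seed(42)
centres <- rbind(c(0.9, 0.1, 0.1),
                 c(0.1, 0.9, 0.1),
                 c(0.1, 0.1, 0.9))
pixels <- centres[rep(1:3, each = 2000), ] +
  matrix(rnorm(6000 * 3, sd = 0.02), ncol = 3)

# PAM on 5 random subsamples; the medoids with the lowest average
# dissimilarity over all 6000 points are kept
fit <- clara(pixels, k = 3, samples = 5)

fit$medoids             # one representative colour per cluster
table(fit$clustering)   # cluster sizes
```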
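test1 <- rgbImage1[, c("r.value", "g.value", "b.value")]
# Cluster the pixel colours into 10 groups; the result is called clara1 so
# that it does not mask the clara() function itself
clara1 <- clara(test1, 10)
plot(silhouette(clara1))
# Replace every pixel by the colour of its cluster medoid
colours <- rgb(clara1$medoids[clara1$clustering, ])
plot(y ~ x, data=rgbImage1,
col = colours,
asp = 1, pch = ".")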
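The line that builds the colours indexes the medoid matrix by the cluster labels, which gives one RGB row per pixel. A tiny sketch with hypothetical medoids (pure red and pure blue) and four pixels:

```r
# Hypothetical clara() output: two medoid colours and four pixel labels
medoids <- rbind(c(1, 0, 0),    # cluster 1: red
                 c(0, 0, 1))    # cluster 2: blue
clustering <- c(1, 2, 2, 1)

# One RGB row per pixel, turned into hex colour strings by rgb()
demo_colours <- rgb(medoids[clustering, ])
demo_colours   # "#FF0000" "#0000FF" "#0000FF" "#FF0000"
```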
Now the same steps for the second image:
test2 = rgbImage2[, c("r.value", "g.value", "b.value")]
clara2 <- clara(test2, 10)
plot(silhouette(clara2))
colours2 <- rgb(clara2$medoids[clara2$clustering, ])
plot(y ~ x, data=rgbImage2,
col = colours2,
asp = 1, pch = ".")
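The first and second image go through identical steps, so they could be wrapped in one function. The helper quantize_image() below is hypothetical (not part of the original article); it assumes a readJPEG()-style array and returns the per-pixel data frame with each pixel's medoid colour added.

```r
library(cluster)

# Hypothetical helper: per-pixel data frame + clara() colour clustering
quantize_image <- function(img, k) {
  h <- dim(img)[1]
  w <- dim(img)[2]
  df <- data.frame(
    x = rep(1:w, each = h),
    y = rep(h:1, w),
    r.value = as.vector(img[, , 1]),
    g.value = as.vector(img[, , 2]),
    b.value = as.vector(img[, , 3]))
  fit <- clara(df[, c("r.value", "g.value", "b.value")], k)
  # Each pixel gets the hex colour of its cluster medoid
  df$colour <- rgb(fit$medoids[fit$clustering, , drop = FALSE])
  df
}

# Toy 10 x 10 image: left half pure red, right half pure blue
img <- array(0, dim = c(10, 10, 3))
img[, 1:5, 1] <- 1
img[, 6:10, 3] <- 1
out <- quantize_image(img, k = 2)
```

With the real pictures this would be called as quantize_image(readJPEG("test2.jpg"), 10) and plotted with plot(y ~ x, data = out, col = out$colour, asp = 1, pch = ".").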
## Finding the percentage of difference
To find the percentage of difference between the two pictures, I loop over every row (pixel) of the RGB data frames, comparing only the three colour columns (not x and y). Two counters record how many pixels have the same colour in both images and how many differ.
cols <- c("r.value", "g.value", "b.value")
dif_counter <- 0
same_counter <- 0
# A pixel counts as different if at least one of its three channels differs
for (i in 1:nrow(rgbImage1)) {
  if (any(rgbImage1[i, cols] != rgbImage2[i, cols])) {
    dif_counter <- dif_counter + 1
  } else {
    same_counter <- same_counter + 1
  }
}
result <- dif_counter / (same_counter + dif_counter) * 100
result
The result is the percentage of pixels whose colour differs between the two pictures. Now it's time to spot these differences.
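A row-by-row loop is slow for 336,566 pixels. Below is a vectorised sketch of the same kind of count; percent_different() and its `tol` argument are hypothetical additions, not part of the original. With tol = 0 a pixel counts as different when any of its three channels differs at all; a small positive tolerance ignores tiny colour changes, such as those JPEG compression can introduce.

```r
# Hypothetical helper: percentage of pixels whose colour differs between two
# per-pixel data frames of the same size and pixel order
percent_different <- function(df1, df2, tol = 0) {
  cols <- c("r.value", "g.value", "b.value")
  stopifnot(nrow(df1) == nrow(df2))
  channel_gap <- abs(as.matrix(df1[, cols]) - as.matrix(df2[, cols]))
  pixel_differs <- rowSums(channel_gap > tol) > 0   # any channel off by > tol
  100 * mean(pixel_differs)
}

# Toy example: four pixels, one changed a lot and one changed very slightly
a <- data.frame(r.value = c(0.5, 0.5, 0.5, 0.5), g.value = 0, b.value = 0)
b <- a
b$r.value[1] <- 1.0     # clearly different
b$r.value[2] <- 0.502   # tiny change
percent_different(a, b)              # 50
percent_different(a, b, tol = 0.01)  # 25
```

On the real data the call would be percent_different(rgbImage1, rgbImage2).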
## Spotting the differences by drawing circles around them
In the plot below, the clustered second image is drawn again and a red dotted circle is placed around every pixel whose colour differs from the first image, so that the changed regions stand out.
plot(y ~ x, data=rgbImage2,
col = colours2,
asp = 1, pch = ".")
cols <- c("r.value", "g.value", "b.value")
for (i in 1:nrow(rgbImage1)) {
  # Centre the circle on the pixel's plot coordinates, not on the row index i
  if (any(rgbImage1[i, cols] != rgbImage2[i, cols])) {
    draw.circle(rgbImage2$x[i], rgbImage2$y[i], 50, border = "red", lty = 3, lwd = 3)
  }
}
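Drawing one circle per differing pixel can produce thousands of overlapping circles. As an alternative sketch, the hypothetical helper difference_circle() (not part of the original) returns a single circle enclosing the bounding box of all differing pixels; it assumes the per-pixel data frames built above, with x, y and the three colour columns.

```r
# Hypothetical helper: one circle around every pixel whose colour differs
# by more than `tol` in any channel
difference_circle <- function(df1, df2, tol = 0) {
  cols <- c("r.value", "g.value", "b.value")
  stopifnot(nrow(df1) == nrow(df2))
  gap <- abs(as.matrix(df1[, cols]) - as.matrix(df2[, cols]))
  differs <- rowSums(gap > tol) > 0
  if (!any(differs)) return(NULL)        # identical images: nothing to circle
  xs <- df1$x[differs]
  ys <- df1$y[differs]
  half_w <- (max(xs) - min(xs)) / 2
  half_h <- (max(ys) - min(ys)) / 2
  # Centre of the bounding box; the radius reaches its corners
  c(x = min(xs) + half_w, y = min(ys) + half_h,
    radius = sqrt(half_w^2 + half_h^2))
}

# Toy example: pixels 2 and 3 change colour
p1 <- data.frame(x = c(1, 2, 4, 5), y = c(1, 3, 5, 2),
                 r.value = 0, g.value = 0, b.value = 0)
p2 <- p1
p2$r.value[c(2, 3)] <- 1
circ <- difference_circle(p1, p2)
circ   # x = 3, y = 4, radius = sqrt(2)
```

On the real data one could compute circ <- difference_circle(rgbImage1, rgbImage2) and draw it on the plot of the second image with draw.circle(circ["x"], circ["y"], circ["radius"], border = "red", lty = 3, lwd = 3). Separate changed regions would need the differing pixels to be grouped first, for example by clustering their x and y coordinates, to get one circle per region.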