Spot the difference in two pictures with Clustering method

Data preprocessing

The first step is to open the two pictures. Reading a .jpg file in R requires the jpeg package, which provides the readJPEG function.

library(jpeg)  # provides readJPEG()

image1 <- readJPEG("test1.jpg")
image2 <- readJPEG("test2.jpg")

To process an image, we need to represent it as numbers. readJPEG returns an array of height x width x channels, where the three channels hold the proportions of red, green and blue (RGB) in each pixel as values between 0 and 1. This is easier to process as a data frame with one row per pixel: the pixel coordinates x and y, followed by one column each for the proportions of red, green and blue. We need to check the dimensions first. Both images are 646 x 521 pixels (rows x columns), as the output below shows. We can then reshape each image into this RGB data frame and display it in a plot.

dm1 <- dim(image1);dm1[1:2]

## [1] 646 521

dm2 <-  dim(image2);dm2[1:2]

## [1] 646 521

rgbImage1 <- data.frame(
  x=rep(1:dm1[2], each=dm1[1]),
  y=rep(dm1[1]:1, dm1[2]),
  r.value=as.vector(image1[,,1]),
  g.value=as.vector(image1[,,2]),
  b.value=as.vector(image1[,,3]))

rgbImage2 <- data.frame(
  x=rep(1:dm2[2], each=dm2[1]),
  y=rep(dm2[1]:1, dm2[2]),
  r.value=as.vector(image2[,,1]),
  g.value=as.vector(image2[,,2]),
  b.value=as.vector(image2[,,3]))
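To make the reshaping above easier to follow, here is a minimal sketch on a toy array. The 2 x 3 "image" `img` and the name `pixels` are made up for illustration; only the data.frame construction mirrors the code above.

```r
# A toy "image": 2 rows (height) x 3 columns (width) x 3 colour channels
img <- array(0, dim = c(2, 3, 3))
img[1, 1, 1] <- 1   # top-left pixel is pure red
img[2, 3, 3] <- 1   # bottom-right pixel is pure blue
dm <- dim(img)

pixels <- data.frame(
  x = rep(1:dm[2], each = dm[1]),   # column index, repeated once per row
  y = rep(dm[1]:1, dm[2]),          # row index flipped, so row 1 is plotted at the top
  r.value = as.vector(img[, , 1]),  # as.vector() walks the matrix column by column
  g.value = as.vector(img[, , 2]),
  b.value = as.vector(img[, , 3]))

pixels  # one row per pixel: 6 rows for a 2 x 3 image
```

Because as.vector() reads each channel column by column, x changes slowly and y cycles through the rows, which is why x uses `each =` and y does not.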

We will use the round_df function to round every numeric column of the RGB data to the same number of decimal places (four here), and then plot both images.

round_df <- function(x, digits) {
  numeric_columns <- sapply(x, mode) == 'numeric'
  x[numeric_columns] <-  round(x[numeric_columns], digits)
  x
}

rgbImage1 <- round_df(rgbImage1, 4)
rgbImage2 <- round_df(rgbImage2, 4)

plot(y ~ x, data=rgbImage1,
     col = rgb(rgbImage1[c("r.value", "g.value", "b.value")]),
     asp = 1, pch = ".")

plot(y ~ x, data=rgbImage2,
     col = rgb(rgbImage2[c("r.value", "g.value", "b.value")]),
     asp = 1, pch = ".")
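The plots above rely on rgb() accepting a three-column data frame of values between 0 and 1. A small sketch of that conversion, using a made-up data frame `px`:

```r
# rgb() turns channel proportions in [0, 1] into hexadecimal colour strings
px <- data.frame(
  r.value = c(1, 0, 0),
  g.value = c(0, 1, 0),
  b.value = c(0, 0, 1))

cols <- rgb(px)   # a 3-column data frame or matrix is read as red, green, blue
cols

# The same red expressed on a 0-255 scale needs maxColorValue
red255 <- rgb(255, 0, 0, maxColorValue = 255)
```

If the channel values were stored as 0-255 integers instead, the plotting calls would need `maxColorValue = 255` as in the last line.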

Running the CLARA algorithm

CLARA (Clustering Large Applications) is an extension of the k-medoids method for data containing a large number of objects (more than several thousand observations). It works on samples of the data in order to reduce computing time and memory use. It is time to run CLARA on the first image.

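library(cluster)  # provides clara() and silhouette()

# Cluster the pixels of the first image on their colour values only
test1 <- rgbImage1[, c("r.value", "g.value", "b.value")]

clara1 <- clara(test1, 10)
plot(silhouette(clara1))

# Replace every pixel's colour with the colour of its cluster medoid
colours <- rgb(clara1$medoids[clara1$clustering, ])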

plot(y ~ x, data=rgbImage1,
     col = colours, 
     asp = 1, pch = ".")
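To show what the medoid lookup above does, here is a minimal sketch on synthetic data. The data frame `pixels`, the helper `noise` and the choice of three clusters are assumptions for illustration, not part of the analysis above.

```r
library(cluster)  # clara()

set.seed(1)
# Synthetic "pixels": three groups of colours close to red, green and blue
n <- 100
noise <- function() runif(n, 0, 0.2)
pixels <- data.frame(
  r.value = c(1 - noise(), noise(), noise()),
  g.value = c(noise(), 1 - noise(), noise()),
  b.value = c(noise(), noise(), 1 - noise()))

fit <- clara(pixels, 3)

fit$medoids   # one representative (actually observed) colour per cluster

# Index the medoid matrix by cluster label to get one colour per pixel
quantised <- rgb(fit$medoids[fit$clustering, ])
length(unique(quantised))   # number of distinct colours left after clustering
```

Each pixel keeps its position but takes the colour of its cluster's medoid, so the clustered plot uses only as many colours as there are clusters.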

The same steps are repeated for the second image.

test2 = rgbImage2[, c("r.value", "g.value", "b.value")]

clara2 <- clara(test2, 10)
plot(silhouette(clara2))

colours2 <- rgb(clara2$medoids[clara2$clustering, ])

plot(y ~ x, data=rgbImage2,
     col = colours2, 
     asp = 1, pch = ".")
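Both runs above fix the number of clusters at 10. One possible way to check such a choice is to compare the average silhouette width for several values of k. The sketch below does this on synthetic data; `pixels`, `candidate_k` and `avg_width` are illustrative names, and the widths are the ones clara() reports for its best subsample, not for every pixel.

```r
library(cluster)  # clara()

set.seed(1)
# Synthetic "pixels": three groups of colours close to red, green and blue
n <- 100
noise <- function() runif(n, 0, 0.2)
pixels <- data.frame(
  r.value = c(1 - noise(), noise(), noise()),
  g.value = c(noise(), 1 - noise(), noise()),
  b.value = c(noise(), noise(), 1 - noise()))

# Average silhouette width for each candidate number of clusters
candidate_k <- 2:6
avg_width <- sapply(candidate_k, function(k) {
  clara(pixels, k)$silinfo$avg.width
})

data.frame(k = candidate_k, avg_width = round(avg_width, 3))

# The candidate with the widest average silhouette
best_k <- candidate_k[which.max(avg_width)]
best_k
```

A higher average width means pixels sit closer to their own medoid than to the neighbouring one. On real photographs the curve is usually flatter than on this toy data, so it is a guide rather than a rule.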

Finding the percentage

To find the percentage of pixels that differ between the two pictures, we compare the colour columns of the two RGB data frames row by row (the x and y columns are ignored). There are two counters: one for pixels whose values are the same in both images and one for pixels whose values differ.

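# Compare the colour channels of corresponding pixels.
# A pixel counts as different when at least one channel differs.
channels <- c("r.value", "g.value", "b.value")
is_different <- rowSums(rgbImage1[, channels] != rgbImage2[, channels]) > 0

dif_counter <- sum(is_different)
same_counter <- sum(!is_different)
result <- dif_counter / (same_counter + dif_counter) * 100
result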

## [1] 68.62353

In the run recorded here, the difference between the two pictures is nearly 69%. It is time to spot where these differences are.
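Before moving on, note that an exact comparison treats even a tiny change in one channel as a difference, which can matter for JPEG files because compression slightly alters pixel values. A hedged sketch of a comparison with a tolerance follows; the helper `percent_different` and its `tol` parameter are hypothetical and not part of the analysis above.

```r
# Hypothetical helper: percentage of pixels whose colour differs by more
# than `tol` in at least one channel. `a` and `b` hold one row per pixel
# and one column per channel, in the same order.
percent_different <- function(a, b, tol = 0) {
  stopifnot(all(dim(a) == dim(b)))
  channel_gap <- abs(as.matrix(a) - as.matrix(b))
  largest_gap <- apply(channel_gap, 1, max)   # worst channel for each pixel
  mean(largest_gap > tol) * 100
}

# Four made-up pixels; the second image changes two of them
a <- data.frame(r.value = c(0.10, 0.50, 0.90, 0.20),
                g.value = c(0.10, 0.50, 0.90, 0.20),
                b.value = c(0.10, 0.50, 0.90, 0.20))
b <- a
b$r.value[1] <- 0.11   # barely visible change
b$g.value[3] <- 0.40   # clearly visible change

exact <- percent_different(a, b)               # any change counts
tolerant <- percent_different(a, b, tol = 0.05) # small changes are ignored
c(exact = exact, tolerant = tolerant)
```

With the real data, the call would be along the lines of `percent_different(rgbImage1[, 3:5], rgbImage2[, 3:5], tol = 0.05)`, where the tolerance value is a choice to be tuned rather than a recommended setting.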

Spot the differences by drawing circles around them

In the plot below, circles are drawn around the points that differ. There are two circles: the first intersects the block that was added in the second image, and the second lies in the middle of the two added blocks. The approach works at least partly (about 70%).

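library(plotrix)  # provides draw.circle()

plot(y ~ x, data=rgbImage2,
     col = colours2, 
     asp = 1, pch = ".")
for(i in 1:nrow(rgbImage1)) {
  # Draw a circle whenever at least one colour channel of pixel i differs.
  # The centre used here is (i, i), the row index on both axes;
  # rgbImage2$x[i] and rgbImage2$y[i] hold the pixel's own position.
  if (any(rgbImage1[i, c(3,4,5)] != rgbImage2[i, c(3,4,5)])) {
    draw.circle(i, i, 50, border="red", lty=3, lwd=3)
  }
}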
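An alternative to drawing one circle per differing pixel is to group the differing pixels by position and draw one circle per group. The sketch below is a swapped-in variant, not the method used above: the helper `difference_regions`, the synthetic blocks and the choice of k = 2 are assumptions, and base graphics symbols() stands in for draw.circle().

```r
library(cluster)  # clara()

# Hypothetical helper: cluster the coordinates of differing pixels into k
# regions and return one circle (centre and radius) per region.
difference_regions <- function(x, y, k) {
  coords <- cbind(x = x, y = y)
  fit <- clara(coords, k)
  do.call(rbind, lapply(seq_len(k), function(j) {
    members <- coords[fit$clustering == j, , drop = FALSE]
    centre <- fit$medoids[j, ]
    # Radius: distance from the medoid to the farthest pixel in the region
    radius <- max(sqrt((members[, "x"] - centre[["x"]])^2 +
                       (members[, "y"] - centre[["y"]])^2))
    data.frame(x = centre[["x"]], y = centre[["y"]], radius = radius)
  }))
}

# Synthetic example: two square blocks of "changed" pixels
block1 <- expand.grid(x = 10:19, y = 10:19)
block2 <- expand.grid(x = 60:69, y = 40:49)
changed <- rbind(block1, block2)

regions <- difference_regions(changed$x, changed$y, k = 2)
regions

# One dotted red circle per region, in data units (inches = FALSE)
plot(changed$x, changed$y, pch = ".", asp = 1,
     xlim = c(0, 80), ylim = c(0, 60), xlab = "x", ylab = "y")
symbols(regions$x, regions$y, circles = regions$radius,
        inches = FALSE, add = TRUE, fg = "red", lty = 3, lwd = 3)
```

With the real images, the inputs would be the x and y columns of the rows flagged as different, and k would be the number of changed areas expected in the picture.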


Gahraman Akbarov

2/28/2021
