Overview

This is an assignment created for Unsupervised Learning class. We were supposed to use clustering methods in order to analyze phenomena that are interesting for us.

My project is answering the question: Are we able to detect animals photographed by a trail camera using various clustering methods?

I will try to point out the animal seen on the photo by using couple of clustering methods, let’s see if I will get even close to highlight only the animal, without any additional background.

The aim of this project is not to cluster every pixel of the image correctly, but rather to highlight the animal. I did not set strict assumptions, for example I didn’t use the same defined number of clusters for every algorithm. I was “juggling” with the parameters for every method to obtain the best results. If you want to create a project similar to this, feel free to change the values of parameters and try to get closer than me :)

The images are loaded in grayscale because this is the format most often registered by a trail camera. Because of the shade and darkness existing in the woods, sensor in the camera detects the low level of light and turns on the IR flash. The result is a grayscale picture.

Used methods: CLARA, DBSCAN, Mean Shift, Spectral Clustering

Code and images are available on my GitHub

Preproccessing

Loading libraries

library(fpc)
library(jpeg)
library(imager)
library(fields)
library(ggplot2)
library(raster)
library(dplyr)
library(cluster)
library(factoextra)
library(flexclust)
library(fpc)
library(ClusterR)
library(meanShiftR)
library(RColorBrewer)
library(Dict)
library(fdm2id)
library(showtext)
library(magick)

Functions

I’ve created 4 functions to facilitate the work and create a clearer output at the end.

Pick colours

pick_colours = function (n) {
  
  #DESCRIPTION : this function is picking n contrasting colours from the palette
  #INPUT: n: number of colours that we want to pick
  #OUTPUT: list_of_colors: list of n colours
  
  #load palette
  qual_col_pals = brewer.pal.info[brewer.pal.info$category == 'qual',]
  
  #create colors vector
  col_vector = unlist(mapply(brewer.pal, qual_col_pals$maxcolors, rownames(qual_col_pals)))
  
  #sample n colors from the vector
  list_of_colors = sample(col_vector, n, replace=FALSE)
  
  return (list_of_colors)
  
}

Contrast image

contrast_image = function (dataframe, scalar_white, scalar_black){
  
  #DESCRIPTION : this function is contrasting pixels of the grayscale image
  #INPUT: dataframe: 2D dataframe of our image
          #scalar_white: value which corresponds to how much we want to brighten light pixels
          #scalar_black: value which corresponds to how much we want to darken dark pixels
  #OUTPUT: dataframe: converted dataframe with contrasted pixels
  
  #if grayscale value is larger than a threshold - set a pixel value to scalar_white * pixel, else set it to scalar_black * pixel
  for (j in 1:ncol(dataframe)){
    for (i in 1:nrow(dataframe)){
      if(dataframe[i, j] > 0.3){
        dataframe[i, j] = dataframe[i,j] * scalar_white
      }
      else{
        dataframe[i, j] = dataframe[i, j] * scalar_black
      }
    }
  }
  return(dataframe)
}

Pick colors for DBSCAN

draw_colours_dbscan = function(dataframe, n_clusters){
  
  #DESCRIPTION : this function is assigning colours to specific clusters. It is made for DBSCAN algorithm,                  because this method can output cluster #0
  #INPUT: dataframe: 2D dataframe of our grayscale image with additional column 'cluster', which is the               assignment of each pixel to specific cluster
          #n_clusters: number of clusters which our algorithm has outputted
  #OUTPUT: dataframe: inputted dataframe with additional column 'color'
  
  if (n_clusters < 2){
    print ("Error: insufficient number of clusters : check your parameters")
    break}
    
    
  #sample n colours
  colours = pick_colours(n_clusters - 1)
  
  #compute number of clusters
  clusters = seq(from=1, to=n_clusters - 1, by=1)
  names(colours) = clusters
  
  #this is specific for dbscan: we want to assign color black to non-clustered pixels and contrasting colors for clusters
  for (i in 1:nrow(dataframe)){
    
    if(dataframe[i, ]['cluster'] != 0){
      dataframe[i, ]['color'] = colours[as.numeric(dataframe[i, ]['cluster'])]
    } else{
      dataframe[i , ]['color'] = '#000000'
    }
    
  }
  
  return(dataframe)
}

Pick colors, other methods

draw_colours = function(dataframe, n_clusters){
  
  #DESCRIPTION : this function is assigning colours to specific clusters. 
  #INPUT: dataframe: 2D dataframe of our grayscale image with additional column 'cluster', which is the               assignment of each pixel to specific cluster
          #n_clusters: number of clusters which our algorithm has outputted
  #OUTPUT: dataframe: inputted dataframe with additional column 'color'
  
  if (n_clusters < 2){
    print ("Error: insufficient number of clusters : check your parameters")
    break}
  
  
  #sample n colors
  colours = pick_colours(n_clusters)
  
  #compute number of clusters
  clusters = seq(from=1, to=n_clusters, by=1)
  names(colours) = clusters
  
  #this is specific for dbscan: we want to assign colour black to non-clustered pixels and contrasting colours for clusters
  for (i in 1:nrow(dataframe)){
    
    if(dataframe[i, ]['cluster'] != 0){
      dataframe[i, ]['color'] = colours[as.numeric(dataframe[i, ]['cluster'])]
    } else{
      dataframe[i , ]['color'] = '#000000'
    }
    
  }
  
  return(dataframe)
}

Loading images

Following article analyzes 2 images using various clustering methods. These are the photographs of bobcat and deer.

Bobcat

#loading image
bobcat = readJPEG("bobcat.jpeg")

#checking dimensions to be sure that it's not a RGB image
dim(bobcat)
## [1] 221 480
#plotting image
bobcat = t(apply(bobcat, 2, rev)) #otherwise the image will be rotated
image(bobcat, col  = grey((0:dim(bobcat)[1])/dim(bobcat)[1]))

Deer

#loading image
deer = readJPEG("deer.jpeg")

#checking dimensions to be sure that it's not a RGB image
dim(deer)
## [1] 330 330
#plotting image
deer = t(apply(deer, 2, rev)) #otherwise the image will be rotated
image(deer, col  = grey((0:dim(deer)[1])/dim(deer)[1]))

Contrasting images

In order to increase the contrast of the image, I am using created function contrast_image.

#Contrasting image
contrasted_image_bobcat = contrast_image(bobcat, scalar_white = 1.2, scalar_black = 0.1)
contrasted_image_deer = contrast_image(deer, scalar_white = 1, scalar_black = 1.3)

Clustering

CLARA

Let’s start with CLARA. The intuition behind this algorithm is the same as with PAM, but CLARA works faster with larger datasets, this is why I chose this version to work with images.

With CLARA, we are randomly choosing a sample, applying PAM and choosing optimal medoids for this sample. We repeat these steps for defined number of iterations. Finally, we pick the set of medoids with the lowest cost.

Bobcat

image(bobcat, col  = grey((0:dim(bobcat)[1])/dim(bobcat)[1]))

CLARA with bobcat

#CLARA with contrasted image
clara = clara(contrasted_image_bobcat, 2)
colours <- pick_colours(length(unique(clara$cluster)))
img = image(bobcat, col  = colours) # plot in colours

Deer

image(deer, col  = grey((0:dim(bobcat)[1])/dim(bobcat)[1]))

CLARA with deer

#CLARA with contrasted image
clara = clara(contrasted_image_deer, 3)
colours <- pick_colours(length(unique(clara$cluster)))
img = image(deer, col  = colours) # plot in colours

Silhouette plots of CLARA

Bobcat

clara = clara(contrasted_image_bobcat, 3)
plot(silhouette(clara))

Deer

clara = clara(contrasted_image_deer, 3)
plot(silhouette(clara))

DBSCAN

DBSCAN is a density-based clustering method. We have to define the reachability distance (epsilon) and minimum number of points that a certain point has to reach with its radius to become a core point.

The example below exactly shows the work of this algorithm.

Bobcat

image(bobcat, col  = grey((0:dim(bobcat)[1])/dim(bobcat)[1]))

DBSCAN with bobcat

#DB-SCAN with contrasted image
db_scan = dbscan(contrasted_image_bobcat, eps=0.4, MinPts=10) #performing dbscan

df_image = data.frame(contrasted_image_bobcat)
df_image$cluster = db_scan$cluster #adding information about assigned cluster
df_image$color = "" #setting a column for a color

df_image = draw_colours_dbscan(df_image, n_clusters = length(unique(db_scan$cluster))) #sampling colors for clusters
colours = df_image$color 

image(bobcat, col = colours) #drawing image

Deer

image(deer, col  = grey((0:dim(bobcat)[1])/dim(bobcat)[1]))

DBSCAN with deer

#DB-SCAN with contrasted image
db_scan = dbscan(contrasted_image_deer, eps=0.6, MinPts=3) #performing dbscan

df_image = data.frame(contrasted_image_deer)
df_image$cluster = db_scan$cluster #adding information about assigned cluster
df_image$color = "" #setting a column for a color

df_image = draw_colours_dbscan(df_image, n_clusters = length(unique(db_scan$cluster))) #sampling colors for clusters
colours = df_image$color 

image(deer, col = colours) #drawing image

DBSCAN cluster plots

Bobcat

db_scan = dbscan(contrasted_image_bobcat, eps=0.4, MinPts=10) #performing dbscan
fviz_cluster(db_scan, data = contrasted_image_bobcat, main = 'DBSCAN cluster plot (bobcat)')

Deer

db_scan = dbscan(contrasted_image_deer, eps=0.6, MinPts=3) #performing dbscan
fviz_cluster(db_scan, data = contrasted_image_deer, main = 'DBSCAN cluster plot (deer)')

Mean Shift

Mean Shift algorithm creates a radius around each point. Then, we calculate the mean position of the points (centroid) within the radius and move our point to this position. Points are “climbing” the steepest hills in our distribution. Mode is the peek of the hill.

The visualization of the Mean Shift algorithm is shown below.

Bobcat

image(bobcat, col  = grey((0:dim(bobcat)[1])/dim(bobcat)[1]))

Mean shift clustering with bobcat

#MEAN SHIFT with contrasted image
mean_shift = meanShift(contrasted_image_bobcat, epsilonCluster=0.00052)
df_image = data.frame(contrasted_image_bobcat)
df_image$cluster = mean_shift$assignment #adding information about assigned cluster
df_image$color = ""

df_image = draw_colours(df_image, n_clusters = length(unique(mean_shift$assignment))) #sampling colors for clusters
colours = df_image$color 

image(bobcat, col = colours) #drawing image

Deer

image(deer, col  = grey((0:dim(bobcat)[1])/dim(bobcat)[1]))

Mean shift clustering with deer

#MEAN SHIFT with contrasted image
mean_shift = meanShift(contrasted_image_deer, epsilonCluster=0.00052)
df_image = data.frame(contrasted_image_deer)
df_image$cluster = mean_shift$assignment #adding information about assigned cluster
df_image$color = ""

df_image = draw_colours(df_image, n_clusters = length(unique(mean_shift$assignment))) #sampling colors for clusters
colours = df_image$color 

image(deer, col = colours) #drawing image

Spectral Clustering

Spectral Clustering is often used when dealing with non-convex data. We are creating a Laplacian matrix based on a similarity matrix built from similarity graph, then we are performing classical clustering algorithms on proper eigenvectors of this matrix.

Bobcat

image(bobcat, col  = grey((0:dim(bobcat)[1])/dim(bobcat)[1]))

Spectral clustering with bobcat

#SPECTRAL CLUSTERING with contrasted image
spectral = SPECTRAL(contrasted_image_bobcat, k = 4, sigma = 6, graph = FALSE)

df_image = data.frame(contrasted_image_bobcat)
df_image$cluster = spectral$cluster #adding information about assigned cluster
df_image$color = "" #setting a column for a color

df_image = draw_colours(df_image, n_clusters = length(unique(spectral$cluster))) #sampling colors for clusters
colours = df_image$color

image(bobcat, col = colours)

Deer

image(deer, col  = grey((0:dim(bobcat)[1])/dim(bobcat)[1]))

Spectral clustering with deer

#SPECTRAL CLUSTERING with contrasted image
spectral = SPECTRAL(contrasted_image_deer, k = 2, sigma = 2.5, graph = FALSE)

df_image = data.frame(contrasted_image_deer)
df_image$cluster = spectral$cluster #adding information about assigned cluster
df_image$color = "" #setting a column for a color

df_image = draw_colours(df_image, n_clusters = length(unique(spectral$cluster))) #sampling colors for clusters
colours = df_image$color

image(deer, col = colours)

Observations

As we can see, every algorithm had different output. CLARA worked really well with clustering the body of a bobcat, where DBSCAN and Mean Shift showed that they can be useful in identifying edges of the body. Spectral Clustering has the poorest performance in this case, however we can also observe the edges of the animal, but mostly connected with the background.

When it comes to the deer, algorithms behave much worse when more noise is added to the image (grass and leaves). Highlights of CLARA and Spectral Clustering are the most visible ones.

On the beginning of this article I’ve stated that I want an algorithm to highlight an animal without any additional background. In my opinion, DBSCAN has done it pretty accurately with a bobcat. Let’s transfer the clustered area to the real image.

Overlaying original image with clusters

DBSCAN highlighted the bobcat in the most visible way compared to the other algorithms. Let’s overlay the image of this animal with clusters to make it more lucid and to summarize this article.

overlay = magick::image_read('overlay.png')
print(overlay)
## # A tibble: 1 × 7
##   format width height colorspace matte filesize density
##   <chr>  <int>  <int> <chr>      <lgl>    <int> <chr>  
## 1 PNG     1000   1000 sRGB       TRUE    441498 72x72

Conclusion

DBSCAN worked pretty well with detecting the bobcat. However, we can observe how differently the methods behave on specific photos. Image of a deer is not a hard case, but there was no algorithm, which could handle it. Unsupervised learning is rarely used in image segmentation. There are plenty of supervised techniques, which record a higher performance. With supervision, we are able to easily check the accuracy of a given algorithm.

The second biggest drawback of applying Unsupervised Learning to this problem is the fact that I had to tune parameters by myself in every case. You can observe values of epsilon like 0.4. I had to change the parameters a lot of times and check how it works with the specific image. With supervised learning, most often we posses a trained neural network, which is able to segment almost every type of a picture in a really efficient way.

References

Zeng et al. Image segmentation using spectral clustering of Gaussian mixture models models, Neurocomputing, Volume 144, 2014, Pages 346-356.

https://www.kdnuggets.com/2020/04/dbscan-clustering-algorithm-machine-learning.html

https://en.wikipedia.org/wiki/Spectral_clustering

https://ml-explained.com/blog/mean-shift-explained