This is an assignment for my Unsupervised Learning class. We were asked to use clustering methods to analyze a phenomenon that interests us.
My project tries to answer the question: can we detect animals photographed by a trail camera using various clustering methods?
I will try to pick out the animal in each photo using a couple of clustering methods. Let’s see if I can get even close to highlighting only the animal, without any of the background.
The aim of this project is not to cluster every pixel of the image correctly, but rather to highlight the animal. I did not set strict assumptions; for example, I didn’t use the same number of clusters for every algorithm. I “juggled” the parameters for every method to obtain the best results. If you want to create a similar project, feel free to change the parameter values and try to get closer than I did :)
The images are loaded in grayscale because this is the format a trail camera records most often. Because of the shade and darkness in the woods, the camera's sensor detects the low light level and turns on the IR flash. The result is a grayscale picture.
Methods used: CLARA, DBSCAN, Mean Shift, Spectral Clustering
Code and images are available on my GitHub
library(fpc)
library(jpeg)
library(imager)
library(fields)
library(ggplot2)
library(raster)
library(dplyr)
library(cluster)
library(factoextra)
library(flexclust)
library(ClusterR)
library(meanShiftR)
library(RColorBrewer)
library(Dict)
library(fdm2id)
library(showtext)
library(magick)
I’ve created four helper functions to simplify the work and produce a clearer output at the end.
pick_colours = function (n) {
  #DESCRIPTION: this function picks n contrasting colours from the qualitative RColorBrewer palettes
  #INPUT: n: number of colours that we want to pick
  #OUTPUT: list_of_colors: character vector of n colours
  #load palette
  qual_col_pals = brewer.pal.info[brewer.pal.info$category == 'qual',]
  #create colors vector
  col_vector = unlist(mapply(brewer.pal, qual_col_pals$maxcolors, rownames(qual_col_pals)))
  #sample n colors from the vector
  list_of_colors = sample(col_vector, n, replace=FALSE)
  return (list_of_colors)
}
contrast_image = function (dataframe, scalar_white, scalar_black){
  #DESCRIPTION: this function increases the contrast between pixels of the grayscale image
  #INPUT: dataframe: 2D dataframe of our image
  #       scalar_white: value which corresponds to how much we want to brighten light pixels
  #       scalar_black: value which corresponds to how much we want to darken dark pixels
  #OUTPUT: dataframe: converted dataframe with contrasted pixels
  #if the grayscale value is larger than the threshold (0.3), set the pixel to scalar_white * pixel, else set it to scalar_black * pixel
  for (j in 1:ncol(dataframe)){
    for (i in 1:nrow(dataframe)){
      if(dataframe[i, j] > 0.3){
        dataframe[i, j] = dataframe[i, j] * scalar_white
      }
      else{
        dataframe[i, j] = dataframe[i, j] * scalar_black
      }
    }
  }
  return(dataframe)
}
draw_colours_dbscan = function(dataframe, n_clusters){
  #DESCRIPTION: this function assigns colours to specific clusters. It is made for the DBSCAN algorithm, because this method can output cluster #0
  #INPUT: dataframe: dataframe of our grayscale image with an additional column 'cluster', which is the assignment of each pixel to a specific cluster
  #       n_clusters: number of clusters which our algorithm has outputted
  #OUTPUT: dataframe: inputted dataframe with additional column 'color'
  if (n_clusters < 2){
    #stop() raises an error; break is only valid inside a loop
    stop("insufficient number of clusters: check your parameters")
  }
  #sample n - 1 colours, because cluster #0 is drawn in black
  colours = pick_colours(n_clusters - 1)
  #label the colours with the cluster numbers
  clusters = seq(from=1, to=n_clusters - 1, by=1)
  names(colours) = clusters
  #this is specific for dbscan: we want to assign color black to non-clustered pixels and contrasting colors for clusters
  dataframe$color = '#000000'
  clustered = dataframe$cluster != 0
  dataframe$color[clustered] = colours[as.numeric(dataframe$cluster[clustered])]
  return(dataframe)
}
draw_colours = function(dataframe, n_clusters){
  #DESCRIPTION: this function assigns colours to specific clusters.
  #INPUT: dataframe: dataframe of our grayscale image with an additional column 'cluster', which is the assignment of each pixel to a specific cluster
  #       n_clusters: number of clusters which our algorithm has outputted
  #OUTPUT: dataframe: inputted dataframe with additional column 'color'
  if (n_clusters < 2){
    #stop() raises an error; break is only valid inside a loop
    stop("insufficient number of clusters: check your parameters")
  }
  #sample n colors
  colours = pick_colours(n_clusters)
  #label the colours with the cluster numbers
  clusters = seq(from=1, to=n_clusters, by=1)
  names(colours) = clusters
  #assign a contrasting colour to every cluster; pixels labelled 0 (if there are any) are drawn in black
  dataframe$color = '#000000'
  clustered = dataframe$cluster != 0
  dataframe$color[clustered] = colours[as.numeric(dataframe$cluster[clustered])]
  return(dataframe)
}
This article analyzes two images using various clustering methods: photographs of a bobcat and a deer.
#loading image
bobcat = readJPEG("bobcat.jpeg")
#checking dimensions to be sure that it's not an RGB image (an RGB image would have a third dimension)
dim(bobcat)
## [1] 221 480
#plotting image
bobcat = t(apply(bobcat, 2, rev)) #otherwise the image will be rotated
image(bobcat, col = grey((0:dim(bobcat)[1])/dim(bobcat)[1]))
#loading image
deer = readJPEG("deer.jpeg")
#checking dimensions to be sure that it's not an RGB image (an RGB image would have a third dimension)
dim(deer)
## [1] 330 330
#plotting image
deer = t(apply(deer, 2, rev)) #otherwise the image will be rotated
image(deer, col = grey((0:dim(deer)[1])/dim(deer)[1]))
To increase the contrast of the images, I use the contrast_image function defined above.
#Contrasting image
contrasted_image_bobcat = contrast_image(bobcat, scalar_white = 1.2, scalar_black = 0.1)
contrasted_image_deer = contrast_image(deer, scalar_white = 1, scalar_black = 1.3)
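To see what the contrast step does to individual pixels, here is a minimal sketch on a toy matrix. The function contrast_vectorised is a hypothetical, loop-free variant that applies the same threshold rule (0.3) as contrast_image; it is only meant as an illustration, not as a replacement.

```r
# Hypothetical vectorised variant of the contrast rule:
# pixels above the threshold are multiplied by scalar_white,
# all other pixels by scalar_black
contrast_vectorised = function(img, scalar_white, scalar_black, threshold = 0.3) {
  ifelse(img > threshold, img * scalar_white, img * scalar_black)
}

# Toy image: two dark pixels (0.1, 0.2) and two light pixels (0.5, 0.9)
toy = matrix(c(0.1, 0.2, 0.5, 0.9), nrow = 2)

# Same parameters as used for the bobcat image
contrasted = contrast_vectorised(toy, scalar_white = 1.2, scalar_black = 0.1)
print(contrasted)
```

With these parameters the dark pixels are pushed close to 0 while the light ones get brighter. Note that a light pixel can end up above 1 (0.9 becomes 1.08), which does not matter for clustering but is worth remembering if the result is plotted with a fixed grey scale.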
Let’s start with CLARA. The intuition behind this algorithm is the same as for PAM, but CLARA is faster on larger datasets, which is why I chose it for working with images.
With CLARA, we randomly draw a sample from the data, apply PAM to it and find the optimal medoids for that sample. We repeat these steps for a defined number of iterations, each time measuring the cost of the medoids on the whole dataset. Finally, we pick the set of medoids with the lowest cost.
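The steps above can be sketched in a few lines. This is a simplified illustration under my own assumptions, not the implementation inside the cluster package: clara_sketch is a hypothetical helper, it only handles one-dimensional data such as pixel intensities, and the toy data is simulated.

```r
library(cluster)  # provides pam()

# Hypothetical helper (not part of the cluster package):
# a bare-bones CLARA loop for one-dimensional data
clara_sketch = function(x, k, n_samples = 5, sample_size = 50) {
  best = list(cost = Inf)
  for (s in seq_len(n_samples)) {
    # 1. draw a random subsample of the observations
    idx = sample(length(x), sample_size)
    # 2. run PAM on the subsample to get k candidate medoids
    medoids = as.numeric(pam(matrix(x[idx], ncol = 1), k)$medoids)
    # 3. assign every observation (not only the sample) to its nearest medoid
    d = abs(outer(x, medoids, "-"))
    clustering = apply(d, 1, which.min)
    # 4. cost = mean distance of all observations to their medoid
    cost = mean(d[cbind(seq_along(x), clustering)])
    # 5. keep the set of medoids with the lowest cost so far
    if (cost < best$cost) {
      best = list(medoids = medoids, clustering = clustering, cost = cost)
    }
  }
  best
}

set.seed(42)
# Simulated pixel intensities: 300 dark "background" and 100 bright "animal" pixels
pixels = c(rnorm(300, mean = 0.2, sd = 0.03), rnorm(100, mean = 0.8, sd = 0.03))
result = clara_sketch(pixels, k = 2)
print(result$medoids)
print(table(result$clustering))
```

Because PAM only ever sees 50 points at a time, the expensive part stays small no matter how many pixels the image has, which is where the speed advantage over plain PAM comes from.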
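image(bobcat, col = grey((0:dim(bobcat)[1])/dim(bobcat)[1]))
#CLARA with contrasted image
#the result is stored under its own name so that it does not shadow the clara() function
clara_bobcat = clara(contrasted_image_bobcat, 2)
colours <- pick_colours(length(unique(clara_bobcat$clustering)))
img = image(bobcat, col = colours) # plot in colours: image() splits the grey range into as many equal intervals as there are colours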
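image(deer, col = grey((0:dim(deer)[1])/dim(deer)[1]))
#CLARA with contrasted image
#the result is stored under its own name so that it does not shadow the clara() function
clara_deer = clara(contrasted_image_deer, 3)
colours <- pick_colours(length(unique(clara_deer$clustering)))
img = image(deer, col = colours) # plot in colours: image() splits the grey range into as many equal intervals as there are colours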
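#silhouette plot for the bobcat image (for CLARA it is computed on the best sample)
clara_bobcat = clara(contrasted_image_bobcat, 3)
plot(silhouette(clara_bobcat))
#silhouette plot for the deer image
clara_deer = clara(contrasted_image_deer, 3)
plot(silhouette(clara_deer))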
DBSCAN is a density-based clustering method. We have to define the reachability distance (epsilon), which is the radius of the neighbourhood around a point, and the minimum number of points that must lie within that radius for the point to become a core point.
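To make the core point rule concrete, here is a minimal base R sketch that only checks which points would qualify as core points. It is not a full DBSCAN implementation. The data, eps and min_pts are made up for illustration, and I assume that a point counts as its own neighbour, as in the original DBSCAN formulation.

```r
set.seed(1)
# 30 points in a dense blob around the origin plus one isolated point at (3, 3)
pts = rbind(cbind(rnorm(30, sd = 0.1), rnorm(30, sd = 0.1)), c(3, 3))

eps = 0.3      # neighbourhood radius (illustrative value)
min_pts = 5    # minimum number of points inside the radius (illustrative value)

# Pairwise Euclidean distances between all points
d = as.matrix(dist(pts))

# Number of points within eps of each point (the point itself is included)
n_neighbours = rowSums(d <= eps)

# A point is a core point if its neighbourhood holds at least min_pts points
is_core = n_neighbours >= min_pts
print(table(is_core))
```

The isolated point has only itself in its neighbourhood, so it can never be a core point. If it is also not within eps of any core point, DBSCAN labels it as noise, which is the cluster #0 handled by draw_colours_dbscan.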
The example below shows exactly how this algorithm works.