First off, I would like to acknowledge the blogs/tutorials of Simon Barthelme and Manuel Gimond, from which this code was adapted.

Links:

https://www.r-bloggers.com/new-version-of-imager-package-for-image-processing/

https://mgimond.github.io/Spatial/point-pattern-analysis-in-r.html

#———————————————–

I’ve been fooling around with some of the image analysis packages available in R and recently came across "imager". As I suspected, these tools can be applied to bird's-eye-view images of seabird colonies and nesting sites.

Here’s an example of how it can be applied. I haven’t manually counted the birds to verify accuracy yet (this was just for fun), but I’d guess the estimate is within about 90% of the true count.

So basically what I’ve done here is: process the image to separate the birds from the background, grow and outline the bird pixels, overlay the outlines on the image for verification, and simply take the length of the resulting list object to get the count of birds (n = 333).

This took about 30 minutes and no machine learning to accomplish, even though machine learning seems to be the current trend in super-colony counting.

library(imager)   # image loading and processing
library(spatstat) # spatial point pattern analysis
library(ggplot2)  # general plotting
library(ggpubr)   # publication-ready / arranged ggplots
library(ecespa)   # provides haz.ppp() for building ppp objects from coordinates

So first, we want to set the working directory to the folder with the images we wish to process, then import one. I got this image from a random Google search for nesting seabirds.

#-----------------------------------------
#setwd("My/Working/directory")
list.files()
##  [1] "100915040_262380021578005_7115576343391109120_n.jpg"
##  [2] "100925892_244526526839177_6184116465101176832_n.jpg"
##  [3] "Bird_counting_image_process.R"                      
##  [4] "card00872_fr.jpg"                                   
##  [5] "heatmaps and density.png"                           
##  [6] "Image analysis"                                     
##  [7] "image_point_analysis.R"                             
##  [8] "image_processing.html"                              
##  [9] "image_processing.Rmd"                               
## [10] "images.jpg"                                         
## [11] "images.png"                                         
## [12] "quadrat counts.png"                                 
## [13] "remove NN.png"                                      
## [14] "RPubs - Bird Nesting Analysis in R.html"            
## [15] "rsconnect"                                          
## [16] "shapes to points.png"                               
## [17] "testplot.png"                                       
## [18] "wildlifetrusts_40661540865.jpg"                     
## [19] "zm5ijHsuTaLdWdWW29zCfG-1200-80.jpg"
#-----------------------------------------

# Import the image file
im <- load.image("wildlifetrusts_40661540865.jpg")
plot(im)

Then we want to check our image to see whether greyscale or its negative would be more appropriate for identifying our species (the negative if the animals are light colored, greyscale if they are dark).

# greyscale
im.g <- grayscale(im)
# since the birds are white, let's turn the image negative
im.g <- -im.g
plot(im.g)

Then we want to check which threshold percentage captures most of our animals and the least of the background.

# determine the best threshold to identify animals
layout(t(1:3))
threshold(im.g,"25%") %>% plot
threshold(im.g,"15%") %>% plot 
threshold(im.g,"10%") %>% plot

We also want to make sure uneven lighting isn’t affecting what gets captured by our image processing, so let’s remove some of it by fitting and subtracting a linear trend.

# create df and summarize
df <- as.data.frame(im.g)
head(df,5)
##   x y      value
## 1 1 1 -0.2503529
## 2 2 1 -0.2431373
## 3 3 1 -0.3292941
## 4 4 1 -0.3385882
## 5 5 1 -0.3229020
m <- lm(value ~ x + y,data=df) #linear trend
summary(m)
## 
## Call:
## lm(formula = value ~ x + y, data = df)
## 
## Residuals:
##      Min       1Q   Median       3Q      Max 
## -0.52393 -0.06093  0.02654  0.09531  0.35363 
## 
## Coefficients:
##               Estimate Std. Error t value Pr(>|t|)    
## (Intercept) -4.084e-01  1.257e-03 -324.84   <2e-16 ***
## x           -3.056e-04  4.730e-06  -64.60   <2e-16 ***
## y            7.841e-05  6.313e-06   12.42   <2e-16 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 0.1423 on 90217 degrees of freedom
## Multiple R-squared:  0.04577,    Adjusted R-squared:  0.04575 
## F-statistic:  2164 on 2 and 90217 DF,  p-value: < 2.2e-16
# extract and remove the luminance based on linear trend
layout(t(1:2))
im.f <- im.g-fitted(m)
plot(im.g,main="Before")
plot(im.f,main="After trend removal")

Now we begin our morphological operations. Essentially, we threshold the detrended image down to a binary pixel set that captures only the animals, then grow those pixels so they are large enough to outline. There are a few ways to do this:

# morphological operations
im.t <- threshold(im.f,"3%")
px <- as.pixset(1-im.t) #Convert to pixset
plot(px)

# enlarge the pixset by 2 pixels
grow(px,2) %>% plot(main="Growing by 2 pixels") # this one is better

# shrink the pixset by 2 pixels
shrink(px,2) %>% plot(main="Shrinking by 2 pixels")

# check side by side
layout(t(1:2))
plot(px,main="Original")
shrink(px,2) %>% grow(2) %>% plot(main="Shrink, then grow")

# growing then shrinking (morphological closing) fills in small holes
layout(t(1:2))
plot(px,main="Original")
grow(px,2) %>% shrink(2) %>% plot(main="Grow, then shrink")

Now we’ll check our processed image, lay the outlines over the animals, then split the outlines into separate connected components to generate a count. Although the count for this particular image is likely not exact, it should be within roughly 90% of the true number.

Essentially this method could significantly reduce processing time for animal counts; however, it is likely limited by how much variation there is in the background. For example, if there are scattered objects that are the same color as the animals being counted, they will likely be included in the count. I have not implemented a fix for this yet (one possible starting point, sketched after the counting code below, is to filter the connected components by area), but hopefully someone can come along and make improvements to the code I provide here!

par(mfrow=c(1,1))
# outline the detected birds over the original image (fill holes, remove small specks, then highlight)
plot(im)
fill(px,3) %>% clean(1) %>% highlight

## Split into connected components (individual birds)
pxs <- split_connected(px)
## Compute their respective area
area <- sapply(pxs,sum)
## count the birds
length(pxs)
## [1] 333
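
One possible mitigation for the background-object problem mentioned earlier is to screen the connected components by area before counting. This is only a sketch: the cut-off values below are arbitrary placeholders, not tuned values, and would need adjusting for each image.

# drop components that are implausibly small or large to be a single bird
# (the 5- and 500-pixel limits are arbitrary examples, not tuned values)
keep <- area > 5 & area < 500
pxs.filtered <- pxs[keep]
length(pxs.filtered)  # count after area filtering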

Now we want to convert the outlines to central points for further analysis.
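
The code that generated the point coordinates is not shown; a minimal sketch of one way to do it is to take the centroid (mean x and y of the pixels) of each connected component. The xy_remnn data frame used below is assumed to be derived from something like these centroids, possibly after removing near-neighbor duplicates.

# sketch: centroid of each connected component
# where() returns the x/y coordinates of the TRUE pixels in a pixset
centroids <- do.call(rbind, lapply(pxs, function(p) {
  coords <- where(p)
  data.frame(x = mean(coords$x), y = mean(coords$y))
}))
head(centroids)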

Using these newly generated points, we can start doing some density analyses.

#---------------------------------------------------
#####################################################
############ Count and density statistics############
#####################################################

# make a new dataframe with only x,y columns
df <- xy_remnn[,c(1,2)]

# convert to a ppp object for spatstat analyses
df_ss    <- haz.ppp(df)

# plot to ensure match with new ppp object
par(mfrow=c(1,2))
plot(im);
plot(df_ss, main=NULL, cols="red", pch=20, add =TRUE)
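
For reference, the same conversion can be done with spatstat alone; the sketch below assumes the full image extent as the observation window, which may differ slightly from the window haz.ppp() infers from the coordinates themselves.

# alternative (not used further below): build the ppp object directly,
# using the image dimensions as the observation window
win    <- owin(xrange = c(0, width(im)), yrange = c(0, height(im)))
df_ss2 <- ppp(x = df[,1], y = df[,2], window = win)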


# make quadrats for counts
# nx = number of quadrats in the x direction, ny = number in the y direction
Q <- quadratcount(df_ss, nx= 6, ny=3)
plot(im)
plot(df_ss, pch=20, cols="red", main=NULL, add=TRUE)  # Plot points
plot(Q, add=TRUE, col = "white", size=13, lwd=3, cex = 2)  # Add quadrat grid

par(mfrow=c(2,2))
# Compute the density for each quadrat
Q.d <- intensity(Q)
plot(im)
# Plot the densities as a heatmap
plot(intensity(Q, image=TRUE),las=1, col=rev(heat.colors(25, alpha=0.3)), add=TRUE)  # Plot density raster
plot(df_ss, pch=20, cex=0.6, cols=rgb(0,0,0,0.5), add=TRUE)  # Add points (transparency set via rgb alpha)

library(viridis)  # inferno() color palette for the density maps

# density with default function
K1 <- density(df_ss) # Using the default bandwidth
plot(im)
plot(K1, main=NULL, las=1, col=inferno(25, alpha=0.5), add=TRUE)
contour(K1, add=TRUE, col = "white")

# set the bandwidth sigma
K2 <- density(df_ss, sigma=50) # Using a 50-pixel bandwidth
plot(im)
plot(K2, main=NULL, las=1, col=inferno(25, alpha=0.5), add=TRUE)
contour(K2, add=TRUE, col = "white")

# change and test other density functions
K3 <- density(df_ss, kernel = "disc", sigma=50) # Disc kernel with a 50-pixel bandwidth
plot(im)
plot(K3, main=NULL, las=1, col=inferno(25, alpha=0.5), add=TRUE)
contour(K3, add=TRUE, col = "white")

Now we can check some of the distance statistics as well.
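
The code for this step is not shown; a minimal sketch of the kind of distance-based summaries spatstat provides (nearest-neighbor distances and a distance map) would look something like this:

# sketch: basic distance statistics for the bird point pattern
nnd <- nndist(df_ss)    # nearest-neighbor distance for each bird
summary(nnd)            # distribution of nearest-neighbor distances
plot(distmap(df_ss), main = "Distance to nearest bird")  # distance map over the window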

By using Monte Carlo simulation together with an Average Nearest Neighbor (ANN) analysis, we can also generate a pseudo p-value to test the null hypothesis of a random distribution.

## [1] 10.95054

## [1] 0.003333333
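
The code behind the two values above is not shown; one common way to get an ANN-based pseudo p-value is a Monte Carlo test of the observed average nearest-neighbor distance against randomly generated patterns, roughly as sketched below. The 299 simulations are an assumed number (giving a minimum possible pseudo p-value of 1/300).

# sketch: Monte Carlo test of the observed ANN against complete spatial randomness
obs.ann <- mean(nndist(df_ss))   # observed average nearest-neighbor distance

n.sim   <- 299                   # assumed number of simulations
sim.ann <- numeric(n.sim)
for (i in 1:n.sim) {
  # random pattern with the same number of points in the same window
  rand.p     <- runifpoint(n = df_ss$n, win = Window(df_ss))
  sim.ann[i] <- mean(nndist(rand.p))
}

# pseudo p-value: proportion of simulations at least as clustered (ANN as small) as observed
p.pseudo <- (sum(sim.ann <= obs.ann) + 1) / (n.sim + 1)
p.pseudo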

I’m excited to see what others will do with this code and what other possible applications there are in ecology! I hope this helps someone somewhere!