This protocol describes five steps that filter the Hoge Veluwe camera trap data down to directly usable animal images of the species of interest.
Implementing Agouti camera trap data in R
First, we install the required packages and set our working directory.
if(!"tidyverse" %in% rownames(installed.packages())){install.packages("tidyverse")}
if(!"lubridate" %in% rownames(installed.packages())){install.packages("lubridate")}
if(!"rsample" %in% rownames(installed.packages())){install.packages("rsample")}
if(!"utils" %in% rownames(installed.packages())){install.packages("utils")}
if(!"imager" %in% rownames(installed.packages())){install.packages("imager")}
library(tidyverse)
library(lubridate)
library(rsample)
library(utils)
library(imager)
main_dir = "C:/Users/jorri/OneDrive - Wageningen University & Research/02Thesis/Project_Thesis_JorritvanGils"
sub_dir<-"downloads"
setwd(main_dir)
The data consists of three CSV files with camera trap record information:
obs_dat <- read_csv("data/raw/hoge-veluwe-wildlife-monitoring-project-20210722055531/observations.csv")
assets_dat <- read_csv("data/raw/hoge-veluwe-wildlife-monitoring-project-20210722055531/multimedia.csv")
dep_dat <- read_csv("data/raw/hoge-veluwe-wildlife-monitoring-project-20210722055531/deployments.csv")
Merging columns of interest
In RStudio we merge the columns of interest from the three CSV files. Because most information can be found in ‘observations.csv’, this file becomes our base variable: obs_dat.
Information about the image URLs and habitats can be found in the other two CSV files.
The columns of interest per file:
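A minimal sketch of inspecting these columns with glimpse(); the columns listed in the comments are the ones used later in this protocol (taken from the select() and join() calls below).
# Inspect which columns each file provides
glimpse(obs_dat)    # timestamp, deployment_id, sequence_id, scientific_name, count, ...
glimpse(assets_dat) # multimedia_id, sequence_id, file_path, file_name, ...
glimpse(dep_dat)    # deployment_id, location_name, ...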
filtering data
We filter the original Agouti output so that each image contains exactly one animal, and we filter for the species of interest.
These data-reduction steps transform the base variable obs_dat into the new variable obs_deer, discarding images we are not interested in.
base_url = "https://www.agouti.eu/#/project/e1730e39-e15d-41b4-bfeb-4a65912e5553/annotate/sequence/"
obs_deer <- obs_dat %>%
  select(timestamp, deployment_id, sequence_id, scientific_name, count) %>%
  filter(scientific_name == "Cervus elaphus") %>%
  mutate(year = year(timestamp)) %>%
  mutate(url = paste0(base_url, sequence_id)) %>%
  unique() %>%
  group_by(sequence_id) %>%
  mutate(n = n(),
         n_count_unique = length(unique(count))) %>%
  ungroup() %>%
  filter(count == 1, n_count_unique == 1)
obs_deer <- left_join(obs_deer, select(dep_dat, deployment_id, location_name),
                      by="deployment_id")
assets_dat_filter <- filter(assets_dat, sequence_id %in% obs_deer$sequence_id)
obs_deer <- assets_dat_filter %>%
  select(multimedia_id, sequence_id, file_path, file_name) %>%
  left_join(obs_deer, by = "sequence_id")
# drop the helper columns that were only needed for filtering
drops <- c("n","n_count_unique")
obs_deer <- obs_deer[ , !(names(obs_deer) %in% drops)]
filter and split train and test dataset
set.seed() makes the grouping reproducible. If splitting into a training and a test set is not necessary, change the group_vfold_cv argument v to 1 (instead of 5) and all images will be assigned to one group (dat_train).
The Hoge Veluwe images are part of sequences. Grouping by sequence is needed because the user later manually selects an image from each sequence.
Creating independent training and test sets is an important step in many machine learning applications. Here, the training and test sets are split by camera location (group_vfold_cv with group = location_name); in addition, the training set only keeps sequences recorded before 2019 and the test set only keeps sequences from 2019 onwards.
An additional filter is applied to prevent bias:
* per year and per location, a maximum of 10 sequences
Only sequences that fulfill the above requirements are assigned to the training set (700 sequences) and the test set (175 sequences). These numbers were chosen to retain the maximum number of suitable sequences.
set.seed(1234567)
dat_split <- group_vfold_cv(obs_deer, group = location_name, v = 5)
dat_train <- analysis(dat_split$splits[[1]])
dat_test <- assessment(dat_split$splits[[1]])
dat_train <- dat_train %>%
  select(sequence_id, deployment_id, timestamp, location_name) %>%
  group_by(sequence_id) %>%
  slice_head(n=1) %>%
  ungroup() %>%
  filter(year(timestamp) < 2019) %>%
  group_by(year(timestamp), location_name) %>%
  slice_sample(n=10) %>%
  ungroup() %>%
  slice_sample(n=700)
dat_test <- dat_test %>%
  select(sequence_id, deployment_id, timestamp, location_name) %>%
  group_by(sequence_id) %>%
  slice_head(n=1) %>%
  ungroup() %>%
  filter(year(timestamp) >= 2019) %>%
  group_by(year(timestamp), location_name) %>%
  slice_sample(n=10) %>%
  ungroup() %>%
  slice_sample(n=175)
Depending on which of the two lines below you run, the subsequent code performs the task for either the training or the test group.
seq_unique <- unique(dat_train$sequence_id); mainFolder = "data/images/downloads/train" #(1)
seq_unique <- unique(dat_test$sequence_id); mainFolder = "data/images/downloads/test" #(2)
downloading image sequences
Now we are ready to download the Red deer sequences. We create a folder at the path defined right above (mainFolder = …). Using a for loop, the images are downloaded into the corresponding sequence folders.
if(!dir.exists(file.path(main_dir, mainFolder))){dir.create(file.path(main_dir, mainFolder), recursive=TRUE)}
tstart <- Sys.time()
for (seq_id in seq_unique){
  cat("\n", match(seq_id, seq_unique), "of", length(seq_unique))
  obs_deer_focal <- obs_deer %>%
    filter(sequence_id == seq_id)
  if(!dir.exists(file.path(main_dir, mainFolder, seq_id))){dir.create(file.path(main_dir, mainFolder, seq_id))}
  for(seq_img in obs_deer_focal$file_path)
  {
    y <- strsplit(seq_img, split = "/")[[1]]
    download.file(url = seq_img, destfile = file.path(main_dir, mainFolder, seq_id, paste0(y[5],".jpg")), method ="curl", quiet = TRUE)
  }
}
tend <- Sys.time()
tend - tstart
dt = (tend - tstart)
# rough projection of a completion time by scaling the measured duration by a factor 600 (6000/10)
Sys.time() + dt/10*6000
manual image selection
After downloading the images, go to the folders data/images/downloads/train and data/images/downloads/test.
For each of these folders, go through the sequence_id subfolders and select at most one image per sequence_id folder.
Delete:
* images that, despite the selection for count = “1”, still contain multiple animals
* images that are blurry
* images in which only a small part of the animal is visible
* images in which the animal is partly out of the screen
Sometimes a sequence_id folder becomes empty (no worries!).
Example: A sequence_id folder contains 20 images. For our balanced behaviour dataset we need behaviour from the category ‘other’. We remove 19 images and keep 1 image in the sequence folder, showing vigilance behaviour.
Example: All 30 images are extremely dark and blurred by mist. We decide to delete all images; an empty folder remains.
When your aim is to build an automatic classification model, try to select images based on the behavior of interest, ideally with an equal number of training images per category.
Most images fall in the category ‘moving’, and ‘foraging’ is also abundant.
The category ‘other’ is rarer, so always choose images from the ‘other’ category,
and sometimes prefer ‘foraging’ over ‘moving’, to end up with a balanced main-behavior dataset.
Below is an example of a more or less evenly distributed main behavior.
Automatic classification of the sub behavior will only work if there is enough training data per category. As the figure below shows, the sub categories do not contain enough images (say, 100 per category), and the category counts vary a lot between the sub behaviors. Therefore, classifying sub behavior would require more data and perhaps grouping of categories.
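As a rough sketch (assuming the imgPose tibble created in the annotation step further below has been filled in), the number of annotated images per category can be tabulated like this to check how balanced the dataset is.
# Count annotated images per main behaviour and per sub behaviour
imgPose %>% count(behaviour, sort = TRUE)
imgPose %>% count(behaviour, behaviour_sub, sort = TRUE)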
Tests on the remaining images with the deep learning object detection model YOLO showed that, although it seemed as if there was only one animal in the image, some images still contained multiple animals.
Therefore, all images with these sequence_ids are also deleted:
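A minimal sketch of how such flagged sequences could also be removed from obs_deer; the flagged_sequences values below are placeholders, not the actual sequence ids.
# Placeholder vector: fill in the sequence_id values flagged by the YOLO check
flagged_sequences <- c("sequence-id-1", "sequence-id-2")
# Drop these sequences so they are excluded from all later steps
obs_deer <- obs_deer %>%
  filter(!sequence_id %in% flagged_sequences)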
We have now finalized our image selection, both through data filtering and manual image selection. The remaining images in the folders downloads/train and downloads/test are collected and preprocessed.
Example result (part of the test downloads folder):
collect and preprocess
Which folder (train or test) we collect and process first depends on whether we run the first or the second line of the code below.
seq_unique <- unique(dat_train$sequence_id); mainFolder = "data/images/downloads/train" #(1)
seq_unique <- unique(dat_test$sequence_id); mainFolder = "data/images/downloads/test" #(2)
We transfer the images from the downloads folder to the processed folder.
We also crop the images to remove the textual information at the top and bottom of each image:
save_image_folder <- gsub(mainFolder, pattern="downloads", replacement = "processed")
if(!dir.exists(file.path(main_dir, save_image_folder))){dir.create(file.path(main_dir, save_image_folder), recursive=TRUE)}
seqFolders <- list.files(file.path(main_dir, mainFolder), recursive=FALSE)
for(iseq in seqFolders)
{
  iseqImages <- list.files(file.path(main_dir, mainFolder, iseq), recursive=FALSE)
  for(iseqimg in iseqImages)
  {
    aimg <- load.image(file.path(main_dir, mainFolder, iseq, iseqimg))
    cropTop = 40
    cropBottom = 70
    aimgsub <- imsub(aimg,
                     x %inr% c(0, dim(aimg)[1]),
                     y %inr% c(cropTop, dim(aimg)[2]-cropBottom))
    save.image(aimgsub, file=file.path(main_dir, save_image_folder, iseqimg), quality=0.7)
  }
}
The result is two folders: ‘processed/test’ and ‘processed/train’.
Plot and annotate images
Which images (train or test) we label first depends on whether we run the first two lines (train) or the third and fourth lines (test) of the code below.
seq_unique <- unique(dat_train$sequence_id); mainFolder = "data/images/downloads/train"
save_image_folder <- gsub(mainFolder, pattern="downloads", replacement = "processed")
seq_unique <- unique(dat_test$sequence_id); mainFolder = "data/images/downloads/test"
save_image_folder <- gsub(mainFolder, pattern="downloads", replacement = "processed")
We loop over the images in the folder selected above and give each image one label for ‘Main behavior’ and one label for ‘Sub behavior’. We do this using short R codes (see the script below, labelOptions). The three main behaviors are moving, foraging and other; the sub behaviors are also specified in the script. An annotation example is shown below, and images are added later to illustrate what it looks like.
labelOptions <- c(moving = "m", foraging = "f", other = "o")
labelOptionsSub <- c(running = "ru", walking = "w", scanning = "sc",
                     browsing = "b", grazing = "gra", roaring = "ro",
                     sitting = "si", grooming = "gro", standing = "st",
                     vigilance = "v", camera_watching = "c")
imglist <- list.files(save_image_folder)
imgPose <- tibble(i = seq_along(imglist),
                  image = imglist,
                  behaviour = factor(NA_character_, levels = as.character(labelOptions)),
                  behaviour_sub = factor(NA_character_, levels = as.character(labelOptionsSub)))
imgPose
levels(imgPose$behaviour)
while (!is.null(dev.list())) dev.off()
for(i in imgPose$i[is.na(imgPose$behaviour)])
{
  aimg <- load.image(file.path(main_dir, save_image_folder, imglist[i]))
  thmb <- resize(aimg, -100, -100)
  plot(thmb, main=i, axes=FALSE, interpolate=FALSE)
  correctAnnotation <- FALSE
  while(correctAnnotation == FALSE)
  {
    annotation <- readline(prompt = paste0("image ",i," -- enter label: \n", paste(paste0(names(labelOptions), " = ", labelOptions), collapse = ", "),"\n"))
    if(annotation %in% labelOptions)
    {
      annotation <- factor(annotation, levels = as.character(labelOptions))
      correctAnnotation <- TRUE
    }else
    {
      cat("INCORRECT label, enter correct label!")
    }
  }
  imgPose$behaviour[i] <- annotation
  correctAnnotation <- FALSE
  while(correctAnnotation == FALSE)
  {
    annotation <- readline(prompt = paste0("image ",i," -- enter sub_label: \n", paste(labelOptionsSub, collapse = ", "),"\n"))
    if(annotation %in% labelOptionsSub)
    {
      annotation <- factor(annotation, levels = as.character(labelOptionsSub))
      correctAnnotation <- TRUE
    }else
    {
      cat("INCORRECT label, enter correct label!")
    }
  }
  imgPose$behaviour_sub[i] <- annotation
  while (!is.null(dev.list())) dev.off()
}
saveRDS(imgPose, file=file.path(main_dir, save_image_folder, "imgPose.rds"))
The result is one rds file (imgPose.rds) in data/images/processed/train and one rds file in data/images/processed/test (partly shown below).
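A minimal sketch of how one of these files can be read back and inspected (the imgPose_train name is only illustrative).
# Read back the annotations for the training folder and show the first rows
imgPose_train <- readRDS(file.path(main_dir, "data/images/processed/train/imgPose.rds"))
head(imgPose_train)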
Detailed examples
Decision boundary behavior
Hard-to-distinguish combinations:
merging train and test dataset
The code below allows us to correct annotation errors, and the train and test .rds files are then merged into one tibble containing the following information:
labelsTrain <- readRDS("C:/Users/jorri/OneDrive - Wageningen University & Research/02Thesis/Project_Thesis_JorritvanGils/data/images/processed/train/imgPose.rds")
labelsTest <- readRDS("C:/Users/jorri/OneDrive - Wageningen University & Research/02Thesis/Project_Thesis_JorritvanGils/data/images/processed/test/imgPose.rds")
# Manual corrections of annotation errors found afterwards
labelsTest$behaviour[80] <- "f"
labelsTrain$behaviour[380] <- "f"
# Flag whether a record belongs to the validation (test) set; the column is
# renamed to in_validation_set below, which is why the training set gets FALSE
labelsTrain$train = FALSE
labelsTest$train = TRUE
labels <- bind_rows(labelsTrain,
                    labelsTest)
labels <- labels %>%
  mutate(image = gsub(image, pattern=" - Copy", replacement="")) %>%
  rename(file_name = image) %>%
  mutate(multimedia_id = file_name) %>%
  mutate(multimedia_id = gsub(multimedia_id, pattern=".jpg", replacement="")) %>%
  rename(in_validation_set = train)
labels <- left_join(labels, select(obs_deer, multimedia_id, file_path, deployment_id, sequence_id, location_name, timestamp),
                    by="multimedia_id")
labels <- labels %>%
  rename(path = file_path)
We create a new folder that contains the final output: data/processed/labels_behaviour. Be aware that this new folder is different from the previous folder data/images/processed(!)
labelFolder = "data/processed/labels_behaviour"
if(!dir.exists(file.path(main_dir, labelFolder))){dir.create(file.path(main_dir, labelFolder), recursive=TRUE)}
saveRDS(labels, file=file.path(main_dir, labelFolder, "labels_Reddeer_JG.rds"))
write.csv(labels, file=file.path(main_dir, labelFolder, "labels_Reddeer_JG.csv"))
labels <- readRDS("C:/Users/jorri/OneDrive - Wageningen University & Research/02Thesis/Project_Thesis_JorritvanGils/data/processed/labels_behaviour/labels_Reddeer_JG.rds")
labels
Here we find the final result: a tibble of images of clearly visible animals, annotated with main behavior and sub behavior. In this case Red deer was selected, blurry images and images with the animal at the edge of the frame were filtered out, and every image contains exactly one animal. This tibble forms an excellent starting point for behavior analysis in relation to time, habitat or other variables.
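As an illustration (not part of the original protocol), a first exploration of the labelled data could look like this, using the lubridate hour() helper on the timestamp column.
# How often does each main behaviour occur per hour of the day?
labels %>%
  mutate(hour = hour(timestamp)) %>%
  count(behaviour, hour)
# How are the main behaviours distributed over camera locations?
labels %>%
  count(location_name, behaviour)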
I hope this helped you. If you have any questions, feel free to contact me via my university e-mail jorritvangils@wur.nl or vangilsjorrit@gmail.com.