Behaviour annotation protocol for camera-trap data in R

by Jorrit van Gils

Table of contents:

  • Part 01 - data selection

  • Part 02 - download and image selection

  • Part 03 - collect and preprocess

  • Part 04 - labeling

  • Part 05 - create labels table

This protocol describes five steps that filter the Hoge Veluwe camera-trap data down to directly usable animal images of the species of interest.


Part 01 - data selection

Importing Agouti camera-trap data into R

  1. Go to https://www.agouti.eu/, register (preferably with a personal e-mail address), sign in, and ask the moderator (Y. Liefting) for access to the ‘Hoge Veluwe wildlife monitoring project’.
  2. Once in the project, click on ‘Export data’ and download the most recent version of the data, for example ‘19 October 2021’.
  3. Unzip the file to a folder, e.g. C:/Users/jorri/OneDrive - Wageningen University & Research/02Thesis/Project_Thesis_JorritvanGils. I used my OneDrive, but you can use any folder you want.

First we install (if necessary) and load the required packages, and set our working directory.

if(!"tidyverse" %in% rownames(installed.packages())){install.packages("tidyverse")}
if(!"lubridate" %in% rownames(installed.packages())){install.packages("lubridate")}
if(!"rsample" %in% rownames(installed.packages())){install.packages("rsample")} 
if(!"utils" %in% rownames(installed.packages())){install.packages("utils")} 
if(!"imager" %in% rownames(installed.packages())){install.packages("imager")} 
library(tidyverse)
library(lubridate)
library(rsample) 
library(utils) 
library(imager) 

main_dir <- "C:/Users/jorri/OneDrive - Wageningen University & Research/02Thesis/Project_Thesis_JorritvanGils"
sub_dir <- "downloads"
setwd(main_dir) 

The data contains three CSV files with camera-trap record information:

  • observations.csv
  • multimedia.csv
  • deployments.csv

obs_dat <- read_csv("data/raw/hoge-veluwe-wildlife-monitoring-project-20210722055531/observations.csv")
assets_dat <- read_csv("data/raw/hoge-veluwe-wildlife-monitoring-project-20210722055531/multimedia.csv")
dep_dat <- read_csv("data/raw/hoge-veluwe-wildlife-monitoring-project-20210722055531/deployments.csv")

Merging columns of interest
In R we merge the columns of interest from the three CSV files. Because most information can be found in ‘observations.csv’, this file will be our base variable: obs_dat.

Information about the image-url and habitats can be found in the two other csv-files.

  • image-url is obtained from ‘multimedia.csv’*
  • habitats are obtained from ‘deployments.csv’*

All columns of interest:

  • timestamp (time of the record)
  • deployment_id (unique deployment code)
  • sequence_id (unique sequence code)
  • scientific_name (name of the species)
  • count (number of animals observed)
  • file_path (image-url)*
  • location_name (habitat)*

Filtering the data
We filter the original Agouti output to keep, per image, exactly one animal, and we filter for the species of interest.

  • scientific_name = “Cervus elaphus” (Red deer)
  • count = 1, with a single unique count value per sequence

These data-reduction steps transform the base variable obs_dat into the new variable obs_deer, discarding images we are not interested in.

  • obs_dat (observations.csv) contained 454904 images and 20 variables
  • obs_deer contained 225724 images and 12 variables

base_url <- "https://www.agouti.eu/#/project/e1730e39-e15d-41b4-bfeb-4a65912e5553/annotate/sequence/"
obs_deer <- obs_dat %>% 
  select(timestamp, deployment_id, sequence_id, scientific_name, count) %>% 
  filter(scientific_name == "Cervus elaphus") %>% 
  mutate(year = year(timestamp)) %>% 
  mutate(url = paste0(base_url, sequence_id)) %>% # link back to the Agouti annotation page
  unique() %>% 
  group_by(sequence_id) %>% 
  mutate(n=n(),
         n_count_unique = length(unique(count))) %>% 
  ungroup() %>% 
  filter(count == 1, n_count_unique == 1) # keep sequences with exactly one animal throughout

# add habitat information (location_name) from deployments.csv
obs_deer <- left_join(obs_deer, select(dep_dat, deployment_id, location_name), 
                      by="deployment_id")

# add image information (file_path, file_name) from multimedia.csv
assets_dat_filter <- filter(assets_dat, sequence_id %in% obs_deer$sequence_id)
obs_deer <- assets_dat_filter %>% 
  select(multimedia_id, sequence_id, file_path, file_name) %>% 
  left_join(obs_deer, by = "sequence_id")

# drop the helper columns
drops <- c("n","n_count_unique")
obs_deer <- obs_deer[ , !(names(obs_deer) %in% drops)]
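
A quick check of the reduction (a sketch; the exact numbers depend on the export date):

dim(obs_dat)  # original observations table, e.g. 454904 rows
dim(obs_deer) # filtered Red deer table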

Filter and split into train and test datasets
set.seed() makes the grouping reproducible. If splitting into a train and a test set is not necessary, change the group_vfold_cv argument v to 1 (instead of 5), and all images will be assigned to one group (dat_train).

The Hoge Veluwe images come in sequences. Grouping by sequence is needed because the user will later manually select one image per sequence.

Creating independent train and test sets is an important step in many machine-learning applications. Here, the training and test datasets are split based on the following criteria:

  • sequences from different camera stations
  • sequences from before 2019 (train) and from 2019 onward (test)

An additional filter is applied to prevent bias: at most 10 sequences per location per year.

Only sequences that fulfil the above requirements are assigned to the training set (700) and the test set (175). These values were chosen to obtain the maximum number of suitable sequences.

set.seed(1234567)
dat_split <- group_vfold_cv(obs_deer, group = location_name, v = 5)
dat_train <- analysis(dat_split$splits[[1]])
dat_test <- assessment(dat_split$splits[[1]])

dat_train <- dat_train %>%
  select(sequence_id, deployment_id, timestamp, location_name) %>%
  group_by(sequence_id) %>%
  slice_head(n=1) %>%
  ungroup() %>%
  filter(year(timestamp) < 2019) %>%
  group_by(year(timestamp), location_name) %>%
  slice_sample(n=10) %>%
  ungroup() %>%
  slice_sample(n=700)

dat_test <- dat_test %>%
  select(sequence_id, deployment_id, timestamp, location_name) %>%
  group_by(sequence_id) %>%
  slice_head(n=1) %>%
  ungroup() %>%
  filter(year(timestamp) >= 2019) %>%
  group_by(year(timestamp), location_name) %>%
  slice_sample(n=10) %>%
  ungroup() %>%
  slice_sample(n=175)
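
A quick sanity check (a sketch) confirms that the two sets share no camera stations and respect the year split:

# should return character(0): no shared camera stations
intersect(unique(dat_train$location_name), unique(dat_test$location_name))
# train years should all be before 2019, test years from 2019 onward
range(year(dat_train$timestamp))
range(year(dat_test$timestamp))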

Depending on which of the two lines below you run (train or test), the subsequent code will perform the task for that group.

seq_unique  <- unique(dat_train$sequence_id); mainFolder = "data/images/downloads/train" #(1)
seq_unique  <- unique(dat_test$sequence_id); mainFolder = "data/images/downloads/test" #(2)

Part 02 - download and image selection

Downloading image sequences
Now we are ready to download the Red deer sequences. We create a folder at the path selected right above (mainFolder = …). A for loop then downloads the images into the corresponding sequence folders. As an indication of the download time:

  • training set - 8 hours
  • test set - 2 hours

if(!dir.exists(file.path(main_dir, mainFolder))){dir.create(file.path(main_dir, mainFolder), recursive=TRUE)}
tstart <- Sys.time()
for (seq_id in seq_unique){
  # progress indicator
  cat("\n", match(seq_id, seq_unique), "of", length(seq_unique))
  obs_deer_focal <- obs_deer %>%
    filter(sequence_id == seq_id)
  # one folder per sequence
  if(!dir.exists(file.path(main_dir, mainFolder, seq_id))){dir.create(file.path(main_dir, mainFolder, seq_id))}
  for(seq_img in obs_deer_focal$file_path)
  {
    # the 5th element of the url path serves as the file name
    y <- strsplit(seq_img, split = "/")[[1]]
    download.file(url = seq_img, destfile = file.path(main_dir, mainFolder, seq_id, paste0(y[5],".jpg")),  method ="curl", quiet = TRUE)
  }
}
tend <- Sys.time()
tend - tstart
# rough extrapolation of the expected finish time from the elapsed time
dt <- (tend - tstart)
Sys.time() + dt/10*6000
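
Downloads of this size can be interrupted. A small guard around download.file() lets you re-run the loop and skip files that are already on disk — a sketch of the inner loop body:

# inside the inner for loop: skip images that were already downloaded
dest <- file.path(main_dir, mainFolder, seq_id, paste0(y[5], ".jpg"))
if(!file.exists(dest)){
  download.file(url = seq_img, destfile = dest, method = "curl", quiet = TRUE)
}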

Manual image selection
After downloading the images, go to the folders:

  • data/images/downloads/train
  • data/images/downloads/test

For each of these folders, go over the sequence_id folders and keep at most one image per sequence_id folder.

Delete:

  • images that, despite the selection for count = 1, still contain multiple animals
  • images that are blurry
  • images in which only a small part of the animal is visible
  • images in which the animal is partly out of frame

Sometimes a sequence_id folder ends up empty (no worries!):

  1. Example: a sequence_id folder contains 20 images. For our balanced behaviour dataset we need behaviour from the category ‘other’. We remove 19 images, and one image with vigilance behaviour remains in the sequence folder.

  2. Example: all 30 images are extremely dark and blurred by mist. We decide to delete all of them; an empty folder remains.
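
After the manual pass, you can list in R which sequence folders ended up empty — a sketch:

# sequence folders from which all images were deleted
seqPaths <- list.files(file.path(main_dir, mainFolder), full.names = TRUE)
isEmpty <- sapply(seqPaths, function(f) length(list.files(f)) == 0)
basename(seqPaths[isEmpty])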

When your aim is to build an automatic classification model, try to select images based on the behaviour of interest, ideally with an equal number of training images per category:

  • e.g. train: ±150 moving, ±150 foraging, ±150 other
  • e.g. test: ±50 moving, ±50 foraging, ±50 other

Most images fall in the category ‘moving’; ‘foraging’ is also abundant. The category ‘other’ is rarer. Always choose images from the ‘other’ category, and sometimes prefer ‘foraging’ over ‘moving’, to end up with a balanced main-behaviour dataset.

Below is an example of a more or less evenly distributed main-behaviour dataset.

Automatic classification of the sub-behaviour will only work if there is enough training data per category. As the figure below shows, the subcategories do not contain enough images (say, 100 per category) and the counts vary considerably between sub-behaviours. To classify sub-behaviour, therefore, more data and perhaps a grouping of categories is needed.

Tests on the remaining images with the deep-learning object-detection model YOLO showed that, although it seemed as if there was only one animal in the image, some images still contained multiple animals.

Therefore, all images with the following sequence_ids are also deleted (a sketch for removing them in R follows the list):

  • 055e2d5d-85ac-4006-b1ca-9cb9dd586c77
  • 4ebcb62a-68dd-4062-b4f6-81e52d6df094
  • d257e9bb-2aa8-4382-923e-adcc4d33a58f
  • ee30e8d0-38e4-4f94-a375-98947531edfd
  • e421d01a-d5d7-474e-a320-d00a7cb67587
  • 4c6ea438-4060-4724-b06f-66413fe5613a
  • 494ba3fc-d5f7-4767-bd6d-1dfcd26902ce
  • f23bdc85-8b0c-4c47-8163-897adc3f4171
  • 4ab11f57-8142-4184-8110-fa7734ed6fc3
  • 06730a7a-30bc-429d-8892-8f7b01b61be9
  • f8430bf3-7a81-4f32-950a-b5134770e79b
  • 1ef1938f-1e72-48f4-8d8c-543745fe1c12
  • 6791100f-9162-47c8-b5d0-703f47c213e4
  • 917c37e3-9652-44fd-bf46-b99d1ad1b25c
  • 9454de69-71bb-41dd-b8f4-fad9c9b38dbe
  • 9c55c28a-45e1-4a3d-8262-f5994c28d67d
  • 30f50b5c-2114-4f21-8e81-39e53bfe9dbc
  • af8ed74a-a089-4c39-971d-ee579d85d30d
  • b8b4060b-3341-4aa1-a633-e12cd04b918a
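
Rather than deleting these folders by hand, you could remove them in R — a sketch, assuming the sequence_ids above are collected in a vector bad_seqs:

bad_seqs <- c("055e2d5d-85ac-4006-b1ca-9cb9dd586c77",
              "4ebcb62a-68dd-4062-b4f6-81e52d6df094") # ... plus the rest of the list above
# delete the corresponding sequence folders and their contents
unlink(file.path(main_dir, mainFolder, bad_seqs), recursive = TRUE)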

We have now finalized our image selection, through both data selection and manual image selection. The remaining images in downloads/train and downloads/test are collected and preprocessed next.

Example result (part of the test downloads folder):


Part 03 - collect and preprocess

Collect and preprocess
Which folder (train or test) we collect and process first depends on whether we run the first or the second line of the code below.

seq_unique  <- unique(dat_train$sequence_id); mainFolder = "data/images/downloads/train" #(1)
seq_unique  <- unique(dat_test$sequence_id); mainFolder = "data/images/downloads/test" #(2)

We transfer the images from the downloads folder to the processed folder:

  • data/images/downloads/train -> data/images/processed/train
  • data/images/downloads/test -> data/images/processed/test

We also crop the images, removing the textual information bars at the top and bottom.
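
Before running the full loop, it can help to preview the crop on a single image, to check that 40 pixels off the top and 70 off the bottom indeed remove the information bars — a sketch:

# preview the crop on the first image of the first sequence folder
firstSeq <- list.files(file.path(main_dir, mainFolder))[1]
firstImg <- list.files(file.path(main_dir, mainFolder, firstSeq))[1]
aimg <- load.image(file.path(main_dir, mainFolder, firstSeq, firstImg))
plot(imsub(aimg, y %inr% c(40, dim(aimg)[2] - 70)))

The full transfer-and-crop loop: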

save_image_folder <- gsub(mainFolder, pattern="downloads", replacement = "processed")
if(!dir.exists(file.path(main_dir, save_image_folder))){dir.create(file.path(main_dir, save_image_folder), recursive=TRUE)}
seqFolders <- list.files(file.path(main_dir, mainFolder), recursive=FALSE)

for(iseq in seqFolders)
{
  iseqImages <- list.files(file.path(main_dir, mainFolder, iseq), recursive=FALSE)
  for(iseqimg in iseqImages)
  {
    aimg <- load.image(file.path(main_dir, mainFolder, iseq, iseqimg))
    # pixels to crop off the top and bottom (information bars)
    cropTop = 40
    cropBottom = 70
    aimgsub <- imsub(aimg,
                     x %inr% c(0,dim(aimg)[1]), 
                     y %inr% c(cropTop, dim(aimg)[2]-cropBottom)) 
    # imager::save.image (masking base R's save.image) writes the cropped jpg
    save.image(aimgsub, file=file.path(main_dir, save_image_folder , iseqimg), quality=0.7)
  }
}

The result is two folders: ‘processed/train’ and ‘processed/test’.


Part 04 - labeling

Plot and annotate images

Which images (train or test) we label first depends on whether we run the first two lines (train) or the last two lines (test) of the code below.

seq_unique  <- unique(dat_train$sequence_id); mainFolder = "data/images/downloads/train"
save_image_folder <- gsub(mainFolder, pattern="downloads", replacement = "processed")

seq_unique  <- unique(dat_test$sequence_id); mainFolder = "data/images/downloads/test"
save_image_folder <- gsub(mainFolder, pattern="downloads", replacement = "processed")

We loop over the images in the selected folder and give each image one label for main behaviour and one label for sub-behaviour, entered as keyboard shortcuts (see labelOptions in the script below). The three main behaviours are moving, foraging and other; the sub-behaviours belonging to each are listed below. Detailed annotation examples, with images, follow later in this part.

  • moving
    • walking
    • running
  • foraging
    • scanning
    • browsing
    • grazing
  • other
    • roaring
    • sitting
    • grooming
    • standing
    • vigilance
    • camera watching

labelOptions <- c(moving = "m", foraging = "f", other = "o")
labelOptionsSub <- c(running = "ru", walking = "w", scanning = "sc",
                     browsing = "b", grazing = "gra", roaring = "ro",
                     sitting = "si", grooming = "gro", standing = "st",
                     vigilance = "v", camera_watching = "c")
imglist <- list.files(save_image_folder)
# one row per image; labels start as NA and are filled in interactively
imgPose <- tibble(i = seq_along(imglist), 
                  image = imglist,
                  behaviour = factor(NA_character_, levels = as.character(labelOptions)),
                  behaviour_sub = factor(NA_character_, levels = as.character(labelOptionsSub)))
imgPose
levels(imgPose$behaviour)
# close any open graphics devices before starting
while (!is.null(dev.list()))  dev.off()

# loop over images without a label yet, so annotation can be resumed after a break
for(i in imgPose$i[is.na(imgPose$behaviour)])
{
  aimg <- load.image(file.path(main_dir, save_image_folder, imglist[i]))
  thmb <- resize(aimg, -100, -100) # negative sizes are percentages in imager::resize
  plot(thmb, main=i, axes=FALSE, interpolate=FALSE)
  correctAnnotation <- FALSE
  while(correctAnnotation == FALSE)
  {
    annotation <- readline(prompt = paste0("image ",i," -- enter label: \n", paste(paste0(names(labelOptions), " = ", labelOptions), collapse = ", "),"\n"))
    if(annotation %in% labelOptions)
    {
      annotation <- factor(annotation, levels = as.character(labelOptions))
      correctAnnotation <- TRUE
    }else
    {
      cat("INCORRECT label, enter correct label!")
    }
  }
  imgPose$behaviour[i] <- annotation
  correctAnnotation <- FALSE
  while(correctAnnotation == FALSE)
  {
    annotation <- readline(prompt = paste0("imgage ",i," -- enter sub_label: \n", paste(labelOptionsSub, collapse = ", "),"\n"))
    if(annotation %in% labelOptionsSub)
    {
      annotation <- factor(annotation, levels = as.character(labelOptionsSub))
      correctAnnotation <- TRUE
    }else
    {
      cat("INCORRECT label, enter correct label!")
    }
  }
  imgPose$behaviour_sub[i] <- annotation
  while (!is.null(dev.list()))  dev.off()
}
saveRDS(imgPose, file=file.path(main_dir, save_image_folder, "imgPose.rds"))

The result is one RDS file (imgPose.rds) in data/images/processed/train and one in data/images/processed/test (partly shown below).
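
A quick tally of the saved labels shows whether the selection turned out as balanced as intended — a sketch:

# counts per main and sub-behaviour label
imgPose %>% count(behaviour, behaviour_sub, sort = TRUE)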

Detailed examples

  • moving
    • walking - legs move one by one
    • running - legs move in parallel, feet off the ground, animal blurred
  • foraging
    • scanning - nose (and eyes) points towards the ground (legs may be moving)
    • browsing - muzzle reaches up/down into vegetation, stable leg position
    • grazing - muzzle touches the ground
  • other
    • roaring - mouth opens (and head goes up)
    • sitting - sitting on the ground
    • grooming - snout to body/leg, leg to body part, or body part to object
    • standing - no movement
    • vigilance - similar to standing but ears outward (and firm leg position)
    • camera watching - face points towards the camera

Decision boundaries between behaviours
Hard-to-distinguish combinations:

  • vigilance -> ears point outwards and legs are firmly positioned
  • standing -> ears hang more and legs are more relaxed

  • scanning -> head goes towards the ground (not touching), sometimes moving
  • walking -> head points forward, always moving

  • walking -> feet move alternately
  • running -> left/right feet move in parallel, or feet come off the ground

  • walking -> two or more legs are bent
  • standing -> legs need not be next to each other but are usually straight; one leg may be lifted

  • browsing -> when browsing at head height, the body position can be confused with walking/standing

Part 05 - create labels table

Merging the train and test datasets
The code below corrects a few annotation errors and merges the train and test .rds files into one tibble containing:

  • i (numbering of the images within train/test)
  • file_name (e.g. 00add8fd-bd07-4d43-a11d-ae8c30808c7e.jpg)
  • behaviour
  • behaviour_sub
  • in_validation_set (training dataset == FALSE, test dataset == TRUE)
  • multimedia_id (unique image identifier)
  • path (image url)
  • deployment_id (unique deployment identifier)
  • sequence_id (unique sequence identifier)
  • location_name (habitat)
  • timestamp (time of record)

labelsTrain <- readRDS("C:/Users/jorri/OneDrive - Wageningen University & Research/02Thesis/Project_Thesis_JorritvanGils/data/images/processed/train/imgPose.rds")
labelsTest <- readRDS("C:/Users/jorri/OneDrive - Wageningen University & Research/02Thesis/Project_Thesis_JorritvanGils/data/images/processed/test/imgPose.rds")

# correct two annotation errors found afterwards
labelsTest$behaviour[80] <- "f"
labelsTrain$behaviour[380] <- "f"

# flag that later becomes in_validation_set (FALSE = train, TRUE = test)
labelsTrain$train <- FALSE
labelsTest$train <- TRUE
labels <- bind_rows(labelsTrain, 
                    labelsTest)
labels <- labels %>% 
  mutate(image = gsub(image, pattern=" - Copy", replacement="")) %>% # clean names of accidental Windows copies
  rename(file_name = image) %>% 
  mutate(multimedia_id = file_name) %>% 
  mutate(multimedia_id = gsub(multimedia_id, pattern=".jpg", replacement="", fixed=TRUE)) %>% 
  rename(in_validation_set = train)

labels <- left_join(labels, select(obs_deer, multimedia_id, file_path, deployment_id, sequence_id, location_name, timestamp), 
                    by="multimedia_id")
labels <- labels %>% 
  rename(path = file_path) 

We create a new folder that contains the final output: data/processed/labels_behaviour. Be aware that this new folder is different from the previous folder data/images/processed(!)

labelFolder = "data/processed/labels_behaviour"
if(!dir.exists(file.path(main_dir, labelFolder))){dir.create(file.path(main_dir, labelFolder), recursive=TRUE)}

saveRDS(labels, file=file.path(main_dir, labelFolder, "labels_Reddeer_JG.rds"))
write.csv(labels, file=file.path(main_dir, labelFolder, "labels_Reddeer_JG.csv"))
labels <- readRDS("C:/Users/jorri/OneDrive - Wageningen University & Research/02Thesis/Project_Thesis_JorritvanGils/data/processed/labels_behaviour/labels_Reddeer_JG.rds")
labels

Here we have the final result: a tibble of images of clearly visible animals, each annotated with a main behaviour and a sub-behaviour. In this case Red deer was selected, blurry or edge-of-frame animals were filtered out, and every image contains exactly one animal. This tibble forms an excellent starting point for behaviour analysis in relation to time, habitat or other variables.
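
As a first example of such an analysis, a sketch counting behaviour per habitat:

# number of annotated images per habitat and main behaviour
labels %>% count(location_name, behaviour, sort = TRUE)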

I hope this helped you. If you have any questions, feel free to contact me via university e-mail.