This protocol describes five steps that filter the Hoge Veluwe camera trap data down to directly usable animal images of the species of interest.
Implementing Agouti camera trap data in R
First, we install the required packages and set our working directory.
if(!"tidyverse" %in% rownames(installed.packages())){install.packages("tidyverse")}
if(!"lubridate" %in% rownames(installed.packages())){install.packages("lubridate")}
if(!"rsample" %in% rownames(installed.packages())){install.packages("rsample")}
if(!"utils" %in% rownames(installed.packages())){install.packages("utils")}
if(!"imager" %in% rownames(installed.packages())){install.packages("imager")}
library(tidyverse)
library(lubridate)
library(rsample)
library(utils)
library(imager)
main_dir = "C:/Users/jorri/OneDrive - Wageningen University & Research/02Thesis/Project_Thesis_JorritvanGils"
sub_dir<-"downloads"
setwd(main_dir)
The data consists of three CSV files with camera trap record information:
obs_dat <- read_csv("data/raw/hoge-veluwe-wildlife-monitoring-project-20210722055531/observations.csv")
assets_dat <- read_csv("data/raw/hoge-veluwe-wildlife-monitoring-project-20210722055531/multimedia.csv")
dep_dat <- read_csv("data/raw/hoge-veluwe-wildlife-monitoring-project-20210722055531/deployments.csv")
Merging columns of interest
In RStudio we merge the columns of interest from the three CSV files. Because most information can be found in ‘observations.csv’, this file becomes our base variable: obs_dat.
Information about the image URLs and habitats can be found in the other two CSV files.
The columns of interest per file:
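A minimal sketch of inspecting these columns with glimpse(); the columns listed in the comments are the ones used later in this protocol (taken from the select() and join() calls below).
# Inspect which columns each file provides
glimpse(obs_dat)    # timestamp, deployment_id, sequence_id, scientific_name, count, ...
glimpse(assets_dat) # multimedia_id, sequence_id, file_path, file_name, ...
glimpse(dep_dat)    # deployment_id, location_name, ...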
filtering data
We filter the original Agouti output so that each image contains exactly one animal, and we filter for the species of interest.
These data-reduction steps transform the base variable obs_dat into the new variable obs_deer, discarding images we are not interested in.
base_url = "https://www.agouti.eu/#/project/e1730e39-e15d-41b4-bfeb-4a65912e5553/annotate/sequence/"
obs_deer <- obs_dat %>%
  select(timestamp, deployment_id, sequence_id, scientific_name, count) %>%
  filter(scientific_name == "Cervus elaphus") %>%
  mutate(year = year(timestamp)) %>%
  mutate(url = paste0(base_url, sequence_id)) %>%
  unique() %>%
  group_by(sequence_id) %>%
  mutate(n = n(),
         n_count_unique = length(unique(count))) %>%
  ungroup() %>%
  filter(count == 1, n_count_unique == 1)
obs_deer <- left_join(obs_deer, select(dep_dat, deployment_id, location_name),
                      by="deployment_id")
assets_dat_filter <- filter(assets_dat, sequence_id %in% obs_deer$sequence_id)
obs_deer <- assets_dat_filter %>%
  select(multimedia_id, sequence_id, file_path, file_name) %>%
  left_join(obs_deer, by = "sequence_id")
# drop the helper columns that were only needed for filtering
drops <- c("n","n_count_unique")
obs_deer <- obs_deer[ , !(names(obs_deer) %in% drops)]
filter and split train and test dataset
set.seed() makes the grouping reproducible. If splitting into a training and a test set is not necessary, change the group_vfold_cv argument v to 1 (instead of 5) and all images will be assigned to one group (dat_train).
The Hoge Veluwe images are part of sequences. Grouping by sequence is needed because the user later manually selects an image from each sequence.
Creating independent training and test sets is an important step in many machine learning applications. Here, the training and test sets are split by camera location (group_vfold_cv with group = location_name); in addition, the training set only keeps sequences recorded before 2019 and the test set only keeps sequences from 2019 onwards.
An additional filter is applied to prevent bias:
* per year and per location, a maximum of 10 sequences
Only sequences that fulfill the above requirements are assigned to the training set (700 sequences) and the test set (175 sequences). These numbers were chosen to retain the maximum number of suitable sequences.
set.seed(1234567)
dat_split <- group_vfold_cv(obs_deer, group = location_name, v = 5)
dat_train <- analysis(dat_split$splits[[1]])
dat_test <- assessment(dat_split$splits[[1]])
dat_train <- dat_train %>%
  select(sequence_id, deployment_id, timestamp, location_name) %>%
  group_by(sequence_id) %>%
  slice_head(n=1) %>%
  ungroup() %>%
  filter(year(timestamp) < 2019) %>%
  group_by(year(timestamp), location_name) %>%
  slice_sample(n=10) %>%
  ungroup() %>%
  slice_sample(n=700)
dat_test <- dat_test %>%
  select(sequence_id, deployment_id, timestamp, location_name) %>%
  group_by(sequence_id) %>%
  slice_head(n=1) %>%
  ungroup() %>%
  filter(year(timestamp) >= 2019) %>%
  group_by(year(timestamp), location_name) %>%
  slice_sample(n=10) %>%
  ungroup() %>%
  slice_sample(n=175)
Depending on which of the two lines below you run, the subsequent code performs the task for either the training or the test group.
seq_unique <- unique(dat_train$sequence_id); mainFolder = "data/images/downloads/train" #(1)
seq_unique <- unique(dat_test$sequence_id); mainFolder = "data/images/downloads/test" #(2)
downloading image sequences
Now we are ready to download the Red deer sequences. We create a folder at the path defined right above (mainFolder = …). Using a for loop, the images are downloaded into the corresponding sequence folders.
if(!dir.exists(file.path(main_dir, mainFolder))){dir.create(file.path(main_dir, mainFolder), recursive=TRUE)}
tstart <- Sys.time()
for (seq_id in seq_unique){
  cat("\n", match(seq_id, seq_unique), "of", length(seq_unique))
  obs_deer_focal <- obs_deer %>%
    filter(sequence_id == seq_id)
  if(!dir.exists(file.path(main_dir, mainFolder, seq_id))){dir.create(file.path(main_dir, mainFolder, seq_id))}
  for(seq_img in obs_deer_focal$file_path)
  {
    y <- strsplit(seq_img, split = "/")[[1]]
    download.file(url = seq_img, destfile = file.path(main_dir, mainFolder, seq_id, paste0(y[5],".jpg")), method ="curl", quiet = TRUE)
  }
}
tend <- Sys.time()
tend - tstart
dt = (tend - tstart)
# rough projection of a completion time by scaling the measured duration by a factor 600 (6000/10)
Sys.time() + dt/10*6000
manual image selection
After downloading the images, go to the folders data/images/downloads/train and data/images/downloads/test.
For each of these folders, go through the sequence_id subfolders and select at most one image per sequence_id folder.
Delete:
* images that, despite the selection for count = “1”, still contain multiple animals
* images that are blurry
* images in which only a small part of the animal is visible
* images in which the animal is partly out of the screen
Sometimes a sequence_id folder becomes empty (no worries!).
Example: A sequence_id folder contains 20 images. For our balanced behaviour dataset we need behaviour from the category ‘other’. We remove 19 images and keep 1 image in the sequence folder, showing vigilance behaviour.
Example: All 30 images are extremely dark and blurred by mist. We decide to delete all images; an empty folder remains.
When your aim is to build an automatic classification model, try to select images based on the behavior of interest, ideally with an equal number of training images per category.
Most images fall in the category ‘moving’, and ‘foraging’ is also abundant.
The category ‘other’ is rarer, so always choose images from the ‘other’ category,
and sometimes prefer ‘foraging’ over ‘moving’, to end up with a balanced main-behavior dataset.
Below is an example of a more or less evenly distributed main behavior.
Automatic classification of the sub behavior will only work if there is enough training data per category. As the figure below shows, the sub categories do not contain enough images (say, 100 per category), and the category counts vary a lot between the sub behaviors. Therefore, classifying sub behavior would require more data and perhaps grouping of categories.
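As a rough sketch (assuming the imgPose tibble created in the annotation step further below has been filled in), the number of annotated images per category can be tabulated like this to check how balanced the dataset is.
# Count annotated images per main behaviour and per sub behaviour
imgPose %>% count(behaviour, sort = TRUE)
imgPose %>% count(behaviour, behaviour_sub, sort = TRUE)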
Tests on the remaining images with the deep learning object detection model YOLO showed that, although it seemed as if there was only one animal in the image, some images still contained multiple animals.
Therefore, all images with these sequence_ids are also deleted:
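A minimal sketch of how such flagged sequences could also be removed from obs_deer; the flagged_sequences values below are placeholders, not the actual sequence ids.
# Placeholder vector: fill in the sequence_id values flagged by the YOLO check
flagged_sequences <- c("sequence-id-1", "sequence-id-2")
# Drop these sequences so they are excluded from all later steps
obs_deer <- obs_deer %>%
  filter(!sequence_id %in% flagged_sequences)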
We have now finalized our image selection, both through data filtering and manual image selection. The remaining images in the folders downloads/train and downloads/test are collected and preprocessed.
Example result (part of the test downloads folder):
collect and preprocess
Which folder (train or test) we collect and process first depends on whether we run the first or the second line of the code below.
seq_unique <- unique(dat_train$sequence_id); mainFolder = "data/images/downloads/train" #(1)
seq_unique <- unique(dat_test$sequence_id); mainFolder = "data/images/downloads/test" #(2)
We transfer the images from the downloads folder to the processed folder.
We also crop the images to remove the textual information at the top and bottom of each image:
save_image_folder <- gsub(mainFolder, pattern="downloads", replacement = "processed")
if(!dir.exists(file.path(main_dir, save_image_folder))){dir.create(file.path(main_dir, save_image_folder), recursive=TRUE)}
seqFolders <- list.files(file.path(main_dir, mainFolder), recursive=FALSE)
for(iseq in seqFolders)
{
  iseqImages <- list.files(file.path(main_dir, mainFolder, iseq), recursive=FALSE)
  for(iseqimg in iseqImages)
  {
    aimg <- load.image(file.path(main_dir, mainFolder, iseq, iseqimg))
    cropTop = 40
    cropBottom = 70
    aimgsub <- imsub(aimg,
                     x %inr% c(0, dim(aimg)[1]),
                     y %inr% c(cropTop, dim(aimg)[2]-cropBottom))
    save.image(aimgsub, file=file.path(main_dir, save_image_folder, iseqimg), quality=0.7)
  }
}
The result is two folders: ‘processed/test’ and ‘processed/train’.
Plot and annotate images
Which images (train or test) we label first depends on whether we run the first two lines (train) or the third and fourth lines (test) of the code below.
seq_unique <- unique(dat_train$sequence_id); mainFolder = "data/images/downloads/train"
save_image_folder <- gsub(mainFolder, pattern="downloads", replacement = "processed")
seq_unique <- unique(dat_test$sequence_id); mainFolder = "data/images/downloads/test"
save_image_folder <- gsub(mainFolder, pattern="downloads", replacement = "processed")
We loop over the images in the folder selected above and give each image one label for ‘Main behavior’ and one label for ‘Sub behavior’. We do this using short R codes (see the script below, labelOptions). The three main behaviors are moving, foraging and other; the sub behaviors are also specified in the script. An annotation example is shown below, and images are added later to illustrate what it looks like.
labelOptions <- c(moving = "m", foraging = "f", other = "o")
labelOptionsSub <- c(running = "ru", walking = "w", scanning = "sc",
                     browsing = "b", grazing = "gra", roaring = "ro",
                     sitting = "si", grooming = "gro", standing = "st",
                     vigilance = "v", camera_watching = "c")
imglist <- list.files(save_image_folder)
imgPose <- tibble(i = seq_along(imglist),
                  image = imglist,
                  behaviour = factor(NA_character_, levels = as.character(labelOptions)),
                  behaviour_sub = factor(NA_character_, levels = as.character(labelOptionsSub)))
imgPose
levels(imgPose$behaviour)
while (!is.null(dev.list())) dev.off()
for(i in imgPose$i[is.na(imgPose$behaviour)])
{
  aimg <- load.image(file.path(main_dir, save_image_folder, imglist[i]))
  thmb <- resize(aimg, -100, -100)
  plot(thmb, main=i, axes=FALSE, interpolate=FALSE)
  correctAnnotation <- FALSE
  while(correctAnnotation == FALSE)
  {
    annotation <- readline(prompt = paste0("image ",i," -- enter label: \n", paste(paste0(names(labelOptions), " = ", labelOptions), collapse = ", "),"\n"))
    if(annotation %in% labelOptions)
    {
      annotation <- factor(annotation, levels = as.character(labelOptions))
      correctAnnotation <- TRUE
    }else
    {
      cat("INCORRECT label, enter correct label!")
    }
  }
  imgPose$behaviour[i] <- annotation
  correctAnnotation <- FALSE
  while(correctAnnotation == FALSE)
  {
    annotation <- readline(prompt = paste0("image ",i," -- enter sub_label: \n", paste(labelOptionsSub, collapse = ", "),"\n"))
    if(annotation %in% labelOptionsSub)
    {
      annotation <- factor(annotation, levels = as.character(labelOptionsSub))
      correctAnnotation <- TRUE
    }else
    {
      cat("INCORRECT label, enter correct label!")
    }
  }
  imgPose$behaviour_sub[i] <- annotation
  while (!is.null(dev.list())) dev.off()
}
saveRDS(imgPose, file=file.path(main_dir, save_image_folder, "imgPose.rds"))
The result is one rds file (imgPose.rds) in data/images/processed/train and one rds file in data/images/processed/test (partly shown below).
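A minimal sketch of how one of these files can be read back and inspected (the imgPose_train name is only illustrative).
# Read back the annotations for the training folder and show the first rows
imgPose_train <- readRDS(file.path(main_dir, "data/images/processed/train/imgPose.rds"))
head(imgPose_train)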
Detailed examples
Decision boundary behavior
Hard-to-distinguish combinations:
merging train and test dataset
The code below allows us to correct annotation errors, and the train and test .rds files are then merged into one tibble containing the following information:
labelsTrain <- readRDS("C:/Users/jorri/OneDrive - Wageningen University & Research/02Thesis/Project_Thesis_JorritvanGils/data/images/processed/train/imgPose.rds")
labelsTest <- readRDS("C:/Users/jorri/OneDrive - Wageningen University & Research/02Thesis/Project_Thesis_JorritvanGils/data/images/processed/test/imgPose.rds")
# Manual corrections of annotation errors found afterwards
labelsTest$behaviour[80] <- "f"
labelsTrain$behaviour[380] <- "f"
# Flag whether a record belongs to the validation (test) set; the column is
# renamed to in_validation_set below, which is why the training set gets FALSE
labelsTrain$train = FALSE
labelsTest$train = TRUE
labels <- bind_rows(labelsTrain,
                    labelsTest)
labels <- labels %>%
  mutate(image = gsub(image, pattern=" - Copy", replacement="")) %>%
  rename(file_name = image) %>%
  mutate(multimedia_id = file_name) %>%
  mutate(multimedia_id = gsub(multimedia_id, pattern=".jpg", replacement="")) %>%
  rename(in_validation_set = train)
labels <- left_join(labels, select(obs_deer, multimedia_id, file_path, deployment_id, sequence_id, location_name, timestamp),
                    by="multimedia_id")
labels <- labels %>%
  rename(path = file_path)
We create a new folder that contains the final output: data/processed/labels_behaviour. Be aware that this new folder is different from the previous folder data/images/processed(!)
labelFolder = "data/processed/labels_behaviour"
if(!dir.exists(file.path(main_dir, labelFolder))){dir.create(file.path(main_dir, labelFolder), recursive=TRUE)}
saveRDS(labels, file=file.path(main_dir, labelFolder, "labels_Reddeer_JG.rds"))
write.csv(labels, file=file.path(main_dir, labelFolder, "labels_Reddeer_JG.csv"))
labels <- readRDS("C:/Users/jorri/OneDrive - Wageningen University & Research/02Thesis/Project_Thesis_JorritvanGils/data/processed/labels_behaviour/labels_Reddeer_JG.rds")
labels
Here we find the final result: a tibble of images of clearly visible animals, annotated with main behavior and sub behavior. In this case Red deer was selected, blurry images and images with the animal at the edge of the frame were filtered out, and every image contains exactly one animal. This tibble forms an excellent starting point for behavior analysis in relation to time, habitat or other variables.
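As an illustration (not part of the original protocol), a first exploration of the labelled data could look like this, using the lubridate hour() helper on the timestamp column.
# How often does each main behaviour occur per hour of the day?
labels %>%
  mutate(hour = hour(timestamp)) %>%
  count(behaviour, hour)
# How are the main behaviours distributed over camera locations?
labels %>%
  count(location_name, behaviour)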
I hope this helped you. If you have any questions, feel free to contact me via my university e-mail jorritvangils@wur.nl or vangilsjorrit@gmail.com.