---
title: "Basic Camera Trap descriptors - Mara Workshop"
author: "Antonio Uzal"
date: "January 2026"
format:
html:
self-contained: true
toc: true
theme: minty
code-fold: true
code-summary: "Show Code"
code-tools: true
knitr:
opts_chunk:
warning: false
message: false
editor: visual
---
```{=html}
<style>
/* Style for all code outputs */
pre {
color: blue; /* Set font color to blue */
background-color: #f0f8ff; /* Light background color (AliceBlue) */
padding: 10px; /* Add padding for better readability */
border-radius: 5px; /* Rounded corners */
}
</style>
```
------------------------------------------------------------------------
# Introduction
This tutorial covers basic manipulation of data and calculations directed at obtaining basic Camera Trapping descriptors.
The case study is based on the current data obtained from an area that includes Enarau and adjacent landscape
# The dataset
Obtained via [Wildlife Insights](https://www.wildlifeinsights.org/). The download output includes 5 files:
- projects.csv \> description of the project
- cameras.csv \> description of cameras (make, model, serial number and ID)
- deployments.csv \> description of deployments, including location, start and end date, camera ID used and landscape feature where the deployment took place (e.g. trail game)
- images_2010431.csv \> raw camera trap data for single images
- sequences.csv \> raw camera trap data for sequences of images

Import datasets
```{r Importing datasets}
deployment <- read.csv("deployments.csv",
header = TRUE,
sep = ",",
stringsAsFactors = FALSE)
camtraps <- read.csv("sequences.csv",
header = TRUE,
sep = ",",
stringsAsFactors = FALSE)
```
# Obtaining descriptors
Firstly, install required packages, if they are not already in your computer, and load them:
```{r install packages, results='hide',echo=TRUE, message=FALSE, warning=FALSE}
#Load Packages
list.of.packages <- c(
"leaflet", # creates interactive maps
"plotly", # creates interactive plots
"kableExtra", # Creates interactive tables
"tidyr", # A package for data manipulation
"dplyr", # A package for data manipulation
"viridis", # Generates colors for plots
"corrplot", # Plots pairwise correlations
"lubridate", # Easy manipulation of date objects
"taxize", # Package to check taxonomy
"sf", # Package for spatial data analysis
"reshape", # allows restructuring of data
"vegan", # tools for descriptive parameters in ecology
"plotrix", # visualises circular data (activity)
"chron", # handles chronological objects
"knitr") # dynamic reporting (nice tables)
# Check you have them in your library
new.packages <- list.of.packages[!(list.of.packages %in% installed.packages()[,"Package"])]
# install any missing packages and load them
if(length(new.packages)) install.packages(new.packages,repos = "http://cran.us.r-project.org")
lapply(list.of.packages, require, character.only = TRUE)
```
## Checking data
Quick set of functions to make sure everything looks like it should.
```{r checking datasets}
names(deployment) # return the name of variables contained in dataset
names(camtraps)
```
We can also look at the structure of each of the variables using the `str` function. Output not shown here.
```{r structure of dataset, results='hide',echo=TRUE, message=FALSE, warning=FALSE}
str(deployment)
str(camtraps)
```
Let's obtain some numerical outputs:
### Number of photos
```{r number of photos}
number <- nrow(camtraps) # number of rows in the dataset
print(paste("Number of photographs:", number))
```
Notice that the number differs from the Wildlife Insights summary, this might be because photographs with humans were filtered out.
### Species
How many different species have been captured and which ones? in the dataset the full species name is divided in *genus* and *species* so we need to combine this first
```{r number of species}
# Combine genus and species into a full species name
camtraps$full_species <- paste(camtraps$genus, camtraps$species, sep = " ")
# Extract unique species
unique_species <- unique(camtraps$full_species)
# Count the number of unique species
print(paste("Number of unique species:",length(unique_species)))
# View the unique species
print(unique_species)
```
The number of species (n=40) differs from the WI summary (n=57). This might be because there are some species that have not been catalogued yet. WI summary includes them, whilst we don't do it.
There were some photographs that containeded no species (i.e. only " " is returned). How often does this happen?
```{r images without identified animals}
# Count how many times the genus has not been identified, and therefor the photograph does not contain relevant species
empty_genus <- sum(camtraps$genus == "")
# View the count
print(empty_genus)
# Proportion of dataset
proportion <- (empty_genus/nrow(camtraps))*100
print(paste("Percentage of photographs without animal captures:",proportion))
```
This has happened in \>5% of the photographs. Remember that photographs containing humans were already filtered out by WI. Looking at the dataset, this is mainly related to the camera being triggered by passing vehicles.
### Deployments
We would like to know if all deployments were successful (i.e. the cameras took photos, regardless if animals were detected)
```{r successful deployments}
#count number of deployments
unique_deployments <- unique(deployment$deployment_id)
num_all_deployments <- length(unique_deployments)
#count number of successful deployments
successful_deployments<- unique(camtraps$deployment_id)
num_successful_deployments <-length(successful_deployments)
#how many were not successful?
total<-num_all_deployments - num_successful_deployments
print(paste("Number of deployments with no captures:", total))
```
Four camera stations had no photos, which could be due to malfunction, damage or lost, or simply nothing triggered the sensors. You might want to double check this by downloading again the dataset without filtering humans as the first photograph should contain the researcher holding a white board with deployment info.
### Start and End of sampling
First we need to make sure that the `start_date` and `end_date` columns are properly formatted as date-time **objects**.
`lubridate` is an awesome package that allows fixing all kind of issues with dates and times. `ynd_hms()` from `lubridate` automatically recognises and converts dates with timezones:
```{r standardise dates and times}
library(lubridate)
# Ensure character if needed (e.g., factors)
deployment$start_date <- as.character(deployment$start_date)
deployment$end_date <- as.character(deployment$end_date)
# Parse as local Nairobi time
deployment$start_date <- ymd_hms(deployment$start_date, tz = "Africa/Nairobi")
deployment$end_date <- ymd_hms(deployment$end_date, tz = "Africa/Nairobi")
# Optional: keep a UTC copy for joins, modelling, or cross-site comparisons
deployment$start_date_utc <- with_tz(deployment$start_date, "UTC")
deployment$end_date_utc <- with_tz(deployment$end_date, "UTC")
# Sanity checks
stopifnot(!any(is.na(deployment$start_date)))
stopifnot(!any(is.na(deployment$end_date)))
# Check for NAs after conversion to ensure it has been successful
print(sum(is.na(deployment$start_date)))
# Should return 0 if conversion is successful
```
Now we can work out the start and end of the sampling period across all deployments
```{r working out start and end of sampling period}
# Find the earliest start date
earliest_start_date <- min(deployment$start_date, na.rm = TRUE)
# Find the latest end date
latest_end_date <- max(deployment$end_date, na.rm = TRUE)
# View the results
print(paste("Earliest start date:", earliest_start_date))
print(paste("Latest end date:", latest_end_date))
```
### Deployment interval
It is useful to know the survey effort per station/deployment and identify sources of imbalance or plainly, when things have gone wrong. For this,, we can calculate the deployment interval: the number of days between start and end date.
```{r calculating deployment interval}
# Calculate the interval for each deployment
deployment$interval <- interval(deployment$start_date, deployment$end_date)
# If you want to calculate the duration in days, hours, etc.
deployment$duration_days <- as.numeric(deployment$interval) / (60 * 60 * 24) # Convert to days
# Check the results
print(head(deployment[, c("start_date", "end_date", "duration_days")],5))
# Or you can summarise the range of values obtained
summary(deployment$duration_days)
```
The average interval was around 19 days, but the variation is quite large... are we sure there was a camera deployed for 54 days?.
### Sampling Effort
This is an important parameter, that will allow us to obtain others, such as the naive occupancy. Sampling effort is considered here as the number of camera days, so that is a simple line of coding:
```{r sampling effort}
# Sum the duration_days to calculate total camera days (sampling effort)
total_camera_days <- sum(deployment$duration_days, na.rm = TRUE)
# Round the result to NO decimal places
total_camera_days_rounded <- round(total_camera_days, 0)
# View the rounded result
print(paste("Total camera days (sampling effort):", total_camera_days_rounded))
```
### Camera Locations
A common mistake in camera trap data sets is that the locations are not where they are supposed to be. The safest way to check your data is to plot it.
The package `leaflet` is awesome for rapid and interactive maps and plots. The next R code produces the simplest version of a leaflet map. Notice you can zoom in and out!
```{r interactive map}
m <- leaflet() %>% # call leaflet
addTiles() %>% # add the default basemap
addCircleMarkers( # Add circles for stations
lng=deployment$longitude, lat=deployment$latitude)
m # return the map
```
::: callout-tip
You can change the base map and use some satellite imagery
```{r interactive map with satellite imagery}
m <- leaflet() %>% # call leaflet
addProviderTiles(providers$Esri.WorldImagery) %>% #Add Esri Wrold imagery
addCircleMarkers( # Add circles for stations
lng=deployment$longitude, lat=deployment$latitude)
m # return the map
```
:::
## Relative Index of Abundance
The RIA is obtained by calculating the number of independent captures per species considering the sampling effort previously obtained (`total_camera_days_rounded)`. We will exclude all blank records (i.e. those that do not have any *genus* identified). Remember that you need to think carefully what you consider as independent captures. For instance, you might want to consider independent captures only when there is a 30 minutes interval in between two captures of the same species, to avoid 'double counting' individuals that might stay around the camera, for example grazing. Examples below consider this (i.e. 'threshold').
A first step is dealing with the timestamp of the sequence. We will consider the timestamp of the sequence as the date and time of the first photograph of the sequence.
```{r sequence timestamp}
# Because we are using sequences, we will use as timestamp for the event the time of the first photograph of the series
camtraps$timestamp <- camtraps$start_time
# Convert timestamp column in camtraps to POSIXct if needed
camtraps$timestamp <- as.POSIXct(camtraps$timestamp, format = "%Y-%m-%d %H:%M:%S")
```
Firstly, we can obtain a RAI for each species at each deployment/station:
```{r obtaining RAI for each species at each deployment - parameters}
# Define a custom function event.sp to calculate trapping events and Relative Index of Abundance (RIA)
event.sp <- function(camtraps,
deployment,
total_camera_days_rounded,
time_threshold = NULL) {
# event.sp: This is the name of the function being defined.
# It is a convention in R to use the function_name
# <- function(...) syntax to create a new function.
#camtraps: This parameter expects a dataset
#(typically a data frame) containing information about
#camera trap photographs.
#deployment: This parameter is intended for another dataset,
#which contains information about camera trap deployments.
#This dataset will often be used to summarise or correlate
#the data from the camtraps dataset.
#total_camera_days_rounded: This parameter is passed to the
#function and represents a numerical value indicating the
#total camera days, rounded to the nearest integer. This
#value is important for calculations (e.g.Relative
#Index of Abundance -RIA), as it reflects the effort over
#which species observations occurred.
#time_threshold = NULL: This parameter is optional and has a
#default value of NULL. If a value is provided when calling
#the function, it will be used as the time threshold for
#determining whether consecutive observations of the same
#species are considered separate events. For example, if you
#set a time threshold (e.g., 30 minutes), only observations
#separated by more than 30 minutes would be counted as
#separate trapping events.
# Check for missing values in genus
missing_genus <- sum(is.na(camtraps$genus) | camtraps$genus == "")
# Print count of missing genus values
if (missing_genus > 0) {
cat("Warning: Found", missing_genus, "missing or empty values in 'genus'. Rows with missing genus will be excluded.\n")
}
# Filter out rows with missing or empty genus
camtraps <- camtraps %>%
filter(!is.na(genus) & genus != "")
# Combine 'genus' and 'species' into a new 'full_species' column in the camtraps dataset
camtraps <- camtraps %>%
mutate(
full_species = paste(genus, species, sep = " "), # Combine genus and species
full_species = trimws(full_species) # Remove any leading/trailing whitespace
)
# Calculate trapping events either with or without a time threshold
# the code first checks if time_threshold is provided:
#Without a time threshold: All detections of a species within a deployment are counted as a single trapping event.
#With a time threshold: The time between detections is used to determine whether a new trapping event should be counted based on the time gap between detections.
if (is.null(time_threshold)) { # time_threshold not provided
trapping_events <- camtraps %>%
group_by(full_species, deployment_id) %>% # The code groups camtraps by full_species and deployment_id
summarise(trapping_event_count = n(), .groups = 'drop') # counts records for each species-deployment combination
} else { # there is a time threshold
# Sort by deployment, species, and timestamp
camtraps <- camtraps %>%
arrange(deployment_id, full_species, timestamp)
# Calculate time difference between consecutive events
trapping_events <- camtraps %>%
group_by(full_species, deployment_id) %>%
mutate(time_diff = c(NA, diff(as.numeric(timestamp)))) %>%
# diff function calculates the difference in seconds
# betweenc onsecutive timestamp values for each group,
# resulting in time_diff
# The c(NA, diff(...)) syntax is used to add NA at the
# beginning, as the first row has no previous timestamp
# to compare to.
mutate(new_event = ifelse(is.na(time_diff) | time_diff > time_threshold, 1, 0)) %>%
# A new trapping event (new_event = 1) is flagged if
# time_diff is NA (the first record) or if time_diff
# exceeds time_threshold.
# Otherwise, new_event is set to 0.
summarise(trapping_event_count = sum(new_event), .groups = 'drop')
} # For each group, the code sums new_event values, giving
# the count of unique trapping events based on the time
# threshold.
# Calculate RIA using the total camera days rounded
trapping_events <- trapping_events %>%
mutate(RIA = trapping_event_count / total_camera_days_rounded) # Calculate RIA
# Return the trapping events summary with RIA
return(trapping_events)
}
```
```{R calculating RAI}
# Example usage
# Assuming total_camera_days_rounded is defined elsewhere in your code
# Without time threshold
trapping_events_no_threshold <- event.sp(camtraps, deployment, total_camera_days_rounded)
# notice time_threshold is not included as the default is NULL
# With a 30-minute threshold (in seconds)
trapping_events_with_threshold <- event.sp(camtraps, deployment, total_camera_days_rounded, time_threshold = 60 * 60)
# View the results
print(trapping_events_no_threshold)
print(trapping_events_with_threshold)
```
But typically we are more interested in the overall results, the RAI for each species, across the study area:
```{r RAI across all study area}
# Assuming trapping_events_no_threshold is obtained from the event.sp function and contains the trapping event data
# if you have set up a threshold, like above you should use
# trapping_events_with_threshold in this R chunk
# Summarize results across all deployments
species_summary <- trapping_events_no_threshold %>%
group_by(full_species) %>%
summarise(
total_trapping_event_count = sum(trapping_event_count, na.rm = TRUE), # Sum of trapping events
total_RAI = sum(RIA, na.rm = TRUE), # Sum of RAI
.groups = 'drop' # Drop grouping after summarization
) %>%
arrange(desc(total_trapping_event_count)) # Optional: sort by total trapping events
# Print a nicely formatted table
kable(species_summary,
format = "html", # C
col.names = c("Species", "Total Trapping Events", "Total RAI"),
caption = "Summary of Trapping Events and RAI by Species")
```
Looking at these results we could make the rather superficial (and most likely erroneous) assumption that red deer is the most abundant species in the study area. This obviously would not consider all sort of factors affecting detectability, which were discussed during the module sessions.
You will see below how the RAI changes dramatically if you use a threshold/interval of 60 minutes between detections to be considered independent:
```{r RAI across all deployments with interval 30 mins, echo=FALSE}
# Assuming trapping_events_with_threshold is used
# Summarize results across all deployments
species_summary <- trapping_events_with_threshold %>%
group_by(full_species) %>%
summarise(
total_trapping_event_count = sum(trapping_event_count, na.rm = TRUE), # Sum of trapping events
total_RAI = sum(RIA, na.rm = TRUE), # Sum of RAI
.groups = 'drop' # Drop grouping after summarization
) %>%
arrange(desc(total_trapping_event_count)) # Optional: sort by total trapping events
# Print a nicely formatted table
kable(species_summary,
format = "html", # C
col.names = c("Species", "Total Trapping Events", "Total RAI"),
caption = "Summary of Trapping Events and RAI by Species")
```
## Naive Occupancy
The calculation of naive occupancy (deployment/stations positive to the presence of each species on the total number of deployment/stations over the study area) is relatively simple and you could do it by hand or on Excel. You can also do it using R!.
```{r calculation of Naive Occupancy}
# Load required libraries
library(knitr) # For kable function
# Calculate Naive Occupancy
naive_occupancy <- camtraps %>%
# Combine 'genus' and 'species' into a new 'full_species' column
mutate(full_species = paste(genus, species, sep = " ")) %>%
# Filter out rows with missing genus
filter(!is.na(genus) & genus != "") %>%
# Count unique deployments for each species
group_by(full_species, deployment_id) %>%
summarise(detected = n() > 0, .groups = 'drop') %>% # Create a presence indicator
group_by(full_species) %>%
summarise(
number_of_positive_deployments = sum(detected), # Count of unique deployments where species was detected
.groups = 'drop' # Drop grouping after summarization
) %>%
# Join with the deployment dataset to get total number of deployments
left_join(deployment %>% summarise(total_deployments = n_distinct(deployment_id)), by = character()) %>%
# Finalize naive occupancy calculation
mutate(naive_occupancy = number_of_positive_deployments / total_deployments) %>%
select(species = full_species, number_of_positive_deployments, total_deployments, naive_occupancy) %>%
arrange(desc(naive_occupancy)) # Optional: sort by naive occupancy
# Print the naive occupancy results with formatted table
# Use kable to print the table in HTML format
kable(naive_occupancy, format = "html", caption = "Naive Occupancy Summary")
```
Results seem to indicate that the most widespread species are foxes and red deer (over 70% of cameras confirm their presence), followed by wolves and brown bears (over 50%)...but again, results are impacted by multiple factors that we have not controlled for.
You can also visualise naive occupancy on a very basic map using the package `camtrapR`:
```{r visualing naive occupancy}
library(camtrapR)
par(mar = c(5, 5, 5, 10))
BasicMap<-detectionMaps(CTtable =deployment,
recordTable =camtraps,
Xcol ="longitude",
Ycol ="latitude",
stationCol ="deployment_id",
speciesCol ="full_species",
printLabels =FALSE, #no need to show labels
richnessPlot=TRUE,
addLegend=FALSE)
```
## Species Accumulation Curve
Resources always limit our projects (e.g. workforce, camera traps, transport, etc). That is why it is important that we try to work out the sampling effort required to characterise the animal community of our study area to make sure our resources are only deployed when needed.
In camera trapping, typically, the sampling effort is measured by the number of camera trap days. We would like to work out how many camera trap days are required to detect the majority of species in our study area. Please note that this only applies for projects focused on producing an inventory.
```{r step 1 species accummulation curve}
# Step 1: Ensure date columns are Date types
deployment <- deployment %>%
mutate(start_date = as.Date(start_date),
end_date = as.Date(end_date))
```
The `mutate` function is used to convert the `start_date` and `end_date` columns in the `deployment` dataset to `Date` types. This ensures that date calculations and comparisons can be performed accurately.
```{r step 2 species accummulation curve}
# Step 2: Create a date sequence for each deployment
deployment_dates <- deployment %>%
select(deployment_id, start_date, end_date) %>%
# Create a data frame to hold all the dates
rowwise() %>%
do(data.frame(deployment_id = .$deployment_id,
date = seq(.$start_date, .$end_date, by = "day"))) %>%
ungroup() # Expand the rows for each date
```
- The code selects relevant columns (`deployment_id`, `start_date`, and `end_date`) from the `deployment` dataset.
- The `rowwise` function allows operations to be performed on each row individually.
- The `do` function creates a new data frame for each row, generating a sequence of dates from `start_date` to `end_date` using `seq()`. This captures all days for each deployment.
- Finally, `ungroup()` is called to revert the data back to a regular data frame structure after generating the sequences.
```{r step 3 species accummulation curve}
# Step 3: Combine genus and species into full_species
camtraps <- camtraps %>%
mutate(full_species = paste(genus, species, sep = " "),
timestamp = as.Date(timestamp)) %>% # Convert timestamp to Date
select(deployment_id, full_species, timestamp)
```
- The `camtraps` dataset is modified to create a new column `full_species`, which combines the `genus` and `species` columns into a single string, facilitating easier identification of species.
- The `timestamp` column is also converted to `Date` format.
- The `select` function is used to keep only relevant columns for further analysis: `deployment_id`, `full_species`, and `timestamp`.
```{r step 4 species accummulation curve}
# Step 4: Count unique species detected per day for each deployment
species_per_day <- deployment_dates %>%
left_join(camtraps, by = c("deployment_id", "date" = "timestamp")) %>%
group_by(deployment_id, date) %>%
summarise(species_detected = list(unique(full_species)), .groups = 'drop') %>%
ungroup()
```
- The `deployment_dates` dataset is left-joined with the `camtraps` dataset using `deployment_id` and matching `date` with `timestamp`.
- The result is grouped by `deployment_id` and `date`, allowing for summarization of the species detected each day.
- The `summarise` function collects the unique species detected for each day into a list called `species_detected`.
- The `.groups = 'drop'` argument prevents additional grouping metadata from being carried over to the next steps.
- Finally, `ungroup()` is called to revert the dataset to a regular data frame.
```{r step 5 species accummulation curve}
# Step 5: Calculate cumulative number of new species detected
# Initialize variables
cumulative_species <- numeric(nrow(species_per_day))
# Loop through each date and calculate cumulative new species
detected_species <- character(0) # Start with no species detected
for (i in seq_len(nrow(species_per_day))) {
# Get newly detected species for the day
new_species <- species_per_day$species_detected[[i]]
# Update the cumulative list
detected_species <- union(detected_species, new_species)
# Store the cumulative count
cumulative_species[i] <- length(detected_species)
}
# Combine with dates
species_per_day <- species_per_day %>%
mutate(cumulative_species = cumulative_species)
```
- An empty numeric vector `cumulative_species` is initialized to store the cumulative counts of species.
- A character vector `detected_species` is initialized to keep track of species detected up to the current date.
- A loop iterates over each row of `species_per_day`. For each date:
- The unique species detected on that day is retrieved.
- The `union` function updates the `detected_species` list to include any new species detected.
- The cumulative count of detected species is updated and stored in `cumulative_species`.
- After the loop, the `cumulative_species` vector is added to the `species_per_day` dataset.
```{r step 6 species accummulation curve}
# Step 6: Count camera days
camera_days <- species_per_day %>%
mutate(camera_day = row_number())
```
The `camera_days` dataset is created by adding a new column `camera_day` to `species_per_day`. This column is generated using `row_number()`, which gives a sequential count of camera trap days.
```{r step 7 species accummulation curve}
# Step 7: Plot the results
ggplot(camera_days, aes(x = camera_day, y = cumulative_species)) +
geom_line(color = "blue", linewidth = 1) + # Use `linewidth` instead of `size`
geom_point(color = "red") +
labs(
x = "Number of Camera Days",
y = "Cumulative Number of New Species Detected",
title = "Species Accumulation Curve"
) +
theme_minimal()
```
- The `ggplot` function creates a plot using `camera_days` as the dataset.
- The x-axis represents the total number of camera days, while the y-axis shows the cumulative number of new species detected.
- `geom_line` draws a blue line representing the cumulative species count over time, and `geom_point` adds red points to indicate specific data points.
- The `labs` function labels the axes and sets the title of the plot.
- `theme_minimal()` applies a clean and simple theme to the plot for better aesthetics.
The result is a Species Accumulation Curve showing that after 2200 camera traps there is no much point increasing the sampling effort.