deployment <- read.csv("deployments.csv",
header = TRUE,
sep = ",",
stringsAsFactors = FALSE)
camtraps <- read.csv("images_LGDs.csv",
header = TRUE,
sep = ",",
stringsAsFactors = FALSE)Basic Camera Trap descriptors
Introduction
This tutorial covers basic manipulation of data and calculations directed at obtaining basic Camera Trapping descriptors.
The case study is based on Dr. Bethany Smith’s data collection in Romania as part of her PhD at Nottingham Trent University focus on understanding the ecological consequences of Livestock Guarding Dogs (LGDs) on wildlife.
The dataset
Obtained via Wildlife Insights. The download output includes 4 files:
projects.csv > description of the project
cameras.csv > description of cameras (make, model, serial number and ID)
deployments.csv > description of deployments, including location, start and end date, camera ID used and landscape feature where the deployment took place (e.g. trail game)
images_LGD.csv > raw camera trap data

Import datasets
Obtaining descriptors
Firstly, install required packages, if they are not already in your computer, and load them:
#Load Packages
list.of.packages <- c(
"leaflet", # creates interactive maps
"plotly", # creates interactive plots
"kableExtra", # Creates interactive tables
"tidyr", # A package for data manipulation
"dplyr", # A package for data manipulation
"viridis", # Generates colors for plots
"corrplot", # Plots pairwise correlations
"lubridate", # Easy manipulation of date objects
"taxize", # Package to check taxonomy
"sf", # Package for spatial data analysis
"reshape", # allos restructuring of data
"vegan", # tools for descriptive parameters in ecology
"plotrix", # visualises circular data (activity)
"chron", # handles chronological objects
"knitr") # dynamic reporting (nice tables)
# Check you have them in your library
new.packages <- list.of.packages[!(list.of.packages %in% installed.packages()[,"Package"])]
# install any missing packages and load them
if(length(new.packages)) install.packages(new.packages,repos = "http://cran.us.r-project.org")
lapply(list.of.packages, require, character.only = TRUE)Checking data
Quick set of functions to make sure everything looks like it should.
names(deployment) # return the name of variables contained in dataset [1] "project_id" "deployment_id"
[3] "placename" "longitude"
[5] "latitude" "start_date"
[7] "end_date" "bait_type"
[9] "bait_description" "feature_type"
[11] "feature_type_methodology" "camera_id"
[13] "camera_name" "quiet_period"
[15] "camera_functioning" "sensor_height"
[17] "height_other" "sensor_orientation"
[19] "orientation_other" "plot_treatment"
[21] "plot_treatment_description" "detection_distance"
[23] "subproject_name" "subproject_design"
[25] "event_name" "event_description"
[27] "event_type" "recorded_by"
[29] "remarks"
names(camtraps) [1] "project_id" "deployment_id"
[3] "image_id" "filename"
[5] "location" "is_blank"
[7] "identified_by" "wi_taxon_id"
[9] "class" "order"
[11] "family" "genus"
[13] "species" "common_name"
[15] "uncertainty" "timestamp"
[17] "number_of_objects" "age"
[19] "sex" "animal_recognizable"
[21] "individual_id" "individual_animal_notes"
[23] "behavior" "highlighted"
[25] "markings" "cv_confidence"
[27] "license" "bounding_boxes"
We can also look at the structure of each of the variables using the str function. Output not shown here.
str(deployment)
str(camtraps)Let’s obtain some numerical outputs:
Number of photos
number <- nrow(camtraps) # number of rows in the dataset
print(paste("Number of photographs:", number))[1] "Number of photographs: 71726"
Notice that the number differs from the Wildlife Insights summary, this might be because photographs with humans were filtered out.
Species
How many different species have been captured and which ones? in the dataset the full species name is divided in genus and species so we need to combine this first
# Combine genus and species into a full species name
camtraps$full_species <- paste(camtraps$genus, camtraps$species, sep = " ")
# Extract unique species
unique_species <- unique(camtraps$full_species)
# Count the number of unique species
print(paste("Number of unique species:",length(unique_species)))[1] "Number of unique species: 31"
# View the unique species
print(unique_species) [1] "Cervus elaphus" " " "Ursus arctos"
[4] "Sus scrofa" "Vulpes vulpes" "Capreolus capreolus"
[7] "Lynx lynx" "Rupicapra rupicapra" "Bos taurus"
[10] "Equus caballus" "Sciurus vulgaris" "Canis familiaris"
[13] "Ovis aries" "Equus asinus" "Meles meles"
[16] "Canis lupus" "Felis silvestris" "Garrulus glandarius"
[19] "Sitta europaea" "Martes " "Tetrao urogallus"
[22] "Lepus europaeus" "Felis " "Felis catus"
[25] "Canis aureus" "Canis " "Columba palumbus"
[28] "Erinaceus europaeus" "Aquila chrysaetos" "Bonasa bonasia"
[31] "Dryocopus martius"
The number of species (n=31) differs from the WI summary (n=21). It seems there were occasions when the species could not be identified, only the genus (e.g. Martes, Felis, Canis) and there were some photographs that containeded no species (i.e. only ” ” is returned). How often does this happen?
# Count how many times the genus has not been identified, and therefor the photograph does not contain relevant species
empty_genus <- sum(camtraps$genus == "")
# View the count
print(empty_genus)[1] 25692
# Proportion of dataset
proportion <- (empty_genus/nrow(camtraps))*100
print(paste("Percentage of photographs without animal captures:",proportion))[1] "Percentage of photographs without animal captures: 35.8196469899339"
This has happened in >35% of the photographs. Remember that photographs containing humans were already filtered out by WI.
Deployments
We would like to know if all deployments were successful (i.e. the cameras took photos, regardless if animals were detected)
#count number of deployments
unique_deployments <- unique(deployment$deployment_id)
num_all_deployments <- length(unique_deployments)
#count number of successful deployments
successful_deployments<- unique(camtraps$deployment_id)
num_successful_deployments <-length(successful_deployments)
#how many were not successful?
total<-num_all_deployments - num_successful_deployments
print(paste("Number of deployments with no captures:", total))[1] "Number of deployments with no captures: 2"
Two camera stations had no photos, which could be due to malfunction, damage or lost, or simply nothing triggered the sensors. You might want to double check this by downloading again the dataset without filtering humans as the first photograph should contain the researcher holding a white board with deployment info.
Start and End of sampling
First we need to make sure that the start_date and end_date columns are properly formatted as date-time objects.
This also includes standardising the timezones. For example, some of the dates in the dataset are BST (British Summer Time), while others are GMT (Greenwich Mean Time). This is messy because R can only handle one timezone at a time.
lubridate is an awesome package that allows fixing all kind of issues with dates and times. ynd_hms() from lubridate automatically recognises and converts dates with timezones:
# Ensure date variables are character before conversion
deployment$start_date <- as.character(deployment$start_date)
deployment$end_date <- as.character(deployment$end_date)
# Standardize timezones, in this case the dataset had some BST and some GMT dates
deployment$end_date <- gsub("BST", "GMT", deployment$end_date) # Change BST to GMT
deployment$start_date <- gsub("BST", "GMT", deployment$start_date) # Change BST to GMT
# Now convert start_date and end_date to POSIXct format, assuming they are in GMT
deployment$start_date <- ymd_hms(deployment$start_date, tz = "GMT")
deployment$end_date <- ymd_hms(deployment$end_date, tz = "GMT")
# Now adjust the timezone to Europe/Bucharest
deployment$start_date <- with_tz(deployment$start_date, tzone = "Europe/Bucharest")
deployment$end_date <- with_tz(deployment$end_date, tzone = "Europe/Bucharest")
# Check for NAs after conversion to ensure it has been successful
print(sum(is.na(deployment$start_date))) # Should return 0 if conversion is successful[1] 0
print(sum(is.na(deployment$end_date))) # Should return 0 if conversion is successful[1] 0
Now we can work out the start and end of the sampling period across all deployments
# Find the earliest start date
earliest_start_date <- min(deployment$start_date, na.rm = TRUE)
# Find the latest end date
latest_end_date <- max(deployment$end_date, na.rm = TRUE)
# View the results
print(paste("Earliest start date:", earliest_start_date))[1] "Earliest start date: 2021-07-15 12:54:47"
print(paste("Latest end date:", latest_end_date))[1] "Latest end date: 2022-06-08 16:56:56"
Deployment interval
It is useful to know the survey effort per station/deployment and identify sources of imbalance or plainly, when things have gone wrong. For this we can calculate the deployment interval: the number of days between start and end date.
# Calculate the interval for each deployment
deployment$interval <- interval(deployment$start_date, deployment$end_date)
# If you want to calculate the duration in days, hours, etc.
deployment$duration_days <- as.numeric(deployment$interval) / (60 * 60 * 24) # Convert to days
# Check the results
print(head(deployment[, c("start_date", "end_date", "duration_days")],5)) start_date end_date duration_days
1 2021-09-21 12:15:00 2021-10-28 17:19:00 37.21111
2 2021-08-10 22:00:00 2021-08-23 18:05:00 12.83681
3 2021-08-10 16:38:00 2021-08-23 11:51:00 12.80069
4 2021-12-07 15:20:32 2022-03-31 16:51:22 114.02141
5 2021-07-20 15:12:00 2021-08-10 17:05:00 21.07847
# Or you can summarise the range of values obtained
summary(deployment$duration_days) Min. 1st Qu. Median Mean 3rd Qu. Max.
7.076 24.160 47.377 57.968 71.997 207.949
Beth was not able to access some of the cameras due to the snow, some for up to 6 months, hence some of the intervals are above 100 days. The average interval was around 58 days, but the variation is quite large.
Sampling Effort
This is an important parameter, that will allow us to obtain others, such as the naive occupancy. Sampling effort is considered here as the number of camera days, so that is a simple line of coding:
# Sum the duration_days to calculate total camera days (sampling effort)
total_camera_days <- sum(deployment$duration_days, na.rm = TRUE)
# Round the result to NO decimal places
total_camera_days_rounded <- round(total_camera_days, 0)
# View the rounded result
print(paste("Total camera days (sampling effort):", total_camera_days_rounded))[1] "Total camera days (sampling effort): 7884"
Camera Locations
A common mistake in camera trap data sets is that the locations are not where they are supposed to be. The safest way to check your data is to plot it.
The package leaflet is awesome for rapid and interactive maps and plots. The next R code produces the simplest version of a leaflet map. Notice you can zoom in and out!
m <- leaflet() %>% # call leaflet
addTiles() %>% # add the default basemap
addCircleMarkers( # Add circles for stations
lng=deployment$longitude, lat=deployment$latitude)
m # return the mapYou can change the base map and use some satellite imagery
m <- leaflet() %>% # call leaflet
addProviderTiles(providers$Esri.WorldImagery) %>% #Add Esri Wrold imagery
addCircleMarkers( # Add circles for stations
lng=deployment$longitude, lat=deployment$latitude)
m # return the mapRelative Index of Abundance
The RIA is obtained by calculating the number of independent captures per species considering the sampling effort previously obtained (total_camera_days_rounded). We will exclude all blank records (i.e. those that do not have any genus identified). Remember that you need to think carefully what you consider as independent captures. For instance, you might want to consider independent captures only when there is a 30 minutes interval in between two captures of the same species, to avoid ‘double counting’ individuals that might stay around the camera, for example grazing. Examples below consider this (i.e. ‘threshold’).
Firstly, we can obtain a RAI for each species at each deployment/station:
# Define a custom function event.sp to calculate trapping events and Relative Index of Abundance (RIA)
event.sp <- function(camtraps, deployment, total_camera_days_rounded, time_threshold = NULL) {
# event.sp: This is the name of the function being defined. It is a convention in R to use the function_name <- function(...) syntax to create a new function.
#camtraps: This parameter expects a dataset (typically a data frame) containing information about camera trap photographs.
#deployment: This parameter is intended for another dataset, which contains information about camera trap deployments. This dataset will often be used to summarize or correlate the data from the camtraps dataset.
#total_camera_days_rounded: This parameter is passed to the function and represents a numerical value indicating the total camera days, rounded to the nearest integer. This value is important for calculations, such as the Relative Index of Abundance (RIA), as it reflects the effort over which species observations occurred.
#time_threshold = NULL: This parameter is optional and has a default value of NULL. If a value is provided when calling the function, it will be used as the time threshold for determining whether consecutive observations of the same species are considered separate events. For example, if you set a time threshold (e.g., 30 minutes), only observations separated by more than 30 minutes would be counted as separate trapping events.
# Check for missing values in genus
missing_genus <- sum(is.na(camtraps$genus) | camtraps$genus == "")
# Print count of missing genus values
if (missing_genus > 0) {
cat("Warning: Found", missing_genus, "missing or empty values in 'genus'. Rows with missing genus will be excluded.\n")
}
# Filter out rows with missing or empty genus
camtraps <- camtraps %>%
filter(!is.na(genus) & genus != "")
# Combine 'genus' and 'species' into a new 'full_species' column in the camtraps dataset
camtraps <- camtraps %>%
mutate(
full_species = paste(genus, species, sep = " "), # Combine genus and species
full_species = trimws(full_species) # Remove any leading/trailing whitespace
)
# Convert timestamp column in camtraps to POSIXct if needed
camtraps$timestamp <- as.POSIXct(camtraps$timestamp, format = "%Y-%m-%d %H:%M:%S")
# Calculate trapping events either with or without a time threshold
if (is.null(time_threshold)) {
trapping_events <- camtraps %>%
group_by(full_species, deployment_id) %>%
summarise(trapping_event_count = n(), .groups = 'drop')
} else {
# Sort by deployment, species, and timestamp
camtraps <- camtraps %>%
arrange(deployment_id, full_species, timestamp)
# Calculate time difference between consecutive events
trapping_events <- camtraps %>%
group_by(full_species, deployment_id) %>%
mutate(time_diff = c(NA, diff(as.numeric(timestamp)))) %>%
mutate(new_event = ifelse(is.na(time_diff) | time_diff > time_threshold, 1, 0)) %>%
summarise(trapping_event_count = sum(new_event), .groups = 'drop')
}
# Calculate RIA using the total camera days rounded
trapping_events <- trapping_events %>%
mutate(RIA = trapping_event_count / total_camera_days_rounded) # Calculate RIA
# Return the trapping events summary with RIA
return(trapping_events)
}
# Example usage
# Assuming total_camera_days_rounded is defined elsewhere in your code
# Without time threshold
trapping_events_no_threshold <- event.sp(camtraps, deployment, total_camera_days_rounded)Warning: Found 25692 missing or empty values in 'genus'. Rows with missing genus will be excluded.
# With a 30-minute threshold (in seconds)
trapping_events_with_threshold <- event.sp(camtraps, deployment, total_camera_days_rounded, time_threshold = 30 * 60)Warning: Found 25692 missing or empty values in 'genus'. Rows with missing genus will be excluded.
# View the results
print(trapping_events_no_threshold)# A tibble: 755 × 4
full_species deployment_id trapping_event_count RIA
<chr> <chr> <int> <dbl>
1 Aquila chrysaetos D6_AC11-2 1 0.000127
2 Bonasa bonasia D4_T14-1 3 0.000381
3 Bos taurus D1_AD10-1 326 0.0413
4 Bos taurus D2_AA11-1 14 0.00178
5 Bos taurus D3_AA11-1 93 0.0118
6 Canis D2_AA12-2 5 0.000634
7 Canis D3_AD10-2 2 0.000254
8 Canis D4_AC10-2 1 0.000127
9 Canis D4_AD10-2 7 0.000888
10 Canis D4_S14-1 1 0.000127
# ℹ 745 more rows
print(trapping_events_with_threshold)# A tibble: 755 × 4
full_species deployment_id trapping_event_count RIA
<chr> <chr> <dbl> <dbl>
1 Aquila chrysaetos D6_AC11-2 1 0.000127
2 Bonasa bonasia D4_T14-1 2 0.000254
3 Bos taurus D1_AD10-1 1 0.000127
4 Bos taurus D2_AA11-1 1 0.000127
5 Bos taurus D3_AA11-1 1 0.000127
6 Canis D2_AA12-2 3 0.000381
7 Canis D3_AD10-2 1 0.000127
8 Canis D4_AC10-2 1 0.000127
9 Canis D4_AD10-2 7 0.000888
10 Canis D4_S14-1 1 0.000127
# ℹ 745 more rows
But typically we are more interested in the overall results, the RAI for each species, across the study area:
# Assuming trapping_events_no_threshold is obtained from the event.sp function and contains the trapping event data
# if you have set up a threshold, like above you should use
# trapping_events_with_threshold in this R chunk
# Summarize results across all deployments
species_summary <- trapping_events_no_threshold %>%
group_by(full_species) %>%
summarise(
total_trapping_event_count = sum(trapping_event_count, na.rm = TRUE), # Sum of trapping events
total_RAI = sum(RIA, na.rm = TRUE), # Sum of RAI
.groups = 'drop' # Drop grouping after summarization
) %>%
arrange(desc(total_trapping_event_count)) # Optional: sort by total trapping events
# Print a nicely formatted table
kable(species_summary,
format = "html", # C
col.names = c("Species", "Total Trapping Events", "Total RAI"),
caption = "Summary of Trapping Events and RAI by Species")| Species | Total Trapping Events | Total RAI |
|---|---|---|
| Cervus elaphus | 22716 | 2.8812785 |
| Ovis aries | 5695 | 0.7223491 |
| Canis familiaris | 3540 | 0.4490107 |
| Ursus arctos | 2898 | 0.3675799 |
| Canis lupus | 2043 | 0.2591324 |
| Vulpes vulpes | 1760 | 0.2232369 |
| Sus scrofa | 1650 | 0.2092846 |
| Rupicapra rupicapra | 1597 | 0.2025622 |
| Equus caballus | 1020 | 0.1293760 |
| Equus asinus | 975 | 0.1236682 |
| Capreolus capreolus | 778 | 0.0986809 |
| Bos taurus | 433 | 0.0549214 |
| Lynx lynx | 338 | 0.0428716 |
| Martes | 122 | 0.0154744 |
| Sciurus vulgaris | 99 | 0.0125571 |
| Meles meles | 90 | 0.0114155 |
| Felis silvestris | 88 | 0.0111618 |
| Lepus europaeus | 67 | 0.0084982 |
| Tetrao urogallus | 65 | 0.0082445 |
| Canis | 18 | 0.0022831 |
| Erinaceus europaeus | 9 | 0.0011416 |
| Columba palumbus | 7 | 0.0008879 |
| Canis aureus | 5 | 0.0006342 |
| Dryocopus martius | 5 | 0.0006342 |
| Felis catus | 5 | 0.0006342 |
| Garrulus glandarius | 4 | 0.0005074 |
| Bonasa bonasia | 3 | 0.0003805 |
| Sitta europaea | 2 | 0.0002537 |
| Aquila chrysaetos | 1 | 0.0001268 |
| Felis | 1 | 0.0001268 |
Looking at these results we could make the rather superficial (and most likely erroneous) assumption that red deer is the most abundant species in the study area. This obviously would not consider all sort of factors affecting detectability, which were discussed during the module sessions.
You will see below how the RAI changes dramatically if you use a threshold/interval of 30 minutes between detections to be considered independent:
| Species | Total Trapping Events | Total RAI |
|---|---|---|
| Vulpes vulpes | 1002 | 0.1270928 |
| Cervus elaphus | 834 | 0.1057839 |
| Ursus arctos | 505 | 0.0640538 |
| Canis lupus | 258 | 0.0327245 |
| Canis familiaris | 247 | 0.0313293 |
| Sus scrofa | 134 | 0.0169964 |
| Equus caballus | 114 | 0.0144597 |
| Capreolus capreolus | 109 | 0.0138255 |
| Martes | 85 | 0.0107813 |
| Lynx lynx | 83 | 0.0105277 |
| Sciurus vulgaris | 80 | 0.0101471 |
| Rupicapra rupicapra | 68 | 0.0086251 |
| Meles meles | 54 | 0.0068493 |
| Felis silvestris | 52 | 0.0065956 |
| Ovis aries | 52 | 0.0065956 |
| Lepus europaeus | 43 | 0.0054541 |
| Equus asinus | 42 | 0.0053272 |
| Tetrao urogallus | 17 | 0.0021563 |
| Canis | 15 | 0.0019026 |
| Erinaceus europaeus | 6 | 0.0007610 |
| Columba palumbus | 5 | 0.0006342 |
| Felis catus | 4 | 0.0005074 |
| Garrulus glandarius | 4 | 0.0005074 |
| Bos taurus | 3 | 0.0003805 |
| Bonasa bonasia | 2 | 0.0002537 |
| Canis aureus | 2 | 0.0002537 |
| Dryocopus martius | 2 | 0.0002537 |
| Aquila chrysaetos | 1 | 0.0001268 |
| Felis | 1 | 0.0001268 |
| Sitta europaea | 1 | 0.0001268 |
Naive Occupancy
The calculation of naive occupancy (deployment/stations positive to the presence of each species on the total number of deployment/stations over the study area) is relatively simple and you could do it by hand or on Excel. You can also do it using R!.
# Load required libraries
library(dplyr)
library(knitr) # For kable function
# Calculate Naive Occupancy
naive_occupancy <- camtraps %>%
# Combine 'genus' and 'species' into a new 'full_species' column
mutate(full_species = paste(genus, species, sep = " ")) %>%
# Filter out rows with missing genus
filter(!is.na(genus) & genus != "") %>%
# Count unique deployments for each species
group_by(full_species, deployment_id) %>%
summarise(detected = n() > 0, .groups = 'drop') %>% # Create a presence indicator
group_by(full_species) %>%
summarise(
number_of_positive_deployments = sum(detected), # Count of unique deployments where species was detected
.groups = 'drop' # Drop grouping after summarization
) %>%
# Join with the deployment dataset to get total number of deployments
left_join(deployment %>% summarise(total_deployments = n_distinct(deployment_id)), by = character()) %>%
# Finalize naive occupancy calculation
mutate(naive_occupancy = number_of_positive_deployments / total_deployments) %>%
select(species = full_species, number_of_positive_deployments, total_deployments, naive_occupancy) %>%
arrange(desc(naive_occupancy)) # Optional: sort by naive occupancy
# Print the naive occupancy results with formatted table
# Use kable to print the table in HTML format
kable(naive_occupancy, format = "html", caption = "Naive Occupancy Summary")| species | number_of_positive_deployments | total_deployments | naive_occupancy |
|---|---|---|---|
| Vulpes vulpes | 100 | 135 | 0.7407407 |
| Cervus elaphus | 99 | 135 | 0.7333333 |
| Canis lupus | 76 | 135 | 0.5629630 |
| Ursus arctos | 69 | 135 | 0.5111111 |
| Capreolus capreolus | 58 | 135 | 0.4296296 |
| Canis familiaris | 53 | 135 | 0.3925926 |
| Sus scrofa | 43 | 135 | 0.3185185 |
| Lynx lynx | 37 | 135 | 0.2740741 |
| Martes | 31 | 135 | 0.2296296 |
| Felis silvestris | 29 | 135 | 0.2148148 |
| Sciurus vulgaris | 27 | 135 | 0.2000000 |
| Meles meles | 24 | 135 | 0.1777778 |
| Ovis aries | 20 | 135 | 0.1481481 |
| Rupicapra rupicapra | 16 | 135 | 0.1185185 |
| Equus asinus | 14 | 135 | 0.1037037 |
| Equus caballus | 13 | 135 | 0.0962963 |
| Tetrao urogallus | 10 | 135 | 0.0740741 |
| Lepus europaeus | 9 | 135 | 0.0666667 |
| Canis | 7 | 135 | 0.0518519 |
| Columba palumbus | 4 | 135 | 0.0296296 |
| Bos taurus | 3 | 135 | 0.0222222 |
| Garrulus glandarius | 3 | 135 | 0.0222222 |
| Canis aureus | 2 | 135 | 0.0148148 |
| Felis catus | 2 | 135 | 0.0148148 |
| Aquila chrysaetos | 1 | 135 | 0.0074074 |
| Bonasa bonasia | 1 | 135 | 0.0074074 |
| Dryocopus martius | 1 | 135 | 0.0074074 |
| Erinaceus europaeus | 1 | 135 | 0.0074074 |
| Felis | 1 | 135 | 0.0074074 |
| Sitta europaea | 1 | 135 | 0.0074074 |
Results seem to indicate that the most widespread species are foxes and red deer (over 70% of cameras confirm their presence), followed by wolves and brown bears (over 50%)…but again, results are impacted by multiple factors that we have not controlled for.
You can also visualise occupancy on a very basic map using the package camtrapR:
library(camtrapR)
par(mar = c(5, 5, 5, 10))
BasicMap<-detectionMaps(CTtable =deployment,
recordTable =camtraps,
Xcol ="longitude",
Ycol ="latitude",
stationCol ="deployment_id",
speciesCol ="full_species",
printLabels =FALSE, #no need to show labels
richnessPlot=TRUE,
addLegend=FALSE)Species Accumulation Curve
Resources always limit our projects (e.g. workforce, camera traps, transport, etc). That is why it is important that we try to work out the sampling effort required to characterise the animal community of our study area to make sure our resources are only deployed when needed.
In camera trapping, typically, the sampling effort is measured by the number of camera trap days. We would like to work out how many camera trap days are required to detect the majority of species in our study area. Please notice that this only applies for projects focused on producing an inventory.
# Step 1: Ensure date columns are Date types
deployment <- deployment %>%
mutate(start_date = as.Date(start_date),
end_date = as.Date(end_date))The mutate function is used to convert the start_date and end_date columns in the deployment dataset to Date types. This ensures that date calculations and comparisons can be performed accurately.
# Step 2: Create a date sequence for each deployment
deployment_dates <- deployment %>%
select(deployment_id, start_date, end_date) %>%
# Create a data frame to hold all the dates
rowwise() %>%
do(data.frame(deployment_id = .$deployment_id,
date = seq(.$start_date, .$end_date, by = "day"))) %>%
ungroup() # Expand the rows for each dateThe code selects relevant columns (
deployment_id,start_date, andend_date) from thedeploymentdataset.The
rowwisefunction allows operations to be performed on each row individually.The
dofunction creates a new data frame for each row, generating a sequence of dates fromstart_datetoend_dateusingseq(). This captures all days for each deployment.Finally,
ungroup()is called to revert the data back to a regular data frame structure after generating the sequences.
# Step 3: Combine genus and species into full_species
camtraps <- camtraps %>%
mutate(full_species = paste(genus, species, sep = " "),
timestamp = as.Date(timestamp)) %>% # Convert timestamp to Date
select(deployment_id, full_species, timestamp)The
camtrapsdataset is modified to create a new columnfull_species, which combines thegenusandspeciescolumns into a single string, facilitating easier identification of species.The
timestampcolumn is also converted toDateformat.The
selectfunction is used to keep only relevant columns for further analysis:deployment_id,full_species, andtimestamp.
# Step 4: Count unique species detected per day for each deployment
species_per_day <- deployment_dates %>%
left_join(camtraps, by = c("deployment_id", "date" = "timestamp")) %>%
group_by(deployment_id, date) %>%
summarise(species_detected = list(unique(full_species)), .groups = 'drop') %>%
ungroup()The
deployment_datesdataset is left-joined with thecamtrapsdataset usingdeployment_idand matchingdatewithtimestamp.The result is grouped by
deployment_idanddate, allowing for summarization of the species detected each day.The
summarisefunction collects the unique species detected for each day into a list calledspecies_detected.The
.groups = 'drop'argument prevents additional grouping metadata from being carried over to the next steps.Finally,
ungroup()is called to revert the dataset to a regular data frame.
# Step 5: Calculate cumulative number of new species detected
# Initialize variables
cumulative_species <- numeric(nrow(species_per_day))
# Loop through each date and calculate cumulative new species
detected_species <- character(0) # Start with no species detected
for (i in seq_len(nrow(species_per_day))) {
# Get newly detected species for the day
new_species <- species_per_day$species_detected[[i]]
# Update the cumulative list
detected_species <- union(detected_species, new_species)
# Store the cumulative count
cumulative_species[i] <- length(detected_species)
}
# Combine with dates
species_per_day <- species_per_day %>%
mutate(cumulative_species = cumulative_species)An empty numeric vector
cumulative_speciesis initialized to store the cumulative counts of species.A character vector
detected_speciesis initialized to keep track of species detected up to the current date.A loop iterates over each row of
species_per_day. For each date:The unique species detected on that day is retrieved.
The
unionfunction updates thedetected_specieslist to include any new species detected.The cumulative count of detected species is updated and stored in
cumulative_species.
After the loop, the
cumulative_speciesvector is added to thespecies_per_daydataset.
# Step 6: Count camera days
camera_days <- species_per_day %>%
mutate(camera_day = row_number())The camera_days dataset is created by adding a new column camera_day to species_per_day. This column is generated using row_number(), which gives a sequential count of camera trap days.
# Step 7: Plot the results
ggplot(camera_days, aes(x = camera_day, y = cumulative_species)) +
geom_line(color = "blue", linewidth = 1) + # Use `linewidth` instead of `size`
geom_point(color = "red") +
labs(
x = "Number of Camera Days",
y = "Cumulative Number of New Species Detected",
title = "Species Accumulation Curve"
) +
theme_minimal()The
ggplotfunction creates a plot usingcamera_daysas the dataset.The x-axis represents the total number of camera days, while the y-axis shows the cumulative number of new species detected.
geom_linedraws a blue line representing the cumulative species count over time, andgeom_pointadds red points to indicate specific data points.The
labsfunction labels the axes and sets the title of the plot.theme_minimal()applies a clean and simple theme to the plot for better aesthetics.
The result is a Species Accumulation Curve showing that after 2200 camera traps there is no much point increasing the sampling effort.