Basic Camera Trap descriptors

Author

Antonio Uzal

Published

October 1, 2025

Introduction

This tutorial covers basic manipulation of data and calculations directed at obtaining basic Camera Trapping descriptors.

The case study is based on Dr. Bethany Smith’s data collection in Romania as part of her PhD at Nottingham Trent University focus on understanding the ecological consequences of Livestock Guarding Dogs (LGDs) on wildlife.

The dataset

Obtained via Wildlife Insights. The download output includes 4 files:

projects.csv > description of the project
cameras.csv > description of cameras (make, model, serial number and ID)
deployments.csv > description of deployments, including location, start and end date, camera ID used and landscape feature where the deployment took place (e.g. trail game)
images_LGD.csv > raw camera trap data

Import datasets

deployment <- read.csv("deployments.csv",
                 header = TRUE, 
                 sep = ",", 
                 stringsAsFactors = FALSE)
camtraps <- read.csv("images_LGDs.csv", 
                 header = TRUE, 
                 sep = ",", 
                 stringsAsFactors = FALSE)

Obtaining descriptors

Firstly, install required packages, if they are not already in your computer, and load them:

#Load Packages
list.of.packages <- c(
                      "leaflet",       # creates interactive maps
                      "plotly",        # creates interactive plots   
                      "kableExtra",    # Creates interactive tables 
                      "tidyr",         # A package for data manipulation
                      "dplyr",         # A package for data manipulation
                      "viridis",       # Generates colors for plots  
                      "corrplot",      # Plots pairwise correlations
                      "lubridate",     # Easy manipulation of date objects
                      "taxize",        # Package to check taxonomy 
                      "sf",            # Package for spatial data analysis 
                      "reshape",       # allos restructuring of data
                      "vegan",         # tools for descriptive parameters in ecology
                      "plotrix",       # visualises circular data (activity)
                      "chron",         # handles chronological objects
                      "knitr")         # dynamic reporting (nice tables) 
                              

# Check you have them in your library
new.packages <- list.of.packages[!(list.of.packages %in% installed.packages()[,"Package"])]
# install any missing packages and load them
if(length(new.packages)) install.packages(new.packages,repos = "http://cran.us.r-project.org")
lapply(list.of.packages, require, character.only = TRUE)

Checking data

Quick set of functions to make sure everything looks like it should.

names(deployment) # return the name of variables contained in dataset

 [1] "project_id"                 "deployment_id"             
 [3] "placename"                  "longitude"                 
 [5] "latitude"                   "start_date"                
 [7] "end_date"                   "bait_type"                 
 [9] "bait_description"           "feature_type"              
[11] "feature_type_methodology"   "camera_id"                 
[13] "camera_name"                "quiet_period"              
[15] "camera_functioning"         "sensor_height"             
[17] "height_other"               "sensor_orientation"        
[19] "orientation_other"          "plot_treatment"            
[21] "plot_treatment_description" "detection_distance"        
[23] "subproject_name"            "subproject_design"         
[25] "event_name"                 "event_description"         
[27] "event_type"                 "recorded_by"               
[29] "remarks"

names(camtraps)

 [1] "project_id"              "deployment_id"          
 [3] "image_id"                "filename"               
 [5] "location"                "is_blank"               
 [7] "identified_by"           "wi_taxon_id"            
 [9] "class"                   "order"                  
[11] "family"                  "genus"                  
[13] "species"                 "common_name"            
[15] "uncertainty"             "timestamp"              
[17] "number_of_objects"       "age"                    
[19] "sex"                     "animal_recognizable"    
[21] "individual_id"           "individual_animal_notes"
[23] "behavior"                "highlighted"            
[25] "markings"                "cv_confidence"          
[27] "license"                 "bounding_boxes"

We can also look at the structure of each of the variables using the str function. Output not shown here.

str(deployment)
str(camtraps)

Let’s obtain some numerical outputs:

Number of photos

number <- nrow(camtraps) # number of rows in the dataset
print(paste("Number of photographs:", number))

[1] "Number of photographs: 71726"

Notice that the number differs from the Wildlife Insights summary, this might be because photographs with humans were filtered out.

Species

How many different species have been captured and which ones? in the dataset the full species name is divided in genus and species so we need to combine this first

# Combine genus and species into a full species name
camtraps$full_species <- paste(camtraps$genus, camtraps$species, sep = " ")

# Extract unique species
unique_species <- unique(camtraps$full_species)

# Count the number of unique species
print(paste("Number of unique species:",length(unique_species)))

[1] "Number of unique species: 31"

# View the unique species
print(unique_species)

 [1] "Cervus elaphus"      " "                   "Ursus arctos"       
 [4] "Sus scrofa"          "Vulpes vulpes"       "Capreolus capreolus"
 [7] "Lynx lynx"           "Rupicapra rupicapra" "Bos taurus"         
[10] "Equus caballus"      "Sciurus vulgaris"    "Canis familiaris"   
[13] "Ovis aries"          "Equus asinus"        "Meles meles"        
[16] "Canis lupus"         "Felis silvestris"    "Garrulus glandarius"
[19] "Sitta europaea"      "Martes "             "Tetrao urogallus"   
[22] "Lepus europaeus"     "Felis "              "Felis catus"        
[25] "Canis aureus"        "Canis "              "Columba palumbus"   
[28] "Erinaceus europaeus" "Aquila chrysaetos"   "Bonasa bonasia"     
[31] "Dryocopus martius"

The number of species (n=31) differs from the WI summary (n=21). It seems there were occasions when the species could not be identified, only the genus (e.g. Martes, Felis, Canis) and there were some photographs that containeded no species (i.e. only ” ” is returned). How often does this happen?

# Count how many times the genus has not been identified, and therefor the photograph does not contain relevant species
empty_genus <- sum(camtraps$genus == "")

# View the count
print(empty_genus)

[1] 25692

# Proportion of dataset
proportion <- (empty_genus/nrow(camtraps))*100
print(paste("Percentage of photographs without animal captures:",proportion))

[1] "Percentage of photographs without animal captures: 35.8196469899339"

This has happened in >35% of the photographs. Remember that photographs containing humans were already filtered out by WI.

Deployments

We would like to know if all deployments were successful (i.e. the cameras took photos, regardless if animals were detected)

#count number of deployments
unique_deployments <- unique(deployment$deployment_id)
num_all_deployments <- length(unique_deployments)
#count number of successful deployments
successful_deployments<- unique(camtraps$deployment_id)
num_successful_deployments <-length(successful_deployments)
#how many were not successful?
total<-num_all_deployments - num_successful_deployments
print(paste("Number of deployments with no captures:", total))

[1] "Number of deployments with no captures: 2"

Two camera stations had no photos, which could be due to malfunction, damage or lost, or simply nothing triggered the sensors. You might want to double check this by downloading again the dataset without filtering humans as the first photograph should contain the researcher holding a white board with deployment info.

Start and End of sampling

First we need to make sure that the start_date and end_date columns are properly formatted as date-time objects.

This also includes standardising the timezones. For example, some of the dates in the dataset are BST (British Summer Time), while others are GMT (Greenwich Mean Time). This is messy because R can only handle one timezone at a time.

lubridate is an awesome package that allows fixing all kind of issues with dates and times. ynd_hms() from lubridate automatically recognises and converts dates with timezones:

# Ensure date variables are character before conversion
deployment$start_date <- as.character(deployment$start_date)
deployment$end_date <- as.character(deployment$end_date)

# Standardize timezones, in this case the dataset had some BST and some GMT dates

deployment$end_date <- gsub("BST", "GMT", deployment$end_date)  # Change BST to GMT
deployment$start_date <- gsub("BST", "GMT", deployment$start_date)  # Change BST to GMT

# Now convert start_date and end_date to POSIXct format, assuming they are in GMT
deployment$start_date <- ymd_hms(deployment$start_date, tz = "GMT")
deployment$end_date <- ymd_hms(deployment$end_date, tz = "GMT")

# Now adjust the timezone to Europe/Bucharest
deployment$start_date <- with_tz(deployment$start_date, tzone = "Europe/Bucharest")
deployment$end_date <- with_tz(deployment$end_date, tzone = "Europe/Bucharest")

# Check for NAs after conversion to ensure it has been successful
print(sum(is.na(deployment$start_date)))  # Should return 0 if conversion is successful

[1] 0

print(sum(is.na(deployment$end_date)))    # Should return 0 if conversion is successful

[1] 0

Now we can work out the start and end of the sampling period across all deployments

# Find the earliest start date
earliest_start_date <- min(deployment$start_date, na.rm = TRUE)

# Find the latest end date
latest_end_date <- max(deployment$end_date, na.rm = TRUE)

# View the results
print(paste("Earliest start date:", earliest_start_date))

[1] "Earliest start date: 2021-07-15 12:54:47"

print(paste("Latest end date:", latest_end_date))

[1] "Latest end date: 2022-06-08 16:56:56"

Deployment interval

It is useful to know the survey effort per station/deployment and identify sources of imbalance or plainly, when things have gone wrong. For this we can calculate the deployment interval: the number of days between start and end date.

# Calculate the interval for each deployment
deployment$interval <- interval(deployment$start_date, deployment$end_date)

# If you want to calculate the duration in days, hours, etc.
deployment$duration_days <- as.numeric(deployment$interval) / (60 * 60 * 24)  # Convert to days

# Check the results
print(head(deployment[, c("start_date", "end_date", "duration_days")],5))

           start_date            end_date duration_days
1 2021-09-21 12:15:00 2021-10-28 17:19:00      37.21111
2 2021-08-10 22:00:00 2021-08-23 18:05:00      12.83681
3 2021-08-10 16:38:00 2021-08-23 11:51:00      12.80069
4 2021-12-07 15:20:32 2022-03-31 16:51:22     114.02141
5 2021-07-20 15:12:00 2021-08-10 17:05:00      21.07847

# Or you can summarise the range of values obtained
summary(deployment$duration_days)

   Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
  7.076  24.160  47.377  57.968  71.997 207.949

Beth was not able to access some of the cameras due to the snow, some for up to 6 months, hence some of the intervals are above 100 days. The average interval was around 58 days, but the variation is quite large.

Sampling Effort

This is an important parameter, that will allow us to obtain others, such as the naive occupancy. Sampling effort is considered here as the number of camera days, so that is a simple line of coding:

# Sum the duration_days to calculate total camera days (sampling effort)
total_camera_days <- sum(deployment$duration_days, na.rm = TRUE)

# Round the result to NO decimal places
total_camera_days_rounded <- round(total_camera_days, 0)

# View the rounded result
print(paste("Total camera days (sampling effort):", total_camera_days_rounded))

[1] "Total camera days (sampling effort): 7884"

Camera Locations

A common mistake in camera trap data sets is that the locations are not where they are supposed to be. The safest way to check your data is to plot it.

The package leaflet is awesome for rapid and interactive maps and plots. The next R code produces the simplest version of a leaflet map. Notice you can zoom in and out!

m <- leaflet() %>%             # call leaflet
        addTiles() %>%         # add the default basemap
        addCircleMarkers(      # Add circles for stations
          lng=deployment$longitude, lat=deployment$latitude) 
m                              # return the map

Tip

You can change the base map and use some satellite imagery

m <- leaflet() %>%             # call leaflet
        addProviderTiles(providers$Esri.WorldImagery) %>% #Add Esri Wrold imagery  
        addCircleMarkers(      # Add circles for stations
          lng=deployment$longitude, lat=deployment$latitude) 
m                              # return the map

Relative Index of Abundance

The RIA is obtained by calculating the number of independent captures per species considering the sampling effort previously obtained (total_camera_days_rounded). We will exclude all blank records (i.e. those that do not have any genus identified). Remember that you need to think carefully what you consider as independent captures. For instance, you might want to consider independent captures only when there is a 30 minutes interval in between two captures of the same species, to avoid ‘double counting’ individuals that might stay around the camera, for example grazing. Examples below consider this (i.e. ‘threshold’).

Firstly, we can obtain a RAI for each species at each deployment/station:

# Define a custom function event.sp to calculate trapping events and Relative Index of Abundance (RIA)
event.sp <- function(camtraps, deployment, total_camera_days_rounded, time_threshold = NULL) {
  # event.sp: This is the name of the function being defined. It is a convention in R to use the function_name <- function(...) syntax to create a new function.
  #camtraps: This parameter expects a dataset (typically a data frame) containing information about camera trap photographs. 
  #deployment: This parameter is intended for another dataset, which contains information about camera trap deployments. This dataset will often be used to summarize or correlate the data from the camtraps dataset.
  #total_camera_days_rounded: This parameter is passed to the function and represents a numerical value indicating the total camera days, rounded to the nearest integer. This value is important for calculations, such as the Relative Index of Abundance (RIA), as it reflects the effort over which species observations occurred.
  #time_threshold = NULL: This parameter is optional and has a default value of NULL. If a value is provided when calling the function, it will be used as the time threshold for determining whether consecutive observations of the same species are considered separate events. For example, if you set a time threshold (e.g., 30 minutes), only observations separated by more than 30 minutes would be counted as separate trapping events.
  
  # Check for missing values in genus
  missing_genus <- sum(is.na(camtraps$genus) | camtraps$genus == "")
  
  # Print count of missing genus values
  if (missing_genus > 0) {
    cat("Warning: Found", missing_genus, "missing or empty values in 'genus'. Rows with missing genus will be excluded.\n")
  }
  
  # Filter out rows with missing or empty genus
  camtraps <- camtraps %>%
    filter(!is.na(genus) & genus != "")
  
  # Combine 'genus' and 'species' into a new 'full_species' column in the camtraps dataset
  camtraps <- camtraps %>%
    mutate(
      full_species = paste(genus, species, sep = " "),  # Combine genus and species
      full_species = trimws(full_species)  # Remove any leading/trailing whitespace
    ) 
  
  # Convert timestamp column in camtraps to POSIXct if needed
  camtraps$timestamp <- as.POSIXct(camtraps$timestamp, format = "%Y-%m-%d %H:%M:%S")
  
  # Calculate trapping events either with or without a time threshold
  if (is.null(time_threshold)) {
    trapping_events <- camtraps %>%
      group_by(full_species, deployment_id) %>%
      summarise(trapping_event_count = n(), .groups = 'drop')
    
  } else {
    # Sort by deployment, species, and timestamp
    camtraps <- camtraps %>%
      arrange(deployment_id, full_species, timestamp)
    
    # Calculate time difference between consecutive events
    trapping_events <- camtraps %>%
      group_by(full_species, deployment_id) %>%
      mutate(time_diff = c(NA, diff(as.numeric(timestamp)))) %>%
      mutate(new_event = ifelse(is.na(time_diff) | time_diff > time_threshold, 1, 0)) %>%
      summarise(trapping_event_count = sum(new_event), .groups = 'drop')
  }
  
  # Calculate RIA using the total camera days rounded
  trapping_events <- trapping_events %>%
    mutate(RIA = trapping_event_count / total_camera_days_rounded)  # Calculate RIA

  # Return the trapping events summary with RIA
  return(trapping_events)
}

# Example usage
# Assuming total_camera_days_rounded is defined elsewhere in your code

# Without time threshold
trapping_events_no_threshold <- event.sp(camtraps, deployment, total_camera_days_rounded)

Warning: Found 25692 missing or empty values in 'genus'. Rows with missing genus will be excluded.

# With a 30-minute threshold (in seconds)
trapping_events_with_threshold <- event.sp(camtraps, deployment, total_camera_days_rounded, time_threshold = 30 * 60)

Warning: Found 25692 missing or empty values in 'genus'. Rows with missing genus will be excluded.

# View the results
print(trapping_events_no_threshold)

# A tibble: 755 × 4
   full_species      deployment_id trapping_event_count      RIA
   <chr>             <chr>                        <int>    <dbl>
 1 Aquila chrysaetos D6_AC11-2                        1 0.000127
 2 Bonasa bonasia    D4_T14-1                         3 0.000381
 3 Bos taurus        D1_AD10-1                      326 0.0413  
 4 Bos taurus        D2_AA11-1                       14 0.00178 
 5 Bos taurus        D3_AA11-1                       93 0.0118  
 6 Canis             D2_AA12-2                        5 0.000634
 7 Canis             D3_AD10-2                        2 0.000254
 8 Canis             D4_AC10-2                        1 0.000127
 9 Canis             D4_AD10-2                        7 0.000888
10 Canis             D4_S14-1                         1 0.000127
# ℹ 745 more rows

print(trapping_events_with_threshold)

# A tibble: 755 × 4
   full_species      deployment_id trapping_event_count      RIA
   <chr>             <chr>                        <dbl>    <dbl>
 1 Aquila chrysaetos D6_AC11-2                        1 0.000127
 2 Bonasa bonasia    D4_T14-1                         2 0.000254
 3 Bos taurus        D1_AD10-1                        1 0.000127
 4 Bos taurus        D2_AA11-1                        1 0.000127
 5 Bos taurus        D3_AA11-1                        1 0.000127
 6 Canis             D2_AA12-2                        3 0.000381
 7 Canis             D3_AD10-2                        1 0.000127
 8 Canis             D4_AC10-2                        1 0.000127
 9 Canis             D4_AD10-2                        7 0.000888
10 Canis             D4_S14-1                         1 0.000127
# ℹ 745 more rows

But typically we are more interested in the overall results, the RAI for each species, across the study area:

# Assuming trapping_events_no_threshold is obtained from the event.sp function and contains the trapping event data
   # if you have set up a threshold, like above you should use
   # trapping_events_with_threshold in this R chunk

# Summarize results across all deployments
species_summary <- trapping_events_no_threshold %>%
  group_by(full_species) %>%
  summarise(
    total_trapping_event_count = sum(trapping_event_count, na.rm = TRUE),  # Sum of trapping events
    total_RAI = sum(RIA, na.rm = TRUE),  # Sum of RAI
    .groups = 'drop'  # Drop grouping after summarization
  ) %>%
  arrange(desc(total_trapping_event_count))  # Optional: sort by total trapping events

# Print a nicely formatted table

kable(species_summary, 
      format = "html",   # C
      col.names = c("Species", "Total Trapping Events", "Total RAI"), 
      caption = "Summary of Trapping Events and RAI by Species")

Summary of Trapping Events and RAI by Species
Species	Total Trapping Events	Total RAI
Cervus elaphus	22716	2.8812785
Ovis aries	5695	0.7223491
Canis familiaris	3540	0.4490107
Ursus arctos	2898	0.3675799
Canis lupus	2043	0.2591324
Vulpes vulpes	1760	0.2232369
Sus scrofa	1650	0.2092846
Rupicapra rupicapra	1597	0.2025622
Equus caballus	1020	0.1293760
Equus asinus	975	0.1236682
Capreolus capreolus	778	0.0986809
Bos taurus	433	0.0549214
Lynx lynx	338	0.0428716
Martes	122	0.0154744
Sciurus vulgaris	99	0.0125571
Meles meles	90	0.0114155
Felis silvestris	88	0.0111618
Lepus europaeus	67	0.0084982
Tetrao urogallus	65	0.0082445
Canis	18	0.0022831
Erinaceus europaeus	9	0.0011416
Columba palumbus	7	0.0008879
Canis aureus	5	0.0006342
Dryocopus martius	5	0.0006342
Felis catus	5	0.0006342
Garrulus glandarius	4	0.0005074
Bonasa bonasia	3	0.0003805
Sitta europaea	2	0.0002537
Aquila chrysaetos	1	0.0001268
Felis	1	0.0001268

Looking at these results we could make the rather superficial (and most likely erroneous) assumption that red deer is the most abundant species in the study area. This obviously would not consider all sort of factors affecting detectability, which were discussed during the module sessions.

You will see below how the RAI changes dramatically if you use a threshold/interval of 30 minutes between detections to be considered independent:

Summary of Trapping Events and RAI by Species
Species	Total Trapping Events	Total RAI
Vulpes vulpes	1002	0.1270928
Cervus elaphus	834	0.1057839
Ursus arctos	505	0.0640538
Canis lupus	258	0.0327245
Canis familiaris	247	0.0313293
Sus scrofa	134	0.0169964
Equus caballus	114	0.0144597
Capreolus capreolus	109	0.0138255
Martes	85	0.0107813
Lynx lynx	83	0.0105277
Sciurus vulgaris	80	0.0101471
Rupicapra rupicapra	68	0.0086251
Meles meles	54	0.0068493
Felis silvestris	52	0.0065956
Ovis aries	52	0.0065956
Lepus europaeus	43	0.0054541
Equus asinus	42	0.0053272
Tetrao urogallus	17	0.0021563
Canis	15	0.0019026
Erinaceus europaeus	6	0.0007610
Columba palumbus	5	0.0006342
Felis catus	4	0.0005074
Garrulus glandarius	4	0.0005074
Bos taurus	3	0.0003805
Bonasa bonasia	2	0.0002537
Canis aureus	2	0.0002537
Dryocopus martius	2	0.0002537
Aquila chrysaetos	1	0.0001268
Felis	1	0.0001268
Sitta europaea	1	0.0001268

Naive Occupancy

The calculation of naive occupancy (deployment/stations positive to the presence of each species on the total number of deployment/stations over the study area) is relatively simple and you could do it by hand or on Excel. You can also do it using R!.

# Load required libraries
library(dplyr)
library(knitr)  # For kable function

# Calculate Naive Occupancy
naive_occupancy <- camtraps %>%
  # Combine 'genus' and 'species' into a new 'full_species' column
  mutate(full_species = paste(genus, species, sep = " ")) %>%
  # Filter out rows with missing genus
  filter(!is.na(genus) & genus != "") %>%
  # Count unique deployments for each species
  group_by(full_species, deployment_id) %>%
  summarise(detected = n() > 0, .groups = 'drop') %>%  # Create a presence indicator
  group_by(full_species) %>%
  summarise(
    number_of_positive_deployments = sum(detected),  # Count of unique deployments where species was detected
    .groups = 'drop'  # Drop grouping after summarization
  ) %>%
  # Join with the deployment dataset to get total number of deployments
  left_join(deployment %>% summarise(total_deployments = n_distinct(deployment_id)), by = character()) %>%
  # Finalize naive occupancy calculation
  mutate(naive_occupancy = number_of_positive_deployments / total_deployments) %>%
  select(species = full_species, number_of_positive_deployments, total_deployments, naive_occupancy) %>%
  arrange(desc(naive_occupancy))  # Optional: sort by naive occupancy

# Print the naive occupancy results with formatted table

# Use kable to print the table in HTML format
kable(naive_occupancy, format = "html", caption = "Naive Occupancy Summary")

Naive Occupancy Summary
species	number_of_positive_deployments	total_deployments	naive_occupancy
Vulpes vulpes	100	135	0.7407407
Cervus elaphus	99	135	0.7333333
Canis lupus	76	135	0.5629630
Ursus arctos	69	135	0.5111111
Capreolus capreolus	58	135	0.4296296
Canis familiaris	53	135	0.3925926
Sus scrofa	43	135	0.3185185
Lynx lynx	37	135	0.2740741
Martes	31	135	0.2296296
Felis silvestris	29	135	0.2148148
Sciurus vulgaris	27	135	0.2000000
Meles meles	24	135	0.1777778
Ovis aries	20	135	0.1481481
Rupicapra rupicapra	16	135	0.1185185
Equus asinus	14	135	0.1037037
Equus caballus	13	135	0.0962963
Tetrao urogallus	10	135	0.0740741
Lepus europaeus	9	135	0.0666667
Canis	7	135	0.0518519
Columba palumbus	4	135	0.0296296
Bos taurus	3	135	0.0222222
Garrulus glandarius	3	135	0.0222222
Canis aureus	2	135	0.0148148
Felis catus	2	135	0.0148148
Aquila chrysaetos	1	135	0.0074074
Bonasa bonasia	1	135	0.0074074
Dryocopus martius	1	135	0.0074074
Erinaceus europaeus	1	135	0.0074074
Felis	1	135	0.0074074
Sitta europaea	1	135	0.0074074

Results seem to indicate that the most widespread species are foxes and red deer (over 70% of cameras confirm their presence), followed by wolves and brown bears (over 50%)…but again, results are impacted by multiple factors that we have not controlled for.

You can also visualise occupancy on a very basic map using the package camtrapR:

library(camtrapR)
par(mar = c(5, 5, 5, 10))
BasicMap<-detectionMaps(CTtable =deployment, 
                         recordTable =camtraps,
                         Xcol ="longitude", 
                         Ycol ="latitude", 
                         stationCol ="deployment_id", 
                         speciesCol ="full_species",
                         printLabels =FALSE, #no need to show labels
                         richnessPlot=TRUE,
                         addLegend=FALSE)

Species Accumulation Curve

Resources always limit our projects (e.g. workforce, camera traps, transport, etc). That is why it is important that we try to work out the sampling effort required to characterise the animal community of our study area to make sure our resources are only deployed when needed.

In camera trapping, typically, the sampling effort is measured by the number of camera trap days. We would like to work out how many camera trap days are required to detect the majority of species in our study area. Please notice that this only applies for projects focused on producing an inventory.

# Step 1: Ensure date columns are Date types
deployment <- deployment %>%
  mutate(start_date = as.Date(start_date),
         end_date = as.Date(end_date))

The mutate function is used to convert the start_date and end_date columns in the deployment dataset to Date types. This ensures that date calculations and comparisons can be performed accurately.

# Step 2: Create a date sequence for each deployment
deployment_dates <- deployment %>%
  select(deployment_id, start_date, end_date) %>%
  # Create a data frame to hold all the dates
  rowwise() %>%
  do(data.frame(deployment_id = .$deployment_id, 
                 date = seq(.$start_date, .$end_date, by = "day"))) %>%
  ungroup()  # Expand the rows for each date

The code selects relevant columns (deployment_id, start_date, and end_date) from the deployment dataset.
The rowwise function allows operations to be performed on each row individually.
The do function creates a new data frame for each row, generating a sequence of dates from start_date to end_date using seq(). This captures all days for each deployment.
Finally, ungroup() is called to revert the data back to a regular data frame structure after generating the sequences.

# Step 3: Combine genus and species into full_species
camtraps <- camtraps %>%
  mutate(full_species = paste(genus, species, sep = " "),
         timestamp = as.Date(timestamp)) %>%  # Convert timestamp to Date
  select(deployment_id, full_species, timestamp)

The camtraps dataset is modified to create a new column full_species, which combines the genus and species columns into a single string, facilitating easier identification of species.
The timestamp column is also converted to Date format.
The select function is used to keep only relevant columns for further analysis: deployment_id, full_species, and timestamp.

# Step 4: Count unique species detected per day for each deployment
species_per_day <- deployment_dates %>%
  left_join(camtraps, by = c("deployment_id", "date" = "timestamp")) %>%
  group_by(deployment_id, date) %>%
  summarise(species_detected = list(unique(full_species)), .groups = 'drop') %>%
  ungroup()

The deployment_dates dataset is left-joined with the camtraps dataset using deployment_id and matching date with timestamp.
The result is grouped by deployment_id and date, allowing for summarization of the species detected each day.
The summarise function collects the unique species detected for each day into a list called species_detected.
The .groups = 'drop' argument prevents additional grouping metadata from being carried over to the next steps.
Finally, ungroup() is called to revert the dataset to a regular data frame.

# Step 5: Calculate cumulative number of new species detected
# Initialize variables
cumulative_species <- numeric(nrow(species_per_day))

# Loop through each date and calculate cumulative new species
detected_species <- character(0)  # Start with no species detected

for (i in seq_len(nrow(species_per_day))) {
  # Get newly detected species for the day
  new_species <- species_per_day$species_detected[[i]]
  
  # Update the cumulative list
  detected_species <- union(detected_species, new_species)
  
  # Store the cumulative count
  cumulative_species[i] <- length(detected_species)
}

# Combine with dates
species_per_day <- species_per_day %>%
  mutate(cumulative_species = cumulative_species)

An empty numeric vector cumulative_species is initialized to store the cumulative counts of species.
A character vector detected_species is initialized to keep track of species detected up to the current date.
A loop iterates over each row of species_per_day. For each date:
- The unique species detected on that day is retrieved.
- The union function updates the detected_species list to include any new species detected.
- The cumulative count of detected species is updated and stored in cumulative_species.
After the loop, the cumulative_species vector is added to the species_per_day dataset.

# Step 6: Count camera days
camera_days <- species_per_day %>%
  mutate(camera_day = row_number())

The camera_days dataset is created by adding a new column camera_day to species_per_day. This column is generated using row_number(), which gives a sequential count of camera trap days.

# Step 7: Plot the results
ggplot(camera_days, aes(x = camera_day, y = cumulative_species)) +
  geom_line(color = "blue", linewidth = 1) +  # Use `linewidth` instead of `size`
  geom_point(color = "red") +
  labs(
    x = "Number of Camera Days",
    y = "Cumulative Number of New Species Detected",
    title = "Species Accumulation Curve"
  ) +
  theme_minimal()

The ggplot function creates a plot using camera_days as the dataset.
The x-axis represents the total number of camera days, while the y-axis shows the cumulative number of new species detected.
geom_line draws a blue line representing the cumulative species count over time, and geom_point adds red points to indicate specific data points.
The labs function labels the axes and sets the title of the plot.
theme_minimal() applies a clean and simple theme to the plot for better aesthetics.

The result is a Species Accumulation Curve showing that after 2200 camera traps there is no much point increasing the sampling effort.