Data Preparation - Occupancy Analysis (single species and multiple season - Wolf)

Author

Antonio Uzal

Published

October 1, 2024

Multi-Season Occupancy Modeling

Single season occupancy models assume population closure; there is no immigration or emigration of individuals, neither significant changes in population numbers due to mortality or births.

However, immigration, dispersal, deaths and births are natural occurrences within populations. Also, the occupancy of species might differ across periods, for instance wild ungulates have different feeding behaviours and habitat selection during the wet and dry seasons. Lastly, the detectability of species might not have a linear relationship with abundance, as different factors might affect the probability that a camera trap will detect the individuals. Some of these factors might vary during the year, for example vegetation cover, which differs across seasons for deciduous plants.

A primary goal of monitoring programs is often to document and understand trends in species occurrence over time, in addition to understanding spatiotemporal variability in occurrence in relation to a set of covariates.

Multi-season occupancy models measure the probability of initial occupancy (i.e. during season 1), but they also estimate the probability of occupancy in the next time period (given initial occupancy, colonisation and extinction) and the probability of detection. These models are also called Dynamic occupancy models (MacKenzie et al. 2003).

In multi-season occupancy models, the simple single-season+species model is complemented by introducing parameters for survival (or extinction) and colonisation probability.

The Task Ahead

We have been using the datasets produced by Wildlife Insights after identifying the content of the photographs and filtering out those with humans.

In previous sessions (Data Preparation and Occupancy Analyses for single species/single season studies), we have used a dataset based on the camera trapping data of Dr Bethany Smith, a former NTU PhD candidate. We mainly dealt with two datasets containing deployment data and raw image data. We also added a couple of covariates to the deployment dataset, selected the summer data and obtained the survey effort and the detection matrix.

The deployment dataset contains details of the camera trap location (longitude and latitude), the period of deployment and on which feature was placed (e.g. road, trail game). Elevation and ruggedness values at each deployment location were added to the dataset.
The summer camera traps raw data contains details related to the the deployment ID, date and time of capture and the animal captured
The effort data is defined as the number of days a camera was active during a sampling occasion.
The detection matrix for single-season occupancy models includes a SINGLE row for each deployment, with a column stating the detection (i.e. 0=no detection, 1=detection) for each sampling occasion and the columns containing the covariates of interest.

In this tutorial, we need to generate a detection matrix that contains the detections across multiple seasons, before we can analyse the data using an occupancy modeling framework.

Initial preparatory steps

Import datasets and standardise the date and time for further analyses

library(readr)
library(lubridate)
library(knitr)
library(tidyverse)
library(kableExtra)

# deployment data, including covariates
deployment_data<- read.csv("deployment+covariates.csv",
                            header = TRUE,
                            sep = ",",
                            stringsAsFactors = FALSE)

      
# upload the camera data (i.e raw photo data)
camtraps <- read.csv("images_LGDs.csv",
                     header = TRUE,
                     sep = ",",
                     stringsAsFactors = FALSE)

Effort as Detection Covariate

Effort can be used as a detection probability variable. Camera effort can be defined as the number of days a camera was active during a sampling occasion. Effort can affect species detection probability(Wevers et al. 2021). You will see some papers using this covariate, others do not. The following code will allow you to obtain this variable and save it as an independent dataset.

library(dplyr)
library(lubridate)

# Ensure start and end dates are in Date format (not POSIXct)
deployment_data <- deployment_data %>%
  mutate(start_date = as.Date(start_date, format = "%Y-%m-%d"),
         end_date = as.Date(end_date, format = "%Y-%m-%d"))

# Get the absolute earliest start date across all deployments
overall_start_date <- as.Date(min(deployment_data$start_date))

# Get the absolute latest end date across all deployments
overall_end_date <- as.Date(max(deployment_data$end_date))

# Define period length - in this case, it is 10 days
period_length <- 10

# Calculate the total number of periods required to cover the entire range
total_periods <- ceiling(as.numeric(difftime(overall_end_date, overall_start_date, units = "days")) / period_length)

# Dynamically create SO column names
so_columns <- paste0("SO", 1:total_periods)

# Initialize an empty data frame for the effort dataset with dynamic SO columns
effort_df <- data.frame(deployment_id = character(), 
                        matrix(NA, nrow = 0, ncol = total_periods),
                        start_date = as.Date(character()),  # Add start_date
                        end_date = as.Date(character()),   # Add end_date
                        stringsAsFactors = FALSE)

# Set the column names for the periods
colnames(effort_df)[2:(total_periods+1)] <- so_columns

# Loop through each deployment to calculate effort for each period (SO1 to SOX, depending on the deployment period)
for (index in 1:nrow(deployment_data)) {
  deployment_id <- deployment_data$deployment_id[index]
  start_date <- deployment_data$start_date[index]
  end_date <- deployment_data$end_date[index]
  total_days <- as.numeric(difftime(end_date, start_date, units = "days")) + 1
  
  # Initialize an array to hold the days for each period
  period_days <- rep(NA, total_periods)  # Adjust the number of periods dynamically
  
  # Loop through each period
  for (period in 0:(total_periods-1)) {
    period_start <- overall_start_date + days(period * period_length)
    period_end <- period_start + days(period_length - 1)
    
    # Check if the deployment overlaps with the current period
    if (period_end >= start_date && period_start <= end_date) {
      active_days <- max(0, as.numeric(difftime(min(end_date, period_end), max(start_date, period_start), units = "days")) + 1)
      period_days[period + 1] <- as.integer(active_days)  # Convert to integer to avoid decimals
    }
  }
  
  # Combine results into the effort_df
  effort_df <- rbind(effort_df, data.frame(deployment_id, t(period_days), start_date, end_date, stringsAsFactors = FALSE))
}

# Rename columns to match the SO format
colnames(effort_df)[2:(total_periods+1)] <- so_columns

# Clean up the dataset to remove duplicate columns
# No need for left_join if columns are already present in the main dataframe
effort_df <- effort_df %>%
  select(deployment_id, all_of(so_columns), start_date, end_date)

# Organize the dataset by earliest start_date
effort_df <- effort_df %>%
  arrange(start_date)

# Save the updated effort_df as a CSV file
write.csv(effort_df, 
          "effort_data.csv", 
          row.names = FALSE)

# Get the first 5 rows
first_5_rows <- head(effort_df, 10)

# Create a nice table for visualisation
kable(first_5_rows, format = "html", caption = "Effort data over 105 days, split by 10 days ocassions") %>%
  kable_styling(full_width = FALSE, position = "left")

Effort data over 105 days, split by 10 days ocassions
deployment_id	SO1	SO2	SO3	SO4	SO5	SO6	SO7	SO8	SO9	SO10	SO11	SO12	SO13	SO14	SO15	SO16	SO17	SO18	SO19	SO20	SO21	SO22	SO23	SO24	SO25	SO26	SO27	SO28	SO29	SO30	SO31	SO32	SO33	start_date	end_date
D1_AB11-1	10	10	7	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	2021-07-15	2021-08-10
D1_AC10-1	10	10	1	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	2021-07-15	2021-08-04
D1_AC11-1	10	10	7	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	2021-07-15	2021-08-10
D1_AC11-2	10	10	7	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	2021-07-15	2021-08-10
D1_AD10-1	10	10	1	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	2021-07-15	2021-08-04
D1_AA10-1	5	10	6	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	2021-07-20	2021-08-09
D1_AA11-1	5	10	7	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	2021-07-20	2021-08-10
D1_W13-1	5	10	7	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	2021-07-20	2021-08-10
D1_Y11-1	5	10	7	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	2021-07-20	2021-08-10
D1_Z10-1	5	3	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	2021-07-20	2021-07-27

The detection matrix

Similarly to single-season occupancy models, we need the structure to include a SINGLE row for each deployment, with a column stating the detection (i.e. 0=no detection, 1=detection) for each sampling occasion and the columns containing the covariates of interest.

To obtain this structure, we will need to move things around. As explained in previous tutorials, you can do this manually and if the dataset is not extensive it should not take you too long. Also it might be appropriate if you want to have a good sense for the nature of the data and reflect on effort and potential results. If you prefer to automatise the process, you can use the R code below, which includes an extensive explanation of what is going on :-)

Libraries Loaded:

dplyr: For data manipulation using pipes (%>%) and functions like filter(), mutate(), and select().
lubridate: For handling dates and time calculations, making it easier to manipulate date formats.

Code Breakdown:

Ensure Start and End Dates Are in Date Format:

deployment_data <- deployment_data %>%
  mutate(start_date = as.Date(start_date, format = "%Y-%m-%d"),
         end_date = as.Date(end_date, format = "%Y-%m-%d"))

Converts the start_date and end_date columns of the filtered_deployment dataset into Date format. This ensures the dates are consistent for comparison in subsequent steps.

Get the Absolute Earliest Start Date and Latest End Date:

overall_start_date <- as.Date(min(deployment_data$start_date))
overall_end_date <- as.Date(max(deployment_data$end_date))

Finds the earliest deployment start date across all rows of the filtered_deployment dataset. This date will be used as a reference point to define periods for the detection matrix.

Define the Period Length:

period_length <- 10

Sets the length of each period to 10 days. These periods are used to group the data into intervals (Sampling Ocassions, S01 to S0xx) for occupancy modeling.

Calculate the total number of periods (sessions SO1 to SOX)

total_periods <- ceiling(as.numeric(difftime(overall_end_date, overall_start_date, units = "days")) / period_length)

Dynamically create column names for SO1 to SOX

so_columns <- paste0("SO", 1:total_periods)

Create the wolf_detection Dataset:

wolf_detection <- deployment_data %>%
  select(deployment_id, longitude, latitude, feature_type, elevation, ruggedness)

Extracts key columns (deployment_id, longitude, latitude, etc.) from the filtered_deployment dataset (which is a summer subset of the original dataset) to form a new dataset, wolf_detection. This dataset will eventually contain the detection matrix.

Initialize the Period Columns:

# Initialize the SO columns for detection data
# Here we explicitly create the SO columns with NA
for (col in so_columns) {
  wolf_detection[[col]] <- NA
}

Adds columns (SO1 to SOX) to represent the detection status of “wolf” in each 10-day period. These are initialised with NA (this means that by default, periods with no overlap will stay as NA).

Loop Through Each Deployment:

#initiate the loop
for (i in 1:nrow(wolf_detection)) {
  
  current_deployment_id <- wolf_detection$deployment_id[i]  # Current deployment id
  
  # Get start and end date for the current deployment
  start_date <- deployment_data$start_date[deployment_data$deployment_id == current_deployment_id][1]
  end_date <- deployment_data$end_date[deployment_data$deployment_id == current_deployment_id][1]
  
  # Loop through each period (SO1 to SOX)
  for (period in 1:total_periods) {
    
    # Calculate the period start and end dates
    period_start <- overall_start_date + days((period - 1) * period_length)
    period_end <- period_start + days(period_length - 1)
    
    # Check if the deployment overlaps with the current period
    if (max(as.Date(period_start), as.Date(start_date)) <= min(as.Date(period_end), as.Date(end_date))) {
      
      # Filter records specifically for the current deployment and species (Grey wolf)
      wolf_count <- camtraps %>%
        filter(deployment_id == current_deployment_id,
               common_name == "Grey Wolf",
               as.POSIXct(timestamp) >= as.POSIXct(period_start) & 
               as.POSIXct(timestamp) <= as.POSIXct(period_end))
      
      # Count occurrences
      wolf_count_num <- nrow(wolf_count)
      
      # Assign count to detection dataframe
      wolf_detection[[paste0("SO", period)]][i] <- ifelse(wolf_count_num > 0, 1, 0)
    } else {
      wolf_detection[[paste0("SO", period)]][i] <- NA
    }
  }
}

Iterates over each row of the wolf_detection dataset, corresponding to each deployment. For each deployment, it performs the following steps:

Extract Deployment-Specific Dates:

For each deployment, retrieves the start_date and end_date by matching the deployment_id. The [1] ensures that only a single value is extracted, even if there are multiple matches.
Loop Through the S01 to S05 Periods:

Loops over five 21-day periods (S01 to S05), dynamically setting the column name for each period (period_col).
Check If Deployment Overlaps with Period:

Checks if the deployment’s start and end dates overlap with the current period. If so, the code proceeds to check for detections during this period.
Count wolf Detections for the Period:

Filters the filtered_camtraps dataset (which contains camera trap data) to count how many times “Grey wolf” was detected during the current period for the current deployment. This is done by filtering on:
- The matching deployment_id.
- The species name common_name being “Grey wolf”.
- The timestamp falling within the period’s start and end dates.
Update the Period Column if wolf Was Detected:

If any “Grey wolf” detections were found (i.e., wolf_count > 0), the value for the respective period column (SO1 to SO5) is set to 1, indicating detection. If there are no detections but there was overlap, the value is set to 0. If they do not overlap, the value remains NA.

Save the wolf_detection Dataset as a CSV File and visualise outputs:

write.csv(wolf_detection, 
          "wolf_detection.csv", 
          row.names = FALSE)

# Get the first 10 rows
first_10_rows_wolf <- head(wolf_detection, 10)

# Create a nice table for visualization
kable(first_10_rows_wolf, format = "html", caption = "Detection matrix for wolf over 105 days, split by 10-day occasions") %>%
  kable_styling(full_width = FALSE, position = "left")

Detection matrix for wolf over 105 days, split by 10-day occasions
deployment_id	longitude	latitude	feature_type	elevation	ruggedness	SO1	SO2	SO3	SO4	SO5	SO6	SO7	SO8	SO9	SO10	SO11	SO12	SO13	SO14	SO15	SO16	SO17	SO18	SO19	SO20	SO21	SO22	SO23	SO24	SO25	SO26	SO27	SO28	SO29	SO30	SO31	SO32	SO33
D1_AA10-1	24.00188	45.62046	Trail game	1137	-4.155159	0	0	0	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA
D1_AA11-1	23.98999	45.60232	Road dirt	1073	-16.522067	0	0	0	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA
D1_AA12-1	24.00164	45.58462	Road dirt	1155	-9.000408	NA	NA	0	0	0	0	0	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA
D1_AA12-2	24.00787	45.57602	Road dirt	1158	-18.749835	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	1	0	0	0	0	NA	NA	NA	NA	NA	NA	NA
D1_AA13-2	23.98795	45.55146	Road dirt	1516	15.614101	NA	NA	NA	NA	NA	NA	0	0	0	0	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA
D1_AA14-1	23.99981	45.53323	Road dirt	1554	3.011621	NA	NA	0	0	1	0	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA
D1_AB11-1	24.05055	45.60223	Trail game	1029	-18.145098	0	0	0	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA
D1_AB12-2	24.03086	45.57991	Trail game	1230	-5.843720	NA	NA	NA	NA	NA	NA	0	0	0	0	0	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA
D1_AB13-1	24.02157	45.56658	Road dirt	1461	24.696217	NA	NA	0	0	0	0	0	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA
D1_AC10-1	24.09269	45.62117	Road dirt	1034	-12.185958	0	0	0	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA

References

MacKenzie, Darryl I., James D. Nichols, James E. Hines, Melinda G. Knutson, and Alan B. Franklin. 2003. “ESTIMATING SITE OCCUPANCY, COLONIZATION, AND LOCAL EXTINCTION WHEN A SPECIES IS DETECTED IMPERFECTLY.” Ecology 84 (8): 2200–2207. https://doi.org/10.1890/02-3090.

Wevers, Jolien, Natalie Beenaerts, Jim Casaer, Fridolin Zimmermann, Tom Artois, and Julien Fattebert. 2021. “Modelling Species Distribution from Camera Trap by-Catch Using a Scale-Optimized Occupancy Approach.” Edited by Marcus Rowcliffe and Rahel Sollmann. Remote Sensing in Ecology and Conservation 7 (3): 534–49. https://doi.org/10.1002/rse2.207.