Data Preparation - Occupancy Analysis (single species and multiple season - Wolf)

Author

Antonio Uzal

Published

October 1, 2024

Multi-Season Occupancy Modeling

Single season occupancy models assume population closure; there is no immigration or emigration of individuals, neither significant changes in population numbers due to mortality or births.

However, immigration, dispersal, deaths and births are natural occurrences within populations. Also, the occupancy of species might differ across periods, for instance wild ungulates have different feeding behaviours and habitat selection during the wet and dry seasons. Lastly, the detectability of species might not have a linear relationship with abundance, as different factors might affect the probability that a camera trap will detect the individuals. Some of these factors might vary during the year, for example vegetation cover, which differs across seasons for deciduous plants.

A primary goal of monitoring programs is often to document and understand trends in species occurrence over time, in addition to understanding spatiotemporal variability in occurrence in relation to a set of covariates.

Multi-season occupancy models measure the probability of initial occupancy (i.e. during season 1), but they also estimate the probability of occupancy in the next time period (given initial occupancy, colonisation and extinction) and the probability of detection. These models are also called Dynamic occupancy models (MacKenzie et al. 2003).

In multi-season occupancy models, the simple single-season+species model is complemented by introducing parameters for survival (or extinction) and colonisation probability.

The Task Ahead

We have been using the datasets produced by Wildlife Insights after identifying the content of the photographs and filtering out those with humans.

In previous sessions (Data Preparation and Occupancy Analyses for single species/single season studies), we have used a dataset based on the camera trapping data of Dr Bethany Smith, a former NTU PhD candidate. We mainly dealt with two datasets containing deployment data and raw image data. We also added a couple of covariates to the deployment dataset, selected the summer data and obtained the survey effort and the detection matrix.

  • The deployment dataset contains details of the camera trap location (longitude and latitude), the period of deployment and on which feature was placed (e.g. road, trail game). Elevation and ruggedness values at each deployment location were added to the dataset.

  • The summer camera traps raw data contains details related to the the deployment ID, date and time of capture and the animal captured

  • The effort data is defined as the number of days a camera was active during a sampling occasion.

  • The detection matrix for single-season occupancy models includes a SINGLE row for each deployment, with a column stating the detection (i.e. 0=no detection, 1=detection) for each sampling occasion and the columns containing the covariates of interest.

In this tutorial, we need to generate a detection matrix that contains the detections across multiple seasons, before we can analyse the data using an occupancy modeling framework.

Initial preparatory steps

Import datasets and standardise the date and time for further analyses

library(readr)
library(lubridate)
library(knitr)
library(tidyverse)
library(kableExtra)

# deployment data, including covariates
deployment_data<- read.csv("deployment+covariates.csv",
                            header = TRUE,
                            sep = ",",
                            stringsAsFactors = FALSE)

      
# upload the camera data (i.e raw photo data)
camtraps <- read.csv("images_LGDs.csv",
                     header = TRUE,
                     sep = ",",
                     stringsAsFactors = FALSE)

Effort as Detection Covariate

Effort can be used as a detection probability variable. Camera effort can be defined as the number of days a camera was active during a sampling occasion. Effort can affect species detection probability(Wevers et al. 2021). You will see some papers using this covariate, others do not. The following code will allow you to obtain this variable and save it as an independent dataset.

library(dplyr)
library(lubridate)

# Ensure start and end dates are in Date format (not POSIXct)
deployment_data <- deployment_data %>%
  mutate(start_date = as.Date(start_date, format = "%Y-%m-%d"),
         end_date = as.Date(end_date, format = "%Y-%m-%d"))

# Get the absolute earliest start date across all deployments
overall_start_date <- as.Date(min(deployment_data$start_date))

# Get the absolute latest end date across all deployments
overall_end_date <- as.Date(max(deployment_data$end_date))

# Define period length - in this case, it is 10 days
period_length <- 10

# Calculate the total number of periods required to cover the entire range
total_periods <- ceiling(as.numeric(difftime(overall_end_date, overall_start_date, units = "days")) / period_length)

# Dynamically create SO column names
so_columns <- paste0("SO", 1:total_periods)

# Initialize an empty data frame for the effort dataset with dynamic SO columns
effort_df <- data.frame(deployment_id = character(), 
                        matrix(NA, nrow = 0, ncol = total_periods),
                        start_date = as.Date(character()),  # Add start_date
                        end_date = as.Date(character()),   # Add end_date
                        stringsAsFactors = FALSE)

# Set the column names for the periods
colnames(effort_df)[2:(total_periods+1)] <- so_columns

# Loop through each deployment to calculate effort for each period (SO1 to SOX, depending on the deployment period)
for (index in 1:nrow(deployment_data)) {
  deployment_id <- deployment_data$deployment_id[index]
  start_date <- deployment_data$start_date[index]
  end_date <- deployment_data$end_date[index]
  total_days <- as.numeric(difftime(end_date, start_date, units = "days")) + 1
  
  # Initialize an array to hold the days for each period
  period_days <- rep(NA, total_periods)  # Adjust the number of periods dynamically
  
  # Loop through each period
  for (period in 0:(total_periods-1)) {
    period_start <- overall_start_date + days(period * period_length)
    period_end <- period_start + days(period_length - 1)
    
    # Check if the deployment overlaps with the current period
    if (period_end >= start_date && period_start <= end_date) {
      active_days <- max(0, as.numeric(difftime(min(end_date, period_end), max(start_date, period_start), units = "days")) + 1)
      period_days[period + 1] <- as.integer(active_days)  # Convert to integer to avoid decimals
    }
  }
  
  # Combine results into the effort_df
  effort_df <- rbind(effort_df, data.frame(deployment_id, t(period_days), start_date, end_date, stringsAsFactors = FALSE))
}

# Rename columns to match the SO format
colnames(effort_df)[2:(total_periods+1)] <- so_columns

# Clean up the dataset to remove duplicate columns
# No need for left_join if columns are already present in the main dataframe
effort_df <- effort_df %>%
  select(deployment_id, all_of(so_columns), start_date, end_date)

# Organize the dataset by earliest start_date
effort_df <- effort_df %>%
  arrange(start_date)

# Save the updated effort_df as a CSV file
write.csv(effort_df, 
          "effort_data.csv", 
          row.names = FALSE)

# Get the first 5 rows
first_5_rows <- head(effort_df, 10)

# Create a nice table for visualisation
kable(first_5_rows, format = "html", caption = "Effort data over 105 days, split by 10 days ocassions") %>%
  kable_styling(full_width = FALSE, position = "left")
Effort data over 105 days, split by 10 days ocassions
deployment_id SO1 SO2 SO3 SO4 SO5 SO6 SO7 SO8 SO9 SO10 SO11 SO12 SO13 SO14 SO15 SO16 SO17 SO18 SO19 SO20 SO21 SO22 SO23 SO24 SO25 SO26 SO27 SO28 SO29 SO30 SO31 SO32 SO33 start_date end_date
D1_AB11-1 10 10 7 NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA 2021-07-15 2021-08-10
D1_AC10-1 10 10 1 NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA 2021-07-15 2021-08-04
D1_AC11-1 10 10 7 NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA 2021-07-15 2021-08-10
D1_AC11-2 10 10 7 NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA 2021-07-15 2021-08-10
D1_AD10-1 10 10 1 NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA 2021-07-15 2021-08-04
D1_AA10-1 5 10 6 NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA 2021-07-20 2021-08-09
D1_AA11-1 5 10 7 NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA 2021-07-20 2021-08-10
D1_W13-1 5 10 7 NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA 2021-07-20 2021-08-10
D1_Y11-1 5 10 7 NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA 2021-07-20 2021-08-10
D1_Z10-1 5 3 NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA 2021-07-20 2021-07-27

The detection matrix

Similarly to single-season occupancy models, we need the structure to include a SINGLE row for each deployment, with a column stating the detection (i.e. 0=no detection, 1=detection) for each sampling occasion and the columns containing the covariates of interest.

To obtain this structure, we will need to move things around. As explained in previous tutorials, you can do this manually and if the dataset is not extensive it should not take you too long. Also it might be appropriate if you want to have a good sense for the nature of the data and reflect on effort and potential results. If you prefer to automatise the process, you can use the R code below, which includes an extensive explanation of what is going on :-)

Libraries Loaded:

  • dplyr: For data manipulation using pipes (%>%) and functions like filter(), mutate(), and select().

  • lubridate: For handling dates and time calculations, making it easier to manipulate date formats.

Code Breakdown:

Ensure Start and End Dates Are in Date Format:

deployment_data <- deployment_data %>%
  mutate(start_date = as.Date(start_date, format = "%Y-%m-%d"),
         end_date = as.Date(end_date, format = "%Y-%m-%d"))
  • Converts the start_date and end_date columns of the filtered_deployment dataset into Date format. This ensures the dates are consistent for comparison in subsequent steps.

Get the Absolute Earliest Start Date and Latest End Date:

overall_start_date <- as.Date(min(deployment_data$start_date))
overall_end_date <- as.Date(max(deployment_data$end_date))
  • Finds the earliest deployment start date across all rows of the filtered_deployment dataset. This date will be used as a reference point to define periods for the detection matrix.

Define the Period Length:

period_length <- 10

Sets the length of each period to 10 days. These periods are used to group the data into intervals (Sampling Ocassions, S01 to S0xx) for occupancy modeling.

Calculate the total number of periods (sessions SO1 to SOX)

total_periods <- ceiling(as.numeric(difftime(overall_end_date, overall_start_date, units = "days")) / period_length)

Dynamically create column names for SO1 to SOX

so_columns <- paste0("SO", 1:total_periods)

Create the wolf_detection Dataset:

wolf_detection <- deployment_data %>%
  select(deployment_id, longitude, latitude, feature_type, elevation, ruggedness)
  • Extracts key columns (deployment_id, longitude, latitude, etc.) from the filtered_deployment dataset (which is a summer subset of the original dataset) to form a new dataset, wolf_detection. This dataset will eventually contain the detection matrix.

Initialize the Period Columns:

# Initialize the SO columns for detection data
# Here we explicitly create the SO columns with NA
for (col in so_columns) {
  wolf_detection[[col]] <- NA
}
  • Adds columns (SO1 to SOX) to represent the detection status of “wolf” in each 10-day period. These are initialised with NA (this means that by default, periods with no overlap will stay as NA).

Loop Through Each Deployment:

#initiate the loop
for (i in 1:nrow(wolf_detection)) {
  
  current_deployment_id <- wolf_detection$deployment_id[i]  # Current deployment id
  
  # Get start and end date for the current deployment
  start_date <- deployment_data$start_date[deployment_data$deployment_id == current_deployment_id][1]
  end_date <- deployment_data$end_date[deployment_data$deployment_id == current_deployment_id][1]
  
  # Loop through each period (SO1 to SOX)
  for (period in 1:total_periods) {
    
    # Calculate the period start and end dates
    period_start <- overall_start_date + days((period - 1) * period_length)
    period_end <- period_start + days(period_length - 1)
    
    # Check if the deployment overlaps with the current period
    if (max(as.Date(period_start), as.Date(start_date)) <= min(as.Date(period_end), as.Date(end_date))) {
      
      # Filter records specifically for the current deployment and species (Grey wolf)
      wolf_count <- camtraps %>%
        filter(deployment_id == current_deployment_id,
               common_name == "Grey Wolf",
               as.POSIXct(timestamp) >= as.POSIXct(period_start) & 
               as.POSIXct(timestamp) <= as.POSIXct(period_end))
      
      # Count occurrences
      wolf_count_num <- nrow(wolf_count)
      
      # Assign count to detection dataframe
      wolf_detection[[paste0("SO", period)]][i] <- ifelse(wolf_count_num > 0, 1, 0)
    } else {
      wolf_detection[[paste0("SO", period)]][i] <- NA
    }
  }
}

Iterates over each row of the wolf_detection dataset, corresponding to each deployment. For each deployment, it performs the following steps:

  1. Extract Deployment-Specific Dates:

    For each deployment, retrieves the start_date and end_date by matching the deployment_id. The [1] ensures that only a single value is extracted, even if there are multiple matches.

  2. Loop Through the S01 to S05 Periods:

    Loops over five 21-day periods (S01 to S05), dynamically setting the column name for each period (period_col).

  3. Check If Deployment Overlaps with Period:

    Checks if the deployment’s start and end dates overlap with the current period. If so, the code proceeds to check for detections during this period.

  4. Count wolf Detections for the Period:

    Filters the filtered_camtraps dataset (which contains camera trap data) to count how many times “Grey wolf” was detected during the current period for the current deployment. This is done by filtering on:

    • The matching deployment_id.

    • The species name common_name being “Grey wolf”.

    • The timestamp falling within the period’s start and end dates.

  5. Update the Period Column if wolf Was Detected:

    If any “Grey wolf” detections were found (i.e., wolf_count > 0), the value for the respective period column (SO1 to SO5) is set to 1, indicating detection. If there are no detections but there was overlap, the value is set to 0. If they do not overlap, the value remains NA.

Save the wolf_detection Dataset as a CSV File and visualise outputs:

write.csv(wolf_detection, 
          "wolf_detection.csv", 
          row.names = FALSE)

# Get the first 10 rows
first_10_rows_wolf <- head(wolf_detection, 10)

# Create a nice table for visualization
kable(first_10_rows_wolf, format = "html", caption = "Detection matrix for wolf over 105 days, split by 10-day occasions") %>%
  kable_styling(full_width = FALSE, position = "left")
Detection matrix for wolf over 105 days, split by 10-day occasions
deployment_id longitude latitude feature_type elevation ruggedness SO1 SO2 SO3 SO4 SO5 SO6 SO7 SO8 SO9 SO10 SO11 SO12 SO13 SO14 SO15 SO16 SO17 SO18 SO19 SO20 SO21 SO22 SO23 SO24 SO25 SO26 SO27 SO28 SO29 SO30 SO31 SO32 SO33
D1_AA10-1 24.00188 45.62046 Trail game 1137 -4.155159 0 0 0 NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA
D1_AA11-1 23.98999 45.60232 Road dirt 1073 -16.522067 0 0 0 NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA
D1_AA12-1 24.00164 45.58462 Road dirt 1155 -9.000408 NA NA 0 0 0 0 0 NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA
D1_AA12-2 24.00787 45.57602 Road dirt 1158 -18.749835 NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA 1 0 0 0 0 NA NA NA NA NA NA NA
D1_AA13-2 23.98795 45.55146 Road dirt 1516 15.614101 NA NA NA NA NA NA 0 0 0 0 NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA
D1_AA14-1 23.99981 45.53323 Road dirt 1554 3.011621 NA NA 0 0 1 0 NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA
D1_AB11-1 24.05055 45.60223 Trail game 1029 -18.145098 0 0 0 NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA
D1_AB12-2 24.03086 45.57991 Trail game 1230 -5.843720 NA NA NA NA NA NA 0 0 0 0 0 NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA
D1_AB13-1 24.02157 45.56658 Road dirt 1461 24.696217 NA NA 0 0 0 0 0 NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA
D1_AC10-1 24.09269 45.62117 Road dirt 1034 -12.185958 0 0 0 NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA

References

MacKenzie, Darryl I., James D. Nichols, James E. Hines, Melinda G. Knutson, and Alan B. Franklin. 2003. “ESTIMATING SITE OCCUPANCY, COLONIZATION, AND LOCAL EXTINCTION WHEN A SPECIES IS DETECTED IMPERFECTLY.” Ecology 84 (8): 2200–2207. https://doi.org/10.1890/02-3090.
Wevers, Jolien, Natalie Beenaerts, Jim Casaer, Fridolin Zimmermann, Tom Artois, and Julien Fattebert. 2021. “Modelling Species Distribution from Camera Trap by-Catch Using a Scale-Optimized Occupancy Approach.” Edited by Marcus Rowcliffe and Rahel Sollmann. Remote Sensing in Ecology and Conservation 7 (3): 534–49. https://doi.org/10.1002/rse2.207.