library(readr)
library(lubridate)
library(knitr)
library(tidyverse)
library(kableExtra)
# deployment data, including covariates
deployment_data<- read.csv("deployment+covariates.csv",
header = TRUE,
sep = ",",
stringsAsFactors = FALSE)
# upload the camera data (i.e raw photo data)
camtraps <- read.csv("images_LGDs.csv",
header = TRUE,
sep = ",",
stringsAsFactors = FALSE)Data Preparation - Occupancy Analysis (single species and multiple season - Wolf)
Multi-Season Occupancy Modeling
Single season occupancy models assume population closure; there is no immigration or emigration of individuals, neither significant changes in population numbers due to mortality or births.
However, immigration, dispersal, deaths and births are natural occurrences within populations. Also, the occupancy of species might differ across periods, for instance wild ungulates have different feeding behaviours and habitat selection during the wet and dry seasons. Lastly, the detectability of species might not have a linear relationship with abundance, as different factors might affect the probability that a camera trap will detect the individuals. Some of these factors might vary during the year, for example vegetation cover, which differs across seasons for deciduous plants.
A primary goal of monitoring programs is often to document and understand trends in species occurrence over time, in addition to understanding spatiotemporal variability in occurrence in relation to a set of covariates.
Multi-season occupancy models measure the probability of initial occupancy (i.e. during season 1), but they also estimate the probability of occupancy in the next time period (given initial occupancy, colonisation and extinction) and the probability of detection. These models are also called Dynamic occupancy models (MacKenzie et al. 2003).
In multi-season occupancy models, the simple single-season+species model is complemented by introducing parameters for survival (or extinction) and colonisation probability.
The Task Ahead
We have been using the datasets produced by Wildlife Insights after identifying the content of the photographs and filtering out those with humans.
In previous sessions (Data Preparation and Occupancy Analyses for single species/single season studies), we have used a dataset based on the camera trapping data of Dr Bethany Smith, a former NTU PhD candidate. We mainly dealt with two datasets containing deployment data and raw image data. We also added a couple of covariates to the deployment dataset, selected the summer data and obtained the survey effort and the detection matrix.
The deployment dataset contains details of the camera trap location (longitude and latitude), the period of deployment and on which feature was placed (e.g. road, trail game). Elevation and ruggedness values at each deployment location were added to the dataset.
The summer camera traps raw data contains details related to the the deployment ID, date and time of capture and the animal captured
The effort data is defined as the number of days a camera was active during a sampling occasion.
The detection matrix for single-season occupancy models includes a SINGLE row for each deployment, with a column stating the detection (i.e. 0=no detection, 1=detection) for each sampling occasion and the columns containing the covariates of interest.
In this tutorial, we need to generate a detection matrix that contains the detections across multiple seasons, before we can analyse the data using an occupancy modeling framework.
Initial preparatory steps
Import datasets and standardise the date and time for further analyses
Effort as Detection Covariate
Effort can be used as a detection probability variable. Camera effort can be defined as the number of days a camera was active during a sampling occasion. Effort can affect species detection probability(Wevers et al. 2021). You will see some papers using this covariate, others do not. The following code will allow you to obtain this variable and save it as an independent dataset.
library(dplyr)
library(lubridate)
# Ensure start and end dates are in Date format (not POSIXct)
deployment_data <- deployment_data %>%
mutate(start_date = as.Date(start_date, format = "%Y-%m-%d"),
end_date = as.Date(end_date, format = "%Y-%m-%d"))
# Get the absolute earliest start date across all deployments
overall_start_date <- as.Date(min(deployment_data$start_date))
# Get the absolute latest end date across all deployments
overall_end_date <- as.Date(max(deployment_data$end_date))
# Define period length - in this case, it is 10 days
period_length <- 10
# Calculate the total number of periods required to cover the entire range
total_periods <- ceiling(as.numeric(difftime(overall_end_date, overall_start_date, units = "days")) / period_length)
# Dynamically create SO column names
so_columns <- paste0("SO", 1:total_periods)
# Initialize an empty data frame for the effort dataset with dynamic SO columns
effort_df <- data.frame(deployment_id = character(),
matrix(NA, nrow = 0, ncol = total_periods),
start_date = as.Date(character()), # Add start_date
end_date = as.Date(character()), # Add end_date
stringsAsFactors = FALSE)
# Set the column names for the periods
colnames(effort_df)[2:(total_periods+1)] <- so_columns
# Loop through each deployment to calculate effort for each period (SO1 to SOX, depending on the deployment period)
for (index in 1:nrow(deployment_data)) {
deployment_id <- deployment_data$deployment_id[index]
start_date <- deployment_data$start_date[index]
end_date <- deployment_data$end_date[index]
total_days <- as.numeric(difftime(end_date, start_date, units = "days")) + 1
# Initialize an array to hold the days for each period
period_days <- rep(NA, total_periods) # Adjust the number of periods dynamically
# Loop through each period
for (period in 0:(total_periods-1)) {
period_start <- overall_start_date + days(period * period_length)
period_end <- period_start + days(period_length - 1)
# Check if the deployment overlaps with the current period
if (period_end >= start_date && period_start <= end_date) {
active_days <- max(0, as.numeric(difftime(min(end_date, period_end), max(start_date, period_start), units = "days")) + 1)
period_days[period + 1] <- as.integer(active_days) # Convert to integer to avoid decimals
}
}
# Combine results into the effort_df
effort_df <- rbind(effort_df, data.frame(deployment_id, t(period_days), start_date, end_date, stringsAsFactors = FALSE))
}
# Rename columns to match the SO format
colnames(effort_df)[2:(total_periods+1)] <- so_columns
# Clean up the dataset to remove duplicate columns
# No need for left_join if columns are already present in the main dataframe
effort_df <- effort_df %>%
select(deployment_id, all_of(so_columns), start_date, end_date)
# Organize the dataset by earliest start_date
effort_df <- effort_df %>%
arrange(start_date)
# Save the updated effort_df as a CSV file
write.csv(effort_df,
"effort_data.csv",
row.names = FALSE)
# Get the first 5 rows
first_5_rows <- head(effort_df, 10)
# Create a nice table for visualisation
kable(first_5_rows, format = "html", caption = "Effort data over 105 days, split by 10 days ocassions") %>%
kable_styling(full_width = FALSE, position = "left")| deployment_id | SO1 | SO2 | SO3 | SO4 | SO5 | SO6 | SO7 | SO8 | SO9 | SO10 | SO11 | SO12 | SO13 | SO14 | SO15 | SO16 | SO17 | SO18 | SO19 | SO20 | SO21 | SO22 | SO23 | SO24 | SO25 | SO26 | SO27 | SO28 | SO29 | SO30 | SO31 | SO32 | SO33 | start_date | end_date |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| D1_AB11-1 | 10 | 10 | 7 | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | 2021-07-15 | 2021-08-10 |
| D1_AC10-1 | 10 | 10 | 1 | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | 2021-07-15 | 2021-08-04 |
| D1_AC11-1 | 10 | 10 | 7 | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | 2021-07-15 | 2021-08-10 |
| D1_AC11-2 | 10 | 10 | 7 | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | 2021-07-15 | 2021-08-10 |
| D1_AD10-1 | 10 | 10 | 1 | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | 2021-07-15 | 2021-08-04 |
| D1_AA10-1 | 5 | 10 | 6 | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | 2021-07-20 | 2021-08-09 |
| D1_AA11-1 | 5 | 10 | 7 | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | 2021-07-20 | 2021-08-10 |
| D1_W13-1 | 5 | 10 | 7 | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | 2021-07-20 | 2021-08-10 |
| D1_Y11-1 | 5 | 10 | 7 | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | 2021-07-20 | 2021-08-10 |
| D1_Z10-1 | 5 | 3 | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | 2021-07-20 | 2021-07-27 |
The detection matrix
Similarly to single-season occupancy models, we need the structure to include a SINGLE row for each deployment, with a column stating the detection (i.e. 0=no detection, 1=detection) for each sampling occasion and the columns containing the covariates of interest.
To obtain this structure, we will need to move things around. As explained in previous tutorials, you can do this manually and if the dataset is not extensive it should not take you too long. Also it might be appropriate if you want to have a good sense for the nature of the data and reflect on effort and potential results. If you prefer to automatise the process, you can use the R code below, which includes an extensive explanation of what is going on :-)
Libraries Loaded:
dplyr: For data manipulation using pipes (
%>%) and functions likefilter(),mutate(), andselect().lubridate: For handling dates and time calculations, making it easier to manipulate date formats.
Code Breakdown:
Ensure Start and End Dates Are in Date Format:
deployment_data <- deployment_data %>%
mutate(start_date = as.Date(start_date, format = "%Y-%m-%d"),
end_date = as.Date(end_date, format = "%Y-%m-%d"))- Converts the
start_dateandend_datecolumns of thefiltered_deploymentdataset intoDateformat. This ensures the dates are consistent for comparison in subsequent steps.
Get the Absolute Earliest Start Date and Latest End Date:
overall_start_date <- as.Date(min(deployment_data$start_date))
overall_end_date <- as.Date(max(deployment_data$end_date))- Finds the earliest deployment start date across all rows of the
filtered_deploymentdataset. This date will be used as a reference point to define periods for the detection matrix.
Define the Period Length:
period_length <- 10Sets the length of each period to 10 days. These periods are used to group the data into intervals (Sampling Ocassions, S01 to S0xx) for occupancy modeling.
Calculate the total number of periods (sessions SO1 to SOX)
total_periods <- ceiling(as.numeric(difftime(overall_end_date, overall_start_date, units = "days")) / period_length)Dynamically create column names for SO1 to SOX
so_columns <- paste0("SO", 1:total_periods)Create the wolf_detection Dataset:
wolf_detection <- deployment_data %>%
select(deployment_id, longitude, latitude, feature_type, elevation, ruggedness)- Extracts key columns (
deployment_id,longitude,latitude, etc.) from thefiltered_deploymentdataset (which is a summer subset of the original dataset) to form a new dataset,wolf_detection. This dataset will eventually contain the detection matrix.
Initialize the Period Columns:
# Initialize the SO columns for detection data
# Here we explicitly create the SO columns with NA
for (col in so_columns) {
wolf_detection[[col]] <- NA
}- Adds columns (
SO1toSOX) to represent the detection status of “wolf” in each 10-day period. These are initialised withNA(this means that by default, periods with no overlap will stay asNA).
Loop Through Each Deployment:
#initiate the loop
for (i in 1:nrow(wolf_detection)) {
current_deployment_id <- wolf_detection$deployment_id[i] # Current deployment id
# Get start and end date for the current deployment
start_date <- deployment_data$start_date[deployment_data$deployment_id == current_deployment_id][1]
end_date <- deployment_data$end_date[deployment_data$deployment_id == current_deployment_id][1]
# Loop through each period (SO1 to SOX)
for (period in 1:total_periods) {
# Calculate the period start and end dates
period_start <- overall_start_date + days((period - 1) * period_length)
period_end <- period_start + days(period_length - 1)
# Check if the deployment overlaps with the current period
if (max(as.Date(period_start), as.Date(start_date)) <= min(as.Date(period_end), as.Date(end_date))) {
# Filter records specifically for the current deployment and species (Grey wolf)
wolf_count <- camtraps %>%
filter(deployment_id == current_deployment_id,
common_name == "Grey Wolf",
as.POSIXct(timestamp) >= as.POSIXct(period_start) &
as.POSIXct(timestamp) <= as.POSIXct(period_end))
# Count occurrences
wolf_count_num <- nrow(wolf_count)
# Assign count to detection dataframe
wolf_detection[[paste0("SO", period)]][i] <- ifelse(wolf_count_num > 0, 1, 0)
} else {
wolf_detection[[paste0("SO", period)]][i] <- NA
}
}
}Iterates over each row of the wolf_detection dataset, corresponding to each deployment. For each deployment, it performs the following steps:
Extract Deployment-Specific Dates:
For each deployment, retrieves the
start_dateandend_dateby matching thedeployment_id. The[1]ensures that only a single value is extracted, even if there are multiple matches.Loop Through the S01 to S05 Periods:
Loops over five 21-day periods (S01 to S05), dynamically setting the column name for each period (
period_col).Check If Deployment Overlaps with Period:
Checks if the deployment’s start and end dates overlap with the current period. If so, the code proceeds to check for detections during this period.
Count wolf Detections for the Period:
Filters the
filtered_camtrapsdataset (which contains camera trap data) to count how many times “Grey wolf” was detected during the current period for the current deployment. This is done by filtering on:The matching
deployment_id.The species name
common_namebeing “Grey wolf”.The
timestampfalling within the period’s start and end dates.
Update the Period Column if wolf Was Detected:
If any “Grey wolf” detections were found (i.e.,
wolf_count > 0), the value for the respective period column (SO1toSO5) is set to 1, indicating detection. If there are no detections but there was overlap, the value is set to0. If they do not overlap, the value remainsNA.
Save the wolf_detection Dataset as a CSV File and visualise outputs:
write.csv(wolf_detection,
"wolf_detection.csv",
row.names = FALSE)
# Get the first 10 rows
first_10_rows_wolf <- head(wolf_detection, 10)
# Create a nice table for visualization
kable(first_10_rows_wolf, format = "html", caption = "Detection matrix for wolf over 105 days, split by 10-day occasions") %>%
kable_styling(full_width = FALSE, position = "left")| deployment_id | longitude | latitude | feature_type | elevation | ruggedness | SO1 | SO2 | SO3 | SO4 | SO5 | SO6 | SO7 | SO8 | SO9 | SO10 | SO11 | SO12 | SO13 | SO14 | SO15 | SO16 | SO17 | SO18 | SO19 | SO20 | SO21 | SO22 | SO23 | SO24 | SO25 | SO26 | SO27 | SO28 | SO29 | SO30 | SO31 | SO32 | SO33 |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| D1_AA10-1 | 24.00188 | 45.62046 | Trail game | 1137 | -4.155159 | 0 | 0 | 0 | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA |
| D1_AA11-1 | 23.98999 | 45.60232 | Road dirt | 1073 | -16.522067 | 0 | 0 | 0 | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA |
| D1_AA12-1 | 24.00164 | 45.58462 | Road dirt | 1155 | -9.000408 | NA | NA | 0 | 0 | 0 | 0 | 0 | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA |
| D1_AA12-2 | 24.00787 | 45.57602 | Road dirt | 1158 | -18.749835 | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | 1 | 0 | 0 | 0 | 0 | NA | NA | NA | NA | NA | NA | NA |
| D1_AA13-2 | 23.98795 | 45.55146 | Road dirt | 1516 | 15.614101 | NA | NA | NA | NA | NA | NA | 0 | 0 | 0 | 0 | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA |
| D1_AA14-1 | 23.99981 | 45.53323 | Road dirt | 1554 | 3.011621 | NA | NA | 0 | 0 | 1 | 0 | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA |
| D1_AB11-1 | 24.05055 | 45.60223 | Trail game | 1029 | -18.145098 | 0 | 0 | 0 | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA |
| D1_AB12-2 | 24.03086 | 45.57991 | Trail game | 1230 | -5.843720 | NA | NA | NA | NA | NA | NA | 0 | 0 | 0 | 0 | 0 | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA |
| D1_AB13-1 | 24.02157 | 45.56658 | Road dirt | 1461 | 24.696217 | NA | NA | 0 | 0 | 0 | 0 | 0 | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA |
| D1_AC10-1 | 24.09269 | 45.62117 | Road dirt | 1034 | -12.185958 | 0 | 0 | 0 | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA |