Dynamic Occupancy model data simulation

Define spatial region

The following document describes the steps to simulate multiple species spatial-temporal occupancy patterns under detectability while accounting for spatial autocorrelation and meta-population dynamics. Denote quadrant \(A\) with area \(|A|\) the region where species occur. A discretization over \(A\) into \(n\) non-overlapping disjoint sub regions \(A_1,\ldots,A_n = A\) produces a square lattice with \(n\) grid cells that can be occupied or not by a species \(i\). First the spatial resolution is defined to be a \(10 \times 10\) area for region \(A\) with a cell size of 1, yielding to \(n=100\) grid cells. Then, a square lattice region is set up and the euclidean distance to each cell is calculated.

quad.size = 10 # Total Area A
cell.size = 1 # non overlapping disjoint subregions A1...Ak = A
n.cell <- (quad.size/cell.size)^2 # number of grid cells (sites)

# Set up a square lattice region A

simgrid <- expand.grid(seq(1,quad.size,by=cell.size), seq(1,quad.size,by=cell.size))

# Distance matrix

distance <- as.matrix(dist(simgrid))
dim(distance)

## [1] 100 100

Simulate Spatial Autocovariate

To introduce spatial autocorrelation into species occurrences,a spatially autocorrelated covariate \(x(s)\) for the \(s\)th location is simulated as follows: \(x(s) \sim \mbox{MVN}(0, \Sigma)\). Where the covariance matrix \(\Sigma\) holds the spatial autocorrelation structure, which in this particular case it will be a function of the distance \(D(\delta_{ij})\) representing the correlation decay between pairs of points as the distance between them increases. The relationship between correlation and distance is modeled by the exponential decay function \(D(\delta_{ij}) = e^{-\phi\delta_{ij}}\) where \(\phi\) controls the rate of decay. Whilst \(\phi = 0\) yields to an homogeneous spatial field, higher values of \(\phi\) produces well defined spatial clusters. This can be seen on the complete Shiny App.

library(MASS) # to call for random variable multivariate normal generator 
library(raster)  # to produce raster from matrix
phi <- 0.05 # parameter controling how rapidly correlation decreases with distance
xs <- mvrnorm(n = 1, rep(0, n.cell), exp(-phi * distance))
# Generate raster for plotting
Xraster <- rasterFromXYZ(cbind(simgrid[, 1:2] - 0.5, xs))

Define site level effects

To estimate spatiotemporal species occurrence patterns, first the number of species \(i\) and number of years \(t\) must be defined. For this scenario a total of 5 species that occur through 4 years are being specified. The number of sites \(j\) is determined by the number of grid cells according to the spatial resolution of the study (Spatial scale dependency). For simplicity a single site-level effect is being simulated for site \(j\), i.e. \(x_{1j} \sim \mathrm{Uniform}(0,1)\). Assuming sites where visited a single time but some sort of sampling effort covarite \(x_{2jt}\) is available at each site for each year(which must be linearly independent from \(x_{1j}\) e.g. sampling hours), then \(x_{2jt} \sim \mathrm{truncated\ norm} (\mu_t,33)\) where \(\mu_t = \{22,25,17,20\}\) with constant variance \(\sigma = 33\)

nsite <-  n.cell    # number of sites
nspec <- 5  # num of species
nyear <- 4  # num of years

# Covariates
# Single covariate 
x1 <- runif(nsite,0,1) # site-level covariate constant over time
# 
library(truncnorm)

# effort per year
set.seed(2020)
effort <- matrix(0,ncol = nyear,nrow = nsite)
effort.mean <- c(22,25,17,20)
for( t in 1:nyear){
  effort[,t] <- rtruncnorm(nsite,0,500,effort.mean[t],33)
}

Initial occupancy state

To simulate species through time, one must defined the initial occupancy state. To do so, first the mean baseline occupancy across all species is defined, i.e. \(\mu_\psi = 0.65\), this is site-level occupancy probability. Then, species specific base line occupancy probability is defined based on the logit scale of this hyperparameter as \(\beta_{0,i} \sim \mathrm{Normal}(\mathrm{logit}(\mu_\psi),1)\). Moreover, the differential effect that site-level covariate \(x_{1j}\) has on species \(i\) is introduced by defining different slopes for each species, i.e. \(\beta_{1,i} \sim \mathrm{Normal}(0.75,1.2)\). similarly, the species - specific spatial effect is included as \(\beta_{2,i} \sim \mathrm{Normal}(1.5,1)I(\phi > 0)\) only when the spatial structure is present, i.e. \(\phi > 0\). Thus, the initial occupancy state \(z_{ij1}\) for species \(i\) at site \(j\) in the first year is:

\[\begin{align} z_{0,1} &\sim \mathrm{Bernoulli}(\psi_{i,1}) \nonumber\\ \mathrm{logit}(\psi_{i,1}) &= \beta_{0,i} + \beta_{1,i} x_{1j} + \beta_{2,i} x_{2j} \end{align}\]

# intercept
mean.psi <- 0.65 # baseline occupancy across all species (community mean or site -level baseline occupancy)
mu.lpsi <- qlogis(mean.psi) # baseline occupancy logit scale
sig.lpsi <- 1 # baseline occupancy variance logit scale
beta0 <- rnorm(nspec, mu.lpsi, sig.lpsi) #species specific intercept (baseline occupancy for each species)

# slope - site level effects
mu.beta.lpsi <- 0.75 # mean slope across all species 
sig.beta.lpsi <- 1.2 # mean variance across all species
beta1 <- rnorm(nspec, mu.beta.lpsi, sig.beta.lpsi) # species specific slopes


# spatial effect species specific coefficient 
mu.beta.lxs <- 1.5 # mean slope across all species 
sig.beta.lxs <- 1 # mean variance across all species
beta2 <- rnorm(nspec, mu.beta.lxs, sig.beta.lxs) # species specific slopes
beta2 <- if(phi==0)rep(0,nspec)else rgamma(nspec, mu.beta.lxs, sig.beta.lxs) # species specific slopes



set.seed(2020)

# initial occupancy probability and occurrence
psi.i0 <- z0 <- array(dim = c(nsite, nspec))

for (i in 1:nspec) {
  psi.i0[,i] <- plogis(beta0[i] + beta1[i] * x1 +beta2[i] * xs)
  z0[, i] <- rbinom(nsite,1,psi.i0)
}

Temporal Occupancy states

Subsequent occupancy year at time \(t+1\) are defined based on species - specific colonization and survival probabilities. Let \(\mu_{\gamma_{0}} = 0.5\) the base line probability of any site being colonized by a species (site level colonization probability) and \(\mu_{\gamma_{1}} = 0.65\) mean slope for site effect on colonization across the species community. Then, species specific intercept colonization and site level effect slopes are drawn from \(\gamma_{0,i} \sim \mathrm{Normal}(\mathrm{logit}(\mu_{\gamma_{0}}),1)\) and \(\gamma_{1,i} \sim \mathrm{Normal}(\mathrm{logit}(\mu_{\gamma_{1}}),1)\) .

# Colonization

# Intercept
mean.gamma <- 0.5 # baseline colonization probability across all species
mu.lgamma <- qlogis(mean.gamma) # colonization baseline logit scale
sig.lgamma <- 1 # baseline colonization variance 
gamma0 <- rnorm(nspec, mu.lgamma, sig.lgamma) # species specific intercept (baseline colonization for each species)

# Slope  
mu.beta.lgamma <- 0.65 # mean slope for site effect on colonization across all species
sigma.beta.lgamma <-  1 # mean slope variance for site effect on colonization across all species
set.seed(2020)
gamma1 <- rnorm(nspec, mu.beta.lgamma, sigma.beta.lgamma) # species specific slopes

Similarly, species specific survival probabilities are defined as follows:

\[\begin{align} \phi_{0,i} \sim \mathrm{Normal}(\mathrm{logit}(\mu_{\phi_{0}}),1)\nonumber\\ \phi_{1,i} \sim \mathrm{Normal}(\mathrm{logit}(\mu_{\phi_{1}}),1) \end{align}\]

Where \(\mu_{\phi_{0}} = 0.5\) and \(\mu_{\phi_{1}} =0.5\) are site-level parameters.

# Survival

# Intercept
mean.phi <- 0.5 # baseline survival probability across all species
mu.lphi <- qlogis(mean.phi) # survival baseline logit scale
sig.lphi <- 1 # baseline colonization variance 
phi0 <- rnorm(nspec, mu.lphi, sig.lphi) # species specific intercept (baseline colonization for each species)

# Slope  
mu.beta.lphi <- 0.5 # mean slope for site effect on colonization across all species
sigma.beta.lphi <-  1 # mean slope variance for site effect on colonization across all species
set.seed(2020)
phi1 <- rnorm(nspec, mu.beta.lphi, sigma.beta.lphi) # species specific slopes

Additionally, spatial effects on colonization and survival are modeled if \(\phi >0\) . Thus, species-specific probabilities of colonization and survival at each site are defines as:

\[\begin{align} \mathrm{logit}(\gamma_{i,j}) = \gamma_{0,i}+ \gamma_{1,i} * x_{1j} + \gamma_{2,i} x_{2j} + \gamma_{2,i} x_{2j}^2 \nonumber \\ \mathrm{logit}(\phi_{i,j}) = \phi_{0,i}+ \phi_{1,i} * x_{1j} + \phi_{2,i} x_{2j} + \phi_{2,i} x_{2j}^2 \end{align}\]

Colonization and survival probabilities are assumed to be constant across all years and only vary between species and sites (given site level effects are present). According to Bled et al (2011), introducing a quadratic term in \(x_{2j}^2\) allows to account Alle effect as \(\gamma\) or \(\phi\) vary in such a way that may lead to a peak at intermediate values of \(x_{2j}\). Alle effect is a density dependent process in which recruitment and survival rates decay as a result of a reduced effective population size.

# Spatial effects
set.seed(2020)
gamma2 <- rnorm(nspec, 0.7, 1) 
gamma3 <- rnorm(nspec, 0.8, 1.2) 
phi2 <- rnorm(nspec, 0.5, 1) 
phi3 <- rnorm(nspec, 0.65, 1.2)


# Colonization and survivial probabilities constant over t Include spatial covariate here?
gamma <- matrix(0,ncol=nspec,nrow = nsite)
phi <- matrix(0,ncol=nspec,nrow = nsite)

for (i in 1:nspec){
  gamma[,i] <- plogis( gamma0[i] + gamma1[i]*x1 + gamma2[i]*xs + gamma3[i]*xs**2) # colonization probability for each species
  phi[,i] <- plogis( phi0[i] + phi1[i]*x1 + phi2[i]*xs + phi3[i]*xs**2) # colonization probability for each species
  
}

The state process can be defined recursively for the occupancy states \(z_{ijt}\) on time \(t>1\) for species \(i\) at each site \(j\).

\[\begin{align} z_{i,j,t} &\sim \mathrm{Bernoulli}(\pi_{i,j,t}) \nonumber\\ \pi_{i,j,t} &= z_{i,j,t-1} \times \phi_{i,j} + (1-z_{i,j,t-1})\times \gamma_{i,j} \end{align}\]

Thus, if site \(j\) is is unoccupied (\(1-z_{ijt}\)) on t then it can be colonized on \(t+1\) with porbability \(\gamma_i\) for species \(i\). Likewise, if site is occupied (\(z_{ijt}\)) in \(t\), it remains occupied (survives) with a probability of \(\phi_i\) on \(t + 1\).

# Add initial occupancy state and probability into the z and pi arrays respectively
pi <- z <- array(dim = c(nsite, nyear, nspec))


for( i in 1:nspec){
  z[,1,i] <- z0[,i]
  pi[,1,i] <- psi.i0[,i]
}


# define recursively occupancy states z on time t for species i at each site j 
# I assume species-specific colonization and survival probabilities only depend on site level effects and they do not change over time

set.seed(2020)
for(t in 2:nyear){
  for (i in 1:nspec){
    pi[,t,i] <- z[,t-1,i]*phi[,i]+(1-z[,t-1,i])*gamma[,i]
    z[,t,i] <- rbinom(nsite,1,pi[,t,i]) 
  }
}

Spatiotemporal occupancy states \(z_{ijt}\) can then be visualized by converting them to raster images according to the spatial extent defined at the beginning of this report. First, the array \(z_{ijt}\) is converted to a \(10 \times 10\) matrix and merged into a list for each year. Then, each raster layer is declared as a categorical variable where 1 denotes a presence and 0 an absence at each grid cell. Finally, the process is repetead again for each individual species and stored as a list named TrueOcc_stack.

TrueOcc_stackyears=list()
TrueOcc_stack=list()

for(i in 1:nspec) {
  for(t in 1:nyear) {
    TrueOcc_stackyears[[t]] <-  raster(matrix(z[,t,i], quad.size/cell.size, quad.size/cell.size),xmn = 0,
                                       xmx = quad.size/cell.size, ymn = 0, ymx = quad.size/cell.size)
    names(TrueOcc_stackyears[[t]]) <- paste("year",t,sep=" ")
    TrueOcc_stackyears[[t]]  <- ratify(TrueOcc_stackyears[[t]])
    rat <- levels(TrueOcc_stackyears[[t]])[[1]]
    
    for(k in 1:length(rat)){
      rat$code[k] <- ifelse( rat$ID[k] ==1,"presence","absence")
    }
    
    levels(TrueOcc_stackyears[[t]] ) <- rat
  }
  TrueOcc_stack[[i]] <- TrueOcc_stackyears
  #names(TrueOcc_stack[[i]]) <- paste("species",i)
}

Each individual raster can be accessed by indexing the list. Alternatively, multiple raster visualization can be produces by stacking each raster suing the stack function.

# species 1 occupancy pattern across years
plot(stack(TrueOcc_stack[[1]]))

Imperfect detection

Supposing state variable \(Z_{ijt}\) is observed imperfectly. Then, the observation process can be simulated by defining the following species-specific effect.

\[\begin{align} \alpha_{0,i} &\sim \mathrm{Normal}(\mu_{p}, 1) \nonumber \\ \alpha_{1,i} &\sim \mathrm{Normal}(\mu_{lp}, 1.2) \nonumber \\ \end{align}\]

Where \(\alpha_{0,i}\) is the species-specific baseline detection (with site-level mean detection probability \(\mu_{p} = 0.025\)) and \(\alpha_{1,i}\) is the coefficient associated with the effect that site-level covariate \(x_{1,j}\) has on the detection of the species \(i\). The effort is included as power term to model the effect that sampling effort has on probability of detecting a species at each of the sites across years. Thus, the detecion probability can be written as:

\[\begin{equation} p_{i,j,t} = 1 - (1-\mathrm{logit}^{-1}(\alpha_{0,i} + \alpha_{1,i} x_{1,j}))^{\mathrm{effort}_{j,t}} \end{equation}\]

mean.p <- 0.025 # baseline detection probability across all species
mu.lp <- qlogis(mean.p) # logit scale
sig.lp <- 1 # baseline detection variance across all species

alpha0.l <- rnorm(nspec, mu.lp, sig.lp) # species specific baseline detection  intercept
pmat <- array(dim = c(nsite, nyear, nspec))

# heterogeneous detections throught site level effects 
# other covariates can be specified as linear effects on detection e.g.

mu.alpha.lp <- 0.5 # mean detection slope for site level effect x1 across all species 
sig.alpha.lp <- 1.2 # mean variance across all species
alpha1 <-rnorm(nspec, mu.alpha.lp, sig.alpha.lp) # species specific slopes


for ( i in 1:nspec){
  for (t in 1:nyear){
    pmat[,t,i] <-  1-(1-plogis(alpha0.l[i] + alpha1[i] * x1))^effort[,t]
  }
}

Detection probabilities are stores in an array which is used to define the observations \(y_{ijt}\) as follows:

\[\begin{align} z_{i,j,t} &\sim \mathrm{Bernoulli}(\pi_{i,j,t}) \nonumber\\ \pi_{i,j,t} &= z_{i,j,t-1} \times \phi_{i,j} + (1-z_{i,j,t-1})\times \gamma_{i,j}\nonumber\\ y_{i,j,t}|z_{i,j,t} &\sim \mathrm{Bernoulli}(p_{i,j,t}) \nonumber\\ p_{i,j,t} &= 1 - (1-\mathrm{logit}^{-1}(\alpha_{0,i} + \alpha_{1,i} x_{1,j}))^{\mathrm{effort}_{j,t}} \end{align}\]

y <- array(dim = c(nsite, nyear, nspec))

for (j in 1:nsite) {
  for (i in 1:nspec) {
    for (t in 1:nyear) {
      y[j, t, i] <- rbinom(1, 1, pmat[j,t,i] * z[j, t, i])
    }
  }
}

Raster visualization can be done using the same approach as with the state process, i.e. defining a set of categorical raster layers based on the chosen grid for each year and species.

ObsOcc_stackyears=list()
ObsOcc_stack=list()

for(i in 1:nspec) {
  for(t in 1:nyear) {
    ObsOcc_stackyears[[t]] <-  raster(matrix(y[,t,i], quad.size/cell.size, quad.size/cell.size),xmn = 0, xmx = quad.size/cell.size, ymn = 0, ymx = quad.size/cell.size)
    names(ObsOcc_stackyears[[t]]) <- paste("year",t,sep=" ")
    ObsOcc_stackyears[[t]]  <- ratify(ObsOcc_stackyears[[t]])
    rat <- levels(ObsOcc_stackyears[[t]] )[[1]]
    for(k in 1:length(rat)){
      rat$code[k] <- ifelse( rat$ID[k] ==1,"presence","absence")
    }
    levels(ObsOcc_stackyears[[t]] ) <- rat
  }
  ObsOcc_stack[[i]] <- ObsOcc_stackyears
}

# species 1 occupancy pattern across years
plot(stack(ObsOcc_stack[[1]]))

For complete visualization visit the Shiny App. The App allows to investigate different scenarios where a variaty of occupancy spatiotemporal dynamics can be simulated for multiple species at different spatial scales and enables to download the data generared by the observational process.

Dynamic Occupancy Model Simulation

Jafet Belmont