Here I’m aiming to implement joint species distribution models à la Ovaskainen in Stan.

The data could be, for example, visits to sampling units by birds. We want to learn how bird species traits affect the probability of visiting a sampling unit, and how features of the sampling units affect visits (as in a standard regression of count data on environmental predictors).

Following Ovaskainen et al. 2019, the number of species is denoted by \(n_s\), the number of species traits by \(n_t\), and the number of species-specific parameters by \(n_p\). The species-specific parameters are combined into the \(n_s \times n_p\) matrix \(\mathbf{\Theta}\), so that each row of \(\mathbf{\Theta}\), i.e., the vector \(\mathbf{\Theta}_s\), contains the parameters for species \(s\). The number of parameters (\(n_p\)) and their interpretations depend on the specific model.

To determine how the way species respond to environmental factors (as described by the parameters \(\mathbf{\Theta}\)) depends on their traits and phylogenetic relationships, we vectorize the matrix \(\mathbf{\Theta}\) into the \(n_s n_p \times 1\) vector \(\mathbf{\theta} = \text{vec}(\mathbf{\Theta})\), and model it using a multivariate normal distribution as

\[ \theta \sim \text{N} \left(\mathbf{m}, \Sigma \otimes \left[ \rho \mathbf{C} + \left(1 - \rho \right) \mathbf{I}_{n_s} \right] \right) \]

Here, the \(n_s n_p \times 1\) vector \(\mathbf{m} = \text{vec}(\mathbf{M})\) is the vectorized version of the \(n_s \times n_p\) matrix \(\mathbf{M}\) (with elements \(m_{sp}\), where \(p = 1,...,n_p\) indexes the regression parameters), and it gives the expected parameters based on species traits. The influence of traits is modeled with a linear regression \(m_{sp} = \sum_k t_{sk} \zeta_{kp}\), where the index \(k\) runs over the \(n_t\) traits, \(t_{sk}\) is trait \(k\) for species \(s\), and the parameter \(\zeta_{kp}\) measures the influence of trait \(k\) on parameter \(p\). Arranging the regression parameters \(\zeta_{kp}\) into an \(n_t \times n_p\) matrix \(\mathbf{Z}\) and the trait values \(t_{sk}\) into an \(n_s \times n_t\) matrix \(\mathbf{T}\), we can write this in matrix form as \(\mathbf{M} = \mathbf{T} \mathbf{Z}\). To include an intercept in the model, we set \(t_{s1}=1\) for all species \(s\).

The \(n_p \times n_p\) variance–covariance matrix \(\Sigma\) above models the species-specific deviations around the expectation based on species traits, and \(\otimes\) is the Kronecker product. The \(n_s \times n_s\) matrix \(\mathbf{C}\) is a phylogenetic correlation matrix that can be derived from a phylogenetic tree based on genetic data, or constructed from taxonomic information if a quantitative phylogeny is not available. The matrix \(\mathbf{I}_{n_s}\) is the identity matrix of dimension \(n_s\), and the parameter \(0 \leq \rho \leq 1\) measures the strength of the phylogenetic signal.
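
A quick aside on ordering: \(\text{vec}()\) stacks the columns of \(\mathbf{\Theta}\) (all species for parameter 1, then all species for parameter 2, and so on), which is why the covariance is written as \(\Sigma \otimes [\cdot]\) and not the other way around. Here is a minimal sanity check of that convention in R (toy dimensions, unrelated to the simulation below):

library(mvtnorm)

n_s_toy <- 3                                   # toy number of species
Sigma_toy <- matrix(c(1, 0.3, 0.3, 1), 2, 2)   # covariance across two parameters
V <- kronecker(Sigma_toy, diag(n_s_toy))       # no phylogenetic signal in this toy
draws <- rmvnorm(5000, sigma = V)
# elements 1:3 of each draw are parameter 1 for species 1:3, elements 4:6 are parameter 2;
# the covariance between the two parameters of the same species should be close to 0.3
cov(draws[, 1], draws[, 4])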

We can use this setup to model the number of visits at sample \(i\) by species \(s\) as

\[ \begin{aligned} & y_{is} \sim \text{Poisson}(\lambda_{is}) \\ & \log(\lambda_{is}) = \theta_{0s} + \theta_{1s} \times x_i \\ \end{aligned} \]

where the parameters \(\theta_{0s}\) and \(\theta_{1s}\) are sampled from a multivariate normal with mean related to species traits and variance-covariance related to the phylogenetic similarity among species, as described above.

Simulation

Let’s simulate some data. For this, we start by defining the parameter structure as described above.

library(ape)
library(mvtnorm)
set.seed(1234)

n.s <- 50  # number of bird spp
n.p <- 2   # number of parameters
n.t <- 2   # number of traits

# define species traits
frug <- rbeta(n.s, 1, 1)    # degree of frugivory
log_bm <- rnorm(n.s, 0, 1)  # log body mass

# define trait matrix including ones for the intercept
TT <- as.matrix(cbind(rep(1, n.s), scale(frug), scale(log_bm)))

# simulate bird phylogeny 
tree <- rtree(n = n.s)    
CC <- vcv(tree, corr = TRUE) # species correlation based on phylogeny

# re-arrange the phylogenetic correlation matrix so rows/columns follow
# species ids 1 to n.s (tip labels map to ids in alphabetical order)
tmp <- dimnames(CC)
ids <- as.numeric(as.factor(tmp[[1]]))

C <- matrix(NA, ncol(CC), ncol(CC))
for(i in 1:ncol(CC)){
  for(j in 1:ncol(CC)){
    C[ids[i],ids[j]] <- CC[i,j]
  }
}

# define parameters that link traits to expected thetas
Z <- matrix(c(1, 0.5, 0.4, 0.5, -0.8, 0), (n.t + 1), n.p)

# get expected thetas
M <- TT %*% Z

Sigma <- diag(n.p) * 0.3
rho <- 0.7  # strength of the phylogenetic signal

thetas <- rmvnorm(1, mean = as.vector(M), kronecker(Sigma, rho*C + (1-rho) * diag(n.s)))
Theta <- matrix(thetas[1,], n.s, n.p)
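
Before moving on, a quick optional check that the re-ordered matrix C is still a valid correlation matrix (symmetric, with ones on the diagonal):

# sanity check on the re-arranged phylogenetic correlation matrix
stopifnot(isSymmetric(C), all(abs(diag(C) - 1) < 1e-8))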

We can now inspect how the degree of frugivory and body mass relate to the intercept and slope of the regression between the predictor \(x\) and the expected count at a sampling unit

nf = layout(matrix(c(1,2,3,4,0,0),3,2,byrow=TRUE), widths=c(1,1), heights=c(1,1,0.1))
#layout.show(nf)

op <- par( mar = c(3, 3, 2, 2) + 0.1, mgp = c(3.5, 1, 0), las = 1, bty = "n", cex = 1.2)
plot(scale(frug), Theta[,1], ylab = "", xlab = "", main = "intercept")
plot(scale(frug), Theta[,2], ylab = "", xlab = "", main = "slope")
plot(scale(log_bm), Theta[,1], ylab = "", xlab = "")
plot(scale(log_bm), Theta[,2], ylab = "", xlab = "")
mtext("         scaled frugivory           scaled log body mass", side = 1, line = -2, outer = TRUE, cex=1.3)

par(op)

Now we define the sampling scheme

n.su <- 30 # sampling units
# sample level predictor (environmental covariate)
x <- runif(n.su, -2, 2)

# species indicator
j <- rep(1:n.s, each = n.su) 
x.s <- rep(x, n.s)
n <- length(j) # sample size

log_lambda <- Theta[j,1] + Theta[j,2] * x.s
y <- rpois(n, exp(log_lambda))

The data look like this:

op <- par(las = 1, bty = "n")
plot(x.s, y, xlab = "predictor", ylab = "counts")

par(op)

Now let’s write the Stan model. For this, I borrowed a function from brms for the Kronecker product.

functions {

  /* compute the Kronecker product
   * Args:
   *   A, B: matrices
   * Returns:
   *   Kronecker product of A and B
   */
  matrix kronecker(matrix A, matrix B) {
    matrix[rows(A)*rows(B), cols(A)*cols(B)] kron;
    for (i in 1:cols(A)) {
      for (j in 1:rows(A)) {
        kron[((j-1)*rows(B)+1):(j*rows(B)), ((i-1)*cols(B)+1):(i*cols(B))] = A[j,i] * B;
      }
    }
    return kron;
  }
}

data { 
  int<lower=1> N;                 // total number of observations 
  int<lower=0> Y[N];              // response variable 
  int<lower=1> K;                 // number of sample-level predictors
  int<lower=1> N_J;               // num of groups (bird spp)
  int<lower=1> L_J;               // num group level predictors (traits)
  int<lower=1,upper=N_J> J[N];    // group id 
  matrix[N, K ] X;                // obs-level design matrix 
  matrix[N_J, L_J] TT;            // group-level traits
  matrix[N_J, N_J] C;             // phylogenetic correlation matrix
  vector[N_J] ones;               // vector of 1s
}

parameters {
  corr_matrix[K] Omega;     // correlation matrix for var-covar of betas
  vector<lower=0>[K] tau;   // scales for the variance covariance of betas
  vector[N_J * K] betas;
  real<lower=0,upper=1> rho;  // correlation between phylogeny and betas
  vector[L_J * K] z;          // coeffs for traits
}

transformed parameters { 
  matrix[K, K] Sigma = quad_form_diag(Omega, tau);
  matrix[N_J*K, N_J*K] S = kronecker(Sigma, rho * C + (1-rho) * diag_matrix(ones));
  matrix[L_J, K] Z = to_matrix(z, L_J, K);    
  vector[N_J * K] m = to_vector(TT * Z);        // mean of coeffs
  matrix[N_J, K] b_m = to_matrix(betas, N_J, K);  // coeffs
} 

model {
  // priors
  Omega ~ lkj_corr(2);
  tau ~ student_t(3,0,10); // cauchy(0, 2.5); // lognormal()
  betas ~ multi_normal(m, S);
  rho ~ beta(2,2);
  z ~ normal(0,1);

 {
  vector[N] mu;
  for (n in 1:N){
     mu[n] = X[n,1] * b_m[J[n],1] + X[n,2] * b_m[J[n],2];
  }
  Y ~ poisson_log(mu);
  }
}
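
As an aside, the kronecker() function above uses the same block layout as base R’s kronecker() (block (j, i) of the result is A[j, i] * B), so the covariance built in R for the simulation and the one built inside Stan should agree. A tiny example makes the layout easy to eyeball:

A <- matrix(1:4, 2, 2)
kronecker(A, diag(2))   # each 2x2 block equals A[j, i] * I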

Now we prepare the data and call Stan. As the model has many transformed parameters, we’ll specify the parameters we want to monitor by defining a character vector with them.

library("rstan")
## Loading required package: ggplot2
## Loading required package: StanHeaders
## rstan (Version 2.18.2, GitRev: 2e1f913d3ca3)
## For execution on a local, multicore CPU with excess RAM we recommend calling
## options(mc.cores = parallel::detectCores()).
## To avoid recompilation of unchanged Stan programs, we recommend calling
## rstan_options(auto_write = TRUE)
rstan_options(auto_write = TRUE)
options(mc.cores = parallel::detectCores())

stan_dat <- list(
  N = length(y),
  Y = y,
  K = ncol(Theta),
  X = matrix(c(rep(1, length(y)), x.s), length(y), ncol(Theta)),
  J = j,
  N_J = max(j),
  L_J = ncol(TT),
  TT = TT,
  C = C,
  ones = numeric(max(j)) + 1
)

pars <- c("Omega", "tau", "betas", "rho", "z")

fit <- stan(file = 'phyl.stan', 
            data = stan_dat,
            pars = pars, 
            include = TRUE,
            iter = 1000, chains = 3)

Before making any inferences, we check for convergence and reasonable effective sample sizes for the posteriors

fit_summary <- summary(fit)$summary
op <- par(mfrow = c(1,2))
hist(fit_summary[,10], main = "R-hat")
hist(fit_summary[,9], main = "n-eff" )

par(op)

N <- dim(fit_summary)[[1]]
for (n in 1:N) {
  rhat <- fit_summary[,10][n]
  if (rhat > 1.1 || is.infinite(rhat) || is.nan(rhat)) {
      print(sprintf('Rhat for parameter %s is %s!',
                    rownames(fit_summary)[n], rhat))
    }
  }
## [1] "Rhat for parameter Omega[1,1] is NaN!"

Rhats and n-eff look good (we get a warning for the Rhat of Omega[1,1], but that’s a fixed value so no worries).
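
Besides Rhat and n-eff, it’s worth looking at the HMC sampler diagnostics (divergences, treedepth, energy). Recent versions of rstan provide a helper for this:

check_hmc_diagnostics(fit)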

We can now check some posteriors, such as that for \(\rho\)

draws <- extract(fit)

plot(density(draws$rho), main = "")
abline(v=rho)

Now compare true \(\zeta\) values (in red) to fitted ones

zs <- fit_summary[grepl("z", rownames(fit_summary)),]
#plot(c(Z) - zs[,1])
df <- data.frame(x = 1:dim(zs)[1],
                 tz = c(Z),
                 fz = zs[,1],
                 L = zs[,4],
                 U = zs[,8])

ggplot(df, aes(x = x, y = tz)) +
  geom_point(size = 2, color="red") +
  geom_point(aes(y = fz), size = 2) +
  geom_linerange(aes(ymin = L, ymax = U)) +
  theme_classic()

Let’s compare the sample-level regression coefficients

bs <- fit_summary[grepl("betas", rownames(fit_summary)),]

nf = layout(matrix(c(1,2,3,4,0,0),3,2,byrow=TRUE), widths=c(1,1), heights=c(1,1,0.1))
#layout.show(nf)
op <- par( mar = c(3, 3, 2, 2) + 0.1, mgp = c(3.5, 1, 0), las = 1, bty = "n", cex = 1.2)
plot(scale(frug), bs[1:n.s, 1], ylab = "", xlab = "", main = "intercept")
points(scale(frug), Theta[,1], col = 2)
plot(scale(frug), bs[(n.s+1):(2*n.s), 1], ylab = "", xlab = "", main = "slope")
points(scale(frug), Theta[,2], col = 2)
plot(scale(log_bm), bs[1:n.s, 1], ylab = "", xlab = "")
points(scale(log_bm), Theta[,1], col = 2)
plot(scale(log_bm), bs[(n.s+1):(2*n.s), 1], ylab = "", xlab = "")
points(scale(log_bm), Theta[,2], col = 2)
mtext("         scaled frugivory           scaled log body mass", side = 1, line = -2, outer = TRUE, cex=1.3)

par(op)
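
As a quick numerical complement to the plots, we can correlate the true species-level parameters with their posterior means (the posterior means are in column 1 of the summary):

cor(Theta[, 1], bs[1:n.s, 1])                 # intercepts
cor(Theta[, 2], bs[(n.s + 1):(2 * n.s), 1])   # slopes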

Occupancy and imperfect detection

As a simple modification of the previous model, we could have presence-absence data with imperfect detection (false negatives).

library(ape)
library(mvtnorm)
set.seed(1234)

n.s <- 50  # number of bird spp
n.p <- 2   # number of parameters
n.t <- 2   # number of traits

# define species traits
frug <- rbeta(n.s, 1, 1)    # degree of frugivory
log_bm <- rnorm(n.s, 0, 1)  # log body mass

# define trait matrix including ones for the intercept
TT <- as.matrix(cbind(rep(1, n.s), scale(frug), scale(log_bm)))

# simulate bird phylogeny 
tree <- rtree(n = n.s)    
CC <- vcv(tree, corr = TRUE) # species correlation based on phylogeny

# re-arrange the phylogenetic correlation matrix so rows/columns follow
# species ids 1 to n.s (tip labels map to ids in alphabetical order)
tmp <- dimnames(CC)
ids <- as.numeric(as.factor(tmp[[1]]))

C <- matrix(NA, ncol(CC), ncol(CC))
for(i in 1:ncol(CC)){
  for(j in 1:ncol(CC)){
    C[ids[i],ids[j]] <- CC[i,j]
  }
}

# define parameters that link traits to expected thetas
Z <- matrix(c(1, 0.5, 0.4, 0.5, -0.8, 0), (n.t + 1), n.p)

# get expected thetas
M <- TT %*% Z

Sigma <- diag(n.p) * 0.3
rho <- 0.7  # strength of the phylogenetic signal

thetas <- rmvnorm(1, mean = as.vector(M), kronecker(Sigma, rho*C + (1-rho) * diag(n.s)))
Theta <- matrix(thetas[1,], n.s, n.p)

# now we simulate the data
n.su <- 30 # sampling units
# sample level predictor (environmental covariate)
x <- runif(n.su, -2, 2)
p.d <- 0.7  # detection probability
n.v <- 4    # number of visits to each sampling unit

# species indicator
j <- rep(1:n.s, each = n.su) 
x.s <- rep(x, n.s)
n <- length(j) # sample size

logit_p <- Theta[j,1] + Theta[j,2] * x.s
z <- rbinom(n, size = 1, prob = plogis(logit_p))
y <- matrix(NA, n, n.v)
for(i in 1:n.v) y[,i] <- rbinom(n, size = 1, prob = z * p.d)  
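
With a detection probability of 0.7 and four visits, most occupied units should be detected at least once, but the naive occupancy (detected on at least one visit) will still be slightly below the true occupancy in the simulation:

detected <- as.integer(rowSums(y) > 0)  # detected at least once
mean(detected)                          # naive occupancy
mean(z)                                 # true occupancy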

The Stan model has to account for the detection probability by marginalizing over the unobserved occupancy state. I copied what I found at the stan-dev GitHub: https://github.com/stan-dev/example-models/blob/master/misc/ecology/occupancy/occupancy.stan

functions {

  /* compute the Kronecker product
   * Args:
   *   A, B: matrices
   * Returns:
   *   Kronecker product of A and B
   */
  matrix kronecker(matrix A, matrix B) {
    matrix[rows(A)*rows(B), cols(A)*cols(B)] kron;
    for (i in 1:cols(A)) {
      for (j in 1:rows(A)) {
        kron[((j-1)*rows(B)+1):(j*rows(B)), ((i-1)*cols(B)+1):(i*cols(B))] = A[j,i] * B;
      }
    }
    return kron;
  }
}

data { 
  int<lower=1> N;                 // total number of observations 
  int<lower=1> T;                 // Number of (potential) replicates at each site
  int<lower=0,upper=1> Y[N,T];    // response variable 
  int<lower=1> K;                 // number of sample-level predictors
  int<lower=1> N_J;               // num of groups (bird spp)
  int<lower=1> L_J;               // num group level predictors (traits)
  int<lower=1,upper=N_J> J[N];    // group id 
  matrix[N, K ] X;                // obs-level design matrix 
  matrix[N_J, L_J] TT;            // group-level traits
  matrix[N_J, N_J] C;             // phylogenetic correlation matrix
  vector[N_J] ones;               // vector of 1s
}

parameters {
  corr_matrix[K] Omega;     // correlation matrix for var-covar of betas
  vector<lower=0>[K] tau;   // scales for the variance covariance of betas
  vector[N_J * K] betas;
  real<lower=0,upper=1> rho;  // correlation between phylogeny and betas
  vector[L_J * K] z;          // coeffs for traits
  real<lower=0,upper=1> p;    // detection probability
}

transformed parameters { 
  matrix[K, K] Sigma = quad_form_diag(Omega, tau);
  matrix[N_J*K, N_J*K] S = kronecker(Sigma, rho * C + (1-rho) * diag_matrix(ones));
  matrix[L_J, K] Z = to_matrix(z, L_J, K);    
  vector[N_J * K] m = to_vector(TT * Z);        // mean of coeffs
  matrix[N_J, K] b_m = to_matrix(betas, N_J, K);  // coeffs
} 

model {
  // priors
  p ~ beta(2,2);
  Omega ~ lkj_corr(2);
  tau ~ student_t(3,0,10); // cauchy(0, 2.5); // lognormal()
  betas ~ multi_normal(m, S);
  rho ~ beta(2,2);
  z ~ normal(0,1);

 {
  vector[N] mu;
  for (n in 1:N){
     mu[n] = X[n,1] * b_m[J[n],1] + X[n,2] * b_m[J[n],2];
     if (sum(Y[n]) > 0)
     target += log(inv_logit(mu[n])) + bernoulli_lpmf(Y[n] | p);
     else
     target += log_sum_exp(log(inv_logit(mu[n])) + bernoulli_lpmf(Y[n] | p), log(1 - inv_logit(mu[n])));
     }
  }
}
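
To make the marginalization explicit: if a species was detected at least once at a unit, the unit must be occupied; if it was never detected, it was either occupied and missed on every visit, or not occupied. Here is a minimal R version of the per-unit likelihood used in the model block (with psi playing the role of inv_logit(mu[n])):

site_loglik <- function(y_i, psi, p) {
  if (sum(y_i) > 0) {
    # occupied for sure: occupancy probability times the detection history
    log(psi) + sum(dbinom(y_i, size = 1, prob = p, log = TRUE))
  } else {
    # marginalize: occupied but never detected, or not occupied
    log(psi * (1 - p)^length(y_i) + (1 - psi))
  }
}
site_loglik(c(0, 1, 0, 0), psi = 0.5, p = 0.7)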

Now we set up the data and call Stan

library("rstan")
rstan_options(auto_write = TRUE)
options(mc.cores = parallel::detectCores())

stan_dat <- list(
  N = dim(y)[1],
  Y = y,
  T = dim(y)[2],
  K = ncol(Theta),
  X = matrix(c(rep(1, dim(y)[1]), x.s), dim(y)[1], ncol(Theta)),
  J = j,
  N_J = max(j),
  L_J = ncol(TT),
  TT = TT,
  C = C,
  ones = numeric(max(j)) + 1
)

pars <- c("Omega", "tau", "betas", "rho", "z", "p")

fit <- stan(file = 'phyl_id.stan', 
            data = stan_dat,
            pars = pars, 
            iter = 1000, 
            chains = 3)

Again, before making any inferences, we check for convergence and reasonable effective sample sizes for the posteriors

fit_summary <- summary(fit)$summary
op <- par(mfrow = c(1,2))
hist(fit_summary[,10], main = "R-hat")
hist(fit_summary[,9], main = "n-eff" )

par(op)

N <- dim(fit_summary)[[1]]
for (n in 1:N) {
  rhat <- fit_summary[,10][n]
  if (rhat > 1.1 || is.infinite(rhat) || is.nan(rhat)) {
      print(sprintf('Rhat for parameter %s is %s!',
                    rownames(fit_summary)[n], rhat))
    }
  }
## [1] "Rhat for parameter Omega[1,1] is NaN!"

Some n-eff are low and we should probably run more iterations…

We can now check some posteriors such as that for the detection probability

draws <- extract(fit)

plot(density(draws$p), main = "")
abline(v=p.d)

Now compare true \(\zeta\) values (in red) to fitted ones

zs <- fit_summary[grepl("z", rownames(fit_summary)),]
#plot(c(Z) - zs[,1])
df <- data.frame(x = 1:dim(zs)[1],
                 tz = c(Z),
                 fz = zs[,1],
                 L = zs[,4],
                 U = zs[,8])

ggplot(df, aes(x = x, y = tz)) +
  geom_point(size = 2, color="red") +
  geom_point(aes(y = fz), size = 2) +
  geom_linerange(aes(ymin = L, ymax = U)) +
  theme_classic()

Let’s compare the sample-level regression coefficients

bs <- fit_summary[grepl("betas", rownames(fit_summary)),]

nf = layout(matrix(c(1,2,3,4,0,0),3,2,byrow=TRUE), widths=c(1,1), heights=c(1,1,0.1))
#layout.show(nf)
op <- par( mar = c(3, 3, 2, 2) + 0.1, mgp = c(3.5, 1, 0), las = 1, bty = "n", cex = 1.2)
plot(scale(frug), bs[1:n.s, 1], ylab = "", xlab = "", main = "intercept")
points(scale(frug), Theta[,1], col = 2)
plot(scale(frug), bs[(n.s+1):(2*n.s), 1], ylab = "", xlab = "", main = "slope")
points(scale(frug), Theta[,2], col = 2)
plot(scale(log_bm), bs[1:n.s, 1], ylab = "", xlab = "")
points(scale(log_bm), Theta[,1], col = 2)
plot(scale(log_bm), bs[(n.s+1):(2*n.s), 1], ylab = "", xlab = "")
points(scale(log_bm), Theta[,2], col = 2)
mtext("         scaled frugivory           scaled log body mass", side = 1, line = -2, outer = TRUE, cex=1.3)

par(op)

Juan Manuel Morales. April 2019. Last updated: 2019-06-12