Integrated Population Models

Alejandro Molina Moctezuma

Before we start

Packages we’ll use:

ASMbook

rstan

Before we start

Brief introductions 🐟

Before we start

Follow along at https://rpubs.com/amolina/ipm

  • Recommended book

Before we start

This talk

  • Integrated models

  • Integrated population models

  • Discussion

  • Examples and code?

  • Make it successful! 😄 Participate and discuss

  • A big apology

What is an Integrated Model?

  • What is a model?

  • Integrated Population Model

  • What is an Integrated Model?

What is an Integrated Model?

  • What?

  • Why?

  • How?

  • When?/When not?

What are IM’s?

Statistical method to share information about some underlying process by combining data sets

  • How: Form joint likelihood with shared parameters among likelihoods of each data set

  • NOT ONLY WAY! (but we will focus on this)

  • Joint likelihood for two or more “disparate” data sets, with at least one shared parameter

IM’s have a shared process

flowchart LR
  A[$$\theta$$] --$$\omega_1$$ --> B($$data_1$$)
  A[$$\theta$$] --$$\omega_2$$ --> C($$data_2$$)

flowchart LR
  subgraph Hidden
    A[$$\theta$$]
    B[$$\gamma$$]
    C[$$\lambda$$]
  end
  Hidden --$$\omega_1$$ --> E($$data_1$$)
  A--$$\omega_2$$ --> D($$data_2$$)

    style Hidden fill:#bdf1f1,stroke:#f66,stroke-width:2px,color:#fff,stroke-dasharray: 5 5 

Integrated models

IPM’s

  • estimate abundance and demographic rates
  • Statistical method to share information about some underlying processes (abundance and demographic rates) by combining data sets
  1. Abundance data (counts)
  2. Productivity data (reproduction)
  3. Capture-reapture data (survival)

Exercise

  1. Abundance data (counts)
  2. Productivity data (reproduction)
  3. Capture-reapture data (survival)

flowchart LR
  subgraph Hidden
    A[$$\theta$$]
    B[$$\gamma$$]
    C[$$\lambda$$]
  end
  Hidden --$$\omega_1$$ --> E($$data_1$$)
  A--$$\omega_2$$ --> D($$data_2$$)

    style Hidden fill:#bdf1f1,stroke:#f66,stroke-width:2px,color:#fff,stroke-dasharray: 5 5 

Exercise

CJS model:

flowchart LR
  A($$\phi_j$$) ---> C[y]
  D(p) ---> C
  

Exercise

CJS model:

flowchart LR
  A($$\phi$$) ---> C[y]
  D(p) ---> C
  

  • \[ z_{i,f_i} = 1 \]

  • \[ z_{i,t+1} | z_{i,t} \sim Bernoulli(z_{i,t}\phi_{i,t}) \]

  • \[ y_{i,t} | z_{i,t} \sim Bernoulli(z_{i,t}p_{i,t}) \]

Exercise

flowchart LR
  subgraph Hidden
    A[$$\theta$$]
    B[$$\gamma$$]
    C[$$\lambda$$]
  end
  Hidden --$$\omega_1$$ --> E($$data_1$$)
  A--$$\omega_2$$ --> D($$data_2$$)

    style Hidden fill:#bdf1f1,stroke:#f66,stroke-width:2px,color:#fff,stroke-dasharray: 5 5 

  1. Abundance data (counts)
  2. Productivity data (reproduction; counts of eggs, counts of newborns)
  3. Capture-reapture data (survival): CJS
  • Make a chart

Why and how?

How: Formulate the joint density of all observed data (and latent variables)

  • More data is better! Higher precision, more parameters
  • Larger datasets
  • New parameters may become possible to estimate

When to run an IPM?

  • Should you always run an IPM when you have the available data? If it has so many benefits!

  • What should your philosophy be?

  • When should you NOT run an IPM?

How to run?

Stack all models inside the same stan model statement

How to run?

Stack all models inside the same stan model statement

  • Important part: Choose the same names for shared parameters

  • Likelihood portion for each model

  • That’s it!

  • Important: You need to understand each model you run

What you should know before running an IPM

  1. Models for population size surveys
  2. Models for “productivity”
  3. Models for survival
  4. Matrix population models

Most complicated aspect of a IPM

  • Understand each individual model

  • Understanding “shared” parameters

  • Theoretical acyclical graph

Example of an IM

Let’s go back to IM’s and run an SDM

Example in R

Common swifts in Switzerland

Example

We have count data for common swifts at different altitudes.

Example in R

  • Dataset: countdata1

What model should we run?

We are just wondering what the effect of elevation is on abundance (counts)

What model?

  • A GLM! Very easy to run using the glm function

GLM

summary(fm1 <- glm(counts ~ selev, data= countdata1, family = poisson(link = "log")))

Call:
glm(formula = counts ~ selev, family = poisson(link = "log"), 
    data = countdata1)

Coefficients:
            Estimate Std. Error z value Pr(>|z|)    
(Intercept)  0.69493    0.03668   18.94   <2e-16 ***
selev       -1.96130    0.06700  -29.27   <2e-16 ***
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

(Dispersion parameter for poisson family taken to be 1)

    Null deviance: 1718.8  on 499  degrees of freedom
Residual deviance:  578.4  on 498  degrees of freedom
AIC: 1615.4

Number of Fisher Scoring iterations: 5

Definition of negative log-likelihood for Poisson

library(ASMbook)
NLL1 <- function(param, y, x) {
  alpha <- param[1]
  beta <- param[2]
  lambda <- exp(alpha + beta * x)
  LL <- dpois(y, lambda, log = TRUE)                   # LL contribution for each datum
  return(-sum(LL))                                     # NLL for all data
}

# Minimize NLL
inits <- c(alpha = 0, beta = 0)                        # Need to provide initial values
sol1 <- optim(inits, NLL1, y = countdata1$counts, x = countdata1$selev, hessian = TRUE, method = 'BFGS')
tmp1 <- get_MLE(sol1, 4)
          MLE     ASE LCL.95  UCL.95
alpha  0.6949 0.03668  0.623  0.7668
beta  -1.9613 0.06700 -2.093 -1.8300

We are lucky, and have a secondary (independent) dataset

  • Dataset of “birdwatchers”

  • Pretty reliable

  • They do not record zeroes!

  • What type of model?

  • A zero-truncated GLM! VGAM –> we won’t run the actual model

Definition of negative log-likelihood (NLL) for the ztPoisson GLM

NLL2 <- function(param, y, x) {
  alpha <- param[1]
  beta <- param[2]
  lambda <- exp(alpha + beta * x)
  L <- dpois(y, lambda)/(1-ppois(0, lambda))           # L. contribution for each datum
  return(-sum(log(L)))                                 # NLL for all data
}

# Minimize NLL
inits <- c(alpha = 0, beta = 0)                        # Need to provide initial values
sol2 <- optim(inits, NLL2, y = countdata2$counts, x = countdata2$selev, hessian = TRUE, method = 'BFGS')
tmp2 <- get_MLE(sol2, 4)
          MLE     ASE  LCL.95  UCL.95
alpha  0.6461 0.03852  0.5706  0.7216
beta  -2.0351 0.07089 -2.1740 -1.8961

We are lucky, and have a third (independent) dataset

  • detdata

  • Detection/nondetection data

  • NOT COUNT DATA!

  • But, underlying abundance process affects the probability of detection!

  • Elevation affects the abundance!

  • What model?

  • GLM glm. Binomial!

Fitting cloglog Bernoulli GLM

summary(fm3 <- glm(y ~ selev,data=detdata, family = binomial(link = "cloglog")))

Call:
glm(formula = y ~ selev, family = binomial(link = "cloglog"), 
    data = detdata)

Coefficients:
            Estimate Std. Error z value Pr(>|z|)    
(Intercept)  0.71081    0.04169   17.05   <2e-16 ***
selev       -1.98144    0.09499  -20.86   <2e-16 ***
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

(Dispersion parameter for binomial family taken to be 1)

    Null deviance: 2308.8  on 1999  degrees of freedom
Residual deviance: 1569.8  on 1998  degrees of freedom
AIC: 1573.8

Number of Fisher Scoring iterations: 6

Definition of negative log-likelihood for Bernoulli cloglog

NLL3 <- function(param, y, x) {
  alpha <- param[1]
  beta <- param[2]
  lambda <- exp(alpha + beta * x)
  psi <- 1-exp(-lambda)
  LL <- dbinom(y, 1, psi, log = TRUE)                  # L. contribution for each datum
  return(-sum(LL))                                     # NLL for all data
}

# Minimize NLL
inits <- c(alpha = 0, beta = 0)
sol3 <- optim(inits, NLL3, y = detdata$y, x = detdata$selev, hessian = TRUE, method = 'BFGS')
tmp3 <- get_MLE(sol3, 4)
          MLE     ASE  LCL.95  UCL.95
alpha  0.7108 0.04214  0.6282  0.7934
beta  -1.9815 0.09682 -2.1712 -1.7917

Fitting the IM with a joint likelihood

We are assuming the data are independent (BIG ASSUMPTION! Read about it!)

  • \[ L_{joint} = L_1L_2L_3 \]

  • \[NLL_{joint}= - LL_1 - LL_2 -LL_3\]

NLLjoint <- function(param, y1, x1, y2, x2, y3, x3) {
  # Definition of elements in param vector (shared between data sets)
  alpha <- param[1]                                    # log-linear intercept
  beta <- param[2]                                     # log-linear slope
  # Likelihood for the Poisson GLM for data set 1 (y1, x1)
  lambda1 <- exp(alpha + beta * x1)
  L1 <- dpois(y1, lambda1)
  # Likelihood for the ztPoisson GLM for data set 2 (y2, x2)
  lambda2 <- exp(alpha + beta * x2)
  L2 <- dpois(y2, lambda2)/(1-ppois(0, lambda2))
  # Likelihood for the cloglog Bernoulli GLM for data set 3 (y3, x3)
  lambda3 <- exp(alpha + beta * x3)
  psi <- 1-exp(-lambda3)
  L3 <- dbinom(y3, 1, psi)
  # Joint log-likelihood and joint NLL: here you can see that sum!
  JointLL <- sum(log(L1)) + sum(log(L2)) + sum(log(L3)) # Joint LL
  return(-JointLL)                                      # Return joint NLL
}

# Minimize NLLjoint
inits <- c(alpha = 0, beta = 0)
solJoint <- optim(inits, NLLjoint, y1 = countdata1$counts, x1 = countdata1$selev, y2 = countdata2$counts, x2 = countdata2$selev,
  y3 =detdata$y, x3 = detdata$selev, hessian = TRUE, method = 'BFGS')

# Get MLE and asymptotic SE and print and save
(tmp4 <- get_MLE(solJoint, 4))
          MLE     ASE  LCL.95  UCL.95
alpha  0.6861 0.01852  0.6498  0.7224
beta  -1.9704 0.03522 -2.0394 -1.9013
             MLE        ASE     LCL.95     UCL.95
alpha  0.6860979 0.01851556  0.6498081  0.7223878
beta  -1.9703732 0.03522001 -2.0394032 -1.9013432
diy_est <- tmp4[,1]

Not too different from an IPM!