Integrated Population Models

Before we start

Packages we’ll use:

ASMbook

rstan

Before we start

Brief introductions 🐟

Before we start

Follow along at https://rpubs.com/amolina/ipm

Recommended book

Before we start

This talk

Integrated models
Integrated population models
Discussion
Examples and code?
Make it successful! 😄 Participate and discuss
A big apology

What is an Integrated Model?

What is a model?
Integrated Population Model
What is an Integrated Model?

What is an Integrated Model?

What?
Why?
How?
When?/When not?

What are IM’s?

Statistical method to share information about some underlying process by combining data sets

How: Form joint likelihood with shared parameters among likelihoods of each data set
NOT ONLY WAY! (but we will focus on this)
Joint likelihood for two or more “disparate” data sets, with at least one shared parameter

IM’s have a shared process

flowchart LR
  A[$$\theta$$] --$$\omega_1$$ --> B($$data_1$$)
  A[$$\theta$$] --$$\omega_2$$ --> C($$data_2$$)

flowchart LR
  subgraph Hidden
    A[$$\theta$$]
    B[$$\gamma$$]
    C[$$\lambda$$]
  end
  Hidden --$$\omega_1$$ --> E($$data_1$$)
  A--$$\omega_2$$ --> D($$data_2$$)

    style Hidden fill:#bdf1f1,stroke:#f66,stroke-width:2px,color:#fff,stroke-dasharray: 5 5

Integrated models

IPM’s

estimate abundance and demographic rates
Statistical method to share information about some underlying processes (abundance and demographic rates) by combining data sets

Abundance data (counts)
Productivity data (reproduction)
Capture-reapture data (survival)

Exercise

Abundance data (counts)
Productivity data (reproduction)
Capture-reapture data (survival)

flowchart LR
  subgraph Hidden
    A[$$\theta$$]
    B[$$\gamma$$]
    C[$$\lambda$$]
  end
  Hidden --$$\omega_1$$ --> E($$data_1$$)
  A--$$\omega_2$$ --> D($$data_2$$)

    style Hidden fill:#bdf1f1,stroke:#f66,stroke-width:2px,color:#fff,stroke-dasharray: 5 5

Exercise

CJS model:

flowchart LR
  A($$\phi_j$$) ---> C[y]
  D(p) ---> C

Exercise

CJS model:

flowchart LR
  A($$\phi$$) ---> C[y]
  D(p) ---> C

\[ z_{i,f_i} = 1 \]
\[ z_{i,t+1} | z_{i,t} \sim Bernoulli(z_{i,t}\phi_{i,t}) \]
\[ y_{i,t} | z_{i,t} \sim Bernoulli(z_{i,t}p_{i,t}) \]

Exercise

flowchart LR
  subgraph Hidden
    A[$$\theta$$]
    B[$$\gamma$$]
    C[$$\lambda$$]
  end
  Hidden --$$\omega_1$$ --> E($$data_1$$)
  A--$$\omega_2$$ --> D($$data_2$$)

    style Hidden fill:#bdf1f1,stroke:#f66,stroke-width:2px,color:#fff,stroke-dasharray: 5 5

Abundance data (counts)
Productivity data (reproduction; counts of eggs, counts of newborns)
Capture-reapture data (survival): CJS

Make a chart

Why and how?

How: Formulate the joint density of all observed data (and latent variables)

More data is better! Higher precision, more parameters
Larger datasets
New parameters may become possible to estimate

When to run an IPM?

Should you always run an IPM when you have the available data? If it has so many benefits!
What should your philosophy be?
When should you NOT run an IPM?

How to run?

Stack all models inside the same stan model statement

How to run?

Stack all models inside the same stan model statement

Important part: Choose the same names for shared parameters
Likelihood portion for each model
That’s it!
Important: You need to understand each model you run

What you should know before running an IPM

Models for population size surveys
Models for “productivity”
Models for survival
Matrix population models

Most complicated aspect of a IPM

Understand each individual model
Understanding “shared” parameters

Theoretical acyclical graph

Example of an IM

Let’s go back to IM’s and run an SDM

Example in R

Common swifts in Switzerland

Example

We have count data for common swifts at different altitudes.

Example in R

Dataset: countdata1

What model should we run?

We are just wondering what the effect of elevation is on abundance (counts)

What model?

A GLM! Very easy to run using the glm function

GLM

summary(fm1 <- glm(counts ~ selev, data= countdata1, family = poisson(link = "log")))


Call:
glm(formula = counts ~ selev, family = poisson(link = "log"), 
    data = countdata1)

Coefficients:
            Estimate Std. Error z value Pr(>|z|)    
(Intercept)  0.69493    0.03668   18.94   <2e-16 ***
selev       -1.96130    0.06700  -29.27   <2e-16 ***
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

(Dispersion parameter for poisson family taken to be 1)

    Null deviance: 1718.8  on 499  degrees of freedom
Residual deviance:  578.4  on 498  degrees of freedom
AIC: 1615.4

Number of Fisher Scoring iterations: 5

Definition of negative log-likelihood for Poisson

library(ASMbook)
NLL1 <- function(param, y, x) {
  alpha <- param[1]
  beta <- param[2]
  lambda <- exp(alpha + beta * x)
  LL <- dpois(y, lambda, log = TRUE)                   # LL contribution for each datum
  return(-sum(LL))                                     # NLL for all data
}

# Minimize NLL
inits <- c(alpha = 0, beta = 0)                        # Need to provide initial values
sol1 <- optim(inits, NLL1, y = countdata1$counts, x = countdata1$selev, hessian = TRUE, method = 'BFGS')
tmp1 <- get_MLE(sol1, 4)

          MLE     ASE LCL.95  UCL.95
alpha  0.6949 0.03668  0.623  0.7668
beta  -1.9613 0.06700 -2.093 -1.8300

We are lucky, and have a secondary (independent) dataset

Dataset of “birdwatchers”
Pretty reliable
They do not record zeroes!
What type of model?
A zero-truncated GLM! VGAM –> we won’t run the actual model

Definition of negative log-likelihood (NLL) for the ztPoisson GLM

NLL2 <- function(param, y, x) {
  alpha <- param[1]
  beta <- param[2]
  lambda <- exp(alpha + beta * x)
  L <- dpois(y, lambda)/(1-ppois(0, lambda))           # L. contribution for each datum
  return(-sum(log(L)))                                 # NLL for all data
}

# Minimize NLL
inits <- c(alpha = 0, beta = 0)                        # Need to provide initial values
sol2 <- optim(inits, NLL2, y = countdata2$counts, x = countdata2$selev, hessian = TRUE, method = 'BFGS')
tmp2 <- get_MLE(sol2, 4)

          MLE     ASE  LCL.95  UCL.95
alpha  0.6461 0.03852  0.5706  0.7216
beta  -2.0351 0.07089 -2.1740 -1.8961

We are lucky, and have a third (independent) dataset

detdata
Detection/nondetection data
NOT COUNT DATA!
But, underlying abundance process affects the probability of detection!
Elevation affects the abundance!
What model?
GLM glm. Binomial!

Fitting cloglog Bernoulli GLM

summary(fm3 <- glm(y ~ selev,data=detdata, family = binomial(link = "cloglog")))


Call:
glm(formula = y ~ selev, family = binomial(link = "cloglog"), 
    data = detdata)

Coefficients:
            Estimate Std. Error z value Pr(>|z|)    
(Intercept)  0.71081    0.04169   17.05   <2e-16 ***
selev       -1.98144    0.09499  -20.86   <2e-16 ***
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

(Dispersion parameter for binomial family taken to be 1)

    Null deviance: 2308.8  on 1999  degrees of freedom
Residual deviance: 1569.8  on 1998  degrees of freedom
AIC: 1573.8

Number of Fisher Scoring iterations: 6

Definition of negative log-likelihood for Bernoulli cloglog

NLL3 <- function(param, y, x) {
  alpha <- param[1]
  beta <- param[2]
  lambda <- exp(alpha + beta * x)
  psi <- 1-exp(-lambda)
  LL <- dbinom(y, 1, psi, log = TRUE)                  # L. contribution for each datum
  return(-sum(LL))                                     # NLL for all data
}

# Minimize NLL
inits <- c(alpha = 0, beta = 0)
sol3 <- optim(inits, NLL3, y = detdata$y, x = detdata$selev, hessian = TRUE, method = 'BFGS')
tmp3 <- get_MLE(sol3, 4)

          MLE     ASE  LCL.95  UCL.95
alpha  0.7108 0.04214  0.6282  0.7934
beta  -1.9815 0.09682 -2.1712 -1.7917

Fitting the IM with a joint likelihood

We are assuming the data are independent (BIG ASSUMPTION! Read about it!)

\[ L_{joint} = L_1L_2L_3 \]
\[NLL_{joint}= - LL_1 - LL_2 -LL_3\]

NLLjoint <- function(param, y1, x1, y2, x2, y3, x3) {
  # Definition of elements in param vector (shared between data sets)
  alpha <- param[1]                                    # log-linear intercept
  beta <- param[2]                                     # log-linear slope
  # Likelihood for the Poisson GLM for data set 1 (y1, x1)
  lambda1 <- exp(alpha + beta * x1)
  L1 <- dpois(y1, lambda1)
  # Likelihood for the ztPoisson GLM for data set 2 (y2, x2)
  lambda2 <- exp(alpha + beta * x2)
  L2 <- dpois(y2, lambda2)/(1-ppois(0, lambda2))
  # Likelihood for the cloglog Bernoulli GLM for data set 3 (y3, x3)
  lambda3 <- exp(alpha + beta * x3)
  psi <- 1-exp(-lambda3)
  L3 <- dbinom(y3, 1, psi)
  # Joint log-likelihood and joint NLL: here you can see that sum!
  JointLL <- sum(log(L1)) + sum(log(L2)) + sum(log(L3)) # Joint LL
  return(-JointLL)                                      # Return joint NLL
}

# Minimize NLLjoint
inits <- c(alpha = 0, beta = 0)
solJoint <- optim(inits, NLLjoint, y1 = countdata1$counts, x1 = countdata1$selev, y2 = countdata2$counts, x2 = countdata2$selev,
  y3 =detdata$y, x3 = detdata$selev, hessian = TRUE, method = 'BFGS')

# Get MLE and asymptotic SE and print and save
(tmp4 <- get_MLE(solJoint, 4))

          MLE     ASE  LCL.95  UCL.95
alpha  0.6861 0.01852  0.6498  0.7224
beta  -1.9704 0.03522 -2.0394 -1.9013

             MLE        ASE     LCL.95     UCL.95
alpha  0.6860979 0.01851556  0.6498081  0.7223878
beta  -1.9703732 0.03522001 -2.0394032 -1.9013432

diy_est <- tmp4[,1]

Not too different from an IPM!

You have the data
Challenge: Run a single integrated model in Stan
https://rpubs.com/amolina/im