Estimating animal abundance by accounting for imperfect detecting

Author: Ye Htet Lwin

Date: 2025-12-25

🐾 Content

Introduction

Model assumption

Pros and Cons of N-mixture model

Model syntax

Modelling procedure

Pros and Cons of N-mixture model

R demo

References

Acknowledgments

All the code used in this tutorial is based on the work of various authors, contributors, and library developers. All credit goes to them for their invaluable contributions and dedication to the programming community. Thank you for your hard work and for sharing your knowledge and tools.

🐅🐻 ** Thank you very much** 🦌🐒

Introduction

In real-world scenarios, it is challenging to detect all individuals in a population.

Imperfect detection!

Factors like habitat, behavior, and observer error affect detection rates.

Accurate population estimates require models that account for detection imperfections.

Imperfect detection

N-mixture model

A statistical model used to estimate abundance of species from repeated count data while accounting for detection probability. (Royle, 2004)

Abundance: Ni~ Poisson (λ) (Ecological or state process)
Detection: 𝐶ij ~Binomial(𝑁i, 𝑝) (Observation process)

Where,

Ni is the True abundance at site i.
λ is the expected abundance
𝐶ij is the count at site i during survey j
p is the per-individual detection probability

Model assumptions

Homogeneity of detection among Ni individuals
Closed population assumption (No births, deaths, immigration, or emigration during the study period!)
Independence detection
Parametric modeling assumptions
No false positive error (Avoid counting non-target species!No recounting!)

What data do we need?

Repeated count data
Site information
Covariates

Measurement error or detection error!
Study area	True abundance	First visit	Second visit	Third visit
A	3	2	1	2
B	4	2	0	3

Some terminology in N-mixture model

Occasion: Specific period of a count at a site.
Occasion length: How long for single occasion! (3 days, 7 days, etc.)
Detection History: Record of detections across occasions.
NA: Not available (No survey during that period)

Pros and cons of N-mixture model

Pros and cons of N-mixture model
Pros	Cons
Accounts for detection probability	Strict model assumptions
Count data is cheap to collect	Count data is less information than mark-recapture data
Does not require auxillary information (eg. Distance, double observer etc.)	Require lots of replications
Analysis is straight forward.	Overdispersion issues
Handles missing data (NA)	Some limitations in camera trap study
Only needs repeated count data
Extensibel with covariates
Assessible software
Implemented in R packages (eg. unmarked, ubms)

How many sample size do we need?

Sites (M),
Occasion (J) ###

N-mixture model syntax

Beware when we use NB!

Good fit / bad prediction dilemma

NB can handles overdispersion and sometime shows lower AIC and good fit.
But it can sometimes produce unrealistic abundance values.
NB can overfit. Don’t solely rely on NB estimates, check residual plot.

$Source - AHM Book$

Modelling procedure

R Demo

Install pacakges

library(unmarked)
library(AICcmodavg)
library(ggplot2)

Set working directory and load data

Check the data (Optional)

head(dataraw)

  sites N occ1 occ2 occ3 veght1 veght2 veght3 elevation   eu_road location
1     1 2    0    2    0    212    248    216       218  310.0000        A
2     2 5    1    2    5    217    238    238       209  630.0793        A
3     3 3    0    3    1    227    225    242       225 1100.0000        A
4     4 2    0    1    2    235    224    183       208 1315.7507        A
5     5 2    1    2    2    242    248    201       187 1175.9678        A
6     6 3    1    1    3    239    263    239       185  233.4524        A

summary(dataraw)

     sites             N              occ1          occ2       
 Min.   : 1.00   Min.   :0.000   Min.   :0.0   Min.   :0.0000  
 1st Qu.:18.25   1st Qu.:1.250   1st Qu.:0.0   1st Qu.:0.0000  
 Median :35.50   Median :2.000   Median :0.0   Median :0.5000  
 Mean   :35.50   Mean   :2.186   Mean   :0.3   Mean   :0.6714  
 3rd Qu.:52.75   3rd Qu.:3.000   3rd Qu.:1.0   3rd Qu.:1.0000  
 Max.   :70.00   Max.   :5.000   Max.   :1.0   Max.   :3.0000  
      occ3            veght1          veght2        veght3      elevation    
 Min.   :0.0000   Min.   : 18.0   Min.   : 16   Min.   :162   Min.   :165.0  
 1st Qu.:0.0000   1st Qu.:189.0   1st Qu.:199   1st Qu.:210   1st Qu.:198.2  
 Median :0.0000   Median :213.0   Median :232   Median :241   Median :208.0  
 Mean   :0.6286   Mean   :201.9   Mean   :205   Mean   :239   Mean   :209.6  
 3rd Qu.:1.0000   3rd Qu.:237.2   3rd Qu.:239   3rd Qu.:269   3rd Qu.:218.8  
 Max.   :5.0000   Max.   :286.0   Max.   :263   Max.   :305   Max.   :275.0  
    eu_road          location        
 Min.   :  14.14   Length:70         
 1st Qu.: 442.50   Class :character  
 Median : 768.16   Mode  :character  
 Mean   : 901.44                     
 3rd Qu.:1159.66                     
 Max.   :2575.62

Look at the data for true abundance (N)

table(dataraw$N) #True abundance distribution


 0  1  2  3  4  5 
 7 11 24 19  8  1

sum(dataraw$N) #153, Ture total population size at M sites

[1] 153

sum(dataraw$N>0) #63, True number of occupied sites

[1] 63

mean(dataraw$N) #2.185, True mean abundance (estimate of lambda)

[1] 2.185714

Look at the data for observation

table(apply(dataraw[,c(3:5)],1,max)) #Observed abundance distribution (max count)


 0  1  2  3  5 
26 28 13  2  1

sum(apply(dataraw[,c(3:5)],1,max)) #Observed total population size at M sites

[1] 65

sum(apply(dataraw[,c(3:5)],1,max)>0) #Observed number of occupied sites

[1] 44

mean(apply(dataraw[,c(3:5)],1,max)) #Observed mean 'relative abundance'

[1] 0.9285714

Look overall summary with table format

result <- data.frame(
  Abundance = c("True abundance (N)", "Observed abundance"),
  Total = c(sum(dataraw$N), sum(apply(dataraw[, 3:5], 1, max))),
  Occupied_sites = c(sum(dataraw$N > 0), sum(apply(dataraw[, 3:5], 1, max) > 0)),
  Mean = c(mean(dataraw$N), mean(apply(dataraw[, 3:5], 1, max)))
)

# Remove row names by setting them to NULL
row.names(result) <- NULL

# Print the result
print(result)

           Abundance Total Occupied_sites      Mean
1 True abundance (N)   153             63 2.1857143
2 Observed abundance    65             44 0.9285714

Until here is just checking our dataset information.
In real world ecological dataset, it is not possible to check actual abundance (N)

Fitting N-mixture model

1. Format data

y <- dataraw[,c(3:5)] ##Count Matrix for repeated count
siteCovs <- dataraw[,c('elevation','eu_road','location')] ##Site covariates for abundance
obsCovs=list(veght=dataraw[,c(6:8)]) ##Observation covariates for detection

2. Create unmarked dataframe

umf <- unmarkedFramePCount(y = y,   ##Count Matrix for repeated count
                           siteCovs = siteCovs,   ##Site covariates for abundance
                           obsCovs = obsCovs ) ##Observation covariates for detection

2.1 Check our unmarked frame

summary(umf)

unmarkedFrame Object

70 sites
Maximum number of observations per site: 3 
Mean number of observations per site: 3 
Sites with at least one detection: 44 

Tabulation of y observations:
  0   1   2   3   5 
124  65  18   2   1 

Site-level covariates:
   elevation        eu_road        location
 Min.   :165.0   Min.   :  14.14   A:30    
 1st Qu.:198.2   1st Qu.: 442.50   B:40    
 Median :208.0   Median : 768.16           
 Mean   :209.6   Mean   : 901.44           
 3rd Qu.:218.8   3rd Qu.:1159.66           
 Max.   :275.0   Max.   :2575.62           

Observation-level covariates:
     veght      
 Min.   : 16.0  
 1st Qu.:198.2  
 Median :226.0  
 Mean   :215.3  
 3rd Qu.:244.0  
 Max.   :305.0

Plot

plot(umf)

2.2 Testing K value with null model

Generally, there has two ways to decide K value

First approach - According to Kery et al. (2018), we can define K value as K=max(Ci,t) + 100

max(y,na.rm = TRUE) #5 is maximum K value (max(Ci,t)), then add 100

[1] 5

Use K value 105 for the model

null_p1=pcount(~1~1,K=105,data=umf) #According to (Kery et al., 2018)
summary(null_p1)  #AIC = 388.2797


Call:
pcount(formula = ~1 ~ 1, data = umf, K = 105)

Abundance (log-scale):
 Estimate    SE    z P(>|z|)
    0.392 0.207 1.89  0.0585

Detection (logit-scale):
 Estimate    SE     z P(>|z|)
   -0.573 0.308 -1.86   0.063

AIC: 388.2797 
Number of sites: 70

null_p2=pcount(~1~1,K=200,data=umf) #Test another K value > 105 to make                                      sure whether AIC value is stable or not
summary(null_p2)  #AIC = 388.2797


Call:
pcount(formula = ~1 ~ 1, data = umf, K = 200)

Abundance (log-scale):
 Estimate    SE    z P(>|z|)
    0.392 0.207 1.89  0.0585

Detection (logit-scale):
 Estimate    SE     z P(>|z|)
   -0.573 0.308 -1.86   0.063

AIC: 388.2797 
Number of sites: 70

AIC = 388.2797 for both null_p1 and null_p2
Same AIC value with different K value, it means we can use K value 105
If AIC is not stable, your model has identifiability issues
Then try another distributions OR add some covariates and then check the best model

2.2 Second approach to check K value

test_K <- function(K){
  fit <- suppressWarnings(pcount(~1~1, umf, K=K))
  exp(coef(fit)[1])
}

kvals <- c(8,10,15,20,25,30)
lams <- sapply(kvals, test_K)
plot(kvals, lams, type='o', pch=19, xlim=c(5, 30),
     xlab="K", ylab="lambda estimate")
abline(v=15, col='red', lty=2)

Based on second method, I will try K > 15.
And then, I will test again for AIC

null_p1=pcount(~1~1,K=20,data=umf)
summary(null_p1)  #AIC = 388.2797


Call:
pcount(formula = ~1 ~ 1, data = umf, K = 20)

Abundance (log-scale):
 Estimate    SE    z P(>|z|)
    0.392 0.207 1.89  0.0585

Detection (logit-scale):
 Estimate    SE     z P(>|z|)
   -0.573 0.308 -1.86   0.063

AIC: 388.2797 
Number of sites: 70

null_p2=pcount(~1~1,K=50,data=umf)
summary(null_p2)  #AIC = 388.2797


Call:
pcount(formula = ~1 ~ 1, data = umf, K = 50)

Abundance (log-scale):
 Estimate    SE    z P(>|z|)
    0.392 0.207 1.89  0.0585

Detection (logit-scale):
 Estimate    SE     z P(>|z|)
   -0.573 0.308 -1.86   0.063

AIC: 388.2797 
Number of sites: 70

AIC value doesn’t change. So I will choose K=20 for this tutorial
You can choose either first or second approach to decide your K value in your model as long as your model AIC is stable
Higer K value will impact your computation time
Now let’s look at our null model estimation
N-mixture used log-scale for abundance and logit scale for detection
So we need to backtransform to get actual mean abundance and detection

lambda=backTransform(null_p1,type='state') ##state for abundance process
lambda

Backtransformed linear combination(s) of Abundance estimate(s)

 Estimate    SE LinComb (Intercept)
     1.48 0.306   0.392           1

Transformation: exp

confint(lambda) #to get 95% confident interval

     0.025    0.975
 0.9860346 2.220318

detection=backTransform(null_p1,type='det') ##det for detection process
detection

Backtransformed linear combination(s) of Detection estimate(s)

 Estimate     SE LinComb (Intercept)
     0.36 0.0711  -0.573           1

Transformation: logistic

confint(detection) #to get 95% confident interval

     0.025     0.975
 0.2353875 0.5077896

Compare with original dataset (Recall the original dataset)

print(result)

           Abundance Total Occupied_sites      Mean
1 True abundance (N)   153             63 2.1857143
2 Observed abundance    65             44 0.9285714

Our null model estimate:
For abundance 1.48 (0.986 Lower CI, 2.22 Upper CI)
For detection 0.36 (0.235 Lower CI, 0.507 Upper CI)
Close to true abundance!
Start fitting different models with covariates

3. Fit Model with pcount

First for detection, abundance as constant (i.e., 1)

det_full=pcount(~scale(veght)+scale(elevation)~1,
                K=20,mixture='P',data=umf) #all detection variables
summary(det_full)


Call:
pcount(formula = ~scale(veght) + scale(elevation) ~ 1, data = umf, 
    K = 20, mixture = "P")

Abundance (log-scale):
 Estimate    SE    z P(>|z|)
     0.38 0.193 1.98  0.0482

Detection (logit-scale):
                 Estimate    SE      z P(>|z|)
(Intercept)       -0.5635 0.292 -1.932  0.0533
scale(veght)       0.0234 0.138  0.169  0.8657
scale(elevation)  -0.4410 0.200 -2.201  0.0277

AIC: 387.277 
Number of sites: 70

det_veght=pcount(~scale(veght)~1,
                K=20,mixture='P',data=umf) #only veght variable
summary(det_veght)


Call:
pcount(formula = ~scale(veght) ~ 1, data = umf, K = 20, mixture = "P")

Abundance (log-scale):
 Estimate    SE    z P(>|z|)
    0.399 0.209 1.91  0.0565

Detection (logit-scale):
             Estimate    SE      z P(>|z|)
(Intercept)   -0.5861 0.311 -1.885  0.0595
scale(veght)   0.0573 0.137  0.418  0.6759

AIC: 390.1048 
Number of sites: 70

det_elevation=pcount(~scale(elevation)~1,
                     K=20,mixture='P',data=umf) #only elevation variable
summary(det_elevation)


Call:
pcount(formula = ~scale(elevation) ~ 1, data = umf, K = 20, mixture = "P")

Abundance (log-scale):
 Estimate    SE    z P(>|z|)
    0.378 0.192 1.97  0.0483

Detection (logit-scale):
                 Estimate    SE     z P(>|z|)
(Intercept)        -0.559 0.290 -1.93  0.0536
scale(elevation)   -0.446 0.198 -2.25  0.0246

AIC: 385.3056 
Number of sites: 70

4. Let’s select the best detection model

Model selection for detection process

Listing models

Modeles <- list(det_full,det_veght,det_elevation)

Names   <- c("det_full","det_veght","det_elevation")

aictab(cand.set = Modeles, modnames = Names)


Model selection based on AICc:

              K   AICc Delta_AICc AICcWt Cum.Wt      LL
det_elevation 3 385.67       0.00   0.70   0.70 -189.65
det_full      4 387.89       2.22   0.23   0.94 -189.64
det_veght     3 390.47       4.80   0.06   1.00 -192.05

det_elevation is the best model, then fit for abundance

(mfull =pcount(~scale(elevation)~scale(elevation)+scale(eu_road)+location,
               K=20,mixture='P',data = umf))


Call:
pcount(formula = ~scale(elevation) ~ scale(elevation) + scale(eu_road) + 
    location, data = umf, K = 20, mixture = "P")

Abundance (log-scale):
                 Estimate    SE     z P(>|z|)
(Intercept)         0.872 0.275  3.17 0.00154
scale(elevation)    0.542 0.284  1.91 0.05632
scale(eu_road)     -0.120 0.154 -0.78 0.43538
locationB          -0.716 0.279 -2.56 0.01036

Detection (logit-scale):
                 Estimate    SE     z P(>|z|)
(Intercept)        -0.708 0.342 -2.07  0.0382
scale(elevation)   -0.762 0.305 -2.50  0.0125

AIC: 383.0385 
Number of sites: 70

(m1 =pcount(~scale(elevation)~scale(elevation)+scale(eu_road),
            K=20,mixture='P',data = umf))


Call:
pcount(formula = ~scale(elevation) ~ scale(elevation) + scale(eu_road), 
    data = umf, K = 20, mixture = "P")

Abundance (log-scale):
                 Estimate    SE      z P(>|z|)
(Intercept)        0.4868 0.240  2.024  0.0429
scale(elevation)   0.3089 0.267  1.155  0.2479
scale(eu_road)    -0.0911 0.150 -0.607  0.5437

Detection (logit-scale):
                 Estimate    SE     z P(>|z|)
(Intercept)        -0.686 0.334 -2.06  0.0397
scale(elevation)   -0.716 0.307 -2.33  0.0196

AIC: 387.7889 
Number of sites: 70

(m2 =pcount(~scale(elevation)~scale(elevation)+location,
            K=20,mixture='P',data = umf))


Call:
pcount(formula = ~scale(elevation) ~ scale(elevation) + location, 
    data = umf, K = 20, mixture = "P")

Abundance (log-scale):
                 Estimate    SE     z P(>|z|)
(Intercept)         0.864 0.274  3.15 0.00161
scale(elevation)    0.470 0.269  1.74 0.08103
locationB          -0.702 0.279 -2.52 0.01186

Detection (logit-scale):
                 Estimate    SE     z P(>|z|)
(Intercept)        -0.705 0.341 -2.07  0.0386
scale(elevation)   -0.756 0.304 -2.48  0.0130

AIC: 381.6681 
Number of sites: 70

(m3 =pcount(~scale(elevation)~scale(eu_road)+location,
            K=20,mixture='P',data = umf))


Call:
pcount(formula = ~scale(elevation) ~ scale(eu_road) + location, 
    data = umf, K = 20, mixture = "P")

Abundance (log-scale):
               Estimate    SE       z P(>|z|)
(Intercept)      0.6291 0.236  2.6615 0.00778
scale(eu_road)  -0.0106 0.145 -0.0732 0.94164
locationB       -0.5289 0.266 -1.9913 0.04644

Detection (logit-scale):
                 Estimate    SE     z P(>|z|)
(Intercept)        -0.551 0.311 -1.77  0.0769
scale(elevation)   -0.283 0.253 -1.12  0.2637

AIC: 385.1921 
Number of sites: 70

(m4 =pcount(~scale(elevation)~scale(elevation),
            K=20,mixture='P',data = umf))


Call:
pcount(formula = ~scale(elevation) ~ scale(elevation), data = umf, 
    K = 20, mixture = "P")

Abundance (log-scale):
                 Estimate    SE    z P(>|z|)
(Intercept)         0.488 0.240 2.03  0.0423
scale(elevation)    0.255 0.253 1.01  0.3132

Detection (logit-scale):
                 Estimate    SE     z P(>|z|)
(Intercept)        -0.685 0.334 -2.05  0.0403
scale(elevation)   -0.712 0.307 -2.32  0.0203

AIC: 386.1667 
Number of sites: 70

(m5 =pcount(~scale(elevation)~scale(eu_road),
            K=20,mixture='P',data = umf))


Call:
pcount(formula = ~scale(elevation) ~ scale(eu_road), data = umf, 
    K = 20, mixture = "P")

Abundance (log-scale):
               Estimate    SE      z P(>|z|)
(Intercept)       0.374 0.193  1.932  0.0533
scale(eu_road)   -0.027 0.141 -0.192  0.8476

Detection (logit-scale):
                 Estimate    SE     z P(>|z|)
(Intercept)        -0.556 0.291 -1.91  0.0565
scale(elevation)   -0.429 0.218 -1.97  0.0487

AIC: 387.2684 
Number of sites: 70

(m6 =pcount(~scale(elevation)~location,
            K=20,mixture='P',data = umf))


Call:
pcount(formula = ~scale(elevation) ~ location, data = umf, K = 20, 
    mixture = "P")

Abundance (log-scale):
            Estimate    SE     z P(>|z|)
(Intercept)     0.63 0.235  2.68 0.00733
locationB      -0.53 0.265 -2.00 0.04562

Detection (logit-scale):
                 Estimate    SE     z P(>|z|)
(Intercept)        -0.549 0.309 -1.78  0.0757
scale(elevation)   -0.291 0.232 -1.25  0.2111

AIC: 383.1975 
Number of sites: 70

Let’s select again for the best abundance model

Model selection for abundance process

Listing models

Modeles <- list(mfull,m1,m2,m3,m4,m5,m6)

Names   <- c("mfull","m1","m2","m3","m4","m5","m6")

aictab(cand.set = Modeles, modnames = Names)


Model selection based on AICc:

      K   AICc Delta_AICc AICcWt Cum.Wt      LL
m2    5 382.61       0.00   0.42   0.42 -185.83
m6    4 383.81       1.21   0.23   0.65 -187.60
mfull 6 384.37       1.77   0.17   0.83 -185.52
m3    5 386.13       3.52   0.07   0.90 -187.60
m4    4 386.78       4.18   0.05   0.95 -189.08
m5    4 387.88       5.28   0.03   0.98 -189.63
m1    5 388.73       6.12   0.02   1.00 -188.89

Finally, m2 is the best model for both detection and abundance

Caution!

There has multiple ways for model selection procedure and you can choose
approapriate method based on your experience and literature review
In our tutorial, m2 is the best model

(m2 =pcount(~scale(elevation)~scale(elevation)+location,
            K=20, mixture='P',data = umf)) #Best model


Call:
pcount(formula = ~scale(elevation) ~ scale(elevation) + location, 
    data = umf, K = 20, mixture = "P")

Abundance (log-scale):
                 Estimate    SE     z P(>|z|)
(Intercept)         0.864 0.274  3.15 0.00161
scale(elevation)    0.470 0.269  1.74 0.08103
locationB          -0.702 0.279 -2.52 0.01186

Detection (logit-scale):
                 Estimate    SE     z P(>|z|)
(Intercept)        -0.705 0.341 -2.07  0.0386
scale(elevation)   -0.756 0.304 -2.48  0.0130

AIC: 381.6681 
Number of sites: 70

Check another two distributions: ZIP and NB

(m2_zip =pcount(~scale(elevation)~scale(elevation)+location,
                K=20,mixture='ZIP',data = umf))


Call:
pcount(formula = ~scale(elevation) ~ scale(elevation) + location, 
    data = umf, K = 20, mixture = "ZIP")

Abundance (log-scale):
                 Estimate    SE     z P(>|z|)
(Intercept)         1.069 0.434  2.46  0.0138
scale(elevation)    0.540 0.322  1.68  0.0934
locationB          -0.705 0.288 -2.45  0.0145

Detection (logit-scale):
                 Estimate    SE     z P(>|z|)
(Intercept)        -0.855 0.464 -1.84  0.0653
scale(elevation)   -0.812 0.346 -2.34  0.0190

Zero-inflation (logit-scale):
 Estimate   SE     z P(>|z|)
    -2.31 1.48 -1.56   0.119

AIC: 383.1792 
Number of sites: 70

(m2_nb =pcount(~scale(elevation)~scale(elevation)+location,
               K=20,mixture='NB',data = umf))


Call:
pcount(formula = ~scale(elevation) ~ scale(elevation) + location, 
    data = umf, K = 20, mixture = "NB")

Abundance (log-scale):
                 Estimate    SE     z P(>|z|)
(Intercept)         0.876 0.293  2.99  0.0028
scale(elevation)    0.475 0.277  1.72  0.0863
locationB          -0.700 0.282 -2.48  0.0131

Detection (logit-scale):
                 Estimate    SE     z P(>|z|)
(Intercept)        -0.724 0.372 -1.95  0.0517
scale(elevation)   -0.762 0.312 -2.45  0.0145

Dispersion (log-scale):
 Estimate   SE     z P(>|z|)
     3.68 7.04 0.523   0.601

AIC: 383.6468 
Number of sites: 70

Model selection again between different mixtures

Modeles <- list(m2,m2_zip,m2_nb)

Names   <- c("m2","m2_zip","m2_nb")

aictab(cand.set = Modeles, modnames = Names)


Model selection based on AICc:

       K   AICc Delta_AICc AICcWt Cum.Wt      LL
m2     5 382.61       0.00   0.59   0.59 -185.83
m2_zip 6 384.51       1.91   0.23   0.82 -185.59
m2_nb  6 384.98       2.37   0.18   1.00 -185.82

Poisson mixture won among three mixtures
Sometime NB shows lower AIC than other two mixtures
In this case, it would be nice to check residual plot because of good fit/ bad prediction dilemma,
Don’t solely rely on NB estimates, as they can sometimes provide unrealistic abundance values.

5. Goodness of fit test

gof=Nmix.gof.test(m2,nsim=100)  #Use 1000 nsim for real data

gof


Chi-square goodness-of-fit for N-mixture model of 'unmarkedFitPCount' class

Observed chi-square statistic = 206.785 
Number of bootstrap samples = 100
P-value = 0.45

Quantiles of bootstrapped statistics:
  0%  25%  50%  75% 100% 
 158  188  202  223  332 

Estimate of c-hat = 1

We got c-hat value 1. It means no overdispersion, Good fit
If c.hat value > 1,our model has overdispersion problem
If c.hat value < 1, our model has underdispersion problem
Then you need to take into account c-hat value in your prediction

Check three results and compare with true mean

cbind(rbind("Poisson" = exp(coef(m2)[1]), 
            "NegBin" = exp(coef(m2_nb)[1]), 
            "ZIP"= exp(coef(m2_zip)[1])))

        lam(Int)
Poisson 2.373612
NegBin  2.401943
ZIP     2.911520

Print the original dataset

print(result)

           Abundance Total Occupied_sites      Mean
1 True abundance (N)   153             63 2.1857143
2 Observed abundance    65             44 0.9285714

Pretty close estimation!

6. Prediction

For Detection

Elevation variable
Hold everything constant except for elevation

newDat <- data.frame(elevation = seq(min(dataraw$elevation), max(dataraw$elevation), length=100))
Epred=modavgPred(cand.set = list(m2), newdata=newDat, parm.type = "detect", type= "response")

newDat$mod_avg_pred <- Epred$mod.avg.pred
newDat$uncond_se <- Epred$uncond.se
newDat$low95 <- Epred$lower.CL
newDat$upp95 <- Epred$upper.CL

###Plotting the result for elevation

ggplot(newDat, aes(x = elevation)) + 
  geom_line(aes(y = mod_avg_pred), colour = "blue") + 
  geom_ribbon(aes(ymin = low95, ymax = upp95), alpha = 0.2) +
  ylim(0, 1) +  
  labs(y = expression("Detection probability"), 
       x = "Elevation (m)") +
  theme_classic()

For Abundance

Location variable
Hold everything constant except for location

newDat <- data.frame(location=factor(c("A", "B")), elevation = mean(dataraw$elevation))
Epred=modavgPred(cand.set = list(m2), newdata=newDat, parm.type = "lambda", type = "response")

newDat$mod_avg_pred <- Epred$mod.avg.pred
newDat$uncond_se <- Epred$uncond.se
newDat$low95 <- Epred$lower.CL
newDat$upp95 <- Epred$upper.CL

###Plotting the result for eu_water
ggplot(newDat, aes(location,mod_avg_pred)) +
  geom_col(aes(fill=location), color = "black", width = 0.85) +
  ylim(0,10) +
  geom_errorbar(aes(ymin = low95,
                    ymax = upp95),
                color = "#22292F",
                width = .1)+
  labs(y = expression("Predicted abundance"), 
       x = "Location of camera traps")+
  theme_classic()

We do not need to draw a plot for elevation in abundance process becasue that variable is not significant in m2

Thank you very much :)

References

M. Kéry y J. A. Royle. 2016. Applied hierarchical modeling in ecology: analysis of distribution, abundance and species richness in R and BUGS. Vol. 1 Prelude and static models. Academic Press, London, UK. 783 Pp. [ISBN: 978-0-12-801378-6].
Quantitative methods for population dynamics. Two day workshop by Olivier Gimenez, Aurélien Besnard and Sarah Cubaynes. https://oliviergimenez.github.io/popdyn-workshop/
Kéry, M. 2018. Identifiability in N‐mixture models: a large‐scale screening test with bird data. Ecology 99:281–288.

Sidebar

Estimating animal abundance by accounting for imperfect detecting

🐾 Content

Acknowledgments

Introduction

Introduction

N-mixture model

Where,

Model assumptions

Model assumptions

What data do we need?

Some terminology in N-mixture model

Pros and cons of N-mixture model

Pros and cons of N-mixture model

How many sample size do we need?

N-mixture model syntax

N-mixture model syntax

Good fit / bad prediction dilemma

Modelling procedure

Modelling procedure

R Demo

R Demo