R Demo
Install pacakges
library(unmarked)
library(AICcmodavg)
library(ggplot2)
Set working directory and load data
Check the data (Optional)
sites N occ1 occ2 occ3 veght1 veght2 veght3 elevation eu_road location
1 1 2 0 2 0 212 248 216 218 310.0000 A
2 2 5 1 2 5 217 238 238 209 630.0793 A
3 3 3 0 3 1 227 225 242 225 1100.0000 A
4 4 2 0 1 2 235 224 183 208 1315.7507 A
5 5 2 1 2 2 242 248 201 187 1175.9678 A
6 6 3 1 1 3 239 263 239 185 233.4524 A
sites N occ1 occ2
Min. : 1.00 Min. :0.000 Min. :0.0 Min. :0.0000
1st Qu.:18.25 1st Qu.:1.250 1st Qu.:0.0 1st Qu.:0.0000
Median :35.50 Median :2.000 Median :0.0 Median :0.5000
Mean :35.50 Mean :2.186 Mean :0.3 Mean :0.6714
3rd Qu.:52.75 3rd Qu.:3.000 3rd Qu.:1.0 3rd Qu.:1.0000
Max. :70.00 Max. :5.000 Max. :1.0 Max. :3.0000
occ3 veght1 veght2 veght3 elevation
Min. :0.0000 Min. : 18.0 Min. : 16 Min. :162 Min. :165.0
1st Qu.:0.0000 1st Qu.:189.0 1st Qu.:199 1st Qu.:210 1st Qu.:198.2
Median :0.0000 Median :213.0 Median :232 Median :241 Median :208.0
Mean :0.6286 Mean :201.9 Mean :205 Mean :239 Mean :209.6
3rd Qu.:1.0000 3rd Qu.:237.2 3rd Qu.:239 3rd Qu.:269 3rd Qu.:218.8
Max. :5.0000 Max. :286.0 Max. :263 Max. :305 Max. :275.0
eu_road location
Min. : 14.14 Length:70
1st Qu.: 442.50 Class :character
Median : 768.16 Mode :character
Mean : 901.44
3rd Qu.:1159.66
Max. :2575.62
Look at the data for true abundance (N)
table(dataraw$N) #True abundance distribution
0 1 2 3 4 5
7 11 24 19 8 1
sum(dataraw$N) #153, Ture total population size at M sites
[1] 153
sum(dataraw$N>0) #63, True number of occupied sites
[1] 63
mean(dataraw$N) #2.185, True mean abundance (estimate of lambda)
[1] 2.185714
Look at the data for observation
table(apply(dataraw[,c(3:5)],1,max)) #Observed abundance distribution (max count)
0 1 2 3 5
26 28 13 2 1
sum(apply(dataraw[,c(3:5)],1,max)) #Observed total population size at M sites
[1] 65
sum(apply(dataraw[,c(3:5)],1,max)>0) #Observed number of occupied sites
[1] 44
mean(apply(dataraw[,c(3:5)],1,max)) #Observed mean 'relative abundance'
[1] 0.9285714
Look overall summary with table format
result <- data.frame(
Abundance = c("True abundance (N)", "Observed abundance"),
Total = c(sum(dataraw$N), sum(apply(dataraw[, 3:5], 1, max))),
Occupied_sites = c(sum(dataraw$N > 0), sum(apply(dataraw[, 3:5], 1, max) > 0)),
Mean = c(mean(dataraw$N), mean(apply(dataraw[, 3:5], 1, max)))
)
# Remove row names by setting them to NULL
row.names(result) <- NULL
# Print the result
print(result)
Abundance Total Occupied_sites Mean
1 True abundance (N) 153 63 2.1857143
2 Observed abundance 65 44 0.9285714
Until here is just checking our dataset information.
In real world ecological dataset, it is not possible to check
actual abundance (N)
Fitting N-mixture model
1. Format data
y <- dataraw[,c(3:5)] ##Count Matrix for repeated count
siteCovs <- dataraw[,c('elevation','eu_road','location')] ##Site covariates for abundance
obsCovs=list(veght=dataraw[,c(6:8)]) ##Observation covariates for detection
2. Create unmarked dataframe
umf <- unmarkedFramePCount(y = y, ##Count Matrix for repeated count
siteCovs = siteCovs, ##Site covariates for abundance
obsCovs = obsCovs ) ##Observation covariates for detection
2.1 Check our unmarked frame
unmarkedFrame Object
70 sites
Maximum number of observations per site: 3
Mean number of observations per site: 3
Sites with at least one detection: 44
Tabulation of y observations:
0 1 2 3 5
124 65 18 2 1
Site-level covariates:
elevation eu_road location
Min. :165.0 Min. : 14.14 A:30
1st Qu.:198.2 1st Qu.: 442.50 B:40
Median :208.0 Median : 768.16
Mean :209.6 Mean : 901.44
3rd Qu.:218.8 3rd Qu.:1159.66
Max. :275.0 Max. :2575.62
Observation-level covariates:
veght
Min. : 16.0
1st Qu.:198.2
Median :226.0
Mean :215.3
3rd Qu.:244.0
Max. :305.0
Plot

2.2 Testing K value with null model
Generally, there has two ways to decide K value
- First approach - According to Kery et al. (2018), we can define K
value as K=max(Ci,t) + 100
max(y,na.rm = TRUE) #5 is maximum K value (max(Ci,t)), then add 100
[1] 5
- Use K value 105 for the model
null_p1=pcount(~1~1,K=105,data=umf) #According to (Kery et al., 2018)
summary(null_p1) #AIC = 388.2797
Call:
pcount(formula = ~1 ~ 1, data = umf, K = 105)
Abundance (log-scale):
Estimate SE z P(>|z|)
0.392 0.207 1.89 0.0585
Detection (logit-scale):
Estimate SE z P(>|z|)
-0.573 0.308 -1.86 0.063
AIC: 388.2797
Number of sites: 70
null_p2=pcount(~1~1,K=200,data=umf) #Test another K value > 105 to make sure whether AIC value is stable or not
summary(null_p2) #AIC = 388.2797
Call:
pcount(formula = ~1 ~ 1, data = umf, K = 200)
Abundance (log-scale):
Estimate SE z P(>|z|)
0.392 0.207 1.89 0.0585
Detection (logit-scale):
Estimate SE z P(>|z|)
-0.573 0.308 -1.86 0.063
AIC: 388.2797
Number of sites: 70
AIC = 388.2797 for both null_p1 and null_p2
Same AIC value with different K value, it means we can use K
value 105
If AIC is not stable, your model has identifiability
issues
Then try another distributions OR add some covariates and then
check the best model
2.2 Second approach to check K value
test_K <- function(K){
fit <- suppressWarnings(pcount(~1~1, umf, K=K))
exp(coef(fit)[1])
}
kvals <- c(8,10,15,20,25,30)
lams <- sapply(kvals, test_K)
plot(kvals, lams, type='o', pch=19, xlim=c(5, 30),
xlab="K", ylab="lambda estimate")
abline(v=15, col='red', lty=2)

Based on second method, I will try K > 15.
And then, I will test again for AIC
null_p1=pcount(~1~1,K=20,data=umf)
summary(null_p1) #AIC = 388.2797
Call:
pcount(formula = ~1 ~ 1, data = umf, K = 20)
Abundance (log-scale):
Estimate SE z P(>|z|)
0.392 0.207 1.89 0.0585
Detection (logit-scale):
Estimate SE z P(>|z|)
-0.573 0.308 -1.86 0.063
AIC: 388.2797
Number of sites: 70
null_p2=pcount(~1~1,K=50,data=umf)
summary(null_p2) #AIC = 388.2797
Call:
pcount(formula = ~1 ~ 1, data = umf, K = 50)
Abundance (log-scale):
Estimate SE z P(>|z|)
0.392 0.207 1.89 0.0585
Detection (logit-scale):
Estimate SE z P(>|z|)
-0.573 0.308 -1.86 0.063
AIC: 388.2797
Number of sites: 70
AIC value doesn’t change. So I will choose K=20 for this
tutorial
You can choose either first or second approach to decide your K
value in your model as long as your model AIC is stable
Higer K value will impact your computation time
Now let’s look at our null model estimation
N-mixture used log-scale for abundance and logit scale for
detection
So we need to backtransform to get actual mean abundance and
detection
lambda=backTransform(null_p1,type='state') ##state for abundance process
lambda
Backtransformed linear combination(s) of Abundance estimate(s)
Estimate SE LinComb (Intercept)
1.48 0.306 0.392 1
Transformation: exp
confint(lambda) #to get 95% confident interval
0.025 0.975
0.9860346 2.220318
detection=backTransform(null_p1,type='det') ##det for detection process
detection
Backtransformed linear combination(s) of Detection estimate(s)
Estimate SE LinComb (Intercept)
0.36 0.0711 -0.573 1
Transformation: logistic
confint(detection) #to get 95% confident interval
0.025 0.975
0.2353875 0.5077896
- Compare with original dataset (Recall the original dataset)
Abundance Total Occupied_sites Mean
1 True abundance (N) 153 63 2.1857143
2 Observed abundance 65 44 0.9285714
Our null model estimate:
For abundance 1.48 (0.986 Lower CI, 2.22 Upper CI)
For detection 0.36 (0.235 Lower CI, 0.507 Upper CI)
Close to true abundance!
Start fitting different models with covariates
3. Fit Model with pcount
- First for detection, abundance as constant (i.e., 1)
det_full=pcount(~scale(veght)+scale(elevation)~1,
K=20,mixture='P',data=umf) #all detection variables
summary(det_full)
Call:
pcount(formula = ~scale(veght) + scale(elevation) ~ 1, data = umf,
K = 20, mixture = "P")
Abundance (log-scale):
Estimate SE z P(>|z|)
0.38 0.193 1.98 0.0482
Detection (logit-scale):
Estimate SE z P(>|z|)
(Intercept) -0.5635 0.292 -1.932 0.0533
scale(veght) 0.0234 0.138 0.169 0.8657
scale(elevation) -0.4410 0.200 -2.201 0.0277
AIC: 387.277
Number of sites: 70
det_veght=pcount(~scale(veght)~1,
K=20,mixture='P',data=umf) #only veght variable
summary(det_veght)
Call:
pcount(formula = ~scale(veght) ~ 1, data = umf, K = 20, mixture = "P")
Abundance (log-scale):
Estimate SE z P(>|z|)
0.399 0.209 1.91 0.0565
Detection (logit-scale):
Estimate SE z P(>|z|)
(Intercept) -0.5861 0.311 -1.885 0.0595
scale(veght) 0.0573 0.137 0.418 0.6759
AIC: 390.1048
Number of sites: 70
det_elevation=pcount(~scale(elevation)~1,
K=20,mixture='P',data=umf) #only elevation variable
summary(det_elevation)
Call:
pcount(formula = ~scale(elevation) ~ 1, data = umf, K = 20, mixture = "P")
Abundance (log-scale):
Estimate SE z P(>|z|)
0.378 0.192 1.97 0.0483
Detection (logit-scale):
Estimate SE z P(>|z|)
(Intercept) -0.559 0.290 -1.93 0.0536
scale(elevation) -0.446 0.198 -2.25 0.0246
AIC: 385.3056
Number of sites: 70
4. Let’s select the best detection model
Model selection for detection process
Listing models
Modeles <- list(det_full,det_veght,det_elevation)
Names <- c("det_full","det_veght","det_elevation")
aictab(cand.set = Modeles, modnames = Names)
Model selection based on AICc:
K AICc Delta_AICc AICcWt Cum.Wt LL
det_elevation 3 385.67 0.00 0.70 0.70 -189.65
det_full 4 387.89 2.22 0.23 0.94 -189.64
det_veght 3 390.47 4.80 0.06 1.00 -192.05
- det_elevation is the best model, then fit for abundance
(mfull =pcount(~scale(elevation)~scale(elevation)+scale(eu_road)+location,
K=20,mixture='P',data = umf))
Call:
pcount(formula = ~scale(elevation) ~ scale(elevation) + scale(eu_road) +
location, data = umf, K = 20, mixture = "P")
Abundance (log-scale):
Estimate SE z P(>|z|)
(Intercept) 0.872 0.275 3.17 0.00154
scale(elevation) 0.542 0.284 1.91 0.05632
scale(eu_road) -0.120 0.154 -0.78 0.43538
locationB -0.716 0.279 -2.56 0.01036
Detection (logit-scale):
Estimate SE z P(>|z|)
(Intercept) -0.708 0.342 -2.07 0.0382
scale(elevation) -0.762 0.305 -2.50 0.0125
AIC: 383.0385
Number of sites: 70
(m1 =pcount(~scale(elevation)~scale(elevation)+scale(eu_road),
K=20,mixture='P',data = umf))
Call:
pcount(formula = ~scale(elevation) ~ scale(elevation) + scale(eu_road),
data = umf, K = 20, mixture = "P")
Abundance (log-scale):
Estimate SE z P(>|z|)
(Intercept) 0.4868 0.240 2.024 0.0429
scale(elevation) 0.3089 0.267 1.155 0.2479
scale(eu_road) -0.0911 0.150 -0.607 0.5437
Detection (logit-scale):
Estimate SE z P(>|z|)
(Intercept) -0.686 0.334 -2.06 0.0397
scale(elevation) -0.716 0.307 -2.33 0.0196
AIC: 387.7889
Number of sites: 70
(m2 =pcount(~scale(elevation)~scale(elevation)+location,
K=20,mixture='P',data = umf))
Call:
pcount(formula = ~scale(elevation) ~ scale(elevation) + location,
data = umf, K = 20, mixture = "P")
Abundance (log-scale):
Estimate SE z P(>|z|)
(Intercept) 0.864 0.274 3.15 0.00161
scale(elevation) 0.470 0.269 1.74 0.08103
locationB -0.702 0.279 -2.52 0.01186
Detection (logit-scale):
Estimate SE z P(>|z|)
(Intercept) -0.705 0.341 -2.07 0.0386
scale(elevation) -0.756 0.304 -2.48 0.0130
AIC: 381.6681
Number of sites: 70
(m3 =pcount(~scale(elevation)~scale(eu_road)+location,
K=20,mixture='P',data = umf))
Call:
pcount(formula = ~scale(elevation) ~ scale(eu_road) + location,
data = umf, K = 20, mixture = "P")
Abundance (log-scale):
Estimate SE z P(>|z|)
(Intercept) 0.6291 0.236 2.6615 0.00778
scale(eu_road) -0.0106 0.145 -0.0732 0.94164
locationB -0.5289 0.266 -1.9913 0.04644
Detection (logit-scale):
Estimate SE z P(>|z|)
(Intercept) -0.551 0.311 -1.77 0.0769
scale(elevation) -0.283 0.253 -1.12 0.2637
AIC: 385.1921
Number of sites: 70
(m4 =pcount(~scale(elevation)~scale(elevation),
K=20,mixture='P',data = umf))
Call:
pcount(formula = ~scale(elevation) ~ scale(elevation), data = umf,
K = 20, mixture = "P")
Abundance (log-scale):
Estimate SE z P(>|z|)
(Intercept) 0.488 0.240 2.03 0.0423
scale(elevation) 0.255 0.253 1.01 0.3132
Detection (logit-scale):
Estimate SE z P(>|z|)
(Intercept) -0.685 0.334 -2.05 0.0403
scale(elevation) -0.712 0.307 -2.32 0.0203
AIC: 386.1667
Number of sites: 70
(m5 =pcount(~scale(elevation)~scale(eu_road),
K=20,mixture='P',data = umf))
Call:
pcount(formula = ~scale(elevation) ~ scale(eu_road), data = umf,
K = 20, mixture = "P")
Abundance (log-scale):
Estimate SE z P(>|z|)
(Intercept) 0.374 0.193 1.932 0.0533
scale(eu_road) -0.027 0.141 -0.192 0.8476
Detection (logit-scale):
Estimate SE z P(>|z|)
(Intercept) -0.556 0.291 -1.91 0.0565
scale(elevation) -0.429 0.218 -1.97 0.0487
AIC: 387.2684
Number of sites: 70
(m6 =pcount(~scale(elevation)~location,
K=20,mixture='P',data = umf))
Call:
pcount(formula = ~scale(elevation) ~ location, data = umf, K = 20,
mixture = "P")
Abundance (log-scale):
Estimate SE z P(>|z|)
(Intercept) 0.63 0.235 2.68 0.00733
locationB -0.53 0.265 -2.00 0.04562
Detection (logit-scale):
Estimate SE z P(>|z|)
(Intercept) -0.549 0.309 -1.78 0.0757
scale(elevation) -0.291 0.232 -1.25 0.2111
AIC: 383.1975
Number of sites: 70
- Let’s select again for the best abundance model
Model selection for abundance process
Listing models
Modeles <- list(mfull,m1,m2,m3,m4,m5,m6)
Names <- c("mfull","m1","m2","m3","m4","m5","m6")
aictab(cand.set = Modeles, modnames = Names)
Model selection based on AICc:
K AICc Delta_AICc AICcWt Cum.Wt LL
m2 5 382.61 0.00 0.42 0.42 -185.83
m6 4 383.81 1.21 0.23 0.65 -187.60
mfull 6 384.37 1.77 0.17 0.83 -185.52
m3 5 386.13 3.52 0.07 0.90 -187.60
m4 4 386.78 4.18 0.05 0.95 -189.08
m5 4 387.88 5.28 0.03 0.98 -189.63
m1 5 388.73 6.12 0.02 1.00 -188.89
- Finally, m2 is the best model for both detection and abundance
Caution!
There has multiple ways for model selection procedure and you can
choose
approapriate method based on your experience and literature
review
In our tutorial, m2 is the best model
(m2 =pcount(~scale(elevation)~scale(elevation)+location,
K=20, mixture='P',data = umf)) #Best model
Call:
pcount(formula = ~scale(elevation) ~ scale(elevation) + location,
data = umf, K = 20, mixture = "P")
Abundance (log-scale):
Estimate SE z P(>|z|)
(Intercept) 0.864 0.274 3.15 0.00161
scale(elevation) 0.470 0.269 1.74 0.08103
locationB -0.702 0.279 -2.52 0.01186
Detection (logit-scale):
Estimate SE z P(>|z|)
(Intercept) -0.705 0.341 -2.07 0.0386
scale(elevation) -0.756 0.304 -2.48 0.0130
AIC: 381.6681
Number of sites: 70
- Check another two distributions: ZIP and NB
(m2_zip =pcount(~scale(elevation)~scale(elevation)+location,
K=20,mixture='ZIP',data = umf))
Call:
pcount(formula = ~scale(elevation) ~ scale(elevation) + location,
data = umf, K = 20, mixture = "ZIP")
Abundance (log-scale):
Estimate SE z P(>|z|)
(Intercept) 1.069 0.434 2.46 0.0138
scale(elevation) 0.540 0.322 1.68 0.0934
locationB -0.705 0.288 -2.45 0.0145
Detection (logit-scale):
Estimate SE z P(>|z|)
(Intercept) -0.855 0.464 -1.84 0.0653
scale(elevation) -0.812 0.346 -2.34 0.0190
Zero-inflation (logit-scale):
Estimate SE z P(>|z|)
-2.31 1.48 -1.56 0.119
AIC: 383.1792
Number of sites: 70
(m2_nb =pcount(~scale(elevation)~scale(elevation)+location,
K=20,mixture='NB',data = umf))
Call:
pcount(formula = ~scale(elevation) ~ scale(elevation) + location,
data = umf, K = 20, mixture = "NB")
Abundance (log-scale):
Estimate SE z P(>|z|)
(Intercept) 0.876 0.293 2.99 0.0028
scale(elevation) 0.475 0.277 1.72 0.0863
locationB -0.700 0.282 -2.48 0.0131
Detection (logit-scale):
Estimate SE z P(>|z|)
(Intercept) -0.724 0.372 -1.95 0.0517
scale(elevation) -0.762 0.312 -2.45 0.0145
Dispersion (log-scale):
Estimate SE z P(>|z|)
3.68 7.04 0.523 0.601
AIC: 383.6468
Number of sites: 70
Model selection again between different mixtures
Modeles <- list(m2,m2_zip,m2_nb)
Names <- c("m2","m2_zip","m2_nb")
aictab(cand.set = Modeles, modnames = Names)
Model selection based on AICc:
K AICc Delta_AICc AICcWt Cum.Wt LL
m2 5 382.61 0.00 0.59 0.59 -185.83
m2_zip 6 384.51 1.91 0.23 0.82 -185.59
m2_nb 6 384.98 2.37 0.18 1.00 -185.82
Poisson mixture won among three mixtures
Sometime NB shows lower AIC than other two mixtures
In this case, it would be nice to check residual plot because of
good fit/ bad prediction dilemma,
Don’t solely rely on NB estimates, as they can sometimes provide
unrealistic abundance values.
5. Goodness of fit test
gof=Nmix.gof.test(m2,nsim=100) #Use 1000 nsim for real data

Chi-square goodness-of-fit for N-mixture model of 'unmarkedFitPCount' class
Observed chi-square statistic = 206.785
Number of bootstrap samples = 100
P-value = 0.45
Quantiles of bootstrapped statistics:
0% 25% 50% 75% 100%
158 188 202 223 332
Estimate of c-hat = 1
We got c-hat value 1. It means no overdispersion, Good
fit
If c.hat value > 1,our model has overdispersion
problem
If c.hat value < 1, our model has underdispersion
problem
Then you need to take into account c-hat value in your
prediction
Check three results and compare with true mean
cbind(rbind("Poisson" = exp(coef(m2)[1]),
"NegBin" = exp(coef(m2_nb)[1]),
"ZIP"= exp(coef(m2_zip)[1])))
lam(Int)
Poisson 2.373612
NegBin 2.401943
ZIP 2.911520
Print the original dataset
Abundance Total Occupied_sites Mean
1 True abundance (N) 153 63 2.1857143
2 Observed abundance 65 44 0.9285714
6. Prediction
For Detection
newDat <- data.frame(elevation = seq(min(dataraw$elevation), max(dataraw$elevation), length=100))
Epred=modavgPred(cand.set = list(m2), newdata=newDat, parm.type = "detect", type= "response")
newDat$mod_avg_pred <- Epred$mod.avg.pred
newDat$uncond_se <- Epred$uncond.se
newDat$low95 <- Epred$lower.CL
newDat$upp95 <- Epred$upper.CL
###Plotting the result for elevation
ggplot(newDat, aes(x = elevation)) +
geom_line(aes(y = mod_avg_pred), colour = "blue") +
geom_ribbon(aes(ymin = low95, ymax = upp95), alpha = 0.2) +
ylim(0, 1) +
labs(y = expression("Detection probability"),
x = "Elevation (m)") +
theme_classic()

For Abundance
newDat <- data.frame(location=factor(c("A", "B")), elevation = mean(dataraw$elevation))
Epred=modavgPred(cand.set = list(m2), newdata=newDat, parm.type = "lambda", type = "response")
newDat$mod_avg_pred <- Epred$mod.avg.pred
newDat$uncond_se <- Epred$uncond.se
newDat$low95 <- Epred$lower.CL
newDat$upp95 <- Epred$upper.CL
###Plotting the result for eu_water
ggplot(newDat, aes(location,mod_avg_pred)) +
geom_col(aes(fill=location), color = "black", width = 0.85) +
ylim(0,10) +
geom_errorbar(aes(ymin = low95,
ymax = upp95),
color = "#22292F",
width = .1)+
labs(y = expression("Predicted abundance"),
x = "Location of camera traps")+
theme_classic()

- We do not need to draw a plot for elevation in abundance process
becasue that variable is not significant in m2
Thank you very much :)
References
M. Kéry y J. A. Royle. 2016. Applied hierarchical modeling in
ecology: analysis of distribution, abundance and species richness in R
and BUGS. Vol. 1 Prelude and static models. Academic Press, London, UK.
783 Pp. [ISBN: 978-0-12-801378-6].
Quantitative methods for population dynamics. Two day workshop by
Olivier Gimenez, Aurélien Besnard and Sarah Cubaynes. https://oliviergimenez.github.io/popdyn-workshop/
Kéry, M. 2018. Identifiability in N‐mixture models: a large‐scale
screening test with bird data. Ecology 99:281–288.