Preliminary Modeling

Modeling framework

We assigned all species that are detected at each site to three categories: Type B match: Species detected by both Camera + IUCN ( coded as 0 in the model). Type A match: Species detected by camera only (coded as 1). Type C match: Species detected by IUCN only (also coded as 1 in the model).

We ran two separate binomial mixed-effect models to examine what factors are associated with errors of the IUCN range map. We Compared type A, type B matches to assess the potential with omission error and type C and type B for commission error.

The binomial mixed-effects model looks like this:

Candidate variables

Variable Description Prediction
First tier variable
Body mass Previous study found that body mass is associated with two types of error of IUCN range map Body mass will have negative effect on both Commission and Omission errors: large mammal’s distribution data is more accurate. Hence more match
Realm Previous studies show that well IUCN range map is more accurate in well studied region Some realm will have significantly better match. Realm inculdes Afrotropical, Indo-Malay, Nearctic, Neotropical
IUCN_frq Total IUCN assessment of a species Negative effect on both errors. More assessment
IUCN_year Year of the latest IUCN assessment of a species Negative effect on both errors. Recent IUCN range map is more accurate
Camera trap sampling effort Camera trap days Negative effect. The more sampling efforts less
Second tier variable
ForStrat.Value Forgoing stratum. Forgoing strata G (ground), S (sub-canopy), Ar (Arboreal) Arboreal species has more error
Activity Nocturnal Nocturnality: Nocturnal (1), other (0) Positive effect
diet.breadth Number of diet a species has, 1 - 6. large -> generalists Negative effect
Tree_mean Three height: Index of habitat complexity. From 1.8 to 38.8 (m) Positive effect
Random effect
Species Potential random effects Same species has same species covariates
ProjectID Potential random effects 2 Same sites has same site covariates

Do we need random effect?

We generated fixed-effects minimal base-line models and three base-line mixed-model using the “glmer” function with random intercepts for species, projectID, and species + projectID. We can then check if including the random effect is permitted by comparing the AICs from the glm to AIC from the glmer model. If the AIC of the glmer object is smaller than the AIC of the glm object, then this indicates that including random intercepts is justified.

# baseline model glm for omission error
m0.glm.omi <-  glm(type ~ 1, family = binomial, data = omi) 

# add ´control = glmerControl(optimizer = “bobyqa”)´ to avoid unnecessary failures to converge.
# base-line mixed-model 1
m0.glmer1.omi<- glmer(type ~ (1|speciesScientificName), data = omi, family = binomial, control=glmerControl(optimizer="bobyqa"))

# base-line mixed-model 2
m0.glmer2.omi<- glmer(type ~ (1|projectID) , data = omi, family = binomial, control=glmerControl(optimizer="bobyqa"))

# base-line mixed-model 3
m0.glmer3.omi<- glmer(type ~ (1|speciesScientificName) + (1|projectID) , data = omi, family = binomial, control=glmerControl(optimizer="bobyqa"))



# baseline model glm commission error
m0.glm.com <-  glm(type ~ 1, family = binomial, data = com) 

# base-line mixed-model 1
m0.glmer1.com<- glmer(type ~ (1|speciesScientificName), data = com, family = binomial, control=glmerControl(optimizer="bobyqa"))

# base-line mixed-model 2
m0.glmer2.com<- glmer(type ~ (1|projectID) , data = com, family = binomial, control=glmerControl(optimizer="bobyqa"))

# base-line mixed-model 3
m0.glmer3.com<- glmer(type ~ (1|speciesScientificName) + (1|projectID) , data = com, family = binomial, control=glmerControl(optimizer="bobyqa"))

Now, we check if including the random effect is permitted by comparing the AICs from the glm to AIC from the glmer model. If the AIC of the glmer object is smaller than the AIC of the glm object, then this indicates that including random intercepts is justified.

# Omission error
aic.glm <- AIC(logLik(m0.glm.omi))
aic.glmer1 <- AIC(logLik(m0.glmer1.omi))
aic.glmer2 <- AIC(logLik(m0.glmer2.omi))
aic.glmer3 <- AIC(logLik(m0.glmer3.omi))

aic.glm;aic.glmer1;aic.glmer2;aic.glmer3
## [1] 943.08
## [1] 626.3468
## [1] 939.5412
## [1] 599.0271

Same thing with the commission error.

# Commission error
aic.glm <- AIC(logLik(m0.glm.com))
aic.glmer1 <- AIC(logLik(m0.glmer1.com))
aic.glmer2 <- AIC(logLik(m0.glmer2.com))
aic.glmer3 <- AIC(logLik(m0.glmer3.com))

aic.glm;aic.glmer1;aic.glmer2;aic.glmer3
## [1] 2829.176
## [1] 2377.584
## [1] 2509.332
## [1] 2106.953
The AIC of the glmer3 object is smallest shows that including the two random intercepts (species + project) is justified.

Model structure

We fit the full model of all first tier variables.

z1 <- glmer(type ~ BodyMass.Value + realm + IUCN_frq + IUCN_year + sampling_effort_t + (1|speciesScientificName) + (1|projectID), family = binomial, control=glmerControl(optimizer="bobyqa"), data = omi )

summary(z1)
## Generalized linear mixed model fit by maximum likelihood (Laplace
##   Approximation) [glmerMod]
##  Family: binomial  ( logit )
## Formula: 
## type ~ BodyMass.Value + realm + IUCN_frq + IUCN_year + sampling_effort_t +  
##     (1 | speciesScientificName) + (1 | projectID)
##    Data: omi
## Control: glmerControl(optimizer = "bobyqa")
## 
##      AIC      BIC   logLik deviance df.resid 
##    608.3    658.0   -294.1    588.3     1053 
## 
## Scaled residuals: 
##     Min      1Q  Median      3Q     Max 
## -4.2677 -0.0121 -0.0066 -0.0021  4.3085 
## 
## Random effects:
##  Groups                Name        Variance Std.Dev.
##  speciesScientificName (Intercept) 281.791  16.787  
##  projectID             (Intercept)   3.432   1.852  
## Number of obs: 1063, groups:  speciesScientificName, 319; projectID, 46
## 
## Fixed effects:
##                   Estimate Std. Error z value Pr(>|z|)    
## (Intercept)       -8.56988    1.48422  -5.774 7.74e-09 ***
## BodyMass.Value     0.21030    0.74060   0.284    0.776    
## realmIndo-Malay   -2.25812    1.99259  -1.133    0.257    
## realmNearctic      0.46038    1.31946   0.349    0.727    
## realmNeotropical  -0.69394    1.50594  -0.461    0.645    
## IUCN_frq          -0.31663    0.29299  -1.081    0.280    
## IUCN_year         -0.30838    0.53728  -0.574    0.566    
## sampling_effort_t -0.04277    0.46271  -0.092    0.926    
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Correlation of Fixed Effects:
##             (Intr) BdyM.V rlmI-M rlmNrc rlmNtr IUCN_f IUCN_y
## BodyMass.Vl  0.013                                          
## relmInd-Mly -0.143  0.033                                   
## realmNerctc -0.362  0.088  0.350                            
## realmNtrpcl -0.091  0.045  0.231  0.364                     
## IUCN_frq    -0.712 -0.083  0.038  0.052 -0.105              
## IUCN_year    0.044 -0.139  0.002 -0.023  0.091 -0.037       
## smplng_ffr_  0.077 -0.002  0.095 -0.267 -0.092 -0.040 -0.010

In a better table:

# summarize tabulated form 
sjPlot::tab_model(z1)
  type
Predictors Odds Ratios CI p
(Intercept) 0.00 0.00 – 0.00 <0.001
BodyMass.Value 1.23 0.29 – 5.27 0.776
realm [Indo-Malay] 0.10 0.00 – 5.19 0.257
realm [Nearctic] 1.58 0.12 – 21.04 0.727
realm [Neotropical] 0.50 0.03 – 9.56 0.645
IUCN_frq 0.73 0.41 – 1.29 0.280
IUCN_year 0.73 0.26 – 2.11 0.566
sampling_effort_t 0.96 0.39 – 2.37 0.926
Random Effects
σ2 3.29
τ00 speciesScientificName 281.79
τ00 projectID 3.43
ICC 0.99
N speciesScientificName 319
N projectID 46
Observations 1063
Marginal R2 / Conditional R2 0.006 / 0.989

None of factors are significant.

What about the commission error??

z2 <- glmer(type ~ BodyMass.Value + realm + IUCN_frq + IUCN_year + sampling_effort_t + (1|speciesScientificName) + (1|projectID), family = binomial, control=glmerControl(optimizer="bobyqa"), data = com)

summary(z2)
## Generalized linear mixed model fit by maximum likelihood (Laplace
##   Approximation) [glmerMod]
##  Family: binomial  ( logit )
## Formula: 
## type ~ BodyMass.Value + realm + IUCN_frq + IUCN_year + sampling_effort_t +  
##     (1 | speciesScientificName) + (1 | projectID)
##    Data: com
## Control: glmerControl(optimizer = "bobyqa")
## 
##      AIC      BIC   logLik deviance df.resid 
##   2096.0   2152.3  -1038.0   2076.0     2058 
## 
## Scaled residuals: 
##     Min      1Q  Median      3Q     Max 
## -3.5530 -0.4583  0.1199  0.4005  5.0511 
## 
## Random effects:
##  Groups                Name        Variance Std.Dev.
##  speciesScientificName (Intercept) 4.646    2.155   
##  projectID             (Intercept) 3.077    1.754   
## Number of obs: 2068, groups:  speciesScientificName, 485; projectID, 55
## 
## Fixed effects:
##                   Estimate Std. Error z value Pr(>|z|)   
## (Intercept)        1.94870    0.63384   3.074  0.00211 **
## BodyMass.Value    -0.22709    0.14428  -1.574  0.11550   
## realmIndo-Malay    1.01193    0.83752   1.208  0.22695   
## realmNearctic     -0.85257    0.74775  -1.140  0.25421   
## realmNeotropical   0.55065    0.91522   0.602  0.54740   
## IUCN_frq          -0.20772    0.06899  -3.011  0.00260 **
## IUCN_year          0.33035    0.13351   2.474  0.01334 * 
## sampling_effort_t -0.14456    0.25413  -0.569  0.56947   
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Correlation of Fixed Effects:
##             (Intr) BdyM.V rlmI-M rlmNrc rlmNtr IUCN_f IUCN_y
## BodyMass.Vl  0.040                                          
## relmInd-Mly -0.585  0.004                                   
## realmNerctc -0.681  0.036  0.517                            
## realmNtrpcl -0.523  0.028  0.418  0.480                     
## IUCN_frq    -0.456 -0.150 -0.022  0.035 -0.038              
## IUCN_year   -0.010 -0.175  0.016  0.021  0.055 -0.031       
## smplng_ffr_ -0.068  0.003  0.163  0.062 -0.009  0.008 -0.005
# summarize tabulated form 
sjPlot::tab_model(z2)
  type
Predictors Odds Ratios CI p
(Intercept) 7.02 2.03 – 24.31 0.002
BodyMass.Value 0.80 0.60 – 1.06 0.116
realm [Indo-Malay] 2.75 0.53 – 14.20 0.227
realm [Nearctic] 0.43 0.10 – 1.85 0.254
realm [Neotropical] 1.73 0.29 – 10.43 0.547
IUCN_frq 0.81 0.71 – 0.93 0.003
IUCN_year 1.39 1.07 – 1.81 0.013
sampling_effort_t 0.87 0.53 – 1.42 0.569
Random Effects
σ2 3.29
τ00 speciesScientificName 4.65
τ00 projectID 3.08
ICC 0.70
N speciesScientificName 485
N projectID 55
Observations 2068
Marginal R2 / Conditional R2 0.066 / 0.721

The two index of IUCN assessment history are significant.

The IUCN frequency has negative effect of -0.20. Make sense. The more assessment that IUCN conducted for a species the more it’s range map will match camera trap result.

BUt the IUCN year has positive effect of 0.33. THis does not make sense, it means there are more mismatches in more recent IUCN assessment. I wonder if there is also a temporal mismatches. i.e. Camera survey was conducted in 2016, IUCN assessed in 2020. (Need take a closer look)

Next, let’s fit the full model

Let’s start with the omission error:

z3 <- glmer(type ~ BodyMass.Value + realm + IUCN_frq + IUCN_year + sampling_effort_t + ForStrat.Value + Activity.Nocturnal + diet.breadth + Tree_mean+ (1|speciesScientificName) + (1|projectID), family = binomial, control=glmerControl(optimizer="bobyqa"), data = omi )

summary(z3)
## Generalized linear mixed model fit by maximum likelihood (Laplace
##   Approximation) [glmerMod]
##  Family: binomial  ( logit )
## Formula: 
## type ~ BodyMass.Value + realm + IUCN_frq + IUCN_year + sampling_effort_t +  
##     ForStrat.Value + Activity.Nocturnal + diet.breadth + Tree_mean +  
##     (1 | speciesScientificName) + (1 | projectID)
##    Data: omi
## Control: glmerControl(optimizer = "bobyqa")
## 
##      AIC      BIC   logLik deviance df.resid 
##    614.7    689.2   -292.3    584.7     1048 
## 
## Scaled residuals: 
##     Min      1Q  Median      3Q     Max 
## -4.3021 -0.0145 -0.0060 -0.0019  4.3041 
## 
## Random effects:
##  Groups                Name        Variance Std.Dev.
##  speciesScientificName (Intercept) 254.878  15.965  
##  projectID             (Intercept)   3.483   1.866  
## Number of obs: 1063, groups:  speciesScientificName, 319; projectID, 46
## 
## Fixed effects:
##                    Estimate Std. Error z value Pr(>|z|)   
## (Intercept)        -6.27018    2.21307  -2.833  0.00461 **
## BodyMass.Value      0.50049    0.67081   0.746  0.45561   
## realmIndo-Malay    -2.21001    1.88677  -1.171  0.24147   
## realmNearctic       0.78683    1.42068   0.554  0.57969   
## realmNeotropical   -0.03781    1.60987  -0.023  0.98126   
## IUCN_frq           -0.37141    0.32304  -1.150  0.25025   
## IUCN_year          -0.68779    0.62741  -1.096  0.27298   
## sampling_effort_t  -0.04349    0.46797  -0.093  0.92595   
## ForStrat.ValueG    -1.83700    1.58263  -1.161  0.24575   
## ForStrat.ValueS    -3.09474    2.33126  -1.327  0.18434   
## Activity.Nocturnal -0.61524    1.30217  -0.472  0.63659   
## diet.breadth        0.46138    0.53647   0.860  0.38978   
## Tree_mean          -0.09176    0.45087  -0.204  0.83873   
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Correlation matrix not shown by default, as p = 13 > 12.
## Use print(x, correlation=TRUE)  or
##     vcov(x)        if you need it
# summarize tabulated form 
sjPlot::tab_model(z3)
  type
Predictors Odds Ratios CI p
(Intercept) 0.00 0.00 – 0.14 0.005
BodyMass.Value 1.65 0.44 – 6.14 0.456
realm [Indo-Malay] 0.11 0.00 – 4.43 0.241
realm [Nearctic] 2.20 0.14 – 35.56 0.580
realm [Neotropical] 0.96 0.04 – 22.59 0.981
IUCN_frq 0.69 0.37 – 1.30 0.250
IUCN_year 0.50 0.15 – 1.72 0.273
sampling_effort_t 0.96 0.38 – 2.40 0.926
ForStrat.Value [G] 0.16 0.01 – 3.54 0.246
ForStrat.Value [S] 0.05 0.00 – 4.37 0.184
Activity.Nocturnal 0.54 0.04 – 6.94 0.637
diet.breadth 1.59 0.55 – 4.54 0.390
Tree_mean 0.91 0.38 – 2.21 0.839
Random Effects
σ2 3.29
τ00 speciesScientificName 254.88
τ00 projectID 3.48
ICC 0.99
N speciesScientificName 319
N projectID 46
Observations 1063
Marginal R2 / Conditional R2 0.013 / 0.988

None of the candidates variable are significant in explaining omission error.

z4 <- glmer(type ~ BodyMass.Value + realm + IUCN_frq + IUCN_year + sampling_effort_t + ForStrat.Value + Activity.Nocturnal + diet.breadth + Tree_mean+ (1|speciesScientificName) + (1|projectID), family = binomial, control=glmerControl(optimizer="bobyqa"), data = com )

summary(z4)
## Generalized linear mixed model fit by maximum likelihood (Laplace
##   Approximation) [glmerMod]
##  Family: binomial  ( logit )
## Formula: 
## type ~ BodyMass.Value + realm + IUCN_frq + IUCN_year + sampling_effort_t +  
##     ForStrat.Value + Activity.Nocturnal + diet.breadth + Tree_mean +  
##     (1 | speciesScientificName) + (1 | projectID)
##    Data: com
## Control: glmerControl(optimizer = "bobyqa")
## 
##      AIC      BIC   logLik deviance df.resid 
##   2067.6   2152.2  -1018.8   2037.6     2053 
## 
## Scaled residuals: 
##     Min      1Q  Median      3Q     Max 
## -3.6215 -0.4408  0.1100  0.3846  5.1149 
## 
## Random effects:
##  Groups                Name        Variance Std.Dev.
##  speciesScientificName (Intercept) 4.416    2.101   
##  projectID             (Intercept) 3.055    1.748   
## Number of obs: 2068, groups:  speciesScientificName, 485; projectID, 55
## 
## Fixed effects:
##                    Estimate Std. Error z value Pr(>|z|)    
## (Intercept)         3.15026    0.74465   4.231 2.33e-05 ***
## BodyMass.Value     -0.18441    0.14685  -1.256  0.20919    
## realmIndo-Malay     1.23605    0.83813   1.475  0.14027    
## realmNearctic      -0.68516    0.74743  -0.917  0.35931    
## realmNeotropical    1.05002    0.97892   1.073  0.28344    
## IUCN_frq           -0.21533    0.06957  -3.095  0.00197 ** 
## IUCN_year           0.12543    0.14929   0.840  0.40082    
## sampling_effort_t  -0.19314    0.25692  -0.752  0.45219    
## ForStrat.ValueG    -1.70511    0.38880  -4.386 1.16e-05 ***
## ForStrat.ValueS    -2.78666    0.52358  -5.322 1.02e-07 ***
## Activity.Nocturnal  0.11223    0.32863   0.342  0.73272    
## diet.breadth       -0.26275    0.14510  -1.811  0.07017 .  
## Tree_mean          -0.44031    0.30208  -1.458  0.14495    
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Correlation matrix not shown by default, as p = 13 > 12.
## Use print(x, correlation=TRUE)  or
##     vcov(x)        if you need it
# summarize tabulated form 
sjPlot::tab_model(z4)
  type
Predictors Odds Ratios CI p
(Intercept) 23.34 5.42 – 100.46 <0.001
BodyMass.Value 0.83 0.62 – 1.11 0.209
realm [Indo-Malay] 3.44 0.67 – 17.79 0.140
realm [Nearctic] 0.50 0.12 – 2.18 0.359
realm [Neotropical] 2.86 0.42 – 19.47 0.283
IUCN_frq 0.81 0.70 – 0.92 0.002
IUCN_year 1.13 0.85 – 1.52 0.401
sampling_effort_t 0.82 0.50 – 1.36 0.452
ForStrat.Value [G] 0.18 0.08 – 0.39 <0.001
ForStrat.Value [S] 0.06 0.02 – 0.17 <0.001
Activity.Nocturnal 1.12 0.59 – 2.13 0.733
diet.breadth 0.77 0.58 – 1.02 0.070
Tree_mean 0.64 0.36 – 1.16 0.145
Random Effects
σ2 3.29
τ00 speciesScientificName 4.42
τ00 projectID 3.05
ICC 0.69
N speciesScientificName 485
N projectID 55
Observations 2068
Marginal R2 / Conditional R2 0.123 / 0.732

IUCN frequency remain significant, but not IUCN year.

Forgoing stratum matters.

The dummy level of the forgoing strata is ForStrat.Value [Arboreal] in our model. ForStrat.Value [Ground] and ForStrat.Value [Sub-canopy] have negative effects -1.70 and -2.78 compare to ForStrat.Value [Arboreal]. This means we are more likely to get a mismatch for arboreal species.

This could indicate IUCN range map often over predict distribution of arboreal. But we cannot rule out the fact that camera trap can be less efficient in detecting arboreal species.

Any thoughts on model selection?

I an trying to find the ‘best’ model using i.e. full combination, stepwise regression. But it feels like data dredging.