Analysis conducted by:

Dr. Yoni Gavish & Ron Chen

Data ; Science ; Environment

Statistical and Ecological Solutions

   email: 

   phone: +972-58-4479847




General information on the current analysis:

Project - HaMaarag, State of Nature report, 2023

Group - Birds

Scale - National

Species group - Non Synanthrope species

Factors - Time, Unit

Response - Abundance




Take home messages:

- Abundance decreased non-significantly (p=0.221) by 1.010% per year (1.010 fold per year) and by 8.731% (1.096 fold) over the entire period:

  • Predicted value 2012: 5.98 Individuals per sample

  • Predicted value 2021: 5.45 Individuals per sample




A few general comments:

1. The aim of the analysis is to explore the effect of time and/or proximity to settlements and/or proximity to agriculture on the species richness, total abundance, geometric mean of abundance and community structure of birds in Israel

2. Definition of non rare species:

  • For species richness:

    • All species were included in the analysis
  • For total abundance and geometric mean of abundance (satisfy both criteria):

    • Appeared in at least 50 samples over all campaigns

    • At least 10 individuals over all campaigns

  • For community structure (satisfy both criteria):

    • Appeared in at least 10 samples over all campaigns

    • At least 10 individuals over all campaigns

3. The starting model (maximum complexity) for all analyses was kept constant at:

  • Proximity to agricultural + temporal contrasts:
-  Response ~ agriculture * time + unit + cos_td_rad + sin_td_rad + (1|site/point) 
  • Proximity to settlements + temporal contrast:
-  Response ~ settlements * time + unit + cos_td_rad + sin_td_rad + (1|site/point) 
  • Temporal contrast:
-  Response ~ time + unit + cos_td_rad + sin_td_rad + (1|site/point) 

4. Accounting for spatial structure and repeated measures:

  • Each sampling point was visited several times in different years/campaigns

  • Each sampling point is nested within sites

  • Site and point are thus included as a nested random factor in the model

- For community analysis we used (1|point) instead of (1|site/point) as random factor

5. Accounting for activity levels (random factors):

  • “cos_td_rad”, “sin_td_rad” - measure the date difference from 21 June, as the cos or sin of the difference in days in radians

  • Wind, Precipitation, Temperature and Cloud are not included in the analysis although they may affect activity level since the data is too sparse (many NAs)

  • The hour of sampling was not recorded in T0 - the two related variables c(“cos_hsun”, “sin_hsun”) are ignored in the analysis

6. Model complexity:

  • This analysis was restricted to monotonic increase/decrease

  • More flexible models that allow some curvature may unveil different patterns and trends




1. Prepare the working environment




2. Load data and source functions




3. Arrange the Traits table

  • Add to the traits table the species code, order, family, IUCN category and group, and residency status in Israel

  • Table shows number of NAs, class, mean value (for integers and numeric) and number of factor levels (for character and factors)

Column NAs Class Mean Levels
SciName 0 factor NA 189
SPECIES_CODE 0 character NA 189
HebName 0 factor NA 189
PRIMARY_COM_NAME 0 factor NA 189
Order 0 character NA 20
Family 0 character NA 53
Synanthrope 0 integer 0.180 NA
AlienInvasive 0 integer 0.021 NA
NativeInvasive 0 integer 0.021 NA
synanthrope_or_invasive 0 logical NA NA
batha_specialist 70 logical NA NA
oblig_insectivore 32 logical NA NA
IUCN_AllCat 43 character NA 9
IUCN 0 character NA 3
Resident 0 integer 0.434 NA
Resident_IL 0 factor NA 2



4. Arrange and filter the Plots table

  • Table shows number of NAs, class, mean value (for integers and numeric) and number of factor levels (for character and factors)
Column NAs Class Mean Levels
unit 0 factor NA 8
subunit 926 factor NA 4
site 0 factor NA 70
year 0 integer 2016.665 NA
year_ct 0 numeric 4.665 NA
settlements 791 factor NA 3
agriculture 1169 factor NA 3
habitat 1145 factor NA 5
dunes 1559 factor NA 3
land_use 1455 factor NA 4
point_name 0 factor NA 574
date 0 POSIXct NA NA
date 0 POSIXt NA NA
datetime 441 POSIXct NA NA
datetime 441 POSIXt NA NA
td_sc 0 numeric 0.129 NA
td_rad 0 numeric -0.783 NA
cos_td_rad 0 numeric 0.629 NA
sin_td_rad 0 numeric -0.635 NA
timediff_Jun21 0 difftime NA NA
monitors_name 1023 factor NA 14
wind 1449 ordered NA 5
wind 1449 factor NA 5
precipitation 1390 ordered NA 3
precipitation 1390 factor NA 3
temperature 1433 ordered NA 5
temperature 1433 factor NA 5
clouds 1519 ordered NA 5
clouds 1519 factor NA 5
h_from_sunrise 441 difftime NA NA
cos_hsun 441 numeric 0.817 NA
sin_hsun 441 numeric 0.442 NA
pilot 0 logical NA NA
campaign 0 character NA 5
Unit_Label 0 character NA 8
Region 0 character NA 3



5. Clean Data and identify non-rare species

  • Filter columns less relevant to the analysis

  • Remove all observations with 0 abundance (legacy from former filtering)

  • Add features from the trait table

  • Keep only Non Synanthrope species

  • Extract general properties of each column in Data : BasicInfo.R

  • Identify species that meet the two criteria for non-rare : Get_InciAbunRare.R

  • Table shows number of NAs, class, mean value (for integers and numeric) and number of factor levels (for character and factors)

Column NAs Class Mean Levels
SciName 0 character NA 65
year 0 integer 2016.993 NA
point_name 0 character NA 565
campaign 0 character NA 5
unit 0 character NA 8
site 0 character NA 70
year_ct 0 integer 4.993 NA
count_under_250 0 integer 2.690 NA
agriculture 3092 character NA 3
settlements 2562 character NA 3
cos_td_rad 0 numeric 0.601 NA
sin_td_rad 0 numeric -0.671 NA
SPECIES_CODE 0 data.table NA NA
SPECIES_CODE 0 data.frame NA NA
HebName 0 data.table NA NA
HebName 0 data.frame NA NA
PRIMARY_COM_NAME 0 data.table NA NA
PRIMARY_COM_NAME 0 data.frame NA NA
Order 0 data.table NA NA
Order 0 data.frame NA NA
Family 0 data.table NA NA
Family 0 data.frame NA NA
Synanthrope 0 data.table NA NA
Synanthrope 0 data.frame NA NA
AlienInvasive 0 data.table NA NA
AlienInvasive 0 data.frame NA NA
NativeInvasive 0 data.table NA NA
NativeInvasive 0 data.frame NA NA
synanthrope_or_invasive 0 data.table NA NA
synanthrope_or_invasive 0 data.frame NA NA
batha_specialist 181 data.table NA NA
batha_specialist 181 data.frame NA NA
oblig_insectivore 297 data.table NA NA
oblig_insectivore 297 data.frame NA NA
IUCN_AllCat 0 data.table NA NA
IUCN_AllCat 0 data.frame NA NA
IUCN 0 data.table NA NA
IUCN 0 data.frame NA NA
Resident_IL 0 data.table NA NA
Resident_IL 0 data.frame NA NA

A total of 17 species satisfied the non-rare criteria, including:

Curruca melanocephala ; Galerida cristata ; Chloris chloris ; Alectoris chukar ; Troglodytes troglodytes ; Carduelis carduelis ; Ammomanes deserti ; Iduna pallida ; Emberiza calandra ; Lanius senator ; Lanius excubitor ; Curruca communis ; Burhinus oedicnemus ; Scotocerca inquieta ; Oenanthe melanura ; Cercotrichas galactotes ; Curruca conspicillata




6. Create a dataframe with the relevant variables for analysis

  • For Richness, we use all species.

  • For abundance, mean abundance (geometric) and community structure we filter rare species.

  • Create a data-table with a row for each combination of ‘year’ and ‘point_name’

    • First two columns are ‘year’ and ‘point_name’

    • Column for each species

  • Add empty plots (plots with no observations)

  • Filter Rare species

  • Calculate total abundance

  • Calculate the geometric mean abundance (GMA)

    • For each row

    • For species with at least one individual (Note)

  • Add the relevant predictor variables

Note:

when calculating a geometric mean (GMA), any species with 0 abundance would result with a geometric mean of 0 (since the geometric mean relies on the product). Since most samples include at least one species with 0 abundance, we based the geometric mean only on the species with at least one individual. For empty plots- NaN is returned and these rows should be removed when modeling GMA.

  • Table shows number of NAs, class, mean value (for integers and numeric) and number of factor levels (for character and factors)
Column NAs Class Mean Levels
year 0 numeric 2016.667 NA
point_name 0 character NA 574
Curruca melanocephala 0 numeric 2.121 NA
Galerida cristata 0 numeric 1.561 NA
Chloris chloris 0 numeric 0.627 NA
Alectoris chukar 0 numeric 0.572 NA
Troglodytes troglodytes 0 numeric 0.253 NA
Carduelis carduelis 0 numeric 0.388 NA
Ammomanes deserti 0 numeric 0.176 NA
Iduna pallida 0 numeric 0.125 NA
Emberiza calandra 0 numeric 0.223 NA
Lanius senator 0 numeric 0.099 NA
Lanius excubitor 0 numeric 0.078 NA
Curruca communis 0 numeric 0.123 NA
Burhinus oedicnemus 0 numeric 0.095 NA
Scotocerca inquieta 0 numeric 0.073 NA
Oenanthe melanura 0 numeric 0.071 NA
Cercotrichas galactotes 0 numeric 0.068 NA
Curruca conspicillata 0 numeric 0.068 NA
TotalAbun 0 numeric 6.721 NA
GMA 95 numeric 2.990 NA
site 0 factor NA 70
campaign 0 character NA 5
year_ct 0 numeric 4.667 NA
agriculture 1168 factor NA 3
settlements 791 factor NA 3
unit 0 factor NA 8
Unit_Label 0 character NA 8
Region 0 character NA 3
cos_td_rad 0 numeric 0.629 NA
sin_td_rad 0 numeric -0.636 NA
year_ct_Sq 0 numeric 29.703 NA
Response 0 numeric 6.721 NA

The data contains 95 empty plot sampling events (plot sampling with response=0).

Of these, 16 plots were empty throughout the entire monitoring period.

These 16 plots represent 16 plot sampling events.

Of these 16 plots, 0 plots span more than a single year.




7. Explore the Abundance data frame

  • Visualize the relationship of the response to various explanatory variables

Main conclusions:

  • Considerable variability in total abundance between units, with highest values in the Mediterranean maquis unit and lowest in the inland sands unit

  • Considerable variability between and within sites

Notes:

  • unit Batha vegetation refers to Herbaceous and Dwarf-Shrub Vegetation

  • unit Transition Zone refers to Mediterranean-Desert Transition Zone

  • unit Loess Areas refers to Loess Covered Areas in the Northern Negev


samples with abundance > 110 (not shown in figures)
1478
year 2021
point_name Kerem Maharal Near 3
Unit_Label Mediterranean Maquis
Region Mediterranean
site Kerem Maharal
TotalAbun 117

Number of samples
Year Arid South Negev Highlands Inland Sands Loess Areas Transition Zone Planted Conifer Forest Mediterranean Maquis Batha Vegetation
2012 30 30 0 24 30 0 89 0
2014 30 45 0 27 30 45 0 60
2015 0 0 0 0 0 45 89 0
2016 30 45 0 29 29 0 0 60
2017 0 0 12 0 0 45 88 0
2018 30 45 0 30 30 0 0 60
2019 0 0 12 0 0 45 89 0
2020 30 45 0 30 30 0 0 60
2021 0 0 12 0 0 45 89 0







8. Model Abundance - Idenitify Family and Fixed factors

We fit generalized linear mixed-effects model (glmmTMB) with:

  • The response is: TotalAbun (internally, “Response”)

  • The fixed factors are: year_ct ; unit ; cos_td_rad ; sin_td_rad

  • The groping factors are: site ; point_name - with the later nested within the first (each site contains several plots)

Objectives:

  • Identify the best combination of fixed factors

  • Decide on the family (poisson or negative binomial)

Workflow:

  • Identify the best model for Poisson:

    • Fit the most complex model with glmmTMB

    • Simplify the model with MuMIn::dredge(rank = “AIC”) - removal of fixed terms

    • Select the best model

  • Identify the best model for Negative binomial:

    • Fit the most complex model with glmmTMB

    • Simplify the model with MuMIn::dredge(rank = “AIC”) - removal of fixed terms

    • Select the best model

  • Select the optimal model in terms of family and fixed factors.

    • Use model selection to compare the best poisson and best negative binomial

    • Diagnose the best model with glmmTMB:diagnose and look at the sigma (for NB)




Simplification of the poisson models

  • Best model kept all terms

  • Second best model DeltaAIC > 2

  • Keep first model

cond((Int)) disp((Int)) cond(cos_td_rad) cond(sin_td_rad) cond(Unit_Label) cond(year_ct) df logLik AIC delta weight
16 0.148
0.463 -0.711
-0.025 13 -5034.514 10095.03 0.000 9.999891e-01
8 0.102
0.498 -0.617
NA 12 -5046.942 10117.88 22.857 1.088310e-05
15 0.692
NA -0.391
-0.029 12 -5057.990 10139.98 44.952 1.732868e-10
12 1.030
0.454 -0.709 NA -0.025 6 -5071.672 10155.34 60.317 7.986115e-14
7 0.686
NA -0.253
NA 11 -5074.537 10171.07 76.046 3.068132e-17
4 0.964
0.488 -0.616 NA NA 5 -5083.734 10177.47 82.441 1.253804e-18
11 1.533
NA -0.391 NA -0.028 5 -5093.666 10197.33 102.305 6.090948e-23
3 1.500
NA -0.256 NA NA 4 -5109.495 10226.99 131.963 2.210958e-29
14 0.961
-0.125 NA
-0.007 12 -5101.920 10227.84 132.812 1.446418e-29
6 0.915
-0.091 NA
NA 11 -5102.930 10227.86 132.833 1.431340e-29
5 0.852
NA NA
NA 10 -5106.070 10232.14 137.111 1.685124e-30
13 0.850
NA NA
0.001 11 -5106.052 10234.10 139.076 6.310282e-31
10 1.770
-0.134 NA NA -0.007 5 -5136.389 10282.78 187.751 1.700098e-41
2 1.720
-0.100 NA NA NA 4 -5137.442 10282.89 187.857 1.612062e-41
1 1.655
NA NA NA NA 3 -5141.187 10288.37 193.346 1.036428e-42
9 1.650
NA NA NA 0.001 4 -5141.145 10290.29 195.263 3.973160e-43



Simplification of the negative binomial models

  • Best model kept all the terms

  • Second model DeltaAIC > 2

  • Keep first model

cond((Int)) disp((Int)) cond(cos_td_rad) cond(sin_td_rad) cond(Unit_Label) cond(year_ct) df logLik AIC delta weight
8 0.217
0.552 -0.506
NA 13 -4411.979 8849.958 0.000 5.624061e-01
16 0.241
0.535 -0.544
-0.010 14 -4411.230 8850.460 0.502 4.375647e-01
15 0.870
NA -0.180
-0.015 13 -4422.788 8871.577 21.619 1.136619e-05
7 0.864
NA -0.104
NA 12 -4424.480 8872.960 23.002 5.690782e-06
6 0.857
0.110 NA
NA 12 -4424.603 8873.207 23.249 5.030707e-06
5 0.933
NA NA
NA 11 -4425.938 8873.876 23.918 3.600903e-06
14 0.838
0.123 NA
0.003 13 -4424.551 8875.102 25.144 1.950121e-06
13 0.947
NA NA
-0.004 12 -4425.780 8875.561 25.603 1.550635e-06
4 1.114
0.489 -0.468 NA NA 6 -4448.700 8909.401 59.443 6.954070e-14
12 1.142
0.474 -0.504 NA -0.009 7 -4448.065 8910.130 60.172 4.828188e-14
11 1.671
NA -0.172 NA -0.013 6 -4456.537 8925.073 75.115 2.747779e-17
3 1.655
NA -0.105 NA NA 5 -4457.838 8925.676 75.718 2.032862e-17
1 1.719
NA NA NA NA 4 -4459.289 8926.578 76.620 1.295089e-17
2 1.670
0.075 NA NA NA 5 -4458.674 8927.348 77.390 8.811965e-18
9 1.730
NA NA NA -0.003 5 -4459.217 8928.433 78.475 5.120904e-18
10 1.657
0.084 NA NA 0.002 6 -4458.650 8929.299 79.341 3.321085e-18



Model selection

  • The Negative binomial model (AIC weight = 1) outperformed the Poisson model
cond((Int)) disp((Int)) cond(cos_td_rad) cond(sin_td_rad) cond(Unit_Label) cond(year_ct) family df logLik AIC delta weight
Best_NB 0.241
0.535 -0.544
-0.010 nbinom2(log) 14 -4411.230 8850.46 0.000 1.000000e+00
Best_Po 0.148
0.463 -0.711
-0.025 poisson(log) 13 -5034.514 10095.03 1244.568 5.565952e-271



Diagnostics and sigma

  • Diagnostic of best negative binomial model reveals minor issues
## Unusually large Z-statistics (|x|>5):
## 
##                       sin_td_rad            Unit_LabelLoess Areas 
##                        -5.202071                         5.409626 
## Unit_LabelPlanted Conifer Forest   Unit_LabelMediterranean Maquis 
##                         5.556414                         6.654735 
##       Unit_LabelBatha Vegetation                    d~(Intercept) 
##                        10.129983                        17.909080 
##        theta_1|point_name:site.1                   theta_1|site.1 
##                        -9.957956                       -10.387473 
## 
## Large Z-statistics (estimate/std err) suggest a *possible* failure of
## the Wald approximation - often also associated with parameters that are
## at or near the edge of their range (e.g. random-effects standard
## deviations approaching 0).  (Alternately, they may simply represent
## very well-estimated parameters; intercepts of non-centered models may
## fall in this category.) While the Wald p-values and standard errors
## listed in summary() may be unreliable, profile confidence intervals
## (see ?confint.glmmTMB) and likelihood ratio test p-values derived by
## comparing models (e.g. ?drop1) are probably still OK.  (Note that the
## LRT is conservative when the null value is on the boundary, e.g. a
## variance or zero-inflation value of 0 (Self and Liang 1987; Stram and
## Lee 1994; Goldman and Whelan 2000); in simple cases these p-values are
## approximately twice as large as they should be.)

  • The dispersion parameter of the best negative binomial model was >1, suggesting the negative binomial model is better
## [1] 3.235275



Main conclusions:

  • Continue with the best negative binomial model
## Formula:          
## Response ~ cos_td_rad + sin_td_rad + Unit_Label + year_ct + (1 |  
##     site/point_name)
## Data: MainData
##       AIC       BIC    logLik  df.resid 
##  8850.460  8925.696 -4411.230      1580 
## Random-effects (co)variances:
## 
## Conditional model:
##  Groups          Name        Std.Dev.
##  point_name:site (Intercept) 0.2096  
##  site            (Intercept) 0.2510  
## 
## Number of obs: 1594 / Conditional model: point_name:site, 574; site, 70
## 
## Dispersion parameter for nbinom2 family (): 3.24 
## 
## Fixed Effects:
## 
## Conditional model:
##                      (Intercept)                        cos_td_rad  
##                          0.24131                           0.53453  
##                       sin_td_rad         Unit_LabelNegev Highlands  
##                         -0.54391                           0.55731  
##           Unit_LabelInland Sands             Unit_LabelLoess Areas  
##                          0.41276                           0.93479  
##        Unit_LabelTransition Zone  Unit_LabelPlanted Conifer Forest  
##                          0.78200                           0.82802  
##   Unit_LabelMediterranean Maquis        Unit_LabelBatha Vegetation  
##                          0.96648                           1.59728  
##                          year_ct  
##                         -0.01015
## glmmTMB::glmmTMB(formula = Response ~ cos_td_rad + sin_td_rad + 
##     Unit_Label + year_ct + (1 | site/point_name), data = MainData, 
##     family = "nbinom2", ziformula = ~0, dispformula = ~1, na.action = na.fail)



9. Model Abundance - Identfiy best family variant of the negative binomial model

The glmmTMB function support multiple variants of many families and various link function

  • For poisson: ‘poisson’ vs. ‘compois’

  • For negative binomial: ‘nbinom1’ vs. ‘nbinom2’

  • For Gamma: Different link functions (log vs. identity vs. inverse)

Objectives:

  • Identify the best model family/link variant based on the selected fixed factors

Workflow:

  • Fit an alternative model with family == ‘nbinom1’

  • Simplify the alternative model with MuMIn::dredge()

  • Select the best alternative model

  • Use model selection to compare the best original and alternative nodels (MuMIn::model.sel)

  • Select the optimal model in terms of family variant

  • Diagnose the best model with glmmTMB:diagnose




Simplification of the negative binomial models (nbinom1)

  • Best model kept all terms

  • Second best model DeltaAIC > 2

  • Keep the first

cond((Int)) disp((Int)) cond(cos_td_rad) cond(sin_td_rad) cond(Unit_Label) cond(year_ct) df logLik AIC delta weight
16 0.453
0.519 -0.580
-0.015 14 -4447.651 8923.302 0.000 7.080189e-01
8 0.430
0.539 -0.519
NA 13 -4449.537 8925.074 1.772 2.919430e-01
15 1.052
NA -0.231
-0.019 13 -4458.639 8943.278 19.976 3.252781e-05
7 1.052
NA -0.136
NA 12 -4461.544 8947.087 23.785 4.843230e-06
5 1.149
NA NA
NA 11 -4465.029 8952.059 28.757 4.032874e-07
6 1.124
0.038 NA
NA 12 -4464.831 8953.662 30.360 1.808752e-07
13 1.155
NA NA
-0.002 12 -4464.987 8953.974 30.672 1.548103e-07
14 1.121
0.040 NA
0.000 13 -4464.829 8955.659 32.357 6.666322e-08
12 1.206
0.453 -0.530 NA -0.015 7 -4479.263 8972.527 49.225 1.448774e-11
4 1.172
0.470 -0.469 NA NA 6 -4480.964 8973.928 50.626 7.188622e-12
11 1.706
NA -0.217 NA -0.018 6 -4487.604 8987.207 63.905 9.401576e-15
3 1.687
NA -0.129 NA NA 5 -4490.022 8990.045 66.743 2.275575e-15
1 1.769
NA NA NA NA 4 -4492.998 8993.995 70.693 3.156753e-16
2 1.754
0.025 NA NA NA 5 -4492.914 8995.827 72.525 1.262838e-16
9 1.777
NA NA NA -0.002 5 -4492.963 8995.926 72.624 1.201927e-16
10 1.756
0.023 NA NA 0.000 6 -4492.912 8997.825 74.523 4.651438e-17



Model selection

  • The ‘nbinom2’ model outperformed the ‘nbinom1’ model.
cond((Int)) disp((Int)) cond(cos_td_rad) cond(sin_td_rad) cond(Unit_Label) cond(year_ct) family df logLik AIC delta weight
BestModel 0.241
0.535 -0.544
-0.010 nbinom2(log) 14 -4411.230 8850.460 0.000 1.000000e+00
Best_Opt1 0.453
0.519 -0.580
-0.015 nbinom1(log) 14 -4447.651 8923.302 72.842 1.522462e-16



Diagnostics and sigma

  • Diagnostic of best model reveals minor issues
## Unusually large Z-statistics (|x|>5):
## 
##                       sin_td_rad            Unit_LabelLoess Areas 
##                        -5.202071                         5.409626 
## Unit_LabelPlanted Conifer Forest   Unit_LabelMediterranean Maquis 
##                         5.556414                         6.654735 
##       Unit_LabelBatha Vegetation                    d~(Intercept) 
##                        10.129983                        17.909080 
##        theta_1|point_name:site.1                   theta_1|site.1 
##                        -9.957956                       -10.387473 
## 
## Large Z-statistics (estimate/std err) suggest a *possible* failure of
## the Wald approximation - often also associated with parameters that are
## at or near the edge of their range (e.g. random-effects standard
## deviations approaching 0).  (Alternately, they may simply represent
## very well-estimated parameters; intercepts of non-centered models may
## fall in this category.) While the Wald p-values and standard errors
## listed in summary() may be unreliable, profile confidence intervals
## (see ?confint.glmmTMB) and likelihood ratio test p-values derived by
## comparing models (e.g. ?drop1) are probably still OK.  (Note that the
## LRT is conservative when the null value is on the boundary, e.g. a
## variance or zero-inflation value of 0 (Self and Liang 1987; Stram and
## Lee 1994; Goldman and Whelan 2000); in simple cases these p-values are
## approximately twice as large as they should be.)



Main conclusions:

  • Continue with the ‘nbinom2’ model
## Formula:          
## Response ~ cos_td_rad + sin_td_rad + Unit_Label + year_ct + (1 |  
##     site/point_name)
## Data: MainData
##       AIC       BIC    logLik  df.resid 
##  8850.460  8925.696 -4411.230      1580 
## Random-effects (co)variances:
## 
## Conditional model:
##  Groups          Name        Std.Dev.
##  point_name:site (Intercept) 0.2096  
##  site            (Intercept) 0.2510  
## 
## Number of obs: 1594 / Conditional model: point_name:site, 574; site, 70
## 
## Dispersion parameter for nbinom2 family (): 3.24 
## 
## Fixed Effects:
## 
## Conditional model:
##                      (Intercept)                        cos_td_rad  
##                          0.24131                           0.53453  
##                       sin_td_rad         Unit_LabelNegev Highlands  
##                         -0.54391                           0.55731  
##           Unit_LabelInland Sands             Unit_LabelLoess Areas  
##                          0.41276                           0.93479  
##        Unit_LabelTransition Zone  Unit_LabelPlanted Conifer Forest  
##                          0.78200                           0.82802  
##   Unit_LabelMediterranean Maquis        Unit_LabelBatha Vegetation  
##                          0.96648                           1.59728  
##                          year_ct  
##                         -0.01015
## glmmTMB::glmmTMB(formula = Response ~ cos_td_rad + sin_td_rad + 
##     Unit_Label + year_ct + (1 | site/point_name), data = MainData, 
##     family = "nbinom2", ziformula = ~0, dispformula = ~1, na.action = na.fail)



10. Diagnose the best model with simulated residuals (DHARMa)

General info

To diagnose the model, we simulate residuals with the DHARMa package. For more details why DHARMa should be used with GLMM, see:

  1. https://cran.r-project.org/web/packages/DHARMa/vignettes/DHARMa.html

To interpret the residuals:

  • Scaled residual value of 0.5 means that half of the simulated data are higher than the observed value, and half of them lower

  • Scaled residual value of 0.99 would mean that nearly all simulated data are lower than the observed value.

  • The minimum/maximum values for the residuals are 0 and 1

  • For a correctly specified model we would expect asymptotically

    • A uniform (flat) distribution of the scaled residuals

    • Uniformity in y direction if we plot against any predictor




Dispersion:

  • Compare the variance of the observed raw residuals (red line) against the variance of the simulated residuals (histogram)

    • Dispersion > 1 indicates overdispersion

    • Dispersion < 1 indicates underdispersion

  • Observed variance (red line) did not differ from the simulated variances (histogram)

  • Dispersion > 1 but n.s.

  • No evidence for over or under dispersion

## 
##  DHARMa nonparametric dispersion test via sd of residuals fitted vs.
##  simulated
## 
## data:  simulationOutput
## dispersion = 1.367, p-value = 0.016
## alternative hypothesis: two.sided



Outliers:

  • Understanding outliers:

    • When all simulated values for the observation are higher or smaller than the observation itself

    • Tests if the number of Outliers is larger or smaller than expected

  • There are 7 outliers in the simulated residuals

  • Test is n.s.

  • No evidence for excessive number of outliers

## 
##  DHARMa outlier test based on exact binomial test with approximate
##  expectations
## 
## data:  SimulatedRes
## outliers at both margin(s) = 8, observations = 1594, p-value = 0.4283
## alternative hypothesis: true probability of success is not equal to 0.003992016
## 95 percent confidence interval:
##  0.002169186 0.009865008
## sample estimates:
## frequency of outliers (expected: 0.00399201596806387 ) 
##                                            0.005018821



Zero inflation:

  • The expected distribution of zeros (histogram) and the observed value (red line)

    • ratioObsSim < 1 : the observed data has less zeros than expected

    • ratioObsSim > 1 : the observed data has more zeros than expected

  • No evidence for zero inflation:

    • Red line within histogram

    • Test is n.s

## 
##  DHARMa zero-inflation test via comparison to expected zeros with
##  simulation under H0 = fitted model
## 
## data:  simulationOutput
## ratioObsSim = 1.0796, p-value = 0.532
## alternative hypothesis: two.sided



Quantiles:

  • For numeric:

    • Solid black / red lines: qunatile regressions (0.25, 0.5, 0.75) of the observed residuals against predicted value

      • Significant regressions in red
    • Points - simulated residuals

    • Dashed horizontal lines - the expected value for the 0.25, 0.5, and 0.75 quantiles (simulated residuals have a uniform distribution)

  • For categorical:

    • Within-group uniformity of categorical predictors [asymptotic one-sample Kolmogorov-Smirnov test]

    • Gray: the residual within the group are distributed according to the model assumptions (dashed lines)

    • Red: the residual within the group deviate from the model assumptions (dashed lines)

    • homogeneity of variances of categorical predictors [Levene’s Test]

    • Non-significant (gray): group variances are constant

    • Significant (red): group variances are not constant

  • Results:

    • The 0.25 and 0.5 quantile regressions of the observed residuals against predicted values were significant, but lines are mostly horizontal and deviation occurs mainly in the extremes

    • The 0.75 quantile regression of the observed residuals against predicted values was non-significant

## 
##  Test for location of quantiles via qgam
## 
## data:  simulationOutput
## p-value = 0.009107
## alternative hypothesis: both



year:

  • See categorical above for details (years treated as categorical)

  • Residuals in 2017 deviated from expected

  • Residuals in all other years did not deviate from expected

  • Variances of different years differ significantly from one another




Unit:

  • See categorical above for details

  • Residuals of 1 out of 8 units deviated from expected:

    • Negev Highlands
  • Variances of the units differ significantly from one another

## $uniformity
## $uniformity$details
## catPred: Arid South
## 
##  Asymptotic one-sample Kolmogorov-Smirnov test
## 
## data:  dd[x, ]
## D = 0.14578, p-value = 0.003406
## alternative hypothesis: two-sided
## 
## ------------------------------------------------------------ 
## catPred: Negev Highlands
## 
##  Asymptotic one-sample Kolmogorov-Smirnov test
## 
## data:  dd[x, ]
## D = 0.087058, p-value = 0.0829
## alternative hypothesis: two-sided
## 
## ------------------------------------------------------------ 
## catPred: Inland Sands
## 
##  Exact one-sample Kolmogorov-Smirnov test
## 
## data:  dd[x, ]
## D = 0.18601, p-value = 0.1455
## alternative hypothesis: two-sided
## 
## ------------------------------------------------------------ 
## catPred: Loess Areas
## 
##  Asymptotic one-sample Kolmogorov-Smirnov test
## 
## data:  dd[x, ]
## D = 0.11975, p-value = 0.03609
## alternative hypothesis: two-sided
## 
## ------------------------------------------------------------ 
## catPred: Transition Zone
## 
##  Asymptotic one-sample Kolmogorov-Smirnov test
## 
## data:  dd[x, ]
## D = 0.059894, p-value = 0.659
## alternative hypothesis: two-sided
## 
## ------------------------------------------------------------ 
## catPred: Planted Conifer Forest
## 
##  Asymptotic one-sample Kolmogorov-Smirnov test
## 
## data:  dd[x, ]
## D = 0.0756, p-value = 0.1527
## alternative hypothesis: two-sided
## 
## ------------------------------------------------------------ 
## catPred: Mediterranean Maquis
## 
##  Asymptotic one-sample Kolmogorov-Smirnov test
## 
## data:  dd[x, ]
## D = 0.052242, p-value = 0.1771
## alternative hypothesis: two-sided
## 
## ------------------------------------------------------------ 
## catPred: Batha Vegetation
## 
##  Asymptotic one-sample Kolmogorov-Smirnov test
## 
## data:  dd[x, ]
## D = 0.036211, p-value = 0.9114
## alternative hypothesis: two-sided
## 
## 
## $uniformity$p.value
## [1] 0.003406054 0.082898104 0.145545594 0.036086718 0.659025079 0.152712776
## [7] 0.177094531 0.911361465
## 
## $uniformity$p.value.cor
## [1] 0.02724843 0.49738862 0.72772797 0.25260703 1.00000000 0.72772797 0.72772797
## [8] 1.00000000
## 
## 
## $homogeneity
## Levene's Test for Homogeneity of Variance (center = median)
##         Df F value  Pr(>F)    
## group    7  4.8347 2.1e-05 ***
##       1586                    
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1



Uniformity of residuals:

  • Kolmogorov-Smirnov Test (KS) of the uniformity of simulated residuals

    • Residuals did not deviate from expected
  • Dispersion test n.s. (see above)

  • Outliers test n.s. (see above)

  • qq plot -

    • Quantiles of the observed residuals vs. qunatiles of the simulated residuals

    • The expected value is the 1:1 line (red)

  • qqline is close to the expected line

## 
##  Asymptotic one-sample Kolmogorov-Smirnov test
## 
## data:  simulationOutput$scaledResiduals
## D = 0.046494, p-value = 0.002033
## alternative hypothesis: two-sided



Main conclusions:

  • The 0.25 and 0.5 quantile regressions of the observed residuals against predicted values were significant, but lines are mostly horizontal and deviation occurs mainly in the extremes

  • Some issues with uniformity of residuals within groups (overall, years, units)

  • Other than that, no other issues

  • Continue to compute marginal effects




11. Compute marginal effects for figures

  • Marginal effects are extracted with ggeffects:ggeffect() that calls effects::Effect() internally

  • Marginal effects are only calculated and plotted for terms that appear in the final model:

    • year

    • unit

  • When calculating change for years:

    • FoldIncrease: Value_year / Value_year_0

    • PercentIncrease: 100 * [(Value_year - Value_year_0) / Value_year_0]

    • FoldDecline: Value_year_0 / Value_year

    • PercentDecline: 100 * [(Value_year_0 - Value_year) / Value_year_0]

    • Delta: Value_year - Value_year_0

  • When calculating change for Agriculture / Settlements / Units:

    • Fold: Value_level / Value_min

    • PercentDiffernce: 100 * [(Value_level - Value_min) / Value_min]

    • Delta: Value_level - Value_min




13. Figures for years




14. Figures for unit




15. Calculate the folds and percent change

  • Calculate the folds and percent change - contrast (agriculture / settlements):

    • If Near > Far (coefficient > 0):

      • Contrast_Type: “Increase”

      • Contrast_Fold: exp(coefficient)

      • Contrast_Percent: 100 * (exp(coefficient) - 1)

    • If Near < Far (coefficient > 0):

      • Contrast_Type: “Decrease”

      • Contrast_Fold: 1 / exp(coefficient)

      • Contrast_Percent: 100 * (1 - exp(coefficient))

    • If Near = Far (coefficient = 0):

      • Contrast_Type: “NoChange”

      • Contrast_Fold: 1

      • Contrast_Percent: 0

  • Calculate the folds and percent change - temporal trend (years):

    • Num_Years: Number of years

    • If LastYear > FirstYear (coefficient > 0):

      • Year_Type: “Increase”

      • YearFold_1Year: exp(coefficient)

      • YearPercent_1Year: 100 * (exp(coefficient) - 1)

      • YearFold_NumYear: exp(coefficient * Num_Years)

      • YearPercent_NumYear: 100 * (exp(coefficien * Num_Years) - 1)

    • If LastYear < FirstYear (coefficient > 0):

      • Year_Type: “Decrease”

      • YearFold_1Year: 1 / exp(coefficient)

      • YearPercent_1Year: 100 * (1 - exp(coefficient))

      • YearFold_NumYear: 1 / exp(coefficient * Num_Years)

      • YearPercent_NumYear: 100 * (1 - exp(coefficient * Num_Years))

    • If LastYear = FirstYear (coefficient = 0):

      • Year_Type: “NoChange”

      • YearFold_1Year: 1

      • YearPercent_1Year: 0

      • YearFold_NumYear: 1

      • YearPercent_NumYear: 0




16. Extract information and write to files

  • Summary(BestModel) is written into:

    • SoN23_BirdsNational_Time_Abundance_NonSynanthrope_ModSummary.txt
  • Fixed effects from the best model are written into:

    • SoN23_BirdsNational_Time_Abundance_NonSynanthrope_Coeff.csv
  • General information on the model is written into:

    • SoN23_BirdsNational_Time_Abundance_NonSynanthrope_ModelInfo.csv
  • Marginal effects of years / units are written to:

    • SoN23_BirdsNational_Time_Abundance_NonSynanthrope_MargEffe_Years.csv

    • SoN23_BirdsNational_Time_Abundance_NonSynanthrope_MargEffe_Units.csv

  • A figure is written (png, svg, pdf) for each of the marginal effect figures plotted in this document

    • Names are the predictors (interactions) + indication if the points are observations, partial residuals or none:

      • “AgriUnit_Observation” : agriculture * unit ; observations

      • “SettYearUnit_PartResi” : settlements * year * unit ; partial residuals

      • “AgriYear_None” : agriculture * year, without points

    • Names with “CapY_100” means that the maximal value of Y was truncated at 100 and some samples are not plotted

    • Names with “_Heb” mean that all text elements are in Hebrew




17. Summary of main findings


Data structure:

  • The data contains 1594 samples in which 17 species passed as non-rare (and thus included in the analysis)

  • Total abundances per sample ranges from 0 to 117 individuals


Visual inspection of Response:

  • No apparent trends in total abundance with time

  • Considerable variability in total abundance between units, with highest values in the Mediterranean maquis unit and lowest in the inland sands unit


Model fitting (glmmTMB):

  • Negative binomial outperformed poisson

  • nbinom2 outperformed nbinom1

  • Diagnostics with DHARMa:

    • The 0.25 and 0.5 quantile regressions of the observed residuals against predicted values were significant, but lines are mostly horizontal and deviation occurs mainly in the extremes

    • Some issues with uniformity of residuals within groups (overall, years, units)


Best model call:

## glmmTMB::glmmTMB(formula = Response ~ cos_td_rad + sin_td_rad + 
##     Unit_Label + year_ct + (1 | site/point_name), data = MainData, 
##     family = "nbinom2", ziformula = ~0, dispformula = ~1, na.action = na.fail)

Best model summary:

##  Family: nbinom2  ( log )
## Formula:          
## Response ~ cos_td_rad + sin_td_rad + Unit_Label + year_ct + (1 |  
##     site/point_name)
## Data: MainData
## 
##      AIC      BIC   logLik deviance df.resid 
##   8850.5   8925.7  -4411.2   8822.5     1580 
## 
## Random effects:
## 
## Conditional model:
##  Groups          Name        Variance Std.Dev.
##  point_name:site (Intercept) 0.04392  0.2096  
##  site            (Intercept) 0.06298  0.2510  
## Number of obs: 1594, groups:  point_name:site, 574; site, 70
## 
## Dispersion parameter for nbinom2 family (): 3.24 
## 
## Conditional model:
##                                   Estimate Std. Error z value Pr(>|z|)    
## (Intercept)                       0.241306   0.186624   1.293  0.19601    
## cos_td_rad                        0.534530   0.110482   4.838 1.31e-06 ***
## sin_td_rad                       -0.543913   0.104557  -5.202 1.97e-07 ***
## Unit_LabelNegev Highlands         0.557312   0.171071   3.258  0.00112 ** 
## Unit_LabelInland Sands            0.412763   0.245670   1.680  0.09293 .  
## Unit_LabelLoess Areas             0.934791   0.172801   5.410 6.32e-08 ***
## Unit_LabelTransition Zone         0.781998   0.174690   4.476 7.59e-06 ***
## Unit_LabelPlanted Conifer Forest  0.828017   0.149020   5.556 2.75e-08 ***
## Unit_LabelMediterranean Maquis    0.966479   0.145232   6.655 2.84e-11 ***
## Unit_LabelBatha Vegetation        1.597276   0.157678  10.130  < 2e-16 ***
## year_ct                          -0.010151   0.008288  -1.225  0.22062    
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Best model coefficients:

Effect Estimate Std. Error z value Pr(>|z|) 2.5 % 97.5 %
(Intercept) 0.241 0.187 1.293 0.196 -0.124 0.607
cos_td_rad 0.535 0.110 4.838 0.000 0.318 0.751
sin_td_rad -0.544 0.105 -5.202 0.000 -0.749 -0.339
Unit_LabelNegev Highlands 0.557 0.171 3.258 0.001 0.222 0.893
Unit_LabelInland Sands 0.413 0.246 1.680 0.093 -0.069 0.894
Unit_LabelLoess Areas 0.935 0.173 5.410 0.000 0.596 1.273
Unit_LabelTransition Zone 0.782 0.175 4.476 0.000 0.440 1.124
Unit_LabelPlanted Conifer Forest 0.828 0.149 5.556 0.000 0.536 1.120
Unit_LabelMediterranean Maquis 0.966 0.145 6.655 0.000 0.682 1.251
Unit_LabelBatha Vegetation 1.597 0.158 10.130 0.000 1.288 1.906
year_ct -0.010 0.008 -1.225 0.221 -0.026 0.006

Effect plots - year:

  • Slightly decreasing, almost horizontal

  • Partial residuals look better

x predicted std.error conf.low conf.high group FoldIncrease PercentIncrease FoldDecline PercentDecline Delta
0 5.976 0.053 5.385 6.632 1 1.000 0.000 1.000 0.000 0.000
2 5.856 0.043 5.379 6.376 1 0.980 -2.010 1.021 2.010 -0.120
3 5.797 0.040 5.358 6.272 1 0.970 -2.999 1.031 2.999 -0.179
4 5.738 0.039 5.320 6.189 1 0.960 -3.979 1.041 3.979 -0.238
5 5.680 0.039 5.265 6.128 1 0.951 -4.949 1.052 4.949 -0.296
6 5.623 0.041 5.193 6.089 1 0.941 -5.909 1.063 5.909 -0.353
7 5.566 0.044 5.107 6.067 1 0.931 -6.859 1.074 6.859 -0.410
8 5.510 0.049 5.010 6.060 1 0.922 -7.800 1.085 7.800 -0.466
9 5.454 0.054 4.907 6.063 1 0.913 -8.731 1.096 8.731 -0.522

Effect plots - unit:

  • Partial residuals looks better

  • Highest in Herbaceous and Dwarf-Shrub Vegetation (Batha vegetation in the figures) and Mediterranean Maquis

  • Lowest in the Inland Sands unit

  • Intermediate in all other units

x predicted std.error conf.low conf.high group Fold PercentDiffernce Delta
Arid South 2.401 0.128 1.867 3.087 1 1.000 0.000 0.000
Negev Highlands 4.192 0.116 3.341 5.259 1 1.746 74.597 1.791
Inland Sands 3.628 0.209 2.406 5.470 1 1.511 51.099 1.227
Loess Areas 6.115 0.114 4.889 7.648 1 2.547 154.668 3.714
Transition Zone 5.248 0.120 4.149 6.639 1 2.186 118.584 2.847
Planted Conifer Forest 5.495 0.077 4.729 6.385 1 2.289 128.878 3.094
Mediterranean Maquis 6.311 0.070 5.502 7.240 1 2.629 162.867 3.910
Batha Vegetation 11.860 0.092 9.903 14.204 1 4.940 393.956 9.459

Results summary:

Taxa Birds
Scale National
SpeGroup Resident
DataGroup Time
Response Abundance
Call glmmTMB::glmmTMB(formula = Response ~ cos_td_rad + sin_td_rad + Unit_Label + year_ct + (1 | site/point_name), data = MainData, family = “nbinom2”, ziformula = ~0, dispformula = ~1, na.action = na.fail)
N_Obs 1594
df 14
df.residuals 1580
logLIk -4411.23
AIC 8850.46
BIC 8925.696
deviance 8822.46
sigma 3.235275
ContrastFar_Marginal NA
ContrastNear_Marginal NA
Contrast_Type NA
Contrast_Fold NA
Num_Years 9
Year_Start_Marginal 5.976214
Year_End_Marginal 5.45442
Year_Type Decrease
YearFold_1Year 1.010203
YearFold_NumYear 1.095664
YearPercent_1Year 1.009987
YearPercent_NumYear 8.731179



Session info:

## R version 4.2.3 (2023-03-15 ucrt)
## Platform: x86_64-w64-mingw32/x64 (64-bit)
## Running under: Windows 10 x64 (build 22621)
## 
## Matrix products: default
## 
## locale:
## [1] LC_COLLATE=he_IL.UTF-8  LC_CTYPE=he_IL.UTF-8    LC_MONETARY=he_IL.UTF-8
## [4] LC_NUMERIC=C            LC_TIME=he_IL.UTF-8    
## 
## attached base packages:
## [1] grid      splines   stats     graphics  grDevices utils     datasets 
## [8] methods   base     
## 
## other attached packages:
##  [1] extrafont_0.19    gridExtra_2.3     ggeffects_1.2.1   ggnewscale_0.4.9 
##  [5] gghalves_0.1.4    GGally_2.1.2      data.table_1.14.8 effects_4.2-3    
##  [9] carData_3.0-5     DHARMa_0.4.6      MuMIn_1.47.5      glmmTMB_1.1.8    
## [13] EnvStats_2.8.1    kableExtra_1.3.4  knitr_1.42        lubridate_1.9.2  
## [17] forcats_1.0.0     stringr_1.5.0     dplyr_1.1.1       purrr_1.0.1      
## [21] readr_2.1.4       tidyr_1.3.0       tibble_3.2.1      ggplot2_3.4.2    
## [25] tidyverse_2.0.0   rmarkdown_2.21   
## 
## loaded via a namespace (and not attached):
##   [1] TH.data_1.1-2      minqa_1.2.5        colorspace_2.1-1  
##   [4] ellipsis_0.3.2     sjlabelled_1.2.0   snakecase_0.11.0  
##   [7] estimability_1.4.1 rstudioapi_0.14    farver_2.1.1      
##  [10] fansi_1.0.4        mvtnorm_1.2-0      xml2_1.3.3        
##  [13] codetools_0.2-19   doParallel_1.0.17  cachem_1.0.7      
##  [16] jsonlite_1.8.4     nloptr_2.0.3       Cairo_1.6-0       
##  [19] Rttf2pt1_1.3.12    shiny_1.7.4        compiler_4.2.3    
##  [22] httr_1.4.5         emmeans_1.8.6      Matrix_1.5-5      
##  [25] fastmap_1.1.1      survey_4.2         cli_3.6.1         
##  [28] later_1.3.0        htmltools_0.5.5    tools_4.2.3       
##  [31] coda_0.19-4        gtable_0.3.3       glue_1.6.2        
##  [34] Rcpp_1.0.10        jquerylib_0.1.4    vctrs_0.6.1       
##  [37] svglite_2.1.1      nlme_3.1-162       extrafontdb_1.0   
##  [40] iterators_1.0.14   insight_0.19.1     xfun_0.38         
##  [43] rbibutils_2.2.16   lme4_1.1-32        rvest_1.0.3       
##  [46] mime_0.12          timechange_0.2.0   lifecycle_1.0.3   
##  [49] gap.datasets_0.0.6 MASS_7.3-58.3      zoo_1.8-11        
##  [52] scales_1.2.1       ragg_1.2.5         promises_1.2.0.1  
##  [55] hms_1.1.3          parallel_4.2.3     sandwich_3.1-0    
##  [58] TMB_1.9.6          RColorBrewer_1.1-3 yaml_2.3.7        
##  [61] sass_0.4.5         reshape_0.8.9      stringi_1.7.12    
##  [64] highr_0.10         gap_1.5-3          foreach_1.5.2     
##  [67] boot_1.3-28.1      qgam_1.3.4         Rdpack_2.6        
##  [70] rlang_1.1.0        pkgconfig_2.0.3    systemfonts_1.0.4 
##  [73] evaluate_0.20      lattice_0.21-8     labeling_0.4.2    
##  [76] tidyselect_1.2.0   plyr_1.8.8         magrittr_2.0.3    
##  [79] R6_2.5.1           generics_0.1.3     multcomp_1.4-23   
##  [82] DBI_1.1.3          pillar_1.9.0       withr_2.5.0       
##  [85] mgcv_1.8-42        survival_3.5-5     nnet_7.3-18       
##  [88] utf8_1.2.3         tzdb_0.3.0         digest_0.6.31     
##  [91] webshot_0.5.4      xtable_1.8-6       httpuv_1.6.9      
##  [94] numDeriv_2022.9-1  textshaping_0.3.6  stats4_4.2.3      
##  [97] munsell_0.5.0      viridisLite_0.4.1  bslib_0.4.2       
## [100] mitools_2.4