Notes:
Only full-sized catchment areas (i.e., without a 500m coastal buffer) were used for this analysis to maintain consistency, as not all coastal catchments had a 500m buffer. See the summary table below for a comparison of catchment areas with vs. without a 500m buffer.
Data are likely a good fit for a zero-inflated model set.
First the dataset is prepared as needed to run models. A trend model is prepared to later standardize the number of incoming birds across the entire dataset to the same day of year (DOY) and radar tilt.
# Model dataframe will be s
# Response var is 'mamuinpd'
s <- data.frame(y = surveys$mamuinpd)
# Clean up any vars as needed
s$observer <- as.factor(surveys$observer)
s[["observer"]][is.na(s$observer)] <- "Unknown" # 15 records have NA observer. Give them value of "Unknown" to match other unknown observer records.
# Make factors
s$region <- as.factor(surveys$region)
s$catchment <- as.factor(surveys$new_name)
s$year <- as.factor(surveys$year)
s$dir <- as.factor(surveys$dir)
# Re-scale numerics
s$s_year <- scale(as.numeric(s$year))[,1]
s$tilt <- as.numeric(surveys$tilt)
s$s_tilt <- scale(s$tilt)[,1]
s$radius <- as.numeric(surveys$radius)
s$s_radius <- scale(s$radius)[,1]
s$doy <- as.numeric(surveys$doy)
s$s_doy <- scale(s$doy)[,1]
# Chuck in lat/lon for good measure, in case of spatial autocorrelation
s$lat <- surveys$lat
s$s_lat <- scale(s$lat)[,1]
s$lon <- surveys$lon
s$s_lon <- scale(s$lon)[,1]
# Keep only complete cases
s <- s[complete.cases(s),]
With the model data (s) prepared, a series of baseline
models are tested.
Baseline assumptions: we assume the number of birds will vary depending on the observer and the year of survey. We are not interested in these variations and thus throw in a random effect structure to account for this. And while it looks like zero inflation models would be good candidate models, we first test no zero inflation vs. two differing types of zero inflation. Finally, we test three different response distributions: Poisson and two different types of negative binomial.
## BASE ZI MODELS
# WITHOUT ZERO-INFLATION
base0_p <- glmmTMB(y ~ s_doy + I(s_doy^2) + s_year + region + lat + lon + s_tilt + s_radius + dir + (1|observer) + (1|year*catchment),
data = s,
ziformula = ~0, # NO zero-inflation
family = poisson)
base0_nb1 <- update(base0_p, family = nbinom1)
base0_nb2 <- update(base0_p, family = nbinom2)
# WITH ZERO-INFLATION
# Same zero-inflation parameter fitted across all observations
base1_p <- glmmTMB(y ~ s_doy + I(s_doy^2) + s_year + region + lat + lon + s_tilt + s_radius + dir + (1|observer) + (1|year*catchment),
data = s,
ziformula = ~1, # YES zero-inflation
family = poisson)
base1_nb1 <- update(base1_p, family = nbinom1)
base1_nb2 <- update(base1_p, family = nbinom2)
# WITH CONDITIONAL ZERO-INFLATION
# ZI formula is set to be identical to conditional formula
# Note these take awhile to fit...
# ... and all fail to converge, indicating ZI formula is overly complex.
# base2_p <- glmmTMB(y ~ s_doy + I(s_doy^2) + s_year + region + lat + lon + s_tilt + s_radius + dir + (1|observer) + (1|year*catchment),
# data = s,
# ziformula = ~., # YES zero-inflation
# family = poisson) # fails
#base2_nb1 <- update(base2_p, family = nbinom1) # fails, takes forever to run
#base2_nb2 <- update(base2_p, family = nbinom2) # also fails
# COMPARE
# Output indicates that negative binomial 2 distribution
# with zero inflation is best fit.
bbmle::AICtab(base0_p, base0_nb1, base0_nb2,
base1_p, base1_nb1, base1_nb2)
## dAIC df
## base0_nb2 0.0 19
## base1_nb2 1.2 20
## base0_nb1 110.7 19
## base1_nb1 112.7 20
## base1_p 23145.6 19
## base0_p 23215.8 18
# Choose our base model
zinb0 <- base1_nb2
The baseline model indicates a zero-inflated negative binomial 2 distribution is the best fit for the response data. Now, add all terms in the full model to test if other variables in the survey dataset are influential in MAMU counts.
# ## BASELINE MODEL ##
# All sites have same trend, but trend varies by observer and year
# This is our baseline zinb0, i.e.:
# glmmTMB(formula = y ~ s_doy + I(s_doy^2) + s_year + region +
# + lat + lon + s_tilt + s_radius + dir
# + (1 | observer) + (1 | year * catchment),
# data = surveys,
# family = nbinom2,
# ziformula = ~1,
# dispformula = ~1)
# +++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++ #
# ## INTERACTION TERMS ##
# Slope for each doy varies by catchment
# FAILS: by far too many combinations.
# Not enough points per catchment to be meaningful.
#zinbx <- update(zinb0, formula = ~. + (s_doy + I(s_doy^2)) * catchment)
# Slope for each year varies by catchment
# FAILS: by far too many combinations
# Not enough points per catchment to be meaningful.
#zinbx <- update(zinb0, formula = ~. + s_year * catchment)
# Slope for each year varies by region
zinb1 <- update(zinb0, formula = ~. + s_year * region)
# Slope for doy varies by year
zinb2 <- update(zinb0, formula = ~. + s_doy:s_year + I(s_doy^2):s_year)
# Slope for doy varies by region
zinb3 <- update(zinb0, formula = ~. + s_doy:region + I(s_doy^2):region)
# Adding in a DOY:region interaction improves model fit
anova(zinb0, zinb1, zinb2, zinb3)
## Data: s
## Models:
## zinb0: y ~ s_doy + I(s_doy^2) + s_year + region + lat + lon + s_tilt + , zi=~1, disp=~1
## zinb0: s_radius + dir + (1 | observer) + (1 | year * catchment), zi=~1, disp=~1
## zinb2: y ~ s_doy + I(s_doy^2) + s_year + region + lat + lon + s_tilt + , zi=~1, disp=~1
## zinb2: s_radius + dir + (1 | observer) + (1 | year * catchment) + , zi=~1, disp=~1
## zinb2: s_doy:s_year + I(s_doy^2):s_year, zi=~1, disp=~1
## zinb1: y ~ s_doy + I(s_doy^2) + s_year + region + lat + lon + s_tilt + , zi=~1, disp=~1
## zinb1: s_radius + dir + (1 | observer) + (1 | year * catchment) + , zi=~1, disp=~1
## zinb1: s_year:region, zi=~1, disp=~1
## zinb3: y ~ s_doy + I(s_doy^2) + s_year + region + lat + lon + s_tilt + , zi=~1, disp=~1
## zinb3: s_radius + dir + (1 | observer) + (1 | year * catchment) + , zi=~1, disp=~1
## zinb3: s_doy:region + I(s_doy^2):region, zi=~1, disp=~1
## Df AIC BIC logLik deviance Chisq Chi Df Pr(>Chisq)
## zinb0 20 18460 18569 -9210.2 18420
## zinb2 22 2
## zinb1 25 18448 18585 -9199.2 18398 3
## zinb3 30 18430 18594 -9185.0 18370 28.319 5 3.153e-05 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
It seems varying the effects of DOY by region has the best fit. This makes sense - seasonality is different between the Pacific side of the island and the relatively sheltered mainland coasts.
## RANDOM EFFECT STRUCTURES
# ## BASELINE MODEL ##
# All sites have same trend, but trend varies by observer and year
# This is our baseline zinb3, i.e.:
# glmmTMB(formula = y ~ s_doy + I(s_doy^2) + region
# + s_doy:region + I(s_doy^2):region
# + lat + lon + s_tilt + s_radius + dir
# + (1 | observer) + (1 | year * catchment),
# data = surveys,
# family = nbinom2,
# ziformula = ~1,
# dispformula = ~1)
# +++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++ #
# ## YEAR, OBSERVER RANDOM EFFECTS ##
# Let us double check our assumption that trend varies by observer and year....
# Trend intercept varies by year/catchment combo only
zinb_y <- update(zinb3, formula = ~. -(1|observer))
# Trend intercept varies by observer only
zinb_o <- update(zinb3, formula = ~. -(1|year * catchment))
# Model comparison indicates both observer and year REs are
# highly significant and should be retained.
anova(zinb3, zinb_y, zinb_o)
## Data: s
## Models:
## zinb_o: y ~ s_doy + I(s_doy^2) + s_year + region + lat + lon + s_tilt + , zi=~1, disp=~1
## zinb_o: s_radius + dir + (1 | observer) + s_doy:region + I(s_doy^2):region, zi=~1, disp=~1
## zinb_y: y ~ s_doy + I(s_doy^2) + s_year + region + lat + lon + s_tilt + , zi=~1, disp=~1
## zinb_y: s_radius + dir + (1 | year * catchment) + s_doy:region + , zi=~1, disp=~1
## zinb_y: I(s_doy^2):region, zi=~1, disp=~1
## zinb3: y ~ s_doy + I(s_doy^2) + s_year + region + lat + lon + s_tilt + , zi=~1, disp=~1
## zinb3: s_radius + dir + (1 | observer) + (1 | year * catchment) + , zi=~1, disp=~1
## zinb3: s_doy:region + I(s_doy^2):region, zi=~1, disp=~1
## Df AIC BIC logLik deviance Chisq Chi Df Pr(>Chisq)
## zinb_o 27 19429 19576 -9687.6 19375
## zinb_y 29 18474 18632 -9207.9 18416 959.49 2 < 2.2e-16 ***
## zinb3 30 18430 18594 -9185.0 18370 45.66 1 1.406e-11 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
# +++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++ #
# ## SPATIAL RANDOM EFFECTS ##
# INTERCEPT-ONLY
# Intercept varies by year and catchment, seperately
zinb_c <- update(zinb3, formula = ~. - (1|year*catchment) + (1|year) + (1|catchment))
# Intercept varies by year and catchment within region, seperately
zinb_cr <- update(zinb3, formula = ~. - (1|year*catchment) + (1|year) + (1|region/catchment))
# Intercept varies by year * region / catchment
zinb_ycr <- update(zinb3, formula = ~. -(1|year*catchment) + (1|year*region/catchment))
# SLOPE-INTERCEPT
# Slope and intercept for year varies by catchment
zinb_yc <- update(zinb3, formula = ~. - (1|year*catchment) + (s_year|catchment))
# Slope and intercept for year varies by region
zinb_yrc <- update(zinb3, formula = ~. - (1|year*catchment) + (s_year|region/catchment))
# Slope and intercept for doy varies by catchment
zinb_dc <- update(zinb3, formula = ~. + (s_doy|catchment) + (I(s_doy^2)|catchment))
# Slope and intercept for doy varies by region
zinb_drc <- update(zinb3, formula = ~. + (s_doy|region/catchment) + + (I(s_doy^2)|region/catchment))
# Compare
anova(zinb3, zinb_c, zinb_cr, zinb_ycr, zinb_yc, zinb_yrc, zinb_dc, zinb_drc)
bbmle::AICtab(zinb3, zinb_c, zinb_cr, zinb_ycr, zinb_yc, zinb_yrc, zinb_dc, zinb_drc)
## Data: s
## Models:
## zinb_c: y ~ s_doy + I(s_doy^2) + s_year + region + lat + lon + s_tilt + , zi=~1, disp=~1
## zinb_c: s_radius + dir + (1 | observer) + (1 | year) + (1 | catchment) + , zi=~1, disp=~1
## zinb_c: s_doy:region + I(s_doy^2):region, zi=~1, disp=~1
## zinb3: y ~ s_doy + I(s_doy^2) + s_year + region + lat + lon + s_tilt + , zi=~1, disp=~1
## zinb3: s_radius + dir + (1 | observer) + (1 | year * catchment) + , zi=~1, disp=~1
## zinb3: s_doy:region + I(s_doy^2):region, zi=~1, disp=~1
## zinb_cr: y ~ s_doy + I(s_doy^2) + s_year + region + lat + lon + s_tilt + , zi=~1, disp=~1
## zinb_cr: s_radius + dir + (1 | observer) + (1 | year) + (1 | region/catchment) + , zi=~1, disp=~1
## zinb_cr: s_doy:region + I(s_doy^2):region, zi=~1, disp=~1
## zinb_yc: y ~ s_doy + I(s_doy^2) + s_year + region + lat + lon + s_tilt + , zi=~1, disp=~1
## zinb_yc: s_radius + dir + (1 | observer) + (s_year | catchment) + , zi=~1, disp=~1
## zinb_yc: s_doy:region + I(s_doy^2):region, zi=~1, disp=~1
## zinb_ycr: y ~ s_doy + I(s_doy^2) + s_year + region + lat + lon + s_tilt + , zi=~1, disp=~1
## zinb_ycr: s_radius + dir + (1 | observer) + (1 | year * region/catchment) + , zi=~1, disp=~1
## zinb_ycr: s_doy:region + I(s_doy^2):region, zi=~1, disp=~1
## zinb_yrc: y ~ s_doy + I(s_doy^2) + s_year + region + lat + lon + s_tilt + , zi=~1, disp=~1
## zinb_yrc: s_radius + dir + (1 | observer) + (s_year | region/catchment) + , zi=~1, disp=~1
## zinb_yrc: s_doy:region + I(s_doy^2):region, zi=~1, disp=~1
## zinb_dc: y ~ s_doy + I(s_doy^2) + s_year + region + lat + lon + s_tilt + , zi=~1, disp=~1
## zinb_dc: s_radius + dir + (1 | observer) + (1 | year * catchment) + , zi=~1, disp=~1
## zinb_dc: (s_doy | catchment) + (I(s_doy^2) | catchment) + s_doy:region + , zi=~1, disp=~1
## zinb_dc: I(s_doy^2):region, zi=~1, disp=~1
## zinb_drc: y ~ s_doy + I(s_doy^2) + s_year + region + lat + lon + s_tilt + , zi=~1, disp=~1
## zinb_drc: s_radius + dir + (1 | observer) + (1 | year * catchment) + , zi=~1, disp=~1
## zinb_drc: (s_doy | region/catchment) + (I(s_doy^2) | region/catchment) + , zi=~1, disp=~1
## zinb_drc: s_doy:region + I(s_doy^2):region, zi=~1, disp=~1
## Df AIC BIC logLik deviance Chisq Chi Df Pr(>Chisq)
## zinb_c 29 18572 18730 -9257.0 18514
## zinb3 30 18430 18594 -9185.0 18370 143.85 1 <2e-16 ***
## zinb_cr 30 18572 18736 -9256.2 18512 0.00 0 1
## zinb_yc 30 18577 18740 -9258.3 18517 0.00 0 1
## zinb_ycr 31 18683 18852 -9310.3 18621 0.00 1 1
## zinb_yrc 33 2
## zinb_dc 36 3
## zinb_drc 42 6
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## dAIC df
## zinb3 0.0 30
## zinb_c 141.8 29
## zinb_cr 142.3 30
## zinb_yc 146.6 30
## zinb_ycr 252.6 31
## zinb_yrc NA 33
## zinb_dc NA 36
## zinb_drc NA 42
While the lowest AIC model includes the random effect of catchment nested within region, the model actually fails to converge and does not produce confidence intervals for our estimates. The second-lowest AIC model was selected instead. The final model includes quadratic effect of DOY, year, tilt, and catchment as fixed effects, with observer and year as random intercepts and a random slope-intercept of DOY by catchment:
y ~ s_doy + I(s_doy^2) + s_year + region + lat + lon + s_tilt + , s_radius + dir + (1 | observer) + (1 | year * catchment) + , s_doy:region + I(s_doy^2):region
The simulated model residuals suggest that the model generally over-predicts the number of birds, but tests for dispersion and zero-inflation suggest the model distribution (zero-inflated negative binomial 2) is a good choice. Adding more parameters to the model may improve the over-prediction problem. Note that one must interpret the residuals for a negative binomial distribution model with caution!
## $uniformity
##
## One-sample Kolmogorov-Smirnov test
##
## data: simulationOutput$scaledResiduals
## D = 0.23367, p-value < 2.2e-16
## alternative hypothesis: two-sided
##
##
## $dispersion
##
## DHARMa nonparametric dispersion test via sd of residuals fitted vs.
## simulated
##
## data: simulationOutput
## dispersion = 0.49755, p-value = 0.328
## alternative hypothesis: two.sided
##
##
## $outliers
##
## DHARMa outlier test based on exact binomial test with approximate
## expectations
##
## data: simulationOutput
## outliers at both margin(s) = 14, observations = 1721, p-value = 0.8917
## alternative hypothesis: true probability of success is not equal to 0.007968127
## 95 percent confidence interval:
## 0.004454303 0.013611189
## sample estimates:
## frequency of outliers (expected: 0.00796812749003984 )
## 0.008134805
## $uniformity
##
## One-sample Kolmogorov-Smirnov test
##
## data: simulationOutput$scaledResiduals
## D = 0.23367, p-value < 2.2e-16
## alternative hypothesis: two-sided
##
##
## $dispersion
##
## DHARMa nonparametric dispersion test via sd of residuals fitted vs.
## simulated
##
## data: simulationOutput
## dispersion = 0.49755, p-value = 0.328
## alternative hypothesis: two.sided
##
##
## $outliers
##
## DHARMa outlier test based on exact binomial test with approximate
## expectations
##
## data: simulationOutput
## outliers at both margin(s) = 14, observations = 1721, p-value = 0.8917
## alternative hypothesis: true probability of success is not equal to 0.007968127
## 95 percent confidence interval:
## 0.004454303 0.013611189
## sample estimates:
## frequency of outliers (expected: 0.00796812749003984 )
## 0.008134805
Plots exclude effects of catchment and observer as they have hundreds and dozens of levels, respectively.
The year will be fixed at 2021, while all other continuous variables will be held at their mean.
# Note because we scaled year we need to actually get the scaled equivalent of year = 2021
# Grab every unique catchment, observer, direction combo
# Hold all other numeric values constant
p2021 <- unique(s[,c("region", "catchment", "observer", "dir")])
p2021$year <- unique(s[["year"]][s$year == 2021]) # do this to keep the same factor level
p2021$s_year <- unique(s[["s_year"]][s$year == 2021])
p2021$doy <- mean(s$doy)
p2021$s_doy <- mean(s$s_doy)
p2021$tilt <- round(mean(s$tilt))
p2021$s_tilt <- mean(s$s_tilt)
p2021$radius <- round(mean(s$radius), 2)
p2021$s_radius <- mean(s$s_radius)
# Grab mean lat/long for each survey from the stn df
stn$lon <- st_coordinates(stn)[,1]
stn$lat <- st_coordinates(stn)[,2]
p2021 <- merge(p2021, st_drop_geometry(stn), by.x = "catchment", by.y = "new_name")
p2021$predicted_mamu <- predict(top_model, newdata = p2021, type = "response")
In all the following plots, the predicted number of MAMU derived from the final model has been appended to the observed data and plotted in red. All existing combinations of factors were used for the 2021 data, while all other continuous variables were held at their mean.
Data are likely a good fit for a zero-inflated model set.