Catchment and survey maps

Interactive map

Notes:

The VRI is lowres for display purposes only. All area calculations were done with VRI at a 10x10m resolution.
While the VRI has been most recently calculated in May 01 2023, the data represents habitat extent up to the year 2021.

2021 VRI static maps

MAMU vs catchment area

Only full-sized catchment areas (i.e., without a 500m coastal buffer) were used for this analysis to maintain consistency, as not all coastal catchments had a 500m buffer. See the summary table below for a comparison of catchment areas with vs. without a 500m buffer.

Summary table

Appendix: MAMU survey models

Survey data exploration

Data are likely a good fit for a zero-inflated model set.

Baseline radar survey trend models

First the dataset is prepared as needed to run models. A trend model is prepared to later standardize the number of incoming birds across the entire dataset to the same day of year (DOY) and radar tilt.

# Model dataframe will be s
# Response var is 'mamuinpd'
s <- data.frame(y = surveys$mamuinpd)

# Clean up any vars as needed
s$observer <- as.factor(surveys$observer)
s[["observer"]][is.na(s$observer)] <- "Unknown" # 15 records have NA observer. Give them value of "Unknown" to match other unknown observer records.

# Make factors
s$region <- as.factor(surveys$region)
s$catchment <- as.factor(surveys$new_name)
s$year <- as.factor(surveys$year)
s$dir <- as.factor(surveys$dir)

# Re-scale numerics
s$s_year <- scale(as.numeric(s$year))[,1]

s$tilt <- as.numeric(surveys$tilt)
s$s_tilt <- scale(s$tilt)[,1]

s$radius <- as.numeric(surveys$radius)
s$s_radius <- scale(s$radius)[,1]

s$doy <- as.numeric(surveys$doy)
s$s_doy <- scale(s$doy)[,1]

# Chuck in lat/lon for good measure, in case of spatial autocorrelation
s$lat <- surveys$lat
s$s_lat <- scale(s$lat)[,1]
s$lon <- surveys$lon
s$s_lon <- scale(s$lon)[,1]

# Keep only complete cases
s <- s[complete.cases(s),]

With the model data (s) prepared, a series of baseline models are tested.

Baseline assumptions: we assume the number of birds will vary depending on the observer and the year of survey. We are not interested in these variations and thus throw in a random effect structure to account for this. And while it looks like zero inflation models would be good candidate models, we first test no zero inflation vs. two differing types of zero inflation. Finally, we test three different response distributions: Poisson and two different types of negative binomial.

## BASE ZI MODELS

# WITHOUT ZERO-INFLATION
base0_p <- glmmTMB(y ~ s_doy + I(s_doy^2) + s_year + region + lat + lon + s_tilt + s_radius + dir + (1|observer) + (1|year*catchment),
                 data = s,
                 ziformula = ~0, # NO zero-inflation
                 family = poisson)
base0_nb1 <- update(base0_p, family = nbinom1)
base0_nb2 <- update(base0_p, family = nbinom2)

# WITH ZERO-INFLATION
# Same zero-inflation parameter fitted across all observations
base1_p <- glmmTMB(y ~ s_doy + I(s_doy^2) + s_year + region + lat + lon + s_tilt + s_radius + dir + (1|observer) + (1|year*catchment),
                 data = s,
                 ziformula = ~1, # YES zero-inflation
                 family = poisson)
base1_nb1 <- update(base1_p, family = nbinom1)
base1_nb2 <- update(base1_p, family = nbinom2)

# WITH CONDITIONAL ZERO-INFLATION
# ZI formula is set to be identical to conditional formula
# Note these take awhile to fit...
# ... and all fail to converge, indicating ZI formula is overly complex.
# base2_p <- glmmTMB(y ~ s_doy + I(s_doy^2) + s_year + region + lat + lon + s_tilt + s_radius + dir + (1|observer) + (1|year*catchment),
#                  data = s,
#                  ziformula = ~., # YES zero-inflation
#                  family = poisson) # fails
#base2_nb1 <- update(base2_p, family = nbinom1) # fails, takes forever to run
#base2_nb2 <- update(base2_p, family = nbinom2) # also fails

# COMPARE
# Output indicates that negative binomial 2 distribution
# with zero inflation is best fit.
bbmle::AICtab(base0_p, base0_nb1, base0_nb2, 
              base1_p, base1_nb1, base1_nb2)

##           dAIC    df
## base0_nb2     0.0 19
## base1_nb2     1.2 20
## base0_nb1   110.7 19
## base1_nb1   112.7 20
## base1_p   23145.6 19
## base0_p   23215.8 18

# Choose our base model
zinb0 <- base1_nb2

Adding fixed effects

The baseline model indicates a zero-inflated negative binomial 2 distribution is the best fit for the response data. Now, add all terms in the full model to test if other variables in the survey dataset are influential in MAMU counts.

# ## BASELINE MODEL ##
# All sites have same trend, but trend varies by observer and year
# This is our baseline zinb0, i.e.:
# glmmTMB(formula = y ~ s_doy + I(s_doy^2) + s_year + region + 
#           + lat + lon + s_tilt + s_radius + dir
#           + (1 | observer) + (1 | year * catchment), 
#         data = surveys, 
#         family = nbinom2, 
#         ziformula = ~1, 
#         dispformula = ~1)

# +++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++ #
# ## INTERACTION TERMS ##

# Slope for each doy varies by catchment
# FAILS: by far too many combinations. 
# Not enough points per catchment to be meaningful.
#zinbx <- update(zinb0, formula = ~. + (s_doy + I(s_doy^2)) * catchment)

# Slope for each year varies by catchment
# FAILS: by far too many combinations
# Not enough points per catchment to be meaningful.
#zinbx <- update(zinb0, formula = ~. + s_year * catchment)

# Slope for each year varies by region
zinb1 <- update(zinb0, formula = ~. + s_year * region)

# Slope for doy varies by year
zinb2 <- update(zinb0, formula = ~. + s_doy:s_year + I(s_doy^2):s_year)

# Slope for doy varies by region
zinb3 <- update(zinb0, formula = ~. + s_doy:region  + I(s_doy^2):region)

# Adding in a DOY:region interaction improves model fit
anova(zinb0, zinb1, zinb2, zinb3)

## Data: s
## Models:
## zinb0: y ~ s_doy + I(s_doy^2) + s_year + region + lat + lon + s_tilt + , zi=~1, disp=~1
## zinb0:     s_radius + dir + (1 | observer) + (1 | year * catchment), zi=~1, disp=~1
## zinb2: y ~ s_doy + I(s_doy^2) + s_year + region + lat + lon + s_tilt + , zi=~1, disp=~1
## zinb2:     s_radius + dir + (1 | observer) + (1 | year * catchment) + , zi=~1, disp=~1
## zinb2:     s_doy:s_year + I(s_doy^2):s_year, zi=~1, disp=~1
## zinb1: y ~ s_doy + I(s_doy^2) + s_year + region + lat + lon + s_tilt + , zi=~1, disp=~1
## zinb1:     s_radius + dir + (1 | observer) + (1 | year * catchment) + , zi=~1, disp=~1
## zinb1:     s_year:region, zi=~1, disp=~1
## zinb3: y ~ s_doy + I(s_doy^2) + s_year + region + lat + lon + s_tilt + , zi=~1, disp=~1
## zinb3:     s_radius + dir + (1 | observer) + (1 | year * catchment) + , zi=~1, disp=~1
## zinb3:     s_doy:region + I(s_doy^2):region, zi=~1, disp=~1
##       Df   AIC   BIC  logLik deviance  Chisq Chi Df Pr(>Chisq)    
## zinb0 20 18460 18569 -9210.2    18420                             
## zinb2 22                                          2               
## zinb1 25 18448 18585 -9199.2    18398             3               
## zinb3 30 18430 18594 -9185.0    18370 28.319      5  3.153e-05 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

It seems varying the effects of DOY by region has the best fit. This makes sense - seasonality is different between the Pacific side of the island and the relatively sheltered mainland coasts.

Add a random effect structure

## RANDOM EFFECT STRUCTURES

# ## BASELINE MODEL ##
# All sites have same trend, but trend varies by observer and year
# This is our baseline zinb3, i.e.:
# glmmTMB(formula = y ~ s_doy + I(s_doy^2) + region
#           + s_doy:region + I(s_doy^2):region
#           + lat + lon + s_tilt + s_radius + dir
#           + (1 | observer) + (1 | year * catchment), 
#         data = surveys, 
#         family = nbinom2, 
#         ziformula = ~1, 
#         dispformula = ~1)

# +++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++ #
# ## YEAR, OBSERVER RANDOM EFFECTS ##
# Let us double check our assumption that trend varies by observer and year....
# Trend intercept varies by year/catchment combo only
zinb_y <- update(zinb3, formula = ~. -(1|observer))

# Trend intercept varies by observer only
zinb_o <- update(zinb3, formula = ~. -(1|year * catchment))

# Model comparison indicates both observer and year REs are 
# highly significant and should be retained.
anova(zinb3, zinb_y, zinb_o)

## Data: s
## Models:
## zinb_o: y ~ s_doy + I(s_doy^2) + s_year + region + lat + lon + s_tilt + , zi=~1, disp=~1
## zinb_o:     s_radius + dir + (1 | observer) + s_doy:region + I(s_doy^2):region, zi=~1, disp=~1
## zinb_y: y ~ s_doy + I(s_doy^2) + s_year + region + lat + lon + s_tilt + , zi=~1, disp=~1
## zinb_y:     s_radius + dir + (1 | year * catchment) + s_doy:region + , zi=~1, disp=~1
## zinb_y:     I(s_doy^2):region, zi=~1, disp=~1
## zinb3: y ~ s_doy + I(s_doy^2) + s_year + region + lat + lon + s_tilt + , zi=~1, disp=~1
## zinb3:     s_radius + dir + (1 | observer) + (1 | year * catchment) + , zi=~1, disp=~1
## zinb3:     s_doy:region + I(s_doy^2):region, zi=~1, disp=~1
##        Df   AIC   BIC  logLik deviance  Chisq Chi Df Pr(>Chisq)    
## zinb_o 27 19429 19576 -9687.6    19375                             
## zinb_y 29 18474 18632 -9207.9    18416 959.49      2  < 2.2e-16 ***
## zinb3  30 18430 18594 -9185.0    18370  45.66      1  1.406e-11 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

# +++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++ #
# ## SPATIAL RANDOM EFFECTS ##

# INTERCEPT-ONLY
# Intercept varies by year and catchment, seperately
zinb_c <- update(zinb3, formula = ~. - (1|year*catchment) + (1|year) + (1|catchment))

# Intercept varies by year and catchment within region, seperately
zinb_cr <- update(zinb3, formula = ~. - (1|year*catchment) + (1|year) + (1|region/catchment))

# Intercept varies by year * region / catchment
zinb_ycr <- update(zinb3, formula = ~. -(1|year*catchment) + (1|year*region/catchment))

# SLOPE-INTERCEPT
# Slope and intercept for year varies by catchment
zinb_yc <- update(zinb3, formula = ~. - (1|year*catchment) + (s_year|catchment))

# Slope and intercept for year varies by region
zinb_yrc <- update(zinb3, formula = ~. - (1|year*catchment) + (s_year|region/catchment))

# Slope and intercept for doy varies by catchment
zinb_dc <- update(zinb3, formula = ~. + (s_doy|catchment) + (I(s_doy^2)|catchment))

# Slope and intercept for doy varies by region
zinb_drc <- update(zinb3, formula = ~. + (s_doy|region/catchment) +  + (I(s_doy^2)|region/catchment))

# Compare
anova(zinb3, zinb_c, zinb_cr, zinb_ycr, zinb_yc, zinb_yrc, zinb_dc, zinb_drc)
bbmle::AICtab(zinb3, zinb_c, zinb_cr, zinb_ycr, zinb_yc, zinb_yrc, zinb_dc, zinb_drc)

## Data: s
## Models:
## zinb_c: y ~ s_doy + I(s_doy^2) + s_year + region + lat + lon + s_tilt + , zi=~1, disp=~1
## zinb_c:     s_radius + dir + (1 | observer) + (1 | year) + (1 | catchment) + , zi=~1, disp=~1
## zinb_c:     s_doy:region + I(s_doy^2):region, zi=~1, disp=~1
## zinb3: y ~ s_doy + I(s_doy^2) + s_year + region + lat + lon + s_tilt + , zi=~1, disp=~1
## zinb3:     s_radius + dir + (1 | observer) + (1 | year * catchment) + , zi=~1, disp=~1
## zinb3:     s_doy:region + I(s_doy^2):region, zi=~1, disp=~1
## zinb_cr: y ~ s_doy + I(s_doy^2) + s_year + region + lat + lon + s_tilt + , zi=~1, disp=~1
## zinb_cr:     s_radius + dir + (1 | observer) + (1 | year) + (1 | region/catchment) + , zi=~1, disp=~1
## zinb_cr:     s_doy:region + I(s_doy^2):region, zi=~1, disp=~1
## zinb_yc: y ~ s_doy + I(s_doy^2) + s_year + region + lat + lon + s_tilt + , zi=~1, disp=~1
## zinb_yc:     s_radius + dir + (1 | observer) + (s_year | catchment) + , zi=~1, disp=~1
## zinb_yc:     s_doy:region + I(s_doy^2):region, zi=~1, disp=~1
## zinb_ycr: y ~ s_doy + I(s_doy^2) + s_year + region + lat + lon + s_tilt + , zi=~1, disp=~1
## zinb_ycr:     s_radius + dir + (1 | observer) + (1 | year * region/catchment) + , zi=~1, disp=~1
## zinb_ycr:     s_doy:region + I(s_doy^2):region, zi=~1, disp=~1
## zinb_yrc: y ~ s_doy + I(s_doy^2) + s_year + region + lat + lon + s_tilt + , zi=~1, disp=~1
## zinb_yrc:     s_radius + dir + (1 | observer) + (s_year | region/catchment) + , zi=~1, disp=~1
## zinb_yrc:     s_doy:region + I(s_doy^2):region, zi=~1, disp=~1
## zinb_dc: y ~ s_doy + I(s_doy^2) + s_year + region + lat + lon + s_tilt + , zi=~1, disp=~1
## zinb_dc:     s_radius + dir + (1 | observer) + (1 | year * catchment) + , zi=~1, disp=~1
## zinb_dc:     (s_doy | catchment) + (I(s_doy^2) | catchment) + s_doy:region + , zi=~1, disp=~1
## zinb_dc:     I(s_doy^2):region, zi=~1, disp=~1
## zinb_drc: y ~ s_doy + I(s_doy^2) + s_year + region + lat + lon + s_tilt + , zi=~1, disp=~1
## zinb_drc:     s_radius + dir + (1 | observer) + (1 | year * catchment) + , zi=~1, disp=~1
## zinb_drc:     (s_doy | region/catchment) + (I(s_doy^2) | region/catchment) + , zi=~1, disp=~1
## zinb_drc:     s_doy:region + I(s_doy^2):region, zi=~1, disp=~1
##          Df   AIC   BIC  logLik deviance  Chisq Chi Df Pr(>Chisq)    
## zinb_c   29 18572 18730 -9257.0    18514                             
## zinb3    30 18430 18594 -9185.0    18370 143.85      1     <2e-16 ***
## zinb_cr  30 18572 18736 -9256.2    18512   0.00      0          1    
## zinb_yc  30 18577 18740 -9258.3    18517   0.00      0          1    
## zinb_ycr 31 18683 18852 -9310.3    18621   0.00      1          1    
## zinb_yrc 33                                          2               
## zinb_dc  36                                          3               
## zinb_drc 42                                          6               
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

##          dAIC  df
## zinb3      0.0 30
## zinb_c   141.8 29
## zinb_cr  142.3 30
## zinb_yc  146.6 30
## zinb_ycr 252.6 31
## zinb_yrc    NA 33
## zinb_dc     NA 36
## zinb_drc    NA 42

Assess final model

While the lowest AIC model includes the random effect of catchment nested within region, the model actually fails to converge and does not produce confidence intervals for our estimates. The second-lowest AIC model was selected instead. The final model includes quadratic effect of DOY, year, tilt, and catchment as fixed effects, with observer and year as random intercepts and a random slope-intercept of DOY by catchment:

y ~ s_doy + I(s_doy^2) + s_year + region + lat + lon + s_tilt + , s_radius + dir + (1 | observer) + (1 | year * catchment) + , s_doy:region + I(s_doy^2):region

The simulated model residuals suggest that the model generally over-predicts the number of birds, but tests for dispersion and zero-inflation suggest the model distribution (zero-inflated negative binomial 2) is a good choice. Adding more parameters to the model may improve the over-prediction problem. Note that one must interpret the residuals for a negative binomial distribution model with caution!

## $uniformity
## 
##  One-sample Kolmogorov-Smirnov test
## 
## data:  simulationOutput$scaledResiduals
## D = 0.23367, p-value < 2.2e-16
## alternative hypothesis: two-sided
## 
## 
## $dispersion
## 
##  DHARMa nonparametric dispersion test via sd of residuals fitted vs.
##  simulated
## 
## data:  simulationOutput
## dispersion = 0.49755, p-value = 0.328
## alternative hypothesis: two.sided
## 
## 
## $outliers
## 
##  DHARMa outlier test based on exact binomial test with approximate
##  expectations
## 
## data:  simulationOutput
## outliers at both margin(s) = 14, observations = 1721, p-value = 0.8917
## alternative hypothesis: true probability of success is not equal to 0.007968127
## 95 percent confidence interval:
##  0.004454303 0.013611189
## sample estimates:
## frequency of outliers (expected: 0.00796812749003984 ) 
##                                            0.008134805

## $uniformity
## 
##  One-sample Kolmogorov-Smirnov test
## 
## data:  simulationOutput$scaledResiduals
## D = 0.23367, p-value < 2.2e-16
## alternative hypothesis: two-sided
## 
## 
## $dispersion
## 
##  DHARMa nonparametric dispersion test via sd of residuals fitted vs.
##  simulated
## 
## data:  simulationOutput
## dispersion = 0.49755, p-value = 0.328
## alternative hypothesis: two.sided
## 
## 
## $outliers
## 
##  DHARMa outlier test based on exact binomial test with approximate
##  expectations
## 
## data:  simulationOutput
## outliers at both margin(s) = 14, observations = 1721, p-value = 0.8917
## alternative hypothesis: true probability of success is not equal to 0.007968127
## 95 percent confidence interval:
##  0.004454303 0.013611189
## sample estimates:
## frequency of outliers (expected: 0.00796812749003984 ) 
##                                            0.008134805

Effect plots

Plots exclude effects of catchment and observer as they have hundreds and dozens of levels, respectively.

Predict the population at each catchment

The year will be fixed at 2021, while all other continuous variables will be held at their mean.

# Note because we scaled year we need to actually get the scaled equivalent of year = 2021
# Grab every unique catchment, observer, direction combo
# Hold all other numeric values constant
p2021 <- unique(s[,c("region", "catchment", "observer", "dir")])
p2021$year <- unique(s[["year"]][s$year == 2021]) # do this to keep the same factor level
p2021$s_year <- unique(s[["s_year"]][s$year == 2021])
p2021$doy <- mean(s$doy)
p2021$s_doy <- mean(s$s_doy)
p2021$tilt <- round(mean(s$tilt))
p2021$s_tilt <- mean(s$s_tilt)
p2021$radius <- round(mean(s$radius), 2)
p2021$s_radius <- mean(s$s_radius)

# Grab mean lat/long for each survey from the stn df
stn$lon <- st_coordinates(stn)[,1]
stn$lat <- st_coordinates(stn)[,2]
p2021 <- merge(p2021, st_drop_geometry(stn), by.x = "catchment", by.y = "new_name")

p2021$predicted_mamu <- predict(top_model, newdata = p2021, type = "response")

2021 predicted data vs observed data

In all the following plots, the predicted number of MAMU derived from the final model has been appended to the observed data and plotted in red. All existing combinations of factors were used for the 2021 data, while all other continuous variables were held at their mean.