This document describes the occupancy analysis of three pine woods/grassland bird species (Bachman’s Sparrow, Brown-headed Nuthatch, and Northern Bobwhite) on Carolina Sandhills NWR.
The specific objectives of this document are:
To document the data management and analysis process to promote the more autonomous completion of future similar analyses;
To convert an existing occupancy analysis based on the the program PRESENCE into a comparable analysis in R using the unmarked package; and
To explore potential amendments or extensions to the existing analysis to the extent of including multiple variables in a single-species, single-season occupancy model. Multi-year (dynamic) or community occupancy models will be covered in another document.
We demonstrated in a previous document the procedures of loading, evaluating, modifying, and finalizing the occupancy data for analysis. We use the results of this previous exercise as the basis for the occupancy analysis in this document.
The following code imports some custom functions and aliases (abbreviations for existing functions) that we will use. It also installs (if necessary; administrative privileges NOT required!!) and loads the R packages needed to perform the analyses in this document.
# Load some useful aliases and functions
# NOTE: requires an active internet connection
# install.packages("devtools") # Install your first time through
require(devtools)
source_gist(9216051) # Rprofile.R
source_gist(9216061) # various.R
# Loading required packages
toLoad = c("unmarked", "lubridate", "plyr", "dplyr", "AICcmodavg",
"car", "ggplot2", "grid")
instant_pkgs(toLoad)
# Load a modification of an`unmarked::modSel` function and custom plotting function
# newmodSel
source_gist("https://gist.github.com/adamdsmith/c5726c1bd446a518078d448db854ce74")
# plotOccu
source_gist("https://gist.github.com/adamdsmith/99f7f4fddc82a05690d158fa6e50091a")
We can easily load the previously saved data for use in the current session. The verbose = TRUE option requests that R tell us what it loaded — handy.
load("./Data/occupancy_2012.Rda", verbose = TRUE)
## Loading objects:
## BHNU
## BACS
## NOBO
## occ2012
## covar_meds
## covar_mads
## veg
## fires
We’ll use Bachmans’ Sparrow (BACS) as our example throughout.
We can get a naive estimate of BACS occupancy (i.e., not adjusted for detection probability) simply by dividing the number of sites at which BACS were detected at least once into the total number of sites.
# Naive occupancy estimate
# Numerator sums the number of sites with at least 1 detected BACS
(BACS_naive <- sum(apply(BACS@y, 1, sum) > 0) / nrow(BACS@y))
## [1] 0.2769231
The naive estimate of BACS occupancy is 28% (i.e., 28% of sites had BACS present).
However, it’s possible that BACS sparrow were present at sites and simply went undetected. Because we multiple visits to a site over a reasonably short time frame (and can reasonably assume that BACS didn’t colonize or go extinct from a given site during the surveys), we can find a corrected estimate of overall occupancy (i.e., averaged over all sites) using the most simplistic of occupancy models incorporating detection - constant occupancy and detection probability among sites.
# In R, formulas are usually specified RESPONSE ~ VARIABLES
# In this case, `unmarked` already knows our response so we only have to specify
# the right hand side of the detection and occupancy formulas
BACS_null <- occu(~1 ~1, BACS) # Formulas specify detection and occupancy, respectively)
# In this simple case, back transformation can be achieved with a built-in function
# This won't always be the case...
p_det <- backTransform(BACS_null, "det") # Detection probability (observation process)
# Detection probability of ~ 50% per visit given presence
p_det
## Backtransformed linear combination(s) of Detection estimate(s)
##
## Estimate SE LinComb (Intercept)
## 0.504 0.084 0.0162 1
##
## Transformation: logistic
psi <- backTransform(BACS_null, "state") # Occupancy probability (state process)
# After accounting for detection, ~ 32% of sites estimated to have BACS present
psi
## Backtransformed linear combination(s) of Occupancy estimate(s)
##
## Estimate SE LinComb (Intercept)
## 0.315 0.067 -0.775 1
##
## Transformation: logistic
Given the assumptions of this model, we get an estimated detection probability of 50% if BACS was present, suggesting that over the course of three surveys the probability of not detecting BACS at a site, even if it was present, was ~ 12%. As a result, the estimate of overall detection probability should be higher than our naive estimate — and indeed it is at 32%.
Adding covariates to the either the observation (detection) or state (occupancy probability) process models is straightforward and usually one of our primary interests — one simply modifies the formula(e) to include covariates available in the data set. Coming up with the list of candidate models requires some careful thought if we’re going to use a model selection (AIC) approach. If we have clear (or at least reasonably clear) expectations for the variables that are most likely to effect the detection of a given species, we can generate a relatively small set of candidate models to compare. If we don’t know what variables might be important (suggesting an exploratory analysis) or our objective is to generate predictions, we might consider a different approach.
According to the 2012 report, BACS occupancy was related (independently?) to (1) pine basal area, (2) grass ground cover, (3) shrub cover, (4) the number of burns since 2000, and (5) canopy cover. We also fit (6) a “Cox” model in which Jim Cox (Tall Timbers) identified up to 5 important variables; a (8) “null” model representes the constant detection/occupancy model fitted above. A model with heterogeneous detection over time (i.e., probability changes with each visit) was not supported (but is briefly discussed below) so we use only a constant detection probability over time. Thus, we can explore the relative merits of these 7 models.
We use AICc (AIC corrected for small sample size) to compare the support for models in the candidate set. The model with the lowest AICc is the preferred model, but note that models within 2 to 5 AICc units of the best model (in the delta column below) still have some support. The AICcwt column indicates the probability that a given model is the best given the candidate models under consideration. nPars is the number of parameters estimated by a model; cumltvWt is the cumulative weight of models in the candidate set.
# Candidate models
BACS_pineba <- occu(~1 ~pine_ba, BACS)
BACS_grass <- occu(~1 ~grass, BACS)
BACS_shrubs <- occu(~1 ~shrubs, BACS)
BACS_fire <- occu(~1 ~burnfreq, BACS)
BACS_canopy <- occu(~1 ~canopy, BACS)
BACS_Cox <- occu(~1 ~burnfreq + shrubs + bare + grass + hard_ba, BACS)
# Model comparison
# First, create a list of model objects to compare
modList <- list(BACS_pineba, BACS_grass, BACS_shrubs, BACS_fire, BACS_canopy,
BACS_Cox, BACS_null)
# Second, assign them more descriptive names of the model fit
# p(...) indicates the detection (observation) model; all constant (.) in this case
# psi(...) indicates the occupancy (state) mode
names(modList) <- c("p(.)psi(pine_ba)", "p(.)psi(grass)", "p(.)psi(shrubs)",
"p(.)psi(fire)", "p(.)psi(canopy)", "p(.)psi(Cox)",
"p(.)psi(.)")
# Create an object unmarked can use for model comparison
fmList <- fitList(fits = modList)
# Apply model selection using the modified modSel function
# It calculates small-sample corrected AIC in addition to AIC
(ms <- newmodSel(fmList))
## nPars AICc delta AICcwt cumltvWt
## p(.)psi(Cox) 7 137.26 0.00 0.972 0.97
## p(.)psi(fire) 3 145.08 7.83 0.019 0.99
## p(.)psi(grass) 3 147.87 10.62 0.005 1.00
## p(.)psi(pine_ba) 3 149.63 12.38 0.002 1.00
## p(.)psi(.) 2 150.82 13.56 0.001 1.00
## p(.)psi(canopy) 3 151.66 14.41 0.001 1.00
## p(.)psi(shrubs) 3 152.56 15.31 0.000 1.00
Jim Cox’s model is the clear winner in this model set for Bachman’s Sparrow. However, that doesn’t necessarily mean (1) all the variables in the model are important or (2) that it’s even a good/appropriate model. We have to dig deeper.
First, we look at the estimated coefficients for the covariates in the model. This information is contained under the “Occupancy (logit-scale)” section in the summary output. We’re looking for variables with Estimate significantly different from zero. Let’s also look at the confidence intervals for these estimates. Large coefficient estimates and extremely wide confidence intervals are an indication that the model did not fit well for any of a variety of reasons. That doesn’t seem to be the case here.
summary(BACS_Cox)
##
## Call:
## occu(formula = ~1 ~ burnfreq + shrubs + bare + grass + hard_ba,
## data = BACS)
##
## Occupancy (logit-scale):
## Estimate SE z P(>|z|)
## (Intercept) -0.00555 0.625 -0.00888 0.99291
## burnfreq 2.53550 1.016 2.49637 0.01255
## shrubs -0.11524 0.501 -0.23007 0.81804
## bare -2.21733 0.819 -2.70600 0.00681
## grass 0.94896 0.575 1.64978 0.09899
## hard_ba -0.24752 0.228 -1.08744 0.27684
##
## Detection (logit-scale):
## Estimate SE z P(>|z|)
## -0.0937 0.317 -0.296 0.768
##
## AIC: 135.7815
## Number of sites: 65
## optim convergence code: 0
## optim iterations: 27
## Bootstrap iterations: 0
confint(BACS_Cox, type = "state")
## 0.025 0.975
## psi(Int) -1.2312305 1.2201225
## psi(burnfreq) 0.5448167 4.5261907
## psi(shrubs) -1.0969331 0.8664586
## psi(bare) -3.8233464 -0.6113093
## psi(grass) -0.1784157 2.0763344
## psi(hard_ba) -0.6936352 0.1985985
There’s suggestive evidence that bare ground is strongly negatively associated with BACS occupancy, whereas occupancy increases to some extent with increased burn frequency and possibly grass ground cover.
A goodness-of-fit test helps us to evaluate the overall model appropriateness. Specifically, we want to know if the patterns of detection histories observed in our sample differ substantially from the expected distribution of detection histories given the parameter estimates in the model. Basically, we bootstrap a large number of data sets (i.e., detection histories) based on the parameter estimates of our fitted model (i.e., BACS_Cox), run a χ2 test on the detection history distribution, and then compare the χ2 from the original model to those from the bootstrapped set. If our observed χ2 is unusually large (and P-value is small; i.e., < 0.05), it suggests our model doesn’t quite fit the data adequately and we need to adjust our inference of the model coefficients or, in extreme cases, change the model (e.g., maybe an important covariate is missing).
The bootstrap procedure takes some time to run, so be patient…
# Goodness of fit test (MacKenzie & Bailey 2004)
gof_boot <- mb.gof.test(BACS_Cox, nsim = 500, plot.hist=FALSE)
gof_boot
##
## MacKenzie and Bailey goodness-of-fit for single-season occupancy model
##
## Pearson chi-square table:
##
## Cohort Observed Expected Chi-square
## 000 0 47 46.43 0.01
## 001 0 3 2.83 0.01
## 010 0 5 2.83 1.66
## 011 0 1 2.58 0.97
## 100 0 2 2.83 0.24
## 110 0 2 2.58 0.13
## 111 0 5 2.35 3.00
##
## Chi-square statistic = 8.5946
## Number of bootstrap samples = 500
## P-value = 0.152
##
## Quantiles of bootstrapped statistics:
## 0% 25% 50% 75% 100%
## 0.26 3.03 4.72 7.37 18.79
##
## Estimate of c-hat = 1.61
In the case of the Cox BACS model, the χ2 observed from our model (8.59) is not excessively large (P = 0.15), suggesting a reasonably adequate fit to the model. Although not particularly relevant in this case, when model fit seems inadequate based on the χ2 test, the χ2 table can suggest those detection histories that were responsible for the lack of fit — look for larger values (> 3.8 or so). For example, the detection histories in which BACS were detected on all three surveys (history “111”) were slightly more common than expected (i.e., 5 observed vs. 2.35 expected; χ2 ~ 3.0).
The adequacy of the model fit can further be gauged by the value of the estimated overdispersion parameter (c-hat). From the documentation of the goodness-of-fit test function (i.e., ?gof_boot):
“…values of c-hat > 1 indicate overdispersion (variance > mean), but … values much higher than 1 (i.e., > 4) probably indicate lack-of-fit. In cases of moderate overdispersion, one usually multiplies the variance-covariance matrix of the estimates by c-hat. As a result, the SE’s of the estimates are inflated (c-hat is also known as a variance inflation factor).”
Inflating the standard errors and reevaluating the inference of covariates is straightforward to do, but it’s beyond the scope of this document. In the case of BACS, we simply focus on the effects of bare ground and burn frequency to BACS occupancy.
Before exploring the relationships in more detail, it is worth pointing out that we can also consider various influences on the detection probability. Typically we would explore potential changes in detectability over time or among observers. For example, to evaluate whether BACS detection probability changed among visits (using the Cox model for the occupancy model), we would specify the model below. Note that in this case detectability does not seem to change appreciably during the different survey periods as indicated be relatively little support for the heterogeneous detection model (delta > 3 and AICcwt ~ 0.17).
BACS_Cox_d <- occu(~visit ~burnfreq + shrubs + bare + grass + hard_ba, BACS)
# Model comparison
modList <- list(BACS_Cox, BACS_Cox_d)
names(modList) <- c("p(.)psi(Cox)", "p(visit)psi(Cox)")
fmList <- fitList(fits = modList)
# Model selection
(ms <- newmodSel(fmList))
## nPars AICc delta AICcwt cumltvWt
## p(.)psi(Cox) 7 137.26 0.00 0.829 0.83
## p(visit)psi(Cox) 9 140.41 3.15 0.171 1.00
We can now visualize the associations between habitat covariates (bare ground % and fire frequency) and BACS occupancy apparent in the data:
plotOccu(BACS_Cox, data = occ2012[["BACS"]],
plotVars = c("bare", "burnfreq"), # Covariates to plot, in order of desired plotting
varNames = c("% bare ground", "burns since 2000"), # Covariate labels, same order
modelVars = c("shrubs", "grass", "hard_ba"), output = FALSE) # Other covariates in model, not plotted
## NULL
These plots suggest that BACS occupancy is considerably reduced at sites with more bare ground or that have not been burned as frequently. The gray area around the fitted line indicates the 95% prediction interval (i.e., the amount of variability) around the illustrated relationship (black line). Note that these plots assume all other covariates in the model are at their median values, so make sure that this makes sense. For example, does it make sense to look at the effect of bare ground when hardwood basal area = 2.5, grass ground cover = 0.3, shrub cover = 0.275 and the site has been burned 3.5 times since 2000? If not, then we’ll need to consider modifications to the plots.
You can extract the median values of the variables in the model using:
extract_vars(BACS_Cox@formula, covar_meds)
## hard_ba grass bare shrubs burnfreq
## 2.500 0.300 0.275 0.275 3.500