This is an R Markdown document, demonstrating how to run a multi-season occupancy model using the function colext in the unmarked package in R (Ian Fiske and Richard Chandler. 2011. {unmarked}: An {R} Package for Fitting Hierarchical Models of Wildlife Occurrence and Abundance. Journal of Statistical Software 43(10): 1-23).The data used for analyses in this document come from my PhD study at the University of Manitoba, which I used to model how different prairie animals and plants were affected by local-scale vegetation of transmission lines along an urbanization gradient, and how those animals and plants’ numbers were affected by surrounding land use. I was interested at the time in figuring out how and where sections of transmission line between roads might be managed intentionally as alternative habitats for prairie wildlife like butterflies, beetles, birds and plants. I was interested in this question for a couple of reasons. First, I am interested in managing urban landscapes wherever possible to support more biodiversity. Second, as native grasslands like tall-grass prairie are so heavily reduced, critically endangered, and underprotected in areas outside of cities in Manitoba, underused urban grassy spaces along transmission lines and other roadsides might be one of the best available conservation options in Manitoba for restoring patches of tall-grass prairie. So I used wildlife surveys along 48 transmission line sections across 3 years to understand how and where such spaces might be managed for prairie birds and other wildlife.
Urban transmission lines are usually mowed and sprayed frequently to keep them neat and relatively free of weeds while rural transmission lines may be infrequently mowed and sprayed, or used as haylands, or left completely unmanaged in forested or pastured or cropland landscapes except to keep trees from growing into overhead lines. I hypothesized that frequent mowing and spraying would reduce habitat suitability along urban transmission lines for many (but not necessarily all) species. I also hypothesized that large amounts of surrounding urban land (and for prairie birds, wooded lands as well) might reduce settlement by many prairie species, which could negate efforts to improve urban transmission lines as habitat for many species. Grassland amount might also matter, with larger habitats supporting more species: however, I thought that variation in how grasslands were managed might matter more than grassland amount, because management would affect grassland suitability for species. Research from my PhD studies have been published in a couple of papers so far: Urban rights-of-way as reservoirs for tall-grass prairie plants and butterflies (http://dx.doi.org/10.1007/s00267-015-0631-9); Urban rights-of-way as extensive butterfly habitats: A case study from Winnipeg, Canada (http://dx.doi.org/10.1016/j.landurbplan.2016.05.026). I have at least a few more papers to go.
Why might you ask would I be interested in running multi-season occupancy models? Because for many species during wildlife surveys, detection probability during surveys < 1 and because of varying detection probability among sites. Detection probability can vary due to differences in habitat concealment or impacts on sound propagation for aurally detected species like birds. Detection probability also varies with behavior of a species that makes it likely to be detected. In the case of birds that are detected mostly aurally rather than visually, this behavior can be singing or calling rate, which may vary with time of day or time of season. We say that the availability of a bird for detection varies with time of season and time of day. Observer experience also influences whether or not an observer detects a species, and we can say that availability of a bird for detection is greater when a more experienced observer is conducting the survey.
Finally, detection probability should be higher if a species is more abundant at a site. More individuals could mean more conspecific interactions or greater encounter rates between observers and individual organisms. Site features that enhance abundance of a species could enhance detection of that species.
There are already numerous recently developed techniques for accounting for different kinds of detection probability, but hierarchical distance sampling can account for all of the different kinds of detection influences in a single analysis. Distance sampling methods are necessary for estimating densities of organisms at sites, and can model how effective detection radius varies in different kinds of vegetation or different kinds of noisy environments, but cannot model availability for detection. Removal sampling or time-to-first-detection methods can model availability for detection but cannot be used to predict abundance or density at sites. Mixture models can be used to predict relative abundance at sites and probability that a species is perceived by observers if it is present at a site, but do not distinguish availability for detection and detection probability with distance. Mixture models cannot be used to estimate densities at sites. Occupancy models are similar to mixture models but measure whether or not a species is detected at a site rather than species abundance. Removal sampling, mixture modelling, and occupancy modelling in its basic form all assume population closure, that individual animals are not moving in and out of the survey area during the survey period. This assumption is likely to be violated in several ways. If surveys (e.g. point counts) at a site are too long, new individuals may enter the survey area before survey periods are over. Mixture and occupancy models are conducted over several visits within a breeding season, and the assumption is that individuals at a site do not enter or exit the site over the season. But new individuals may be born and previous individuals may die if the season over which visits are conducted is too long. Across breeding seasons, occupancy models can be used if we account for the probability that new individuals that were not at a site in the previous season colonize a site, and for the probability that previously present individuals are no longer present in the next season. We call these modified occupancy models multi-season occupancy models. Such models measure probability of inital occupancy, probability of occupancy in the next time period (given initial occupancy, colonization, and extinction), and probability of detection given occupancy at time T.
Multi-season occupancy models cannot be used to estimate abundance or density of a species (since to estimate density, we need distances estimated to individuals and we can only model individuals as detected or not). However, multi-season occupancy models can be used on multiple years of data in one analysis, or with multiple visits from one season, with each visit divided into intervals to conduct removal sampling.
This Markdown document and script is a simple formatting syntax for authoring HTML, PDF, and MS Word documents, that will show how I run such models. For more details on using R Markdown see http://rmarkdown.rstudio.com. Grey boxes in the Markdown document show up as either code or analysis results in the Knit HTML output document.
When you click the Knit button a document will be generated that includes both content as well as the output of any embedded R code chunks within the document. You can embed an R code chunk like this:
library(unmarked)
CCSP.MSOM<-read.csv(file="PhDexample_CCSP_justsingingmales_occupancy_2yearsonly.csv", header=TRUE)
The code above starts by loading the unmarked package into RStudio along with its dependencies, and reads into R a csv file located in the same working directory as this RMarkdown document.For simplicity’s sake the CSV file has already been formatted so that it contains all data necessary for hierarchical distance sampling, but the necessary data could be formatted from separate files read into R instead.
The first column in the CSV file contains the site names, which are not necessary for this analysis. The next 6 columns contain detection/non-detection data for Clay-colored Sparrows from 2 years of surveys: 3 visits from the first year in 2008 and 3 visits from the second year in 2009. Note that two sites (Fairview and Gros Iles) were not visited in 2009, so values for Clay-colored Sparrows from those sites in 2009 are “NA”. These values are stored in y.
In the next line, breaks stores the distances bracketing the 4 distance categories per visit, enabling us to calculate densities and abundance later on, as does the next line with transect.length (each site is 500 m long and we are counting birds out to 20 m from either side of the transect line). R corresponds to the number of sites and T the number of visits.
Columns 8-11 contain site variables associated with vegetation management within sites: mowing frequency (x/year), mowing frequency centered (prior to modelling it as a quadratic function), the quadratic term for mowing frequency, and whether or not a site was hayed in 2007 (yes=1, no=0).
Columns 12-15 are land use variables within 100 m of each site, the percentages of land within 100 m consisting of those land uses: roads100 (major roads only), urban100 (major roads, built up lands including houses, larger buildings, parking lots, and smaller streets), grassland100 (mowed, unmowed, pastured or hayed), and wooded100 (forests+shrublands).
Columns 16-20 represent vegetation structural metrics within study sites that are potentially affected by mowing regime and which represent potential nesting habitat for grassland birds: BAREGROUND (bare ground cover), litcov (litter cover), htlow (vegetation height-density, the height up to which >50% of a density pole’s intervals are concealed from view 4 m away), woodycov (woody plant stem cover), and grasscov (live grass stem cover).
Columns 8-20 are the site covariates. There is 1 column per variable. These will be stored in site.cov.
# Organize distance data and covariates into an unmarkedFrame object
y<-CCSP.MSOM[,2:7] #columns 2-7 contain observations of occupancy detected or not, in 2008 and 2009
S <- nrow(CCSP.MSOM) # number of sites, which = 47
J <- 3 # number of secondary sampling occasions
T <- 2 # number of primary periods
#site level covariates
site.cov<-data.frame(mowed=CCSP.MSOM[,8], mowed.c=CCSP.MSOM[,9], mowsquare=CCSP.MSOM[,10], hayed=CCSP.MSOM[,11], roads100=CCSP.MSOM[,12], urban100=CCSP.MSOM[,13], grass100=CCSP.MSOM[,14], wooded100=CCSP.MSOM[,15], bareground=CCSP.MSOM[,16], litcov=CCSP.MSOM[,17], htlow=CCSP.MSOM[,18], woodycov=CCSP.MSOM[,19], grasscov=CCSP.MSOM[,20])
Columns 21-26 contain the Julian dates of each visit. Note 6 columns=3 visits and 2 years, 3 columns per year, stored as a matrix in Julian.m.
Columns 27-32 contain the time period in the morning of each visit. Note 6 columns=3 visits and 2 years, 3 columns per year, with each column converted to a factor, with all 3 factors stored in the data frame Time.df.
Columns 33-38 contain the observer conducting the survey during each visit. Note 6 columns=3 visits and 2 years, 3 columns per year, with each column converted to a factor, with all 3 factors stored in the data frame Obs.df.
Julian date, time of survey, and observer are observation covariates. They can vary with each visit, affecting the availability of Savannah Sparrows for detection during each visit. The first column for each variable here corresponds to the first visit in the first year, second column for each variable corresponds to the second visit in the first year, and so on……Remember the sites that were not visited in year 2 (Fairview, Gros Iles)? They had “NA” values for Savannah Sparrow counts for the three visits in 2009 and they also have “NA” for Julian date, time of survey, and observer for for the three visits in 2009. It is important that the “NA”s match up or there will be an error when creating the Unmarked GDS frame.
Finally columns 39-40 indicate the mowing frequency at each site in 2008 and 2009. This is different from the mowing frequency column 8, which represents the initial mowing frequency at each site in 2009. Mowing regime was changed at 8 sites over the 2 years of analysis (increased at 3 sites, halted at 5 sites). Columns 39-40 will be used as yearly covariates to see if a change in mowing affects the likelihood of colonization or decolonization of a site from 2008 to 2009.
#second.cov (observation covariates), often but not always varying among visits within season. For some reason they need to be made into matrices before incorporation into UMF frame
Julian.m<-as.matrix(CCSP.MSOM[,21:26]) #detectability may vary with Julian date
Timeclass.27<-CCSP.MSOM[,27]#time of day (1=5-6AM, 2=6-7AM, 3=7-8AM, 4=8-9AM, 5=9-10AM)
Timeclass.f27<-as.factor(Timeclass.27)
Timeclass.28<-CCSP.MSOM[,28]#time of day (1=5-6AM, 2=6-7AM, 3=7-8AM, 4=8-9AM, 5=9-10AM)
Timeclass.f28<-as.factor(Timeclass.28)
Timeclass.29<-CCSP.MSOM[,29]#time of day (1=5-6AM, 2=6-7AM, 3=7-8AM, 4=8-9AM, 5=9-10AM)
Timeclass.f29<-as.factor(Timeclass.29)
Timeclass.30<-CCSP.MSOM[,30]#time of day (1=5-6AM, 2=6-7AM, 3=7-8AM, 4=8-9AM, 5=9-10AM)
Timeclass.f30<-as.factor(Timeclass.30)
Timeclass.31<-CCSP.MSOM[,31]#time of day (1=5-6AM, 2=6-7AM, 3=7-8AM, 4=8-9AM, 5=9-10AM)
Timeclass.f31<-as.factor(Timeclass.31)
Timeclass.32<-CCSP.MSOM[,32]#time of day (1=5-6AM, 2=6-7AM, 3=7-8AM, 4=8-9AM, 5=9-10AM)
Timeclass.f32<-as.factor(Timeclass.32)
Time.df<-data.frame(Timeclass.f27,Timeclass.f28,Timeclass.f29,Timeclass.f30,Timeclass.f31,Timeclass.f32)
Obs.33<-CCSP.MSOM[,33]
Obs.f33<-as.factor(Obs.33)
Obs.34<-CCSP.MSOM[,34]
Obs.f34<-as.factor(Obs.34)
Obs.35<-CCSP.MSOM[,35]
Obs.f35<-as.factor(Obs.35)
Obs.36<-CCSP.MSOM[,36]
Obs.f36<-as.factor(Obs.36)
Obs.37<-CCSP.MSOM[,37]
Obs.f37<-as.factor(Obs.37)
Obs.38<-CCSP.MSOM[,38]
Obs.f38<-as.factor(Obs.38)
Obs.df<-data.frame(Obs.f33,Obs.f34,Obs.f35,Obs.f36,Obs.f37,Obs.f38)
#seasonal effects - mowing regime change at treatment sites
yearly.cov<-as.matrix(CCSP.MSOM[,39:40])
#These are then combined into a specialized dataframe:
CCSP.UMF.obscov<-unmarkedMultFrame(y=y,siteCovs=site.cov, numPrimary=2, yearlySiteCovs=list(mowed=yearly.cov),obsCovs=list(Julian=Julian.m, Time=Time.df, Obs=Obs.df))
The first step in modelling is to see how probability of detecting Clay-coloured Sparrows given their presence at a site varies with time of season (26 May-30 June), time of morning (0500-1000 hrs), and observer experience (LL=Lionel Leston, YK=Praepun “Yui” Khattiyakornjaroon, YW=Yan Wang). We will model availability for detection assuming a uniform distance function (in which the third tilde (~) in the formula expression equals 1). We will evaluate these models against a null model of availability.
#Factors affecting p
p.Null<-colext(psiformula = ~1, gammaformula = ~1, epsilonformula = ~1, pformula = ~1, data=CCSP.UMF.obscov)
p.Julian<-colext(psiformula = ~1, gammaformula = ~1, epsilonformula = ~1, pformula = ~Julian, data=CCSP.UMF.obscov)
p.Time<-colext(psiformula = ~1, gammaformula = ~1, epsilonformula = ~1, pformula = ~Time, data=CCSP.UMF.obscov)
p.Obs<-colext(psiformula = ~1, gammaformula = ~1, epsilonformula = ~1, pformula = ~Obs, data=CCSP.UMF.obscov)
p.JulianTime<-colext(psiformula = ~1, gammaformula = ~1, epsilonformula = ~1, pformula = ~Julian+Time, data=CCSP.UMF.obscov)
p.JulianObs<-colext(psiformula = ~1, gammaformula = ~1, epsilonformula = ~1, pformula = ~Julian+Obs, data=CCSP.UMF.obscov)
p.TimeObs<-colext(psiformula = ~1, gammaformula = ~1, epsilonformula = ~1, pformula = ~Time+Obs, data=CCSP.UMF.obscov)
p.JulianTimeObs<-colext(psiformula = ~1, gammaformula = ~1, epsilonformula = ~1, pformula = ~Julian+Time+Obs, data=CCSP.UMF.obscov)
models1 <- fitList('psi(.)gam(.)eps(.)p(.)' = p.Null, 'psi(.)gam(.)eps(.)p(Julian+Time+Obs)' = p.JulianTimeObs, 'psi(.)gam(.)eps(.)p(Time+Obs)' = p.TimeObs, 'psi(.)gam(.)eps(.)p(Julian+Obs)' = p.JulianObs, 'psi(.)gam(.)eps(.)p(Obs)' = p.Obs, 'psi(.)gam(.)eps(.)p(Julian)' = p.Julian, 'psi(.)gam(.)eps(.)p(Time)' = p.Time, 'psi(.)gam(.)eps(.)p(Julian+Time)' = p.JulianTime)
ms1 <- modSel(models1)
## Warning in sqrt(diag(vcov(x, altNames = TRUE))): NaNs produced
ms1 #prints results
## nPars AIC delta AICwt cumltvWt
## psi(.)gam(.)eps(.)p(Julian+Time) 9 341.20 0.00 6.1e-01 0.61
## psi(.)gam(.)eps(.)p(Julian+Time+Obs) 11 343.57 2.37 1.9e-01 0.80
## psi(.)gam(.)eps(.)p(Time+Obs) 10 344.88 3.68 9.7e-02 0.90
## psi(.)gam(.)eps(.)p(Time) 8 345.74 4.54 6.3e-02 0.96
## psi(.)gam(.)eps(.)p(Julian+Obs) 7 347.86 6.66 2.2e-02 0.98
## psi(.)gam(.)eps(.)p(Obs) 6 349.19 7.99 1.1e-02 0.99
## psi(.)gam(.)eps(.)p(.) 4 350.07 8.87 7.2e-03 1.00
## psi(.)gam(.)eps(.)p(Julian) 5 384.25 43.05 2.7e-10 1.00
It appears that there are time-of-morning and time-of-season effects on whether or not Clay-coloured Sparrows are detected. The second-best model also suggests observer effects, but that model has a higher AIC value and is less parsimonious. Let’s look at the results in model p.JulianTime.
p.JulianTime
##
## Call:
## colext(psiformula = ~1, gammaformula = ~1, epsilonformula = ~1,
## pformula = ~Julian + Time, data = CCSP.UMF.obscov)
##
## Initial:
## Estimate SE z P(>|z|)
## 1.28 0.398 3.21 0.00131
##
## Colonization:
## Estimate SE z P(>|z|)
## -0.523 0.799 -0.655 0.513
##
## Extinction:
## Estimate SE z P(>|z|)
## -3.15 1.05 -3 0.00267
##
## Detection:
## Estimate SE z P(>|z|)
## (Intercept) -4.9266 2.3473 -2.099 0.0358
## Julian 0.0381 0.0143 2.674 0.0075
## Time2 -0.0712 0.5325 -0.134 0.8936
## Time3 -1.2920 0.5118 -2.525 0.0116
## Time4 -0.4919 0.5444 -0.904 0.3662
## Time5 -1.0878 0.6074 -1.791 0.0733
##
## AIC: 341.1964
Clay-coloured Sparrows were more likely to be detected later in the season in each year (26 May-30 June) and less likely to be detected later in the morning between 0700-0800 hrs and 0900-1000 hrs than around 0500-0600 hrs.
Next, we test for effects of variables that differ across years that might affect occupancy at sites. Clay-coloured Sparrows nest in shrubs and I wanted to see if a change in mowing regime might affect Clay-coloured Sparrow occupancy. New mowing at three previously unmowed rural sites might reduce occupancy by Clay-coloured Sparrows. Less likely, a halt in mowing at five previously mowed urban sites might make those sites more likely to be occupied by Clay-coloured Sparrows…but probably not, since over the second year of the study shrubs probably don’t have enough time to colonize sites and grow large enough for Clay-coloured Sparrows to nest in.
#Factors affecting gamma and epsilon
epsgamma.null<-colext(psiformula = ~1, gammaformula = ~1, epsilonformula = ~1, pformula = ~Julian+Time, data=CCSP.UMF.obscov)
#a change in mowing frequency does not influence colonization or extinction. Colonization similar across sites. Extinction similar across sites.
gamma.mowed<-colext(psiformula = ~1, gammaformula = ~mowed, epsilonformula = ~1, pformula = ~Julian+Time, data=CCSP.UMF.obscov)
#a change in mowing frequency affects colonization of previously unoccupied sites between 2008 and 2009.
epsilon.mowed<-colext(psiformula = ~1, gammaformula = ~1, epsilonformula = ~mowed, pformula = ~Julian+Time, data=CCSP.UMF.obscov)
#a change in mowing frequency affects "extinction"" at previously occupied sites between 2008 and 2009.
models2 <- fitList('psi(.)gam(.)eps(mowed)p(Julian+Time)' = epsilon.mowed, 'psi(.)gam(mowed)eps(.)p(Julian+Time)' = gamma.mowed, 'psi(.)gam(.)eps(.)p(Julian+Time)' = epsgamma.null)
ms2 <- modSel(models2)
## Warning in sqrt(diag(vcov(x, altNames = TRUE))): NaNs produced
ms2 #prints results
## nPars AIC delta AICwt cumltvWt
## psi(.)gam(.)eps(.)p(Julian+Time) 9 341.20 0.00 0.500 0.50
## psi(.)gam(mowed)eps(.)p(Julian+Time) 10 341.55 0.36 0.418 0.92
## psi(.)gam(.)eps(mowed)p(Julian+Time) 10 344.83 3.63 0.081 1.00
A change in mowing doesn’t appear to improve explanation of Clay-coloured Sparrow occupancy.
Now we will look at how site occupancy is affected by site covariates. Strictly speaking, this means site occupancy in the first field season. We will start with land use models, pick the best of those, then add local vegetation management covariates to see if we can get a better model. We will also try adding local vegetation structural covariates to the best land use model to see if features associated with potential nesting habitat improve prediction of Clay-coloured Sparrows. All models will assume a uniform distance function (third ~ equals 1) and use Time and Observer as availability covariates (second ~ equals Time+Obs).
LandUse<-colext(psiformula = ~urban100+grass100+wooded100, gammaformula = ~1, epsilonformula = ~1, pformula = ~Julian+Time, data=CCSP.UMF.obscov)
Urban<-colext(psiformula = ~urban100, gammaformula = ~1, epsilonformula = ~1, pformula = ~Julian+Time, data=CCSP.UMF.obscov)
Grassland<-colext(psiformula = ~grass100, gammaformula = ~1, epsilonformula = ~1, pformula = ~Julian+Time, data=CCSP.UMF.obscov)
Wooded<-colext(psiformula = ~wooded100, gammaformula = ~1, epsilonformula = ~1, pformula = ~Julian+Time, data=CCSP.UMF.obscov)
models3 <- fitList('epsgamma.null' = epsgamma.null,'psi(LandUseOnly)gam(.)eps(.)p(Julian+Time)' = LandUse, 'psi(Urban)gam(.)eps(.)p(Julian+Time)' = Urban, 'psi(Wooded)gam(.)eps(.)p(Julian+Time)' = Wooded, 'psi(Grassland)gam(.)eps(.)p(Julian+Time)' = Grassland)
ms3 <- modSel(models3)
ms3 #prints results
## nPars AIC delta AICwt
## psi(Wooded)gam(.)eps(.)p(Julian+Time) 10 338.88 0.00 0.422
## psi(LandUseOnly)gam(.)eps(.)p(Julian+Time) 12 339.37 0.49 0.330
## epsgamma.null 9 341.20 2.32 0.132
## psi(Grassland)gam(.)eps(.)p(Julian+Time) 10 342.68 3.80 0.063
## psi(Urban)gam(.)eps(.)p(Julian+Time) 10 343.02 4.15 0.053
## cumltvWt
## psi(Wooded)gam(.)eps(.)p(Julian+Time) 0.42
## psi(LandUseOnly)gam(.)eps(.)p(Julian+Time) 0.75
## epsgamma.null 0.88
## psi(Grassland)gam(.)eps(.)p(Julian+Time) 0.95
## psi(Urban)gam(.)eps(.)p(Julian+Time) 1.00
Wooded #prints results: already known to be most parsimonious model
##
## Call:
## colext(psiformula = ~wooded100, gammaformula = ~1, epsilonformula = ~1,
## pformula = ~Julian + Time, data = CCSP.UMF.obscov)
##
## Initial:
## Estimate SE z P(>|z|)
## (Intercept) 2.1327 0.6829 3.12 0.00179
## wooded100 -0.0314 0.0158 -1.99 0.04604
##
## Colonization:
## Estimate SE z P(>|z|)
## -0.588 0.851 -0.691 0.49
##
## Extinction:
## Estimate SE z P(>|z|)
## -3.16 1.06 -2.97 0.00297
##
## Detection:
## Estimate SE z P(>|z|)
## (Intercept) -4.9213 2.3360 -2.1067 0.03514
## Julian 0.0378 0.0142 2.6688 0.00761
## Time2 -0.0214 0.5299 -0.0405 0.96772
## Time3 -1.2616 0.5073 -2.4870 0.01288
## Time4 -0.4649 0.5402 -0.8606 0.38945
## Time5 -1.0612 0.6023 -1.7620 0.07807
##
## AIC: 338.8761
A model with just the amount of wooded land within 100 m and not amounts of grassland habitat or urban land within 100 m was the most parsimonious landscape model, suggesting that Clay-coloured Sparrow occupancy probability declined in more wooded landscapes but not meaningfully in more urban landscapes. Clay-coloured Sparrow occupancy probability was not affected by the amount of grassland habitat within 100 m.
VegMan<-colext(psiformula = ~wooded100+mowed.c+mowsquare+hayed, gammaformula = ~1, epsilonformula = ~1, pformula = ~Julian+Time, data=CCSP.UMF.obscov)
Mowed<-colext(psiformula = ~wooded100+mowed.c, gammaformula = ~1, epsilonformula = ~1, pformula = ~Julian+Time, data=CCSP.UMF.obscov)
Hayed<-colext(psiformula = ~wooded100+hayed, gammaformula = ~1, epsilonformula = ~1, pformula = ~Julian+Time, data=CCSP.UMF.obscov)
models4 <- fitList('Mowed' =Mowed, 'Hayed' = Hayed, 'Wooded'=Wooded, 'VegMan'=VegMan)
ms4 <- modSel(models4)
ms4 #prints results
## nPars AIC delta AICwt cumltvWt
## Mowed 11 335.65 0.00 0.665 0.67
## Wooded 10 338.88 3.22 0.133 0.80
## VegMan 13 338.97 3.31 0.127 0.92
## Hayed 11 340.01 4.36 0.075 1.00
Mowed #prints results: already known to be most parsimonious model
##
## Call:
## colext(psiformula = ~wooded100 + mowed.c, gammaformula = ~1,
## epsilonformula = ~1, pformula = ~Julian + Time, data = CCSP.UMF.obscov)
##
## Initial:
## Estimate SE z P(>|z|)
## (Intercept) 2.8122 1.0458 2.69 0.00717
## wooded100 -0.0555 0.0264 -2.10 0.03579
## mowed.c -1.5244 0.8598 -1.77 0.07625
##
## Colonization:
## Estimate SE z P(>|z|)
## -0.303 0.705 -0.43 0.667
##
## Extinction:
## Estimate SE z P(>|z|)
## -3 0.992 -3.02 0.0025
##
## Detection:
## Estimate SE z P(>|z|)
## (Intercept) -4.80219 2.3483 -2.04497 0.0409
## Julian 0.03720 0.0143 2.60824 0.0091
## Time2 0.00284 0.5401 0.00525 0.9958
## Time3 -1.22685 0.5135 -2.38922 0.0169
## Time4 -0.45582 0.5379 -0.84748 0.3967
## Time5 -1.04802 0.6107 -1.71621 0.0861
##
## AIC: 335.6534
Adding mowing frequency covariate improved explanation of Clay-coloured Sparrow occupancy. There was lower probability of initial site occupancy by Clay-coloured Sparrows in both more wooded landscapes and as mowing frequency increased within sites. Let’s now try vegetation structural variables instead.
VegStruc<-colext(psiformula = ~wooded100+woodycov+htlow+litcov+grasscov+bareground, gammaformula = ~1, epsilonformula = ~1, pformula = ~Julian+Time, data=CCSP.UMF.obscov)
VegDensity<-colext(psiformula = ~wooded100+htlow, gammaformula = ~1, epsilonformula = ~1, pformula = ~Julian+Time, data=CCSP.UMF.obscov)
Woodycover<-colext(psiformula = ~wooded100+woodycov, gammaformula = ~1, epsilonformula = ~1, pformula = ~Julian+Time, data=CCSP.UMF.obscov)
Grasscover<-colext(psiformula = ~wooded100+grasscov, gammaformula = ~1, epsilonformula = ~1, pformula = ~Julian+Time, data=CCSP.UMF.obscov)
Litter<-colext(psiformula = ~wooded100+litcov, gammaformula = ~1, epsilonformula = ~1, pformula = ~Julian+Time, data=CCSP.UMF.obscov)
Bareground<-colext(psiformula = ~wooded100+bareground, gammaformula = ~1, epsilonformula = ~1, pformula = ~Julian+Time, data=CCSP.UMF.obscov)
models5 <- fitList('Wooded'=Wooded, 'Woodycover'=Woodycover, 'VegDensity'=VegDensity, 'Grasscover'=Grasscover, 'Bareground'=Bareground, 'Litter'=Litter, 'VegStruc'=VegStruc)
ms5 <- modSel(models5)
ms5 #prints results
## nPars AIC delta AICwt cumltvWt
## Grasscover 11 337.90 0.00 0.346 0.35
## Wooded 10 338.88 0.97 0.213 0.56
## Woodycover 11 339.99 2.09 0.122 0.68
## VegDensity 11 340.22 2.31 0.109 0.79
## Litter 11 340.37 2.47 0.101 0.89
## Bareground 11 340.53 2.63 0.093 0.98
## VegStruc 15 344.04 6.14 0.016 1.00
Adding grass cover lowered the AIC relative to a model with just wooded land cover within 100 m, but not by much. The Wooded model has essentially the same explanatory power as a model with the additional varialbe of grass cover, so the Wooded model is actually more parsimonious. But the Mowed model is the most parsimonious model for predicting Clay-coloured Sparrow occupancy. That will be it for model selection for now. Before we go any further with graphing predictions with the final best model (Mowed), we should test the model’s goodness-of-fit, which is different from the relative fit of AIC. Since our response variable consists of 1’s and 0’s, we will use a Mackenzie-Bailey chi-square test.
library(AICcmodavg)
mb.gof.test (Mowed, print.table=TRUE, nsim=1000, plot.hist=TRUE, plot.seasons=TRUE)#nsim should be normally at least 1000
##
## Goodness-of-fit for dynamic occupancy model
##
## Number of seasons: 2
##
## Chi-square statistic:
## Season 1 Season 2
## 3.6976 9.9247
##
## Total chi-square = 13.6223
## Number of bootstrap samples = 1000
## P-value = 0.201
##
## Quantiles of bootstrapped statistics:
## 0% 25% 50% 75% 100%
## 1.4 7.4 9.8 12.9 26.8
##
## Estimate of c-hat = 1.32
Once the goodness-of-fit test has been run, we can use the variance inflation factor “c-hat” to adjust the level of uncertainty in a given model’s parameters (by multiplying each of a model’s parameters’ SE by c-hat) and assess model parsimony using the Quasi-Akaike Information Criterion statistic (QAIC). c-hat can vary each time the parametric bootstrap is run, just like the test statistic, but the value used below should be reasonably close.
aictab(list('Mowed' =Mowed, 'Hayed' = Hayed, 'Wooded'=Wooded, 'VegMan'=VegMan),modnames=c("Mowed","Hayed","Wooded","VegMan"),c.hat=1.36)#c-hat taken from GOF test.
##
## Model selection based on QAICc:
## (c-hat estimate = 1.36)
##
## K QAICc Delta_QAICc QAICcWt Cum.Wt Quasi.LL
## Mowed 12 263.80 0.00 0.47 0.47 -115.31
## Wooded 11 264.01 0.21 0.42 0.89 -117.23
## Hayed 12 267.01 3.20 0.09 0.99 -116.92
## VegMan 14 271.25 7.44 0.01 1.00 -115.06
Assuming that we still want to use the “Mowed” model, we can now graph predicted variables from the different model components. For Clay-coloured Sparrows, we are probably interested in seeing how initial site occupancy probability varies with mowing frequency and amount of wooded land within 100 m, and how detection probability varies with time of day and Julian date.
#psi affected by %woodedland and mowing frequency (variables in that order)
newData <- data.frame(wooded100=c(0,10,20,30,40), mowed.c=c(-1,-1,-1,-1,-1)) #hypothetical unmowed sites
E.psi <- predict(Mowed, type = 'psi', newdata = newData, appendData=TRUE)
#one psi value per combination of values; in this case shows that predicted probability of occupancy declines as more land within 100 m is urban, wooded or both
#E.psi #display predictions in R workspace (for now turned off)
#write.csv(E.psi, file="CCSP.E.psi.csv")#how to write predictions to output (for now turned off)
newData2 <- data.frame(Julian=c(160,160,160,160,160),Time=c("1","2","3","4","5"))
E.det <- predict(Mowed, type = 'det', newdata = newData2, appendData=TRUE)
#one det value per combination of values; in this case shows that predicted probability of detection declines as more land within 100 m is roads
#E.det #display predictions in R workspace (for now turned off)
#write.csv(E.det, file="CCSP.E.det.csv")#how to write predictions to output (for now turned off)
#graphing predictions
library(ggplot2)
qplot(wooded100, Predicted, data=E.psi, geom='smooth', xlim=c(0,40), ylim=c(0,1), xlab="% 100 m Wooded (Unmowed sites)", ylab="Initial occupancy probability", main="Initial occupancy vs. % wooded land")
#Predicted values should show a monotonic, decreasing relationship in this graph, because the predicted values are based on the ordered values of wooded100 used to generate file E.psi.
Figure 1. Probability of Clay-coloured Sparrow Site Occupancy vs. Amount of Wooded Land Within 100 m of Site.