Introduction

(About this document: This is an R notebook. As such, it is very informal. Its purpose is to generate conversation among staff; it is not suitable for publishing in the Daily Camera.)

Data wrangling

Bird Data

We already explored this data set (see here). That batch of analysis/description ended with a list of questions for Will about how to process the data. He provided the following answers:

  • Do we want to subset to 9 species?
    • Yes; I think 70 detections was the recommended threshold; which species >70?
  • What to do about Number values >> 1
    • Probably treat them as one; don’t want to change the actual data, but for this exercise, I think it’s OK
  • Can you fix the Jan 1 date value?
    • I think this was a test of the database, so can be erased. Which transect has a 1/1 survey date?
  • How to handle all of the NA values in distance?
    • NA probably means that the detection type was “FO” (Flyover – those obs should be removed from the analysis)
  • What about angle values > 90?
    • There shouldn’t be angles > 90; anything past 90 deg on the surveyors left or right is behind the survey area
  • What is detection type?
    • Singing, visual, flyover, fly through, etc. Obs with “S” or “V” should be retained for the analysis
  • Are we interested in comparing years? (and do you think that later sampling in 2019 matters?)
    • Yes, we are interested in time series; not sure what you mean re later sampling in 2019, but I don’t think it matters
  • Are we interested in comparing property?
    • Only if there are enough transects on each property; Comparing GRSP abundance in Gunbarrel Hill to systemwide would be of interest, if possible
  • What is BM and WB; are these groups we want to compare?
    • These are simply diffs in naming convention; BM = bird monitoring whereas the older transects on Tallgrass West were named a bit wonkily (TGWB)
  • Are there other groups of interest?
    • It may be interesting to investigate a relationship b/t bird density / abundance and grassland block size…I think in this instance we would need to combine data from ALL the transects in smaller habitat blocks to gain enough n…

In this part of the report, I will make the changes described in the Q and A above.

options(stringsAsFactors = FALSE)
library(readxl)
library(tidyr)
library(dplyr)
library(unmarked)
library(ggplot2)
library(rmarkdown)
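# Note: the wrangling code below calls sight2perpdist() to convert sighting
# distances and angles to perpendicular distances. That helper isn't exported
# by any of the packages loaded above, so if it isn't already defined elsewhere
# in the project, a minimal version (assuming angles are recorded in degrees)
# would be:
sight2perpdist <- function(sight_dist, angle_deg) {
  sight_dist * sin(angle_deg * pi / 180) # perpendicular distance from the transect line
}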
d <- read_excel('S:/OSMP/ECO_SYSTEMS/WILDLIFE/BIRDS/Distance data analysis/qry.GMAPAllSpp.No.FO.wDistance.and.EnivroData.2017thru2019.xlsx')
  # 122 Samples
  # 3 years
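
Before any filtering, a couple of quick checks on the raw data can answer two of the open questions from the Q and A above: which species exceed the 70-detection threshold, and which transect has the stray January 1 survey date. This is just a sketch (it counts rows as detections) and assumes the column names used in the wrangling code below.

d %>% 
  count(BirdSpp, sort = TRUE) %>% 
  filter(n > 70) # species with more than 70 detections

d %>% 
  filter(format(Date, "%m-%d") == "01-01") %>% 
  distinct(Sample_name, Date) # which transect has a 1/1 survey date?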

d2 <- d %>% 
  filter(!DetectionType %in% c("C", "FT", "FT,0", "FT,C", "FT,O", "O", "O,FT")) %>% # retain `DetectionType` "S" and "V" only 
  select(Sample_name, Year, BirdSpp, Number, Date, Distance, Angle) %>% 
  filter(BirdSpp == 'WEME') %>% # retain the most abundant species only
  mutate(Number = 1) %>% # treat all counts as 1, per the Q and A above
  drop_na(Distance) %>%  # remove rows with NA distances
  filter(!(Date > '2018-12-31' & Date < '2019-01-02')) %>% # drop the Jan 1 records (2 rows)
  filter(Angle < 91) %>% # remove any data with angle > 90
  mutate(Distance2 = sight2perpdist(Distance, Angle)) %>% # convert sighting distances to perpendicular distances
  mutate(trans = paste0(Sample_name, '-', Date)) %>% # unique transect/date identifier
  data.frame()

subset(d2, Sample_name == 'NBBM-20')  

Here’s one of the transects for WEME. The next bit of code simplifies this down to one record per transect/date and converts the detections into the distance bins required for the analysis.

yDat <- formatDistData( # create distance categories (m). Could change these.
  d2, 
  distCol = "Distance2", 
  transectNameCol = "trans",
  dist.breaks = c(0, 50, 100, 150, 200, 250, 300)
  )  

data.frame(yDat)

632 rows. There are 122 transects total, so if WEME had been detected on every transect on all 6 sample dates (2 visits in each of 3 years), we’d have 732 rows. Thus WEME is not detected on every transect every time, but it is present on about 86% of the transect/date combinations.
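
A quick arithmetic check of that coverage figure (assuming 122 transects and 6 visits apiece):

nrow(yDat)             # 632 transect/date combinations with at least one WEME detection
nrow(yDat) / (122 * 6) # ~0.86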

The columns are the 6 distance bins.

Covariates

We can have covariates on detection and on density. In our previous report, we identified cloud, wind, and noise as candidate covariates; I’m guessing we’d want to use these as covariates on detection. We’ll process all three, but use only wind in the model, since fitting all three takes a long time. We will use year as the covariate on density.

covs <- d %>% select(Sample_name, Date, Year, CloudIndexBegin, WindIndexBegin, NoiseIndexBegin) %>% 
  mutate(trans = paste0(Sample_name, '-', Date)) %>% # same transect/date identifier as in d2
  group_by(trans, Year) %>% 
  summarise(cloud = max(CloudIndexBegin), noise = max(NoiseIndexBegin), wind = max(WindIndexBegin)) %>% 
  data.frame()
covs.ordered <- covs[match(row.names(yDat), covs$trans), ] # reorder the covariates to match the rows of yDat
covs.ordered$length <- 200 # transect length (m), the same for all transects
covs.ordered$Year <- as.factor(covs.ordered$Year) # year as a factor so it can be used as a categorical covariate
data.frame(covs.ordered)

For each transect/date combination, here are the covariates. I added a length column and set it to 200 for all records.
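
One optional sanity check (not part of the original workflow): confirm that the reordered covariates contain no missing values before they go into the unmarked data object.

anyNA(covs.ordered) # should be FALSE
colSums(is.na(covs.ordered[, c("cloud", "noise", "wind")])) # NA count per covariate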

We’ll combine the covariates with the distance data in the unmarked object below. First, let’s check that they are ordered the same as the bird data.

all(covs.ordered$trans == row.names(yDat)) # check that the covariate rows line up with the rows of yDat
## [1] TRUE

They are.

Finally, we’ll make a special unmarked data object.

umf <- unmarkedFrameDS(
  y = as.matrix(yDat),      # counts per distance bin
  siteCovs = covs.ordered,  # transect/date covariates
  survey = "line", 
  dist.breaks = c(0, 50, 100, 150, 200, 250, 300), # same bins used in formatDistData()
  tlength = covs.ordered$length, # transect lengths (m)
  unitsIn = "m"
)
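
Before fitting, it can be worth eyeballing the distance data; summary() and hist() both have methods for unmarkedFrameDS objects (this follows the unmarked distance-sampling vignette and isn’t essential to the workflow).

summary(umf) # counts, covariates, and distance-bin summary
hist(umf, xlab = "distance (m)", main = "WEME detection distances")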

OK, ready to go.

Modeling

There are a lot of steps to doing this model right. They include:

  • Selecting a K value (the upper bound used in the integration, or something like that)
  • Selecting Poisson or negative-binomial
  • Selecting a detection (key) function

We’re going to skip all of those for now and simply use K = 30, Poisson, and the hazard-rate key function. I’m confident we can work through the steps to find the correct choice for each, but for expediency I just want to make sure we format our data correctly and get a model fit that makes sense.
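
For later reference, here is a minimal sketch of how the key-function comparison could be run with unmarked’s fitList()/modSel() (not run here, since each fit is slow; the Poisson vs. negative-binomial choice would be made with gdistsamp() rather than distsamp()):

# fit the same density model under three candidate key functions
m_halfnorm <- distsamp(~wind ~Year, umf, keyfun = "halfnorm", output = "density", unitsOut = "kmsq")
m_exp      <- distsamp(~wind ~Year, umf, keyfun = "exp",      output = "density", unitsOut = "kmsq")
m_hazard   <- distsamp(~wind ~Year, umf, keyfun = "hazard",   output = "density", unitsOut = "kmsq")

# compare by AIC
modSel(fitList(halfnorm = m_halfnorm, exp = m_exp, hazard = m_hazard))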

m1 <- distsamp(~wind ~Year, 
               umf, 
               keyfun = "hazard", 
               output = "density", 
               unitsOut = "kmsq")
summary(m1)
## 
## Call:
## distsamp(formula = ~wind ~ Year, data = umf, keyfun = "hazard", 
##     output = "density", unitsOut = "kmsq")
## 
## Density (log-scale):
##             Estimate     SE      z P(>|z|)
## (Intercept)    4.012 0.0445 90.217  0.0000
## Year2018      -0.049 0.0577 -0.849  0.3961
## Year2019      -0.142 0.0593 -2.397  0.0166
## 
## Detection (log-scale):
##                  Estimate     SE     z P(>|z|)
## shape(Intercept)   4.6589 0.0472 98.71  0.0000
## shapewind1         0.0828 0.0460  1.80  0.0717
## shapewind2         0.0849 0.0483  1.76  0.0790
## shapewind3         0.0847 0.0551  1.54  0.1239
## shapewind4         0.0956 0.0782  1.22  0.2215
## 
## Hazard-rate(scale) (log-scale):
##  Estimate     SE    z   P(>|z|)
##      1.61 0.0695 23.2 4.19e-119
## 
## AIC: 5311.912 
## Number of sites: 632
## optim convergence code: 0
## optim iterations: 175 
## Bootstrap iterations: 0 
## 
## Survey design: line-transect
## Detection function: hazard
## UnitsIn: m
## UnitsOut: kmsq

It took a while (~10 min) to fit the model. It looks like wind almost has an effect, and there are some interesting declines by year in WEME. Note that when I ran a model with wind, noise, and clouds, wind was significant…
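
For reference, the full detection-covariate model mentioned above would look like this, assuming the covariates enter additively (not re-run here because it is slow):

m_full <- distsamp(~wind + noise + cloud ~Year, 
                   umf, 
                   keyfun = "hazard", 
                   output = "density", 
                   unitsOut = "kmsq")
summary(m_full)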

Let’s plot out the model predictions.

NewData <- data.frame(Year = factor(c('2017', '2018', '2019')))    

pred <- predict(
  m1, 
  type = "state", # state = density; det = detection (the shape parameter); scale = scale estimate (the detection fxn)
  newdata = NewData, 
  appendData = TRUE
)

ggplot(pred, aes(x = Year, y = Predicted)) + # a plot of the model predictions  
  geom_bar(stat = "identity") +
  geom_errorbar(aes(ymin = Predicted - SE, ymax = Predicted + SE), width = 0.2) +
  theme_bw() +
  ggtitle('Model Predictions')

Here are the model predictions (above).

library(dplyr)
data.frame(yDat, covs.ordered) %>% # a plot of the raw data
  mutate(sum = X.0.50. + X.50.100. + X.100.150. + X.150.200. + X.200.250. + X.250.300.) %>% 
  group_by(Year) %>% 
  summarise(mean = mean(sum), n = length(sum), sd = sd(sum), se = sd/sqrt(n)) %>% 
  ggplot(aes(x = Year, y = mean)) +
    geom_bar(stat = 'identity')  +
    geom_errorbar(aes(ymin = mean - se, ymax = mean + se), width = 0.2) +
    theme_bw() +
    ggtitle('Raw Data')

And here’s the raw data for comparison. Pretty dang similar in pattern, but note that the y-axis scales differ: the model predictions are densities (birds per km², corrected for detection), while the raw data are mean counts per transect.
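
If a closer apples-to-apples comparison is wanted, the predicted densities could be rescaled to expected birds per surveyed strip. This is a rough sketch assuming each strip is 200 m long and extends 300 m to either side (the outermost distance break); the raw counts should still sit somewhat lower, since they are not corrected for detection.

strip_area_km2 <- 2 * 0.2 * 0.3 # two 300 m half-strips along a 200 m transect, in km^2
pred %>% 
  mutate(ExpectedPerTransect = Predicted * strip_area_km2, 
         SEPerTransect = SE * strip_area_km2) %>% 
  select(Year, ExpectedPerTransect, SEPerTransect)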