(About this document: This is an R notebook. As such, it is very informal and contains fragment sentences and shorthand. Its purpose is to generate conversation among staff. It's not suitable for publishing in the Daily Camera.)
We already explored this data set (see here). That batch of analysis/description ended with a list of questions for Will about how to process the data. He provided the following answers:
In this part of the report, I will make the changes described in the Q and A above.
options(stringsAsFactors = FALSE)
library(readxl)    # read the source Excel file
library(tidyr)     # drop_na()
library(dplyr)     # data wrangling
library(unmarked)  # distance-sampling models
library(ggplot2)   # plots
library(rmarkdown)
d <- read_excel('S:/OSMP/ECO_SYSTEMS/WILDLIFE/BIRDS/Distance data analysis/qry.GMAPAllSpp.No.FO.wDistance.and.EnivroData.2017thru2019.xlsx')
# 122 Samples
# 3 years
d2 <- d %>%
  filter(!DetectionType %in% c("C", "FT", "FT,0", "FT,C", "FT,O", "O", "O,FT")) %>% # retain `DetectionType` "S" and "V" only
  select(Sample_name, Year, BirdSpp, Number, Date, Distance, Angle) %>%
  filter(BirdSpp == 'WEME') %>% # retain most abundant bird only
  mutate(Number = 1) %>% # treat each row as one individual
  drop_na(Distance) %>% # remove any distance values of NA
  filter(!(Date > '2018-12-31' & Date < '2019-01-02')) %>% # get rid of Jan 1 data (2 rows)
  filter(Angle < 91) %>% # remove any data w/ angle > 90
  mutate(Distance2 = sight2perpdist(Distance, Angle)) %>% # convert sighting distances to perpendicular distances
  mutate(trans = paste0(Sample_name, '-', Date)) %>% # unique transect-visit ID
  data.frame()
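FYI, here's what sight2perpdist() is doing under the hood (a minimal check I added, not part of the pipeline; as far as I know, unmarked's helper is just trigonometry on the sighting angle):
sight2perpdist(100, 30)  # perpendicular distance for a 100 m sighting at 30 degrees: 50
100 * sin(30 * pi / 180) # same thing by hand: sighting distance * sin(angle in radians)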
subset(d2, Sample_name == 'NBBM-20')
Here’s one of the transects for WEME (above). The next bit of code will simplify this down to one record per transect visit and convert the distances into the bins required for further analysis.
yDat <- formatDistData(
  d2,
  distCol = "Distance2",
  transectNameCol = "trans",
  dist.breaks = c(0, 50, 100, 150, 200, 250, 300) # distance categories (m). Could change these.
)
data.frame(yDat)
632 rows. There are 122 transects, each sampled 6 times (2 visits in each of 3 years), so if WEME had turned up on every visit we’d have 732 rows. Thus, WEME is not on every transect every time, but it was detected on 86% (632/732) of the transect visits.
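For the record, that 86% is just:
nrow(yDat) / (122 * 6) # 632 / 732 = 0.863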
The columns are the 6 distance bins.
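Quick look at how the detections spread across those bins (a sanity check I added):
colSums(yDat) # total WEME detections per distance bin, pooled over all transect visits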
We can have covariates on both detection and density. In our previous report, we identified cloud, wind, and noise as candidate covariates; I’m guessing we’d want these on detection. We’ll process all 3 covariates but use only wind in the model, since fitting all 3 takes a long time. We will use year as the covariate on density.
covs <- d %>%
  select(Sample_name, Date, Year, CloudIndexBegin, WindIndexBegin, NoiseIndexBegin) %>%
  mutate(trans = paste0(Sample_name, '-', Date)) %>% # same transect-visit ID as above
  group_by(trans, Year) %>%
  summarise(cloud = max(CloudIndexBegin), noise = max(NoiseIndexBegin), wind = max(WindIndexBegin)) %>%
  data.frame()
covs.ordered <- covs[match(row.names(yDat), covs$trans), ] # reorder covariate rows to match yDat
covs.ordered$length <- 200 # transect length (m), the same for all transects
covs.ordered$Year <- as.factor(covs.ordered$Year) # year as a factor for the density model
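One gotcha worth flagging: match() returns NA for any yDat row with no covariate record, and that would propagate silently. A quick check (my addition):
anyNA(covs.ordered$trans) # should be FALSE; TRUE means a transect visit is missing covariates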
data.frame(covs.ordered)
For each transect/date combination, here are the covariates (above). I added a length column and set it to 200 for all records, since every transect is 200 m long.
Before merging these with the distance data, let’s check that they are ordered the same as the bird data.
all(covs.ordered$trans == row.names(yDat)) # check that the covariate rows line up with the bird data
## [1] TRUE
They are.
Finally, we’ll make a special unmarked data object.
umf <- unmarkedFrameDS(
  y = as.matrix(yDat),
  siteCovs = covs.ordered,
  survey = "line",
  dist.breaks = c(0, 50, 100, 150, 200, 250, 300),
  tlength = covs.ordered$length,
  unitsIn = "m"
)
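Before fitting, it’s worth eyeballing the object (my addition; if I remember right, unmarked provides summary() and hist() methods for unmarkedFrameDS):
summary(umf) # sites, distance bins, and covariates in one place
hist(umf)    # histogram of the binned perpendicular distances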
OK, ready to go.
There are a lot of steps to doing the model right. They include choosing K, the latent abundance distribution, and the detection (key) function.
We’re going to skip all of those for now and simply use K = 30, Poisson, and the hazard key function. I’m confident we can do all the steps to find the correct choice for each of those, but for expediency I just want to make sure we format our data correctly and get a model fit that makes sense. (A sketch of the key-function comparison follows.)
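Here’s roughly what the key-function comparison could look like when we get to it (an untested sketch; the model names are mine, each fit is slow, and fitList()/modSel() are unmarked’s model-selection helpers):
# fit the same model under three key functions and compare by AIC
m.haz <- distsamp(~wind ~Year, umf, keyfun = "hazard",   output = "density", unitsOut = "kmsq")
m.hn  <- distsamp(~wind ~Year, umf, keyfun = "halfnorm", output = "density", unitsOut = "kmsq")
m.exp <- distsamp(~wind ~Year, umf, keyfun = "exp",      output = "density", unitsOut = "kmsq")
modSel(fitList(hazard = m.haz, halfnorm = m.hn, exp = m.exp)) # AIC table, best model first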
m1 <- distsamp(~wind ~Year, # double right-hand formula: ~detection covariates ~density covariates
               umf,
               keyfun = "hazard",
               output = "density",
               unitsOut = "kmsq")
summary(m1)
##
## Call:
## distsamp(formula = ~wind ~ Year, data = umf, keyfun = "hazard",
## output = "density", unitsOut = "kmsq")
##
## Density (log-scale):
## Estimate SE z P(>|z|)
## (Intercept) 4.012 0.0445 90.217 0.0000
## Year2018 -0.049 0.0577 -0.849 0.3961
## Year2019 -0.142 0.0593 -2.397 0.0166
##
## Detection (log-scale):
## Estimate SE z P(>|z|)
## shape(Intercept) 4.6589 0.0472 98.71 0.0000
## shapewind1 0.0828 0.0460 1.80 0.0717
## shapewind2 0.0849 0.0483 1.76 0.0790
## shapewind3 0.0847 0.0551 1.54 0.1239
## shapewind4 0.0956 0.0782 1.22 0.2215
##
## Hazard-rate(scale) (log-scale):
## Estimate SE z P(>|z|)
## 1.61 0.0695 23.2 4.19e-119
##
## AIC: 5311.912
## Number of sites: 632
## optim convergence code: 0
## optim iterations: 175
## Bootstrap iterations: 0
##
## Survey design: line-transect
## Detection function: hazard
## UnitsIn: m
## UnitsOut: kmsq
It took a while (~10 min) to fit the model. Looks like wind almost has an effect, and there are some interesting declines by year in WEME. Note: when I ran a model with wind, noise, and clouds, wind was significant…
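To put numbers on “almost has an effect”, we can pull 95% confidence intervals (my addition; these are Wald intervals on the log scale):
confint(m1, type = "det")   # detection (shape) parameters, including the wind levels
confint(m1, type = "state") # density parameters (the Year effects)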
Let’s plot out the model predictions.
NewData <- data.frame(Year = factor(c('2017', '2018', '2019')))
pred <- predict(
  m1,
  type = "state", # state = density; det = detection (the shape parameter); scale = scale estimate (the detection fxn)
  newdata = NewData,
  appendData = TRUE
)
ggplot(pred, aes(x = Year, y = Predicted)) + # a plot of the model predictions
  geom_bar(stat = "identity") +
  geom_errorbar(aes(ymin = Predicted - SE, ymax = Predicted + SE), width = 0.2) +
  theme_bw() +
  ggtitle('Model Predictions')
Here are the model predictions (above).
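Sanity check on the bar heights: density is modeled on the log scale, so the 2017 estimate is just the back-transformed intercept and the Year terms are multiplicative (my addition):
exp(coef(m1, type = "state")) # exp(4.012) ~ 55 WEME per km2 in 2017; Year2018 and Year2019 scale that down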
data.frame(yDat, covs.ordered) %>% # a plot of the raw data
  mutate(sum = X.0.50. + X.50.100. + X.100.150. + X.150.200. + X.200.250. + X.250.300.) %>% # total WEME per transect visit
  group_by(Year) %>%
  summarise(mean = mean(sum), n = length(sum), sd = sd(sum), se = sd / sqrt(n)) %>%
  ggplot(aes(x = Year, y = mean)) +
  geom_bar(stat = 'identity') +
  geom_errorbar(aes(ymin = mean - se, ymax = mean + se), width = 0.2) +
  theme_bw() +
  ggtitle('Raw Data')
And here’s the raw data for comparison. Pretty dang similar pattern, but note the y axes are on different scales: the model predictions are densities (WEME per km², corrected for imperfect detection), while the raw data are mean counts per transect visit.
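If we wanted the two plots in the same units, we could convert the raw counts to a naive density (my back-of-envelope; assumes each visit surveys a 200 m transect with a 300 m strip on each side, and ignores imperfect detection):
surveyed_km2 <- 0.2 * (2 * 0.3) # 0.12 km2 surveyed per transect visit
data.frame(yDat, covs.ordered) %>%
  mutate(sum = X.0.50. + X.50.100. + X.100.150. + X.150.200. + X.200.250. + X.250.300.) %>%
  group_by(Year) %>%
  summarise(naive_density = mean(sum) / surveyed_km2) # WEME per km2, uncorrected for detection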