This vignette is designed to explore the estimated death times for individual right whales, and in particular to examine the effects of initial values on the final estimates of death months.
These estimates times/months of death are generated by the model described in Schick et al. (2013): Using Hierarchical Bayes to Understand Movement, Health, and Survival in the Endangered North Atlantic Right Whale, PLoS ONE: e64166.
Herein I will look at the estimates of death times for each of two different model configurations. These models differ only in how the starting times for death are initialised. In the first model run, we initialised starting death months for each animal by taking the last time they were seen (a variable called lastSight) and adding 48 months to it. This is the starting value we’ve used for almost all the model runs to date.
However, after inspecting some of the outputs for possible use in a revised report card, Philip Hamilton (NEAq) had suggested that certain animals seem to die too quickly and without much heterogeniety. In particular, he felt that animals with long sighting gaps ought to be dying more slowly than animals with short sighting gaps. To address this, in the second model run, we initialised the vector of death times by first calculating the mean sighting gap for each individual, and adding that value to lastSight.
This isn’t a prior sensitivity analysis, but what it does look at is how sensitive the posterior estimtes of death are to the starting point. It’s good practice in a computational Bayesian setting to explore this to see if your posterior estimates converge on the same answer regardless of where you started.
This is the model setup we’ve been using so far to date. Before the Gibbs loop, we initialise a set of starting values for the month of death for each individual animal to be 48 months after its last sighting. Here’s what we do:
missingInt <- 4*12
bnt <- 100
death <- rep(nt + bnt, n)
death[deadID] <- deadTime # these are the death times of Known Dead animals
death[death > (lastSight + missingInt)] <- lastSight[death > (lastSight + missingInt)] +
missingInt[death > (lastSight + missingInt)]
The logic here is to create a vector (death) of length n that is populated first with values very far out into the future - beyond the temporal domain of the data. In this way the animals are alive when the modelling stops. This is the nt + bnt value, where nt is the maximum of all the sightings in the data. Then, for any animals known to have died, we set their value to be the month they died. Finally, we test the vector with a conditional that asks if the death time is greater than 4 years after their last sighting. If yes, then set the starting value to be 4 years after their last sighting.
With that vector initialised, then we run through the Gibbs loop and continually estimate deaths for all animals save the known dead animals. There is stochasticity around that, which we store in a matrix called deathyr. We will examine the effect of that initialisation on the results in the next sections.
Model 2 differs from Model 1 only in the way we initialised the death vector. Instead of adding 4 years to the last sighting, what we did was calculate the maximum sighting gap for each animal following 1986. Then we add that individual specific value to its last sighting. We calculate these with this block of code (not run):
# in ingestRawegdataSummary.Rmd
library(lubridate)
sights$Date <- ymd(paste(sights$SightingYear, sights$SightingMonth, sights$SightingDay, sep = '/'))
lgaps <- sights %>%
filter(SightingYear >= 1986) %>%
group_by(SightingEGNo) %>%
arrange(Date) %>%
mutate(sgap = Date - lag(Date)) %>%
summarise(maxgap = round(as.numeric(max(sgap, na.rm = TRUE) / 2628000), 0))
lgaps
write_csv(lgaps, paste(wd, 'maxGapsforPhilip.csv', sep = ''))
# in rightWhaleInput.R
lgaps <- read.csv(file = 'data/maxGapsforPhilip.csv')
iddat <- merge(as.data.frame(ID), lgaps, by.x = 'ID', by.y = 'SightingEGNo', all.x = TRUE)
iddat[which(!is.finite(iddat$maxgap)), 'maxgap'] <- median(iddat$maxgap, na.rm = T)
iddat[which(is.na(iddat$maxgap)), 'maxgap'] <- median(iddat$maxgap, na.rm = T)
Then in rightWhale.R we add this if the longGap == TRUE
missingInt <- 4*12 # time (in months) for which the animal can still be alive after last sighting
if (longGap) {
missingInt <- iddat$maxgap
death[death > (lastSight + missingInt)] <- lastSight[death > (lastSight + missingInt)] + missingInt[death > (lastSight + missingInt)]
} else {
death[death > (lastSight + missingInt)] <- lastSight[death > (lastSight + missingInt)] + missingInt
}
What this effectively means is that some animals’ death will be initialised far from their last sighting, whereas some will be initialised close to their last sighting. The difference lies in the gaps for each animal. The range of the gaps can be seen in the plot below. I’ve added a blue line at 48 months, which was our previous missing interval value:
library(reportCard)
hist(iddat$maxgap, breaks = 30, main = 'Sighting Gaps')
abline(v = 48, lwd = 2, col = 'blue')
The comparison between these two initalisations for the death vector can be seen below. (Note that I’ve added a blue line at the time we’re assessing the matching to be complete – December 2013.)
You can see that they are pretty close to one another. However, because most animals’ maximum gap is less than 48 months, this tends to make the death times closer to the December, 2013 line. Some animals have values much farther out into the future.
So what do the death data look like in the deathyr matrix? Here’s one animal
library(reportCard)
deathyr203[1, 240:300]
## [1] 0 0 0 0 0 0 0 0 0 0 0 0 0 0
## [15] 0 0 0 0 0 0 0 0 0 0 0 0 0 0
## [29] 0 0 0 0 0 280 581 3482 7547 4247 3025 1983 1424 971
## [43] 662 385 221 117 46 25 4 0 0 0 0 0 0 0
## [57] 0 0 0 0 0
I’ve trimmed it a bit to save space, but as you can see, most of these are 0’s, but a small number of months contain values. These values correspond to the number of times that a particular month was an estimated month of death for this right whale (1001). To make it a bit easier to see, let’s see the candidate months in vector and graphical form:
myName[deathyr203[1, ] > 0]
## [1] "9-1992" "10-1992" "11-1992" "12-1992" "1-1993" "2-1993" "3-1993"
## [8] "4-1993" "5-1993" "6-1993" "7-1993" "8-1993" "9-1993" "10-1993"
## [15] "11-1993" "12-1993"
barplot(deathyr203[1, ][deathyr203[1, ] > 0], names.arg = myName[which(deathyr203[1, ] > 0)])
So you can see there’s quite a range of possible death months. In fact it’s a difference of 15 months. The most likely month of death was 12-1992.
Here’s the same animal for comparison but instead using estimated deaths from a model run that used the second initialisation for the death month. They look almost identical:
myName[deathyr204[1, ] > 0]
## [1] "9-1992" "10-1992" "11-1992" "12-1992" "1-1993" "2-1993" "3-1993"
## [8] "4-1993" "5-1993" "6-1993" "7-1993" "8-1993" "9-1993" "10-1993"
## [15] "11-1993" "12-1993"
barplot(deathyr204[1, ][deathyr204[1, ] > 0], names.arg = myName[which(deathyr204[1, ] > 0)])
Now how about the median dead time for the three animals that were highlighted in the excel spreadsheet you sent me? In the next table, I show the estimated month of death under the two different initialisations (Model 1 and Model 2):
| EGNo | 4 Years | Max Gaps |
|---|---|---|
| 1021 | 10-1998 |
10-1998 |
| 1026 | 12-2009 |
12-2009 |
| 1077 | 10-2004 |
10-2004 |
Ok, so they are identical, which suggests that the results for these three animals, at least, aren’t sensitive to the starting point. The gaps for these animals were actually pretty long:
subset(iddat, ID %in% c(1021, 1026, 1077))
## ID maxgap
## 14 1021 23
## 16 1026 24
## 40 1077 114
So then what did their health look like? See attached pdf.
Of the plots in the pdf, both 1021 and 1077 make sense to me in terms of health, but clearly 1026 does not as they were in good health at the last sighting. While this seems surprising at first blush (to me anyway), it’s possible that this may not seem as anomalous as we think based on the animal’s sighting history. This is an animal that’s been seen a lot, and despite the gap around 2002-3, it’s been seen regularly. So my guess is that based on its sighting parameter, we’d expect to see that animal after September 2009 (last sighting). So if it hasn’t been seen since then, we’d assume it’s dead.
Do the data back up that idea? Let’s look at the estimates of the sightability random effect for each of these three animals. In the next plot, I will show a histogram of the sightings random effect for all animals. I’ll denote the population median with a black line, and then highlight the three individuals you noted.
ng <- 25000
hist(suml / ng, breaks = 30, main = 'Sightings Random Effects', xlab = 'Sightability')
abline(v = median(suml / ng), lwd = 3)
abline(v = mean(suml[which(ID == 1077)] / ng), col = 'red', lwd = 2)
abline(v = mean(suml[which(ID == 1026)] / ng), col = 'green', lwd = 2)
abline(v = mean(suml[which(ID == 1021)] / ng), col = 'blue', lwd = 2)
leg.text <- c('All Animals', 1077, 1026, 1021)
legend(0.003, 400, leg.text, col = c('black', 'red', 'green', 'blue'), lty = 1, lwd = 2)
What does this tell us? Of these three, 1021 is the most sightable, 1077 is the least. Both 1021 and 1026 have higher than average sightability.
The way that these \(\lambda_i\)’s are calculated has to do with where they were seen (both actual observations and imputed ones) given the amount of effort in that area. So the sighting gap isn’t really factored into it explicitly, though it is clearly there implicity - note 1077’s long sighting gap and its low value for \(\lambda\).
One possible way to test this effect of sightability and survival further, I think, is to find animals with equivalent sightability estimates but who are in different health states at their last sighting. By this I mean that while 1077 has a long gap and low sightability, it was in poor health at its last sighting. So we’re not surprised to see death so quickly. We are surprised with 1026, but perhaps this also has to do with its sightability as I mentioned earlier. While it was in good health, given that we expect to see it a lot, it’s most likely dead. This makes sense to me intuitively, though obviously it is dying a lot faster then the 6 year presumed dead threshold.