Survival Data

We want to look at the effects of entanglement severity on survival of individual right whales. We also want to then group survival summaries by entanglement severity to see if/how survival is reduced as the severity of the entanglement increases.

What do the survival data look like? For each individual, we have an entry in the deathyr data frame, which contains the estimates of the month of death:

i <- 308
df <- data.frame(mprob = deathyr[i,][deathyr[i,] > 0], months = myName[deathyr[i,] > 0])
df

##    mprob  months
## 1    487  9-2001
## 2   1059 10-2001
## 3   4647 11-2001
## 4   9192 12-2001
## 5  11206  1-2002
## 6  10442  2-2002
## 7   7633  3-2002
## 8   3736  4-2002
## 9   1128  5-2002
## 10   282  6-2002
## 11    93  7-2002
## 12    51  8-2002
## 13    24  9-2002
## 14    10 10-2002
## 15     8 11-2002
## 16     2 12-2002

g <- ggplot(data = df, aes(1:nrow(df), mprob))+
  geom_bar(stat = 'identity')+
  scale_x_discrete(labels = df$months)+
  xlab('Month-Year')+ 
  theme(axis.text.x  = element_text(angle=45, vjust=0.5, size=16))+
  ggtitle(paste('EGNo: ', ID[i], sep = ''))
print(g)

Survival Vector over Time

So far in the plotting code to get the survival vector, we’ve done this:

svec <- 1 - cumsum(deathyr[i,1:nt]/ngg)
t1   <- min(which(apply(rsum[i,,],1,sum)/ngg > 0))
svec[1:(t1-1)] <- 0
ti <- which(svec > 0)
round(svec[ti], 3)

##   [1] 1.000 1.000 1.000 1.000 1.000 1.000 1.000 1.000 1.000 1.000 1.000
##  [12] 1.000 1.000 1.000 1.000 1.000 1.000 1.000 1.000 1.000 1.000 1.000
##  [23] 1.000 1.000 1.000 1.000 1.000 1.000 1.000 1.000 1.000 1.000 1.000
##  [34] 1.000 1.000 1.000 1.000 1.000 1.000 1.000 1.000 1.000 1.000 1.000
##  [45] 1.000 1.000 1.000 1.000 1.000 1.000 1.000 1.000 1.000 1.000 1.000
##  [56] 1.000 1.000 1.000 1.000 1.000 1.000 1.000 1.000 1.000 1.000 1.000
##  [67] 1.000 1.000 1.000 1.000 1.000 1.000 1.000 1.000 1.000 1.000 1.000
##  [78] 1.000 1.000 1.000 1.000 1.000 1.000 1.000 1.000 1.000 1.000 1.000
##  [89] 1.000 1.000 1.000 1.000 1.000 1.000 1.000 1.000 1.000 1.000 1.000
## [100] 1.000 1.000 1.000 1.000 1.000 1.000 1.000 1.000 1.000 1.000 1.000
## [111] 1.000 1.000 1.000 1.000 1.000 1.000 1.000 1.000 1.000 1.000 1.000
## [122] 1.000 1.000 1.000 1.000 1.000 1.000 1.000 1.000 1.000 1.000 0.981
## [133] 0.938 0.752 0.385

Is what you are proposing initially to simply loop over each animal, and take the survival product of svec? That would look like:

prod(svec[ti])

## [1] 0.2661259

We would then build that up for each individual, and get average survival…? For example, here are 100 animals.

idx <- sample(n, 100)
df <- deathyr[idx, ]
svals <- numeric(length(idx))
tst <- idx[c(1, 2, 101)]
for(i in 1:length(idx)){

  svec <- 1 - cumsum(deathyr[idx[i],1:nt]/ngg)
  t1   <- min(which(apply(rsum[idx[i],,],1,sum)/ngg > 0))
  svec[1:(t1-1)] <- 0
  ti <- which(svec > 0)
  svals[i] <- prod(svec[ti])
  
}
summary(svals)

##     Min.  1st Qu.   Median     Mean  3rd Qu.     Max. 
## 0.006505 0.286400 1.000000 0.747200 1.000000 1.000000

hist(svals)

Survival with Uncertainty

However, if I understood your email correctly, this calculation of survival doesn’t fully account for the uncertainty in deathyr[i,], because cumsum() on that vector will always return the same values. Perhaps what you mean is that you sample from the candidate estimates death months in in deathyr[i,] to get a death month/year combination, truncate svec to that, and then calculate? Something along these lines for a single individual:

i <- 308
df <- deathyr[i,]
idx <- which(df > 0) # these are the candidate months in which the animal died
cprob <- df[idx] / sum(df[idx])
print(idx)

##  [1] 381 382 383 384 385 386 387 388 389 390 391 392 393 394 395 396

nsamp <- 2000
svecOut <- rep(NA, nsamp)

for(j in 1:nsamp) {
  sval <- idx[which(rmultinom(1, 1, cprob) == 1)]
  svec <- 1 - cumsum(deathyr[i, 1:sval] / ngg)
  t1   <- min(which(apply(rsum[i,,],1,sum)/ngg > 0))
  svec[1:(t1-1)] <- 0
  ti <- which(svec > 0)
  svecOut[j] <- prod(svec[ti])
  
}
summary(svecOut)

##    Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
##  0.2661  0.2661  0.2661  0.3256  0.2661  0.9805

And since that’s just for one individual, I’d then repeat that for each individual storing the results of svec[ti] in a list because it’s a different length each time (because of sampling from idx). Then I’d calculate the summary statistics by entanglement class: minor, moderate, and severe? Is that what you proposed?

Survival Analyses

Rob Schick

2015-08-20

Survival Data

Survival Vector over Time

Survival with Uncertainty