Results for Health paper - Part Two

I’ll try to interweave results - tabular and graphical - for the things you’ve highlighted in the revised outline (dated 28 May 2014). This will now include information from the last punch list.

Removing Estimated Dead Animals

First thing that we wanted to look at is the removal of certain time frames of health profiles for individuals that are estimated to be dead. The easiest way to do this is to look at all surviving animals. To do that, we’ll use the death variable, which is output from the model. Anything that has a value lower than nt will be removed, and then we’ll summarise the remaining data. That then will be the lower threshold.

While we’re tallying health, we’ll then find the first instance in each dying animal’s health trace that goes below the threshold. Any health values from that point forward will be set to NA. So let’s load the data:

## 
## Attaching package: 'plyr'
## 
## The following object is masked from 'package:lubridate':
## 
##     here
## 
## 
## Attaching package: 'dplyr'
## 
## The following objects are masked from 'package:plyr':
## 
##     arrange, desc, failwith, id, mutate, summarise, summarize
## 
## The following objects are masked from 'package:lubridate':
## 
##     intersect, setdiff, union
## 
## The following objects are masked from 'package:stats':
## 
##     filter, lag
## 
## The following objects are masked from 'package:base':
## 
##     intersect, setdiff, setequal, union

First up, what do the distribution of deaths look like?

hist(death, breaks = 50)
abline(v = nt, lwd = 3)
abline(h = c(5, 10), lty = 2)

plot of chunk unnamed-chunk-2

Recall that anything over 500 on the x-axis is considered alive. Dashed horizontal lines are at 5 and 10, respectively. This seems reasonable as a first cut (to me at least). Now let’s use that information to retain only the alive individuals in order to summarise their health. To do this, we’ll simply set the dead animals to NA.

mpophealth <- sumh / ng
didx <- death < nt # these are animals imputed to die before the end of the model run
mpophealth[didx,] <- NA # I then set these to NA
mpophealth[mpophealth == 0] <- NA # I then set the times before firstSight to NA

Ok with that mpophealth variable set, we can summarise the distribution of health values for all animals that are currently imputed to be alive.

quantile(mpophealth, na.rm = TRUE)
##    0%   25%   50%   75%  100% 
## 19.58 71.27 78.63 83.72 93.43

So as you can see from the quantiles there is a minimum of 19.5752 but that’s just one animal. If you look at the histogram, it looks like a value of 32.5 is reasonable (that’s where the vertical line is drawn).

hist(mpophealth, breaks = 30)
abline(v = 32.5, lwd = 2)

plot of chunk healthSummaryPlot

We can experiment with that value, but we’ll start there. Let’s make that a matrix (pophealth) that we can use throughout the rest of the document.

hthresh <- 32.5
mpophealth <- sumh / ng
mpophealth[mpophealth < hthresh] <- NA # I then set all values less than this threshold to NA

Resting Females: Successful vs. Unsuccessful Comparisons

First thing we’ll look at are the numbers from the two comparisons we make for resting and for available females. This comparison is to look at the health of successful vs. unsuccessful females. I’ll show the plots, the statistics, and the tallies for the different groups.

Per Roz’s email (4 September 2014), we need to remove the trailing `available’ times. That is, assume that we only know the animal’s status within a known calving/reproductive window, and don’t assume that the years following a last calving are to be classified as ‘available.’ We’ll only have the first year after the last known calving be available.

To make the comparisons, we need to loop over years, and then find out two groups of individuals: 1) those available in a given year and pregnant the next year; and 2) those available in a given year, but still available in the next year.

Those data look like this:

head(pchealth)
##   year navail npregNext nhealth phealth
## 1 1985     12         4   75.30   70.88
## 2 1986      7         2   75.45   82.96
## 3 1987     12         7   81.62   74.77
## 4 1988     10         7   82.14   82.63
## 5 1989      6         4   74.18   79.94
## 6 1990     15         7   77.17   79.34

In the pchealth data frame, the variables are as follows:

  • navail is the sum of 3 to 2, and 3 to 4 animals, i.e. resting to pregnant, resting to available, respectively.
  • npregNext is the sum of the 3 to 2 animals.
  • phealth is the health of the animals who became pregnant
  • nphealth is the health of the animals who DID NOT became pregnant

We can plot the numbers easily to see how many are in each group:

plot of chunk unnamed-chunk-6

## pdf 
##   2

You asked for a summary of these two time series (Resting to not-pregnant; Resting to pregnant) (item # 11):

Statistic Not Pregnant Pregnant
Mean 75.3095 74.4993
SD 4.7499 4.9341

I’ll write these out to a csv file for further comparison:

write.csv(pchealth, file = '/Users/rob/Dropbox/Papers/RollandEtAl_RightWhaleHealth/dataJuly2014/Summary_RestingPregnantComparison.csv')

But we also want to see if these are different. To do that, we should work with the raw data, not the mean summaries. So we need to grab those. I’ll do this by looping over year as before, but instead of grabbing the mean, I’ll just grab the whole year’s health values:

This yields a data frame where each row has one individual’s health for a given year, along with its status. Here’s what the first few lines of the data look like:

##      EGNo   Jan   Feb   Mar   Apr   May   Jun   Jul   Aug   Sep   Oct
## 1118 1118 74.48 72.93 71.76 70.96 70.53 70.34 70.39 70.74 71.36 72.25
## 1127 1127 68.13 68.80 69.70 70.48 71.11 71.66 72.14 72.51 72.82 73.07
## 1135 1135 65.68 65.48 65.30 65.66 66.01 66.41 66.91 67.54 68.35 69.32
## 1157 1157 68.20 68.79 69.59 70.69 71.65 72.48 73.19 73.84 74.47 75.06
## 1142 1142 69.42 70.91 72.63 74.71 76.49 78.05 79.32 80.31 81.13 81.66
## 1145 1145 69.99 70.58 71.42 72.19 72.85 73.45 74.02 74.46 74.85 75.16
##        Nov   Dec year status
## 1118 73.06 73.85 1985     32
## 1127 73.20 73.33 1985     32
## 1135 70.14 70.81 1985     32
## 1157 75.66 76.22 1985     32
## 1142 82.05 82.29 1985     34
## 1145 75.49 75.76 1985     34

In the above, status of 32 means they were resting one year, and pregnant the next. Status of 34 means resting one year, but available the next.

Let’s plot out the densities of these to check for normality:

plot of chunk unnamed-chunk-10

## pdf 
##   2

They seem to be left skewed to me. But here’s the t.test comparing their means (n.b. I looked in R’s help, and these 95% CIs are for the sample estimates as we presumed yesterday) are. You can see that the ones that transition to pregnant are healthier:

t.test(as.vector(pchealthAll[pchealthAll$status == 32, 2:13]), as.vector(pchealthAll[pchealthAll$status == 34, 2:13]))
## 
##  Welch Two Sample t-test
## 
## data:  as.vector(pchealthAll[pchealthAll$status == 32, 2:13]) and as.vector(pchealthAll[pchealthAll$status == 34, 2:13])
## t = 1.131, df = 3045, p-value = 0.2581
## alternative hypothesis: true difference in means is not equal to 0
## 95 percent confidence interval:
##  -0.2594  0.9668
## sample estimates:
## mean of x mean of y 
##     73.17     72.82

I’ll write out the file for you to have the raw data.

write.csv(pchealthAll, file = '/Users/rob/Dropbox/Papers/RollandEtAl_RightWhaleHealth/dataJuly2014/RestingPregnantComparisonRawData.csv')

Available Females: Successful vs. Unsuccessful Comparisons

Now let’s look at the available females in the same way.

##   year navail npregNext nhealth phealth
## 1 1985      5         2   78.82   79.23
## 2 1986     11         3   80.65   79.30
## 3 1987     13         6   78.84   79.94
## 4 1988     12         3   79.32   75.41
## 5 1989     12         9   75.56   80.93
## 6 1990      5         2   65.11   78.00

In the avchealth data frame, the variables are as follows:

  • navail is the sum of 4 to 2, and 4 to 4 animals, i.e. available to pregnant, available to available, respectively.
  • npregNext is the sum of the 4 to 2 animals.
  • phealth is the health of the animals who became pregnant
  • nphealth is the health of the animals who DID NOT became pregnant

You asked for a summary of these two time series (Available to not-pregnant; Available to pregnant) (item # 11):

Statistic Not Pregnant Pregnant
Mean 73.5356 75.4246
SD 5.9973 4.8711

I’ll write out the file for you:

write.csv(avchealth, file = '/Users/rob/Dropbox/Papers/RollandEtAl_RightWhaleHealth/dataJuly2014/Summary_AvailablePregnantComparison.csv')

And then we can plot the numbers in each category by year.

plot of chunk unnamed-chunk-15

## pdf 
##   2

And plot the yearly mean scores of health:

plot of chunk unnamed-chunk-16

## pdf 
##   2
## pdf 
##   2
## Warning: Removed 1 rows containing missing values (geom_path).

But we want to look at the raw data as well so we’ll repeat that loop storing the actual health, not the means.

And on that dataset we’ll run the t-test:

t.test(as.vector(avchealth[avchealth$status == 42, 2:13]), as.vector(avchealth[avchealth$status == 44, 2:13]))
## 
##  Welch Two Sample t-test
## 
## data:  as.vector(avchealth[avchealth$status == 42, 2:13]) and as.vector(avchealth[avchealth$status == 44, 2:13])
## t = 4.787, df = 4439, p-value = 1.751e-06
## alternative hypothesis: true difference in means is not equal to 0
## 95 percent confidence interval:
##  0.9356 2.2338
## sample estimates:
## mean of x mean of y 
##     74.55     72.96

As before, there are significant differences in the ones who transition to pregnant; they are healthier than those who transition to available.

Here are the density plots of these data:

plot of chunk unnamed-chunk-20

## pdf 
##   2

These too are left skewed, and I’ll write them out to you for further comparison.

write.csv(avchealth, file = '/Users/rob/Dropbox/Papers/RollandEtAl_RightWhaleHealth/dataJuly2014/AvailablePregnantHealthComparisonsRawData.csv')

Health of Available Animals vs. Calving Rates

Here let’s plot out a scatter plot of health of those available in the current year vs. the ratio of # of calves next year / # available females this year. The filled circles is the health of those who got pregnant; the empty circles are those who didn’t:

plot of chunk unnamed-chunk-22

## pdf 
##   2

Master Plot of Health of Available Animals (Including First Calf) 1 & 2 Years Prior

I’m altering this section somewhat from the first version of this document. Now I want to look at the distribution of health scores in two diffrent perods: 1) the year one years prior to having a calf; and 2) the year two years prior to having a calf. This corresponds to item #10 in the most recent punch list.

First thing we need to do is grab the data and mold it into a useful form. The logic here is that we’ll take the master data frame summary containing all individual health profiles, find the times that are one year prior and then two years prior, and then set everything else to NA.

p.idx <- which(calves == 1, arr.ind = TRUE) # this is the pregnancy year
a1idx <- a2idx <- p.idx
a2idx[, 2] <- p.idx[, 2] - 12 # this is the pregnancy year minus one year
a1idx <- a1idx[which(a1idx[,2] > 0),]
a2idx <- a2idx[which(a2idx[,2] > 0),]

pop.idx <- a1idx
pophealth <- mpophealth
psub1 <- matrix(NA, dim(pophealth)[1], dim(pophealth)[2])
psub1[pop.idx] <- pophealth[pop.idx] 
psub1[psub1 < 5] <- NA # This is to remove health of females that are before firstSight

pop.idx <- a2idx
pophealth <- mpophealth
psub2 <- matrix(NA, dim(pophealth)[1], dim(pophealth)[2])
psub2[pop.idx] <- pophealth[pop.idx] 
psub2[psub2 < 5] <- NA 

df1 <- data.frame(year = 'year1', data = quantile(psub1, c(0.05, 0.25, 0.5, 0.75, 0.95), na.rm = TRUE))
df2 <- data.frame(year = 'year2', data = quantile(psub2, c(0.05, 0.25, 0.5, 0.75, 0.95), na.rm = TRUE))
df <- cbind(df1, df2)
df
##      year  data  year  data
## 5%  year1 52.69 year2 54.86
## 25% year1 67.53 year2 73.99
## 50% year1 72.23 year2 77.70
## 75% year1 79.07 year2 81.66
## 95% year1 86.68 year2 86.40

And then we can visualise the histogram and density plot (I’ll facet by year):

## stat_bin: binwidth defaulted to range/30. Use 'binwidth = x' to adjust this.
## stat_bin: binwidth defaulted to range/30. Use 'binwidth = x' to adjust this.

plot of chunk plotAvailOneTwo

## stat_bin: binwidth defaulted to range/30. Use 'binwidth = x' to adjust this.
## stat_bin: binwidth defaulted to range/30. Use 'binwidth = x' to adjust this.
## pdf 
##   2

Population Health Scores

Ok, let’s look at the population level health scores. First, I need to factor out the young juveniles, and then marry up the juvenile times with the adult times for a consecutive health profile.

##   SightingEGNo   min_date   max_date MinDate MaxDate MinDateInt MaxDateInt
## 1         1005 1993-08-27 1997-08-23  8-1993  8-1997        284        332
## 2         1017 1988-04-03 2011-03-17  4-1988  3-2011        220        495
## 3         1019 1988-05-05 2009-07-18  5-1988  7-2009        221        475
## 4         1021 1988-04-03 1998-08-04  4-1988  8-1998        220        344
## 5         1026 1989-08-28 2009-09-03  8-1989  9-2009        176        477
## 6         1028 1988-09-02 2008-10-12  9-1988 10-2008        225        466

Then let’s get the population health data set up using these individual and time indices

And then plotted:

plot of chunk unnamed-chunk-25

## pdf 
##   2

Ok, then what are the summary stats of these data, and what do they look like? The Fivenum below summary contains: minimum, lower-hinge, median, upper-hinge, maximum.

fivenum(pophealth)
## [1] 32.56 68.64 78.66 84.12 88.81

And then the histogram of the population health data:

## stat_bin: binwidth defaulted to range/30. Use 'binwidth = x' to adjust this.

plot of chunk unnamed-chunk-27

## stat_bin: binwidth defaulted to range/30. Use 'binwidth = x' to adjust this.
## pdf 
##   2

Note that the 0’s are likely from animals that will be estimated as dead. We can talk about pulling out the health values of the animals that the model has estimated to be dead.

Finally, let’s add the calving #’s on top of the health plot to round out the picture a little bit.

plot of chunk unnamed-chunk-28plot of chunk unnamed-chunk-28plot of chunk unnamed-chunk-28

## pdf 
##   2

Sub-population Health

Ok, that was the population level health. What about some of the sub-population categories? We can turn to these now. I’ll leave some of the code out since it’s just a repeat of what we did to generate the population level values. We need summaries for: * Adult males * Young juvs * Old juvs * Pregnant * Lactating * Resting * Available

plot of chunk adultMales

## pdf 
##   2

plot of chunk youngJuveniles

## pdf 
##   2

For Young Juveniles we need to see why the values are tailing off in 2009 and 2010. Easiest way to do this is to look at summaries of the body and skin condition for these time periods. I’ll start in 2008 as that is what appears to be the peak in the late 00’s. Here’s body, followed by skin

jidx <- jd1.idx[jd1.idx$MinDateInt > 456, ]
sko <- bobs <- rep(NA, nrow(jidx))
for(i in 1:nrow(jidx)){
  bobs[i] <- sum(!is.na(hobs[which(ID == jidx$SightingEGNo[i]), jidx$MinDateInt[i]:jidx$MaxDateInt[i]] ))
  sko[i] <- sum(!is.na(skobs[which(ID == jidx$SightingEGNo[i]), jidx$MinDateInt[i]:jidx$MaxDateInt[i]] ))
}
jidx$bobs <- bobs
jidx$skin <- sko
jidx[,c(1,2,3,8,9)]
##     SightingEGNo   min_date   max_date bobs skin
## 242         3701 2008-05-07 2009-09-17    0    0
## 243         3705 2008-02-03 2009-09-24    0    0
## 244         3710 2008-02-03 2009-01-30    0    0
## 245         3712 2008-02-24 2009-11-07    0    0
## 246         3714 2008-01-24 2009-12-21    2    2
## 247         3720 2008-01-13 2009-09-02    0    0
## 248         3725 2008-02-14 2009-08-28    0    0
## 249         3730 2008-02-05 2009-09-17    0    0
## 250         3740 2008-02-20 2009-12-04    0    0
## 251         3742 2008-01-28 2009-12-31    0    0
## 252         3745 2008-01-06 2009-12-22    0    0
## 253         3746 2008-02-16 2009-09-26    0    0
## 254         3750 2008-02-24 2009-09-26    0    0
## 255         3760 2008-05-01 2009-10-20    0    0
## 256         3770 2008-05-07 2009-11-07    0    0
## 257         3791 2008-01-14 2009-05-12    0    0
## 258         3792 2008-01-09 2009-12-22    0    0
## 259         3802 2009-02-12 2010-04-21    0    0
## 260         3803 2010-02-15 2010-08-19    0    0
## 261         3808 2009-02-09 2010-04-18    0    0
## 262         3810 2009-02-07 2010-04-21    0    0
## 263         3812 2009-02-13 2010-05-18    0    0
## 264         3815 2009-01-14 2010-08-31    0    0
## 265         3820 2009-03-23 2010-08-11    0    0
## 266         3821 2009-03-24 2010-04-20    0    0
## 267         3822 2009-02-08 2010-04-13    0    0
## 268         3823 2009-06-07 2010-01-15    0    0
## 269         3830 2009-01-21 2010-01-28    0    0
## 270         3832 2009-02-10 2009-09-21    0    0
## 271         3843 2009-02-13 2010-02-14    0    0
## 272         3845 2009-02-05 2010-04-20    0    0
## 273         3850 2009-02-18 2010-01-19    0    0
## 274         3853 2009-01-02 2009-12-26    0    0
## 275         3890 2009-01-10 2010-04-13    0    0
## 276         3892 2009-01-05 2010-04-13    0    0
## 277         3893 2009-02-08 2010-05-04    0    0
## 278         3903 2010-01-27 2010-02-17    0    0
## 279         3904 2010-03-14 2010-10-24    0    0
## 280         3908 2010-01-18 2011-04-22    0    0
## 281         3911 2010-02-19 2011-02-03    4    4
## 282         3915 2010-01-19 2011-03-17    0    0
## 283         3930 2010-01-25 2011-04-22    0    0
## 284         3934 2010-01-28 2010-02-17    0    0
## 285         3940 2010-01-15 2010-05-18    0    0
## 286         3945 2010-01-03 2010-04-21    0    0
## 287         3946 2011-03-17 2011-03-17    0    0
## 288         3950 2010-01-28 2010-08-22    0    0
## 289         3951 2010-01-07 2011-02-17    0    0
## 290         3966 2010-01-18 2010-05-16    0    0
## 291         3981 2010-01-28 2010-03-07    0    0
## 292         3990 2010-01-28 2010-10-20    0    0

So as you can see, there’s not a lot of VHP data here. The number in the body and skin columns corresponds to the number of cases within that date range that we have an observation for that parameter.

Next we’ll move on to looking at oldJuveniles

plot of chunk oldJuveniles

## pdf 
##   2

And now let’s look at the different reproductive Female categories. For the pregnant females, I can use the existing time query, but I’ll alter it for lactating, resting, and available per our discussions over email. Here are the pregnant females:

plot of chunk pregFemales

## pdf 
##   2

Now let’s adjust the query for the other times in the reproductive life cycle. Note that I won’t print the code. it’s hidden, but here if you want it.

With the calving status updated to remove the trailing available times, we can look at the remaining three categories: plot of chunk lacFemales

## pdf 
##   2

plot of chunk restFemales

## pdf 
##   2

plot of chunk avFemales

## pdf 
##   2

This next block of code just plots the individual traces for 2006 for the available females, as it appears that the median for that first year is quite low (shown in the preceding plot).

avchealth <- numeric(0)
idx <- which(monYr[,2] == 2006)
cavpreg  <- unique(which(newCalves[, idx] == 4 ,arr.ind = TRUE)[,1]) 
df <- mpophealth[cavpreg, idx]
colnames(df) <- 1:12
df <- data.frame(df, ID = row.names(df))
dfL <- melt(df, id.vars = 'ID', measure.vars = 1:12)
p <- ggplot(data = dfL, aes(x = variable, y = value, group = ID))+
  geom_line()
print(p)

plot of chunk testAvail

Indeed if you look at the health traces for 2006, you can see that there are 4 animals with January 2006 health values below 50. Three animals start in the mid-50’s, but go on to recover throughout the year.

Correlations between Health in Available & Resting Females and Calving Ratio

The correlation between health of all available females in one year and calves/year (in the following year) is set up with this chunk of code:

cvec <- table(calfTable$Calving.Year)
cvec <- cvec[which(names(cvec) == 1986):(length(cvec)-1)]
health <- dfavfem[dfavfem$category == 'avfemHealth',-c(1, ncol(dfavfem))]
cvs <- data.frame(calfyr = names(cvec), numcalf = cvec)
hvs <- data.frame(hyear = names(health), hvals = t(health))
dfcor <- cbind(hvs[1:24,], cvs)
head(dfcor)
##      hyear    X1 calfyr numcalf
## 1985  1985 77.48   1986      13
## 1986  1986 79.55   1987      11
## 1987  1987 78.74   1988       7
## 1988  1988 80.19   1989      16
## 1989  1989 79.31   1990      12
## 1990  1990 77.85   1991      17
# cor(dfcor[,'health'], dfcor[,'numcalf'], use = "complete.obs")

Note that I’ve done this quickly, and I’m somewhat tired. So I’ll need to go back and check the logic. But it looks right so far. The correlation is weak (-0.1173), but I still like Scott’s line of thinking about making the link (at least in discussion form) about this and pop consequences.

Health in Calving Females 1 and 2 Years Prior

This corresponds to item #10 in the punch list. We have to generate the data by determining the individuals who gave birth in any given year, and then looking at their health one and two years prior. We’ll loop over years to do this.

cvec <- table(calfTable$Calving.Year)
cvec <- cvec[which(names(cvec) == 1985):(length(cvec)-1)]
yrseq <- seq(1985, 2009)
health1 <- health2 <- rep(NA, length.out = length(yrseq))
for(i in 1:length(yrseq)){
  inds <- unique(calfTable[calfTable$Calving.Year == yrseq[i], 'EGNo']) # These are the moms who gave birth in one year
  r.idx <- which(rownames(mpophealth) %in% inds) # these are their positions in the health data frame
  yr1idx <- which(monYr[,2] == yrseq[i]) - 12 # indices to get health 1 year prior
  yr2idx <- which(monYr[,2] == yrseq[i]) - 24 # indices to get health 2 years prior
  health1[i] <- mean(mpophealth[r.idx, yr1idx], na.rm = TRUE)
  health2[i] <- mean(mpophealth[r.idx, yr2idx], na.rm = TRUE)
}
df <- data.frame(year = yrseq, calves = cvec, hYear1 = health1, hYear2 = health2)
head(df)
##      year calves hYear1 hYear2
## 1985 1985     11  72.31  76.80
## 1986 1986     13  79.64  80.81
## 1987 1987     11  79.95  76.82
## 1988 1988      7  74.60  77.66
## 1989 1989     16  80.04  79.39
## 1990 1990     12  74.19  75.21

You asked for a correlation for these. For # Calves and health one year prior, the correlation is: -0.3815. For # Calves and health two years prior, the correlation is: -0.3682.

I’ll write these out to you for work in excel.

write.csv(df, file = '/Users/rob/Dropbox/Papers/RollandEtAl_RightWhaleHealth/dataJuly2014/CalvesHealthOneYearTwoYearPrior.csv')

Health in Different Groups

This is a new test to see if the different groups are statistically different. We’ll run an ANOVA on the different groups yearly avaerages. If this isn’t quite what you are after, let me know.

health <- c(adMalesHealth, yjuvsHealth, ojuvsHealth, pfemHealth, lacfemHealth, restfemHealth, avfemHealth)
group <- rep(c('AdMales', 'yJuvs', 'oJuvs', 'pfem', 'lacfem', 'restfem', 'avfem'), each = length(lacfemHealth))
df <- data.frame(health = health,
                 group = group)
aov.ex1 = aov(health ~ group, data = df)
summary(aov.ex1)                             
##              Df Sum Sq Mean Sq F value Pr(>F)    
## group         6   1577   262.9    12.6  1e-11 ***
## Residuals   167   3475    20.8                   
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 1 observation deleted due to missingness
print(model.tables(aov.ex1,"means"),digits=3)
## Tables of means
## Grand mean
##       
## 77.98 
## 
##  group 
##     AdMales avfem lacfem oJuvs pfem restfem yJuvs
##        78.5  77.5   73.6  81.2 77.9    74.6  82.6
## rep    25.0  24.0   25.0  25.0 25.0    25.0  25.0
boxplot(health ~ group, data = df)

plot of chunk anova

Health in Different Groups in Good vs. Bad Years

So now we want to examine the median health of animals in different sub groups during 3 time periods:

  • 1992-1994
  • 1997-1999
  • 2004-2006

We also want them for the three decades. So let’s set these up as indices we can recycle below:

idx92 <- which(monYr[,2] == 1993 | monYr[,2] == 1994 | monYr[,2] == 1995)
idx97 <- which(monYr[,2] == 1998 | monYr[,2] == 1999 | monYr[,2] == 2000)
idx04 <- which(monYr[,2] == 2004 | monYr[,2] == 2005 | monYr[,2] == 2006)
dec1 <- which(monYr[,2] >= 1980 & monYr[,2] <= 1989)
dec2 <- which(monYr[,2] >= 1990 & monYr[,2] <= 1999)
dec3 <- which(monYr[,2] >= 2000 & monYr[,2] <= 2009)

First let’s look at the population level health, and show it with density plots of the yearly average plot of chunk population

## pdf 
##   2

Then we’ll look at Adult Males, juveniles, pregnant, lactating, resting, & available females. Note that I’ve updated this chunk of code to include the mean’s and SDs in the different periods. Between the plots I’ll put in tables with the values (means & SDs). I’ll also put in an ANOVA table of the health in the different periods.

Adult Males - Density Plots

plot of chunk admales

## pdf 
##   2

Adult Males - 3 Periods

Year Mean SD
1993-1995 77.4949 10.8418
1998-2000 71.0307 12.9645
2004-2006 72.1952 12.1114
##                Df  Sum Sq Mean Sq F value Pr(>F)    
## group           2  203441  101720     700 <2e-16 ***
## Residuals   29443 4276964     145                   
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## Tables of means
## Grand mean
##       
## 73.29 
## 
##  group 
##     Period1 Period2 Period3
##        77.5      71    72.2
## rep  8063.0    9024 12359.0

plot of chunk adMale3perANOVA

Adult Males - 3 Decades

Year Mean SD
1980-1989 80.1518 8.9888
1990-1999 75.2502 12.2764
2000-2009 73.065 12.7411
##                Df   Sum Sq Mean Sq F value Pr(>F)    
## group           2   480106  240053    1649 <2e-16 ***
## Residuals   79308 11547177     146                   
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## Tables of means
## Grand mean
##       
## 74.92 
## 
##  group 
##     Decade1 Decade2 Decade3
##        80.2    75.3    73.1
## rep 12424.0 27045.0 39842.0

plot of chunk adMale3decANOVA

Young Juveniles - Density Plots

plot of chunk yJuvs

## pdf 
##   2

Young Juveniles - 3 Periods

Year Mean SD
1993-1995 76.8179 11.0619
1998-2000 70.843 13.2567
2004-2006 71.2244 12.2402
##                Df  Sum Sq Mean Sq F value Pr(>F)    
## group           2  156816   78408     524 <2e-16 ***
## Residuals   21792 3261471     150                   
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## Tables of means
## Grand mean
##       
## 72.85 
## 
##  group 
##     Period1 Period2 Period3
##        76.8    70.8    71.2
## rep  6823.0  7095.0  7877.0

plot of chunk yjuvs3perANOVA

Young Juveniles - 3 Decades

Year Mean SD
1980-1989 80.9409 8.4229
1990-1999 75.1113 12.3217
2000-2009 72.3887 12.7387
##                Df  Sum Sq Mean Sq F value Pr(>F)    
## group           2  784196  392098    2928 <2e-16 ***
## Residuals   66628 8922320     134                   
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## Tables of means
## Grand mean
##       
## 75.65 
## 
##  group 
##     Decade1 Decade2 Decade3
##        80.9    75.1    72.4
## rep 18060.0 23001.0 25570.0

plot of chunk yJuvs3decANOVA

Old Juveniles - Density Plots

plot of chunk oldJuvs

## pdf 
##   2

Old Juveniles - 3 Periods

Year Mean SD
1993-1995 77.3554 10.9533
1998-2000 70.8807 13.1351
2004-2006 71.0913 12.5323
##                Df  Sum Sq Mean Sq F value Pr(>F)    
## group           2  212034  106017     708 <2e-16 ***
## Residuals   23594 3532958     150                   
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## Tables of means
## Grand mean
##      
## 73.1 
## 
##  group 
##     Period1 Period2 Period3
##        77.4    70.9    71.1
## rep  7821.0  7583.0  8193.0

plot of chunk ojuvs3perANOVA

Old Juveniles - 3 Decades

Year Mean SD
1980-1989 80.9098 8.5936
1990-1999 75.5473 12.2218
2000-2009 72.1267 13.1398
##                Df   Sum Sq Mean Sq F value Pr(>F)    
## group           2   862250  431125    3102 <2e-16 ***
## Residuals   72143 10027099     139                   
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## Tables of means
## Grand mean
##      
## 75.6 
## 
##  group 
##     Decade1 Decade2 Decade3
##        80.9    75.5    72.1
## rep 18571.0 25509.0 28066.0

plot of chunk oJuvs3decANOVA

Pregnant Females - Density Plots

plot of chunk pregFemales3per

## pdf 
##   2

Pregnant Females - 3 Periods

Year Mean SD
1993-1995 78.929 6.5579
1998-2000 76.7403 7.6905
2004-2006 71.2797 8.8011
##               Df Sum Sq Mean Sq F value Pr(>F)    
## group          2  18899    9450     147 <2e-16 ***
## Residuals   1678 108026      64                   
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## Tables of means
## Grand mean
##       
## 74.54 
## 
##  group 
##     Period1 Period2 Period3
##        78.9    76.7    71.3
## rep   409.0   432.0   840.0

plot of chunk pfem3perANOVA

Pregnant Females - 3 Decades

Year Mean SD
1980-1989 77.6663 7.467
1990-1999 78.1406 7.113
2000-2009 73.137 10.6118
##               Df Sum Sq Mean Sq F value Pr(>F)    
## group          2  29280   14640     171 <2e-16 ***
## Residuals   5175 444107      86                   
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## Tables of means
## Grand mean
##       
## 75.29 
## 
##  group 
##     Decade1 Decade2 Decade3
##        77.7    78.1    73.1
## rep  1157.0  1177.0  2844.0

plot of chunk pfem3decANOVA

Lactating Females - Density Plots

plot of chunk lacFemales3per

## pdf 
##   2

Lactating Females - 3 Periods

Year Mean SD
1993-1995 75.0378 8.9767
1998-2000 75.8439 5.8223
2004-2006 68.3137 7.7374
##               Df Sum Sq Mean Sq F value Pr(>F)    
## group          2  12048    6024    97.6 <2e-16 ***
## Residuals   1102  68021      62                   
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## Tables of means
## Grand mean
##       
## 70.65 
## 
##  group 
##     Period1 Period2 Period3
##          75    75.8    68.3
## rep     249   120.0   736.0

plot of chunk lacfem3perANOVA

Lactating Females - 3 Decades

Year Mean SD
1980-1989 75.2867 9.8549
1990-1999 73.4285 8.9645
2000-2009 70.5919 10.8319
##               Df Sum Sq Mean Sq F value Pr(>F)    
## group          2  19082    9541    92.4 <2e-16 ***
## Residuals   4958 512005     103                   
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## Tables of means
## Grand mean
##       
## 72.38 
## 
##  group 
##     Decade1 Decade2 Decade3
##        75.3    73.4    70.6
## rep  1122.0  1274.0  2565.0

plot of chunk lacfem3decANOVA

Resting Females - Density Plots

plot of chunk restFemales3per

## pdf 
##   2

Resting Females - 3 Periods

Year Mean SD
1993-1995 74.5279 8.9588
1998-2000 69.4645 9.7953
2004-2006 67.4387 10.0995
##               Df Sum Sq Mean Sq F value Pr(>F)    
## group          2   9581    4791    49.9 <2e-16 ***
## Residuals   1252 120146      96                   
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## Tables of means
## Grand mean
##       
## 69.42 
## 
##  group 
##     Period1 Period2 Period3
##        74.5    69.5    67.4
## rep   264.0   305.0   686.0

plot of chunk restfem3perANOVA

Resting Females - 3 Decades

Year Mean SD
1980-1989 77.2302 7.2932
1990-1999 72.0866 10.2495
2000-2009 70.8518 10.1595
##               Df Sum Sq Mean Sq F value Pr(>F)    
## group          2  26901   13450     146 <2e-16 ***
## Residuals   4193 386819      92                   
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## Tables of means
## Grand mean
##       
## 72.71 
## 
##  group 
##     Decade1 Decade2 Decade3
##        77.2    72.1    70.9
## rep   968.0  1313.0  1915.0

plot of chunk restfem3decANOVA

Available Females - Density Plots

plot of chunk avFemales3per

## pdf 
##   2

Available Females - 3 Periods

Year Mean SD
1993-1995 75.7572 11.2671
1998-2000 71.3794 10.2091
2004-2006 67.1272 15.1667
##               Df Sum Sq Mean Sq F value Pr(>F)    
## group          2  22821   11410    87.8 <2e-16 ***
## Residuals   2685 348855     130                   
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## Tables of means
## Grand mean
##       
## 72.39 
## 
##  group 
##     Period1 Period2 Period3
##        75.8    71.4    67.1
## rep   984.0  1332.0   372.0

plot of chunk avfem3perANOVA

Available Females - 3 Decades

Year Mean SD
1980-1989 78.8526 4.9648
1990-1999 73.5189 11.6231
2000-2009 71.1501 12.9331
##               Df Sum Sq Mean Sq F value Pr(>F)    
## group          2  31359   15679     124 <2e-16 ***
## Residuals   5097 643149     126                   
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## Tables of means
## Grand mean
##       
## 73.72 
## 
##  group 
##     Decade1 Decade2 Decade3
##        78.9    73.5    71.2
## rep   828.0  2844.0  1428.0

plot of chunk avfem3decANOVA

Summary Table Data Prep

Ok, we need to make the data to write to excel. What we want is for each row to correspond to a population or sub-population category, and then each column to have the yearly summary of health (with CIs). I’ll copy this file, along with the others, to a “data” folder in the main dropbox paper folder.

dfAll <- rbind(dfPop, dfadMales, dfyjuves, dfojuves, dfpregfem, dflacfem, dfrestfem, dfavfem)
write.csv(dfAll, file = '/Users/rob/Dropbox/Papers/RollandEtAl_RightWhaleHealth/dataJuly2014/YearlyHealthSummariesByGroup.csv')

Pairwise Comparison within One Individual: EGNo 1245

Ok, this is just a one-off test of doing a comparison of the ‘before’ and ‘after’ test for one individual. In this case the before will be either resting or available years, and the after will be pregnant years. So 1245 has had 4 calves in these years:

calfTable[calfTable$EGNo == 1245,]
##     EGNo Calving.Year CalfNo CalvingInterval
## 167 1245         1996   2645              NA
## 168 1245         2001   3145               5
## 169 1245         2005   3545               4
## 170 1245         2008   3845               3
idx <- which(ID == 1245)

df <- data.frame(status = newCalves[idx,], dates = as.Date(paste(monYr[,1], 01 , monYr[,2], sep = '/'), format = "%m/%d/%Y"))
df <- df[which(df$status != 0),] 
p <- ggplot(df, aes(dates, status, group = status, colour = factor(status)))+
  geom_point()+
  labs(title = paste('EGNo: ', ID[idx], sep = ''), colour = 'status')+
  scale_colour_discrete(labels = c('Calving', 'Pregnant', 'Resting', 'Available'))
p

plot of chunk unnamed-chunk-31

Ok, let’s set up the vectors to get her health in these periods. What I will define as the successful transitions are either resting to pregnant (3 to 2) or available to pregnant (4 to 2). The unsuccessful transitions will be resting to available (3 to 4) or available to available (4 to 4).

Ok, now that we have the indices set up, let’s get the health enumerated.

Phew. Now that that is behind us, let’s examine health in good versus bad years for EGNo 1245:

t.test(success, failure, paired = TRUE)
## 
##  Paired t-test
## 
## data:  success and failure
## t = 4.432, df = 23, p-value = 0.0001918
## alternative hypothesis: true difference in means is not equal to 0
## 95 percent confidence interval:
##   8.562 23.550
## sample estimates:
## mean of the differences 
##                   16.06

Pairwise Comparison within All Individuals

This is item #12 from the punch list. This time it’s a proper paired t-test.

femID <- unique(sights[which(sights$Age == 'A' & sights$GenderCode == 'F'), 'SightingEGNo'])
outvals <- numeric(0)
success.aov <- failure.aov <- failure <- success <- success.all <- failure.all <- numeric(0)
for(j in 1:length(femID)){
  idx <- femID[j]
  ridx <- which(row.names(newCalves) == idx)
  
  df <- data.frame(year = monYr[,2], status = newCalves[ridx,], change = newCalves[ridx,] - lead(newCalves[ridx,]))
  rpidx <- monYr[which(df$status == 3 & df$change == 1), 2]
  apidx <- monYr[which(df$status == 4 & df$change == 2), 2]
  raidx <- monYr[which(df$status == 3 & df$change == -1), 2]
  avavidx <- numeric(0)
  yrseq <- 1980:2009
  for(i in 1:length(yrseq)){
    yidx <- which(monYr[,2] == yrseq[i])
    nyidx <- which(monYr[,2] == yrseq[i + 1])
    cidx <- which(newCalves[ridx, yidx] == 4 & newCalves[ridx, nyidx] == 4, arr.ind = TRUE)
    if(length(cidx) == 0){next} else {
    avavidx <- c(avavidx, yrseq[i])
    }
  }
  if( length(c(rpidx, apidx)) < 1 | length(c(raidx, avavidx)) < 1){next}
  pophealth <- mpophealth
  pophealth[which(pophealth < 5)] <- NA

  if(length(rpidx) > 0){rphealth <- data.frame(health = pophealth[ridx, which(monYr[, 2] == rpidx)], status = 32)}
  if(length(apidx) > 0){aphealth <- data.frame(health = pophealth[ridx, which(monYr[, 2] == apidx)], status = 42)}
  if(length(rpidx) > 0 & length(apidx) > 0){success <- c(rphealth$health, aphealth$health)} 
  if(length(rpidx) > 0 & length(apidx) == 0){success <- c(rphealth$health)} 
  if(length(rpidx) == 0 & length(apidx) > 0){success <- c(aphealth$health)} 
  
  if(length(raidx) > 0){rahealth <- data.frame(health = pophealth[ridx, which(monYr[, 2] == raidx)], status = 34)}
  if(length(avavidx) > 0){aahealth <- data.frame(health = pophealth[ridx, which(monYr[, 2] == avavidx)], status = 44)}
  if(length(avavidx) > 0 & length(raidx) > 0){failure <- c(rahealth$health, aahealth$health)}
  if(length(avavidx) > 0 & length(raidx) == 0){failure <- c(aahealth$health)}
  if(length(avavidx) == 0 & length(raidx) > 0){failure <- c(rahealth$health)}
  
  
success.all <- c(success.all, mean(success, na.rm = TRUE))
failure.all <- c(failure.all, mean(failure, na.rm = TRUE))

success.aov <- c(success.aov, success)
failure.aov <- c(failure.aov, failure)
}

And then let’s run the t-test. Here I’ll run two. The first is a simple paired with the idea that the means are different. The second specifies that the alternative hypothesis is that the health in success years is greater than health in failure years.

t.test(success.all, failure.all, paired = TRUE)
## 
##  Paired t-test
## 
## data:  success.all and failure.all
## t = 1.641, df = 54, p-value = 0.1066
## alternative hypothesis: true difference in means is not equal to 0
## 95 percent confidence interval:
##  -0.3252  3.2603
## sample estimates:
## mean of the differences 
##                   1.468
t.test(success.all, failure.all, paired = TRUE, alt = "greater")
## 
##  Paired t-test
## 
## data:  success.all and failure.all
## t = 1.641, df = 54, p-value = 0.05328
## alternative hypothesis: true difference in means is greater than 0
## 95 percent confidence interval:
##  -0.02894      Inf
## sample estimates:
## mean of the differences 
##                   1.468

I can also test these difference with an ANOVA, which will keep more of the data:

df <- data.frame(health = c(success.aov, failure.aov), group = rep(c('success', 'failure'), times = c(length(success.aov), length(failure.aov))))
aov.ex1 = aov(health ~ group, data = df)
summary(aov.ex1)                             
##               Df Sum Sq Mean Sq F value Pr(>F)    
## group          1   3094    3094    30.4  4e-08 ***
## Residuals   2334 237910     102                   
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 4 observations deleted due to missingness
print(model.tables(aov.ex1,"means"),digits=3)
## Tables of means
## Grand mean
##       
## 73.92 
## 
##  group 
##     failure success
##        72.8    75.2
## rep  1248.0  1088.0
boxplot(health ~ group, data = df)

plot of chunk anovaBeforeAfter

So in both cases the successful year is higher than the unsuccessful year. The significance in the t-test is marginal, but in the ANOVA is clear.

rep(c(‘success’, ‘failure’), times = c(length(success.all), length(failure.all)))

Health Climatology

Goal here is to plot out the average monthly health over all years to see if we can generate some sort of climatology type picture. Let’s prep the data first

pop.idx <- popAll.idx
pophealth <- mpophealth
for(i in 1:nrow(pop.idx)){
    pophealth[rownames(pophealth) == pop.idx$SightingEGNo[i], 1:(pop.idx$MinDateInt[i] - 1)] <- NA  # before first sight 
    pophealth[rownames(pophealth) == pop.idx$SightingEGNo[i], (pop.idx$MaxDateInt[i] + 1):ncol(pophealth)] <- NA  # after last time
  }
pophealth[which(pophealth < 5)] <- NA
mhealth <- pophealth[gender == 'M', ]
femhealth <- pophealth[gender == 'F', ]

mvec <- rep(NA, 12)
femquant <- mquant <- matrix(NA, nrow = 3, ncol = 12)
femvec <- rep(NA, 12)
yrs <- unique(monYr[,2])
for(i in 1:12){
  cols <- which(monYr[,1] == i)
  mvec[i] <- median(mhealth[, cols], na.rm = TRUE)
  femvec[i] <- median(femhealth[, cols], na.rm = TRUE)
femquant[,i] <- quantile(femhealth[, cols], c(0.05, 0.5, 0.95), na.rm = TRUE)
mquant[,i] <- quantile(mhealth[, cols], c(0.05, 0.5, 0.95), na.rm = TRUE)
}

And then visualise that

plot(femvec, type = 'b', col = 'grey60', ylim = c(78, 78.5))
lines(mvec, col = 'grey20')
text(6, 78.35, 'Males')
text(6, 78.1, 'Females')

plot of chunk unnamed-chunk-36

pdf(file = '/Users/rob/Dropbox/Papers/RollandEtAl_RightWhaleHealth/imagesJuly2014/HealthClimatology.pdf')
plot(femvec, type = 'b', col = 'grey60', ylim = c(78, 78.5))
lines(mvec, col = 'grey20')
text(6, 78.35, 'Males')
text(6, 78.1, 'Females')
dev.off()
## pdf 
##   2