Results for Health paper - Part Five

The change at this iteration is to remove 2009 from the analysis. (Change made 18 March 2015)

Removing Estimated Dead Animals

First thing that we wanted to look at is the removal of certain time frames of health profiles for individuals that are estimated to be dead. The easiest way to do this is to look at all surviving animals. To do that, we’ll use the death variable, which is output from the model. Anything that has a value lower than nt will be removed, and then we’ll summarise the remaining data. That then will be the lower threshold.

While we’re tallying health, we’ll then find the first instance in each dying animal’s health trace that goes below the threshold. Any health values from that point forward will be set to NA. So let’s load the data:

## Warning: package 'RColorBrewer' was built under R version 3.1.2

## 
## Attaching package: 'plyr'
## 
## The following object is masked from 'package:lubridate':
## 
##     here
## 
## 
## Attaching package: 'dplyr'
## 
## The following objects are masked from 'package:plyr':
## 
##     arrange, count, desc, failwith, id, mutate, rename, summarise,
##     summarize
## 
## The following objects are masked from 'package:lubridate':
## 
##     intersect, setdiff, union
## 
## The following object is masked from 'package:stats':
## 
##     filter
## 
## The following objects are masked from 'package:base':
## 
##     intersect, setdiff, setequal, union

First up, what do the distribution of deaths look like?

hist(death, breaks = 50)
abline(v = nt, lwd = 3)
abline(h = c(5, 10), lty = 2)

Recall that anything over 500 on the x-axis is considered alive. Dashed horizontal lines are at 5 and 10, respectively. This seems reasonable as a first cut (to me at least). Now let’s use that information to retain only the alive individuals in order to summarise their health. To do this, we’ll simply set the dead animals to NA.

mpophealth <- sumh / ng
didx <- death < nt # these are animals imputed to die before the end of the model run
mpophealth[didx,] <- NA # I then set these to NA
mpophealth[mpophealth == 0] <- NA # I then set the times before firstSight to NA

Ok with that mpophealth variable set, we can summarise the distribution of health values for all animals that are currently imputed to be alive.

quantile(mpophealth, na.rm = TRUE)

##       0%      25%      50%      75%     100% 
## 19.57524 71.26664 78.62754 83.71893 93.43222

So as you can see from the quantiles there is a minimum of 19.5752442 but that’s just one animal. If you look at the histogram, it looks like a value of 32.5 is reasonable (that’s where the vertical line is drawn).

hist(mpophealth, breaks = 30)
abline(v = 32.5, lwd = 2)

We can experiment with that value, but we’ll start there. Let’s make that a matrix (pophealth) that we can use throughout the rest of the document.

hthresh <- 32.5
mpophealth <- sumh / ng
mpophealth[mpophealth < hthresh] <- NA # I then set all values less than this threshold to NA

Resting Females: Successful vs. Unsuccessful Comparisons

First thing we’ll look at are the numbers from the two comparisons we make for resting and for available females. This comparison is to look at the health of successful vs. unsuccessful females. I’ll show the plots, the statistics, and the tallies for the different groups.

Per Roz’s email (4 September 2014), we need to remove the trailing `available’ times. That is, assume that we only know the animal’s status within a known calving/reproductive window, and don’t assume that the years following a last calving are to be classified as ‘available.’ We’ll only have the first year after the last known calving be available.

To make the comparisons, we need to loop over years, and then find out two groups of individuals: 1) those available in a given year and pregnant the next year; and 2) those available in a given year, but still available in the next year.

Those data look like this:

head(pchealth)

##   year navail npregNext  nhealth  phealth
## 1 1985     12         4 75.29701 70.88400
## 2 1986      7         2 75.44524 82.96104
## 3 1987     12         7 81.62228 74.76987
## 4 1988     10         7 82.13949 82.62835
## 5 1989      6         4 74.17950 79.94247
## 6 1990     15         7 77.16702 79.34330

In the pchealth data frame, the variables are as follows:

navail is the sum of 3 to 2, and 3 to 4 animals, i.e. resting to pregnant, resting to available, respectively.
npregNext is the sum of the 3 to 2 animals.
phealth is the health of the animals who became pregnant
nphealth is the health of the animals who DID NOT became pregnant

We can plot the numbers easily to see how many are in each group:

## pdf 
##   2

You asked for a summary of these two time series (Resting to not-pregnant; Resting to pregnant) (item # 11):

Statistic	Not Pregnant	Pregnant
Mean	75.3094502	74.4993437
SD	4.7498714	4.934093

I’ll write these out to a csv file for further comparison:

write.csv(pchealth, file = '/Users/rob/Dropbox/Papers/RollandEtAl_RightWhaleHealth/dataMarch2015/Summary_RestingPregnantComparison.csv')

But we also want to see if these are different. To do that, we should work with the raw data, not the mean summaries. So we need to grab those. I’ll do this by looping over year as before, but instead of grabbing the mean, I’ll just grab the whole year’s health values:

This yields a data frame where each row has one individual’s health for a given year, along with its status. Here’s what the first few lines of the data look like:

##      EGNo      Jan      Feb      Mar      Apr      May      Jun      Jul
## 1118 1118 74.47632 72.93009 71.75582 70.95635 70.52565 70.33860 70.38732
## 1127 1127 68.12994 68.80407 69.70061 70.47954 71.10782 71.65797 72.13763
## 1135 1135 65.67966 65.48355 65.30323 65.66140 66.01000 66.41044 66.90574
## 1157 1157 68.20174 68.78942 69.58654 70.69038 71.64539 72.48434 73.18712
## 1142 1142 69.41604 70.90792 72.62724 74.70894 76.48767 78.05353 79.32139
## 1145 1145 69.98919 70.57856 71.41659 72.18567 72.84624 73.45030 74.02000
##           Aug      Sep      Oct      Nov      Dec year status
## 1118 70.74132 71.35974 72.25307 73.06174 73.84652 1985     32
## 1127 72.51021 72.82288 73.06947 73.19585 73.33331 1985     32
## 1135 67.54200 68.35394 69.31867 70.14101 70.81165 1985     32
## 1157 73.83711 74.47468 75.06130 75.65747 76.21642 1985     32
## 1142 80.30593 81.12743 81.65665 82.04655 82.28964 1985     34
## 1145 74.46484 74.84513 75.16298 75.48712 75.76412 1985     34

In the above, status of 32 means they were resting one year, and pregnant the next. Status of 34 means resting one year, but available the next.

Let’s plot out the densities of these to check for normality:

## pdf 
##   2

They seem to be left skewed to me. But here’s the t.test comparing their means (n.b. I looked in R’s help, and these 95% CIs are for the sample estimates as we presumed yesterday) are.

t.test(as.vector(pchealthAll[pchealthAll$status == 32, 2:13]), as.vector(pchealthAll[pchealthAll$status == 34, 2:13]), alt = 'greater')

## 
##  Welch Two Sample t-test
## 
## data:  as.vector(pchealthAll[pchealthAll$status == 32, 2:13]) and as.vector(pchealthAll[pchealthAll$status == 34, 2:13])
## t = 1.1312, df = 3044.903, p-value = 0.129
## alternative hypothesis: true difference in means is greater than 0
## 95 percent confidence interval:
##  -0.1607706        Inf
## sample estimates:
## mean of x mean of y 
##  73.17479  72.82108

UPDATE - 12 September 2014 - We can also try breaking this down by year. Instead of running t.test, we can accomplish the same thing with a linear mixed effects model, where pregnancy status is a fixed effect and year is a random effect:

library(lme4)

## Loading required package: Matrix
## Loading required package: Rcpp

fm1 <- lmer(value ~ status + (1 | year), dfm)
print(fm1)

## Linear mixed model fit by REML ['lmerMod']
## Formula: value ~ status + (1 | year)
##    Data: dfm
## REML criterion at convergence: 21727.04
## Random effects:
##  Groups   Name        Std.Dev.
##  year     (Intercept) 4.233   
##  Residual             7.871   
## Number of obs: 3108, groups:  year, 24
## Fixed Effects:
## (Intercept)       status  
##    74.54979     -0.02007

Once it’s fitted, then we can look at the effect by year.

library(lattice)
ranef(fm1)

## $year
##      (Intercept)
## 1985   0.6841645
## 1986   3.2620855
## 1987   3.6329235
## 1988   5.1451804
## 1989   4.2811386
## 1990   2.8996708
## 1991  -2.0103291
## 1992   2.7791659
## 1993   0.3698685
## 1994   3.2482927
## 1995   1.0784938
## 1996   1.4264034
## 1997  -8.0333321
## 1998  -5.9289451
## 1999   3.0814082
## 2000  -0.2251181
## 2001   5.1041826
## 2002   1.9030172
## 2003   1.9425190
## 2004  -7.1421172
## 2005  -8.0230479
## 2006  -3.0245701
## 2007  -4.7162719
## 2008  -1.7347833

dotplot(ranef(fm1, condVar = TRUE))

## $year

There’s a new package called sjPlot that allows for some more colourful visualisations of these (sjPlot package at CRAN). Let’s use that:

library(sjPlot)

## Warning: package 'sjPlot' was built under R version 3.1.2

pdf(file = '/Users/rob/Dropbox/Papers/RollandEtAl_RightWhaleHealth/imagesMarch2015/HealthComparison_PregnancySuccessfulUnsuccessful_MixedEffects.pdf')
  sjp.lmer(fm1, sort.coef = TRUE)

## Plotting random effects...

dev.off()

## pdf 
##   2

sjp.lmer(fm1, sort.coef = TRUE)

## Plotting random effects...

I’ll write out the file for you to have the raw data.

write.csv(pchealthAll, file = '/Users/rob/Dropbox/Papers/RollandEtAl_RightWhaleHealth/dataMarch2015/RestingPregnantComparisonRawData.csv')

Available Females: Successful vs. Unsuccessful Comparisons

Now let’s look at the available females in the same way.

##   year navail npregNext  nhealth  phealth
## 1 1988     12         3 79.31726 75.41032
## 2 1989     12         9 75.55785 80.93117
## 3 1990      5         2 65.11144 77.99922
## 4 1991     11         4 74.54692 76.29860
## 5 1992     17         4 74.81760 70.44051
## 6 1993     27         4 75.35592 82.63014

In the avchealth data frame, the variables are as follows:

navail is the sum of 4 to 2, and 4 to 4 animals, i.e. available to pregnant, available to available, respectively.
npregNext is the sum of the 4 to 2 animals.
phealth is the health of the animals who became pregnant
nphealth is the health of the animals who DID NOT became pregnant

You asked for a summary of these two time series (Available to not-pregnant; Available to pregnant) (item # 11):

Statistic	Not Pregnant	Pregnant
Mean	72.6505894	74.8440756
SD	5.9337214	4.9437266

I’ll write out the file for you:

write.csv(avchealth, file = '/Users/rob/Dropbox/Papers/RollandEtAl_RightWhaleHealth/dataMarch2015/Summary_AvailablePregnantComparison.csv')

And then we can plot the numbers in each category by year.

## pdf 
##   2

And plot the yearly mean scores of health:

## pdf 
##   2

## pdf 
##   2

## Warning: Removed 1 rows containing missing values (geom_path).

But we want to look at the raw data as well so we’ll repeat that loop storing the actual health, not the means.

And on that dataset we’ll run the t-test:

t.test(as.vector(avchealth[avchealth$status == 42, 2:13]), as.vector(avchealth[avchealth$status == 44, 2:13]))

## 
##  Welch Two Sample t-test
## 
## data:  as.vector(avchealth[avchealth$status == 42, 2:13]) and as.vector(avchealth[avchealth$status == 44, 2:13])
## t = 4.7866, df = 4439.2, p-value = 1.751e-06
## alternative hypothesis: true difference in means is not equal to 0
## 95 percent confidence interval:
##  0.9356372 2.2337594
## sample estimates:
## mean of x mean of y 
##  74.54783  72.96314

As before, there are significant differences in the ones who transition to pregnant; they are healthier than those who transition to available.

Here are the density plots of these data:

## pdf 
##   2

These too are left skewed, and I’ll write them out to you for further comparison.

write.csv(avchealth, file = '/Users/rob/Dropbox/Papers/RollandEtAl_RightWhaleHealth/dataMarch2015/AvailablePregnantHealthComparisonsRawData.csv')

UPDATE - 12 September 2014 - We can also try breaking this down by year as above. Instead of running t.test, we can accomplish the same thing with a linear mixed effects model, where pregnancy status is a fixed effect and year is a random effect:

fm1 <- lmer(value ~ status + (1 | year), dfm)
print(fm1)

## Linear mixed model fit by REML ['lmerMod']
## Formula: value ~ status + (1 | year)
##    Data: dfm
## REML criterion at convergence: 37516.84
## Random effects:
##  Groups   Name        Std.Dev.
##  year     (Intercept)  4.415  
##  Residual             10.969  
## Number of obs: 4908, groups:  year, 24
## Fixed Effects:
## (Intercept)       status  
##     117.656       -1.014

Once it’s fitted, then we can look at the effect by year.

ranef(fm1)

## $year
##      (Intercept)
## 1985   4.6429460
## 1986   6.3798607
## 1987   5.1514891
## 1988   4.5848792
## 1989   4.8088678
## 1990  -3.2611511
## 1991   1.3321070
## 1992   0.2508165
## 1993   3.0233744
## 1994   1.8258111
## 1995   1.3424077
## 1996   2.7957015
## 1997  -0.3070687
## 1998  -3.4182646
## 1999  -2.5918733
## 2000  -0.6535002
## 2001   0.3110793
## 2002  -5.5231103
## 2003   0.7018151
## 2004   2.2834289
## 2005  -7.6205542
## 2006 -10.1683112
## 2007  -6.6499638
## 2008   0.7592133

dotplot(ranef(fm1, condVar = TRUE))

## $year

All Females: Successful vs. Unsuccessful Comparisons

This is per email dated 9 October to try and make the comparison of all successful vs. unsuccessful females - regardless of their status in the before year. I’ll make a line plot and a t-test. Let’s put the data together & run the t-test:

## 
##  Welch Two Sample t-test
## 
## data:  as.vector(allfemc[allfemc$svec == 1, 2:13]) and as.vector(allfemc[allfemc$svec == 0, 2:13])
## t = 4.6507, df = 7469.871, p-value = 3.364e-06
## alternative hypothesis: true difference in means is not equal to 0
## 95 percent confidence interval:
##  0.6353701 1.5612396
## sample estimates:
## mean of x mean of y 
##  74.00694  72.90864

These results suggest these values are highly different, and that successful animals have a higher health. What do they look like?

## Warning: %.% is deprecated. Please use %>%

## Warning: %.% is deprecated. Please use %>%

plot(yearsum$year[yearsum$svec == 1], yearsum$health[yearsum$svec == 1], type = 'l', ylim = c(40, 90), ylab = 'Estimated Health', xlab = 'Year', lwd = 1.5, main = 'Yearly Median Health of \n Successful and Unsuccessful Females')
lines(yearsum$year[yearsum$svec == 0], yearsum$health[yearsum$svec == 0], col = 'grey')
legend(x = 'bottomleft', lwd = c(1.5, 1), legend = c('Successful', 'Unsuccessful'), col = c('black', 'grey'))

pdf(file = '/Users/rob/Dropbox/Papers/RollandEtAl_RightWhaleHealth/imagesMarch2015/HealthSummary_Successful_Unsuccessful.pdf')
plot(yearsum$year[yearsum$svec == 1], yearsum$health[yearsum$svec == 1], type = 'l', ylim = c(40, 90), ylab = 'Estimated Health', xlab = 'Year', lwd = 1.5, main = 'Yearly Median Health of \n Successful and Unsuccessful Females')
lines(yearsum$year[yearsum$svec == 0], yearsum$health[yearsum$svec == 0], col = 'grey')
legend(x = 'bottomleft', lwd = c(1.5, 1), legend = c('Successful', 'Unsuccessful'), col = c('black', 'grey'))

dev.off()

## pdf 
##   2

Health of Available Animals vs. Calving Rates

Here let’s plot out a scatter plot of health of those available in the current year vs. the ratio of # of calves next year / # available females this year. The filled circles is the health of those who got pregnant; the empty circles are those who didn’t:

## pdf 
##   2

Master Plot of Health of Available Animals (Including First Calf) 1 & 2 Years Prior

I’m altering this section somewhat from the first version of this document. Now I want to look at the distribution of health scores in two diffrent perods: 1) the year one years prior to having a calf; and 2) the year two years prior to having a calf. This corresponds to item #10 in the most recent punch list.

First thing we need to do is grab the data and mold it into a useful form. The logic here is that we’ll take the master data frame summary containing all individual health profiles, find the times that are one year prior and then two years prior, and then set everything else to NA.

p.idx <- which(calves == 1, arr.ind = TRUE) # this is the pregnancy year
a1idx <- a2idx <- p.idx
a2idx[, 2] <- p.idx[, 2] - 12 # this is the pregnancy year minus one year
a1idx <- a1idx[which(a1idx[,2] > 0),]
a2idx <- a2idx[which(a2idx[,2] > 0),]

pop.idx <- a1idx
pophealth <- mpophealth
psub1 <- matrix(NA, dim(pophealth)[1], dim(pophealth)[2])
psub1[pop.idx] <- pophealth[pop.idx] 
psub1[psub1 < 5] <- NA # This is to remove health of females that are before firstSight

pop.idx <- a2idx
pophealth <- mpophealth
psub2 <- matrix(NA, dim(pophealth)[1], dim(pophealth)[2])
psub2[pop.idx] <- pophealth[pop.idx] 
psub2[psub2 < 5] <- NA 

df1 <- data.frame(year = 'year1', data = quantile(psub1, c(0.05, 0.25, 0.5, 0.75, 0.95), na.rm = TRUE))
df2 <- data.frame(year = 'year2', data = quantile(psub2, c(0.05, 0.25, 0.5, 0.75, 0.95), na.rm = TRUE))
df <- cbind(df1, df2)
df

##      year     data  year     data
## 5%  year1 52.68927 year2 54.85624
## 25% year1 67.52813 year2 73.98785
## 50% year1 72.22847 year2 77.70067
## 75% year1 79.06686 year2 81.65629
## 95% year1 86.67585 year2 86.40266

And then we can visualise the histogram and density plot (I’ll facet by year):

## stat_bin: binwidth defaulted to range/30. Use 'binwidth = x' to adjust this.
## stat_bin: binwidth defaulted to range/30. Use 'binwidth = x' to adjust this.

## stat_bin: binwidth defaulted to range/30. Use 'binwidth = x' to adjust this.
## stat_bin: binwidth defaulted to range/30. Use 'binwidth = x' to adjust this.

## pdf 
##   2

Population Health Scores

Ok, let’s look at the population level health scores. First, I need to factor out the young juveniles, and then marry up the juvenile times with the adult times for a consecutive health profile.

##   SightingEGNo   min_date   max_date MinDate MaxDate MinDateInt MaxDateInt
## 1         1005 1993-08-27 1997-08-23  8-1993  8-1997        284        332
## 2         1017 1988-04-03 2011-03-17  4-1988  3-2011        220        495
## 3         1019 1988-05-05 2009-07-18  5-1988  7-2009        221        475
## 4         1021 1988-04-03 1998-08-04  4-1988  8-1998        220        344
## 5         1026 1989-08-28 2009-09-03  8-1989  9-2009        176        477
## 6         1028 1988-09-02 2008-10-12  9-1988 10-2008        225        466

Then let’s get the population health data set up using these individual and time indices

And then plotted:

## pdf 
##   2

Ok, then what are the summary stats of these data, and what do they look like? The Fivenum below summary contains: minimum, lower-hinge, median, upper-hinge, maximum.

fivenum(pophealth)

## [1] 32.56354 68.64349 78.65503 84.12143 88.80960

And then the histogram of the population health data:

## stat_bin: binwidth defaulted to range/30. Use 'binwidth = x' to adjust this.

## stat_bin: binwidth defaulted to range/30. Use 'binwidth = x' to adjust this.

## pdf 
##   2

Note that the 0’s are likely from animals that will be estimated as dead. We can talk about pulling out the health values of the animals that the model has estimated to be dead.

Finally, let’s add the calving #’s on top of the health plot to round out the picture a little bit.

## pdf 
##   2

Sub-population Health

Ok, that was the population level health. What about some of the sub-population categories? We can turn to these now. I’ll leave some of the code out since it’s just a repeat of what we did to generate the population level values. We need summaries for: * Adult males * Young juvs * Old juvs * Pregnant * Lactating * Resting * Available

## pdf 
##   2

## pdf 
##   2

For Young Juveniles we need to see why the values are tailing off in 2009 and 2010. Easiest way to do this is to look at summaries of the body and skin condition for these time periods. I’ll start in 2008 as that is what appears to be the peak in the late 00’s. Here’s body, followed by skin

jidx <- jd1.idx[jd1.idx$MinDateInt > 456, ]
sko <- bobs <- rep(NA, nrow(jidx))
for(i in 1:nrow(jidx)){
  bobs[i] <- sum(!is.na(hobs[which(ID == jidx$SightingEGNo[i]), jidx$MinDateInt[i]:jidx$MaxDateInt[i]] ))
  sko[i] <- sum(!is.na(skobs[which(ID == jidx$SightingEGNo[i]), jidx$MinDateInt[i]:jidx$MaxDateInt[i]] ))
}
jidx$bobs <- bobs
jidx$skin <- sko
jidx[,c(1,2,3,8,9)]

##     SightingEGNo   min_date   max_date bobs skin
## 242         3701 2008-05-07 2009-09-17    0    0
## 243         3705 2008-02-03 2009-09-24    0    0
## 244         3710 2008-02-03 2009-01-30    0    0
## 245         3712 2008-02-24 2009-11-07    0    0
## 246         3714 2008-01-24 2009-12-21    2    2
## 247         3720 2008-01-13 2009-09-02    0    0
## 248         3725 2008-02-14 2009-08-28    0    0
## 249         3730 2008-02-05 2009-09-17    0    0
## 250         3740 2008-02-20 2009-12-04    0    0
## 251         3742 2008-01-28 2009-12-31    0    0
## 252         3745 2008-01-06 2009-12-22    0    0
## 253         3746 2008-02-16 2009-09-26    0    0
## 254         3750 2008-02-24 2009-09-26    0    0
## 255         3760 2008-05-01 2009-10-20    0    0
## 256         3770 2008-05-07 2009-11-07    0    0
## 257         3791 2008-01-14 2009-05-12    0    0
## 258         3792 2008-01-09 2009-12-22    0    0
## 259         3802 2009-02-12 2010-04-21    0    0
## 260         3803 2010-02-15 2010-08-19    0    0
## 261         3808 2009-02-09 2010-04-18    0    0
## 262         3810 2009-02-07 2010-04-21    0    0
## 263         3812 2009-02-13 2010-05-18    0    0
## 264         3815 2009-01-14 2010-08-31    0    0
## 265         3820 2009-03-23 2010-08-11    0    0
## 266         3821 2009-03-24 2010-04-20    0    0
## 267         3822 2009-02-08 2010-04-13    0    0
## 268         3823 2009-06-07 2010-01-15    0    0
## 269         3830 2009-01-21 2010-01-28    0    0
## 270         3832 2009-02-10 2009-09-21    0    0
## 271         3843 2009-02-13 2010-02-14    0    0
## 272         3845 2009-02-05 2010-04-20    0    0
## 273         3850 2009-02-18 2010-01-19    0    0
## 274         3853 2009-01-02 2009-12-26    0    0
## 275         3890 2009-01-10 2010-04-13    0    0
## 276         3892 2009-01-05 2010-04-13    0    0
## 277         3893 2009-02-08 2010-05-04    0    0
## 278         3903 2010-01-27 2010-02-17    0    0
## 279         3904 2010-03-14 2010-10-24    0    0
## 280         3908 2010-01-18 2011-04-22    0    0
## 281         3911 2010-02-19 2011-02-03    4    4
## 282         3915 2010-01-19 2011-03-17    0    0
## 283         3930 2010-01-25 2011-04-22    0    0
## 284         3934 2010-01-28 2010-02-17    0    0
## 285         3940 2010-01-15 2010-05-18    0    0
## 286         3945 2010-01-03 2010-04-21    0    0
## 287         3946 2011-03-17 2011-03-17    0    0
## 288         3950 2010-01-28 2010-08-22    0    0
## 289         3951 2010-01-07 2011-02-17    0    0
## 290         3966 2010-01-18 2010-05-16    0    0
## 291         3981 2010-01-28 2010-03-07    0    0
## 292         3990 2010-01-28 2010-10-20    0    0

So as you can see, there’s not a lot of VHP data here. The number in the body and skin columns corresponds to the number of cases within that date range that we have an observation for that parameter.

Next we’ll move on to looking at oldJuveniles

## pdf 
##   2

And now let’s look at the different reproductive Female categories. For the pregnant females, I can use the existing time query, but I’ll alter it for lactating, resting, and available per our discussions over email. Here are the pregnant females:

## pdf 
##   2

Now let’s adjust the query for the other times in the reproductive life cycle. Note that I won’t print the code. it’s hidden, but here if you want it.

With the calving status updated to remove the trailing available times, we can look at the remaining three categories:

## pdf 
##   2

## pdf 
##   2

## pdf 
##   2

## pdf 
##   2

This next block of code just plots the individual traces for 2006 for the available females, as it appears that the median for that first year is quite low (shown in the preceding plot).

avchealth <- numeric(0)
idx <- which(monYr[,2] == 2006)
cavpreg  <- unique(which(newCalves[, idx] == 4 ,arr.ind = TRUE)[,1]) 
df <- mpophealth[cavpreg, idx]
colnames(df) <- 1:12
df <- data.frame(df, ID = row.names(df))
dfL <- melt(df, id.vars = 'ID', measure.vars = 1:12)
p <- ggplot(data = dfL, aes(x = variable, y = value, group = ID))+
  geom_line()
print(p)

Indeed if you look at the health traces for 2006, you can see that there are 4 animals with January 2006 health values below 50. Three animals start in the mid-50’s, but go on to recover throughout the year.

Correlations between Health in Available & Resting Females and Calving Ratio

The correlation between health of all available females in one year and calves/year (in the following year) is set up with this chunk of code:

cvec <- table(calfTable$Calving.Year)
cvec <- cvec[which(names(cvec) == 1986):(length(cvec)-1)]
health <- dfAVfem[dfAVfem$category == 'avfemHealth',-c(1, ncol(dfAVfem))]
cvs <- data.frame(calfyr = names(cvec), numcalf = cvec)
hvs <- data.frame(hyear = names(health), hvals = t(health))
dfcor <- cbind(hvs[1:24,], cvs)
head(dfcor)

##      hyear       X1 calfyr numcalf
## 1985  1985 82.61077   1986      13
## 1986  1986 84.87700   1987      11
## 1987  1987 85.18796   1988       7
## 1988  1988 86.04096   1989      16
## 1989  1989 86.48974   1990      12
## 1990  1990 84.29497   1991      17

# cor(dfcor[,'X1'], dfcor[,'numcalf'], use = "complete.obs")

Note that I’ve done this quickly, and I’m somewhat tired. So I’ll need to go back and check the logic. But it looks right so far. The correlation is weak (-0.13357), but I still like Scott’s line of thinking about making the link (at least in discussion form) about this and pop consequences.

20 October 2014 - adding in a two year lag:

cvec <- table(calfTable$Calving.Year)
cvec <- cvec[which(names(cvec) == 1987):(length(cvec))]
cvs <- data.frame(calfyr = names(cvec), numcalf = cvec)
dfcor <- cbind(hvs, cvs)
head(dfcor)

##      hyear       X1 calfyr numcalf
## 1985  1985 82.61077   1987      11
## 1986  1986 84.87700   1988       7
## 1987  1987 85.18796   1989      16
## 1988  1988 86.04096   1990      12
## 1989  1989 86.48974   1991      17
## 1990  1990 84.29497   1992      12

Now the two year correlation is: (-0.2728044).

Health in Calving Females 1 and 2 Years Prior

This corresponds to item #10 in the punch list. We have to generate the data by determining the individuals who gave birth in any given year, and then looking at their health one and two years prior. We’ll loop over years to do this.

cvec <- table(calfTable$Calving.Year)
cvec <- cvec[which(names(cvec) == 1985):which(names(cvec) == 2008)]
yrseq <- seq(1985, 2008)
health1 <- health2 <- rep(NA, length.out = length(yrseq))
for(i in 1:length(yrseq)){
  inds <- unique(calfTable[calfTable$Calving.Year == yrseq[i], 'EGNo']) # These are the moms who gave birth in one year
  r.idx <- which(rownames(mpophealth) %in% inds) # these are their positions in the health data frame
  yr1idx <- which(monYr[,2] == yrseq[i]) - 12 # indices to get health 1 year prior
  yr2idx <- which(monYr[,2] == yrseq[i]) - 24 # indices to get health 2 years prior
  health1[i] <- mean(mpophealth[r.idx, yr1idx], na.rm = TRUE)
  health2[i] <- mean(mpophealth[r.idx, yr2idx], na.rm = TRUE)
}
df <- data.frame(year = yrseq, calves = cvec, hYear1 = health1, hYear2 = health2)
head(df)

##      year calves   hYear1   hYear2
## 1985 1985     11 72.31036 76.79529
## 1986 1986     13 79.63716 80.81080
## 1987 1987     11 79.94732 76.81785
## 1988 1988      7 74.60439 77.65860
## 1989 1989     16 80.03650 79.38941
## 1990 1990     12 74.18565 75.20832

You asked for a correlation for these. For # Calves and health one year prior, the correlation is: -0.303643. For # Calves and health two years prior, the correlation is: -0.2037859.

I’ll write these out to you for work in excel.

write.csv(df, file = '/Users/rob/Dropbox/Papers/RollandEtAl_RightWhaleHealth/dataMarch2015/CalvesHealthOneYearTwoYearPrior.csv')

Health in Different Groups

This is a new test to see if the different groups are statistically different. We’ll run an ANOVA on the different groups yearly avaerages. If this isn’t quite what you are after, let me know. 14 October 2014 n.b. I’ve add a Tukey Honest Significant Differences test here and all subsequent ones as well.

yrseq <- seq(from = 1980, to = 2008)
health <- c(adMalesHealth, yjuvsHealth, ojuvsHealth, pfemHealth, lacfemHealth, restfemHealth, avfemHealth)
group <- rep(c('AdMales', 'yJuvs', 'oJuvs', 'pfem', 'lacfem', 'restfem', 'avfem'), each = length(lacfemHealth))
yvec <- rep(yrseq, times = 7)
df <- data.frame(health = health,
                 group = group, 
                 year = yvec)
aov.ex1 = aov(health ~ group, data = df)
summary(aov.ex1)

##              Df Sum Sq Mean Sq F value   Pr(>F)    
## group         6   1940   323.4   15.56 1.59e-14 ***
## Residuals   192   3990    20.8                     
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 4 observations deleted due to missingness

boxplot(health ~ group, data = df)

print(model.tables(aov.ex1,"means"),digits=3)

## Tables of means
## Grand mean
##          
## 78.38016 
## 
##  group 
##     AdMales avfem lacfem oJuvs pfem restfem yJuvs
##        79.2  77.7   73.8  81.7   78    75.3  83.4
## rep    27.0  29.0   29.0  28.0   29    29.0  28.0

phoc <- TukeyHSD(aov.ex1, 'group', conf.level = 0.95)
outtsd <- data.frame(phoc$group, sigdiff = phoc$group[,4] < 0.05)
outtsd

##                       diff        lwr        upr        p.adj sigdiff
## avfem-AdMales   -1.5031663 -5.1357938  2.1294612 8.804289e-01   FALSE
## lacfem-AdMales  -5.4615150 -9.0941425 -1.8288875 2.552239e-04    TRUE
## oJuvs-AdMales    2.4575236 -1.2062462  6.1212934 4.188393e-01   FALSE
## pfem-AdMales    -1.2297691 -4.8623965  2.4028584 9.515147e-01   FALSE
## restfem-AdMales -3.9400121 -7.5726396 -0.3073846 2.399035e-02    TRUE
## yJuvs-AdMales    4.1652174  0.5014476  7.8289872 1.473770e-02    TRUE
## lacfem-avfem    -3.9583487 -7.5255181 -0.3911793 1.905323e-02    TRUE
## oJuvs-avfem      3.9606899  0.3618117  7.5595681 2.073614e-02    TRUE
## pfem-avfem       0.2733972 -3.2937722  3.8405666 9.999879e-01   FALSE
## restfem-avfem   -2.4368458 -6.0040152  1.1303236 3.957851e-01   FALSE
## yJuvs-avfem      5.6683837  2.0695055  9.2672619 1.029963e-04    TRUE
## oJuvs-lacfem     7.9190386  4.3201604 11.5179168 1.036533e-08    TRUE
## pfem-lacfem      4.2317459  0.6645765  7.7989153 9.086635e-03    TRUE
## restfem-lacfem   1.5215028 -2.0456665  5.0886722 8.643503e-01   FALSE
## yJuvs-lacfem     9.6267324  6.0278542 13.2256106 2.865042e-12    TRUE
## pfem-oJuvs      -3.6872927 -7.2861709 -0.0884145 4.069068e-02    TRUE
## restfem-oJuvs   -6.3975357 -9.9964139 -2.7986576 6.615662e-06    TRUE
## yJuvs-oJuvs      1.7076938 -1.9226162  5.3380038 8.003959e-01   FALSE
## restfem-pfem    -2.7102431 -6.2774124  0.8569263 2.670738e-01   FALSE
## yJuvs-pfem       5.3949865  1.7961083  8.9938647 2.695163e-04    TRUE
## yJuvs-restfem    8.1052295  4.5063514 11.7041077 4.432938e-09    TRUE

write.csv(outtsd, file = '/Users/rob/Dropbox/Papers/RollandEtAl_RightWhaleHealth/dataMarch2015/TukeyPostHocTest_AllSubGroupsAllYears.csv')

pdf(file = '/Users/rob/Dropbox/Papers/RollandEtAl_RightWhaleHealth/imagesMarch2015/anovaBoxplot_results.pdf')
boxplot(health ~ group, data = df)
dev.off()

## pdf 
##   2

Let’s add in a comparison of the health in these different periods for all sub-pop categories.

Health in Different Groups in Good vs. Bad Years

So now we want to examine the median health of animals in different sub groups during these 4 time periods (Updated 20 March 2015):

1993-1995
1998-2000
2001-2003
2004-2006

We also want them for the three decades. So let’s set these up as indices we can recycle below:

idx92 <- which(monYr[,2] == 1993 | monYr[,2] == 1994 | monYr[,2] == 1995)
idx97 <- which(monYr[,2] == 1998 | monYr[,2] == 1999 | monYr[,2] == 2000)
idx07 <- which(monYr[,2] == 2001 | monYr[,2] == 2002 | monYr[,2] == 2003) # n.b. this was 2007-2009; I've changed the years, but kept the index name (idx07) unchanged
idx04 <- which(monYr[,2] == 2004 | monYr[,2] == 2005 | monYr[,2] == 2006)
dec1 <- which(monYr[,2] >= 1980 & monYr[,2] <= 1989)
dec2 <- which(monYr[,2] >= 1990 & monYr[,2] <= 1999)
dec3 <- which(monYr[,2] >= 2000 & monYr[,2] <= 2008)

First let’s look at the population level health, and show it with density plots of the yearly average

## pdf 
##   2

Then we can look at the ANOVA for these periods:

Whole Population - 4 Periods

Year	Mean	SD
1993-1995	77.2801165	10.9607635
1998-2000	70.8345212	12.7826476
2001-2003	73.248665	12.446059
2004-2006	71.4192843	12.330317

##                Df  Sum Sq Mean Sq F value Pr(>F)    
## group           3  209518   69839   471.3 <2e-16 ***
## Residuals   35006 5187694     148                   
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

## Tables of means
## Grand mean
##          
## 73.08135 
## 
##  group 
##     1993-1995 1998-2000 2001-2003 2004-2006
##          77.3      70.8      73.2      71.4
## rep    7962.0    8259.0    8940.0    9849.0

##                           diff        lwr       upr       p.adj sigdiff
## 1998-2000-1993-1995 -6.4455953 -6.9367852 -5.954405 0.00000e+00    TRUE
## 2001-2003-1993-1995 -4.0314516 -4.5133712 -3.549532 0.00000e+00    TRUE
## 2004-2006-1993-1995 -5.8608322 -6.3321593 -5.389505 0.00000e+00    TRUE
## 2001-2003-1998-2000  2.4141437  1.9368294  2.891458 0.00000e+00    TRUE
## 2004-2006-1998-2000  0.5847631  0.1181458  1.051380 7.03345e-03    TRUE
## 2004-2006-2001-2003 -1.8293806 -2.2862294 -1.372532 4.22995e-14    TRUE

And then for the three decades (n.b. this was added 20 October 2014):

Whole Population - 3 Decades

Year	Mean	SD
1980-1989	79.8493781	9.0890584
1990-1999	75.2044962	12.2343087
2000-2008	72.5024043	12.6250179

##                Df  Sum Sq Mean Sq F value Pr(>F)    
## group           2  452015  226007    1589 <2e-16 ***
## Residuals   66301 9429433     142                   
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

## Tables of means
## Grand mean
##          
## 74.83423 
## 
##  group 
##     Decade1 Decade2 Decade3
##        79.8    75.2    72.5
## rep 11568.0 25765.0 28971.0

##                      diff       lwr       upr p.adj sigdiff
## Decade2-Decade1 -4.644882 -4.957697 -4.332067     0    TRUE
## Decade3-Decade1 -7.346974 -7.654379 -7.039569     0    TRUE
## Decade3-Decade2 -2.702092 -2.941437 -2.462747     0    TRUE

Then we’ll look at Adult Males, juveniles, pregnant, lactating, resting, & available females. Note that I’ve updated this chunk of code to include the mean’s and SDs in the different periods. Between the plots I’ll put in tables with the values (means & SDs). I’ll also put in an ANOVA table of the health in the different periods.

Adult Males - Density Plots

## pdf 
##   2

Adult Males - 4 Periods

Year	Mean	SD
1993-1995	77.4949231	10.8418349
1998-2000	71.0306924	12.9644604
2001-2003	74.624761	12.1165273
2004-2006	72.1952074	12.1114213

##                Df  Sum Sq Mean Sq F value Pr(>F)    
## group           3  217168   72389   496.9 <2e-16 ***
## Residuals   39867 5807314     146                   
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

## Tables of means
## Grand mean
##          
## 73.63864 
## 
##  group 
##     1993-1995 1998-2000 2001-2003 2004-2006
##          77.5        71      74.6      72.2
## rep    8063.0      9024   10425.0   12359.0

##                          diff        lwr       upr        p.adj sigdiff
## 1998-2000-1993-1995 -6.464231 -6.9393857 -5.989076 0.000000e+00    TRUE
## 2001-2003-1993-1995 -2.870162 -3.3300044 -2.410320 0.000000e+00    TRUE
## 2004-2006-1993-1995 -5.299716 -5.7435894 -4.855842 0.000000e+00    TRUE
## 2001-2003-1998-2000  3.594069  3.1482467  4.039890 0.000000e+00    TRUE
## 2004-2006-1998-2000  1.164515  0.7351828  1.593847 1.928879e-11    TRUE
## 2004-2006-2001-2003 -2.429554 -2.8418747 -2.017232 0.000000e+00    TRUE

Adult Males - 3 Decades

Year	Mean	SD
1980-1989	80.1517652	8.9888455
1990-1999	75.2501665	12.2764325
2000-2008	73.2388538	12.4671301

##                Df   Sum Sq Mean Sq F value Pr(>F)    
## group           2   439180  219590    1554 <2e-16 ***
## Residuals   74553 10532987     141                   
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

## Tables of means
## Grand mean
##          
## 75.12042 
## 
##  group 
##     Decade1 Decade2 Decade3
##        80.2    75.3    73.2
## rep 12424.0 27045.0 35087.0

##                      diff       lwr       upr p.adj sigdiff
## Decade2-Decade1 -4.901599 -5.203524 -4.599674     0    TRUE
## Decade3-Decade1 -6.912911 -7.203741 -6.622082     0    TRUE
## Decade3-Decade2 -2.011313 -2.236729 -1.785896     0    TRUE

Young Juveniles - Density Plots

## pdf 
##   2

Young Juveniles - 4 Periods

Year	Mean	SD
1993-1995	76.8178578	11.0618823
1998-2000	70.8429764	13.2566712
2001-2003	73.1273757	12.9402922
2004-2006	71.2243791	12.2401932

##                Df  Sum Sq Mean Sq F value Pr(>F)    
## group           3  157239   52413   339.9 <2e-16 ***
## Residuals   29231 4507140     154                   
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

## Tables of means
## Grand mean
##          
## 72.92154 
## 
##  group 
##     1993-1995 1998-2000 2001-2003 2004-2006
##          76.8      70.8      73.1      71.2
## rep    6823.0    7095.0    7440.0    7877.0

##                           diff        lwr        upr        p.adj sigdiff
## 1998-2000-1993-1995 -5.9748814 -6.5157886 -5.4339743 0.000000e+00    TRUE
## 2001-2003-1993-1995 -3.6904821 -4.2252059 -3.1557584 0.000000e+00    TRUE
## 2004-2006-1993-1995 -5.5934787 -6.1210592 -5.0658982 0.000000e+00    TRUE
## 2001-2003-1998-2000  2.2843993  1.7550492  2.8137495 1.776357e-14    TRUE
## 2004-2006-1998-2000  0.3814027 -0.1407307  0.9035361 2.381108e-01   FALSE
## 2004-2006-2001-2003 -1.9029966 -2.4187216 -1.3872717 3.419487e-14    TRUE

Young Juveniles - 3 Decades

Year	Mean	SD
1980-1989	80.9408568	8.4229244
1990-1999	75.1112892	12.3216573
2000-2008	72.3747899	12.6500923

##                Df  Sum Sq Mean Sq F value Pr(>F)    
## group           2  756642  378321    2868 <2e-16 ***
## Residuals   63941 8434830     132                   
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

## Tables of means
## Grand mean
##          
## 75.77848 
## 
##  group 
##     Decade1 Decade2 Decade3
##        80.9    75.1    72.4
## rep 18060.0 23001.0 22883.0

##                      diff       lwr       upr p.adj sigdiff
## Decade2-Decade1 -5.829568 -6.097197 -5.561939     0    TRUE
## Decade3-Decade1 -8.566067 -8.833999 -8.298135     0    TRUE
## Decade3-Decade2 -2.736499 -2.987833 -2.485165     0    TRUE

Old Juveniles - Density Plots

## pdf 
##   2

Old Juveniles - 4 Periods

Year	Mean	SD
1993-1995	77.3554233	10.9532869
1998-2000	70.8807186	13.1351405
2001-2003	72.9849625	12.9271123
2004-2006	71.0912898	12.5322869

##                Df  Sum Sq Mean Sq F value Pr(>F)    
## group           3  212111   70704   458.9 <2e-16 ***
## Residuals   31461 4847615     154                   
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

## Tables of means
## Grand mean
##          
## 73.07109 
## 
##  group 
##     1993-1995 1998-2000 2001-2003 2004-2006
##          77.4      70.9        73      71.1
## rep    7821.0    7583.0      7868    8193.0

##                           diff        lwr       upr        p.adj sigdiff
## 1998-2000-1993-1995 -6.4747048 -6.9886447 -5.960765 0.000000e+00    TRUE
## 2001-2003-1993-1995 -4.3704609 -4.8796529 -3.861269 0.000000e+00    TRUE
## 2004-2006-1993-1995 -6.2641336 -6.7682659 -5.760001 0.000000e+00    TRUE
## 2001-2003-1998-2000  2.1042439  1.5910602  2.617428 3.830269e-14    TRUE
## 2004-2006-1998-2000  0.2105712 -0.2975926  0.718735 7.111623e-01   FALSE
## 2004-2006-2001-2003 -1.8936727 -2.3970341 -1.390311 2.853273e-14    TRUE

Old Juveniles - 3 Decades

Year	Mean	SD
1980-1989	80.9097955	8.5935547
1990-1999	75.5472881	12.2218053
2000-2008	72.1942545	12.9012384

##                Df  Sum Sq Mean Sq F value Pr(>F)    
## group           2  807647  403824    2989 <2e-16 ***
## Residuals   68777 9292524     135                   
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

## Tables of means
## Grand mean
##          
## 75.79107 
## 
##  group 
##     Decade1 Decade2 Decade3
##        80.9    75.5    72.2
## rep 18571.0 25509.0 24700.0

##                      diff       lwr       upr p.adj sigdiff
## Decade2-Decade1 -5.362507 -5.625294 -5.099720     0    TRUE
## Decade3-Decade1 -8.715541 -8.980135 -8.450947     0    TRUE
## Decade3-Decade2 -3.353034 -3.596222 -3.109845     0    TRUE

Pregnant Females - Density Plots

## pdf 
##   2

Pregnant Females - 4 Periods

Year	Mean	SD
1993-1995	78.9290017	6.5579373
1998-2000	76.740339	7.6905249
2001-2003	74.8905612	8.8790286
2004-2006	71.2796643	8.8010568

##               Df Sum Sq Mean Sq F value Pr(>F)    
## group          3  18956    6319    92.3 <2e-16 ***
## Residuals   2337 159979      68                   
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

## Tables of means
## Grand mean
##          
## 74.64181 
## 
##  group 
##     1993-1995 1998-2000 2001-2003 2004-2006
##          78.9      76.7      74.9      71.3
## rep     409.0     432.0     660.0     840.0

##                          diff       lwr        upr        p.adj sigdiff
## 1998-2000-1993-1995 -2.188663 -3.656165 -0.7211600 7.454201e-04    TRUE
## 2001-2003-1993-1995 -4.038440 -5.377006 -2.6998749 7.509549e-13    TRUE
## 2004-2006-1993-1995 -7.649337 -8.931858 -6.3668168 6.425971e-13    TRUE
## 2001-2003-1998-2000 -1.849778 -3.166160 -0.5333959 1.758820e-03    TRUE
## 2004-2006-1998-2000 -5.460675 -6.720025 -4.2013247 6.581402e-13    TRUE
## 2004-2006-2001-2003 -3.610897 -4.717313 -2.5044809 6.838974e-13    TRUE

Pregnant Females - 3 Decades

Year	Mean	SD
1980-1989	77.6662504	7.4670201
1990-1999	78.1405767	7.1130291
2000-2008	73.1725745	10.4802416

##               Df Sum Sq Mean Sq F value Pr(>F)    
## group          2  27761   13881     167 <2e-16 ***
## Residuals   4947 411174      83                   
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

## Tables of means
## Grand mean
##         
## 75.4042 
## 
##  group 
##     Decade1 Decade2 Decade3
##        77.7    78.1    73.2
## rep  1157.0  1177.0  2616.0

##                       diff        lwr       upr        p.adj sigdiff
## Decade2-Decade1  0.4743263 -0.4105255  1.359178 4.198984e-01   FALSE
## Decade3-Decade1 -4.4936759 -5.2483039 -3.739048 3.692008e-09    TRUE
## Decade3-Decade2 -4.9680022 -5.7181716 -4.217833 3.692008e-09    TRUE

Lactating Females - Density Plots

## pdf 
##   2

Lactating Females - 4 Periods

Year	Mean	SD
1993-1995	75.0378168	8.9767111
1998-2000	75.8438726	5.822301
2001-2003	72.3256292	9.4715411
2004-2006	68.3136934	7.7374493

##               Df Sum Sq Mean Sq F value Pr(>F)    
## group          3  13354    4451   60.59 <2e-16 ***
## Residuals   1898 139430      73                   
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

## Tables of means
## Grand mean
##         
## 71.3502 
## 
##  group 
##     1993-1995 1998-2000 2001-2003 2004-2006
##            75      75.8      72.3      68.3
## rep       249     120.0     797.0     736.0

##                           diff       lwr       upr        p.adj sigdiff
## 1998-2000-1993-1995  0.8060558 -1.643052  3.255164 8.322809e-01   FALSE
## 2001-2003-1993-1995 -2.7121876 -4.312195 -1.112180 8.124482e-05    TRUE
## 2004-2006-1993-1995 -6.7241234 -8.339838 -5.108409 2.538958e-11    TRUE
## 2001-2003-1998-2000 -3.5182434 -5.676236 -1.360251 1.694566e-04    TRUE
## 2004-2006-1998-2000 -7.5301792 -9.699843 -5.360516 2.538658e-11    TRUE
## 2004-2006-2001-2003 -4.0119358 -5.138583 -2.885288 2.539324e-11    TRUE

Lactating Females - 3 Decades

Year	Mean	SD
1980-1989	75.286746	9.8548663
1990-1999	73.4285026	8.96447
2000-2008	70.4195861	9.7046603

##               Df Sum Sq Mean Sq F value Pr(>F)    
## group          2  18887    9444   103.8 <2e-16 ***
## Residuals   4490 408572      91                   
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

## Tables of means
## Grand mean
##          
## 72.48821 
## 
##  group 
##     Decade1 Decade2 Decade3
##        75.3    73.4    70.4
## rep  1122.0  1274.0  2097.0

##                      diff       lwr        upr        p.adj sigdiff
## Decade2-Decade1 -1.858243 -2.773874 -0.9426131 6.064321e-06    TRUE
## Decade3-Decade1 -4.867160 -5.694383 -4.0399367 4.358672e-08    TRUE
## Decade3-Decade2 -3.008917 -3.803342 -2.2144908 4.358674e-08    TRUE

Resting Females - Density Plots

## pdf 
##   2

Resting Females - 4 Periods

Year	Mean	SD
1993-1995	74.5278996	8.958792
1998-2000	69.4644672	9.7953215
2001-2003	74.8661268	8.2959392
2004-2006	67.4387221	10.099468

##               Df Sum Sq Mean Sq F value Pr(>F)    
## group          3  20132    6711   76.03 <2e-16 ***
## Residuals   1748 154282      88                   
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

## Tables of means
## Grand mean
##          
## 70.96658 
## 
##  group 
##     1993-1995 1998-2000 2001-2003 2004-2006
##          74.5      69.5      74.9      67.4
## rep     264.0     305.0     497.0     686.0

##                           diff       lwr        upr        p.adj sigdiff
## 1998-2000-1993-1995 -5.0634323 -7.094286 -3.0325785 1.114780e-09    TRUE
## 2001-2003-1993-1995  0.3382272 -1.501640  2.1780949 9.650704e-01   FALSE
## 2004-2006-1993-1995 -7.0891775 -8.838912 -5.3394428 5.060841e-12    TRUE
## 2001-2003-1998-2000  5.4016596  3.644410  7.1589093 5.098810e-12    TRUE
## 2004-2006-1998-2000 -2.0257451 -3.688389 -0.3631012 9.514004e-03    TRUE
## 2004-2006-2001-2003 -7.4274047 -8.850476 -6.0043338 5.023204e-12    TRUE

Resting Females - 3 Decades

Year	Mean	SD
1980-1989	77.2302079	7.293165
1990-1999	72.0866061	10.249516
2000-2008	70.5768716	9.34073

##               Df Sum Sq Mean Sq F value Pr(>F)    
## group          2  27801   13900   163.9 <2e-16 ***
## Residuals   3929 333225      85                   
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

## Tables of means
## Grand mean
##          
## 72.71896 
## 
##  group 
##     Decade1 Decade2 Decade3
##        77.2    72.1    70.6
## rep   968.0  1313.0  1651.0

##                      diff       lwr        upr        p.adj sigdiff
## Decade2-Decade1 -5.143602 -6.058321 -4.2288830 2.117008e-08    TRUE
## Decade3-Decade1 -6.653336 -7.527417 -5.7792556 2.117008e-08    TRUE
## Decade3-Decade2 -1.509735 -2.308149 -0.7113204 2.837961e-05    TRUE

Available Females - Density Plots

## pdf 
##   2

Available Females - 4 Periods

Year	Mean	SD
1993-1995	75.7572333	11.2670512
1998-2000	71.3794323	10.2091415
2001-2003	73.9695108	12.013284
2004-2006	67.1271768	15.1667478

##               Df Sum Sq Mean Sq F value Pr(>F)    
## group          3  23700    7900   59.93 <2e-16 ***
## Residuals   3092 407593     132                   
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

## Tables of means
## Grand mean
##          
## 72.60122 
## 
##  group 
##     1993-1995 1998-2000 2001-2003 2004-2006
##          75.8      71.4        74      67.1
## rep     984.0    1332.0       408     372.0

##                          diff         lwr        upr        p.adj sigdiff
## 1998-2000-1993-1995 -4.377801  -5.6183641 -3.1372379 0.0000000000    TRUE
## 2001-2003-1993-1995 -1.787723  -3.5254883 -0.0499568 0.0409821745    TRUE
## 2004-2006-1993-1995 -8.630056 -10.4262787 -6.8338343 0.0000000000    TRUE
## 2001-2003-1998-2000  2.590078   0.9201738  4.2599831 0.0003983173    TRUE
## 2004-2006-1998-2000 -4.252256  -5.9829103 -2.5216007 0.0000000000    TRUE
## 2004-2006-2001-2003 -6.842334  -8.9579897 -4.7266783 0.0000000000    TRUE

Available Females - 3 Decades

Year	Mean	SD
1980-1989	78.8526422	4.9647962
1990-1999	73.5188655	11.6230927
2000-2008	71.1501121	12.933057

##               Df Sum Sq Mean Sq F value Pr(>F)    
## group          2  31359   15679   124.3 <2e-16 ***
## Residuals   5097 643149     126                   
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

## Tables of means
## Grand mean
##          
## 73.72157 
## 
##  group 
##     Decade1 Decade2 Decade3
##        78.9    73.5    71.2
## rep   828.0  2844.0  1428.0

##                      diff       lwr       upr        p.adj sigdiff
## Decade2-Decade1 -5.333777 -6.373695 -4.293858 0.000000e+00    TRUE
## Decade3-Decade1 -7.702530 -8.852849 -6.552211 0.000000e+00    TRUE
## Decade3-Decade2 -2.368753 -3.222865 -1.514642 2.579519e-10    TRUE

Summary Table Data Prep

Ok, we need to make the data to write to excel. What we want is for each row to correspond to a population or sub-population category, and then each column to have the yearly summary of health (with CIs). I’ll copy this file, along with the others, to a “data” folder in the main dropbox paper folder.

dfAll <- rbind(dfPop, dfadMales, dfyjuves, dfojuves, dfpregfem, dflacfem, dfrestfem, dfavfem)
write.csv(dfAll, file = '/Users/rob/Dropbox/Papers/RollandEtAl_RightWhaleHealth/dataMarch2015/YearlyHealthSummariesByGroup.csv')

Pairwise Comparison within One Individual: EGNo 1245

Ok, this is just a one-off test of doing a comparison of the ‘before’ and ‘after’ test for one individual. In this case the before will be either resting or available years, and the after will be pregnant years. So 1245 has had 4 calves in these years:

calfTable[calfTable$EGNo == 1245,]

##     EGNo Calving.Year CalfNo CalvingInterval
## 167 1245         1996   2645              NA
## 168 1245         2001   3145               5
## 169 1245         2005   3545               4
## 170 1245         2008   3845               3

idx <- which(ID == 1245)

df <- data.frame(status = newCalves[idx,], dates = as.Date(paste(monYr[,1], 01 , monYr[,2], sep = '/'), format = "%m/%d/%Y"))
df <- df[which(df$status != 0),] 
p <- ggplot(df, aes(dates, status, group = status, colour = factor(status)))+
  geom_point()+
  labs(title = paste('EGNo: ', ID[idx], sep = ''), colour = 'status')+
  scale_colour_discrete(labels = c('Calving', 'Pregnant', 'Resting', 'Available'))
p

Ok, let’s set up the vectors to get her health in these periods. What I will define as the successful transitions are either resting to pregnant (3 to 2) or available to pregnant (4 to 2). The unsuccessful transitions will be resting to available (3 to 4) or available to available (4 to 4).

Ok, now that we have the indices set up, let’s get the health enumerated.

Phew. Now that that is behind us, let’s examine health in good versus bad years for EGNo 1245:

t.test(success, failure, paired = TRUE)

## 
##  Paired t-test
## 
## data:  success and failure
## t = 4.4319, df = 23, p-value = 0.0001918
## alternative hypothesis: true difference in means is not equal to 0
## 95 percent confidence interval:
##   8.561647 23.550197
## sample estimates:
## mean of the differences 
##                16.05592

Pairwise Comparison within All Individuals

This is item #12 from the punch list. This time it’s a proper paired t-test.

And then let’s run the t-test. Here I’ll run two. The first is a simple paired with the idea that the means are different. The second specifies that the alternative hypothesis is that the health in success years is greater than health in failure years.

t.test(success.tall, failure.tall, paired = TRUE)

## 
##  Paired t-test
## 
## data:  success.tall and failure.tall
## t = 1.6379, df = 54, p-value = 0.1073
## alternative hypothesis: true difference in means is not equal to 0
## 95 percent confidence interval:
##  -0.299517  2.972745
## sample estimates:
## mean of the differences 
##                1.336614

t.test(success.tall, failure.tall, paired = TRUE, alt = "greater")

## 
##  Paired t-test
## 
## data:  success.tall and failure.tall
## t = 1.6379, df = 54, p-value = 0.05363
## alternative hypothesis: true difference in means is greater than 0
## 95 percent confidence interval:
##  -0.02913975         Inf
## sample estimates:
## mean of the differences 
##                1.336614

I can also test these difference with an ANOVA, which will keep more of the data:

df <- rbind(success.all, failure.all)
aov.ex1 = aov(health ~ group, data = df)
summary(aov.ex1)

##               Df Sum Sq Mean Sq F value   Pr(>F)    
## group          1   3333    3333   31.25 2.36e-08 ***
## Residuals   6618 705877     107                     
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 4 observations deleted due to missingness

print(model.tables(aov.ex1,"means"),digits=3)

## Tables of means
## Grand mean
##          
## 74.05305 
## 
##  group 
##     success failure
##          75    73.5
## rep    2492  4128.0

boxplot(health ~ group, data = df)

So in both cases the successful year is higher than the unsuccessful year. The significance in the t-test is marginal, but in the ANOVA is clear.

22 October 2014 However, let’s add the mixed effects model. First, here’s a summary of the data by group:

## Source: local data frame [2 x 3]
## 
##     group     mean       sd
## 1 success 74.96629  9.17765
## 2 failure 73.50173 10.96354

And by individual (n.b. I’ve written these out to csv as all the rows don’t print to screen):

dfout <- df %>% group_by(egno, group) %>% summarise(mean = mean(health, na.rm = TRUE), sd = sd(health, na.rm = TRUE))
dfout

## Source: local data frame [110 x 4]
## Groups: egno
## 
##    egno   group     mean         sd
## 1  1001 success 81.55432  5.8707690
## 2  1001 failure 81.26708  0.6619314
## 3  1004 success 74.11819 12.1827599
## 4  1004 failure 76.33360  4.9114960
## 5  1007 success 78.49137  1.9023210
## 6  1007 failure 81.43335  2.4378604
## 7  1012 success 73.44134 11.1419490
## 8  1012 failure 74.54672  8.6347499
## 9  1013 success 72.25758 12.3557549
## 10 1013 failure 76.47423  8.7298574
## ..  ...     ...      ...        ...

write.csv(dfout, file = '/Users/rob/Dropbox/Papers/RollandEtAl_RightWhaleHealth/dataMarch2015/Summary_PairedTTestInputData.csv')

And here’s the test of health by group using individual as well as year as random effects.

library(lme4)
mod1 <- lmer(health ~ group + (1 | egno) + (1 | year), data = df)
print(mod1)

## Linear mixed model fit by REML ['lmerMod']
## Formula: health ~ group + (1 | egno) + (1 | year)
##    Data: df
## REML criterion at convergence: 46141.76
## Random effects:
##  Groups   Name        Std.Dev.
##  egno     (Intercept) 6.633   
##  year     (Intercept) 4.154   
##  Residual             7.682   
## Number of obs: 6620, groups:  egno, 55; year, 33
## Fixed Effects:
##  (Intercept)  groupfailure  
##       75.085        -1.093

dotplot(ranef(mod1, condVar = TRUE))

## $egno

## 
## $year

And then an interaction term between individual and year, though I’m thinking we should meet to talk about this approach before going too much further. In particular, I would need to sort out the paired success/failure years to explain this more.

mod2 <- lmer(health ~ group + (1 | egno) + (1 | year) + (1 | egno:year), data = df)
print(mod2)

## Linear mixed model fit by REML ['lmerMod']
## Formula: health ~ group + (1 | egno) + (1 | year) + (1 | egno:year)
##    Data: df
## REML criterion at convergence: 32229.31
## Random effects:
##  Groups    Name        Std.Dev.
##  egno:year (Intercept) 7.978   
##  egno      (Intercept) 5.990   
##  year      (Intercept) 3.045   
##  Residual              2.214   
## Number of obs: 6620, groups:  egno:year, 552; egno, 55; year, 33
## Fixed Effects:
##  (Intercept)  groupfailure  
##        74.64         -1.13

head(ranef(mod2)$`egno:year`)

##           (Intercept)
## 1001:1980   1.7279193
## 1001:1981  -0.4662437
## 1001:1984  -6.4021066
## 1001:1987   4.9639349
## 1001:1990   7.5209268
## 1004:1981   4.3457995

dotplot(ranef(mod2, condVar = TRUE))

## $`egno:year`

## 
## $egno

## 
## $year

A quick summary of these models indicates that having the interaction term is a big improvement in model fit, but again, we should probably talk more.

anova(mod1, mod2)

## refitting model(s) with ML (instead of REML)

## Data: df
## Models:
## mod1: health ~ group + (1 | egno) + (1 | year)
## mod2: health ~ group + (1 | egno) + (1 | year) + (1 | egno:year)
##      Df   AIC   BIC logLik deviance Chisq Chi Df Pr(>Chisq)    
## mod1  5 46153 46187 -23071    46143                            
## mod2  6 32245 32285 -16116    32233 13910      1  < 2.2e-16 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Health Climatology

Goal here is to plot out the average monthly health over all years to see if we can generate some sort of climatology type picture. Let’s prep the data first

pop.idx <- popAll.idx
pophealth <- mpophealth
for(i in 1:nrow(pop.idx)){
    pophealth[rownames(pophealth) == pop.idx$SightingEGNo[i], 1:(pop.idx$MinDateInt[i] - 1)] <- NA  # before first sight 
    pophealth[rownames(pophealth) == pop.idx$SightingEGNo[i], (pop.idx$MaxDateInt[i] + 1):ncol(pophealth)] <- NA  # after last time
  }
pophealth[which(pophealth < 5)] <- NA
mhealth <- pophealth[gender == 'M', ]
femhealth <- pophealth[gender == 'F', ]

mvec <- rep(NA, 12)
femquant <- mquant <- matrix(NA, nrow = 3, ncol = 12)
femvec <- rep(NA, 12)
yrs <- unique(monYr[,2])
for(i in 1:12){
  cols <- which(monYr[,1] == i)
  mvec[i] <- median(mhealth[, cols], na.rm = TRUE)
  femvec[i] <- median(femhealth[, cols], na.rm = TRUE)
femquant[,i] <- quantile(femhealth[, cols], c(0.05, 0.5, 0.95), na.rm = TRUE)
mquant[,i] <- quantile(mhealth[, cols], c(0.05, 0.5, 0.95), na.rm = TRUE)
}

And then visualise that

plot(femvec, type = 'b', col = 'grey60', ylim = c(78, 78.5))
lines(mvec, col = 'grey20')
text(6, 78.35, 'Males')
text(6, 78.1, 'Females')

pdf(file = '/Users/rob/Dropbox/Papers/RollandEtAl_RightWhaleHealth/imagesMarch2015/HealthClimatology.pdf')
plot(femvec, type = 'b', col = 'grey60', ylim = c(78, 78.5))
lines(mvec, col = 'grey20')
text(6, 78.35, 'Males')
text(6, 78.1, 'Females')
dev.off()

## pdf 
##   2