The change at this iteration is to remove 2009 from the analysis. (Change made 18 March 2015)
First thing that we wanted to look at is the removal of certain time frames of health profiles for individuals that are estimated to be dead. The easiest way to do this is to look at all surviving animals. To do that, we’ll use the death variable, which is output from the model. Anything that has a value lower than nt will be removed, and then we’ll summarise the remaining data. That then will be the lower threshold.
While we’re tallying health, we’ll then find the first instance in each dying animal’s health trace that goes below the threshold. Any health values from that point forward will be set to NA. So let’s load the data:
## Warning: package 'RColorBrewer' was built under R version 3.1.2
##
## Attaching package: 'plyr'
##
## The following object is masked from 'package:lubridate':
##
## here
##
##
## Attaching package: 'dplyr'
##
## The following objects are masked from 'package:plyr':
##
## arrange, count, desc, failwith, id, mutate, rename, summarise,
## summarize
##
## The following objects are masked from 'package:lubridate':
##
## intersect, setdiff, union
##
## The following object is masked from 'package:stats':
##
## filter
##
## The following objects are masked from 'package:base':
##
## intersect, setdiff, setequal, union
First up, what do the distribution of deaths look like?
hist(death, breaks = 50)
abline(v = nt, lwd = 3)
abline(h = c(5, 10), lty = 2)
Recall that anything over 500 on the x-axis is considered alive. Dashed horizontal lines are at 5 and 10, respectively. This seems reasonable as a first cut (to me at least). Now let’s use that information to retain only the alive individuals in order to summarise their health. To do this, we’ll simply set the dead animals to NA.
mpophealth <- sumh / ng
didx <- death < nt # these are animals imputed to die before the end of the model run
mpophealth[didx,] <- NA # I then set these to NA
mpophealth[mpophealth == 0] <- NA # I then set the times before firstSight to NA
Ok with that mpophealth variable set, we can summarise the distribution of health values for all animals that are currently imputed to be alive.
quantile(mpophealth, na.rm = TRUE)
## 0% 25% 50% 75% 100%
## 19.57524 71.26664 78.62754 83.71893 93.43222
So as you can see from the quantiles there is a minimum of 19.5752442 but that’s just one animal. If you look at the histogram, it looks like a value of 32.5 is reasonable (that’s where the vertical line is drawn).
hist(mpophealth, breaks = 30)
abline(v = 32.5, lwd = 2)
We can experiment with that value, but we’ll start there. Let’s make that a matrix (pophealth) that we can use throughout the rest of the document.
hthresh <- 32.5
mpophealth <- sumh / ng
mpophealth[mpophealth < hthresh] <- NA # I then set all values less than this threshold to NA
First thing we’ll look at are the numbers from the two comparisons we make for resting and for available females. This comparison is to look at the health of successful vs. unsuccessful females. I’ll show the plots, the statistics, and the tallies for the different groups.
Per Roz’s email (4 September 2014), we need to remove the trailing `available’ times. That is, assume that we only know the animal’s status within a known calving/reproductive window, and don’t assume that the years following a last calving are to be classified as ‘available.’ We’ll only have the first year after the last known calving be available.
To make the comparisons, we need to loop over years, and then find out two groups of individuals: 1) those available in a given year and pregnant the next year; and 2) those available in a given year, but still available in the next year.
Those data look like this:
head(pchealth)
## year navail npregNext nhealth phealth
## 1 1985 12 4 75.29701 70.88400
## 2 1986 7 2 75.44524 82.96104
## 3 1987 12 7 81.62228 74.76987
## 4 1988 10 7 82.13949 82.62835
## 5 1989 6 4 74.17950 79.94247
## 6 1990 15 7 77.16702 79.34330
In the pchealth data frame, the variables are as follows:
navail is the sum of 3 to 2, and 3 to 4 animals, i.e. resting to pregnant, resting to available, respectively.npregNext is the sum of the 3 to 2 animals.phealth is the health of the animals who became pregnantnphealth is the health of the animals who DID NOT became pregnantWe can plot the numbers easily to see how many are in each group:
## pdf
## 2
You asked for a summary of these two time series (Resting to not-pregnant; Resting to pregnant) (item # 11):
| Statistic | Not Pregnant | Pregnant |
|---|---|---|
| Mean | 75.3094502 | 74.4993437 |
| SD | 4.7498714 | 4.934093 |
I’ll write these out to a csv file for further comparison:
write.csv(pchealth, file = '/Users/rob/Dropbox/Papers/RollandEtAl_RightWhaleHealth/dataMarch2015/Summary_RestingPregnantComparison.csv')
But we also want to see if these are different. To do that, we should work with the raw data, not the mean summaries. So we need to grab those. I’ll do this by looping over year as before, but instead of grabbing the mean, I’ll just grab the whole year’s health values:
This yields a data frame where each row has one individual’s health for a given year, along with its status. Here’s what the first few lines of the data look like:
## EGNo Jan Feb Mar Apr May Jun Jul
## 1118 1118 74.47632 72.93009 71.75582 70.95635 70.52565 70.33860 70.38732
## 1127 1127 68.12994 68.80407 69.70061 70.47954 71.10782 71.65797 72.13763
## 1135 1135 65.67966 65.48355 65.30323 65.66140 66.01000 66.41044 66.90574
## 1157 1157 68.20174 68.78942 69.58654 70.69038 71.64539 72.48434 73.18712
## 1142 1142 69.41604 70.90792 72.62724 74.70894 76.48767 78.05353 79.32139
## 1145 1145 69.98919 70.57856 71.41659 72.18567 72.84624 73.45030 74.02000
## Aug Sep Oct Nov Dec year status
## 1118 70.74132 71.35974 72.25307 73.06174 73.84652 1985 32
## 1127 72.51021 72.82288 73.06947 73.19585 73.33331 1985 32
## 1135 67.54200 68.35394 69.31867 70.14101 70.81165 1985 32
## 1157 73.83711 74.47468 75.06130 75.65747 76.21642 1985 32
## 1142 80.30593 81.12743 81.65665 82.04655 82.28964 1985 34
## 1145 74.46484 74.84513 75.16298 75.48712 75.76412 1985 34
In the above, status of 32 means they were resting one year, and pregnant the next. Status of 34 means resting one year, but available the next.
Let’s plot out the densities of these to check for normality:
## pdf
## 2
They seem to be left skewed to me. But here’s the t.test comparing their means (n.b. I looked in R’s help, and these 95% CIs are for the sample estimates as we presumed yesterday) are.
t.test(as.vector(pchealthAll[pchealthAll$status == 32, 2:13]), as.vector(pchealthAll[pchealthAll$status == 34, 2:13]), alt = 'greater')
##
## Welch Two Sample t-test
##
## data: as.vector(pchealthAll[pchealthAll$status == 32, 2:13]) and as.vector(pchealthAll[pchealthAll$status == 34, 2:13])
## t = 1.1312, df = 3044.903, p-value = 0.129
## alternative hypothesis: true difference in means is greater than 0
## 95 percent confidence interval:
## -0.1607706 Inf
## sample estimates:
## mean of x mean of y
## 73.17479 72.82108
UPDATE - 12 September 2014 - We can also try breaking this down by year. Instead of running t.test, we can accomplish the same thing with a linear mixed effects model, where pregnancy status is a fixed effect and year is a random effect:
library(lme4)
## Loading required package: Matrix
## Loading required package: Rcpp
fm1 <- lmer(value ~ status + (1 | year), dfm)
print(fm1)
## Linear mixed model fit by REML ['lmerMod']
## Formula: value ~ status + (1 | year)
## Data: dfm
## REML criterion at convergence: 21727.04
## Random effects:
## Groups Name Std.Dev.
## year (Intercept) 4.233
## Residual 7.871
## Number of obs: 3108, groups: year, 24
## Fixed Effects:
## (Intercept) status
## 74.54979 -0.02007
Once it’s fitted, then we can look at the effect by year.
library(lattice)
ranef(fm1)
## $year
## (Intercept)
## 1985 0.6841645
## 1986 3.2620855
## 1987 3.6329235
## 1988 5.1451804
## 1989 4.2811386
## 1990 2.8996708
## 1991 -2.0103291
## 1992 2.7791659
## 1993 0.3698685
## 1994 3.2482927
## 1995 1.0784938
## 1996 1.4264034
## 1997 -8.0333321
## 1998 -5.9289451
## 1999 3.0814082
## 2000 -0.2251181
## 2001 5.1041826
## 2002 1.9030172
## 2003 1.9425190
## 2004 -7.1421172
## 2005 -8.0230479
## 2006 -3.0245701
## 2007 -4.7162719
## 2008 -1.7347833
dotplot(ranef(fm1, condVar = TRUE))
## $year
There’s a new package called sjPlot that allows for some more colourful visualisations of these (sjPlot package at CRAN). Let’s use that:
library(sjPlot)
## Warning: package 'sjPlot' was built under R version 3.1.2
pdf(file = '/Users/rob/Dropbox/Papers/RollandEtAl_RightWhaleHealth/imagesMarch2015/HealthComparison_PregnancySuccessfulUnsuccessful_MixedEffects.pdf')
sjp.lmer(fm1, sort.coef = TRUE)
## Plotting random effects...
dev.off()
## pdf
## 2
sjp.lmer(fm1, sort.coef = TRUE)
## Plotting random effects...
I’ll write out the file for you to have the raw data.
write.csv(pchealthAll, file = '/Users/rob/Dropbox/Papers/RollandEtAl_RightWhaleHealth/dataMarch2015/RestingPregnantComparisonRawData.csv')
Now let’s look at the available females in the same way.
## year navail npregNext nhealth phealth
## 1 1988 12 3 79.31726 75.41032
## 2 1989 12 9 75.55785 80.93117
## 3 1990 5 2 65.11144 77.99922
## 4 1991 11 4 74.54692 76.29860
## 5 1992 17 4 74.81760 70.44051
## 6 1993 27 4 75.35592 82.63014
In the avchealth data frame, the variables are as follows:
navail is the sum of 4 to 2, and 4 to 4 animals, i.e. available to pregnant, available to available, respectively.npregNext is the sum of the 4 to 2 animals.phealth is the health of the animals who became pregnantnphealth is the health of the animals who DID NOT became pregnantYou asked for a summary of these two time series (Available to not-pregnant; Available to pregnant) (item # 11):
| Statistic | Not Pregnant | Pregnant |
|---|---|---|
| Mean | 72.6505894 | 74.8440756 |
| SD | 5.9337214 | 4.9437266 |
I’ll write out the file for you:
write.csv(avchealth, file = '/Users/rob/Dropbox/Papers/RollandEtAl_RightWhaleHealth/dataMarch2015/Summary_AvailablePregnantComparison.csv')
And then we can plot the numbers in each category by year.
## pdf
## 2
And plot the yearly mean scores of health:
## pdf
## 2
## pdf
## 2
## Warning: Removed 1 rows containing missing values (geom_path).
But we want to look at the raw data as well so we’ll repeat that loop storing the actual health, not the means.
And on that dataset we’ll run the t-test:
t.test(as.vector(avchealth[avchealth$status == 42, 2:13]), as.vector(avchealth[avchealth$status == 44, 2:13]))
##
## Welch Two Sample t-test
##
## data: as.vector(avchealth[avchealth$status == 42, 2:13]) and as.vector(avchealth[avchealth$status == 44, 2:13])
## t = 4.7866, df = 4439.2, p-value = 1.751e-06
## alternative hypothesis: true difference in means is not equal to 0
## 95 percent confidence interval:
## 0.9356372 2.2337594
## sample estimates:
## mean of x mean of y
## 74.54783 72.96314
As before, there are significant differences in the ones who transition to pregnant; they are healthier than those who transition to available.
Here are the density plots of these data:
## pdf
## 2
These too are left skewed, and I’ll write them out to you for further comparison.
write.csv(avchealth, file = '/Users/rob/Dropbox/Papers/RollandEtAl_RightWhaleHealth/dataMarch2015/AvailablePregnantHealthComparisonsRawData.csv')
UPDATE - 12 September 2014 - We can also try breaking this down by year as above. Instead of running t.test, we can accomplish the same thing with a linear mixed effects model, where pregnancy status is a fixed effect and year is a random effect:
fm1 <- lmer(value ~ status + (1 | year), dfm)
print(fm1)
## Linear mixed model fit by REML ['lmerMod']
## Formula: value ~ status + (1 | year)
## Data: dfm
## REML criterion at convergence: 37516.84
## Random effects:
## Groups Name Std.Dev.
## year (Intercept) 4.415
## Residual 10.969
## Number of obs: 4908, groups: year, 24
## Fixed Effects:
## (Intercept) status
## 117.656 -1.014
Once it’s fitted, then we can look at the effect by year.
ranef(fm1)
## $year
## (Intercept)
## 1985 4.6429460
## 1986 6.3798607
## 1987 5.1514891
## 1988 4.5848792
## 1989 4.8088678
## 1990 -3.2611511
## 1991 1.3321070
## 1992 0.2508165
## 1993 3.0233744
## 1994 1.8258111
## 1995 1.3424077
## 1996 2.7957015
## 1997 -0.3070687
## 1998 -3.4182646
## 1999 -2.5918733
## 2000 -0.6535002
## 2001 0.3110793
## 2002 -5.5231103
## 2003 0.7018151
## 2004 2.2834289
## 2005 -7.6205542
## 2006 -10.1683112
## 2007 -6.6499638
## 2008 0.7592133
dotplot(ranef(fm1, condVar = TRUE))
## $year
This is per email dated 9 October to try and make the comparison of all successful vs. unsuccessful females - regardless of their status in the before year. I’ll make a line plot and a t-test. Let’s put the data together & run the t-test:
##
## Welch Two Sample t-test
##
## data: as.vector(allfemc[allfemc$svec == 1, 2:13]) and as.vector(allfemc[allfemc$svec == 0, 2:13])
## t = 4.6507, df = 7469.871, p-value = 3.364e-06
## alternative hypothesis: true difference in means is not equal to 0
## 95 percent confidence interval:
## 0.6353701 1.5612396
## sample estimates:
## mean of x mean of y
## 74.00694 72.90864
These results suggest these values are highly different, and that successful animals have a higher health. What do they look like?
## Warning: %.% is deprecated. Please use %>%
## Warning: %.% is deprecated. Please use %>%
plot(yearsum$year[yearsum$svec == 1], yearsum$health[yearsum$svec == 1], type = 'l', ylim = c(40, 90), ylab = 'Estimated Health', xlab = 'Year', lwd = 1.5, main = 'Yearly Median Health of \n Successful and Unsuccessful Females')
lines(yearsum$year[yearsum$svec == 0], yearsum$health[yearsum$svec == 0], col = 'grey')
legend(x = 'bottomleft', lwd = c(1.5, 1), legend = c('Successful', 'Unsuccessful'), col = c('black', 'grey'))
pdf(file = '/Users/rob/Dropbox/Papers/RollandEtAl_RightWhaleHealth/imagesMarch2015/HealthSummary_Successful_Unsuccessful.pdf')
plot(yearsum$year[yearsum$svec == 1], yearsum$health[yearsum$svec == 1], type = 'l', ylim = c(40, 90), ylab = 'Estimated Health', xlab = 'Year', lwd = 1.5, main = 'Yearly Median Health of \n Successful and Unsuccessful Females')
lines(yearsum$year[yearsum$svec == 0], yearsum$health[yearsum$svec == 0], col = 'grey')
legend(x = 'bottomleft', lwd = c(1.5, 1), legend = c('Successful', 'Unsuccessful'), col = c('black', 'grey'))
dev.off()
## pdf
## 2
Here let’s plot out a scatter plot of health of those available in the current year vs. the ratio of # of calves next year / # available females this year. The filled circles is the health of those who got pregnant; the empty circles are those who didn’t:
## pdf
## 2
I’m altering this section somewhat from the first version of this document. Now I want to look at the distribution of health scores in two diffrent perods: 1) the year one years prior to having a calf; and 2) the year two years prior to having a calf. This corresponds to item #10 in the most recent punch list.
First thing we need to do is grab the data and mold it into a useful form. The logic here is that we’ll take the master data frame summary containing all individual health profiles, find the times that are one year prior and then two years prior, and then set everything else to NA.
p.idx <- which(calves == 1, arr.ind = TRUE) # this is the pregnancy year
a1idx <- a2idx <- p.idx
a2idx[, 2] <- p.idx[, 2] - 12 # this is the pregnancy year minus one year
a1idx <- a1idx[which(a1idx[,2] > 0),]
a2idx <- a2idx[which(a2idx[,2] > 0),]
pop.idx <- a1idx
pophealth <- mpophealth
psub1 <- matrix(NA, dim(pophealth)[1], dim(pophealth)[2])
psub1[pop.idx] <- pophealth[pop.idx]
psub1[psub1 < 5] <- NA # This is to remove health of females that are before firstSight
pop.idx <- a2idx
pophealth <- mpophealth
psub2 <- matrix(NA, dim(pophealth)[1], dim(pophealth)[2])
psub2[pop.idx] <- pophealth[pop.idx]
psub2[psub2 < 5] <- NA
df1 <- data.frame(year = 'year1', data = quantile(psub1, c(0.05, 0.25, 0.5, 0.75, 0.95), na.rm = TRUE))
df2 <- data.frame(year = 'year2', data = quantile(psub2, c(0.05, 0.25, 0.5, 0.75, 0.95), na.rm = TRUE))
df <- cbind(df1, df2)
df
## year data year data
## 5% year1 52.68927 year2 54.85624
## 25% year1 67.52813 year2 73.98785
## 50% year1 72.22847 year2 77.70067
## 75% year1 79.06686 year2 81.65629
## 95% year1 86.67585 year2 86.40266
And then we can visualise the histogram and density plot (I’ll facet by year):
## stat_bin: binwidth defaulted to range/30. Use 'binwidth = x' to adjust this.
## stat_bin: binwidth defaulted to range/30. Use 'binwidth = x' to adjust this.
## stat_bin: binwidth defaulted to range/30. Use 'binwidth = x' to adjust this.
## stat_bin: binwidth defaulted to range/30. Use 'binwidth = x' to adjust this.
## pdf
## 2
Ok, let’s look at the population level health scores. First, I need to factor out the young juveniles, and then marry up the juvenile times with the adult times for a consecutive health profile.
## SightingEGNo min_date max_date MinDate MaxDate MinDateInt MaxDateInt
## 1 1005 1993-08-27 1997-08-23 8-1993 8-1997 284 332
## 2 1017 1988-04-03 2011-03-17 4-1988 3-2011 220 495
## 3 1019 1988-05-05 2009-07-18 5-1988 7-2009 221 475
## 4 1021 1988-04-03 1998-08-04 4-1988 8-1998 220 344
## 5 1026 1989-08-28 2009-09-03 8-1989 9-2009 176 477
## 6 1028 1988-09-02 2008-10-12 9-1988 10-2008 225 466
Then let’s get the population health data set up using these individual and time indices
And then plotted:
## pdf
## 2
Ok, then what are the summary stats of these data, and what do they look like? The Fivenum below summary contains: minimum, lower-hinge, median, upper-hinge, maximum.
fivenum(pophealth)
## [1] 32.56354 68.64349 78.65503 84.12143 88.80960
And then the histogram of the population health data:
## stat_bin: binwidth defaulted to range/30. Use 'binwidth = x' to adjust this.
## stat_bin: binwidth defaulted to range/30. Use 'binwidth = x' to adjust this.
## pdf
## 2
Note that the 0’s are likely from animals that will be estimated as dead. We can talk about pulling out the health values of the animals that the model has estimated to be dead.
Finally, let’s add the calving #’s on top of the health plot to round out the picture a little bit.
## pdf
## 2
Ok, that was the population level health. What about some of the sub-population categories? We can turn to these now. I’ll leave some of the code out since it’s just a repeat of what we did to generate the population level values. We need summaries for: * Adult males * Young juvs * Old juvs * Pregnant * Lactating * Resting * Available
## pdf
## 2
## pdf
## 2
For Young Juveniles we need to see why the values are tailing off in 2009 and 2010. Easiest way to do this is to look at summaries of the body and skin condition for these time periods. I’ll start in 2008 as that is what appears to be the peak in the late 00’s. Here’s body, followed by skin
jidx <- jd1.idx[jd1.idx$MinDateInt > 456, ]
sko <- bobs <- rep(NA, nrow(jidx))
for(i in 1:nrow(jidx)){
bobs[i] <- sum(!is.na(hobs[which(ID == jidx$SightingEGNo[i]), jidx$MinDateInt[i]:jidx$MaxDateInt[i]] ))
sko[i] <- sum(!is.na(skobs[which(ID == jidx$SightingEGNo[i]), jidx$MinDateInt[i]:jidx$MaxDateInt[i]] ))
}
jidx$bobs <- bobs
jidx$skin <- sko
jidx[,c(1,2,3,8,9)]
## SightingEGNo min_date max_date bobs skin
## 242 3701 2008-05-07 2009-09-17 0 0
## 243 3705 2008-02-03 2009-09-24 0 0
## 244 3710 2008-02-03 2009-01-30 0 0
## 245 3712 2008-02-24 2009-11-07 0 0
## 246 3714 2008-01-24 2009-12-21 2 2
## 247 3720 2008-01-13 2009-09-02 0 0
## 248 3725 2008-02-14 2009-08-28 0 0
## 249 3730 2008-02-05 2009-09-17 0 0
## 250 3740 2008-02-20 2009-12-04 0 0
## 251 3742 2008-01-28 2009-12-31 0 0
## 252 3745 2008-01-06 2009-12-22 0 0
## 253 3746 2008-02-16 2009-09-26 0 0
## 254 3750 2008-02-24 2009-09-26 0 0
## 255 3760 2008-05-01 2009-10-20 0 0
## 256 3770 2008-05-07 2009-11-07 0 0
## 257 3791 2008-01-14 2009-05-12 0 0
## 258 3792 2008-01-09 2009-12-22 0 0
## 259 3802 2009-02-12 2010-04-21 0 0
## 260 3803 2010-02-15 2010-08-19 0 0
## 261 3808 2009-02-09 2010-04-18 0 0
## 262 3810 2009-02-07 2010-04-21 0 0
## 263 3812 2009-02-13 2010-05-18 0 0
## 264 3815 2009-01-14 2010-08-31 0 0
## 265 3820 2009-03-23 2010-08-11 0 0
## 266 3821 2009-03-24 2010-04-20 0 0
## 267 3822 2009-02-08 2010-04-13 0 0
## 268 3823 2009-06-07 2010-01-15 0 0
## 269 3830 2009-01-21 2010-01-28 0 0
## 270 3832 2009-02-10 2009-09-21 0 0
## 271 3843 2009-02-13 2010-02-14 0 0
## 272 3845 2009-02-05 2010-04-20 0 0
## 273 3850 2009-02-18 2010-01-19 0 0
## 274 3853 2009-01-02 2009-12-26 0 0
## 275 3890 2009-01-10 2010-04-13 0 0
## 276 3892 2009-01-05 2010-04-13 0 0
## 277 3893 2009-02-08 2010-05-04 0 0
## 278 3903 2010-01-27 2010-02-17 0 0
## 279 3904 2010-03-14 2010-10-24 0 0
## 280 3908 2010-01-18 2011-04-22 0 0
## 281 3911 2010-02-19 2011-02-03 4 4
## 282 3915 2010-01-19 2011-03-17 0 0
## 283 3930 2010-01-25 2011-04-22 0 0
## 284 3934 2010-01-28 2010-02-17 0 0
## 285 3940 2010-01-15 2010-05-18 0 0
## 286 3945 2010-01-03 2010-04-21 0 0
## 287 3946 2011-03-17 2011-03-17 0 0
## 288 3950 2010-01-28 2010-08-22 0 0
## 289 3951 2010-01-07 2011-02-17 0 0
## 290 3966 2010-01-18 2010-05-16 0 0
## 291 3981 2010-01-28 2010-03-07 0 0
## 292 3990 2010-01-28 2010-10-20 0 0
So as you can see, there’s not a lot of VHP data here. The number in the body and skin columns corresponds to the number of cases within that date range that we have an observation for that parameter.
Next we’ll move on to looking at oldJuveniles
## pdf
## 2
And now let’s look at the different reproductive Female categories. For the pregnant females, I can use the existing time query, but I’ll alter it for lactating, resting, and available per our discussions over email. Here are the pregnant females:
## pdf
## 2
Now let’s adjust the query for the other times in the reproductive life cycle. Note that I won’t print the code. it’s hidden, but here if you want it.
With the calving status updated to remove the trailing available times, we can look at the remaining three categories:
## pdf
## 2
## pdf
## 2
## pdf
## 2
## pdf
## 2
This next block of code just plots the individual traces for 2006 for the available females, as it appears that the median for that first year is quite low (shown in the preceding plot).
avchealth <- numeric(0)
idx <- which(monYr[,2] == 2006)
cavpreg <- unique(which(newCalves[, idx] == 4 ,arr.ind = TRUE)[,1])
df <- mpophealth[cavpreg, idx]
colnames(df) <- 1:12
df <- data.frame(df, ID = row.names(df))
dfL <- melt(df, id.vars = 'ID', measure.vars = 1:12)
p <- ggplot(data = dfL, aes(x = variable, y = value, group = ID))+
geom_line()
print(p)
Indeed if you look at the health traces for 2006, you can see that there are 4 animals with January 2006 health values below 50. Three animals start in the mid-50’s, but go on to recover throughout the year.
The correlation between health of all available females in one year and calves/year (in the following year) is set up with this chunk of code:
cvec <- table(calfTable$Calving.Year)
cvec <- cvec[which(names(cvec) == 1986):(length(cvec)-1)]
health <- dfAVfem[dfAVfem$category == 'avfemHealth',-c(1, ncol(dfAVfem))]
cvs <- data.frame(calfyr = names(cvec), numcalf = cvec)
hvs <- data.frame(hyear = names(health), hvals = t(health))
dfcor <- cbind(hvs[1:24,], cvs)
head(dfcor)
## hyear X1 calfyr numcalf
## 1985 1985 82.61077 1986 13
## 1986 1986 84.87700 1987 11
## 1987 1987 85.18796 1988 7
## 1988 1988 86.04096 1989 16
## 1989 1989 86.48974 1990 12
## 1990 1990 84.29497 1991 17
# cor(dfcor[,'X1'], dfcor[,'numcalf'], use = "complete.obs")
Note that I’ve done this quickly, and I’m somewhat tired. So I’ll need to go back and check the logic. But it looks right so far. The correlation is weak (-0.13357), but I still like Scott’s line of thinking about making the link (at least in discussion form) about this and pop consequences.
20 October 2014 - adding in a two year lag:
cvec <- table(calfTable$Calving.Year)
cvec <- cvec[which(names(cvec) == 1987):(length(cvec))]
cvs <- data.frame(calfyr = names(cvec), numcalf = cvec)
dfcor <- cbind(hvs, cvs)
head(dfcor)
## hyear X1 calfyr numcalf
## 1985 1985 82.61077 1987 11
## 1986 1986 84.87700 1988 7
## 1987 1987 85.18796 1989 16
## 1988 1988 86.04096 1990 12
## 1989 1989 86.48974 1991 17
## 1990 1990 84.29497 1992 12
Now the two year correlation is: (-0.2728044).
This corresponds to item #10 in the punch list. We have to generate the data by determining the individuals who gave birth in any given year, and then looking at their health one and two years prior. We’ll loop over years to do this.
cvec <- table(calfTable$Calving.Year)
cvec <- cvec[which(names(cvec) == 1985):which(names(cvec) == 2008)]
yrseq <- seq(1985, 2008)
health1 <- health2 <- rep(NA, length.out = length(yrseq))
for(i in 1:length(yrseq)){
inds <- unique(calfTable[calfTable$Calving.Year == yrseq[i], 'EGNo']) # These are the moms who gave birth in one year
r.idx <- which(rownames(mpophealth) %in% inds) # these are their positions in the health data frame
yr1idx <- which(monYr[,2] == yrseq[i]) - 12 # indices to get health 1 year prior
yr2idx <- which(monYr[,2] == yrseq[i]) - 24 # indices to get health 2 years prior
health1[i] <- mean(mpophealth[r.idx, yr1idx], na.rm = TRUE)
health2[i] <- mean(mpophealth[r.idx, yr2idx], na.rm = TRUE)
}
df <- data.frame(year = yrseq, calves = cvec, hYear1 = health1, hYear2 = health2)
head(df)
## year calves hYear1 hYear2
## 1985 1985 11 72.31036 76.79529
## 1986 1986 13 79.63716 80.81080
## 1987 1987 11 79.94732 76.81785
## 1988 1988 7 74.60439 77.65860
## 1989 1989 16 80.03650 79.38941
## 1990 1990 12 74.18565 75.20832
You asked for a correlation for these. For # Calves and health one year prior, the correlation is: -0.303643. For # Calves and health two years prior, the correlation is: -0.2037859.
I’ll write these out to you for work in excel.
write.csv(df, file = '/Users/rob/Dropbox/Papers/RollandEtAl_RightWhaleHealth/dataMarch2015/CalvesHealthOneYearTwoYearPrior.csv')
This is a new test to see if the different groups are statistically different. We’ll run an ANOVA on the different groups yearly avaerages. If this isn’t quite what you are after, let me know. 14 October 2014 n.b. I’ve add a Tukey Honest Significant Differences test here and all subsequent ones as well.
yrseq <- seq(from = 1980, to = 2008)
health <- c(adMalesHealth, yjuvsHealth, ojuvsHealth, pfemHealth, lacfemHealth, restfemHealth, avfemHealth)
group <- rep(c('AdMales', 'yJuvs', 'oJuvs', 'pfem', 'lacfem', 'restfem', 'avfem'), each = length(lacfemHealth))
yvec <- rep(yrseq, times = 7)
df <- data.frame(health = health,
group = group,
year = yvec)
aov.ex1 = aov(health ~ group, data = df)
summary(aov.ex1)
## Df Sum Sq Mean Sq F value Pr(>F)
## group 6 1940 323.4 15.56 1.59e-14 ***
## Residuals 192 3990 20.8
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 4 observations deleted due to missingness
boxplot(health ~ group, data = df)
print(model.tables(aov.ex1,"means"),digits=3)
## Tables of means
## Grand mean
##
## 78.38016
##
## group
## AdMales avfem lacfem oJuvs pfem restfem yJuvs
## 79.2 77.7 73.8 81.7 78 75.3 83.4
## rep 27.0 29.0 29.0 28.0 29 29.0 28.0
phoc <- TukeyHSD(aov.ex1, 'group', conf.level = 0.95)
outtsd <- data.frame(phoc$group, sigdiff = phoc$group[,4] < 0.05)
outtsd
## diff lwr upr p.adj sigdiff
## avfem-AdMales -1.5031663 -5.1357938 2.1294612 8.804289e-01 FALSE
## lacfem-AdMales -5.4615150 -9.0941425 -1.8288875 2.552239e-04 TRUE
## oJuvs-AdMales 2.4575236 -1.2062462 6.1212934 4.188393e-01 FALSE
## pfem-AdMales -1.2297691 -4.8623965 2.4028584 9.515147e-01 FALSE
## restfem-AdMales -3.9400121 -7.5726396 -0.3073846 2.399035e-02 TRUE
## yJuvs-AdMales 4.1652174 0.5014476 7.8289872 1.473770e-02 TRUE
## lacfem-avfem -3.9583487 -7.5255181 -0.3911793 1.905323e-02 TRUE
## oJuvs-avfem 3.9606899 0.3618117 7.5595681 2.073614e-02 TRUE
## pfem-avfem 0.2733972 -3.2937722 3.8405666 9.999879e-01 FALSE
## restfem-avfem -2.4368458 -6.0040152 1.1303236 3.957851e-01 FALSE
## yJuvs-avfem 5.6683837 2.0695055 9.2672619 1.029963e-04 TRUE
## oJuvs-lacfem 7.9190386 4.3201604 11.5179168 1.036533e-08 TRUE
## pfem-lacfem 4.2317459 0.6645765 7.7989153 9.086635e-03 TRUE
## restfem-lacfem 1.5215028 -2.0456665 5.0886722 8.643503e-01 FALSE
## yJuvs-lacfem 9.6267324 6.0278542 13.2256106 2.865042e-12 TRUE
## pfem-oJuvs -3.6872927 -7.2861709 -0.0884145 4.069068e-02 TRUE
## restfem-oJuvs -6.3975357 -9.9964139 -2.7986576 6.615662e-06 TRUE
## yJuvs-oJuvs 1.7076938 -1.9226162 5.3380038 8.003959e-01 FALSE
## restfem-pfem -2.7102431 -6.2774124 0.8569263 2.670738e-01 FALSE
## yJuvs-pfem 5.3949865 1.7961083 8.9938647 2.695163e-04 TRUE
## yJuvs-restfem 8.1052295 4.5063514 11.7041077 4.432938e-09 TRUE
write.csv(outtsd, file = '/Users/rob/Dropbox/Papers/RollandEtAl_RightWhaleHealth/dataMarch2015/TukeyPostHocTest_AllSubGroupsAllYears.csv')
pdf(file = '/Users/rob/Dropbox/Papers/RollandEtAl_RightWhaleHealth/imagesMarch2015/anovaBoxplot_results.pdf')
boxplot(health ~ group, data = df)
dev.off()
## pdf
## 2
Let’s add in a comparison of the health in these different periods for all sub-pop categories.
So now we want to examine the median health of animals in different sub groups during these 4 time periods (Updated 20 March 2015):
We also want them for the three decades. So let’s set these up as indices we can recycle below:
idx92 <- which(monYr[,2] == 1993 | monYr[,2] == 1994 | monYr[,2] == 1995)
idx97 <- which(monYr[,2] == 1998 | monYr[,2] == 1999 | monYr[,2] == 2000)
idx07 <- which(monYr[,2] == 2001 | monYr[,2] == 2002 | monYr[,2] == 2003) # n.b. this was 2007-2009; I've changed the years, but kept the index name (idx07) unchanged
idx04 <- which(monYr[,2] == 2004 | monYr[,2] == 2005 | monYr[,2] == 2006)
dec1 <- which(monYr[,2] >= 1980 & monYr[,2] <= 1989)
dec2 <- which(monYr[,2] >= 1990 & monYr[,2] <= 1999)
dec3 <- which(monYr[,2] >= 2000 & monYr[,2] <= 2008)
First let’s look at the population level health, and show it with density plots of the yearly average
## pdf
## 2
Then we can look at the ANOVA for these periods:
| Year | Mean | SD |
|---|---|---|
| 1993-1995 | 77.2801165 | 10.9607635 |
| 1998-2000 | 70.8345212 | 12.7826476 |
| 2001-2003 | 73.248665 | 12.446059 |
| 2004-2006 | 71.4192843 | 12.330317 |
## Df Sum Sq Mean Sq F value Pr(>F)
## group 3 209518 69839 471.3 <2e-16 ***
## Residuals 35006 5187694 148
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## Tables of means
## Grand mean
##
## 73.08135
##
## group
## 1993-1995 1998-2000 2001-2003 2004-2006
## 77.3 70.8 73.2 71.4
## rep 7962.0 8259.0 8940.0 9849.0
## diff lwr upr p.adj sigdiff
## 1998-2000-1993-1995 -6.4455953 -6.9367852 -5.954405 0.00000e+00 TRUE
## 2001-2003-1993-1995 -4.0314516 -4.5133712 -3.549532 0.00000e+00 TRUE
## 2004-2006-1993-1995 -5.8608322 -6.3321593 -5.389505 0.00000e+00 TRUE
## 2001-2003-1998-2000 2.4141437 1.9368294 2.891458 0.00000e+00 TRUE
## 2004-2006-1998-2000 0.5847631 0.1181458 1.051380 7.03345e-03 TRUE
## 2004-2006-2001-2003 -1.8293806 -2.2862294 -1.372532 4.22995e-14 TRUE
And then for the three decades (n.b. this was added 20 October 2014):
| Year | Mean | SD |
|---|---|---|
| 1980-1989 | 79.8493781 | 9.0890584 |
| 1990-1999 | 75.2044962 | 12.2343087 |
| 2000-2008 | 72.5024043 | 12.6250179 |
## Df Sum Sq Mean Sq F value Pr(>F)
## group 2 452015 226007 1589 <2e-16 ***
## Residuals 66301 9429433 142
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## Tables of means
## Grand mean
##
## 74.83423
##
## group
## Decade1 Decade2 Decade3
## 79.8 75.2 72.5
## rep 11568.0 25765.0 28971.0
## diff lwr upr p.adj sigdiff
## Decade2-Decade1 -4.644882 -4.957697 -4.332067 0 TRUE
## Decade3-Decade1 -7.346974 -7.654379 -7.039569 0 TRUE
## Decade3-Decade2 -2.702092 -2.941437 -2.462747 0 TRUE
Then we’ll look at Adult Males, juveniles, pregnant, lactating, resting, & available females. Note that I’ve updated this chunk of code to include the mean’s and SDs in the different periods. Between the plots I’ll put in tables with the values (means & SDs). I’ll also put in an ANOVA table of the health in the different periods.
## pdf
## 2
| Year | Mean | SD |
|---|---|---|
| 1993-1995 | 77.4949231 | 10.8418349 |
| 1998-2000 | 71.0306924 | 12.9644604 |
| 2001-2003 | 74.624761 | 12.1165273 |
| 2004-2006 | 72.1952074 | 12.1114213 |
## Df Sum Sq Mean Sq F value Pr(>F)
## group 3 217168 72389 496.9 <2e-16 ***
## Residuals 39867 5807314 146
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## Tables of means
## Grand mean
##
## 73.63864
##
## group
## 1993-1995 1998-2000 2001-2003 2004-2006
## 77.5 71 74.6 72.2
## rep 8063.0 9024 10425.0 12359.0
## diff lwr upr p.adj sigdiff
## 1998-2000-1993-1995 -6.464231 -6.9393857 -5.989076 0.000000e+00 TRUE
## 2001-2003-1993-1995 -2.870162 -3.3300044 -2.410320 0.000000e+00 TRUE
## 2004-2006-1993-1995 -5.299716 -5.7435894 -4.855842 0.000000e+00 TRUE
## 2001-2003-1998-2000 3.594069 3.1482467 4.039890 0.000000e+00 TRUE
## 2004-2006-1998-2000 1.164515 0.7351828 1.593847 1.928879e-11 TRUE
## 2004-2006-2001-2003 -2.429554 -2.8418747 -2.017232 0.000000e+00 TRUE
| Year | Mean | SD |
|---|---|---|
| 1980-1989 | 80.1517652 | 8.9888455 |
| 1990-1999 | 75.2501665 | 12.2764325 |
| 2000-2008 | 73.2388538 | 12.4671301 |
## Df Sum Sq Mean Sq F value Pr(>F)
## group 2 439180 219590 1554 <2e-16 ***
## Residuals 74553 10532987 141
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## Tables of means
## Grand mean
##
## 75.12042
##
## group
## Decade1 Decade2 Decade3
## 80.2 75.3 73.2
## rep 12424.0 27045.0 35087.0
## diff lwr upr p.adj sigdiff
## Decade2-Decade1 -4.901599 -5.203524 -4.599674 0 TRUE
## Decade3-Decade1 -6.912911 -7.203741 -6.622082 0 TRUE
## Decade3-Decade2 -2.011313 -2.236729 -1.785896 0 TRUE
## pdf
## 2
| Year | Mean | SD |
|---|---|---|
| 1993-1995 | 76.8178578 | 11.0618823 |
| 1998-2000 | 70.8429764 | 13.2566712 |
| 2001-2003 | 73.1273757 | 12.9402922 |
| 2004-2006 | 71.2243791 | 12.2401932 |
## Df Sum Sq Mean Sq F value Pr(>F)
## group 3 157239 52413 339.9 <2e-16 ***
## Residuals 29231 4507140 154
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## Tables of means
## Grand mean
##
## 72.92154
##
## group
## 1993-1995 1998-2000 2001-2003 2004-2006
## 76.8 70.8 73.1 71.2
## rep 6823.0 7095.0 7440.0 7877.0
## diff lwr upr p.adj sigdiff
## 1998-2000-1993-1995 -5.9748814 -6.5157886 -5.4339743 0.000000e+00 TRUE
## 2001-2003-1993-1995 -3.6904821 -4.2252059 -3.1557584 0.000000e+00 TRUE
## 2004-2006-1993-1995 -5.5934787 -6.1210592 -5.0658982 0.000000e+00 TRUE
## 2001-2003-1998-2000 2.2843993 1.7550492 2.8137495 1.776357e-14 TRUE
## 2004-2006-1998-2000 0.3814027 -0.1407307 0.9035361 2.381108e-01 FALSE
## 2004-2006-2001-2003 -1.9029966 -2.4187216 -1.3872717 3.419487e-14 TRUE
| Year | Mean | SD |
|---|---|---|
| 1980-1989 | 80.9408568 | 8.4229244 |
| 1990-1999 | 75.1112892 | 12.3216573 |
| 2000-2008 | 72.3747899 | 12.6500923 |
## Df Sum Sq Mean Sq F value Pr(>F)
## group 2 756642 378321 2868 <2e-16 ***
## Residuals 63941 8434830 132
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## Tables of means
## Grand mean
##
## 75.77848
##
## group
## Decade1 Decade2 Decade3
## 80.9 75.1 72.4
## rep 18060.0 23001.0 22883.0
## diff lwr upr p.adj sigdiff
## Decade2-Decade1 -5.829568 -6.097197 -5.561939 0 TRUE
## Decade3-Decade1 -8.566067 -8.833999 -8.298135 0 TRUE
## Decade3-Decade2 -2.736499 -2.987833 -2.485165 0 TRUE
## pdf
## 2
| Year | Mean | SD |
|---|---|---|
| 1993-1995 | 77.3554233 | 10.9532869 |
| 1998-2000 | 70.8807186 | 13.1351405 |
| 2001-2003 | 72.9849625 | 12.9271123 |
| 2004-2006 | 71.0912898 | 12.5322869 |
## Df Sum Sq Mean Sq F value Pr(>F)
## group 3 212111 70704 458.9 <2e-16 ***
## Residuals 31461 4847615 154
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## Tables of means
## Grand mean
##
## 73.07109
##
## group
## 1993-1995 1998-2000 2001-2003 2004-2006
## 77.4 70.9 73 71.1
## rep 7821.0 7583.0 7868 8193.0
## diff lwr upr p.adj sigdiff
## 1998-2000-1993-1995 -6.4747048 -6.9886447 -5.960765 0.000000e+00 TRUE
## 2001-2003-1993-1995 -4.3704609 -4.8796529 -3.861269 0.000000e+00 TRUE
## 2004-2006-1993-1995 -6.2641336 -6.7682659 -5.760001 0.000000e+00 TRUE
## 2001-2003-1998-2000 2.1042439 1.5910602 2.617428 3.830269e-14 TRUE
## 2004-2006-1998-2000 0.2105712 -0.2975926 0.718735 7.111623e-01 FALSE
## 2004-2006-2001-2003 -1.8936727 -2.3970341 -1.390311 2.853273e-14 TRUE
| Year | Mean | SD |
|---|---|---|
| 1980-1989 | 80.9097955 | 8.5935547 |
| 1990-1999 | 75.5472881 | 12.2218053 |
| 2000-2008 | 72.1942545 | 12.9012384 |
## Df Sum Sq Mean Sq F value Pr(>F)
## group 2 807647 403824 2989 <2e-16 ***
## Residuals 68777 9292524 135
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## Tables of means
## Grand mean
##
## 75.79107
##
## group
## Decade1 Decade2 Decade3
## 80.9 75.5 72.2
## rep 18571.0 25509.0 24700.0
## diff lwr upr p.adj sigdiff
## Decade2-Decade1 -5.362507 -5.625294 -5.099720 0 TRUE
## Decade3-Decade1 -8.715541 -8.980135 -8.450947 0 TRUE
## Decade3-Decade2 -3.353034 -3.596222 -3.109845 0 TRUE
## pdf
## 2
| Year | Mean | SD |
|---|---|---|
| 1993-1995 | 78.9290017 | 6.5579373 |
| 1998-2000 | 76.740339 | 7.6905249 |
| 2001-2003 | 74.8905612 | 8.8790286 |
| 2004-2006 | 71.2796643 | 8.8010568 |
## Df Sum Sq Mean Sq F value Pr(>F)
## group 3 18956 6319 92.3 <2e-16 ***
## Residuals 2337 159979 68
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## Tables of means
## Grand mean
##
## 74.64181
##
## group
## 1993-1995 1998-2000 2001-2003 2004-2006
## 78.9 76.7 74.9 71.3
## rep 409.0 432.0 660.0 840.0
## diff lwr upr p.adj sigdiff
## 1998-2000-1993-1995 -2.188663 -3.656165 -0.7211600 7.454201e-04 TRUE
## 2001-2003-1993-1995 -4.038440 -5.377006 -2.6998749 7.509549e-13 TRUE
## 2004-2006-1993-1995 -7.649337 -8.931858 -6.3668168 6.425971e-13 TRUE
## 2001-2003-1998-2000 -1.849778 -3.166160 -0.5333959 1.758820e-03 TRUE
## 2004-2006-1998-2000 -5.460675 -6.720025 -4.2013247 6.581402e-13 TRUE
## 2004-2006-2001-2003 -3.610897 -4.717313 -2.5044809 6.838974e-13 TRUE
| Year | Mean | SD |
|---|---|---|
| 1980-1989 | 77.6662504 | 7.4670201 |
| 1990-1999 | 78.1405767 | 7.1130291 |
| 2000-2008 | 73.1725745 | 10.4802416 |
## Df Sum Sq Mean Sq F value Pr(>F)
## group 2 27761 13881 167 <2e-16 ***
## Residuals 4947 411174 83
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## Tables of means
## Grand mean
##
## 75.4042
##
## group
## Decade1 Decade2 Decade3
## 77.7 78.1 73.2
## rep 1157.0 1177.0 2616.0
## diff lwr upr p.adj sigdiff
## Decade2-Decade1 0.4743263 -0.4105255 1.359178 4.198984e-01 FALSE
## Decade3-Decade1 -4.4936759 -5.2483039 -3.739048 3.692008e-09 TRUE
## Decade3-Decade2 -4.9680022 -5.7181716 -4.217833 3.692008e-09 TRUE
## pdf
## 2
| Year | Mean | SD |
|---|---|---|
| 1993-1995 | 75.0378168 | 8.9767111 |
| 1998-2000 | 75.8438726 | 5.822301 |
| 2001-2003 | 72.3256292 | 9.4715411 |
| 2004-2006 | 68.3136934 | 7.7374493 |
## Df Sum Sq Mean Sq F value Pr(>F)
## group 3 13354 4451 60.59 <2e-16 ***
## Residuals 1898 139430 73
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## Tables of means
## Grand mean
##
## 71.3502
##
## group
## 1993-1995 1998-2000 2001-2003 2004-2006
## 75 75.8 72.3 68.3
## rep 249 120.0 797.0 736.0
## diff lwr upr p.adj sigdiff
## 1998-2000-1993-1995 0.8060558 -1.643052 3.255164 8.322809e-01 FALSE
## 2001-2003-1993-1995 -2.7121876 -4.312195 -1.112180 8.124482e-05 TRUE
## 2004-2006-1993-1995 -6.7241234 -8.339838 -5.108409 2.538958e-11 TRUE
## 2001-2003-1998-2000 -3.5182434 -5.676236 -1.360251 1.694566e-04 TRUE
## 2004-2006-1998-2000 -7.5301792 -9.699843 -5.360516 2.538658e-11 TRUE
## 2004-2006-2001-2003 -4.0119358 -5.138583 -2.885288 2.539324e-11 TRUE
| Year | Mean | SD |
|---|---|---|
| 1980-1989 | 75.286746 | 9.8548663 |
| 1990-1999 | 73.4285026 | 8.96447 |
| 2000-2008 | 70.4195861 | 9.7046603 |
## Df Sum Sq Mean Sq F value Pr(>F)
## group 2 18887 9444 103.8 <2e-16 ***
## Residuals 4490 408572 91
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## Tables of means
## Grand mean
##
## 72.48821
##
## group
## Decade1 Decade2 Decade3
## 75.3 73.4 70.4
## rep 1122.0 1274.0 2097.0
## diff lwr upr p.adj sigdiff
## Decade2-Decade1 -1.858243 -2.773874 -0.9426131 6.064321e-06 TRUE
## Decade3-Decade1 -4.867160 -5.694383 -4.0399367 4.358672e-08 TRUE
## Decade3-Decade2 -3.008917 -3.803342 -2.2144908 4.358674e-08 TRUE
## pdf
## 2
| Year | Mean | SD |
|---|---|---|
| 1993-1995 | 74.5278996 | 8.958792 |
| 1998-2000 | 69.4644672 | 9.7953215 |
| 2001-2003 | 74.8661268 | 8.2959392 |
| 2004-2006 | 67.4387221 | 10.099468 |
## Df Sum Sq Mean Sq F value Pr(>F)
## group 3 20132 6711 76.03 <2e-16 ***
## Residuals 1748 154282 88
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## Tables of means
## Grand mean
##
## 70.96658
##
## group
## 1993-1995 1998-2000 2001-2003 2004-2006
## 74.5 69.5 74.9 67.4
## rep 264.0 305.0 497.0 686.0
## diff lwr upr p.adj sigdiff
## 1998-2000-1993-1995 -5.0634323 -7.094286 -3.0325785 1.114780e-09 TRUE
## 2001-2003-1993-1995 0.3382272 -1.501640 2.1780949 9.650704e-01 FALSE
## 2004-2006-1993-1995 -7.0891775 -8.838912 -5.3394428 5.060841e-12 TRUE
## 2001-2003-1998-2000 5.4016596 3.644410 7.1589093 5.098810e-12 TRUE
## 2004-2006-1998-2000 -2.0257451 -3.688389 -0.3631012 9.514004e-03 TRUE
## 2004-2006-2001-2003 -7.4274047 -8.850476 -6.0043338 5.023204e-12 TRUE
| Year | Mean | SD |
|---|---|---|
| 1980-1989 | 77.2302079 | 7.293165 |
| 1990-1999 | 72.0866061 | 10.249516 |
| 2000-2008 | 70.5768716 | 9.34073 |
## Df Sum Sq Mean Sq F value Pr(>F)
## group 2 27801 13900 163.9 <2e-16 ***
## Residuals 3929 333225 85
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## Tables of means
## Grand mean
##
## 72.71896
##
## group
## Decade1 Decade2 Decade3
## 77.2 72.1 70.6
## rep 968.0 1313.0 1651.0
## diff lwr upr p.adj sigdiff
## Decade2-Decade1 -5.143602 -6.058321 -4.2288830 2.117008e-08 TRUE
## Decade3-Decade1 -6.653336 -7.527417 -5.7792556 2.117008e-08 TRUE
## Decade3-Decade2 -1.509735 -2.308149 -0.7113204 2.837961e-05 TRUE
## pdf
## 2
| Year | Mean | SD |
|---|---|---|
| 1993-1995 | 75.7572333 | 11.2670512 |
| 1998-2000 | 71.3794323 | 10.2091415 |
| 2001-2003 | 73.9695108 | 12.013284 |
| 2004-2006 | 67.1271768 | 15.1667478 |
## Df Sum Sq Mean Sq F value Pr(>F)
## group 3 23700 7900 59.93 <2e-16 ***
## Residuals 3092 407593 132
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## Tables of means
## Grand mean
##
## 72.60122
##
## group
## 1993-1995 1998-2000 2001-2003 2004-2006
## 75.8 71.4 74 67.1
## rep 984.0 1332.0 408 372.0
## diff lwr upr p.adj sigdiff
## 1998-2000-1993-1995 -4.377801 -5.6183641 -3.1372379 0.0000000000 TRUE
## 2001-2003-1993-1995 -1.787723 -3.5254883 -0.0499568 0.0409821745 TRUE
## 2004-2006-1993-1995 -8.630056 -10.4262787 -6.8338343 0.0000000000 TRUE
## 2001-2003-1998-2000 2.590078 0.9201738 4.2599831 0.0003983173 TRUE
## 2004-2006-1998-2000 -4.252256 -5.9829103 -2.5216007 0.0000000000 TRUE
## 2004-2006-2001-2003 -6.842334 -8.9579897 -4.7266783 0.0000000000 TRUE
| Year | Mean | SD |
|---|---|---|
| 1980-1989 | 78.8526422 | 4.9647962 |
| 1990-1999 | 73.5188655 | 11.6230927 |
| 2000-2008 | 71.1501121 | 12.933057 |
## Df Sum Sq Mean Sq F value Pr(>F)
## group 2 31359 15679 124.3 <2e-16 ***
## Residuals 5097 643149 126
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## Tables of means
## Grand mean
##
## 73.72157
##
## group
## Decade1 Decade2 Decade3
## 78.9 73.5 71.2
## rep 828.0 2844.0 1428.0
## diff lwr upr p.adj sigdiff
## Decade2-Decade1 -5.333777 -6.373695 -4.293858 0.000000e+00 TRUE
## Decade3-Decade1 -7.702530 -8.852849 -6.552211 0.000000e+00 TRUE
## Decade3-Decade2 -2.368753 -3.222865 -1.514642 2.579519e-10 TRUE
Ok, we need to make the data to write to excel. What we want is for each row to correspond to a population or sub-population category, and then each column to have the yearly summary of health (with CIs). I’ll copy this file, along with the others, to a “data” folder in the main dropbox paper folder.
dfAll <- rbind(dfPop, dfadMales, dfyjuves, dfojuves, dfpregfem, dflacfem, dfrestfem, dfavfem)
write.csv(dfAll, file = '/Users/rob/Dropbox/Papers/RollandEtAl_RightWhaleHealth/dataMarch2015/YearlyHealthSummariesByGroup.csv')
Ok, this is just a one-off test of doing a comparison of the ‘before’ and ‘after’ test for one individual. In this case the before will be either resting or available years, and the after will be pregnant years. So 1245 has had 4 calves in these years:
calfTable[calfTable$EGNo == 1245,]
## EGNo Calving.Year CalfNo CalvingInterval
## 167 1245 1996 2645 NA
## 168 1245 2001 3145 5
## 169 1245 2005 3545 4
## 170 1245 2008 3845 3
idx <- which(ID == 1245)
df <- data.frame(status = newCalves[idx,], dates = as.Date(paste(monYr[,1], 01 , monYr[,2], sep = '/'), format = "%m/%d/%Y"))
df <- df[which(df$status != 0),]
p <- ggplot(df, aes(dates, status, group = status, colour = factor(status)))+
geom_point()+
labs(title = paste('EGNo: ', ID[idx], sep = ''), colour = 'status')+
scale_colour_discrete(labels = c('Calving', 'Pregnant', 'Resting', 'Available'))
p
Ok, let’s set up the vectors to get her health in these periods. What I will define as the successful transitions are either resting to pregnant (3 to 2) or available to pregnant (4 to 2). The unsuccessful transitions will be resting to available (3 to 4) or available to available (4 to 4).
Ok, now that we have the indices set up, let’s get the health enumerated.
Phew. Now that that is behind us, let’s examine health in good versus bad years for EGNo 1245:
t.test(success, failure, paired = TRUE)
##
## Paired t-test
##
## data: success and failure
## t = 4.4319, df = 23, p-value = 0.0001918
## alternative hypothesis: true difference in means is not equal to 0
## 95 percent confidence interval:
## 8.561647 23.550197
## sample estimates:
## mean of the differences
## 16.05592
This is item #12 from the punch list. This time it’s a proper paired t-test.
And then let’s run the t-test. Here I’ll run two. The first is a simple paired with the idea that the means are different. The second specifies that the alternative hypothesis is that the health in success years is greater than health in failure years.
t.test(success.tall, failure.tall, paired = TRUE)
##
## Paired t-test
##
## data: success.tall and failure.tall
## t = 1.6379, df = 54, p-value = 0.1073
## alternative hypothesis: true difference in means is not equal to 0
## 95 percent confidence interval:
## -0.299517 2.972745
## sample estimates:
## mean of the differences
## 1.336614
t.test(success.tall, failure.tall, paired = TRUE, alt = "greater")
##
## Paired t-test
##
## data: success.tall and failure.tall
## t = 1.6379, df = 54, p-value = 0.05363
## alternative hypothesis: true difference in means is greater than 0
## 95 percent confidence interval:
## -0.02913975 Inf
## sample estimates:
## mean of the differences
## 1.336614
I can also test these difference with an ANOVA, which will keep more of the data:
df <- rbind(success.all, failure.all)
aov.ex1 = aov(health ~ group, data = df)
summary(aov.ex1)
## Df Sum Sq Mean Sq F value Pr(>F)
## group 1 3333 3333 31.25 2.36e-08 ***
## Residuals 6618 705877 107
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 4 observations deleted due to missingness
print(model.tables(aov.ex1,"means"),digits=3)
## Tables of means
## Grand mean
##
## 74.05305
##
## group
## success failure
## 75 73.5
## rep 2492 4128.0
boxplot(health ~ group, data = df)
So in both cases the successful year is higher than the unsuccessful year. The significance in the t-test is marginal, but in the ANOVA is clear.
22 October 2014 However, let’s add the mixed effects model. First, here’s a summary of the data by group:
## Source: local data frame [2 x 3]
##
## group mean sd
## 1 success 74.96629 9.17765
## 2 failure 73.50173 10.96354
And by individual (n.b. I’ve written these out to csv as all the rows don’t print to screen):
dfout <- df %>% group_by(egno, group) %>% summarise(mean = mean(health, na.rm = TRUE), sd = sd(health, na.rm = TRUE))
dfout
## Source: local data frame [110 x 4]
## Groups: egno
##
## egno group mean sd
## 1 1001 success 81.55432 5.8707690
## 2 1001 failure 81.26708 0.6619314
## 3 1004 success 74.11819 12.1827599
## 4 1004 failure 76.33360 4.9114960
## 5 1007 success 78.49137 1.9023210
## 6 1007 failure 81.43335 2.4378604
## 7 1012 success 73.44134 11.1419490
## 8 1012 failure 74.54672 8.6347499
## 9 1013 success 72.25758 12.3557549
## 10 1013 failure 76.47423 8.7298574
## .. ... ... ... ...
write.csv(dfout, file = '/Users/rob/Dropbox/Papers/RollandEtAl_RightWhaleHealth/dataMarch2015/Summary_PairedTTestInputData.csv')
And here’s the test of health by group using individual as well as year as random effects.
library(lme4)
mod1 <- lmer(health ~ group + (1 | egno) + (1 | year), data = df)
print(mod1)
## Linear mixed model fit by REML ['lmerMod']
## Formula: health ~ group + (1 | egno) + (1 | year)
## Data: df
## REML criterion at convergence: 46141.76
## Random effects:
## Groups Name Std.Dev.
## egno (Intercept) 6.633
## year (Intercept) 4.154
## Residual 7.682
## Number of obs: 6620, groups: egno, 55; year, 33
## Fixed Effects:
## (Intercept) groupfailure
## 75.085 -1.093
dotplot(ranef(mod1, condVar = TRUE))
## $egno
##
## $year
And then an interaction term between individual and year, though I’m thinking we should meet to talk about this approach before going too much further. In particular, I would need to sort out the paired success/failure years to explain this more.
mod2 <- lmer(health ~ group + (1 | egno) + (1 | year) + (1 | egno:year), data = df)
print(mod2)
## Linear mixed model fit by REML ['lmerMod']
## Formula: health ~ group + (1 | egno) + (1 | year) + (1 | egno:year)
## Data: df
## REML criterion at convergence: 32229.31
## Random effects:
## Groups Name Std.Dev.
## egno:year (Intercept) 7.978
## egno (Intercept) 5.990
## year (Intercept) 3.045
## Residual 2.214
## Number of obs: 6620, groups: egno:year, 552; egno, 55; year, 33
## Fixed Effects:
## (Intercept) groupfailure
## 74.64 -1.13
head(ranef(mod2)$`egno:year`)
## (Intercept)
## 1001:1980 1.7279193
## 1001:1981 -0.4662437
## 1001:1984 -6.4021066
## 1001:1987 4.9639349
## 1001:1990 7.5209268
## 1004:1981 4.3457995
dotplot(ranef(mod2, condVar = TRUE))
## $`egno:year`
##
## $egno
##
## $year
A quick summary of these models indicates that having the interaction term is a big improvement in model fit, but again, we should probably talk more.
anova(mod1, mod2)
## refitting model(s) with ML (instead of REML)
## Data: df
## Models:
## mod1: health ~ group + (1 | egno) + (1 | year)
## mod2: health ~ group + (1 | egno) + (1 | year) + (1 | egno:year)
## Df AIC BIC logLik deviance Chisq Chi Df Pr(>Chisq)
## mod1 5 46153 46187 -23071 46143
## mod2 6 32245 32285 -16116 32233 13910 1 < 2.2e-16 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
Goal here is to plot out the average monthly health over all years to see if we can generate some sort of climatology type picture. Let’s prep the data first
pop.idx <- popAll.idx
pophealth <- mpophealth
for(i in 1:nrow(pop.idx)){
pophealth[rownames(pophealth) == pop.idx$SightingEGNo[i], 1:(pop.idx$MinDateInt[i] - 1)] <- NA # before first sight
pophealth[rownames(pophealth) == pop.idx$SightingEGNo[i], (pop.idx$MaxDateInt[i] + 1):ncol(pophealth)] <- NA # after last time
}
pophealth[which(pophealth < 5)] <- NA
mhealth <- pophealth[gender == 'M', ]
femhealth <- pophealth[gender == 'F', ]
mvec <- rep(NA, 12)
femquant <- mquant <- matrix(NA, nrow = 3, ncol = 12)
femvec <- rep(NA, 12)
yrs <- unique(monYr[,2])
for(i in 1:12){
cols <- which(monYr[,1] == i)
mvec[i] <- median(mhealth[, cols], na.rm = TRUE)
femvec[i] <- median(femhealth[, cols], na.rm = TRUE)
femquant[,i] <- quantile(femhealth[, cols], c(0.05, 0.5, 0.95), na.rm = TRUE)
mquant[,i] <- quantile(mhealth[, cols], c(0.05, 0.5, 0.95), na.rm = TRUE)
}
And then visualise that
plot(femvec, type = 'b', col = 'grey60', ylim = c(78, 78.5))
lines(mvec, col = 'grey20')
text(6, 78.35, 'Males')
text(6, 78.1, 'Females')
pdf(file = '/Users/rob/Dropbox/Papers/RollandEtAl_RightWhaleHealth/imagesMarch2015/HealthClimatology.pdf')
plot(femvec, type = 'b', col = 'grey60', ylim = c(78, 78.5))
lines(mvec, col = 'grey20')
text(6, 78.35, 'Males')
text(6, 78.1, 'Females')
dev.off()
## pdf
## 2