I’ll try to interweave results - tabular and graphical - for the things you’ve highlighted in the revised outline (dated 28 May 2014). This will now include information from the last punch list.
First thing that we wanted to look at is the removal of certain time frames of health profiles for individuals that are estimated to be dead. The easiest way to do this is to look at all surviving animals. To do that, we’ll use the death variable, which is output from the model. Anything that has a value lower than nt will be removed, and then we’ll summarise the remaining data. That then will be the lower threshold.
While we’re tallying health, we’ll then find the first instance in each dying animal’s health trace that goes below the threshold. Any health values from that point forward will be set to NA. So let’s load the data:
##
## Attaching package: 'plyr'
##
## The following object is masked from 'package:lubridate':
##
## here
##
##
## Attaching package: 'dplyr'
##
## The following objects are masked from 'package:plyr':
##
## arrange, desc, failwith, id, mutate, summarise, summarize
##
## The following objects are masked from 'package:lubridate':
##
## intersect, setdiff, union
##
## The following objects are masked from 'package:stats':
##
## filter, lag
##
## The following objects are masked from 'package:base':
##
## intersect, setdiff, setequal, union
First up, what do the distribution of deaths look like?
hist(death, breaks = 50)
abline(v = nt, lwd = 3)
abline(h = c(5, 10), lty = 2)
Recall that anything over 500 on the x-axis is considered alive. Dashed horizontal lines are at 5 and 10, respectively. This seems reasonable as a first cut (to me at least). Now let’s use that information to retain only the alive individuals in order to summarise their health. To do this, we’ll simply set the dead animals to NA.
mpophealth <- sumh / ng
didx <- death < nt # these are animals imputed to die before the end of the model run
mpophealth[didx,] <- NA # I then set these to NA
mpophealth[mpophealth == 0] <- NA # I then set the times before firstSight to NA
Ok with that mpophealth variable set, we can summarise the distribution of health values for all animals that are currently imputed to be alive.
quantile(mpophealth, na.rm = TRUE)
## 0% 25% 50% 75% 100%
## 19.58 71.27 78.63 83.72 93.43
So as you can see from the quantiles there is a minimum of 19.5752 but that’s just one animal. If you look at the histogram, it looks like a value of 32.5 is reasonable (that’s where the vertical line is drawn).
hist(mpophealth, breaks = 30)
abline(v = 32.5, lwd = 2)
We can experiment with that value, but we’ll start there. Let’s make that a matrix (pophealth) that we can use throughout the rest of the document.
hthresh <- 32.5
mpophealth <- sumh / ng
mpophealth[mpophealth < hthresh] <- NA # I then set all values less than this threshold to NA
First thing we’ll look at are the numbers from the two comparisons we make for resting and for available females. This comparison is to look at the health of successful vs. unsuccessful females. I’ll show the plots, the statistics, and the tallies for the different groups.
Per Roz’s email (4 September 2014), we need to remove the trailing `available’ times. That is, assume that we only know the animal’s status within a known calving/reproductive window, and don’t assume that the years following a last calving are to be classified as ‘available.’ We’ll only have the first year after the last known calving be available.
To make the comparisons, we need to loop over years, and then find out two groups of individuals: 1) those available in a given year and pregnant the next year; and 2) those available in a given year, but still available in the next year.
Those data look like this:
head(pchealth)
## year navail npregNext nhealth phealth
## 1 1985 12 4 75.30 70.88
## 2 1986 7 2 75.45 82.96
## 3 1987 12 7 81.62 74.77
## 4 1988 10 7 82.14 82.63
## 5 1989 6 4 74.18 79.94
## 6 1990 15 7 77.17 79.34
In the pchealth data frame, the variables are as follows:
navail is the sum of 3 to 2, and 3 to 4 animals, i.e. resting to pregnant, resting to available, respectively.npregNext is the sum of the 3 to 2 animals.phealth is the health of the animals who became pregnantnphealth is the health of the animals who DID NOT became pregnantWe can plot the numbers easily to see how many are in each group:
## pdf
## 2
You asked for a summary of these two time series (Resting to not-pregnant; Resting to pregnant) (item # 11):
| Statistic | Not Pregnant | Pregnant |
|---|---|---|
| Mean | 75.3095 | 74.4993 |
| SD | 4.7499 | 4.9341 |
I’ll write these out to a csv file for further comparison:
write.csv(pchealth, file = '/Users/rob/Dropbox/Papers/RollandEtAl_RightWhaleHealth/dataJuly2014/Summary_RestingPregnantComparison.csv')
But we also want to see if these are different. To do that, we should work with the raw data, not the mean summaries. So we need to grab those. I’ll do this by looping over year as before, but instead of grabbing the mean, I’ll just grab the whole year’s health values:
This yields a data frame where each row has one individual’s health for a given year, along with its status. Here’s what the first few lines of the data look like:
## EGNo Jan Feb Mar Apr May Jun Jul Aug Sep Oct
## 1118 1118 74.48 72.93 71.76 70.96 70.53 70.34 70.39 70.74 71.36 72.25
## 1127 1127 68.13 68.80 69.70 70.48 71.11 71.66 72.14 72.51 72.82 73.07
## 1135 1135 65.68 65.48 65.30 65.66 66.01 66.41 66.91 67.54 68.35 69.32
## 1157 1157 68.20 68.79 69.59 70.69 71.65 72.48 73.19 73.84 74.47 75.06
## 1142 1142 69.42 70.91 72.63 74.71 76.49 78.05 79.32 80.31 81.13 81.66
## 1145 1145 69.99 70.58 71.42 72.19 72.85 73.45 74.02 74.46 74.85 75.16
## Nov Dec year status
## 1118 73.06 73.85 1985 32
## 1127 73.20 73.33 1985 32
## 1135 70.14 70.81 1985 32
## 1157 75.66 76.22 1985 32
## 1142 82.05 82.29 1985 34
## 1145 75.49 75.76 1985 34
In the above, status of 32 means they were resting one year, and pregnant the next. Status of 34 means resting one year, but available the next.
Let’s plot out the densities of these to check for normality:
## pdf
## 2
They seem to be left skewed to me. But here’s the t.test comparing their means (n.b. I looked in R’s help, and these 95% CIs are for the sample estimates as we presumed yesterday) are. You can see that the ones that transition to pregnant are healthier:
t.test(as.vector(pchealthAll[pchealthAll$status == 32, 2:13]), as.vector(pchealthAll[pchealthAll$status == 34, 2:13]))
##
## Welch Two Sample t-test
##
## data: as.vector(pchealthAll[pchealthAll$status == 32, 2:13]) and as.vector(pchealthAll[pchealthAll$status == 34, 2:13])
## t = 1.131, df = 3045, p-value = 0.2581
## alternative hypothesis: true difference in means is not equal to 0
## 95 percent confidence interval:
## -0.2594 0.9668
## sample estimates:
## mean of x mean of y
## 73.17 72.82
I’ll write out the file for you to have the raw data.
write.csv(pchealthAll, file = '/Users/rob/Dropbox/Papers/RollandEtAl_RightWhaleHealth/dataJuly2014/RestingPregnantComparisonRawData.csv')
Now let’s look at the available females in the same way.
## year navail npregNext nhealth phealth
## 1 1985 5 2 78.82 79.23
## 2 1986 11 3 80.65 79.30
## 3 1987 13 6 78.84 79.94
## 4 1988 12 3 79.32 75.41
## 5 1989 12 9 75.56 80.93
## 6 1990 5 2 65.11 78.00
In the avchealth data frame, the variables are as follows:
navail is the sum of 4 to 2, and 4 to 4 animals, i.e. available to pregnant, available to available, respectively.npregNext is the sum of the 4 to 2 animals.phealth is the health of the animals who became pregnantnphealth is the health of the animals who DID NOT became pregnantYou asked for a summary of these two time series (Available to not-pregnant; Available to pregnant) (item # 11):
| Statistic | Not Pregnant | Pregnant |
|---|---|---|
| Mean | 73.5356 | 75.4246 |
| SD | 5.9973 | 4.8711 |
I’ll write out the file for you:
write.csv(avchealth, file = '/Users/rob/Dropbox/Papers/RollandEtAl_RightWhaleHealth/dataJuly2014/Summary_AvailablePregnantComparison.csv')
And then we can plot the numbers in each category by year.
## pdf
## 2
And plot the yearly mean scores of health:
## pdf
## 2
## pdf
## 2
## Warning: Removed 1 rows containing missing values (geom_path).
But we want to look at the raw data as well so we’ll repeat that loop storing the actual health, not the means.
And on that dataset we’ll run the t-test:
t.test(as.vector(avchealth[avchealth$status == 42, 2:13]), as.vector(avchealth[avchealth$status == 44, 2:13]))
##
## Welch Two Sample t-test
##
## data: as.vector(avchealth[avchealth$status == 42, 2:13]) and as.vector(avchealth[avchealth$status == 44, 2:13])
## t = 4.787, df = 4439, p-value = 1.751e-06
## alternative hypothesis: true difference in means is not equal to 0
## 95 percent confidence interval:
## 0.9356 2.2338
## sample estimates:
## mean of x mean of y
## 74.55 72.96
As before, there are significant differences in the ones who transition to pregnant; they are healthier than those who transition to available.
Here are the density plots of these data:
## pdf
## 2
These too are left skewed, and I’ll write them out to you for further comparison.
write.csv(avchealth, file = '/Users/rob/Dropbox/Papers/RollandEtAl_RightWhaleHealth/dataJuly2014/AvailablePregnantHealthComparisonsRawData.csv')
Here let’s plot out a scatter plot of health of those available in the current year vs. the ratio of # of calves next year / # available females this year. The filled circles is the health of those who got pregnant; the empty circles are those who didn’t:
## pdf
## 2
I’m altering this section somewhat from the first version of this document. Now I want to look at the distribution of health scores in two diffrent perods: 1) the year one years prior to having a calf; and 2) the year two years prior to having a calf. This corresponds to item #10 in the most recent punch list.
First thing we need to do is grab the data and mold it into a useful form. The logic here is that we’ll take the master data frame summary containing all individual health profiles, find the times that are one year prior and then two years prior, and then set everything else to NA.
p.idx <- which(calves == 1, arr.ind = TRUE) # this is the pregnancy year
a1idx <- a2idx <- p.idx
a2idx[, 2] <- p.idx[, 2] - 12 # this is the pregnancy year minus one year
a1idx <- a1idx[which(a1idx[,2] > 0),]
a2idx <- a2idx[which(a2idx[,2] > 0),]
pop.idx <- a1idx
pophealth <- mpophealth
psub1 <- matrix(NA, dim(pophealth)[1], dim(pophealth)[2])
psub1[pop.idx] <- pophealth[pop.idx]
psub1[psub1 < 5] <- NA # This is to remove health of females that are before firstSight
pop.idx <- a2idx
pophealth <- mpophealth
psub2 <- matrix(NA, dim(pophealth)[1], dim(pophealth)[2])
psub2[pop.idx] <- pophealth[pop.idx]
psub2[psub2 < 5] <- NA
df1 <- data.frame(year = 'year1', data = quantile(psub1, c(0.05, 0.25, 0.5, 0.75, 0.95), na.rm = TRUE))
df2 <- data.frame(year = 'year2', data = quantile(psub2, c(0.05, 0.25, 0.5, 0.75, 0.95), na.rm = TRUE))
df <- cbind(df1, df2)
df
## year data year data
## 5% year1 52.69 year2 54.86
## 25% year1 67.53 year2 73.99
## 50% year1 72.23 year2 77.70
## 75% year1 79.07 year2 81.66
## 95% year1 86.68 year2 86.40
And then we can visualise the histogram and density plot (I’ll facet by year):
## stat_bin: binwidth defaulted to range/30. Use 'binwidth = x' to adjust this.
## stat_bin: binwidth defaulted to range/30. Use 'binwidth = x' to adjust this.
## stat_bin: binwidth defaulted to range/30. Use 'binwidth = x' to adjust this.
## stat_bin: binwidth defaulted to range/30. Use 'binwidth = x' to adjust this.
## pdf
## 2
Ok, let’s look at the population level health scores. First, I need to factor out the young juveniles, and then marry up the juvenile times with the adult times for a consecutive health profile.
## SightingEGNo min_date max_date MinDate MaxDate MinDateInt MaxDateInt
## 1 1005 1993-08-27 1997-08-23 8-1993 8-1997 284 332
## 2 1017 1988-04-03 2011-03-17 4-1988 3-2011 220 495
## 3 1019 1988-05-05 2009-07-18 5-1988 7-2009 221 475
## 4 1021 1988-04-03 1998-08-04 4-1988 8-1998 220 344
## 5 1026 1989-08-28 2009-09-03 8-1989 9-2009 176 477
## 6 1028 1988-09-02 2008-10-12 9-1988 10-2008 225 466
Then let’s get the population health data set up using these individual and time indices
And then plotted:
## pdf
## 2
Ok, then what are the summary stats of these data, and what do they look like? The Fivenum below summary contains: minimum, lower-hinge, median, upper-hinge, maximum.
fivenum(pophealth)
## [1] 32.56 68.64 78.66 84.12 88.81
And then the histogram of the population health data:
## stat_bin: binwidth defaulted to range/30. Use 'binwidth = x' to adjust this.
## stat_bin: binwidth defaulted to range/30. Use 'binwidth = x' to adjust this.
## pdf
## 2
Note that the 0’s are likely from animals that will be estimated as dead. We can talk about pulling out the health values of the animals that the model has estimated to be dead.
Finally, let’s add the calving #’s on top of the health plot to round out the picture a little bit.
## pdf
## 2
Ok, that was the population level health. What about some of the sub-population categories? We can turn to these now. I’ll leave some of the code out since it’s just a repeat of what we did to generate the population level values. We need summaries for: * Adult males * Young juvs * Old juvs * Pregnant * Lactating * Resting * Available
## pdf
## 2
## pdf
## 2
For Young Juveniles we need to see why the values are tailing off in 2009 and 2010. Easiest way to do this is to look at summaries of the body and skin condition for these time periods. I’ll start in 2008 as that is what appears to be the peak in the late 00’s. Here’s body, followed by skin
jidx <- jd1.idx[jd1.idx$MinDateInt > 456, ]
sko <- bobs <- rep(NA, nrow(jidx))
for(i in 1:nrow(jidx)){
bobs[i] <- sum(!is.na(hobs[which(ID == jidx$SightingEGNo[i]), jidx$MinDateInt[i]:jidx$MaxDateInt[i]] ))
sko[i] <- sum(!is.na(skobs[which(ID == jidx$SightingEGNo[i]), jidx$MinDateInt[i]:jidx$MaxDateInt[i]] ))
}
jidx$bobs <- bobs
jidx$skin <- sko
jidx[,c(1,2,3,8,9)]
## SightingEGNo min_date max_date bobs skin
## 242 3701 2008-05-07 2009-09-17 0 0
## 243 3705 2008-02-03 2009-09-24 0 0
## 244 3710 2008-02-03 2009-01-30 0 0
## 245 3712 2008-02-24 2009-11-07 0 0
## 246 3714 2008-01-24 2009-12-21 2 2
## 247 3720 2008-01-13 2009-09-02 0 0
## 248 3725 2008-02-14 2009-08-28 0 0
## 249 3730 2008-02-05 2009-09-17 0 0
## 250 3740 2008-02-20 2009-12-04 0 0
## 251 3742 2008-01-28 2009-12-31 0 0
## 252 3745 2008-01-06 2009-12-22 0 0
## 253 3746 2008-02-16 2009-09-26 0 0
## 254 3750 2008-02-24 2009-09-26 0 0
## 255 3760 2008-05-01 2009-10-20 0 0
## 256 3770 2008-05-07 2009-11-07 0 0
## 257 3791 2008-01-14 2009-05-12 0 0
## 258 3792 2008-01-09 2009-12-22 0 0
## 259 3802 2009-02-12 2010-04-21 0 0
## 260 3803 2010-02-15 2010-08-19 0 0
## 261 3808 2009-02-09 2010-04-18 0 0
## 262 3810 2009-02-07 2010-04-21 0 0
## 263 3812 2009-02-13 2010-05-18 0 0
## 264 3815 2009-01-14 2010-08-31 0 0
## 265 3820 2009-03-23 2010-08-11 0 0
## 266 3821 2009-03-24 2010-04-20 0 0
## 267 3822 2009-02-08 2010-04-13 0 0
## 268 3823 2009-06-07 2010-01-15 0 0
## 269 3830 2009-01-21 2010-01-28 0 0
## 270 3832 2009-02-10 2009-09-21 0 0
## 271 3843 2009-02-13 2010-02-14 0 0
## 272 3845 2009-02-05 2010-04-20 0 0
## 273 3850 2009-02-18 2010-01-19 0 0
## 274 3853 2009-01-02 2009-12-26 0 0
## 275 3890 2009-01-10 2010-04-13 0 0
## 276 3892 2009-01-05 2010-04-13 0 0
## 277 3893 2009-02-08 2010-05-04 0 0
## 278 3903 2010-01-27 2010-02-17 0 0
## 279 3904 2010-03-14 2010-10-24 0 0
## 280 3908 2010-01-18 2011-04-22 0 0
## 281 3911 2010-02-19 2011-02-03 4 4
## 282 3915 2010-01-19 2011-03-17 0 0
## 283 3930 2010-01-25 2011-04-22 0 0
## 284 3934 2010-01-28 2010-02-17 0 0
## 285 3940 2010-01-15 2010-05-18 0 0
## 286 3945 2010-01-03 2010-04-21 0 0
## 287 3946 2011-03-17 2011-03-17 0 0
## 288 3950 2010-01-28 2010-08-22 0 0
## 289 3951 2010-01-07 2011-02-17 0 0
## 290 3966 2010-01-18 2010-05-16 0 0
## 291 3981 2010-01-28 2010-03-07 0 0
## 292 3990 2010-01-28 2010-10-20 0 0
So as you can see, there’s not a lot of VHP data here. The number in the body and skin columns corresponds to the number of cases within that date range that we have an observation for that parameter.
Next we’ll move on to looking at oldJuveniles
## pdf
## 2
And now let’s look at the different reproductive Female categories. For the pregnant females, I can use the existing time query, but I’ll alter it for lactating, resting, and available per our discussions over email. Here are the pregnant females:
## pdf
## 2
Now let’s adjust the query for the other times in the reproductive life cycle. Note that I won’t print the code. it’s hidden, but here if you want it.
With the calving status updated to remove the trailing available times, we can look at the remaining three categories:
## pdf
## 2
## pdf
## 2
## pdf
## 2
This next block of code just plots the individual traces for 2006 for the available females, as it appears that the median for that first year is quite low (shown in the preceding plot).
avchealth <- numeric(0)
idx <- which(monYr[,2] == 2006)
cavpreg <- unique(which(newCalves[, idx] == 4 ,arr.ind = TRUE)[,1])
df <- mpophealth[cavpreg, idx]
colnames(df) <- 1:12
df <- data.frame(df, ID = row.names(df))
dfL <- melt(df, id.vars = 'ID', measure.vars = 1:12)
p <- ggplot(data = dfL, aes(x = variable, y = value, group = ID))+
geom_line()
print(p)
Indeed if you look at the health traces for 2006, you can see that there are 4 animals with January 2006 health values below 50. Three animals start in the mid-50’s, but go on to recover throughout the year.
The correlation between health of all available females in one year and calves/year (in the following year) is set up with this chunk of code:
cvec <- table(calfTable$Calving.Year)
cvec <- cvec[which(names(cvec) == 1986):(length(cvec)-1)]
health <- dfavfem[dfavfem$category == 'avfemHealth',-c(1, ncol(dfavfem))]
cvs <- data.frame(calfyr = names(cvec), numcalf = cvec)
hvs <- data.frame(hyear = names(health), hvals = t(health))
dfcor <- cbind(hvs[1:24,], cvs)
head(dfcor)
## hyear X1 calfyr numcalf
## 1985 1985 77.48 1986 13
## 1986 1986 79.55 1987 11
## 1987 1987 78.74 1988 7
## 1988 1988 80.19 1989 16
## 1989 1989 79.31 1990 12
## 1990 1990 77.85 1991 17
# cor(dfcor[,'health'], dfcor[,'numcalf'], use = "complete.obs")
Note that I’ve done this quickly, and I’m somewhat tired. So I’ll need to go back and check the logic. But it looks right so far. The correlation is weak (-0.1173), but I still like Scott’s line of thinking about making the link (at least in discussion form) about this and pop consequences.
This corresponds to item #10 in the punch list. We have to generate the data by determining the individuals who gave birth in any given year, and then looking at their health one and two years prior. We’ll loop over years to do this.
cvec <- table(calfTable$Calving.Year)
cvec <- cvec[which(names(cvec) == 1985):(length(cvec)-1)]
yrseq <- seq(1985, 2009)
health1 <- health2 <- rep(NA, length.out = length(yrseq))
for(i in 1:length(yrseq)){
inds <- unique(calfTable[calfTable$Calving.Year == yrseq[i], 'EGNo']) # These are the moms who gave birth in one year
r.idx <- which(rownames(mpophealth) %in% inds) # these are their positions in the health data frame
yr1idx <- which(monYr[,2] == yrseq[i]) - 12 # indices to get health 1 year prior
yr2idx <- which(monYr[,2] == yrseq[i]) - 24 # indices to get health 2 years prior
health1[i] <- mean(mpophealth[r.idx, yr1idx], na.rm = TRUE)
health2[i] <- mean(mpophealth[r.idx, yr2idx], na.rm = TRUE)
}
df <- data.frame(year = yrseq, calves = cvec, hYear1 = health1, hYear2 = health2)
head(df)
## year calves hYear1 hYear2
## 1985 1985 11 72.31 76.80
## 1986 1986 13 79.64 80.81
## 1987 1987 11 79.95 76.82
## 1988 1988 7 74.60 77.66
## 1989 1989 16 80.04 79.39
## 1990 1990 12 74.19 75.21
You asked for a correlation for these. For # Calves and health one year prior, the correlation is: -0.3815. For # Calves and health two years prior, the correlation is: -0.3682.
I’ll write these out to you for work in excel.
write.csv(df, file = '/Users/rob/Dropbox/Papers/RollandEtAl_RightWhaleHealth/dataJuly2014/CalvesHealthOneYearTwoYearPrior.csv')
This is a new test to see if the different groups are statistically different. We’ll run an ANOVA on the different groups yearly avaerages. If this isn’t quite what you are after, let me know.
health <- c(adMalesHealth, yjuvsHealth, ojuvsHealth, pfemHealth, lacfemHealth, restfemHealth, avfemHealth)
group <- rep(c('AdMales', 'yJuvs', 'oJuvs', 'pfem', 'lacfem', 'restfem', 'avfem'), each = length(lacfemHealth))
df <- data.frame(health = health,
group = group)
aov.ex1 = aov(health ~ group, data = df)
summary(aov.ex1)
## Df Sum Sq Mean Sq F value Pr(>F)
## group 6 1577 262.9 12.6 1e-11 ***
## Residuals 167 3475 20.8
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 1 observation deleted due to missingness
print(model.tables(aov.ex1,"means"),digits=3)
## Tables of means
## Grand mean
##
## 77.98
##
## group
## AdMales avfem lacfem oJuvs pfem restfem yJuvs
## 78.5 77.5 73.6 81.2 77.9 74.6 82.6
## rep 25.0 24.0 25.0 25.0 25.0 25.0 25.0
boxplot(health ~ group, data = df)
So now we want to examine the median health of animals in different sub groups during 3 time periods:
We also want them for the three decades. So let’s set these up as indices we can recycle below:
idx92 <- which(monYr[,2] == 1993 | monYr[,2] == 1994 | monYr[,2] == 1995)
idx97 <- which(monYr[,2] == 1998 | monYr[,2] == 1999 | monYr[,2] == 2000)
idx04 <- which(monYr[,2] == 2004 | monYr[,2] == 2005 | monYr[,2] == 2006)
dec1 <- which(monYr[,2] >= 1980 & monYr[,2] <= 1989)
dec2 <- which(monYr[,2] >= 1990 & monYr[,2] <= 1999)
dec3 <- which(monYr[,2] >= 2000 & monYr[,2] <= 2009)
First let’s look at the population level health, and show it with density plots of the yearly average
## pdf
## 2
Then we’ll look at Adult Males, juveniles, pregnant, lactating, resting, & available females. Note that I’ve updated this chunk of code to include the mean’s and SDs in the different periods. Between the plots I’ll put in tables with the values (means & SDs). I’ll also put in an ANOVA table of the health in the different periods.
## pdf
## 2
| Year | Mean | SD |
|---|---|---|
| 1993-1995 | 77.4949 | 10.8418 |
| 1998-2000 | 71.0307 | 12.9645 |
| 2004-2006 | 72.1952 | 12.1114 |
## Df Sum Sq Mean Sq F value Pr(>F)
## group 2 203441 101720 700 <2e-16 ***
## Residuals 29443 4276964 145
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## Tables of means
## Grand mean
##
## 73.29
##
## group
## Period1 Period2 Period3
## 77.5 71 72.2
## rep 8063.0 9024 12359.0
| Year | Mean | SD |
|---|---|---|
| 1980-1989 | 80.1518 | 8.9888 |
| 1990-1999 | 75.2502 | 12.2764 |
| 2000-2009 | 73.065 | 12.7411 |
## Df Sum Sq Mean Sq F value Pr(>F)
## group 2 480106 240053 1649 <2e-16 ***
## Residuals 79308 11547177 146
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## Tables of means
## Grand mean
##
## 74.92
##
## group
## Decade1 Decade2 Decade3
## 80.2 75.3 73.1
## rep 12424.0 27045.0 39842.0
## pdf
## 2
| Year | Mean | SD |
|---|---|---|
| 1993-1995 | 76.8179 | 11.0619 |
| 1998-2000 | 70.843 | 13.2567 |
| 2004-2006 | 71.2244 | 12.2402 |
## Df Sum Sq Mean Sq F value Pr(>F)
## group 2 156816 78408 524 <2e-16 ***
## Residuals 21792 3261471 150
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## Tables of means
## Grand mean
##
## 72.85
##
## group
## Period1 Period2 Period3
## 76.8 70.8 71.2
## rep 6823.0 7095.0 7877.0
| Year | Mean | SD |
|---|---|---|
| 1980-1989 | 80.9409 | 8.4229 |
| 1990-1999 | 75.1113 | 12.3217 |
| 2000-2009 | 72.3887 | 12.7387 |
## Df Sum Sq Mean Sq F value Pr(>F)
## group 2 784196 392098 2928 <2e-16 ***
## Residuals 66628 8922320 134
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## Tables of means
## Grand mean
##
## 75.65
##
## group
## Decade1 Decade2 Decade3
## 80.9 75.1 72.4
## rep 18060.0 23001.0 25570.0
## pdf
## 2
| Year | Mean | SD |
|---|---|---|
| 1993-1995 | 77.3554 | 10.9533 |
| 1998-2000 | 70.8807 | 13.1351 |
| 2004-2006 | 71.0913 | 12.5323 |
## Df Sum Sq Mean Sq F value Pr(>F)
## group 2 212034 106017 708 <2e-16 ***
## Residuals 23594 3532958 150
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## Tables of means
## Grand mean
##
## 73.1
##
## group
## Period1 Period2 Period3
## 77.4 70.9 71.1
## rep 7821.0 7583.0 8193.0
| Year | Mean | SD |
|---|---|---|
| 1980-1989 | 80.9098 | 8.5936 |
| 1990-1999 | 75.5473 | 12.2218 |
| 2000-2009 | 72.1267 | 13.1398 |
## Df Sum Sq Mean Sq F value Pr(>F)
## group 2 862250 431125 3102 <2e-16 ***
## Residuals 72143 10027099 139
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## Tables of means
## Grand mean
##
## 75.6
##
## group
## Decade1 Decade2 Decade3
## 80.9 75.5 72.1
## rep 18571.0 25509.0 28066.0
## pdf
## 2
| Year | Mean | SD |
|---|---|---|
| 1993-1995 | 78.929 | 6.5579 |
| 1998-2000 | 76.7403 | 7.6905 |
| 2004-2006 | 71.2797 | 8.8011 |
## Df Sum Sq Mean Sq F value Pr(>F)
## group 2 18899 9450 147 <2e-16 ***
## Residuals 1678 108026 64
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## Tables of means
## Grand mean
##
## 74.54
##
## group
## Period1 Period2 Period3
## 78.9 76.7 71.3
## rep 409.0 432.0 840.0
| Year | Mean | SD |
|---|---|---|
| 1980-1989 | 77.6663 | 7.467 |
| 1990-1999 | 78.1406 | 7.113 |
| 2000-2009 | 73.137 | 10.6118 |
## Df Sum Sq Mean Sq F value Pr(>F)
## group 2 29280 14640 171 <2e-16 ***
## Residuals 5175 444107 86
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## Tables of means
## Grand mean
##
## 75.29
##
## group
## Decade1 Decade2 Decade3
## 77.7 78.1 73.1
## rep 1157.0 1177.0 2844.0
## pdf
## 2
| Year | Mean | SD |
|---|---|---|
| 1993-1995 | 75.0378 | 8.9767 |
| 1998-2000 | 75.8439 | 5.8223 |
| 2004-2006 | 68.3137 | 7.7374 |
## Df Sum Sq Mean Sq F value Pr(>F)
## group 2 12048 6024 97.6 <2e-16 ***
## Residuals 1102 68021 62
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## Tables of means
## Grand mean
##
## 70.65
##
## group
## Period1 Period2 Period3
## 75 75.8 68.3
## rep 249 120.0 736.0
| Year | Mean | SD |
|---|---|---|
| 1980-1989 | 75.2867 | 9.8549 |
| 1990-1999 | 73.4285 | 8.9645 |
| 2000-2009 | 70.5919 | 10.8319 |
## Df Sum Sq Mean Sq F value Pr(>F)
## group 2 19082 9541 92.4 <2e-16 ***
## Residuals 4958 512005 103
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## Tables of means
## Grand mean
##
## 72.38
##
## group
## Decade1 Decade2 Decade3
## 75.3 73.4 70.6
## rep 1122.0 1274.0 2565.0
## pdf
## 2
| Year | Mean | SD |
|---|---|---|
| 1993-1995 | 74.5279 | 8.9588 |
| 1998-2000 | 69.4645 | 9.7953 |
| 2004-2006 | 67.4387 | 10.0995 |
## Df Sum Sq Mean Sq F value Pr(>F)
## group 2 9581 4791 49.9 <2e-16 ***
## Residuals 1252 120146 96
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## Tables of means
## Grand mean
##
## 69.42
##
## group
## Period1 Period2 Period3
## 74.5 69.5 67.4
## rep 264.0 305.0 686.0
| Year | Mean | SD |
|---|---|---|
| 1980-1989 | 77.2302 | 7.2932 |
| 1990-1999 | 72.0866 | 10.2495 |
| 2000-2009 | 70.8518 | 10.1595 |
## Df Sum Sq Mean Sq F value Pr(>F)
## group 2 26901 13450 146 <2e-16 ***
## Residuals 4193 386819 92
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## Tables of means
## Grand mean
##
## 72.71
##
## group
## Decade1 Decade2 Decade3
## 77.2 72.1 70.9
## rep 968.0 1313.0 1915.0
## pdf
## 2
| Year | Mean | SD |
|---|---|---|
| 1993-1995 | 75.7572 | 11.2671 |
| 1998-2000 | 71.3794 | 10.2091 |
| 2004-2006 | 67.1272 | 15.1667 |
## Df Sum Sq Mean Sq F value Pr(>F)
## group 2 22821 11410 87.8 <2e-16 ***
## Residuals 2685 348855 130
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## Tables of means
## Grand mean
##
## 72.39
##
## group
## Period1 Period2 Period3
## 75.8 71.4 67.1
## rep 984.0 1332.0 372.0
| Year | Mean | SD |
|---|---|---|
| 1980-1989 | 78.8526 | 4.9648 |
| 1990-1999 | 73.5189 | 11.6231 |
| 2000-2009 | 71.1501 | 12.9331 |
## Df Sum Sq Mean Sq F value Pr(>F)
## group 2 31359 15679 124 <2e-16 ***
## Residuals 5097 643149 126
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## Tables of means
## Grand mean
##
## 73.72
##
## group
## Decade1 Decade2 Decade3
## 78.9 73.5 71.2
## rep 828.0 2844.0 1428.0
Ok, we need to make the data to write to excel. What we want is for each row to correspond to a population or sub-population category, and then each column to have the yearly summary of health (with CIs). I’ll copy this file, along with the others, to a “data” folder in the main dropbox paper folder.
dfAll <- rbind(dfPop, dfadMales, dfyjuves, dfojuves, dfpregfem, dflacfem, dfrestfem, dfavfem)
write.csv(dfAll, file = '/Users/rob/Dropbox/Papers/RollandEtAl_RightWhaleHealth/dataJuly2014/YearlyHealthSummariesByGroup.csv')
Ok, this is just a one-off test of doing a comparison of the ‘before’ and ‘after’ test for one individual. In this case the before will be either resting or available years, and the after will be pregnant years. So 1245 has had 4 calves in these years:
calfTable[calfTable$EGNo == 1245,]
## EGNo Calving.Year CalfNo CalvingInterval
## 167 1245 1996 2645 NA
## 168 1245 2001 3145 5
## 169 1245 2005 3545 4
## 170 1245 2008 3845 3
idx <- which(ID == 1245)
df <- data.frame(status = newCalves[idx,], dates = as.Date(paste(monYr[,1], 01 , monYr[,2], sep = '/'), format = "%m/%d/%Y"))
df <- df[which(df$status != 0),]
p <- ggplot(df, aes(dates, status, group = status, colour = factor(status)))+
geom_point()+
labs(title = paste('EGNo: ', ID[idx], sep = ''), colour = 'status')+
scale_colour_discrete(labels = c('Calving', 'Pregnant', 'Resting', 'Available'))
p
Ok, let’s set up the vectors to get her health in these periods. What I will define as the successful transitions are either resting to pregnant (3 to 2) or available to pregnant (4 to 2). The unsuccessful transitions will be resting to available (3 to 4) or available to available (4 to 4).
Ok, now that we have the indices set up, let’s get the health enumerated.
Phew. Now that that is behind us, let’s examine health in good versus bad years for EGNo 1245:
t.test(success, failure, paired = TRUE)
##
## Paired t-test
##
## data: success and failure
## t = 4.432, df = 23, p-value = 0.0001918
## alternative hypothesis: true difference in means is not equal to 0
## 95 percent confidence interval:
## 8.562 23.550
## sample estimates:
## mean of the differences
## 16.06
This is item #12 from the punch list. This time it’s a proper paired t-test.
femID <- unique(sights[which(sights$Age == 'A' & sights$GenderCode == 'F'), 'SightingEGNo'])
outvals <- numeric(0)
success.aov <- failure.aov <- failure <- success <- success.all <- failure.all <- numeric(0)
for(j in 1:length(femID)){
idx <- femID[j]
ridx <- which(row.names(newCalves) == idx)
df <- data.frame(year = monYr[,2], status = newCalves[ridx,], change = newCalves[ridx,] - lead(newCalves[ridx,]))
rpidx <- monYr[which(df$status == 3 & df$change == 1), 2]
apidx <- monYr[which(df$status == 4 & df$change == 2), 2]
raidx <- monYr[which(df$status == 3 & df$change == -1), 2]
avavidx <- numeric(0)
yrseq <- 1980:2009
for(i in 1:length(yrseq)){
yidx <- which(monYr[,2] == yrseq[i])
nyidx <- which(monYr[,2] == yrseq[i + 1])
cidx <- which(newCalves[ridx, yidx] == 4 & newCalves[ridx, nyidx] == 4, arr.ind = TRUE)
if(length(cidx) == 0){next} else {
avavidx <- c(avavidx, yrseq[i])
}
}
if( length(c(rpidx, apidx)) < 1 | length(c(raidx, avavidx)) < 1){next}
pophealth <- mpophealth
pophealth[which(pophealth < 5)] <- NA
if(length(rpidx) > 0){rphealth <- data.frame(health = pophealth[ridx, which(monYr[, 2] == rpidx)], status = 32)}
if(length(apidx) > 0){aphealth <- data.frame(health = pophealth[ridx, which(monYr[, 2] == apidx)], status = 42)}
if(length(rpidx) > 0 & length(apidx) > 0){success <- c(rphealth$health, aphealth$health)}
if(length(rpidx) > 0 & length(apidx) == 0){success <- c(rphealth$health)}
if(length(rpidx) == 0 & length(apidx) > 0){success <- c(aphealth$health)}
if(length(raidx) > 0){rahealth <- data.frame(health = pophealth[ridx, which(monYr[, 2] == raidx)], status = 34)}
if(length(avavidx) > 0){aahealth <- data.frame(health = pophealth[ridx, which(monYr[, 2] == avavidx)], status = 44)}
if(length(avavidx) > 0 & length(raidx) > 0){failure <- c(rahealth$health, aahealth$health)}
if(length(avavidx) > 0 & length(raidx) == 0){failure <- c(aahealth$health)}
if(length(avavidx) == 0 & length(raidx) > 0){failure <- c(rahealth$health)}
success.all <- c(success.all, mean(success, na.rm = TRUE))
failure.all <- c(failure.all, mean(failure, na.rm = TRUE))
success.aov <- c(success.aov, success)
failure.aov <- c(failure.aov, failure)
}
And then let’s run the t-test. Here I’ll run two. The first is a simple paired with the idea that the means are different. The second specifies that the alternative hypothesis is that the health in success years is greater than health in failure years.
t.test(success.all, failure.all, paired = TRUE)
##
## Paired t-test
##
## data: success.all and failure.all
## t = 1.641, df = 54, p-value = 0.1066
## alternative hypothesis: true difference in means is not equal to 0
## 95 percent confidence interval:
## -0.3252 3.2603
## sample estimates:
## mean of the differences
## 1.468
t.test(success.all, failure.all, paired = TRUE, alt = "greater")
##
## Paired t-test
##
## data: success.all and failure.all
## t = 1.641, df = 54, p-value = 0.05328
## alternative hypothesis: true difference in means is greater than 0
## 95 percent confidence interval:
## -0.02894 Inf
## sample estimates:
## mean of the differences
## 1.468
I can also test these difference with an ANOVA, which will keep more of the data:
df <- data.frame(health = c(success.aov, failure.aov), group = rep(c('success', 'failure'), times = c(length(success.aov), length(failure.aov))))
aov.ex1 = aov(health ~ group, data = df)
summary(aov.ex1)
## Df Sum Sq Mean Sq F value Pr(>F)
## group 1 3094 3094 30.4 4e-08 ***
## Residuals 2334 237910 102
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 4 observations deleted due to missingness
print(model.tables(aov.ex1,"means"),digits=3)
## Tables of means
## Grand mean
##
## 73.92
##
## group
## failure success
## 72.8 75.2
## rep 1248.0 1088.0
boxplot(health ~ group, data = df)
So in both cases the successful year is higher than the unsuccessful year. The significance in the t-test is marginal, but in the ANOVA is clear.
rep(c(‘success’, ‘failure’), times = c(length(success.all), length(failure.all)))
Goal here is to plot out the average monthly health over all years to see if we can generate some sort of climatology type picture. Let’s prep the data first
pop.idx <- popAll.idx
pophealth <- mpophealth
for(i in 1:nrow(pop.idx)){
pophealth[rownames(pophealth) == pop.idx$SightingEGNo[i], 1:(pop.idx$MinDateInt[i] - 1)] <- NA # before first sight
pophealth[rownames(pophealth) == pop.idx$SightingEGNo[i], (pop.idx$MaxDateInt[i] + 1):ncol(pophealth)] <- NA # after last time
}
pophealth[which(pophealth < 5)] <- NA
mhealth <- pophealth[gender == 'M', ]
femhealth <- pophealth[gender == 'F', ]
mvec <- rep(NA, 12)
femquant <- mquant <- matrix(NA, nrow = 3, ncol = 12)
femvec <- rep(NA, 12)
yrs <- unique(monYr[,2])
for(i in 1:12){
cols <- which(monYr[,1] == i)
mvec[i] <- median(mhealth[, cols], na.rm = TRUE)
femvec[i] <- median(femhealth[, cols], na.rm = TRUE)
femquant[,i] <- quantile(femhealth[, cols], c(0.05, 0.5, 0.95), na.rm = TRUE)
mquant[,i] <- quantile(mhealth[, cols], c(0.05, 0.5, 0.95), na.rm = TRUE)
}
And then visualise that
plot(femvec, type = 'b', col = 'grey60', ylim = c(78, 78.5))
lines(mvec, col = 'grey20')
text(6, 78.35, 'Males')
text(6, 78.1, 'Females')
pdf(file = '/Users/rob/Dropbox/Papers/RollandEtAl_RightWhaleHealth/imagesJuly2014/HealthClimatology.pdf')
plot(femvec, type = 'b', col = 'grey60', ylim = c(78, 78.5))
lines(mvec, col = 'grey20')
text(6, 78.35, 'Males')
text(6, 78.1, 'Females')
dev.off()
## pdf
## 2