Part I: Further Reading

Drummond and Vowler address a few considerations regarding ANOVA tests:

There are two important points in particular:

Part II: Data Analysis

2.2 Analysis

2.2.1: Interaction and Main Effect Plots

Figure 1 represents the four subgroups of quoll aversion data using box plots. J-U indicates the Juvenile-Untrained group, J-T represents the Juvenile-Trained group, A-U stands for Adult-Untrained, and A-T shows the measurements for the Adult-Trained group. The box indicates the 25th and 75th percentiles, with the black line indicating the median of the data. The dashed line plungers indicate the most extreme data point that is 1.5 times the interquartile range from the quartiles, and the blank dots indicate the most extreme data that doesnt lie in that range.

Figure 2 represents the four subgroups of quoll aversion data using box plots. J-U indicates the Juvenile-Untrained group, J-T represents the Juvenile-Trained group, A-U stands for Adult-Untrained, and A-T shows the measurements for the Adult-Trained group. The box indicates the 25th and 75th percentiles, with the black line indicating the median of the data. The dashed line plungers indicate the most extreme data point that is 1.5 times the interquartile range from the quartiles, and the blank dots indicate the most extreme data that doesnt lie in that range. Figures 2A and 2C show all four subgroups, colored according to age and training, respectively. Figures 2B and 2D show the data grouped into the Juvenile-Adult and Untrained-Trained groups, respectively.

Figure 3 illustrates the interaction plots of both Age and Training with Time. Figure 3A represents the interaction Training has with Age, and Figure 3B represents the interaction Age has with Training.

2.2.2 Answer

Based on the boxplots in Figures 1 and 2, distribution of the subgroups (Juvenile vs. Adult, Untrained vs. Trained) are different, indicating that these two factors might have an effect. The interaction plots in Figure 3 look very similar to each other in that they indicate a large interaction between the factors, and that there are effects due to both factors. It seems that both age and training may cause more aversion to the toxic cane toad. In both graphs, there was a higher mean inspection time in one group (juvenile and untrained, respectively). More tests are needed to quantitatively determine these effects.

2.2.3 Two-Way ANOVA

results.2way = anova2way.boot(quolldat$Time, quolldat$Age, quolldat$Training, center = mean, agg = ssq)
results.round <- round(x = results.2way$p, digits = 4)
theory = lm(Time ~ Age*Training, data=quolldat)
summary(theory)
## 
## Call:
## lm(formula = Time ~ Age * Training, data = quolldat)
## 
## Residuals:
##      Min       1Q   Median       3Q      Max 
## -15.8300  -4.5800   0.1225   4.9200  10.1850 
## 
## Coefficients:
##                               Estimate Std. Error t value Pr(>|t|)    
## (Intercept)                     42.680      1.501  28.427  < 2e-16 ***
## AgeJuvenile                      9.945      2.123   4.684 1.21e-05 ***
## TrainingUntrained                8.835      2.123   4.161 8.26e-05 ***
## AgeJuvenile:TrainingUntrained   -7.130      3.003  -2.374   0.0201 *  
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 6.714 on 76 degrees of freedom
## Multiple R-squared:  0.3215, Adjusted R-squared:  0.2947 
## F-statistic: 12.01 on 3 and 76 DF,  p-value: 1.62e-06

Figure 4 is the histogram of the bootstrapped 2-way F-statistic. The calculated F-statistics are shown by the purple, green, and pink lines for age, training, and interactions, respectively. The p-values for each statistic are shown at the top of the line.

Table 1A: Two-Way ANOVA Test Results via Handmade Function

Test Variable F-statistic p-value
Age 0.2376027 0
Training 0.162118 310^{-4}
Interactions 0.0741872 0.0192

These values were not expected, which indicated a potential issue with the code (see Formulae Appendix). Seeing this, I chose to look at the built-in ANOVA function in R. While flawed, it could potentially give F-statistics that were more in line with what was expected.

model <- lm(Time ~ Training * Age, data=quolldat)
result <- round(anova(model),4)
result$Fval <- result$`F value`
result$Pval <- result$`Pr(>F)`

Table 1B: Two-Way ANOVA Test Results via Built-in Function

Test Variable F-statistic p-value
Age 12.321 810^{-4}
Training 18.0578 10^{-4}
Interactions 5.6382 0.0201

These values were more aligned with the expected variance; even so, I chose to assess the p-values of the handmade function, as their relative comparisons seemed to show similar results, and the handmade function corrects for the discrepancies and assumptions found within the built-in function.

2.2.4 Answer A

One must first consider the F statistic of the interactions, to see if they have an effect, before performing any other ANOVA tests. The \(\alpha\) was 0.0741872, which is greater than the p-value of 0.0192. Therefore, one may conclude that the additive model can be rejected; that means that both Age and Training have an effect on outcome. Therefore, the two-way F-statistics generated from these analyses will not be used, as they could provide results that are inaccurate. Therefore, we must perform one-way ANOVA tests on both age and training separately.

2.2.4 One-Way ANOVA Tests

#see formula appendix
age <- fstat.all(quolldat$Time, quolldat$Age)
training <- fstat.all(quolldat$Time, quolldat$Training)

Figure 5 illustrates the histograms of the bootstrapped one-way F-statistics. The calculated F-statistics are shown by the purple and green lines for age and training, respectively. The p-values for each statistic are shown at the top of the line.

Table 2A: One-Way ANOVA Test Results via Handmade Function

Test Variable F-statistic p-value
Age 0.1921878 610^{-4}
Training 0.1235853 0.0032

2.2.4 Answer B

As shown in Table 2, the F-statistics are less than one in both tests, but the p-values are significant (610^{-4} and 0.0032, for age and training respectively). Because the F-statistics are less than one, that indicates that there is a greater amount of variance within the groups than between the groups. Because of this, it seems that there might not be a large effect of these two (as opposed to an F-statisic that is greater than one).

After further inspection, it seems that went awry with the code (see Formulae appendix). As such, I chose to verify the information with the built in ANOVA modeling. My results were as follows:

resultT <- anova(lm(Time ~ Training, data=quolldat))
TF <- round(resultT$`F value`[1], 4)
TP <- round(resultT$`Pr(>F)`[1], 4)

resultA <- anova(lm(Time ~ Age, data=quolldat))
AF <- round(resultA$`F value`[1], 4)
AP <- round(resultA$`Pr(>F)`[1], 4)

Table 2B: One-Way ANOVA Test Results via Built-in Function

Test Variable F-statistic p-value
Age 14.9906 210^{-4}
Training 9.6397 0.0027

The p-values from these tests are also significant, but the primary difference is that the F-statistics are actually greater than one, which is much closer to the expected results. Based on visual inspection of the data, it seemed likely that age had a greater effect on time spent inspecting cane toads than training did, and these F-statistics do point to that information; the F-statistic is higher for the comparison of the age samples than it is for the training sample groups. However, as both of the values seem large, one can conclude that these both have an effect on the time spent inspecting the toxic toads.

Figure 6 compares age subpopulations and training subpopulations in a violin plot. The white dot indicates the median, with the black box indicating the values of the upper and lower quantiles. Figure 6A compares the age subgroups, while Figure 6B compares those of the training characteristics.

2.2.6 Answer

Post-Hoc testing can be performed on the two types of variables (age, training). Inspecting the violin plots, which were used in order to see the distributions of the data, led to the conclusion that the subgroups did not come from the same underlying populations, and as such, two-box comparisons were used for both of the post-hoc comparisons. By inspecting the distribution of the data, it seemed that the best measure of central tendency to use was the median, as there was a fairly large amount of skew in all four subgroups. After performing these tests, the p-values must be adjusted to account for the chance of false positivies when performing multiple evaluations.

JA <- twobox.test(quoll.j$Time, quoll.a$Time)
UT <- twobox.test(quoll.u$Time, quoll.t$Time)

#pvalues <- c(JA$p, AJ$p, UT$p, TU$p)
pvalues <- c(JA[4], UT[4])
qvalues <- p.adjust(pvalues, method = p.adjust.methods, n= length(pvalues))

The two-group comparisons yielded the following results:

Comparison of Juvenile vs. Adult:

  • Median difference: 7.7 (4.7, 10.9)
  • P-value: 910^{-4} - Corrected p-value: 0.0018
  • F-statistic p-value: 610^{-4}

Comparison of Untrained vs. Trained:

  • Median difference: 7.25 (-0.65, 10.65)
  • P-value: 0.0376 - Corrected p-value: 0.0376
  • F-statistic p-value: 0.0032

Age and training both have a statistically significant negative effect on the median time inspecting a cane toad, according to the two-box tests. The difference in medians for the age subgroups had a positive effect, while the confidence intervals for the median differences for the training subgroups contained zero effect, meaning that there was a possibility that there was no effect. The training seems to have a minimal effect on making the quolls averse to the toxic toads, but it seems that age also has an effect. This highlights the importance of learning when it comes to survival behaviors. Overall, in order to conserve this species, it would seem that training would help. However, given the amount of effort required train the quolls, it would be incredibly costly to try and train the quoll populations for an effect that can also be seen just by comparing juvenile to adult quolls.

Formulae Appendix

Two-Way ANOVA

ssq <- function(x) sum(x^2)
sab <- function(x) sum(abs(x))

#inner function
Fstat.2way <- function(X, f1, f2, center = mean, agg = sab, rank = F){
  if (rank == F) {Y = X} else {Y = rank(-X)}
  #take the grand mean
  mu = center(Y)
  #generate f1, f2, and interaction groupmeans
  groupmeans.f1 = ave(Y, f1, FUN = center)
  groupmeans.f2 = ave(Y, f2, FUN = center)
  groupmeans.int = ave(Y, f1, f2, FUN = center)
  #calculate the aggregate functions
  SSB.f1 <- agg(groupmeans.f1 - mu)
  SSB.f2 <- agg(groupmeans.f2 - mu)
  SSB.int <- agg(groupmeans.int - (groupmeans.f1 + groupmeans.f2 - mu))
  SSW <- agg(Y - groupmeans.int)
  return( c(f1 = SSB.f1, f2 = SSB.f2, int = SSB.int) / SSW)
}

#outer function
anova2way.boot <- function(X, f1, f2, deals = 10000, center= mean, agg = sab, rank = FALSE){
  dataF <- Fstat.2way(X, f1, f2, center = center, agg = agg, rank = rank)
  #create the vector
  bootF = matrix(NA, nrow = deals, ncol = 3)
  #create the bootstrap within this function
  for(i in 1:deals) {
    bootX = sample (X, replace = T)
    bootF[i,] = Fstat.2way(bootX, f1, f2, center = center, agg = agg, rank = rank)
  }
  pval = c( p1 = sum(bootF[,1] > dataF[1])/deals,
            p2 = sum(bootF[,2] > dataF[2])/deals,
            pint = sum(bootF[,3] > dataF[3])/deals)
  return(list(Fstat = dataF, bootF = bootF, p = pval))
}

One-Way ANOVA

#1 way fstat function
Fstat.1way <- function(Y, group, center = mean, agg = sumsquares) {
  GroupMeans <- ave(Y, group, FUN = center)
  SSB <- agg(GroupMeans - center(Y))
  SSW <- agg(Y - GroupMeans)
  FF <- SSB/SSW
  
  return(FF=FF)
}

#bootstrap for p-value of fstat
Fstat.1way.p = function(Y, group, center = mean, agg = sumsquares){
    deals = 10000
    bootF <- rep(NA, deals)
    for (i in 1:deals){
    quolldat$boottime <- sample(Y, replace = TRUE)
    bootF[i]  = Fstat.1way(quolldat$boottime, group)
    }
    return(bootF)
}

Two-Group Comparisons

twobox.test <- function(datasetA, datasetB){
  #Effect size
deals = 10000
difference = rep(NA,deals)

for (i in 1:deals) {
bootdataA = sample(datasetA,length(datasetA),replace=TRUE)
bootdataB = sample(datasetB,length(datasetB),replace=TRUE)
difference[i] = median(bootdataA) - median(bootdataB)
}

est.median = median(difference)

#calculate confidence intervals
CI = sort(difference)[c(0.025*deals,0.975*deals)]

#NHST
boxA = datasetA - median(datasetA) 
boxB = datasetB - median(datasetB) 
deals = 10000
difference = rep(NA,deals)
for (i in 1:deals) {
  bootdataA = sample(boxA,length(datasetA),replace=TRUE)
  bootdataB = sample(boxB,length(datasetB),replace=TRUE) 
  bootmedianA = median(bootdataA)
  bootmedianB = median(bootdataB)
  
  difference[i] = bootmedianA-bootmedianB
}

diff <- median(datasetA)-median(datasetB)

#generate the p-value thresholds
HP <- sum(difference > abs(diff))
LP <- sum(difference < -abs(diff))

#calculate the pvalue
pval = (HP+LP)/deals
return(c(diff = diff, CI=CI, p=pval, est= est.median))
}


onebox.test <- function(datasetA, datasetB){
  #Effect size
deals = 10000
difference = rep(NA,deals)

for (i in 1:deals) {
bootdataA = sample(datasetA,length(datasetA),replace=TRUE)
bootdataB = sample(datasetB,length(datasetB),replace=TRUE)
difference[i] = median(bootdataA) - median(bootdataB)
}

est.median = median(difference)

#calculate confidence intervals
CI = sort(difference)[c(0.025*deals,0.975*deals)]

#NHST
deals = 10000
    datajoined = c(datasetA, datasetB)
    difference = rep(NA, deals)

for (i in 1 : deals) {
bootdataA = sample(datajoined, length(datasetA), replace = TRUE)
bootdataB = sample(datajoined, length(datasetB), replace = TRUE)
bootmedianA = median (bootdataA)
bootmedianB = median (bootdataB)
difference[i] = bootmedianA - bootmedianB
}

diff <- median(datasetA)-median(datasetB)

#generate the p-value thresholds
HP <- sum(difference > abs(diff))
LP <- sum(difference < -abs(diff))

#calculate the pvalue
pval = (HP+LP)/deals
return(c(diff = diff, CI=CI, p=pval, est= est.median))
}