Week 6 Assignment

Part I: Further Reading

Drummond and Vowler address a few considerations regarding ANOVA tests:

There are a few assumptions that ANOVA makes: that the residuals are normally distrubuted, and that the variation between grooups is relatively similar. In addition, the power of ANOVA is significantly limited by outliers, so excluding these may provide a more accurate test.
ANOVA tests differences in measures of central tendence between 2 or more groups. This means it assesses the differences between groups, and determines if this is due to error by comparing it with the differences within the groups.
One-Way ANOVA is reserved for one independent variable, whereas Two-Way ANOVA is used when there are two or more independent variables. Two-Way is especially when useful when trying to consider the interactions two independent variables might have, especially in the context of experimental design.
Using repeated measures increases the chances of false positives, especially when the subject is measured more than once. In cases where there are multiple measurements, then ANOVA might not be the best test for that dataset.

There are two important points in particular:

ANOVAs measure the variation between two or more groups. Two-Way ANOVAs are supposed to be used when there is more than one independent variable for every measured value. This is calculated in the form of the F-statistic, which the the ratio of the sum of squares between groups to the sum of squares within groups. The higher the F-statistic, the greater relative variance there is between two groups, and therefore the dependent variable is more likely to be affected by the independent variables. An F-statistic of 1 means that the variance between as the variance within the groups, which is indicative of the samples coming from the same populations. The aforementioned assumptions must be considered, but a large sample size is the most integral to generating these values.
There is a possibility that two independent variables can interact, which would affect the ANOVA. As such, a Two-Way ANOVA will determine the F-statistic of the interactions, and if the additive model is sufficient to determine the variance. One-Way ANOVA is used when assessing only one variable, and can only be used on a dataset with more than one independent variable if it is determined that the interactions are significant. Distinguishing between the two types of ANOVA is essential due to the different information given by each test. One-Way ANOVA is limited to one factor, but a Two-Way can perform on two or more factors (with more than two factors, this must be a repeated measure).

Part II: Data Analysis

2.2 Analysis

2.2.1: Interaction and Main Effect Plots

Figure 1 represents the four subgroups of quoll aversion data using box plots. J-U indicates the Juvenile-Untrained group, J-T represents the Juvenile-Trained group, A-U stands for Adult-Untrained, and A-T shows the measurements for the Adult-Trained group. The box indicates the 25th and 75th percentiles, with the black line indicating the median of the data. The dashed line plungers indicate the most extreme data point that is 1.5 times the interquartile range from the quartiles, and the blank dots indicate the most extreme data that doesnt lie in that range.

Figure 2 represents the four subgroups of quoll aversion data using box plots. J-U indicates the Juvenile-Untrained group, J-T represents the Juvenile-Trained group, A-U stands for Adult-Untrained, and A-T shows the measurements for the Adult-Trained group. The box indicates the 25th and 75th percentiles, with the black line indicating the median of the data. The dashed line plungers indicate the most extreme data point that is 1.5 times the interquartile range from the quartiles, and the blank dots indicate the most extreme data that doesnt lie in that range. Figures 2A and 2C show all four subgroups, colored according to age and training, respectively. Figures 2B and 2D show the data grouped into the Juvenile-Adult and Untrained-Trained groups, respectively.

Figure 3 illustrates the interaction plots of both Age and Training with Time. Figure 3A represents the interaction Training has with Age, and Figure 3B represents the interaction Age has with Training.

2.2.2 Answer

Based on the boxplots in Figures 1 and 2, distribution of the subgroups (Juvenile vs. Adult, Untrained vs. Trained) are different, indicating that these two factors might have an effect. The interaction plots in Figure 3 look very similar to each other in that they indicate a large interaction between the factors, and that there are effects due to both factors. It seems that both age and training may cause more aversion to the toxic cane toad. In both graphs, there was a higher mean inspection time in one group (juvenile and untrained, respectively). More tests are needed to quantitatively determine these effects.

2.2.3 Two-Way ANOVA

results.2way = anova2way.boot(quolldat$Time, quolldat$Age, quolldat$Training, center = mean, agg = ssq)
results.round <- round(x = results.2way$p, digits = 4)
theory = lm(Time ~ Age*Training, data=quolldat)
summary(theory)

## 
## Call:
## lm(formula = Time ~ Age * Training, data = quolldat)
## 
## Residuals:
##      Min       1Q   Median       3Q      Max 
## -15.8300  -4.5800   0.1225   4.9200  10.1850 
## 
## Coefficients:
##                               Estimate Std. Error t value Pr(>|t|)    
## (Intercept)                     42.680      1.501  28.427  < 2e-16 ***
## AgeJuvenile                      9.945      2.123   4.684 1.21e-05 ***
## TrainingUntrained                8.835      2.123   4.161 8.26e-05 ***
## AgeJuvenile:TrainingUntrained   -7.130      3.003  -2.374   0.0201 *  
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 6.714 on 76 degrees of freedom
## Multiple R-squared:  0.3215, Adjusted R-squared:  0.2947 
## F-statistic: 12.01 on 3 and 76 DF,  p-value: 1.62e-06

Figure 4 is the histogram of the bootstrapped 2-way F-statistic. The calculated F-statistics are shown by the purple, green, and pink lines for age, training, and interactions, respectively. The p-values for each statistic are shown at the top of the line.

Table 1A: Two-Way ANOVA Test Results via Handmade Function

Test Variable	F-statistic	p-value
Age	0.2376027	0
Training	0.162118	310^{-4}
Interactions	0.0741872	0.0192

These values were not expected, which indicated a potential issue with the code (see Formulae Appendix). Seeing this, I chose to look at the built-in ANOVA function in R. While flawed, it could potentially give F-statistics that were more in line with what was expected.

model <- lm(Time ~ Training * Age, data=quolldat)
result <- round(anova(model),4)
result$Fval <- result$`F value`
result$Pval <- result$`Pr(>F)`

Table 1B: Two-Way ANOVA Test Results via Built-in Function

Test Variable	F-statistic	p-value
Age	12.321	810^{-4}
Training	18.0578	10^{-4}
Interactions	5.6382	0.0201

These values were more aligned with the expected variance; even so, I chose to assess the p-values of the handmade function, as their relative comparisons seemed to show similar results, and the handmade function corrects for the discrepancies and assumptions found within the built-in function.

2.2.4 Answer A

One must first consider the F statistic of the interactions, to see if they have an effect, before performing any other ANOVA tests. The \(\alpha\) was 0.0741872, which is greater than the p-value of 0.0192. Therefore, one may conclude that the additive model can be rejected; that means that both Age and Training have an effect on outcome. Therefore, the two-way F-statistics generated from these analyses will not be used, as they could provide results that are inaccurate. Therefore, we must perform one-way ANOVA tests on both age and training separately.

2.2.4 One-Way ANOVA Tests

#see formula appendix
age <- fstat.all(quolldat$Time, quolldat$Age)
training <- fstat.all(quolldat$Time, quolldat$Training)

Figure 5 illustrates the histograms of the bootstrapped one-way F-statistics. The calculated F-statistics are shown by the purple and green lines for age and training, respectively. The p-values for each statistic are shown at the top of the line.

Table 2A: One-Way ANOVA Test Results via Handmade Function

Test Variable	F-statistic	p-value
Age	0.1921878	610^{-4}
Training	0.1235853	0.0032

2.2.4 Answer B

As shown in Table 2, the F-statistics are less than one in both tests, but the p-values are significant (610^{-4} and 0.0032, for age and training respectively). Because the F-statistics are less than one, that indicates that there is a greater amount of variance within the groups than between the groups. Because of this, it seems that there might not be a large effect of these two (as opposed to an F-statisic that is greater than one).

After further inspection, it seems that went awry with the code (see Formulae appendix). As such, I chose to verify the information with the built in ANOVA modeling. My results were as follows:

resultT <- anova(lm(Time ~ Training, data=quolldat))
TF <- round(resultT$`F value`[1], 4)
TP <- round(resultT$`Pr(>F)`[1], 4)

resultA <- anova(lm(Time ~ Age, data=quolldat))
AF <- round(resultA$`F value`[1], 4)
AP <- round(resultA$`Pr(>F)`[1], 4)

Table 2B: One-Way ANOVA Test Results via Built-in Function

Test Variable	F-statistic	p-value
Age	14.9906	210^{-4}
Training	9.6397	0.0027

The p-values from these tests are also significant, but the primary difference is that the F-statistics are actually greater than one, which is much closer to the expected results. Based on visual inspection of the data, it seemed likely that age had a greater effect on time spent inspecting cane toads than training did, and these F-statistics do point to that information; the F-statistic is higher for the comparison of the age samples than it is for the training sample groups. However, as both of the values seem large, one can conclude that these both have an effect on the time spent inspecting the toxic toads.

Figure 6 compares age subpopulations and training subpopulations in a violin plot. The white dot indicates the median, with the black box indicating the values of the upper and lower quantiles. Figure 6A compares the age subgroups, while Figure 6B compares those of the training characteristics.

2.2.6 Answer

Post-Hoc testing can be performed on the two types of variables (age, training). Inspecting the violin plots, which were used in order to see the distributions of the data, led to the conclusion that the subgroups did not come from the same underlying populations, and as such, two-box comparisons were used for both of the post-hoc comparisons. By inspecting the distribution of the data, it seemed that the best measure of central tendency to use was the median, as there was a fairly large amount of skew in all four subgroups. After performing these tests, the p-values must be adjusted to account for the chance of false positivies when performing multiple evaluations.

JA <- twobox.test(quoll.j$Time, quoll.a$Time)
UT <- twobox.test(quoll.u$Time, quoll.t$Time)

#pvalues <- c(JA$p, AJ$p, UT$p, TU$p)
pvalues <- c(JA[4], UT[4])
qvalues <- p.adjust(pvalues, method = p.adjust.methods, n= length(pvalues))

The two-group comparisons yielded the following results:

Comparison of Juvenile vs. Adult:

Median difference: 7.7 (4.7, 10.9)
P-value: 910^{-4} - Corrected p-value: 0.0018
F-statistic p-value: 610^{-4}

Comparison of Untrained vs. Trained:

Median difference: 7.25 (-0.65, 10.65)
P-value: 0.0376 - Corrected p-value: 0.0376
F-statistic p-value: 0.0032

Age and training both have a statistically significant negative effect on the median time inspecting a cane toad, according to the two-box tests. The difference in medians for the age subgroups had a positive effect, while the confidence intervals for the median differences for the training subgroups contained zero effect, meaning that there was a possibility that there was no effect. The training seems to have a minimal effect on making the quolls averse to the toxic toads, but it seems that age also has an effect. This highlights the importance of learning when it comes to survival behaviors. Overall, in order to conserve this species, it would seem that training would help. However, given the amount of effort required train the quolls, it would be incredibly costly to try and train the quoll populations for an effect that can also be seen just by comparing juvenile to adult quolls.

Formulae Appendix

Two-Way ANOVA

ssq <- function(x) sum(x^2)
sab <- function(x) sum(abs(x))

#inner function
Fstat.2way <- function(X, f1, f2, center = mean, agg = sab, rank = F){
  if (rank == F) {Y = X} else {Y = rank(-X)}
  #take the grand mean
  mu = center(Y)
  #generate f1, f2, and interaction groupmeans
  groupmeans.f1 = ave(Y, f1, FUN = center)
  groupmeans.f2 = ave(Y, f2, FUN = center)
  groupmeans.int = ave(Y, f1, f2, FUN = center)
  #calculate the aggregate functions
  SSB.f1 <- agg(groupmeans.f1 - mu)
  SSB.f2 <- agg(groupmeans.f2 - mu)
  SSB.int <- agg(groupmeans.int - (groupmeans.f1 + groupmeans.f2 - mu))
  SSW <- agg(Y - groupmeans.int)
  return( c(f1 = SSB.f1, f2 = SSB.f2, int = SSB.int) / SSW)
}

#outer function
anova2way.boot <- function(X, f1, f2, deals = 10000, center= mean, agg = sab, rank = FALSE){
  dataF <- Fstat.2way(X, f1, f2, center = center, agg = agg, rank = rank)
  #create the vector
  bootF = matrix(NA, nrow = deals, ncol = 3)
  #create the bootstrap within this function
  for(i in 1:deals) {
    bootX = sample (X, replace = T)
    bootF[i,] = Fstat.2way(bootX, f1, f2, center = center, agg = agg, rank = rank)
  }
  pval = c( p1 = sum(bootF[,1] > dataF[1])/deals,
            p2 = sum(bootF[,2] > dataF[2])/deals,
            pint = sum(bootF[,3] > dataF[3])/deals)
  return(list(Fstat = dataF, bootF = bootF, p = pval))
}

One-Way ANOVA

#1 way fstat function
Fstat.1way <- function(Y, group, center = mean, agg = sumsquares) {
  GroupMeans <- ave(Y, group, FUN = center)
  SSB <- agg(GroupMeans - center(Y))
  SSW <- agg(Y - GroupMeans)
  FF <- SSB/SSW
  
  return(FF=FF)
}

#bootstrap for p-value of fstat
Fstat.1way.p = function(Y, group, center = mean, agg = sumsquares){
    deals = 10000
    bootF <- rep(NA, deals)
    for (i in 1:deals){
    quolldat$boottime <- sample(Y, replace = TRUE)
    bootF[i]  = Fstat.1way(quolldat$boottime, group)
    }
    return(bootF)
}

Two-Group Comparisons

twobox.test <- function(datasetA, datasetB){
  #Effect size
deals = 10000
difference = rep(NA,deals)

for (i in 1:deals) {
bootdataA = sample(datasetA,length(datasetA),replace=TRUE)
bootdataB = sample(datasetB,length(datasetB),replace=TRUE)
difference[i] = median(bootdataA) - median(bootdataB)
}

est.median = median(difference)

#calculate confidence intervals
CI = sort(difference)[c(0.025*deals,0.975*deals)]

#NHST
boxA = datasetA - median(datasetA) 
boxB = datasetB - median(datasetB) 
deals = 10000
difference = rep(NA,deals)
for (i in 1:deals) {
  bootdataA = sample(boxA,length(datasetA),replace=TRUE)
  bootdataB = sample(boxB,length(datasetB),replace=TRUE) 
  bootmedianA = median(bootdataA)
  bootmedianB = median(bootdataB)
  
  difference[i] = bootmedianA-bootmedianB
}

diff <- median(datasetA)-median(datasetB)

#generate the p-value thresholds
HP <- sum(difference > abs(diff))
LP <- sum(difference < -abs(diff))

#calculate the pvalue
pval = (HP+LP)/deals
return(c(diff = diff, CI=CI, p=pval, est= est.median))
}


onebox.test <- function(datasetA, datasetB){
  #Effect size
deals = 10000
difference = rep(NA,deals)

for (i in 1:deals) {
bootdataA = sample(datasetA,length(datasetA),replace=TRUE)
bootdataB = sample(datasetB,length(datasetB),replace=TRUE)
difference[i] = median(bootdataA) - median(bootdataB)
}

est.median = median(difference)

#calculate confidence intervals
CI = sort(difference)[c(0.025*deals,0.975*deals)]

#NHST
deals = 10000
    datajoined = c(datasetA, datasetB)
    difference = rep(NA, deals)

for (i in 1 : deals) {
bootdataA = sample(datajoined, length(datasetA), replace = TRUE)
bootdataB = sample(datajoined, length(datasetB), replace = TRUE)
bootmedianA = median (bootdataA)
bootmedianB = median (bootdataB)
difference[i] = bootmedianA - bootmedianB
}

diff <- median(datasetA)-median(datasetB)

#generate the p-value thresholds
HP <- sum(difference > abs(diff))
LP <- sum(difference < -abs(diff))

#calculate the pvalue
pval = (HP+LP)/deals
return(c(diff = diff, CI=CI, p=pval, est= est.median))
}