In islandX there are three varieties of cocoa. Following the success of its chocolate chip biscuits, the Cocoabix factory in islandX is interested in extending its production to include cocoa sticks. Cocoa sticks are produced and used locally in the islands. Flakes are grated off the sticks to make the beverage known locally as “dite caco”, a delicious chocolate drink frequently served with “bakes”.
Before launching the “Cocoa stick” line, the Cocoabix factory decided to assess which cocoa plant variety yields the most cocoa by weight (g).
The Cocoabix factory data analysts conducted a preliminary test to check whether the cocoa pods produced by the different varieties of cocoa plants have similar weights. They collected data on 1000 cocoa pods per variety and conducted an Analysis of Variance (ANOVA) test to assess the hypothesis: “there is no statistically significant difference between the mean weight (g) of cocoa pods from the three cocoa plant varieties”. In their briefing paper they expressed this null hypothesis as \(H_{0}: \mu_{Forastero} = \mu_{Criollo} = \mu_{Trinitario}\), where \(\mu\) denotes the true mean cocoa pod weight of a variety.
A summary of the data can be found in Table 1. Figure 1 provides a graphical summary of the data as well.
Variety | Count | Mean weight (g) | SD (g) |
---|---|---|---|
Criollo | 1000 | 420 | 10 |
Forastero | 1000 | 420 | 10 |
Trinitario | 1000 | 390 | 10 |
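The ANOVA test relies on the F statistic, which compares the variation in cocoa pod weight between varieties (how far each variety's mean lies from the overall mean) with the variation within varieties (how much individual pods differ from their own variety's mean).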
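To make the ratio concrete, here is a minimal Python sketch that rebuilds an F statistic from the summary values in Table 1 alone. It assumes the rounded means and standard deviations are close enough to the raw data, so the result is only an approximation and not the figure from the briefing paper; the function name `f_from_summary` is made up for this illustration.

```python
# Summary statistics copied from Table 1 (rounded, so the F value is approximate).
summary = {
    "Criollo": {"n": 1000, "mean": 420.0, "sd": 10.0},
    "Forastero": {"n": 1000, "mean": 420.0, "sd": 10.0},
    "Trinitario": {"n": 1000, "mean": 390.0, "sd": 10.0},
}


def f_from_summary(groups):
    """Return (F, df_between, df_within) for a one-way ANOVA from group summaries."""
    k = len(groups)
    n_total = sum(g["n"] for g in groups.values())
    grand_mean = sum(g["n"] * g["mean"] for g in groups.values()) / n_total

    # Between-variety sum of squares: spread of the variety means around the grand mean.
    ss_between = sum(g["n"] * (g["mean"] - grand_mean) ** 2 for g in groups.values())
    # Within-variety sum of squares: spread of pods around their own variety mean.
    ss_within = sum((g["n"] - 1) * g["sd"] ** 2 for g in groups.values())

    df_between = k - 1
    df_within = n_total - k
    f_stat = (ss_between / df_between) / (ss_within / df_within)
    return f_stat, df_between, df_within


f_stat, df_b, df_w = f_from_summary(summary)
print(f"F = {f_stat:.1f} on {df_b} and {df_w} degrees of freedom")
```

With the Table 1 values the between-variety mean square is 300000 and the within-variety mean square is 100, so the sketch gives an F of about 3000 on 2 and 2997 degrees of freedom. A ratio that far above 1 is what drives the very small p value.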
The results from the ANOVA test were reported in the Cocoabix factory briefing paper. The p value associated with this test, \(2 \times 10^{-16}\), indicates that the mean cocoa pod weight differs across the three varieties. In other words, at least two of the means differ from each other.
The p value is the probability of observing differences in mean cocoa pod weight between the varieties at least as large as those in the data if the null hypothesis were true, that is, if the differences arose “by chance” alone. Because this probability is so small, the team concluded that there was strong evidence against the null hypothesis.
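One way to get a feel for what “by chance” means is to simulate a world in which the null hypothesis holds and see how large the F statistic tends to be. The sketch below is an illustration only, not the team's analysis. It assumes pod weights are normally distributed with a common mean of 410 g and a standard deviation of 10 g, and it compares the simulated values with an F of about 3000, the rough value implied by the rounded summaries in Table 1 (between-variety mean square 300000 over within-variety mean square 100).

```python
import random


def f_statistic(samples):
    """One-way ANOVA F statistic for a list of samples (one list per group)."""
    k = len(samples)
    n_total = sum(len(s) for s in samples)
    means = [sum(s) / len(s) for s in samples]
    grand_mean = sum(len(s) * m for s, m in zip(samples, means)) / n_total
    ss_between = sum(len(s) * (m - grand_mean) ** 2 for s, m in zip(samples, means))
    ss_within = sum(sum((x - m) ** 2 for x in s) for s, m in zip(samples, means))
    return (ss_between / (k - 1)) / (ss_within / (n_total - k))


def simulate_null(n_sims=200, n_per_group=1000, k=3, mean=410.0, sd=10.0, seed=1):
    """Draw every group from the same normal distribution (H0 true) and return the F values."""
    rng = random.Random(seed)
    return [
        f_statistic([[rng.gauss(mean, sd) for _ in range(n_per_group)] for _ in range(k)])
        for _ in range(n_sims)
    ]


observed_f = 3000.0  # assumption: approximate F implied by Table 1, not a reported value
null_fs = simulate_null()
exceed = sum(f >= observed_f for f in null_fs)
print(f"largest simulated F under H0: {max(null_fs):.2f}")
print(f"simulations with F >= {observed_f:.0f}: {exceed} of {len(null_fs)}")
```

When all three varieties share the same true mean, the simulated F values stay close to 1 and none comes anywhere near 3000. This is the intuition behind a tiny p value.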
One of the first questions that the press asked the team was whether the statistical assumptions for ANOVA were met. The analytical team informed them that
The team also discussed the Tukey Honest Significant Differences test (see results below), which provides confidence intervals for the difference in mean cocoa pod weight between each pair of varieties, with an overall (family-wise) error rate of 5%.
## Tukey multiple comparisons of means
## 95% family-wise confidence level
##
## Fit: aov(formula = Weight ~ Variety, data = Investigate)
##
## $Variety
##                           diff        lwr        upr     p adj
## Forastero-Criollo      0.03583  -1.022282   1.093942 0.9965301
## Trinitario-Criollo   -29.82899 -30.887102 -28.770878 0.0000000
## Trinitario-Forastero -29.86482 -30.922932 -28.806708 0.0000000
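A quick way to read the Tukey output is to check whether each interval contains zero: if it does not, the two varieties differ at the 5% family-wise level. The short Python sketch below applies that rule to the three rows above. The numbers are copied from the output, and the helper name `excludes_zero` is introduced here for illustration only.

```python
# Rows copied from the Tukey HSD output above: (comparison, diff, lwr, upr).
tukey_rows = [
    ("Forastero-Criollo", 0.03583, -1.022282, 1.093942),
    ("Trinitario-Criollo", -29.82899, -30.887102, -28.770878),
    ("Trinitario-Forastero", -29.86482, -30.922932, -28.806708),
]


def excludes_zero(lwr, upr):
    """True when the whole interval lies on one side of zero."""
    return lwr > 0 or upr < 0


for name, diff, lwr, upr in tukey_rows:
    verdict = "differ" if excludes_zero(lwr, upr) else "no detectable difference"
    half_width = (upr - lwr) / 2  # margin of error around the estimated difference
    print(f"{name:<22}{diff:>10.3f}  +/- {half_width:.3f}  {verdict}")

# Comparisons whose interval excludes zero.
significant = [name for name, _, lwr, upr in tukey_rows if excludes_zero(lwr, upr)]
print(significant)
```

Read this way, Trinitario pods are about 30 g lighter on average than both Criollo and Forastero pods, while the Forastero-Criollo interval straddles zero, so the data show no detectable difference between those two varieties.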