1 Introduction

A total of 2316 tomatoes have been carefully harvested and weighed. Of these tomatoes, there are three types:

  • Compost variety (homegrown): 282 tomatoes
  • Yellow variety (homegrown): 121 tomatoes
  • Shop variety (shop bought): 1913 tomatoes

In this analysis, we compare and contrast the performance of each tomato type by weight.

2 Descriptive statistics

Table 2.1 shows a summary of descriptive statistics for each tomato type. On average, the compost variety yielded the heaviest tomatoes, with a mean of 15.55 grams. The yellow variety was second heaviest at 6.42 grams, and the shop bought tomatoes were the lightest, with a mean of 6.01 grams.

Table 2.1: Descriptive statistics for each tomato type in grams
Type      N     Mean   SD    Min  Q1  Median  Q3  Max
Compost   282   15.55  6.16  4    10  14.5    21  37
Yellow    121   6.42   2.30  2    5   6.0     8   11
Shop      1913  6.01   5.06  1    3   4.0     7   29

Since both the compost and yellow varieties are homegrown, 403 homegrown tomatoes were harvested in total, while 1913 tomatoes were bought from local shops.
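
The homegrown totals can be checked directly from Table 2.1. The sketch below also derives a count-weighted mean weight for the two homegrown varieties combined; that combined mean is not reported in the original analysis and is only approximate, since it is computed from the rounded table values.

```python
# Summary statistics copied from Table 2.1: (count, mean weight in grams).
homegrown = {"Compost": (282, 15.55), "Yellow": (121, 6.42)}
shop = (1913, 6.01)

# Total number of homegrown tomatoes.
n_home = sum(n for n, _ in homegrown.values())

# Count-weighted mean weight of the homegrown tomatoes
# (approximate: based on rounded means from the table).
mean_home = sum(n * m for n, m in homegrown.values()) / n_home

print(n_home, n_home + shop[0])  # 403 2316
print(round(mean_home, 2))       # 12.81
```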

3 Descriptive plots

Figure 3.1 shows box plots for each variety of tomato. The compost variety is clearly the heaviest on average. The yellow and shop varieties are closer to each other in average weight, but the yellow variety is the least variable of the three types, while the shop bought tomatoes are noticeably more variable. This is also evident from Table 2.1, where the yellow variety has a standard deviation of 2.30 grams, compared with 5.06 grams for the shop bought variety. The compost variety is the most variable overall, with a standard deviation of 6.16 grams.

Figure 3.1: Box plots of tomato weights in grams by type of tomato
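
To relate the box plots to Table 2.1, the sketch below computes the interquartile range (IQR) and whisker limits for each variety from the tabulated quartiles. It assumes the common 1.5 × IQR (Tukey) convention for whiskers; the report does not state which convention Figure 3.1 uses, and the function name `tukey_fences` is purely illustrative.

```python
# Extremes and quartiles copied from Table 2.1 (grams): (min, Q1, Q3, max).
quartiles = {
    "Compost": (4, 10, 21, 37),
    "Yellow": (2, 5, 8, 11),
    "Shop": (1, 3, 7, 29),
}


def tukey_fences(q1, q3, k=1.5):
    """Return (lower, upper) fences at k * IQR beyond the quartiles."""
    iqr = q3 - q1
    return q1 - k * iqr, q3 + k * iqr


fences = {}
beyond = {}
for name, (lo, q1, q3, hi) in quartiles.items():
    lower, upper = tukey_fences(q1, q3)
    fences[name] = (lower, upper)
    # True if the observed minimum or maximum lies outside the fences,
    # i.e. the box plot would draw at least one point as an outlier.
    beyond[name] = lo < lower or hi > upper
    print(f"{name}: IQR={q3 - q1}, fences=({lower}, {upper}), "
          f"points beyond fences: {beyond[name]}")
```

Under this assumption, only the shop variety has an observed maximum (29 grams) beyond its upper fence (13 grams), which is consistent with it being more variable than the yellow variety.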

Figure 3.2 shows the distribution of weights for each of the three tomato types. We again see that the majority of shop bought tomatoes are relatively small, and also that the majority of tomatoes in the sample were shop bought. Figure 3.2 is interactive: you can zoom in or out using the buttons at the top, focus on certain tomato types by clicking on categories in the legend, or change the chart type using the drop-down menu.

Figure 3.2: Histograms of tomato weights in grams by type of tomato. Use the drop-down menu to view box plots or violin plots

4 Inferential statistics

Now that we have summarised the data and started to notice some patterns, let’s see what inferences we can draw. For example, we observed that the average weight of the tomatoes is different based on the type of tomato. Is this difference statistically significant? Let’s do a one-way ANOVA hypothesis test to find out.

4.1 One-way ANOVA hypothesis test

We are going to do a one-way ANOVA hypothesis test to determine whether there is a statistically significant difference in mean tomato weight between the compost, yellow and shop varieties. Note: because the variation differs between the groups, we will use the Welch one-way ANOVA test, which does not assume equal variances.

  • Let \(\mu_1\) denote the population mean tomato weight for the compost variety
  • Let \(\mu_2\) denote the population mean tomato weight for the yellow variety
  • Let \(\mu_3\) denote the population mean tomato weight for the shop variety

We can now define our hypotheses as follows:

\(H_0: \mu_1 = \mu_2 = \mu_3\) versus \(H_1: \text{not all } \mu_i \text{ are equal, for } i = 1, 2, 3\), where:

  • \(H_0\) denotes the null hypothesis that there is no difference in mean weight between the tomato varieties.
  • \(H_1\) denotes the alternative hypothesis that the mean weight differs for at least one of the tomato varieties.

The results of our hypothesis test are shown below:

## # A tibble: 1 × 7
##   .y.       n statistic   DFn   DFd        p method     
## * <chr> <int>     <dbl> <dbl> <dbl>    <dbl> <chr>      
## 1 Grams  2316      308.     2  324. 1.03e-75 Welch ANOVA

As we can see, the \(p\)-value is very small (1.03e-75, which is very close to 0). Since \(p < 0.05\) we can reject the null hypothesis and conclude that there is a statistically significant difference in mean weight between the tomato varieties.

To summarise: There was a significant difference in the average weight (in grams) \(\left[F(2, 324) = 308, p < 0.001\right]\) between tomato varieties.

However, the one-way ANOVA does not tell us which varieties are statistically significantly different from each other - only that they are not all equal. To find out, we can carry out some post-hoc tests to test for pairwise differences between the groups.
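
As a rough check on the Welch ANOVA result in this section, the statistic can be approximately recomputed from the summary statistics in Table 2.1 alone. The sketch below uses the standard Welch formulas with weights \(w_i = n_i / s_i^2\); it is not the code that produced the output above, the function name `welch_anova` is illustrative, and because the table values are rounded the result will only be close to the reported \(F\) and denominator degrees of freedom. The \(p\)-value is omitted because it needs the F distribution, which the Python standard library does not provide.

```python
# (n, mean, sd) per variety, copied from Table 2.1 (grams).
groups = {
    "Compost": (282, 15.55, 6.16),
    "Yellow": (121, 6.42, 2.30),
    "Shop": (1913, 6.01, 5.06),
}


def welch_anova(stats):
    """Welch's one-way ANOVA from per-group (n, mean, sd) tuples.

    Returns (F, numerator df, denominator df).
    """
    k = len(stats)
    # Precision weights: w_i = n_i / s_i^2.
    weights = [n / sd**2 for n, _, sd in stats]
    total_w = sum(weights)
    # Weighted grand mean.
    grand = sum(w * m for w, (_, m, _) in zip(weights, stats)) / total_w
    # Weighted between-group variation.
    between = sum(
        w * (m - grand) ** 2 for w, (_, m, _) in zip(weights, stats)
    ) / (k - 1)
    # Correction term shared by the F statistic and the denominator df.
    lam = sum(
        (1 - w / total_w) ** 2 / (n - 1) for w, (n, _, _) in zip(weights, stats)
    )
    f_stat = between / (1 + 2 * (k - 2) / (k**2 - 1) * lam)
    return f_stat, k - 1, (k**2 - 1) / (3 * lam)


f_stat, df_num, df_den = welch_anova(list(groups.values()))
print(f"F({df_num}, {df_den:.1f}) = {f_stat:.1f}")
```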

4.2 Post-hoc tests

The following output shows the results of our post-hoc tests using the Games-Howell method.

## # A tibble: 3 × 8
##   .y.   group1  group2 estimate conf.low conf.high p.adj p.adj.signif
## * <chr> <chr>   <chr>     <dbl>    <dbl>     <dbl> <dbl> <chr>       
## 1 Grams Compost Yellow   -9.13   -10.1      -8.13  0     ****        
## 2 Grams Compost Shop     -9.54   -10.4      -8.63  0     ****        
## 3 Grams Yellow  Shop     -0.408   -0.974     0.157 0.205 ns

We can interpret the results as follows.

Comparing the compost and yellow varieties: There is a statistically significant difference in mean weight (\(p < 0.001\)). We are 95% confident that, on average, yellow variety tomatoes weigh between 8.13 and 10.1 grams less than compost variety tomatoes.

Comparing the compost and shop varieties: There is a statistically significant difference in mean weight (\(p < 0.001\)). We are 95% confident that, on average, shop bought tomatoes weigh between 8.63 and 10.4 grams less than compost variety tomatoes.

Comparing the yellow and shop varieties: There is not a statistically significant difference in mean weight (\(p = 0.205\)). The 95% confidence interval for the difference (\(-0.974\) to \(0.157\) grams) includes zero.
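
The sign of each estimate in the post-hoc output follows from the order of the groups: the estimate is the mean of group2 minus the mean of group1. The sketch below recomputes these differences from Table 2.1, together with the unequal-variance (Welch) standard error and degrees of freedom for each pair. It is an illustration under stated assumptions rather than the Games-Howell procedure itself: the adjusted \(p\)-values and confidence limits also require the studentized range distribution, which is not in the Python standard library, and the helper name `welch_pair` is hypothetical.

```python
from itertools import combinations
from math import sqrt

# (n, mean, sd) per variety, copied from Table 2.1 (grams).
groups = {
    "Compost": (282, 15.55, 6.16),
    "Yellow": (121, 6.42, 2.30),
    "Shop": (1913, 6.01, 5.06),
}


def welch_pair(a, b):
    """Difference in means (b minus a), its standard error, and Welch df."""
    (n1, m1, s1), (n2, m2, s2) = a, b
    v1, v2 = s1**2 / n1, s2**2 / n2
    se = sqrt(v1 + v2)
    # Welch-Satterthwaite approximation to the degrees of freedom.
    df = (v1 + v2) ** 2 / (v1**2 / (n1 - 1) + v2**2 / (n2 - 1))
    return m2 - m1, se, df


pairs = {}
# Pairs are generated in the same order as the rows of the post-hoc output.
for g1, g2 in combinations(groups, 2):
    est, se, df = pairs[(g1, g2)] = welch_pair(groups[g1], groups[g2])
    print(f"{g1} vs {g2}: estimate={est:.2f} g, SE={se:.3f}, df={df:.1f}")
```

The differences agree with the estimate column above up to the rounding of the table means.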

5 Conclusion

The Allsops picked lots of tomatoes but not as many as Bec’s dad.