II. Given Data

PROBLEM: Three different washing solutions are being compared to study their effectiveness in retarding bacteria growth in 5-gallon milk containers. The analysis is done in a laboratory, and only three trials can be run on any day. Because days could represent a potential source of variability, the experimenter decides to use a randomized block design. Observations are taken for four days, and the data are shown here. Analyze the data and draw conclusions.

Data Set:

   Solution Day Bacteria
1         1   1       13
2         1   2       22
3         1   3       18
4         1   4       39
5         2   1       16
6         2   2       24
7         2   3       17
8         2   4       44
9         3   1        5
10        3   2        4
11        3   3        1
12        3   4       22

III. Exploratory Data Analysis

Descriptive Statistics:

data$Solution: 1
    Solution      Day          Bacteria    
 Min.   :1   Min.   :1.00   Min.   :13.00  
 1st Qu.:1   1st Qu.:1.75   1st Qu.:16.75  
 Median :1   Median :2.50   Median :20.00  
 Mean   :1   Mean   :2.50   Mean   :23.00  
 3rd Qu.:1   3rd Qu.:3.25   3rd Qu.:26.25  
 Max.   :1   Max.   :4.00   Max.   :39.00  
------------------------------------------------------------ 
data$Solution: 2
    Solution      Day          Bacteria    
 Min.   :2   Min.   :1.00   Min.   :16.00  
 1st Qu.:2   1st Qu.:1.75   1st Qu.:16.75  
 Median :2   Median :2.50   Median :20.50  
 Mean   :2   Mean   :2.50   Mean   :25.25  
 3rd Qu.:2   3rd Qu.:3.25   3rd Qu.:29.00  
 Max.   :2   Max.   :4.00   Max.   :44.00  
------------------------------------------------------------ 
data$Solution: 3
    Solution      Day          Bacteria    
 Min.   :3   Min.   :1.00   Min.   : 1.00  
 1st Qu.:3   1st Qu.:1.75   1st Qu.: 3.25  
 Median :3   Median :2.50   Median : 4.50  
 Mean   :3   Mean   :2.50   Mean   : 8.00  
 3rd Qu.:3   3rd Qu.:3.25   3rd Qu.: 9.25  
 Max.   :3   Max.   :4.00   Max.   :22.00  
  Solution Bacteria.mean Bacteria.median Bacteria.sd
1        1         23.00            20.0   11.284207
2        2         25.25            20.5   12.996794
3        3          8.00             4.5    9.486833

Interpretation: The output above gives the minimum, quartiles, median, mean, maximum, and standard deviation of bacteria growth for each solution. The mean is lowest for Solution 3 (8.00) and highest for Solution 2 (25.25), with Solution 1 (23.00) close to Solution 2.
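As a cross-check, the per-solution summaries can be reproduced in base R without extra packages. This is only a sketch: the names group_mean, group_median, and group_sd are illustrative and do not appear in the original analysis.

```r
# Data as listed in Section II
data <- data.frame(
  Solution = rep(1:3, each = 4),
  Day      = rep(1:4, times = 3),
  Bacteria = c(13, 22, 18, 39, 16, 24, 17, 44, 5, 4, 1, 22)
)

# Mean, median, and standard deviation of Bacteria within each solution
group_mean   <- as.numeric(tapply(data$Bacteria, data$Solution, mean))
group_median <- as.numeric(tapply(data$Bacteria, data$Solution, median))
group_sd     <- as.numeric(tapply(data$Bacteria, data$Solution, sd))

print(data.frame(Solution = 1:3, mean = group_mean,
                 median = group_median, sd = group_sd))
```

The printed values should agree with the summaryBy() table above.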

Graph (Line-plot):

Interpretation: The plot shows bacteria growth under the three washing solutions on each of the four days. Solutions 1 and 2 show more variability and the higher counts, while Solution 3 has the lowest count on every day, suggesting that it is the most effective in retarding bacteria growth in 5-gallon milk containers. Counts rise sharply on Day 4 for all three solutions (39, 44, and 22), which points to a day-to-day effect. Further statistical analysis is needed to confirm these observations and to assess the significance of the differences.

IV. Experimental Question

Is there evidence of a difference in the effectiveness of retarding bacteria growth among the three different washing solutions at the significance level of α = 0.05?

V. Null and Alternative Hypothesis

Use hypothesis testing at the significance level of α = 0.05.

H0: μ1 = μ2 = μ3 (no difference in mean bacteria growth)

This means that the mean bacteria growth under each solution is equal.

Ha: μi ≠ μj for at least one pair i ≠ j (at least one difference in mean bacteria growth)

At least one of the means is different from the others.

ANOVA TABLE

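To test these hypotheses, we fit a two-way additive ANOVA model (a randomized complete block design), with washing solution as the treatment factor and day as the block factor.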
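For reference, the additive model usually assumed for a randomized block design can be written as follows. The symbols \tau_i, \beta_j, and \varepsilon_{ij} are standard textbook notation introduced here for illustration; they are not taken from the original analysis.

```latex
% y_{ij}: bacteria count for solution i on day j
y_{ij} = \mu + \tau_i + \beta_j + \varepsilon_{ij},
\qquad i = 1, 2, 3, \quad j = 1, 2, 3, 4,
\qquad \varepsilon_{ij} \sim N(0, \sigma^2)

% With \mu_i = \mu + \tau_i, the hypotheses of Section V are equivalent to
H_0 : \tau_1 = \tau_2 = \tau_3 = 0
\qquad \text{versus} \qquad
H_a : \tau_i \neq 0 \ \text{for at least one } i
```

Here \tau_i is the effect of solution i and \beta_j is the effect of day j; the model has no interaction term, which is why the residual degrees of freedom are (3 - 1)(4 - 1) = 6.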

Analysis of Variance Table

Response: response
           Df  Sum Sq Mean Sq F value    Pr(>F)    
treatments  2  703.50  351.75  40.717 0.0003232 ***
blocks      3 1106.92  368.97  42.711 0.0001925 ***
Residuals   6   51.83    8.64                      
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Interpretation:

The treatment F-value of 40.72 is significant: if the three solutions had equal means, an F-value this large would occur with a probability of only about 0.03% (p-value = 0.0003232). Using α = 0.05, the critical value is F_(0.05),(2,6) = 5.14. Since 40.72 > 5.14, and equivalently since the p-value is less than 0.05, we reject the null hypothesis that the mean bacteria growth under each solution is equal. Thus, the type of washing solution affects the mean bacteria growth at the 0.05 level of significance. Days (blocks) also appear to differ substantially, since the mean square for blocks (368.97) is large relative to the error mean square (8.64).
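The quantities in the ANOVA table can also be computed by hand, which makes the decision rule explicit. The following is a minimal sketch under the additive block model; variable names such as ss_treat and f_crit are illustrative only.

```r
# Observations in the order of Section II (solution 1 days 1-4, then 2, then 3)
y <- c(13, 22, 18, 39, 16, 24, 17, 44, 5, 4, 1, 22)
treatments <- factor(rep(1:3, each = 4))
blocks     <- factor(rep(1:4, times = 3))
a <- nlevels(treatments)  # number of solutions
b <- nlevels(blocks)      # number of days

# Sums of squares for the randomized block design
grand_mean <- mean(y)
ss_total <- sum((y - grand_mean)^2)
ss_treat <- b * sum((tapply(y, treatments, mean) - grand_mean)^2)
ss_block <- a * sum((tapply(y, blocks, mean) - grand_mean)^2)
ss_error <- ss_total - ss_treat - ss_block

# Treatment F statistic, critical value at alpha = 0.05, and p-value
df_treat <- a - 1
df_error <- (a - 1) * (b - 1)
f_treat  <- (ss_treat / df_treat) / (ss_error / df_error)
f_crit   <- qf(0.95, df_treat, df_error)
p_value  <- pf(f_treat, df_treat, df_error, lower.tail = FALSE)

print(c(F = f_treat, F_crit = f_crit, p = p_value))
```

The null hypothesis is rejected when f_treat exceeds f_crit, which is the same decision as comparing p_value with 0.05.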

VI. Post-hoc Test

The ANOVA indicates that at least one solution differs from the others but not which one, so we perform post-hoc tests that compare the solutions two at a time. Since there are 3 solutions, there are 3 pairwise comparisons (2 vs 1, 3 vs 1, and 3 vs 2), made here with Tukey's method as follows:


     Simultaneous Tests for General Linear Hypotheses

Multiple Comparisons of Means: Tukey Contrasts


Fit: lm(formula = response ~ treatments + blocks, data = data)

Linear Hypotheses:
           Estimate Std. Error t value Pr(>|t|)    
2 - 1 == 0    2.250      2.078   1.083  0.55787    
3 - 1 == 0  -15.000      2.078  -7.217  0.00115 ** 
3 - 2 == 0  -17.250      2.078  -8.300  < 0.001 ***
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
(Adjusted p values reported -- single-step method)
            Df Sum Sq Mean Sq F value   Pr(>F)    
treatments   2  703.5   351.8   40.72 0.000323 ***
blocks       3 1106.9   369.0   42.71 0.000192 ***
Residuals    6   51.8     8.6                     
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
  Tukey multiple comparisons of means
    95% family-wise confidence level

Fit: aov(formula = response ~ treatments + blocks, data = data)

$treatments
      diff        lwr        upr     p adj
2-1   2.25  -4.126879   8.626879 0.5577862
3-1 -15.00 -21.376879  -8.623121 0.0008758
3-2 -17.25 -23.626879 -10.873121 0.0004067

$blocks
          diff        lwr       upr     p adj
2-1  5.3333333  -2.974240 13.640906 0.2193500
3-1  0.6666667  -7.640906  8.974240 0.9917442
4-1 23.6666667  15.359094 31.974240 0.0002622
3-2 -4.6666667 -12.974240  3.640906 0.3037891
4-2 18.3333333  10.025760 26.640906 0.0010843
4-3 23.0000000  14.692427 31.307573 0.0003081

Interpretation:

There is no significant difference between solutions 1 and 2 because the adjusted p-value (0.558) is greater than 0.05 and the confidence interval includes zero. There is a significant difference between solutions 1 and 3 because the adjusted p-value is less than 0.05 and the confidence interval does not include zero. Likewise, there is a significant difference between solutions 2 and 3 because the adjusted p-value is less than 0.05 and the confidence interval does not include zero. The Tukey post-hoc test therefore identifies which specific pairs of solutions differ in mean bacteria growth: solution 3 is significantly different from the other two, with a mean that is lower by 15.00 (vs solution 1) and 17.25 (vs solution 2), while solutions 1 and 2 do not differ significantly from each other.
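The Tukey intervals for the treatments can be reconstructed from the error mean square and the studentized range quantile. The sketch below assumes the same additive model as the ANOVA above; names such as hsd and tukey_by_hand are illustrative.

```r
y <- c(13, 22, 18, 39, 16, 24, 17, 44, 5, 4, 1, 22)
treatments <- factor(rep(1:3, each = 4))
blocks     <- factor(rep(1:4, times = 3))

fit      <- aov(y ~ treatments + blocks)
ms_error <- deviance(fit) / df.residual(fit)  # residual SS / residual df
n_per_trt <- 4                                # observations per solution

# Half-width of the 95% family-wise Tukey interval for a treatment difference
q_crit <- qtukey(0.95, nmeans = 3, df = df.residual(fit))
hsd    <- q_crit * sqrt(ms_error / n_per_trt)

# Pairwise differences in treatment means with their intervals
m     <- as.numeric(tapply(y, treatments, mean))
diffs <- c("2-1" = m[2] - m[1], "3-1" = m[3] - m[1], "3-2" = m[3] - m[2])
tukey_by_hand <- cbind(diff = diffs, lwr = diffs - hsd, upr = diffs + hsd)
print(tukey_by_hand)
```

Any pair whose absolute difference exceeds hsd has an interval that excludes zero, which is the criterion used in the interpretation above.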

To visualize the results above, we plot the Tukey confidence intervals. The plot gives a simple visual assessment and shows the size and direction of each difference, which the adjusted p-values alone do not.

Interpretation: In the graph above, a difference of zero means that the two group means are equal. If an interval does not contain zero, the corresponding means are significantly different. For the treatments, only the differences 3-1 and 3-2 are significant. These confidence intervals agree with the hypothesis test results in the previous table.

VII. Syntax

Below is the code to compute some basic descriptive statistics of our dataset.

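library(doBy)  # provides summaryBy()

# Bacteria counts for 3 washing solutions (treatments) on 4 days (blocks)
data <- data.frame(
  Solution = c(1, 1, 1, 1, 2, 2, 2, 2, 3, 3, 3, 3),
  Day = c(1, 2, 3, 4, 1, 2, 3, 4, 1, 2, 3, 4),
  Bacteria = c(13, 22, 18, 39, 16, 24, 17, 44, 5, 4, 1, 22))
data
# Summary of every column within each solution
by(data, data$Solution, summary)
# Mean, median, and standard deviation of Bacteria by solution
summaryBy(Bacteria~Solution, data = data, FUN = c(mean, median, sd))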

Before performing the ANOVA in R, it is useful to visualize the data in relation to the research question. This can be done with a line plot using the ggplot() function from the ggplot2 package.

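library(ggplot2)  # provides ggplot() and the geom_*() layers

# One line per solution, with the observed counts marked as points
ggplot(data, aes(x = Day, y = Bacteria, color = as.factor(Solution))) +
  geom_line() +
  geom_point() +
  labs(x = "Day", y = "Bacteria Growth", color = "Solution") +
  ggtitle("Bacteria Growth Over Four Days by Washing Solution")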

The plot alone cannot answer the initial research question; for that we need the ANOVA. ANOVA in R can be done in several ways; we used the aov() function and the lm() function. Both fit the same additive model (treatments plus blocks), so the ANOVA tables, results, and conclusions are the same for both methods.

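# Treatment and block must be factors; the response is the bacteria count
data$treatments <- as.factor(data$Solution)
data$blocks <- as.factor(data$Day)
data$response <- data$Bacteria
# Additive randomized block model: treatments + blocks
res_aov <- aov(response ~ treatments + blocks, data = data)
summary(res_aov)
# The same ANOVA table from the lm() fit
anova(lm(response ~ treatments + blocks, data = data))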

For the multiple comparisons we used Tukey's HSD post-hoc test, which compares all groups to each other. Note that the fitted object res_aov from aov() is reused for the post-hoc tests: first with glht() and mcp() from the multcomp package, then with TukeyHSD().

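library(multcomp)  # provides glht() and mcp()

# Tukey contrasts among the treatment levels
mc <- glht(res_aov, linfct = mcp(treatments = "Tukey"))
summary(mc)
# Tukey HSD intervals for treatments and blocks, followed by the interval plot
tukey_result <- TukeyHSD(res_aov)
print(tukey_result)
plot(tukey_result)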