R Markdown

Built-in R data set named PlantGrowth would be used for the Anova test. It contains the weight of plants obtained under a control and two different treatment conditions.

Load and inspect the data by using the function sample_n_by() to display one random row by groups

data("PlantGrowth")
set.seed(1234)
head(PlantGrowth)
##   weight group
## 1   4.17  ctrl
## 2   5.58  ctrl
## 3   5.18  ctrl
## 4   6.11  ctrl
## 5   4.50  ctrl
## 6   4.61  ctrl
PlantGrowth %>% sample_n_by(group, size = 1)
## # A tibble: 3 x 2
##   weight group
##    <dbl> <fct>
## 1   5.14 ctrl 
## 2   3.83 trt1 
## 3   5.37 trt2
levels(PlantGrowth$group)   # Show the levels
## [1] "ctrl" "trt1" "trt2"
PlantGrowth <- PlantGrowth %>%
reorder_levels(group, order = c("ctrl", "trt1", "trt2"))  # We can also reoder levels

Summary Statistics

PlantGrowth %>%
group_by(group) %>%
get_summary_stats(weight, type = "mean_sd")
## # A tibble: 3 x 5
##   group variable     n  mean    sd
##   <fct> <chr>    <dbl> <dbl> <dbl>
## 1 ctrl  weight      10  5.03 0.583
## 2 trt1  weight      10  4.66 0.794
## 3 trt2  weight      10  5.53 0.443
ggboxplot(PlantGrowth, x = "group", y = "weight")

Check Assumptions of Anova

1 Outliers

PlantGrowth %>%
group_by(group) %>%
identify_outliers(weight)
## # A tibble: 2 x 4
##   group weight is.outlier is.extreme
##   <fct>  <dbl> <lgl>      <lgl>     
## 1 trt1    5.87 TRUE       FALSE     
## 2 trt1    6.03 TRUE       FALSE

Comment

There were no extreme outliers

2 Normality

2 approaches can be used to check normality:

  1. Analyzing the ANOVA model residuals to check the normality for all groups together. This approach is easier and it’s very handy when you have many groups or if there are few data points per group.

  2. Check normality for each group separately. This approach might be used when you have only a few groups and many data points per group.

model <- lm(weight ~ group, data = PlantGrowth)  ## Build a linear model
ggqqplot(residuals(model))                       ## create a QQ plot

shapiro_test(residuals(model))                   ## Compute Shapiro Wilk test
## # A tibble: 1 x 3
##   variable         statistic p.value
##   <chr>                <dbl>   <dbl>
## 1 residuals(model)     0.966   0.438

Comment

In the QQ plot, as all the points fall approximately along the reference line, we can assume normality. This conclusion is supported by the Shapiro-Wilk test. The p-value is not significant (p = 0.13), so we can assume normality

Check normality assumption by groups. Computing Shapiro-Wilk test for each group level. If the data is normally distributed, the p-value should be greater than 0.05.

PlantGrowth %>%
group_by(group) %>%
shapiro_test(weight)
## # A tibble: 3 x 4
##   group variable statistic     p
##   <fct> <chr>        <dbl> <dbl>
## 1 ctrl  weight       0.957 0.747
## 2 trt1  weight       0.930 0.452
## 3 trt2  weight       0.941 0.564

Homogneity of variance assumption

  1. The residuals versus fits plot can be used to check the homogeneity of variances.
plot(model, 1)

  1. Using Levene’s test to check the homogeneity of variances
PlantGrowth %>% levene_test(weight ~ group)
## # A tibble: 1 x 4
##     df1   df2 statistic     p
##   <int> <int>     <dbl> <dbl>
## 1     2    27      1.12 0.341

comment

From the output above, we can see that the p-value is > 0.05, which is not significant. This means that, there is not significant difference between variances across groups. Therefore, we can assume the homogeneity of variances in the different treatment groups. In a situation where the homogeneity of variance assumption is not met, you can compute the Welch one-way ANOVA test using the function welch_anova_test()[rstatix package]. This test does not require the assumption of equal variances.

Computation

res.aov <- PlantGrowth %>% anova_test(weight ~ group)
## Coefficient covariances computed by hccm()
res.aov
## ANOVA Table (type II tests)
## 
##   Effect DFn DFd     F     p p<.05   ges
## 1  group   2  27 4.846 0.016     * 0.264

Anova Results Conclusion

From the above ANOVA table, it can be seen that there are significant differences between groups (p = 0.016), which are highlighted with “*“, F(2, 27) = 4.85, p = 0.16, eta2[g] = 0.26.

Post hoc Test. Pairwise comparison

pwc <- PlantGrowth %>% tukey_hsd(weight ~ group)
pwc
## # A tibble: 3 x 9
##   term  group1 group2 null.value estimate conf.low conf.high p.adj p.adj.signif
## * <chr> <chr>  <chr>       <dbl>    <dbl>    <dbl>     <dbl> <dbl> <chr>       
## 1 group ctrl   trt1            0   -0.371   -1.06      0.320 0.391 ns          
## 2 group ctrl   trt2            0    0.494   -0.197     1.19  0.198 ns          
## 3 group trt1   trt2            0    0.865    0.174     1.56  0.012 *

Conclusion from the pairwise comparison

It can be seen from the output, that only the difference between trt2 and trt1 is significant (adjusted p-value = 0.012).

Visualisation

Note that the echo = FALSE parameter was added to the code chunk to prevent printing of the R code that generated the plot.