Built-in R data set named PlantGrowth would be used for the Anova test. It contains the weight of plants obtained under a control and two different treatment conditions.
Load and inspect the data by using the function sample_n_by() to display one random row by groups
data("PlantGrowth")
set.seed(1234)
head(PlantGrowth)
## weight group
## 1 4.17 ctrl
## 2 5.58 ctrl
## 3 5.18 ctrl
## 4 6.11 ctrl
## 5 4.50 ctrl
## 6 4.61 ctrl
PlantGrowth %>% sample_n_by(group, size = 1)
## # A tibble: 3 x 2
## weight group
## <dbl> <fct>
## 1 5.14 ctrl
## 2 3.83 trt1
## 3 5.37 trt2
levels(PlantGrowth$group) # Show the levels
## [1] "ctrl" "trt1" "trt2"
PlantGrowth <- PlantGrowth %>%
reorder_levels(group, order = c("ctrl", "trt1", "trt2")) # We can also reoder levels
Summary Statistics
PlantGrowth %>%
group_by(group) %>%
get_summary_stats(weight, type = "mean_sd")
## # A tibble: 3 x 5
## group variable n mean sd
## <fct> <chr> <dbl> <dbl> <dbl>
## 1 ctrl weight 10 5.03 0.583
## 2 trt1 weight 10 4.66 0.794
## 3 trt2 weight 10 5.53 0.443
ggboxplot(PlantGrowth, x = "group", y = "weight")
Check Assumptions of Anova
1 Outliers
PlantGrowth %>%
group_by(group) %>%
identify_outliers(weight)
## # A tibble: 2 x 4
## group weight is.outlier is.extreme
## <fct> <dbl> <lgl> <lgl>
## 1 trt1 5.87 TRUE FALSE
## 2 trt1 6.03 TRUE FALSE
Comment
There were no extreme outliers
2 Normality
2 approaches can be used to check normality:
Analyzing the ANOVA model residuals to check the normality for all groups together. This approach is easier and it’s very handy when you have many groups or if there are few data points per group.
Check normality for each group separately. This approach might be used when you have only a few groups and many data points per group.
model <- lm(weight ~ group, data = PlantGrowth) ## Build a linear model
ggqqplot(residuals(model)) ## create a QQ plot
shapiro_test(residuals(model)) ## Compute Shapiro Wilk test
## # A tibble: 1 x 3
## variable statistic p.value
## <chr> <dbl> <dbl>
## 1 residuals(model) 0.966 0.438
Comment
In the QQ plot, as all the points fall approximately along the reference line, we can assume normality. This conclusion is supported by the Shapiro-Wilk test. The p-value is not significant (p = 0.13), so we can assume normality
Check normality assumption by groups. Computing Shapiro-Wilk test for each group level. If the data is normally distributed, the p-value should be greater than 0.05.
PlantGrowth %>%
group_by(group) %>%
shapiro_test(weight)
## # A tibble: 3 x 4
## group variable statistic p
## <fct> <chr> <dbl> <dbl>
## 1 ctrl weight 0.957 0.747
## 2 trt1 weight 0.930 0.452
## 3 trt2 weight 0.941 0.564
Homogneity of variance assumption
plot(model, 1)
PlantGrowth %>% levene_test(weight ~ group)
## # A tibble: 1 x 4
## df1 df2 statistic p
## <int> <int> <dbl> <dbl>
## 1 2 27 1.12 0.341
comment
From the output above, we can see that the p-value is > 0.05, which is not significant. This means that, there is not significant difference between variances across groups. Therefore, we can assume the homogeneity of variances in the different treatment groups. In a situation where the homogeneity of variance assumption is not met, you can compute the Welch one-way ANOVA test using the function welch_anova_test()[rstatix package]. This test does not require the assumption of equal variances.
Computation
res.aov <- PlantGrowth %>% anova_test(weight ~ group)
## Coefficient covariances computed by hccm()
res.aov
## ANOVA Table (type II tests)
##
## Effect DFn DFd F p p<.05 ges
## 1 group 2 27 4.846 0.016 * 0.264
Anova Results Conclusion
From the above ANOVA table, it can be seen that there are significant differences between groups (p = 0.016), which are highlighted with “*“, F(2, 27) = 4.85, p = 0.16, eta2[g] = 0.26.
Post hoc Test. Pairwise comparison
pwc <- PlantGrowth %>% tukey_hsd(weight ~ group)
pwc
## # A tibble: 3 x 9
## term group1 group2 null.value estimate conf.low conf.high p.adj p.adj.signif
## * <chr> <chr> <chr> <dbl> <dbl> <dbl> <dbl> <dbl> <chr>
## 1 group ctrl trt1 0 -0.371 -1.06 0.320 0.391 ns
## 2 group ctrl trt2 0 0.494 -0.197 1.19 0.198 ns
## 3 group trt1 trt2 0 0.865 0.174 1.56 0.012 *
Conclusion from the pairwise comparison
It can be seen from the output, that only the difference between trt2 and trt1 is significant (adjusted p-value = 0.012).
Note that the echo = FALSE parameter was added to the code chunk to prevent printing of the R code that generated the plot.