Comparing Means Across Multiple Groups with Post-Hoc Tukey & CLD
A One-Way Analysis of Variance (ANOVA) is used to determine whether there are any statistically significant differences between the means of three or more independent groups.
We will use the built-in ToothGrowth dataset, which measures the effect of Vitamin C on tooth growth in guinea pigs. We must ensure the independent variable (dose) is treated as a factor.
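To see why the factor conversion matters, here is a minimal sketch in base R. The names `tg_numeric`, `tg_factor`, `df_numeric`, and `df_factor` are illustrative and not part of the tutorial's script. With a numeric dose, `aov()` fits a single linear trend (1 degree of freedom); with a factor, it compares the three group means (2 degrees of freedom).

```r
# Work on copies so the built-in dataset is left untouched
tg_numeric <- ToothGrowth                     # dose stays numeric
tg_factor  <- ToothGrowth
tg_factor$dose <- as.factor(tg_factor$dose)   # dose becomes a 3-level factor

# Degrees of freedom of the dose term (first row of each ANOVA table)
df_numeric <- summary(aov(len ~ dose, data = tg_numeric))[[1]]$Df[1]
df_factor  <- summary(aov(len ~ dose, data = tg_factor))[[1]]$Df[1]

print(c(numeric_dose = df_numeric, factor_dose = df_factor))
```

Only the factor version tests whether the group means differ from one another, which is the question a one-way ANOVA is meant to answer.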
# 1. Load required packages
library(tidyverse)
library(multcompView)
library(emmeans) # For Post-Hoc tests
# 2. Prepare Data
data(ToothGrowth)
ToothGrowth$dose <- as.factor(ToothGrowth$dose) # Convert dose to factor

The ANOVA test evaluates the null hypothesis ($H_0$) that all group means are equal. The alternative is that at least one group mean differs from the others.
# 3. ANOVA Analysis
anova_result <- aov(len ~ dose, data = ToothGrowth)
summary(anova_result)

            Df Sum Sq Mean Sq F value   Pr(>F)
dose         2   2426    1213   67.42 9.53e-16 ***
Residuals   57   1026      18
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
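The F value in the table is the ratio of the between-group mean square to the within-group (residual) mean square. The sketch below recomputes it by hand in base R and checks it against `aov()`. All variable names (`tg`, `ss_between`, `f_manual`, and so on) are illustrative assumptions, not part of the tutorial's script.

```r
tg <- ToothGrowth
tg$dose <- as.factor(tg$dose)

grand_mean  <- mean(tg$len)
group_means <- tapply(tg$len, tg$dose, mean)    # one mean per dose level
group_n     <- tapply(tg$len, tg$dose, length)  # observations per dose level

# Between-group and within-group sums of squares
ss_between <- sum(group_n * (group_means - grand_mean)^2)
ss_within  <- sum((tg$len - ave(tg$len, tg$dose))^2)  # ave() returns each row's group mean

# Degrees of freedom: k - 1 and N - k
df_between <- nlevels(tg$dose) - 1
df_within  <- nrow(tg) - nlevels(tg$dose)

# F ratio and its upper-tail p-value
f_manual <- (ss_between / df_between) / (ss_within / df_within)
p_manual <- pf(f_manual, df_between, df_within, lower.tail = FALSE)

# Compare with the value reported by aov()
f_aov <- summary(aov(len ~ dose, data = tg))[[1]][["F value"]][1]
print(all.equal(f_manual, f_aov))
```

A large F ratio means the spread between the group means is large relative to the spread within the groups, which is what makes the p-value small here.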
# Alternative: Perform ANOVA using pipes
# ToothGrowth %>%
#   aov(len ~ dose, data = .) %>%
#   summary()

If the ANOVA p-value is significant (< 0.05), we use Tukey’s Honest Significant Difference (HSD) test to find which specific groups differ. We then use a Compact Letter Display (CLD) to simplify the results: groups sharing a letter are not significantly different.
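Before the comparisons are condensed into letters, it can help to look at the pairwise table that `TukeyHSD()` returns. The following is a small base-R sketch; the names `tg`, `fit`, `tukey`, and `pairs_tbl`, and the added `significant` column, are illustrative and not part of the tutorial's script.

```r
tg <- ToothGrowth
tg$dose <- as.factor(tg$dose)

fit   <- aov(len ~ dose, data = tg)
tukey <- TukeyHSD(fit)

# One row per pair of doses: difference in means, 95% confidence
# interval (lwr, upr), and the Tukey-adjusted p-value ("p adj")
pairs_tbl <- as.data.frame(tukey$dose)

# Flag pairs whose adjusted p-value falls below the 0.05 threshold
pairs_tbl$significant <- tukey$dose[, "p adj"] < 0.05
print(pairs_tbl)
```

Each pair flagged as significant must end up with different letters in the CLD; pairs that are not flagged share a letter.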
# 4. Tukey Test & CLD (Compact Letter Display)
tukey_result <- TukeyHSD(anova_result)
# Generate CLD letters
cld_letters <- multcompLetters4(anova_result, tukey_result)
print(cld_letters)

$dose
  2   1 0.5
"a" "b" "c"
We calculate the mean and standard deviation for each group and append the CLD letters for visualization.
# 5. Calculate Group Means and add CLD
summary_data <- ToothGrowth %>%
  group_by(dose) %>%
  summarise(
    mean_len = mean(len),
    sd_len = sd(len)
  ) %>%
  mutate(
    # The letters are ordered by descending mean (2, 1, 0.5), while the rows
    # here follow the factor levels (0.5, 1, 2), so match by name, not position
    cld = unname(cld_letters$dose$Letters[as.character(dose)])
  )
print(summary_data)

# A tibble: 3 × 4
  dose  mean_len sd_len cld
  <fct>    <dbl>  <dbl> <chr>
1 0.5       10.6   4.50 c
2 1         19.7   4.42 b
3 2         26.1   3.77 a
We create a publication-quality bar chart that includes error bars and the significance letters (CLD).
# 6. Create Bar Graph with CLD
summary_data %>%
  ggplot(aes(dose, mean_len)) +
  geom_bar(stat = "identity",
           fill = "steelblue",
           alpha = 0.7) +
  geom_errorbar(aes(ymin = mean_len - sd_len,
                    ymax = mean_len + sd_len),
                width = 0.2,
                color = "black") +
  geom_text(aes(label = cld,
                y = mean_len + sd_len + 1.5),
            size = 5,
            fontface = "bold",
            color = "black") +
  labs(x = "Vitamin C Dose (mg/day)",
       y = "Tooth Length (mm)",
       title = "One-Way ANOVA: Impact of Dose on Tooth Growth",
       subtitle = "Groups with different letters are significantly different (p < 0.05)") +
  theme_test(base_size = 15) +
  theme(plot.title = element_text(hjust = 0.5, face = "bold"),
        plot.subtitle = element_text(hjust = 0.5))

# 7. Save the Plot
# ggsave("anova_cld_plot.png", width = 6, height = 5, dpi = 300)

Key functions:
aov(numeric ~ factor, data = df)
summary(anova_object)
TukeyHSD(anova_object)
multcompLetters4()

Great work! You have now moved from basic t-tests to multi-group ANOVA with professional significance mapping.