data(iris)
summary(iris$Sepal.Length)
   Min. 1st Qu.  Median    Mean 3rd Qu.    Max.
  4.300   5.100   5.800   5.843   6.400   7.900
table(iris$Species)
    setosa versicolor  virginica
        50         50         50
Do the three Iris species differ in sepal length, and which pairs differ from each other?
fit <- aov(Sepal.Length ~ Species, data = iris)
summary(fit)
             Df Sum Sq Mean Sq F value Pr(>F)
Species       2  63.21  31.606   119.3 <2e-16 ***
Residuals   147  38.96   0.265
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
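The F value in the table can be reproduced from the definition of one-way ANOVA, which may make the output less opaque. The sketch below uses only base R. The names `ss_between`, `ss_within`, and `f_stat` are illustrative and are not part of the original analysis.

```r
# Recompute the one-way ANOVA quantities by hand (illustrative sketch)
data(iris)
y <- iris$Sepal.Length
g <- iris$Species

grand_mean  <- mean(y)
group_means <- tapply(y, g, mean)     # one mean per species
group_n     <- tapply(y, g, length)   # observations per species

# Between-group and within-group sums of squares
ss_between <- sum(group_n * (group_means - grand_mean)^2)
ss_within  <- sum((y - ave(y, g))^2)  # ave() returns each row's group mean

df_between <- nlevels(g) - 1
df_within  <- length(y) - nlevels(g)

# F is the ratio of the two mean squares
f_stat <- (ss_between / df_between) / (ss_within / df_within)
p_val  <- pf(f_stat, df_between, df_within, lower.tail = FALSE)

round(c(SS_between = ss_between, SS_within = ss_within, F = f_stat), 3)
```

The rounded values should agree with the `Sum Sq` and `F value` columns reported by `summary(fit)`.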
The species factor is highly significant (F = 119.3 on 2 and 147 degrees of freedom, p < 2e-16), so at least one species mean differs from the others. ANOVA alone does not tell us which pairs differ; for that we use Tukey’s HSD post-hoc test.
tuk <- TukeyHSD(fit)
tuk
  Tukey multiple comparisons of means
    95% family-wise confidence level

Fit: aov(formula = Sepal.Length ~ Species, data = iris)

$Species
                      diff       lwr       upr p adj
versicolor-setosa    0.930 0.6862273 1.1737727     0
virginica-setosa     1.582 1.3382273 1.8257727     0
virginica-versicolor 0.652 0.4082273 0.8957727     0
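For balanced groups, each Tukey interval is the difference in means plus or minus a half-width built from the studentized range quantile. The following is a minimal sketch, assuming equal group sizes (50 per species here) and using base R's `qtukey()`. Names such as `half_width` and `ci_vs` are hypothetical.

```r
# Rebuild one Tukey HSD interval by hand (sketch, balanced design assumed)
data(iris)
fit <- aov(Sepal.Length ~ Species, data = iris)

mse <- deviance(fit) / df.residual(fit)  # residual mean square
n   <- 50                                # observations per species
k   <- nlevels(iris$Species)             # number of group means

# Studentized range critical value for a 95% family-wise level
q_crit     <- qtukey(0.95, nmeans = k, df = df.residual(fit))
half_width <- q_crit * sqrt(mse / n)

means   <- tapply(iris$Sepal.Length, iris$Species, mean)
diff_vs <- as.numeric(means["versicolor"] - means["setosa"])

ci_vs <- c(diff = diff_vs,
           lwr  = diff_vs - half_width,
           upr  = diff_vs + half_width)
round(ci_vs, 4)
```

Because the design is balanced, the same `half_width` applies to all three comparisons, and `ci_vs` should agree with the `versicolor-setosa` row of the Tukey table.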
A compact letter display summarizes the Tukey results: means that share a letter are not significantly different at \(\alpha = 0.05\).
library(multcompView)
cld <- multcompLetters4(fit, tuk)
cld_letters <- cld$Species$Letters
cld_df <- data.frame(
  Species = names(cld_letters),
  cld = unname(cld_letters)
)
cld_df
     Species cld
1  virginica   a
2 versicolor   b
3     setosa   c
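Each species receives its own letter because every pairwise comparison is significant. A rough way to check this without `multcompView` is to look for pairs whose adjusted p-value is at or above 0.05, since only those pairs would share a letter. This base-R sketch is illustrative, and the names `not_different` and `means_desc` are hypothetical.

```r
# Which pairs would share a letter? (sketch using base R only)
data(iris)
fit <- aov(Sepal.Length ~ Species, data = iris)
tuk <- TukeyHSD(fit)

# Adjusted p-values, named by comparison (e.g. "versicolor-setosa")
p_adj <- tuk$Species[, "p adj"]

# Pairs that are NOT significantly different at alpha = 0.05
not_different <- names(p_adj)[p_adj >= 0.05]

# Species means ordered from largest to smallest, for comparison
# with the order of the letters in the table above
means <- tapply(iris$Sepal.Length, iris$Species, mean)
means_desc <- means[order(means, decreasing = TRUE)]

not_different
round(means_desc, 3)
```

An empty `not_different` means no two species share a letter, which is consistent with the three distinct letters in `cld_df`.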
library(ggplot2)
# Place the letters slightly above the largest observation
y_label_pos <- max(iris$Sepal.Length) + 0.4
ggplot(iris, aes(Species, Sepal.Length, fill = Species)) +
  geom_boxplot(alpha = 0.7, outlier.shape = 1) +
  # One letter per species, taken from the compact letter display
  geom_text(
    data = cld_df,
    aes(x = Species, y = y_label_pos, label = cld),
    inherit.aes = FALSE, size = 6, fontface = "bold"
  ) +
  labs(
    y = "Sepal length (cm)",
    x = NULL
  ) +
  theme_minimal(base_size = 12) +
  theme(legend.position = "none")
sessionInfo()
R version 4.5.2 (2025-10-31)
Platform: aarch64-apple-darwin20
Running under: macOS Tahoe 26.3.1
Matrix products: default
BLAS: /System/Library/Frameworks/Accelerate.framework/Versions/A/Frameworks/vecLib.framework/Versions/A/libBLAS.dylib
LAPACK: /Library/Frameworks/R.framework/Versions/4.5-arm64/Resources/lib/libRlapack.dylib; LAPACK version 3.12.1
locale:
[1] en_US.UTF-8/en_US.UTF-8/en_US.UTF-8/C/en_US.UTF-8/en_US.UTF-8
time zone: America/New_York
tzcode source: internal
attached base packages:
[1] stats graphics grDevices utils datasets methods base
other attached packages:
[1] ggplot2_4.0.1 multcompView_0.1-10
loaded via a namespace (and not attached):
[1] vctrs_0.6.5 cli_3.6.5 knitr_1.51 rlang_1.1.6
[5] xfun_0.55 otel_0.2.0 generics_0.1.4 S7_0.2.1
[9] jsonlite_2.0.0 labeling_0.4.3 glue_1.8.0 htmltools_0.5.9
[13] scales_1.4.0 rmarkdown_2.30 grid_4.5.2 tibble_3.3.0
[17] evaluate_1.0.5 fastmap_1.2.0 yaml_2.3.12 lifecycle_1.0.4
[21] compiler_4.5.2 dplyr_1.1.4 RColorBrewer_1.1-3 pkgconfig_2.0.3
[25] htmlwidgets_1.6.4 farver_2.1.2 digest_0.6.39 R6_2.6.1
[29] tidyselect_1.2.1 dichromat_2.0-0.1 pillar_1.11.1 magrittr_2.0.4
[33] withr_3.0.2 tools_4.5.2 gtable_0.3.6