Iris: ANOVA, Tukey HSD, and Compact Letter Display

Author

Lab demo

Published

May 5, 2026

Question

Do the three Iris species differ in sepal length, and which pairs differ from each other?

Data

data(iris)
summary(iris$Sepal.Length)
   Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
  4.300   5.100   5.800   5.843   6.400   7.900 
table(iris$Species)

    setosa versicolor  virginica 
        50         50         50 

One-way ANOVA

fit <- aov(Sepal.Length ~ Species, data = iris)
summary(fit)
             Df Sum Sq Mean Sq F value Pr(>F)    
Species       2  63.21  31.606   119.3 <2e-16 ***
Residuals   147  38.96   0.265                   
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

The species effect is highly significant (F(2, 147) = 119.3, p < 2e-16), so at least one species mean differs from the others. ANOVA does not tell us which pairs differ; for that we use Tukey’s HSD post-hoc test.
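
To make the ANOVA table concrete, the F statistic can be rebuilt from the group means in base R. This is a minimal sketch, not part of the original analysis; the names `ss_between`, `ss_within`, `f_stat`, and `p_val` are illustrative.

```r
# Rebuild the one-way ANOVA F statistic by hand (base R only).
data(iris)
y <- iris$Sepal.Length
g <- iris$Species

grand_mean  <- mean(y)
group_means <- tapply(y, g, mean)    # one mean per species
group_n     <- tapply(y, g, length)  # observations per species

# Between-group SS: spread of the group means around the grand mean
ss_between <- sum(group_n * (group_means - grand_mean)^2)
# Within-group SS: spread of observations around their own group mean
ss_within  <- sum((y - ave(y, g))^2)

df_between <- nlevels(g) - 1          # 3 - 1 = 2
df_within  <- length(y) - nlevels(g)  # 150 - 3 = 147

# F is the ratio of the two mean squares
f_stat <- (ss_between / df_between) / (ss_within / df_within)
p_val  <- pf(f_stat, df_between, df_within, lower.tail = FALSE)

round(c(SS_between = ss_between, SS_within = ss_within, F = f_stat), 2)
```

The rounded sums of squares and F value should agree with the `Sum Sq` and `F value` columns printed by `summary(fit)`.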

Tukey HSD

tuk <- TukeyHSD(fit)
tuk
  Tukey multiple comparisons of means
    95% family-wise confidence level

Fit: aov(formula = Sepal.Length ~ Species, data = iris)

$Species
                      diff       lwr       upr p adj
versicolor-setosa    0.930 0.6862273 1.1737727     0
virginica-setosa     1.582 1.3382273 1.8257727     0
virginica-versicolor 0.652 0.4082273 0.8957727     0
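
All three intervals exclude zero, so every pair of species differs. As a rough check on where the `lwr` and `upr` columns come from, the interval for one pair can be rebuilt from the studentized range distribution. This sketch assumes equal group sizes (n = 50 per species, as in the table of counts) and uses illustrative names (`mse`, `q_crit`, `half_width`, `p_adj`) that are not in the original code.

```r
# Rebuild one Tukey HSD interval by hand (base R only).
data(iris)
fit <- aov(Sepal.Length ~ Species, data = iris)

mse   <- deviance(fit) / df.residual(fit)  # residual mean square
k     <- nlevels(iris$Species)             # number of group means (3)
n     <- 50                                # per-group sample size (assumed equal)
se    <- sqrt(mse / 2 * (1 / n + 1 / n))   # scale used by the studentized range

# Critical value of the studentized range at 95% family-wise confidence
q_crit     <- qtukey(0.95, nmeans = k, df = df.residual(fit))
half_width <- q_crit * se

means <- tapply(iris$Sepal.Length, iris$Species, mean)
d     <- unname(means["versicolor"] - means["setosa"])

# Adjusted p-value from the upper tail of the studentized range
p_adj <- ptukey(abs(d) / se, nmeans = k, df = df.residual(fit),
                lower.tail = FALSE)

c(diff = d, lwr = d - half_width, upr = d + half_width)
format(p_adj, scientific = TRUE)
```

The first line of output should reproduce the `versicolor-setosa` row. The adjusted p-value is extremely small, which is why the `p adj` column prints as 0.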

Compact letter display

Means that share a letter are not significantly different at \(\alpha = 0.05\); means with no letter in common are significantly different.

library(multcompView)

cld <- multcompLetters4(fit, tuk)
cld_letters <- cld$Species$Letters
cld_df <- data.frame(
  Species = names(cld_letters),
  cld     = unname(cld_letters)
)
cld_df
     Species cld
1  virginica   a
2 versicolor   b
3     setosa   c
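
The letters follow directly from the Tukey p-values. The base R sketch below lists which pairs are significant and orders the species by descending mean, which appears to be the order in which the letters are assigned in the output above. The names `p_adj`, `sig`, and `means_desc` are illustrative and not part of the original code.

```r
# Inspect the inputs behind the compact letter display (base R only).
data(iris)
fit <- aov(Sepal.Length ~ Species, data = iris)
tuk <- TukeyHSD(fit)

# Adjusted p-value for each pairwise comparison
p_adj <- tuk$Species[, "p adj"]

# TRUE where a pair differs at alpha = 0.05; such pairs cannot share a letter
sig <- p_adj < 0.05
sig

# Species ordered from largest to smallest mean sepal length
means_desc <- sort(tapply(iris$Sepal.Length, iris$Species, mean),
                   decreasing = TRUE)
means_desc
```

Because every pair is flagged as significant, no two species can share a letter, which is consistent with the three distinct letters a, b, and c.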

Plot with letters

library(ggplot2)

y_label_pos <- max(iris$Sepal.Length) + 0.4

ggplot(iris, aes(Species, Sepal.Length, fill = Species)) +
  geom_boxplot(alpha = 0.7, outlier.shape = 1) +
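  # Letters sit at a fixed height above the tallest box, so y is set
  # as a constant outside aes() rather than mapped as an aesthetic
  geom_text(
    data = cld_df,
    aes(x = Species, label = cld),
    y = y_label_pos,
    inherit.aes = FALSE, size = 6, fontface = "bold"
  ) +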
  labs(
    y = "Sepal length (cm)",
    x = NULL
  ) +
  theme_minimal(base_size = 12) +
  theme(legend.position = "none")
Figure 1: Sepal length by Iris species. Means sharing a letter are not significantly different (Tukey HSD, α = 0.05).

Session info

sessionInfo()
R version 4.5.2 (2025-10-31)
Platform: aarch64-apple-darwin20
Running under: macOS Tahoe 26.3.1

Matrix products: default
BLAS:   /System/Library/Frameworks/Accelerate.framework/Versions/A/Frameworks/vecLib.framework/Versions/A/libBLAS.dylib 
LAPACK: /Library/Frameworks/R.framework/Versions/4.5-arm64/Resources/lib/libRlapack.dylib;  LAPACK version 3.12.1

locale:
[1] en_US.UTF-8/en_US.UTF-8/en_US.UTF-8/C/en_US.UTF-8/en_US.UTF-8

time zone: America/New_York
tzcode source: internal

attached base packages:
[1] stats     graphics  grDevices utils     datasets  methods   base     

other attached packages:
[1] ggplot2_4.0.1       multcompView_0.1-10

loaded via a namespace (and not attached):
 [1] vctrs_0.6.5        cli_3.6.5          knitr_1.51         rlang_1.1.6       
 [5] xfun_0.55          otel_0.2.0         generics_0.1.4     S7_0.2.1          
 [9] jsonlite_2.0.0     labeling_0.4.3     glue_1.8.0         htmltools_0.5.9   
[13] scales_1.4.0       rmarkdown_2.30     grid_4.5.2         tibble_3.3.0      
[17] evaluate_1.0.5     fastmap_1.2.0      yaml_2.3.12        lifecycle_1.0.4   
[21] compiler_4.5.2     dplyr_1.1.4        RColorBrewer_1.1-3 pkgconfig_2.0.3   
[25] htmlwidgets_1.6.4  farver_2.1.2       digest_0.6.39      R6_2.6.1          
[29] tidyselect_1.2.1   dichromat_2.0-0.1  pillar_1.11.1      magrittr_2.0.4    
[33] withr_3.0.2        tools_4.5.2        gtable_0.3.6