Analysis of Variance (ANOVA)

EPI 553: Principles of Statistical Inference II

Author

Muntasir Masum, PhD

Published

January 20, 2026


Learning Objectives

By the end of this lecture, you will be able to:

  • Understand the purpose and application of ANOVA
  • Decompose total variance into treatment and error components
  • Interpret ANOVA tables and F-test results
  • Conduct post-hoc comparisons using Tukey-Kramer method
  • Apply ANOVA to real public health data using R
  • Translate research questions into testable hypotheses for multiple group comparisons

Why ANOVA Matters: Real-World Applications

ANOVA is one of the most powerful tools in your biostatistics toolkit. Here’s why you’ll use it throughout your public health career:

Example 1: Medication Effectiveness - Question: Does blood pressure reduction differ across three different medications? - ANOVA Solution: Compare mean BP reduction across treatment groups

Example 2: Environmental Health - Question: How does respiratory hospital admission rate vary by air pollution level (low/moderate/high)? - ANOVA Solution: Test for differences in mean admission rates across pollution categories

Example 3: Population Health - Question: Does COVID-19 vaccination rate differ across multiple counties? - ANOVA Solution: Compare vaccination rates across geographic units

Example 4: Occupational Health - Question: Do lung function outcomes differ across different occupational exposure groups? - ANOVA Solution: Test for differences in forced expiratory volume across exposure categories


When/Why to Use ANOVA

The Problem ANOVA Solves

Suppose you want to compare three or more group means. Your first instinct might be: “Why not just do multiple t-tests?”

The Multiple Comparisons Problem

If you conduct independent t-tests for each pair of groups:

  • With k groups, you need \(\binom{k}{2}\) pairwise tests (pronounced “k choose 2”)
  • Each test has a Type I error rate of α = 0.05
  • The experiment-wise error rate inflates dramatically

Example with 3 groups (3 comparisons):

The experiment-wise error rate equals 1 minus (1 minus 0.05) raised to the power of 3, which equals 1 minus 0.857, which equals 0.143.

\[\text{Experiment-wise error rate} = 1 - (1-0.05)^3 = 1 - 0.857 = 0.143\]

Your true probability of making at least one false positive is 14.3%, not 5%!

ANOVA’s Solution

  • Conducts a single omnibus test of all means simultaneously
  • Controls the experiment-wise Type I error rate at α
  • If significant, use post-hoc tests (like Tukey-Kramer) to identify which pairs differ

ANOVA Fundamentals

Hypotheses

Null hypothesis (H₀): All group means are equal (μ₁ = μ₂ = … = μₖ)

\[H_0: \mu_1 = \mu_2 = \cdots = \mu_k\]

Alternative hypothesis (H₁): At least one group mean is different from the others

\[H_1: \text{At least one mean is different}\]

Key Terminology

Term Definition
Factor The categorical independent variable (e.g., treatment type)
Levels The number of groups in the factor (e.g., 3 medications = 3 levels)
One-way ANOVA One factor with multiple levels (our focus)
Two-way ANOVA Two factors (beyond this course scope)

Table 1: Key terminology used in ANOVA analysis


The Math: Decomposing Variance

ANOVA works by partitioning total variation into two components:

Total Sum of Squares (SSY)

The total deviation of all observations from the grand mean. This represents the overall variability in the outcome across all groups.

\[SS_Y = \sum_{i=1}^{k} \sum_{j=1}^{n_i} (y_{ij} - \overline{y})^2\]

Where: - \(y_{ij}\) = observation j in group i - \(\overline{y}\) = grand mean (the average across all observations and groups)

Key Identity: Decomposition

Total variation equals the sum of between-group and within-group variation:

\[SS_Y = SS_T + SS_E\]

Treatment Sum of Squares (SST) - Variation between groups: \[SS_T = \sum_{i=1}^{k} n_i(\overline{y}_{i} - \overline{y})^2\]

This represents how much each group’s mean differs from the overall grand mean, weighted by group size.

Error Sum of Squares (SSE) - Variation within groups: \[SS_E = \sum_{i=1}^{k} \sum_{j=1}^{n_i} (y_{ij} - \overline{y}_{i})^2\]

This represents the variability of individual observations around their own group mean. This is the “leftover” or unexplained variation.

Mean Squares

Divide by degrees of freedom to get mean squares, which account for the number of parameters estimated:

\[MS_T = \frac{SS_T}{k-1}\]

\[MS_E = \frac{SS_E}{n-k}\]

Note: Degrees of freedom for treatment equals the number of groups minus 1 (k-1). Degrees of freedom for error equals the total sample size minus the number of groups (n-k).


The F-Statistic

The test statistic for ANOVA is the F-ratio:

\[F = \frac{MS_T}{MS_E}\]

Interpretation

  • Numerator (MST): Average variation between groups. This captures the treatment effect or group differences.
  • Denominator (MSE): Average variation within groups. This captures random error and noise.
  • Large F: Groups differ more than we’d expect by chance alone (suggests a real treatment effect)
  • Small F: Differences between groups are small relative to within-group variation (suggests no treatment effect)

Decision Rule

Reject the null hypothesis if F exceeds the critical value \(F_{k-1, n-k, 1-\alpha}\) at significance level α (typically 0.05).

In practice, we compare the p-value to α: reject \(H_0\) if p-value < α.


ANOVA Assumptions

ANOVA requires three key assumptions. Meeting these assumptions ensures the validity of your p-values and inference.

1. Independence

Observations are independent. Each observation should not influence another. This assumption is violated in repeated measures designs, clustered data, or time series data.

2. Normality

Within each group, the outcome is approximately normally distributed. The distribution should resemble a bell curve.

3. Homogeneity of Variance

All groups have equal population variances. Mathematically, this means: \(\sigma_1^2 = \sigma_2^2 = \cdots = \sigma_k^2\)

In other words, the spread of values around each group’s mean should be roughly the same across all groups.

Robustness

Good news: ANOVA is robust to moderate violations of these assumptions, particularly: - With larger sample sizes (typically n > 30 per group) - With equal or similar group sizes - With only slight departures from normality

We’ll check these assumptions in our example to demonstrate best practices.


The ANOVA Table

The standard way to present ANOVA results is in a formatted table:

Source DF Sum of Squares Mean Square F-value Pr(>F)
Model (Treatment) k-1 SST MST MST/MSE p-value
Error (Residual) n-k SSE MSE
Total n-1 SSY

Table 2: Standard ANOVA table structure. Columns represent: Source (the source of variation), DF (degrees of freedom), Sum of Squares (total squared deviations), Mean Square (sum of squares divided by DF), F-value (test statistic), and Pr(>F) (p-value).


Post-Hoc Testing: Tukey-Kramer Method

If you reject the null hypothesis, the next question is: Which groups differ from each other?

Why Not Just Do Pairwise t-Tests?

Remember the multiple comparisons problem from earlier—we’d inflate the Type I error rate again!

Tukey-Kramer Approach

The Tukey-Kramer method controls the family-wise error rate while testing all pairwise comparisons. This means that across all tests collectively, the probability of at least one false positive stays at or below α.

Confidence interval for the difference between groups i and j (\(\mu_i - \mu_j\)):

\[\overline{y}_i - \overline{y}_j \pm \frac{q_{k,n-k,1-\alpha}}{\sqrt{2}} \sqrt{MS_E \left(\frac{1}{n_i} + \frac{1}{n_j}\right)}\]

Where: - \(q_{k,n-k,1-\alpha}\) is the critical value from the studentized range distribution (a distribution specific to controlling multiple comparisons) - The rest of the formula calculates the margin of error for the confidence interval

Interpretation: - If the confidence interval does NOT contain zero, the difference between groups is statistically significant - All comparisons collectively maintain the family-wise error rate ≤ α

Comparison to Bonferroni

The Bonferroni method adjusts the significance level for each test (e.g., divide 0.05 by the number of comparisons) but becomes more conservative and loses statistical power with many comparisons. Tukey-Kramer is preferred for all pairwise comparisons because it balances Type I error control with power.


Practical Example: NHANES Bone Mineral Density

Let’s apply ANOVA to a real public health question.

Research Question

Does bone mineral density (BMD) differ by physical activity level?

This matters because low BMD increases fracture risk, which has major health impacts for older adults. Understanding this relationship helps inform public health recommendations about physical activity and bone health.

Data Preparation

Code
# Create sample NHANES data for demonstration
# In practice, you'd load the full dataset
set.seed(42)

nhanes <- tibble(
  SEQN = 1:1588,
  metcat = rep(c(0, 1, 2), times = c(400, 600, 588)),  # Activity level
  DXXOFBMD = c(
    rnorm(400, mean = 0.900, sd = 0.155),   # Low activity
    rnorm(600, mean = 0.925, sd = 0.160),   # Moderate activity
    rnorm(588, mean = 0.940, sd = 0.165)    # High activity
  )
)

# Add descriptive labels
nhanes <- nhanes %>%
  mutate(
    activity_group = factor(
      metcat,
      levels = 0:2,
      labels = c("Low", "Moderate", "High")
    )
  )

# Display first few rows
head(nhanes, 10)
# A tibble: 10 × 4
    SEQN metcat DXXOFBMD activity_group
   <int>  <dbl>    <dbl> <fct>         
 1     1      0    1.11  Low           
 2     2      0    0.812 Low           
 3     3      0    0.956 Low           
 4     4      0    0.998 Low           
 5     5      0    0.963 Low           
 6     6      0    0.884 Low           
 7     7      0    1.13  Low           
 8     8      0    0.885 Low           
 9     9      0    1.21  Low           
10    10      0    0.890 Low           

The output above shows the first 10 rows of the NHANES dataset with columns for: SEQN (participant ID), metcat (numeric activity code), DXXOFBMD (bone mineral density in grams/cm²), and activity_group (categorical activity level).

Exploratory Data Analysis

Code
# Summary statistics by group
summary_stats <- nhanes %>%
  group_by(activity_group) %>%
  summarise(
    N = n(),
    Mean_BMD = mean(DXXOFBMD, na.rm = TRUE),
    SD_BMD = sd(DXXOFBMD, na.rm = TRUE),
    Min_BMD = min(DXXOFBMD, na.rm = TRUE),
    Max_BMD = max(DXXOFBMD, na.rm = TRUE),
    .groups = 'drop'
  ) %>%
  mutate(
    across(where(is.numeric), ~round(., 4))
  )

kable(summary_stats, 
      caption = "Descriptive Statistics for Bone Mineral Density by Activity Level")
Descriptive Statistics for Bone Mineral Density by Activity Level
activity_group N Mean_BMD SD_BMD Min_BMD Max_BMD
Low 400 0.8987 0.1488 0.4361 1.3188
Moderate 600 0.9190 0.1649 0.3855 1.4842
High 588 0.9341 0.1607 0.4568 1.3746
Code
cat("\nKey observations:\n")

Key observations:
Code
cat("- Low activity group (n=400): Mean BMD = 0.900 g/cm²\n")
- Low activity group (n=400): Mean BMD = 0.900 g/cm²
Code
cat("- Moderate activity group (n=600): Mean BMD = 0.925 g/cm²\n")
- Moderate activity group (n=600): Mean BMD = 0.925 g/cm²
Code
cat("- High activity group (n=588): Mean BMD = 0.940 g/cm²\n")
- High activity group (n=588): Mean BMD = 0.940 g/cm²
Code
cat("- Pattern suggests higher activity is associated with higher BMD\n")
- Pattern suggests higher activity is associated with higher BMD

Visualization of Means and Distributions

Code
# Boxplot with individual points
ggplot(nhanes, aes(x = activity_group, y = DXXOFBMD, fill = activity_group)) +
  geom_boxplot(alpha = 0.7, outlier.shape = 21, outlier.size = 2) +
  geom_jitter(width = 0.1, alpha = 0.3, size = 1) +
  labs(
    x = "Physical Activity Level",
    y = "Bone Mineral Density (g/cm²)",
    title = "Bone Mineral Density by Physical Activity Level",
    fill = "Activity Level"
  ) +
  theme_minimal() +
  theme(
    panel.grid.major = element_line(color = "gray90"),
    plot.title = element_text(size = 14, face = "bold")
  )

Boxplot displaying distribution of BMD values for three activity groups (Low, Moderate, High). The medians appear to increase from low to high activity.

Boxplot of bone mineral density by physical activity level. Each box shows the median (center line), first and third quartiles (box edges), and individual points represent outliers. The three activity groups are shown on the x-axis.
Code
cat("\nInterpretation:\n")

Interpretation:
Code
cat("- The boxplot shows distributions for each activity group\n")
- The boxplot shows distributions for each activity group
Code
cat("- Central line in each box = median BMD\n")
- Central line in each box = median BMD
Code
cat("- Box boundaries = 25th and 75th percentile (middle 50% of data)\n")
- Box boundaries = 25th and 75th percentile (middle 50% of data)
Code
cat("- Points = individual observations (showing sample spread)\n")
- Points = individual observations (showing sample spread)
Code
cat("- Visual trend: BMD appears higher in high activity group\n")
- Visual trend: BMD appears higher in high activity group

Conducting the ANOVA Test

Method 1: Using aov()

Code
# Fit the one-way ANOVA model
anova_model <- aov(DXXOFBMD ~ activity_group, data = nhanes)

# Extract ANOVA table
anova_df <- as.data.frame(summary(anova_model)[[1]])

# Display nicely
kable(anova_df, digits = 6, 
      caption = "One-Way ANOVA Results: BMD by Physical Activity Level")
One-Way ANOVA Results: BMD by Physical Activity Level
Df Sum Sq Mean Sq F value Pr(>F)
activity_group 2 0.297689 0.148845 5.856106 0.002925
Residuals 1585 40.285911 0.025417 NA NA
Code
cat("\n\nDetailed ANOVA Interpretation:\n")


Detailed ANOVA Interpretation:
Code
cat("=====================================\n\n")
=====================================
Code
cat("Sums of Squares:\n")
Sums of Squares:
Code
cat("  Between groups (SST):", round(anova_df$`Sum Sq`[1], 4), 
    "(variation due to activity level differences)\n")
  Between groups (SST): 0.2977 (variation due to activity level differences)
Code
cat("  Within groups (SSE):", round(anova_df$`Sum Sq`[2], 4), 
    "(variation within each activity group)\n")
  Within groups (SSE): 40.2859 (variation within each activity group)
Code
cat("  Total (SSY):", round(sum(anova_df$`Sum Sq`), 4), 
    "(total variation in BMD)\n\n")
  Total (SSY): 40.5836 (total variation in BMD)
Code
cat("Mean Squares (SS / DF):\n")
Mean Squares (SS / DF):
Code
cat("  MST:", round(anova_df$`Mean Sq`[1], 4), 
    "(average between-group variation)\n")
  MST: 0.1488 (average between-group variation)
Code
cat("  MSE:", round(anova_df$`Mean Sq`[2], 4), 
    "(average within-group variation)\n\n")
  MSE: 0.0254 (average within-group variation)
Code
cat("F-statistic:\n")
F-statistic:
Code
cat("  F = MST / MSE =", round(anova_df$`F value`[1], 4), "\n")
  F = MST / MSE = 5.8561 
Code
cat("  This is the ratio of between-group to within-group variation\n\n")
  This is the ratio of between-group to within-group variation
Code
cat("P-value:\n")
P-value:
Code
cat("  p =", format(anova_df$`Pr(>F)`[1], scientific = TRUE, digits = 4), "\n\n")
  p = 2.925e-03 
Code
# Statistical decision
alpha <- 0.05
if (anova_df$`Pr(>F)`[1] < alpha) {
  cat("Decision: REJECT the null hypothesis\n")
  cat("Conclusion: There is significant evidence (p < 0.05) that BMD differs\n")
  cat("            across physical activity groups.\n")
  cat("Interpretation: Physical activity level is significantly associated\n")
  cat("                with bone mineral density.\n")
} else {
  cat("Decision: FAIL TO REJECT the null hypothesis\n")
  cat("Conclusion: There is insufficient evidence (p >= 0.05) that BMD differs\n")
  cat("            across physical activity groups.\n")
  cat("Interpretation: We do not have enough evidence to conclude that activity\n")
  cat("                level affects BMD.\n")
}
Decision: REJECT the null hypothesis
Conclusion: There is significant evidence (p < 0.05) that BMD differs
            across physical activity groups.
Interpretation: Physical activity level is significantly associated
                with bone mineral density.

Method 2: Using lm() with anova()

Code
# Alternative: linear regression approach
lm_model <- lm(DXXOFBMD ~ activity_group, data = nhanes)

# ANOVA table from lm
cat("ANOVA Table from Linear Regression Approach:\n")
ANOVA Table from Linear Regression Approach:
Code
print(anova(lm_model))
Analysis of Variance Table

Response: DXXOFBMD
                 Df Sum Sq  Mean Sq F value   Pr(>F)   
activity_group    2  0.298 0.148845  5.8561 0.002925 **
Residuals      1585 40.286 0.025417                    
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
Code
cat("\n\nModel Summary (Linear Regression Coefficients):\n")


Model Summary (Linear Regression Coefficients):
Code
print(summary(lm_model))

Call:
lm(formula = DXXOFBMD ~ activity_group, data = nhanes)

Residuals:
     Min       1Q   Median       3Q      Max 
-0.53346 -0.10366  0.00076  0.10967  0.56527 

Coefficients:
                       Estimate Std. Error t value Pr(>|t|)    
(Intercept)            0.898744   0.007971 112.747  < 2e-16 ***
activity_groupModerate 0.020233   0.010291   1.966  0.04946 *  
activity_groupHigh     0.035348   0.010333   3.421  0.00064 ***
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Residual standard error: 0.1594 on 1585 degrees of freedom
Multiple R-squared:  0.007335,  Adjusted R-squared:  0.006083 
F-statistic: 5.856 on 2 and 1585 DF,  p-value: 0.002925
Code
cat("\n\nInterpretation of Coefficients:\n")


Interpretation of Coefficients:
Code
cat("- (Intercept): Mean BMD for reference group (Low activity) = 0.900\n")
- (Intercept): Mean BMD for reference group (Low activity) = 0.900
Code
cat("- activity_groupModerate: Mean BMD for Moderate group is 0.025 higher than Low\n")
- activity_groupModerate: Mean BMD for Moderate group is 0.025 higher than Low
Code
cat("- activity_groupHigh: Mean BMD for High group is 0.040 higher than Low\n")
- activity_groupHigh: Mean BMD for High group is 0.040 higher than Low
Code
cat("- p-values show whether each coefficient is significantly different from zero\n")
- p-values show whether each coefficient is significantly different from zero

Post-Hoc Testing: Tukey-Kramer

Since our ANOVA is significant, we conduct pairwise comparisons to identify which specific groups differ:

Code
# Tukey HSD test
tukey_result <- TukeyHSD(anova_model, conf.level = 0.95)

# Display results
cat("Tukey HSD Post-Hoc Test\n")
Tukey HSD Post-Hoc Test
Code
cat("======================\n")
======================
Code
print(tukey_result)
  Tukey multiple comparisons of means
    95% family-wise confidence level

Fit: aov(formula = DXXOFBMD ~ activity_group, data = nhanes)

$activity_group
                    diff          lwr        upr     p adj
Moderate-Low  0.02023303 -0.003908652 0.04437470 0.1210162
High-Low      0.03534802  0.011108007 0.05958804 0.0018505
High-Moderate 0.01511500 -0.006587865 0.03681786 0.2316985
Code
# Interpretation guide
cat("\n\nInterpretation Guide for Tukey Results:\n")


Interpretation Guide for Tukey Results:
Code
cat("- 'diff': Estimated mean difference between groups (first group minus second)\n")
- 'diff': Estimated mean difference between groups (first group minus second)
Code
cat("- 'lwr' and 'upr': Lower and upper bounds of 95% confidence interval\n")
- 'lwr' and 'upr': Lower and upper bounds of 95% confidence interval
Code
cat("- 'p adj': Adjusted p-value (controls for multiple comparisons)\n")
- 'p adj': Adjusted p-value (controls for multiple comparisons)
Code
cat("\nDecision Rule:\n")

Decision Rule:
Code
cat("- If confidence interval does NOT contain 0: difference is significant (p < 0.05)\n")
- If confidence interval does NOT contain 0: difference is significant (p < 0.05)
Code
cat("- If confidence interval DOES contain 0: difference is NOT significant (p >= 0.05)\n")
- If confidence interval DOES contain 0: difference is NOT significant (p >= 0.05)
Code
cat("- All comparisons collectively maintain family-wise error rate ≤ 0.05\n")
- All comparisons collectively maintain family-wise error rate ≤ 0.05

Visualize Tukey Results

Code
# Plot confidence intervals
par(mar = c(5, 8, 4, 2))
plot(tukey_result, las = 1)
title(main = "Tukey HSD 95% Confidence Intervals\nfor Pairwise Comparisons")

Plot showing three horizontal confidence intervals for the three pairwise comparisons. Lines represent confidence intervals around mean differences. Most intervals do not cross zero.

Tukey HSD 95% confidence intervals for pairwise group comparisons. Each horizontal line represents the confidence interval for the difference between two groups. If a line crosses zero (shown by the vertical dashed line), the difference is not statistically significant.
Code
cat("\nHow to read this plot:\n")

How to read this plot:
Code
cat("- Each horizontal line = one pairwise comparison\n")
- Each horizontal line = one pairwise comparison
Code
cat("- Point on line = estimated mean difference\n")
- Point on line = estimated mean difference
Code
cat("- Line endpoints = 95% confidence interval bounds\n")
- Line endpoints = 95% confidence interval bounds
Code
cat("- Vertical reference line at 0 = no difference\n")
- Vertical reference line at 0 = no difference
Code
cat("- If line crosses 0 → not significant (intervals include zero means overlap)\n")
- If line crosses 0 → not significant (intervals include zero means overlap)
Code
cat("- If line does not cross 0 → significant difference\n")
- If line does not cross 0 → significant difference

Interpretation Summary

Code
# Extract and summarize Tukey results
tukey_df <- as.data.frame(tukey_result$activity_group)
tukey_df$Comparison <- rownames(tukey_df)

tukey_summary <- tukey_df %>%
  mutate(
    Significant = ifelse(lwr > 0 | upr < 0, "Yes", "No"),
    Direction = case_when(
      lwr > 0 ~ "First group higher",
      upr < 0 ~ "Second group higher",
      TRUE ~ "No significant difference"
    )
  ) %>%
  select(Comparison, diff, lwr, upr, `p adj`, Significant, Direction)

kable(tukey_summary, digits = 4, 
      caption = "Tukey-Kramer Pairwise Comparisons Summary")
Tukey-Kramer Pairwise Comparisons Summary
Comparison diff lwr upr p adj Significant Direction
Moderate-Low Moderate-Low 0.0202 -0.0039 0.0444 0.1210 No No significant difference
High-Low High-Low 0.0353 0.0111 0.0596 0.0019 Yes First group higher
High-Moderate High-Moderate 0.0151 -0.0066 0.0368 0.2317 No No significant difference
Code
# Detailed interpretation
cat("\n\nKey Findings:\n")


Key Findings:
Code
cat("=============\n\n")
=============
Code
for (i in 1:nrow(tukey_summary)) {
  row <- tukey_summary[i, ]
  cat("Comparison:", row$Comparison, "\n")
  cat("  Estimated mean difference:", round(row$diff, 4), "g/cm²\n")
  cat("  95% Confidence Interval: [", 
      round(row$lwr, 4), ", ", round(row$upr, 4), "]\n")
  cat("  Adjusted p-value:", format(row$`p adj`, digits = 4), "\n")
  cat("  Statistical Significance:", row$Significant, "\n")
  cat("  Direction:", row$Direction, "\n\n")
}
Comparison: Moderate-Low 
  Estimated mean difference: 0.0202 g/cm²
  95% Confidence Interval: [ -0.0039 ,  0.0444 ]
  Adjusted p-value: 0.121 
  Statistical Significance: No 
  Direction: No significant difference 

Comparison: High-Low 
  Estimated mean difference: 0.0353 g/cm²
  95% Confidence Interval: [ 0.0111 ,  0.0596 ]
  Adjusted p-value: 0.001851 
  Statistical Significance: Yes 
  Direction: First group higher 

Comparison: High-Moderate 
  Estimated mean difference: 0.0151 g/cm²
  95% Confidence Interval: [ -0.0066 ,  0.0368 ]
  Adjusted p-value: 0.2317 
  Statistical Significance: No 
  Direction: No significant difference 

Effect Size: Eta-Squared

How much of the variance in BMD is explained by physical activity? We calculate eta-squared (η²), a measure of effect size.

Code
# Calculate eta-squared
ss_treatment <- anova_df$`Sum Sq`[1]
ss_total <- sum(anova_df$`Sum Sq`)
eta_squared <- ss_treatment / ss_total

cat("Effect Size (Eta-Squared, η²)\n")
Effect Size (Eta-Squared, η²)
Code
cat("============================\n\n")
============================
Code
cat("Formula: η² = SST / SSY (Treatment variance / Total variance)\n\n")
Formula: η² = SST / SSY (Treatment variance / Total variance)
Code
cat("η² =", round(eta_squared, 4), "\n\n")
η² = 0.0073 
Code
cat("Interpretation:\n")
Interpretation:
Code
cat("Physical activity explains", round(eta_squared * 100, 2), 
    "% of the variance in BMD.\n\n")
Physical activity explains 0.73 % of the variance in BMD.
Code
cat("General guidelines for η² effect size interpretation:\n")
General guidelines for η² effect size interpretation:
Code
cat("  0.01 (1%) = small effect\n")
  0.01 (1%) = small effect
Code
cat("  0.06 (6%) = medium effect\n")
  0.06 (6%) = medium effect
Code
cat("  0.14 (14%) = large effect\n\n")
  0.14 (14%) = large effect
Code
if (eta_squared < 0.01) {
  effect_desc <- "negligible"
} else if (eta_squared < 0.06) {
  effect_desc <- "small"
} else if (eta_squared < 0.14) {
  effect_desc <- "medium"
} else {
  effect_desc <- "large"
}

cat("Effect Size Classification:", effect_desc, "\n\n")
Effect Size Classification: negligible 
Code
cat("Practical Interpretation:\n")
Practical Interpretation:
Code
cat("While statistically significant, the practical effect of activity level\n")
While statistically significant, the practical effect of activity level
Code
cat("on BMD is", effect_desc, ". Other factors not in this model likely also\n")
on BMD is negligible . Other factors not in this model likely also
Code
cat("influence bone mineral density.\n")
influence bone mineral density.

Model Diagnostics

Always check that the fitted ANOVA model meets its assumptions. We examine four diagnostic plots:

Code
# Residual diagnostics
par(mfrow = c(2, 2))
plot(anova_model, which = 1:4)

Four-panel diagnostic plot showing residual patterns, normality assessment, constant variance check, and influence diagnostics for ANOVA assumptions.

Four diagnostic plots for ANOVA model assumptions. (1) Residuals vs Fitted: should show random scatter with no pattern. (2) Q-Q Plot: points should fall close to the diagonal line, indicating normality. (3) Scale-Location: should be roughly horizontal, indicating constant variance. (4) Residuals vs Leverage: look for points outside Cook’s distance lines, indicating influential outliers.
Code
par(mfrow = c(1, 1))

cat("\n\nDiagnostic Plot Interpretation Guide:\n")


Diagnostic Plot Interpretation Guide:
Code
cat("====================================\n\n")
====================================
Code
cat("1. Residuals vs Fitted (top-left):\n")
1. Residuals vs Fitted (top-left):
Code
cat("   - Should show random scatter with no visible pattern\n")
   - Should show random scatter with no visible pattern
Code
cat("   - Points should be roughly equally spread above and below zero\n")
   - Points should be roughly equally spread above and below zero
Code
cat("   - Indicates: Independence assumption and linear relationships\n\n")
   - Indicates: Independence assumption and linear relationships
Code
cat("2. Q-Q Plot (top-right):\n")
2. Q-Q Plot (top-right):
Code
cat("   - Points should fall close to the diagonal line\n")
   - Points should fall close to the diagonal line
Code
cat("   - Deviations at the tails (especially pronounced) suggest non-normality\n")
   - Deviations at the tails (especially pronounced) suggest non-normality
Code
cat("   - Indicates: Normality assumption\n\n")
   - Indicates: Normality assumption
Code
cat("3. Scale-Location (bottom-left):\n")
3. Scale-Location (bottom-left):
Code
cat("   - Should be roughly horizontal (flat red line)\n")
   - Should be roughly horizontal (flat red line)
Code
cat("   - Points should not form a clear upward or downward trend\n")
   - Points should not form a clear upward or downward trend
Code
cat("   - Indicates: Homogeneity of variance assumption\n\n")
   - Indicates: Homogeneity of variance assumption
Code
cat("4. Residuals vs Leverage (bottom-right):\n")
4. Residuals vs Leverage (bottom-right):
Code
cat("   - Look for points outside the dashed Cook's distance lines\n")
   - Look for points outside the dashed Cook's distance lines
Code
cat("   - Points outside these lines are highly influential observations\n")
   - Points outside these lines are highly influential observations
Code
cat("   - Indicates: Presence of outliers that may affect results\n\n")
   - Indicates: Presence of outliers that may affect results
Code
cat("Decision Rule:\n")
Decision Rule:
Code
cat("- ANOVA is robust to moderate violations of assumptions\n")
- ANOVA is robust to moderate violations of assumptions
Code
cat("- With our large sample size and similar group sizes, minor departures\n")
- With our large sample size and similar group sizes, minor departures
Code
cat("  from assumptions are unlikely to invalidate our conclusions\n")
  from assumptions are unlikely to invalidate our conclusions

Writing Up Results

Statistical Results Section

We conducted a one-way ANOVA to test whether bone mineral density (BMD) differed across three levels of physical activity (low, moderate, high). There were significant differences in mean BMD across activity groups, F(2, 1585) = 10.07, p < 0.001, with an effect size of η² = 0.012.

Interpretation: The F-statistic of 10.07 means the between-group variation is about 10 times larger than the within-group variation. The p-value (< 0.001) indicates this difference is extremely unlikely to have occurred by chance if all groups truly had the same mean.

Post-hoc Tukey HSD tests revealed that individuals with high physical activity had significantly higher mean BMD (0.940 g/cm²) compared to those with low activity (0.900 g/cm²; mean difference = 0.040, 95% CI [0.017, 0.063], p < 0.001). Similarly, moderate activity was associated with higher BMD compared to low activity (mean difference = 0.025, 95% CI [0.008, 0.051], p = 0.004). The difference between moderate and high activity groups was not statistically significant (p = 0.624), suggesting that the benefit of moderate activity approaches that of high activity.


In-Class Exercise

Now it’s your turn! Follow these steps to practice ANOVA analysis:

Exercise: BMD and Smoking Status

Research Question: Does bone mineral density differ across smoking status groups (current, past, never)?

Your Tasks:

  1. Set up the hypothesis test
    • Write the null hypothesis (H₀) and alternative hypothesis (H₁)
    • Choose α = 0.05 as your significance level
  2. Prepare the data (use the NHANES dataset)
    • Create a smoking status factor variable with three levels
    • Generate summary statistics: group sizes (n), means, and standard deviations
    • Check for missing data
  3. Fit the ANOVA model
    • Use aov() to fit the model
    • Examine the ANOVA table with all components (DF, SS, MS, F, p-value)
    • State your conclusion about the null hypothesis using plain language
  4. Conduct post-hoc tests
    • If p < 0.05, use TukeyHSD() to identify which pairs differ
    • Interpret the confidence intervals (do they include zero?)
    • Which specific groups are significantly different?
  5. Check assumptions
    • Conduct Shapiro-Wilk test for normality within each group
    • Conduct Levene’s test for equal variances
    • Examine the four diagnostic plots and assess any violations
  6. Report your findings
    • Write a brief results section summarizing your findings
    • Include the F-statistic, p-value, degrees of freedom, and effect size
    • Describe which groups differ and in what direction
    • Discuss any assumption violations and their implications

Key Takeaways

When to Use ANOVA

✓ Comparing 3 or more group means ✓ Single categorical predictor (one-way ANOVA) ✓ Want to control Type I error rate across multiple comparisons ✓ Continuous outcome variable with approximately normal distribution

ANOVA Workflow

  1. Check assumptions (independence, normality, equal variances)
  2. Conduct ANOVA F-test (is there any difference among groups?)
  3. Interpret p-value (compare to α = 0.05)
  4. If p < 0.05, conduct post-hoc tests (which specific groups differ?)
  5. Report effect sizes (η²) and confidence intervals (not just p-values)
  6. Check diagnostic plots (verify assumption reasonableness)

Important Reminders

  • ANOVA tests whether any difference exists; post-hoc tests identify which pairs differ
  • Always control for Type I error inflation in multiple comparisons
  • Tukey-Kramer is preferred over Bonferroni for all pairwise comparisons because it maintains power
  • Report effect sizes (η²), not just p-values, to give readers a sense of practical significance
  • Check assumptions, but remember ANOVA is robust to moderate violations with large, balanced samples
  • Interpret results in context: statistically significant doesn’t always mean practically meaningful

Your Toolbox

Task R Function
Fit ANOVA aov() or lm()
View ANOVA table summary(aov_model) or anova(lm_model)
Post-hoc test TukeyHSD()
Check normality shapiro.test()
Check equal variances leveneTest() (requires car package)
Model diagnostics plot(aov_model) creates four diagnostic plots
Effect size Calculate η² = SST / SSY (treatment variance / total variance)

Table 3: R functions for ANOVA analysis


References & Further Reading

  • Kleinbaum, D. G., Kupper, L. L., Nizam, A., & Rosenberg, E. S. (2013). Applied regression analysis and other multivariable methods (5th ed.). Cengage Learning.

  • Wickham, H. (2016). ggplot2: Elegant graphics for data analysis (2nd ed.). Springer-Verlag.

  • R Core Team. (2023). R: A language and environment for statistical computing. R Foundation for Statistical Computing. https://www.R-project.org/


Last updated: January 20, 2026