Analysis of Variance (ANOVA)

EPI 553: Principles of Statistical Inference II

Author

Muntasir Masum, PhD

Published

January 20, 2026

Learning Objectives

By the end of this lecture, you will be able to:

Understand the purpose and application of ANOVA
Decompose total variance into treatment and error components
Interpret ANOVA tables and F-test results
Conduct post-hoc comparisons using Tukey-Kramer method
Apply ANOVA to real public health data using R
Translate research questions into testable hypotheses for multiple group comparisons

Why ANOVA Matters: Real-World Applications

ANOVA is one of the most powerful tools in your biostatistics toolkit. Here’s why you’ll use it throughout your public health career:

Example 1: Medication Effectiveness - Question: Does blood pressure reduction differ across three different medications? - ANOVA Solution: Compare mean BP reduction across treatment groups

Example 2: Environmental Health - Question: How does respiratory hospital admission rate vary by air pollution level (low/moderate/high)? - ANOVA Solution: Test for differences in mean admission rates across pollution categories

Example 3: Population Health - Question: Does COVID-19 vaccination rate differ across multiple counties? - ANOVA Solution: Compare vaccination rates across geographic units

Example 4: Occupational Health - Question: Do lung function outcomes differ across different occupational exposure groups? - ANOVA Solution: Test for differences in forced expiratory volume across exposure categories

When/Why to Use ANOVA

The Problem ANOVA Solves

Suppose you want to compare three or more group means. Your first instinct might be: “Why not just do multiple t-tests?”

The Multiple Comparisons Problem

If you conduct independent t-tests for each pair of groups:

With k groups, you need \(\binom{k}{2}\) pairwise tests (pronounced “k choose 2”)
Each test has a Type I error rate of α = 0.05
The experiment-wise error rate inflates dramatically

Example with 3 groups (3 comparisons):

The experiment-wise error rate equals 1 minus (1 minus 0.05) raised to the power of 3, which equals 1 minus 0.857, which equals 0.143.

\[\text{Experiment-wise error rate} = 1 - (1-0.05)^3 = 1 - 0.857 = 0.143\]

Your true probability of making at least one false positive is 14.3%, not 5%!

ANOVA’s Solution

Conducts a single omnibus test of all means simultaneously
Controls the experiment-wise Type I error rate at α
If significant, use post-hoc tests (like Tukey-Kramer) to identify which pairs differ

ANOVA Fundamentals

Hypotheses

Null hypothesis (H₀): All group means are equal (μ₁ = μ₂ = … = μₖ)

\[H_0: \mu_1 = \mu_2 = \cdots = \mu_k\]

Alternative hypothesis (H₁): At least one group mean is different from the others

\[H_1: \text{At least one mean is different}\]

Key Terminology

Term	Definition
Factor	The categorical independent variable (e.g., treatment type)
Levels	The number of groups in the factor (e.g., 3 medications = 3 levels)
One-way ANOVA	One factor with multiple levels (our focus)
Two-way ANOVA	Two factors (beyond this course scope)

Table 1: Key terminology used in ANOVA analysis

The Math: Decomposing Variance

ANOVA works by partitioning total variation into two components:

Total Sum of Squares (SSY)

The total deviation of all observations from the grand mean. This represents the overall variability in the outcome across all groups.

\[SS_Y = \sum_{i=1}^{k} \sum_{j=1}^{n_i} (y_{ij} - \overline{y})^2\]

Where: - \(y_{ij}\) = observation j in group i - \(\overline{y}\) = grand mean (the average across all observations and groups)

Key Identity: Decomposition

Total variation equals the sum of between-group and within-group variation:

\[SS_Y = SS_T + SS_E\]

Treatment Sum of Squares (SST) - Variation between groups: \[SS_T = \sum_{i=1}^{k} n_i(\overline{y}_{i} - \overline{y})^2\]

This represents how much each group’s mean differs from the overall grand mean, weighted by group size.

Error Sum of Squares (SSE) - Variation within groups: \[SS_E = \sum_{i=1}^{k} \sum_{j=1}^{n_i} (y_{ij} - \overline{y}_{i})^2\]

This represents the variability of individual observations around their own group mean. This is the “leftover” or unexplained variation.

Mean Squares

Divide by degrees of freedom to get mean squares, which account for the number of parameters estimated:

\[MS_T = \frac{SS_T}{k-1}\]

\[MS_E = \frac{SS_E}{n-k}\]

Note: Degrees of freedom for treatment equals the number of groups minus 1 (k-1). Degrees of freedom for error equals the total sample size minus the number of groups (n-k).

The F-Statistic

The test statistic for ANOVA is the F-ratio:

\[F = \frac{MS_T}{MS_E}\]

Interpretation

Numerator (MST): Average variation between groups. This captures the treatment effect or group differences.
Denominator (MSE): Average variation within groups. This captures random error and noise.
Large F: Groups differ more than we’d expect by chance alone (suggests a real treatment effect)
Small F: Differences between groups are small relative to within-group variation (suggests no treatment effect)

Decision Rule

Reject the null hypothesis if F exceeds the critical value \(F_{k-1, n-k, 1-\alpha}\) at significance level α (typically 0.05).

In practice, we compare the p-value to α: reject \(H_0\) if p-value < α.

ANOVA Assumptions

ANOVA requires three key assumptions. Meeting these assumptions ensures the validity of your p-values and inference.

1. Independence

Observations are independent. Each observation should not influence another. This assumption is violated in repeated measures designs, clustered data, or time series data.

2. Normality

Within each group, the outcome is approximately normally distributed. The distribution should resemble a bell curve.

3. Homogeneity of Variance

All groups have equal population variances. Mathematically, this means: \(\sigma_1^2 = \sigma_2^2 = \cdots = \sigma_k^2\)

In other words, the spread of values around each group’s mean should be roughly the same across all groups.

Robustness

Good news: ANOVA is robust to moderate violations of these assumptions, particularly: - With larger sample sizes (typically n > 30 per group) - With equal or similar group sizes - With only slight departures from normality

We’ll check these assumptions in our example to demonstrate best practices.

The ANOVA Table

The standard way to present ANOVA results is in a formatted table:

Source	DF	Sum of Squares	Mean Square	F-value	Pr(>F)
Model (Treatment)	k-1	SST	MST	MST/MSE	p-value
Error (Residual)	n-k	SSE	MSE	—	—
Total	n-1	SSY	—	—	—

Table 2: Standard ANOVA table structure. Columns represent: Source (the source of variation), DF (degrees of freedom), Sum of Squares (total squared deviations), Mean Square (sum of squares divided by DF), F-value (test statistic), and Pr(>F) (p-value).

Post-Hoc Testing: Tukey-Kramer Method

If you reject the null hypothesis, the next question is: Which groups differ from each other?

Why Not Just Do Pairwise t-Tests?

Remember the multiple comparisons problem from earlier—we’d inflate the Type I error rate again!

Tukey-Kramer Approach

The Tukey-Kramer method controls the family-wise error rate while testing all pairwise comparisons. This means that across all tests collectively, the probability of at least one false positive stays at or below α.

Confidence interval for the difference between groups i and j (\(\mu_i - \mu_j\)):

\[\overline{y}_i - \overline{y}_j \pm \frac{q_{k,n-k,1-\alpha}}{\sqrt{2}} \sqrt{MS_E \left(\frac{1}{n_i} + \frac{1}{n_j}\right)}\]

Where: - \(q_{k,n-k,1-\alpha}\) is the critical value from the studentized range distribution (a distribution specific to controlling multiple comparisons) - The rest of the formula calculates the margin of error for the confidence interval

Interpretation: - If the confidence interval does NOT contain zero, the difference between groups is statistically significant - All comparisons collectively maintain the family-wise error rate ≤ α

Comparison to Bonferroni

The Bonferroni method adjusts the significance level for each test (e.g., divide 0.05 by the number of comparisons) but becomes more conservative and loses statistical power with many comparisons. Tukey-Kramer is preferred for all pairwise comparisons because it balances Type I error control with power.

Practical Example: NHANES Bone Mineral Density

Let’s apply ANOVA to a real public health question.

Research Question

Does bone mineral density (BMD) differ by physical activity level?

This matters because low BMD increases fracture risk, which has major health impacts for older adults. Understanding this relationship helps inform public health recommendations about physical activity and bone health.

Data Preparation

Code

# Create sample NHANES data for demonstration
# In practice, you'd load the full dataset
set.seed(42)

nhanes <- tibble(
  SEQN = 1:1588,
  metcat = rep(c(0, 1, 2), times = c(400, 600, 588)),  # Activity level
  DXXOFBMD = c(
    rnorm(400, mean = 0.900, sd = 0.155),   # Low activity
    rnorm(600, mean = 0.925, sd = 0.160),   # Moderate activity
    rnorm(588, mean = 0.940, sd = 0.165)    # High activity
  )
)

# Add descriptive labels
nhanes <- nhanes %>%
  mutate(
    activity_group = factor(
      metcat,
      levels = 0:2,
      labels = c("Low", "Moderate", "High")
    )
  )

# Display first few rows
head(nhanes, 10)

# A tibble: 10 × 4
    SEQN metcat DXXOFBMD activity_group
   <int>  <dbl>    <dbl> <fct>         
 1     1      0    1.11  Low           
 2     2      0    0.812 Low           
 3     3      0    0.956 Low           
 4     4      0    0.998 Low           
 5     5      0    0.963 Low           
 6     6      0    0.884 Low           
 7     7      0    1.13  Low           
 8     8      0    0.885 Low           
 9     9      0    1.21  Low           
10    10      0    0.890 Low

The output above shows the first 10 rows of the NHANES dataset with columns for: SEQN (participant ID), metcat (numeric activity code), DXXOFBMD (bone mineral density in grams/cm²), and activity_group (categorical activity level).

Exploratory Data Analysis

Code

# Summary statistics by group
summary_stats <- nhanes %>%
  group_by(activity_group) %>%
  summarise(
    N = n(),
    Mean_BMD = mean(DXXOFBMD, na.rm = TRUE),
    SD_BMD = sd(DXXOFBMD, na.rm = TRUE),
    Min_BMD = min(DXXOFBMD, na.rm = TRUE),
    Max_BMD = max(DXXOFBMD, na.rm = TRUE),
    .groups = 'drop'
  ) %>%
  mutate(
    across(where(is.numeric), ~round(., 4))
  )

kable(summary_stats, 
      caption = "Descriptive Statistics for Bone Mineral Density by Activity Level")

Descriptive Statistics for Bone Mineral Density by Activity Level
activity_group	N	Mean_BMD	SD_BMD	Min_BMD	Max_BMD
Low	400	0.8987	0.1488	0.4361	1.3188
Moderate	600	0.9190	0.1649	0.3855	1.4842
High	588	0.9341	0.1607	0.4568	1.3746

Code

cat("\nKey observations:\n")


Key observations:

Code

cat("- Low activity group (n=400): Mean BMD = 0.900 g/cm²\n")

- Low activity group (n=400): Mean BMD = 0.900 g/cm²

Code

cat("- Moderate activity group (n=600): Mean BMD = 0.925 g/cm²\n")

- Moderate activity group (n=600): Mean BMD = 0.925 g/cm²

Code

cat("- High activity group (n=588): Mean BMD = 0.940 g/cm²\n")

- High activity group (n=588): Mean BMD = 0.940 g/cm²

Code

cat("- Pattern suggests higher activity is associated with higher BMD\n")

- Pattern suggests higher activity is associated with higher BMD

Visualization of Means and Distributions

Code

# Boxplot with individual points
ggplot(nhanes, aes(x = activity_group, y = DXXOFBMD, fill = activity_group)) +
  geom_boxplot(alpha = 0.7, outlier.shape = 21, outlier.size = 2) +
  geom_jitter(width = 0.1, alpha = 0.3, size = 1) +
  labs(
    x = "Physical Activity Level",
    y = "Bone Mineral Density (g/cm²)",
    title = "Bone Mineral Density by Physical Activity Level",
    fill = "Activity Level"
  ) +
  theme_minimal() +
  theme(
    panel.grid.major = element_line(color = "gray90"),
    plot.title = element_text(size = 14, face = "bold")
  )

Boxplot displaying distribution of BMD values for three activity groups (Low, Moderate, High). The medians appear to increase from low to high activity. — Boxplot of bone mineral density by physical activity level. Each box shows the median (center line), first and third quartiles (box edges), and individual points represent outliers. The three activity groups are shown on the x-axis.

Code

cat("\nInterpretation:\n")


Interpretation:

Code

cat("- The boxplot shows distributions for each activity group\n")

- The boxplot shows distributions for each activity group

Code

cat("- Central line in each box = median BMD\n")

- Central line in each box = median BMD

Code

cat("- Box boundaries = 25th and 75th percentile (middle 50% of data)\n")

- Box boundaries = 25th and 75th percentile (middle 50% of data)

Code

cat("- Points = individual observations (showing sample spread)\n")

- Points = individual observations (showing sample spread)

Code

cat("- Visual trend: BMD appears higher in high activity group\n")

- Visual trend: BMD appears higher in high activity group

Conducting the ANOVA Test

Method 1: Using `aov()`

Code

# Fit the one-way ANOVA model
anova_model <- aov(DXXOFBMD ~ activity_group, data = nhanes)

# Extract ANOVA table
anova_df <- as.data.frame(summary(anova_model)[[1]])

# Display nicely
kable(anova_df, digits = 6, 
      caption = "One-Way ANOVA Results: BMD by Physical Activity Level")

One-Way ANOVA Results: BMD by Physical Activity Level
	Df	Sum Sq	Mean Sq	F value	Pr(>F)
activity_group	2	0.297689	0.148845	5.856106	0.002925
Residuals	1585	40.285911	0.025417	NA	NA

Code

cat("\n\nDetailed ANOVA Interpretation:\n")



Detailed ANOVA Interpretation:

Code

cat("=====================================\n\n")

=====================================

Code

cat("Sums of Squares:\n")

Sums of Squares:

Code

cat("  Between groups (SST):", round(anova_df$`Sum Sq`[1], 4), 
    "(variation due to activity level differences)\n")

  Between groups (SST): 0.2977 (variation due to activity level differences)

Code

cat("  Within groups (SSE):", round(anova_df$`Sum Sq`[2], 4), 
    "(variation within each activity group)\n")

  Within groups (SSE): 40.2859 (variation within each activity group)

Code

cat("  Total (SSY):", round(sum(anova_df$`Sum Sq`), 4), 
    "(total variation in BMD)\n\n")

  Total (SSY): 40.5836 (total variation in BMD)

Code

cat("Mean Squares (SS / DF):\n")

Mean Squares (SS / DF):

Code

cat("  MST:", round(anova_df$`Mean Sq`[1], 4), 
    "(average between-group variation)\n")

  MST: 0.1488 (average between-group variation)

Code

cat("  MSE:", round(anova_df$`Mean Sq`[2], 4), 
    "(average within-group variation)\n\n")

  MSE: 0.0254 (average within-group variation)

Code

cat("F-statistic:\n")

F-statistic:

Code

cat("  F = MST / MSE =", round(anova_df$`F value`[1], 4), "\n")

  F = MST / MSE = 5.8561

Code

cat("  This is the ratio of between-group to within-group variation\n\n")

  This is the ratio of between-group to within-group variation

Code

cat("P-value:\n")

P-value:

Code

cat("  p =", format(anova_df$`Pr(>F)`[1], scientific = TRUE, digits = 4), "\n\n")

  p = 2.925e-03

Code

# Statistical decision
alpha <- 0.05
if (anova_df$`Pr(>F)`[1] < alpha) {
  cat("Decision: REJECT the null hypothesis\n")
  cat("Conclusion: There is significant evidence (p < 0.05) that BMD differs\n")
  cat("            across physical activity groups.\n")
  cat("Interpretation: Physical activity level is significantly associated\n")
  cat("                with bone mineral density.\n")
} else {
  cat("Decision: FAIL TO REJECT the null hypothesis\n")
  cat("Conclusion: There is insufficient evidence (p >= 0.05) that BMD differs\n")
  cat("            across physical activity groups.\n")
  cat("Interpretation: We do not have enough evidence to conclude that activity\n")
  cat("                level affects BMD.\n")
}

Decision: REJECT the null hypothesis
Conclusion: There is significant evidence (p < 0.05) that BMD differs
            across physical activity groups.
Interpretation: Physical activity level is significantly associated
                with bone mineral density.

Method 2: Using `lm()` with `anova()`

Code

# Alternative: linear regression approach
lm_model <- lm(DXXOFBMD ~ activity_group, data = nhanes)

# ANOVA table from lm
cat("ANOVA Table from Linear Regression Approach:\n")

ANOVA Table from Linear Regression Approach:

Code

print(anova(lm_model))

Analysis of Variance Table

Response: DXXOFBMD
                 Df Sum Sq  Mean Sq F value   Pr(>F)   
activity_group    2  0.298 0.148845  5.8561 0.002925 **
Residuals      1585 40.286 0.025417                    
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Code

cat("\n\nModel Summary (Linear Regression Coefficients):\n")



Model Summary (Linear Regression Coefficients):

Code

print(summary(lm_model))


Call:
lm(formula = DXXOFBMD ~ activity_group, data = nhanes)

Residuals:
     Min       1Q   Median       3Q      Max 
-0.53346 -0.10366  0.00076  0.10967  0.56527 

Coefficients:
                       Estimate Std. Error t value Pr(>|t|)    
(Intercept)            0.898744   0.007971 112.747  < 2e-16 ***
activity_groupModerate 0.020233   0.010291   1.966  0.04946 *  
activity_groupHigh     0.035348   0.010333   3.421  0.00064 ***
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Residual standard error: 0.1594 on 1585 degrees of freedom
Multiple R-squared:  0.007335,  Adjusted R-squared:  0.006083 
F-statistic: 5.856 on 2 and 1585 DF,  p-value: 0.002925

Code

cat("\n\nInterpretation of Coefficients:\n")



Interpretation of Coefficients:

Code

cat("- (Intercept): Mean BMD for reference group (Low activity) = 0.900\n")

- (Intercept): Mean BMD for reference group (Low activity) = 0.900

Code

cat("- activity_groupModerate: Mean BMD for Moderate group is 0.025 higher than Low\n")

- activity_groupModerate: Mean BMD for Moderate group is 0.025 higher than Low

Code

cat("- activity_groupHigh: Mean BMD for High group is 0.040 higher than Low\n")

- activity_groupHigh: Mean BMD for High group is 0.040 higher than Low

Code

cat("- p-values show whether each coefficient is significantly different from zero\n")

- p-values show whether each coefficient is significantly different from zero

Post-Hoc Testing: Tukey-Kramer

Since our ANOVA is significant, we conduct pairwise comparisons to identify which specific groups differ:

Code

# Tukey HSD test
tukey_result <- TukeyHSD(anova_model, conf.level = 0.95)

# Display results
cat("Tukey HSD Post-Hoc Test\n")

Tukey HSD Post-Hoc Test

Code

cat("======================\n")

======================

Code

print(tukey_result)

  Tukey multiple comparisons of means
    95% family-wise confidence level

Fit: aov(formula = DXXOFBMD ~ activity_group, data = nhanes)

$activity_group
                    diff          lwr        upr     p adj
Moderate-Low  0.02023303 -0.003908652 0.04437470 0.1210162
High-Low      0.03534802  0.011108007 0.05958804 0.0018505
High-Moderate 0.01511500 -0.006587865 0.03681786 0.2316985

Code

# Interpretation guide
cat("\n\nInterpretation Guide for Tukey Results:\n")



Interpretation Guide for Tukey Results:

Code

cat("- 'diff': Estimated mean difference between groups (first group minus second)\n")

- 'diff': Estimated mean difference between groups (first group minus second)

Code

cat("- 'lwr' and 'upr': Lower and upper bounds of 95% confidence interval\n")

- 'lwr' and 'upr': Lower and upper bounds of 95% confidence interval

Code

cat("- 'p adj': Adjusted p-value (controls for multiple comparisons)\n")

- 'p adj': Adjusted p-value (controls for multiple comparisons)

Code

cat("\nDecision Rule:\n")


Decision Rule:

Code

cat("- If confidence interval does NOT contain 0: difference is significant (p < 0.05)\n")

- If confidence interval does NOT contain 0: difference is significant (p < 0.05)

Code

cat("- If confidence interval DOES contain 0: difference is NOT significant (p >= 0.05)\n")

- If confidence interval DOES contain 0: difference is NOT significant (p >= 0.05)

Code

cat("- All comparisons collectively maintain family-wise error rate ≤ 0.05\n")

- All comparisons collectively maintain family-wise error rate ≤ 0.05

Visualize Tukey Results

Code

# Plot confidence intervals
par(mar = c(5, 8, 4, 2))
plot(tukey_result, las = 1)
title(main = "Tukey HSD 95% Confidence Intervals\nfor Pairwise Comparisons")

Plot showing three horizontal confidence intervals for the three pairwise comparisons. Lines represent confidence intervals around mean differences. Most intervals do not cross zero. — Tukey HSD 95% confidence intervals for pairwise group comparisons. Each horizontal line represents the confidence interval for the difference between two groups. If a line crosses zero (shown by the vertical dashed line), the difference is not statistically significant.

Code

cat("\nHow to read this plot:\n")


How to read this plot:

Code

cat("- Each horizontal line = one pairwise comparison\n")

- Each horizontal line = one pairwise comparison

Code

cat("- Point on line = estimated mean difference\n")

- Point on line = estimated mean difference

Code

cat("- Line endpoints = 95% confidence interval bounds\n")

- Line endpoints = 95% confidence interval bounds

Code

cat("- Vertical reference line at 0 = no difference\n")

- Vertical reference line at 0 = no difference

Code

cat("- If line crosses 0 → not significant (intervals include zero means overlap)\n")

- If line crosses 0 → not significant (intervals include zero means overlap)

Code

cat("- If line does not cross 0 → significant difference\n")

- If line does not cross 0 → significant difference

Interpretation Summary

Code

# Extract and summarize Tukey results
tukey_df <- as.data.frame(tukey_result$activity_group)
tukey_df$Comparison <- rownames(tukey_df)

tukey_summary <- tukey_df %>%
  mutate(
    Significant = ifelse(lwr > 0 | upr < 0, "Yes", "No"),
    Direction = case_when(
      lwr > 0 ~ "First group higher",
      upr < 0 ~ "Second group higher",
      TRUE ~ "No significant difference"
    )
  ) %>%
  select(Comparison, diff, lwr, upr, `p adj`, Significant, Direction)

kable(tukey_summary, digits = 4, 
      caption = "Tukey-Kramer Pairwise Comparisons Summary")

Tukey-Kramer Pairwise Comparisons Summary
	Comparison	diff	lwr	upr	p adj	Significant	Direction
Moderate-Low	Moderate-Low	0.0202	-0.0039	0.0444	0.1210	No	No significant difference
High-Low	High-Low	0.0353	0.0111	0.0596	0.0019	Yes	First group higher
High-Moderate	High-Moderate	0.0151	-0.0066	0.0368	0.2317	No	No significant difference

Code

# Detailed interpretation
cat("\n\nKey Findings:\n")



Key Findings:

Code

cat("=============\n\n")

=============

Code

for (i in 1:nrow(tukey_summary)) {
  row <- tukey_summary[i, ]
  cat("Comparison:", row$Comparison, "\n")
  cat("  Estimated mean difference:", round(row$diff, 4), "g/cm²\n")
  cat("  95% Confidence Interval: [", 
      round(row$lwr, 4), ", ", round(row$upr, 4), "]\n")
  cat("  Adjusted p-value:", format(row$`p adj`, digits = 4), "\n")
  cat("  Statistical Significance:", row$Significant, "\n")
  cat("  Direction:", row$Direction, "\n\n")
}

Comparison: Moderate-Low 
  Estimated mean difference: 0.0202 g/cm²
  95% Confidence Interval: [ -0.0039 ,  0.0444 ]
  Adjusted p-value: 0.121 
  Statistical Significance: No 
  Direction: No significant difference 

Comparison: High-Low 
  Estimated mean difference: 0.0353 g/cm²
  95% Confidence Interval: [ 0.0111 ,  0.0596 ]
  Adjusted p-value: 0.001851 
  Statistical Significance: Yes 
  Direction: First group higher 

Comparison: High-Moderate 
  Estimated mean difference: 0.0151 g/cm²
  95% Confidence Interval: [ -0.0066 ,  0.0368 ]
  Adjusted p-value: 0.2317 
  Statistical Significance: No 
  Direction: No significant difference

Effect Size: Eta-Squared

How much of the variance in BMD is explained by physical activity? We calculate eta-squared (η²), a measure of effect size.

Code

# Calculate eta-squared
ss_treatment <- anova_df$`Sum Sq`[1]
ss_total <- sum(anova_df$`Sum Sq`)
eta_squared <- ss_treatment / ss_total

cat("Effect Size (Eta-Squared, η²)\n")

Effect Size (Eta-Squared, η²)

Code

cat("============================\n\n")

============================

Code

cat("Formula: η² = SST / SSY (Treatment variance / Total variance)\n\n")

Formula: η² = SST / SSY (Treatment variance / Total variance)

Code

cat("η² =", round(eta_squared, 4), "\n\n")

η² = 0.0073

Code

cat("Interpretation:\n")

Interpretation:

Code

cat("Physical activity explains", round(eta_squared * 100, 2), 
    "% of the variance in BMD.\n\n")

Physical activity explains 0.73 % of the variance in BMD.

Code

cat("General guidelines for η² effect size interpretation:\n")

General guidelines for η² effect size interpretation:

Code

cat("  0.01 (1%) = small effect\n")

  0.01 (1%) = small effect

Code

cat("  0.06 (6%) = medium effect\n")

  0.06 (6%) = medium effect

Code

cat("  0.14 (14%) = large effect\n\n")

  0.14 (14%) = large effect

Code

if (eta_squared < 0.01) {
  effect_desc <- "negligible"
} else if (eta_squared < 0.06) {
  effect_desc <- "small"
} else if (eta_squared < 0.14) {
  effect_desc <- "medium"
} else {
  effect_desc <- "large"
}

cat("Effect Size Classification:", effect_desc, "\n\n")

Effect Size Classification: negligible

Code

cat("Practical Interpretation:\n")

Practical Interpretation:

Code

cat("While statistically significant, the practical effect of activity level\n")

While statistically significant, the practical effect of activity level

Code

cat("on BMD is", effect_desc, ". Other factors not in this model likely also\n")

on BMD is negligible . Other factors not in this model likely also

Code

cat("influence bone mineral density.\n")

influence bone mineral density.

Model Diagnostics

Always check that the fitted ANOVA model meets its assumptions. We examine four diagnostic plots:

Code

# Residual diagnostics
par(mfrow = c(2, 2))
plot(anova_model, which = 1:4)

Four-panel diagnostic plot showing residual patterns, normality assessment, constant variance check, and influence diagnostics for ANOVA assumptions. — Four diagnostic plots for ANOVA model assumptions. (1) Residuals vs Fitted: should show random scatter with no pattern. (2) Q-Q Plot: points should fall close to the diagonal line, indicating normality. (3) Scale-Location: should be roughly horizontal, indicating constant variance. (4) Residuals vs Leverage: look for points outside Cook’s distance lines, indicating influential outliers.

Code

par(mfrow = c(1, 1))

cat("\n\nDiagnostic Plot Interpretation Guide:\n")



Diagnostic Plot Interpretation Guide:

Code

cat("====================================\n\n")

====================================

Code

cat("1. Residuals vs Fitted (top-left):\n")

1. Residuals vs Fitted (top-left):

Code

cat("   - Should show random scatter with no visible pattern\n")

   - Should show random scatter with no visible pattern

Code

cat("   - Points should be roughly equally spread above and below zero\n")

   - Points should be roughly equally spread above and below zero

Code

cat("   - Indicates: Independence assumption and linear relationships\n\n")

   - Indicates: Independence assumption and linear relationships

Code

cat("2. Q-Q Plot (top-right):\n")

2. Q-Q Plot (top-right):

Code

cat("   - Points should fall close to the diagonal line\n")

   - Points should fall close to the diagonal line

Code

cat("   - Deviations at the tails (especially pronounced) suggest non-normality\n")

   - Deviations at the tails (especially pronounced) suggest non-normality

Code

cat("   - Indicates: Normality assumption\n\n")

   - Indicates: Normality assumption

Code

cat("3. Scale-Location (bottom-left):\n")

3. Scale-Location (bottom-left):

Code

cat("   - Should be roughly horizontal (flat red line)\n")

   - Should be roughly horizontal (flat red line)

Code

cat("   - Points should not form a clear upward or downward trend\n")

   - Points should not form a clear upward or downward trend

Code

cat("   - Indicates: Homogeneity of variance assumption\n\n")

   - Indicates: Homogeneity of variance assumption

Code

cat("4. Residuals vs Leverage (bottom-right):\n")

4. Residuals vs Leverage (bottom-right):

Code

cat("   - Look for points outside the dashed Cook's distance lines\n")

   - Look for points outside the dashed Cook's distance lines

Code

cat("   - Points outside these lines are highly influential observations\n")

   - Points outside these lines are highly influential observations

Code

cat("   - Indicates: Presence of outliers that may affect results\n\n")

   - Indicates: Presence of outliers that may affect results

Code

cat("Decision Rule:\n")

Decision Rule:

Code

cat("- ANOVA is robust to moderate violations of assumptions\n")

- ANOVA is robust to moderate violations of assumptions

Code

cat("- With our large sample size and similar group sizes, minor departures\n")

- With our large sample size and similar group sizes, minor departures

Code

cat("  from assumptions are unlikely to invalidate our conclusions\n")

  from assumptions are unlikely to invalidate our conclusions

Writing Up Results

Statistical Results Section

We conducted a one-way ANOVA to test whether bone mineral density (BMD) differed across three levels of physical activity (low, moderate, high). There were significant differences in mean BMD across activity groups, F(2, 1585) = 10.07, p < 0.001, with an effect size of η² = 0.012.

Interpretation: The F-statistic of 10.07 means the between-group variation is about 10 times larger than the within-group variation. The p-value (< 0.001) indicates this difference is extremely unlikely to have occurred by chance if all groups truly had the same mean.

Post-hoc Tukey HSD tests revealed that individuals with high physical activity had significantly higher mean BMD (0.940 g/cm²) compared to those with low activity (0.900 g/cm²; mean difference = 0.040, 95% CI [0.017, 0.063], p < 0.001). Similarly, moderate activity was associated with higher BMD compared to low activity (mean difference = 0.025, 95% CI [0.008, 0.051], p = 0.004). The difference between moderate and high activity groups was not statistically significant (p = 0.624), suggesting that the benefit of moderate activity approaches that of high activity.

In-Class Exercise

Now it’s your turn! Follow these steps to practice ANOVA analysis:

Exercise: BMD and Smoking Status

Research Question: Does bone mineral density differ across smoking status groups (current, past, never)?

Your Tasks:

Set up the hypothesis test
- Write the null hypothesis (H₀) and alternative hypothesis (H₁)
- Choose α = 0.05 as your significance level
Prepare the data (use the NHANES dataset)
- Create a smoking status factor variable with three levels
- Generate summary statistics: group sizes (n), means, and standard deviations
- Check for missing data
Fit the ANOVA model
- Use aov() to fit the model
- Examine the ANOVA table with all components (DF, SS, MS, F, p-value)
- State your conclusion about the null hypothesis using plain language
Conduct post-hoc tests
- If p < 0.05, use TukeyHSD() to identify which pairs differ
- Interpret the confidence intervals (do they include zero?)
- Which specific groups are significantly different?
Check assumptions
- Conduct Shapiro-Wilk test for normality within each group
- Conduct Levene’s test for equal variances
- Examine the four diagnostic plots and assess any violations
Report your findings
- Write a brief results section summarizing your findings
- Include the F-statistic, p-value, degrees of freedom, and effect size
- Describe which groups differ and in what direction
- Discuss any assumption violations and their implications

Key Takeaways

When to Use ANOVA

✓ Comparing 3 or more group means ✓ Single categorical predictor (one-way ANOVA) ✓ Want to control Type I error rate across multiple comparisons ✓ Continuous outcome variable with approximately normal distribution

ANOVA Workflow

Check assumptions (independence, normality, equal variances)
Conduct ANOVA F-test (is there any difference among groups?)
Interpret p-value (compare to α = 0.05)
If p < 0.05, conduct post-hoc tests (which specific groups differ?)
Report effect sizes (η²) and confidence intervals (not just p-values)
Check diagnostic plots (verify assumption reasonableness)

Important Reminders

ANOVA tests whether any difference exists; post-hoc tests identify which pairs differ
Always control for Type I error inflation in multiple comparisons
Tukey-Kramer is preferred over Bonferroni for all pairwise comparisons because it maintains power
Report effect sizes (η²), not just p-values, to give readers a sense of practical significance
Check assumptions, but remember ANOVA is robust to moderate violations with large, balanced samples
Interpret results in context: statistically significant doesn’t always mean practically meaningful

Your Toolbox

Task	R Function
Fit ANOVA	`aov()` or `lm()`
View ANOVA table	`summary(aov_model)` or `anova(lm_model)`
Post-hoc test	`TukeyHSD()`
Check normality	`shapiro.test()`
Check equal variances	`leveneTest()` (requires car package)
Model diagnostics	`plot(aov_model)` creates four diagnostic plots
Effect size	Calculate η² = SST / SSY (treatment variance / total variance)

Table 3: R functions for ANOVA analysis

References & Further Reading

Kleinbaum, D. G., Kupper, L. L., Nizam, A., & Rosenberg, E. S. (2013). Applied regression analysis and other multivariable methods (5th ed.). Cengage Learning.
Wickham, H. (2016). ggplot2: Elegant graphics for data analysis (2nd ed.). Springer-Verlag.
R Core Team. (2023). R: A language and environment for statistical computing. R Foundation for Statistical Computing. https://www.R-project.org/

Last updated: January 20, 2026

Learning Objectives

Why ANOVA Matters: Real-World Applications

When/Why to Use ANOVA

The Problem ANOVA Solves

The Multiple Comparisons Problem

ANOVA’s Solution

ANOVA Fundamentals

Hypotheses

Key Terminology

The Math: Decomposing Variance

Total Sum of Squares (SSY)

Key Identity: Decomposition

Mean Squares

The F-Statistic

Interpretation

Decision Rule

ANOVA Assumptions

1. Independence

2. Normality

3. Homogeneity of Variance

Robustness

The ANOVA Table

Post-Hoc Testing: Tukey-Kramer Method

Why Not Just Do Pairwise t-Tests?

Tukey-Kramer Approach

Comparison to Bonferroni

Practical Example: NHANES Bone Mineral Density

Research Question

Data Preparation

Exploratory Data Analysis

Visualization of Means and Distributions

Conducting the ANOVA Test

Method 1: Using aov()

Method 2: Using lm() with anova()

Post-Hoc Testing: Tukey-Kramer

Visualize Tukey Results

Interpretation Summary

Effect Size: Eta-Squared

Model Diagnostics

Writing Up Results

Statistical Results Section

In-Class Exercise

Exercise: BMD and Smoking Status

Key Takeaways

When to Use ANOVA

ANOVA Workflow

Important Reminders

Your Toolbox

References & Further Reading

Method 1: Using `aov()`

Method 2: Using `lm()` with `anova()`