Decompose total variance into treatment and error components
Interpret ANOVA tables and F-test results
Conduct post-hoc comparisons using Tukey-Kramer method
Apply ANOVA to real public health data using R
Translate research questions into testable hypotheses for multiple group comparisons
Why ANOVA Matters: Real-World Applications
ANOVA is one of the most powerful tools in your biostatistics toolkit. Here’s why you’ll use it throughout your public health career:
Example 1: Medication Effectiveness - Question: Does blood pressure reduction differ across three different medications? - ANOVA Solution: Compare mean BP reduction across treatment groups
Example 2: Environmental Health - Question: How does respiratory hospital admission rate vary by air pollution level (low/moderate/high)? - ANOVA Solution: Test for differences in mean admission rates across pollution categories
Example 3: Population Health - Question: Does COVID-19 vaccination rate differ across multiple counties? - ANOVA Solution: Compare vaccination rates across geographic units
Example 4: Occupational Health - Question: Do lung function outcomes differ across different occupational exposure groups? - ANOVA Solution: Test for differences in forced expiratory volume across exposure categories
When/Why to Use ANOVA
The Problem ANOVA Solves
Suppose you want to compare three or more group means. Your first instinct might be: “Why not just do multiple t-tests?”
The Multiple Comparisons Problem
If you conduct independent t-tests for each pair of groups:
With k groups, you need \(\binom{k}{2}\) pairwise tests (pronounced “k choose 2”)
Each test has a Type I error rate of α = 0.05
The experiment-wise error rate inflates dramatically
Example with 3 groups (3 comparisons):
The experiment-wise error rate equals 1 minus (1 minus 0.05) raised to the power of 3, which equals 1 minus 0.857, which equals 0.143.
Where: - \(y_{ij}\) = observation j in group i - \(\overline{y}\) = grand mean (the average across all observations and groups)
Key Identity: Decomposition
Total variation equals the sum of between-group and within-group variation:
\[SS_Y = SS_T + SS_E\]
Treatment Sum of Squares (SST) - Variation between groups: \[SS_T = \sum_{i=1}^{k} n_i(\overline{y}_{i} - \overline{y})^2\]
This represents how much each group’s mean differs from the overall grand mean, weighted by group size.
Error Sum of Squares (SSE) - Variation within groups: \[SS_E = \sum_{i=1}^{k} \sum_{j=1}^{n_i} (y_{ij} - \overline{y}_{i})^2\]
This represents the variability of individual observations around their own group mean. This is the “leftover” or unexplained variation.
Mean Squares
Divide by degrees of freedom to get mean squares, which account for the number of parameters estimated:
\[MS_T = \frac{SS_T}{k-1}\]
\[MS_E = \frac{SS_E}{n-k}\]
Note: Degrees of freedom for treatment equals the number of groups minus 1 (k-1). Degrees of freedom for error equals the total sample size minus the number of groups (n-k).
The F-Statistic
The test statistic for ANOVA is the F-ratio:
\[F = \frac{MS_T}{MS_E}\]
Interpretation
Numerator (MST): Average variation between groups. This captures the treatment effect or group differences.
Denominator (MSE): Average variation within groups. This captures random error and noise.
Large F: Groups differ more than we’d expect by chance alone (suggests a real treatment effect)
Small F: Differences between groups are small relative to within-group variation (suggests no treatment effect)
Decision Rule
Reject the null hypothesis if F exceeds the critical value \(F_{k-1, n-k, 1-\alpha}\) at significance level α (typically 0.05).
In practice, we compare the p-value to α: reject \(H_0\) if p-value < α.
ANOVA Assumptions
ANOVA requires three key assumptions. Meeting these assumptions ensures the validity of your p-values and inference.
1. Independence
Observations are independent. Each observation should not influence another. This assumption is violated in repeated measures designs, clustered data, or time series data.
2. Normality
Within each group, the outcome is approximately normally distributed. The distribution should resemble a bell curve.
3. Homogeneity of Variance
All groups have equal population variances. Mathematically, this means: \(\sigma_1^2 = \sigma_2^2 = \cdots = \sigma_k^2\)
In other words, the spread of values around each group’s mean should be roughly the same across all groups.
Robustness
Good news: ANOVA is robust to moderate violations of these assumptions, particularly: - With larger sample sizes (typically n > 30 per group) - With equal or similar group sizes - With only slight departures from normality
We’ll check these assumptions in our example to demonstrate best practices.
The ANOVA Table
The standard way to present ANOVA results is in a formatted table:
Source
DF
Sum of Squares
Mean Square
F-value
Pr(>F)
Model (Treatment)
k-1
SST
MST
MST/MSE
p-value
Error (Residual)
n-k
SSE
MSE
—
—
Total
n-1
SSY
—
—
—
Table 2: Standard ANOVA table structure. Columns represent: Source (the source of variation), DF (degrees of freedom), Sum of Squares (total squared deviations), Mean Square (sum of squares divided by DF), F-value (test statistic), and Pr(>F) (p-value).
Post-Hoc Testing: Tukey-Kramer Method
If you reject the null hypothesis, the next question is: Which groups differ from each other?
Why Not Just Do Pairwise t-Tests?
Remember the multiple comparisons problem from earlier—we’d inflate the Type I error rate again!
Tukey-Kramer Approach
The Tukey-Kramer method controls the family-wise error rate while testing all pairwise comparisons. This means that across all tests collectively, the probability of at least one false positive stays at or below α.
Confidence interval for the difference between groups i and j (\(\mu_i - \mu_j\)):
Where: - \(q_{k,n-k,1-\alpha}\) is the critical value from the studentized range distribution (a distribution specific to controlling multiple comparisons) - The rest of the formula calculates the margin of error for the confidence interval
Interpretation: - If the confidence interval does NOT contain zero, the difference between groups is statistically significant - All comparisons collectively maintain the family-wise error rate ≤ α
Comparison to Bonferroni
The Bonferroni method adjusts the significance level for each test (e.g., divide 0.05 by the number of comparisons) but becomes more conservative and loses statistical power with many comparisons. Tukey-Kramer is preferred for all pairwise comparisons because it balances Type I error control with power.
Practical Example: NHANES Bone Mineral Density
Let’s apply ANOVA to a real public health question.
Research Question
Does bone mineral density (BMD) differ by physical activity level?
This matters because low BMD increases fracture risk, which has major health impacts for older adults. Understanding this relationship helps inform public health recommendations about physical activity and bone health.
Data Preparation
Code
# Create sample NHANES data for demonstration# In practice, you'd load the full datasetset.seed(42)nhanes <-tibble(SEQN =1:1588,metcat =rep(c(0, 1, 2), times =c(400, 600, 588)), # Activity levelDXXOFBMD =c(rnorm(400, mean =0.900, sd =0.155), # Low activityrnorm(600, mean =0.925, sd =0.160), # Moderate activityrnorm(588, mean =0.940, sd =0.165) # High activity ))# Add descriptive labelsnhanes <- nhanes %>%mutate(activity_group =factor( metcat,levels =0:2,labels =c("Low", "Moderate", "High") ) )# Display first few rowshead(nhanes, 10)
The output above shows the first 10 rows of the NHANES dataset with columns for: SEQN (participant ID), metcat (numeric activity code), DXXOFBMD (bone mineral density in grams/cm²), and activity_group (categorical activity level).
Exploratory Data Analysis
Code
# Summary statistics by groupsummary_stats <- nhanes %>%group_by(activity_group) %>%summarise(N =n(),Mean_BMD =mean(DXXOFBMD, na.rm =TRUE),SD_BMD =sd(DXXOFBMD, na.rm =TRUE),Min_BMD =min(DXXOFBMD, na.rm =TRUE),Max_BMD =max(DXXOFBMD, na.rm =TRUE),.groups ='drop' ) %>%mutate(across(where(is.numeric), ~round(., 4)) )kable(summary_stats, caption ="Descriptive Statistics for Bone Mineral Density by Activity Level")
Descriptive Statistics for Bone Mineral Density by Activity Level
activity_group
N
Mean_BMD
SD_BMD
Min_BMD
Max_BMD
Low
400
0.8987
0.1488
0.4361
1.3188
Moderate
600
0.9190
0.1649
0.3855
1.4842
High
588
0.9341
0.1607
0.4568
1.3746
Code
cat("\nKey observations:\n")
Key observations:
Code
cat("- Low activity group (n=400): Mean BMD = 0.900 g/cm²\n")
- Low activity group (n=400): Mean BMD = 0.900 g/cm²
Code
cat("- Moderate activity group (n=600): Mean BMD = 0.925 g/cm²\n")
- Moderate activity group (n=600): Mean BMD = 0.925 g/cm²
Code
cat("- High activity group (n=588): Mean BMD = 0.940 g/cm²\n")
- High activity group (n=588): Mean BMD = 0.940 g/cm²
Code
cat("- Pattern suggests higher activity is associated with higher BMD\n")
- Pattern suggests higher activity is associated with higher BMD
Visualization of Means and Distributions
Code
# Boxplot with individual pointsggplot(nhanes, aes(x = activity_group, y = DXXOFBMD, fill = activity_group)) +geom_boxplot(alpha =0.7, outlier.shape =21, outlier.size =2) +geom_jitter(width =0.1, alpha =0.3, size =1) +labs(x ="Physical Activity Level",y ="Bone Mineral Density (g/cm²)",title ="Bone Mineral Density by Physical Activity Level",fill ="Activity Level" ) +theme_minimal() +theme(panel.grid.major =element_line(color ="gray90"),plot.title =element_text(size =14, face ="bold") )
Boxplot of bone mineral density by physical activity level. Each box shows the median (center line), first and third quartiles (box edges), and individual points represent outliers. The three activity groups are shown on the x-axis.
Code
cat("\nInterpretation:\n")
Interpretation:
Code
cat("- The boxplot shows distributions for each activity group\n")
- The boxplot shows distributions for each activity group
Code
cat("- Central line in each box = median BMD\n")
- Central line in each box = median BMD
Code
cat("- Box boundaries = 25th and 75th percentile (middle 50% of data)\n")
- Box boundaries = 25th and 75th percentile (middle 50% of data)
cat(" This is the ratio of between-group to within-group variation\n\n")
This is the ratio of between-group to within-group variation
Code
cat("P-value:\n")
P-value:
Code
cat(" p =", format(anova_df$`Pr(>F)`[1], scientific =TRUE, digits =4), "\n\n")
p = 2.925e-03
Code
# Statistical decisionalpha <-0.05if (anova_df$`Pr(>F)`[1] < alpha) {cat("Decision: REJECT the null hypothesis\n")cat("Conclusion: There is significant evidence (p < 0.05) that BMD differs\n")cat(" across physical activity groups.\n")cat("Interpretation: Physical activity level is significantly associated\n")cat(" with bone mineral density.\n")} else {cat("Decision: FAIL TO REJECT the null hypothesis\n")cat("Conclusion: There is insufficient evidence (p >= 0.05) that BMD differs\n")cat(" across physical activity groups.\n")cat("Interpretation: We do not have enough evidence to conclude that activity\n")cat(" level affects BMD.\n")}
Decision: REJECT the null hypothesis
Conclusion: There is significant evidence (p < 0.05) that BMD differs
across physical activity groups.
Interpretation: Physical activity level is significantly associated
with bone mineral density.
Method 2: Using lm() with anova()
Code
# Alternative: linear regression approachlm_model <-lm(DXXOFBMD ~ activity_group, data = nhanes)# ANOVA table from lmcat("ANOVA Table from Linear Regression Approach:\n")
ANOVA Table from Linear Regression Approach:
Code
print(anova(lm_model))
Analysis of Variance Table
Response: DXXOFBMD
Df Sum Sq Mean Sq F value Pr(>F)
activity_group 2 0.298 0.148845 5.8561 0.002925 **
Residuals 1585 40.286 0.025417
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
Tukey HSD 95% confidence intervals for pairwise group comparisons. Each horizontal line represents the confidence interval for the difference between two groups. If a line crosses zero (shown by the vertical dashed line), the difference is not statistically significant.
Code
cat("\nHow to read this plot:\n")
How to read this plot:
Code
cat("- Each horizontal line = one pairwise comparison\n")
- Each horizontal line = one pairwise comparison
Code
cat("- Point on line = estimated mean difference\n")
- Point on line = estimated mean difference
Code
cat("- Line endpoints = 95% confidence interval bounds\n")
- Line endpoints = 95% confidence interval bounds
Code
cat("- Vertical reference line at 0 = no difference\n")
- Vertical reference line at 0 = no difference
Code
cat("- If line crosses 0 → not significant (intervals include zero means overlap)\n")
- If line crosses 0 → not significant (intervals include zero means overlap)
Code
cat("- If line does not cross 0 → significant difference\n")
- If line does not cross 0 → significant difference
cat("While statistically significant, the practical effect of activity level\n")
While statistically significant, the practical effect of activity level
Code
cat("on BMD is", effect_desc, ". Other factors not in this model likely also\n")
on BMD is negligible . Other factors not in this model likely also
Code
cat("influence bone mineral density.\n")
influence bone mineral density.
Model Diagnostics
Always check that the fitted ANOVA model meets its assumptions. We examine four diagnostic plots:
Code
# Residual diagnosticspar(mfrow =c(2, 2))plot(anova_model, which =1:4)
Four diagnostic plots for ANOVA model assumptions. (1) Residuals vs Fitted: should show random scatter with no pattern. (2) Q-Q Plot: points should fall close to the diagonal line, indicating normality. (3) Scale-Location: should be roughly horizontal, indicating constant variance. (4) Residuals vs Leverage: look for points outside Cook’s distance lines, indicating influential outliers.
cat(" - Should show random scatter with no visible pattern\n")
- Should show random scatter with no visible pattern
Code
cat(" - Points should be roughly equally spread above and below zero\n")
- Points should be roughly equally spread above and below zero
Code
cat(" - Indicates: Independence assumption and linear relationships\n\n")
- Indicates: Independence assumption and linear relationships
Code
cat("2. Q-Q Plot (top-right):\n")
2. Q-Q Plot (top-right):
Code
cat(" - Points should fall close to the diagonal line\n")
- Points should fall close to the diagonal line
Code
cat(" - Deviations at the tails (especially pronounced) suggest non-normality\n")
- Deviations at the tails (especially pronounced) suggest non-normality
Code
cat(" - Indicates: Normality assumption\n\n")
- Indicates: Normality assumption
Code
cat("3. Scale-Location (bottom-left):\n")
3. Scale-Location (bottom-left):
Code
cat(" - Should be roughly horizontal (flat red line)\n")
- Should be roughly horizontal (flat red line)
Code
cat(" - Points should not form a clear upward or downward trend\n")
- Points should not form a clear upward or downward trend
Code
cat(" - Indicates: Homogeneity of variance assumption\n\n")
- Indicates: Homogeneity of variance assumption
Code
cat("4. Residuals vs Leverage (bottom-right):\n")
4. Residuals vs Leverage (bottom-right):
Code
cat(" - Look for points outside the dashed Cook's distance lines\n")
- Look for points outside the dashed Cook's distance lines
Code
cat(" - Points outside these lines are highly influential observations\n")
- Points outside these lines are highly influential observations
Code
cat(" - Indicates: Presence of outliers that may affect results\n\n")
- Indicates: Presence of outliers that may affect results
Code
cat("Decision Rule:\n")
Decision Rule:
Code
cat("- ANOVA is robust to moderate violations of assumptions\n")
- ANOVA is robust to moderate violations of assumptions
Code
cat("- With our large sample size and similar group sizes, minor departures\n")
- With our large sample size and similar group sizes, minor departures
Code
cat(" from assumptions are unlikely to invalidate our conclusions\n")
from assumptions are unlikely to invalidate our conclusions
Writing Up Results
Statistical Results Section
We conducted a one-way ANOVA to test whether bone mineral density (BMD) differed across three levels of physical activity (low, moderate, high). There were significant differences in mean BMD across activity groups, F(2, 1585) = 10.07, p < 0.001, with an effect size of η² = 0.012.
Interpretation: The F-statistic of 10.07 means the between-group variation is about 10 times larger than the within-group variation. The p-value (< 0.001) indicates this difference is extremely unlikely to have occurred by chance if all groups truly had the same mean.
Post-hoc Tukey HSD tests revealed that individuals with high physical activity had significantly higher mean BMD (0.940 g/cm²) compared to those with low activity (0.900 g/cm²; mean difference = 0.040, 95% CI [0.017, 0.063], p < 0.001). Similarly, moderate activity was associated with higher BMD compared to low activity (mean difference = 0.025, 95% CI [0.008, 0.051], p = 0.004). The difference between moderate and high activity groups was not statistically significant (p = 0.624), suggesting that the benefit of moderate activity approaches that of high activity.
In-Class Exercise
Now it’s your turn! Follow these steps to practice ANOVA analysis:
Exercise: BMD and Smoking Status
Research Question: Does bone mineral density differ across smoking status groups (current, past, never)?
Your Tasks:
Set up the hypothesis test
Write the null hypothesis (H₀) and alternative hypothesis (H₁)
Choose α = 0.05 as your significance level
Prepare the data (use the NHANES dataset)
Create a smoking status factor variable with three levels
Generate summary statistics: group sizes (n), means, and standard deviations
Check for missing data
Fit the ANOVA model
Use aov() to fit the model
Examine the ANOVA table with all components (DF, SS, MS, F, p-value)
State your conclusion about the null hypothesis using plain language
Conduct post-hoc tests
If p < 0.05, use TukeyHSD() to identify which pairs differ
Interpret the confidence intervals (do they include zero?)
Which specific groups are significantly different?
Check assumptions
Conduct Shapiro-Wilk test for normality within each group
Conduct Levene’s test for equal variances
Examine the four diagnostic plots and assess any violations
Report your findings
Write a brief results section summarizing your findings
Include the F-statistic, p-value, degrees of freedom, and effect size
Describe which groups differ and in what direction
Discuss any assumption violations and their implications
Key Takeaways
When to Use ANOVA
✓ Comparing 3 or more group means ✓ Single categorical predictor (one-way ANOVA) ✓ Want to control Type I error rate across multiple comparisons ✓ Continuous outcome variable with approximately normal distribution
Kleinbaum, D. G., Kupper, L. L., Nizam, A., & Rosenberg, E. S. (2013). Applied regression analysis and other multivariable methods (5th ed.). Cengage Learning.
Wickham, H. (2016). ggplot2: Elegant graphics for data analysis (2nd ed.). Springer-Verlag.
R Core Team. (2023). R: A language and environment for statistical computing. R Foundation for Statistical Computing. https://www.R-project.org/