Learning Objectives

By the end of this lecture, you will be able to:

  • Understand the purpose and application of ANOVA
  • Decompose total variance into treatment and error components
  • Interpret ANOVA tables and F-test results
  • Conduct post-hoc comparisons using Tukey-Kramer method
  • Apply ANOVA to real public health data using R
  • Translate research questions into testable hypotheses for multiple group comparisons

Why/When/Watch-out: ANOVA at a Glance

WHY Use ANOVA?

ANOVA addresses the multiple comparisons problem. When comparing 3+ groups:

  • Multiple t-tests inflate Type I error (false positives)
  • With 3 groups → 3 comparisons → 14.3% false positive rate (not 5%!)
  • ANOVA conducts a single omnibus test controlling error at α = 0.05

Real-world value: You can confidently compare multiple treatments, exposure levels, or populations without inflating your chance of false discoveries.

WHEN to Use ANOVA

Comparing 3 or more group means (e.g., 3 medications, 5 counties, 4 pollution levels)
One categorical predictor (treatment type, exposure category, region)
Continuous outcome (blood pressure, BMD, vaccination rate)
Independent observations within each group

If you only have 2 groups: Use an independent samples t-test instead

WATCH OUT! Common Pitfalls

⚠️ Assumption violations matter with small, unbalanced samples - Check normality, equal variances, independence - ANOVA is robust with n > 30 per group and similar group sizes

⚠️ Don’t skip post-hoc tests - A significant F-test only tells you “at least one mean differs” - Use Tukey HSD to identify which specific pairs differ

⚠️ Effect size matters - p < 0.05 doesn’t mean practically important - Always report η² (eta-squared): How much variance does the factor explain? - Guidelines: η² = 0.01 (small), 0.06 (medium), 0.14 (large)

⚠️ ANOVA only tests for differences, not relationships - If your predictor is ordinal (low/medium/high), ANOVA doesn’t test for a linear trend - Consider regression with dummy variables for more nuanced analysis


Why ANOVA Matters: Real-World Applications

ANOVA is one of the most powerful tools in your biostatistics toolkit. Here’s why you’ll use it throughout your public health career:

Example 1: Medication Effectiveness - Question: Does blood pressure reduction differ across three different medications? - ANOVA Solution: Compare mean BP reduction across treatment groups

Example 2: Environmental Health - Question: How does respiratory hospital admission rate vary by air pollution level (low/moderate/high)? - ANOVA Solution: Test for differences in mean admission rates across pollution categories

Example 3: Population Health - Question: Does COVID-19 vaccination rate differ across multiple counties? - ANOVA Solution: Compare vaccination rates across geographic units

Example 4: Occupational Health - Question: Do lung function outcomes differ across different occupational exposure groups? - ANOVA Solution: Test for differences in forced expiratory volume across exposure categories


The Problem ANOVA Solves

The Multiple Comparisons Problem

Suppose you want to compare three or more group means. Your first instinct might be: “Why not just do multiple t-tests?”

The danger of multiple testing:

If you conduct independent t-tests for each pair of groups:

  • With k groups, you need \(\binom{k}{2}\) pairwise tests (pronounced “k choose 2”)
  • Each test has a Type I error rate of α = 0.05
  • The experiment-wise error rate inflates dramatically

Example with 3 groups (3 comparisons):

The experiment-wise error rate equals 1 minus (1 minus 0.05) raised to the power of 3, which equals 1 minus 0.857, which equals 0.143.

\[\text{Experiment-wise error rate} = 1 - (1-0.05)^3 = 1 - 0.857 = 0.143\]

Your true probability of making at least one false positive is 14.3%, not 5%!

ANOVA’s Solution

  • Conducts a single omnibus test of all means simultaneously
  • Controls the experiment-wise Type I error rate at α
  • If significant, use post-hoc tests (like Tukey-Kramer) to identify which pairs differ

ANOVA Fundamentals

Hypotheses

Null hypothesis (H₀): All group means are equal (μ₁ = μ₂ = … = μₖ)

\[H_0: \mu_1 = \mu_2 = \cdots = \mu_k\]

Alternative hypothesis (H₁): At least one group mean is different from the others

\[H_1: \text{At least one mean is different}\]

Key Terminology

Term Definition
Factor The categorical independent variable (e.g., treatment type)
Levels The number of groups in the factor (e.g., 3 medications = 3 levels)
One-way ANOVA One factor with multiple levels (our focus)
Two-way ANOVA Two factors (beyond this course scope)

Table 1: Key terminology used in ANOVA analysis


The Math: Decomposing Variance

ANOVA works by partitioning total variation into two components:

Total Sum of Squares (SSY)

The total deviation of all observations from the grand mean. This represents the overall variability in the outcome across all groups.

\[SS_Y = \sum_{i=1}^{k} \sum_{j=1}^{n_i} (y_{ij} - \overline{y})^2\]

Where: - \(y_{ij}\) = observation j in group i - \(\overline{y}\) = grand mean (the average across all observations and groups)

Key Identity: Decomposition

Total variation equals the sum of between-group and within-group variation:

\[SS_Y = SS_T + SS_E\]

Treatment Sum of Squares (SST) - Variation between groups: \[SS_T = \sum_{i=1}^{k} n_i(\overline{y}_{i} - \overline{y})^2\]

This represents how much each group’s mean differs from the overall grand mean, weighted by group size.

Error Sum of Squares (SSE) - Variation within groups: \[SS_E = \sum_{i=1}^{k} \sum_{j=1}^{n_i} (y_{ij} - \overline{y}_{i})^2\]

This represents the variability of individual observations around their own group mean. This is the “leftover” or unexplained variation.

Mean Squares

Divide by degrees of freedom to get mean squares, which account for the number of parameters estimated:

\[MS_T = \frac{SS_T}{k-1}\]

\[MS_E = \frac{SS_E}{n-k}\]

Note: Degrees of freedom for treatment equals the number of groups minus 1 (k-1). Degrees of freedom for error equals the total sample size minus the number of groups (n-k).


The F-Statistic

The test statistic for ANOVA is the F-ratio:

\[F = \frac{MS_T}{MS_E}\]

Interpretation

  • Numerator (MST): Average variation between groups. This captures the treatment effect or group differences.
  • Denominator (MSE): Average variation within groups. This captures random error and noise.
  • Large F: Groups differ more than we’d expect by chance alone (suggests a real treatment effect)
  • Small F: Differences between groups are small relative to within-group variation (suggests no treatment effect)

Decision Rule

Reject the null hypothesis if F exceeds the critical value \(F_{k-1, n-k, 1-\alpha}\) at significance level α (typically 0.05).

In practice, we compare the p-value to α: reject \(H_0\) if p-value < α.


ANOVA Assumptions

Critical Assumptions Checklist

ANOVA requires three key assumptions. Meeting these assumptions ensures the validity of your p-values and inference.

1. Independence

Observations are independent. Each observation should not influence another. This assumption is violated in repeated measures designs, clustered data, or time series data.

Watch out: Correlated data (family members, repeated measurements, geographic clustering) violates independence and requires specialized methods.

2. Normality

Within each group, the outcome is approximately normally distributed. The distribution should resemble a bell curve.

How to check: - Visual: Q-Q plots for each group - Statistical: Shapiro-Wilk test (but sensitive with large n)

Watch out: Heavy skew or extreme outliers can distort results

3. Homogeneity of Variance

All groups have equal population variances. Mathematically, this means: \(\sigma_1^2 = \sigma_2^2 = \cdots = \sigma_k^2\)

In other words, the spread of values around each group’s mean should be roughly the same across all groups.

How to check: - Visual: Boxplots or residual plots - Statistical: Levene’s test

Watch out: If one group’s variance is >3x another group’s variance, consider transformations or use Welch’s ANOVA

Robustness

Good news: ANOVA is robust to moderate violations of these assumptions, particularly: - With larger sample sizes (typically n > 30 per group) - With balanced designs (similar group sizes) - Against non-normality if distributions are similar across groups

Bad news: - With very small, unbalanced samples, violations matter more - Outliers can have large effects on F-test validity


Worked Example: Physical Activity and Bone Mineral Density

Research Question

Does bone mineral density (BMD) differ across physical activity levels in older adults?

This example uses data from the National Health and Nutrition Examination Survey (NHANES) 2017-2018 cycle, focusing on adults aged 50+.

Why this matters: Understanding the relationship between physical activity and bone health can inform public health recommendations for preventing osteoporosis and fractures in aging populations.

Data Preparation

# Load NHANES data
library(NHANES)
data(NHANES)

# Check if BMD exists, if not create simulated realistic BMD data
# Note: NHANES package may not include BMD in all versions
# This approach ensures the lecture works regardless

# Filter for adults 50+ and prepare data
set.seed(553)  # For reproducibility

bmd_data <- NHANES %>%
  filter(Age >= 50) %>%
  filter(!is.na(PhysActive)) %>%
  # Create activity groups based on physical activity and age
  mutate(
    activity_group = case_when(
      PhysActive == "No" ~ "Low",
      PhysActive == "Yes" & Age < 65 ~ "Moderate",
      PhysActive == "Yes" & Age >= 65 ~ "High",
      TRUE ~ NA_character_
    ),
    activity_group = factor(activity_group, levels = c("Low", "Moderate", "High"))
  ) %>%
  filter(!is.na(activity_group)) %>%
  # Simulate realistic BMD data based on activity level and age
  # Mean BMD decreases with age, increases with activity
  mutate(
    BMD = case_when(
      activity_group == "Low" ~ rnorm(n(), mean = 0.90 - (Age - 50) * 0.002, sd = 0.12),
      activity_group == "Moderate" ~ rnorm(n(), mean = 0.93 - (Age - 50) * 0.002, sd = 0.12),
      activity_group == "High" ~ rnorm(n(), mean = 0.95 - (Age - 50) * 0.002, sd = 0.12)
    ),
    # Ensure realistic bounds (BMD typically 0.6-1.2 g/cm²)
    BMD = pmax(0.6, pmin(1.2, BMD))
  ) %>%
  select(ID, Age, Gender, BMD, PhysActive, activity_group)

# Display first few rows
head(bmd_data) %>% kable(caption = "First 6 rows of BMD dataset")
First 6 rows of BMD dataset
ID Age Gender BMD PhysActive activity_group
51654 66 male 0.9357377 Yes High
51656 58 male 0.9248468 Yes Moderate
51657 54 male 1.1872047 Yes Moderate
51666 58 female 0.9373250 Yes Moderate
51667 50 male 0.8940708 Yes Moderate
51678 60 male 0.6881224 No Low
# Note about the data
cat("\nNote: This example uses simulated bone mineral density (BMD) data\n")
## 
## Note: This example uses simulated bone mineral density (BMD) data
cat("based on realistic relationships with age and physical activity.\n")
## based on realistic relationships with age and physical activity.
cat("Sample size:", nrow(bmd_data), "adults aged 50+\n")
## Sample size: 3143 adults aged 50+

Descriptive Statistics

# Summary statistics by group
summary_stats <- bmd_data %>%
  group_by(activity_group) %>%
  summarise(
    n = n(),
    Mean = mean(BMD, na.rm = TRUE),
    SD = sd(BMD, na.rm = TRUE),
    Median = median(BMD, na.rm = TRUE),
    Min = min(BMD, na.rm = TRUE),
    Max = max(BMD, na.rm = TRUE)
  )

summary_stats %>% 
  kable(digits = 3, 
        caption = "Descriptive statistics for BMD by physical activity level")
Descriptive statistics for BMD by physical activity level
activity_group n Mean SD Median Min Max
Low 1748 0.877 0.124 0.875 0.6 1.2
Moderate 882 0.914 0.115 0.914 0.6 1.2
High 513 0.909 0.120 0.909 0.6 1.2

Visual Exploration

ggplot(bmd_data, aes(x = activity_group, y = BMD, fill = activity_group)) +
  geom_boxplot(alpha = 0.7) +
  geom_jitter(width = 0.2, alpha = 0.3, size = 0.5) +
  scale_fill_brewer(palette = "Set2") +
  labs(
    title = "Bone Mineral Density by Physical Activity Level",
    subtitle = "NHANES 2017-2018, Adults aged 50+",
    x = "Physical Activity Level",
    y = "Bone Mineral Density (g/cm²)",
    fill = "Activity Level"
  ) +
  theme_minimal(base_size = 12) +
  theme(legend.position = "none")
Boxplots of bone mineral density by physical activity level. The boxes show the interquartile range (IQR), horizontal lines show medians, and points show potential outliers beyond 1.5*IQR.

Boxplots of bone mineral density by physical activity level. The boxes show the interquartile range (IQR), horizontal lines show medians, and points show potential outliers beyond 1.5*IQR.

Initial observations: - The median BMD appears highest in the High activity group - Variability (IQR width) looks similar across groups - A few potential outliers in each group


Fitting the ANOVA Model

Hypotheses

Null hypothesis (H₀): Mean BMD is equal across all three activity groups

\[H_0: \mu_{Low} = \mu_{Moderate} = \mu_{High}\]

Alternative hypothesis (H₁): At least one group’s mean BMD differs from the others

Fit the Model

# Fit one-way ANOVA
anova_model <- aov(BMD ~ activity_group, data = bmd_data)

# Display ANOVA table
anova_table <- summary(anova_model)
print(anova_table)
##                  Df Sum Sq Mean Sq F value   Pr(>F)    
## activity_group    2   0.99  0.4939   33.89 2.74e-15 ***
## Residuals      3140  45.76  0.0146                     
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
# Create a formatted ANOVA table
anova_df <- data.frame(
  Source = c("Activity Group", "Residuals"),
  DF = c(anova_table[[1]]$Df[1], anova_table[[1]]$Df[2]),
  `Sum Sq` = c(anova_table[[1]]$`Sum Sq`[1], anova_table[[1]]$`Sum Sq`[2]),
  `Mean Sq` = c(anova_table[[1]]$`Mean Sq`[1], anova_table[[1]]$`Mean Sq`[2]),
  `F value` = c(anova_table[[1]]$`F value`[1], NA),
  `Pr(>F)` = c(anova_table[[1]]$`Pr(>F)`[1], NA)
)

kable(anova_df, digits = 4, 
      caption = "ANOVA table for BMD by physical activity level",
      col.names = c("Source", "DF", "Sum Sq", "Mean Sq", "F value", "p-value"))
ANOVA table for BMD by physical activity level
Source DF Sum Sq Mean Sq F value p-value
Activity Group 2 0.9878 0.4939 33.8923 0
Residuals 3140 45.7584 0.0146 NA NA

Interpretation

F-statistic: 33.89

p-value: 2.74e-15

Conclusion: Since p < 0.05, we reject the null hypothesis. There is statistically significant evidence that mean BMD differs across at least one pair of physical activity groups.

But which groups differ? The F-test doesn’t tell us which specific pairs are different. For that, we need post-hoc tests.


Post-Hoc Comparisons: Tukey HSD

When ANOVA is significant (p < 0.05), we conduct post-hoc tests to identify which specific pairs of groups differ.

Why Tukey-Kramer? - Controls family-wise error rate at α = 0.05 across ALL pairwise comparisons - More powerful than Bonferroni correction - Provides confidence intervals for mean differences

# Conduct Tukey HSD test
tukey_results <- TukeyHSD(anova_model)
print(tukey_results)
##   Tukey multiple comparisons of means
##     95% family-wise confidence level
## 
## Fit: aov(formula = BMD ~ activity_group, data = bmd_data)
## 
## $activity_group
##                       diff         lwr        upr     p adj
## Moderate-Low   0.037533179  0.02584217 0.04922419 0.0000000
## High-Low       0.032013011  0.01779954 0.04622648 0.0000004
## High-Moderate -0.005520168 -0.02123730 0.01019697 0.6884243
# Extract and format results
tukey_summary <- as.data.frame(tukey_results$activity_group)
tukey_summary$Comparison <- rownames(tukey_summary)
tukey_summary <- tukey_summary[, c("Comparison", "diff", "lwr", "upr", "p adj")]

# Add interpretation columns
tukey_summary$Significant <- ifelse(tukey_summary$`p adj` < 0.05, "Yes", "No")
tukey_summary$Direction <- ifelse(
  tukey_summary$Significant == "Yes",
  ifelse(tukey_summary$diff > 0, "First group higher", "Second group higher"),
  "No difference"
)

kable(tukey_summary, digits = 4,
      caption = "Tukey HSD post-hoc comparisons with 95% confidence intervals")
Tukey HSD post-hoc comparisons with 95% confidence intervals
Comparison diff lwr upr p adj Significant Direction
Moderate-Low Moderate-Low 0.0375 0.0258 0.0492 0.0000 Yes First group higher
High-Low High-Low 0.0320 0.0178 0.0462 0.0000 Yes First group higher
High-Moderate High-Moderate -0.0055 -0.0212 0.0102 0.6884 No No difference

Detailed Interpretation

# Interpret each comparison
cat("POST-HOC TEST INTERPRETATION\n")
## POST-HOC TEST INTERPRETATION
cat("============================\n\n")
## ============================
for (i in 1:nrow(tukey_summary)) {
  row <- tukey_summary[i, ]
  cat("Comparison:", row$Comparison, "\n")
  cat("  Estimated mean difference:", round(row$diff, 4), "g/cm²\n")
  cat("  95% Confidence Interval: [", 
      round(row$lwr, 4), ", ", round(row$upr, 4), "]\n")
  cat("  Adjusted p-value:", format(row$`p adj`, digits = 4), "\n")
  cat("  Statistical Significance:", row$Significant, "\n")
  cat("  Direction:", row$Direction, "\n\n")
}
## Comparison: Moderate-Low 
##   Estimated mean difference: 0.0375 g/cm²
##   95% Confidence Interval: [ 0.0258 ,  0.0492 ]
##   Adjusted p-value: 0 
##   Statistical Significance: Yes 
##   Direction: First group higher 
## 
## Comparison: High-Low 
##   Estimated mean difference: 0.032 g/cm²
##   95% Confidence Interval: [ 0.0178 ,  0.0462 ]
##   Adjusted p-value: 4.085e-07 
##   Statistical Significance: Yes 
##   Direction: First group higher 
## 
## Comparison: High-Moderate 
##   Estimated mean difference: -0.0055 g/cm²
##   95% Confidence Interval: [ -0.0212 ,  0.0102 ]
##   Adjusted p-value: 0.6884 
##   Statistical Significance: No 
##   Direction: No difference

Key Findings

High vs Low activity: People with high activity have significantly higher BMD (+0.040 g/cm²), p < 0.001

Moderate vs Low activity: People with moderate activity have significantly higher BMD (+0.025 g/cm²), p = 0.004

High vs Moderate activity: No significant difference (p = 0.624), suggesting the benefit plateaus at moderate activity levels


Effect Size: Eta-Squared

Statistical significance ≠ Practical importance

How much of the variance in BMD is explained by physical activity? We calculate eta-squared (η²), a measure of effect size.

# Calculate eta-squared
ss_treatment <- anova_table[[1]]$`Sum Sq`[1]
ss_total <- sum(anova_table[[1]]$`Sum Sq`)
eta_squared <- ss_treatment / ss_total

We calculate eta-squared (η²) using the formula η² = SST / SSY, which represents the proportion of total variance explained by the treatment factor.

Our calculated effect size is η² = 0.0211. This means physical activity explains approximately 2.11% of the variance in BMD.

To interpret this effect size, we use conventional guidelines: η² = 0.01 (small), η² = 0.06 (medium), and η² = 0.14 (large). In our case, the effect is small.

While the F-test indicates a statistically significant difference across activity groups, the practical magnitude of this effect is modest. Other factors not included in our model—such as genetics, age, gender, calcium intake, and hormonal status—likely play substantial roles in determining bone mineral density.

Watch Out: Statistical vs Practical Significance

This example shows η² = 0.012 (about 1%). This is a small effect size, meaning:

  • Physical activity explains only 1.2% of BMD variance
  • Other unmeasured factors explain the remaining 98.8%
  • The effect is real but modest in magnitude

Lesson: Always report effect sizes alongside p-values to give readers context about practical importance.


Model Diagnostics

Always check that the fitted ANOVA model meets its assumptions. We examine four diagnostic plots:

# Residual diagnostics
par(mfrow = c(2, 2))
plot(anova_model, which = 1:4)
Four diagnostic plots for ANOVA model assumptions. (1) Residuals vs Fitted: should show random scatter with no pattern. (2) Q-Q Plot: points should fall close to the diagonal line, indicating normality. (3) Scale-Location: should be roughly horizontal, indicating constant variance. (4) Residuals vs Leverage: look for points outside Cook's distance lines, indicating influential outliers.

Four diagnostic plots for ANOVA model assumptions. (1) Residuals vs Fitted: should show random scatter with no pattern. (2) Q-Q Plot: points should fall close to the diagonal line, indicating normality. (3) Scale-Location: should be roughly horizontal, indicating constant variance. (4) Residuals vs Leverage: look for points outside Cook’s distance lines, indicating influential outliers.

par(mfrow = c(1, 1))

Interpreting the diagnostic plots:

The Residuals vs Fitted plot (top-left) should show random scatter with no visible pattern. Points should be roughly equally spread above and below zero, indicating that the independence assumption holds and the relationship between the predictor and outcome is appropriately linear for this model.

The Q-Q plot (top-right) assesses normality by comparing quantiles of the residuals to quantiles of a standard normal distribution. Points should fall close to the diagonal line. Deviations at the tails, particularly pronounced departures, suggest departures from normality.

The Scale-Location plot (bottom-left) tests the homogeneity of variance assumption. The red line should be roughly horizontal, and points should not form a clear upward or downward trend. A consistently rising or falling trend would suggest that variance increases or decreases across fitted values.

The Residuals vs Leverage plot (bottom-right) identifies outliers and influential observations. Points outside the dashed Cook’s distance lines are highly influential and warrant investigation—they may disproportionately affect parameter estimates.

ANOVA is robust to moderate violations of assumptions, especially when sample sizes are large and group sizes are balanced. With our substantial sample size and comparable group sizes, minor departures from these assumptions are unlikely to invalidate our conclusions.

Formal Tests

# Test normality within each group
for (group in levels(bmd_data$activity_group)) {
  group_data <- bmd_data %>% filter(activity_group == group)
  shapiro_test <- shapiro.test(group_data$BMD)
  cat(group, "activity:\n")
  cat("  W =", round(shapiro_test$statistic, 4), 
      ", p-value =", format.pval(shapiro_test$p.value, digits = 3), "\n")
  cat("  Interpretation:", 
      ifelse(shapiro_test$p.value < 0.05, 
             "Reject normality (p < 0.05)", 
             "Do not reject normality (p >= 0.05)"), "\n\n")
}
## Low activity:
##   W = 0.9957 , p-value = 7e-05 
##   Interpretation: Reject normality (p < 0.05) 
## 
## Moderate activity:
##   W = 0.9968 , p-value = 0.0791 
##   Interpretation: Do not reject normality (p >= 0.05) 
## 
## High activity:
##   W = 0.9946 , p-value = 0.0694 
##   Interpretation: Do not reject normality (p >= 0.05)
# Test homogeneity of variance
library(car)
levene_test <- leveneTest(BMD ~ activity_group, data = bmd_data)
print(levene_test)
## Levene's Test for Homogeneity of Variance (center = median)
##         Df F value  Pr(>F)  
## group    2  3.0766 0.04625 *
##       3140                  
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
cat("\nInterpretation:", 
    ifelse(levene_test$`Pr(>F)`[1] < 0.05, 
           "Reject equal variances (p < 0.05)", 
           "Do not reject equal variances (p >= 0.05)"), "\n")
## 
## Interpretation: Reject equal variances (p < 0.05)

To formally test our assumptions, we conduct a Shapiro-Wilk test for normality within each activity group and Levene’s test for homogeneity of variance across groups. The Shapiro-Wilk test examines whether the distribution of BMD within each activity level departs significantly from normality. The null hypothesis (H₀) is that the data are normally distributed; a p-value greater than 0.05 suggests we do not have evidence against normality. Levene’s test evaluates whether variance is approximately equal across the three activity groups. A non-significant result (p ≥ 0.05) indicates homogeneous variances across groups, supporting the equal variance assumption underlying ANOVA.

Even if these formal tests suggest minor violations, ANOVA remains trustworthy with large, balanced samples. The robustness of ANOVA to assumption violations improves as sample size and balance increase.


Writing Up Results

Statistical Results Section

We conducted a one-way ANOVA to test whether bone mineral density (BMD) differed across three levels of physical activity (low, moderate, high). There were significant differences in mean BMD across activity groups, F(2, 3140) = 33.89, p < 0.001, with an effect size of η² = 0.021.

Interpretation: The F-statistic of 33.89 means the between-group variation is about 34 times larger than the within-group variation. The p-value (< 0.001) indicates this difference is extremely unlikely to have occurred by chance if all groups truly had the same mean.

Post-hoc Tukey HSD tests revealed that individuals with high physical activity had significantly higher mean BMD compared to those with low activity (mean difference = 0.038, 95% CI [0.026, 0.049], p < 0.001). Similarly, moderate activity was associated with higher BMD compared to low activity (mean difference = 0.032, 95% CI [0.018, 0.046], p = 0). The difference between moderate and high activity groups was not statistically significant (p = 0.688), suggesting that the benefit of moderate activity approaches that of high activity.

While statistically significant, the effect size was small (η² = 0.021), indicating that physical activity explains only 2.1% of variance in BMD. Other unmeasured factors (e.g., genetics, nutrition, hormonal status) likely play larger roles in determining bone mineral density.


Key Takeaways

When to Use ANOVA

✓ Comparing 3 or more group means
✓ Single categorical predictor (one-way ANOVA)
✓ Want to control Type I error rate across multiple comparisons
✓ Continuous outcome variable with approximately normal distribution

ANOVA Workflow

  1. Check assumptions (independence, normality, equal variances)
  2. Conduct ANOVA F-test (is there any difference among groups?)
  3. Interpret p-value (compare to α = 0.05)
  4. If p < 0.05, conduct post-hoc tests (which specific groups differ?)
  5. Report effect sizes (η²) and confidence intervals (not just p-values)
  6. Check diagnostic plots (verify assumption reasonableness)

Important Reminders

  • ANOVA tests whether any difference exists; post-hoc tests identify which pairs differ
  • Always control for Type I error inflation in multiple comparisons
  • Tukey-Kramer is preferred over Bonferroni for all pairwise comparisons because it maintains power
  • Report effect sizes (η²), not just p-values, to give readers a sense of practical significance
  • Check assumptions, but remember ANOVA is robust to moderate violations with large, balanced samples
  • Interpret results in context: statistically significant doesn’t always mean practically meaningful

Your R Toolbox

Task R Function
Fit ANOVA aov() or lm()
View ANOVA table summary(aov_model) or anova(lm_model)
Post-hoc test TukeyHSD()
Check normality shapiro.test()
Check equal variances leveneTest() (requires car package)
Model diagnostics plot(aov_model) creates four diagnostic plots
Effect size Calculate η² = SST / SSY (treatment variance / total variance)

Table 2: R functions for ANOVA analysis


References & Further Reading

  • Kleinbaum, D. G., Kupper, L. L., Nizam, A., & Rosenberg, E. S. (2013). Applied regression analysis and other multivariable methods (5th ed.). Cengage Learning.

  • Wickham, H. (2016). ggplot2: Elegant graphics for data analysis (2nd ed.). Springer-Verlag.

  • R Core Team. (2023). R: A language and environment for statistical computing. R Foundation for Statistical Computing. https://www.R-project.org/


Last updated: February 05, 2026