Time: ~30 minutes
Goal: Practice one-way ANOVA analysis from start to finish using real public health data
Learning Objectives:
Structure:
Submission: Upload your completed .Rmd file and the published HTML output to Brightspace by the end of class.
Why ANOVA? We have one continuous outcome (SBP) and one categorical predictor with THREE groups (BMI category). Using multiple t-tests would inflate our Type I error rate.
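To make the inflation concrete, here is a back-of-the-envelope sketch. It assumes the three pairwise t-tests are independent, which is not strictly true when the tests share groups, so treat the result as an approximation rather than the exact rate.

```r
# Approximate family-wise Type I error rate for k tests at level alpha,
# ASSUMING the tests are independent (a simplification).
alpha <- 0.05
k <- choose(3, 2)            # 3 groups -> 3 pairwise comparisons
fwer <- 1 - (1 - alpha)^k    # chance of at least one false positive
round(fwer, 3)               # 0.143
```

Under that assumption, running three separate tests at α = 0.05 nearly triples the chance of at least one false positive.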
# Load necessary libraries
library(tidyverse) # For data manipulation and visualization
library(knitr) # For nice tables
library(car) # For Levene's test
library(NHANES) # NHANES dataset
# Load the NHANES data
data(NHANES)
Create analysis dataset:
# Set seed for reproducibility
set.seed(553)
# Create BMI categories and prepare data
bp_bmi_data <- NHANES %>%
filter(Age >= 18 & Age <= 65) %>% # Adults 18-65
filter(!is.na(BPSysAve) & !is.na(BMI)) %>%
mutate(
bmi_category = case_when(
BMI < 25 ~ "Normal",
BMI >= 25 & BMI < 30 ~ "Overweight",
BMI >= 30 ~ "Obese",
TRUE ~ NA_character_
),
bmi_category = factor(bmi_category,
levels = c("Normal", "Overweight", "Obese"))
) %>%
filter(!is.na(bmi_category)) %>%
select(ID, Age, Gender, BPSysAve, BMI, bmi_category)
# Display first few rows
head(bp_bmi_data) %>%
kable(caption = "Blood Pressure and BMI Dataset (first 6 rows)")
| ID | Age | Gender | BPSysAve | BMI | bmi_category |
|---|---|---|---|---|---|
| 51624 | 34 | male | 113 | 32.22 | Obese |
| 51624 | 34 | male | 113 | 32.22 | Obese |
| 51624 | 34 | male | 113 | 32.22 | Obese |
| 51630 | 49 | female | 112 | 30.57 | Obese |
| 51647 | 45 | female | 118 | 27.24 | Overweight |
| 51647 | 45 | female | 118 | 27.24 | Overweight |
##
## Normal Overweight Obese
## 1939 1937 2150
Interpretation: We have 6026 adults with complete BP and BMI data across three BMI categories.
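The same three categories can also be built with base R's `cut()`. This is an alternative sketch, not the lab's method, and the BMI values below are made up to exercise the cut points. Setting `right = FALSE` closes each interval on the left, which matches the `>=` comparisons in the `case_when()` call.

```r
# Hypothetical BMI values sitting on or next to the category boundaries
bmi <- c(24.9, 25.0, 29.9, 30.0)

bmi_category <- cut(
  bmi,
  breaks = c(-Inf, 25, 30, Inf),                # [-Inf, 25), [25, 30), [30, Inf)
  labels = c("Normal", "Overweight", "Obese"),
  right  = FALSE                                # intervals closed on the left
)

as.character(bmi_category)
table(bmi_category)                             # counts per category
```

`cut()` returns a factor whose levels follow the order of `labels`, so no separate `factor()` call is needed.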
# Calculate summary statistics by BMI category
summary_stats <- bp_bmi_data %>%
group_by(bmi_category) %>%
summarise(
n = n(),
Mean = mean(BPSysAve),
SD = sd(BPSysAve),
Median = median(BPSysAve),
Min = min(BPSysAve),
Max = max(BPSysAve)
)
summary_stats %>%
kable(digits = 2,
caption = "Descriptive Statistics: Systolic BP by BMI Category")
| bmi_category | n | Mean | SD | Median | Min | Max |
|---|---|---|---|---|---|---|
| Normal | 1939 | 114.23 | 15.01 | 113 | 78 | 221 |
| Overweight | 1937 | 118.74 | 13.86 | 117 | 83 | 186 |
| Obese | 2150 | 121.62 | 15.27 | 120 | 82 | 226 |
Observation: The mean SBP appears to increase from Normal (114.2) to Overweight (118.7) to Obese (121.6).
# Create boxplots with individual points
ggplot(bp_bmi_data,
aes(x = bmi_category, y = BPSysAve, fill = bmi_category)) +
geom_boxplot(alpha = 0.7, outlier.shape = NA) +
geom_jitter(width = 0.2, alpha = 0.1, size = 0.5) +
scale_fill_brewer(palette = "Set2") +
labs(
title = "Systolic Blood Pressure by BMI Category",
subtitle = "NHANES Data, Adults aged 18-65",
x = "BMI Category",
y = "Systolic Blood Pressure (mmHg)",
fill = "BMI Category"
) +
theme_minimal(base_size = 12) +
theme(legend.position = "none")
What the plot tells us:
Null Hypothesis (H₀): μ_Normal = μ_Overweight = μ_Obese (All three population means are equal)
Alternative Hypothesis (H₁): At least one population mean differs from the others
Significance level: α = 0.05
# Fit the one-way ANOVA model
anova_model <- aov(BPSysAve ~ bmi_category, data = bp_bmi_data)
# Display the ANOVA table
summary(anova_model)
## Df Sum Sq Mean Sq F value Pr(>F)
## bmi_category 2 56212 28106 129.2 <2e-16 ***
## Residuals 6023 1309859 217
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
Interpretation:
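As a check on how the ANOVA table fits together, the F statistic can be recomputed from the printed sums of squares and degrees of freedom. The inputs are copied from the table as printed, so they are rounded and the result matches the table only to about one decimal place.

```r
# Values copied from the ANOVA table above (rounded as printed)
ss_between <- 56212;   df_between <- 2      # bmi_category row
ss_within  <- 1309859; df_within  <- 6023   # Residuals row

ms_between <- ss_between / df_between       # mean square between groups
ms_within  <- ss_within / df_within         # mean square within groups
f_stat <- ms_between / ms_within

# Upper-tail area of the F distribution gives the p-value
p_value <- pf(f_stat, df_between, df_within, lower.tail = FALSE)

round(f_stat, 1)     # 129.2
p_value < 2e-16      # TRUE
```

A large F means the variation between group means is large relative to the variation within groups.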
Why do we need this? The F-test tells us that groups differ, but not which groups differ. Tukey’s Honest Significant Difference controls the family-wise error rate for multiple pairwise comparisons.
## Tukey multiple comparisons of means
## 95% family-wise confidence level
##
## Fit: aov(formula = BPSysAve ~ bmi_category, data = bp_bmi_data)
##
## $bmi_category
## diff lwr upr p adj
## Overweight-Normal 4.507724 3.397134 5.618314 0
## Obese-Normal 7.391744 6.309024 8.474464 0
## Obese-Overweight 2.884019 1.801006 3.967033 0
Interpretation:
| Comparison | Mean Diff | 95% CI | p-value | Significant? |
|---|---|---|---|---|
| Overweight - Normal | 4.51 | [3.4, 5.62] | 1.98e-13 | Yes |
| Obese - Normal | 7.39 | [6.31, 8.47] | < 0.001 | Yes |
| Obese - Overweight | 2.88 | [1.8, 3.97] | 1.38e-09 | Yes |
Conclusion: All three pairwise comparisons are statistically significant. Obese adults have higher SBP than overweight adults, who in turn have higher SBP than normal-weight adults.
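The call that produces this kind of Tukey output is not shown above. A minimal sketch follows, using R's built-in `PlantGrowth` data as a stand-in for `bp_bmi_data` because it also has one numeric outcome and a three-level factor:

```r
# PlantGrowth ships with R: weight (numeric) by group (ctrl, trt1, trt2)
fit <- aov(weight ~ group, data = PlantGrowth)

# All pairwise differences with family-wise 95% confidence intervals
tukey <- TukeyHSD(fit, conf.level = 0.95)
tukey$group                      # matrix with columns diff, lwr, upr, p adj

# Flag comparisons whose adjusted p-value is below 0.05
tukey$group[, "p adj"] < 0.05
```

For the lab data the analogous call would presumably be `TukeyHSD(anova_model)`, which is consistent with the `Fit:` line in the output above.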
# Extract sum of squares from ANOVA table
anova_summary <- summary(anova_model)[[1]]
ss_treatment <- anova_summary$`Sum Sq`[1]
ss_total <- sum(anova_summary$`Sum Sq`)
# Calculate eta-squared
eta_squared <- ss_treatment / ss_total
cat("Eta-squared (η²):", round(eta_squared, 4), "\n")
## Eta-squared (η²): 0.0411
## Percentage of variance explained: 4.11 %
Interpretation: BMI category explains 4.11% of the variance in systolic BP.
While statistically significant, the practical effect is modest—BMI category alone doesn’t explain most of the variation in blood pressure.
ANOVA Assumptions:
Diagnostic Plot Interpretation:
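The four diagnostic panels can be drawn by calling `plot()` on a fitted `aov` object. The sketch below again uses the built-in `PlantGrowth` data as a stand-in. For the lab, `fit` would be replaced by `anova_model`.

```r
fit <- aov(weight ~ group, data = PlantGrowth)

old_par <- par(mfrow = c(2, 2))   # 2 x 2 grid so all four panels share one page
plot(fit)                         # residuals vs fitted, Q-Q, scale-location, leverage
par(old_par)                      # restore the previous plotting layout
```

With equal group sizes, as in this stand-in data, R replaces the fourth panel with a residuals-by-factor-level plot because every observation has the same leverage.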
# Levene's test for homogeneity of variance
levene_test <- leveneTest(BPSysAve ~ bmi_category, data = bp_bmi_data)
print(levene_test)
## Levene's Test for Homogeneity of Variance (center = median)
## Df F value Pr(>F)
## group 2 2.7615 0.06328 .
## 6023
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
Levene’s Test Interpretation:
Overall Assessment: With roughly 2,000 observations per group, ANOVA is robust to minor violations. Our assumptions are reasonably satisfied.
Example Results Section:
We conducted a one-way ANOVA to examine whether mean systolic blood pressure (SBP) differs across BMI categories (Normal, Overweight, Obese) among 6,026 adults aged 18-65 from NHANES. Descriptive statistics showed mean SBP of 114.2 mmHg (SD = 15) for normal weight, 118.7 mmHg (SD = 13.9) for overweight, and 121.6 mmHg (SD = 15.3) for obese individuals.
The ANOVA revealed a statistically significant difference in mean SBP across BMI categories, F(2, 6023) = 129.24, p < 0.001. Tukey’s HSD post-hoc tests indicated that all pairwise comparisons were significant (p < 0.05): obese adults had on average 7.4 mmHg higher SBP than normal-weight adults, and 2.9 mmHg higher than overweight adults.
The effect size (η² = 0.041) indicates that BMI category explains 4.1% of the variance in systolic blood pressure, representing a small practical effect. These findings support the well-established relationship between higher BMI and elevated blood pressure, though other factors account for most of the variation in SBP.
Your Task: Complete the same 9-step analysis workflow you just practiced, but now on a different outcome and predictor.
# Prepare the dataset
set.seed(553)
mental_health_data <- NHANES %>%
filter(Age >= 18) %>%
filter(!is.na(DaysMentHlthBad) & !is.na(PhysActive)) %>%
mutate(
activity_level = case_when(
PhysActive == "No" ~ "None",
PhysActive == "Yes" & !is.na(PhysActiveDays) & PhysActiveDays < 3 ~ "Moderate",
PhysActive == "Yes" & !is.na(PhysActiveDays) & PhysActiveDays >= 3 ~ "Vigorous",
TRUE ~ NA_character_
),
activity_level = factor(activity_level,
levels = c("None", "Moderate", "Vigorous"))
) %>%
filter(!is.na(activity_level)) %>%
select(ID, Age, Gender, DaysMentHlthBad, PhysActive, activity_level)
# YOUR TURN: Display the first 6 rows and check sample sizes
# Sample sizes by physical activity group
mental_health_data %>%
count(activity_level)
## # A tibble: 3 × 2
## activity_level n
## <fct> <int>
## 1 None 3139
## 2 Moderate 768
## 3 Vigorous 1850
YOUR TURN - Answer these questions:
# YOUR TURN: Calculate summary statistics by activity level
# Hint: Follow the same structure as the guided example
# Variables to summarize: n, Mean, SD, Median, Min, Max
mental_health_data %>%
group_by(activity_level) %>%
summarise(
n = n(),
Mean = mean(DaysMentHlthBad),
SD = sd(DaysMentHlthBad),
Median = median(DaysMentHlthBad),
Min = min(DaysMentHlthBad),
Max = max(DaysMentHlthBad)
)
## # A tibble: 3 × 7
## activity_level n Mean SD Median Min Max
## <fct> <int> <dbl> <dbl> <dbl> <int> <int>
## 1 None 3139 5.08 9.01 0 0 30
## 2 Moderate 768 3.81 6.87 0 0 30
## 3 Vigorous 1850 3.54 7.17 0 0 30
YOUR TURN - Interpret:
# YOUR TURN: Create boxplots comparing DaysMentHlthBad across activity levels
# Hint: Use the same ggplot code structure as the example
# Change variable names and labels appropriately
library(ggplot2)
ggplot(mental_health_data, aes(x = activity_level, y = DaysMentHlthBad)) +
geom_boxplot(fill = "lightblue") +
labs(
title = "Days of Poor Mental Health by Physical Activity Level",
x = "Physical Activity Level",
y = "Days of Poor Mental Health (Past 30 Days)"
) +
theme_minimal()
YOUR TURN - Describe what you see:
YOUR TURN - Write the hypotheses:
Null Hypothesis (H₀): The mean number of days of poor mental health is equal across all physical activity levels (None, Moderate, and Vigorous).
H₀: μ_None = μ_Moderate = μ_Vigorous
Alternative Hypothesis (H₁): At least one physical activity group has a mean number of days of poor mental health that is different from the others.
H₁: At least one μ differs
Significance level: α = 0.05
# YOUR TURN: Fit the ANOVA model
# Outcome: DaysMentHlthBad
# Predictor: activity_level
# Fit the one-way ANOVA model
anova_model <- aov(DaysMentHlthBad ~ activity_level, data = mental_health_data)
# Display the ANOVA table
summary(anova_model)
## Df Sum Sq Mean Sq F value Pr(>F)
## activity_level 2 3109 1554.6 23.17 9.52e-11 ***
## Residuals 5754 386089 67.1
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
YOUR TURN - Extract and interpret the results:
# YOUR TURN: Conduct Tukey HSD test
# Only if your ANOVA p-value < 0.05
anova_model <- aov(DaysMentHlthBad ~ activity_level, data = mental_health_data)
summary(anova_model)
## Df Sum Sq Mean Sq F value Pr(>F)
## activity_level 2 3109 1554.6 23.17 9.52e-11 ***
## Residuals 5754 386089 67.1
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## Tukey multiple comparisons of means
## 95% family-wise confidence level
##
## Fit: aov(formula = DaysMentHlthBad ~ activity_level, data = mental_health_data)
##
## $activity_level
## diff lwr upr p adj
## Moderate-None -1.2725867 -2.045657 -0.4995169 0.0003386
## Vigorous-None -1.5464873 -2.109345 -0.9836298 0.0000000
## Vigorous-Moderate -0.2739006 -1.098213 0.5504114 0.7159887
YOUR TURN - Complete the table:
| Comparison | Mean Difference | 95% CI Lower | 95% CI Upper | p-value | Significant? |
|---|---|---|---|---|---|
| Moderate - None | −1.27 | −2.05 | −0.50 | 0.00034 | Yes |
| Vigorous - None | −1.55 | −2.11 | −0.98 | < 0.001 | Yes |
| Vigorous - Moderate | −0.27 | −1.10 | 0.55 | 0.716 | No |
Interpretation:
Which specific groups differ significantly? Tukey post-hoc comparisons showed that participants in the Moderate and Vigorous physical activity groups reported significantly fewer days of poor mental health than those in the None group. There was no statistically significant difference in the mean number of bad mental health days between the Moderate and Vigorous activity groups.
# YOUR TURN: Calculate eta-squared
# Hint: Extract Sum Sq from the ANOVA summary
anova_summary <- summary(anova_model)[[1]]
eta_sq <- anova_summary["activity_level", "Sum Sq"] /
sum(anova_summary[, "Sum Sq"])
eta_sq
## [1] 0.007988564
YOUR TURN - Interpret:
YOUR TURN - Evaluate each plot:
Residuals vs Fitted: The residuals appear to be randomly scattered around zero with no strong systematic pattern. This suggests that the linearity and independence assumptions are reasonably met, although slight clustering is expected with a discrete outcome variable.
Q-Q Plot: The points follow the reference line fairly closely, with some deviation in the tails. Given the large sample size, these minor departures from normality are not concerning, and the normality assumption is reasonably satisfied.
Scale-Location: The spread of residuals is relatively consistent across fitted values, with no strong funnel shape. This suggests that the homogeneity of variance assumption is adequately met.
Residuals vs Leverage: No observations appear to have both high leverage and large residuals. There are no influential points that would unduly affect the ANOVA results.
# YOUR TURN: Conduct Levene's test
library(car)
leveneTest(DaysMentHlthBad ~ activity_level, data = mental_health_data)
## Levene's Test for Homogeneity of Variance (center = median)
## Df F value Pr(>F)
## group 2 23.168 9.517e-11 ***
## 5754
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
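The Levene F statistic above (23.168) coincides with the ANOVA F statistic for the same data (23.17). A likely explanation: with `center = median`, Levene's test is an ANOVA on the absolute deviations from each group's median. Every group median here is 0 and the outcome cannot be negative, so those deviations are the outcome itself. The sketch below shows the effect on small made-up data (not NHANES values), using only base R:

```r
# Hypothetical zero-heavy data: 10 values per group, each with median 0
days <- c(0, 0, 0, 0, 0, 0, 2, 5, 10, 30,    # None
          0, 0, 0, 0, 0, 0, 1, 3, 7, 14,     # Moderate
          0, 0, 0, 0, 0, 0, 1, 2, 4, 20)     # Vigorous
group <- factor(rep(c("None", "Moderate", "Vigorous"), each = 10),
                levels = c("None", "Moderate", "Vigorous"))

# Median-centered Levene's test by hand: ANOVA on |y - group median|
abs_dev  <- abs(days - ave(days, group, FUN = median))
levene_f <- anova(lm(abs_dev ~ group))[["F value"]][1]

# Ordinary one-way ANOVA on the raw outcome
anova_f <- anova(lm(days ~ group))[["F value"]][1]

all.equal(levene_f, anova_f)   # TRUE: the two tests are the same computation here
```

In this situation Levene's test is really detecting the difference in group means rather than providing a separate check on the spread, so it is worth reading alongside the group standard deviations and the diagnostic plots.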
YOUR TURN - Overall assessment:
YOUR TURN - Write a complete 2-3 paragraph results section:
Include:
1. Sample description and descriptive statistics
2. F-test results
3. Post-hoc comparisons (if applicable)
4. Effect size interpretation
5. Public health significance
Your Results Section:
The analytic sample included 5,757 adults aged 18 years and older with complete data on physical activity and mental health. Participants were categorized into three physical activity groups: None (n = 3,139), Moderate (n = 768), and Vigorous (n = 1,850). Descriptive statistics indicated that the mean number of days of poor mental health in the past 30 days was highest among individuals reporting no physical activity (M = 5.08, SD = 9.01), followed by those engaging in moderate physical activity (M = 3.81, SD = 6.87), and lowest among those engaging in vigorous physical activity (M = 3.54, SD = 7.17). Median values were zero across all groups, reflecting a right-skewed distribution with many participants reporting no days of poor mental health.
A one-way analysis of variance (ANOVA) was conducted to examine differences in mean days of poor mental health across physical activity levels. The ANOVA revealed a statistically significant effect of physical activity level on days of poor mental health, F(2, 5754) = 23.17, p < .001. Tukey post-hoc comparisons showed that individuals in both the Moderate (mean difference = −1.27, p < .001) and Vigorous (mean difference = −1.55, p < .001) activity groups reported significantly fewer days of poor mental health compared to those in the None group. However, there was no statistically significant difference between the Moderate and Vigorous activity groups (p = .716).
The effect size for the ANOVA was small (η² = 0.008), indicating that physical activity level explained approximately 0.8% of the variance in days of poor mental health. While the magnitude of the effect was modest, the findings have important public health implications. Even small reductions in poor mental health days at the population level may translate into meaningful improvements in overall well-being. These results suggest that engaging in regular physical activity—whether moderate or vigorous—is associated with fewer days of poor mental health compared to no physical activity, highlighting physical activity as a potentially important and accessible target for mental health promotion.
1. How does the effect size help you understand the practical vs. statistical significance?
The effect size provides information about the magnitude of the relationship, not just whether it is statistically significant. In this analysis, the ANOVA was statistically significant, but the effect size (η² ≈ 0.008) was small, indicating that physical activity level explains less than 1% of the variation in days of poor mental health. This helps distinguish statistical significance, which can be influenced by large sample sizes, from practical significance, which reflects how meaningful the difference is in real-world terms.
2. Why is it important to check ANOVA assumptions? What might happen if they’re violated?
Checking ANOVA assumptions is important because violations can lead to invalid results, such as inflated Type I error rates or reduced statistical power. If assumptions like normality or homogeneity of variance are severely violated, the F-test may not accurately reflect true group differences. Although ANOVA is fairly robust to minor violations, especially with large samples, serious violations could result in incorrect conclusions about statistical significance.
3. In public health practice, when might you choose to use ANOVA?
ANOVA is useful in public health when comparing the mean value of a continuous outcome across three or more groups, such as comparing average blood pressure across different treatment groups or mental health outcomes across levels of health behaviors. It allows researchers to test for overall group differences before conducting more detailed comparisons, making it an efficient tool for evaluating group-based interventions or population subgroups.
4. What was the most challenging part of this lab activity?
The most challenging part of this lab was interpreting the results beyond statistical significance, particularly understanding how a small effect size can still be meaningful in a large public health context. Additionally, correctly setting up and interpreting post-hoc comparisons required careful attention to reference groups and confidence intervals.
Before submitting, verify you have:
To submit: Upload both your .Rmd file and the HTML output to Brightspace.
Lab completed on: February 05, 2026
Total Points: 15
| Category | Criteria | Points | Notes |
|---|---|---|---|
| Code Execution | All code chunks run without errors | 4 | Deduct 1 pt per major error; deduct 0.5 pt per minor warning |
| Completion | All “YOUR TURN” sections attempted | 4 | Part B Steps 1-9 completed; all fill-in-the-blank answered; Tukey table filled in |
| Interpretation | Correct statistical interpretation | 4 | Hypotheses correctly stated (1 pt); ANOVA results interpreted (1 pt); post-hoc results interpreted (1 pt); assumptions evaluated (1 pt) |
| Results Section | Professional, complete write-up | 3 | Includes descriptive stats (1 pt); reports F-test & post-hoc (1 pt); effect size & significance (1 pt) |
Code Execution (4 points):
Completion (4 points):
Interpretation (4 points):
Results Section (3 points):