Time: ~30 minutes
Goal: Practice one-way ANOVA analysis from start to finish using real public health data
Learning Objectives:
Structure:
Submission: Upload your completed .Rmd file and published to Brightspace by the end of class.
Why ANOVA? We have one continuous outcome (SBP) and one categorical predictor with THREE groups (BMI category). Using multiple t-tests would inflate our Type I error rate.
# Load necessary libraries
library(tidyverse) # For data manipulation and visualization
library(knitr) # For nice tables
library(car) # For Levene's test
library(NHANES) # NHANES dataset
# Load the NHANES data
data(NHANES)Create analysis dataset:
# Set seed for reproducibility
set.seed(553)
# Create BMI categories and prepare data
bp_bmi_data <- NHANES %>%
filter(Age >= 18 & Age <= 65) %>% # Adults 18-65
filter(!is.na(BPSysAve) & !is.na(BMI)) %>%
mutate(
bmi_category = case_when(
BMI < 25 ~ "Normal",
BMI >= 25 & BMI < 30 ~ "Overweight",
BMI >= 30 ~ "Obese",
TRUE ~ NA_character_
),
bmi_category = factor(bmi_category,
levels = c("Normal", "Overweight", "Obese"))
) %>%
filter(!is.na(bmi_category)) %>%
select(ID, Age, Gender, BPSysAve, BMI, bmi_category)
# Display first few rows
head(bp_bmi_data) %>%
kable(caption = "Blood Pressure and BMI Dataset (first 6 rows)")| ID | Age | Gender | BPSysAve | BMI | bmi_category |
|---|---|---|---|---|---|
| 51624 | 34 | male | 113 | 32.22 | Obese |
| 51624 | 34 | male | 113 | 32.22 | Obese |
| 51624 | 34 | male | 113 | 32.22 | Obese |
| 51630 | 49 | female | 112 | 30.57 | Obese |
| 51647 | 45 | female | 118 | 27.24 | Overweight |
| 51647 | 45 | female | 118 | 27.24 | Overweight |
##
## Normal Overweight Obese
## 1939 1937 2150
Interpretation: We have 6026 adults with complete BP and BMI data across three BMI categories.
# Calculate summary statistics by BMI category
summary_stats <- bp_bmi_data %>%
group_by(bmi_category) %>%
summarise(
n = n(),
Mean = mean(BPSysAve),
SD = sd(BPSysAve),
Median = median(BPSysAve),
Min = min(BPSysAve),
Max = max(BPSysAve)
)
summary_stats %>%
kable(digits = 2,
caption = "Descriptive Statistics: Systolic BP by BMI Category")| bmi_category | n | Mean | SD | Median | Min | Max |
|---|---|---|---|---|---|---|
| Normal | 1939 | 114.23 | 15.01 | 113 | 78 | 221 |
| Overweight | 1937 | 118.74 | 13.86 | 117 | 83 | 186 |
| Obese | 2150 | 121.62 | 15.27 | 120 | 82 | 226 |
Observation: The mean SBP appears to increase from Normal (114.2) to Overweight (118.7) to Obese (121.6).
# Create boxplots with individual points
ggplot(bp_bmi_data,
aes(x = bmi_category, y = BPSysAve, fill = bmi_category)) +
geom_boxplot(alpha = 0.7, outlier.shape = NA) +
geom_jitter(width = 0.2, alpha = 0.1, size = 0.5) +
scale_fill_brewer(palette = "Set2") +
labs(
title = "Systolic Blood Pressure by BMI Category",
subtitle = "NHANES Data, Adults aged 18-65",
x = "BMI Category",
y = "Systolic Blood Pressure (mmHg)",
fill = "BMI Category"
) +
theme_minimal(base_size = 12) +
theme(legend.position = "none")What the plot tells us:
Null Hypothesis (H₀): μ_Normal = μ_Overweight =
μ_Obese
(All three population means are equal)
Alternative Hypothesis (H₁): At least one population mean differs from the others
Significance level: α = 0.05
# Fit the one-way ANOVA model
anova_model <- aov(BPSysAve ~ bmi_category, data = bp_bmi_data)
# Display the ANOVA table
summary(anova_model)## Df Sum Sq Mean Sq F value Pr(>F)
## bmi_category 2 56212 28106 129.2 <2e-16 ***
## Residuals 6023 1309859 217
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
Interpretation:
Why do we need this? The F-test tells us that groups differ, but not which groups differ. Tukey’s Honest Significant Difference controls the family-wise error rate for multiple pairwise comparisons.
## Tukey multiple comparisons of means
## 95% family-wise confidence level
##
## Fit: aov(formula = BPSysAve ~ bmi_category, data = bp_bmi_data)
##
## $bmi_category
## diff lwr upr p adj
## Overweight-Normal 4.507724 3.397134 5.618314 0
## Obese-Normal 7.391744 6.309024 8.474464 0
## Obese-Overweight 2.884019 1.801006 3.967033 0
Interpretation:
| Comparison | Mean Diff | 95% CI | p-value | Significant? |
|---|---|---|---|---|
| Overweight - Normal | 4.51 | [3.4, 5.62] | 3.82e-12 | Yes |
| Obese - Normal | 7.39 | [6.31, 8.47] | < 0.001 | Yes |
| Obese - Overweight | 2.88 | [1.8, 3.97] | 1.38e-09 | Yes |
Conclusion: All three pairwise comparisons are statistically significant. Obese adults have higher SBP than overweight adults, who in turn have higher SBP than normal-weight adults.
# Extract sum of squares from ANOVA table
anova_summary <- summary(anova_model)[[1]]
ss_treatment <- anova_summary$`Sum Sq`[1]
ss_total <- sum(anova_summary$`Sum Sq`)
# Calculate eta-squared
eta_squared <- ss_treatment / ss_total
cat("Eta-squared (η²):", round(eta_squared, 4), "\n")## Eta-squared (η²): 0.0411
## Percentage of variance explained: 4.11 %
Interpretation: BMI category explains 4.11% of the variance in systolic BP.
While statistically significant, the practical effect is modest—BMI category alone doesn’t explain most of the variation in blood pressure.
ANOVA Assumptions:
Diagnostic Plot Interpretation:
# Levene's test for homogeneity of variance
levene_test <- leveneTest(BPSysAve ~ bmi_category, data = bp_bmi_data)
print(levene_test)## Levene's Test for Homogeneity of Variance (center = median)
## Df F value Pr(>F)
## group 2 2.7615 0.06328 .
## 6023
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
Levene’s Test Interpretation:
Overall Assessment: With n > 2000, ANOVA is robust to minor violations. Our assumptions are reasonably satisfied.
Example Results Section:
We conducted a one-way ANOVA to examine whether mean systolic blood pressure (SBP) differs across BMI categories (Normal, Overweight, Obese) among 6,026 adults aged 18-65 from NHANES. Descriptive statistics showed mean SBP of 114.2 mmHg (SD = 15) for normal weight, 118.7 mmHg (SD = 13.9) for overweight, and 121.6 mmHg (SD = 15.3) for obese individuals.
The ANOVA revealed a statistically significant difference in mean SBP across BMI categories, F(2, 6023) = 129.24, p < 0.001. Tukey’s HSD post-hoc tests indicated that all pairwise comparisons were significant (p < 0.05): obese adults had on average 7.4 mmHg higher SBP than normal-weight adults, and 2.9 mmHg higher than overweight adults.
The effect size (η² = 0.041) indicates that BMI category explains 4.1% of the variance in systolic blood pressure, representing a small practical effect. These findings support the well-established relationship between higher BMI and elevated blood pressure, though other factors account for most of the variation in SBP.
Your Task: Complete the same 9-step analysis workflow you just practiced, but now on a different outcome and predictor.
# Prepare the dataset
set.seed(553)
mental_health_data <- NHANES %>%
filter(Age >= 18) %>%
filter(!is.na(DaysMentHlthBad) & !is.na(PhysActive)) %>%
mutate(
activity_level = case_when(
PhysActive == "No" ~ "None",
PhysActive == "Yes" & !is.na(PhysActiveDays) & PhysActiveDays < 3 ~ "Moderate",
PhysActive == "Yes" & !is.na(PhysActiveDays) & PhysActiveDays >= 3 ~ "Vigorous",
TRUE ~ NA_character_
),
activity_level = factor(activity_level,
levels = c("None", "Moderate", "Vigorous"))
) %>%
filter(!is.na(activity_level)) %>%
select(ID, Age, Gender, DaysMentHlthBad, PhysActive, activity_level)
# Display first few rows: Display the first 6 rows and check sample sizes
head(mental_health_data) %>%
kable(caption = "First 6 rows of Mental Health dataset")| ID | Age | Gender | DaysMentHlthBad | PhysActive | activity_level |
|---|---|---|---|---|---|
| 51624 | 34 | male | 15 | No | None |
| 51624 | 34 | male | 15 | No | None |
| 51624 | 34 | male | 15 | No | None |
| 51630 | 49 | female | 10 | No | None |
| 51647 | 45 | female | 3 | Yes | Vigorous |
| 51647 | 45 | female | 3 | Yes | Vigorous |
## Sample size: 5757 adults aged 18+
##
## None Moderate Vigorous
## 3139 768 1850
Answer these questions:
# Calculate summary statistics by activity level
# Variables to summarize: n, Mean, SD, Median, Min, Max
summary_stats <- mental_health_data %>%
group_by(activity_level) %>%
summarize (
n = n(),
Mean = mean(DaysMentHlthBad),
SD = sd(DaysMentHlthBad),
Median = median(DaysMentHlthBad),
Min = min(DaysMentHlthBad),
Max = max(DaysMentHlthBad)
)
summary_stats %>%
kable(digits = 2,
caption = "Descriptive Statistics: Mean Bad Mental Health Days by Activity Level")| activity_level | n | Mean | SD | Median | Min | Max |
|---|---|---|---|---|---|---|
| None | 3139 | 5.08 | 9.01 | 0 | 0 | 30 |
| Moderate | 768 | 3.81 | 6.87 | 0 | 0 | 30 |
| Vigorous | 1850 | 3.54 | 7.17 | 0 | 0 | 30 |
Interpret:
The group with the highest mean number of bad mental health days are those who are classified under none for their activity level with a mean of 5.08.
The group with the lowest mean number of bad mental health days are those who are classified under vigorous for their activity level with a mean of 3.54.
# Create boxplots comparing DaysMentHlthBad across activity levels
ggplot(mental_health_data,
aes (x = activity_level, y = DaysMentHlthBad, fill = activity_level)) +
geom_boxplot(alpha = 0.7, outlier.shape = NA) +
geom_jitter (width = 0.2, alpha = 0.1, size = 0.5) +
scale_fill_brewer (palette = "Set2") +
labs(
title = "Mean Bad Mental Health Days by Activity Level",
subtitle = "NHANES Data, Adults aged 18+",
x = "Activity Level",
y = "Bad Mental Health Days",
fill = "Activity Level"
) +
theme_minimal (base_size = 12) +
theme (legend.position = "none")Describe what you see:
There appears to be a trend: higher activity level (vigorous) appear to have less bad mental health days
Variability (box heights) look similar across the none and moderate activity levels whereas vigorous appears to have a shorter box height or less variability.
Write the hypotheses:
Null Hypothesis (H₀):
Null Hypothesis (H₀): μ_None = μ_Moderate = μ_Vigorous (All three population means are equal)
Alternative Hypothesis (H₁):
Alternative Hypothesis (H₁): At least one population mean differs from the others
Significance level: α = 0.05
# Fit the ANOVA model
# Outcome: DaysMentHlthBad
# Predictor: activity_level
anova_model <- aov(DaysMentHlthBad ~ activity_level, data = mental_health_data)
# Display the ANOVA table
summary(anova_model)## Df Sum Sq Mean Sq F value Pr(>F)
## activity_level 2 3109 1554.6 23.17 9.52e-11 ***
## Residuals 5754 386089 67.1
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
Extract and interpret the results:
# Conduct Tukey HSD test
# Only if your ANOVA p-value < 0.05
tukey_results <- TukeyHSD(anova_model)
print(tukey_results)## Tukey multiple comparisons of means
## 95% family-wise confidence level
##
## Fit: aov(formula = DaysMentHlthBad ~ activity_level, data = mental_health_data)
##
## $activity_level
## diff lwr upr p adj
## Moderate-None -1.2725867 -2.045657 -0.4995169 0.0003386
## Vigorous-None -1.5464873 -2.109345 -0.9836298 0.0000000
## Vigorous-Moderate -0.2739006 -1.098213 0.5504114 0.7159887
Complete the table:
| Comparison | Mean Difference | 95% CI Lower | 95% CI Upper | p-value | Significant? |
|---|---|---|---|---|---|
| Moderate - None | -1.27 | -2.0 | -0.5 | <.001 | Yes |
| Vigorous - None | -1.55 | -2.1 | -1.0 | <.001 | Yes |
| Vigorous - Moderate | -0.27 | -1.1 | 0.6 | 0.716 | No |
Interpretation:
Which specific groups differ significantly?
Moderate compared to none and vigorous compared to none are considered statistically significant; however, vigorous compared to moderate is not statistically significant. None active individuals have higher mean number bad mental health days when compared to both moderate and vigorous activity level; however, there is not a statistically significant difference when comparing mean number bad mental health days in individuals with moderate and vigorous activity level.
# Calculate eta-squared
# Extract Sum Sq from the ANOVA summary
anova_summary <- summary(anova_model)[[1]]
ss_treatment <- anova_summary$'Sum Sq'[1]
ss_total <- sum(anova_summary$'Sum Sq')
eta_squared <- ss_treatment / ss_total
cat("Eta-squared (η²):", round(eta_squared, 4), "/n")## Eta-squared (η²): 0.008 /n
# Percentage of variance explained
cat("Percentage of variance explained:", round(eta_squared * 100, 2), "%")## Percentage of variance explained: 0.8 %
Interpret:
η² = 0.008
Percentage of variance explained: 0.8%
Effect size classification (small/medium/large):
Small (less than 0.01)
While statistically significant, the practical effect is minor - activity level alone doesn’t explain most of the variation in number of bad mental health days.
Evaluate each plot:
We see three vertical columns of points, with each representing a group of activity level. The red line is relatively flat and stays close to zero even though it does seem to lower slightly as it gets further away.
Since the points do not follow the diagonal line in the top right, where they are curved away, we would conclude right-skewness. Since our sample size is large, Central LImit Theorem allows us to use ANOVA as it is robust to this violation.
Red line is relatively flat except for a slight incline upward. The spread appears to be consistent across groups because the columns are approximately the same height. This could give insight to the Levene’s test which rejected the null hypotheis of equal variances (unequal variances found).
Since there are no points in the corners that cross the dashed red lines called “Cook’s Distance,” there is no single outlier unfairly driving our results.
# Conduct Levene's test
levene_test <- leveneTest(DaysMentHlthBad ~ activity_level, data = mental_health_data)
print(levene_test)## Levene's Test for Homogeneity of Variance (center = median)
## Df F value Pr(>F)
## group 2 23.168 9.517e-11 ***
## 5754
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
Overall assessment:
Our assumptions are reasonably satisfied because we have an n > 2000. ANOVA is robust to minor violations, when the sample size is this large.
P-value: 9.517e-11 Since p < 0.05, we would reject equal variances which would threaten our conclusions; however, we have a large sample size.
Write a complete 2-3 paragraph results section:
Include: 1. Sample description and descriptive statistics 2. F-test results 3. Post-hoc comparisons (if applicable) 4. Effect size interpretation 5. Public health significance
Your Results Section:
We conducted a one-way ANOVA to examine whether there is a difference in the mean number of days with poor mental health across three physical activity levels (none, moderate, vigorous) among 5757 adults aged 18 and over from NHANES. Descriptive statistics showed a mean number of bad mental health days of 5.08 (SD = 9.01) for activity level none, 3.81 for activity level moderate (SD = 6.87), and 3.54 for activity level vigorous (SD = 7.17).
The ANOVA revealed a statistically significant mean difference in number of bad mental health days across activity levels, F(2, 5754) = 23.17, p < 0.0001. Tukey’s HSD post-hoc tests indicated that two pairwise comparisons were significant (p < 0.05), individuals with activity level of moderate had on average 1.27 less bad mental health days than those classified as none and individuals with an activity level of vigorous had on average 1.55 less bad mental health days than those classified as none. The pairwise comparison of moderate to vigorous activity level was not significant (p = 0.716) with an average 0.27 less bad mental health days.
The effect size (η² = 0.008) indicates that activity level explains 0.8% of the variance in the number of bad mental health days, representing a very small practical effect. These findings support the well-established relationship between lower activity level and increased bad mental health days, though it appears that other factors account for most of the variance in number of bad mental health days.
1. How does the effect size help you understand the practical vs. statistical significance?
Eta-squared can help us understand effect size because it is a proportion of the total variance explained by the treatment group. Sometimes, the statistical versus practical significance have alternate meanings from one another. Although an effect size may be considered statistical, it may not be practical and vice versa. Effect size gives us conventional guidelines to compare our effect to and understand not only its statistical but also practical significance. For this reason, we should always report p-values with effect size to provide an overview.
2. Why is it important to check ANOVA assumptions? What might happen if they’re violated?
It is important to check assumptions in ANOVA especially when we have a small, unbalanced sample. If the ANOVA does not meet the three key assumptions and they are violated, then the validity of the p-values and inference may be invalid. If the sample size is above 30, ANOVA is typically robust to moderate violations of these assumptions. With the very small, unbalanced sample, we need to pay more attention to the violators because the outliers can have large effects on F-test validity.
3. In public health practice, when might you choose to use ANOVA?
ANOVA seems to be used in countless public health practices. It can be used to understand medication effectiveness such as monitoring blood pressure across different medication groups. It could be used to study environmental health by studying mean ospital admissions by different levels of pollution. In population health, it could be used to compare vaccination rates across geographic units. Finally for occupational health, it could be employed to understand differences in lung function based on occupation.
4. What was the most challenging part of this lab activity?
I think the most challenging part of this lab was interpreting the model diagnostics to check that the fitted ANOVA model meets all of its assumptions. I had never analyzed these plots before, so I truly did not know what I was supposed to be looking for as signifiers to denote failed assumptions.
Before submitting, verify you have:
To submit: Upload both your .Rmd file and the HTML output to Brightspace.
Lab completed on: February 05, 2026
Total Points: 15
| Category | Criteria | Points | Notes |
|---|---|---|---|
| Code Execution | All code chunks run without errors | 4 | - Deduct 1 pt per major error - Deduct 0.5 pt per minor warning |
| Completion | All “YOUR TURN” sections attempted | 4 | - Part B Steps 1-9 completed - All fill-in-the-blank answered - Tukey table filled in |
| Interpretation | Correct statistical interpretation | 4 | - Hypotheses correctly stated (1 pt) - ANOVA results interpreted (1 pt) - Post-hoc results interpreted (1 pt) - Assumptions evaluated (1 pt) |
| Results Section | Professional, complete write-up | 3 | - Includes descriptive stats (1 pt) - Reports F-test & post-hoc (1 pt) - Effect size & significance (1 pt) |
Code Execution (4 points):
Completion (4 points):
Interpretation (4 points):
Results Section (3 points):