Time: ~30 minutes
Goal: Practice one-way ANOVA analysis from start to finish using real public health data
Learning Objectives:
Structure:
Submission: Upload your completed .Rmd file and published to Brightspace by the end of class.
Why ANOVA? We have one continuous outcome (SBP) and one categorical predictor with THREE groups (BMI category). Using multiple t-tests would inflate our Type I error rate.
# Load necessary libraries
library(tidyverse) # For data manipulation and visualization
library(knitr) # For nice tables
library(car) # For Levene's test
library(NHANES) # NHANES dataset
# Load the NHANES data
data(NHANES)Create analysis dataset:
# Set seed for reproducibility
set.seed(553)
# Create BMI categories and prepare data
bp_bmi_data <- NHANES %>%
filter(Age >= 18 & Age <= 65) %>% # Adults 18-65
filter(!is.na(BPSysAve) & !is.na(BMI)) %>%
mutate(
bmi_category = case_when(
BMI < 25 ~ "Normal",
BMI >= 25 & BMI < 30 ~ "Overweight",
BMI >= 30 ~ "Obese",
TRUE ~ NA_character_
),
bmi_category = factor(bmi_category,
levels = c("Normal", "Overweight", "Obese"))
) %>%
filter(!is.na(bmi_category)) %>%
select(ID, Age, Gender, BPSysAve, BMI, bmi_category)
# Display first few rows
head(bp_bmi_data) %>%
kable(caption = "Blood Pressure and BMI Dataset (first 6 rows)")| ID | Age | Gender | BPSysAve | BMI | bmi_category |
|---|---|---|---|---|---|
| 51624 | 34 | male | 113 | 32.22 | Obese |
| 51624 | 34 | male | 113 | 32.22 | Obese |
| 51624 | 34 | male | 113 | 32.22 | Obese |
| 51630 | 49 | female | 112 | 30.57 | Obese |
| 51647 | 45 | female | 118 | 27.24 | Overweight |
| 51647 | 45 | female | 118 | 27.24 | Overweight |
##
## Normal Overweight Obese
## 1939 1937 2150
Interpretation: We have 6026 adults with complete BP and BMI data across three BMI categories.
# Calculate summary statistics by BMI category
summary_stats <- bp_bmi_data %>%
group_by(bmi_category) %>%
summarise(
n = n(),
Mean = mean(BPSysAve),
SD = sd(BPSysAve),
Median = median(BPSysAve),
Min = min(BPSysAve),
Max = max(BPSysAve)
)
summary_stats %>%
kable(digits = 2,
caption = "Descriptive Statistics: Systolic BP by BMI Category")| bmi_category | n | Mean | SD | Median | Min | Max |
|---|---|---|---|---|---|---|
| Normal | 1939 | 114.23 | 15.01 | 113 | 78 | 221 |
| Overweight | 1937 | 118.74 | 13.86 | 117 | 83 | 186 |
| Obese | 2150 | 121.62 | 15.27 | 120 | 82 | 226 |
Observation: The mean SBP appears to increase from Normal (114.2) to Overweight (118.7) to Obese (121.6).
# Create boxplots with individual points
ggplot(bp_bmi_data,
aes(x = bmi_category, y = BPSysAve, fill = bmi_category)) +
geom_boxplot(alpha = 0.7, outlier.shape = NA) +
geom_jitter(width = 0.2, alpha = 0.1, size = 0.5) +
scale_fill_brewer(palette = "Set2") +
labs(
title = "Systolic Blood Pressure by BMI Category",
subtitle = "NHANES Data, Adults aged 18-65",
x = "BMI Category",
y = "Systolic Blood Pressure (mmHg)",
fill = "BMI Category"
) +
theme_minimal(base_size = 12) +
theme(legend.position = "none")What the plot tells us:
Null Hypothesis (H₀): μ_Normal = μ_Overweight =
μ_Obese
(All three population means are equal)
Alternative Hypothesis (H₁): At least one population mean differs from the others
Significance level: α = 0.05
# Fit the one-way ANOVA model
anova_model <- aov(BPSysAve ~ bmi_category, data = bp_bmi_data)
# Display the ANOVA table
summary(anova_model)## Df Sum Sq Mean Sq F value Pr(>F)
## bmi_category 2 56212 28106 129.2 <2e-16 ***
## Residuals 6023 1309859 217
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
Interpretation:
Why do we need this? The F-test tells us that groups differ, but not which groups differ. Tukey’s Honest Significant Difference controls the family-wise error rate for multiple pairwise comparisons.
## Tukey multiple comparisons of means
## 95% family-wise confidence level
##
## Fit: aov(formula = BPSysAve ~ bmi_category, data = bp_bmi_data)
##
## $bmi_category
## diff lwr upr p adj
## Overweight-Normal 4.507724 3.397134 5.618314 0
## Obese-Normal 7.391744 6.309024 8.474464 0
## Obese-Overweight 2.884019 1.801006 3.967033 0
Interpretation:
| Comparison | Mean Diff | 95% CI | p-value | Significant? |
|---|---|---|---|---|
| Overweight - Normal | 4.51 | [3.4, 5.62] | 1.98e-13 | Yes |
| Obese - Normal | 7.39 | [6.31, 8.47] | < 0.001 | Yes |
| Obese - Overweight | 2.88 | [1.8, 3.97] | 1.38e-09 | Yes |
Conclusion: All three pairwise comparisons are statistically significant. Obese adults have higher SBP than overweight adults, who in turn have higher SBP than normal-weight adults.
# Extract sum of squares from ANOVA table
anova_summary <- summary(anova_model)[[1]]
ss_treatment <- anova_summary$`Sum Sq`[1]
ss_total <- sum(anova_summary$`Sum Sq`)
# Calculate eta-squared
eta_squared <- ss_treatment / ss_total
cat("Eta-squared (η²):", round(eta_squared, 4), "\n")## Eta-squared (η²): 0.0411
## Percentage of variance explained: 4.11 %
Interpretation: BMI category explains 4.11% of the variance in systolic BP.
While statistically significant, the practical effect is modest—BMI category alone doesn’t explain most of the variation in blood pressure.
ANOVA Assumptions:
Diagnostic Plot Interpretation:
# Levene's test for homogeneity of variance
levene_test <- leveneTest(BPSysAve ~ bmi_category, data = bp_bmi_data)
print(levene_test)## Levene's Test for Homogeneity of Variance (center = median)
## Df F value Pr(>F)
## group 2 2.7615 0.06328 .
## 6023
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
Levene’s Test Interpretation:
Overall Assessment: With n > 2000, ANOVA is robust to minor violations. Our assumptions are reasonably satisfied.
Example Results Section:
We conducted a one-way ANOVA to examine whether mean systolic blood pressure (SBP) differs across BMI categories (Normal, Overweight, Obese) among 6,026 adults aged 18-65 from NHANES. Descriptive statistics showed mean SBP of 114.2 mmHg (SD = 15) for normal weight, 118.7 mmHg (SD = 13.9) for overweight, and 121.6 mmHg (SD = 15.3) for obese individuals.
The ANOVA revealed a statistically significant difference in mean SBP across BMI categories, F(2, 6023) = 129.24, p < 0.001. Tukey’s HSD post-hoc tests indicated that all pairwise comparisons were significant (p < 0.05): obese adults had on average 7.4 mmHg higher SBP than normal-weight adults, and 2.9 mmHg higher than overweight adults.
The effect size (η² = 0.041) indicates that BMI category explains 4.1% of the variance in systolic blood pressure, representing a small practical effect. These findings support the well-established relationship between higher BMI and elevated blood pressure, though other factors account for most of the variation in SBP.
Your Task: Complete the same 9-step analysis workflow you just practiced, but now on a different outcome and predictor.
# Prepare the dataset
set.seed(553)
mental_health_data <- NHANES %>%
filter(Age >= 18) %>%
filter(!is.na(DaysMentHlthBad) & !is.na(PhysActive)) %>%
mutate(
activity_level = case_when(
PhysActive == "No" ~ "None",
PhysActive == "Yes" & !is.na(PhysActiveDays) & PhysActiveDays < 3 ~ "Moderate",
PhysActive == "Yes" & !is.na(PhysActiveDays) & PhysActiveDays >= 3 ~ "Vigorous",
TRUE ~ NA_character_
),
activity_level = factor(activity_level,
levels = c("None", "Moderate", "Vigorous"))
) %>%
filter(!is.na(activity_level)) %>%
select(ID, Age, Gender, DaysMentHlthBad, PhysActive, activity_level)
# YOUR TURN: Display the first 6 rows and check sample sizes
head(mental_health_data) %>%
kable(caption = "Mental Health and Physical Activity (first 6 rows)")| ID | Age | Gender | DaysMentHlthBad | PhysActive | activity_level |
|---|---|---|---|---|---|
| 51624 | 34 | male | 15 | No | None |
| 51624 | 34 | male | 15 | No | None |
| 51624 | 34 | male | 15 | No | None |
| 51630 | 49 | female | 10 | No | None |
| 51647 | 45 | female | 3 | Yes | Vigorous |
| 51647 | 45 | female | 3 | Yes | Vigorous |
##
## None Moderate Vigorous
## 3139 768 1850
YOUR TURN - Answer these questions:
# YOUR TURN: Calculate summary statistics by activity level
# Hint: Follow the same structure as the guided example
# Variables to summarize: n, Mean, SD, Median, Min, Max
summary_stats <- mental_health_data %>%
group_by(activity_level) %>%
summarise(
n = n(),
Mean = mean(DaysMentHlthBad),
SD = sd(DaysMentHlthBad),
Median = median(DaysMentHlthBad),
Min = min(DaysMentHlthBad),
Max = max(DaysMentHlthBad)
)
summary_stats %>%
kable(digits = 2,
caption = "Descriptive Statistics: Days with Bad Mental Health by Activity level")| activity_level | n | Mean | SD | Median | Min | Max |
|---|---|---|---|---|---|---|
| None | 3139 | 5.08 | 9.01 | 0 | 0 | 30 |
| Moderate | 768 | 3.81 | 6.87 | 0 | 0 | 30 |
| Vigorous | 1850 | 3.54 | 7.17 | 0 | 0 | 30 |
YOUR TURN - Interpret:
Which group has the highest mean number of bad mental health days? -From the table it can be seen that the people who said they are not physically active has to highest number of mean (5.08) regarding having days with bad mental health.
Which group has the lowest? -The group of people who said that they are vigorous in physical activity has the lowest number of mean (3.54) regarding having days with bad mental health.
# YOUR TURN: Create boxplots comparing DaysMentHlthBad across activity levels
# Hint: Use the same ggplot code structure as the example
# Change variable names and labels appropriately
ggplot(mental_health_data,
aes(x = activity_level, y = DaysMentHlthBad, fill = activity_level)) +
geom_boxplot(alpha = 0.7, outlier.shape = NA) +
geom_jitter(width = 0.2, alpha = 0.1, size = 0.5) +
scale_fill_brewer(palette = "Set2") +
labs(
title = "Days with Bad Mental Health by Physical Activity level",
subtitle = "NHANES Data, Adults aged 18-65",
x = "Physical activity level",
y = "Days with Bad Mental Health",
fill = "Physical Activity level"
) +
theme_minimal(base_size = 12) +
theme(legend.position = "none")YOUR TURN - Describe what you see:
YOUR TURN - Write the hypotheses:
Null Hypothesis (H₀): There is no significant difference in the means of the group of the people who do not do physical activity, moderately do physical activity and vigorously do physical activity. μ_none = μ_moderate = μ_vigorous
Alternative Hypothesis (H₁): There are significant difference in the means of at least one group of people of none, moderate and vigorous group.
Significance level: α = 0.05
# YOUR TURN: Fit the ANOVA model
# Outcome: DaysMentHlthBad
# Predictor: activity_level
anova_model <- aov(DaysMentHlthBad ~ activity_level, data = mental_health_data)
# Display the ANOVA table
summary(anova_model)## Df Sum Sq Mean Sq F value Pr(>F)
## activity_level 2 3109 1554.6 23.17 9.52e-11 ***
## Residuals 5754 386089 67.1
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
YOUR TURN - Extract and interpret the results:
# YOUR TURN: Conduct Tukey HSD test
# Only if your ANOVA p-value < 0.05
tukey_results <- TukeyHSD(anova_model)
print(tukey_results)## Tukey multiple comparisons of means
## 95% family-wise confidence level
##
## Fit: aov(formula = DaysMentHlthBad ~ activity_level, data = mental_health_data)
##
## $activity_level
## diff lwr upr p adj
## Moderate-None -1.2725867 -2.045657 -0.4995169 0.0003386
## Vigorous-None -1.5464873 -2.109345 -0.9836298 0.0000000
## Vigorous-Moderate -0.2739006 -1.098213 0.5504114 0.7159887
YOUR TURN - Complete the table:
| Comparison | Mean Difference | 95% CI Lower | 95% CI Upper | p-value | Significant? |
|---|---|---|---|---|---|
| Moderate - None | -1.2725867 | -2.045657 | -0.4995169 | 0.0003386 | Yes |
| Vigorous - None | -1.5464873 | -2.109345 | -0.9836298 | 0.0000000 | Yes |
| Vigorous - Moder | -0.2739006 | -1.098213 | 0.5504114 | 0.7159887 | No |
Interpretation:
Which specific groups differ significantly?
# YOUR TURN: Calculate eta-squared
# Hint: Extract Sum Sq from the ANOVA summary
# Extract sum of squares from ANOVA table
anova_summary <- summary(anova_model)[[1]]
ss_treatment <- anova_summary$`Sum Sq`[1]
ss_total <- sum(anova_summary$`Sum Sq`)
# Calculate eta-squared
eta_squared <- ss_treatment / ss_total
cat("Eta-squared (η²):", round(eta_squared, 4), "\n")## Eta-squared (η²): 0.008
## Percentage of variance explained: 0.8 %
**YOUR TURN - Interpret:**
- η² = 0.008
- Percentage of variance explained: 0.8 %
- Effect size classification (small/medium/large): Large
- What does this mean practically?
Although the effect size is classified as large, the η² value of 0.008 means that only 0.8% of the variance in the outcome is explained by the group differences. Practically, this indicates that the independent variable has a statistically significant effect, but the actual impact is very small in real-world terms. The groups differ, but the difference does not explain much of the overall variation in the outcome.
---
### Step 8: Check Assumptions
``` r
par(mfrow = c(2, 2))
plot(anova_model)
YOUR TURN - Evaluate each plot:
The fitted values range roughly from 3.5 to 5.0, and the residuals are mostly centered around 0, ranging approximately from –1 to +1. There is no strong curved pattern, suggesting the linearity assumption is reasonably satisfied.
The standardized residuals range from about –3 to +3. The points follow the line in the middle but deviate at the upper tail (above about 2), suggesting some departure from normality, especially in the extreme values.
The √|standardized residuals| values range approximately from 0 to 1.5. The red line is relatively flat, indicating that the variance of residuals is fairly constant across fitted values (homoscedasticity is mostly met).
Leverage values are very small, ranging roughly from 0.0000 to 0.0012. No points exceed the Cook’s distance reference lines, indicating there are no highly influential observations affecting the model.
# YOUR TURN: Conduct Levene's test
# Levene's test for homogeneity of variance
levene_test <- leveneTest(DaysMentHlthBad ~ activity_level, data = mental_health_data)
print(levene_test)## Levene's Test for Homogeneity of Variance (center = median)
## Df F value Pr(>F)
## group 2 23.168 9.517e-11 ***
## 5754
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
YOUR TURN - Overall assessment:
Levene’s test is statistically significant F(2, 5754) = 23.17, p = 9.52e-11, which means the assumption of homogeneity of variance is violated. The variances across the three groups are not equal.
Probably not, because the sample size is very large (5757 people). When the sample size is large, ANOVA is usually still reliable even if variances are not equal.
YOUR TURN - Write a complete 2-3 paragraph results section:
Include: 1. Sample description and descriptive statistics 2. F-test results 3. Post-hoc comparisons (if applicable) 4. Effect size interpretation 5. Public health significance
Your Results Section:
The study included a total of 5,757 participants divided into three activity groups: None (n = 3,139), Moderate (n = 768), and Vigorous (n = 1,850). The mean outcome was highest in the None group (Mean = 5.08, SD = 9.01), compared to the Moderate group (Mean = 3.81, SD = 6.87) and the Vigorous group (Mean = 3.54, SD = 7.17). The median was 0 for all three groups, and the values ranged from 0 to 30 in each group, indicating a right-skewed distribution with some higher values.
A one-way ANOVA showed a statistically significant difference between the groups, F(2, 5754) = 23.17, p < 0.001, indicating that at least one group mean differs from the others. Post-hoc comparisons revealed that both the Moderate and Vigorous groups were significantly different from the None group (p < 0.001 for both comparisons). But, there was no significant difference between the Moderate and Vigorous groups (p = 0.716).
Although the results were statistically significant, the effect size was very small (η² = 0.008), meaning that only 0.8% of the variance in the outcome was explained by group differences. This suggests that while the differences are statistically detectable due to the large sample size, the practical or clinical impact is minimal. From a public health perspective, the findings indicate a measurable but small association, and other factors likely play a much larger role in explaining the outcome.
1. How does the effect size help you understand the practical vs. statistical significance?
Effect size tells us how big or meaningful a difference or relationship is, not just whether it is statistically significant. A result can be statistically significant but have a very small effect, and vice versa.
2. Why is it important to check ANOVA assumptions? What might happen if they’re violated?
It is important to check ANOVA assumptions to make sure the results are accurate. If they are violated, the test may give wrong conclusions about whether groups are different.
3. In public health practice, when might you choose to use ANOVA?
I would use ANOVA when comparing the average outcome across three or more groups. For example, comparing mean blood pressure across different activity levels.
4. What was the most challenging part of this lab activity?
The most challenging part of this lab activity is th reflection question part.
Before submitting, verify you have:
To submit: Upload both your .Rmd file and the HTML output to Brightspace.
Lab completed on: February 05, 2026
Total Points: 15
| Category | Criteria | Points | Notes |
|---|---|---|---|
| Code Execution | All code chunks run without errors | 4 | - Deduct 1 pt per major error - Deduct 0.5 pt per minor warning |
| Completion | All “YOUR TURN” sections attempted | 4 | - Part B Steps 1-9 completed - All fill-in-the-blank answered - Tukey table filled in |
| Interpretation | Correct statistical interpretation | 4 | - Hypotheses correctly stated (1 pt) - ANOVA results interpreted (1 pt) - Post-hoc results interpreted (1 pt) - Assumptions evaluated (1 pt) |
| Results Section | Professional, complete write-up | 3 | - Includes descriptive stats (1 pt) - Reports F-test & post-hoc (1 pt) - Effect size & significance (1 pt) |
Code Execution (4 points):
Completion (4 points):
Interpretation (4 points):
Results Section (3 points):