Lab Overview

Time: ~30 minutes

Goal: Practice one-way ANOVA analysis from start to finish using real public health data

Learning Objectives:

Understand when and why to use ANOVA instead of multiple t-tests
Set up hypotheses for ANOVA
Conduct and interpret the F-test
Perform post-hoc tests when appropriate
Check ANOVA assumptions
Calculate and interpret effect size (η²)

Structure:

Part A: Guided Example (follow along)
Part B: Your Turn (independent practice)

Submission: Upload your completed .Rmd file and published to Brightspace by the end of class.

PART A: GUIDED EXAMPLE

Example: Blood Pressure and BMI Categories

Research Question: Is there a difference in mean systolic blood pressure (SBP) across three BMI categories (Normal weight, Overweight, Obese)?

Why ANOVA? We have one continuous outcome (SBP) and one categorical predictor with THREE groups (BMI category). Using multiple t-tests would inflate our Type I error rate.

Step 1: Setup and Data Preparation

# Load necessary libraries
library(tidyverse)   # For data manipulation and visualization
library(knitr)       # For nice tables
library(car)         # For Levene's test
library(NHANES)      # NHANES dataset

# Load the NHANES data
data(NHANES)

Create analysis dataset:

# Set seed for reproducibility
set.seed(553)

# Create BMI categories and prepare data
bp_bmi_data <- NHANES %>%
  filter(Age >= 18 & Age <= 65) %>%  # Adults 18-65
  filter(!is.na(BPSysAve) & !is.na(BMI)) %>%
  mutate(
    bmi_category = case_when(
      BMI < 25 ~ "Normal",
      BMI >= 25 & BMI < 30 ~ "Overweight",
      BMI >= 30 ~ "Obese",
      TRUE ~ NA_character_
    ),
    bmi_category = factor(bmi_category, 
                         levels = c("Normal", "Overweight", "Obese"))
  ) %>%
  filter(!is.na(bmi_category)) %>%
  select(ID, Age, Gender, BPSysAve, BMI, bmi_category)

# Display first few rows
head(bp_bmi_data) %>% 
  kable(caption = "Blood Pressure and BMI Dataset (first 6 rows)")

Blood Pressure and BMI Dataset (first 6 rows)
ID	Age	Gender	BPSysAve	BMI	bmi_category
51624	34	male	113	32.22	Obese
51624	34	male	113	32.22	Obese
51624	34	male	113	32.22	Obese
51630	49	female	112	30.57	Obese
51647	45	female	118	27.24	Overweight
51647	45	female	118	27.24	Overweight

# Check sample sizes
table(bp_bmi_data$bmi_category)

## 
##     Normal Overweight      Obese 
##       1939       1937       2150

Interpretation: there are 6026 adults with complete BP and BMI data across three BMI categories.

Step 2: Descriptive Statistics

# Calculate summary statistics by BMI category
summary_stats <- bp_bmi_data %>%
  group_by(bmi_category) %>%
  summarise(
    n = n(),
    Mean = mean(BPSysAve),
    SD = sd(BPSysAve),
    Median = median(BPSysAve),
    Min = min(BPSysAve),
    Max = max(BPSysAve)
  )

summary_stats %>% 
  kable(digits = 2, 
        caption = "Descriptive Statistics: Systolic BP by BMI Category")

Descriptive Statistics: Systolic BP by BMI Category
bmi_category	n	Mean	SD	Median	Min	Max
Normal	1939	114.23	15.01	113	78	221
Overweight	1937	118.74	13.86	117	83	186
Obese	2150	121.62	15.27	120	82	226

Interpretation: The mean SBP increases from Normal (114.2) to Overweight (118.7) to Obese (121.6).

But is this difference statistically significant?

Step 3: Visualize the Data

# Create boxplots with individual points
ggplot(bp_bmi_data, 
  aes(x = bmi_category, y = BPSysAve, fill = bmi_category)) +
  geom_boxplot(alpha = 0.7, outlier.shape = NA) +
  geom_jitter(width = 0.2, alpha = 0.1, size = 0.5) +
  scale_fill_brewer(palette = "Set2") +
  labs(
    title = "Systolic Blood Pressure by BMI Category",
    subtitle = "NHANES Data, Adults aged 18-65",
    x = "BMI Category",
    y = "Systolic Blood Pressure (mmHg)",
    fill = "BMI Category"
  ) +
  theme_minimal(base_size = 12) +
  theme(legend.position = "none")

What the plot tells us:

There is a relationship between the bmi and SBP is that the high the BMI the higher the SBP median will be.
The variability is similar across all groups.

Step 4: Set Up Hypotheses

Null Hypothesis (H₀): μ_Normal = μ_Overweight = μ_Obese
(All three population means are equal)

Alternative Hypothesis (H₁): At least one population mean differs from the others

Significance level: α = 0.05

Step 5: Fit the ANOVA Model

# Fit the one-way ANOVA model
anova_model <- aov(BPSysAve ~ bmi_category, data = bp_bmi_data)

# Display the ANOVA table
summary(anova_model)

##                Df  Sum Sq Mean Sq F value Pr(>F)    
## bmi_category    2   56212   28106   129.2 <2e-16 ***
## Residuals    6023 1309859     217                   
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Interpretation:

F-statistic: 129.24
Degrees of freedom: df₁ = 2 (k-1 groups), df₂ = 6023 (n-k)
p-value: < 2e-16 (very small)
Decision: Since p < 0.05, we reject H₀
Conclusion: There is statistically significant evidence that mean systolic BP differs across at least two BMI categories.

Step 6: Post-Hoc Tests (Tukey HSD)

Why do we need this? The F-test tells us that groups differ, but not which groups differ. Tukey’s Honest Significant Difference controls the family-wise error rate for multiple pairwise comparisons.

# Conduct Tukey HSD test
tukey_results <- TukeyHSD(anova_model)
print(tukey_results)

##   Tukey multiple comparisons of means
##     95% family-wise confidence level
## 
## Fit: aov(formula = BPSysAve ~ bmi_category, data = bp_bmi_data)
## 
## $bmi_category
##                       diff      lwr      upr p adj
## Overweight-Normal 4.507724 3.397134 5.618314     0
## Obese-Normal      7.391744 6.309024 8.474464     0
## Obese-Overweight  2.884019 1.801006 3.967033     0

# Visualize the confidence intervals
plot(tukey_results, las = 0)

Interpretation:

Comparison	Mean Diff	95% CI	p-value	Significant?
Overweight - Normal	4.51	[3.4, 5.62]	1.98e-13	Yes
Obese - Normal	7.39	[6.31, 8.47]	< 0.001	Yes
Obese - Overweight	2.88	[1.8, 3.97]	1.38e-09	Yes

Conclusion: All three comparisons were statistically significant. the Obese adults had a higher SBP than the overweight adults who have a higher SBP than normal-weight adults.

Step 7: Calculate Effect Size

# Extract sum of squares from ANOVA table
anova_summary <- summary(anova_model)[[1]]

ss_treatment <- anova_summary$`Sum Sq`[1]
ss_total <- sum(anova_summary$`Sum Sq`)

# Calculate eta-squared
eta_squared <- ss_treatment / ss_total

cat("Eta-squared (η²):", round(eta_squared, 4), "\n")

## Eta-squared (η²): 0.0411

cat("Percentage of variance explained:", round(eta_squared * 100, 2), "%")

## Percentage of variance explained: 4.11 %

Interpretation: The BMI category explains 4.11% of the variance in systolic BP.

Effect size guidelines: Small (0.01), Medium (0.06), Large (0.14)
Our effect: Small

Even though its statistically significant, the actual impact is limited the BMI category alone does not explain the majority of the difference in blood pressure.

Step 8: Check Assumptions

ANOVA Assumptions:

Independence: Observations are independent based on study design
Normality: Residuals are normally distributed
Homogeneity of variance: Equal variances across all groups

# Create diagnostic plots
par(mfrow = c(2, 2))
plot(anova_model)

par(mfrow = c(1, 1))

Diagnostic Plot Interpretation:

Residuals vs Fitted: Points are randomly scattered around zero with no clear pattern
Q-Q Plot: Points follow the diagonal line relatively well. The normality assumption is reasonable.
Scale-Location: The assumption of equal variance is fair because the red line is somewhat flat.
Residuals vs Leverage:Cook’s distance lines are not allowed to be crossed.

# Levene's test for homogeneity of variance
levene_test <- leveneTest(BPSysAve ~ bmi_category, data = bp_bmi_data)
print(levene_test)

## Levene's Test for Homogeneity of Variance (center = median)
##         Df F value  Pr(>F)  
## group    2  2.7615 0.06328 .
##       6023                  
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Levene’s Test Interpretation:

p-value: 0.0633
If p < 0.05, we would reject equal variances
Here: Equal variance assumption is met

Overall Assessment: With n > 2000, ANOVA is robust to minor violations. Our assumptions are satisfied.

Step 9: Report Results

Example Results Section:

We conducted a one-way ANOVA to examine if the mean systolic blood pressure (SBP) differs across BMI categories (Normal, Overweight, Obese) among 6,026 adults aged 18-65 from NHANES. Descriptive statistics showed mean SBP of 114.2 mmHg (SD = 15) for normal weight, 118.7 mmHg (SD = 13.9) for overweight, and 121.6 mmHg (SD = 15.3) for obese individuals.

The ANOVA results showed a statistically significant difference in mean SBP across BMI categories, F(2, 6023) = 129.24, p < 0.001. Tukey’s HSD post-hoc tests indicated that all pairwise comparisons were significant (p < 0.05): obese adults had on average 7.4 mmHg higher SBP than normal-weight adults, and 2.9 mmHg higher than overweight adults.

The effect size (η² = 0.041) indicates that BMI category explains 4.1% of the variance in systolic blood pressure, representing a small practical effect.Although other factors account for the majority of the variation in SBP, these results confirm the strong link between raised blood pressure and greater BMI.

PART B: YOUR TURN - INDEPENDENT PRACTICE

Practice Problem: Physical Activity and Depression

Research Question: Is there a difference in the number of days with poor mental health across three physical activity levels (None, Moderate, Vigorous)?

Your Task: Complete the same 9-step analysis workflow you just practiced, but now on a different outcome and predictor.

Step 1: Data Preparation

# Prepare the dataset
set.seed(553)

mental_health_data <- NHANES %>%
  filter(Age >= 18) %>%
  filter(!is.na(DaysMentHlthBad) & !is.na(PhysActive)) %>%
  mutate(
    activity_level = case_when(
      PhysActive == "No" ~ "None",
      PhysActive == "Yes" & !is.na(PhysActiveDays) & PhysActiveDays < 3 ~ "Moderate",
      PhysActive == "Yes" & !is.na(PhysActiveDays) & PhysActiveDays >= 3 ~ "Vigorous",
      TRUE ~ NA_character_
    ),
    activity_level = factor(activity_level, 
                           levels = c("None", "Moderate", "Vigorous"))
  ) %>%
  filter(!is.na(activity_level)) %>%
  select(ID, Age, Gender, DaysMentHlthBad, PhysActive, activity_level)

# YOUR TURN: Display the first 6 rows and check sample sizes
head(mental_health_data) %>% 
  kable(caption = "Mental Health and Physical Activity (first 6 rows)")

Mental Health and Physical Activity (first 6 rows)
ID	Age	Gender	DaysMentHlthBad	PhysActive	activity_level
51624	34	male	15	No	None
51624	34	male	15	No	None
51624	34	male	15	No	None
51630	49	female	10	No	None
51647	45	female	3	Yes	Vigorous
51647	45	female	3	Yes	Vigorous

# Check sample sizes
table(mental_health_data$activity_level)

## 
##     None Moderate Vigorous 
##     3139      768     1850

YOUR TURN - Answer these questions:

How many people are in each physical activity group?
- None: 3139
- Moderate: 768
- Vigorous: 1850

Step 2: Descriptive Statistics

# YOUR TURN: Calculate summary statistics by activity level
# Hint: Follow the same structure as the guided example
# Variables to summarize: n, Mean, SD, Median, Min, Max
summary_stats <- mental_health_data %>%
  group_by(activity_level) %>%
  summarise(
    n = n(),
    Mean = mean(DaysMentHlthBad),
    SD = sd(DaysMentHlthBad),
    Median = median(DaysMentHlthBad),
    Min = min(DaysMentHlthBad),
    Max = max(DaysMentHlthBad)
  )

summary_stats %>% 
  kable(digits = 2, 
        caption = "Descriptive Statistics: Days with Bad Mental Health by Activity level")

Descriptive Statistics: Days with Bad Mental Health by Activity level
activity_level	n	Mean	SD	Max
None	3139	5.08	9.01	30
Moderate	768	3.81	6.87	30
Vigorous	1850	3.54	7.17	30

YOUR TURN - Interpret:

Which group has the highest mean number of bad mental health days? -The None activity group has the highest mean number of bad mental health days. None: Mean = 5.08 Moderate: Mean = 3.81 Vigorous: Mean = 3.54 Individuals reporting no physical activity have the highest average number of bad mental health days.
Which group has the lowest? -The Vigorous activity group has the lowest mean number of bad mental health days. Vigorous: Mean = 3.54 Moderate: Mean = 3.81 None: Mean = 5.08

Step 3: Visualization

# YOUR TURN: Create boxplots comparing DaysMentHlthBad across activity levels
# Hint: Use the same ggplot code structure as the example
# Change variable names and labels appropriately
ggplot(mental_health_data, 
  aes(x = activity_level, y = DaysMentHlthBad, fill = activity_level)) +
  geom_boxplot(alpha = 0.7, outlier.shape = NA) +
  geom_jitter(width = 0.2, alpha = 0.1, size = 0.5) +
  scale_fill_brewer(palette = "Set2") +
  labs(
    title = "Days with Bad Mental Health by Physical Activity level",
    subtitle = "NHANES Data, Adults aged 18-65",
    x = "Physical activity level",
    y = "Days with Bad Mental Health",
    fill = "Physical Activity level"
  ) +
  theme_minimal(base_size = 12) +
  theme(legend.position = "none")

YOUR TURN - Describe what you see:

Do the groups appear to differ? -Yes. From the boxplots and point distributions: The None activity group shows a wider spread of bad mental health days. The Moderate group appears is a bit lower than None but still shows substantial variability. The Vigorous activity group shows the lowest central tendency
Are the variances similar across groups? -The variances do not appear to be similar across groups. The None and Moderate activity groups show greater variability, with wider boxes and more extreme values. The Vigorous activity group has a smaller spread

Step 4: Set Up Hypotheses

YOUR TURN - Write the hypotheses:

Null Hypothesis (H₀): There is no difference in the mean number of days with bad mental health among adults with none, moderate, or vigorous physical activity levels. Ho : μNone = μModerate = μVigorous

Alternative Hypothesis (H₁): There are significant difference in the means of at least one group of people of none, moderate and vigorous group. H1 : At least one μ differs among the groups Significance level: α = 0.05

Step 5: Fit the ANOVA Model

# YOUR TURN: Fit the ANOVA model
# Outcome: DaysMentHlthBad
# Predictor: activity_level
anova_model <- aov(DaysMentHlthBad ~ activity_level, data = mental_health_data)

# Display the ANOVA table
summary(anova_model)

##                  Df Sum Sq Mean Sq F value   Pr(>F)    
## activity_level    2   3109  1554.6   23.17 9.52e-11 ***
## Residuals      5754 386089    67.1                     
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

YOUR TURN - Extract and interpret the results:

F-statistic: 23.17
Degrees of freedom: 2
p-value: 9.52e-11
The ANOVA results show a statistically significant effect of physical activity level on the mean number of days with bad mental health (F(2, 5754) = 23.17, p = 9.52 × 10⁻¹¹). Because the p-value is far less than α = 0.05, we reject the null hypothesis. Conclusion: There is strong statistical evidence that the mean number of days with bad mental health differs across physical activity levels. This shows that physical activity level is significantly associated with mental health outcomes in this sample

Step 6: Post-Hoc Tests

# YOUR TURN: Conduct Tukey HSD test
# Only if your ANOVA p-value < 0.05
tukey_results <- TukeyHSD(anova_model)
print(tukey_results)

##   Tukey multiple comparisons of means
##     95% family-wise confidence level
## 
## Fit: aov(formula = DaysMentHlthBad ~ activity_level, data = mental_health_data)
## 
## $activity_level
##                         diff       lwr        upr     p adj
## Moderate-None     -1.2725867 -2.045657 -0.4995169 0.0003386
## Vigorous-None     -1.5464873 -2.109345 -0.9836298 0.0000000
## Vigorous-Moderate -0.2739006 -1.098213  0.5504114 0.7159887

# Visualize the confidence intervals
plot(tukey_results, las = 0)

YOUR TURN - Complete the table:

Comparison	Mean Difference	95% CI Lower	95% CI Upper	p-value	Significant?
Moderate - None	-1.2725867	-2.045657	-0.4995169	0.0003386	Yes
Vigorous - None	-1.5464873	-2.109345	-0.9836298	0.0000000	Yes
Vigorous - Moder	-0.2739006	-1.098213	0.5504114	0.7159887	No

Interpretation:

Which specific groups differ significantly?

The Moderate and Vigorous groups differ significantly from the None group, with modest p-values and confidence ranges that don’t include zero. However, there is no significant difference between the Moderate and Vigorous groups because to the huge p-value and the confidence interval that encompasses zero.

Step 7: Calculate Effect Size

# YOUR TURN: Calculate eta-squared
# Hint: Extract Sum Sq from the ANOVA summary

# Extract sum of squares from ANOVA table
anova_summary <- summary(anova_model)[[1]]

ss_treatment <- anova_summary$`Sum Sq`[1]
ss_total <- sum(anova_summary$`Sum Sq`)

# Calculate eta-squared
eta_squared <- ss_treatment / ss_total

cat("Eta-squared (η²):", round(eta_squared, 4), "\n")

## Eta-squared (η²): 0.008

cat("Percentage of variance explained:", round(eta_squared * 100, 2), "%")

## Percentage of variance explained: 0.8 %


**YOUR TURN - Interpret:**

- η² = 0.008
- Percentage of variance explained: 0.8 %
- Effect size classification (small/medium/large): Large
- What does this mean practically?

Despite the effect size being categorized as large, the η2 value of 0.008 indicates that group differences account for just 0.8% of the variation in the result. In practical terms, this means that although the independent variable has a statistically significant influence, the real-world impact is minimal. Although there are differences between the groups, they do not account for a large portion of the overall variance in the results.
---

### Step 8: Check Assumptions


```r
par(mfrow = c(2, 2))
plot(anova_model)

par(mfrow = c(1, 1))

YOUR TURN - Evaluate each plot:

Residuals vs Fitted:

The residuals are primarily centered around 0 and vary generally from –1 to +1, whereas the fitted values typically fall between 3.5 and 5.0. The lack of a noticeable curved pattern indicates that the linearity assumption is largely met.

Q-Q Plot:

The standardized residuals range from about –3 to +3. The points follow the line in the middle but deviate at the upper tail (above about 2), suggesting some difference from normality, especially in the extreme values.

Scale-Location:

The values of √|standardized residuals| are roughly between 0 and 1.5. Since the red line is comparatively flat, homoscedasticity is largely satisfied and the residual variance is fairly stable across fitted values.

Residuals vs Leverage:

Leverage values are very small, ranging roughly from 0.0000 to 0.0012. No points exceed the Cook’s distance reference lines, indicating there are no highly influential observations affecting the model.

# YOUR TURN: Conduct Levene's test
# Levene's test for homogeneity of variance
levene_test <- leveneTest(DaysMentHlthBad ~ activity_level, data = mental_health_data)
print(levene_test)

## Levene's Test for Homogeneity of Variance (center = median)
##         Df F value    Pr(>F)    
## group    2  23.168 9.517e-11 ***
##       5754                      
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

YOUR TURN - Overall assessment:

Are assumptions reasonably met?

Levene’s test is statistically significant F(2, 5754) = 23.17, p = 9.52e-11, which means the assumption of homogeneity of variance is violated. The variances across the three groups are not equal.

Do any violations threaten your conclusions?

No, because the sample size is very large (5757 people). When the sample size is large, ANOVA is usually still reliable even if variances are not equal.

Step 9: Write Up Results

YOUR TURN - Write a complete 2-3 paragraph results section:

Include: 1. Sample description and descriptive statistics 2. F-test results 3. Post-hoc comparisons (if applicable) 4. Effect size interpretation 5. Public health significance

Your Results Section:

According to their self-reported levels of physical activity, 5,757 adults between the ages of 18 and 65 from NHANES data were included in the analysis: none (n = 3,139), moderate (n = 768), and vigorous (n = 1,850). Those who reported no physical activity had the highest mean number of days with poor mental health in the previous 30 days (Mean = 5.08, SD = 9.01), followed by those who reported moderate exercise (Mean = 3.81, SD = 6.87), and those who reported vigorous activity (Mean = 3.54, SD = 7.17). All groups’ medians were zero, indicating a distribution that was significantly variable and skewed to the right.

To compare mean days of poor mental health across levels of physical exercise, a one-way ANOVA was performed. The null hypothesis of equal averages across activity levels was rejected since the results showed a statistically significant difference between the groups. These results indicate that both moderate and vigorous physical activity are associated with significantly fewer bad mental health days compared to no activity, but vigorous activity does not confer a statistically significant additional benefit over moderate activity.

According to standard standards, the effect size, which is computed as η² = 0.02, suggests a small effect. Although physical activity level accounts for a small percentage of the overall variance in days with poor mental health, this is to be expected in large population-based studies where a variety of biological, social, and environmental factors influence mental health outcomes. Additionally, minor population-level changes may still have a significant impact on public health.

Despite the small effect size, these findings are significant from the standpoint of public health. Any amount of physical activity, even moderate activity, is linked to a monthly decrease of roughly 1–1.5 less days of poor mental health as compared to no activity. This decrease indicates a significant improvement in general mental health when extrapolated to the population level. An important equity issue is highlighted by the lack of a substantial difference between moderate and intense activity: mental health benefits can be attained without high-intensity exercise, supporting suggestions for inclusive and accessible physical activity in public health programs.

Reflection Questions

1. How does the effect size help you understand the practical vs. statistical significance?

In-Class Lab Activity 2: Analysis of Variance (ANOVA)

EPI 553: Principles of Statistical Inference II

Karo Ekor

2026-02-06

Lab Overview

PART A: GUIDED EXAMPLE

Example: Blood Pressure and BMI Categories

Research Question: Is there a difference in mean systolic blood pressure (SBP) across three BMI categories (Normal weight, Overweight, Obese)?

Step 1: Setup and Data Preparation

Step 2: Descriptive Statistics

But is this difference statistically significant?

Step 3: Visualize the Data

Step 4: Set Up Hypotheses

Step 5: Fit the ANOVA Model

Step 6: Post-Hoc Tests (Tukey HSD)

Step 7: Calculate Effect Size

Step 8: Check Assumptions

Step 9: Report Results

PART B: YOUR TURN - INDEPENDENT PRACTICE

Practice Problem: Physical Activity and Depression

Research Question: Is there a difference in the number of days with poor mental health across three physical activity levels (None, Moderate, Vigorous)?

Step 1: Data Preparation

Step 2: Descriptive Statistics

Step 3: Visualization

Step 4: Set Up Hypotheses

Step 5: Fit the ANOVA Model

Step 6: Post-Hoc Tests

Step 7: Calculate Effect Size

Step 9: Write Up Results

Reflection Questions

##For me the most challenging part was interpreting the values I got from the anova and tukey models.