Lab Overview

Time: ~30 minutes

Goal: Practice one-way ANOVA analysis from start to finish using real public health data

Learning Objectives:

  • Understand when and why to use ANOVA instead of multiple t-tests
  • Set up hypotheses for ANOVA
  • Conduct and interpret the F-test
  • Perform post-hoc tests when appropriate
  • Check ANOVA assumptions
  • Calculate and interpret effect size (η²)
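
To make the first objective concrete, the short sketch below estimates how the chance of at least one false positive grows when every pair of groups is compared with its own t-test at α = 0.05. It assumes the tests are independent, which pairwise t-tests on shared groups are not exactly, so the figure is an approximation rather than an exact rate.

```r
# Approximate familywise error rate for all pairwise t-tests among k groups.
# Assumption: the tests are treated as independent (a simplification).
alpha <- 0.05
k <- 3                           # number of groups in this lab
n_tests <- choose(k, 2)          # number of pairwise comparisons
fwer <- 1 - (1 - alpha)^n_tests  # P(at least one false positive)

cat("Pairwise tests:", n_tests, "\n")
cat("Approximate familywise error rate:", round(fwer, 4), "\n")
```

With three groups this comes to roughly 14%, nearly three times the nominal 5%, which is the motivation for a single omnibus F-test followed by adjusted post-hoc comparisons.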

PART B: YOUR TURN - INDEPENDENT PRACTICE

Practice Problem: Physical Activity and Depression

Research Question: Is there a difference in the number of days with poor mental health across three physical activity levels (None, Moderate, Vigorous)?

Your Task: Complete the same 9-step analysis workflow you just practiced, but now on a different outcome and predictor.


Step 1: Setup and Data Preparation

# Load necessary libraries
library(tidyverse)   # For data manipulation and visualization
library(knitr)       # For nice tables
library(car)         # For Levene's test
library(NHANES)      # NHANES dataset

# Load the NHANES data
data(NHANES)

Create analysis dataset:

# Prepare the dataset
set.seed(553)

mental_health_data <- NHANES %>%
  filter(Age >= 18) %>%
  filter(!is.na(DaysMentHlthBad) & !is.na(PhysActive)) %>%
  mutate(
    activity_level = case_when(
      PhysActive == "No" ~ "None",
      PhysActive == "Yes" & !is.na(PhysActiveDays) & PhysActiveDays < 3 ~ "Moderate",
      PhysActive == "Yes" & !is.na(PhysActiveDays) & PhysActiveDays >= 3 ~ "Vigorous",
      TRUE ~ NA_character_
    ),
    activity_level = factor(activity_level, 
                           levels = c("None", "Moderate", "Vigorous"))
  ) %>%
  filter(!is.na(activity_level)) %>%
  select(ID, Age, Gender, DaysMentHlthBad, PhysActive, activity_level)

# YOUR TURN: Display the first 6 rows and check sample sizes
# Display the first 6 rows
head(mental_health_data) %>% 
  kable(caption = "Mental Health and Physical Activity Dataset (first 6 rows)")
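Mental Health and Physical Activity Dataset (first 6 rows)

| ID    | Age | Gender | DaysMentHlthBad | PhysActive | activity_level |
|-------|-----|--------|-----------------|------------|----------------|
| 51624 | 34  | male   | 15              | No         | None           |
| 51624 | 34  | male   | 15              | No         | None           |
| 51624 | 34  | male   | 15              | No         | None           |
| 51630 | 49  | female | 10              | No         | None           |
| 51647 | 45  | female | 3               | Yes        | Vigorous       |
| 51647 | 45  | female | 3               | Yes        | Vigorous       |
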
# Check sample sizes
table(mental_health_data$activity_level)
## 
##     None Moderate Vigorous 
##     3139      768     1850

YOUR TURN - Answer these questions:

  • How many people are in each physical activity group?
    • None: 3139
    • Moderate: 768
    • Vigorous: 1850

Step 2: Descriptive Statistics

# YOUR TURN: Calculate summary statistics by activity level
# Hint: Follow the same structure as the guided example
# Variables to summarize: n, Mean, SD, Median, Min, Max

summary_statistics <- mental_health_data %>%
  group_by(activity_level) %>%
  summarise(
    n = n(),
    Mean = mean(DaysMentHlthBad),
    SD = sd(DaysMentHlthBad),
    Median = median(DaysMentHlthBad),
    Min = min(DaysMentHlthBad),
    Max = max(DaysMentHlthBad)
  )

summary_statistics %>% 
  kable(digits = 2, 
        caption = "Descriptive Statistics: Days Mental Health Bad by Activity level")
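Descriptive Statistics: Days Mental Health Bad by Activity level

| activity_level | n    | Mean | SD   | Median | Min | Max |
|----------------|------|------|------|--------|-----|-----|
| None           | 3139 | 5.08 | 9.01 | 0      | 0   | 30  |
| Moderate       | 768  | 3.81 | 6.87 | 0      | 0   | 30  |
| Vigorous       | 1850 | 3.54 | 7.17 | 0      | 0   | 30  |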

YOUR TURN - Interpret:

  • Which group has the highest mean number of bad mental health days? The group with the highest mean number of bad mental health days is the “none” group (5.08).
  • Which group has the lowest? The group with the lowest mean number of bad mental health days is the “vigorous” group (3.54).

Step 3: Visualization

# YOUR TURN: Create boxplots comparing DaysMentHlthBad across activity levels
# Hint: Use the same ggplot code structure as the example
# Change variable names and labels appropriately
ggplot(mental_health_data, 
  aes(x = activity_level, y = DaysMentHlthBad, fill = activity_level)) +
  geom_boxplot(alpha = 0.7, outlier.shape = NA) +
  geom_jitter(width = 0.2, alpha = 0.1, size = 0.5) +
  scale_fill_brewer(palette = "Set2") +
  labs(
    title = "Days Mental Health Bad by Activity Level",
    subtitle = "NHANES Data, Adults aged 18+",
    x = "Activity Level",
    y = "Days Mental Health Bad",
    fill = "Activity Level"
  ) +
  theme_minimal(base_size = 12) +
  theme(legend.position = "none")

YOUR TURN - Describe what you see:

  • Do the groups appear to differ? The boxes overlap, but the Vigorous group appears shifted downward. Consistent with the Step 2 summary, the None group has the highest mean number of bad mental health days.
  • Are the variances similar across groups? The None and Moderate groups appear to have similar variances (their boxes are similar in height), but the Vigorous group differs (it has the shortest box).

Step 4: Set Up Hypotheses

YOUR TURN - Write the hypotheses:

Null Hypothesis (H₀): μ_None = μ_Moderate = μ_Vigorous

Alternative Hypothesis (H₁): At least one population mean differs from the others

Significance level: α = 0.05


Step 5: Fit the ANOVA Model

# YOUR TURN: Fit the ANOVA model
# Outcome: DaysMentHlthBad
# Predictor: activity_level

anova_model <- aov(DaysMentHlthBad ~ activity_level, data = mental_health_data)
summary(anova_model)
##                  Df Sum Sq Mean Sq F value  Pr(>F)    
## activity_level    2   3109    1555    23.2 9.5e-11 ***
## Residuals      5754 386089      67                    
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

YOUR TURN - Extract and interpret the results:

  • F-statistic: 23.2
  • Degrees of freedom: 2 (between groups) and 5754 (within groups)
  • p-value: 9.5e-11
  • Decision (reject or fail to reject H₀): Since p < 0.05, we reject H₀
  • Statistical conclusion in words: There is statistically significant evidence that the mean number of bad mental health days is not the same across all three activity levels; at least one group mean differs from the others.
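
As a check on where the F-statistic comes from, the sketch below rebuilds it from the sums of squares and degrees of freedom printed in the ANOVA table. The inputs are the rounded values shown in that output, so the recomputed p-value may differ slightly from the printed one.

```r
# Values copied from the ANOVA table above (rounded as printed)
ss_between <- 3109     # Sum Sq for activity_level
ss_within  <- 386089   # Sum Sq for Residuals
df_between <- 2
df_within  <- 5754

# Mean squares are sums of squares divided by their degrees of freedom
ms_between <- ss_between / df_between
ms_within  <- ss_within / df_within

# The F-statistic is the ratio of between-group to within-group mean squares
f_stat <- ms_between / ms_within

# Upper-tail probability from the F distribution
p_value <- pf(f_stat, df_between, df_within, lower.tail = FALSE)

cat("F =", round(f_stat, 1), "\n")
cat("p =", signif(p_value, 2), "\n")
```

Both degrees of freedom are needed to look up the p-value, which is why the result is reported as F(2, 5754).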

Step 6: Post-Hoc Tests

# YOUR TURN: Conduct Tukey HSD test
# Only if your ANOVA p-value < 0.05
tukey_results <- TukeyHSD(anova_model)
print(tukey_results)
##   Tukey multiple comparisons of means
##     95% family-wise confidence level
## 
## Fit: aov(formula = DaysMentHlthBad ~ activity_level, data = mental_health_data)
## 
## $activity_level
##                      diff    lwr     upr  p adj
## Moderate-None     -1.2726 -2.046 -0.4995 0.0003
## Vigorous-None     -1.5465 -2.109 -0.9836 0.0000
## Vigorous-Moderate -0.2739 -1.098  0.5504 0.7160

YOUR TURN - Complete the table:

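| Comparison          | Mean Difference | 95% CI Lower | 95% CI Upper | p-value  | Significant? |
|---------------------|-----------------|--------------|--------------|----------|--------------|
| Moderate - None     | -1.2726         | -2.046       | -0.4995      | 0.0003   | Yes          |
| Vigorous - None     | -1.5465         | -2.109       | -0.9836      | < 0.0001 | Yes          |
| Vigorous - Moderate | -0.2739         | -1.098       | 0.5504       | 0.7160   | No           |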

Interpretation:

Which specific groups differ significantly? The “Moderate - None” and “Vigorous - None” comparisons are significant. Individuals with moderate or vigorous activity levels have fewer bad mental health days on average than individuals with no activity. The Moderate and Vigorous groups do not differ significantly from each other (p = 0.716).
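
The significance column can also be read straight from the confidence intervals. The sketch below re-enters the Tukey output by hand (the data frame and its column names are illustrative, not something `TukeyHSD()` returns in this shape) and flags each comparison whose 95% family-wise interval excludes zero.

```r
# Tukey HSD results copied from the output above
tukey_tbl <- data.frame(
  comparison = c("Moderate-None", "Vigorous-None", "Vigorous-Moderate"),
  diff = c(-1.2726, -1.5465, -0.2739),
  lwr  = c(-2.046, -2.109, -1.098),
  upr  = c(-0.4995, -0.9836, 0.5504)
)

# A pair differs at the 5% family-wise level when its interval excludes zero
tukey_tbl$excludes_zero <- tukey_tbl$lwr > 0 | tukey_tbl$upr < 0

print(tukey_tbl)
```

This agrees with the adjusted p-values: the two comparisons against None exclude zero, while the Vigorous-Moderate interval spans it.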


Step 7: Calculate Effect Size

# YOUR TURN: Calculate eta-squared
# Hint: Extract Sum Sq from the ANOVA summary
anova_summary <- summary(anova_model)[[1]]

ss_treatment <- anova_summary$`Sum Sq`[1]
ss_total <- sum(anova_summary$`Sum Sq`)

# Calculate eta-squared
eta_squared <- ss_treatment / ss_total

cat("Eta-squared (η²):", round(eta_squared, 4), "\n")
## Eta-squared (η²): 0.008
# Percentage of variance explained
cat("Percentage of variance explained:", round(eta_squared * 100, 2), "%")
## Percentage of variance explained: 0.8 %

YOUR TURN - Interpret:

  • η² = 0.008
  • Percentage of variance explained: Activity level explains 0.8% of the variance in the number of bad mental health days
  • Effect size classification (small/medium/large): Small
  • What does this mean practically? Although the result is statistically significant, the practical effect is modest. Activity level alone leaves more than 99% of the variation in bad mental health days unexplained.
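
For reference, the sketch below recomputes η² from the sums of squares printed in the ANOVA table and, as an optional extra that is not part of the lab, also computes ω² (omega-squared), a commonly used alternative that corrects η² for its upward bias. The inputs are the rounded table values, so the last digits are approximate.

```r
# Values copied from the ANOVA table in Step 5 (rounded as printed)
ss_between <- 3109
ss_within  <- 386089
df_between <- 2
df_within  <- 5754

ss_total  <- ss_between + ss_within
ms_within <- ss_within / df_within

# Eta-squared: share of total variation attributable to the grouping factor
eta_sq <- ss_between / ss_total

# Omega-squared: bias-adjusted counterpart (an addition for comparison only)
omega_sq <- (ss_between - df_between * ms_within) / (ss_total + ms_within)

cat("Eta-squared:  ", round(eta_sq, 4), "\n")
cat("Omega-squared:", round(omega_sq, 4), "\n")
```

With a sample this large the two measures are nearly identical, so the interpretation of a small effect does not change.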

Step 8: Check Assumptions

# YOUR TURN: Create diagnostic plots
par(mfrow = c(2, 2))   # 2 x 2 grid so all four diagnostic plots share one page
plot(anova_model)
par(mfrow = c(1, 1))   # Reset the plotting layout

YOUR TURN - Evaluate each plot:

  1. Residuals vs Fitted: Residuals are centered around zero with no clear trend. The points stack in three vertical lines because a one-way ANOVA has only three fitted values, one per group mean.
  2. Q-Q Plot: Points do not follow the diagonal line → Normality assumption is not reasonable
  3. Scale-Location: Red line trends upward → Equal variance assumption is not reasonable
  4. Residuals vs Leverage: No points beyond Cook’s distance lines → No highly influential outliers
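
The vertical stacking in the Residuals vs Fitted plot follows from how one-way ANOVA works: every observation's fitted value is its group mean. The sketch below shows this on a small made-up data set (`toy` and its values are invented for illustration and are not NHANES data).

```r
# Toy data: three groups of four observations each (values are made up)
toy <- data.frame(
  group = factor(rep(c("A", "B", "C"), each = 4)),
  y = c(1, 2, 3, 6,   4, 5, 7, 8,   9, 10, 12, 13)
)

fit <- aov(y ~ group, data = toy)

# Group means computed directly from the data
group_means <- tapply(toy$y, toy$group, mean)

# Distinct fitted values from the model (rounded to avoid floating-point noise)
distinct_fitted <- sort(unique(round(unname(fitted(fit)), 10)))

print(group_means)
print(distinct_fitted)
```

With k groups there are only k distinct fitted values, so the diagnostic plot always shows k vertical strips regardless of sample size.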
# YOUR TURN: Conduct Levene's test
levene_test <- leveneTest(DaysMentHlthBad ~ activity_level, data = mental_health_data)
print(levene_test)
## Levene's Test for Homogeneity of Variance (center = median)
##         Df F value  Pr(>F)    
## group    2    23.2 9.5e-11 ***
##       5754                    
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
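
Levene's test with `center = median` is a one-way ANOVA on the absolute deviations of each observation from its group median. In this lab every group median is 0 and the outcome cannot be negative, so those absolute deviations equal the raw outcome, which would explain why the Levene F value above matches the ANOVA F value from Step 5. The sketch below illustrates the idea on made-up data; `levene_by_hand` is a hypothetical helper written for this example, not a function from `car`.

```r
# Hypothetical helper: median-centered Levene's test computed as an ANOVA
# on absolute deviations from each group's median
levene_by_hand <- function(y, g) {
  group_median <- ave(y, g, FUN = median)  # each row gets its group's median
  abs_dev <- abs(y - group_median)
  summary(aov(abs_dev ~ g))[[1]]
}

# Toy data (made up): non-negative values, every group median equal to 0
toy <- data.frame(
  g = factor(rep(c("A", "B", "C"), each = 10)),
  y = c(rep(0, 6), 2, 5, 9, 30,
        rep(0, 7), 1, 3, 10,
        rep(0, 8), 1, 4)
)

levene_f <- levene_by_hand(toy$y, toy$g)[["F value"]][1]
anova_f  <- summary(aov(y ~ g, data = toy))[[1]][["F value"]][1]

cat("Levene F (by hand):", levene_f, "\n")
cat("ANOVA F on raw y:  ", anova_f, "\n")
```

When the medians are zero the two statistics coincide, so a significant Levene result here reflects the same group differences in the outcome that the ANOVA detects.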

YOUR TURN - Overall assessment:

  • Are assumptions reasonably met? No. Levene's test rejects equal variances (p < 0.05), so the homogeneity of variance assumption is not met, and the Q-Q plot shows clear departures from normality.
  • Do any violations threaten your conclusions? Possibly. With a total sample of 5757, the F-test is fairly robust to non-normality. Unequal variances matter more when group sizes are unbalanced, as they are here (None = 3139, Moderate = 768, Vigorous = 1850), so the p-values should be interpreted with some caution.
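
One common sensitivity check when variances are unequal is Welch's ANOVA, available in base R as `oneway.test()` with `var.equal = FALSE`. This is not part of the lab workflow; the sketch below runs it on simulated data (group sizes, means, and SDs are invented to loosely resemble an unbalanced design, and are not the NHANES values) to show the call and how it differs from the classic test.

```r
# Simulated data for illustration only (not NHANES)
set.seed(553)
sim <- data.frame(
  group = factor(rep(c("None", "Moderate", "Vigorous"), times = c(300, 80, 180)),
                 levels = c("None", "Moderate", "Vigorous")),
  y = c(rnorm(300, mean = 5.0, sd = 9),
        rnorm(80,  mean = 3.8, sd = 7),
        rnorm(180, mean = 3.5, sd = 7))
)

# Welch's ANOVA: does not assume equal variances across groups
welch <- oneway.test(y ~ group, data = sim, var.equal = FALSE)

# Classic one-way ANOVA for comparison: assumes equal variances
classic <- oneway.test(y ~ group, data = sim, var.equal = TRUE)

print(welch)
print(classic)
```

On the lab data the equivalent call would be `oneway.test(DaysMentHlthBad ~ activity_level, data = mental_health_data, var.equal = FALSE)`. If the Welch and classic results lead to the same decision, the unequal variances are less likely to be driving the conclusion.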

Step 9: Write Up Results

YOUR TURN - Write a complete 2-3 paragraph results section:

Include: 1. Sample description and descriptive statistics 2. F-test results 3. Post-hoc comparisons (if applicable) 4. Effect size interpretation 5. Public health significance

Your Results Section:

We conducted a one-way ANOVA to examine whether the mean number of bad mental health days differs across physical activity levels (None, Moderate, Vigorous) among 5757 adults aged 18 and older from NHANES. The mean number of bad mental health days was 5.08 (SD = 9.01) for the None group (n = 3139), 3.81 (SD = 6.87) for the Moderate group (n = 768), and 3.54 (SD = 7.17) for the Vigorous group (n = 1850).

The ANOVA revealed a statistically significant difference in the mean number of bad mental health days across activity levels, F(2, 5754) = 23.2, p < 0.001. Tukey’s HSD post-hoc tests indicated that two pairwise comparisons, “Moderate - None” and “Vigorous - None”, were significant (p < 0.05). Adults with a moderate activity level had on average 1.3 fewer bad mental health days than adults with no activity, and adults with a vigorous activity level had on average 1.5 fewer. The Moderate and Vigorous groups did not differ significantly (p = 0.716).

The effect size (η² = 0.008) indicates that activity level explains 0.8% of the variance in the number of bad mental health days, representing a small practical effect. These findings are consistent with the well-established association between higher activity levels and fewer bad mental health days, though other factors account for most of the variation in this outcome.
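
When writing up an F-test, it is easy to mix up the residual degrees of freedom with the sample size. The sketch below is a small formatting helper for avoiding that; `format_f_result` is a hypothetical function written for this example, and the inputs are the values reported in Step 5.

```r
# Hypothetical helper: build a report-ready F-test string
format_f_result <- function(f_value, df1, df2, p_value) {
  # Very small p-values are reported as an inequality rather than in full
  p_text <- if (p_value < 0.001) "p < 0.001" else sprintf("p = %.3f", p_value)
  sprintf("F(%d, %d) = %.1f, %s",
          as.integer(df1), as.integer(df2), f_value, p_text)
}

# Inputs copied from the ANOVA table: F, between df, residual df, p-value
result_text <- format_f_result(23.2, 2, 5754, 9.5e-11)
print(result_text)
```

The second degrees-of-freedom value is the Residuals row of the ANOVA table (5754), not the total sample size (5757).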


Reflection Questions

1. How does the effect size help you understand the practical vs. statistical significance?

Statistical significance does not always imply practical importance. A p-value below 0.05 only indicates that a difference is unlikely to be due to chance; it says nothing about how large the difference is. That is why η² (eta-squared) should always be reported: it shows how much of the variance in the outcome the factor explains. A result can be statistically significant, especially in a large sample, while the factor still explains very little of the variation.

2. Why is it important to check ANOVA assumptions? What might happen if they’re violated?

It is important to check ANOVA assumptions because the validity of the p-values and inference depends on them. If independence, normality, or homogeneity of variance is violated, p-values and confidence intervals may be inaccurate; for example, unequal variances combined with unbalanced group sizes can distort the Type I error rate.

3. In public health practice, when might you choose to use ANOVA?

In public health practice, ANOVA is appropriate for comparing the means of three or more groups when there is one categorical predictor, a continuous outcome, and independent observations within each group. It allows you to compare multiple treatments, exposure levels, or populations in a single test without inflating the chance of false discoveries.

4. What was the most challenging part of this lab activity?

The most challenging part of this lab activity was probably the part when I had to explain if assumptions were met, and if any violations threatened my conclusions.


Submission Checklist

Before submitting, verify you have:

To submit: Upload both your .Rmd file and the HTML output to Brightspace.


Lab completed on: February 05, 2026