Lab Overview

Time: ~30 minutes

Goal: Practice one-way ANOVA analysis from start to finish using real public health data

Learning Objectives:

  • Understand when and why to use ANOVA instead of multiple t-tests
  • Set up hypotheses for ANOVA
  • Conduct and interpret the F-test
  • Perform post-hoc tests when appropriate
  • Check ANOVA assumptions
  • Calculate and interpret effect size (η²)
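
The first objective can be made concrete with a little arithmetic. The sketch below assumes the pairwise tests are independent, which is only an approximation for real data, and shows how the chance of at least one false positive grows when three groups are compared with separate t-tests at α = 0.05. The variable names are illustrative.

```r
# Familywise error rate for all pairwise t-tests among k groups,
# assuming (as a simplification) that the tests are independent
alpha    <- 0.05
k_groups <- 3
n_pairs  <- choose(k_groups, 2)      # number of pairwise comparisons

fwer <- 1 - (1 - alpha)^n_pairs      # P(at least one false positive)

cat("Pairwise comparisons:", n_pairs, "\n")
cat("Approximate familywise error rate:", round(fwer, 3), "\n")
```

With three groups the rate is already close to 0.14 rather than 0.05, which is the motivation for a single omnibus F-test followed by adjusted post-hoc comparisons.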

Step 1: Setup and Data Preparation

# Load necessary libraries
library(tidyverse)   # For data manipulation and visualization
library(knitr)       # For nice tables
library(car)         # For Levene's test
library(NHANES)      # NHANES dataset

# Load the NHANES data
data(NHANES)

PART B: YOUR TURN - INDEPENDENT PRACTICE

Practice Problem: Physical Activity and Depression

Research Question: Is there a difference in the number of days with poor mental health across three physical activity levels (None, Moderate, Vigorous)?

Your Task: Complete the same 9-step analysis workflow you just practiced, but now on a different outcome and predictor.


Step 1: Data Preparation

# Prepare the dataset
set.seed(553)

mental_health_data <- NHANES %>%
  filter(Age >= 18) %>%
  filter(!is.na(DaysMentHlthBad) & !is.na(PhysActive)) %>%
  mutate(
    activity_level = case_when(
      PhysActive == "No" ~ "None",
      PhysActive == "Yes" & !is.na(PhysActiveDays) & PhysActiveDays < 3 ~ "Moderate",
      PhysActive == "Yes" & !is.na(PhysActiveDays) & PhysActiveDays >= 3 ~ "Vigorous",
      TRUE ~ NA_character_
    ),
    activity_level = factor(activity_level, 
                           levels = c("None", "Moderate", "Vigorous"))
  ) %>%
  filter(!is.na(activity_level)) %>%
  select(ID, Age, Gender, DaysMentHlthBad, PhysActive, activity_level)

# YOUR TURN: Display the first 6 rows and check sample sizes

# Display first 6 rows
head(mental_health_data) %>% 
  kable(caption = "Physical Activity and Mental Health (first 6 rows)")
Physical Activity and Mental Health (first 6 rows)

ID     Age  Gender  DaysMentHlthBad  PhysActive  activity_level
51624  34   male    15               No          None
51624  34   male    15               No          None
51624  34   male    15               No          None
51630  49   female  10               No          None
51647  45   female  3                Yes         Vigorous
51647  45   female  3                Yes         Vigorous

# Check sample sizes
table(mental_health_data$activity_level)
## 
##     None Moderate Vigorous 
##     3139      768     1850

YOUR TURN - Answer these questions:

  • How many people are in each physical activity group?
    • None: 3139
    • Moderate: 768
    • Vigorous: 1850

Step 2: Descriptive Statistics

# YOUR TURN: Calculate summary statistics by activity level
# Hint: Follow the same structure as the guided example
# Variables to summarize: n, Mean, SD, Median, Min, Max

# Calculate summary statistics by physical activity level
summary_stats <- mental_health_data %>%
  group_by(activity_level) %>%
  summarise(
    n = n(),
    Mean = mean(DaysMentHlthBad),
    SD = sd(DaysMentHlthBad),
    Median = median(DaysMentHlthBad),
    Min = min(DaysMentHlthBad),
    Max = max(DaysMentHlthBad)
  )

summary_stats %>% 
  kable(digits = 2, 
        caption = "Descriptive Statistics: Days with Bad Mental Health by Physical Activity Category")
Descriptive Statistics: Days with Bad Mental Health by Physical Activity Category

activity_level  n     Mean  SD    Median  Min  Max
None            3139  5.08  9.01  0       0    30
Moderate        768   3.81  6.87  0       0    30
Vigorous        1850  3.54  7.17  0       0    30

YOUR TURN - Interpret:

  • Which group has the highest mean number of bad mental health days? The None group (no physical activity)
  • Which group has the lowest? Vigorous physical activity

Step 3: Visualization

# YOUR TURN: Create boxplots comparing DaysMentHlthBad across activity levels
# Hint: Use the same ggplot code structure as the example
# Change variable names and labels appropriately

# Create boxplots with individual points
ggplot(mental_health_data, 
  aes(x = activity_level, y = DaysMentHlthBad, fill = activity_level)) +
  geom_boxplot(alpha = 0.7, outlier.shape = NA) +
  geom_jitter(width = 0.2, alpha = 0.1, size = 0.5) +
  scale_fill_brewer(palette = "Set2") +
  labs(
    title = "Days with Bad Mental Health by Physical Activity Category",
    subtitle = "NHANES Data, Adults aged 18 and older",
    x = "Physical Activity Level",
    y = "Days with Bad Mental Health",
    fill = "Physical Activity Level"
  ) +
  theme_minimal(base_size = 12) +
  theme(legend.position = "none")

YOUR TURN - Describe what you see:

  • Do the groups appear to differ? The groups with no or moderate physical activity do not appear to differ; however, the group with vigorous physical activity does appear to have fewer days with bad mental health.
  • Are the variances similar across groups? The variances appear similar between the groups with no or moderate physical activity, but the variance appears smaller for the group with vigorous physical activity.

Step 4: Set Up Hypotheses

YOUR TURN - Write the hypotheses:

Null Hypothesis (H₀): μ_None = μ_Moderate = μ_Vigorous
(All three population means are equal)

Alternative Hypothesis (H₁): At least one population mean differs from the others

Significance level: α = 0.05


Step 5: Fit the ANOVA Model

# YOUR TURN: Fit the ANOVA model
# Outcome: DaysMentHlthBad
# Predictor: activity_level

# Fit the one-way ANOVA model
anova_model <- aov(DaysMentHlthBad ~ activity_level, data = mental_health_data)

# Display the ANOVA table
summary(anova_model)
##                  Df Sum Sq Mean Sq F value   Pr(>F)    
## activity_level    2   3109  1554.6   23.17 9.52e-11 ***
## Residuals      5754 386089    67.1                     
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

YOUR TURN - Extract and interpret the results:

  • F-statistic: 23.17
  • Degrees of freedom: 2 (between groups) and 5754 (within groups)
  • p-value: 9.52e-11 (very small)
  • Decision (reject or fail to reject H₀): Since p < 0.05, we reject H₀
  • Statistical conclusion in words: There is statistically significant evidence that mean number of days with bad mental health differs across at least two physical activity categories.
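
To see where the F-statistic and p-value come from, the sketch below rebuilds them from the sums of squares and degrees of freedom printed in the ANOVA table. The inputs are the rounded values from that output, so the result matches only to the displayed precision; the variable names are illustrative.

```r
# Values copied from the ANOVA table (rounded as printed)
ss_between <- 3109
ss_within  <- 386089
df_between <- 2        # number of groups minus 1
df_within  <- 5754     # number of observations minus number of groups

# Mean squares and their ratio
ms_between <- ss_between / df_between
ms_within  <- ss_within / df_within
f_stat     <- ms_between / ms_within

# Upper-tail probability of the F distribution gives the p-value
p_value <- pf(f_stat, df_between, df_within, lower.tail = FALSE)

cat("F =", round(f_stat, 2), "\n")
cat("p =", signif(p_value, 3), "\n")
```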

Step 6: Post-Hoc Tests

# YOUR TURN: Conduct Tukey HSD test
# Only if your ANOVA p-value < 0.05

# Conduct Tukey HSD test
tukey_results <- TukeyHSD(anova_model)
print(tukey_results)
##   Tukey multiple comparisons of means
##     95% family-wise confidence level
## 
## Fit: aov(formula = DaysMentHlthBad ~ activity_level, data = mental_health_data)
## 
## $activity_level
##                         diff       lwr        upr     p adj
## Moderate-None     -1.2725867 -2.045657 -0.4995169 0.0003386
## Vigorous-None     -1.5464873 -2.109345 -0.9836298 0.0000000
## Vigorous-Moderate -0.2739006 -1.098213  0.5504114 0.7159887
# Visualize the confidence intervals
plot(tukey_results, las = 0)

YOUR TURN - Complete the table:

Comparison           Mean Difference  95% CI Lower  95% CI Upper  p-value   Significant?
Moderate - None      -1.27            -2.05         -0.50         0.0003    Yes
Vigorous - None      -1.55            -2.11         -0.98         < 0.0001  Yes
Vigorous - Moderate  -0.27            -1.10         0.55          0.7160    No

Interpretation:

Which specific groups differ significantly? The None group differs significantly from both the Moderate and the Vigorous groups. The confidence intervals for Moderate - None and Vigorous - None do not include zero, so we reject the null hypothesis of equal means for those pairs. The interval for Vigorous - Moderate does include zero, so we cannot reject the null for that pair. This suggests that the benefit of physical activity may plateau at moderate levels.
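
As a check on how the Tukey interval and adjusted p-value arise, the sketch below reproduces the Vigorous - Moderate row from the printed summary values using the Tukey-Kramer formula for unequal group sizes. It uses the rounded residual mean square from the ANOVA table, so small rounding differences are expected; the variable names are illustrative.

```r
# Values copied from the ANOVA and Tukey output (rounded as printed)
mse       <- 67.1          # residual mean square
df_within <- 5754
k_groups  <- 3
n_mod     <- 768
n_vig     <- 1850
diff_vm   <- -0.2739006    # Vigorous - Moderate

# Tukey-Kramer standard error on the studentized-range scale
se_q <- sqrt((mse / 2) * (1 / n_mod + 1 / n_vig))

# 95% family-wise interval and adjusted p-value
half_width <- qtukey(0.95, k_groups, df_within) * se_q
ci    <- diff_vm + c(-1, 1) * half_width
p_adj <- ptukey(abs(diff_vm) / se_q, k_groups, df_within, lower.tail = FALSE)

cat("95% CI:", round(ci, 2), "\n")
cat("Adjusted p:", round(p_adj, 3), "\n")
```

Because the interval spans zero, the adjusted p-value is necessarily above 0.05; the two criteria always agree for a 95% family-wise Tukey interval.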


Step 7: Calculate Effect Size

# YOUR TURN: Calculate eta-squared
# Hint: Extract Sum Sq from the ANOVA summary

# Extract sum of squares from ANOVA table
anova_summary <- summary(anova_model)[[1]]

ss_treatment <- anova_summary$`Sum Sq`[1]
ss_total <- sum(anova_summary$`Sum Sq`)

# Calculate eta-squared
eta_squared <- ss_treatment / ss_total

cat("Eta-squared (η²):", round(eta_squared, 4), "\n")
## Eta-squared (η²): 0.008
cat("Percentage of variance explained:", round(eta_squared * 100, 2), "%")
## Percentage of variance explained: 0.8 %

YOUR TURN - Interpret:

  • η² = 0.008
  • Percentage of variance explained: 0.8%
  • Effect size classification (small/medium/large): Small
  • What does this mean practically? While the result is statistically significant, the effect is small in practical terms: physical activity category alone explains less than 1% of the variation in days with bad mental health.
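
For the classification step, the sketch below recomputes η² from the printed sums of squares and compares it with commonly cited rule-of-thumb benchmarks (0.01, 0.06, 0.14, usually attributed to Cohen). The benchmarks are a convention assumed here for illustration, not a requirement of the lab.

```r
# Sums of squares copied from the ANOVA table (rounded as printed)
ss_between <- 3109
ss_within  <- 386089

eta_squared <- ss_between / (ss_between + ss_within)

# Rule-of-thumb benchmarks; a convention, not a strict rule
benchmarks <- c(small = 0.01, medium = 0.06, large = 0.14)

cat("Eta-squared:", round(eta_squared, 4), "\n")
print(eta_squared >= benchmarks)   # which benchmarks the estimate reaches
```

The estimate falls just short of even the "small" benchmark, which is consistent with classifying the effect as small at most.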

Step 8: Check Assumptions

# YOUR TURN: Create diagnostic plots
# Create diagnostic plots
par(mfrow = c(2, 2))
plot(anova_model)

par(mfrow = c(1, 1))

YOUR TURN - Evaluate each plot:

  1. Residuals vs Fitted: Points do not show random scatter around zero, suggesting there may be outliers or evidence of heteroscedasticity

  2. Q-Q Plot: Points do not follow the diagonal line reasonably well; therefore, the normality assumption may not be reasonable

  3. Scale-Location: The red line is rising, suggesting that the equal variance assumption may not be reasonable

  4. Residuals vs Leverage: There are no points beyond Cook’s distance lines so there are no highly influential outliers

# YOUR TURN: Conduct Levene's test
# Levene's test for homogeneity of variance
levene_test <- leveneTest(DaysMentHlthBad ~ activity_level, data = mental_health_data)
print(levene_test)
## Levene's Test for Homogeneity of Variance (center = median)
##         Df F value    Pr(>F)    
## group    2  23.168 9.517e-11 ***
##       5754                      
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
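
The Levene statistic printed here is identical to the ANOVA F-statistic from Step 5 (23.17). A likely explanation is that the median-centered Levene's test is an ANOVA on the absolute deviations from each group's median, and every group median in this sample is 0 for a non-negative outcome, so the deviations equal the original values. The sketch below illustrates this with a small made-up dataset (not NHANES) using base R only; `toy`, `days`, and `group` are hypothetical names.

```r
# Made-up data: non-negative counts with a median of 0 in every group
toy <- data.frame(
  group = factor(rep(c("A", "B", "C"), each = 6)),
  days  = c(0, 0, 0, 0, 5, 30,
            0, 0, 0, 0, 2, 10,
            0, 0, 0, 0, 3, 7)
)

# Median-centered Levene's test is an ANOVA on |y - group median|
group_median <- ave(toy$days, toy$group, FUN = median)
toy$abs_dev  <- abs(toy$days - group_median)

levene_f <- anova(lm(abs_dev ~ group, data = toy))$`F value`[1]
anova_f  <- anova(lm(days ~ group, data = toy))$`F value`[1]

cat("Group medians all zero:", all(group_median == 0), "\n")
cat("Levene F equals ANOVA F:", isTRUE(all.equal(levene_f, anova_f)), "\n")
```

When this happens, the Levene result largely restates the ANOVA result rather than providing an independent check on the variances, so comparing the group standard deviations (9.01, 6.87, 7.17) is a useful complement.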

YOUR TURN - Overall assessment:

  • Are assumptions reasonably met? No. Levene’s test shows that the variances differ significantly (p < 0.05), so we reject the assumption of equal variances. The diagnostic plots also suggest that the residuals are not normally distributed.
  • Do any violations threaten your conclusions? The diagnostic plots and Levene’s test indicate violations of the ANOVA assumptions. While the sample size is large at over 5,000, the categories of physical activity are not balanced, with the none category accounting for over 50% of the total sample and the moderate category accounting for only approximately 13%. The large sample makes the F-test fairly robust to non-normality, but unequal variances combined with unbalanced groups can distort the p-value, and a significant p-value does not by itself make the result trustworthy. The conclusions should therefore be interpreted with some caution.
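
One way to probe whether unequal variances threaten the conclusion is Welch's one-way ANOVA, which does not assume equal variances. This is not part of the lab workflow; it is offered here as a possible robustness check. The sketch below approximates the Welch statistic from the rounded group summaries in Step 2, so the result is indicative only; the variable names are illustrative.

```r
# Group summaries copied from the Step 2 table (rounded)
n <- c(None = 3139, Moderate = 768, Vigorous = 1850)
m <- c(5.08, 3.81, 3.54)     # group means
s <- c(9.01, 6.87, 7.17)     # group standard deviations
k <- length(n)

w      <- n / s^2                                   # precision weights
m_w    <- sum(w * m) / sum(w)                       # weighted grand mean
lambda <- sum((1 - w / sum(w))^2 / (n - 1)) / (k^2 - 1)

# Welch's F, its denominator degrees of freedom, and the p-value
welch_f <- sum(w * (m - m_w)^2) / ((k - 1) * (1 + 2 * (k - 2) * lambda))
df2     <- 1 / (3 * lambda)
p_welch <- pf(welch_f, k - 1, df2, lower.tail = FALSE)

cat("Welch F =", round(welch_f, 2), "on", k - 1, "and", round(df2), "df\n")
cat("p =", signif(p_welch, 3), "\n")
```

On the raw data the same test is available as `oneway.test(DaysMentHlthBad ~ activity_level, data = mental_health_data, var.equal = FALSE)`. If the Welch result agrees with the classical F-test, the unequal variances are less likely to be driving the conclusion.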

Step 9: Write Up Results

YOUR TURN - Write a complete 2-3 paragraph results section:

Include: 1. Sample description and descriptive statistics 2. F-test results 3. Post-hoc comparisons (if applicable) 4. Effect size interpretation 5. Public health significance

Your Results Section:

This analysis grouped adult NHANES participants (aged 18 and older) by level of physical activity in order to compare mental health, measured as the mean number of days with bad mental health. The total sample of 5,757 comprised 3,139 (54.5%) individuals who engage in no physical activity, 768 (13.3%) with moderate physical activity (active fewer than 3 days per week), and 1,850 (32.1%) with vigorous physical activity (active 3 or more days per week). Those in the none category had a mean of 5.08 bad mental health days (SD = 9.01), whereas the moderate group had a mean of 3.81 (SD = 6.87) and the vigorous group a mean of 3.54 (SD = 7.17).

The one-way ANOVA was statistically significant, F(2, 5754) = 23.17, p < 0.001 (p = 9.52e-11). The between-group mean square was about 23 times the within-group mean square, a ratio that would be extremely unlikely to occur by chance if all groups truly had the same mean.

Post-hoc Tukey HSD tests revealed that individuals with vigorous physical activity had a significantly lower mean number of days with bad mental health than those with no physical activity (mean difference = -1.55, 95% CI [-2.11, -0.98], p < 0.001). Similarly, moderate activity was associated with a lower mean number of days with bad mental health than no activity (mean difference = -1.27, 95% CI [-2.05, -0.50], p < 0.001). The difference between the moderate and vigorous activity groups was not statistically significant (mean difference = -0.27, 95% CI [-1.10, 0.55], p = 0.716), suggesting that the benefit of physical activity may plateau at around moderate levels.

While statistically significant, the effect size was small (η² = 0.008), indicating that physical activity explains only 0.8% of the variance in bad mental health days. Other unmeasured factors such as mental health conditions, genetics, cardiovascular disease, and other overall health metrics likely play larger roles in bad mental health days.


Reflection Questions

1. How does the effect size help you understand the practical vs. statistical significance?

Effect size shows the magnitude of a variable’s association with the outcome of interest, which a p-value alone does not. This adds critical context when discussing p-values and statistical significance, especially when communicating the larger public health importance.

2. Why is it important to check ANOVA assumptions? What might happen if they’re violated?

It’s important because the F-test’s p-value is only valid when its assumptions (independent observations, approximately normal residuals, and equal variances across groups) are reasonably met. If they are violated, the p-value and confidence intervals can be misleading. Diagnostic plots and tests such as Levene’s test also tell you more about the balance and makeup of your data. When violations are found, researchers should be more cautious in interpreting their results and describe the limitations of their data, which could lead to data transformations or weighting.

3. In public health practice, when might you choose to use ANOVA?

When comparing the mean of an outcome of interest across three or more groups. An example could be comparing maternal mortality by race/ethnicity.

4. What was the most challenging part of this lab activity?

Interpreting post-hoc testing results.