Lab Overview

Time: ~30 minutes

Goal: Practice a one-way ANOVA from start to finish using real public health data

Learning Objectives:

  • Understand when and why to use ANOVA instead of multiple t-tests
  • Set up hypotheses for ANOVA
  • Conduct and interpret the F-test
  • Perform post-hoc tests when appropriate
  • Check ANOVA assumptions
  • Calculate and interpret effect size (η²)
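
The first objective can be made concrete with a little arithmetic. The sketch below computes the chance of at least one false positive when every pair of groups gets its own t-test at α = 0.05. It assumes the pairwise tests are independent, which is only approximately true, so treat the result as a rough guide rather than an exact rate.

```r
# Family-wise error rate for all pairwise t-tests among k groups.
# Assumption: the tests are treated as independent (only approximate).
alpha <- 0.05
k <- 3                        # number of groups, as in this lab
m <- choose(k, 2)             # number of pairwise comparisons
fwer <- 1 - (1 - alpha)^m     # P(at least one false positive)

cat("Comparisons:", m, "\n")
cat("Approximate family-wise error rate:", round(fwer, 3), "\n")
```

With three groups the error rate is already close to three times the nominal 5%, which is the motivation for a single omnibus F-test followed by adjusted post-hoc comparisons.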

Structure:

  • Part A: Guided Example (follow along)
  • Part B: Your Turn (independent practice)

Submission: Upload your completed .Rmd file and the published HTML output to Brightspace by the end of class.


Step 1: Setup and Data Preparation

# Load necessary libraries
library(tidyverse)   # For data manipulation and visualization
library(knitr)       # For nice tables
library(car)         # For Levene's test
library(NHANES)      # NHANES dataset

# Load the NHANES data
data(NHANES)

PART B: YOUR TURN - INDEPENDENT PRACTICE

Practice Problem: Physical Activity and Depression

Research Question: Is there a difference in the number of days with poor mental health across three physical activity levels (None, Moderate, Vigorous)?

Your Task: Complete the same 9-step analysis workflow you just practiced, but now on a different outcome and predictor.


Step 1: Data Preparation

# Prepare the dataset
set.seed(553)

mental_health_data <- NHANES %>%
  filter(Age >= 18) %>%
  filter(!is.na(DaysMentHlthBad) & !is.na(PhysActive)) %>%
  mutate(
    activity_level = case_when(
      PhysActive == "No" ~ "None",
      PhysActive == "Yes" & !is.na(PhysActiveDays) & PhysActiveDays < 3 ~ "Moderate",
      PhysActive == "Yes" & !is.na(PhysActiveDays) & PhysActiveDays >= 3 ~ "Vigorous",
      TRUE ~ NA_character_
    ),
    activity_level = factor(activity_level, 
                           levels = c("None", "Moderate", "Vigorous"))
  ) %>%
  filter(!is.na(activity_level)) %>%
  select(ID, Age, Gender, DaysMentHlthBad, PhysActive, activity_level)

# YOUR TURN: Display the first 6 rows and check sample sizes

head(mental_health_data)
## # A tibble: 6 × 6
##      ID   Age Gender DaysMentHlthBad PhysActive activity_level
##   <int> <int> <fct>            <int> <fct>      <fct>         
## 1 51624    34 male                15 No         None          
## 2 51624    34 male                15 No         None          
## 3 51624    34 male                15 No         None          
## 4 51630    49 female              10 No         None          
## 5 51647    45 female               3 Yes        Vigorous      
## 6 51647    45 female               3 Yes        Vigorous
table(mental_health_data$activity_level)
## 
##     None Moderate Vigorous 
##     3139      768     1850

YOUR TURN - Answer these questions:

  • How many people are in each physical activity group?
    • None: 3139
    • Moderate: 768
    • Vigorous: 1850
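
The head() output above shows the same ID on consecutive rows (51624 appears three times), so these counts are counts of rows and not necessarily of unique participants. A minimal sketch of how one might check this follows. It uses a small hypothetical data frame, `toy`, that mimics the six rows shown above so that it runs without the NHANES package.

```r
# Toy data mimicking the first six rows shown above
# (hypothetical stand-in for mental_health_data).
toy <- data.frame(
  ID = c(51624, 51624, 51624, 51630, 51647, 51647),
  activity_level = c("None", "None", "None", "None", "Vigorous", "Vigorous")
)

n_rows   <- nrow(toy)                # rows, which is what table() counts
n_unique <- length(unique(toy$ID))   # distinct participant IDs

# Keep only the first row for each ID
toy_unique <- toy[!duplicated(toy$ID), ]

cat("Rows:", n_rows, " Unique IDs:", n_unique, "\n")
print(table(toy_unique$activity_level))
```

Whether to deduplicate the real data depends on the lab's instructions. The counts reported in this write-up keep all rows, as the provided preparation code does.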

Step 2: Descriptive Statistics

# YOUR TURN: Calculate summary statistics by activity level
# Hint: Follow the same structure as the guided example
# Variables to summarize: n, Mean, SD, Median, Min, Max

summary_stats <- mental_health_data %>%
  group_by(activity_level) %>%
  summarise(
    n = n(),
    Mean = mean(DaysMentHlthBad),
    SD = sd(DaysMentHlthBad),
    Median = median(DaysMentHlthBad),
    Min = min(DaysMentHlthBad),
    Max = max(DaysMentHlthBad)
  )

summary_stats %>% 
  kable(digits = 2, 
        caption = "Descriptive Statistics: Bad Mental Health Days by Physical Activity Level")
Descriptive Statistics: Bad Mental Health Days by Physical Activity Level

| activity_level | n | Mean | SD | Median | Min | Max |
|---|---|---|---|---|---|---|
| None | 3139 | 5.08 | 9.01 | 0 | 0 | 30 |
| Moderate | 768 | 3.81 | 6.87 | 0 | 0 | 30 |
| Vigorous | 1850 | 3.54 | 7.17 | 0 | 0 | 30 |

YOUR TURN - Interpret:

  • Which group has the highest mean number of bad mental health days?

The No Physical Activity group has the highest mean number of bad mental health days.

  • Which group has the lowest?

The Vigorous Physical Activity group has the lowest mean number of bad mental health days.


Step 3: Visualization

# YOUR TURN: Create boxplots comparing DaysMentHlthBad across activity levels
# Hint: Use the same ggplot code structure as the example
# Change variable names and labels appropriately

ggplot(mental_health_data, 
  aes(x = activity_level, y = DaysMentHlthBad, fill = activity_level)) +
  geom_boxplot(alpha = 0.7, outlier.shape = NA) +
  geom_jitter(width = 0.2, alpha = 0.1, size = 0.5) +
  scale_fill_brewer(palette = "Set2") +
  labs(
    title = "Bad Mental Health Days by Physical Activity Level",
    subtitle = "NHANES Data, Adults >= 18",
    x = "Physical Activity Level",
    y = "Number of Bad Mental Health Days",
    fill = "Physical Activity Level"
  ) +
  theme_minimal(base_size = 12) +
  theme(legend.position = "none")

YOUR TURN - Describe what you see:

  • Do the groups appear to differ?

The medians do not appear to differ much: many people in all three physical activity groups reported 0 bad mental health days. However, the None group appears to have more extreme values than the Moderate and Vigorous groups.

  • Are the variances similar across groups?

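Based on the boxes (IQRs), the spread of the None and Moderate groups looks similar, while the Vigorous group's box is narrower. Because so many values are 0, the IQR is only a rough guide to spread here; the standard deviations from Step 2 (9.01, 6.87, and 7.17) indicate that the None group is the most variable.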
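
To put a number on the visual impression, the group SDs from Step 2 can be compared directly. The sketch below uses the rounded SDs copied from the descriptive table. The "ratio below 2" guideline mentioned afterward is a commonly taught rule of thumb and an assumption here, not something this lab specifies.

```r
# Group SDs copied from the descriptive statistics table in Step 2
sds <- c(None = 9.01, Moderate = 6.87, Vigorous = 7.17)

sd_ratio  <- max(sds) / min(sds)   # largest SD divided by smallest SD
var_ratio <- sd_ratio^2            # the same comparison on the variance scale

cat("Largest-to-smallest SD ratio:", round(sd_ratio, 2), "\n")
cat("Largest-to-smallest variance ratio:", round(var_ratio, 2), "\n")
```

An SD ratio under 2 is often treated as tolerable for ANOVA, but a formal check such as Levene's test in Step 8 is still worth running.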


Step 4: Set Up Hypotheses

YOUR TURN - Write the hypotheses:

Null Hypothesis (H₀):

The mean number of bad mental health days among US adults (age >= 18) is the same at all three physical activity levels: μ(None) = μ(Moderate) = μ(Vigorous).

Alternative Hypothesis (H₁):

At least one physical activity level has a mean number of bad mental health days that differs from the others.

Significance level: α = 0.05


Step 5: Fit the ANOVA Model

# YOUR TURN: Fit the ANOVA model
# Outcome: DaysMentHlthBad
# Predictor: activity_level

anova_model <- aov(DaysMentHlthBad ~ activity_level, data = mental_health_data)

# Display the ANOVA table
summary(anova_model)
##                  Df Sum Sq Mean Sq F value   Pr(>F)    
## activity_level    2   3109  1554.6   23.17 9.52e-11 ***
## Residuals      5754 386089    67.1                     
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

YOUR TURN - Extract and interpret the results:

  • F-statistic: 23.17
  • Degrees of freedom: df1 = 2 (k-1), df2 = 5754 (n-k)
  • p-value: 9.52e-11
  • Decision (reject or fail to reject H₀): Reject H₀, because p = 9.52e-11 < α = 0.05
  • Statistical conclusion in words: At least one physical activity group differs from the others in its mean number of bad mental health days. The F-test alone does not show which groups differ.
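
As a check on the numbers extracted above, the F-statistic and p-value can be rebuilt from the sums of squares and degrees of freedom in the ANOVA table. This is a sketch using the rounded values printed in the table, so it reproduces the reported F only to about two decimal places.

```r
# Values copied from the ANOVA table above (Sum Sq is rounded there)
ss_between <- 3109
ss_within  <- 386089
df_between <- 2        # k - 1
df_within  <- 5754     # N - k

ms_between <- ss_between / df_between   # mean square between groups
ms_within  <- ss_within / df_within     # mean square within groups (residual)
f_stat     <- ms_between / ms_within

# Upper-tail probability of the F distribution
p_value <- pf(f_stat, df_between, df_within, lower.tail = FALSE)

cat("F =", round(f_stat, 2), "\n")
cat("p =", signif(p_value, 3), "\n")
```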

Step 6: Post-Hoc Tests

# YOUR TURN: Conduct Tukey HSD test
# Only if your ANOVA p-value < 0.05

tukey_results <- TukeyHSD(anova_model)
print(tukey_results)
##   Tukey multiple comparisons of means
##     95% family-wise confidence level
## 
## Fit: aov(formula = DaysMentHlthBad ~ activity_level, data = mental_health_data)
## 
## $activity_level
##                         diff       lwr        upr     p adj
## Moderate-None     -1.2725867 -2.045657 -0.4995169 0.0003386
## Vigorous-None     -1.5464873 -2.109345 -0.9836298 0.0000000
## Vigorous-Moderate -0.2739006 -1.098213  0.5504114 0.7159887
# Visualize the confidence intervals
plot(tukey_results, las = 0)

YOUR TURN - Complete the table:

| Comparison | Mean Difference | 95% CI Lower | 95% CI Upper | p-value | Significant? |
|---|---|---|---|---|---|
| Moderate-None | -1.27 | -2.05 | -0.50 | 0.0003 | Yes |
| Vigorous-None | -1.55 | -2.11 | -0.98 | < 0.001 | Yes |
| Vigorous-Moderate | -0.27 | -1.10 | 0.55 | 0.716 | No |

Interpretation:

Which specific groups differ significantly?

The None group differs significantly from both the Moderate and Vigorous groups: each active group reports fewer bad mental health days on average (about 1.3 and 1.5 fewer days, respectively). The Moderate and Vigorous groups do not differ significantly from each other.
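
The confidence intervals in the Tukey output can be reproduced by hand, which shows where the adjustment for multiple comparisons enters. The sketch below applies the Tukey-Kramer formula for unequal group sizes to the Moderate-None comparison. All inputs are copied from earlier output in this lab, and the residual mean square is rebuilt from the rounded Sum Sq, so small rounding differences are expected.

```r
# Inputs copied from earlier output in this lab
n        <- c(None = 3139, Moderate = 768, Vigorous = 1850)
df_resid <- 5754
mse      <- 386089 / df_resid    # residual mean square
k        <- length(n)            # number of groups

# Studentized-range critical value for a 95% family-wise confidence level
q_crit <- qtukey(0.95, nmeans = k, df = df_resid)

# Tukey-Kramer half-width for the Moderate vs. None comparison
se_diff    <- sqrt(mse / 2 * (1 / n[["Moderate"]] + 1 / n[["None"]]))
half_width <- q_crit * se_diff

diff_mn <- -1.2725867            # Moderate - None, from the Tukey output
ci <- c(lower = diff_mn - half_width, upper = diff_mn + half_width)
print(round(ci, 2))
```

The critical value from `qtukey()` is larger than the one an unadjusted t-interval would use, which is what keeps the family-wise confidence level at 95% across all three comparisons.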


Step 7: Calculate Effect Size

# YOUR TURN: Calculate eta-squared
# Hint: Extract Sum Sq from the ANOVA summary

anova_summary <- summary(anova_model)[[1]]

ss_treatment <- anova_summary$`Sum Sq`[1]
ss_total <- sum(anova_summary$`Sum Sq`)

# Calculate eta-squared
eta_squared <- ss_treatment / ss_total

cat("Eta-squared (η²):", round(eta_squared, 4), "\n")
## Eta-squared (η²): 0.008
cat("Percentage of variance explained:", round(eta_squared * 100, 2), "%")
## Percentage of variance explained: 0.8 %

YOUR TURN - Interpret:

  • η² = 0.008
  • Percentage of variance explained: 0.8%
  • Effect size classification (small/medium/large): Small (below Cohen's conventional benchmark of 0.01 for a small effect)
  • What does this mean practically?

Even though the ANOVA revealed a significant difference in the number of bad mental health days between physical activity groups, physical activity group does not explain much of the variance in bad mental health days.


Step 8: Check Assumptions

# YOUR TURN: Create diagnostic plots

par(mfrow = c(2, 2))
plot(anova_model)

par(mfrow = c(1, 1))

YOUR TURN - Evaluate each plot:

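  1. Residuals vs Fitted:

The residuals are not spread symmetrically around zero. In every group at least half of the residuals sit below zero (the observations equal to 0), with a long tail of large positive residuals. This reflects the strongly right-skewed outcome. The plot speaks to the shape and spread of the residuals, not to independence, which depends on how the data were collected and cannot be read from it.

  2. Q-Q Plot:

Points deviate substantially from the diagonal line, indicating that the assumption of normally distributed residuals is violated.

  3. Scale-Location:

The line is not flat and shows a clear upward trend, which suggests that the groups do not have equal variances.

  4. Residuals vs Leverage:

Some observations have large standardized residuals. However, in a one-way ANOVA each observation's leverage is 1/n for its group (at most 1/768 ≈ 0.0013 here), so Cook's distances are very small and no single observation substantially influences the fitted group means. These points are extreme values rather than influential ones.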

# YOUR TURN: Conduct Levene's test

levene_test <- leveneTest(DaysMentHlthBad ~ activity_level, data = mental_health_data)
print(levene_test)
## Levene's Test for Homogeneity of Variance (center = median)
##         Df F value    Pr(>F)    
## group    2  23.168 9.517e-11 ***
##       5754                      
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
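
The Levene F-statistic and p-value above are identical to those in the ANOVA table in Step 5. This is not a copy-paste error. With center = median, Levene's test is an ANOVA on the absolute deviations from each group's median. Every group median here is 0 and the outcome is never negative, so those deviations equal the raw values. The base-R sketch below illustrates the same coincidence on simulated, hypothetical data constructed so that more than half of each group is 0.

```r
set.seed(1)

# Simulated zero-heavy outcome (hypothetical data, not NHANES).
# More than half of each group is 0, so every group median is 0.
g <- factor(rep(c("A", "B", "C"), times = c(60, 40, 50)))
y <- c(rep(0, 40), sample(1:30, 20, replace = TRUE),   # group A
       rep(0, 25), sample(1:30, 15, replace = TRUE),   # group B
       rep(0, 30), sample(1:30, 20, replace = TRUE))   # group C

# Median-centered Levene test by hand:
# an ANOVA on absolute deviations from each group's median
group_median <- ave(y, g, FUN = median)
abs_dev      <- abs(y - group_median)

f_anova  <- anova(lm(y ~ g))$`F value`[1]
f_levene <- anova(lm(abs_dev ~ g))$`F value`[1]

cat("Group medians:", tapply(y, g, median), "\n")
cat("ANOVA F: ", f_anova, "\n")
cat("Levene F:", f_levene, "\n")
```

In this situation Levene's test is really re-testing the difference in means, so it is worth reading alongside the group SDs and the Scale-Location plot rather than on its own.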

YOUR TURN - Overall assessment:

  • Are assumptions reasonably met?

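No. Normality is clearly violated (Q-Q plot), and the equal-variance assumption is violated (Scale-Location plot; Levene's test, F = 23.17, p < 0.001). Independence cannot be judged from the diagnostic plots, but the repeated IDs visible in the head() output in Step 1 suggest it is also questionable.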

  • Do any violations threaten your conclusions?

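Probably not, but the large sample is only part of the reason. With N = 5757, the sampling distributions of the group means are approximately normal, so the non-normal residuals are unlikely to invalidate the F-test. A large sample does not correct for unequal variances or non-independent observations, however. Because the largest group (None) also has the largest SD, the classical F-test is, if anything, conservative here, so the overall conclusion is likely sound. It would still be prudent to confirm it with a method that does not assume equal variances, such as Welch's ANOVA.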
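
One way to see how much the violations matter is to rerun the comparison with methods that relax the assumptions. The sketch below does this on simulated, hypothetical data with unequal group sizes and spreads. It uses `oneway.test()` and `kruskal.test()` from base R's stats package, which are not part of the lab's guided workflow.

```r
set.seed(2)

# Simulated right-skewed groups with unequal sizes and spreads
# (hypothetical data, not NHANES)
g <- factor(rep(c("None", "Moderate", "Vigorous"), times = c(300, 80, 180)),
            levels = c("None", "Moderate", "Vigorous"))
y <- c(rexp(300, rate = 1 / 5),
       rexp(80,  rate = 1 / 4),
       rexp(180, rate = 1 / 3.5))

# Classical one-way ANOVA (assumes equal variances)
classic <- oneway.test(y ~ g, var.equal = TRUE)

# Welch's ANOVA (does not assume equal variances)
welch <- oneway.test(y ~ g, var.equal = FALSE)

# Kruskal-Wallis rank test (does not assume normality)
kw <- kruskal.test(y ~ g)

cat("Classical F p-value:", signif(classic$p.value, 3), "\n")
cat("Welch F p-value:    ", signif(welch$p.value, 3), "\n")
cat("Kruskal-Wallis p:   ", signif(kw$p.value, 3), "\n")
```

On the lab data the corresponding calls would be `oneway.test(DaysMentHlthBad ~ activity_level, data = mental_health_data, var.equal = FALSE)` and `kruskal.test(DaysMentHlthBad ~ activity_level, data = mental_health_data)`. If all three approaches agree, the assumption violations are unlikely to be driving the conclusion.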


Step 9: Write Up Results

YOUR TURN - Write a complete 2-3 paragraph results section:

Include:

  1. Sample description and descriptive statistics
  2. F-test results
  3. Post-hoc comparisons (if applicable)
  4. Effect size interpretation
  5. Public health significance

Your Results Section:

A one-way ANOVA was conducted on adults aged 18 and older from the NHANES dataset (N = 5757) to test whether the mean number of bad mental health days differed across three physical activity levels: None (n = 3139, M = 5.08, SD = 9.01), Moderate (n = 768, M = 3.81, SD = 6.87), and Vigorous (n = 1850, M = 3.54, SD = 7.17). The group means differed significantly, F(2, 5754) = 23.17, p < 0.001, η² = 0.008.

Post-hoc Tukey HSD tests showed that the Moderate group reported fewer bad mental health days than the None group (mean difference = -1.27, 95% CI [-2.05, -0.50], p < 0.001), as did the Vigorous group (mean difference = -1.55, 95% CI [-2.11, -0.98], p < 0.001). The Moderate and Vigorous groups did not differ significantly (mean difference = -0.27, 95% CI [-1.10, 0.55], p = 0.72).

Although the results were statistically significant, the small effect size (η² = 0.008) indicates that physical activity level explains only 0.8% of the variance in the number of bad mental health days. From a public health standpoint, physical activity alone accounts for little of the variation in this outcome, and factors not examined here, such as diet, socioeconomic status, and stressful life events, may be more important contributors. Because these data are cross-sectional, the association should not be read as causal.


Reflection Questions

1. How does the effect size help you understand the practical vs. statistical significance?

The effect size shows how much the groups differ, whereas the p-value only indicates whether a difference is statistically significant. Here, η² gives the proportion of variance in the outcome that is explained by the predictor. A result can be statistically significant, especially in a large sample, yet have a small effect size, meaning the predictor explains little of the variance in the outcome. In that case the predictor alone likely has limited practical significance.

2. Why is it important to check ANOVA assumptions? What might happen if they’re violated?

It is important to check assumptions because the validity of the F-test's p-value and of the post-hoc confidence intervals depends on them. If they are violated, particularly when the sample is small or the group sizes are unequal, the test may reject too often or too rarely, and the results may give an inaccurate picture of group differences.

3. In public health practice, when might you choose to use ANOVA?

An ANOVA might be used to determine whether the mean of a continuous health outcome differs across three or more levels of an exposure.

4. What was the most challenging part of this lab activity?

For me, the most challenging part of this activity was interpreting the diagnostic tests.


Submission Checklist

Before submitting, verify you have:

To submit: Upload both your .Rmd file and the HTML output to Brightspace.


Lab completed on: February 03, 2026


GRADING RUBRIC (For TA Use)

Total Points: 15

| Category | Criteria | Points | Notes |
|---|---|---|---|
| Code Execution | All code chunks run without errors | 4 | Deduct 1 pt per major error; deduct 0.5 pt per minor warning |
| Completion | All “YOUR TURN” sections attempted | 4 | Part B Steps 1-9 completed; all fill-in-the-blank answered; Tukey table filled in |
| Interpretation | Correct statistical interpretation | 4 | Hypotheses correctly stated (1 pt); ANOVA results interpreted (1 pt); post-hoc results interpreted (1 pt); assumptions evaluated (1 pt) |
| Results Section | Professional, complete write-up | 3 | Includes descriptive stats (1 pt); reports F-test & post-hoc (1 pt); effect size & significance (1 pt) |

Detailed Grading Guidelines

Code Execution (4 points):

  • 4 pts: All code runs perfectly, produces correct output
  • 3 pts: Minor issues (1-2 small errors or warnings)
  • 2 pts: Several errors but demonstrates understanding
  • 1 pt: Major errors, incomplete code
  • 0 pts: Code does not run at all

Completion (4 points):

  • 4 pts: All sections attempted thoughtfully
  • 3 pts: 1-2 sections incomplete or minimal effort
  • 2 pts: Several sections missing
  • 1 pt: Only partial completion
  • 0 pts: Little to no work completed

Interpretation (4 points):

  • 4 pts: All interpretations correct and well-explained
  • 3 pts: Minor errors in interpretation
  • 2 pts: Several interpretation errors
  • 1 pt: Significant misunderstanding of concepts
  • 0 pts: No interpretation provided

Results Section (3 points):

  • 3 pts: Publication-quality, complete results section
  • 2 pts: Good but missing some elements
  • 1 pt: Incomplete or poorly written
  • 0 pts: No results section written

Common Deductions

  • -0.5 pts: Missing sample sizes in write-up
  • -0.5 pts: Not reporting confidence intervals
  • -1 pt: Incorrect hypothesis statements
  • -1 pt: Misinterpreting p-values
  • -1 pt: Not checking assumptions
  • -0.5 pts: Poor formatting (no tables, unclear output)