# Load necessary libraries
library(tidyverse) # For data manipulation and visualization
library(knitr) # For nice tables
library(car) # For Levene's test
library(NHANES) # NHANES dataset
library(dplyr)
# Load the NHANES data
data(NHANES)
Your Task: Complete the same 9-step analysis workflow you just practiced, but now on a different outcome and predictor.
# Prepare the dataset
set.seed(553)
mental_health_data <- NHANES %>%
filter(Age >= 18) %>%
filter(!is.na(DaysMentHlthBad) & !is.na(PhysActive)) %>%
mutate(
activity_level = case_when(
PhysActive == "No" ~ "None",
PhysActive == "Yes" & !is.na(PhysActiveDays) & PhysActiveDays < 3 ~ "Moderate",
PhysActive == "Yes" & !is.na(PhysActiveDays) & PhysActiveDays >= 3 ~ "Vigorous",
TRUE ~ NA_character_
),
activity_level = factor(activity_level,
levels = c("None", "Moderate", "Vigorous"))
) %>%
filter(!is.na(activity_level)) %>%
select(ID, Age, Gender, DaysMentHlthBad, PhysActive, activity_level)
# YOUR TURN: Display the first 6 rows and check sample sizes
head(mental_health_data) %>%
kable(caption = "Poor Mental Health and Physical Activity Dataset (first 6 rows)")
| ID | Age | Gender | DaysMentHlthBad | PhysActive | activity_level |
|---|---|---|---|---|---|
| 51624 | 34 | male | 15 | No | None |
| 51624 | 34 | male | 15 | No | None |
| 51624 | 34 | male | 15 | No | None |
| 51630 | 49 | female | 10 | No | None |
| 51647 | 45 | female | 3 | Yes | Vigorous |
| 51647 | 45 | female | 3 | Yes | Vigorous |
# Check sample sizes
table(mental_health_data$activity_level)
##
## None Moderate Vigorous
## 3139 768 1850
YOUR TURN - Answer these questions:
# YOUR TURN: Calculate summary statistics by activity level
summary_stats <- mental_health_data %>%
group_by(activity_level) %>%
summarise(
n = n(),
Mean = mean(DaysMentHlthBad),
SD = sd(DaysMentHlthBad),
Median = median(DaysMentHlthBad),
Min = min(DaysMentHlthBad),
Max = max(DaysMentHlthBad)
)
summary_stats %>%
kable(digits = 2,
caption = "Descriptive Statistics: Poor Mental Health by Physical Activity level")
| activity_level | n | Mean | SD | Median | Min | Max |
|---|---|---|---|---|---|---|
| None | 3139 | 5.08 | 9.01 | 0 | 0 | 30 |
| Moderate | 768 | 3.81 | 6.87 | 0 | 0 | 30 |
| Vigorous | 1850 | 3.54 | 7.17 | 0 | 0 | 30 |
YOUR TURN - Interpret:
Which group has the highest mean number of bad mental health days? Answer: Group which had None physical activity
Which group has the lowest? Answer: Vigorous group
# YOUR TURN: Create boxplots comparing DaysMentHlthBad across activity levels
ggplot(mental_health_data,
aes(x = activity_level, y = DaysMentHlthBad, fill = activity_level)) +
geom_boxplot(alpha = 0.7, outlier.shape = NA) +
geom_jitter(width = 0.2, alpha = 0.1, size = 0.5) +
scale_fill_brewer(palette = "Set2") +
labs(
title = "Poor Mental Health Days by Physical Activity Level",
subtitle = "NHANES Data, Adults aged 18-65",
x = "Physical Activity Level",
y = "Poor Mental Health Days",
fill = "Activity level"
) +
theme_minimal(base_size = 12) +
theme(legend.position = "none")
# Hint: Use the same ggplot code structure as the example
# Change variable names and labels appropriately
YOUR TURN - Describe what you see:
All three groups have the same median: 0 poor mental health days.That means most people do not have mental health issues.
However, Q3 is highest among those with none physical activity, lower in the moderate activity level groups and lowest among the vigorous activity levels. This means those with vigorous activity level tend to have a fewer poor-mental-helath days.
The boxplots indicate greater variability in poor mental-health days among adults reporting “None” physical activity, as evidenced by a larger interquartile range and wider spread of values compared to the “Moderate” and “Vigorous” groups.
Null Hypothesis (H₀): μ_none = μ_moderate =
μ_vigirous
(All three population means are equal)
Alternative Hypothesis (H₁): At least one population mean differs from the others
Significance level: α = 0.05
# Fit the one-way ANOVA model
anova_model <- aov(DaysMentHlthBad ~ activity_level, data = mental_health_data)
# Display the ANOVA table
summary(anova_model)
## Df Sum Sq Mean Sq F value Pr(>F)
## activity_level 2 3109 1554.6 23.17 9.52e-11 ***
## Residuals 5754 386089 67.1
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
Interpretation:
YOUR TURN - Extract and interpret the results:
Why do we need this? The F-test tells us that groups differ, but not which groups differ. Tukey’s Honest Significant Difference controls the family-wise error rate for multiple pairwise comparisons.
# Conduct Tukey HSD test
tukey_results <- TukeyHSD(anova_model)
print(tukey_results)
## Tukey multiple comparisons of means
## 95% family-wise confidence level
##
## Fit: aov(formula = DaysMentHlthBad ~ activity_level, data = mental_health_data)
##
## $activity_level
## diff lwr upr p adj
## Moderate-None -1.2725867 -2.045657 -0.4995169 0.0003386
## Vigorous-None -1.5464873 -2.109345 -0.9836298 0.0000000
## Vigorous-Moderate -0.2739006 -1.098213 0.5504114 0.7159887
# Visualize the confidence intervals
plot(tukey_results, las = 0)
YOUR TURN - Complete the table:
| Comparison | Mean Difference | 95% CI Lower | 95% CI Upper | p-value | Significant? |
|---|---|---|---|---|---|
| Moderate - None | -1.2725867 | -2.045657 | -0.4995169 | 0.0003386 | Yes |
| Vigorous - None | -1.5464873 | -2.109345 | -0.9836298 | 0.0000000 | Yes |
| Vigorous - Moderate | -0.2739006 | -1.098213 | 0.5504114 | 0.7159887 | No |
Interpretation: Answer: Based on the Tukey-adjusted 95% confidence intervals, both the moderate- and vigorous- activity groups differed significantly from the no-activity group because their intervals excluded zero, whereas the confidence interval for the vigorous-moderate comparison included zero, indicating no significant difference.
# YOUR TURN: Calculate eta-squared
# Hint: Extract Sum Sq from the ANOVA summary
# Extract sum of squares from ANOVA table
anova_summary <- summary(anova_model)[[1]]
ss_treatment <- anova_summary$`Sum Sq`[1]
ss_total <- sum(anova_summary$`Sum Sq`)
# Calculate eta-squared
eta_squared <- ss_treatment / ss_total
cat("Eta-squared (η²):", round(eta_squared, 4), "\n")
## Eta-squared (η²): 0.008
cat("Percentage of variance explained:", round(eta_squared * 100, 2), "%")
## Percentage of variance explained: 0.8 %
YOUR TURN - Interpret:
Only 0.8% of the variance in poor mental health days is explained by physical activity level 99.2% of the variance is due to other factors (genetics, stress, socioeconomic status, sleep, diet, social support, etc.)
Effective size indicate a small portion of the variance.
Individual Level: Physical activity is a very small piece of the mental health puzzle. You can’t predict someone’s mental health well just by knowing their activity level.
Population Level: Even small effects matter when applied to millions of people. A difference of 1.5 fewer poor mental health days per month across a population is still meaningful for public health.
# YOUR TURN: Create diagnostic plots
par(mfrow = c(2, 2))
plot(anova_model)
par(mfrow = c(1, 1))
YOUR TURN - Evaluate each plot:
Residuals vs Fitted: Points are clustered at a few fitted values with no obvious curved pattern and the smoothing line is roughly flat, suggesting the linearity assumption is generally reasonable.
Q-Q Plot: Plot revealed pronounced departures from the reference line, particularly in the tails of the distribution, indicating that the residuals deviated substantially from normality, hence skewed to the right.
Scale-Location: Plot displayed relatively constant variability of standardized residuals across fitted values, with no clear increasing or decreasing trend, suggesting that the assumption of constant variance is mostly satisfied.
Residuals vs Leverage: Plot indicated that all observations fall within the Cook’s distance boundaries and show low leverage, meaning no single data point is unduly influencing the model.
# YOUR TURN: Conduct Levene's test
levene_test <- leveneTest(DaysMentHlthBad ~ activity_level, data = mental_health_data)
print(levene_test)
## Levene's Test for Homogeneity of Variance (center = median)
## Df F value Pr(>F)
## group 2 23.168 9.517e-11 ***
## 5754
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
YOUR TURN - Overall assessment:
YOUR TURN - Write a complete 2-3 paragraph results section:
Include: 1. Sample description and descriptive statistics 2. F-test results 3. Post-hoc comparisons (if applicable) 4. Effect size interpretation 5. Public health significance
Your Results Section:
The study analyzed 5,757 adults with complete data on physical activity and mental health outcomes. The sample included three groups based on physical activity level: None (n = 3,139, 54.5%), Moderate (n = 768, 13.3%), and Vigorous (n = 1,850, 32.1%). Participants with no physical activity reported an average of 5.08 days (SD = 9.01) of poor mental health in the past 30 days, compared to 3.81 days (SD = 6.87) for moderate activity and 3.54 days (SD = 7.17) for vigorous activity. Although Levene’s test indicated unequal variances across groups (F(2, 5754) = 23.17, p < .001), the ANOVA was conducted given the robustness of the F-test to this violation with large sample sizes.
The one-way ANOVA revealed a statistically significant effect of physical activity level on poor mental health days, F(2, 5754) = 23.17, p < .001, η² = 0.008. Tukey’s HSD post-hoc tests showed that adults with no physical activity reported significantly more poor mental health days than those with moderate activity (mean difference = 1.27 days, 95% CI [0.50, 2.05], p < .001) and vigorous activity (mean difference = 1.55 days, 95% CI [0.98, 2.11], p < .001). However, no significant difference emerged between moderate and vigorous activity levels (mean difference = 0.27 days, 95% CI [-0.55, 1.10], p = .72). The effect size was small, with physical activity level accounting for less than 1% of the variance in poor mental health days. Despite the modest effect size, the absolute difference of approximately 1.5 fewer poor mental health days per month among physically active adults represents a meaningful public health benefit at the population level, suggesting that promoting any level of physical activity may contribute to improved mental health outcomes.
1. How does the effect size help you understand the practical vs. statistical significance?
Effect size tells me whether a statistically significant finding actually matters in the real world. In my analysis, I found p < .001 (statistically significant), but η² = 0.008 means physical activity only explains 0.8% of mental health variance. This shows me that while the effect is real, it’s very small - physical activity is just one tiny piece of the mental health puzzle. Without effect size, I might have overinterpreted my significant p-value and thought exercise was way more important than it actually is for individual mental health outcomes.
2. Why is it important to check ANOVA assumptions? What might happen if they’re violated?
Checking assumptions ensures the validity of ANOVA results. Violated independence increases Type I error rates. Violated homogeneity of variance affects accuracy of p-values and confidence intervals. Violated normality impacts the F-test, though large samples provide robustness. In this analysis, the large sample size (n = 5,757) mitigated concerns about heterogeneity and non-normality, but duplicate IDs posed a more serious threat to independence.
3. In public health practice, when might you choose to use ANOVA?
I would use ANOVA when comparing a continuous health outcome across three or more groups. For example: comparing blood pressure across different diet interventions (Mediterranean, low-carb, standard), testing if disease rates differ across multiple geographic regions, evaluating if health education programs at different intensity levels affect knowledge scores differently, or like in my study, comparing mental health days across physical activity levels. ANOVA is particularly useful in public health because we often want to compare multiple intervention strategies or exposure levels at once, rather than just two groups.
4. What was the most challenging part of this lab activity?
Interpreting the discrepancy between statistical significance (p < .001) and practical significance (η² = 0.008) required careful consideration of both individual-level and population-level implications. Additionally, determining which assumption violations were acceptable given the large sample size versus which threatened validity required applying statistical judgment rather than following fixed rules.
Before submitting, verify you have:
To submit: Upload both your .Rmd file and the HTML output to Brightspace.
Lab completed on: February 05, 2026
Total Points: 15
| Category | Criteria | Points | Notes |
|---|---|---|---|
| Code Execution | All code chunks run without errors | 4 | - Deduct 1 pt per major error - Deduct 0.5 pt per minor warning |
| Completion | All “YOUR TURN” sections attempted | 4 | - Part B Steps 1-9 completed - All fill-in-the-blank answered - Tukey table filled in |
| Interpretation | Correct statistical interpretation | 4 | - Hypotheses correctly stated (1 pt) - ANOVA results interpreted (1 pt) - Post-hoc results interpreted (1 pt) - Assumptions evaluated (1 pt) |
| Results Section | Professional, complete write-up | 3 | - Includes descriptive stats (1 pt) - Reports F-test & post-hoc (1 pt) - Effect size & significance (1 pt) |
Code Execution (4 points):
Completion (4 points):
Interpretation (4 points):
Results Section (3 points):