knitr::opts_chunk$set(
  echo = TRUE,
  warning = FALSE,
  message = FALSE,
  fig.align = 'center',
  fig.width = 8,
  fig.height = 6
)

Step 1: Setup and Data Preparation

# Load necessary libraries
library(tidyverse)   # For data manipulation and visualization
library(knitr)       # For nice tables
library(car)         # For Levene's test
library(NHANES)      # NHANES dataset

# Load the NHANES data
data(NHANES)

PART B: YOUR TURN - INDEPENDENT PRACTICE

Practice Problem: Physical Activity and Depression

Research Question: Is there a difference in the number of days with poor mental health across three physical activity levels (None, Moderate, Vigorous)?

Your Task: Complete the same 9-step analysis workflow you just practiced, but now on a different outcome and predictor.

Step 1: Data Preparation

# Prepare the dataset
set.seed(553)

mental_health_data <- NHANES %>%
  filter(Age >= 18) %>%
  filter(!is.na(DaysMentHlthBad) & !is.na(PhysActive)) %>%
  mutate(
    activity_level = case_when(
      PhysActive == "No" ~ "None",
      PhysActive == "Yes" & !is.na(PhysActiveDays) & PhysActiveDays < 3 ~ "Moderate",
      PhysActive == "Yes" & !is.na(PhysActiveDays) & PhysActiveDays >= 3 ~ "Vigorous",
      TRUE ~ NA_character_
    ),
    activity_level = factor(activity_level, 
                           levels = c("None", "Moderate", "Vigorous"))
  ) %>%
  filter(!is.na(activity_level)) %>%
  select(ID, Age, Gender, DaysMentHlthBad, PhysActive, activity_level)

# YOUR TURN: Display the first 6 rows and check sample sizes
head(mental_health_data) %>% 
  kable(caption = "")

ID	Age	Gender	DaysMentHlthBad	PhysActive	activity_level
51624	34	male	15	No	None
51624	34	male	15	No	None
51624	34	male	15	No	None
51630	49	female	10	No	None
51647	45	female	3	Yes	Vigorous
51647	45	female	3	Yes	Vigorous

# Check sample sizes
table(mental_health_data$activity_level)

## 
##     None Moderate Vigorous 
##     3139      768     1850

YOUR TURN - Answer these questions:

How many people are in each physical activity group?
- None: 3139
- Moderate: 768
- Vigorous: 1850

Step 2: Descriptive Statistics

# YOUR TURN: Calculate summary statistics by activity level
# Hint: Follow the same structure as the guided example
# Variables to summarize: n, Mean, SD, Median, Min, Max

# Calculate summary statistics by PhysActive (binary Yes/No)
summary_stats <- mental_health_data %>%
  group_by(PhysActive) %>%
  summarise(
    n = n(),
    Mean = mean(DaysMentHlthBad),
    SD = sd(DaysMentHlthBad),
    Median = median(DaysMentHlthBad),
    Min = min(DaysMentHlthBad),
    Max = max(DaysMentHlthBad)
  )

summary_stats %>% 
  kable(digits = 2, 
        caption = "Descriptive Statistics: Bad Mental Health Days by Physical Activity")

Descriptive Statistics: Bad Mental Health Days by Physical Activity
PhysActive	n	Mean	SD	Median	Min	Max
No	3139	5.08	9.01	0	0	30
Yes	2618	3.62	7.09	0	0	30

YOUR TURN - Interpret:

Which group has the highest mean number of bad mental health days? People who are not physically active have the highest mean number of bad mental health days.
Which group has the lowest? People who are physically active.
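
The table above groups by the binary `PhysActive` variable, so it has two rows rather than the three activity levels named in the research question. A minimal sketch of the three-group version follows, assuming the same column names as `mental_health_data`. The `toy_data` values are invented so that the block runs on its own.

```r
library(dplyr)

# Hypothetical toy data standing in for mental_health_data (values are invented)
toy_data <- data.frame(
  activity_level = factor(
    rep(c("None", "Moderate", "Vigorous"), each = 4),
    levels = c("None", "Moderate", "Vigorous")
  ),
  DaysMentHlthBad = c(0, 0, 10, 30, 0, 2, 4, 6, 0, 0, 1, 3)
)

# Same summary as above, but grouped by the three-level factor
summary_by_level <- toy_data %>%
  group_by(activity_level) %>%
  summarise(
    n = n(),
    Mean = mean(DaysMentHlthBad),
    SD = sd(DaysMentHlthBad),
    Median = median(DaysMentHlthBad),
    Min = min(DaysMentHlthBad),
    Max = max(DaysMentHlthBad),
    .groups = "drop"
  )

print(summary_by_level)
```

Replacing `toy_data` with `mental_health_data` should give one row per activity level, which lines up with the groups compared in the ANOVA.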

Step 3: Visualization

# YOUR TURN: Create boxplots comparing DaysMentHlthBad across activity levels
# Hint: Use the same ggplot code structure as the example
# Change variable names and labels appropriately

# Create boxplots with individual points
ggplot(mental_health_data, 
  aes(x = activity_level, y = DaysMentHlthBad, fill = activity_level)) +
  geom_boxplot(alpha = 0.7, outlier.shape = NA) +
  geom_jitter(width = 0.2, alpha = 0.1, size = 0.5) +
  scale_fill_brewer(palette = "Set2") +
  labs(
    title = "Bad mental health days by Activity Level Category",
    subtitle = "NHANES Data, Adults aged 18 and older",
    x = "Activity Level Category",
    y = "Bad Mental Health Days",
    fill = "Activity Level Category"
  ) +
  theme_minimal(base_size = 12) +
  theme(legend.position = "none")

YOUR TURN - Describe what you see:

Do the groups appear to differ? Yes, people with a vigorous activity level have fewer bad mental health days.
Are the variances similar across groups? Not entirely: in Step 2 the SD is larger among inactive adults (9.01) than among active adults (7.09), so the None group shows more spread.

Step 4: Set Up Hypotheses

YOUR TURN - Write the hypotheses:

Null Hypothesis (H₀): μ_none = μ_moderate = μ_vigorous

Alternative Hypothesis (H₁): At least one group mean differs from the others

Significance level: α = 0.05

Step 5: Fit the ANOVA Model

# YOUR TURN: Fit the ANOVA model
# Outcome: DaysMentHlthBad
# Predictor: activity_level

# Fit the one-way ANOVA model
anova_model <- aov(DaysMentHlthBad ~ activity_level, data = mental_health_data)

# Display the ANOVA table
summary(anova_model)

##                  Df Sum Sq Mean Sq F value   Pr(>F)    
## activity_level    2   3109  1554.6   23.17 9.52e-11 ***
## Residuals      5754 386089    67.1                     
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

YOUR TURN - Extract and interpret the results:

F-statistic: 23.2
Degrees of freedom: 2 (between groups) and 5754 (residual)
p-value: 9.5e-11
Decision (reject or fail to reject H₀): Reject the null hypothesis
Statistical conclusion in words: There is strong statistical evidence that at least one group mean differs from the others. An F-statistic of 23.2 on 2 and 5754 degrees of freedom (p < 0.001) is far larger than would be expected if all three population means were equal.
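
As a cross-check, the F-statistic and p-value can be rebuilt by hand from the sums of squares and degrees of freedom printed in the ANOVA table. This is a sketch using the rounded values shown above, so the result agrees with the table only to about two decimal places.

```r
# Values copied from the printed ANOVA table (Sum Sq is rounded there)
ss_between <- 3109
ss_within  <- 386089
df_between <- 2      # k - 1, with k = 3 groups
df_within  <- 5754   # N - k, with N = 5757

# Mean squares are sums of squares divided by their degrees of freedom
ms_between <- ss_between / df_between
ms_within  <- ss_within / df_within

# F is the ratio of between-group to within-group mean squares
f_stat <- ms_between / ms_within

# Upper-tail probability of the F distribution gives the p-value
p_value <- pf(f_stat, df_between, df_within, lower.tail = FALSE)

cat("F =", round(f_stat, 2), "\n")
cat("p =", signif(p_value, 2), "\n")
```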

Step 6: Post-Hoc Tests

# YOUR TURN: Conduct Tukey HSD test
# Only if your ANOVA p-value < 0.05

# Conduct Tukey HSD test
tukey_results <- TukeyHSD(anova_model)
print(tukey_results)

##   Tukey multiple comparisons of means
##     95% family-wise confidence level
## 
## Fit: aov(formula = DaysMentHlthBad ~ activity_level, data = mental_health_data)
## 
## $activity_level
##                         diff       lwr        upr     p adj
## Moderate-None     -1.2725867 -2.045657 -0.4995169 0.0003386
## Vigorous-None     -1.5464873 -2.109345 -0.9836298 0.0000000
## Vigorous-Moderate -0.2739006 -1.098213  0.5504114 0.7159887

# Visualize the confidence intervals
plot(tukey_results, las = 0)

YOUR TURN - Complete the table:

Comparison	Mean Difference	95% CI Lower	95% CI Upper	p-value	Significant?
Moderate - None	-1.27	-2.05	-0.50	0.0003	Yes
Vigorous - None	-1.55	-2.11	-0.98	< 0.0001	Yes
Vigorous - Moderate	-0.27	-1.10	0.55	0.7160	No

Interpretation:

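Which specific groups differ significantly? Both the Moderate and Vigorous groups have significantly fewer bad mental health days than the None group (about 1.3 and 1.5 fewer days, respectively). The Moderate and Vigorous groups do not differ significantly from each other (p = 0.716).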
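
To see where an adjusted p-value and interval in the Tukey output come from, the Moderate vs. None comparison can be reproduced with the Tukey-Kramer formula and base R's studentized range functions. This is a sketch that plugs in numbers copied from the output above; small differences from the printed values are possible because the sums of squares are rounded.

```r
# Values copied from the output above
mse        <- 386089 / 5754   # residual mean square
df_within  <- 5754
k          <- 3               # number of groups
n_none     <- 3139
n_moderate <- 768
diff_mod_none <- -1.2725867   # Moderate - None

# Tukey-Kramer standard error for unequal group sizes
se <- sqrt((mse / 2) * (1 / n_none + 1 / n_moderate))

# Studentized range statistic and adjusted p-value
q_stat <- abs(diff_mod_none) / se
p_adj  <- ptukey(q_stat, nmeans = k, df = df_within, lower.tail = FALSE)

# Half-width of the 95% family-wise confidence interval
half_width <- qtukey(0.95, nmeans = k, df = df_within) * se

cat("p adj =", signif(p_adj, 3), "\n")
cat("95% CI:", round(diff_mod_none - half_width, 2), "to",
    round(diff_mod_none + half_width, 2), "\n")
```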

Step 7: Calculate Effect Size

# YOUR TURN: Calculate eta-squared
# Hint: Extract Sum Sq from the ANOVA summary

# Extract sum of squares from ANOVA table
anova_summary <- summary(anova_model)[[1]]

ss_treatment <- anova_summary$`Sum Sq`[1]
ss_total <- sum(anova_summary$`Sum Sq`)

# Calculate eta-squared
eta_squared <- ss_treatment / ss_total

cat("Eta-squared (η²):", round(eta_squared, 4), "\n")

## Eta-squared (η²): 0.008

cat("Percentage of variance explained:", round(eta_squared * 100, 2), "%")

## Percentage of variance explained: 0.8 %

YOUR TURN - Interpret:

η² = 0.008
Percentage of variance explained: 0.8%
Effect size classification (small/medium/large): Small
What does this mean practically? Activity level accounts for less than 1% of the variation in bad mental health days; the remaining 99.2% is due to other factors. The difference is statistically significant, but the practical effect is minor: activity level alone explains very little of why people differ in their bad mental health days.
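
Eta-squared can also be recovered from the F-statistic and its degrees of freedom, which is handy when a report gives F but not the sums of squares. The sketch below uses the rounded values from the ANOVA table in Step 5 and compares the two routes.

```r
# Values copied from the ANOVA table in Step 5
f_stat     <- 23.17
df_between <- 2
df_within  <- 5754

# Eta-squared from F: (F * df1) / (F * df1 + df2)
eta_sq_from_f <- (f_stat * df_between) / (f_stat * df_between + df_within)

# Eta-squared from sums of squares: SS_between / SS_total
eta_sq_from_ss <- 3109 / (3109 + 386089)

cat("From F: ", round(eta_sq_from_f, 4), "\n")
cat("From SS:", round(eta_sq_from_ss, 4), "\n")
```

Both routes should agree to within rounding error, since F is itself a ratio of the same sums of squares.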

Step 8: Check Assumptions

# YOUR TURN: Create diagnostic plots

# Create diagnostic plots
par(mfrow = c(2, 2))
plot(anova_model)

par(mfrow = c(1, 1))

YOUR TURN - Evaluate each plot:

Residuals vs Fitted: Points randomly scattered around zero, no clear pattern → Assumptions met.
Q-Q Plot: Points follow the diagonal with a slight upward tail deviation → Normality is reasonable.
Scale-Location: Spread appears roughly constant, no trend → consistent with homoscedasticity, although Levene's test below rejects equal variances.
Residuals vs Leverage: All points inside Cook’s distance lines → No influential outliers.

# YOUR TURN: Conduct Levene's test

# Levene's test for homogeneity of variance
levene_test <- leveneTest(DaysMentHlthBad ~ activity_level, data = mental_health_data)
print(levene_test)

## Levene's Test for Homogeneity of Variance (center = median)
##         Df F value    Pr(>F)    
## group    2  23.168 9.517e-11 ***
##       5754                      
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
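
The Levene statistic above (23.168) matches the ANOVA F from Step 5. One possible explanation, assuming the median of `DaysMentHlthBad` is 0 in every activity group: median-centered Levene's test is a one-way ANOVA on the absolute deviations from each group's median, and when every median is 0 and the outcome is non-negative, those deviations equal the outcome itself. The sketch below illustrates the idea with invented data in base R. It reproduces the logic of the test rather than calling `leveneTest()`.

```r
# Invented non-negative data in which every group median is 0
g <- factor(rep(c("None", "Moderate", "Vigorous"), each = 50),
            levels = c("None", "Moderate", "Vigorous"))
y <- c(rep(0, 30), 1:20,    # None
       rep(0, 35), 1:15,    # Moderate
       rep(0, 40), 1:10)    # Vigorous

# Absolute deviation of each observation from its own group median
abs_dev <- abs(y - ave(y, g, FUN = median))

# F statistic of a one-way ANOVA on the raw outcome
f_anova <- summary(aov(y ~ g))[[1]][["F value"]][1]

# F statistic of a one-way ANOVA on the absolute deviations
f_levene <- summary(aov(abs_dev ~ g))[[1]][["F value"]][1]

cat("ANOVA F:       ", round(f_anova, 3), "\n")
cat("Levene-style F:", round(f_levene, 3), "\n")
```

If this is what happened in the lab data, the Levene result is picking up the same group differences as the ANOVA, so it is worth comparing the group SDs directly as well.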

YOUR TURN - Overall assessment:

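Are assumptions reasonably met? Only partly. The diagnostic plots show no influential points, but Levene's test is significant (F(2, 5754) = 23.17, p < 0.001), so the equal-variance assumption is not supported. The outcome is also right-skewed (median 0, maximum 30).
Do any violations threaten your conclusions? Probably not the overall conclusion: with n = 5,757 the F-test is fairly robust to non-normality. Because the variances and group sizes are unequal, though, the result should be confirmed with a method that does not assume equal variances.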
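
When Levene's test rejects equal variances, two common robustness checks in base R are Welch's one-way ANOVA (`oneway.test()` with `var.equal = FALSE`) and the Kruskal-Wallis rank-sum test (`kruskal.test()`). The sketch below runs both on invented data; the data frame and its `days` column are hypothetical stand-ins, not the lab results.

```r
set.seed(553)

# Invented data: three groups with unequal sizes and unequal spread
toy_data <- data.frame(
  activity_level = factor(
    rep(c("None", "Moderate", "Vigorous"), times = c(60, 20, 40)),
    levels = c("None", "Moderate", "Vigorous")
  ),
  days = c(rpois(60, 5), rpois(20, 4), rpois(40, 3))
)

# Welch's one-way ANOVA: does not assume equal variances
welch_result <- oneway.test(days ~ activity_level, data = toy_data,
                            var.equal = FALSE)
print(welch_result)

# Kruskal-Wallis rank-sum test: does not assume normality
kw_result <- kruskal.test(days ~ activity_level, data = toy_data)
print(kw_result)
```

For the lab data, the same calls would use `DaysMentHlthBad ~ activity_level` with `data = mental_health_data`. If both tests agree with the classical ANOVA, the conclusion is less likely to depend on the violated assumptions.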

Step 9: Write Up Results

YOUR TURN - Write a complete 2-3 paragraph results section:

Include: 1. Sample description and descriptive statistics 2. F-test results 3. Post-hoc comparisons (if applicable) 4. Effect size interpretation 5. Public health significance

Your Results Section:

A one-way ANOVA was conducted to examine whether the number of poor mental health days reported in the past month differed by physical activity level. The analysis included 5,757 adults aged 18 and older, categorized into three activity groups: None (n = 3,139), Moderate (n = 768), and Vigorous (n = 1,850). Mean poor mental health days were higher among inactive adults (None: M = 5.08, SD = 9.01) than among adults reporting any physical activity (M = 3.62, SD = 7.09); the median was 0 in both cases.

The ANOVA revealed a statistically significant difference in mean poor mental health days across activity levels, F(2, 5754) = 23.17, p < 0.001, with a small effect size (η² = 0.008). Post-hoc Tukey tests showed that both the Moderate (mean difference = -1.27 days, 95% CI: -2.05 to -0.50, p = 0.0003) and Vigorous (mean difference = -1.55 days, 95% CI: -2.11 to -0.98, p < 0.001) groups reported significantly fewer poor mental health days than the None group. No significant difference was found between the Moderate and Vigorous groups (mean difference = -0.27 days, 95% CI: -1.10 to 0.55, p = 0.716).

These findings suggest that any level of physical activity is associated with fewer poor mental health days compared to inactivity. The effect size is small, and because NHANES is cross-sectional the association cannot be read as causal. Even so, a difference of roughly 1.3 to 1.5 days per month may matter at the population level, and encouraging even moderate activity could contribute to better mental well-being.

Reflection Questions

1. How does the effect size help you understand the practical vs. statistical significance?

The effect size (η² = 0.008) indicates a small practical impact, meaning that while activity level is statistically significant, it explains only a small portion of the variance in mental health days. This highlights the need to consider other factors in public health planning.

2. Why is it important to check ANOVA assumptions? What might happen if they’re violated?

Checking assumptions ensures the validity of the F-test and post-hoc results. Violations—such as unequal variances or non-normality—can increase Type I or Type II error rates, leading to unreliable conclusions.

3. In public health practice, when might you choose to use ANOVA?

ANOVA is useful when comparing means across three or more groups—such as evaluating intervention effectiveness across different dosage levels, assessing health outcomes by socioeconomic categories, or comparing regional health indicators.

4. What was the most challenging part of this lab activity?

Fixing the code errors was the most difficult part of this lab activity. I had to delete and reinstall ggplot2, which was causing most of the errors.

Submission Checklist

Before submitting, verify you have:

Completed all code chunks in Part B
Filled in all interpretation questions
Completed the Tukey HSD comparison table
Written a complete results section
Answered all reflection questions
Rendered your .Rmd file to HTML without errors
Checked that all outputs display correctly

To submit: Upload both your .Rmd file and the HTML output to Brightspace.

Lab completed on: February 05, 2026

GRADING RUBRIC (For TA Use)

Total Points: 15

Category	Criteria	Points	Notes
Code Execution	All code chunks run without errors	4	- Deduct 1 pt per major error - Deduct 0.5 pt per minor warning
Completion	All “YOUR TURN” sections attempted	4	- Part B Steps 1-9 completed - All fill-in-the-blank answered - Tukey table filled in
Interpretation	Correct statistical interpretation	4	- Hypotheses correctly stated (1 pt) - ANOVA results interpreted (1 pt) - Post-hoc results interpreted (1 pt) - Assumptions evaluated (1 pt)
Results Section	Professional, complete write-up	3	- Includes descriptive stats (1 pt) - Reports F-test & post-hoc (1 pt) - Effect size & significance (1 pt)

Detailed Grading Guidelines

Code Execution (4 points):

4 pts: All code runs perfectly, produces correct output
3 pts: Minor issues (1-2 small errors or warnings)
2 pts: Several errors but demonstrates understanding
1 pt: Major errors, incomplete code
0 pts: Code does not run at all

Completion (4 points):

4 pts: All sections attempted thoughtfully
3 pts: 1-2 sections incomplete or minimal effort
2 pts: Several sections missing
1 pt: Only partial completion
0 pts: Little to no work completed

Interpretation (4 points):

4 pts: All interpretations correct and well-explained
3 pts: Minor errors in interpretation
2 pts: Several interpretation errors
1 pt: Significant misunderstanding of concepts
0 pts: No interpretation provided

Results Section (3 points):

3 pts: Publication-quality, complete results section
2 pts: Good but missing some elements
1 pt: Incomplete or poorly written
0 pts: No results section written

Common Deductions

-0.5 pts: Missing sample sizes in write-up
-0.5 pts: Not reporting confidence intervals
-1 pt: Incorrect hypothesis statements
-1 pt: Misinterpreting p-values
-1 pt: Not checking assumptions
-0.5 pts: Poor formatting (no tables, unclear output)

In-Class Lab Activity: Analysis of Variance (ANOVA)

EPI 553: Principles of Statistical Inference II

Fizza Zaheer

2026-02-05