Lab Overview

Time: ~30 minutes

Goal: Practice one-way ANOVA analysis from start to finish using real public health data

Learning Objectives:

  • Understand when and why to use ANOVA instead of multiple t-tests
  • Set up hypotheses for ANOVA
  • Conduct and interpret the F-test
  • Perform post-hoc tests when appropriate
  • Check ANOVA assumptions
  • Calculate and interpret effect size (η²)

Structure:

  • Part A: Guided Example (follow along)
  • Part B: Your Turn (independent practice)

Submission: Upload your completed .Rmd file and published to Brightspace by the end of class.


PART A: GUIDED EXAMPLE

Example: Blood Pressure and BMI Categories

Research Question: Is there a difference in mean systolic blood pressure (SBP) across three BMI categories (Normal weight, Overweight, Obese)?

Why ANOVA? We have one continuous outcome (SBP) and one categorical predictor with THREE groups (BMI category). Using multiple t-tests would inflate our Type I error rate.


Step 1: Setup and Data Preparation

# Load necessary libraries
library(tidyverse)   # For data manipulation and visualization
library(knitr)       # For nice tables
library(car)         # For Levene's test
library(NHANES)      # NHANES dataset

# Load the NHANES data
data(NHANES)

Create analysis dataset:

# Set seed for reproducibility
set.seed(553)

# Create BMI categories and prepare data
bp_bmi_data <- NHANES %>%
  filter(Age >= 18 & Age <= 65) %>%  # Adults 18-65
  filter(!is.na(BPSysAve) & !is.na(BMI)) %>%
  mutate(
    bmi_category = case_when(
      BMI < 25 ~ "Normal",
      BMI >= 25 & BMI < 30 ~ "Overweight",
      BMI >= 30 ~ "Obese",
      TRUE ~ NA_character_
    ),
    bmi_category = factor(bmi_category, 
                         levels = c("Normal", "Overweight", "Obese"))
  ) %>%
  filter(!is.na(bmi_category)) %>%
  select(ID, Age, Gender, BPSysAve, BMI, bmi_category)

# Display first few rows
head(bp_bmi_data) %>% 
  kable(caption = "Blood Pressure and BMI Dataset (first 6 rows)")
Blood Pressure and BMI Dataset (first 6 rows)
ID Age Gender BPSysAve BMI bmi_category
51624 34 male 113 32.22 Obese
51624 34 male 113 32.22 Obese
51624 34 male 113 32.22 Obese
51630 49 female 112 30.57 Obese
51647 45 female 118 27.24 Overweight
51647 45 female 118 27.24 Overweight
# Check sample sizes
table(bp_bmi_data$bmi_category)
## 
##     Normal Overweight      Obese 
##       1939       1937       2150

Interpretation: We have 6026 adults with complete BP and BMI data across three BMI categories.


Step 2: Descriptive Statistics

# Calculate summary statistics by BMI category
summary_stats <- bp_bmi_data %>%
  group_by(bmi_category) %>%
  summarise(
    n = n(),
    Mean = mean(BPSysAve),
    SD = sd(BPSysAve),
    Median = median(BPSysAve),
    Min = min(BPSysAve),
    Max = max(BPSysAve)
  )

summary_stats %>% 
  kable(digits = 2, 
        caption = "Descriptive Statistics: Systolic BP by BMI Category")
Descriptive Statistics: Systolic BP by BMI Category
bmi_category n Mean SD Median Min Max
Normal 1939 114.23 15.01 113 78 221
Overweight 1937 118.74 13.86 117 83 186
Obese 2150 121.62 15.27 120 82 226

Observation: The mean SBP appears to increase from Normal (114.2) to Overweight (118.7) to Obese (121.6).

But is this difference statistically significant?


Step 3: Visualize the Data

# Create boxplots with individual points
ggplot(bp_bmi_data, 
  aes(x = bmi_category, y = BPSysAve, fill = bmi_category)) +
  geom_boxplot(alpha = 0.7, outlier.shape = NA) +
  geom_jitter(width = 0.2, alpha = 0.1, size = 0.5) +
  scale_fill_brewer(palette = "Set2") +
  labs(
    title = "Systolic Blood Pressure by BMI Category",
    subtitle = "NHANES Data, Adults aged 18-65",
    x = "BMI Category",
    y = "Systolic Blood Pressure (mmHg)",
    fill = "BMI Category"
  ) +
  theme_minimal(base_size = 12) +
  theme(legend.position = "none")

What the plot tells us:

  • There appears to be a trend: higher BMI categories have higher median SBP
  • The boxes overlap, but the obese group appears shifted upward
  • Variability (box heights) looks similar across groups

Step 4: Set Up Hypotheses

Null Hypothesis (H₀): μ_Normal = μ_Overweight = μ_Obese
(All three population means are equal)

Alternative Hypothesis (H₁): At least one population mean differs from the others

Significance level: α = 0.05


Step 5: Fit the ANOVA Model

# Fit the one-way ANOVA model
anova_model <- aov(BPSysAve ~ bmi_category, data = bp_bmi_data)

# Display the ANOVA table
summary(anova_model)
##                Df  Sum Sq Mean Sq F value Pr(>F)    
## bmi_category    2   56212   28106   129.2 <2e-16 ***
## Residuals    6023 1309859     217                   
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Interpretation:

  • F-statistic: 129.24
  • Degrees of freedom: df₁ = 2 (k-1 groups), df₂ = 6023 (n-k)
  • p-value: < 2e-16 (very small)
  • Decision: Since p < 0.05, we reject H₀
  • Conclusion: There is statistically significant evidence that mean systolic BP differs across at least two BMI categories.

Step 6: Post-Hoc Tests (Tukey HSD)

Why do we need this? The F-test tells us that groups differ, but not which groups differ. Tukey’s Honest Significant Difference controls the family-wise error rate for multiple pairwise comparisons.

# Conduct Tukey HSD test
tukey_results <- TukeyHSD(anova_model)
print(tukey_results)
##   Tukey multiple comparisons of means
##     95% family-wise confidence level
## 
## Fit: aov(formula = BPSysAve ~ bmi_category, data = bp_bmi_data)
## 
## $bmi_category
##                       diff      lwr      upr p adj
## Overweight-Normal 4.507724 3.397134 5.618314     0
## Obese-Normal      7.391744 6.309024 8.474464     0
## Obese-Overweight  2.884019 1.801006 3.967033     0
# Visualize the confidence intervals
plot(tukey_results, las = 1)

Interpretation:

Comparison Mean Diff 95% CI p-value Significant?
Overweight - Normal 4.51 [3.4, 5.62] 3.82e-12 Yes
Obese - Normal 7.39 [6.31, 8.47] < 0.001 Yes
Obese - Overweight 2.88 [1.8, 3.97] 1.38e-09 Yes

Conclusion: All three pairwise comparisons are statistically significant. Obese adults have higher SBP than overweight adults, who in turn have higher SBP than normal-weight adults.


Step 7: Calculate Effect Size

# Extract sum of squares from ANOVA table
anova_summary <- summary(anova_model)[[1]]

ss_treatment <- anova_summary$`Sum Sq`[1]
ss_total <- sum(anova_summary$`Sum Sq`)

# Calculate eta-squared
eta_squared <- ss_treatment / ss_total

cat("Eta-squared (η²):", round(eta_squared, 4), "\n")
## Eta-squared (η²): 0.0411
cat("Percentage of variance explained:", round(eta_squared * 100, 2), "%")
## Percentage of variance explained: 4.11 %

Interpretation: BMI category explains 4.11% of the variance in systolic BP.

  • Effect size guidelines: Small (0.01), Medium (0.06), Large (0.14)
  • Our effect: Small

While statistically significant, the practical effect is modest—BMI category alone doesn’t explain most of the variation in blood pressure.


Step 8: Check Assumptions

ANOVA Assumptions:

  1. Independence: Observations are independent (assumed based on study design)
  2. Normality: Residuals are approximately normally distributed
  3. Homogeneity of variance: Equal variances across groups
# Create diagnostic plots
par(mfrow = c(2, 2))
plot(anova_model)

par(mfrow = c(1, 1))

Diagnostic Plot Interpretation:

  1. Residuals vs Fitted: Points show random scatter around zero with no clear pattern → Good!
  2. Q-Q Plot: Points follow the diagonal line reasonably well → Normality assumption is reasonable
  3. Scale-Location: Red line is relatively flat → Equal variance assumption is reasonable
  4. Residuals vs Leverage: No points beyond Cook’s distance lines → No highly influential outliers
# Levene's test for homogeneity of variance
levene_test <- leveneTest(BPSysAve ~ bmi_category, data = bp_bmi_data)
print(levene_test)
## Levene's Test for Homogeneity of Variance (center = median)
##         Df F value  Pr(>F)  
## group    2  2.7615 0.06328 .
##       6023                  
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Levene’s Test Interpretation:

  • p-value: 0.0633
  • If p < 0.05, we would reject equal variances
  • Here: Equal variance assumption is met

Overall Assessment: With n > 2000, ANOVA is robust to minor violations. Our assumptions are reasonably satisfied.


Step 9: Report Results

Example Results Section:

We conducted a one-way ANOVA to examine whether mean systolic blood pressure (SBP) differs across BMI categories (Normal, Overweight, Obese) among 6,026 adults aged 18-65 from NHANES. Descriptive statistics showed mean SBP of 114.2 mmHg (SD = 15) for normal weight, 118.7 mmHg (SD = 13.9) for overweight, and 121.6 mmHg (SD = 15.3) for obese individuals.

The ANOVA revealed a statistically significant difference in mean SBP across BMI categories, F(2, 6023) = 129.24, p < 0.001. Tukey’s HSD post-hoc tests indicated that all pairwise comparisons were significant (p < 0.05): obese adults had on average 7.4 mmHg higher SBP than normal-weight adults, and 2.9 mmHg higher than overweight adults.

The effect size (η² = 0.041) indicates that BMI category explains 4.1% of the variance in systolic blood pressure, representing a small practical effect. These findings support the well-established relationship between higher BMI and elevated blood pressure, though other factors account for most of the variation in SBP.


PART B: YOUR TURN - INDEPENDENT PRACTICE

Practice Problem: Physical Activity and Depression

Research Question: Is there a difference in the number of days with poor mental health across three physical activity levels (None, Moderate, Vigorous)?

Your Task: Complete the same 9-step analysis workflow you just practiced, but now on a different outcome and predictor.


Step 1: Data Preparation

# Prepare the dataset
set.seed(553)

mental_health_data <- NHANES %>%
  filter(Age >= 18) %>%
  filter(!is.na(DaysMentHlthBad) & !is.na(PhysActive)) %>%
  mutate(
    activity_level = case_when(
      PhysActive == "No" ~ "None",
      PhysActive == "Yes" & !is.na(PhysActiveDays) & PhysActiveDays < 3 ~ "Moderate",
      PhysActive == "Yes" & !is.na(PhysActiveDays) & PhysActiveDays >= 3 ~ "Vigorous",
      TRUE ~ NA_character_
    ),
    activity_level = factor(activity_level, 
                           levels = c("None", "Moderate", "Vigorous"))
  ) %>%
  filter(!is.na(activity_level)) %>%
  select(ID, Age, Gender, DaysMentHlthBad, PhysActive, activity_level)

# Display first few rows: Display the first 6 rows and check sample sizes
head(mental_health_data) %>% 
  kable(caption = "First 6 rows of Mental Health dataset")
First 6 rows of Mental Health dataset
ID Age Gender DaysMentHlthBad PhysActive activity_level
51624 34 male 15 No None
51624 34 male 15 No None
51624 34 male 15 No None
51630 49 female 10 No None
51647 45 female 3 Yes Vigorous
51647 45 female 3 Yes Vigorous
# Check Total Sample Size
cat("Sample size:", nrow(mental_health_data), "adults aged 18+\n")
## Sample size: 5757 adults aged 18+
# Check Each Category Sample Size
table(mental_health_data$activity_level)
## 
##     None Moderate Vigorous 
##     3139      768     1850

Answer these questions:

  • How many people are in each physical activity group?
    • None: 3139
    • Moderate: 768
    • Vigorous: 1850

Step 2: Descriptive Statistics

# Calculate summary statistics by activity level
# Variables to summarize: n, Mean, SD, Median, Min, Max
summary_stats <- mental_health_data %>%
  group_by(activity_level) %>%
  summarize (
    n = n(),
    Mean = mean(DaysMentHlthBad), 
    SD = sd(DaysMentHlthBad),
    Median = median(DaysMentHlthBad),
    Min = min(DaysMentHlthBad),
    Max = max(DaysMentHlthBad)
  )

summary_stats %>%
  kable(digits = 2, 
        caption = "Descriptive Statistics: Mean Bad Mental Health Days by Activity Level")
Descriptive Statistics: Mean Bad Mental Health Days by Activity Level
activity_level n Mean SD Median Min Max
None 3139 5.08 9.01 0 0 30
Moderate 768 3.81 6.87 0 0 30
Vigorous 1850 3.54 7.17 0 0 30

Interpret:

  • Which group has the highest mean number of bad mental health days?

The group with the highest mean number of bad mental health days are those who are classified under none for their activity level with a mean of 5.08.

  • Which group has the lowest?

The group with the lowest mean number of bad mental health days are those who are classified under vigorous for their activity level with a mean of 3.54.


Step 3: Visualization

# Create boxplots comparing DaysMentHlthBad across activity levels
ggplot(mental_health_data,
       aes (x = activity_level, y = DaysMentHlthBad, fill = activity_level)) +
  geom_boxplot(alpha = 0.7, outlier.shape = NA) +
  geom_jitter (width = 0.2, alpha = 0.1, size = 0.5) +
  scale_fill_brewer (palette = "Set2") +
  labs(
    title = "Mean Bad Mental Health Days by Activity Level",
    subtitle = "NHANES Data, Adults aged 18+",
    x = "Activity Level",
    y = "Bad Mental Health Days", 
    fill = "Activity Level"
  ) +
  theme_minimal (base_size = 12) +
  theme (legend.position = "none")

Describe what you see:

  • Do the groups appear to differ?

There appears to be a trend: higher activity level (vigorous) appear to have less bad mental health days

  • Are the variances similar across groups?

Variability (box heights) look similar across the none and moderate activity levels whereas vigorous appears to have a shorter box height or less variability.


Step 4: Set Up Hypotheses

Write the hypotheses:

Null Hypothesis (H₀):

Null Hypothesis (H₀): μ_None = μ_Moderate = μ_Vigorous (All three population means are equal)

Alternative Hypothesis (H₁):

Alternative Hypothesis (H₁): At least one population mean differs from the others

Significance level: α = 0.05


Step 5: Fit the ANOVA Model

# Fit the ANOVA model
# Outcome: DaysMentHlthBad
# Predictor: activity_level
anova_model <- aov(DaysMentHlthBad ~ activity_level, data = mental_health_data)

# Display the ANOVA table
summary(anova_model)
##                  Df Sum Sq Mean Sq F value   Pr(>F)    
## activity_level    2   3109  1554.6   23.17 9.52e-11 ***
## Residuals      5754 386089    67.1                     
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Extract and interpret the results:

  • F-statistic: 23.17
  • Degrees of freedom: df₁ = 2 (k-1 groups), df₂ = 5754 (n-k)
  • p-value: 9.52e-11
  • Decision (reject or fail to reject H₀): Since p < 0.05, we reject H₀
  • Statistical conclusion in words: There is statistically significant evidence that mean number of bad mental health days differs across at least two activity level categories.

Step 6: Post-Hoc Tests

# Conduct Tukey HSD test
# Only if your ANOVA p-value < 0.05

tukey_results <- TukeyHSD(anova_model)
print(tukey_results)
##   Tukey multiple comparisons of means
##     95% family-wise confidence level
## 
## Fit: aov(formula = DaysMentHlthBad ~ activity_level, data = mental_health_data)
## 
## $activity_level
##                         diff       lwr        upr     p adj
## Moderate-None     -1.2725867 -2.045657 -0.4995169 0.0003386
## Vigorous-None     -1.5464873 -2.109345 -0.9836298 0.0000000
## Vigorous-Moderate -0.2739006 -1.098213  0.5504114 0.7159887
# Visualize the confidence intervals

plot(tukey_results, las = 1)

Complete the table:

Comparison Mean Difference 95% CI Lower 95% CI Upper p-value Significant?
Moderate - None -1.27 -2.0 -0.5 <.001 Yes
Vigorous - None -1.55 -2.1 -1.0 <.001 Yes
Vigorous - Moderate -0.27 -1.1 0.6 0.716 No

Interpretation:

Which specific groups differ significantly?

Moderate compared to none and vigorous compared to none are considered statistically significant; however, vigorous compared to moderate is not statistically significant. None active individuals have higher mean number bad mental health days when compared to both moderate and vigorous activity level; however, there is not a statistically significant difference when comparing mean number bad mental health days in individuals with moderate and vigorous activity level.


Step 7: Calculate Effect Size

# Calculate eta-squared
# Extract Sum Sq from the ANOVA summary

anova_summary <- summary(anova_model)[[1]]

ss_treatment <- anova_summary$'Sum Sq'[1]
ss_total <- sum(anova_summary$'Sum Sq')

eta_squared <- ss_treatment / ss_total

cat("Eta-squared (η²):", round(eta_squared, 4), "/n")
## Eta-squared (η²): 0.008 /n
# Percentage of variance explained
cat("Percentage of variance explained:", round(eta_squared * 100, 2), "%")
## Percentage of variance explained: 0.8 %

Interpret:

  • η² = 0.008

  • Percentage of variance explained: 0.8%

  • Effect size classification (small/medium/large):

Small (less than 0.01)

  • What does this mean practically?

While statistically significant, the practical effect is minor - activity level alone doesn’t explain most of the variation in number of bad mental health days.


Step 8: Check Assumptions

# Create diagnostic plots

par(mfrow = c(2,2))
plot(anova_model)

par(mfrow = c(1,1))

Evaluate each plot:

  1. Residuals vs Fitted:

We see three vertical columns of points, with each representing a group of activity level. The red line is relatively flat and stays close to zero even though it does seem to lower slightly as it gets further away.

  1. Q-Q Plot:

Since the points do not follow the diagonal line in the top right, where they are curved away, we would conclude right-skewness. Since our sample size is large, Central LImit Theorem allows us to use ANOVA as it is robust to this violation.

  1. Scale-Location:

Red line is relatively flat except for a slight incline upward. The spread appears to be consistent across groups because the columns are approximately the same height. This could give insight to the Levene’s test which rejected the null hypotheis of equal variances (unequal variances found).

  1. Residuals vs Leverage:

Since there are no points in the corners that cross the dashed red lines called “Cook’s Distance,” there is no single outlier unfairly driving our results.

# Conduct Levene's test

levene_test <- leveneTest(DaysMentHlthBad ~ activity_level, data = mental_health_data)
print(levene_test)
## Levene's Test for Homogeneity of Variance (center = median)
##         Df F value    Pr(>F)    
## group    2  23.168 9.517e-11 ***
##       5754                      
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Overall assessment:

  • Are assumptions reasonably met?

Our assumptions are reasonably satisfied because we have an n > 2000. ANOVA is robust to minor violations, when the sample size is this large.

  • Do any violations threaten your conclusions?

P-value: 9.517e-11 Since p < 0.05, we would reject equal variances which would threaten our conclusions; however, we have a large sample size.


Step 9: Write Up Results

Write a complete 2-3 paragraph results section:

Include: 1. Sample description and descriptive statistics 2. F-test results 3. Post-hoc comparisons (if applicable) 4. Effect size interpretation 5. Public health significance

Your Results Section:

We conducted a one-way ANOVA to examine whether there is a difference in the mean number of days with poor mental health across three physical activity levels (none, moderate, vigorous) among 5757 adults aged 18 and over from NHANES. Descriptive statistics showed a mean number of bad mental health days of 5.08 (SD = 9.01) for activity level none, 3.81 for activity level moderate (SD = 6.87), and 3.54 for activity level vigorous (SD = 7.17).

The ANOVA revealed a statistically significant mean difference in number of bad mental health days across activity levels, F(2, 5754) = 23.17, p < 0.0001. Tukey’s HSD post-hoc tests indicated that two pairwise comparisons were significant (p < 0.05), individuals with activity level of moderate had on average 1.27 less bad mental health days than those classified as none and individuals with an activity level of vigorous had on average 1.55 less bad mental health days than those classified as none. The pairwise comparison of moderate to vigorous activity level was not significant (p = 0.716) with an average 0.27 less bad mental health days.

The effect size (η² = 0.008) indicates that activity level explains 0.8% of the variance in the number of bad mental health days, representing a very small practical effect. These findings support the well-established relationship between lower activity level and increased bad mental health days, though it appears that other factors account for most of the variance in number of bad mental health days.


Reflection Questions

1. How does the effect size help you understand the practical vs. statistical significance?

Eta-squared can help us understand effect size because it is a proportion of the total variance explained by the treatment group. Sometimes, the statistical versus practical significance have alternate meanings from one another. Although an effect size may be considered statistical, it may not be practical and vice versa. Effect size gives us conventional guidelines to compare our effect to and understand not only its statistical but also practical significance. For this reason, we should always report p-values with effect size to provide an overview.

2. Why is it important to check ANOVA assumptions? What might happen if they’re violated?

It is important to check assumptions in ANOVA especially when we have a small, unbalanced sample. If the ANOVA does not meet the three key assumptions and they are violated, then the validity of the p-values and inference may be invalid. If the sample size is above 30, ANOVA is typically robust to moderate violations of these assumptions. With the very small, unbalanced sample, we need to pay more attention to the violators because the outliers can have large effects on F-test validity.

3. In public health practice, when might you choose to use ANOVA?

ANOVA seems to be used in countless public health practices. It can be used to understand medication effectiveness such as monitoring blood pressure across different medication groups. It could be used to study environmental health by studying mean ospital admissions by different levels of pollution. In population health, it could be used to compare vaccination rates across geographic units. Finally for occupational health, it could be employed to understand differences in lung function based on occupation.

4. What was the most challenging part of this lab activity?

I think the most challenging part of this lab was interpreting the model diagnostics to check that the fitted ANOVA model meets all of its assumptions. I had never analyzed these plots before, so I truly did not know what I was supposed to be looking for as signifiers to denote failed assumptions.


Submission Checklist

Before submitting, verify you have:

To submit: Upload both your .Rmd file and the HTML output to Brightspace.


Lab completed on: February 05, 2026


GRADING RUBRIC (For TA Use)

Total Points: 15

Category Criteria Points Notes
Code Execution All code chunks run without errors 4 - Deduct 1 pt per major error
- Deduct 0.5 pt per minor warning
Completion All “YOUR TURN” sections attempted 4 - Part B Steps 1-9 completed
- All fill-in-the-blank answered
- Tukey table filled in
Interpretation Correct statistical interpretation 4 - Hypotheses correctly stated (1 pt)
- ANOVA results interpreted (1 pt)
- Post-hoc results interpreted (1 pt)
- Assumptions evaluated (1 pt)
Results Section Professional, complete write-up 3 - Includes descriptive stats (1 pt)
- Reports F-test & post-hoc (1 pt)
- Effect size & significance (1 pt)

Detailed Grading Guidelines

Code Execution (4 points):

  • 4 pts: All code runs perfectly, produces correct output
  • 3 pts: Minor issues (1-2 small errors or warnings)
  • 2 pts: Several errors but demonstrates understanding
  • 1 pt: Major errors, incomplete code
  • 0 pts: Code does not run at all

Completion (4 points):

  • 4 pts: All sections attempted thoughtfully
  • 3 pts: 1-2 sections incomplete or minimal effort
  • 2 pts: Several sections missing
  • 1 pt: Only partial completion
  • 0 pts: Little to no work completed

Interpretation (4 points):

  • 4 pts: All interpretations correct and well-explained
  • 3 pts: Minor errors in interpretation
  • 2 pts: Several interpretation errors
  • 1 pt: Significant misunderstanding of concepts
  • 0 pts: No interpretation provided

Results Section (3 points):

  • 3 pts: Publication-quality, complete results section
  • 2 pts: Good but missing some elements
  • 1 pt: Incomplete or poorly written
  • 0 pts: No results section written

Common Deductions

  • -0.5 pts: Missing sample sizes in write-up
  • -0.5 pts: Not reporting confidence intervals
  • -1 pt: Incorrect hypothesis statements
  • -1 pt: Misinterpreting p-values
  • -1 pt: Not checking assumptions
  • -0.5 pts: Poor formatting (no tables, unclear output)