Introduction

In the previous lectures, we fitted Multiple Linear Regression (MLR) models and interpreted coefficients, $R^2$, and overall model significance. But we glossed over a critical question: how do we formally test whether individual predictors — or groups of predictors — contribute meaningfully to the model?

This lecture focuses on hypothesis testing within the MLR framework. We will cover:

Partial F-tests for individual variables using Type I (sequential) and Type III (partial) sums of squares
T-tests for individual regression coefficients ($\beta$s) and their equivalence to Type III partial F-tests
Tests for groups of variables (chunk tests / multiple partial F-tests)
How to implement all of these in R using anova(), car::Anova(), and summary()

These tools are fundamental to epidemiologic analysis — they let us determine which variables belong in a model, whether confounders are operating, and whether a parsimonious model adequately describes the data.

Setup and Data

library(tidyverse)

## Warning: package 'ggplot2' was built under R version 4.4.3

## ── Attaching core tidyverse packages ──────────────────────── tidyverse 2.0.0 ──
## ✔ dplyr     1.1.4     ✔ readr     2.1.5
## ✔ forcats   1.0.0     ✔ stringr   1.5.1
## ✔ ggplot2   4.0.2     ✔ tibble    3.2.1
## ✔ lubridate 1.9.3     ✔ tidyr     1.3.1
## ✔ purrr     1.0.2     
## ── Conflicts ────────────────────────────────────────── tidyverse_conflicts() ──
## ✖ dplyr::filter() masks stats::filter()
## ✖ dplyr::lag()    masks stats::lag()
## ℹ Use the conflicted package (<http://conflicted.r-lib.org/>) to force all conflicts to become errors

library(haven)

## Warning: package 'haven' was built under R version 4.4.3

library(janitor)

## Warning: package 'janitor' was built under R version 4.4.3

## 
## Attaching package: 'janitor'
## 
## The following objects are masked from 'package:stats':
## 
##     chisq.test, fisher.test

library(knitr)

## Warning: package 'knitr' was built under R version 4.4.3

library(kableExtra)

## Warning: package 'kableExtra' was built under R version 4.4.3

## 
## Attaching package: 'kableExtra'
## 
## The following object is masked from 'package:dplyr':
## 
##     group_rows

library(plotly)

## Warning: package 'plotly' was built under R version 4.4.3

## 
## Attaching package: 'plotly'
## 
## The following object is masked from 'package:ggplot2':
## 
##     last_plot
## 
## The following object is masked from 'package:stats':
## 
##     filter
## 
## The following object is masked from 'package:graphics':
## 
##     layout

library(broom)

## Warning: package 'broom' was built under R version 4.4.3

library(ggeffects)

## Warning: package 'ggeffects' was built under R version 4.4.3

library(ggstats)

## Warning: package 'ggstats' was built under R version 4.4.3

library(gtsummary)

## Warning: package 'gtsummary' was built under R version 4.4.3

library(GGally)

## Warning: package 'GGally' was built under R version 4.4.3

library(car)

## Warning: package 'car' was built under R version 4.4.3

## Loading required package: carData

## Warning: package 'carData' was built under R version 4.4.3

## 
## Attaching package: 'car'
## 
## The following object is masked from 'package:dplyr':
## 
##     recode
## 
## The following object is masked from 'package:purrr':
## 
##     some

library(lmtest)

## Warning: package 'lmtest' was built under R version 4.4.3

## Loading required package: zoo

## Warning: package 'zoo' was built under R version 4.4.3

## 
## Attaching package: 'zoo'
## 
## The following objects are masked from 'package:base':
## 
##     as.Date, as.Date.numeric

library(corrplot)

## Warning: package 'corrplot' was built under R version 4.4.3

## corrplot 0.95 loaded

Part 2: In-Class Lab Activity (Pair-Based)

EPI 553 — Tests of Hypotheses Lab Due: End of class

Data for the Lab

Use the same BRFSS 2020 dataset from the guided practice.

The BRFSS 2020 Dataset

We continue using the Behavioral Risk Factor Surveillance System (BRFSS) 2020 dataset. Our research question remains:

#BRFSS MLR Dataset

# Create the brfss_mlr dataset from scratch
set.seed(553)  # For reproducibility

n <- 5000  # Sample size

brfss_mlr <- tibble(
  # Mentally unhealthy days (0-30 scale)
  menthlth_days = round(pmin(pmax(rnbinom(n, mu = 5, size = 0.5), 0), 30)),
  
  # Physically unhealthy days (0-30 scale)
  physhlth_days = round(pmin(pmax(rnbinom(n, mu = 4, size = 0.5), 0), 30)),
  
  # Sleep hours per night (4-12 range)
  sleep_hrs = round(pmin(pmax(rnorm(n, mean = 7, sd = 1.5), 4), 12), 1),
  
  # Age in years (18-99)
  age = sample(18:99, n, replace = TRUE),
  
  # Income category (1-8, higher = more income)
  income_cat = sample(1:8, n, replace = TRUE, 
                      prob = c(0.1, 0.12, 0.13, 0.15, 0.15, 0.15, 0.1, 0.1)),
  
  # Sex as factor
  sex = factor(sample(c("Male", "Female"), n, replace = TRUE)),
  
  # Exercise as factor
  exercise = factor(sample(c("Yes", "No"), n, replace = TRUE, prob = c(0.7, 0.3)))
) %>%
  mutate(
    # Add realistic relationships
    menthlth_days = round(menthlth_days + 
                          0.3 * physhlth_days - 
                          0.2 * (sleep_hrs - 7) + 
                          0.02 * (50 - age) - 
                          0.1 * income_cat + 
                          ifelse(sex == "Female", 0.5, 0) - 
                          ifelse(exercise == "Yes", 0.3, 0)),
    # Ensure within 0-30 range
    menthlth_days = pmin(pmax(menthlth_days, 0), 30)
  )

# Verify it worked
glimpse(brfss_mlr)

## Rows: 5,000
## Columns: 7
## $ menthlth_days <dbl> 7, 12, 5, 4, 10, 0, 14, 7, 0, 13, 30, 4, 1, 1, 25, 15, 1…
## $ physhlth_days <dbl> 1, 16, 12, 7, 16, 3, 2, 2, 1, 1, 18, 1, 3, 0, 17, 0, 10,…
## $ sleep_hrs     <dbl> 5.9, 8.4, 6.9, 8.6, 7.0, 7.5, 8.4, 7.9, 6.6, 6.6, 5.6, 9…
## $ age           <int> 20, 57, 22, 65, 47, 24, 79, 81, 38, 48, 91, 25, 97, 60, …
## $ income_cat    <int> 2, 5, 7, 5, 6, 6, 7, 7, 4, 1, 1, 5, 1, 8, 7, 2, 3, 4, 6,…
## $ sex           <fct> Male, Female, Female, Female, Female, Male, Female, Fema…
## $ exercise      <fct> Yes, Yes, No, No, Yes, Yes, Yes, Yes, Yes, Yes, Yes, Yes…

# Quick summary
brfss_mlr %>%
  select(menthlth_days, physhlth_days, sleep_hrs, age, income_cat, sex, exercise) %>%
  tbl_summary() %>%
  bold_labels()

Characteristic	N = 5,000¹
menthlth_days	3 (1, 8)
physhlth_days	2.0 (0.0, 5.0)
sleep_hrs	7.00 (6.00, 8.00)
age	59 (38, 79)
income_cat
1	482 (9.6%)
2	600 (12%)
3	654 (13%)
4	728 (15%)
5	783 (16%)
6	761 (15%)
7	527 (11%)
8	465 (9.3%)
sex
Female	2,502 (50%)
Male	2,498 (50%)
exercise	3,538 (71%)
¹ Median (Q1, Q3); n (%)

Round 1: Student A Drives, Student B Navigates

Task 1: Fitting Models and ANOVA Tables (15 points)

1a. (5 pts) Fit the following model:

\[\text{menthlth\_days} = \beta_0 + \beta_1 \cdot \text{physhlth\_days} + \beta_2 \cdot \text{sleep\_hrs} + \beta_3 \cdot \text{age} + \varepsilon\]

Use tidy() with conf.int = TRUE to display the coefficients. Report the fitted equation with rounded coefficients.

# Fit the model with three predictors
m_task1 <- lm(menthlth_days ~ physhlth_days + sleep_hrs + age, 
              data = brfss_mlr)

# Display coefficients with confidence intervals
tidy(m_task1, conf.int = TRUE) %>%
  mutate(across(where(is.numeric), ~ round(., 4))) %>%
  kable(
    caption = "Table 1a. Coefficients for Model: menthlth_days ~ physhlth_days + sleep_hrs + age",
    col.names = c("Term", "Estimate", "Std. Error", "t-statistic", "p-value", 
                  "95% CI Lower", "95% CI Upper")
  ) %>%
  kable_styling(bootstrap_options = c("striped", "hover"), full_width = FALSE)

Table 1a. Coefficients for Model: menthlth_days ~ physhlth_days + sleep_hrs + age
Term	Estimate	Std. Error	t-statistic	p-value	95% CI Lower	95% CI Upper
(Intercept)	7.1725	0.5020	14.2866	0e+00	6.1882	8.1567
physhlth_days	0.2982	0.0161	18.5259	0e+00	0.2666	0.3298
sleep_hrs	-0.2256	0.0624	-3.6170	3e-04	-0.3478	-0.1033
age	-0.0222	0.0039	-5.7405	0e+00	-0.0297	-0.0146

1b. (5 pts) Use anova() on this model to obtain the Type I (sequential) sums of squares. Present the ANOVA table and verify that the sum of all predictor Type I SS equals the model SSR. Show this calculation explicitly.

# Obtain Type I sums of squares using anova()
anova_task1 <- anova(m_task1)

# Display the ANOVA table
anova_task1 %>%
  as.data.frame() %>%
  rownames_to_column("Source") %>%
  mutate(across(where(is.numeric), ~ round(., 4))) %>%
  kable(
    caption = "Table 1b. Type I (Sequential) Sums of Squares",
    col.names = c("Source", "Df", "Sum Sq", "Mean Sq", "F value", "p-value")
  ) %>%
  kable_styling(bootstrap_options = c("striped", "hover"), full_width = FALSE)

Table 1b. Type I (Sequential) Sums of Squares
Source	Df	Sum Sq	Mean Sq	F value	p-value
physhlth_days	1	14261.8462	14261.8462	345.1855	0e+00
sleep_hrs	1	563.9008	563.9008	13.6483	2e-04
age	1	1361.4976	1361.4976	32.9529	0e+00
Residuals	4996	206417.0754	41.3165	NA	NA

# Verify that predictor Type I SS sum to model SSR
type1_ss <- anova_task1$`Sum Sq`[1:3]  # SS for the three predictors
ssr_model <- sum(type1_ss)              # Model sum of squares
sse_model <- anova_task1$`Sum Sq`[4]    # Residual sum of squares
sst_total <- ssr_model + sse_model      # Total sum of squares

# Display verification
verification <- tibble(
  Component = c("SS(physhlth_days)", "SS(sleep_hrs | physhlth)", "SS(age | physhlth, sleep)", 
                "SSR (Model)", "SSE (Residual)", "SST (Total)"),
  `Sum of Squares` = c(type1_ss, ssr_model, sse_model, sst_total)
) %>%
  mutate(`Sum of Squares` = round(`Sum of Squares`, 2))

verification %>%
  kable(caption = "Table 1b (continued). Verification: Predictor SS Sum to Model SSR") %>%
  kable_styling(bootstrap_options = c("striped", "hover"), full_width = FALSE)

Table 1b (continued). Verification: Predictor SS Sum to Model SSR
Component	Sum of Squares
SS(physhlth_days)	14261.85
SS(sleep_hrs \| physhlth)	563.90
SS(age \| physhlth, sleep)	1361.50
SSR (Model)	16187.24
SSE (Residual)	206417.08
SST (Total)	222604.32

cat("\nVerification: ", round(ssr_model, 2), " = ", 
    round(sum(type1_ss), 2), " ✓\n", sep = "")

## 
## Verification: 16187.24 = 16187.24 ✓

1c. (5 pts) Use car::Anova() with type = "III" on the same model to obtain the Type III sums of squares. Compare the Type I and Type III SS for each variable. Which variable’s SS is the same in both tables? Why?

# Obtain Type III sums of squares using car::Anova()
anova_task1_type3 <- Anova(m_task1, type = "III")

# Display Type III ANOVA table
anova_task1_type3 %>%
  as.data.frame() %>%
  rownames_to_column("Source") %>%
  filter(Source != "(Intercept)") %>%
  mutate(across(where(is.numeric), ~ round(., 4))) %>%
  kable(
    caption = "Table 1c. Type III (Partial) Sums of Squares",
    col.names = c("Source", "Sum Sq", "Df", "F value", "p-value")
  ) %>%
  kable_styling(bootstrap_options = c("striped", "hover"), full_width = FALSE)

Table 1c. Type III (Partial) Sums of Squares
Source	Sum Sq	Df	F value	p-value
physhlth_days	14180.1138	1	343.2073	0e+00
sleep_hrs	540.5165	1	13.0823	3e-04
age	1361.4976	1	32.9529	0e+00
Residuals	206417.0754	4996	NA	NA

# Compare Type I and Type III SS
comparison_task1 <- tibble(
  Variable = c("physhlth_days", "sleep_hrs", "age"),
  `Type I SS` = round(anova_task1$`Sum Sq`[1:3], 2),
  `Type III SS` = round(anova_task1_type3$`Sum Sq`[2:4], 2),
  `Difference` = round(abs(`Type I SS` - `Type III SS`), 2)
)

comparison_task1 %>%
  kable(caption = "Table 1c (continued). Type I vs. Type III Sums of Squares Comparison") %>%
  kable_styling(bootstrap_options = c("striped", "hover"), full_width = FALSE)

Table 1c (continued). Type I vs. Type III Sums of Squares Comparison
Variable	Type I SS	Type III SS	Difference
physhlth_days	14261.85	14180.11	81.74
sleep_hrs	563.90	540.52	23.38
age	1361.50	1361.50	0.00

# Identify which variable has identical SS
identical_var <- comparison_task1$Variable[comparison_task1$Difference == 0]
if(length(identical_var) > 0) {
  cat("\nThe variable '", identical_var, "' has identical Type I and Type III SS.\n", sep = "")
  cat("Reason: This is the last variable entered in the model (age). ")
  cat("For the last variable, Type I SS = SS(variable | all predecessors) = ")
  cat("SS(variable | all other variables) = Type III SS.\n")
} else {
  cat("\nNo variables have identical Type I and Type III SS in this model.\n")
}

## 
## The variable 'age' has identical Type I and Type III SS.
## Reason: This is the last variable entered in the model (age). For the last variable, Type I SS = SS(variable | all predecessors) = SS(variable | all other variables) = Type III SS.

Task 2: Type I vs. Type III Sums of Squares (15 points)

2a. (5 pts) Fit the same model from Task 1 but reverse the variable order:

\[\text{menthlth\_days} = \beta_0 + \beta_1 \cdot \text{age} + \beta_2 \cdot \text{sleep\_hrs} + \beta_3 \cdot \text{physhlth\_days} + \varepsilon\]

Run anova() on this model and compare the Type I SS to what you obtained in Task 1b. Which values changed and which stayed the same?

# Fit model with reversed variable order
m_task2_reversed <- lm(menthlth_days ~ age + sleep_hrs + physhlth_days, 
                        data = brfss_mlr)

# Type I SS for reversed order
anova_reversed <- anova(m_task2_reversed)

# Display reversed order ANOVA table
anova_reversed %>%
  as.data.frame() %>%
  rownames_to_column("Source") %>%
  mutate(across(where(is.numeric), ~ round(., 4))) %>%
  kable(
    caption = "Table 2a. Type I SS with Reversed Variable Order (age, sleep_hrs, physhlth_days)",
    col.names = c("Source", "Df", "Sum Sq", "Mean Sq", "F value", "p-value")
  ) %>%
  kable_styling(bootstrap_options = c("striped", "hover"), full_width = FALSE)

Table 2a. Type I SS with Reversed Variable Order (age, sleep_hrs, physhlth_days)
Source	Df	Sum Sq	Mean Sq	F value	p-value
age	1	1423.7192	1423.7192	34.4589	0e+00
sleep_hrs	1	583.4115	583.4115	14.1206	2e-04
physhlth_days	1	14180.1138	14180.1138	343.2073	0e+00
Residuals	4996	206417.0754	41.3165	NA	NA

# Compare Type I SS from both orders
type1_order1 <- data.frame(
  Variable = c("physhlth_days", "sleep_hrs", "age"),
  `Order 1 SS` = round(anova_task1$`Sum Sq`[1:3], 2)
)

type1_order2 <- data.frame(
  Variable = c("age", "sleep_hrs", "physhlth_days"),
  `Order 2 SS` = round(anova_reversed$`Sum Sq`[1:3], 2)
)

# Create comparison table
comparison_orders <- tibble(
  Variable = c("physhlth_days", "sleep_hrs", "age"),
  `Type I SS (Order 1: physhlth, sleep, age)` = c(
    round(anova_task1$`Sum Sq`[1], 2),
    round(anova_task1$`Sum Sq`[2], 2),
    round(anova_task1$`Sum Sq`[3], 2)
  ),
  `Type I SS (Order 2: age, sleep, physhlth)` = c(
    round(anova_reversed$`Sum Sq`[3], 2),  # physhlth is last in Order 2
    round(anova_reversed$`Sum Sq`[2], 2),  # sleep is middle in Order 2
    round(anova_reversed$`Sum Sq`[1], 2)   # age is first in Order 2
  ),
  `Difference` = round(abs(`Type I SS (Order 1: physhlth, sleep, age)` - 
                           `Type I SS (Order 2: age, sleep, physhlth)`), 2)
)

comparison_orders %>%
  kable(caption = "Table 2a (continued). Type I SS Comparison Across Variable Orders") %>%
  kable_styling(bootstrap_options = c("striped", "hover"), full_width = FALSE)

Table 2a (continued). Type I SS Comparison Across Variable Orders
Variable	Type I SS (Order 1: physhlth, sleep, age)	Type I SS (Order 2: age, sleep, physhlth)	Difference
physhlth_days	14261.85	14180.11	81.74
sleep_hrs	563.90	583.41	19.51
age	1361.50	1423.72	62.22

# Identify which values changed
cat("\nObservation: All Type I SS values changed when we reversed the variable order.\n")

## 
## Observation: All Type I SS values changed when we reversed the variable order.

cat("Only the total SSR (", round(sum(anova_reversed$`Sum Sq`[1:3]), 2), 
    ") remained the same across both orders.\n")

## Only the total SSR ( 16187.24 ) remained the same across both orders.

cat("This demonstrates that Type I SS are order-dependent because they test each\n")

## This demonstrates that Type I SS are order-dependent because they test each

cat("variable conditional only on variables entered earlier in the sequence.\n")

## variable conditional only on variables entered earlier in the sequence.

2b. (5 pts) Run car::Anova(type = "III") on this reordered model. Did the Type III SS change compared to Task 1c? Explain why or why not.

# Type III for reversed order model
anova_reversed_type3 <- Anova(m_task2_reversed, type = "III")

# Display Type III for reversed order
anova_reversed_type3 %>%
  as.data.frame() %>%
  rownames_to_column("Source") %>%
  filter(!Source %in% c("(Intercept)", "Residuals")) %>%
  mutate(across(where(is.numeric), ~ round(., 4))) %>%
  kable(
    caption = "Table 2b. Type III SS with Reversed Variable Order",
    col.names = c("Source", "Sum Sq", "Df", "F value", "p-value")
  ) %>%
  kable_styling(bootstrap_options = c("striped", "hover"), full_width = FALSE)

Table 2b. Type III SS with Reversed Variable Order
Source	Sum Sq	Df	F value	p-value
age	1361.4976	1	32.9529	0e+00
sleep_hrs	540.5165	1	13.0823	3e-04
physhlth_days	14180.1138	1	343.2073	0e+00

# Compare Type III SS from original and reversed orders
type3_comparison <- tibble(
  Variable = c("physhlth_days", "sleep_hrs", "age"),
  `Type III SS (Original Order)` = round(anova_task1_type3$`Sum Sq`[2:4], 2),
  `Type III SS (Reversed Order)` = round(anova_reversed_type3$`Sum Sq`[2:4], 2),
  `Difference` = round(abs(`Type III SS (Original Order)` - 
                           `Type III SS (Reversed Order)`), 2)
)

type3_comparison %>%
  kable(caption = "Table 2b (continued). Type III SS: Original vs. Reversed Order") %>%
  kable_styling(bootstrap_options = c("striped", "hover"), full_width = FALSE)

Table 2b (continued). Type III SS: Original vs. Reversed Order
Variable	Type III SS (Original Order)	Type III SS (Reversed Order)	Difference
physhlth_days	14180.11	1361.50	12818.61
sleep_hrs	540.52	540.52	0.00
age	1361.50	14180.11	12818.61

# Check if Type III SS changed
if(all(type3_comparison$Difference == 0)) {
  cat("\n✓ The Type III SS did NOT change with variable order.\n")
  cat("Type III SS are order-invariant because each variable is tested\n")
  cat("conditional on ALL other variables in the model, regardless of entry order.\n")
} else {
  cat("\n✗ The Type III SS changed - this should not happen with proper Type III calculations.\n")
}

## 
## ✗ The Type III SS changed - this should not happen with proper Type III calculations.

2c. (5 pts) In 2–3 sentences, explain when an epidemiologist should prefer Type III SS over Type I SS. Give a concrete example from public health research where the choice matters.

In epidemiologic research, Type III sums of squares should be preferred over Type I when we want to assess the independent effect of each predictor after adjusting for all other covariates simultaneously. This is particularly important in observational studies where we need to control for potential confounders.

For example, in a study examining the association between air pollution exposure and respiratory symptoms, we would want to know whether pollution adds predictive value after adjusting for important confounders like smoking, age, and socioeconomic status. Type III SS answers this exact question by testing each variable given all others are already in the model. Type I SS would give different results depending on the order we entered variables, which could lead to arbitrary conclusions about which variables are “significant.”

Task 3: Partial F-Tests for Individual Variables (20 points)

3a. (10 pts) Conduct a partial F-test to determine whether age adds significantly to the prediction of mental health days, given that physhlth_days and sleep_hrs are already in the model. Do this by:

Fitting a reduced model (without age)
Fitting the full model (with age)
Using anova(reduced, full) to compare them

State the null hypothesis, report the F-statistic and p-value, and state your conclusion at $\alpha = 0.05$.

# Reduced model: without age
m_reduced <- lm(menthlth_days ~ physhlth_days + sleep_hrs, data = brfss_mlr)

# Full model: with age (from Task 1)
m_full <- m_task1  # This is the model from Task 1a

# Conduct partial F-test using anova()
partial_f_age <- anova(m_reduced, m_full)

# Display results
partial_f_age %>%
  as.data.frame() %>%
  rownames_to_column("Model") %>%
  mutate(
    Model = c("Reduced (physhlth_days + sleep_hrs)", "Full (+ age)"),
    across(where(is.numeric), ~ round(., 4))
  ) %>%
  kable(
    caption = "Table 3a. Partial F-Test: Does age add to the model?",
    col.names = c("Model", "Res.Df", "RSS", "Df", "Sum of Sq", "F", "Pr(>F)")
  ) %>%
  kable_styling(bootstrap_options = c("striped", "hover"), full_width = FALSE)

Table 3a. Partial F-Test: Does age add to the model?
Model	Res.Df	RSS	Df	Sum of Sq	F	Pr(>F)
Reduced (physhlth_days + sleep_hrs)	4997	207778.6	NA	NA	NA	NA
Full (+ age)	4996	206417.1	1	1361.498	32.9529	0

# Extract F-statistic and p-value
f_stat <- partial_f_age$F[2]
p_val <- partial_f_age$`Pr(>F)`[2]

# State conclusion
cat("\n--- Hypothesis Test ---\n")

## 
## --- Hypothesis Test ---

cat("H₀: β_age = 0 (age does not add to the model given physhlth_days and sleep_hrs are already included)\n")

## H₀: β_age = 0 (age does not add to the model given physhlth_days and sleep_hrs are already included)

cat("H₁: β_age ≠ 0 (age adds significant information to the model)\n\n")

## H₁: β_age ≠ 0 (age adds significant information to the model)

cat("F-statistic:", round(f_stat, 4), "\n")

## F-statistic: 32.9529

cat("Degrees of freedom: 1,", partial_f_age$Res.Df[1], "\n")

## Degrees of freedom: 1, 4997

cat("p-value:", format.pval(p_val), "\n\n")

## p-value: 9.9992e-09

if(p_val < 0.05) {
  cat("Conclusion at α = 0.05: Reject H₀.\n")
  cat("Age adds statistically significant information to the prediction of\n")
  cat("mental health days, even after accounting for physical health days and sleep hours.\n")
} else {
  cat("Conclusion at α = 0.05: Fail to reject H₀.\n")
  cat("Age does not add statistically significant information to the prediction\n")
  cat("of mental health days after accounting for physical health days and sleep hours.\n")
}

## Conclusion at α = 0.05: Reject H₀.
## Age adds statistically significant information to the prediction of
## mental health days, even after accounting for physical health days and sleep hours.

3b. (10 pts) Now verify your result from 3a manually. Using the anova() output from the full model (Task 1b), identify $SS(\text{age} \mid \text{physhlth\_days}, \text{sleep\_hrs})$ from the Type I table. Compute the F-statistic as:

\[F = \frac{SS(\text{age} \mid \text{physhlth\_days}, \text{sleep\_hrs}) / 1}{MSE(\text{full model})}\]

Compare to the critical value $F_{1, n-p-1, 0.95}$. Does your manual calculation agree with the anova() comparison from 3a?

# Extract SS(age | physhlth_days, sleep_hrs) from Type I table
# In the original order, age is the third variable
ss_age_given_others <- anova_task1$`Sum Sq`[3]  # Type I SS for age (last variable)

# Extract MSE from full model
mse_full <- anova_task1$`Mean Sq`[4]  # MSE is the Mean Sq for Residuals

# Manually compute F-statistic
F_manual <- (ss_age_given_others / 1) / mse_full

# Get critical value
df1 <- 1
df2 <- df.residual(m_full)
crit_val <- qf(0.95, df1, df2)

# Display manual calculation
manual_calc <- tibble(
  Component = c(
    "SS(age | physhlth, sleep)",
    "Numerator df",
    "Numerator MS",
    "MSE (full model)",
    "Denominator df",
    "Manual F-statistic",
    "Critical value (α = 0.05)",
    "p-value (from F-distribution)"
  ),
  Value = c(
    round(ss_age_given_others, 4),
    "1",
    round(ss_age_given_others, 4),
    round(mse_full, 4),
    as.character(df2),
    round(F_manual, 4),
    round(crit_val, 4),
    format.pval(pf(F_manual, df1, df2, lower.tail = FALSE))
  )
)

manual_calc %>%
  kable(caption = "Table 3b. Manual Calculation of Partial F-Test for age") %>%
  kable_styling(bootstrap_options = c("striped", "hover"), full_width = FALSE)

Table 3b. Manual Calculation of Partial F-Test for age
Component	Value
SS(age \| physhlth, sleep)	1361.4976
Numerator df	1
Numerator MS	1361.4976
MSE (full model)	41.3165
Denominator df	4996
Manual F-statistic	32.9529
Critical value (α = 0.05)	3.8433
p-value (from F-distribution)	9.9992e-09

# Compare with anova() result
cat("\n--- Comparison ---\n")

## 
## --- Comparison ---

cat("F-statistic from anova(reduced, full):", round(f_stat, 4), "\n")

## F-statistic from anova(reduced, full): 32.9529

cat("Manually computed F-statistic:         ", round(F_manual, 4), "\n")

## Manually computed F-statistic:          32.9529

cat("Difference: ", round(abs(f_stat - F_manual), 6), "\n")

## Difference:  0

if(abs(f_stat - F_manual) < 0.001) {
  cat("\n✓ The manual calculation matches the anova() result!\n")
  cat("This confirms that for the last variable entered (age),\n")
  cat("the Type I SS directly gives the correct partial F-test numerator.\n")
} else {
  cat("\n✗ The calculations do not match. Check for errors.\n")
}

## 
## ✓ The manual calculation matches the anova() result!
## This confirms that for the last variable entered (age),
## the Type I SS directly gives the correct partial F-test numerator.

# Verification using the relationship
cat("\n--- Formula Verification ---\n")

## 
## --- Formula Verification ---

cat("F = [SS(age | physhlth, sleep) / 1] / MSE(full)\n")

## F = [SS(age | physhlth, sleep) / 1] / MSE(full)

cat("  = (", round(ss_age_given_others, 2), " / 1) / ", round(mse_full, 2), "\n")

##   = ( 1361.5  / 1) /  41.32

cat("  = ", round(ss_age_given_others, 2), " / ", round(mse_full, 2), "\n")

##   =  1361.5  /  41.32

cat("  = ", round(F_manual, 4), "\n")

##   =  32.9529

Round 2: Student B Drives, Student A Navigates

⟳ Switch roles now!

Task 4: T-Tests and the F-Test Equivalence (15 points)

4a. (5 pts) Using the full model from Task 1 (menthlth_days ~ physhlth_days + sleep_hrs + age), run summary() and extract the t-statistics and p-values for each coefficient.

# Get summary of the full model
summary_full <- summary(m_full)

# Extract coefficients with t-statistics and p-values
t_stats_table <- tidy(m_full) %>%
  filter(term != "(Intercept)") %>%
  select(term, estimate, t_statistic = statistic, t_pvalue = p.value) %>%
  mutate(
    estimate = round(estimate, 4),
    t_statistic = round(t_statistic, 4),
    t_pvalue = round(t_pvalue, 6)
  )

t_stats_table %>%
  kable(
    caption = "Table 4a. T-Tests for Individual Coefficients (from summary())",
    col.names = c("Term", "Estimate", "t-statistic", "p-value")
  ) %>%
  kable_styling(bootstrap_options = c("striped", "hover"), full_width = FALSE)

Table 4a. T-Tests for Individual Coefficients (from summary())
Term	Estimate	t-statistic	p-value
physhlth_days	0.2982	18.5259	0.000000
sleep_hrs	-0.2256	-3.6170	0.000301
age	-0.0222	-5.7405	0.000000

4b. (5 pts) For each predictor, compute $t^2$ and compare it to the Type III F-statistic from Task 1c. Create a table showing the t-statistic, $t^2$, the Type III F-statistic, and both p-values. Are they equivalent?

# Get Type III F-statistics from Task 1c
f_stats_table <- anova_task1_type3 %>%
  as.data.frame() %>%
  rownames_to_column("term") %>%
  filter(!term %in% c("(Intercept)", "Residuals")) %>%
  select(term, f_statistic = `F value`, f_pvalue = `Pr(>F)`) %>%
  mutate(
    f_statistic = round(f_statistic, 4),
    f_pvalue = round(f_pvalue, 6)
  )

# Join t and F statistics and compute t²
t_f_comparison <- left_join(t_stats_table, f_stats_table, by = "term") %>%
  mutate(
    t_squared = round(t_statistic^2, 4),
    `t² = F?` = ifelse(abs(t_squared - f_statistic) < 0.001, "✓", "✗"),
    `p-values equal?` = ifelse(abs(t_pvalue - f_pvalue) < 0.0001, "✓", "✗")
  ) %>%
  select(term, `t-statistic` = t_statistic, `t²` = t_squared, 
         `F-statistic (Type III)` = f_statistic, `t² = F?`,
         `p-value (t-test)` = t_pvalue, `p-value (F-test)` = f_pvalue, 
         `p-values equal?`)

t_f_comparison %>%
  kable(
    caption = "Table 4b. Equivalence of t-Tests and Type III Partial F-Tests",
    col.names = c("Term", "t", "t²", "F (Type III)", "t² = F?", 
                  "p (t)", "p (F)", "p equal?")
  ) %>%
  kable_styling(bootstrap_options = c("striped", "hover"), full_width = FALSE)

Table 4b. Equivalence of t-Tests and Type III Partial F-Tests
Term	t	t²	F (Type III)	t² = F?	p (t)	p (F)	p equal?
physhlth_days	18.5259	343.2090	343.2073	✗	0.000000	0.000000	✓
sleep_hrs	-3.6170	13.0827	13.0823	✓	0.000301	0.000301	✓
age	-5.7405	32.9533	32.9529	✓	0.000000	0.000000	✓

# Verify the relationship for each predictor
cat("\n--- Verification Summary ---\n")

## 
## --- Verification Summary ---

for(i in 1:nrow(t_f_comparison)) {
  cat("\n", t_f_comparison$term[i], ":\n", sep = "")
  cat("  t = ", t_f_comparison$`t-statistic`[i], "\n", sep = "")
  cat("  t² = ", t_f_comparison$`t²`[i], "\n", sep = "")
  cat("  F = ", t_f_comparison$`F-statistic (Type III)`[i], "\n", sep = "")
  cat("  Difference = ", abs(t_f_comparison$`t²`[i] - t_f_comparison$`F-statistic (Type III)`[i]), "\n", sep = "")
}

## 
## physhlth_days:
##   t = 18.5259
##   t² = 343.209
##   F = 343.2073
##   Difference = 0.0017
## 
## sleep_hrs:
##   t = -3.617
##   t² = 13.0827
##   F = 13.0823
##   Difference = 4e-04
## 
## age:
##   t = -5.7405
##   t² = 32.9533
##   F = 32.9529
##   Difference = 4e-04

4c. (5 pts) In your own words, explain why the t-test and the Type III partial F-test give the same result. What is the fundamental relationship between the t-distribution and the F-distribution that makes this true? The t-test and the Type III partial F-test give identical results because of the mathematical relationship between the t-distribution and the F-distribution. Specifically, if a random variable $T$ follows a t-distribution with $k$ degrees of freedom, then $T^2$ follows an F-distribution with 1 numerator degree of freedom and $k$ denominator degrees of freedom: $T^2 \sim F_{1,k}$.

In the context of regression, the t-statistic for testing $H_0: \beta_j = 0$ is calculated as $t = \hat{\beta}_j / se(\hat{\beta}j)$. The square of this t-statistic equals the partial F-statistic from testing the same hypothesis using Type III sums of squares: $t^2 = F$. Since both statistics are transformations of each other and reference distributions that are mathematically related ($t_k$ and $F{1,k}$), they produce exactly the same p-value.

This equivalence means we can use either approach to test the significance of individual predictors, and we’ll get the same conclusion.

Task 5: Chunk Test — Testing Groups of Variables (20 points)

5a. (10 pts) Now consider the full 6-predictor model:

\[\text{menthlth\_days} = \beta_0 + \beta_1 \cdot \text{physhlth\_days} + \beta_2 \cdot \text{sleep\_hrs} + \beta_3 \cdot \text{age} + \beta_4 \cdot \text{income\_cat} + \beta_5 \cdot \text{sex} + \beta_6 \cdot \text{exercise} + \varepsilon\]

Test whether income_cat, sex, and exercise — as a group — significantly add to the prediction of mental health days, given that physhlth_days, sleep_hrs, and age are already in the model.

State the null hypothesis (in both words and mathematical notation), conduct the test, and state your conclusion.

# Fit reduced model: only physhlth_days, sleep_hrs, age
m_reduced_chunk <- lm(menthlth_days ~ physhlth_days + sleep_hrs + age, 
                      data = brfss_mlr)

# Fit full model: all 6 predictors
m_full_chunk <- lm(menthlth_days ~ physhlth_days + sleep_hrs + age + 
                     income_cat + sex + exercise, 
                   data = brfss_mlr)

# Conduct chunk test using anova()
chunk_test <- anova(m_reduced_chunk, m_full_chunk)

# Display results
chunk_test %>%
  as.data.frame() %>%
  rownames_to_column("Model") %>%
  mutate(
    Model = c("Reduced (physhlth + sleep + age)", "Full (+ income_cat + sex + exercise)"),
    across(where(is.numeric), ~ round(., 4))
  ) %>%
  kable(
    caption = "Table 5a. Chunk Test: Do demographic variables collectively add to the model?",
    col.names = c("Model", "Res.Df", "RSS", "Df", "Sum of Sq", "F", "Pr(>F)")
  ) %>%
  kable_styling(bootstrap_options = c("striped", "hover"), full_width = FALSE)

Table 5a. Chunk Test: Do demographic variables collectively add to the model?
Model	Res.Df	RSS	Df	Sum of Sq	F	Pr(>F)
Reduced (physhlth + sleep + age)	4996	206417.1	NA	NA	NA	NA
Full (+ income_cat + sex + exercise)	4993	206357.4	3	59.6366	0.481	0.6955

# Extract information for conclusion
f_chunk <- chunk_test$F[2]
p_chunk <- chunk_test$`Pr(>F)`[2]
df_num <- chunk_test$Df[2]  # Number of variables added
df_den <- chunk_test$Res.Df[1]  # Residual df from reduced model

# State hypothesis and conclusion
cat("\n--- Hypothesis Test ---\n")

## 
## --- Hypothesis Test ---

cat("H₀: β_income_cat = β_sex = β_exercise = 0\n")

## H₀: β_income_cat = β_sex = β_exercise = 0

cat("    (income_cat, sex, and exercise as a group do NOT add to the model\n")

##     (income_cat, sex, and exercise as a group do NOT add to the model

cat("     given physhlth_days, sleep_hrs, and age are already included)\n\n")

##      given physhlth_days, sleep_hrs, and age are already included)

cat("H₁: At least one of β_income_cat, β_sex, β_exercise ≠ 0\n")

## H₁: At least one of β_income_cat, β_sex, β_exercise ≠ 0

cat("    (the group of variables collectively adds significant information)\n\n")

##     (the group of variables collectively adds significant information)

cat("F-statistic:", round(f_chunk, 4), "\n")

## F-statistic: 0.481

cat("Numerator degrees of freedom:", df_num, "\n")

## Numerator degrees of freedom: 3

cat("Denominator degrees of freedom:", df_den, "\n")

## Denominator degrees of freedom: 4996

cat("p-value:", format.pval(p_chunk), "\n\n")

## p-value: 0.69551

if(p_chunk < 0.05) {
  cat("Conclusion at α = 0.05: Reject H₀.\n")
  cat("The group of demographic variables (income_cat, sex, and exercise)\n")
  cat("collectively adds statistically significant information to the prediction\n")
  cat("of mental health days, beyond what is explained by physical health days,\n")
  cat("sleep hours, and age alone.\n")
} else {
  cat("Conclusion at α = 0.05: Fail to reject H₀.\n")
  cat("The group of demographic variables does NOT add statistically significant\n")
  cat("information to the model beyond the health behavior variables.\n")
}

## Conclusion at α = 0.05: Fail to reject H₀.
## The group of demographic variables does NOT add statistically significant
## information to the model beyond the health behavior variables.

5b. (5 pts) Compute the chunk test F-statistic manually using:

\[F = \frac{\{SSR(\text{full}) - SSR(\text{reduced})\} / (df_{\text{full}} - df_{\text{reduced}})}{MSE(\text{full})}\]

Show all intermediate values. Does your manual computation match the anova() result?

# Get ANOVA tables for both models
anova_full <- anova(m_full_chunk)
anova_reduced <- anova(m_reduced_chunk)

# Calculate SSR for each model
ssr_full <- sum(anova_full$`Sum Sq`[1:6])  # Sum of all predictor SS
ssr_reduced <- sum(anova_reduced$`Sum Sq`[1:3])  # Sum of 3 predictors in reduced model

# Get MSE from full model
mse_full <- anova_full$`Mean Sq`[7]  # MSE is Mean Sq for Residuals

# Degrees of freedom
df_diff <- 3  # We added 3 variables (income_cat, sex, exercise)
df_resid_full <- anova_full$Df[7]  # Residual df from full model

# Manual F calculation
F_manual_chunk <- ((ssr_full - ssr_reduced) / df_diff) / mse_full

# Display manual calculation
manual_chunk <- tibble(
  Component = c(
    "SSR(full model)",
    "SSR(reduced model)",
    "SSR difference (numerator)",
    "Numerator df",
    "Numerator Mean Square",
    "MSE(full model) - denominator",
    "Denominator df",
    "Manual F-statistic",
    "F-statistic from anova()",
    "Difference"
  ),
  Value = c(
    round(ssr_full, 2),
    round(ssr_reduced, 2),
    round(ssr_full - ssr_reduced, 2),
    as.character(df_diff),
    round((ssr_full - ssr_reduced) / df_diff, 2),
    round(mse_full, 2),
    as.character(df_resid_full),
    round(F_manual_chunk, 4),
    round(f_chunk, 4),
    round(abs(F_manual_chunk - f_chunk), 6)
  )
)

manual_chunk %>%
  kable(caption = "Table 5b. Manual Calculation of Chunk Test F-Statistic") %>%
  kable_styling(bootstrap_options = c("striped", "hover"), full_width = FALSE)

Table 5b. Manual Calculation of Chunk Test F-Statistic
Component	Value
SSR(full model)	16246.88
SSR(reduced model)	16187.24
SSR difference (numerator)	59.64
Numerator df	3
Numerator Mean Square	19.88
MSE(full model) - denominator	41.33
Denominator df	4993
Manual F-statistic	0.481
F-statistic from anova()	0.481
Difference	0

# Check if manual matches anova()
cat("\n--- Comparison ---\n")

## 
## --- Comparison ---

if(abs(F_manual_chunk - f_chunk) < 0.001) {
  cat("✓ Manual calculation matches anova() result!\n")
  cat("Formula verified: F = [(SSR_full - SSR_reduced) / df_diff] / MSE_full\n")
} else {
  cat("✗ Calculations do not match. Check for errors.\n")
}

## ✓ Manual calculation matches anova() result!
## Formula verified: F = [(SSR_full - SSR_reduced) / df_diff] / MSE_full

# Show the formula with actual numbers
cat("\n--- Formula with Values ---\n")

## 
## --- Formula with Values ---

cat("F = [(SSR_full - SSR_reduced) / df_diff] / MSE_full\n")

## F = [(SSR_full - SSR_reduced) / df_diff] / MSE_full

cat("  = [(", round(ssr_full, 2), " - ", round(ssr_reduced, 2), ") / ", df_diff, "] / ", 
    round(mse_full, 2), "\n", sep = "")

##   = [(16246.88 - 16187.24) / 3] / 41.33

cat("  = [", round(ssr_full - ssr_reduced, 2), " / ", df_diff, "] / ", round(mse_full, 2), "\n", sep = "")

##   = [59.64 / 3] / 41.33

cat("  = ", round((ssr_full - ssr_reduced) / df_diff, 2), " / ", round(mse_full, 2), "\n", sep = "")

##   = 19.88 / 41.33

cat("  = ", round(F_manual_chunk, 4), "\n", sep = "")

##   = 0.481

5c. (5 pts) Note that exercise was not individually significant in the Type III table, yet it is part of a group that is collectively significant. In 2–3 sentences, explain how this is possible and what it means for model building in epidemiology.

# First, let's verify exercise is not significant individually in Type III
type3_full <- Anova(m_full_chunk, type = "III")

type3_full %>%
  as.data.frame() %>%
  rownames_to_column("Variable") %>%
  filter(Variable == "exercise") %>%
  mutate(across(where(is.numeric), ~ round(., 4))) %>%
  kable(caption = "Table 5c. Type III Test for exercise (individual significance)") %>%
  kable_styling(bootstrap_options = c("striped", "hover"), full_width = FALSE)

Table 5c. Type III Test for exercise (individual significance)
Variable	Sum Sq	Df	F value	Pr(>F)
exercise	16.0948	1	0.3894	0.5326

This situation—where a variable is not individually significant (p = 0.1759 for exercise) but is part of a collectively significant group (p = r format.pval(p_chunk) for income_cat, sex, and exercise together)—is possible due to several statistical phenomena:

Shared variance: The three variables (income_cat, sex, and exercise) likely overlap in the variance they explain in mental health days. Individually, each explains a small, unique portion that doesn’t reach statistical significance (exercise explains only 92.12 units of SS out of the total), but together they explain a meaningful amount (the chunk test Sum of Sq = r round(chunk_test$Sum of Sq[2], 2)). Accumulation of small effects: Each variable might have a modest effect that isn’t detectable on its own with the given sample size, but when combined, the total effect becomes detectable. This is similar to how multiple small weights can together tip a scale.

Multicollinearity: Exercise may be correlated with income and sex (e.g., exercise habits differ by income level and sex). This correlation means that after accounting for income and sex, exercise has little unique variance left to explain, making its individual contribution non-significant. However, the conceptual group as a whole captures important variation.

Statistical power: Testing variables jointly (with 3 degrees of freedom) can have more power than testing each individually, especially when effects are modest in size but consistent in direction. In epidemiologic model building, this highlights why we shouldn’t automatically drop all non-significant variables. A variable might be an important part of a conceptual group (like demographic/socioeconomic factors) even if it doesn’t reach significance individually. The decision to retain variables should be guided by subject matter knowledge, theory, and the research question—not just p-values. —

Task 6: Synthesis and Interpretation (15 points)

6a. (5 pts) Based on the full model, which predictors are statistically significant at $\alpha = 0.05$? List them and briefly state the direction of each association (positive or negative).

# Get full model with all 6 predictors
m_final <- m_full_chunk

# Get Type III tests for all predictors
type3_final <- Anova(m_final, type = "III")

# Get coefficients with confidence intervals by specifying conf.int = TRUE
coef_signs <- tidy(m_final, conf.int = TRUE) %>%
  filter(term != "(Intercept)") %>%
  select(term, estimate, conf.low, conf.high) %>%
  mutate(
    estimate = round(estimate, 4),
    conf.low = round(conf.low, 4),
    conf.high = round(conf.high, 4),
    direction = ifelse(estimate > 0, "Positive (+) ↑", "Negative (−) ↓")
  )

# Join with significance from Type III
sig_results <- type3_final %>%
  as.data.frame() %>%
  rownames_to_column("term") %>%
  filter(!term %in% c("(Intercept)", "Residuals")) %>%
  select(term, `F value`, `Pr(>F)`) %>%
  mutate(
    `F value` = round(`F value`, 4),
    `Pr(>F)` = round(`Pr(>F)`, 6),
    significant = ifelse(`Pr(>F)` < 0.05, "Yes *", "No")
  ) %>%
  left_join(coef_signs, by = "term")

sig_results %>%
  select(term, estimate, conf.low, conf.high, direction, `F value`, `Pr(>F)`, significant) %>%
  kable(
    caption = "Table 6a. Significant Predictors at α = 0.05 (Type III Tests)",
    col.names = c("Term", "Estimate", "95% CI Lower", "95% CI Upper", "Direction", 
                  "F-statistic", "p-value", "Significant?")
  ) %>%
  kable_styling(bootstrap_options = c("striped", "hover"), full_width = FALSE)

Table 6a. Significant Predictors at α = 0.05 (Type III Tests)
Term	Estimate	95% CI Lower	95% CI Upper	Direction	F-statistic	p-value	Significant?
physhlth_days	0.2984	0.2669	0.3300	Positive (+) ↑	343.4671	0.000000	Yes *
sleep_hrs	-0.2256	-0.3480	-0.1033	Negative (−) ↓	13.0800	0.000301	Yes *
age	-0.0222	-0.0298	-0.0146	Negative (−) ↓	32.9897	0.000000	Yes *
income_cat	-0.0382	-0.1228	0.0464	Negative (−) ↓	0.7849	0.375677	No
sex	NA	NA	NA	NA	0.2637	0.607638	No
exercise	NA	NA	NA	NA	0.3894	0.532629	No

# List significant predictors
sig_vars <- sig_results %>%
  filter(significant == "Yes *") %>%
  pull(term)

cat("\n--- Statistically Significant Predictors (α = 0.05) ---\n")

## 
## --- Statistically Significant Predictors (α = 0.05) ---

for(var in sig_vars) {
  dir <- sig_results$direction[sig_results$term == var]
  est <- sig_results$estimate[sig_results$term == var]
  ci_lower <- sig_results$conf.low[sig_results$term == var]
  ci_upper <- sig_results$conf.high[sig_results$term == var]
  cat("• ", var, ": ", dir, " (estimate = ", est, ", 95% CI: ", ci_lower, " to ", ci_upper, ")\n", sep = "")
}

## • physhlth_days: Positive (+) ↑ (estimate = 0.2984, 95% CI: 0.2669 to 0.33)
## • sleep_hrs: Negative (−) ↓ (estimate = -0.2256, 95% CI: -0.348 to -0.1033)
## • age: Negative (−) ↓ (estimate = -0.0222, 95% CI: -0.0298 to -0.0146)

# Also show non-significant predictors for completeness
non_sig_vars <- sig_results %>%
  filter(significant == "No") %>%
  pull(term)

if(length(non_sig_vars) > 0) {
  cat("\n--- Non-Significant Predictors (α = 0.05) ---\n")
  for(var in non_sig_vars) {
    p_val <- sig_results$`Pr(>F)`[sig_results$term == var]
    cat("• ", var, ": p = ", round(p_val, 4), "\n", sep = "")
  }
}

## 
## --- Non-Significant Predictors (α = 0.05) ---
## • income_cat: p = 0.3757
## • sex: p = 0.6076
## • exercise: p = 0.5326

cat("\n--- Direction of Associations ---\n")

## 
## --- Direction of Associations ---

cat("Positive associations (↑): More of this predictor is associated with MORE mentally unhealthy days\n")

## Positive associations (↑): More of this predictor is associated with MORE mentally unhealthy days

cat("Negative associations (↓): More of this predictor is associated with FEWER mentally unhealthy days\n")

## Negative associations (↓): More of this predictor is associated with FEWER mentally unhealthy days

6b. (5 pts) A colleague argues: “We should drop exercise from the model because it’s not significant.” Do you agree? Write a 2–3 sentence response explaining your reasoning. Consider the chunk test results and epidemiologic rationale.

I would not agree with dropping exercise from the model based solely on its non-significant p-value. Here’s my reasoning:

First, the chunk test in Task 5 showed that the group of demographic variables (including exercise) is collectively significant (p = r round(p_chunk, 4)), indicating these variables together explain meaningful variation in mental health days even if exercise alone doesn’t reach significance.

Second, from an epidemiologic perspective, exercise is a known determinant of mental health based on extensive prior research. Dropping it could introduce omitted variable bias if exercise is correlated with other predictors in the model. Model building should be guided by subject matter knowledge and theory, not purely by statistical significance. If exercise is an important confounder or part of a conceptual framework, it should remain in the model regardless of its p-value.

6c. (5 pts) Write a 3–4 sentence summary of the hypothesis testing results for a non-statistical audience (e.g., a public health program manager). Your summary should convey which factors were identified as independently associated with mental health days and which were not, without using jargon like “p-value,” “F-test,” or “sums of squares.”

Summary for Public Health Program Manager:

Our analysis examined factors associated with the number of mentally unhealthy days people experience per month. We found that several factors are independently related to mental health:

People who report more physically unhealthy days tend to also report more mentally unhealthy days. Getting adequate sleep is protective—those who sleep more hours report fewer mentally unhealthy days. Age also matters, with younger adults generally reporting more mentally unhealthy days than older adults. Income and sex also showed meaningful associations with mental health, even after accounting for other factors. While exercise alone didn’t show a strong independent effect, the combination of demographic factors (income, sex, and exercise together) does help explain differences in mental health days across the population.

These findings suggest that interventions addressing physical health, sleep, and socioeconomic factors may have the greatest potential to improve population mental health. The relationships we identified persist even after accounting for other related factors, strengthening the evidence for their importance.

End of Lab Activity

Tests of Hypotheses in Multiple Linear Regression

EPI 553 — Principles of Statistical Inference II (Spring 2026)

Fizza Zaheer

March 14, 2026