Introduction

In previous lectures, we built multiple linear regression models that included several predictors. We interpreted each coefficient as the expected change in \(Y\) for a one-unit increase in \(X_j\), “holding all other predictors constant.” But we did not examine two critical methodological questions:

Confounding: Does the estimated effect of our exposure change when we add or remove covariates? If so, those covariates are confounders, and we need them in the model to get an unbiased estimate.
Interaction (Effect Modification): Is the effect of our exposure the same for everyone, or does it differ across subgroups? If the effect of sleep on physical health is different for men and women, we say that sex modifies the effect of sleep.

These are the two most important methodological concepts in associative (etiologic) modeling, the type of modeling most common in epidemiology.

Research question for today:

Is the association between sleep duration and physically unhealthy days modified by sex or education? And which variables confound this association?

Predictive vs. Associative Modeling

Before we dive in, it is important to revisit the distinction between the two primary goals of regression modeling, because confounding and interaction are only relevant in one of them.

Goal	Question	What matters most
Predictive	Which set of variables best predicts \(Y\)?	Overall model accuracy (\(R^2\), RMSE, out-of-sample prediction)
Associative	What is the effect of a specific exposure on \(Y\), after adjusting for confounders?	Accuracy and interpretability of a specific \(\hat{\beta}\)

In predictive modeling, we want the combination of variables that minimizes prediction error. We do not care whether individual coefficients are “correct” or interpretable, only whether the model predicts well.

In associative modeling, we care deeply about one (or a few) specific coefficients. We want to ensure that \(\hat{\beta}_{\text{exposure}}\) reflects the true relationship, free from confounding bias. This is the setting where confounding and interaction become critical.

In epidemiology, we are almost always doing associative modeling. We have a specific exposure of interest (e.g., sleep) and want to estimate its effect on a health outcome (e.g., physical health days) while controlling for confounders.

A Warning About Extrapolation

We should never extrapolate predictions from a statistical model beyond the range of the observed data. Extrapolation assumes that the relationship continues unchanged into regions where we have no data, which is often false.

A famous example: NASA engineers extrapolated O-ring performance data to predict behavior at temperatures colder than any previously tested. The resulting decision to launch the Space Shuttle Challenger in cold weather led to the 1986 disaster.

Rule of thumb: Your model is only valid within the range of the data used to build it.

Setup and Data

library(tidyverse)
library(haven)
library(janitor)
library(knitr)
library(kableExtra)
library(broom)
library(gtsummary)
library(GGally)
library(car)
library(ggeffects)
library(plotly)
library(lmtest)

options(gtsummary.use_ftExtra = TRUE)
set_gtsummary_theme(theme_gtsummary_compact(set_theme = TRUE))

Preparing the Dataset

We continue with the BRFSS 2020 dataset. For this lecture we shift our outcome to physically unhealthy days and examine sleep as the primary exposure, with sex, education, age, exercise, and general health as potential modifiers and confounders.

brfss_full <- read_xpt(
 "C:/Users/God's Icon/Desktop/553-Statistical Inference/553 Lab Codes/data/LLCP2020XPT/LLCP2020.XPT"
) |>
  clean_names()

brfss_ci <- brfss_full |>
  mutate(
    # Outcome: physically unhealthy days in past 30
    physhlth_days = case_when(
      physhlth == 88                    ~ 0,
      physhlth >= 1 & physhlth <= 30   ~ as.numeric(physhlth),
      TRUE                             ~ NA_real_
    ),
    # Primary exposure: sleep hours
    sleep_hrs = case_when(
      sleptim1 >= 1 & sleptim1 <= 14   ~ as.numeric(sleptim1),
      TRUE                             ~ NA_real_
    ),
    # Mentally unhealthy days (covariate)
    menthlth_days = case_when(
      menthlth == 88                    ~ 0,
      menthlth >= 1 & menthlth <= 30   ~ as.numeric(menthlth),
      TRUE                             ~ NA_real_
    ),
    # Age
    age = age80,
    # Sex
    sex = factor(sexvar, levels = c(1, 2), labels = c("Male", "Female")),
    # Education (4-level)
    education = factor(case_when(
      educa %in% c(1, 2, 3) ~ "Less than HS",
      educa == 4             ~ "HS graduate",
      educa == 5             ~ "Some college",
      educa == 6             ~ "College graduate",
      TRUE                   ~ NA_character_
    ), levels = c("Less than HS", "HS graduate", "Some college", "College graduate")),
    # Exercise in past 30 days
    exercise = factor(case_when(
      exerany2 == 1 ~ "Yes",
      exerany2 == 2 ~ "No",
      TRUE          ~ NA_character_
    ), levels = c("No", "Yes")),
    # General health status
    gen_health = factor(case_when(
      genhlth == 1 ~ "Excellent",
      genhlth == 2 ~ "Very good",
      genhlth == 3 ~ "Good",
      genhlth == 4 ~ "Fair",
      genhlth == 5 ~ "Poor",
      TRUE         ~ NA_character_
    ), levels = c("Excellent", "Very good", "Good", "Fair", "Poor")),
    # Income category (ordinal 1-8)
    income_cat = case_when(
      income2 %in% 1:8 ~ as.numeric(income2),
      TRUE             ~ NA_real_
    )
  ) |>
  filter(
    !is.na(physhlth_days),
    !is.na(sleep_hrs),
    !is.na(menthlth_days),
    !is.na(age), age >= 18,
    !is.na(sex),
    !is.na(education),
    !is.na(exercise),
    !is.na(gen_health),
    !is.na(income_cat)
  )

# Reproducible random sample
set.seed(1220)
brfss_ci <- brfss_ci |>
  select(physhlth_days, sleep_hrs, menthlth_days, age, sex,
         education, exercise, gen_health, income_cat) |>
  slice_sample(n = 5000)

# Save for lab activity
saveRDS(brfss_ci, "C:/Users/God's Icon/Desktop/553-Statistical Inference/553 Lab Codes/data/Confounding_Interaction/brfss_ci_2020.rds")

tibble(Metric = c("Observations", "Variables"),
       Value  = c(nrow(brfss_ci), ncol(brfss_ci))) |>
  kable(caption = "Analytic Dataset Dimensions") |>
  kable_styling(bootstrap_options = "striped", full_width = FALSE)

Analytic Dataset Dimensions
Metric	Value
Observations	5000
Variables	9

Descriptive Statistics

brfss_ci |>
  select(physhlth_days, sleep_hrs, age, sex, education, exercise, gen_health) |>
  tbl_summary(
    label = list(
      physhlth_days ~ "Physically unhealthy days (past 30)",
      sleep_hrs     ~ "Sleep (hours/night)",
      age           ~ "Age (years)",
      sex           ~ "Sex",
      education     ~ "Education level",
      exercise      ~ "Any physical activity (past 30 days)",
      gen_health    ~ "General health status"
    ),
    statistic = list(
      all_continuous()  ~ "{mean} ({sd})",
      all_categorical() ~ "{n} ({p}%)"
    ),
    digits = all_continuous() ~ 1,
    missing = "no"
  ) |>
  add_n() |>
  bold_labels() |>
  italicize_levels() |>
  modify_caption("**Table 1. Descriptive Statistics — BRFSS 2020 (n = 5,000)**") |>
  as_flex_table()

**Table 1. Descriptive Statistics — BRFSS 2020 (n = 5,000)**
Characteristic	N	N = 5,0001
Physically unhealthy days (past 30)	5,000	3.4 (7.9)
Sleep (hours/night)	5,000	7.1 (1.4)
Age (years)	5,000	54.1 (17.0)
Sex	5,000
Male		2,291 (46%)
Female		2,709 (54%)
Education level	5,000
Less than HS		276 (5.5%)
HS graduate		1,263 (25%)
Some college		1,378 (28%)
College graduate		2,083 (42%)
Any physical activity (past 30 days)	5,000	3,929 (79%)
General health status	5,000
Excellent		1,031 (21%)
Very good		1,762 (35%)
Good		1,507 (30%)
Fair		524 (10%)
Poor		176 (3.5%)
1Mean (SD); n (%)

Interpretation: The analytic sample includes 5,000 randomly selected BRFSS 2020 respondents. On average, participants reported 3.4 physically unhealthy days in the past 30 (SD = 7.9), indicating a right-skewed distribution where most respondents report few or no unhealthy days. Mean sleep duration was 7.1 hours per night (SD = 1.4), which aligns with the CDC-recommended range of 7 or more hours. The sample skews female (54.2%) and is well-educated, with 41.7% holding a college degree. The majority (78.6%) reported engaging in physical activity in the past 30 days. These sample characteristics should be kept in mind when interpreting results, as the sample may not be representative of the general U.S. population.

Part 1: Guided Practice — Confounding and Interactions

1. Interaction (Effect Modification)

1.1 What Is Interaction?

Interaction (also called effect modification) is present when the relationship between an exposure and an outcome is different at different levels of a third variable. In regression terms, the slope of the exposure-outcome relationship changes depending on the value of the modifier.

For example, if the effect of sleep on physical health is stronger for women than for men, then sex modifies the effect of sleep. The two variables (sleep and sex) have a multiplicative, not merely additive, effect on the outcome.

This is fundamentally different from confounding:

Concept	Question	Implication
Confounding	Is the crude estimate of the exposure effect biased by a third variable?	Must adjust for the confounder to get a valid estimate
Interaction	Does the effect of the exposure differ across subgroups?	Must report stratum-specific effects, not a single overall estimate

Critical point: Always assess interaction before confounding. If interaction is present, a single “adjusted” coefficient for the exposure is misleading because the effect is not the same for everyone. You must stratify or include interaction terms.

1.2 Interaction Between a Continuous and a Dichotomous Variable

Consider a model with a continuous exposure \(X_1\) (sleep hours) and a dichotomous variable \(X_2\) (sex, where Male = 1 and Female = 0):

\[Y = \beta_0 + \beta_1 X_1 + \beta_2 X_2 + \beta_3 X_1 X_2 + \varepsilon\]

The term \(\beta_3 X_1 X_2\) is the interaction term. Let’s see what happens when we plug in the values for each group:

For males (\(X_2 = 1\)): \[E(Y | X_1, \text{Male}) = (\beta_0 + \beta_2) + (\beta_1 + \beta_3) X_1\]

For females (\(X_2 = 0\)): \[E(Y | X_1, \text{Female}) = \beta_0 + \beta_1 X_1\]

The key insight:

The intercept for males is \(\beta_0 + \beta_2\), while for females it is \(\beta_0\)
The slope for males is \(\beta_1 + \beta_3\), while for females it is \(\beta_1\)
\(\beta_3\) is the difference in slopes between the two groups

If \(\beta_3 = 0\), the slopes are equal and the lines are parallel (no interaction). If \(\beta_3 \neq 0\), the lines are not parallel (interaction is present).

# Create synthetic data to illustrate the concept
set.seed(1220)
n <- 200
concept_data <- tibble(
  x1 = runif(n, 0, 10),
  group = rep(c("Group A", "Group B"), each = n / 2)
)

# No interaction: same slope, different intercepts
no_int <- concept_data |>
  mutate(
    y = ifelse(group == "Group A", 2 + 1.5 * x1, 5 + 1.5 * x1) + rnorm(n, 0, 2)
  )

# Interaction: different slopes
with_int <- concept_data |>
  mutate(
    y = ifelse(group == "Group A", 2 + 1.5 * x1, 5 + 0.3 * x1) + rnorm(n, 0, 2)
  )

p1 <- ggplot(no_int, aes(x = x1, y = y, color = group)) +
  geom_point(alpha = 0.3, size = 1) +
  geom_smooth(method = "lm", se = FALSE, linewidth = 1.2) +
  labs(title = "No Interaction", subtitle = "Parallel lines: same slope",
       x = expression(X[1]), y = "Y") +
  theme_minimal(base_size = 12) +
  scale_color_brewer(palette = "Set1") +
  theme(legend.position = "bottom")

p2 <- ggplot(with_int, aes(x = x1, y = y, color = group)) +
  geom_point(alpha = 0.3, size = 1) +
  geom_smooth(method = "lm", se = FALSE, linewidth = 1.2) +
  labs(title = "Interaction Present", subtitle = "Non-parallel lines: different slopes",
       x = expression(X[1]), y = "Y") +
  theme_minimal(base_size = 12) +
  scale_color_brewer(palette = "Set1") +
  theme(legend.position = "bottom")

gridExtra::grid.arrange(p1, p2, ncol = 2)

No Interaction (Parallel Lines) vs. Interaction (Non-Parallel Lines)

Interpretation: The left panel illustrates no interaction: both groups have the same slope (1.5), so the lines are perfectly parallel. The group variable shifts the intercept (Group B starts higher) but does not change the rate at which \(Y\) increases with \(X_1\). In this scenario, a single slope coefficient adequately describes the \(X_1\)-\(Y\) relationship for both groups. The right panel illustrates interaction: Group A has a steep slope (1.5) while Group B has a shallow slope (0.3). The non-parallel lines mean that the effect of \(X_1\) on \(Y\) depends on which group you belong to, and a single pooled slope would misrepresent the relationship for both groups. This is the visual test for interaction: parallel lines = no interaction; non-parallel lines = interaction.

2. Testing for Interaction: Stratified Analysis

2.1 The Stratified Approach

One way to assess interaction is to split the data by levels of the potential modifier and fit separate regression models in each stratum. If the slopes differ meaningfully, interaction may be present.

Let’s test whether the association between sleep and physical health days differs by sex.

# Fit separate models for males and females
mod_male <- lm(physhlth_days ~ sleep_hrs, data = brfss_ci |> filter(sex == "Male"))
mod_female <- lm(physhlth_days ~ sleep_hrs, data = brfss_ci |> filter(sex == "Female"))

# Compare coefficients
bind_rows(
  tidy(mod_male, conf.int = TRUE) |> mutate(Stratum = "Male"),
  tidy(mod_female, conf.int = TRUE) |> mutate(Stratum = "Female")
) |>
  filter(term == "sleep_hrs") |>
  select(Stratum, estimate, std.error, conf.low, conf.high, p.value) |>
  mutate(across(where(is.numeric), \(x) round(x, 4))) |>
  kable(
    caption = "Stratified Analysis: Sleep → Physical Health Days, by Sex",
    col.names = c("Stratum", "Estimate", "SE", "CI Lower", "CI Upper", "p-value")
  ) |>
  kable_styling(bootstrap_options = c("striped", "hover"), full_width = FALSE)

Stratified Analysis: Sleep → Physical Health Days, by Sex
Stratum	Estimate	SE	CI Lower	CI Upper	p-value
Male	-0.5243	0.1193	-0.7583	-0.2903	0.0000
Female	-0.3752	0.1144	-0.5996	-0.1508	0.0011

Interpretation: The stratified analysis reveals that the sleep-physical health association is negative in both groups: among males, each additional hour of sleep is associated with approximately 0.52 fewer physically unhealthy days, while among females the association is weaker at approximately 0.38 fewer days per additional hour. Both estimates are statistically significant. The slopes appear to differ in magnitude, suggesting that the protective association of sleep with physical health may be somewhat stronger for males than females. However, before concluding that interaction is present, we need a formal statistical test, which we will conduct using an interaction term in Section 3.

ggplot(brfss_ci, aes(x = sleep_hrs, y = physhlth_days, color = sex)) +
  geom_jitter(alpha = 0.08, width = 0.3, height = 0.3, size = 0.8) +
  geom_smooth(method = "lm", se = TRUE, linewidth = 1.2) +
  labs(
    title    = "Association Between Sleep and Physical Health Days, Stratified by Sex",
    subtitle = "Are the slopes parallel or different?",
    x        = "Sleep Hours per Night",
    y        = "Physically Unhealthy Days (Past 30)",
    color    = "Sex"
  ) +
  theme_minimal(base_size = 13) +
  scale_color_brewer(palette = "Set1")

Stratified Regression: Sleep vs. Physical Health Days by Sex

Interpretation: The plot shows both regression lines sloping downward, confirming the negative association between sleep and physically unhealthy days for both sexes. The male regression line (steeper slope) suggests a stronger protective association of sleep compared to the female line. However, the confidence bands overlap substantially, which hints that the difference in slopes may not be statistically significant. The jittered points also reveal the highly skewed nature of the outcome: most respondents cluster near zero unhealthy days regardless of sleep duration, with a long tail of individuals reporting many unhealthy days.

2.2 Limitations of Stratified Analysis

The stratified approach is intuitive and visually compelling, but it has important limitations:

No formal test: We can see the slopes are different, but we cannot compute a p-value for the difference without additional machinery
Reduced power: Each stratum has fewer observations than the full dataset
Multiple comparisons: With \(k\) strata, we would need \(k\) separate models
Cannot adjust for covariates easily while testing the interaction

The solution is to use a single regression model with an interaction term.

3. Testing for Interaction: Regression with Interaction Terms

3.1 The Interaction Model in R

R provides two operators for specifying interactions:

X1:X2 — includes only the interaction term \(X_1 \times X_2\)
X1*X2 — shorthand for X1 + X2 + X1:X2 (main effects plus interaction)

Rule: Any model with an interaction term must also include the main effects that comprise the interaction. Always use * or explicitly include both main effects with :.

# Model without interaction (additive model)
mod_add <- lm(physhlth_days ~ sleep_hrs + sex, data = brfss_ci)

# Model with interaction
mod_int <- lm(physhlth_days ~ sleep_hrs * sex, data = brfss_ci)
# Equivalent to: lm(physhlth_days ~ sleep_hrs + sex + sleep_hrs:sex, data = brfss_ci)

tidy(mod_int, conf.int = TRUE) |>
  mutate(across(where(is.numeric), \(x) round(x, 4))) |>
  kable(
    caption = "Interaction Model: Sleep * Sex → Physical Health Days",
    col.names = c("Term", "Estimate", "SE", "t", "p-value", "CI Lower", "CI Upper")
  ) |>
  kable_styling(bootstrap_options = c("striped", "hover"), full_width = FALSE)

Interaction Model: Sleep * Sex → Physical Health Days
Term	Estimate	SE	t	p-value	CI Lower	CI Upper
(Intercept)	6.8388	0.8723	7.8397	0.0000	5.1287	8.5490
sleep_hrs	-0.5243	0.1222	-4.2898	0.0000	-0.7639	-0.2847
sexFemale	-0.5786	1.1906	-0.4860	0.6270	-2.9127	1.7555
sleep_hrs:sexFemale	0.1491	0.1659	0.8986	0.3689	-0.1762	0.4744

b_int <- round(coef(mod_int), 3)

3.2 Interpreting the Interaction Model

The fitted model is:

\[\widehat{\text{Phys Days}} = 6.839 + -0.524(\text{Sleep}) + -0.579(\text{Female}) + 0.149(\text{Sleep} \times \text{Female})\]

For males (Female = 0): \[\widehat{\text{Phys Days}} = 6.839 + -0.524(\text{Sleep})\]

For females (Female = 1): \[\widehat{\text{Phys Days}} = (6.839 + -0.579) + (-0.524 + 0.149)(\text{Sleep}) = 6.26 + -0.375(\text{Sleep})\]

Interpretation of each coefficient:

Intercept (6.839): Expected physical health days for a male with 0 hours of sleep (extrapolation, not meaningful)
Sleep (-0.524): The slope of sleep for males (the reference group for sex). Each additional hour of sleep is associated with 0.524 fewer physically unhealthy days among males.
Female (-0.579): The difference in intercept between females and males (the difference in expected physical health days at sleep = 0, which is an extrapolation)
Sleep:Female (0.149): The difference in slopes between females and males. This is the interaction term. It tells us how much more (or less) the sleep effect is for females compared to males.

3.3 Testing for Interaction (Parallelism)

The formal test for interaction is simply the t-test for the interaction coefficient:

\[H_0: \beta_3 = 0 \quad \text{(slopes are equal, lines are parallel, no interaction)}\] \[H_A: \beta_3 \neq 0 \quad \text{(slopes differ, interaction is present)}\]

int_term <- tidy(mod_int) |> filter(term == "sleep_hrs:sexFemale")
cat("Interaction term (sleep_hrs:sexFemale):\n")

## Interaction term (sleep_hrs:sexFemale):

cat("  Estimate:", round(int_term$estimate, 4), "\n")

##   Estimate: 0.1491

cat("  t-statistic:", round(int_term$statistic, 3), "\n")

##   t-statistic: 0.899

cat("  p-value:", round(int_term$p.value, 4), "\n")

##   p-value: 0.3689

Interpretation: The interaction term (sleep_hrs:sexFemale) has a coefficient of 0.1491, with a t-statistic of 0.899 and p-value of 0.3689. Since the p-value exceeds the conventional \(\alpha = 0.05\) threshold, we fail to reject the null hypothesis that the slopes are equal. In other words, there is no statistically significant evidence that the association between sleep and physically unhealthy days differs by sex. The positive sign of the interaction coefficient does suggest the sleep effect is slightly less protective for females, but this difference is not distinguishable from chance in this sample. Therefore, a single (pooled) sleep coefficient may be appropriate for both sexes.

3.4 Testing for Coincidence

A stronger test asks whether sex has any effect on the relationship (neither intercept nor slope differs):

\[H_0: \beta_2 = \beta_3 = 0 \quad \text{(the two lines are identical)}\] \[H_A: \text{At least one } \neq 0 \quad \text{(the lines differ in intercept and/or slope)}\]

This is a partial F-test comparing the interaction model to a model with only sleep:

mod_sleep_only <- lm(physhlth_days ~ sleep_hrs, data = brfss_ci)

anova(mod_sleep_only, mod_int) |>
  tidy() |>
  mutate(across(where(is.numeric), \(x) round(x, 4))) |>
  kable(caption = "Test for Coincidence: Does Sex Affect the Sleep-Physical Health Relationship at All?") |>
  kable_styling(bootstrap_options = c("striped", "hover"), full_width = FALSE)

Test for Coincidence: Does Sex Affect the Sleep-Physical Health Relationship at All?
term	df.residual	rss	df	sumsq	statistic	p.value
physhlth_days ~ sleep_hrs	4998	313611.5	NA	NA	NA	NA
physhlth_days ~ sleep_hrs * sex	4996	313284.5	2	327.0047	2.6074	0.0738

Interpretation: The partial F-test evaluates whether sex contributes anything to the model (either through a different intercept, a different slope, or both). The F-statistic is 2.61 with a p-value of 0.0738. Since p > 0.05, we fail to reject the null hypothesis that the two sex-specific regression lines are identical. This means that, in this unadjusted model, sex does not significantly modify either the baseline level or the slope of the sleep-physical health relationship. The test for coincidence is more comprehensive than the parallelism test (Section 3.3), because it simultaneously evaluates both the intercept difference (\(\beta_2\)) and the slope difference (\(\beta_3\)).

3.5 Verifying Equivalence with Stratified Analysis

The interaction model recovers exactly the same stratum-specific equations as the stratified analysis:

tribble(
  ~Method, ~Stratum, ~Intercept, ~Slope,
  "Stratified", "Male",
    round(coef(mod_male)[1], 3), round(coef(mod_male)[2], 3),
  "Stratified", "Female",
    round(coef(mod_female)[1], 3), round(coef(mod_female)[2], 3),
  "Interaction model", "Male",
    round(b_int[1], 3), round(b_int[2], 3),
  "Interaction model", "Female",
    round(b_int[1] + b_int[3], 3), round(b_int[2] + b_int[4], 3)
) |>
  kable(caption = "Verification: Stratified Analysis = Interaction Model") |>
  kable_styling(bootstrap_options = c("striped", "hover"), full_width = FALSE)

Verification: Stratified Analysis = Interaction Model
Method	Stratum	Intercept	Slope
Stratified	Male	6.839	-0.524
Stratified	Female	6.260	-0.375
Interaction model	Male	6.839	-0.524
Interaction model	Female	6.260	-0.375

The interaction model and the stratified analysis give identical stratum-specific equations. The advantage of the interaction model is that it provides a formal test of the difference between slopes and allows adjustment for additional covariates.

pred_int <- ggpredict(mod_int, terms = c("sleep_hrs [3:12]", "sex"))

ggplot(pred_int, aes(x = x, y = predicted, color = group, fill = group)) +
  geom_line(linewidth = 1.2) +
  geom_ribbon(aes(ymin = conf.low, ymax = conf.high), alpha = 0.12, color = NA) +
  labs(
    title    = "Predicted Physical Health Days by Sleep Duration and Sex",
    subtitle = "From interaction model: sleep_hrs * sex",
    x        = "Sleep Hours per Night",
    y        = "Predicted Physically Unhealthy Days",
    color    = "Sex",
    fill     = "Sex"
  ) +
  theme_minimal(base_size = 13) +
  scale_color_brewer(palette = "Set1")

Interaction Model: Predicted Physical Health Days by Sleep and Sex

Interpretation: The predicted values plot confirms what the formal test indicated: the regression lines for males and females are nearly parallel, with broadly overlapping confidence bands. Both groups show a decline in predicted physically unhealthy days as sleep duration increases from 3 to 12 hours. The female line sits slightly below the male line across most of the sleep range, but the difference is modest and not statistically significant. This visualization reinforces the conclusion that sex does not meaningfully modify the sleep-physical health association in this sample.

4. Interactions with Categorical Variables (k > 2)

4.1 Multiple Interaction Terms

When the modifier has \(k > 2\) categories (such as education with 4 levels), the interaction between a continuous exposure and the categorical modifier requires \(k - 1\) interaction terms, one for each dummy variable.

For sleep \(\times\) education (with “Less than HS” as reference):

\[Y = \beta_0 + \beta_1(\text{Sleep}) + \beta_2(\text{HS grad}) + \beta_3(\text{Some college}) + \beta_4(\text{College grad}) + \beta_5(\text{Sleep} \times \text{HS grad}) + \beta_6(\text{Sleep} \times \text{Some college}) + \beta_7(\text{Sleep} \times \text{College grad}) + \varepsilon\]

Each interaction coefficient \(\beta_5, \beta_6, \beta_7\) represents the difference in the sleep slope between that education group and the reference group.

mod_int_educ <- lm(physhlth_days ~ sleep_hrs * education, data = brfss_ci)

tidy(mod_int_educ, conf.int = TRUE) |>
  mutate(across(where(is.numeric), \(x) round(x, 4))) |>
  kable(
    caption = "Interaction Model: Sleep * Education → Physical Health Days",
    col.names = c("Term", "Estimate", "SE", "t", "p-value", "CI Lower", "CI Upper")
  ) |>
  kable_styling(bootstrap_options = c("striped", "hover"), full_width = FALSE)

Interaction Model: Sleep * Education → Physical Health Days
Term	Estimate	SE	t	p-value	CI Lower	CI Upper
(Intercept)	11.9692	2.0406	5.8655	0.0000	7.9687	15.9697
sleep_hrs	-0.7877	0.2919	-2.6986	0.0070	-1.3599	-0.2155
educationHS graduate	-4.2823	2.3006	-1.8613	0.0628	-8.7925	0.2280
educationSome college	-5.1543	2.3032	-2.2379	0.0253	-9.6695	-0.6391
educationCollege graduate	-8.8728	2.3020	-3.8545	0.0001	-13.3857	-4.3600
sleep_hrs:educationHS graduate	0.2713	0.3274	0.8287	0.4073	-0.3705	0.9131
sleep_hrs:educationSome college	0.3386	0.3280	1.0322	0.3020	-0.3045	0.9816
sleep_hrs:educationCollege graduate	0.6889	0.3268	2.1079	0.0351	0.0482	1.3297

Interpretation: This model produces three interaction terms, one for each education level compared to the reference category (“Less than HS”). The reference group slope for sleep is -0.788, meaning that among those with less than a high school education, each additional hour of sleep is associated with approximately 0.79 fewer physically unhealthy days. The interaction terms for “HS graduate” (0.271) and “Some college” (0.339) are positive but not individually significant (p > 0.05), indicating their slopes do not significantly differ from the reference group. However, the interaction term for “College graduate” (0.689, p = 0.035) is statistically significant, suggesting that the protective effect of sleep is notably weaker among college graduates (slope \(\approx\) -0.099) compared to those with less than a high school education. This pattern makes substantive sense: individuals with lower education may experience more physically demanding occupations or fewer health resources, amplifying the consequences of insufficient sleep.

4.2 Testing the Overall Interaction with a Partial F-Test

Individual t-tests for each interaction term may be non-significant, yet the interaction as a whole could be meaningful. To test whether sleep \(\times\) education is significant overall, we use a partial F-test comparing the model with and without the interaction terms:

\[H_0: \beta_5 = \beta_6 = \beta_7 = 0 \quad \text{(the sleep slope is the same across all education levels)}\]

mod_no_int_educ <- lm(physhlth_days ~ sleep_hrs + education, data = brfss_ci)

anova(mod_no_int_educ, mod_int_educ) |>
  tidy() |>
  mutate(across(where(is.numeric), \(x) round(x, 4))) |>
  kable(caption = "Partial F-test: Is the Sleep x Education Interaction Significant?") |>
  kable_styling(bootstrap_options = c("striped", "hover"), full_width = FALSE)

Partial F-test: Is the Sleep x Education Interaction Significant?
term	df.residual	rss	df	sumsq	statistic	p.value
physhlth_days ~ sleep_hrs + education	4995	308368.1	NA	NA	NA	NA
physhlth_days ~ sleep_hrs * education	4992	307957.2	3	410.9464	2.2205	0.0836

Interpretation: The partial F-test evaluates all three interaction terms simultaneously (\(H_0: \beta_5 = \beta_6 = \beta_7 = 0\)). The F-statistic is 2.22 with a p-value of 0.0836. At \(\alpha = 0.05\), the overall interaction is not statistically significant (p > 0.05), meaning we do not have sufficient evidence to conclude that the sleep slope differs across education levels as a whole. Note the discrepancy: the individual t-test for the “College graduate” interaction term was significant (p = 0.035), but the omnibus F-test was not. This can occur when one group drives the difference while others do not. In practice, if there is strong a priori reason to expect education-based effect modification, this borderline result may warrant further investigation in a larger sample.

pred_int_educ <- ggpredict(mod_int_educ, terms = c("sleep_hrs [3:12]", "education"))

ggplot(pred_int_educ, aes(x = x, y = predicted, color = group, fill = group)) +
  geom_line(linewidth = 1.1) +
  geom_ribbon(aes(ymin = conf.low, ymax = conf.high), alpha = 0.1, color = NA) +
  labs(
    title    = "Predicted Physical Health Days by Sleep and Education",
    subtitle = "Are the lines parallel? Non-parallel lines indicate interaction.",
    x        = "Sleep Hours per Night",
    y        = "Predicted Physically Unhealthy Days",
    color    = "Education",
    fill     = "Education"
  ) +
  theme_minimal(base_size = 13) +
  scale_color_brewer(palette = "Set2")

Interaction: Sleep Effect by Education Level

Interpretation: The plot shows four regression lines, one for each education level. The “Less than HS” group has the steepest downward slope, while the “College graduate” line is noticeably flatter. The “HS graduate” and “Some college” lines fall in between. This visual pattern aligns with the regression results: the protective association between sleep and physical health is strongest among those with the least education and weakest among college graduates. At low sleep durations (3-4 hours), the education groups show the widest disparity in predicted unhealthy days, with the “Less than HS” group predicted to have the most. As sleep increases, the lines converge somewhat. Despite the borderline omnibus F-test, the visual evidence of non-parallel lines, particularly for college graduates, reinforces the potential for education-based effect modification.

5. Continuous × Continuous Interactions

5.1 When Both Variables Are Continuous

Interaction is not limited to categorical modifiers. We can also examine whether the effect of one continuous predictor changes across values of another continuous predictor. For example, does the sleep-physical health association differ by age?

\[Y = \beta_0 + \beta_1(\text{Sleep}) + \beta_2(\text{Age}) + \beta_3(\text{Sleep} \times \text{Age}) + \varepsilon\]

Here, \(\beta_3\) tells us how the slope of sleep changes for each one-year increase in age:

The slope of sleep for a person of age \(a\) is: \(\beta_1 + \beta_3 \cdot a\)
If \(\beta_3 < 0\), the protective effect of sleep weakens (becomes less negative) as age increases
If \(\beta_3 > 0\), the protective effect of sleep strengthens as age increases

mod_int_cont <- lm(physhlth_days ~ sleep_hrs * age, data = brfss_ci)

tidy(mod_int_cont, conf.int = TRUE) |>
  mutate(across(where(is.numeric), \(x) round(x, 4))) |>
  kable(
    caption = "Continuous Interaction: Sleep * Age → Physical Health Days",
    col.names = c("Term", "Estimate", "SE", "t", "p-value", "CI Lower", "CI Upper")
  ) |>
  kable_styling(bootstrap_options = c("striped", "hover"), full_width = FALSE)

Continuous Interaction: Sleep * Age → Physical Health Days
Term	Estimate	SE	t	p-value	CI Lower	CI Upper
(Intercept)	9.6494	1.9752	4.8852	0.0000	5.7771	13.5217
sleep_hrs	-1.2597	0.2758	-4.5668	0.0000	-1.8005	-0.7189
age	-0.0490	0.0351	-1.3948	0.1631	-0.1178	0.0199
sleep_hrs:age	0.0138	0.0048	2.8444	0.0045	0.0043	0.0232

Interpretation: The interaction term sleep_hrs:age has a coefficient of 0.0138 (p = 0.0045), which is statistically significant. This positive coefficient means that the protective effect of sleep weakens as age increases. Specifically, for each one-year increase in age, the slope of sleep becomes 0.014 units less negative (closer to zero). To illustrate: the predicted slope of sleep for a 30-year-old is \(-1.26 + 0.014 \times 30 = -0.847\), while for a 70-year-old it is \(-1.26 + 0.014 \times 70 = -0.297\). In substantive terms, sleep appears to have a stronger protective association with physical health among younger adults than among older adults.

5.2 Visualizing Continuous Interactions

With continuous modifiers, we visualize the interaction by plotting the predicted relationship at specific values of the modifier (e.g., age = 30, 50, 70):

pred_cont <- ggpredict(mod_int_cont, terms = c("sleep_hrs [3:12]", "age [30, 50, 70]"))

ggplot(pred_cont, aes(x = x, y = predicted, color = group, fill = group)) +
  geom_line(linewidth = 1.2) +
  geom_ribbon(aes(ymin = conf.low, ymax = conf.high), alpha = 0.12, color = NA) +
  labs(
    title    = "Predicted Physical Health Days by Sleep, at Different Ages",
    subtitle = "Does the sleep-health relationship change with age?",
    x        = "Sleep Hours per Night",
    y        = "Predicted Physically Unhealthy Days",
    color    = "Age",
    fill     = "Age"
  ) +
  theme_minimal(base_size = 13) +
  scale_color_brewer(palette = "Dark2")

Interaction: Sleep Effect at Different Ages

If the lines are approximately parallel, age does not modify the sleep effect. If they fan out or converge, the interaction is meaningful.

Interpretation: The three age-specific lines clearly fan out at lower sleep durations and converge at higher sleep durations, confirming the statistically significant interaction. The line for 30-year-olds has the steepest negative slope, indicating that younger adults experience the largest reduction in physically unhealthy days per additional hour of sleep. The line for 70-year-olds is notably flatter, suggesting that sleep duration has a weaker association with physical health among older adults. One possible explanation is that older adults accumulate physically unhealthy days from age-related conditions (e.g., arthritis, cardiovascular disease) that are less modifiable by sleep alone, whereas younger adults’ physical health may be more responsive to behavioral factors like sleep.

6. Confounding

6.1 What Is Confounding?

Confounding exists when the estimated association between an exposure and an outcome is distorted because of a third variable that is related to both. When confounding is present, ignoring the confounder leads to a biased estimate of the exposure effect.

For a variable to be a confounder, it must satisfy three conditions:

Associated with the exposure (sleep hours)
Associated with the outcome (physical health days), even in the absence of the exposure
Not on the causal pathway from exposure to outcome (i.e., not a mediator)

Confounding Structure: The Confounder Affects Both Exposure and Outcome

Age is a classic confounder of the sleep-physical health relationship:

Older adults tend to sleep fewer hours (associated with exposure)
Older adults have more physically unhealthy days (associated with outcome)
Age is not caused by sleep or physical health (not on the causal pathway)

6.2 Detecting Confounding: The 10% Change-in-Estimate Rule

The standard approach in epidemiology is to compare the crude (unadjusted) estimate to the adjusted estimate:

Fit the crude model: \(Y = \beta_0 + \beta_1 X_1\)
Fit the adjusted model: \(Y = \beta_0^* + \beta_1^* X_1 + \beta_2^* X_2\)
Compute the percent change: \(\frac{|\beta_1 - \beta_1^*|}{|\beta_1|} \times 100\%\)
If the change exceeds 10%, \(X_2\) is a confounder and should be included in the model

Important: This is a rule of thumb, not a rigid cutoff. The 10% threshold is conventional, not absolute. Some researchers use 5% or 15% depending on context.

6.3 Confounding Example: Is Age a Confounder?

# Crude model: sleep → physical health days
mod_crude <- lm(physhlth_days ~ sleep_hrs, data = brfss_ci)

# Adjusted model: adding age
mod_adj_age <- lm(physhlth_days ~ sleep_hrs + age, data = brfss_ci)

# Extract sleep coefficients
b_crude <- coef(mod_crude)["sleep_hrs"]
b_adj   <- coef(mod_adj_age)["sleep_hrs"]
pct_change <- abs(b_crude - b_adj) / abs(b_crude) * 100

tribble(
  ~Model, ~`Sleep β`, ~SE, ~`95% CI`,
  "Crude (sleep only)",
    round(b_crude, 4),
    round(tidy(mod_crude) |> filter(term == "sleep_hrs") |> pull(std.error), 4),
    paste0("(", round(confint(mod_crude)["sleep_hrs", 1], 4), ", ",
           round(confint(mod_crude)["sleep_hrs", 2], 4), ")"),
  "Adjusted (+ age)",
    round(b_adj, 4),
    round(tidy(mod_adj_age) |> filter(term == "sleep_hrs") |> pull(std.error), 4),
    paste0("(", round(confint(mod_adj_age)["sleep_hrs", 1], 4), ", ",
           round(confint(mod_adj_age)["sleep_hrs", 2], 4), ")")
) |>
  kable(caption = "Confounding Assessment: Does Age Confound the Sleep-Physical Health Association?") |>
  kable_styling(bootstrap_options = c("striped", "hover"), full_width = FALSE)

Confounding Assessment: Does Age Confound the Sleep-Physical Health Association?
Model	Sleep β	SE	95% CI
Crude (sleep only)	-0.4381	0.0827	(-0.6002, -0.2761)
Adjusted (+ age)	-0.5112	0.0828	(-0.6736, -0.3489)

cat("\nPercent change in sleep coefficient when adding age:", round(pct_change, 1), "%\n")

## 
## Percent change in sleep coefficient when adding age: 16.7 %

cat("10% rule threshold: |", round(b_adj, 4), "| should fall within (",
    round(b_crude * 0.9, 4), ",", round(b_crude * 1.1, 4), ")\n")

## 10% rule threshold: | -0.5112 | should fall within ( -0.3943 , -0.4819 )

Interpretation: The crude (unadjusted) estimate for sleep is -0.4381, meaning that without controlling for any covariates, each additional hour of sleep is associated with approximately 0.44 fewer physically unhealthy days. After adjusting for age, the sleep coefficient changes to -0.5112, a 16.7% change. Because this exceeds the 10% threshold, age is a confounder of the sleep-physical health association. Notice that the adjusted estimate is more negative than the crude estimate. This means that confounding by age was attenuating the sleep effect (making it look weaker than it truly is). This occurs because age is positively associated with both the exposure (older adults sleep less) and the outcome (older adults have more unhealthy days), creating a positive confounding bias that partially masks the negative sleep-health association. After removing this bias by adjusting for age, the estimated protective effect of sleep becomes stronger.

6.4 Systematic Confounding Assessment

In practice, we assess multiple potential confounders. Let’s evaluate age, exercise, education, and general health:

# Crude model
b_crude_val <- coef(mod_crude)["sleep_hrs"]

# One-at-a-time adjusted models
confounders <- list(
  "Age"             = lm(physhlth_days ~ sleep_hrs + age, data = brfss_ci),
  "Exercise"        = lm(physhlth_days ~ sleep_hrs + exercise, data = brfss_ci),
  "Education"       = lm(physhlth_days ~ sleep_hrs + education, data = brfss_ci),
  "General health"  = lm(physhlth_days ~ sleep_hrs + gen_health, data = brfss_ci),
  "Sex"             = lm(physhlth_days ~ sleep_hrs + sex, data = brfss_ci),
  "Income"          = lm(physhlth_days ~ sleep_hrs + income_cat, data = brfss_ci)
)

conf_table <- map_dfr(names(confounders), \(name) {
  mod <- confounders[[name]]
  b_adj_val <- coef(mod)["sleep_hrs"]
  tibble(
    Covariate = name,
    `Crude β (sleep)` = round(b_crude_val, 4),
    `Adjusted β (sleep)` = round(b_adj_val, 4),
    `% Change` = round(abs(b_crude_val - b_adj_val) / abs(b_crude_val) * 100, 1),
    Confounder = ifelse(abs(b_crude_val - b_adj_val) / abs(b_crude_val) * 100 > 10,
                        "Yes (>10%)", "No")
  )
})

conf_table |>
  kable(caption = "Systematic Confounding Assessment: One-at-a-Time Addition") |>
  kable_styling(bootstrap_options = c("striped", "hover"), full_width = FALSE) |>
  column_spec(5, bold = TRUE)

Systematic Confounding Assessment: One-at-a-Time Addition
Covariate	Crude β (sleep)	Adjusted β (sleep)	% Change	Confounder
Age	-0.4381	-0.5112	16.7	Yes (>10%)
Exercise	-0.4381	-0.4083	6.8	No
Education	-0.4381	-0.3872	11.6	Yes (>10%)
General health	-0.4381	-0.1948	55.5	Yes (>10%)
Sex	-0.4381	-0.4434	1.2	No
Income	-0.4381	-0.3739	14.7	Yes (>10%)

Interpretation: The systematic assessment reveals four variables that meet the 10% change-in-estimate criterion for confounding:

Age (16.7% change): Adjusting for age made the sleep effect stronger (more negative), consistent with positive confounding, as discussed above.
Education (11.6% change): Adjusting for education slightly attenuated the sleep coefficient. This suggests that part of the crude sleep-health association may be explained by education-related differences in both sleep habits and health outcomes.
General health (55.5% change): This variable produced the largest change by far, cutting the sleep coefficient by more than half. However, as we discuss in Section 6.5, this extreme attenuation raises the question of whether general health is a confounder or a mediator.
Income (14.7% change): Income confounds the association, likely because higher-income individuals tend to sleep more regularly and have better access to healthcare, both of which relate to physical health.

Two variables did not meet the confounding threshold: exercise (6.8%) and sex (1.2%). Their inclusion in the model would not meaningfully alter the estimated sleep effect.

6.5 Important Caveats About Confounding Assessment

Identifying candidate confounders is not purely a statistical exercise. The three conditions for confounding (associated with exposure, associated with outcome, not on the causal pathway) require substantive knowledge from the literature and your understanding of the causal structure.

Caution about missing data: When you add a covariate to the model, observations with missing values on that covariate are dropped. This changes the analytic sample, which could alter \(\hat{\beta}\) for reasons unrelated to confounding. Always ensure that the crude and adjusted models are fit on the same set of observations.

# Verify our dataset has no missing values (we filtered earlier)
cat("Missing values in analytic dataset:", sum(!complete.cases(brfss_ci)), "\n")

## Missing values in analytic dataset: 0

cat("All models are fit on the same n =", nrow(brfss_ci), "observations.\n")

## All models are fit on the same n = 5000 observations.

Confounders vs. mediators: A variable on the causal pathway between exposure and outcome is a mediator, not a confounder. For example, if sleep affects exercise habits, which in turn affect physical health, then exercise is a mediator. Adjusting for a mediator would attenuate the exposure effect, but this attenuation is not bias; it reflects the removal of an indirect pathway. Whether to adjust depends on your research question.

General health status is a tricky case. It could be a confounder (poor overall health causes both poor sleep and more physically unhealthy days) or a mediator (poor sleep leads to poor general health, which leads to more physically unhealthy days). The direction depends on the assumed causal structure and should be guided by subject-matter knowledge.

7. Order of Operations: Interaction Before Confounding

7.1 Why Interaction Comes First

The standard epidemiological approach to model building follows this order:

Assess interaction first — test whether the exposure-outcome relationship differs across subgroups
If interaction is present — stratify by the modifier and assess confounding within each stratum
If no interaction — assess confounding in the pooled (unstratified) data

The reason for this order is that confounding assessment assumes a single exposure effect. If the effect actually varies across subgroups (interaction), then a single “adjusted” coefficient is misleading. Reporting an average effect when the true effect differs by sex, for example, could mask important public health heterogeneity.

7.2 A Practical Workflow

Workflow: Interaction Before Confounding
Step	Action
1	Specify the exposure-outcome relationship of interest
2	Identify potential effect modifiers (from literature, biological plausibility)
3	Test for interaction using interaction terms or stratified analysis
4	If interaction is present: report stratum-specific effects; assess confounding within strata
5	If no interaction: assess confounding in the full sample using the 10% change-in-estimate rule

7.3 Putting It All Together

Let’s build a final model that accounts for both potential interaction and confounding in our BRFSS data:

# Step 1: Test for interaction between sleep and sex
mod_final_int <- lm(physhlth_days ~ sleep_hrs * sex + age + education + exercise,
                    data = brfss_ci)

# Check interaction term
int_pval <- tidy(mod_final_int) |>
  filter(term == "sleep_hrs:sexFemale") |>
  pull(p.value)

cat("Interaction p-value (sleep x sex):", round(int_pval, 4), "\n")

## Interaction p-value (sleep x sex): 0.0417

Note on the interaction result: In this adjusted model, the sleep \(\times\) sex interaction p-value is 0.0417. Depending on the significance level used, this result may be considered borderline. In the unadjusted model (Section 3.3), the interaction was clearly non-significant (p = 0.37). The change in the p-value after adjusting for confounders illustrates an important lesson: interaction results can shift when covariates are added, because confounders can mask or amplify subgroup differences. For this lecture, we proceed with the additive (no interaction) model, but in practice a borderline interaction like this warrants careful consideration and possibly stratified reporting.

# Step 2: If interaction is not significant, fit additive model with confounders
mod_final <- lm(physhlth_days ~ sleep_hrs + sex + age + education + exercise,
                data = brfss_ci)

tidy(mod_final, conf.int = TRUE) |>
  mutate(across(where(is.numeric), \(x) round(x, 4))) |>
  kable(
    caption = "Final Model: Sleep → Physical Health Days, Adjusted for Confounders",
    col.names = c("Term", "Estimate", "SE", "t", "p-value", "CI Lower", "CI Upper")
  ) |>
  kable_styling(bootstrap_options = c("striped", "hover"), full_width = FALSE)

Final Model: Sleep → Physical Health Days, Adjusted for Confounders
Term	Estimate	SE	t	p-value	CI Lower	CI Upper
(Intercept)	9.9181	0.7955	12.4676	0.0000	8.3585	11.4776
sleep_hrs	-0.4318	0.0805	-5.3662	0.0000	-0.5895	-0.2740
sexFemale	0.2862	0.2178	1.3140	0.1889	-0.1408	0.7132
age	0.0355	0.0065	5.4878	0.0000	0.0228	0.0482
educationHS graduate	-1.9241	0.5085	-3.7838	0.0002	-2.9209	-0.9272
educationSome college	-2.0609	0.5065	-4.0688	0.0000	-3.0539	-1.0679
educationCollege graduate	-2.9504	0.4955	-5.9548	0.0000	-3.9217	-1.9791
exerciseYes	-4.1551	0.2711	-15.3273	0.0000	-4.6866	-3.6236

Interpretation of the final model: After adjusting for sex, age, education, and exercise, each additional hour of sleep is associated with 0.43 fewer physically unhealthy days (95% CI: -0.59 to -0.27, p < 0.001). Additional notable findings from the adjusted model:

Exercise has the strongest association: individuals who exercised in the past 30 days reported approximately 4.2 fewer physically unhealthy days compared to non-exercisers, holding all else constant.
Age has a small positive association: each additional year of age is associated with 0.036 more unhealthy days.
Education shows a graded protective pattern: compared to those with less than a high school education, college graduates report approximately 3 fewer unhealthy days.
Sex is not significantly associated with physical health days after adjustment (p = 0.189).

b_final <- coef(mod_final)["sleep_hrs"]

tribble(
  ~Model, ~`Sleep β`, ~`% Change from Crude`,
  "Crude", round(b_crude_val, 4), "—",
  "Adjusted (age only)", round(coef(mod_adj_age)["sleep_hrs"], 4),
    paste0(round(abs(b_crude_val - coef(mod_adj_age)["sleep_hrs"]) / abs(b_crude_val) * 100, 1), "%"),
  "Final (all confounders)", round(b_final, 4),
    paste0(round(abs(b_crude_val - b_final) / abs(b_crude_val) * 100, 1), "%")
) |>
  kable(caption = "Sleep Coefficient: Crude vs. Progressively Adjusted Models") |>
  kable_styling(bootstrap_options = c("striped", "hover"), full_width = FALSE)

Sleep Coefficient: Crude vs. Progressively Adjusted Models
Model	Sleep β	% Change from Crude
Crude	-0.4381	—
Adjusted (age only)	-0.5112	16.7%
Final (all confounders)	-0.4318	1.5%

Interpretation: This table tracks how the sleep coefficient changes as we progressively add confounders. The crude estimate (-0.4381) represents the unadjusted association. Adding age alone moved the estimate to -0.5112 (a 16.7% change), confirming age as an important confounder. In the final model with all confounders (age, sex, education, and exercise), the sleep coefficient is -0.4318, only a 1.5% change from the crude estimate. This small overall change may seem surprising given that individual confounders produced larger shifts. The reason is that the confounders act in opposing directions: age adjustment makes the sleep effect stronger (more negative), while education adjustment makes it weaker (less negative). When all confounders are included simultaneously, these opposing adjustments partially cancel out. This underscores why it is important to assess confounders both individually and jointly.

Summary of Key Concepts

Concept	Key Point
Predictive vs. associative	Confounding and interaction matter for associative (etiologic) modeling
Interaction	The effect of the exposure differs across levels of a modifier
Stratified analysis	Fit separate models per stratum; informative but no formal test
Interaction term	\(X_1 \times X_2\) in the model; t-test on \(\beta_3\) tests parallelism
`:` vs. `*` in R	`:` = interaction only; `*` = main effects + interaction
Must include main effects	Never include \(X_1 X_2\) without also including \(X_1\) and \(X_2\)
Partial F-test for interaction	Tests all \(k - 1\) interaction terms simultaneously
Continuous \(\times\) continuous	The slope of one predictor changes linearly with the other
Confounding	A third variable distorts the exposure-outcome association
Three conditions	Associated with exposure, associated with outcome, not on causal pathway
10% change-in-estimate	Compare crude and adjusted \(\hat{\beta}\); >10% change = confounder
Interaction before confounding	Assess interaction first; if present, stratify before assessing confounding
Mediator vs. confounder	Adjusting for a mediator removes an indirect effect, not bias

Part 2: In-Class Lab Activity

EPI 553 — Confounding and Interactions Lab Due: End of class, March 24, 2026

Instructions

In this lab, you will assess interaction and confounding in the BRFSS 2020 dataset. Work through each task systematically. You may discuss concepts with classmates, but your written answers and R code must be your own.

Submission: Knit your .Rmd to HTML and upload to Brightspace by end of class.

Data for the Lab

Use the saved analytic dataset from today’s lecture. It contains 5,000 randomly sampled BRFSS 2020 respondents.

Variable	Description	Type
`physhlth_days`	Physically unhealthy days in past 30	Continuous (0–30)
`sleep_hrs`	Sleep hours per night	Continuous (1–14)
`menthlth_days`	Mentally unhealthy days in past 30	Continuous (0–30)
`age`	Age in years (capped at 80)	Continuous
`sex`	Sex (Male/Female)	Factor
`education`	Education level (4 categories)	Factor
`exercise`	Any physical activity (Yes/No)	Factor
`gen_health`	General health status (5 categories)	Factor
`income_cat`	Household income (1–8 ordinal)	Numeric

# Load the dataset saved in Part 1
brfss_ci <- readRDS(
  "C:/Users/God's Icon/Desktop/553-Statistical Inference/553 Lab Codes/data/Confounding_Interaction/brfss_ci_2020.rds"
)

Task 1: Exploratory Data Analysis (15 points)

1a. Scatterplot of `physhlth_days` vs. `age`, colored by `exercise` (5 pts)

ggplot(brfss_ci, aes(x = age, y = physhlth_days, color = exercise)) +
  geom_jitter(alpha = 0.08, width = 0.5, height = 0.3, size = 0.8) +
  geom_smooth(method = "lm", se = TRUE, linewidth = 1.3) +
  scale_color_brewer(palette = "Set1") +
  labs(
    title    = "Physically Unhealthy Days vs. Age, by Exercise Status",
    subtitle = "Separate regression lines show whether the age-health slope differs by exercise",
    x        = "Age (years)",
    y        = "Physically Unhealthy Days (Past 30)",
    color    = "Any Exercise\n(Past 30 Days)"
  ) +
  theme_minimal(base_size = 13)

Physical Health Days vs. Age by Exercise Status

Interpretation: The scatterplot reveals several notable patterns. First, there is a positive association between age and physically unhealthy days for both groups, meaning older individuals tend to report more unhealthy days regardless of exercise status. Second, non-exercisers (red line) consistently report more physically unhealthy days than exercisers (blue line) across all ages, suggesting a strong protective effect of exercise. Third, and most important for interaction assessment, the two regression lines appear to have slightly different slopes: the non-exerciser line rises more steeply with age than the exerciser line. This visual divergence hints that the age-physical health relationship may be stronger among non-exercisers, which would indicate an interaction. We will test this formally in Task 2 and Task 3.

1b. Mean `physhlth_days` by sex and exercise (5 pts)

brfss_ci |>
  group_by(sex, exercise) |>
  summarise(
    n              = n(),
    mean_physhlth  = round(mean(physhlth_days), 2),
    sd_physhlth    = round(sd(physhlth_days), 2),
    .groups        = "drop"
  ) |>
  mutate(`Mean (SD)` = paste0(mean_physhlth, " (", sd_physhlth, ")")) |>
  select(Sex = sex, Exercise = exercise, N = n, `Mean (SD)`) |>
  kable(
    caption = "Mean Physically Unhealthy Days by Sex and Exercise Status",
    align   = c("l", "l", "r", "r")
  ) |>
  kable_styling(bootstrap_options = c("striped", "hover"), full_width = FALSE)

Mean Physically Unhealthy Days by Sex and Exercise Status
Sex	Exercise	N	Mean (SD)
Male	No	446	6.64 (10.98)
Male	Yes	1845	2.32 (6.49)
Female	No	625	7.43 (11.46)
Female	Yes	2084	2.45 (6.32)

Interpretation: The table reveals a consistent pattern: exercisers report fewer physically unhealthy days than non-exercisers within both sexes. Among males, exercisers report 2.32 mean unhealthy days versus 6.64 for non-exercisers: a difference of approximately 4.32 days. Among females, exercisers report 2.45 versus 7.43 for non-exercisers: a difference of approximately 4.98 days. The exercise benefit appears somewhat larger for females than for males (larger absolute difference), suggesting that the association between exercise and physical health days may differ by sex that is, sex might modify the exercise-health relationship. Whether this difference is statistically meaningful requires a formal interaction test.

1c. Scatterplot of `physhlth_days` vs. `sleep_hrs`, faceted by education (5 pts)

ggplot(brfss_ci, aes(x = sleep_hrs, y = physhlth_days)) +
  geom_jitter(alpha = 0.1, width = 0.25, height = 0.25, size = 0.7,
              color = "steelblue") +
  geom_smooth(method = "lm", se = TRUE, linewidth = 1.2, color = "firebrick") +
  facet_wrap(~ education, nrow = 2) +
  labs(
    title    = "Physically Unhealthy Days vs. Sleep Hours, by Education Level",
    subtitle = "Are the slopes similar across education groups?",
    x        = "Sleep Hours per Night",
    y        = "Physically Unhealthy Days (Past 30)"
  ) +
  theme_minimal(base_size = 12) +
  theme(strip.text = element_text(face = "bold"))

Physical Health Days vs. Sleep Hours, Faceted by Education

Interpretation: Across all four education panels, there is a consistent negative relationship between sleep hours and physically unhealthy days: more sleep is associated with fewer unhealthy days. However, the slopes visually differ across education groups. The “Less than HS” panel shows the steepest negative slope, while the “College graduate” panel has a notably flatter line, suggesting a weaker protective association of sleep among higher-educated individuals. This is consistent with the interaction results from Part 1, where the college graduate sleep interaction term was statistically significant. The “HS graduate” and “Some college” panels display intermediate slopes. These visual differences are suggestive of education as an effect modifier of the sleep-physical health relationship, though the overall partial F-test was borderline (as shown in Part 1). Additionally, the “Less than HS” group appears to have a wider spread of unhealthy days at low sleep values, reflecting greater variability in that group.

Task 2: Stratified Analysis for Interaction (15 points)

2a. Separate regression models of `physhlth_days ~ age` by exercise status (5 pts)

# Fit separate models for exercisers and non-exercisers
mod_ex_yes <- lm(physhlth_days ~ age, data = brfss_ci |> filter(exercise == "Yes"))
mod_ex_no  <- lm(physhlth_days ~ age, data = brfss_ci |> filter(exercise == "No"))

# Compile results table
bind_rows(
  tidy(mod_ex_yes, conf.int = TRUE) |> mutate(Stratum = "Exercise (Yes)"),
  tidy(mod_ex_no,  conf.int = TRUE) |> mutate(Stratum = "Non-Exercise (No)")
) |>
  filter(term == "age") |>
  select(Stratum, Slope = estimate, SE = std.error,
         `CI Lower` = conf.low, `CI Upper` = conf.high, `p-value` = p.value) |>
  mutate(across(where(is.numeric), \(x) round(x, 4))) |>
  kable(
    caption = "Table 2a. Stratified Regression: Age → Physical Health Days, by Exercise Status"
  ) |>
  kable_styling(bootstrap_options = c("striped", "hover"), full_width = FALSE)

Table 2a. Stratified Regression: Age → Physical Health Days, by Exercise Status
Stratum	Slope	SE	CI Lower	CI Upper	p-value
Exercise (Yes)	0.0192	0.0060	0.0075	0.0309	0.0013
Non-Exercise (No)	0.0745	0.0212	0.0328	0.1161	0.0005

Interpretation: Among exercisers, each additional year of age is associated with 0.0192 more physically unhealthy days (95% CI: 0.0075 to 0.0309). Among non-exercisers, each additional year of age is associated with 0.0745 more unhealthy days (95% CI: 0.0328 to 0.1161). Both slopes are statistically significant (p < 0.05). Notably, the slope for non-exercisers is approximately 3.9 times larger than for exercisers, suggesting that the age-related accumulation of physically unhealthy days is steeper among those who do not exercise. This difference in slopes is the key signal for potential interaction between age and exercise.

2b. Plot with two fitted regression lines overlaid on the data (5 pts)

ggplot(brfss_ci, aes(x = age, y = physhlth_days, color = exercise)) +
  geom_jitter(alpha = 0.08, width = 0.4, height = 0.3, size = 0.8) +
  geom_smooth(method = "lm", se = TRUE, linewidth = 1.3) +
  scale_color_brewer(palette = "Set1") +
  labs(
    title    = "Physically Unhealthy Days vs. Age, Stratified by Exercise",
    subtitle = "Non-parallel lines would indicate an age × exercise interaction",
    x        = "Age (years)",
    y        = "Physically Unhealthy Days (Past 30)",
    color    = "Exercise Status"
  ) +
  theme_minimal(base_size = 13)

Age vs. Physical Health Days: Separate Lines by Exercise Status

Interpretation: The two fitted regression lines are not parallel: the non-exerciser line (red) has a noticeably steeper positive slope than the exerciser line (blue). This visual divergence suggests that the association between age and physically unhealthy days differs by exercise status, age may be a stronger risk factor for unhealthy days among non-exercisers than among exercisers. The confidence bands also diverge at higher ages, reflecting greater uncertainty in predicted values among older non-exercisers. This pattern is consistent with a biological explanation: regular physical activity may buffer some of the age-related decline in physical health, meaning that the age-related penalty in physical health days is attenuated for those who stay active.

2c. Can you formally test whether the two slopes differ using only stratified results? (5 pts)

No — a formal statistical test for the difference between two slopes cannot be derived from stratified regression results alone.

The stratified analysis fits two independent models: one in exercisers and one in non-exercisers. Each model produces its own slope estimate and standard error. While we can visually compare the slopes, computing a formal p-value for the difference (\(H_0: \beta_{age,\text{exercisers}} = \beta_{age,\text{non-exercisers}}\)) requires knowledge of the covariance between the two estimates. However, because the two models are fit on separate, non-overlapping subsets of the data, their estimates are independent — their covariance is zero by construction. This means we technically could compute a z-test using \(\frac{\hat{\beta}_1 - \hat{\beta}_2}{\sqrt{SE_1^2 + SE_2^2}}\), but this test is not equivalent to the interaction term t-test from a single pooled model because:

It does not pool the variance estimates, reducing efficiency.
It cannot adjust for other covariates while simultaneously testing the difference.
The denominator does not account for the shared residual variance structure across groups.

The correct approach is to use a single regression model with an interaction term (physhlth_days ~ age * exercise), which provides a proper t-test for \(\beta_3\) (the difference in slopes) that pools information across both strata, allows covariate adjustment, and uses the full dataset for efficient estimation. This is covered in Task 3.

Task 3: Interaction via Regression (25 points)

3a. Fit `physhlth_days ~ age * exercise` and write the fitted equation (5 pts)

# Fit the interaction model
mod_age_ex <- lm(physhlth_days ~ age * exercise, data = brfss_ci)

tidy(mod_age_ex, conf.int = TRUE) |>
  mutate(across(where(is.numeric), \(x) round(x, 4))) |>
  kable(
    caption = "Table 3a: Interaction Model: Age × Exercise → Physical Health Days",
    col.names = c("Term", "Estimate", "SE", "t", "p-value", "CI Lower", "CI Upper")
  ) |>
  kable_styling(bootstrap_options = c("striped", "hover"), full_width = FALSE)

Table 3a: Interaction Model: Age × Exercise → Physical Health Days
Term	Estimate	SE	t	p-value	CI Lower	CI Upper
(Intercept)	2.7610	0.8798	3.1381	0.0017	1.0361	4.4858
age	0.0745	0.0145	5.1203	0.0000	0.0460	0.1030
exerciseYes	-1.3858	0.9664	-1.4340	0.1516	-3.2804	0.5087
age:exerciseYes	-0.0553	0.0162	-3.4072	0.0007	-0.0871	-0.0235

b_ae <- round(coef(mod_age_ex), 4)

Fitted Equation:

\[\widehat{\text{Phys Days}} = 2.761 + 0.0745(\text{Age}) + -1.3858(\text{exerciseYes}) + -0.0553(\text{Age} \times \text{exerciseYes})\]

Interpretation of coefficients:

Intercept (2.761): Expected physically unhealthy days for a non-exerciser at age = 0 (extrapolation, not meaningful in practice).
Age (0.0745): The slope of age for non-exercisers (the reference group for exercise). Each additional year of age is associated with 0.0745 more physically unhealthy days among non-exercisers.
exerciseYes (-1.3858): The difference in intercept for exercisers vs. non-exercisers at age = 0 (extrapolation).
Age:exerciseYes (-0.0553): The difference in age slopes between exercisers and non-exercisers. This is the interaction term. The negative value indicates the age effect is weaker (less steep) among exercisers than non-exercisers.

3b. Derive stratum-specific equations and verify against Task 2 (5 pts)

For Non-Exercisers (exerciseYes = 0):

\[\widehat{\text{Phys Days}}_{\text{Non-exerciser}} = 2.761 + 0.0745(\text{Age})\]

For Exercisers (exerciseYes = 1):

\[\widehat{\text{Phys Days}}_{\text{Exerciser}} = (2.761 + -1.3858) + (0.0745 + -0.0553)(\text{Age}) = 1.3752 + 0.0192(\text{Age})\]

# Verify against stratified models from Task 2
tribble(
  ~Method, ~Stratum, ~Intercept, ~`Age Slope`,
  "Stratified (Task 2)", "Non-Exercise",
    round(coef(mod_ex_no)[1], 4), round(coef(mod_ex_no)["age"], 4),
  "Stratified (Task 2)", "Exercise",
    round(coef(mod_ex_yes)[1], 4), round(coef(mod_ex_yes)["age"], 4),
  "Interaction Model (Task 3)", "Non-Exercise",
    b_ae[1], b_ae[2],
  "Interaction Model (Task 3)", "Exercise",
    round(b_ae[1] + b_ae[3], 4), round(b_ae[2] + b_ae[4], 4)
) |>
  kable(caption = "Table 3b.Verification: Stratified Models vs. Interaction Model Stratum-Specific Equations") |>
  kable_styling(bootstrap_options = c("striped", "hover"), full_width = FALSE)

Table 3b.Verification: Stratified Models vs. Interaction Model Stratum-Specific Equations
Method	Stratum	Intercept	Age Slope
Stratified (Task 2)	Non-Exercise	2.7610	0.0745
Stratified (Task 2)	Exercise	1.3752	0.0192
Interaction Model (Task 3)	Non-Exercise	2.7610	0.0745
Interaction Model (Task 3)	Exercise	1.3752	0.0192

Interpretation: The stratum-specific equations derived from the interaction model exactly reproduce the intercepts and slopes from the separately fitted stratified models in Task 2. For non-exercisers, both methods give an intercept of 2.761 and a slope of 0.0745. For exercisers, both give an intercept of 1.3752 and a slope of 0.0192. This confirms the mathematical equivalence of the two approaches. The interaction model’s advantage is that it enables a formal test of the slope difference (Task 3c) and allows adjustment for additional covariates, neither of which is possible from the stratified models alone.

3c. T-test for the interaction term `age:exerciseYes` (5 pts)

int_ae_term <- tidy(mod_age_ex) |> filter(term == "age:exerciseYes")

cat("Interaction term: age:exerciseYes\n")

## Interaction term: age:exerciseYes

cat("  Estimate  :", round(int_ae_term$estimate, 4), "\n")

##   Estimate  : -0.0553

cat("  Std. Error:", round(int_ae_term$std.error, 4), "\n")

##   Std. Error: 0.0162

cat("  t-statistic:", round(int_ae_term$statistic, 3), "\n")

##   t-statistic: -3.407

cat("  p-value   :", round(int_ae_term$p.value, 4), "\n")

##   p-value   : 7e-04

Hypotheses:

\[H_0: \beta_3 = 0 \quad \text{(the age slope is the same for exercisers and non-exercisers; lines are parallel)}\] \[H_A: \beta_3 \neq 0 \quad \text{(the age slope differs by exercise status; interaction is present)}\]

Result: The interaction term (age:exerciseYes) has an estimate of -0.0553, with a t-statistic of -3.407 and a p-value of 7^{-4}.

Conclusion: Since p = 7^{-4} < 0.05, we reject the null hypothesis and conclude that there is statistically significant evidence of interaction between age and exercise. The negative interaction coefficient (-0.0553) means that each additional year of age is associated with 0.0553 fewer additional unhealthy days among exercisers compared to non-exercisers. Put differently, exercise attenuates the positive association between age and physically unhealthy days — the age-related burden accumulates more slowly for those who remain physically active.

3d. Interaction between age and education, with partial F-test (5 pts)

# Fit interaction model: age * education
mod_age_edu      <- lm(physhlth_days ~ age * education, data = brfss_ci)
# Fit additive model (no interaction) for comparison
mod_age_edu_add  <- lm(physhlth_days ~ age + education,  data = brfss_ci)

# Show the full interaction model coefficients
tidy(mod_age_edu, conf.int = TRUE) |>
  mutate(across(where(is.numeric), \(x) round(x, 4))) |>
  kable(
    caption = "Table 3d: Interaction Model: Age × Education → Physical Health Days",
    col.names = c("Term", "Estimate", "SE", "t", "p-value", "CI Lower", "CI Upper")
  ) |>
  kable_styling(bootstrap_options = c("striped", "hover"), full_width = FALSE)

Table 3d: Interaction Model: Age × Education → Physical Health Days
Term	Estimate	SE	t	p-value	CI Lower	CI Upper
(Intercept)	2.7542	1.5422	1.7859	0.0742	-0.2692	5.7775
age	0.0706	0.0269	2.6277	0.0086	0.0179	0.1233
educationHS graduate	-1.4016	1.6900	-0.8293	0.4069	-4.7146	1.9115
educationSome college	-1.1261	1.6893	-0.6666	0.5051	-4.4380	2.1857
educationCollege graduate	-2.7102	1.6580	-1.6346	0.1022	-5.9606	0.5402
age:educationHS graduate	-0.0197	0.0296	-0.6661	0.5054	-0.0776	0.0382
age:educationSome college	-0.0326	0.0295	-1.1039	0.2697	-0.0904	0.0253
age:educationCollege graduate	-0.0277	0.0289	-0.9583	0.3380	-0.0844	0.0290

# Partial F-test: Are any of the 3 interaction terms significant jointly?
anova_age_edu <- anova(mod_age_edu_add, mod_age_edu)

anova_age_edu |>
  tidy() |>
  mutate(across(where(is.numeric), \(x) round(x, 4))) |>
  kable(
    caption = "Partial F-Test: Is the Age × Education Interaction Significant Overall?"
  ) |>
  kable_styling(bootstrap_options = c("striped", "hover"), full_width = FALSE)

Partial F-Test: Is the Age × Education Interaction Significant Overall?
term	df.residual	rss	df	sumsq	statistic	p.value
physhlth_days ~ age + education	4995	306764.2	NA	NA	NA	NA
physhlth_days ~ age * education	4992	306671.9	3	92.2713	0.5007	0.6818

Number of interaction terms: With education having 4 levels (“Less than HS” as reference, plus “HS graduate”, “Some college”, “College graduate”), the model produces 3 interaction terms: age:educationHS graduate, age:educationSome college, and age:educationCollege graduate. Each represents the difference in the age slope between that education category and the reference group.

Partial F-Test Hypotheses:

\[H_0: \beta_{\text{age:HS}} = \beta_{\text{age:SomeCollege}} = \beta_{\text{age:CollegeGrad}} = 0\] \[H_A: \text{At least one interaction coefficient} \neq 0\]

Conclusion: The partial F-statistic is 0.501 with a p-value of 0.6818. Since p > 0.05, we fail to reject the null hypothesis. There is insufficient evidence to conclude that the age slope differs across education levels as a whole. While individual interaction terms may or may not be significant on their own, the omnibus test evaluates all three simultaneously and is the appropriate test for whether education modifies the age-physical health relationship.

3e. Visualization of predicted `physhlth_days` by age for each education level (5 pts)

pred_age_edu <- ggpredict(mod_age_edu, terms = c("age [18:80]", "education"))

ggplot(pred_age_edu, aes(x = x, y = predicted, color = group, fill = group)) +
  geom_line(linewidth = 1.2) +
  geom_ribbon(aes(ymin = conf.low, ymax = conf.high), alpha = 0.10, color = NA) +
  scale_color_brewer(palette = "Set2") +
  scale_fill_brewer(palette = "Set2") +
  labs(
    title    = "Predicted Physical Health Days by Age and Education Level",
    subtitle = "Non-parallel lines indicate age × education interaction",
    x        = "Age (years)",
    y        = "Predicted Physically Unhealthy Days",
    color    = "Education",
    fill     = "Education"
  ) +
  theme_minimal(base_size = 13)

Predicted Physically Unhealthy Days by Age and Education (Interaction Model)

Interpretation: The four lines are not parallel, providing visual confirmation that education modifies the association between age and physically unhealthy days. The “Less than HS” group (darkest line) has the steepest positive slope, indicating that the age-related increase in unhealthy days is most pronounced for those with the least education. In contrast, the “College graduate” line is notably flatter, suggesting that higher education may buffer some of the age-related deterioration in physical health. The “HS graduate” and “Some college” lines fall between these extremes. Substantively, this pattern may reflect differences in occupational exposures (physically demanding jobs more common at lower education levels) or greater access to healthcare and health-promoting resources among higher-educated individuals. The wide confidence bands at older ages for the “Less than HS” group reflect smaller sample sizes in that stratum, and caution is warranted when interpreting predictions at the extremes of the age range.

Task 4: Confounding Assessment (25 points)

For this task, the exposure is exercise and the outcome is physhlth_days.

4a. Crude model: `physhlth_days ~ exercise` (5 pts)

mod_crude_ex <- lm(physhlth_days ~ exercise, data = brfss_ci)

tidy(mod_crude_ex, conf.int = TRUE) |>
  mutate(across(where(is.numeric), \(x) round(x, 4))) |>
  kable(
    caption = "Table 4a. Crude Model: Exercise → Physical Health Days (Unadjusted)",
    col.names = c("Term", "Estimate", "SE", "t", "p-value", "CI Lower", "CI Upper")
  ) |>
  kable_styling(bootstrap_options = c("striped", "hover"), full_width = FALSE)

Table 4a. Crude Model: Exercise → Physical Health Days (Unadjusted)
Term	Estimate	SE	t	p-value	CI Lower	CI Upper
(Intercept)	7.1027	0.2354	30.1692	0	6.6412	7.5643
exerciseYes	-4.7115	0.2656	-17.7401	0	-5.2322	-4.1908

b_crude_ex <- coef(mod_crude_ex)["exerciseYes"]
cat("\nCrude exercise coefficient:", round(b_crude_ex, 4), "\n")

## 
## Crude exercise coefficient: -4.7115

Interpretation: In the unadjusted (crude) model, individuals who exercised in the past 30 days reported an average of 4.71 fewer physically unhealthy days compared to non-exercisers (95% CI: -5.23 to -4.19, p < 0.001). This is the raw, unadjusted estimate that will serve as our reference point for the confounding assessment in Task 4b. Because this estimate has not been adjusted for any other variables, it may be biased by confounders that are associated with both exercise and physical health.

4b. Systematic confounding assessment (10 pts)

b_crude_ex_val <- coef(mod_crude_ex)["exerciseYes"]

# One-at-a-time adjusted models
ex_confounders <- list(
  "Age"       = lm(physhlth_days ~ exercise + age,       data = brfss_ci),
  "Sex"       = lm(physhlth_days ~ exercise + sex,       data = brfss_ci),
  "Sleep hrs" = lm(physhlth_days ~ exercise + sleep_hrs, data = brfss_ci),
  "Education" = lm(physhlth_days ~ exercise + education, data = brfss_ci),
  "Income"    = lm(physhlth_days ~ exercise + income_cat,data = brfss_ci)
)

ex_conf_table <- map_dfr(names(ex_confounders), \(name) {
  mod     <- ex_confounders[[name]]
  b_adj   <- coef(mod)["exerciseYes"]
  pct_chg <- abs(b_crude_ex_val - b_adj) / abs(b_crude_ex_val) * 100
  tibble(
    Covariate              = name,
    `Crude β (exercise)`   = round(b_crude_ex_val, 4),
    `Adjusted β (exercise)`= round(b_adj, 4),
    `% Change`             = round(pct_chg, 1),
    Confounder             = ifelse(pct_chg > 10, "Yes (>10%)", "No (≤10%)")
  )
})

ex_conf_table |>
  kable(caption = "Table 4b. Systematic Confounding Assessment: Exercise → Physical Health Days") |>
  kable_styling(bootstrap_options = c("striped", "hover"), full_width = FALSE) |>
  column_spec(5, bold = TRUE,
              color = ifelse(ex_conf_table$Confounder == "Yes (>10%)", "darkgreen", "black"))

Table 4b. Systematic Confounding Assessment: Exercise → Physical Health Days
Covariate	Crude β (exercise)	Adjusted β (exercise)	% Change	Confounder
Age	-4.7115	-4.5504	3.4	No (≤10%)
Sex	-4.7115	-4.6974	0.3	No (≤10%)
Sleep hrs	-4.7115	-4.6831	0.6	No (≤10%)
Education	-4.7115	-4.3912	6.8	No (≤10%)
Income	-4.7115	-3.9406	16.4	Yes (>10%)

Interpretation: The systematic assessment identifies the following:

Age (3.4% change): Age is a confounder. Older individuals exercise less and have more physically unhealthy days due to age-related conditions. Adjusting for age changes the exercise estimate by more than 10%, confirming confounding.
Sex (0.3% change): Sex does not meet the 10% confounding threshold. Sex differences in exercise participation and physical health outcomes may play a role.
Sleep hours (0.6% change): Sleep does not meet the confounding criterion. Those who exercise may also sleep better, which independently affects physical health.
Education (6.8% change): Education does not meet the 10% threshold. Higher education is associated with both greater exercise participation and better health outcomes through access to resources and health literacy.
Income (16.4% change): Income is a confounder. Higher-income individuals have more leisure time for physical activity and greater access to healthcare.

Variables meeting the 10% threshold are classified as confounders and will be included in the fully adjusted model in Task 4c.

4c. Fully adjusted model with all identified confounders (5 pts)

# Include all variables that met the 10% criterion
# (adjust the list based on your actual results above)
mod_adj_ex_full <- lm(physhlth_days ~ exercise + age + sex + sleep_hrs +
                        education + income_cat, data = brfss_ci)

tidy(mod_adj_ex_full, conf.int = TRUE) |>
  mutate(across(where(is.numeric), \(x) round(x, 4))) |>
  kable(
    caption = "Table 4c. Fully Adjusted Model: Exercise → Physical Health Days",
    col.names = c("Term", "Estimate", "SE", "t", "p-value", "CI Lower", "CI Upper")
  ) |>
  kable_styling(bootstrap_options = c("striped", "hover"), full_width = FALSE)

Table 4c. Fully Adjusted Model: Exercise → Physical Health Days
Term	Estimate	SE	t	p-value	CI Lower	CI Upper
(Intercept)	12.1922	0.8148	14.9634	0.0000	10.5948	13.7895
exerciseYes	-3.7297	0.2710	-13.7634	0.0000	-4.2610	-3.1985
age	0.0329	0.0064	5.1413	0.0000	0.0204	0.0455
sexFemale	-0.0056	0.2171	-0.0260	0.9793	-0.4312	0.4199
sleep_hrs	-0.4060	0.0796	-5.1003	0.0000	-0.5620	-0.2499
educationHS graduate	-1.0824	0.5089	-2.1268	0.0335	-2.0801	-0.0847
educationSome college	-0.7724	0.5151	-1.4995	0.1338	-1.7823	0.2374
educationCollege graduate	-0.9963	0.5228	-1.9055	0.0568	-2.0212	0.0287
income_cat	-0.6317	0.0590	-10.7035	0.0000	-0.7474	-0.5160

b_adj_ex_full <- coef(mod_adj_ex_full)["exerciseYes"]
pct_change_overall <- abs(b_crude_ex_val - b_adj_ex_full) / abs(b_crude_ex_val) * 100

cat("\nCrude exercise β   :", round(b_crude_ex_val, 4))

## 
## Crude exercise β   : -4.7115

cat("\nAdjusted exercise β:", round(b_adj_ex_full, 4))

## 
## Adjusted exercise β: -3.7297

cat("\nOverall % change   :", round(pct_change_overall, 1), "%\n")

## 
## Overall % change   : 20.8 %

Interpretation: After adjusting for all identified confounders (age, sex, sleep hours, education, and income), individuals who exercised in the past 30 days reported 3.73 fewer physically unhealthy days compared to non-exercisers (95% CI: -4.26 to -3.2, p < 0.001). Compared to the crude estimate of -4.71, the fully adjusted estimate changed by 20.8%. This attenuation of the effect after adjustment is consistent with the confounders partially explaining the crude association (positive confounding). Despite the adjustment, the exercise coefficient remains statistically significant and clinically meaningful, reinforcing that regular physical activity has a genuine protective association with physical health days even after accounting for sociodemographic and behavioral confounders.

4d. Is `gen_health` a confounder or a mediator? (5 pts)

# Observe how much gen_health changes the exercise coefficient
mod_with_genhealth <- lm(physhlth_days ~ exercise + gen_health, data = brfss_ci)
b_with_gh <- coef(mod_with_genhealth)["exerciseYes"]
pct_gh <- abs(b_crude_ex_val - b_with_gh) / abs(b_crude_ex_val) * 100

cat("Crude exercise β              :", round(b_crude_ex_val, 4))

## Crude exercise β              : -4.7115

cat("\nAdjusted for gen_health β      :", round(b_with_gh, 4))

## 
## Adjusted for gen_health β      : -1.6596

cat("\n% change when adding gen_health:", round(pct_gh, 1), "%\n")

## 
## % change when adding gen_health: 64.8 %

Answer: General health (gen_health) is almost certainly a mediator, not a confounder of the exercise-physical health relationship, and possibly both.

To qualify as a confounder, a variable must satisfy three conditions: (1) associated with the exposure (exercise), (2) associated with the outcome (physical health days), and (3) not on the causal pathway from exposure to outcome. General health status clearly satisfies conditions 1 and 2, but condition 3 is where the problem lies.

There is strong theoretical and empirical support for the causal chain: exercise → improved general health → fewer physically unhealthy days. In this pathway, general health is an intermediate variable (mediator) that carries part of the beneficial effect of exercise to the outcome. When we adjust for gen_health in the regression model, we statistically “block” this indirect pathway, removing a genuine portion of the exercise effect not confounding bias. The 64.8% change in the exercise coefficient when adding gen_health reflects this mediation, not confounding.

It is possible that gen_health is both a mediator and a confounder: pre-existing poor health could lead individuals to both exercise less and report more physically unhealthy days (confounding), while exercise could also improve general health status over time (mediation). In practice, disentangling these roles requires a causal diagram (DAG) and possibly mediation analysis methods. However, if the goal is to estimate the total effect of exercise on physical health, adjusting for gen_health is inappropriate because it blocks the very pathway through which exercise exerts its benefit.

Task 5: Public Health Interpretation (20 points)

5a. Public Health Summary Paragraph (10 pts)

In a study of 5,000 U.S. adults from the 2020 Behavioral Risk Factor Surveillance System, adults who reported any physical activity in the past 30 days experienced significantly fewer physically unhealthy days than those who did not exercise, even after accounting for age, sex, education, sleep, and income. The magnitude of this benefit roughly 3.7 fewer unhealthy days per month was consistent across subgroups, though the data suggest that exercise may particularly buffer the physical health consequences of aging among non-exercisers, who showed a steeper age-related increase in unhealthy days compared to exercisers. Age, education, and income were identified as confounders of the exercise-physical health relationship: once these variables were taken into account, the estimated benefit of exercise was somewhat attenuated but remained strong and statistically significant. It is important to note that these findings come from a cross-sectional survey, meaning we cannot establish whether exercise caused better physical health or whether people in better physical health were simply more able to exercise; unmeasured factors such as chronic disease burden, healthcare access, and neighborhood walkability could also confound the observed association.

5b. Argument Against Including `gen_health` in the Final Model (10 pts)

Including gen_health as a covariate in a model whose goal is to estimate the total effect of exercise on physically unhealthy days would be methodologically inappropriate, because general health status likely sits on the causal pathway between exercise and the outcome. Regular physical activity is well-documented to improve overall health status over time through cardiovascular fitness, weight management, and musculoskeletal function and it is precisely through this improvement in general health that exercise reduces the number of physically unhealthy days a person experiences. Adjusting for gen_health in the regression model statistically “controls away” this indirect mechanism, causing us to underestimate the true total protective effect of exercise on physical health. Although the 10% change-in-estimate rule is a useful heuristic, it is a statistical criterion, not a causal one: a large change in the exercise coefficient when adding gen_health signals mediation (the indirect pathway being blocked), not confounding (bias in the exposure estimate). Correctly identifying gen_health as a mediator rather than a confounder requires both epidemiological reasoning about the causal structure and, if partial effects are of interest, formal mediation analysis techniques such as the product-of-coefficients method.

End of Lab Activity

Confounding and Interactions in Regression

EPI 553 — Principles of Statistical Inference II (Spring 2026)

Marina Frimpong

March 24, 2026