Part 9: In-Class Lab Activity

EPI 553 — Logistic Regression Part 2 Lab Due: End of class, April 14, 2026

Instructions

In this lab, you will build a multiple logistic regression model, conduct hypothesis tests, examine an interaction, and assess goodness-of-fit and discrimination. Use the same BRFSS 2020 logistic dataset from Part 1. Work through each task systematically. You may discuss concepts with classmates, but your written answers and R code must be your own.

Submission: Knit your .Rmd to HTML and upload to Brightspace by end of class.

Data for the Lab

Variable	Description	Type
`fmd`	Frequent mental distress (No/Yes)	Factor (outcome)
`menthlth_days`	Mentally unhealthy days (0–30)	Numeric
`physhlth_days`	Physically unhealthy days (0–30)	Numeric
`sleep_hrs`	Sleep hours per night (1–14)	Numeric
`age`	Age in years (capped at 80)	Numeric
`sex`	Sex (Male/Female)	Factor
`bmi`	Body mass index	Numeric
`exercise`	Exercised in past 30 days (No/Yes)	Factor
`income_cat`	Household income category (1–8)	Numeric
`smoker`	Former/Never vs. Current	Factor

library(tidyverse)
library(broom)
library(knitr)
library(kableExtra)
library(gtsummary)
library(car)
library(ggeffects)
library(ResourceSelection)
library(pROC)
library(performance)

brfss_logistic <- readRDS(
  "~/Desktop/EPI553/data/brfss_logistic_2020 copy.rds"
)

Task 1: Build a Multiple Logistic Regression Model (15 points)

1a. (5 pts) Fit a multiple logistic regression model predicting fmd from at least 5 predictors of your choice.

mod_fmd <- glm(fmd ~ exercise + sex + physhlth_days + age + sleep_hrs, 
               data= brfss_logistic,
               family= binomial(link= "logit"))

1b. (5 pts) Report the adjusted ORs with 95% CIs in a publication-quality table using tbl_regression().

mod_fmd |>
  tbl_regression(
    exponentiate = TRUE,
    label = list(
      exercise      ~ "Exercise (past 30 days)",
      sex           ~ "Sex",
      physhlth_days ~ "Physically unhealthy days",
      age           ~ "Age (per year)",
      sleep_hrs     ~ "Sleep hours"
    )
  ) |>
  bold_labels() |>
  bold_p()

Characteristic	OR	95% CI	p-value
Exercise (past 30 days)
No	—	—
Yes	0.75	0.62, 0.90	0.002
Sex
Male	—	—
Female	1.75	1.48, 2.07	<0.001
Physically unhealthy days	1.07	1.06, 1.08	<0.001
Age (per year)	0.97	0.96, 0.97	<0.001
Sleep hours	0.85	0.80, 0.90	<0.001
Abbreviations: CI = Confidence Interval, OR = Odds Ratio

1c. (5 pts) Interpret the adjusted OR for two predictors of your choice in 1-2 sentences each. Make sure to mention what the OR represents (per unit change for continuous; reference category for categorical).

In comparison to those who did not exercise in the past 30 days (reference category), those who did exercise have 0.75 times the odds of frequent mental distress (aOR= 0.75, 95% CI: 0.62-0.90). In other words, there is 25% lower odds of fmd among those who exercised, after adjusting for sex, physical health, age, and sleep. Since the aOR is less than 1, it indicates that exercise is a protective factor.

For each additional hour of sleep, the odds of frequent mental distress decrease by 15% (aOR= 0.85, 95% CI= 0.80-0.90), after adjusting for exercise, sex, physical health, and age. Since aOR is less than 1, it indicates sleep is a protective factor.

Task 2: Wald and Likelihood Ratio Tests (15 points)

2a. (5 pts) Identify the Wald p-value for each predictor in your model from the tidy() or summary() output.

tidy(mod_fmd, conf.int = TRUE, exponentiate = TRUE) |>
  mutate(across(c(estimate, std.error, statistic, conf.low, conf.high),
                \(x) round(x, 3)),
         p.value = format.pval(p.value, digits = 3)) |>
  kable(caption = "Wald Tests for Each Coefficient (Exponentiated)") |>
  kable_styling(bootstrap_options = "striped", full_width = FALSE)

Wald Tests for Each Coefficient (Exponentiated)
term	estimate	std.error	statistic	p.value	conf.low	conf.high
(Intercept)	2.088	0.238	3.088	0.00202	1.309	3.332
exerciseYes	0.747	0.095	-3.072	0.00213	0.620	0.901
sexFemale	1.745	0.086	6.508	7.62e-11	1.476	2.065
physhlth_days	1.070	0.004	17.276	< 2e-16	1.062	1.078
age	0.966	0.003	-12.447	< 2e-16	0.961	0.971
sleep_hrs	0.849	0.027	-5.955	2.60e-09	0.804	0.896

exerciseYes: p-value= 0.002 sexFemale: p-value < 0.001 physhlth_days: p-value < 0.001 age: p-value < 0.001 sleep_hrs: p-value < 0.001

2b. (5 pts) Fit a reduced model that drops one predictor of your choice. Perform a likelihood ratio test comparing the full and reduced models using anova(reduced, full, test = "LRT").

mod_reduced <- glm(
  fmd ~ sex + physhlth_days + age + sleep_hrs,
  data = brfss_logistic,
  family = binomial
)

anova(mod_reduced, mod_fmd, test = "LRT") |>
  kable(digits = 3,
        caption = "LR Test: Does adding exercise improve the model?") |>
  kable_styling(bootstrap_options = "striped", full_width = FALSE)

LR Test: Does adding exercise improve the model?
Resid. Df	Resid. Dev	Df	Deviance	Pr(>Chi)
4995	3675.927	NA	NA	NA
4994	3666.679	1	9.248	0.002

2c. (5 pts) Compare the conclusions from the Wald test (for the dropped predictor) and the LR test. Do they agree? In 2-3 sentences, explain when the two tests might disagree.

The Wald test and the LR test agree. The Wald test was statistically significant for exercise (p-value= 0.002) and the likelihood ratio test was also significant (p-value= 0.002) which indicates that exercise is an important predictor. The two tests might disagree when sample sizes are small or with extreme estimates (coefficients). When sample sizes are small or coefficients are large the wald test becomes less reliable.

Task 3: Test an Interaction (20 points)

3a. (5 pts) Fit a model that includes an interaction between two predictors of your choice (e.g., exercise * sex or smoker * age).

mod_interaction <- glm(fmd ~ exercise * sex + physhlth_days + age + sleep_hrs, data= brfss_logistic, family= binomial
)

mod_interaction |>
  tbl_regression(exponentiate = TRUE) |>
  bold_labels() |>
  bold_p()

Characteristic	OR	95% CI	p-value
exercise
No	—	—
Yes	0.74	0.56, 0.97	0.027
sex
Male	—	—
Female	1.72	1.28, 2.31	<0.001
physhlth_days	1.07	1.06, 1.08	<0.001
IMPUTED AGE VALUE COLLAPSED ABOVE 80	0.97	0.96, 0.97	<0.001
sleep_hrs	0.85	0.80, 0.90	<0.001
exercise * sex
Yes * Female	1.02	0.71, 1.46	>0.9
Abbreviations: CI = Confidence Interval, OR = Odds Ratio

3b. (5 pts) Perform a likelihood ratio test comparing the model with the interaction to the model without it.

mod_no_interaction <- glm(
  fmd ~ exercise + sex + age + sleep_hrs + physhlth_days,
  data = brfss_logistic,
  family = binomial
)

anova(mod_no_interaction, mod_interaction, test = "LRT") |>
  kable(digits = 3,
        caption = "LR Test for Exercise × Sex Interaction") |>
  kable_styling(bootstrap_options = "striped", full_width = FALSE)

LR Test for Exercise × Sex Interaction
Resid. Df	Resid. Dev	Df	Deviance	Pr(>Chi)
4994	3666.679	NA	NA	NA
4993	3666.664	1	0.016	0.9

3c. (5 pts) Create a visualization of the interaction using ggpredict() and plot().

ggpredict(mod_interaction, terms = c("exercise", "sex")) |>
  plot() +
  labs(title = "Predicted Probability of FMD by Exercise and Sex",
       x = "Exercise", y = "Predicted Probability of FMD",
       color = "Sex") +
  theme_minimal()

3d. (5 pts) In 3-4 sentences, interpret the interaction. Does the effect of one predictor differ across levels of the other? If statistically significant, report the stratum-specific odds ratios.

The interaction between exercise and sex is not statistically significant, the p-value is greater than 0.9. The likelihood test supports this with a p-value= 0.9 therefore the effect of exercise on fmd does not differ by sex. Since the interaction is not statistically significant the effect of both groups can be interpreted by themselves. This means females have 1.72 times the odds of fmd compared to males, regardless of exercise status. Also, the odds of fmd are 26% lower for those who exercise compared to those who do not.

Task 4: Goodness-of-Fit and Discrimination (25 points)

4a. (5 pts) Compute McFadden’s pseudo-R² for your full model using performance::r2_mcfadden().

performance::r2_mcfadden(mod_fmd)

## # R2 for Generalized Linear Regression
##        R2: 0.138
##   adj. R2: 0.137

4b. (5 pts) Perform the Hosmer-Lemeshow goodness-of-fit test using ResourceSelection::hoslem.test(). Report the test statistic and p-value. Comment on the interpretation given your sample size.

hl_test <- hoslem.test(
  x = as.numeric(brfss_logistic$fmd) - 1,
  y = fitted(mod_fmd),
  g = 10
)

hl_test

## 
##  Hosmer and Lemeshow goodness of fit (GOF) test
## 
## data:  as.numeric(brfss_logistic$fmd) - 1, fitted(mod_fmd)
## X-squared = 15.295, df = 8, p-value = 0.05366

test statistic (X-squared)= 15.295 p-value= 0.05366 Given the large sample size and a p-value just above 0.05, it is suggested that the model seems to fit the data but it should be interpreted with caution.

4c. (5 pts) Create a calibration plot showing observed vs. predicted probabilities by decile. Comment on whether the model appears well calibrated.

brfss_logistic |>
  mutate(pred_prob = fitted(mod_fmd),
         obs_outcome = as.numeric(fmd) - 1,
         decile = ntile(pred_prob, 10)) |>
  group_by(decile) |>
  summarise(
    mean_pred = mean(pred_prob),
    mean_obs  = mean(obs_outcome),
    .groups = "drop"
  ) |>
  ggplot(aes(x = mean_pred, y = mean_obs)) +
  geom_abline(slope = 1, intercept = 0, color = "red", linetype = "dashed") +
  geom_point(size = 3, color = "steelblue") +
  geom_line(color = "steelblue") +
  labs(title = "Calibration Plot: Observed vs. Predicted Probability of FMD",
       subtitle = "Points should fall on the dashed line for perfect calibration",
       x = "Mean Predicted Probability (within decile)",
       y = "Observed Proportion (within decile)") +
  theme_minimal()

The model is not perfectly well calibrated. There are points that fall both above and under the 45-degree line but they are all relatively close.

4d. (10 pts) Compute and plot the ROC curve using pROC::roc(). Report the AUC. Based on the AUC value, how would you describe the model’s discrimination ability (poor, acceptable, excellent, outstanding)?

roc_obj <- roc(
  response = brfss_logistic$fmd,
  predictor = fitted(mod_fmd),
  levels = c("No", "Yes"),
  direction = "<"
)

auc_value <- auc(roc_obj)

ggroc(roc_obj, color = "steelblue", linewidth = 1.2) +
  geom_abline(slope = 1, intercept = 1, linetype = "dashed", color = "red") +
  labs(title = "ROC Curve for Frequent Mental Distress Model",
       subtitle = paste0("AUC = ", round(auc_value, 3)),
       x = "Specificity", y = "Sensitivity") +
  theme_minimal()

The AUC= 0.763 Based on the AUC of 0.763, the models discrimination ability is acceptable. —

End of Lab Activity

Logistic Regression — Part 2

EPI 553 — Principles of Statistical Inference II (Spring 2026)

Muntasir Masum

April 14, 2026