Introduction

In Part 1, we introduced the logistic model, the logit transformation, and the connection between logistic regression coefficients and odds ratios. We fit simple logistic regression models with single predictors and previewed multiple logistic regression.

In Part 2, we go deeper:

Multiple logistic regression with adjusted odds ratios for confounder control
Maximum likelihood estimation and the Wald and likelihood ratio tests
Confidence intervals for adjusted odds ratios
Interaction terms in logistic regression and effect modification
Goodness-of-fit: deviance, Hosmer-Lemeshow test, pseudo-R²
Model discrimination: ROC curves and the area under the curve (AUC)
Diagnostics: residuals, influential observations, and linearity in the logit

Textbook reference: Kleinbaum et al., Chapter 22 (Sections 22.4 and 22.5)

In-Class Lab Activity

EPI 553 — Logistic Regression Part 2 Lab Due: End of class, April 14, 2026

Instructions

In this lab, you will build a multiple logistic regression model, conduct hypothesis tests, examine an interaction, and assess goodness-of-fit and discrimination. Use the same BRFSS 2020 logistic dataset from Part 1. Work through each task systematically. You may discuss concepts with classmates, but your written answers and R code must be your own.

Submission: Knit your .Rmd to HTML and upload to Brightspace by end of class.

Data for the Lab

Variable	Description	Type
`fmd`	Frequent mental distress (No/Yes)	Factor (outcome)
`menthlth_days`	Mentally unhealthy days (0–30)	Numeric
`physhlth_days`	Physically unhealthy days (0–30)	Numeric
`sleep_hrs`	Sleep hours per night (1–14)	Numeric
`age`	Age in years (capped at 80)	Numeric
`sex`	Sex (Male/Female)	Factor
`bmi`	Body mass index	Numeric
`exercise`	Exercised in past 30 days (No/Yes)	Factor
`income_cat`	Household income category (1–8)	Numeric
`smoker`	Former/Never vs. Current	Factor

Setup and Data

library(tidyverse)
library(knitr)
library(kableExtra)
library(broom)
library(gtsummary)
library(car)
library(ggeffects)
library(ResourceSelection)  # for Hosmer-Lemeshow
library(pROC)               # for ROC/AUC
library(performance)        # for model performance metrics

options(gtsummary.use_ftExtra = TRUE)
set_gtsummary_theme(theme_gtsummary_compact(set_theme = TRUE))

brfss_logistic <- readRDS(
  "/Users/nataliasmall/Downloads/EPI 553/brfss_logistic_2020.rds")

dim(brfss_logistic)

## [1] 5000   10

Task 1: Build a Multiple Logistic Regression Model (15 points)

1a. (5 pts) Fit a multiple logistic regression model predicting fmd from at least 5 predictors of your choice.

mod_multi <- glm(fmd ~ physhlth_days + sleep_hrs + exercise + sex + age, 
                 data = brfss_logistic, 
                 family = binomial(link = "logit"))

1b. (5 pts) Report the adjusted ORs with 95% CIs in a publication-quality table using tbl_regression().

mod_multi |>
  tbl_regression(
    exponentiate = TRUE,
    label = list(
      physhlth_days   ~ "Physically unhealthy days",
      sleep_hrs       ~ "Sleep hours",
      exercise        ~ "Exercise (past 30 days)",
      sex             ~ "Sex",
      age             ~ "Age (per year)"
    )
  ) |>
  bold_labels() |>
  bold_p()

Characteristic	OR	95% CI	p-value
Physically unhealthy days	1.07	1.06, 1.08	<0.001
Sleep hours	0.85	0.80, 0.90	<0.001
Exercise (past 30 days)
No	—	—
Yes	0.75	0.62, 0.90	0.002
Sex
Male	—	—
Female	1.75	1.48, 2.07	<0.001
Age (per year)	0.97	0.96, 0.97	<0.001
Abbreviations: CI = Confidence Interval, OR = Odds Ratio

1c. (5 pts) Interpret the adjusted OR for two predictors of your choice in 1-2 sentences each. Make sure to mention what the OR represents (per unit change for continuous; reference category for categorical). Exercise (past 30 days) Compared to those who have not exercised in the past 30 days (the reference group), those who have excersied have 0.75 times the odds of frequent mental distress, adjusting for all other variables in the model.
SexCompared to males (the reference group), females have 1.75 times the odds of frequent mental distress, holding all other variables constant. —

Task 2: Wald and Likelihood Ratio Tests (15 points)

2a. (5 pts) Identify the Wald p-value for each predictor in your model from the tidy() or summary() output.

tidy(mod_multi, conf.int = TRUE, exponentiate = TRUE) |>
  mutate(across(c(estimate, std.error, statistic, conf.low, conf.high),
                \(x) round(x, 3)),
         p.value = format.pval(p.value, digits = 3)) |>
  kable(caption = "Wald Tests for Each Coefficient (Exponentiated)") |>
  kable_styling(bootstrap_options = "striped", full_width = FALSE)

Wald Tests for Each Coefficient (Exponentiated)
term	estimate	std.error	statistic	p.value	conf.low	conf.high
(Intercept)	2.088	0.238	3.088	0.00202	1.309	3.332
physhlth_days	1.070	0.004	17.276	< 2e-16	1.062	1.078
sleep_hrs	0.849	0.027	-5.955	2.60e-09	0.804	0.896
exerciseYes	0.747	0.095	-3.072	0.00213	0.620	0.901
sexFemale	1.745	0.086	6.508	7.62e-11	1.476	2.065
age	0.966	0.003	-12.447	< 2e-16	0.961	0.971

2b. (5 pts) Fit a reduced model that drops one predictor of your choice. Perform a likelihood ratio test comparing the full and reduced models using anova(reduced, full, test = "LRT").

mod_reduced <- glm(fmd ~ physhlth_days + sleep_hrs + exercise + sex,
  data = brfss_logistic,
  family = binomial)

anova(mod_reduced, mod_multi, test = "LRT") |>
  kable(digits = 3,
        caption = "LR Test: Does adding age improve the model?") |>
  kable_styling(bootstrap_options = "striped", full_width = FALSE)

LR Test: Does adding age improve the model?
Resid. Df	Resid. Dev	Df	Deviance	Pr(>Chi)
4995	3827.637	NA	NA	NA
4994	3666.679	1	160.958	0

2c. (5 pts) Compare the conclusions from the Wald test (for the dropped predictor) and the LR test. Do they agree? In 2-3 sentences, explain when the two tests might disagree. Interpretation: The Wald test highlights that age is independently significant contributer to predict fmd; likelihood ratio test (LR test) has a p-value < 0.001 indicating that the the dropped variable (age) jointly contributed to model fit. They do agree. Wald test might disagree from a likelihood ratio test when sample sizes are small or when coefficients are large, LR test is generally preferred for those situations.

Task 3: Test an Interaction (20 points)

3a. (5 pts) Fit a model that includes an interaction between two predictors of your choice (e.g., exercise * sex or smoker * age).

mod_interact <- glm(
  fmd ~ exercise * sex + age + smoker + sleep_hrs + income_cat,
  data = brfss_logistic,
  family = binomial
)

mod_interact |>
  tbl_regression(exponentiate = TRUE) |>
  bold_labels() |>
  bold_p()

Characteristic	OR	95% CI	p-value
exercise
No	—	—
Yes	0.61	0.47, 0.79	<0.001
sex
Male	—	—
Female	1.66	1.26, 2.21	<0.001
IMPUTED AGE VALUE COLLAPSED ABOVE 80	0.97	0.97, 0.98	<0.001
smoker
Former/Never	—	—
Current	1.25	1.05, 1.48	0.012
sleep_hrs	0.81	0.77, 0.86	<0.001
income_cat	0.85	0.82, 0.89	<0.001
exercise * sex
Yes * Female	1.00	0.71, 1.41	>0.9
Abbreviations: CI = Confidence Interval, OR = Odds Ratio

3b. (5 pts) Perform a likelihood ratio test comparing the model with the interaction to the model without it.

mod_no_interact <- glm(
  fmd ~ exercise + sex + age + smoker + sleep_hrs + income_cat,
  data = brfss_logistic,
  family = binomial
)

anova(mod_no_interact, mod_interact, test = "LRT") |>
  kable(digits = 3,
        caption = "LR Test for Exercise × Sex Interaction") |>
  kable_styling(bootstrap_options = "striped", full_width = FALSE)

LR Test for Exercise × Sex Interaction
Resid. Df	Resid. Dev	Df	Deviance	Pr(>Chi)
4993	3870.197	NA	NA	NA
4992	3870.197	1	0	0.987

3c. (5 pts) Create a visualization of the interaction using ggpredict() and plot().

ggpredict(mod_interact, terms = c("exercise", "sex")) |>
  plot() +
  labs(title = "Predicted Probability of FMD by Exercise and Sex",
       x = "Exercise", y = "Predicted Probability of FMD",
       color = "Sex") +
  theme_minimal()

3d. (5 pts) In 3-4 sentences, interpret the interaction. Does the effect of one predictor differ across levels of the other? If statistically significant, report the stratum-specific odds ratios. Interpretation: Since the p-value for the LR test is 0.987 (> 0.05), the interaction is not statistically significant, the effect of exercise does not differ by sex. We can drop the interaction term and use the simpler main-effects model. The visualization further confirms this conclusion, since the lines are relatively parallel, there is no interaction. There is no need to report the stratum-specific odds ratios because the interaction is not statistically significant.

Task 4: Goodness-of-Fit and Discrimination (25 points)

4a. (5 pts) Compute McFadden’s pseudo-R² for your full model using performance::r2_mcfadden().

performance::r2_mcfadden(mod_multi)

## # R2 for Generalized Linear Regression
##        R2: 0.138
##   adj. R2: 0.137

4b. (5 pts) Perform the Hosmer-Lemeshow goodness-of-fit test using ResourceSelection::hoslem.test(). Report the test statistic and p-value. Comment on the interpretation given your sample size.

ResourceSelection::hoslem.test(mod_multi$y, fitted(mod_multi))

## 
##  Hosmer and Lemeshow goodness of fit (GOF) test
## 
## data:  mod_multi$y, fitted(mod_multi)
## X-squared = 15.295, df = 8, p-value = 0.05366

X-squared:15.295 p-value:0.05366 Interpretation:Since the p-value, 0.05366, is greater than 0.05 the Hosmer-Lemeshow test suggests that the model had a good fit. While the test can be over-powered and detect trivial misfit given the large sample size, N = 5000,the model still passes the threshold for a good fit.

4c. (5 pts) Create a calibration plot showing observed vs. predicted probabilities by decile. Comment on whether the model appears well calibrated.

brfss_logistic |>
  mutate(pred_prob = fitted(mod_multi),
         obs_outcome = as.numeric(fmd) - 1,
         decile = ntile(pred_prob, 10)) |>
  group_by(decile) |>
  summarise(
    mean_pred = mean(pred_prob),
    mean_obs  = mean(obs_outcome),
    .groups = "drop"
  ) |>
  ggplot(aes(x = mean_pred, y = mean_obs)) +
  geom_abline(slope = 1, intercept = 0, color = "red", linetype = "dashed") +
  geom_point(size = 3, color = "steelblue") +
  geom_line(color = "steelblue") +
  labs(title = "Calibration Plot: Observed vs. Predicted Probability of FMD",
       subtitle = "Points should fall on the dashed line for perfect calibration",
       x = "Mean Predicted Probability (within decile)",
       y = "Observed Proportion (within decile)") +
  theme_minimal()

Interpretation:The points fall closely along the 45-degree dashed line, indicating good calibration. While there are minor fluctuations in the middle deciles, the model does not consistently over-predict or under-predict the probability of fmd, suggesting the predicted probabilities are reliable.

4d. (10 pts) Compute and plot the ROC curve using pROC::roc(). Report the AUC. Based on the AUC value, how would you describe the model’s discrimination ability (poor, acceptable, excellent, outstanding)?

roc_obj <- roc(
  response = brfss_logistic$fmd,
  predictor = fitted(mod_multi),
  levels = c("No", "Yes"),
  direction = "<"
)

auc_value <- auc(roc_obj)

ggroc(roc_obj, color = "steelblue", linewidth = 1.2) +
  geom_abline(slope = 1, intercept = 1, linetype = "dashed", color = "red") +
  labs(title = "ROC Curve for Frequent Mental Distress Model",
       subtitle = paste0("AUC = ", round(auc_value, 3)),
       x = "Specificity", y = "Sensitivity") +
  theme_minimal()

Interpretation:The AUC is 0.763, which is falls between 0.75 and 0.80. This indicates an acceptable discrimination, meaning the model can distinguish between individuals with and without FMD reasonably well (correctly identifying the person with fmd about 76% of the time).

End of Lab Activity

Logistic Regression — Part 2

EPI 553 — Principles of Statistical Inference II (Spring 2026)

Natalia Small

April 14, 2026