Introduction

In Part 1, we introduced the logistic model, the logit transformation, and the connection between logistic regression coefficients and odds ratios. We fit simple logistic regression models with single predictors and previewed multiple logistic regression.

In Part 2, we go deeper:

  1. Multiple logistic regression with adjusted odds ratios for confounder control
  2. Maximum likelihood estimation and the Wald and likelihood ratio tests
  3. Confidence intervals for adjusted odds ratios
  4. Interaction terms in logistic regression and effect modification
  5. Goodness-of-fit: deviance, Hosmer-Lemeshow test, pseudo-R²
  6. Model discrimination: ROC curves and the area under the curve (AUC)
  7. Diagnostics: residuals, influential observations, and linearity in the logit

Textbook reference: Kleinbaum et al., Chapter 22 (Sections 22.4 and 22.5)


Setup and Data

library(tidyverse)
library(knitr)
library(kableExtra)
library(broom)
library(gtsummary)
library(car)
library(ggeffects)
library(ResourceSelection)  # for Hosmer-Lemeshow
library(pROC)               # for ROC/AUC
library(performance)        # for model performance metrics
library(sjPlot)
library(modelsummary)


options(gtsummary.use_ftExtra = TRUE)
set_gtsummary_theme(theme_gtsummary_compact(set_theme = TRUE))

Part 9: In-Class Lab Activity

EPI 553 — Logistic Regression Part 2 Lab Due: End of class, April 14, 2026

brfss_logistic <- readRDS("C:/Users/joshm/Documents/UAlbany/Spring 2026/EPI 553/Labs/brfss_logistic_2020.rds")

Task 1: Build a Multiple Logistic Regression Model (15 points)

1a. (5 pts) Fit a multiple logistic regression model predicting fmd from at least 5 predictors of your choice.

multi <- glm(fmd ~ physhlth_days + sleep_hrs + age + sex + bmi,
                 data = brfss_logistic,
                 family = binomial(link = "logit"))

1b. (5 pts) Report the adjusted ORs with 95% CIs in a publication-quality table using tbl_regression().

multi |>
  tbl_regression(
    exponentiate = TRUE,
    label = list(
      physhlth_days ~ "Physically unhealthy days",
      sleep_hrs  ~ "Sleep hours (per hour)",
      age ~ "Age in years",
      sex ~ "Sex",
      bmi ~ "BMI"
    )
  ) |>
  bold_labels() |>
  bold_p()
Characteristic OR 95% CI p-value
Physically unhealthy days 1.07 1.07, 1.08 <0.001
Sleep hours (per hour) 0.85 0.80, 0.90 <0.001
Age in years 0.97 0.96, 0.97 <0.001
Sex


    Male
    Female 1.75 1.48, 2.07 <0.001
BMI 1.01 0.99, 1.02 0.3
Abbreviations: CI = Confidence Interval, OR = Odds Ratio

1c. (5 pts) Interpret the adjusted OR for two predictors of your choice in 1-2 sentences each. Make sure to mention what the OR represents (per unit change for continuous; reference category for categorical).

Each one hour increase in sleep hours is associated with 0.85 times the risk of frequent mental distress. Females have 1.75 times the risk of frequent mental distress compared to males.


Task 2: Wald and Likelihood Ratio Tests (15 points)

2a. (5 pts) Identify the Wald p-value for each predictor in your model from the tidy() or summary() output.

tidy(multi, conf.int = TRUE, exponentiate = TRUE) |>
  mutate(across(c(estimate, std.error, statistic, conf.low, conf.high),
                \(x) round(x, 3)),
         p.value = format.pval(p.value, digits = 3)) |>
  kable(caption = "Wald Tests for Each Coefficient (Exponentiated)") |>
  kable_styling(bootstrap_options = "striped", full_width = FALSE)
Wald Tests for Each Coefficient (Exponentiated)
term estimate std.error statistic p.value conf.low conf.high
(Intercept) 1.316 0.294 0.935 0.350 0.740 2.343
physhlth_days 1.073 0.004 18.471 < 2e-16 1.065 1.081
sleep_hrs 0.848 0.028 -5.959 2.53e-09 0.804 0.895
age 0.967 0.003 -12.126 < 2e-16 0.962 0.972
sexFemale 1.751 0.085 6.553 5.63e-11 1.481 2.071
bmi 1.007 0.006 1.063 0.288 0.994 1.019

2b. (5 pts) Fit a reduced model that drops one predictor of your choice. Perform a likelihood ratio test comparing the full and reduced models using anova(reduced, full, test = "LRT").

reduced <- glm(fmd ~ physhlth_days + age + sex + bmi,
  data = brfss_logistic,
  family = binomial
)

anova(reduced, multi, test = "LRT") |>
  kable(digits = 3,
        caption = "LR Test: Does Sleep Improve the Model?") |>
  kable_styling(bootstrap_options = "striped", full_width = FALSE)
LR Test: Does Sleep Improve the Model?
Resid. Df Resid. Dev Df Deviance Pr(>Chi)
4995 3711.105 NA NA NA
4994 3674.804 1 36.301 0

2c. (5 pts) Compare the conclusions from the Wald test (for the dropped predictor) and the LR test. Do they agree? In 2-3 sentences, explain when the two tests might disagree.

The Wald test and LR test agree. The p-value for sleep_hrs was 2.53e-09 in the Wald test and was 0 in the LR test, which are both very small p-values. The two tests might disagree if there is a small sample size or if the estimates themselves are extreme values.


Task 3: Test an Interaction (20 points)

3a. (5 pts) Fit a model that includes an interaction between two predictors of your choice (e.g., exercise * sex or smoker * age).

interact <- glm(fmd ~ sex * exercise,
  data = brfss_logistic,
  family = binomial
)

interact |>
  tbl_regression(exponentiate = TRUE,
    label = list(
      exercise ~ "Exercise status",
      sex ~ "Sex"
))
Characteristic OR 95% CI p-value
Sex


    Male
    Female 1.72 1.31, 2.25 <0.001
Exercise status


    No
    Yes 0.59 0.46, 0.76 <0.001
Sex * Exercise status


    Female * Yes 0.98 0.70, 1.36 0.9
Abbreviations: CI = Confidence Interval, OR = Odds Ratio

3b. (5 pts) Perform a likelihood ratio test comparing the model with the interaction to the model without it.

no_interact <- glm(fmd ~ sex + exercise,
  data = brfss_logistic,
  family = binomial
)


anova(interact, no_interact, test = "LRT") |>
  kable(digits = 3,
        caption = "LR Test: Does Interaction Improve the Model?") |>
  kable_styling(bootstrap_options = "striped", full_width = FALSE)
LR Test: Does Interaction Improve the Model?
Resid. Df Resid. Dev Df Deviance Pr(>Chi)
4996 4165.132 NA NA NA
4997 4165.149 -1 -0.017 0.896

3c. (5 pts) Create a visualization of the interaction using ggpredict() and plot().

ggpredict(interact, terms = c("exercise", "sex")) |>
  plot() +
  labs(title = "Predicted Probability of FMD by Sex & Exercise",
       x = "exercise", y = "Predicted Probability of FMD",
       color = "sex") +
  theme_minimal()

3d. (5 pts) In 3-4 sentences, interpret the interaction. Does the effect of one predictor differ across levels of the other? If statistically significant, report the stratum-specific odds ratios.

The interaction between sex and exercise is not statistically significant as p > 0.05. Thus, the effect of exercise on frequent mental distress does not differ across sex. In the visualization, the two lines were essentially parallel, confirming the lack of interaction.

ggpredict(interact, terms = c("exercise", "sex")) |>
  as_tibble() |>
  pivot_wider(id_cols = group, names_from = x, values_from = predicted) |>
  mutate(OR_yes_vs_no = (Yes / (1 - Yes)) / (No / (1 - No))) |>
  dplyr::select(Sex = group, OR_yes_vs_no) |>
  kable(digits = 3,
        col.names = c("Sex", "OR (Exercise: Yes vs. No)"),
        caption = "Stratum-Specific Odds Ratios for Exercise") |>
  kable_styling(bootstrap_options = "striped", full_width = FALSE)
Stratum-Specific Odds Ratios for Exercise
Sex OR (Exercise: Yes vs. No)
Male 0.590
Female 0.577

Task 4: Goodness-of-Fit and Discrimination (25 points)

4a. (5 pts) Compute McFadden’s pseudo-R² for your full model using performance::r2_mcfadden().

performance::r2_mcfadden(multi)
## # R2 for Generalized Linear Regression
##        R2: 0.136
##   adj. R2: 0.135

4b. (5 pts) Perform the Hosmer-Lemeshow goodness-of-fit test using ResourceSelection::hoslem.test(). Report the test statistic and p-value. Comment on the interpretation given your sample size.

hl_test <- hoslem.test(
  x = as.numeric(brfss_logistic$fmd) - 1,
  y = fitted(multi),
  g = 10
)

hl_test
## 
##  Hosmer and Lemeshow goodness of fit (GOF) test
## 
## data:  as.numeric(brfss_logistic$fmd) - 1, fitted(multi)
## X-squared = 11.178, df = 8, p-value = 0.1918

The p-value is >0.05, indicating that the model fits well in all regions of predicted probability. With the large sample size, it is possible this test detects even very minor instances where the model does not fit.

4c. (5 pts) Create a calibration plot showing observed vs. predicted probabilities by decile. Comment on whether the model appears well calibrated.

brfss_logistic |>
  mutate(pred_prob = fitted(multi),
         obs_outcome = as.numeric(fmd) - 1,
         decile = ntile(pred_prob, 10)) |>
  group_by(decile) |>
  summarise(
    mean_pred = mean(pred_prob),
    mean_obs  = mean(obs_outcome),
    .groups = "drop"
  ) |>
  ggplot(aes(x = mean_pred, y = mean_obs)) +
  geom_abline(slope = 1, intercept = 0, color = "red", linetype = "dashed") +
  geom_point(size = 3, color = "steelblue") +
  geom_line(color = "steelblue") +
  labs(title = "Calibration Plot: Observed vs. Predicted Probability of FMD",
       subtitle = "Points should fall on the dashed line for perfect calibration",
       x = "Mean Predicted Probability (within decile)",
       y = "Observed Proportion (within decile)") +
  theme_minimal()

The model appears well calibrated as the points do not deviate too far from the fitted line.

4d. (10 pts) Compute and plot the ROC curve using pROC::roc(). Report the AUC. Based on the AUC value, how would you describe the model’s discrimination ability (poor, acceptable, excellent, outstanding)?

roc_obj <- roc(
  response = brfss_logistic$fmd,
  predictor = fitted(multi),
  levels = c("No", "Yes"),
  direction = "<"
)

auc_value <- auc(roc_obj)

ggroc(roc_obj, color = "steelblue", linewidth = 1.2) +
  geom_abline(slope = 1, intercept = 1, linetype = "dashed", color = "red") +
  labs(title = "ROC Curve for Frequent Mental Distress Model",
       subtitle = paste0("AUC = ", round(auc_value, 3)),
       x = "Specificity", y = "Sensitivity") +
  theme_minimal()

AUC = 0.761, which is acceptable and indicates the model can distinguish between individuals with and without FMD fairly well.


End of Lab Activity