In Part 1, we introduced the logistic model, the logit transformation, and the connection between logistic regression coefficients and odds ratios. We fit simple logistic regression models with single predictors and previewed multiple logistic regression.
In Part 2, we go deeper:
Textbook reference: Kleinbaum et al., Chapter 22 (Sections 22.4 and 22.5)
library(tidyverse)
library(knitr)
library(kableExtra)
library(broom)
library(gtsummary)
library(car)
library(ggeffects)
library(ResourceSelection) # for Hosmer-Lemeshow
library(pROC) # for ROC/AUC
library(performance) # for model performance metrics
library(sjPlot)
library(modelsummary)
options(gtsummary.use_ftExtra = TRUE)
set_gtsummary_theme(theme_gtsummary_compact(set_theme = TRUE))EPI 553 — Logistic Regression Part 2 Lab Due: End of class, April 14, 2026
brfss_logistic <- readRDS("C:/Users/joshm/Documents/UAlbany/Spring 2026/EPI 553/Labs/brfss_logistic_2020.rds")1a. (5 pts) Fit a multiple logistic regression model
predicting fmd from at least 5 predictors of your
choice.
multi <- glm(fmd ~ physhlth_days + sleep_hrs + age + sex + bmi,
data = brfss_logistic,
family = binomial(link = "logit"))1b. (5 pts) Report the adjusted ORs with 95% CIs in
a publication-quality table using tbl_regression().
multi |>
tbl_regression(
exponentiate = TRUE,
label = list(
physhlth_days ~ "Physically unhealthy days",
sleep_hrs ~ "Sleep hours (per hour)",
age ~ "Age in years",
sex ~ "Sex",
bmi ~ "BMI"
)
) |>
bold_labels() |>
bold_p()| Characteristic | OR | 95% CI | p-value |
|---|---|---|---|
| Physically unhealthy days | 1.07 | 1.07, 1.08 | <0.001 |
| Sleep hours (per hour) | 0.85 | 0.80, 0.90 | <0.001 |
| Age in years | 0.97 | 0.96, 0.97 | <0.001 |
| Sex | |||
| Male | — | — | |
| Female | 1.75 | 1.48, 2.07 | <0.001 |
| BMI | 1.01 | 0.99, 1.02 | 0.3 |
| Abbreviations: CI = Confidence Interval, OR = Odds Ratio | |||
1c. (5 pts) Interpret the adjusted OR for two predictors of your choice in 1-2 sentences each. Make sure to mention what the OR represents (per unit change for continuous; reference category for categorical).
Each one hour increase in sleep hours is associated with 0.85 times the risk of frequent mental distress. Females have 1.75 times the risk of frequent mental distress compared to males.
2a. (5 pts) Identify the Wald p-value for each
predictor in your model from the tidy() or
summary() output.
tidy(multi, conf.int = TRUE, exponentiate = TRUE) |>
mutate(across(c(estimate, std.error, statistic, conf.low, conf.high),
\(x) round(x, 3)),
p.value = format.pval(p.value, digits = 3)) |>
kable(caption = "Wald Tests for Each Coefficient (Exponentiated)") |>
kable_styling(bootstrap_options = "striped", full_width = FALSE)| term | estimate | std.error | statistic | p.value | conf.low | conf.high |
|---|---|---|---|---|---|---|
| (Intercept) | 1.316 | 0.294 | 0.935 | 0.350 | 0.740 | 2.343 |
| physhlth_days | 1.073 | 0.004 | 18.471 | < 2e-16 | 1.065 | 1.081 |
| sleep_hrs | 0.848 | 0.028 | -5.959 | 2.53e-09 | 0.804 | 0.895 |
| age | 0.967 | 0.003 | -12.126 | < 2e-16 | 0.962 | 0.972 |
| sexFemale | 1.751 | 0.085 | 6.553 | 5.63e-11 | 1.481 | 2.071 |
| bmi | 1.007 | 0.006 | 1.063 | 0.288 | 0.994 | 1.019 |
2b. (5 pts) Fit a reduced model that drops one
predictor of your choice. Perform a likelihood ratio test comparing the
full and reduced models using
anova(reduced, full, test = "LRT").
reduced <- glm(fmd ~ physhlth_days + age + sex + bmi,
data = brfss_logistic,
family = binomial
)
anova(reduced, multi, test = "LRT") |>
kable(digits = 3,
caption = "LR Test: Does Sleep Improve the Model?") |>
kable_styling(bootstrap_options = "striped", full_width = FALSE)| Resid. Df | Resid. Dev | Df | Deviance | Pr(>Chi) |
|---|---|---|---|---|
| 4995 | 3711.105 | NA | NA | NA |
| 4994 | 3674.804 | 1 | 36.301 | 0 |
2c. (5 pts) Compare the conclusions from the Wald test (for the dropped predictor) and the LR test. Do they agree? In 2-3 sentences, explain when the two tests might disagree.
The Wald test and LR test agree. The p-value for sleep_hrs was 2.53e-09 in the Wald test and was 0 in the LR test, which are both very small p-values. The two tests might disagree if there is a small sample size or if the estimates themselves are extreme values.
3a. (5 pts) Fit a model that includes an interaction
between two predictors of your choice (e.g., exercise * sex
or smoker * age).
interact <- glm(fmd ~ sex * exercise,
data = brfss_logistic,
family = binomial
)
interact |>
tbl_regression(exponentiate = TRUE,
label = list(
exercise ~ "Exercise status",
sex ~ "Sex"
))| Characteristic | OR | 95% CI | p-value |
|---|---|---|---|
| Sex | |||
| Male | — | — | |
| Female | 1.72 | 1.31, 2.25 | <0.001 |
| Exercise status | |||
| No | — | — | |
| Yes | 0.59 | 0.46, 0.76 | <0.001 |
| Sex * Exercise status | |||
| Female * Yes | 0.98 | 0.70, 1.36 | 0.9 |
| Abbreviations: CI = Confidence Interval, OR = Odds Ratio | |||
3b. (5 pts) Perform a likelihood ratio test comparing the model with the interaction to the model without it.
no_interact <- glm(fmd ~ sex + exercise,
data = brfss_logistic,
family = binomial
)
anova(interact, no_interact, test = "LRT") |>
kable(digits = 3,
caption = "LR Test: Does Interaction Improve the Model?") |>
kable_styling(bootstrap_options = "striped", full_width = FALSE)| Resid. Df | Resid. Dev | Df | Deviance | Pr(>Chi) |
|---|---|---|---|---|
| 4996 | 4165.132 | NA | NA | NA |
| 4997 | 4165.149 | -1 | -0.017 | 0.896 |
3c. (5 pts) Create a visualization of the
interaction using ggpredict() and plot().
ggpredict(interact, terms = c("exercise", "sex")) |>
plot() +
labs(title = "Predicted Probability of FMD by Sex & Exercise",
x = "exercise", y = "Predicted Probability of FMD",
color = "sex") +
theme_minimal()3d. (5 pts) In 3-4 sentences, interpret the interaction. Does the effect of one predictor differ across levels of the other? If statistically significant, report the stratum-specific odds ratios.
The interaction between sex and exercise is not statistically significant as p > 0.05. Thus, the effect of exercise on frequent mental distress does not differ across sex. In the visualization, the two lines were essentially parallel, confirming the lack of interaction.
ggpredict(interact, terms = c("exercise", "sex")) |>
as_tibble() |>
pivot_wider(id_cols = group, names_from = x, values_from = predicted) |>
mutate(OR_yes_vs_no = (Yes / (1 - Yes)) / (No / (1 - No))) |>
dplyr::select(Sex = group, OR_yes_vs_no) |>
kable(digits = 3,
col.names = c("Sex", "OR (Exercise: Yes vs. No)"),
caption = "Stratum-Specific Odds Ratios for Exercise") |>
kable_styling(bootstrap_options = "striped", full_width = FALSE)| Sex | OR (Exercise: Yes vs. No) |
|---|---|
| Male | 0.590 |
| Female | 0.577 |
4a. (5 pts) Compute McFadden’s pseudo-R² for your
full model using performance::r2_mcfadden().
## # R2 for Generalized Linear Regression
## R2: 0.136
## adj. R2: 0.135
4b. (5 pts) Perform the Hosmer-Lemeshow
goodness-of-fit test using
ResourceSelection::hoslem.test(). Report the test statistic
and p-value. Comment on the interpretation given your sample size.
##
## Hosmer and Lemeshow goodness of fit (GOF) test
##
## data: as.numeric(brfss_logistic$fmd) - 1, fitted(multi)
## X-squared = 11.178, df = 8, p-value = 0.1918
The p-value is >0.05, indicating that the model fits well in all regions of predicted probability. With the large sample size, it is possible this test detects even very minor instances where the model does not fit.
4c. (5 pts) Create a calibration plot showing observed vs. predicted probabilities by decile. Comment on whether the model appears well calibrated.
brfss_logistic |>
mutate(pred_prob = fitted(multi),
obs_outcome = as.numeric(fmd) - 1,
decile = ntile(pred_prob, 10)) |>
group_by(decile) |>
summarise(
mean_pred = mean(pred_prob),
mean_obs = mean(obs_outcome),
.groups = "drop"
) |>
ggplot(aes(x = mean_pred, y = mean_obs)) +
geom_abline(slope = 1, intercept = 0, color = "red", linetype = "dashed") +
geom_point(size = 3, color = "steelblue") +
geom_line(color = "steelblue") +
labs(title = "Calibration Plot: Observed vs. Predicted Probability of FMD",
subtitle = "Points should fall on the dashed line for perfect calibration",
x = "Mean Predicted Probability (within decile)",
y = "Observed Proportion (within decile)") +
theme_minimal()The model appears well calibrated as the points do not deviate too far from the fitted line.
4d. (10 pts) Compute and plot the ROC curve using
pROC::roc(). Report the AUC. Based on the AUC value, how
would you describe the model’s discrimination ability (poor, acceptable,
excellent, outstanding)?
roc_obj <- roc(
response = brfss_logistic$fmd,
predictor = fitted(multi),
levels = c("No", "Yes"),
direction = "<"
)
auc_value <- auc(roc_obj)
ggroc(roc_obj, color = "steelblue", linewidth = 1.2) +
geom_abline(slope = 1, intercept = 1, linetype = "dashed", color = "red") +
labs(title = "ROC Curve for Frequent Mental Distress Model",
subtitle = paste0("AUC = ", round(auc_value, 3)),
x = "Specificity", y = "Sensitivity") +
theme_minimal()AUC = 0.761, which is acceptable and indicates the model can distinguish between individuals with and without FMD fairly well.
End of Lab Activity