In Part 1, we introduced the logistic model, the logit transformation, and the connection between logistic regression coefficients and odds ratios. We fit simple logistic regression models with single predictors and previewed multiple logistic regression.
In Part 2, we go deeper:
Textbook reference: Kleinbaum et al., Chapter 22 (Sections 22.4 and 22.5)
EPI 553 — Logistic Regression Part 2 Lab Due: End of class, April 14, 2026
In this lab, you will build a multiple logistic regression model, conduct hypothesis tests, examine an interaction, and assess goodness-of-fit and discrimination. Use the same BRFSS 2020 logistic dataset from Part 1. Work through each task systematically. You may discuss concepts with classmates, but your written answers and R code must be your own.
Submission: Knit your .Rmd to HTML and upload to Brightspace by end of class.
| Variable | Description | Type |
|---|---|---|
fmd |
Frequent mental distress (No/Yes) | Factor (outcome) |
menthlth_days |
Mentally unhealthy days (0–30) | Numeric |
physhlth_days |
Physically unhealthy days (0–30) | Numeric |
sleep_hrs |
Sleep hours per night (1–14) | Numeric |
age |
Age in years (capped at 80) | Numeric |
sex |
Sex (Male/Female) | Factor |
bmi |
Body mass index | Numeric |
exercise |
Exercised in past 30 days (No/Yes) | Factor |
income_cat |
Household income category (1–8) | Numeric |
smoker |
Former/Never vs. Current | Factor |
library(tidyverse)
library(knitr)
library(kableExtra)
library(broom)
library(gtsummary)
library(car)
library(ggeffects)
library(ResourceSelection) # for Hosmer-Lemeshow
library(pROC) # for ROC/AUC
library(performance) # for model performance metrics
options(gtsummary.use_ftExtra = TRUE)
set_gtsummary_theme(theme_gtsummary_compact(set_theme = TRUE))brfss_logistic <- readRDS(
"/Users/nataliasmall/Downloads/EPI 553/brfss_logistic_2020.rds")
dim(brfss_logistic)## [1] 5000 10
1a. (5 pts) Fit a multiple logistic regression model
predicting fmd from at least 5 predictors of your
choice.
mod_multi <- glm(fmd ~ physhlth_days + sleep_hrs + exercise + sex + age,
data = brfss_logistic,
family = binomial(link = "logit")) 1b. (5 pts) Report the adjusted ORs with 95% CIs in
a publication-quality table using tbl_regression().
mod_multi |>
tbl_regression(
exponentiate = TRUE,
label = list(
physhlth_days ~ "Physically unhealthy days",
sleep_hrs ~ "Sleep hours",
exercise ~ "Exercise (past 30 days)",
sex ~ "Sex",
age ~ "Age (per year)"
)
) |>
bold_labels() |>
bold_p()| Characteristic | OR | 95% CI | p-value |
|---|---|---|---|
| Physically unhealthy days | 1.07 | 1.06, 1.08 | <0.001 |
| Sleep hours | 0.85 | 0.80, 0.90 | <0.001 |
| Exercise (past 30 days) | |||
| No | — | — | |
| Yes | 0.75 | 0.62, 0.90 | 0.002 |
| Sex | |||
| Male | — | — | |
| Female | 1.75 | 1.48, 2.07 | <0.001 |
| Age (per year) | 0.97 | 0.96, 0.97 | <0.001 |
| Abbreviations: CI = Confidence Interval, OR = Odds Ratio | |||
1c. (5 pts) Interpret the adjusted OR for two
predictors of your choice in 1-2 sentences each. Make sure to mention
what the OR represents (per unit change for continuous; reference
category for categorical). Exercise (past 30 days)
Compared to those who have not exercised in the past 30 days (the
reference group), those who have excersied have 0.75 times the odds of
frequent mental distress, adjusting for all other variables in the
model.
SexCompared to males (the reference group), females
have 1.75 times the odds of frequent mental distress, holding all other
variables constant. —
2a. (5 pts) Identify the Wald p-value for each
predictor in your model from the tidy() or
summary() output.
tidy(mod_multi, conf.int = TRUE, exponentiate = TRUE) |>
mutate(across(c(estimate, std.error, statistic, conf.low, conf.high),
\(x) round(x, 3)),
p.value = format.pval(p.value, digits = 3)) |>
kable(caption = "Wald Tests for Each Coefficient (Exponentiated)") |>
kable_styling(bootstrap_options = "striped", full_width = FALSE)| term | estimate | std.error | statistic | p.value | conf.low | conf.high |
|---|---|---|---|---|---|---|
| (Intercept) | 2.088 | 0.238 | 3.088 | 0.00202 | 1.309 | 3.332 |
| physhlth_days | 1.070 | 0.004 | 17.276 | < 2e-16 | 1.062 | 1.078 |
| sleep_hrs | 0.849 | 0.027 | -5.955 | 2.60e-09 | 0.804 | 0.896 |
| exerciseYes | 0.747 | 0.095 | -3.072 | 0.00213 | 0.620 | 0.901 |
| sexFemale | 1.745 | 0.086 | 6.508 | 7.62e-11 | 1.476 | 2.065 |
| age | 0.966 | 0.003 | -12.447 | < 2e-16 | 0.961 | 0.971 |
2b. (5 pts) Fit a reduced model that drops one
predictor of your choice. Perform a likelihood ratio test comparing the
full and reduced models using
anova(reduced, full, test = "LRT").
mod_reduced <- glm(fmd ~ physhlth_days + sleep_hrs + exercise + sex,
data = brfss_logistic,
family = binomial)
anova(mod_reduced, mod_multi, test = "LRT") |>
kable(digits = 3,
caption = "LR Test: Does adding age improve the model?") |>
kable_styling(bootstrap_options = "striped", full_width = FALSE)| Resid. Df | Resid. Dev | Df | Deviance | Pr(>Chi) |
|---|---|---|---|---|
| 4995 | 3827.637 | NA | NA | NA |
| 4994 | 3666.679 | 1 | 160.958 | 0 |
2c. (5 pts) Compare the conclusions from the Wald test (for the dropped predictor) and the LR test. Do they agree? In 2-3 sentences, explain when the two tests might disagree. Interpretation: The Wald test highlights that age is independently significant contributer to predict fmd; likelihood ratio test (LR test) has a p-value < 0.001 indicating that the the dropped variable (age) jointly contributed to model fit. They do agree. Wald test might disagree from a likelihood ratio test when sample sizes are small or when coefficients are large, LR test is generally preferred for those situations.
3a. (5 pts) Fit a model that includes an interaction
between two predictors of your choice (e.g., exercise * sex
or smoker * age).
mod_interact <- glm(
fmd ~ exercise * sex + age + smoker + sleep_hrs + income_cat,
data = brfss_logistic,
family = binomial
)
mod_interact |>
tbl_regression(exponentiate = TRUE) |>
bold_labels() |>
bold_p()| Characteristic | OR | 95% CI | p-value |
|---|---|---|---|
| exercise | |||
| No | — | — | |
| Yes | 0.61 | 0.47, 0.79 | <0.001 |
| sex | |||
| Male | — | — | |
| Female | 1.66 | 1.26, 2.21 | <0.001 |
| IMPUTED AGE VALUE COLLAPSED ABOVE 80 | 0.97 | 0.97, 0.98 | <0.001 |
| smoker | |||
| Former/Never | — | — | |
| Current | 1.25 | 1.05, 1.48 | 0.012 |
| sleep_hrs | 0.81 | 0.77, 0.86 | <0.001 |
| income_cat | 0.85 | 0.82, 0.89 | <0.001 |
| exercise * sex | |||
| Yes * Female | 1.00 | 0.71, 1.41 | >0.9 |
| Abbreviations: CI = Confidence Interval, OR = Odds Ratio | |||
3b. (5 pts) Perform a likelihood ratio test comparing the model with the interaction to the model without it.
mod_no_interact <- glm(
fmd ~ exercise + sex + age + smoker + sleep_hrs + income_cat,
data = brfss_logistic,
family = binomial
)
anova(mod_no_interact, mod_interact, test = "LRT") |>
kable(digits = 3,
caption = "LR Test for Exercise × Sex Interaction") |>
kable_styling(bootstrap_options = "striped", full_width = FALSE)| Resid. Df | Resid. Dev | Df | Deviance | Pr(>Chi) |
|---|---|---|---|---|
| 4993 | 3870.197 | NA | NA | NA |
| 4992 | 3870.197 | 1 | 0 | 0.987 |
3c. (5 pts) Create a visualization of the
interaction using ggpredict() and plot().
ggpredict(mod_interact, terms = c("exercise", "sex")) |>
plot() +
labs(title = "Predicted Probability of FMD by Exercise and Sex",
x = "Exercise", y = "Predicted Probability of FMD",
color = "Sex") +
theme_minimal()3d. (5 pts) In 3-4 sentences, interpret the interaction. Does the effect of one predictor differ across levels of the other? If statistically significant, report the stratum-specific odds ratios. Interpretation: Since the p-value for the LR test is 0.987 (> 0.05), the interaction is not statistically significant, the effect of exercise does not differ by sex. We can drop the interaction term and use the simpler main-effects model. The visualization further confirms this conclusion, since the lines are relatively parallel, there is no interaction. There is no need to report the stratum-specific odds ratios because the interaction is not statistically significant.
4a. (5 pts) Compute McFadden’s pseudo-R² for your
full model using performance::r2_mcfadden().
## # R2 for Generalized Linear Regression
## R2: 0.138
## adj. R2: 0.137
4b. (5 pts) Perform the Hosmer-Lemeshow
goodness-of-fit test using
ResourceSelection::hoslem.test(). Report the test statistic
and p-value. Comment on the interpretation given your sample size.
##
## Hosmer and Lemeshow goodness of fit (GOF) test
##
## data: mod_multi$y, fitted(mod_multi)
## X-squared = 15.295, df = 8, p-value = 0.05366
X-squared:15.295 p-value:0.05366 Interpretation:Since the p-value, 0.05366, is greater than 0.05 the Hosmer-Lemeshow test suggests that the model had a good fit. While the test can be over-powered and detect trivial misfit given the large sample size, N = 5000,the model still passes the threshold for a good fit.
4c. (5 pts) Create a calibration plot showing observed vs. predicted probabilities by decile. Comment on whether the model appears well calibrated.
brfss_logistic |>
mutate(pred_prob = fitted(mod_multi),
obs_outcome = as.numeric(fmd) - 1,
decile = ntile(pred_prob, 10)) |>
group_by(decile) |>
summarise(
mean_pred = mean(pred_prob),
mean_obs = mean(obs_outcome),
.groups = "drop"
) |>
ggplot(aes(x = mean_pred, y = mean_obs)) +
geom_abline(slope = 1, intercept = 0, color = "red", linetype = "dashed") +
geom_point(size = 3, color = "steelblue") +
geom_line(color = "steelblue") +
labs(title = "Calibration Plot: Observed vs. Predicted Probability of FMD",
subtitle = "Points should fall on the dashed line for perfect calibration",
x = "Mean Predicted Probability (within decile)",
y = "Observed Proportion (within decile)") +
theme_minimal()
Interpretation:The points fall closely along the
45-degree dashed line, indicating good calibration. While there are
minor fluctuations in the middle deciles, the model does not
consistently over-predict or under-predict the probability of fmd,
suggesting the predicted probabilities are reliable.
4d. (10 pts) Compute and plot the ROC curve using
pROC::roc(). Report the AUC. Based on the AUC value, how
would you describe the model’s discrimination ability (poor, acceptable,
excellent, outstanding)?
roc_obj <- roc(
response = brfss_logistic$fmd,
predictor = fitted(mod_multi),
levels = c("No", "Yes"),
direction = "<"
)
auc_value <- auc(roc_obj)
ggroc(roc_obj, color = "steelblue", linewidth = 1.2) +
geom_abline(slope = 1, intercept = 1, linetype = "dashed", color = "red") +
labs(title = "ROC Curve for Frequent Mental Distress Model",
subtitle = paste0("AUC = ", round(auc_value, 3)),
x = "Specificity", y = "Sensitivity") +
theme_minimal()
Interpretation:The AUC is 0.763, which is falls between
0.75 and 0.80. This indicates an acceptable discrimination, meaning the
model can distinguish between individuals with and without FMD
reasonably well (correctly identifying the person with fmd about 76% of
the time).
End of Lab Activity