Part 9: In-Class Lab Activity

EPI 553 — Logistic Regression Part 2 Lab Due: End of class, April 14, 2026


Instructions

In this lab, you will build a multiple logistic regression model, conduct hypothesis tests, examine an interaction, and assess goodness-of-fit and discrimination. Use the same BRFSS 2020 logistic dataset from Part 1. Work through each task systematically. You may discuss concepts with classmates, but your written answers and R code must be your own.

Submission: Knit your .Rmd to HTML and upload to Brightspace by end of class.


Data for the Lab

Variable Description Type
fmd Frequent mental distress (No/Yes) Factor (outcome)
menthlth_days Mentally unhealthy days (0–30) Numeric
physhlth_days Physically unhealthy days (0–30) Numeric
sleep_hrs Sleep hours per night (1–14) Numeric
age Age in years (capped at 80) Numeric
sex Sex (Male/Female) Factor
bmi Body mass index Numeric
exercise Exercised in past 30 days (No/Yes) Factor
income_cat Household income category (1–8) Numeric
smoker Former/Never vs. Current Factor
library(tidyverse)
library(broom)
library(knitr)
library(kableExtra)
library(gtsummary)
library(car)
library(ggeffects)
library(ResourceSelection)
library(pROC)
library(performance)

brfss_logistic <- readRDS("data/brfss_logistic_2020.rds"
)

Task 1: Build a Multiple Logistic Regression Model (15 points)

1a. (5 pts) Fit a multiple logistic regression model predicting fmd from at least 5 predictors of your choice.

mod_full <- glm(
  fmd ~ exercise + age + sex + sleep_hrs + income_cat,
  data = brfss_logistic,
  family = binomial(link = "logit")
)

1b. (5 pts) Report the adjusted ORs with 95% CIs in a publication-quality table using tbl_regression().

mod_full |>
  tbl_regression(
    exponentiate = TRUE,
    label = list(
      exercise      ~ "Exercise (past 30 days)",
      age           ~ "Age (per year)",
      sex           ~ "Sex",
      sleep_hrs     ~ "Sleep hours",
      income_cat    ~ "Income category (per unit)"
    )
  ) |>
  bold_labels() |>
  bold_p()
Characteristic OR 95% CI p-value
Exercise (past 30 days)


    No
    Yes 0.60 0.51, 0.72 <0.001
Age (per year) 0.97 0.97, 0.98 <0.001
Sex


    Male
    Female 1.67 1.42, 1.96 <0.001
Sleep hours 0.81 0.77, 0.86 <0.001
Income category (per unit) 0.84 0.81, 0.88 <0.001
Abbreviations: CI = Confidence Interval, OR = Odds Ratio

1c. (5 pts) Interpret the adjusted OR for two predictors of your choice in 1-2 sentences each. Make sure to mention what the OR represents (per unit change for continuous; reference category for categorical).

The aOR for age suggests that age reduces the odds of having frequent mental distress by 0.97 per year. This means that the older individuals are more likely to have less mental distress. The aOR for sex suggests that females have 1.67 times the odds of having frequent mental distress as compared to males. This means that males are more likely to have mental distress.

Task 2: Wald and Likelihood Ratio Tests (15 points)

2a. (5 pts) Identify the Wald p-value for each predictor in your model from the tidy() or summary() output.

tidy(mod_full, conf.int = TRUE, exponentiate = TRUE) |>
  mutate(across(c(estimate, std.error, statistic, conf.low, conf.high),
                \(x) round(x, 3)),
         p.value = format.pval(p.value, digits = 3)) |>
  kable(caption = "Wald Tests for Each Coefficient (Exponentiated)") |>
  kable_styling(bootstrap_options = "striped", full_width = FALSE)
Wald Tests for Each Coefficient (Exponentiated)
term estimate std.error statistic p.value conf.low conf.high
(Intercept) 8.967 0.247 8.890 < 2e-16 5.534 14.564
exerciseYes 0.603 0.090 -5.635 1.75e-08 0.506 0.720
age 0.973 0.003 -10.551 < 2e-16 0.968 0.978
sexFemale 1.668 0.083 6.138 8.37e-10 1.417 1.965
sleep_hrs 0.811 0.027 -7.705 1.31e-14 0.769 0.855
income_cat 0.844 0.019 -9.042 < 2e-16 0.814 0.876

2b. (5 pts) Fit a reduced model that drops one predictor of your choice. Perform a likelihood ratio test comparing the full and reduced models using anova(reduced, full, test = "LRT").

mod_reduced <- glm(
  fmd ~ exercise + age + sex + sleep_hrs,
  data = brfss_logistic,
  family = binomial
)

anova(mod_reduced, mod_full, test = "LRT") |>
  kable(digits = 3,
        caption = "LR Test: Does adding income improve the model?") |>
  kable_styling(bootstrap_options = "striped", full_width = FALSE)
LR Test: Does adding income improve the model?
Resid. Df Resid. Dev Df Deviance Pr(>Chi)
4995 3957.029 NA NA NA
4994 3876.520 1 80.509 0

2c. (5 pts) Compare the conclusions from the Wald test (for the dropped predictor) and the LR test. Do they agree? In 2-3 sentences, explain when the two tests might disagree. The Wald test shows that income is significantly associated with frequent mental distress. The likelihood ratio test which dropped the income variable showed that the full model (with income variable) is significantly better at predicting mental distress; therefore, these two tests agree. These test may not agree when there is a small sample size.


Task 3: Test an Interaction (20 points)

3a. (5 pts) Fit a model that includes an interaction between two predictors of your choice (e.g., exercise * sex or smoker * age).

mod_interact <- glm(
  fmd ~ exercise*sleep_hrs + sex + age + income_cat,
  data = brfss_logistic,
  family = binomial
)

mod_interact |>
  tbl_regression(exponentiate = TRUE) |>
  bold_labels() |>
  bold_p()
Characteristic OR 95% CI p-value
exercise


    No
    Yes 1.09 0.52, 2.27 0.8
sleep_hrs 0.86 0.79, 0.93 <0.001
sex


    Male
    Female 1.67 1.42, 1.97 <0.001
IMPUTED AGE VALUE COLLAPSED ABOVE 80 0.97 0.97, 0.98 <0.001
income_cat 0.85 0.81, 0.88 <0.001
exercise * sleep_hrs


    Yes * sleep_hrs 0.91 0.82, 1.02 0.10
Abbreviations: CI = Confidence Interval, OR = Odds Ratio

3b. (5 pts) Perform a likelihood ratio test comparing the model with the interaction to the model without it.

mod_no_interact <- glm(
  fmd ~ exercise + sleep_hrs + sex + age + income_cat,
  data = brfss_logistic,
  family = binomial
)

anova(mod_no_interact, mod_interact, test = "LRT") |>
  kable(digits = 3,
        caption = "LR Test for Exercise × Sleep Interaction") |>
  kable_styling(bootstrap_options = "striped", full_width = FALSE)
LR Test for Exercise × Sleep Interaction
Resid. Df Resid. Dev Df Deviance Pr(>Chi)
4994 3876.520 NA NA NA
4993 3873.862 1 2.657 0.103

3c. (5 pts) Create a visualization of the interaction using ggpredict() and plot().

ggpredict(mod_interact, terms = c("exercise", "sleep_hrs")) |>
  plot() +
  labs(title = "Predicted Probability of FMD by Exercise and Sleep Hours",
       x = "exercise", y = "Predicted Probability of FMD",
       color = "sleep_hrs") +
  theme_minimal()

3d. (5 pts) In 3-4 sentences, interpret the interaction. Does the effect of one predictor differ across levels of the other? If statistically significant, report the stratum-specific odds ratios. Exercise does not differ across sleep hours. The interaction was not found to be statistically significant. The likelihood ratio test shows that the model without the interaction is better at predicting mental distress. The plot shows that the relationship between sleep and mental distress does not change depending on exercise. —

Task 4: Goodness-of-Fit and Discrimination (25 points)

4a. (5 pts) Compute McFadden’s pseudo-R² for your full model using performance::r2_mcfadden().

performance::r2_mcfadden(mod_full)
## # R2 for Generalized Linear Regression
##        R2: 0.088
##   adj. R2: 0.088

4b. (5 pts) Perform the Hosmer-Lemeshow goodness-of-fit test using ResourceSelection::hoslem.test(). Report the test statistic and p-value. Comment on the interpretation given your sample size.

hl_test <- hoslem.test(
  x = as.numeric(brfss_logistic$fmd) - 1,
  y = fitted(mod_full),
  g = 10
)

hl_test
## 
##  Hosmer and Lemeshow goodness of fit (GOF) test
## 
## data:  as.numeric(brfss_logistic$fmd) - 1, fitted(mod_full)
## X-squared = 18.281, df = 8, p-value = 0.01922

The Hosmer-Lemeshow test resulted in a test statistic of 18.28 and a p-value = 0.019. This low and significant p-value suggests a poor fit. Larger sample sizes like this study can be over-powered and result in detection of misfit.

4c. (5 pts) Create a calibration plot showing observed vs. predicted probabilities by decile. Comment on whether the model appears well calibrated.

brfss_logistic |>
  mutate(pred_prob = fitted(mod_full),
         obs_outcome = as.numeric(fmd) - 1,
         decile = ntile(pred_prob, 10)) |>
  group_by(decile) |>
  summarise(
    mean_pred = mean(pred_prob),
    mean_obs  = mean(obs_outcome),
    .groups = "drop"
  ) |>
  ggplot(aes(x = mean_pred, y = mean_obs)) +
  geom_abline(slope = 1, intercept = 0, color = "red", linetype = "dashed") +
  geom_point(size = 3, color = "steelblue") +
  geom_line(color = "steelblue") +
  labs(title = "Calibration Plot: Observed vs. Predicted Probability of FMD",
       subtitle = "Points should fall on the dashed line for perfect calibration",
       x = "Mean Predicted Probability (within decile)",
       y = "Observed Proportion (within decile)") +
  theme_minimal()

The model appears to be well calibrated given that the observed proportion align closely with the predicted probabilities. There are two instances of largers over and under prediction

4d. (10 pts) Compute and plot the ROC curve using pROC::roc(). Report the AUC. Based on the AUC value, how would you describe the model’s discrimination ability (poor, acceptable, excellent, outstanding)?

roc_obj <- roc(
  response = brfss_logistic$fmd,
  predictor = fitted(mod_full),
  levels = c("No", "Yes"),
  direction = "<"
)

auc_value <- auc(roc_obj)

ggroc(roc_obj, color = "steelblue", linewidth = 1.2) +
  geom_abline(slope = 1, intercept = 1, linetype = "dashed", color = "red") +
  labs(title = "ROC Curve for Frequent Mental Distress Model",
       subtitle = paste0("AUC = ", round(auc_value, 3)),
       x = "Specificity", y = "Sensitivity") +
  theme_minimal()

The AUC is 0.717. This indicates that the model’s disrimination ability is acceptable. —

End of Lab Activity