Part 9: In-Class Lab Activity

EPI 553 — Logistic Regression Part 2 Lab Due: End of class, April 14, 2026


Instructions

In this lab, you will build a multiple logistic regression model, conduct hypothesis tests, examine an interaction, and assess goodness-of-fit and discrimination. Use the same BRFSS 2020 logistic dataset from Part 1. Work through each task systematically. You may discuss concepts with classmates, but your written answers and R code must be your own.

Submission: Knit your .Rmd to HTML and upload to Brightspace by end of class.


Data for the Lab

Variable Description Type
fmd Frequent mental distress (No/Yes) Factor (outcome)
menthlth_days Mentally unhealthy days (0–30) Numeric
physhlth_days Physically unhealthy days (0–30) Numeric
sleep_hrs Sleep hours per night (1–14) Numeric
age Age in years (capped at 80) Numeric
sex Sex (Male/Female) Factor
bmi Body mass index Numeric
exercise Exercised in past 30 days (No/Yes) Factor
income_cat Household income category (1–8) Numeric
smoker Former/Never vs. Current Factor
library(tidyverse)
library(broom)
library(knitr)
library(kableExtra)
library(gtsummary)
library(car)
library(ggeffects)
library(ResourceSelection)
library(pROC)
library(performance)

brfss_logistic <- readRDS(
  "~/Desktop/EPI553/data/brfss_logistic_2020 copy.rds"
)

Task 1: Build a Multiple Logistic Regression Model (15 points)

1a. (5 pts) Fit a multiple logistic regression model predicting fmd from at least 5 predictors of your choice.

mod_fmd <- glm(fmd ~ exercise + sex + physhlth_days + age + sleep_hrs, 
               data= brfss_logistic,
               family= binomial(link= "logit"))

1b. (5 pts) Report the adjusted ORs with 95% CIs in a publication-quality table using tbl_regression().

mod_fmd |>
  tbl_regression(
    exponentiate = TRUE,
    label = list(
      exercise      ~ "Exercise (past 30 days)",
      sex           ~ "Sex",
      physhlth_days ~ "Physically unhealthy days",
      age           ~ "Age (per year)",
      sleep_hrs     ~ "Sleep hours"
    )
  ) |>
  bold_labels() |>
  bold_p()
Characteristic OR 95% CI p-value
Exercise (past 30 days)


    No
    Yes 0.75 0.62, 0.90 0.002
Sex


    Male
    Female 1.75 1.48, 2.07 <0.001
Physically unhealthy days 1.07 1.06, 1.08 <0.001
Age (per year) 0.97 0.96, 0.97 <0.001
Sleep hours 0.85 0.80, 0.90 <0.001
Abbreviations: CI = Confidence Interval, OR = Odds Ratio

1c. (5 pts) Interpret the adjusted OR for two predictors of your choice in 1-2 sentences each. Make sure to mention what the OR represents (per unit change for continuous; reference category for categorical).

In comparison to those who did not exercise in the past 30 days (reference category), those who did exercise have 0.75 times the odds of frequent mental distress (aOR= 0.75, 95% CI: 0.62-0.90). In other words, there is 25% lower odds of fmd among those who exercised, after adjusting for sex, physical health, age, and sleep. Since the aOR is less than 1, it indicates that exercise is a protective factor.

For each additional hour of sleep, the odds of frequent mental distress decrease by 15% (aOR= 0.85, 95% CI= 0.80-0.90), after adjusting for exercise, sex, physical health, and age. Since aOR is less than 1, it indicates sleep is a protective factor.


Task 2: Wald and Likelihood Ratio Tests (15 points)

2a. (5 pts) Identify the Wald p-value for each predictor in your model from the tidy() or summary() output.

tidy(mod_fmd, conf.int = TRUE, exponentiate = TRUE) |>
  mutate(across(c(estimate, std.error, statistic, conf.low, conf.high),
                \(x) round(x, 3)),
         p.value = format.pval(p.value, digits = 3)) |>
  kable(caption = "Wald Tests for Each Coefficient (Exponentiated)") |>
  kable_styling(bootstrap_options = "striped", full_width = FALSE)
Wald Tests for Each Coefficient (Exponentiated)
term estimate std.error statistic p.value conf.low conf.high
(Intercept) 2.088 0.238 3.088 0.00202 1.309 3.332
exerciseYes 0.747 0.095 -3.072 0.00213 0.620 0.901
sexFemale 1.745 0.086 6.508 7.62e-11 1.476 2.065
physhlth_days 1.070 0.004 17.276 < 2e-16 1.062 1.078
age 0.966 0.003 -12.447 < 2e-16 0.961 0.971
sleep_hrs 0.849 0.027 -5.955 2.60e-09 0.804 0.896

exerciseYes: p-value= 0.002 sexFemale: p-value < 0.001 physhlth_days: p-value < 0.001 age: p-value < 0.001 sleep_hrs: p-value < 0.001

2b. (5 pts) Fit a reduced model that drops one predictor of your choice. Perform a likelihood ratio test comparing the full and reduced models using anova(reduced, full, test = "LRT").

mod_reduced <- glm(
  fmd ~ sex + physhlth_days + age + sleep_hrs,
  data = brfss_logistic,
  family = binomial
)

anova(mod_reduced, mod_fmd, test = "LRT") |>
  kable(digits = 3,
        caption = "LR Test: Does adding exercise improve the model?") |>
  kable_styling(bootstrap_options = "striped", full_width = FALSE)
LR Test: Does adding exercise improve the model?
Resid. Df Resid. Dev Df Deviance Pr(>Chi)
4995 3675.927 NA NA NA
4994 3666.679 1 9.248 0.002

2c. (5 pts) Compare the conclusions from the Wald test (for the dropped predictor) and the LR test. Do they agree? In 2-3 sentences, explain when the two tests might disagree.

The Wald test and the LR test agree. The Wald test was statistically significant for exercise (p-value= 0.002) and the likelihood ratio test was also significant (p-value= 0.002) which indicates that exercise is an important predictor. The two tests might disagree when sample sizes are small or with extreme estimates (coefficients). When sample sizes are small or coefficients are large the wald test becomes less reliable.

Task 3: Test an Interaction (20 points)

3a. (5 pts) Fit a model that includes an interaction between two predictors of your choice (e.g., exercise * sex or smoker * age).

mod_interaction <- glm(fmd ~ exercise * sex + physhlth_days + age + sleep_hrs, data= brfss_logistic, family= binomial
)

mod_interaction |>
  tbl_regression(exponentiate = TRUE) |>
  bold_labels() |>
  bold_p()
Characteristic OR 95% CI p-value
exercise


    No
    Yes 0.74 0.56, 0.97 0.027
sex


    Male
    Female 1.72 1.28, 2.31 <0.001
physhlth_days 1.07 1.06, 1.08 <0.001
IMPUTED AGE VALUE COLLAPSED ABOVE 80 0.97 0.96, 0.97 <0.001
sleep_hrs 0.85 0.80, 0.90 <0.001
exercise * sex


    Yes * Female 1.02 0.71, 1.46 >0.9
Abbreviations: CI = Confidence Interval, OR = Odds Ratio

3b. (5 pts) Perform a likelihood ratio test comparing the model with the interaction to the model without it.

mod_no_interaction <- glm(
  fmd ~ exercise + sex + age + sleep_hrs + physhlth_days,
  data = brfss_logistic,
  family = binomial
)

anova(mod_no_interaction, mod_interaction, test = "LRT") |>
  kable(digits = 3,
        caption = "LR Test for Exercise × Sex Interaction") |>
  kable_styling(bootstrap_options = "striped", full_width = FALSE)
LR Test for Exercise × Sex Interaction
Resid. Df Resid. Dev Df Deviance Pr(>Chi)
4994 3666.679 NA NA NA
4993 3666.664 1 0.016 0.9

3c. (5 pts) Create a visualization of the interaction using ggpredict() and plot().

ggpredict(mod_interaction, terms = c("exercise", "sex")) |>
  plot() +
  labs(title = "Predicted Probability of FMD by Exercise and Sex",
       x = "Exercise", y = "Predicted Probability of FMD",
       color = "Sex") +
  theme_minimal()

3d. (5 pts) In 3-4 sentences, interpret the interaction. Does the effect of one predictor differ across levels of the other? If statistically significant, report the stratum-specific odds ratios.

The interaction between exercise and sex is not statistically significant, the p-value is greater than 0.9. The likelihood test supports this with a p-value= 0.9 therefore the effect of exercise on fmd does not differ by sex. Since the interaction is not statistically significant the effect of both groups can be interpreted by themselves. This means females have 1.72 times the odds of fmd compared to males, regardless of exercise status. Also, the odds of fmd are 26% lower for those who exercise compared to those who do not.


Task 4: Goodness-of-Fit and Discrimination (25 points)

4a. (5 pts) Compute McFadden’s pseudo-R² for your full model using performance::r2_mcfadden().

performance::r2_mcfadden(mod_fmd)
## # R2 for Generalized Linear Regression
##        R2: 0.138
##   adj. R2: 0.137

4b. (5 pts) Perform the Hosmer-Lemeshow goodness-of-fit test using ResourceSelection::hoslem.test(). Report the test statistic and p-value. Comment on the interpretation given your sample size.

hl_test <- hoslem.test(
  x = as.numeric(brfss_logistic$fmd) - 1,
  y = fitted(mod_fmd),
  g = 10
)

hl_test
## 
##  Hosmer and Lemeshow goodness of fit (GOF) test
## 
## data:  as.numeric(brfss_logistic$fmd) - 1, fitted(mod_fmd)
## X-squared = 15.295, df = 8, p-value = 0.05366

test statistic (X-squared)= 15.295 p-value= 0.05366 Given the large sample size and a p-value just above 0.05, it is suggested that the model seems to fit the data but it should be interpreted with caution.

4c. (5 pts) Create a calibration plot showing observed vs. predicted probabilities by decile. Comment on whether the model appears well calibrated.

brfss_logistic |>
  mutate(pred_prob = fitted(mod_fmd),
         obs_outcome = as.numeric(fmd) - 1,
         decile = ntile(pred_prob, 10)) |>
  group_by(decile) |>
  summarise(
    mean_pred = mean(pred_prob),
    mean_obs  = mean(obs_outcome),
    .groups = "drop"
  ) |>
  ggplot(aes(x = mean_pred, y = mean_obs)) +
  geom_abline(slope = 1, intercept = 0, color = "red", linetype = "dashed") +
  geom_point(size = 3, color = "steelblue") +
  geom_line(color = "steelblue") +
  labs(title = "Calibration Plot: Observed vs. Predicted Probability of FMD",
       subtitle = "Points should fall on the dashed line for perfect calibration",
       x = "Mean Predicted Probability (within decile)",
       y = "Observed Proportion (within decile)") +
  theme_minimal()

The model is not perfectly well calibrated. There are points that fall both above and under the 45-degree line but they are all relatively close.

4d. (10 pts) Compute and plot the ROC curve using pROC::roc(). Report the AUC. Based on the AUC value, how would you describe the model’s discrimination ability (poor, acceptable, excellent, outstanding)?

roc_obj <- roc(
  response = brfss_logistic$fmd,
  predictor = fitted(mod_fmd),
  levels = c("No", "Yes"),
  direction = "<"
)

auc_value <- auc(roc_obj)

ggroc(roc_obj, color = "steelblue", linewidth = 1.2) +
  geom_abline(slope = 1, intercept = 1, linetype = "dashed", color = "red") +
  labs(title = "ROC Curve for Frequent Mental Distress Model",
       subtitle = paste0("AUC = ", round(auc_value, 3)),
       x = "Specificity", y = "Sensitivity") +
  theme_minimal()

The AUC= 0.763 Based on the AUC of 0.763, the models discrimination ability is acceptable. —

End of Lab Activity