Part 9: In-Class Lab Activity

EPI 553 — Logistic Regression Part 2 Lab Due: End of class, April 14, 2026


Instructions

In this lab, you will build a multiple logistic regression model, conduct hypothesis tests, examine an interaction, and assess goodness-of-fit and discrimination. Use the same BRFSS 2020 logistic dataset from Part 1. Work through each task systematically. You may discuss concepts with classmates, but your written answers and R code must be your own.

Submission: Knit your .Rmd to HTML and upload to Brightspace by end of class.


Data for the Lab

Variable Description Type
fmd Frequent mental distress (No/Yes) Factor (outcome)
menthlth_days Mentally unhealthy days (0–30) Numeric
physhlth_days Physically unhealthy days (0–30) Numeric
sleep_hrs Sleep hours per night (1–14) Numeric
age Age in years (capped at 80) Numeric
sex Sex (Male/Female) Factor
bmi Body mass index Numeric
exercise Exercised in past 30 days (No/Yes) Factor
income_cat Household income category (1–8) Numeric
smoker Former/Never vs. Current Factor
library(tidyverse)
## Warning: package 'tidyverse' was built under R version 4.5.2
## Warning: package 'ggplot2' was built under R version 4.5.2
## Warning: package 'tibble' was built under R version 4.5.2
## Warning: package 'tidyr' was built under R version 4.5.2
## Warning: package 'readr' was built under R version 4.5.2
## Warning: package 'purrr' was built under R version 4.5.2
## Warning: package 'dplyr' was built under R version 4.5.2
## Warning: package 'stringr' was built under R version 4.5.2
## Warning: package 'forcats' was built under R version 4.5.2
## Warning: package 'lubridate' was built under R version 4.5.2
## ── Attaching core tidyverse packages ──────────────────────── tidyverse 2.0.0 ──
## ✔ dplyr     1.2.0     ✔ readr     2.2.0
## ✔ forcats   1.0.1     ✔ stringr   1.6.0
## ✔ ggplot2   4.0.2     ✔ tibble    3.3.1
## ✔ lubridate 1.9.5     ✔ tidyr     1.3.2
## ✔ purrr     1.2.1     
## ── Conflicts ────────────────────────────────────────── tidyverse_conflicts() ──
## ✖ dplyr::filter() masks stats::filter()
## ✖ dplyr::lag()    masks stats::lag()
## ℹ Use the conflicted package (<http://conflicted.r-lib.org/>) to force all conflicts to become errors
library(broom)
## Warning: package 'broom' was built under R version 4.5.2
library(knitr)
## Warning: package 'knitr' was built under R version 4.5.2
library(kableExtra)
## Warning: package 'kableExtra' was built under R version 4.5.2
## 
## Attaching package: 'kableExtra'
## 
## The following object is masked from 'package:dplyr':
## 
##     group_rows
library(gtsummary)
## Warning: package 'gtsummary' was built under R version 4.5.2
library(car)
## Warning: package 'car' was built under R version 4.5.2
## Loading required package: carData
## Warning: package 'carData' was built under R version 4.5.2
## 
## Attaching package: 'car'
## 
## The following object is masked from 'package:dplyr':
## 
##     recode
## 
## The following object is masked from 'package:purrr':
## 
##     some
library(ggeffects)
## Warning: package 'ggeffects' was built under R version 4.5.2
library(ResourceSelection)
## Warning: package 'ResourceSelection' was built under R version 4.5.3
## ResourceSelection 0.3-6   2023-06-27
library(pROC)
## Warning: package 'pROC' was built under R version 4.5.3
## Type 'citation("pROC")' for a citation.
## 
## Attaching package: 'pROC'
## 
## The following objects are masked from 'package:stats':
## 
##     cov, smooth, var
library(performance)
## Warning: package 'performance' was built under R version 4.5.3
brfss_logistic <- readRDS(
  "C:/Users/abbym/OneDrive/Desktop/STATS553/R Materials/epi553/scripts/brfss_logistic.rds"
)

Task 1: Build a Multiple Logistic Regression Model (15 points)

mod_full <- glm(
  fmd ~ exercise + smoker + age + sex + sleep_hrs,
  data = brfss_logistic,
  family = binomial(link = "logit")
)
mod_full |>
  tbl_regression(
    exponentiate = TRUE,
    label = list(
      exercise      ~ "Exercise (past 30 days)",
      smoker        ~ "Smoking status",
      age           ~ "Age (per year)",
      sex           ~ "Sex",
      sleep_hrs     ~ "Sleep hours"
    )
  ) |>
  bold_labels() |>
  bold_p()
Characteristic OR 95% CI p-value
Exercise (past 30 days)


    No
    Yes 0.54 0.46, 0.65 <0.001
Smoking status


    Former/Never
    Current 1.45 1.23, 1.71 <0.001
Age (per year) 0.98 0.97, 0.98 <0.001
Sex


    Male
    Female 1.77 1.51, 2.08 <0.001
Sleep hours 0.81 0.76, 0.85 <0.001
Abbreviations: CI = Confidence Interval, OR = Odds Ratio

1a. (5 pts) Fit a multiple logistic regression model predicting fmd from at least 5 predictors of your choice.

1b. (5 pts) Report the adjusted ORs with 95% CIs in a publication-quality table using tbl_regression().

1c. (5 pts) Interpret the adjusted OR for two predictors of your choice in 1-2 sentences each. Make sure to mention what the OR represents (per unit change for continuous; reference category for categorical). Compared to those that don’t smoke, those that do smoke have 45% increased odds of having frequent mental distress, holding all other variables constant. For every one additional hour of sleep, the odds of having frequent mental distress decreases by 19%, holding all other variables constant. —

Task 2: Wald and Likelihood Ratio Tests (15 points)

tidy(mod_full, conf.int = TRUE, exponentiate = TRUE) |>
  mutate(across(c(estimate, std.error, statistic, conf.low, conf.high),
                \(x) round(x, 3)),
         p.value = format.pval(p.value, digits = 3)) |>
  kable(caption = "Wald Tests for Each Coefficient (Exponentiated)") |>
  kable_styling(bootstrap_options = "striped", full_width = FALSE)
Wald Tests for Each Coefficient (Exponentiated)
term estimate std.error statistic p.value conf.low conf.high
(Intercept) 2.788 0.246 4.175 2.98e-05 1.723 4.514
exerciseYes 0.544 0.088 -6.921 4.48e-12 0.458 0.646
smokerCurrent 1.451 0.085 4.381 1.18e-05 1.228 1.713
age 0.976 0.003 -9.107 < 2e-16 0.971 0.981
sexFemale 1.771 0.082 6.935 4.07e-12 1.507 2.082
sleep_hrs 0.807 0.028 -7.813 5.56e-15 0.764 0.851
mod_reduced <- glm(
  fmd ~ exercise + smoker + age + sex,
  data = brfss_logistic,
  family = binomial
)

anova(mod_reduced, mod_full, test = "LRT") |>
  kable(digits = 3,
        caption = "LR Test: Does adding exercise + smoker improve the model?") |>
  kable_styling(bootstrap_options = "striped", full_width = FALSE)
LR Test: Does adding exercise + smoker improve the model?
Resid. Df Resid. Dev Df Deviance Pr(>Chi)
4995 4000.588 NA NA NA
4994 3938.021 1 62.567 0

2a. (5 pts) Identify the Wald p-value for each predictor in your model from the tidy() or summary() output.

2b. (5 pts) Fit a reduced model that drops one predictor of your choice. Perform a likelihood ratio test comparing the full and reduced models using anova(reduced, full, test = "LRT").

2c. (5 pts) Compare the conclusions from the Wald test (for the dropped predictor) and the LR test. Do they agree? In 2-3 sentences, explain when the two tests might disagree. The Wald test p-value for sleep_hrs was 5.56e-15 and the LR test p-value was 0. This means that both tests show that sleep_hrs improves the model. The two tests might disagree when the other variables have an impact on the variable that has been dropped. Sometimes a single variable alone doesn’t provide significant information, but when combined with other variables in a model, it does provide significant information. —

Task 3: Test an Interaction (20 points)

mod_int <- glm(
  fmd ~ exercise * smoker + age + sex + sleep_hrs,
  data = brfss_logistic,
  family = binomial(link = "logit")
)
anova(mod_int, mod_full, test = "LRT") |>
  kable(digits = 3,
        caption = "LR Test: Does adding exercise + smoker interaction improve the model?") |>
  kable_styling(bootstrap_options = "striped", full_width = FALSE)
LR Test: Does adding exercise + smoker interaction improve the model?
Resid. Df Resid. Dev Df Deviance Pr(>Chi)
4993 3936.635 NA NA NA
4994 3938.021 -1 -1.386 0.239
ggpredict(mod_int, terms = c("exercise", "smoker")) |>
  plot() +
  labs(title = "Predicted Probability of FMD by Exercise and Smoker",
       x = "Exercise", y = "Predicted Probability of FMD",
       color = "Smoker") +
  theme_minimal()
## Ignoring unknown labels:
## • linetype : "smoker"
## • shape : "smoker"

3a. (5 pts) Fit a model that includes an interaction between two predictors of your choice (e.g., exercise * sex or smoker * age).

3b. (5 pts) Perform a likelihood ratio test comparing the model with the interaction to the model without it.

3c. (5 pts) Create a visualization of the interaction using ggpredict() and plot().

3d. (5 pts) In 3-4 sentences, interpret the interaction. Does the effect of one predictor differ across levels of the other? If statistically significant, report the stratum-specific odds ratios. The effect of exercise on frequent mental distress does not differ by sex, and vice versa. The p-value of the likelihood ratio test is 0.239, meaning the interaction is not significant. This means it does not improve or provide significant information to the model. —

Task 4: Goodness-of-Fit and Discrimination (25 points)

performance::r2_mcfadden(mod_full)
## # R2 for Generalized Linear Regression
##        R2: 0.074
##   adj. R2: 0.073
hl_test <- hoslem.test(
  x = as.numeric(brfss_logistic$fmd) - 1,
  y = fitted(mod_full),
  g = 10
)

hl_test
## 
##  Hosmer and Lemeshow goodness of fit (GOF) test
## 
## data:  as.numeric(brfss_logistic$fmd) - 1, fitted(mod_full)
## X-squared = 17.02, df = 8, p-value = 0.0299
brfss_logistic |>
  mutate(pred_prob = fitted(mod_full),
         obs_outcome = as.numeric(fmd) - 1,
         decile = ntile(pred_prob, 10)) |>
  group_by(decile) |>
  summarise(
    mean_pred = mean(pred_prob),
    mean_obs  = mean(obs_outcome),
    .groups = "drop"
  ) |>
  ggplot(aes(x = mean_pred, y = mean_obs)) +
  geom_abline(slope = 1, intercept = 0, color = "red", linetype = "dashed") +
  geom_point(size = 3, color = "steelblue") +
  geom_line(color = "steelblue") +
  labs(title = "Calibration Plot: Observed vs. Predicted Probability of FMD",
       subtitle = "Points should fall on the dashed line for perfect calibration",
       x = "Mean Predicted Probability (within decile)",
       y = "Observed Proportion (within decile)") +
  theme_minimal()

roc_obj <- roc(
  response = brfss_logistic$fmd,
  predictor = fitted(mod_full),
  levels = c("No", "Yes"),
  direction = "<"
)

auc_value <- auc(roc_obj)

ggroc(roc_obj, color = "steelblue", linewidth = 1.2) +
  geom_abline(slope = 1, intercept = 1, linetype = "dashed", color = "red") +
  labs(title = "ROC Curve for Frequent Mental Distress Model",
       subtitle = paste0("AUC = ", round(auc_value, 3)),
       x = "Specificity", y = "Sensitivity") +
  theme_minimal()

4a. (5 pts) Compute McFadden’s pseudo-R² for your full model using performance::r2_mcfadden().

4b. (5 pts) Perform the Hosmer-Lemeshow goodness-of-fit test using ResourceSelection::hoslem.test(). Report the test statistic and p-value. Comment on the interpretation given your sample size. The Hosmer-Lemeshow test had a p-value of 0.0299, which means that the model does not fit well in some of the regions of predicted probability, but since we are using a large sample size, the test could be overpowered and this p-value could be representative of very trivial and small shifts. 4c. (5 pts) Create a calibration plot showing observed vs. predicted probabilities by decile. Comment on whether the model appears well calibrated. The model looks fairly well calibrated. It follows the 45 degree line pretty well, with only a few more major deviations from it. 4d. (10 pts) Compute and plot the ROC curve using pROC::roc(). Report the AUC. Based on the AUC value, how would you describe the model’s discrimination ability (poor, acceptable, excellent, outstanding)? The AUC is 0.695. This means that the model’s discrimination ability is poor. —

End of Lab Activity