Introduction

In Part 1, we introduced the logistic model, the logit transformation, and the connection between logistic regression coefficients and odds ratios. We fit simple logistic regression models with single predictors and previewed multiple logistic regression.

In Part 2, we go deeper:

  1. Multiple logistic regression with adjusted odds ratios for confounder control
  2. Maximum likelihood estimation and the Wald and likelihood ratio tests
  3. Confidence intervals for adjusted odds ratios
  4. Interaction terms in logistic regression and effect modification
  5. Goodness-of-fit: deviance, Hosmer-Lemeshow test, pseudo-R²
  6. Model discrimination: ROC curves and the area under the curve (AUC)
  7. Diagnostics: residuals, influential observations, and linearity in the logit

Textbook reference: Kleinbaum et al., Chapter 22 (Sections 22.4 and 22.5)


Setup and Data

library(tidyverse)
library(knitr)
library(kableExtra)
library(broom)
library(gtsummary)
library(car)
library(ggeffects)
library(ResourceSelection)  # for Hosmer-Lemeshow
library(pROC)               # for ROC/AUC
library(performance)        # for model performance metrics
library(sjPlot)
library(modelsummary)

options(gtsummary.use_ftExtra = TRUE)
set_gtsummary_theme(theme_gtsummary_compact(set_theme = TRUE))
brfss_logistic <- readRDS(
  "C:/Users/userp/OneDrive/Рабочий стол/HSTA553/R files/brfss_logistic_2020.rds"
)

dim(brfss_logistic)
## [1] 5000   10

Outcome: fmd — Frequent Mental Distress (1 = 14+ mentally unhealthy days in past 30, 0 = otherwise).


Part 1: Multiple Logistic Regression

The Adjusted Model

We extend the simple model to include several predictors:

\[\text{logit}[\Pr(Y = 1)] = \beta_0 + \beta_1 X_1 + \beta_2 X_2 + \cdots + \beta_k X_k\]

Each coefficient \(\beta_j\) represents the change in log-odds for a one-unit increase in \(X_j\), holding all other predictors constant. Exponentiating gives the adjusted odds ratio:

\[\text{aOR}_j = e^{\beta_j}\]

mod_full <- glm(
  fmd ~ exercise + smoker + age + sex + sleep_hrs + income_cat + bmi + physhlth_days,
  data = brfss_logistic,
  family = binomial(link = "logit")
)

mod_full |>
  tbl_regression(
    exponentiate = TRUE,
    label = list(
      exercise      ~ "Exercise (past 30 days)",
      smoker        ~ "Smoking status",
      age           ~ "Age (per year)",
      sex           ~ "Sex",
      sleep_hrs     ~ "Sleep hours",
      income_cat    ~ "Income category (per unit)",
      bmi           ~ "BMI",
      physhlth_days ~ "Physically unhealthy days"
    )
  ) |>
  bold_labels() |>
  bold_p()
Characteristic OR 95% CI p-value
Exercise (past 30 days)


    No
    Yes 0.82 0.68, 0.99 0.041
Smoking status


    Former/Never
    Current 1.30 1.09, 1.56 0.004
Age (per year) 0.97 0.96, 0.97 <0.001
Sex


    Male
    Female 1.68 1.41, 1.98 <0.001
Sleep hours 0.86 0.81, 0.90 <0.001
Income category (per unit) 0.91 0.87, 0.95 <0.001
BMI 1.01 0.99, 1.02 0.4
Physically unhealthy days 1.06 1.06, 1.07 <0.001
Abbreviations: CI = Confidence Interval, OR = Odds Ratio

Interpretation: Each row gives the adjusted odds ratio (aOR) and 95% CI for one predictor, controlling for all others. For example, the aOR for current smoking compares the odds of frequent mental distress for current smokers vs. former/never smokers, after adjusting for age, sex, sleep, income, BMI, exercise, and physical health. An aOR > 1 indicates a risk factor; an aOR < 1 indicates a protective factor.

Scaling Continuous Predictors

A 1-year change in age or a 1-unit change in BMI is rarely the most clinically meaningful comparison. We can rescale to improve interpretation:

mod_scaled <- glm(
  fmd ~ exercise + smoker + I(age/10) + sex + sleep_hrs +
        income_cat + I(bmi/5) + physhlth_days,
  data = brfss_logistic,
  family = binomial
)

mod_scaled |>
  tbl_regression(
    exponentiate = TRUE,
    label = list(
      "I(age/10)"  ~ "Age (per 10 years)",
      "I(bmi/5)"   ~ "BMI (per 5 units)"
    )
  ) |>
  bold_labels()
Characteristic OR 95% CI p-value
exercise


    No
    Yes 0.82 0.68, 0.99 0.041
smoker


    Former/Never
    Current 1.30 1.09, 1.56 0.004
Age (per 10 years) 0.73 0.69, 0.77 <0.001
sex


    Male
    Female 1.68 1.41, 1.98 <0.001
sleep_hrs 0.86 0.81, 0.90 <0.001
income_cat 0.91 0.87, 0.95 <0.001
BMI (per 5 units) 1.03 0.97, 1.10 0.4
physhlth_days 1.06 1.06, 1.07 <0.001
Abbreviations: CI = Confidence Interval, OR = Odds Ratio

Interpretation: Now the aOR for age compares two individuals 10 years apart, and the aOR for BMI compares two individuals 5 BMI units apart, both more clinically interpretable.


Part 2: Maximum Likelihood Estimation and Hypothesis Tests

Maximum Likelihood Estimation (ML)

Unlike linear regression, which uses ordinary least squares, logistic regression coefficients are estimated by maximum likelihood. The algorithm finds the values of \(\beta_0, \beta_1, \ldots, \beta_k\) that maximize the likelihood of observing the data.

The likelihood function for \(n\) independent binary observations is:

\[L(\boldsymbol{\beta}) = \prod_{i=1}^{n} p_i^{y_i}(1 - p_i)^{1 - y_i}\]

where \(p_i = \Pr(Y_i = 1 \mid X_i)\) is the predicted probability for observation \(i\). Taking the log gives the log-likelihood:

\[\ln L(\boldsymbol{\beta}) = \sum_{i=1}^{n} \left[y_i \ln p_i + (1 - y_i) \ln(1 - p_i)\right]\]

The ML estimates \(\hat{\beta}\) are obtained iteratively (typically by Newton-Raphson). We never compute these by hand, but it is important to know that R’s glm() reports Deviance \(= -2 \ln \hat{L}\), which is the foundation for hypothesis testing.

Wald Test (z-test for individual coefficients)

The Wald test is the default test reported by summary() and tidy(). For each coefficient \(\beta_j\):

\[z = \frac{\hat{\beta}_j}{\text{SE}(\hat{\beta}_j)} \sim N(0, 1) \text{ under } H_0: \beta_j = 0\]

The p-value tests whether the coefficient is significantly different from zero, equivalently whether the OR is significantly different from 1.

tidy(mod_full, conf.int = TRUE, exponentiate = TRUE) |>
  mutate(across(c(estimate, std.error, statistic, conf.low, conf.high),
                \(x) round(x, 3)),
         p.value = format.pval(p.value, digits = 3)) |>
  kable(caption = "Wald Tests for Each Coefficient (Exponentiated)") |>
  kable_styling(bootstrap_options = "striped", full_width = FALSE)
Wald Tests for Each Coefficient (Exponentiated)
term estimate std.error statistic p.value conf.low conf.high
(Intercept) 2.198 0.356 2.215 0.02675 1.095 4.414
exerciseYes 0.820 0.097 -2.044 0.04095 0.679 0.993
smokerCurrent 1.301 0.091 2.890 0.00386 1.088 1.555
age 0.969 0.003 -11.225 < 2e-16 0.963 0.974
sexFemale 1.675 0.086 5.975 2.30e-09 1.415 1.985
sleep_hrs 0.857 0.027 -5.643 1.67e-08 0.812 0.904
income_cat 0.909 0.020 -4.703 2.56e-06 0.873 0.946
bmi 1.006 0.006 0.932 0.35113 0.993 1.018
physhlth_days 1.065 0.004 15.634 < 2e-16 1.057 1.073

Caveat: The Wald test can be unreliable when sample sizes are small or when coefficients are large. The likelihood ratio test is generally preferred for these situations.

Likelihood Ratio Test (LR test)

The likelihood ratio test compares two nested models: a “full” model and a “reduced” model that drops one or more predictors. The test statistic is:

\[\text{LR} = -2(\ln \hat{L}_{\text{reduced}} - \ln \hat{L}_{\text{full}}) = D_{\text{reduced}} - D_{\text{full}}\]

Under \(H_0\) that the dropped predictors have no effect, LR follows a \(\chi^2\) distribution with degrees of freedom equal to the number of parameters dropped.

Example: Test the joint contribution of smoking and exercise

mod_reduced <- glm(
  fmd ~ age + sex + sleep_hrs + income_cat + bmi + physhlth_days,
  data = brfss_logistic,
  family = binomial
)

anova(mod_reduced, mod_full, test = "LRT") |>
  kable(digits = 3,
        caption = "LR Test: Does adding exercise + smoker improve the model?") |>
  kable_styling(bootstrap_options = "striped", full_width = FALSE)
LR Test: Does adding exercise + smoker improve the model?
Resid. Df Resid. Dev Df Deviance Pr(>Chi)
4993 3641.913 NA NA NA
4991 3628.474 2 13.439 0.001

Interpretation: The LR test gives a \(\chi^2\) statistic on 3 degrees of freedom (1 for exercise, 2 for smoker — but smoker has 2 levels so 1 dummy variable is created, making df = 2 here actually). A small p-value means the dropped variables jointly contribute to model fit.

Wald vs. LR test in practice

Aspect Wald test LR test
What it tests Single coefficient or vector Nested model comparison
Computational cost Very fast Requires fitting two models
Reliability with small samples Less reliable Generally preferred
Reported by R summary(model) anova(m1, m2, test = "LRT")

In large samples (like ours with n = 5,000), the two tests usually agree. In smaller samples or with extreme estimates, prefer the LR test.


Part 3: Confidence Intervals for Adjusted Odds Ratios

For the OR of a single coefficient, the 95% CI is computed on the log-odds scale and then exponentiated:

\[95\% \text{ CI for } e^{\beta_j} = \exp\left(\hat{\beta}_j \pm 1.96 \cdot \text{SE}(\hat{\beta}_j)\right)\]

This is the default approach used by confint() and tidy(..., conf.int = TRUE).

ci_table <- tidy(mod_full, conf.int = TRUE, exponentiate = TRUE) |>
  filter(term != "(Intercept)") |>
  dplyr::select(term, estimate, conf.low, conf.high) |>
  mutate(across(c(estimate, conf.low, conf.high), \(x) round(x, 3)))

ci_table |>
  kable(col.names = c("Predictor", "aOR", "95% CI Lower", "95% CI Upper"),
        caption = "Adjusted Odds Ratios with 95% CIs") |>
  kable_styling(bootstrap_options = "striped", full_width = FALSE)
Adjusted Odds Ratios with 95% CIs
Predictor aOR 95% CI Lower 95% CI Upper
exerciseYes 0.820 0.679 0.993
smokerCurrent 1.301 1.088 1.555
age 0.969 0.963 0.974
sexFemale 1.675 1.415 1.985
sleep_hrs 0.857 0.812 0.904
income_cat 0.909 0.873 0.946
bmi 1.006 0.993 1.018
physhlth_days 1.065 1.057 1.073

Forest Plot of Adjusted ORs

A forest plot is the standard way to visualize multiple ORs and their CIs:

ci_table |>
  ggplot(aes(x = estimate, y = reorder(term, estimate))) +
  geom_vline(xintercept = 1, linetype = "dashed", color = "red") +
  geom_point(size = 3, color = "steelblue") +
  geom_errorbarh(aes(xmin = conf.low, xmax = conf.high), height = 0.2,
                 color = "steelblue") +
  scale_x_log10() +
  labs(title = "Forest Plot of Adjusted Odds Ratios for Frequent Mental Distress",
       subtitle = "Reference line at OR = 1; log-scale x-axis",
       x = "Adjusted Odds Ratio (95% CI)", y = NULL) +
  theme_minimal()

Interpretation: Predictors whose CIs do not cross the dashed line at OR = 1 are statistically significantly associated with FMD at the 0.05 level. The log-scale x-axis ensures that ORs of 0.5 and 2.0 (which represent equally strong associations in opposite directions) appear equidistant from 1.


Part 4: Interaction (Effect Modification)

What Is Interaction in Logistic Regression?

Interaction (effect modification) occurs when the effect of one predictor on the outcome depends on the value of another predictor. In logistic regression, interaction is modeled by including a product term:

\[\text{logit}[\Pr(Y=1)] = \beta_0 + \beta_1 X_1 + \beta_2 X_2 + \beta_3 (X_1 \cdot X_2)\]

If \(\beta_3 \neq 0\), the OR for \(X_1\) depends on the value of \(X_2\).

Example: Does the effect of exercise differ by sex?

mod_interact <- glm(
  fmd ~ exercise * sex + age + smoker + sleep_hrs + income_cat,
  data = brfss_logistic,
  family = binomial
)

mod_interact |>
  tbl_regression(exponentiate = TRUE) |>
  bold_labels() |>
  bold_p()
Characteristic OR 95% CI p-value
exercise


    No
    Yes 0.61 0.47, 0.79 <0.001
sex


    Male
    Female 1.66 1.26, 2.21 <0.001
IMPUTED AGE VALUE COLLAPSED ABOVE 80 0.97 0.97, 0.98 <0.001
smoker


    Former/Never
    Current 1.25 1.05, 1.48 0.012
sleep_hrs 0.81 0.77, 0.86 <0.001
income_cat 0.85 0.82, 0.89 <0.001
exercise * sex


    Yes * Female 1.00 0.71, 1.41 >0.9
Abbreviations: CI = Confidence Interval, OR = Odds Ratio

Testing the Interaction

mod_no_interact <- glm(
  fmd ~ exercise + sex + age + smoker + sleep_hrs + income_cat,
  data = brfss_logistic,
  family = binomial
)

anova(mod_no_interact, mod_interact, test = "LRT") |>
  kable(digits = 3,
        caption = "LR Test for Exercise × Sex Interaction") |>
  kable_styling(bootstrap_options = "striped", full_width = FALSE)
LR Test for Exercise × Sex Interaction
Resid. Df Resid. Dev Df Deviance Pr(>Chi)
4993 3870.197 NA NA NA
4992 3870.197 1 0 0.987

Interpretation: If the p-value for the LR test is small (< 0.05), the interaction is statistically significant: the effect of exercise differs by sex. If not, we can drop the interaction term and use the simpler main-effects model.

Visualizing the Interaction

ggpredict(mod_interact, terms = c("exercise", "sex")) |>
  plot() +
  labs(title = "Predicted Probability of FMD by Exercise and Sex",
       x = "Exercise", y = "Predicted Probability of FMD",
       color = "Sex") +
  theme_minimal()

Interpretation: If the lines are parallel, there is no interaction. If they cross or diverge, the effect of exercise differs across sex.

Stratified Odds Ratios

When an interaction is present, we report stratum-specific odds ratios rather than a single overall OR:

# Stratum-specific ORs from the interaction model
ggpredict(mod_interact, terms = c("exercise", "sex")) |>
  as_tibble() |>
  pivot_wider(id_cols = group, names_from = x, values_from = predicted) |>
  mutate(OR_yes_vs_no = (Yes / (1 - Yes)) / (No / (1 - No))) |>
  dplyr::select(Sex = group, OR_yes_vs_no) |>
  kable(digits = 3,
        col.names = c("Sex", "OR (Exercise: Yes vs. No)"),
        caption = "Stratum-Specific Odds Ratios for Exercise") |>
  kable_styling(bootstrap_options = "striped", full_width = FALSE)
Stratum-Specific Odds Ratios for Exercise
Sex OR (Exercise: Yes vs. No)
Male 0.612
Female 0.614

Part 5: Goodness-of-Fit

Deviance

The deviance of a logistic model is:

\[D = -2 \ln \hat{L}\]

It is analogous to the residual sum of squares in linear regression: smaller is better. By itself, the deviance is hard to interpret, but differences in deviance between nested models follow a \(\chi^2\) distribution and form the basis of the LR test.

glance(mod_full) |>
  dplyr::select(null.deviance, df.null, deviance, df.residual, AIC, BIC) |>
  kable(digits = 1, caption = "Model Fit Statistics") |>
  kable_styling(bootstrap_options = "striped", full_width = FALSE)
Model Fit Statistics
null.deviance df.null deviance df.residual AIC BIC
4251.3 4999 3628.5 4991 3646.5 3705.1

Quick check: The difference between null.deviance and deviance represents the improvement from adding all predictors to an intercept-only model. We can test this with an LR test on df.null - df.residual degrees of freedom.

Pseudo-R²

There is no exact analog of \(R^2\) for logistic regression, but several “pseudo-R²” measures exist. The most common is McFadden’s R²:

\[R^2_{\text{McFadden}} = 1 - \frac{\ln \hat{L}_{\text{full}}}{\ln \hat{L}_{\text{null}}}\]

Values between 0.2 and 0.4 are considered excellent fit.

performance::r2_mcfadden(mod_full)
## # R2 for Generalized Linear Regression
##        R2: 0.147
##   adj. R2: 0.146

Interpretation: McFadden’s R² should not be interpreted on the same scale as linear regression R². Values are typically much smaller (e.g., 0.1 may indicate a reasonable fit).

Hosmer-Lemeshow Test

The Hosmer-Lemeshow test assesses the agreement between predicted and observed event rates within deciles of predicted probability. A non-significant p-value indicates adequate fit.

hl_test <- hoslem.test(
  x = as.numeric(brfss_logistic$fmd) - 1,
  y = fitted(mod_full),
  g = 10
)

hl_test
## 
##  Hosmer and Lemeshow goodness of fit (GOF) test
## 
## data:  as.numeric(brfss_logistic$fmd) - 1, fitted(mod_full)
## X-squared = 8.9639, df = 8, p-value = 0.3453

Interpretation: A small p-value (< 0.05) suggests that the model does not fit well in some regions of predicted probability. With large samples (like ours), the Hosmer-Lemeshow test can be over-powered and detect trivial misfit. Always pair it with a calibration plot.

Calibration Plot

brfss_logistic |>
  mutate(pred_prob = fitted(mod_full),
         obs_outcome = as.numeric(fmd) - 1,
         decile = ntile(pred_prob, 10)) |>
  group_by(decile) |>
  summarise(
    mean_pred = mean(pred_prob),
    mean_obs  = mean(obs_outcome),
    .groups = "drop"
  ) |>
  ggplot(aes(x = mean_pred, y = mean_obs)) +
  geom_abline(slope = 1, intercept = 0, color = "red", linetype = "dashed") +
  geom_point(size = 3, color = "steelblue") +
  geom_line(color = "steelblue") +
  labs(title = "Calibration Plot: Observed vs. Predicted Probability of FMD",
       subtitle = "Points should fall on the dashed line for perfect calibration",
       x = "Mean Predicted Probability (within decile)",
       y = "Observed Proportion (within decile)") +
  theme_minimal()

Interpretation: A well-calibrated model has points lying close to the 45-degree line. Systematic departures suggest miscalibration: points above the line indicate the model under-predicts; points below indicate over-prediction.


Part 6: Discrimination — ROC Curve and AUC

While calibration assesses how well predicted probabilities match observed rates, discrimination assesses how well the model separates events from non-events.

The ROC curve plots sensitivity (true positive rate) against 1 − specificity (false positive rate) across all possible probability cutoffs.

The AUC (area under the ROC curve) summarizes discrimination:

AUC Discrimination
0.5 No discrimination (chance)
0.6-0.7 Poor
0.7-0.8 Acceptable
0.8-0.9 Excellent
> 0.9 Outstanding
roc_obj <- roc(
  response = brfss_logistic$fmd,
  predictor = fitted(mod_full),
  levels = c("No", "Yes"),
  direction = "<"
)

auc_value <- auc(roc_obj)

ggroc(roc_obj, color = "steelblue", linewidth = 1.2) +
  geom_abline(slope = 1, intercept = 1, linetype = "dashed", color = "red") +
  labs(title = "ROC Curve for Frequent Mental Distress Model",
       subtitle = paste0("AUC = ", round(auc_value, 3)),
       x = "Specificity", y = "Sensitivity") +
  theme_minimal()

Interpretation: An AUC of approximately 0.75-0.80 indicates acceptable to excellent discrimination, meaning the model can distinguish between individuals with and without FMD reasonably well. Note that calibration and discrimination are distinct concepts: a model can have good discrimination but poor calibration, or vice versa.


Part 7: Diagnostics for Logistic Regression

Linearity in the Logit (for continuous predictors)

For continuous predictors, logistic regression assumes a linear relationship between the predictor and the log-odds. We can check this with a smoothed plot of the logit against the predictor.

brfss_logistic |>
  mutate(logit_pred = predict(mod_full, type = "link")) |>
  ggplot(aes(x = age, y = logit_pred)) +
  geom_point(alpha = 0.2, color = "steelblue") +
  geom_smooth(method = "loess", se = FALSE, color = "darkorange") +
  labs(title = "Linearity in the Logit: Age",
       x = "Age (years)", y = "Predicted Log-Odds (logit)") +
  theme_minimal()

Interpretation: A roughly linear loess curve supports the linearity assumption. A clearly curved pattern suggests we should add a quadratic term or use a spline.

Influential Observations

Cook’s distance and standardized residuals from logistic regression can be examined the same way as in linear regression:

brfss_logistic |>
  mutate(cooks_d = cooks.distance(mod_full),
         row_id = row_number()) |>
  ggplot(aes(x = row_id, y = cooks_d)) +
  geom_point(alpha = 0.4, color = "steelblue") +
  geom_hline(yintercept = 4 / nrow(brfss_logistic),
             linetype = "dashed", color = "red") +
  labs(title = "Cook's Distance for Logistic Regression Model",
       subtitle = "Red line: 4/n threshold",
       x = "Observation Index", y = "Cook's Distance") +
  theme_minimal()

Multicollinearity

VIFs work the same way as in linear regression:

vif(mod_full) |>
  as.data.frame() |>
  rownames_to_column("Predictor") |>
  kable(digits = 2, caption = "Variance Inflation Factors") |>
  kable_styling(bootstrap_options = "striped", full_width = FALSE)
Variance Inflation Factors
Predictor vif(mod_full)
exercise 1.13
smoker 1.12
age 1.17
sex 1.01
sleep_hrs 1.02
income_cat 1.14
bmi 1.03
physhlth_days 1.20

Rule of thumb: VIF > 5 (or 10) indicates problematic multicollinearity.


Part 8: Reporting Logistic Regression Results

A publication-quality logistic regression table should include:

  1. Sample size and number of events
  2. Adjusted odds ratios with 95% CIs
  3. p-values for each coefficient
  4. Model fit statistics (AIC or pseudo-R²) - (may declutter)
  5. Discrimination metric (AUC) - (methodological purposes)
mod_full |>
  tbl_regression(
    exponentiate = TRUE,
    label = list(
      exercise      ~ "Exercise (past 30 days)",
      smoker        ~ "Smoking status",
      age           ~ "Age",
      sex           ~ "Sex",
      sleep_hrs     ~ "Sleep hours",
      income_cat    ~ "Income category",
      bmi           ~ "BMI",
      physhlth_days ~ "Physically unhealthy days"
    )
  ) |>
  add_glance_source_note(
    #include = c(nobs, AIC, BIC),
    include = everything(),
    label = list(nobs ~ "N", AIC ~ "AIC", BIC ~ "BIC")
  ) |>
  bold_labels() |>
  bold_p() |>
  modify_caption("**Adjusted Odds Ratios for Frequent Mental Distress, BRFSS 2020**")
Adjusted Odds Ratios for Frequent Mental Distress, BRFSS 2020
Characteristic OR 95% CI p-value
Exercise (past 30 days)


    No
    Yes 0.82 0.68, 0.99 0.041
Smoking status


    Former/Never
    Current 1.30 1.09, 1.56 0.004
Age 0.97 0.96, 0.97 <0.001
Sex


    Male
    Female 1.68 1.41, 1.98 <0.001
Sleep hours 0.86 0.81, 0.90 <0.001
Income category 0.91 0.87, 0.95 <0.001
BMI 1.01 0.99, 1.02 0.4
Physically unhealthy days 1.06 1.06, 1.07 <0.001
Abbreviations: CI = Confidence Interval, OR = Odds Ratio
Null deviance = 4,251; Null df = 4,999; Log-likelihood = -1,814; AIC = 3,646; BIC = 3,705; Deviance = 3,628; Residual df = 4,991; N = 5,000

Summary

Concept Tool / R function
Multiple logistic regression glm(y ~ x1 + x2 + ..., family = binomial)
Adjusted odds ratios tidy(model, exponentiate = TRUE)
Wald test Default in summary() and tidy()
Likelihood ratio test anova(reduced, full, test = "LRT")
Confidence intervals confint(model) (profile CI) or tidy(..., conf.int = TRUE)
Interaction glm(y ~ x1 * x2, ...); test with LR test
Pseudo-R² performance::r2_mcfadden()
Hosmer-Lemeshow ResourceSelection::hoslem.test()
Calibration plot Decile-based observed vs. predicted
ROC curve / AUC pROC::roc() and pROC::auc()
Diagnostics cooks.distance(), vif(), linearity plots
Publication table gtsummary::tbl_regression()

Next Lecture (April 16)

  • Polytomous logistic regression: outcomes with > 2 nominal categories
  • Ordinal logistic regression: outcomes with > 2 ordered categories
  • Proportional odds assumption and how to test it

References

  • Kleinbaum, D. G., Kupper, L. L., Nizam, A., & Rosenberg, E. S. (2013). Applied Regression Analysis and Other Multivariable Methods (5th ed.), Chapter 22.
  • Hosmer, D. W., Lemeshow, S., & Sturdivant, R. X. (2013). Applied Logistic Regression (3rd ed.). Wiley.
  • Steyerberg, E. W. (2019). Clinical Prediction Models (2nd ed.). Springer.


Part 9: In-Class Lab Activity

EPI 553 — Logistic Regression Part 2 Lab Nursultan Due: End of class, April 16, 2026


Instructions

In this lab, you will build a multiple logistic regression model, conduct hypothesis tests, examine an interaction, and assess goodness-of-fit and discrimination. Use the same BRFSS 2020 logistic dataset from Part 1. Work through each task systematically. You may discuss concepts with classmates, but your written answers and R code must be your own.

Submission: Knit your .Rmd to HTML and upload to Brightspace by end of class.


Data for the Lab

Variable Description Type
fmd Frequent mental distress (No/Yes) Factor (outcome)
menthlth_days Mentally unhealthy days (0–30) Numeric
physhlth_days Physically unhealthy days (0–30) Numeric
sleep_hrs Sleep hours per night (1–14) Numeric
age Age in years (capped at 80) Numeric
sex Sex (Male/Female) Factor
bmi Body mass index Numeric
exercise Exercised in past 30 days (No/Yes) Factor
income_cat Household income category (1–8) Numeric
smoker Former/Never vs. Current Factor
library(tidyverse)
library(broom)
library(knitr)
library(kableExtra)
library(gtsummary)
library(car)
library(ggeffects)
library(ResourceSelection)
library(pROC)
library(performance)

brfss_logistic <- readRDS(
  "C:/Users/userp/OneDrive/Рабочий стол/HSTA553/R files/brfss_logistic_2020.rds"
)

Task 1: Build a Multiple Logistic Regression Model (15 points)

1a. (5 pts) Fit a multiple logistic regression model predicting fmd from at least 5 predictors of your choice.

mod_full <- glm(
  fmd ~ exercise + smoker + age + sex + sleep_hrs + income_cat + bmi + physhlth_days,
  data = brfss_logistic,
  family = binomial
)

summary(mod_full)
## 
## Call:
## glm(formula = fmd ~ exercise + smoker + age + sex + sleep_hrs + 
##     income_cat + bmi + physhlth_days, family = binomial, data = brfss_logistic)
## 
## Coefficients:
##                Estimate Std. Error z value Pr(>|z|)    
## (Intercept)    0.787582   0.355539   2.215  0.02675 *  
## exerciseYes   -0.198110   0.096920  -2.044  0.04095 *  
## smokerCurrent  0.263333   0.091131   2.890  0.00386 ** 
## age           -0.031914   0.002843 -11.225  < 2e-16 ***
## sexFemale      0.515884   0.086336   5.975 2.30e-09 ***
## sleep_hrs     -0.154351   0.027350  -5.643 1.67e-08 ***
## income_cat    -0.095547   0.020316  -4.703 2.56e-06 ***
## bmi            0.005881   0.006308   0.932  0.35113    
## physhlth_days  0.062973   0.004028  15.634  < 2e-16 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## (Dispersion parameter for binomial family taken to be 1)
## 
##     Null deviance: 4251.3  on 4999  degrees of freedom
## Residual deviance: 3628.5  on 4991  degrees of freedom
## AIC: 3646.5
## 
## Number of Fisher Scoring iterations: 5

1b. (5 pts) Report the adjusted ORs with 95% CIs in a publication-quality table using tbl_regression().

mod_full |>
  tbl_regression(
    exponentiate = TRUE,
    label = list(
      exercise      ~ "Exercise (past 30 days)",
      smoker        ~ "Smoking status",
      age           ~ "Age (per year)",
      sex           ~ "Sex",
      sleep_hrs     ~ "Sleep hours",
      income_cat    ~ "Income category",
      bmi           ~ "BMI",
      physhlth_days ~ "Physically unhealthy days"
    )
  ) |>
  bold_labels() |>
  bold_p()
Characteristic OR 95% CI p-value
Exercise (past 30 days)


    No
    Yes 0.82 0.68, 0.99 0.041
Smoking status


    Former/Never
    Current 1.30 1.09, 1.56 0.004
Age (per year) 0.97 0.96, 0.97 <0.001
Sex


    Male
    Female 1.68 1.41, 1.98 <0.001
Sleep hours 0.86 0.81, 0.90 <0.001
Income category 0.91 0.87, 0.95 <0.001
BMI 1.01 0.99, 1.02 0.4
Physically unhealthy days 1.06 1.06, 1.07 <0.001
Abbreviations: CI = Confidence Interval, OR = Odds Ratio

1c. (5 pts) Interpret the adjusted OR for two predictors of your choice in 1-2 sentences each. Make sure to mention what the OR represents (per unit change for continuous; reference category for categorical).

#Exercise Holding all other variables constant, individuals who exercised in the past 30 days have 18% lower odds of frequent mental distress compared to those who did not exercise (OR = 0.82, 95% CI: 0.68–0.99, p = 0.041). This suggests that exercise is a protective factor. #Smoking status Holding all other variables constant, current smokers have 30% higher odds of frequent mental distress compared to former/never smokers (OR = 1.30, 95% CI: 1.09–1.56, p = 0.004). This indicates that smoking is associated with increased risk of mental distress.


Task 2: Wald and Likelihood Ratio Tests (15 points)

2a. (5 pts) Identify the Wald p-value for each predictor in your model from the tidy() or summary() output.

tidy(mod_full, conf.int = TRUE, exponentiate = TRUE) |>
  mutate(
    across(c(estimate, conf.low, conf.high), round, 3),
    p.value = signif(p.value, 3)
  ) |>
  kable(caption = "Wald Test Results for Full Logistic Regression Model") |>
  kable_styling(full_width = FALSE)
Wald Test Results for Full Logistic Regression Model
term estimate std.error statistic p.value conf.low conf.high
(Intercept) 2.198 0.3555386 2.2151792 2.67e-02 1.095 4.414
exerciseYes 0.820 0.0969201 -2.0440582 4.09e-02 0.679 0.993
smokerCurrent 1.301 0.0911309 2.8896106 3.86e-03 1.088 1.555
age 0.969 0.0028431 -11.2249312 0.00e+00 0.963 0.974
sexFemale 1.675 0.0863364 5.9752778 0.00e+00 1.415 1.985
sleep_hrs 0.857 0.0273503 -5.6434879 0.00e+00 0.812 0.904
income_cat 0.909 0.0203161 -4.7030327 2.60e-06 0.873 0.946
bmi 1.006 0.0063078 0.9323927 3.51e-01 0.993 1.018
physhlth_days 1.065 0.0040280 15.6338895 0.00e+00 1.057 1.073

2b. (5 pts) Fit a reduced model that drops one predictor of your choice. Perform a likelihood ratio test comparing the full and reduced models using anova(reduced, full, test = "LRT").

mod_reduced <- glm(
  fmd ~ exercise + smoker + age + sex + sleep_hrs + income_cat + physhlth_days,
  data = brfss_logistic,
  family = binomial
)

anova(mod_reduced, mod_full, test = "LRT")
## Analysis of Deviance Table
## 
## Model 1: fmd ~ exercise + smoker + age + sex + sleep_hrs + income_cat + 
##     physhlth_days
## Model 2: fmd ~ exercise + smoker + age + sex + sleep_hrs + income_cat + 
##     bmi + physhlth_days
##   Resid. Df Resid. Dev Df Deviance Pr(>Chi)
## 1      4992     3629.3                     
## 2      4991     3628.5  1  0.86499   0.3523

#The likelihood ratio test comparing the reduced model (without BMI) and the full model (with BMI) shows a p-value of 0.3523. This indicates that adding BMI to the model does not significantly improve model fit. Therefore, BMI does not contribute meaningfully to explaining frequent mental distress after adjusting for the other predictors.

2c. (5 pts) Compare the conclusions from the Wald test (for the dropped predictor) and the LR test. Do they agree? In 2-3 sentences, explain when the two tests might disagree.

#The Wald test for BMI shows that it is not statistically significant (p = 0.35), suggesting that BMI is not associated with frequent mental distress after adjusting for other variables. Similarly, the likelihood ratio test comparing the reduced and full models yields a non-significant result (p = 0.3523), indicating that including BMI does not significantly improve model fit. Therefore, both the Wald test and the likelihood ratio test agree that BMI is not an important predictor in this model. In general, these tests may disagree in smaller samples or when coefficient estimates are unstable, and the likelihood ratio test is typically preferred because it is more reliable.


Task 3: Test an Interaction (20 points)

3a. (5 pts) Fit a model that includes an interaction between two predictors of your choice (e.g., exercise * sex or smoker * age).

mod_interact <- glm(
  fmd ~ exercise * sex + smoker + age + sleep_hrs + income_cat + bmi + physhlth_days,
  data = brfss_logistic,
  family = binomial
)

summary(mod_interact)
## 
## Call:
## glm(formula = fmd ~ exercise * sex + smoker + age + sleep_hrs + 
##     income_cat + bmi + physhlth_days, family = binomial, data = brfss_logistic)
## 
## Coefficients:
##                        Estimate Std. Error z value Pr(>|z|)    
## (Intercept)            0.805743   0.360167   2.237  0.02528 *  
## exerciseYes           -0.228996   0.138612  -1.652  0.09852 .  
## sexFemale              0.477173   0.151448   3.151  0.00163 ** 
## smokerCurrent          0.264191   0.091167   2.898  0.00376 ** 
## age                   -0.031901   0.002843 -11.219  < 2e-16 ***
## sleep_hrs             -0.154427   0.027348  -5.647 1.64e-08 ***
## income_cat            -0.095566   0.020313  -4.705 2.54e-06 ***
## bmi                    0.005966   0.006313   0.945  0.34461    
## physhlth_days          0.062983   0.004028  15.637  < 2e-16 ***
## exerciseYes:sexFemale  0.057111   0.183730   0.311  0.75592    
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## (Dispersion parameter for binomial family taken to be 1)
## 
##     Null deviance: 4251.3  on 4999  degrees of freedom
## Residual deviance: 3628.4  on 4990  degrees of freedom
## AIC: 3648.4
## 
## Number of Fisher Scoring iterations: 5

3b. (5 pts) Perform a likelihood ratio test comparing the model with the interaction to the model without it.

mod_no_interact <- glm(
  fmd ~ exercise + sex + smoker + age + sleep_hrs + income_cat + bmi + physhlth_days,
  data = brfss_logistic,
  family = binomial
)

anova(mod_no_interact, mod_interact, test = "LRT")
## Analysis of Deviance Table
## 
## Model 1: fmd ~ exercise + sex + smoker + age + sleep_hrs + income_cat + 
##     bmi + physhlth_days
## Model 2: fmd ~ exercise * sex + smoker + age + sleep_hrs + income_cat + 
##     bmi + physhlth_days
##   Resid. Df Resid. Dev Df Deviance Pr(>Chi)
## 1      4991     3628.5                     
## 2      4990     3628.4  1 0.096573    0.756

#The likelihood ratio test comparing the model with the interaction term to the model without it yields a p-value of 0.756. This indicates that adding the interaction between exercise and sex does not significantly improve model fit. Therefore, the interaction term does not contribute meaningfully to explaining frequent mental distress.

3c. (5 pts) Create a visualization of the interaction using ggpredict() and plot().

ggpredict(mod_interact, terms = c("exercise", "sex")) |>
  plot() +
  labs(
    title = "Predicted Probability of Frequent Mental Distress by Exercise and Sex",
    x = "Exercise",
    y = "Predicted Probability of FMD",
    color = "Sex"
  ) +
  theme_minimal()

3d. (5 pts) In 3-4 sentences, interpret the interaction. Does the effect of one predictor differ across levels of the other? If statistically significant, report the stratum-specific odds ratios.

#The interaction between exercise and sex is not statistically significant (p = 0.756), indicating that the effect of exercise on frequent mental distress does not differ by sex. In other words, the association between exercise and frequent mental distress is similar for both males and females. The interaction plot supports this finding, as the predicted probabilities for males and females show roughly parallel trends across exercise levels. Therefore, the simpler model without the interaction term is preferred, and stratum-specific odds ratios are not necessary to report.


Task 4: Goodness-of-Fit and Discrimination (25 points)

4a. (5 pts) Compute McFadden’s pseudo-R² for your full model using performance::r2_mcfadden().

performance::r2_mcfadden(mod_full)
## # R2 for Generalized Linear Regression
##        R2: 0.147
##   adj. R2: 0.146

#McFadden’s pseudo-R² for the model is 0.147, indicating a moderate level of model fit. Although pseudo-R² values are typically lower than traditional R² in linear regression, this value suggests that the model explains a meaningful portion of the variability in frequent mental distress. Overall, the model provides a reasonable fit to the data.

4b. (5 pts) Perform the Hosmer-Lemeshow goodness-of-fit test using ResourceSelection::hoslem.test(). Report the test statistic and p-value. Comment on the interpretation given your sample size.

hl_test <- hoslem.test(
  x = as.numeric(brfss_logistic$fmd) - 1,
  y = fitted(mod_full),
  g = 10
)

hl_test
## 
##  Hosmer and Lemeshow goodness of fit (GOF) test
## 
## data:  as.numeric(brfss_logistic$fmd) - 1, fitted(mod_full)
## X-squared = 8.9639, df = 8, p-value = 0.3453

#The Hosmer–Lemeshow goodness-of-fit test yielded a p-value of 0.3453, which is greater than 0.05. This indicates that there is no evidence of poor fit, and the model fits the data adequately. In other words, the predicted probabilities of frequent mental distress are consistent with the observed outcomes across risk groups. Given the large sample size, this result further supports that the model provides a reasonable fit.

4c. (5 pts) Create a calibration plot showing observed vs. predicted probabilities by decile. Comment on whether the model appears well calibrated.

brfss_logistic |>
  mutate(
    pred_prob = fitted(mod_full),
    obs_outcome = as.numeric(fmd) - 1,
    decile = ntile(pred_prob, 10)
  ) |>
  group_by(decile) |>
  summarise(
    mean_pred = mean(pred_prob),
    mean_obs = mean(obs_outcome),
    .groups = "drop"
  ) |>
  ggplot(aes(x = mean_pred, y = mean_obs)) +
  geom_abline(slope = 1, intercept = 0, linetype = "dashed", color = "red") +
  geom_point(size = 3, color = "steelblue") +
  geom_line(color = "steelblue") +
  labs(
    title = "Calibration Plot",
    x = "Mean Predicted Probability",
    y = "Observed Proportion"
  ) +
  theme_minimal()

#The calibration plot shows that the predicted probabilities closely follow the 45-degree reference line, indicating good agreement between predicted and observed values. Most points lie near the line across deciles of risk, suggesting that the model is well calibrated. There is no strong evidence of systematic overestimation or underestimation. Overall, the model provides accurate probability predictions for frequent mental distress.

4d. (10 pts) Compute and plot the ROC curve using pROC::roc(). Report the AUC. Based on the AUC value, how would you describe the model’s discrimination ability (poor, acceptable, excellent, outstanding)?

roc_obj <- roc(
  response = brfss_logistic$fmd,
  predictor = fitted(mod_full),
  levels = c("No", "Yes"),
  direction = "<"
)

auc_value <- auc(roc_obj)
auc_value
## Area under the curve: 0.7701
ggroc(roc_obj, linewidth = 1.2) +
  geom_abline(slope = 1, intercept = 1, linetype = "dashed", color = "red") +
  labs(
    title = "ROC Curve for Full Logistic Regression Model",
    subtitle = paste("AUC =", round(auc_value, 3)),
    x = "Specificity",
    y = "Sensitivity"
  ) +
  theme_minimal()

#The area under the ROC curve (AUC) is 0.77, indicating acceptable discrimination. This means that the model is reasonably good at distinguishing between individuals with and without frequent mental distress. In other words, there is a 77% probability that the model will correctly rank a randomly selected individual with frequent mental distress higher than a randomly selected individual without it. Overall, the model demonstrates satisfactory predictive performance.


End of Lab Activity