In Part 1, we introduced the logistic model, the logit transformation, and the connection between logistic regression coefficients and odds ratios. We fit simple logistic regression models with single predictors and previewed multiple logistic regression.
In Part 2, we go deeper:
Textbook reference: Kleinbaum et al., Chapter 22 (Sections 22.4 and 22.5)
library(tidyverse)
library(knitr)
library(kableExtra)
library(broom)
library(gtsummary)
library(car)
library(ggeffects)
library(ResourceSelection) # for Hosmer-Lemeshow
library(pROC) # for ROC/AUC
library(performance) # for model performance metrics
options(gtsummary.use_ftExtra = TRUE)
set_gtsummary_theme(theme_gtsummary_compact(set_theme = TRUE))brfss_logistic <- readRDS(
"E:/College WORK MS in EPI/SEM 2 MS EPI/Epi553 STATS 2/brfss_logistic_2020.rds"
)
dim(brfss_logistic)## [1] 5000 10
Outcome: fmd — Frequent Mental Distress
(1 = 14+ mentally unhealthy days in past 30, 0 = otherwise).
We extend the simple model to include several predictors:
\[\text{logit}[\Pr(Y = 1)] = \beta_0 + \beta_1 X_1 + \beta_2 X_2 + \cdots + \beta_k X_k\]
Each coefficient \(\beta_j\) represents the change in log-odds for a one-unit increase in \(X_j\), holding all other predictors constant. Exponentiating gives the adjusted odds ratio:
\[\text{aOR}_j = e^{\beta_j}\]
mod_full <- glm(
fmd ~ exercise + smoker + age + sex + sleep_hrs + income_cat + bmi + physhlth_days,
data = brfss_logistic,
family = binomial(link = "logit")
)
mod_full |>
tbl_regression(
exponentiate = TRUE,
label = list(
exercise ~ "Exercise (past 30 days)",
smoker ~ "Smoking status",
age ~ "Age (per year)",
sex ~ "Sex",
sleep_hrs ~ "Sleep hours",
income_cat ~ "Income category (per unit)",
bmi ~ "BMI",
physhlth_days ~ "Physically unhealthy days"
)
) |>
bold_labels() |>
bold_p()| Characteristic | OR | 95% CI | p-value |
|---|---|---|---|
| Exercise (past 30 days) | |||
| No | — | — | |
| Yes | 0.82 | 0.68, 0.99 | 0.041 |
| Smoking status | |||
| Former/Never | — | — | |
| Current | 1.30 | 1.09, 1.56 | 0.004 |
| Age (per year) | 0.97 | 0.96, 0.97 | <0.001 |
| Sex | |||
| Male | — | — | |
| Female | 1.68 | 1.41, 1.98 | <0.001 |
| Sleep hours | 0.86 | 0.81, 0.90 | <0.001 |
| Income category (per unit) | 0.91 | 0.87, 0.95 | <0.001 |
| BMI | 1.01 | 0.99, 1.02 | 0.4 |
| Physically unhealthy days | 1.06 | 1.06, 1.07 | <0.001 |
| Abbreviations: CI = Confidence Interval, OR = Odds Ratio | |||
Interpretation: Each row gives the adjusted odds ratio (aOR) and 95% CI for one predictor, controlling for all others. For example, the aOR for current smoking compares the odds of frequent mental distress for current smokers vs. former/never smokers, after adjusting for age, sex, sleep, income, BMI, exercise, and physical health. An aOR > 1 indicates a risk factor; an aOR < 1 indicates a protective factor.
A 1-year change in age or a 1-unit change in BMI is rarely the most clinically meaningful comparison. We can rescale to improve interpretation:
mod_scaled <- glm(
fmd ~ exercise + smoker + I(age/10) + sex + sleep_hrs +
income_cat + I(bmi/5) + physhlth_days,
data = brfss_logistic,
family = binomial
)
mod_scaled |>
tbl_regression(
exponentiate = TRUE,
label = list(
"I(age/10)" ~ "Age (per 10 years)",
"I(bmi/5)" ~ "BMI (per 5 units)"
)
) |>
bold_labels()| Characteristic | OR | 95% CI | p-value |
|---|---|---|---|
| exercise | |||
| No | — | — | |
| Yes | 0.82 | 0.68, 0.99 | 0.041 |
| smoker | |||
| Former/Never | — | — | |
| Current | 1.30 | 1.09, 1.56 | 0.004 |
| Age (per 10 years) | 0.73 | 0.69, 0.77 | <0.001 |
| sex | |||
| Male | — | — | |
| Female | 1.68 | 1.41, 1.98 | <0.001 |
| sleep_hrs | 0.86 | 0.81, 0.90 | <0.001 |
| income_cat | 0.91 | 0.87, 0.95 | <0.001 |
| BMI (per 5 units) | 1.03 | 0.97, 1.10 | 0.4 |
| physhlth_days | 1.06 | 1.06, 1.07 | <0.001 |
| Abbreviations: CI = Confidence Interval, OR = Odds Ratio | |||
Interpretation: Now the aOR for age compares two individuals 10 years apart, and the aOR for BMI compares two individuals 5 BMI units apart, both more clinically interpretable.
Unlike linear regression, which uses ordinary least squares, logistic regression coefficients are estimated by maximum likelihood. The algorithm finds the values of \(\beta_0, \beta_1, \ldots, \beta_k\) that maximize the likelihood of observing the data.
The likelihood function for \(n\) independent binary observations is:
\[L(\boldsymbol{\beta}) = \prod_{i=1}^{n} p_i^{y_i}(1 - p_i)^{1 - y_i}\]
where \(p_i = \Pr(Y_i = 1 \mid X_i)\) is the predicted probability for observation \(i\). Taking the log gives the log-likelihood:
\[\ln L(\boldsymbol{\beta}) = \sum_{i=1}^{n} \left[y_i \ln p_i + (1 - y_i) \ln(1 - p_i)\right]\]
The ML estimates \(\hat{\beta}\) are
obtained iteratively (typically by Newton-Raphson). We never compute
these by hand, but it is important to know that R’s glm()
reports Deviance \(= -2 \ln
\hat{L}\), which is the foundation for hypothesis testing.
The Wald test is the default test reported by
summary() and tidy(). For each coefficient
\(\beta_j\):
\[z = \frac{\hat{\beta}_j}{\text{SE}(\hat{\beta}_j)} \sim N(0, 1) \text{ under } H_0: \beta_j = 0\]
The p-value tests whether the coefficient is significantly different from zero, equivalently whether the OR is significantly different from 1.
tidy(mod_full, conf.int = TRUE, exponentiate = TRUE) |>
mutate(across(c(estimate, std.error, statistic, conf.low, conf.high),
\(x) round(x, 3)),
p.value = format.pval(p.value, digits = 3)) |>
kable(caption = "Wald Tests for Each Coefficient (Exponentiated)") |>
kable_styling(bootstrap_options = "striped", full_width = FALSE)| term | estimate | std.error | statistic | p.value | conf.low | conf.high |
|---|---|---|---|---|---|---|
| (Intercept) | 2.198 | 0.356 | 2.215 | 0.02675 | 1.095 | 4.414 |
| exerciseYes | 0.820 | 0.097 | -2.044 | 0.04095 | 0.679 | 0.993 |
| smokerCurrent | 1.301 | 0.091 | 2.890 | 0.00386 | 1.088 | 1.555 |
| age | 0.969 | 0.003 | -11.225 | < 2e-16 | 0.963 | 0.974 |
| sexFemale | 1.675 | 0.086 | 5.975 | 2.30e-09 | 1.415 | 1.985 |
| sleep_hrs | 0.857 | 0.027 | -5.643 | 1.67e-08 | 0.812 | 0.904 |
| income_cat | 0.909 | 0.020 | -4.703 | 2.56e-06 | 0.873 | 0.946 |
| bmi | 1.006 | 0.006 | 0.932 | 0.35113 | 0.993 | 1.018 |
| physhlth_days | 1.065 | 0.004 | 15.634 | < 2e-16 | 1.057 | 1.073 |
Caveat: The Wald test can be unreliable when sample sizes are small or when coefficients are large. The likelihood ratio test is generally preferred for these situations.
The likelihood ratio test compares two nested models: a “full” model and a “reduced” model that drops one or more predictors. The test statistic is:
\[\text{LR} = -2(\ln \hat{L}_{\text{reduced}} - \ln \hat{L}_{\text{full}}) = D_{\text{reduced}} - D_{\text{full}}\]
Under \(H_0\) that the dropped predictors have no effect, LR follows a \(\chi^2\) distribution with degrees of freedom equal to the number of parameters dropped.
mod_reduced <- glm(
fmd ~ age + sex + sleep_hrs + income_cat + bmi + physhlth_days,
data = brfss_logistic,
family = binomial
)
anova(mod_reduced, mod_full, test = "LRT") |>
kable(digits = 3,
caption = "LR Test: Does adding exercise + smoker improve the model?") |>
kable_styling(bootstrap_options = "striped", full_width = FALSE)| Resid. Df | Resid. Dev | Df | Deviance | Pr(>Chi) |
|---|---|---|---|---|
| 4993 | 3641.913 | NA | NA | NA |
| 4991 | 3628.474 | 2 | 13.439 | 0.001 |
Interpretation: The LR test gives a \(\chi^2\) statistic on 3 degrees of freedom (1 for exercise, 2 for smoker — but smoker has 2 levels so 1 dummy variable is created, making df = 2 here actually). A small p-value means the dropped variables jointly contribute to model fit.
| Aspect | Wald test | LR test |
|---|---|---|
| What it tests | Single coefficient or vector | Nested model comparison |
| Computational cost | Very fast | Requires fitting two models |
| Reliability with small samples | Less reliable | Generally preferred |
| Reported by R | summary(model) |
anova(m1, m2, test = "LRT") |
In large samples (like ours with n = 5,000), the two tests usually agree. In smaller samples or with extreme estimates, prefer the LR test.
For the OR of a single coefficient, the 95% CI is computed on the log-odds scale and then exponentiated:
\[95\% \text{ CI for } e^{\beta_j} = \exp\left(\hat{\beta}_j \pm 1.96 \cdot \text{SE}(\hat{\beta}_j)\right)\]
This is the default approach used by confint() and
tidy(..., conf.int = TRUE).
ci_table <- tidy(mod_full, conf.int = TRUE, exponentiate = TRUE) |>
filter(term != "(Intercept)") |>
dplyr::select(term, estimate, conf.low, conf.high) |>
mutate(across(c(estimate, conf.low, conf.high), \(x) round(x, 3)))
ci_table |>
kable(col.names = c("Predictor", "aOR", "95% CI Lower", "95% CI Upper"),
caption = "Adjusted Odds Ratios with 95% CIs") |>
kable_styling(bootstrap_options = "striped", full_width = FALSE)| Predictor | aOR | 95% CI Lower | 95% CI Upper |
|---|---|---|---|
| exerciseYes | 0.820 | 0.679 | 0.993 |
| smokerCurrent | 1.301 | 1.088 | 1.555 |
| age | 0.969 | 0.963 | 0.974 |
| sexFemale | 1.675 | 1.415 | 1.985 |
| sleep_hrs | 0.857 | 0.812 | 0.904 |
| income_cat | 0.909 | 0.873 | 0.946 |
| bmi | 1.006 | 0.993 | 1.018 |
| physhlth_days | 1.065 | 1.057 | 1.073 |
A forest plot is the standard way to visualize multiple ORs and their CIs:
ci_table |>
ggplot(aes(x = estimate, y = reorder(term, estimate))) +
geom_vline(xintercept = 1, linetype = "dashed", color = "red") +
geom_point(size = 3, color = "steelblue") +
geom_errorbarh(aes(xmin = conf.low, xmax = conf.high), height = 0.2,
color = "steelblue") +
scale_x_log10() +
labs(title = "Forest Plot of Adjusted Odds Ratios for Frequent Mental Distress",
subtitle = "Reference line at OR = 1; log-scale x-axis",
x = "Adjusted Odds Ratio (95% CI)", y = NULL) +
theme_minimal()Interpretation: Predictors whose CIs do not cross the dashed line at OR = 1 are statistically significantly associated with FMD at the 0.05 level. The log-scale x-axis ensures that ORs of 0.5 and 2.0 (which represent equally strong associations in opposite directions) appear equidistant from 1.
Interaction (effect modification) occurs when the effect of one predictor on the outcome depends on the value of another predictor. In logistic regression, interaction is modeled by including a product term:
\[\text{logit}[\Pr(Y=1)] = \beta_0 + \beta_1 X_1 + \beta_2 X_2 + \beta_3 (X_1 \cdot X_2)\]
If \(\beta_3 \neq 0\), the OR for \(X_1\) depends on the value of \(X_2\).
mod_interact <- glm(
fmd ~ exercise * sex + age + smoker + sleep_hrs + income_cat,
data = brfss_logistic,
family = binomial
)
mod_interact |>
tbl_regression(exponentiate = TRUE) |>
bold_labels() |>
bold_p()| Characteristic | OR | 95% CI | p-value |
|---|---|---|---|
| exercise | |||
| No | — | — | |
| Yes | 0.61 | 0.47, 0.79 | <0.001 |
| sex | |||
| Male | — | — | |
| Female | 1.66 | 1.26, 2.21 | <0.001 |
| IMPUTED AGE VALUE COLLAPSED ABOVE 80 | 0.97 | 0.97, 0.98 | <0.001 |
| smoker | |||
| Former/Never | — | — | |
| Current | 1.25 | 1.05, 1.48 | 0.012 |
| sleep_hrs | 0.81 | 0.77, 0.86 | <0.001 |
| income_cat | 0.85 | 0.82, 0.89 | <0.001 |
| exercise * sex | |||
| Yes * Female | 1.00 | 0.71, 1.41 | >0.9 |
| Abbreviations: CI = Confidence Interval, OR = Odds Ratio | |||
mod_no_interact <- glm(
fmd ~ exercise + sex + age + smoker + sleep_hrs + income_cat,
data = brfss_logistic,
family = binomial
)
anova(mod_no_interact, mod_interact, test = "LRT") |>
kable(digits = 3,
caption = "LR Test for Exercise × Sex Interaction") |>
kable_styling(bootstrap_options = "striped", full_width = FALSE)| Resid. Df | Resid. Dev | Df | Deviance | Pr(>Chi) |
|---|---|---|---|---|
| 4993 | 3870.197 | NA | NA | NA |
| 4992 | 3870.197 | 1 | 0 | 0.987 |
Interpretation: If the p-value for the LR test is small (< 0.05), the interaction is statistically significant: the effect of exercise differs by sex. If not, we can drop the interaction term and use the simpler main-effects model.
ggpredict(mod_interact, terms = c("exercise", "sex")) |>
plot() +
labs(title = "Predicted Probability of FMD by Exercise and Sex",
x = "Exercise", y = "Predicted Probability of FMD",
color = "Sex") +
theme_minimal()Interpretation: If the lines are parallel, there is no interaction. If they cross or diverge, the effect of exercise differs across sex.
When an interaction is present, we report stratum-specific odds ratios rather than a single overall OR:
# Stratum-specific ORs from the interaction model
ggpredict(mod_interact, terms = c("exercise", "sex")) |>
as_tibble() |>
pivot_wider(id_cols = group, names_from = x, values_from = predicted) |>
mutate(OR_yes_vs_no = (Yes / (1 - Yes)) / (No / (1 - No))) |>
dplyr::select(Sex = group, OR_yes_vs_no) |>
kable(digits = 3,
col.names = c("Sex", "OR (Exercise: Yes vs. No)"),
caption = "Stratum-Specific Odds Ratios for Exercise") |>
kable_styling(bootstrap_options = "striped", full_width = FALSE)| Sex | OR (Exercise: Yes vs. No) |
|---|---|
| Male | 0.612 |
| Female | 0.614 |
The deviance of a logistic model is:
\[D = -2 \ln \hat{L}\]
It is analogous to the residual sum of squares in linear regression: smaller is better. By itself, the deviance is hard to interpret, but differences in deviance between nested models follow a \(\chi^2\) distribution and form the basis of the LR test.
glance(mod_full) |>
dplyr::select(null.deviance, df.null, deviance, df.residual, AIC, BIC) |>
kable(digits = 1, caption = "Model Fit Statistics") |>
kable_styling(bootstrap_options = "striped", full_width = FALSE)| null.deviance | df.null | deviance | df.residual | AIC | BIC |
|---|---|---|---|---|---|
| 4251.3 | 4999 | 3628.5 | 4991 | 3646.5 | 3705.1 |
Quick check: The difference between
null.deviance and deviance represents the
improvement from adding all predictors to an intercept-only model. We
can test this with an LR test on df.null - df.residual
degrees of freedom.
There is no exact analog of \(R^2\) for logistic regression, but several “pseudo-R²” measures exist. The most common is McFadden’s R²:
\[R^2_{\text{McFadden}} = 1 - \frac{\ln \hat{L}_{\text{full}}}{\ln \hat{L}_{\text{null}}}\]
Values between 0.2 and 0.4 are considered excellent fit.
## # R2 for Generalized Linear Regression
## R2: 0.147
## adj. R2: 0.146
Interpretation: McFadden’s R² should not be interpreted on the same scale as linear regression R². Values are typically much smaller (e.g., 0.1 may indicate a reasonable fit).
The Hosmer-Lemeshow test assesses the agreement between predicted and observed event rates within deciles of predicted probability. A non-significant p-value indicates adequate fit.
hl_test <- hoslem.test(
x = as.numeric(brfss_logistic$fmd) - 1,
y = fitted(mod_full),
g = 10
)
hl_test##
## Hosmer and Lemeshow goodness of fit (GOF) test
##
## data: as.numeric(brfss_logistic$fmd) - 1, fitted(mod_full)
## X-squared = 8.9639, df = 8, p-value = 0.3453
Interpretation: A small p-value (< 0.05) suggests that the model does not fit well in some regions of predicted probability. With large samples (like ours), the Hosmer-Lemeshow test can be over-powered and detect trivial misfit. Always pair it with a calibration plot.
brfss_logistic |>
mutate(pred_prob = fitted(mod_full),
obs_outcome = as.numeric(fmd) - 1,
decile = ntile(pred_prob, 10)) |>
group_by(decile) |>
summarise(
mean_pred = mean(pred_prob),
mean_obs = mean(obs_outcome),
.groups = "drop"
) |>
ggplot(aes(x = mean_pred, y = mean_obs)) +
geom_abline(slope = 1, intercept = 0, color = "red", linetype = "dashed") +
geom_point(size = 3, color = "steelblue") +
geom_line(color = "steelblue") +
labs(title = "Calibration Plot: Observed vs. Predicted Probability of FMD",
subtitle = "Points should fall on the dashed line for perfect calibration",
x = "Mean Predicted Probability (within decile)",
y = "Observed Proportion (within decile)") +
theme_minimal()Interpretation: A well-calibrated model has points lying close to the 45-degree line. Systematic departures suggest miscalibration: points above the line indicate the model under-predicts; points below indicate over-prediction.
While calibration assesses how well predicted probabilities match observed rates, discrimination assesses how well the model separates events from non-events.
The ROC curve plots sensitivity (true positive rate) against 1 − specificity (false positive rate) across all possible probability cutoffs.
The AUC (area under the ROC curve) summarizes discrimination:
| AUC | Discrimination |
|---|---|
| 0.5 | No discrimination (chance) |
| 0.6-0.7 | Poor |
| 0.7-0.8 | Acceptable |
| 0.8-0.9 | Excellent |
| > 0.9 | Outstanding |
roc_obj <- roc(
response = brfss_logistic$fmd,
predictor = fitted(mod_full),
levels = c("No", "Yes"),
direction = "<"
)
auc_value <- auc(roc_obj)
ggroc(roc_obj, color = "steelblue", linewidth = 1.2) +
geom_abline(slope = 1, intercept = 1, linetype = "dashed", color = "red") +
labs(title = "ROC Curve for Frequent Mental Distress Model",
subtitle = paste0("AUC = ", round(auc_value, 3)),
x = "Specificity", y = "Sensitivity") +
theme_minimal()Interpretation: An AUC of approximately 0.75-0.80 indicates acceptable to excellent discrimination, meaning the model can distinguish between individuals with and without FMD reasonably well. Note that calibration and discrimination are distinct concepts: a model can have good discrimination but poor calibration, or vice versa.
For continuous predictors, logistic regression assumes a linear relationship between the predictor and the log-odds. We can check this with a smoothed plot of the logit against the predictor.
brfss_logistic |>
mutate(logit_pred = predict(mod_full, type = "link")) |>
ggplot(aes(x = age, y = logit_pred)) +
geom_point(alpha = 0.2, color = "steelblue") +
geom_smooth(method = "loess", se = FALSE, color = "darkorange") +
labs(title = "Linearity in the Logit: Age",
x = "Age (years)", y = "Predicted Log-Odds (logit)") +
theme_minimal()Interpretation: A roughly linear loess curve supports the linearity assumption. A clearly curved pattern suggests we should add a quadratic term or use a spline.
Cook’s distance and standardized residuals from logistic regression can be examined the same way as in linear regression:
brfss_logistic |>
mutate(cooks_d = cooks.distance(mod_full),
row_id = row_number()) |>
ggplot(aes(x = row_id, y = cooks_d)) +
geom_point(alpha = 0.4, color = "steelblue") +
geom_hline(yintercept = 4 / nrow(brfss_logistic),
linetype = "dashed", color = "red") +
labs(title = "Cook's Distance for Logistic Regression Model",
subtitle = "Red line: 4/n threshold",
x = "Observation Index", y = "Cook's Distance") +
theme_minimal()VIFs work the same way as in linear regression:
vif(mod_full) |>
as.data.frame() |>
rownames_to_column("Predictor") |>
kable(digits = 2, caption = "Variance Inflation Factors") |>
kable_styling(bootstrap_options = "striped", full_width = FALSE)| Predictor | vif(mod_full) |
|---|---|
| exercise | 1.13 |
| smoker | 1.12 |
| age | 1.17 |
| sex | 1.01 |
| sleep_hrs | 1.02 |
| income_cat | 1.14 |
| bmi | 1.03 |
| physhlth_days | 1.20 |
Rule of thumb: VIF > 5 (or 10) indicates problematic multicollinearity.
A publication-quality logistic regression table should include:
mod_full |>
tbl_regression(
exponentiate = TRUE,
label = list(
exercise ~ "Exercise (past 30 days)",
smoker ~ "Smoking status",
age ~ "Age",
sex ~ "Sex",
sleep_hrs ~ "Sleep hours",
income_cat ~ "Income category",
bmi ~ "BMI",
physhlth_days ~ "Physically unhealthy days"
)
) |>
add_glance_source_note(
include = c(nobs, AIC, BIC),
label = list(nobs ~ "N", AIC ~ "AIC", BIC ~ "BIC")
) |>
bold_labels() |>
bold_p() |>
modify_caption("**Adjusted Odds Ratios for Frequent Mental Distress, BRFSS 2020**")| Characteristic | OR | 95% CI | p-value |
|---|---|---|---|
| Exercise (past 30 days) | |||
| No | — | — | |
| Yes | 0.82 | 0.68, 0.99 | 0.041 |
| Smoking status | |||
| Former/Never | — | — | |
| Current | 1.30 | 1.09, 1.56 | 0.004 |
| Age | 0.97 | 0.96, 0.97 | <0.001 |
| Sex | |||
| Male | — | — | |
| Female | 1.68 | 1.41, 1.98 | <0.001 |
| Sleep hours | 0.86 | 0.81, 0.90 | <0.001 |
| Income category | 0.91 | 0.87, 0.95 | <0.001 |
| BMI | 1.01 | 0.99, 1.02 | 0.4 |
| Physically unhealthy days | 1.06 | 1.06, 1.07 | <0.001 |
| Abbreviations: CI = Confidence Interval, OR = Odds Ratio | |||
| N = 5,000; AIC = 3,646; BIC = 3,705 | |||
| Concept | Tool / R function |
|---|---|
| Multiple logistic regression | glm(y ~ x1 + x2 + ..., family = binomial) |
| Adjusted odds ratios | tidy(model, exponentiate = TRUE) |
| Wald test | Default in summary() and tidy() |
| Likelihood ratio test | anova(reduced, full, test = "LRT") |
| Confidence intervals | confint(model) (profile CI) or
tidy(..., conf.int = TRUE) |
| Interaction | glm(y ~ x1 * x2, ...); test with LR test |
| Pseudo-R² | performance::r2_mcfadden() |
| Hosmer-Lemeshow | ResourceSelection::hoslem.test() |
| Calibration plot | Decile-based observed vs. predicted |
| ROC curve / AUC | pROC::roc() and pROC::auc() |
| Diagnostics | cooks.distance(), vif(), linearity
plots |
| Publication table | gtsummary::tbl_regression() |
EPI 553 — Logistic Regression Part 2 Lab Due: End of class, April 14, 2026
In this lab, you will build a multiple logistic regression model, conduct hypothesis tests, examine an interaction, and assess goodness-of-fit and discrimination. Use the same BRFSS 2020 logistic dataset from Part 1. Work through each task systematically. You may discuss concepts with classmates, but your written answers and R code must be your own.
Submission: Knit your .Rmd to HTML and upload to Brightspace by end of class.
| Variable | Description | Type |
|---|---|---|
fmd |
Frequent mental distress (No/Yes) | Factor (outcome) |
menthlth_days |
Mentally unhealthy days (0–30) | Numeric |
physhlth_days |
Physically unhealthy days (0–30) | Numeric |
sleep_hrs |
Sleep hours per night (1–14) | Numeric |
age |
Age in years (capped at 80) | Numeric |
sex |
Sex (Male/Female) | Factor |
bmi |
Body mass index | Numeric |
exercise |
Exercised in past 30 days (No/Yes) | Factor |
income_cat |
Household income category (1–8) | Numeric |
smoker |
Former/Never vs. Current | Factor |
library(tidyverse)
library(broom)
library(knitr)
library(kableExtra)
library(gtsummary)
library(car)
library(ggeffects)
library(ResourceSelection)
library(pROC)
library(performance)
brfss_logistic <- readRDS(
"E:/College WORK MS in EPI/SEM 2 MS EPI/Epi553 STATS 2/brfss_logistic_2020.rds"
)1a. (5 pts) Fit a multiple logistic regression model
predicting fmd from at least 5 predictors of your
choice.
mod_mlr_full <- glm(
fmd ~ sleep_hrs + exercise + smoker + sex + income_cat + physhlth_days,
data = brfss_logistic,
family = binomial(link = "logit")
)
tidy(mod_full, conf.int = TRUE, exponentiate = FALSE) |>
mutate(across(where(is.numeric), ~ round(.x, 3))) |>
kable(caption = "Multiple Logistic Regression: FMD ~ 6 Predictors (Log-Odds Scale)") |>
kable_styling(bootstrap_options = "striped", full_width = FALSE)| term | estimate | std.error | statistic | p.value | conf.low | conf.high |
|---|---|---|---|---|---|---|
| (Intercept) | 0.788 | 0.356 | 2.215 | 0.027 | 0.091 | 1.485 |
| exerciseYes | -0.198 | 0.097 | -2.044 | 0.041 | -0.387 | -0.007 |
| smokerCurrent | 0.263 | 0.091 | 2.890 | 0.004 | 0.084 | 0.442 |
| age | -0.032 | 0.003 | -11.225 | 0.000 | -0.038 | -0.026 |
| sexFemale | 0.516 | 0.086 | 5.975 | 0.000 | 0.347 | 0.686 |
| sleep_hrs | -0.154 | 0.027 | -5.643 | 0.000 | -0.208 | -0.101 |
| income_cat | -0.096 | 0.020 | -4.703 | 0.000 | -0.135 | -0.056 |
| bmi | 0.006 | 0.006 | 0.932 | 0.351 | -0.007 | 0.018 |
| physhlth_days | 0.063 | 0.004 | 15.634 | 0.000 | 0.055 | 0.071 |
1b. (5 pts) Report the adjusted ORs with 95% CIs in
a publication-quality table using tbl_regression().
mod_mlr_full |>
tbl_regression(
exponentiate = TRUE,
label = list(
sleep_hrs ~ "Sleep hours (per 1 hr)",
exercise ~ "Exercise (past 30 days)",
smoker ~ "Smoking status",
sex ~ "Sex",
income_cat ~ "Income category (per unit)",
physhlth_days ~ "Physically unhealthy days"
)
) |>
bold_labels() |>
bold_p() |>
modify_caption("**Table 1. Adjusted Odds Ratios for Frequent Mental Distress, BRFSS 2020**")| Characteristic | OR | 95% CI | p-value |
|---|---|---|---|
| Sleep hours (per 1 hr) | 0.82 | 0.78, 0.87 | <0.001 |
| Exercise (past 30 days) | |||
| No | — | — | |
| Yes | 0.91 | 0.76, 1.10 | 0.3 |
| Smoking status | |||
| Former/Never | — | — | |
| Current | 1.66 | 1.40, 1.96 | <0.001 |
| Sex | |||
| Male | — | — | |
| Female | 1.62 | 1.38, 1.92 | <0.001 |
| Income category (per unit) | 0.92 | 0.88, 0.95 | <0.001 |
| Physically unhealthy days | 1.06 | 1.05, 1.07 | <0.001 |
| Abbreviations: CI = Confidence Interval, OR = Odds Ratio | |||
1c. (5 pts) Interpret the adjusted OR for two predictors of your choice in 1-2 sentences each. Make sure to mention what the OR represents (per unit change for continuous; reference category for categorical).
Sleep hours: For each additional hour of sleep per night, the adjusted odds of frequent mental distress decrease by 18% (OR = 0.82). This means that more sleep is associated with lower odds of reporting poor mental health, holding all other variables constant.
Exercise Compared with adults who did not exercise in the past 30 days, those who did exercise have slightly lower adjusted odds of frequent mental distress (OR = 0.91), although this association is not statistically significant (p = 0.30). This means exercise does not show a meaningful independent association with mental distress after adjusting for other variables.
2a. (5 pts) Identify the Wald p-value for each
predictor in your model from the tidy() or
summary() output.
tidy(mod_mlr_full, conf.int = TRUE, exponentiate = TRUE) |>
filter(term != "(Intercept)") |>
mutate(
across(c(estimate, std.error, statistic, conf.low, conf.high), ~ round(.x, 3)),
p.value = format.pval(p.value, digits = 3, eps = 0.001)
) |>
kable(
col.names = c("Term", "aOR", "SE", "z (Wald)", "95% CI Lower", "95% CI Upper", "Wald p-value"),
caption = "Wald Tests for Each Coefficient in the Full Model"
) |>
kable_styling(bootstrap_options = "striped", full_width = FALSE)| Term | aOR | SE | z (Wald) | 95% CI Lower | 95% CI Upper | Wald p-value |
|---|---|---|---|---|---|---|
| sleep_hrs | 0.823 | 0.027 | -7.206 | <0.001 | 0.781 | 0.868 |
| exerciseYes | 0.913 | 0.094 | -0.967 | 0.334 | 0.761 | 1.099 |
| smokerCurrent | 1.656 | 0.087 | 5.801 | <0.001 | 1.396 | 1.964 |
| sexFemale | 1.625 | 0.085 | 5.717 | <0.001 | 1.376 | 1.920 |
| income_cat | 0.916 | 0.020 | -4.386 | <0.001 | 0.881 | 0.953 |
| physhlth_days | 1.057 | 0.004 | 14.483 | <0.001 | 1.049 | 1.065 |
Interpretation: Sleep hours (aOR = 0.823, p < 0.001) Each additional hour of sleep is associated with 17.7% lower odds of frequent mental distress. The Wald test shows this effect is highly significant.
Exercise (aOR = 0.913, p = 0.334) People who exercised in the past 30 days have slightly lower odds of frequent mental distress, but the Wald test shows this effect is not statistically significant.
Current smoking (aOR = 1.656, p < 0.001) Current smokers have 66% higher odds of frequent mental distress compared with former/never smokers. The Wald test strongly supports this as a significant independent predictor.
Female sex (aOR = 1.625, p < 0.001) Women have 62.5% higher odds of frequent mental distress compared with men. The Wald test confirms this is a significant effect.
Income category (aOR = 0.916, p < 0.001) Each step up in income category reduces the odds of frequent mental distress by 8.4%. The Wald test shows this is a significant protective factor.
Physical health days (aOR = 1.057, p < 0.001) Each additional physically unhealthy day increases the odds of frequent mental distress by 5.7%. This is one of the strongest predictors, with a very large Wald statistic.
2b. (5 pts) Fit a reduced model that drops one
predictor of your choice. Perform a likelihood ratio test comparing the
full and reduced models using
anova(reduced, full, test = "LRT").
We will drop exercise category from the full model to test whether it jointly improves model fit.
mod_mlr_reduced <- glm(
fmd ~ sleep_hrs + smoker + sex + income_cat + physhlth_days,
data = brfss_logistic,
family = binomial
)
anova(mod_mlr_reduced, mod_mlr_full, test = "LRT") |>
kable(
digits = 3,
caption = "LR Test: Does dropping exercise improve the model?"
) |>
kable_styling(bootstrap_options = "striped", full_width = FALSE)| Resid. Df | Resid. Dev | Df | Deviance | Pr(>Chi) |
|---|---|---|---|---|
| 4994 | 3762.947 | NA | NA | NA |
| 4993 | 3762.018 | 1 | 0.928 | 0.335 |
2c. (5 pts) Compare the conclusions from the Wald test (for the dropped predictor) and the LR test. Do they agree? In 2-3 sentences, explain when the two tests might disagree.
The Wald test for exerciseYes showed a non‑significant effect (p = 0.334), and the LR test also showed no significant improvement when exercise was removed (Deviance = 0.928, df = 1, p = 0.335). Both tests conclude that exercise does not independently improve model fit. The Wald test can disagree with the LR test when the coefficient estimate is unstable, has a large standard error, or lies near the boundary of the parameter space. This is because the Wald test relies on normal approximation, which can perform poorly in small samples. The LR test is generally more robust because it compares the full and reduced models directly, so disagreement usually happens when the Wald test underestimates significance while the LR test detects a real effect. —
3a. (5 pts) Fit a model that includes an interaction between two predictors of your choice.
We will test whether the effect of sleeping hours on FMD differs by sex.
mod_mlr_interact <- glm(
fmd ~ sleep_hrs* sex + smoker + exercise + income_cat + physhlth_days,
data = brfss_logistic,
family = binomial
)
mod_mlr_interact |>
tbl_regression(exponentiate = TRUE) |>
bold_labels() |>
bold_p() |>
modify_caption("**Model with sleeping hours and Sex Interaction**")| Characteristic | OR | 95% CI | p-value |
|---|---|---|---|
| sleep_hrs | 0.80 | 0.74, 0.87 | <0.001 |
| sex | |||
| Male | — | — | |
| Female | 1.15 | 0.56, 2.38 | 0.7 |
| smoker | |||
| Former/Never | — | — | |
| Current | 1.66 | 1.40, 1.96 | <0.001 |
| exercise | |||
| No | — | — | |
| Yes | 0.91 | 0.76, 1.10 | 0.3 |
| income_cat | 0.92 | 0.88, 0.95 | <0.001 |
| physhlth_days | 1.06 | 1.05, 1.06 | <0.001 |
| sleep_hrs * sex | |||
| sleep_hrs * Female | 1.05 | 0.95, 1.17 | 0.3 |
| Abbreviations: CI = Confidence Interval, OR = Odds Ratio | |||
Interpretation: The OR for sleep is 0.80, meaning each additional hour of sleep reduces the odds of FMD by 20% for the reference group (men). The OR for female sex is 1.15, but not significant (p = 0.70), meaning women do not differ from men in baseline odds of FMD after adjustment. —
3b. (5 pts) Perform a likelihood ratio test comparing the model with the interaction to the model without it.
mod_mlr_no_interact <- glm(
fmd ~ sleep_hrs + sex + smoker + exercise + income_cat + physhlth_days,
data = brfss_logistic,
family = binomial
)
anova(mod_mlr_no_interact, mod_mlr_interact, test = "LRT") |>
kable(
digits = 3,
caption = "LR Test for sleep hours and Sex Interaction"
) |>
kable_styling(bootstrap_options = "striped", full_width = FALSE)| Resid. Df | Resid. Dev | Df | Deviance | Pr(>Chi) |
|---|---|---|---|---|
| 4993 | 3762.018 | NA | NA | NA |
| 4992 | 3761.115 | 1 | 0.903 | 0.342 |
Interpretation: Because the LR test p‑value is 0.342, > α = 0.05, we fail to reject the null hypothesis that the sleep and sex interaction coefficient equals zero. This means there is no statistically significant evidence that the effect of sleep hours on frequent mental distress differs between men and women.
3c. (5 pts) Create a visualization of the
interaction using ggpredict() and plot().
ggpredict(mod_mlr_interact, terms = c("sleep_hrs", "sex")) |>
plot() +
labs(
title = "Predicted Probability of FMD by Sleep Hours and Sex",
subtitle = "Interaction model; ribbons = 95% confidence intervals",
x = "Sleeping Hours",
y = "Predicted Probability of FMD",
color = "Sex"
) +
scale_y_continuous(labels = scales::percent_format()) +
theme_minimal()Interpretation: More sleep is associated with a lower predicted probability of frequent mental distress (FMD) for both men and women. Women have somewhat higher predicted probabilities overall but the slopes are nearly parallel.
3d. (5 pts) In 3-4 sentences, interpret the interaction. Does the effect of one predictor differ across levels of the other? If statistically significant, report the stratum-specific odds ratios.
Since I want to look at sleep, I am comparing two sleep amounts Short Sleep (5 hours) vs. Healthy Sleep (8 hours).
ggpredict(mod_mlr_interact, terms = c("sleep_hrs [5, 8]", "sex")) |>
as_tibble() |>
pivot_wider(id_cols = group, names_from = x, values_from = predicted) |>
mutate(OR_5vs8 = (`5` / (1 - `5`)) / (`8` / (1 - `8`))) |>
dplyr::select(Sex = group, OR_5vs8) |>
kable(
digits = 3,
col.names = c("Sex", "OR (5 hrs vs. 8 hrs Sleep)"),
caption = "Odds of Outcome: Short Sleep (5h) vs. Adequate Sleep (8h) Stratified by Sex"
) |>
kable_styling(bootstrap_options = "striped", full_width = FALSE)| Sex | OR (5 hrs vs. 8 hrs Sleep) |
|---|---|
| Male | 1.949 |
| Female | 1.672 |
Interpretation: Men (OR = 1.949) Among men, sleeping 5 hours instead of 8 hours is associated with approximately 95% higher odds of frequent mental distress. In other words, short‑sleeping men have nearly double the odds of FMD compared with men who sleep adequately.
Women (OR = 1.672) Among women, sleeping 5 hours instead of 8 hours is associated with about 67% higher odds of frequent mental distress. Women sleeping lower hours show elevated risk, but the increase is smaller than in men.
4a. (5 pts) Compute McFadden’s pseudo-R² for your
full model using performance::r2_mcfadden().
## # R2 for Generalized Linear Regression
## R2: 0.115
## adj. R2: 0.115
Interpretation: An R² of 0.115 means that full model explains about 11.5% of the variation in the likelihood of frequent mental distress. In logistic regression, R² values are typically much lower than in linear regression, so an R² around 0.10–0.20 is common even for well‑performing models. This shows that while the predictors (sleep, smoking, sex, income, physical health days, etc.) cover a meaningful risk, most of the variability in mental distress is driven by factors not included in the model, which is expected for complex behavioral health outcomes.
4b. (5 pts) Perform the Hosmer-Lemeshow
goodness-of-fit test using
ResourceSelection::hoslem.test(). Report the test statistic
and p-value. Comment on the interpretation given your sample size.
hl_lab <- hoslem.test(
x = as.numeric(brfss_logistic$fmd) - 1,
y = fitted(mod_mlr_full),
g = 10
)
hl_lab##
## Hosmer and Lemeshow goodness of fit (GOF) test
##
## data: as.numeric(brfss_logistic$fmd) - 1, fitted(mod_mlr_full)
## X-squared = 16.297, df = 8, p-value = 0.03832
tibble(
Statistic = round(hl_lab$statistic, 3),
df = hl_lab$parameter,
`p-value` = format.pval(hl_lab$p.value, digits = 3, eps = 0.001)
) |>
kable(caption = "Hosmer-Lemeshow Goodness-of-Fit Test") |>
kable_styling(bootstrap_options = "striped", full_width = FALSE)| Statistic | df | p-value |
|---|---|---|
| 16.297 | 8 | 0.0383 |
Interpretation: The Hosmer-Lemeshow (HL) test evaluates whether the observed event proportions within deciles of predicted probability agree with the model’s predicted probabilities. Although the Hosmer–Lemeshow test was statistically significant (p = 0.038), it suggests minor miscalibration and this is likely due to the large sample size, and the model still appears to fit reasonably well based on other diagnostics.
4c. (5 pts) Create a calibration plot showing observed vs. predicted probabilities by decile. Comment on whether the model appears well calibrated.
brfss_logistic |>
mutate(
pred_prob = fitted(mod_mlr_full),
obs_outcome = as.numeric(fmd) - 1,
decile = ntile(pred_prob, 10)
) |>
group_by(decile) |>
summarise(
mean_pred = mean(pred_prob),
mean_obs = mean(obs_outcome),
n = n(),
.groups = "drop"
) |>
ggplot(aes(x = mean_pred, y = mean_obs)) +
geom_abline(slope = 1, intercept = 0, color = "red", linetype = "dashed",
linewidth = 0.8) +
geom_point(aes(size = n), color = "steelblue", alpha = 0.85) +
geom_line(color = "steelblue", linewidth = 0.7) +
scale_size_continuous(name = "Decile n", range = c(3, 8)) +
scale_x_continuous(labels = scales::percent_format()) +
scale_y_continuous(labels = scales::percent_format()) +
labs(
title = "Calibration Plot: Observed vs. Predicted Probability of FMD",
subtitle = "Each point = one predicted probability decile; dashed line = perfect calibration",
x = "Mean Predicted Probability (within decile)",
y = "Observed Proportion with FMD (within decile)"
) +
theme_minimal()Interpretation: The calibration plot compares the model’s predicted probabilities of frequent mental distress (FMD) with the observed proportions across deciles of predicted risk. The blue line (observed values) follows the red dashed line (perfect calibration) fairly closely, and it shows that the model’s predictions are well‑aligned with actual outcomes.
There is some mild deviation at a few deciles where the observed points fall slightly above or below the ideal line but overall it shows reasonable calibration, especially given the large sample size.
4d. (10 pts) Compute and plot the ROC curve using
pROC::roc(). Report the AUC. Based on the AUC value, how
would you describe the model’s discrimination ability (poor, acceptable,
excellent, outstanding)?
roc_lab <- roc(
response = brfss_logistic$fmd,
predictor = fitted(mod_mlr_full),
levels = c("No", "Yes"),
direction = "<"
)
auc_lab <- auc(roc_lab)
# ROC curve plot
ggroc(roc_lab, color = "#2166ac", linewidth = 1.3) +
geom_abline(slope = 1, intercept = 1, linetype = "dashed",
color = "red", linewidth = 0.8) +
annotate("text", x = 0.35, y = 0.15,
label = paste0("AUC = ", round(auc_lab, 3)),
size = 5, color = "#2166ac", fontface = "bold") +
labs(
title = "ROC Curve for Frequent Mental Distress — Lab Model",
subtitle = "Blue curve = model; red dashed = chance (AUC = 0.50)",
x = "Specificity (1 − False Positive Rate)",
y = "Sensitivity (True Positive Rate)"
) +
theme_minimal()# Print AUC with confidence interval
ci_auc <- ci.auc(roc_lab)
tibble(
AUC = round(as.numeric(auc_lab), 3),
`95% CI Lower` = round(ci_auc[1], 3),
`95% CI Upper` = round(ci_auc[3], 3)
) |>
kable(caption = "AUC with 95% Bootstrap Confidence Interval") |>
kable_styling(bootstrap_options = "striped", full_width = FALSE)| AUC | 95% CI Lower | 95% CI Upper |
|---|---|---|
| 0.736 | 0.716 | 0.757 |
Interpretation: An AUC of 0.736 falls in the range with acceptable discrimination.It means the model can reasonably distinguish between individuals with and without frequent mental distress, performing better than chance but not at the level of an excellent one.
Overall interpretation:
Frequent mental distress among U.S. adults is influenced by a combination of behavioral, demographic, and socioeconomic factors. In the fully adjusted model, shorter sleep duration, current smoking, lower income, and a greater number of physically unhealthy days each independently increased the odds of FMD, while exercise showed no meaningful association after adjustment. Model performance was solid, with acceptable discrimination (AUC = 0.736) and reasonable calibration, despite a slightly significant Hosmer–Lemeshow test that likely reflects the large sample size rather than substantive misfit. Interaction testing showed that the protective effect of sleep did not differ significantly by sex, and predicted‑probability plots confirmed parallel declines in FMD risk for men and women as sleep increased. Overall, the findings highlight sleep, smoking cessation, and physical health improvement as key important points for population‑level mental health interventions.
End of Lab Activity