Lab Setup

library(tidyverse)
library(broom)
library(knitr)
library(kableExtra)
library(gtsummary)
library(nnet)       # multinom() for polytomous regression
library(MASS)       # polr() for ordinal regression
library(brant)      # Brant test for proportional odds
library(ggeffects)

options(gtsummary.use_ftExtra = TRUE)
set_gtsummary_theme(theme_gtsummary_compact(set_theme = TRUE))

# Load the pre-saved dataset provided with the lecture
brfss_poly <- readRDS("brfss_polytomous_2020.rds")

cat("Dataset loaded. N =", nrow(brfss_poly), "\n")

## Dataset loaded. N = 5000

cat("Variables:", paste(names(brfss_poly), collapse = ", "), "\n")

## Variables: genhlth_ord, genhlth_nom, age, sex, bmi, exercise, income_cat, smoker

Task 1: Explore the Outcome (15 points)

1a. Frequency Table of `genhlth_ord` (5 pts)

brfss_poly |>
  dplyr::count(genhlth_ord) |>
  dplyr::mutate(
    Percentage = round(100 * n / sum(n), 1),
    Cumulative_Pct = cumsum(Percentage)
  ) |>
  knitr::kable(
    col.names = c("General Health Category", "N", "%", "Cumulative %"),
    caption   = "Table 1. Frequency Distribution of General Health (3-Level Outcome)"
  ) |>
  kableExtra::kable_styling(bootstrap_options = "striped", full_width = FALSE)

Table 1. Frequency Distribution of General Health (3-Level Outcome)
General Health Category	N	%	Cumulative %
Excellent/VG	2380	47.6	47.6
Good	1624	32.5	80.1
Fair/Poor	996	19.9	100.0

Interpretation: The sample is distributed across three ordered health categories: Excellent/VG (47.6%), Good (32.5%), and Fair/Poor (19.9%). The majority of respondents report Excellent or Very Good health, with a smaller proportion in the Fair/Poor category, consistent with the typical BRFSS distribution.

1b. Stacked Bar Chart: General Health by Smoking Status (5 pts)

brfss_poly |>
  dplyr::count(smoker, genhlth_ord) |>
  dplyr::group_by(smoker) |>
  dplyr::mutate(pct = 100 * n / sum(n)) |>
  dplyr::ungroup() |>
  ggplot(aes(x = smoker, y = pct, fill = genhlth_ord)) +
  geom_col(position = "stack", width = 0.6) +
  geom_text(
    aes(label = paste0(round(pct, 1), "%")),
    position = position_stack(vjust = 0.5),
    size = 4, color = "white", fontface = "bold"
  ) +
  scale_fill_manual(
    values = c(
      "Excellent/VG" = "#2E8B57",
      "Good"         = "#F0A500",
      "Fair/Poor"    = "#CD5C5C"
    ),
    name = "General Health"
  ) +
  labs(
    title    = "Figure 1. Distribution of General Health by Smoking Status",
    subtitle = "BRFSS 2020 analytical sample (N = 5,000)",
    x        = "Smoking Status",
    y        = "Percentage (%)",
    caption  = "Categories ordered from best (bottom) to worst (top) health."
  ) +
  theme_minimal() +
  theme(
    plot.title      = element_text(face = "bold", size = 13),
    axis.title      = element_text(size = 12),
    legend.position = "right"
  )

Figure 1. Distribution of General Health by Smoking Status

Figure 1 Interpretation: Current smokers show a markedly different health distribution compared to former/never smokers. The proportion reporting Fair/Poor health is notably higher among current smokers, while the proportion reporting Excellent/VG health is lower. This visual contrast motivates the inclusion of smoking status as a key predictor in the regression models.

1c. Descriptive Table Stratified by `genhlth_ord` (5 pts)

brfss_poly |>
  dplyr::select(genhlth_ord, age, sex, bmi, exercise, smoker, income_cat) |>
  tbl_summary(
    by = genhlth_ord,
    label = list(
      age        ~ "Age (years)",
      sex        ~ "Sex",
      bmi        ~ "BMI (kg/m²)",
      exercise   ~ "Exercise in past 30 days",
      smoker     ~ "Smoking status",
      income_cat ~ "Income category (1–8)"
    ),
    statistic = list(
      all_continuous()  ~ "{mean} ({sd})",
      all_categorical() ~ "{n} ({p}%)"
    ),
    digits = list(all_continuous() ~ 1)
  ) |>
  add_overall() |>
  bold_labels() |>
  modify_caption(
    "**Table 2. Descriptive Statistics Stratified by General Health Category**"
  ) |>
  modify_spanning_header(all_stat_cols() ~ "**General Health Category**")

**Table 2. Descriptive Statistics Stratified by General Health Category**
Characteristic	General Health Category
Characteristic	Overall N = 5,000¹	Excellent/VG N = 2,380¹	Good N = 1,624¹	Fair/Poor N = 996¹
Age (years)	57.0 (16.2)	54.7 (16.5)	57.9 (16.3)	60.8 (14.6)
Sex
Male	2,717 (54%)	1,315 (55%)	906 (56%)	496 (50%)
Female	2,283 (46%)	1,065 (45%)	718 (44%)	500 (50%)
BMI (kg/m²)	28.5 (6.3)	27.4 (5.2)	29.1 (6.4)	30.0 (8.0)
Exercise in past 30 days	3,630 (73%)	1,999 (84%)	1,161 (71%)	470 (47%)
Smoking status
Former/Never	3,304 (66%)	1,692 (71%)	1,015 (63%)	597 (60%)
Current	1,696 (34%)	688 (29%)	609 (38%)	399 (40%)
Income category (1–8)
1	225 (4.5%)	39 (1.6%)	72 (4.4%)	114 (11%)
2	271 (5.4%)	69 (2.9%)	92 (5.7%)	110 (11%)
3	403 (8.1%)	124 (5.2%)	128 (7.9%)	151 (15%)
4	512 (10%)	174 (7.3%)	188 (12%)	150 (15%)
5	519 (10%)	198 (8.3%)	197 (12%)	124 (12%)
6	714 (14%)	354 (15%)	249 (15%)	111 (11%)
7	844 (17%)	443 (19%)	289 (18%)	112 (11%)
8	1,512 (30%)	979 (41%)	409 (25%)	124 (12%)
¹ Mean (SD); n (%)

Table 2 Interpretation: Clear gradients in all predictors are visible across health categories. Mean age, BMI, and the proportion of current smokers increase progressively from Excellent/VG to Fair/Poor health. Conversely, exercise participation and income level decrease as health worsens. These patterns confirm that the selected predictors are meaningfully associated with the outcome and justify their inclusion in the regression models.

Task 2: Multinomial Logistic Regression (20 points)

2a. Fit Multinomial Logistic Regression Model (5 pts)

# Fit multinomial logistic regression with 4+ predictors
# Reference category for outcome: "Excellent/VG" (first level)
mod_multi <- multinom(
  genhlth_nom ~ age + sex + bmi + exercise + income_cat + smoker,
  data  = brfss_poly,
  trace = FALSE
)

cat("=== Multinomial Model Summary ===\n")

## === Multinomial Model Summary ===

summary(mod_multi)

## Call:
## multinom(formula = genhlth_nom ~ age + sex + bmi + exercise + 
##     income_cat + smoker, data = brfss_poly, trace = FALSE)
## 
## Coefficients:
##           (Intercept)        age  sexFemale        bmi exerciseYes income_cat
## Good        -1.317306 0.01497067 -0.1417121 0.05087197   -0.500636 -0.1693992
## Fair/Poor   -1.283880 0.02579060 -0.1257209 0.06741312   -1.338926 -0.3932631
##           smokerCurrent
## Good          0.4117899
## Fair/Poor     0.3436415
## 
## Std. Errors:
##           (Intercept)         age  sexFemale         bmi exerciseYes income_cat
## Good        0.2689762 0.002189706 0.06782701 0.005768280  0.08178912 0.01748213
## Fair/Poor   0.3296881 0.002916356 0.08583048 0.006778269  0.09155273 0.02090747
##           smokerCurrent
## Good         0.07723392
## Fair/Poor    0.09720041
## 
## Residual Deviance: 9287.557 
## AIC: 9315.557

Model specification: The multinomial logistic regression model predicts general health (nominal, 3 levels) from age, sex, BMI, exercise participation, income category, and smoking status. The reference category for the outcome is “Excellent/VG.” The model simultaneously estimates two sets of log relative risk ratios: (1) “Good” vs. “Excellent/VG” and (2) “Fair/Poor” vs. “Excellent/VG.”

2b. Relative Risk Ratios (RRRs) with 95% CIs (10 pts)

# Build tidy RRR table, drop intercept rows, format columns
rrr_full <- tidy(mod_multi, conf.int = TRUE, exponentiate = TRUE) |>
  dplyr::filter(term != "(Intercept)") |>
  dplyr::select(y.level, term, estimate, conf.low, conf.high, p.value) |>
  dplyr::mutate(
    dplyr::across(c(estimate, conf.low, conf.high), \(x) round(x, 3)),
    p.value = format.pval(p.value, digits = 3, eps = 0.001)
  )

# Split by comparison and render as two separate kables to avoid
# pack_rows() index errors (row count varies by model/data)
rrr_full |>
  dplyr::filter(y.level == "Good") |>
  dplyr::select(-y.level) |>
  knitr::kable(
    col.names = c("Predictor", "RRR", "Lower 95% CI", "Upper 95% CI", "p-value"),
    caption   = "Table 3a. RRRs: Good vs. Excellent/VG"
  ) |>
  kableExtra::kable_styling(bootstrap_options = c("striped", "hover"), full_width = FALSE)

Table 3a. RRRs: Good vs. Excellent/VG
Predictor	RRR	Lower 95% CI	Upper 95% CI	p-value
age	1.015	1.011	1.019	<0.001
sexFemale	0.868	0.760	0.991	0.0367
bmi	1.052	1.040	1.064	<0.001
exerciseYes	0.606	0.516	0.712	<0.001
income_cat	0.844	0.816	0.874	<0.001
smokerCurrent	1.510	1.297	1.756	<0.001

rrr_full |>
  dplyr::filter(y.level == "Fair/Poor") |>
  dplyr::select(-y.level) |>
  knitr::kable(
    col.names = c("Predictor", "RRR", "Lower 95% CI", "Upper 95% CI", "p-value"),
    caption   = "Table 3b. RRRs: Fair/Poor vs. Excellent/VG"
  ) |>
  kableExtra::kable_styling(bootstrap_options = c("striped", "hover"), full_width = FALSE)

Table 3b. RRRs: Fair/Poor vs. Excellent/VG
Predictor	RRR	Lower 95% CI	Upper 95% CI	p-value
age	1.026	1.020	1.032	<0.001
sexFemale	0.882	0.745	1.043	0.1430
bmi	1.070	1.056	1.084	<0.001
exerciseYes	0.262	0.219	0.314	<0.001
income_cat	0.675	0.648	0.703	<0.001
smokerCurrent	1.410	1.165	1.706	<0.001

2c. Interpretation of RRR for Smoking (Fair/Poor vs. Excellent/VG) (5 pts)

# Extract smoking RRR for Fair/Poor vs Excellent/VG
rrr_tbl <- tidy(mod_multi, conf.int = TRUE, exponentiate = TRUE) |>
  dplyr::filter(y.level == "Fair/Poor", term == "smokerCurrent")

rrr_val  <- round(rrr_tbl$estimate, 2)
rrr_low  <- round(rrr_tbl$conf.low, 2)
rrr_high <- round(rrr_tbl$conf.high, 2)
rrr_p    <- format.pval(rrr_tbl$p.value, digits = 3, eps = 0.001)

cat("Smoking RRR (Fair/Poor vs. Excellent/VG):", rrr_val,
    "(95% CI:", rrr_low, "–", rrr_high, "; p =", rrr_p, ")\n")

## Smoking RRR (Fair/Poor vs. Excellent/VG): 1.41 (95% CI: 1.17 – 1.71 ; p = <0.001 )

Interpretation of RRR for Current Smoking (Fair/Poor vs. Excellent/VG):

Holding age, sex, BMI, exercise, and income constant, current smokers have a relative risk ratio of 1.41 (95% CI: 1.17–1.71; p <0.001) for reporting Fair/Poor health compared to Excellent/Very Good health, relative to former/never smokers. In other words, the relative risk of being classified in the Fair/Poor health category (rather than Excellent/VG) is approximately 41% higher among current smokers than among former/never smokers, after adjusting for the other covariates in the model.

Task 3: Ordinal Logistic Regression (25 points)

3a. Fit Ordinal Logistic Regression Model (5 pts)

# Fit proportional odds model using MASS::polr()
# Hess = TRUE required to compute standard errors
mod_ord <- polr(
  genhlth_ord ~ age + sex + bmi + exercise + income_cat + smoker,
  data = brfss_poly,
  Hess = TRUE
)

cat("=== Ordinal Model Summary ===\n")

## === Ordinal Model Summary ===

summary(mod_ord)

## Call:
## polr(formula = genhlth_ord ~ age + sex + bmi + exercise + income_cat + 
##     smoker, data = brfss_poly, Hess = TRUE)
## 
## Coefficients:
##                  Value Std. Error t value
## age            0.01797   0.001867   9.627
## sexFemale     -0.12885   0.056835  -2.267
## bmi            0.04973   0.004534  10.967
## exerciseYes   -0.92167   0.064426 -14.306
## income_cat    -0.26967   0.014113 -19.107
## smokerCurrent  0.29313   0.064304   4.558
## 
## Intercepts:
##                   Value    Std. Error t value 
## Excellent/VG|Good   0.0985   0.2200     0.4478
## Good|Fair/Poor      1.8728   0.2212     8.4650
## 
## Residual Deviance: 9318.274 
## AIC: 9334.274

Model specification: The proportional odds (cumulative logit) model regresses the ordered health outcome (Excellent/VG < Good < Fair/Poor) on the same set of predictors. Unlike the multinomial model, a single set of regression coefficients is estimated and assumed to apply equally across both cumulative cut-points (≤ Excellent/VG vs. > Excellent/VG; and ≤ Good vs. Fair/Poor). Two intercepts (threshold parameters, labeled zeta) mark the cut-points on the latent health scale.

3b. Cumulative Odds Ratios with 95% CIs (5 pts)

tidy(mod_ord, conf.int = TRUE, exponentiate = TRUE) |>
  dplyr::filter(coef.type == "coefficient") |>
  dplyr::select(term, estimate, conf.low, conf.high) |>
  dplyr::mutate(
    dplyr::across(c(estimate, conf.low, conf.high), \(x) round(x, 3)),
    Direction = dplyr::case_when(
      estimate > 1 ~ "\u2191 Worse health",
      estimate < 1 ~ "\u2193 Better health",
      TRUE         ~ "Null"
    )
  ) |>
  knitr::kable(
    col.names = c("Predictor", "Cumulative OR", "Lower 95% CI", "Upper 95% CI",
                  "Direction"),
    caption   = "Table 4. Ordinal Logistic Regression: Cumulative Odds Ratios"
  ) |>
  kableExtra::kable_styling(bootstrap_options = c("striped", "hover"), full_width = FALSE)

Table 4. Ordinal Logistic Regression: Cumulative Odds Ratios
Predictor	Cumulative OR	Lower 95% CI	Upper 95% CI	Direction
age	1.018	1.014	1.022	↑ Worse health
sexFemale	0.879	0.786	0.983	↓ Better health
bmi	1.051	1.042	1.060	↑ Worse health
exerciseYes	0.398	0.351	0.451	↓ Better health
income_cat	0.764	0.743	0.785	↓ Better health
smokerCurrent	1.341	1.182	1.521	↑ Worse health

3c. Plain-Language Interpretation of One OR (5 pts)

# Extract income OR
or_income <- tidy(mod_ord, conf.int = TRUE, exponentiate = TRUE) |>
  dplyr::filter(coef.type == "coefficient", term == "income_cat")

or_val  <- round(or_income$estimate, 3)
or_low  <- round(or_income$conf.low, 3)
or_high <- round(or_income$conf.high, 3)

cat("Income OR:", or_val, "(95% CI:", or_low, "–", or_high, ")\n")

## Income OR: 0.764 (95% CI: 0.743 – 0.785 )

Interpretation of the Cumulative OR for Income Category:

Each one-unit increase in income category (on the 1–8 scale) is associated with a cumulative odds ratio of 0.764 (95% CI: 0.743–0.785) for reporting worse general health. Because this OR is less than 1.0, higher income is associated with lower odds of worse health at every cut-point. Specifically: moving up one income level multiplies the odds of reporting “Good or worse” health (vs. “Excellent/VG”) by 0.764, AND it multiplies the odds of reporting “Fair/Poor” (vs. “Excellent/VG or Good”) by the same factor of 0.764. The “at every cut-point” property is the core of the proportional odds assumption — the income gradient is assumed to operate with the same magnitude throughout the health scale, not just at one threshold.

3d. Predicted Probabilities by BMI (10 pts)

ggpredict(mod_ord, terms = "bmi [10:60 by=1]") |>
  plot() +
  labs(
    title    = "Figure 2. Predicted Probability of Each Health Category by BMI",
    subtitle = "Ordinal logistic regression; all other predictors held at mean/reference values",
    x        = "BMI (kg/m²)",
    y        = "Predicted Probability",
    caption  = "Shaded bands = 95% confidence intervals. Predictions from MASS::polr() model."
  ) +
  theme_minimal() +
  theme(
    plot.title    = element_text(face = "bold", size = 13),
    plot.subtitle = element_text(size = 11)
  )

Figure 2. Predicted Probability of Each Health Category by BMI (Ordinal Model)

Figure 2 Interpretation: As BMI increases, the predicted probability of reporting Excellent/VG health declines steadily, while the probability of Fair/Poor health rises. The probability of reporting Good health remains comparatively stable across the BMI range, as higher BMI primarily shifts respondents between the two health extremes rather than into or out of the middle category. This gradient is consistent with the well-established epidemiologic literature linking obesity to poorer self-rated health, and reflects the significant negative income-health relationship captured by the single cumulative OR from the ordinal model.

Task 4: Check Assumptions and Compare (15 points)

4a. Brant Test for Proportional Odds Assumption (5 pts)

brant_out <- brant(mod_ord)

## -------------------------------------------- 
## Test for X2  df  probability 
## -------------------------------------------- 
## Omnibus      35.87   6   0
## age      0.06    1   0.81
## sexFemale    0.89    1   0.34
## bmi      7.1 1   0.01
## exerciseYes  10.58   1   0
## income_cat   10.54   1   0
## smokerCurrent    7.76    1   0.01
## -------------------------------------------- 
## 
## H0: Parallel Regression Assumption holds

as.data.frame(brant_out) |>
  tibble::rownames_to_column("Predictor") |>
  dplyr::mutate(
    X2          = round(X2, 2),
    probability = ifelse(probability < 0.001,
                         "< 0.001",
                         sprintf("%.3f", probability)),
    Violated    = dplyr::case_when(
      Predictor == "Omnibus" & (probability == "< 0.001" |
        suppressWarnings(as.numeric(gsub("< ", "", probability))) < 0.05) ~ "Yes (overall model)",
      Predictor == "Omnibus" ~ "No (overall model)",
      probability == "< 0.001" |
        suppressWarnings(as.numeric(gsub("< ", "", probability))) < 0.05  ~ "\u26a0\ufe0f Yes",
      TRUE ~ ""
    )
  ) |>
  knitr::kable(
    col.names = c("Predictor", "Chi-square", "df", "p-value",
                  "Assumption Violated?"),
    caption   = "Table 5. Brant Test for the Proportional Odds Assumption",
    align     = c("l", "r", "r", "r", "c")
  ) |>
  kableExtra::kable_styling(bootstrap_options = c("striped", "hover"), full_width = FALSE) |>
  kableExtra::row_spec(0, bold = TRUE) |>
  kableExtra::row_spec(1, bold = TRUE, background = "#f0f0f0")

Table 5. Brant Test for the Proportional Odds Assumption
Predictor	Chi-square	df	p-value	Assumption Violated?
Omnibus	35.87	6	< 0.001	Yes (overall model)
age	0.06	1	0.814
sexFemale	0.89	1	0.345
bmi	7.10	1	0.008	⚠️ Yes
exerciseYes	10.58	1	0.001	⚠️ Yes
income_cat	10.54	1	0.001	⚠️ Yes
smokerCurrent	7.76	1	0.005	⚠️ Yes

Brant Test Interpretation: The omnibus Brant test evaluates whether the proportional odds assumption holds for the model as a whole, and predictor-level tests identify which specific variables may be driving any violation. A p-value < 0.05 for the omnibus test indicates that the assumption fails overall; predictor-level p-values < 0.05 flag specific variables whose effect on health differs across cut-points. If violations are detected, the results of the ordinal model should be interpreted cautiously; use of the multinomial model or a partial proportional odds model may be more appropriate.

4b. AIC Comparison: Multinomial vs. Ordinal Model (5 pts)

tibble::tibble(
  Model       = c("Multinomial (nnet::multinom)", "Ordinal PO (MASS::polr)"),
  Parameters  = c(mod_multi$edf,
                  length(coef(mod_ord)) + length(mod_ord$zeta)),
  AIC         = round(c(AIC(mod_multi), AIC(mod_ord)), 2),
  Delta_AIC   = round(c(0, AIC(mod_ord) - AIC(mod_multi)), 2)
) |>
  knitr::kable(
    col.names = c("Model", "No. Parameters", "AIC", "ΔAIC vs. Multinomial"),
    caption   = "Table 6. Model Comparison: Multinomial vs. Ordinal Logistic Regression"
  ) |>
  kableExtra::kable_styling(bootstrap_options = "striped", full_width = FALSE)

Table 6. Model Comparison: Multinomial vs. Ordinal Logistic Regression
Model	No. Parameters	AIC	ΔAIC vs. Multinomial
Multinomial (nnet::multinom)	14	9315.56	0.00
Ordinal PO (MASS::polr)	8	9334.27	18.72

cat("Multinomial AIC:", round(AIC(mod_multi), 2), "\n")

## Multinomial AIC: 9315.56

cat("Ordinal AIC:    ", round(AIC(mod_ord), 2), "\n")

## Ordinal AIC:     9334.27

cat("Difference (Ordinal - Multinomial):", round(AIC(mod_ord) - AIC(mod_multi), 2), "\n")

## Difference (Ordinal - Multinomial): 18.72

AIC Comparison Interpretation: The multinomial model uses 14 parameters compared to 8 for the ordinal model. The model with the lower AIC provides a better balance of fit and parsimony. A ΔAIC > 2 in favor of one model is considered meaningful evidence for that model; a ΔAIC > 10 is strong evidence. The direction and magnitude of this difference, combined with the Brant test results, inform the final model recommendation below.

4c. Model Recommendation (5 pts)

aic_multi <- round(AIC(mod_multi), 2)
aic_ord   <- round(AIC(mod_ord), 2)
delta_aic <- round(aic_ord - aic_multi, 2)

cat("Model recommendation summary:\n")

## Model recommendation summary:

cat("  Multinomial AIC:  ", aic_multi, "\n")

##   Multinomial AIC:   9315.56

cat("  Ordinal AIC:      ", aic_ord, "\n")

##   Ordinal AIC:       9334.27

cat("  Delta AIC:        ", delta_aic,
    ifelse(delta_aic > 0,
           " (multinomial fits better)",
           " (ordinal fits better)"), "\n")

##   Delta AIC:         18.71  (multinomial fits better)

Model Recommendation:

The choice between models depends on two criteria examined above: (1) whether the proportional odds assumption holds (Brant test), and (2) which model achieves a better AIC.

If the Brant omnibus test is non-significant (p ≥ 0.05) and the ordinal model has a lower or comparable AIC, the ordinal proportional odds model is preferred. It is more parsimonious (fewer parameters), produces a single interpretable OR that applies consistently across the entire health scale, and is the standard approach when the outcome is ordered and the PO assumption holds.

If the Brant test is significant (p < 0.05), the proportional odds assumption is violated and the ordinal OR is misleading — it averages over effects that actually differ by cut-point. In that case, the multinomial model is more appropriate despite using more parameters, because it allows the effect of each predictor to vary freely across outcome comparisons.

In practice, if only specific predictors violate the PO assumption (predictor-level Brant p < 0.05 for a subset), a partial proportional odds model (via VGAM::vglm()) is the most flexible solution, allowing those variables to have cut-point-specific effects while retaining proportional odds for the remainder.

Session Information

sessionInfo()

## R version 4.4.2 (2024-10-31 ucrt)
## Platform: x86_64-w64-mingw32/x64
## Running under: Windows 10 x64 (build 19045)
## 
## Matrix products: default
## 
## 
## locale:
## [1] LC_COLLATE=English_United States.utf8 
## [2] LC_CTYPE=English_United States.utf8   
## [3] LC_MONETARY=English_United States.utf8
## [4] LC_NUMERIC=C                          
## [5] LC_TIME=English_United States.utf8    
## 
## time zone: America/New_York
## tzcode source: internal
## 
## attached base packages:
## [1] stats     graphics  grDevices utils     datasets  methods   base     
## 
## other attached packages:
##  [1] ggeffects_2.3.2  brant_0.3-0      MASS_7.3-61      nnet_7.3-19     
##  [5] gtsummary_2.5.0  kableExtra_1.4.0 knitr_1.51       broom_1.0.12    
##  [9] lubridate_1.9.3  forcats_1.0.0    stringr_1.5.1    dplyr_1.2.1     
## [13] purrr_1.0.2      readr_2.1.5      tidyr_1.3.1      tibble_3.2.1    
## [17] ggplot2_4.0.2    tidyverse_2.0.0 
## 
## loaded via a namespace (and not attached):
##  [1] gtable_0.3.6       xfun_0.56          bslib_0.10.0       insight_1.4.6     
##  [5] lattice_0.22-6     tzdb_0.4.0         vctrs_0.7.3        tools_4.4.2       
##  [9] generics_0.1.4     datawizard_1.3.0   pkgconfig_2.0.3    Matrix_1.7-1      
## [13] RColorBrewer_1.1-3 S7_0.2.1           gt_1.3.0           lifecycle_1.0.5   
## [17] compiler_4.4.2     farver_2.1.2       textshaping_0.4.0  litedown_0.9      
## [21] htmltools_0.5.8.1  sass_0.4.10        yaml_2.3.10        pillar_1.11.1     
## [25] jquerylib_0.1.4    cachem_1.1.0       commonmark_2.0.0   tidyselect_1.2.1  
## [29] digest_0.6.37      stringi_1.8.4      labeling_0.4.3     fastmap_1.2.0     
## [33] grid_4.4.2         cli_3.6.3          magrittr_2.0.3     cards_0.7.1       
## [37] withr_3.0.2        scales_1.4.0       backports_1.5.0    timechange_0.3.0  
## [41] rmarkdown_2.30     otel_0.2.0         hms_1.1.4          evaluate_1.0.5    
## [45] haven_2.5.5        viridisLite_0.4.3  markdown_2.0       rlang_1.1.7       
## [49] glue_1.8.0         xml2_1.5.2         svglite_2.2.2      rstudioapi_0.18.0 
## [53] jsonlite_2.0.0     R6_2.6.1           systemfonts_1.3.1  fs_1.6.6

EPI 553 — Polytomous and Ordinal Regression Lab

Fizza Zaheer

April 16, 2026

Lab Setup

Task 1: Explore the Outcome (15 points)

1a. Frequency Table of `genhlth_ord` (5 pts)

1b. Stacked Bar Chart: General Health by Smoking Status (5 pts)

1c. Descriptive Table Stratified by `genhlth_ord` (5 pts)

Task 2: Multinomial Logistic Regression (20 points)

2a. Fit Multinomial Logistic Regression Model (5 pts)

2b. Relative Risk Ratios (RRRs) with 95% CIs (10 pts)

2c. Interpretation of RRR for Smoking (Fair/Poor vs. Excellent/VG) (5 pts)

Task 3: Ordinal Logistic Regression (25 points)

3a. Fit Ordinal Logistic Regression Model (5 pts)

3b. Cumulative Odds Ratios with 95% CIs (5 pts)

3c. Plain-Language Interpretation of One OR (5 pts)

3d. Predicted Probabilities by BMI (10 pts)

Task 4: Check Assumptions and Compare (15 points)

4a. Brant Test for Proportional Odds Assumption (5 pts)

4b. AIC Comparison: Multinomial vs. Ordinal Model (5 pts)

4c. Model Recommendation (5 pts)

Session Information

EPI 553 — Polytomous and Ordinal Regression Lab

Fizza Zaheer

April 16, 2026

Lab Setup

Task 1: Explore the Outcome (15 points)

1a. Frequency Table of genhlth_ord (5 pts)

1b. Stacked Bar Chart: General Health by Smoking Status (5 pts)

1c. Descriptive Table Stratified by genhlth_ord (5 pts)

Task 2: Multinomial Logistic Regression (20 points)

2a. Fit Multinomial Logistic Regression Model (5 pts)

2b. Relative Risk Ratios (RRRs) with 95% CIs (10 pts)

2c. Interpretation of RRR for Smoking (Fair/Poor vs. Excellent/VG) (5 pts)

Task 3: Ordinal Logistic Regression (25 points)

3a. Fit Ordinal Logistic Regression Model (5 pts)

3b. Cumulative Odds Ratios with 95% CIs (5 pts)

3c. Plain-Language Interpretation of One OR (5 pts)

3d. Predicted Probabilities by BMI (10 pts)

Task 4: Check Assumptions and Compare (15 points)

4a. Brant Test for Proportional Odds Assumption (5 pts)

4b. AIC Comparison: Multinomial vs. Ordinal Model (5 pts)

4c. Model Recommendation (5 pts)

Session Information

1a. Frequency Table of `genhlth_ord` (5 pts)

1c. Descriptive Table Stratified by `genhlth_ord` (5 pts)