Part 0. Data Loading and Setup
0.1. Load Required Packages
- Required packages for data cleaning: tidyverse
- Required packages for presentation: knitr, kableExtra
- Required packages for data retrieval: vroom
0.2 Import Map of Run IDs
# Import Project Lookup
#--------------------------------------------------------------------------------------------------
# This can be used to add additional years or remove years more quickly. This is currently
# set to 2018-2019, but by switching the excel here to:
# "LU_byYear_2016-2019-AlexMossawir_3.15.26.xlsx"
# the full 2016-2019 range can be ingested.
#--------------------------------------------------------------------------------------------------
lu_run <- readRDS("data/output/output_4.11.rds")
##Print LU for Reference
lu_run %>%
select(lu_year, lu_url, lu_dict, limit, data_timestamp) |>
kable(
caption = "Project Lookup"
)
0.3. Import Final Phase 1 Dataset
Loop through the selected run lookup, import specified RDS for every
run_id, store timestamp
import_path_name <- "phase_1_processed"
import_destinationcol_name <- "a_data"
## File's processed data saved to
lu_run$import_cached_data <- file.path(paste0("data/",import_path_name,"/",lu_run$run_id,".rds"))
lu_run$import_cached_ts <- file.path(paste0("data/",import_path_name,"/",lu_run$run_id,"_ts.txt"))
for (i in seq_len(nrow(lu_run))) {
##Read RDS from listed path
lu_run[[import_destinationcol_name]][[i]] <- readRDS(
lu_run$import_cached_data[[i]]
)
## Store date read from timestamp
lu_run[[paste0(import_destinationcol_name,"_ts")]][[i]] <- readLines(lu_run$import_cached_ts[[i]])
}
rm(import_destinationcol_name, import_path_name, i)
0.4. Import Dictionaries
import_path_name <- "phase_1_dicts"
import_dictionary_path <- file.path(paste0("data/",import_path_name,"/dictionary.rds"))
import_dictionary_ts <- file.path(paste0("data/",import_path_name,"/dictionary_ts.txt"))
#Dictionary
dicts <- readRDS(import_dictionary_path)
rm(import_dictionary_path, import_path_name)
Part 1. Analytical Sample Update
The analytical sample has been updated to add the following rules
since the first check-in:
- Exclude observations where SPARCS-encoded sex is “U”, as a result of
the small sample size for this group.
- Exclude observations where CCSR diagnosis is missing.
1.1. Analytical Sample Size
The workflow has filtered the data down to the analytical sample and
outputted the results of each step of the process into a complete audit
table.
# Cache Info and Location
#--------------------------------------------------------------------------------------------------
audit_table <- lu_run$audit %>% bind_rows() %>%
filter(applied == TRUE & !(n_change == 0)) %>%
arrange(run_id, rule_id) %>%
mutate(
year = factor(run_id,
levels = c("puf_2016","puf_2017","puf_2018","puf_2019"),
labels = c("2016",
"2017",
"2018",
"2019")
))
n_change <- audit_table%>%
pivot_wider(names_from = year, values_from = c(n_change)) %>%
# filter(applied == TRUE) %>%
mutate(
logged_as = factor(logged_as, levels = unique(logged_as)) ) %>%
group_by(logged_as) %>%
summarize(
# "2016" = sum(`2016`, na.rm=TRUE),
# "2017" = sum(`2017`, na.rm=TRUE),
"2018" = sum(`2018`, na.rm=TRUE),
"2019" = sum(`2019`, na.rm=TRUE))
n_initial <- audit_table %>%
group_by(year) %>%
summarize(
"logged_as" = "Initial Sample Size",
"n" = max(n_initial)
) %>%
pivot_wider(names_from = year, values_from = n)
n_final <- audit_table %>%
group_by(year) %>%
summarize(
"logged_as" = "Final Analytical Sample",
"n" = min(n_final)
) %>%
pivot_wider(names_from = year, values_from = n)
sample_size <- bind_rows(
n_initial, n_change, n_final)
sample_size %>% kable(
caption= "Analytical Sample Size",
col.names = c("",
#"2016",
#"2017",
"2018",
"2019"),
format.args = list(big.mark = ",")
) %>% kable_styling() %>%
row_spec(1, bold=TRUE) %>%
row_spec(nrow(sample_size), bold = TRUE)
Analytical Sample Size
|
|
2018
|
2019
|
|
Initial Sample Size
|
2,352,807
|
2,339,462
|
|
Enhanced De-ID
|
8,804
|
9,071
|
|
Exclude Newborns
|
212,809
|
210,594
|
|
Include Only Medical Admissions
|
562,563
|
563,070
|
|
Exclude unknown gender due to cell size limits
|
13
|
15
|
|
Exclude missing CCSR diagnoses
|
46
|
38
|
|
Final Analytical Sample
|
1,568,572
|
1,556,674
|
rm(n_change, n_initial, n_final)
Part 2. Regression Model Specification
This question requires a multi-level model wherein the outcome has a
relationship with both micro-level variables (individual case-mix
qualities of the discharge), and macro-level variables (hospital-level
qualities).
This initial model will focus on the outcome INF002
(Septicemia), based on the results of the exploratory data
analysis and subsequent literature review.
2.1. Model Type
Because the outcome for this analysis is a binary variable (discharge
diagnosis in specified category = 1), a logistic regression will be
used. This analysis will use glmer within the
lme4 package, to implement a mixed-effects model.
2.2. Regression Equation
The logit of the probability that a discharge was assigned INF002 was
regressed on permanent facility ID (random intercept), adjusting for age
group, gender, grouped primary payment typology, and APR Severity of
Illness.
\[\begin{aligned}
\text{logit}\left( P(Y_{ij} = 1) \right)
&= \beta_0 \\
&\quad + \beta^{(\text{year})}_{2019}\,\text{Year}_{2019,ij} \\
&\quad + \beta^{(\text{age})}_{18\!-\!29}\,\text{Age}_{18\!-\!29,ij}
\\
&\quad + \beta^{(\text{age})}_{30\!-\!49}\,\text{Age}_{30\!-\!49,ij}
\\
&\quad + \beta^{(\text{age})}_{50\!-\!69}\,\text{Age}_{50\!-\!69,ij}
\\
&\quad + \beta^{(\text{age})}_{70+}\,\text{Age}_{70+,ij} \\
&\quad + \beta^{(\text{gender})}_{\text{male}}\,\text{Male}_{ij} \\
&\quad +
\beta^{(\text{payor})}_{\text{public}}\,\text{Payor}_{\text{public},ij}
\\
&\quad +
\beta^{(\text{payor})}_{\text{self}}\,\text{Payor}_{\text{self},ij} \\
&\quad +
\beta^{(\text{payor})}_{\text{other}}\,\text{Payor}_{\text{other},ij} \\
&\quad +
\beta^{(\text{soi})}_{\text{moderate}}\,\text{SOI}_{\text{moderate},ij}
\\
&\quad +
\beta^{(\text{soi})}_{\text{major}}\,\text{SOI}_{\text{major},ij} \\
&\quad +
\beta^{(\text{soi})}_{\text{extreme}}\,\text{SOI}_{\text{extreme},ij} \\
&\quad + u_j
\end{aligned}\]
\[
u_j \sim N(0, \sigma^2_u)
\]
2.3. Reference Categories
The reference category for each categorical variable is as
follows:
- Year - 2018
- Total discharge volumes between 2018 and 2019 are nearly identical,
so both years are viable anchor points, however, 2018 precedes 2019,
making this a clearly interpretable reference point.
- Age - Ages 0-17
- Referencing on the youngest group makes interpretation intuitive,
where each coefficient reflects a change in age relative to the youngest
group.
- Younger age groups typically have lower sepsis rates, based on the
literature review as well as initial investigation this data, so the
0-17 group is clinically meaningful.
- Gender - Female
- Payor - Private
- While NYS does have notably high Medicaid enrollment relative to
other states, the majority of people in the United States are enrolled
in private healthcare.
- APR SOI - Minor
- Minor represents the lowest APR Severity of Illness
classification.
2.4. Covariate Justification
The goal of this analysis is to isolate hospital-level effects within
the model for adjustment for epidemiological spatial surveillance. While
this is not for risk-adjustment, much of the research on modeling and
examining inter-hospital effects of administrative data has been done
for purposes of risk-adjustment, so the core elements of the case-mix
adjustment have been drawn from this field. Age group
and gender are stable, widely available demographic
categories present in administrative data that is often used to help
isolate hospital-level effects from differences in patient
composition.
In 2015, hospitals transitioned from ICD-9 to ICD-10. The inclusion
of year in the analysis is intended to capture temporal
drift in general and in particular the changes to administrative
practices in the years following the adoption of ICD-10.
Payment typology reflects both patient-level
socioeconomic indicators, as well as structural factors that may
influence coding behavior at the facility-level. This is being included
as a patient-level covariate to control for case-mix SES, but it is also
a potential confounder. Further sensitivity analysis incorporating more
hospital-level factors (full facility payment typology in addition to
external sources of facility sturcture) may be warranted.
APR Severity of Illness (SOI) is included as a
case-mix proxy for true clinical severity from fields provided in the
PUF. Because these are assigned downstream of coding behavior, they
cannot be fully disentangled from coding behavior in this dataset,
however, it does represent the best available clinical severity proxy
given these constraints.
Part 3. Regression Results
3.1. Generate Models
Due to the volume of data, processed models get cached in folders
specified with save_model to minimize reprocessing.
Unadjusted model is fit only on a hospital-level random
intercept.
a_data_full <- bind_rows(lu_run$a_data)
save_model <- "models/unadjusted_glm_inf002.rds"
mod_unadjusted <- if (file.exists(save_model)) {
print(paste0("Loading working data from cache: ", save_model))
readRDS(save_model)
} else {
print("Fitting full model.")
glmer(dx_INF002~(1|v_facilityid),
family=binomial, data=a_data_full,
control=glmerControl(optimizer="bobyqa",optCtrl=list(maxfun=2e5)))
}
## [1] "Loading working data from cache: models/unadjusted_glm_inf002.rds"
if (file.exists(save_model)) {
print("Retain")
} else {
saveRDS(mod_unadjusted, save_model)
print("Writing rds")
}
## [1] "Retain"
Adjusted model is is fit on hospital-level random
intercept and discharge-level fixed effects.
save_model <- "models/m1_full_inf002.rds"
mod_m1inf002 <- if (file.exists(save_model)) {
print(paste0("Loading working data from cache: ", save_model))
readRDS(save_model)
} else {
print("Fitting full model.")
mod_m1inf002 <- glmer(dx_INF002~ vp_year + vp_age + vp_gender + vp_apr_soi + vp_pay_1 + (1|v_facilityid),
family=binomial, data=a_data_adjusted,
control=glmerControl(optimizer="bobyqa",optCtrl=list(maxfun=2e5)))
}
## [1] "Loading working data from cache: models/m1_full_inf002.rds"
if (file.exists(save_model)) {
print("Retain")
} else {
saveRDS(mod_m1inf002, save_model)
print("Writing rds")
}
## [1] "Retain"
rm(a_data_adjusted, save_model)
3.3. Display Model Comparison
summary_file <- "models/summary_comparison_df.rds"
if (file.exists(summary_file)) {
print(paste0("Loading summary from cache: ", summary_file))
summary_df <- readRDS(summary_file)
} else {
print("Generating model summary and caching...")
summary_df <- modelsummary(
list("Unadjusted" = mod_unadjusted, "Adjusted" = mod_m1inf002),
coef_map = c("SD (Intercept v_facilityid)" = "SD (Intercept) — Facility"),
gof_map = c("icc", "r2.marginal", "r2.conditional", "nobs", "aic"),
output = "dataframe"
)
saveRDS(summary_df, summary_file)
}
## [1] "Loading summary from cache: models/summary_comparison_df.rds"
summary_file <- "models/summary_comparison.rds"
if (file.exists(summary_file)) {
print(paste0("Loading summary from cache: ", summary_file))
summary_variance_only <- readRDS(summary_file)
} else {
print("Generating model summary and caching...")
summary_variance_only <- modelsummary(
list("Unadjusted" = mod_unadjusted, "Adjusted" = mod_m1inf002),
coef_map = c("SD (Intercept v_facilityid)" = "SD (Intercept) — Facility"),
gof_map = c("icc", "r2.marginal", "r2.conditional", "nobs", "aic")
)
saveRDS(summary_variance_only, summary_file)
}
## [1] "Loading summary from cache: models/summary_comparison.rds"
| |
Unadjusted |
Adjusted |
| SD (Intercept) — Facility |
1.595 |
1.324 |
| ICC |
0.4 |
0.3 |
| R2 Marg. |
0.000 |
0.203 |
| R2 Cond. |
0.436 |
0.480 |
| Num.Obs. |
3125246 |
3125246 |
| AIC |
1617567.7 |
1383976.7 |
The unadjusted model, containing just a random
intercept, indicates that hospital identity alone accounts for 40% of
the variation in whether a discharge is coded as septicemia. This high
ICC, reflecting linkage between hospital and outcome, aligns with the
existing literature - the burden of sepsis is reflected inconsistently
in electronic health records and coding processes (Fleischmann-Struzek,
2023; Schwarzkopf, 2019), and additionally subject to regional
variations in incidence and socioeconomic factors - both in the
patient’s community and in the hospital facilities (Moore, 2017;
Lippert, 2022).
Further, different hospitals receive different patients. Adjusting
the model to include case-mix covariates reduces the ICC to 32% -
indicating that even with control for case-mix, there is still
substantial hospital-level variation.
The Marginal \(R^2\), reflecting the
variance explained by fixed effects alone is 0.203. Conditional \(R^2\), reflecting total variance explained
by both fixed and random effects, increases only marginally from .44 to
.48. This indicates that the fixed effects are redistributing variance
previously absorbed by the random intercept, rather than explaining new
variance in the outcome.
Given the high remaining between-hospital variance for Septicemia,
true differences in prevalence vs coding artifacts cannot be
disentangled without the addition of additional hospital-level
structural covariates.
3.4. Likelihood Ratio Test
kable(anova(mod_unadjusted, mod_m1inf002))
| mod_unadjusted |
2 |
1617568 |
1617594 |
-808781.8 |
1617564 |
NA |
NA |
NA |
| mod_m1inf002 |
14 |
1383977 |
1384158 |
-691974.4 |
1383949 |
233614.9 |
12 |
0 |
The adjusted model represents a statistically significant improvement
in fit over the unadjusted model (\(\chi^2(12)
= 233{,}615\), p <0.001). AIC and BIC decrease substantially
confirming the fixed effects contribute meaningfully to model fit.
3.5. Fixed Effects Odds Ratios
Fixed effect ORs are conditional on facility, representing all
within-hospital comparisons. Because the sample size is 3.2M, almost all
effects are statistically significant, and effect sizes are the more
informative signal.
Age, gender and
payor behave as expected as case-mix proxies - age has
a strong monotonic effect in line with existing literature, while gender
and payor category had small effects. There is a slight decline in odds
by year, as 2019 has a 0.93 OR vs. 2018.
APR-SOI is the dominant predictor, driving the
majority of the explanatory power of the model. The OR of Extreme
illness (relative to minor) is 32.6. This is in part a result of
Septicemia being definitionally severe, with the APR-SOI being derived
from the same diagnosis codes, so the adjustment is partially circular.
While further disentangling of this effect is not possible with the data
available in the PUF, deeper analysis with covariates from a more
limited SPARCS tier may offer further insight.
mod_m1inf002 |>
tbl_regression(
exponentiate = TRUE,
label = list(
vp_year ~ "Discharge year",
vp_age ~ "Patient age group (years)",
vp_gender ~ "Patient gender",
vp_apr_soi ~ "APR Severity of Illness",
vp_pay_1 ~ "Discharge primary payment typology"
)
) |>
bold_labels() |>
bold_p() |>
as_flex_table()
Characteristic | OR | 95% CI | p-value |
|---|
Discharge year |
|
|
|
2018 | — | — |
|
2019 | 0.93 | 0.93, 0.94 | <0.001 |
Patient age group (years) |
|
|
|
0-17 | — | — |
|
18-29 | 1.82 | 1.74, 1.91 | <0.001 |
30-49 | 2.22 | 2.13, 2.32 | <0.001 |
50-69 | 2.62 | 2.51, 2.73 | <0.001 |
70+ | 2.92 | 2.80, 3.04 | <0.001 |
Patient gender |
|
|
|
Female | — | — |
|
Male | 1.05 | 1.04, 1.06 | <0.001 |
APR Severity of Illness |
|
|
|
Minor | — | — |
|
Moderate | 3.11 | 3.03, 3.19 | <0.001 |
Major | 8.58 | 8.37, 8.80 | <0.001 |
Extreme | 32.6 | 31.8, 33.4 | <0.001 |
Discharge primary payment typology |
|
|
|
Private | — | — |
|
Public | 0.95 | 0.93, 0.96 | <0.001 |
Self-Pay | 1.04 | 1.00, 1.08 | 0.071 |
Other | 0.75 | 0.72, 0.78 | <0.001 |
Abbreviations: CI = Confidence Interval, OR = Odds Ratio |
Part 4. Diagnostics
Because this is a hierarchical logistic model with exclusively
categorical covariates, the DHARMa package will be used to
simulate and plot residuals.
4.1. DHARMa Simulated Residuals
Because the data set is 3.2M observations, the n of
simulated variables is very subject to running into a long-vector
barrier. Because the data is hierarchical, sampling for residual
simulation risks losing relative effects, so I have proceeded with
simulated n=250 for each observation in the dataset.
save_model <- "models/dharma_sim_res.rds"
sim_res <- if (file.exists(save_model)) {
print(paste0("Loading working data from cache: ", save_model))
readRDS(save_model)
} else {
print("Fitting full model.")
sim_res <- simulateResiduals(mod_m1inf002, n = 250)
}
## [1] "Loading working data from cache: models/dharma_sim_res.rds"
if (file.exists(save_model)) {
print("Retain")
} else {
saveRDS(sim_res, save_model)
print("Writing rds")
}
## [1] "Retain"
4.4. Decile Plot for Calibration
Because n is so large, the H-L test would likely see the same
sample-size oversensitivity and reject the calibration, like earlier
tests. For this reason, I am validating calibration with a decile
plot.
a_data_full$pred <- predict(mod_m1inf002, type = "response")
cal <- calibration_plot(data = a_data_full, obs = "dx_INF002", pred = "pred", nTiles = 10)
cal$calibration_plot +
labs(title = "Calibration plot by decile of predicted probability of Septicemia coding",
x= "Mean predicted probability",
y = "Observed proportion"
) +
theme_classic()
The decile plot indicates that the model is very well calibrated, with
all deciles falling neatly along the diagonal, indicating that mean
predicted probabilities align closely with observed proportion. The
highest decile predicted value appears to be right in line with the
observed proportion, indicating that the range of observations at the
furthest end of the probabilities (highlighted in 4.3) represent a very
small subset of the dataset.
Part 5. Regression Visualizations
5.1 Primary Association Visualization
Because the primary expsoure variable is a categorical predictor, I
generated the predicted outcome values with 95% CIs. For clarity, and to
keep the 208 facilities meaningfully visualized, I generated 2 reference
patients. I chose patients that do not fall on the far ends, as we’ve
seen the prediction is somewhat weaker due to the ceiling of high
predictions with high-risk categories.
##Create an example patient with set fixed parameters
case_mix_list_a <-
c(
vp_apr_soi = "Moderate",
vp_year = "2018",
vp_pay_1 = "Private",
vp_age="50-69",
vp_gender = "Female"
)
case_mix_list_b <-
c(
vp_apr_soi = "Minor",
vp_year = "2018",
vp_pay_1 = "Private",
vp_age="18-29",
vp_gender = "Female"
)
##Ggpredict patient A
testpatienta <- ggpredict(mod_m1inf002,
terms = "v_facilityid",
type = "random",
condition = case_mix_list_a
) |>
as.data.frame() |>
arrange(predicted) |>
mutate(rank = row_number())
##grab order of ref patient A
fac_order <- testpatienta |>
pull("x")
##Ggpredict patient b
testpatientb <- ggpredict(mod_m1inf002,
terms = "v_facilityid",
type = "random",
condition = case_mix_list_b
) |>
as.data.frame()
##Get dynamic list of reference values cleaned for display
display <- names(case_mix_list_a) |>
str_remove("^vp_") |>
str_replace_all("_", " ") |>
str_to_title() |>
str_replace_all("\\bApr\\b", "APR") |>
str_replace_all("\\bSoi\\b", "SOI") |>
str_replace_all("Pay 1", "Payor")
## Generate labels from the patient case mix reference lists
profile_label_a <- paste(display, unlist(case_mix_list_a),
sep = ": ", collapse = "\n")
profile_label_b <- paste(display, unlist(case_mix_list_b),
sep = ": ", collapse = "\n")
## stack data, adding 'reference patient' col, which is filled with the cleaned labels for legend
combined <- bind_rows(
!!profile_label_a := testpatienta,
!!profile_label_b := testpatientb,
.id = "Reference Patient"
) |>
## order this by patient a's facility's
mutate(x = factor(x, levels = fac_order))
### Caterpillar plot
ggplot(combined, aes(x = x, y = predicted, color = `Reference Patient`)) +
geom_linerange(aes(ymin = conf.low, ymax = conf.high), alpha = 0.4) +
geom_point(size = 0.8) +
scale_y_continuous(labels = scales::percent) +
labs(x = "Facility (ranked by predicted probability for Reference Patient A)",
y = "Predicted probability of septicemia",
title = "Figure 1. Facility-level predicted septicemia probability for reference patient",
subtitle = "NYS SPARCS Inpatient Hospital Discharges, 2018 - 2019") +
theme_classic() +
theme(axis.text.x = element_blank(),
axis.ticks.x = element_blank())
Figure 1 shows predicted probability and 95% CIs for septicemia coding
across all 208 facilities for two reference patient profiles. There is
large vertical spread within each profile, with relatively parallel
paths, indicating that facility variation is consistent across the
patient profiles controlled for by the case-mix covariates. This
persistent variation (vertical spread) is consistent with facility-level
property, representing the variation this analysis aims to quantify.
5.2 Full model coefficient plot
ci_table <- tidy(mod_m1inf002, conf.int = TRUE, exponentiate = TRUE) |>
filter(term != "sd__(Intercept)" & term !="(Intercept)" ) |>
dplyr::select(term, estimate, conf.low, conf.high) |>
mutate(across(c(estimate, conf.low, conf.high), \(x) round(x, 3))) |>
mutate(
term = term |>
str_replace("^vp_year", "Year: ") |>
str_replace("^vp_age", "Age: ") |>
str_replace("^vp_gender", "Gender: ") |>
str_replace("^vp_apr_soi", "APR SOI: ") |>
str_replace("^vp_pay_1", "Payor: ")
) |>
mutate(
term = term |>
str_replace("Year: 2019", "Year: 2019 (ref: 2018)") |>
str_replace("Age: 18-29", "Age: 18-29 (ref: 0-17)") |>
str_replace("Gender: Male", "Gender: Male (ref: Female)") |>
str_replace("APR SOI: Moderate", "APR SOI: Moderate (ref: Minor)") |>
str_replace("Payor: Public", "Payor: Public (ref: Private)")
)
ci_table |>
ggplot(aes(x = estimate, y = reorder(term, estimate))) +
geom_vline(xintercept = 1, linetype = "dashed", color = "red") +
geom_point(size = 1, color = "steelblue") +
geom_errorbarh(aes(xmin = conf.low, xmax = conf.high), height = 0.2,
color = "steelblue") +
scale_x_log10() +
labs(title = "Forest Plot of Adjusted Odds Ratios for Septicemia Diagnosis",
subtitle = "Reference line at OR = 1; log-scale x-axis",
x = "Adjusted Odds Ratio (95% CI)", y = NULL) +
theme_classic()
Figure 2 demonstrates the dramatic effect of inclusion by APR in the
model’s adjustment - the odds of Septicemia for observations with an APR
SOI of Extreme vs the Minor reference category, holding all other
variables stable, is an order of magnitude (~30x vs 3x for the
next-largest category) larger than any other covariate vs its respective
reference category. This is expected, given the fact that the Outcome is
derived from the same codes as the APR-SOI, which raises the possibility
of circularity - it may be worth testing alternative covariates as
potential adjustors of clinical severity. Age and APR-SOI together are
the main drivers of adjustment in this model, while Gender, Payor, and
Year have statistically significant ORs, but much lower magnitude,
consistent with their role as case-mix adjustments, rather than primary
drivers.
References
Fleischmann-Struzek, C., & Rudd, K. (2023). Challenges of
assessing the burden of sepsis. Medizinische Klinik, Intensivmedizin Und
Notfallmedizin, 118(Suppl 2), 68–74. https://doi.org/10.1007/s00063-023-01088-7
Lippert, A. M. (2022). System Failure: The Geographic Distribution of
Sepsis-Associated Death in the USA and Factors Contributing to the
Mortality Burden of Black Communities. Journal of Racial and Ethnic
Health Disparities, 1–10. https://doi.org/10.1007/s40615-022-01418-z
Moore, J. X., Donnelly, J. P., Griffin, R., Safford, M. M., Howard,
G., Baddley, J., & Wang, H. E. (2017). Community characteristics and
regional variations in sepsis. International Journal of Epidemiology,
46(5), 1607–1617. https://doi.org/10.1093/ije/dyx099
Schwarzkopf, D., Fleischmann-Struzek, C., Schlattmann, P., Dorow, H.,
Ouart, D., Edel, A., Gonnert, F. A., Götz, J., Gründling, M., Heim, M.,
Jaschinski, U., Lindau, S., Meybohm, P., Putensen, C., Sander, M., &
Reinhart, K. (2020). Validation study of German inpatient administrative
health data for epidemiological surveillance and measurement of quality
of care for sepsis: The OPTIMISE study protocol. BMJ Open, 10(10),
e035763. https://doi.org/10.1136/bmjopen-2019-035763
Appendix
AI Disclosure
Claude was used to generate the regex for the string replace function
in Part 5.1 - this was to ensure that the reference patient could be
updated dynamically, to allow me to switch between versions of the plot
more easily, while still keeping the figure display clean. Claude AI was
asked “gsubs (or dplyr equivalent) to first remove”vp_“, then change
all”_” to ” “, then apply title case”, claude generated the
string_replace_all sequence as seen in the code. Then I followed up with
“Add lines to get APR SOI back to title case” and updated the code.