1 Generate Synthetic Data (rare event, ~1% eDKA)

Synthetic cohort and rare-event mechanism (overview).
We simulate N = 1,500 SGLT2i-exposed inpatient encounters to study a rare outcome, euglycemic diabetic ketoacidosis (eDKA). The baseline event rate is set to rare_rate = 0.01 and is then modified by clinically relevant covariates through a logistic model. Let \(Y\in\{0,1\}\) indicate eDKA. For encounter \(i\), the linear predictor is \(\,\operatorname{logit}(p_i)=\alpha+\beta_{\mathrm{DM}}\mathrm{DM}_i+\beta_{\mathrm{Surg}}\mathrm{Surg}_i+\beta_{\mathrm{HF}}\mathrm{HF}_i+\beta_{\mathrm{CKD}}\mathrm{CKD}_i+\beta_{\mathrm{NPO}}\mathrm{NPO}_i+\beta_{\mathrm{Hold}}\mathrm{Hold}_i+\beta_{\mathrm{Age}}\mathrm{Age10}_i\,\). We choose \(\alpha=\operatorname{logit}(\text{rare\_rate})\), so an encounter with every binary risk factor absent and \(\mathrm{Age10}_i=0\) has \(p_i=1\%\). We then map the linear predictor to a probability via \(p_i=\operatorname{logit}^{-1}(\cdot)=\mathrm{plogis}(\cdot)\) and draw \(Y_i\sim\mathrm{Bernoulli}(p_i)\).

Covariates.
We mimic typical inpatient features: age (mean ≈ 62, SD 15), sex, and binary indicators for diabetes (DM), heart failure (HF), chronic kidney disease (CKD), surgery during the encounter, SGLT2i held (peri-operative or inpatient hold), and NPO/poor intake. Sex is simulated for description and subgroup summaries but does not enter the outcome model. Coefficients are on the log-odds scale and encode plausible qualitative effects: \(\beta_{\mathrm{DM}}=0.45,\ \beta_{\mathrm{Surg}}=0.55,\ \beta_{\mathrm{HF}}=0.35,\ \beta_{\mathrm{CKD}}=0.30,\ \beta_{\mathrm{NPO}}=0.40\) (increase risk), \(\beta_{\mathrm{Hold}}=-0.25\) (reduces risk), and \(\beta_{\mathrm{Age}}=0.10\) per 10 years of age, centered at 60 via \(\mathrm{Age10}=(\mathrm{age}-60)/10\).
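The generating model above can be sketched in a few lines. This is a minimal Python illustration, not the report's own code: the coefficients and the 1% baseline come from the text, while the covariate prevalences, the independence between covariates, the untruncated normal age distribution, and the seed are all assumptions made for the example.

```python
import math
import random

def logit(p):
    """Log-odds of a probability."""
    return math.log(p / (1.0 - p))

def expit(x):
    """Inverse logit; plays the role of plogis() in the text."""
    return 1.0 / (1.0 + math.exp(-x))

RARE_RATE = 0.01
ALPHA = logit(RARE_RATE)  # intercept: the reference encounter has p = 1%

# Coefficients exactly as stated in the text (log-odds scale).
BETA = {"dm": 0.45, "surg": 0.55, "hf": 0.35, "ckd": 0.30,
        "npo": 0.40, "hold": -0.25, "age10": 0.10}

# ASSUMPTION: illustrative prevalences; covariates are drawn independently.
PREVALENCE = {"dm": 0.55, "surg": 0.27, "hf": 0.30,
              "ckd": 0.28, "npo": 0.36, "hold": 0.60}

def event_probability(row):
    """p_i = expit(alpha + sum_k beta_k * x_ik)."""
    eta = ALPHA + sum(BETA[k] * row[k] for k in BETA)
    return expit(eta)

def simulate_encounter(rng):
    """Draw covariates, then the outcome, for one encounter."""
    row = {k: int(rng.random() < p) for k, p in PREVALENCE.items()}
    row["age"] = rng.gauss(62, 15)              # ASSUMPTION: untruncated normal
    row["age10"] = (row["age"] - 60) / 10
    row["p"] = event_probability(row)
    row["edka"] = int(rng.random() < row["p"])  # Y_i ~ Bernoulli(p_i)
    return row

rng = random.Random(2024)  # hypothetical seed
cohort = [simulate_encounter(rng) for _ in range(1500)]

# Two hand-checkable profiles.
reference = {"dm": 0, "surg": 0, "hf": 0, "ckd": 0,
             "npo": 0, "hold": 0, "age10": 0.0}
high_risk = dict(reference, dm=1, surg=1, age10=1.0)  # diabetic, surgical, age 70
print(round(event_probability(reference), 4))  # 0.01
print(round(event_probability(high_risk), 4))  # 0.0295
```

The second profile shows how the coefficients compound: diabetes, surgery, and ten extra years of age add 0.45 + 0.55 + 0.10 = 1.10 to the log-odds, which roughly triples the risk from 1% to about 2.9%.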

Evaluability vs. non-evaluability (case ascertainment).
Because real charts often lack key diagnostics, we simulate an evaluable flag \(E\sim\mathrm{Bernoulli}(0.80)\). Only encounters with \(E=1\) are eligible for endpoint assessment. When \(E=0\), we record one of two reasons, “Missing ketone test” or “Missing acidosis element”, with user-set proportions (miss_ketone_rate, miss_acid_rate). For incidence, the primary denominator is the evaluable subset; non-evaluable encounters are still counted to show potential under-ascertainment.
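The ascertainment step can be sketched as a two-stage draw: first the evaluable flag, then a reason for the encounters that fail it. This is an illustrative Python sketch. The 0.80 evaluable probability is from the text, but the values given to the two miss rates and the seed are hypothetical placeholders, since the text does not state them.

```python
import random
from collections import Counter

EVAL_RATE = 0.80          # from the text: E ~ Bernoulli(0.80)
MISS_KETONE_RATE = 0.75   # ASSUMPTION: placeholder value for miss_ketone_rate
MISS_ACID_RATE = 0.25     # ASSUMPTION: placeholder value for miss_acid_rate
REASONS = ["Missing ketone test", "Missing acidosis element"]

def draw_evaluability(rng):
    """Return (evaluable, reason); reason is None when the encounter is evaluable."""
    if rng.random() < EVAL_RATE:
        return True, None
    reason = rng.choices(REASONS, weights=[MISS_KETONE_RATE, MISS_ACID_RATE])[0]
    return False, reason

rng = random.Random(7)  # hypothetical seed
flags = [draw_evaluability(rng) for _ in range(1500)]

n_evaluable = sum(1 for ok, _ in flags if ok)
reason_counts = Counter(reason for ok, reason in flags if not ok)
print(n_evaluable, dict(reason_counts))
```

Drawing the reason only after the flag guarantees that the two reasons partition the non-evaluable encounters, so their counts always sum to the non-evaluable total.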

Masking the outcome where not evaluable.
We set edka_obs = NA when \(E=0\) to mirror practice: do not label a case without required labs.
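A minimal Python sketch of the masking rule, using `None` as a stand-in for NA. The helper name `mask_outcome` and the toy records are hypothetical and exist only to show the rule.

```python
def mask_outcome(edka_true, evaluable):
    """Observed outcome: the simulated value if evaluable, otherwise None (NA)."""
    return edka_true if evaluable else None

# Toy encounters covering all four combinations of outcome and evaluability.
encounters = [
    {"edka": 1, "evaluable": True},
    {"edka": 1, "evaluable": False},  # a true event hidden by missing labs
    {"edka": 0, "evaluable": True},
    {"edka": 0, "evaluable": False},
]
for enc in encounters:
    enc["edka_obs"] = mask_outcome(enc["edka"], enc["evaluable"])

print([enc["edka_obs"] for enc in encounters])  # [1, None, 0, None]
```

The second record is the case that matters for under-ascertainment: the simulated event exists but can never be counted.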

Sanity checks.
We report: Exposed Number \(=N\); Evaluable Number \(=\sum \mathbf{1}(E=1)\); Non-evaluable Number \(=\sum \mathbf{1}(E=0)\); eDKA event count \(=\sum \mathbf{1}(Y=1 \land E=1)\); eDKA rate \(=\big(\sum \mathbf{1}(Y=1 \land E=1)\big)\big/\big(\sum \mathbf{1}(E=1)\big)\).

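These checks confirm that the simulated outcome is rare (26 events among 1,216 evaluable encounters, about 2.1%) and document how many records are non-evaluable (284 of 1,500), which is essential context when reporting incidence and subgroup summaries. The observed rate is expected to sit above the 1% baseline because most covariate effects in the generating model raise risk.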
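The sanity checks reduce to a handful of counts and one ratio. The Python sketch below rebuilds them from records laid out to match the reported counts. The `summarise` and `make` helpers and the record layout are hypothetical, and the dictionary keys simply mirror the names in the printed output.

```python
def summarise(records):
    """Counts and the evaluable-denominator eDKA rate."""
    evaluable = [r for r in records if r["evaluable"]]
    non_eval = [r for r in records if not r["evaluable"]]
    events = sum(1 for r in evaluable if r["edka_obs"] == 1)
    return {
        "Exposed_Number": len(records),
        "Evaluable_Number": len(evaluable),
        "Nonevaluable_Number": len(non_eval),
        "Nonevaluable_MissingKetone": sum(
            1 for r in non_eval if r["reason"] == "Missing ketone test"),
        "Nonevaluable_MissingAcid": sum(
            1 for r in non_eval if r["reason"] == "Missing acidosis element"),
        "eDKA_events_Observation": events,
        "eDKA_rate_Observation": events / len(evaluable) if evaluable else float("nan"),
    }

def make(n, evaluable, edka_obs=None, reason=None):
    """Build n identical toy records (hypothetical helper)."""
    return [{"evaluable": evaluable, "edka_obs": edka_obs, "reason": reason}
            for _ in range(n)]

# Records arranged to match the counts reported in this section.
records = (make(26, True, 1) + make(1190, True, 0)
           + make(215, False, None, "Missing ketone test")
           + make(69, False, None, "Missing acidosis element"))

summary = summarise(records)
print(summary["Exposed_Number"], summary["Evaluable_Number"])  # 1500 1216
print(round(summary["eDKA_rate_Observation"], 8))              # 0.02138158
```

The rate uses the evaluable subset as its denominator (26 / 1216); dividing by all 1,500 exposed encounters would give about 1.7% and would implicitly treat every non-evaluable encounter as event-free.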

## $Exposed_Number
## [1] 1500
## 
## $Evaluable_Number
## [1] 1216
## 
## $Nonevaluable_Number
## [1] 284
## 
## $Nonevaluable_MissingKetone
## [1] 215
## 
## $Nonevaluable_MissingAcid
## [1] 69
## 
## $eDKA_events_Observation
## [1] 26
## 
## $eDKA_rate_Observation
## [1] 0.02138158

2 Table 1 – Baseline Characteristics (Exposed Cohort & Evaluable Subset)

Table 1. Demographics, comorbidities, and evaluability (single cohort)

| Characteristic | Overall (N = 1500) |
|---|---|
| Male, n (%) | 743 (49.5%) |
| Age, years, mean (SD) | 62.1 (15.2) |
| Diabetes, n (%) | 831 (55.4%) |
| Heart failure (any), n (%) | 461 (30.7%) |
| └─ HFrEF, n (%) | 291 (19.4%) |
| └─ HFpEF, n (%) | 170 (11.3%) |
| CKD, n (%) | 422 (28.1%) |
| Surgery, n (%) | 411 (27.4%) |
| SGLT2i held, n (%) | 897 (59.8%) |
| NPO/Poor intake, n (%) | 547 (36.5%) |
| Evaluable (complete labs), n (%) | 1216 (81.1%) |
| Non-evaluable (any missing), n (%) | 284 (18.9%) |
| └─ Missing ketone test, n (%) | 215 (14.3%) |
| └─ Missing acidosis element, n (%) | 69 (4.6%) |

3 Contingency Tables by Subgroups (Evaluable Subset)

Contingency tables with exact 95% CIs and subgroup-level p-values (Evaluable subset)

| Subgroup | Level | No eDKA | eDKA | Total | Proportion eDKA | 95% CI low | 95% CI high | Test | P-value |
|---|---|---|---|---|---|---|---|---|---|
| Diabetes | No | 531 | 10 | 541 | 0.0185 | 0.0089 | 0.0337 | Pearson’s Chi-squared test with Yates’ continuity correction | 0.6702 |
| Diabetes | Yes | 659 | 16 | 675 | 0.0237 | 0.0136 | 0.0382 | Pearson’s Chi-squared test with Yates’ continuity correction | 0.6702 |
| Heart failure (any) | No | 828 | 15 | 843 | 0.0178 | 0.0100 | 0.0292 | Pearson’s Chi-squared test with Yates’ continuity correction | 0.2778 |
| Heart failure (any) | Yes | 362 | 11 | 373 | 0.0295 | 0.0148 | 0.0522 | Pearson’s Chi-squared test with Yates’ continuity correction | 0.2778 |
| HF subtype | No HF | 828 | 15 | 843 | 0.0178 | 0.0100 | 0.0292 | Fisher’s Exact Test for Count Data | 0.3715 |
| HF subtype | HFrEF | 236 | 7 | 243 | 0.0288 | 0.0117 | 0.0584 | Fisher’s Exact Test for Count Data | 0.3715 |
| HF subtype | HFpEF | 126 | 4 | 130 | 0.0308 | 0.0084 | 0.0769 | Fisher’s Exact Test for Count Data | 0.3715 |
| CKD | No | 868 | 19 | 887 | 0.0214 | 0.0129 | 0.0332 | Pearson’s Chi-squared test with Yates’ continuity correction | 1.0000 |
| CKD | Yes | 322 | 7 | 329 | 0.0213 | 0.0086 | 0.0433 | Pearson’s Chi-squared test with Yates’ continuity correction | 1.0000 |
| Surgery | No | 867 | 16 | 883 | 0.0181 | 0.0104 | 0.0293 | Pearson’s Chi-squared test with Yates’ continuity correction | 0.2900 |
| Surgery | Yes | 323 | 10 | 333 | 0.0300 | 0.0145 | 0.0545 | Pearson’s Chi-squared test with Yates’ continuity correction | 0.2900 |
| SGLT2i held | No | 463 | 19 | 482 | 0.0394 | 0.0239 | 0.0609 | Pearson’s Chi-squared test with Yates’ continuity correction | 0.000897 |
| SGLT2i held | Yes | 727 | 7 | 734 | 0.0095 | 0.0038 | 0.0196 | Pearson’s Chi-squared test with Yates’ continuity correction | 0.000897 |
| NPO/Poor intake | No | 749 | 17 | 766 | 0.0222 | 0.0130 | 0.0353 | Pearson’s Chi-squared test with Yates’ continuity correction | 0.9601 |
| NPO/Poor intake | Yes | 441 | 9 | 450 | 0.0200 | 0.0092 | 0.0376 | Pearson’s Chi-squared test with Yates’ continuity correction | 0.9601 |
| Sex | Female | 589 | 16 | 605 | 0.0264 | 0.0152 | 0.0426 | Pearson’s Chi-squared test with Yates’ continuity correction | 0.3093 |
| Sex | Male | 601 | 10 | 611 | 0.0164 | 0.0079 | 0.0299 | Pearson’s Chi-squared test with Yates’ continuity correction | 0.3093 |
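To make the interval and test columns concrete, the sketch below recomputes one row pair in standard-library Python. It assumes that "exact" refers to the Clopper-Pearson interval, which the source does not state, and it implements only the Yates-corrected chi-squared test for 2×2 tables. The Fisher test used for the three-level HF subtype is not covered. All function names are hypothetical.

```python
import math

def binom_cdf(x, n, p):
    """P(X <= x) for X ~ Binomial(n, p) with 0 < p < 1, summed in log space."""
    if x < 0:
        return 0.0
    if x >= n:
        return 1.0
    log_p, log_q = math.log(p), math.log1p(-p)
    total = 0.0
    for k in range(x + 1):
        log_pmf = (math.lgamma(n + 1) - math.lgamma(k + 1) - math.lgamma(n - k + 1)
                   + k * log_p + (n - k) * log_q)
        total += math.exp(log_pmf)
    return min(total, 1.0)

def _bisect(f, target, increasing, iters=60):
    """Solve f(p) = target on (0, 1) for a monotone f by bisection."""
    lo, hi = 0.0, 1.0
    for _ in range(iters):
        mid = (lo + hi) / 2
        if (f(mid) < target) == increasing:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2

def clopper_pearson(x, n, conf=0.95):
    """Exact binomial interval for x events in n trials (Clopper-Pearson)."""
    a = (1 - conf) / 2
    lower = 0.0 if x == 0 else _bisect(lambda p: 1 - binom_cdf(x - 1, n, p), a, True)
    upper = 1.0 if x == n else _bisect(lambda p: binom_cdf(x, n, p), a, False)
    return lower, upper

def yates_chi2_pvalue(a, b, c, d):
    """Yates-corrected chi-squared p-value (1 df) for the 2x2 table [[a, b], [c, d]]."""
    n = a + b + c + d
    corrected = max(abs(a * d - b * c) - n / 2, 0)
    stat = n * corrected ** 2 / ((a + b) * (c + d) * (a + c) * (b + d))
    return math.erfc(math.sqrt(stat / 2))  # upper tail of chi-squared with 1 df

# "SGLT2i held" rows: No = 463 without / 19 with eDKA; Yes = 727 without / 7 with.
ci_low, ci_high = clopper_pearson(7, 734)
p_held = yates_chi2_pvalue(463, 19, 727, 7)
print(round(ci_low, 4), round(ci_high, 4), round(p_held, 6))
```

The printed values can be compared against the "SGLT2i held = Yes" row of the table. For the CKD rows the continuity correction exceeds the observed difference, so the corrected statistic is zero and the p-value is exactly 1, which is why that row pair reports 1.0000.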

4 Simple Regression

Simple logistic regression for eDKA (Evaluable subset): OR (95% CI)

| Term | OR | 95% CI low | 95% CI high | P-value |
|---|---|---|---|---|
| Intercept (baseline odds) | 0.032 | 0.012 | 0.077 | <0.0001 |
| Diabetes (DM=1) | 1.247 | 0.563 | 2.890 | 0.5920 |
| Surgery (1) | 1.649 | 0.709 | 3.660 | 0.2270 |
| HF subtype: HFrEF (vs No HF) | 1.553 | 0.583 | 3.760 | 0.3470 |
| HF subtype: HFpEF (vs No HF) | 1.765 | 0.492 | 5.030 | 0.3250 |
| CKD (1) | 1.004 | 0.385 | 2.339 | 0.9930 |
| NPO/Poor intake (1) | 0.939 | 0.394 | 2.098 | 0.8810 |
| SGLT2i held (1) | 0.237 | 0.092 | 0.546 | 0.0013 |
| Sex: Male (vs Female) | 0.613 | 0.265 | 1.354 | 0.2340 |
| Age (per 10y) | 1.037 | 0.803 | 1.336 | 0.7790 |
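As a rough cross-check on the regression output, a crude (unadjusted) odds ratio can be computed directly from the contingency counts in Section 3. The Python sketch below uses the Woolf log-scale interval, which is a substitute technique and not how the model-based interval in the table was obtained. The function name is hypothetical. It also converts the data-generating coefficients from Section 1 to the odds-ratio scale for comparison.

```python
import math

def crude_odds_ratio(a, b, c, d, z=1.96):
    """Crude OR with a Woolf (log-scale Wald) interval for a 2x2 table.

    a, b = events / non-events at the exposed level;
    c, d = events / non-events at the reference level.
    """
    or_ = (a * d) / (b * c)
    se = math.sqrt(1 / a + 1 / b + 1 / c + 1 / d)  # SE of log(OR)
    return or_, or_ * math.exp(-z * se), or_ * math.exp(z * se)

# SGLT2i held = Yes: 7 events, 727 non-events; held = No: 19 events, 463 non-events.
or_held, lo_held, hi_held = crude_odds_ratio(7, 727, 19, 463)
print(round(or_held, 3), round(lo_held, 3), round(hi_held, 3))

# Odds ratios implied by the data-generating coefficients in Section 1.
GENERATING_BETA = {"dm": 0.45, "surg": 0.55, "hf": 0.35, "ckd": 0.30,
                   "npo": 0.40, "hold": -0.25, "age10": 0.10}
generating_or = {k: round(math.exp(b), 3) for k, b in GENERATING_BETA.items()}
print(generating_or["hold"])  # 0.779
```

The crude OR for holding the SGLT2i comes out near 0.23, close to the adjusted estimate of 0.237, which suggests little confounding by the other covariates in this sample. Both are well below the generating value of exp(−0.25) ≈ 0.78, a reminder that with only 26 events a single simulated cohort can yield estimates far from the truth.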