Synthetic cohort and rare-event mechanism (overview).
We simulate N = 1,500 SGLT2i-exposed inpatient encounters to study a rare outcome, euglycemic diabetic ketoacidosis (eDKA). The baseline event rate is set to rare_rate = 0.01 and then modified by clinically relevant covariates through a logistic model. Let \(Y\in\{0,1\}\) indicate eDKA; for encounter \(i\), the linear predictor is
\(\,\operatorname{logit}(p_i)=\eta_i=\alpha+\beta_{\mathrm{DM}}\mathrm{DM}_i+\beta_{\mathrm{Surg}}\mathrm{Surg}_i+\beta_{\mathrm{HF}}\mathrm{HF}_i+\beta_{\mathrm{CKD}}\mathrm{CKD}_i+\beta_{\mathrm{NPO}}\mathrm{NPO}_i+\beta_{\mathrm{Hold}}\mathrm{Hold}_i+\beta_{\mathrm{Age}}\mathrm{Age10}_i\,\).
We choose \(\alpha=\operatorname{logit}(\text{rare\_rate})=\log(0.01/0.99)\approx-4.595\), so that an encounter with every indicator equal to 0 and age 60 (\(\mathrm{Age10}_i=0\)) has \(p_i=1\%\) exactly. We then map \(\eta_i\) to a probability via \(p_i=\operatorname{logit}^{-1}(\eta_i)=1/(1+e^{-\eta_i})\) (plogis in R) and draw \(Y_i\sim\mathrm{Bernoulli}(p_i)\).
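A minimal Python sketch of this outcome mechanism, assuming each encounter's covariates arrive as a dict of numbers. The helper names (logit, plogis, simulate_outcome) are illustrative; plogis is named after R's function, not any Python library.

```python
import math
import random


def logit(p):
    """Log-odds of a probability p in (0, 1)."""
    return math.log(p / (1.0 - p))


def plogis(eta):
    """Inverse logit; mirrors R's plogis() for a single value."""
    return 1.0 / (1.0 + math.exp(-eta))


def simulate_outcome(alpha, beta, x, rng):
    """Return (p_i, Y_i) for one encounter.

    alpha: intercept on the log-odds scale
    beta:  dict mapping covariate name -> log-odds coefficient
    x:     dict mapping covariate name -> value (0/1 indicators, Age10)
    rng:   random.Random instance, so runs are reproducible
    """
    eta = alpha + sum(b * x[name] for name, b in beta.items())  # linear predictor
    p = plogis(eta)
    y = 1 if rng.random() < p else 0  # Y_i ~ Bernoulli(p_i)
    return p, y


rare_rate = 0.01
alpha = logit(rare_rate)  # approximately -4.595
```

With every covariate at 0, p_i equals rare_rate, which is the calibration described above; any nonzero covariate shifts η_i and therefore p_i.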
Covariates.
We mimic typical inpatient features: age (mean ≈ 62, SD 15), sex, and binary indicators for diabetes (DM), heart failure (HF), chronic kidney disease (CKD), surgery during the encounter, SGLT2i held (peri-operative or inpatient hold), and NPO/poor intake. Sex is simulated for description only; it does not enter the outcome model above. The coefficients are on the log-odds scale and encode plausible qualitative effects: \(\beta_{\mathrm{DM}}=0.45,\ \beta_{\mathrm{Surg}}=0.55,\ \beta_{\mathrm{HF}}=0.35,\ \beta_{\mathrm{CKD}}=0.30,\ \beta_{\mathrm{NPO}}=0.40\) (each increases risk), \(\beta_{\mathrm{Hold}}=-0.25\) (reduces risk), and \(\beta_{\mathrm{Age}}=0.10\) per 10 years from age 60, using \(\mathrm{Age10}=(\mathrm{age}-60)/10\).
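To see what these coefficients imply for an individual encounter, here is a small Python sketch that evaluates the simulated eDKA probability for a covariate profile. The function names and the example profiles are hypothetical illustrations; only the coefficients and the Age10 centering come from the text.

```python
import math

# Generating coefficients from the text; alpha = logit(0.01).
ALPHA = math.log(0.01 / 0.99)
BETA = {"DM": 0.45, "Surg": 0.55, "HF": 0.35, "CKD": 0.30,
        "NPO": 0.40, "Hold": -0.25, "Age10": 0.10}


def age10(age_years):
    """Age centered at 60 and scaled per 10 years."""
    return (age_years - 60.0) / 10.0


def encounter_risk(age_years, **flags):
    """Simulated eDKA probability for one covariate profile.

    flags: 0/1 indicators named DM, Surg, HF, CKD, NPO, Hold; omitted ones count as 0.
    """
    unknown = set(flags) - (set(BETA) - {"Age10"})
    if unknown:
        raise ValueError(f"unknown covariates: {sorted(unknown)}")
    eta = ALPHA + BETA["Age10"] * age10(age_years)
    eta += sum(BETA[name] * value for name, value in flags.items())
    return 1.0 / (1.0 + math.exp(-eta))
```

For example, `encounter_risk(70, DM=1, Surg=1, NPO=1)` gives roughly 4.3%, and adding `Hold=1` lowers it to roughly 3.4%; the reference profile `encounter_risk(60)` returns the 1% baseline.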
Evaluability vs. non-evaluability (case ascertainment).
Because real charts often lack key diagnostics, we simulate an evaluable flag \(E\sim\mathrm{Bernoulli}(0.80)\). Only encounters with \(E=1\) are eligible for endpoint assessment. For \(E=0\), we record a reason, either “Missing ketone test” or “Missing acidosis element”, with user-set proportions (miss_ketone_rate, miss_acid_rate). The primary incidence denominator is the evaluable subset; non-evaluable counts are still tallied to show potential under-ascertainment.
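A Python sketch of the evaluability step. The text does not say exactly how miss_ketone_rate and miss_acid_rate are applied, so this sketch assumes they act only as relative weights for the reason among non-evaluable encounters; the default values and the function name simulate_evaluability are hypothetical.

```python
import random


def simulate_evaluability(n, rng, p_evaluable=0.80,
                          miss_ketone_rate=0.15, miss_acid_rate=0.05):
    """Return one record per encounter: evaluable flag E and, if E = 0, a reason.

    Assumption: the two miss_* values are normalized into the probability of each
    reason among non-evaluable encounters; the defaults are illustrative only.
    """
    w_ketone = miss_ketone_rate / (miss_ketone_rate + miss_acid_rate)
    records = []
    for _ in range(n):
        evaluable = rng.random() < p_evaluable  # E ~ Bernoulli(p_evaluable)
        if evaluable:
            reason = None
        elif rng.random() < w_ketone:
            reason = "Missing ketone test"
        else:
            reason = "Missing acidosis element"
        records.append({"E": int(evaluable), "reason": reason})
    return records
```

Because E is drawn with a constant probability here, missingness in this sketch is unrelated to covariates and outcome; a more realistic variant could let p_evaluable depend on covariates.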
Masking the outcome where not evaluable.
We set edka_obs = NA when \(E=0\) to mirror practice, where a case is not labeled eDKA without the required labs.
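The masking rule is a single conditional per encounter. In this Python sketch, None stands in for R's NA, and the function names are hypothetical.

```python
from typing import List, Optional


def observed_outcome(y: int, e: int) -> Optional[int]:
    """edka_obs: keep Y_i when E_i = 1, otherwise None (the analogue of R's NA)."""
    return y if e == 1 else None


def mask_cohort(ys: List[int], es: List[int]) -> List[Optional[int]]:
    """Apply the masking rule encounter by encounter."""
    if len(ys) != len(es):
        raise ValueError("ys and es must have the same length")
    return [observed_outcome(y, e) for y, e in zip(ys, es)]
```

For instance, `mask_cohort([1, 0, 1, 0], [1, 1, 0, 0])` returns `[1, 0, None, None]`: the true event in the third encounter is hidden because its labs are incomplete.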
Sanity checks.
We report: Exposed Number \(=N\); Evaluable Number \(=\sum_i \mathbf{1}(E_i=1)\); Non-evaluable Number \(=\sum_i \mathbf{1}(E_i=0)\), also split by reason; eDKA events \(=\sum_i \mathbf{1}(Y_i=1 \land E_i=1)\); and eDKA rate \(=\sum_i \mathbf{1}(Y_i=1 \land E_i=1)\big/\sum_i \mathbf{1}(E_i=1)\).
These checks confirm that eDKA remains rare (26/1216 ≈ 2.1% of evaluable encounters; the rate exceeds the 1% baseline mainly because risk-raising covariates such as diabetes are common) and document how many records are non-evaluable, which is essential context when reporting incidence and subgroup summaries.
```
## $Exposed_Number
## [1] 1500
##
## $Evaluable_Number
## [1] 1216
##
## $Nonevaluable_Number
## [1] 284
##
## $Nonevaluable_MissingKetone
## [1] 215
##
## $Nonevaluable_MissingAcid
## [1] 69
##
## $eDKA_events_Observation
## [1] 26
##
## $eDKA_rate_Observation
## [1] 0.02138158
```
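The sanity-check quantities can be recomputed from per-encounter vectors. A minimal Python sketch, assuming outcomes, evaluable flags, and reasons are held in parallel lists; the function name sanity_checks is hypothetical, while the dictionary keys copy the names in the R output.

```python
def sanity_checks(ys, es, reasons):
    """Summary counts mirroring the reported list (keys follow the R output names)."""
    if not (len(ys) == len(es) == len(reasons)):
        raise ValueError("ys, es and reasons must have equal length")
    n = len(ys)
    evaluable = sum(1 for e in es if e == 1)
    # Only events in evaluable encounters are counted.
    events = sum(1 for y, e in zip(ys, es) if y == 1 and e == 1)
    return {
        "Exposed_Number": n,
        "Evaluable_Number": evaluable,
        "Nonevaluable_Number": n - evaluable,
        "Nonevaluable_MissingKetone": sum(1 for r in reasons if r == "Missing ketone test"),
        "Nonevaluable_MissingAcid": sum(1 for r in reasons if r == "Missing acidosis element"),
        "eDKA_events_Observation": events,
        "eDKA_rate_Observation": events / evaluable if evaluable else float("nan"),
    }
```

An event in a non-evaluable encounter never reaches eDKA_events_Observation, which is the under-ascertainment the non-evaluable counts are meant to flag. With the reported totals (26 events among 1216 evaluable encounters) the rate is 26/1216 ≈ 0.0214.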

Baseline characteristics of all exposed encounters.

| Characteristic | Overall (N = 1500) |
|---|---|
| Male, n (%) | 743 (49.5%) |
| Age, years, mean (SD) | 62.1 (15.2) |
| Diabetes, n (%) | 831 (55.4%) |
| Heart failure (any), n (%) | 461 (30.7%) |
| └─ HFrEF, n (%) | 291 (19.4%) |
| └─ HFpEF, n (%) | 170 (11.3%) |
| CKD, n (%) | 422 (28.1%) |
| Surgery, n (%) | 411 (27.4%) |
| SGLT2i held, n (%) | 897 (59.8%) |
| NPO/Poor intake, n (%) | 547 (36.5%) |
| Evaluable (complete labs), n (%) | 1216 (81.1%) |
| Non-evaluable (any missing), n (%) | 284 (18.9%) |
| └─ Missing ketone test, n (%) | 215 (14.3%) |
| └─ Missing acidosis element, n (%) | 69 (4.6%) |

eDKA by subgroup among evaluable encounters (N = 1216).

| Subgroup | Level | No eDKA, n | eDKA, n | Total, n | Proportion eDKA | 95% CI low | 95% CI high | Test | p-value |
|---|---|---|---|---|---|---|---|---|---|
| Diabetes | No | 531 | 10 | 541 | 0.0185 | 0.0089 | 0.0337 | Chi-squared (Yates) | 0.6702 |
| Diabetes | Yes | 659 | 16 | 675 | 0.0237 | 0.0136 | 0.0382 | Chi-squared (Yates) | 0.6702 |
| Heart failure (any) | No | 828 | 15 | 843 | 0.0178 | 0.0100 | 0.0292 | Chi-squared (Yates) | 0.2778 |
| Heart failure (any) | Yes | 362 | 11 | 373 | 0.0295 | 0.0148 | 0.0522 | Chi-squared (Yates) | 0.2778 |
| HF subtype | No HF | 828 | 15 | 843 | 0.0178 | 0.0100 | 0.0292 | Fisher's exact | 0.3715 |
| HF subtype | HFrEF | 236 | 7 | 243 | 0.0288 | 0.0117 | 0.0584 | Fisher's exact | 0.3715 |
| HF subtype | HFpEF | 126 | 4 | 130 | 0.0308 | 0.0084 | 0.0769 | Fisher's exact | 0.3715 |
| CKD | No | 868 | 19 | 887 | 0.0214 | 0.0129 | 0.0332 | Chi-squared (Yates) | 1.0000 |
| CKD | Yes | 322 | 7 | 329 | 0.0213 | 0.0086 | 0.0433 | Chi-squared (Yates) | 1.0000 |
| Surgery | No | 867 | 16 | 883 | 0.0181 | 0.0104 | 0.0293 | Chi-squared (Yates) | 0.2900 |
| Surgery | Yes | 323 | 10 | 333 | 0.0300 | 0.0145 | 0.0545 | Chi-squared (Yates) | 0.2900 |
| SGLT2i held | No | 463 | 19 | 482 | 0.0394 | 0.0239 | 0.0609 | Chi-squared (Yates) | 0.000897 |
| SGLT2i held | Yes | 727 | 7 | 734 | 0.0095 | 0.0038 | 0.0196 | Chi-squared (Yates) | 0.000897 |
| NPO/Poor intake | No | 749 | 17 | 766 | 0.0222 | 0.0130 | 0.0353 | Chi-squared (Yates) | 0.9601 |
| NPO/Poor intake | Yes | 441 | 9 | 450 | 0.0200 | 0.0092 | 0.0376 | Chi-squared (Yates) | 0.9601 |
| Sex | Female | 589 | 16 | 605 | 0.0264 | 0.0152 | 0.0426 | Chi-squared (Yates) | 0.3093 |
| Sex | Male | 601 | 10 | 611 | 0.0164 | 0.0079 | 0.0299 | Chi-squared (Yates) | 0.3093 |
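
A stdlib-only Python sketch of two per-subgroup calculations in this table. The CI columns appear consistent with exact Clopper-Pearson 95% intervals (what R's binom.test reports), which is our assumption rather than something the text states; the Yates-corrected chi-squared uses the standard 2x2 continuity correction. Function names are hypothetical, and Fisher's exact test for the three-level HF subtype row is not reproduced.

```python
import math


def binom_cdf(x, n, p):
    """P(X <= x) for X ~ Binomial(n, p), with each term computed in log space."""
    if x < 0:
        return 0.0
    if x >= n or p <= 0.0:
        return 1.0
    if p >= 1.0:
        return 0.0
    log_p, log_q = math.log(p), math.log1p(-p)
    log_n_fact = math.lgamma(n + 1)
    total = 0.0
    for k in range(x + 1):
        log_pmf = (log_n_fact - math.lgamma(k + 1) - math.lgamma(n - k + 1)
                   + k * log_p + (n - k) * log_q)
        total += math.exp(log_pmf)
    return min(total, 1.0)


def _root_increasing(g, iters=100):
    """Bisection for the root of a function g that increases on (0, 1)."""
    lo, hi = 0.0, 1.0
    for _ in range(iters):
        mid = (lo + hi) / 2.0
        if g(mid) < 0.0:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2.0


def clopper_pearson(x, n, conf=0.95):
    """Exact (Clopper-Pearson) confidence interval for the proportion x/n."""
    a = (1.0 - conf) / 2.0
    # Lower bound solves P(X >= x | p) = a; the left side increases with p.
    lower = 0.0 if x == 0 else _root_increasing(lambda p: (1.0 - binom_cdf(x - 1, n, p)) - a)
    # Upper bound solves P(X <= x | p) = a; a - cdf increases with p.
    upper = 1.0 if x == n else _root_increasing(lambda p: a - binom_cdf(x, n, p))
    return lower, upper


def yates_chi2_pvalue(a, b, c, d):
    """Yates-corrected chi-squared statistic and p-value for [[a, b], [c, d]] (1 df)."""
    n = a + b + c + d
    rows, cols = (a + b, c + d), (a + c, b + d)
    stat = 0.0
    for i, row in enumerate(((a, b), (c, d))):
        for j, observed in enumerate(row):
            expected = rows[i] * cols[j] / n
            stat += max(abs(observed - expected) - 0.5, 0.0) ** 2 / expected
    # For 1 df, P(chi2 > s) = erfc(sqrt(s / 2)).
    return stat, math.erfc(math.sqrt(stat / 2.0))
```

For example, `clopper_pearson(10, 541)` gives approximately (0.0089, 0.0337), matching the Diabetes = No row, and `yates_chi2_pvalue(463, 19, 727, 7)` gives p ≈ 0.0009 for the SGLT2i-held comparison.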

Multivariable logistic regression for eDKA among evaluable encounters; estimates are odds ratios (the intercept row is the odds for the reference profile).

| Term | Odds ratio | 95% CI low | 95% CI high | p-value |
|---|---|---|---|---|
| Intercept (reference odds) | 0.032 | 0.012 | 0.077 | <0.0001 |
| Diabetes (yes vs no) | 1.247 | 0.563 | 2.890 | 0.5920 |
| Surgery (yes vs no) | 1.649 | 0.709 | 3.660 | 0.2270 |
| HF subtype: HFrEF (vs no HF) | 1.553 | 0.583 | 3.760 | 0.3470 |
| HF subtype: HFpEF (vs no HF) | 1.765 | 0.492 | 5.030 | 0.3250 |
| CKD (yes vs no) | 1.004 | 0.385 | 2.339 | 0.9930 |
| NPO/Poor intake (yes vs no) | 0.939 | 0.394 | 2.098 | 0.8810 |
| SGLT2i held (yes vs no) | 0.237 | 0.092 | 0.546 | 0.0013 |
| Sex: male (vs female) | 0.613 | 0.265 | 1.354 | 0.2340 |
| Age (per 10 years) | 1.037 | 0.803 | 1.336 | 0.7790 |
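
The fitted estimates are exponentiated coefficients (odds ratios). As a reference point, this short Python sketch converts the generating log-odds coefficients from the Covariates paragraph into odds ratios, OR = exp(β). The fitted model differs from the generating one (it uses HF subtype and sex), and with only 26 events the fitted odds ratios can sit well away from these values, so the comparison is approximate.

```python
import math

# Log-odds coefficients used to generate the data (Covariates paragraph).
GENERATING_BETA = {"DM": 0.45, "Surg": 0.55, "HF": 0.35, "CKD": 0.30,
                   "NPO": 0.40, "Hold": -0.25, "Age10": 0.10}


def to_odds_ratios(betas):
    """Convert log-odds coefficients to odds ratios, OR = exp(beta)."""
    return {name: math.exp(b) for name, b in betas.items()}


true_or = to_odds_ratios(GENERATING_BETA)
for name, value in true_or.items():
    print(f"{name:6s} OR = {value:.3f}")
```

This prints odds ratios of about 1.57 (DM), 1.73 (Surg), 1.42 (HF), 1.35 (CKD), 1.49 (NPO), 0.78 (Hold), and 1.11 per 10 years (Age10).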