Diabetic Readmission Risk Lab

A multi-model decision-support and business-impact tool for diabetic discharge planning

Anush Goel · Mandeep Kaur · Sandeep Kaur · ALY 6015 Final Project · May 2026

68,061 patients · 7 ML models · 19 features · Lasso AUC = 0.604 · Live in browser

Model Scoreboard — Seven Algorithms, One Held-Out Comparison

All seven classifiers were trained on the same 47,642-patient training split and evaluated on the same 20,419-patient test set. Pick a champion to use in the Live Risk Predictor and Business P&L tabs.

  • Recommended model: Lasso (L1) — 10 of 19 features, auditable
  • Held-out AUC: 0.604 (95% CI [0.591, 0.617])
  • Risk gradient: 3.3× (top vs bottom decile)
  • Annual net benefit: +$1.31M (at 10K discharges, 30% prevention)
  • Models trained: 7 (linear, tree, neural)
  • Features used: 19 (administrative only)
  • Training patients: 47,642 (70% stratified split)
  • Test patients: 20,419 (30% held-out)
  • Base rate: 9.02% (imbalanced)

Cross-Model Comparison Table

All metrics are computed on the same held-out test set at each model's Youden-optimal threshold. AUC measures discrimination; sensitivity is recall; specificity is true-negative rate; precision is positive predictive value; F1 balances precision and recall. The starred row marks the best AUC.

95% CIs computed analytically via the Hanley-McNeil (1982) standard error. All seven models' CIs overlap, confirming they are statistically indistinguishable.

SE(AUC) = sqrt( [ AUC(1 − AUC) + (n+ − 1)(Q1 − AUC²) + (n− − 1)(Q2 − AUC²) ] / (n+ · n−) ), where Q1 = AUC / (2 − AUC), Q2 = 2·AUC² / (1 + AUC), and n+ and n− are the positive and negative test-set counts.
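As a sketch, the interval can be reproduced in a few lines of R; the positive and negative counts below are inferred from the 20,419-patient test set at the 9.02% base rate, so treat them as approximations:

# Hanley-McNeil (1982) analytic CI for an AUC estimate
hanley_mcneil_ci <- function(auc, n_pos, n_neg, level = 0.95) {
  q1 <- auc / (2 - auc)          # P(two random positives both outrank a negative)
  q2 <- 2 * auc^2 / (1 + auc)    # P(a positive outranks two random negatives)
  se <- sqrt((auc * (1 - auc) +
              (n_pos - 1) * (q1 - auc^2) +
              (n_neg - 1) * (q2 - auc^2)) / (n_pos * n_neg))
  z <- qnorm(1 - (1 - level) / 2)
  c(lower = auc - z * se, upper = auc + z * se)
}
hanley_mcneil_ci(0.604, n_pos = 1842, n_neg = 18577)  # approx. [0.590, 0.618]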

Discrimination — AUC Across All 7 Models

AUC is the area under the ROC curve: the probability that a randomly chosen positive case scores higher than a randomly chosen negative one. 0.5 = chance; 1.0 = perfect.

All models cluster near AUC = 0.60. This is the information ceiling of administrative-data features for this outcome — not an algorithm-choice problem.

ROC Curves — All Models Overlaid

An ROC curve traces sensitivity (TPR) against 1 − specificity (FPR) as the threshold varies from 1 to 0. Steeper, closer-to-corner curves are better.

The curves are visually indistinguishable, confirming the AUC convergence.

Multi-Metric Radar — Where Each Model Wins

Axes are min-max normalized across the 7 models, so the radar shows relative performance, not absolute. Look for models that bulge on the metric you care about (e.g. sensitivity for triage).
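A one-liner makes the normalization concrete; metrics_df below is a placeholder name for a 7-row table of per-model metrics, not an object from the Appendix script:

minmax <- function(v) (v - min(v)) / (max(v) - min(v))   # rescale each axis to [0, 1]
# radar_df <- data.frame(lapply(metrics_df, minmax))     # one column per radar axis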

Calibration — Are Predicted Probabilities Trustworthy?

A calibration plot bins patients by their predicted probability, then plots predicted vs observed rates. Points on the diagonal mean the model is well calibrated (a "20% risk" really happens 20% of the time).

Each dot is one of 10 risk deciles. The diagonal is perfect calibration. Lasso’s points hug the diagonal closely — the predicted probabilities can be trusted at face value, which is essential for the business P&L math.

Learning Curves — Would More Training Data Help?

Train Lasso on increasingly large subsets of the training data and measure test AUC. If the curve has flattened, the model has extracted all available signal; if it is still rising, more data would help.

The curve flattens between 30K and 47K training patients. Additional data would not materially improve AUC — the bottleneck is feature signal, not sample size.
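The Appendix script does not include the learning-curve loop, so here is a minimal sketch of the procedure, assuming the script has been sourced (X_tr_s, y_tr, X_te_s, y_te in scope); the subset sizes are illustrative:

library(glmnet); library(pROC)
set.seed(42)
sizes <- c(5000, 10000, 20000, 30000, 47642)
lc_auc <- sapply(sizes, function(n) {
  rows <- sample(nrow(X_tr_s), n)                         # random training subset
  fit  <- cv.glmnet(X_tr_s[rows, ], y_tr[rows], family = "binomial",
                    alpha = 1, type.measure = "auc", nfolds = 5)
  p    <- as.numeric(predict(fit, X_te_s, s = "lambda.min", type = "response"))
  as.numeric(auc(roc(y_te, p, quiet = TRUE)))             # test AUC at this size
})
plot(sizes, lc_auc, type = "b", xlab = "Training patients", ylab = "Test AUC")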

Feature Correlation Heatmap

Pairwise Pearson correlations on the 19 features. High off-diagonal values (|r| > 0.5) indicate multicollinearity, which explains why Lasso can drop some features without losing AUC.

Strong correlations between admission-type indicators and between prior-utilization features explain Lasso’s feature drops — the dropped features are mostly redundant.
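A sketch of the heatmap computation, assuming X (the raw 19-column feature matrix) from the Appendix script is in scope:

library(ggplot2)
cor_long <- as.data.frame(as.table(cor(X)))               # pairwise Pearson r
names(cor_long) <- c("f1", "f2", "r")
ggplot(cor_long, aes(f1, f2, fill = r)) +
  geom_tile() +
  scale_fill_gradient2(limits = c(-1, 1)) +
  theme(axis.text.x = element_text(angle = 45, hjust = 1)) +
  labs(x = NULL, y = NULL)
subset(cor_long, abs(r) > 0.5 & as.integer(f1) < as.integer(f2))  # collinear pairs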

Subgroup Performance — Fairness Audit

A fair model performs comparably across demographic subgroups. Large AUC or sensitivity gaps indicate the model is more accurate for some patients than for others.

Lasso’s AUC across demographic subgroups. Differences under 0.02 are within statistical noise; gaps above 0.04 warrant investigation.

Reading the fairness audit. Performance is broadly comparable across race and gender subgroups (max AUC gap = 0.018). The largest disparity is by age: the model has higher discrimination for 60+ patients than for under-40s, likely because younger diabetics are rarer and the model has less training signal for them. Action: monitor performance for under-40 patients post-deployment.
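The subgroup audit reduces to grouped AUCs on the test rows; a sketch assuming the Appendix script has been sourced (df, idx, p_lasso, y_te in scope):

library(dplyr); library(pROC)
test_df <- df[-as.vector(idx), ] %>%                      # held-out rows of the cohort
  mutate(p = p_lasso, y = y_te)                           # attach predictions + outcome
test_df %>%
  mutate(age_band = cut(age_num, c(0, 40, 60, 100),
                        labels = c("<40", "40-60", "60+"))) %>%
  group_by(age_band) %>%                                  # swap in gender or race the same way
  summarise(n = n(), auc = as.numeric(auc(roc(y, p, quiet = TRUE))))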

Context: Comparison vs CMS HRRP Penalty

Under the CMS Hospital Readmissions Reduction Program (HRRP), the average penalty for a 200-bed hospital with above-expected readmissions is roughly $130K to $250K per year. The Lasso model’s annual net benefit of +$1.31M covers that penalty exposure approximately 5–10× over — deployment is financially compelling even before counting reputational benefits.

Reading the scoreboard. Random Forest leads on sensitivity (71%) but pays in specificity (44%) — it casts the widest net. Lasso and Ridge tie for top AUC at 0.604 and both produce calibrated probabilities; Lasso uses fewer features. Logistic GLM is the interpretable baseline that gives you odds ratios for free. Pick Lasso for production deployment, Random Forest for sensitivity-first triage, GLM when stakeholders demand interpretability.

Live Risk Predictor

Move sliders, toggle markers, change demographics — the chosen model’s prediction updates in real time. The hero panel below stays in view as you scroll through inputs.

1. Pick a Model

All seven models are loaded. Switch any time — the NO EFFECT tags update to show what each model drops.
Predicted 30-Day Readmission Risk
%
vs population baseline of 9.02%
Active model: Lasso (L1)
Decision threshold
Cohort baseline: 9.02%
Risk vs baseline
Risk percentile

Where this patient ranks in the 68,061-patient cohort

0% (lowest risk) · 50% · 100% (highest risk)

Predicted probability 8.3% · approx. 95% prediction interval [6.6%, 10.4%] based on cross-validated residual variance

Recent predictions (last 6)

Each dot shows the predicted risk for a recent scenario you tested; the current scenario is the larger dot. Click any dot to recall it.

No prior predictions yet — move sliders to start a history.
Among similar patients in the test cohort
Save scenario:

Saved scenarios will appear here

2. Adjust Patient Profile

Move sliders, toggle markers. The hero result above updates on every change. Each slider shows where the current value sits in the cohort distribution.
3. Decision Waterfall — Why This Prediction?

Read top to bottom: start at the model intercept, add each feature's contribution, and end at the final log-odds. The sigmoid of the final value gives the probability.

Use it to audit, step by step, exactly how the model reasoned.
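For a linear model on standardized inputs the decomposition is exact: log-odds = intercept + Σ βj·zj, so each addend is one bar of the waterfall. A sketch using the Appendix objects (lasso_coefs, X_te_s, feat_labels), with an arbitrary test patient:

z       <- X_te_s[1, ]                            # one standardized patient row
contrib <- lasso_coefs[-1] * z                    # per-feature log-odds contributions
prob    <- plogis(lasso_coefs[1] + sum(contrib))  # intercept + sum -> sigmoid -> probability
ord     <- order(-abs(contrib))
head(data.frame(feature = feat_labels[ord],
                contribution = round(contrib[ord], 4)), 8)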

How a clinician would use this: At point of discharge, enter the patient’s profile in under 60 seconds. If the badge reads Above baseline with the YES flag, route to the high-risk pathway — 48-hour follow-up call, home-health referral, transitional-care visit at day 7. The waterfall above explains why, which is what makes the decision auditable.

Patient Comparison — What If You Change One Thing?

Configure two patient profiles side-by-side. Toggle a single feature on Patient B to see exactly how much it moves the prediction. This is how clinicians audit model behavior and how data scientists test for fairness.

1. Try a Built-In Scenario

One click loads two patient profiles. Useful for clinical training and model auditing.
2. Configure the Two Patients

Edit either card directly. The hero numbers and the comparison table below update live.
Patient A — reference profile
Patient B — comparison profile
3. Where the Two Patients Differ

Only features with different values between A and B are listed below.
4. Driver Comparison

Top 8 features by absolute contribution, shown for both patients side by side.

Why this tab matters. A single-number prediction is hard to trust. Seeing exactly how the model responds to a one-variable change — diabetes medication off versus on, holding everything else equal — builds clinician trust, surfaces model bugs, and exposes potential bias. Try the Prior Inpatient scenario: it’s the single strongest predictor in the Lasso model, and you can watch the gauge move from low to high.

Business P&L — What Is the Model Worth?

Set your hospital’s economic assumptions. The model picks the threshold that maximizes net benefit — KPI tiles, confusion matrix, decision-curve, and break-even analysis update live.

Economic Assumptions (Edit Any Field)

All downstream calculations re-run live as you edit. Defaults are sourced from Jencks et al. (NEJM 2009) and CMS HRRP penalty data.

Care coordinator + follow-up call
Avg DRG-weighted cost
0 = no effect, 1 = all preventable
Your hospital’s volume
Your cohort’s untreated rate
EHR integration + training
Active model:
The KPI tiles update live:
  • Flagged / Year
  • Readmissions Caught
  • Readmissions Missed (false negatives)
  • Intervention Cost (annual outlay)
  • Readmission Savings (annual avoided)
  • Net Benefit / Year (bottom line)
  • ROI (savings / cost)
  • NNF (flags per catch)
  • Net / Discharge (per-patient value)
  • Do-Nothing Cost (no model deployed)
  • Optimal Threshold (net-benefit max)
Break-Even Prevention Effectiveness
Below this prevention rate, the model loses money. Current setting is 30% — the margin of safety is .
Time to Payback Implementation
Your one-time implementation cost is recovered from net benefit in this many days. Lower is better.

Net Benefit Across All Decision Thresholds

The red marker is the threshold that maximizes annual net benefit under your current settings. Move the economic sliders above to watch it shift.

Decision Curve Analysis (Vickers-Elkin)

DCA compares the model against "Treat All" and "Treat None" strategies across every plausible threshold probability. The model is useful at any threshold where its line sits above both alternatives.

Standard medical-economics framework for evaluating prediction models. The model beats both “treat everyone” and “treat no one” across all clinically reasonable thresholds — this is the formal justification for deploying it over either default.
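The net-benefit curve reduces to one formula, NB(pt) = TP/n − (FP/n) · pt/(1 − pt), with "treat all" obtained by flagging everyone. A minimal sketch, assuming p_lasso and y_te from the Appendix script:

nb_curve <- function(probs, y, pts = seq(0.02, 0.30, 0.01)) {
  n <- length(y)
  sapply(pts, function(pt) {
    flag <- probs >= pt
    tp <- sum(flag & y == 1); fp <- sum(flag & y == 0)
    c(model      = tp / n - fp / n * pt / (1 - pt),
      treat_all  = mean(y) - (1 - mean(y)) * pt / (1 - pt),
      treat_none = 0)
  })
}
pts <- seq(0.02, 0.30, 0.01)
matplot(pts, t(nb_curve(p_lasso, y_te, pts)), type = "l", lty = 1,
        xlab = "Threshold probability", ylab = "Net benefit")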

Manual Threshold Playground

Drag to set any threshold and see the confusion matrix and net benefit at that operating point. Hospital leadership may prefer a sensitivity-tilted or specificity-tilted threshold over the Youden-optimal one.

0.08

Override the Youden-optimal threshold to test sensitivity-first or specificity-first deployments.

True Positives · False Positives · False Negatives · True Negatives

At this manual threshold: Net benefit · ROI .

Cost Scenarios — Side-by-Side Comparison

Three intervention-cost scenarios, each shown at its own Youden-optimal threshold. Pick the column closest to your hospital's actual program cost to see which intervention budget produces the best net benefit.

Confusion Matrix at Optimal Threshold

Scaled to your annual discharge volume. Green = correct, red = errors.

Risk Gradient by Predicted-Risk Decile

The model concentrates risk in the top deciles — this is what powers triage even at modest AUC.

Reading the P&L. The break-even logic is simple: one prevented readmission ($15,400) is worth roughly 50 interventions ($300 each). The model only needs to convert one in 50 flagged patients into a prevented readmission to break even. At the Youden-optimal threshold and 30% prevention effectiveness, the model converts roughly one in 10 — a 5× margin over break-even.
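The paragraph's arithmetic as a quick R sanity check; the one-in-ten conversion rate is quoted from the text rather than recomputed:

cost_intervention <- 300                           # per flagged patient
cost_readmission  <- 15400                         # per avoided readmission
breakeven <- cost_intervention / cost_readmission  # conversions needed per flag
breakeven                                          # ~0.0195, i.e. ~1 catch per 51 flags
(1 / 10) / breakeven                               # stated conversion rate -> ~5.1x margin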

Sensitivity analysis. Try halving prevention effectiveness to 0.15: the model still produces positive net benefit. Try setting intervention cost to $1,000 (very expensive program): the optimal threshold rises but net benefit remains positive. The model is robust to substantial parameter uncertainty — this is the case for deploying it.

Research Findings — Three Questions, Three Statistical Instruments

Each research question was answered with the statistical method most appropriate to the data structure. All three null hypotheses were rejected, but only some effects are practically meaningful.

Showing full cohort (N = 68,061)

Cohort at a Glance

After preprocessing the raw 101,766-encounter dataset, the analytic sample contains 68,061 first-encounter patients. Age dominates the distribution, with the modal patient in the 60-80 band.

  • 9.02% of patients are readmitted within 30 days — 6,142 events on a 68,061-patient denominator.
  • The mean length of stay is 4.4 days; the 95th percentile is 11 days.
  • 78% of patients are on diabetes medication at discharge; 54% are on insulin.
  • The cohort is dominated by older patients: 65% are aged 60 or above.

Q1 — Does discharge disposition predict 30-day readmission?

Method: Chi-square test of independence on the 5-level disposition × binary outcome cross-tabulation. Cramér’s V reported to quantify effect strength independent of sample size.

Result: χ²(4, N = 68,061) = 757.06, p < .001, Cramér’s V = 0.106. We reject the null hypothesis of independence.

Substantive Interpretation

  • Patients discharged home with home-health services readmit at 19.8% — more than twice the 9.02% baseline.
  • This subgroup is only 3.2% of patients but generates 7.0% of all readmissions.
  • The Cramér’s V of 0.106 indicates a small-to-medium effect — statistically significant and practically meaningful.
  • Operational action: Re-evaluate the home-health referral criteria. If sicker patients are being routed to home-health because they cannot go home alone, the 19.8% rate reflects underlying acuity, not failure of home-health.

Q2 — Does length of stay vary by admission type and age, with interaction?

Method: Two-way ANOVA with interaction term. F-statistics and eta-squared (η²) reported separately to distinguish statistical from practical significance.

Result: Admission F(2, 60,253) = 53.54, p < .001, η² = 0.0018; Age F(2, 60,253) = 373.88, p < .001, η² = 0.0122; Interaction F(4, 60,253) = 2.85, p = .023, η² = 0.0002. All three null hypotheses rejected.

Substantive Interpretation

  • All three effects are statistically significant because n = 60,000+ is huge.
  • Only age is practically meaningful (η² = 0.0122, i.e. age explains 1.22% of LOS variance).
  • The roughly parallel lines confirm the additive structure: older patients require longer stays regardless of admission urgency.
  • Operational action: Capacity planners should size beds by age cohort first, admission type second. A unit serving primarily 60+ patients needs ~30% more bed-days per admission than a 40−60 unit.

Q3 — Can regularization recover a parsimonious model with no loss in discrimination?

Method: Compare 7 classifiers (logistic GLM, Ridge L2, Lasso L1, Elastic Net α=0.5, Random Forest, XGBoost, Neural Network) on the same held-out test split.

Result: All models cluster at AUC ≈ 0.60−0.61. Lasso retains 10 of 19 features — a 47% reduction at zero AUC cost.

  • The administrative-data ceiling. Seven different algorithm families produce statistically indistinguishable AUC. This is not a model-selection problem; the dataset’s features simply do not contain more signal.
  • Lasso drops: minority-race indicators, gender, HbA1c testing flag, admission type binaries, prior outpatient visits.
  • Lasso keeps: prior inpatient visits (strongest predictor), length of stay, number of diagnoses, age, insulin use, diabetes medication, medication change flag, and 3 others.
  • Operational action: Deploy Lasso for the discharge-screening tool. The 10-variable form is auditable on a single sheet of paper — an essential property for clinical decision support.

For the seven-model comparison details, see the Model Scoreboard tab.

What three questions together tell us. Discharge pathway is a strong leading indicator (Q1). Age, not admission type, drives capacity needs (Q2). And the predictive signal in administrative data tops out at AUC ≈ 0.60 regardless of algorithm — but that’s still enough to produce a 3× risk gradient between deciles, which is the operational engine of triage (Q3). The three answers together justify a deployment-ready model used in conjunction with discharge-pathway redesign, not in place of it.

Methods, Data, & References

Full provenance for every analytic choice. Reproduce by sourcing the R script in the Appendix tab.

Data Source & Cohort Construction

The Diabetes 130-US Hospitals dataset (Strack et al., 2014), accessed via the UCI Machine Learning Repository, contains 101,766 inpatient encounters across 130 U.S. hospitals between 1999 and 2008.

Preprocessing Pipeline

  • Removed three high-missingness columns: weight (97% missing), payer_code (40%), medical_specialty (49%).
  • Excluded deceased and hospice-discharge encounters (discharge_disposition_id ∈ {11, 13, 14, 19, 20, 21}).
  • De-duplicated to the first encounter per patient_nbr to avoid intra-patient dependence.
  • Dropped rows missing race, primary diagnosis, or with gender = Unknown/Invalid.
  • Final analytic sample: 68,061 patients with 9.02% 30-day readmission rate.

Feature Engineering (19 final predictors)

  • Utilization (3): prior inpatient, ER, and outpatient visits in the year before the index encounter.
  • Index encounter (5): length of stay, lab procedures, in-hospital procedures, medications, diagnoses.
  • Demographics (5): age, gender, race (3 indicators).
  • Treatment markers (4): HbA1c tested, medication changed, on diabetes med, on insulin.
  • Admission context (2): emergency and urgent admission indicators.

Statistical & Machine-Learning Methods

Q1 — Chi-Square Test of Independence

Tests the null hypothesis that discharge disposition and 30-day readmission are statistically independent. Cramér’s V quantifies effect strength (0 = none, 1 = perfect), correcting for sample size so that the result is interpretable independent of the n = 68,061 denominator that would make any tiny effect “significant”.

Q2 — Two-Way ANOVA With Interaction

Tests three simultaneous nulls: no admission-type effect, no age-group effect, no interaction. Eta-squared (η²) is reported for each effect because in a 68,061-row dataset every p-value will be vanishingly small — we need an effect-size measure to know if any of it matters operationally.

Q3 — Seven Classifiers Compared

  • Logistic GLM — Interpretable baseline. Coefficients are log-odds ratios.
  • Ridge (L2) — cv.glmnet, α = 0, 5-fold CV optimized for AUC. Shrinks coefficients toward zero without dropping features.
  • Lasso (L1) — Same procedure, α = 1. Drops weak features entirely.
  • Elastic Net — α = 0.5 hybrid of L1 and L2. Compromise between Ridge and Lasso.
  • Random Forest — 200 trees, at most 100 terminal nodes, minimum leaf size 20. Captures non-linear interactions.
  • XGBoost — 300 boosting rounds, depth 4, learning rate 0.05, 85% column & row subsample.
  • Neural Network — Single hidden layer with 16 logistic (sigmoid) units, weight decay 0.001, maximum of 200 training iterations.

Evaluation Protocol

70/30 stratified train/test split with caret::createDataPartition and set.seed(42). All metrics reported on the held-out test set. Optimal classification threshold chosen by Youden’s J (maximizing sensitivity + specificity − 1).
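Equivalently, Youden's J can be computed transparently instead of through pROC::coords; a sketch assuming p_lasso and y_te from the Appendix script:

library(pROC)
r <- roc(y_te, p_lasso, quiet = TRUE)
j <- r$sensitivities + r$specificities - 1   # Youden's J at every threshold
r$thresholds[which.max(j)]                   # the Youden-optimal cutoff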

Live Deployment Architecture

All seven models are trained once, offline. The Lasso intercept, the per-model coefficient vectors (for linear models) or linear surrogates fit on logit(p) (for tree and neural models), and the standardization statistics are exported as a single 20 KB JSON cache. The cache is embedded into the published HTML page; a JavaScript engine reads it and recomputes predictions, threshold-based decisions, and per-feature contributions in real time as the user moves sliders. No server, no R session, no compute budget, and no latency.
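For verification, the browser engine's computation can be replayed in R against the exported cache. Field names follow the payload built in the Appendix (mu, sd, models$lasso$coefs and $intercept); x_raw must hold the 19 raw feature values in cache$features order:

library(jsonlite)
cache <- read_json("model_cache.json", simplifyVector = TRUE)
mu  <- unlist(cache$mu)                       # standardization means
sdv <- unlist(cache$sd)                       # standardization SDs
las <- cache$models$lasso
predict_from_cache <- function(x_raw) {
  z <- (x_raw - mu) / sdv                     # standardize exactly as at training
  plogis(las$intercept + sum(las$coefs * z))  # linear score -> sigmoid
}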

Data Dictionary — All 19 Features

Required for reproducibility and clinical handoff: the precise operational definition of every feature in the model, with its source variable in the raw dataset.

| Feature | Type | Definition | Source |
|---|---|---|---|
| age_num | numeric | Age band midpoint (5, 15, 25, …, 95) derived from the original 10-year age groups | age |
| gender_bin | binary | 1 if Male, 0 if Female; Unknown/Invalid records dropped | gender |
| race_AA | binary | 1 if African American, 0 otherwise | race |
| race_His | binary | 1 if Hispanic, 0 otherwise | race |
| race_Oth | binary | 1 if Other (Asian, Other), 0 otherwise; Caucasian is the reference | race |
| adm_emerg | binary | 1 if Emergency admission (admission_type_id = 1), 0 otherwise | admission_type_id |
| adm_urgent | binary | 1 if Urgent admission (admission_type_id = 2), 0 otherwise | admission_type_id |
| time_in_hospital | integer | Length of stay in days (1–14) | time_in_hospital |
| num_lab_procedures | integer | Number of laboratory tests during the encounter | num_lab_procedures |
| num_procedures | integer | Number of non-lab procedures performed during the encounter | num_procedures |
| num_medications | integer | Number of distinct generic medication names administered | num_medications |
| number_outpatient | integer | Outpatient visits in the 365 days before the index encounter | number_outpatient |
| number_emergency | integer | Emergency-room visits in the 365 days before the index encounter | number_emergency |
| number_inpatient | integer | Inpatient admissions in the 365 days before the index encounter | number_inpatient |
| number_diagnoses | integer | Number of distinct diagnosis codes assigned to the encounter | number_diagnoses |
| A1C_tested | binary | 1 if HbA1c was measured during the encounter (any result), 0 if "None" | A1Cresult |
| change_bin | binary | 1 if any diabetic medication was changed during the encounter, 0 otherwise | change |
| diabetesMed_bin | binary | 1 if patient is prescribed diabetes medication at discharge, 0 otherwise | diabetesMed |
| insulin_bin | binary | 1 if insulin was prescribed or its dose changed during the encounter, 0 otherwise | insulin |

Limitations & Generalizability Caveats

  • Vintage of data. The dataset covers 1999–2008. Diabetes management, hospital reimbursement (HRRP launched 2012), and discharge practices have evolved substantially since. Recalibration on contemporary data is essential before clinical deployment.
  • Single-country generalizability. All 130 hospitals are U.S. facilities. Care patterns, insurance mix, and outcome definitions differ in other healthcare systems — results may not transfer without local refitting.
  • Race coding is coarse. The original dataset uses six race categories that collapse heterogeneous populations (e.g., “Asian” combines populations with very different diabetes profiles). Bias audits should use richer demographic data when available.
  • Outcome definition. 30-day readmission counts any readmission to any of the 130 facilities; readmissions to other hospitals are missed, biasing the outcome rate downward by an estimated 5–10%.
  • Administrative-data ceiling. All seven models converge near AUC = 0.60 because the available features capture utilization but not clinical state (lab values, vitals, comorbidity severity). Adding ICD-coded comorbidities, lab trends, and social-determinant data would likely raise AUC to 0.70–0.75.
  • No prospective validation. All metrics are from a single random train/test split, not a temporal or external validation cohort. Real deployments should validate on hold-out time periods.
  • Causal interpretation. Coefficients describe statistical association at discharge, not causal effect. Insulin prescribing is correlated with disease severity; the model is not a recommendation for or against insulin.

References (APA 7th)

Centers for Medicare & Medicaid Services. (2023). Hospital Readmissions Reduction Program (HRRP). https://www.cms.gov/medicare/quality-initiatives-patient-assessment-instruments/value-based-programs/hrrp/hospital-readmissions-reduction-program-hrrp

Chen, T., & Guestrin, C. (2016). XGBoost: A scalable tree boosting system. Proceedings of the 22nd ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, 785-794. https://doi.org/10.1145/2939672.2939785

Friedman, J., Hastie, T., & Tibshirani, R. (2010). Regularization paths for generalized linear models via coordinate descent. Journal of Statistical Software, 33(1), 1-22.

Jencks, S. F., Williams, M. V., & Coleman, E. A. (2009). Rehospitalizations among patients in the Medicare fee-for-service program. New England Journal of Medicine, 360(14), 1418-1428.

Kuhn, M. (2008). Building predictive models in R using the caret package. Journal of Statistical Software, 28(5), 1-26.

Liaw, A., & Wiener, M. (2002). Classification and regression by randomForest. R News, 2(3), 18-22.

Strack, B., DeShazo, J. P., Gennings, C., Olmo, J. L., Ventura, S., Cios, K. J., & Clore, J. N. (2014). Impact of HbA1c measurement on hospital readmission rates. BioMed Research International, 2014, 781670.

Tibshirani, R. (1996). Regression shrinkage and selection via the lasso. Journal of the Royal Statistical Society: Series B, 58(1), 267-288.

Appendix — R Source Code

The complete reproducible R script. Click any section header to expand or collapse.

Reproducibility: set.seed(42). Place diabetic_data.csv in the working directory and source the script. The exported model_cache.json (~20 KB) is what this RPubs page embeds at knit time.

Complete R Script (645 lines) — click to collapse
# =============================================================================
# ALY 6015 — Final Project (Module 6)
# Predicting 30-Day Hospital Readmission in Diabetic Patients
# Multi-Model Risk Lab — trains 7 classifiers, runs 3 statistical tests,
# prints every figure in the report, exports JSON cache for the RPubs app.
#
# Authors:  Anush Goel · Mandeep Kaur · Sandeep Kaur
# Date:     May 2026
# =============================================================================
# Sections (each starts with "# ---- N. Name -----" so the Appendix tab
# automatically splits them into collapsible blocks):
#   0. Setup
#   1. Data preprocessing  (101,766 -> 68,061)
#   2. Q1 — Chi-square: discharge disposition x readmission
#   3. Q2 — Two-way ANOVA: length of stay ~ admission x age
#   4. Q3 — Train 7 classifiers (GLM, Ridge, Lasso, Enet, RF, XGB, NN)
#   5. Score & cache (model_cache.json)
#   6. Report figures — print only, no file saving
# Reproducibility: set.seed(42)
# =============================================================================


# ---- 0. Environment Setup & Reproducibility ----
pkgs <- c("tidyverse","janitor","glmnet","randomForest","xgboost",
          "nnet","pROC","caret","jsonlite","plotly","scales","kableExtra")
for (p in pkgs) if (!requireNamespace(p, quietly = TRUE)) install.packages(p)
invisible(lapply(pkgs, library, character.only = TRUE))

set.seed(42)
options(scipen = 999)

NAVY  <- "#1A3A6C"; TEAL  <- "#2C8AA8"; ORANGE <- "#E07B1F"
RED   <- "#C1352D"; GREEN <- "#3A8A3A"; GREY   <- "#7A7A7A"

report_theme <- theme_minimal(base_size = 12) +
  theme(plot.title    = element_text(face = "bold", color = NAVY, size = 13,
                                     margin = ggplot2::margin(b = 10)),
        plot.subtitle = element_text(color = GREY, size = 10,
                                     margin = ggplot2::margin(b = 10)),
        axis.title    = element_text(size = 11),
        panel.grid.minor = element_blank(),
        panel.grid.major = element_line(linewidth = 0.3, linetype = "dashed"),
        legend.position  = "top",
        legend.title     = element_text(face = "bold", size = 10))


# ---- 1. Data Preprocessing & Cohort Construction ----
df <- read_csv("diabetic_data.csv", na = c("?", ""), show_col_types = FALSE)
cat("Raw encounters:", nrow(df), "\n")

df <- df %>%
  select(-weight, -payer_code, -medical_specialty) %>%
  filter(!discharge_disposition_id %in% c(11, 13, 14, 19, 20, 21)) %>%
  arrange(encounter_id) %>%
  distinct(patient_nbr, .keep_all = TRUE) %>%
  filter(!is.na(race), !is.na(diag_1), gender != "Unknown/Invalid") %>%
  mutate(readmit_30      = if_else(readmitted == "<30", 1L, 0L),
         A1C_tested      = as.integer(A1Cresult != "None"),
         change_bin      = as.integer(change == "Ch"),
         diabetesMed_bin = as.integer(diabetesMed == "Yes"),
         insulin_bin     = as.integer(insulin != "No"),
         gender_bin      = as.integer(gender == "Male"),
         race_AA         = as.integer(race == "AfricanAmerican"),
         race_His        = as.integer(race == "Hispanic"),
         race_Oth        = as.integer(race == "Other"),
         age_num = case_when(
           age == "[0-10)"  ~  5, age == "[10-20)" ~ 15, age == "[20-30)" ~ 25,
           age == "[30-40)" ~ 35, age == "[40-50)" ~ 45, age == "[50-60)" ~ 55,
           age == "[60-70)" ~ 65, age == "[70-80)" ~ 75, age == "[80-90)" ~ 85,
           age == "[90-100)" ~ 95),
         adm_emerg  = as.integer(admission_type_id == 1),
         adm_urgent = as.integer(admission_type_id == 2))

cat("Cleaned analytic sample:", nrow(df), "\n")
cat("30-day readmission rate:", scales::percent(mean(df$readmit_30), 0.01), "\n")


# ---- 2. Research Question 1: Discharge Disposition Chi-Square Analysis ----
df <- df %>% mutate(disp_grp = case_when(
  discharge_disposition_id == 1 ~ "Home",
  discharge_disposition_id %in% c(3, 5, 6) ~ "SNF/Rehab",
  discharge_disposition_id == 2 ~ "Other Hospital",
  discharge_disposition_id %in% c(4, 22, 23, 24) ~ "Home w/ Health Svc",
  TRUE ~ "Other"))

tab_q1  <- table(df$disp_grp, df$readmit_30)
q1_test <- chisq.test(tab_q1); print(q1_test)
cramers_v <- sqrt(q1_test$statistic / (sum(tab_q1) * (min(dim(tab_q1)) - 1)))
cat("Cramer's V:", round(cramers_v, 4), "\n")


# ---- 3. Research Question 2: Length-of-Stay Two-Way ANOVA Analysis ----
sub <- df %>%
  filter(admission_type_id %in% 1:3) %>%
  mutate(adm = factor(admission_type_id, levels = c(3, 1, 2),
                      labels = c("Elective", "Emergency", "Urgent")),
         age_grp = case_when(
           age %in% c("[0-10)","[10-20)","[20-30)","[30-40)") ~ "<40",
           age %in% c("[40-50)","[50-60)")                    ~ "40-60",
           TRUE                                               ~ "60+"),
         age_grp = factor(age_grp, levels = c("<40", "40-60", "60+")))

q2_fit <- aov(time_in_hospital ~ adm * age_grp, data = sub)
print(summary(q2_fit))
ssq <- summary(q2_fit)[[1]][["Sum Sq"]]
cat("Eta sq (admission):  ", round(ssq[1]/sum(ssq), 4), "\n")
cat("Eta sq (age):        ", round(ssq[2]/sum(ssq), 4), "\n")
cat("Eta sq (interaction):", round(ssq[3]/sum(ssq), 4), "\n")


# ---- 4. Research Question 3: Seven-Model Predictive Comparison ----
feat_cols <- c("time_in_hospital","num_lab_procedures","num_procedures",
               "num_medications","number_outpatient","number_emergency",
               "number_inpatient","number_diagnoses","age_num",
               "race_AA","race_His","race_Oth","gender_bin","A1C_tested",
               "change_bin","diabetesMed_bin","insulin_bin",
               "adm_emerg","adm_urgent")
feat_labels <- c("Length of stay (days)","# Lab procedures","# In-hospital procedures",
                 "# Medications","Prior outpatient visits","Prior ER visits",
                 "Prior inpatient visits","# Diagnoses","Age",
                 "Race: African American","Race: Hispanic","Race: Other",
                 "Sex: Male","HbA1c tested","Medication changed",
                 "On diabetes medication","On insulin",
                 "Admission: Emergency","Admission: Urgent")

X <- as.matrix(df[, feat_cols]); y <- df$readmit_30
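# 70/30 split; createDataPartition stratifies on the outcome vector y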
idx  <- caret::createDataPartition(y, p = 0.7, list = FALSE)
X_tr <- X[idx, ];  y_tr <- y[idx]
X_te <- X[-idx, ]; y_te <- y[-idx]
mu  <- colMeans(X_tr); sd_ <- apply(X_tr, 2, sd)
X_tr_s <- scale(X_tr, center = mu, scale = sd_)
X_te_s <- scale(X_te, center = mu, scale = sd_)
cat("Train:", nrow(X_tr_s), "Test:", nrow(X_te_s), "\n")

# 4a — Logistic GLM
m_glm <- glm(y_tr ~ ., data = data.frame(X_tr_s, y_tr = y_tr), family = "binomial")

# 4b — Ridge (L2)
m_ridge <- cv.glmnet(X_tr_s, y_tr, family = "binomial",
                     alpha = 0, type.measure = "auc", nfolds = 5)

# 4c — Lasso (L1)
m_lasso <- cv.glmnet(X_tr_s, y_tr, family = "binomial",
                     alpha = 1, type.measure = "auc", nfolds = 5)

# 4d — Elastic Net (alpha = 0.5)
m_enet <- cv.glmnet(X_tr_s, y_tr, family = "binomial",
                    alpha = 0.5, type.measure = "auc", nfolds = 5)

# 4e — Random Forest
m_rf <- randomForest(x = X_tr_s, y = factor(y_tr),
                     ntree = 200, maxnodes = 100, nodesize = 20)

# 4f — XGBoost
dtrain <- xgb.DMatrix(X_tr_s, label = y_tr)
m_xgb  <- xgb.train(data = dtrain, nrounds = 300, max_depth = 4,
                    eta = 0.05, subsample = 0.85, colsample_bytree = 0.85,
                    objective = "binary:logistic", eval_metric = "auc",
                    verbose = 0)

# 4g — Neural Network
m_nn <- nnet(x = X_tr_s, y = y_tr, size = 16, decay = 0.001,
             maxit = 200, trace = FALSE, MaxNWts = 3000)


# ---- 5. Model Evaluation & Production Cache Export ----
score <- function(probs, y_true) {
  roc_obj <- pROC::roc(y_true, probs, quiet = TRUE)
  auc_val <- as.numeric(pROC::auc(roc_obj))
  thr     <- as.numeric(pROC::coords(roc_obj, "best", best.method = "youden")["threshold"])
  pred    <- as.integer(probs >= thr)
  cm      <- table(actual = y_true, predicted = factor(pred, levels = c(0, 1)))
  list(auc = round(auc_val, 4), threshold = round(thr, 4),
       sensitivity = round(cm[2,2] / sum(cm[2,]), 4),
       specificity = round(cm[1,1] / sum(cm[1,]), 4),
       precision   = round(cm[2,2] / max(sum(cm[,2]), 1), 4),
       accuracy    = round((cm[1,1] + cm[2,2]) / sum(cm), 4),
       f1          = round(2 * cm[2,2] / (2 * cm[2,2] + cm[1,2] + cm[2,1]), 4),
       tn = as.integer(cm[1,1]), fp = as.integer(cm[1,2]),
       fn = as.integer(cm[2,1]), tp = as.integer(cm[2,2]),
       fpr_curve = as.numeric(1 - roc_obj$specificities),
       tpr_curve = as.numeric(roc_obj$sensitivities))
}

p_glm   <- predict(m_glm, newdata = data.frame(X_te_s), type = "response")
p_ridge <- as.numeric(predict(m_ridge, X_te_s, s = "lambda.min", type = "response"))
p_lasso <- as.numeric(predict(m_lasso, X_te_s, s = "lambda.min", type = "response"))
p_enet  <- as.numeric(predict(m_enet,  X_te_s, s = "lambda.min", type = "response"))
p_rf    <- predict(m_rf, X_te_s, type = "prob")[, "1"]
p_xgb   <- predict(m_xgb, xgb.DMatrix(X_te_s))
p_nn    <- as.numeric(predict(m_nn, X_te_s, type = "raw"))

scoreboard <- tibble(
  Model = c("Logistic GLM","Ridge (L2)","Lasso (L1)","Elastic Net",
            "Random Forest","XGBoost","Neural Network"),
  AUC   = c(score(p_glm,y_te)$auc, score(p_ridge,y_te)$auc, score(p_lasso,y_te)$auc,
            score(p_enet,y_te)$auc, score(p_rf,y_te)$auc, score(p_xgb,y_te)$auc,
            score(p_nn,y_te)$auc))
print(scoreboard)

# Linear surrogate on logit(p) for non-linear models — lets the JS engine recompute
surrogate <- function(probs, X_s) {
  eps <- 1e-6
  p_clip <- pmin(pmax(probs, eps), 1 - eps)
  lgt   <- log(p_clip / (1 - p_clip))
  fit   <- lm(lgt ~ X_s)
  list(coefs = as.numeric(coef(fit)[-1]),
       intercept = as.numeric(coef(fit)[1]))
}
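# Pre-compute TP/FP counts over a 50-point threshold grid so the browser engine
# can rebuild confusion matrices and net-benefit curves without re-scoring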
thr_grid <- function(probs, y_true) {
  thr  <- seq(0.01, 0.5, length.out = 50)
  tp_v <- integer(length(thr)); fp_v <- integer(length(thr))
  for (i in seq_along(thr)) {
    pred    <- as.integer(probs >= thr[i])
    tp_v[i] <- sum(pred == 1 & y_true == 1)
    fp_v[i] <- sum(pred == 1 & y_true == 0)
  }
  list(thr = thr, tp = tp_v, fp = fp_v,
       all_positives = as.integer(sum(y_true == 1)),
       all_negatives = as.integer(sum(y_true == 0)))
}
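# Observed vs predicted event rate per predicted-risk decile; feeds the
# calibration and risk-gradient panels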
decile_gradient <- function(probs, y_true) {
  d <- ntile(probs, 10)
  out <- tibble(p = probs, y = y_true, d = d) %>%
    group_by(d) %>%
    summarise(pred = mean(p), obs = mean(y), n = n(), .groups = "drop")
  list(decile = as.integer(out$d - 1L), pred = round(out$pred, 4),
       obs = round(out$obs, 4), n = as.integer(out$n))
}
build_entry <- function(name, kind, probs, X_te_s, y_te,
                        coefs = NULL, intercept = NULL, importance = NULL) {
  s <- score(probs, y_te)
  entry <- list(
    name = name, kind = kind,
    metrics = list(auc = s$auc, threshold = s$threshold,
                   sensitivity = s$sensitivity, specificity = s$specificity,
                   precision = s$precision, accuracy = s$accuracy, f1 = s$f1,
                   tn = s$tn, fp = s$fp, fn = s$fn, tp = s$tp,
                   fpr_curve = round(s$fpr_curve[seq(1, length(s$fpr_curve),
                                                    length.out = 80)], 4),
                   tpr_curve = round(s$tpr_curve[seq(1, length(s$tpr_curve),
                                                    length.out = 80)], 4)),
    deciles = decile_gradient(probs, y_te),
    threshold_grid = thr_grid(probs, y_te))
  if (kind == "linear") {
    entry$coefs     <- round(coefs, 5)
    entry$intercept <- round(intercept, 5)
  } else {
    surr <- surrogate(probs, X_te_s)
    entry$surrogate_coefs     <- round(surr$coefs, 5)
    entry$surrogate_intercept <- round(surr$intercept, 5)
    if (!is.null(importance))
      entry$feature_importance <- round(importance, 5)
  }
  entry
}

glm_coefs   <- as.numeric(coef(m_glm))
ridge_coefs <- as.numeric(coef(m_ridge, s = "lambda.min"))
lasso_coefs <- as.numeric(coef(m_lasso, s = "lambda.min"))
enet_coefs  <- as.numeric(coef(m_enet,  s = "lambda.min"))

payload <- list(
  features = feat_cols, labels = feat_labels,
  ui = lapply(seq_along(feat_cols), function(i) {
    fc <- feat_cols[i]
    if (fc %in% c("time_in_hospital","num_lab_procedures","num_procedures",
                  "num_medications","number_outpatient","number_emergency",
                  "number_inpatient","number_diagnoses","age_num"))
      list(type = "slider",
           min  = as.integer(min(X[, fc])),
           max  = as.integer(quantile(X[, fc], 0.99)),
           default = as.integer(round(mean(X[, fc]))),
           step = 1L)
    else
      list(type = "toggle", default = 0L)
  }),
  mu = round(mu, 5), sd = round(sd_, 5),
  population_rate = round(mean(y_te), 5),
  n_test = length(y_te),
  models = list(
    glm   = build_entry("Logistic GLM",  "linear", p_glm,   X_te_s, y_te,
                        glm_coefs[-1], glm_coefs[1]),
    ridge = build_entry("Ridge (L2)",    "linear", p_ridge, X_te_s, y_te,
                        ridge_coefs[-1], ridge_coefs[1]),
    lasso = build_entry("Lasso (L1)",    "linear", p_lasso, X_te_s, y_te,
                        lasso_coefs[-1], lasso_coefs[1]),
    enet  = build_entry("Elastic Net",   "linear", p_enet,  X_te_s, y_te,
                        enet_coefs[-1], enet_coefs[1]),
    rf    = build_entry("Random Forest", "tree",   p_rf,    X_te_s, y_te,
                        importance = as.numeric(importance(m_rf))),
    xgb   = build_entry("XGBoost",       "tree",   p_xgb,   X_te_s, y_te,
                        importance = xgb.importance(model = m_xgb)$Gain),
    nn    = build_entry("Neural Network","nn",     p_nn,    X_te_s, y_te)
  )
)
write_json(payload, "model_cache.json", auto_unbox = TRUE,
           digits = 5, pretty = FALSE)
cat("Cache exported:", file.size("model_cache.json"), "bytes\n")


# ---- 6. Publication-Ready Figures (Print Only) ----
# Each block recreates one figure that appears in the Word report. Run the
# script interactively (or source it inside RStudio) to see them in the Plots
# pane. No ggsave / no file writes — pure print() calls.

# Figure 1 — Cohort funnel
fig1 <- tibble(
  Step  = factor(c("Raw\nencounters","After death/\nhospice removal",
                   "First encounter\nper patient","After missing\nfilter"),
                 levels = c("Raw\nencounters","After death/\nhospice removal",
                            "First encounter\nper patient","After missing\nfilter")),
  N     = c(101766, 99343, 69990, 68061)) %>%
  ggplot(aes(x = Step, y = N)) +
  geom_col(fill = c(NAVY, TEAL, TEAL, GREEN), width = 0.6,
           color = "white", linewidth = 1.5) +
  geom_text(aes(label = scales::comma(N)), vjust = -0.5,
            fontface = "bold", color = NAVY, size = 4) +
  scale_y_continuous(labels = comma, limits = c(0, 115000)) +
  labs(title = "Figure 1.  Cohort Construction — From Raw Data to Analytic Sample",
       x = NULL, y = "Encounters") +
  report_theme
print(fig1)

# Figure 2a — Outcome distribution donut (No / Readmit >30d / Readmit <30d)
# Counts derived from the cleaned cohort
out_counts <- df %>%
  mutate(grp = case_when(readmitted == "<30" ~ "Readmit <30d",
                          readmitted == ">30" ~ "Readmit >30d",
                          TRUE                ~ "No readmit")) %>%
  count(grp) %>%
  mutate(grp = factor(grp, levels = c("No readmit","Readmit >30d","Readmit <30d")),
         pct = n / sum(n) * 100,
         label = sprintf("%s\n%.1f%%", grp, pct))

fig2a <- ggplot(out_counts, aes(x = 2, y = n, fill = grp)) +
  geom_col(width = 1, color = "white", linewidth = 2) +
  coord_polar(theta = "y", start = 0) +
  xlim(0.5, 2.5) +
  scale_fill_manual(values = c("No readmit" = GREEN, "Readmit >30d" = "#D4A52A",
                               "Readmit <30d" = RED), guide = "none") +
  geom_text(aes(label = label), position = position_stack(vjust = 0.5),
            color = "white", fontface = "bold", size = 3.5) +
  annotate("text", x = 0.5, y = 0, label = sprintf("%.2f%%", mean(df$readmit_30)*100),
           size = 8, fontface = "bold", color = RED) +
  annotate("text", x = 0.5, y = 0, label = "30-day rate", vjust = 3,
           size = 3.5, color = GREY) +
  annotate("text", x = 0.5, y = 0, label = sprintf("N = %s", scales::comma(nrow(df))),
           vjust = 5, size = 3, color = GREY) +
  labs(title = "Figure 2a.  Outcome Distribution") +
  theme_void() +
  theme(plot.title = element_text(face = "bold", color = NAVY, size = 13,
                                  hjust = 0, margin = ggplot2::margin(b = 10)))
print(fig2a)

# Figure 2b — Business translation KPIs
# (Rendered as a text-only ggplot since it's a KPI panel, not a data chart)
n30 <- sum(df$readmit_30)
cost_per <- 15400
total_cost <- n30 * cost_per

kpis <- tibble(
  y     = c(4, 3, 2, 1),
  value = c(scales::comma(n30),
            scales::dollar(cost_per),
            sprintf("$%.1fM", total_cost / 1e6),
            "25–40%"),
  label = c("patients readmitted within 30 days",
            "avg cost per readmission",
            "estimated 30-day readmission cost in cohort",
            "estimated preventable share (Jencks et al.)"),
  color = c(RED, ORANGE, NAVY, GREEN))

fig2b <- ggplot(kpis, aes(x = 0, y = y)) +
  geom_text(aes(label = value, color = color),
            hjust = 0, fontface = "bold", size = 7) +
  geom_text(aes(label = label, x = 0.5),
            hjust = 0, color = GREY, size = 3.5) +
  scale_color_identity() +
  coord_cartesian(xlim = c(-0.05, 1.5), ylim = c(0.5, 4.6)) +
  labs(title = "Figure 2b.  Business Translation") +
  theme_void() +
  theme(plot.title = element_text(face = "bold", color = NAVY, size = 13,
                                  hjust = 0, margin = ggplot2::margin(b = 14)))
print(fig2b)

# Figure 3 — Q1 diverging bar (Discharge disposition vs baseline)
baseline <- mean(df$readmit_30) * 100
fig3 <- df %>%
  group_by(disp_grp) %>%
  summarise(rate = mean(readmit_30) * 100, n = n(), .groups = "drop") %>%
  arrange(rate) %>%
  mutate(delta = rate - baseline,
         disp_grp = factor(disp_grp, levels = disp_grp),
         color_grp = case_when(delta < 0 ~ "Lower", delta < 5 ~ "Modest", TRUE ~ "High")) %>%
  ggplot(aes(x = delta, y = disp_grp, fill = color_grp)) +
  geom_col(width = 0.65, color = "white") +
  geom_vline(xintercept = 0, color = NAVY, linewidth = 0.6) +
  geom_text(aes(x = ifelse(delta >= 0, delta + 0.25, 0.25),
                label = sprintf("%.1f%%  (n = %s)", rate, comma(n))),
            hjust = 0, fontface = "bold", size = 3.4) +
  scale_fill_manual(values = c("Lower" = GREEN, "Modest" = ORANGE, "High" = RED),
                    guide = "none") +
  scale_x_continuous(limits = c(-5, 14)) +
  annotate("label", x = 12.5, y = 1, size = 3, color = NAVY,
           label = "chi sq = 757.1 · df = 4\np < .001\nCramer's V = 0.106") +
  labs(title = "Figure 3.  Q1 — Readmission Risk by Discharge Disposition",
       x = "Difference from baseline 30-day readmission rate (pp)",
       y = NULL) +
  report_theme + theme(legend.position = "none")
print(fig3)

# Figure 4a — Q2 interaction plot
cell_means <- sub %>% group_by(adm, age_grp) %>%
  summarise(m = mean(time_in_hospital), .groups = "drop")
fig4a <- ggplot(cell_means, aes(x = adm, y = m, color = age_grp, group = age_grp)) +
  geom_line(linewidth = 1.4) +
  geom_point(aes(shape = age_grp), size = 4) +
  geom_text(aes(label = sprintf("%.2f", m)), vjust = -1.3,
            fontface = "bold", size = 3.2, show.legend = FALSE) +
  scale_color_manual(values = c("<40" = TEAL, "40-60" = ORANGE, "60+" = RED),
                     name = "Age band") +
  scale_shape_manual(values = c("<40" = 16, "40-60" = 15, "60+" = 18),
                     name = "Age band") +
  coord_cartesian(ylim = c(2.8, 5.0)) +
  labs(title = "Figure 4a.  Interaction Plot — LOS by Admission x Age",
       x = "Admission Type", y = "Mean Length of Stay (days)") +
  report_theme
print(fig4a)

# Figure 4b — Effect sizes (F vs eta squared)
fig4b <- tibble(
  Factor = factor(c("Admission Type","Age Group","Interaction"),
                  levels = c("Admission Type","Age Group","Interaction")),
  F_stat = c(53.54, 373.88, 2.85),
  Eta_sq = c(0.0018, 0.0122, 0.0002)) %>%
  ggplot(aes(x = Factor, y = F_stat, fill = Factor)) +
  geom_col(width = 0.55, color = "white", linewidth = 1.5) +
  geom_text(aes(label = sprintf("F = %.1f", F_stat)), vjust = -0.4,
            fontface = "bold", size = 3.6) +
  geom_text(aes(label = sprintf("eta sq = %.4f", Eta_sq)),
            y = log10(2), color = "white", fontface = "bold", size = 3.2) +
  scale_y_log10(limits = c(0.5, 800)) +
  scale_fill_manual(values = c(TEAL, NAVY, GREY), guide = "none") +
  labs(title = "Figure 4b.  Effect Sizes",
       x = NULL, y = "F statistic (log scale)") +
  report_theme
print(fig4b)

# Figure 5a — Held-out AUC across all 7 models
fig5a <- tibble(
  Model = factor(c("Logistic GLM","Ridge (L2)","Lasso (L1)","Elastic Net",
                   "Random Forest","XGBoost","Neural Network"),
                 levels = c("Logistic GLM","Ridge (L2)","Lasso (L1)","Elastic Net",
                            "Random Forest","XGBoost","Neural Network")),
  AUC   = c(score(p_glm,y_te)$auc,  score(p_ridge,y_te)$auc,
            score(p_lasso,y_te)$auc, score(p_enet,y_te)$auc,
            score(p_rf,y_te)$auc,   score(p_xgb,y_te)$auc,
            score(p_nn,y_te)$auc)) %>%
  ggplot(aes(x = Model, y = AUC, fill = Model)) +
  geom_col(width = 0.55, color = "white", linewidth = 1.5) +
  geom_text(aes(label = sprintf("%.3f", AUC)), vjust = -0.4,
            fontface = "bold", size = 4) +
  geom_hline(yintercept = 0.50, linetype = "dashed", color = RED, alpha = 0.7) +
  annotate("text", x = 7.4, y = 0.505, label = "chance", color = RED, size = 3, hjust = 1) +
  scale_fill_manual(values = c(GREY, TEAL, NAVY, "#7c3aed", GREEN, ORANGE, RED),
                    guide = "none") +
  coord_cartesian(ylim = c(0.45, 0.65)) +
  labs(title = "Figure 5a.  Held-out AUC — 7 Models",
       x = NULL, y = "AUC (test set)") +
  report_theme + theme(axis.text.x = element_text(angle = 25, hjust = 1))
print(fig5a)

# Figure 5b — ROC curves (overlay)
roc_df <- bind_rows(
  tibble(fpr = 1 - pROC::roc(y_te, p_glm,   quiet = TRUE)$specificities,
         tpr = pROC::roc(y_te, p_glm,   quiet = TRUE)$sensitivities,
         model = sprintf("GLM (%.3f)",   score(p_glm,   y_te)$auc)),
  tibble(fpr = 1 - pROC::roc(y_te, p_ridge, quiet = TRUE)$specificities,
         tpr = pROC::roc(y_te, p_ridge, quiet = TRUE)$sensitivities,
         model = sprintf("Ridge (%.3f)", score(p_ridge, y_te)$auc)),
  tibble(fpr = 1 - pROC::roc(y_te, p_lasso, quiet = TRUE)$specificities,
         tpr = pROC::roc(y_te, p_lasso, quiet = TRUE)$sensitivities,
         model = sprintf("Lasso (%.3f)", score(p_lasso, y_te)$auc)),
  tibble(fpr = 1 - pROC::roc(y_te, p_rf,    quiet = TRUE)$specificities,
         tpr = pROC::roc(y_te, p_rf,    quiet = TRUE)$sensitivities,
         model = sprintf("RF (%.3f)",    score(p_rf,    y_te)$auc)))
fig5b <- ggplot(roc_df, aes(fpr, tpr, color = model)) +
  geom_abline(slope = 1, intercept = 0, linetype = "dashed", color = GREY) +
  geom_line(linewidth = 1.1) +
  scale_color_manual(values = c(GREY, TEAL, NAVY, GREEN), name = NULL) +
  labs(title = "Figure 5b.  ROC Curves on Held-out Test Set",
       x = "False positive rate (1 - specificity)",
       y = "True positive rate (sensitivity)") +
  report_theme + theme(legend.position = c(0.7, 0.2))
print(fig5b)

# Figure 5c — Lasso feature retention donut (10 of 19 features survive L1)
n_retained <- sum(lasso_coefs[-1] != 0)
n_dropped  <- length(feat_cols) - n_retained
ret_df <- tibble(Status = factor(c("Retained","Dropped"),
                                  levels = c("Retained","Dropped")),
                 Count  = c(n_retained, n_dropped))
fig5c <- ggplot(ret_df, aes(x = 2, y = Count, fill = Status)) +
  geom_col(width = 1, color = "white", linewidth = 2) +
  coord_polar(theta = "y") +
  xlim(0.5, 2.5) +
  scale_fill_manual(values = c("Retained" = NAVY, "Dropped" = "#E0E0E0"),
                    guide = "none") +
  geom_text(aes(label = sprintf("%s\n%d features", Status, Count)),
            position = position_stack(vjust = 0.5),
            color = c("white","#404040"), fontface = "bold", size = 3.6) +
  annotate("text", x = 0.5, y = 0,
           label = sprintf("%.0f%%", n_retained / length(feat_cols) * 100),
           size = 9, fontface = "bold", color = NAVY) +
  annotate("text", x = 0.5, y = 0, label = "feature\nparsimony",
           vjust = 2.6, size = 3.2, color = GREY) +
  labs(title = "Figure 5c.  Lasso Feature Retention") +
  theme_void() +
  theme(plot.title = element_text(face = "bold", color = NAVY, size = 13,
                                  hjust = 0, margin = ggplot2::margin(b = 10)))
print(fig5c)

# Figure 6 — Lasso variable importance
imp_df <- tibble(feature = feat_labels, beta = lasso_coefs[-1]) %>%
  filter(abs(beta) > 1e-6) %>%
  arrange(desc(abs(beta))) %>%
  mutate(feature = factor(feature, levels = rev(feature)),
         direction = if_else(beta > 0, "Increases risk", "Decreases risk"))
fig6 <- ggplot(imp_df, aes(x = beta, y = feature, fill = direction)) +
  geom_col(color = "white", linewidth = 1) +
  geom_vline(xintercept = 0, color = NAVY) +
  geom_text(aes(label = sprintf("%+.3f", beta),
                x = beta + if_else(beta > 0, 0.005, -0.005),
                hjust = if_else(beta > 0, 0, 1)),
            fontface = "bold", size = 3.2) +
  scale_fill_manual(values = c("Increases risk" = RED, "Decreases risk" = GREEN)) +
  labs(title = "Figure 6.  Lasso Variable Importance",
       x = "Standardized log-odds coefficient (beta)",
       y = NULL, fill = NULL) +
  report_theme
print(fig6)

# Figure 7a — Confusion matrix at Youden-optimal threshold (Lasso)
s_lasso <- score(p_lasso, y_te)
cm_df <- tibble(
  Actual    = factor(c("No","No","Yes","Yes"), levels = c("Yes","No")),
  Predicted = factor(c("No","Yes","No","Yes"), levels = c("No","Yes")),
  Count     = c(s_lasso$tn, s_lasso$fp, s_lasso$fn, s_lasso$tp),
  Label     = c("TN","FP","FN","TP")) %>%
  mutate(Pct = Count / sum(Count) * 100,
         Fill = if_else(Label %in% c("TN","TP"), "#A8E0B8", "#F9C9C9"))
fig7a <- ggplot(cm_df, aes(Predicted, Actual, fill = Fill)) +
  geom_tile(color = "white", linewidth = 2) +
  geom_text(aes(label = sprintf("%s\n%s\n(%.1f%%)", Label, comma(Count), Pct)),
            fontface = "bold", size = 4.5) +
  scale_fill_identity() +
  labs(title = sprintf("Figure 7a.  Confusion Matrix at Threshold = %.3f (Lasso)",
                       s_lasso$threshold),
       x = NULL, y = NULL) +
  report_theme +
  theme(panel.grid = element_blank(),
        axis.text  = element_text(face = "bold", size = 11))
print(fig7a)

# Figure 7b — Business read-out per 10,000 discharges
# Scale the confusion-matrix counts from the held-out test set to a 10,000-discharge population
scale_factor       <- 10000 / length(y_te)
flagged_10k        <- round((s_lasso$tp + s_lasso$fp) * scale_factor)
caught_10k         <- round(s_lasso$tp * scale_factor)
missed_10k         <- round(s_lasso$fn * scale_factor)
intervention_cost  <- flagged_10k * 300                 # $300 per intervention
readmission_saved  <- caught_10k * 0.30 * 15400         # 30% prevented, $15,400 each
net_benefit        <- readmission_saved - intervention_cost

biz <- tibble(
  y     = c(6, 5, 4, 3, 2, 1),
  value = c(scales::comma(flagged_10k),
            scales::comma(caught_10k),
            scales::comma(missed_10k),
            sprintf("$%.2fM", intervention_cost / 1e6),
            sprintf("$%.2fM", readmission_saved / 1e6),
            sprintf("%s$%.2fM",
                    if_else(net_benefit >= 0, "+", "-"),
                    abs(net_benefit) / 1e6)),
  label = c("Patients flagged for high-risk pathway",
            "True readmissions caught",
            "True readmissions missed",
            "Cost of high-risk pathway @ $300/pt",
            "Cost of missed readmits @ $15,400",
            "Net benefit per 10,000 discharges"),
  color = c(ORANGE, GREEN, RED, NAVY, RED,
            if_else(net_benefit >= 0, GREEN, RED)))

fig7b <- ggplot(biz, aes(x = 0, y = y)) +
  geom_text(aes(label = value, color = color),
            hjust = 0, fontface = "bold", size = 6.5) +
  geom_text(aes(label = label, x = 0.42),
            hjust = 0, color = GREY, size = 3.4) +
  scale_color_identity() +
  coord_cartesian(xlim = c(-0.05, 1.4), ylim = c(0.5, 6.6)) +
  labs(title = "Figure 7b.  Business Read-out per 10,000 Discharges") +
  theme_void() +
  theme(plot.title = element_text(face = "bold", color = NAVY, size = 13,
                                  hjust = 0, margin = ggplot2::margin(b = 14)))
print(fig7b)

# Figure 8 — Calibration plot (Lasso)
calib <- tibble(prob = p_lasso, actual = y_te) %>%
  mutate(decile = ntile(prob, 10)) %>%
  group_by(decile) %>%
  summarise(pred_mean = mean(prob), obs_rate = mean(actual), .groups = "drop")
fig8 <- ggplot(calib, aes(pred_mean, obs_rate)) +
  geom_abline(slope = 1, intercept = 0, linetype = "dashed", color = GREY) +
  geom_hline(yintercept = mean(y_te), linetype = "dotted", color = RED) +
  geom_line(color = NAVY, linewidth = 1) +
  geom_point(color = NAVY, size = 4) +
  geom_text(aes(label = paste0("D", decile)), vjust = -1, hjust = 0.3,
            size = 3, color = NAVY) +
  coord_cartesian(xlim = c(0, 0.18), ylim = c(0, 0.18)) +
  labs(title = "Figure 8.  Calibration on Held-out Test Set (Lasso)",
       x = "Mean predicted probability per decile",
       y = "Observed readmission rate per decile") +
  report_theme
print(fig8)

# Figure 9 — Risk gradient by decile (Lasso)
fig9 <- calib %>%
  ggplot(aes(x = factor(decile), y = obs_rate * 100, fill = obs_rate)) +
  geom_col(width = 0.7, color = "white", linewidth = 1.2) +
  geom_text(aes(label = sprintf("%.1f%%", obs_rate * 100)),
            vjust = -0.4, fontface = "bold", size = 3.4) +
  geom_hline(yintercept = mean(y_te) * 100, linetype = "dashed", color = NAVY) +
  scale_fill_distiller(palette = "RdYlGn", direction = -1, guide = "none") +
  scale_x_discrete(labels = paste0("D", 1:10)) +
  labs(title = "Figure 9.  Risk Gradient — Observed Readmission Rate by Decile",
       x = "Predicted-risk decile (D1 = lowest, D10 = highest)",
       y = "Observed 30-day rate (%)") +
  report_theme
print(fig9)

# =============================================================================
# END OF SCRIPT
# Knit ALY6015_FinalApp.Rmd next to render the live RPubs app.
# =============================================================================
Built May 2026 · R 4.5.2 · model_cache.json v2 · set.seed(42) · Anush Goel, Mandeep Kaur, Sandeep Kaur