Cost analyses: late-stage prediabetes prevention

Data & assumptions

Data source: Diabetes Prevention Impact Toolkit (CDC; US-based)

Reference: https://nccd.cdc.gov/Toolkit/DiabetesImpact/
This report evaluates late-stage prediabetes prevention savings based on simulated projections from the National Diabetes Prevention Program (NDP) (https://www.cdc.gov/diabetes-prevention/)

Summary of our main assumptions:

Total sample: 100,000 adults in the US with late stage prediabetes (Hb1Ac 6.0-6.4) aka big fish out of whom 2,638 are assumed to participate in the prevention program (NDP)
Age bracket: 45–64 years only
BMI: 40.00% with BMI ≥ 30.0: 40.00% (obese); 60.00% with BMI ≥ 24.0 and < 30.0 (overweight)
Prevention program cost : $417.00 annually (given as default by CDC*)
Productivity loss: days missed per Year from work due to diabetes: 3.3 (given as default by CDC*)

Additional details behind default assumptions: https://nccd.cdc.gov/Toolkit/DiabetesImpact/docs/input-checklist_state.pdf

# Load required package
library(knitr)
library(tidyverse)
library(patchwork)


# Create data frame
inputs <- data.frame(
  Toolkit_Inputs = c(
    "<span style='color:red'><b>Population</b></span>", 
    "Total Adult Population", 
    "Eligible Persons", 
    "Completed Screening", 
    "Participate in Intervention",

    "<span style='color:red'><b>Age Breakdown</b></span>",
    "  % Age 45–64", 
    "  % Age 65–74", 
    "  % Age 75+",

    "<span style='color:red'><b>Sex Breakdown</b></span>", 
    "  % Male", 
    "  % Female",

    "<span style='color:red'><b>Race Breakdown</b></span>", 
    "  % White (Non-Hispanic)", 
    "  % Black (Non-Hispanic)", 
    "  % Hispanic", 
    "  % Asian", 
    "  % Other Race", 

    "<span style='color:red'><b>Body Weight Breakdown</b></span>", 
    "  % Obese (BMI ≥ 30.0)", 
    "  % Overweight (BMI ≥ 24.0 and < 30.0)", 

    "<span style='color:red'><b>Prediabetes Screening Participation</b></span>",
    "Persons with Prediabetes", 
    "No new screenings for prediabetes", 
    "% Eligible previously screened", 
    "% Screened who participate", 
    
   "<span style='color:red'><b>Prevention Cost Assumptions </b></span>",
    "Prevention Program Cost", 
    "Costs in Year of Diagnosis", 
    "Costs per Year After Diagnosis", 
    "Discount Rate", 
    
   "<span style='color:red'><b>Productivity Loss Assumptions </b></span>", 
    "Days Missed per Year", 
    "Daily Earnings (with diabetes)"
  ),
  Values_Used_in_Analysis = c(
    "", 
    "100,000", 
    "16,386", 
    "7,538", 
    "2,638",
    
    "",
    "100.00%", 
    "0.00%", 
    "0.00%", 
    
    "", 
    "53.82%", 
    "46.18%", 
    
    "", 
    "66.75%", 
    "10.36%", 
    "14.99%", 
    "5.10%", 
    "2.80%", 
    
    "", 
    "40.00%", 
    "60.00%", 

    "",
    "6.20%", 
    "Checked", 
    "46.00%", 
    "35.00%", 
    
    "", 
    "$417.00", 
    "$6,425.00", 
    "$3,900.00", 
    "3.00%", 
    
    "", 
    "3.3", 
    "$276.00"
  )
)

# Print nicely formatted markdown table
kable(inputs, format = "markdown", col.names = c("**Toolkit Inputs**", "**Values Used in Analysis**"))

Toolkit Inputs	Values Used in Analysis
Population
Total Adult Population	100,000
Eligible Persons	16,386
Completed Screening	7,538
Participate in Intervention	2,638
Age Breakdown
% Age 45–64	100.00%
% Age 65–74	0.00%
% Age 75+	0.00%
Sex Breakdown
% Male	53.82%
% Female	46.18%
Race Breakdown
% White (Non-Hispanic)	66.75%
% Black (Non-Hispanic)	10.36%
% Hispanic	14.99%
% Asian	5.10%
% Other Race	2.80%
Body Weight Breakdown
% Obese (BMI ≥ 30.0)	40.00%
% Overweight (BMI ≥ 24.0 and < 30.0)	60.00%
Prediabetes Screening Participation
Persons with Prediabetes	6.20%
No new screenings for prediabetes	Checked
% Eligible previously screened	46.00%
% Screened who participate	35.00%
Prevention Cost Assumptions
Prevention Program Cost	$417.00
Costs in Year of Diagnosis	$6,425.00
Costs per Year After Diagnosis	$3,900.00
Discount Rate	3.00%
Productivity Loss Assumptions
Days Missed per Year	3.3
Daily Earnings (with diabetes)	$276.00

🏥 Medical Costs: With & Without Prevention

This section analyzes projected medical costs for 2,638 individuals with late stage prediabetes under two scenarios:

No Prevention: Baseline T2 diabetes-specific medical costs with no preventive measures.
With Prevention: T2 diabetes-specific medical costs with participation in lifestyle preventive program (NDP).

See ‘📝 Notes’ section for diabetes-specific medical cost estimate calculation and reference

📊 Plot 1: Annual Medical Costs

Compares T2 diabetes-specific medical costs across both scenarios (no prevention vs. prevention) over a 10-year projected horizon.
Bar chart visualizes raw (undiscounted) T2 diabetes-specific medical costs per year per participant.
Labels above the bars show the annual cost differences (no prevention - prevention) displayed in light pink.
Plot 1. does not account for how much the prevention intervention costs.
Diffference in cost (no prevention vs. prevention) is calculated per participant.

df_med <- data.frame(
  Year = 1:10,
  No_Intervention = c(6001, 12024, 18047, 24055, 30128, 36168, 42181, 48163, 54104, 60007),
  With_Intervention = c(5864, 11770, 17670, 23588, 29571, 35519, 41438, 47325, 53171, 58977)
)

df_med <- df_med %>%
  mutate(Difference = No_Intervention - With_Intervention)

# 2. Reshape for Plot 1
df_med_long <- df_med %>%
  pivot_longer(cols = c(No_Intervention, With_Intervention),
               names_to = "Type", values_to = "Cost") %>%
  mutate(Type = recode(Type,
                       "No_Intervention" = "No Prevention",
                       "With_Intervention" = "With Prevention"))

df_labels <- df_med %>%
  mutate(
    Label = paste0("+$", format(Difference, big.mark = ",")),
    Y_Position = pmax(No_Intervention, With_Intervention) + 1000,
    X_Position = as.numeric(factor(Year)) + 0.2
  )

# 3. Plot 1: Medical Costs per Year
plot1 <- ggplot(df_med_long, aes(x = factor(Year), y = Cost, fill = Type)) +
  geom_bar(stat = "identity", position = position_dodge(width = 0.9), width = 0.8) +
  geom_label(data = df_labels,
             aes(x = X_Position, y = Y_Position, label = Label),
             fill = "#FDEDEC",
             color = "black",
             show.legend = FALSE,
             size = 3.5,
             label.size = 0.2) +
  scale_fill_manual(
    name = "",
    values = c("No Prevention" = "#D55E00", "With Prevention" = "#0072B2")
  ) +
  labs(
    title = "",
    subtitle = "T2D-specific medical costs over 10y (no prevention vs. prevention)",
    x = "Year",
    y = "Annual Medical Cost ($ USD)"
  ) +
  theme_classic(base_size = 14) +
  theme(plot.margin = margin(10, 20, 10, 20),
        legend.position = "bottom")
plot1

📉 Plot 2: Annual Medical Savings

Vlisualizes annual net medical savings from prevention (derived from plot1) over 10-year projected horizon, accounting for the $ cost of prevention (NDP).
✅ Green bar: shows prevention savings* discounted back to Year 0 using a 3% annual discount rate to reflect time value of money.
⚠️ Yellow bar shows cost of prevention assumed $417/year(with 3% annual discount rate[to discuss if discounting here makes sense]
Labels above bars show difference in discounted prevention savings and prevention costs, with green (prevention savings > cost) and red (prevention cost > savings).
Diffference in savings (prevention savings vs. prevention costs) is calculated per participant.

library(tidyverse)
library(patchwork)
library(tidyverse)
library(ggplot2)
library(knitr)
library(kableExtra)

# 4. Add discounting for Plot 2 and Table
discount_rate <- 0.03
df_med_discounted <- df_med %>%
  mutate(
    Intervention_Cost = 417,
    Discount_Factor = 1 / (1 + discount_rate) ^ Year,
    PV_Savings = Difference * Discount_Factor,
    PV_Intervention_Cost = Intervention_Cost * Discount_Factor,
    Net_Benefit = PV_Savings - PV_Intervention_Cost
  )

df_med_long_discounted <- df_med_discounted %>%
  select(Year, PV_Savings, PV_Intervention_Cost) %>%
  pivot_longer(cols = c(PV_Savings, PV_Intervention_Cost),
               names_to = "Type", values_to = "Amount") %>%
  mutate(Type = recode(Type,
                       "PV_Savings" = "Prevention Savings",
                       "PV_Intervention_Cost" = "Prevention Cost"))

# Labels: include **negative net benefit** explicitly
# Update label data with custom colors
df_labels_discounted <- df_med_discounted %>%
  mutate(
    Label = ifelse(Net_Benefit >= 0,
                   paste0("+$", round(Net_Benefit, 0)),
                   paste0("-$", abs(round(Net_Benefit, 0)))),
    Y_Position = pmax(PV_Savings, PV_Intervention_Cost) + 100,
    X_Position = as.numeric(factor(Year)) + 0.2,
    Fill_Color = ifelse(Net_Benefit < 0, "#FDEDEC", "#D4EFDF"),  # pink for negative
    Label_Color = ifelse(Net_Benefit < 0, "red", "black")
  )

# Plot 2: Avoid double fill scale
plot2 <- ggplot(df_med_long_discounted, aes(x = factor(Year), y = Amount, fill = Type)) +
  geom_bar(stat = "identity", position = position_dodge(width = 0.9), width = 0.8) +

  # Labels with custom background fill and text color
  geom_label(data = df_labels_discounted,
             aes(x = X_Position, y = Y_Position, label = Label),
             fill = df_labels_discounted$Fill_Color,
             color = df_labels_discounted$Label_Color,
             inherit.aes = FALSE,
             show.legend = FALSE,
             size = 3.5,
             label.size = 0.2) +

  scale_fill_manual(
    name = "",
    values = c("Prevention Savings" = "#009E73", "Prevention Cost" = "#E69F00")
  ) +
  labs(
    title = "",
    subtitle = "",
    x = "Year",
    y = "Discounted $ USD"
  ) + theme_classic(base_size = 14) +
  theme(plot.margin = margin(10, 20, 10, 20),
        legend.position = "bottom")
# 6. Combine Plots Side by Side
plot2

📋 Summary Tables

Table 1: displays discounted differences for each year between present prevention savings and intervention costs, with Net benefit: categorized as either 💰 savings or 🧾 loss.
Table 2: displays summary descriptives (mean, median and SD) of prevention net benefit aggregated over 10 Years.

df_table <- df_med_discounted %>%
  mutate(
    `Year` = Year,
    `PV Savings` = paste0("$", round(PV_Savings, 1)),
    `PV Cost` = paste0("$", round(PV_Intervention_Cost, 1)),
    `Net Benefit` = ifelse(Net_Benefit > 0,
                           paste0("+$", round(Net_Benefit, 1), " (savings)"),
                           paste0("-", abs(round(Net_Benefit, 1)), " (loss)"))
  ) %>%
  select(Year, `PV Savings`, `PV Cost`, `Net Benefit`)

kable(df_table, align = "c",
      caption = "Table 1: Medical Cost Savings & Net Benefit by Year (per individual)") %>%
  kable_styling(full_width = FALSE, position = "left", font_size = 14)

Table 1: Medical Cost Savings & Net Benefit by Year (per individual)
Year	PV Savings	PV Cost	Net Benefit
1	$133	$404.9	-271.8 (loss)
2	$239.4	$393.1	-153.6 (loss)
3	$345	$381.6	-36.6 (loss)
4	$414.9	$370.5	+$44.4 (savings)
5	$480.5	$359.7	+$120.8 (savings)
6	$543.5	$349.2	+$194.3 (savings)
7	$604.1	$339.1	+$265.1 (savings)
8	$661.5	$329.2	+$332.3 (savings)
9	$715.1	$319.6	+$395.5 (savings)
10	$766.4	$310.3	+$456.1 (savings)

# Calculate summary statistics
# If df_table exists and Net Benefit column is in string format:
library(dplyr)
library(stringr)

# Extract numeric values from Net Benefit column
net_benefits <- df_table %>%
  mutate(Net_Benefit_Num = as.numeric(str_extract(`Net Benefit`, "-?\\d+\\.?\\d*"))) %>%
  pull(Net_Benefit_Num)

# Compute summary statistics
summary_stats <- tibble(
  Mean_Net_Benefit = mean(net_benefits),
  Median_Net_Benefit = median(net_benefits),
  SD_Net_Benefit = sd(net_benefits)
)

# Format with dollar signs and 2 decimal places
summary_stats_fmt <- summary_stats %>%
  mutate(across(everything(), ~ sprintf("$%.2f", .)))

# Display the table
kable(summary_stats_fmt,
      col.names = c("Mean Net Benefit", "Median Net Benefit", "Standard Deviation"),
      align = c("r", "r", "r"),
      caption = "Table 2: Aggregated Net Benefit Over 10 Years (per individual)") %>%
  kable_styling(full_width = FALSE, position = "left", font_size = 14)

Table 2: Aggregated Net Benefit Over 10 Years (per individual)
Mean Net Benefit	Median Net Benefit	Standard Deviation
$134.65	$157.55	$239.67

📝 Notes

How is medical cost calculated?
- As per the CDC diabetes dashboard, medical cost figures reflect average excess medical expenses for people with T2D (vs. no T2D).
- Shrestha et al. (2016), CDC analysis of MarketScan longitudinal medical claims (2001–2013).
To discuss next if it makes sense to discount prevention program costs.

💼 Productivity Costs: With & Without Prevention

📊 Plot 1: Annual Productivity Costs

Compares annual productivity loss in $ per participant across two scenarios (no T2D prevention vs. prevention) over a 10-year projected horizon.
Bar chart visualizes raw undiscounted costs of productivity loss as the cumulative difference between both scenarios (per participant).
Plot 1 does not account for cost of prevention.
CDC assumes:
- Days of work missed per year due to T2D diabetes = 3.3 days*
- CDC assumption: Daily earnings for persons with T2D diabetes = $276*

*See notes for more details on assumptions.

# 1. Base Productivity Data
df_prod <- data.frame(
  Year = 1:10,
  No_Intervention = c(55, 157, 301, 480, 689, 924, 1181, 1455, 1744, 2045),
  With_Intervention = c(35, 110, 222, 373, 556, 768, 1003, 1259, 1531, 1816)
)

df_prod <- df_prod %>%
  mutate(Difference = No_Intervention - With_Intervention)

# 2. Reshape for Plot 1
df_prod_long <- df_prod %>%
  pivot_longer(cols = c(No_Intervention, With_Intervention),
               names_to = "Type", values_to = "Cost") %>%
  mutate(Type = recode(Type,
                       "No_Intervention" = "No Prevention",
                       "With_Intervention" = "With Prevention"))

df_labels_prod <- df_prod %>%
  mutate(
    Label = paste0("+$", Difference),
    Y_Position = pmax(No_Intervention, With_Intervention) + 100,
    X_Position = as.numeric(factor(Year)) + 0.2
  )

# === Plot 1: Raw Productivity Costs ===
plot1_prod <- ggplot(df_prod_long, aes(x = factor(Year), y = Cost, fill = Type)) +
  geom_bar(stat = "identity", position = position_dodge(width = 0.9), width = 0.8) +
  geom_label(data = df_labels_prod,
             aes(x = X_Position, y = Y_Position, label = Label),
             fill = "#FDEDEC",
             color = "black",
             show.legend = FALSE,
             size = 3.5,
             label.size = 0.2) +
  scale_fill_manual(
    name = "Scenario",
    values = c("No Prevention" = "#D55E00", "With Prevention" = "#0072B2")
  ) +
  labs(
    title = "",
    subtitle = "",
    x = "Year",
    y = "Annual Productivity Cost ($ USD)"
  ) +
  theme_classic(base_size = 14) +
  theme(plot.margin = margin(10, 20, 10, 20),
        legend.position = "bottom")

plot1_prod

📉 Plot 2: Annual Productivity Savings

Vlisualizes annual net productivity savings from prevention over 10-year projected horizon, accounting for the cost of prevention (NDP).
⚠️ Yellow bar shows cost of prevention assumed $417/year(with 3% annual discount rate)
✅ Green bar: shows productivity-specific savings* *discounted back to Year 0 using a 3% annual discount rate** to reflect time value of money.
Labels above the bars show the difference in discounted prevention savings on productivity versus prevention costs, with red (prevention cost > productivity savings).

  # === Plot 2: Discounted Productivity Costs ===
discount_rate <- 0.03
df_prod_discounted <- df_prod %>%
  mutate(
    Intervention_Cost = 417,
    Discount_Factor = 1 / (1 + discount_rate) ^ Year,
    PV_Savings = Difference * Discount_Factor,
    PV_Intervention_Cost = Intervention_Cost * Discount_Factor,
    Net_Benefit = PV_Savings - PV_Intervention_Cost
  )

df_prod_long_discounted <- df_prod_discounted %>%
  select(Year, PV_Savings, PV_Intervention_Cost) %>%
  pivot_longer(cols = c(PV_Savings, PV_Intervention_Cost),
               names_to = "Type", values_to = "Amount") %>%
  mutate(Type = recode(Type,
                       "PV_Savings" = "Prevention Savings",
                       "PV_Intervention_Cost" = "Prevention Cost"))

df_labels_discounted_prod <- df_prod_discounted %>%
  mutate(
    Label = ifelse(Net_Benefit >= 0,
                   paste0("+$", round(Net_Benefit, 0)),
                   paste0("-$", abs(round(Net_Benefit, 0)))),
    Y_Position = pmax(PV_Savings, PV_Intervention_Cost) + 20,
    X_Position = as.numeric(factor(Year)) + 0.2,
    Fill_Color = ifelse(Net_Benefit < 0, "#FDEDEC", "#D4EFDF"),
    Label_Color = ifelse(Net_Benefit < 0, "red", "black")
  )

plot2_prod <- ggplot(df_prod_long_discounted, aes(x = factor(Year), y = Amount, fill = Type)) +
  geom_bar(stat = "identity", position = position_dodge(width = 0.9), width = 0.8) +
  geom_label(data = df_labels_discounted_prod,
             aes(x = X_Position, y = Y_Position, label = Label),
             fill = df_labels_discounted_prod$Fill_Color,
             color = df_labels_discounted_prod$Label_Color,
             inherit.aes = FALSE,
             show.legend = FALSE,
             size = 3.5,
             label.size = 0.2) +
  scale_fill_manual(
    name = "Discounted Components",
    values = c("Prevention Savings" = "#009E73", "Prevention Cost" = "#E69F00")
  ) +
  labs(
    title = "",
    subtitle = "",
    x = "Year",
    y = "Discounted $ USD"
  ) +
  theme_classic(base_size = 14) +
  theme(plot.margin = margin(10, 20, 10, 20),
        legend.position = "bottom")

plot2_prod

📋 Summary Table

Displays the discounted differences for each year between productivity-related savings, and intervention cost, showing that prevention costs > savings on productivity.

# Prepare productivity cost summary table
df_summary_prod <- df_prod_discounted %>%
  mutate(
    PV_Savings = round(PV_Savings),
    PV_Intervention_Cost = round(PV_Intervention_Cost),
    PV_Net_Savings = round(Net_Benefit)
  ) %>%
  select(
    Year,
    `Discounted Savings` = PV_Savings,
    `Discounted Cost` = PV_Intervention_Cost,
    `Net Present Value` = PV_Net_Savings
  ) %>%
  mutate(
    `Discounted Savings` = paste0("$", format(`Discounted Savings`, big.mark = ",")),
    `Discounted Cost` = paste0("$", format(`Discounted Cost`, big.mark = ",")),
    `Net Present Value` = paste0("$", format(`Net Present Value`, big.mark = ","))
  )

# Output Markdown table
knitr::kable(df_summary_prod, align = "c",
             caption = "Discounted Productivity Savings vs. Intervention Cost (per participant)") %>%
  kableExtra::kable_styling(full_width = FALSE, position = "left", font_size = 14)

Discounted Productivity Savings vs. Intervention Cost (per participant)
Year	Discounted Savings	Discounted Cost	Net Present Value
1	$ 19	$405	$-385
2	$ 44	$393	$-349
3	$ 72	$382	$-309
4	$ 95	$370	$-275
5	$115	$360	$-245
6	$131	$349	$-219
7	$145	$339	$-194
8	$155	$329	$-174
9	$163	$320	$-156
10	$170	$310	$-140

npv_values <- df_summary_prod %>%
  mutate(NPV_Num = as.numeric(str_replace_all(`Net Present Value`, "[ $,]", ""))) %>%
  pull(NPV_Num)

# Compute summary statistics
npv_summary <- tibble(
  Mean_NPV = mean(npv_values),
  Median_NPV = median(npv_values),
  SD_NPV = sd(npv_values)
)

# Format with dollar signs
npv_summary_fmt <- npv_summary %>%
  mutate(across(everything(), ~ sprintf("$%.2f", .)))

# Render summary table
kable(npv_summary_fmt,
      col.names = c("Mean Net Present Value", "Median", "SD"),
      align = c("r", "r", "r"),
      caption = "Summary of Net Benefit (Productivity) Over 10 Years (per participant)") %>%
  kable_styling(full_width = FALSE, position = "left", font_size = 14)

Summary of Net Benefit (Productivity) Over 10 Years (per participant)
Mean Net Present Value	Median	SD
$-244.60	$-232.00	$83.45

📝 Notes

Days of work missed due to T2 diabetes are calculated the excess days of work missed by someone with diabetes compared with a similar person without diabetes (e.g., similar age, sex, comorbidities).
Estimate of $276 in daily earnings is a weighted average of the estimated daily earnings reported in the current Population Survey for 45- to 64-year-old males and females. 45 to 64 age group is selected because the mean age of persons with prediabetes is about 52 years old

✨ Cumulative Savings: Medical & Productivity

📊 Plot : Annual Cumulative Prevention Savings

Plot combines both medical and productivity savings relative to combined prevention costs over 10-year projected horizon.
Green labels above bars show savings > costs, pink labels show costs > savings

library(tidyverse)

# Data
df_summary <- data.frame(
  Year = 1:10,
  Sum_Cost_Saving = c(157, 301, 456, 575, 690, 805, 920, 1034, 1147, 1259),
  Program_Cost = rep(417, 10),
  Net_Cost = c(260, 116, -39, -158, -273, -388, -503, -617, -730, -842)
)

# Long format for bars
df_long_summary <- df_summary %>%
  pivot_longer(cols = c(Sum_Cost_Saving, Program_Cost),
               names_to = "Type", values_to = "Amount") %>%
  mutate(Type = recode(Type,
                       "Sum_Cost_Saving" = "Cumulative Savings",
                       "Program_Cost" = "Program Cost"))  # FIXED this line

# Label dataset with custom fill colors
df_labels <- df_summary %>%
  mutate(
    Label = paste0("$", Net_Cost),
    Fill_Color = ifelse(Net_Cost > 0, "#FDEDEC", "#D4EFDF"),  # Light red or light blue
    Y_Position = pmax(Sum_Cost_Saving, Program_Cost) + 40
  )


# Plot
ggplot(df_long_summary, aes(x = factor(Year), y = Amount, fill = Type)) +
  geom_bar(stat = "identity", position = position_dodge(width = 0.8), width = 0.7) +

  # Label layer with background fill from Fill_Color column
  geom_label(data = df_labels,
             aes(x = factor(Year),
                 y = Y_Position,
                 label = Label),
             fill = df_labels$Fill_Color,
             color = "black",
             show.legend = FALSE,
             size = 3.5,
             label.size = 0.2) +

  # Correct bar fill colors
  scale_fill_manual(
    name = "Component",
    values = c("Cumulative Savings" = "#009E73",  # green
               "Program Cost" = "#E69F00")        # orange
  ) +

  labs(
    title = "",
    subtitle = "",
    x = "Year",
    y = "Cumulative Value (USD)"
  ) +
  theme_classic(base_size = 14)

📋 Summary Table

df_table <- df_summary %>%
  mutate(
    `Cumulative Savings` = paste0("$", format(Sum_Cost_Saving, big.mark = ",")),
    `Program Cost` = paste0("$", format(Program_Cost, big.mark = ",")),
    `Net Cost` = ifelse(
      Net_Cost < 0,
      paste0("$-", format(abs(Net_Cost), big.mark = ","), " (savings)"),
      paste0("$", format(Net_Cost, big.mark = ","))
    )
  ) %>%
  select(
    Year,
    `Cumulative Savings`,
    `Program Cost`,
    `Net Cost`
  )

kable(df_table, align = "c",
      caption = "Cumulative Cost Savings vs. Program Cost Over 10 Years") %>%
  kable_styling(full_width = FALSE, position = "left", font_size = 14)

Cumulative Cost Savings vs. Program Cost Over 10 Years
Year	Cumulative Savings	Program Cost	Net Cost
1	$ 157	$417	$ 260
2	$ 301	$417	$ 116
3	$ 456	$417	$- 39 (savings)
4	$ 575	$417	$-158 (savings)
5	$ 690	$417	$-273 (savings)
6	$ 805	$417	$-388 (savings)
7	$ 920	$417	$-503 (savings)
8	$1,034	$417	$-617 (savings)
9	$1,147	$417	$-730 (savings)
10	$1,259	$417	$-842 (savings)

######

# Extract numeric values
cost_summary_values <- df_summary %>%
  summarise(
    Cumulative_Savings = Sum_Cost_Saving,
    Program_Cost = Program_Cost,
    Net_Cost = Net_Cost
  )

# Compute summary statistics
cost_summary_stats <- cost_summary_values %>%
  summarise(
    Mean_Savings = mean(Cumulative_Savings),
    Median_Savings = median(Cumulative_Savings),
    SD_Savings = sd(Cumulative_Savings),
    Mean_Program = mean(Program_Cost),
    Median_Program = median(Program_Cost),
    SD_Program = sd(Program_Cost),
    Mean_Net = mean(Net_Cost),
    Median_Net = median(Net_Cost),
    SD_Net = sd(Net_Cost)
  )

# Reshape into long format for table display
cost_summary_long <- tibble(
  Metric = c("Cumulative Savings"),
  Mean = c(cost_summary_stats$Mean_Savings),
  Median = c(cost_summary_stats$Median_Savings),
  SD = c(cost_summary_stats$SD_Savings)
)

# Format values with dollar signs and commas
cost_summary_fmt <- cost_summary_long %>%
  mutate(across(c(Mean, Median, SD), ~ sprintf("$%s", format(round(.x, 2), big.mark = ","))))

# Render summary table
kable(cost_summary_fmt,
      col.names = c("Metric", "Mean", "Median", "SD"),
      align = c("l", "r", "r", "r"),
      caption = "Summary Statistics for Net Savings (medical and productivity combined) Over 10 Years") %>%
  kable_styling(full_width = FALSE, position = "left", font_size = 14)

Summary Statistics for Net Savings (medical and productivity combined) Over 10 Years
Metric	Mean	Median	SD
Cumulative Savings	$734.4	$747.5	$365.26

⚖️ QALYs gained from prevention

Cumulative Quality Adjusted Life Years (QALYs) Gained for 2,638 Projected Participants.

📊 Plot : Difference in QALYs (prevention vs. no)

library(tidyverse)
library(tidyverse)

# Data
df_qaly <- data.frame(
  Year = 1:10,
  No_Intervention = c(2145.0, 4211.9, 6203.3, 8121.9, 9970.1, 11750.4, 13465.3, 15116.9, 16707.5, 18239.3),
  With_Intervention = c(2147.3, 4217.5, 6213.1, 8135.8, 9988.1, 11772.5, 13491.3, 15146.8, 16741.2, 18276.8),
  QALYs_Gained = c(2.3, 5.6, 9.8, 13.9, 18.0, 22.1, 26.0, 29.9, 33.7, 37.5)
)

# Bar format
df_long_qaly <- df_qaly %>%
  pivot_longer(cols = c(No_Intervention, With_Intervention),
               names_to = "Type", values_to = "QALYs") %>%
  mutate(Type = recode(Type,
                       "No_Intervention" = "No Prevention",
                       "With_Intervention" = "With Prevention"))

# Labels with adjusted X position
df_qaly_labels <- df_qaly %>%
  mutate(
    Label = paste0("+", QALYs_Gained, " QALYs"),
    Y_Position = pmax(No_Intervention, With_Intervention) + 100,
    X_Position = as.numeric(factor(Year)) + 0.2  # slight right shift
  )

# Plot
ggplot(df_long_qaly, aes(x = factor(Year), y = QALYs, fill = Type)) +
  geom_bar(stat = "identity", position = position_dodge(width = 0.9), width = 0.8) +
  
  # Labels with custom x and y
  geom_label(data = df_qaly_labels,
             aes(x = X_Position,
                 y = Y_Position,
                 label = Label),
             fill = "#D1ECF1",
             color = "black",
             show.legend = FALSE,
             size = 3.5,
             label.size = 0.2) +

  scale_fill_manual(
    name = "Scenario",
    values = c("No Prevention" = "#999999",  # grey
               "With Prevention" = "#56B4E9") # blue
  ) +

  labs(
    title = "",
    subtitle = "",
    x = "Year",
    y = "Cumulative QALYs"
  ) +
  theme_classic(base_size = 14) +
  theme(plot.margin = margin(10, 20, 10, 20))  # more spacing left/right

📋 Summary Table

The Cumulative “QALYs gained” column is equal to the difference between combined (medical + productivity) costs in the “QALYs with No Intervention” column and the “QALYs with Intervention” column.

# Data
df_qaly <- data.frame(
  Year = 1:10,
  No_Intervention = c(2145.0, 4211.9, 6203.3, 8121.9, 9970.1, 11750.4, 13465.3, 15116.9, 16707.5, 18239.3),
  With_Intervention = c(2147.3, 4217.5, 6213.1, 8135.8, 9988.1, 11772.5, 13491.3, 15146.8, 16741.2, 18276.8),
  QALYs_Gained = c(2.3, 5.6, 9.8, 13.9, 18.0, 22.1, 26.0, 29.9, 33.7, 37.5)
)

# Table formatting
df_qaly_table <- df_qaly %>%
  mutate(
    `No Intervention` = format(round(No_Intervention, 1), big.mark = ","),
    `With Intervention` = format(round(With_Intervention, 1), big.mark = ","),
    `QALYs Gained` = paste0("+", QALYs_Gained, " (gained)")
  ) %>%
  select(
    Year,
    `No Intervention`,
    `With Intervention`,
    `QALYs Gained`
  )

# Create table
kable(df_qaly_table, align = "c",
      caption = "Cumulative QALYs Over 10 Years: Intervention vs. No Intervention") %>%
  kable_styling(full_width = FALSE, position = "left", font_size = 14)

Cumulative QALYs Over 10 Years: Intervention vs. No Intervention
Year	No Intervention	With Intervention	QALYs Gained
1	2,145.0	2,147.3	+2.3 (gained)
2	4,211.9	4,217.5	+5.6 (gained)
3	6,203.3	6,213.1	+9.8 (gained)
4	8,121.9	8,135.8	+13.9 (gained)
5	9,970.1	9,988.1	+18 (gained)
6	11,750.4	11,772.5	+22.1 (gained)
7	13,465.3	13,491.3	+26 (gained)
8	15,116.9	15,146.8	+29.9 (gained)
9	16,707.5	16,741.2	+33.7 (gained)
10	18,239.3	18,276.8	+37.5 (gained)

library(dplyr)
library(stringr)
library(knitr)
library(kableExtra)

# Extract numeric QALYs gained values from string (e.g., "+2.3 (gained)")
qaly_values <- df_qaly_table %>%
  mutate(QALYs_Num = as.numeric(str_replace_all(`QALYs Gained`, "[^0-9.-]", ""))) %>%
  pull(QALYs_Num)

# Compute summary statistics
qaly_summary <- tibble(
  Mean_QALYs = mean(qaly_values),
  Median_QALYs = median(qaly_values),
  SD_QALYs = sd(qaly_values)
)

# Format to two decimals, with "+" prefix
qaly_summary_fmt <- qaly_summary %>%
  mutate(across(everything(), ~ sprintf("+%.2f", .)))

# Render summary table
kable(qaly_summary_fmt,
      col.names = c("Mean QALYs Gained", "Median", "Standard Deviation"),
      align = c("r", "r", "r"),
      caption = "Summary of QALYs Gained Over 10 Years") %>%
  kable_styling(full_width = FALSE, position = "left", font_size = 14)

Summary of QALYs Gained Over 10 Years
Mean QALYs Gained	Median	Standard Deviation
+19.88	+20.05	+12.01

Preliminary Take-aways

✅ Summary medical cost-savings

Years 1–3: Not cost-effective → discounted savings < cost; Years 4–10: Cost-effective → discounted savings > cost; Breakeven occurs around Year 4
However, important to note over 10 years SD of the net benefit is 2X larger than both the mean, suggesting very high that there is high variability in the net benefit outcomes across time (standard deviation of $239.67, compared to a mean of $134.65)

❌ Summary Productivity cost-savings

Plots suggest T2D prevention intervention not cost-effective on productvity loss (assumed 3.3 per days of work missed per year per participant)

✅ Summary cuumulative cost-savings

On average, plots suggest T2D prevention intervention are cost-effective on cumulative (medical & productivity) costs, with approx. $734.4 (sd=$365.26) savings anually per participant over 10 years.
Years 1–2: Not cost-effective → discounted savings < cost; Years 3–10: Cost-effective → discounted savings > cost; Breakeven occurs around Year 3.

✅ Summary QALYs

On average, prevention led to a gain of approximately +19.88 QALYs over 10 years, with a standard deviation of +12.01.
Prevention consistently showed higher QALYs compared to no intervention across all 10 years.

Cost analyses: late-stage prediabetes prevention

Mia Jovanova

2025-06-26

Data & assumptions

🏥 Medical Costs: With & Without Prevention

📊 Plot 1: Annual Medical Costs

📉 Plot 2: Annual Medical Savings

📋 Summary Tables

📝 Notes

💼 Productivity Costs: With & Without Prevention

📊 Plot 1: Annual Productivity Costs

📉 Plot 2: Annual Productivity Savings

📋 Summary Table

📝 Notes

✨ Cumulative Savings: Medical & Productivity

📊 Plot : Annual Cumulative Prevention Savings

📋 Summary Table

⚖️ QALYs gained from prevention

📊 Plot : Difference in QALYs (prevention vs. no)

📋 Summary Table

Preliminary Take-aways

✅ Summary medical cost-savings

❌ Summary Productivity cost-savings

✅ Summary cuumulative cost-savings

✅ Summary QALYs