Data & assumptions

Data source: Diabetes Prevention Impact Toolkit (CDC; US-based)

Summary of our main assumptions:

  • Total sample: 100,000 adults in the US with late stage prediabetes (Hb1Ac 6.0-6.4) aka big fish out of whom 2,638 are assumed to participate in the prevention program (NDP)
  • Age bracket: 45–64 years only
  • BMI: 40.00% with BMI ≥ 30.0: 40.00% (obese); 60.00% with BMI ≥ 24.0 and < 30.0 (overweight)
  • Prevention program cost : $417.00 annually (given as default by CDC*)
  • Productivity loss: days missed per Year from work due to diabetes: 3.3 (given as default by CDC*)

Additional details behind default assumptions: https://nccd.cdc.gov/Toolkit/DiabetesImpact/docs/input-checklist_state.pdf

# Load required package
library(knitr)
library(tidyverse)
library(patchwork)


# Create data frame
inputs <- data.frame(
  Toolkit_Inputs = c(
    "<span style='color:red'><b>Population</b></span>", 
    "Total Adult Population", 
    "Eligible Persons", 
    "Completed Screening", 
    "Participate in Intervention",

    "<span style='color:red'><b>Age Breakdown</b></span>",
    "  % Age 45–64", 
    "  % Age 65–74", 
    "  % Age 75+",

    "<span style='color:red'><b>Sex Breakdown</b></span>", 
    "  % Male", 
    "  % Female",

    "<span style='color:red'><b>Race Breakdown</b></span>", 
    "  % White (Non-Hispanic)", 
    "  % Black (Non-Hispanic)", 
    "  % Hispanic", 
    "  % Asian", 
    "  % Other Race", 

    "<span style='color:red'><b>Body Weight Breakdown</b></span>", 
    "  % Obese (BMI ≥ 30.0)", 
    "  % Overweight (BMI ≥ 24.0 and < 30.0)", 

    "<span style='color:red'><b>Prediabetes Screening Participation</b></span>",
    "Persons with Prediabetes", 
    "No new screenings for prediabetes", 
    "% Eligible previously screened", 
    "% Screened who participate", 
    
   "<span style='color:red'><b>Prevention Cost Assumptions </b></span>",
    "Prevention Program Cost", 
    "Costs in Year of Diagnosis", 
    "Costs per Year After Diagnosis", 
    "Discount Rate", 
    
   "<span style='color:red'><b>Productivity Loss Assumptions </b></span>", 
    "Days Missed per Year", 
    "Daily Earnings (with diabetes)"
  ),
  Values_Used_in_Analysis = c(
    "", 
    "100,000", 
    "16,386", 
    "7,538", 
    "2,638",
    
    "",
    "100.00%", 
    "0.00%", 
    "0.00%", 
    
    "", 
    "53.82%", 
    "46.18%", 
    
    "", 
    "66.75%", 
    "10.36%", 
    "14.99%", 
    "5.10%", 
    "2.80%", 
    
    "", 
    "40.00%", 
    "60.00%", 

    "",
    "6.20%", 
    "Checked", 
    "46.00%", 
    "35.00%", 
    
    "", 
    "$417.00", 
    "$6,425.00", 
    "$3,900.00", 
    "3.00%", 
    
    "", 
    "3.3", 
    "$276.00"
  )
)

# Print nicely formatted markdown table
kable(inputs, format = "markdown", col.names = c("**Toolkit Inputs**", "**Values Used in Analysis**"))
Toolkit Inputs Values Used in Analysis
Population
Total Adult Population 100,000
Eligible Persons 16,386
Completed Screening 7,538
Participate in Intervention 2,638
Age Breakdown
% Age 45–64 100.00%
% Age 65–74 0.00%
% Age 75+ 0.00%
Sex Breakdown
% Male 53.82%
% Female 46.18%
Race Breakdown
% White (Non-Hispanic) 66.75%
% Black (Non-Hispanic) 10.36%
% Hispanic 14.99%
% Asian 5.10%
% Other Race 2.80%
Body Weight Breakdown
% Obese (BMI ≥ 30.0) 40.00%
% Overweight (BMI ≥ 24.0 and < 30.0) 60.00%
Prediabetes Screening Participation
Persons with Prediabetes 6.20%
No new screenings for prediabetes Checked
% Eligible previously screened 46.00%
% Screened who participate 35.00%
Prevention Cost Assumptions
Prevention Program Cost $417.00
Costs in Year of Diagnosis $6,425.00
Costs per Year After Diagnosis $3,900.00
Discount Rate 3.00%
Productivity Loss Assumptions
Days Missed per Year 3.3
Daily Earnings (with diabetes) $276.00

🏥 Medical Costs: With & Without Prevention

This section analyzes projected medical costs for 2,638 individuals with late stage prediabetes under two scenarios:

  • No Prevention: Baseline T2 diabetes-specific medical costs with no preventive measures.
  • With Prevention: T2 diabetes-specific medical costs with participation in lifestyle preventive program (NDP).

See ‘📝 Notes’ section for diabetes-specific medical cost estimate calculation and reference


📊 Plot 1: Annual Medical Costs

  • Compares T2 diabetes-specific medical costs across both scenarios (no prevention vs. prevention) over a 10-year projected horizon.
  • Bar chart visualizes raw (undiscounted) T2 diabetes-specific medical costs per year per participant.
  • Labels above the bars show the annual cost differences (no prevention - prevention) displayed in light pink.
  • Plot 1. does not account for how much the prevention intervention costs.
  • Diffference in cost (no prevention vs. prevention) is calculated per participant.

df_med <- data.frame(
  Year = 1:10,
  No_Intervention = c(6001, 12024, 18047, 24055, 30128, 36168, 42181, 48163, 54104, 60007),
  With_Intervention = c(5864, 11770, 17670, 23588, 29571, 35519, 41438, 47325, 53171, 58977)
)

df_med <- df_med %>%
  mutate(Difference = No_Intervention - With_Intervention)

# 2. Reshape for Plot 1
df_med_long <- df_med %>%
  pivot_longer(cols = c(No_Intervention, With_Intervention),
               names_to = "Type", values_to = "Cost") %>%
  mutate(Type = recode(Type,
                       "No_Intervention" = "No Prevention",
                       "With_Intervention" = "With Prevention"))

df_labels <- df_med %>%
  mutate(
    Label = paste0("+$", format(Difference, big.mark = ",")),
    Y_Position = pmax(No_Intervention, With_Intervention) + 1000,
    X_Position = as.numeric(factor(Year)) + 0.2
  )

# 3. Plot 1: Medical Costs per Year
plot1 <- ggplot(df_med_long, aes(x = factor(Year), y = Cost, fill = Type)) +
  geom_bar(stat = "identity", position = position_dodge(width = 0.9), width = 0.8) +
  geom_label(data = df_labels,
             aes(x = X_Position, y = Y_Position, label = Label),
             fill = "#FDEDEC",
             color = "black",
             show.legend = FALSE,
             size = 3.5,
             label.size = 0.2) +
  scale_fill_manual(
    name = "",
    values = c("No Prevention" = "#D55E00", "With Prevention" = "#0072B2")
  ) +
  labs(
    title = "",
    subtitle = "T2D-specific medical costs over 10y (no prevention vs. prevention)",
    x = "Year",
    y = "Annual Medical Cost ($ USD)"
  ) +
  theme_classic(base_size = 14) +
  theme(plot.margin = margin(10, 20, 10, 20),
        legend.position = "bottom")
plot1

📉 Plot 2: Annual Medical Savings

  • Vlisualizes annual net medical savings from prevention (derived from plot1) over 10-year projected horizon, accounting for the $ cost of prevention (NDP).

  • Green bar: shows prevention savings* discounted back to Year 0 using a 3% annual discount rate to reflect time value of money.

  • ⚠️ Yellow bar shows cost of prevention assumed $417/year(with 3% annual discount rate[to discuss if discounting here makes sense]

  • Labels above bars show difference in discounted prevention savings and prevention costs, with green (prevention savings > cost) and red (prevention cost > savings).

  • Diffference in savings (prevention savings vs. prevention costs) is calculated per participant.


library(tidyverse)
library(patchwork)
library(tidyverse)
library(ggplot2)
library(knitr)
library(kableExtra)

# 4. Add discounting for Plot 2 and Table
discount_rate <- 0.03
df_med_discounted <- df_med %>%
  mutate(
    Intervention_Cost = 417,
    Discount_Factor = 1 / (1 + discount_rate) ^ Year,
    PV_Savings = Difference * Discount_Factor,
    PV_Intervention_Cost = Intervention_Cost * Discount_Factor,
    Net_Benefit = PV_Savings - PV_Intervention_Cost
  )

df_med_long_discounted <- df_med_discounted %>%
  select(Year, PV_Savings, PV_Intervention_Cost) %>%
  pivot_longer(cols = c(PV_Savings, PV_Intervention_Cost),
               names_to = "Type", values_to = "Amount") %>%
  mutate(Type = recode(Type,
                       "PV_Savings" = "Prevention Savings",
                       "PV_Intervention_Cost" = "Prevention Cost"))

# Labels: include **negative net benefit** explicitly
# Update label data with custom colors
df_labels_discounted <- df_med_discounted %>%
  mutate(
    Label = ifelse(Net_Benefit >= 0,
                   paste0("+$", round(Net_Benefit, 0)),
                   paste0("-$", abs(round(Net_Benefit, 0)))),
    Y_Position = pmax(PV_Savings, PV_Intervention_Cost) + 100,
    X_Position = as.numeric(factor(Year)) + 0.2,
    Fill_Color = ifelse(Net_Benefit < 0, "#FDEDEC", "#D4EFDF"),  # pink for negative
    Label_Color = ifelse(Net_Benefit < 0, "red", "black")
  )

# Plot 2: Avoid double fill scale
plot2 <- ggplot(df_med_long_discounted, aes(x = factor(Year), y = Amount, fill = Type)) +
  geom_bar(stat = "identity", position = position_dodge(width = 0.9), width = 0.8) +

  # Labels with custom background fill and text color
  geom_label(data = df_labels_discounted,
             aes(x = X_Position, y = Y_Position, label = Label),
             fill = df_labels_discounted$Fill_Color,
             color = df_labels_discounted$Label_Color,
             inherit.aes = FALSE,
             show.legend = FALSE,
             size = 3.5,
             label.size = 0.2) +

  scale_fill_manual(
    name = "",
    values = c("Prevention Savings" = "#009E73", "Prevention Cost" = "#E69F00")
  ) +
  labs(
    title = "",
    subtitle = "",
    x = "Year",
    y = "Discounted $ USD"
  ) + theme_classic(base_size = 14) +
  theme(plot.margin = margin(10, 20, 10, 20),
        legend.position = "bottom")
# 6. Combine Plots Side by Side
plot2

📋 Summary Tables

  • Table 1: displays discounted differences for each year between present prevention savings and intervention costs, with Net benefit: categorized as either 💰 savings or 🧾 loss.
  • Table 2: displays summary descriptives (mean, median and SD) of prevention net benefit aggregated over 10 Years.

df_table <- df_med_discounted %>%
  mutate(
    `Year` = Year,
    `PV Savings` = paste0("$", round(PV_Savings, 1)),
    `PV Cost` = paste0("$", round(PV_Intervention_Cost, 1)),
    `Net Benefit` = ifelse(Net_Benefit > 0,
                           paste0("+$", round(Net_Benefit, 1), " (savings)"),
                           paste0("-", abs(round(Net_Benefit, 1)), " (loss)"))
  ) %>%
  select(Year, `PV Savings`, `PV Cost`, `Net Benefit`)

kable(df_table, align = "c",
      caption = "Table 1: Medical Cost Savings & Net Benefit by Year (per individual)") %>%
  kable_styling(full_width = FALSE, position = "left", font_size = 14)
Table 1: Medical Cost Savings & Net Benefit by Year (per individual)
Year PV Savings PV Cost Net Benefit
1 $133 $404.9 -271.8 (loss)
2 $239.4 $393.1 -153.6 (loss)
3 $345 $381.6 -36.6 (loss)
4 $414.9 $370.5 +$44.4 (savings)
5 $480.5 $359.7 +$120.8 (savings)
6 $543.5 $349.2 +$194.3 (savings)
7 $604.1 $339.1 +$265.1 (savings)
8 $661.5 $329.2 +$332.3 (savings)
9 $715.1 $319.6 +$395.5 (savings)
10 $766.4 $310.3 +$456.1 (savings)
# Calculate summary statistics
# If df_table exists and Net Benefit column is in string format:
library(dplyr)
library(stringr)

# Extract numeric values from Net Benefit column
net_benefits <- df_table %>%
  mutate(Net_Benefit_Num = as.numeric(str_extract(`Net Benefit`, "-?\\d+\\.?\\d*"))) %>%
  pull(Net_Benefit_Num)

# Compute summary statistics
summary_stats <- tibble(
  Mean_Net_Benefit = mean(net_benefits),
  Median_Net_Benefit = median(net_benefits),
  SD_Net_Benefit = sd(net_benefits)
)

# Format with dollar signs and 2 decimal places
summary_stats_fmt <- summary_stats %>%
  mutate(across(everything(), ~ sprintf("$%.2f", .)))

# Display the table
kable(summary_stats_fmt,
      col.names = c("Mean Net Benefit", "Median Net Benefit", "Standard Deviation"),
      align = c("r", "r", "r"),
      caption = "Table 2: Aggregated Net Benefit Over 10 Years (per individual)") %>%
  kable_styling(full_width = FALSE, position = "left", font_size = 14)
Table 2: Aggregated Net Benefit Over 10 Years (per individual)
Mean Net Benefit Median Net Benefit Standard Deviation
$134.65 $157.55 $239.67

📝 Notes

  • How is medical cost calculated?

    • As per the CDC diabetes dashboard, medical cost figures reflect average excess medical expenses for people with T2D (vs. no T2D).
    • Shrestha et al. (2016), CDC analysis of MarketScan longitudinal medical claims (2001–2013).


  • To discuss next if it makes sense to discount prevention program costs.

💼 Productivity Costs: With & Without Prevention

📊 Plot 1: Annual Productivity Costs

  • Compares annual productivity loss in $ per participant across two scenarios (no T2D prevention vs. prevention) over a 10-year projected horizon.

  • Bar chart visualizes raw undiscounted costs of productivity loss as the cumulative difference between both scenarios (per participant).

  • Plot 1 does not account for cost of prevention.

  • CDC assumes:

    • Days of work missed per year due to T2D diabetes = 3.3 days*
    • CDC assumption: Daily earnings for persons with T2D diabetes = $276*


  • *See notes for more details on assumptions.

# 1. Base Productivity Data
df_prod <- data.frame(
  Year = 1:10,
  No_Intervention = c(55, 157, 301, 480, 689, 924, 1181, 1455, 1744, 2045),
  With_Intervention = c(35, 110, 222, 373, 556, 768, 1003, 1259, 1531, 1816)
)

df_prod <- df_prod %>%
  mutate(Difference = No_Intervention - With_Intervention)

# 2. Reshape for Plot 1
df_prod_long <- df_prod %>%
  pivot_longer(cols = c(No_Intervention, With_Intervention),
               names_to = "Type", values_to = "Cost") %>%
  mutate(Type = recode(Type,
                       "No_Intervention" = "No Prevention",
                       "With_Intervention" = "With Prevention"))

df_labels_prod <- df_prod %>%
  mutate(
    Label = paste0("+$", Difference),
    Y_Position = pmax(No_Intervention, With_Intervention) + 100,
    X_Position = as.numeric(factor(Year)) + 0.2
  )

# === Plot 1: Raw Productivity Costs ===
plot1_prod <- ggplot(df_prod_long, aes(x = factor(Year), y = Cost, fill = Type)) +
  geom_bar(stat = "identity", position = position_dodge(width = 0.9), width = 0.8) +
  geom_label(data = df_labels_prod,
             aes(x = X_Position, y = Y_Position, label = Label),
             fill = "#FDEDEC",
             color = "black",
             show.legend = FALSE,
             size = 3.5,
             label.size = 0.2) +
  scale_fill_manual(
    name = "Scenario",
    values = c("No Prevention" = "#D55E00", "With Prevention" = "#0072B2")
  ) +
  labs(
    title = "",
    subtitle = "",
    x = "Year",
    y = "Annual Productivity Cost ($ USD)"
  ) +
  theme_classic(base_size = 14) +
  theme(plot.margin = margin(10, 20, 10, 20),
        legend.position = "bottom")

plot1_prod

📉 Plot 2: Annual Productivity Savings

  • Vlisualizes annual net productivity savings from prevention over 10-year projected horizon, accounting for the cost of prevention (NDP).

  • ⚠️ Yellow bar shows cost of prevention assumed $417/year(with 3% annual discount rate)

  • Green bar: shows productivity-specific savings* *discounted back to Year 0 using a 3% annual discount rate** to reflect time value of money.

  • Labels above the bars show the difference in discounted prevention savings on productivity versus prevention costs, with red (prevention cost > productivity savings).


  # === Plot 2: Discounted Productivity Costs ===
discount_rate <- 0.03
df_prod_discounted <- df_prod %>%
  mutate(
    Intervention_Cost = 417,
    Discount_Factor = 1 / (1 + discount_rate) ^ Year,
    PV_Savings = Difference * Discount_Factor,
    PV_Intervention_Cost = Intervention_Cost * Discount_Factor,
    Net_Benefit = PV_Savings - PV_Intervention_Cost
  )

df_prod_long_discounted <- df_prod_discounted %>%
  select(Year, PV_Savings, PV_Intervention_Cost) %>%
  pivot_longer(cols = c(PV_Savings, PV_Intervention_Cost),
               names_to = "Type", values_to = "Amount") %>%
  mutate(Type = recode(Type,
                       "PV_Savings" = "Prevention Savings",
                       "PV_Intervention_Cost" = "Prevention Cost"))

df_labels_discounted_prod <- df_prod_discounted %>%
  mutate(
    Label = ifelse(Net_Benefit >= 0,
                   paste0("+$", round(Net_Benefit, 0)),
                   paste0("-$", abs(round(Net_Benefit, 0)))),
    Y_Position = pmax(PV_Savings, PV_Intervention_Cost) + 20,
    X_Position = as.numeric(factor(Year)) + 0.2,
    Fill_Color = ifelse(Net_Benefit < 0, "#FDEDEC", "#D4EFDF"),
    Label_Color = ifelse(Net_Benefit < 0, "red", "black")
  )

plot2_prod <- ggplot(df_prod_long_discounted, aes(x = factor(Year), y = Amount, fill = Type)) +
  geom_bar(stat = "identity", position = position_dodge(width = 0.9), width = 0.8) +
  geom_label(data = df_labels_discounted_prod,
             aes(x = X_Position, y = Y_Position, label = Label),
             fill = df_labels_discounted_prod$Fill_Color,
             color = df_labels_discounted_prod$Label_Color,
             inherit.aes = FALSE,
             show.legend = FALSE,
             size = 3.5,
             label.size = 0.2) +
  scale_fill_manual(
    name = "Discounted Components",
    values = c("Prevention Savings" = "#009E73", "Prevention Cost" = "#E69F00")
  ) +
  labs(
    title = "",
    subtitle = "",
    x = "Year",
    y = "Discounted $ USD"
  ) +
  theme_classic(base_size = 14) +
  theme(plot.margin = margin(10, 20, 10, 20),
        legend.position = "bottom")

plot2_prod

📋 Summary Table

  • Displays the discounted differences for each year between productivity-related savings, and intervention cost, showing that prevention costs > savings on productivity.

# Prepare productivity cost summary table
df_summary_prod <- df_prod_discounted %>%
  mutate(
    PV_Savings = round(PV_Savings),
    PV_Intervention_Cost = round(PV_Intervention_Cost),
    PV_Net_Savings = round(Net_Benefit)
  ) %>%
  select(
    Year,
    `Discounted Savings` = PV_Savings,
    `Discounted Cost` = PV_Intervention_Cost,
    `Net Present Value` = PV_Net_Savings
  ) %>%
  mutate(
    `Discounted Savings` = paste0("$", format(`Discounted Savings`, big.mark = ",")),
    `Discounted Cost` = paste0("$", format(`Discounted Cost`, big.mark = ",")),
    `Net Present Value` = paste0("$", format(`Net Present Value`, big.mark = ","))
  )

# Output Markdown table
knitr::kable(df_summary_prod, align = "c",
             caption = "Discounted Productivity Savings vs. Intervention Cost (per participant)") %>%
  kableExtra::kable_styling(full_width = FALSE, position = "left", font_size = 14)
Discounted Productivity Savings vs. Intervention Cost (per participant)
Year Discounted Savings Discounted Cost Net Present Value
1 $ 19 $405 $-385
2 $ 44 $393 $-349
3 $ 72 $382 $-309
4 $ 95 $370 $-275
5 $115 $360 $-245
6 $131 $349 $-219
7 $145 $339 $-194
8 $155 $329 $-174
9 $163 $320 $-156
10 $170 $310 $-140
npv_values <- df_summary_prod %>%
  mutate(NPV_Num = as.numeric(str_replace_all(`Net Present Value`, "[ $,]", ""))) %>%
  pull(NPV_Num)

# Compute summary statistics
npv_summary <- tibble(
  Mean_NPV = mean(npv_values),
  Median_NPV = median(npv_values),
  SD_NPV = sd(npv_values)
)

# Format with dollar signs
npv_summary_fmt <- npv_summary %>%
  mutate(across(everything(), ~ sprintf("$%.2f", .)))

# Render summary table
kable(npv_summary_fmt,
      col.names = c("Mean Net Present Value", "Median", "SD"),
      align = c("r", "r", "r"),
      caption = "Summary of Net Benefit (Productivity) Over 10 Years (per participant)") %>%
  kable_styling(full_width = FALSE, position = "left", font_size = 14)
Summary of Net Benefit (Productivity) Over 10 Years (per participant)
Mean Net Present Value Median SD
$-244.60 $-232.00 $83.45

📝 Notes

  • Days of work missed due to T2 diabetes are calculated the excess days of work missed by someone with diabetes compared with a similar person without diabetes (e.g., similar age, sex, comorbidities).

  • Estimate of $276 in daily earnings is a weighted average of the estimated daily earnings reported in the current Population Survey for 45- to 64-year-old males and females. 45 to 64 age group is selected because the mean age of persons with prediabetes is about 52 years old

✨ Cumulative Savings: Medical & Productivity

📊 Plot : Annual Cumulative Prevention Savings

  • Plot combines both medical and productivity savings relative to combined prevention costs over 10-year projected horizon.
  • Green labels above bars show savings > costs, pink labels show costs > savings
library(tidyverse)

# Data
df_summary <- data.frame(
  Year = 1:10,
  Sum_Cost_Saving = c(157, 301, 456, 575, 690, 805, 920, 1034, 1147, 1259),
  Program_Cost = rep(417, 10),
  Net_Cost = c(260, 116, -39, -158, -273, -388, -503, -617, -730, -842)
)

# Long format for bars
df_long_summary <- df_summary %>%
  pivot_longer(cols = c(Sum_Cost_Saving, Program_Cost),
               names_to = "Type", values_to = "Amount") %>%
  mutate(Type = recode(Type,
                       "Sum_Cost_Saving" = "Cumulative Savings",
                       "Program_Cost" = "Program Cost"))  # FIXED this line

# Label dataset with custom fill colors
df_labels <- df_summary %>%
  mutate(
    Label = paste0("$", Net_Cost),
    Fill_Color = ifelse(Net_Cost > 0, "#FDEDEC", "#D4EFDF"),  # Light red or light blue
    Y_Position = pmax(Sum_Cost_Saving, Program_Cost) + 40
  )


# Plot
ggplot(df_long_summary, aes(x = factor(Year), y = Amount, fill = Type)) +
  geom_bar(stat = "identity", position = position_dodge(width = 0.8), width = 0.7) +

  # Label layer with background fill from Fill_Color column
  geom_label(data = df_labels,
             aes(x = factor(Year),
                 y = Y_Position,
                 label = Label),
             fill = df_labels$Fill_Color,
             color = "black",
             show.legend = FALSE,
             size = 3.5,
             label.size = 0.2) +

  # Correct bar fill colors
  scale_fill_manual(
    name = "Component",
    values = c("Cumulative Savings" = "#009E73",  # green
               "Program Cost" = "#E69F00")        # orange
  ) +

  labs(
    title = "",
    subtitle = "",
    x = "Year",
    y = "Cumulative Value (USD)"
  ) +
  theme_classic(base_size = 14)

📋 Summary Table

df_table <- df_summary %>%
  mutate(
    `Cumulative Savings` = paste0("$", format(Sum_Cost_Saving, big.mark = ",")),
    `Program Cost` = paste0("$", format(Program_Cost, big.mark = ",")),
    `Net Cost` = ifelse(
      Net_Cost < 0,
      paste0("$-", format(abs(Net_Cost), big.mark = ","), " (savings)"),
      paste0("$", format(Net_Cost, big.mark = ","))
    )
  ) %>%
  select(
    Year,
    `Cumulative Savings`,
    `Program Cost`,
    `Net Cost`
  )

kable(df_table, align = "c",
      caption = "Cumulative Cost Savings vs. Program Cost Over 10 Years") %>%
  kable_styling(full_width = FALSE, position = "left", font_size = 14)
Cumulative Cost Savings vs. Program Cost Over 10 Years
Year Cumulative Savings Program Cost Net Cost
1 $ 157 $417 $ 260
2 $ 301 $417 $ 116
3 $ 456 $417 $- 39 (savings)
4 $ 575 $417 $-158 (savings)
5 $ 690 $417 $-273 (savings)
6 $ 805 $417 $-388 (savings)
7 $ 920 $417 $-503 (savings)
8 $1,034 $417 $-617 (savings)
9 $1,147 $417 $-730 (savings)
10 $1,259 $417 $-842 (savings)
######

# Extract numeric values
cost_summary_values <- df_summary %>%
  summarise(
    Cumulative_Savings = Sum_Cost_Saving,
    Program_Cost = Program_Cost,
    Net_Cost = Net_Cost
  )

# Compute summary statistics
cost_summary_stats <- cost_summary_values %>%
  summarise(
    Mean_Savings = mean(Cumulative_Savings),
    Median_Savings = median(Cumulative_Savings),
    SD_Savings = sd(Cumulative_Savings),
    Mean_Program = mean(Program_Cost),
    Median_Program = median(Program_Cost),
    SD_Program = sd(Program_Cost),
    Mean_Net = mean(Net_Cost),
    Median_Net = median(Net_Cost),
    SD_Net = sd(Net_Cost)
  )

# Reshape into long format for table display
cost_summary_long <- tibble(
  Metric = c("Cumulative Savings"),
  Mean = c(cost_summary_stats$Mean_Savings),
  Median = c(cost_summary_stats$Median_Savings),
  SD = c(cost_summary_stats$SD_Savings)
)

# Format values with dollar signs and commas
cost_summary_fmt <- cost_summary_long %>%
  mutate(across(c(Mean, Median, SD), ~ sprintf("$%s", format(round(.x, 2), big.mark = ","))))

# Render summary table
kable(cost_summary_fmt,
      col.names = c("Metric", "Mean", "Median", "SD"),
      align = c("l", "r", "r", "r"),
      caption = "Summary Statistics for Net Savings (medical and productivity combined) Over 10 Years") %>%
  kable_styling(full_width = FALSE, position = "left", font_size = 14)
Summary Statistics for Net Savings (medical and productivity combined) Over 10 Years
Metric Mean Median SD
Cumulative Savings $734.4 $747.5 $365.26

⚖️ QALYs gained from prevention

  • Cumulative Quality Adjusted Life Years (QALYs) Gained for 2,638 Projected Participants.

📊 Plot : Difference in QALYs (prevention vs. no)

library(tidyverse)
library(tidyverse)

# Data
df_qaly <- data.frame(
  Year = 1:10,
  No_Intervention = c(2145.0, 4211.9, 6203.3, 8121.9, 9970.1, 11750.4, 13465.3, 15116.9, 16707.5, 18239.3),
  With_Intervention = c(2147.3, 4217.5, 6213.1, 8135.8, 9988.1, 11772.5, 13491.3, 15146.8, 16741.2, 18276.8),
  QALYs_Gained = c(2.3, 5.6, 9.8, 13.9, 18.0, 22.1, 26.0, 29.9, 33.7, 37.5)
)

# Bar format
df_long_qaly <- df_qaly %>%
  pivot_longer(cols = c(No_Intervention, With_Intervention),
               names_to = "Type", values_to = "QALYs") %>%
  mutate(Type = recode(Type,
                       "No_Intervention" = "No Prevention",
                       "With_Intervention" = "With Prevention"))

# Labels with adjusted X position
df_qaly_labels <- df_qaly %>%
  mutate(
    Label = paste0("+", QALYs_Gained, " QALYs"),
    Y_Position = pmax(No_Intervention, With_Intervention) + 100,
    X_Position = as.numeric(factor(Year)) + 0.2  # slight right shift
  )

# Plot
ggplot(df_long_qaly, aes(x = factor(Year), y = QALYs, fill = Type)) +
  geom_bar(stat = "identity", position = position_dodge(width = 0.9), width = 0.8) +
  
  # Labels with custom x and y
  geom_label(data = df_qaly_labels,
             aes(x = X_Position,
                 y = Y_Position,
                 label = Label),
             fill = "#D1ECF1",
             color = "black",
             show.legend = FALSE,
             size = 3.5,
             label.size = 0.2) +

  scale_fill_manual(
    name = "Scenario",
    values = c("No Prevention" = "#999999",  # grey
               "With Prevention" = "#56B4E9") # blue
  ) +

  labs(
    title = "",
    subtitle = "",
    x = "Year",
    y = "Cumulative QALYs"
  ) +
  theme_classic(base_size = 14) +
  theme(plot.margin = margin(10, 20, 10, 20))  # more spacing left/right

📋 Summary Table

  • The Cumulative “QALYs gained” column is equal to the difference between combined (medical + productivity) costs in the “QALYs with No Intervention” column and the “QALYs with Intervention” column.
# Data
df_qaly <- data.frame(
  Year = 1:10,
  No_Intervention = c(2145.0, 4211.9, 6203.3, 8121.9, 9970.1, 11750.4, 13465.3, 15116.9, 16707.5, 18239.3),
  With_Intervention = c(2147.3, 4217.5, 6213.1, 8135.8, 9988.1, 11772.5, 13491.3, 15146.8, 16741.2, 18276.8),
  QALYs_Gained = c(2.3, 5.6, 9.8, 13.9, 18.0, 22.1, 26.0, 29.9, 33.7, 37.5)
)

# Table formatting
df_qaly_table <- df_qaly %>%
  mutate(
    `No Intervention` = format(round(No_Intervention, 1), big.mark = ","),
    `With Intervention` = format(round(With_Intervention, 1), big.mark = ","),
    `QALYs Gained` = paste0("+", QALYs_Gained, " (gained)")
  ) %>%
  select(
    Year,
    `No Intervention`,
    `With Intervention`,
    `QALYs Gained`
  )

# Create table
kable(df_qaly_table, align = "c",
      caption = "Cumulative QALYs Over 10 Years: Intervention vs. No Intervention") %>%
  kable_styling(full_width = FALSE, position = "left", font_size = 14)
Cumulative QALYs Over 10 Years: Intervention vs. No Intervention
Year No Intervention With Intervention QALYs Gained
1 2,145.0 2,147.3 +2.3 (gained)
2 4,211.9 4,217.5 +5.6 (gained)
3 6,203.3 6,213.1 +9.8 (gained)
4 8,121.9 8,135.8 +13.9 (gained)
5 9,970.1 9,988.1 +18 (gained)
6 11,750.4 11,772.5 +22.1 (gained)
7 13,465.3 13,491.3 +26 (gained)
8 15,116.9 15,146.8 +29.9 (gained)
9 16,707.5 16,741.2 +33.7 (gained)
10 18,239.3 18,276.8 +37.5 (gained)
library(dplyr)
library(stringr)
library(knitr)
library(kableExtra)

# Extract numeric QALYs gained values from string (e.g., "+2.3 (gained)")
qaly_values <- df_qaly_table %>%
  mutate(QALYs_Num = as.numeric(str_replace_all(`QALYs Gained`, "[^0-9.-]", ""))) %>%
  pull(QALYs_Num)

# Compute summary statistics
qaly_summary <- tibble(
  Mean_QALYs = mean(qaly_values),
  Median_QALYs = median(qaly_values),
  SD_QALYs = sd(qaly_values)
)

# Format to two decimals, with "+" prefix
qaly_summary_fmt <- qaly_summary %>%
  mutate(across(everything(), ~ sprintf("+%.2f", .)))

# Render summary table
kable(qaly_summary_fmt,
      col.names = c("Mean QALYs Gained", "Median", "Standard Deviation"),
      align = c("r", "r", "r"),
      caption = "Summary of QALYs Gained Over 10 Years") %>%
  kable_styling(full_width = FALSE, position = "left", font_size = 14)
Summary of QALYs Gained Over 10 Years
Mean QALYs Gained Median Standard Deviation
+19.88 +20.05 +12.01

Preliminary Take-aways

✅ Summary medical cost-savings

  • Years 1–3: Not cost-effective → discounted savings < cost; Years 4–10: Cost-effective → discounted savings > cost; Breakeven occurs around Year 4

  • However, important to note over 10 years SD of the net benefit is 2X larger than both the mean, suggesting very high that there is high variability in the net benefit outcomes across time (standard deviation of $239.67, compared to a mean of $134.65)

❌ Summary Productivity cost-savings

  • Plots suggest T2D prevention intervention not cost-effective on productvity loss (assumed 3.3 per days of work missed per year per participant)

✅ Summary cuumulative cost-savings

  • On average, plots suggest T2D prevention intervention are cost-effective on cumulative (medical & productivity) costs, with approx. $734.4 (sd=$365.26) savings anually per participant over 10 years.
  • Years 1–2: Not cost-effective → discounted savings < cost; Years 3–10: Cost-effective → discounted savings > cost; Breakeven occurs around Year 3.

✅ Summary QALYs

  • On average, prevention led to a gain of approximately +19.88 QALYs over 10 years, with a standard deviation of +12.01.
  • Prevention consistently showed higher QALYs compared to no intervention across all 10 years.