TGF ROI Workflow Dependency Map

This document maps the local ROI workflow from raw inputs to generated outputs. It is intended as a practical runbook for rerunning the analysis after changing assumptions or after receiving a new set of externally supplied dump_ic model outputs.

Main Workflow Diagram

flowchart TD
  Config["config.yml<br/>paths, output folders, combined_filter"]
  ProjectPaths["R/project_paths.R<br/>path helpers, assumption readers, output writers"]
  Orchestrator["scripts/make_combined_kable.R<br/>main entry point"]

  Assumptions["data/raw/helpers/model_assumptions.csv<br/>modifiable model assumptions"]
  GrantCycles["data/raw/helpers/grant_cycle_periods.csv<br/>grant-cycle year mapping"]
  PoD["data/raw/helpers/PoD2020.xlsx<br/>life table / mortality inputs"]
  WEO["data/raw/helpers/WEOOct24.xlsx<br/>WEO macroeconomic inputs"]
  LEB["data/raw/helpers/LEB22_WDI.xlsx<br/>life expectancy at birth"]
  Alloc["data/raw/helpers/allocations_17bn.xlsx<br/>TGF allocation helper"]
  TierRegion["data/raw/helpers/tier_and_region_master.xlsx<br/>region, income, SSA filters"]

  HIVDump["data/raw/model_dumps/dump_ic_hiv_corrected_18dec.csv"]
  TBDump["data/raw/model_dumps/dump_ic_tb.csv"]
  MalDump["data/raw/model_dumps/dump_ic_malaria.csv"]

  WEOscript["R/WEO_for_ROI.R<br/>builds in-memory WEOOct24_wide with VSLY_USD"]
  HIV["R/ROI_HIV.R"]
  TB["R/ROI_TB.R"]
  MAL["R/ROI_MAL.R"]

  DiseaseOutputs["outputs/csv + outputs/xlsx<br/>DISEASE_ROI_Results_TIMESTAMP"]
  DataProblems["outputs/data_problems<br/>DISEASE_data_problems_TIMESTAMP<br/>roi_data_problems_latest"]
  Combined["outputs/csv/HTM_FILTER_combined_table.csv<br/>outputs/html/combined_roi_table_FILTER.html"]

  Config --> ProjectPaths
  ProjectPaths --> Orchestrator
  Orchestrator --> HIV
  Orchestrator --> TB
  Orchestrator --> MAL

  Assumptions --> ProjectPaths
  GrantCycles --> ProjectPaths
  Assumptions --> WEOscript
  WEO --> WEOscript
  LEB --> WEOscript

  HIVDump --> HIV
  TBDump --> TB
  MalDump --> MAL
  PoD --> HIV
  PoD --> TB
  PoD --> MAL
  Alloc --> HIV
  Alloc --> TB
  Alloc --> MAL
  TierRegion --> HIV
  TierRegion --> TB
  TierRegion --> MAL
  GrantCycles --> HIV
  GrantCycles --> TB
  GrantCycles --> MAL
  Assumptions --> HIV
  Assumptions --> TB
  Assumptions --> MAL

  WEOscript --> HIV
  WEOscript --> TB
  WEOscript --> MAL

  HIV --> DiseaseOutputs
  TB --> DiseaseOutputs
  MAL --> DiseaseOutputs
  HIV --> DataProblems
  TB --> DataProblems
  MAL --> DataProblems
  Orchestrator --> DataProblems
  Orchestrator --> Combined

Inputs

| Input | Config key | Used by | Role |
| --- | --- | --- | --- |
| data/raw/model_dumps/dump_ic_hiv_corrected_18dec.csv | dump_ic_hiv | R/ROI_HIV.R | HIV model outputs by indicator, country, year, and scenario |
| data/raw/model_dumps/dump_ic_tb.csv | dump_ic_tb | R/ROI_TB.R | TB model outputs by indicator, country, year, and scenario |
| data/raw/model_dumps/dump_ic_malaria.csv | dump_ic_mal | R/ROI_MAL.R | Malaria model outputs by indicator, country, year, and scenario |
| data/raw/helpers/model_assumptions.csv | model_assumptions | All ROI scripts and R/WEO_for_ROI.R | Discounting, scenario comparison, investment shares, productivity assumptions, VSL assumptions, WEO repair settings |
| data/raw/helpers/grant_cycle_periods.csv | grant_cycle_periods | All ROI scripts | Maps calendar years to grant cycles such as GC7 and GC8 |
| data/raw/helpers/PoD2020.xlsx | PoD2020 | All ROI scripts | Remaining life expectancy and working-life calculations |
| data/raw/helpers/WEOOct24.xlsx | weo_oct24 | R/WEO_for_ROI.R | Macro inputs for GDP, PPP, growth, VSL, and VSLY |
| data/raw/helpers/LEB22_WDI.xlsx | leb22_wdi | R/WEO_for_ROI.R | LEB22 input used in VSLY denominator |
| data/raw/helpers/allocations_17bn.xlsx | df_alloc | All ROI scripts | Allocation helper with AllocH, AllocT, and AllocM |
| data/raw/helpers/tier_and_region_master.xlsx | tier_and_region_master | All ROI scripts | Country metadata used for region/income filters |

The three dump_ic files are expected to have the same general shape as the current files:

| Required field | Current use |
| --- | --- |
| indicator | Selects costs, deaths, cases, YLD, and disease-specific states |
| country | Renamed to iso; used for joins and aggregation |
| scenario_descriptor | Renamed to scenario; compared using comparison_scenario assumptions |
| year | Used for reporting periods, grant cycles, and discounting |
| model_central | Renamed to val; used as the central model output |
| model_high, model_low | Present in current dumps but not used in current ROI calculations |
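
Before a rerun, the header of each dump can be checked against the required fields listed above. The following is a minimal sketch in Python (standard library only), not part of the R workflow itself; the function name missing_dump_columns is hypothetical, and the check assumes the dump is a comma-separated file with a single header row.

```python
import csv

# Column names taken from the required-field table; model_high and
# model_low are left out because the ROI calculations do not use them.
REQUIRED_COLUMNS = {
    "indicator",
    "country",
    "scenario_descriptor",
    "year",
    "model_central",
}


def missing_dump_columns(path):
    """Return the required columns absent from the header row of a dump CSV.

    Hypothetical helper: it reads only the first row, so it stays fast
    even for large dumps.
    """
    # utf-8-sig drops a byte-order mark if the file was exported with one.
    with open(path, newline="", encoding="utf-8-sig") as handle:
        header = next(csv.reader(handle), [])
    present = {name.strip() for name in header}
    return sorted(REQUIRED_COLUMNS - present)
```

Calling missing_dump_columns on each of the three dump paths should return an empty list when the files have the expected shape.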

Script Order

Run the workflow from the project root:

Rscript scripts/make_combined_kable.R

The orchestrator runs in this order:

  1. scripts/make_combined_kable.R
  2. R/project_paths.R
  3. R/ROI_HIV.R
  4. R/WEO_for_ROI.R, sourced inside HIV
  5. R/ROI_TB.R
  6. R/WEO_for_ROI.R, sourced inside TB
  7. R/ROI_MAL.R
  8. R/WEO_for_ROI.R, sourced inside Malaria
  9. Combine the latest disease data-problem files
  10. Write the combined CSV and HTML tables

Each disease script can technically be run on its own, but the canonical workflow is scripts/make_combined_kable.R because it produces all disease outputs plus the combined table.

Generated Outputs

Main outputs from Rscript scripts/make_combined_kable.R:

| Output pattern | Notes |
| --- | --- |
| outputs/csv/HIV_ROI_Results<timestamp>.csv | Country-level HIV ROI results |
| outputs/xlsx/HIV_ROI_Results<timestamp>.xlsx | Excel copy of country-level HIV ROI results |
| outputs/csv/TB_ROI_Results<timestamp>.csv | Country-level TB ROI results |
| outputs/xlsx/TB_ROI_Results<timestamp>.xlsx | Excel copy of country-level TB ROI results |
| outputs/csv/Malaria_ROI_Results<timestamp>.csv | Country-level Malaria ROI results |
| outputs/xlsx/Malaria_ROI_Results<timestamp>.xlsx | Excel copy of country-level Malaria ROI results |
| outputs/data_problems/<Disease>_data_problems<timestamp>.csv | Disease-specific data-problem log for that run |
| outputs/data_problems/<Disease>_data_problems<timestamp>.xlsx | Excel copy of disease-specific data-problem log |
| outputs/data_problems/<Disease>_data_problems_latest.csv | Latest disease-specific data-problem log, overwritten each run |
| outputs/data_problems/roi_data_problems_latest.csv | Combined latest data-problem log across HIV, TB, and Malaria |
| outputs/data_problems/roi_data_problems_latest.xlsx | Excel copy of combined latest data-problem log |
| outputs/csv/HTM_<filter>_combined_table.csv | Combined table used to replicate the RPubs-style headline table |
| outputs/html/combined_roi_table_<filter>.html | Browser-previewable combined kable table |

With the current config.yml, <filter> is all, so the main combined outputs are:

  • outputs/csv/HTM_all_combined_table.csv
  • outputs/html/combined_roi_table_all.html
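
Unlike the combined table, the disease-level result files carry a timestamp in their names, so each run adds new files rather than overwriting old ones. A minimal Python sketch for locating the newest file of one kind follows. The helper name latest_output is hypothetical, and because the exact timestamp format is not documented here, the sketch assumes file modification time is a reliable proxy for run order.

```python
from pathlib import Path


def latest_output(folder, prefix, suffix):
    """Return the most recently modified file matching prefix*suffix, or None.

    Hypothetical helper: relies on modification time rather than parsing
    the timestamp embedded in the filename.
    """
    candidates = list(Path(folder).glob(f"{prefix}*{suffix}"))
    return max(candidates, key=lambda p: p.stat().st_mtime, default=None)
```

For example, latest_output("outputs/csv", "HIV_ROI_Results", ".csv") would pick the newest HIV results file. Copying files between machines can reset modification times, in which case the result should be checked by eye.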

Additional utility outputs:

| Command | Output |
| --- | --- |
| Rscript scripts/validate_combined_output.R | Checks current default combined output against the RPubs headline table |
| Rscript scripts/test_discount_base_years.R | Writes outputs/base_year_tests/discount_base_year_status.csv, discount_base_year_summary.csv, per-base-year combined-table snapshots, and logs |
| Rscript scripts/build_printable_report.R | Rebuilds docs/printable_roi_refactor_report.html |

How To Change Input Assumptions And Rerun

  1. Edit data/raw/helpers/model_assumptions.csv.
  2. Keep parameter, scope, subparameter, value, and value_type internally consistent. Numeric values should keep value_type as number or integer; scenario labels should keep value_type as string.
  3. For comparison_scenario, set one baseline row and one intervention row. The chosen values must appear as scenario_descriptor values in all three disease dumps.
  4. If changing grant-cycle years or extending future cycles, edit data/raw/helpers/grant_cycle_periods.csv rather than model_assumptions.csv.
  5. Run the full workflow:
Rscript scripts/make_combined_kable.R
  6. Inspect the latest data-problem file (open is the macOS command; use your platform's equivalent elsewhere):
open outputs/data_problems/roi_data_problems_latest.csv
  7. If the goal is to reproduce the current RPubs headline table, run:
Rscript scripts/validate_combined_output.R

If assumptions changed intentionally, the RPubs validator may fail because the results are expected to change. In that case, treat successful completion of scripts/make_combined_kable.R plus a review of outputs/data_problems/roi_data_problems_latest.csv as the main sanity check.
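
The consistency rule for value and value_type can be checked mechanically before running the workflow. The sketch below is a Python illustration, not part of the R code. It assumes the CSV header uses the column names parameter, value, and value_type exactly as written in the steps above, and the function name assumption_type_problems is hypothetical.

```python
import csv


def assumption_type_problems(path):
    """List rows whose value does not parse as its declared value_type.

    Returns (line number, parameter, value, value_type) tuples.
    Rows typed as string, or with any other type label, are not checked.
    """
    problems = []
    with open(path, newline="", encoding="utf-8-sig") as handle:
        reader = csv.DictReader(handle)
        for row in reader:
            value = (row.get("value") or "").strip()
            value_type = (row.get("value_type") or "").strip()
            try:
                if value_type == "integer":
                    int(value)      # rejects values such as "2024.5"
                elif value_type == "number":
                    float(value)    # rejects values such as "abc"
            except ValueError:
                problems.append(
                    (reader.line_num, row.get("parameter"), value, value_type)
                )
    return problems
```

An empty list means every numeric row parses under its declared type. It does not confirm that the values are sensible, so the data-problem log remains the main check.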

For discount-base-year changes, run the regression harness:

Rscript scripts/test_discount_base_years.R

How To Use A New Set Of External dump_ic Files

There are two safe options.

Option A: Replace the local copied dump files

Use this when the new files should become the current default inputs and have the same filenames:

  1. Place the new local copies at:
    • data/raw/model_dumps/dump_ic_hiv_corrected_18dec.csv
    • data/raw/model_dumps/dump_ic_tb.csv
    • data/raw/model_dumps/dump_ic_malaria.csv
  2. Leave config.yml unchanged.
  3. Confirm the files still have the required columns listed above.
  4. Confirm the desired scenario_descriptor values are present in all three files.
  5. Run:
Rscript scripts/make_combined_kable.R
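
The scenario check in the steps above can be scripted. This is a minimal Python sketch under the assumption that each dump is a comma-separated file with a scenario_descriptor column; the function names are hypothetical, and the scenario names to look for are passed in by the caller rather than read from model_assumptions.csv.

```python
import csv


def scenarios_in_dump(path):
    """Return the set of distinct scenario_descriptor values in one dump."""
    with open(path, newline="", encoding="utf-8-sig") as handle:
        return {row["scenario_descriptor"] for row in csv.DictReader(handle)}


def missing_scenarios(dump_paths, wanted):
    """Map each dump path to the wanted scenario names it lacks.

    Dumps that contain every wanted scenario are omitted, so an empty
    dict means all dumps are usable for the comparison.
    """
    report = {}
    for path in dump_paths:
        absent = sorted(set(wanted) - scenarios_in_dump(path))
        if absent:
            report[path] = absent
    return report
```

Passing the three dump paths together with the baseline and intervention names from the comparison_scenario rows shows at a glance which file, if any, would trigger the missing-scenario error.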

Option B: Keep the old copied dump files and point config to new filenames

Use this when you want to keep old and new dump files side by side:

  1. Copy the new files into data/raw/model_dumps/.
  2. Update these three keys in config.yml:
dump_ic_hiv: data/raw/model_dumps/<new_hiv_dump>.csv
dump_ic_mal: data/raw/model_dumps/<new_malaria_dump>.csv
dump_ic_tb: data/raw/model_dumps/<new_tb_dump>.csv
  3. If the new dumps use different scenario names, update these rows in data/raw/helpers/model_assumptions.csv:
A003,global,comparison_scenario,baseline,<new_baseline>,string,
A003,global,comparison_scenario,intervention,<new_intervention>,string,
  4. Run:
Rscript scripts/make_combined_kable.R
  5. Review outputs/data_problems/roi_data_problems_latest.csv. This is especially important with new dumps because ISO codes, scenario names, year ranges, or missing WEO/allocation joins can silently affect aggregate results.
  6. Do not use scripts/validate_combined_output.R as a success criterion for genuinely new model dumps. That validator is designed to check whether the current local workflow still reproduces the known RPubs headline table.

Common Things That Can Break A Rerun

| Symptom | Likely cause | Where to fix |
| --- | --- | --- |
| Error saying configured scenarios are missing | comparison_scenario values do not exist in one or more dumps | data/raw/helpers/model_assumptions.csv, or use dumps with the expected scenario_descriptor values |
| Missing VSLY, VSL, or wage-productivity values | ISO/year not joining to WEO or LEB22 inputs | WEOOct24.xlsx, LEB22_WDI.xlsx, ISO codes in dumps |
| Missing TGF percent-cost weighting | ISO not found in allocation helper, or allocation column missing for disease | allocations_17bn.xlsx |
| Unexpected reporting periods | Year not mapped to GC7/GC8 or combined period assumptions changed | grant_cycle_periods.csv and model_assumptions.csv |
| RPubs validator fails after intentional assumption or dump changes | Expected behavior; the legacy benchmark changed | Compare new outputs directly and inspect data-problem logs |
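
Several of the join failures above come down to ISO codes in a dump that have no match in a helper file. A rough Python sketch of that comparison follows. The helper workbooks are Excel files, which the standard library cannot read, so the sketch assumes the reference ISO codes have already been extracted by other means and are passed in as a plain list; the function name unmatched_iso_codes is hypothetical.

```python
import csv


def unmatched_iso_codes(dump_path, reference_codes):
    """Return dump country codes that have no exact match in reference_codes.

    The comparison is deliberately exact (no trimming or case folding),
    so stray whitespace or lowercase codes show up as unmatched, much as
    they would fail an exact-match join.
    """
    reference = set(reference_codes)
    with open(dump_path, newline="", encoding="utf-8-sig") as handle:
        dump_codes = {row["country"] for row in csv.DictReader(handle)}
    return sorted(dump_codes - reference)
```

Any code returned here is a candidate explanation for missing VSLY, VSL, or allocation values for that country.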