Reproducible analyses:

Coding best practices for research

Derek Weix

dpw48@drexel.edu

Drexel University

October 29, 2025

What are coding best practices?

Coding best practices are general guidelines to maximize the quality of our code.

\(~\)

Possible situations:
- Research paper/one time project.
- Ongoing analyses that will be updated over time.
- Non-interactive, pipeline environment.

\(~\)

The best code is not that which runs the fastest, or has the fewest keystrokes, but that which is easiest to understand.

Why should I care?

Personal benefits:
- Following best practices makes it easier for you to understand how your own research choices are translate into code.
- Results in higher adaptability to alternative analysis choices.

\(~\)

Research society benefits:
- Provides clarity to other researchers regarding your choice of methods.
- Allows for much easier adaptation of these methods to alternative problems.

How to approach coding?

Deliberately!
- Treat the final version of your code with the same care as the final draft of your manuscript.
- Recall that code should be publicly available, even if the data are not.
- Prioritize clarity.

\(~\)

Code should be:
1. Organized.
2. Consistent.
3. Generalized.

Organizing our code

Directory
- README
- Scripts
  - Sections with each script
- Sub-directories
  - Data
  - Intermediary results
  - Final results

Sample directory architechture

# Prints directory content in tree-like format
fs::dir_tree("~/git/MS219_temperature_diabetes", recurse = FALSE)

~/git/MS219_temperature_diabetes
├── 00_import_data.R
├── 01_descriptive_statistics.R
├── 02_analysis_overall.R
├── 03_analysis_age.R
├── 04_analysis_climate.R
├── 05_figures_base.R
├── 05_figures_ggplot2.R
├── 06_edfs.R
├── data
├── dm_analysis_219.R
├── MS219_temperature_diabetes.Rproj
├── README.md
└── results

Sample directory architechture

# Prints directory content in tree-like format
fs::dir_tree("~/git/MS219_temperature_diabetes", recurse = TRUE)

~/git/MS219_temperature_diabetes
├── 00_import_data.R
├── 01_descriptive_statistics.R
├── 02_analysis_overall.R
├── 03_analysis_age.R
├── 04_analysis_climate.R
├── 05_figures_base.R
├── 05_figures_ggplot2.R
├── 06_edfs.R
├── data
│   └── diabetes.rds
├── dm_analysis_219.R
├── MS219_temperature_diabetes.Rproj
├── README.md
└── results
    ├── edf_age.rds
    ├── edf_climate.rds
    ├── edf_overall.rds
    ├── edf_sim_age.rds
    ├── edf_sim_climate.rds
    ├── edf_sim_overall.rds
    ├── table_1.html
    ├── table_2.html
    └── table_climate_zone.html

Organization within scripts

Think about our code as a book that can be read front to back.

\(~\)

Separate scripts: Chapters in a book.
Sections within scripts: Paragraphs within a chapter.
Comments within sections: Handwritten annotations.

\(~\)

00_import_data.R

library(tidyverse)

# File paths --------------------------------------------------------------

path_diabetes <- "//files.drexel.edu/colleges/SOPH/Shared/UHC/Projects/Wellcome_Trust/Manuscripts/MS219_RangelMoreno/Data Request/Data Build_20250416/Derived Data"
path_BEC <- "//files.drexel.edu/colleges/SOPH/Shared/UHC/Projects/Wellcome_Trust/Data Methods Core/Data/BEC Data/BEC_L1AD_08162023.csv"


# Read in data ------------------------------------------------------------

# Temperature & mortality data by city, date, age, and sex
df <-
  path_diabetes |> 
  list.files(full.names = TRUE) |> 
  map(\(path_city) haven::read_sas(path_city)) |> 
  list_rbind()

# Build environment core data (used to extract climate zones)
df_BEC <- read_csv(path_BEC)


# Format data -------------------------------------------------------------

df <- df |> 
  # Restrict age to adults over 20
  filter(thisage %in% c("20-34", "35-49", "50-64", "65+")) |>  
  # Renaming/formatting
  mutate(date = ymd(allDate),
         temp = L1ADtemp_pw,
         t_percent = tmp_pw_percentile/100,
         age = fct_collapse(thisage,
                            `<65` = c("20-34", "35-49", "50-64"),
                            `65+` = c("65+")),
         country = factor(country, levels = c("BR", "CL", "CO", "CR", "GT", "SV", "MX", "PA", "PE")),
         .keep = "unused") |>
  # Sum diabetes cases by city, date, and age group
  summarize(across(c(deaths, diabetes, kidney), sum),                  
            across(c(temp, t_percent, country), first),
            .by = c(salid1, date, age)) |> 
  # Order observations by city, date, and age group
  arrange(across(c(salid1, date, age)))

df_BEC <- df_BEC |> 
  # Extract L1 identifier and climate zone
  select(SALID1, BECCZL1AD) |> 
  # Create usable climate_zone factor
  mutate(climate_zone = 
           case_when(
             BECCZL1AD %in% 1:4   ~ "Tropical",
             BECCZL1AD %in% 5:8   ~ "Arid",
             BECCZL1AD %in% 9:17  ~ "Temperate",
             BECCZL1AD %in% 18:29 ~ "Continental",
             BECCZL1AD %in% 30:31 ~ "Polar",
             .default = "Other") |> factor(),
         .keep = "unused") |> 
  rename_with(tolower)


# Join and save data ------------------------------------------------------

# Add climate zone to temp/mortality data
df <- left_join(df, df_BEC, by = "salid1")

# Save formatted as .rds file
saveRDS(df, "data/diabetes.rds")

Consistency

Pick a style guide
- Naming conventions
  - directories, files
  - functions, variables

\(~\)

Programming strategies
- Functional or object oriented
- File saving conventions
- etc.

Generalizability

Write code that is robust to external changes.
- Refer to columns by name, instead of by position.

\(~\)

Write code so it can be run on any computer with minimal changes.
- Absolute pathing vs. relative pathing.

\(~\)

Avoid copy & pasting code like the plague!
- Save important global variables to the global environment.
- Utilize mapping or loops.
- Use helper functions to keep code human readable.

Clarity

There is no great writing, only great rewriting.

\[~\]

Get code working while applying these principles.
Revise code systematically.
Goal: code should be as simple and clear as possible while accomplishing your goal.

Sample code scripts

README.md

# Summary

This repository includes the companion code to the paper *Non-optimal temperature and diabetes mellitus mortality in 329 Latin American cities.*

## Abstract

Diabetes mellitus is the fifth leading cause of mortality in Latin America (LATAM). This highly urbanized region has experienced rising temperatures in recent years and increasing frequency of extreme temperature events. Little is known about the association between ambient temperature and diabetes mortality, and its differences across age groups and climate zones in urban areas. We assessed the association between daily ambient temperature and diabetes mortality in 329 LATAM cities across nine countries from 2000 to 2022, as well as the heterogeneity of effects associated with age and climate zones. We used a case time series design and conditional quasi-Poisson distributed lags non-linear models (0-21 day lags). We analyzed 2,223,748 diabetes deaths across all cities. We observed an inverse J-shaped curve association between temperature and diabetes mortality, with the minimum mortality temperature percentile (MMTp) at the 83rd. Diabetes mortality risk was higher at temperatures below the MMTp than at temperatures above it. We estimated that 0.5% (95% CI: 0.3%, 0.6%) of diabetes deaths were attributable to heat temperatures, whereas the excess death fraction for cold was 8.3% (95% CI: 7.4%, 9.2%). Effects of cold were slightly stronger in adults <65 years old and in people living in arid cities, while heat showed a higher risk for adults ≥65 years of age. Both heat and cold are associated with increased diabetes mellitus mortality in Latin American cities. Strategies to protect people with diabetes against the impacts of ambient temperature are needed, such as creating health monitoring programs, issuing early temperature alerts for diabetes patients, and promoting access to thermally comfortable environments.

# Code

Scripts:

- `00_import_data.R`: Imports all external DMC data and creates the `diabetes.rds` data frame used in all analyses.
- `01_descriptive_statistics.R`: Computes descriptive statistics and generates Table 1 and the climate zone temperature table.
- `02_analysis_overall.R`: Fits model for the overall analysis.
- `03_analysis_age.R`: Fits models for age specific analysis. Requires `02_analysis_overall.R` to be run.
- `04_analysis_climate.R`: Fits models for climate-zone specific analysis. Requires `02_analysis_overall.R` to be run.
- `05_figures_base.R`: Generates Figures 1, 2, 3 with base R graphics. Depends on the relevant analysis scripts to be run.
- `05_figures_ggplot2.R`: Generates Figures 1, 2, 3 with ggplot2. Depends on the relevant analysis scripts to be run.
- `06_edfs.R`: Calculates the excess death fractions (EDFs) and EDF intervals for all analysis. Depends on the relevant analysis scripts to be run. Generates Table 2.

# Data

Data for this project comes from the DMC.

00_import_data.R

library(tidyverse)

# File paths --------------------------------------------------------------

path_diabetes <- "//files.drexel.edu/colleges/SOPH/Shared/UHC/Projects/Wellcome_Trust/Manuscripts/MS219_RangelMoreno/Data Request/Data Build_20250416/Derived Data"
path_BEC <- "//files.drexel.edu/colleges/SOPH/Shared/UHC/Projects/Wellcome_Trust/Data Methods Core/Data/BEC Data/BEC_L1AD_08162023.csv"


# Read in data ------------------------------------------------------------

# Temperature & mortality data by city, date, age, and sex
df <-
  path_diabetes |> 
  list.files(full.names = TRUE) |> 
  map(\(path_city) haven::read_sas(path_city)) |> 
  list_rbind()

# Build environment core data (used to extract climate zones)
df_BEC <- read_csv(path_BEC)


# Format data -------------------------------------------------------------

df <- df |> 
  # Restrict age to adults over 20
  filter(thisage %in% c("20-34", "35-49", "50-64", "65+")) |>  
  # Renaming/formatting
  mutate(date = ymd(allDate),
         temp = L1ADtemp_pw,
         t_percent = tmp_pw_percentile/100,
         age = fct_collapse(thisage,
                            `<65` = c("20-34", "35-49", "50-64"),
                            `65+` = c("65+")),
         country = factor(country, levels = c("BR", "CL", "CO", "CR", "GT", "SV", "MX", "PA", "PE")),
         .keep = "unused") |>
  # Sum diabetes cases by city, date, and age group
  summarize(across(c(deaths, diabetes, kidney), sum),                  
            across(c(temp, t_percent, country), first),
            .by = c(salid1, date, age)) |> 
  # Order observations by city, date, and age group
  arrange(across(c(salid1, date, age)))

df_BEC <- df_BEC |> 
  # Extract L1 identifier and climate zone
  select(SALID1, BECCZL1AD) |> 
  # Create usable climate_zone factor
  mutate(climate_zone = 
           case_when(
             BECCZL1AD %in% 1:4   ~ "Tropical",
             BECCZL1AD %in% 5:8   ~ "Arid",
             BECCZL1AD %in% 9:17  ~ "Temperate",
             BECCZL1AD %in% 18:29 ~ "Continental",
             BECCZL1AD %in% 30:31 ~ "Polar",
             .default = "Other") |> factor(),
         .keep = "unused") |> 
  rename_with(tolower)


# Join and save data ------------------------------------------------------

# Add climate zone to temp/mortality data
df <- left_join(df, df_BEC, by = "salid1")

# Save formatted as .rds file
saveRDS(df, "data/diabetes.rds")

01_descriptive_statistics.R

library(tidyverse)
library(glue)
library(gt)

# Load data
df <- readRDS("data/diabetes.rds")


# Descriptive results -----------------------------------------------------

# Total of all diabetes deaths 
total_diabetes <- sum(df$diabetes); total_diabetes

# Percentage of diabetes deaths by age
df |>
  summarize(pct_diabetes = 100*sum(diabetes)/total_diabetes, 
            .by = age)


# Table 1 -----------------------------------------------------------------

# Columns 1-5
tbl_11 <- df |> 
  summarize(num_cities = n_distinct(salid1),
            study_period = glue("({first_year}-{final_year})",
                                first_year = min(year(date)),
                                final_year = max(year(date))),
            tot_diabetes = sum(diabetes),
            pct_diabetes = 100*sum(diabetes)/total_diabetes,
            .by = country) |> 
  arrange(country); tbl_11

# Column 6 [NOT READY, MAY DEPEND ON POP PROJECTIONS DATA]
tbl_12 <- df |> 
  mutate(year = year(date)) |> 
  summarize(annual_diabetes = sum(diabetes), .by = c(country, year)) |> 
  summarize(avg_annual_diabetes = glue("{med} ({lower}, {upper})",
                                       med = median(annual_diabetes),
                                       lower = quantile(annual_diabetes, .1),
                                       upper = quantile(annual_diabetes, .9)),
            .by = country) |> 
  arrange(country); tbl_12

# Column 7
tbl_13 <- df |> 
  summarize(temp = first(temp), .by = c(country, salid1, date)) |> 
  summarize(temp = glue("{med} ({lower}, {upper})",
                        med = median(temp)         |> round(1),
                        lower = quantile(temp, .1) |> round(1),
                        upper = quantile(temp, .9) |> round(1)),
            .by = country) |> 
  arrange(country); tbl_13

# Create table
tbl_1 <- tbl_11 |> 
  left_join(tbl_12, by = "country") |> 
  left_join(tbl_13, by = "country") |> 
  gt() |> 
  tab_header(title = "Table 1. Diabetes mellitus mortality and temperature characteristics in 329 selected cities of Latin America") |>
  tab_spanner(label = "Total diabetes deaths", columns = c(tot_diabetes, pct_diabetes)) |> 
  fmt_number(columns = tot_diabetes, decimals = 0) |> 
  fmt_number(columns = pct_diabetes, decimals = 1) |> 
  tab_footnote(location = cells_column_labels(columns = c(avg_annual_diabetes, temp)),
               footnote = "Median (10th percentile, 90th percentile)") |> 
  tab_footnote(location = cells_column_labels(columns = temp),
               footnote = "Rates per 100,000 inhabitants") |> 
  opt_footnote_marks(marks = "letters") |> 
  cols_label(country = "Country",
             num_cities = "No. of cities",
             study_period = "Study period",
             tot_diabetes = "n",
             pct_diabetes = "(%)",
             avg_annual_diabetes = "Average annual diabetes mortality rate",
             temp = "Daily mean temperature (°C)"); tbl_1

# Save as html table (this can be opened with Excel)
gtsave(tbl_1, "results/table_1.html")


# Climate zone table ------------------------------------------------------

# Create table
tbl_climate_zone <- df |> 
  summarize(num_cities = n_distinct(salid1),
            min_temp  = min(temp),
            mean_temp = mean(temp),
            max_temp  = max(temp),
            p01 = quantile(temp, .01),
            p25 = quantile(temp, .25),
            p50 = quantile(temp, .5),
            p75 = quantile(temp, .75),
            p99 = quantile(temp, .99),
            .by = climate_zone) |> 
  arrange(climate_zone) |> 
  gt() |> 
  tab_header(title = "Temperature (°C) by climate zone") |>
  tab_spanner(label = "Percentile", columns = c(p01, p25, p50, p75, p99)) |>
  fmt_number(columns = c(min_temp, mean_temp, max_temp,
                         p01, p25, p50, p75, p99),
             decimals = 1) |> 
  cols_label(climate_zone = "Climate zone",
             num_cities = "Number of cities",
             min_temp = "Minimum",
             mean_temp = "Mean",
             max_temp = "Maximum",
             p01 = "1st",
             p25 = "25th",
             p50 = "50th",
             p75 = "75th",
             p99 = "99th")

# Save as html table (this can be opened with Excel)
gtsave(tbl_climate_zone, "results/table_climate_zone.html")

02_analysis_overall.R

library(tidyverse)
library(glue)
library(gt)
library(gnm); library(dlnm); library(splines)
library(SALURhelper)

# Load data
df <- 
  readRDS("data/diabetes.rds") |> 
  mutate(strata = glue("{salid1}:{age}:{year(date)}:{month(date)}:{wday(date, label = TRUE)}") |> factor(),
         group  = glue("{salid1}:{age}") |> factor())

glimpse(df)

# Fit model ---------------------------------------------------------------

# Model specifications
n_lag <- 21

splines_var <- list(fun = "ns", knots = c(.1, .75, .9))
splines_lag <- list(fun = "ns", knots = logknots(n_lag, nk = 3))

seq_pred <- seq(0, 1, length.out = 101)  # Sequence of temp %s for predictions
seq_RR_temps <- c(2, 6, 26, 76, 96, 100) # Used to extract RRs at .01, .05, .25, .75, .95, .99 temperature percentiles

# Define cross-basis
cbt <- crossbasis(df$t_percent,
                  lag = n_lag,
                  argvar = splines_var,
                  arglag = splines_lag,
                  group = df$group)

# Fit conditional quasi-Poisson DLNM
fit <- gnm(diabetes ~ cbt,
           family = quasipoisson,
           eliminate = strata,
           data = df)

# Minimum mortality temperature percentile
MMTp <- crossreduce(cbt, fit, at = seq_pred, cen = .5) |> get_MMT()

# Mean risk ratio via reduced predictions
red  <- crossreduce(cbt, fit, at = seq_pred, cen = MMTp)

red |> 
  get_df_RRs() |> 
  slice(seq_RR_temps) # Extract RRs at specified temp percentiles

03_analysis_age.R

# source("02_analysis_overall.R")

# Add age category indicator variables
df <- 
  df |> 
  mutate(under_65 = ifelse(age != "65+", 1, 0),
         over_65  = ifelse(age == "65+", 1, 0))

glimpse(df)


# Fit age-specific effect modification models -----------------------------

# Define interaction cross-bases
cb_age1 <- cbt*df$under_65
cb_age2 <- cbt*df$over_65

# Fit CTS with age-specific reference groups
fit_age1 <- update(fit, . ~ . + cb_age2) # reference group: <65
fit_age2 <- update(fit, . ~ . + cb_age1) # reference group: 65+

# Minimum mortality temperature percentile
MMTp_age1 <- crossreduce(cbt, fit_age1, at = seq_pred, cen = .5) |> get_MMT()
MMTp_age2 <- crossreduce(cbt, fit_age2, at = seq_pred, cen = .5) |> get_MMT()

# Mean risk ratio via reduced predictions
red_age1  <- crossreduce(cbt, fit_age1, at = seq_pred, cen = MMTp_age1)
red_age2  <- crossreduce(cbt, fit_age2, at = seq_pred, cen = MMTp_age2)

list(red_age1, red_age2) |> 
  map(\(x) slice(get_df_RRs(x), seq_RR_temps)) # Extract RRs at specified temp percentiles

04_analysis_climate.R

# source("02_analysis_overall.R")

# Add climate zone indicator variables
df <- 
  df |> 
  mutate(arid      = ifelse(climate_zone == "Arid", 1, 0),
         polar     = ifelse(climate_zone == "Polar", 1, 0),
         temperate = ifelse(climate_zone == "Temperate", 1, 0),
         tropical  = ifelse(climate_zone == "Tropical", 1, 0))

glimpse(df)



# Fit climate-zone specific models ----------------------------------------

# Define interaction cross-bases
cb_arid      <- cbt*df$arid
cb_polar     <- cbt*df$polar
cb_temperate <- cbt*df$temperate
cb_tropical  <- cbt*df$tropical

# Fit CTS with climate-zone reference groups
fit_arid      <- update(fit, . ~ . + cb_polar + cb_temperate + cb_tropical) # reference group: arid
fit_polar     <- update(fit, . ~ . + cb_arid  + cb_temperate + cb_tropical) # reference group: polar
fit_temperate <- update(fit, . ~ . + cb_polar + cb_arid      + cb_tropical) # reference group: temperate
fit_tropical  <- update(fit, . ~ . + cb_polar + cb_temperate + cb_arid)     # reference group: tropical

# Minimum mortality temperature percentile
MMTp_arid      <- crossreduce(cbt, fit_arid,      at = seq_pred, cen = .5) |> get_MMT()
MMTp_polar     <- crossreduce(cbt, fit_polar,     at = seq_pred, cen = .5) |> get_MMT()
MMTp_temperate <- crossreduce(cbt, fit_temperate, at = seq_pred, cen = .5) |> get_MMT()
MMTp_tropical  <- crossreduce(cbt, fit_tropical,  at = seq_pred, cen = .5) |> get_MMT()

# Mean risk ratio via reduced predictions
red_arid      <- crossreduce(cbt, fit_arid,      at = seq_pred, cen = MMTp_arid)
red_polar     <- crossreduce(cbt, fit_polar,     at = seq_pred, cen = MMTp_polar)
red_temperate <- crossreduce(cbt, fit_temperate, at = seq_pred, cen = MMTp_temperate)
red_tropical  <- crossreduce(cbt, fit_tropical,  at = seq_pred, cen = MMTp_tropical)

list(red_arid, red_polar, red_temperate, red_tropical) |> 
  map(\(x) slice(get_df_RRs(x), seq_RR_temps)) # Extract RRs at specified temp percentiles

05_figures_ggplot2.R

# Figure 1 (ggplot2) ------------------------------------------------------
# source("02_analysis_overall.R")

red |> 
  get_df_RRs() |> 
  ggplot(aes(exposure)) +
  geom_ribbon(aes(ymin = RR_low, ymax = RR_high, fill = exposure < MMTp), alpha = .2) +
  geom_line(aes(y = RR, color = exposure < MMTp)) +
  geom_hline(yintercept = 1) +
  geom_vline(xintercept = c(0.01, 0.25, 0.50, 0.75, 0.99), linetype = "dashed", alpha = .5) +
  geom_vline(xintercept = get_MMT(red), linetype = "dotdash") +
  scale_color_manual(values = c("TRUE" = "blue", "FALSE" = "red"),
                     aesthetics = c("color", "fill")) +
  coord_transform(y = "log") +
  scale_x_continuous(breaks = seq(0, 1, .25)) +
  labs(x = "Temperature percentile", y = "RR") +
  theme_classic() +
  theme(legend.position = "none")

# Figure 2 (ggplot2) ------------------------------------------------------
# source("03_analysis_age.R")

list(`Under 65`   = get_df_RRs(red_age1),
     `65 or Over` = get_df_RRs(red_age2)) |> 
  list_rbind(names_to = "age") |> 
  ggplot(aes(exposure)) +
  geom_ribbon(aes(ymin = RR_low, ymax = RR_high, fill = age), alpha = .2) +
  geom_line(aes(y = RR, color = age)) +
  geom_hline(yintercept = 1) +
  geom_vline(xintercept = c(0.01, 0.25, 0.50, 0.75, 0.99), linetype = "dashed", alpha = .5) +
  scale_color_brewer(palette = "Set1") +
  scale_fill_brewer(palette = "Set1") +
  coord_transform(y = "log") +
  scale_x_continuous(breaks = seq(0, 1, .25)) +
  labs(x = "Temperature percentile", y = "RR", color = "", fill = "") +
  theme_classic() +
  theme(legend.position = c(.83, .8))


# Figure 3 (ggplot2) ------------------------------------------------------
# source(04_analysis_climate.R")

list(Arid      = get_df_RRs(red_arid),
     #Polar     = get_df_RRs(red_polar), # Removed because of high uncertainty
     Temperate = get_df_RRs(red_temperate),
     Tropical  = get_df_RRs(red_tropical)) |> 
  list_rbind(names_to = "climate_zone") |> 
  ggplot(aes(exposure)) +
  geom_ribbon(aes(ymin = RR_low, ymax = RR_high, fill = climate_zone), alpha = .2) +
  geom_line(aes(y = RR, color = climate_zone)) +
  geom_hline(yintercept = 1) +
  geom_vline(xintercept = c(0.01, 0.25, 0.50, 0.75, 0.99), linetype = "dashed", alpha = .5) +
  scale_color_brewer(palette = "Set1") +
  scale_fill_brewer(palette = "Set1") +
  coord_transform(y = "log") +
  scale_x_continuous(breaks = seq(0, 1, .25)) +
  labs(x = "Temperature percentile", y = "RR", color = "", fill = "") +
  theme_classic() +
  theme(legend.position = c(.83, .8))

06_edfs.R

set.seed(1234)


# Helper functions --------------------------------------------------------

# Temperature ranges over which EDFs are calculated
get_temp_ranges <- function(MMTp) {
  list(non_optimal  = c(0, 1),
       all_heat     = c(MMTp, 1),
       extreme_heat = c(.95, 1),
       all_cold     = c(0, MMTp),
       extreme_cold = c(0, .05))
}

# Gets EDFs for chosen analyses
get_edf <- function(list_model, # List of model objects
                    list_cen,   # List or vector of MMTp values
                    list_names, # List or vector EDF group names
                    sim = FALSE # Indicates simulated results (used for interval estimates)
                    ) {
  
  # Check model, center, and name lengths are equal
  if(length(list_model) != length(list_cen) | length(list_model) != length(list_names)) {
    stop("Error: list_model, list_cen, and list_names must have equal length.")
  }
  
  # Map over models & their respective MMTp
  list(model  = list_model,
       center = list_cen) |> 
    pmap(\(model, center) {
      # Calculate relevant temp ranges based on MMTp
      get_temp_ranges(center) |> 
        map(\(temp_range) {
          # Calculate attributable fraction 
          attrdl(x     = model$data$t_percent,
                 basis = cbt,
                 cases = model$data$diabetes,
                 model = model,
                 dir   = "back",
                 cen   = center,
                 range = temp_range,
                 sim   = sim,
                 nsim  = 1000)
        })
    # Set list names to list_names
    }) |> set_names(list_names)
}

# Converts output of get_edf into table form
edf_to_tbl <- function(list_edf, list_edf_sim) {
  # Map over groups
  list(list_edf, list_edf_sim) |> 
    pmap(\(list_edf, list_edf_sim) {
      # Map over temperate range
      list(edf = list_edf, edf_sim = list_edf_sim) |> 
        pmap(\(edf, edf_sim) {
          glue("{estimate} ({lower}, {upper})",
               estimate = round(100*edf, n_round),
               lower    = round(quantile(100*edf_sim, .025), n_round),
               upper    = round(quantile(100*edf_sim, .975), n_round))
        }) |> 
        as_tibble()
    }) |> 
    list_rbind(names_to = "category")
}

# Overall EDFs ------------------------------------------------------------
# source("02_analysis_overall.R")

# Estimates of EDFs
edf <- get_edf(list(fit), MMTp, "overall")
saveRDS(edf, "results/edf_overall.rds")

# Simulations to get interval EDF estimates
edf_sim <- get_edf(list(fit), MMTp, "overall", sim = TRUE)
saveRDS(edf_sim, "results/edf_sim_overall.rds")


# Age specific EDFs -------------------------------------------------------
# source("03_analysis_age.R")

# Estimates of age specific EDFs
edf_age <- get_edf(list(fit_age1, fit_age2),
                   c(MMTp_age1, MMTp_age2),
                   c("under_65", "over_65"))
saveRDS(edf_age, "results/edf_age.rds")

# Simulations to get interval EDF estimates
edf_sim_age <- get_edf(list(fit_age1, fit_age2),
                       c(MMTp_age1, MMTp_age2),
                       c("under_65", "over_65"), 
                       sim = TRUE)
saveRDS(edf_sim_age, "results/edf_sim_age.rds")


# Climate zone specific EDFs ----------------------------------------------
# source("04_analysis_climate.R")

# Estimates of climate zone specific EDFs
edf_climate <- get_edf(list(fit_arid, fit_temperate, fit_tropical),
                       c(MMTp_arid, MMTp_temperate, MMTp_tropical),
                       c("arid", "temperate", "tropical"))
saveRDS(edf_climate, "results/edf_climate.rds")

# Simulations to get interval EDF estimates
edf_sim_climate <- get_edf(list(fit_arid, fit_temperate, fit_tropical),
                           c(MMTp_arid, MMTp_temperate, MMTp_tropical),
                           c("arid", "temperate", "tropical"), 
                           sim = TRUE)
saveRDS(edf_sim_climate, "results/edf_sim_climate.rds")

# Table 2 -----------------------------------------------------------------
edf <- readRDS("results/edf_overall.rds")
edf_sim <- readRDS("results/edf_sim_overall.rds")

edf_age <- readRDS("results/edf_age.rds")
edf_sim_age <- readRDS("results/edf_sim_age.rds")

edf_climate <- readRDS("results/edf_climate.rds")
edf_sim_climate <- readRDS("results/edf_sim_climate.rds")

# Partial EDF table
n_round <- 2

# Convert EDF output to tables
tbl_edf1 <- edf_to_tbl(edf, edf_sim)
tbl_edf2 <- edf_to_tbl(edf_age, edf_sim_age)
tbl_edf3 <- edf_to_tbl(edf_climate, edf_sim_climate)

# Create table 2
tbl_2 <- 
  bind_rows(tbl_edf1, tbl_edf2, tbl_edf3) |> 
  mutate(category = c("Overall", "<65", "≥65", "Arid", "Temperate", "Tropical")) |> 
  gt(rowname_col = "category") |> 
  tab_row_group(label = "Age groups", rows = c("<65", "≥65")) |> 
  tab_row_group(label = "Climate zones", rows = c("Arid", "Temperate", "Tropical")) |> 
  row_group_order(groups = c(NA, "Age groups", "Climate zones")) |> 
  tab_header(title = "Diabetes mellitus excess death fraction (%, 95% CI) attributable to temperature") |> 
  tab_footnote(location = cells_column_labels(columns = non_optimal),
               footnote = "Percentage of total deaths attributable by temperatures above or below the minimum mortality temperature percentile.") |> 
  tab_footnote(location = cells_column_labels(columns = extreme_heat),
               footnote = "≥95th percentile of the city-specific daily temperature distribution.") |> 
  tab_footnote(location = cells_column_labels(columns = extreme_cold),
               footnote = "≤5th percentile of the city-specific daily temperature distribution.") |> 
  opt_footnote_marks(marks = "letters") |> 
  cols_label(non_optimal = "Non-optimal temperature",
             all_heat = "All heat",
             extreme_heat = "Extreme heat",
             all_cold = "All cold",
             extreme_cold = "Extreme cold"); tbl_2

# Save as html table (this can be opened with Excel)
gtsave(tbl_2, "results/table_2.html")

Reproducible analyses:

What are coding best practices?

Why should I care?

How to approach coding?

Organizing our code

Sample directory architechture

Sample directory architechture

Organization within scripts

Consistency

Generalizability

Clarity

There is no great writing, only great rewriting.

Sample code scripts

Version control & sharing code

Questions or comments?