Think about our code as a book that can be read front to back.
\(~\)
Separate scripts: Chapters in a book.
Sections within scripts: Paragraphs within a chapter.
Comments within sections: Handwritten annotations.
\(~\)
00_import_data.R
library(tidyverse)# File paths --------------------------------------------------------------path_diabetes <-"//files.drexel.edu/colleges/SOPH/Shared/UHC/Projects/Wellcome_Trust/Manuscripts/MS219_RangelMoreno/Data Request/Data Build_20250416/Derived Data"path_BEC <-"//files.drexel.edu/colleges/SOPH/Shared/UHC/Projects/Wellcome_Trust/Data Methods Core/Data/BEC Data/BEC_L1AD_08162023.csv"# Read in data ------------------------------------------------------------# Temperature & mortality data by city, date, age, and sexdf <- path_diabetes |>list.files(full.names =TRUE) |>map(\(path_city) haven::read_sas(path_city)) |>list_rbind()# Build environment core data (used to extract climate zones)df_BEC <-read_csv(path_BEC)# Format data -------------------------------------------------------------df <- df |># Restrict age to adults over 20filter(thisage %in%c("20-34", "35-49", "50-64", "65+")) |># Renaming/formattingmutate(date =ymd(allDate),temp = L1ADtemp_pw,t_percent = tmp_pw_percentile/100,age =fct_collapse(thisage,`<65`=c("20-34", "35-49", "50-64"),`65+`=c("65+")),country =factor(country, levels =c("BR", "CL", "CO", "CR", "GT", "SV", "MX", "PA", "PE")),.keep ="unused") |># Sum diabetes cases by city, date, and age groupsummarize(across(c(deaths, diabetes, kidney), sum), across(c(temp, t_percent, country), first),.by =c(salid1, date, age)) |># Order observations by city, date, and age grouparrange(across(c(salid1, date, age)))df_BEC <- df_BEC |># Extract L1 identifier and climate zoneselect(SALID1, BECCZL1AD) |># Create usable climate_zone factormutate(climate_zone =case_when( BECCZL1AD %in%1:4~"Tropical", BECCZL1AD %in%5:8~"Arid", BECCZL1AD %in%9:17~"Temperate", BECCZL1AD %in%18:29~"Continental", BECCZL1AD %in%30:31~"Polar",.default ="Other") |>factor(),.keep ="unused") |>rename_with(tolower)# Join and save data ------------------------------------------------------# Add climate zone to temp/mortality datadf <-left_join(df, df_BEC, by ="salid1")# Save formatted as .rds filesaveRDS(df, "data/diabetes.rds")
Write code so it can be run on any computer with minimal changes.
Absolute pathing vs. relative pathing.
\(~\)
Avoid copy & pasting code like the plague!
Save important global variables to the global environment.
Utilize mapping or loops.
Use helper functions to keep code human readable.
Clarity
There is no great writing, only great rewriting.
\[~\]
Get code working while applying these principles.
Revise code systematically.
Goal: code should be as simple and clear as possible while accomplishing your goal.
Sample code scripts
README.md
# SummaryThis repository includes the companion code to the paper *Non-optimal temperature and diabetes mellitus mortality in 329 Latin American cities.*## AbstractDiabetes mellitus is the fifth leading cause of mortality in Latin America (LATAM). This highly urbanized region has experienced rising temperatures in recent years and increasing frequency of extreme temperature events. Little is known about the association between ambient temperature and diabetes mortality, and its differences across age groups and climate zones in urban areas. We assessed the association between daily ambient temperature and diabetes mortality in 329 LATAM cities across nine countries from 2000 to 2022, as well as the heterogeneity of effects associated with age and climate zones. We used a case time series design and conditional quasi-Poisson distributed lags non-linear models (0-21 day lags). We analyzed 2,223,748 diabetes deaths across all cities. We observed an inverse J-shaped curve association between temperature and diabetes mortality, with the minimum mortality temperature percentile (MMTp) at the 83rd. Diabetes mortality risk was higher at temperatures below the MMTp than at temperatures above it. We estimated that 0.5% (95% CI: 0.3%, 0.6%) of diabetes deaths were attributable to heat temperatures, whereas the excess death fraction for cold was 8.3% (95% CI: 7.4%, 9.2%). Effects of cold were slightly stronger in adults <65 years old and in people living in arid cities, while heat showed a higher risk for adults ≥65 years of age. Both heat and cold are associated with increased diabetes mellitus mortality in Latin American cities. Strategies to protect people with diabetes against the impacts of ambient temperature are needed, such as creating health monitoring programs, issuing early temperature alerts for diabetes patients, and promoting access to thermally comfortable environments.# CodeScripts:- `00_import_data.R`: Imports all external DMC data and creates the `diabetes.rds` data frame used in all analyses.- `01_descriptive_statistics.R`: Computes descriptive statistics and generates Table 1 and the climate zone temperature table.- `02_analysis_overall.R`: Fits model for the overall analysis.- `03_analysis_age.R`: Fits models for age specific analysis. Requires `02_analysis_overall.R` to be run.- `04_analysis_climate.R`: Fits models for climate-zone specific analysis. Requires `02_analysis_overall.R` to be run.- `05_figures_base.R`: Generates Figures 1, 2, 3 with base R graphics. Depends on the relevant analysis scripts to be run.- `05_figures_ggplot2.R`: Generates Figures 1, 2, 3 with ggplot2. Depends on the relevant analysis scripts to be run.- `06_edfs.R`: Calculates the excess death fractions (EDFs) and EDF intervals for all analysis. Depends on the relevant analysis scripts to be run. Generates Table 2.# DataData for this project comes from the DMC.
00_import_data.R
library(tidyverse)# File paths --------------------------------------------------------------path_diabetes <-"//files.drexel.edu/colleges/SOPH/Shared/UHC/Projects/Wellcome_Trust/Manuscripts/MS219_RangelMoreno/Data Request/Data Build_20250416/Derived Data"path_BEC <-"//files.drexel.edu/colleges/SOPH/Shared/UHC/Projects/Wellcome_Trust/Data Methods Core/Data/BEC Data/BEC_L1AD_08162023.csv"# Read in data ------------------------------------------------------------# Temperature & mortality data by city, date, age, and sexdf <- path_diabetes |>list.files(full.names =TRUE) |>map(\(path_city) haven::read_sas(path_city)) |>list_rbind()# Build environment core data (used to extract climate zones)df_BEC <-read_csv(path_BEC)# Format data -------------------------------------------------------------df <- df |># Restrict age to adults over 20filter(thisage %in%c("20-34", "35-49", "50-64", "65+")) |># Renaming/formattingmutate(date =ymd(allDate),temp = L1ADtemp_pw,t_percent = tmp_pw_percentile/100,age =fct_collapse(thisage,`<65`=c("20-34", "35-49", "50-64"),`65+`=c("65+")),country =factor(country, levels =c("BR", "CL", "CO", "CR", "GT", "SV", "MX", "PA", "PE")),.keep ="unused") |># Sum diabetes cases by city, date, and age groupsummarize(across(c(deaths, diabetes, kidney), sum), across(c(temp, t_percent, country), first),.by =c(salid1, date, age)) |># Order observations by city, date, and age grouparrange(across(c(salid1, date, age)))df_BEC <- df_BEC |># Extract L1 identifier and climate zoneselect(SALID1, BECCZL1AD) |># Create usable climate_zone factormutate(climate_zone =case_when( BECCZL1AD %in%1:4~"Tropical", BECCZL1AD %in%5:8~"Arid", BECCZL1AD %in%9:17~"Temperate", BECCZL1AD %in%18:29~"Continental", BECCZL1AD %in%30:31~"Polar",.default ="Other") |>factor(),.keep ="unused") |>rename_with(tolower)# Join and save data ------------------------------------------------------# Add climate zone to temp/mortality datadf <-left_join(df, df_BEC, by ="salid1")# Save formatted as .rds filesaveRDS(df, "data/diabetes.rds")
01_descriptive_statistics.R
library(tidyverse)library(glue)library(gt)# Load datadf <-readRDS("data/diabetes.rds")# Descriptive results -----------------------------------------------------# Total of all diabetes deaths total_diabetes <-sum(df$diabetes); total_diabetes# Percentage of diabetes deaths by agedf |>summarize(pct_diabetes =100*sum(diabetes)/total_diabetes, .by = age)# Table 1 -----------------------------------------------------------------# Columns 1-5tbl_11 <- df |>summarize(num_cities =n_distinct(salid1),study_period =glue("({first_year}-{final_year})",first_year =min(year(date)),final_year =max(year(date))),tot_diabetes =sum(diabetes),pct_diabetes =100*sum(diabetes)/total_diabetes,.by = country) |>arrange(country); tbl_11# Column 6 [NOT READY, MAY DEPEND ON POP PROJECTIONS DATA]tbl_12 <- df |>mutate(year =year(date)) |>summarize(annual_diabetes =sum(diabetes), .by =c(country, year)) |>summarize(avg_annual_diabetes =glue("{med} ({lower}, {upper})",med =median(annual_diabetes),lower =quantile(annual_diabetes, .1),upper =quantile(annual_diabetes, .9)),.by = country) |>arrange(country); tbl_12# Column 7tbl_13 <- df |>summarize(temp =first(temp), .by =c(country, salid1, date)) |>summarize(temp =glue("{med} ({lower}, {upper})",med =median(temp) |>round(1),lower =quantile(temp, .1) |>round(1),upper =quantile(temp, .9) |>round(1)),.by = country) |>arrange(country); tbl_13# Create tabletbl_1 <- tbl_11 |>left_join(tbl_12, by ="country") |>left_join(tbl_13, by ="country") |>gt() |>tab_header(title ="Table 1. Diabetes mellitus mortality and temperature characteristics in 329 selected cities of Latin America") |>tab_spanner(label ="Total diabetes deaths", columns =c(tot_diabetes, pct_diabetes)) |>fmt_number(columns = tot_diabetes, decimals =0) |>fmt_number(columns = pct_diabetes, decimals =1) |>tab_footnote(location =cells_column_labels(columns =c(avg_annual_diabetes, temp)),footnote ="Median (10th percentile, 90th percentile)") |>tab_footnote(location =cells_column_labels(columns = temp),footnote ="Rates per 100,000 inhabitants") |>opt_footnote_marks(marks ="letters") |>cols_label(country ="Country",num_cities ="No. of cities",study_period ="Study period",tot_diabetes ="n",pct_diabetes ="(%)",avg_annual_diabetes ="Average annual diabetes mortality rate",temp ="Daily mean temperature (°C)"); tbl_1# Save as html table (this can be opened with Excel)gtsave(tbl_1, "results/table_1.html")# Climate zone table ------------------------------------------------------# Create tabletbl_climate_zone <- df |>summarize(num_cities =n_distinct(salid1),min_temp =min(temp),mean_temp =mean(temp),max_temp =max(temp),p01 =quantile(temp, .01),p25 =quantile(temp, .25),p50 =quantile(temp, .5),p75 =quantile(temp, .75),p99 =quantile(temp, .99),.by = climate_zone) |>arrange(climate_zone) |>gt() |>tab_header(title ="Temperature (°C) by climate zone") |>tab_spanner(label ="Percentile", columns =c(p01, p25, p50, p75, p99)) |>fmt_number(columns =c(min_temp, mean_temp, max_temp, p01, p25, p50, p75, p99),decimals =1) |>cols_label(climate_zone ="Climate zone",num_cities ="Number of cities",min_temp ="Minimum",mean_temp ="Mean",max_temp ="Maximum",p01 ="1st",p25 ="25th",p50 ="50th",p75 ="75th",p99 ="99th")# Save as html table (this can be opened with Excel)gtsave(tbl_climate_zone, "results/table_climate_zone.html")
02_analysis_overall.R
library(tidyverse)library(glue)library(gt)library(gnm); library(dlnm); library(splines)library(SALURhelper)# Load datadf <-readRDS("data/diabetes.rds") |>mutate(strata =glue("{salid1}:{age}:{year(date)}:{month(date)}:{wday(date, label = TRUE)}") |>factor(),group =glue("{salid1}:{age}") |>factor())glimpse(df)# Fit model ---------------------------------------------------------------# Model specificationsn_lag <-21splines_var <-list(fun ="ns", knots =c(.1, .75, .9))splines_lag <-list(fun ="ns", knots =logknots(n_lag, nk =3))seq_pred <-seq(0, 1, length.out =101) # Sequence of temp %s for predictionsseq_RR_temps <-c(2, 6, 26, 76, 96, 100) # Used to extract RRs at .01, .05, .25, .75, .95, .99 temperature percentiles# Define cross-basiscbt <-crossbasis(df$t_percent,lag = n_lag,argvar = splines_var,arglag = splines_lag,group = df$group)# Fit conditional quasi-Poisson DLNMfit <-gnm(diabetes ~ cbt,family = quasipoisson,eliminate = strata,data = df)# Minimum mortality temperature percentileMMTp <-crossreduce(cbt, fit, at = seq_pred, cen = .5) |>get_MMT()# Mean risk ratio via reduced predictionsred <-crossreduce(cbt, fit, at = seq_pred, cen = MMTp)red |>get_df_RRs() |>slice(seq_RR_temps) # Extract RRs at specified temp percentiles
03_analysis_age.R
# source("02_analysis_overall.R")# Add age category indicator variablesdf <- df |>mutate(under_65 =ifelse(age !="65+", 1, 0),over_65 =ifelse(age =="65+", 1, 0))glimpse(df)# Fit age-specific effect modification models -----------------------------# Define interaction cross-basescb_age1 <- cbt*df$under_65cb_age2 <- cbt*df$over_65# Fit CTS with age-specific reference groupsfit_age1 <-update(fit, . ~ . + cb_age2) # reference group: <65fit_age2 <-update(fit, . ~ . + cb_age1) # reference group: 65+# Minimum mortality temperature percentileMMTp_age1 <-crossreduce(cbt, fit_age1, at = seq_pred, cen = .5) |>get_MMT()MMTp_age2 <-crossreduce(cbt, fit_age2, at = seq_pred, cen = .5) |>get_MMT()# Mean risk ratio via reduced predictionsred_age1 <-crossreduce(cbt, fit_age1, at = seq_pred, cen = MMTp_age1)red_age2 <-crossreduce(cbt, fit_age2, at = seq_pred, cen = MMTp_age2)list(red_age1, red_age2) |>map(\(x) slice(get_df_RRs(x), seq_RR_temps)) # Extract RRs at specified temp percentiles