Time-weighted exposure metrics are essential in environmental epidemiology for capturing cumulative or lagged effects of exposures such as noise or air pollution. The calculate_tw_exposure() function provides a flexible way to compute time-weighted averages, minimum/maximum values, and missingness metrics for configurable time windows around each observation.
The calculate_tw_exposure() function computes a 5-year time-weighted average for each year, using the current year and the previous four years of data. The function also provides information on the minimum and maximum values in the window, as well as the number of missing periods. This is useful for understanding the data quality and completeness of the exposure data.
The calculate_tw_exposure() function can also be used to compute a 5-year time-weighted average for the previous year (lagged exposure). This is useful for assessing the impact of past exposures on current health outcomes. This approach is particularly relevant in epidemiological studies where the effects of exposure may not be immediate and can manifest over time.
Visualize 5-Year Averages
Code
ggplot(result_energy_mean, aes(x =year, y =lden_5yr_avg, color =factor(id)))+geom_line()+geom_point()+labs( title ="5-Year Time-Weighted Energy-Based Mean Noise Exposure", x ="Year", y ="5-Year Avg Lden (dB)", color ="ID")+theme_minimal()
Comparing Averaging Methods
Code
result_energy_median<-calculate_tw_exposure( data =sample_data, timewindow_start =-4, timewindow_end =0, energy =TRUE, average_function ="median", result_variable ="lden_5yr_avg")result_arith_mean<-calculate_tw_exposure( data =sample_data, timewindow_start =-4, timewindow_end =0, energy =FALSE, average_function ="mean", result_variable ="lden_5yr_avg")result_arith_median<-calculate_tw_exposure( data =sample_data, timewindow_start =-4, timewindow_end =0, energy =FALSE, average_function ="median", result_variable ="lden_5yr_avg")compare_methods<-bind_rows(result_energy_mean%>%filter(id==2, year==2018)%>%select(id, year, lden_5yr_avg, method_description),result_energy_median%>%filter(id==2, year==2018)%>%select(id, year, lden_5yr_avg, method_description),result_arith_mean%>%filter(id==2, year==2018)%>%select(id, year, lden_5yr_avg, method_description),result_arith_median%>%filter(id==2, year==2018)%>%select(id, year, lden_5yr_avg, method_description))compare_methods%>%gt::gt()
ggplot(result_energy_mean, aes(x =year, y =missing_percentage, color =factor(id)))+geom_col(position ="dodge", fill =NA)+labs( title ="Missing Data Percentage in 5-Year Windows", x ="Year", y ="Missing Data (%)", color ="ID")+theme_minimal()
Example: Quarterly Data and 2-Year Preceding Average
Simulated Quarterly Data
Code
quarterly_sample_data<-expand.grid( id =1:2, quarter =1:16)%>%mutate( year =2015+(quarter-1)%/%4, qtr =((quarter-1)%%4)+1, period_id =paste0(year, "Q", qtr), lden =60+5*sin(quarter/2)+rnorm(n(), 0, 2))head(quarterly_sample_data)%>%gt::gt()
A 2-year (8-quarter) time-weighted average is calculated for each quarter, using all available data in the window. This approach is useful for exposures with potential cumulative or lagged effects over multiple years.
Best Practices
Use energy = TRUE for decibel or other logarithmic exposures
Choose window size based on exposure-outcome latency
Filter or flag periods with high missingness
Compare mean vs median for robustness
Session Info
Source Code
---title: "Time-Weighted Exposure Aggregation: Demo of calculate_tw_exposure() Function"format: html: toc: true toc-depth: 4 toc-location: left code-fold: show code-tools: true code-link: true embed-resources: true theme: cosmo self-contained: true pdf: toc: true toc-depth: 4 number-sections: true colorlinks: true code-fold: show keep-tex: falseexecute: echo: true warning: false message: falsebibliography: references.bib---```{r}#| label: setup#| eval: true#| include: false#| warning: false#| echo: falselibrary(dplyr)library(ggplot2)library(gt)set.seed(123)# Source the function file (adjust path as needed)source("calculate_tw_exposure.R")```# IntroductionTime-weighted exposure metrics are essential in environmental epidemiology for capturing cumulative or lagged effects of exposures such as noise or air pollution. The `calculate_tw_exposure()` function provides a flexible way to compute time-weighted averages, minimum/maximum values, and missingness metrics for configurable time windows around each observation.# Function Overview## Function Signature```{r}#| label: function-signature#| eval: falsecalculate_tw_exposure(data, id_var ="id", time_period_var ="year", exposure_var ="lden", timewindow_start =-5,timewindow_end =0,return_all =TRUE,energy =TRUE,average_function ="mean",result_variable ="tw_avg_exposure")```## Key Parameters| Parameter | Description ||-----------|-------------|| `data` | Data frame with exposure data || `id_var` | ID variable name (default: "id") || `time_period_var` | Time period variable (e.g., year) || `exposure_var` | Exposure variable (e.g., lden) || `timewindow_start` | Start of window (negative integer, e.g., -5) || `timewindow_end` | End of window (integer, e.g., 0) || `energy` | Use energy-based averaging (for dB) || `average_function` | "mean" or "median" || `result_variable` | Name for result column |## Return ValueThe function returns a data frame with the following columns (in addition to the original data):| Column Name | Description ||------------------------|----------------------------------------------------------------------------------------------|| [original columns] | All columns from the input data || [result_variable] | Calculated time-weighted average exposure for the window (name set by result_variable param) || window_start | Start of the time window (period value) || window_end | End of the time window (period value) || window_min_value | Minimum exposure value in the window (excluding NAs) || window_max_value | Maximum exposure value in the window (excluding NAs) || periods_in_window | Number of non-missing periods in the window || missing_periods | Number of missing periods in the window || missing_percentage | Percentage of missing periods in the window || window_range | Text description of the window range (e.g., "2010 - 2014") || window_description | Human-readable description of the window (e.g., "Current year and 4 periods back") || method_description | Description of the averaging method used (e.g., "Energy-based mean") |# Example Usage## Simulated Data```{r}#| label: example-datasample_data <-expand.grid(id =1:3,year =2010:2020) %>%as.data.frame() %>%mutate(base_lden =case_when( id ==1~60, id ==2~65, id ==3~58 ),variation =rnorm(n(), 0, 4),lden = base_lden + variation,lden =ifelse(id ==1& year ==2013, NA, lden),lden =ifelse(id ==2& year %in%c(2011, 2012), NA, lden) )# Show sample of the datahead(sample_data) %>% gt::gt()```## Calculate 5-Year Energy-Based Mean```{r}#| label: tw-energy-meanresult_energy_mean <-calculate_tw_exposure(data = sample_data,timewindow_start =-4,timewindow_end =0,energy =TRUE,average_function ="mean",result_variable ="lden_5yr_avg")result_energy_mean %>%filter(id ==1) %>%select(id, year, window_range, window_description, lden, lden_5yr_avg, window_min_value, window_max_value, periods_in_window, missing_percentage ) %>%slice(4:20) %>% gt::gt()```The `calculate_tw_exposure()` function computes a 5-year time-weighted average for each year, using the current year and the previous four years of data. The function also provides information on the minimum and maximum values in the window, as well as the number of missing periods.This is useful for understanding the data quality and completeness of the exposure data.```{r}#| label: tw-energy-mean-lag1#| eval: falseresult_energy_mean_lag <-calculate_tw_exposure(data = sample_data,timewindow_start =-5,timewindow_end =0,energy =TRUE,average_function ="mean",result_variable ="lden_lag1_6_5yr_avg")result_energy_mean %>%filter(id ==1) %>%select(id, year, window_range, window_description, lden, lden_lag1_6_5yr_avg, window_min_value, window_max_value, periods_in_window, missing_percentage ) %>%slice(4:20) %>% gt::gt()```The `calculate_tw_exposure()` function can also be used to compute a 5-year time-weighted average for the previous year (lagged exposure). This is useful for assessing the impact of past exposures on current health outcomes.This approach is particularly relevant in epidemiological studies where the effects of exposure may not be immediate and can manifest over time.## Visualize 5-Year Averages```{r}#| label: plot-tw-avg#| fig-width: 8#| fig-height: 5ggplot(result_energy_mean, aes(x = year, y = lden_5yr_avg, color =factor(id))) +geom_line() +geom_point() +labs(title ="5-Year Time-Weighted Energy-Based Mean Noise Exposure",x ="Year",y ="5-Year Avg Lden (dB)",color ="ID" ) +theme_minimal()```# Comparing Averaging Methods```{r}#| label: compare-methodsresult_energy_median <-calculate_tw_exposure(data = sample_data,timewindow_start =-4,timewindow_end =0,energy =TRUE,average_function ="median",result_variable ="lden_5yr_avg")result_arith_mean <-calculate_tw_exposure(data = sample_data,timewindow_start =-4,timewindow_end =0,energy =FALSE,average_function ="mean",result_variable ="lden_5yr_avg")result_arith_median <-calculate_tw_exposure(data = sample_data,timewindow_start =-4,timewindow_end =0,energy =FALSE,average_function ="median",result_variable ="lden_5yr_avg")compare_methods <-bind_rows( result_energy_mean %>%filter(id ==2, year ==2018) %>%select(id, year, lden_5yr_avg, method_description), result_energy_median %>%filter(id ==2, year ==2018) %>%select(id, year, lden_5yr_avg, method_description), result_arith_mean %>%filter(id ==2, year ==2018) %>%select(id, year, lden_5yr_avg, method_description), result_arith_median %>%filter(id ==2, year ==2018) %>%select(id, year, lden_5yr_avg, method_description))compare_methods %>% gt::gt()```# Handling Missing Data```{r}#| label: missingnessresult_energy_mean %>%filter(id ==1) %>%select(year, periods_in_window, missing_periods, missing_percentage) %>%head(10) %>% gt::gt()```# Data Quality Visualization```{r}#| label: plot-missingness#| fig-width: 8#| fig-height: 5ggplot(result_energy_mean, aes(x = year, y = missing_percentage, color =factor(id))) +geom_col(position ="dodge", fill =NA) +labs(title ="Missing Data Percentage in 5-Year Windows",x ="Year",y ="Missing Data (%)",color ="ID" ) +theme_minimal()```# Example: Quarterly Data and 2-Year Preceding Average## Simulated Quarterly Data```{r}#| label: quarterly-demo-dataquarterly_sample_data <-expand.grid(id =1:2,quarter =1:16) %>%mutate(year =2015+ (quarter -1) %/%4,qtr = ((quarter -1) %%4) +1,period_id =paste0(year, "Q", qtr),lden =60+5*sin(quarter /2) +rnorm(n(), 0, 2) )head(quarterly_sample_data) %>% gt::gt()```## Calculate 2-Year (8-Quarter) Preceding Average```{r}#| label: tw-2yr-avgresult_2yr_avg <-calculate_tw_exposure(data = quarterly_sample_data,id_var ="id",time_period_var ="quarter",exposure_var ="lden",timewindow_start =-8,timewindow_end =0,energy =TRUE,average_function ="mean",result_variable ="lden_2yr_avg")result_2yr_avg %>%filter(id ==1) %>%select(id, quarter, lden, lden_2yr_avg, window_min_value, window_max_value, periods_in_window, missing_percentage) %>%head(12) %>% gt::gt()```A 2-year (8-quarter) time-weighted average is calculated for each quarter, using all available data in the window. This approach is useful for exposures with potential cumulative or lagged effects over multiple years.# Best Practices- Use `energy = TRUE` for decibel or other logarithmic exposures- Choose window size based on exposure-outcome latency- Filter or flag periods with high missingness- Compare mean vs median for robustness# Session Info```{r}#| eval: false#| echo: false#| label: session-infosessionInfo()```