Time-Weighted Exposure Aggregation: Demo of calculate_tw_exposure() Function

Introduction

Time-weighted exposure metrics are essential in environmental epidemiology for capturing cumulative or lagged effects of exposures such as noise or air pollution. The calculate_tw_exposure() function provides a flexible way to compute time-weighted averages, minimum/maximum values, and missingness metrics for configurable time windows around each observation.

Function Overview

Function Signature

Code
calculate_tw_exposure(data, 
                     id_var = "id", 
                     time_period_var = "year", 
                     exposure_var = "lden", 
                     timewindow_start = -5,
                     timewindow_end = 0,
                     return_all = TRUE,
                     energy = TRUE,
                     average_function = "mean",
                     result_variable = "tw_avg_exposure")

Key Parameters

Parameter Description
data Data frame with exposure data
id_var ID variable name (default: “id”)
time_period_var Time period variable (e.g., year)
exposure_var Exposure variable (e.g., lden)
timewindow_start Start of window (negative integer, e.g., -5)
timewindow_end End of window (integer, e.g., 0)
energy Use energy-based averaging (for dB)
average_function “mean” or “median”
result_variable Name for result column

Return Value

The function returns a data frame with the following columns (in addition to the original data):

Column Name Description
[original columns] All columns from the input data
[result_variable] Calculated time-weighted average exposure for the window (name set by result_variable param)
window_start Start of the time window (period value)
window_end End of the time window (period value)
window_min_value Minimum exposure value in the window (excluding NAs)
window_max_value Maximum exposure value in the window (excluding NAs)
periods_in_window Number of non-missing periods in the window
missing_periods Number of missing periods in the window
missing_percentage Percentage of missing periods in the window
window_range Text description of the window range (e.g., “2010 - 2014”)
window_description Human-readable description of the window (e.g., “Current year and 4 periods back”)
method_description Description of the averaging method used (e.g., “Energy-based mean”)

Example Usage

Simulated Data

Code
sample_data <- expand.grid(
  id = 1:3,
  year = 2010:2020
) %>%
  as.data.frame() %>%
  mutate(
    base_lden = case_when(
      id == 1 ~ 60,
      id == 2 ~ 65,
      id == 3 ~ 58
    ),
    variation = rnorm(n(), 0, 4),
    lden = base_lden + variation,
    lden = ifelse(id == 1 & year == 2013, NA, lden),
    lden = ifelse(id == 2 & year %in% c(2011, 2012), NA, lden)
  )

# Show sample of the data
head(sample_data) %>%
  gt::gt()
id year base_lden variation lden
1 2010 60 3.5125340 63.51253
2 2010 65 3.2863243 68.28632
3 2010 58 2.7545610 60.75456
1 2011 60 2.2156706 62.21567
2 2011 65 -0.2476468 NA
3 2011 58 -1.2238507 56.77615

Calculate 5-Year Energy-Based Mean

Code
result_energy_mean <- calculate_tw_exposure(
  data = sample_data,
  timewindow_start = -4,
  timewindow_end = 0,
  energy = TRUE,
  average_function = "mean",
  result_variable = "lden_5yr_avg"
)

result_energy_mean %>%
  filter(id == 1) %>%
  select(id, year, window_range, window_description, lden, lden_5yr_avg, window_min_value, window_max_value, periods_in_window, missing_percentage ) %>%
  slice(4:20) %>%
  gt::gt()
id year window_range window_description lden lden_5yr_avg window_min_value window_max_value periods_in_window missing_percentage
1 2013 2009 - 2013 Current year and 4 periods back NA 61.87066 58.47812 63.51253 3 40
1 2014 2010 - 2014 Current year and 4 periods back 55.50757 60.94349 55.50757 63.51253 4 20
1 2015 2011 - 2015 Current year and 4 periods back 63.11986 60.77050 55.50757 63.11986 4 20
1 2016 2012 - 2016 Current year and 4 periods back 59.88581 60.09128 55.50757 63.11986 4 20
1 2017 2013 - 2017 Current year and 4 periods back 59.09692 60.20447 55.50757 63.11986 4 20
1 2018 2014 - 2018 Current year and 4 periods back 62.33845 60.72337 55.50757 63.11986 5 0
1 2019 2015 - 2019 Current year and 4 periods back 61.51856 61.44221 59.09692 63.11986 5 0
1 2020 2016 - 2020 Current year and 4 periods back 55.92570 60.26091 55.92570 62.33845 5 0

The calculate_tw_exposure() function computes a 5-year time-weighted average for each year, using the current year and the previous four years of data. The function also provides information on the minimum and maximum values in the window, as well as the number of missing periods. This is useful for understanding the data quality and completeness of the exposure data.

Code
result_energy_mean_lag <- calculate_tw_exposure(
  data = sample_data,
  timewindow_start = -5,
  timewindow_end = 0,
  energy = TRUE,
  average_function = "mean",
  result_variable = "lden_lag1_6_5yr_avg"
)

result_energy_mean %>%
  filter(id == 1) %>%
  select(id, year, window_range, window_description, lden, lden_lag1_6_5yr_avg, window_min_value, window_max_value, periods_in_window, missing_percentage ) %>%
  slice(4:20) %>%
  gt::gt()

The calculate_tw_exposure() function can also be used to compute a 5-year time-weighted average for the previous year (lagged exposure). This is useful for assessing the impact of past exposures on current health outcomes. This approach is particularly relevant in epidemiological studies where the effects of exposure may not be immediate and can manifest over time.

Visualize 5-Year Averages

Code
ggplot(result_energy_mean, aes(x = year, y = lden_5yr_avg, color = factor(id))) +
  geom_line() +
  geom_point() +
  labs(
    title = "5-Year Time-Weighted Energy-Based Mean Noise Exposure",
    x = "Year",
    y = "5-Year Avg Lden (dB)",
    color = "ID"
  ) +
  theme_minimal()

Comparing Averaging Methods

Code
result_energy_median <- calculate_tw_exposure(
  data = sample_data,
  timewindow_start = -4,
  timewindow_end = 0,
  energy = TRUE,
  average_function = "median",
  result_variable = "lden_5yr_avg"
)

result_arith_mean <- calculate_tw_exposure(
  data = sample_data,
  timewindow_start = -4,
  timewindow_end = 0,
  energy = FALSE,
  average_function = "mean",
  result_variable = "lden_5yr_avg"
)

result_arith_median <- calculate_tw_exposure(
  data = sample_data,
  timewindow_start = -4,
  timewindow_end = 0,
  energy = FALSE,
  average_function = "median",
  result_variable = "lden_5yr_avg"
)

compare_methods <- bind_rows(
  result_energy_mean %>% filter(id == 2, year == 2018) %>% select(id, year, lden_5yr_avg, method_description),
  result_energy_median %>% filter(id == 2, year == 2018) %>% select(id, year, lden_5yr_avg, method_description),
  result_arith_mean %>% filter(id == 2, year == 2018) %>% select(id, year, lden_5yr_avg, method_description),
  result_arith_median %>% filter(id == 2, year == 2018) %>% select(id, year, lden_5yr_avg, method_description)
)

compare_methods %>%
  gt::gt()
id year lden_5yr_avg method_description
2 2018 66.89778 Energy-based mean
2 2018 64.82852 Energy-based median
2 2018 65.88896 Arithmetic mean
2 2018 64.82852 Arithmetic median

Handling Missing Data

Code
result_energy_mean %>%
  filter(id == 1) %>%
  select(year, periods_in_window, missing_periods, missing_percentage) %>%
  head(10) %>%
  gt::gt()
year periods_in_window missing_periods missing_percentage
2010 1 4 80
2011 2 3 60
2012 3 2 40
2013 3 2 40
2014 4 1 20
2015 4 1 20
2016 4 1 20
2017 4 1 20
2018 5 0 0
2019 5 0 0

Data Quality Visualization

Code
ggplot(result_energy_mean, aes(x = year, y = missing_percentage, color = factor(id))) +
  geom_col(position = "dodge", fill = NA) +
  labs(
    title = "Missing Data Percentage in 5-Year Windows",
    x = "Year",
    y = "Missing Data (%)",
    color = "ID"
  ) +
  theme_minimal()

Example: Quarterly Data and 2-Year Preceding Average

Simulated Quarterly Data

Code
quarterly_sample_data <- expand.grid(
  id = 1:2,
  quarter = 1:16
) %>%
  mutate(
    year = 2015 + (quarter - 1) %/% 4,
    qtr = ((quarter - 1) %% 4) + 1,
    period_id = paste0(year, "Q", qtr),
    lden = 60 + 5 * sin(quarter / 2) + rnorm(n(), 0, 2)
  )

head(quarterly_sample_data) %>%
  gt::gt()
id quarter year qtr period_id lden
1 1 2015 1 2015Q1 64.38414
2 1 2015 1 2015Q1 63.49392
1 2 2015 2 2015Q2 64.68482
2 2 2015 2 2015Q2 62.95154
1 3 2015 3 2015Q3 67.70878
2 3 2015 3 2015Q3 63.78696

Calculate 2-Year (8-Quarter) Preceding Average

Code
result_2yr_avg <- calculate_tw_exposure(
  data = quarterly_sample_data,
  id_var = "id",
  time_period_var = "quarter",
  exposure_var = "lden",
  timewindow_start = -8,
  timewindow_end = 0,
  energy = TRUE,
  average_function = "mean",
  result_variable = "lden_2yr_avg"
)

result_2yr_avg %>%
  filter(id == 1) %>%
  select(id, quarter, lden, lden_2yr_avg, window_min_value, window_max_value, periods_in_window, missing_percentage) %>%
  head(12) %>%
  gt::gt()
id quarter lden lden_2yr_avg window_min_value window_max_value periods_in_window missing_percentage
1 1 64.38414 64.38414 64.38414 64.38414 1 88.88889
1 2 64.68482 64.53708 64.38414 64.68482 2 77.77778
1 3 67.70878 65.86792 64.38414 67.70878 3 66.66667
1 4 68.92115 66.85424 64.38414 68.92115 4 55.55556
1 5 62.52096 66.26806 62.52096 68.92115 5 44.44444
1 6 59.28479 65.64683 59.28479 68.92115 6 33.33333
1 7 57.75270 65.09335 57.75270 68.92115 7 22.22222
1 8 54.31275 64.56496 54.31275 68.92115 8 11.11111
1 9 53.54254 64.09612 53.54254 68.92115 9 0.00000
1 10 54.44493 63.60615 53.54254 68.92115 9 0.00000
1 11 55.32160 63.02154 53.54254 68.92115 9 0.00000
1 12 55.36716 61.42334 53.54254 68.92115 9 0.00000

A 2-year (8-quarter) time-weighted average is calculated for each quarter, using all available data in the window. This approach is useful for exposures with potential cumulative or lagged effects over multiple years.

Best Practices

  • Use energy = TRUE for decibel or other logarithmic exposures
  • Choose window size based on exposure-outcome latency
  • Filter or flag periods with high missingness
  • Compare mean vs median for robustness

Session Info