Time-Weighted Exposure Aggregation: Demo of calculate_tw_exposure() Function

Introduction

Time-weighted exposure metrics are essential in environmental epidemiology for capturing cumulative or lagged effects of exposures such as noise or air pollution. The calculate_tw_exposure() function provides a flexible way to compute time-weighted averages, minimum/maximum values, and missingness metrics for configurable time windows around each observation.

Function Overview

Function Signature

Code

calculate_tw_exposure(data, 
                     id_var = "id", 
                     time_period_var = "year", 
                     exposure_var = "lden", 
                     timewindow_start = -5,
                     timewindow_end = 0,
                     return_all = TRUE,
                     energy = TRUE,
                     average_function = "mean",
                     result_variable = "tw_avg_exposure")

Key Parameters

Parameter	Description
`data`	Data frame with exposure data
`id_var`	ID variable name (default: “id”)
`time_period_var`	Time period variable (e.g., year)
`exposure_var`	Exposure variable (e.g., lden)
`timewindow_start`	Start of window (negative integer, e.g., -5)
`timewindow_end`	End of window (integer, e.g., 0)
`energy`	Use energy-based averaging (for dB)
`average_function`	“mean” or “median”
`result_variable`	Name for result column

Return Value

The function returns a data frame with the following columns (in addition to the original data):

Column Name	Description
[original columns]	All columns from the input data
[result_variable]	Calculated time-weighted average exposure for the window (name set by result_variable param)
window_start	Start of the time window (period value)
window_end	End of the time window (period value)
window_min_value	Minimum exposure value in the window (excluding NAs)
window_max_value	Maximum exposure value in the window (excluding NAs)
periods_in_window	Number of non-missing periods in the window
missing_periods	Number of missing periods in the window
missing_percentage	Percentage of missing periods in the window
window_range	Text description of the window range (e.g., “2010 - 2014”)
window_description	Human-readable description of the window (e.g., “Current year and 4 periods back”)
method_description	Description of the averaging method used (e.g., “Energy-based mean”)

Example Usage

Simulated Data

Code

sample_data <- expand.grid(
  id = 1:3,
  year = 2010:2020
) %>%
  as.data.frame() %>%
  mutate(
    base_lden = case_when(
      id == 1 ~ 60,
      id == 2 ~ 65,
      id == 3 ~ 58
    ),
    variation = rnorm(n(), 0, 4),
    lden = base_lden + variation,
    lden = ifelse(id == 1 & year == 2013, NA, lden),
    lden = ifelse(id == 2 & year %in% c(2011, 2012), NA, lden)
  )

# Show sample of the data
head(sample_data) %>%
  gt::gt()

id	year	base_lden	variation	lden
1	2010	60	3.5125340	63.51253
2	2010	65	3.2863243	68.28632
3	2010	58	2.7545610	60.75456
1	2011	60	2.2156706	62.21567
2	2011	65	-0.2476468	NA
3	2011	58	-1.2238507	56.77615

Calculate 5-Year Energy-Based Mean

Code

result_energy_mean <- calculate_tw_exposure(
  data = sample_data,
  timewindow_start = -4,
  timewindow_end = 0,
  energy = TRUE,
  average_function = "mean",
  result_variable = "lden_5yr_avg"
)

result_energy_mean %>%
  filter(id == 1) %>%
  select(id, year, window_range, window_description, lden, lden_5yr_avg, window_min_value, window_max_value, periods_in_window, missing_percentage ) %>%
  slice(4:20) %>%
  gt::gt()

id	year	window_range	window_description	lden	lden_5yr_avg	window_min_value	window_max_value	periods_in_window	missing_percentage
1	2013	2009 - 2013	Current year and 4 periods back	NA	61.87066	58.47812	63.51253	3	40
1	2014	2010 - 2014	Current year and 4 periods back	55.50757	60.94349	55.50757	63.51253	4	20
1	2015	2011 - 2015	Current year and 4 periods back	63.11986	60.77050	55.50757	63.11986	4	20
1	2016	2012 - 2016	Current year and 4 periods back	59.88581	60.09128	55.50757	63.11986	4	20
1	2017	2013 - 2017	Current year and 4 periods back	59.09692	60.20447	55.50757	63.11986	4	20
1	2018	2014 - 2018	Current year and 4 periods back	62.33845	60.72337	55.50757	63.11986	5	0
1	2019	2015 - 2019	Current year and 4 periods back	61.51856	61.44221	59.09692	63.11986	5	0
1	2020	2016 - 2020	Current year and 4 periods back	55.92570	60.26091	55.92570	62.33845	5	0

The calculate_tw_exposure() function computes a 5-year time-weighted average for each year, using the current year and the previous four years of data. The function also provides information on the minimum and maximum values in the window, as well as the number of missing periods. This is useful for understanding the data quality and completeness of the exposure data.

Code

result_energy_mean_lag <- calculate_tw_exposure(
  data = sample_data,
  timewindow_start = -5,
  timewindow_end = 0,
  energy = TRUE,
  average_function = "mean",
  result_variable = "lden_lag1_6_5yr_avg"
)

result_energy_mean %>%
  filter(id == 1) %>%
  select(id, year, window_range, window_description, lden, lden_lag1_6_5yr_avg, window_min_value, window_max_value, periods_in_window, missing_percentage ) %>%
  slice(4:20) %>%
  gt::gt()

The calculate_tw_exposure() function can also be used to compute a 5-year time-weighted average for the previous year (lagged exposure). This is useful for assessing the impact of past exposures on current health outcomes. This approach is particularly relevant in epidemiological studies where the effects of exposure may not be immediate and can manifest over time.

Visualize 5-Year Averages

Code

ggplot(result_energy_mean, aes(x = year, y = lden_5yr_avg, color = factor(id))) +
  geom_line() +
  geom_point() +
  labs(
    title = "5-Year Time-Weighted Energy-Based Mean Noise Exposure",
    x = "Year",
    y = "5-Year Avg Lden (dB)",
    color = "ID"
  ) +
  theme_minimal()

Comparing Averaging Methods

Code

result_energy_median <- calculate_tw_exposure(
  data = sample_data,
  timewindow_start = -4,
  timewindow_end = 0,
  energy = TRUE,
  average_function = "median",
  result_variable = "lden_5yr_avg"
)

result_arith_mean <- calculate_tw_exposure(
  data = sample_data,
  timewindow_start = -4,
  timewindow_end = 0,
  energy = FALSE,
  average_function = "mean",
  result_variable = "lden_5yr_avg"
)

result_arith_median <- calculate_tw_exposure(
  data = sample_data,
  timewindow_start = -4,
  timewindow_end = 0,
  energy = FALSE,
  average_function = "median",
  result_variable = "lden_5yr_avg"
)

compare_methods <- bind_rows(
  result_energy_mean %>% filter(id == 2, year == 2018) %>% select(id, year, lden_5yr_avg, method_description),
  result_energy_median %>% filter(id == 2, year == 2018) %>% select(id, year, lden_5yr_avg, method_description),
  result_arith_mean %>% filter(id == 2, year == 2018) %>% select(id, year, lden_5yr_avg, method_description),
  result_arith_median %>% filter(id == 2, year == 2018) %>% select(id, year, lden_5yr_avg, method_description)
)

compare_methods %>%
  gt::gt()

id	year	lden_5yr_avg	method_description
2	2018	66.89778	Energy-based mean
2	2018	64.82852	Energy-based median
2	2018	65.88896	Arithmetic mean
2	2018	64.82852	Arithmetic median

Handling Missing Data

Code

result_energy_mean %>%
  filter(id == 1) %>%
  select(year, periods_in_window, missing_periods, missing_percentage) %>%
  head(10) %>%
  gt::gt()

year	periods_in_window	missing_periods	missing_percentage
2010	1	4	80
2011	2	3	60
2012	3	2	40
2013	3	2	40
2014	4	1	20
2015	4	1	20
2016	4	1	20
2017	4	1	20
2018	5	0	0
2019	5	0	0

Data Quality Visualization

Code

ggplot(result_energy_mean, aes(x = year, y = missing_percentage, color = factor(id))) +
  geom_col(position = "dodge", fill = NA) +
  labs(
    title = "Missing Data Percentage in 5-Year Windows",
    x = "Year",
    y = "Missing Data (%)",
    color = "ID"
  ) +
  theme_minimal()

Example: Quarterly Data and 2-Year Preceding Average

Simulated Quarterly Data

Code

quarterly_sample_data <- expand.grid(
  id = 1:2,
  quarter = 1:16
) %>%
  mutate(
    year = 2015 + (quarter - 1) %/% 4,
    qtr = ((quarter - 1) %% 4) + 1,
    period_id = paste0(year, "Q", qtr),
    lden = 60 + 5 * sin(quarter / 2) + rnorm(n(), 0, 2)
  )

head(quarterly_sample_data) %>%
  gt::gt()

id	quarter	year	qtr	period_id	lden
1	1	2015	1	2015Q1	64.38414
2	1	2015	1	2015Q1	63.49392
1	2	2015	2	2015Q2	64.68482
2	2	2015	2	2015Q2	62.95154
1	3	2015	3	2015Q3	67.70878
2	3	2015	3	2015Q3	63.78696

Calculate 2-Year (8-Quarter) Preceding Average

Code

result_2yr_avg <- calculate_tw_exposure(
  data = quarterly_sample_data,
  id_var = "id",
  time_period_var = "quarter",
  exposure_var = "lden",
  timewindow_start = -8,
  timewindow_end = 0,
  energy = TRUE,
  average_function = "mean",
  result_variable = "lden_2yr_avg"
)

result_2yr_avg %>%
  filter(id == 1) %>%
  select(id, quarter, lden, lden_2yr_avg, window_min_value, window_max_value, periods_in_window, missing_percentage) %>%
  head(12) %>%
  gt::gt()

id	quarter	lden	lden_2yr_avg	window_min_value	window_max_value	periods_in_window	missing_percentage
1	1	64.38414	64.38414	64.38414	64.38414	1	88.88889
1	2	64.68482	64.53708	64.38414	64.68482	2	77.77778
1	3	67.70878	65.86792	64.38414	67.70878	3	66.66667
1	4	68.92115	66.85424	64.38414	68.92115	4	55.55556
1	5	62.52096	66.26806	62.52096	68.92115	5	44.44444
1	6	59.28479	65.64683	59.28479	68.92115	6	33.33333
1	7	57.75270	65.09335	57.75270	68.92115	7	22.22222
1	8	54.31275	64.56496	54.31275	68.92115	8	11.11111
1	9	53.54254	64.09612	53.54254	68.92115	9	0.00000
1	10	54.44493	63.60615	53.54254	68.92115	9	0.00000
1	11	55.32160	63.02154	53.54254	68.92115	9	0.00000
1	12	55.36716	61.42334	53.54254	68.92115	9	0.00000

A 2-year (8-quarter) time-weighted average is calculated for each quarter, using all available data in the window. This approach is useful for exposures with potential cumulative or lagged effects over multiple years.

Best Practices

Use energy = TRUE for decibel or other logarithmic exposures
Choose window size based on exposure-outcome latency
Filter or flag periods with high missingness
Compare mean vs median for robustness

Introduction

Function Overview

Function Signature

Key Parameters

Return Value

Example Usage

Simulated Data

Calculate 5-Year Energy-Based Mean

Visualize 5-Year Averages

Comparing Averaging Methods

Handling Missing Data

Data Quality Visualization

Example: Quarterly Data and 2-Year Preceding Average

Simulated Quarterly Data

Calculate 2-Year (8-Quarter) Preceding Average

Best Practices

Session Info