Exposure Aggregation in Epidemiological Research: Time-Weighted Methods and Applications

Introduction

In epidemiological research, especially in environmental health studies, properly aggregating exposure data over time is crucial for accurate exposure assessment (Brink and Haagsma 2024). Environmental exposures such as noise, typically reported using standardized metrics like day-evening-night level (\(L_{den}\)) (European Union 2002; Day–evening–night noise level 2025), exhibit significant temporal variations that must be carefully processed to avoid exposure misclassification. This document provides a comprehensive guide to using the create_period_exposures() function, which offers flexible methods for aggregating environmental exposure measurements across different time periods.

Key Features of the Function

The create_period_exposures() function provides:

  1. Aggregation of exposures into standardized time periods (month, quarter, half-year, year)
  2. Support for both point-in-time data and period-based exposure data
  3. Special handling for logarithmic values (e.g., decibel-scale noise measurements)
  4. Time-weighted averaging to account for varying exposure durations
  5. Data quality metrics to assess completeness of exposure information
  6. Flexible parameter options to customize the aggregation process

Function Documentation

Function Signature and Parameters

Code
create_period_exposures <- function(
    data, 
    id_var = "id", 
    date_var = NULL,
    in_date_var = "period_start",
    out_date_var = "period_end",
    exposure_var = "lden", 
    period = "quarter",
    energy = TRUE,
    average_function = "mean",
    weight_by_days = TRUE,
    # output variables
    result_variable = "avg_exposure",
    period_id_var = "period_id",
    period_start_var = "period_start",
    period_end_var = "period_end",
    keep_original = FALSE,
    # debugging
    debug = FALSE
)

Parameter Descriptions

Parameter          Description
data               Data frame containing exposure data
id_var             Name of ID variable (default: “id”)
date_var           Name of date variable for point data (default: NULL)
in_date_var        Name of start date variable for period data (default: “period_start”)
out_date_var       Name of end date variable for period data (default: “period_end”)
exposure_var       Name of exposure variable (default: “lden”)
period             Aggregation period type: “month”, “quarter”, “halfyear”, “year” (default: “quarter”)
energy             Whether to use energy-based averaging for logarithmic values (default: TRUE)
average_function   Function to use for averaging: “mean” or “median” (default: “mean”)
weight_by_days     Whether to weight by days in period (default: TRUE)
result_variable    Name for the result column (default: “avg_exposure”)
period_id_var      Name for period identifier column (default: “period_id”)
period_start_var   Name for period start column (default: “period_start”)
period_end_var     Name for period end column (default: “period_end”)
keep_original      Whether to keep original values (default: FALSE)
debug              Whether to show debugging information (default: FALSE)

Return Value

The function returns a data frame with one row per ID per time period with:

  • ID variable
  • Period identifier (e.g., “2020-01”, “2020Q1”)
  • Period start and end dates
  • Aggregated exposure value
  • Minimum and maximum exposure values in period
  • Data quality metrics:
    • Days in period
    • Days with data
    • Missing days
    • Missing percentage
  • Method description (e.g., “Energy-based mean (day-weighted)”)
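
To make the period identifiers concrete, the following base R sketch derives labels of the kind listed above from a vector of dates. The helper name period_label() and the half-year label format ("2020H1") are assumptions made for illustration; only the month and quarter formats appear in this document, and the function's internal logic may differ.

```r
# Hypothetical helper: derive a period label from a Date vector.
period_label <- function(date, period = c("month", "quarter", "halfyear", "year")) {
  period <- match.arg(period)
  yr <- format(date, "%Y")
  mon <- as.integer(format(date, "%m"))
  switch(
    period,
    month    = format(date, "%Y-%m"),
    quarter  = paste0(yr, "Q", (mon - 1) %/% 3 + 1),
    halfyear = paste0(yr, "H", (mon - 1) %/% 6 + 1),  # assumed label format
    year     = yr
  )
}

dates <- as.Date(c("2020-01-15", "2020-05-03", "2020-11-30"))
period_label(dates, "month")    # "2020-01" "2020-05" "2020-11"
period_label(dates, "quarter")  # "2020Q1" "2020Q2" "2020Q4"
```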

Use Cases and Examples

Example 1: Point-in-Time Data (Daily Measurements)

This example demonstrates how to aggregate daily PM2.5 measurements into monthly periods.

Generate Sample Daily Data

Code
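# Packages used throughout this document (see Session Information)
library(dplyr)
library(lubridate)
library(ggplot2)
library(knitr)
library(kableExtra)

# Fix the random seed so the simulated values are reproducible
# (the output shown below is consistent with this seed)
set.seed(123)

# Create a date sequence for 2020
all_dates <- seq(as.Date("2020-01-01"), as.Date("2020-12-31"), by = "day")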

# Create sample daily data for 3 individuals with extensive missing date patterns
# to better illustrate missingness threshold effects
daily_data <- data.frame(
  id = rep(1:3, each = length(all_dates)),
  date = rep(all_dates, 3)
) %>%
  # Add diverse missing patterns:
  # ID 1: Various missing patterns throughout the year
  filter(!(id == 1 & month(date) == 1 & day(date) %in% c(5:9, 15:19, 25:29))) %>% # Jan: 15 missing days (3 sets of 5 days)
  filter(!(id == 1 & month(date) == 3 & day(date) %in% c(10:25))) %>% # Mar: 16 consecutive days missing
  filter(!(id == 1 & month(date) %in% c(5, 6) & day(date) %% 3 == 0)) %>% # May-Jun: Every 3rd day missing
  filter(!(id == 1 & month(date) == 8 & day(date) %in% c(1:7, 15:21))) %>% # Aug: Two weeks missing
  filter(!(id == 1 & month(date) == 11 & day(date) %in% c(1:5, 11:15, 21:25))) %>% # Nov: 15 days missing in sets
  
  # ID 2: Some months with low to moderate missingness, some with high
  filter(!(id == 2 & month(date) == 2 & day(date) %% 5 == 0)) %>% # Feb: Every 5th day missing (low)
  filter(!(id == 2 & month(date) == 4 & day(date) %in% c(1:20))) %>% # Apr: 20 days missing (high)
  filter(!(id == 2 & month(date) == 6 & day(date) %% 3 == 0)) %>% # Jun: Every 3rd day missing (moderate)
  filter(!(id == 2 & month(date) == 7 & day(date) %in% c(5:10, 20:31))) %>% # Jul: 18 days missing (high)
  filter(!(id == 2 & month(date) == 9 & day(date) %% 4 == 0)) %>% # Sep: Every 4th day missing (low)
  filter(!(id == 2 & month(date) == 12 & day(date) %in% c(10:30))) %>% # Dec: 21 days missing (high)
  
  # ID 3: Gradual increase in missingness through the year
  filter(!(id == 3 & month(date) == 1 & day(date) %in% c(15:17))) %>% # Jan: 3 days missing (~10%)
  filter(!(id == 3 & month(date) == 2 & day(date) %in% c(5:9))) %>% # Feb: 5 days missing (~17%) 
  filter(!(id == 3 & month(date) == 4 & day(date) %in% c(10:16))) %>% # Apr: 7 days missing (~23%)
  filter(!(id == 3 & month(date) == 5 & day(date) %in% c(5:14))) %>% # May: 10 days missing (~33%)
  filter(!(id == 3 & month(date) == 7 & day(date) %in% c(1:15))) %>% # Jul: 15 days missing (~48%)
  filter(!(id == 3 & month(date) == 9 & day(date) %in% c(1:20))) %>% # Sep: 20 days missing (~67%)
  filter(!(id == 3 & month(date) == 11 & day(date) %in% c(1:25))) %>% # Nov: 25 days missing (~83%)
  
  # Add PM2.5 exposure values
  mutate(
    # Base level
    base_pm25 = case_when(
      id == 1 ~ 12.0, # urban area
      id == 2 ~ 15.5, # near highway
      id == 3 ~ 8.5,  # suburban
      TRUE ~ 10.0
    ),
    # Seasonal pattern (higher in winter)
    month_effect = 5 * cos((month(date) - 1) * pi/6),
    # Random daily variation
    daily_variation = rnorm(n(), 0, 2),
    # Final PM2.5 value
    pm25 = pmax(0, base_pm25 + month_effect + daily_variation) # ensure no negative values
  )

# Calculate missing days per month for documentation
monthly_summary <- daily_data %>%
  group_by(id, year = year(date), month = month(date)) %>%
  summarize(
    days_with_data = n(),
    days_in_month = days_in_month(first(date)),
    missing_days = days_in_month - days_with_data,
    missing_pct = 100 * missing_days / days_in_month,
    .groups = "drop"
  )

# Show sample of the data
head(daily_data) %>%
  kable() %>%
  kable_styling(bootstrap_options = c("striped", "hover", "condensed"))
id date base_pm25 month_effect daily_variation pm25
1 2020-01-01 12 5 -1.1209513 15.87905
1 2020-01-02 12 5 -0.4603550 16.53965
1 2020-01-03 12 5 3.1174166 20.11742
1 2020-01-04 12 5 0.1410168 17.14102
1 2020-01-10 12 5 0.2585755 17.25858
1 2020-01-11 12 5 3.4301300 20.43013

Plot Daily Data to Show Patterns

Code
# Plot daily values for each ID
ggplot(daily_data, aes(x = date, y = pm25, color = factor(id))) +
  geom_point(alpha = 0.3) +
  geom_smooth(se = FALSE) +
  facet_wrap(~ id, ncol = 1) +
  labs(
    title = "Daily PM2.5 Exposure by Individual",
    subtitle = "Note the missing data patterns and seasonal trends",
    x = "Date",
    y = "PM2.5 (μg/m³)",
    color = "ID"
  ) +
  theme_minimal() +
  theme(legend.position = "bottom")

Aggregate to Monthly Periods

Code
# Aggregate daily data to monthly periods
monthly_data <- create_period_exposures(
  data = daily_data,
  id_var = "id",
  date_var = "date",
  exposure_var = "pm25",
  period = "month",
  energy = FALSE,         # Arithmetic averaging for PM2.5 values
  average_function = "mean",
  result_variable = "monthly_pm25",
  debug = FALSE
)

# Show the monthly aggregated data
monthly_data %>%
  select(id, period_id, period_start, period_end, monthly_pm25, 
         missing_percentage, method_description) %>%
  head(10) %>%
  kable() %>%
  kable_styling(bootstrap_options = c("striped", "hover", "condensed"))
id period_id period_start period_end monthly_pm25 missing_percentage method_description
1 2020-01 2020-01-01 2020-01-31 17.509085 48.38710 Arithmetic mean (day-weighted)
1 2020-02 2020-02-01 2020-02-29 16.257268 0.00000 Arithmetic mean (day-weighted)
1 2020-03 2020-03-01 2020-03-31 14.622774 51.61290 Arithmetic mean (day-weighted)
1 2020-04 2020-04-01 2020-04-30 12.048841 0.00000 Arithmetic mean (day-weighted)
1 2020-05 2020-05-01 2020-05-31 9.484487 32.25806 Arithmetic mean (day-weighted)
1 2020-06 2020-06-01 2020-06-30 7.438511 33.33333 Arithmetic mean (day-weighted)
1 2020-07 2020-07-01 2020-07-31 6.705139 0.00000 Arithmetic mean (day-weighted)
1 2020-08 2020-08-01 2020-08-31 8.116065 45.16129 Arithmetic mean (day-weighted)
1 2020-09 2020-09-01 2020-09-30 9.502457 0.00000 Arithmetic mean (day-weighted)
1 2020-10 2020-10-01 2020-10-31 11.822999 0.00000 Arithmetic mean (day-weighted)
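
The missing_percentage column can be cross-checked by hand. The sketch below, using only base R, rebuilds the January 2020 dates for ID 1 from the filter in the data-generation step and computes the completeness metrics. The variable names are illustrative and not necessarily those used inside create_period_exposures().

```r
# All calendar days in January 2020
jan_days <- seq(as.Date("2020-01-01"), as.Date("2020-01-31"), by = "day")

# ID 1 has no records on these days of the month (see the data-generation code)
missing_dom <- c(5:9, 15:19, 25:29)
observed <- jan_days[!(as.integer(format(jan_days, "%d")) %in% missing_dom)]

days_in_period <- length(jan_days)
days_with_data <- length(unique(observed))
missing_days <- days_in_period - days_with_data
missing_percentage <- 100 * missing_days / days_in_period

c(days_in_period = days_in_period, days_with_data = days_with_data,
  missing_days = missing_days, missing_percentage = missing_percentage)
# missing_percentage is 100 * 15 / 31, about 48.39, matching the first row above
```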

Visualize Monthly Aggregates

Code
# Plot monthly values
ggplot(monthly_data, aes(x = period_start, y = monthly_pm25, color = factor(id))) +
  geom_point() +
  geom_line() +
  facet_wrap(~ id, ncol = 1) +
  labs(
    title = "Monthly Aggregated PM2.5 Exposure",
    x = "Month",
    y = "PM2.5 (μg/m³)",
    color = "ID"
  ) +
  theme_minimal() +
  theme(legend.position = "bottom")

Visualize Data Completeness

Code
# Plot missing data percentage by month
ggplot(monthly_data, aes(x = period_start, y = missing_percentage, fill = factor(id))) +
  geom_col(position = "dodge") +
  labs(
    title = "Missing Data Percentage by Month",
    x = "Month",
    y = "Missing Data (%)",
    fill = "ID"
  ) +
  theme_minimal() +
  theme(legend.position = "bottom")

Example 2: Period Data (Address History with Noise Exposures)

This example demonstrates how to handle exposure data that already covers periods (e.g., address history with noise exposure estimates).

Generate Sample Address History Data

Code
# Create sample address history with noise exposure data
address_data <- data.frame(
  id = rep(1:3, each = 4),
  period_start = as.Date(c(
    # ID 1 address history
    "2020-01-01", "2020-04-15", "2020-07-01", "2020-10-01",
    # ID 2 address history
    "2020-01-01", "2020-03-01", "2020-08-15", "2020-11-01",
    # ID 3 address history
    "2020-01-01", "2020-05-01", "2020-06-15", "2020-09-01"
  )),
  period_end = as.Date(c(
    # ID 1 address history
    "2020-04-14", "2020-06-30", "2020-09-30", "2020-12-31",
    # ID 2 address history
    "2020-02-29", "2020-08-14", "2020-10-31", "2020-12-31",
    # ID 3 address history
    "2020-04-30", "2020-06-14", "2020-08-31", "2020-12-31"
  )),
  # Different Lden (noise) exposure at each address
  lden = c(
    62.3, 55.7, 58.2, 64.1,  # ID 1 exposures
    60.5, 54.2, 66.8, 61.3,  # ID 2 exposures
    53.7, 56.9, 59.5, 65.2   # ID 3 exposures
  )
)

# Show the address history data
address_data %>%
  kable() %>%
  kable_styling(bootstrap_options = c("striped", "hover", "condensed"))
id period_start period_end lden
1 2020-01-01 2020-04-14 62.3
1 2020-04-15 2020-06-30 55.7
1 2020-07-01 2020-09-30 58.2
1 2020-10-01 2020-12-31 64.1
2 2020-01-01 2020-02-29 60.5
2 2020-03-01 2020-08-14 54.2
2 2020-08-15 2020-10-31 66.8
2 2020-11-01 2020-12-31 61.3
3 2020-01-01 2020-04-30 53.7
3 2020-05-01 2020-06-14 56.9
3 2020-06-15 2020-08-31 59.5
3 2020-09-01 2020-12-31 65.2

Visualize Address History Timeline

Code
# Create a timeline plot of address history
address_data %>%
  mutate(
    duration = as.numeric(period_end - period_start) + 1,
    address_number = rep(1:4, 3)  # Add address number for each person
  ) %>%
  ggplot(aes(x = period_start, y = factor(id), color = lden)) +
  # linewidth replaces the size aesthetic for lines (ggplot2 >= 3.4.0)
  geom_segment(aes(xend = period_end, yend = factor(id), linewidth = lden)) +
  geom_point(size = 3) +
  geom_point(aes(x = period_end), size = 3) +
  geom_text(aes(label = sprintf("%.1f", lden), x = period_start + (period_end - period_start)/2), 
            vjust = -1, size = 3) +
  scale_color_viridis_c(option = "plasma") +
  labs(
    title = "Address History Timeline with Noise Exposure",
    x = "Date",
    y = "Individual ID",
    color = "Noise level (dB)",
    linewidth = "Noise level (dB)"
  ) +
  theme_minimal() +
  theme(legend.position = "bottom")

Aggregate to Quarterly Periods

Code
# Aggregate to quarterly periods
quarterly_data <- create_period_exposures(
  data = address_data,
  id_var = "id",
  in_date_var = "period_start",  # For period data, use in_date_var and out_date_var
  out_date_var = "period_end",
  exposure_var = "lden",
  period = "quarter",
  energy = TRUE,         # Energy-based averaging for noise values
  weight_by_days = TRUE,  # Weight by duration at each address
  result_variable = "quarterly_lden",
  debug = FALSE
)

# Show the quarterly aggregated data
quarterly_data %>%
  select(id, period_id, period_start, period_end, quarterly_lden, 
         days_with_data, days_in_period, method_description) %>%
  kable() %>%
  kable_styling(bootstrap_options = c("striped", "hover", "condensed"))
id period_id period_start period_end quarterly_lden days_with_data days_in_period method_description
1 2020Q1 2020-01-01 2020-03-31 62.30000 91 91 Energy-based mean (day-weighted)
1 2020Q2 2020-04-01 2020-06-30 55.70000 77 91 Energy-based mean (day-weighted)
1 2020Q3 2020-07-01 2020-09-30 58.20000 92 92 Energy-based mean (day-weighted)
1 2020Q4 2020-10-01 2020-12-31 64.10000 92 92 Energy-based mean (day-weighted)
2 2020Q1 2020-01-01 2020-03-31 59.18761 91 91 Energy-based mean (day-weighted)
2 2020Q3 2020-07-01 2020-09-30 66.80000 47 92 Energy-based mean (day-weighted)
2 2020Q4 2020-10-01 2020-12-31 61.30000 61 92 Energy-based mean (day-weighted)
3 2020Q1 2020-01-01 2020-03-31 53.70000 91 91 Energy-based mean (day-weighted)
3 2020Q2 2020-04-01 2020-06-30 57.74578 61 91 Energy-based mean (day-weighted)
3 2020Q3 2020-07-01 2020-09-30 65.20000 30 92 Energy-based mean (day-weighted)
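
A day-weighted energy mean for a single quarter can be reproduced by hand. The sketch below clips each address interval to the quarter, counts the overlapping days, and averages on the energy scale. The helper names overlap_days() and energy_mean_weighted() are illustrative assumptions, not part of the function. For ID 2 in 2020Q1 (60 days at 60.5 dB and 31 days at 54.2 dB) the result agrees with the tabulated 59.19 dB.

```r
# Inclusive number of days that [start, end] overlaps with [p_start, p_end]
overlap_days <- function(start, end, p_start, p_end) {
  s <- pmax(as.numeric(start), as.numeric(p_start))
  e <- pmin(as.numeric(end), as.numeric(p_end))
  pmax(0, e - s + 1)
}

# Day-weighted energy mean of decibel values
energy_mean_weighted <- function(db, days) {
  10 * log10(sum(days * 10^(db / 10)) / sum(days))
}

# ID 2 address history from the example above
start <- as.Date(c("2020-01-01", "2020-03-01", "2020-08-15", "2020-11-01"))
end   <- as.Date(c("2020-02-29", "2020-08-14", "2020-10-31", "2020-12-31"))
lden  <- c(60.5, 54.2, 66.8, 61.3)

q1_days <- overlap_days(start, end, as.Date("2020-01-01"), as.Date("2020-03-31"))
q1_days                               # 60 31 0 0
energy_mean_weighted(lden, q1_days)   # approximately 59.19
```

Note that this sketch apportions an interval to every quarter it overlaps. In the table above, ID 2 has no 2020Q2 row even though the 54.2 dB address spans all of that quarter, and ID 1 shows 77 rather than 91 days with data in 2020Q2. Intervals that cross a period boundary therefore appear to be counted only in the period in which they start, so it may be worth checking whether such intervals should be split at period boundaries before aggregation.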

Visualize Quarterly Aggregates

Code
# Plot quarterly noise exposures
ggplot(quarterly_data, aes(x = period_start, y = quarterly_lden, color = factor(id))) +
  geom_point(size = 3) +
  geom_line() +
  labs(
    title = "Quarterly Aggregated Noise Exposure",
    subtitle = "Day-weighted energy-based averages from address history",
    x = "Quarter",
    y = "Noise Level (dB)",
    color = "ID"
  ) +
  theme_minimal() +
  theme(legend.position = "bottom")

Advanced Features

Handling Logarithmic Values (Energy-Based Averaging)

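For exposures measured on logarithmic scales (such as decibel-based noise levels), arithmetic averaging of the reported values does not correspond to the average of the underlying physical quantity and underestimates it whenever levels vary. The function therefore offers energy-based averaging (energy = TRUE), which averages on the linear energy scale and converts the result back to the logarithmic scale.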
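
Written out, energy-based averaging converts each level to a quantity proportional to sound energy, averages on that scale, and converts back to decibels. For levels \(L_i\) that each apply for \(d_i\) days, a day-weighted form can be sketched as follows. The symbols \(L_i\), \(d_i\), and \(\bar{L}\) are notation introduced here, and the formula is an assumption about what the "Energy-based mean (day-weighted)" method computes rather than a statement of the function's source code.

```latex
\bar{L} = 10 \log_{10}\!\left( \frac{\sum_{i} d_i \, 10^{L_i/10}}{\sum_{i} d_i} \right)
```

With equal weights (\(d_i = 1\) for all \(N\) values) this reduces to \(10 \log_{10}\big(\tfrac{1}{N}\sum_{i} 10^{L_i/10}\big)\), which is the expression used for energy_mean in the comparison that follows.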

Comparing Arithmetic vs. Energy-Based Averaging for Noise

Code
# Generate sample monthly noise data with large variations across the year
noise_data <- data.frame(
  id = rep(1, 12),
  month = 1:12,
  period_start = seq(as.Date("2020-01-01"), by = "month", length.out = 12),
  # Create a realistic yearly pattern with one very loud month (summer festival in July)
  lden = c(60, 59, 61, 62, 63, 64, 75, 65, 63, 62, 60, 61)
)

# Calculate arithmetic and energy-based means
arithmetic_mean <- mean(noise_data$lden)
energy_mean <- 10 * log10(mean(10^(noise_data$lden/10)))

# Show the results
results_df <- data.frame(
  Method = c("Arithmetic Mean", "Energy-Based Mean"),
  Value = c(arithmetic_mean, energy_mean)
)

results_df %>%
  kable() %>%
  kable_styling(bootstrap_options = c("striped", "hover", "condensed"))
Method Value
Arithmetic Mean 62.91667
Energy-Based Mean 66.17781
Code
# Plot to show the difference
noise_data %>%
  mutate(month_name = factor(month.name[month], levels = month.name)) %>%
  ggplot(aes(x = month_name, y = lden, group = 1)) +
  geom_point(size = 3) +
  geom_line() +
  geom_hline(yintercept = arithmetic_mean, color = "red", linetype = "dashed") +
  geom_hline(yintercept = energy_mean, color = "blue", linetype = "dashed") +
  annotate("text", x = 11.5, y = arithmetic_mean + 0.5, 
           label = "Arithmetic Mean", hjust = 1, color = "red") +
  annotate("text", x = 11.5, y = energy_mean - 0.5, 
           label = "Energy-Based Mean", hjust = 1, color = "blue") +
  labs(
    title = "Effect of Energy-Based Averaging for Annual Noise Levels",
    subtitle = "Note how energy-based averaging gives more weight to the loud July values",
    x = "Month",
    y = "Noise Level (dB)"
  ) +
  theme_minimal() +
  theme(axis.text.x = element_text(angle = 45, hjust = 1))

The example above shows monthly noise levels throughout a year, with a particularly loud month in July (75 dB). The arithmetic mean (62.9 dB) underestimates the energy-equivalent average noise level. The energy-based mean (66.2 dB) properly accounts for the logarithmic nature of the decibel scale, giving more weight to louder periods as they contribute disproportionately to the overall sound energy.

This difference is crucial in epidemiological studies of noise exposure, where using arithmetic means would systematically underestimate the true exposure and potentially lead to exposure misclassification.

Time-Weighted Averaging with Different Day Counts

When aggregating period data, weighting each value by the number of days it applies within the period ensures that longer intervals contribute proportionally more to the period average.

Code
# Create sample data with uneven period lengths
uneven_data <- data.frame(
  id = rep(1, 3),
  period_start = as.Date(c("2020-01-01", "2020-01-15", "2020-02-01")),
  period_end = as.Date(c("2020-01-14", "2020-01-31", "2020-02-28")),
  no2 = c(25, 15, 20)  # NO2 values
)

# Show the data
uneven_data %>%
  mutate(days = as.numeric(period_end - period_start) + 1) %>%
  kable() %>%
  kable_styling(bootstrap_options = c("striped", "hover", "condensed"))
id period_start period_end no2 days
1 2020-01-01 2020-01-14 25 14
1 2020-01-15 2020-01-31 15 17
1 2020-02-01 2020-02-28 20 28
Code
# Calculate weighted and unweighted monthly averages
monthly_weighted <- create_period_exposures(
  data = uneven_data,
  in_date_var = "period_start",
  out_date_var = "period_end",
  exposure_var = "no2",
  period = "month",
  energy = FALSE,
  weight_by_days = TRUE,
  result_variable = "monthly_no2_weighted"
)

monthly_unweighted <- create_period_exposures(
  data = uneven_data,
  in_date_var = "period_start",
  out_date_var = "period_end",
  exposure_var = "no2",
  period = "month",
  energy = FALSE,
  weight_by_days = FALSE,
  result_variable = "monthly_no2_unweighted"
)

# Compare the results
comparison <- left_join(
  select(monthly_weighted, id, period_id, monthly_no2_weighted),
  select(monthly_unweighted, id, period_id, monthly_no2_unweighted),
  by = c("id", "period_id")
) %>%
  mutate(difference = monthly_no2_weighted - monthly_no2_unweighted)

comparison %>%
  kable() %>%
  kable_styling(bootstrap_options = c("striped", "hover", "condensed"))
id period_id monthly_no2_weighted monthly_no2_unweighted difference
1 2020-01 19.51613 20 -0.483871
1 2020-02 20.00000 20 0.000000
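
The January figures can be verified with base R. Two intervals fall in January: 14 days at 25 and 17 days at 15. The sketch below uses weighted.mean() from the stats package and is only a cross-check, not the function's internal code.

```r
no2  <- c(25, 15)   # NO2 values of the two January intervals
days <- c(14, 17)   # interval lengths in days

weighted_jan   <- weighted.mean(no2, w = days)  # (25*14 + 15*17) / 31
unweighted_jan <- mean(no2)

round(weighted_jan, 5)          # 19.51613
unweighted_jan                  # 20
weighted_jan - unweighted_jan   # about -0.48
```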

Considerations for Data Quality

The function provides metrics to assess data quality, which can be used to filter results based on completeness.

Code
# Filter monthly data based on 5% missing threshold
complete_months_5pct <- monthly_data %>%
  filter(missing_percentage <= 5) %>%
  select(id, period_id, monthly_pm25, missing_percentage)

incomplete_months_5pct <- monthly_data %>%
  filter(missing_percentage > 5) %>%
  select(id, period_id, monthly_pm25, missing_percentage)

# Compare number of complete vs incomplete months
cat("Complete months (<=5% missing):", nrow(complete_months_5pct), "\n")
cat("Incomplete months (>5% missing):", nrow(incomplete_months_5pct), "\n")

The choice of missingness threshold depends on your study requirements and exposure type. The example above uses a 5% threshold; other thresholds result in different sets of included and excluded periods. Some considerations:

  • Stricter thresholds (e.g., 5%): Ensure high data quality but may exclude many periods
  • Moderate thresholds (e.g., 10-20%): Balance between data quality and sample size
  • Permissive thresholds (e.g., 30%): Include more periods but with potentially less reliable estimates

For critical research applications, it’s often advisable to perform sensitivity analyses using different thresholds to assess the impact on your findings.
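
Such a sensitivity analysis can start by counting how many periods survive each candidate threshold. The sketch below uses the ten missing_percentage values shown for ID 1 in the monthly table of Example 1; the thresholds and object names are illustrative choices.

```r
# missing_percentage for ID 1, January to October 2020 (from the monthly table)
missing_percentage <- c(48.38710, 0, 51.61290, 0, 32.25806,
                        33.33333, 0, 45.16129, 0, 0)

thresholds <- c(5, 20, 35, 50)

# Number of monthly periods retained at each threshold
retained <- vapply(
  thresholds,
  function(th) sum(missing_percentage <= th),
  integer(1)
)

data.frame(threshold_pct = thresholds, periods_retained = retained)
```

Re-running the health analysis on each retained subset then shows whether the findings depend on the chosen threshold.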

Performance Considerations

The function is intended to handle large datasets, but run time depends on the number of IDs and time periods. The following example times the monthly aggregation of one year of daily data for 1,000 IDs.

Code
# Generate a larger dataset
large_daily_data <- data.frame(
  id = rep(1:1000, each = 365),
  date = rep(seq(as.Date("2020-01-01"), by = "day", length.out = 365), 1000),
  exposure = rnorm(1000 * 365, 50, 10)
)

# Time the execution
system.time({
  large_monthly_data <- create_period_exposures(
    data = large_daily_data,
    date_var = "date",
    exposure_var = "exposure",
    period = "month"
  )
})

For very large datasets, consider using parallel processing:

Code
library(parallel)

# Split data by ID
ids <- unique(large_daily_data$id)
id_chunks <- split(ids, ceiling(seq_along(ids) / 100))  # Process 100 IDs at a time

# Function to process a chunk of IDs
process_chunk <- function(ids) {
  chunk_data <- large_daily_data[large_daily_data$id %in% ids, ]
  create_period_exposures(
    data = chunk_data,
    date_var = "date",
    exposure_var = "exposure",
    period = "month"
  )
}

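# Process chunks in parallel
# Note: mclapply() relies on forking, which is unavailable on Windows, where
# mc.cores must be 1; a cluster-based approach such as parLapply() is needed there
n_cores <- if (.Platform$OS.type == "windows") 1L else max(1L, detectCores() - 1L, na.rm = TRUE)
results_list <- mclapply(id_chunks, process_chunk, mc.cores = n_cores)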

# Combine results
combined_results <- do.call(rbind, results_list)

Best Practices and Recommendations

General Guidelines

  1. Choose appropriate time periods based on your research question and the exposure dynamics
  2. Use energy-based averaging for logarithmic values (noise, decibels)
  3. Weight by days when using period data with varying durations
  4. Filter by data quality metrics for more reliable analyses
  5. Validate aggregated results against raw data when possible
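
For point-in-time data with arithmetic averaging, validation can be as simple as recomputing period means directly from the raw records. The following base R sketch does this for a small synthetic daily series. The names raw and raw_means are hypothetical, and the comparison against the function's output is left as a comment because the relevant column names depend on the arguments used.

```r
# Small synthetic daily series: constant 10 in January, 20 in February 2020
raw <- data.frame(
  id = 1,
  date = seq(as.Date("2020-01-01"), as.Date("2020-02-29"), by = "day")
)
raw$exposure <- ifelse(format(raw$date, "%m") == "01", 10, 20)

# Recompute monthly arithmetic means directly from the raw records
raw$period_id <- format(raw$date, "%Y-%m")
raw_means <- aggregate(exposure ~ id + period_id, data = raw, FUN = mean)
raw_means

# These values could then be merged with the aggregated output by id and
# period_id, and any differences beyond a small tolerance inspected.
```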

Common Issues and Solutions

Issue                  Solution
Missing data           Use missing_percentage to filter periods with insufficient data
Mixed data types       Convert point data to period data by using the measurement date as both start and end date
Incorrect averaging    Ensure energy = TRUE for decibel values
Memory limitations     Process the data in batches by ID
Performance concerns   Consider parallel processing for very large datasets
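
Converting point data to period form amounts to using the measurement date as both the start and the end of a one-day interval. A minimal base R sketch follows; the values are made up for illustration, and the column names are chosen to match the function's defaults.

```r
# Point-in-time records (one measurement per date); illustrative values
point_data <- data.frame(
  id = c(1, 1, 2),
  date = as.Date(c("2020-01-01", "2020-01-02", "2020-01-01")),
  lden = c(61.2, 63.5, 58.9)
)

# Treat each measurement as a one-day period
period_data <- data.frame(
  id = point_data$id,
  period_start = point_data$date,
  period_end = point_data$date,
  lden = point_data$lden
)

# Each interval covers exactly one day (inclusive counting)
as.numeric(period_data$period_end - period_data$period_start) + 1
```

The result can then be combined with existing period records, for example with rbind(), before aggregation.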

Conclusion

The create_period_exposures() function provides a flexible and powerful tool for aggregating exposure data in epidemiological research. By properly accounting for time periods, data quality, and appropriate averaging methods, researchers can generate more reliable exposure metrics for health outcome analysis.

When publishing research using this function, we recommend documenting:

  1. The chosen time periods and their relevance to health outcomes
  2. Averaging methods (energy-based vs. arithmetic)
  3. Data completeness thresholds applied
  4. Any pre-processing steps applied to the raw exposure data

Session Information

Code
R version 4.3.1 (2023-06-16 ucrt)
Platform: x86_64-w64-mingw32/x64 (64-bit)
Running under: Windows Server 2019 x64 (build 17763)

Matrix products: default


locale:
[1] LC_COLLATE=Swedish_Sweden.1252  LC_CTYPE=Swedish_Sweden.1252   
[3] LC_MONETARY=Swedish_Sweden.1252 LC_NUMERIC=C                   
[5] LC_TIME=Swedish_Sweden.1252    

time zone: Europe/Stockholm
tzcode source: internal

attached base packages:
[1] stats     graphics  grDevices utils     datasets  methods   base     

other attached packages:
[1] tictoc_1.2.1     kableExtra_1.4.0 knitr_1.48       patchwork_1.3.0 
[5] lubridate_1.9.3  ggplot2_3.5.1    dplyr_1.1.4     

loaded via a namespace (and not attached):
 [1] Matrix_1.6-5      gtable_0.3.5      jsonlite_1.8.8    highr_0.11       
 [5] compiler_4.3.1    tidyselect_1.2.1  xml2_1.3.6        stringr_1.5.1    
 [9] splines_4.3.1     systemfonts_1.2.1 scales_1.3.0      yaml_2.3.10      
[13] fastmap_1.2.0     lattice_0.21-8    R6_2.6.1          labeling_0.4.3   
[17] generics_0.1.3    htmlwidgets_1.6.4 tibble_3.2.1      munsell_0.5.1    
[21] svglite_2.1.3     pillar_1.10.1     rlang_1.1.4       stringi_1.8.4    
[25] xfun_0.49         viridisLite_0.4.2 timechange_0.3.0  cli_3.6.3        
[29] mgcv_1.9-1        withr_3.0.2       magrittr_2.0.3    digest_0.6.37    
[33] grid_4.3.1        rstudioapi_0.17.1 nlme_3.1-162      lifecycle_1.0.4  
[37] vctrs_0.6.5       evaluate_1.0.1    glue_1.8.0        farver_2.1.1     
[41] colorspace_2.1-0  rmarkdown_2.29    tools_4.3.1       pkgconfig_2.0.3  
[45] htmltools_0.5.8.1

References

Brink, Mark, and Juanita Haagsma. 2024. “Determining the Population Health Impact of Environmental Noise.” In A Sound Approach to Noise and Health, 75–96. Singapore: Springer Nature Singapore. https://library.oapen.org/bitstream/handle/20.500.12657/94584/978-981-97-6121-0.pdf?sequence=1#page=82.
Day–evening–night noise level. 2025. “Day–Evening–Night Noise Level – Wikipedia.” https://en.wikipedia.org/wiki/Day%E2%80%93evening%E2%80%93night_noise_level#Definition.
European Union. 2002. “Directive 2002/49/EC of the European Parliament and of the Council of 25 June 2002 Relating to the Assessment and Management of Environmental Noise.” http://eur-lex.europa.eu/legal-content/EN/TXT/PDF/?uri=CELEX:32002L0049&from=EN.