Air Pollution and Fetal Growth in Sweden

Purpose

To explore descriptives on exposure, outcome and sociodemographics including data availability.

To explore associations between air pollution and multiple pregnancy and neonatal growth outcomes.

A data base compiling fetal ultrasound and birth anthropometric outcomes, corresponding z-scores, sociodemographic information on pregnant and their children, and corresponding exposure data on 4 air pollutants by gestational week is used.

Data flow chart

Datasets used

Birth dataset English Language: chamnha_healthdata Cases: n=620,866; Individual ids: 620,866; 2014-2019; Variables: ID, date of birth, gestational age, demographics

Birth dataset Swedish language: db Cases: n=639,236; Individual ids: 629,989; 2014-2019; Variables: ID, date of birth, gestational age, demographics

Ultrasound dataset: ultraljud_foster; Individual ids: 569038; 2014-2019; Variables: ID, date of ultrasound (several per case), number of fetuses, ultrasound measures (crl, bpd etc.)

KUB ultrasound dataset: kub_undersokningar; Cases: n=213,250; Individual ids: 205,390; 2014-2019; Variables: ID, date of birth, ultrasound measures

Exposure dataset: Individual ids: n=500,958; exposure data provided by spatiotemporal model

Code

# Review availability of cases in datasets used in this project

setwd("P:/EnvEpi-Projekt/CHAMNHA/6_ultrasound_nikolaus")
load("1_originaldata/chamnha_healthdata.Rda")
#names(ch_ht)
#num_unique_ids <- length(unique(ch_ht$Graviditet))
#num_unique_ids
#access datasets: Birth dataset Swedish language
setwd("P:/EnvEpi-Projekt/CHAMNHA")
load("1_paper_temp_ptb/02_datasets/health_data.Rda")
#names(db)
#num_unique_ids <- length(unique(db$GraviditetID))
#num_unique_ids
#access datasets: Ultrasound dataset
setwd("P:/EnvEpi-Projekt/CHAMNHA/6_ultrasound_nikolaus")
data_us <- as.data.table(read.csv(file = "1_originaldata/ultraljud_foster.csv", sep = ";")) 
#names(data_us)
#num_unique_ids <- length(unique(data_us$graviditetid))
#num_unique_ids 
#access datasets: KUB dataset
setwd("P:/EnvEpi-Projekt/CHAMNHA/6_ultrasound_nikolaus")
load("2_processeddata/conceptiondate_all/conceptiondate_kub.Rda")
#names(data1)
#num_unique_ids <- length(unique(data1$id))
#num_unique_ids

Identical cases matrix across datasets

Overlap matrix of identical cases between the four datasets Birth dataset English Language, Birth dataset Swedish language, Ultrasound dataset, KUB ultrasound dataset

Code

#Rename id variable, remove other cases, deduplicate cases 
bir_en <- ch_ht %>%
  rename(id = Graviditet) %>%
  select(id) %>%
  distinct(id, .keep_all = TRUE)
bir_sw <- db %>%
  rename(id = GraviditetID) %>%
  select(id) %>%
  distinct(id, .keep_all = TRUE)
ultra <- data_us %>%
  rename(id = graviditetid) %>%
  select(id) %>%
  distinct(id, .keep_all = TRUE)
kub <- data1 %>%
  select(id) %>%
  distinct(id, .keep_all = TRUE)
rm(data1, db, ch_ht, data_us)

#Make a matrix of identical cases in the four datasets

# Count the total number of cases in each dataset
dataset_counts <- sapply(list(bir_en, bir_sw, ultra, kub), nrow)
names(dataset_counts) <- c("bir_en", "bir_sw", "ultra", "kub")

# Calculate the overlap matrix
datasets <- list(bir_en$id, bir_sw$id, ultra$id, kub$id)
overlap_matrix <- matrix(0, nrow = 4, ncol = 4, dimnames = list(names(dataset_counts), names(dataset_counts)))

for (i in 1:4) {
  for (j in 1:4) {
    overlap_matrix[i, j] <- length(intersect(datasets[[i]], datasets[[j]]))
  }
}

# Add dataset counts as row and column labels
rownames(overlap_matrix) <- paste0(names(dataset_counts), " (", dataset_counts, ")")
colnames(overlap_matrix) <- paste0(names(dataset_counts), " (", dataset_counts, ")")

overlap_matrix

Suggested inclusion criteria

registered in the Swedish Pregnancy Register 2014-2019: birth_eng/ch_ht (n=620866) and birth_swe/db (n=629989)
any ultrasound outcome during pregnancy available: ultrasound dataset (n=569038) and kub (n=205390)
date of conception via: 1) kub exam; 2) ultrasound exam; 3) birth dataset
residential address available for matching exposure with individual pregnant women (n=568922)
birth weight z-score: birth weight z-score calculated by gestational age-specific birth weight provided by birth_eng/birth_swe datasets

Exclusion criteria

no outcome data or extreme outliers (ultrasound/birth weight: z-score >5, < -5)

Flow chart

Data transformation

The dataset is modified for presentation purposes and variables. For example, variables on month, year and season of conception, birth weight z-score tertiles are defined.

To Do: Add data from Swedish birth dataset (sociodemographics need to be translated and merged)

Code

rm(list = ls())
# set wd for manual work (even though already specified in YAML code)
setwd("P:/EnvEpi-Projekt/CHAMNHA/6_ultrasound_nikolaus")
load("2_processeddata/conceptiondate_all/conceptiondate_merged_ustype_gestage_maxultrasoundmeasurement_birthoutcomes_USzscores_EFWzscores_BWzscores_whozscores_sorted_outcavail_socdem_wide_expos.Rda")

####database

#rename database
data <- data_exposure
rm(data_exposure)

# remove data without exposure and without zBW
data <- data %>%
  filter(
    !is.na(pm25_trim_first) & !is.na(pm10_trim_first) & !is.na(no2_trim_first) & !is.na(o3_trim_first)) %>% 
  filter(!is.na(zBW))

####rename variables

#rename variables
data <- data %>%
  mutate(mat_age_cat = dplyr::recode(mat_age_cat, 
                         "<=24" = "≤24", 
                         ">=35" = "≥35"),
    fstart = dplyr::recode(fstart, 
                    "Elektiv sectio" = "Cesarean section", 
                    "Induktion" = "Induction", 
                    "Spontan" = "Spontaneous")
  )

#### make new variables

#define season of conception
#month of conception
data <- as.data.table(data)
data[, month_conception := as.numeric(format(conceptiondate, "%m"))]
#year of conception
data[, year_conception := as.numeric(format(conceptiondate, "%y"))]
#year group
data <- data %>%
  mutate(year_group = case_when(
    year_conception %in% c(13, 14) ~ "2013/14",
    year_conception %in% c(15, 16) ~ "2015/16",
    year_conception %in% c(17, 18, 19) ~ "2017/18/19",
    TRUE ~ as.character(year_conception)
  ))
#define season of conception
data <- data %>%
  mutate(season_conception = factor(case_when(
    month_conception %in% c(12, 1, 2) ~ "Winter",
    month_conception %in% c(3, 4, 5) ~ "Spring",
    month_conception %in% c(6, 7, 8) ~ "Summer",
    month_conception %in% c(9, 10, 11) ~ "Fall",
    TRUE ~ NA_character_
  ), levels = c("Winter", "Spring", "Summer", "Fall")))

# Create a new variable birth_weight_tertiles and birth weight
data <- data %>%
  mutate(
    zBW_tertiles = cut(
      zBW,
      breaks = quantile(zBW, probs = c(0, 1/3, 2/3, 1), na.rm = TRUE),
      labels = c("Low tertile zBW", "Mid tertile zBW", "High tertile zBW"),
      include.lowest = TRUE)) %>%
  mutate(
    BW_tertiles = cut(
      birth_weight,
      breaks = quantile(birth_weight, probs = c(0, 1/3, 2/3, 1), na.rm = TRUE),
      labels = c("Low tertile BW", "Mid tertile BW", "High tertile BW"),
      include.lowest = TRUE))
# Create tertiles for each pollutant
data <- data %>%
  mutate(
    pm25_tertiles = cut(
      pm25_trim_first,
      breaks = quantile(pm25_trim_first, probs = c(0, 1/3, 2/3, 1), na.rm = TRUE),
      labels = c("Low tertile", "Mid tertile", "High tertile"),
      include.lowest = TRUE)
  ) %>%
  mutate(
    pm10_tertiles = cut(
      pm10_trim_first,
      breaks = quantile(pm10_trim_first, probs = c(0, 1/3, 2/3, 1), na.rm = TRUE),
      labels = c("Low tertile", "Mid tertile", "High tertile"),
      include.lowest = TRUE)
  ) %>%
  mutate(
    no2_tertiles = cut(
      no2_trim_first,
      breaks = quantile(no2_trim_first, probs = c(0, 1/3, 2/3, 1), na.rm = TRUE),
      labels = c("Low tertile", "Mid tertile", "High tertile"),
      include.lowest = TRUE)
  ) %>%
  mutate(
    o3_tertiles = cut(
      o3_trim_first,
      breaks = quantile(o3_trim_first, probs = c(0, 1/3, 2/3, 1), na.rm = TRUE),
      labels = c("Low tertile", "Mid tertile", "High tertile"),
      include.lowest = TRUE)
  )

# Clean specific variables
data <- data %>%
  mutate(meduc = if_else(meduc == "Don´t know", NA_character_, meduc)) %>% 
 mutate(smoke_preg = factor(smoke_preg,
                            levels = c("Yes", "No"),
                            labels = c("Smoker", "Non-smoker")))
#clean variable zBHC (n=61)
data$zBHC[is.infinite(data$zBHC)] <- NA

names(data)

#save edited dataset
setwd("P:/EnvEpi-Projekt/CHAMNHA/6_ultrasound_nikolaus")
save(data, file = "2_processeddata/conceptiondate_all/conceptiondate_merged_ustype_gestage_maxultrasoundmeasurement_birthoutcomes_USzscores_EFWzscores_BWzscores_whozscores_sorted_outcavail_socdem_wide_expos_procRquarto.Rda")

Exposure descriptives

Mean and tertile exposure levels

Mean and tertile air pollution levels during first trimester in the study population.

Code

#load edited dataset
setwd("P:/EnvEpi-Projekt/CHAMNHA/6_ultrasound_nikolaus")
load("2_processeddata/conceptiondate_all/conceptiondate_merged_ustype_gestage_maxultrasoundmeasurement_birthoutcomes_USzscores_EFWzscores_BWzscores_whozscores_sorted_outcavail_socdem_wide_expos_procRquarto.Rda")

# Mean pollution levels
trialdata <- data |> 
  select(pm25_trim_first, pm10_trim_first, no2_trim_first, o3_trim_first)

mean_exposure_table <-
  tbl_summary(
    trialdata,
    label = list(pm25_trim_first ~" PM2.5 (µg/m3)", pm10_trim_first ~ "PM10 (µg/m3)", no2_trim_first ~ "NO2 (µg/m3)", o3_trim_first ~ "O3 (µg/m3)"),
        statistic = list(all_continuous() ~ "{mean} ({sd})"), 
    digits = all_continuous() ~ 1,
    missing = "no") |> 
      modify_header(label = "**Characteristics**") |> 
      bold_labels() |> 
      modify_caption("Table 2: Ambient air pollution levels during first trimester in the study population")

# Tertile pollution levels
# Summary table for PM10
trialdata <- data |> 
  select(pm25_trim_first, pm10_trim_first, no2_trim_first, o3_trim_first, 
         pm25_tertiles, pm10_tertiles, no2_tertiles, o3_tertiles)

summary_table_pm10 <- tbl_summary(
  trialdata,
  by = pm10_tertiles, 
  include = c("pm10_trim_first"),
  label = list(pm10_trim_first ~ "PM10 (µg/m3)"), 
  statistic = list(
    all_continuous() ~ "{mean} ({sd})"
  ),
  digits = all_continuous() ~ 1,
  missing = "no")
# Summary table for NO2
summary_table_no2 <- tbl_summary(
  trialdata,
  by = no2_tertiles, 
  include = c("no2_trim_first"),
  label = list(no2_trim_first ~ "NO2 (µg/m3)"), 
  statistic = list(
    all_continuous() ~ "{mean} ({sd})"
  ),
  digits = all_continuous() ~ 1,
  missing = "no"
)
# Summary table for O3
summary_table_o3 <- tbl_summary(
  trialdata,
  by = o3_tertiles, 
  include = c("o3_trim_first"),
  label = list(o3_trim_first ~ "O3 (µg/m3)"), 
  statistic = list(
    all_continuous() ~ "{mean} ({sd})"
  ),
  digits = all_continuous() ~ 1,
  missing = "no"
)
# Summary table for PM25
summary_table_pm25 <- tbl_summary(
  trialdata,
  by = pm25_tertiles, 
  include = c("pm25_trim_first"),
  label = list(pm25_trim_first ~ "PM2.5 (µg/m3)"), 
  statistic = list(
    all_continuous() ~ "{mean} ({sd})"
  ),
  digits = all_continuous() ~ 1,
  missing = "no"
)

summary_tertiles_exposure <- tbl_stack(list(summary_table_pm25, summary_table_pm10, summary_table_no2, summary_table_o3))

Table 2: Ambient air pollution levels during first trimester in the study population
Characteristics	N = 497,402¹
PM2.5 (µg/m3)	7.6 (2.0)
PM10 (µg/m3)	15.2 (4.1)
NO2 (µg/m3)	14.3 (8.0)
O3 (µg/m3)	53.3 (9.5)
¹ Mean (SD)

Characteristic	Low tertile N = 165,801¹	Mid tertile N = 165,800¹	High tertile N = 165,801¹
PM2.5 (µg/m3)	5.4 (1.0)	7.6 (0.5)	9.7 (1.1)
PM10 (µg/m3)	11.2 (1.8)	14.7 (0.8)	19.8 (2.9)
NO2 (µg/m3)	6.2 (2.7)	13.3 (1.6)	23.3 (5.9)
O3 (µg/m3)	43.3 (3.2)	52.2 (3.1)	64.5 (4.7)
¹ Mean (SD)

Exposure in relation to WHO/EU guidelines

WHO/EU guidelines for individual air pollutants and proportion of population exposed to levels exceeding guuidelines during first trimester.

Code

thresholds <- tibble(
  Pollutant = c("PM2.5", "PM10", "NO2", "O3"),
  WHO_Threshold = c(5, 15, 10, 60),
  EU_Threshold = c(10, 20, 20, 100)
)

# Calculate exceedance proportions
exceedance_data <- data %>%
  summarise(
    PM2.5_WHO = mean(pm25_trim_first > thresholds$WHO_Threshold[1], na.rm = TRUE),
    PM10_WHO = mean(pm10_trim_first > thresholds$WHO_Threshold[2], na.rm = TRUE),
    NO2_WHO = mean(no2_trim_first > thresholds$WHO_Threshold[3], na.rm = TRUE),
    O3_WHO = mean(o3_trim_first > thresholds$WHO_Threshold[4], na.rm = TRUE),
    PM2.5_EU = mean(pm25_trim_first > thresholds$EU_Threshold[1], na.rm = TRUE),
    PM10_EU = mean(pm10_trim_first > thresholds$EU_Threshold[2], na.rm = TRUE),
    NO2_EU = mean(no2_trim_first > thresholds$EU_Threshold[3], na.rm = TRUE),
    O3_EU = mean(o3_trim_first > thresholds$EU_Threshold[4], na.rm = TRUE)
  ) %>%
  pivot_longer(
    cols = everything(),
    names_to = c("Pollutant", "Standard"),
    names_sep = "_",
    values_to = "Proportion_Exceeding"
  ) %>%
  mutate(Proportion_Exceeding = round(Proportion_Exceeding * 100, 1)) # Convert to percentage

# Combine WHO and EU data into a single table
final_table <- thresholds %>%
  left_join(
    exceedance_data %>%
      filter(Standard == "WHO") %>%
      select(Pollutant, WHO_Exceedance = Proportion_Exceeding),
    by = "Pollutant"
  ) %>%
  left_join(
    exceedance_data %>%
      filter(Standard == "EU") %>%
      select(Pollutant, EU_Exceedance = Proportion_Exceeding),
    by = "Pollutant"
  )

# Create the gt table
gt_table <- final_table %>%
  gt() %>%
  tab_header(
    title = "International Guidelines and Proportion of Cases Exceeding",
    subtitle = "WHO (2021) and EU (2024) Standards for Air Pollutants (Average in First Trimester)"
  ) %>%
  cols_label(
    Pollutant = "Pollutant",
    WHO_Threshold = "WHO Guideline (µg/m³)",
    WHO_Exceedance = "Proportion Exceeding WHO (%)",
    EU_Threshold = "EU Guideline (µg/m³)",
    EU_Exceedance = "Proportion Exceeding EU (%)"
  )

Pollutant	WHO Guideline (µg/m³)	EU Guideline (µg/m³)	Proportion Exceeding WHO (%)	Proportion Exceeding EU (%)
International Guidelines and Proportion of Cases Exceeding
WHO (2021) and EU (2024) Standards for Air Pollutants (Average in First Trimester)
PM2.5	5	10	88.9	10.1
PM10	15	20	45.7	13.4
NO2	10	20	69.7	20.2
O3	60	100	27.2	0.0

Exposure by season and year

Distribution of air pollutants by season and by year.

Code

#boxplots: air quality by season of conception

#select data: air pollutants and season
data1 <- data %>%
  select(pm25_trim_first, pm10_trim_first, o3_trim_first, no2_trim_first, 
         season_conception)

#reshape data into long format
data1_long <- data1 %>%
  gather(variable, value, pm25_trim_first:no2_trim_first)

#replace values of air pollutants
data1_long <- data1_long %>%
  mutate(variable = case_when(
    variable == "pm25_trim_first" ~ "PM 2.5",
    variable == "pm10_trim_first" ~ "PM 10",
    variable == "no2_trim_first" ~ "NO2",
    variable == "o3_trim_first" ~ "O3",
    TRUE ~ variable  # Keep the original value if it doesn't match any condition
  ))  %>%
  rename(Pollutant = variable)

#plot combined boxplots: season
airpoll_season <- ggplot(data1_long, aes(x = season_conception, y = value, fill = Pollutant)) +
  geom_boxplot() +
  labs(title = "Air Quality by season of conception",
       x = "Season of conception",
       y = "Concentration of air pollution (µm/m3)",
       fill = "Pollutant") +
  scale_fill_manual(values = c("blue", "red", "green", "purple"))  +
  theme_classic() +
  theme(axis.text.x = element_text(size = 16),  axis.text.y = element_text(size = 16),  
        legend.text = element_text(size = 16),  legend.title = element_text(size = 16), 
        plot.title = element_text(size = 20)) 

#boxplots: air quality by year

#select data: air pollutants and season
data2 <- data %>%
  select(pm25_trim_first, pm10_trim_first, o3_trim_first, no2_trim_first, 
         year_group)

#reshape data into long format
data2_long <- data2 %>%
  gather(variable, value, pm25_trim_first:no2_trim_first)

#replace values of air pollutants
data2_long <- data2_long %>%
  mutate(variable = case_when(
    variable == "pm25_trim_first" ~ "PM 2.5",
    variable == "pm10_trim_first" ~ "PM 10",
    variable == "no2_trim_first" ~ "NO2",
    variable == "o3_trim_first" ~ "O3",
    TRUE ~ variable  # Keep the original value if it doesn't match any condition
  ))  %>%
  rename(Pollutant = variable)

#plot combined boxplots: season
airpoll_year <- ggplot(data2_long, aes(x = year_group, y = value, fill = Pollutant)) +
  geom_boxplot() +
  labs(title = "Air Quality by year of conception",
       x = "Year of conception",
       y = "Concentration of air pollution (µm/m3)",
       fill = "Pollutant") +
  scale_fill_manual(values = c("blue", "red", "green", "purple"))  +
  theme_classic() +
  theme(axis.text.x = element_text(size = 16),  axis.text.y = element_text(size = 16),  
        legend.text = element_text(size = 16),  legend.title = element_text(size = 16), 
        plot.title = element_text(size = 20))

Correlation between exposures

Correlation between exposure to multiple air pollutants during first trimester (gestational week 0-13) in the study population. Note: Blue: coefficient >0; red: coefficient <0.

Code

#load libraries
pacman::p_load(AMR, 
               corrplot, 
               gridGraphics)
 
#set data to data.table
setDT(data)

# Correlation between first trimester exposures
data1 <- data %>%
  filter(
    !is.na(pm25_trim_first) & !is.na(pm10_trim_first) & !is.na(no2_trim_first) & !is.na(o3_trim_first)
  )

exp_matrix <- data1 %>%
  select(
    pm25_trim_first,
    pm10_trim_first,
    no2_trim_first,
    o3_trim_first
  )

colnames(exp_matrix) <- c("Trim1 PM2.5", "Trim1 PM10", "Trim1 NO2", "Trim1 O3")

#apply function cor to  matrix "exp_matrix
polday <- cor(as.matrix(exp_matrix))

corrplot(method = "ellipse", 
           polday, type = "lower", 
           order = "original", 
           title = element_blank(), 
           addCoef.col = 'black', tl.col = "black", 
           cl.pos = 'n', col = COL2('RdBu'),
           tl.srt = 0, 
           diag = FALSE, 
           mar = c(0, 0, 1, 0),
           tl.offset = 0.71, 
           col.lim = c(-1, 1),
           number.cex = 1.5, 
           tl.cex = 1.2,
           cl.cex = 1.3
)

Correlation between exposure to multiple air pollutants during first trimester (gestational week 0-13). and for individual weeks in first, second and third trimester in the study population. Note: Blue: coefficient >0; red: coefficient <0.

Code

#Correlation between weekly exposures Week 1, Week 13, Week 28

# Exclude cases with all exposures missing
# Exclude cases with all exposures missing
data1 <- data %>%
  filter(
    !is.na(pm25_trim_first) & !is.na(pm10_trim_first) & !is.na(no2_trim_first) & !is.na(o3_trim_first) & 
    !is.na(pm25_trim_second) & !is.na(pm10_trim_second) & !is.na(no2_trim_second) & !is.na(o3_trim_second) &
    !is.na(pm25_trim_third) & !is.na(pm10_trim_third) & !is.na(no2_trim_third) & !is.na(o3_trim_third)
  )

# Create the exposure matrix
exp_matrix <- data1 %>%
  select(
    pm25_trim_first, pm25_trim_second, pm25_trim_third, 
    pm10_trim_first, pm10_trim_second, pm10_trim_third, 
    no2_trim_first, no2_trim_second, no2_trim_third,
    o3_trim_first, o3_trim_second, o3_trim_third
  )

# Set the names for the figure
colnames(exp_matrix) <- c(
  "T1 PM2.5", "T2 PM2.5", "T3 PM2.5",
  "T1 PM10", "T2 PM10", "T3 PM10",
  "T1 NO2", "T2 NO2", "T3 NO2",
  "T1 O3", "T2 O3", "T3 O3"
)

# Apply the cor function to the exposure matrix
polday <- cor(as.matrix(exp_matrix))

# Save the correlation plot
corrplot_T1T2T3 <- corrplot(
  method = "ellipse", 
  polday, 
  type = "lower", 
  order = "original", 
  title = element_blank(), 
  addCoef.col = 'black', 
  tl.col = "black", 
  cl.pos = 'n', 
  col = COL2('RdBu'),
  tl.srt = 90, 
  diag = FALSE, 
  number.cex = 0.6, 
  mar = c(0, 0, 1, 0), 
  tl.offset = 0.5, 
  tl.cex = 0.8,
  col.lim = c(-1, 1),
  cl.cex = 0.7
)

Outcomes descriptives

Distribution of birth weight and birth weight ultrasound z-scores.

Code

#Box plot of zBW by tertiles

#plot birth_weight_zscore_tertiles by actual birth weight
boxplot_BWtertiles <- ggplot(data %>% filter(!is.na(birth_weight)), aes(x = BW_tertiles, y = birth_weight)) +
  geom_boxplot() + 
  labs(x = "Birth Weight Tertiles", y = "Birth Weight (kg)") +
  ggtitle("Boxplot of birth weight by Birth Weight Tertiles")

#plot birth_weight_zscore_tertiles by actual birth weight
boxplot_zBWtertiles <- ggplot(data %>% filter(!is.na(zBW)), aes(x = zBW_tertiles, y = zBW)) +
  geom_boxplot() + 
  labs(x = "Z-score Birth Weight Tertiles", y = "z-score Birth Weight") +
  ggtitle("Boxplot of z-score birth weight by z-score Birth Weight Tertiles")

#plot birth weight z-scores as histogram

zBW_histogram <- ggplot(data, aes(x = zBW)) +
  geom_histogram(
    binwidth = 0.5, # Adjust bin width as needed
    fill = "steelblue",
    color = "black",
    alpha = 0.7
  ) +
  labs(
    title = "Histogram of zBW",
    x = "zBW",
    y = "Frequency"
  ) +
  theme_minimal() +
  theme(
    plot.title = element_text(size = 16, face = "bold", hjust = 0.5), # Center title
    axis.title = element_text(size = 14),
    axis.text = element_text(size = 12)
  ) 

#Make ggplots for z-score distribution

# Select relevant columns
selected_zscore_columns <- data %>%
  select(
    zAC_W16W23, zBPD_W16W23, zFL_W16W23,
    zAC_W24W31, zBPD_W24W31, zFL_W24W31,
    zAC_W32W40, zBPD_W32W40, zFL_W32W40
  )

# Reshape data to long format
long_zscore_data <- selected_zscore_columns %>%
  pivot_longer(
    cols = everything(),
    names_to = c("Variable", "TimeWindow"),
    names_pattern = "z(.*)_(W\\d{2}W\\d{2})",
    values_to = "zScore") %>%
  filter(zScore > -10 & zScore < 10)

zscore_color_mapping <- c("AC" = "green", "BPD" = "red", "FL" = "blue")

# Create a combined plot for z-scores
zscore_combined_plot <- long_zscore_data %>%
  ggplot(aes(x = zScore, fill = Variable, color = Variable)) +
  geom_histogram(
    aes(y = ..density..),
    position = "identity",
    alpha = 0.05,
    bins = 30
  ) +
  geom_density(alpha = 0.3, size = 1) +
  scale_fill_manual(values = zscore_color_mapping) +
  scale_color_manual(values = zscore_color_mapping) +
  facet_wrap(~ TimeWindow, scales = "free", ncol = 1) +
  labs(
    title = "Z-Scores for ultrasound measures in three gestational perods",
    x = "Z-Score",
    y = "Density"
  ) +
  theme_minimal() +
  theme(
    legend.title = element_blank(),
    strip.text = element_text(face = "bold")
  )

#EFW z-scores

# Select relevant columns for EFW z-scores
selected_efw_zscore_columns <- data %>%
  select(
    zEFW_W16W23, 
    zEFW_W24W31, 
    zEFW_W32W40
  )

# Reshape data to long format for EFW z-scores
long_efw_zscore_data <- selected_efw_zscore_columns %>%
  pivot_longer(
    cols = everything(),
    names_to = "TimeWindow",
    names_pattern = "zEFW_(W\\d{2}W\\d{2})",
    values_to = "zScore"
  ) %>%
  filter(zScore > -10 & zScore < 10)

# Define color mapping for EFW (if needed, can be single color)
efw_color_mapping <- c("EFW" = "purple")

# Create the combined plot for EFW z-scores
efw_zscore_combined_plot <- long_efw_zscore_data %>%
  ggplot(aes(x = zScore, fill = "EFW", color = "EFW")) +
  geom_histogram(
    aes(y = ..density..),
    position = "identity",
    alpha = 0.05,
    bins = 30
  ) +
  geom_density(alpha = 0.3, size = 1) +
  scale_fill_manual(values = efw_color_mapping) +
  scale_color_manual(values = efw_color_mapping) +
  facet_wrap(~ TimeWindow, scales = "free", ncol = 1) +
  labs(
    title = "Z-Scores for EFW in three gestational periods",
    x = "Z-Score",
    y = "Density"
  ) +
  theme_minimal() +
  theme(
    legend.position = "none",  # Hide legend since only one variable
    strip.text = element_text(face = "bold")
  )

Warning: The dot-dot notation (`..density..`) was deprecated in ggplot2 3.4.0.
ℹ Please use `after_stat(density)` instead.

Outcome table and Missing outcomes

Distribution of birth and ultrasound raw outcomes, z-scores, and distribution of missing outcomes.

Code

#make trial dataset

trialdata <- data |> 
  select(zBW_tertiles,  zBW,
         birth_weight, birth_length, zBL, birth_headcircumference, zBHC,
         g_age_d_BPD_W16W23 , BPD_W16W23 , zBPD_W16W23 , 
                        g_age_d_BPD_W24W31 , BPD_W24W31 , zBPD_W24W31 , 
                        g_age_d_BPD_W32W40 , BPD_W32W40 , zBPD_W32W40 , 
                        g_age_d_AC_W16W23 , AC_W16W23, zAC_W16W23 , 
                        g_age_d_AC_W24W31 , AC_W24W31 , zAC_W24W31 , 
                        g_age_d_AC_W32W40 , AC_W32W40 , zAC_W32W40 , 
                        g_age_d_FL_W16W23 , FL_W16W23 , zFL_W16W23 , 
                        g_age_d_FL_W24W31 , FL_W24W31 , zFL_W24W31 , 
                        g_age_d_FL_W32W40 , FL_W32W40 , zFL_W32W40 , 
                        g_age_d_EFW_W16W23 , EFW_W16W23 , zEFW_W16W23 , 
                        g_age_d_EFW_W24W31 , EFW_W24W31 , zEFW_W24W31 , 
                        g_age_d_EFW_W32W40 , EFW_W32W40 , zEFW_W32W40)

#make trial crosstable by zBW
outcome_table <-
  tbl_summary(
    trialdata,
    include = c("birth_weight", "birth_length", "birth_headcircumference", 
                "zBL", "zBHC", 
      "BPD_W16W23", "BPD_W24W31", "BPD_W32W40",
"zBPD_W16W23", "zBPD_W24W31", "zBPD_W32W40", 
"AC_W16W23", "AC_W24W31", "AC_W32W40",
"zAC_W16W23", "zAC_W24W31", "zAC_W32W40", 
"FL_W16W23", "FL_W24W31", "FL_W32W40",
"zFL_W16W23", "zFL_W24W31", "zFL_W32W40", 
"EFW_W16W23", "EFW_W24W31", "EFW_W32W40",
"zEFW_W16W23", "zEFW_W24W31", "zEFW_W32W40"),
        by = zBW_tertiles, 
    label = list(birth_weight ~"Birth weight (kg)", birth_length ~"Birth length (cm)", birth_headcircumference ~"Birth head circumference (cm)", 
      BPD_W16W23 ~ "BPD W16-W23 (mm)", zBPD_W16W23 ~ "zBPD W16-W23", BPD_W24W31 ~ "BPD W24-W31 (mm)", zBPD_W24W31 ~ "zBPD W24-W31", BPD_W32W40 ~ "BPD W32-W40 (mm)", zBPD_W32W40 ~ "zBPD W32-W40", AC_W16W23 ~ "AC W16-W23 (mm)", zAC_W16W23 ~ "zAC W16-W23", AC_W24W31 ~ "AC W24-W31 (mm)", zAC_W24W31 ~ "zAC W24-W31", AC_W32W40 ~ "AC W32-W40 (mm)", zAC_W32W40 ~ "zAC W32-W40", FL_W16W23 ~ "FL W16-W23 (mm)", zFL_W16W23 ~ "zFL W16-W23", FL_W24W31 ~ "FL W24-W31 (mm)", zFL_W24W31 ~ "zFL W24-W31", FL_W32W40 ~ "FL W32-W40 (mm)", zFL_W32W40 ~ "zFL W32-W40", EFW_W16W23 ~ "EFW W16-W23 (g)", zEFW_W16W23 ~ "zEFW W16-W23", EFW_W24W31 ~ "EFW W24-W31 (g)", zEFW_W24W31 ~ "zEFW W24-W31", EFW_W32W40 ~ "EFW W32-W40 (g)", zEFW_W32W40 ~ "zEFW W32-W40"),
        statistic = list(
      all_continuous() ~ "{mean} ({sd})",
      all_categorical() ~ "{n} ({p}%)"), 
    digits = all_continuous() ~ 1, 
      missing = "no") |> 
  add_overall() |> 
modify_header(label = "**Characteristics**") |> 
  bold_labels() |> 
modify_caption("Table 1. Anthropometry characteristics. during pregnancy, stratified by tertiles of birth weight z-score")  |> 
 modify_spanning_header(c("stat_1", "stat_2", "stat_3") ~ "**zBW tertiles**")

#make missing outcome table
missing_outcome_table <- tbl_summary(
  trialdata, 
  by = zBW_tertiles, 
  include = c("birth_weight", "birth_length", "birth_headcircumference", 
              "BPD_W16W23", "AC_W16W23", "FL_W16W23", "EFW_W16W23",
              "BPD_W24W31", "AC_W24W31", "FL_W24W31", "EFW_W24W31", 
              "BPD_W32W40", "AC_W32W40", "FL_W32W40", "EFW_W32W40"), 
  label = list(BPD_W16W23 ~ "BPD W16-W23", BPD_W24W31 ~ "BPD W24-W31", 
               BPD_W32W40 ~ "BPD W32-W40", AC_W16W23 ~ "AC W16-W23", 
               AC_W24W31 ~ "AC W24-W31", AC_W32W40 ~ "AC W32-W40", 
               FL_W16W23 ~ "FL W16-W23",  FL_W24W31 ~ "FL W24-W31", 
               FL_W32W40 ~ "FL W32-W40", EFW_W16W23 ~ "EFW W16-W23", 
               EFW_W24W31 ~ "EFW W24-W31", EFW_W32W40 ~ "EFW W32-W40", 
               birth_weight ~"Birth weight", birth_length ~"Birth length", birth_headcircumference ~"Birth head circumference"),
   statistic = list(all_continuous() ~ "{mean}", missing = "no"), missing_stat = "({p_miss}%)") %>%
  modify_header(label = "**Characteristics**") %>%
   add_n(statistic = "{N_miss} ({p_miss}%)")  %>%
  modify_header(n = "**Missing**") %>%
  bold_levels() %>%
  modify_caption("Table 2. Missing proportions of outcome characteristics")

Table 1. Anthropometry characteristics. during pregnancy, stratified by tertiles of birth weight z-score
Characteristics	Overall N = 497,402¹	zBW tertiles
Characteristics	Overall N = 497,402¹	Low tertile zBW N = 165,831¹	Mid tertile zBW N = 165,774¹	High tertile zBW N = 165,797¹
Birth weight (kg)	3.5 (0.5)	3.1 (0.4)	3.5 (0.3)	4.0 (0.4)
Birth length (cm)	50.0 (4.5)	48.4 (4.7)	50.0 (4.2)	51.5 (4.1)
Birth head circumference (cm)	34.7 (3.8)	33.8 (3.9)	34.7 (3.6)	35.6 (3.7)
zBL	0.6 (1.1)	-0.3 (0.9)	0.6 (0.8)	1.4 (0.8)
zBHC	0.9 (1.1)	0.2 (1.1)	0.9 (1.0)	1.6 (1.0)
BPD W16-W23 (mm)	44.0 (3.7)	43.9 (3.6)	44.0 (3.5)	44.1 (4.0)
BPD W24-W31 (mm)	76.4 (6.4)	75.9 (5.9)	76.2 (6.5)	77.3 (6.8)
BPD W32-W40 (mm)	89.5 (6.6)	88.4 (5.8)	89.1 (7.3)	91.2 (6.6)
zBPD W16-W23	-0.5 (1.0)	-0.6 (0.9)	-0.5 (0.9)	-0.5 (1.2)
zBPD W24-W31	-0.4 (0.9)	-0.7 (0.9)	-0.3 (0.9)	0.0 (0.9)
zBPD W32-W40	-0.3 (1.4)	-0.7 (1.2)	-0.3 (1.6)	0.2 (1.3)
AC W16-W23 (mm)	139.1 (16.1)	137.7 (16.8)	139.1 (16.2)	140.6 (15.0)
AC W24-W31 (mm)	259.1 (26.1)	253.4 (22.2)	258.1 (25.6)	266.7 (28.6)
AC W32-W40 (mm)	324.9 (30.5)	313.2 (23.3)	322.2 (27.1)	341.5 (33.6)
zAC W16-W23	0.4 (1.7)	0.1 (1.9)	0.4 (1.8)	0.6 (1.5)
zAC W24-W31	0.8 (1.0)	0.2 (0.9)	0.8 (0.8)	1.5 (1.0)
zAC W32-W40	0.7 (1.1)	-0.1 (0.8)	0.7 (0.7)	1.5 (1.0)
FL W16-W23 (mm)	29.0 (3.8)	28.8 (3.9)	29.0 (4.0)	29.2 (3.5)
FL W24-W31 (mm)	56.8 (5.8)	56.4 (5.1)	56.7 (6.6)	57.5 (5.7)
FL W32-W40 (mm)	69.0 (5.7)	68.2 (4.8)	68.5 (5.1)	70.3 (6.8)
zFL W16-W23	0.3 (1.5)	0.2 (1.6)	0.4 (1.7)	0.5 (1.3)
zFL W24-W31	1.1 (1.4)	0.7 (1.1)	1.1 (1.9)	1.4 (1.1)
zFL W32-W40	1.1 (1.3)	0.6 (1.0)	1.1 (1.0)	1.5 (1.7)
EFW W16-W23 (g)	232.9 (123.2)	228.1 (138.0)	232.5 (121.5)	238.2 (108.3)
EFW W24-W31 (g)	1,507.6 (393.2)	1,429.6 (327.1)	1,492.8 (385.8)	1,613.2 (444.3)
EFW W32-W40 (g)	2,756.7 (639.8)	2,537.7 (465.9)	2,686.2 (581.8)	3,089.1 (732.1)
zEFW W16-W23	-1.9 (17.2)	-2.2 (28.1)	-1.8 (6.0)	-1.6 (7.8)
zEFW W24-W31	0.7 (1.0)	0.1 (0.9)	0.7 (0.8)	1.4 (0.9)
zEFW W32-W40	0.5 (1.0)	-0.3 (0.8)	0.5 (0.7)	1.4 (0.8)
¹ Mean (SD)

Table 2. Missing proportions of outcome characteristics
Characteristics	Missing	Low tertile zBW N = 165,831¹	Mid tertile zBW N = 165,774¹	High tertile zBW N = 165,797¹
Birth weight	0 (0%)	3.05	3.51	4.05
Birth length	0 (0%)	48.45	50.00	51.54
Birth head circumference	0 (0%)	33.76	34.70	35.61
BPD W16-W23	25,535 (5.1%)	43.89	43.96	44.12
Unknown		(5.4%)	(4.9%)	(5.1%)
AC W16-W23	64,546 (13%)	138	139	141
Unknown		(13%)	(13%)	(13%)
FL W16-W23	28,646 (5.8%)	28.81	28.99	29.18
Unknown		(6.0%)	(5.5%)	(5.8%)
EFW W16-W23	66,081 (13%)	228	232	238
Unknown		(14%)	(13%)	(13%)
BPD W24-W31	418,159 (84%)	76	76	77
Unknown		(82%)	(85%)	(85%)
AC W24-W31	418,389 (84%)	253	258	267
Unknown		(82%)	(85%)	(85%)
FL W24-W31	422,171 (85%)	56.4	56.7	57.5
Unknown		(83%)	(86%)	(86%)
EFW W24-W31	422,323 (85%)	1,430	1,493	1,613
Unknown		(83%)	(86%)	(86%)
BPD W32-W40	322,849 (65%)	88.4	89.1	91.2
Unknown		(60%)	(69%)	(66%)
AC W32-W40	318,599 (64%)	313	322	341
Unknown		(59%)	(68%)	(65%)
FL W32-W40	331,598 (67%)	68.2	68.5	70.3
Unknown		(61%)	(70%)	(68%)
EFW W32-W40	331,933 (67%)	2,538	2,686	3,089
Unknown		(62%)	(70%)	(68%)
¹ Mean

Repeated ultrasound outcomes during pregnancy: Availability

To show cases with repeated ultrasound measurements for gestational windows

Code

#Variables indicating overall data availability by gestational window
data <- data %>%
  mutate(
    # For weeks 16-23
    avail_W16W23 = if_else(
      rowSums(!is.na(select(., starts_with("BPD_W16W23"), 
                            starts_with("AC_W16W23"), 
                            starts_with("FL_W16W23"), 
                            starts_with("EFW_W16W23")))) > 0, 
      "Yes", "Missing"),
    # For weeks 24-31
    avail_W24W31 = if_else(
      rowSums(!is.na(select(., starts_with("BPD_W24W31"), 
                            starts_with("AC_W24W31"), 
                            starts_with("FL_W24W31"), 
                            starts_with("EFW_W24W31")))) > 0, 
      "Yes", "Missing"),
    # For weeks 32-40
    avail_W32W40 = if_else(
      rowSums(!is.na(select(., starts_with("BPD_W32W40"), 
                            starts_with("AC_W32W40"), 
                            starts_with("FL_W32W40"), 
                            starts_with("EFW_W32W40")))) > 0, 
      "Yes", "Missing"),
    # For birth variables
    avail_birth = if_else(
      rowSums(!is.na(select(., birth_weight, birth_length, birth_headcircumference))) > 0, 
      "Yes", "Missing")
  )

proportions <- data %>%
  mutate(
    time_points_available = rowSums(select(., avail_W16W23, avail_W24W31, avail_W32W40) == "Yes"),
    availability_group = case_when(
      time_points_available == 3 ~ "Measurements in all 3 gestational windows",
      time_points_available == 2 ~ "Measurements in 2 of 3 gestational windows",
      time_points_available == 1 ~ "Measurements in 1 of 3 gestational windows",
      time_points_available == 0 ~ "0 time points"
    )
  ) %>%
  group_by(availability_group) %>%
  summarise(
    count = n(),
    proportion = n() / nrow(data)
  ) %>%
  ungroup()

# Filter out the "0 time points" category
proportions <- proportions %>%
  filter(availability_group != "0 time points") %>%
  mutate(label = as.character(count)) # Only include the number of cases in the label

# Create the pie chart
ggplot(proportions, aes(x = "", y = proportion, fill = availability_group)) +
  geom_bar(stat = "identity", width = 1) +
  coord_polar("y", start = 0) +
  theme_void() +
  labs(
    title = "Availability of Repeated Ultrasound Measuremnets Across Gestational Windows (n)",
    fill = "G"
  ) +
  geom_text(aes(label = label), position = position_stack(vjust = 0.5), size = 5) + # Add case count labels
  theme(
    legend.title = element_text(size = 14),  # Increase legend title font size
    legend.text = element_text(size = 12)    # Increase legend text font size
  ) +
  scale_fill_brewer(palette = "Set2")

Sociodemographics descriptives

Demographics by zBW tertiles and demographics for cases with missing outcomes by trimester.

Code

#Demographics by tertiles 

#make trial dataset
trialdata <- data |> 
  select(zBW_tertiles, gest_age_birth, birth_weight, birth_length, birth_headcircumference, zBHC,
sex, fstart, season, geogr, mat_age_cat, meduc, bmi_cat, smoke_preg, alcohol_cat, hypertension, dm2)

#make crosstable by zBW
baseline_table <-
  tbl_summary(
    trialdata,
    include = c("gest_age_birth", "birth_weight", #anthropometrics
                "sex", "fstart", "season",  #fetal characteristics
                "geogr", "mat_age_cat", "meduc", "bmi_cat", "smoke_preg"), #maternal characteristics
        by = zBW_tertiles, 
    label = list(gest_age_birth ~ "Gestational age at birth (days)", birth_weight ~ "Birth weight (kg)",
    sex ~ "Fetal sex", fstart ~ "Birth type", season ~ "Season of birth", 
    geogr ~ "Region of residency", mat_age_cat ~ "Maternal age at birth (years)", meduc ~ "Maternal education", bmi_cat ~ "Maternal body mass index", smoke_preg ~ "Maternal smoking status"),
        statistic = list(
      all_continuous() ~ "{mean} ({sd})",
      all_categorical() ~ "{n} ({p}%)"), 
    digits = all_continuous() ~ 1, 
    missing_text = "Missing") |> 
  add_overall() |> 
modify_header(label = "**Characteristics**") |> 
  bold_labels() |> 
modify_caption("Table 1. Baseline characteristics. Birth anthropometrics, fetal and maternal characteristics stratified by tertiles of birth weight z-score")  |> 
 modify_spanning_header(c("stat_1", "stat_2", "stat_3") ~ "**zBW tertiles**")

Table: Characteristics for cases with available ultrasound parameters in respective gestational windows
Characteristics	Available data W16W23	Available data W24W31	Available data W32W40
Characteristics	N = 472,098¹	N = 79,337¹	N = 179,135¹
Gestational age at birth (days)	278.7 (11.4)	274.5 (14.4)	277.9 (10.9)
Birth weight (kg)	3.5 (0.5)	3.4 (0.6)	3.5 (0.6)
Fetal sex
Female	229,515 (49%)	39,301 (50%)	87,815 (49%)
Male	242,583 (51%)	40,036 (50%)	91,320 (51%)
Birth type
Cesarean section	35,863 (7.6%)	10,134 (13%)	18,167 (10%)
Induction	85,997 (18%)	20,413 (26%)	46,314 (26%)
Spontaneous	350,238 (74%)	48,790 (61%)	114,654 (64%)
Season of birth
Winter	104,575 (22%)	18,286 (23%)	40,781 (23%)
Spring	122,428 (26%)	20,031 (25%)	45,705 (26%)
Summer	128,676 (27%)	21,725 (27%)	47,572 (27%)
Fall	116,419 (25%)	19,295 (24%)	45,077 (25%)
Region of residency
North	43,318 (9.2%)	7,389 (9.3%)	14,222 (7.9%)
Central	200,986 (43%)	37,950 (48%)	81,978 (46%)
South	227,794 (48%)	33,998 (43%)	82,935 (46%)
Maternal age at birth (years)
≤24	52,946 (11%)	8,448 (11%)	18,579 (10%)
25-29	148,574 (31%)	22,436 (28%)	51,991 (29%)
30-34	164,063 (35%)	26,241 (33%)	61,153 (34%)
≥35	106,461 (23%)	22,201 (28%)	47,394 (26%)
Missing	54	11	18
Maternal education
High school	152,610 (38%)	25,996 (40%)	57,919 (39%)
Primary or lower	33,412 (8.4%)	6,116 (9.3%)	13,175 (8.8%)
University or higher	212,411 (53%)	33,501 (51%)	79,286 (53%)
Missing	73,665	13,724	28,755
Maternal body mass index
Underweight	12,186 (2.7%)	2,610 (3.4%)	5,512 (3.2%)
Normalweight	264,628 (58%)	39,056 (51%)	91,291 (53%)
Overweight	117,423 (26%)	19,331 (25%)	42,931 (25%)
Obesity	62,225 (14%)	15,203 (20%)	32,712 (19%)
Missing	15,636	3,137	6,689
Maternal smoking status
Smoker	20,506 (4.7%)	4,791 (6.5%)	10,822 (6.5%)
Non-smoker	412,917 (95%)	68,385 (93%)	154,505 (93%)
Missing	38,675	6,161	13,808
¹ Mean (SD); n (%)

Table: Characteristics for cases with missing ultrasound parameters in respective gestational windows
Characteristics	Missing data W16W23	Missing data W24W31	Missing data W32W40
Characteristics	N = 25,304¹	N = 418,065¹	N = 318,267¹
Gestational age at birth (days)	276.8 (13.6)	279.3 (10.7)	278.9 (11.8)
Birth weight (kg)	3.5 (0.6)	3.6 (0.5)	3.6 (0.5)
Fetal sex
Female	12,222 (48%)	202,436 (48%)	153,922 (48%)
Male	13,082 (52%)	215,629 (52%)	164,345 (52%)
Birth type
Cesarean section	2,491 (9.8%)	28,220 (6.8%)	20,187 (6.3%)
Induction	5,818 (23%)	71,402 (17%)	45,501 (14%)
Spontaneous	16,995 (67%)	318,443 (76%)	252,579 (79%)
Season of birth
Winter	5,999 (24%)	92,288 (22%)	69,793 (22%)
Spring	6,175 (24%)	108,572 (26%)	82,898 (26%)
Summer	6,692 (26%)	113,643 (27%)	87,796 (28%)
Fall	6,438 (25%)	103,562 (25%)	77,780 (24%)
Region of residency
North	2,887 (11%)	38,816 (9.3%)	31,983 (10%)
Central	6,793 (27%)	169,829 (41%)	125,801 (40%)
South	15,624 (62%)	209,420 (50%)	160,483 (50%)
Maternal age at birth (years)
≤24	2,968 (12%)	47,466 (11%)	37,335 (12%)
25-29	7,315 (29%)	133,453 (32%)	103,898 (33%)
30-34	8,085 (32%)	145,907 (35%)	110,995 (35%)
≥35	6,930 (27%)	91,190 (22%)	65,997 (21%)
Missing	6	49	42
Maternal education
High school	7,994 (39%)	134,608 (38%)	102,685 (38%)
Primary or lower	1,863 (9.1%)	29,159 (8.3%)	22,100 (8.2%)
University or higher	10,567 (52%)	189,477 (54%)	143,692 (54%)
Missing	4,880	64,821	49,790
Maternal body mass index
Underweight	484 (2.1%)	10,060 (2.5%)	7,158 (2.3%)
Normalweight	11,423 (49%)	236,995 (59%)	184,760 (60%)
Overweight	6,303 (27%)	104,395 (26%)	80,795 (26%)
Obesity	4,942 (21%)	51,964 (13%)	34,455 (11%)
Missing	2,152	14,651	11,099
Maternal smoking status
Smoker	1,301 (5.8%)	17,016 (4.4%)	10,985 (3.8%)
Non-smoker	21,060 (94%)	365,592 (96%)	279,472 (96%)
Missing	2,943	35,457	27,810
¹ Mean (SD); n (%)

Table: Characteristics for cases stratified by repeated outcome availability
Characteristics	Number of gestational windows for which data is available			3 N = 41,593¹
Characteristics	0 N = 10,575¹	1 N = 284,677¹	2 N = 160,557¹	3 N = 41,593¹
Gestational age at birth (days)	278.4 (13.2)	279.5 (10.9)	277.8 (12.3)	275.2 (11.6)
Birth weight (kg)	3.5 (0.6)	3.6 (0.5)	3.5 (0.6)	3.4 (0.6)
Fetal sex
Female	5,101 (48%)	137,325 (48%)	78,627 (49%)	20,684 (50%)
Male	5,474 (52%)	147,352 (52%)	81,930 (51%)	20,909 (50%)
Birth type
Cesarean section	750 (7.1%)	16,909 (5.9%)	14,830 (9.2%)	5,865 (14%)
Induction	1,845 (17%)	40,205 (14%)	36,776 (23%)	12,989 (31%)
Spontaneous	7,980 (75%)	227,563 (80%)	108,951 (68%)	22,739 (55%)
Season of birth
Winter	2,398 (23%)	62,274 (22%)	36,338 (23%)	9,564 (23%)
Spring	2,473 (23%)	74,506 (26%)	41,214 (26%)	10,410 (25%)
Summer	2,971 (28%)	78,107 (27%)	43,004 (27%)	11,286 (27%)
Fall	2,733 (26%)	69,790 (25%)	40,001 (25%)	10,333 (25%)
Region of residency
North	1,577 (15%)	27,878 (9.8%)	13,199 (8.2%)	3,551 (8.5%)
Central	2,291 (22%)	111,073 (39%)	73,404 (46%)	21,011 (51%)
South	6,707 (63%)	145,726 (51%)	73,954 (46%)	17,031 (41%)
Maternal age at birth (years)
≤24	1,239 (12%)	33,334 (12%)	17,384 (11%)	3,957 (9.5%)
25-29	3,200 (30%)	93,615 (33%)	47,836 (30%)	11,238 (27%)
30-34	3,384 (32%)	99,922 (35%)	54,991 (34%)	13,851 (33%)
≥35	2,752 (26%)	57,769 (20%)	40,323 (25%)	12,547 (30%)
Missing	0	37	23	0
Maternal education
High school	3,468 (39%)	91,258 (38%)	52,367 (39%)	13,511 (39%)
Primary or lower	699 (7.9%)	19,633 (8.2%)	11,759 (8.7%)	3,184 (9.3%)
University or higher	4,664 (53%)	129,142 (54%)	71,460 (53%)	17,712 (51%)
Missing	1,744	44,644	24,971	7,186
Maternal body mass index
Underweight	164 (1.7%)	6,201 (2.3%)	4,808 (3.1%)	1,497 (3.7%)
Normalweight	5,243 (53%)	166,285 (61%)	84,879 (55%)	19,644 (49%)
Overweight	2,846 (29%)	72,279 (26%)	38,397 (25%)	10,204 (25%)
Obesity	1,606 (16%)	29,675 (11%)	27,193 (18%)	8,693 (22%)
Missing	716	10,237	5,280	1,555
Maternal smoking status
Smoker	378 (4.0%)	9,242 (3.6%)	9,684 (6.5%)	2,503 (6.5%)
Non-smoker	9,138 (96%)	249,926 (96%)	138,858 (93%)	36,055 (94%)
Missing	1,059	25,509	12,015	3,035
¹ Mean (SD); n (%)

Table 1. Baseline characteristics. Birth anthropometrics, fetal and maternal characteristics stratified by tertiles of birth weight z-score
Characteristics	Overall N = 497,402¹	zBW tertiles
Characteristics	Overall N = 497,402¹	Low tertile zBW N = 165,831¹	Mid tertile zBW N = 165,774¹	High tertile zBW N = 165,797¹
Gestational age at birth (days)	278.6 (11.5)	277.6 (12.1)	278.3 (11.8)	279.8 (10.4)
Birth weight (kg)	3.5 (0.5)	3.1 (0.4)	3.5 (0.3)	4.0 (0.4)
Fetal sex
Female	241,737 (49%)	79,380 (48%)	81,181 (49%)	81,176 (49%)
Male	255,665 (51%)	86,451 (52%)	84,593 (51%)	84,621 (51%)
Birth type
Cesarean section	38,354 (7.7%)	9,848 (5.9%)	12,569 (7.6%)	15,937 (9.6%)
Induction	91,815 (18%)	32,069 (19%)	27,120 (16%)	32,626 (20%)
Spontaneous	367,233 (74%)	123,914 (75%)	126,085 (76%)	117,234 (71%)
Season of birth
Winter	110,574 (22%)	37,142 (22%)	37,075 (22%)	36,357 (22%)
Spring	128,603 (26%)	42,875 (26%)	43,024 (26%)	42,704 (26%)
Summer	135,368 (27%)	44,671 (27%)	45,010 (27%)	45,687 (28%)
Fall	122,857 (25%)	41,143 (25%)	40,665 (25%)	41,049 (25%)
Region of residency
North	46,205 (9.3%)	14,629 (8.8%)	15,436 (9.3%)	16,140 (9.7%)
Central	207,779 (42%)	70,317 (42%)	69,586 (42%)	67,876 (41%)
South	243,418 (49%)	80,885 (49%)	80,752 (49%)	81,781 (49%)
Maternal age at birth (years)
≤24	55,914 (11%)	21,679 (13%)	18,646 (11%)	15,589 (9.4%)
25-29	155,889 (31%)	52,489 (32%)	52,743 (32%)	50,657 (31%)
30-34	172,148 (35%)	55,880 (34%)	57,248 (35%)	59,020 (36%)
≥35	113,391 (23%)	35,759 (22%)	37,115 (22%)	40,517 (24%)
Missing	60	24	22	14
Maternal education
High school	160,604 (38%)	51,678 (38%)	52,750 (38%)	56,176 (40%)
Primary or lower	35,275 (8.4%)	14,146 (10%)	11,376 (8.1%)	9,753 (6.9%)
University or higher	222,978 (53%)	71,859 (52%)	75,662 (54%)	75,457 (53%)
Missing	78,545	28,148	25,986	24,411
Maternal body mass index
Underweight	12,670 (2.6%)	6,712 (4.2%)	3,963 (2.5%)	1,995 (1.2%)
Normalweight	276,051 (58%)	99,690 (62%)	95,339 (60%)	81,022 (51%)
Overweight	123,726 (26%)	36,060 (23%)	40,442 (25%)	47,224 (30%)
Obesity	67,167 (14%)	17,292 (11%)	20,186 (13%)	29,689 (19%)
Missing	17,788	6,077	5,844	5,867
Maternal smoking status
Smoker	21,807 (4.8%)	10,047 (6.6%)	6,593 (4.3%)	5,167 (3.4%)
Non-smoker	433,977 (95%)	142,026 (93%)	145,375 (96%)	146,576 (97%)
Missing	41,618	13,758	13,806	14,054
¹ Mean (SD); n (%)

Included vs Excluded

Demographics of included (inclusion criteria: date of conception, ultrasound, exposure data) vs excluded (SPR dataset).

Table. Included and Excluded Characteristics
Characteristics	All SPR births n=620,866
Characteristics	Excluded N = 123,464¹	Included N = 497,402¹
Gestational age at birth (days)	39.4 (2.6)	39.7 (1.6)
Birth weight (g)	3,462.7 (655.3)	3,537.8 (535.6)
Preterm birth	7,964 (7.2%)	22,853 (4.6%)
Fetal sex
Boy	63,539 (52%)	255,665 (51%)
Girl	59,765 (48%)	241,737 (49%)
Unknown	0 (0%)	0 (0%)
Birth type
Elektiv sectio	7,136 (5.8%)	38,354 (7.7%)
Induktion	18,174 (15%)	91,815 (18%)
Spontan	98,154 (80%)	367,233 (74%)
Region of residency
North	7,272 (11%)	46,205 (9.3%)
Central	10,618 (17%)	207,779 (42%)
South	45,453 (72%)	243,418 (49%)
Maternal age at birth (years)
<=24	20,186 (16%)	55,914 (11%)
25-29	40,132 (33%)	155,889 (31%)
30-34	39,267 (32%)	172,148 (35%)
>=35	23,746 (19%)	113,391 (23%)
Maternal education
Primary or lower	10,327 (10%)	35,275 (7.7%)
High school	36,552 (36%)	160,604 (35%)
University or higher	42,808 (42%)	222,978 (49%)
Don´t know	12,222 (12%)	40,327 (8.8%)
Maternal body mass index
Underweight	2,399 (2.5%)	12,670 (2.6%)
Normalweight	52,599 (54%)	276,051 (58%)
Overweight	26,991 (28%)	123,726 (26%)
Obesity	15,371 (16%)	67,167 (14%)
Maternal smoking status	4,964 (6.0%)	21,807 (4.8%)
¹ Mean (SD); n (%)

Analysis

Association between IQR increment in exposure and outcomes

To assess the association between mean air pollution levels during the first and exploratively second trimester of pregnancy and outcomes in single pollutant linear regression models.

Exposure (independent variable, IQR increment): Mean preconception and first trimester levels of PM2.5, PM10, NO2 and O3

Outcomes (dependent variable, z-scores): zAC_W16W23, zFL_W16W23, zBPD_W16W23, zAC_W24W31, zFL_W24W31, zBPD_W24W31, zAC_W32W40, zFL_W32W40, zBPD_W32W40, zBW, zBL, zBHC

Co-variates (adjusted for in all models): year and month of conception, maternal education, BMI and age

Function Linear regression: First and second trimester exposure and all fetal and birth outcomes

Results written to T1_birth_regression_results.xlsx

Results written to T1_W16W23_regression_results.xlsx

Results written to T1_W24W31_regression_results.xlsx

Results written to T1_W32W40_regression_results.xlsx

Results written to T2_birth_regression_results.xlsx

Results written to T2_W24W31_regression_results.xlsx

Results written to T2_W32W40_regression_results.xlsx

Combined excel files for T1 and for T2

By pollutant. T1 exposure and outcomes. Associations in linear regression

By time point in pregnancy: T1 exposure and outcomes Associations in linear regression

By time point in pregnancy: T2 exposure and outcomes Associations in linear regression

Subanalysis: Repeated observations W16W23 and W32W40(n=153,000) : Linear regression

Results written to T1_birth_regression_results.xlsx

Results written to T1_W16W23_regression_results.xlsx

Results written to T1_W24W31_regression_results.xlsx

Results written to T1_W32W40_regression_results.xlsx

Mean levels and correlation of exposure Trimester 1 and Trimester 2

Mean exposure levels in T1 and T2

             T1        T2
PM2.5  7.551261  7.500292
PM10  15.219967 15.310912
NO2   14.259364 14.329791
O3    53.326492 54.175025

Correlation of exposure levels in T1 and T2

Assessment of linearity

Redo zAC, zFL, zBPD, zEFW linear regression

Redo zBW / zBL / zBHC linear regression

tertile linear regression: zAC, zFL, zBPD, zEFW

tertile linear regression zBW / zBL / zBHC

Prepare for Tertile/Linear regression Forest Plot

[1] "pollutant"           "exposure intercept"  "intercept threshold"
[4] "p"                   "beta"                "ci_low"             
[7] "ci_up"               "outcome"

Function Forest Plot

Assessment of linearity: GLM model zBW

png 
  2

png 
  2

png 
  2

png 
  2

Assessment of linearity: GLM model zAC_W16W23

png 
  2

png 
  2

png 
  2

png 
  2

Stratification birth weight / zBPDW16W23

Two pollutant model

png 
  2

png 
  2

zBW two pollutant

zBPD two pollutant

G computational method

G-computation for analyzing the effects of exposure mixtures. Quantile g-computation yields estimates of the effect of increasing all exposures by one quantile, simultaneously. This, it estimates a “mixture effect” useful in the study of exposure mixtures such as air pollution.

Linear model:

Adjusted for year and month of conception, education, BMI and maternal age 4 Quantiles

G computational models, 4 quantiles without Ozone

      Outcome   Beta  Lower  Upper Sum_positive Sum_negative  Analysis
1 zBPD_W16W23 -0.010 -0.347 -0.014        0.000        -0.01 Q4, no O3
2  zAC_W16W23 -0.006  0.373 -0.013        0.003        -0.01 Q4, no O3
3  zFL_W16W23  0.002  0.287 -0.004        0.012        -0.01 Q4, no O3

G computational models, 4 quantiles with Ozone

      Outcome   Beta  Lower  Upper Sum_positive Sum_negative    Analysis
1 zBPD_W16W23 -0.014 -0.341 -0.019        0.000       -0.014 Q4, with O3
2  zAC_W16W23 -0.007  0.373 -0.018        0.004       -0.011 Q4, with O3
3  zFL_W16W23  0.010  0.269  0.001        0.018       -0.008 Q4, with O3

5 quantiles, without ozone

      Outcome   Beta  Lower  Upper Sum_positive Sum_negative  Analysis
1 zBPD_W16W23 -0.008 -0.337 -0.011        0.001       -0.009 Q5, no O3
2  zAC_W16W23 -0.006  0.375 -0.011        0.004       -0.010 Q5, no O3
3  zFL_W16W23  0.001  0.298 -0.004        0.010       -0.009 Q5, no O3

10 quantiles, without ozone

      Outcome   Beta  Lower  Upper Sum_positive Sum_negative   Analysis
1 zBPD_W16W23 -0.004 -0.339 -0.005        0.000       -0.004 Q10, no O3
2  zAC_W16W23 -0.003  0.366 -0.005        0.001       -0.004 Q10, no O3
3  zFL_W16W23  0.001  0.282 -0.002        0.005       -0.005 Q10, no O3

Joint G computational results for first trimester: with/without O3, Q4, Q5, Q10

       Outcome   Beta  Lower  Upper Sum_positive Sum_negative    Analysis
1   zAC_W16W23 -0.003  0.366 -0.005        0.001       -0.004  Q10, no O3
2   zAC_W16W23 -0.006  0.373 -0.013        0.003       -0.010   Q4, no O3
3   zAC_W16W23 -0.007  0.373 -0.018        0.004       -0.011 Q4, with O3
4   zAC_W16W23 -0.006  0.375 -0.011        0.004       -0.010   Q5, no O3
5  zBPD_W16W23 -0.004 -0.339 -0.005        0.000       -0.004  Q10, no O3
6  zBPD_W16W23 -0.010 -0.347 -0.014        0.000       -0.010   Q4, no O3
7  zBPD_W16W23 -0.014 -0.341 -0.019        0.000       -0.014 Q4, with O3
8  zBPD_W16W23 -0.008 -0.337 -0.011        0.001       -0.009   Q5, no O3
9   zFL_W16W23  0.001  0.282 -0.002        0.005       -0.005  Q10, no O3
10  zFL_W16W23  0.002  0.287 -0.004        0.012       -0.010   Q4, no O3
11  zFL_W16W23  0.010  0.269  0.001        0.018       -0.008 Q4, with O3
12  zFL_W16W23  0.001  0.298 -0.004        0.010       -0.009   Q5, no O3

Longitudinal models

Calculate Delta Beta for zBPD, zAC, zFL

Visualise differences in Δz-scores

zBPD_d1d2, zBPD_d1d3, zBPD_d2d3 zAC_d1d2, zAC_d1d3, zAC_d2d3 zFL_d1d2, zFL_d1d3, zFL_d2d3

Longitudinal analysis: Association between IQR increment in exposure and difference in outcomes

To assess the association between mean air pollution levels during the first trimester of pregnancy and differences in outcomes in single pollutant linear regression models.

Exposure (independent variable, IQR increment): Mean first trimester levels of PM2.5, PM10, NO2 and O3

Outcomes (dependent variable, z-scores): zBPD_d1d2, zBPD_d1d3, zBPD_d2d3 zAC_d1d2, zAC_d1d3, zAC_d2d3 zFL_d1d2, zFL_d1d3, zFL_d2d3

Co-variates (adjusted for in all models): year and month of conception, maternal education, BMI and age

Function Linear regression

Results written to delta_BPD_regression_results.xlsx

Results written to delta_AC_regression_results.xlsx

Results written to delta_FL_regression_results.xlsx

Combined excel files

By pollutant. Associations between T1 exposure and Δ outcomes in linear regression

By outcome. Associations between T1 exposure and Δ outcomes in linear regression

KUB sensitivity analyses

Linear regression among KUB cases

Association between IQR increment in exposure and outcomes

To assess the association between mean air pollution levels during the first and exploratively second trimester of pregnancy and outcomes in single pollutant linear regression models.

Exposure (independent variable, IQR increment): Mean preconception and first trimester levels of PM2.5, PM10, NO2 and O3

Outcomes (dependent variable, z-scores): zAC_W16W23, zFL_W16W23, zBPD_W16W23, zAC_W24W31, zFL_W24W31, zBPD_W24W31, zAC_W32W40, zFL_W32W40, zBPD_W32W40, zBW, zBL, zBHC

Co-variates (adjusted for in all models): year and month of conception, maternal education, BMI and age

Function Linear regression: First and second trimester exposure and all fetal and birth outcomes

Results written to T1_birth_regression_results.xlsx

Results written to T1_W16W23_regression_results.xlsx

Results written to T1_W24W31_regression_results.xlsx

Results written to T1_W32W40_regression_results.xlsx

Results written to T2_birth_regression_results.xlsx

Results written to T2_W24W31_regression_results.xlsx

Results written to T2_W32W40_regression_results.xlsx

Combined excel files for T1 and for T2

By pollutant. T1 exposure and outcomes. Associations in linear regression

To do list: - Update flow chart - Add repeated measurement to flow chart - Check G computational results?

Presentation - Summarize results - Delayed effect on FL over time? - Pronounced effect on birth weight - G-computational model for all outcomes?