Air Pollution and Fetal Growth in Sweden

Author

Nikolaus Mezger

Purpose

To explore descriptives on exposure, outcome and sociodemographics including data availability.

To explore associations between air pollution and multiple pregnancy and neonatal growth outcomes.

A data base compiling fetal ultrasound and birth anthropometric outcomes, corresponding z-scores, sociodemographic information on pregnant and their children, and corresponding exposure data on 4 air pollutants by gestational week is used.

Data flow chart

Datasets used

Birth dataset English Language: chamnha_healthdata Cases: n=620,866; Individual ids: 620,866; 2014-2019; Variables: ID, date of birth, gestational age, demographics

Birth dataset Swedish language: db Cases: n=639,236; Individual ids: 629,989; 2014-2019; Variables: ID, date of birth, gestational age, demographics

Ultrasound dataset: ultraljud_foster; Individual ids: 569038; 2014-2019; Variables: ID, date of ultrasound (several per case), number of fetuses, ultrasound measures (crl, bpd etc.)

KUB ultrasound dataset: kub_undersokningar; Cases: n=213,250; Individual ids: 205,390; 2014-2019; Variables: ID, date of birth, ultrasound measures

Exposure dataset: Individual ids: n=500,958; exposure data provided by spatiotemporal model

Code
# Review availability of cases in datasets used in this project

setwd("P:/EnvEpi-Projekt/CHAMNHA/6_ultrasound_nikolaus")
load("1_originaldata/chamnha_healthdata.Rda")
#names(ch_ht)
#num_unique_ids <- length(unique(ch_ht$Graviditet))
#num_unique_ids
#access datasets: Birth dataset Swedish language
setwd("P:/EnvEpi-Projekt/CHAMNHA")
load("1_paper_temp_ptb/02_datasets/health_data.Rda")
#names(db)
#num_unique_ids <- length(unique(db$GraviditetID))
#num_unique_ids
#access datasets: Ultrasound dataset
setwd("P:/EnvEpi-Projekt/CHAMNHA/6_ultrasound_nikolaus")
data_us <- as.data.table(read.csv(file = "1_originaldata/ultraljud_foster.csv", sep = ";")) 
#names(data_us)
#num_unique_ids <- length(unique(data_us$graviditetid))
#num_unique_ids 
#access datasets: KUB dataset
setwd("P:/EnvEpi-Projekt/CHAMNHA/6_ultrasound_nikolaus")
load("2_processeddata/conceptiondate_all/conceptiondate_kub.Rda")
#names(data1)
#num_unique_ids <- length(unique(data1$id))
#num_unique_ids 

Identical cases matrix across datasets

Overlap matrix of identical cases between the four datasets Birth dataset English Language, Birth dataset Swedish language, Ultrasound dataset, KUB ultrasound dataset

Code
#Rename id variable, remove other cases, deduplicate cases 
bir_en <- ch_ht %>%
  rename(id = Graviditet) %>%
  select(id) %>%
  distinct(id, .keep_all = TRUE)
bir_sw <- db %>%
  rename(id = GraviditetID) %>%
  select(id) %>%
  distinct(id, .keep_all = TRUE)
ultra <- data_us %>%
  rename(id = graviditetid) %>%
  select(id) %>%
  distinct(id, .keep_all = TRUE)
kub <- data1 %>%
  select(id) %>%
  distinct(id, .keep_all = TRUE)
rm(data1, db, ch_ht, data_us)

#Make a matrix of identical cases in the four datasets

# Count the total number of cases in each dataset
dataset_counts <- sapply(list(bir_en, bir_sw, ultra, kub), nrow)
names(dataset_counts) <- c("bir_en", "bir_sw", "ultra", "kub")

# Calculate the overlap matrix
datasets <- list(bir_en$id, bir_sw$id, ultra$id, kub$id)
overlap_matrix <- matrix(0, nrow = 4, ncol = 4, dimnames = list(names(dataset_counts), names(dataset_counts)))

for (i in 1:4) {
  for (j in 1:4) {
    overlap_matrix[i, j] <- length(intersect(datasets[[i]], datasets[[j]]))
  }
}

# Add dataset counts as row and column labels
rownames(overlap_matrix) <- paste0(names(dataset_counts), " (", dataset_counts, ")")
colnames(overlap_matrix) <- paste0(names(dataset_counts), " (", dataset_counts, ")")

overlap_matrix

Suggested inclusion criteria

  • registered in the Swedish Pregnancy Register 2014-2019: birth_eng/ch_ht (n=620866) and birth_swe/db (n=629989)
  • any ultrasound outcome during pregnancy available: ultrasound dataset (n=569038) and kub (n=205390)
  • date of conception via: 1) kub exam; 2) ultrasound exam; 3) birth dataset
  • residential address available for matching exposure with individual pregnant women (n=568922)
  • birth weight z-score: birth weight z-score calculated by gestational age-specific birth weight provided by birth_eng/birth_swe datasets

Exclusion criteria

  • no outcome data or extreme outliers (ultrasound/birth weight: z-score >5, < -5)

Flow chart

Data transformation

The dataset is modified for presentation purposes and variables. For example, variables on month, year and season of conception, birth weight z-score tertiles are defined.

To Do: Add data from Swedish birth dataset (sociodemographics need to be translated and merged)

Code
rm(list = ls())
# set wd for manual work (even though already specified in YAML code)
setwd("P:/EnvEpi-Projekt/CHAMNHA/6_ultrasound_nikolaus")
load("2_processeddata/conceptiondate_all/conceptiondate_merged_ustype_gestage_maxultrasoundmeasurement_birthoutcomes_USzscores_EFWzscores_BWzscores_whozscores_sorted_outcavail_socdem_wide_expos.Rda")

####database

#rename database
data <- data_exposure
rm(data_exposure)

# remove data without exposure and without zBW
data <- data %>%
  filter(
    !is.na(pm25_trim_first) & !is.na(pm10_trim_first) & !is.na(no2_trim_first) & !is.na(o3_trim_first)) %>% 
  filter(!is.na(zBW))

####rename variables

#rename variables
data <- data %>%
  mutate(mat_age_cat = dplyr::recode(mat_age_cat, 
                         "<=24" = "≤24", 
                         ">=35" = "≥35"),
    fstart = dplyr::recode(fstart, 
                    "Elektiv sectio" = "Cesarean section", 
                    "Induktion" = "Induction", 
                    "Spontan" = "Spontaneous")
  )

#### make new variables

#define season of conception
#month of conception
data <- as.data.table(data)
data[, month_conception := as.numeric(format(conceptiondate, "%m"))]
#year of conception
data[, year_conception := as.numeric(format(conceptiondate, "%y"))]
#year group
data <- data %>%
  mutate(year_group = case_when(
    year_conception %in% c(13, 14) ~ "2013/14",
    year_conception %in% c(15, 16) ~ "2015/16",
    year_conception %in% c(17, 18, 19) ~ "2017/18/19",
    TRUE ~ as.character(year_conception)
  ))
#define season of conception
data <- data %>%
  mutate(season_conception = factor(case_when(
    month_conception %in% c(12, 1, 2) ~ "Winter",
    month_conception %in% c(3, 4, 5) ~ "Spring",
    month_conception %in% c(6, 7, 8) ~ "Summer",
    month_conception %in% c(9, 10, 11) ~ "Fall",
    TRUE ~ NA_character_
  ), levels = c("Winter", "Spring", "Summer", "Fall")))

# Create a new variable birth_weight_tertiles and birth weight
data <- data %>%
  mutate(
    zBW_tertiles = cut(
      zBW,
      breaks = quantile(zBW, probs = c(0, 1/3, 2/3, 1), na.rm = TRUE),
      labels = c("Low tertile zBW", "Mid tertile zBW", "High tertile zBW"),
      include.lowest = TRUE)) %>%
  mutate(
    BW_tertiles = cut(
      birth_weight,
      breaks = quantile(birth_weight, probs = c(0, 1/3, 2/3, 1), na.rm = TRUE),
      labels = c("Low tertile BW", "Mid tertile BW", "High tertile BW"),
      include.lowest = TRUE))
# Create tertiles for each pollutant
data <- data %>%
  mutate(
    pm25_tertiles = cut(
      pm25_trim_first,
      breaks = quantile(pm25_trim_first, probs = c(0, 1/3, 2/3, 1), na.rm = TRUE),
      labels = c("Low tertile", "Mid tertile", "High tertile"),
      include.lowest = TRUE)
  ) %>%
  mutate(
    pm10_tertiles = cut(
      pm10_trim_first,
      breaks = quantile(pm10_trim_first, probs = c(0, 1/3, 2/3, 1), na.rm = TRUE),
      labels = c("Low tertile", "Mid tertile", "High tertile"),
      include.lowest = TRUE)
  ) %>%
  mutate(
    no2_tertiles = cut(
      no2_trim_first,
      breaks = quantile(no2_trim_first, probs = c(0, 1/3, 2/3, 1), na.rm = TRUE),
      labels = c("Low tertile", "Mid tertile", "High tertile"),
      include.lowest = TRUE)
  ) %>%
  mutate(
    o3_tertiles = cut(
      o3_trim_first,
      breaks = quantile(o3_trim_first, probs = c(0, 1/3, 2/3, 1), na.rm = TRUE),
      labels = c("Low tertile", "Mid tertile", "High tertile"),
      include.lowest = TRUE)
  )

# Clean specific variables
data <- data %>%
  mutate(meduc = if_else(meduc == "Don´t know", NA_character_, meduc)) %>% 
 mutate(smoke_preg = factor(smoke_preg,
                            levels = c("Yes", "No"),
                            labels = c("Smoker", "Non-smoker")))
#clean variable zBHC (n=61)
data$zBHC[is.infinite(data$zBHC)] <- NA

names(data)

#save edited dataset
setwd("P:/EnvEpi-Projekt/CHAMNHA/6_ultrasound_nikolaus")
save(data, file = "2_processeddata/conceptiondate_all/conceptiondate_merged_ustype_gestage_maxultrasoundmeasurement_birthoutcomes_USzscores_EFWzscores_BWzscores_whozscores_sorted_outcavail_socdem_wide_expos_procRquarto.Rda")

Exposure descriptives

Mean and tertile exposure levels

Mean and tertile air pollution levels during first trimester in the study population.

Code
#load edited dataset
setwd("P:/EnvEpi-Projekt/CHAMNHA/6_ultrasound_nikolaus")
load("2_processeddata/conceptiondate_all/conceptiondate_merged_ustype_gestage_maxultrasoundmeasurement_birthoutcomes_USzscores_EFWzscores_BWzscores_whozscores_sorted_outcavail_socdem_wide_expos_procRquarto.Rda")

# Mean pollution levels
trialdata <- data |> 
  select(pm25_trim_first, pm10_trim_first, no2_trim_first, o3_trim_first)

mean_exposure_table <-
  tbl_summary(
    trialdata,
    label = list(pm25_trim_first ~" PM2.5 (µg/m3)", pm10_trim_first ~ "PM10 (µg/m3)", no2_trim_first ~ "NO2 (µg/m3)", o3_trim_first ~ "O3 (µg/m3)"),
        statistic = list(all_continuous() ~ "{mean} ({sd})"), 
    digits = all_continuous() ~ 1,
    missing = "no") |> 
      modify_header(label = "**Characteristics**") |> 
      bold_labels() |> 
      modify_caption("Table 2: Ambient air pollution levels during first trimester in the study population")

# Tertile pollution levels
# Summary table for PM10
trialdata <- data |> 
  select(pm25_trim_first, pm10_trim_first, no2_trim_first, o3_trim_first, 
         pm25_tertiles, pm10_tertiles, no2_tertiles, o3_tertiles)

summary_table_pm10 <- tbl_summary(
  trialdata,
  by = pm10_tertiles, 
  include = c("pm10_trim_first"),
  label = list(pm10_trim_first ~ "PM10 (µg/m3)"), 
  statistic = list(
    all_continuous() ~ "{mean} ({sd})"
  ),
  digits = all_continuous() ~ 1,
  missing = "no")
# Summary table for NO2
summary_table_no2 <- tbl_summary(
  trialdata,
  by = no2_tertiles, 
  include = c("no2_trim_first"),
  label = list(no2_trim_first ~ "NO2 (µg/m3)"), 
  statistic = list(
    all_continuous() ~ "{mean} ({sd})"
  ),
  digits = all_continuous() ~ 1,
  missing = "no"
)
# Summary table for O3
summary_table_o3 <- tbl_summary(
  trialdata,
  by = o3_tertiles, 
  include = c("o3_trim_first"),
  label = list(o3_trim_first ~ "O3 (µg/m3)"), 
  statistic = list(
    all_continuous() ~ "{mean} ({sd})"
  ),
  digits = all_continuous() ~ 1,
  missing = "no"
)
# Summary table for PM25
summary_table_pm25 <- tbl_summary(
  trialdata,
  by = pm25_tertiles, 
  include = c("pm25_trim_first"),
  label = list(pm25_trim_first ~ "PM2.5 (µg/m3)"), 
  statistic = list(
    all_continuous() ~ "{mean} ({sd})"
  ),
  digits = all_continuous() ~ 1,
  missing = "no"
)

summary_tertiles_exposure <- tbl_stack(list(summary_table_pm25, summary_table_pm10, summary_table_no2, summary_table_o3))
Table 2: Ambient air pollution levels during first trimester in the study population
Characteristics N = 497,4021
PM2.5 (µg/m3) 7.6 (2.0)
PM10 (µg/m3) 15.2 (4.1)
NO2 (µg/m3) 14.3 (8.0)
O3 (µg/m3) 53.3 (9.5)
1 Mean (SD)
Characteristic Low tertile
N = 165,801
1
Mid tertile
N = 165,800
1
High tertile
N = 165,801
1
PM2.5 (µg/m3) 5.4 (1.0) 7.6 (0.5) 9.7 (1.1)
PM10 (µg/m3) 11.2 (1.8) 14.7 (0.8) 19.8 (2.9)
NO2 (µg/m3) 6.2 (2.7) 13.3 (1.6) 23.3 (5.9)
O3 (µg/m3) 43.3 (3.2) 52.2 (3.1) 64.5 (4.7)
1 Mean (SD)

Exposure in relation to WHO/EU guidelines

WHO/EU guidelines for individual air pollutants and proportion of population exposed to levels exceeding guuidelines during first trimester.

Code
thresholds <- tibble(
  Pollutant = c("PM2.5", "PM10", "NO2", "O3"),
  WHO_Threshold = c(5, 15, 10, 60),
  EU_Threshold = c(10, 20, 20, 100)
)

# Calculate exceedance proportions
exceedance_data <- data %>%
  summarise(
    PM2.5_WHO = mean(pm25_trim_first > thresholds$WHO_Threshold[1], na.rm = TRUE),
    PM10_WHO = mean(pm10_trim_first > thresholds$WHO_Threshold[2], na.rm = TRUE),
    NO2_WHO = mean(no2_trim_first > thresholds$WHO_Threshold[3], na.rm = TRUE),
    O3_WHO = mean(o3_trim_first > thresholds$WHO_Threshold[4], na.rm = TRUE),
    PM2.5_EU = mean(pm25_trim_first > thresholds$EU_Threshold[1], na.rm = TRUE),
    PM10_EU = mean(pm10_trim_first > thresholds$EU_Threshold[2], na.rm = TRUE),
    NO2_EU = mean(no2_trim_first > thresholds$EU_Threshold[3], na.rm = TRUE),
    O3_EU = mean(o3_trim_first > thresholds$EU_Threshold[4], na.rm = TRUE)
  ) %>%
  pivot_longer(
    cols = everything(),
    names_to = c("Pollutant", "Standard"),
    names_sep = "_",
    values_to = "Proportion_Exceeding"
  ) %>%
  mutate(Proportion_Exceeding = round(Proportion_Exceeding * 100, 1)) # Convert to percentage

# Combine WHO and EU data into a single table
final_table <- thresholds %>%
  left_join(
    exceedance_data %>%
      filter(Standard == "WHO") %>%
      select(Pollutant, WHO_Exceedance = Proportion_Exceeding),
    by = "Pollutant"
  ) %>%
  left_join(
    exceedance_data %>%
      filter(Standard == "EU") %>%
      select(Pollutant, EU_Exceedance = Proportion_Exceeding),
    by = "Pollutant"
  )

# Create the gt table
gt_table <- final_table %>%
  gt() %>%
  tab_header(
    title = "International Guidelines and Proportion of Cases Exceeding",
    subtitle = "WHO (2021) and EU (2024) Standards for Air Pollutants (Average in First Trimester)"
  ) %>%
  cols_label(
    Pollutant = "Pollutant",
    WHO_Threshold = "WHO Guideline (µg/m³)",
    WHO_Exceedance = "Proportion Exceeding WHO (%)",
    EU_Threshold = "EU Guideline (µg/m³)",
    EU_Exceedance = "Proportion Exceeding EU (%)"
  )
International Guidelines and Proportion of Cases Exceeding
WHO (2021) and EU (2024) Standards for Air Pollutants (Average in First Trimester)
Pollutant WHO Guideline (µg/m³) EU Guideline (µg/m³) Proportion Exceeding WHO (%) Proportion Exceeding EU (%)
PM2.5 5 10 88.9 10.1
PM10 15 20 45.7 13.4
NO2 10 20 69.7 20.2
O3 60 100 27.2 0.0

Exposure by season and year

Distribution of air pollutants by season and by year.

Code
#boxplots: air quality by season of conception

#select data: air pollutants and season
data1 <- data %>%
  select(pm25_trim_first, pm10_trim_first, o3_trim_first, no2_trim_first, 
         season_conception)

#reshape data into long format
data1_long <- data1 %>%
  gather(variable, value, pm25_trim_first:no2_trim_first)

#replace values of air pollutants
data1_long <- data1_long %>%
  mutate(variable = case_when(
    variable == "pm25_trim_first" ~ "PM 2.5",
    variable == "pm10_trim_first" ~ "PM 10",
    variable == "no2_trim_first" ~ "NO2",
    variable == "o3_trim_first" ~ "O3",
    TRUE ~ variable  # Keep the original value if it doesn't match any condition
  ))  %>%
  rename(Pollutant = variable)

#plot combined boxplots: season
airpoll_season <- ggplot(data1_long, aes(x = season_conception, y = value, fill = Pollutant)) +
  geom_boxplot() +
  labs(title = "Air Quality by season of conception",
       x = "Season of conception",
       y = "Concentration of air pollution (µm/m3)",
       fill = "Pollutant") +
  scale_fill_manual(values = c("blue", "red", "green", "purple"))  +
  theme_classic() +
  theme(axis.text.x = element_text(size = 16),  axis.text.y = element_text(size = 16),  
        legend.text = element_text(size = 16),  legend.title = element_text(size = 16), 
        plot.title = element_text(size = 20)) 

#boxplots: air quality by year

#select data: air pollutants and season
data2 <- data %>%
  select(pm25_trim_first, pm10_trim_first, o3_trim_first, no2_trim_first, 
         year_group)

#reshape data into long format
data2_long <- data2 %>%
  gather(variable, value, pm25_trim_first:no2_trim_first)

#replace values of air pollutants
data2_long <- data2_long %>%
  mutate(variable = case_when(
    variable == "pm25_trim_first" ~ "PM 2.5",
    variable == "pm10_trim_first" ~ "PM 10",
    variable == "no2_trim_first" ~ "NO2",
    variable == "o3_trim_first" ~ "O3",
    TRUE ~ variable  # Keep the original value if it doesn't match any condition
  ))  %>%
  rename(Pollutant = variable)

#plot combined boxplots: season
airpoll_year <- ggplot(data2_long, aes(x = year_group, y = value, fill = Pollutant)) +
  geom_boxplot() +
  labs(title = "Air Quality by year of conception",
       x = "Year of conception",
       y = "Concentration of air pollution (µm/m3)",
       fill = "Pollutant") +
  scale_fill_manual(values = c("blue", "red", "green", "purple"))  +
  theme_classic() +
  theme(axis.text.x = element_text(size = 16),  axis.text.y = element_text(size = 16),  
        legend.text = element_text(size = 16),  legend.title = element_text(size = 16), 
        plot.title = element_text(size = 20)) 

Correlation between exposures

Correlation between exposure to multiple air pollutants during first trimester (gestational week 0-13) in the study population. Note: Blue: coefficient >0; red: coefficient <0.

Code
#load libraries
pacman::p_load(AMR, 
               corrplot, 
               gridGraphics)
 
#set data to data.table
setDT(data)

# Correlation between first trimester exposures
data1 <- data %>%
  filter(
    !is.na(pm25_trim_first) & !is.na(pm10_trim_first) & !is.na(no2_trim_first) & !is.na(o3_trim_first)
  )

exp_matrix <- data1 %>%
  select(
    pm25_trim_first,
    pm10_trim_first,
    no2_trim_first,
    o3_trim_first
  )

colnames(exp_matrix) <- c("Trim1 PM2.5", "Trim1 PM10", "Trim1 NO2", "Trim1 O3")

#apply function cor to  matrix "exp_matrix
polday <- cor(as.matrix(exp_matrix))

corrplot(method = "ellipse", 
           polday, type = "lower", 
           order = "original", 
           title = element_blank(), 
           addCoef.col = 'black', tl.col = "black", 
           cl.pos = 'n', col = COL2('RdBu'),
           tl.srt = 0, 
           diag = FALSE, 
           mar = c(0, 0, 1, 0),
           tl.offset = 0.71, 
           col.lim = c(-1, 1),
           number.cex = 1.5, 
           tl.cex = 1.2,
           cl.cex = 1.3
)

Correlation between exposure to multiple air pollutants during first trimester (gestational week 0-13). and for individual weeks in first, second and third trimester in the study population. Note: Blue: coefficient >0; red: coefficient <0.

Code
#Correlation between weekly exposures Week 1, Week 13, Week 28

# Exclude cases with all exposures missing
# Exclude cases with all exposures missing
data1 <- data %>%
  filter(
    !is.na(pm25_trim_first) & !is.na(pm10_trim_first) & !is.na(no2_trim_first) & !is.na(o3_trim_first) & 
    !is.na(pm25_trim_second) & !is.na(pm10_trim_second) & !is.na(no2_trim_second) & !is.na(o3_trim_second) &
    !is.na(pm25_trim_third) & !is.na(pm10_trim_third) & !is.na(no2_trim_third) & !is.na(o3_trim_third)
  )

# Create the exposure matrix
exp_matrix <- data1 %>%
  select(
    pm25_trim_first, pm25_trim_second, pm25_trim_third, 
    pm10_trim_first, pm10_trim_second, pm10_trim_third, 
    no2_trim_first, no2_trim_second, no2_trim_third,
    o3_trim_first, o3_trim_second, o3_trim_third
  )

# Set the names for the figure
colnames(exp_matrix) <- c(
  "T1 PM2.5", "T2 PM2.5", "T3 PM2.5",
  "T1 PM10", "T2 PM10", "T3 PM10",
  "T1 NO2", "T2 NO2", "T3 NO2",
  "T1 O3", "T2 O3", "T3 O3"
)

# Apply the cor function to the exposure matrix
polday <- cor(as.matrix(exp_matrix))

# Save the correlation plot
corrplot_T1T2T3 <- corrplot(
  method = "ellipse", 
  polday, 
  type = "lower", 
  order = "original", 
  title = element_blank(), 
  addCoef.col = 'black', 
  tl.col = "black", 
  cl.pos = 'n', 
  col = COL2('RdBu'),
  tl.srt = 90, 
  diag = FALSE, 
  number.cex = 0.6, 
  mar = c(0, 0, 1, 0), 
  tl.offset = 0.5, 
  tl.cex = 0.8,
  col.lim = c(-1, 1),
  cl.cex = 0.7
)

Outcomes descriptives

Distribution of birth weight and birth weight ultrasound z-scores.

Code
#Box plot of zBW by tertiles

#plot birth_weight_zscore_tertiles by actual birth weight
boxplot_BWtertiles <- ggplot(data %>% filter(!is.na(birth_weight)), aes(x = BW_tertiles, y = birth_weight)) +
  geom_boxplot() + 
  labs(x = "Birth Weight Tertiles", y = "Birth Weight (kg)") +
  ggtitle("Boxplot of birth weight by Birth Weight Tertiles")

#plot birth_weight_zscore_tertiles by actual birth weight
boxplot_zBWtertiles <- ggplot(data %>% filter(!is.na(zBW)), aes(x = zBW_tertiles, y = zBW)) +
  geom_boxplot() + 
  labs(x = "Z-score Birth Weight Tertiles", y = "z-score Birth Weight") +
  ggtitle("Boxplot of z-score birth weight by z-score Birth Weight Tertiles")

#plot birth weight z-scores as histogram

zBW_histogram <- ggplot(data, aes(x = zBW)) +
  geom_histogram(
    binwidth = 0.5, # Adjust bin width as needed
    fill = "steelblue",
    color = "black",
    alpha = 0.7
  ) +
  labs(
    title = "Histogram of zBW",
    x = "zBW",
    y = "Frequency"
  ) +
  theme_minimal() +
  theme(
    plot.title = element_text(size = 16, face = "bold", hjust = 0.5), # Center title
    axis.title = element_text(size = 14),
    axis.text = element_text(size = 12)
  ) 

#Make ggplots for z-score distribution

# Select relevant columns
selected_zscore_columns <- data %>%
  select(
    zAC_W16W23, zBPD_W16W23, zFL_W16W23,
    zAC_W24W31, zBPD_W24W31, zFL_W24W31,
    zAC_W32W40, zBPD_W32W40, zFL_W32W40
  )

# Reshape data to long format
long_zscore_data <- selected_zscore_columns %>%
  pivot_longer(
    cols = everything(),
    names_to = c("Variable", "TimeWindow"),
    names_pattern = "z(.*)_(W\\d{2}W\\d{2})",
    values_to = "zScore") %>%
  filter(zScore > -10 & zScore < 10)

zscore_color_mapping <- c("AC" = "green", "BPD" = "red", "FL" = "blue")

# Create a combined plot for z-scores
zscore_combined_plot <- long_zscore_data %>%
  ggplot(aes(x = zScore, fill = Variable, color = Variable)) +
  geom_histogram(
    aes(y = ..density..),
    position = "identity",
    alpha = 0.05,
    bins = 30
  ) +
  geom_density(alpha = 0.3, size = 1) +
  scale_fill_manual(values = zscore_color_mapping) +
  scale_color_manual(values = zscore_color_mapping) +
  facet_wrap(~ TimeWindow, scales = "free", ncol = 1) +
  labs(
    title = "Z-Scores for ultrasound measures in three gestational perods",
    x = "Z-Score",
    y = "Density"
  ) +
  theme_minimal() +
  theme(
    legend.title = element_blank(),
    strip.text = element_text(face = "bold")
  )

#EFW z-scores

# Select relevant columns for EFW z-scores
selected_efw_zscore_columns <- data %>%
  select(
    zEFW_W16W23, 
    zEFW_W24W31, 
    zEFW_W32W40
  )

# Reshape data to long format for EFW z-scores
long_efw_zscore_data <- selected_efw_zscore_columns %>%
  pivot_longer(
    cols = everything(),
    names_to = "TimeWindow",
    names_pattern = "zEFW_(W\\d{2}W\\d{2})",
    values_to = "zScore"
  ) %>%
  filter(zScore > -10 & zScore < 10)

# Define color mapping for EFW (if needed, can be single color)
efw_color_mapping <- c("EFW" = "purple")

# Create the combined plot for EFW z-scores
efw_zscore_combined_plot <- long_efw_zscore_data %>%
  ggplot(aes(x = zScore, fill = "EFW", color = "EFW")) +
  geom_histogram(
    aes(y = ..density..),
    position = "identity",
    alpha = 0.05,
    bins = 30
  ) +
  geom_density(alpha = 0.3, size = 1) +
  scale_fill_manual(values = efw_color_mapping) +
  scale_color_manual(values = efw_color_mapping) +
  facet_wrap(~ TimeWindow, scales = "free", ncol = 1) +
  labs(
    title = "Z-Scores for EFW in three gestational periods",
    x = "Z-Score",
    y = "Density"
  ) +
  theme_minimal() +
  theme(
    legend.position = "none",  # Hide legend since only one variable
    strip.text = element_text(face = "bold")
  )

Warning: The dot-dot notation (`..density..`) was deprecated in ggplot2 3.4.0.
ℹ Please use `after_stat(density)` instead.

Outcome table and Missing outcomes

Distribution of birth and ultrasound raw outcomes, z-scores, and distribution of missing outcomes.

Code
#make trial dataset

trialdata <- data |> 
  select(zBW_tertiles,  zBW,
         birth_weight, birth_length, zBL, birth_headcircumference, zBHC,
         g_age_d_BPD_W16W23 , BPD_W16W23 , zBPD_W16W23 , 
                        g_age_d_BPD_W24W31 , BPD_W24W31 , zBPD_W24W31 , 
                        g_age_d_BPD_W32W40 , BPD_W32W40 , zBPD_W32W40 , 
                        g_age_d_AC_W16W23 , AC_W16W23, zAC_W16W23 , 
                        g_age_d_AC_W24W31 , AC_W24W31 , zAC_W24W31 , 
                        g_age_d_AC_W32W40 , AC_W32W40 , zAC_W32W40 , 
                        g_age_d_FL_W16W23 , FL_W16W23 , zFL_W16W23 , 
                        g_age_d_FL_W24W31 , FL_W24W31 , zFL_W24W31 , 
                        g_age_d_FL_W32W40 , FL_W32W40 , zFL_W32W40 , 
                        g_age_d_EFW_W16W23 , EFW_W16W23 , zEFW_W16W23 , 
                        g_age_d_EFW_W24W31 , EFW_W24W31 , zEFW_W24W31 , 
                        g_age_d_EFW_W32W40 , EFW_W32W40 , zEFW_W32W40)

#make trial crosstable by zBW
outcome_table <-
  tbl_summary(
    trialdata,
    include = c("birth_weight", "birth_length", "birth_headcircumference", 
                "zBL", "zBHC", 
      "BPD_W16W23", "BPD_W24W31", "BPD_W32W40",
"zBPD_W16W23", "zBPD_W24W31", "zBPD_W32W40", 
"AC_W16W23", "AC_W24W31", "AC_W32W40",
"zAC_W16W23", "zAC_W24W31", "zAC_W32W40", 
"FL_W16W23", "FL_W24W31", "FL_W32W40",
"zFL_W16W23", "zFL_W24W31", "zFL_W32W40", 
"EFW_W16W23", "EFW_W24W31", "EFW_W32W40",
"zEFW_W16W23", "zEFW_W24W31", "zEFW_W32W40"),
        by = zBW_tertiles, 
    label = list(birth_weight ~"Birth weight (kg)", birth_length ~"Birth length (cm)", birth_headcircumference ~"Birth head circumference (cm)", 
      BPD_W16W23 ~ "BPD W16-W23 (mm)", zBPD_W16W23 ~ "zBPD W16-W23", BPD_W24W31 ~ "BPD W24-W31 (mm)", zBPD_W24W31 ~ "zBPD W24-W31", BPD_W32W40 ~ "BPD W32-W40 (mm)", zBPD_W32W40 ~ "zBPD W32-W40", AC_W16W23 ~ "AC W16-W23 (mm)", zAC_W16W23 ~ "zAC W16-W23", AC_W24W31 ~ "AC W24-W31 (mm)", zAC_W24W31 ~ "zAC W24-W31", AC_W32W40 ~ "AC W32-W40 (mm)", zAC_W32W40 ~ "zAC W32-W40", FL_W16W23 ~ "FL W16-W23 (mm)", zFL_W16W23 ~ "zFL W16-W23", FL_W24W31 ~ "FL W24-W31 (mm)", zFL_W24W31 ~ "zFL W24-W31", FL_W32W40 ~ "FL W32-W40 (mm)", zFL_W32W40 ~ "zFL W32-W40", EFW_W16W23 ~ "EFW W16-W23 (g)", zEFW_W16W23 ~ "zEFW W16-W23", EFW_W24W31 ~ "EFW W24-W31 (g)", zEFW_W24W31 ~ "zEFW W24-W31", EFW_W32W40 ~ "EFW W32-W40 (g)", zEFW_W32W40 ~ "zEFW W32-W40"),
        statistic = list(
      all_continuous() ~ "{mean} ({sd})",
      all_categorical() ~ "{n} ({p}%)"), 
    digits = all_continuous() ~ 1, 
      missing = "no") |> 
  add_overall() |> 
modify_header(label = "**Characteristics**") |> 
  bold_labels() |> 
modify_caption("Table 1. Anthropometry characteristics. during pregnancy, stratified by tertiles of birth weight z-score")  |> 
 modify_spanning_header(c("stat_1", "stat_2", "stat_3") ~ "**zBW tertiles**")

#make missing outcome table
missing_outcome_table <- tbl_summary(
  trialdata, 
  by = zBW_tertiles, 
  include = c("birth_weight", "birth_length", "birth_headcircumference", 
              "BPD_W16W23", "AC_W16W23", "FL_W16W23", "EFW_W16W23",
              "BPD_W24W31", "AC_W24W31", "FL_W24W31", "EFW_W24W31", 
              "BPD_W32W40", "AC_W32W40", "FL_W32W40", "EFW_W32W40"), 
  label = list(BPD_W16W23 ~ "BPD W16-W23", BPD_W24W31 ~ "BPD W24-W31", 
               BPD_W32W40 ~ "BPD W32-W40", AC_W16W23 ~ "AC W16-W23", 
               AC_W24W31 ~ "AC W24-W31", AC_W32W40 ~ "AC W32-W40", 
               FL_W16W23 ~ "FL W16-W23",  FL_W24W31 ~ "FL W24-W31", 
               FL_W32W40 ~ "FL W32-W40", EFW_W16W23 ~ "EFW W16-W23", 
               EFW_W24W31 ~ "EFW W24-W31", EFW_W32W40 ~ "EFW W32-W40", 
               birth_weight ~"Birth weight", birth_length ~"Birth length", birth_headcircumference ~"Birth head circumference"),
   statistic = list(all_continuous() ~ "{mean}", missing = "no"), missing_stat = "({p_miss}%)") %>%
  modify_header(label = "**Characteristics**") %>%
   add_n(statistic = "{N_miss} ({p_miss}%)")  %>%
  modify_header(n = "**Missing**") %>%
  bold_levels() %>%
  modify_caption("Table 2. Missing proportions of outcome characteristics")
Table 1. Anthropometry characteristics. during pregnancy, stratified by tertiles of birth weight z-score
Characteristics Overall
N = 497,402
1
zBW tertiles
Low tertile zBW
N = 165,831
1
Mid tertile zBW
N = 165,774
1
High tertile zBW
N = 165,797
1
Birth weight (kg) 3.5 (0.5) 3.1 (0.4) 3.5 (0.3) 4.0 (0.4)
Birth length (cm) 50.0 (4.5) 48.4 (4.7) 50.0 (4.2) 51.5 (4.1)
Birth head circumference (cm) 34.7 (3.8) 33.8 (3.9) 34.7 (3.6) 35.6 (3.7)
zBL 0.6 (1.1) -0.3 (0.9) 0.6 (0.8) 1.4 (0.8)
zBHC 0.9 (1.1) 0.2 (1.1) 0.9 (1.0) 1.6 (1.0)
BPD W16-W23 (mm) 44.0 (3.7) 43.9 (3.6) 44.0 (3.5) 44.1 (4.0)
BPD W24-W31 (mm) 76.4 (6.4) 75.9 (5.9) 76.2 (6.5) 77.3 (6.8)
BPD W32-W40 (mm) 89.5 (6.6) 88.4 (5.8) 89.1 (7.3) 91.2 (6.6)
zBPD W16-W23 -0.5 (1.0) -0.6 (0.9) -0.5 (0.9) -0.5 (1.2)
zBPD W24-W31 -0.4 (0.9) -0.7 (0.9) -0.3 (0.9) 0.0 (0.9)
zBPD W32-W40 -0.3 (1.4) -0.7 (1.2) -0.3 (1.6) 0.2 (1.3)
AC W16-W23 (mm) 139.1 (16.1) 137.7 (16.8) 139.1 (16.2) 140.6 (15.0)
AC W24-W31 (mm) 259.1 (26.1) 253.4 (22.2) 258.1 (25.6) 266.7 (28.6)
AC W32-W40 (mm) 324.9 (30.5) 313.2 (23.3) 322.2 (27.1) 341.5 (33.6)
zAC W16-W23 0.4 (1.7) 0.1 (1.9) 0.4 (1.8) 0.6 (1.5)
zAC W24-W31 0.8 (1.0) 0.2 (0.9) 0.8 (0.8) 1.5 (1.0)
zAC W32-W40 0.7 (1.1) -0.1 (0.8) 0.7 (0.7) 1.5 (1.0)
FL W16-W23 (mm) 29.0 (3.8) 28.8 (3.9) 29.0 (4.0) 29.2 (3.5)
FL W24-W31 (mm) 56.8 (5.8) 56.4 (5.1) 56.7 (6.6) 57.5 (5.7)
FL W32-W40 (mm) 69.0 (5.7) 68.2 (4.8) 68.5 (5.1) 70.3 (6.8)
zFL W16-W23 0.3 (1.5) 0.2 (1.6) 0.4 (1.7) 0.5 (1.3)
zFL W24-W31 1.1 (1.4) 0.7 (1.1) 1.1 (1.9) 1.4 (1.1)
zFL W32-W40 1.1 (1.3) 0.6 (1.0) 1.1 (1.0) 1.5 (1.7)
EFW W16-W23 (g) 232.9 (123.2) 228.1 (138.0) 232.5 (121.5) 238.2 (108.3)
EFW W24-W31 (g) 1,507.6 (393.2) 1,429.6 (327.1) 1,492.8 (385.8) 1,613.2 (444.3)
EFW W32-W40 (g) 2,756.7 (639.8) 2,537.7 (465.9) 2,686.2 (581.8) 3,089.1 (732.1)
zEFW W16-W23 -1.9 (17.2) -2.2 (28.1) -1.8 (6.0) -1.6 (7.8)
zEFW W24-W31 0.7 (1.0) 0.1 (0.9) 0.7 (0.8) 1.4 (0.9)
zEFW W32-W40 0.5 (1.0) -0.3 (0.8) 0.5 (0.7) 1.4 (0.8)
1 Mean (SD)
Table 2. Missing proportions of outcome characteristics
Characteristics Missing Low tertile zBW
N = 165,831
1
Mid tertile zBW
N = 165,774
1
High tertile zBW
N = 165,797
1
Birth weight 0 (0%) 3.05 3.51 4.05
Birth length 0 (0%) 48.45 50.00 51.54
Birth head circumference 0 (0%) 33.76 34.70 35.61
BPD W16-W23 25,535 (5.1%) 43.89 43.96 44.12
    Unknown
(5.4%) (4.9%) (5.1%)
AC W16-W23 64,546 (13%) 138 139 141
    Unknown
(13%) (13%) (13%)
FL W16-W23 28,646 (5.8%) 28.81 28.99 29.18
    Unknown
(6.0%) (5.5%) (5.8%)
EFW W16-W23 66,081 (13%) 228 232 238
    Unknown
(14%) (13%) (13%)
BPD W24-W31 418,159 (84%) 76 76 77
    Unknown
(82%) (85%) (85%)
AC W24-W31 418,389 (84%) 253 258 267
    Unknown
(82%) (85%) (85%)
FL W24-W31 422,171 (85%) 56.4 56.7 57.5
    Unknown
(83%) (86%) (86%)
EFW W24-W31 422,323 (85%) 1,430 1,493 1,613
    Unknown
(83%) (86%) (86%)
BPD W32-W40 322,849 (65%) 88.4 89.1 91.2
    Unknown
(60%) (69%) (66%)
AC W32-W40 318,599 (64%) 313 322 341
    Unknown
(59%) (68%) (65%)
FL W32-W40 331,598 (67%) 68.2 68.5 70.3
    Unknown
(61%) (70%) (68%)
EFW W32-W40 331,933 (67%) 2,538 2,686 3,089
    Unknown
(62%) (70%) (68%)
1 Mean

Repeated ultrasound outcomes during pregnancy: Availability

To show cases with repeated ultrasound measurements for gestational windows

Code
#Variables indicating overall data availability by gestational window
data <- data %>%
  mutate(
    # For weeks 16-23
    avail_W16W23 = if_else(
      rowSums(!is.na(select(., starts_with("BPD_W16W23"), 
                            starts_with("AC_W16W23"), 
                            starts_with("FL_W16W23"), 
                            starts_with("EFW_W16W23")))) > 0, 
      "Yes", "Missing"),
    # For weeks 24-31
    avail_W24W31 = if_else(
      rowSums(!is.na(select(., starts_with("BPD_W24W31"), 
                            starts_with("AC_W24W31"), 
                            starts_with("FL_W24W31"), 
                            starts_with("EFW_W24W31")))) > 0, 
      "Yes", "Missing"),
    # For weeks 32-40
    avail_W32W40 = if_else(
      rowSums(!is.na(select(., starts_with("BPD_W32W40"), 
                            starts_with("AC_W32W40"), 
                            starts_with("FL_W32W40"), 
                            starts_with("EFW_W32W40")))) > 0, 
      "Yes", "Missing"),
    # For birth variables
    avail_birth = if_else(
      rowSums(!is.na(select(., birth_weight, birth_length, birth_headcircumference))) > 0, 
      "Yes", "Missing")
  )

proportions <- data %>%
  mutate(
    time_points_available = rowSums(select(., avail_W16W23, avail_W24W31, avail_W32W40) == "Yes"),
    availability_group = case_when(
      time_points_available == 3 ~ "Measurements in all 3 gestational windows",
      time_points_available == 2 ~ "Measurements in 2 of 3 gestational windows",
      time_points_available == 1 ~ "Measurements in 1 of 3 gestational windows",
      time_points_available == 0 ~ "0 time points"
    )
  ) %>%
  group_by(availability_group) %>%
  summarise(
    count = n(),
    proportion = n() / nrow(data)
  ) %>%
  ungroup()

# Filter out the "0 time points" category
proportions <- proportions %>%
  filter(availability_group != "0 time points") %>%
  mutate(label = as.character(count)) # Only include the number of cases in the label

# Create the pie chart
ggplot(proportions, aes(x = "", y = proportion, fill = availability_group)) +
  geom_bar(stat = "identity", width = 1) +
  coord_polar("y", start = 0) +
  theme_void() +
  labs(
    title = "Availability of Repeated Ultrasound Measuremnets Across Gestational Windows (n)",
    fill = "G"
  ) +
  geom_text(aes(label = label), position = position_stack(vjust = 0.5), size = 5) + # Add case count labels
  theme(
    legend.title = element_text(size = 14),  # Increase legend title font size
    legend.text = element_text(size = 12)    # Increase legend text font size
  ) +
  scale_fill_brewer(palette = "Set2")

Sociodemographics descriptives

Demographics by zBW tertiles and demographics for cases with missing outcomes by trimester.

Code
#Demographics by tertiles 

#make trial dataset
trialdata <- data |> 
  select(zBW_tertiles, gest_age_birth, birth_weight, birth_length, birth_headcircumference, zBHC,
sex, fstart, season, geogr, mat_age_cat, meduc, bmi_cat, smoke_preg, alcohol_cat, hypertension, dm2)

#make crosstable by zBW
baseline_table <-
  tbl_summary(
    trialdata,
    include = c("gest_age_birth", "birth_weight", #anthropometrics
                "sex", "fstart", "season",  #fetal characteristics
                "geogr", "mat_age_cat", "meduc", "bmi_cat", "smoke_preg"), #maternal characteristics
        by = zBW_tertiles, 
    label = list(gest_age_birth ~ "Gestational age at birth (days)", birth_weight ~ "Birth weight (kg)",
    sex ~ "Fetal sex", fstart ~ "Birth type", season ~ "Season of birth", 
    geogr ~ "Region of residency", mat_age_cat ~ "Maternal age at birth (years)", meduc ~ "Maternal education", bmi_cat ~ "Maternal body mass index", smoke_preg ~ "Maternal smoking status"),
        statistic = list(
      all_continuous() ~ "{mean} ({sd})",
      all_categorical() ~ "{n} ({p}%)"), 
    digits = all_continuous() ~ 1, 
    missing_text = "Missing") |> 
  add_overall() |> 
modify_header(label = "**Characteristics**") |> 
  bold_labels() |> 
modify_caption("Table 1. Baseline characteristics. Birth anthropometrics, fetal and maternal characteristics stratified by tertiles of birth weight z-score")  |> 
 modify_spanning_header(c("stat_1", "stat_2", "stat_3") ~ "**zBW tertiles**")
Table: Characteristics for cases with available ultrasound parameters in respective gestational windows
Characteristics
Available data W16W23
Available data W24W31
Available data W32W40
N = 472,0981 N = 79,3371 N = 179,1351
Gestational age at birth (days) 278.7 (11.4) 274.5 (14.4) 277.9 (10.9)
Birth weight (kg) 3.5 (0.5) 3.4 (0.6) 3.5 (0.6)
Fetal sex


    Female 229,515 (49%) 39,301 (50%) 87,815 (49%)
    Male 242,583 (51%) 40,036 (50%) 91,320 (51%)
Birth type


    Cesarean section 35,863 (7.6%) 10,134 (13%) 18,167 (10%)
    Induction 85,997 (18%) 20,413 (26%) 46,314 (26%)
    Spontaneous 350,238 (74%) 48,790 (61%) 114,654 (64%)
Season of birth


    Winter 104,575 (22%) 18,286 (23%) 40,781 (23%)
    Spring 122,428 (26%) 20,031 (25%) 45,705 (26%)
    Summer 128,676 (27%) 21,725 (27%) 47,572 (27%)
    Fall 116,419 (25%) 19,295 (24%) 45,077 (25%)
Region of residency


    North 43,318 (9.2%) 7,389 (9.3%) 14,222 (7.9%)
    Central 200,986 (43%) 37,950 (48%) 81,978 (46%)
    South 227,794 (48%) 33,998 (43%) 82,935 (46%)
Maternal age at birth (years)


    ≤24 52,946 (11%) 8,448 (11%) 18,579 (10%)
    25-29 148,574 (31%) 22,436 (28%) 51,991 (29%)
    30-34 164,063 (35%) 26,241 (33%) 61,153 (34%)
    ≥35 106,461 (23%) 22,201 (28%) 47,394 (26%)
    Missing 54 11 18
Maternal education


    High school 152,610 (38%) 25,996 (40%) 57,919 (39%)
    Primary or lower 33,412 (8.4%) 6,116 (9.3%) 13,175 (8.8%)
    University or higher 212,411 (53%) 33,501 (51%) 79,286 (53%)
    Missing 73,665 13,724 28,755
Maternal body mass index


    Underweight 12,186 (2.7%) 2,610 (3.4%) 5,512 (3.2%)
    Normalweight 264,628 (58%) 39,056 (51%) 91,291 (53%)
    Overweight 117,423 (26%) 19,331 (25%) 42,931 (25%)
    Obesity 62,225 (14%) 15,203 (20%) 32,712 (19%)
    Missing 15,636 3,137 6,689
Maternal smoking status


    Smoker 20,506 (4.7%) 4,791 (6.5%) 10,822 (6.5%)
    Non-smoker 412,917 (95%) 68,385 (93%) 154,505 (93%)
    Missing 38,675 6,161 13,808
1 Mean (SD); n (%)
Table: Characteristics for cases with missing ultrasound parameters in respective gestational windows
Characteristics
Missing data W16W23
Missing data W24W31
Missing data W32W40
N = 25,3041 N = 418,0651 N = 318,2671
Gestational age at birth (days) 276.8 (13.6) 279.3 (10.7) 278.9 (11.8)
Birth weight (kg) 3.5 (0.6) 3.6 (0.5) 3.6 (0.5)
Fetal sex


    Female 12,222 (48%) 202,436 (48%) 153,922 (48%)
    Male 13,082 (52%) 215,629 (52%) 164,345 (52%)
Birth type


    Cesarean section 2,491 (9.8%) 28,220 (6.8%) 20,187 (6.3%)
    Induction 5,818 (23%) 71,402 (17%) 45,501 (14%)
    Spontaneous 16,995 (67%) 318,443 (76%) 252,579 (79%)
Season of birth


    Winter 5,999 (24%) 92,288 (22%) 69,793 (22%)
    Spring 6,175 (24%) 108,572 (26%) 82,898 (26%)
    Summer 6,692 (26%) 113,643 (27%) 87,796 (28%)
    Fall 6,438 (25%) 103,562 (25%) 77,780 (24%)
Region of residency


    North 2,887 (11%) 38,816 (9.3%) 31,983 (10%)
    Central 6,793 (27%) 169,829 (41%) 125,801 (40%)
    South 15,624 (62%) 209,420 (50%) 160,483 (50%)
Maternal age at birth (years)


    ≤24 2,968 (12%) 47,466 (11%) 37,335 (12%)
    25-29 7,315 (29%) 133,453 (32%) 103,898 (33%)
    30-34 8,085 (32%) 145,907 (35%) 110,995 (35%)
    ≥35 6,930 (27%) 91,190 (22%) 65,997 (21%)
    Missing 6 49 42
Maternal education


    High school 7,994 (39%) 134,608 (38%) 102,685 (38%)
    Primary or lower 1,863 (9.1%) 29,159 (8.3%) 22,100 (8.2%)
    University or higher 10,567 (52%) 189,477 (54%) 143,692 (54%)
    Missing 4,880 64,821 49,790
Maternal body mass index


    Underweight 484 (2.1%) 10,060 (2.5%) 7,158 (2.3%)
    Normalweight 11,423 (49%) 236,995 (59%) 184,760 (60%)
    Overweight 6,303 (27%) 104,395 (26%) 80,795 (26%)
    Obesity 4,942 (21%) 51,964 (13%) 34,455 (11%)
    Missing 2,152 14,651 11,099
Maternal smoking status


    Smoker 1,301 (5.8%) 17,016 (4.4%) 10,985 (3.8%)
    Non-smoker 21,060 (94%) 365,592 (96%) 279,472 (96%)
    Missing 2,943 35,457 27,810
1 Mean (SD); n (%)
Table: Characteristics for cases stratified by repeated outcome availability
Characteristics
Number of gestational windows for which data is available
3
N = 41,593
1
0
N = 10,575
1
1
N = 284,677
1
2
N = 160,557
1
Gestational age at birth (days) 278.4 (13.2) 279.5 (10.9) 277.8 (12.3) 275.2 (11.6)
Birth weight (kg) 3.5 (0.6) 3.6 (0.5) 3.5 (0.6) 3.4 (0.6)
Fetal sex



    Female 5,101 (48%) 137,325 (48%) 78,627 (49%) 20,684 (50%)
    Male 5,474 (52%) 147,352 (52%) 81,930 (51%) 20,909 (50%)
Birth type



    Cesarean section 750 (7.1%) 16,909 (5.9%) 14,830 (9.2%) 5,865 (14%)
    Induction 1,845 (17%) 40,205 (14%) 36,776 (23%) 12,989 (31%)
    Spontaneous 7,980 (75%) 227,563 (80%) 108,951 (68%) 22,739 (55%)
Season of birth



    Winter 2,398 (23%) 62,274 (22%) 36,338 (23%) 9,564 (23%)
    Spring 2,473 (23%) 74,506 (26%) 41,214 (26%) 10,410 (25%)
    Summer 2,971 (28%) 78,107 (27%) 43,004 (27%) 11,286 (27%)
    Fall 2,733 (26%) 69,790 (25%) 40,001 (25%) 10,333 (25%)
Region of residency



    North 1,577 (15%) 27,878 (9.8%) 13,199 (8.2%) 3,551 (8.5%)
    Central 2,291 (22%) 111,073 (39%) 73,404 (46%) 21,011 (51%)
    South 6,707 (63%) 145,726 (51%) 73,954 (46%) 17,031 (41%)
Maternal age at birth (years)



    ≤24 1,239 (12%) 33,334 (12%) 17,384 (11%) 3,957 (9.5%)
    25-29 3,200 (30%) 93,615 (33%) 47,836 (30%) 11,238 (27%)
    30-34 3,384 (32%) 99,922 (35%) 54,991 (34%) 13,851 (33%)
    ≥35 2,752 (26%) 57,769 (20%) 40,323 (25%) 12,547 (30%)
    Missing 0 37 23 0
Maternal education



    High school 3,468 (39%) 91,258 (38%) 52,367 (39%) 13,511 (39%)
    Primary or lower 699 (7.9%) 19,633 (8.2%) 11,759 (8.7%) 3,184 (9.3%)
    University or higher 4,664 (53%) 129,142 (54%) 71,460 (53%) 17,712 (51%)
    Missing 1,744 44,644 24,971 7,186
Maternal body mass index



    Underweight 164 (1.7%) 6,201 (2.3%) 4,808 (3.1%) 1,497 (3.7%)
    Normalweight 5,243 (53%) 166,285 (61%) 84,879 (55%) 19,644 (49%)
    Overweight 2,846 (29%) 72,279 (26%) 38,397 (25%) 10,204 (25%)
    Obesity 1,606 (16%) 29,675 (11%) 27,193 (18%) 8,693 (22%)
    Missing 716 10,237 5,280 1,555
Maternal smoking status



    Smoker 378 (4.0%) 9,242 (3.6%) 9,684 (6.5%) 2,503 (6.5%)
    Non-smoker 9,138 (96%) 249,926 (96%) 138,858 (93%) 36,055 (94%)
    Missing 1,059 25,509 12,015 3,035
1 Mean (SD); n (%)
Table 1. Baseline characteristics. Birth anthropometrics, fetal and maternal characteristics stratified by tertiles of birth weight z-score
Characteristics Overall
N = 497,402
1
zBW tertiles
Low tertile zBW
N = 165,831
1
Mid tertile zBW
N = 165,774
1
High tertile zBW
N = 165,797
1
Gestational age at birth (days) 278.6 (11.5) 277.6 (12.1) 278.3 (11.8) 279.8 (10.4)
Birth weight (kg) 3.5 (0.5) 3.1 (0.4) 3.5 (0.3) 4.0 (0.4)
Fetal sex



    Female 241,737 (49%) 79,380 (48%) 81,181 (49%) 81,176 (49%)
    Male 255,665 (51%) 86,451 (52%) 84,593 (51%) 84,621 (51%)
Birth type



    Cesarean section 38,354 (7.7%) 9,848 (5.9%) 12,569 (7.6%) 15,937 (9.6%)
    Induction 91,815 (18%) 32,069 (19%) 27,120 (16%) 32,626 (20%)
    Spontaneous 367,233 (74%) 123,914 (75%) 126,085 (76%) 117,234 (71%)
Season of birth



    Winter 110,574 (22%) 37,142 (22%) 37,075 (22%) 36,357 (22%)
    Spring 128,603 (26%) 42,875 (26%) 43,024 (26%) 42,704 (26%)
    Summer 135,368 (27%) 44,671 (27%) 45,010 (27%) 45,687 (28%)
    Fall 122,857 (25%) 41,143 (25%) 40,665 (25%) 41,049 (25%)
Region of residency



    North 46,205 (9.3%) 14,629 (8.8%) 15,436 (9.3%) 16,140 (9.7%)
    Central 207,779 (42%) 70,317 (42%) 69,586 (42%) 67,876 (41%)
    South 243,418 (49%) 80,885 (49%) 80,752 (49%) 81,781 (49%)
Maternal age at birth (years)



    ≤24 55,914 (11%) 21,679 (13%) 18,646 (11%) 15,589 (9.4%)
    25-29 155,889 (31%) 52,489 (32%) 52,743 (32%) 50,657 (31%)
    30-34 172,148 (35%) 55,880 (34%) 57,248 (35%) 59,020 (36%)
    ≥35 113,391 (23%) 35,759 (22%) 37,115 (22%) 40,517 (24%)
    Missing 60 24 22 14
Maternal education



    High school 160,604 (38%) 51,678 (38%) 52,750 (38%) 56,176 (40%)
    Primary or lower 35,275 (8.4%) 14,146 (10%) 11,376 (8.1%) 9,753 (6.9%)
    University or higher 222,978 (53%) 71,859 (52%) 75,662 (54%) 75,457 (53%)
    Missing 78,545 28,148 25,986 24,411
Maternal body mass index



    Underweight 12,670 (2.6%) 6,712 (4.2%) 3,963 (2.5%) 1,995 (1.2%)
    Normalweight 276,051 (58%) 99,690 (62%) 95,339 (60%) 81,022 (51%)
    Overweight 123,726 (26%) 36,060 (23%) 40,442 (25%) 47,224 (30%)
    Obesity 67,167 (14%) 17,292 (11%) 20,186 (13%) 29,689 (19%)
    Missing 17,788 6,077 5,844 5,867
Maternal smoking status



    Smoker 21,807 (4.8%) 10,047 (6.6%) 6,593 (4.3%) 5,167 (3.4%)
    Non-smoker 433,977 (95%) 142,026 (93%) 145,375 (96%) 146,576 (97%)
    Missing 41,618 13,758 13,806 14,054
1 Mean (SD); n (%)

Included vs Excluded

Demographics of included (inclusion criteria: date of conception, ultrasound, exposure data) vs excluded (SPR dataset).

Table. Included and Excluded Characteristics
Characteristics
All SPR births n=620,866
Excluded
N = 123,464
1
Included
N = 497,402
1
Gestational age at birth (days) 39.4 (2.6) 39.7 (1.6)
Birth weight (g) 3,462.7 (655.3) 3,537.8 (535.6)
Preterm birth 7,964 (7.2%) 22,853 (4.6%)
Fetal sex

    Boy 63,539 (52%) 255,665 (51%)
    Girl 59,765 (48%) 241,737 (49%)
    Unknown 0 (0%) 0 (0%)
Birth type

    Elektiv sectio 7,136 (5.8%) 38,354 (7.7%)
    Induktion 18,174 (15%) 91,815 (18%)
    Spontan 98,154 (80%) 367,233 (74%)
Region of residency

    North 7,272 (11%) 46,205 (9.3%)
    Central 10,618 (17%) 207,779 (42%)
    South 45,453 (72%) 243,418 (49%)
Maternal age at birth (years)

    <=24 20,186 (16%) 55,914 (11%)
    25-29 40,132 (33%) 155,889 (31%)
    30-34 39,267 (32%) 172,148 (35%)
    >=35 23,746 (19%) 113,391 (23%)
Maternal education

    Primary or lower 10,327 (10%) 35,275 (7.7%)
    High school 36,552 (36%) 160,604 (35%)
    University or higher 42,808 (42%) 222,978 (49%)
    Don´t know 12,222 (12%) 40,327 (8.8%)
Maternal body mass index

    Underweight 2,399 (2.5%) 12,670 (2.6%)
    Normalweight 52,599 (54%) 276,051 (58%)
    Overweight 26,991 (28%) 123,726 (26%)
    Obesity 15,371 (16%) 67,167 (14%)
Maternal smoking status 4,964 (6.0%) 21,807 (4.8%)
1 Mean (SD); n (%)

Analysis

Association between IQR increment in exposure and outcomes

To assess the association between mean air pollution levels during the first and exploratively second trimester of pregnancy and outcomes in single pollutant linear regression models.

Exposure (independent variable, IQR increment): Mean preconception and first trimester levels of PM2.5, PM10, NO2 and O3

Outcomes (dependent variable, z-scores): zAC_W16W23, zFL_W16W23, zBPD_W16W23, zAC_W24W31, zFL_W24W31, zBPD_W24W31, zAC_W32W40, zFL_W32W40, zBPD_W32W40, zBW, zBL, zBHC

Co-variates (adjusted for in all models): year and month of conception, maternal education, BMI and age

Function Linear regression: First and second trimester exposure and all fetal and birth outcomes
Results written to T1_birth_regression_results.xlsx 
Results written to T1_W16W23_regression_results.xlsx 
Results written to T1_W24W31_regression_results.xlsx 
Results written to T1_W32W40_regression_results.xlsx 
Results written to T2_birth_regression_results.xlsx 
Results written to T2_W24W31_regression_results.xlsx 
Results written to T2_W32W40_regression_results.xlsx 
Combined excel files for T1 and for T2

By pollutant. T1 exposure and outcomes. Associations in linear regression

By time point in pregnancy: T1 exposure and outcomes Associations in linear regression

By time point in pregnancy: T2 exposure and outcomes Associations in linear regression

Subanalysis: Repeated observations W16W23 and W32W40(n=153,000) : Linear regression

Results written to T1_birth_regression_results.xlsx 
Results written to T1_W16W23_regression_results.xlsx 
Results written to T1_W24W31_regression_results.xlsx 
Results written to T1_W32W40_regression_results.xlsx 

Mean levels and correlation of exposure Trimester 1 and Trimester 2

Mean exposure levels in T1 and T2

             T1        T2
PM2.5  7.551261  7.500292
PM10  15.219967 15.310912
NO2   14.259364 14.329791
O3    53.326492 54.175025

Correlation of exposure levels in T1 and T2

Assessment of linearity

Redo zAC, zFL, zBPD, zEFW linear regression
Redo zBW / zBL / zBHC linear regression
tertile linear regression: zAC, zFL, zBPD, zEFW
tertile linear regression zBW / zBL / zBHC
Prepare for Tertile/Linear regression Forest Plot
[1] "pollutant"           "exposure intercept"  "intercept threshold"
[4] "p"                   "beta"                "ci_low"             
[7] "ci_up"               "outcome"            

Function Forest Plot

Assessment of linearity: GLM model zBW

png 
  2 
png 
  2 
png 
  2 
png 
  2 

Assessment of linearity: GLM model zAC_W16W23

png 
  2 
png 
  2 
png 
  2 
png 
  2 

Stratification birth weight / zBPDW16W23

Two pollutant model

png 
  2 
png 
  2 
zBW two pollutant

zBPD two pollutant

G computational method

G-computation for analyzing the effects of exposure mixtures. Quantile g-computation yields estimates of the effect of increasing all exposures by one quantile, simultaneously. This, it estimates a “mixture effect” useful in the study of exposure mixtures such as air pollution.

Linear model:

Adjusted for year and month of conception, education, BMI and maternal age 4 Quantiles

G computational models, 4 quantiles without Ozone

      Outcome   Beta  Lower  Upper Sum_positive Sum_negative  Analysis
1 zBPD_W16W23 -0.010 -0.347 -0.014        0.000        -0.01 Q4, no O3
2  zAC_W16W23 -0.006  0.373 -0.013        0.003        -0.01 Q4, no O3
3  zFL_W16W23  0.002  0.287 -0.004        0.012        -0.01 Q4, no O3

G computational models, 4 quantiles with Ozone

      Outcome   Beta  Lower  Upper Sum_positive Sum_negative    Analysis
1 zBPD_W16W23 -0.014 -0.341 -0.019        0.000       -0.014 Q4, with O3
2  zAC_W16W23 -0.007  0.373 -0.018        0.004       -0.011 Q4, with O3
3  zFL_W16W23  0.010  0.269  0.001        0.018       -0.008 Q4, with O3

5 quantiles, without ozone

      Outcome   Beta  Lower  Upper Sum_positive Sum_negative  Analysis
1 zBPD_W16W23 -0.008 -0.337 -0.011        0.001       -0.009 Q5, no O3
2  zAC_W16W23 -0.006  0.375 -0.011        0.004       -0.010 Q5, no O3
3  zFL_W16W23  0.001  0.298 -0.004        0.010       -0.009 Q5, no O3

10 quantiles, without ozone

      Outcome   Beta  Lower  Upper Sum_positive Sum_negative   Analysis
1 zBPD_W16W23 -0.004 -0.339 -0.005        0.000       -0.004 Q10, no O3
2  zAC_W16W23 -0.003  0.366 -0.005        0.001       -0.004 Q10, no O3
3  zFL_W16W23  0.001  0.282 -0.002        0.005       -0.005 Q10, no O3

Joint G computational results for first trimester: with/without O3, Q4, Q5, Q10
       Outcome   Beta  Lower  Upper Sum_positive Sum_negative    Analysis
1   zAC_W16W23 -0.003  0.366 -0.005        0.001       -0.004  Q10, no O3
2   zAC_W16W23 -0.006  0.373 -0.013        0.003       -0.010   Q4, no O3
3   zAC_W16W23 -0.007  0.373 -0.018        0.004       -0.011 Q4, with O3
4   zAC_W16W23 -0.006  0.375 -0.011        0.004       -0.010   Q5, no O3
5  zBPD_W16W23 -0.004 -0.339 -0.005        0.000       -0.004  Q10, no O3
6  zBPD_W16W23 -0.010 -0.347 -0.014        0.000       -0.010   Q4, no O3
7  zBPD_W16W23 -0.014 -0.341 -0.019        0.000       -0.014 Q4, with O3
8  zBPD_W16W23 -0.008 -0.337 -0.011        0.001       -0.009   Q5, no O3
9   zFL_W16W23  0.001  0.282 -0.002        0.005       -0.005  Q10, no O3
10  zFL_W16W23  0.002  0.287 -0.004        0.012       -0.010   Q4, no O3
11  zFL_W16W23  0.010  0.269  0.001        0.018       -0.008 Q4, with O3
12  zFL_W16W23  0.001  0.298 -0.004        0.010       -0.009   Q5, no O3

Longitudinal models

Calculate Delta Beta for zBPD, zAC, zFL

Visualise differences in Δz-scores

zBPD_d1d2, zBPD_d1d3, zBPD_d2d3 zAC_d1d2, zAC_d1d3, zAC_d2d3 zFL_d1d2, zFL_d1d3, zFL_d2d3

Longitudinal analysis: Association between IQR increment in exposure and difference in outcomes

To assess the association between mean air pollution levels during the first trimester of pregnancy and differences in outcomes in single pollutant linear regression models.

Exposure (independent variable, IQR increment): Mean first trimester levels of PM2.5, PM10, NO2 and O3

Outcomes (dependent variable, z-scores): zBPD_d1d2, zBPD_d1d3, zBPD_d2d3 zAC_d1d2, zAC_d1d3, zAC_d2d3 zFL_d1d2, zFL_d1d3, zFL_d2d3

Co-variates (adjusted for in all models): year and month of conception, maternal education, BMI and age

Function Linear regression
Results written to delta_BPD_regression_results.xlsx 
Results written to delta_AC_regression_results.xlsx 
Results written to delta_FL_regression_results.xlsx 
Combined excel files

By pollutant. Associations between T1 exposure and Δ outcomes in linear regression

By outcome. Associations between T1 exposure and Δ outcomes in linear regression

KUB sensitivity analyses

Linear regression among KUB cases

Association between IQR increment in exposure and outcomes

To assess the association between mean air pollution levels during the first and exploratively second trimester of pregnancy and outcomes in single pollutant linear regression models.

Exposure (independent variable, IQR increment): Mean preconception and first trimester levels of PM2.5, PM10, NO2 and O3

Outcomes (dependent variable, z-scores): zAC_W16W23, zFL_W16W23, zBPD_W16W23, zAC_W24W31, zFL_W24W31, zBPD_W24W31, zAC_W32W40, zFL_W32W40, zBPD_W32W40, zBW, zBL, zBHC

Co-variates (adjusted for in all models): year and month of conception, maternal education, BMI and age

Function Linear regression: First and second trimester exposure and all fetal and birth outcomes
Results written to T1_birth_regression_results.xlsx 
Results written to T1_W16W23_regression_results.xlsx 
Results written to T1_W24W31_regression_results.xlsx 
Results written to T1_W32W40_regression_results.xlsx 
Results written to T2_birth_regression_results.xlsx 
Results written to T2_W24W31_regression_results.xlsx 
Results written to T2_W32W40_regression_results.xlsx 
Combined excel files for T1 and for T2

By pollutant. T1 exposure and outcomes. Associations in linear regression

To do list: - Update flow chart - Add repeated measurement to flow chart - Check G computational results?

Presentation - Summarize results - Delayed effect on FL over time? - Pronounced effect on birth weight - G-computational model for all outcomes?