Descriptives Fetal Growth Air Pollution

Purpose

To explore descriptive statistics on exposure, outcome and sociodemographics including data availability.

A data base compiling fetal ultrasound and birth anthropometric outcomes, corresponding z-scores, sociodemographic information on pregnant and their children, and corresponding exposure data on 4 air pollutants by gestational week is used.

Data flow chart

Raw datasets

Birth dataset English Language: chamnha_healthdata Cases: n=620,866; Individual ids: 620,866; 2014-2019; Variables: ID, date of birth, gestational age, demographics

Birth dataset Swedish language: db Cases: n=639,236; Individual ids: 629,989; 2014-2019; Variables: ID, date of birth, gestational age, demographics

Ultrasound dataset: ultraljud_foster; Individual ids: 569038; 2014-2019; Variables: ID, date of ultrasound (several per case), number of fetuses, ultrasound measures (crl, bpd etc.)

KUB ultrasound dataset: kub_undersokningar; Cases: n=213,250; Individual ids: 205,390; 2014-2019; Variables: ID, date of birth, ultrasound measures

Exposure dataset: Individual ids: n=500,958; exposure data provided by spatiotemporal model

Code

# Review availability of cases in datasets used in this project

#access datasets: Birth dataset English language
setwd("P:/EnvEpi-Projekt/CHAMNHA/6_ultrasound_nikolaus")
load("1_originaldata/chamnha_healthdata.Rda")
#names(ch_ht)
#num_unique_ids <- length(unique(ch_ht$Graviditet))
#num_unique_ids
#access datasets: Birth dataset Swedish language
setwd("P:/EnvEpi-Projekt/CHAMNHA")
load("1_paper_temp_ptb/02_datasets/health_data.Rda")
#names(db)
#num_unique_ids <- length(unique(db$GraviditetID))
#num_unique_ids
#access datasets: Ultrasound dataset
setwd("P:/EnvEpi-Projekt/CHAMNHA/6_ultrasound_nikolaus")
data_us <- as.data.table(read.csv(file = "1_originaldata/ultraljud_foster.csv", sep = ";")) 
#names(data_us)
#num_unique_ids <- length(unique(data_us$graviditetid))
#num_unique_ids 
#access datasets: KUB dataset
setwd("P:/EnvEpi-Projekt/CHAMNHA/6_ultrasound_nikolaus")
load("2_processeddata/conceptiondate_all/conceptiondate_kub.Rda")
#names(data1)
#num_unique_ids <- length(unique(data1$id))
#num_unique_ids

Identical cases matrix across datasets

Overlap matrix of identical cases between the four datasets Birth dataset English Language, Birth dataset Swedish language, Ultrasound dataset, KUB ultrasound dataset

Code

#Rename id variable, remove other cases, deduplicate cases 
bir_en <- ch_ht %>%
  rename(id = Graviditet) %>%
  select(id) %>%
  distinct(id, .keep_all = TRUE)
bir_sw <- db %>%
  rename(id = GraviditetID) %>%
  select(id) %>%
  distinct(id, .keep_all = TRUE)
ultra <- data_us %>%
  rename(id = graviditetid) %>%
  select(id) %>%
  distinct(id, .keep_all = TRUE)
kub <- data1 %>%
  select(id) %>%
  distinct(id, .keep_all = TRUE)
rm(data1, db, ch_ht, data_us)

#Make a matrix of identical cases in the four datasets

# Count the total number of cases in each dataset
dataset_counts <- sapply(list(bir_en, bir_sw, ultra, kub), nrow)
names(dataset_counts) <- c("bir_en", "bir_sw", "ultra", "kub")

# Calculate the overlap matrix
datasets <- list(bir_en$id, bir_sw$id, ultra$id, kub$id)
overlap_matrix <- matrix(0, nrow = 4, ncol = 4, dimnames = list(names(dataset_counts), names(dataset_counts)))

for (i in 1:4) {
  for (j in 1:4) {
    overlap_matrix[i, j] <- length(intersect(datasets[[i]], datasets[[j]]))
  }
}

# Add dataset counts as row and column labels
rownames(overlap_matrix) <- paste0(names(dataset_counts), " (", dataset_counts, ")")
colnames(overlap_matrix) <- paste0(names(dataset_counts), " (", dataset_counts, ")")

print(overlap_matrix)

                bir_en (620866) bir_sw (629989) ultra (569038) kub (205390)
bir_en (620866)          620866          620866         560508       202163
bir_sw (629989)          620866          629989         569038       205390
ultra (569038)           560508          569038         569038       194180
kub (205390)             202163          205390         194180       205390

Code

rm(list = ls())

Suggested inclusion criteria

registered in the Swedish Pregnancy Register 2014-2019: birth_swe/db (n=629989) and birth_eng/ch_ht (n=620866)
??any ultrasound outcome during pregnancy available: ultrasound dataset (n=569038) and kub (n=205390) ??
date of conception via: 1) kub exam; 2) ultrasound exam; 3) birth dataset
residential address available for matching exposure with individual pregnant women (n=500,XXX)
?birth weight z-score?: birth weight z-score calculated by gestational age-specific birth weight provided by birth_eng/birth_swe datasets

Exclusion criteria

??no outcome data??
??no co-variates??

Flow chart

Data transformation

The dataset is modified for presentation purposes and variables. For example, variables on month, year and season of conception, birth weight z-score tertiles are defined.

To Do: Add data from Swedish birth dataset (sociodemographics need to be translated and merged)

Code

rm(list = ls())
# set wd for manual work (even though already specified in YAML code)
setwd("P:/EnvEpi-Projekt/CHAMNHA/6_ultrasound_nikolaus")
load("2_processeddata/conceptiondate_all/conceptiondate_merged_ustype_gestage_maxultrasoundmeasurement_birthoutcomes_USzscores_EFWzscores_BWzscores_whozscores_sorted_outcavail_socdem_wide_expos.Rda")

#rename database
data <- data_exposure
rm(data_exposure)

# remove data without exposure and without zBW
data <- data %>%
  filter(
    !is.na(pm25_trim_first) & !is.na(pm10_trim_first) & !is.na(no2_trim_first) & !is.na(o3_trim_first)) %>% 
  filter(!is.na(zBW))

#rename variables
data <- data %>%
  mutate(mat_age_cat = dplyr::recode(mat_age_cat, 
                         "<=24" = "≤24", 
                         ">=35" = "≥35"),
    fstart = dplyr::recode(fstart, 
                    "Elektiv sectio" = "Cesarean section", 
                    "Induktion" = "Induction", 
                    "Spontan" = "Spontaneous")
  )

#define season of conception
#month of conception
data <- as.data.table(data)
data[, month_conception := as.numeric(format(conceptiondate, "%m"))]
#year of conception
data[, year_conception := as.numeric(format(conceptiondate, "%y"))]
#year group
data <- data %>%
  mutate(year_group = case_when(
    year_conception %in% c(13, 14) ~ "2013/14",
    year_conception %in% c(15, 16) ~ "2015/16",
    year_conception %in% c(17, 18, 19) ~ "2017/18/19",
    TRUE ~ as.character(year_conception)
  ))
#define season of conception
data <- data %>%
  mutate(season_conception = factor(case_when(
    month_conception %in% c(12, 1, 2) ~ "Winter",
    month_conception %in% c(3, 4, 5) ~ "Spring",
    month_conception %in% c(6, 7, 8) ~ "Summer",
    month_conception %in% c(9, 10, 11) ~ "Fall",
    TRUE ~ NA_character_
  ), levels = c("Winter", "Spring", "Summer", "Fall")))
# Create a new variable birth_weight_tertiles and birth weight
data <- data %>%
  mutate(
    zBW_tertiles = cut(
      zBW,
      breaks = quantile(zBW, probs = c(0, 1/3, 2/3, 1), na.rm = TRUE),
      labels = c("Low tertile zBW", "Mid tertile zBW", "High tertile zBW"),
      include.lowest = TRUE)) %>%
  mutate(
    BW_tertiles = cut(
      birth_weight,
      breaks = quantile(birth_weight, probs = c(0, 1/3, 2/3, 1), na.rm = TRUE),
      labels = c("Low tertile BW", "Mid tertile BW", "High tertile BW"),
      include.lowest = TRUE))
# Create tertiles for each pollutant
data <- data %>%
  mutate(
    pm25_tertiles = cut(
      pm25_trim_first,
      breaks = quantile(pm25_trim_first, probs = c(0, 1/3, 2/3, 1), na.rm = TRUE),
      labels = c("Low tertile", "Mid tertile", "High tertile"),
      include.lowest = TRUE)
  ) %>%
  mutate(
    pm10_tertiles = cut(
      pm10_trim_first,
      breaks = quantile(pm10_trim_first, probs = c(0, 1/3, 2/3, 1), na.rm = TRUE),
      labels = c("Low tertile", "Mid tertile", "High tertile"),
      include.lowest = TRUE)
  ) %>%
  mutate(
    no2_tertiles = cut(
      no2_trim_first,
      breaks = quantile(no2_trim_first, probs = c(0, 1/3, 2/3, 1), na.rm = TRUE),
      labels = c("Low tertile", "Mid tertile", "High tertile"),
      include.lowest = TRUE)
  ) %>%
  mutate(
    o3_tertiles = cut(
      o3_trim_first,
      breaks = quantile(o3_trim_first, probs = c(0, 1/3, 2/3, 1), na.rm = TRUE),
      labels = c("Low tertile", "Mid tertile", "High tertile"),
      include.lowest = TRUE)
  )

# Clean specific variables
data <- data %>%
  mutate(meduc = if_else(meduc == "Don´t know", NA_character_, meduc)) %>% 
 mutate(smoke_preg = factor(smoke_preg,
                            levels = c("Yes", "No"),
                            labels = c("Smoker", "Non-smoker")))
#clean variable zBHC (n=61)
data$zBHC[is.infinite(data$zBHC)] <- NA

Exposure descriptives

Mean and tertile exposure levels

Mean and tertile air pollution levels during first trimester in the study population.

Code

# Mean pollution levels
trialdata <- data |> 
  select(pm25_trim_first, pm10_trim_first, no2_trim_first, o3_trim_first)

mean_exposure_table <-
  tbl_summary(
    trialdata,
    label = list(pm25_trim_first ~" PM2.5 (µg/m3)", pm10_trim_first ~ "PM10 (µg/m3)", no2_trim_first ~ "NO2 (µg/m3)", o3_trim_first ~ "O3 (µg/m3)"),
        statistic = list(all_continuous() ~ "{mean} ({sd})"), 
    digits = all_continuous() ~ 1,
    missing = "no") |> 
      modify_header(label = "**Characteristics**") |> 
      bold_labels() |> 
      modify_caption("Table 2: Ambient air pollution levels during first trimester in the study population")

# Tertile pollution levels
# Summary table for PM10
trialdata <- data |> 
  select(pm25_trim_first, pm10_trim_first, no2_trim_first, o3_trim_first, 
         pm25_tertiles, pm10_tertiles, no2_tertiles, o3_tertiles)

summary_table_pm10 <- tbl_summary(
  trialdata,
  by = pm10_tertiles, 
  include = c("pm10_trim_first"),
  label = list(pm10_trim_first ~ "PM10 (µg/m3)"), 
  statistic = list(
    all_continuous() ~ "{mean} ({sd})"
  ),
  digits = all_continuous() ~ 1,
  missing = "no")
# Summary table for NO2
summary_table_no2 <- tbl_summary(
  trialdata,
  by = no2_tertiles, 
  include = c("no2_trim_first"),
  label = list(no2_trim_first ~ "NO2 (µg/m3)"), 
  statistic = list(
    all_continuous() ~ "{mean} ({sd})"
  ),
  digits = all_continuous() ~ 1,
  missing = "no"
)
# Summary table for O3
summary_table_o3 <- tbl_summary(
  trialdata,
  by = o3_tertiles, 
  include = c("o3_trim_first"),
  label = list(o3_trim_first ~ "O3 (µg/m3)"), 
  statistic = list(
    all_continuous() ~ "{mean} ({sd})"
  ),
  digits = all_continuous() ~ 1,
  missing = "no"
)
# Summary table for PM25
summary_table_pm25 <- tbl_summary(
  trialdata,
  by = pm25_tertiles, 
  include = c("pm25_trim_first"),
  label = list(pm25_trim_first ~ "PM2.5 (µg/m3)"), 
  statistic = list(
    all_continuous() ~ "{mean} ({sd})"
  ),
  digits = all_continuous() ~ 1,
  missing = "no"
)

Table 2: Ambient air pollution levels during first trimester in the study population
Characteristics	N = 497,402¹
PM2.5 (µg/m3)	7.6 (2.0)
PM10 (µg/m3)	15.2 (4.1)
NO2 (µg/m3)	14.3 (8.0)
O3 (µg/m3)	53.3 (9.5)
¹ Mean (SD)

Characteristic	Low tertile N = 165,801¹	Mid tertile N = 165,800¹	High tertile N = 165,801¹
PM2.5 (µg/m3)	5.4 (1.0)	7.6 (0.5)	9.7 (1.1)
¹ Mean (SD)

Characteristic	Low tertile N = 165,801¹	Mid tertile N = 165,800¹	High tertile N = 165,801¹
PM10 (µg/m3)	11.2 (1.8)	14.7 (0.8)	19.8 (2.9)
¹ Mean (SD)

Characteristic	Low tertile N = 165,801¹	Mid tertile N = 165,800¹	High tertile N = 165,801¹
NO2 (µg/m3)	6.2 (2.7)	13.3 (1.6)	23.3 (5.9)
¹ Mean (SD)

Characteristic	Low tertile N = 165,801¹	Mid tertile N = 165,800¹	High tertile N = 165,801¹
O3 (µg/m3)	43.3 (3.2)	52.2 (3.1)	64.5 (4.7)
¹ Mean (SD)

Exposure in relation to WHO/EU guidelines

WHO/EU guidelines for individual air pollutants and proportion of population exposed to levels exceeding guuidelines during first trimester.

Code

thresholds <- tibble(
  Pollutant = c("PM2.5", "PM10", "NO2", "O3"),
  WHO_Threshold = c(5, 15, 10, 60),
  EU_Threshold = c(10, 20, 20, 100)
)

# Calculate exceedance proportions
exceedance_data <- data %>%
  summarise(
    PM2.5_WHO = mean(pm25_trim_first > thresholds$WHO_Threshold[1], na.rm = TRUE),
    PM10_WHO = mean(pm10_trim_first > thresholds$WHO_Threshold[2], na.rm = TRUE),
    NO2_WHO = mean(no2_trim_first > thresholds$WHO_Threshold[3], na.rm = TRUE),
    O3_WHO = mean(o3_trim_first > thresholds$WHO_Threshold[4], na.rm = TRUE),
    PM2.5_EU = mean(pm25_trim_first > thresholds$EU_Threshold[1], na.rm = TRUE),
    PM10_EU = mean(pm10_trim_first > thresholds$EU_Threshold[2], na.rm = TRUE),
    NO2_EU = mean(no2_trim_first > thresholds$EU_Threshold[3], na.rm = TRUE),
    O3_EU = mean(o3_trim_first > thresholds$EU_Threshold[4], na.rm = TRUE)
  ) %>%
  pivot_longer(
    cols = everything(),
    names_to = c("Pollutant", "Standard"),
    names_sep = "_",
    values_to = "Proportion_Exceeding"
  ) %>%
  mutate(Proportion_Exceeding = round(Proportion_Exceeding * 100, 1)) # Convert to percentage

# Combine WHO and EU data into a single table
final_table <- thresholds %>%
  left_join(
    exceedance_data %>%
      filter(Standard == "WHO") %>%
      select(Pollutant, WHO_Exceedance = Proportion_Exceeding),
    by = "Pollutant"
  ) %>%
  left_join(
    exceedance_data %>%
      filter(Standard == "EU") %>%
      select(Pollutant, EU_Exceedance = Proportion_Exceeding),
    by = "Pollutant"
  )

# Create the gt table
gt_table <- final_table %>%
  gt() %>%
  tab_header(
    title = "International Guidelines and Proportion of Cases Exceeding",
    subtitle = "WHO (2021) and EU (2024) Standards for Air Pollutants (Average in First Trimester)"
  ) %>%
  cols_label(
    Pollutant = "Pollutant",
    WHO_Threshold = "WHO Guideline (µg/m³)",
    WHO_Exceedance = "Proportion Exceeding WHO (%)",
    EU_Threshold = "EU Guideline (µg/m³)",
    EU_Exceedance = "Proportion Exceeding EU (%)"
  )

Pollutant	WHO Guideline (µg/m³)	EU Guideline (µg/m³)	Proportion Exceeding WHO (%)	Proportion Exceeding EU (%)
International Guidelines and Proportion of Cases Exceeding
WHO (2021) and EU (2024) Standards for Air Pollutants (Average in First Trimester)
PM2.5	5	10	88.9	10.1
PM10	15	20	45.7	13.4
NO2	10	20	69.7	20.2
O3	60	100	27.2	0.0

Exposure by season and year

Distribution of air pollutants by season and by year.

Code

#boxplots: air quality by season of conception

#select data: air pollutants and season
data1 <- data %>%
  select(pm25_wk_1, pm10_wk_1, o3_wk_1, no2_wk_1, 
         season_conception)

#reshape data into long format
data1_long <- data1 %>%
  gather(variable, value, pm25_wk_1:no2_wk_1)

#replace values of air pollutants
data1_long <- data1_long %>%
  mutate(variable = case_when(
    variable == "pm25_wk_1" ~ "PM 2.5",
    variable == "pm10_wk_1" ~ "PM 10",
    variable == "no2_wk_1" ~ "NO2",
    variable == "o3_wk_1" ~ "O3",
    TRUE ~ variable  # Keep the original value if it doesn't match any condition
  ))  %>%
  rename(Pollutant = variable)

#plot combined boxplots: season
airpoll_season <- ggplot(data1_long, aes(x = season_conception, y = value, fill = Pollutant)) +
  geom_boxplot() +
  labs(title = "Air Quality by season of conception",
       x = "Season of conception",
       y = "Concentration of air pollution (µm/m3)",
       fill = "Pollutant") +
  scale_fill_manual(values = c("blue", "red", "green", "purple"))  +
  theme_classic() +
  theme(axis.text.x = element_text(size = 16),  axis.text.y = element_text(size = 16),  
        legend.text = element_text(size = 16),  legend.title = element_text(size = 16), 
        plot.title = element_text(size = 20)) 

#boxplots: air quality by year

#select data: air pollutants and season
data2 <- data %>%
  select(pm25_wk_1, pm10_wk_1, o3_wk_1, no2_wk_1, 
         year_group)

#reshape data into long format
data2_long <- data2 %>%
  gather(variable, value, pm25_wk_1:no2_wk_1)

#replace values of air pollutants
data2_long <- data2_long %>%
  mutate(variable = case_when(
    variable == "pm25_wk_1" ~ "PM 2.5",
    variable == "pm10_wk_1" ~ "PM 10",
    variable == "no2_wk_1" ~ "NO2",
    variable == "o3_wk_1" ~ "O3",
    TRUE ~ variable  # Keep the original value if it doesn't match any condition
  ))  %>%
  rename(Pollutant = variable)

#plot combined boxplots: season
airpoll_year <- ggplot(data2_long, aes(x = year_group, y = value, fill = Pollutant)) +
  geom_boxplot() +
  labs(title = "Air Quality by year of conception",
       x = "Year of conception",
       y = "Concentration of air pollution (µm/m3)",
       fill = "Pollutant") +
  scale_fill_manual(values = c("blue", "red", "green", "purple"))  +
  theme_classic() +
  theme(axis.text.x = element_text(size = 16),  axis.text.y = element_text(size = 16),  
        legend.text = element_text(size = 16),  legend.title = element_text(size = 16), 
        plot.title = element_text(size = 20))

Correlation between exposures

Correlation between exposure to multiple air pollutants during first trimester (gestational week 0-13) in the study population. Note: Blue: coefficient >0; red: coefficient <0.

Code

#load libraries
pacman::p_load(AMR, 
               corrplot, 
               gridGraphics)
 
#set data to data.table
setDT(data)

# Correlation between first trimester exposures
data1 <- data %>%
  filter(
    !is.na(pm25_trim_first) & !is.na(pm10_trim_first) & !is.na(no2_trim_first) & !is.na(o3_trim_first)
  )

exp_matrix <- data1 %>%
  select(
    pm25_trim_first,
    pm10_trim_first,
    no2_trim_first,
    o3_trim_first
  )

colnames(exp_matrix) <- c("Trim1 PM2.5", "Trim1 PM10", "Trim1 NO2", "Trim1 O3")

#apply function cor to  matrix "exp_matrix
polday <- cor(as.matrix(exp_matrix))

corrplot(method = "ellipse", 
           polday, type = "lower", 
           order = "original", 
           title = element_blank(), 
           addCoef.col = 'black', tl.col = "black", 
           cl.pos = 'n', col = COL2('RdBu'),
           tl.srt = 0, 
           diag = FALSE, 
           mar = c(0, 0, 1, 0),
           tl.offset = 0.71, 
           col.lim = c(-1, 1),
           number.cex = 1.5, 
           tl.cex = 1.2,
           cl.cex = 1.3
)

Correlation between exposure to multiple air pollutants during first trimester (gestational week 0-13). and for individual weeks in first, second and third trimester in the study population. Note: Blue: coefficient >0; red: coefficient <0.

Code

#Correlation between weekly exposures Week 1, Week 13, Week 28

# Exclude cases with all exposures missing
# Exclude cases with all exposures missing
data1 <- data %>%
  filter(
    !is.na(pm25_wk_1) & !is.na(pm10_wk_1) & !is.na(no2_wk_1) & !is.na(o3_wk_1) & 
    !is.na(pm25_wk_13) & !is.na(pm10_wk_13) & !is.na(no2_wk_13) & !is.na(o3_wk_13) &
    !is.na(pm25_wk_28) & !is.na(pm10_wk_28) & !is.na(no2_wk_28) & !is.na(o3_wk_28)
  )

# Create the exposure matrix
exp_matrix <- data1 %>%
  select(
    pm25_wk_1, pm25_wk_13, pm25_wk_28, 
    pm10_wk_1, pm10_wk_13, pm10_wk_28, 
    no2_wk_1, no2_wk_13, no2_wk_28,
    o3_wk_1, o3_wk_13, o3_wk_28
  )

# Set the names for the figure
colnames(exp_matrix) <- c(
  "W1 PM2.5", "W13 PM2.5", "W28 PM2.5",
  "W1 PM10", "W13 PM10", "W28 PM10",
  "W1 NO2", "W13 NO2", "W28 NO2",
  "W1 O3", "W13 O3", "W28 O3"
)

# Apply the cor function to the exposure matrix
polday <- cor(as.matrix(exp_matrix))

# Save the correlation plot
corrplot_wk_W1W13W28 <- corrplot(
  method = "ellipse", 
  polday, 
  type = "lower", 
  order = "original", 
  title = element_blank(), 
  addCoef.col = 'black', 
  tl.col = "black", 
  cl.pos = 'n', 
  col = COL2('RdBu'),
  tl.srt = 90, 
  diag = FALSE, 
  number.cex = 0.6, 
  mar = c(0, 0, 1, 0), 
  tl.offset = 0.5, 
  tl.cex = 0.8,
  col.lim = c(-1, 1),
  cl.cex = 0.7
)

Outcomes descriptives

Distribution of birth weight and birth weight ultrasound z-scores.

Code

#Box plot of zBW by tertiles

#plot birth_weight_zscore_tertiles by actual birth weight
boxplot_BWtertiles <- ggplot(data %>% filter(!is.na(birth_weight)), aes(x = BW_tertiles, y = birth_weight)) +
  geom_boxplot() + 
  labs(x = "Birth Weight Tertiles", y = "Birth Weight (kg)") +
  ggtitle("Boxplot of birth weight by Birth Weight Tertiles")

#plot birth_weight_zscore_tertiles by actual birth weight
boxplot_zBWtertiles <- ggplot(data %>% filter(!is.na(zBW)), aes(x = zBW_tertiles, y = zBW)) +
  geom_boxplot() + 
  labs(x = "Z-score Birth Weight Tertiles", y = "z-score Birth Weight") +
  ggtitle("Boxplot of z-score birth weight by z-score Birth Weight Tertiles")

#Make ggplots for z-score distribution

# Select relevant columns
selected_zscore_columns <- data %>%
  select(
    zAC_W16W23, zBPD_W16W23, zFL_W16W23,
    zAC_W24W31, zBPD_W24W31, zFL_W24W31,
    zAC_W32W40, zBPD_W32W40, zFL_W32W40
  )

# Reshape data to long format
long_zscore_data <- selected_zscore_columns %>%
  pivot_longer(
    cols = everything(),
    names_to = c("Variable", "TimeWindow"),
    names_pattern = "z(.*)_(W\\d{2}W\\d{2})",
    values_to = "zScore") %>%
  filter(zScore > -10 & zScore < 10)

zscore_color_mapping <- c("AC" = "green", "BPD" = "red", "FL" = "blue")

# Create a combined plot for z-scores
zscore_combined_plot <- long_zscore_data %>%
  ggplot(aes(x = zScore, fill = Variable, color = Variable)) +
  geom_histogram(
    aes(y = ..density..),
    position = "identity",
    alpha = 0.05,
    bins = 30
  ) +
  geom_density(alpha = 0.3, size = 1) +
  scale_fill_manual(values = zscore_color_mapping) +
  scale_color_manual(values = zscore_color_mapping) +
  facet_wrap(~ TimeWindow, scales = "free", ncol = 1) +
  labs(
    title = "Z-Scores for ultrasound measures in three gestational perods",
    x = "Z-Score",
    y = "Density"
  ) +
  theme_minimal() +
  theme(
    legend.title = element_blank(),
    strip.text = element_text(face = "bold")
  )

#EFW z-scores

# Select relevant columns for EFW z-scores
selected_efw_zscore_columns <- data %>%
  select(
    zEFW_W16W23, 
    zEFW_W24W31, 
    zEFW_W32W40
  )

# Reshape data to long format for EFW z-scores
long_efw_zscore_data <- selected_efw_zscore_columns %>%
  pivot_longer(
    cols = everything(),
    names_to = "TimeWindow",
    names_pattern = "zEFW_(W\\d{2}W\\d{2})",
    values_to = "zScore"
  ) %>%
  filter(zScore > -10 & zScore < 10)

# Define color mapping for EFW (if needed, can be single color)
efw_color_mapping <- c("EFW" = "purple")

# Create the combined plot for EFW z-scores
efw_zscore_combined_plot <- long_efw_zscore_data %>%
  ggplot(aes(x = zScore, fill = "EFW", color = "EFW")) +
  geom_histogram(
    aes(y = ..density..),
    position = "identity",
    alpha = 0.05,
    bins = 30
  ) +
  geom_density(alpha = 0.3, size = 1) +
  scale_fill_manual(values = efw_color_mapping) +
  scale_color_manual(values = efw_color_mapping) +
  facet_wrap(~ TimeWindow, scales = "free", ncol = 1) +
  labs(
    title = "Z-Scores for EFW in three gestational periods",
    x = "Z-Score",
    y = "Density"
  ) +
  theme_minimal() +
  theme(
    legend.position = "none",  # Hide legend since only one variable
    strip.text = element_text(face = "bold")
  )

Warning: The dot-dot notation (`..density..`) was deprecated in ggplot2 3.4.0.
ℹ Please use `after_stat(density)` instead.

Outcome table and Missing outcomes

Distribution of birth and ultrasound raw outcomes, z-scores, and distribution of missing outcomes.

Code

#make trial dataset

trialdata <- data |> 
  select(zBW_tertiles,  zBW,
         birth_weight, birth_length, zBL, birth_headcircumference, zBHC,
         g_age_d_BPD_W16W23 , BPD_W16W23 , zBPD_W16W23 , 
                        g_age_d_BPD_W24W31 , BPD_W24W31 , zBPD_W24W31 , 
                        g_age_d_BPD_W32W40 , BPD_W32W40 , zBPD_W32W40 , 
                        g_age_d_AC_W16W23 , AC_W16W23, zAC_W16W23 , 
                        g_age_d_AC_W24W31 , AC_W24W31 , zAC_W24W31 , 
                        g_age_d_AC_W32W40 , AC_W32W40 , zAC_W32W40 , 
                        g_age_d_FL_W16W23 , FL_W16W23 , zFL_W16W23 , 
                        g_age_d_FL_W24W31 , FL_W24W31 , zFL_W24W31 , 
                        g_age_d_FL_W32W40 , FL_W32W40 , zFL_W32W40 , 
                        g_age_d_EFW_W16W23 , EFW_W16W23 , zEFW_W16W23 , 
                        g_age_d_EFW_W24W31 , EFW_W24W31 , zEFW_W24W31 , 
                        g_age_d_EFW_W32W40 , EFW_W32W40 , zEFW_W32W40)

#make trial crosstable by zBW
outcome_table <-
  tbl_summary(
    trialdata,
    include = c("birth_weight", "birth_length", "birth_headcircumference", 
                "zBL", "zBHC", 
      "BPD_W16W23", "BPD_W24W31", "BPD_W32W40",
"zBPD_W16W23", "zBPD_W24W31", "zBPD_W32W40", 
"AC_W16W23", "AC_W24W31", "AC_W32W40",
"zAC_W16W23", "zAC_W24W31", "zAC_W32W40", 
"FL_W16W23", "FL_W24W31", "FL_W32W40",
"zFL_W16W23", "zFL_W24W31", "zFL_W32W40", 
"EFW_W16W23", "EFW_W24W31", "EFW_W32W40",
"zEFW_W16W23", "zEFW_W24W31", "zEFW_W32W40"),
        by = zBW_tertiles, 
    label = list(birth_weight ~"Birth weight", birth_length ~"Birth length", birth_headcircumference ~"Birth head circumference", 
      BPD_W16W23 ~ "BPD W16-W23", zBPD_W16W23 ~ "zBPD W16-W23", BPD_W24W31 ~ "BPD W24-W31", zBPD_W24W31 ~ "zBPD W24-W31", BPD_W32W40 ~ "BPD W32-W40", zBPD_W32W40 ~ "zBPD W32-W40", AC_W16W23 ~ "AC W16-W23", zAC_W16W23 ~ "zAC W16-W23", AC_W24W31 ~ "AC W24-W31", zAC_W24W31 ~ "zAC W24-W31", AC_W32W40 ~ "AC W32-W40", zAC_W32W40 ~ "zAC W32-W40", FL_W16W23 ~ "FL W16-W23", zFL_W16W23 ~ "zFL W16-W23", FL_W24W31 ~ "FL W24-W31", zFL_W24W31 ~ "zFL W24-W31", FL_W32W40 ~ "FL W32-W40", zFL_W32W40 ~ "zFL W32-W40", EFW_W16W23 ~ "EFW W16-W23", zEFW_W16W23 ~ "zEFW W16-W23", EFW_W24W31 ~ "EFW W24-W31", zEFW_W24W31 ~ "zEFW W24-W31", EFW_W32W40 ~ "EFW W32-W40", zEFW_W32W40 ~ "zEFW W32-W40"),
        statistic = list(
      all_continuous() ~ "{mean} ({sd})",
      all_categorical() ~ "{n} ({p}%)"), 
    digits = all_continuous() ~ 1, 
      missing = "no") |> 
  add_overall() |> 
modify_header(label = "**Characteristics**") |> 
  bold_labels() |> 
modify_caption("Table 1. Anthropometry characteristics. during pregnancy, stratified by tertiles of birth weight z-score")  |> 
 modify_spanning_header(c("stat_1", "stat_2", "stat_3") ~ "**zBW tertiles**")

#make missing outcome table
missing_outcome_table <- tbl_summary(
  trialdata, 
  by = zBW_tertiles, 
  include = c("birth_weight", "birth_length", "birth_headcircumference", 
              "BPD_W16W23", "AC_W16W23", "FL_W16W23", "EFW_W16W23",
              "BPD_W24W31", "AC_W24W31", "FL_W24W31", "EFW_W24W31", 
              "BPD_W32W40", "AC_W32W40", "FL_W32W40", "EFW_W32W40"), 
  label = list(BPD_W16W23 ~ "BPD W16-W23", BPD_W24W31 ~ "BPD W24-W31", 
               BPD_W32W40 ~ "BPD W32-W40", AC_W16W23 ~ "AC W16-W23", 
               AC_W24W31 ~ "AC W24-W31", AC_W32W40 ~ "AC W32-W40", 
               FL_W16W23 ~ "FL W16-W23",  FL_W24W31 ~ "FL W24-W31", 
               FL_W32W40 ~ "FL W32-W40", EFW_W16W23 ~ "EFW W16-W23", 
               EFW_W24W31 ~ "EFW W24-W31", EFW_W32W40 ~ "EFW W32-W40", 
               birth_weight ~"Birth weight", birth_length ~"Birth length", birth_headcircumference ~"Birth head circumference"),
   statistic = list(all_continuous() ~ "{mean}", missing = "no"), missing_stat = "({p_miss}%)") %>%
  modify_header(label = "**Characteristics**") %>%
   add_n(statistic = "{N_miss} ({p_miss}%)")  %>%
  modify_header(n = "**Missing**") %>%
  bold_levels() %>%
  modify_caption("Table 2. Missing proportions of outcome characteristics")

Table 1. Anthropometry characteristics. during pregnancy, stratified by tertiles of birth weight z-score
Characteristics	Overall N = 497,402¹	zBW tertiles
Characteristics	Overall N = 497,402¹	Low tertile zBW N = 165,831¹	Mid tertile zBW N = 165,774¹	High tertile zBW N = 165,797¹
Birth weight	3.5 (0.5)	3.1 (0.4)	3.5 (0.3)	4.0 (0.4)
Birth length	50.0 (4.5)	48.4 (4.7)	50.0 (4.2)	51.5 (4.1)
Birth head circumference	34.7 (3.8)	33.8 (3.9)	34.7 (3.6)	35.6 (3.7)
zBL	0.6 (1.1)	-0.3 (0.9)	0.6 (0.8)	1.4 (0.8)
zBHC	0.9 (1.1)	0.2 (1.1)	0.9 (1.0)	1.6 (1.0)
BPD W16-W23	44.0 (3.7)	43.9 (3.6)	44.0 (3.5)	44.1 (4.0)
BPD W24-W31	76.4 (6.4)	75.9 (5.9)	76.2 (6.5)	77.3 (6.8)
BPD W32-W40	89.5 (6.6)	88.4 (5.8)	89.1 (7.3)	91.2 (6.6)
zBPD W16-W23	-0.5 (1.0)	-0.6 (0.9)	-0.5 (0.9)	-0.5 (1.2)
zBPD W24-W31	-0.4 (0.9)	-0.7 (0.9)	-0.3 (0.9)	0.0 (0.9)
zBPD W32-W40	-0.3 (1.4)	-0.7 (1.2)	-0.3 (1.6)	0.2 (1.3)
AC W16-W23	139.1 (16.1)	137.7 (16.8)	139.1 (16.2)	140.6 (15.0)
AC W24-W31	259.1 (26.1)	253.4 (22.2)	258.1 (25.6)	266.7 (28.6)
AC W32-W40	324.9 (30.5)	313.2 (23.3)	322.2 (27.1)	341.5 (33.6)
zAC W16-W23	0.4 (1.7)	0.1 (1.9)	0.4 (1.8)	0.6 (1.5)
zAC W24-W31	0.8 (1.0)	0.2 (0.9)	0.8 (0.8)	1.5 (1.0)
zAC W32-W40	0.7 (1.1)	-0.1 (0.8)	0.7 (0.7)	1.5 (1.0)
FL W16-W23	29.0 (3.8)	28.8 (3.9)	29.0 (4.0)	29.2 (3.5)
FL W24-W31	56.8 (5.8)	56.4 (5.1)	56.7 (6.6)	57.5 (5.7)
FL W32-W40	69.0 (5.7)	68.2 (4.8)	68.5 (5.1)	70.3 (6.8)
zFL W16-W23	0.3 (1.5)	0.2 (1.6)	0.4 (1.7)	0.5 (1.3)
zFL W24-W31	1.1 (1.4)	0.7 (1.1)	1.1 (1.9)	1.4 (1.1)
zFL W32-W40	1.1 (1.3)	0.6 (1.0)	1.1 (1.0)	1.5 (1.7)
EFW W16-W23	232.9 (123.2)	228.1 (138.0)	232.5 (121.5)	238.2 (108.3)
EFW W24-W31	1,507.6 (393.2)	1,429.6 (327.1)	1,492.8 (385.8)	1,613.2 (444.3)
EFW W32-W40	2,756.7 (639.8)	2,537.7 (465.9)	2,686.2 (581.8)	3,089.1 (732.1)
zEFW W16-W23	-1.9 (17.2)	-2.2 (28.1)	-1.8 (6.0)	-1.6 (7.8)
zEFW W24-W31	0.7 (1.0)	0.1 (0.9)	0.7 (0.8)	1.4 (0.9)
zEFW W32-W40	0.5 (1.0)	-0.3 (0.8)	0.5 (0.7)	1.4 (0.8)
¹ Mean (SD)

Table 2. Missing proportions of outcome characteristics
Characteristics	Missing	Low tertile zBW N = 165,831¹	Mid tertile zBW N = 165,774¹	High tertile zBW N = 165,797¹
Birth weight	0 (0%)	3.05	3.51	4.05
Birth length	0 (0%)	48.45	50.00	51.54
Birth head circumference	0 (0%)	33.76	34.70	35.61
BPD W16-W23	25,535 (5.1%)	43.89	43.96	44.12
Unknown		(5.4%)	(4.9%)	(5.1%)
AC W16-W23	64,546 (13%)	138	139	141
Unknown		(13%)	(13%)	(13%)
FL W16-W23	28,646 (5.8%)	28.81	28.99	29.18
Unknown		(6.0%)	(5.5%)	(5.8%)
EFW W16-W23	66,081 (13%)	228	232	238
Unknown		(14%)	(13%)	(13%)
BPD W24-W31	418,159 (84%)	76	76	77
Unknown		(82%)	(85%)	(85%)
AC W24-W31	418,389 (84%)	253	258	267
Unknown		(82%)	(85%)	(85%)
FL W24-W31	422,171 (85%)	56.4	56.7	57.5
Unknown		(83%)	(86%)	(86%)
EFW W24-W31	422,323 (85%)	1,430	1,493	1,613
Unknown		(83%)	(86%)	(86%)
BPD W32-W40	322,849 (65%)	88.4	89.1	91.2
Unknown		(60%)	(69%)	(66%)
AC W32-W40	318,599 (64%)	313	322	341
Unknown		(59%)	(68%)	(65%)
FL W32-W40	331,598 (67%)	68.2	68.5	70.3
Unknown		(61%)	(70%)	(68%)
EFW W32-W40	331,933 (67%)	2,538	2,686	3,089
Unknown		(62%)	(70%)	(68%)
¹ Mean

Availability of repeated outcomes

To show cases with repeated measurements for gestational windows

Code

#Variables indicating overall data availability by gestational window
data <- data %>%
  mutate(
    # For weeks 16-23
    avail_W16W23 = if_else(
      rowSums(!is.na(select(., starts_with("BPD_W16W23"), 
                            starts_with("AC_W16W23"), 
                            starts_with("FL_W16W23"), 
                            starts_with("EFW_W16W23")))) > 0, 
      "Yes", "Missing"),
    # For weeks 24-31
    avail_W24W31 = if_else(
      rowSums(!is.na(select(., starts_with("BPD_W24W31"), 
                            starts_with("AC_W24W31"), 
                            starts_with("FL_W24W31"), 
                            starts_with("EFW_W24W31")))) > 0, 
      "Yes", "Missing"),
    # For weeks 32-40
    avail_W32W40 = if_else(
      rowSums(!is.na(select(., starts_with("BPD_W32W40"), 
                            starts_with("AC_W32W40"), 
                            starts_with("FL_W32W40"), 
                            starts_with("EFW_W32W40")))) > 0, 
      "Yes", "Missing"),
    # For birth variables
    avail_birth = if_else(
      rowSums(!is.na(select(., birth_weight, birth_length, birth_headcircumference))) > 0, 
      "Yes", "Missing")
  )

proportions <- data %>%
  mutate(
    time_points_available = rowSums(select(., avail_W16W23, avail_W24W31, avail_W32W40) == "Yes"),
    availability_group = case_when(
      time_points_available == 3 ~ "All 3 gestational windows",
      time_points_available == 2 ~ "2 of 3 gestational windows",
      time_points_available == 1 ~ "1 of 3 gestational windows",
      time_points_available == 0 ~ "0 time points"
    )
  ) %>%
  group_by(availability_group) %>%
  summarise(
    count = n(),
    proportion = n() / nrow(data)
  ) %>%
  ungroup()

# Filter out the "0 time points" category
proportions <- proportions %>%
  filter(availability_group != "0 time points") %>%
  mutate(label = as.character(count)) # Only include the number of cases in the label

# Create the pie chart
ggplot(proportions, aes(x = "", y = proportion, fill = availability_group)) +
  geom_bar(stat = "identity", width = 1) +
  coord_polar("y", start = 0) +
  theme_void() +
  labs(
    title = "Availability of Repeated Observations Across Gestational Windows (n)",
    fill = "G"
  ) +
  geom_text(aes(label = label), position = position_stack(vjust = 0.5), size = 5) + # Add case count labels
  theme(
    legend.title = element_text(size = 14),  # Increase legend title font size
    legend.text = element_text(size = 12)    # Increase legend text font size
  ) +
  scale_fill_brewer(palette = "Set2")

Sociodemographics descriptives

Demographics by zBW tertiles and demographics for cases with missing outcomes by trimester.

Code

#Demographics by tertiles 

#make trial dataset
trialdata <- data |> 
  select(zBW_tertiles, gest_age_birth, birth_weight, birth_length, birth_headcircumference, zBHC,
sex, fstart, season, geogr, mat_age_cat, meduc, bmi_cat, smoke_preg, alcohol_cat, hypertension, dm2)

#make crosstable by zBW
baseline_table <-
  tbl_summary(
    trialdata,
    include = c("gest_age_birth", "birth_weight", #anthropometrics
                "sex", "fstart", "season",  #fetal characteristics
                "geogr", "mat_age_cat", "meduc", "bmi_cat", "smoke_preg"), #maternal characteristics
        by = zBW_tertiles, 
    label = list(gest_age_birth ~ "Gestational age at birth (days)", birth_weight ~ "Birth weight (kg)",
    sex ~ "Fetal sex", fstart ~ "Birth type", season ~ "Season of birth", 
    geogr ~ "Region of residency", mat_age_cat ~ "Maternal age at birth (years)", meduc ~ "Maternal education", bmi_cat ~ "Maternal body mass index", smoke_preg ~ "Maternal smoking status"),
        statistic = list(
      all_continuous() ~ "{mean} ({sd})",
      all_categorical() ~ "{n} ({p}%)"), 
    digits = all_continuous() ~ 1, 
    missing_text = "Missing") |> 
  add_overall() |> 
modify_header(label = "**Characteristics**") |> 
  bold_labels() |> 
modify_caption("Table 1. Baseline characteristics. Birth anthropometrics, fetal and maternal characteristics stratified by tertiles of birth weight z-score")  |> 
 modify_spanning_header(c("stat_1", "stat_2", "stat_3") ~ "**zBW tertiles**")

Table 1. Baseline characteristics. Birth anthropometrics, fetal and maternal characteristics stratified by tertiles of birth weight z-score
Characteristics	Overall N = 497,402¹	zBW tertiles
Characteristics	Overall N = 497,402¹	Low tertile zBW N = 165,831¹	Mid tertile zBW N = 165,774¹	High tertile zBW N = 165,797¹
Gestational age at birth (days)	278.6 (11.5)	277.6 (12.1)	278.3 (11.8)	279.8 (10.4)
Birth weight (kg)	3.5 (0.5)	3.1 (0.4)	3.5 (0.3)	4.0 (0.4)
Fetal sex
Female	241,737 (49%)	79,380 (48%)	81,181 (49%)	81,176 (49%)
Male	255,665 (51%)	86,451 (52%)	84,593 (51%)	84,621 (51%)
Birth type
Cesarean section	38,354 (7.7%)	9,848 (5.9%)	12,569 (7.6%)	15,937 (9.6%)
Induction	91,815 (18%)	32,069 (19%)	27,120 (16%)	32,626 (20%)
Spontaneous	367,233 (74%)	123,914 (75%)	126,085 (76%)	117,234 (71%)
Season of birth
Winter	110,574 (22%)	37,142 (22%)	37,075 (22%)	36,357 (22%)
Spring	128,603 (26%)	42,875 (26%)	43,024 (26%)	42,704 (26%)
Summer	135,368 (27%)	44,671 (27%)	45,010 (27%)	45,687 (28%)
Fall	122,857 (25%)	41,143 (25%)	40,665 (25%)	41,049 (25%)
Region of residency
North	46,205 (9.3%)	14,629 (8.8%)	15,436 (9.3%)	16,140 (9.7%)
Central	207,779 (42%)	70,317 (42%)	69,586 (42%)	67,876 (41%)
South	243,418 (49%)	80,885 (49%)	80,752 (49%)	81,781 (49%)
Maternal age at birth (years)
≤24	55,914 (11%)	21,679 (13%)	18,646 (11%)	15,589 (9.4%)
25-29	155,889 (31%)	52,489 (32%)	52,743 (32%)	50,657 (31%)
30-34	172,148 (35%)	55,880 (34%)	57,248 (35%)	59,020 (36%)
≥35	113,391 (23%)	35,759 (22%)	37,115 (22%)	40,517 (24%)
Missing	60	24	22	14
Maternal education
High school	160,604 (38%)	51,678 (38%)	52,750 (38%)	56,176 (40%)
Primary or lower	35,275 (8.4%)	14,146 (10%)	11,376 (8.1%)	9,753 (6.9%)
University or higher	222,978 (53%)	71,859 (52%)	75,662 (54%)	75,457 (53%)
Missing	78,545	28,148	25,986	24,411
Maternal body mass index
Underweight	12,670 (2.6%)	6,712 (4.2%)	3,963 (2.5%)	1,995 (1.2%)
Normalweight	276,051 (58%)	99,690 (62%)	95,339 (60%)	81,022 (51%)
Overweight	123,726 (26%)	36,060 (23%)	40,442 (25%)	47,224 (30%)
Obesity	67,167 (14%)	17,292 (11%)	20,186 (13%)	29,689 (19%)
Missing	17,788	6,077	5,844	5,867
Maternal smoking status
Smoker	21,807 (4.8%)	10,047 (6.6%)	6,593 (4.3%)	5,167 (3.4%)
Non-smoker	433,977 (95%)	142,026 (93%)	145,375 (96%)	146,576 (97%)
Missing	41,618	13,758	13,806	14,054
¹ Mean (SD); n (%)

Table: Characteristics for cases with available ultrasound parameters in respective gestational windows
Characteristics	Available data W16W23	Available data W24W31	Available data W32W40
Characteristics	N = 472,098¹	N = 79,337¹	N = 179,135¹
Gestational age at birth (days)	278.7 (11.4)	274.5 (14.4)	277.9 (10.9)
Birth weight (kg)	3.5 (0.5)	3.4 (0.6)	3.5 (0.6)
Fetal sex
Female	229,515 (49%)	39,301 (50%)	87,815 (49%)
Male	242,583 (51%)	40,036 (50%)	91,320 (51%)
Birth type
Cesarean section	35,863 (7.6%)	10,134 (13%)	18,167 (10%)
Induction	85,997 (18%)	20,413 (26%)	46,314 (26%)
Spontaneous	350,238 (74%)	48,790 (61%)	114,654 (64%)
Season of birth
Winter	104,575 (22%)	18,286 (23%)	40,781 (23%)
Spring	122,428 (26%)	20,031 (25%)	45,705 (26%)
Summer	128,676 (27%)	21,725 (27%)	47,572 (27%)
Fall	116,419 (25%)	19,295 (24%)	45,077 (25%)
Region of residency
North	43,318 (9.2%)	7,389 (9.3%)	14,222 (7.9%)
Central	200,986 (43%)	37,950 (48%)	81,978 (46%)
South	227,794 (48%)	33,998 (43%)	82,935 (46%)
Maternal age at birth (years)
≤24	52,946 (11%)	8,448 (11%)	18,579 (10%)
25-29	148,574 (31%)	22,436 (28%)	51,991 (29%)
30-34	164,063 (35%)	26,241 (33%)	61,153 (34%)
≥35	106,461 (23%)	22,201 (28%)	47,394 (26%)
Missing	54	11	18
Maternal education
High school	152,610 (38%)	25,996 (40%)	57,919 (39%)
Primary or lower	33,412 (8.4%)	6,116 (9.3%)	13,175 (8.8%)
University or higher	212,411 (53%)	33,501 (51%)	79,286 (53%)
Missing	73,665	13,724	28,755
Maternal body mass index
Underweight	12,186 (2.7%)	2,610 (3.4%)	5,512 (3.2%)
Normalweight	264,628 (58%)	39,056 (51%)	91,291 (53%)
Overweight	117,423 (26%)	19,331 (25%)	42,931 (25%)
Obesity	62,225 (14%)	15,203 (20%)	32,712 (19%)
Missing	15,636	3,137	6,689
Maternal smoking status
Smoker	20,506 (4.7%)	4,791 (6.5%)	10,822 (6.5%)
Non-smoker	412,917 (95%)	68,385 (93%)	154,505 (93%)
Missing	38,675	6,161	13,808
¹ Mean (SD); n (%)

Table: Characteristics for cases with missing ultrasound parameters in respective gestational windows
Characteristics	Missing data W16W23	Missing data W24W31	Missing data W32W40
Characteristics	N = 25,304¹	N = 418,065¹	N = 318,267¹
Gestational age at birth (days)	276.8 (13.6)	279.3 (10.7)	278.9 (11.8)
Birth weight (kg)	3.5 (0.6)	3.6 (0.5)	3.6 (0.5)
Fetal sex
Female	12,222 (48%)	202,436 (48%)	153,922 (48%)
Male	13,082 (52%)	215,629 (52%)	164,345 (52%)
Birth type
Cesarean section	2,491 (9.8%)	28,220 (6.8%)	20,187 (6.3%)
Induction	5,818 (23%)	71,402 (17%)	45,501 (14%)
Spontaneous	16,995 (67%)	318,443 (76%)	252,579 (79%)
Season of birth
Winter	5,999 (24%)	92,288 (22%)	69,793 (22%)
Spring	6,175 (24%)	108,572 (26%)	82,898 (26%)
Summer	6,692 (26%)	113,643 (27%)	87,796 (28%)
Fall	6,438 (25%)	103,562 (25%)	77,780 (24%)
Region of residency
North	2,887 (11%)	38,816 (9.3%)	31,983 (10%)
Central	6,793 (27%)	169,829 (41%)	125,801 (40%)
South	15,624 (62%)	209,420 (50%)	160,483 (50%)
Maternal age at birth (years)
≤24	2,968 (12%)	47,466 (11%)	37,335 (12%)
25-29	7,315 (29%)	133,453 (32%)	103,898 (33%)
30-34	8,085 (32%)	145,907 (35%)	110,995 (35%)
≥35	6,930 (27%)	91,190 (22%)	65,997 (21%)
Missing	6	49	42
Maternal education
High school	7,994 (39%)	134,608 (38%)	102,685 (38%)
Primary or lower	1,863 (9.1%)	29,159 (8.3%)	22,100 (8.2%)
University or higher	10,567 (52%)	189,477 (54%)	143,692 (54%)
Missing	4,880	64,821	49,790
Maternal body mass index
Underweight	484 (2.1%)	10,060 (2.5%)	7,158 (2.3%)
Normalweight	11,423 (49%)	236,995 (59%)	184,760 (60%)
Overweight	6,303 (27%)	104,395 (26%)	80,795 (26%)
Obesity	4,942 (21%)	51,964 (13%)	34,455 (11%)
Missing	2,152	14,651	11,099
Maternal smoking status
Smoker	1,301 (5.8%)	17,016 (4.4%)	10,985 (3.8%)
Non-smoker	21,060 (94%)	365,592 (96%)	279,472 (96%)
Missing	2,943	35,457	27,810
¹ Mean (SD); n (%)

Included vs Excluded

Demographics of included (inclusion criteria: date of conception, ultrasound, exposure data) vs excluded (SPR dataset).

Table. Included and Excluded Characteristics
Characteristics	All SPR births n=620,866
Characteristics	Excluded N = 123,464¹	Included N = 497,402¹
Gestational age at birth (days)	39.4 (2.6)	39.7 (1.6)
Birth weight (g)	3,462.7 (655.3)	3,537.8 (535.6)
Preterm birth	7,964 (7.2%)	22,853 (4.6%)
Fetal sex
Boy	63,539 (52%)	255,665 (51%)
Girl	59,765 (48%)	241,737 (49%)
Unknown	0 (0%)	0 (0%)
Birth type
Elektiv sectio	7,136 (5.8%)	38,354 (7.7%)
Induktion	18,174 (15%)	91,815 (18%)
Spontan	98,154 (80%)	367,233 (74%)
Region of residency
North	7,272 (11%)	46,205 (9.3%)
Central	10,618 (17%)	207,779 (42%)
South	45,453 (72%)	243,418 (49%)
Maternal age at birth (years)
<=24	20,186 (16%)	55,914 (11%)
25-29	40,132 (33%)	155,889 (31%)
30-34	39,267 (32%)	172,148 (35%)
>=35	23,746 (19%)	113,391 (23%)
Maternal education
Primary or lower	10,327 (10%)	35,275 (7.7%)
High school	36,552 (36%)	160,604 (35%)
University or higher	42,808 (42%)	222,978 (49%)
Don´t know	12,222 (12%)	40,327 (8.8%)
Maternal body mass index
Underweight	2,399 (2.5%)	12,670 (2.6%)
Normalweight	52,599 (54%)	276,051 (58%)
Overweight	26,991 (28%)	123,726 (26%)
Obesity	15,371 (16%)	67,167 (14%)
Maternal smoking status	4,964 (6.0%)	21,807 (4.8%)
¹ Mean (SD); n (%)

Purpose

Data flow chart

Raw datasets

Identical cases matrix across datasets

Suggested inclusion criteria

Exclusion criteria

Flow chart

Data transformation

Exposure descriptives

Mean and tertile exposure levels

Exposure in relation to WHO/EU guidelines

Exposure by season and year

Correlation between exposures

Outcomes descriptives

Distribution of birth weight and birth weight ultrasound z-scores.

Outcome table and Missing outcomes

Availability of repeated outcomes

Sociodemographics descriptives

Included vs Excluded

Analysis