Descriptives Fetal Growth Air Pollution

Author

Nikolaus Mezger

Purpose

To explore descriptive statistics on exposure, outcome and sociodemographics including data availability.

A data base compiling fetal ultrasound and birth anthropometric outcomes, corresponding z-scores, sociodemographic information on pregnant and their children, and corresponding exposure data on 4 air pollutants by gestational week is used.

Data flow chart

Raw datasets

Birth dataset English Language: chamnha_healthdata Cases: n=620,866; Individual ids: 620,866; 2014-2019; Variables: ID, date of birth, gestational age, demographics

Birth dataset Swedish language: db Cases: n=639,236; Individual ids: 629,989; 2014-2019; Variables: ID, date of birth, gestational age, demographics

Ultrasound dataset: ultraljud_foster; Individual ids: 569038; 2014-2019; Variables: ID, date of ultrasound (several per case), number of fetuses, ultrasound measures (crl, bpd etc.)

KUB ultrasound dataset: kub_undersokningar; Cases: n=213,250; Individual ids: 205,390; 2014-2019; Variables: ID, date of birth, ultrasound measures

Exposure dataset: Individual ids: n=500,958; exposure data provided by spatiotemporal model

Code
# Review availability of cases in datasets used in this project

#access datasets: Birth dataset English language
setwd("P:/EnvEpi-Projekt/CHAMNHA/6_ultrasound_nikolaus")
load("1_originaldata/chamnha_healthdata.Rda")
#names(ch_ht)
#num_unique_ids <- length(unique(ch_ht$Graviditet))
#num_unique_ids
#access datasets: Birth dataset Swedish language
setwd("P:/EnvEpi-Projekt/CHAMNHA")
load("1_paper_temp_ptb/02_datasets/health_data.Rda")
#names(db)
#num_unique_ids <- length(unique(db$GraviditetID))
#num_unique_ids
#access datasets: Ultrasound dataset
setwd("P:/EnvEpi-Projekt/CHAMNHA/6_ultrasound_nikolaus")
data_us <- as.data.table(read.csv(file = "1_originaldata/ultraljud_foster.csv", sep = ";")) 
#names(data_us)
#num_unique_ids <- length(unique(data_us$graviditetid))
#num_unique_ids 
#access datasets: KUB dataset
setwd("P:/EnvEpi-Projekt/CHAMNHA/6_ultrasound_nikolaus")
load("2_processeddata/conceptiondate_all/conceptiondate_kub.Rda")
#names(data1)
#num_unique_ids <- length(unique(data1$id))
#num_unique_ids 

Identical cases matrix across datasets

Overlap matrix of identical cases between the four datasets Birth dataset English Language, Birth dataset Swedish language, Ultrasound dataset, KUB ultrasound dataset

Code
#Rename id variable, remove other cases, deduplicate cases 
bir_en <- ch_ht %>%
  rename(id = Graviditet) %>%
  select(id) %>%
  distinct(id, .keep_all = TRUE)
bir_sw <- db %>%
  rename(id = GraviditetID) %>%
  select(id) %>%
  distinct(id, .keep_all = TRUE)
ultra <- data_us %>%
  rename(id = graviditetid) %>%
  select(id) %>%
  distinct(id, .keep_all = TRUE)
kub <- data1 %>%
  select(id) %>%
  distinct(id, .keep_all = TRUE)
rm(data1, db, ch_ht, data_us)

#Make a matrix of identical cases in the four datasets

# Count the total number of cases in each dataset
dataset_counts <- sapply(list(bir_en, bir_sw, ultra, kub), nrow)
names(dataset_counts) <- c("bir_en", "bir_sw", "ultra", "kub")

# Calculate the overlap matrix
datasets <- list(bir_en$id, bir_sw$id, ultra$id, kub$id)
overlap_matrix <- matrix(0, nrow = 4, ncol = 4, dimnames = list(names(dataset_counts), names(dataset_counts)))

for (i in 1:4) {
  for (j in 1:4) {
    overlap_matrix[i, j] <- length(intersect(datasets[[i]], datasets[[j]]))
  }
}

# Add dataset counts as row and column labels
rownames(overlap_matrix) <- paste0(names(dataset_counts), " (", dataset_counts, ")")
colnames(overlap_matrix) <- paste0(names(dataset_counts), " (", dataset_counts, ")")

print(overlap_matrix)
                bir_en (620866) bir_sw (629989) ultra (569038) kub (205390)
bir_en (620866)          620866          620866         560508       202163
bir_sw (629989)          620866          629989         569038       205390
ultra (569038)           560508          569038         569038       194180
kub (205390)             202163          205390         194180       205390
Code
rm(list = ls())

Suggested inclusion criteria

  • registered in the Swedish Pregnancy Register 2014-2019: birth_swe/db (n=629989) and birth_eng/ch_ht (n=620866)
  • ??any ultrasound outcome during pregnancy available: ultrasound dataset (n=569038) and kub (n=205390) ??
  • date of conception via: 1) kub exam; 2) ultrasound exam; 3) birth dataset
  • residential address available for matching exposure with individual pregnant women (n=500,XXX)
  • ?birth weight z-score?: birth weight z-score calculated by gestational age-specific birth weight provided by birth_eng/birth_swe datasets

Exclusion criteria

  • ??no outcome data??
  • ??no co-variates??

Flow chart

Data transformation

The dataset is modified for presentation purposes and variables. For example, variables on month, year and season of conception, birth weight z-score tertiles are defined.

To Do: Add data from Swedish birth dataset (sociodemographics need to be translated and merged)

Code
rm(list = ls())
# set wd for manual work (even though already specified in YAML code)
setwd("P:/EnvEpi-Projekt/CHAMNHA/6_ultrasound_nikolaus")
load("2_processeddata/conceptiondate_all/conceptiondate_merged_ustype_gestage_maxultrasoundmeasurement_birthoutcomes_USzscores_EFWzscores_BWzscores_whozscores_sorted_outcavail_socdem_wide_expos.Rda")

#rename database
data <- data_exposure
rm(data_exposure)

# remove data without exposure and without zBW
data <- data %>%
  filter(
    !is.na(pm25_trim_first) & !is.na(pm10_trim_first) & !is.na(no2_trim_first) & !is.na(o3_trim_first)) %>% 
  filter(!is.na(zBW))

#rename variables
data <- data %>%
  mutate(mat_age_cat = dplyr::recode(mat_age_cat, 
                         "<=24" = "≤24", 
                         ">=35" = "≥35"),
    fstart = dplyr::recode(fstart, 
                    "Elektiv sectio" = "Cesarean section", 
                    "Induktion" = "Induction", 
                    "Spontan" = "Spontaneous")
  )

#define season of conception
#month of conception
data <- as.data.table(data)
data[, month_conception := as.numeric(format(conceptiondate, "%m"))]
#year of conception
data[, year_conception := as.numeric(format(conceptiondate, "%y"))]
#year group
data <- data %>%
  mutate(year_group = case_when(
    year_conception %in% c(13, 14) ~ "2013/14",
    year_conception %in% c(15, 16) ~ "2015/16",
    year_conception %in% c(17, 18, 19) ~ "2017/18/19",
    TRUE ~ as.character(year_conception)
  ))
#define season of conception
data <- data %>%
  mutate(season_conception = factor(case_when(
    month_conception %in% c(12, 1, 2) ~ "Winter",
    month_conception %in% c(3, 4, 5) ~ "Spring",
    month_conception %in% c(6, 7, 8) ~ "Summer",
    month_conception %in% c(9, 10, 11) ~ "Fall",
    TRUE ~ NA_character_
  ), levels = c("Winter", "Spring", "Summer", "Fall")))
# Create a new variable birth_weight_tertiles and birth weight
data <- data %>%
  mutate(
    zBW_tertiles = cut(
      zBW,
      breaks = quantile(zBW, probs = c(0, 1/3, 2/3, 1), na.rm = TRUE),
      labels = c("Low tertile zBW", "Mid tertile zBW", "High tertile zBW"),
      include.lowest = TRUE)) %>%
  mutate(
    BW_tertiles = cut(
      birth_weight,
      breaks = quantile(birth_weight, probs = c(0, 1/3, 2/3, 1), na.rm = TRUE),
      labels = c("Low tertile BW", "Mid tertile BW", "High tertile BW"),
      include.lowest = TRUE))
# Create tertiles for each pollutant
data <- data %>%
  mutate(
    pm25_tertiles = cut(
      pm25_trim_first,
      breaks = quantile(pm25_trim_first, probs = c(0, 1/3, 2/3, 1), na.rm = TRUE),
      labels = c("Low tertile", "Mid tertile", "High tertile"),
      include.lowest = TRUE)
  ) %>%
  mutate(
    pm10_tertiles = cut(
      pm10_trim_first,
      breaks = quantile(pm10_trim_first, probs = c(0, 1/3, 2/3, 1), na.rm = TRUE),
      labels = c("Low tertile", "Mid tertile", "High tertile"),
      include.lowest = TRUE)
  ) %>%
  mutate(
    no2_tertiles = cut(
      no2_trim_first,
      breaks = quantile(no2_trim_first, probs = c(0, 1/3, 2/3, 1), na.rm = TRUE),
      labels = c("Low tertile", "Mid tertile", "High tertile"),
      include.lowest = TRUE)
  ) %>%
  mutate(
    o3_tertiles = cut(
      o3_trim_first,
      breaks = quantile(o3_trim_first, probs = c(0, 1/3, 2/3, 1), na.rm = TRUE),
      labels = c("Low tertile", "Mid tertile", "High tertile"),
      include.lowest = TRUE)
  )

# Clean specific variables
data <- data %>%
  mutate(meduc = if_else(meduc == "Don´t know", NA_character_, meduc)) %>% 
 mutate(smoke_preg = factor(smoke_preg,
                            levels = c("Yes", "No"),
                            labels = c("Smoker", "Non-smoker")))
#clean variable zBHC (n=61)
data$zBHC[is.infinite(data$zBHC)] <- NA

Exposure descriptives

Mean and tertile exposure levels

Mean and tertile air pollution levels during first trimester in the study population.

Code
# Mean pollution levels
trialdata <- data |> 
  select(pm25_trim_first, pm10_trim_first, no2_trim_first, o3_trim_first)

mean_exposure_table <-
  tbl_summary(
    trialdata,
    label = list(pm25_trim_first ~" PM2.5 (µg/m3)", pm10_trim_first ~ "PM10 (µg/m3)", no2_trim_first ~ "NO2 (µg/m3)", o3_trim_first ~ "O3 (µg/m3)"),
        statistic = list(all_continuous() ~ "{mean} ({sd})"), 
    digits = all_continuous() ~ 1,
    missing = "no") |> 
      modify_header(label = "**Characteristics**") |> 
      bold_labels() |> 
      modify_caption("Table 2: Ambient air pollution levels during first trimester in the study population")

# Tertile pollution levels
# Summary table for PM10
trialdata <- data |> 
  select(pm25_trim_first, pm10_trim_first, no2_trim_first, o3_trim_first, 
         pm25_tertiles, pm10_tertiles, no2_tertiles, o3_tertiles)

summary_table_pm10 <- tbl_summary(
  trialdata,
  by = pm10_tertiles, 
  include = c("pm10_trim_first"),
  label = list(pm10_trim_first ~ "PM10 (µg/m3)"), 
  statistic = list(
    all_continuous() ~ "{mean} ({sd})"
  ),
  digits = all_continuous() ~ 1,
  missing = "no")
# Summary table for NO2
summary_table_no2 <- tbl_summary(
  trialdata,
  by = no2_tertiles, 
  include = c("no2_trim_first"),
  label = list(no2_trim_first ~ "NO2 (µg/m3)"), 
  statistic = list(
    all_continuous() ~ "{mean} ({sd})"
  ),
  digits = all_continuous() ~ 1,
  missing = "no"
)
# Summary table for O3
summary_table_o3 <- tbl_summary(
  trialdata,
  by = o3_tertiles, 
  include = c("o3_trim_first"),
  label = list(o3_trim_first ~ "O3 (µg/m3)"), 
  statistic = list(
    all_continuous() ~ "{mean} ({sd})"
  ),
  digits = all_continuous() ~ 1,
  missing = "no"
)
# Summary table for PM25
summary_table_pm25 <- tbl_summary(
  trialdata,
  by = pm25_tertiles, 
  include = c("pm25_trim_first"),
  label = list(pm25_trim_first ~ "PM2.5 (µg/m3)"), 
  statistic = list(
    all_continuous() ~ "{mean} ({sd})"
  ),
  digits = all_continuous() ~ 1,
  missing = "no"
)
Table 2: Ambient air pollution levels during first trimester in the study population
Characteristics N = 497,4021
PM2.5 (µg/m3) 7.6 (2.0)
PM10 (µg/m3) 15.2 (4.1)
NO2 (µg/m3) 14.3 (8.0)
O3 (µg/m3) 53.3 (9.5)
1 Mean (SD)
Characteristic Low tertile
N = 165,801
1
Mid tertile
N = 165,800
1
High tertile
N = 165,801
1
PM2.5 (µg/m3) 5.4 (1.0) 7.6 (0.5) 9.7 (1.1)
1 Mean (SD)
Characteristic Low tertile
N = 165,801
1
Mid tertile
N = 165,800
1
High tertile
N = 165,801
1
PM10 (µg/m3) 11.2 (1.8) 14.7 (0.8) 19.8 (2.9)
1 Mean (SD)
Characteristic Low tertile
N = 165,801
1
Mid tertile
N = 165,800
1
High tertile
N = 165,801
1
NO2 (µg/m3) 6.2 (2.7) 13.3 (1.6) 23.3 (5.9)
1 Mean (SD)
Characteristic Low tertile
N = 165,801
1
Mid tertile
N = 165,800
1
High tertile
N = 165,801
1
O3 (µg/m3) 43.3 (3.2) 52.2 (3.1) 64.5 (4.7)
1 Mean (SD)

Exposure in relation to WHO/EU guidelines

WHO/EU guidelines for individual air pollutants and proportion of population exposed to levels exceeding guuidelines during first trimester.

Code
thresholds <- tibble(
  Pollutant = c("PM2.5", "PM10", "NO2", "O3"),
  WHO_Threshold = c(5, 15, 10, 60),
  EU_Threshold = c(10, 20, 20, 100)
)

# Calculate exceedance proportions
exceedance_data <- data %>%
  summarise(
    PM2.5_WHO = mean(pm25_trim_first > thresholds$WHO_Threshold[1], na.rm = TRUE),
    PM10_WHO = mean(pm10_trim_first > thresholds$WHO_Threshold[2], na.rm = TRUE),
    NO2_WHO = mean(no2_trim_first > thresholds$WHO_Threshold[3], na.rm = TRUE),
    O3_WHO = mean(o3_trim_first > thresholds$WHO_Threshold[4], na.rm = TRUE),
    PM2.5_EU = mean(pm25_trim_first > thresholds$EU_Threshold[1], na.rm = TRUE),
    PM10_EU = mean(pm10_trim_first > thresholds$EU_Threshold[2], na.rm = TRUE),
    NO2_EU = mean(no2_trim_first > thresholds$EU_Threshold[3], na.rm = TRUE),
    O3_EU = mean(o3_trim_first > thresholds$EU_Threshold[4], na.rm = TRUE)
  ) %>%
  pivot_longer(
    cols = everything(),
    names_to = c("Pollutant", "Standard"),
    names_sep = "_",
    values_to = "Proportion_Exceeding"
  ) %>%
  mutate(Proportion_Exceeding = round(Proportion_Exceeding * 100, 1)) # Convert to percentage

# Combine WHO and EU data into a single table
final_table <- thresholds %>%
  left_join(
    exceedance_data %>%
      filter(Standard == "WHO") %>%
      select(Pollutant, WHO_Exceedance = Proportion_Exceeding),
    by = "Pollutant"
  ) %>%
  left_join(
    exceedance_data %>%
      filter(Standard == "EU") %>%
      select(Pollutant, EU_Exceedance = Proportion_Exceeding),
    by = "Pollutant"
  )

# Create the gt table
gt_table <- final_table %>%
  gt() %>%
  tab_header(
    title = "International Guidelines and Proportion of Cases Exceeding",
    subtitle = "WHO (2021) and EU (2024) Standards for Air Pollutants (Average in First Trimester)"
  ) %>%
  cols_label(
    Pollutant = "Pollutant",
    WHO_Threshold = "WHO Guideline (µg/m³)",
    WHO_Exceedance = "Proportion Exceeding WHO (%)",
    EU_Threshold = "EU Guideline (µg/m³)",
    EU_Exceedance = "Proportion Exceeding EU (%)"
  )
International Guidelines and Proportion of Cases Exceeding
WHO (2021) and EU (2024) Standards for Air Pollutants (Average in First Trimester)
Pollutant WHO Guideline (µg/m³) EU Guideline (µg/m³) Proportion Exceeding WHO (%) Proportion Exceeding EU (%)
PM2.5 5 10 88.9 10.1
PM10 15 20 45.7 13.4
NO2 10 20 69.7 20.2
O3 60 100 27.2 0.0

Exposure by season and year

Distribution of air pollutants by season and by year.

Code
#boxplots: air quality by season of conception

#select data: air pollutants and season
data1 <- data %>%
  select(pm25_wk_1, pm10_wk_1, o3_wk_1, no2_wk_1, 
         season_conception)

#reshape data into long format
data1_long <- data1 %>%
  gather(variable, value, pm25_wk_1:no2_wk_1)

#replace values of air pollutants
data1_long <- data1_long %>%
  mutate(variable = case_when(
    variable == "pm25_wk_1" ~ "PM 2.5",
    variable == "pm10_wk_1" ~ "PM 10",
    variable == "no2_wk_1" ~ "NO2",
    variable == "o3_wk_1" ~ "O3",
    TRUE ~ variable  # Keep the original value if it doesn't match any condition
  ))  %>%
  rename(Pollutant = variable)

#plot combined boxplots: season
airpoll_season <- ggplot(data1_long, aes(x = season_conception, y = value, fill = Pollutant)) +
  geom_boxplot() +
  labs(title = "Air Quality by season of conception",
       x = "Season of conception",
       y = "Concentration of air pollution (µm/m3)",
       fill = "Pollutant") +
  scale_fill_manual(values = c("blue", "red", "green", "purple"))  +
  theme_classic() +
  theme(axis.text.x = element_text(size = 16),  axis.text.y = element_text(size = 16),  
        legend.text = element_text(size = 16),  legend.title = element_text(size = 16), 
        plot.title = element_text(size = 20)) 

#boxplots: air quality by year

#select data: air pollutants and season
data2 <- data %>%
  select(pm25_wk_1, pm10_wk_1, o3_wk_1, no2_wk_1, 
         year_group)

#reshape data into long format
data2_long <- data2 %>%
  gather(variable, value, pm25_wk_1:no2_wk_1)

#replace values of air pollutants
data2_long <- data2_long %>%
  mutate(variable = case_when(
    variable == "pm25_wk_1" ~ "PM 2.5",
    variable == "pm10_wk_1" ~ "PM 10",
    variable == "no2_wk_1" ~ "NO2",
    variable == "o3_wk_1" ~ "O3",
    TRUE ~ variable  # Keep the original value if it doesn't match any condition
  ))  %>%
  rename(Pollutant = variable)

#plot combined boxplots: season
airpoll_year <- ggplot(data2_long, aes(x = year_group, y = value, fill = Pollutant)) +
  geom_boxplot() +
  labs(title = "Air Quality by year of conception",
       x = "Year of conception",
       y = "Concentration of air pollution (µm/m3)",
       fill = "Pollutant") +
  scale_fill_manual(values = c("blue", "red", "green", "purple"))  +
  theme_classic() +
  theme(axis.text.x = element_text(size = 16),  axis.text.y = element_text(size = 16),  
        legend.text = element_text(size = 16),  legend.title = element_text(size = 16), 
        plot.title = element_text(size = 20)) 

Correlation between exposures

Correlation between exposure to multiple air pollutants during first trimester (gestational week 0-13) in the study population. Note: Blue: coefficient >0; red: coefficient <0.

Code
#load libraries
pacman::p_load(AMR, 
               corrplot, 
               gridGraphics)
 
#set data to data.table
setDT(data)

# Correlation between first trimester exposures
data1 <- data %>%
  filter(
    !is.na(pm25_trim_first) & !is.na(pm10_trim_first) & !is.na(no2_trim_first) & !is.na(o3_trim_first)
  )

exp_matrix <- data1 %>%
  select(
    pm25_trim_first,
    pm10_trim_first,
    no2_trim_first,
    o3_trim_first
  )

colnames(exp_matrix) <- c("Trim1 PM2.5", "Trim1 PM10", "Trim1 NO2", "Trim1 O3")

#apply function cor to  matrix "exp_matrix
polday <- cor(as.matrix(exp_matrix))

corrplot(method = "ellipse", 
           polday, type = "lower", 
           order = "original", 
           title = element_blank(), 
           addCoef.col = 'black', tl.col = "black", 
           cl.pos = 'n', col = COL2('RdBu'),
           tl.srt = 0, 
           diag = FALSE, 
           mar = c(0, 0, 1, 0),
           tl.offset = 0.71, 
           col.lim = c(-1, 1),
           number.cex = 1.5, 
           tl.cex = 1.2,
           cl.cex = 1.3
)

Correlation between exposure to multiple air pollutants during first trimester (gestational week 0-13). and for individual weeks in first, second and third trimester in the study population. Note: Blue: coefficient >0; red: coefficient <0.

Code
#Correlation between weekly exposures Week 1, Week 13, Week 28

# Exclude cases with all exposures missing
# Exclude cases with all exposures missing
data1 <- data %>%
  filter(
    !is.na(pm25_wk_1) & !is.na(pm10_wk_1) & !is.na(no2_wk_1) & !is.na(o3_wk_1) & 
    !is.na(pm25_wk_13) & !is.na(pm10_wk_13) & !is.na(no2_wk_13) & !is.na(o3_wk_13) &
    !is.na(pm25_wk_28) & !is.na(pm10_wk_28) & !is.na(no2_wk_28) & !is.na(o3_wk_28)
  )

# Create the exposure matrix
exp_matrix <- data1 %>%
  select(
    pm25_wk_1, pm25_wk_13, pm25_wk_28, 
    pm10_wk_1, pm10_wk_13, pm10_wk_28, 
    no2_wk_1, no2_wk_13, no2_wk_28,
    o3_wk_1, o3_wk_13, o3_wk_28
  )

# Set the names for the figure
colnames(exp_matrix) <- c(
  "W1 PM2.5", "W13 PM2.5", "W28 PM2.5",
  "W1 PM10", "W13 PM10", "W28 PM10",
  "W1 NO2", "W13 NO2", "W28 NO2",
  "W1 O3", "W13 O3", "W28 O3"
)

# Apply the cor function to the exposure matrix
polday <- cor(as.matrix(exp_matrix))

# Save the correlation plot
corrplot_wk_W1W13W28 <- corrplot(
  method = "ellipse", 
  polday, 
  type = "lower", 
  order = "original", 
  title = element_blank(), 
  addCoef.col = 'black', 
  tl.col = "black", 
  cl.pos = 'n', 
  col = COL2('RdBu'),
  tl.srt = 90, 
  diag = FALSE, 
  number.cex = 0.6, 
  mar = c(0, 0, 1, 0), 
  tl.offset = 0.5, 
  tl.cex = 0.8,
  col.lim = c(-1, 1),
  cl.cex = 0.7
)

Outcomes descriptives

Distribution of birth weight and birth weight ultrasound z-scores.

Code
#Box plot of zBW by tertiles

#plot birth_weight_zscore_tertiles by actual birth weight
boxplot_BWtertiles <- ggplot(data %>% filter(!is.na(birth_weight)), aes(x = BW_tertiles, y = birth_weight)) +
  geom_boxplot() + 
  labs(x = "Birth Weight Tertiles", y = "Birth Weight (kg)") +
  ggtitle("Boxplot of birth weight by Birth Weight Tertiles")

#plot birth_weight_zscore_tertiles by actual birth weight
boxplot_zBWtertiles <- ggplot(data %>% filter(!is.na(zBW)), aes(x = zBW_tertiles, y = zBW)) +
  geom_boxplot() + 
  labs(x = "Z-score Birth Weight Tertiles", y = "z-score Birth Weight") +
  ggtitle("Boxplot of z-score birth weight by z-score Birth Weight Tertiles")

#Make ggplots for z-score distribution

# Select relevant columns
selected_zscore_columns <- data %>%
  select(
    zAC_W16W23, zBPD_W16W23, zFL_W16W23,
    zAC_W24W31, zBPD_W24W31, zFL_W24W31,
    zAC_W32W40, zBPD_W32W40, zFL_W32W40
  )

# Reshape data to long format
long_zscore_data <- selected_zscore_columns %>%
  pivot_longer(
    cols = everything(),
    names_to = c("Variable", "TimeWindow"),
    names_pattern = "z(.*)_(W\\d{2}W\\d{2})",
    values_to = "zScore") %>%
  filter(zScore > -10 & zScore < 10)

zscore_color_mapping <- c("AC" = "green", "BPD" = "red", "FL" = "blue")

# Create a combined plot for z-scores
zscore_combined_plot <- long_zscore_data %>%
  ggplot(aes(x = zScore, fill = Variable, color = Variable)) +
  geom_histogram(
    aes(y = ..density..),
    position = "identity",
    alpha = 0.05,
    bins = 30
  ) +
  geom_density(alpha = 0.3, size = 1) +
  scale_fill_manual(values = zscore_color_mapping) +
  scale_color_manual(values = zscore_color_mapping) +
  facet_wrap(~ TimeWindow, scales = "free", ncol = 1) +
  labs(
    title = "Z-Scores for ultrasound measures in three gestational perods",
    x = "Z-Score",
    y = "Density"
  ) +
  theme_minimal() +
  theme(
    legend.title = element_blank(),
    strip.text = element_text(face = "bold")
  )

#EFW z-scores

# Select relevant columns for EFW z-scores
selected_efw_zscore_columns <- data %>%
  select(
    zEFW_W16W23, 
    zEFW_W24W31, 
    zEFW_W32W40
  )

# Reshape data to long format for EFW z-scores
long_efw_zscore_data <- selected_efw_zscore_columns %>%
  pivot_longer(
    cols = everything(),
    names_to = "TimeWindow",
    names_pattern = "zEFW_(W\\d{2}W\\d{2})",
    values_to = "zScore"
  ) %>%
  filter(zScore > -10 & zScore < 10)

# Define color mapping for EFW (if needed, can be single color)
efw_color_mapping <- c("EFW" = "purple")

# Create the combined plot for EFW z-scores
efw_zscore_combined_plot <- long_efw_zscore_data %>%
  ggplot(aes(x = zScore, fill = "EFW", color = "EFW")) +
  geom_histogram(
    aes(y = ..density..),
    position = "identity",
    alpha = 0.05,
    bins = 30
  ) +
  geom_density(alpha = 0.3, size = 1) +
  scale_fill_manual(values = efw_color_mapping) +
  scale_color_manual(values = efw_color_mapping) +
  facet_wrap(~ TimeWindow, scales = "free", ncol = 1) +
  labs(
    title = "Z-Scores for EFW in three gestational periods",
    x = "Z-Score",
    y = "Density"
  ) +
  theme_minimal() +
  theme(
    legend.position = "none",  # Hide legend since only one variable
    strip.text = element_text(face = "bold")
  )

Warning: The dot-dot notation (`..density..`) was deprecated in ggplot2 3.4.0.
ℹ Please use `after_stat(density)` instead.

Outcome table and Missing outcomes

Distribution of birth and ultrasound raw outcomes, z-scores, and distribution of missing outcomes.

Code
#make trial dataset

trialdata <- data |> 
  select(zBW_tertiles,  zBW,
         birth_weight, birth_length, zBL, birth_headcircumference, zBHC,
         g_age_d_BPD_W16W23 , BPD_W16W23 , zBPD_W16W23 , 
                        g_age_d_BPD_W24W31 , BPD_W24W31 , zBPD_W24W31 , 
                        g_age_d_BPD_W32W40 , BPD_W32W40 , zBPD_W32W40 , 
                        g_age_d_AC_W16W23 , AC_W16W23, zAC_W16W23 , 
                        g_age_d_AC_W24W31 , AC_W24W31 , zAC_W24W31 , 
                        g_age_d_AC_W32W40 , AC_W32W40 , zAC_W32W40 , 
                        g_age_d_FL_W16W23 , FL_W16W23 , zFL_W16W23 , 
                        g_age_d_FL_W24W31 , FL_W24W31 , zFL_W24W31 , 
                        g_age_d_FL_W32W40 , FL_W32W40 , zFL_W32W40 , 
                        g_age_d_EFW_W16W23 , EFW_W16W23 , zEFW_W16W23 , 
                        g_age_d_EFW_W24W31 , EFW_W24W31 , zEFW_W24W31 , 
                        g_age_d_EFW_W32W40 , EFW_W32W40 , zEFW_W32W40)

#make trial crosstable by zBW
outcome_table <-
  tbl_summary(
    trialdata,
    include = c("birth_weight", "birth_length", "birth_headcircumference", 
                "zBL", "zBHC", 
      "BPD_W16W23", "BPD_W24W31", "BPD_W32W40",
"zBPD_W16W23", "zBPD_W24W31", "zBPD_W32W40", 
"AC_W16W23", "AC_W24W31", "AC_W32W40",
"zAC_W16W23", "zAC_W24W31", "zAC_W32W40", 
"FL_W16W23", "FL_W24W31", "FL_W32W40",
"zFL_W16W23", "zFL_W24W31", "zFL_W32W40", 
"EFW_W16W23", "EFW_W24W31", "EFW_W32W40",
"zEFW_W16W23", "zEFW_W24W31", "zEFW_W32W40"),
        by = zBW_tertiles, 
    label = list(birth_weight ~"Birth weight", birth_length ~"Birth length", birth_headcircumference ~"Birth head circumference", 
      BPD_W16W23 ~ "BPD W16-W23", zBPD_W16W23 ~ "zBPD W16-W23", BPD_W24W31 ~ "BPD W24-W31", zBPD_W24W31 ~ "zBPD W24-W31", BPD_W32W40 ~ "BPD W32-W40", zBPD_W32W40 ~ "zBPD W32-W40", AC_W16W23 ~ "AC W16-W23", zAC_W16W23 ~ "zAC W16-W23", AC_W24W31 ~ "AC W24-W31", zAC_W24W31 ~ "zAC W24-W31", AC_W32W40 ~ "AC W32-W40", zAC_W32W40 ~ "zAC W32-W40", FL_W16W23 ~ "FL W16-W23", zFL_W16W23 ~ "zFL W16-W23", FL_W24W31 ~ "FL W24-W31", zFL_W24W31 ~ "zFL W24-W31", FL_W32W40 ~ "FL W32-W40", zFL_W32W40 ~ "zFL W32-W40", EFW_W16W23 ~ "EFW W16-W23", zEFW_W16W23 ~ "zEFW W16-W23", EFW_W24W31 ~ "EFW W24-W31", zEFW_W24W31 ~ "zEFW W24-W31", EFW_W32W40 ~ "EFW W32-W40", zEFW_W32W40 ~ "zEFW W32-W40"),
        statistic = list(
      all_continuous() ~ "{mean} ({sd})",
      all_categorical() ~ "{n} ({p}%)"), 
    digits = all_continuous() ~ 1, 
      missing = "no") |> 
  add_overall() |> 
modify_header(label = "**Characteristics**") |> 
  bold_labels() |> 
modify_caption("Table 1. Anthropometry characteristics. during pregnancy, stratified by tertiles of birth weight z-score")  |> 
 modify_spanning_header(c("stat_1", "stat_2", "stat_3") ~ "**zBW tertiles**")

#make missing outcome table
missing_outcome_table <- tbl_summary(
  trialdata, 
  by = zBW_tertiles, 
  include = c("birth_weight", "birth_length", "birth_headcircumference", 
              "BPD_W16W23", "AC_W16W23", "FL_W16W23", "EFW_W16W23",
              "BPD_W24W31", "AC_W24W31", "FL_W24W31", "EFW_W24W31", 
              "BPD_W32W40", "AC_W32W40", "FL_W32W40", "EFW_W32W40"), 
  label = list(BPD_W16W23 ~ "BPD W16-W23", BPD_W24W31 ~ "BPD W24-W31", 
               BPD_W32W40 ~ "BPD W32-W40", AC_W16W23 ~ "AC W16-W23", 
               AC_W24W31 ~ "AC W24-W31", AC_W32W40 ~ "AC W32-W40", 
               FL_W16W23 ~ "FL W16-W23",  FL_W24W31 ~ "FL W24-W31", 
               FL_W32W40 ~ "FL W32-W40", EFW_W16W23 ~ "EFW W16-W23", 
               EFW_W24W31 ~ "EFW W24-W31", EFW_W32W40 ~ "EFW W32-W40", 
               birth_weight ~"Birth weight", birth_length ~"Birth length", birth_headcircumference ~"Birth head circumference"),
   statistic = list(all_continuous() ~ "{mean}", missing = "no"), missing_stat = "({p_miss}%)") %>%
  modify_header(label = "**Characteristics**") %>%
   add_n(statistic = "{N_miss} ({p_miss}%)")  %>%
  modify_header(n = "**Missing**") %>%
  bold_levels() %>%
  modify_caption("Table 2. Missing proportions of outcome characteristics")
Table 1. Anthropometry characteristics. during pregnancy, stratified by tertiles of birth weight z-score
Characteristics Overall
N = 497,402
1
zBW tertiles
Low tertile zBW
N = 165,831
1
Mid tertile zBW
N = 165,774
1
High tertile zBW
N = 165,797
1
Birth weight 3.5 (0.5) 3.1 (0.4) 3.5 (0.3) 4.0 (0.4)
Birth length 50.0 (4.5) 48.4 (4.7) 50.0 (4.2) 51.5 (4.1)
Birth head circumference 34.7 (3.8) 33.8 (3.9) 34.7 (3.6) 35.6 (3.7)
zBL 0.6 (1.1) -0.3 (0.9) 0.6 (0.8) 1.4 (0.8)
zBHC 0.9 (1.1) 0.2 (1.1) 0.9 (1.0) 1.6 (1.0)
BPD W16-W23 44.0 (3.7) 43.9 (3.6) 44.0 (3.5) 44.1 (4.0)
BPD W24-W31 76.4 (6.4) 75.9 (5.9) 76.2 (6.5) 77.3 (6.8)
BPD W32-W40 89.5 (6.6) 88.4 (5.8) 89.1 (7.3) 91.2 (6.6)
zBPD W16-W23 -0.5 (1.0) -0.6 (0.9) -0.5 (0.9) -0.5 (1.2)
zBPD W24-W31 -0.4 (0.9) -0.7 (0.9) -0.3 (0.9) 0.0 (0.9)
zBPD W32-W40 -0.3 (1.4) -0.7 (1.2) -0.3 (1.6) 0.2 (1.3)
AC W16-W23 139.1 (16.1) 137.7 (16.8) 139.1 (16.2) 140.6 (15.0)
AC W24-W31 259.1 (26.1) 253.4 (22.2) 258.1 (25.6) 266.7 (28.6)
AC W32-W40 324.9 (30.5) 313.2 (23.3) 322.2 (27.1) 341.5 (33.6)
zAC W16-W23 0.4 (1.7) 0.1 (1.9) 0.4 (1.8) 0.6 (1.5)
zAC W24-W31 0.8 (1.0) 0.2 (0.9) 0.8 (0.8) 1.5 (1.0)
zAC W32-W40 0.7 (1.1) -0.1 (0.8) 0.7 (0.7) 1.5 (1.0)
FL W16-W23 29.0 (3.8) 28.8 (3.9) 29.0 (4.0) 29.2 (3.5)
FL W24-W31 56.8 (5.8) 56.4 (5.1) 56.7 (6.6) 57.5 (5.7)
FL W32-W40 69.0 (5.7) 68.2 (4.8) 68.5 (5.1) 70.3 (6.8)
zFL W16-W23 0.3 (1.5) 0.2 (1.6) 0.4 (1.7) 0.5 (1.3)
zFL W24-W31 1.1 (1.4) 0.7 (1.1) 1.1 (1.9) 1.4 (1.1)
zFL W32-W40 1.1 (1.3) 0.6 (1.0) 1.1 (1.0) 1.5 (1.7)
EFW W16-W23 232.9 (123.2) 228.1 (138.0) 232.5 (121.5) 238.2 (108.3)
EFW W24-W31 1,507.6 (393.2) 1,429.6 (327.1) 1,492.8 (385.8) 1,613.2 (444.3)
EFW W32-W40 2,756.7 (639.8) 2,537.7 (465.9) 2,686.2 (581.8) 3,089.1 (732.1)
zEFW W16-W23 -1.9 (17.2) -2.2 (28.1) -1.8 (6.0) -1.6 (7.8)
zEFW W24-W31 0.7 (1.0) 0.1 (0.9) 0.7 (0.8) 1.4 (0.9)
zEFW W32-W40 0.5 (1.0) -0.3 (0.8) 0.5 (0.7) 1.4 (0.8)
1 Mean (SD)
Table 2. Missing proportions of outcome characteristics
Characteristics Missing Low tertile zBW
N = 165,831
1
Mid tertile zBW
N = 165,774
1
High tertile zBW
N = 165,797
1
Birth weight 0 (0%) 3.05 3.51 4.05
Birth length 0 (0%) 48.45 50.00 51.54
Birth head circumference 0 (0%) 33.76 34.70 35.61
BPD W16-W23 25,535 (5.1%) 43.89 43.96 44.12
    Unknown
(5.4%) (4.9%) (5.1%)
AC W16-W23 64,546 (13%) 138 139 141
    Unknown
(13%) (13%) (13%)
FL W16-W23 28,646 (5.8%) 28.81 28.99 29.18
    Unknown
(6.0%) (5.5%) (5.8%)
EFW W16-W23 66,081 (13%) 228 232 238
    Unknown
(14%) (13%) (13%)
BPD W24-W31 418,159 (84%) 76 76 77
    Unknown
(82%) (85%) (85%)
AC W24-W31 418,389 (84%) 253 258 267
    Unknown
(82%) (85%) (85%)
FL W24-W31 422,171 (85%) 56.4 56.7 57.5
    Unknown
(83%) (86%) (86%)
EFW W24-W31 422,323 (85%) 1,430 1,493 1,613
    Unknown
(83%) (86%) (86%)
BPD W32-W40 322,849 (65%) 88.4 89.1 91.2
    Unknown
(60%) (69%) (66%)
AC W32-W40 318,599 (64%) 313 322 341
    Unknown
(59%) (68%) (65%)
FL W32-W40 331,598 (67%) 68.2 68.5 70.3
    Unknown
(61%) (70%) (68%)
EFW W32-W40 331,933 (67%) 2,538 2,686 3,089
    Unknown
(62%) (70%) (68%)
1 Mean

Availability of repeated outcomes

To show cases with repeated measurements for gestational windows

Code
#Variables indicating overall data availability by gestational window
data <- data %>%
  mutate(
    # For weeks 16-23
    avail_W16W23 = if_else(
      rowSums(!is.na(select(., starts_with("BPD_W16W23"), 
                            starts_with("AC_W16W23"), 
                            starts_with("FL_W16W23"), 
                            starts_with("EFW_W16W23")))) > 0, 
      "Yes", "Missing"),
    # For weeks 24-31
    avail_W24W31 = if_else(
      rowSums(!is.na(select(., starts_with("BPD_W24W31"), 
                            starts_with("AC_W24W31"), 
                            starts_with("FL_W24W31"), 
                            starts_with("EFW_W24W31")))) > 0, 
      "Yes", "Missing"),
    # For weeks 32-40
    avail_W32W40 = if_else(
      rowSums(!is.na(select(., starts_with("BPD_W32W40"), 
                            starts_with("AC_W32W40"), 
                            starts_with("FL_W32W40"), 
                            starts_with("EFW_W32W40")))) > 0, 
      "Yes", "Missing"),
    # For birth variables
    avail_birth = if_else(
      rowSums(!is.na(select(., birth_weight, birth_length, birth_headcircumference))) > 0, 
      "Yes", "Missing")
  )

proportions <- data %>%
  mutate(
    time_points_available = rowSums(select(., avail_W16W23, avail_W24W31, avail_W32W40) == "Yes"),
    availability_group = case_when(
      time_points_available == 3 ~ "All 3 gestational windows",
      time_points_available == 2 ~ "2 of 3 gestational windows",
      time_points_available == 1 ~ "1 of 3 gestational windows",
      time_points_available == 0 ~ "0 time points"
    )
  ) %>%
  group_by(availability_group) %>%
  summarise(
    count = n(),
    proportion = n() / nrow(data)
  ) %>%
  ungroup()

# Filter out the "0 time points" category
proportions <- proportions %>%
  filter(availability_group != "0 time points") %>%
  mutate(label = as.character(count)) # Only include the number of cases in the label

# Create the pie chart
ggplot(proportions, aes(x = "", y = proportion, fill = availability_group)) +
  geom_bar(stat = "identity", width = 1) +
  coord_polar("y", start = 0) +
  theme_void() +
  labs(
    title = "Availability of Repeated Observations Across Gestational Windows (n)",
    fill = "G"
  ) +
  geom_text(aes(label = label), position = position_stack(vjust = 0.5), size = 5) + # Add case count labels
  theme(
    legend.title = element_text(size = 14),  # Increase legend title font size
    legend.text = element_text(size = 12)    # Increase legend text font size
  ) +
  scale_fill_brewer(palette = "Set2")

Sociodemographics descriptives

Demographics by zBW tertiles and demographics for cases with missing outcomes by trimester.

Code
#Demographics by tertiles 

#make trial dataset
trialdata <- data |> 
  select(zBW_tertiles, gest_age_birth, birth_weight, birth_length, birth_headcircumference, zBHC,
sex, fstart, season, geogr, mat_age_cat, meduc, bmi_cat, smoke_preg, alcohol_cat, hypertension, dm2)

#make crosstable by zBW
baseline_table <-
  tbl_summary(
    trialdata,
    include = c("gest_age_birth", "birth_weight", #anthropometrics
                "sex", "fstart", "season",  #fetal characteristics
                "geogr", "mat_age_cat", "meduc", "bmi_cat", "smoke_preg"), #maternal characteristics
        by = zBW_tertiles, 
    label = list(gest_age_birth ~ "Gestational age at birth (days)", birth_weight ~ "Birth weight (kg)",
    sex ~ "Fetal sex", fstart ~ "Birth type", season ~ "Season of birth", 
    geogr ~ "Region of residency", mat_age_cat ~ "Maternal age at birth (years)", meduc ~ "Maternal education", bmi_cat ~ "Maternal body mass index", smoke_preg ~ "Maternal smoking status"),
        statistic = list(
      all_continuous() ~ "{mean} ({sd})",
      all_categorical() ~ "{n} ({p}%)"), 
    digits = all_continuous() ~ 1, 
    missing_text = "Missing") |> 
  add_overall() |> 
modify_header(label = "**Characteristics**") |> 
  bold_labels() |> 
modify_caption("Table 1. Baseline characteristics. Birth anthropometrics, fetal and maternal characteristics stratified by tertiles of birth weight z-score")  |> 
 modify_spanning_header(c("stat_1", "stat_2", "stat_3") ~ "**zBW tertiles**")
Table 1. Baseline characteristics. Birth anthropometrics, fetal and maternal characteristics stratified by tertiles of birth weight z-score
Characteristics Overall
N = 497,402
1
zBW tertiles
Low tertile zBW
N = 165,831
1
Mid tertile zBW
N = 165,774
1
High tertile zBW
N = 165,797
1
Gestational age at birth (days) 278.6 (11.5) 277.6 (12.1) 278.3 (11.8) 279.8 (10.4)
Birth weight (kg) 3.5 (0.5) 3.1 (0.4) 3.5 (0.3) 4.0 (0.4)
Fetal sex



    Female 241,737 (49%) 79,380 (48%) 81,181 (49%) 81,176 (49%)
    Male 255,665 (51%) 86,451 (52%) 84,593 (51%) 84,621 (51%)
Birth type



    Cesarean section 38,354 (7.7%) 9,848 (5.9%) 12,569 (7.6%) 15,937 (9.6%)
    Induction 91,815 (18%) 32,069 (19%) 27,120 (16%) 32,626 (20%)
    Spontaneous 367,233 (74%) 123,914 (75%) 126,085 (76%) 117,234 (71%)
Season of birth



    Winter 110,574 (22%) 37,142 (22%) 37,075 (22%) 36,357 (22%)
    Spring 128,603 (26%) 42,875 (26%) 43,024 (26%) 42,704 (26%)
    Summer 135,368 (27%) 44,671 (27%) 45,010 (27%) 45,687 (28%)
    Fall 122,857 (25%) 41,143 (25%) 40,665 (25%) 41,049 (25%)
Region of residency



    North 46,205 (9.3%) 14,629 (8.8%) 15,436 (9.3%) 16,140 (9.7%)
    Central 207,779 (42%) 70,317 (42%) 69,586 (42%) 67,876 (41%)
    South 243,418 (49%) 80,885 (49%) 80,752 (49%) 81,781 (49%)
Maternal age at birth (years)



    ≤24 55,914 (11%) 21,679 (13%) 18,646 (11%) 15,589 (9.4%)
    25-29 155,889 (31%) 52,489 (32%) 52,743 (32%) 50,657 (31%)
    30-34 172,148 (35%) 55,880 (34%) 57,248 (35%) 59,020 (36%)
    ≥35 113,391 (23%) 35,759 (22%) 37,115 (22%) 40,517 (24%)
    Missing 60 24 22 14
Maternal education



    High school 160,604 (38%) 51,678 (38%) 52,750 (38%) 56,176 (40%)
    Primary or lower 35,275 (8.4%) 14,146 (10%) 11,376 (8.1%) 9,753 (6.9%)
    University or higher 222,978 (53%) 71,859 (52%) 75,662 (54%) 75,457 (53%)
    Missing 78,545 28,148 25,986 24,411
Maternal body mass index



    Underweight 12,670 (2.6%) 6,712 (4.2%) 3,963 (2.5%) 1,995 (1.2%)
    Normalweight 276,051 (58%) 99,690 (62%) 95,339 (60%) 81,022 (51%)
    Overweight 123,726 (26%) 36,060 (23%) 40,442 (25%) 47,224 (30%)
    Obesity 67,167 (14%) 17,292 (11%) 20,186 (13%) 29,689 (19%)
    Missing 17,788 6,077 5,844 5,867
Maternal smoking status



    Smoker 21,807 (4.8%) 10,047 (6.6%) 6,593 (4.3%) 5,167 (3.4%)
    Non-smoker 433,977 (95%) 142,026 (93%) 145,375 (96%) 146,576 (97%)
    Missing 41,618 13,758 13,806 14,054
1 Mean (SD); n (%)
Table: Characteristics for cases with available ultrasound parameters in respective gestational windows
Characteristics
Available data W16W23
Available data W24W31
Available data W32W40
N = 472,0981 N = 79,3371 N = 179,1351
Gestational age at birth (days) 278.7 (11.4) 274.5 (14.4) 277.9 (10.9)
Birth weight (kg) 3.5 (0.5) 3.4 (0.6) 3.5 (0.6)
Fetal sex


    Female 229,515 (49%) 39,301 (50%) 87,815 (49%)
    Male 242,583 (51%) 40,036 (50%) 91,320 (51%)
Birth type


    Cesarean section 35,863 (7.6%) 10,134 (13%) 18,167 (10%)
    Induction 85,997 (18%) 20,413 (26%) 46,314 (26%)
    Spontaneous 350,238 (74%) 48,790 (61%) 114,654 (64%)
Season of birth


    Winter 104,575 (22%) 18,286 (23%) 40,781 (23%)
    Spring 122,428 (26%) 20,031 (25%) 45,705 (26%)
    Summer 128,676 (27%) 21,725 (27%) 47,572 (27%)
    Fall 116,419 (25%) 19,295 (24%) 45,077 (25%)
Region of residency


    North 43,318 (9.2%) 7,389 (9.3%) 14,222 (7.9%)
    Central 200,986 (43%) 37,950 (48%) 81,978 (46%)
    South 227,794 (48%) 33,998 (43%) 82,935 (46%)
Maternal age at birth (years)


    ≤24 52,946 (11%) 8,448 (11%) 18,579 (10%)
    25-29 148,574 (31%) 22,436 (28%) 51,991 (29%)
    30-34 164,063 (35%) 26,241 (33%) 61,153 (34%)
    ≥35 106,461 (23%) 22,201 (28%) 47,394 (26%)
    Missing 54 11 18
Maternal education


    High school 152,610 (38%) 25,996 (40%) 57,919 (39%)
    Primary or lower 33,412 (8.4%) 6,116 (9.3%) 13,175 (8.8%)
    University or higher 212,411 (53%) 33,501 (51%) 79,286 (53%)
    Missing 73,665 13,724 28,755
Maternal body mass index


    Underweight 12,186 (2.7%) 2,610 (3.4%) 5,512 (3.2%)
    Normalweight 264,628 (58%) 39,056 (51%) 91,291 (53%)
    Overweight 117,423 (26%) 19,331 (25%) 42,931 (25%)
    Obesity 62,225 (14%) 15,203 (20%) 32,712 (19%)
    Missing 15,636 3,137 6,689
Maternal smoking status


    Smoker 20,506 (4.7%) 4,791 (6.5%) 10,822 (6.5%)
    Non-smoker 412,917 (95%) 68,385 (93%) 154,505 (93%)
    Missing 38,675 6,161 13,808
1 Mean (SD); n (%)
Table: Characteristics for cases with missing ultrasound parameters in respective gestational windows
Characteristics
Missing data W16W23
Missing data W24W31
Missing data W32W40
N = 25,3041 N = 418,0651 N = 318,2671
Gestational age at birth (days) 276.8 (13.6) 279.3 (10.7) 278.9 (11.8)
Birth weight (kg) 3.5 (0.6) 3.6 (0.5) 3.6 (0.5)
Fetal sex


    Female 12,222 (48%) 202,436 (48%) 153,922 (48%)
    Male 13,082 (52%) 215,629 (52%) 164,345 (52%)
Birth type


    Cesarean section 2,491 (9.8%) 28,220 (6.8%) 20,187 (6.3%)
    Induction 5,818 (23%) 71,402 (17%) 45,501 (14%)
    Spontaneous 16,995 (67%) 318,443 (76%) 252,579 (79%)
Season of birth


    Winter 5,999 (24%) 92,288 (22%) 69,793 (22%)
    Spring 6,175 (24%) 108,572 (26%) 82,898 (26%)
    Summer 6,692 (26%) 113,643 (27%) 87,796 (28%)
    Fall 6,438 (25%) 103,562 (25%) 77,780 (24%)
Region of residency


    North 2,887 (11%) 38,816 (9.3%) 31,983 (10%)
    Central 6,793 (27%) 169,829 (41%) 125,801 (40%)
    South 15,624 (62%) 209,420 (50%) 160,483 (50%)
Maternal age at birth (years)


    ≤24 2,968 (12%) 47,466 (11%) 37,335 (12%)
    25-29 7,315 (29%) 133,453 (32%) 103,898 (33%)
    30-34 8,085 (32%) 145,907 (35%) 110,995 (35%)
    ≥35 6,930 (27%) 91,190 (22%) 65,997 (21%)
    Missing 6 49 42
Maternal education


    High school 7,994 (39%) 134,608 (38%) 102,685 (38%)
    Primary or lower 1,863 (9.1%) 29,159 (8.3%) 22,100 (8.2%)
    University or higher 10,567 (52%) 189,477 (54%) 143,692 (54%)
    Missing 4,880 64,821 49,790
Maternal body mass index


    Underweight 484 (2.1%) 10,060 (2.5%) 7,158 (2.3%)
    Normalweight 11,423 (49%) 236,995 (59%) 184,760 (60%)
    Overweight 6,303 (27%) 104,395 (26%) 80,795 (26%)
    Obesity 4,942 (21%) 51,964 (13%) 34,455 (11%)
    Missing 2,152 14,651 11,099
Maternal smoking status


    Smoker 1,301 (5.8%) 17,016 (4.4%) 10,985 (3.8%)
    Non-smoker 21,060 (94%) 365,592 (96%) 279,472 (96%)
    Missing 2,943 35,457 27,810
1 Mean (SD); n (%)

Included vs Excluded

Demographics of included (inclusion criteria: date of conception, ultrasound, exposure data) vs excluded (SPR dataset).

Table. Included and Excluded Characteristics
Characteristics
All SPR births n=620,866
Excluded
N = 123,464
1
Included
N = 497,402
1
Gestational age at birth (days) 39.4 (2.6) 39.7 (1.6)
Birth weight (g) 3,462.7 (655.3) 3,537.8 (535.6)
Preterm birth 7,964 (7.2%) 22,853 (4.6%)
Fetal sex

    Boy 63,539 (52%) 255,665 (51%)
    Girl 59,765 (48%) 241,737 (49%)
    Unknown 0 (0%) 0 (0%)
Birth type

    Elektiv sectio 7,136 (5.8%) 38,354 (7.7%)
    Induktion 18,174 (15%) 91,815 (18%)
    Spontan 98,154 (80%) 367,233 (74%)
Region of residency

    North 7,272 (11%) 46,205 (9.3%)
    Central 10,618 (17%) 207,779 (42%)
    South 45,453 (72%) 243,418 (49%)
Maternal age at birth (years)

    <=24 20,186 (16%) 55,914 (11%)
    25-29 40,132 (33%) 155,889 (31%)
    30-34 39,267 (32%) 172,148 (35%)
    >=35 23,746 (19%) 113,391 (23%)
Maternal education

    Primary or lower 10,327 (10%) 35,275 (7.7%)
    High school 36,552 (36%) 160,604 (35%)
    University or higher 42,808 (42%) 222,978 (49%)
    Don´t know 12,222 (12%) 40,327 (8.8%)
Maternal body mass index

    Underweight 2,399 (2.5%) 12,670 (2.6%)
    Normalweight 52,599 (54%) 276,051 (58%)
    Overweight 26,991 (28%) 123,726 (26%)
    Obesity 15,371 (16%) 67,167 (14%)
Maternal smoking status 4,964 (6.0%) 21,807 (4.8%)
1 Mean (SD); n (%)

Analysis