# Load required libraries
library(tidyverse)
library(readxl)
library(knitr)
library(kableExtra)
library(gtsummary)
library(patchwork)
## Original microbiome_dataset values:
## 
##   No  Yes <NA> 
## 3656  203    0
## Data loaded successfully!
## Total participants: 3859
## Group distribution:
## 
## Excluded Included     <NA> 
##     3656      203        0
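The Included/Excluded labels above appear to be a recoding of the Yes/No `microbiome_dataset` field (No → Excluded, Yes → Included). The overall cohort is the union of both groups (3656 + 203 = 3859). The recoding code itself is not shown, so here is a minimal base-R sketch on a made-up vector. The object names are illustrative only.

```r
# Made-up values standing in for the raw Yes/No membership field
microbiome_dataset <- c("No", "Yes", "No", NA, "Yes")

# Recode to a labelled factor; NA inputs stay NA because ifelse() propagates them
group <- factor(
  ifelse(microbiome_dataset == "Yes", "Included", "Excluded"),
  levels = c("Excluded", "Included")
)

# useNA = "always" prints an <NA> column even when nothing is missing,
# matching the "<NA> 0" column in the output above
print(table(group, useNA = "always"))
```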

Overview

This report assesses the representativeness of the selected Microbiome dataset cohort relative to the overall cohort (hereafter referred to as the “Microbiome” and “overall” cohorts).

Tables and plots summarise the frequency and distribution of each variable in the Microbiome and overall cohorts.

Representativeness is assessed across the demographic/baseline variable set outlined in the “Variable Category” column of the data dictionary sheet in the dataset.

Summary of Results

We find the distributions for most “primary” variables broadly similar between the Microbiome and overall cohorts, with some notable differences:

Primary Variables:
  • Maternal age at birth differs significantly (p<0.001), with the Microbiome cohort being slightly older (mean 33.0 vs 31.8 years)
  • Maternal asthma shows a significant difference (p=0.037), with lower prevalence in the Microbiome cohort (3.9% vs 8.1%)
  • DASS domains show some differences: at 36 weeks, the Microbiome cohort has more domains scored “Normal” (p=0.042)
  • Other demographic variables (gender, birth weight/length, BMI, diabetes, etc.) are well balanced

Sample Availability:
  • The Microbiome cohort shows significantly higher engagement across all questionnaire assignment and completion measures (all p<0.001)
  • ASQ completion rates are much higher at every time point (4 months to 5 years)
  • Participation in follow-up assessments is greater

Key Differences:
  • Twin births are over-represented in the Microbiome cohort (8.9% vs 3.2%, p<0.001)
  • Child age differs significantly, with Microbiome children being older (mean 4.5 vs 3.8 years, p<0.001)
  • BMI at 3 years is significantly lower in the Microbiome cohort (15.5 vs 15.9, p=0.004)

Data & Methods

  • DEIDENTIFIED Full Participant list Perron - Final dataset July 2025.xlsx
    • Contains data and data dictionary.
  • After some cleaning and the assignment of variable names, we get the following dimensions (rows, columns):
dim(dat)
## [1] 3859   86
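The cleaning and renaming step is not shown in the report. One simple way to assign short analysis names is a lookup from the original spreadsheet labels to snake_case names, renaming only the columns that are present. This is a minimal base-R sketch; the labels below are hypothetical stand-ins modelled on this report's variable list, not the real spreadsheet headers.

```r
# Toy data frame standing in for the raw spreadsheet (labels are hypothetical)
raw <- data.frame(
  `Gender of child` = c("Female", "Male"),
  `Maternal age at birth` = c(31, 35),
  check.names = FALSE
)

# Lookup: new analysis name = original column label
name_map <- c(
  gender_child       = "Gender of child",
  maternal_age_birth = "Maternal age at birth"
)

# Rename only the labels that exist in `raw`; other columns keep their names
hits <- match(name_map, names(raw))
names(raw)[hits[!is.na(hits)]] <- names(name_map)[!is.na(hits)]
print(dim(raw))
```

With dplyr (loaded via tidyverse), `dplyr::rename(raw, dplyr::any_of(name_map))` performs the same lookup-based rename and silently skips labels that are absent.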

We assess the similarity of each variable between the Microbiome cohort (N = 203) and the overall cohort (N = 3859) using:

  1. Summary tabulations
  2. Distributional plots
  3. Simple statistical tests
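    • Note: the p-values should be interpreted with caution. With large samples, differences too small to matter in practice can still return “significant” p-values. In addition, with ~79 variables tested, around four results below 0.05 would be expected by chance alone even if there were no true differences.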
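Because so many variables are tested, one option is to adjust the collected p-values for multiple comparisons with base R's `p.adjust()`. This report does not apply such an adjustment. The sketch below uses seven p-values copied from the tables in this report, keeping only those reported exactly (the "<0.001" results are omitted because their exact values are unknown). Holm and Benjamini-Hochberg ("BH") are shown as two common choices; the short names are illustrative labels.

```r
# Unadjusted p-values copied from tables in this report (exactly reported ones only)
p_raw <- c(
  bmi_3yr          = 0.004,
  asq_1yr_review   = 0.017,
  maternal_asthma  = 0.037,
  dass_36w_normal  = 0.042,
  dass_18w_anxiety = 0.043,
  infant_length    = 0.096,
  child_sex        = 0.799
)

# Holm controls the family-wise error rate; BH controls the false discovery rate
p_holm <- p.adjust(p_raw, method = "holm")
p_bh   <- p.adjust(p_raw, method = "BH")

print(round(cbind(raw = p_raw, holm = p_holm, bh = p_bh), 3))
```

In practice the adjustment would be applied to the full set of p-values from all variables tested, which typically adjusts more strongly than this seven-value illustration.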

Variable Breakdown

There are ~79 candidate variables that can be used to assess the similarity between the Microbiome and overall cohorts. Broadly, these fall into “sample availability”, “child/maternal demographic” and “child/maternal characteristics” variables.

Firstly, let’s select a handful of candidate variables to get an overall “snapshot” of the similarity between the sub-cohort and the overall cohort. For example, we want to ensure the sub-cohort is not entirely female, born in a single year, or of a single ethnic origin.

“Primary” variables (amended to include the 19 variables outlined):
  • Gender of child
  • Maternal age at birth
  • Maternal pre-pregnancy weight
  • Maternal pre-pregnancy height
  • Maternal pre-pregnancy BMI
  • Infant weight
  • Infant length
  • Infant BMI at birth
  • Infant ethnic origin
  • Indigenous status of baby
  • Vaginal or C-section birth
  • Maternal gestational diabetes status
  • Maternal Type 2 diabetes
  • Maternal mental health diagnosis (Depression, Anxiety disorder, Bipolar, Schizophrenia, OCD, Anorexia Nervosa, Specific Phobias, Behavioural Disorders)
    • Individual disorder breakdown: each mental health condition is analysed separately, so participants with multiple conditions are counted in each relevant category
    • Any mental health diagnosis: overall indicator of any mental health condition
    • Depression: depressive disorders
    • Anxiety: anxiety disorders
    • Bipolar: bipolar affective disorder
    • OCD: obsessive-compulsive disorder (includes various spellings)
    • Anorexia: anorexia nervosa
    • Behavioural: behavioural disorders
  • Maternal asthma
  • Number of 18wk DASS domains with “Severe” or “Extremely Severe”
  • Number of 18wk DASS domains with “Normal”
  • Number of 36wk DASS domains with “Severe” or “Extremely Severe”
  • Number of 36wk DASS domains with “Normal”
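The individual-disorder breakdown counts a participant once in every category that applies, with "any diagnosis" defined as matching at least one category. This minimal sketch of that idea assumes, hypothetically, that diagnoses are stored as free text in a single column. The name `mh_diagnosis_text`, the example strings and the regular expressions are illustrative and are not the report's actual cleaning rules.

```r
# Hypothetical free-text diagnosis field (one string per participant)
mh_diagnosis_text <- c(
  "Depression, Anxiety",
  "obsessive compulsive disorder",
  NA,
  "Post-natal depression; OCD",
  "None"
)

# Hypothetical patterns; one participant can match several of them
patterns <- c(
  mh_depression = "depress",
  mh_anxiety    = "anxi",
  mh_ocd        = "\\bocd\\b|obsessive[- ]?compulsive"  # covers spelling variants
)

# One logical column per disorder; missing text counts as "no diagnosis"
flags <- sapply(patterns, function(p) {
  grepl(p, mh_diagnosis_text, ignore.case = TRUE, perl = TRUE) &
    !is.na(mh_diagnosis_text)
})

mh <- as.data.frame(flags)
mh$mh_any <- rowSums(flags) > 0  # TRUE if at least one disorder matched
print(mh)
```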

Sample availability variables:
  • Availability of maternal/child urine/blood/stool samples (20 weeks, 2 months, 6 months, 12 months, 3 years)
  • ASQ completion (4 months, 9 months, 1 year, 3 years, 5 years)
  • Early Connors assigned and completed
  • REDCap questionnaires assigned and completed
  • Availability of MNS data
  • Total questionnaires completed

Outcome variables:
  • 1 year child wheeze
  • 1 year BMI
  • 1 year ferritin results
  • 1 year count of positive SPT wheals
  • 1 year any positive food SPT wheals
  • 1 year any positive airborne/enviro SPT wheals
  • 3 year count of positive SPT wheals
  • 3 year any positive food SPT wheals
  • 3 year any positive airborne/enviro SPT wheals
  • 3 year BMI
  • 3 year wheeze
  • 3 year asthma
  • 3 year ferritin
  • 5 year BMI
  • 5 year asthma
  • 5 year ferritin
  • 5 year any positive food SPT wheals
  • 5 year any positive airborne/enviro SPT wheals
  • 3 year count of Connors domains equal to or above 65
  • 3 year count of other clinical indicators parent-reported as “3” (highest)
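Several variables in this report are BMIs calculated from weight and length/height (marked “Calc.”). BMI is weight in kilograms divided by the square of height in metres. Judging by the tables, infant weight is recorded in grams and length in centimetres (e.g. medians of about 3390 and 50), while maternal height is in metres and weight in kilograms. The report's own derivation is not shown. This small sketch makes the unit handling explicit; the helper `bmi_calc()` and its arguments are hypothetical.

```r
# Hypothetical helper: BMI = weight (kg) / height (m)^2, converting units first
bmi_calc <- function(weight, height,
                     weight_unit = c("kg", "g"),
                     height_unit = c("m", "cm")) {
  weight_unit <- match.arg(weight_unit)
  height_unit <- match.arg(height_unit)
  weight_kg <- if (weight_unit == "g") weight / 1000 else weight
  height_m  <- if (height_unit == "cm") height / 100 else height
  weight_kg / height_m^2
}

# Illustrative values: an infant (grams, centimetres) and an adult (kg, metres)
infant_bmi   <- bmi_calc(3355, 50, weight_unit = "g", height_unit = "cm")
maternal_bmi <- bmi_calc(67, 1.65)
print(round(c(infant = infant_bmi, maternal = maternal_bmi), 1))
```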

Remaining variables: variables not contained in the primary, sample availability, or outcome variable sets.

1) Primary Variables

Broadly, the distribution of the “primary” variable set is similar between the Microbiome and overall cohorts.

Notes:
  • Child sex is well balanced between groups (46.8% vs 47.9% female, p=0.799)
  • Maternal age at birth differs significantly, with the Microbiome cohort being older (mean 33.0 vs 31.8 years, p<0.001)
  • Maternal pre-pregnancy characteristics (weight, height, BMI) are well balanced between groups
  • Infant birth characteristics (weight, length, BMI) show no significant differences
  • Ethnic origin and indigenous status are approximately balanced
  • Birth type, maternal gestational diabetes, and Type 2 diabetes show no significant differences
  • Mental health diagnosis patterns:
    • Overall mental health diagnoses are similar between groups (10.8% vs 14.5%, p=0.151)
    • Individual disorders show similar distributions across both cohorts
  • Maternal asthma shows a significant difference (p=0.037), with lower rates in the Microbiome cohort (3.9% vs 8.1%)
  • DASS mental health scores:
    • At 18 weeks, the Microbiome cohort trends toward more “Normal” domains (p=0.064)
    • At 36 weeks, the Microbiome cohort has significantly more “Normal” domains (p=0.042)
    • Counts of “Severe”/“Extremely Severe” domains do not differ significantly (p=0.171 at 18 weeks, p=0.285 at 36 weeks)
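The DASS summary counts combine the three DASS-21 domains (depression, anxiety, stress) into “number of domains rated Normal” and “number rated Severe or Extremely Severe”. This sketch derives such counts from per-domain severity labels using made-up rows. Treating a participant with any missing domain as missing is an assumption here, not necessarily the rule used in the report.

```r
# Made-up DASS-21 severity labels for three participants
dass <- data.frame(
  depression = c("Normal", "Severe",           NA),
  anxiety    = c("Normal", "Extremely Severe", "Mild"),
  stress     = c("Mild",   "Normal",           "Normal"),
  stringsAsFactors = FALSE
)

domains <- as.matrix(dass[, c("depression", "anxiety", "stress")])

# Count matching domains per participant; rowSums() gives NA if any domain is NA
normal_count <- rowSums(domains == "Normal")
severe_count <- rowSums(domains == "Severe" | domains == "Extremely Severe")

print(data.frame(normal_count, severe_count))
```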

# Primary variables (a mix of numeric and categorical columns).
# Each entry is list(column name in `dat`, display title).
primary_vars <- list(
  list("gender_child", "Child sex assigned at birth"),
  list("maternal_age_birth", "Maternal Age (Birth)"),
  list("maternal_prepreg_weight", "Maternal pre-pregnancy weight"),
  list("maternal_prepreg_height", "Maternal Pre-pregnancy height"),
  list("maternal_prepreg_bmi_calc", "Maternal Pre-pregnancy BMI (Calc.)"),
  list("infant_weight", "Infant Weight"),
  list("infant_length", "Infant Length at birth"),
  list("infant_bmi_calc", "Infant BMI at birth (Calc.)"),
  list("ethnic_origin", "Ethnic Origin"),
  list("indigenous_status", "Indigenous Status of Baby"),
  list("birth_type_derived", "Vaginal or C section birth (Deriv.)"),
  list("maternal_gest_diabetes_derived", "Maternal Gestational Diabetes? (Deriv.)"),
  list("maternal_diabetes_t2_derived", "Maternal Type 2 Diabetes? (Deriv.)"),
  list("mh_any", "Any Maternal Mental Health Diagnosis"),
  list("mh_depression", "Maternal Depression"),
  list("mh_anxiety", "Maternal Anxiety Disorder"),
  list("mh_bipolar", "Maternal Bipolar Disorder"),
  # Not analysed individually: list("mh_schizophrenia", "Maternal Schizophrenia"),
  list("mh_ocd", "Maternal OCD"),
  list("mh_anorexia", "Maternal Anorexia"),
  # Not analysed individually: list("mh_phobias", "Maternal Specific Phobias"),
  list("mh_behavioural", "Maternal Behavioural Disorders"),
  list("maternal_asthma_derived", "Maternal Asthma? (Deriv.)"),
  list("dass21_18w_severe_count", "Number of 18wk DASS domains with \"Severe\" or \"Extremely Severe\""),
  list("dass21_18w_normal_count", "Number of 18wk DASS domains with \"Normal\""),
  list("dass21_36w_severe_count", "Number of 36wk DASS domains with \"Severe\" or \"Extremely Severe\""),
  list("dass21_36w_normal_count", "Number of 36wk DASS domains with \"Normal\"")
)

# Send each variable to the continuous or categorical summary.
# `numeric_vars`, analyze_continuous() and analyze_categorical() are
# defined earlier in the report (not shown).
for (var_info in primary_vars) {
  var_name <- var_info[[1]]
  title <- var_info[[2]]

  if (var_name %in% numeric_vars) {
    analyze_continuous(dat, var_name, title)
  } else {
    analyze_categorical(dat, var_name, title)
  }
}
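Each categorical table below reports either a chi-squared test or Fisher's exact test, but the selection rule inside `analyze_categorical()` is not shown. A common convention, assumed here, is to fall back to Fisher's exact test when any expected cell count is below 5. This minimal sketch uses a hypothetical helper `compare_categorical()` and made-up counts.

```r
# Hypothetical helper: chi-squared test by default, Fisher's exact test when
# any expected cell count is below 5 (a common rule of thumb, assumed here)
compare_categorical <- function(x, group) {
  tab <- table(x, group)  # table() drops NA values by default
  expected <- suppressWarnings(chisq.test(tab)$expected)
  if (any(expected < 5)) {
    list(test = "Fisher's exact test", p.value = fisher.test(tab)$p.value)
  } else {
    list(test = "Chi-squared test", p.value = chisq.test(tab)$p.value)
  }
}

# Made-up data: 50 "Included" and 500 "Excluded" participants
group  <- rep(c("Included", "Excluded"), times = c(50, 500))
rare   <- c(rep(TRUE, 1), rep(FALSE, 49), rep(TRUE, 20), rep(FALSE, 480))
common <- c(rep(TRUE, 25), rep(FALSE, 25), rep(TRUE, 250), rep(FALSE, 250))

str(compare_categorical(rare, group))    # small expected counts -> Fisher
str(compare_categorical(common, group))  # all expected counts >= 25 -> chi-squared
```

For large multi-level tables (e.g. Ethnic Origin with several thousand participants), `fisher.test()` can exceed its default workspace. Increasing its `workspace` argument or setting `simulate.p.value = TRUE` are the usual remedies.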

Child sex assigned at birth

P-value: 0.799 (Chi-squared test)
Characteristic Included (N = 203) Overall (N = 3859)
Female 95 (46.8%) 1849 (47.9%)
Male 108 (53.2%) 2010 (52.1%)
Total 203 (100.0%) 3859 (100.0%)

Maternal Age (Birth)

P-value: <0.001 (Wilcoxon rank sum test)
Characteristic N Mean (Min, Max) Median (Q1, Q3)
Included 203 33 (24, 44) 32 (30, 35)
Overall 3859 31.8 (17, 50) 32 (29, 35)

Unknown: Included = 0, Overall = 0
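Each continuous table reports N, Mean (Min, Max), Median (Q1, Q3) and an Unknown count per group, alongside a Wilcoxon rank sum test. This base-R sketch of such a summary uses a hypothetical helper `summarise_continuous()` and made-up ages. The report's own `analyze_continuous()` may round or compute quartiles differently (R's `quantile()` defaults to type 7). The sketch compares Included with Excluded participants so that the two samples do not overlap, because the Overall column contains the Included participants. Which pairing the report's p-values use is not stated.

```r
# Hypothetical helper: one summary row per group, mirroring the table columns
summarise_continuous <- function(x, group) {
  rows <- lapply(split(x, group), function(v) {
    observed <- v[!is.na(v)]
    q <- quantile(observed, c(0.25, 0.5, 0.75), names = FALSE)  # type 7 default
    data.frame(
      N = length(observed),
      Mean = round(mean(observed), 1),
      Min = min(observed), Max = max(observed),
      Median = q[2], Q1 = q[1], Q3 = q[3],
      Unknown = sum(is.na(v))
    )
  })
  data.frame(Group = names(rows), do.call(rbind, rows))
}

# Made-up maternal ages: Included vs Excluded, one value missing
age   <- c(30, 34, 36, NA, 28, 31, 29, 33, 27)
group <- c(rep("Included", 4), rep("Excluded", 5))

print(summarise_continuous(age, group))

# Two-sided Wilcoxon rank sum test; exact = FALSE uses the normal approximation
wilcox.test(age ~ group, exact = FALSE)$p.value
```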

Maternal pre-pregnancy weight

P-value: 0.662 (Wilcoxon rank sum test)
Characteristic N Mean (Min, Max) Median (Q1, Q3)
Included 191 69.6 (47, 128) 67 (60, 75)
Overall 3633 70.5 (38, 134) 68 (60, 79)

Unknown: Included = 12, Overall = 226

Maternal Pre-pregnancy height

P-value: 0.595 (Wilcoxon rank sum test)
Characteristic N Mean (Min, Max) Median (Q1, Q3)
Included 194 1.7 (1.5, 1.8) 1.7 (1.6, 1.7)
Overall 3698 1.7 (1.4, 1.9) 1.6 (1.6, 1.7)

Unknown: Included = 9, Overall = 161

Maternal Pre-pregnancy BMI (Calc.)

P-value: 0.587 (Wilcoxon rank sum test)
Characteristic N Mean (Min, Max) Median (Q1, Q3)
Included 191 25.4 (17.3, 42.3) 24.1 (21.9, 28.5)
Overall 3623 25.7 (14.7, 47.3) 24.7 (21.9, 28.6)

Unknown: Included = 12, Overall = 236

Infant Weight

P-value: 0.315 (Wilcoxon rank sum test)
Characteristic N Mean (Min, Max) Median (Q1, Q3)
Included 186 3317.7 (1480, 4500) 3355 (3071.2, 3652.5)
Overall 3578 3369.2 (1095, 5410) 3390 (3062.8, 3695)

Unknown: Included = 17, Overall = 281

Infant Length at birth

P-value: 0.096 (Wilcoxon rank sum test)
Characteristic N Mean (Min, Max) Median (Q1, Q3)
Included 186 50 (39, 57) 50 (48, 52)
Overall 3574 50.3 (31, 60) 50 (49, 52)

Unknown: Included = 17, Overall = 285

Infant BMI at birth (Calc.)

P-value: 0.917 (Wilcoxon rank sum test)
Characteristic N Mean (Min, Max) Median (Q1, Q3)
Included 186 13.2 (7.6, 17.6) 13.2 (12.3, 14.1)
Overall 3574 13.3 (7.6, 32.3) 13.2 (12.3, 14.2)

Unknown: Included = 17, Overall = 285

Ethnic Origin

P-value: 0.543 (Fisher’s exact test)
Characteristic Included (N = 203) Overall (N = 3859)
Unknown 17 (8.4%) 285 (7.4%)
1 158 (77.8%) 2951 (76.5%)
3 8 (3.9%) 177 (4.6%)
4 2 (1%) 116 (3%)
5 2 (1%) 18 (0.5%)
8 16 (7.9%) 288 (7.5%)
10 0 (0%) 8 (0.2%)
6 0 (0%) 4 (0.1%)
7 0 (0%) 12 (0.3%)
Total 203 (100.0%) 3859 (100.0%)

Indigenous Status of Baby

P-value: 1.000 (Fisher’s exact test)
Characteristic Included (N = 203) Overall (N = 3859)
Unknown 17 (8.4%) 283 (7.3%)
4 186 (91.6%) 3563 (92.3%)
1 0 (0%) 12 (0.3%)
2 0 (0%) 1 (0%)
Total 203 (100.0%) 3859 (100.0%)

Vaginal or C section birth (Deriv.)

P-value: 0.392 (Fisher’s exact test)
Characteristic Included (N = 203) Overall (N = 3859)
Unknown 17 (8.4%) 281 (7.3%)
Caesarean Elective 58 (28.6%) 964 (25%)
Caesarean Emergency 41 (20.2%) 804 (20.8%)
Vaginal 87 (42.9%) 1810 (46.9%)
Total 203 (100.0%) 3859 (100.0%)

Maternal Gestational Diabetes? (Deriv.)

P-value: 0.695 (Chi-squared test)
Characteristic Included (N = 203) Overall (N = 3859)
Unknown 17 (8.4%) 281 (7.3%)
FALSE 172 (84.7%) 3271 (84.8%)
TRUE 14 (6.9%) 307 (8%)
Total 203 (100.0%) 3859 (100.0%)

Maternal Type 2 Diabetes? (Deriv.)

P-value: 1.000 (Fisher’s exact test)
Characteristic Included (N = 203) Overall (N = 3859)
Unknown 17 (8.4%) 281 (7.3%)
FALSE 186 (91.6%) 3572 (92.6%)
TRUE 0 (0%) 6 (0.2%)
Total 203 (100.0%) 3859 (100.0%)

Any Maternal Mental Health Diagnosis

P-value: 0.151 (Chi-squared test)
Characteristic Included (N = 203) Overall (N = 3859)
FALSE 181 (89.2%) 3298 (85.5%)
TRUE 22 (10.8%) 561 (14.5%)
Total 203 (100.0%) 3859 (100.0%)

Maternal Depression

P-value: 0.239 (Chi-squared test)
Characteristic Included (N = 203) Overall (N = 3859)
FALSE 192 (94.6%) 3557 (92.2%)
TRUE 11 (5.4%) 302 (7.8%)
Total 203 (100.0%) 3859 (100.0%)

Maternal Anxiety Disorder

P-value: 0.278 (Chi-squared test)
Characteristic Included (N = 203) Overall (N = 3859)
FALSE 186 (91.6%) 3437 (89.1%)
TRUE 17 (8.4%) 422 (10.9%)
Total 203 (100.0%) 3859 (100.0%)

Maternal Bipolar Disorder

P-value: 1.000 (Fisher’s exact test)
Characteristic Included (N = 203) Overall (N = 3859)
FALSE 203 (100%) 3845 (99.6%)
TRUE 0 (0%) 14 (0.4%)
Total 203 (100.0%) 3859 (100.0%)

Maternal OCD

P-value: 1.000 (Fisher’s exact test)
Characteristic Included (N = 203) Overall (N = 3859)
FALSE 203 (100%) 3855 (99.9%)
TRUE 0 (0%) 4 (0.1%)
Total 203 (100.0%) 3859 (100.0%)

Maternal Anorexia

P-value: 1.000 (Fisher’s exact test)
Characteristic Included (N = 203) Overall (N = 3859)
FALSE 203 (100%) 3843 (99.6%)
TRUE 0 (0%) 16 (0.4%)
Total 203 (100.0%) 3859 (100.0%)

Maternal Behavioural Disorders

P-value: 1.000 (Fisher’s exact test)
Characteristic Included (N = 203) Overall (N = 3859)
FALSE 203 (100%) 3849 (99.7%)
TRUE 0 (0%) 10 (0.3%)
Total 203 (100.0%) 3859 (100.0%)

Maternal Asthma? (Deriv.)

P-value: 0.037 (Chi-squared test)
Characteristic Included (N = 203) Overall (N = 3859)
Unknown 17 (8.4%) 281 (7.3%)
FALSE 178 (87.7%) 3264 (84.6%)
TRUE 8 (3.9%) 314 (8.1%)
Total 203 (100.0%) 3859 (100.0%)

Number of 18wk DASS domains with “Severe” or “Extremely Severe”

P-value: 0.171 (Wilcoxon rank sum test)
Characteristic N Mean (Min, Max) Median (Q1, Q3)
Included 177 0.1 (0, 2) 0 (0, 0)
Overall 2781 0.1 (0, 3) 0 (0, 0)

Unknown: Included = 26, Overall = 1078

Number of 18wk DASS domains with “Normal”

P-value: 0.064 (Wilcoxon rank sum test)
Characteristic N Mean (Min, Max) Median (Q1, Q3)
Included 177 2.7 (0, 3) 3 (3, 3)
Overall 2781 2.6 (0, 3) 3 (2, 3)

Unknown: Included = 26, Overall = 1078

Number of 36wk DASS domains with “Severe” or “Extremely Severe”

P-value: 0.285 (Wilcoxon rank sum test)
Characteristic N Mean (Min, Max) Median (Q1, Q3)
Included 119 0.1 (0, 3) 0 (0, 0)
Overall 1570 0.1 (0, 3) 0 (0, 0)

Unknown: Included = 84, Overall = 2289

Number of 36wk DASS domains with “Normal”

P-value: 0.042 (Wilcoxon rank sum test)
Characteristic N Mean (Min, Max) Median (Q1, Q3)
Included 119 2.8 (0, 3) 3 (3, 3)
Overall 1570 2.6 (0, 3) 3 (3, 3)

Unknown: Included = 84, Overall = 2289

2) Sample Availability

In general, sample availability and engagement are substantially higher in the Microbiome cohort than in the overall cohort: every questionnaire assignment and completion measure differs significantly (p<0.001), whereas MNS data availability does not (p=0.633).

# Sample availability variables
sample_vars <- list(
  list("mns_data_available", "MNS Data Available?"),
  list("asq_assigned", "ASQ Questionnaires Assigned"),
  list("asq_completed", "ASQ Questionnaires Completed"),
  list("early_connors_assigned", "Early Connors Assigned"),
  list("early_connors_completed", "Early Connors Completed"),
  list("aes_assigned", "AES Questionnaires Assigned"),
  list("aes_completed", "AES Questionnaires Completed"),
  list("redcap_assigned", "REDCap Questionnaires Assigned"),
  list("redcap_completed", "REDCap Questionnaires Completed"),
  list("questionnaires_total_completed", "Total Questionnaires Completed"),
  list("asq_4m_completed", "ASQ 4 Month Completed"),
  list("asq_4m_paed_review", "ASQ 4 Month Review with Paediatrician"),
  list("asq_9m_completed", "ASQ 9 Month Completed"),
  list("asq_9m_paed_review", "ASQ 9 Month Review with Paediatrician"),
  list("asq_1yr_completed", "ASQ 1 Year Completed"),
  list("asq_1yr_paed_review", "ASQ 1 Year Review with Paediatrician"),
  list("asq_3yr_completed", "ASQ 3 Year Completed"),
  list("asq_3yr_paed_review", "ASQ 3 Year Review with Paediatrician"),
  list("asq_5yr_completed", "ASQ 5 Year Completed"),
  list("asq_5yr_paed_review", "ASQ 5 Year Review with Paediatrician"),
  list("asq_paed_review_count", "Number of times ASQ has prompted review with Paediatrician")
)

for(var_info in sample_vars) {
  var_name <- var_info[[1]]
  title <- var_info[[2]]
  
  if(var_name %in% numeric_vars) {
    analyze_continuous(dat, var_name, title)
  } else {
    analyze_categorical(dat, var_name, title)
  }
}

MNS Data Available?

P-value: 0.633 (Chi-squared test)
Characteristic Included (N = 203) Overall (N = 3859)
FALSE 17 (8.4%) 281 (7.3%)
TRUE 186 (91.6%) 3578 (92.7%)
Total 203 (100.0%) 3859 (100.0%)

ASQ Questionnaires Assigned

P-value: <0.001 (Wilcoxon rank sum test)
Characteristic N Mean (Min, Max) Median (Q1, Q3)
Included 203 6 (3, 8) 6 (4, 8)
Overall 3591 5.1 (0, 8) 5 (4, 6)

Unknown: Included = 0, Overall = 268

ASQ Questionnaires Completed

P-value: <0.001 (Wilcoxon rank sum test)
Characteristic N Mean (Min, Max) Median (Q1, Q3)
Included 203 4.3 (0, 9) 4 (3, 5)
Overall 3591 2.6 (0, 9) 2 (1, 4)

Unknown: Included = 0, Overall = 268

Early Connors Assigned

P-value: <0.001 (Wilcoxon rank sum test)
Characteristic N Mean (Min, Max) Median (Q1, Q3)
Included 203 1.2 (0, 2) 1 (0, 2)
Overall 3591 0.8 (0, 2) 1 (0, 1)

Unknown: Included = 0, Overall = 268

Early Connors Completed

P-value: <0.001 (Wilcoxon rank sum test)
Characteristic N Mean (Min, Max) Median (Q1, Q3)
Included 203 0.8 (0, 2) 1 (0, 1)
Overall 3591 0.4 (0, 2) 0 (0, 1)

Unknown: Included = 0, Overall = 268

AES Questionnaires Assigned

P-value: <0.001 (Wilcoxon rank sum test)
Characteristic N Mean (Min, Max) Median (Q1, Q3)
Included 203 3.9 (1, 5) 4 (4, 4)
Overall 3591 3.4 (0, 5) 3 (3, 4)

Unknown: Included = 0, Overall = 268

AES Questionnaires Completed

P-value: <0.001 (Wilcoxon rank sum test)
Characteristic N Mean (Min, Max) Median (Q1, Q3)
Included 203 2 (0, 5) 2 (1, 3)
Overall 3591 1.1 (0, 5) 1 (0, 2)

Unknown: Included = 0, Overall = 268

REDCap Questionnaires Assigned

P-value: <0.001 (Wilcoxon rank sum test)
Characteristic N Mean (Min, Max) Median (Q1, Q3)
Included 203 5.9 (2, 7) 6 (5, 7)
Overall 3591 5.1 (1, 7) 5 (4, 7)

Unknown: Included = 0, Overall = 268

REDCap Questionnaires Completed

P-value: <0.001 (Wilcoxon rank sum test)
Characteristic N Mean (Min, Max) Median (Q1, Q3)
Included 203 4.3 (0, 7) 5 (3, 6)
Overall 3591 2.8 (0, 7) 3 (1, 4)

Unknown: Included = 0, Overall = 268

Total Questionnaires Completed

P-value: <0.001 (Wilcoxon rank sum test)
Characteristic N Mean (Min, Max) Median (Q1, Q3)
Included 203 11.3 (0, 22) 11 (9, 14)
Overall 3859 6.3 (0, 22) 6 (2, 10)

Unknown: Included = 0, Overall = 0

ASQ 4 Month Completed

P-value: <0.001 (Chi-squared test)
Characteristic Included (N = 203) Overall (N = 3859)
FALSE 58 (28.6%) 2004 (51.9%)
TRUE 145 (71.4%) 1855 (48.1%)
Total 203 (100.0%) 3859 (100.0%)

ASQ 4 Month Review with Paediatrician

P-value: 0.074 (Chi-squared test)
Characteristic Included (N = 203) Overall (N = 3859)
Unknown 58 (28.6%) 2004 (51.9%)
FALSE 80 (39.4%) 885 (22.9%)
TRUE 65 (32%) 970 (25.1%)
Total 203 (100.0%) 3859 (100.0%)

ASQ 9 Month Completed

P-value: <0.001 (Chi-squared test)
Characteristic Included (N = 203) Overall (N = 3859)
FALSE 49 (24.1%) 2141 (55.5%)
TRUE 154 (75.9%) 1718 (44.5%)
Total 203 (100.0%) 3859 (100.0%)

ASQ 9 Month Review with Paediatrician

P-value: <0.001 (Chi-squared test)
Characteristic Included
N = 203
Overall
N = 3859
Unknown 49 (24.1%) 2141 (55.5%)
FALSE 94 (46.3%) 807 (20.9%)
TRUE 60 (29.6%) 911 (23.6%)
Total 203 (100.0%) 3859 (100.0%)

ASQ 1 Year Completed

P-value: <0.001 (Chi-squared test)
Characteristic Included (N = 203) Overall (N = 3859)
FALSE 19 (9.4%) 1327 (34.4%)
TRUE 184 (90.6%) 2532 (65.6%)
Total 203 (100.0%) 3859 (100.0%)

ASQ 1 Year Review with Paediatrician

P-value: 0.017 (Chi-squared test)
Characteristic Included (N = 203) Overall (N = 3859)
Unknown 19 (9.4%) 1327 (34.4%)
FALSE 110 (54.2%) 1293 (33.5%)
TRUE 74 (36.5%) 1239 (32.1%)
Total 203 (100.0%) 3859 (100.0%)

ASQ 3 Year Completed

P-value: <0.001 (Chi-squared test)
Characteristic Included (N = 203) Overall (N = 3859)
FALSE 84 (41.4%) 2684 (69.6%)
TRUE 119 (58.6%) 1175 (30.4%)
Total 203 (100.0%) 3859 (100.0%)

ASQ 3 Year Review with Paediatrician

P-value: 1.000 (Chi-squared test)
Characteristic Included (N = 203) Overall (N = 3859)
Unknown 84 (41.4%) 2684 (69.6%)
FALSE 46 (22.7%) 450 (11.7%)
TRUE 73 (36%) 725 (18.8%)
Total 203 (100.0%) 3859 (100.0%)

ASQ 5 Year Completed

P-value: <0.001 (Chi-squared test)
Characteristic Included (N = 203) Overall (N = 3859)
FALSE 164 (80.8%) 3611 (93.6%)
TRUE 39 (19.2%) 248 (6.4%)
Total 203 (100.0%) 3859 (100.0%)

ASQ 5 Year Review with Paediatrician

P-value: 1.000 (Chi-squared test)
Characteristic Included (N = 203) Overall (N = 3859)
Unknown 164 (80.8%) 3611 (93.6%)
FALSE 11 (5.4%) 72 (1.9%)
TRUE 28 (13.8%) 176 (4.6%)
Total 203 (100.0%) 3859 (100.0%)

Number of times ASQ has prompted review with Paediatrician

P-value: 0.020 (Wilcoxon rank sum test)
Characteristic N Mean (Min, Max) Median (Q1, Q3)
Included 200 1.5 (0, 4) 1 (1, 2)
Overall 3014 1.3 (0, 4) 1 (1, 2)

Unknown: Included = 3, Overall = 845

3) Outcome Variables

Notes:
  • Child wheeze at 1 year: no significant difference between groups (15.8% vs 14.0%, p=0.152)
  • BMI measurements:
    • 1-year BMI shows no significant difference (p=0.121)
    • 3-year BMI is significantly lower in the Microbiome cohort (15.5 vs 15.9, p=0.004)
    • 5-year BMI shows no significant difference (p=0.122)
  • Wheeze and asthma outcomes: no significant differences for wheeze at 1 or 3 years, or for asthma at 3 or 5 years
  • Skin prick test (SPT) results:
    • Generally similar patterns between groups
    • Slight trend toward fewer positive wheals in the Microbiome cohort at 3 years (p=0.076)
    • No significant differences in food or airborne/environmental SPT positivity at any time point (closest: 3-year airborne, p=0.053)
  • Ferritin levels: no significant differences at 1 year (p=0.998), 3 years (p=0.703), or 5 years (p=0.832)
  • Behavioural assessments:
    • ASQ paediatrician-review rates (reported under Sample Availability) differ significantly at 9 months (p<0.001) and 1 year (p=0.017), as does the total number of reviews prompted (p=0.020)
    • Connors domain counts show no significant difference (p=0.937); the clinical-indicator count was not available
  • Follow-up participation: the Microbiome cohort shows much higher engagement in data collection
  • Overall assessment: most clinical outcome variables show good representativeness; the main difference is consistently higher study engagement in the Microbiome cohort

# Outcome variables
outcome_vars <- list(
  list("child_wheeze_1yr", "Has your child ever wheezed at 1 Year?"),
  list("bmi_1yr_calc", "BMI at 1 Year (Calc.)"),
  list("ferritin_1yr", "Ferritin Results at 1 Year"),
  list("spt_positive_count_1yr", "Count of positive SPT wheals (>=3mm) at 1 Year"),
  list("spt_food_positive_1yr", "Any positive Food SPT wheals (>=3mm) at 1 Year"),
  list("spt_airborne_positive_1yr", "Any positive airborne/enviro SPT wheals (>=3mm) at 1 Year"),
  list("spt_positive_count_3yr", "Count of positive SPT wheals (>=3mm) at 3 Years"),
  list("spt_food_positive_3yr", "Any positive Food SPT wheals (>=3mm) at 3 Years"),
  list("spt_airborne_positive_3yr", "Any positive airborne/enviro SPT wheals (>=3mm) at 3 Years"),
  list("bmi_3yr_calc", "BMI at 3 Years (Calc.)"),
  list("wheeze_3yr", "3 year wheeze"),
  list("asthma_3yr", "3 year asthma"),
  list("followup_3yr", "3yr ferritin"),
  list("bmi_5yr_calc", "BMI at 5 Years (Calc.)"),
  list("asthma_5yr", "5 year asthma"),
  list("ferritin_5yr", "5yr ferritin"),
  list("spt_positive_count_5yr", "Count of positive SPT wheals (>=3mm) at 5 Years"),
  list("spt_food_positive_5yr", "Any positive Food SPT wheals (>=3mm) at 5 Years"),
  list("spt_airborne_positive_5yr", "Any positive airborne/enviro SPT wheals (>=3mm) at 5 Years"),
  list("connors_domains_above65_3yr", "Count of Connors domains equal to, or above 65 at 3 Years"),
  list("clinical_indicators_highest_3yr", "Count of other clinical indicators parent reported as \"3\" highest at 3 Years")
)

for(var_info in outcome_vars) {
  var_name <- var_info[[1]]
  title <- var_info[[2]]
  
  if(var_name %in% numeric_vars) {
    analyze_continuous(dat, var_name, title)
  } else {
    analyze_categorical(dat, var_name, title)
  }
}

Has your child ever wheezed at 1 Year?

P-value: 0.152 (Chi-squared test)
Characteristic Included (N = 203) Overall (N = 3859)
Unknown 14 (6.9%) 1315 (34.1%)
No 157 (77.3%) 2002 (51.9%)
Yes 32 (15.8%) 542 (14%)
Total 203 (100.0%) 3859 (100.0%)

BMI at 1 Year (Calc.)

P-value: 0.121 (Wilcoxon rank sum test)
Characteristic N Mean (Min, Max) Median (Q1, Q3)
Included 154 16.8 (13.8, 21.6) 16.8 (15.9, 17.6)
Overall 1832 17 (10.2, 26.1) 16.9 (16, 18)

Unknown: Included = 49, Overall = 2027

Ferritin Results at 1 Year

P-value: 0.998 (Wilcoxon rank sum test)
Characteristic N Mean (Min, Max) Median (Q1, Q3)
Included 80 32.5 (5, 367) 25 (17, 36.5)
Overall 542 32.1 (5, 871) 25 (16, 37)

Unknown: Included = 123, Overall = 3317

Count of positive SPT wheals (>=3mm) at 1 Year

P-value: 0.913 (Wilcoxon rank sum test)
Characteristic N Mean (Min, Max) Median (Q1, Q3)
Included 200 0.2 (0, 5) 0 (0, 0)
Overall 2383 0.2 (0, 8) 0 (0, 0)

Unknown: Included = 3, Overall = 1476

Any positive Food SPT wheals (>=3mm) at 1 Year

P-value: 0.758 (Chi-squared test)
Characteristic Included (N = 203) Overall (N = 3859)
Unknown 3 (1.5%) 1476 (38.2%)
FALSE 180 (88.7%) 2165 (56.1%)
TRUE 20 (9.9%) 218 (5.6%)
Total 203 (100.0%) 3859 (100.0%)

Any positive airborne/enviro SPT wheals (>=3mm) at 1 Year

P-value: 0.478 (Fisher’s exact test)
Characteristic Included (N = 203) Overall (N = 3859)
Unknown 3 (1.5%) 1476 (38.2%)
FALSE 197 (97%) 2325 (60.2%)
TRUE 3 (1.5%) 58 (1.5%)
Total 203 (100.0%) 3859 (100.0%)

Count of positive SPT wheals (>=3mm) at 3 Years

P-value: 0.076 (Wilcoxon rank sum test)
Characteristic N Mean (Min, Max) Median (Q1, Q3)
Included 111 0.1 (0, 3) 0 (0, 0)
Overall 1034 0.2 (0, 5) 0 (0, 0)

Unknown: Included = 92, Overall = 2825

Any positive Food SPT wheals (>=3mm) at 3 Years

P-value: 1.000 (Fisher’s exact test)
Characteristic Included (N = 203) Overall (N = 3859)
Unknown 92 (45.3%) 2825 (73.2%)
FALSE 107 (52.7%) 992 (25.7%)
TRUE 4 (2%) 42 (1.1%)
Total 203 (100.0%) 3859 (100.0%)

Any positive airborne/enviro SPT wheals (>=3mm) at 3 Years

P-value: 0.053 (Chi-squared test)
Characteristic Included (N = 203) Overall (N = 3859)
Unknown 92 (45.3%) 2825 (73.2%)
FALSE 102 (50.2%) 882 (22.9%)
TRUE 9 (4.4%) 152 (3.9%)
Total 203 (100.0%) 3859 (100.0%)

BMI at 3 Years (Calc.)

P-value: 0.004 (Wilcoxon rank sum test)
Characteristic N Mean (Min, Max) Median (Q1, Q3)
Included 98 15.5 (12.7, 19.3) 15.4 (14.7, 16.4)
Overall 1021 15.9 (11.3, 27.3) 15.8 (15, 16.7)

Unknown: Included = 105, Overall = 2838

3 year wheeze

P-value: 0.615 (Chi-squared test)
Characteristic Included (N = 203) Overall (N = 3859)
Unknown 82 (40.4%) 2597 (67.3%)
FALSE 96 (47.3%) 973 (25.2%)
TRUE 25 (12.3%) 289 (7.5%)
Total 203 (100.0%) 3859 (100.0%)

3 year asthma

P-value: 0.267 (Fisher’s exact test)
Characteristic Included (N = 203) Overall (N = 3859)
Unknown 83 (40.9%) 2603 (67.5%)
FALSE 116 (57.1%) 1233 (32%)
TRUE 4 (2%) 23 (0.6%)
Total 203 (100.0%) 3859 (100.0%)

3yr ferritin

P-value: 0.703 (Wilcoxon rank sum test)
Characteristic N Mean (Min, Max) Median (Q1, Q3)
Included 89 21 (7, 67) 18 (14, 24)
Overall 610 22.1 (5, 175) 19 (14, 26)

Unknown: Included = 114, Overall = 3249

BMI at 5 Years (Calc.)

P-value: 0.122 (Wilcoxon rank sum test)
Characteristic N Mean (Min, Max) Median (Q1, Q3)
Included 58 15.4 (12.6, 18.9) 15.4 (14.7, 16)
Overall 466 15.8 (11.5, 26.4) 15.6 (14.9, 16.5)

Unknown: Included = 145, Overall = 3393

5 year asthma

P-value: 1.000 (Fisher’s exact test)
Characteristic Included (N = 203) Overall (N = 3859)
Unknown 140 (69%) 3374 (87.4%)
FALSE 60 (29.6%) 461 (11.9%)
TRUE 3 (1.5%) 24 (0.6%)
Total 203 (100.0%) 3859 (100.0%)

5yr ferritin

P-value: 0.832 (Wilcoxon rank sum test)
Characteristic N Mean (Min, Max) Median (Q1, Q3)
Included 37 24.4 (7, 55) 21 (17, 30)
Overall 262 25.8 (7, 124) 22 (17, 30)

Unknown: Included = 166, Overall = 3597

Count of positive SPT wheals (>=3mm) at 5 Years

P-value: 0.723 (Wilcoxon rank sum test)
Characteristic N Mean (Min, Max) Median (Q1, Q3)
Included 61 1 (0, 6) 0 (0, 2)
Overall 414 1 (0, 10) 0 (0, 2)

Unknown: Included = 142, Overall = 3445

Any positive Food SPT wheals (>=3mm) at 5 Years

P-value: 0.702 (Chi-squared test)
Characteristic Included (N = 203) Overall (N = 3859)
Unknown 142 (70%) 3445 (89.3%)
FALSE 56 (27.6%) 388 (10.1%)
TRUE 5 (2.5%) 26 (0.7%)
Total 203 (100.0%) 3859 (100.0%)

Any positive airborne/enviro SPT wheals (>=3mm) at 5 Years

P-value: 0.774 (Chi-squared test)
Characteristic Included (N = 203) Overall (N = 3859)
Unknown 142 (70%) 3445 (89.3%)
FALSE 38 (18.7%) 268 (6.9%)
TRUE 23 (11.3%) 146 (3.8%)
Total 203 (100.0%) 3859 (100.0%)

Count of Connors domains equal to, or above 65 at 3 Years

P-value: 0.937 (Wilcoxon rank sum test)
Characteristic N Mean (Min, Max) Median (Q1, Q3)
Included 111 0.4 (0, 6) 0 (0, 1)
Overall 1062 0.4 (0, 7) 0 (0, 1)

Unknown: Included = 92, Overall = 2797

Count of other clinical indicators parent reported as “3” highest at 3 Years

Data not available

4) Remaining Variables

All remaining variables not contained in (1), (2) or (3).

Key differences identified in remaining variables:

  • Twin births are significantly over-represented in the Microbiome cohort (8.9% vs 3.2%, p<0.001)
  • Current child age differs significantly, with Microbiome children being older (mean 4.5 vs 3.8 years, p<0.001)
  • Age at Peapod assessment differs significantly (mean 7.7 vs 5.1 days, p<0.001)
  • DASS anxiety scores at 18 weeks show a significant difference (p=0.043), with a larger share of “Normal” ratings in the Microbiome cohort
  • Previous pregnancies and parity show no significant differences
  • BMI at Peapod and other DASS measures show no significant differences

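The over-representation of twins and the older child age in the Microbiome cohort suggest this sub-sample may over-represent families with longer study engagement and more complex (twin) pregnancies. Older children have also had more time to be assigned and complete follow-up questionnaires, which likely contributes to the higher engagement seen in the sample availability results.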

# Remaining variables
remaining_vars <- list(
  list("singleton_twin", "Singleton or Twin"),
  list("current_age_march2024", "Current age of child (as of March 2024)"),
  list("previous_pregnancies", "Previous Pregnancies"),
  list("previous_pregnancies_parity", "Previous Pregnancies Parity"),
  list("bmi_peapod_calc", "BMI at Peapod (Calc.)"),
  list("age_days_peapod_calc", "Age (days) at Peapod (Calc.)"),
  list("dass21_18w_depression", "DASS21 Depression 18 Week"),
  list("dass21_18w_anxiety", "DASS21 Anxiety 18 Week"),
  list("dass21_18w_stress", "DASS21 Stress 18 Week"),
  list("dass21_36w_depression", "DASS21 Depression 36 Week"),
  list("dass21_36w_anxiety", "DASS21 Anxiety 36 Week"),
  list("dass21_36w_stress", "DASS21 Stress 36 Week")
)

for(var_info in remaining_vars) {
  var_name <- var_info[[1]]
  title <- var_info[[2]]
  
  if(var_name %in% numeric_vars) {
    analyze_continuous(dat, var_name, title)
  } else {
    analyze_categorical(dat, var_name, title)
  }
}

Singleton or Twin

P-value: <0.001 (Chi-squared test)
Characteristic Included (N = 203) Overall (N = 3859)
Singleton 185 (91.1%) 3735 (96.8%)
Twins 18 (8.9%) 124 (3.2%)
Total 203 (100.0%) 3859 (100.0%)

Current age of child (as of March 2024)

P-value: <0.001 (Wilcoxon rank sum test)
Characteristic N Mean (Min, Max) Median (Q1, Q3)
Included 203 4.5 (1.4, 7.2) 4.8 (3.1, 5.9)
Overall 3859 3.8 (0.6, 7.3) 3.7 (2.4, 5.2)

Unknown: Included = 0, Overall = 0

Previous Pregnancies

P-value: 0.158 (Wilcoxon rank sum test)
Characteristic N Mean (Min, Max) Median (Q1, Q3)
Included 186 1.3 (0, 6) 1 (0, 2)
Overall 3576 1.2 (0, 14) 1 (0, 2)

Unknown: Included = 17, Overall = 283

Previous Pregnancies Parity

P-value: 0.582 (Wilcoxon rank sum test)

| Characteristic | N | Mean (Min, Max) | Median (Q1, Q3) |
|---|---|---|---|
| Included | 186 | 0.7 (0, 4) | 1 (0, 1) |
| Overall | 3570 | 0.7 (0, 6) | 0 (0, 1) |

Unknown: Included = 17, Overall = 289

BMI at Peapod (Calc.)

P-value: 0.640 (Wilcoxon rank sum test)

| Characteristic | N | Mean (Min, Max) | Median (Q1, Q3) |
|---|---|---|---|
| Included | 167 | 12.8 (10, 17.1) | 12.6 (11.9, 13.5) |
| Overall | 2210 | 12.8 (8.5, 28.1) | 12.8 (11.8, 13.8) |

Unknown: Included = 36, Overall = 1649

Age (days) at Peapod (Calc.)

P-value: <0.001 (Wilcoxon rank sum test)

| Characteristic | N | Mean (Min, Max) | Median (Q1, Q3) |
|---|---|---|---|
| Included | 167 | 7.7 (0, 56) | 3 (1, 10) |
| Overall | 2210 | 5.1 (0, 82) | 2 (1, 4) |

Unknown: Included = 36, Overall = 1649

DASS21 Depression 18 Week

P-value: 0.800 (Fisher’s exact test)

| Characteristic | Included (N = 203) | Overall (N = 3859) |
|---|---|---|
| Normal | 162 (79.8%) | 2460 (63.7%) |
| Mild | 8 (3.9%) | 153 (4.0%) |
| Moderate | 6 (3.0%) | 120 (3.1%) |
| Severe | 1 (0.5%) | 24 (0.6%) |
| Extremely Severe | 0 (0.0%) | 24 (0.6%) |
| Unknown | 26 (12.8%) | 1078 (27.9%) |
| Total | 203 (100.0%) | 3859 (100.0%) |

DASS21 Anxiety 18 Week

P-value: 0.043 (Fisher’s exact test)

| Characteristic | Included (N = 203) | Overall (N = 3859) |
|---|---|---|
| Normal | 154 (75.9%) | 2224 (57.6%) |
| Mild | 9 (4.4%) | 302 (7.8%) |
| Moderate | 9 (4.4%) | 130 (3.4%) |
| Severe | 4 (2.0%) | 68 (1.8%) |
| Extremely Severe | 1 (0.5%) | 57 (1.5%) |
| Unknown | 26 (12.8%) | 1078 (27.9%) |
| Total | 203 (100.0%) | 3859 (100.0%) |

DASS21 Stress 18 Week

P-value: 0.607 (Fisher’s exact test)

| Characteristic | Included (N = 203) | Overall (N = 3859) |
|---|---|---|
| Normal | 161 (79.3%) | 2447 (63.4%) |
| Mild | 6 (3.0%) | 155 (4.0%) |
| Moderate | 6 (3.0%) | 101 (2.6%) |
| Severe | 2 (1.0%) | 55 (1.4%) |
| Extremely Severe | 2 (1.0%) | 23 (0.6%) |
| Unknown | 26 (12.8%) | 1078 (27.9%) |
| Total | 203 (100.0%) | 3859 (100.0%) |

DASS21 Depression 36 Week

P-value: 0.232 (Fisher’s exact test)

| Characteristic | Included (N = 203) | Overall (N = 3859) |
|---|---|---|
| Normal | 111 (54.7%) | 1409 (36.5%) |
| Mild | 6 (3.0%) | 86 (2.2%) |
| Moderate | 0 (0.0%) | 48 (1.2%) |
| Severe | 1 (0.5%) | 14 (0.4%) |
| Extremely Severe | 1 (0.5%) | 13 (0.3%) |
| Unknown | 84 (41.4%) | 2289 (59.3%) |
| Total | 203 (100.0%) | 3859 (100.0%) |

DASS21 Anxiety 36 Week

P-value: 0.880 (Fisher’s exact test)

| Characteristic | Included (N = 203) | Overall (N = 3859) |
|---|---|---|
| Normal | 104 (51.2%) | 1310 (33.9%) |
| Mild | 9 (4.4%) | 130 (3.4%) |
| Moderate | 4 (2.0%) | 72 (1.9%) |
| Severe | 1 (0.5%) | 26 (0.7%) |
| Extremely Severe | 1 (0.5%) | 32 (0.8%) |
| Unknown | 84 (41.4%) | 2289 (59.3%) |
| Total | 203 (100.0%) | 3859 (100.0%) |

DASS21 Stress 36 Week

P-value: 0.055 (Fisher’s exact test)

| Characteristic | Included (N = 203) | Overall (N = 3859) |
|---|---|---|
| Normal | 115 (56.7%) | 1403 (36.4%) |
| Mild | 2 (1.0%) | 71 (1.8%) |
| Moderate | 0 (0.0%) | 53 (1.4%) |
| Severe | 2 (1.0%) | 31 (0.8%) |
| Extremely Severe | 0 (0.0%) | 12 (0.3%) |
| Unknown | 84 (41.4%) | 2289 (59.3%) |
| Total | 203 (100.0%) | 3859 (100.0%) |
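
The percentages in the DASS21 tables use all participants (203 and 3859) as the denominator. The Included group has far fewer Unknown responses (12.8% vs 27.9% at 18 weeks; 41.4% vs 59.3% at 36 weeks), which by itself inflates its "Normal" percentage. Restricting to participants with a known score narrows the gap considerably, e.g. for Depression at 18 weeks from 79.8% vs 63.7% to 91.5% vs 88.5%. The tables do not state whether the Fisher's exact tests included the Unknown category. A short sketch that recomputes both versions from the counts above:

```r
library(tibble)
library(dplyr)

# Normal and Unknown counts copied from the six DASS21 tables
dass_counts <- tribble(
  ~scale,           ~group,     ~n_total, ~n_unknown, ~n_normal,
  "Depression 18w", "Included",  203,   26,  162,
  "Depression 18w", "Overall",  3859, 1078, 2460,
  "Anxiety 18w",    "Included",  203,   26,  154,
  "Anxiety 18w",    "Overall",  3859, 1078, 2224,
  "Stress 18w",     "Included",  203,   26,  161,
  "Stress 18w",     "Overall",  3859, 1078, 2447,
  "Depression 36w", "Included",  203,   84,  111,
  "Depression 36w", "Overall",  3859, 2289, 1409,
  "Anxiety 36w",    "Included",  203,   84,  104,
  "Anxiety 36w",    "Overall",  3859, 2289, 1310,
  "Stress 36w",     "Included",  203,   84,  115,
  "Stress 36w",     "Overall",  3859, 2289, 1403
)

dass_summary <- dass_counts %>%
  mutate(
    pct_unknown      = round(100 * n_unknown / n_total, 1),               # as in the tables
    pct_normal_all   = round(100 * n_normal / n_total, 1),                # as in the tables
    pct_normal_known = round(100 * n_normal / (n_total - n_unknown), 1)   # Unknown excluded
  )

dass_summary
```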

Plotting

# Create distribution plots for key continuous variables
plot_vars <- c("maternal_age_birth", "infant_weight", "bmi_1yr_calc", "bmi_3yr_calc", 
               "bmi_5yr_calc", "ferritin_1yr", "ferritin_3yr", "ferritin_5yr", "dass21_18w_normal_count", 
               "dass21_36w_normal_count")

for(var in plot_vars) {
  if(var %in% names(dat) && is.numeric(dat[[var]]) && !all(is.na(dat[[var]]))) {
    
    # Create a more descriptive title
    plot_title <- case_when(
      var == "maternal_age_birth" ~ "Maternal Age at Birth",
      var == "infant_weight" ~ "Infant Weight at Birth",
      var == "bmi_1yr_calc" ~ "BMI at 1 Year",
      var == "bmi_3yr_calc" ~ "BMI at 3 Years", 
      var == "bmi_5yr_calc" ~ "BMI at 5 Years",
      var == "ferritin_1yr" ~ "Ferritin Levels at 1 Year",
      var == "ferritin_3yr" ~ "Ferritin Levels at 3 Years",
      var == "ferritin_5yr" ~ "Ferritin Levels at 5 Years",
      var == "dass21_18w_normal_count" ~ "DASS Normal Domains at 18 Weeks",
      var == "dass21_36w_normal_count" ~ "DASS Normal Domains at 36 Weeks",
      TRUE ~ str_replace_all(var, "_", " ")
    )
    
    # One row per participant, labelled by cohort. Rows with a missing
    # microbiome_dataset are dropped here; previously they were labelled
    # "Overall" and then counted a second time by overall_data below.
    plot_data <- dat %>%
      filter(!is.na(.data[[var]]), !is.na(microbiome_dataset)) %>%
      mutate(group_type = as.character(microbiome_dataset))
    
    # Every participant again, labelled "Overall"
    overall_data <- dat %>%
      filter(!is.na(.data[[var]])) %>%
      mutate(group_type = "Overall")
    
    # Combine data for the three-panel (Excluded / Included / Overall) plot
    combined_data <- bind_rows(plot_data, overall_data) %>%
      mutate(
        group_type = factor(group_type, levels = c("Excluded", "Included", "Overall"))
      )
    
    p <- combined_data %>%
      ggplot(aes(x = .data[[var]])) +
      geom_histogram(aes(fill = group_type), alpha = 0.7, bins = 30) +
      facet_wrap(~group_type, scales = "free_y", ncol = 3) +
      theme_minimal() +
      labs(
        title = paste("Distribution of", plot_title),
        x = plot_title,
        y = "Count"
      ) +
      theme(
        legend.position = "none",
        plot.title = element_text(size = 14, face = "bold"),
        strip.text = element_text(size = 12, face = "bold")
      ) +
      scale_fill_manual(values = c("Excluded" = "#E74C3C", "Included" = "#3498DB", "Overall" = "#2C3E50"))
    
    print(p)
  }
}

# Create a comparison plot for ferritin levels across time points
ferritin_data <- dat %>%
  select(microbiome_dataset, ferritin_1yr, ferritin_3yr, ferritin_5yr) %>%
  pivot_longer(cols = c(ferritin_1yr, ferritin_3yr, ferritin_5yr), 
               names_to = "time_point", 
               values_to = "ferritin_level") %>%
  filter(!is.na(ferritin_level)) %>%
  mutate(
    time_point = case_when(
      time_point == "ferritin_1yr" ~ "1 Year",
      time_point == "ferritin_3yr" ~ "3 Years",
      time_point == "ferritin_5yr" ~ "5 Years",
      TRUE ~ time_point
    ),
    time_point = factor(time_point, levels = c("1 Year", "3 Years", "5 Years"))
  )

if(nrow(ferritin_data) > 0) {
  p_ferritin <- ferritin_data %>%
    ggplot(aes(x = ferritin_level, fill = microbiome_dataset)) +
    geom_histogram(alpha = 0.7, position = "identity", bins = 25) +
    facet_grid(microbiome_dataset ~ time_point, scales = "free") +
    theme_minimal() +
    labs(
      title = "Ferritin Levels Comparison Across Time Points",
      x = "Ferritin Level",
      y = "Count"
    ) +
    theme(
      legend.position = "none",
      plot.title = element_text(size = 14, face = "bold"),
      strip.text = element_text(size = 11, face = "bold")
    ) +
    scale_fill_manual(values = c("Excluded" = "#E74C3C", "Included" = "#3498DB"))
  
  print(p_ferritin)
}

# Create a comparison plot for DASS normal domains across time points
dass_data <- dat %>%
  select(microbiome_dataset, dass21_18w_normal_count, dass21_36w_normal_count) %>%
  pivot_longer(cols = c(dass21_18w_normal_count, dass21_36w_normal_count), 
               names_to = "time_point", 
               values_to = "normal_count") %>%
  filter(!is.na(normal_count)) %>%
  mutate(
    time_point = case_when(
      time_point == "dass21_18w_normal_count" ~ "18 Weeks",
      time_point == "dass21_36w_normal_count" ~ "36 Weeks", 
      TRUE ~ time_point
    )
  )

if(nrow(dass_data) > 0) {
  p_dass <- dass_data %>%
    ggplot(aes(x = normal_count, fill = microbiome_dataset)) +
    geom_bar(alpha = 0.7, position = "dodge") +
    facet_grid(microbiome_dataset ~ time_point) +
    theme_minimal() +
    labs(
      title = "DASS Normal Domains Comparison Across Time Points",
      x = "Number of Normal DASS Domains (0-3)",
      y = "Count"
    ) +
    theme(
      legend.position = "none",
      plot.title = element_text(size = 14, face = "bold"),
      strip.text = element_text(size = 11, face = "bold")
    ) +
    scale_fill_manual(values = c("Excluded" = "#E74C3C", "Included" = "#3498DB")) +
    scale_x_continuous(breaks = 0:3)
  
  print(p_dass)
}

# Create a BMI trajectory comparison plot
bmi_data <- dat %>%
  select(microbiome_dataset, bmi_1yr_calc, bmi_3yr_calc, bmi_5yr_calc) %>%
  pivot_longer(cols = c(bmi_1yr_calc, bmi_3yr_calc, bmi_5yr_calc), 
               names_to = "time_point", 
               values_to = "bmi") %>%
  filter(!is.na(bmi)) %>%
  mutate(
    time_point = case_when(
      time_point == "bmi_1yr_calc" ~ "1 Year",
      time_point == "bmi_3yr_calc" ~ "3 Years",
      time_point == "bmi_5yr_calc" ~ "5 Years",
      TRUE ~ time_point
    ),
    time_point = factor(time_point, levels = c("1 Year", "3 Years", "5 Years"))
  )

if(nrow(bmi_data) > 0) {
  p_bmi <- bmi_data %>%
    ggplot(aes(x = bmi, fill = microbiome_dataset)) +
    geom_histogram(alpha = 0.7, position = "identity", bins = 25) +
    facet_grid(microbiome_dataset ~ time_point, scales = "free") +
    theme_minimal() +
    labs(
      title = "BMI Comparison Across Time Points",
      x = "BMI",
      y = "Count"
    ) +
    theme(
      legend.position = "none",
      plot.title = element_text(size = 14, face = "bold"),
      strip.text = element_text(size = 11, face = "bold")
    ) +
    scale_fill_manual(values = c("Excluded" = "#E74C3C", "Included" = "#3498DB"))
  
  print(p_bmi)
}
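
The ferritin, DASS, and BMI blocks above repeat the same pivot, relabel, and facet pattern. Below is a minimal sketch of a shared helper that could replace them. `plot_by_timepoint()` and its arguments are hypothetical names introduced here, and the sketch assumes `microbiome_dataset` takes the values "Included" and "Excluded", as in the fill scales above.

```r
library(dplyr)
library(tidyr)
library(ggplot2)

# Hypothetical helper: pivot `cols` to long format, relabel them via the named
# vector `labels` (column name -> panel label) and facet by cohort x time point.
plot_by_timepoint <- function(data, cols, labels, x_lab, title,
                              geom = c("histogram", "bar"), bins = 25,
                              x_breaks = NULL) {
  geom <- match.arg(geom)

  long <- data %>%
    select(microbiome_dataset, all_of(cols)) %>%
    pivot_longer(all_of(cols), names_to = "time_point", values_to = "value") %>%
    filter(!is.na(value), !is.na(microbiome_dataset)) %>%
    mutate(time_point = factor(unname(labels[time_point]), levels = unname(labels)))

  if (nrow(long) == 0) return(NULL)

  p <- ggplot(long, aes(x = value, fill = microbiome_dataset))
  if (geom == "histogram") {
    p <- p +
      geom_histogram(alpha = 0.7, position = "identity", bins = bins) +
      facet_grid(microbiome_dataset ~ time_point, scales = "free")
  } else {
    # Discrete counts, e.g. the number of normal DASS domains
    p <- p +
      geom_bar(alpha = 0.7) +
      facet_grid(microbiome_dataset ~ time_point)
  }
  if (!is.null(x_breaks)) p <- p + scale_x_continuous(breaks = x_breaks)

  p +
    theme_minimal() +
    labs(title = title, x = x_lab, y = "Count") +
    theme(
      legend.position = "none",
      plot.title = element_text(size = 14, face = "bold"),
      strip.text = element_text(size = 11, face = "bold")
    ) +
    scale_fill_manual(values = c("Excluded" = "#E74C3C", "Included" = "#3498DB"))
}

# Example call with the report's column names:
# print(plot_by_timepoint(
#   dat, c("bmi_1yr_calc", "bmi_3yr_calc", "bmi_5yr_calc"),
#   c(bmi_1yr_calc = "1 Year", bmi_3yr_calc = "3 Years", bmi_5yr_calc = "5 Years"),
#   x_lab = "BMI", title = "BMI Comparison Across Time Points"
# ))
```

The DASS plot would use `geom = "bar", x_breaks = 0:3`. Returning `NULL` when no rows remain mirrors the `if(nrow(...) > 0)` guards in the original blocks.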

Reproducible Research Information

This document was prepared in R using the RStudio IDE and written in R Markdown. The session information below records the R version, platform, and package versions used.

sessionInfo()
## R version 4.3.1 (2023-06-16 ucrt)
## Platform: x86_64-w64-mingw32/x64 (64-bit)
## Running under: Windows 10 x64 (build 19045)
## 
## Matrix products: default
## 
## 
## locale:
## [1] LC_COLLATE=English_Australia.utf8  LC_CTYPE=English_Australia.utf8   
## [3] LC_MONETARY=English_Australia.utf8 LC_NUMERIC=C                      
## [5] LC_TIME=English_Australia.utf8    
## 
## time zone: Australia/Perth
## tzcode source: internal
## 
## attached base packages:
## [1] stats     graphics  grDevices utils     datasets  methods   base     
## 
## other attached packages:
##  [1] patchwork_1.2.0  gtsummary_2.3.0  kableExtra_1.4.0 knitr_1.48      
##  [5] readxl_1.4.3     lubridate_1.9.3  forcats_1.0.0    stringr_1.5.1   
##  [9] dplyr_1.1.4      purrr_1.0.2      readr_2.1.5      tidyr_1.3.1     
## [13] tibble_3.2.1     ggplot2_3.5.1    tidyverse_2.0.0 
## 
## loaded via a namespace (and not attached):
##  [1] sass_0.4.9        utf8_1.2.4        generics_0.1.3    xml2_1.3.6       
##  [5] stringi_1.8.4     hms_1.1.3         digest_0.6.37     magrittr_2.0.3   
##  [9] evaluate_0.24.0   grid_4.3.1        timechange_0.3.0  fastmap_1.2.0    
## [13] cellranger_1.1.0  jsonlite_1.8.8    fansi_1.0.6       viridisLite_0.4.2
## [17] scales_1.3.0      jquerylib_0.1.4   cli_3.6.3         rlang_1.1.4      
## [21] munsell_0.5.1     withr_3.0.1       cachem_1.1.0      yaml_2.3.10      
## [25] tools_4.3.1       tzdb_0.4.0        colorspace_2.1-1  vctrs_0.6.5      
## [29] R6_2.5.1          lifecycle_1.0.4   pkgconfig_2.0.3   pillar_1.9.0     
## [33] bslib_0.8.0       gtable_0.3.5      glue_1.8.0        systemfonts_1.2.2
## [37] highr_0.11        xfun_0.52         tidyselect_1.2.1  rstudioapi_0.16.0
## [41] farver_2.1.2      htmltools_0.5.8.1 labeling_0.4.3    rmarkdown_2.28   
## [45] svglite_2.1.3     compiler_4.3.1