# Load required libraries
library(tidyverse)
library(readxl)
library(knitr)
library(kableExtra)
library(gtsummary)
library(patchwork)
## Original microbiome_dataset values:
##
## No Yes <NA>
## 3656 203 0
## Data loaded successfully!
## Total participants: 3859
## Group distribution:
##
## Excluded Included <NA>
## 3656 203 0
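The two tabulations above suggest that a Yes/No flag was recoded into the Included/Excluded groups. The recoding step itself is not shown, so the following is only a minimal sketch of how it could be done in base R. The column name `microbiome_dataset` is taken from the output above; the `group` column and the toy data frame are assumptions for illustration.

```r
# Toy data standing in for the real spreadsheet (values are invented).
dat_example <- data.frame(
  microbiome_dataset = c("Yes", "No", "No", "Yes", "No")
)

# Map the original Yes/No flag onto the group labels used in the report.
# Missing flags stay NA rather than being silently assigned to a group.
dat_example$group <- factor(
  ifelse(dat_example$microbiome_dataset == "Yes", "Included", "Excluded"),
  levels = c("Excluded", "Included")
)

# useNA = "always" keeps the <NA> column visible, as in the output above.
print(table(dat_example$group, useNA = "always"))
```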
Overview
This report assesses how well the selected Microbiome dataset cohort
represents the overall cohort (hereafter the “Microbiome” and “overall”
cohorts).
Tables and plots summarise the frequency and distribution of each
variable.
Representativeness is assessed against the demographic/baseline variable
set:
- Outlined in the “Variable Category” column of the data dictionary
sheet in the data set.
Summary of Results
We find the distributions for most “primary” variables broadly
similar between the Microbiome and overall cohorts, with some notable
differences:
Primary Variables:
- Maternal age at birth differs significantly (p<0.001), with the
Microbiome cohort being slightly older (mean 33.0 vs 31.8 years)
- Maternal asthma shows a significant difference (p=0.037), with lower
prevalence in the Microbiome cohort (3.9% vs 8.1%)
- DASS domains show some differences, with the Microbiome cohort having
slightly more “Normal” domains at 36 weeks (p=0.042)
- Other demographic variables (gender, birth weight/length, BMI,
diabetes, etc.) are well-balanced
Sample Availability:
- The Microbiome cohort shows significantly higher engagement across all
questionnaire assignment and completion counts (all p<0.001)
- Much higher completion rates for ASQ questionnaires at all time points
- Greater participation in follow-up assessments
Key Differences:
- Twin births are over-represented in the Microbiome cohort (8.9% vs
3.2%, p<0.001)
- Child age differs significantly, with Microbiome children being older
(mean 4.5 vs 3.8 years, p<0.001)
- BMI at 3 years is significantly lower in the Microbiome cohort (15.5
vs 15.9, p=0.004)
Data & Methods
- DEIDENTIFIED Full Participant list Perron - Final dataset
July 2025.xlsx
- Contains data and data dictionary.
- After some cleaning and the assignment of variable names, we get the
following dimensions (rows, columns):
dim(dat)
## [1] 3859 86
We assess the similarity of each variable between the Microbiome cohort
(N = 203) and the overall cohort (N = 3859, which includes the 203
Microbiome participants) using:
- Summary tabulations
- Distributional plots
- Simple statistical tests
- Note: the p-values presented should be interpreted with caution. With
large sample sizes, differences that are too small to matter in practice
can still return “significant” p-values.
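One common way to act on the note about large samples is to report an effect size next to each p-value. The sketch below computes a standardised mean difference (Cohen's d with a pooled standard deviation) in base R. The function name `smd` and the simulated values are illustrative assumptions; this is not part of the report's own analysis and the numbers are not study data.

```r
# Standardised mean difference (Cohen's d with a pooled SD).
# Missing values are dropped before the calculation.
smd <- function(x, y) {
  x <- x[!is.na(x)]
  y <- y[!is.na(y)]
  pooled_sd <- sqrt(
    ((length(x) - 1) * var(x) + (length(y) - 1) * var(y)) /
      (length(x) + length(y) - 2)
  )
  (mean(x) - mean(y)) / pooled_sd
}

# Simulated values only (the means and SDs here are invented).
set.seed(1)
sub_cohort <- rnorm(200, mean = 33, sd = 4.5)
rest <- rnorm(3600, mean = 31.8, sd = 5)

# Effect size and p-value side by side: a small p-value can coexist
# with a modest standardised difference when the samples are large.
print(smd(sub_cohort, rest))
print(wilcox.test(sub_cohort, rest)$p.value)
```

By convention, absolute values of d around 0.2 are usually described as small, which gives a scale for judging whether a "significant" difference is practically meaningful.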
Variable Breakdown
There are ~79 candidate variables that can be used to assess the
similarity between the Microbiome and overall cohorts.
- These variables are broadly “sample availability”, “child/maternal
demographic” and “child/maternal characteristics”.
First, we select a handful of candidate variables to get an overall
“snapshot” of the similarity between the sub-cohort and the overall
cohort.
- For example, we want to ensure the sub-cohort is not entirely female,
born in a single year, of a single ethnic origin, etc.
“Primary” variables (amended to include the 19 variables outlined)
- Gender of child
- Maternal age at birth
- Maternal pre-pregnancy weight
- Maternal pre-pregnancy height
- Maternal pre-pregnancy BMI
- Infant weight
- Infant length
- Infant BMI at birth
- Infant ethnic origin
- Indigenous status of baby
- Vaginal or C-section birth
- Maternal gestational diabetes status
- Maternal Type 2 Diabetes
- Maternal mental health diagnosis (Depression, Anxiety disorder,
Bipolar, Schizophrenia, OCD, Anorexia Nervosa, Specific Phobias,
Behavioural Disorders)
  - Individual disorder breakdown: Each mental health condition is now
  analyzed separately, allowing participants with multiple conditions to
  be counted in each relevant category
  - Any mental health diagnosis: Overall indicator of any mental health
  condition
  - Depression: Depressive disorders
  - Anxiety: Anxiety disorders
  - Bipolar: Bipolar affective disorder
  - OCD: Obsessive-compulsive disorder (includes various spellings)
  - Anorexia: Anorexia nervosa
  - Behavioural: Behavioural disorders
- Maternal Asthma
- Number of 18wk DASS domains with “Severe” or “Extremely Severe”
- Number of 18wk DASS domains with “Normal”
- Number of 36wk DASS domains with “Severe” or “Extremely Severe”
- Number of 36wk DASS domains with “Normal”
Sample availability variables
- Availability of maternal/child urine/blood/stool samples (20 weeks, 2
months, 6 months, 12 months, 3 years)
- ASQ completion (4 month, 9 month, 1 year, 3 year, 5 year)
- Early Connors assigned and completed
- REDCap questionnaires assigned and completed
- Availability of MNS data
- Total questionnaires completed
Outcome variables
- 1yr child wheeze
- 1 year BMI
- 1 year Ferritin results
- 1 year count of positive SPT wheals
- 1 year any positive food SPT wheals
- 1 year any positive airborne/enviro SPT wheals
- 3 year count of positive SPT wheals
- 3 year any positive food SPT wheals
- 3 year any positive airborne/enviro SPT wheals
- 3 year BMI
- 3 year wheeze
- 3 year asthma
- 3 year ferritin
- 5 year BMI
- 5 year asthma
- 5 year Ferritin
- 5 year any positive food SPT wheals
- 5 year any positive airborne/enviro SPT wheals
- 3 year count of Connors domains equal to, or above 65
- 3 year count of other clinical indicators parent reported as “3”
highest
Remaining variables
- Variables not contained in the primary, sample availability or outcome
variable sets.
1) Primary Variables
Broadly, the distribution of the “primary” variable set is similar
between the Microbiome and overall cohorts.
Notes
- Child sex is well-balanced between groups (46.8% vs 47.9% female,
p=0.799)
- Maternal age at birth differs significantly, with the Microbiome
cohort being older (mean 33.0 vs 31.8 years, p<0.001)
- Maternal pre-pregnancy characteristics (weight, height, BMI) are
well-balanced between groups
- Infant birth characteristics (weight, length, BMI) show no significant
differences
- Ethnic origin and indigenous status are approximately balanced
- Birth type, maternal gestational diabetes, and Type 2 diabetes show no
significant differences
- Mental health diagnosis patterns:
  - Overall mental health diagnoses are similar between groups (10.8% vs
  14.5%, p=0.151)
  - Individual disorders show similar distributions across both cohorts
- Maternal asthma shows a significant difference (p=0.037), with lower
rates in the Microbiome cohort (3.9% vs 8.1%)
- DASS mental health scores:
  - 18-week assessments show the Microbiome cohort trends toward better
  mental health (more “Normal” scores, p=0.064)
  - 36-week assessments show significantly more “Normal” scores in the
  Microbiome cohort (p=0.042)
  - Counts of “Severe” or “Extremely Severe” domains do not differ
  significantly (p=0.171 at 18 weeks, p=0.285 at 36 weeks)
# Primary variables (continuous and categorical); each entry is list(column name, display title)
primary_continuous <- list(
list("gender_child", "Child sex assigned at birth"),
list("maternal_age_birth", "Maternal Age (Birth)"),
list("maternal_prepreg_weight", "Maternal pre-pregnancy weight"),
list("maternal_prepreg_height", "Maternal Pre-pregnancy height"),
list("maternal_prepreg_bmi_calc", "Maternal Pre-pregnancy BMI (Calc.)"),
list("infant_weight", "Infant Weight"),
list("infant_length", "Infant Length at birth"),
list("infant_bmi_calc", "Infant BMI at birth (Calc.)"),
list("ethnic_origin", "Ethnic Origin"),
list("indigenous_status", "Indigenous Status of Baby"),
list("birth_type_derived", "Vaginal or C section birth (Deriv.)"),
list("maternal_gest_diabetes_derived", "Maternal Gestational Diabetes? (Deriv.)"),
list("maternal_diabetes_t2_derived", "Maternal Type 2 Diabetes? (Deriv.)"), # UPDATED
list("mh_any", "Any Maternal Mental Health Diagnosis"),
list("mh_depression", "Maternal Depression"),
list("mh_anxiety", "Maternal Anxiety Disorder"),
list("mh_bipolar", "Maternal Bipolar Disorder"),
# REMOVED: list("mh_schizophrenia", "Maternal Schizophrenia"),
list("mh_ocd", "Maternal OCD"),
list("mh_anorexia", "Maternal Anorexia"),
# REMOVED: list("mh_phobias", "Maternal Specific Phobias"),
list("mh_behavioural", "Maternal Behavioural Disorders"),
list("maternal_asthma_derived", "Maternal Asthma? (Deriv.)"),
list("dass21_18w_severe_count", "Number of 18wk DASS domains with \"Severe\" or \"Extremely Severe\""),
list("dass21_18w_normal_count", "Number of 18wk DASS domains with \"Normal\""),
list("dass21_36w_severe_count", "Number of 36wk DASS domains with \"Severe\" or \"Extremely Severe\""),
list("dass21_36w_normal_count", "Number of 36wk DASS domains with \"Normal\"")
)
for (var_info in primary_continuous) {
  var_name <- var_info[[1]]
  title <- var_info[[2]]
  if (var_name %in% numeric_vars) {
    analyze_continuous(dat, var_name, title)
  } else {
    analyze_categorical(dat, var_name, title)
  }
}
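The helper `analyze_continuous()` is called above but its definition is not shown. As a rough illustration of what such a helper might compute, the sketch below produces the same kind of summary (N, mean, min, max, median, quartiles, unknown count) and a Wilcoxon rank sum p-value. The names `summarise_continuous` and `analyze_continuous_sketch`, the `group` column, and the choice to test Included against Excluded are all assumptions; the report does not state which comparison its p-values use.

```r
# Summary statistics for one numeric vector, ignoring missing values.
summarise_continuous <- function(x) {
  known <- x[!is.na(x)]
  q <- quantile(known, c(0.25, 0.5, 0.75), names = FALSE)
  data.frame(
    N = length(known),
    Mean = round(mean(known), 1),
    Min = min(known),
    Max = max(known),
    Median = q[2],
    Q1 = q[1],
    Q3 = q[3],
    Unknown = sum(is.na(x))
  )
}

# Hypothetical stand-in for the report's helper (assumed behaviour).
analyze_continuous_sketch <- function(data, var_name, title,
                                      group_col = "group") {
  included <- data[[var_name]][data[[group_col]] %in% "Included"]
  excluded <- data[[var_name]][data[[group_col]] %in% "Excluded"]
  # exact = FALSE avoids warnings when the data contain ties.
  p <- wilcox.test(included, excluded, exact = FALSE)$p.value
  out <- rbind(
    summarise_continuous(included),
    summarise_continuous(data[[var_name]])
  )
  out <- cbind(Characteristic = c("Included", "Overall"), out)
  cat(title, "\n")
  cat("P-value:", format.pval(p, digits = 3, eps = 0.001),
      "(Wilcoxon rank sum test)\n")
  print(out)
  invisible(list(p_value = p, summary = out))
}

# Toy data (invented values) to show the call pattern.
toy <- data.frame(
  group = c(rep("Included", 4), rep("Excluded", 6)),
  maternal_age_birth = c(33, 35, NA, 31, 29, 30, 32, 28, NA, 34)
)
res <- analyze_continuous_sketch(toy, "maternal_age_birth",
                                 "Maternal Age (Birth)")
```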
Child sex assigned at birth
P-value: 0.799 (Chi-squared test)
| Characteristic | Included N = 203 | Overall N = 3859 |
|---|---|---|
| Female | 95 (46.8%) | 1849 (47.9%) |
| Male | 108 (53.2%) | 2010 (52.1%) |
| Total | 203 (100.0%) | 3859 (100.0%) |
Maternal Age (Birth)
P-value: <0.001 (Wilcoxon rank sum test)
| Characteristic | N | Mean (Min, Max) | Median (Q1, Q3) |
|---|---|---|---|
| Included | 203 | 33 (24, 44) | 32 (30, 35) |
| Overall | 3859 | 31.8 (17, 50) | 32 (29, 35) |
Unknown: Included = 0 , Overall = 0
Maternal pre-pregnancy weight
P-value: 0.662 (Wilcoxon rank sum test)
| Characteristic | N | Mean (Min, Max) | Median (Q1, Q3) |
|---|---|---|---|
| Included | 191 | 69.6 (47, 128) | 67 (60, 75) |
| Overall | 3633 | 70.5 (38, 134) | 68 (60, 79) |
Unknown: Included = 12 , Overall = 226
Maternal Pre-pregnancy height
P-value: 0.595 (Wilcoxon rank sum test)
| Characteristic | N | Mean (Min, Max) | Median (Q1, Q3) |
|---|---|---|---|
| Included | 194 | 1.7 (1.5, 1.8) | 1.7 (1.6, 1.7) |
| Overall | 3698 | 1.7 (1.4, 1.9) | 1.6 (1.6, 1.7) |
Unknown: Included = 9 , Overall = 161
Maternal Pre-pregnancy BMI (Calc.)
P-value: 0.587 (Wilcoxon rank sum test)
| Characteristic | N | Mean (Min, Max) | Median (Q1, Q3) |
|---|---|---|---|
| Included | 191 | 25.4 (17.3, 42.3) | 24.1 (21.9, 28.5) |
| Overall | 3623 | 25.7 (14.7, 47.3) | 24.7 (21.9, 28.6) |
Unknown: Included = 12 , Overall = 236
Infant Weight
P-value: 0.315 (Wilcoxon rank sum test)
| Characteristic | N | Mean (Min, Max) | Median (Q1, Q3) |
|---|---|---|---|
| Included | 186 | 3317.7 (1480, 4500) | 3355 (3071.2, 3652.5) |
| Overall | 3578 | 3369.2 (1095, 5410) | 3390 (3062.8, 3695) |
Unknown: Included = 17 , Overall = 281
Infant Length at birth
P-value: 0.096 (Wilcoxon rank sum test)
| Characteristic | N | Mean (Min, Max) | Median (Q1, Q3) |
|---|---|---|---|
| Included | 186 | 50 (39, 57) | 50 (48, 52) |
| Overall | 3574 | 50.3 (31, 60) | 50 (49, 52) |
Unknown: Included = 17 , Overall = 285
Infant BMI at birth (Calc.)
P-value: 0.917 (Wilcoxon rank sum test)
| Characteristic | N | Mean (Min, Max) | Median (Q1, Q3) |
|---|---|---|---|
| Included | 186 | 13.2 (7.6, 17.6) | 13.2 (12.3, 14.1) |
| Overall | 3574 | 13.3 (7.6, 32.3) | 13.2 (12.3, 14.2) |
Unknown: Included = 17 , Overall = 285
Ethnic Origin
P-value: 0.543 (Fisher’s exact test)
| Characteristic | Included N = 203 | Overall N = 3859 |
|---|---|---|
| Unknown | 17 (8.4%) | 285 (7.4%) |
| 1 | 158 (77.8%) | 2951 (76.5%) |
| 3 | 8 (3.9%) | 177 (4.6%) |
| 4 | 2 (1%) | 116 (3%) |
| 5 | 2 (1%) | 18 (0.5%) |
| 8 | 16 (7.9%) | 288 (7.5%) |
| 10 | 0 (0%) | 8 (0.2%) |
| 6 | 0 (0%) | 4 (0.1%) |
| 7 | 0 (0%) | 12 (0.3%) |
| Total | 203 (100.0%) | 3859 (100.0%) |
Indigenous Status of Baby
P-value: 1.000 (Fisher’s exact test)
| Characteristic | Included N = 203 | Overall N = 3859 |
|---|---|---|
| Unknown | 17 (8.4%) | 283 (7.3%) |
| 4 | 186 (91.6%) | 3563 (92.3%) |
| 1 | 0 (0%) | 12 (0.3%) |
| 2 | 0 (0%) | 1 (0%) |
| Total | 203 (100.0%) | 3859 (100.0%) |
Vaginal or C section birth (Deriv.)
P-value: 0.392 (Fisher’s exact test)
| Characteristic | Included N = 203 | Overall N = 3859 |
|---|---|---|
| Unknown | 17 (8.4%) | 281 (7.3%) |
| Caesarean Elective | 58 (28.6%) | 964 (25%) |
| Caesarean Emergency | 41 (20.2%) | 804 (20.8%) |
| Vaginal | 87 (42.9%) | 1810 (46.9%) |
| Total | 203 (100.0%) | 3859 (100.0%) |
Maternal Gestational Diabetes? (Deriv.)
P-value: 0.695 (Chi-squared test)
| Characteristic | Included N = 203 | Overall N = 3859 |
|---|---|---|
| Unknown | 17 (8.4%) | 281 (7.3%) |
| FALSE | 172 (84.7%) | 3271 (84.8%) |
| TRUE | 14 (6.9%) | 307 (8%) |
| Total | 203 (100.0%) | 3859 (100.0%) |
Maternal Type 2 Diabetes? (Deriv.)
P-value: 1.000 (Fisher’s exact test)
| Characteristic | Included N = 203 | Overall N = 3859 |
|---|---|---|
| Unknown | 17 (8.4%) | 281 (7.3%) |
| FALSE | 186 (91.6%) | 3572 (92.6%) |
| TRUE | 0 (0%) | 6 (0.2%) |
| Total | 203 (100.0%) | 3859 (100.0%) |
Any Maternal Mental Health Diagnosis
P-value: 0.151 (Chi-squared test)
| Characteristic | Included N = 203 | Overall N = 3859 |
|---|---|---|
| FALSE | 181 (89.2%) | 3298 (85.5%) |
| TRUE | 22 (10.8%) | 561 (14.5%) |
| Total | 203 (100.0%) | 3859 (100.0%) |
Maternal Depression
P-value: 0.239 (Chi-squared test)
| Characteristic | Included N = 203 | Overall N = 3859 |
|---|---|---|
| FALSE | 192 (94.6%) | 3557 (92.2%) |
| TRUE | 11 (5.4%) | 302 (7.8%) |
| Total | 203 (100.0%) | 3859 (100.0%) |
Maternal Anxiety Disorder
P-value: 0.278 (Chi-squared test)
| Characteristic | Included N = 203 | Overall N = 3859 |
|---|---|---|
| FALSE | 186 (91.6%) | 3437 (89.1%) |
| TRUE | 17 (8.4%) | 422 (10.9%) |
| Total | 203 (100.0%) | 3859 (100.0%) |
Maternal Bipolar Disorder
P-value: 1.000 (Fisher’s exact test)
| Characteristic | Included N = 203 | Overall N = 3859 |
|---|---|---|
| FALSE | 203 (100%) | 3845 (99.6%) |
| TRUE | 0 (0%) | 14 (0.4%) |
| Total | 203 (100.0%) | 3859 (100.0%) |
Maternal OCD
P-value: 1.000 (Fisher’s exact test)
| Characteristic | Included N = 203 | Overall N = 3859 |
|---|---|---|
| FALSE | 203 (100%) | 3855 (99.9%) |
| TRUE | 0 (0%) | 4 (0.1%) |
| Total | 203 (100.0%) | 3859 (100.0%) |
Maternal Anorexia
P-value: 1.000 (Fisher’s exact test)
| Characteristic | Included N = 203 | Overall N = 3859 |
|---|---|---|
| FALSE | 203 (100%) | 3843 (99.6%) |
| TRUE | 0 (0%) | 16 (0.4%) |
| Total | 203 (100.0%) | 3859 (100.0%) |
Maternal Behavioural Disorders
P-value: 1.000 (Fisher’s exact test)
| Characteristic | Included N = 203 | Overall N = 3859 |
|---|---|---|
| FALSE | 203 (100%) | 3849 (99.7%) |
| TRUE | 0 (0%) | 10 (0.3%) |
| Total | 203 (100.0%) | 3859 (100.0%) |
Maternal Asthma? (Deriv.)
P-value: 0.037 (Chi-squared test)
| Characteristic | Included N = 203 | Overall N = 3859 |
|---|---|---|
| Unknown | 17 (8.4%) | 281 (7.3%) |
| FALSE | 178 (87.7%) | 3264 (84.6%) |
| TRUE | 8 (3.9%) | 314 (8.1%) |
| Total | 203 (100.0%) | 3859 (100.0%) |
Number of 18wk DASS domains with “Severe” or “Extremely Severe”
P-value: 0.171 (Wilcoxon rank sum test)
| Characteristic | N | Mean (Min, Max) | Median (Q1, Q3) |
|---|---|---|---|
| Included | 177 | 0.1 (0, 2) | 0 (0, 0) |
| Overall | 2781 | 0.1 (0, 3) | 0 (0, 0) |
Unknown: Included = 26 , Overall = 1078
Number of 18wk DASS domains with “Normal”
P-value: 0.064 (Wilcoxon rank sum test)
| Characteristic | N | Mean (Min, Max) | Median (Q1, Q3) |
|---|---|---|---|
| Included | 177 | 2.7 (0, 3) | 3 (3, 3) |
| Overall | 2781 | 2.6 (0, 3) | 3 (2, 3) |
Unknown: Included = 26 , Overall = 1078
Number of 36wk DASS domains with “Severe” or “Extremely Severe”
P-value: 0.285 (Wilcoxon rank sum test)
| Characteristic | N | Mean (Min, Max) | Median (Q1, Q3) |
|---|---|---|---|
| Included | 119 | 0.1 (0, 3) | 0 (0, 0) |
| Overall | 1570 | 0.1 (0, 3) | 0 (0, 0) |
Unknown: Included = 84 , Overall = 2289
Number of 36wk DASS domains with “Normal”
P-value: 0.042 (Wilcoxon rank sum test)
| Characteristic | N | Mean (Min, Max) | Median (Q1, Q3) |
|---|---|---|---|
| Included | 119 | 2.8 (0, 3) | 3 (3, 3) |
| Overall | 1570 | 2.6 (0, 3) | 3 (3, 3) |
Unknown: Included = 84 , Overall = 2289
2) Sample Availability
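In general, questionnaire engagement in the Microbiome cohort is
substantially higher than in the overall cohort: every questionnaire
assignment and completion measure differs significantly (all p<0.001).
MNS data availability (p=0.633) and several of the ASQ paediatrician
review flags do not differ significantly.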
# Sample availability variables
sample_vars <- list(
list("mns_data_available", "MNS Data Available?"),
list("asq_assigned", "ASQ Questionnaires Assigned"),
list("asq_completed", "ASQ Questionnaires Completed"),
list("early_connors_assigned", "Early Connors Assigned"),
list("early_connors_completed", "Early Connors Completed"),
list("aes_assigned", "AES Questionnaires Assigned"),
list("aes_completed", "AES Questionnaires Completed"),
list("redcap_assigned", "RedCap Questionnaires Assigned"),
list("redcap_completed", "RedCap Questionnaires completed"),
list("questionnaires_total_completed", "Total Questionnaires Completed"),
list("asq_4m_completed", "ASQ 4 Month Completed"),
list("asq_4m_paed_review", "ASQ 4 Month Review with Paediatrician"),
list("asq_9m_completed", "ASQ 9 Month Completed"),
list("asq_9m_paed_review", "ASQ 9 Month Review with Paediatrician"),
list("asq_1yr_completed", "ASQ 1 Year Completed"),
list("asq_1yr_paed_review", "ASQ 1 Year Review with Paediatrician"),
list("asq_3yr_completed", "ASQ 3 Year Completed"),
list("asq_3yr_paed_review", "ASQ 3 Year Review with Paediatrician"),
list("asq_5yr_completed", "ASQ 5 Year Completed"),
list("asq_5yr_paed_review", "ASQ 5 Year Review with Paediatrician"),
list("asq_paed_review_count", "Number of times ASQ has prompted review with PAED")
)
for (var_info in sample_vars) {
  var_name <- var_info[[1]]
  title <- var_info[[2]]
  if (var_name %in% numeric_vars) {
    analyze_continuous(dat, var_name, title)
  } else {
    analyze_categorical(dat, var_name, title)
  }
}
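The categorical tables in this report are labelled with either a Chi-squared test or Fisher's exact test, but the rule used to choose between them is not stated. The sketch below uses a common rule of thumb (switch to Fisher's exact test when any expected cell count is below 5); this rule, the function name `choose_test`, and the derivation of "Excluded" counts by subtracting Included from Overall are assumptions, not the report's documented method. Counts are taken from the maternal diabetes tables in section 1, with "Unknown" rows left out.

```r
# Pick a test for a contingency table using the expected-count rule of thumb.
choose_test <- function(tab) {
  expected <- suppressWarnings(chisq.test(tab)$expected)
  if (any(expected < 5)) {
    list(name = "Fisher's exact test",
         p_value = fisher.test(tab)$p.value)
  } else {
    list(name = "Chi-squared test",
         p_value = chisq.test(tab)$p.value)
  }
}

# Gestational diabetes: Included 172 FALSE / 14 TRUE; Overall 3271 / 307.
gest_diabetes <- matrix(
  c(172, 14,
    3271 - 172, 307 - 14),
  nrow = 2, byrow = TRUE,
  dimnames = list(group = c("Included", "Excluded"),
                  status = c("FALSE", "TRUE"))
)

# Type 2 diabetes: Included 186 FALSE / 0 TRUE; Overall 3572 / 6.
t2_diabetes <- matrix(
  c(186, 0,
    3572 - 186, 6 - 0),
  nrow = 2, byrow = TRUE,
  dimnames = list(group = c("Included", "Excluded"),
                  status = c("FALSE", "TRUE"))
)

gest_result <- choose_test(gest_diabetes)
t2_result <- choose_test(t2_diabetes)
print(gest_result$name)
print(t2_result$name)
```

Under this rule the sketch selects the same test type that the report shows for these two variables, although other variables in the report may have been handled by a different criterion.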
MNS Data Available?
P-value: 0.633 (Chi-squared test)
| Characteristic | Included N = 203 | Overall N = 3859 |
|---|---|---|
| FALSE | 17 (8.4%) | 281 (7.3%) |
| TRUE | 186 (91.6%) | 3578 (92.7%) |
| Total | 203 (100.0%) | 3859 (100.0%) |
ASQ Questionnaires Assigned
P-value: <0.001 (Wilcoxon rank sum test)
| Characteristic | N | Mean (Min, Max) | Median (Q1, Q3) |
|---|---|---|---|
| Included | 203 | 6 (3, 8) | 6 (4, 8) |
| Overall | 3591 | 5.1 (0, 8) | 5 (4, 6) |
Unknown: Included = 0 , Overall = 268
ASQ Questionnaires Completed
P-value: <0.001 (Wilcoxon rank sum test)
| Characteristic | N | Mean (Min, Max) | Median (Q1, Q3) |
|---|---|---|---|
| Included | 203 | 4.3 (0, 9) | 4 (3, 5) |
| Overall | 3591 | 2.6 (0, 9) | 2 (1, 4) |
Unknown: Included = 0 , Overall = 268
Early Connors Assigned
P-value: <0.001 (Wilcoxon rank sum test)
| Characteristic | N | Mean (Min, Max) | Median (Q1, Q3) |
|---|---|---|---|
| Included | 203 | 1.2 (0, 2) | 1 (0, 2) |
| Overall | 3591 | 0.8 (0, 2) | 1 (0, 1) |
Unknown: Included = 0 , Overall = 268
Early Connors Completed
P-value: <0.001 (Wilcoxon rank sum test)
| Characteristic | N | Mean (Min, Max) | Median (Q1, Q3) |
|---|---|---|---|
| Included | 203 | 0.8 (0, 2) | 1 (0, 1) |
| Overall | 3591 | 0.4 (0, 2) | 0 (0, 1) |
Unknown: Included = 0 , Overall = 268
AES Questionnaires Assigned
P-value: <0.001 (Wilcoxon rank sum test)
| Characteristic | N | Mean (Min, Max) | Median (Q1, Q3) |
|---|---|---|---|
| Included | 203 | 3.9 (1, 5) | 4 (4, 4) |
| Overall | 3591 | 3.4 (0, 5) | 3 (3, 4) |
Unknown: Included = 0 , Overall = 268
AES Questionnaires Completed
P-value: <0.001 (Wilcoxon rank sum test)
| Characteristic | N | Mean (Min, Max) | Median (Q1, Q3) |
|---|---|---|---|
| Included | 203 | 2 (0, 5) | 2 (1, 3) |
| Overall | 3591 | 1.1 (0, 5) | 1 (0, 2) |
Unknown: Included = 0 , Overall = 268
RedCap Questionnaires Assigned
P-value: <0.001 (Wilcoxon rank sum test)
| Characteristic | N | Mean (Min, Max) | Median (Q1, Q3) |
|---|---|---|---|
| Included | 203 | 5.9 (2, 7) | 6 (5, 7) |
| Overall | 3591 | 5.1 (1, 7) | 5 (4, 7) |
Unknown: Included = 0 , Overall = 268
RedCap Questionnaires completed
P-value: <0.001 (Wilcoxon rank sum test)
| Characteristic | N | Mean (Min, Max) | Median (Q1, Q3) |
|---|---|---|---|
| Included | 203 | 4.3 (0, 7) | 5 (3, 6) |
| Overall | 3591 | 2.8 (0, 7) | 3 (1, 4) |
Unknown: Included = 0 , Overall = 268
Total Questionnaires Completed
P-value: <0.001 (Wilcoxon rank sum test)
| Characteristic | N | Mean (Min, Max) | Median (Q1, Q3) |
|---|---|---|---|
| Included | 203 | 11.3 (0, 22) | 11 (9, 14) |
| Overall | 3859 | 6.3 (0, 22) | 6 (2, 10) |
Unknown: Included = 0 , Overall = 0
ASQ 4 Month Completed
P-value: <0.001 (Chi-squared test)
| Characteristic | Included N = 203 | Overall N = 3859 |
|---|---|---|
| FALSE | 58 (28.6%) | 2004 (51.9%) |
| TRUE | 145 (71.4%) | 1855 (48.1%) |
| Total | 203 (100.0%) | 3859 (100.0%) |
ASQ 4 Month Review with Paediatrician
P-value: 0.074 (Chi-squared test)
| Characteristic | Included N = 203 | Overall N = 3859 |
|---|---|---|
| Unknown | 58 (28.6%) | 2004 (51.9%) |
| FALSE | 80 (39.4%) | 885 (22.9%) |
| TRUE | 65 (32%) | 970 (25.1%) |
| Total | 203 (100.0%) | 3859 (100.0%) |
ASQ 9 Month Completed
P-value: <0.001 (Chi-squared test)
| Characteristic | Included N = 203 | Overall N = 3859 |
|---|---|---|
| FALSE | 49 (24.1%) | 2141 (55.5%) |
| TRUE | 154 (75.9%) | 1718 (44.5%) |
| Total | 203 (100.0%) | 3859 (100.0%) |
ASQ 9 Month Review with Paediatrician
P-value: <0.001 (Chi-squared test)
| Characteristic | Included N = 203 | Overall N = 3859 |
|---|---|---|
| Unknown | 49 (24.1%) | 2141 (55.5%) |
| FALSE | 94 (46.3%) | 807 (20.9%) |
| TRUE | 60 (29.6%) | 911 (23.6%) |
| Total | 203 (100.0%) | 3859 (100.0%) |
ASQ 1 Year Completed
P-value: <0.001 (Chi-squared test)
| Characteristic | Included N = 203 | Overall N = 3859 |
|---|---|---|
| FALSE | 19 (9.4%) | 1327 (34.4%) |
| TRUE | 184 (90.6%) | 2532 (65.6%) |
| Total | 203 (100.0%) | 3859 (100.0%) |
ASQ 1 Year Review with Paediatrician
P-value: 0.017 (Chi-squared test)
| Characteristic | Included N = 203 | Overall N = 3859 |
|---|---|---|
| Unknown | 19 (9.4%) | 1327 (34.4%) |
| FALSE | 110 (54.2%) | 1293 (33.5%) |
| TRUE | 74 (36.5%) | 1239 (32.1%) |
| Total | 203 (100.0%) | 3859 (100.0%) |
ASQ 3 Year Completed
P-value: <0.001 (Chi-squared test)
| Characteristic | Included N = 203 | Overall N = 3859 |
|---|---|---|
| FALSE | 84 (41.4%) | 2684 (69.6%) |
| TRUE | 119 (58.6%) | 1175 (30.4%) |
| Total | 203 (100.0%) | 3859 (100.0%) |
ASQ 3 Year Review with Paediatrician
P-value: 1.000 (Chi-squared test)
| Characteristic | Included N = 203 | Overall N = 3859 |
|---|---|---|
| Unknown | 84 (41.4%) | 2684 (69.6%) |
| FALSE | 46 (22.7%) | 450 (11.7%) |
| TRUE | 73 (36%) | 725 (18.8%) |
| Total | 203 (100.0%) | 3859 (100.0%) |
ASQ 5 Year Completed
P-value: <0.001 (Chi-squared test)
| Characteristic | Included N = 203 | Overall N = 3859 |
|---|---|---|
| FALSE | 164 (80.8%) | 3611 (93.6%) |
| TRUE | 39 (19.2%) | 248 (6.4%) |
| Total | 203 (100.0%) | 3859 (100.0%) |
ASQ 5 Year Review with Paediatrician
P-value: 1.000 (Chi-squared test)
| Characteristic | Included N = 203 | Overall N = 3859 |
|---|---|---|
| Unknown | 164 (80.8%) | 3611 (93.6%) |
| FALSE | 11 (5.4%) | 72 (1.9%) |
| TRUE | 28 (13.8%) | 176 (4.6%) |
| Total | 203 (100.0%) | 3859 (100.0%) |
Number of times ASQ has prompted review with PAED
P-value: 0.020 (Wilcoxon rank sum test)
| Characteristic | N | Mean (Min, Max) | Median (Q1, Q3) |
|---|---|---|---|
| Included | 200 | 1.5 (0, 4) | 1 (1, 2) |
| Overall | 3014 | 1.3 (0, 4) | 1 (1, 2) |
Unknown: Included = 3 , Overall = 845
3) Outcome Variables
Notes
- Child wheeze at 1 year: No significant difference between groups
(15.8% vs 14.0% of all participants, p=0.152)
- BMI measurements:
  - 1-year BMI shows no significant difference (p=0.121)
  - 3-year BMI is significantly lower in the Microbiome cohort (15.5 vs
  15.9, p=0.004)
  - 5-year BMI shows no significant difference (p=0.122)
- Wheeze and asthma outcomes: No significant differences at any time
point (1, 3, or 5 years)
- Skin prick test (SPT) results:
  - Generally similar patterns between groups
  - Slight trend toward fewer positive results in the Microbiome cohort
  at 3 years (p=0.076)
  - No significant differences in food or airborne allergies at any time
  point
- Ferritin levels: No significant differences at 1 year (p=0.998), 3
years (p=0.703), or 5 years (p=0.832)
- Behavioural assessments:
  - ASQ paediatric review rates show significant differences at 9 months
  (p<0.001) and 1 year (p=0.017)
  - Connors domain scores show no significant difference (p=0.937); data
  for the other clinical indicators were not available
- Follow-up participation: The Microbiome cohort shows much higher
engagement in data collection
- Overall assessment: Most clinical outcome variables demonstrate good
representativeness, with the main difference being consistently higher
study engagement in the Microbiome cohort
# Outcome variables
outcome_vars <- list(
list("child_wheeze_1yr", "Has your child ever wheezed at 1 Year?"),
list("bmi_1yr_calc", "BMI at 1 Year (Calc.)"),
list("ferritin_1yr", "Ferritin Results at 1 Year"),
list("spt_positive_count_1yr", "Count of positive SPT wheals(>= 3MM WHEAL) at 1 Year"),
list("spt_food_positive_1yr", "Any positive Food SPT wheals (>=3mm) at 1 Year"),
list("spt_airborne_positive_1yr", "Any positive airborne/enviro SPT wheals (>=3mm) at 1 Year"),
list("spt_positive_count_3yr", "Count of positive SPT wheals(>= 3MM WHEAL) at 3 Years"),
list("spt_food_positive_3yr", "Any positive Food SPT wheals (>=3mm) at 3 Years"),
list("spt_airborne_positive_3yr", "Any positive airborne/enviro SPT wheals (>=3mm) at 3 Years"),
list("bmi_3yr_calc", "BMI at 3 Years (Calc.)"),
list("wheeze_3yr", "3 year wheeze"),
list("asthma_3yr", "3 year asthma"),
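# NOTE: this column name does not follow the ferritin_1yr / ferritin_5yr pattern; confirm it holds the 3-year ferritin values
list("followup_3yr", "3yr ferritin"),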
list("bmi_5yr_calc", "BMI at 5 Years (Calc.)"),
list("asthma_5yr", "5 year asthma"),
list("ferritin_5yr", "5yr ferritin"),
list("spt_positive_count_5yr", "Count of positive SPT wheals (>=3mm) at 5 Years"),
list("spt_food_positive_5yr", "Any positive Food SPT wheals (>=3mm) at 5 Years"),
list("spt_airborne_positive_5yr", "Any positive airborne/enviro SPT wheals (>=3mm) at 5 Years"),
list("connors_domains_above65_3yr", "Count of Connors domains equal to, or above 65 at 3 Years"),
list("clinical_indicators_highest_3yr", "Count of other clinical indicators parent reported as \"3\" highest at 3 Years")
)
for (var_info in outcome_vars) {
  var_name <- var_info[[1]]
  title <- var_info[[2]]
  if (var_name %in% numeric_vars) {
    analyze_continuous(dat, var_name, title)
  } else {
    analyze_categorical(dat, var_name, title)
  }
}
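The same dispatch loop appears for each variable set in this report. One possible way to avoid the repetition is a small wrapper that takes the variable list and the two analysis functions as arguments. The sketch below is only an illustration: `run_variable_set` is a hypothetical name, the analysis functions are replaced by recording stubs so the example runs on its own, and skipping missing columns is an assumed behaviour rather than something the report documents.

```r
# Hypothetical wrapper around the repeated dispatch loop.
run_variable_set <- function(data, var_list, numeric_vars,
                             continuous_fn, categorical_fn) {
  for (var_info in var_list) {
    var_name <- var_info[[1]]
    title <- var_info[[2]]
    # Assumed behaviour: skip columns that are absent from the data.
    if (!var_name %in% names(data)) {
      message("Skipping ", var_name, ": column not found")
      next
    }
    if (var_name %in% numeric_vars) {
      continuous_fn(data, var_name, title)
    } else {
      categorical_fn(data, var_name, title)
    }
  }
  invisible(NULL)
}

# Stubs that record which analysis would have been run for each variable.
log_env <- new.env()
log_env$calls <- character(0)
record <- function(kind) {
  function(data, var_name, title) {
    log_env$calls <- c(log_env$calls, paste(kind, var_name))
  }
}

toy <- data.frame(age = c(30, 32), sex = c("F", "M"))
toy_vars <- list(
  list("age", "Age"),
  list("sex", "Sex"),
  list("missing_col", "Not in the data")
)
run_variable_set(toy, toy_vars, numeric_vars = "age",
                 continuous_fn = record("continuous"),
                 categorical_fn = record("categorical"))
print(log_env$calls)
```

In the report's setting the call would pass `analyze_continuous` and `analyze_categorical` in place of the stubs.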
Has your child ever wheezed at 1 Year?
P-value: 0.152 (Chi-squared test)
| Characteristic | Included N = 203 | Overall N = 3859 |
|---|---|---|
| Unknown | 14 (6.9%) | 1315 (34.1%) |
| No | 157 (77.3%) | 2002 (51.9%) |
| Yes | 32 (15.8%) | 542 (14%) |
| Total | 203 (100.0%) | 3859 (100.0%) |
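The percentages in the 1-year wheeze table use all participants as the denominator, so they are affected by the very different "Unknown" rates in the two columns (6.9% vs 34.1%). As a worked check, the sketch below recomputes the "Yes" percentage among participants with a known response only. The counts are copied from the table; restricting to known responses is an illustrative choice here, not something the report states it does.

```r
# Counts copied from the 1-year wheeze table.
wheeze <- rbind(
  Included = c(Unknown = 14, No = 157, Yes = 32),
  Overall  = c(Unknown = 1315, No = 2002, Yes = 542)
)

# Percentage answering "Yes" out of all participants (as in the table).
pct_all <- round(100 * wheeze[, "Yes"] / rowSums(wheeze), 1)

# Percentage answering "Yes" among known responses only.
known <- wheeze[, c("No", "Yes")]
pct_known <- round(100 * known[, "Yes"] / rowSums(known), 1)

print(pct_all)    # Included 15.8, Overall 14.0
print(pct_known)  # Included 16.9, Overall 21.3
```

Among known responses the direction of the difference reverses (16.9% vs 21.3%), which is worth keeping in mind when reading any table in this report that has a large "Unknown" row.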
BMI at 1 Year (Calc.)
P-value: 0.121 (Wilcoxon rank sum test)
| Characteristic | N | Mean (Min, Max) | Median (Q1, Q3) |
|---|---|---|---|
| Included | 154 | 16.8 (13.8, 21.6) | 16.8 (15.9, 17.6) |
| Overall | 1832 | 17 (10.2, 26.1) | 16.9 (16, 18) |
Unknown: Included = 49 , Overall = 2027
Ferritin Results at 1 Year
P-value: 0.998 (Wilcoxon rank sum test)
| Characteristic | N | Mean (Min, Max) | Median (Q1, Q3) |
|---|---|---|---|
| Included | 80 | 32.5 (5, 367) | 25 (17, 36.5) |
| Overall | 542 | 32.1 (5, 871) | 25 (16, 37) |
Unknown: Included = 123 , Overall = 3317
Count of positive SPT wheals(>= 3MM WHEAL) at 1 Year
P-value: 0.913 (Wilcoxon rank sum test)
| Characteristic | N | Mean (Min, Max) | Median (Q1, Q3) |
|---|---|---|---|
| Included | 200 | 0.2 (0, 5) | 0 (0, 0) |
| Overall | 2383 | 0.2 (0, 8) | 0 (0, 0) |
Unknown: Included = 3 , Overall = 1476
Any positive Food SPT wheals (>=3mm) at 1 Year
P-value: 0.758 (Chi-squared test)
| Characteristic | Included N = 203 | Overall N = 3859 |
|---|---|---|
| Unknown | 3 (1.5%) | 1476 (38.2%) |
| FALSE | 180 (88.7%) | 2165 (56.1%) |
| TRUE | 20 (9.9%) | 218 (5.6%) |
| Total | 203 (100.0%) | 3859 (100.0%) |
Any positive airborne/enviro SPT wheals (>=3mm) at 1 Year
P-value: 0.478 (Fisher’s exact test)
| Characteristic | Included N = 203 | Overall N = 3859 |
|---|---|---|
| Unknown | 3 (1.5%) | 1476 (38.2%) |
| FALSE | 197 (97%) | 2325 (60.2%) |
| TRUE | 3 (1.5%) | 58 (1.5%) |
| Total | 203 (100.0%) | 3859 (100.0%) |
Count of positive SPT wheals(>= 3MM WHEAL) at 3 Years
P-value: 0.076 (Wilcoxon rank sum test)
| Characteristic | N | Mean (Min, Max) | Median (Q1, Q3) |
|---|---|---|---|
| Included | 111 | 0.1 (0, 3) | 0 (0, 0) |
| Overall | 1034 | 0.2 (0, 5) | 0 (0, 0) |
Unknown: Included = 92 , Overall = 2825
Any positive Food SPT wheals (>=3mm) at 3 Years
P-value: 1.000 (Fisher’s exact test)
| Characteristic | Included N = 203 | Overall N = 3859 |
|---|---|---|
| Unknown | 92 (45.3%) | 2825 (73.2%) |
| FALSE | 107 (52.7%) | 992 (25.7%) |
| TRUE | 4 (2%) | 42 (1.1%) |
| Total | 203 (100.0%) | 3859 (100.0%) |
Any positive airborne/enviro SPT wheals (>=3mm) at 3 Years
P-value: 0.053 (Chi-squared test)
| Characteristic | Included N = 203 | Overall N = 3859 |
|---|---|---|
| Unknown | 92 (45.3%) | 2825 (73.2%) |
| FALSE | 102 (50.2%) | 882 (22.9%) |
| TRUE | 9 (4.4%) | 152 (3.9%) |
| Total | 203 (100.0%) | 3859 (100.0%) |
BMI at 3 Years (Calc.)
P-value: 0.004 (Wilcoxon rank sum test)
| Characteristic | N | Mean (Min, Max) | Median (Q1, Q3) |
|---|---|---|---|
| Included | 98 | 15.5 (12.7, 19.3) | 15.4 (14.7, 16.4) |
| Overall | 1021 | 15.9 (11.3, 27.3) | 15.8 (15, 16.7) |
Unknown: Included = 105 , Overall = 2838
3 year wheeze
P-value: 0.615 (Chi-squared test)
| Characteristic | Included N = 203 | Overall N = 3859 |
|---|---|---|
| Unknown | 82 (40.4%) | 2597 (67.3%) |
| FALSE | 96 (47.3%) | 973 (25.2%) |
| TRUE | 25 (12.3%) | 289 (7.5%) |
| Total | 203 (100.0%) | 3859 (100.0%) |
3 year asthma
P-value: 0.267 (Fisher’s exact test)
| Characteristic | Included N = 203 | Overall N = 3859 |
|---|---|---|
| Unknown | 83 (40.9%) | 2603 (67.5%) |
| FALSE | 116 (57.1%) | 1233 (32%) |
| TRUE | 4 (2%) | 23 (0.6%) |
| Total | 203 (100.0%) | 3859 (100.0%) |
3yr ferritin
P-value: 0.703 (Wilcoxon rank sum test)
| Characteristic | N | Mean (Min, Max) | Median (Q1, Q3) |
|---|---|---|---|
| Included | 89 | 21 (7, 67) | 18 (14, 24) |
| Overall | 610 | 22.1 (5, 175) | 19 (14, 26) |
Unknown: Included = 114 , Overall = 3249
BMI at 5 Years (Calc.)
P-value: 0.122 (Wilcoxon rank sum test)
| Characteristic | N | Mean (Min, Max) | Median (Q1, Q3) |
|---|---|---|---|
| Included | 58 | 15.4 (12.6, 18.9) | 15.4 (14.7, 16) |
| Overall | 466 | 15.8 (11.5, 26.4) | 15.6 (14.9, 16.5) |
Unknown: Included = 145 , Overall = 3393
5 year asthma
P-value: 1.000 (Fisher’s exact test)
| Characteristic | Included N = 203 | Overall N = 3859 |
|---|---|---|
| Unknown | 140 (69%) | 3374 (87.4%) |
| FALSE | 60 (29.6%) | 461 (11.9%) |
| TRUE | 3 (1.5%) | 24 (0.6%) |
| Total | 203 (100.0%) | 3859 (100.0%) |
5yr ferritin
P-value: 0.832 (Wilcoxon rank sum test)
| Characteristic | N | Mean (Min, Max) | Median (Q1, Q3) |
|---|---|---|---|
| Included | 37 | 24.4 (7, 55) | 21 (17, 30) |
| Overall | 262 | 25.8 (7, 124) | 22 (17, 30) |
Unknown: Included = 166 , Overall = 3597
Count of positive SPT wheals (>=3mm) at 5 Years
P-value: 0.723 (Wilcoxon rank sum test)
| Characteristic | N | Mean (Min, Max) | Median (Q1, Q3) |
|---|---|---|---|
| Included | 61 | 1 (0, 6) | 0 (0, 2) |
| Overall | 414 | 1 (0, 10) | 0 (0, 2) |
Unknown: Included = 142 , Overall = 3445
Any positive Food SPT wheals (>=3mm) at 5 Years
P-value: 0.702 (Chi-squared test)
| Characteristic | Included N = 203 | Overall N = 3859 |
|---|---|---|
| Unknown | 142 (70%) | 3445 (89.3%) |
| FALSE | 56 (27.6%) | 388 (10.1%) |
| TRUE | 5 (2.5%) | 26 (0.7%) |
| Total | 203 (100.0%) | 3859 (100.0%) |
Any positive airborne/enviro SPT wheals (>=3mm) at 5 Years
P-value: 0.774 (Chi-squared test)
| Characteristic | Included N = 203 | Overall N = 3859 |
|---|---|---|
| Unknown | 142 (70%) | 3445 (89.3%) |
| FALSE | 38 (18.7%) | 268 (6.9%) |
| TRUE | 23 (11.3%) | 146 (3.8%) |
| Total | 203 (100.0%) | 3859 (100.0%) |
Count of Connors domains equal to, or above 65 at 3 Years
P-value: 0.937 (Wilcoxon rank sum test)
| Characteristic | N | Mean (Min, Max) | Median (Q1, Q3) |
|---|---|---|---|
| Included | 111 | 0.4 (0, 6) | 0 (0, 1) |
| Overall | 1062 | 0.4 (0, 7) | 0 (0, 1) |
Unknown: Included = 92 , Overall = 2797
Count of other clinical indicators parent reported as “3” highest at
3 Years
Data not available
4) Remaining Variables
All remaining variables not contained in (1), (2) or (3).
Key differences identified in remaining
variables:
- Twin births are significantly over-represented in
the Microbiome cohort (8.9% vs 3.2%, p<0.001)
- Current child age differs significantly, with
Microbiome children being older (mean 4.5 vs 3.8 years, p<0.001)
- Age at Peapod assessment differs significantly, with Microbiome
infants assessed later (mean 7.7 vs 5.1 days, p<0.001)
- DASS anxiety scores at 18 weeks show a significant
difference (p=0.044), with the Microbiome cohort having better anxiety
profiles
- Previous pregnancies and parity show no significant
differences
- BMI at Peapod and other DASS measures show no
significant differences
The over-representation of twins and older child age in the
Microbiome cohort suggests this sub-sample may represent families with
longer study engagement and more complex pregnancies.
# Remaining variables
remaining_vars <- list(
  list("singleton_twin", "Singleton or Twin"),
  list("current_age_march2024", "Current age of child (as of March 2024)"),
  list("previous_pregnancies", "Previous Pregnancies"),
  list("previous_pregnancies_parity", "Previous Pregnancies Parity"),
  list("bmi_peapod_calc", "BMI at Peapod (Calc.)"),
  list("age_days_peapod_calc", "Age (days) at Peapod (Calc.)"),
  list("dass21_18w_depression", "DASS21 Depression 18 Week"),
  list("dass21_18w_anxiety", "DASS21 Anxiety 18 Week"),
  list("dass21_18w_stress", "DASS21 Stress 18 Week"),
  list("dass21_36w_depression", "DASS21 Depression 36 Week"),
  list("dass21_36w_anxiety", "DASS21 Anxiety 36 Week"),
  list("dass21_36w_stress", "DASS21 Stress 36 Week")
)
# Dispatch each variable to the continuous or categorical summary
for (var_info in remaining_vars) {
  var_name <- var_info[[1]]
  title <- var_info[[2]]
  if (var_name %in% numeric_vars) {
    analyze_continuous(dat, var_name, title)
  } else {
    analyze_categorical(dat, var_name, title)
  }
}
Singleton or Twin
P-value: <0.001 (Chi-squared test)
| Characteristic | Included N = 203 | Overall N = 3859 |
|---|---|---|
| Singleton | 185 (91.1%) | 3735 (96.8%) |
| Twins | 18 (8.9%) | 124 (3.2%) |
| Total | 203 (100.0%) | 3859 (100.0%) |
Current age of child (as of March 2024)
P-value: <0.001 (Wilcoxon rank sum test)
| Characteristic | N | Mean (Min, Max) | Median (Q1, Q3) |
|---|---|---|---|
| Included | 203 | 4.5 (1.4, 7.2) | 4.8 (3.1, 5.9) |
| Overall | 3859 | 3.8 (0.6, 7.3) | 3.7 (2.4, 5.2) |

Unknown: Included = 0, Overall = 0
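For readers who want to see how a summary row and rank-sum test of this kind can be produced, here is a minimal base R sketch. It runs on simulated ages, not study data, and `summarise_numeric` is a hypothetical helper rather than the report's `analyze_continuous` function, whose definition is not shown in this section. The Included vs Excluded contrast used for the test is also an assumption.

```r
set.seed(1)

# Simulated ages in years (NOT study data); ranges loosely mirror the table
toy <- data.frame(
  group = rep(c("Included", "Excluded"), times = c(50, 200)),
  age   = c(runif(50, 1.4, 7.2), runif(200, 0.6, 7.3))
)

# Hypothetical helper: N, "Mean (Min, Max)" and "Median (Q1, Q3)" for one vector
summarise_numeric <- function(x) {
  x <- x[!is.na(x)]
  q <- quantile(x, probs = c(0.25, 0.5, 0.75), names = FALSE)
  data.frame(
    N            = length(x),
    mean_min_max = sprintf("%.1f (%.1f, %.1f)", mean(x), min(x), max(x)),
    median_q1_q3 = sprintf("%.1f (%.1f, %.1f)", q[2], q[1], q[3])
  )
}

# "Overall" uses every row, so it contains the Included participants as well
rows <- rbind(
  cbind(Group = "Included", summarise_numeric(toy$age[toy$group == "Included"])),
  cbind(Group = "Overall",  summarise_numeric(toy$age))
)
rows

# Wilcoxon rank sum test (assumed contrast: Included vs Excluded)
age_test <- wilcox.test(age ~ group, data = toy)
age_test$p.value
```

Because the rank-sum test compares distributions by rank, the medians and quartiles are usually the more informative columns to read alongside its p-value.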
Previous Pregnancies
P-value: 0.158 (Wilcoxon rank sum test)
| Characteristic | N | Mean (Min, Max) | Median (Q1, Q3) |
|---|---|---|---|
| Included | 186 | 1.3 (0, 6) | 1 (0, 2) |
| Overall | 3576 | 1.2 (0, 14) | 1 (0, 2) |

Unknown: Included = 17, Overall = 283
Previous Pregnancies Parity
P-value: 0.582 (Wilcoxon rank sum test)
| Characteristic | N | Mean (Min, Max) | Median (Q1, Q3) |
|---|---|---|---|
| Included | 186 | 0.7 (0, 4) | 1 (0, 1) |
| Overall | 3570 | 0.7 (0, 6) | 0 (0, 1) |

Unknown: Included = 17, Overall = 289
BMI at Peapod (Calc.)
P-value: 0.640 (Wilcoxon rank sum test)
| Characteristic | N | Mean (Min, Max) | Median (Q1, Q3) |
|---|---|---|---|
| Included | 167 | 12.8 (10, 17.1) | 12.6 (11.9, 13.5) |
| Overall | 2210 | 12.8 (8.5, 28.1) | 12.8 (11.8, 13.8) |

Unknown: Included = 36, Overall = 1649
Age (days) at Peapod (Calc.)
P-value: <0.001 (Wilcoxon rank sum test)
| Characteristic | N | Mean (Min, Max) | Median (Q1, Q3) |
|---|---|---|---|
| Included | 167 | 7.7 (0, 56) | 3 (1, 10) |
| Overall | 2210 | 5.1 (0, 82) | 2 (1, 4) |

Unknown: Included = 36, Overall = 1649
DASS21 Depression 18 Week
P-value: 0.800 (Fisher’s exact test)
| Characteristic | Included N = 203 | Overall N = 3859 |
|---|---|---|
| Unknown | 26 (12.8%) | 1078 (27.9%) |
| Mild | 8 (3.9%) | 153 (4%) |
| Moderate | 6 (3%) | 120 (3.1%) |
| Normal | 162 (79.8%) | 2460 (63.7%) |
| Severe | 1 (0.5%) | 24 (0.6%) |
| Extremely Severe | 0 (0%) | 24 (0.6%) |
| Total | 203 (100.0%) | 3859 (100.0%) |
DASS21 Anxiety 18 Week
P-value: 0.043 (Fisher’s exact test)
| Characteristic | Included N = 203 | Overall N = 3859 |
|---|---|---|
| Unknown | 26 (12.8%) | 1078 (27.9%) |
| Extremely Severe | 1 (0.5%) | 57 (1.5%) |
| Mild | 9 (4.4%) | 302 (7.8%) |
| Moderate | 9 (4.4%) | 130 (3.4%) |
| Normal | 154 (75.9%) | 2224 (57.6%) |
| Severe | 4 (2%) | 68 (1.8%) |
| Total | 203 (100.0%) | 3859 (100.0%) |
DASS21 Stress 18 Week
P-value: 0.607 (Fisher’s exact test)
| Characteristic | Included N = 203 | Overall N = 3859 |
|---|---|---|
| Unknown | 26 (12.8%) | 1078 (27.9%) |
| Extremely Severe | 2 (1%) | 23 (0.6%) |
| Mild | 6 (3%) | 155 (4%) |
| Moderate | 6 (3%) | 101 (2.6%) |
| Normal | 161 (79.3%) | 2447 (63.4%) |
| Severe | 2 (1%) | 55 (1.4%) |
| Total | 203 (100.0%) | 3859 (100.0%) |
DASS21 Depression 36 Week
P-value: 0.232 (Fisher’s exact test)
| Characteristic | Included N = 203 | Overall N = 3859 |
|---|---|---|
| Unknown | 84 (41.4%) | 2289 (59.3%) |
| Extremely Severe | 1 (0.5%) | 13 (0.3%) |
| Mild | 6 (3%) | 86 (2.2%) |
| Normal | 111 (54.7%) | 1409 (36.5%) |
| Severe | 1 (0.5%) | 14 (0.4%) |
| Moderate | 0 (0%) | 48 (1.2%) |
| Total | 203 (100.0%) | 3859 (100.0%) |
DASS21 Anxiety 36 Week
P-value: 0.880 (Fisher’s exact test)
| Characteristic | Included N = 203 | Overall N = 3859 |
|---|---|---|
| Unknown | 84 (41.4%) | 2289 (59.3%) |
| Extremely Severe | 1 (0.5%) | 32 (0.8%) |
| Mild | 9 (4.4%) | 130 (3.4%) |
| Moderate | 4 (2%) | 72 (1.9%) |
| Normal | 104 (51.2%) | 1310 (33.9%) |
| Severe | 1 (0.5%) | 26 (0.7%) |
| Total | 203 (100.0%) | 3859 (100.0%) |
DASS21 Stress 36 Week
P-value: 0.055 (Fisher’s exact test)
| Characteristic | Included N = 203 | Overall N = 3859 |
|---|---|---|
| Unknown | 84 (41.4%) | 2289 (59.3%) |
| Mild | 2 (1%) | 71 (1.8%) |
| Normal | 115 (56.7%) | 1403 (36.4%) |
| Severe | 2 (1%) | 31 (0.8%) |
| Extremely Severe | 0 (0%) | 12 (0.3%) |
| Moderate | 0 (0%) | 53 (1.4%) |
| Total | 203 (100.0%) | 3859 (100.0%) |
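The DASS tables carry a large "Unknown" row that differs between the cohorts, so it can help to look at the severity distribution among respondents only. The sketch below does this for DASS21 Anxiety at 18 weeks using the counts from the table above. It rests on two assumptions that the report does not confirm: that the comparison of interest is Included vs Excluded (Excluded derived by subtraction), and that "Unknown" is dropped before testing. It is therefore not expected to reproduce the reported p-value of 0.043 exactly. A simulated p-value is used because exact enumeration on a 2x5 table of this size can be slow or exceed the default workspace.

```r
# Counts from the "DASS21 Anxiety 18 Week" table, "Unknown" row omitted
severity <- c("Normal", "Mild", "Moderate", "Severe", "Extremely Severe")
included <- c(154, 9, 9, 4, 1)
overall  <- c(2224, 302, 130, 68, 57)

# Assumption: Excluded = Overall - Included
excluded <- overall - included

anx <- rbind(Included = included, Excluded = excluded)
colnames(anx) <- severity

# Severity distribution among respondents only (row percentages)
round(100 * prop.table(anx, margin = 1), 1)

# Fisher's exact test with a Monte Carlo p-value
set.seed(2024)
anx_test <- fisher.test(anx, simulate.p.value = TRUE, B = 10000)
anx_test$p.value
```

Comparing row percentages among respondents separates a genuine shift in severity from a difference that is driven mainly by questionnaire completion.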
Plotting
# Create distribution plots for key continuous variables
plot_vars <- c("maternal_age_birth", "infant_weight", "bmi_1yr_calc", "bmi_3yr_calc",
               "bmi_5yr_calc", "ferritin_1yr", "ferritin_3yr", "ferritin_5yr",
               "dass21_18w_normal_count", "dass21_36w_normal_count")
for (var in plot_vars) {
  # Skip variables that are missing from the data, non-numeric, or entirely NA
  if (var %in% names(dat) && is.numeric(dat[[var]]) && !all(is.na(dat[[var]]))) {
    # Create a more descriptive title
    plot_title <- case_when(
      var == "maternal_age_birth" ~ "Maternal Age at Birth",
      var == "infant_weight" ~ "Infant Weight at Birth",
      var == "bmi_1yr_calc" ~ "BMI at 1 Year",
      var == "bmi_3yr_calc" ~ "BMI at 3 Years",
      var == "bmi_5yr_calc" ~ "BMI at 5 Years",
      var == "ferritin_1yr" ~ "Ferritin Levels at 1 Year",
      var == "ferritin_3yr" ~ "Ferritin Levels at 3 Years",
      var == "ferritin_5yr" ~ "Ferritin Levels at 5 Years",
      var == "dass21_18w_normal_count" ~ "DASS Normal Domains at 18 Weeks",
      var == "dass21_36w_normal_count" ~ "DASS Normal Domains at 36 Weeks",
      TRUE ~ str_replace_all(var, "_", " ")
    )
    # Label each non-missing row with its own group (Excluded or Included)
    plot_data <- dat %>%
      filter(!is.na(.data[[var]])) %>%
      mutate(
        group_type = case_when(
          microbiome_dataset == "Excluded" ~ "Excluded",
          microbiome_dataset == "Included" ~ "Included",
          TRUE ~ "Overall"
        )
      )
    # Second copy of every non-missing row, labelled "Overall" for the third panel
    overall_data <- dat %>%
      filter(!is.na(.data[[var]])) %>%
      mutate(group_type = "Overall")
    # Stack both copies so each row appears in its own group and in "Overall"
    combined_data <- bind_rows(plot_data, overall_data) %>%
      mutate(
        group_type = factor(group_type, levels = c("Excluded", "Included", "Overall"))
      )
    # One histogram per group; free y-axes because the group sizes differ widely
    p <- combined_data %>%
      ggplot(aes(x = .data[[var]])) +
      geom_histogram(aes(fill = group_type), alpha = 0.7, bins = 30) +
      facet_wrap(~group_type, scales = "free_y", ncol = 3) +
      theme_minimal() +
      labs(
        title = paste("Distribution of", plot_title),
        x = plot_title,
        y = "Count"
      ) +
      theme(
        legend.position = "none",
        plot.title = element_text(size = 14, face = "bold"),
        strip.text = element_text(size = 12, face = "bold")
      ) +
      scale_fill_manual(values = c("Excluded" = "#E74C3C", "Included" = "#3498DB", "Overall" = "#2C3E50"))
    print(p)
  }
}










# Create a comparison plot for ferritin levels across time points
ferritin_data <- dat %>%
  select(microbiome_dataset, ferritin_1yr, ferritin_3yr, ferritin_5yr) %>%
  pivot_longer(cols = c(ferritin_1yr, ferritin_3yr, ferritin_5yr),
               names_to = "time_point",
               values_to = "ferritin_level") %>%
  filter(!is.na(ferritin_level)) %>%
  mutate(
    time_point = case_when(
      time_point == "ferritin_1yr" ~ "1 Year",
      time_point == "ferritin_3yr" ~ "3 Years",
      time_point == "ferritin_5yr" ~ "5 Years",
      TRUE ~ time_point
    ),
    time_point = factor(time_point, levels = c("1 Year", "3 Years", "5 Years"))
  )
if (nrow(ferritin_data) > 0) {
  p_ferritin <- ferritin_data %>%
    ggplot(aes(x = ferritin_level, fill = microbiome_dataset)) +
    geom_histogram(alpha = 0.7, position = "identity", bins = 25) +
    facet_grid(microbiome_dataset ~ time_point, scales = "free") +
    theme_minimal() +
    labs(
      title = "Ferritin Levels Comparison Across Time Points",
      x = "Ferritin Level",
      y = "Count"
    ) +
    theme(
      legend.position = "none",
      plot.title = element_text(size = 14, face = "bold"),
      strip.text = element_text(size = 11, face = "bold")
    ) +
    scale_fill_manual(values = c("Excluded" = "#E74C3C", "Included" = "#3498DB"))
  print(p_ferritin)
}

# Create a comparison plot for DASS normal domains across time points
dass_data <- dat %>%
  select(microbiome_dataset, dass21_18w_normal_count, dass21_36w_normal_count) %>%
  pivot_longer(cols = c(dass21_18w_normal_count, dass21_36w_normal_count),
               names_to = "time_point",
               values_to = "normal_count") %>%
  filter(!is.na(normal_count)) %>%
  mutate(
    time_point = case_when(
      time_point == "dass21_18w_normal_count" ~ "18 Weeks",
      time_point == "dass21_36w_normal_count" ~ "36 Weeks",
      TRUE ~ time_point
    )
  )
if (nrow(dass_data) > 0) {
  p_dass <- dass_data %>%
    ggplot(aes(x = normal_count, fill = microbiome_dataset)) +
    geom_bar(alpha = 0.7, position = "dodge") +
    facet_grid(microbiome_dataset ~ time_point) +
    theme_minimal() +
    labs(
      title = "DASS Normal Domains Comparison Across Time Points",
      x = "Number of Normal DASS Domains (0-3)",
      y = "Count"
    ) +
    theme(
      legend.position = "none",
      plot.title = element_text(size = 14, face = "bold"),
      strip.text = element_text(size = 11, face = "bold")
    ) +
    scale_fill_manual(values = c("Excluded" = "#E74C3C", "Included" = "#3498DB")) +
    scale_x_continuous(breaks = 0:3)
  print(p_dass)
}

# Create a BMI trajectory comparison plot
bmi_data <- dat %>%
  select(microbiome_dataset, bmi_1yr_calc, bmi_3yr_calc, bmi_5yr_calc) %>%
  pivot_longer(cols = c(bmi_1yr_calc, bmi_3yr_calc, bmi_5yr_calc),
               names_to = "time_point",
               values_to = "bmi") %>%
  filter(!is.na(bmi)) %>%
  mutate(
    time_point = case_when(
      time_point == "bmi_1yr_calc" ~ "1 Year",
      time_point == "bmi_3yr_calc" ~ "3 Years",
      time_point == "bmi_5yr_calc" ~ "5 Years",
      TRUE ~ time_point
    ),
    time_point = factor(time_point, levels = c("1 Year", "3 Years", "5 Years"))
  )
if (nrow(bmi_data) > 0) {
  p_bmi <- bmi_data %>%
    ggplot(aes(x = bmi, fill = microbiome_dataset)) +
    geom_histogram(alpha = 0.7, position = "identity", bins = 25) +
    facet_grid(microbiome_dataset ~ time_point, scales = "free") +
    theme_minimal() +
    labs(
      title = "BMI Comparison Across Time Points",
      x = "BMI",
      y = "Count"
    ) +
    theme(
      legend.position = "none",
      plot.title = element_text(size = 14, face = "bold"),
      strip.text = element_text(size = 11, face = "bold")
    ) +
    scale_fill_manual(values = c("Excluded" = "#E74C3C", "Included" = "#3498DB"))
  print(p_bmi)
}
