Baseline characteristics of all screened women, reported as mean ± SD for continuous variables and count (percentage) for categorical variables

Characteristic N = 449
Age (years) 40.2 ± 14.9
Age group
    <30 139 (31%)
    30-50 175 (39%)
    >50 135 (30%)
Season
    Autumn 96 (21%)
    Spring 169 (38%)
    Summer 80 (18%)
    Winter 104 (23%)
Patient status (categorical)
    Fertile 238 (53%)
    Pregnancy 40 (8.9%)
    Premenopause 50 (11%)
    Menopause 121 (27%)
Vaginal pH
    4 73 (16%)
    5 266 (59%)
    6 99 (22%)
    7 10 (2.2%)
    8 1 (0.2%)
Sexually active
    No 33 (7.3%)
    Yes 393 (88%)
    Unknown 23 (5.1%)
Itching symptoms 26 (5.8%)
Burning symptoms 38 (8.5%)
Leukorrhea (abnormal discharge) 70 (16%)
Chlamydia trachomatis 2 (0.5%)
Ureaplasma spp. 108 (24%)
Mycoplasma hominis 10 (2.2%)
Trichomonas vaginalis
    negative 449 (100%)
Candida spp. 54 (12%)
Escherichia coli 143 (32%)
Proteus spp. 147 (33%)
Pseudomonas spp. 109 (24%)
Gardnerella vaginalis 281 (63%)
Staphylococcus aureus 212 (47%)
Enterococcus faecalis 262 (58%)
Neisseria gonorrhoeae
    negative 449 (100%)
Streptococcus agalactiae (GBS) 30 (6.7%)

Symptoms dumbbell plot

For each pathogen and each symptom (itching, burning, leukorrhea) we computed the percentage symptomatic among pathogen-positive and pathogen-negative tests. Differences in proportions were explored with 2×2 tests (chi-square, or Fisher's exact test when any expected count was <5), with Benjamini–Hochberg false discovery rate (BH–FDR) control across comparisons.
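The two quantities described above, the absolute difference in percent symptomatic and the BH–FDR adjustment, can be sketched as follows. This is a minimal illustration, not the analysis code used for the study: the function names and all counts and p-values are hypothetical, and the 2×2 tests that would produce the raw p-values are not shown.

```python
def percent_difference(sym_pos, n_pos, sym_neg, n_neg):
    """Percent symptomatic among positives, among negatives, and the absolute gap."""
    pct_pos = 100.0 * sym_pos / n_pos
    pct_neg = 100.0 * sym_neg / n_neg
    return pct_pos, pct_neg, abs(pct_pos - pct_neg)


def bh_adjust(p_values):
    """Benjamini-Hochberg adjusted p-values, returned in the original order."""
    m = len(p_values)
    # Walk from the largest p-value to the smallest so that a running
    # minimum enforces monotonicity of the adjusted values.
    order = sorted(range(m), key=lambda i: p_values[i], reverse=True)
    adjusted = [0.0] * m
    running_min = 1.0
    for position, idx in enumerate(order):
        rank = m - position  # 1-based rank in ascending order
        running_min = min(running_min, p_values[idx] * m / rank)
        adjusted[idx] = running_min
    return adjusted


# Hypothetical counts: 5 of 50 positives and 20 of 400 negatives report itching.
pct_pos, pct_neg, gap = percent_difference(5, 50, 20, 400)

# Hypothetical raw p-values from four pathogen-symptom comparisons.
raw_p = [0.01, 0.04, 0.03, 0.20]
adj_p = bh_adjust(raw_p)
```

A comparison would be flagged only if its adjusted p-value falls below the chosen FDR threshold (0.05 in the figures here).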

Figure 1. Percent symptomatic among pathogen-positive (yellow) and pathogen-negative (grey) tests; connector length shows the absolute difference. No differences were significant after BH–FDR correction.

Across pathogens, symptom prevalence was broadly similar between positive and negative tests; absolute differences were generally small and none remained significant after BH–FDR. This supports the interpretation that attendance at the clinic was not symptom-driven and that symptoms alone do not distinguish pathogen positivity in this setting.

Univariable and multivariable models for relative risk of infection by collected factors

We modeled positivity for each pathogen using Poisson regression with a log link and robust (HC) standard errors, reporting risk ratios (RRs) with 95% CIs. The primary covariates were age group (three levels: <30 as reference, 30-50, >50), reproductive status, sexual activity, and season (reference: autumn). Only pathogens with more than 10 positive tests were included in the analysis. For display, we summarized a common subset of predictors in a faceted forest plot (one panel per predictor). BH–FDR correction was applied within each predictor, across pathogens.
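To make the reported quantity concrete, the sketch below computes an unadjusted risk ratio for a single binary predictor from raw counts. For one binary predictor, the exponentiated coefficient of a log-link model is the ratio of the two observed risks. The interval here uses the usual large-sample standard error of log RR; it is not the covariate-adjusted, robust-SE estimate shown in Figure 2. The function name and the counts are hypothetical.

```python
import math


def risk_ratio(pos_exposed, n_exposed, pos_ref, n_ref, z=1.96):
    """Unadjusted risk ratio with a large-sample 95% CI on the log scale."""
    risk_exposed = pos_exposed / n_exposed
    risk_ref = pos_ref / n_ref
    rr = risk_exposed / risk_ref
    # Standard error of log(RR) for two independent binomial proportions.
    se_log_rr = math.sqrt(
        1 / pos_exposed - 1 / n_exposed + 1 / pos_ref - 1 / n_ref
    )
    lower = math.exp(math.log(rr) - z * se_log_rr)
    upper = math.exp(math.log(rr) + z * se_log_rr)
    return rr, lower, upper


# Hypothetical counts: 20 of 100 tests positive in spring
# versus 40 of 100 in the reference season (autumn).
rr, lower, upper = risk_ratio(20, 100, 40, 100)
```

An interval lying entirely below 1.0, as in this made-up example, corresponds to lower positivity than in the reference category.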

Figure 2. Adjusted RRs (95% CIs) from Poisson models with robust SEs; vertical line = RR 1.0. Filled points denote BH–FDR < 0.05.

Adjusted associations were generally modest and heterogeneous by organism. Higher pH, sexual activity, and reproductive status showed very limited, pathogen-specific associations. Some seasonal effects emerged, in particular for Staphylococcus aureus and Pseudomonas spp., which showed significantly lower positivity in spring and summer than in autumn.

Monthly Incidence

For each pathogen and month we plotted the proportion positive among tests performed (positives/tests), with Wilson 95% CIs.
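The Wilson interval for a monthly proportion can be computed directly from the number of positives and the number of tests. The sketch below is a minimal illustration with hypothetical counts; the function name is an assumption and is not taken from the analysis code.

```python
import math


def wilson_ci(positives, tests, z=1.96):
    """Wilson score interval for a binomial proportion (95% when z = 1.96)."""
    if tests == 0:
        return float("nan"), float("nan")
    p = positives / tests
    denom = 1 + z**2 / tests
    center = (p + z**2 / (2 * tests)) / denom
    half_width = (
        z * math.sqrt(p * (1 - p) / tests + z**2 / (4 * tests**2)) / denom
    )
    # Clamp to [0, 1] to guard against floating-point overshoot.
    return max(0.0, center - half_width), min(1.0, center + half_width)


# Hypothetical months: 5 positives out of 10 tests, and 0 out of 10.
lo_half, hi_half = wilson_ci(5, 10)
lo_zero, hi_zero = wilson_ci(0, 10)
```

Unlike the simple normal approximation, the Wilson interval stays within [0, 1] and keeps a non-zero width when a month has no positives, which matters for months with few tests.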

Figure 3. Monthly proportion positive (positives/tests) with Wilson 95% CIs; numbers under the axis show monthly tests; dashed lines: generalized additive model fits with a cyclic month term.

Several organisms showed stable monthly proportions, while others displayed suggestive seasonality (e.g., XXX). Some fluctuations coincided with months having fewer tests and wider CIs.

UpSet Plot

We visualized co-detections among common pathogens using an UpSet plot, ranking intersections by size. We evaluated pairwise independence using 2×2 tests (chi-square or Fisher's exact test) comparing observed vs expected co-detections given marginal prevalences, with BH–FDR control. For selected pairs we further examined adjusted co-occurrence.
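The observed-versus-expected comparison can be sketched as follows: under independence, the expected number of co-detections is the product of the two marginal counts divided by the number of samples. The example uses hypothetical counts and a Pearson chi-square test without continuity correction; the Fisher's exact fallback and the BH–FDR step are not shown, and the function name is an assumption.

```python
import math


def codetection_test(n_both, n_a, n_b, n_total):
    """Expected co-detections under independence and a 2x2 chi-square test."""
    expected = n_a * n_b / n_total
    # Cells of the 2x2 table (A+/B+, A+/B-, A-/B+, A-/B-).
    a = n_both
    b = n_a - n_both
    c = n_b - n_both
    d = n_total - n_a - n_b + n_both
    # Pearson chi-square statistic for a 2x2 table, 1 degree of freedom.
    chi2 = n_total * (a * d - b * c) ** 2 / (
        n_a * (n_total - n_a) * n_b * (n_total - n_b)
    )
    # Upper-tail probability of a chi-square variable with 1 df.
    p_value = math.erfc(math.sqrt(chi2 / 2))
    return expected, chi2, p_value


# Hypothetical counts: 200 samples, 100 positive for A, 80 for B, 50 for both.
expected, chi2, p_value = codetection_test(50, 100, 80, 200)
```

An observed count above the expected count indicates positive co-occurrence, and one below it indicates that the two organisms are found together less often than independence would predict.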

Figure 4. UpSet plot of the most common co-detection sets; bar labels show counts.

Pathogen 1              Pathogen 2              Observed   Expected   BH-adjusted p
Gardnerella vaginalis   Enterococcus faecalis   211        179.129    <0.001
Staphylococcus aureus   Enterococcus faecalis   159        135.144    <0.001
Gardnerella vaginalis   Staphylococcus aureus   158        144.944    0.031
Proteus spp.            Enterococcus faecalis   128        93.708     <0.001
Escherichia coli        Enterococcus faecalis   116        91.158     <0.001
Proteus spp.            Staphylococcus aureus   96         75.825     <0.001
Escherichia coli        Staphylococcus aureus   92         73.762     0.001
Pseudomonas spp.        Staphylococcus aureus   70         56.224     0.016
Pseudomonas spp.        Gardnerella vaginalis   62         74.523     0.018
Pseudomonas spp.        Enterococcus faecalis   57         69.484     0.023

Co-detections were frequent for combinations that included Gardnerella, Enterococcus, and Staphylococcus, with the largest intersections reaching 25 individuals. Looking at combinations of two pathogens, several pairs appeared more common than expected by chance in the unadjusted analysis, in particular pairs including Gardnerella, Enterococcus, Staphylococcus, and E. coli, each with >50 observations. Pseudomonas was co-detected with Staphylococcus more often than expected, but with Gardnerella and with Enterococcus less often than expected.

Infection pairs analysis

To assess whether pathogens co-occur beyond what case-mix would explain, we modeled each pathogen A as the outcome in a log-link Poisson model with robust SEs, including pathogen B as a predictor alongside the adjustment covariates. We fit models in both directions (A~B and B~A) and averaged the two log-RRs to obtain a symmetrized estimate; FDR control was applied across pairs.
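Averaging the two log-RRs and exponentiating is equivalent to taking the geometric mean of the two directional RRs. The sketch below illustrates only this symmetrization step, with hypothetical RR values; the function name is an assumption, and the fitting of the two directional models is not shown.

```python
import math


def symmetrized_rr(rr_a_given_b, rr_b_given_a):
    """Combine the two directional RRs by averaging on the log scale."""
    mean_log_rr = (math.log(rr_a_given_b) + math.log(rr_b_given_a)) / 2
    return math.exp(mean_log_rr)


# Hypothetical directional estimates for one pathogen pair.
sym_rr = symmetrized_rr(2.0, 1.5)
```

Working on the log scale keeps the result independent of which pathogen is labeled A, and treats RRs above and below 1 symmetrically.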

Figure 5. Adjusted co-occurrence heat map. Cells show RR for symmetrized A↔B association; * BH–FDR < 0.05.

After adjustment, several bacterial pairs showed positive co-occurrence, in particular those involving Enterococcus faecalis and E. coli; others showed moderate associations (e.g., Gardnerella with Proteus or Staphylococcus). Most remaining pairs had RRs close to 1. Findings were consistent with the UpSet intersections and robust to covariate adjustment.