Oral Microbiome and Breast Cancer Risk in the NIH-AARP

Author

Giovanny Herrera

Published

June 5, 2025

Abstract

Introduction: Breast cancer is the most common cancer among women. Studies suggest a link between periodontal disease and breast cancer risk, but whether oral microbiome differences precede or result from cancer remains unclear. We examined this prospective association in the NIH-AARP Diet and Health Study.

Methods: We matched 348 women who developed breast cancer after oral wash collection to controls using incidence density sampling. Controls were matched by age at sample collection, race and ethnicity, and smoking status (see Figure 1 for more information about the selection of cases and controls). DNA was extracted, and shotgun metagenomic sequencing was performed. We used conditional logistic regression to calculate odds ratios (OR) for the associations of alpha diversity, beta diversity, and species- and gene-level metrics with breast cancer risk, adjusting for potential confounders.

Figure 1- Selection of cases and controls

Characteristics of study participants

The characteristics of the study participants are shown in Table 1.

Table 1- Characteristics of study participants
Variable | Incident Case: Mean/N | Incident Case: SD/(%) | Matched Control: Mean/N | Matched Control: SD/(%)
Age at entry 61.5 5.2 61.5 5.3
Age group - -
<55 yrs 46 14.5 49 15.4
55-59 yrs 77 24.2 72 22.6
60-64 yrs 106 33.3 106 33.3
65-69 yrs 79 24.8 80 25.2
≥ 70 yrs 10 3.1 11 3.5
Age at sample collection 70.5 5.2 70.5 5.2
Age at sample collection group - -
<65 yrs 57 17.9 57 17.9
65-69 yrs 82 25.8 81 25.5
70-75 yrs 106 33.3 107 33.6
>75 yrs 73 23 73 23
Race - -
Non-Hispanic White 300 94.3 300 94.3
Others 18 5.7 18 5.7
Education - -
<=12 years or completed high school 63 19.8 75 23.6
Post-high school training (technical training or some college) 118 37.1 115 36.2
College graduate and post graduate 137 43.1 128 40.3
BMI 26.7 5.5 26.5 5.5
BMI Category - -
Healthy weight 145 45.6 148 46.5
Overweight 106 33.3 103 32.4
Class 1-3 obesity 67 21.1 67 21.1
Age at menarche - -
<=12 years old 137 43.1 161 50.6
13-14 years old 148 46.5 134 42.1
≥ 15 years old 33 10.4 23 7.2
Number of live-born children - -
Never had a child 52 16.4 47 14.8
1 31 9.7 38 11.9
2 88 27.7 73 23
3-4 114 35.8 121 38.1
5->=10 33 10.4 39 12.3
Age at first live birth - -
Never gave birth 48 15.1 47 14.8
<16-19 47 14.8 49 15.4
20-24 143 45 141 44.3
25-29 62 19.5 56 17.6
>= 30 18 5.7 25 7.9
Age at menopause - -
<40 45 14.2 54 17
40-44 39 12.3 46 14.5
45-49 61 19.2 60 18.9
50-54 125 39.3 124 39
≥55 30 9.4 17 5.3
Still menstruating 18 5.7 17 5.3
Hormone status - -
Never 122 38.4 125 39.3
Current 164 51.6 167 52.5
Former 32 10.1 26 8.2
Total years used replacement hormones - -
Never used hormones 122 38.4 125 39.3
<=5 years 67 21.1 65 20.4
5-9 years 45 14.2 44 13.8
≥10 years 84 26.4 84 26.4
First-degree female relatives with breast cancer  - -
No 250 78.6 264 83
Yes 54 17 42 13.2
Unknown/Missing 14 4.4 12 3.8
Cigarette use - -
Never 134 42.1 134 42.1
Former 132 41.5 132 41.5
Current 38 11.9 38 11.9
Unknown/Missing 14 4.4 14 4.4
Alcohol from alcoholic drinks (beer, wine, liquor) (g/day) 6.6 13.6 6.3 14.4
Alcohol from alcoholic drinks (beer, wine, liquor) (g/day) group - -
0 57 17.9 70 22
>0-1 63 19.8 77 24.2
>1-2 58 18.2 53 16.7
>2-5 46 14.5 35 11
>5-10 32 10.1 22 6.9
>10 62 19.5 61 19.2
Self-reported health condition - -
Excellent 73 23 62 19.5
Very good 131 41.2 130 40.9
Good 92 28.9 96 30.2
Fair 22 6.9 30 9.4
Breast cancer grade - -
Grade I or (well) differentiated 76 23.9 0 0
Grade II or moderately (well) differentiated 126 39.6 0 0
Grade III or poorly differentiated 74 23.3 1 0.3
Grade IV or undifferentiated/anaplastic 12 3.8 0 0
Grade and differentiation not stated 30 9.4 0 0
NA - 337 106
Estrogen Receptor for breast cancer - -
0 = Test Not Done 0 0 0 0
1 = Test Done, Results Positive/Elevated 256 80.5 1 0.3
2 = Test Done, Results Negative/Normal 30 9.4 0 0
3 = Test Done, Results Borderline or Undetermined Whether Positive or Negative 1 0.3 0 0
x = Registry did not return hormone receptor data 12 3.8 0 0
NA 0 0 317 99.7
Missing 19 6 0 0
Progesterone receptor for breast cancer - -
0 = Test Not Done 0 0 0 0
1 = Test Done, Results Positive/Elevated 206 64.8 1 0.3
2 = Test Done, Results Negative/Normal 72 22.6 0 0
3 = Test Done, Results Borderline or Undetermined Whether Positive or Negative 2 0.6 0 0
x = Registry did not return hormone receptor data 13 4.1 0 0
NA 0 0 317 99.7
Missing 25 7.9 0 0
Breast cancer behavior (OLD_BREAST_BEHV) - -
2 = in situ 67 21.1 0 0
3 = malignant, primary site 251 78.9 0 0

Alpha diversity

We did not observe differences in alpha diversity indices by breast cancer status (see Figure 2).

Figure 2- Alpha diversity by breast cancer status

The mean and SD of each index by breast cancer status are shown in Table 2.

Table 2- Mean and SD of alpha diversity indices by breast cancer status

Conditional Logistic regression for alpha diversity

Standardization

Alpha diversity indices were standardized based on standard deviation from controls using the following formula:

\[ \frac{x - \operatorname{mean}(x_{\text{controls}})}{\operatorname{sd}(x_{\text{controls}})} \tag{1}\]
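As a minimal sketch of Equation 1, the snippet below standardizes an index using the mean and SD of the controls only. The function name, the example values, and the use of the sample SD (n - 1 denominator) are assumptions made for illustration, not the code used for this report.

```python
from statistics import mean, stdev


def standardize_by_controls(values, is_control):
    """Center and scale every value by the mean and SD of the controls."""
    controls = [v for v, flag in zip(values, is_control) if flag]
    mu = mean(controls)
    sd = stdev(controls)  # assumption: sample SD with an n - 1 denominator
    return [(v - mu) / sd for v in values]


# Hypothetical alpha diversity values; True marks a control.
index_values = [3.0, 5.0, 4.0, 6.0]
is_control = [True, True, False, False]
z_scores = standardize_by_controls(index_values, is_control)
```

With this scaling, an odds ratio from a later model is read per one control-SD increase in the index.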

Conditional logistic regression formula

We used the following formula to perform conditional logistic regression:

\[ \text{Breastcan} \sim \text{index} + \text{strata}(\text{SET}) + \text{BMIcur} + \text{Educ} + \text{Livechild} + \text{Menopage} + \text{Rel1dbreast} + \text{Alcdrinks} \tag{2}\]

Breastcan: breast cancer status; SET: matching factor; BMIcur: body mass index at sample collection; Educ: education level; Livechild: number of live-born children; Menopage: age at menopause; Rel1dbreast: first-degree female relatives with breast cancer; Alcdrinks: alcohol from alcoholic drinks (beer, wine, liquor) (g/day).
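To illustrate what conditioning on the matched set does, here is a minimal sketch of a conditional logistic fit for the simplest setting: 1:1 matched sets and a single covariate with no adjustment terms. Both simplifications are assumptions for illustration (the model in Equation 2 includes several covariates), and the function name and example values are hypothetical. In this setting each set contributes the probability that the observed case, rather than its control, is the case, which depends only on the within-set difference in the covariate.

```python
import math


def fit_paired_clogit(case_values, control_values, iterations=25):
    """Newton-Raphson fit of a one-covariate conditional logistic model
    for 1:1 matched sets. Returns (beta, odds_ratio)."""
    diffs = [c - k for c, k in zip(case_values, control_values)]
    beta = 0.0
    for _ in range(iterations):
        # Probability that the observed case is the case within its set.
        probs = [1.0 / (1.0 + math.exp(-beta * d)) for d in diffs]
        gradient = sum(d * (1.0 - p) for d, p in zip(diffs, probs))
        information = sum(d * d * p * (1.0 - p) for d, p in zip(diffs, probs))
        if information == 0.0:
            break  # no within-set variation, nothing to estimate
        step = gradient / information
        beta += step
        if abs(step) < 1e-10:
            break
    return beta, math.exp(beta)


# Hypothetical standardized index values for six matched pairs.
cases = [0.5, -0.2, 1.0, 0.3, -0.4, 0.8]
controls = [0.1, 0.4, 0.2, 0.5, -0.9, 0.9]
beta, odds_ratio = fit_paired_clogit(cases, controls)
```

Because only within-set differences enter the likelihood, the matching factors themselves drop out and need no coefficients of their own.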

Table 3 shows that Pielou’s evenness was statistically significantly associated with breast cancer risk after adjusting for confounders.

Table 3- Conditional logistic regression for alpha diversity indices

Quantiles

Alpha diversity indices were also analyzed as quartiles, with cut points estimated from the distribution among the controls. See Table 4 for more details.
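The quartile approach can be sketched as follows: cut points come from the controls only and are then applied to every participant. The function names, the interpolation method, and the convention that a value equal to a cut point falls in the lower quartile are assumptions for illustration.

```python
from bisect import bisect_left
from statistics import quantiles


def quartile_cutpoints(control_values):
    """Three cut points (25th, 50th, 75th percentiles) from controls only."""
    return quantiles(control_values, n=4, method="inclusive")


def assign_quartile(value, cutpoints):
    """Return 1-4; a value equal to a cut point goes to the lower quartile."""
    return bisect_left(cutpoints, value) + 1


# Hypothetical index values among controls.
control_index = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0]
cutpoints = quartile_cutpoints(control_index)
quartile_of_case = assign_quartile(5.0, cutpoints)
```

By construction roughly a quarter of controls fall in each category, so the lowest quartile can serve as the reference level in the regression.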

Table 4- Conditional logistic regression of alpha diversity indices by quantiles. *=Statistically significant P-values

Beta diversity

No clear clustering of samples by breast cancer status was observed in the PCoA plots (Figure 3).

Figure 3- PCoA plots by breast cancer status

MiRKAT

The microbiome regression-based kernel association test (MiRKAT), adjusting for confounders, was not statistically significant for either distance matrix (Bray-Curtis P = 0.2768, Jaccard P = 0.3035). This analysis was also conducted without the matching factor and covariates (Table 5).

Table 5- Microbiome regression-based test (MiRKAT) results

As the MiRKAT does not perform conditional logistic regression, matching factor was no used as a covariate in other models

Percentage of human DNA

Conditional logistic regressions were also performed for the percentage of human DNA with and without adjusting for covariates.

Table 6- Conditional logistic regression for the percentage of human DNA

Conditional logistic regression results

Genus level

Conditional logistic regression (clogit) was run for each genus associated with breast cancer in the Ghana study.

Relative abundance

Table 7- Associations between relative abundance of oral microbiome genera and the risk of breast cancer

Presence

Table 8- Associations between the presence of oral microbiome genera and the risk of breast cancer

Species level

The following conditional logistic regressions were run:

Relative abundance

Conditional logistic regression for the scaled relative abundance (only for species with relative abundance >0.1%).

Table 9- Associations between relative abundance of oral microbiome species and the risk of breast cancer

CLR-transformed relative abundance

Conditional logistic regression for centered log-ratio (CLR)-transformed data (only for species with relative abundance >0.1%).
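For reference, the CLR transform takes the log of each abundance and subtracts the mean of the logs within the sample. The sketch below assumes a small pseudocount to handle zeros; the pseudocount value and the function name are illustrative assumptions, since the report does not state how zeros were handled.

```python
import math


def clr_transform(abundances, pseudocount=1e-6):
    """Centered log-ratio transform of one sample's abundances."""
    # Assumption: zeros are handled by adding a small pseudocount.
    logs = [math.log(x + pseudocount) for x in abundances]
    center = sum(logs) / len(logs)  # log of the geometric mean
    return [value - center for value in logs]


# Hypothetical relative abundances for one sample (four species).
sample = [0.5, 0.3, 0.2, 0.0]
clr_values = clr_transform(sample)
```

The transformed values of a sample sum to zero, which removes the constant-sum constraint of relative abundances.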

Table 10- Associations between CLR-transformed relative abundance of oral microbiome species and the risk of breast cancer

Presence of species

Conditional logistic regression for the presence of species with prevalence between 5% and 95%.
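The prevalence filter can be sketched as below: a species is kept when the share of samples in which it is detected lies between 5% and 95%, and its abundances are then reduced to a 0/1 presence indicator. Treating any abundance above zero as present and treating the bounds as inclusive are assumptions for illustration; the names and data are hypothetical.

```python
def prevalence(abundances):
    """Fraction of samples in which the species is detected."""
    return sum(1 for x in abundances if x > 0) / len(abundances)


def presence_for_eligible_species(table, low=0.05, high=0.95):
    """Keep species with prevalence in [low, high]; return 0/1 indicators."""
    return {
        species: [1 if x > 0 else 0 for x in values]
        for species, values in table.items()
        if low <= prevalence(values) <= high
    }


# Hypothetical abundances across 20 samples.
table = {
    "species_A": [0.01] * 20,              # detected everywhere: dropped
    "species_B": [0.02] * 10 + [0.0] * 10,  # 50% prevalence: kept
    "species_C": [0.0] * 20,               # never detected: dropped
}
presence = presence_for_eligible_species(table)
```

Species outside the range carry almost no contrast in a presence/absence model, which is why they are excluded before fitting.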

Table 11- Associations between the presence of oral microbiome species and the risk of breast cancer

Interaction between relative abundance and presence

Conditional logistic regression including a term for presence and a term for the interaction between the scaled relative abundance and presence (both betas are included).

Table 12- Evaluation of the interaction between the presence and the relative abundance

Gene level

Relative abundance

Conditional logistic regression for the scaled relative abundance, only for genes with relative abundance >0.01% (>100 copies per million; 81 genes).

Table 13- Associations between genes (CPM) and the risk of breast cancer

Presence

Conditional logistic regression for the presence of genes, using the same threshold mentioned above (>100 copies per million; 81 genes).

Table 14- Associations between the presence of genes and the risk of breast cancer

Interaction

Conditional logistic regression including a term for presence and a term for the interaction between the scaled relative abundance and presence (both betas are included).

Table 15- Evaluation of the interaction between the presence of genes and the relative abundance

Oral pathogen complexes

Because of previously observed associations between periodontal disease and breast cancer risk, we pre-specified the hypothesis that specific periodontal pathogens (see Table 16) would be associated with breast cancer risk.

Table 16- Periodontal pathogens and their microbial complexes
Species Microbial complex
Tannerella forsythia Red complex
Treponema denticola Red complex
Porphyromonas gingivalis Red complex
Prevotella intermedia Orange complex
Prevotella nigrescens Orange complex
Peptostreptococcus micros (Parvimonas micra) Orange complex
Fusobacterium nucleatum Orange complex
Fusobacterium periodonticum Orange complex
Campylobacter gracilis Orange complex
Campylobacter rectus Orange complex
Campylobacter showae Orange complex
Eubacterium nodatum Orange complex
Streptococcus constellatus Orange complex

Samples were classified based on the presence of any species within each complex listed in Table 16. We then conducted conditional logistic regression, including terms for both the presence of any pathogen from each complex and the total relative abundance of species within each complex.
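A minimal sketch of the per-sample summary described above: for each complex, a 0/1 indicator for the presence of any member species and the summed relative abundance of its members. The complex membership shown is an abbreviated subset of Table 16 chosen for illustration, and the function name and sample values are hypothetical.

```python
# Abbreviated, illustrative subset of the species listed in Table 16.
COMPLEXES = {
    "Red complex": ["Tannerella forsythia", "Porphyromonas gingivalis"],
    "Orange complex": ["Prevotella intermedia", "Fusobacterium nucleatum"],
}


def summarize_complexes(sample):
    """Return presence (0/1) and total relative abundance per complex."""
    summary = {}
    for complex_name, species in COMPLEXES.items():
        total = sum(sample.get(name, 0.0) for name in species)
        summary[complex_name] = {
            "present": int(total > 0),  # any member species detected
            "total_abundance": total,
        }
    return summary


# Hypothetical relative abundances for one oral wash sample.
sample = {"Porphyromonas gingivalis": 0.002, "Streptococcus mitis": 0.3}
summary = summarize_complexes(sample)
```

Both quantities would then enter the regression as separate terms, one for presence and one for total abundance.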

Table 17- Associations between periodontal pathogen complexes and the risk of breast cancer

Invasive breast cancer

We excluded in situ breast cancer cases and their matched controls (see the breast cancer behavior variable in Table 1).

Alpha and Beta diversity for invasive breast cancer

Based on rarefaction at 1 million reads, metrics for alpha diversity were generated. Bray-Curtis and Jaccard distance matrices were computed based on 502 samples (excluding 67 in situ cases and their controls).
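For readers unfamiliar with the two distances, the sketch below computes them for a single pair of samples. It assumes the Jaccard distance is calculated on presence/absence (any count above zero is present); the function names and counts are hypothetical, and this is not the pipeline used to build the matrices.

```python
def bray_curtis(a, b):
    """Bray-Curtis dissimilarity between two abundance vectors."""
    numerator = sum(abs(x - y) for x, y in zip(a, b))
    denominator = sum(a) + sum(b)
    return numerator / denominator if denominator else 0.0


def jaccard_distance(a, b):
    """Jaccard distance on presence/absence (assumed binary version)."""
    present_a = {i for i, x in enumerate(a) if x > 0}
    present_b = {i for i, x in enumerate(b) if x > 0}
    union = present_a | present_b
    if not union:
        return 0.0
    return 1.0 - len(present_a & present_b) / len(union)


# Hypothetical rarefied counts for three species in two samples.
sample_1 = [1, 0, 3]
sample_2 = [1, 2, 1]
bc = bray_curtis(sample_1, sample_2)
jd = jaccard_distance(sample_1, sample_2)
```

Bray-Curtis weights differences in abundance, while the binary Jaccard distance reflects only which species are shared.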

Figure 4- PCoA plots and Alpha diversity by breast cancer status

MiRKAT

The microbiome regression-based test, adjusting for confounders, was not statistically significant for any distance matrix.

Table 18- Microbiome regression-based test (MiRKAT) results for invasive breast cancer cases

ER+ and PR+

Estrogen receptor positive cases

A total of 256 cases were estrogen receptor (ER) positive (Table 1). We restricted the analysis to breast cancer cases only. The ER variable was recoded, with ER-positive designated as the outcome and all other categories classified as negative. Unconditional logistic regression was performed for alpha diversity indices (see Table 19), and MiRKAT was performed for beta diversity matrices (see Table 20).
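The recoding described above can be sketched as a simple mapping from the registry code to a binary outcome. The code values follow the labels in Table 1 (1 = positive); the function name and the representation of missing values as `None` are assumptions for illustration.

```python
def recode_receptor_status(code):
    """Return 1 for receptor-positive (code 1), 0 for every other category."""
    # Assumption: codes may arrive as strings or integers; None marks missing.
    return 1 if str(code).strip() == "1" else 0


# Hypothetical registry codes: positive, negative, borderline, not returned, missing.
codes = ["1", "2", "3", "x", None]
outcome = [recode_receptor_status(code) for code in codes]
```

Under this recoding, negative, borderline, unreturned, and missing results all fall in the reference group.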

Figure 5- Alpha and Beta diversity by Estrogen Receptor status

Unconditional logistic regression for alpha diversity indices

Table 19- Unconditional logistic regression for alpha diversity indices by Estrogen Receptor status

MiRKAT

Table 20- Microbiome regression-based test (MiRKAT) results by Estrogen Receptor status

Progesterone Receptor positive cases

A total of 206 cases were progesterone receptor (PR) positive (Table 1). We restricted the analysis to breast cancer cases only. The PR variable was recoded, with PR-positive designated as the outcome and all other categories classified as negative. Unconditional logistic regression was performed for alpha diversity indices (see Table 21), and MiRKAT was performed for beta diversity matrices (see Table 22).

Figure 6- Alpha and Beta diversity by Progesterone Receptor status

Unconditional logistic regression for alpha diversity indices

Table 21- Unconditional logistic regression for alpha diversity indices by Progesterone Receptor status

MiRKAT

Table 22- Microbiome regression-based test (MiRKAT) results by Progesterone Receptor status

Time between sample collection and breast cancer diagnosis

Cases were classified into two groups according to the time from biospecimen collection to subsequent breast cancer diagnosis. A total of 97 cases were diagnosed during the first 2 years of follow-up, while 221 cases were diagnosed after 2 years of follow-up (see Table 23 for more information about the time between sample collection and diagnosis).
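The grouping can be sketched as a single cut on follow-up time. Whether a diagnosis at exactly 2 years belongs to the earlier group is not stated in the report, so the inclusive boundary below is an assumption, as are the function name and the example times.

```python
def classify_time_to_diagnosis(years, cutoff=2.0):
    """Group a case by years from sample collection to diagnosis."""
    # Assumption: a diagnosis at exactly the cutoff counts as "within".
    return "within 2 years" if years <= cutoff else "after 2 years"


# Hypothetical follow-up times in years for three cases.
follow_up_years = [0.8, 2.0, 5.4]
groups = [classify_time_to_diagnosis(t) for t in follow_up_years]
```

Each case keeps its matched control, so the stratified analyses remain conditional on the original matched sets.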

Table 23- Mean and SD of the time from sample collection to breast cancer diagnosis

Alpha and Beta diversity for less than 2 years

Figure 7- PCoA plots and alpha diversity among cases diagnosed within two years of sample collection

MiRKAT

Table 24- Microbiome regression-based test (MiRKAT) results among cases diagnosed within two years of sample collection

Conditional logistic regression for alpha diversity indices

Table 25- Conditional logistic regression for alpha diversity indices among cases diagnosed within two years of sample collection

Conditional logistic regression

Genus

Relative abundance

Table 26- Associations between relative abundance of genera and the risk of breast cancer among cases diagnosed within two years of sample collection

Presence

Table 27- Associations between the presence of genera and the risk of breast cancer among cases diagnosed within two years of sample collection

Species

Relative abundance

Table 28- Associations between relative abundance of oral microbiome species and the risk of breast cancer among cases diagnosed within two years of sample collection

Presence

Alpha and Beta diversity for more than 2 years

Figure 8- PCoA plots and alpha diversity among cases diagnosed after two years of sample collection

MiRKAT for more than 2 years

Table 29- Microbiome regression-based test (MiRKAT) results among cases diagnosed after two years of sample collection

Conditional logistic regression for alpha diversity indices

Table 30- Conditional logistic regression for alpha diversity indices among cases diagnosed after two years of sample collection

Conditional logistic regression among cases diagnosed after 2 years of sample collection

Genus

Relative abundance

Table 31- Associations between relative abundance of genera and the risk of breast cancer among cases diagnosed after two years of sample collection

Presence

Table 32- Associations between the presence of genera and the risk of breast cancer among cases diagnosed after two years of sample collection