Variable | Incident case, Mean/N | Incident case, SD/(%) | Matched control, Mean/N | Matched control, SD/(%) |
---|---|---|---|---|
Age at entry | 61.5 | 5.2 | 61.5 | 5.3 |
Age group | - | - | ||
<55 yrs | 46 | 14.5 | 49 | 15.4 |
55-59 yrs | 77 | 24.2 | 72 | 22.6 |
60-64 yrs | 106 | 33.3 | 106 | 33.3 |
65-69 yrs | 79 | 24.8 | 80 | 25.2 |
≥ 70 yrs | 10 | 3.1 | 11 | 3.5 |
Age at sample collection | 70.5 | 5.2 | 70.5 | 5.2 |
Age at sample collection group | - | - | ||
<65 yrs | 57 | 17.9 | 57 | 17.9 |
65-69 yrs | 82 | 25.8 | 81 | 25.5 |
70-75 yrs | 106 | 33.3 | 107 | 33.6 |
>75yrs | 73 | 23 | 73 | 23 |
Race | - | - | ||
Non-Hispanic White | 300 | 94.3 | 300 | 94.3 |
Others | 18 | 5.7 | 18 | 5.7 |
Education | - | - | ||
<=12 years or completed high school | 63 | 19.8 | 75 | 23.6 |
Post-high school training (technical training or some college) | 118 | 37.1 | 115 | 36.2 |
College graduate and post graduate | 137 | 43.1 | 128 | 40.3 |
BMI | 26.7 | 5.5 | 26.5 | 5.5 |
BMI Category | - | - | ||
Healthy weight | 145 | 45.6 | 148 | 46.5 |
Overweight | 106 | 33.3 | 103 | 32.4 |
Class 1-3 obesity | 67 | 21.1 | 67 | 21.1 |
Age at menarche | - | - | ||
<=12 years old | 137 | 43.1 | 161 | 50.6 |
13-14 years old | 148 | 46.5 | 134 | 42.1 |
≥ 15 years old | 33 | 10.4 | 23 | 7.2 |
Number of live-born children | - | - | ||
Never had a child | 52 | 16.4 | 47 | 14.8 |
1 | 31 | 9.7 | 38 | 11.9 |
2 | 88 | 27.7 | 73 | 23 |
3-4 | 114 | 35.8 | 121 | 38.1 |
5->=10 | 33 | 10.4 | 39 | 12.3 |
Age at first live birth | - | - | ||
Never gave birth | 48 | 15.1 | 47 | 14.8 |
<16-19 | 47 | 14.8 | 49 | 15.4 |
20-24 | 143 | 45 | 141 | 44.3 |
25-29 | 62 | 19.5 | 56 | 17.6 |
>= 30 | 18 | 5.7 | 25 | 7.9 |
Age at menopause | - | - | ||
<40 | 45 | 14.2 | 54 | 17 |
40-44 | 39 | 12.3 | 46 | 14.5 |
45-49 | 61 | 19.2 | 60 | 18.9 |
50-54 | 125 | 39.3 | 124 | 39 |
≥55 | 30 | 9.4 | 17 | 5.3 |
Still menstruating | 18 | 5.7 | 17 | 5.3 |
Hormone status | - | - | ||
Never | 122 | 38.4 | 125 | 39.3 |
Current | 164 | 51.6 | 167 | 52.5 |
Former | 32 | 10.1 | 26 | 8.2 |
Total years used replacement hormones | - | - | ||
Never used hormones | 122 | 38.4 | 125 | 39.3 |
<=5 years | 67 | 21.1 | 65 | 20.4 |
5-9 years | 45 | 14.2 | 44 | 13.8 |
≥10 years | 84 | 26.4 | 84 | 26.4 |
First-degree female relatives with breast cancer | - | - | ||
No | 250 | 78.6 | 264 | 83 |
Yes | 54 | 17 | 42 | 13.2 |
Unknown/Missing | 14 | 4.4 | 12 | 3.8 |
Cigarette use | - | - | ||
Never | 134 | 42.1 | 134 | 42.1 |
Former | 132 | 41.5 | 132 | 41.5 |
Current | 38 | 11.9 | 38 | 11.9 |
Unknown/Missing | 14 | 4.4 | 14 | 4.4 |
Alcohol from alcoholic drinks (beer, wine, liquor) (g/day) | 6.6 | 13.6 | 6.3 | 14.4 |
Alcohol from alcoholic drinks (beer, wine, liquor) (g/day) group | - | - | ||
0 | 57 | 17.9 | 70 | 22 |
>0-1 | 63 | 19.8 | 77 | 24.2 |
>1-2 | 58 | 18.2 | 53 | 16.7 |
>2-5 | 46 | 14.5 | 35 | 11 |
>5-10 | 32 | 10.1 | 22 | 6.9 |
>10 | 62 | 19.5 | 61 | 19.2 |
Self-reported health condition | - | - | ||
Excellent | 73 | 23 | 62 | 19.5 |
Very good | 131 | 41.2 | 130 | 40.9 |
Good | 92 | 28.9 | 96 | 30.2 |
Fair | 22 | 6.9 | 30 | 9.4 |
Breast cancer grade | - | - | ||
Grade I or (well) differentiated | 76 | 23.9 | 0 | 0 |
Grade II or moderately (well) differentiated | 126 | 39.6 | 0 | 0 |
Grade III or poorly differentiated | 74 | 23.3 | 1 | 0.3 |
Grade IV or undifferentiated/anaplastic | 12 | 3.8 | 0 | 0 |
Grade and differentiation not stated | 30 | 9.4 | 0 | 0 |
NA | - | 337 | 106 | |
Estrogen Receptor for breast cancer | - | - | ||
0 = Test Not Done | 0 | 0 | 0 | 0 |
1 Test Done, Results Positive/Elevated | 256 | 80.5 | 1 | 0.3 |
2 = Test Done, Results Negative/Normal | 30 | 9.4 | 0 | 0 |
3 Test Done, Results Borderline or Undetermined Whether Positive or Negative | 1 | 0.3 | 0 | 0 |
x=Registry did not return hormone receptor data | 12 | 3.8 | 0 | 0 |
NA | 0 | 0 | 317 | 99.7 |
Missing | 19 | 6 | 0 | 0 |
Progesterone receptor for breast cancer | - | - | ||
0 = Test Not Done | 0 | 0 | 0 | 0 |
1 Test Done, Results Positive/Elevated | 206 | 64.8 | 1 | 0.3 |
2 = Test Done, Results Negative/Normal | 72 | 22.6 | 0 | 0 |
3 Test Done, Results Borderline or Undetermined Whether Positive or Negative | 2 | 0.6 | 0 | 0 |
x=Registry did not return hormone receptor data | 13 | 4.1 | 0 | 0 |
NA | 0 | 0 | 317 | 99.7 |
Missing | 25 | 7.9 | 0 | 0 |
Breast cancer behavior (OLD_BREAST_BEHV) | - | - | ||
2 = in situ | 67 | 21.1 | 0 | 0 |
3 = malignant, primary site | 251 | 78.9 | 0 | 0 |
Oral Microbiome and Breast Cancer Risk in the NIH-AARP
Abstract
Introduction: Breast cancer is the most common cancer among women. Studies suggest a link between periodontal disease and breast cancer risk, but whether oral microbiome differences precede or result from cancer remains unclear. We examined this prospective association in the NIH-AARP Diet and Health Study.
Methods: We matched 348 women who developed breast cancer after oral wash collection to controls using incidence density sampling. Controls were matched on age at sample collection, race and ethnicity, and smoking status (see Figure 1 for details on the selection of cases and controls). DNA was extracted, and shotgun metagenomic sequencing was performed. We used conditional logistic regression to estimate odds ratios (ORs) for the associations of alpha diversity, beta diversity, and species- and gene-level metrics with breast cancer risk, adjusting for potential confounders.
Characteristics of study participants
Characteristics of study participants are shown in Table 1.
Alpha diversity
We did not observe differences in alpha diversity indices by breast cancer status (see Figure 2).
Means and SDs by breast cancer status are shown in Table 2.
Conditional Logistic regression for alpha diversity
Standardization
Alpha diversity indices were standardized based on standard deviation from controls using the following formula:
\[ z = \frac{x - \mathrm{mean}(x_{\mathrm{controls}})}{\mathrm{sd}(x_{\mathrm{controls}})} \tag{1}\]
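As an illustration of Equation 1, the following Python sketch standardizes an index using only the controls' mean and SD. The function name, the sample values, and the use of the sample SD (n - 1 denominator) are assumptions made for this example, not details taken from the analysis code.

```python
from statistics import mean, stdev

def standardize_by_controls(values, is_control):
    """Center and scale every value by the mean and SD of the controls only."""
    controls = [v for v, c in zip(values, is_control) if c]
    mu = mean(controls)
    sigma = stdev(controls)  # sample SD (n - 1 denominator); an assumption
    return [(v - mu) / sigma for v in values]

# Hypothetical Shannon index values; True marks a control sample.
shannon = [3.1, 2.8, 3.4, 3.0, 2.6, 3.2]
is_control = [False, True, False, True, False, True]
z = standardize_by_controls(shannon, is_control)
```

With this scaling, a one-unit change in the standardized index corresponds to one control SD, so odds ratios from the regression can be read per SD of the control distribution.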
Conditional logistic regression formula
We used the following formula to perform conditional logistic regression:
\[ Breastcan \sim index + strata(SET) + BMIcur + Educ + Livechild + Menopage+ Rel1dbreast + Alcdrinks \tag{2}\]
Breastcan: breast cancer status; SET: matching factor; BMIcur: body mass index at sample collection; Educ: education level; Livechild: number of live-born children; Menopage: age at menopause; Rel1dbreast: first-degree female relatives with breast cancer; Alcdrinks: alcohol from alcoholic drinks (beer, wine, liquor) (g/day).
Table 3 shows that Pielou’s evenness was statistically significantly associated with breast cancer status after adjusting for confounders.
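To make the conditional likelihood behind Equation 2 concrete, here is a minimal Python sketch for the simplest setting: 1:1 matched sets (Table 1 lists equal numbers of cases and controls) and a single standardized index with no covariates. In that setting each pair contributes 1 / (1 + exp(-beta * d)), where d is the case value minus the control value. The function name and the data are hypothetical, and this is not the routine used in the analysis, which also adjusted for the covariates listed above.

```python
import math

def fit_clogit_pairs(case_x, control_x, max_iter=50, tol=1e-10):
    """Newton-Raphson fit of a one-covariate conditional logistic model
    for 1:1 matched pairs. Returns (beta, standard_error)."""
    diffs = [c - k for c, k in zip(case_x, control_x)]
    beta = 0.0
    for _ in range(max_iter):
        probs = [1.0 / (1.0 + math.exp(-beta * d)) for d in diffs]
        # Score (first derivative) and information (negative second derivative)
        score = sum(d * (1.0 - p) for d, p in zip(diffs, probs))
        info = sum(d * d * p * (1.0 - p) for d, p in zip(diffs, probs))
        step = score / info
        beta += step
        if abs(step) < tol:
            break
    return beta, 1.0 / math.sqrt(info)

# Hypothetical standardized index values for six matched pairs.
case_z = [0.5, -0.2, 1.1, 0.3, -0.4, 0.9]
control_z = [0.1, 0.3, 0.2, 0.6, 0.0, -0.1]
beta, se = fit_clogit_pairs(case_z, control_z)
odds_ratio = math.exp(beta)  # OR per one control SD of the index
```

The matching factor never appears as a coefficient: it enters only through the pairing of each case with its own control, which is what strata(SET) expresses in Equation 2.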
Quantiles
Alpha diversity indices were also analyzed using quartiles estimated from the distribution among the controls. See Table 4 for more details.
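A minimal Python sketch of control-based quartiles: cut points are estimated from the controls only and then applied to all samples. The function name and data are hypothetical, and the interpolation rule (the default of Python's `statistics.quantiles`) and the handling of values equal to a cut point are assumptions that may differ from the original analysis.

```python
from bisect import bisect_left
from statistics import quantiles

def quartile_groups(values, is_control):
    """Assign each value to quartile 1-4 using cut points from controls only."""
    controls = [v for v, c in zip(values, is_control) if c]
    cuts = quantiles(controls, n=4)  # three cut points: Q1, median, Q3
    # bisect_left puts a value equal to a cut point into the lower group
    groups = [bisect_left(cuts, v) + 1 for v in values]
    return groups, cuts

# Hypothetical index values: seven controls followed by three cases.
values = [1, 2, 3, 4, 5, 6, 7, 0.5, 4.5, 9]
is_control = [True] * 7 + [False] * 3
groups, cuts = quartile_groups(values, is_control)
```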
Beta diversity
No distinct clusters were observed in the PCoA plots (Figure 3).
MiRKAT
The microbiome regression-based kernel association test (MiRKAT), adjusting for confounders, was not statistically significant (Bray-Curtis MiRKAT p = 0.2768; Jaccard MiRKAT p = 0.3035). This analysis was also conducted without the matching factor and covariates (Table 5).
As the MiRKAT does not perform conditional logistic regression, matching factor was no used as a covariate in other models
Percentage of human DNA
Conditional logistic regressions were also performed for the percentage of human DNA with and without adjusting for covariates.
Conditional logistic regression results
Genus level
Clogit for each genus associated with breast cancer in the Ghana study
Relative abundance
Presence
Species level
The following conditional logistic regressions were run:
Relative abundance
Clogit for the scaled relative abundance (only for species with relative abundance >0.1%)
CLR-transformed relative abundance
Clogit for CLR-transformed data (only for species with relative abundance >0.1%)
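For reference, the centered log-ratio (CLR) transform subtracts the mean log abundance of a sample from each log abundance. The Python sketch below shows this for one sample; the pseudocount used to handle zeros and its value are assumptions, since the text does not state how zeros were treated.

```python
import math

def clr(abundances, pseudocount=1e-6):
    """Centered log-ratio transform of one sample's relative abundances."""
    # The pseudocount avoids log(0); its value is an assumption.
    logs = [math.log(a + pseudocount) for a in abundances]
    center = sum(logs) / len(logs)
    return [value - center for value in logs]

# Hypothetical relative abundances for four species in one sample.
sample = [0.50, 0.30, 0.15, 0.05]
transformed = clr(sample)
```

CLR values sum to zero within a sample, so each value is read relative to the sample's geometric mean abundance.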
Presence of species
Clogit for the presence of species with prevalence between 5% and 95%
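The prevalence filter can be sketched in Python as follows. The function name and data are hypothetical, and treating the 5% and 95% bounds as inclusive is an assumption.

```python
def prevalence_filter(presence_by_species, low=0.05, high=0.95):
    """Keep species whose prevalence across samples lies within [low, high]."""
    kept = {}
    for species, flags in presence_by_species.items():
        prevalence = sum(flags) / len(flags)
        if low <= prevalence <= high:  # inclusive bounds are an assumption
            kept[species] = prevalence
    return kept

# Hypothetical presence flags (1 = detected) across 20 samples.
presence = {
    "species_a": [1] * 10 + [0] * 10,  # 50% prevalence, kept
    "species_b": [1] * 20,             # 100% prevalence, dropped
    "species_c": [0] * 20,             # 0% prevalence, dropped
    "species_d": [1] * 2 + [0] * 18,   # 10% prevalence, kept
}
kept = prevalence_filter(presence)
```

Species that are nearly always or nearly never present carry little information in a presence model, which is the usual reason for restricting to an intermediate prevalence range.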
Interaction between relative abundance and presence
Clogit including a term for the presence and then a term for the interaction between the scaled relative abundance and presence (include both betas)
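One way to write this model, as a sketch: let P be the presence indicator (0/1) and A the scaled relative abundance of a species, with Z the covariates. The symbols P, A, Z, and the coefficients are notation introduced here, and the covariate set is assumed to match Equation 2.

\[ \operatorname{logit} \Pr(Breastcan = 1) = \alpha_{SET} + \beta_1 P + \beta_2 (P \times A) + \gamma^{\top} Z \]

Under this coding both terms are zero when the species is absent, \(\beta_1\) compares carriers at A = 0 with non-carriers, and \(\beta_2\) is the change in log odds per unit of A among carriers.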
Gene level
Relative abundance
Clogit for the scaled relative abundance (only for genes with relative abundance >0.01%, i.e., >100 copies per million; 81 genes)
Presence
Clogit for the presence of genes using the same threshold mentioned above (>100 copies per million) (81 genes)
Interaction
Clogit including a term for the presence and then a term for the interaction between the scaled relative abundance and presence (include both betas)
Oral pathogen complexes
Because of previously observed associations between periodontal disease and breast cancer risk, we pre-specified the hypothesis that specific periodontal pathogens (see Table 16) would be associated with breast cancer risk.
Species | Microbial complex |
---|---|
Tannerella forsythia | Red complex |
Treponema denticola | Red complex |
Porphyromonas gingivalis | Red complex |
Prevotella intermedia | Orange complex |
Prevotella nigrescens | Orange complex |
Parvimonas micra (formerly Peptostreptococcus micros) | Orange complex |
Fusobacterium nucleatum | Orange complex |
Fusobacterium periodonticum | Orange complex |
Campylobacter gracilis | Orange complex |
Campylobacter rectus | Orange complex |
Campylobacter showae | Orange complex |
Eubacterium nodatum | Orange complex |
Streptococcus constellatus | Orange complex |
Samples were classified based on the presence of any species within each complex listed in Table 16. We then conducted conditional logistic regression, including terms for both the presence of any pathogen from each complex and the total relative abundance of species within each complex.
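The per-sample classification can be sketched in Python as below. The function name, the abundance values, and the data layout are hypothetical, and the complex definitions shown are an abbreviated subset of Table 16 used only to keep the example short.

```python
def summarize_complexes(sample, complexes):
    """For one sample, flag whether any member of each complex is present
    and sum the relative abundance of its members.

    sample: dict mapping species name to relative abundance
    complexes: dict mapping complex name to a set of species names
    """
    summary = {}
    for name, members in complexes.items():
        total = sum(sample.get(species, 0.0) for species in members)
        summary[name] = {"present": total > 0, "total_abundance": total}
    return summary

# Abbreviated subset of Table 16 (illustration only).
complexes = {
    "red": {"Porphyromonas gingivalis", "Tannerella forsythia"},
    "orange": {"Fusobacterium nucleatum", "Prevotella intermedia"},
}
# Hypothetical relative abundances for one sample.
sample = {
    "Porphyromonas gingivalis": 0.002,
    "Fusobacterium nucleatum": 0.010,
    "Prevotella intermedia": 0.005,
    "Streptococcus mitis": 0.300,
}
summary = summarize_complexes(sample, complexes)
```

The two values per complex correspond to the two model terms described above: presence of any pathogen and total relative abundance.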
Invasive breast cancer
We excluded in situ breast cancer cases and their matched controls. See the breast cancer behavior variable in Table 1.
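A minimal Python sketch of this exclusion, which drops every matched set whose case is in situ so that the controls leave with their case. The behavior codes (2 = in situ, 3 = malignant) follow Table 1 and SET is the matching factor; the row layout and the `case` and `behavior` field names are hypothetical.

```python
def keep_invasive_sets(rows):
    """Drop every matched set whose case has in situ behavior (code 2)."""
    in_situ_sets = {r["SET"] for r in rows if r["case"] and r["behavior"] == 2}
    return [r for r in rows if r["SET"] not in in_situ_sets]

# Hypothetical rows: one case and one control per matched set.
rows = [
    {"SET": 1, "case": True, "behavior": 3},
    {"SET": 1, "case": False, "behavior": None},
    {"SET": 2, "case": True, "behavior": 2},
    {"SET": 2, "case": False, "behavior": None},
    {"SET": 3, "case": True, "behavior": 3},
    {"SET": 3, "case": False, "behavior": None},
]
invasive = keep_invasive_sets(rows)
```

Removing whole sets keeps the matching intact for the conditional models.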
Alpha and Beta diversity for invasive breast cancer
Based on rarefaction at 1 million reads, metrics for alpha diversity were generated. Bray-Curtis and Jaccard distance matrices were computed based on 502 samples (excluding 67 in situ cases and their controls).
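For reference, the two distances can be sketched in Python for a pair of samples. The function names and counts are hypothetical, and the Jaccard distance is computed here on presence/absence, which is an assumption about the variant used.

```python
def bray_curtis(u, v):
    """Bray-Curtis dissimilarity between two abundance vectors."""
    numerator = sum(abs(a - b) for a, b in zip(u, v))
    denominator = sum(a + b for a, b in zip(u, v))
    return numerator / denominator

def jaccard(u, v):
    """Jaccard distance on presence/absence (binary variant; an assumption)."""
    present_u = {i for i, a in enumerate(u) if a > 0}
    present_v = {i for i, b in enumerate(v) if b > 0}
    shared = len(present_u & present_v)
    return 1.0 - shared / len(present_u | present_v)

# Hypothetical rarefied counts for four species in two samples.
sample_1 = [10, 0, 5, 5]
sample_2 = [5, 5, 0, 10]
bc = bray_curtis(sample_1, sample_2)
jd = jaccard(sample_1, sample_2)
```

Both distances range from 0 (identical composition or membership) to 1 (nothing shared).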
MiRKAT
The microbiome regression-based test, adjusting for confounders, was not statistically significant for any distance matrix.
ER+ and PR+
Estrogen receptor positive cases
A total of 256 cases were estrogen receptor (ER) positive (Table 1). We restricted the analysis to breast cancer cases only. The ER variable was recoded, with ER-positive designated as the outcome and all other categories classified as negative. Unconditional logistic regression was performed for alpha diversity indices (see Table 19), and MiRKAT was performed for beta diversity matrices (see Table 20).
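The recoding can be sketched in Python as follows, using the category counts for cases in Table 1. How the raw codes are stored (here as strings, with `None` for missing) and the function name are assumptions.

```python
def recode_receptor(code):
    """Return 1 for 'Test Done, Results Positive/Elevated' (code 1), else 0."""
    return 1 if str(code).strip() == "1" else 0

# ER categories among cases, with counts taken from Table 1.
er_codes = ["1"] * 256 + ["2"] * 30 + ["3"] * 1 + ["x"] * 12 + [None] * 19
er_positive = [recode_receptor(code) for code in er_codes]
n_positive = sum(er_positive)
n_other = len(er_positive) - n_positive
```

Under this recoding, borderline, unreturned, and missing results are grouped with the negatives, as the text describes. The same function would apply to the progesterone receptor variable.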
Unconditional logistic regression for alpha diversity indices
MiRKAT
Progesterone Receptor positive cases
A total of 206 cases were progesterone receptor (PR) positive (Table 1). We restricted the analysis to breast cancer cases only. The PR variable was recoded, with PR-positive designated as the outcome and all other categories classified as negative. Unconditional logistic regression was performed for alpha diversity indices (see Table 21), and MiRKAT was performed for beta diversity matrices (see Table 22).
Unconditional logistic regression for alpha diversity indices
MiRKAT
Time between sample collection and breast cancer diagnosis
Cases were classified into two groups according to the time from biospecimen collection to breast cancer diagnosis: 97 cases were diagnosed during the first 2 years of follow-up, and 221 cases were diagnosed after 2 years of follow-up (see Table 23 for more information about the time between sample collection and diagnosis).
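A minimal Python sketch of this grouping. The function name, the dates, the 365.25-day year, and the assignment of a case diagnosed at exactly 2 years to the early group are all assumptions.

```python
from datetime import date

def followup_group(collected, diagnosed, cutoff_years=2.0):
    """Classify a case as 'early' or 'late' by time from collection to diagnosis."""
    years = (diagnosed - collected).days / 365.25
    # A diagnosis at exactly the cutoff is counted as early (an assumption).
    return "early" if years <= cutoff_years else "late"

# Hypothetical collection and diagnosis dates.
group_a = followup_group(date(2004, 1, 1), date(2005, 6, 1))
group_b = followup_group(date(2004, 1, 1), date(2008, 1, 1))
```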