This document is the analytic workflow of metabarcoding data of bacteria from PRIN Tuber project The data were generated by amplification of the 16S rRNA gene. The analysis is performed using the R software and the RStudio interface.
this is a citation test: (Martin et al., 2010)
Metadata and phyloseq objects are imported from the QIIME2 pipeline.
## # A tibble: 48 × 7
## `sample-id` sampleName sampleGroup sampleType samplingSite samplingArea
## <chr> <chr> <chr> <chr> <chr> <chr>
## 1 1CO1 1CO_A 1CO not_treated C1 CONTROL
## 2 1CO2 1CO_B 1CO not_treated C1 CONTROL
## 3 1CO3 1CO_C 1CO not_treated C1 CONTROL
## 4 2CO1 2CO_A 2CO not_treated C2 CONTROL
## 5 2CO2 2CO_B 2CO not_treated C2 CONTROL
## 6 2CO3 2CO_C 2CO not_treated C2 CONTROL
## 7 3CO1 3CO_A 3CO not_treated C3 CONTROL
## 8 3CO2 3CO_B 3CO not_treated C3 CONTROL
## 9 3CO3 3CO_C 3CO not_treated C3 CONTROL
## 10 4CO1 4CO_A 4CO not_treated C4 CONTROL
## # ℹ 38 more rows
## # ℹ 1 more variable: samplingTime <chr>
## phyloseq-class experiment-level object
## otu_table() OTU Table: [ 12247 taxa and 48 samples ]
## sample_data() Sample Data: [ 48 samples by 6 sample variables ]
## tax_table() Taxonomy Table: [ 12247 taxa by 7 taxonomic ranks ]
## phy_tree() Phylogenetic Tree: [ 12247 tips and 12238 internal nodes ]
## refseq() DNAStringSet: [ 12247 reference sequences ]
First glimpse of bacterial microbiome in the whole dataset.
Historical Weather Data was obtained from the OpenWeatherMap. The data include temperature, humidity, wind speed, and other weather-related variables. The data could used to explore the relationship between weather conditions and the bacterial microbiome or Tuber biomass.
Based on the intercepts of the trend lines, it appears that 2020 started with a higher mean temperature than 2019 during the months considered. However, the overall trend shows a more rapid decrease in temperature for 2020.
## Reading layer `Reg01012024_WGS84' from data source
## `C:\Users\STEFANOGHIGNONE\ProjectsInR\data\Limiti01012024\Reg01012024\Reg01012024_WGS84.shp'
## using driver `ESRI Shapefile'
## Simple feature collection with 20 features and 5 fields
## Geometry type: MULTIPOLYGON
## Dimension: XY
## Bounding box: xmin: 313279.3 ymin: 3933683 xmax: 1312016 ymax: 5220292
## Projected CRS: WGS 84 / UTM zone 32N
The Tuber biomass was quantified using the qPCR method. The Ct values were used to estimate the mean and standard deviation of the Ct values for each sample group. The Ct values are inversely proportional to the amount of DNA in the sample. The lower the Ct value, the higher the amount of DNA in the sample.
The dataset is filtered to remove low abundant ASVs, samples with low sequencing depth, contaminants and singletons. The following steps are performed:
## phyloseq-class experiment-level object
## otu_table() OTU Table: [ 12007 taxa and 47 samples ]
## sample_data() Sample Data: [ 47 samples by 6 sample variables ]
## tax_table() Taxonomy Table: [ 12007 taxa by 7 taxonomic ranks ]
## phy_tree() Phylogenetic Tree: [ 12007 tips and 11998 internal nodes ]
## refseq() DNAStringSet: [ 12007 reference sequences ]
The Core Microbiome is defined as the set of ASVs present in both sampling areas (CONTROL AND TREATTED), not necessary present in all samples. The CORE microbiome is identified using the phyloseq package. The following steps are performed:
Then we are interseted in those 6707 ASVs that are present in both sampling areas, that represent the 58% of the whole dataset. We will use this subset for the following analysis.
## phyloseq-class experiment-level object
## otu_table() OTU Table: [ 6707 taxa and 47 samples ]
## sample_data() Sample Data: [ 47 samples by 6 sample variables ]
## tax_table() Taxonomy Table: [ 6707 taxa by 7 taxonomic ranks ]
## phy_tree() Phylogenetic Tree: [ 6707 tips and 6702 internal nodes ]
## refseq() DNAStringSet: [ 6707 reference sequences ]
| Phylum | Abundance | rel.freq |
|---|---|---|
| Acidobacteriota | 2651185 | 24.54% |
| Pseudomonadota | 2137173 | 19.79% |
| Chloroflexota | 1515232 | 14.03% |
| Actinomycetota | 1471555 | 13.62% |
| Planctomycetota | 1219076 | 11.29% |
| Verrucomicrobiota | 378871 | 3.51% |
| Gemmatimonadota | 319134 | 2.95% |
| Bacteroidota | 288739 | 2.67% |
| Myxococcota | 213390 | 1.98% |
| Methylomirabilota | 101164 | 0.94% |
| Latescibacterota | 95914 | 0.89% |
| Armatimonadota | 85081 | 0.79% |
| Bacillota | 73697 | 0.68% |
| Thermodesulfobacteriota | 47978 | 0.44% |
| NB1-j | 40819 | 0.38% |
| Deinococcota | 39250 | 0.36% |
| Bdellovibrionota | 18469 | 0.17% |
| Thermoproteota | 12594 | 0.12% |
| Candidatus_Kapabacteria | 11122 | 0.1% |
| Patescibacteria | 10411 | 0.1% |
| Elusimicrobiota | 9305 | 0.09% |
| Entotheonellaeota | 8711 | 0.08% |
| Cyanobacteriota | 8120 | 0.08% |
| MBNT15 | 8104 | 0.08% |
| Nitrospinota | 6987 | 0.06% |
| Candidatus_Kryptonia | 4604 | 0.04% |
| Nitrospirota | 3080 | 0.03% |
| Sumerlaeota | 2812 | 0.03% |
| Zixibacteria | 2430 | 0.02% |
| Thermoplasmatota | 2357 | 0.02% |
| Fibrobacterota | 2118 | 0.02% |
| Incertae_Sedis | 1840 | 0.02% |
| Ignavibacteriota | 1799 | 0.02% |
| Hydrogenedentes | 1786 | 0.02% |
| Spirochaetota | 1385 | 0.01% |
| WS2 | 1004 | 0.01% |
| WS4 | 925 | 0.01% |
| FCPU426 | 802 | 0.01% |
| GAL15 | 513 | 0% |
| Fusobacteriota | 498 | 0% |
| Chlamydiota | 486 | 0% |
| Nanoarchaeota | 405 | 0% |
| Dependentiae | 290 | 0% |
| Candidatus_Eremiobacterota | 198 | 0% |
| Abditibacteriota | 58 | 0% |
| SAR324_clade(Marine_group_B) | 45 | 0% |
[selected by taxon name]
[selected by phylogenetic similarity, among Xanthobacteraceae]
## Taxonomy Table: [1 taxa by 7 taxonomic ranks]:
## Kingdom Phylum Class
## EU133537.1.1316 "d__Bacteria" "Pseudomonadota" "Alphaproteobacteria"
## Order Family Genus Species
## EU133537.1.1316 "Hyphomicrobiales" "Xanthobacteraceae" "Bradyrhizobium" NA
## OTU Table: [1 taxa and 47 samples]
## taxa are rows
## 1BO1 1BO2 1BO3 1CO1 1CO2 1CO3 1NT1 1NT2 1NT3 1T1 1T2 1T3 2BO1
## EU133537.1.1316 581 132 229 112 191 196 476 399 174 227 348 311 178
## 2BO2 2BO3 2CO1 2CO2 2CO3 2NT1 2NT2 2NT3 2T1 2T2 2T3 3BO1 3BO3
## EU133537.1.1316 337 336 0 180 93 100 83 156 141 134 109 63 387
## 3CO1 3CO2 3CO3 3NT1 3NT2 3NT3 3T1 3T2 3T3 4BO1 4BO2 4BO3 4CO1
## EU133537.1.1316 132 197 221 98 140 202 254 269 266 78 86 107 140
## 4CO2 4CO3 4NT1 4NT2 4NT3 4T1 4T2 4T3
## EU133537.1.1316 196 339 188 193 121 408 259 381
1. PERMANOVA Results
## Permutation test for adonis under reduced model
## Terms added sequentially (first to last)
## Blocks: metadata$samplingSite
## Permutation: free
## Number of permutations: 9999
##
## adonis2(formula = bray_dist ~ samplingArea * samplingTime, data = metadata, permutations = ctrl, by = "terms", type = 3)
## Df SumOfSqs R2 F Pr(>F)
## samplingArea 1 0.3104 0.03440 1.6937 1e-04 ***
## samplingTime 1 0.4443 0.04924 2.4244 1e-04 ***
## samplingArea:samplingTime 1 0.3879 0.04300 2.1169 2e-04 ***
## Residual 43 7.8799 0.87336
## Total 46 9.0225 1.00000
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## Interaction P-value: 2e-04
Evidence for a BACI Effect: The PERMANOVA results, prior to Hellinger transformation, show a statistically significant samplingArea:samplingTime interaction (p = 2e-04). This indicates that the microbial community composition changed differently over time in the treated areas compared to the control areas. This provides strong evidence for a Before-After-Control-Impact (BACI) effect.
2. Homogeneity of Dispersion Test (Before Transformation)
##
## Permutation test for homogeneity of multivariate dispersions
## Permutation: free
## Number of permutations: 9999
##
## Response: Distances
## Df Sum Sq Mean Sq F N.Perm Pr(>F)
## Groups 3 0.070436 0.0234787 7.0244 9999 5e-04 ***
## Residuals 43 0.143725 0.0033424
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Pairwise comparisons:
## (Observed p-value below diagonal, permuted p-value above diagonal)
## CONTROL.t0 TREATED.t0 CONTROL.t1 TREATED.t1
## CONTROL.t0 0.1697000 0.0013000 0.0015
## TREATED.t0 0.1762115 0.0236000 0.0285
## CONTROL.t1 0.0011841 0.0261143 0.5340
## TREATED.t1 0.0018466 0.0286547 0.5354276
Impact of Heterogeneous Dispersion: The homogeneity of dispersion test, conducted on the original (untransformed) data, is significant (p = 5e-04). This indicates that there are significant differences in the dispersion (variance) of microbial community composition among the groups (CONTROL.t0, TREATED.t0, CONTROL.t1, TREATED.t1). As such, any conclusions drawn from the PERMANOVA model should be taken cautiously.
3. PERMANOVA Results (Hellinger Transformed)
## Permutation test for adonis under reduced model
## Terms added sequentially (first to last)
## Blocks: metadata$samplingSite
## Permutation: free
## Number of permutations: 9999
##
## adonis2(formula = bray_dist_hellinger ~ samplingArea * samplingTime, data = metadata, permutations = ctrl, by = "terms", type = 3)
## Df SumOfSqs R2 F Pr(>F)
## samplingArea 1 0.2303 0.02927 1.4068 0.0001 ***
## samplingTime 1 0.4110 0.05225 2.5111 0.0001 ***
## samplingArea:samplingTime 1 0.1871 0.02379 1.1433 0.0264 *
## Residual 43 7.0379 0.89469
## Total 46 7.8663 1.00000
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## Interaction P-value (Hellinger): 0.0264
Transformation and Continued BACI Signal: With Hellinger Transformation, PERMANOVA results present a still significant samplingArea:samplingTime interaction (p=0.0264). The BACI effect is somewhat weaker, but still present and significant after transformation.
4. Homogeneity of Dispersion Test (After Transformation):
##
## Permutation test for homogeneity of multivariate dispersions
## Permutation: free
## Number of permutations: 9999
##
## Response: Distances
## Df Sum Sq Mean Sq F N.Perm Pr(>F)
## Groups 3 0.060606 0.020202 9.297 9999 2e-04 ***
## Residuals 43 0.093438 0.002173
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Pairwise comparisons:
## (Observed p-value below diagonal, permuted p-value above diagonal)
## CONTROL.t0 TREATED.t0 CONTROL.t1 TREATED.t1
## CONTROL.t0 0.11000000 0.00030000 0.0008
## TREATED.t0 0.11367043 0.00950000 0.0127
## CONTROL.t1 0.00032167 0.00816958 0.7847
## TREATED.t1 0.00057920 0.01248912 0.78649891
Dispersion Persists after Transformation:
5. CI Metrics
## [1] "CI Divergence: 0.00785809500206752"
## [1] "CI Contribution: -0.0338172272475884"
Ambiguous Community Shift (CI Metrics):
The metrics that assess community shifts show departure between treated and control sites in time (CI Divergence: 0.00785809500206752), but a negative CI contribution (-0.0338172272475884) suggest that control sites contribute more to time dependent community composition change than treated sites.
6. Ordination Plot
The plot shows some separation between control and treated groups, but the ellipses overlap considerably. This suggests that while there is a statistically significant BACI effect, the magnitude of the effect may not be large. The visual output does suggest that the samplingArea variable effects samples more than the samplingTime variable.
Conclusions
The analysis provides evidence for a statistically significant BACI effect. The microbial communities in the treated areas changed differently over time compared to the control areas.
However, the significant heterogeneity of dispersion suggests caution. Although the Hellinger transformation attempts to mitigate differences in dispersion, it does not eliminate the issue entirely.
The analysis provides evidence for a BACI effect, but the dispersion differences require careful and nuanced interpretation.
Here’s a step-by-step approach to calculate correlations between biomass and alpha diversity indices grouped by samplingTime. It’s not possible to perform other kind of grouping becouse some groups have insufficient data for correlation calculations (e.g., groups with only 1 observation).
Correlation requires ≥3 observations per group to compute a meaningful estimate. This means each unique combination of samplingSite and samplingTime must have ≥3 samples for valid correlation calculations. In our case, All groups have only 1 sample, which is insufficient for correlation calculations.
Grouping by samplingTime
No Significant Correlations
All p-values are > 0.05, meaning there is no statistically significant linear relationship between biomass and alpha diversity indices at either time point.
Trends in t1
At t1, moderate positive correlations exist between biomass and:
*Shannon diversity (r = 0.516)
*Observed OTUs (r = 0.545)
These suggest potential biological patterns worth exploring with larger sample sizes.
Weak Relationships in t0
Correlations at t0 are very weak, indicating little association between biomass and diversity metrics.
Key Takeaway
While no significant correlations were found, the trends in t1 suggest further investigation with larger datasets or stratified analyses (e.g., by treatment group or sampling site) could be valuable.