Introduction

This document is the analytic workflow of metabarcoding data of bacteria from PRIN Tuber project The data were generated by amplification of the 16S rRNA gene. The analysis is performed using the R software and the RStudio interface.

this is a citation test: (Martin et al., 2010)

Data Import

Metadata and phyloseq objects are imported from the QIIME2 pipeline.

## # A tibble: 48 × 7
##    `sample-id` sampleName sampleGroup sampleType  samplingSite samplingArea
##    <chr>       <chr>      <chr>       <chr>       <chr>        <chr>       
##  1 1CO1        1CO_A      1CO         not_treated C1           CONTROL     
##  2 1CO2        1CO_B      1CO         not_treated C1           CONTROL     
##  3 1CO3        1CO_C      1CO         not_treated C1           CONTROL     
##  4 2CO1        2CO_A      2CO         not_treated C2           CONTROL     
##  5 2CO2        2CO_B      2CO         not_treated C2           CONTROL     
##  6 2CO3        2CO_C      2CO         not_treated C2           CONTROL     
##  7 3CO1        3CO_A      3CO         not_treated C3           CONTROL     
##  8 3CO2        3CO_B      3CO         not_treated C3           CONTROL     
##  9 3CO3        3CO_C      3CO         not_treated C3           CONTROL     
## 10 4CO1        4CO_A      4CO         not_treated C4           CONTROL     
## # ℹ 38 more rows
## # ℹ 1 more variable: samplingTime <chr>

## phyloseq-class experiment-level object
## otu_table()   OTU Table:         [ 12247 taxa and 48 samples ]
## sample_data() Sample Data:       [ 48 samples by 6 sample variables ]
## tax_table()   Taxonomy Table:    [ 12247 taxa by 7 taxonomic ranks ]
## phy_tree()    Phylogenetic Tree: [ 12247 tips and 12238 internal nodes ]
## refseq()      DNAStringSet:      [ 12247 reference sequences ]

First glimpse of bacterial microbiome in the whole dataset.

Historical Weather Data

Historical Weather Data was obtained from the OpenWeatherMap. The data include temperature, humidity, wind speed, and other weather-related variables. The data could used to explore the relationship between weather conditions and the bacterial microbiome or Tuber biomass.

Based on the intercepts of the trend lines, it appears that 2020 started with a higher mean temperature than 2019 during the months considered. However, the overall trend shows a more rapid decrease in temperature for 2020.

Sampling site location

## Reading layer `Reg01012024_WGS84' from data source 
##   `C:\Users\STEFANOGHIGNONE\ProjectsInR\data\Limiti01012024\Reg01012024\Reg01012024_WGS84.shp' 
##   using driver `ESRI Shapefile'
## Simple feature collection with 20 features and 5 fields
## Geometry type: MULTIPOLYGON
## Dimension:     XY
## Bounding box:  xmin: 313279.3 ymin: 3933683 xmax: 1312016 ymax: 5220292
## Projected CRS: WGS 84 / UTM zone 32N

Tuber Biomass Estimation

The Tuber biomass was quantified using the qPCR method. The Ct values were used to estimate the mean and standard deviation of the Ct values for each sample group. The Ct values are inversely proportional to the amount of DNA in the sample. The lower the Ct value, the higher the amount of DNA in the sample.

Data Preprocessing

The dataset is filtered to remove low abundant ASVs, samples with low sequencing depth, contaminants and singletons. The following steps are performed:

retain samples with more than 10k reads
prune OTUs that are not present in at least one sample
subset the data to only include Bacteria and Archaea
remove contaminats (Mitochondria and Chloroplast) and NA phylum sequences
remove singletons, features that appear less than 3 times in the whole dataset

## phyloseq-class experiment-level object
## otu_table()   OTU Table:         [ 12007 taxa and 47 samples ]
## sample_data() Sample Data:       [ 47 samples by 6 sample variables ]
## tax_table()   Taxonomy Table:    [ 12007 taxa by 7 taxonomic ranks ]
## phy_tree()    Phylogenetic Tree: [ 12007 tips and 11998 internal nodes ]
## refseq()      DNAStringSet:      [ 12007 reference sequences ]

Core Microbiome

The Core Microbiome is defined as the set of ASVs present in both sampling areas (CONTROL AND TREATTED), not necessary present in all samples. The CORE microbiome is identified using the phyloseq package. The following steps are performed:

Subset the phyloseq object by sampling area
Get the ASVs present in each sampling area
Create a list of ASVs for the Venn diagram
Create and plot the Venn diagram

Then we are interseted in those 6707 ASVs that are present in both sampling areas, that represent the 58% of the whole dataset. We will use this subset for the following analysis.

## phyloseq-class experiment-level object
## otu_table()   OTU Table:         [ 6707 taxa and 47 samples ]
## sample_data() Sample Data:       [ 47 samples by 6 sample variables ]
## tax_table()   Taxonomy Table:    [ 6707 taxa by 7 taxonomic ranks ]
## phy_tree()    Phylogenetic Tree: [ 6707 tips and 6702 internal nodes ]
## refseq()      DNAStringSet:      [ 6707 reference sequences ]

Basic statistics

Count at Phylum level, shows that the dataset comprises 46 Phyla.

Phylum	Abundance	rel.freq
Acidobacteriota	2651185	24.54%
Pseudomonadota	2137173	19.79%
Chloroflexota	1515232	14.03%
Actinomycetota	1471555	13.62%
Planctomycetota	1219076	11.29%
Verrucomicrobiota	378871	3.51%
Gemmatimonadota	319134	2.95%
Bacteroidota	288739	2.67%
Myxococcota	213390	1.98%
Methylomirabilota	101164	0.94%
Latescibacterota	95914	0.89%
Armatimonadota	85081	0.79%
Bacillota	73697	0.68%
Thermodesulfobacteriota	47978	0.44%
NB1-j	40819	0.38%
Deinococcota	39250	0.36%
Bdellovibrionota	18469	0.17%
Thermoproteota	12594	0.12%
Candidatus_Kapabacteria	11122	0.1%
Patescibacteria	10411	0.1%
Elusimicrobiota	9305	0.09%
Entotheonellaeota	8711	0.08%
Cyanobacteriota	8120	0.08%
MBNT15	8104	0.08%
Nitrospinota	6987	0.06%
Candidatus_Kryptonia	4604	0.04%
Nitrospirota	3080	0.03%
Sumerlaeota	2812	0.03%
Zixibacteria	2430	0.02%
Thermoplasmatota	2357	0.02%
Fibrobacterota	2118	0.02%
Incertae_Sedis	1840	0.02%
Ignavibacteriota	1799	0.02%
Hydrogenedentes	1786	0.02%
Spirochaetota	1385	0.01%
WS2	1004	0.01%
WS4	925	0.01%
FCPU426	802	0.01%
GAL15	513	0%
Fusobacteriota	498	0%
Chlamydiota	486	0%
Nanoarchaeota	405	0%
Dependentiae	290	0%
Candidatus_Eremiobacterota	198	0%
Abditibacteriota	58	0%
SAR324_clade(Marine_group_B)	45	0%

Due to the high number of ASVs, we will focus on the most abundant Phyla: we’ll Pick the first 10 most abundant Phyla and merge others into “Other”.

Distribution of the most abundant PHYLA in the dataset.

Distribution of the most abundant CLASS in the dataset.

Hunt for Bradyrhizobium

Pseudomonadota

Bradyrhizobium

[selected by taxon name]

Xanthobacteraceae

[selected by phylogenetic similarity, among Xanthobacteraceae]

## Taxonomy Table:     [1 taxa by 7 taxonomic ranks]:
##                 Kingdom       Phylum           Class                
## EU133537.1.1316 "d__Bacteria" "Pseudomonadota" "Alphaproteobacteria"
##                 Order              Family              Genus            Species
## EU133537.1.1316 "Hyphomicrobiales" "Xanthobacteraceae" "Bradyrhizobium" NA

## OTU Table:          [1 taxa and 47 samples]
##                      taxa are rows
##                 1BO1 1BO2 1BO3 1CO1 1CO2 1CO3 1NT1 1NT2 1NT3 1T1 1T2 1T3 2BO1
## EU133537.1.1316  581  132  229  112  191  196  476  399  174 227 348 311  178
##                 2BO2 2BO3 2CO1 2CO2 2CO3 2NT1 2NT2 2NT3 2T1 2T2 2T3 3BO1 3BO3
## EU133537.1.1316  337  336    0  180   93  100   83  156 141 134 109   63  387
##                 3CO1 3CO2 3CO3 3NT1 3NT2 3NT3 3T1 3T2 3T3 4BO1 4BO2 4BO3 4CO1
## EU133537.1.1316  132  197  221   98  140  202 254 269 266   78   86  107  140
##                 4CO2 4CO3 4NT1 4NT2 4NT3 4T1 4T2 4T3
## EU133537.1.1316  196  339  188  193  121 408 259 381

Differential abundance analysis

all not treated vs treated

Among TREATED samples, t0 vs t1

Among CONTROL samples, t0 vs t1

Before-After-Control-Impact Analysis

1. PERMANOVA Results

## Permutation test for adonis under reduced model
## Terms added sequentially (first to last)
## Blocks:  metadata$samplingSite 
## Permutation: free
## Number of permutations: 9999
## 
## adonis2(formula = bray_dist ~ samplingArea * samplingTime, data = metadata, permutations = ctrl, by = "terms", type = 3)
##                           Df SumOfSqs      R2      F Pr(>F)    
## samplingArea               1   0.3104 0.03440 1.6937  1e-04 ***
## samplingTime               1   0.4443 0.04924 2.4244  1e-04 ***
## samplingArea:samplingTime  1   0.3879 0.04300 2.1169  2e-04 ***
## Residual                  43   7.8799 0.87336                  
## Total                     46   9.0225 1.00000                  
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

## Interaction P-value: 2e-04

Evidence for a BACI Effect: The PERMANOVA results, prior to Hellinger transformation, show a statistically significant samplingArea:samplingTime interaction (p = 2e-04). This indicates that the microbial community composition changed differently over time in the treated areas compared to the control areas. This provides strong evidence for a Before-After-Control-Impact (BACI) effect.

2. Homogeneity of Dispersion Test (Before Transformation)

## 
## Permutation test for homogeneity of multivariate dispersions
## Permutation: free
## Number of permutations: 9999
## 
## Response: Distances
##           Df   Sum Sq   Mean Sq      F N.Perm Pr(>F)    
## Groups     3 0.070436 0.0234787 7.0244   9999  5e-04 ***
## Residuals 43 0.143725 0.0033424                         
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Pairwise comparisons:
## (Observed p-value below diagonal, permuted p-value above diagonal)
##            CONTROL.t0 TREATED.t0 CONTROL.t1 TREATED.t1
## CONTROL.t0             0.1697000  0.0013000     0.0015
## TREATED.t0  0.1762115             0.0236000     0.0285
## CONTROL.t1  0.0011841  0.0261143                0.5340
## TREATED.t1  0.0018466  0.0286547  0.5354276

Impact of Heterogeneous Dispersion: The homogeneity of dispersion test, conducted on the original (untransformed) data, is significant (p = 5e-04). This indicates that there are significant differences in the dispersion (variance) of microbial community composition among the groups (CONTROL.t0, TREATED.t0, CONTROL.t1, TREATED.t1). As such, any conclusions drawn from the PERMANOVA model should be taken cautiously.

3. PERMANOVA Results (Hellinger Transformed)

## Permutation test for adonis under reduced model
## Terms added sequentially (first to last)
## Blocks:  metadata$samplingSite 
## Permutation: free
## Number of permutations: 9999
## 
## adonis2(formula = bray_dist_hellinger ~ samplingArea * samplingTime, data = metadata, permutations = ctrl, by = "terms", type = 3)
##                           Df SumOfSqs      R2      F Pr(>F)    
## samplingArea               1   0.2303 0.02927 1.4068 0.0001 ***
## samplingTime               1   0.4110 0.05225 2.5111 0.0001 ***
## samplingArea:samplingTime  1   0.1871 0.02379 1.1433 0.0264 *  
## Residual                  43   7.0379 0.89469                  
## Total                     46   7.8663 1.00000                  
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

## Interaction P-value (Hellinger): 0.0264

Transformation and Continued BACI Signal: With Hellinger Transformation, PERMANOVA results present a still significant samplingArea:samplingTime interaction (p=0.0264). The BACI effect is somewhat weaker, but still present and significant after transformation.

4. Homogeneity of Dispersion Test (After Transformation):

## 
## Permutation test for homogeneity of multivariate dispersions
## Permutation: free
## Number of permutations: 9999
## 
## Response: Distances
##           Df   Sum Sq  Mean Sq     F N.Perm Pr(>F)    
## Groups     3 0.060606 0.020202 9.297   9999  2e-04 ***
## Residuals 43 0.093438 0.002173                        
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Pairwise comparisons:
## (Observed p-value below diagonal, permuted p-value above diagonal)
##            CONTROL.t0 TREATED.t0 CONTROL.t1 TREATED.t1
## CONTROL.t0            0.11000000 0.00030000     0.0008
## TREATED.t0 0.11367043            0.00950000     0.0127
## CONTROL.t1 0.00032167 0.00816958                0.7847
## TREATED.t1 0.00057920 0.01248912 0.78649891

Dispersion Persists after Transformation:

The dispersion test is still significant after Hellinger transformation. (p=2e-04).
Therefore, even with transformation, the assumption of equal dispersion among groups is not met, so caution is still advised when interpreting PERMANOVA output.

5. CI Metrics

## [1] "CI Divergence: 0.00785809500206752"

## [1] "CI Contribution: -0.0338172272475884"

Ambiguous Community Shift (CI Metrics):

The metrics that assess community shifts show departure between treated and control sites in time (CI Divergence: 0.00785809500206752), but a negative CI contribution (-0.0338172272475884) suggest that control sites contribute more to time dependent community composition change than treated sites.

6. Ordination Plot

The plot shows some separation between control and treated groups, but the ellipses overlap considerably. This suggests that while there is a statistically significant BACI effect, the magnitude of the effect may not be large. The visual output does suggest that the samplingArea variable effects samples more than the samplingTime variable.

Conclusions

The analysis provides evidence for a statistically significant BACI effect. The microbial communities in the treated areas changed differently over time compared to the control areas.

However, the significant heterogeneity of dispersion suggests caution. Although the Hellinger transformation attempts to mitigate differences in dispersion, it does not eliminate the issue entirely.

The analysis provides evidence for a BACI effect, but the dispersion differences require careful and nuanced interpretation.

Correlation

Here’s a step-by-step approach to calculate correlations between biomass and alpha diversity indices grouped by samplingTime. It’s not possible to perform other kind of grouping becouse some groups have insufficient data for correlation calculations (e.g., groups with only 1 observation).

Correlation requires ≥3 observations per group to compute a meaningful estimate. This means each unique combination of samplingSite and samplingTime must have ≥3 samples for valid correlation calculations. In our case, All groups have only 1 sample, which is insufficient for correlation calculations.

Grouping by samplingTime

No Significant Correlations

All p-values are > 0.05, meaning there is no statistically significant linear relationship between biomass and alpha diversity indices at either time point.

Trends in t1

At t1, moderate positive correlations exist between biomass and:

*Shannon diversity (r = 0.516)

*Observed OTUs (r = 0.545)

These suggest potential biological patterns worth exploring with larger sample sizes.

Weak Relationships in t0

Correlations at t0 are very weak, indicating little association between biomass and diversity metrics.

Key Takeaway

While no significant correlations were found, the trends in t1 suggest further investigation with larger datasets or stratified analyses (e.g., by treatment group or sampling site) could be valuable.

Reference List

Martin, F., Kohler, A., Murat, C., Balestrini, R., Coutinho, P. M., Jaillon, O., Montanini, B., Morin, E., Noel, B., Percudani, R., et al. (2010). Perigord black truffle genome uncovers evolutionary origins and mechanisms of symbiosis. NATURE 464:1033–1038.

PRIN TUBER: bacteria metabarcoding