Treatment	Treatment Number
NC-GP+COC	1
NATUR GR	11
SURMAX	8

Samples per Group
	1	11	8
Cecum	12	12	12
Ileum	12	12	12
Feces	10	12	12

Exploration

After filtering low-abundance sequences, the remaining data can be explored more easily. An evaluation of the overall dataset shows 585 remaining taxa, and we see below the distribution of these taxa, first at the phylum level.

This plot highlights the distribution of taxa at the Family level within treatment and sample location. We can see that while the sample type has a strong impact on distribution, treatment only affects the composition of the microbiome in a few instances.

There are other ways to explore these data, as well, such as principal coordinates analysis, and alpha diversity analysis. These methods allow us to identify differences in composition or richness based on experimental or environmental factors (i.e. Treatment, age, farm, temperature, and more).

Alpha Diversity

We start by looking at the richness of the community in each location, comaparing the different treatment groups. There are several methods for measuring community richness; one is to simply count the number of taxa found in each location. Others involve more complicated indices that take into account other features of the community, such as the distribution of taxa across various phylogenetic groups. Shannon and Simpson indices are two examples of this.

Here we find no differences between treatment groups in any of the three locations, regardless of the index used. This is not uncommon; many changes in microbial diversity occur at the level of abundance, rather than richness. In other words, the types of bacteria in a community may not change very much, but their abundance can shift.

Differentially Abundant Features

After comparing the grouping of samples by PCOA, we now look at some methods for identifying specific changes in composition based on the treatments included in this study. The data can be filtered, merged, and subset to make comparisons between any two groups, and at any taxonomic level.

GOAL = To identify features (i.e., species, OTUs, gene families, etc.) that differ according to some study condition of interest.

DESeq2

One such method for identifying differential abundance is DESeq. DESeq (Differential Expression analysis for Sequence counts) is a robust method for computing differential expression in complex datasets. This analysis compares the levels of each unique sequence in two groups, identifies sequences that are significantly up or down group 1 relative to group 2, and performs some corrections that reduce the number of false positives (False discovery correction for multiple comparisons). The results are normally reported as the Log_2 fold-change in group 1 relative to group 2; this is the estimated effect size.

\[ Log_2FC = Log_2\frac{A}{B} = Log_2A - Log_2B \]

This means that when bacteria X is higher in group A compared to group B, you will see a positive fold-change; if bacteria X is lower in group A, it will be negative. If the Log_2 fold-change is 1 that means that the real fold-change is 2, or that A is double of B. A Log_2 fold-change of -1 for A vs B means that A has half the expression of that bacteria as B.

\[ \\ Negative \ Log_2FC = A < B \\ Positive \ Log_2FC = A > B \]

ANCOMBC

Another method available for evaluating differentially abundant bacteria is ANCOMBC (Analysis of Composition of Microbiomes with Bias Correction). ANCOM was developed specifically for microbiome analysis, and accounts for the underlying structure of the data. Additionally, it can be used for comparing the composition of microbiomes in more than two populations at a time.

The underlying equations used to generate ANCOMBC’s list of differential features is a bit different from DESeq; by using both ANCOMBC and DESeq together, we can identify the features (sequences) that are identified as significant by both methods. This gives us higher confidence in the importance of these features.

Both for most comparisons in this dataset, both DESeq and ANCOM identified 10 or more significant taxa, but in most cases, only a few of those taxa were found in both analyses. You can see the DESeq output below, as well as the normalized difference in abundance for those features identified by both methods.

Significant features by method
	DESeq	ANCOM
C11 vs C8	8	28
C1 vs C8	22	31
C1 vs C11	17	16
I11 vs I8	22	3
I1 vs I8	20	7
I1 vs I11	25	8

These results suggest that within the cecum…, while in the ileum…

…………

Variance Analyses

In addition to evaluating changes in individual bacteria, we also want to understand what impact our variables of interest had on the overall population. PERMANOVA (Permutational multivariate analysis of variance) quantifies multivariate differences between groups, evaluating whether the group has a significant effect on overall gut microbiota composition or diversity.

We start by looking at the impact of treatment and sample location on alpha diversity.

metadata$TreatmentNumber = as.factor(metadata$TreatmentNumber)
aov.shannon.Treatment = aov(Alfa_Shannon ~ TreatmentNumber, data=metadata)
summary(aov.shannon.Treatment)

##                  Df Sum Sq Mean Sq F value Pr(>F)
## TreatmentNumber   2   0.22  0.1078    0.05  0.951
## Residuals       103 220.54  2.1412

TukeyHSD(aov.shannon.Treatment)

##   Tukey multiple comparisons of means
##     95% family-wise confidence level
## 
## Fit: aov(formula = Alfa_Shannon ~ TreatmentNumber, data = metadata)
## 
## $TreatmentNumber
##            diff        lwr       upr     p adj
## 8-1  0.05818931 -0.7739751 0.8903537 0.9848737
## 11-1 0.11101103 -0.7211534 0.9431754 0.9460661
## 11-8 0.05282172 -0.7673685 0.8730119 0.9871538

metadata$SampleLocation = as.factor(metadata$SampleLocation)
aov.shannon.SampleLocation = aov(Alfa_Shannon ~ SampleLocation, data=metadata)
summary(aov.shannon.SampleLocation)

##                 Df Sum Sq Mean Sq F value Pr(>F)    
## SampleLocation   2 136.28   68.14   83.08 <2e-16 ***
## Residuals      103  84.48    0.82                   
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

TukeyHSD(aov.shannon.SampleLocation)

##   Tukey multiple comparisons of means
##     95% family-wise confidence level
## 
## Fit: aov(formula = Alfa_Shannon ~ SampleLocation, data = metadata)
## 
## $SampleLocation
##          diff        lwr         upr     p adj
## F-C -2.132055 -2.6470954 -1.61701439 0.0000000
## I-C -2.581177 -3.0888064 -2.07354741 0.0000000
## I-F -0.449122 -0.9641625  0.06591854 0.1003684

aov.shannon.BOTH = aov(Alfa_Shannon ~ SampleLocation*TreatmentNumber, data=metadata)
summary(aov.shannon.BOTH)

##                                Df Sum Sq Mean Sq F value Pr(>F)    
## SampleLocation                  2 136.28   68.14  80.789 <2e-16 ***
## TreatmentNumber                 2   0.37    0.19   0.219  0.803    
## SampleLocation:TreatmentNumber  4   2.30    0.57   0.681  0.607    
## Residuals                      97  81.81    0.84                   
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

#TukeyHSD(aov.shannon.BOTH)

From these analyses we see no significant impact of treatment on the richness of the microbial community, though there is a strong impact of location on richness. Applying the same analysis on the compositional data, we reach similar conclusions. PERMANOVA does not show significant impacts of treatment on composition of the microbiome. Despite the lack of significance, we can explore these results more fully as needed, using the calculated PERMANOVA coefficients to see the top taxa separating the groups.

## 
## Call:
## adonis(formula = t(otu_table(ODLEPobj)) ~ TreatmentNumber * SampleLocation,      data = metadata, permutations = 99, method = "bray") 
## 
## Permutation: free
## Number of permutations: 99
## 
## Terms added sequentially (first to last)
## 
##                                 Df SumsOfSqs MeanSqs F.Model      R2 Pr(>F)   
## TreatmentNumber                  2     0.652  0.3260  1.3428 0.01899   0.18   
## SampleLocation                   2     9.372  4.6860 19.3040 0.27293   0.01 **
## TreatmentNumber:SampleLocation   4     0.768  0.1921  0.7914 0.02238   0.78   
## Residuals                       97    23.546  0.2427         0.68571          
## Total                          105    34.339                 1.00000          
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Another method for identifying taxa that separate experimental groups is SIMPER. SIMPER identifies the taxa that most contribute to beta-diversity measures. These are just the taxa that most contribute to Bray-Curtis distance measures between the compared groups. This is the same distance matrix that is used to generate popular beta diversity graphics such as PCOA.

In this analysis, we focus on the filtered subset of taxa, as described above; removing very low-abundance species whose counts might be unreliable or biased. Similar to ANCOMBC and DESeq, we can assign a significance to each of these taxa and their contribution to the groups using the Kruskal Wallis Test, which is a nonparametric equivalent to an ANOVA test. However, the limitations to SIMPER + Kruskal Wallis make methods such as DESeq more popular.

Top Taxa
Comparison	SIMPER	Taxa	pval	fdr
I_C	0.00865026484968401	UNKOWN-UNKOWN	0.6588838	0.7479006
I_C	0.008499901169711	Campylobacter-UNKOWN	0.6685589	0.7479006
I_C	0.00955565341919401	UNKOWN-UNKOWN	0.6216761	0.7479006
I_C	0.00895963445751102	Parasutterella-UNKOWN	0.7479006	0.7479006
F_C	0.009740570513597	Parasutterella-UNKOWN	0.7479006	0.7479006
I_C	0.00913408948661798	Faecalibacterium-uncultured_bacterium	0.2695016	0.7479006

REPORT_325

Daniela Varela T