Introduction

Microbiome and gene expression data from Iluma Experiment 325 was evaluated based on Treatment (3 treatments included) and performance. Data was collected from animals at 28 days of age, following treatment from day of hatch. As you can see below, there are 12 samples per treatment in each location, except for the fecal samples, where two samples were excluded or missing. The sequencing process identified more than 1800 unique taxa, many of which are very low abundance, or present in a small number of samples. As these are unlikely to be biologically related to any performance or treatment-based effects, we passed the data through a filter to highlight the taxa that are more likely to be important in animal health and production.

Treatment Treatment Number
NC-GP+COC 1
NATUR GR 11
SURMAX 8
Samples per Group
1 11 8
Cecum 12 12 12
Ileum 12 12 12
Feces 10 12 12
## phyloseq-class experiment-level object
## otu_table()   OTU Table:         [ 585 taxa and 106 samples ]
## sample_data() Sample Data:       [ 106 samples by 10 sample variables ]
## tax_table()   Taxonomy Table:    [ 585 taxa by 7 taxonomic ranks ]
## phy_tree()    Phylogenetic Tree: [ 585 tips and 584 internal nodes ]

Exploration

After filtering low-abundance sequences, the remaining data can be explored more easily. An evaluation of the overall dataset shows 585 remaining taxa, and we see below the distribution of these taxa, first at the phylum level.

This plot highlights the distribution of taxa at the Family level within treatment and sample location. We can see that while the sample type has a strong impact on distribution, treatment only affects the composition of the microbiome in a few instances.

There are other ways to explore these data, as well, such as principal coordinates analysis, and alpha diversity analysis. These methods allow us to identify differences in composition or richness based on experimental or environmental factors (i.e. Treatment, age, farm, temperature, and more).

Alpha Diversity

We start by looking at the richness of the community in each location, comaparing the different treatment groups. There are several methods for measuring community richness; one is to simply count the number of taxa found in each location. Others involve more complicated indices that take into account other features of the community, such as the distribution of taxa across various phylogenetic groups. Shannon and Simpson indices are two examples of this.

Here we find no differences between treatment groups in any of the three locations, regardless of the index used. This is not uncommon; many changes in microbial diversity occur at the level of abundance, rather than richness. In other words, the types of bacteria in a community may not change very much, but their abundance can shift.

Differentially Abundant Features

After comparing the grouping of samples by PCOA, we now look at some methods for identifying specific changes in composition based on the treatments included in this study. The data can be filtered, merged, and subset to make comparisons between any two groups, and at any taxonomic level.

GOAL = To identify features (i.e., species, OTUs, gene families, etc.) that differ according to some study condition of interest.

DESeq2

One such method for identifying differential abundance is DESeq. DESeq (Differential Expression analysis for Sequence counts) is a robust method for computing differential expression in complex datasets. This analysis compares the levels of each unique sequence in two groups, identifies sequences that are significantly up or down group 1 relative to group 2, and performs some corrections that reduce the number of false positives (False discovery correction for multiple comparisons). The results are normally reported as the Log_2 fold-change in group 1 relative to group 2; this is the estimated effect size.

\[ Log_2FC = Log_2\frac{A}{B} = Log_2A - Log_2B \]

This means that when bacteria X is higher in group A compared to group B, you will see a positive fold-change; if bacteria X is lower in group A, it will be negative. If the Log_2 fold-change is 1 that means that the real fold-change is 2, or that A is double of B. A Log_2 fold-change of -1 for A vs B means that A has half the expression of that bacteria as B.

\[ \\ Negative \ Log_2FC = A < B \\ Positive \ Log_2FC = A > B \]

ANCOMBC

Another method available for evaluating differentially abundant bacteria is ANCOMBC (Analysis of Composition of Microbiomes with Bias Correction). ANCOM was developed specifically for microbiome analysis, and accounts for the underlying structure of the data. Additionally, it can be used for comparing the composition of microbiomes in more than two populations at a time.

The underlying equations used to generate ANCOMBC’s list of differential features is a bit different from DESeq; by using both ANCOMBC and DESeq together, we can identify the features (sequences) that are identified as significant by both methods. This gives us higher confidence in the importance of these features.

Both for most comparisons in this dataset, both DESeq and ANCOM identified 10 or more significant taxa, but in most cases, only a few of those taxa were found in both analyses. You can see the DESeq output below, as well as the normalized difference in abundance for those features identified by both methods.

Significant features by method
DESeq ANCOM
C11 vs C8 8 28
C1 vs C8 22 31
C1 vs C11 17 16
I11 vs I8 22 3
I1 vs I8 20 7
I1 vs I11 25 8

These results suggest that within the cecum…, while in the ileum…

…………

Variance Analyses

In addition to evaluating changes in individual bacteria, we also want to understand what impact our variables of interest had on the overall population. PERMANOVA (Permutational multivariate analysis of variance) quantifies multivariate differences between groups, evaluating whether the group has a significant effect on overall gut microbiota composition or diversity.

We start by looking at the impact of treatment and sample location on alpha diversity.

metadata$TreatmentNumber = as.factor(metadata$TreatmentNumber)
aov.shannon.Treatment = aov(Alfa_Shannon ~ TreatmentNumber, data=metadata)
summary(aov.shannon.Treatment)
##                  Df Sum Sq Mean Sq F value Pr(>F)
## TreatmentNumber   2   0.22  0.1078    0.05  0.951
## Residuals       103 220.54  2.1412
TukeyHSD(aov.shannon.Treatment)
##   Tukey multiple comparisons of means
##     95% family-wise confidence level
## 
## Fit: aov(formula = Alfa_Shannon ~ TreatmentNumber, data = metadata)
## 
## $TreatmentNumber
##            diff        lwr       upr     p adj
## 8-1  0.05818931 -0.7739751 0.8903537 0.9848737
## 11-1 0.11101103 -0.7211534 0.9431754 0.9460661
## 11-8 0.05282172 -0.7673685 0.8730119 0.9871538
metadata$SampleLocation = as.factor(metadata$SampleLocation)
aov.shannon.SampleLocation = aov(Alfa_Shannon ~ SampleLocation, data=metadata)
summary(aov.shannon.SampleLocation)
##                 Df Sum Sq Mean Sq F value Pr(>F)    
## SampleLocation   2 136.28   68.14   83.08 <2e-16 ***
## Residuals      103  84.48    0.82                   
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
TukeyHSD(aov.shannon.SampleLocation)
##   Tukey multiple comparisons of means
##     95% family-wise confidence level
## 
## Fit: aov(formula = Alfa_Shannon ~ SampleLocation, data = metadata)
## 
## $SampleLocation
##          diff        lwr         upr     p adj
## F-C -2.132055 -2.6470954 -1.61701439 0.0000000
## I-C -2.581177 -3.0888064 -2.07354741 0.0000000
## I-F -0.449122 -0.9641625  0.06591854 0.1003684
aov.shannon.BOTH = aov(Alfa_Shannon ~ SampleLocation*TreatmentNumber, data=metadata)
summary(aov.shannon.BOTH)
##                                Df Sum Sq Mean Sq F value Pr(>F)    
## SampleLocation                  2 136.28   68.14  80.789 <2e-16 ***
## TreatmentNumber                 2   0.37    0.19   0.219  0.803    
## SampleLocation:TreatmentNumber  4   2.30    0.57   0.681  0.607    
## Residuals                      97  81.81    0.84                   
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
#TukeyHSD(aov.shannon.BOTH)

From these analyses we see no significant impact of treatment on the richness of the microbial community, though there is a strong impact of location on richness. Applying the same analysis on the compositional data, we reach similar conclusions. PERMANOVA does not show significant impacts of treatment on composition of the microbiome. Despite the lack of significance, we can explore these results more fully as needed, using the calculated PERMANOVA coefficients to see the top taxa separating the groups.

## 
## Call:
## adonis(formula = t(otu_table(ODLEPobj)) ~ TreatmentNumber * SampleLocation,      data = metadata, permutations = 99, method = "bray") 
## 
## Permutation: free
## Number of permutations: 99
## 
## Terms added sequentially (first to last)
## 
##                                 Df SumsOfSqs MeanSqs F.Model      R2 Pr(>F)   
## TreatmentNumber                  2     0.652  0.3260  1.3428 0.01899   0.18   
## SampleLocation                   2     9.372  4.6860 19.3040 0.27293   0.01 **
## TreatmentNumber:SampleLocation   4     0.768  0.1921  0.7914 0.02238   0.78   
## Residuals                       97    23.546  0.2427         0.68571          
## Total                          105    34.339                 1.00000          
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Another method for identifying taxa that separate experimental groups is SIMPER. SIMPER identifies the taxa that most contribute to beta-diversity measures. These are just the taxa that most contribute to Bray-Curtis distance measures between the compared groups. This is the same distance matrix that is used to generate popular beta diversity graphics such as PCOA.

In this analysis, we focus on the filtered subset of taxa, as described above; removing very low-abundance species whose counts might be unreliable or biased. Similar to ANCOMBC and DESeq, we can assign a significance to each of these taxa and their contribution to the groups using the Kruskal Wallis Test, which is a nonparametric equivalent to an ANOVA test. However, the limitations to SIMPER + Kruskal Wallis make methods such as DESeq more popular.

Top Taxa
Comparison SIMPER Taxa pval fdr
I_C 0.00865026484968401 UNKOWN-UNKOWN 0.6588838 0.7479006
I_C 0.008499901169711 Campylobacter-UNKOWN 0.6685589 0.7479006
I_C 0.00955565341919401 UNKOWN-UNKOWN 0.6216761 0.7479006
I_C 0.00895963445751102 Parasutterella-UNKOWN 0.7479006 0.7479006
F_C 0.009740570513597 Parasutterella-UNKOWN 0.7479006 0.7479006
I_C 0.00913408948661798 Faecalibacterium-uncultured_bacterium 0.2695016 0.7479006