The structure of the bighorn sheep microbiome

Phyloseq object

The ASV table, taxa table and metadata were combined to create a phyloseq object.

## phyloseq-class experiment-level object
## otu_table()   OTU Table:         [ 19935 taxa and 39 samples ]
## sample_data() Sample Data:       [ 39 samples by 18 sample variables ]
## tax_table()   Taxonomy Table:    [ 19935 taxa by 6 taxonomic ranks ]

Relative abundance of phyla

The phyloseq object with the unknown phyla removed and subset to the 13 most abundant phyla was used to create this stacked bar plot of relative abundance for individual sheep and population.

Frequency histogram

The vegan package was used to calculate the percent frequencies for each phyla across all sheep (the unknown phyla were removed) and a histogram was plotted to show how common each phyla are among the sheep.

asvmat=as.matrix(asv)
phylum=tax1$Phylum
rownames(asvmat)=phylum
asvmat=t(sapply(by(asvmat, rownames(asvmat), colSums), identity))
asvmat=asvmat[-1,]
asv.pres=apply(asvmat > 0,1, sum)
#Compute percentage frequencies
asv.relf=100*asv.pres/ncol(asvmat)

The same package was then used to create a histogram showing how common phyla are among the populations.

Amplicon sequence variants (ASVs) richness

The phyloseq object with the unknown phyla removed, was rareified to a sequencing depth of 20,000 and the alpha diversity was estimated for each sheep using the phyloseq package. A boxplot was then created comparing the richness between and within the populations.

Nestedness

The vegan package was also used to calculate the nestedness of the phyla across the sheep (the unknown phyla was removed). The nestednodf function was used for this, which produces a plot that displays the present/absence of phyla across all sheep, with the axis being order by diversity.

nestedasv=nestednodf(asvmat, order=TRUE, weighted=FALSE, wbinary=FALSE)
#col nestedness=86.677, row nestedness= 76.810, NODF value=83.671

Phi coefficient

The Phi Coefficient is a measure of average association between sheep populations and phyla. The phi coefficient was calculated using the indicspecies package for each sequence.The sequences were then matched to their phyla and the mean phi coefficient calculated for each phyla (with the unknown phyla removed). A heatmap was then plotted to visualise the association between phyla and popualtions, with green representing a positive association and dark blue a negaitve one.

#Creating a presence/absence table (required for function)
asv2=as.data.frame(t(asv))
#Creating vector for groups (i.e all NE sheep in one group)
groups = c(rep(1, 8), rep(2, 2), rep(3, 4), rep(4, 6), rep(5, 6), rep(6, 3), 
           rep(4, 1), rep(7,7), rep(2,1), rep(6,1))
asvpa=ifelse(asv2>0,1,0)
#Calculating phi coefficient with only single sites (no site combinations)
phi=multipatt(asvpa, groups, func="r.g", duleg=TRUE, control= how(nperm=999))
#Determines the ecological preference of species among site groups
p3=phi$str #Negative values = "avoid" site, Positive values = prefers site 
#Adding phylum to phi table
newp3=cbind(p3, Phylum = tax1$Phylum)
#Getting data into correct form for ggplot
p3long=melt(as.data.frame(newp3), id.vars= "Phylum", measure.vars=c("1", "2", "3", 
                                                                    "4", "5", "6", 
                                                                    "7"))
#Turn variable into a character vector
p3long$variable=as.character(p3long$variable)
#Turn value into a numeric vector
p3long$value=as.numeric(p3long$value)
#Group phylum and populations together and calculating the mean phi coefficient
p3long=p3long %>%
  group_by(Phylum, variable) %>%
  summarise(mean=mean(value, na.rm=TRUE))
#Remove unknown phyla 
p3long=p3long[-c(1,2,3,4,5,6,7),]

Biological indicators

The indicspecies package was used to calculate the indicator value index (the product of sensitivity and the positive predictive value) for all sequences for each population. The best indicators (sequences) for each population are plotted and their phyla labeled.

#Determines the predictive value of species as indicators of the conditions
#prevailing in site groups 
indval=multipatt(asv2, groups, control = how(nperm=999))
#Select A and B values for best possible indicator from each pop and plot bar chart