In this document, we walk through how to use the Fidelity R-package to explore the host specificity of microbial communities. Although this package was created for microbial data, it can be used with all types of ecological communities. In the example below, we use data of foliar fungal communities associated with three conifer tree hosts, a data set that was analyzed in this paper: “Host specificity of fungal functional groups covaries with elevation: implications for intraspecific plant interactions”. https://doi.org/10.1101/2025.05.24.655939.
This package has two functions, Fidelity() and Fidelity_physeq().These functions output the same results, but differ in that Fidelity_physeq() is compatible with phyloseq objects. Scroll down for the Fidelity_physeq() example. With the output from the functions, you should be able to answer the following ecological questions:
## load packages
library(Fidelity)
library(tidyverse)
library(phyloseq)
## load foliar fungal community data
foliar_fungi <- read.csv("foliar.csv", row.names = 1)
## load environmental variable data
foliar_env <- read.csv("foliar_env.csv", row.names = 1)
## isolate relevant environmental variables
foliar_meta <- subset(foliar_env, select = c("species", "elevation"))
The foliar_fungi data frame contains species counts of foliar fungal communities associated with three dominant conifer tree species found across the Cascades Elevation Gradient in Western Oregon. Each row represents a community associated with an individual tree, or a sample. Each column represents a different fungal species, indicated by Operational Taxonomic Units (OTUs). The data set contains 65 individual tree hosts and a total of 1072 fungal OTUs.
The data fame should contain count data. You can use this package with relative abundance, total abundance or presence-absence data. If you have presence-absence data, set the argument allow.pa = TRUE.
## view community data frame
head(foliar_fungi[, 1:5])
## otu.2 otu.3 otu.4 otu.5 otu.6
## sample-1 1774 3079 2221 0 1556
## sample-2 47 4678 1 1984 433
## sample-3 603 548 399 0 88
## sample-4 2019 2363 1192 2 2334
## sample-5 4314 150 2846 650 500
## sample-6 1713 2071 2840 0 1928
The foliar_meta data frame contains information on the tree host species (species) and elevation (elevation) of each individual tree host and its associated fungal community. There are three tree species in this study: Douglas-fir (Pseudotsuga menziesii, PSME), Western Hemlock (Tsuga heterophylla, TSHE) and Pacific Yew (Taxus brevifolia, TABR).
Take note that the row names in the community data frame (foliar_fungi) must match the order and identity of those found in the meta data frame (foliar_env). Be sure that the row names from these two matrices match before proceeding forward.
## view metadata
head(foliar_meta)
## species elevation
## sample-1 psme 500
## sample-2 tshe 820
## sample-3 psme 700
## sample-4 psme 1020
## sample-5 tshe 700
## sample-6 tabr 510
There is one function in this package, and it generates all of the data for downstream analyses. The input values for this function are a community data frame (foliar_fungi) and a grouping vector. In our case, we are interested in understanding the community specificity values associated with the different tree hosts, so the (foliar_meta$species) is our grouping vector. Although we use hosts in this tutorial, you can also use other types of groups (ie elevation, genotype, treatment).
In the function, we can indicate whether we would like to remove rare taxa from the analysis. Rare taxa are defined as those that are too rare to have a significant p-value from the Indicator Species Analysis. The default is rm.rare.taxa = TRUE.
We can also indicate whether we would like to view the occurrence vs p-value plot, or ovplot (below). With this plot, each point represents a taxon in the study. We can use it to visualize the threshold of rare taxa removal (black vertical line). This threshold is determined by first splitting the taxa into occurrence groups, then removing taxa in those groups for which no taxa has a p-value less than 0.05 (black horizontal line).
With our foliar fungal data set, all taxa found on two hosts or less were removed.
## execute function
output <- Fidelity(comm = foliar_fungi,
groups = foliar_meta$species,
seed = 123,
n.perm = 999,
pval.cutoff = 0.05,
max.ratio = 0,
ovp.plot = TRUE,
rm.rare.taxa = TRUE,
allow.pa = FALSE)
With the output$community_specificity_index, we can compare host specificity of microbial communities as a whole.
## rename community_specificity_index data frame
csi <- output$community_specificity_index
head(csi)
## sample_id community_index group
## 1 sample-1 0.635 psme
## 2 sample-2 0.677 tshe
## 3 sample-3 0.657 psme
## 4 sample-4 0.597 psme
## 5 sample-5 0.716 tshe
## 6 sample-6 0.546 tabr
## analysis of variance to examine if there are statistically significant differences in community specificity among the hosts
csi_aov <- aov(community_index ~ group, data = csi)
summary(csi_aov)
## Df Sum Sq Mean Sq F value Pr(>F)
## group 2 0.1043 0.05215 6.86 0.00206 **
## Residuals 61 0.4637 0.00760
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
There are significant differences in specificity among hosts. Let’s look at pairwise differences.
## Tukey-Kramer post-hoc test to look at pairwise differences
TukeyHSD(csi_aov)
## Tukey multiple comparisons of means
## 95% family-wise confidence level
##
## Fit: aov(formula = community_index ~ group, data = csi)
##
## $group
## diff lwr upr p adj
## tabr-psme 0.10113636 0.03231846 0.16995427 0.0022671
## tshe-psme 0.08342308 0.01687140 0.14997475 0.0104220
## tshe-tabr -0.01771329 -0.07838742 0.04296085 0.7636247
PSME communities are significantly less host specific than those associated with TSHE or TABR. Let’s look at a plot to visualize the differences.
## create color palette
pal_comm <- c("#E5F5E0", "#A1D99B", "#31A354")
## create plot
plot_comm <- ggplot(csi, aes(x = group, y = community_index, fill = group)) + geom_boxplot(color = "#006D2C", linewidth = 1.5) + xlab("tree host species") + ylab("community specificity index") + scale_fill_manual(values = pal_comm) + theme_classic(base_size = 15) + theme(legend.position = "none")
plot_comm
Now let’s examine how host specificity changes with elevation.
## merge csi and metadata
foliar_meta$sample_id <- rownames(foliar_meta)
csi_merge <- merge(foliar_meta, csi)
head(csi_merge)
## sample_id species elevation community_index group
## 1 sample-1 psme 500 0.635 psme
## 2 sample-10 psme 510 0.602 psme
## 3 sample-11 psme 700 0.560 psme
## 4 sample-13 tshe 510 0.760 tshe
## 5 sample-14 psme 1020 0.599 psme
## 6 sample-15 tshe 510 0.741 tshe
## linear regression to examine if host specificity changes with elevation
csi_lm <- lm(community_index ~ elevation, data = csi_merge)
summary(csi_lm)
##
## Call:
## lm(formula = community_index ~ elevation, data = csi_merge)
##
## Residuals:
## Min 1Q Median 3Q Max
## -0.160826 -0.068040 0.008876 0.077341 0.172466
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 5.788e-01 3.961e-02 14.611 < 2e-16 ***
## elevation 1.756e-04 5.105e-05 3.439 0.00105 **
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 0.08772 on 62 degrees of freedom
## Multiple R-squared: 0.1602, Adjusted R-squared: 0.1467
## F-statistic: 11.83 on 1 and 62 DF, p-value: 0.001049
There is a significant relationship between host specificity and elevation. Let’s take a look at the plot.
## create plot
plot_lm <- ggplot(data = csi_merge, aes(x = elevation, y = community_index)) +
geom_point() +
geom_smooth(method = "lm", color = "#006D2C", linewidth = 1.5) + theme_classic(base_size = 15) + theme(legend.position = "none") + ylab("community specificity index") + xlab("elevation (meters)")
plot_lm
We are interested in the host specificity of a few taxa of ecological relevance in this system.
We can compare their host specificity values to each other with the output$taxon_specificity_index.
## rename taxon_specificity_index data frame
tsi <- output$taxon_specificity_index
head(tsi)
## # A tibble: 6 × 5
## comm_name iv.max p.value taxon_index most_loyal_to
## <chr> <dbl> <dbl> <dbl> <chr>
## 1 otu.2 54.9 0.00100 0.999 tshe
## 2 otu.3 41.6 0.548 0.452 tshe
## 3 otu.4 38.1 0.843 0.157 tabr
## 4 otu.5 98.0 0 1 tshe
## 5 otu.6 36.1 0.956 0.0440 tabr
## 6 otu.7 34.2 0.0370 0.963 tabr
##identify mean taxa specificity value
mean <- mean(tsi$taxon_index)
mean
## [1] 0.6945397
## subset taxa of interest
special_taxa <- subset(tsi, comm_name == "otu.71" | comm_name == "otu.176" | comm_name == "otu.13" | comm_name == "otu.74")
names <- c("N. gaeumanii", "R. parkerii", "R. weirii", "A. sydowii")
## plot host specificity of key taxa with the mean specificity value for the community
plot_sp_taxa <- ggplot(special_taxa, aes(x = comm_name, y = taxon_index)) + geom_point(size = 5) + xlab("taxon") + ylab("taxon specificity index") + theme_classic(base_size = 15) + theme(legend.position = "none") + scale_x_discrete(labels = names) + geom_hline(yintercept = mean, color = "#006D2C", linetype = "dashed", size = 1.5)
plot_sp_taxa
The graph shows that the three pathogens and mutualists previously known to associate with PSME are more host specific than the mean taxon specificity value (green horizontal line). In contrast, Aspergillus sydowii, a saprobe known to associate more generally, is less host specific than the community as a whole.
We are interested in understanding whether the foliar fungal communities are more host specific than soil fungal communities found at the same sites. The output of the community weighted mean analysis used to generate specificity indices is scaled to range between 0-1, with 0 representing communities that are less host specific and 1 representing communities that are more host specific. Because these values are scaled, we can compare them with host specificity indices from other studies using the Fidelity() function.
For these comparisons below, we are using an additional data frame generated by the Fidelity() function for soil communities.
## load data frame of soil fungal community host specificity values
csi_soil <- read.csv("csi_soil.csv")
##make a new column to indicate that these are soil communities
csi_soil <- csi_soil %>%
mutate(substrate = "soil")
head(csi_soil)
## sample_id community_index group substrate
## 1 sample-1_soil 0.620 psme soil
## 2 sample-2_soil 0.603 tshe soil
## 3 sample-3_soil 0.688 tshe soil
## 4 sample-4_soil 0.459 tabr soil
## 5 sample-5_soil 0.644 tshe soil
## 6 sample-6_soil 0.753 psme soil
##make a new column for the foliar csi to indicate that these are communities associated with needles
csi_foliar <- csi %>%
mutate(substrate = "needle")
head(csi_foliar)
## sample_id community_index group substrate
## 1 sample-1 0.635 psme needle
## 2 sample-2 0.677 tshe needle
## 3 sample-3 0.657 psme needle
## 4 sample-4 0.597 psme needle
## 5 sample-5 0.716 tshe needle
## 6 sample-6 0.546 tabr needle
##join the two data frames in order to make comparisons
csi_join <- full_join(csi_soil, csi_foliar)
## anaylsis of variance test to see if there are host specificity differences based on substrate
join_aov <- aov(community_index ~ substrate, csi_join)
summary(join_aov)
## Df Sum Sq Mean Sq F value Pr(>F)
## substrate 1 0.5231 0.5231 61.86 1.25e-12 ***
## Residuals 130 1.0994 0.0085
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## create plot to visualize these differences
pal_substrate <- c("#006D2C", "#993404")
plot_substrate <- ggplot(csi_join, aes(x = substrate, y = community_index, fill = substrate)) + geom_boxplot(color = "black", linewidth = 1.5) + xlab("substrate") + ylab("community specificity index") + scale_fill_manual(values = pal_substrate) + theme_classic(base_size = 15) + theme(legend.position = "none")
plot_substrate
Here we can see that soil fungal communities at these study sites are significantly less host specific than foliar fungal communities.
The Fidelity_physeq() function allows for phyloseq objects as input variables. Much like the Fidelity() function, there are two inputs required: a phyloseq object and a grouping vector sourced from the sample_data of the phyloseq object.
##use the R-package phlyoseq to create a phlyoseq object with the OTU and metadata tables
OTU <- otu_table(foliar_fungi, taxa_are_rows = FALSE)
SAMPLE <- sample_data(foliar_meta)
ps_object <- phyloseq(OTU, SAMPLE)
##use the phlyoseq object in the Fidelity_physeq() function
output_ps <- Fidelity_physeq(ps_object,
groups = "species",
seed = 123,
n.perm = 999,
pval.cutoff = 0.05,
max.ratio = 0,
ovp.plot = TRUE,
rm.rare.taxa = TRUE)
The output_ps and output generated with the Fidelity() function above are equivalent. The output_ps can be used for the downstream analyses demonstrated above.