Fidelity: cheatsheet

takeaways

In this document, we walk through how to use the Fidelity R-package to explore the host specificity of microbial communities. Although this package was created for microbial data, it can be used with all types of ecological communities. In the example below, we use data of foliar fungal communities associated with three conifer tree hosts, a data set that was analyzed in this paper: “Host specificity of fungal functional groups covaries with elevation: implications for intraspecific plant interactions”. https://doi.org/10.1101/2025.05.24.655939.

This package has two functions, Fidelity() and Fidelity_physeq().These functions output the same results, but differ in that Fidelity_physeq() is compatible with phyloseq objects. Scroll down for the Fidelity_physeq() example. With the output from the functions, you should be able to answer the following ecological questions:

How does host specificity compare among my communities? output$community_specificity_index
How does host specificity compare among my taxa? output$taxon_specificity_index
How does host specificity of my communities compare to that of other communities? output$community_specificity_index

## load packages
library(Fidelity)
library(tidyverse)
library(phyloseq)

## load foliar fungal community data
foliar_fungi <- read.csv("foliar.csv", row.names = 1)

## load environmental variable data
foliar_env <- read.csv("foliar_env.csv", row.names = 1)

## isolate relevant environmental variables
foliar_meta <- subset(foliar_env, select = c("species", "elevation"))

what data are we working with?

The foliar_fungi data frame contains species counts of foliar fungal communities associated with three dominant conifer tree species found across the Cascades Elevation Gradient in Western Oregon. Each row represents a community associated with an individual tree, or a sample. Each column represents a different fungal species, indicated by Operational Taxonomic Units (OTUs). The data set contains 65 individual tree hosts and a total of 1072 fungal OTUs.

The data fame should contain count data. You can use this package with relative abundance, total abundance or presence-absence data. If you have presence-absence data, set the argument allow.pa = TRUE.

## view community data frame
head(foliar_fungi[, 1:5])

##          otu.2 otu.3 otu.4 otu.5 otu.6
## sample-1  1774  3079  2221     0  1556
## sample-2    47  4678     1  1984   433
## sample-3   603   548   399     0    88
## sample-4  2019  2363  1192     2  2334
## sample-5  4314   150  2846   650   500
## sample-6  1713  2071  2840     0  1928

The foliar_meta data frame contains information on the tree host species (species) and elevation (elevation) of each individual tree host and its associated fungal community. There are three tree species in this study: Douglas-fir (Pseudotsuga menziesii, PSME), Western Hemlock (Tsuga heterophylla, TSHE) and Pacific Yew (Taxus brevifolia, TABR).

Take note that the row names in the community data frame (foliar_fungi) must match the order and identity of those found in the meta data frame (foliar_env). Be sure that the row names from these two matrices match before proceeding forward.

## view metadata
head(foliar_meta)

##          species elevation
## sample-1    psme       500
## sample-2    tshe       820
## sample-3    psme       700
## sample-4    psme      1020
## sample-5    tshe       700
## sample-6    tabr       510

running the function

There is one function in this package, and it generates all of the data for downstream analyses. The input values for this function are a community data frame (foliar_fungi) and a grouping vector. In our case, we are interested in understanding the community specificity values associated with the different tree hosts, so the (foliar_meta$species) is our grouping vector. Although we use hosts in this tutorial, you can also use other types of groups (ie elevation, genotype, treatment).

In the function, we can indicate whether we would like to remove rare taxa from the analysis. Rare taxa are defined as those that are too rare to have a significant p-value from the Indicator Species Analysis. The default is rm.rare.taxa = TRUE.

We can also indicate whether we would like to view the occurrence vs p-value plot, or ovplot (below). With this plot, each point represents a taxon in the study. We can use it to visualize the threshold of rare taxa removal (black vertical line). This threshold is determined by first splitting the taxa into occurrence groups, then removing taxa in those groups for which no taxa has a p-value less than 0.05 (black horizontal line).

With our foliar fungal data set, all taxa found on two hosts or less were removed.

## execute function
output <- Fidelity(comm = foliar_fungi,
                groups = foliar_meta$species,
                seed = 123,
                n.perm = 999,
                pval.cutoff = 0.05,
                max.ratio = 0,
                ovp.plot = TRUE,
                rm.rare.taxa = TRUE,
                allow.pa = FALSE)

how does host specificity compare among my communities?

With the output$community_specificity_index, we can compare host specificity of microbial communities as a whole.

## rename community_specificity_index data frame 
csi <- output$community_specificity_index

head(csi)

##   sample_id community_index group
## 1  sample-1           0.635  psme
## 2  sample-2           0.677  tshe
## 3  sample-3           0.657  psme
## 4  sample-4           0.597  psme
## 5  sample-5           0.716  tshe
## 6  sample-6           0.546  tabr

## analysis of variance to examine if there are statistically significant differences in community specificity among the hosts
csi_aov <- aov(community_index ~ group, data = csi)
summary(csi_aov)

##             Df Sum Sq Mean Sq F value  Pr(>F)   
## group        2 0.1043 0.05215    6.86 0.00206 **
## Residuals   61 0.4637 0.00760                   
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

There are significant differences in specificity among hosts. Let’s look at pairwise differences.

## Tukey-Kramer post-hoc test to look at pairwise differences
TukeyHSD(csi_aov)

##   Tukey multiple comparisons of means
##     95% family-wise confidence level
## 
## Fit: aov(formula = community_index ~ group, data = csi)
## 
## $group
##                  diff         lwr        upr     p adj
## tabr-psme  0.10113636  0.03231846 0.16995427 0.0022671
## tshe-psme  0.08342308  0.01687140 0.14997475 0.0104220
## tshe-tabr -0.01771329 -0.07838742 0.04296085 0.7636247

PSME communities are significantly less host specific than those associated with TSHE or TABR. Let’s look at a plot to visualize the differences.

## create color palette
pal_comm <- c("#E5F5E0", "#A1D99B", "#31A354")

## create plot
plot_comm <- ggplot(csi, aes(x = group, y = community_index, fill = group)) + geom_boxplot(color = "#006D2C", linewidth = 1.5) + xlab("tree host species") + ylab("community specificity index") + scale_fill_manual(values = pal_comm) + theme_classic(base_size = 15) + theme(legend.position = "none")

plot_comm

Now let’s examine how host specificity changes with elevation.

## merge csi and metadata
foliar_meta$sample_id <- rownames(foliar_meta)

csi_merge <- merge(foliar_meta, csi)
head(csi_merge)

##   sample_id species elevation community_index group
## 1  sample-1    psme       500           0.635  psme
## 2 sample-10    psme       510           0.602  psme
## 3 sample-11    psme       700           0.560  psme
## 4 sample-13    tshe       510           0.760  tshe
## 5 sample-14    psme      1020           0.599  psme
## 6 sample-15    tshe       510           0.741  tshe

## linear regression to examine if host specificity changes with elevation
csi_lm <- lm(community_index ~ elevation, data = csi_merge)
summary(csi_lm)

## 
## Call:
## lm(formula = community_index ~ elevation, data = csi_merge)
## 
## Residuals:
##       Min        1Q    Median        3Q       Max 
## -0.160826 -0.068040  0.008876  0.077341  0.172466 
## 
## Coefficients:
##              Estimate Std. Error t value Pr(>|t|)    
## (Intercept) 5.788e-01  3.961e-02  14.611  < 2e-16 ***
## elevation   1.756e-04  5.105e-05   3.439  0.00105 ** 
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 0.08772 on 62 degrees of freedom
## Multiple R-squared:  0.1602, Adjusted R-squared:  0.1467 
## F-statistic: 11.83 on 1 and 62 DF,  p-value: 0.001049

There is a significant relationship between host specificity and elevation. Let’s take a look at the plot.

## create plot
plot_lm <-  ggplot(data = csi_merge, aes(x = elevation, y = community_index)) +
      geom_point() +
      geom_smooth(method = "lm", color = "#006D2C", linewidth = 1.5) + theme_classic(base_size = 15) + theme(legend.position = "none") + ylab("community specificity index") + xlab("elevation (meters)")

plot_lm

how does host specificity compare among my taxa?

We are interested in the host specificity of a few taxa of ecological relevance in this system.

Nothophaeocryptocus gaeumanii (otu.71) - dominant pathogen known to associate with PSME hosts
Rhabdocline parkerii (otu.176) - puntative mutualist known to associate with PSME hosts
Rhabdocline weirii (otu.13) - less-dominant pathogen known to associate with PSME hosts
Aspergillus sydowii (otu.74) - dominant foliar saprobe known to be found on a variety of hosts and environments

We can compare their host specificity values to each other with the output$taxon_specificity_index.

## rename taxon_specificity_index data frame 
tsi <- output$taxon_specificity_index

head(tsi)

## # A tibble: 6 × 5
##   comm_name iv.max p.value taxon_index most_loyal_to
##   <chr>      <dbl>   <dbl>       <dbl> <chr>        
## 1 otu.2       54.9 0.00100      0.999  tshe         
## 2 otu.3       41.6 0.548        0.452  tshe         
## 3 otu.4       38.1 0.843        0.157  tabr         
## 4 otu.5       98.0 0            1      tshe         
## 5 otu.6       36.1 0.956        0.0440 tabr         
## 6 otu.7       34.2 0.0370       0.963  tabr

##identify mean taxa specificity value
mean <- mean(tsi$taxon_index)
mean

## [1] 0.6945397

## subset taxa of interest
special_taxa <- subset(tsi, comm_name == "otu.71" | comm_name == "otu.176" | comm_name == "otu.13" | comm_name == "otu.74")

names <- c("N. gaeumanii", "R. parkerii", "R. weirii", "A. sydowii")

## plot host specificity of key taxa with the mean specificity value for the community
plot_sp_taxa <- ggplot(special_taxa, aes(x = comm_name, y = taxon_index)) + geom_point(size = 5) + xlab("taxon") + ylab("taxon specificity index") + theme_classic(base_size = 15) + theme(legend.position = "none") + scale_x_discrete(labels = names) + geom_hline(yintercept = mean, color = "#006D2C", linetype = "dashed", size = 1.5)

plot_sp_taxa

The graph shows that the three pathogens and mutualists previously known to associate with PSME are more host specific than the mean taxon specificity value (green horizontal line). In contrast, Aspergillus sydowii, a saprobe known to associate more generally, is less host specific than the community as a whole.

how does host specificity of my communities compare to that of other communities?

We are interested in understanding whether the foliar fungal communities are more host specific than soil fungal communities found at the same sites. The output of the community weighted mean analysis used to generate specificity indices is scaled to range between 0-1, with 0 representing communities that are less host specific and 1 representing communities that are more host specific. Because these values are scaled, we can compare them with host specificity indices from other studies using the Fidelity() function.

For these comparisons below, we are using an additional data frame generated by the Fidelity() function for soil communities.

## load data frame of soil fungal community host specificity values
csi_soil <- read.csv("csi_soil.csv")

##make a new column to indicate that these are soil communities
csi_soil <- csi_soil %>%
  mutate(substrate = "soil")

head(csi_soil)

##       sample_id community_index group substrate
## 1 sample-1_soil           0.620  psme      soil
## 2 sample-2_soil           0.603  tshe      soil
## 3 sample-3_soil           0.688  tshe      soil
## 4 sample-4_soil           0.459  tabr      soil
## 5 sample-5_soil           0.644  tshe      soil
## 6 sample-6_soil           0.753  psme      soil

##make a new column for the foliar csi to indicate that these are communities associated with needles
csi_foliar <- csi %>%
  mutate(substrate = "needle")

head(csi_foliar)

##   sample_id community_index group substrate
## 1  sample-1           0.635  psme    needle
## 2  sample-2           0.677  tshe    needle
## 3  sample-3           0.657  psme    needle
## 4  sample-4           0.597  psme    needle
## 5  sample-5           0.716  tshe    needle
## 6  sample-6           0.546  tabr    needle

##join the two data frames in order to make comparisons
csi_join <- full_join(csi_soil, csi_foliar)

## anaylsis of variance test to see if there are host specificity differences based on substrate
join_aov <- aov(community_index ~ substrate, csi_join)
summary(join_aov)

##              Df Sum Sq Mean Sq F value   Pr(>F)    
## substrate     1 0.5231  0.5231   61.86 1.25e-12 ***
## Residuals   130 1.0994  0.0085                     
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

## create plot to visualize these differences
pal_substrate <- c("#006D2C", "#993404")


plot_substrate <- ggplot(csi_join, aes(x = substrate, y = community_index, fill = substrate)) + geom_boxplot(color = "black", linewidth = 1.5) + xlab("substrate") + ylab("community specificity index") + scale_fill_manual(values = pal_substrate) + theme_classic(base_size = 15) + theme(legend.position = "none")

plot_substrate

Here we can see that soil fungal communities at these study sites are significantly less host specific than foliar fungal communities.

Fidelity_physeq(): a function compatible with phyloseq objects

The Fidelity_physeq() function allows for phyloseq objects as input variables. Much like the Fidelity() function, there are two inputs required: a phyloseq object and a grouping vector sourced from the sample_data of the phyloseq object.

##use the R-package phlyoseq to create a phlyoseq object with the OTU and metadata tables
OTU <- otu_table(foliar_fungi, taxa_are_rows = FALSE)
SAMPLE <- sample_data(foliar_meta)

ps_object <- phyloseq(OTU, SAMPLE)

##use the phlyoseq object in the Fidelity_physeq() function
output_ps <- Fidelity_physeq(ps_object,
                groups = "species",
                seed = 123,
                n.perm = 999,
                pval.cutoff = 0.05,
                max.ratio = 0,
                ovp.plot = TRUE,
                rm.rare.taxa = TRUE)

The output_ps and output generated with the Fidelity() function above are equivalent. The output_ps can be used for the downstream analyses demonstrated above.