Decide on co-occurrence network construction

The goal of this work is to determine whether there are patterns in taxa associations in the pitcher plant microbiome across a large geographic range, which of these patterns are explained by habitat filtering and dispersal limitation, and whether aggregation at higher taxonomic levels obscures those patterns.

Starting off with a network consisting of all the sites (n=108)

How does adjusting the correlation cutoff affect network results?

Here I keep taxa with a count of at least 3 in at least 10 of the 108 samples. This prevalence threshold is somewhat lower than those reported in many papers I looked at. Sequence counts are centered log-ratio (CLR) transformed. The significance level (alpha) is set to 0.05; setting alpha to 0.01 doesn’t change the results.
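The filtering and transformation steps can be sketched in base R as follows. This is a minimal illustration and not the code used for the results: the toy count matrix, the function names `filter_prevalence` and `clr_transform`, and the pseudocount of 1 are all assumptions.

```r
# Toy count matrix (hypothetical data): 108 samples in rows, 50 OTUs in columns.
# Each OTU gets its own mean abundance so that some are rare and some are common.
set.seed(1)
counts <- matrix(
  rpois(108 * 50, lambda = rep(seq(0.1, 3, length.out = 50), each = 108)),
  nrow = 108,
  dimnames = list(NULL, paste0("OTU", 1:50))
)

# Keep OTUs with a count of at least `min_count` in at least `min_samples` samples
filter_prevalence <- function(counts, min_count = 3, min_samples = 10) {
  keep <- colSums(counts >= min_count) >= min_samples
  counts[, keep, drop = FALSE]
}

# Centered log-ratio transform within each sample.
# The pseudocount (to avoid log(0)) is an assumption, not taken from the analysis.
clr_transform <- function(counts, pseudocount = 1) {
  log_counts <- log(counts + pseudocount)
  log_counts - rowMeans(log_counts)
}

filtered <- filter_prevalence(counts, min_count = 3, min_samples = 10)
clr <- clr_transform(filtered)
ncol(filtered)  # number of OTUs that pass the prevalence filter
```

Lowering `min_samples` to 6 or 3 gives the more permissive filters tried further down.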

Taxa that occur at least 3 times in 10 samples

## [1] "The total number of OTUs: 220"
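For reference, here is a rough sketch of how a correlation cutoff and alpha could turn a CLR matrix into a network. It assumes Spearman correlations (as named for the subset networks below), keeps both positive and negative associations, and uses unadjusted p-values; none of these choices is confirmed for the full network, and `build_adjacency` is a hypothetical helper.

```r
# Build a 0/1 adjacency matrix from pairwise Spearman correlations.
# An edge is kept when |rho| >= cutoff and the p-value is below alpha.
build_adjacency <- function(clr, cutoff = 0.6, alpha = 0.05) {
  n_taxa <- ncol(clr)
  adj <- matrix(0L, n_taxa, n_taxa,
                dimnames = list(colnames(clr), colnames(clr)))
  for (i in seq_len(n_taxa - 1)) {
    for (j in (i + 1):n_taxa) {
      # Ties trigger a warning about exact p-values; it is silenced here
      test <- suppressWarnings(
        cor.test(clr[, i], clr[, j], method = "spearman")
      )
      if (!is.na(test$estimate) &&
          abs(test$estimate) >= cutoff &&
          test$p.value < alpha) {
        adj[i, j] <- adj[j, i] <- 1L
      }
    }
  }
  adj
}

# Toy example: A and B are strongly correlated, C is independent noise
set.seed(42)
x <- rnorm(30)
toy <- cbind(A = x, B = x + rnorm(30, sd = 0.1), C = rnorm(30))
adj <- build_adjacency(toy, cutoff = 0.6, alpha = 0.05)

# Size, mean degree, and edge density follow directly from the adjacency matrix
size <- sum(adj) / 2
mean_degree <- 2 * size / ncol(adj)
edge_density <- size / choose(ncol(adj), 2)
```

The same formulas reproduce the table values below: with 220 taxa and 66 edges, the mean degree is 2 * 66 / 220 = 0.6 and the edge density is 66 / choose(220, 2), about 0.0027.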

Network characteristics (single network, 220 taxa)

##                  property cutoff_0.4 cutoff_0.6
## 1  clustering.coefficient     0.6383     1.0000
## 2              modularity     0.9477     0.9174
## 3             mean.degree     0.6000     0.2000
## 4                    size    66.0000    22.0000
## 5                   order   220.0000   220.0000
## 6            edge.density     0.0027     0.0009
## 7           mean.distance     1.4409     1.0000
## 8             no.clusters   163.0000   201.0000
## 9             norm.degree     0.0027     0.0009
## 10 betweenness.centrality     0.0004     0.0000
## 11     mean.shortest.path     1.4409     1.0000

Compared with a random network

##                 property empirical average.random SD.random
## 1 clustering.coefficient    0.6383         0.0020    0.0115
## 2  average.shortest.path    1.4409         2.1665    0.5249
## 3             modularity    0.9477         0.9453    0.0145
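The random-network model behind this comparison is not stated here. One common choice, assumed in the sketch below, is an Erdos-Renyi graph with the same order and size as the empirical network, repeated many times to get a mean and SD. The helpers `transitivity_adj` and `random_adjacency` are hypothetical base R functions written for illustration.

```r
# Global clustering coefficient (transitivity) from a symmetric 0/1 adjacency matrix
transitivity_adj <- function(adj) {
  degree <- rowSums(adj)
  triples <- sum(degree * (degree - 1))     # 2 x number of connected triples
  if (triples == 0) return(NA_real_)        # undefined when no node has degree >= 2
  closed <- sum(diag(adj %*% adj %*% adj))  # 6 x number of triangles
  closed / triples
}

# Random graph with a fixed number of nodes (order) and edges (size).
# Assumes order >= 3 so that `pairs` has more than one element.
random_adjacency <- function(order, size) {
  adj <- matrix(0, order, order)
  pairs <- which(upper.tri(adj))
  adj[sample(pairs, size)] <- 1
  adj + t(adj)
}

# 100 random graphs matching the 220-taxon network at cutoff 0.4 (66 edges)
set.seed(1)
random_cc <- replicate(100, transitivity_adj(random_adjacency(220, 66)))
c(mean = mean(random_cc, na.rm = TRUE), sd = sd(random_cc, na.rm = TRUE))
```

The number of replicates (100) is also an assumption; the shortest path and modularity columns would need a graph library and are left out of this sketch.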

This might be too few OTUs. I’ll try building the networks using a more permissive OTU prevalence filter.

Taxa that occur at least 3 times in 6 samples

## [1] "The total number of OTUs: 448"

Network characteristics (single network, 448 taxa)

##                  property cutoff_0.4 cutoff_0.6
## 1  clustering.coefficient     0.2571     0.6923
## 2              modularity     0.8845     0.9560
## 3             mean.degree     1.1830     0.1741
## 4                    size   265.0000    39.0000
## 5                   order   448.0000   448.0000
## 6            edge.density     0.0026     0.0004
## 7           mean.distance     6.8360     1.1364
## 8             no.clusters   235.0000   412.0000
## 9             norm.degree     0.0026     0.0004
## 10 betweenness.centrality     0.0233     0.0000
## 11     mean.shortest.path     6.8360     1.1364

Compared with a random network

##                 property empirical average.random SD.random
## 1 clustering.coefficient    0.2571         0.0027    0.0053
## 2  average.shortest.path    6.8360         9.9743    2.5302
## 3             modularity    0.8845         0.9224    0.0136

Using twice as many taxa makes a big difference in the network. However, 0.4 is still a low correlation cutoff, and the modularity isn’t much different from random. I’ll try including even more OTUs and setting the correlation cutoff at a more reasonable level (0.6).
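One way to put "different from random" on a common scale is a z-score: the empirical value minus the random mean, divided by the random SD. The sketch below applies this to the 448-taxon comparison table above; the z-score itself is an addition for illustration and was not part of the original analysis.

```r
# Values copied from the 448-taxon "Compared with a random network" table
random_comparison <- data.frame(
  property = c("clustering.coefficient", "average.shortest.path", "modularity"),
  empirical = c(0.2571, 6.8360, 0.8845),
  average.random = c(0.0027, 9.9743, 0.9224),
  SD.random = c(0.0053, 2.5302, 0.0136)
)

# Standardized difference between the empirical network and the random networks
random_comparison$z <- with(
  random_comparison,
  (empirical - average.random) / SD.random
)

random_comparison[, c("property", "z")]
```

The same calculation can be repeated for each of the other comparison tables.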

Taxa that occur at least 3 times in 3 samples

## [1] "The total number of OTUs: 887"

Network characteristics (single network, 887 taxa)

##                  property    value
## 1  clustering.coefficient   0.4429
## 2              modularity   0.8913
## 3             mean.degree   0.5366
## 4                    size 238.0000
## 5                   order 887.0000
## 6            edge.density   0.0006
## 7           mean.distance   3.6211
## 8             no.clusters 713.0000
## 9             norm.degree   0.0006
## 10 betweenness.centrality   0.0009
## 11     mean.shortest.path   3.6211

Compared with a random network

##                 property empirical average.random SD.random
## 1 clustering.coefficient    0.4429         0.0012    0.0060
## 2  average.shortest.path    3.6211         2.0980    0.4485
## 3             modularity    0.8913         0.9862    0.0028

How does subsetting the network into parts affect network results?

Here I assigned the sites to subspecies ranges based on visual inspection of the USDA plants database. The prevalence-based filtering is done after splitting samples into subsets.
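The order of operations matters here: samples are split first, and the prevalence filter is then applied within each subset. A minimal sketch with toy data is shown below; the `region` vector and the `filter_prevalence` helper are hypothetical stand-ins for the real site assignments and filtering code.

```r
# Toy count matrix (hypothetical data): 12 samples, 6 OTUs
set.seed(7)
counts <- matrix(rpois(12 * 6, lambda = 1.5), nrow = 12,
                 dimnames = list(paste0("S", 1:12), paste0("OTU", 1:6)))

# Hypothetical region assignment for each sample
region <- rep(c("southern", "northern"), times = c(4, 8))

# Keep OTUs with a count of at least `min_count` in at least `min_samples` samples
filter_prevalence <- function(counts, min_count = 3, min_samples = 2) {
  keep <- colSums(counts >= min_count) >= min_samples
  counts[, keep, drop = FALSE]
}

# Split sample rows by region, then filter each subset on its own
by_region <- lapply(split(seq_len(nrow(counts)), region), function(rows) {
  filter_prevalence(counts[rows, , drop = FALSE])
})

sapply(by_region, ncol)  # number of OTUs retained in each subset
```

Filtering after the split means an OTU can be retained in one subset and dropped in the other, so the two networks do not necessarily share the same taxa.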

Southern sites network (n = 21)

## [1] "The total number of OTUs: 1532"

Taxa present at least 3 times in 2 samples

## [1] "The total number of OTUs: 463"

Network characteristics of southern sites (Spearman)

##                  property  southern
## 1  clustering.coefficient    0.7856
## 2              modularity    0.6905
## 3             mean.degree   11.7019
## 4                    size 2709.0000
## 5                   order  463.0000
## 6            edge.density    0.0253
## 7           mean.distance    4.8836
## 8             no.clusters   30.0000
## 9             norm.degree    0.0253
## 10 betweenness.centrality    0.0759
## 11     mean.shortest.path    4.8836

Compare with random network (southern sites)

##                 property empirical average.random SD.random
## 1 clustering.coefficient    0.7856         0.0250    0.0015
## 2  average.shortest.path    4.8836         2.7527    0.0023
## 3             modularity    0.6905         0.2537    0.0040

Northern sites network (n = 84)

The total number of OTUs

## [1] "The total number of OTUs: 4213"

Keeping OTUs present at least 3 times in 2 samples across all 28 sites

## [1] "Number of taxa: 1071"

That is a large number of taxa for building a network, and it takes a very long time to run. Moreover, the number of observations can affect network outcomes (Kara et al., 2013), so it makes sense to keep the number of samples consistent across subsets.

I could reduce the northern dataset to the 21 samples from the 7 northernmost sites for comparison with the southern dataset.
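Selecting those samples could look roughly like the sketch below. The metadata table, its column names (`sample`, `site`, `latitude`), and the three samples per site are assumptions for illustration; the real sample sheet may be organized differently.

```r
# Hypothetical sample metadata: 10 sites with 3 samples each
metadata <- data.frame(
  sample = paste0("S", 1:30),
  site = rep(paste0("site", 1:10), each = 3),
  latitude = rep(seq(30, 48, length.out = 10), each = 3)
)

# Mean latitude per site, then the 7 sites with the highest latitude
site_latitude <- tapply(metadata$latitude, metadata$site, mean)
northern_sites <- names(sort(site_latitude, decreasing = TRUE))[1:7]

# Keep only the samples that belong to those sites
northern_subset <- metadata[metadata$site %in% northern_sites, ]
nrow(northern_subset)  # 7 sites x 3 samples
```

The prevalence filter would then be applied to the count table restricted to `northern_subset$sample`.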

Taxa present at least 3 times in 2 samples (n = 21; 7 northernmost sites)

## [1] "Number of taxa: 327"

Network characteristics of northern sites (Spearman)

##                  property northern
## 1  clustering.coefficient   0.3855
## 2              modularity   0.8042
## 3             mean.degree   2.9235
## 4                    size 478.0000
## 5                   order 327.0000
## 6            edge.density   0.0090
## 7           mean.distance   7.1195
## 8             no.clusters  61.0000
## 9             norm.degree   0.0089
## 10 betweenness.centrality   0.1117
## 11     mean.shortest.path   7.1195

Compare with random network (northern sites)

##                 property empirical average.random SD.random
## 1 clustering.coefficient    0.3855         0.0089    0.0039
## 2  average.shortest.path    7.1195         5.3070    0.1102
## 3             modularity    0.8042         0.6130    0.0101

2022-10-14