The goal of this work is to determine whether there are patterns in taxa associations in the pitcher plant microbiome across a large geographic range, which of these patterns are explained by habitat filtering and dispersal limitation, and whether aggregation at higher taxonomic levels obscures those patterns.
Here I keep taxa that are present at least three times in 10 of the 108 samples. This prevalence threshold is somewhat lower than those reported in many of the papers I looked at. Sequence counts are centered log-ratio (CLR) transformed. The significance level (alpha) is set to 0.05; setting alpha to 0.01 doesn’t change the results.
## [1] "The total number of OTUs: 220"
##                  property cutoff_0.4 cutoff_0.6
## 1  clustering.coefficient     0.6383     1.0000
## 2              modularity     0.9477     0.9174
## 3             mean.degree     0.6000     0.2000
## 4                    size    66.0000    22.0000
## 5                   order   220.0000   220.0000
## 6            edge.density     0.0027     0.0009
## 7           mean.distance     1.4409     1.0000
## 8             no.clusters   163.0000   201.0000
## 9             norm.degree     0.0027     0.0009
## 10 betweenness.centrality     0.0004     0.0000
## 11     mean.shortest.path     1.4409     1.0000
##                 property empirical average.random SD.random
## 1 clustering.coefficient    0.6383         0.0020    0.0115
## 2  average.shortest.path    1.4409         2.1665    0.5249
## 3             modularity    0.9477         0.9453    0.0145
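The code behind the filtering and transformation described at the top of this section is not shown. As a rough illustration only, here is a minimal Python sketch of a prevalence filter followed by a CLR transform. It assumes that "present at least three times" means a read count of at least 3, that the sample threshold is a minimum, and that a pseudocount of 1 is added before taking logs. The function names, the toy data, and the pseudocount are all hypothetical and are not taken from the original analysis.

```python
import math


def prevalence_filter(counts, min_count=3, min_samples=10):
    """Keep taxa with at least `min_count` reads in at least `min_samples` samples.

    `counts` maps a taxon name to its list of per-sample read counts.
    """
    return {
        taxon: row
        for taxon, row in counts.items()
        if sum(1 for c in row if c >= min_count) >= min_samples
    }


def clr_transform(sample, pseudocount=1.0):
    """Centered log-ratio transform of one sample's counts.

    The pseudocount is an assumption made here to avoid log(0) for absent taxa.
    """
    logs = [math.log(c + pseudocount) for c in sample]
    mean_log = sum(logs) / len(logs)
    return [x - mean_log for x in logs]


# Toy data (hypothetical): 3 taxa x 4 samples, with a lowered sample threshold.
toy = {"otu_a": [5, 3, 0, 7], "otu_b": [1, 0, 0, 2], "otu_c": [3, 3, 3, 3]}
kept = prevalence_filter(toy, min_count=3, min_samples=3)

# Transpose to per-sample vectors, then transform each sample separately.
samples = list(zip(*kept.values()))
clr = [clr_transform(s) for s in samples]
```

Each CLR-transformed sample sums to zero by construction, which is a quick sanity check on any implementation of the transform.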
This might be too few OTUs. I’ll try building the networks using a more permissive OTU prevalence filter.
## [1] "The total number of OTUs: 448"
##                  property cutoff_0.4 cutoff_0.6
## 1  clustering.coefficient     0.2571     0.6923
## 2              modularity     0.8845     0.9560
## 3             mean.degree     1.1830     0.1741
## 4                    size   265.0000    39.0000
## 5                   order   448.0000   448.0000
## 6            edge.density     0.0026     0.0004
## 7           mean.distance     6.8360     1.1364
## 8             no.clusters   235.0000   412.0000
## 9             norm.degree     0.0026     0.0004
## 10 betweenness.centrality     0.0233     0.0000
## 11     mean.shortest.path     6.8360     1.1364
##                 property empirical average.random SD.random
## 1 clustering.coefficient    0.2571         0.0027    0.0053
## 2  average.shortest.path    6.8360         9.9743    2.5302
## 3             modularity    0.8845         0.9224    0.0136
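One common way to express the empirical-versus-random comparison in the table above is a z-score: the empirical value minus the random-network mean, divided by the random-network standard deviation. The sketch below applies that formula to the values printed for the 448-OTU network at the 0.4 cutoff. The z-score is an illustration chosen here, not necessarily the comparison used in the original analysis, and the function name is hypothetical.

```python
def z_score(empirical, random_mean, random_sd):
    """Distance of the empirical value from the random mean, in random SDs."""
    return (empirical - random_mean) / random_sd


# (empirical, average.random, SD.random), copied from the table above.
rows = {
    "clustering.coefficient": (0.2571, 0.0027, 0.0053),
    "average.shortest.path": (6.8360, 9.9743, 2.5302),
    "modularity": (0.8845, 0.9224, 0.0136),
}

z = {name: z_score(*values) for name, values in rows.items()}
for name, value in z.items():
    print(f"{name}: z = {value:.2f}")
```

How large a z-score must be before a property counts as different from random is a judgment call that depends on how many random networks were generated and on their distribution.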
Using roughly twice as many taxa (448 instead of 220) makes a big difference in the network: at the 0.4 cutoff the number of edges rises from 66 to 265. However, 0.4 is still a low correlation cutoff, and the modularity (0.8845) isn’t much different from the random-network average (0.9224). I’ll try including even more OTUs and setting the correlation cutoff at a more reasonable level (0.6).
## [1] "The total number of OTUs: 887"
##                  property    value
## 1  clustering.coefficient   0.4429
## 2              modularity   0.8913
## 3             mean.degree   0.5366
## 4                    size 238.0000
## 5                   order 887.0000
## 6            edge.density   0.0006
## 7           mean.distance   3.6211
## 8             no.clusters 713.0000
## 9             norm.degree   0.0006
## 10 betweenness.centrality   0.0009
## 11     mean.shortest.path   3.6211
##                 property empirical average.random SD.random
## 1 clustering.coefficient    0.4429         0.0012    0.0060
## 2  average.shortest.path    3.6211         2.0980    0.4485
## 3             modularity    0.8913         0.9862    0.0028
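To make the role of the correlation cutoff concrete, the following Python sketch links two taxa whenever the Pearson correlation of their (transformed) abundance profiles reaches the cutoff, then derives mean degree and edge density from the node and edge counts. It is a simplified stand-in, not the original pipeline: it assumes Pearson correlation, keeps only positive correlations, and ignores the significance (alpha) filter. All function names and the toy profiles are hypothetical. The last lines recompute two properties from the order and size reported above for the 887-OTU network.

```python
from itertools import combinations
from math import sqrt


def pearson(x, y):
    """Pearson correlation of two equal-length profiles (0.0 if either is constant)."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        return 0.0
    return sxy / sqrt(sxx * syy)


def build_edges(profiles, cutoff=0.6):
    """Return taxon pairs whose correlation is at or above `cutoff`."""
    return [
        (a, b)
        for a, b in combinations(sorted(profiles), 2)
        if pearson(profiles[a], profiles[b]) >= cutoff
    ]


def basic_properties(order, size):
    """Mean degree and edge density of an undirected graph with no self-loops."""
    return {
        "order": order,                                   # number of nodes
        "size": size,                                     # number of edges
        "mean.degree": 2 * size / order,
        "edge.density": 2 * size / (order * (order - 1)),
    }


# Toy profiles (hypothetical): only a and b are positively correlated.
toy_profiles = {"a": [1, 2, 3, 4], "b": [2, 4, 6, 8.5], "c": [4, 3, 2, 1]}
toy_edges = build_edges(toy_profiles, cutoff=0.6)

# Order and size as reported in the table above.
props = basic_properties(order=887, size=238)
```

Rounded to four decimals, the recomputed mean degree and edge density agree with the printed values (0.5366 and 0.0006), which suggests the table follows these standard definitions.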
Here I assigned the sites to subspecies ranges based on visual inspection of the USDA PLANTS Database. Prevalence-based filtering is done after splitting the samples into subsets.
## [1] "The total number of OTUs: 1532"
## [1] "The total number of OTUs: 463"
##                  property  southern
## 1  clustering.coefficient    0.7856
## 2              modularity    0.6905
## 3             mean.degree   11.7019
## 4                    size 2709.0000
## 5                   order  463.0000
## 6            edge.density    0.0253
## 7           mean.distance    4.8836
## 8             no.clusters   30.0000
## 9             norm.degree    0.0253
## 10 betweenness.centrality    0.0759
## 11     mean.shortest.path    4.8836
##                 property empirical average.random SD.random
## 1 clustering.coefficient    0.7856         0.0250    0.0015
## 2  average.shortest.path    4.8836         2.7527    0.0023
## 3             modularity    0.6905         0.2537    0.0040
## [1] "The total number of OTUs: 4213"
## [1] "Number of taxa: 1071"
That is a large number of taxa to build a network from, and the run takes a really long time. Moreover, the number of observations can affect network outcomes (Kara et al., 2013), so it makes sense to keep the number of samples consistent across subsets.
I could reduce the number of samples to the 21 northernmost (7 sites) for comparison to the southern dataset.
## [1] "Number of taxa: 327"
##                  property northern
## 1  clustering.coefficient   0.3855
## 2              modularity   0.8042
## 3             mean.degree   2.9235
## 4                    size 478.0000
## 5                   order 327.0000
## 6            edge.density   0.0090
## 7           mean.distance   7.1195
## 8             no.clusters  61.0000
## 9             norm.degree   0.0089
## 10 betweenness.centrality   0.1117
## 11     mean.shortest.path   7.1195
##                 property empirical average.random SD.random
## 1 clustering.coefficient    0.3855         0.0089    0.0039
## 2  average.shortest.path    7.1195         5.3070    0.1102
## 3             modularity    0.8042         0.6130    0.0101
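The subsampling step behind the northern network (keeping only the samples from the northernmost sites so that the sample count is comparable to the southern dataset) could be sketched as follows. The record layout, with `sample_id`, `site`, and `latitude` keys, the function name, and the toy data are all assumptions made for illustration; the original metadata format is not shown.

```python
def northernmost_samples(samples, n_sites=7):
    """Return all samples belonging to the `n_sites` highest-latitude sites.

    `samples` is a list of dicts with 'sample_id', 'site', and 'latitude' keys
    (a hypothetical layout); every sample of a site shares that site's latitude.
    """
    site_latitude = {s["site"]: s["latitude"] for s in samples}
    keep = set(sorted(site_latitude, key=site_latitude.get, reverse=True)[:n_sites])
    return [s for s in samples if s["site"] in keep]


# Toy metadata (hypothetical): 10 sites, 3 samples each, latitudes 30 to 48.
toy_samples = [
    {"sample_id": f"s{site}_{rep}", "site": f"site{site}", "latitude": 30.0 + 2 * site}
    for site in range(10)
    for rep in range(3)
]

subset = northernmost_samples(toy_samples, n_sites=7)
```

Selecting by site rather than by individual sample keeps all replicates from a site together, so the prevalence filter applied afterwards sees complete sites.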