Using relative abundance normalization instead of CLR in this run.
Here I keep taxa that are present at least three times in at least 10 of the 108 samples. This threshold is somewhat lower than those reported in many papers I looked at. Sequence counts are CLR transformed. Alpha is set to 0.05; setting it to 0.01 doesn’t change the results.
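The filtering and transformation steps described above can be sketched as follows. This is a minimal Python illustration, not the code used to produce the output in these notes; the function names, the toy counts, and the pseudocount of 1 are assumptions made for the example.

```python
import math

def prevalence_filter(counts, min_count=3, min_samples=10):
    """Keep taxa with at least `min_count` reads in at least `min_samples` samples.

    `counts` maps taxon name -> list of per-sample read counts.
    """
    return {
        taxon: row
        for taxon, row in counts.items()
        if sum(1 for c in row if c >= min_count) >= min_samples
    }

def clr(sample, pseudocount=1.0):
    """Centered log-ratio transform of one sample's counts.

    The pseudocount is an assumption of this sketch; it avoids log(0).
    """
    logs = [math.log(c + pseudocount) for c in sample]
    mean_log = sum(logs) / len(logs)
    return [x - mean_log for x in logs]

# Toy data: 3 taxa x 4 samples, with the sample threshold scaled down to 3.
toy = {
    "otu_a": [5, 0, 7, 9],
    "otu_b": [1, 0, 2, 0],
    "otu_c": [3, 3, 0, 4],
}
kept = prevalence_filter(toy, min_count=3, min_samples=3)

# Transpose to per-sample vectors before applying CLR.
samples = list(zip(*kept.values()))
clr_samples = [clr(s) for s in samples]
```

With the defaults (`min_count=3`, `min_samples=10`) the filter corresponds to the threshold stated above. Each CLR-transformed sample sums to zero by construction.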
## [1] "The total number of OTUs: 220"
##                  property cutoff_0.4 cutoff_0.6
## 1  clustering.coefficient     0.4395     0.8000
## 2              modularity     0.8463     0.9563
## 3             mean.degree     2.0091     0.3636
## 4                    size   221.0000    40.0000
## 5                   order   220.0000   220.0000
## 6            edge.density     0.0092     0.0017
## 7           mean.distance     3.3191     1.0698
## 8             no.clusters    77.0000   184.0000
## 9             norm.degree     0.0091     0.0017
## 10 betweenness.centrality     0.0119     0.0000
## 11     mean.shortest.path     3.3191     1.0698
##                 property empirical average.random SD.random
## 1 clustering.coefficient    0.4395         0.0096    0.0080
## 2  average.shortest.path    3.3191         6.7730    0.4732
## 3             modularity    0.8463         0.7392    0.0177
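One way to read the empirical versus random comparison is to express each difference in units of the random ensemble's standard deviation. The sketch below is a Python illustration under that assumption; the notes do not say that a z-score was computed, and `z_score` is a name chosen for this example.

```python
def z_score(empirical, random_mean, random_sd):
    """Distance of the empirical value from the random mean, in random-ensemble SDs."""
    return (empirical - random_mean) / random_sd

# Values copied from the 220-OTU comparison table above:
# (empirical, average.random, SD.random)
table = {
    "clustering.coefficient": (0.4395, 0.0096, 0.0080),
    "average.shortest.path": (3.3191, 6.7730, 0.4732),
    "modularity": (0.8463, 0.7392, 0.0177),
}

z = {name: z_score(*values) for name, values in table.items()}
for name, value in z.items():
    print(f"{name}: z = {value:.2f}")
```

A positive value means the empirical network scores higher than the random networks on that property, and a negative value means it scores lower.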
This might be too few OTUs. I’ll try building the networks using a more permissive OTU prevalence filter.
## [1] "The total number of OTUs: 448"
##                  property cutoff_0.4 cutoff_0.6
## 1  clustering.coefficient     0.4284     0.6531
## 2              modularity     0.7247     0.9246
## 3             mean.degree     3.9420     0.4509
## 4                    size   883.0000   101.0000
## 5                   order   448.0000   448.0000
## 6            edge.density     0.0088     0.0010
## 7           mean.distance     7.1702     1.4615
## 8             no.clusters    89.0000   367.0000
## 9             norm.degree     0.0088     0.0010
## 10 betweenness.centrality     0.0833     0.0002
## 11     mean.shortest.path     7.1702     1.4615
##                 property empirical average.random SD.random
## 1 clustering.coefficient    0.4284         0.0086    0.0027
## 2  average.shortest.path    7.1702         4.5518    0.0378
## 3             modularity    0.7247         0.5130    0.0070
Using twice as many taxa makes a big difference in the network. However, this is still a low correlation cutoff, and the network isn’t much different from random. Next, I made a network with 887 OTUs and a correlation cutoff of 0.6. In that network, modularity still wasn’t different from random (not shown; it took a long time to run).
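To make the role of the correlation cutoff concrete, the following Python sketch thresholds a set of pairwise correlations into an edge list and recomputes a few of the tabulated properties. It is an illustration only: the toy correlations are invented, and keeping strong negative correlations (by using the absolute value) is an assumption that the notes do not confirm.

```python
def threshold_network(corr, cutoff):
    """Return taxon pairs whose absolute correlation is at least `cutoff`.

    `corr` maps (taxon_i, taxon_j) -> correlation coefficient.
    Using abs() keeps strong negative correlations; that is an assumption.
    """
    return [pair for pair, r in corr.items() if abs(r) >= cutoff]

def summarize(taxa, edges):
    """Order, size, mean degree, and edge density of an undirected network."""
    order = len(taxa)  # number of nodes (taxa)
    size = len(edges)  # number of edges
    return {
        "order": order,
        "size": size,
        "mean.degree": 2 * size / order,
        "edge.density": size / (order * (order - 1) / 2),
    }

# Toy correlations among four taxa (invented values).
taxa = ["a", "b", "c", "d"]
corr = {
    ("a", "b"): 0.72, ("a", "c"): 0.45, ("a", "d"): -0.10,
    ("b", "c"): 0.65, ("b", "d"): 0.30, ("c", "d"): -0.61,
}

for cutoff in (0.4, 0.6):
    edges = threshold_network(corr, cutoff)
    print(cutoff, summarize(taxa, edges))
```

Raising the cutoff can only remove edges, so size, mean degree, and edge density fall while order stays fixed. The same formulas agree with the 220-OTU table: with 220 nodes and 221 edges, the mean degree is 2 × 221 / 220 ≈ 2.0091 and the edge density is 221 / 24090 ≈ 0.0092.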
Here I assigned the sites to subspecies ranges based on visual inspection of the USDA PLANTS Database. The prevalence-based filtering was done after subsetting.
## [1] "The total number of OTUs: 1532"
## [1] "The total number of OTUs: 463"
##                  property cutoff_0.6 cutoff_0.7
## 1  clustering.coefficient     0.7795     0.8336
## 2              modularity     0.6111     0.6660
## 3             mean.degree    17.8920    11.5767
## 4                    size  4142.0000  2680.0000
## 5                   order   463.0000   463.0000
## 6            edge.density     0.0387     0.0251
## 7           mean.distance     4.2840    10.0758
## 8             no.clusters    17.0000    69.0000
## 9             norm.degree     0.0386     0.0250
## 10 betweenness.centrality     0.0743     0.2122
## 11     mean.shortest.path     4.2840    10.0758
##                 property empirical average.random SD.random
## 1 clustering.coefficient    0.8336         0.0252    0.0016
## 2  average.shortest.path   10.0758         2.7619    0.0028
## 3             modularity    0.6660         0.2553    0.0047
The total number of OTUs
## [1] "The total number of OTUs: 4213"
## [1] "The total number of OTUs: 1071"
That is a large number of taxa for building a network, and it takes a really long time to run. Moreover, the number of observations can affect network outcomes (Kara et al., 2013), so it makes sense to keep the number of samples consistent.
I will reduce the number of samples to the 21 northernmost (7 sites) for comparison with the southern dataset.
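The subsetting step can be sketched as below. This Python illustration assumes that each sample carries a site label and a latitude, and that each site contributes three samples (inferred from 21 samples across 7 sites). The metadata keys, site names, and latitudes are hypothetical.

```python
def northernmost_samples(samples, n_sites=7):
    """Return all samples from the `n_sites` sites with the highest latitude.

    `samples` is a list of dicts with the hypothetical keys
    "sample_id", "site", and "latitude" (decimal degrees north).
    """
    site_lat = {s["site"]: s["latitude"] for s in samples}
    keep = set(sorted(site_lat, key=site_lat.get, reverse=True)[:n_sites])
    return [s for s in samples if s["site"] in keep]

# Toy metadata: 10 sites with 3 samples each (names and latitudes invented).
toy = [
    {"sample_id": f"site{i}_rep{j}", "site": f"site{i}", "latitude": 35.0 + i}
    for i in range(10)
    for j in range(3)
]

subset = northernmost_samples(toy)
print(len(subset))  # 7 sites x 3 samples = 21
```

Selecting whole sites rather than individual samples keeps the replicates from each site together, so the subset has the same site structure as the dataset it is compared against.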
## [1] "The total number of OTUs: 327"
##                  property cutoff_0.6 cutoff_0.7
## 1  clustering.coefficient     0.4673     0.5343
## 2              modularity     0.7721     0.9050
## 3             mean.degree     4.5933     2.2018
## 4                    size   751.0000   360.0000
## 5                   order   327.0000   327.0000
## 6            edge.density     0.0141     0.0068
## 7           mean.distance     6.3581     5.6152
## 8             no.clusters    24.0000   119.0000
## 9             norm.degree     0.0140     0.0067
## 10 betweenness.centrality     0.0646     0.0205
## 11     mean.shortest.path     6.3581     5.6152
##                 property empirical average.random SD.random
## 1 clustering.coefficient    0.4673         0.0147    0.0033
## 2  average.shortest.path    6.3581         3.9420    0.0252
## 3             modularity    0.7721         0.4567    0.0069