Decide on co-occurrence network construction

Using relative abundance normalization instead of CLR in this run.
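For reference, the two normalizations named here differ as follows. This is a minimal sketch in plain Python, not the code used for this analysis; the sample counts and the pseudocount of 1 are assumptions made only for illustration.

```python
import math

def relative_abundance(counts):
    """Divide each count by the sample total so the values sum to 1."""
    total = sum(counts)
    return [c / total for c in counts]

def clr(counts, pseudocount=1.0):
    """Centered log-ratio: log of each value minus the mean log over the sample.

    The pseudocount is an assumption of this sketch; it avoids log(0).
    """
    logs = [math.log(c + pseudocount) for c in counts]
    mean_log = sum(logs) / len(logs)
    return [x - mean_log for x in logs]

sample = [120, 30, 0, 6]  # hypothetical read counts for four OTUs in one sample
rel = relative_abundance(sample)   # proportions, sum to 1
clr_values = clr(sample)           # log-ratios, sum to 0
```

Relative abundances are bounded between 0 and 1 and sum to 1 within a sample, whereas CLR values are centered on 0 within a sample.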

Starting off with a network consisting of all the sites (n=108)

How does adjusting the correlation cutoff affect network results?

Here I keep taxa that are present at least three times in at least 10 of the 108 samples. This threshold is somewhat lower than those reported in many papers I looked at. Sequence counts are CLR transformed. The alpha value is set to 0.05; setting alpha to 0.01 doesn’t change the results.
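The prevalence filter can be sketched as below. It assumes that "present at least three times" means a read count of 3 or more in a sample; the function name, the dictionary layout, and the toy table are hypothetical and not taken from the original analysis.

```python
def prevalence_filter(otu_table, min_count=3, min_samples=10):
    """Keep OTUs with at least `min_count` reads in at least `min_samples` samples.

    otu_table maps an OTU id to its list of per-sample read counts.
    """
    kept = {}
    for otu, counts in otu_table.items():
        n_present = sum(1 for c in counts if c >= min_count)
        if n_present >= min_samples:
            kept[otu] = counts
    return kept

# Hypothetical toy table: 3 OTUs across 12 samples.
toy = {
    "OTU_1": [5] * 12,           # passes: 3 or more reads in all 12 samples
    "OTU_2": [5] * 9 + [0] * 3,  # fails: only 9 samples reach the count
    "OTU_3": [2] * 12,           # fails: never reaches 3 reads
}
filtered = prevalence_filter(toy)
print(sorted(filtered))  # ['OTU_1']
```

Lowering `min_samples` makes the filter more permissive and retains more OTUs.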

Taxa that occur at least 3 times in 10 samples

## [1] "The total number of OTUs: 220"

Network characteristics (single network, 220 taxa)

##                  property cutoff_0.4 cutoff_0.6
## 1  clustering.coefficient     0.4395     0.8000
## 2              modularity     0.8463     0.9563
## 3             mean.degree     2.0091     0.3636
## 4                    size   221.0000    40.0000
## 5                   order   220.0000   220.0000
## 6            edge.density     0.0092     0.0017
## 7           mean.distance     3.3191     1.0698
## 8             no.clusters    77.0000   184.0000
## 9             norm.degree     0.0091     0.0017
## 10 betweenness.centrality     0.0119     0.0000
## 11     mean.shortest.path     3.3191     1.0698
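To make the effect of the cutoff concrete, the sketch below thresholds a small hypothetical correlation matrix and then recomputes two of the tabulated properties from order (number of nodes) and size (number of edges). Whether the original analysis thresholds positive correlations only or absolute values is not stated; this sketch assumes positive correlations.

```python
def edges_above_cutoff(corr, cutoff):
    """Return undirected edges (i, j), i < j, whose correlation meets the cutoff."""
    n = len(corr)
    return [(i, j) for i in range(n) for j in range(i + 1, n) if corr[i][j] >= cutoff]

def basic_stats(order, size):
    """Mean degree and edge density for an undirected simple graph."""
    mean_degree = 2 * size / order
    edge_density = size / (order * (order - 1) / 2)
    return mean_degree, edge_density

# Hypothetical 3-taxon correlation matrix.
corr = [[1.0, 0.7, 0.45], [0.7, 1.0, 0.1], [0.45, 0.1, 1.0]]
print(edges_above_cutoff(corr, 0.4))  # [(0, 1), (0, 2)]
print(edges_above_cutoff(corr, 0.6))  # [(0, 1)]

# Order and size taken from the cutoff_0.4 column above.
mean_degree, edge_density = basic_stats(order=220, size=221)
print(round(mean_degree, 4), round(edge_density, 4))  # 2.0091 0.0092
```

The last line reproduces the mean.degree and edge.density values reported for the 0.4 cutoff.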

Compared with a random network

##                 property empirical average.random SD.random
## 1 clustering.coefficient    0.4395         0.0096    0.0080
## 2  average.shortest.path    3.3191         6.7730    0.4732
## 3             modularity    0.8463         0.7392    0.0177
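A random baseline of this kind can be sketched as follows. The random-graph model and the clustering definition used in the original analysis are not stated, so this example assumes Erdős-Rényi G(n, m) graphs with the same order and size as the empirical network and the global clustering coefficient (transitivity); the number of replicates and the seed are arbitrary.

```python
import random
from itertools import combinations

def random_graph(order, size, rng):
    """Erdős-Rényi G(n, m): `size` distinct edges chosen uniformly at random."""
    edges = set()
    while len(edges) < size:
        i, j = rng.sample(range(order), 2)
        edges.add((min(i, j), max(i, j)))
    adjacency = {v: set() for v in range(order)}
    for i, j in edges:
        adjacency[i].add(j)
        adjacency[j].add(i)
    return adjacency

def transitivity(adjacency):
    """Global clustering coefficient: closed triplets / all connected triplets."""
    closed = 0
    triplets = 0
    for nbrs in adjacency.values():
        k = len(nbrs)
        triplets += k * (k - 1) // 2
        # Count neighbor pairs that are themselves connected.
        closed += sum(1 for a, b in combinations(nbrs, 2) if b in adjacency[a])
    return closed / triplets if triplets else 0.0

rng = random.Random(1)  # arbitrary seed, for reproducibility only
values = [transitivity(random_graph(220, 221, rng)) for _ in range(100)]
mean = sum(values) / len(values)
sd = (sum((x - mean) ** 2 for x in values) / (len(values) - 1)) ** 0.5
print(round(mean, 4), round(sd, 4))
```

In a G(n, m) graph the expected clustering coefficient is close to the edge density, which is why the random average in the table is so much smaller than the empirical value.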

This might be too few OTUs. I’ll try building the networks using a more permissive OTU prevalence filter.

Taxa that occur at least 3 times in 6 samples

## [1] "The total number of OTUs: 448"

Network characteristics (single network, 448 taxa)

##                  property cutoff_0.4 cutoff_0.6
## 1  clustering.coefficient     0.4284     0.6531
## 2              modularity     0.7247     0.9246
## 3             mean.degree     3.9420     0.4509
## 4                    size   883.0000   101.0000
## 5                   order   448.0000   448.0000
## 6            edge.density     0.0088     0.0010
## 7           mean.distance     7.1702     1.4615
## 8             no.clusters    89.0000   367.0000
## 9             norm.degree     0.0088     0.0010
## 10 betweenness.centrality     0.0833     0.0002
## 11     mean.shortest.path     7.1702     1.4615

Compared with a random network

##                 property empirical average.random SD.random
## 1 clustering.coefficient    0.4284         0.0086    0.0027
## 2  average.shortest.path    7.1702         4.5518    0.0378
## 3             modularity    0.7247         0.5130    0.0070

Using twice as many taxa makes a big difference in the network. However, this is still a low correlation cutoff, and the network isn’t much different from random. Next, I made a network with 887 OTUs and a correlation cutoff of 0.6. In that network, modularity still wasn’t different from random (not shown; it took a long time to run).

How does subsetting the network into southern and northern subspecies ranges affect network results?

Here I assigned the sites to subspecies ranges based on visual inspection of the USDA PLANTS Database. The prevalence-based filtering is done after subsetting.

Southern sites network (n = 21)

## [1] "The total number of OTUs: 1532"

OTUs present at least 3 times in 2 samples

## [1] "The total number of OTUs: 463"

Network characteristics of southern sites

##                  property cutoff_0.6 cutoff_0.7
## 1  clustering.coefficient     0.7795     0.8336
## 2              modularity     0.6111     0.6660
## 3             mean.degree    17.8920    11.5767
## 4                    size  4142.0000  2680.0000
## 5                   order   463.0000   463.0000
## 6            edge.density     0.0387     0.0251
## 7           mean.distance     4.2840    10.0758
## 8             no.clusters    17.0000    69.0000
## 9             norm.degree     0.0386     0.0250
## 10 betweenness.centrality     0.0743     0.2122
## 11     mean.shortest.path     4.2840    10.0758

Compare with random network (southern sites)

##                 property empirical average.random SD.random
## 1 clustering.coefficient    0.8336         0.0252    0.0016
## 2  average.shortest.path   10.0758         2.7619    0.0028
## 3             modularity    0.6660         0.2553    0.0047

Northern sites network (n = 84)

The total number of OTUs

## [1] "The total number of OTUs: 4213"

Keeping OTUs present at least 3 times in 2 samples for 28 sites

## [1] "The total number of OTUs: 1071"

That is a large number of taxa for building a network, and it takes a really long time to run. Moreover, the number of observations can affect network outcomes (Kara et al., 2013), so it makes sense to keep the number of samples consistent.

I will reduce the number of samples to the 21 northernmost (7 sites) for comparison with the southern dataset.
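The subsampling step could look like the sketch below. The metadata layout (keys `sample`, `site`, `latitude`) and the toy values are hypothetical; three samples per site is inferred from 21 samples across 7 sites.

```python
def northernmost_samples(samples, n_sites=7):
    """Return all samples from the `n_sites` sites with the highest latitude.

    Each sample is a dict with hypothetical keys 'sample', 'site', 'latitude'.
    """
    site_lat = {s["site"]: s["latitude"] for s in samples}
    top_sites = set(sorted(site_lat, key=site_lat.get, reverse=True)[:n_sites])
    return [s for s in samples if s["site"] in top_sites]

# Hypothetical metadata: 10 sites with 3 samples each, latitudes 40 to 49.
samples = [
    {"sample": f"S{site}_{rep}", "site": site, "latitude": 40 + site}
    for site in range(10)
    for rep in range(3)
]
subset = northernmost_samples(samples)
print(len(subset))  # 21
```

Selecting whole sites rather than individual samples keeps the replicates from each site together, which matches the 21-sample southern dataset.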

Taxa present at least 3 times in 2 samples (n=21; 7 northernmost sites)

## [1] "The total number of OTUs: 327"

Network characteristics of northern sites

##                  property cutoff_0.6 cutoff_0.7
## 1  clustering.coefficient     0.4673     0.5343
## 2              modularity     0.7721     0.9050
## 3             mean.degree     4.5933     2.2018
## 4                    size   751.0000   360.0000
## 5                   order   327.0000   327.0000
## 6            edge.density     0.0141     0.0068
## 7           mean.distance     6.3581     5.6152
## 8             no.clusters    24.0000   119.0000
## 9             norm.degree     0.0140     0.0067
## 10 betweenness.centrality     0.0646     0.0205
## 11     mean.shortest.path     6.3581     5.6152

Compare with random network (northern sites)

##                 property empirical average.random SD.random
## 1 clustering.coefficient    0.4673         0.0147    0.0033
## 2  average.shortest.path    6.3581         3.9420    0.0252
## 3             modularity    0.7721         0.4567    0.0069
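One way to read a comparison table like this is to express each empirical value as a z-score against the random networks, that is, the number of random-network standard deviations separating the two. The sketch below uses the northern-sites values from the table above; it is an illustration and not a test performed in the original analysis.

```python
def z_score(empirical, random_mean, random_sd):
    """Distance of the empirical value from the random mean, in SD units."""
    return (empirical - random_mean) / random_sd

# (empirical, average.random, SD.random) copied from the northern-sites table.
northern = {
    "clustering.coefficient": (0.4673, 0.0147, 0.0033),
    "average.shortest.path": (6.3581, 3.9420, 0.0252),
    "modularity": (0.7721, 0.4567, 0.0069),
}
z = {name: z_score(*vals) for name, vals in northern.items()}
for name, value in z.items():
    print(f"{name}: z = {value:.1f}")
```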

2022-10-14
