Earth microbial co-occurrence network
Fourteen microbial co-occurrence networks representing different environments were constructed, comprising 12,646 exact sequence variants (ESVs). To reduce noise and false-positive predictions, network inclusion was restricted to ESVs present in at least 10% of samples, and conservative statistical cut-off values were applied (see the “Materials and methods” section). The 14 networks were merged into a single Earth microbial co-occurrence network by overlapping their vertices and edges; after unconnected vertices were removed, the final network consisted of 2928 vertices and 54,299 edges (Fig. 1a). The scale-free property (R²=0.19, P<0.001) and the independence of degree from abundance (R²=−0.08, P=0.07) suggest a non-random co-occurrence pattern in this microbial network (Fig. S1). Because ESVs were annotated to their representative microbial taxa, we were able to identify 812 taxa pairs that were present more than twice in the global microbial co-occurrence network. We validated 432 co-occurrence edges, 15 intra-taxa edges, and 6 competition edges via literature mining (Data file S1). Although this accounts for only 1.5% of the edges in the global microbial co-occurrence network, those 812 taxa pairs account for 30% of the edges present in more than 6 environments.
This global network exhibits a high degree of modularity, but only 8 of the 53 modules accounted for 87.9% of vertices (Fig. S2). Among these eight modules, the first 5 are densely nested into a giant module, while modules 6, 7, and 8 remain isolated from it (Fig. S3). The 8 modules had different taxonomic profiles and were dominated by Clostridia, Alphaproteobacteria, Deltaproteobacteria, and Gammaproteobacteria (Fig. 1a). Vertices from microbiomes of soils, non-saline waters, animal distal guts, and animal surfaces were present in all 8 modules (Fig. 1b; Fig. S4a) and were overrepresented in different modules (Fig. S4b). However, vertices from microbiomes of animal corpus were mostly restricted to, and overrepresented relative to random frequency in, M3 (3.1%), while vertices from plant corpus comprised a major portion of, and were overrepresented relative to random frequency in, M3 (4.6%) and M4 (2.3%; Fig. S4a-b).
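The merging step described above (overlapping vertices and edges, then dropping unconnected vertices) can be illustrated with a minimal sketch. It assumes each environmental network is available as a list of undirected ESV pairs; the function name `merge_networks` and the ESV labels are hypothetical and the toy data are not taken from the study.

```python
from itertools import chain


def merge_networks(networks):
    """Merge undirected edge lists by union.

    Returns (vertices, edges, edge_counts), where edge_counts records in how
    many input networks each edge occurs. Vertices are derived from edges, so
    vertices without any edge are dropped automatically.
    """
    edge_counts = {}
    for edges in networks:
        # frozenset makes (a, b) and (b, a) the same edge; self-loops are skipped.
        unique_edges = {frozenset(e) for e in edges if e[0] != e[1]}
        for edge in unique_edges:
            edge_counts[edge] = edge_counts.get(edge, 0) + 1
    vertices = set(chain.from_iterable(edge_counts))
    return vertices, set(edge_counts), edge_counts


# Toy input with hypothetical ESV labels (illustration only).
soil = [("esv1", "esv2"), ("esv2", "esv3")]
gut = [("esv2", "esv1"), ("esv4", "esv5")]
vertices, edges, counts = merge_networks([soil, gut])
```

Keeping the per-edge count alongside the union makes it straightforward to ask how many environments share a given edge, which is the kind of tally the text reports for edges present in more than 6 environments.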
Phylogeny of co-occurrence network
With regard to phylogeny, a non-random edge distribution across taxa was observed, with most co-occurrence relationships derived from the classes Alphaproteobacteria, Clostridia, and Deltaproteobacteria (Fig. 2a). Most combinations between dominant classes were overrepresented relative to random frequency (Fig. 2b). However, only certain combinations between rare classes, such as Flavobacteriia and Gemmatimonadetes, Bacteroidia and Anaerolineae, and Gemmatimonadetes and Bacteroidia, were overrepresented relative to random frequency. For within-taxa co-occurrence, only co-occurrence within the classes Deltaproteobacteria, Planctomycetia, Anaerolineae, and Acidobacteria Gp2 was overrepresented relative to random frequency. Because the subnetworks for different environments display different co-occurrence patterns, certain co-occurrence relationships were overrepresented relative to random frequency only in specific environments (Fig. 2c).
Topological properties
To avoid biases introduced by sample number and ESV number, we inferred a subnetwork for each of 12 environments from datasets trimmed to a uniform size (see the “Materials and methods” section). The topological properties were highly variable among the 12 environmental subnetworks (Fig. 3). Although the datasets for the 12 environments were trimmed to the same size, the edge number of the animal distal gut subnetwork (4574) was about 13 times that of the non-saline surface subnetwork (350). The diameter values ranged from 4 to 6 but were not correlated with edge numbers. The clustering coefficients of the subnetworks for animal proximal gut (0.22) and saline sediment (0.22) were higher than those of the subnetworks for other environments. The average separation (0.30) and modularity (2.7) were highest for the non-saline surface subnetwork. The average betweenness centrality values of the subnetworks for animal distal gut (212.6) and soil (206.0) were greater than those of the other environments.
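For readers less familiar with these measures, the following minimal sketch computes two of them, the average clustering coefficient and the diameter, on a toy graph. It assumes the standard textbook definitions (local clustering averaged over all vertices, with vertices of degree below 2 counted as 0; diameter as the longest shortest path within any connected component), which may differ from the exact definitions used in the study. The helper names are hypothetical.

```python
from collections import deque
from itertools import combinations


def build_adjacency(edges):
    """Build an undirected adjacency map, ignoring self-loops."""
    adj = {}
    for u, v in edges:
        if u != v:
            adj.setdefault(u, set()).add(v)
            adj.setdefault(v, set()).add(u)
    return adj


def average_clustering(adj):
    """Mean local clustering coefficient over all vertices."""
    total = 0.0
    for nbrs in adj.values():
        k = len(nbrs)
        if k < 2:
            continue  # vertices with fewer than two neighbours contribute 0
        links = sum(1 for a, b in combinations(nbrs, 2) if b in adj[a])
        total += 2 * links / (k * (k - 1))
    return total / len(adj) if adj else 0.0


def diameter(adj):
    """Longest shortest path, found by breadth-first search from every vertex."""
    longest = 0
    for source in adj:
        dist = {source: 0}
        queue = deque([source])
        while queue:
            node = queue.popleft()
            for nbr in adj[node]:
                if nbr not in dist:
                    dist[nbr] = dist[node] + 1
                    queue.append(nbr)
        longest = max(longest, max(dist.values()))
    return longest


# Toy graph: a triangle (a, b, c) with a two-edge tail (c-d-e).
toy_adj = build_adjacency([("a", "b"), ("b", "c"), ("a", "c"), ("c", "d"), ("d", "e")])
toy_clustering = average_clustering(toy_adj)
toy_diameter = diameter(toy_adj)
```

The all-pairs breadth-first search is quadratic in the worst case, which is acceptable for subnetworks of a few thousand edges but would warrant a dedicated graph library for larger networks.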
Generalist and specialist edges
The proportion of generalist edges, which were present in more than one subnetwork, ranged from 34.3 to 57.0% of the edges in corresponding subnetworks (Fig. 4a). Generalist edges accounted for less than 50% of edges in most subnetworks, except in non-saline water, animal secretion, and the surfaces of plants and animals. The environmental localization of generalist edges was assessed using omission scores (OS, see the “Materials and methods” section). Only 3.4% of generalist edges were identified as local edges (Data file S2).
Specialist edges, which are present in a single subnetwork, could link either environment-specific vertex pairs, present only in environment-specific subnetworks, or generalist vertex pairs, present in at least two subnetworks. Specialist edges linking specific vertex pairs accounted for 54.5% of edges in the animal proximal gut subnetwork and 52.4% of edges in the rhizosphere subnetwork, but only 15.6% of edges in the animal secretion subnetwork. The proportion of specialist edges linking generalist vertex pairs ranged from 9.6 to 29.8% of the edges in the corresponding subnetworks; most proportions were greater than 20%, except in the animal proximal gut (9.6%), rhizosphere (13.3%), and saline water (19.1%) subnetworks. The proportions of generalist edges were negatively correlated with the proportions of specialist edges linking specific vertex pairs (ρ=−0.87, P<0.001), but were not correlated with the proportions of specialist edges linking generalist vertex pairs (ρ=0.11, P<0.72) (Fig. S5). Moreover, the proportions of these three edge types were not related to edge numbers in the subnetworks (P>0.10) (Fig. S6).
The profiles of the 50 most abundant associated vertices differed among the three edge groups (Fig. 4b). For example, Sphingobacterium was enriched in vertices associated with generalist edges, among which the most abundant edges were Sphingobacterium-Spartobacteria, Sphingobacterium-Legionella, and Sphingobacterium-Solirubrobacter (Data file S2). Microgenomates was enriched in vertices associated with specialist edges linking generalist vertices, among which the most abundant co-occurrence relationships were between Microgenomates and Armatimonadetes. The taxa profiles of vertices associated with the three edge groups varied with environment (Fig. S7-S9).
Based on edge overlap among the subnetworks inferred from trimmed microbial community data, the 12 environments were clustered into two groups (Fig. 4c). One group consisted of the subnetworks for soil, non-saline water, animal surface, and animal distal gut (group I); the other consisted of the subnetworks for rhizosphere, plant surface, animal secretion and proximal gut, saline water and sediment, and non-saline sediment and surface (group II). These two groups were mainly linked through the surface microbiomes of plants and animals.
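The three edge types defined above can be expressed as a small classification routine. This is only a sketch under stated assumptions: a vertex pair is treated as generalist when both of its vertices occur in at least two subnetworks, which is one possible reading of the definition, and the function name, label strings, and toy data are hypothetical.

```python
def classify_edges(subnetworks):
    """Label every edge of every subnetwork.

    subnetworks: dict mapping environment name -> iterable of (u, v) pairs.
    Returns dict mapping environment -> {edge (frozenset): label}.
    """
    norm = {env: {frozenset(e) for e in edges} for env, edges in subnetworks.items()}

    # Record in which environments each edge and each vertex occurs.
    edge_envs, vertex_envs = {}, {}
    for env, edges in norm.items():
        for edge in edges:
            edge_envs.setdefault(edge, set()).add(env)
            for vertex in edge:
                vertex_envs.setdefault(vertex, set()).add(env)

    labels = {}
    for env, edges in norm.items():
        labels[env] = {}
        for edge in edges:
            if len(edge_envs[edge]) > 1:
                labels[env][edge] = "generalist"
            elif all(len(vertex_envs[v]) >= 2 for v in edge):
                # Assumption: both endpoints must occur in >= 2 subnetworks.
                labels[env][edge] = "specialist_generalist_vertices"
            else:
                labels[env][edge] = "specialist_specific_vertices"
    return labels


# Toy subnetworks with hypothetical vertex labels (illustration only).
toy = {
    "soil": [("a", "b"), ("b", "c"), ("x", "y")],
    "gut": [("b", "a"), ("a", "c")],
}
toy_labels = classify_edges(toy)
soil_generalist_share = (
    sum(1 for lab in toy_labels["soil"].values() if lab == "generalist")
    / len(toy_labels["soil"])
)
```

Per-subnetwork proportions such as `soil_generalist_share` are the quantities that would then be compared across environments, for instance with a rank correlation.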
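As an illustration of how edge overlap between subnetworks might be quantified before clustering, the sketch below computes a pairwise Jaccard index on edge sets. The Jaccard index is an assumption made here for illustration; the text does not state which overlap measure or clustering method was used, and the function names and toy data are hypothetical.

```python
from itertools import combinations


def jaccard(a, b):
    """Jaccard index of two sets; 0.0 when both are empty."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def edge_overlap(subnetworks):
    """Pairwise Jaccard overlap of undirected edge sets.

    subnetworks: dict mapping environment name -> iterable of (u, v) pairs.
    Returns dict mapping (env_a, env_b), with names in sorted order, -> overlap.
    """
    norm = {env: {frozenset(e) for e in edges} for env, edges in subnetworks.items()}
    return {
        (x, y): jaccard(norm[x], norm[y])
        for x, y in combinations(sorted(norm), 2)
    }


# Toy subnetworks with hypothetical vertex labels (illustration only).
toy_nets = {
    "soil": [("a", "b"), ("b", "c")],
    "water": [("b", "a"), ("c", "d")],
    "gut": [("x", "y")],
}
overlap = edge_overlap(toy_nets)
```

A matrix of `1 - overlap` values could serve as a distance matrix for hierarchical clustering of the environments.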
Network hubs
To correct for biases in sample or taxa number, we identified the ten vertices with the highest degree as hubs in each subnetwork inferred from the 12 trimmed datasets, which share the same sample and taxa numbers. The resulting 120 hubs belonged to 60 ESVs (Fig. 5a), mainly from the classes Clostridia, Deltaproteobacteria, Alphaproteobacteria, Actinobacteria, and Gammaproteobacteria (Fig. 5b). Based on hub presence, the 12 subnetworks were clustered into two groups, consistent with the two groups obtained from edge overlap as described above. Acidobacteria Gp2 and Nisaea were identified as hubs in most subnetworks. Latescibacteria was identified as a hub in all the subnetworks for soil, non-saline water, animal surface, and animal distal gut (group I). Treponema, Micrococcus, and Methanobrevibacter were identified as hubs in four of the subnetworks for rhizosphere, plant surface, animal secretion and proximal gut, saline water and sediment, and non-saline sediment and surface (group II). Thirty-seven hubs were specialist hubs, that is, hubs in only one subnetwork (Fig. 5a), for example in the subnetworks for soil (5), saline sediment (5), and rhizosphere (5).
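The hub selection described above (the highest-degree vertices per subnetwork, then hubs unique to one subnetwork) can be sketched as follows. The function names are hypothetical, ties in degree are broken alphabetically as an arbitrary assumption, and the toy example uses k=2 rather than the ten hubs used in the study.

```python
from collections import Counter


def top_hubs(edges, k=10):
    """Return the k vertices with the highest degree in an undirected edge list."""
    degree = Counter()
    for edge in {frozenset(e) for e in edges if e[0] != e[1]}:
        degree.update(edge)  # each endpoint gains one degree
    # Sort by descending degree, then by name so ties are deterministic.
    ranked = sorted(degree, key=lambda v: (-degree[v], v))
    return ranked[:k]


def specialist_hubs(hubs_by_env):
    """Hubs that appear in the hub list of exactly one subnetwork."""
    counts = Counter(v for hubs in hubs_by_env.values() for v in set(hubs))
    return {v for v, n in counts.items() if n == 1}


# Toy subnetworks with hypothetical vertex labels (illustration only).
toy_edges = {
    "soil": [("a", "b"), ("a", "c"), ("a", "d"), ("b", "c")],
    "gut": [("a", "x"), ("a", "y"), ("x", "y"), ("x", "z")],
}
hubs = {env: top_hubs(edges, k=2) for env, edges in toy_edges.items()}
unique_hubs = specialist_hubs(hubs)
```

In this toy case vertex `a` is a hub in both subnetworks, so only the remaining hubs are counted as specialists.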
Negative co-occurrence links
The proportion of negative edges ranged from 1.9 to 48.9% in the 12 subnetworks inferred from trimmed datasets (Fig. 6). Most subnetworks consisted of more than 10% negative edges; the exceptions were the subnetworks for soil (1.9%) and non-saline water (7.5%). The proportion of negative edges ranged from 10.1 to 20.1% in the subnetworks for animal-associated microbiomes (animal surface, secretion, and distal and proximal gut) and from 27.1 to 30.8% in the subnetworks for plant-associated microbiomes (rhizosphere and plant surface). It ranged from 32.8 to 39.7% in the subnetworks for sediments and reached 48.9% in the subnetwork for non-saline surface. Vertices linked by negative edges were dominated by the classes Alphaproteobacteria, Actinobacteria, Clostridia, Deltaproteobacteria, and Gammaproteobacteria, but the taxa profiles of negative edge-linked vertices varied with environment (Fig. 6). A substantial proportion of negative edges were linked with Acidobacteria in the subnetworks for soil, saline sediment, and animal proximal gut, with Spirochaetia in the subnetworks for saline and non-saline water, and with Sphingobacteria in the subnetworks for plant, animal, and non-saline surfaces. However, most negative edges were environmental specialists at the genus level, except for the negative co-occurrence relationships between Spartobacteria and Acidobacteria Gp10, between Legionella and Plantactinospora, and between Acidobacteria Gp6 and Acidobacteria Gp10 (Data file S3).
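The per-subnetwork proportion of negative edges reported above amounts to a simple count. The sketch below assumes each edge carries a signed association weight, with a negative weight marking a negative co-occurrence; the weight representation, function name, and toy values are assumptions for illustration, not the study's data.

```python
def negative_edge_proportion(signed_edges):
    """Fraction of undirected edges whose association weight is negative.

    signed_edges: iterable of (u, v, weight). If an edge is listed more than
    once, the last record is used (an arbitrary choice for this sketch).
    """
    is_negative = {}
    for u, v, weight in signed_edges:
        if u != v:  # ignore self-loops
            is_negative[frozenset((u, v))] = weight < 0
    return sum(is_negative.values()) / len(is_negative) if is_negative else 0.0


# Toy signed edges with hypothetical ESV labels (illustration only).
toy_signed = [
    ("esv1", "esv2", 0.8),
    ("esv2", "esv3", -0.7),
    ("esv3", "esv4", 0.9),
    ("esv1", "esv4", -0.6),
]
toy_proportion = negative_edge_proportion(toy_signed)
```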