The functional evolution of termite gut microbiota
Microbiome volume 10, Article number: 78 (2022)
Termites primarily feed on lignocellulose or soil in association with specific gut microbes. The functioning of the termite gut microbiota is partly understood in a handful of wood-feeding pest species but remains largely unknown in other taxa. We intend to fill this gap and provide a global understanding of the functional evolution of termite gut microbiota.
We sequenced the gut metagenomes of 145 samples representative of the termite diversity. We show that the prokaryotic fraction of the gut microbiota of all termites possesses similar genes for carbohydrate and nitrogen metabolisms, in proportions varying with termite phylogenetic position and diet. The presence of a conserved set of gut prokaryotic genes implies that essential nutritional functions were present in the ancestor of modern termites. Furthermore, the abundance of these genes largely correlated with the host phylogeny. Finally, we found that the adaptation to a diet of soil by some termite lineages was accompanied by a change in the stoichiometry of genes involved in important nutritional functions rather than by the acquisition of new genes and pathways.
Our results reveal that the composition and function of termite gut prokaryotic communities have been remarkably conserved since termites first appeared ~ 150 million years ago. Therefore, the “world’s smallest bioreactor” has been operating as a multipartite symbiosis composed of termites, archaea, bacteria, and cellulolytic flagellates since its inception.
Termites are the oldest social insect lineage, comprising about 3000 described species classified into nine families . Together with Cryptocercus, a genus of sub-social wood-feeding cockroach forming the sister group of termites and from which they diverged ~ 170 Million years ago [13, 81], they form one of the few animal lineages able to metabolize lignocellulose, one of the most abundant biomolecule on Earth . Similar to the abundance of their food, termites are also abundant. This is especially true in tropical and subtropical ecosystems where they are the most important macroscopic decomposers of organic matter [10, 42, 53, 129], moving tons of material per hectare every year, and considerably influencing soil properties and net productivity [44, 67].
All modern termite families feed on wood, except for the Termitidae, sometimes referred to as the higher termites, a monophyletic termite family nested within the lower termites, a group including the remaining eight termite families. Termitidae first appeared ~ 50 Million years ago [13, 27]. They feed on substrates distributed along the wood-soil decomposition gradient [12, 37]. Although Termitidae and other termite families produce their own endogenous cellulases [139, 148], their ability to decompose wood or soil organic matter largely depends on symbiosis with mutualistic gut microbes [23, 149]. The mutualistic gut microbes include bacteria, archaea, and, in the case of lower termites, cellulolytic flagellates [24, 33]. In addition, one of the eight subfamilies of Termitidae, the Macrotermitinae, is associated with the cellulolytic fungus Termitomyces they cultivate inside their nests [33, 116].
Termites and many of their nutritional symbionts are mutually obligate [24, 34]. The cellulolytic flagellates of termites are typically found nowhere else than in termite guts and are efficiently transmitted across host generations [86, 90]. Similarly, many prokaryotes present in termite guts are found nowhere else in nature [16, 52]. Their vertical mode of inheritance is supported by the observations that differences among termite gut prokaryotic and protist communities tend to increase as phylogenetic distances among termite hosts increase [1, 132]. In addition, the diet of the termite host, which broadly correlates with the termite phylogeny , also shapes the termite gut microbial communities [36, 88]. Whether the termite phylogeny is recapitulated by gut microbial functions, as it is recapitulated by the taxonomic composition of microbial communities, remains unknown.
Investigations of termite gut microbe genomes have revealed that, in addition to the production of enzymes involved in lignocellulose digestion, gut microbes have numerous nutritional functions, including nitrogen fixation and recycling abilities that supplement the nitrogen-poor diet of their host [54, 79, 99, 152]. While metagenomics and metatranscriptomics surveys of termite guts have been carried out for an increasingly large number of termite species [51, 80, 84, 140, 147], often with the prospect of harvesting cellulolytic enzymes able to convert plant biomass into biofuel (e.g., [30, 134]), there has been a marked sampling bias towards easy-to-sample wood-feeding termite species, and species with pest status. Far less is known about the function and taxonomy of the gut prokaryotic communities of other termite lineages, such as basal wood-feeding lineages or lineages with soil-feeding habits . Because of this gap in our knowledge, it remains unclear how the taxonomy and function of the gut microbiome have been evolving since termites came to be ~ 150 Million years ago [13, 27]. Similarly, how the acquisition of a diet based on soil has affected the taxonomy and function of gut microbial communities remains an open question. A metagenomics survey based on a comprehensive sampling of termites is required to answer these questions.
In this study, we sequenced whole gut metagenomes of 145 termite samples representative of the phylogenetic and ecological diversity of termites, including many lineages that have remained undocumented. We also sequenced the gut metagenome of one sample of Cryptocercus, the sister group of termites . We used the assembled prokaryotic contigs of this dataset to determine (1) when important gut prokaryotic pathways involved in nutritional functions were acquired by termites; (2) to which extent termite phylogeny is predictive of gut prokaryote taxonomic and functional composition; and (3) the taxonomic and functional changes experienced by gut prokaryote communities following the acquisition of a diet of soil.
Results and discussion
The taxonomic composition of termite gut prokaryotes
We sequenced whole gut metagenomes, including the hindgut containing the bulk of the gut microbiota, of 145 termite species (Table S1, Figure S1). Our sampling included species from the nine termite families and species from the eight subfamilies of Termitidae . Our shotgun sequencing approach generated an average of 72.5 million reads per sample (ranging between 17.3 and 142.5 million reads per sample) that were assembled into an average of 92,237 scaffolds > 1000 bps (ranging between 818 and 364,576 scaffolds), constituting 63.3% of mapped reads. The proportions of prokaryotic reads were on average 18.4% in lower termites and 20.5% in higher termites.
We used 40 marker genes [131, 150] to determine the taxonomy and estimate the abundance of each major bacterial lineage present in the 129 termite gut metagenome assemblies including upward of 10,000 contigs longer than 1000 bps, in total. Shorter contigs were removed from the analyses. We compared the bacterial community composition and abundance inferred from marker gene data and 16S ribosomal RNA (rRNA) gene amplicon sequences of 74 termite gut samples (Figure S2). Our estimates of the bacterial community composition and abundance using the 16S rRNA gene were consistent with previously published estimates using the same marker [16, 36, 88]. These estimates also showed similarities at the phylum level to that inferred from marker gene data. However, the abundance distribution estimated by both approaches showed some disagreements for several families. Notably, Dysgomonadaceae, Ruminococcaceae, Synergistaceae, and Oscillospiraceae occurred at low abundances among the marker genes but were represented by many 16S rRNA gene sequences in most termite species (Table S2). These discrepancies are likely the result of variation in 16S rRNA gene copy numbers [41, 143], which are higher in these lineages or are possibly artifacts generated during 16S rRNA gene amplicon PCR cycles. They might also reflect the incomplete coverage of our metagenomes or, to a certain extent, the differences in the databases used for classification.
In total, we identified 114 bacterial families and family-level lineages represented in the marker genes of more than 5% of our 129 termite gut metagenome assemblies (Table S3). These 114 bacterial family-level lineages belonged to 19 phyla. An additional 193 other bacterial family-level lineages were recorded from the gut of no more than a few termite species and were possibly transient and not strictly associated with termite guts. We calculated the Moran I index on the abundance of these 114 family-level bacterial lineages to test whether bacterial abundance is correlated with termite phylogeny. We found a phylogenetic autocorrelation signal for 59 of the 114 bacterial lineages. This signal remained significant at a 5% false discovery rate (FDR) correction for 27 bacterial lineages, including some of the most abundant bacterial lineages (Fig. 1, Table S4). For example, the wood-fiber-associated Fibrobacteraceae [87, 140] are dominant in the gut of Microcerotermes, Nasutitermitinae, and related termite lineages and are either undetectable or occur at low abundance in the assemblies of other termite lineages. Another example is the Endomicrobiaceae, which comprise flagellate-associated [59, 127] and free-living lineages [60, 89]; they are abundant in lower termites and almost absent in higher termites.
Our dense taxonomic sampling of diverse termite hosts also allowed us to identify bacterial lineages whose association with termites has remained largely unreported. For example, we found that the Holophagaceae, a bacterial family of Acidobacteriota previously reported from the gut of three humus-feeding termite species  and two species of Nasutitermitinae , is widely distributed in Nasutitermitinae, Foraminitermitinae, the Cephalotermes-group, and the Pericapritermes-group (Fig. 1). Altogether, our results demonstrate that termite phylogeny is remarkably predictive of the gut bacterial community composition, at least at the family level, as shown for termite gut protists .
Using the same 40 marker genes and 129 metagenome assemblies used for bacteria, we investigated the diversity of gut-associated Archaea across the termite phylogenetic tree. In total, we identified 16 archaeal families and family-level lineages, including Methanoculleaceae and Methanocorpusculaceae (order Methanomicrobiales), Methanosarcinaceae (order Methanosarcinales), Methanobacteriaceae (order Methanobacteriales), Methanomethylophilaceae (order Methanomassiliicoccales), and UBA233 (class Bathyarchaeia). Seven out of sixteen family-level lineages were present in the gut of more than 5% of termite species. The abundance of Methanosarcinaceae, UBA233, and an unclassified family-level lineage of Bathyarchaeia showed significant autocorrelation signals with the termite phylogenetic tree when no FDR correction was applied (Fig. 1, Table S4). Bathyarchaeia occurred in the clade of Termitidae excluding Macrotermitinae, Sphaerotermitinae, and Foraminitermitinae, confirming previous reports , and Methanosarcinaceae was found in Macrotermitinae, Nasutitermitinae, and Cubitermitinae and related termite lineages (Fig. 1). Archaea represented on average less than 1% of the gut prokaryotes in wood-feeding termite species, while their proportion reached 4.6% in Macrotermitinae and 10.6% in soil-feeding termite species, and was exceptionally high in the soil-feeding Mimeutermes in which 59.8% of the marker genes were assigned to Bathyarchaeia. Our results are in line with the higher archaeal-to-bacterial ratios reported in soil-feeding termites compared to their wood-feeding counterparts, reflecting the higher methane emission rates of soil-feeding termites [25, 26].
The carbohydrate-active enzymes of termite gut prokaryotes
We investigated the evolution of prokaryotic carbohydrate-active enzymes (hereafter: CAZymes) using the same 129 gut metagenome assemblies used to investigate gut prokaryotic composition. The de novo assemblies of these 129 gut metagenomes contained an average of 127,159 prokaryotic open reading frames (ORF). We identified ORFs coding for CAZymes using Hidden Markov model searches against the dbCAN2 database . As a first step, we investigated the evolution of enzymes derived from prokaryotes with no consideration of their taxonomic origin. In total, we found 346 CAZyme categories in 129 gut metagenomes that consisted of 205 glycoside hydrolases (GHs), 57 glycoside transferases (GTs), 18 enzymes with carbohydrate-binding modules (CBMs), 16 carbohydrate esterases (CEs), 41 polysaccharide lyases (PLs), and nine redox enzymes with auxiliary activities (AAs) (Table S5). We did not find any CAZymes in only one gut metagenome (that of Araujotermes parvellus, at e-value cut-off below e−30, the smallest of the 129 gut metagenomes examined, which contained only 867 prokaryotic contigs). For the other 128 gut metagenomes, the number of CAZyme categories varied between 5 and 139 per gut metagenome. Five GH families, GH2, GH3, GH10, GH31, and GH77, were found in more than 85% of the termite species. 14 GHs, seven of which had putative lignocellulolytic activity, were found in 75 to 85% of the termite species. Therefore, glycoside hydrolases previously found to be abundant in the gut of particular termite species (e.g., [30, 147]) are universally part of the gut enzymatic repertoire of termites.
We calculated the Moran I index on the abundance of 211 CAZymes, including 146 CAZyme families and 65 subfamilies, present in more than 10% of termite species, and found an autocorrelation signal with the termite phylogenetic tree for 107 CAZymes. The autocorrelation signal remained significant after FDR correction for 77 CAZymes (Fig. 2, Table S6). Therefore, as for gut prokaryotic composition, termite phylogeny is predictive of the CAZyme repertoire present in termite guts.
Two factors that potentially affect the CAZyme repertoire of termite gut prokaryotes are diet and co-occurring non-prokaryotic cellulolytic symbiotic partners. To investigate the influence of these two factors on the CAZyme repertoire of termite gut prokaryotes, we distinguished four termite groups differing in their diets and association with co-occurring non-prokaryotic cellulolytic symbiotic partners. These four groups were: soil-feeding Termitidae (SF) and wood-feeding Termitidae excluding Macrotermitinae (WF), which host no other symbionts than gut prokaryotes , the fungus-cultivating Macrotermitinae (FC), which feed on wood or plant litter and cultivate cellulolytic fungi of the genus Termitomyces , and lower termites (LT), which feed on wood and host cellulolytic flagellates in their gut . Overall, the abundance of prokaryotic CAZymes was the highest in WF and the lowest in SF, while LT and FC fell between these two extremes (Table S7). These results are consistent with the scarcity of lignocellulose in the diet of SF, which predominantly feed on the nitrogen-rich fraction of the soil, including microbial biomass and organic residues associated with clay particles [65, 66, 92, 93]. The intermediate abundance of prokaryotic CAZymes in FC and LT reflects their dependence on the diverse cellulolytic enzymes produced by Termitomyces fungi  and gut flagellates [94, 153], respectively.
Task partitioning between gut prokaryotes and other symbionts—in which both partners participate in different steps of wood digestion and provide different sets of CAZyme—could be revealed from the gut metagenomes of LT and FC. Principal component analysis revealed that the prokaryotic CAZyme repertoire differs considerably among SF, LT, FC, and WF (Fig. 3A). To characterize more accurately the contribution of termite gut prokaryotes to wood digestion, whenever possible, we identified the substrate of each 211 CAZymes (including 146 families and 65 subfamilies) present in more than 10% of termite species. We individually compared the abundance of these 211 CAZymes using phylogenetic ANOVA. We found that 178 comparisons were significantly different, and 177 comparisons remained significant after FDR corrections (Fig. 3A, Table S7). Notably, we found that the combined seven GHs exclusively identified as cellulases were significantly depleted in LT as compared to other termite groups and were significantly depleted in FC and SF as compared to WF (Fig. 2, Table S7). A similar pattern was found for the combined 29 GHs exclusively identified as hemicellulases, which were significantly more abundant in WF than in other termite groups (Fig. 3A, Table S7). Therefore, the gut metagenomes of LT and FC appear to be depleted in prokaryotic GHs targeting cellulose as compared to WF, possibly reflecting task partitioning between termite gut prokaryotes and eukaryotic symbionts such as cellulolytic flagellates in LT and Termitomyces in FC. Task partitioning between gut prokaryotes and Termitomyces in FC was previously suggested for Macrotermes natalensis , with gut symbionts primarily participating in the final digestion of oligosaccharides and Termitomyces performing the breakdown of complex carbohydrates. In support of this hypothesis, several GHs, such as GH8, GH26, GH45, GH5_2, and GH53, largely depleted from the gut metagenomes of LT, were highly expressed by the gut cellulolytic flagellates of C. formosanus  and were abundant in the gut metagenomes of WF. However, several GHs encoded by gut prokaryotes are also highly expressed by the gut cellulolytic flagellates of C. formosanus (e.g., GH13_8, GH36, GH3, GH92, GH133) . Therefore, the extent of the complementarity between the CAZyme repertoires of gut flagellates and prokaryotes is unclear and requires further investigation.
We next investigated the taxonomic origin of the prokaryotic CAZymes found in the same 129 whole gut metagenomes. We focused on the 19 GHs found in more than 10% of termite species and embedded in contigs longer than 5000 bps, allowing taxonomic annotation based on several genes. Contigs including genes with discordant taxonomic annotations potentially indicate horizontal gene transfers, as is common among bacteria , and were removed. We found that Bacteroidota were a significant source of GH2, GH9, GH10, GH20, GH28, GH29, GH30, GH31, and GH130 in FC and LT, while, as previously described , they rarely encoded these GHs in non-Macrotermitinae Termitidae (WF and SF) (Fig. 4, Table S8). In contrast, Fibrobacteres, which were very rare in LT, were a significant source of GH2, GH3, GH8, GH9, GH10, GH11, GH18, GH26, GH30, GH43, GH94, and GH130 in WF. Two other bacterial phyla, Spirochaetota and Firmicutes A, encoded most of the investigated GHs and were important contributors of GHs in WF (Fig. 4, Table S8). Therefore, the primary contributors of GHs are distinct between lower and higher termites. These results are consistent with previous reports indicating a possible involvement of the ectosymbiotic Bacteroidota of some oxymonadid flagellates in cellulose and hemicellulose hydrolysis [141, 156] in lower termites, while Fibrobacteres, Spirochaeota, and/or Firmicutes are major agents in cellulose and hemicellulose degradation in higher termites [30, 51, 84, 140, 147]. Our comprehensive analyses strongly indicate that the loss of cellulolytic flagellates in the ancestor of higher termites was accompanied by a major reworking of the cellulolytic bacterial communities, from Bacteroidota in LT to Fibrobacterota and Spirochaeota in WF and to Firmicutes in SF.
CAZymes are often organized as polysaccharide utilization loci (PULs) that target complex polysaccharides . To search for PULs in our metagenomes, we reconstructed metagenome-assembled genomes (MAGs) by grouping contigs with similarities in sequence composition and depth of coverage. In total, we obtained 654 prokaryotic MAGs that ranged in completeness from 30 to 100% with < 10% contamination for lineage-specific marker genes. We kept low-quality MAGs, with completeness between 30 to 50%, as several such MAGs possessed complete pathways of interest (Figure S3, Table S9). The 654 MAGs included members of 16 phyla of bacteria and four phyla of archaea and included representatives of all major prokaryote phyla known to be present in the termite gut. We found 128 PULs distributed across 130 MAGs, including 31 MAGs of Bacteroidota, 71 MAGs of Firmicutes, 13 MAGs of Proteobacteria, 12 MAGs of Spirochaetota, two MAGs of Actinobacteria, and one MAG of Verrucomicrobiota (Table S10). Sixteen PULs, found in 10 MAGs, had all the PUL components and mainly targeted lignocellulose components such as cellulose and xylan, and saccharides such as melibiose, alginate, and lactose. A total of 107 PULs found in 74 MAGs encoded enzymes for more than one substrate but did not have all the PUL components, possibly reflecting the incompleteness of our MAGs or missing components nonessential for their activity, as experimentally demonstrated in the xylan utilization system (Xus) of a Bacteroidota associated with Pseudacanthotermes . Altogether, our data provide an overview of the PUL distribution in termite gut microbes.
Reductive acetogenesis in the termite gut
The fermentation of wood fibers by the termite gut microbiota produces mainly acetate, which is used by the termite host, and H2 and CO2 [24, 58]. Most of the H2 is used to produce additional acetate by the reduction of CO2 [19, 20, 106]. We focused on the genes of seven enzymes of the Wood-Ljungdahl pathway (WLP) of reductive acetogenesis that are present in all acetogens from termite guts identified to date, namely formate dehydrogenase H (fdhF), formate-tetrahydrofolate ligase (fhs), methylenetetrahydrofolate dehydrogenase (folD), 5,10-methylenetetrahydrofolate reductase (metF), acetyl-CoA synthase (acsABCDE), phosphotransacetylase (pta), and acetate kinase (ack), which are essential to operate the bacterial WLP . We compared the relative abundance of these markers across the 129 whole gut metagenomes used for previous analyses and found a significant phylogenetic autocorrelation signal with the termite phylogenetic tree for five of the seven enzymes, two of which remain significant after FDR correction (fdhF and acsABCDE) (Fig. 5, Table S11). The simultaneous presence of fdhF and acsABCDE is a strong predictor for the distribution of reductive acetogenesis across the termite phylogenetic tree. In the absence of fdhF and acsABCDE, the presence of the five other enzymes, fhs, folD, metF, pta, and ack, is less predictive of reductive acetogenesis because they are also involved in the metabolism of amino acids and purins, and other fermentative pathways.
The seven enzymes encoded by all acetogens significantly differed in relative abundance among the four termite groups. They were generally more abundant in LT and WF than in FC and SF (Fig. 3B, Table S11). These analyses are in agreement with previous studies that measured the potential rates of acetogenesis in a smaller set of termite species and corroborate the hypothesis that reductive acetogenesis is mostly associated with a diet of wood and is less important in fungus-cultivating Macrotermitinae and soil-feeding lineages [19, 137].
To determine the identity of the acetogens, we searched each MAG for the genes of the seven enzymes associated with reductive acetogenesis. We found 44 MAGs associated with six termite families and Cryptocercus that encoded at least five of the seven enzymes, but none of these MAGs contained the complete set of genes (Table S12, Fig. 6A). In addition to formate dehydrogenase H (fdhF), we also searched for the genes encoding [FeFe] hydrogenase Group A4 (HydA) and the iron-sulfur cluster proteins (HycB3, HycB4), the other subunits of the hydrogen-dependent CO2 reductase (HDCR) complex catalyzing the first step of CO2 reduction to formate [61, 119]. Two MAGs lacked fdhF but contained all other genes of the WLP and the HDCR complex (Table S12, Fig. 6A). These MAGs belonged to the Desulfobacterota family Adiutricaceae, which comprises the putatively acetogenic Candidatus Adiutrix intracellularis, a flagellate endosymbiont from the archotermopsid Zootermopsis, and numerous uncharacterized representatives from other lower and higher termites . Like Ca. Adiutrix intracellularis, none of the MAGs classified as Adiutricaceae encoded a pathway for dissimilatory sulfate reduction. They were found not only in the rhinotermitid Dolichorhinotermes but also in the higher termite Microcerotermes, indicating that the putatively free-living members of Adiutricaceae from higher termites (which lack gut flagellates) are acetogenic.
Because none of the other MAGs encoded a complete WLP, we could not unambiguously attribute acetogenic status to any other prokaryote lineage. Considering the high rates of reductive acetogenesis in many lower and higher termites, particularly the wood-feeding species , this may be explained either by the incompleteness of our MAGs or the failure to assemble any genomes of the populations responsible for the acetogenic activity. Based on the low free energy yields of both reductive acetogenesis and methanogenesis, it has been speculated that the proportion of (hydrogenotrophic) acetogens among the prokaryotic community in termite hindguts may be as low as that of (hydrogenotrophic) methanogens . The problem of genome assembly from low abundance populations would be exacerbated by a high species diversity among members of a particular metabolic guild. Alternatively, the absence of a complete reductive acetogenesis pathway among our MAGs may be genuine. This could be the case among the MAGs assigned to the family Treponemataceae B. Although the first isolate of this lineage is a homoacetogen with a complete WLP , none of the other species isolated to date are acetogenic . With the exception of Treponema primitia , Candidatus Treponema intracellularis , and Candidatus Adiutrix , the identity of the populations responsible for reductive acetogenesis in termite guts, including the putatively acetogenic Candidatus Termitimicrobium (Bathyarcheia ), remains open to speculation.
Methanogenesis in the termite gut
The methanogenic archaea present in the gut of termites consume a large fraction of H2 and are responsible for 3% of global methane emissions [25, 26]. We searched the 129 gut metagenomes used in earlier analyses for genes part of methanogenesis pathways. Because of the low abundance of Archaea in termite guts [26, 82], the abundance of genes involved in methanogenesis was often near or below our detection threshold. As a consequence, we were unable to analyze each gene independently. Instead, we calculated the Moran’s I index using the abundance of genes encoding the methyl-coenzyme M reductase complex (mcrABG), which catalyzes the final step of methanogenesis . We found no autocorrelation signal with the termite phylogenetic tree (Fig. 5, Table S11).
We compared the abundance of mcrABG among the four termite groups and found no significant differences (Fig. 3B, Table S11). However, this lack of significance probably reflects the low abundance of archaeal reads in our assemblies, rather than an actual uniformity of methanogenesis pathways across termites, as methane emission rates are known to be diet-related and particularly high in species feeding on soil (e.g., [8, 9, 19, 130]).
We searched our gut metagenomes for operons encoding mcrABG. We found 14 operons belonging to four methanogenic archaeal orders, Methanomassiliicoccales, Methanobacteriales, Methanomicrobiales, and Methanosarcinales, derived from the gut metagenomes of 14 termite species, including four of the eight families of LT and five of the nine subfamilies of Termitidae (Table S13). All mcrABG operons of LT were classified as Methanobacteriales, which is in agreement with previous reports on the prevalence of Methanobacteriales in LT . An exception was found in the gut metagenome of Porotermes quadricollis, which yielded a mcrABG operon from Methanomethylophilaceae (order Methanomassiliicoccales). This is unusual because members of this order are frequently encountered in higher termites and millipedes  but have been detected only once in the lower termite Reticulitermes speratus .
Next, we analyzed the methanogenic capacities of 26 MAGs of Archaea reconstructed from the gut metagenomes of 23 termite species from four termite families and the cockroach Cryptocercus. Only 13 MAGs belonging to Methanomicrobiales, Methanobacteriales, Methanosarcinales, and Methanomassiliicoccales encoded the mcrABG complex, indicating that the assemblies are incomplete (Fig. 6B, Table S14). Five of these 13 MAGs possessed complete pathways for methylotrophic methanogenesis and one MAG possessed complete pathways for hydrogenotrophic methanogenesis (Fig. 6B). The five MAGs showing genomic evidence of methylotrophic methanogenesis included one MAG of Methanosarcinales (genus Methanimicrococcus) and four MAGs of Methanomassiliicoccales, including three MAGs classified to genus Methanoplasma and one MAG classified to family Methanomethylophilaceae. Only two MAGs of Methanoplasma encoded a methanol:coenzyme M methyltransferase (mtaABC) complex, which is required for growth on methanol and typical for all members of this lineage , and one of the Methanosarcinales MAGs and one Methanoplasma MAG encoded a complete heterodisulfide reductase complex (HdrA2B2C2/mvhADG) present in most methanogens [29, 136], underscoring the incompleteness of the MAGs. The same was true for hydrogenotrophic methanogenesis, for which only one MAG belonging to Methanobacteriaceae (genus Methanobrevibacter C) possessed most of the genes required for the reduction of CO2 to methane, including a heterodisulfide reductase (HdrABC/mvhADG) complex, an iron-sulfur flavoprotein along with a F420-independent hydrogenase (Fdh), and a F420 reducing hydrogenase (FrhABC) (Fig. 6B, Table S14). The absence of aceticlastic methanogens is in agreement with previous reports [25, 26]. Overall, our results highlight the diversity of methanogens found in termite guts and the diversity of the pathways they use.
Sulfate-reducing bacteria are potential H2-consumers in the gut of termites [18, 39, 73] (Fig. 5). However, sulfate concentration is low in the termite gut, as is H2 consumption by sulfate-reducing bacteria [23, 39]. We found all the genes of the dissimilatory sulfate reduction pathway, namely the two subunits of adenylylsulfate reductase (aprA and aprB), sulfate adenylyltransferase (sat), and dissimilatory sulfite reductase (dsrAB), in six out of eight lower termite families, all the higher termite subfamilies, and Cryptocercus. The abundance of aprAB and sat was significantly correlated with the termite phylogenetic tree, and the correlation remained significant after FDR correction for sat (Fig. 5, Table S11). Notably, the presence of aprAB and sat is predictive of the dissimilatory sulfate reduction pathway only in the presence of dsrAB, because they are also involved in sulfate assimilation.
Comparisons of the four termite groups showed that the abundance of aprAB was significantly higher in WF than in SF and the abundance of sat was significantly higher in LT than WF and SF (Fig. 3B, Table S11). While sulfate reducers have been isolated from the guts of LT, FC, and SF [18, 73], we found metagenomic evidence that sulfate reduction is also prevalent in WF.
Next, we analyzed the sulfate-reducing capabilities of our 654 MAGs and found a complete pathway for dissimilatory sulfate reduction in four MAGs (Fig. 6C, Table S15). Three of these MAGs, found in the termites Parrhinotermes, Reticulitermes, and Tumulitermes, were assigned to Desulfovibrionaceae (Desulfobacterota), which are common in the termite gut and generate energy via sulfate respiration [74, 117]. Of note, the fourth MAG, retrieved from the gut metagenome of the apicotermitine Heimitermes laticeps, belonged to the Proteobacteria family Burkholderiaceae, a bacterial family that was, prior to this study, largely unreported from termite guts, and that is abundant in Apicotermitinae and in the termite clade that includes the Cubitermitinae, the Pericapritermes-group, and the Termes-group. The evidence for dissimilatory sulfate reduction in Burkholderiaceae termite guts suggests that the capacity for sulfate respiration is more widely distributed than expected.
Nitrogen recycling by termite gut prokaryotes
Because the content of nitrogen in wood is low, termites have evolved mechanisms of nitrogen conservation. The termite gut microbiota contributes to the nitrogen metabolism of its host by recycling nitrogen [22, 56]. Like most insects, termites convert waste products from nitrogen metabolism into uric acid, but, unlike other insects, the gut prokaryotes of termites degrade uric acid into ammonia, which is subsequently assimilated by the gut microbiota . We searched the 129 metagenomes used for previous analyses and found only a few genes possibly involved in uric acid degradation, including 11 aegA (a putative oxidoreductase suspected to be involved in uric acid degradation by Enterobacteriaceae ) in six termite species. Since the uricolytic prokaryotes isolated from termite guts are strict anaerobes [107, 108, 138], it is likely that they use alternative, so far unknown, pathways. Termite tissues reportedly lack uricase activity , but when we examined the transcriptomes of 53 termite species generated by Bucek et al. , we found evidence for the expression of a gene encoding urate oxidase in 20 termite species belonging to four termite families (Figure S4). This indicates that termites should be able to carry out the first step of uric acid degradation. However, the extent of the contribution of the termite host to uricolysis and the identity of the uricolytic prokaryotes and their catabolic pathways remain unknown.
The metagenomes of all termite families included numerous prokaryotic genes from other pathways involved in the production of ammonia (Fig. 5, Table S11), including ureases (ureABC), which degrade urea into ammonia [55, 100], and some of the genes of the dissimilatory nitrate reduction pathway (narGHI, napAB, nrfAB), which convert nitrate into ammonia. Among those, the abundance of ureABC genes significantly correlated with the termite phylogenetic tree after FDR correction (Fig. 5, Table S11). We also found in the metagenomes of all termite families genes from pathways involved in amino acid biosynthesis from ammonia, including glutamine synthetase (glnA) and glutamate synthase (gltBD), the genes involved in the synthesis of glutamate from ammonia, and carbamate kinase (arcC), ornithine carbamoyltransferase (argF), argininosuccinate synthase (argG), and argininosuccinate lyase (argH), the genes involved in arginine biosynthesis from ammonia . The abundance of gltBD correlated with the termite phylogenetic tree after FDR correction (Fig. 5, Table S11). Therefore, the termite phylogeny is a good predictor of the enteric abundance of some of the prokaryotic genes involved in ammonia metabolism in termites.
We compared the four termite groups using the relative abundance of the nitrogen-recycling genes and found that the abundance of ureABC differed among termite groups, with the gut metagenomes of LT and WF significantly enriched in ureABC as compared to those of SF and FC (Fig. 3B, Table S11). In contrast, the abundance of some of the genes of the dissimilatory nitrate reduction pathway, such as napAB and narGHI, was significantly reduced in the gut metagenomes of WF compared to SF and FC (Fig. 3B, Table S11). This suggests that the high rates of nitrate ammonification previously found in two soil-feeding species  are a characteristic that all soil-feeding termites share with fungus-cultivating termites. We also found that gltBD was significantly enriched in LT as compared to other termite groups, while argFGH was significantly enriched in LT and WF as compared to SF (Fig. 3B, Table S11). The low abundance of genes involved in ammonia assimilation in soil-feeding termites is likely linked to their diet, which includes soil peptidic residues [65, 66].
Next, we searched our 654 MAGs to determine the taxonomic identity of the prokaryotes involved in nitrogen recycling. Six MAGs possessed the three ureases ureABC that convert urea into ammonia, and 15 MAGs included a complete dissimilatory nitrate reduction pathway that converts nitrate into ammonia. All these MAGs belonged to diverse lineages of Proteobacteria and Campylobacterota (order Campylobacterales), except for one MAG of Firmicutes (genus Bacillus) found in Foraminitermes rhinoceros and endowed with ureABC, narGHI, and nirBD (Fig. 7A, Table S16). We also found numerous MAGs capable of ammonia assimilation into glutamate and arginine, indicating that ammonia is an important nitrogen source for many termite gut prokaryotes. 91 MAGs possessed glnA and gltBD for glutamate biosynthesis from ammonia, while 26 MAGs possessed the four genes arcC, argF, argG, and argH for arginine biosynthesis from ammonia via the urea cycle, including 12 MAGs that also contained the glutamate biosynthesis pathway (Fig. 7A, Table S16). 66 MAGs encoding glutamate biosynthesis genes, and 15 MAGs with arginine biosynthesis genes, also possessed the ammonium transporter Amt. These MAGs belonged to ten phyla, including many representatives of the lineages abundant in the gut of termites (Fig. 7A, Table S16). Therefore, a great many bacterial lineages contribute to the nitrogen metabolism of their termite hosts.
Nitrogen fixation by termite gut prokaryotes
Many species of wood-feeding termites host dinitrogen-fixing prokaryotes in their gut, which compensate for the low nitrogen content of wood . They fix nitrogen with either the molybdenum-dependent (Nif), vanadium-dependent (Vnf), or iron-only alternative nitrogenases (Anf) [63, 97, 152]. We found gene homologs for the structural subunits of these nitrogenases (collectively referred to as nifDHK) in metagenomes from all termite families and in Cryptocercus (Fig. 5). Their abundance significantly correlated with the termite phylogeny after FDR correction (Fig. 5, Table S11), as was the case for several other pathways involved in nitrogen economy. There were significant differences among termite groups, with the nitrogenase reads in the gut metagenomes of LT and WF being 24.4-fold more abundant than in SF and 20.2-fold more abundant than in FC (Figs. 3B, 5, Table S11). This is in line with the higher rate of N2 fixation measured in LT and WF than in SF and FC , and reflects the high amount of nitrogen present in soil and fungi, making the energy-demanding process of N2 fixation unnecessary [23, 56].
To identify the diazotrophs present in the gut of termites, we taxonomically classified contigs longer than 5000 bps that contained the six genes present in all diazotrophs, nifDHK (which encode nitrogenase), and nifB, nifE, nifN, which encode proteins involved in nitrogenase biosynthesis . We identified 15 contigs matching these criteria in the gut metagenomes of 12 termite species, representing five of the nine termite families (Table S17). These contigs were assigned to diverse prokaryote lineages, including nine contigs of diverse Bacteroidota, three contigs of the Spirochaetota order Treponematales, two contigs of the Proteobacteria family Enterobacteriaceae, and one contig of the archaeal genus Methanobrevibacter. We carried out the same analyses on our MAGs and found 18 MAGs that contained a nifHDKBEN cluster, including seven MAGs that belonged to phyla not represented in the contigs >5000 bps. Among these seven MAGs, there were four MAGs of the Actinobacteriota family UBA8131, one MAG of the Planctomycetota family Thermoguttaceae, one MAG of the Verrucomicrobiota family Chthoniobacteraceae, and one MAG of the Firmicutes C order Acidaminococcales (Fig. 7A, Table S16). Therefore, the taxonomy of diazotrophs found in our termite species set corroborates previous evidence that termites host diverse communities of diazotrophs in their guts [35, 98, 152].
We next investigated the taxonomic distribution of diazotrophs across termites. We focused on contigs longer than 5000 bps that included genes with concordant taxonomic annotation and contained a nifHDK operon (Fig. 7B, Table S18). In lower termites, the dominant diazotroph was an undescribed Bacteroidota allied to an ectosymbiont of the Cryptocercus gut flagellate Barbulanympha . This undescribed Bacteroidota was found in three of the eight families of LT. It was also largely absent from the gut metagenomes of Coptotermes and Heterotermes, which harbor the flagellate endosymbiont Candidatus Azobacteroides as the main diazotroph . The diazotrophs of Termitidae belonged to various phyla. Notably, we found the N2-fixing Candidatus Azobacteroides in the nasutitermitine Coatitermes (which lacks gut flagellates) and a N2-fixing Treponematales in Mastotermes, highlighting that the dominant lineages of diazotrophs in particular termite lineages are also harbored at a low abundance by unrelated species of termites (Fig. 7B, Table S18). Therefore, our results indicate that the phylogenetic position of termite species determined, at least partly, the taxonomy of their dominant diazotrophs.
The metagenomics and metatranscriptomics surveys of termite guts carried out so far targeted a limited number of termite species (e.g., [51, 80, 84, 140, 147]), and thus did not permit an investigation of how the gut microbiome of these social roaches has been evolving in term of function and composition since termite origin, ~150 Million years ago. To address this issue and provide a global picture of the taxonomic and functional composition of the termite gut microbiome, we generated gut metagenomes for a comprehensive set of 145 termite species. The 129 most complete metagenomes were used to investigate the functional evolution of termite gut microbiota, revealing that (1) gut prokaryotic genes involved in the main nutritional functions are generally present across termites, suggesting these genes were already harbored by the common ancestor of modern termites; (2) the termite phylogenetic tree is largely predictive of the gut bacterial community composition and the nutritional function they exert; and (3) the acquisition of a diet of soil was accompanied by a change in the stoichiometry of genes and metabolic pathways involved in important nutritional functions rather than by the acquisition of new genes and pathways.
The analyses of the gut metagenomes of one Cryptocercus species and 145 termite species indicated that prokaryotic CAZymes, genes of the reductive acetogenesis, sulfur reduction, and methanogenesis pathways, and genes involved in nitrogen fixation and recycling are present across the nine termite families. The nutritional functions previously known to be performed by the gut prokaryotic symbionts of particular termite species (e.g., [30, 147]) are probably performed in the gut of all termites. These results strongly suggest that important nutritional functions were already harbored by the common ancestor of modern termites. Important nutritional functions are performed in the gut of modern termites by multiple bacterial and archaeal lineages, some of which may have been acquired, together with the charismatic gut cellulolytic flagellates, by the ancestor of termites . In support of this hypothesis, many termite gut bacteria phylotypes form monophyletic groups present in the gut of various termite families but absent from other environments . We postulate that, as the cockroach-like ancestor of termites evolved wood-feeding, it recruited facultative gut microbes able to degrade wood and participate in the nitrogen economy as essential gut symbionts.
Our analyses indicate that the phylogenetic position of termite species is partly predictive of the functions of gut bacterial communities. This is best illustrated by CAZymes, whose abundance often correlated with the termite phylogenetic tree. However, correlation with the termite phylogenetic tree was not found for some genes, such as the mcrABG genes of the methanogenesis pathway, the genes of the sulfate reduction pathway, and the genes of the dissimilatory nitrate reduction pathway. Whether this lack of correlation is genuine or reflects the insufficient depth of sequencing is unclear and requires further study. In any case, our results indicate that the correlation found between the phylogenetic tree of termites and their gut bacterial and protist communities [1, 132] is also found for some gut microbial functions.
The comparison of four termite groups, soil-feeding Termitidae (SF), fungus-cultivating Macrotermitinae (FC), non-Macrotermitinae wood-feeding Termitidae (WF), and lower termites (LT), reveals that genes and metabolic pathways important to termites are present in all termite species, but their abundances vary among groups. Notably, the gut metagenomes of SF possessed on average fewer CAZymes, nitrogenases, reductive acetogenesis, and sulfate-reducing genes than the gut metagenomes of other termite groups. Therefore, as pointed out by Marynowska et al. , the gut prokaryote communities of SF retain important carbohydrate metabolism capabilities. Nevertheless, our dataset clearly indicates that these abilities are much reduced in soil-feeders compared to wood-feeders. Overall, our results support the idea that the acquisition of soil-feeding was accompanied by changes in the abundance of the gut prokaryote metabolic pathways important to termite nutrition.
We collected a total of 145 termite samples and one sample of the cockroach Cryptocercus kyebangensis (Table S1, Figure S1). These samples were representative of the global termite diversity. All samples were preserved in RNA-later® and stored at – 80 °C until DNA extraction.
DNA extraction and sequencing
Genomic DNA extraction was performed on the whole guts of five workers using the NucleoSpin Soil kit (Macherey-Nagel) according to the manufacturer’s protocol. Library preparation was performed using the KAPA Hyperplus Kit, which is based on a unique dual tag indexing approach that minimizes the effects of index hopping. Libraries were either PE250-sequenced on the Illumina HiSeq2500 platform or PE150-sequenced on the Illumina HiSeq4000 platform (Table S1).
Data filtering and assembly of metagenomic reads
Raw reads were filtered based on their quality. Reads with average Phred quality score below 30 were removed using Trimmomatic v 0.33 . The “SLIDINGWINDOW” was set to “4:30” to trim low-quality bases (Phred quality score below 30) from the 3′ end of the reads. We removed the 16 base pairs at the 5′ end of each read using the “HEADCROP” option because we observed over-represented k-mers in this region of the reads. Reads shorter than 50 bps were removed.
The quality-controlled reads were assembled into contigs using SPAdes v 3.11.1  with the “meta” option and k-mer sizes of 21, 31, 41, 51, 71. The assembly quality was checked using the “metaquast” option of QUAST v 3.1 (Quality Assessment for Genome Assemblies) based on weighted median contig size (N50)  and percent of reads mapped to the contigs [76, 101]. Only the reads mapped to prokaryotic contigs were examined in this study (see the “Taxonomic annotation” and “Functional annotation” sections below). In total, we assembled 145 termite gut metagenomes and performed downstream analyses on the 129 metagenomes including upward of 10,000 contigs longer than 1000 bps. The 16 metagenomes with less than 10,000 contigs longer than 1000 bps were exclusively used to reconstruct metagenome-assembled genomes (MAGs).
Termite phylogenetic tree reconstruction
We built a phylogenetic tree of termites using mitochondrial genomes retrieved from metagenome assemblies. Mitochondrial contigs derived from termites were identified using BLAST search (sequence length > 5000 and percent identity > 90)  against previously published whole mitochondrial genomes of termites [13,14,15, 146]. Mitochondrial genomes were complete or near-complete in most cases. Each contig derived from mitochondrial genomes was annotated using the MITOS webserver . The 13 protein-coding genes, two rRNA genes, and 22 transfer RNA (tRNA) genes were aligned with MAFFT v 7.305  using default settings. The alignments were concatenated, and the third codon position of protein-coding genes was removed. The dataset was partitioned into four subsets: one for the first codon position of protein-coding genes, one for the second codon position of protein-coding genes, one for the two rRNA genes, and one for the 22 tRNA genes. A Bayesian phylogenetic tree was generated using BEAST v 2.4.8 . We used an uncorrelated relaxed lognormal clock model , and a birth-death speciation process as a tree prior . The molecular clock was calibrated using nine fossil calibrations used by Bucek et al.  (Table S19). The fossil calibrations were implemented as exponential priors on node times. Because transcriptome-based phylogenies unambiguously support the monophyly of Sphaerotermitinae and Macrotermitinae  (unlike mitochondrial genome-based phylogenies ;), we constrained Sphaerotermitinae + Macrotermitinae to be monophyletic. Similarly, we constrained non-Stylotermitidae Neoisoptera to form a monophyletic group. The MCMC chain was sampled every 1000 steps over 0.4 billion generations. The convergence of the chain was assessed using Tracer v 1.7.1 , and the initial 10% was discarded. We carried out two replicate MCMC runs to ensure convergence of the chain.
Reconstruction of metagenome-assembled genomes
We reconstructed metagenome-assembled genomes (MAGs) from metagenomes contigs using CONCOCT v 0.4.0  implemented in the metawrap software v 0.9  with default parameters. MAG quality checking, based on 43 single-copy marker genes (Table S9), was performed with CheckM v 1.0.11 . High-quality MAGs, medium-quality MAGs, and low-quality MAGs with upward of 30% completeness and downward of 10% contamination were retained (Table S9) . We retained low-quality MAGs that were at least 30% complete because, in some cases, they were endowed with complete pathways. Despite having fewer single-copy marker genes, 65.35% of these MAGs possessed more than ten tRNA, and 17.54% had at least one of the three rRNA genes. All MAGs that did not meet these criteria were discarded. In addition, we discarded MAGs with obvious mismatches among marker genes. To identify these MAGs, we built Maximum Likelihood phylogenetic trees for all 43 single-copy marker genes with FastTree v 2.1.11 . MAGs that fall in different phyla for different marker genes were considered having obvious mismatches and were discarded. The rRNA genes were extracted using METAXA2 software , tRNA genes were predicted via tRNAscan-SE tool , and MAG coverage was calculated with the “metawrap quant_bins” command of the metawrap software .
The annotation of genomic features of bacterial and archaeal contigs and MAGs was carried out with Prokka v 1.14 . This step allowed the identification of coding sequences (CDS), rRNAs, and tRNAs, which were used in downstream analyses. To identify the taxonomy of the metagenome contigs, we taxonomically annotated single-copy marker genes and other protein-coding genes in contigs longer than 1000 bps. Forty single-copy marker genes were extracted using mOTU software ver1 [131, 150]. Single-copy marker genes were taxonomically annotated using DIAMOND BLASTp  with e-value ≤ 1e−24 and output format 102, which uses the lowest common ancestor algorithm for annotation. Other protein-coding genes were annotated using the same settings as marker genes but with DIAMOND BLASTx algorithm. Both annotations were performed using the GTDB database ver 95 as a reference . Taxonomic annotation of MAGs was based on bacterial and archaeal reference trees using GTDB-Tk v1.3.0 based on GTDB ver 95 .
We used the genomic DNA extracted from whole termite guts to produce PCR amplicon sequences of the V4 region of the 16S rRNA gene. PCR reactions were carried out using the primer pairs 515F (XXXXXGTGTGYCAGCMGCCGCGGTAA, ) and 806R (XXXXXXXXCCGGACTACNVGGGTWTCTAAT, ). All pairs of primers were endowed with unique dual tag indexes (8X overhang on the forward primer and 5X overhang on the reverse primer) to minimize the effects of index hopping between libraries. We conducted PCR amplification using Takara Tks Gflex DNA Polymerase with the following conditions: initial denaturation (3 min at 94 °C), 30 cycles of amplification (45 s at 94 °C, 60 s at 50 °C, and 90 s at 72 °C), and a terminal extension (10 min at 72 °C). All PCR reactions were scaled down using one-half of the reagents recommended in the manufacturer’s protocol. Prepared libraries were mixed in equimolar concentration and paired-end-sequenced on the Illumina MiSeq platform. The analysis of the 16S rRNA gene amplicon sequences was performed with mothur v1.44.1  following the standard procedure for Illumina data analysis described by Kozich et al. . After removing low-quality reads and chimera, sequences were clustered into operational taxonomic units (OTUs) at a sequence similarity level of 97% using VSEARCH . Sequences were classified using the naïve Bayesian classifier  implemented in mothur and the SILVA reference database release 138 . The abundance of every family inferred from 16S rRNA gene amplicon data and the 40 marker genes annotated from metagenomic data was then compared. In total, we found 143 prokaryote families and family-level lineages in common across both datasets.
We carried out functional annotation of the CDSs identified with Prokka v.1.14.5  using the “metagenome” option for the annotation of metagenomes and MAGs. The “metagenome” option takes into account the fragmented nature of metagenomic data. We only performed functional annotation on contigs taxonomically annotated as bacteria or archaea. We used the CAZy database  as a reference to identify CDSs with carbohydrate metabolizing properties. Protein sequences were searched against a set of profile Hidden Markov models (HMM) representing CAZy domains deposited in the dbCAN database release 7 . We used an e-value lower than e−30 and coverage greater than 0.35 as thresholds to extract the best domain matches.
Hydrogenases were annotated using HMM searches against the Pfam database version 32.0  using an e-value cut-off of e−30. Catalytic subunits of hydrogenases were classified into different classes using the k-nearest neighbor algorithm implemented in the HydDB webtool . For the [FeFe] hydrogenase Group A4, we carried out a manual inspection of the conserved motifs in the protein sequence .
We reconstructed prokaryotic metabolic pathways from our metagenomes with KOFam scan v.1.1.0 [49, 68]. We used the KEGG database as a reference and e-value cut-off of e−30. Each protein sequence was annotated to the gene family level with the KEGG-Decode python module . The MAG metabolic pathways were annotated with KOFam scan v.1.1.0 using default settings. Some gene families appeared to be absent from some MAGs after annotation against the KEGG database. To confirm the absence of these gene families, we carried out BLAST searches (amino acid identity > 60% and alignment length > 100 amino acids) against the Annotree protein sequence database .
Relative abundance of gene families
The relative abundance of CDSs was calculated by mapping the raw reads on the sequences. Briefly, the reads were mapped to the assembled contigs annotated as bacteria or archaea. Relative abundance was calculated for each CDS using Salmon v.1.4.0 with the “meta” option. Salmon corrects GC-content bias, gene-length differences, and sampling effort . We used Transcripts per Million (TPM) values to calculate the relative abundance of CDSs. TPM is a normalization method typically used in studies comparing gene expression levels among transcriptomes. In the case of gut metagenomes, TPM reflects the abundance of a gene in the bacterial community rather than its expression level (or the relative abundance of transcripts). TPM values were retained for downstream analyses if they were embedded into contigs longer than 1000 bps and had a TPM value higher than 1. This approach removes potentially spurious genes with low coverage. The log(TPM+1) values were calculated for visualization of the relative abundance data. Individual TPM counts were normalized using centered log(2)-ratio (clr) transformation for marker genes and genes of functional interest to account for the compositional structure and unequal numbers of reads in our metagenome data. Clr transformation enhances sub-compositional comparisons (gene vs gene, bacteria vs bacteria) and reduces spurious correlations. Positive and negative TPM values indicate positive and negative departures from the overall compositional mean, which is zero . Clr transformation of TPM values for marker genes and genes of functional interest was performed using the R package propr using 0.65 as a pseudo count to account for zero values . We did not calculate TPM for MAGs but instead used presence/absence to investigate pathway completeness.
We investigated whether the abundance of the genes and pathways of interest were phylogenetically autocorrelated to the time-calibrated tree of termites. To do so, we calculated the Moran’s I phylogenetic autocorrelation index using the R package phylosignal  on CDSs embedded in contigs longer than 1000 bps and with TPM value higher than 1. This analysis was carried out for each bacterial and archaeal phylum present in at least 5% of the metagenomes, using the combined 40 single-copy marker genes (see Table S3). A 5% false discovery rate (FDR) correction was calculated according to the p.adjust function implemented in the R package stats . Similarly, we calculated the Moran’s I phylogenetic autocorrelation index for every 211 CAZymes present in more than 10% of gut metagenomes and carried out a 5% FDR correction. Finally, the analysis was performed for each gene involved in the reductive acetogenesis, sulfate-reducing, nitrogen recycling, and nitrogen fixating pathways. For the mcrABG gene of the methanogenesis pathway, the analysis was performed on the combined genes because their abundance was too low to be analyzed individually as other genes. We applied a 5% FDR correction.
To examine whether the abundance of the genes and pathways of interest differed with the termite diet and the presence of non-prokaryotic co-symbionts, we performed phylogenetic ANOVA using the procD.pgls function implemented in the R package geomorph . A 5% FDR correction was calculated using the p.adjust function implemented in the R package stats . The termite diet was determined based on literature data [12, 37] and was considered made of wood or soil. Wood-feeding termite species included feeding groups I and II (including grass and leaf litter), while soil-feeding termites included feeding groups III and IV (sensu ). Non-prokaryotic co-symbionts are found in two groups of wood-feeding termites: the lower termites, which include all termites with the exclusion of Termitidae and host cellulolytic flagellates in their gut, and the Macrotermitinae, a subfamily of Termitidae that cultivates cellulolytic Termitomyces in fungal combs. Therefore, we recognized four groups of termites: the lower termites (LT), the soil-feeding termites (all Termitidae, SF), the Macrotermitinae (FC), and the non-Macrotermitinae wood-feeding Termitidae (WF). This analysis was also performed on CAZyme families encoded by specific prokaryotic phyla present in more than 10% of termite gut metagenomes in contigs longer than 5000 bps, to ensure correct taxonomic annotation. All metagenome contigs longer than 5000 bps with dinitrogen-fixing genes were also examined.
We visualized termite samples according to the abundance of CAZyme families present in their gut metagenomes using Principal Component Analysis (PCA). The PCA was performed using the prcomp function implemented in the R package stats  and visualized using the R package ggbiplot . Similar analyses were performed on the genes involved in reductive acetogenesis, sulfate reduction, dissimilatory nitrate reduction, urea degradation, glutamate biosynthesis, arginine biosynthesis, ammonia transport, nitrogen fixation, and mcrABG genes of the methanogenesis pathway.
Uricase genes encoded by termites
We searched the 53 termite transcriptomes previously published by Bucek et al.  for the presence of uricases. These transcriptomes were either derived from whole worker bodies or worker heads, and included species of all termite families. Protein sequences of predicted uricases from termites (XP_023702357, GFG34960), cockroaches (PSN45555, CDO39394), fireflies (KAF529609, XP_031344605), sawflies (XP_015591878, XP_015521616), ant (XP_011159093), fruit fly (NP_476779), and rat (NP_446220) were used as a query in TBLASTn searches. The longest open reading frames for all significant TBLASTn search hits (e-value < 10−30) were identified and translated using hmmer2go obtained from https://github.com/sestaton/HMMER2GO. The nonsense proteins that did not provide any significant BLASTx hit against the NCBI RefSeq database (e-value < 10−10) were discarded. The remaining predicted protein sequences, derived from 23 transcripts, were assigned KEGG annotations using eggNOG-Mapper version 4.5 . The protein sequences were aligned using CLUSTAL W , and the alignment was visually inspected.
Availability of data and materials
Raw sequence data generated in this study have been deposited on MG-RAST (https://www.mg-rast.org/mgmain.html?mgpage=project&project=mgp101108) (see Table S1 for individual IDs). MAGs generated in this study, their metadata, and the spreadsheets used to generate the main figures are available on Figshare (https://figshare.com/projects/EGU-The-functional-evolution-of-termite-gut-microbiota/126695). The scripts used in this study are available on github (https://github.com/oist/EGU-The-functional-evolution-of-termite-gut-microbiota).
Abdul Rahman N, Parks DH, Willner DL, Engelbrektson AL, Goffredi SK, Warnecke F, et al. A molecular survey of Australian and North American termite genera indicates that vertical inheritance is the primary force shaping termite gut microbiomes. Microbiome. 2015;3:5.
Adams DC, Otárola-Castillo E. Geomorph: An R package for the collection and analysis of geometric morphometric shape data. Methods Ecol Evol. 2013;4:393–9.
Alneberg J, Bjarnason BS, De Bruijn I, Schirmer M, Quick J, Ijaz UZ, et al. Binning metagenomic contigs by coverage and composition. Nat Methods. 2014;11:1144–6.
Altschup SF, Warren G, Miller W, Myers EW, Lipman D. Basic local alignment search tool. J Mol Biol. 1990;215:403–10.
Apprill A, Mcnally S, Parsons R, Weber L. Minor revision to V4 region SSU rRNA 806R gene primer greatly increases detection of SAR11 bacterioplankton. Aquat Microb Ecol. 2015;75:129–37.
Bengtsson-Palme J, Hartmann M, Eriksson KM, Pal C, Thorell K, Larsson DGJ, et al. METAXA2: improved identification and taxonomic classification of small and large subunit rRNA in metagenomic data. Mol Ecol Resour. 2015;15:1403–14.
Bernt M, Donath A, Jühling F, Externbrink F, Florentz C, Fritzsch G, et al. MITOS: improved de novo metazoan mitochondrial genome annotation. Mol Phylogenet Evol. 2013;69:313–9.
Bignell DE, Eggleton P. On the elevated intestinal pH of higher termites (Isoptera: Termitidae). Ins Soc. 1995;42:57–69.
Bignell D, Eggleton P, Nunes L, Thomas K. Termites as mediators of forest carbon fluxes in tropical forests: budgets for carbon dioxide and methane emissions. In: Watt AD, Stork NE, Hunter MD, editors. Forests and insects. London: Chapman and Hall; 1997. p. 109–34.
Bignell DE. The role of symbionts in the evolution of termites and their rise to ecological dominance in the tropics. In: Hurst CJ, editor. The mechanistic benefits of microbial symbionts. Dordrecht: Springer; 2016. p. 121–72.
Bolger AM, Lohse M, Usadel B. Trimmomatic: a flexible trimmer for Illumina sequence data. Bioinformatics. 2014;30:2114–20.
Bourguignon T, Šobotník J, Lepoint G, Martin JM, Hardy OJ, Dejean A, et al. Feeding ecology and phylogenetic structure of a complex neotropical termite assemblage, revealed by nitrogen stable isotope ratios. Ecol Entomol. 2011;36:261–9.
Bourguignon T, Lo N, Cameron SL, Šobotník J, Hayashi Y, Shigenobu S, et al. The evolutionary history of termites as inferred from 66 mitochondrial genomes. Mol Biol Evol. 2015;32:406–21.
Bourguignon T, Lo N, Šobotník J, Sillam-Dussès D, Roisin Y, Evans TA. Oceanic dispersal, vicariance and human introduction shaped the modern distribution of the termites Reticulitermes, Heterotermes and Coptotermes. Proc R Soc B. 2016;283:20160179.
Bourguignon T, Lo N, Šobotník J, Ho SYW, Iqbal N, Coissac E, et al. Mitochondrial phylogenomics resolves the global spread of higher termites, ecosystem engineers of the tropics. Mol Biol Evol. 2017;34:589–97.
Bourguignon T, Lo N, Dietrich C, Šobotník J, Sidek S, Roisin Y, et al. Rampant host switching shaped the termite gut microbiome. Curr Biol. 2018;28:649–54.
Bowers RM, Kyrpides NC, Stepanauskas R, Harmon-Smith M, Doud D, Reddy TBK, et al. Minimum information about a single amplified genome (MISAG) and a metagenome-assembled genome (MIMAG) of bacteria and archaea. Nat Biotechnol. 2017;35:725–31.
Brauman A, Koenig JF, Dutreix J, Garcia JL. Characterization of two sulfate-reducing bacteria from the gut of the soil-feeding termite, Cubitermes speciosus. Antonie Van Leeuwenhoek. 1990;58:271–5.
Brauman A, Kane MD, Labat M, Breznak JA. Genesis of acetate and methane by gut bacteria of nutritionally diverse termites. Science. 1992;257:1384–7.
Breznak JA, Switzer JM. Acetate synthesis from H2 plus CO2 by termite gut microbes. Appl Environ Microbiol. 1986;52:623–30.
Breznak JA, Brune A. Role of microorganisms in the digestion of lignocellulose by termites. Annu Rev Entomol. 1994;39:453–87.
Breznak JA. Ecology of prokaryotic microbes in the guts of wood and litter-feeding termites. In: Abe T, Bignell DE, Higashi M, editors. Termites: evolution, sociality, symbioses, ecology. Dordrecht: Kluwer Academic Publishers; 2000. p. 209–31.
Brune A, Ohkuma M. Role of the termite gut microbiota in symbiotic digestion. In: Bignell DE, Roisin Y, Lo N, editors. Biology of termites: a modern synthesis. Springer; 2011. p. 439–75.
Brune A. Symbiotic digestion of lignocellulose in termite guts. Nat Rev Microbiol. 2014;12:168–80.
Brune A. Methanogens in the digestive tract of termites. In (Endo)symbiotic methanogenic archaea. In: Hackstein JHP, editor. Book series: Microbiology monographs, vol. 19. 2nd ed. Cham: Springer; 2018. p. 81–101.
Brune A. Methanogenesis in the digestive tracts of insects and other arthropods. In: Stams AJM, Sousa DE, editors. Book series: Handbook of hydrocarbon and lipid microbiology Biogenesis of hydrocarbons. Cham: Springer; 2019. p. 229–60.
Bucek A, Šobotník J, He S, Shi M, McMahon DP, Holmes EC, et al. Evolution of termite symbiosis informed by transcriptome-based phylogenies. Curr Biol. 2019;29:3728–34.
Buchfink B, Xie C, Huson DH. Fast and sensitive protein alignment using DIAMOND. Nat Methods. 2015;12:59–60.
Buckel W, Thauer RK. Energy conservation via electron bifurcating ferredoxin reduction and proton/Na+ translocating ferredoxin oxidation. Biochim Biophys Acta. 2013;1827:94–113.
Calusinska M, Marynowska M, Bertucci M, Untereiner B, Klimek D, Goux X, et al. Integrative omics analysis of the termite gut system adaptation to Miscanthus diet identifies lignocellulose degradation enzymes. Commun Biol. 2020;3:275.
Chan PP, Lowe TM. tRNAscan-SE: Searching for tRNA genes in genomic sequences. Methods Mol Biol. 2019;1962:1–14.
Chaumeil PA, Mussig AJ, Hugenholtz P, Parks DH. GTDB-Tk: a toolkit to classify genomes with the Genome Taxonomy Database. Bioinformatics. 2020;36:1925–7.
Chouvenc T, Šobotník J, Engel MS, Bourguignon T. Termite evolution: mutualistic associations, key innovations, and the rise of Termitidae. Cell Mol Life Sci. 2021;78:2749–69.
Cleveland LR. The physiological and symbiotic relationships between the intestinal protozoa of termites and their host, with special reference to Reticulitermes flavipes Kollar. Biol Bull. 1924;46:203–27.
Desai MS, Brune A. Bacteroidales ectosymbionts of gut flagellates shape the nitrogen-fixing community in dry-wood termites. ISME J. 2012;6:1302–13.
Dietrich C, Köhler T, Brune A. The cockroach origin of the termite gut microbiota: Patterns in bacterial community structure reflect major evolutionary events. Appl Environ Microbiol. 2014;80:2261–9.
Donovan SE, Eggleton P, Bignell DE. Gut content analysis and a new feeding group classification of termites. Ecol Entomol. 2001;26:356–66.
Dos Santos PC, Fang Z, Mason SW, Setubal JC, Dixon R. Distribution of nitrogen fixation and nitrogenase-like sequences amongst microbial genomes. BMC Genomics. 2012;13:162.
Dröge S, Limper U, Emtiazi F, Schönig I, Pavlus N, Drzyzga O, et al. In vitro and in vivo sulfate reduction in the gut contents of the termite Mastotermes darwiniensis and the rose-chafer Pachnoda marginata. J Gen Appl Microbiol. 2005;51:57–64.
Drummond AJ, Ho SYW, Phillips MJ, Rambaut A. Relaxed phylogenetics and dating with confidence. PLoS Biol. 2006;4:e88.
Edgar RC. Accuracy of taxonomy prediction for 16S rRNA and fungal ITS sequences. PeerJ. 2018;6:e4652.
Eggleton P, Bignell DE, Sands WA, Mawdsley NA, Lawton JH, Wood TG, et al. The diversity, abundance and biomass of termites under differing levels of disturbance in the Mbalmayo Forest Reserve, southern Cameroon. Philos Trans R Soc B Biol Sci. 1996;351:51–68.
El-Gebali S, Mistry J, Bateman A, Eddy SR, Luciani A, Potter SC, et al. The Pfam protein families database in 2019. Nucleic Acids Res. 2019;47:D427–32.
Evans TA. Invasive termites. In: Bignell DE, Roisin Y, Lo N, editors. Biology of termites: a modern synthesis. Dordrecht Heidelberg London New York: Springer; 2011. p. 519–62.
Evans PN, Boyd JA, Leu AO, Woodcroft BJ, Parks DH, Hugenholtz P, et al. An evolving view of methane metabolism in the Archaea. Nat Rev Microbiol. 2019;17:219–32.
Gernhard T. The conditioned reconstructed process. J Theor Biol. 2008;253:769–78.
Gloor GB, Macklaim JM, Pawlowsky-Glahn V, Egozcue JJ. Microbiome datasets are compositional: and this is not optional. Front Microbiol. 2017;8:2224.
Graber JR, Breznak JA. Physiology and nutrition of Treponema primitia, an H-2/CO2-acetogenic spirochete from termite hindguts. Appl Environ Microbiol. 2004;70:1307–14.
Graham ED, Heidelberg JF, Tully BJ. Potential for primary productivity in a globally-distributed bacterial phototroph. ISME J. 2018;12:1861–6.
Gurevich A, Saveliev V, Vyahhi N, Tesler G. QUAST: quality assessment tool for genome assemblies. Bioinformatics. 2013;29:1072–5.
He S, Ivanova N, Kirton E, Allgaier M, Bergin C, Scheffrahn RH, et al. Comparative metagenomic and metatranscriptomic analysis of hindgut paunch microbiota in wood- and dung-feeding higher termites. PLoS One. 2013;8:e61126.
Hervé V, Liu P, Dietrich C, Sillam-Dussès D, Stiblik P, Šobotník J, et al. Phylogenomic analysis of 589 metagenome-assembled genomes encompassing all major prokaryotic lineages from the gut of higher termites. PeerJ. 2020;8:e8614.
Holt JA, Lepage M. Termites and soil properties. In: Abe T, Bignell DE, Higashi M, editors. Termites: evolution, sociality, symbioses, ecology. Dordrecht: Kluwer Academic Publishers; 2000. p. 389–407.
Hongoh Y, Sharma VK, Prakash T, Noda S, Toh H, Taylor TD, et al. Genome of an endosymbiont coupling N2 fixation to cellulolysis within protist cells in termite gut. Science. 2008;322:1108–9.
Hongoh Y, Ohkuma M. Termite gut flagellates and their methanogenic and eubacterial symbionts. In: Hackstein JH, editor. (Endo)symbiotic methanogenic archaea. Springer-Verlag; 2010. p. 55–79.
Hongoh Y. Toward the functional analysis of uncultivable, symbiotic microorganisms in the termite gut. Cell Mol Life Sci. 2011;68:1311–25.
Huerta-Cepas J, Szklarczyk D, Forslund K, Cook H, Heller D, Walter MC, et al. eggNOG 4.5: a hierarchical orthology framework with improved functional annotations for eukaryotic, prokaryotic and viral sequences. Nucleic Acids Res. 2016;44:D286–93.
Hungate RE. Experiments on the nutrition of Zootermopsis. III. The anaerobic carbohydrate dissimilation by the intestinal protozoa. Ecology. 1939;20:230–45.
Ikeda-Ohtsubo W, Brune A. Cospeciation of termite gut flagellates and their bacterial endosymbionts: Trichonympha species and ʻCandidatus Endomicrobium trichonymphaeʼ. Mol Ecol. 2009;18:332–42.
Ikeda-Ohtsubo W, Faivre N, Brune A. Putatively free-living ʻEndomicrobiaʼ – ancestors of the intracellular symbionts of termite gut flagellates? Environ Microbiol Rep. 2010;2:554–9.
Ikeda-Ohtsubo W, Strassert JFH, Köhler T, Mikaelyan A, Gregor I, McHardy AC, et al. ‘Candidatus Adiutrix intracellularis’, an endosymbiont of termite gut flagellates, is the first representative of a deep-branching clade of Deltaproteobacteria and a putative homoacetogen. Environ Microbiol. 2016;18:2548–64.
Inoue T, Kitade O, Yoshimura T, Yamaoka I. Symbiotic associations with protists. In: Abe T, Bignell DE, Higashi M, editors. Termites: evolution, sociality, symbioses, ecology. Dordrecht: Kluwer Academic Publishers; 2000. p. 275–88.
Inoue JI, Oshima K, Suda W, Sakamoto M, Iino T, Noda S, et al. Distribution and evolution of nitrogen fixation genes in the phylum Bacteroidetes. Microbes Environ. 2015;30:44–50.
Iwadate Y, Kato JI. Identification of a formate-dependent uric acid degradation pathway in Escherichia coli. J Bacteriol. 2019;201:e00573–18.
Ji R, Brune A. Transformation and mineralization of 14C-labeled cellulose, peptidoglycan, and protein by the soil-feeding termite Cubitermes Orthognathus. Biol Fertil Soils. 2001;33:166–74.
Ji R, Brune A. Digestion of peptidic residues in humic substances by an alkali-stable and humic-acid-tolerant proteolytic activity in the gut of soil-feeding termites. Soil Biol Biochem. 2005;37:1648–55.
Jouquet P, Bottinelli N, Shanbhag RR, Bourguignon T, Traoré S, Abbasi SA. Termites: The neglected soil engineers of tropical soils. Soil Sci. 2016;181:157–65.
Kanehisa M, Sato Y, Kawashima M, Furumichi M, Tanabe M. KEGG as a reference resource for gene and protein annotation. Nucleic Acids Res. 2016;44:D457–62.
Katoh K, Misawa K, Kuma K, Miyata T. MAFFT: a novel method for rapid multiple sequence alignment based on fast Fourier transform. Nucleic Acids Res. 2002;30:3059–66.
Keck F, Rimet F, Bouchez A, Franc A. Phylosignal: an R package to measure, test, and explore the phylogenetic signal. Ecol Evol. 2016;6:2774–80.
Kozich JJ, Westcott SL, Baxter NT, Highlander SK, Schloss PD. Development of a dual-index sequencing strategy and curation pipeline for analyzing amplicon sequence data on the MiSeq Illumina sequencing platform. Appl Environ Microbiol. 2013;79:5112–20.
Krishna K, Grimaldi DA, Engel MS. Treatise on the Isoptera of the world: Vol. 1. Bull Am Museum Nat Hist. 2013;377:1–196.
Kuhnigk T, Branke J, Krekeler D, Cypionka H, König H. A feasible role of sulfate-reducing bacteria in the termite gut. Syst Appl Microbiol. 1996;19:139–49.
Kuwahara H, Yuki M, Izawa K, Ohkuma M, Hongoh Y. Genome of ‘Ca. Desulfovibrio trichonymphae’, an H2-oxidizing bacterium in a tripartite symbiotic system within a protist cell in the termite gut. ISME J. 2017;11:766–76.
Lang K, Schuldes J, Klingl A, Poehlein A, Daniel R, Brune A. New mode of energy metabolism in the seventh order of methanogens as revealed by comparative genome analysis of “Candidatus Methanoplasma termitum.”. Appl Environ Microbiol. 2015;81:1338–52.
Langmead B, Salzberg SL. Fast gapped-read alignment with Bowtie2. Nat Methods. 2012;9:357–9.
Larkin MA, Blackshields G, Brown NP, Chenna R, Mcgettigan PA, McWilliam H, et al. Clustal W and Clustal X version 2.0. Bioinformatics. 2007;23:2947–8.
Leadbetter JR, Schmidt TM, Graber JR, Breznak JA. Acetogenesis from H2 plus CO2 by spirochetes from termite guts. Science. 1999;283:686–9.
Lilburn TG, Kim KS, Ostrom NE, Byzek KR, Leadbetter JR, Breznak JA. Nitrogen fixation by symbiotic and free living spirochetes. Science. 2001;292:2495–8.
Liu C, Zou G, Yan X, Zhou X. Screening of multimeric β-xylosidases from the gut microbiome of a higher termite, Globitermes brachycerastes. Int J Biol Sci. 2018;14:608–15.
Lo N, Tokuda G, Watanabe H, Rose H, Slaytor M, Maekawa K, et al. Evidence from multiple gene sequences indicates that termites evolved from wood-feeding cockroaches. Curr Biol. 2000;10:801–4.
Loh HQ, Hervé V, Brune A. Metabolic potential for reductive acetogenesis and a novel energy-converting [NiFe] hydrogenase in Bathyarchaeia from termite guts – A genome-centric analysis. Front Microbiol. 2021;11:635786.
Lombard V, Ramulu HG, Drula E, Coutinho PM, Henrissat B. The carbohydrate-active enzymes database (CAZy) in 2013. Nucleic Acids Res. 2014;42:D490–5.
Marynowska M, Goux X, Sillam-Dussès D, Rouland-Lefèvre C, Halder R, Wilmes P, et al. Compositional and functional characterisation of biomass-degrading microbial communities in guts of plant fibre- and soil-feeding higher termites. Microbiome. 2020;8:96.
Mendler K, Chen H, Parks DH, Lobb B, Hug LA, Doxey AC. AnnoTree: visualization and exploration of a functionally annotated microbial tree of life. Nucleic Acids Res. 2019;47:4442–8.
Michaud C, Hervé V, Dupont S, Dubreuil G, Bézier AM, Meunier J, et al. Efficient but occasionally imperfect vertical transmission of gut mutualistic protists in a wood-feeding termite. Mol Ecol. 2020;29:308–24.
Mikaelyan A, Strassert JFH, Tokuda G, Brune A. The fibre-associated cellulolytic bacterial community in the hindgut of wood-feeding higher termites (Nasutitermes spp.). Environ Microbiol. 2014;16:2711–22.
Mikaelyan A, Dietrich C, Köhler T, Poulsen M, Sillam-Dussès D, Brune A. Diet is the primary determinant of bacterial community structure in the guts of higher termites. Mol Ecol. 2015;24:5284–95.
Mikaelyan A, Thompson CL, Meuser K, Zheng H, Rani P, Plarre R, et al. High-resolution phylogenetic analysis of Endomicrobia reveals multiple acquisitions of endosymbiotic lineages by termite gut flagellates. Environ Microbiol Rep. 2017;9:477–83.
Nalepa C. What kills the hindgut flagellates of lower termites during the host molting cycle? Microorganisms. 2017;5:82.
Nalepa CA. Ancestral transfer of symbionts between cockroaches and termites: an unlikely scenario. Proc R Soc B. 1991;246:185–9.
Ngugi DK, Ji R, Brune A. Nitrogen mineralization, denitrification, and nitrate ammonification by soil-feeding termites: a 15N-based approach. Biogeochemistry. 2011;103:355–69.
Ngugi DK, Brune A. Nitrate reduction, nitrous oxide formation, and anaerobic ammonia oxidation to nitrite in the gut of soil-feeding termites (Cubitermes and Ophiotermes spp.). Environ Microbiol. 2012;14:860–71.
Nishimura Y, Otagiri M, Yuki M, Shimizu M, Inoue J, Moriya S, et al. Division of functional roles for termite gut protists revealed by single-cell transcriptomes. ISME J. 2020;14:2449–60.
Nurk S, Meleshko D, Korobeynikov A, Pevzner PA. metaSPAdes: a new versatile metagenomic assembler. Genome Res. 2017;27:824–34.
Ochman H, Lawrence JG, Groisman EA. Lateral gene transfer and the nature of bacterial innovation. Nature. 2000;405:299–304.
Ohkuma M, Noda S, Kudo T. Phylogenetic diversity of nitrogen fixation genes in the symbiotic microbial community in the gut of diverse termites. Appl Environ Microbiol. 1999;65:4926–34.
Ohkuma M, Noda S, Hongoh Y, Kudo T. Coevolution of symbiotic systems of termites and their gut microorganisms. Riken Rev. 2001;41:73–4.
Ohkuma M, Brune A. Diversity, structure, and evolution of the termite gut microbial community. In: Bignell DE, Roisin Y, Lo N, editors. Biology of termites: a modern synthesis. Springer; 2011. p. 413–138.
Ohkuma M, Noda S, Hattori S, Iida T, Yuki M, Starns D, et al. Acetogenesis from H2 plus CO2 and nitrogen fixation by an endosymbiotic spirochete of a termite-gut cellulolytic protist. Proc Natl Acad Sci U S A. 2015;112:10224–30.
Papudeshi B, Haggerty JM, Doane M, Morris MM, Walsh K, Beattie DT, et al. Optimizing and evaluating the reconstruction of metagenome-assembled microbial genomes. BMC Genomics. 2017;18:915.
Parada AE, Needham DM, Fuhrman JA. Every base matters: Assessing small subunit rRNA primers for marine microbiomes with mock communities, time series and global field samples. Environ Microbiol. 2016;18:1403–14.
Parks DH, Imelfort M, Skennerton CT, Hugenholtz P, Tyson GW. CheckM: assessing the quality of microbial genomes recovered from isolates, single cells, and metagenomes. Genome Res. 2015;25:1043–55.
Parks DH, Chuvochina M, Chaumeil PA, Rinke C, Mussig AJ, Hugenholtz P. A complete domain-to-species taxonomy for bacteria and archaea. Nat Biotechnol. 2020;38:1079–86.
Paul K, Nonoh JO, Mikulski L, Brune A. “Methanoplasmatales”, Thermoplasmatales-related archaea in termite guts and other environments, are the seventh order of methanogens. Appl Environ Microbiol. 2012;78:8245–53.
Pester M, Brune A. Hydrogen is the central free intermediate during lignocellulose degradation by termite gut symbionts. ISME J. 2007;1:551–65.
Potrikus CJ, Breznak JA. Uric acid-degrading bacteria in guts of termites [Reticulitermes flavipes (Kollar)]. Appl Environ Microbiol. 1980;40:117–24.
Potrikus CJ, Breznak JA. Gut bacteria recycle uric acid nitrogen in termites: a strategy for nutrient conservation. Proc Natl Acad Sci U S A. 1981;78:4601–5.
Poulsen M, Hu H, Li C, Chen Z, Xu L, Otani S, et al. Complementary symbiont contributions to plant decomposition in a fungus-farming termite. Proc Natl Acad Sci U S A. 2014;111:14500–5.
Price MN, Dehal PS, Arkin AP. FastTree: computing large minimum evolution trees with profiles instead of a distance matrix. Mol Biol Evol. 2009;26:1641–50.
Quast C, Pruesse E, Yilmaz P, Gerken J, Schweer T, Glo FO, et al. The SILVA ribosomal RNA gene database project: improved data processing and web-based tools. Nucleic Acids Res. 2013;41:D590–6.
Quinn TP, Richardson MF, Lovell D, Crowley TM. propr: an R-package for identifying proportionally abundant features using compositional data analysis. Sci Rep. 2017;7:16252.
R Core Team. (2014). R: A language and environment for statistical computing. Available from: http://www.r-project.org/.
Rambaut A, Drummond A, Xie D, Baele G, Suchard MA. Posterior summarization in Bayesian phylogenetics using Tracer 1.7. Syst Biol. 2018;67:901–4.
Rognes T, Flouri T, Nichols B, Quince C, Mahé F. VSEARCH: a versatile open source tool for metagenomics. PeerJ. 2016;4:e2584.
Rouland-Lefèvre C. Symbiosis with fungi. In: Abe T, Bignell DE, Higashi T, editors. Termites: evolution, sociality, symbioses, ecology. Dordrecht: Kluwer Academic Publishers; 2000. p. 289–306.
Sato T, Hongoh Y, Noda S, Hattori S, Ui S, Ohkuma M. Candidatus Desulfovibrio trichonymphae, a novel intracellular symbiont of the flagellate Trichonympha agilis in termite gut. Environ Microbiol. 2009;11:1007–15.
Schloss PD, Westcott SL, Ryabin T, Hall JR, Hartmann M, Hollister EB, et al. Introducing mothur: open-source, platform-independent, community-supported software for describing and comparing microbial communities. Appl Environ Microbiol. 2009;75:7537–41.
Schuchmann K, Müller V. A bacterial electron-bifurcating hydrogenase. J Biol Chem. 2012;287:31165–71.
Schuchmann K, Müller V. Autotrophy at the thermodynamic limit of life: a model for energy conservation in acetogenic bacteria. Nat Rev Microbiol. 2014;12:809–21.
Schuchmann K, Chowdhury NP, Müller V. Complex multimeric [FeFe] hydrogenases: biochemistry, physiology and new opportunities for the hydrogen economy. Front Microbiol. 2018;9:2911.
Seemann T. Prokka: rapid prokaryotic genome annotation. Bioinformatics. 2014;30:2068–9.
Shinzato N, Matsumoto T, Yamaoka I, Oshima T, Yamagishi A. Methanogenic symbionts and the locality of their host lower termites. Microbes Environ. 2001;16:43–7.
Søndergaard D, Pedersen CNS, Greening C. HydDB: a web tool for hydrogenase classification and analysis. Sci Rep. 2016;6:34212.
Song Y, Hervé V, Radek R, Pfeiffer F, Zheng H, Brune A. Characterization and phylogenomic analysis of Breznakiella homolactica gen. nov. sp. nov. indicate that termite gut treponemes evolved from non-acetogenic spirochetes in cockroaches. Environ Microbiol. 2021; 23:4228–45.
Srivastava A, Malik L, Sarkar H, Zakeri M, Almodaresi F, Soneson C, et al. Alignment and mapping methodology influence transcript abundance estimation. Genome Biol. 2020;21:239.
Stingl U, Radek R, Yang H, Brune A. “Endomicrobia”: cytoplasmic symbionts of termite gut protozoa form a separate phylum of prokaryotes. Appl Environ Microbiol. 2005;71:1473–9.
Suchard MA, Lemey P, Baele G, Ayres DL, Drummond AJ, Rambaut A. Bayesian phylogenetic and phylodynamic data integration using BEAST 1.10. Virus Evol. 2018;4:vey016.
Sugimoto A, Bignell DE, Macdonald JA. Global impact of termites on the carbon cycle and atmospheric trace gases. In: Abe T, Bignell DE, Higashi M, editors. Termites: evolution, sociality, symbioses, ecology. Dordrecht: Kluwer Academic Publishers; 2000. p. 409–35.
Sugimoto A, Inoue T, Tayasu I, Miller L, Takeichi S, Abe T. Methane and hydrogen production in a termite-symbiont system. Ecol Res. 1998;13:241–57.
Sunagawa S, Mende DR, Zeller G, Izquierdo-Carrasco F, Berger SA, Kultima JR, et al. Metagenomic species profiling using universal phylogenetic marker genes. Nat Methods. 2013;10:1196–9.
Tai V, James ER, Nalepa CA, Scheffrahn RH, Perlman SJ, Keeling PJ. The role of host phylogeny varies in shaping microbial diversity in the hindguts of lower termites. Appl Environ Microbiol. 2015;81:1059–70.
Tai V, Carpenter KJ, Weber PK, Nalepa CA, Perlman SJ, Keeling PJ. Genome evolution and nitrogen fixation in bacterial ectosymbionts of a protist inhabiting wood-feeding cockroaches. Appl Environ Microbiol. 2016;82:4682–95.
Tartar A, Wheeler MM, Zhou X, Coy MR, Boucias DG, Scharf ME. Parallel metatranscriptome analyses of host and symbiont gene expression in the gut of the termite Reticulitermes flavipes. Biotechnol Biofuels. 2009;2:25.
Terrapon N, Lombard V, Gilbert HJ, Henrissat B. Automatic prediction of polysaccharide utilization loci in Bacteroidetes species. Bioinformatics. 2015;31:647–55.
Thauer RK, Kaster AK, Seedorf H, Buckel W, Hedderich R. Methanogenic archaea: ecologically relevant differences in energy conservation. Nat Rev Microbiol. 2008;6:579–91.
Tholen A, Brune A. Localization and in situ activities of homoacetogenic bacteria in the highly compartmentalized hindgut of soil-feeding higher termites (Cubitermes spp.). Appl Environ Microbiol. 1999;65:4497–505.
Thong-On A, Suzuki K, Noda S, Inoue JI, Kajiwara S, Ohkuma M. Isolation and characterization of anaerobic bacteria for symbiotic recycling of uric acid nitrogen in the gut of various termites. Microbes Environ. 2012;27:186–92.
Tokuda G, Lo N, Watanabe H, Arakawa G, Matsumoto T, Noda H. Major alteration of the expression site of endogenous cellulases in members of an apical termite lineage. Mol Ecol. 2004;13:3219–28.
Tokuda G, Mikaelyan A, Fukui C, Matsuura Y, Watanabe H, Fujishima M, et al. Fiber-associated spirochetes are major agents of hemicellulose degradation in the hindgut of wood-feeding higher termites. Proc Natl Acad Sci U S A. 2018;115:E11996–2004.
Treitli SC, Kolisko M, Husník F, Keeling PJ, Hampl V. Revealing the metabolic capacity of Streblomastix strix and its bacterial symbionts using single-cell metagenomics. Proc Natl Acad Sci U S A. 2019;116:19675–84.
Uritskiy GV, DiRuggiero J, Taylor J. MetaWRAP - a flexible pipeline for genome-resolved metagenomic data analysis. Microbiome. 2018;6:158.
Větrovský T, Baldrian P. The variability of the 16S rRNA gene in bacterial genomes and its consequences for bacterial community analyses. PLoS One. 2013;8:e57923.
Vu, V.Q. (2011). ggbiplot: a ggplot2 based biplot. R package version 0.55. Available from: http://github.com/vqv/ggbiplot.
Wang Q, Garrity GM, Tiedje JM, Cole JR. Naive Bayesian classifier for rapid assignment of rRNA sequences into the new bacterial taxonomy. Appl Environ Microbiol. 2007;73:5261–7.
Wang M, Buček A, Šobotník J, Sillam-Dussès D, Evans TA, Roisin Y, et al. Historical biogeography of the termite clade Rhinotermitinae (Blattodea: Isoptera). Mol Phylogenet Evol. 2019;132:100–4.
Warnecke F, Luginbuhl P, Ivanova N, Ghassemian M, Richardson TH, Stege JT, et al. Metagenomic and functional analysis of hindgut microbiota of a wood-feeding higher termite. Nature. 2007;450:560–5.
Watanabe H, Noda H, Tokuda G, Lo N. A cellulase gene of termite origin. Nature. 1998;394:330–1.
Watanabe H, Tokuda G. Cellulolytic systems in insects. Annu Rev Entomol. 2010;55:609–32.
Wu D, Jospin G, Eisen JA. Systematic identification of gene families for use as ‘markers’ for fhylogenetic and phylogeny-driven ecological studies of bacteria and archaea and their major subgroups. PLoS One. 2013;8:e77033.
Wu, H. (2018). Characterizing xylan-degrading enzymes from a putative xylan utilization system derived from termite gut metagenome. PhD thesis, INSA de Toulouse, France.
Yamada A, Inoue T, Noda S, Hongoh Y, Ohkuma M. Evolutionary trend of phylogenetic diversity of nitrogen fixation genes in the gut community of wood-feeding termites. Mol Ecol. 2007;16:3768–77.
Yamin M. Cellulose metabolism by the flagellate Trichonympha from a termite is independent of endosymbiotic bacteria. Science. 1981;211:58–9.
Yan D. Protection of the glutamate pool concentration in enteric bacteria. 2007;104:9475–80.
Yin Y, Mao X, Yang J, Chen X, Mao F, Xu Y. dbCAN: a web resource for automated carbohydrate-active enzyme annotation. Nucleic Acids Res. 2012;40:W445–51.
Yuki M, Kuwahara H, Shintani M, Izawa K, Sato T, Starns D, et al. Dominant ectosymbiotic bacteria of cellulolytic protists in the termite gut also have the potential to digest lignocellulose. Environ Microbiol. 2015;17:4942–53.
Zhang H, Yohe T, Huang L, Entwistle S, Wu P, Yang Z, et al. dbCAN2: A meta server for automated carbohydrate-active enzyme annotation. Nucleic Acids Res. 2018;46:W95–101.
We thank the DNA Sequencing Section and the Scientific Computation and Data Analysis Section of the Okinawa Institute of Science and Technology Graduate University, Okinawa, Japan, for assistance with sequencing and for providing access to the OIST computing cluster, respectively.
This work was supported by the subsidiary funding to OIST, by the Czech Science Foundation (project No. 20-20548S), by the Internal Grant Agency of the Faculty of Tropical AgriSciences, CULS (20213112), by the Australian Research Council through a Future Fellowship to NL, and by the Japan Society for the Promotion of Science through a Kakenhi grant to GT (17H01510) and a DC2 graduate student fellowship awarded to JA.
Ethic approval and consent to participate
Consent for publication
The authors declare that they have no competing interests.
Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
Termite samples sequenced in the study. Table S2. Relative abundance of family-level prokaryotic taxa inferred from gut metagenome and 16S rRNA amplicon data of 74 termite samples. The prokaryotic taxonomy was determined with GTDB for marker genes and with SILVA for 16S rRNA data. The relative abundance was clr-transformed to account for differences in sequencing method and sequencing depth among metagenome samples. Table S3. Taxonomic distribution of major bacterial and archaeal groups based on relative abundance of 40 single-copy marker genes. We analyzed the marker genes present in contigs longer than 1000 bps in >5% of gut metagenomes. The relative abundance is represented as transcripts per million (TPM). Table S4. Moran's I phylogenetic autocorrelation index calculated for 123 prokaryote families. Significance was assessed with 9999 random permutations. P-values <0.05 are indicated by asterisks. Table S5. Relative abundance of prokaryotic CAZymes in gut metagenomes with upward of 10000 contigs longer than 1000 bps. Relative abundance is given as transcripts per million (TPM). Table S6. Moran's I phylogenetic autocorrelation index calculated for 211 prokaryotic CAZymes present in more than 10% of gut metagenomes. Significance was assessed with 9999 random permutations. P-values <0.05 are indicated by asterisks. Table S7. Phylogenetic ANOVA calculated for 211 prokaryotic CAZymes present in more than 10% of gut metagenomes. Significance was assessed with 9999 random permutations. P-values of phylogenetic ANOVA and pairwise comparisons were adjusted at 5% false discovery rate (FDR). The relative abundance of each CAZyme for the four termite groups are indicated by mean TPM values. Significance of pairwise comparisons between termite groups are indicated by asterisks (* p < 0.05; ** p < 0.01; *** p < 0.001). Table S8. Phylogenetic ANOVA comparing the taxonomic origin of the 19 prokaryotic CAZymes found in 10% of gut metagenomes and embedded in contigs longer than 5000 bps. Significance was assessed with 9999 random permutations. The relative abundance of each CAZyme for the four termite groups are indicated by mean TPM values. Significance of pairwise comparisons between termite groups are indicated by asterisks (* p < 0.05; ** p < 0.01; *** p < 0.001). Table S9. Information about the 654 MAGs reconstructed in this study. Table S10. Distribution of polysaccharide utilization loci (PULs) across the MAGs. PULs with at least one GH and Bacteroidota PULs with at least one susCD complex are shown. MAGs containing PULs with all the components are highlighted in grey. Table S11. Moran’s I phylogenetic autocorrelation index and phylogenetic ANOVA performed on the genes involved in the final steps of the lignocellulose digestion in the gut of termites. For genes composed of multiple subunits, all subunits were summed together. Significance was assessed with 9999 random permutations. P-values were adjusted at 5% false discovery rate (FDR). The relative abundance of each gene for the four termite groups are indicated by mean TPM values. Significance of pairwise comparisons between termite groups are indicated by asterisks (* p < 0.05; ** p < 0.01; *** p < 0.001). Table S12. Distribution of genes involved in reductive acetogenesis among MAGs. Distribution is shown as presence (1) and absence (0). Asterisks indicate genes that were annotated using BLASTx search against the AnnoTree database (perc. identity >60%, align. length >100 aa). Other genes were annotated using HMM search against the KEGG or Pfam databases. [FeFe] hydrogenase GroupA4 were annotated using the Hyddb webtool followed by manual inspection of the conserved motifs. The total number of HycB3 (PF13247) found in each MAG is shown. MAGs with almost complete reductive acetogenesis pathway (>5 genes) and HDCR complex are highlighted in grey. Table S13. Relative abundance of methyl-coenzyme M reductase (mcrABG) gene complex present in metagenome contigs longer than 5000 bps. Contigs were annotated using BLASTx search against the GTDB database. Relative abundance of the gene family is shown as raw TPM. Table S14. Distribution of genes involved in methanogenesis among MAGs. Distribution is shown as presence (1) and absence (0). Asterisks indicate genes that were annotated using BLASTx search against the AnnoTree database (perc. identity >60%, align. length >100 aa). Other genes were annotated using HMM search against the KEGG or Pfam databases. Highlighted MAGs have a complete Methanogenesis pathway. Table S15. Distribution of genes involved in sulfate reducing among MAGs. Distribution is shown as presence (1) and absence (0). Asterisks indicate genes that were annotated using BLASTx search against the AnnoTree database (perc. identity >60%, align. length >100 aa). MAGs with complete sulfate reducing pathway are highlighted. Table S16. Genes involved in nitrogen metabolism and fixation found in our MAGs. Distribution is shown as presence (1) and absence (0). Asterisks indicate genes that were annotated using BLASTx search against the AnnoTree database (perc. identity >60%, align. length >100 aa). MAGs with complete nitrogen fixation or dissimilatory nitrate reduction pathways are highlighted. Table S17. Contigs endowed with a NifHDKENB (nifHDKENB, vnfHDKENB, or anfHDKENB) gene complex found in gut metagenomes. The relative abundance is given as raw TPM. Table S18. Contigs endowed with a NifHDK (nifHDK, vnfHDK, or anfHDK) gene complex found in termite gut metagenomes. The relative abundance is given as raw TPM. Table S19. Fossil calibrations used to calibrate the time-calibrated tree of termites.
. Time-calibrated phylogenetic tree of termites inferred from mitochondrial genome sequences. Figure S2. Relative abundance of archaeal and bacterial phyla inferred from the termite gut metagenomes and the 16S rRNA amplicon data of 74 termite samples. Figure S3. Maximum likelihood phylogenetic tree inferred from 43 single-copy marker genes of 654 metagenome-assembled genomes (MAGs). The completeness and contamination of MAGs was inferred with CheckM . Detailed information about each MAG is available in Table S9. Figure S4. Protein sequence alignment of predicted uricases from 53 termite transcriptomes previously published in Bucek et al. .
About this article
Cite this article
Arora, J., Kinjo, Y., Šobotník, J. et al. The functional evolution of termite gut microbiota. Microbiome 10, 78 (2022). https://doi.org/10.1186/s40168-022-01258-3
- Vertical inheritance