Genomic resolution of linkages in carbon, nitrogen, and sulfur cycling among widespread estuary sediment bacteria

Estuaries are among the most productive habitats on the planet. Bacteria in estuary sediments control the turnover of organic carbon and the cycling of nitrogen and sulfur. These communities are complex and primarily made up of uncultured lineages, thus little is known about how ecological and metabolic processes are partitioned in sediments. De novo assembly and binning resulted in the reconstruction of 82 bacterial genomes from different redox regimes of estuary sediments. These genomes belong to 23 bacterial groups, including uncultured candidate phyla (for example, KSB1, TA06, and KD3-62) and three newly described phyla (White Oak River (WOR)-1, WOR-2, and WOR-3). The uncultured phyla are generally most abundant in the sulfate-methane transition (SMTZ) and methane-rich zones, and genomic data predict that they mediate essential biogeochemical processes of the estuarine environment, including organic carbon degradation and fermentation. Among the most abundant organisms in the sulfate-rich layer are novel Gammaproteobacteria that have genes for the oxidation of sulfur and the reduction of nitrate and nitrite. Interestingly, the terminal steps of denitrification (NO3 to N2O and then N2O to N2) are present in distinct bacterial populations. This dataset extends our knowledge of the metabolic potential of several uncultured phyla. Within the sediments, there is redundancy in the genomic potential in different lineages, often distinct phyla, for essential biogeochemical processes. We were able to chart the flow of carbon and nutrients through the multiple geochemical layers of bacterial processing and reveal potential ecological interactions within the communities.


Background
Estuaries are dynamic environments where nutrient-rich river water mixes with shallow coastal and nutrient-rich deep ocean water. They are among the most productive environments on the planet [1]. Processes within estuaries also mediate the transfer of carbon from land to sea, release a considerable amount of CO 2 to the atmosphere [2], and sequester carbon in sediments [3]. As this organic carbon is degraded, and oxygen is consumed, a variety of favorable anaerobic respiratory processes are mediated by sediment microorganisms that contribute to sulfur, nitrogen, and iron transformations. Thus, shallow estuary sediments and their microbial communities are a global hotspot for biogeochemical cycling.
Our ability to partition these processes and the underlying metabolic pathways among specific microbial groups have remained limited by the complexity and abundance of uncultured groups present in sediment communities. Metagenomic studies have characterized the genetic potential of marine sediment microbial communities (for example, see Biddle et al. [4]). However, because sequencing reads are often not assigned to specific taxa, such approaches typically do not link individual community members to metabolic pathways [5]. Further, cultureindependent genomic reconstructions of estuary sediment microbial communities are lacking; thus little is known about uncultured communities in this complex microbial ecosystem.
To better understand the metabolic capabilities of uncultured bacteria, we obtained 262 Gbp of genomic sequence from sediment profiles from the White Oak River (WOR) estuary, North Carolina. Our sequencing effort focused on three key redox layers; the sulfate-rich, sulfatemethane transition (SMTZ), and methane-rich zones. This genomic dataset was assembled and binned to obtain partial and near-complete genomes and to reconstruct metabolic pathways of numerous community members. Reconstructing the metabolic capabilities of numerous bacterial groups in the sediment communities enabled us to identify their roles in carbon, iron, nitrogen, and sulfur cycling. This dataset provides a genomic road map of how biogeochemical processes are hypothesized to be partitioned in aquatic sediment bacterial communities.

Genomic reconstruction and identification
Sediment samples were collected with push cores at three adjacent mid-estuary locations. Since the distinct redox layers shared similar bacterial communities across the three sites, we combined genomic reads from sites 2 and 3 from the sulfate-rich zone (8 to 12 cm) and SMTZ (24 to 32 cm). The co-assembly of separate samples resulted in better assembly and greater coverage of the genomic bins. Due to the large number of SMTZ sequence reads, a separate assembly was generated from site 1 (26 to 30 cm). The methane-rich zone (52 to 54 cm) assembly was generated only from site 1. Statistics (total size, N50, number of ORFs, etc.) about these assemblies used for binning can be found in Additional file 1: Table S1. Subsequent binning by tetra-nucleotide frequency coupled with genomic coverage resulted in over 120 genomic bins of Bacteria (Additional file 1: Figure S1). Based on completeness, taxon coverage, and genomic novelty, 82 bins were chosen for detailed characterization of genome-encoded metabolic pathways. Of the bins, 26, 35, and 21 are from the sulfate-rich, SMTZ, and methanerich zones, respectively. Of these genomes, 58 are estimated to be >70% complete and 32 are >80% complete, with minimal estimated contamination (average approximately 10% among them all) (Additional file 1: Table S2). Eight of the bins were found to contain more than one genome, by the presence of multiple single-copy genes [6]. Only 30 of these genomes contain 16S rRNA genes (>300 bp) likely due to fragmentation commonly seen among highly conserved genes in short-read assemblies [7]. Therefore, phylogenetic analysis of concatenated ribosomal proteins was also used to determine the taxonomic identities of the remaining bins ( Figure 1) [8]. Ribosomal proteins belonging to novel phyla for which reference sequences were not available were identified by the presence of at least one 16S rRNA gene in the clade (Figure 2). The distribution of ribosomal proteins and other singlecopy genes is detailed in Additional file 1: Table S3.

Genomic abundance of community members in the sediment profile
To quantify the genomic abundance of community members in each zone, all reads were mapped to all genes for ribosomal protein S3 in the genome assemblies (Additional file 1: Figure S2). The assembly from shallow, sulfate-rich sediments was dominated by Beta-, Gamma-, and Deltaproteobacteria (including Myxococcales), Bacteroidetes, and Nitrospira. The SMTZ and the deeper methane-rich sediments were dominated by Archaea and Chloroflexi (Additional file 1: Figure S2), consistent with previous qPCR and rRNA slot blot results showing that bacterial dominance in surficial White Oak River estuarine sediments is considerably reduced and even reversed downcore [13]. The candidate bacterial phyla WS3, OD1, TA06, Zixibacteria, WOR-1, WOR-2, and WOR-3 were sufficiently abundant for genome reconstruction only in the deeper sediment layers.

Organic carbon degradation and fermentation
Sedimentary microbial communities process the input of photosynthetic organic matter from the overlying water column and thus play a key role in the degradation of complex carbon substrates [14]. All the bacterial genomes present here were searched for carbohydratemetabolizing enzymes using the 'CAZy' database [15]. Many of the genes in CAZy are involved in cellular maintenance processes; therefore, those that are specifically involved in hydrolysis of organic carbon based on previous studies [16] were identified. Generally, the genomes belonging to the Chloroflexi (specifically Anaerolineae), Bacteroidetes, Gemmatimonadetes, Planctomycetes, WOR-1, and WOR-2 phyla contained the broadest array of carbohydrate hydrolytic genes ( Figure 3). Candidate phyla KSB1 and KD3-62 also have relatively high numbers of these genes. These organisms have a variety of cellulose, hemicellulose, and polysaccharide degradation genes suggesting a role in the initial degradation and hydrolysis of complex organic carbon compounds. Multiple genes are involved in the degradation of chitin, the long-chain polymer of N-acetylglucosamine, a major structural component of fungal and algal cell wells, and of arthropod exoskeletons. Endo-acting chitinase genes were identified in two of the WOR-3 bins (SMTZ-60 and SMTZ-42), KSB1 (or Cloacimonetes) bin SMTZ-31, and a variety of Chloroflexi and Planctomyetes. N-acetyl-glucosaminidase genes were found in several Bacteroides, Gammaproteobacteria, Deltaproteobacteria, and Zixibacteria genomes. Diversity of organisms from which genomic bins were reconstructed from the White Oak River sediments. Phylogenetic tree inferred from 16 syntenous ribosomal protein genes present within genomic bins from the sediment metagenomic assemblies. Each sequence in bold is from one genomic bin. Genomic bins belonging to novel phyla for which no reference genomes are available (WOR-1, WOR-2, WOR-3, TA06, and KD3-62) were designated based on corresponding 16S rRNA gene phylogenetic analyses ( Figure 2). The sample depths from which each of the bins were obtained are delineated by blue (shallow), green (SMTZ), and red (deep). The phylogeny was generated using the PhyML (maximum likelihood) method.
Proteins account for a large proportion of bioavailable carbon and nitrogen for sediment communities [17]. In multiple phyla, these resources are accessed via extracellular peptidases (Additional file 2: Table S5). The greatest numbers of peptidases were identified in the candidate phyla KSB1, WOR-1, WOR-2, and WOR-3, and in the Bacteroidetes and Gemmatimonadetes, suggesting that these community members are substantially involved in protein degradation along with sedimentary benthic Archaea [18].
Several of the bacterial groups capable of hydrolyzing complex organic carbon also have pathways for glycolytic fermentation of glucose to acetate, including WOR-1 (Figure 4), WOR-2, WS3, Bacteroidetes, Nitrospira (SG8-3), and Spirochetes (bin DG-61). All these groups have the reductive acetyl-CoA (Wood-Ljungdahl) and phosphate acetyltransferase-acetate kinase pathways for carbon fixation and acetate production; however, the WS3 and WOR-1 genomes appear to lack acetyl-CoA synthetase, which is essential to the Wood-Ljungdahl pathway. The Spirochetes and WOR-2 genomes contain lactate dehydrogenase genes suggesting they are also capable of lactate fermentation. The Bacteroidetes, Spirochetes, and Thermodesulfovibrio-like bins have genes that encode aldehyde dehydrogenase and alcohol dehydrogenase, suggesting they are capable of full fermentation to ethanol. Fermentation has been demonstrated in Thermodesulfovibrio spp. cultures [20]. The genome bins did not yield an identifiable complete pathway for butyrate formation. The end products of these fermentation pathways fuel terminal respiration in the sediment community; the key electron donors acetate and hydrogen are among the principal drivers of sulfate reduction [21] (Figure 4).
Several of the bacteria possess genes for Ni,Fe-hydrogenases, which can be involved in H 2 production or consumption. These genes are ubiquitous in Gemmatiomonas, Myxococcales, Delta-, and Gammaproteobacteria (SG-11, SG-13, SG-15, SG-30, and SG-31) and are likely used for consumption of H 2 by respiratory processes (for example, sulfate reduction and denitrification). The Nitrospira bin (SG8-35-4, related to Thermodesulfovibrio spp., Figure 1) contains genes for Ni,Fe-hydrogenases that are likely to participate in both sulfate reduction with hydrogen as the electron donor and the fermentative production of H 2 , which have been demonstrated in Thermodesulfovibrio spp. [20]. Several bacterial groups that are capable of organic carbon degradation and fermentation have Ni,Fehydrogenase genes, including Planctomycetes, Spirochetes, Chloroflexi, WOR-2, WOR-3, and WOR-1 ( Figure 4). The extensive distribution of these hydrogenases among both fermenting and respiring bacteria indicates that H 2 is a highly dynamic electron carrier produced and consumed by a wide range of sediment microbes, as it is in other anoxic environments [22,23]. No genes for Fe,Fe-hydrogenases were identified, which are thought to primarily produce H 2 [24].

Dissimilatory sulfur and nitrogen cycling
Aquatic sediments are characterized by redox gradients, as oxidized compounds are gradually reduced by respiration. To determine the respiratory repertoire of the bacterial community members, the estuary genomic bins were surveyed for key genes of respiration pathways. Several of the genomic bins contain dissimilatory sulfite reductase (dsr) genes, indicative of microbial sulfate and sulfite reduction. To account for the possibility that these genes belong to a phylogenetically distinct group of genes (rdsr) that mediate the reverse reaction [25], we generated a phylogenetic tree of all the dsr-like genes recovered (Additional file 1: Figure S3). The Deltaproteobacteria, which constituted the most abundantly detected microorganisms in the sulfate-rich zone (Additional file 1: Figure S2), have reducing-type dsr genes and complete sulfate reduction pathways (with the exception of the two Myxococcales bins). The bin SG8-35-4, a sister lineage to Thermodesulfovibrio spp. within the Nitrospira phylum, is capable of sulfate reduction based on the presence of dsr genes.
A complete sulfate reduction pathway was identified in the Gemmatiomonas-like bin SG8-17. The dsrAB genes from this bin fall within a phylogenetically deeply branched unknown clade (Additional file 1: Figure S3). This clade (designated DSR-J [26]) includes dsrAB sequences from an intertidal sand flat, Hydrate Ridge, deep-sea, and estuary sediments, suggesting these sulfate reducers are widespread Figure 4 Flow diagram of the potential interactions between (left to right) organic carbon utilization, fermentation, and respiration identified in the bacterial genomes reconstructed in this study. Arrows represent metabolic capabilities that were identified in the metagenomic reconstruction from the White Oak River estuary. The dashed lines on the right represent potential electron donors for the anaerobic respiration processes. Note that the Gammaproteobacteria are capable of coupling nitrate reduction to either thiosulfate or sulfide oxidation. Abbreviations in the diagram are as follows; DNRA, dissimilatory nitrate reduction to ammonia; 'Betaprot' , Betaproteobacteria; 'Deltaprot' , Deltaproteobacteria; 'Gemmatio' , Gemmatimonadetes ('Gemm38-2' refers specifically to bin 38-2), 'Gammaprot' , Gammaproteobacteria; 'Myxococca' , Myxococcales; 'Plancto' , Planctomycetes.
in coastal and marine sediments. Genes of this clade were hypothesized to have been horizontally transferred [26]. No dsr genes were identified in any of the other Gemmatiomonadetes-like bins. Since all of the sequences for this clade had previously been recovered from large-insert (fosmid) clones that lacked 16S rRNA genes, their taxonomic affiliation had been uncertain. In this dataset the dsrAB genes are located on a 9.6 kb contig that is confidently binned, with both the tetranucleotide and coverage signatures, consistent with assignment to SG8-17. This genomic bin contains additional genes for sulfate reduction on other contigs, including aprAB, SAT, and dsrC. Thus, this member of the Gemmatimonadetes appears to be capable of sulfate reduction.
In several Gammaproteobacteria, the gene clusters for nitrate reduction and sulfur oxidation are mutually intertwined. The rdsr gene cluster of bins SG8-11, SG8-15, SG8-45, SG8-47, and SG8-50 includes a gene with homology to a nitrate sensor (narX) and a luxR-like transcription regulator ( Figure 6). This type of sensor protein has been implicated in gene expression in response to changes in nitrate/nitrite concentrations [28]. Further, the gene cluster for nitrate reduction (napABCDGH) in the gammaproteobacterial bins SG8-45 and SG8-50 contains two dsrC genes, which have been suggested to regulate rdsr gene expression [29]. The presence of two dsrC genes in the nap operon suggests that regulation of sulfur oxidation and nitrate reduction is coordinated, consistent with the coupling of these processes in Thioalkalivibrio spp. [30,31], a close relative of these bins. However, since the functions of dsrC and narX genes are putative, this inference needs to be verified with experimental evidence. The enrichment of these Gammaproteobacteria in shallow sediment samples, where the sulfide porewater concentrations decrease to the detection limit [32], is consistent with nitrate-dependent oxidation of sulfur compounds. Genes Figure 5 Diagrams of metabolic potential and electron transport of WOR-1 (bin DG-54-3) and Gammaproteobacteria (bin SG8-45), based on gene content. ATPase, ATP synthetase; FDH, formate dehydrogenase; NiFe-hyd, Ni,Fe-hydrogenase; Cytb/c1, quinone cytochrome oxidoreductase; Cyt c, cytochrome c; nap/nar, nitrate reductase; nir, nitrite reductase; nor, nitric oxide reductase; SAT, sulfate transferase; apr, APS reductase; rdsr, reverse dissimilatory sulfite reductase; Rub, RuBisCO; Q, quinine; SDH/FR, succinate dehydrogenase/fumarate reductase; NDH, NADH dehydrogenase; SOX, sulfur oxidation multienzyme complex. The sulfide quinone oxidoreductase (sqr) gene was not identified in this particular genomic bin, however many other closely related Gammaproteobacteria do have sqr genes. homologous to cytochrome c oxidases are present in SG8-11, SG8-31, SG8-45, and SG8-50, suggesting that some of the Gammaproteobacteria are also capable of O 2 reduction.
Dissimilatory nitrate reduction to ammonia (DNRA) has been shown to co-occur with denitrification in estuary sediments [33]. To look for the presence of DNRA genes in the genomes, we constructed a phylogenetic tree of all the formate-dependent nitrite reductase (nrfA) genes identified in genomic bins (Additional file 1: Figure S4). Most (72%) of these do not fall into a clade that has been described as true nrfA genes [34,35]. Two Deltaproteobacteria bins (SG8-35 and SG8-35-2) were confidently identified as being involved in DNRA.

Iron cycling
While the microbial cycling of iron in marine sediments has been commonly documented by geochemical approaches [36], well-documented biochemical pathways for iron oxidation and reduction now enable environmental genetic studies of these processes as well. For example, periplasmic and outer-membrane-anchored c-type cytochromes and a beta-barrel protein within the metal reduction (Mtr) respiratory pathway are essential for respiratory electron transport across the outer cell membrane to iron minerals in Shewanella oneidensis [37,38]. Homologues of these enzymes catalyze lithotrophic iron oxidation in Sideroxydans lithotrophicus [39] and phototrophic iron oxidation in Rhodopseudomonas palustris [40]. Genes with homology to mtr genes are present in seven different WOR genomic bins. Betaproteobacterial genomes reconstructed from the sulfate-rich zone (SG8-39, SG8-40, and SG8-41) have mtrABC genes with sequence identity (40% to 50%, 23% to 35%, and 26% to 28% at the protein level, respectively) to those found in Shewanella and Geobacter spp., suggesting that they are capable of iron reduction. Gemmatimonadetes bin SG8-38-2 has a putative mtrABC gene cluster and may be involved in iron cycling as well. Consistent with Zixibacteria genomes obtained from groundwater, bin SM-73-2 contains mtrAB but lacks the genes for extracellular cytochromes implicated in iron reduction [37]. The Gammaproteobacteria bin SG8-47 contains both the genes homologous to mtrABC and a cytoplasmic membrane-associated c-type cytochrome (cymA) required for iron reduction [41]. The closely related bacteria SG8-11 and SG8-30 have all these genes except mtrC and cymA, respectively. Bacteria potentially capable of iron reduction have multiple multi-heme cytochromes [42]; Gammaproteobacteria bins SG8-11 and SG8-30 also each have genes encoding six unique types of these cytochromes.

Conclusions
The highly resolved genomic reconstruction of estuary sediment microbial populations revealed potential physiological pathways of individual community members, including several recently defined (for example, KD3-62, TA06, Zixibacteria, and BRC1) and three newly described (WOR-1, WOR-2, and WOR-3) uncultured candidate phyla. The 12 genome bins belonging to WOR-1, WOR-2, WOR-3, KD3-62, and TA06 are the first to be constructed from these phyla. Based on their genome sequences, several of these groups appear to be capable of hydrolysis and fermentation of a variety of organic compounds, expanding the range of bacterial phyla known to hydrolyze and ferment biopolymers (sugars and proteins) to low molecular weight substrates. An average of 14 carbohydrate hydrolases was found per genome, with WOR-1 and WOR-2 having the most with 20 and 39, respectively. Planctomycetes and Bacteroides are also among the most versatile carbohydrate degrading bacteria in the White Oak River sediments with an average of 30 and 80 hydrolytic genes per genome, respectively.
This study identified potential new bacterial capabilities in sulfur cycling. One uncultured Gemmatimonadetes bacterium was linked to a previously taxonomically unassigned dsr gene clade, suggesting that this group is capable of sulfate reduction, a process that is commonly catalyzed by Deltaproteobacteria. Sequences for this group have been recovered from sediments throughout the world, suggesting a cosmopolitan Gemmatimonadetes lineage of sulfate-reducing bacteria. The oxidation of sulfide and thiosulfate is essential to sulfur cycling in marine and coastal sediments [43]; the ongoing microbial census of these processes is uncovering new key populations, such as uncultured Gammaproteobacteria mediating sulfur oxidation in coastal sediments [43]. Several of the most abundant Gammaproteobacteria in the sulfate-rich zone described here have the genetic potential for coupling anaerobic sulfur or hydrogen oxidation to reduction of nitrate to nitrous oxide, as documented in Beggiatoa spp. in marine sediments as well [44]. Interestingly, the genomic potential for the next denitrification step (N 2 O to N 2 ) in the White Oak River estuary sediments is found in different organisms, the Gemmatiomadetes and Myxococcales. This apparent splitting of the denitrification pathway over multiple taxa, and concurrent leakage of the intermediate N 2 O, provide an explanation as to why estuaries function as a significant source of N 2 O to the atmosphere [45].
Placing potential metabolic capabilities of individual populations within the framework of other community members provided a wiring diagram of the potential biogeochemical interactions at a system level. This study demonstrates how key pathways of carbon degradation and sulfur, nitrogen, and iron cycling may be distributed over a previously unexplored range of bacterial phyla within the estuarine sediment community; the genomic analysis provides a new perspective on the functioning of these pathways in nature, characterized by a high degree of functional redundancy among different lineages and by metabolic plasticity within specific organisms [46]. The comprehensive genomic reconstruction of these sediment communities provides a wide spectrum of specific links between metabolic potential and diversity and reveals potential ecological interactions within the communities; it begins to chart the possible flow of carbon and nutrients through the multiple layers of microbial processing, assimilation, and remineralization in the estuarine environment. Cores were stored at 4°C overnight and processed 24 h after sampling, as detailed in Lazar et al. [32]. Each core was sectioned into 2-cm intervals. From each site, one core was subsampled for geochemical analyses and the others were subsampled for DNA extractions. DNA was extracted using the Ultra-Clean Mega Soil DNA Isolation Kit (MoBio, Carlsbad, CA, USA), using 6 g of sediment, and stored at −80°C until use. Sulfate, sulfide, and methane profiles of these cores at the three sites can be seen in Lazar et al. [32]. The SMTZ is located at around 26, 24, and 16 cm in sites 1, 2, and 3 (respectively); therefore, the samples above that depth represent the sulfate-rich zones and those below the methane-rich zones.

Genomic assembly, binning, and annotation
Illumina (HiSeq 2000 PE100) shotgun genomic reads were screened against Illumina artifacts (adapters, DNA spike-ins) (Illumina Inc., San Diego, CA, USA) with a sliding window kmer size of 28 and a step size of 1. Reads with 3 or more N's or with average quality score of less than Q20 and a length <50 bps were removed. Screened reads were trimmed from both ends using a minimum quality cutoff of 5 using Sickle (https, //github.com/ najoshi/sickle). Trimmed, screened, paired-end Illumina reads were assembled using IDBA-UD [47] with the following parameters (-pre_correction -mink 55 -maxk 95 -step 10 -seed_kmer 55). To maximize assembly, reads from different sites were co-assembled.
The shallow assembly was a combination of high-quality reads (474,179,948 with an average read length 148 bp) from sites 2 (8 to 12 cm) and 3 (8 to 10 cm). The SMTZ assembly was generated from a combination of highquality reads (698,574,240, average read length 143 bp and average insert 274 bp) from sites 2 (30 to 32 cm) and 3 (24 to 28 cm). The deep assembly was generated from high-quality reads (378,027,948, average read length 124 bp and average insert 284 bp) of site 1 (52 to 54 cm). Since we were not able to co-assemble all three of the samples from the SMTZ due to computational limits, an additional assembly was generated from the third sample (site 1, 26 to 30 cm) from high-quality 345,710,832 reads (average length of 129 bp and average insert 281 bp). The contigs from this sample were co-binned with the assembly of the other two samples (sites 2 and 3, described above). This resulted in some closely related but unique bins, for example, the Gammaproteobacteria bins SMTZ1-46 (from site 1) and SMTZ-46 (from the site 2 and 3 assembly). Contigs with genes of particular interest were checked for chimeras by looking for dips in coverage within read mappings.
Initial binning of the assembled fragments was done using tetra-nucleotide frequencies signatures using 5 kb fragments of the contigs, as detailed in Dick et al. [48]. ESOM maps were manually delineated and curated based on clusters within the map (as shown in Additional file 1: Figure S1). This binning was enhanced by incorporating coverage signatures for all of the assembled contigs into the ESOM maps [49,50]. Coverage was determined by recruiting reads (from each individual library/sample) to scaffolds by BLASTN (bitscore >75) and then normalized to the number of reads from each library, determining which genomic bin each of the 5 kb sub-portions of the contigs were assigned then assessed the accuracy of the binning. Contigs larger than 15 kb were assigned to the bin where the majority of the 5 kb sub-portions were assigned. The completeness of the genomes within bins was then estimated by counting single-copy genes using CheckM [51]. Some of these bins were then shown to contain multiple (2)(3)(4)(5) closely related genomes based on the presence of multiple copies of single-copy genes. Those bins were further separated by plotting differential coverage of scaffolds between two libraries. Distinct clusters of scaffolds on the coverage plots were manually delineated into new bins. Binning was also manually curated based on GC content, top blast hits, and mate-pairings.
Genes were called and putative function was assigned using the JGI IMG/MER system [52]. The functions of predicted genes (including all those discussed in detail here) were manually curated and revised by comparison of homology to a variety of databases including KEGG, pfam, NCBI, and COG. The CAZy database was used to identify carbohydrate active genes [16]; a subset was selected which has been shown to be involved in specific carbohydrate degradation pathways (Additional file 3: Table S6) [22,53].

Phylogenetic analyses
The concatenated ribosomal protein tree was generated using syntenic genes that have been shown to have undergone limited lateral gene transfer (rpL2, 3, 4, 5, 6, 14, 15, 16, 18, 22, 24 and rpS3, 8, 10, 17, 19) [54]. The reference data sets were derived from the Phylosift database [55], with additional sets from the Joint Genomic Institute IMG database as detailed in Castelle et al. [56]. Scaffolds containing <50% of these 16 genes were not included in the analyses. Only two bins (DG-61 and DG-78) remained below this cutoff. We searched NCBI to include additional closely related reference sequences. Amino acid alignments of the individual genes were generated using MUSCLE [57] and manually curated. The curated alignments were then concatenated for phylogenetic analyses. The ribosomal protein tree included 182 taxa and 2,411 unambiguously aligned positions. The phylogeny shown in all the figures were generated using maximum likelihood using Phyml [58]. Bootstrap values were generated from 100 replicates of UPGMA tree building method and Jukes-Cantor distance modeling.

Availability of supporting data
The full metagenomic assemblies presented in this study are available in IMG with the following IMG Taxon IDs. They are Taxon ID: 3300001855 (sulfate-rich assembly, 'SG' genomes), 3300002052 (SMTZ site 1 assembly, 'SM1' genomes), 3300001753 (SMTZ sites 2 and 3 assembly, 'SM23' genomes), and 3300001854 (methane-rich assembly, 'DG' genomes). The genomic bins supporting the results of this article are being made available in NCBI Genbank under the BioProjectID PRJNA270657.

Additional files
Additional file 1: Supplementary data Table S1. General characteristics of the large fraction (contigs larger than 5 kb) of the four assemblies generated and used for binning, in the study. Table S2. Summary of genome reconstruction completeness, contamination, and strain heterogeneity based on CheckM package [51]. For more detailed information about individual genomic bins, see Table S3. Table S3. Number of phylogenetic proteins as identified using Phylosift in each of the genomic bins. Table S4. General characteristics of all bacterial genomic bins. Figure S1. Tetra-nucleotide ESOM binning map of shallow assembly. Each data point is a 5-kb portion of DNA sequence. Separation of data points are visible as brown lines (background color) which delineates clustering of sequences by the mapping. The manually delineated bins are colored and labeled. Notice that in a few cases (SG8-35, SG8-35-1, SG8-35-2, and SG8-38 and 38-1), multiple closely-related bins fall within one large cluster on this map. These bins were found to contain more than one genome and were therefore further delineated based on differential coverage plots. Figure S2. Abundances of top genotypes in the SMTZ (24 to 32 cm) and methane-rich (44 to 48 cm) layers of the sediment profiles based on the number of reads that map to all the genes for ribosomal protein S3. Those that are represented in the genomic bins are labeled. Figure S3. Phylogenetic tree of concatenated dsrAB genes within bacteria genomic bins from this study. Closed and open circles represent maximum likelihood (ProML, ARB package) bootstrap value >75 and >50, respectively. Figure S4. Phylogenetic tree of WOR proteins annotated at formate-dependent nitrite-reductase (periplasmic cyctochrome c552), NrfA. Cluster of sequences considered to be involved in DNRA are delineated on the right (Welsh et al. [34]). Genomic bin designations are shown in bold.
Additional file 2: Table S5. Table showing the abundance and identification of genes involved in protein degradation in the genomic bins in this study. The pfam and EC numbers used to search the annotations are provided. The number in each cell represents the number of that particular gene family that was identified.
Additional file 3: Table S6. A complete list of the abundance and specific genes in the genomic bins involved in degradation of organic carbon compounds identified by comparison to the CAZy database. The number represents the number of that particular gene that was found in that bin.