Environmental context
We surveyed the microbiomes along a latitudinal transect (73–81° N) of the salinity-stratified waters of the Canada Basin using a combination of metagenomics and metatranscriptomics (Fig. 1 a–b, Table S1). The sampling design targeted distinct water column features, including the relatively fresh surface mixed layer (surface; 5 m and 20 m), the subsurface chlorophyll maximum (SCM; 55–95 m), the FDOMmax associated with colder Pacific-origin water (32.3 and 33.1 PSU, 90–250 m), the warmer Atlantic-origin water (Tmax and AW; 360–1000 m depth), and Arctic bottom water (~3800 m). The warmer Atlantic-origin water and Arctic bottom water are herein collectively referred to as deep waters.
We sought to determine the distribution and composition of OM in the Canada Basin, with a focus on the distribution of tOM. Optical properties of the OM, such as fluorescence, have previously been used to assess the composition of OM in the ocean and differentiate between terrestrial and marine OM sources [46, 47]. We used excitation emission matrix (EEC) fluorescence spectroscopy combined with parallel factor analysis (PARAFAC) to determine the distribution of fluorescent DOM components. In the Canada Basin, seven components (C1-C7) were identified, as previously defined in DeFrancesco and Guéguen [26]. These components corresponded to terrestrially derived humic-like DOM (C1 and C4), amino acid or protein material (C3 and C6), or microbially derived humic-like DOM (C2, C5, and C7) (Fig. 1c). The aromatic-rich C1 was the most abundant component within the FDOMmax samples (25–27%) but also in the whole water column below the surface (20–22% in the SCM and 21–23% in the deep), verifying that a significant fraction of OM is of terrestrial origin. Of the terrestrial components, C4 was the dominant component in the surface (19–30%). The reduced contribution of C1 in the surface is because C1 is more red shifted than C4 indicating a stronger aromatic character and thus enhanced photosensitivity. Overall, these results indicate a strong contribution of a photostable fraction from terrestrial origin in the FDOM of the surface and an aromatic-rich fraction from terrestrial origin in the FDOM of the whole water column below the surface.
Vertical partitioning of metabolic features in metagenomes and metatranscriptomes
We investigated the abundance and distribution of aromatic compound degradation pathways in the Canada Basin microbiomes in relation to tOM availability. To investigate how metabolic pathways were distributed across the Canada Basin, we first performed nonmetric multidimensional scaling (NMDS) analysis on the abundance of enzyme-encoding genes (genes assigned to enzyme commission (EC) numbers) annotated from metagenome or metatranscriptome assemblies. NMDS ordination showed that metagenomes (stress = 0.11) were partitioned into four clusters consisting of samples collected from either the surface, the SCM, FDOMmax, or deep water (Fig. 2a). A similar pattern was observed in the NMDS ordination of metatranscriptomes (stress = 0.0503), although the variation between samples from within the same water column feature was higher than observed in metagenomes (Fig. 2b). In addition, there was less separation between the samples from the FDOMmax and from deeper Atlantic-origin waters in the ordination based on metatranscriptomes compared to metagenomes.
We next determined which enzymatic reactions differentiated the metagenomes across the stratified water column using nonnegative matrix factorization (NMF), which is a tool for extracting meaningful features from high dimensional data [48]. In our analyses, NMF decomposes the matrix of EC number abundances into two matrices. Matrix 1 presents a reduced number of elements that describe the overall similarities of the metagenomes based on EC number composition, while matrix 2 presents the weighted contribution of individual EC numbers on each of the elements in matrix 1. We determined that a decomposition with four elements best represented the overall enzyme composition of metagenomes (Fig. S1). The four elements, herein referred to as sub-metagenomes (Fig. S2), represented the same patterns as observed in the NMDS ordination, corresponding to the surface, SCM, FDOMmax, and deep waters (Fig. 2a).
We then assessed which EC numbers were strongly associated with each of the four sub-metagenomes by calculating an EC index value. This EC index value quantifies the tendency of an EC number to be specific to a single sub-metagenome (EC index values range between −1 and 1). The distribution of EC indices was plotted for each of the four sub-metagenomes. Overall, the means of the EC indices associated with aromatic compound degradation and other metabolic pathways in the four water column features were significantly different (PERMANOVA, F = 89.8, p < 0.0001). Each sub-metagenome has a collection of EC numbers with relatively high indices (> 0.5) (Fig. 2c). However, the most striking observation was that EC numbers involved in aromatic compound degradation were predominantly associated with the FDOMmax sub-metagenome, as demonstrated by the higher index values for EC numbers from aromatic compound degradation pathways than from other metabolic pathways in the FDOMmax (Student t-test, t = 13.26, p < 0.0001). The EC indices for aromatic compound degradation genes were smaller than the EC indices associated with other metabolic processes in the surface (Student t-test, t = 8.89, p < 0.0001) and not significantly different for the SCM (Student t-test, t = 0.369, p = 0.414) and the deep (Student t-test, t = 0.56, p = 0.545) sub-metagenomes.
We performed a similar NMF analysis on EC numbers in the metatranscriptomes (Fig. S1). Similar with the NMF analysis of metagenomes, decomposition resulted in four elements, herein referred to as sub-metatranscriptomes (Fig. S1), corresponding to the surface, SCM, FDOMmax, and deep waters (Fig. S3). For the sub-metatranscriptomes, the means of the EC indices from aromatic compound degradation and from other metabolic pathways in the four water column features were significantly different (PERMANOVA, F = 121, p < 0.0001). For the sub-metatranscriptomes, however, we observed higher indices for the EC numbers from aromatic compound degradation than for EC numbers from other metabolic processes in the SCM (Student t-test, t = 0.0218, p < 0.0001), the FDOMmax (Student t-test, t = 0.0444, p < 0.0001), and the deep waters (Student t-test, t = 0.0611, p < 0.0001) (Fig. 2d).
Aromatic compound degradation genes in global ocean metagenomes
As the humic-rich OM input to the Arctic Ocean is disproportionately high compared to other oceans, we investigated if genes associated with aromatic compound degradation were more abundant in the Canada Basin metagenomes compared to other oceanic metagenomes. As terrestrial OM is a significant contributor of HS to the Arctic Ocean, we restricted our analysis to genes involved in processing aromatic compounds of terrestrial origin. We focused the analysis on a set of 46 pathways previously implicated in degrading aromatic compounds from lignin (Fig. 3a, Table S2). We compared the relative abundance of aromatic compound degradation genes between metagenomes of the Canada Basin water features (surface, SCM, FDOMmax, deep) and metagenomes from both the surface and subsurface waters (SCM, mesopelagic, bathypelagic ) of the Atlantic, Pacific, Indian, and Southern Oceans as well as the Mediterranean Sea and Red Sea (Fig. 3b). The Canada Basin FDOMmax metagenomes contained the highest fraction of aromatic compound degradation genes (3.4%). Aromatic compound degradation genes were identified in other oceanic metagenomes (1.5–2.5% of total protein-coding genes) and the relative abundance of aromatic compound degradation genes increased with water depth. Overall, the mean percentage of aromatic compound degradation genes in the water column features of the Canada Basin were significantly different than in other oceans (PERMANOVA, F = 27.8, p < 0.0001). Specifically, the relative abundance was consistently higher (1.3–1.7 fold) in the microbiomes of the Canada Basin upper water column features compared to microbiomes from other oceanic zones (Student t-test: surface/surface, t = 0.58, p = 0.0013; SCM/SCM, t = 0.92, p = 0.0001; FDOMmax/mesopelagic, t = 0.81, p = 0.0001) (Fig. 3c). However, we did not observe significant differences between the percentage of aromatic compound degradation genes of Arctic deepwater microbiomes and the microbiomes of other oceans deep waters (Student t-test, t = 0.20, p = 0.490) (Fig. 3c).
Distribution of aromatic compound degradation genes and pathways in metagenomes and metatranscriptomes
To elucidate the diversity of aromatic compounds that the Arctic Ocean microbiomes can access as growth substrates, we assessed the diversity and the completeness of aromatic compound degradation pathways in Canada Basin metagenomes. We found evidence for the presence of 44 of the 46 aromatic compound degradation pathways in the metagenomes (Fig. S4). A complete set of genes were identified for over half of these pathways in the metagenomes, irrespective of the water column feature (Fig. S4). Evidence for the 44 pathways was also identified in the metatranscriptomes, including expression of the full complement of genes for 22 pathways (Fig. S4).
To measure the distribution of the aromatic compound degradation pathways through the water column, we used a selection of 39 unique marker genes for the 46 aromatic compound degradation pathways (Table S2). To provide a measure of pathway abundance and expression, we summed the depth of coverage of each marker gene or transcript and corrected for differences in metagenome sequencing effort (Fig. 4). Out of the 39 unique marker genes, 32 were detected in the Canada Basin metagenomes (Fig. 4a, Fig. S5). Generally, the most abundant genes were also most abundant in the metatranscriptomes (Fig. 4 a–b, Fig. S5, Fig. S6). Most of the marker genes were most abundant in the FDOMmax yet were most highly expressed in deep waters (Fig. 4 a–b).
Vanillate monooxygenase (K03862) was the most abundant marker gene within all water column features (20 copies/106 reads in the FDOMmax) for pathways degrading aromatic compounds from terrestrial sources, while 3-O-methylgallate 3,4-dioxygenase (K15065) showed a lower abundance (8 copies/106 reads in the FDOMmax). While vanillate monooxygenase was more abundant in the metatranscriptomes of the FDOMmax, 3-O-methylgallate 3,4-dioxygenase was more abundant in the metatranscriptomes of the deep layers. Both ring fission protocatechuate dioxygenases (K00449 and K04101) were abundant in metagenomes (8 and 6 copies/106 reads in the FDOMmax) and metatranscriptomes (2.5 and 7.5 copies/106 reads in the FDOMmax). Overall, these results show that the Canada Basin microbiomes can fully transform aromatic compounds from terrestrial sources into central carbon metabolism intermediates, with an enhanced capacity in the FDOMmax.
A number of aromatic compounds (e.g., salicylate, 3-hydroxycinnamate, or benzoate) can originate from lignin as well as other sources such as marine phytoplankton. Within the pathways involved in the degradation of aromatic compounds from possible marine or terrestrial origin, benzoate CoA-ligase (K04110), salicylate monooxygenase (K00480), and 3-hydroxycinnamate hydroxylase (K05712) were the most abundant in metagenomes (20, 24, and 17 copies/106 reads, respectively) but not in metatranscriptomes (7.5, 5, and 2 copies/106 reads). The most common funneling pathway for benzoate was through benzoyl-CoA as evidenced by the lower abundance of genes (5 copies/106 reads) and transcripts (2 copies/106 reads) encoding benzoate 1,2-dioxygenase (K05549) compared to benzoate CoA-ligase. Accordingly, the ring-fission benzoyl-CoA 2,3-epoxidase (K15512) was significantly more abundant (22 copies/106 reads) than the ring-fission marker gene catechol 1,2-dioxygenase (K03381) and catechol 2,3-dioxygenase (K00446) (3 and 7 copies/106 reads in the deep and SCM, respectively). However, both benzoyl-CoA 2,3-epoxidase and catechol 2,3-dioxygenase were among the most abundant genes in the metatranscriptomes (20 copies/106 reads) but with maximum abundance in the SCM and the deep, respectively (Fig. 4b). Of the ring-fission pathway marker genes, gentisate 1,2-dioxygenase (K00450) was one of the most abundant in metagenomes (15 copies/106 reads in the FDOMmax) and metatranscriptomes (20 copies/106 reads in the deep).
Taxonomic identity of aromatic compound degradation genes and their distribution across the microbiomes
We estimated the fraction of bacterial genomes harboring each marker gene by comparing the total number of gene variants for select aromatic compound degradation pathway markers to the number of the single-copy universally distributed recA genes (Fig. 4c, Fig. S7). The estimated fraction of bacterial genomes with aromatic compound degradation genes increased with depth, reaching a maximum in the FDOMmax (8–75%, Fig. 4c) and then decreased in the deep water (5–25%). The genes present in the highest fraction of bacterial genomes were involved in the degradation of benzoate through benzoyl-CoA (50% for K04110 and 45% for K15521 in the FDOMmax), gentisate (65% for K00450 in the FDOMmax), vanillate (75% for K03862 in the FDOMmax), and salicylate and 3-hydroxycinnamate (45% for K00480 and 40% for K05712 in the SCM) (Fig. S7). These numbers may be overestimated as they assume only a single gene copy per genome, whereas multiple paralogs may be present in a single genome (continued below).
Taxonomic analysis of aromatic compound degradation marker genes revealed that the number of gene clusters generally increased continuously with depth (Fig. 4d). Surface gene clusters were predominantly affiliated with Rhodobacterales (more than 50% of the gene clusters for K03381, K00446, K00481, and K03862) and unclassified Alphaproteobacteria (up to 50% for K15065 gene clusters), with a significant contribution from Gammaproteobacteria for K05549 (40%) and K15065 (25%) (Fig. 4d). In the SCM and FDOMmax, unclassified Alphaproteobacteria dominated the taxonomic affiliations of aromatic compound degradation gene clusters (10–55%), and Rhodospirillales contributed significantly to all gene clusters (10–30%), except for genes involved in the degradation of methyl gallate (K15065), which was primarily encoded by Chloroflexi (30% in the FDOMmax) (Fig. 4d). We generally observed more gene clusters in the deep than in the FDOMmax (Fig. 4d), while these genes were present in a smaller fraction of the deep communities than the FDOMmax communities (Fig. 4c), suggesting a broader phylogenetic diversity of aromatic compound degradation genes in the deep than in the FDOMmax. This is supported by the large contribution of other taxa (taxa contributing individually to < 5%) in the deep microbiomes. The contribution of taxa such as Rhodospirillales and Rhodobacterales may be underestimated due to the large fraction of Alphaproteobacteria genes that could only be assigned at the class level.
Aromatic compound processing pathways captured in metagenome-assembled genomes
We reconstructed metagenome-assembled genomes (MAGs) from our metagenomic data. We performed metagenomic binning of each of the 22 metagenome assemblies individually to reconstruct a total of 1772 MAGs. After filtering for genomes with greater than 30% completeness and less than 10% contamination, 823 genomes remained (Fig. S8). Thirty-one of the 32 marker genes involved in aromatic compound degradation pathways were identified (only dihydroxyphenylacetate 2,3-dioxygenase — K00455 was not detected) across 59% (482 of 823) of the MAGs (Figs. S9 and S10). The highest percentage of MAGs harboring aromatic compound degradation genes was in the FDOMmax (64%) and SCM (67%) and the lowest percentage in the surface (47%) and deep waters (54%). In general, the taxonomic diversity of MAGs increased with depth. Marker genes were identified in a broad taxonomic diversity of MAGs, including Alphaproteobacteria, Gammaproteobacteria, and Dehalococcoidia (Fig. S9). Alphaproteobacteria were common in the SCM and FDOMmax, while Gammaproteobacteria were common in the surface.
To investigate the ecology of bacterial taxa most implicated in the degradation of aromatic compounds, we further examined MAGs with complete or near-complete aromatic compound degradation pathways. We selected 46 MAGs most enriched in near-complete aromatic compound degradation pathways (see “Methods,” Fig. S10, Table S3). Of the 46 MAGs, 24 were recovered from metagenomes originating from the FDOMmax and 16 from the SCM layers. Thirty-eight of the MAGs were assigned to Alphaproteobacteria (Fig. S11), 3 MAGs belonged to the Dehalococcoida, 4 MAGs to the Gammaproteobacteria, and one MAG to the class Binatia.
Given the large representation of Alphaproteobacteria in the MAGs most implicated in aromatic compounds degradation, we investigated the evolutionary origins and phylogenetic relationships between our 38 Alphaproteobacteria MAGs and reference genomes available from the Genome Taxonomy Database (GTDB) [49] (Table S4). Based on an average nucleotide identity threshold of 95%, our 38 Alphaproteobacteria MAGs belonged to 16 genomospecies, which were phylogenetically associated with 12 clades (Fig. 5a, Fig. S12). The clades were within the Rhodobacterales, Thallassobaculales, Rhodospirillales, Defluviicoccales and five GTDB orders of uncultured Alphaproteobacteria (UBA8366, UBA6615, GCA2731375, UBA2966, UBA828). Each clade was comprised of Canada Basin MAGs as well as a basal branch consisting of genomes of marine origin. These results demonstrate that the Alphaproteobacteria MAGs were phylogenetically distinct but shared recent common ancestry with genomes from other oceanic zones.
To investigate the distribution of the 12 clades represented by the Alphaproteobacteria MAGs across the Arctic water column features, we performed fragment recruitment of both the metagenomic and metatranscriptomic reads against the MAGs representing the 16 genomospecies (Fig. 5b). Overall, the distribution in metagenomes and metatranscriptomes was similar, and all the MAGs were most abundant and active either in the FDOMmax or the SCM (Fig. 5b). We identified four general patterns of distribution across water column features consisting of (1) restriction to the FDOMmax (Defluvii-CB9_331, UBA2966); (2) common to the SCM and FDOMmax (UBA8366, UBA6615 clade, Thalassobaculales clade, Defluviiccocales genomes CB21_SCM.1 and CB4_SCM.1); (3) common in the FDOMmax and deeper waters (UBA828 clade genomes, GCA 2732375 genomes, and Rhodospirillales genomes); and (4) restricted to the SCM (Rhodosp-CB4_SCM and Rhodobact-CB21_SCM). These results show that MAGs with near-complete aromatic compound degradation pathways are strongly associated with and active in HS-rich regions of the water column.
Global ocean distribution of Alphaproteobacteria MAGs from Canada Basin
We sought to determine if the MAGs most implicated in aromatic compound degradation were more broadly distributed beyond the Arctic Ocean. We therefore investigated the distribution of the Alphaproteobacteria MAGs (Table S5) by fragment recruitment against a set of 151 metagenomes broadly representative of the global ocean microbiome (Fig. 6). Of the 16 representative MAGs, two were commonly detected outside of the Arctic Ocean. Rhodosp-CB9_331.2 was identified in the mesopelagic metagenomes from all oceanic regions, but not the Mediterranean Sea or Red Sea. The Rhodobacterales MAG (Rhodobact-CB21_SCM) was also identified in surface water metagenomes, most notably from the Southern Ocean. Although several other MAGs were detected at low frequency in the Southern Ocean (e.g., Rhodosp-CB6-bottom and Defluvii-CB9-331), the majority (12 MAGs, 75% of the MAGs) were not detected outside of the Arctic Ocean. Likewise, the vast majority of the most closely related marine reference genomes were not detected in Canada Basin metagenomes. The exceptions were Planktomarinamarina (Rhodobacterales) and the reference genome within UBA828.
Enhanced aromatic compound degradation capacity in Alphaproteobacteria MAGs restricted to the Canada Basin
We hypothesized that an enhanced capability for aromatic compound degradation may be implicated in the evolutionary adaptation of the Arctic Ocean populations. We therefore compared the abundance and diversity of aromatic compound degradation genes between the Arctic MAGs and the set of their closest relatives. Between 1.5 and 4% of the genes with an EC annotation was annotated with an EC from lignin-derived aromatic compound degradation pathways for both the Arctic MAGs and genomes from other oceans (Fig. 7). However, out of the 16 Arctic Ocean MAGs, 10 possessed a higher diversity of marker genes, and 14 possessed a higher number of marker gene copies than their sister taxa from other oceans (Fig. 7). This is despite the difference in genome completeness estimated for the reference MAGs (64–98%) compared to the Arctic Ocean MAGs (range of 42–95% completeness). Some marker genes were found exclusively in our Arctic Alphaproteobacteria MAGs, including those for the degradation of catechol to 3-oxoadipate, phenol 1, dehydrodivanillate, protocatechuate 1, cinnamate, and resorcinol (Fig. 7). Other marker genes were found exclusively in the reference MAGs, such as genes for the degradation of benzoyl-CoA II, gallate I and III, m-cresol, and γ-resorcylate.
Every single Arctic MAG contained at least one and up to seven marker genes that were not found in its closest related genome from other oceans (Fig. 7). These included gentisate 1,2-dioxygenase (K00450) in Rhodosp-CB4_SCM, Defluvii-CB9_331, UBA2966-CBN3_323, Thalasso-CBN3_331, and Rhodobact-CB21_SCM, as well as vanillate O-demethylase (K03862) in Rhodosp-CB6_Bottom and Rhodosp-CB9_331.1 or benzoate CoA-ligase (K04110) in UBA6615-CBN3_323 and Rhodosp-CB9_331.2. These genes were the most abundant in the metagenomes and metatranscriptomes and most common throughout the Arctic Ocean microbiomes (Fig. 4 a–c). The three MAGs with the highest number of aromatic compound degradation gene copies (Defluvii-CB21_SCM.1, Rhodosp-CB9_331.1, and Defluvii-CB9_331) all possessed more copies of the vanillate O-demethylase (K03862) gene than their sister genomes. In addition, all of their sister genomes lacked the protocatechuate ring-opening genes (K04101 or K00449). A vast majority of the marker genes were also expressed in our Arctic Ocean MAGs (Fig. 7, right panel), and we found only gentisate 1,2-dioxygenase (K00450) slightly expressed in the MarineAlpha10_Bin2 of the reference genomes.