Overview of metagenomes and Chlorobium MAGs
Ace Lake, Ellis Fjord, and Taynaya Bay are herein referred to as AL, EF, and TB, respectively. Biomass was collected by filtration through a 20-μm pre-filter onto large format filters (3, 0.8, and 0.1 μm) for AL and EF, and into Sterivex cartridges (0.22 μm) for TB (see the “Methods” section). The filtered reads from 18 AL (~ 99 Gb), three EF (~48 Gb) and one TB (~ 12 Gb) oxic-anoxic interface metagenomes were used for fragment recruitment (FR) analyses (Additional file 1: Table S1); for these analyses, the AL and EF metagenome reads from the three filter fractions representing a single sample (date and depth) were pooled to form merged metagenomes (see the “Methods” section). The assembled contigs from individual AL (~ 6 Gb), EF (~ 7 Gb) and TB (~ 700 Mb) metagenomes (Additional file 1: Table S1) were used to determine the Chlorobium OTU abundance distribution in the three Vestfold Hills systems, and for viral analyses.
A total of 59 high or medium quality MAGs were analysed, of which 31 AL, five EF, and two TB high-quality MAGs had ≥ 99% genome completeness (Additional file 1: Table S2; Additional file 2: Dataset S1). The MAGs represented 67,265 genes on 1124 Chlorobium contigs, and both 16S rRNA gene and FmoA (Fenna-Matthews-Olson protein; bacteriochlorophyll a protein) sequences were used as phylogenetic markers [42]. For FR analyses the AL_ref MAG (Dec 2014, 19-m depth, 0.1-μm filter) contained 27 contigs and 1797 genes and was 99% complete (1,812,610 bp), and the EF_ref MAG (Oct 2014, 45-m depth, 3-μm filter) contained 32 contigs and 1807 genes and was 99% complete (1,836,564 bp) (Additional file 1: Tables S2 and S3; Additional file 2: Dataset S1).
Chlorobium species present in EF and TB
Chlorobium OTUs were most abundant in EF (45 m) and TB (11 m) at depths where oxic-anoxic interfaces have previously been recorded [8, 29, 31], with a relative abundance (EF, ≤ 49%; TB, 6%) comparable to the range of abundances observed in AL (< 1–84%; Fig. 2) [19]. In TB where Chlorobium had lower relative abundance than EF or AL, the Simpson’s index of diversity was higher (1 − λ′ > 0.9 compared to ≤ 0.7).
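For reference, Simpson's index of diversity (1 − λ′) can be computed from per-OTU counts as sketched below. This is a minimal illustration that assumes the finite-sample estimator λ′ = Σ n_i(n_i − 1) / (N(N − 1)); the function name and the example counts are hypothetical and are not taken from the study data.

```python
def simpson_diversity(counts):
    """Return Simpson's index of diversity (1 - lambda') for per-OTU counts.

    Assumes the finite-sample form lambda' = sum(n_i * (n_i - 1)) / (N * (N - 1)).
    """
    total = sum(counts)
    if total < 2:
        raise ValueError("need at least two observations")
    lam = sum(n * (n - 1) for n in counts) / (total * (total - 1))
    return 1.0 - lam


# Hypothetical communities: one dominated by a single OTU, one perfectly even.
dominated = simpson_diversity([84, 8, 4, 2, 2])
even = simpson_diversity([10] * 10)
```

A community dominated by one OTU yields a low index, whereas an even community yields a value above 0.9, which matches the direction of the contrast described for AL/EF versus TB.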
All 16S rRNA genes from AL, EF, and TB Chlorobium MAGs had identical sequences (1505 bp), as did all FmoA protein sequences (366 aa) (Additional file 1: Fig. S1). The pair-wise, average nucleotide identity (ANI) of all Chlorobium MAGs was ≥ 99.9% over ≥ 92% alignment fraction. FR of AL, EF, and TB metagenome reads to the Chlorobium 16S rRNA gene (EF_ref MAG) revealed a number of SNPs with variant frequency ≥ 0.01 (i.e., at least 1% of the aligned reads contained the SNP) (Additional file 3: Dataset S2). All of these SNPs, except one from the AL Dec 2014 merged metagenome, two from the EF merged metagenome, and four from the TB metagenome, had very low read depth (on average < 5) and could represent sequencing errors (Additional file 3: Dataset S2). In contrast, the read depth of the Chlorobium 16S rRNA gene sequence (lacking SNPs) was > 80 in all AL (except Oct 2014, read depth 31), EF and TB metagenomes, and > 11,000 in some metagenomes (Additional file 3: Dataset S2). These data indicate that the same species of Chlorobium was present in all three Vestfold Hills systems, representing at least 97% of the AL, 97% of the EF, and 98% of the TB Chlorobium populations, and was the only detectable Chlorobium species in AL throughout a seasonal cycle (also see below in “Ca. Chlorobium antarcticum population variation between AL, EF, and TB”).
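The two criteria applied to the 16S rRNA gene SNPs (a variant frequency of at least 0.01, and the read depth supporting the variant) can be expressed as a simple filter. The sketch below is illustrative only: the record layout, the function names, the example calls, and the hard cutoff of five supporting reads are assumptions (the text reports an average depth below five rather than a formal threshold).

```python
from dataclasses import dataclass


@dataclass
class SnpCall:
    position: int     # 1-based position on the reference gene
    alt_reads: int    # reads carrying the variant base
    total_reads: int  # all reads aligned at this position


MIN_FREQUENCY = 0.01  # threshold stated in the text
MIN_ALT_DEPTH = 5     # assumption: support below this is treated as weak


def variant_frequency(call):
    """Fraction of aligned reads that carry the variant."""
    return call.alt_reads / call.total_reads if call.total_reads else 0.0


def classify(call):
    """Label a SNP call by frequency first, then by supporting depth."""
    if variant_frequency(call) < MIN_FREQUENCY:
        return "below_threshold"
    if call.alt_reads < MIN_ALT_DEPTH:
        return "low_depth"  # could represent sequencing error
    return "supported"


# Hypothetical calls, not taken from Dataset S2.
calls = [SnpCall(120, 3, 90), SnpCall(455, 40, 400), SnpCall(900, 50, 11000)]
labels = [classify(c) for c in calls]
```

Checking frequency before depth keeps the two criteria separate, so a variant seen in many reads at a very deeply covered position is still excluded when it falls under 1% of the aligned reads.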
IMG (Integrated Microbial Genomes) taxonomy denoted all MAGs as most closely related to Chlorobium phaeovibrioides DSM 265 (herein referred to as Cpv-DSM265). The 16S rRNA gene identity (99%; 17 nt mismatches; Additional file 1: Fig. S1a), FmoA protein identity (98%; six aa mutations; Additional file 1: Fig. S1b), ANI (85% over 80–86% alignment fraction), and average amino acid identity (AAI; 89%) distinguish the Antarctic species from Cpv-DSM265, and these differences are reflected in 16S rRNA gene and FmoA protein trees (Fig. 3) (also see below in “Comparison of Ca. Chlorobium antarcticum to Cpv-DSM265 and global representation”). In view of the genomic and phylogenetic differences we name the Antarctic species, Candidatus Chlorobium antarcticum sp. nov. (from ant.arc'ti.cum. L. neut. adj. antarcticum southern, Antarctic) (type MAG AL_ref MAG = 3300023061_2; 99% complete; 0.55% contamination) (Additional file 1: Table S2; Additional file 2: Dataset S1).
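The rounded 16S rRNA gene and FmoA identities quoted above can be reproduced from the mismatch counts. The short calculation below assumes that the alignments are ungapped and span the full reported sequence lengths (1505 bp and 366 aa); that assumption and the helper name are illustrative only.

```python
def percent_identity(mismatches, aligned_length):
    """Percent identity for an ungapped alignment of the given length."""
    return 100.0 * (aligned_length - mismatches) / aligned_length


# Mismatch counts and lengths as reported in the text.
rrna_identity = percent_identity(17, 1505)  # 16S rRNA gene, nucleotides
fmoa_identity = percent_identity(6, 366)    # FmoA protein, amino acids
```

Under these assumptions the values round to 99% and 98%, respectively.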
Ca. Chlorobium antarcticum population variation within AL
Aligning AL metagenome filtered-reads to the AL_ref MAG to identify SNPs determined that no fixed mutations (variant frequency ≥ 0.9) were present. However, seven LCRs were identified (Fig. 4; Additional file 1: Table S4). The LCRs encoded cell wall modification, cell defence, transport, DNA repair, protein modification, metabolism, mobile element, and hypothetical genes (Additional file 1: Tables S4, S5, and S6). Metabolic genes included: (i) a cluster of nine genes representing the N-type rotary ATPase (N-ATPase) operon (atpD, atpC, atpQ, atpR, atpB, atpE, atpF, atpA, atpG), which codes for ATPase subunits involved in ATP-dependent efflux of Na+ or H+ ions; (ii) a cluster of eight single-copy genes involved in the anaerobic pathway for cobalamin biosynthesis (cbiD, cbiJ, cbiL, cbiK, cysG, and bifunctional cbiFG, cbiET, cbiHC), plus a single copy gene involved in cobinamide salvaging (cbiZ); (iii) a gene cluster containing one cobaltochelatase (cobN) and three magnesium chelatase (bchD, bchH, bchI) genes; (iv) TonB-dependent and ABC transporter proteins involved in the import of iron, cobalt, and cobalamin across the outer membrane and inner membrane, respectively; (v) a gene cluster for export of proteases (Additional file 1: Tables S5 and S6).
A seasonal pattern was observed, with the proportion of the Ca. Chlorobium antarcticum population that possessed the genes within LCRs tending to be higher in summer than in winter or spring (Additional file 1: Tables S4, S5, and S6), most notably for genes associated with cobalamin synthesis and transport (also see below in “Population structure of cobalamin biosynthesis and transport genes”).
The range of transport genes present in the LCRs of Ca. Chlorobium antarcticum MAGs is indicative of the population supporting a diversity of transport abilities (Additional file 1: Tables S4 and S5). For example, protease export systems with similarity to Pseudomonas aeruginosa AprDEF were present in at least 28% of the Ca. Chlorobium antarcticum population, and abundance did not vary with season (Group 7 in Additional file 1: Table S5). For GSB, iron is an essential trace element required for the photosynthetic reaction centre [16]. The concentration of iron in AL increases with depth, being ~ 1 μM at the oxic-anoxic interface [30, 43]. TonB-dependent transporter and ABC transporter genes enable the uptake of both inorganic iron and organic forms of iron (siderophores, hemoproteins) [44]. All Ca. Chlorobium antarcticum MAGs contained two sets of ferrous iron transporter genes (feoABC and feoAB), and three TonB-dependent transporter genes potentially involved in iron complex import across the outer membrane. However, the ABC transporters associated with the uptake of iron complexes were only identified in LCRs (Groups 1 and 2 in Additional file 1: Table S5), indicating an augmented capacity for these phylotypes to source exogenous iron (at least 56% of the Ca. Chlorobium antarcticum population).
An N-ATPase operon (atpDCQRBEFAG) was present in at least 61% of the Ca. Chlorobium antarcticum population, with abundance varying only marginally by season (Group 8 in Additional file 1: Table S5); in addition, F0F1 ATP synthase genes were present throughout the Ca. Chlorobium antarcticum population. N-ATPases utilize ATP to actively transport Na+ or H+ ions out of the bacterial cell [45,46,47]. The Ca. Chlorobium antarcticum ATPase subunit c amino acid sequence included the two glutamate residues in both of its C- and N-terminal helices that are diagnostic of Na+-binding [45,46,47], indicating it functions in Na+ export. N-ATPase genes have been identified in some Chlorobi, including Chlorobaculum parvum, Chlorobaculum tepidum (partial locus only), Pelodictyon luteolum, and Prosthecochloris aestuarii [48, 49].
Ca. Chlorobium antarcticum population variation between AL, EF, and TB
Similar to the analysis of SNPs within the AL population, no fixed SNPs were observed for EF metagenome reads against the EF_ref MAG. However, from 1807 genes in the EF_ref MAG, SNPs were identified in 68 genes only from AL, two only from TB, and 19 genes from both AL and TB (Fig. 5; Additional file 1: Table S7). Most SNPs occurred in genes involved in intracellular functions, with a smaller proportion in cell wall modification, substrate transport, and membrane protein genes. SNPs were present in regions of the EF_ref MAG that had even FR coverage, except for those in a hypothetical gene (contig E1, Additional file 1: Table S7), a precorrin-3B methylase/precorrin isomerase gene (contig E15, Additional file 1: Table S7), and a gene for a receptor for the TonB-dependent uptake of iron-containing proteins (contig E17, Additional file 1: Table S7). This indicated that the AL and TB SNPs tended to occur within all Ca. Chlorobium antarcticum subpopulations, and were therefore characteristic of each system.
A total of 12 LCRs were identified from FR of AL, EF and TB metagenome reads to the EF_ref MAG (Fig. 5; Additional file 1: Table S4). Notably, five AL LCRs identified against the AL_ref MAG were also LCRs from FR of AL, EF, and TB reads to the EF_ref MAG (Additional file 1: Table S4) indicating that the main (detectable) Ca. Chlorobium antarcticum phylotypes existed in all three Vestfold Hills systems. The LCRs encoded cell wall modification, cell defence, transport, DNA repair, protein modification, Na+ or H+ ion efflux, anaerobic cobalamin biosynthesis, cobinamide salvaging, and cobalt/magnesium chelatase genes, similar to the gene functions of the AL_ref MAG LCRs. LCRs specific to the EF_ref MAG included cell wall modification, general function, and hypothetical genes.
To assess gene order of phylotypes, the contigs of AL, EF, and TB MAGs were aligned to AL_ref MAG (Additional file 3: Dataset S2). Most of the AL_ref MAG contigs that did not align to the contigs of the other MAGs were from AL_ref MAG LCRs, consistent with gene order varying in Ca. Chlorobium antarcticum phylotypes.
While the main phylotypes were shared amongst systems, some LCRs (e.g., contigs E29–E32) had very low read depth (≤ 2%) in all three systems (Additional file 1: Table S4), indicating that the genetic capacity represented by these contigs was rare within the overall Ca. Chlorobium antarcticum population. The relative coverage of some LCRs also varied considerably between systems, indicative of different population structures for these specific genes (Fig. 6; Additional file 1: Table S4). For example, the 11-kb contig E1 represented 3% of the EF Ca. Chlorobium antarcticum population but 69% of the TB Ca. Chlorobium antarcticum population. Based on relative coverage, phylotypes represented by LCRs contributed more to the TB Ca. Chlorobium antarcticum population than to the AL or EF populations (Fig. 6; Additional file 1: Table S4). However, EF_ref MAG SNPs were more prevalent for AL than TB, indicating that SNP-based variation was more similar between EF and TB Ca. Chlorobium antarcticum populations than either was to the AL population. The apparent differences in contribution of LCRs and SNPs to the Ca. Chlorobium antarcticum population from each system may reflect the cellular mechanisms involved in generating variation (e.g., DNA repair) and/or environmental effects (e.g., selective forces), and determining the causes will require further investigation (also see Additional file 1: Supplementary text).
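The percentages quoted for LCRs (e.g., 3% versus 69% for contig E1) are relative coverage values. One plausible way to derive such a value is to divide the mean read depth across the region by the mean read depth across the remainder of the reference MAG; the sketch below assumes that definition, which may differ from the exact procedure described in the “Methods” section, and uses hypothetical depth values.

```python
from statistics import mean


def relative_coverage(region_depths, background_depths):
    """Percent of the population estimated to carry a region.

    Assumption: relative coverage = mean depth of the region divided by the
    mean depth of the rest of the reference, expressed as a percentage.
    """
    background = mean(background_depths)
    if background == 0:
        raise ValueError("background depth is zero")
    return 100.0 * mean(region_depths) / background


# Hypothetical per-window read depths, not taken from Table S4.
background = [200, 210, 190, 200]
rare_region = [5, 7, 6, 6]
common_region = [140, 136, 138, 138]

rare_pct = relative_coverage(rare_region, background)
common_pct = relative_coverage(common_region, background)
```

With these example depths, the rare region corresponds to 3% of the population and the common region to 69%.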
To determine if phylotypes from AL, EF, or TB existed with greater sequence divergence than the FR matching criteria permitted (≥ 95% identity), G + C content of metagenome contigs was plotted against read depth and the taxonomy of contig clusters assigned (Additional file 1: Fig. S2); this approach was previously used to identify phylotypes of Antarctic haloarchaea with significantly different genomes to known species [38]. The contigs in the main cluster were from Ca. Chlorobium antarcticum (Additional file 1: Fig. S2). Aside from a number of contigs from some smaller clusters (see the “Methods” section), none of the OTUs of small clusters represented Ca. Chlorobium antarcticum, indicating that phylotypes with more divergence than the cutoffs used for assigning LCRs were not detectable in the metagenome data.
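The G + C versus read-depth approach pairs each contig's base composition with its coverage so that contigs from the same genome cluster together. A minimal sketch of how the two coordinates might be assembled is given below; the contig names, sequences, and depths are hypothetical, and clustering and taxonomic assignment are not shown.

```python
def gc_fraction(sequence):
    """Fraction of G and C among unambiguous bases (A, C, G, T)."""
    seq = sequence.upper()
    counted = sum(seq.count(base) for base in "ACGT")
    if counted == 0:
        return 0.0
    return (seq.count("G") + seq.count("C")) / counted


# Hypothetical contigs: name -> (sequence, mean read depth).
contigs = {
    "contig_a": ("ATGCGCGCTA", 150.0),
    "contig_b": ("ATATATGCAT", 12.0),
}

# One (G + C fraction, read depth) point per contig, ready for plotting.
points = {name: (gc_fraction(seq), depth) for name, (seq, depth) in contigs.items()}
```

Ambiguous bases such as N are excluded from the denominator so that poorly resolved contigs are not shifted towards low G + C content.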
Collectively, the high ANI/AAI between MAGs (see above in “Chlorobium species present in EF and TB”), the small extent of variation represented by SNPs and LCRs, and the taxonomic findings of the analysis of GC/read-depth clusters, illustrate that the Ca. Chlorobium antarcticum population has remarkably little genomic variation.
Comparison of Ca. Chlorobium antarcticum to Cpv-DSM265 and global representation
The AL, EF, and TB contigs had overall low nucleotide identity (< 90%) when aligned to the Cpv-DSM265 genome, with many gaps and differences in gene content (Fig. 7). As described previously, Ca. Chlorobium antarcticum is green rather than brown in colour (unlike Cpv-DSM265); as well as possessing the biosynthetic pathway for chlorobactene (found in green-coloured GSB), Ca. Chlorobium antarcticum lacks the capacities to synthesize bacteriochlorophyll e and isorenieratene, both found in Cpv-DSM265 and other brown-coloured GSB [19].
Many of the Cpv-DSM265 genes that caused the alignment gaps were associated with transposases and hypothetical genes (Additional file 3: Dataset S2). However, some were genes involved in thiosulphate oxidation (sox gene cluster containing soxA, soxB, soxX, soxY, soxZ), assimilatory sulphate reduction (cysC, cysD, cysN), and pilus assembly, none of which were present in the Ca. Chlorobium antarcticum MAGs. GSB do not tend to have a genomic capacity to perform assimilatory sulphate reduction [50], and it has been speculated that Cpv-DSM265 acquired the sox gene cluster on a mobile element from another member of the Chlorobiaceae family that originated in Proteobacteria [51]. Ca. Chlorobium antarcticum is therefore predicted to not be able to assimilate sulphate or to oxidise thiosulphate.
A number of Ca. Chlorobium antarcticum contigs did not align to the Cpv-DSM265 genome (Additional file 3: Dataset S2). These contigs contained anaerobic cobalamin biosynthesis, cobalt transport, cobalamin transport, cobalt/magnesium chelatase, and N-ATPase genes, all of which were absent from the Cpv-DSM265 genome. While cobalamin transport and magnesium chelatase genes were present in all Ca. Chlorobium antarcticum MAGs, all of the contigs that did not align with the Cpv-DSM265 genome represented LCRs of the AL_ref MAG and EF_ref MAG (Additional file 1: Tables S4, S5, and S6). It is therefore possible that Cpv-DSM265 represents a phylotype that lacks these genetic loci, or that the loci represent functions that are of particular importance to the Antarctic Ca. Chlorobium antarcticum population (also see below in “Population structure of cobalamin biosynthesis and transport genes”).
The Ca. Chlorobium antarcticum MAGs encoded multiple glycosyltransferase genes involved in cell wall biosynthesis that were not identified in the Cpv-DSM265 genome; the glycosyltransferases were represented throughout the Ca. Chlorobium antarcticum population, with only a few in LCRs (Additional file 1: Table S4), and are therefore characteristic of this Antarctic species. The glycosyltransferases may fulfil roles in cold adaptation through their function in biosynthesis and modification of cell walls [13, 52]. RNA helicases present in LCRs may also fulfil roles in cold adaptation through a potential functional capacity to unravel RNA secondary structures and influence rates of protein synthesis [53, 54]. The CRISPR-Cas defence systems [55] varied between the two Chlorobium species with Ca. Chlorobium antarcticum containing subtype I-E and Cpv-DSM265 containing subtype I-C (also see below in “Ca. Chlorobium antarcticum-virus interactions”). These genomic differences underscore specific metabolic and defence capabilities of the two Chlorobium species.
The global representation of Ca. Chlorobium antarcticum was assessed by matching the Ca. Chlorobium antarcticum 16S rRNA gene to all 16S rRNA genes from public metagenomes and genomes and the Ca. Chlorobium antarcticum FmoA protein sequence to all proteins from genomes (including MAGs and single-cell genomes) in IMG. All metagenome and genome matches were ≤ 99% 16S rRNA gene identity, and with the exception of Cpv-DSM265 (98% identity), all FmoA sequences had < 98% identity (Additional file 4: Dataset S3). The inability to identify Ca. Chlorobium antarcticum outside of Antarctica was in marked contrast to its representation in data from the three Vestfold Hills systems.
Population structure of cobalamin biosynthesis and transport genes
Cobalamin and cobamide analogues are cofactors that function in a variety of metabolic processes, and although most bacteria contain cobamide-dependent enzymes, most are incapable of synthesizing the cofactors and need to source them from the environment [56, 57]. Cobalamin is an organometallic compound containing a central corrin ring with chelated cobalt. The biologically active form of cobalamin, adenosylcobalamin, can be synthesized by an aerobic or anaerobic pathway, with part of the pathway shared by both (Additional file 1: Fig. S3).
All the genes in the anaerobic pathway for cobalamin biosynthesis have been reported for Chlorobaculum tepidum [4]. However, a comparative genomics assessment of 11,000 bacterial species did not identify all cobamide biosynthesis genes in the 10 Chlorobi that were examined, including Cpv-DSM265, and categorized them as cobinamide salvagers [57]. We determined that Ca. Chlorobium antarcticum encodes the anaerobic pathway, with the genes exclusive to the anaerobic pathway (green-coloured branch between precorrin-2 and cob(II)yrinate a,c-diamide in Fig. 8) located in a LCR. At least 29% of the AL Ca. Chlorobium antarcticum population from all time periods, and 8% and 72% of the EF and TB Ca. Chlorobium antarcticum populations, respectively, possessed the genes, although coverage was about 2-fold higher in AL in summer compared to winter (Additional file 1: Tables S4 and S6).
The anaerobic synthesis of 5,6-dimethylbenzimidazole (DMB), the lower axial ligand of adenosylcobalamin, involves enzymes from the bzaABCDE operon acting on 5-amino-1-(5-phospho-β-D-ribosyl)imidazole as substrate [60]. While the Ca. Chlorobium antarcticum MAGs did not possess bzaABCDE or cobC, they did encode the DMB activation and utilization genes (cobT, cobS). This indicates that, similar to some other bacteria [68, 69], Ca. Chlorobium antarcticum may have a capacity to remodel exogenous DMB to produce cobalamin. The gene cobC can perform the final step in adenosylcobalamin synthesis, but Ca. Chlorobium antarcticum MAGs lacked this gene and may instead utilize alternative genes, cblZ or cblXY, which have been proposed to function in Actinobacteria and some Alphaproteobacteria, respectively [61].
The Ca. Chlorobium antarcticum LCRs also contained a colocalized cluster of genes annotated as cobaltochelatase subunit CobN and magnesium chelatase subunits BchH, BchI and BchD (Additional file 1: Table S6). CobN forms a complex with cobaltochelatase subunits CobS and CobT (which were not identified in the MAGs) and catalyses cobalt insertion during aerobic cobalamin biosynthesis [70, 71], and BchH, BchI and BchD can function in magnesium insertion during bacteriochlorophyll biosynthesis [72]. However, sequence similarity exists between cobaltochelatase NST and magnesium chelatase HID [73, 74] and it has been speculated that BchI and BchD may function as CobS and CobT to form a functional cobaltochelatase complex [61]. In Ca. Chlorobium antarcticum, these cobalt/magnesium chelatase genes were colocalized with potential cobalamin transport genes (LCR5 in Additional file 1: Table S4; Groups 4 and 5 in Additional file 1: Table S5) and therefore may function in cobalamin biosynthesis. In support of this inference, it was speculated that the colocalization of cobalt/magnesium chelatases beside a TonB-dependent receptor protein for cobalamin in Chlorobaculum tepidum may pertain to cobalt being inserted into exogenously acquired cobalamin [4]. Moreover, additional magnesium chelatase genes, including three coding for BchH and one each for BchI and BchD, were present throughout the Ca. Chlorobium antarcticum population which likely function in bacteriochlorophyll synthesis rather than cobalamin production. Most GSB contain three homologues of BchH, denoted BchH, BchS, and BchT [75], which have been reported to be active magnesium chelatases that exhibit differences in their enzymatic properties [76].
Cobalamin biosynthesis genes can be colocalized with the cobalt transporter genes cbiMNQO [61, 62], and this was the case in Ca. Chlorobium antarcticum (LCR5 in Additional file 1: Table S4). Cobalt is relatively concentrated in AL, with ~6 nM at the oxic-anoxic interface which is ~ 300-times the concentration in sea water [30, 43]. The cbiMNQO gene cluster was present in a LCR (Group 6 in Additional file 1: Table S5) with the genes present in at least 41% of the Ca. Chlorobium antarcticum population from all time periods, although an approximately 1.5-fold higher coverage occurred in summer compared to winter; the minimum abundance (~ 30%) and seasonal change (~ 2-fold higher in summer) are similar to the phylotypes containing the cobalamin biosynthesis genes.
The Ca. Chlorobium antarcticum MAGs contained cobA, cobP/cobU, and cbiZ, representing all the genes known in bacteria and archaea to be involved in salvaging cobinamide [63,64,65,66]. cbiZ can also function in salvaging pseudocobalamin, and cbiZ was the only gene located in a LCR (Fig. 8; Additional file 1: Table S6). These data indicate that the whole lake population of Ca. Chlorobium antarcticum was likely adept at converting cobinamide into intermediates of cobalamin biosynthesis, and a subpopulation (at least 8% from all time periods) had the capacity to also salvage pseudocobalamin. The coverage of cbiZ was about 2-fold higher in summer, matching the seasonal abundance pattern of cobalt transporter and cobalamin biosynthesis genes (Additional file 1: Tables S5 and S6).
In Ca. Chlorobium antarcticum MAGs, the cbiZ and cobalamin transporter genes were colocalized (LCR5 in Additional file 1: Table S4), as is the case in many bacteria, including Chlorobium [65]. It has been speculated that Rhodobacter sphaeroides may use cobalamin transporters to scavenge pseudocobalamin produced by cyanobacteria and convert it to cobalamin precursors using CbiZ [65, 66, 77,78,79,80]. AL supports a high abundance of Synechococcus that blooms in summer close to the oxic-anoxic interface [19, 81], indicating that it may be the source of pseudocobalamin that is imported and converted to cobalamin precursors by CbiZ.
The uptake of cobalamin itself requires TonB-dependent transport (BtuB) through the outer membrane and ABC transporters (e.g., BtuCDF) or energy-coupling factor (CbrT) through the inner membrane [82,83,84]. Ca. Chlorobium antarcticum contained two putative btuB TonB-dependent transporter genes, plus a set of ABC transporter genes (btuC, permease; btuD, ATP-binding; btuF, substrate-binding) throughout the population. Additional putative btuB and btuCDF genes were also present in LCRs (Groups 3, 4, and 5 in Additional file 1: Table S5) in at least 7% of the Ca. Chlorobium antarcticum population across all time periods, although the abundance was 2–3-fold higher in summer compared to winter (Groups 3, 4, and 5 in Additional file 1: Table S5).
The biosynthesis and transport of cobalamin has been shown to be regulated by cobalamin-binding riboswitches that are present in the 5′-untranslated region of genes, including btuB (cobalamin transporter), metE (5-methyltetrahydropteroyltriglutamate homocysteine methyltransferase), and nrdD (ribonucleoside-triphosphate reductase) [85,86,87,88,89,90,91,92,93]. A total of six cobalamin riboswitch sequences were identified in LCRs of Ca. Chlorobium antarcticum, one each upstream of btuB and btuF (both cobalamin transporters), metE, nrdD, and at the end of two contigs (Fig. 4b; Additional file 1: Table S6). Three additional cobalamin riboswitch sequences were identified throughout the Ca. Chlorobium antarcticum population, one each upstream of two btuB genes, and a hypothetical protein-coding gene. In Chlorobi, the genes with cobalamin riboswitch sequences are mainly translationally regulated; regulation has been shown to involve inhibition of translation initiation, where cobalamin (in the form of adenosylcobalamin) binds to the riboswitch RNA sequence of the regulated mRNA, leading to a perturbed mRNA structure that inhibits ribosome binding and subsequent translation [88, 89, 91].
Overall, the phylotype data for cobalamin-related biosynthesis, salvaging, and transport indicate that all of the Ca. Chlorobium antarcticum population is capable of importing cobalamin (Additional file 1: Tables S4, S5, and S6), although the proportion of the population with additional cobalamin transport genes varies with the system: EF, 7%; AL, 7% increasing to 25% in summer; TB, 78% (Additional file 1: Tables S4 and S5). Certain phylotypes are also capable of importing and salvaging cobinamide and pseudocobalamin, with this capacity also increasing in summer in AL.
Ca. Chlorobium antarcticum-virus interactions
The subtype I-E CRISPR-Cas system in Ca. Chlorobium antarcticum contained the core cas genes casA (or cse1) and casB (or cse2) with genes arranged cas3, casA, casB, casE, casC, casD, cas1, cas2, followed by a CRISPR spacer array, indicating the system could be functional. Analysis of NCBI gene annotation data showed CRISPR-Cas systems to be common in GSB, the subtypes to vary, and some species to contain multiple subtypes (Additional file 1: Table S8). No genes associated with BREX (bacteriophage exclusion) or DISARM (defence island system associated with restriction-modification) systems were identified. However, type I R-M (restriction-modification) methyltransferase and endonuclease and two type IV R-M endonuclease genes were identified (Additional file 1: Table S9), with the type I R-M genes present in a LCR (Additional file 1: Table S4). Additionally, five genes associated with toxin-antitoxin (T-A) systems (parD, parE, relF, brnA, abiEi) were identified in Ca. Chlorobium antarcticum (Additional file 1: Table S9), with brnA in a LCR (Additional file 1: Table S4). The most likely system to contribute to the control of viral propagation is the AbiE type IV T-A system, an ABI (abortive infection) system that causes cell dormancy and prevents viral dissemination [94], but it is unclear if this system was functional as the antitoxin gene (abiEi) was identified but not the toxin gene (abiEii).
Potential Ca. Chlorobium antarcticum viruses were identified by aligning the Ca. Chlorobium antarcticum CRISPR-Cas spacers to an Antarctic virus catalogue, and a spacer database was used to identify additional potential hosts of the viruses (see the “Methods” section) [19]. A total of 79 CRISPR spacers from EF Ca. Chlorobium antarcticum MAGs (Additional file 1: Table S10) mapped to potential viruses. Eight viral contigs had 97% identity to spacer Spc230 (Additional file 1: Table S11). The viral contigs were from AL metagenomes and belonged to viral cluster cl_248, a previously identified potential AL Chlorobium virus [19]. No EF Ca. Chlorobium antarcticum spacers were mapped to EF viral contigs, which likely reflects the smaller size of the EF metagenome dataset compared to AL which resulted in 6,104 EF viral contigs compared to 30,897 AL viral contigs in the Antarctic virus catalogue.
As the TB metagenomes were not available when the Antarctic virus catalogue and IMG/VR spacer database were constructed [19], a slightly different approach was used to identify viral contigs matching to spacers in TB Ca. Chlorobium antarcticum MAGs (see the “Methods” section). A total of 58 TB Ca. Chlorobium antarcticum spacers were aligned against the Antarctic virus catalogue, resulting in nine spacers (Spc236, Spc238, Spc241, Spc243–Spc245, Spc249, Spc251, Spc252; Additional file 1: Table S10) matching to 23 viral contigs with ≥ 97% identity. Eighteen of the viral contigs were from AL metagenomes and belonged to viral cluster cl_1024 (14) and viral singletons sg_10581 (1), sg_14551 (1), sg_14796 (1), and sg_14959 (1); cl_1024 was previously identified as a potential AL Chlorobium virus [19]. The remaining five viral contigs were from hypersaline Antarctic systems, Deep Lake and Rauer 13 Lake [41], and belonged to cl_9176 (1), sg_1370 (1), sg_1648 (1), sg_1649 (1), and sg_1677 (1). Similar to EF, no TB Ca. Chlorobium antarcticum spacers mapped to the 995 available TB viral contigs, likely reflecting the size of the metagenome dataset. It is noteworthy that the AL Ca. Chlorobium antarcticum spacers themselves had ≥ 97% identity matches to viral contigs from AL as well as Deep Lake, Club Lake, Organic Lake, and some Rauer Island lakes (Rauer 2, 3, 5, 6, 11, and 13 lakes) (Fig. 9; Additional file 1: Table S11).
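The spacer-to-virus matching described above relies on an identity cutoff (≥ 97%) between a CRISPR spacer and a viral contig. The toy example below illustrates the cutoff with an ungapped sliding-window comparison on both strands; it is a simplification and an assumption on our part, since the actual alignment tool and parameters are those given in the “Methods” section, and all sequences and names here are hypothetical.

```python
MIN_IDENTITY = 97.0  # percent identity cutoff stated in the text


def reverse_complement(seq):
    """Reverse complement of an uppercase DNA string (A, C, G, T only)."""
    return seq.translate(str.maketrans("ACGT", "TGCA"))[::-1]


def best_identity(spacer, contig):
    """Best ungapped percent identity of a spacer to any contig window,
    checking both strands. Returns 0.0 if the contig is shorter than the spacer."""
    best = 0.0
    k = len(spacer)
    for query in (spacer, reverse_complement(spacer)):
        for start in range(len(contig) - k + 1):
            window = contig[start:start + k]
            matches = sum(a == b for a, b in zip(query, window))
            best = max(best, 100.0 * matches / k)
    return best


# Hypothetical 40-nt spacer and contigs.
spacer = "ACGTTGCAAGGCTTACGATCGATTAGCCATGGACCTGATA"
one_mismatch = spacer[:20] + "A" + spacer[21:]       # G -> A at index 20
viral_contig = "TTTT" + one_mismatch + "CCCC"        # protospacer with one mismatch
unrelated_contig = "A" * 60

hit = best_identity(spacer, viral_contig)
miss = best_identity(spacer, unrelated_contig)
is_match = hit >= MIN_IDENTITY
```

For a 40-nt spacer, a single mismatch still satisfies the cutoff (39/40 = 97.5%), whereas two mismatches (95%) would not.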
The viral contigs representing potential EF and TB Ca. Chlorobium antarcticum viruses were matched (100% identity) to host spacers, identifying potential hosts to be primarily Gammaproteobacteria and Chlorobi (including Chlorobium OTUs from the Vestfold Hills), plus Actinobacteria, Bacteroidetes, Firmicutes, Betaproteobacteria, Deltaproteobacteria, and Verrucomicrobia (Additional file 1: Table S12). These host assignments were similar to previous findings for AL Chlorobium viruses [19] and point to Ca. Chlorobium antarcticum viruses from all three systems belonging to similar viral clusters (e.g., cl_1024 and cl_248). This host analysis indicates that the viruses likely prey on a wide variety of hosts spanning several different bacterial genera, and may therefore be considered generalist rather than specialist viruses [95,96,97].
The predicted Ca. Chlorobium antarcticum viruses also appeared to be widely distributed with spacer matches to viral contigs from hypersaline systems enriched in haloarchaea (Deep Lake, Club Lake, Rauer 3, 6, and 13 lakes) and diverse bacterial taxa (Organic Lake, Rauer 2, 5, and 11 lakes) (Fig. 9; Additional file 1: Table S11). Chlorobium has not been reported in these lake systems, and the microbial communities in Deep Lake [38, 41] and Organic Lake [98, 99] in particular, have been intensively studied. In contrast, the other potential hosts, notably Gammaproteobacteria, are prevalent in Organic Lake [98, 99] and have been identified in some of the other lakes [38, 41], further reinforcing that the potential Ca. Chlorobium antarcticum viruses have characteristics of generalist viruses infecting a broad host range [95,96,97].