Remarkably coherent population structure for a dominant Antarctic Chlorobium species

In Antarctica, summer sunlight enables phototrophic microorganisms to drive primary production, thereby “feeding” ecosystems to enable their persistence through the long, dark winter months. In Ace Lake, a stratified marine-derived system in the Vestfold Hills of East Antarctica, a Chlorobium species of green sulphur bacteria (GSB) is the dominant phototroph, although its seasonal abundance changes more than 100-fold. Here, we analysed 413 Gb of Antarctic metagenome data including 59 Chlorobium metagenome-assembled genomes (MAGs) from Ace Lake and nearby stratified marine basins to determine how genome variation and population structure across a 7-year period impacted ecosystem function. A single species, Candidatus Chlorobium antarcticum (most similar to Chlorobium phaeovibrioides DSM265), prevails in all three aquatic systems and harbours very little genomic variation (≥ 99% average nucleotide identity). A notable feature of variation that did exist related to the genomic capacity to biosynthesize cobalamin. The abundance of phylotypes with this capacity changed seasonally ~2-fold, consistent with the population balancing the value of a bolstered photosynthetic capacity in summer against an energetic cost in winter. The very high GSB concentration (> 10^8 cells ml−1 in Ace Lake) and seasonal cycle of cell lysis likely make Ca. Chlorobium antarcticum a major provider of cobalamin to the food web. Analysis of Ca. Chlorobium antarcticum viruses revealed the species to be infected by generalist (rather than specialist) viruses with a broad host range (e.g., infecting Gammaproteobacteria) that were present in diverse Antarctic lakes. The marked seasonal decrease in Ca. Chlorobium antarcticum abundance may restrict specialist viruses from establishing effective lifecycles, whereas generalist viruses may augment their proliferation using other hosts. The factors shaping Antarctic microbial communities are gradually being defined.
In addition to the cold, the annual variation in sunlight hours dictates which phototrophic species can grow and the extent to which they contribute to ecosystem processes. The Chlorobium population studied was inferred to provide cobalamin, in addition to carbon, nitrogen, hydrogen, and sulphur cycling, as critical ecosystem services. The specific Antarctic environmental factors and major ecosystem benefits afforded by this GSB likely explain why such a coherent population structure has developed in this Chlorobium species.

In Antarctica, summer sunlight can persist for 24 hours and deliver intense photosynthetically active radiation to drive primary production by phototrophic communities of phytoplankton; levels as high as 1225 μE m−2 s−1 have been recorded [27,28]. While photosynthetic algae are known to play key phototrophic roles in the Southern Ocean, as do cyanobacteria in continental aquatic systems, comparatively little is known about Antarctic GSB [7,9,19,27]. The most well-characterized Antarctic GSB are Chlorobium from Ace Lake [13,14,19]. Ace Lake is one of many meromictic (stratified) lakes within the Vestfold Hills, East Antarctica [29], a region that harbours Chlorobiaceae [8,9] (Fig. 1). Using microscopy, growth and isolation approaches, Chlorobiaceae were identified in a number of lakes and fjords, including Ellis Fjord and Taynaya Bay [8,9] (Fig. 1). Ellis Fjord and Taynaya Bay contain marine basins where shallow sills restrict water flow from the Southern Ocean, thereby permitting stratification of the water column and the development of stable oxic-anoxic interfaces [29,31,32]. While Ace Lake is one of the most extensively studied systems in Antarctica in terms of microbiology [19,30,34], Ellis Fjord [8,35,36] and Taynaya Bay [8,37] have received little study and no metagenomic assessments.
In Ace Lake, the Chlorobium abundance exhibits marked seasonal variation, with highest abundance in summer, numbers falling during winter, and lowest abundance in early spring (> 100-fold lower than summer) before a rebound back into summer [19]. The seasonal fluctuation was attributed primarily to changes in light hours rather than to the possible controlling effects of viral predation [19]. Despite the availability of very large metagenome datasets and associated metagenome-assembled genomes (MAGs) for Chlorobium from Ace Lake, genomic variation and population structure have not been examined.
Insight into Antarctic haloarchaea genomic variation has been gained from analyses of single nucleotide polymorphisms (SNPs) and low coverage regions (LCRs) generated from mapping metagenome reads to reference genomes or MAGs [38-41]. LCRs arise from phylotypes that either do not possess the sequences or have sequences that have diverged too far to recruit reads. Phylotypes that contain genes in LCRs possess a unique genomic capacity compared to phylotypes that lack the genes. Highly divergent genes within LCRs can also confer distinct functional traits through altered protein functions, such as changed substrate specificity or preference, altered specificity for viral attachment to cell surface proteins, and so forth. Examining the function of genes from variable regions can determine whether phylotypes represent ecotypes that may occupy distinct ecological niches within an ecosystem.
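The LCR concept described above can be illustrated with a minimal sketch. It assumes a per-base read-depth profile obtained from fragment recruitment and flags runs where depth falls below a fraction of the median depth. The function name, the 0.5 fraction, and the minimum run length are hypothetical choices made for illustration and are not the criteria used in this study.

```python
from statistics import median


def find_low_coverage_regions(depth, fraction=0.5, min_length=3):
    """Return half-open (start, end) intervals of low read depth.

    A position counts as low coverage when its depth is below
    fraction * median(depth). Both parameters are illustrative
    assumptions, not values taken from the study.
    """
    cutoff = fraction * median(depth)
    regions = []
    start = None
    for i, d in enumerate(depth):
        if d < cutoff:
            if start is None:
                start = i  # open a new candidate region
        else:
            if start is not None and i - start >= min_length:
                regions.append((start, i))
            start = None
    # close a region that runs to the end of the profile
    if start is not None and len(depth) - start >= min_length:
        regions.append((start, len(depth)))
    return regions


# Toy depth profile: positions 3-6 are recruited by fewer reads.
toy_depth = [100, 98, 102, 30, 28, 31, 29, 101, 99, 100]
print(find_low_coverage_regions(toy_depth))
```

In this toy profile, the positions with roughly 30% of the median depth would correspond to sequence carried by only a subset of the population.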
In this study, the MAGs of Chlorobium from Ace Lake, Ellis Fjord, and Taynaya Bay were compared to each other and to non-Antarctic Chlorobium species in order to determine the following: (i) which Chlorobium species characterize the individual Antarctic systems; (ii) whether the species are endemic to Antarctica; (iii) what genomic traits characterize phylotypes within and between the Antarctic systems, including seasonal populations in Ace Lake. Chlorobium phylotypes and Chlorobium viruses were examined to determine: (i) the biogeographic distribution of Chlorobium viruses in the Vestfold Hills; (ii) the types of viral defence systems possessed by the Chlorobium; (iii) the characteristics of virus-host dynamics in each system. As a result, we greatly expanded knowledge of Antarctic Chlorobiaceae and learned how the unique Antarctic environment controls the evolution of these primary producers.

Results and discussion
Overview of metagenomes and Chlorobium MAGs
Ace Lake, Ellis Fjord, and Taynaya Bay are herein referred to as AL, EF, and TB, respectively. Biomass was collected by filtration through a 20-μm pre-filter onto large format filters (3, 0.8, and 0.1 μm) for AL and EF, and into Sterivex cartridges (0.22 μm) for TB (see the "Methods" section). The filtered reads from 18 AL (~99 Gb), three EF (~48 Gb) and one TB (~12 Gb) oxic-anoxic interface metagenomes were used for fragment recruitment (FR) analyses (Additional file 1: Table S1); for these analyses, the AL and EF metagenome reads from the three filter fractions representing a single sample (date and depth) were pooled to form merged metagenomes (see the "Methods" section). The assembled contigs from individual AL (~6 Gb), EF (~7 Gb) and TB (~700 Mb) metagenomes (Additional file 1: Table S1) were used to determine the Chlorobium OTU abundance distribution in the three Vestfold Hills systems, and for viral analyses.
Fig. 1 Location of Ace Lake, Ellis Fjord, and Taynaya Bay in the Vestfold Hills, East Antarctica. Ace Lake (68°28′ S, 78°11′ E) is 25 m deep with a strong halocline and chemocline that coincides with the oxic-anoxic interface at a depth of 12-15 m, and supports the growth of a microbial community that was derived from the Southern Ocean about 5,000 years ago [19,27,29,30]. Ellis Fjord (68°36′ S, 78°07′ E) is a ~10-km-long, narrow water inlet that contains six basins (EF1-EF6) that are up to 117 m deep, with the two inner basins (EF1 and EF2) being meromictic [31]. The sill at the entrance to Ellis Fjord is 4 m deep and the six marine basins are separated by sills of different depths (1-30 m) [8,29,31,32]. Taynaya Bay (68°27′ S, 78°17′ E) is a marine water inlet with a maximum depth of up to 80 m, containing six basins, of which five (Burke and TB1-TB4) are meromictic [29,31]. Ace Lake and Taynaya Bay Basin 1 are ~2 km apart, and Ellis Fjord Basin 2 is ~14 km to the west of Ace Lake. All three systems are covered by ice for much of the year. The satellite map of the Vestfold Hills and the distance measurements were produced using the interactive atlas available on the Landsat Image Mosaic of Antarctica website [33]. The locations of Ellis Fjord and Taynaya Bay basins were from published data [29,31]. The photos of the aquatic systems were taken by Sarah Brazendale and Rick Cavicchioli.
All 16S rRNA genes from AL, EF, and TB Chlorobium MAGs had identical sequences (1505 bp), as did all FmoA protein sequences (366 aa) (Additional file 1: Fig. S1). The pair-wise average nucleotide identity (ANI) of all Chlorobium MAGs was ≥ 99.9% over ≥ 92% alignment fraction. FR of AL, EF, and TB metagenome reads to the Chlorobium 16S rRNA gene (EF_ref MAG) revealed a number of SNPs with variant frequency ≥ 0.01 (i.e., at least 1% of the aligned reads contained the SNP) (Additional file 3: Dataset S2). All of these SNPs, except one from the AL Dec 2014 merged metagenome, two from the EF merged metagenome, and four from the TB metagenome, had very low read depth (on average < 5) and could represent sequencing errors (Additional file 3: Dataset S2). In contrast, the read depth of the Chlorobium 16S rRNA gene sequence (lacking SNPs) was > 80 in all AL (except Oct 2014, read depth 31), EF and TB metagenomes, and > 11,000 in some metagenomes (Additional file 3: Dataset S2). These data indicate that the same species of Chlorobium was present in all three Vestfold Hills systems, representing at least 97% of the AL, 97% of the EF, and 98% of the TB Chlorobium populations, and was the only detectable Chlorobium species in AL throughout a seasonal cycle (also see below in "Ca. Chlorobium antarcticum population variation between AL, EF, and TB"). IMG (Integrated Microbial Genomes) taxonomy denoted all MAGs as most closely related to Chlorobium phaeovibrioides DSM 265 (herein referred to as Cpv-DSM265). The 16S rRNA gene identity (99%; 17 nt mismatches; Additional file 1: Fig. S1a), FmoA protein identity (98%; six aa mutations; Additional file 1: Fig. S1b), ANI (85% over 80-86% alignment fraction), and average amino acid identity (AAI; 89%) distinguish the Antarctic species from Cpv-DSM265, and these differences are reflected in 16S rRNA gene and FmoA protein trees (Fig. 3) (also see below in "Comparison of Ca. Chlorobium antarcticum to Cpv-DSM265 and global representation").
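The reasoning applied to the 16S rRNA gene SNPs (a variant frequency of at least 0.01, with candidates supported by very few reads treated as possible sequencing errors) can be sketched as follows. This is a simplified illustration that assumes per-position base counts are already available; the function name, the input layout, and the `min_alt_reads` cutoff of 5 are assumptions for the example and do not reproduce the pipeline used in this study.

```python
def call_snps(pileup, reference, min_freq=0.01, min_alt_reads=5):
    """List candidate SNPs from per-position base counts.

    pileup    : list of dicts mapping base -> number of aligned reads
    reference : reference sequence of the same length
    A non-reference base is reported when its frequency among aligned
    reads is >= min_freq and it is supported by >= min_alt_reads reads.
    The read-support cutoff is a hypothetical guard against errors.
    """
    snps = []
    for pos, (counts, ref_base) in enumerate(zip(pileup, reference)):
        total = sum(counts.values())
        if total == 0:
            continue  # no reads recruited at this position
        for base, n in counts.items():
            if base == ref_base:
                continue
            freq = n / total
            if freq >= min_freq and n >= min_alt_reads:
                snps.append((pos, ref_base, base, n, round(freq, 4)))
    return snps


# Toy data: position 1 passes the frequency cutoff (3/200 = 0.015)
# but has only 3 supporting reads; position 2 is well supported.
toy_reference = "ACGT"
toy_pileup = [
    {"A": 200},
    {"C": 197, "T": 3},
    {"G": 150, "A": 50},
    {"T": 90},
]
print(call_snps(toy_pileup, toy_reference))
```

Setting `min_alt_reads=1` in the toy example would also report the weakly supported variant at position 1, which mirrors why frequency alone is not sufficient when read support is low.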
In view of the genomic and phylogenetic differences, we name the Antarctic species Candidatus Chlorobium antarcticum sp. nov. (from ant.arc'ti.cum. L. neut. adj. antarcticum southern, Antarctic) (type MAG AL_ref MAG = 3300023061_2; 99% complete; 0.55% contamination) (Additional file 1: Table S2; Additional file 2: Dataset S1).
A seasonal pattern was observed, with the proportion of the Ca. Chlorobium antarcticum population that possessed the genes within LCRs tending to be higher in summer than in winter or spring (Additional file 1: Tables S4, S5, and S6), most notably for genes associated with cobalamin synthesis and transport (also see below in "Population structure of cobalamin biosynthesis and transport genes").
The range of transport genes present in the LCRs of Ca. Chlorobium antarcticum MAGs is indicative of the population supporting a diversity of transport abilities (Additional file 1: Tables S4 and S5). For example, protease export systems with similarity to Pseudomonas aeruginosa AprDEF were present in at least 28% of the Ca. Chlorobium antarcticum population, and abundance did not vary with season (Group 7 in Additional file 1: Table S5). For GSB, iron is an essential trace element required for the photosynthetic reaction centre [16]. The concentration of iron in AL increases with depth, being ~1 μM at the oxic-anoxic interface [30,43]. TonB-dependent transporter and ABC transporter genes enable the uptake of both inorganic iron and organic forms of iron (siderophores, hemoproteins) [44]. All Ca. Chlorobium antarcticum MAGs contained two sets of ferrous iron transporter genes (feoABC and feoAB), and three TonB-dependent transporter genes potentially involved in iron complex import across the outer membrane. However, the ABC transporters associated with the uptake of iron complexes were only identified in LCRs (Groups 1 and 2 in Additional file 1: Table S5), indicating an augmented capacity for these phylotypes to source exogenous iron (at least 56% of the Ca. Chlorobium antarcticum population).
An N-ATPase operon (atpDCQRBEFAG) was present in at least 61% of the Ca. Chlorobium antarcticum population, with abundance varying only marginally by season (Group 8 in Additional file 1: Table S5); in addition, F0F1 ATP synthase genes were present throughout the Ca. Chlorobium antarcticum population. N-ATPases utilize ATP to actively transport Na+ or H+ ions out of the bacterial cell [45-47]. The Ca. Chlorobium antarcticum ATPase subunit c amino acid sequence included the two glutamate residues in both of its C- and N-terminal helices that are diagnostic of Na+ binding [45-47], indicating it functions in Na+ export. N-ATPase genes have been identified in some Chlorobi, including Chlorobaculum parvum, Chlorobaculum tepidum (partial locus only), Pelodictyon luteolum, and Prosthecochloris aestuarii [48,49].
[Figure caption] … Chlorobium antarcticum in the oxic-anoxic interface of Ace Lake (AL), Ellis Fjord (EF), and Taynaya Bay (TB). The AL abundances were generated from a time-series of metagenomes from different seasons (x-axis: Dec summer, red font; Jul and Aug winter, blue font; Oct and Nov spring, green font), whereas the EF and TB abundances were from metagenomes from spring (EF, Oct 2014; TB, Nov 2014) (Additional file 1: Table S1). The AL and EF data were from samples collected on large format filters (y-axis: 3 μm, red; 0.8 μm, yellow; 0.1 μm, purple), whereas the TB data were from samples collected using Sterivex cartridges (y-axis: 0.22 μm, blue). Due to the dynamic range of the data (0.4-84%), the percentage abundance values for Ca. Chlorobium antarcticum in metagenomes from each filter fraction and time period (see relative abundance calculation in the "Methods" section) are shown below the bar chart.
Ca. Chlorobium antarcticum population variation between AL, EF, and TB
Similar to the analysis of SNPs within the AL population, no fixed SNPs were observed for EF metagenome reads against the EF_ref MAG.
However, from 1807 genes in the EF_ref MAG, SNPs were identified in 68 genes only from AL, two only from TB, and 19 genes from both AL and TB (Fig. 5; Additional file 1: Table S7). Most SNPs occurred in genes involved in intracellular functions, with a smaller proportion in cell wall modification, substrate transport, and membrane protein genes. SNPs were present in regions of the EF_ref MAG that had even FR coverage, except for those in a hypothetical gene (contig E1, Additional file 1: Table S7), a precorrin-3B methylase/precorrin isomerase gene (contig E15, Additional file 1: Table S7), and a gene for a receptor for the TonB-dependent uptake of iron-containing proteins (contig E17, Additional file 1: Table S7). This indicated that the AL and TB SNPs tended to occur within all Ca. Chlorobium antarcticum subpopulations, and were therefore characteristic of each system.
A total of 12 LCRs were identified from FR of AL, EF and TB metagenome reads to the EF_ref MAG (Fig. 5; Additional file 1: Table S4). Notably, five AL LCRs identified against the AL_ref MAG were also LCRs from FR of AL, EF, and TB reads to the EF_ref MAG (Additional file 1: Table S4), indicating that the main (detectable) Ca. Chlorobium antarcticum phylotypes existed in all three Vestfold Hills systems. The LCRs encoded cell wall modification, cell defence, transport, DNA repair, protein modification, Na+ or H+ ion efflux, anaerobic cobalamin biosynthesis, cobinamide salvaging, and cobalt/magnesium chelatase genes, similar to the gene functions of the AL_ref MAG LCRs. LCRs specific to the EF_ref MAG included cell wall modification, general function, and hypothetical genes.
To assess gene order of phylotypes, the contigs of AL, EF, and TB MAGs were aligned to AL_ref MAG (Additional file 3: Dataset S2). Most of the AL_ref MAG contigs that did not align to the contigs of the other MAGs were from AL_ref MAG LCRs, consistent with gene order varying in Ca. Chlorobium antarcticum phylotypes.
While the main phylotypes were shared amongst systems, some LCRs (e.g., contigs E29-E32) had very low read depth (≤ 2%) in all three systems (Additional file 1: Table S4), indicating that the genetic capacity represented by these contigs was rare within the overall Ca. Chlorobium antarcticum population. The relative coverage of some LCRs also varied considerably between systems, indicative of different population structures for these specific genes (Fig. 6; Additional file 1: Table S4). For example, the 11-kb contig E1 represented 3% of the EF Ca. Chlorobium antarcticum population but 69% of the TB Ca. Chlorobium antarcticum population. Based on relative coverage, phylotypes represented by LCRs contributed more to the TB Ca. Chlorobium antarcticum population than to the AL or EF populations (Fig. 6; Additional file 1: Table S4). However, EF_ref MAG SNPs were more prevalent for AL than TB, indicating that SNP-based variation was more similar between EF and TB Ca. Chlorobium antarcticum populations than either was to the AL population. The apparent differences in contribution of LCRs and SNPs to the Ca. Chlorobium antarcticum population from each system may reflect the cellular mechanisms involved in generating variation (e.g., DNA repair) and/or environmental effects (e.g., selective forces), and determining the causes will require further investigation (also see Additional file 1: Supplementary text).
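The notion of relative coverage used above can be made concrete with a small sketch. It assumes that the proportion of the population carrying a region is approximated by the read depth of that region divided by the read depth of the reference MAG as a whole; this simple ratio, the function names, and the depth values are illustrative assumptions, and the exact calculation used in this study is described in the "Methods" section.

```python
def relative_coverage(region_depth, genome_depth):
    """Percentage of the population inferred to carry a region.

    Assumes a simple ratio of region read depth to genome-wide
    read depth; this is an illustrative simplification.
    """
    if genome_depth <= 0:
        raise ValueError("genome_depth must be positive")
    return 100.0 * region_depth / genome_depth


def fold_change(value_a, value_b):
    """Ratio of two relative coverage values (e.g., two systems or seasons)."""
    if value_b <= 0:
        raise ValueError("value_b must be positive")
    return value_a / value_b


# Hypothetical depths chosen so that the percentages resemble the
# contig E1 example (3% in one system, 69% in another).
system_one = relative_coverage(region_depth=30, genome_depth=1000)
system_two = relative_coverage(region_depth=690, genome_depth=1000)
print(system_one, system_two, fold_change(system_two, system_one))
```

The same ratio, applied to samples from different dates, gives the seasonal fold changes discussed for LCR genes.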
To determine if phylotypes from AL, EF, or TB existed with greater sequence divergence than the FR matching criteria permitted (≥ 95% identity), G + C content of metagenome contigs was plotted against read depth and the taxonomy of contig clusters assigned (Additional file 1: Fig. S2); this approach was previously used to identify phylotypes of Antarctic haloarchaea with significantly different genomes to known species [38]. The contigs in the main cluster were from Ca. Chlorobium antarcticum (Additional file 1: Fig. S2). Aside from a number of contigs from some smaller clusters (see the "Methods" section), none of the OTUs of small clusters represented Ca. Chlorobium antarcticum, indicating that phylotypes with more divergence than the cutoffs used for assigning LCRs were not detectable in the metagenome data.
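The G + C content versus read depth approach can be sketched in a few lines. The example computes the G + C fraction of each contig and groups contigs on a coarse grid of G + C percentage and order of magnitude of read depth, as a crude stand-in for inspecting clusters on a plot. The function names, the 5% grid step, and the toy contigs are hypothetical, and the clustering and taxonomic assignment used in this study are not reproduced here.

```python
import math
from collections import defaultdict


def gc_content(seq):
    """Fraction of G and C among unambiguous bases (A, C, G, T)."""
    seq = seq.upper()
    gc = seq.count("G") + seq.count("C")
    at = seq.count("A") + seq.count("T")
    return gc / (gc + at) if (gc + at) else 0.0


def bin_contigs(contigs, gc_step_percent=5):
    """Group contigs by (G+C percent bin, floor(log10(read depth))).

    contigs: iterable of (name, sequence, read_depth) tuples.
    The grid is an illustrative simplification of visual clustering.
    """
    bins = defaultdict(list)
    for name, seq, depth in contigs:
        gc_percent = round(100 * gc_content(seq))
        gc_bin = gc_percent // gc_step_percent * gc_step_percent
        depth_bin = math.floor(math.log10(depth)) if depth > 0 else None
        bins[(gc_bin, depth_bin)].append(name)
    return dict(bins)


# Toy contigs: two with similar composition and depth, one outlier.
toy_contigs = [
    ("c1", "GGCCAATTGC", 120.0),
    ("c2", "GCGCATATGC", 150.0),
    ("c3", "ATATATGCAT", 8.0),
]
print(bin_contigs(toy_contigs))
```

Contigs that fall into the same cell share composition and abundance, which is the property that allows a dominant population to appear as one main cluster.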
Collectively, the high ANI/AAI between MAGs (see above in "Chlorobium species present in EF and TB"), the small extent of variation represented by SNPs and LCRs, and the taxonomic findings of the analysis of GC/read-depth clusters, illustrate that the Ca. Chlorobium antarcticum population has remarkably little genomic variation.

Comparison of Ca. Chlorobium antarcticum to Cpv-DSM265 and global representation
The AL, EF, and TB contigs had overall low nucleotide identity (< 90%) when aligned to the Cpv-DSM265 genome, with many gaps and differences in gene content (Fig. 7). As described previously, Ca. Chlorobium antarcticum is green rather than brown in colour (unlike Cpv-DSM265); as well as possessing the biosynthetic pathway for chlorobactene (found in green-coloured GSB), Ca. Chlorobium antarcticum lacks the capacities to synthesize bacteriochlorophyll e and isorenieratene, both found in Cpv-DSM265 and other brown-coloured GSB [19].
A number of Ca. Chlorobium antarcticum contigs did not align to the Cpv-DSM265 genome (Additional file 3: Dataset S2). These contigs contained anaerobic cobalamin biosynthesis, cobalt transport, cobalamin transport, cobalt/magnesium chelatase, and N-ATPase genes, all of which were absent from the Cpv-DSM265 genome. While cobalamin transport and magnesium chelatase genes were present in all Ca. Chlorobium antarcticum MAGs, all of the contigs that did not align with the Cpv-DSM265 genome represented LCRs of the AL_ref MAG and EF_ref MAG (Additional file 1: Tables S4, S5, and S6). It is therefore possible that Cpv-DSM265 represents a phylotype that lacks these genetic loci, or that the loci represent functions that are of particular importance to the Antarctic Ca. Chlorobium antarcticum population (also see below in "Population structure of cobalamin biosynthesis and transport genes").
The global representation of Ca. Chlorobium antarcticum was assessed by matching the Ca. Chlorobium antarcticum 16S rRNA gene to all 16S rRNA genes from public metagenomes and genomes and the Ca. Chlorobium antarcticum FmoA protein sequence to all proteins from genomes (including MAGs and single-cell genomes) in IMG. All metagenome and genome matches were ≤ 99% 16S rRNA gene identity, and with the exception of Cpv-DSM265 (98% identity), all FmoA sequences had < 98% identity (Additional file 4: Dataset S3). The inability to identify Ca. Chlorobium antarcticum outside of Antarctica was in marked contrast to its representation in data from the three Vestfold Hills systems.
All the genes in the anaerobic pathway for cobalamin biosynthesis have been reported for Chlorobaculum tepidum [4]. However, a comparative genomics assessment of 11,000 bacterial species did not identify all cobamide biosynthesis genes in the 10 Chlorobi that were examined, including Cpv-DSM265, and categorized them as cobinamide salvagers [57]. We determined that Ca. Chlorobium antarcticum encodes the anaerobic pathway, with the genes exclusive to the anaerobic pathway (green-coloured branch between precorrin-2 and cob(II)yrinate a,c-diamide in Fig. 8) located in a LCR. At least 29% of the AL Ca. Chlorobium antarcticum population from all time periods, and 8% and 72% of the EF and TB Ca. Chlorobium antarcticum populations, respectively, possessed the genes, although coverage was about 2-fold higher in AL in summer compared to winter (Additional file 1: Tables S4 and S6).
The anaerobic synthesis of 5,6-dimethylbenzimidazole (DMB), the lower axial ligand of adenosylcobalamin, involves enzymes from the bzaABCDE operon acting on 5-amino-1-(5-phospho-β-D-ribosyl)imidazole as substrate [60]. While the Ca. Chlorobium antarcticum MAGs did not possess bzaABCDE or cobC, they did encode the DMB activation and utilization genes (cobT, cobS). This indicates that, similar to some other bacteria [68,69], Ca. Chlorobium antarcticum may have a capacity to remodel exogenous DMB to produce cobalamin. The gene cobC can perform the final step in adenosylcobalamin synthesis, but Ca. Chlorobium antarcticum MAGs lacked this gene and may instead utilize alternative genes, cblZ or cblXY, which have been proposed to function in Actinobacteria and some Alphaproteobacteria, respectively [61].
The Ca. Chlorobium antarcticum LCRs also contained a colocalized cluster of genes annotated as cobaltochelatase subunit CobN and magnesium chelatase subunits BchH, BchI and BchD (Additional file 1: Table S6). CobN forms a complex with cobaltochelatase subunits CobS and CobT (which were not identified in the MAGs) and catalyses cobalt insertion during aerobic cobalamin biosynthesis [70,71], and BchH, BchI and BchD can function in magnesium insertion during bacteriochlorophyll biosynthesis [72]. However, sequence similarity exists between cobaltochelatase NST and magnesium chelatase HID [73,74] and it has been speculated that BchI and BchD may function as CobS and CobT to form a functional cobaltochelatase complex [61]. In Ca. Chlorobium antarcticum, these cobalt/magnesium chelatase genes were colocalized with potential cobalamin transport genes (LCR5 in Additional file 1: Table S4; Groups 4 and 5 in Additional file 1: Table S5) and therefore may function in cobalamin biosynthesis. In support of this inference, it was speculated that the colocalization of cobalt/magnesium chelatases beside a TonB-dependent receptor protein for cobalamin in Chlorobaculum tepidum may pertain to cobalt being inserted into exogenously acquired cobalamin [4]. Moreover, additional magnesium chelatase genes, including three coding for BchH and one each for BchI and BchD, were present throughout the Ca. Chlorobium antarcticum population, and these likely function in bacteriochlorophyll synthesis rather than cobalamin production. Most GSB contain three homologues of BchH, denoted BchH, BchS, and BchT [75], which have been reported to be active magnesium chelatases that exhibit differences in their enzymatic properties [76]. Cobalamin biosynthesis genes can be colocalized with the cobalt transporter genes cbiMNQO [61,62], and this was the case in Ca. Chlorobium antarcticum (LCR5 in Additional file 1: Table S4).
Cobalt is relatively concentrated in AL, with ~6 nM at the oxic-anoxic interface, which is ~300 times the concentration in sea water [30,43]. The cbiMNQO gene cluster was present in a LCR (Group 6 in Additional file 1: Table S5) with the genes present in at least 41% of the Ca. Chlorobium antarcticum population from all time periods, although an approximately 1.5-fold higher coverage occurred in summer compared to winter; the minimum abundance (~30%) and seasonal change (~2-fold higher in summer) are similar to the phylotypes containing the cobalamin biosynthesis genes.
The Ca. Chlorobium antarcticum MAGs contained cobA, cobP/cobU, and cbiZ, representing all the genes known in bacteria and archaea to be involved in salvaging cobinamide [63-66]. cbiZ can also function in salvaging pseudocobalamin, and cbiZ was the only gene located in a LCR (Fig. 8; Additional file 1: Table S6). These data indicate that the whole lake population of Ca. Chlorobium antarcticum was likely adept at converting cobinamide into intermediates of cobalamin biosynthesis, and a subpopulation (at least 8% from all time periods) had the capacity to also salvage pseudocobalamin. The coverage of cbiZ was about 2-fold higher in summer, matching the seasonal abundance pattern of cobalt transporter and cobalamin biosynthesis genes (Additional file 1: Tables S5 and S6).
In Ca. Chlorobium antarcticum MAGs, the cbiZ and cobalamin transporter genes were colocalized (LCR5 in Additional file 1: Table S4), as is the case in many bacteria, including Chlorobium [65]. It has been speculated that Rhodobacter sphaeroides may use cobalamin transporters to scavenge pseudocobalamin produced by cyanobacteria and convert it to cobalamin precursors using CbiZ [65,66,77-80]. AL supports a high abundance of Synechococcus that blooms in summer close to the oxic-anoxic interface [19,81], indicating that it may be the source of pseudocobalamin that is imported and converted to cobalamin precursors by CbiZ.
The biosynthesis and transport of cobalamin has been shown to be regulated by cobalamin-binding riboswitches that are present in the 5′-untranslated region of genes, including btuB (cobalamin transporter), metE (5-methyltetrahydropteroyltriglutamate homocysteine methyltransferase), and nrdD (ribonucleoside-triphosphate reductase) [85-93]. A total of six cobalamin riboswitch sequences were identified in LCRs of Ca. Chlorobium antarcticum, one each upstream of btuB and btuF (both cobalamin transporters), metE, nrdD, and at the end of two contigs (Fig. 4b; Additional file 1: Table S6). Three additional cobalamin riboswitch sequences were identified throughout the Ca. Chlorobium antarcticum population, one each upstream of two btuB genes, and a hypothetical protein-coding gene. In Chlorobi, the genes with cobalamin riboswitch sequences are mainly translationally regulated; regulation has been shown to involve inhibition of translation initiation, where cobalamin (in the form of adenosylcobalamin) binds to the riboswitch RNA sequence of the regulated mRNA, leading to a perturbed mRNA structure that inhibits ribosome binding and subsequent translation [88,89,91].
Overall, the phylotype data for cobalamin-related biosynthesis, salvaging, and transport indicate that all of the Ca. Chlorobium antarcticum population is capable of importing cobalamin (Additional file 1: Tables S4, S5, and S6), although the proportion of the population with additional cobalamin transport genes varies with the system: EF, 7%; AL, 7% increasing to 25% in summer; TB, 78% (Additional file 1: Tables S4 and S5). Certain phylotypes are also capable of importing and salvaging cobinamide and pseudocobalamin, with this capacity also increasing in summer in AL.

Ca. Chlorobium antarcticum-virus interactions
The subtype I-E CRISPR-Cas system in Ca. Chlorobium antarcticum contained the core cas genes casA (or cse1) and casB (or cse2) with genes arranged cas3, casA, casB, casE, casC, casD, cas1, cas2, followed by a CRISPR spacer array, indicating the system could be functional. Analysis of NCBI gene annotation data showed CRISPR-Cas systems to be common in GSB, the subtypes to vary, and some species to contain multiple subtypes (Additional file 1: Table S8). No genes associated with BREX (bacteriophage exclusion) or DISARM (defence island system associated with restriction-modification) systems were identified. However, type I R-M (restriction-modification) methyltransferase and endonuclease genes and two type IV R-M endonuclease genes were identified (Additional file 1: Table S9), with the type I R-M genes present in a LCR (Additional file 1: Table S4). Additionally, five genes associated with toxin-antitoxin (T-A) systems (parD, parE, relF, brnA, abiEi) were identified in Ca. Chlorobium antarcticum (Additional file 1: Table S9), with brnA in a LCR (Additional file 1: Table S4). The most likely system to contribute to the control of viral propagation is the AbiE type IV T-A system, an ABI (abortive infection) system that causes cell dormancy and prevents viral dissemination [94], but it is unclear if this system was functional as the antitoxin gene (abiEi) was identified but not the toxin gene (abiEii).
Potential Ca. Chlorobium antarcticum viruses were identified by aligning the Ca. Chlorobium antarcticum CRISPR-Cas spacers to an Antarctic virus catalogue, and a spacer database was used to identify additional potential hosts of the viruses (see the "Methods" section) [19]. A total of 79 CRISPR spacers from EF Ca. Chlorobium antarcticum MAGs (Additional file 1: Table S10) mapped to potential viruses. Eight viral contigs had 97% identity to spacer Spc230 (Additional file 1: Table S11). The viral contigs were from AL metagenomes and belonged to viral cluster cl_248, a previously identified potential AL Chlorobium virus [19]. No EF Ca. Chlorobium antarcticum spacers were mapped to EF viral contigs, which likely reflects the smaller size of the EF metagenome dataset compared to AL which resulted in 6,104 EF viral contigs compared to 30,897 AL viral contigs in the Antarctic virus catalogue.
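The idea of linking hosts to viruses through spacer matches can be illustrated with a toy sketch. It slides each spacer (and its reverse complement) along each viral contig without gaps and reports pairs whose best identity meets a threshold. This is a simplified stand-in for a sequence alignment tool and is not the procedure described in the "Methods" section; the function names, the 0.97 threshold used as a default, and the sequences are hypothetical.

```python
def reverse_complement(seq):
    """Reverse complement of an uppercase DNA sequence."""
    return seq.translate(str.maketrans("ACGT", "TGCA"))[::-1]


def best_ungapped_identity(spacer, contig):
    """Best fraction of matching bases over all ungapped placements.

    Both strands of the spacer are tried. Returns 0.0 when the
    contig is shorter than the spacer.
    """
    n = len(spacer)
    best = 0.0
    for query in (spacer, reverse_complement(spacer)):
        for i in range(len(contig) - n + 1):
            window = contig[i:i + n]
            matches = sum(a == b for a, b in zip(query, window))
            best = max(best, matches / n)
    return best


def match_spacers(spacers, contigs, min_identity=0.97):
    """Return (spacer, contig, identity) for pairs at or above min_identity."""
    hits = []
    for s_name, s_seq in spacers.items():
        for c_name, c_seq in contigs.items():
            identity = best_ungapped_identity(s_seq, c_seq)
            if identity >= min_identity:
                hits.append((s_name, c_name, identity))
    return hits


# Hypothetical 40-nt spacer and two toy contigs.
demo_spacers = {"spacer_demo": "ACGTTGCAAGCTTAGGCTAACGGATCCATGCAAGTCTAGA"}
demo_contigs = {
    "viral_contig_1": "TTTT" + demo_spacers["spacer_demo"] + "GGGG",
    "viral_contig_2": "A" * 60,
}
print(match_spacers(demo_spacers, demo_contigs))
```

With a 40-nt spacer, a single mismatch still passes a 0.97 threshold (39/40 = 0.975), whereas shorter spacers would require an exact match.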
The viral contigs representing potential EF and TB Ca. Chlorobium antarcticum viruses were matched (100% identity) to host spacers, identifying potential hosts to be primarily Gammaproteobacteria and Chlorobi (including Chlorobium OTUs from the Vestfold Hills), plus Actinobacteria, Bacteroidetes, Firmicutes, Betaproteobacteria, Deltaproteobacteria, and Verrucomicrobia (Additional file 1: Table S12). These host assignments were similar to previous findings for AL Chlorobium viruses [19] and point to Ca. Chlorobium antarcticum viruses from all three systems belonging to similar viral clusters (e.g., cl_1024 and cl_248). This host analysis indicates that the viruses likely prey on several different bacterial genera, i.e., a wide variety of hosts, and may therefore be considered generalist rather than specialist viruses [95-97].
Fig. 9 Biogeographic association between viral contigs and Ca. Chlorobium antarcticum CRISPR spacers. The schematic depicts the Vestfold Hills and Rauer Islands systems that were the sources of the viral contigs that matched to Ca. Chlorobium antarcticum CRISPR spacers (Additional file 1: Table S11). Lines (red or blue) connect an aquatic system where CRISPR-spacers were identified to a system where matching viral contigs were identified. The width of a line (red or blue) approximates the number of spacer-viral contig matches. The dark blue end of a line (red or blue) denotes the system that was the source of the viral contigs, with the other end of the line being the source of the Ca. Chlorobium antarcticum CRISPR-spacers. Spacer-viral contig matches within the three systems harbouring Ca. Chlorobium antarcticum (AL, EF, and TB; red lines) are distinguished from spacer-viral contig matches between AL, EF, or TB, and the other aquatic systems in the Vestfold Hills and Rauer Islands (blue lines). Sources of Ca. Chlorobium antarcticum spacers (AL, EF, and TB) are denoted by large circles; other lakes are denoted by small circles. Sources of viral contigs included: AL, DL (Deep Lake), CL (Club Lake), OL (Organic Lake), RL(F) (Rauer Lakes from Filla Island: RL2, 3, 11), RL(T) (Rauer Lakes from Torckler Island: RL5, 6, 13). The location of the systems relative to each other is shown approximately to scale.
The predicted Ca. Chlorobium antarcticum viruses also appeared to be widely distributed, with spacer matches to viral contigs from hypersaline systems enriched in haloarchaea (Deep Lake, Club Lake, Rauer 3, 6, and 13 lakes) and diverse bacterial taxa (Organic Lake, Rauer 2, 5, and 11 lakes) (Fig. 9; Additional file 1: Table S11). Chlorobium has not been reported in these lake systems, and the microbial communities in Deep Lake [38,41] and Organic Lake [98,99] in particular, have been intensively studied. In contrast, the other potential hosts, notably Gammaproteobacteria, are prevalent in Organic Lake [98,99] and have been identified in some of the other lakes [38,41], further reinforcing that the potential Ca. Chlorobium antarcticum viruses have characteristics of generalist viruses infecting a broad host range [95][96][97].

Conclusions
We have shown that a single species of Chlorobium is present in AL, EF, and TB, that it has genomic traits distinct from those of its closest relative Cpv-DSM265 (Additional file 1: Table S13), and that it is not identifiable in available metagenome data from elsewhere in the world. As such, we conclude that Ca. Chlorobium antarcticum is, to the best of our knowledge, endemic to the stratified lakes and fjords of the Vestfold Hills of East Antarctica.
Variation present as SNPs and LCRs defined population variation of Ca. Chlorobium antarcticum, indicating the presence of phylotypes and ecotypes, with the population structure differing marginally amongst the three systems. Limited genomic variation of Ca. Chlorobium antarcticum in AL across a 7-year period illustrates that the population is currently stable. Seasonal changes in population structure were inferred to arise as a natural response to sunlight hours and growth of active populations. Population variation contributing to survivability was inferred for genes associated with cold adaptation, metabolism, and viral defence. In particular, cobalamin synthesis and transport stood out as a genomic facet of Ca. Chlorobium antarcticum that was subject to seasonal variation in population structure and was likely a trait relevant to effective ecosystem functioning.
Cobalamin deficiency can impair bacteriochlorophyll content and chlorosome formation, with cobalamin supplementation restoring bacteriochlorophyll content [100,101]. The higher abundance in summer (approximately 2-fold higher than in winter) of Ca. Chlorobium antarcticum phylotypes that possess a genomic capacity for cobalamin biosynthesis, cobinamide and pseudocobalamin salvaging, cobalt transport, and/or cobalamin transport, fits with the importance of cobalamin for supporting phototrophic processes and may help cells recuperate after a long, dark winter to regain the very high abundance they achieve in summer. Conversely, the involvement of ~30 genes and the energetic cost associated with cobalamin biosynthesis [56] fits with the ecosystem supporting a reduced capacity in winter when sunlight is limited or absent. While bacteria rely on cobalamin for growth, most bacteria in microbial communities lack the biosynthetic capacity [56,57]. Ca. Chlorobium antarcticum is the most abundant species in AL and is key to ecosystem function, being probably the single most important member of the food web [19]. Its requirement for cobalamin for effective phototrophic growth likely generates positive selection within the Ca. Chlorobium antarcticum population for a biosynthetic capacity. As a result of its niche competitiveness, the species generates a very high level of biomass mid-water in the lake (> 10⁸ cells ml⁻¹) [14]. Therefore, in addition to its role in carbon, nitrogen, hydrogen, and sulphur cycling [13,14,19], Ca. Chlorobium antarcticum is also likely to be the main provider of exogenous cobalamin to the lake food web; this provision would be facilitated by the seasonal lysis and release of cellular contents of > 99% of the summer population of cells.
Partially based on Chlorobium-virus interactions in AL, it was proposed that some Antarctic viruses may persist by achieving less harmful interactions with their hosts than counterparts from warmer environments [19]. However, Chlorobium-virus interactions are not well understood because very few GSB viruses have been described [17,102]. Through this study and a previous study [19], a total of 59 viral contigs and 12 viral clusters or singletons were mapped to Ca. Chlorobium antarcticum CRISPR spacers, resulting in the discovery of 12 potential Chlorobium viruses. These viruses are predicted to be generalists. It has been speculated that viruses can evolve into specialist viruses when they are exposed to a homogeneous host population (e.g., composed of a single species) that does not change with time, whereas generalist viruses can evolve from viruses exposed to a heterogeneous host population (e.g., composed of multiple species) that fluctuates with time [95]. The adaptation of a specialist virus to effective replication in a single host may result in a cost to fitness when replicating in other potential hosts, whereas a generalist virus is not expected to suffer a fitness cost as it is adapted to replicate in different hosts [95]. While Ca. Chlorobium antarcticum represents a remarkably dominant species with relatively subtle population variation and may therefore be expected to harbour specialist viruses, its seasonal abundance in AL changes by at least 100-fold [19]. If, as proposed, sunlight hours control seasonal abundance of AL Chlorobium [19], the marked change in host abundance may select against the establishment of specialist viruses, while still leaving Ca. Chlorobium antarcticum as a host for generalist viruses that have a capacity to propagate in other bacterial hosts. In this regard, a reliance on sunlight and seasonal die-off during winter and early spring may significantly benefit the long-term persistence of Ca. Chlorobium antarcticum in Antarctic aquatic systems.
The Antarctic continent is geographically isolated and Antarctic environmental conditions distinguish it from most other regions of the globe [27,103,104]. The remoteness and environmental conditions create major logistical challenges for performing scientific research, yet without adequate research, policy makers will be compromised when making decisions about Antarctica's future [104]. Metagenomic approaches have greatly enhanced the understanding of indigenous Antarctic microorganisms [27,103]. For example, Antarctic soil bacteria were discovered that scavenge and oxidize atmospheric H₂, which in association with CO and/or CO₂, enables chemosynthetic growth [105]. In the Vestfold Hills and Rauer Islands, three different genera have been found to dominate the haloarchaea population of hypersaline lakes, making photoheterotrophy the main microbial process occurring in these lakes [38,40,41,106]. The species appear to be endemic to Antarctica, with one member, Halohasta litchfieldiae (tADL), constituting ≤ 45% of each lake's microbial community [38,41]. Relatively little genomic variation exists within and between the populations from the hypersaline systems, but both environment and distance effects have been inferred to contribute to biogeographical patterning of variation [41]. A major phylotype of Hht. litchfieldiae with relatively low ANI (~0.8) has also been discovered [38,40]. Based on our current research, we claim that Ca. Chlorobium antarcticum represents the Antarctic species with the least amount of known population-level genomic variation. The capacity to state this is predicated on having a very large Ca. Chlorobium antarcticum metagenome dataset (~159 Gb) that provided a MAG read depth of up to ~11,000.
The coherence of the population is particularly striking in view of it being retained across a 7-year time span, across the populations from three distinct water bodies, and throughout a seasonal cycle during which relative cellular abundance changed > 100-fold. Future efforts need to evaluate how distinct Antarctic species and communities are by canvassing the environmental and biogeographic diversity of Antarctica's ecosystems and obtaining sufficient metagenomic depth to assemble MAGs and perform population-level studies. Achieving this will help to establish the extent of Antarctic microbial endemism, the uniqueness of contributions that Antarctic microbes make to global biogeochemical cycles, and the risks associated with anthropogenic impact, including climate change, on the Antarctic biome [27,104,107].

Methods
Sample collection, DNA sequencing, MAG generation, and abundance calculations
The sampling, sequencing, assembly, and annotation of AL metagenomes were described previously [19,108]. Biomass from EF Basin 2 (Fig. 1) was collected from 5-, 18-, 45-, and 60-m depths by filtration through a 20-μm prefilter onto large (293 mm diameter) format filters (3, 0.8 and 0.1 μm) and DNA extracted as previously described [13,38,41]. The sequencing, assembly, and annotation of EF metagenomes were performed by the Joint Genome Institute as previously described [19], generating 12 EF metagenomes (three filter fractions from four depths) (Additional file 1: Table S1). The biomass from TB Basin 1 (Fig. 1) was collected from 5- and 11-m depths by filtration through a 20-μm prefilter into Sterivex cartridges (0.22 μm filter) and the DNA extracted and sequenced as previously described [108] (Additional file 1: Table S1). The QC filtered and error-corrected reads (BFC v181) [109] from the AL, EF, and TB metagenomes were assembled using metaSPAdes [110,111] and annotated through IMG (Additional file 1: Table S1). The IMG pipeline generated Ca. Chlorobium antarcticum MAGs, of which we used 50 AL, seven EF, and two TB MAGs (one MAG per metagenome) that were medium to high quality and > 50% genome completeness; the MAGs with their respective metagenomes are available in IMG (see IMG Bin IDs in Additional file 1: Table S2; Additional file 2: Dataset S1). For MIMAG (minimum information about MAGs) [112] data preparation, MAG quality data and metadata were obtained from IMG, except MAG N50 and L50 contig statistics which were generated using Quast v5.0.2 [113] (Additional file 2: Dataset S1). Chlorobium OTU abundances from AL were calculated previously [19]. Contig taxonomy assignments, Chlorobium OTU bin refinement, abundance calculations, and alpha diversity (Simpson's index of diversity) from EF and TB metagenomes were determined as previously described [19].
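For readers unfamiliar with the N50 and L50 contig statistics mentioned above, the following is a minimal Python sketch of how they are conventionally defined. It is an illustration only and is not the Quast implementation used in this study; the function name and the example contig lengths are hypothetical.

```python
def n50_l50(contig_lengths):
    """Return (N50, L50) for a list of contig lengths in bp.

    N50: length of the contig at which the running sum of lengths
         (longest first) reaches half of the total assembly length.
    L50: number of contigs needed to reach that point.
    """
    lengths = sorted(contig_lengths, reverse=True)
    half_total = sum(lengths) / 2
    running = 0
    for count, length in enumerate(lengths, start=1):
        running += length
        if running >= half_total:
            return length, count
    raise ValueError("contig_lengths must not be empty")


# Hypothetical contig lengths (bp), for illustration only.
example_lengths = [100, 200, 300, 400, 500]
n50, l50 = n50_l50(example_lengths)
print(n50, l50)
```

A higher N50 together with a lower L50 indicates a more contiguous MAG.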

Ca. Chlorobium antarcticum genomic variation
The metagenome reads from the oxic-anoxic interface of AL, EF, and TB were used for FR analyses of Ca. Chlorobium antarcticum (Additional file 1: Table S14). The AL metagenomes used were all Illumina data and repre- However, it is noteworthy that Chlorobium abundance in AL in 2006 was previously shown to be comparable to 2008 and 2013/2014 [19], so inferences from this study are likely to apply to the 2006 population. The AL and EF reads from the three filters from a specific time period and depth were pooled and converted to multi-FASTA format using an in-house script, thereby facilitating comparative analyses between AL and EF metagenomes (biomass in the size range 0.1-20 μm) with TB metagenomes (0.22-20-μm biomass size range) (Additional file 1: Table S14). For the analysis of genomic variation within the AL Ca. Chlorobium antarcticum population, the MAG from Dec 2014, 19-m depth, 0.1-μm filter was used (AL_ref MAG). For analyses between AL, EF, and TB, the EF Ca. Chlorobium antarcticum MAG from 45-m depth, 3-μm filter was used (EF_ref MAG). The two MAGs were selected because they had the highest total base pair count and > 99% genome completeness. To determine the Ca. Chlorobium antarcticum MAG contig arrangement that best represents a draft genome, the AL_ref MAG and EF_ref MAG contigs were organised in Mauve v2.4.0 [114] using Cpv-DSM265 as the reference genome with default parameters. Contigs were subsequently manually reordered by comparing nucleotide sequences from AL, EF, and TB using the blastn module of BLAST+ v2.9.0 [115] and considering only ≥ 500-bp alignment length matches of 100% identity. Arising from this, MAG contigs were grouped into scaffolds (Additional file 1: Table S3).
The metagenome reads were aligned to AL_ref MAG or EF_ref MAG using BBMap v38.51 [116] with 95% minimum alignment identity (minid = 0.95), generating SAM files. The BAM and BAI alignment and index files were created from SAM files using Samtools v1.10 [117] and were used for SNP analysis in IGV [118]. Only the SNPs with variant frequency ≥ 0.9 (i.e., at least 90% of the reads aligned at the position containing the SNP) were considered fixed mutations, similar to a previously described method [38]. The total number of aligned reads and the base coverages of AL_ref MAG and EF_ref MAG were calculated using the "flagstat" and "depth" functions of Samtools, respectively. To identify LCRs, the base coverages of AL_ref MAG and EF_ref MAG in metagenomes from AL, EF, and TB were plotted on circos plots using R v4.0.2. The LCRs that spanned multiple adjacent contigs on a scaffold were considered a single LCR (Additional file 1: Table S4); for example, LCR5 spanned contigs A13-A17 from AL_ref MAG and contigs E14-E15 from EF_ref MAG. The IMG autoannotated genes identified in LCRs were manually annotated by aligning the protein sequences to reference proteins from the UniProtKB/Swiss-Prot database using the ExPASy BLAST+ online service [119], and those with poor alignment or no hits were realigned to reference proteins in the UniProtKB database or RefSeq protein database using the NCBI blastp suite [120].
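The variant-frequency criterion for fixed mutations can be illustrated with a short Python sketch. It assumes that per-position allele counts have already been tabulated from the alignments (in this study, SNPs were inspected in IGV); the function names and the example counts are hypothetical.

```python
def variant_frequency(allele_counts, reference_base):
    """Return (alt_base, frequency) for the most common non-reference base.

    allele_counts: dict mapping base -> number of aligned reads with that base.
    frequency is the fraction of all aligned reads at the position that
    carry alt_base. Returns (None, 0.0) when no variant reads are present.
    """
    total = sum(allele_counts.values())
    alt_counts = {b: c for b, c in allele_counts.items() if b != reference_base}
    if total == 0 or not alt_counts:
        return None, 0.0
    alt_base = max(alt_counts, key=alt_counts.get)
    return alt_base, alt_counts[alt_base] / total


def is_fixed(allele_counts, reference_base, threshold=0.9):
    """True when the variant frequency meets the fixed-mutation cutoff (>= 0.9)."""
    _, freq = variant_frequency(allele_counts, reference_base)
    return freq >= threshold


# Hypothetical position: reference base A, 95 of 100 reads carry G.
print(is_fixed({"A": 5, "G": 95}, "A"))
```

Lowering `threshold` lets the same helper screen for low-frequency variants rather than fixed mutations.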
For comparison of gene order between AL_ref MAG and other AL, EF, and TB high-quality MAGs of ≥ 99% genome completeness, the MAG contigs were aligned using the blastn module of BLAST+ v2.9.0. The alignments were manually parsed to assess the gene order of MAGs compared to that of AL_ref MAG, and MAG contigs that did not align, had lower identity matches (< 80%) or short length matches (< 1 kb) were identified (Additional file 3: Dataset S2).

GC content vs read depth analysis
Based on an approach previously reported for analysing haloarchaea [38], metagenome contigs of length ≥ 1 kb and 30-70% GC content, together with Ca. Chlorobium antarcticum MAG contigs from AL, EF, and TB, were plotted in a GC content-read depth 2D space using Python v3.6.4. The metagenome contig clusters placed close to the Ca. Chlorobium antarcticum MAG contig cluster that had a GC content ranging from 35-65%, read depth up to 7500, and length ≥ 10 kb, were selected for taxonomic analysis. The contigs were aligned to the Ca. Chlorobium antarcticum MAGs and Cpv-DSM265 genome. The alignment files were manually parsed to identify cluster contigs with low identity and high query alignment fraction (≥ 5 kb), and their taxonomies were determined using the IMG Phylodist file-based contig taxonomies, as described previously [19]. Some small clusters of metagenome contigs were from Ca. Chlorobium antarcticum (Additional file 1: Fig. S2c), with 100% identity to Ca. Chlorobium antarcticum MAG contigs. These metagenome contigs likely belonged to two incomplete Ca. Chlorobium antarcticum MAGs (60% and 66% bin completeness) generated from 0.8-3- and 0.1-0.8-μm filter Nov 2008 spring metagenomes, respectively, from the AL oxic-anoxic interface.
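The initial contig selection (length ≥ 1 kb and 30-70% GC content) can be sketched in Python as follows. This is an assumed, simplified illustration and not the script used in this study; the function names, the tuple layout, and the example sequences are hypothetical.

```python
def gc_percent(sequence):
    """GC content (%) of a nucleotide sequence, ignoring non-ACGT characters."""
    seq = sequence.upper()
    acgt = sum(seq.count(base) for base in "ACGT")
    if acgt == 0:
        return 0.0
    return 100.0 * (seq.count("G") + seq.count("C")) / acgt


def select_contigs(contigs, min_len=1000, gc_range=(30.0, 70.0)):
    """Keep contigs meeting the length and GC cutoffs.

    contigs: iterable of (name, sequence, read_depth) tuples.
    Returns a list of (name, gc_percent, read_depth), i.e. the points
    that would be placed in the GC content-read depth 2D space.
    """
    selected = []
    for name, seq, depth in contigs:
        gc = gc_percent(seq)
        if len(seq) >= min_len and gc_range[0] <= gc <= gc_range[1]:
            selected.append((name, gc, depth))
    return selected


# Hypothetical contigs, for illustration only.
contigs = [
    ("c1", "ATGC" * 300, 50.0),  # 1200 bp, 50% GC: kept
    ("c2", "AT" * 600, 10.0),    # 1200 bp, 0% GC: dropped
    ("c3", "ATGC" * 10, 5.0),    # 40 bp: dropped
]
print(select_contigs(contigs))
```

The returned tuples can be passed directly to a scatter-plotting routine, with GC content on one axis and read depth on the other.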

Ca. Chlorobium antarcticum phylotype abundance
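The Ca. Chlorobium antarcticum population containing a "region of interest" (specific LCR, gene, or gene cluster) was determined from the relative coverages of the corresponding region, calculated using the formula:
\[
\text{Relative coverage (\%)} = \frac{\left(\sum \text{read depth of Region bases}\right) / \left(\text{total Region bases}\right)}{\left(\sum \text{read depth of MAG bases}\right) / \left(\text{total MAG bases}\right)} \times 100
\]
where Region is the region of interest and MAG is AL_ref MAG or EF_ref MAG. Within each of the two coverage terms, the numerator indicates the sum of the read depths of the bases in the region of interest or MAG, calculated in each metagenome, and the denominator indicates the total number of bases in the region of interest or MAG.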
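As a worked illustration of this calculation, the Python sketch below assumes that relative coverage is the mean read depth of the region divided by the mean read depth of the reference MAG, expressed as a percentage. The function names and the depth values are hypothetical; per-base depths would in practice come from the Samtools "depth" output.

```python
def mean_depth(depths):
    """Sum of per-base read depths divided by the number of bases."""
    return sum(depths) / len(depths)


def percent_population_with_region(region_depths, mag_depths):
    """Relative coverage (%) of a region of interest versus the whole MAG."""
    return 100.0 * mean_depth(region_depths) / mean_depth(mag_depths)


def average_percent(percentages):
    """Average the per-metagenome percentages for a season or a system."""
    return sum(percentages) / len(percentages)


# Hypothetical per-base depths: region covered at half the MAG depth.
region = [50, 50, 50, 50]
mag = [100, 100, 100, 100, 100, 100]
print(percent_population_with_region(region, mag))
```

Under this reading, a value near 100% suggests that nearly all cells in the population carry the region, whereas a lower value suggests that only a subset of phylotypes do.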
The approximate percentage of the Ca. Chlorobium antarcticum population containing a region of interest in a season (summer, winter, spring) or a system (AL, EF, TB) was determined by averaging the percentages calculated in metagenomes from that season or system, respectively. To assess the significance of the differences in summer and winter coverages of LCR genes of AL_ref MAG, the DESeq2 R package [121] was used with gene read depths from all time periods. The result for the summer and winter comparison was generated using the "contrast" option of the DESeq2 results function. The DESeq2 method uses the Wald test to calculate the P-value for significance analysis and uses the Benjamini-Hochberg (BH) adjustment to calculate an adjusted P-value for assessing significance at a specific false discovery rate (i.e., the fraction of false positives amongst the significant values). Here, P-values < 0.05 were considered significant at the 5% significance level. BH-adjusted P-values < 0.05 were regarded as significant, considering a 5% fraction of false positives as acceptable (Additional file 1: Tables S5 and S6).
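The Benjamini-Hochberg adjustment referred to above can be illustrated with a plain Python sketch. This shows only the standard BH step-up adjustment and is not DESeq2 itself, which applies additional steps that are not reproduced here; the function name and the example P-values are hypothetical.

```python
def benjamini_hochberg(p_values):
    """Return BH-adjusted P-values in the same order as the input.

    Each P-value is scaled by (number of tests / rank), and monotonicity
    is enforced by taking a running minimum from the largest rank down.
    """
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])
    adjusted = [0.0] * m
    running_min = 1.0
    for rank in range(m, 0, -1):
        idx = order[rank - 1]
        running_min = min(running_min, p_values[idx] * m / rank)
        adjusted[idx] = running_min
    return adjusted


# Hypothetical raw P-values for four genes.
raw = [0.01, 0.04, 0.03, 0.20]
adjusted = benjamini_hochberg(raw)
significant = [p < 0.05 for p in adjusted]
print(adjusted, significant)
```

In this example only the first gene remains below 0.05 after adjustment, which illustrates why the adjusted and unadjusted P-values are reported separately.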

Comparative analysis of Ca. Chlorobium antarcticum and Cpv-DSM265
A total of 31 AL, five EF and two TB Ca. Chlorobium antarcticum MAGs with ≥ 99% genome completeness were aligned to the Cpv-DSM265 genome (RefSeq ID: NC_009337.1) using the blastn module of BLAST+ v2.9.0 and Samtools v1.10, generating SAM, BAM, and BAI files. The alignments were analysed using IGV to assess the types of variations (indels or SNPs) in MAG sequences. The auto-annotated genes on MAG contigs or Cpv-DSM265 genome that showed no alignment were assessed. To identify cobalamin riboswitch sequences in Ca. Chlorobium antarcticum, four cobalamin riboswitch genes from the Cpv-DSM265 genome were aligned to AL_ref MAG contigs using the NCBI blastn suite [120]. The Ca. Chlorobium antarcticum cobalamin riboswitch sequences were verified, and additional cobalamin riboswitch sequences were identified, using the Rfam database [122,123] (Additional file 1: Table S6). The overall functional potential of Cpv-DSM265 and Ca. Chlorobium antarcticum MAGs from AL (AL_ref MAG), EF (EF_ref MAG), and TB (MAG from TB 11-m depth metagenome) were compared using COG number data generated by IMG. The COG numbers were categorized using COG reference data from IMG (database accessed on 21 December 2020). Genes with COG number assignments belonging to more than one COG category were assigned to multiple categories (Additional file 1: Fig. S4).
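The rule that a gene whose COG number belongs to more than one COG category is counted in each of those categories can be sketched in Python as follows. The mapping format, with one letter per category, and all identifiers in the example are hypothetical and are not taken from the IMG reference data.

```python
from collections import Counter


def count_cog_categories(gene_to_cog, cog_to_categories):
    """Count genes per COG category, counting multi-category genes in each.

    gene_to_cog: dict mapping gene ID -> COG number.
    cog_to_categories: dict mapping COG number -> string of one-letter
        category codes (e.g. "EH" means categories E and H).
    Genes whose COG number is absent from the mapping are skipped.
    """
    counts = Counter()
    for cog in gene_to_cog.values():
        for category in cog_to_categories.get(cog, ""):
            counts[category] += 1
    return counts


# Hypothetical identifiers, for illustration only.
genes = {"g1": "COG_X", "g2": "COG_Y", "g3": "COG_Z"}
categories = {"COG_X": "EH", "COG_Y": "H"}
print(count_cog_categories(genes, categories))
```

Because multi-category genes are counted more than once, the category totals can exceed the number of genes with COG assignments.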

ANI, AAI, and phylogenetic analyses
The pair-wise ANI of Ca. Chlorobium antarcticum MAGs, as well as ANI against the Cpv-DSM265 genome, were calculated using pyani [124]. The AAI of MAGs was calculated using the AAI-profiler online service [125], which compared the input protein sequences with the proteins of species in the UniProt database [126]. The phylogenetic analysis of Ca. Chlorobium antarcticum was performed using the 16S rRNA gene and FmoA protein sequences from AL, EF, and TB MAGs, as well as various members of the Chlorobiaceae family (Additional file 1: Table S15). The 16S rRNA genes were aligned using the ClustalW algorithm and FmoA proteins were aligned using the Neighbour Joining cluster method of the MUSCLE algorithm in MEGA X v10.1.7 [127]. The alignments were used for generating maximum likelihood trees in MEGA using default parameters and 1000 bootstrap replicates.
The proportion of the Chlorobium population that was represented by Ca. Chlorobium antarcticum in the AL, EF, and TB oxic-anoxic interface metagenomes was estimated by aligning AL, EF and TB metagenome reads to the Ca. Chlorobium antarcticum 16S rRNA gene from EF_ref MAG using BBMap v38.51 and Samtools (see above in "Ca. Chlorobium antarcticum genomic variation"). The default minid was used for alignment with BBMap. SNPs with variant frequency ≥ 0.01 (i.e., at least 1% of the reads aligned at the position containing the SNP) were considered during analysis in IGV (Additional file 3: Dataset S2).
Assessment of the endemism of Ca. Chlorobium antarcticum to the Vestfold Hills was performed by comparing Ca. Chlorobium antarcticum marker (16S rRNA gene and FmoA protein) sequences to available metagenome and genome data in IMG. The Ca. Chlorobium antarcticum 16S rRNA gene was aligned to the IMG databases of 16S rRNA genes from public assembled metagenomes (accessed on 14 Mar 2021) and public isolates (accessed on 30 Mar 2021) using the IMG RNA BLAST (blastn) online service with e-value 10⁻⁵. The Ca. Chlorobium antarcticum FmoA protein sequence was aligned to the IMG isolate protein database (including proteins from isolate genomes, MAGs, and single-amplified genomes; accessed on 14 Mar 2021) using the IMG RNA BLAST (blastp) online service with e-value 10⁻⁵.

Ca. Chlorobium antarcticum defence genes and associated viruses
The AL, EF, and TB Ca. Chlorobium antarcticum MAG genes were manually parsed to identify those associated with defence, such as R-M, DISARM, BREX, and T-A (specifically ABI mechanism) systems. The putative defence genes were manually annotated (see above in "Ca. Chlorobium antarcticum genomic variation").
The potential viruses associated with EF and TB Ca. Chlorobium antarcticum were determined using the CRISPR spacers and repeats in metagenome IMG CRISPR annotation files, as well as the data in an Antarctic virus catalogue and IMG/VR spacer database, as described previously [19]. The Antarctic virus catalogue contained a list of viral contigs identified in a range of Antarctic metagenomes, along with their viral cluster or singleton designations, and the IMG/VR spacer database contained a list of spacer sequences and their matches to host contigs [128]; the construction of these two databases was described previously [19]. The databases did not include TB metagenome data as these metagenomes were not available at the time the databases were created. To identify TB viral contigs, all TB assembled contigs were aligned to the Antarctic virus catalogue using the blastn module of BLAST+ v2.9.0, with e-value 10⁻³ and ≥ 97% alignment identity. A total of 995 TB contigs with ≥ 1000-bp alignment length and 100% identity across the whole length of either the query contig or the reference viral contig were considered to be TB viral contigs; this approach to identifying TB viral contigs from matches to the Antarctic virus catalogue is not as rigorous as might be achieved using the virus identification pipeline [129].
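The criteria for accepting a TB contig as viral (≥ 1000-bp alignment length and 100% identity across the whole length of either the query contig or the reference viral contig) can be expressed as a short Python filter. This is an assumed illustration rather than the procedure actually scripted for this study; it presumes that each BLAST hit has been parsed into a record holding percent identity, alignment length, and the query and subject lengths. The record layout and the contig names are hypothetical.

```python
from typing import NamedTuple


class Hit(NamedTuple):
    """One parsed blastn hit (hypothetical record layout)."""
    query: str      # TB contig name
    subject: str    # viral contig name in the catalogue
    pident: float   # percent identity of the alignment
    length: int     # alignment length (bp)
    qlen: int       # full length of the query contig (bp)
    slen: int       # full length of the reference viral contig (bp)


def is_tb_viral_hit(hit, min_length=1000):
    """True when the alignment is 100% identical, at least min_length bp long,
    and spans the whole length of either the query or the reference contig."""
    spans_whole_contig = hit.length >= hit.qlen or hit.length >= hit.slen
    return hit.pident == 100.0 and hit.length >= min_length and spans_whole_contig


def tb_viral_contigs(hits):
    """Sorted, de-duplicated names of query contigs passing the filter."""
    return sorted({h.query for h in hits if is_tb_viral_hit(h)})


# Hypothetical hits, for illustration only.
hits = [
    Hit("tb_1", "v_1", 100.0, 1500, 1500, 9000),  # whole query covered: kept
    Hit("tb_2", "v_2", 100.0, 1200, 5000, 8000),  # partial on both: dropped
    Hit("tb_3", "v_3", 98.5, 2000, 2000, 2000),   # below 100% identity: dropped
    Hit("tb_4", "v_4", 100.0, 800, 800, 4000),    # shorter than 1000 bp: dropped
]
print(tb_viral_contigs(hits))
```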
The Ca. Chlorobium antarcticum CRISPR spacers in EF and TB metagenomes were identified from the Ca. Chlorobium antarcticum MAGs and Chlorobium OTU refined bins (Additional file 1: Table S10). CRISPR arrays tended to be present at the ends of contigs, possibly indicative of assembly constraints caused by sequence repeats. To potentially capture a greater number of spacers, TB MAGs derived from assembly of non-error corrected reads (IMG Genome IDs: 3300038786, 3300039186) were also analysed. The viral contigs potentially associated with EF and TB Ca. Chlorobium antarcticum were determined by aligning the Chlorobium spacer sequences to viral contigs in the Antarctic virus catalogue and to TB viral contigs using the 'megablast' option of BLAST+ v2.9.0, with e-value 10⁻³ and ≥ 97% alignment identity. The data in the Antarctic virus catalogue were used to assign viral cluster or singleton designations to the potential Ca. Chlorobium antarcticum viral contigs. This approach to assessing virus-host relationships was described previously [19].
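The final step, filtering spacer-to-viral-contig matches (e-value ≤ 10⁻³ and ≥ 97% identity) and assigning viral cluster designations from the catalogue, can be sketched in Python as below. The tuple layout, the function name, and the contig names are hypothetical; only the thresholds and the Spc230 and cl_248 labels are taken from the text.

```python
from collections import defaultdict


def clusters_from_spacer_hits(hits, contig_to_cluster,
                              min_identity=97.0, max_evalue=1e-3):
    """Group matching spacers by the viral cluster of the contig they hit.

    hits: iterable of (spacer, viral_contig, percent_identity, evalue).
    contig_to_cluster: dict mapping viral contig -> cluster or singleton label.
    Contigs missing from the mapping are reported under "unassigned".
    """
    matches = defaultdict(set)
    for spacer, contig, pident, evalue in hits:
        if pident >= min_identity and evalue <= max_evalue:
            cluster = contig_to_cluster.get(contig, "unassigned")
            matches[cluster].add(spacer)
    return dict(matches)


# Hypothetical hits and contig names, for illustration only.
hits = [
    ("Spc230", "contigA", 97.0, 1e-5),
    ("Spc230", "contigB", 100.0, 1e-6),
    ("Spc001", "contigC", 90.0, 1e-5),  # below the identity cutoff
]
catalogue = {"contigA": "cl_248", "contigB": "cl_248"}
print(clusters_from_spacer_hits(hits, catalogue))
```

Grouping by cluster rather than by contig makes it straightforward to see when several viral contigs matched by the same spacer belong to a single putative virus.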