- Open Access
Microdiversity and phylogeographic diversification of bacterioplankton in pelagic freshwater systems revealed through long-read amplicon sequencing
Microbiome volume 9, Article number: 24 (2021)
Freshwater ecosystems are inhabited by members of cosmopolitan bacterioplankton lineages despite the disconnected nature of these habitats. The lineages are delineated based on > 97% 16S rRNA gene sequence similarity, but their intra-lineage microdiversity and phylogeography, which are key to understanding the eco-evolutional processes behind their ubiquity, remain unresolved. Here, we applied long-read amplicon sequencing targeting nearly full-length 16S rRNA genes and the adjacent ribosomal internal transcribed spacer sequences to reveal the intra-lineage diversities of pelagic bacterioplankton assemblages in 11 deep freshwater lakes in Japan and Europe.
Our single nucleotide-resolved analysis, which was validated using shotgun metagenomic sequencing, uncovered 7–101 amplicon sequence variants for each of the 11 predominant bacterial lineages and demonstrated sympatric, allopatric, and temporal microdiversities that could not be resolved through conventional approaches. Clusters of samples with similar intra-lineage population compositions were identified, which consistently supported genetic isolation between Japan and Europe. At a regional scale (up to hundreds of kilometers), dispersal between lakes was unlikely to be a limiting factor, and environmental factors or genetic drift were potential determinants of population composition. The extent of microdiversification varied among lineages, suggesting that highly diversified lineages (e.g., Iluma-A2 and acI-A1) achieve their ubiquity by containing a consortium of genotypes specific to each habitat, while less diversified lineages (e.g., CL500-11) may be ubiquitous due to a small number of widespread genotypes. The lowest extent of intra-lineage diversification was observed among the dominant hypolimnion-specific lineage (CL500-11), suggesting that their dispersal among lakes is not limited despite the hypolimnion being a more isolated habitat than the epilimnion.
Our novel approach complemented the limited resolution of short-read amplicon sequencing and limited sensitivity of the metagenome assembly-based approach, and highlighted the complex ecological processes underlying the ubiquity of freshwater bacterioplankton lineages. To fully exploit the performance of the method, its relatively low read throughput is the major bottleneck to be overcome in the future.
Microbial phylogeography is the study of the diversification and distribution of microorganisms across space and time, and offers insights into eco-evolutionary processes that generate and maintain the ubiquity and diversity of microbial populations. However, our understanding of microbial phylogeography is far behind that of macroorganisms  and has not yet achieved a general consensus, as evidenced by the fact that the old tenet of microbiology “Everything is everywhere, but the environment selects” remains a matter of debate . The next key step is accurately profiling the diversity of environmental microbial assemblages, which is challenging because they are dominated by organisms that are recalcitrant to cultivation [3, 4]. The rapid advances of sequencing and bioinformatics technologies have provided novel opportunities for cultivation-independent, high-resolution analysis that may effectively address this long-standing topic.
For prokaryotes, the current de facto standard approach for phylogenetic profiling of an environmental community is high-throughput amplicon sequencing of the 16S rRNA gene. However, in exchange for its universality, the phylogenetic resolution of the 16S rRNA gene is limited, allowing organisms to be resolved to the genus, but not species level [5, 6]. Furthermore, the phylogenetic resolution is usually limited by the short reads (150–300 bp paired-end) generated by Illumina sequencers, which is the most commonly used sequencing platform and can sequence only a portion of the 16S rRNA gene (~ 1500 bp). To achieve finer phylogenetic resolution, the more variable ribosomal internal transcribed spacer (ITS) region, located between the 16S and 23S rRNA genes, is a common alternative marker. The ITS sequence can differentiate ecologically distinct intra-lineage populations that cannot be resolved using the 16S rRNA gene. ITS-based microdiversities across space, time, and environmental gradients have been reported for members of marine Pelagibacter [7, 8] picocyanobacteria [9,10,11] and freshwater Limnohabitans [12,13,14], Polynucleobacter , and Synechococcus . In contrast to the 16S rRNA gene, ITS lacks a universal primer and a comprehensive database, and is too variable for determining diversity across lineages; for example, the length of ITS varies by hundreds of base pairs within a phylum . Therefore, the ITS is ideally sequenced along with the adjacent 16S rRNA gene to achieve finer phylogenetic resolution and broad taxonomic classification .
The technical limitations of short-read platforms have recently been tested using long-read sequencing platforms, namely, Pacific Biosciences (PacBio) and Oxford Nanopore. These platforms can sequence amplicons of the complete 16S rRNA gene [19,20,21,22] and even the entire rRNA gene operon including the ITS and small and large subunit rRNA genes (> 4000 bp) of prokaryotes [23,24,25] and microbial eukaryotes [26,27,28]. A major drawback of long-read sequencing is its higher per-base error rate compared to short-read sequencing. With the PacBio platform, this issue can be solved through construction of a circular consensus sequence (CCS), in which individual amplicon molecules are sequenced many times using circularized library templates that provide consensus-sequence error correction . When combined with a quality-filtering process based on per-base quality scores, the error rate of a CCS-generated amplicon can be reduced to a level comparable to those of short-read platforms [19, 24, 27] allowing analysis at single-nucleotide resolution .
Genomic average nucleotide identity (ANI) is another promising approach for high-resolution phylogenetic profiling of prokaryotes, with ANI > 95% considered a robust threshold for species-level classification [30,31,32]. ANI is often applied to a metagenome-assembled genome (MAG), a draft-quality genome reconstructed from shotgun metagenomic sequences, providing the opportunity to perform cultivation-independent, genome-resolved phylogenetic analysis. However, the MAG-based approach suffers from several technical limitations: First, the number of samples is practically limited due to the high cost of metagenomic sequencing. Second, the analysis requires reconstruction of a high-quality MAG, which is challenging for bacterial lineages with low abundance (i.e., low sequencing coverage) or those harboring highly microdiversified genotypes [33, 34]. Third, a MAG often lacks a 16S rRNA gene due to the difficulty of reconstructing such a highly conserved region , making it difficult to link the MAG with 16S rRNA gene-based phylogenies. Long-read amplicon sequencing can avoid these limitations and provide a complementary approach to MAG-based analysis that can achieve high-resolution phylogenetic profiling of an environmental microbial assemblage.
Lakes are physically disconnected ecosystems, and thus offer a good model for investigating the diversification and phylogeographic processes of microbes. Whereas 16S rRNA gene-based phylogenies have been used to characterize the cosmopolitan bacterioplankton lineages that are ubiquitously dominant in freshwater systems [36,37,38], analyses at finer phylogenetic resolution have been used to identify intra-lineage microdiversity and phylogeography. A global-scale investigation of microdiversity was conducted for the PnecC subcluster of the genus Polynucleobacter (Betaproteobacteria), a cosmopolitan freshwater bacterial group, using the ITS and glnA gene as markers, which showed distance–decay patterns and global-scale niche separation of subgroups adapted to different thermal conditions . Genome-resolved studies focusing on other ubiquitous lineages, such as LD12 (Alphaproteobacteria) and acI (Actinobacteria), have also shown that geographically isolated lakes are generally inhabited by distinct species (i.e., ANI < 95%) [39, 40]. Meanwhile, genomes sharing ANI > 95% were isolated from lakes located up to hundreds of kilometers apart and separated by the Alps [41, 42] and even between lakes on different continents [40, 43, 44]. No global phylogeographic patterns at the phylogenetic resolutions of ITS  and ANI  were observed for Microcystis, ubiquitous bloom-forming freshwater cyanobacteria. Due to the sparseness of data at fine phylogenetic resolution across habitats and lineages, the following questions remain open: (i) How much microdiversity exists for each bacterioplankton lineage within and among lakes? (ii) Does microdiversity exhibit any phylogeographic patterns following distance–decay relationships or environmental gradients? (iii) Are phylogeographic patterns similar or distinct among microbial lineages?
To address these questions, the present study first applied single nucleotide-resolved long-read amplicon sequencing targeting the nearly full-length 16S rRNA gene and ITS regions (~ 2000 bp) to investigate the microdiversity and phylogeographic patterns of multiple freshwater bacterioplankton lineages among multiple lakes. Sampling was performed at pelagic sites of deep (> 70 m) oligo-mesotrophic freshwater lakes. Compared with shallow freshwater habitats, deep lakes are fewer in number, but are characterized by larger water volume, longer water retention time, and older age . Thus, the pelagic microbial communities of deep lakes are expected to be less strongly influenced by disturbances from terrestrial and sedimental inputs and to show more robust spatial and temporal distributions, allowing analysis of limited samples to better represent their diversification, dispersal, and historical processes. We explored nine Japanese and two European lakes and performed analyses at both the regional and intercontinental scales. In each lake, samples were collected from two water layers, the surface mixed layer (epilimnion) and the oxygenated hypolimnion (water layer below the thermocline). The oxygenated hypolimnion is generally found in deep oligo-mesotrophic holomictic lakes and is inhabited by specific bacterioplankton lineages . While inhabitants of the epilimnion may migrate over long distances using neighboring shallow lakes and ponds as “stepping stones” [45, 48], hypolimnion inhabitants are more likely to be isolated due to the limited occurrence of their habitat. Therefore, we hypothesized that hypolimnion-specific lineages would be more deeply diversified among lakes than epilimnion-specific lineages.
Samples were collected from nine Japanese and two European perialpine lakes (Fig. 1 and Table 1). Physicochemical details of the lakes are provided in Table S1. In each lake, two water layers, the epilimnion and oxygenated hypolimnion, were sampled at a pelagic station during the stratification period (Table 1). Samples of the perialpine lakes were collected in October 2017 using 0.22-μm pore-size polyethersulfone filter cartridges (Millipore, Sterivex SVGP01050) following prefiltration through a 5.0-μm pore-size polycarbonate filter (Whatman, cat. no. 111113). At least 2 L of lake water was filtered for each sample and the filter cartridge was stored at − 20 °C until further processing. The filter was manually removed from the Sterivex cartridge and processed using the PowerBiofilm DNA Isolation Kit (MoBio Laboratories) for extraction of DNA. Samples from the Japanese lakes were collected during a previous study in 2015 for short-read 16S rRNA gene amplicon sequencing , and the remaining DNA extracts were used for the present study. In addition, DNA samples collected in October 2010 from Lake Biwa  were used as temporal replicates for comparison of samples taken in 2010 and 2015 from the same lake. Each sample originated from the same water sampler. When a water sample was filtered through more than one filter, the extracts from each filter were pooled. A total of 24 samples (12 lakes/times × 2 depths) were collected.
Polymerase chain reaction (PCR), library preparation, and sequencing
To amplify the nearly full-length 16S rRNA gene and ITS sequences, we used the primer set 27Fmod [5′-AGRGTTTGATYMTGGCTCAG-3′]  and 23Sr-mod [5′-RGTTBYCYCATTCRG-3′]. The latter primer was based on the widely used 23Sr (also known as 23S-125r) primer [5′-GGGTTBCCCCATTCRG-3′], which targets the ITS side of the 23S rRNA gene [12, 18, 51,52,53], and was modified to cover broader taxonomic groups (Table S2). For multiplexing, a 16-base index sequence was prepared for each sample and conjugated to the 5′ end of both the forward and reverse primers. PCR was performed in 25-μL reactions containing 2 μL of template DNA (1–5 ng μL−1) using the Blend Taq Plus (TOYOBO, Osaka, Japan) buffer system. Cycling conditions were as follows: initial denaturation at 94 °C for 180 s, followed by 35 cycles of amplification (denaturation at 94 °C for 45 s, annealing at 50 °C for 45 s, extension at 72 °C for 90 s) and a final extension at 72 °C for 180 s. Amplification of approximately 2000-bp amplicons was confirmed through agarose gel electrophoresis. For each sample, at least two reaction mixtures were prepared and pooled after amplification to obtain sufficient DNA and mitigate potential PCR biases. The resulting PCR products were purified using AMPure XP beads (Beckman Coulter), quantified with the Qubit dsDNA HS Assay kit (Thermo Fisher Scientific), and pooled equimolarly. The sequencing library was prepared using the SMRTBell Template Prep Kit 1.0 following the procedure and checklist 2 kb template preparation and sequencing protocol (Pacific Biosciences, Inc.; document version, PN001-143-835-08). Sequencing was performed on the PacBio RSII sequencer using P6-C4 chemistry and five SMRT (single-molecule real-time) cells with a 4-h video length. Demultiplexed CCSs were generated using the RS_ReadsOfInsert.2.1 protocol in SMRT Analysis software version 2.3.0 with the following settings: Minimum Full Passes = 2, Minimum Predicted Accuracy = 90, and Minimum Barcode Score = 22 in Symmetric Barcode Mode. The raw CCS reads were deposited under accession number PRJDB9651.
Analysis of sequencing outputs
The 159,262 CCS reads (average length = 1951 bp) obtained from the 24 samples were analyzed using the DADA2 v. 1.12.1 package  with the R v. 3.4.4 software (http://www.R-project.org/), which is a pipeline for assigning reads to single nucleotide-resolved amplicon sequence variants (ASVs). We followed the workflow previously designed for CCS-based long-read amplicon sequencing : The demultiplexed FASTQ files of CCS reads were processed by sequentially applying the removePrimers, filterAndTrim, derepFastq, learnErrors, dada, makeSequenceTable, and removeBimeraDenovo functions (the R script used here is available in the Supplementary Text). The resulting table mapped 28,153 reads from 24 samples to 742 ASVs (Supplementary Dataset).
The 742 ASV sequences thus obtained (including both 16S rRNA gene and ITS) were aligned against the SILVA SSU 132 Ref NR database  using the SINA Aligner v. 1.2.11 webserver  with default parameters. Based on the alignment results, the 16S rRNA gene region was extracted from each ASV and clustered using the --cluster_fast command in VSEARCH v. 2.8.0 software . Clusters generated using 97% and 100% identity thresholds were designated as operational taxonomic unit (OTU) and SSU-ASV, respectively. A representative sequence for each OTU was determined using the --consout option. Each OTU was taxonomically classified using the SINA Aligner v. 1.2.11 webserver with reference to the SILVA SSU 132 Ref NR database. In addition, each OTU was annotated based on the nomenclature for freshwater bacterioplankton proposed by Newton et al. , which was applied using the “—search” option in the SINA v. 1.2.11 stand-alone tool against the original ARB  database provided by Newton et al. . Finally, ASVs were assigned back to OTUs and SSU-ASVs using the --usearch_global command in the VSEARCH software with 97% and 100% identity thresholds, respectively (Supplementary Dataset). For unknown reasons, the number of reads assigned to each sample varied considerably: from 190 (epilimnion of Lake Zurich) to 4515 (hypolimnion of Lake Biwa), with a median of 1114.5 (Supplementary Dataset). To make the best use of the available data, we used all reads for analysis and did not perform rarefaction. For each OTU, samples with < 20 assigned reads were not included in subsequent analysis to minimize biases caused by a low number of reads. To avoid biases introduced by uneven sequencing depths among the samples, our investigation focused on the dominant ASVs and did not address the overall richness or rare ASVs.
For each OTU, the pairwise Bray–Curtis dissimilarities of ASV composition were calculated among samples using the vegan v. 2.5-2 package  of the R software. Hierarchical clustering of the data was performed using the hclust function with the ‘methods = “ward.D2”’ option, and the results were visualized using the ComplexHeatmap v. 2.2.0 package  of the R software.
Assessment of methodological performance through metagenomic read mapping
To evaluate the performance of our single nucleotide-resolved analysis, we tested whether the single nucleotide polymorphisms (SNPs) predicted from ASVs can be reproduced in shotgun metagenomic reads. We focused on sequences of the CL500-11 lineage in Lake Biwa, and used the assembly and read mapping results generated from a sample collected in the hypolimnion of the lake in September 2016 . Reads mapped to the contig containing the rRNA gene operon of CL500-11 were analyzed using the IGV v. 2.4.10 software  to determine the base frequency for each nucleotide position.
Results and discussion
Methodological performance and limitations
Two major challenges are associated with the long-read sequencing platform: read accuracy and read throughput. In response to the former challenge, we targeted the 16S rRNA gene and ITS sequences but excluded the adjacent 23S rRNA gene, as a longer insert would result in (i) lower per-base quality of CCS due to fewer rounds of subreads and (ii) a smaller number of error-free reads at the same per-base error rate. To test the accuracy and sensitivity of our analysis, we focused on the CL500-11 lineage in the hypolimnion of Lake Biwa—the OTU and sample with the greatest number of reads (2606 reads) in the present study (Supplementary Dataset)—and investigated whether the SNPs expected from ASVs could be reproduced through metagenomic read mapping. Although the metagenomic sample was collected a year after the main sampling period for the present study , metagenomic reads (i.e., randomly fragmented total DNA) reflect real sequence variance in the environment and thus are a good reference for assessing the performance of our approach. We identified 30 base positions with SNPs by aligning all 24 ASVs of CL500-11 detected in the sample (Fig. 2a). The results showed that among all observed SNPs, 78.9% were common to both ASV and mapping analyses, whereas only 5.3% were expected based on ASV but undetected through mapping (Fig. 2a and b). Such high reproducibility was remarkable because most SNPs expected based on ASVs were rare (occurring in < 10% of the reads) (Fig. 2a). Besides, the proportions of major SNPs were almost identical in ASV and mapping analyses (Fig. 2c). Notably, the original metagenomic study  assembled only one (ASV_2) of the ASVs present in the lake (Fig. 3). This limitation is due to the assembler generating a consensus high-quality contig to represent a lineage rather than fragmented assemblies reflecting different microvariants [33, 34, 63]. Overall, we demonstrated that, given a sufficient number of reads, our approach successfully recovered the variants and proportions of single nucleotide-resolved sequence diversity in the environment and is sensitive enough to detect minor sequence types with a low rate of false positives, which would be overlooked by the MAG-based approach.
Relatively low read throughput is another major challenge facing long-read amplicon sequencing. This limitation occurs because a significant number of reads are discarded during read quality control along with the lower per-cost sequencing throughput compared to short-read sequencing. In the present study, we used the stringent quality filtration threshold proposed for the original DADA2 pipeline developed for CCS-based long-read amplicon sequencing . As a result, 28,153 reads were assigned to the ASVs, representing only 17.6% of the original CCS (generated using the “Minimum Predicted Accuracy = 90” parameter) and ~ 11.3% of the nominal output of the sequencer (~ 250,000 reads per five SMRT cells on PacBio RSII). The low number of total reads, together with a significant fluctuation of the sequencing depth among the samples and the lack of replicate samples due to the high sequencing cost, prevented us from alpha diversity estimates and limited the performance of our analyses (see the “Methods” section). Given the potential high performance of the technology when a sufficient number of reads are available (as discussed above), its low read throughput is the most critical issue to be solved for a promising use of the method. Notably, another study targeting a shorter region (the full-length 16S rRNA gene) using a newer generation of sequencing platform (PacBio Sequel) and chemistry (S/P3-C3/5.0) reported more efficient read recovery using the same analytical pipeline ; that analysis (referred to as “Replicate 2” of the fecal sample) resulted in 186,124 reads assigned to ASVs, representing ~ 37.2% of the nominal output of the sequencer (~ 500,000 reads per single SMRT cell on PacBio Sequel). The rapid improvement of sequencing and bioinformatics technologies will eventually overcome the limitation of throughput, making long-read amplicon sequencing one of the most promising approaches for high-resolution phylogenetic profiling of environmental microbial assemblages, as it will complement the limited resolution of short-read amplicon sequencing and the limited sensitivity of the MAG-based approach.
The overall community composition in our study showed severe discrepancies compared to that described in a previous study  based on short-read amplicon sequencing of the same DNA extracts (Fig. S1). Whereas members of the phyla Actinobacteria and Chloroflexi were overrepresented in our long-read analysis, those affiliated with Bacteroidetes, Planctomycetes, and Verrucomicrobia were underrepresented (Fig. S1). Primer specificity could be one reason behind this effect. Although the primer 23Sr-mod shows improved universality over the original 23Sr, it still misses members of some bacterial groups and most Archaea (Table S2). For example, 23Sr-mod targets only 40.2% and 47.4% of all Planctomycetes and Verrucomicrobia, respectively, which might explain their underrepresentation in our results (Fig. S1). Another major factor affecting our results may be connectivity of the rRNA gene operon. A recent study reported that > 40% of rRNA genes in environmental samples could be unlinked, meaning that the 16S and 23S rRNA genes are not in adjacent locations within a genome and thus the ITS region is absent . Planctomycetes is one phylum with a large proportion of unlinked rRNA genes , which may have resulted in their underrepresentation (Fig. S1). While our approach can efficiently resolve microdiversity within individual lineages, it is not designed to reveal overall microbial community composition, for which conventional short-read amplicon sequencing or a direct quantification via fluorescence in situ hybridization is still a reasonable approach .
Phylogenetic composition of long-read amplicons
Overall, our analysis generated 742 ASVs assigned to 441 SSU-ASVs and 155 OTUs (Supplementary Dataset). The results revealed the diversity of sequence-discrete populations within lineages sharing > 97% or even 100% 16S rRNA gene sequence identity, which could not have been observed using single nucleotide-resolved short-read amplicon sequencing analysis of the same samples . The sequences were affiliated with ten phyla, four of which accounted for > 95% of total reads. Samples from the epilimnia were dominated by the phyla Actinobacteria (68.3% of reads), Proteobacteria (18.3%), and Bacteroidetes (10.8%), while the phyla Actinobacteria (50.8%), Chloroflexi (34.5%), and Proteobacteria (9.3%) dominated the hypolimnion samples (Supplementary Dataset). For further analysis, we selected 11 OTUs (hereafter “dominant lineages”) that were ubiquitous (detected in more than eight samples) and abundant (> 300 reads; 1.1% of the total). The dominant lineages collectively accounted for 85.4% of total read abundance in the present study, and included one OTU in Chloroflexi (CL500-11), one in Alphaproteobacteria (LD12), and nine in Actinobacteria (acI-A1, acI-A6, acI-A7, acI-B1, Iluma-A1, Iluma-A2, acIV-B, and two acI-C2) (Supplementary Dataset).
Intra-lineage microdiversity and phylogeographic patterns among lakes
Each of the dominant lineages harbored 5–34 SSU-ASVs and 7–101 ASVs (Supplementary Dataset). Despite the large number of ASVs detected in each lineage, the majority of ASVs were rare, with only a few ASVs accounting for most reads in each sample. The average proportion of the most abundant ASV in a sample ranged from 43.8% (acI-A1) to 84.9% (acI-C2-e), and that of the three most abundant ASVs ranged from 78.5% (acI-B1) to 100% (acI-C2-e) (Fig. 4). Based on the ASV composition of each sample (Fig. S2), Bray–Curtis dissimilarity matrices among samples were generated for each lineage (Fig. S3), and the overall pattern across lineages was evaluated by averaging the matrices of the 11 dominant lineages (Fig. 5). The results indicated that five clusters shared closely related ASV compositions (Fig. 5); while lakes in Europe and Hokkaido formed separate clusters, samples from other Japanese lakes clustered by water layer (i.e., epilimnion and hypolimnion), except for samples from Lake Inawashiro, which clustered independently (Fig. 5). Other exceptions were the hypolimnion of Lake Chuzenji (included in the Hokkaido cluster), and the epilimnion of Lake Motosu (clustered separately from all other samples) (Fig. 5). Note that we employed Bray–Curtis dissimilarity (i.e., weighted by the relative abundance) for clustering to mitigate potential biases introduced by the uneven sequencing depth among the samples. Thus, the averaged matrix among the dominant lineages (Fig. 5) reflects the composition of abundant ASVs but may not follow those of minor ASVs and minor lineages.
Clusters that were separated between Japan and Europe, as well as between Hokkaido and Honshu islands in Japan, were observed in most individual lineages (Fig. S3), indicating the universality of these trends. Planktonic organisms in inland waters can migrate between unconnected habitats on aerosols or due to animal and human activities . Our results suggest that the migration of bacterioplankton across lakes is limited by distance. It should be noted that the observed phylogeographic pattern is a consequence not only of dispersal limitation but also of successful colonization by migrating populations. The finding that a few ASVs dominated lineages within each sample (Fig. 4) supports the hypothesis that members of the same lineage (i.e., those sharing > 97% 16S rRNA gene similarity) have overlapping ecological niches; therefore, the environmental capacity for supporting multiple genotypes at the same time and place is limited. This hypothesis evokes the theory of priority effects, in which an increase in migrant genotypes due to genetic drift is limited by a high frequency of indigenous genotypes occupying the same niche . Based on these results and assumptions, we propose that migration of lake bacterioplankton between Japan and Europe is unlikely, but that migration occurs at distances of up to hundreds of kilometers with a sufficiently high frequency to overcome priority effects.
The existence of epilimnion- and hypolimnion-specific clusters rather than distance–decay relationships among lakes in Honshu and Kyushu (Fig. 5) suggests that local environments or genetic drift are stronger factors in determining the population composition within the region than dispersal limitation. We speculate that the inclusion of the hypolimnion of Lake Chuzenji within the Hokkaido cluster (Fig. 5), which was also supported by the patterns observed in individual lineages (e.g., acI-A7, Iluma-A1, acIV-B, and acI-C2-h; Fig. S3), might reflect environmental factors. Due to its high elevation, Lake Chuzenji is the coldest (with hypolimnetic temperature permanently at 4 °C or lower) and the only dimictic lake in Honshu (Tables 1 and S1), and these conditions might have selected for cold-tolerant or psychrophilic populations that are also dominant in Hokkaido lakes. The unique microbial population in Lake Inawashiro (Fig. 5) may have resulted from environmental sorting in the past. While this lake is a typical oligotrophic monomictic lake today (Tables 1 and S1), it was previously acidic (pH < 5.0) due to volcanic inflow and has experienced rapid neutralization (pH = 6.6–7.0) over the last 30 years . Because pH is one of the main drivers of lake microbial community composition , the present genotype diversity in the lake could have resulted from a bottleneck during the acidic period. Together, these results suggest that environmental sorting is a potential determinant of intra-lineage population composition in lake bacterioplankton, especially among habitats located within hundreds of kilometers. It is important to note that the environmental parameters of the samples (Tables 1 and S1) cannot directly validate the environmental sorting of the community. We cannot rule out the possibility that the observed pattern has resulted from a random genetic drift. To conclude the significance of adaptive processes shaping an intra-lineage population composition, gene- or cultivation-based evidence is required, as was demonstrated in studies on members of Polynucleobacter [15, 41] and Limnohabitans [12, 70].
Previous studies have investigated temporal trends in the intra-lineage population composition of freshwater bacterioplankton, demonstrating that the community consists of both persistent and transient populations [13, 71, 72]. In line with this finding, the temporal replicates collected in 2010 and 2015 in Lake Biwa revealed that the dominant ASVs were shared between the years, although some ASVs observed in 2010 were absent in 2015 and vice versa (Fig. S4). Remarkably, the temporal replicates from Lake Biwa were most closely related to each other based on the dissimilatory matrix (Fig. 5), indicating that the temporal change in the lake was less significant than the differences among lakes. Therefore, the observed phylogeographic pattern (Fig. 5) may be robust regardless of sampling time. However, generalization of the results should be carefully considered, as our temporal analysis was limited to two time points (5 years apart) and one replicate in Lake Biwa. Our data motivated further investigation of the potentially vast spectrum of bacterioplankton microdiversity across seasons and years in a lake, which will provide further insights into the mechanisms driving co-existence and turnover in multiple intra-lineage genotypes.
Microdiversity and phylogeographic patterns of the most abundant hypolimnetic lineage (CL500-11)
In addition to the overall phylogeographic pattern, analyses of individual lineages provided a more detailed perspective on their microdiversification. Here, we focus on CL500-11 (“Ca. Profundisolitarius”), a cosmopolitan bacterioplankton lineage that was dominant in the oxygenated hypolimnia of deep freshwater lakes [73, 74]. Although CL500-11 was almost exclusively detected in the hypolimnion, it was among the most abundant (6084 reads; 21.6% of the total) and ubiquitous (detected in nine out of 11 lakes) lineages in the present study. We identified a total of 48 ASVs, among which four were dominant (ASV_1, 2, 5, and 10) and collectively accounted for 72.6% of CL500-11 reads in the Japanese lakes, and one (ASV_13) exclusively dominated European lakes and accounted for 99.2% of reads from those sites (Figs. 3 and S2). Between the ASVs observed in Japan and Europe, 5 bp mismatches were found in the 16S rRNA gene, along with at least 15 bp mismatches and a 1 bp gap in the ITS region (Fig. 3). These results further support our finding that bacterioplankton inhabiting European and Japanese lakes are genetically distinct.
The four predominant Japanese ASVs shared identical 16S rRNA gene sequences and differed at only four variable base positions in the ITS region (Fig. 3). Each of these ASVs was detected in at least five lakes (Fig. 3), resulting in a relatively uniform population composition across Japan compared with other dominant lineages (Figs. 6 and S3). Such a low degree of diversification was unexpected and contrary to our original hypothesis that hypolimnion-specific lineages would be more deeply diversified among lakes than epilimnion inhabitants due to their limited opportunities for dispersal. Instead, our results suggest that the preference for deep water is not a major factor limiting the diversification and dispersal processes of lake bacterioplankton. One may argue that their actual genomic diversity may be even lower, as different ASVs can be derived from multiple rRNA gene operons within the same genome. However, we ruled out this possibility based on the ASVs occurring independently; for example, only one of these ASVs (ASV_1) was detected in Lake Mashu (Fig. 3).
A previous MAG-based analysis demonstrated that at least three species-level (ANI > 95%) clusters exist within CL500-11: MAGs from Lake Biwa, Lake Michigan, and the Caspian Sea were distinct (ANI < 95%) from each other, while those from Lake Zurich were found in both the Lake Michigan and Caspian Sea clusters . The finding that MAGs from Europe and North America shared ANI > 95% seems contrasting to our finding that CL500-11 populations from different continents (Japan and Europe) were genetically distant from each other. Indeed, the 16S rRNA gene and ITS sequence of the MAG from Lake Michigan  were identical to the European ASV in the present study (Fig. 3), further supporting their intercontinental genetic connectivity between Europe and North America. However, more information is required to draw a firm conclusion about their global-scale microdiversity and phylogeographic patterns, as evidence of unobserved sequence diversity has been found among the CL500-11 lineage. The ITS sequence of a CL500-11 MAG collected in Lake Zurich in May 2013  was not identical to our European ASV (Fig. 3), despite the sequence of the MAG containing complete matches to the primers used in the present study. In North America, two full-length 16S rRNA gene sequences of CL500-11 were retrieved from Crater Lake (accession number = AF316759)  and Yellowstone Lake (HM446117) , and both exhibited 5 bp mismatches or gaps compared to the European ASV and to MAGs from Lake Michigan and Lake Zurich (Fig. 4). Moreover, three partial 16S rRNA gene sequences with at least one mismatch relative to our European ASV were detected in other European perialpine lakes (Lakes Annecy, Bourget, and Geneva)  (see Figure S10 in Okazaki et al. ). We should note that the number of reads assigned to CL500-11 was considerably lower in European lakes (379 reads) than in Japanese lakes (5705 reads) due to uneven sample number and sequencing throughput between the two regions (Fig. 5 and Supplementary Dataset). Thus, it is possible that rare sequence types of CL500-11 in European lakes could not be detected in our analysis due to insufficient data. These findings indicate that the vast sequence space of the CL500-11 lineage remains underexplored both spatially and temporally.
Comparison of diversification patterns among lineages
In contrast to the limited diversification of CL500-11 among Japanese lakes, Iluma-A2 and acI-A1 represented the opposite extreme, as their population compositions were unique for most lakes (Figs. 6, S2, and S3). Other dominant lineages exhibited intermediate degrees of diversification (Fig. 6). The wide degree of diversification among lineages may be due to differences in dispersal characteristics, such as the likelihood of aerosolization [79, 80]. Alternatively, the extent of diversification may reflect an ecological strategy of a lineage to exploit its genomic diversity. That is, lineages with relatively uniform population compositions among lakes (e.g., CL500-11) may harbor a few widespread genotypes that can dominate a broad range of habitats, whereas highly diversified lineages (e.g., Iluma-A2 and acI-A1) may be composed of many specialist genotypes that collectively colonize a wide range of environments. The latter case assumes partitioning of functional capabilities among genotypes within a lineage to minimize the amount of genetic material required per cell in a given niche [14, 41, 81]. Indeed, a high degree of microdiversification, mainly related to carbon metabolism, was reported among closely related acI-A1 strains with 100% identical 16S rRNA gene sequences, and even within the ANI > 95% species boundary (e.g., “Ca. Planktophila versatilis” and “Ca. Planktophila dulcis”) . The microdiversity and ecological strategies underlying the ubiquity of freshwater bacterioplankton lineages are worth further exploration. The key step in such research is linking the 16S rRNA gene and ITS sequences with genomic and functional diversity, which requires assembly of high-quality genomes (i.e., those including the rRNA operon) that may be obtained through cultivation-based approaches [39, 41, 42] or long-read shotgun metagenome sequencing .
Finally, we note that genomic variation can exist even among cells with identical ITS sequences [15, 83, 84]. For example, the dominance of ASV_7 (the LD12 lineage) in both Lake Zurich and Japanese lakes (Fig. S2 and Supplementary Dataset) may have resulted from a shared ITS sequence among different genotypes, given the overall pattern of genetic disconnection between Japan and Europe (Fig. 5). Similarly, the unexpectedly low degree of diversification among CL500-11 sequences in Japanese lakes (Figs. 6 and S3) might be attributable to the inability to detect their genomic diversity using ITS sequences. Our method can detect differences in genotypes but cannot conclusively show homogeneity among them, and the latter characteristic remains to be tested in future works using genome-resolved approaches.
In the present study, we applied single nucleotide-resolved long-read amplicon sequencing analysis to a large collection of environmental samples obtained from 11 deep freshwater lakes in Japan and Europe. The results demonstrated sympatric, allopatric, and temporal microdiversity in lake bacterioplankton and revealed phylogeographic patterns that could not be observed based on short-read amplicon sequencing or the MAG-based approach. Whereas previous studies have reported genomes sharing ANI > 95% in freshwater habitats thousands of kilometers apart [40, 43, 44, 46] as well as in distant marine [85, 86] systems, our results consistently supported genetic isolation between lakes in Japan and Europe. The rapid accumulation of sequence data obtained from all over the world will allow for this topic to be revisited to draw broader conclusions about the global-scale dispersal and diversification processes of ubiquitous freshwater bacterioplankton lineages. Meanwhile, understanding intra-lineage population diversity at the regional scale (up to hundreds of kilometers), where dispersal limitation appears to be relatively weak, requires consideration of a complex combination of factors including the local environment, functional diversity of genotypes, migration frequency, genetic drift, and the ecological strategies of the lineage. The present study highlights the potential of long-read amplicon sequencing as a strategy for tackling these challenges. The bottleneck of our approach to be overcome in the future is achieving a constant and high number of reads per sample, which is a prerequisite to fully exploit the high sensitivity and accuracy of our approach (Fig. 2). With the rapid improvement and decreasing cost of sequencing technology, phylogenetic resolution beyond that of the 16S rRNA gene will become essential to microbial ecology and will reshape our understanding of environmental microbial diversity and ubiquity.
Availability of data and materials
The raw CCS reads generated in the present study were deposited under accession number PRJDB9651. The R script used in the DADA2 analysis workflow is available in the Supplementary Text. Read abundance tables and representative nucleotide sequences for the ASVs and OTUs are available in Supplementary Dataset.
Martiny JBH, Bohannan BJM, Brown JH, Colwell RK, Fuhrman JA, Green JL, et al. Microbial biogeography: putting microorganisms on the map. Nat Rev Microbiol. 2006;4:102–12.
van der Gast CJ. Microbial biogeography: the end of the ubiquitous dispersal hypothesis? Environ Microbiol. 2015;17:544–6.
Steen AD, Crits-Christoph A, Carini P, DeAngelis KM, Fierer N, Lloyd KG, et al. High proportions of bacteria and archaea across most biomes remain uncultured. ISME J. 2019;13:3126–30.
Lloyd KG, Steen AD, Ladau J, Yin J, Crosby L. Phylogenetically novel uncultured microbial cells dominate earth microbiomes. mSystems. 2018;3:e00055–18.
Yarza P, Yilmaz P, Pruesse E, Glöckner FO, Ludwig W, Schleifer K-H, et al. Uniting the classification of cultured and uncultured bacteria and archaea using 16S rRNA gene sequences. Nat Rev Microbiol. 2014;12:635–45.
Rodriguez-R LM, Castro JC, Kyrpides NC, Cole JR, Tiedje JM, Konstantinidis KT. How much do rRNA gene surveys underestimate extant bacterial diversity? Appl Environ Microbiol. 2018;84:e00014–8.
Brown MV, Lauro FM, DeMaere MZ, Muir L, Wilkins D, Thomas T, et al. Global biogeography of SAR11 marine bacteria. Mol Syst Biol. 2012;8:595.
Needham DM, Sachdeva R, Fuhrman JA. Ecological dynamics and co-occurrence among marine phytoplankton, bacteria and myoviruses shows microdiversity matters. ISME J. 2017;11:1614–29.
Martiny AC, Tai APK, Veneziano D, Primeau F, Chisholm SW. Taxonomic resolution, ecotypes and the biogeography of Prochlorococcus. Environ Microbiol. 2009;11:823–32.
Ahlgren NA, Perelman JN, Yeh YC, Fuhrman JA. Multi-year dynamics of fine-scale marine cyanobacterial populations are more strongly explained by phage interactions than abiotic, bottom-up factors. Environ Microbiol. 2019;21:2948–63.
Rocap G, Distel DL, Waterbury JB, Chisholm SW. Resolution of prochlorococcus and synechococcus ecotypes by using 16S-23S ribosomal DNA internal transcribed spacer sequences. Appl Environ Microbiol. 2002;68:1180–91.
Kasalický V, Jezbera J, Hahn MW, Šimek K. The diversity of the Limnohabitans genus, an important group of freshwater bacterioplankton, by characterization of 35 isolated strains. PLoS One. 2013;8:e58209.
Jezberová J, Jezbera J, Znachor P, Nedoma J, Kasalický V, Šimek K. The Limnohabitans genus harbors generalistic and opportunistic subtypes: evidence from spatiotemporal succession in a canyon-shaped reservoir. Appl Environ Microbiol. 2017;83:e01530–17.
Jezbera J, Jezberová J, Kasalický V, Šimek K, Hahn MW. Patterns of Limnohabitans microdiversity across a large set of freshwater habitats as revealed by reverse line blot hybridization. PLoS One. 2013;8:e58527.
Hahn MW, Koll U, Jezberová J, Camacho A, et al. Environ Microbiol. 2015;17:829–40.
Crosbie ND, Pöckl M, Weisse T. Dispersal and phylogenetic diversity of nonmarine Picocyanobacteria, inferred from 16S rRNA gene and cpcBA-intergenic spacer sequence analyses. Appl Environ Microbiol. 2003;69:5716–21.
Stewart FJ, Cavanaugh CM. Intragenomic variation and evolution of the internal transcribed spacer of the rRNA operon in bacteria. J Mol Evol. 2007;65:44–67.
Brown MV, Fuhrman JA. Marine bacterial microdiversity as revealed by internal transcribed spacer analysis. Aquat Microb Ecol. 2005;41:15–23.
Earl JP, Adappa ND, Krol J, Bhat AS, Balashov S, Ehrlich RL, et al. Species-level bacterial community profiling of the healthy sinonasal microbiome using Pacific Biosciences sequencing of full-length 16S rRNA genes. Microbiome. 2018;6:190.
Fichot EB, Norman RS. Microbial phylogenetic profiling with the Pacific Biosciences sequencing platform. Microbiome. 2013;1:10.
Singer E, Bushnell B, Coleman-Derr D, Bowman B, Bowers RM, Levy A, et al. High-resolution phylogenetic microbial community profiling. ISME J. 2016;10:2020–32.
Benítez-Páez A, Portune KJ, Sanz Y. Species-level resolution of 16S rRNA gene amplicons sequenced through the MinIONTM portable nanopore sequencer. Gigascience. 2016;5:4.
Cuscó A, Catozzi C, Viñes J, Sanchez A, Francino O. Microbiota profiling with long amplicons using Nanopore sequencing: full-length 16S rRNA gene and whole rrn operon. F1000Research. 2018;7:1755.
Martijn J, Lind AE, Schön ME, Spiertz I, Juzokaite L, Bunikis I, et al. Confident phylogenetic identification of uncultured prokaryotes through long read amplicon sequencing of the 16S-ITS-23S rRNA operon. Environ Microbiol. 2019;21:2485–98.
Kerkhof LJ, Dillon KP, Häggblom MM, McGuinness LR. Profiling bacterial communities by MinION sequencing of ribosomal operons. Microbiome. 2017;5:116.
Tedersoo L, Anslan S. Towards PacBio-based pan-eukaryote metabarcoding using full-length ITS sequences. Environ Microbiol Rep. 2019;11:659–68.
Heeger F, Bourne EC, Baschien C, Yurkov A, Bunk B, Spröer C, et al. Long-read DNA metabarcoding of ribosomal RNA in the analysis of fungi from aquatic environments. Mol Ecol Resour. 2018;18:1500–14.
Jamy M, Foster R, Barbera P, Czech L, Kozlov A, Stamatakis A, et al. Long-read metabarcoding of the eukaryotic rDNA operon to phylogenetically and taxonomically resolve environmental diversity. Mol Ecol Resour. 2020;20:429–43.
Callahan BJ, Wong J, Heiner C, Oh S, Theriot CM, Gulati AS, et al. High-throughput amplicon sequencing of the full-length 16S rRNA gene with single-nucleotide resolution. Nucleic Acids Res. 2019;47:e103.
Caro-Quintero A, Konstantinidis KT. Bacterial species may exist, metagenomics reveal. Environ Microbiol. 2012;14:347–55.
Jain C, Rodriguez-R LM, Phillippy AM, Konstantinidis KT, Aluru S. High throughput ANI analysis of 90 K prokaryotic genomes reveals clear species boundaries. Nat Commun. 2018;9:5114.
Olm MR, Crits-Christoph A, Diamond S, Lavy A, Matheus Carnevali PB, Banfield JF. Consistent metagenome-derived metrics verify and delineate bacterial species boundaries. mSystems. 2020;5:e00731–19.
Vollmers J, Wiegand S, Kaster A-K. Comparing and evaluating metagenome assembly tools from a microbiologist’s perspective - not only size matters! PLoS One. 2017;12:e0169662.
Olson ND, Treangen TJ, Hill CM, Cepeda-Espinoza V, Ghurye J, Koren S, et al. Metagenomic assembly through the lens of validation: recent advances in assessing and improving the quality of genomes assembled from metagenomes. Brief Bioinform. 2019;20:1140–50.
Yuan C, Lei J, Cole J, Sun Y. Reconstructing 16S rRNA genes in metagenomic data. Bioinformatics. 2015;31:i35–43.
Newton RJ, Jones SE, Eiler A, McMahon KD, Bertilsson S. A guide to the natural history of freshwater lake bacteria. Microbiol Mol Biol Rev. 2011;75:14–49.
Zwart G, Crump BC, Kamst-van Agterveld MP, Hagen F, Han SK. Typical freshwater bacteria: an analysis of available 16S rRNA gene sequences from plankton of lakes and rivers. Aquat Microb Ecol. 2002;28:141–55.
Okazaki Y, Fujinaga S, Tanaka A, Kohzu A, Oyagi H, Nakano S. Ubiquity and quantitative significance of bacterioplankton lineages inhabiting the oxygenated hypolimnion of deep freshwater lakes. ISME J. 2017;11:2279–93.
Neuenschwander SM, Ghai R, Pernthaler J, Salcher MM. Microdiversification in genome-streamlined ubiquitous freshwater Actinobacteria. ISME J. 2018;12:185–98.
Garcia SL, Stevens SLR, Crary B, Martinez-Garcia M, Stepanauskas R, Woyke T, et al. Contrasting patterns of genome-level diversity across distinct co-occurring bacterial populations. ISME J. 2018;12:742–55.
Hoetzinger M, Schmidt J, Jezberová J, Koll U, Hahn MW. Microdiversification of a pelagic polynucleobacter species is mainly driven by acquisition of genomic islands from a partially interspecific gene pool. Appl Environ Microbiol. 2017;83:e02266–16.
Salcher MM, Schaefle D, Kaspar M, Neuenschwander SM, Ghai R. Evolution in action: habitat transition from sediment to the pelagial leads to genome streamlining in Methylophilaceae. ISME J. 2019;13:2764–77.
Mehrshad M, Salcher MM, Okazaki Y, Nakano S, Šimek K, Andrei A-S, et al. Hidden in plain sight—highly abundant and diverse planktonic freshwater Chloroflexi. Microbiome. 2018;6:176.
Cabello-Yeves PJ, Haro-Moreno JM, Martin-Cuadrado A-B, Ghai R, Picazo A, Camacho A, et al. Novel Synechococcus genomes reconstructed from freshwater reservoirs. Front Microbiol. 2017;8:1151.
van Gremberghe I, Leliaert F, Mergeay J, Vanormelingen P, Van der Gucht K, Debeer A-E, et al. Lack of phylogeographic structure in the freshwater Cyanobacterium Microcystis aeruginosa suggests global dispersal. PLoS One. 2011;6:e19561.
Harke MJ, Steffen MM, Gobler CJ, Otten TG, Wilhelm SW, Wood SA, et al. A review of the global ecology, genomics, and biogeography of the toxic cyanobacterium. Microcystis spp. harmful algae. 2016;54:4–20.
Tilzer MM, Serruya C. Large lakes. Berlin, Heidelberg: Springer Berlin Heidelberg; 1990.
Hoetzinger M, Hahn MW. Genomic divergence and cohesion in a species of pelagic freshwater bacteria. BMC Genomics. 2017;18:794.
Okazaki Y, Nakano S-I. Vertical partitioning of freshwater bacterioplankton community in a deep mesotrophic lake with a fully oxygenated hypolimnion (Lake Biwa, Japan). Environ Microbiol Rep. 2016;8:780–8.
Kim S-W, Suda W, Kim S, Oshima K, Fukuda S, Ohno H, et al. Robustness of gut microbiota of healthy adults in response to probiotic intervention revealed by high-throughput pyrosequencing. DNA Res. 2013;20:241–53.
Fisher MM, Triplett EW. Automated approach for ribosomal intergenic spacer analysis of microbial diversity and its application to freshwater bacterial communities. Appl Environ Microbiol. 1999;65:4630–6.
Hahn MW, Pockl M, Wu QL. Low intraspecific diversity in a Polynucleobacter subcluster population numerically dominating bacterioplankton of a freshwater pond. Appl Environ Microbiol. 2005;71:4539–47.
Purahong W, Stempfhuber B, Lentendu G, Francioli D, Reitz T, Buscot F, et al. Influence of commonly used primer systems on automated ribosomal intergenic spacer analysis of bacterial communities in environmental samples. PLoS One. 2015;10:e0118967.
Callahan BJ, McMurdie PJ, Rosen MJ, Han AW, Johnson AJA, Holmes SP. DADA2: high-resolution sample inference from Illumina amplicon data. Nat Methods. 2016;13:581–3.
Quast C, Pruesse E, Yilmaz P, Gerken J, Schweer T, Yarza P, et al. The SILVA ribosomal RNA gene database project: improved data processing and web-based tools. Nucleic Acids Res. 2012;41:D590–6.
Pruesse E, Peplies J, Glöckner FO. SINA: Accurate high-throughput multiple sequence alignment of ribosomal RNA genes. Bioinformatics. 2012;28:1823–9.
Rognes T, Flouri T, Nichols B, Quince C, Mahé F. VSEARCH: a versatile open source tool for metagenomics. PeerJ. 2016;4:e2584.
Ludwig W, Strunk O, Westram R, Richter L, Meier H, Yadhukumar A, et al. ARB: A software environment for sequence data. Nucleic Acids Res. 2004;32:1363–71.
Oksanen J, Blanchet FG, Friendly M, Kindt R, Legendre P, McGlinn D, et al. Vegan: community ecology package. 2018;:https://cran.r-project.org/package=vegan.
Gu Z, Eils R, Schlesner M. Complex heatmaps reveal patterns and correlations in multidimensional genomic data. Bioinformatics. 2016;32:2847–9.
Okazaki Y, Nishimura Y, Yoshida T, Ogata H, Nakano S. Genome-resolved viral and cellular metagenomes revealed potential key virus-host interactions in a deep freshwater lake. Environ Microbiol. 2019;21:4740–54.
Thorvaldsdottir H, Robinson JT, Mesirov JP. Integrative Genomics Viewer (IGV): high-performance genomics data visualization and exploration. Brief Bioinform. 2013;14:178–92.
Nurk S, Meleshko D, Korobeynikov A, Pevzner PA. metaSPAdes: a new versatile metagenomic assembler. Genome Res. 2017;27:824–34.
Brewer TE, Albertsen M, Edwards A, Kirkegaard RH, Rocha EPC, Fierer N. Unlinked rRNA genes are widespread among bacteria and archaea. ISME J. 2020;14:597–608.
Piwosz K, Shabarova T, Pernthaler J, Posch T, Šimek K, Porcal P, et al. Bacterial and eukaryotic small-subunit amplicon data do not provide a quantitative picture of microbial communities, but they are reliable in the context of ecological interpretations. mSphere. 2020;5:e00052–20.
Incagnone G, Marrone F, Barone R, Robba L, Naselli-Flores L. How do freshwater organisms cross the “dry ocean”? A review on passive dispersal and colonization processes with a special focus on temporary ponds. Hydrobiologia. 2015;750:103–23.
De Meester L, Gómez A, Okamura B, Schwenk K. The monopolization hypothesis and the dispersal–gene flow paradox in aquatic organisms. Acta Oecologica. 2002;23:121–35.
Sutani D, Utsumi M, Kato Y, Sugiura N. Estimation of the changes in phytoplankton community composition in a volcanic acidotrophic lake, Inawashiro, Japan. Japanese J Water Treat Biol. 2014;50:53–69.
Lindström ES, Kamst-Van Agterveld MP, Zwart G. Distribution of typical freshwater bacterial groups is associated with pH, temperature, and lake water retention time. Appl Environ Microbiol. 2005;71:8201–6.
Kasalický V, Zeng Y, Piwosz K, Šimek K, Kratochvilová H, Koblížek M. Aerobic anoxygenic photosynthesis is commonly present within the genus Limnohabitans. Appl Environ Microbiol. 2018;84:e02116–17.
Newton RJ, Kent AD, Triplett EW, McMahon KD. Microbial community dynamics in a humic lake: differential persistence of common freshwater phylotypes. Environ Microbiol. 2006;8:956–70.
García-García N, Tamames J, Linz AM, Pedrós-Alió C, Puente-Sánchez F. Microdiversity ensures the maintenance of functional microbial communities under changing environmental conditions. ISME J. 2019;13:2969–83.
Okazaki Y, Salcher MM, Callieri C, Nakano S. The broad habitat spectrum of the CL500-11 lineage (phylum Chloroflexi), a dominant bacterioplankton in oxygenated hypolimnia of deep freshwater lakes. Front Microbiol. 2018;9:2891.
Okazaki Y, Hodoki Y, Nakano S. Seasonal dominance of CL500-11 bacterioplankton (phylum Chloroflexi ) in the oxygenated hypolimnion of Lake Biwa. Japan. FEMS Microbiol Ecol. 2013;83:82–92.
Denef VJ, Mueller RS, Chiang E, Liebig JR, Vanderploeg HA. Chloroflexi CL500-11 populations that predominate deep-lake hypolimnion bacterioplankton rely on nitrogen-rich dissolved organic matter metabolism and C 1 compound oxidation. Appl Environ Microbiol. 2016;82:1423–32.
Urbach E, Vergin KL, Young L, Morse A, Larson GL, Giovannoni SJ. Unusual bacterioplankton community structure in ultra-oligotrophic Crater Lake. Limnol Oceanogr. 2001;46:557–72.
Yang T, Lyons S, Aguilar C, Cuhel R, Teske A. Microbial communities and chemosynthesis in Yellowstone Lake sublacustrine hydrothermal vent waters. Front Microbiol. 2011;2:130.
Humbert J-F, Dorigo U, Cecchi P, Le Berre B, Debroas D, Bouvy M. Comparison of the structure and composition of bacterial communities from temperate and tropical freshwater ecosystems. Environ Microbiol. 2009;11:2339–50.
Michaud JM, Thompson LR, Kaul D, Espinoza JL, Richter RA, Xu ZZ, et al. Taxon-specific aerosolization of bacteria and viruses in an experimental ocean-atmosphere mesocosm. Nat Commun. 2018;9:2017.
Fahlgren C, Gómez-Consarnau L, Zábori J, Lindh MV, Krejci R, Mårtensson EM, et al. Seawater mesocosm experiments in the Arctic uncover differential transfer of marine bacteria to aerosols. Environ Microbiol Rep. 2015;7:460–70.
Sangwan N, Zarraonaindia I, Hampton-Marcell JT, Ssegane H, Eshoo TW, Rijal G, et al. Differential functional constraints cause strain-level endemism in polynucleobacter populations. mSystems. 2016;1:e00003–16.
Moss EL, Maghini DG, Bhatt AS. Complete, closed bacterial genomes from microbiomes using nanopore sequencing. Nat Biotechnol. 2020; https://doi.org/10.1038/s41587-020-0422-6.
Biller SJ, Berube PM, Lindell D, Chisholm SW. Prochlorococcus: the structure and function of collective diversity. Nat Rev Microbiol. 2015;13:13–27.
Hahn MW, Pöckl M. Ecotypes of planktonic actinobacteria with identical 16S rRNA genes adapted to thermal niches in temperate, subtropical, and tropical freshwater habitats. Appl Environ Microbiol. 2005;71:766–73.
Haro-Moreno JM, Rodriguez-Valera F, Rosselli R, Martinez-Hernandez F, Roda-Garcia JJ, Gomez ML, et al. Ecogenomics of the SAR11 clade. Environ Microbiol. 2020;22:1748–63.
Delmont TO, Kiefl E, Kilinc O, Esen OC, Uysal I, Rappé MS, et al. Single-amino acid variants reveal evolutionary processes that shape the biogeography of a global SAR11 subclade. Elife. 2019;8:e46497.
We are grateful to T. Koitabashi, Y Goda, T Denboh, Y Ito, M Sugiyama, and E Loher for their assistance in field sampling.
YO was supported by The Kyoto University Foundation Overseas’ Research Fellowship. The work in the Japanese lakes was supported by JSPS KAKENHI (15 J00971, 15 J01065, and 18 J00300) and the Environment Research and Technology Development Fund (5-1304 and 5-1607) of the Ministry of the Environment, Japan. MMS was supported by the Grant Agency of the Czech Republic (GAČR projects 19-23469S and 20-12496X) and the Swiss National Science Foundation (SNSF project 310030_185108). The work in Lake Maggiore was supported by the International Commission for the Protection of Italian-Swiss Waters (CIPAIS).
Ethics approval and consent to participate
Consent for publication
The authors declare that they have no competing interests.
Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
Additional file 1:
Additional file 2:
Supplementary Table S2. Comparison of the 23Sr (original) and 23Sr-mod (modified for the present study) primers, showing their coverage for each phylum. Coverage was determined using the TestProbe 3.0 tool with reference to the SILVA LSU 132 Parc database (Quast et al.,  allowing no mismatches.
Additional file 3:
Figure S1. Comparison of the taxonomic composition (at the phylum level) of reads generated using long-read (this study) and short-read  platforms. Data from nine Japanese lakes sampled in 2015 were averaged for both water layers. Note that the same DNA extracts were used for both studies. Figure S2. The relative abundance of ASVs in each sample for each lineage. Rows and columns are clustered based on the Bray-Curtis dissimilarity among samples and ASVs, respectively (see Materials and Methods for detail). Abbreviations for sample names follow those in Fig. 5. Legends are shown at the bottom. Figure S3. Clustering of samples based on Bray–Curtis dissimilarity of amplicon sequence variant composition for each lineage. Abbreviations for sample names follow those in Fig. 5. Figure S4. Comparison of the five most abundant amplicon sequence variants (ASVs) between temporal replicates (2010 and 2015) collected in Lake Biwa for each water layer. Bars indicate the relative abundances of ASVs within each lineage and are ordered by abundance rank for each sample. Gray lines indicate succession of ranks between two time points; N.D., not detected. Supplementary Text (R script).
Additional file 4:
Supplementary Dataset. Summarized dataset related to amplicon sequence variants (ASVs) and operational taxonomic units (OTUs). The Excel file consists of two worksheets. In the first sheet (named “by_ASV”), each row represents an individual ASV. The columns indicate the corresponding SSU-ASV and OTU IDs, the taxonomy assigned to the OTU, and read abundance in each sample. The nucleotide sequences of the ASVs are shown in the last column. In the second sheet (named “by_OTU”), each row represents an individual OTU. The columns indicate the taxonomy, classification based on freshwater bacterioplankton nomenclature (see Methods for details), the number of SSU-ASVs and ASVs assigned to the OTU, the number of samples in which the OTU was detected, and read abundances in total and in each sample. OTUs are sorted based on the total read number. Representative 16S rRNA gene sequences of the OTUs are shown in the last column. In each sheet, the total number of the reads assigned to each sample is shown in the first row. Abbreviations for sample names follow those in Fig. 5.
Rights and permissions
Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article's Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article's Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by/4.0/. The Creative Commons Public Domain Dedication waiver (http://creativecommons.org/publicdomain/zero/1.0/) applies to the data made available in this article, unless otherwise stated in a credit line to the data.
About this article
Cite this article
Okazaki, Y., Fujinaga, S., Salcher, M.M. et al. Microdiversity and phylogeographic diversification of bacterioplankton in pelagic freshwater systems revealed through long-read amplicon sequencing. Microbiome 9, 24 (2021). https://doi.org/10.1186/s40168-020-00974-y
- Freshwater bacterioplankton
- Long-read amplicon sequencing
- Ribosomal internal transcribed spacers