Study station, sampling, and environmental information
Surface water samples were collected in Xinglinwan Reservoir, Xiamen City, Fujian Province, Southeast China, at three stations (station C: 24° 36′ 53′′ N, 118° 03′ 11′′ E; station L: 24° 36′ 21′′ N, 118° 03′ 37′′ E; station G: 24° 36′ 09′′ N, 118° 03′ 59′′ E) (Fig. 1a). Specifically, samples were collected approximately daily from August 12, 2016 to August 30, 2016 at station C (12 samples), daily or twice a week from August 12 to September 20 in 2016 at station L (22 samples), and approximately daily from August 12 to September 20, 2016 and then twice a week from September 23, 2016 to August 18, 2017 at station G (116 samples). Each water sample was divided into two subsamples: one for microeukaryotic plankton analyses and the other for water chemistry analyses. About 500 mL of surface water (upper 50 cm) was filtered through a 200-μm mesh to remove larger particles and then filtered through 0.22-μm pore-size polycarbonate membrane filters (47-mm diameter, Millipore, Billerica, MA, USA) to collect the microeukaryotic cells within 60-min. The filters were then stored at – 80 °C until further analysis.
In addition, 18 environmental variables were measured or collected (Additional file 1: Figure S1 and S2). Water temperature, pH, dissolved oxygen, turbidity, electrical conductivity, salinity, and oxidation-reduction potential (ORP) were measured in situ with a Hydrolab DS5 multiparameter water quality analyzer (Hach Company, Loveland, CO, USA). Chl-a concentrations were quantified by PHYTO-PAM Phytoplankton Analyzer (Heinz Walz GmbH, Eichenring, Germany). Total carbon (TC), total organic carbon (TOC), total nitrogen (TN), ammonium nitrogen (NH4-N), nitrate nitrogen (NO3-N), nitrite nitrogen (NO2-N), total phosphorus (TP), and phosphate phosphorus (PO4-P) were measured according to the standard methods described in our previous study [59]. Precipitation and daily average wind speed data were downloaded from the Xiamen Meteorological Bureau. The precipitation data consisted of a 7-day accumulation before the sampling day. To study the relationship between salinity and precipitation, data on daily precipitation were collected and 3-year salinity data were measured from 2016 to 2018 (Additional file 1: Figure S3).
DNA extraction, PCR, and Illumina sequencing
The total DNA was extracted from the filters using the FastDNA SPIN Kit and the FastPrep Instrument (MP Biomedicals, Santa Ana, CA, USA) following the manufacturer’s instructions. For the microeukaryotic plankton community, the V9 region of eukaryotic 18S rRNA gene was amplified using the primer pair 1380F and 1510R [60]. The 30-μL PCR reaction included 15-μL of Phusion High-Fidelity PCR Master Mix (New England Biolabs, Beverly, MA, USA), 0.2 μM of each primer, and 10 ng of community DNA. The PCR reactions included initial denaturation at 98 °C for 1 min, followed by 30 cycles of 10 s at 98 °C, 50 °C for 60 s, and 72 °C for 30 s. At the end of the amplification, the amplicons were subjected to a final 10 min extension at 72 °C. The PCR products from triplicate reactions per sample were pooled and gel-purified. All libraries were sequenced on the Illumina HiSeq platform (Illumina Inc., San Diego, CA, USA) using a paired-end (2 × 150 bp) approach.
To explore whether bacterial communities had an impact on the microeukaryotic plankton community, the V3-V4 hypervariable regions of the 16S rRNA gene were amplified, purified, and quantified following our previous procedure [61]. The triplicate PCR products were pooled together, and sequencing was performed on the Illumina MiSeq platform (Illumina, Inc., San Diego, CA, USA) using 2 × 250 bp paired-end sequencing approach.
Bioinformatics
Pairs of reads from both the 18S rRNA and 16S rRNA gene were processed using VSEARCH v.2.14.1 [62]. Quality check and sequence merge were conducted using MOTHUR v.1.39.5 [63], and the filtered reads (“sequences”) were then processed as unique sequences using “minuniquesize 8” parameter in VSEARCH. The unoise3 algorithm was used to discard chimeras and assign operational taxonomic units (OTUs) at a 97% sequence similarity threshold in USEARCH v11 [64]. Subsequently, for microeukaryotic plankton, representative sequences from each OTU were taxonomically classified using an 80% confidence threshold against the Protist Ribosomal Reference (PR2) reference sequence database [65]. To make the taxonomic classification more user friendly and portable, the taxonomic assignments were adjusted to be in accordance with the taxonomic reference of eukaryotes [66, 67]. To minimize inclusion of sequencing errors, OTUs present in < 5 samples with < 10 sequences were excluded from the downstream analyses. After the OTU table was generated, we randomly rarefied a subset of 146,973 sequences from each of the 150 samples to standardize the sequencing effort using MOTHUR v.1.39.5 [63]. The 146,973 sequences were selected because they represented the sample with the lowest sequence number from all the samples. Finally, the total dataset retained 19,952 OTUs and 22,045,950 sequences at 97% similarity threshold. OTU numbers of unclassified Eukaryota accounted for 19.2% of the whole OTUs, and sequences of unclassified Eukaryota accounted for 3.3% of the whole sequences. For bacterial communities, OTU sequences were taxonomically classified by running USEARCH v11 against the Greengenes database [68]. All archaea, chloroplasts, eukaryota, mitochondria, and unknown sequences were discarded. Bacterial OTUs present in < 5 samples with < 10 sequences were removed. Finally, the total dataset was randomly normalized to 52,248 sequences for each of 116 samples from station G, and these sequences were clustered into 16,153 OTUs at a 97% similarity threshold.
Considering that the microeukaryotic OTUs identified at the 97% similarity level with the unoise3 algorithm is not a specific and accurate estimation of the species or strain level diversity, we further defined ASVs (amplicon sequence variants) using the unoise3 algorithm, as described previously [69]. Reads were quality-filtered to a maximum expected error threshold of 1.0, and then unoise3 was performed to identify ASVs using default settings dataset. In this study, ASVs were included only for beta-diversity analysis of microeukaryotic plankton to assess simply whether our results were biased by the OTUs definition approach.
Real-time quantitative PCR
Real-time quantitative PCR (qPCR) was used to quantify the number of microeukaryotic plankton 18S rRNA gene copies using a LightCycler 480 instrument (Roche, Basel, Switzerland). The 20-μL reaction mixture consisted of 10 μL 2 × LightCycler 480 SYBR Green I Master Mix (Roche, Basel, Switzerland), 2 μL DNA template, 0.8 μM of each primer, and 6.4 μL RNase-free water. The PCR runs included tested samples and a negative control in triplicate. The following thermal cycling conditions were used: 30 s at 94 °C, followed by 40 cycles of 5 s at 94 °C, 15 s at 50 °C, and 10 s at 72 °C. Gene fragments were diluted (108−102 gene copies/μL) to generate the standard curve using a plasmid containing the 18S rRNA gene. The amplification efficiency (E) of qPCR was calculated using the equation E = [10(−1/slope) − 1]. The qPCR efficiency of 18S rRNA ranged from 85 to 108% in this study. In addition, qPCR amplification of bacterial 16S rRNA gene was also performed using a Lightcycler 480 instrument (Roche, Basel, Switzerland) according to our previous method [70]. The qPCR efficiency of 16S rRNA gene was between 95 and 105%.
Definition of core and satellite taxa
Partitioning microbial communities into core and satellite taxa according to their abundance and occurrence frequency has contributed to our understanding of community assembly and functioning in many spatiotemporal datasets [32, 71]. We arbitrarily defined “core” and “satellite” taxa based on previous study [33]; thus, core taxa were defined as the OTUs with an occurrence frequency ≥ 75% in all samples and satellite taxa as the OTUs with an occurrence frequency < 50% in all samples. Detailed descriptions of core and satellite datasets are presented in Additional file 1: Figure S4 and Additional file 1: Table S2.
Statistical analyses
Alpha-diversity index (i.e., Shannon-Wiener index) and Tukey’s honestly significant difference (Tukey HSD) post hoc test were conducted using R software (version 3.6.1) [72]. The alpha-diversity index was calculated for each sample using the diversity function in the “vegan” package [72]. Temporal and salinity effects on alpha-diversity were evaluated using two-way analysis of variance (two-way ANOVA).
Microeukaryotic plankton community composition was visualized using non-metric multidimensional scaling (NMDS) based on Bray-Curtis dissimilarities. Analysis of similarity (ANOSIM) was used to evaluate differences in microeukaryotic plankton communities between groups. To reveal the temporal dynamics of microeukaryotic plankton communities, a time-lag regression analysis was applied to analyze the Bray-Curtis dissimilarity between each pair of samples, and the time difference (time-lag) was then plotted against the community dissimilarity. The effects of time and salinity on the Bray-Curtis dissimilarity were evaluated using Spearman’s rank correlation. In this study, we used two types of time: the absolute time span and the annual cycle time span. For example, the time lag between January and December should be (i.e., from January to December) 1 rather than 12 in the second time type.
In order to study the influence of abiotic (environmental factors) and biotic (bacterial community) variables on microeukaryotic plankton community composition, we computed the pairwise Bray-Curtis distances between samples on the basis of the relative abundance of microeukaryotic plankton (the compositional data). We also computed the pairwise Euclidean distance between the samples on the basis of environmental data and bacterial alpha- and beta-diversity. Then, partial Mantel test [73] was performed to assess the correlation between microeukaryotic plankton community composition and environmental variables or bacterial data, respectively.
Further, to determine the relative contribution of time (in this study: month) and salinity to the assembly of microeukaryotic plankton communities, Mantel and partial Mantel tests were applied [73]. The similarity matrices of the microeukaryotic plankton community were generated based on the Bray-Curtis index. The time and salinity matrices were obtained using the Euclidean distance. Mantel test assessed the correlation between microeukaryotic plankton community (Bray-Curtis dissimilarity) and salinity (Euclidean distance) or time (Euclidean distance), respectively. The partial Mantel test was performed to estimate the relative contribution of salinity or time variables to the changes in the microeukaryotic plankton community.
In addition, to further study the roles of core and satellite taxa on ecosystem functions, we calculated the multi-nutrient cycling index that can track the cycling of multiple nutrients in aquatic ecosystem [74] (see Additional file 1 for a detailed description). Afterwards, random forest (RF) machine learning [75] was used to assess the effects of alpha- and beta-diversity of core and satellite subcommunities on the multi-nutrient cycling index (see Additional file 1 for detailed description).
Neutral community model
To estimate the effects of stochastic processes on the microeukaryotic plankton community assembly, a neutral community model was used [76], applying non-linear least-squares to generate the best fit between the frequency of OTUs occurrence and their relative abundance [77]. R2 value indicates the goodness of fit to the model, which was calculated following the “Östman’s method” [78]. When R2 is close to 1, the community assembly is fully consistent with stochastic processes. When it does not describe the community composition, R2 can be ≤ 0. Model computations were performed with R version 3.6.1 [72].
Habitat niche breadth
To explore the relative effects of stochastic and deterministic processes on microeukaryotic plankton communities, we calculated Levins’ niche breadth (B) index for the microeukaryotic plankton using the formula:
$$ {B}_j=\frac{1}{\sum_{i=1}^N{P_{ij}}^2} $$
where Bj indicates the habitat niche breadth of OTU j in a metacommunity; N represents the total number of communities in each metacommunity; Pij is the proportion of OTU j in community i [23, 47]. A given OTU with high B value represents a wide habitat niche breadth. The community level B value (Bcom) was calculated as the average of B values from all taxa occurring in one given community [23, 49]. A microeukaryotic plankton community with a wide niche breadth is expected to be metabolically more flexible at the community level than one with a narrow niche breadth [23, 47, 49]. The analysis was performed using the “niche.width” function within R package “spaa” [79].
Null model
We tested clustering or overdispersion of microeukaryotic plankton communities by examining the deviation of each observed metric from the average of the null model (checkerboard score (C-score)) [80]. The values obtained were standardized to allow comparisons among assemblages using the standardized effect size (SES). Specifically, the sequence table was transformed into a binary matrix of presence (1) and absence (0), and then SES was calculated under the null model [81]. The SES for C-score was estimated as the difference between the observed index and the mean of the stimulated index over the standard deviation of the stimulated index [82]. The higher or lower SES value than the expected null value is interpreted as overdispersion or underdispersion, respectively, and the magnitude of SES is interpreted as the strength of the effect of deterministic processes on the assemblage [83]. C-score was evaluated based on 30,000 simulations and using the sequential swap randomization algorithm with the package “EcoSimR” in R version 3.6.1 [72].
Network construction
We constructed species co-occurrence networks based on samples from the whole study period (August 2016–August 2017, 116 samples) for station G in the Xinglinwan Reservoir. To reduce the complexity of the datasets, we removed OTUs present in less than 20 samples with less than 200 sequences for the construction of networks. We also constructed community subnetworks for the all, core, and satellite microeukaryotic plankton based on samples at the three salinity levels, respectively.
SparCC was used to calculate pairwise correlations between plankton OTUs [84]. Only robust (|r| > 0.6) and statistically significant (P < 0.01) correlations were incorporated into the network analyses. Network visualization was generated with Cytoscape version 3.6.1 and Gephi version 0.9.1. Each node indicates a given OTU, and each edge represents a significant correlation between two OTUs. Degree represents the number of edges connecting each node to the rest nodes of the network. Normally, the high topological characteristic values (such as node, edge, and degree) suggest a more complex network. In general, there are two common network distributions. One is the random network. A random network follows a Poisson distribution of edges per node, meaning that there are no highly associated nodes and that most nodes exhibit a similar number of edges [85]. The other is non-random network. That is, scale-free or small-world network that has a power-law distribution, implying that some nodes are highly associated and maintaining the network together [86, 87].
Based on metabolic network approaches [88], the network hubs (Zi-score > 2.5; Pi-score > 0.62), module hubs (Zi-score > 2.5; Pi-score < 0.62), connectors (Zi-score < 2.5; Pi-score > 0.62), and peripherals (Zi-score < 2.5; Pi-score < 0.62) were identified [87]. All hubs and connector nodes could be defined as potential keystone species in co-occurrence networks [89]. Furthermore, the 1000 Erdös-Réyni random networks, which exhibit the same number of nodes and edges as the real networks, were calculated in the “igraph” R package, with each edge having the same probability of being assigned to a node [85]. To further describe the topological parameters of the networks, a set of metrics of both real and random networks were calculated and compared: clustering coefficient, average path length, and modularity.
The network dissimilarity between different salinity levels was identified using the widely applied equation, which consists of a re-expression of classical measures of dissimilarity following a partition of shared and total items [90, 91]:
$$ {\beta}_w=\frac{a+b+c}{\left(2a+b+c\right)/2}-1 $$
where βw is dissimilarity between networks B and C, a represents number of shared edges between networks B and C, b represents number of edges unique to network B, and c represents number of edges unique to network C.
Finally, network stability was evaluated by removing nodes in the static network to estimate how quickly robustness degraded, and network robustness was assessed by natural connectivity of the nodes. The node removing was a random repetitive process [92].