Dissecting the factors shaping fish skin microbiomes in a heterogeneous inland water system

Background Fish skin microbiomes are rarely studied in inland water systems, in spite of their importance for fish health and ecology. This is mainly because fish species distribution often covaries with other biotic and abiotic factors, complicating the study design. We tackled this issue in the northern part of the Jordan River system, in which a few fish species geographically overlap, across steep gradients of water temperature and salinity. Results Using 16S rRNA metabarcoding, we studied the water properties that shape the skin bacterial communities, and their interaction with fish taxonomy. To better characterise the indigenous skin community, we excluded bacteria that were equally abundant in the skin samples and in the water samples, from our analysis of the skin samples. With this in mind, we found alpha diversity of the skin communities to be stable across sites, but higher in benthic loaches, compared to other fish. Beta diversity was found to be different among sites and to weakly covary with the dissolved oxygen, when treated skin communities were considered. In contrast, water temperature and conductivity were strong factors explaining beta diversity in the untreated skin communities. Beta diversity differences between co-occurring fish species emerged only for the treated skin communities. Metagenomics predictions highlighted the microbiome functional implications of excluding the water community contamination from the fish skin communities. Finally, we found that human-induced eutrophication promotes dysbiosis of the fish skin community, with signatures relating to fish health. Conclusions Consideration of the background water microbiome when studying fish skin microbiomes, across varying fish species and water properties, exposes patterns otherwise undetected and highlight among-fish-species differences. We suggest that sporadic nutrient pollution events, otherwise undetected, drive fish skin communities to dysbiosis. This finding is in line with a recent study, showing that biofilms capture sporadic pollution events, undetectable by interspersed water monitoring. Video abstract.


Introduction
The importance of the cutaneous mucus in fish is well established. The teleost epidermal mucus provides mechanical protection against physical and biological harm; thanks to its viscosity and high turnover [1,2], and it contains agents taking part in ecological interactions [3]. Additionally, it is a primary immune response site, in which the innate immune system and antimicrobial peptides are highly active [4]. Other biochemical activities involving defensins, lysozymes and lectin-like agglutinins additionally respond to pathogens [5]. In contrast, many mutualistic and commensal microbes are well adapted to use the mucus as adhesion site and can evade the defence mechanisms it provides [6]. This community also interferes with infections [7][8][9], via competition or antagonistic interactions [10,11]. Dysbiosis of the skin microbial community can drive it out of homoeostasis and promote infection [12], although not every perturbation in the microbiome must lead to the loss of function [13].
Although the skin microbiome in fish has not been the focus of microbiome research, some important progress has been made by a few research groups. The skin microbiome is known to be affected by both environmental and fish species dependent factors [14,15], with evidence for cophylogeny in coral reef fish [15]. On the population level, however, the existence of microbiome covariation with host genetics is inconsistent among systems [16][17][18]. Interpopulation variation appears to rely, in part, on variable resolution of antagonistic relationships among microbial species [17]. Capture stress has been shown to correlate with microbiome contamination, in particular by Vibrio spp. [19]. Conversely, perceived opportunistic pathogens such as Vibrio spp. appear to constitute small fractions of normal microbiomes and culture-dependent techniques grossly over-represent them [20]. Additional studies identified stress indicators [21] and probiotics candidates [22], both with conceivable applications in aquaculture and nature conservation, as well as the finding that captivity reduces the skin microbiome biodiversity [18,23,24]. Consistent salinity bioindicators were also recovered in an experimental system utilizing euryhaline fish [25].
While most of the current research is targeted at fish species with commercial relevance [2,[26][27][28][29] or food safety [30], a few studies have dissected wild fish communities or populations, utilizing deep-sequencing culture-independent methods [15,16,31] and leaving the vast majority of wild habitats unexamined [32]. In the wild, particularly in fragmented and heterogenous inland water systems, it is difficult to test the effect of geographically varying abiotic conditions on a given species, since the fish community composition often covaries with them [25].
In this work, we have sampled the upper reaches of the Jordan River system and Springs Valley streams, north and south of the Sea of Galilee, respectively. This range includes heterogeneous sites, differing in fish community composition and water properties. We sampled mostly in nature reserves, although three of the sites suffer humaninduced eutrophication, one of which is a settling pool and two others receiving fish farm and fishpond outlets. The geographic range of a few fish species in this part of the system partially overlap, thus allowing us to study host and site-dependent effects on fish skin microbiomes. Due to the sensitivity of the sampled ecosystem, we employed a non-destructive sampling procedure, swabbing the captured fish on site and immediately releasing them. Our results reveal effects of both fish species and sampling site on the skin microbiome, highlight the importance of considering the background microbial contamination of the swab samples by the water and show that eutrophication may drive the skin microbiome to dysbiosis.

Sampling
To study the microbial diversity in freshwater fish skin and the factors shaping it, we have sampled a cumulative number of 14 species from 17 locations representing three streams north of the Sea of Galilee (three to six sites in each stream) and two streams to its south (one and two sites per stream). We will hereafter denote the two regions the "northern" and "southern" basins ( Fig. 1; Additional file 4: Table S1). Additionally, we collected 2 l of water in each site. In total, we accumulated 176 fish skin swab samples and 17 water bottles. In the northern basin, Capoeta damascina (Cyprinidae) were collected from all sites in the Hermon (H) and Snir (S) streams, and from two sites in the Jordan River (J). The species most co-occurring with C. damascina was Oxynoemacheilus insignis (Nemacheilidae), which was found in three H sites, one S site and one J site. Unlike C. damascina, O. insignis was also captured in Tel-Saharonim Stream (T, southern basin). Another relatively widely dispersed group included the Tilapiine (Cichlidae) species Coptodon zillii (formerly Tilapia), Sarotherodon galilaeus and hybrids of Oreochromis aureus, which were found in three J sites in the northern basin, co-occurring with C. damascina in one site, and in the two southern basin streams, co-occurring with O. insignis in one site. The remaining species, belonging to Cyprinidae, Haplochrominae (Cichlidae), Poeciliidae and Mugilidae, had a narrow geographic rage and a small geographic overlap with other species (Additional file 4: Table S1).
Sequence data: "raw swab" and "skin-corrected" bacterial communities To minimize our impact on the sampling site, we rubbed fish along their lateral line on site, using sterile swabs, and immediately released them. This method resulted in variable DNA quantities retrieved from each swab, and a subset of samples was selected post hoc. According to alpha diversity rarefaction curves, the alpha diversity in both swab and water samples was thoroughly represented by 1000 sequences or more (Additional file 1: Figure S1). Consequently, after exclusion of organelle reads that may amount to as much as half the sequence data, we retained 120 fish skin samples and 10 water samples, with a mean sequence read count of 2996 sequence reads, ranging from 1022 to 6686 reads of individual samples. We considered this dataset to represent "raw swab communities". We further filtered the biom table to include only amplicon sequence variants (ASV) that were unique to swab samples, or that had significantly higher relative abundance in swab samples than in water samples, based on Benjamini-Hochberg corrected [33] Mann-Whitney U test [34]. This data set was denoted the "corrected skin communities". Throughout the results, we address both the raw swab communities and the corrected skin communities to study the effect of this analytic procedure.

Bacterial diversity
The following approach was taken to study the factors shaping the fish skin microbiome. Alpha and beta diversity were quantified with Faith's phylogenetic diversity (Faith PD) index [35] and an unweighted UNIFRAC [36] pairwise distance matrix, respectively. Significance differences among location and fish taxonomy categories were then tested with Benjamini-Hochberg corrected [33] Kruskal-Wallis [37] and PERMANOVA [38] tests, for alpha and beta diversity, respectively. Principal coordinates analyses (PCoA) [39,40] were used to visualise beta diversity clusters and the proportion of total variance they explain, coupled with biplot analyses [40], to detect the ASVs that change among the PCoA clusters. ANCOM tests [41] were used to identify ASVs that vary between sites or fish taxa. We further used Pearson correlation [42] to study the correlation between the water temperature, conductivity, pH or dissolved oxygen, and the Faith PD values, or the first or second PCoA axis values. The entire procedure was carried out twice, for the raw swab communities and the corrected skin communities.

Alpha diversity
Alpha diversity results are summarised in Fig. 3. Similar mean Faith PD values were found in raw swab communities (8.6 ± 2.6 SD) and water sample communities (9.2 ± 2.2), compared to the lower values computed for corrected skin communities (3.7 ± 0.8). When considering the raw swab communities, both the stream (p value = 1.8 × 10 − 11 ) and fish family or tribe (p value = 0.002) were significant factors, with many significant pairwise differences among streams (10 − 7 < q value < 0.01; Fig. 3a; Additional file 5: Table S2). Some additional significant differences were found among fish taxa, but only among largely non-overlapping fish families (Cyprinidae and Haplochrominae; q value = 0.047, Cyprinidae and Tilapiinae; q value = 0.047; Fig. 3b; Additional file 6: Table S3). For these pairs of taxa, we cannot tease apart the location effect from that of fish taxonomy due to the covariance of the two factors. When considering the corrected skin communities, all pairwise differences between streams were non-significant, in contrast with the raw swab community results. The only significant differences found were between Nemacheilidae and each of its co-occurring fish taxa Cyprinidae (q value = 0.015 and Additional file 2: Figure S2B) and Tilapiinae (q value = 0.015 and Additional file 2: Figure S2B). Additionally, posterior distributions of the corrected skin samples' Faith PD μ values were lower than for the raw swab samples for all sites (Additional file 2: Figure S2C) and fish families (Additional file 2: Figure S2D). For the corrected skin communities, the water temperature, conductivity and salinity no longer explained Faith PD values (R 2 = 0.02, 0.06 and 0.04, respectively). The Faith PD values of the water communities covaried with those of the raw swab communities, with the exception of sites with low water temperatures (Fig. 3c).
While the Qiime2 PCoA analysis has the advantage of incorporating the ASV phylogeny through the UniFrac distance matrix, it utilises linear functions to reduce the dimensionality of the data. Similarly, Pearson's correlation assumes linear relationships between the PCoA axes and the environmental measurements tested. To evaluate the extent to which assumptions of linearity bias the results, we have repeated the analysis, replacing the PCoA analysis with a kernel principal component analysis (PCA). Kernel PCA, while agnostic to the ASV phylogeny, allows to compare a linear kernel with a radial basis function (RBF) kernel, which does not assume linearity [43]. We then tested the correlation of principal components with environmental measurements using epsilon support vector regression (SVR) [44], with a linear kernel or RBF kernel (non-linear). The results of both linear and non-linear non-phylogenetic approaches supported those described in Additional file 3: Figure S3.

Beta diversity
Both stream-based (Additional file 7: Table S4) and fish family-based (Additional file 8: Table S5) groupings were globally significant for the raw and corrected communities (p value = 0.001), as well as pairwise stream comparisons (0.001 < q value < 0.004). However, significant differences between pairs of fish taxa that co-occur geographically were found only for the corrected skin communities. These pairs included Nemacheilidae and its co-occurring fish taxa (Cyprinidae, Haplochrominae and Tilapiinae with a q value < 0.007 in the three comparisons). Differences between co-occurring fish taxa were not recovered for raw swab communities.
PCoA results for the raw swab communities, water sample communities and corrected skin communities are shown in Fig. 4a, b. For the raw and corrected communities, the first PCoA axis explained 14.7 and 14.8% of the total variance, respectively, and the second 7.2 and 10%, respectively. Despite the similar percent of explained variance between the raw and corrected communities, north and south basin distinctiveness was lost in the corrected dataset (Fig. 4b). Water sample communities from the northern basin clustered separately from raw swab samples from the northern basin, but were similar to water samples and raw swab samples from the southern basin (Fig. 4a).
The explanatory ASVs changed between the raw and corrected communities, with Cetobacterium (ASV 58d0), a salinity bioindicator [25], having the strongest effect for the raw swab communities (Fig. 4a), and the anaerobes [45] Phycisphaeraceae (ASV 9830) and Anaerobacillus (ASV e942) for the corrected communities (Fig. 4b). Accordingly, in Fig. 4a, temperature, conductivity and pH ( Fig. 4c-e) correlated strongly with the first PCoA axis values of the raw swab communities (R 2 = 0.75, 0.66 and 0.31, respectively), but this effect was mostly lost for the corrected skin community values (R 2 = 0.07, 0.01 and 0.06, respectively) and a weak correlation with the dissolved oxygen was observed instead (Fig. 4f, R 2 = 0.19). The second PCoA axis had weaker correlations with any of the water measurements than the first axis. For the second axis, the raw swab community values correlated with pH and dissolved oxygen measurements (R 2 = 0.17 and 0.26, respectively) and the corrected skin community values correlated with the temperature measurements (R 2 = 0.14).
To summarise, beta diversity in the raw swab communities is best explained by the water salinity or temperature. The corrected skin communities, however, are less affected by water characteristics, of which dissolved oxygen level is the strongest. Accordingly, in the raw swab communities, a salinity bioindicator bacterium varies the most, whereas for the corrected skin communities we detect large variations in anaerobic bacteria. It is important to note that dissolved oxygen measurements were not taken in the H stream, and thus, the strength of this finding is tentative.
As for the alpha diversity, we evaluated how assumptions of linearity may bias the results, by comparing the linear and RBF kernels in a kernel PCA framework [43], followed by SVR [44] with linear or RBF kernel. For PC1, we similarly observed a reduction in the proportion of explained variance between the raw swab and corrected skin datasets for temperature, conductivity and pH. For the dissolved oxygen, R 2 depended on the kernel but was similar between the raw swab and corrected skin datasets. For PC2, the dissolved oxygen was not dominant either, independently from assumptions of linearity (Additional file 3: Figure S3).

Basin-specific PCoA
To further investigate the relationship between the fish taxonomy and the skin microbiome, we carried out another PCoA, separating the northern and southern basins to increase the geographic range overlap of the included fish taxa in each analysis (Fig. 5). This analysis supported the importance of the sampling site in explaining the beta diversity in the raw swab communities ( Fig. 5a and b, addressing the northern and southern basins, respectively, with marker shapes representing the different streams). However, stream separation was reduced when analysing the corrected skin community in the northern basin (Fig. 5c). This analysis further exposes a clear separation between Nemacheilidae and Cichlidae (Haplochrominae + Tilapiinae), for the raw swab communities (Fig. 5a) and more so for the corrected skin communities (Fig. 5c). According to ANCOM test, the bacterium explaining the difference between Nemacheilidae and Cichlidae is Exiguobacterium (ASV 0cb4) for both the raw and corrected communities.

Proteobacteria-Bacteroidetes ratios reveal dysbiosis in eutrophic sites
The ratio between Proteobacteria and Bacteroidetes is associated with fish health, with compromised individuals having increased Bacteroidetes relative abundances [46]. We compared the relative abundances of these phyla among the sampling sites to derive ecological Fig. 4 Beta diversity across the study area. Principal coordinates analysis (PCoA) of a swab bacterial communities and b corrected skin bacterial communities. The first and second PCoA axes correspond to the y and x axes of each plot, respectively. The percent variance explained by each axis is denoted as the axis label. The four most important ASVs and their effect sizes are indicated by biplot arrows. Subplots c to j demonstrate the correlation of the first (c, f) and second (g, j) PCoA axis values with the temperature (c, g), conductivity (d, h), pH (e, i) and dissolved oxygen (f, j). Circle and square markers denote the northern and southern basins respectively. Blue, grey and orange markers denote water, swab and corrected skin communities, respectively insight ( Fig. 6), checking both the raw swab communities (Fig. 6a) and the corrected skin communities (Fig. 6b). A clear connection emerged for the corrected skin community, associating elevated Bacteroidetes relative abundances and reduced Proteobacteria relative abundances with human-induced eutrophication, site H.0.6 being a settling pool, site J.1 a fish pond outlet and sites S.2 and S.3 situated just upstream and downstream of a fish farm outlet. T.2, which was not initially categorised as eutrophic, also had a high relative abundance of Bacteroidetes. This is a shallow site amidst agricultural land.

Predicted metabolic differences between the raw swab communities and the corrected skin communities
To understand the effect of water background contamination on the inference of fish skin microbiomes and the reconstruction of metabolic models that would be recovered from their metagenomes, we employed PICRUSt   [47]. PICRUSt predicts a metagenome based on ASVs and bacterial genomes available in online databases. Figure 7 summarises the number of metabolic pathways with a significantly different relative abundance between the raw and corrected predicted metagenomes, according to their KEGG category. The most frequent KEGG categories with significantly different pathway representations were "Biosynthesis of secondary metabolites", "Microbial metabolism in diverse environments" and "Biosynthesis of antibiotics". This indicates that the raw swab communities and the corrected skin communities would produce different metabolic models, with respect to the ecological function of their members.

Freshwater fish skin microbiome
Few studies have investigated skin microbiomes in freshwater fish, and it is not clear if it is fundamentally different than those of marine fish. Larsen et al. [14] have sampled the catadromous Mugil cephalus in marine environments and found it to have an uncharacteristically high relative abundances of Alphaproteobacteria, compared to the strictly marine fish they have sampled. A similar excess of Alphaproteobacteria was found in wild Salmo salar [18] and Salvelinus fontinalis [17], anadromous salmonid species. Amazon River fish were also found to have high Alphaproteobacteria under certain physicochemical conditions [31]. In stark contrast, Gammaproteobacteria dominated the skin microbiome in wild S. salar fry [24], and also that of Silurus glanis, a catfish caught in the wild [16]. Of the five instances, the catfish is the only strictly freshwater inhabitant, but it lacks scales.
In this study, we have investigated freshwater fish and identified Alphaproteobacteria as having the highest median relative abundance, highlighting their dominance as a possible feature of some freshwater fish skin microbiomes, compared to marine fish [15,48]. Such a difference is conceivable due to consistent abiotic differences between marine and freshwater habitats, and the resulting differences in fish biology between them. However, Fig. 7 Functional differences between the raw swab bacterial communities (orange) and corrected skin communities (green). Each bar denotes the number of KEGG pathways with a significant relative abundance difference between the raw and corrected communities, in each of the KEGG pathway categories (x axis) with such few and methodologically different studies of freshwater fish, and the exceptions that exist among them, this hypothesis requires further study.

Site-related factors shaping the fish skin microbiome
Based on alpha and beta diversity analyses, the raw swab communities of some of the sites are clearly different from the water communities in the same location, particularly when water temperature is low. This may form the impression that the raw swab communities properly represent the skin microbiome in fish. However, our results show that this may be misleading and the water background contamination should be formally addressed. Water measurements, especially the temperature, may seem to govern alpha and beta diversity of fish skin communities. The temperature and conductivity may be perceived as overwhelmingly strong effects on beta diversity in particular (R 2 = 0.75 and 0.66 for the first PCoA axis). However, when the water background contamination is addressed, these effects are lost completely for alpha diversity, and become much weaker, in the case of beta diversity, where the percent dissolved oxygen and temperature emerge as two weak factors (R 2 = 0.18 for the first axis and R 2 = 0.15 for the second axis, for oxygen and temperature, respectively).
Statistical tests of group effects on the alpha and beta diversity support this finding. Alpha diversity differences among streams are completely lost following the elimination of background water contamination, revealing a constant alpha diversity in fish skin among streams in the system. Beta diversity significantly differs among most pairs of streams for the raw and corrected communities, but the ASVs causing these differences change with the elimination of background contamination. Cetobacterium, a skin microbiome salinity bioindicator [25], emerges as the main source of variation among sites, but this effect is lost following the treatment of background noise, and Phycisphaeraceae and Anaerobacillus become the main varying component. Both Phycisphaeraceae [45] and Anaerobacillus are facultative or strict anaerobes and accordingly they change with the dissolved oxygen levels.
To summarise, in the studied area, the alpha diversity of fish skin microbiomes is governed by limiting factors set by the skin mucus and are independent from the abiotic condition differences among sites. Beta diversity seems to be sensitive mainly to dissolved oxygen levels in the water, bearing in mind that dissolved oxygen measurements are missing from the H stream. This finding is consistent with the results obtained by Sylvain et al. [31] who found dissolved oxygen to be a stronger water property than the temperature, salinity and pH in shaping the skin microbiome composition in two Amazon River species. Sylvain et al. [31] identified even stronger chemical properties, which we have not accounted for in this study.

Fish taxonomy effects on fish skin microbiomes
Alpha diversity differences between spatially overlapping fish taxa significantly emerge only when the water background contamination is addressed, highlighting once again the importance of this analytic procedure. The O. insignis skin microbiome has a higher alpha diversity than co-occurring species. Beta diversity differences between O. insignis and co-occurring species is also detected, and the statistical significance of these comparisons persist with the elimination of background contamination. We do not have the scope to determine the source of this difference, with fish phylogeny or niche among the possible sources, O. insignis being strictly benthic and C. damascina, the most common cyprinid, strictly pelagic.

Anthropogenic eutrophication promotes skin dysbiosis
We have a priori defined three sites as interrupted, in which human activity produces excess nutrients that are released into the water. These include a settling pool feeding into the Hermon Stream (site H.0.6), a site in the Snir Stream downstream a fish farm outlet (S.3) and a fishpond outlet on the Jordan River (J.1). The relative abundances of the two phyla, Proteobacteria and Bacteroidetes, strongly covary with these levels of interrupted and uninterrupted sites, where Bacteroidetes relative abundances increase at the expense of Proteobacteria, at the interrupted sites. The relationship between these two phyla is a hallmark of dysbiosis and reduced fish health, in the skin microbiome [46], although different factors can have similar biodiversity signatures. An additional site, which we have not a priori identified as interrupted, also presented elevated Bacteroidetes relative abundances. This site is a very small water body, which is likely to be easily enriched by runoff. As Fig. 6b shows, the treatment of background noise is crucial to distinguish an increase of Bacteroidetes in the water from real skin dysbiosis. This result supports the finding of Legrand et al. [46] as a useful ecological bioindicator for monitoring wild environments. Further, it is in line with the notion that sporadic pollution events of aquatic environments cannot always be detected by bulk water monitoring strategies, while biofilms do capture such events and bear testament to them [49].

Predicted skin microbiome function changes with the consideration of background noise
To predict the implications of background noise treatment for functional inference, we compared the KEGG pathway composition between the raw and corrected predicted metagenomes. We have found that the removal of variants that equally occur in the skin and water microbiomes fundamentally changes the variety of potential pathways in the metagenome. The largest change between the raw and corrected microbial skin communities was in KEGG pathways related to the biosynthesis of secondary metabolites, microbial metabolism in diverse environments and the biosynthesis of antibiotics. These categories are fundamental to the way bacteria interact with their environment and a metagenomic analysis of this sort would be more accurate, taking background noise effects on the microbiome composition into consideration.

Conclusion
In this study, we highlight the importance of a formal consideration of water background noise in fish skin microbiomes, when studying heterogenous inland systems, in which fish species and environmental conditions covary. In the northern Jordan River system, north and south of the Sea of Galilee, we identify a consistent alpha diversity among sites, indicating that the limiting factors of alpha diversity in the skin microbiome are set by the mucus itself, and not by water properties. We further identify the dissolved oxygen to play a role in governing the community composition on the skin, in accordance with a previous research. Finally, we find the ratio of Proteobacteria and Bacteroidetes in the skin microbiome, a useful and informative biomarker for freshwater habitat monitoring.

Study area, sampling procedure and fish identification
Samples were collected between August and October 2017 from 17 sites in the Northern Jordan River water system, nine of the sites representing the upper reaches tributaries Hermon and Snir, five sites representing the northern Jordan River itself and three additional sites from the Springs Valley Jordan River tributaries (Fig. 1, Additional file 4: Table S1). Fish collection was commissioned by the Israeli Nature and Park Authority (NPA), as a part of their monitoring program, under permit 2017/41719. Fish were collected using either an electroshocker or a seine and placed in multiple large containers to avoid contact among individuals. The fish were classified on site, swabbed along the lateral line using a sterile swab and released immediately. Fish species were identified according to the following criteria: Oxynoemacheilus insignis (Heckel, 1843) is the only loach in the system. Astatotilapia flaviijosephi (Lortet, 1883) is the only haplochromine in the system, and it is therefore the only cichlid with egg-shaped marks on its anal fin. Coptodon zillii (Gervais, 1848) is a cichlid with a dotted tail fin and 8-9 protrusions per gill raker. Sarotherodon galilaeus (Linnaeus, 1758) is a cichlid with a clear convex tail fin and over 13 protrusions per gill raker and a black mark on the operculum. Oreochromis hybrids are cichlids with striped tail fins and over 17 protrusions per gill raker. Gambusia affinis (Baird and Girard, 1853) is the only killifish in the study area and identifiable by its size (< 5 cm) and superior mouth. Mugilidae individuals escaped from fish farms in the regions (The Jordan River system has no marine outlet.) and were identified by their general mugilid form. Within Cyprinidae, Carasobarbus canis (Valenciennes, 1842) and Barbus longiceps (Valenciennes, 1842) each have two pairs of barbels. B. longiceps with an elongated head and over 50 scales along its lateral line. C. canis is distinguishable by its short head and very large scales, less than 40 along the lateral line. Capoeta damascina (Valenciennes, 1842) has one pair of barbels and very small scales, over 70 along its lateral line. Garra nana (Heckel, 1843) and Garra jordanica Geiger and Freyhof, 2014 have small, barely visible barbels. G. jordanica has a suction cup and G. nana a fold, under the lower lip. Acanthobrama lissneri Tortonese, 1952, has elongated and compressed body, with deeply forked tail and up to 12 cm adult total length, and Pseudophoxinus kervillei (Pellegrin, 1911) is of similar size, has circular body section and a forked tail with a black stain at the base.
In addition to swab samples, in each sampling site, 2 l of water were filtered using a sterile mixed cellulose esters 0.45-μm-pore-size filter. The swabs and filters were kept in ice on site and transferred to a − 80°C until further processing. The water temperature, conductivity, pH and percent dissolved oxygen were measured at each site using a YSI ProPlus with a Quatro Cable multiparameter cable, with the exception of dissolved oxygen at Hermon Stream (H) sites and pH at two Jordan River (J) sites.
16S rRNA library preparation DNA was extracted from the swabs and filters using the DNeasy PowerSoil and PowerWater DNA extraction kits (Qiagen), respectively, following the manufacturer's instructions. Metabarcoding libraries were prepared using a two-step PCR protocol, in which the first PCR reaction is designed to amplify the genetic marker along with artificial overhang sequences and the second PCR reaction is designed to attach sample specific barcode sequences and Illumina flow cell adapters. The forward and reverse PCR primers in the first reaction were ′5-tcgtcggcagcgtcagatgtgtataagagacagCCTACGGGNGGCW GCAG-′3 and ′5-gtctcgtgggctcggagatgtgtataagagacag-GACTACHVGGGTATCTAATCC-′3, respectively, including the target-specific primers for the V3-V4 region [50] with overhangs in lowercase. For the second PCR reaction, the forward and reverse primers were ′ ′5-AATGATACGGCGACCACCGAGATCTACACtcgtcggc agcgtcagatgtgtataagagacag-′3 and ′5-CAAGCAGAAG ACGGCATACGAGATXXXXXXgtctcgtgggctcgg-3′, with Illumina adapters (uppercase), overhang complementary sequences (lowercase), and sample-specific DNA barcodes ('X' sequence). The PCR reactions were carried out in triplicate, with the KAPA HiFi HotStart ReadyMix PCR Kit (KAPA biosystems), in a volume of 25 μl, including 1 μl of DNA template and following the manufacturer's instructions. The first PCR reaction started with a denaturation step of 3 min at 95°C, followed by 35 cycles of 20 s denaturation at 98°C, 15 s of annealing at 55°C and 7 s of polymerization at 72°C. The reaction was finalized with another 1-min-long polymerization step. The second PCR reaction was carried out in a volume of 25 μl as well, but with 10 μl of the PCR1 product as DNA template. It started with a denaturation step of 3 min at 95°C, followed by 8 cycles of 20 s denaturation at 98°C, 15 s of annealing at 55°C and 7 s of polymerization at 72°C. The second PCR reaction was also finalized with another 1-min-long polymerization step. The first and second PCR reaction products were purified using AMPure XP PCR product cleanup and size selection kit (Beckman Coulter), following the manufacturer's instructions, and sequenced on an Illumina MiSeq to produce 250 base-pair paired-end sequence reads. The sequencing was carried out by the genomics applications laboratory at the faculty of medicine, Hebrew University. The raw sequence data is archived in NCBI under BioProject PRJNA560003 (Temporary reviewer's link: https://bit. ly/2YWRTvC).

Amplicon sequence variance, taxonomy assignment and background noise treatment
The bioinformatics analysis is provided on GitHub, at https://git.io/fjFZo (DOI: 10.5281/zenodo.3583001) as a Jupyter Notebook (https://bit.ly/2z7teFm), coupled with raw data and intermediate and output files. Sequence data trimming, amplicon sequence variant (ASV) prediction and taxonomic identification were carried out in Trimmomatic 0.39 [51] (https://bit.ly/2Hcv6AZ) and DADA2 1.12 [52]. The naive Bayesian classifier used to predict taxonomic identities was trained with data from the SILVA SSU-rRNA database version 132 [53] (https://bit.ly/2 OZXrkl). The resulting ASV biom table was filtered with QIIME2 2019.4 [54] to exclude ASVs assignable to eukaryotes or eukaryotic organelles and include ones with at least 100 copies in at least two samples (https://bit.ly/3 0euZwh). Following alpha rarefaction analysis (https://bit. ly/2NeetsA), the ASV biom table was further filtered to exclude samples with less than 1000 sequences. A subset of the ASV biom table was created to represent the skin microbiome without ASVs that are likely to belong strictly to the water (https://bit.ly/2Z4vXhp). In this subset, we included ASVs that were unique to the swab samples, or that had a significantly higher relative abundance in the swab samples than in water samples, based on Benjamini-Hochberg corrected [33] Mann-Whitney U test [34] (https://bit.ly/2HbEyEP). To carry out this test, we used SciPy 1.2 [55] and StatsModels 0.10 [56]. The original and corrected probability values are denoted as "p value" and "q value", respectively. This process is regarded as "background noise treatment", and the subset as the "corrected skin community" throughout the text. To study the taxonomic composition of the samples (https://bit. ly/2TMkEFl) and the relationship between Proteobacteria and Bacteroidetes in the different sampling sites (https://bit.ly/33GUgkL), we collapsed the ASV biom table to taxonomic tables (https://bit.ly/2z9Qlic) using QIIME2 2019.4 [54].

Biodiversity analyses
To study the factors shaping alpha diversity, we computed Faith phylogenetic diversity (Faith PD) indices [35] for each sample, and tested the global and pairwise effect of stream and fish family levels, using the Kruskal-Wallis test [37] in QIIME2 2019.4 [54] (https://bit.ly/2OZPfR2). Faith PD depends on the number of ASVs in the sample, their pairwise phylogenetic distances and their relative abundances. We further tested the correlation of Faith PD values with the water measurements using SciPy 1.2 [55]. This was carried out for both the raw swab communities and the corrected skin communities (https://bit.ly/2z6v217).
To study the factors shaping beta diversity, we produced unweighted UNIFRAC matrices [36] which were used for principal coordinates analysis (PCoA) [39,40], biplots [40] and PERMANOVA tests [38] in QIIME2 2019.4 [54] (https://bit.ly/2OZPfR2). The factors considered were the stream of origin and the family or tribe of the host fish. For the latter, we carried the analyses per basin to increase the geographic overlap of the fish species. This procedure was carried out for both the raw swab communities and corrected skin communities. We further tested the correlation of the water measurements with the values along the first and second PCoA axes, in order to explain these axes, using SciPy 1.2 [55] (https:// bit.ly/2KV9Vod). Finally, we executed ANCOM tests [41] to identify the bacterial ASVs explaining the group separation between significantly different fish families (https://bit.ly/2KH1M7O).

Evaluating assumptions of linearity in the biodiversity analysis
The UniFrac-based PCoA analysis and subsequent Pearson's correlation assume linear relationships in various steps. To gage the importance of such assumptions in biasing the results, we utilised the kernel principal component analysis (PCA) and support vector regression (SVR), implemented in Scikit-learn [57]. Kernel PCA is agnostic to the ASV phylogeny and was therefore not preferred as our primary workflow. However, a direct comparison between the non-linear radial basis function (RBF) kernel and a linear kernel [43] is available in both the Kernel PCA and SVR [44] steps, making this approach particularly suited to evaluate linearity assumptions (https://bit.ly/2rf396V).

Posterior distributions for Faith PD and relative abundance expected means
We have taken a Bayesian approach to evaluate the posterior Faith PD mean (μ) of the different site and fish family levels, as well as the μ relative abundance of bacterial taxa in the raw swab and corrected skin samples. The analysis was carried out in PyMC3 [58] (https://bit.ly/2M5P5nm), using normal priors with the sample mean. We additionally computed posterior distributions for the difference between each pair of compared μ values, expressed as the subtraction of μ1 and μ2, divided by the pooled standard deviation of both parameters. The pooled standard deviation was expressed as the sum of standard deviations, divided by 2. Posteriors were sampled at least 11,200 times per parameter, in four chains, until convergence.

Functional implications of background noise treatments in swab samples
To predict the differences in relative abundances of metabolic pathways between the raw and treated swab communities, we predicted their metagenomes and abundances of KEGG ENZYME terms [59], using PICRUSt 2.1.4-b [47] (https://bit.ly/2ZcYSf4). ENZYME term abundances were converted to relative abundances using pandas 0.42 [60] and the differences between the raw and corrected samples were tested with Benjamini-Hochberg corrected [33] Wilcoxon tests [61] in SciPy 1.2 [55] and StatsModels 0.10 [56]. The original and corrected probability values are denoted as "p value" and "q value", respectively. The KEGG PATHWAY categories of each significantly different entry were retrieved with Biopython's REST KEGG API [62].