Study design
Human participants
Stool samples were gathered from 14 healthy adults, aged 18–65 years, from Denmark and Hong Kong. Samples were collected over 3–4 months. The Danish study was approved by the local ethics committee in Region Zealand, Denmark (SJ-383), and the Hong Kong study was approved by the Institutional Review Board of The University of Hong Kong/Hospital Authority Hong Kong West Cluster (UW 17-042). All work was performed in accordance with the Good Clinical Practice principles and the Helsinki Declaration. Written informed consent was obtained from all participants. Patient characteristics are described in (Suppl. Table 13). Subjects with any of the exclusion criteria below were not eligible for entry into the present study: (i) history of taking antibiotics over the last 6 months, (ii) receiving systemic antifungals/antifungal mouthwashes or probiotics concurrently, (iii) patients suffering from immunosuppressive conditions or taking immunosuppressants, and (iv) severe medical comorbidities requiring frequent hospitalization. Another cohort of six healthy, untreated individuals from Canada was acquired from a previous study from Raymond et al. [58].
Treatment
Of the participants, 12 were treated for 6 days with 1 antibiotic drug out of 5: doxycycline (tetracycline class), azithromycin (macrolide class), Augmentin (β-lactam class), ciprofloxacin (quinolone class), and cefuroxime (β-lactam class). Two untreated participants were used as controls.
Sampling
From each participant in the clinical study in Denmark, 6 stool samples were obtained: one 15 days before treatment (± 1 day), two during treatment (days 3 and 5 of treatment ± 1 day), and three at 15, 30, and 90 days after treatment (± 1 day). From each participant in the clinical study in Hong Kong, four stool samples were obtained at 7 days before treatment (± 1 day), day 6 of treatment, and 30 and 90 days after treatment. Collected samples were aliquoted and stored at − 80° immediately after collection until DNA extraction. Stool samples of control patients treated with placebo [58] were acquired before, 7 days, and 90 days after treatment.
Metagenomics and metatranscriptomics sequencing
For participants in the clinical study in Denmark, bacterial metagenomics and metatranscriptomics raw data were obtained from Kang et al. (in preparation). Briefly, DNA was extracted using a MO BIO PowerMax Soil DNA Extraction Kit (MO BIO Laboratories, Inc) and purified with PowerClean Pro DNA Clean-Up Kits (MO BIO Laboratories, Inc.) according to the manufacturer’s protocol. For RNA, rRNA was depleted using a Ribo-Zero Gold rRNA removal kit—Epidemiology (Illumina). The remaining total RNA was extracted using a MO BIO PowerMicrobiome™ RNA Isolation Kit (MO BIO Laboratories, Inc.). RNA and DNA sequencing were performed on an Illumina HiSeq 2000 (PE125). For participants in the clinical study in Hong Kong, bacterial DNA and RNA were extracted from 200 mg aliquots of frozen stool by Beijing Genome Institute (BGI). DNA was extracted using an E.Z.N.A.® Stool DNA Kit according to the manufacturer’s protocol. For RNA, rRNA was depleted using a Ribo-Zero™ Magnetic Kit. The remaining total RNA was extracted using a RiboPure-Yeast Kit. All samples were sequenced on an Illumina HiSeq 4000 platform (Illumina, San Diego, California, USA; paired-end, insert size 350 bp, read length 150 bp for DNA and 100 bp for RNA).
Internal transcribed spacer sequencing
All stool samples from both cohorts were processed by Novogene for internal transcribed spacer (ITS) sequencing. DNA was extracted using the following protocol: Stool samples were thoroughly mixed with 900 μL of CTAB lysis buffer. All samples were incubated at 65 °C for 60 min before being centrifuged at 12000×g for 5 min at 4 °C. Supernatants were transferred to fresh 2-mL microcentrifuge tubes and 900 μL of phenol:chloroform:isoamyl alcohol (25:24:1, pH = 6.7; Sigma-Aldrich) was added for each extraction. Samples were mixed thoroughly prior to being incubated at room temperature for 10 min. Phase separation occurred by centrifugation at 12,000×g for 15 min at 4 °C, and the upper aqueous phase was re-extracted with a further 900 μL of phenol:chloroform:isoamyl alcohol. Next, samples were centrifuged at 12,000×g for 10 min at 4 °C, and the upper aqueous phases were transferred to fresh 2-mL microcentrifuge tubes. The final extraction was performed with 900 μL of chloroform:isoamyl alcohol (24:1), and layer separation occurred by centrifugation at 12,000×g for 15 min at 4 °C. Precipitation of DNA was achieved by adding the upper phase from the last extraction step to 450 μL of isopropanol (Sigma-Aldrich) containing 50 μL of 7.5 M ammonium acetate (Fisher). Samples were incubated at − 20 °C overnight, although shorter incubations (1 h) produced lower DNA yields. Samples were centrifuged at 7500×g for 10 min at 4 °C, and supernatants were discarded. Finally, DNA pellets were washed three times in 1 mL of 70% (v/v) ethanol (Fisher). The final pellet was air-dried and re-suspended in 200 μL of 75 mM TE buffer (pH = 8.0; Sigma-Aldrich). The resulting fungal sequences were amplified using ITS2-F: 5′ GCATCGATGAAGAACGCAGC-3′ and ITS2-R: 5′ TCCTCCGCTTATTGATATGC-3′ primers [59, 60]. ITS2 amplicons were generated in three steps by PCR with 38 cycles: 98 °C 10s, 59 °C 10s, and 72 °C 30s followed by sequencing on the Illumina HiSeq platform (2 × 250 bp, Novogen, China).
Metabolomics
For 4 participants, bile acid profiles and MicrobioMET profiles were assessed by Metabo-Profile (Shanghai, China) using aliquots of frozen stool. For bile acid profiles, bile acid-free matrix (BAFM) was obtained using the charcoal-stripping protocol. Calibrators and quality controls were prepared for the BAFM and processed as for extraction of bile acids from stool samples. About 10 mg prechilled zirconium oxide beads were added to 10 mg stool with 15 μl ultrapure water. To each sample, a 200-μl aliquot of prechilled acetonitrile/methanol containing 10 internal standards was added for homogenization. After centrifugation at 13,500 rpm and 4 °C for 20 min, 50 μl supernatant was transferred to 96-well plates. Acetonitrile/water (150 μl) was added for gentle shaking for 5 min before injection into an ultra-performance liquid chromatography column coupled to tandem mass spectrometry (UPLC-MS/MS) system to quantitate bile acids.
MicrobioMET profiles including aromatic phenols and indoles, phenolic acids, short-chain fatty acids and branched-chain amino acids, amino acids, and organic acids were quantitated using gas chromatography coupled to time-of-flight mass spectrometer (GC-TOFMS). Stool aliquots (50 mg) were homogenized with 300 μl NaOH (1 M) solution using a homogenizer and centrifuged at 13,500 rpm and 4 °C for 20 min. Supernatants (200 μl) were transferred into autosampler vials and residue extracted with 200 μl cold methanol. After a second homogenization and centrifugation, 167 μl supernatant was combined with the first supernatant in the autosampler vial. Extracts were capped and used for automated sample derivatization by a robotic multipurpose sample MPS2 with dual heads (Gerstel, Muehlheim, Germany). Samples pre-treated with sodium sulfate were shaken at 1500 rpm and 4 °C for 20 min and transferred to capped empty autosampler vials for the GC-TOFMS.
Data processing
Quality control of sequence data
Quality control of raw reads (DNA, RNA) used a previously described pipeline [61]. Adapter sequences, low-quality bases (Q < 20), duplicated reads, reads shorter than 75 bp and reads mapping to the human genome with 95% coverage were filtered out. Computational scripts are at https://github.com/TingtZHENG/VirMiner/.
In situ bacterial growth rate estimation
Quality controlled FASTQ samples were sub-sampled to 2 million reads per sample. GRiD version 1.3 [25] was used with the corresponding stool database on sub-sampled samples to assess the growth bacterial strains. Default parameters were used but with minimum coverage threshold of 0.5 in order to investigate growth rates for different thresholds. After investigating the results, and as suggested by the GRiD authors, we continued with the growth estimates for strains with coverage 1.0 or higher. Statistical testing of (a) median growth rates and (b) the number of growing species was performed with a Wilcoxon signed-rank test. Normalized effect size r was estimated using the R package “rcompanion” and its function “wilcoxonPairedR”.
Abundance profiling
HUMAnN2 [19] version 0.11.1 was used to estimate gene family abundances in metagenomic DNA and RNA samples. Resulting reads per kilo-base (RPK) for gene family abundances at species level (including unclassified taxa) were further normalized by counts per million (CPM), resulting in a transcripts per kilo-base million (TPKM) like normalization.
PIPITS pipeline [21] version 1.4.5 was used for ITS with default parameters including quality filtering, read-pair merging, ITS2 filtering, and chimaera removal. Remaining reads were binned based on 97% similarity as operational taxonomic unit and aligned to the UNITE fungi database using Mothur classifier [22]. For further downstream analysis, all samples were normalized by cumulative sum scaling using MetagenomeSeq [62].
For bile acid profiles, raw data from UPLC-MS/MS were processed using QuanMET software (v1.0, Metabo-Profile) for peak integration, calibration and quantitation for each bile acid. The analyte concentration of unknown bile acid was calculated using a calibration curve.
For MicrobioMET profiles, raw data from the GC-TOFMS were processed using proprietary software XploreMET (v2.0, Metabo-Profile) for automatic baseline denosing, smoothing, peak picking, and peak signal alignment. MS-based quantitative metabolomics determined the concentration of unknown metabolites by comparing the unknown to a calibration curve. Abundance of MirobioMET profiles was calculated to minimize large individual variations in metabolites.
Metagenomic sequences from HUMAnN2 profiles
TPKM-normalized gene family abundances from DNA were clustered using mgs-canopy version 1.0 software (https://anaconda.org/bioconda/mgs-canopy). We used standard parameters except for a Pearson correlation coefficient cut-off of 0.95 instead of the default 0.9. Gene family clusters with at least 700 genes were considered metagenomic sequences (MGS). Taxonomic annotation of MGS used species annotation information available for each gene family. We calculated contributions of each species to an MGS (including unclassified taxa). An MGS was annotated to species level using the largest gene family distribution if the gene contribution of that species was at least 51% and the second largest species (a) was “unclassified” or (b) contributed at most 10%. MGS with more than 90% gene contribution from the same species were considered “pure” or “unambiguous”. Using a more stringent species assignment than the original method [29], from a total of 213 MGS, we obtained 80 with species-level assignment (Suppl. Table 6).
Genomic strains from MGS
MGS with species assignments were processed independently. Reads that (1) contributed to the abundance of an MGS, and (2) mapped to the inferred species (based on ChocoPhlAn reference [19]) were extracted. We used PanPhlAn [38] version 1.2.1.3 to create species-specific pangenomes based on reference genomes from the National Center for Biotechnology Information (Suppl. Table 14), mapped reads against the species pangenome, and calculated per-gene per-reference profiles. Gene abundance was normalized to reads per base. A gene was covered sufficiently if it had at least 0.5 reads per base. We accepted a strain reference if: (1) at least 90% of genes in the MGS were found to have sufficient coverage, and (2) the reference had the highest number of covered genes. For experimental verification, we considered using a commercially available strain if the number of covered genes was at most 1% less than the best-fitting strain.
Diversity analysis
Diversity analysis of fungal and bacterial communities was performed in R version 3.6.1 using the package vegan [63] version 2.5-5. Testing for significant differences in alpha diversity between time points was performed using a two-sided Wilcoxon signed-rank test. Resulting p values were adjusted for multiple testing using FDR. UniFrac metrics measured beta diversity by accounting for phylogenetic similarities of different species. Weighted UniFrac gives the most importance to dominant species. Unweighted UniFrac does not consider abundance. Generalized UniFrac with α = 50% gives the most weight to moderately abundant species [64] and the generalized UniFrac with α = 75% to species with abundance between median and dominant levels.
Transcriptional activity
Relative abundances using DNA and RNA were normalized to transcripts per million. Let f denote a gene or pathway. The transcriptional activity of f is defined as the TPKM-normalized RNA abundance of f divided by the TPKM-normalized DNA abundance of f.
Core metatranscriptome
The core metatranscriptome was described in [20]. Briefly, we used MetaCyc pathway relative abundances as generated by HUMANn2 for both DNA and RNA. We calculated transcriptional activity for each pathway. The core metatranscriptome was defined as the set of pathways with a sample prevalence of at least 80% with variable metatranscriptome having prevalence of 30 to 80%. Pathways with less than 30% prevalence were ignored.
Contributional alpha diversity
We followed the procedure in [26], with some exceptions. For each MetaCyc pathway (PWY), the contribution of species to the pathway was determined. Ecological alpha diversity measures (Shannon and Simpson) were applied per sample and separately using DNA and RNA data. Mean diversity per sample was used to test for significant differences between time points using pairwise two-sided signed Wilcoxon tests. Resulting p values were corrected for multiple testing using false discovery rate (FDR).
Statistics for MGS and ITS abundance
We used MetagenomeSeq [62] version 1.22.0 with a zero-inflated Gaussian mixture model. Following the MetagenomeSeq vignette, CSS normalization was applied on relative abundance data. All possible pairwise tests between the different sampling time points were performed (baseline, DT, EPT, and LPT time points). We controlled for patient-wise differences when possible. For MGS, D2 and D4 were excluded to improve signal quality. A 15% prevalence filter was used for each test independently. Controlling for multiple testing was performed on p values using FDR.
Two-way PERMANOVA testing
Stool samples from the same participant were statistically dependent. To test for significant differences in means of beta diversity between different time points, two-way permutational analyses of variance (PERMANOVA) were performed using “subject id” as covariate and “sample time point” as second independent variable. We performed tests on beta diversity matrices using the function “adonis” as implemented in R package vegan with 9999 permutations. We reported F values, R2, and p values for “sample time point”. P values from pairwise PERMANOVA tests were corrected for multiple testing using FDR.
Compositionality tests
We implemented a compositionality test from Palleja et al. [27]. Briefly, we used the test to address the issue of false-positive and false-negative findings in compositional data [65]. We accepted significant findings for a species based on relative abundance only if they would still be significant if other species were removed from the abundance table. Therefore, if one species was removed, the data were total-sum normalized and p values calculated. The procedure was repeated for all species. The final p value for a species was determined using the highest calculated p value. Thus, a species could not become significant because of depletion or inflation of another dominant species. Since this test was very conservative, we used a higher q value of 0.1 to decide significance to avoid overlooking potential findings.
Correlation analyses using stool metabolite abundance
To identify metabolites with a potential effect on Candida, Saccharomyces, Penicillium, and Aspergillus spp., we calculated Spearman’s correlations for total-sum scaling (TSS) ITS abundance and both bile acid and MicroMET profiles. To account for zero-inflation, we considered only samples with nonzero abundance of Candida albicans (5 samples). We then considered all significant correlations (p < 0.05) with an absolute correlation of at least 60%.
To identify direct or indict bacterial producers of the metabolites, total-sum scaled MGS abundances were correlated with log2 transformed metabolites abundances. Correlation was inferred using sparse partial least squared analysis (sPLS) by utilizing relevance vectors (R package mixOmics [30]).
Co-abundance networks
Co-abundance networks were created based on total-sum-normalized data using BAnOCC [31]. Significance of an edge was determined as described [20]. For posterior inference, we used the 95% credible interval. An edge was therefore considered significant if the corresponding 95% credible interval did not contain zero. Only significant correlations with an absolute estimated coefficient of at least 30% were used for analysis. Significant changes in network structure between any two time points were determined using Wilcoxon signed-rank tests on node degree. Effect sizes are reported in terms of a standardized effect size analogous to the one used for the Mann-Whitney test, \( r=z/\sqrt{n} \), where z is the z-statistic of the paired test and n is the number of observations. r values are analogous to Pearson correlation coefficients. Hence, r ranges from − 1 (100% decrease) to 1 (100% increase). Formula and implementation can be found in the R package “rcompanion”.
Fungal species co-abundance network
TSS-normalized operational taxonomic unit (OTU) abundances based on ITS2 data were used. OTUs detected in less than 10% of samples were removed. BAnOCC was executed with 5 chains, 5000 iterations, and 1000 warmup cycles to reach convergence. BAnOCC was used as described above.
MGS-ITS network with BAnOCC
MGS and ITS relative abundances were independently total-sum normalized. Only species measured in 25% of samples were used further. Abundances of less prevalent species were summed per sample into a group called “other” to maintain library sizes. MGS and ITS features abundances were combined and analysed using BAnOCC as described above.
RNA-PWY-ITS network with BAnOCC
RNA abundances of PWY and ITS were independently total-sum normalized. A 50% samples prevalence filter was applied to make this computation feasible and decrease false-positive rate. Abundances of less prevalent features were summed per sample into a group called “other”. BAnOCC was used as described above.
Supernatant experiments
Strains and culture conditions
Odoribacter splanchnicus (DSM20712), Bacteroides eggerthii (DSM20697), C. albicans (SC5314/ ATCC MYA-2876), C. albicans (ATCC 10231), and C. albicans (ATCC 18804) were grown at 37 °C under anaerobic conditions (anaerobic gas mixture, 95% N2, and 5% H2) in pre-reduced modified Gifu anaerobic media (mGAM; Nissui Pharmaceutical Co. Ltd.) broth for liquid cultures or mGAM broth supplemented with agar (Nissui Pharmaceutical Co. Ltd.) for growth on plates.
Sterile bacterial supernatants
Bacterial strains grown for 48 h in mGAM broth were subcultured 1:50 in fresh mGAM broth and grown for 48 h in anaerobic conditions at 37 °C. Bacterial cultures were spun down at 11,000×g for 5 min. Supernatants were carefully removed and filtered through 0.2-μM syringe filters to remove bacteria in suspension.
Supernatant growth inhibition assays
C. albicans growth rates were analysed in 200 μl liquid mGAM with 50% or 100% sterile bacterial supernatant added. C. albicans inoculations were at 1:1000 from an overnight culture grown in aerobic conditions at 37 °C. Cultures were in 96-well microtiter plates at 37 °C with orbital shaking 365 cpm (2 mm). Cell densities were measured every 10 min at optical density 600 nm (OD600) using a microtiter reader (BioTek ELx800). Growth rates were calculated by plotting the log OD measurements in log phase and calculating slopes for timepoints in log phase where r2 was closest to 1, using at least 12 time points (2 h apart).
Supernatant metabolite assays
Analysis of SCFA in samples was carried out by MS-Omics as follows. Samples were acidified using hydrochloride acid, and deuterium labelled internal standards where added. All samples were analysed in a randomized order. Analysis was performed using a high polarity column (Zebron™ ZB-FFAP, GC Cap. Column 30 m × 0.25 mm × 0.25 μm) installed in a GC (7890B, Agilent) coupled with a quadropole detector (5977B, Agilent). The system was controlled by ChemStation (Agilent). Raw data was converted to netCDF format using Chemstation (Agilent), before the data was imported and processed in Matlab R2014b (Mathworks, Inc.) using the PARADISe software described by Johnsen et. al [68].
Other compounds such as bile acids were analysed using MS/MS. The analysis was carried out using a Thermo Scientific Vanquish LC coupled to Thermo Q Exactive HF MS. An electrospray ionization interface was used as ionization source. Analysis was performed in negative and positive ionization mode. The UPLC was performed using a slightly modified version of the protocol described by Catalin et al. (UPLC/MS Monitoring of Water-Soluble Vitamin Bs in Cell Culture Media in Minutes, Water Application note 2011, 720004042en). Peak areas were extracted using Compound Discoverer 2.0 (Thermo Scientific). Identification of compounds were performed at four levels: level 1—identification by retention times (compared against in-house authentic standards), accurate mass (with an accepted deviation of 3 ppm), and MS/MS spectra; level 2a—identification by retention times (compared against in-house authentic standards), accurate mass (with an accepted deviation of 3 ppm); level 2b—identification by accurate mass (with an accepted deviation of 3 ppm), and MS/MS spectra; level 3—identification by accurate mass alone (with an accepted deviation of 3 ppm).
C. albicans growth inhibition by metabolites
Metabolites were acquired from the companies Sigma-Aldrich, Merck KGaA, and Roth. More specific details can be found in Suppl. Table 15.
C. albicans growth curves
Dilution series of metabolites in water were started at concentrations approximately 10-fold below maximum solubility in water where applicable (Suppl. Table 16). Dilutions were in synthetic SD medium (1× yeast nitrogen base, 2% glucose, 0.5% NH4SO4). C. albicans was grown overnight in YPD (1% yeast extract, 2 % peptone, 2 % glucose), washed 3× in sterile water, and inoculated at 1:100 (OD600 ≈ 0.2). Absorbance was measured every 15 min with an infinite M200pro microwell plate reader (Tecan, Austria) set to 30 °C with intermittent shaking (10 s orbital shaking before each measurement). Generation times were calculated from the obtained triplicate growth curves.
Host cell damage assays
To determine the influence of metabolites on the general capacity of C. albicans to cause host cell damage, we used an established epithelial cell model based on the vaginal epithelial cell line A431. A431 were grown in RPMI media containing 10% foetal bovine serum (FBS), and 200 μl cells at 105 cells/ml were seeded into 96-well plates and incubated at 37 °C, 5% CO2. After 48 h, cells were washed with 1× PBS, and 100 μl compound at indicated concentrations was added, followed by 100 μl Candida cells at multiplicity of infection 1. Incubation continued under the same conditions for 24 h. Basal lactate dehydrogenase (LDH) release (low control) was determined with 200 μl RPMI, and maximum LDH release (high control) determined by addition of 100 μl 0.5 % Triton X-100 to cells in 100 μl RPMI. Plate were centrifuged at 250×g for 10 min and supernatants were removed and diluted 1:10 and mixed with 100 μl freshly prepared LDH assay mix (Roche). After 25 min at room temperature in the dark, LDH activity was determined with a microplate reader (Tecan infinite M200) as absorbance (A) at 492 nm, with 660 nm as a reference. Damage was calculated as (Asample − Alow)/(Ahigh − Alow).
C. albicans morphology
The effect of metabolites on C. albicans morphology was tested at all concentrations used in cell damage assays. Metabolites were diluted in 250 μl RPMI medium with 10% FBS and added to 250 μl C. albicans in RPMI in 24-well plates to indicated concentrations. Plates were incubated at 37 °C and 5% CO2 for 4 h to induce hyphae formation. Medium was removed and cells fixed with Histofix 4% formaldehyde solution. Morphology was evaluated using an inverse microscope (Axio Zeiss Vert. A1) to differentiate yeasts, hyphae, and pseudohyphae.
Pairwise co-cultivation experiments
Interactions between C. albicans and B. eggerthii and O. splanchnicus were assayed via pairwise cultivations. C. albicans cell counts were compared to control conditions of cultivation without bacteria.
Fungal and bacterial cells were grown anaerobically at 37 °C for up to 48 h in mGAM and used as inocula for pairwise experiments. Inocula biomasses were estimated via OD600 and adjusted to 1.0 by diluting in appropriate media. Inocula were transferred to microplates containing the same media to a final OD600 of 0.01. Ratios of fungal to bacteria cells were 1:1. Microplates were incubated at 37 °C statically under anaerobic conditions. Cell counts from inocula were resolved, prior to the co-cultivation experiments, via flow cytometry (BD LSRFortessa, BD Biosciences, Franklin Lakes, NJ, USA).
Five microplates were prepared using the same inoculum. Microplates were removed from the anaerobic chamber every 5 h (0, 5, 10, 15, and 20 h cultivation). Cells were immediately fixed in 2% formaldehyde for 15 min at room temperature by mixing an equal amount of sample volume and 4% formaldehyde (Sigma-Aldrich, Saint Louis, MI, USA) [66, 67]. After fixing, total C. albicans cells were counted via flow cytometry (BD LSRFortessa, BD Biosciences, Franklin Lakes, NJ, USA). Experiments were performed in triplicate.