Open Access
Novel soil-inhabiting clades fill gaps in the fungal tree of life
Microbiome volume 5, Article number: 42 (2017)
Fungi are a diverse eukaryotic group of degraders, pathogens, and symbionts, with many lineages known only from DNA sequences in soil, sediments, air, and water.
We provide a rough phylogenetic placement and principal niche analysis for >40 previously unrecognized order- and class-level fungal groups from global soil samples, based on combined 18S (nSSU) and 28S (nLSU) rRNA gene sequences. Rozellomycota (Cryptomycota), Zygomycota s.lat., Ascomycota, and Basidiomycota are especially rich in novel fungal lineages, most of which exhibit distinct preferences for climate and soil pH.
This study uncovers the great phylogenetic richness of previously unrecognized order- to phylum-level fungal lineages. Most of these rare groups are distributed in different ecosystems of the world but exhibit distinct ecological preferences for climate or soil pH. Across the fungal kingdom, tropical and non-tropical habitats are equally likely to harbor novel groups. We advocate that a combination of traditional and high-throughput sequencing methods enables efficient recovery and phylogenetic placement of such unknown taxonomic groups.
Fungi are one of the key microbial groups in terrestrial ecosystems: they enabled the colonization of land by plants and facilitated the development of the soil that supports most of the biota on Earth [1, 2]. The kingdom Fungi is one of the most diverse groups of life, with an estimated 1.5–6 million species that represent heterotrophic mutualists, pathogens, and saprotrophs [3, 4]. The 70,000–100,000 currently recognized species are distributed among 156 orders, 46 classes, and 12 phyla [3, 5, 6]. Fungi have traditionally been identified and classified based on morphological characters of fruiting bodies and living cultures. As in bacteria and archaea, only <1% of fungal species have been cultivated with established protocols, which leaves large taxonomic groups undescribed and virtually unknown to science [6, 7]. Roughly 80% of all soil-inhabiting fungal taxa cannot be identified at the species level, and 20% cannot be reliably assigned to known orders.
For the last two decades, molecular discovery and characterization of fungi have rapidly outpaced traditional morphological description. Public sequence databases have accumulated internal transcribed spacer (ITS) barcodes representing hundreds of groups of closely related fungal species that lack any taxonomic identity, owing to the paucity of relevant reference sequences and the lack of phylogenetically informative ribosomal RNA (rRNA) gene sequences (Additional file 1). Studies using a single molecular marker have shed light on several divergent but undescribed lineages of marine and terrestrial organisms among bacteria, protists, and fungi [13, 14]. Analysis of multiple genetic markers obtained from vegetative tissues, single-cell genomics, or whole-metagenome assays of the environment has improved the phylogenetic placement and classification of many of these previously unknown organisms [14–17], but many more remain overlooked. Because many of these lineages are not known from voucher material, the inability to name organisms solely on the basis of sequence data hinders the higher-level classification of fungi and other taxa.
Here, we aim to determine the phylogenetic placement of previously unclassified soil fungi by developing 452 taxon-specific primers (Additional file 2: Table S1) that target the nuclear 18S (nSSU) and 28S (nLSU) rRNA genes of 263 ITS-based operational taxonomic units (OTUs) from the global soil samples analyzed by Tedersoo et al. Because the long 18S-ITS-28S rRNA gene sequences were generated by combining several amplicons obtained by Sanger sequencing and 454 pyrosequencing (Fig. 1), we performed multi-step quality control to exclude any potentially artefactual entities. For the recovered novel soil fungal lineages, we aimed to establish broad ecological niches with respect to climatic and edaphic parameters and to determine their geographic distribution and endemicity patterns. We hypothesized that tropical soils harbor relatively more enigmatic fungal lineages, because (i) tropical habitats exhibit higher speciation but lower extinction rates, (ii) tropical forests harbor greater fungal richness, and (iii) lower latitudes are relatively poorly covered by biodiversity and taxonomic research.
Results and discussion
Novel clades of fungi
Phylogenetic analyses revealed 37 major clades and seven single branches (singleton lineages) of previously unrecognized or unclassified fungi with distinct phylogenetic position that warrant at least order-level classification (Additional file 1: Text S1). In the 18S rRNA gene and concatenated gene analyses, the clade GS01 was placed in a sister position to all remaining fungi, although the statistical support for this and most other early branching configurations remained poor (Additional file 1: Figures S1-S3).
Altogether, 11 clades (GS02–GS12) and three distinct branches (32% of the novel groups) of previously unclassified soil fungi were placed within Rozellomycota (Cryptomycota). Our findings highlight that the remarkable phylogenetic diversity of Rozellomycota known from aquatic ecosystems [14, 20] is also found in terrestrial habitats. Unlike in recent analyses, Rozellomycota was separated from the phylum Aphelidea, which accommodates the clade GS16, a large and well-supported group with no taxonomically characterized representatives. Other zoosporic phyla accommodated fewer undescribed fungal clades. Chytridiomycota harbored two distinct environmental groups: the clade GS13, whose position remained unsettled, and the clade GS14, in a sister position to Spizellomycetales. The clade GS15 formed a long branch within Blastocladiomycota, albeit with low support (BS <70). Two clades of closely related soil fungi clustered with the enigmatic “chytrid” genus Olpidium, which warrants a (sub)phylum of its own. Taxonomically uncharacterized novel lineages of Chytridiomycota s.lat. are particularly common in freshwater and marine environments.
Among the former zygomycetes, the clade GS19 formed a deep lineage at the base of Kickxellomycotina and Zoopagomycotina. Clades GS20, GS21, and GS22 were loosely associated with Endogonales (Mucoromycotina), whereas a single group (clade GS23) formed a monophyletic branch with Umbelopsidaceae (Mucoromycotina). All these groups warrant at least class-level distinction from other mucoralean taxa. A single novel clade of Glomeromycota, clade GS24, displayed strong affinities to Paraglomerales. One spore collection from this group (INSD accession JN936327) has been sequenced but not yet described.
Three class-level clades were related to the subphylum Pucciniomycotina of Basidiomycota. Clades GS25 and GS26 represented successive sister groups to the remaining Pucciniomycotina, whereas the clade GS27 formed a sister group to Agaricostilbomycetes. The latter clade includes an 18S rRNA gene (Sanger) sequence from the voucher specimen RB1040, identified as Platygloea sp., which appears distantly related to other Platygloeales and the remaining Pucciniomycotina. Three novel clades (GS28–GS30) and additional branches were identified within the early-diverging Agaricomycetes, but their sister groups remained poorly resolved (BS <70). Multiple divergent sequences were also recovered within the orders Sebacinales, Trechisporales, Agaricales, Thelephorales, Hymenochaetales, and Atheliales.
Within Ascomycota, the subphylum Taphrinomycotina included a well-supported sister group (clade GS31) to the Archaeorhizomycetes, a recently described class that is largely composed of environmental sequences. The clades GS32 and GS33 were closely related to Orbiliales within Orbiliomycetes. Several additional unidentified taxa clustered within Pezizomycetes, but no deep lineages were evident in this group. Phylogenetic relationships among the other classes of Pezizomycotina were more poorly resolved, but they comprised four previously unidentified order-level clades (GS34–GS37) and two prominent branches, as well as multiple taxa with clear affinities to known orders. These clades were related to Eurotiomycetes, Lecanoromycetes, Sordariomycetes, or Symbiotaphrinales, albeit without statistical support. In contrast to the multiple novel lineages in the early diverging fungal phyla, no such deep undescribed lineages of Dikarya were evident from aquatic environments.
Distribution of previously unrecognized clades
Niche modelling of the clades and prominent branches revealed that the distribution of most groups is significantly related to climatic or edaphic conditions. Across the 41 most common groups, mean annual temperature (MAT), mean annual precipitation (MAP), time since last fire, and soil pH were the strongest predictors for 44, 20, 15, and 12% of the taxa, respectively (Fig. 3; Additional file 1: Figures S4–S8). Soil C and P concentrations had a predominant effect in only a few cases (Additional file 1: Figures S4, S8). Altogether, 46% of the groups preferred tropical climates, as judged by their distribution relative to MAT and MAP (Additional file 1: Figures S5, S6; Text S1). In contrast, 32% of the groups were distinctly more frequent in cool temperate climates, whereas 7 and 5% of the groups peaked in warm temperate and tundra soils, respectively.
While 39% of the groups had a unimodal relationship with pH, peaking at moderately acidic values, 32 and 7% of the groups preferred highly acidic and neutral soils, respectively (Additional file 1: Figure S7; Text S1). Similar preferences for soil pH and climate have been described for the most species-rich classes of fungi. The greater number of groups specialized to acidic rather than neutral soils may reflect the characteristic substrate of saprotrophic fungi: strongly or moderately acidic humus derived from litter. It is also possible that less intensive sampling of neutral soils made it less likely that rare alkaliphilous groups were selected for analysis, favoring non-selective groups instead.
Several groups of Rozellomycota preferred one or the other extreme of soil pH, although the phylum as a whole did not respond to soil pH. Except for the clades GS10 and GS11, all divergent groups of Rozellomycota were relatively more common in cool temperate or subarctic climates, which stands in stark contrast to the suggested tropical niche of early diverging fungal lineages. The frequent clustering of Rozellomycota sequences from soil with those from freshwater, marine, and anoxic habitats suggests that specialization for physical habitat is relatively limited, whereas substrate pH may influence the distribution of these groups at the clade level. It is also possible that the Rozellomycota clades are defined too broadly for detecting environmental patterns, because their age may exceed that of the more recently evolved phyla of Dikarya. As all known members of Rozellomycota (incl. Microsporidia) and Aphelidea are obligate pathogens of other eukaryotes, such as amoebae, algae, and other fungi, the distribution of these species may depend indirectly on their interaction specificity and on the habitat preferences of their hosts.
In contrast to Rozellomycota, the undescribed ascomycete clades were generally more prominent in warm and moist tropical climates, and their relative abundance peaked in moderately acidic soils. The most common ascomycete classes vary greatly in their preferences for climate and pH. These group-specific responses, together with the presence of multiple functional groups, caution against phylum-level analyses of fungal ecological patterns.
Most of the undescribed clades and branches were rare but nonetheless widely distributed across habitats. The niche analysis revealed that roughly half of the groups differed significantly in geographic distribution among biomes and regions (Table 1). In particular, Europe, Central America, and Southern South America stood out as focal regions for a large proportion of the undescribed groups. The group branch5 (four OTUs) and the clades GS06 (five OTUs) and GS26 (four OTUs) exhibited the strongest endemicity, being found exclusively in Australia, Europe, and Northern South America, respectively. These extreme patterns are at least partly attributable to geographically aggregated and insufficient taxonomic sampling of these uncommon groups. For many other undescribed clades, complementary information in sequence databases provides ample evidence of a more widespread distribution in soil and further suggests that several clades of the early-diverging fungal phyla may actually be more common in aquatic environments (Fig. 2).
Implications of cryptic microbial diversity
Our study highlights the presence of multiple previously undescribed fungal groups and approximates their phylogenetic position within the fungi. These clades and branches seem to represent only the tip of the iceberg of unknown fungal lineages, because the groups recovered here matched (at >80% similarity) only 13 of the >1000 compound clusters of ITS sequences that lack described order-level representatives [10, 29], and because we focused solely on a prominent but still limited subset of soil-inhabiting taxa. Contrary to our hypothesis of a higher diversity of novel clades in the tropics, the preferred niche of undescribed groups was equally likely to be tropical or non-tropical. Notably, nearly one third of these clades were also recovered from soil in a single, comprehensively sampled field experiment in North Carolina, USA, suggesting that numerous undescribed and widespread fungal lineages await discovery and formal description even within single habitats. Most importantly, all fungal phyla accommodate previously unrecognized groups, but Rozellomycota stands out as particularly understudied, phylogenetically and taxonomically, both in aquatic habitats [20, 24] and in soil. The great phylogenetic richness of Rozellomycota is probably related to their ecologically successful obligate energy parasitism on protists, fungi, and algae and to a more recent switch (in Microsporidia) to an intracellular habitat in animals. This may have resulted in their early radiation, accelerated evolution of various genes, and overall genome compaction [20, 31].
DNA barcoding of culture collections and fungaria, as well as the release of sequence data for public use, will certainly uncover true vouchered representatives of several of our undescribed clades and facilitate the formal taxonomic description of these groups. Both fruiting bodies and cultures form an excellent basis for genomic analyses that reveal the functional capacities of undescribed taxa and improve phylogenetic resolution [16, 32, 33]. Metagenomic and single-cell genomic analyses offer promising tools for the taxonomic and functional characterization of bacteria and aquatic microeukaryotes in their natural environment, and these methods may also give satisfactory results for unicellular zoosporic fungi. They nevertheless remain a major challenge for multicellular fungi and other eukaryotes, because these taxa typically grow inside substrates, have genomes 10–100 times larger than those of bacteria, and carry their genetic information on multiple chromosomes. We predict that the combination of targeted DNA capture and sequencing of long metagenomic fragments will soon provide unprecedented insights into the phylogeny and function of eukaryotic microorganisms and shed light on tens to hundreds of previously unrecognized lineages of life.
We nevertheless fear that a non-trivial proportion of our undescribed lineages will resist immediate scientific scrutiny. Being uncultivable while forming no appreciable fruiting bodies or other tangible morphological structures is particularly problematic from a genomics point of view; indeed, that very combination precludes both straightforward genome sequencing and formal description of the underlying species. It will presumably take a long time before all the taxa presented here have formal names. In the meantime, we hope that the scientific community is prepared to address these lineages using informal names, such as “clade GS01” (Additional file 1: Text S1). These taxa are every bit as real and worthy of scientific study as taxa bearing formal Latin names. The ecological roles and functional capacities of these undescribed lineages remain poorly understood, which makes their exploration all the more pressing, given that fungi, including the early diverging lineages, are important sources for the pharmaceutical and enzyme industries. There is, furthermore, little reason to think that soil is the sole source of previously undescribed fungal lineages; other habitats and substrates, such as water, sediments, and other organisms, are likely to prove equally rich sources of taxonomic dark matter [37, 38].
This study extends and illustrates previous findings that soil harbors thousands of undescribed fungal taxa [8, 10, 13, 14], which we place into >30 previously unrecognized, well-supported fungal lineages. More importantly, these order- and class-level groups are distributed throughout the fungal tree of life and exhibit specific ecological preferences and/or biogeographic distribution patterns. To facilitate communication about these major clades among research groups, we propose a provisional naming system to be used until their valid taxonomic description or their matching with hitherto unsequenced species. These clade names are linked to fungal ITS and rRNA gene sequences in the UNITE database. Combining fluorescent probing with single-cell sequencing to cover nearly full-length rRNA genes will certainly improve our understanding of the ecophysiology and evolution of these enigmatic fungal clades.
We used the global soil DNA samples and fungal ITS2 data set from 365 localities in 38 countries to address phylogenetic and ecological hypotheses about the distribution of previously unknown fungal lineages. In brief, 40 soil subsamples (50 mm in diameter, to 50 mm depth) were collected from each 2500-m² site, pooled, air-dried, and pulverized. The soil powder was subjected to chemical analysis of macro- and micronutrients and to DNA extraction (2 g), followed by 454 pyrosequencing, quality filtering, clustering at 98% sequence similarity, and removal of singletons. From the final data set of 50,589 OTUs, we identified taxa originally assigned to fungi or rare protist groups, as well as taxa with unknown taxonomic affiliations, that displayed <80% sequence similarity to any species with a Latin binomial in BLASTn queries against an annotated copy of the International Nucleotide Sequence Databases (INSDc) maintained in UNITE. Depending on the taxon, 80% ITS sequence similarity roughly corresponds to the family or order level in fungi [8, 9]. Nearly 15% of all OTUs met this criterion, suggesting the presence of numerous new taxa at the family level or higher. Representative sequences of these OTUs were further clustered at 80% sequence similarity using single-linkage clustering with at least 100 bases of overlap in Sequencher 5.1 (GeneCodes Corp., Ann Arbor, MI, USA) to assign individual OTUs to larger taxonomic groups. To ensure that all major taxonomic clusters (>10 OTUs) were covered, we selected 203 individual OTUs and 23 groups of closely related OTUs (altogether comprising 60 OTUs with >95% sequence similarity within groups) for the design of taxon-specific primers and more detailed phylogenetic analyses. At the 80% similarity level, the selected OTUs represented 1111 OTUs and 15,515 sequences.
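The single-linkage step can be illustrated with a short Python sketch. It assumes that pairwise similarity (%) and overlap length (bases) between OTU representatives have already been computed by some aligner; the tuple input format and the function name `single_linkage` are hypothetical and do not reproduce Sequencher's internals. Under single linkage, one qualifying link (≥80% similarity over ≥100 bases) is enough to join two clusters, so OTUs can be chained together even when some member pairs are less similar.

```python
def single_linkage(otus, pairs, min_similarity=80.0, min_overlap=100):
    """Cluster OTUs by single linkage.

    otus  -- iterable of OTU identifiers
    pairs -- iterable of (otu_a, otu_b, similarity_percent, overlap_bases);
             this precomputed-pair format is a hypothetical input.
    Returns a sorted list of sorted clusters.
    """
    parent = {otu: otu for otu in otus}

    def find(x):
        # Union-find lookup with path halving.
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    for a, b, similarity, overlap in pairs:
        # One qualifying link suffices to merge two clusters (single linkage).
        if similarity >= min_similarity and overlap >= min_overlap:
            root_a, root_b = find(a), find(b)
            if root_a != root_b:
                parent[root_b] = root_a

    clusters = {}
    for otu in parent:
        clusters.setdefault(find(otu), []).append(otu)
    return sorted(sorted(members) for members in clusters.values())


if __name__ == "__main__":
    otus = ["OTU1", "OTU2", "OTU3", "OTU4", "OTU5"]
    pairs = [
        ("OTU1", "OTU2", 85.0, 250),  # qualifies
        ("OTU2", "OTU3", 81.5, 180),  # qualifies: OTU3 joins via OTU2
        ("OTU1", "OTU3", 72.0, 240),  # below threshold, irrelevant after chaining
        ("OTU4", "OTU5", 90.0, 60),   # similar, but overlap < 100 bases
    ]
    print(single_linkage(otus, pairs))
    # [['OTU1', 'OTU2', 'OTU3'], ['OTU4'], ['OTU5']]
```

The toy data show the chaining property: OTU1 and OTU3 share only 72% similarity but still end up together through OTU2, while the OTU4/OTU5 pair is kept apart by the overlap requirement.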
We sought to amplify the 3′ part of the 18S rRNA gene and the 5′ part of the 28S rRNA gene to allow phylogenetic inference at the kingdom level. For each target taxon, we designed reverse and forward primers in the variable part of the ITS region according to the following criteria: (i) primer melting temperature 54–58 °C; (ii) AT/CG ratio 33–62%; (iii) primer length 16–21 bases; (iv) a perfect match of the 3′-terminal 10 bases to <20 OTUs in the whole data set (usually to no other OTUs); and (v) a distance of >20 bases from the flanking 5.8S and 28S rRNA genes, to allow detection of unspecific amplification.
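The five primer criteria can be expressed as a simple filter. This is a sketch under several assumptions not stated in the text: the melting temperature is estimated with the Wallace rule, Tm = 2(A+T) + 4(G+C) °C, because the authors' Tm method is unspecified; the "AT/CG ratio" is read as GC content; the specificity check counts exact occurrences of the primer's 3′-terminal 10 bases in OTU sequences supplied in primer orientation (reverse primers would need reverse-complemented targets); and the flank coordinates and the function names are hypothetical.

```python
def wallace_tm(primer):
    """Approximate Tm (°C) by the Wallace rule, 2(A+T) + 4(G+C); an assumed method."""
    p = primer.upper()
    return 2 * (p.count("A") + p.count("T")) + 4 * (p.count("G") + p.count("C"))


def gc_percent(primer):
    """GC content in percent (our reading of the 'AT/CG ratio' criterion)."""
    p = primer.upper()
    return 100.0 * (p.count("G") + p.count("C")) / len(p)


def count_3prime_matches(primer, otu_seqs, n_bases=10):
    """Number of OTU sequences that contain the primer's 3'-terminal n bases exactly."""
    tail = primer.upper()[-n_bases:]
    return sum(1 for seq in otu_seqs if tail in seq.upper())


def passes_criteria(primer, start, end, flank_left, flank_right, otu_seqs):
    """Check criteria (i)-(v) for one candidate primer.

    start, end  -- primer coordinates on the target read (0-based, end exclusive)
    flank_left  -- hypothetical coordinate where the upstream flanking gene ends
    flank_right -- hypothetical coordinate where the downstream flanking gene starts
    otu_seqs    -- all OTU sequences, in the same orientation as the primer
    """
    checks = {
        "tm": 54 <= wallace_tm(primer) <= 58,                            # (i)
        "gc": 33 <= gc_percent(primer) <= 62,                            # (ii)
        "length": 16 <= len(primer) <= 21,                               # (iii)
        "specific": count_3prime_matches(primer, otu_seqs) < 20,         # (iv)
        "distance": start - flank_left > 20 and flank_right - end > 20,  # (v)
    }
    return all(checks.values()), checks


if __name__ == "__main__":
    primer = "ACGTTGCAAGGTCATCGA"  # made-up 18-mer, 50% GC
    ok, checks = passes_criteria(primer, 40, 58, 10, 120, ["TTTT" + primer + "GGGG"])
    print(wallace_tm(primer), ok)  # 54 True
```

Returning the individual checks alongside the overall verdict makes it easy to see which criterion rejected a candidate, which is useful when tuning primers for a difficult taxon.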
To amplify the 18S rRNA gene, the specific reverse primers were paired with the primers NS5a and NS7a (Additional file 2: Table S1). To amplify the 28S rRNA gene, we combined the specific forward primers with TW13 and LR5. PCR with specific primers was performed for both rRNA gene regions, each with two alternative primer combinations, for 443 samples representing 263 OTUs. Sanger sequencing was performed bidirectionally using the universal PCR primers together with the primers ITS2 and/or fITS7R for the 18S rRNA gene or LR0R for the 28S rRNA gene (Additional file 2: Table S1). Contigs were assembled in Sequencher with manual quality trimming. The reads obtained with the 18S and 28S rRNA gene primers typically overlapped at least partly with the pyrosequenced ITS2 fragment, which allowed initial chimera control. Individual sequences were further queried against GenBank using BLASTn to detect inconsistencies in the identification of the 18S rRNA gene, ITS1, ITS2, and 28S rRNA gene sequences. Full-length sequences were also subjected to chimera detection using UCHIME against other taxa in the data set and all INSDc entries spanning the 18S to 28S rRNA genes. These analyses revealed five potentially chimeric constructs, which were removed. PCR and Sanger sequencing were successful for 244 samples of the 18S (168 OTUs) and 298 samples of the 28S (193 OTUs) rRNA genes. Altogether, 138 OTUs were represented by both 18S and 28S rRNA gene sequences, whereas sequencing failed completely for 25 OTUs. The most common problems with specific primers were (i) multiple amplicons seen as a smear on the gel (18S rRNA gene), (ii) no amplification (18S and 28S rRNA genes), and (iii) poor fit of the complementary sequencing primer, resulting in a weak signal (18S rRNA gene). Individual reads were generally of high quality, indicating that each sequence originated from a single organism.
We obtained high-quality 18S and/or 28S rRNA gene Sanger sequences for 90.5% of the targeted OTUs, including all but two major groups (>10 OTUs). High-quality sequences were mainly recovered from samples with a relatively high abundance of target DNA (>0.2% of ITS sequences), but in many cases 18S and 28S rRNA gene data could be recovered even from singletons, i.e., taxa contributing <0.05% of all sequences per sample. Certain samples and OTUs failed to yield any amplicons, suggesting DNA degradation and unsuitability of the designed or eukaryote primers, respectively.
For phylogenetic inference, we used (i) the core 18S + 28S rRNA gene data set of James et al., supplemented with (ii) 18S and 28S rRNA gene sequences of more recently obtained specimens or cultures of early diverging fungal lineages, (iii) 18S and 28S rRNA gene sequences of at least one representative of every fungal order (except ascomycetes, for which representatives of ca. 70% of orders and of all classes were included), and (iv) 18S or 28S rRNA gene sequences of the best BLASTn hits (at least 600 bases) of our OTUs. Whenever possible, we included 18S and 28S rRNA gene sequences from the same specimen, preferably of the type species of the taxon, for taxonomic reliability. Because we included best-matching sequences, the 18S and 28S rRNA gene data sets were unbalanced, with ca. 25% non-overlapping entries. The two data sets were first aligned separately in MAFFT 7 with the FFT-NS-i option. Poorly aligned regions were removed using Gblocks v. 0.91b with the following parameters: minimum number of sequences for a conserved position = 50% of sequences, minimum number of sequences for a flank position = 75% of sequences, maximum number of contiguous non-conserved positions = 20, minimum length of a block = 2, and allowed gap positions = All. The final alignments of the 18S and 28S rRNA genes comprised 1701 and 879 positions, respectively. Because the phylogenetic positions of target taxa relative to the core specimens were similar in both alignments, we concatenated the two alignments for a joint analysis in addition to the separate analyses. Phylograms were inferred by maximum likelihood as implemented in RAxML 7.2.8 using the GTRCAT model. For the combined data set, 1000 heuristic searches were performed using a skeleton constraint tree for the taxa in James et al., and support was estimated from 1000 rapid bootstraps (also using the constraint tree). Individual 18S and 28S rRNA gene phylogenies were estimated using the -x option with 1000 rapid bootstraps and no constraint tree.
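Concatenating two unbalanced gene alignments, in which about a quarter of the entries are present for only one gene, is commonly done by padding the missing taxa with gap characters. The sketch below assumes alignments stored as Python dicts mapping taxon names to aligned sequences; the function name is hypothetical, and it illustrates the general supermatrix approach rather than the exact tooling the authors used. With the final alignment lengths reported above, the supermatrix would have 1701 + 879 = 2580 columns.

```python
def concatenate_alignments(aln_18s, aln_28s, gap="-"):
    """Join two gene alignments into one supermatrix.

    Each alignment is a dict {taxon: aligned_sequence} whose sequences share one
    length. Taxa absent from one gene are padded with gaps for that gene.
    Returns the supermatrix and 1-based, inclusive column ranges per gene.
    """
    def width(aln):
        lengths = {len(seq) for seq in aln.values()}
        if len(lengths) != 1:
            raise ValueError("an alignment must contain sequences of one common length")
        return lengths.pop()

    w18, w28 = width(aln_18s), width(aln_28s)
    taxa = sorted(set(aln_18s) | set(aln_28s))
    supermatrix = {
        taxon: aln_18s.get(taxon, gap * w18) + aln_28s.get(taxon, gap * w28)
        for taxon in taxa
    }
    # Column ranges could feed a partitioned model; the text does not say one was used.
    partitions = {"18S": (1, w18), "28S": (w18 + 1, w18 + w28)}
    return supermatrix, partitions


if __name__ == "__main__":
    aln_18s = {"taxonA": "ACGT", "taxonB": "AC-T"}  # toy data
    aln_28s = {"taxonA": "GG", "taxonC": "GA"}
    matrix, parts = concatenate_alignments(aln_18s, aln_28s)
    for name, seq in matrix.items():
        print(name, seq)  # taxonA ACGTGG / taxonB AC-T-- / taxonC ----GA
    print(parts)          # {'18S': (1, 4), '28S': (5, 6)}
```

Gap padding lets a maximum-likelihood program treat the missing gene as missing data, so taxa sequenced for only one gene still contribute to the joint tree.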
During a series of analyses, we excluded the following taxa from the original AFTOL alignments because of extremely long branches or inconsistent phylogenetic placement: Agonimia sp., Bacidia schweinitzii, Candida lusitaniae, Cryptomycocolax abnormis, Dermatocarpon miniatum, Encephalitozoon cuniculi, Echinoplaca strigulacea, and Yarrowia lipolytica. These taxa did not represent sister groups for any of our undescribed OTUs according to the initial analyses.
Based on the topology of the concatenated tree, we focused on statistically supported branches (BS >70) that contained no described species. We refer to these as clades following the International Code of Phylogenetic Nomenclature. We also considered unique branches comprising single sequences if these could not be placed in known orders or classes. Each novel group (37 clades and seven branches, altogether representing 819 OTUs and 9778 sequences) that comprised >1 OTU (93% of these groups) was subjected to niche analysis using the machine-learning random forest algorithm, combining the randomForest and VSURF packages of R. This approach makes no assumptions about the distribution of residuals or the type of response, which renders it suitable for very sparse data sets with large numbers of absences. For niche analysis, we compiled all information on the richness and distribution of OTUs within the clades defined above, together with the associated metadata. From an initial pool of 17 edaphic, floristic, and climatic variables, we selected the six most important predictors across the whole data set, removing multicollinear and unimportant variables. The final random forest models therefore included only mean annual temperature (MAT), mean annual precipitation (MAP), soil pH, soil P and C concentrations, and time since last fire. In a separate analysis, we tested whether the distribution of clades was biased in relation to biomes and ecoregions, which were treated as categorical predictors. P values were calculated from 999 permutations of the data using the rfPermute package of R. Model performance was assessed by 10-fold cross-validation: the original data were randomly partitioned into 10 subsets to generate training and test sets. This process was repeated 100 times and yielded an R²cv index describing how well models built on the training sets explained the test sets (Additional file 1: Figure S4).
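The cross-validation and permutation steps can be sketched in plain Python. The random forest itself is replaced here by a hypothetical least-squares stand-in (`fit_line`), and R²cv is computed as 1 − SS_res/SS_tot over the pooled held-out predictions of each repeat; whether the authors pooled predictions or averaged per fold is not stated, so treat this as an assumption. The permutation p-value uses the common (1 + b)/(1 + n) estimator, which with 999 permutations cannot fall below 0.001; this is not necessarily how rfPermute computes it internally.

```python
import random


def kfold_splits(n, k, rng):
    """Yield (train_idx, test_idx) pairs for one random k-fold partition of n samples."""
    idx = list(range(n))
    rng.shuffle(idx)
    folds = [idx[i::k] for i in range(k)]
    for i, test in enumerate(folds):
        train = [j for f, fold in enumerate(folds) if f != i for j in fold]
        yield train, test


def r2_cv(y_true, y_pred):
    """1 - SS_res / SS_tot computed on held-out predictions."""
    mean_y = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean_y) ** 2 for t in y_true)
    return 1.0 - ss_res / ss_tot


def repeated_cv(x, y, fit, k=10, repeats=100, seed=1):
    """Mean R2cv over `repeats` random k-fold partitions.

    fit(x_train, y_train) must return a predict(x_value) callable; any regressor,
    such as a random forest wrapper, could be plugged in.
    """
    rng = random.Random(seed)
    scores = []
    for _ in range(repeats):
        preds = [None] * len(y)
        for train, test in kfold_splits(len(y), k, rng):
            predict = fit([x[i] for i in train], [y[i] for i in train])
            for i in test:
                preds[i] = predict(x[i])
        scores.append(r2_cv(y, preds))
    return sum(scores) / len(scores)


def fit_line(xs, ys):
    """Ordinary least-squares line: a simple, hypothetical stand-in for a random forest."""
    mx, my = sum(xs) / len(xs), sum(ys) / len(ys)
    sxx = sum((a - mx) ** 2 for a in xs)
    slope = sum((a - mx) * (b - my) for a, b in zip(xs, ys)) / sxx
    intercept = my - slope * mx
    return lambda v: intercept + slope * v


def permutation_p(observed, null_stats):
    """(1 + number of null statistics >= observed) / (1 + number of permutations)."""
    exceed = sum(1 for s in null_stats if s >= observed)
    return (exceed + 1) / (len(null_stats) + 1)


if __name__ == "__main__":
    xs = list(range(50))
    ys = [2 * v + 1 for v in xs]  # noiseless toy response
    print(round(repeated_cv(xs, ys, fit_line, repeats=5), 6))  # 1.0
```

Repeating the random partition many times, as the text describes with 100 repeats, reduces the dependence of R²cv on any single lucky or unlucky split of sparse data.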
To illustrate the niches, we present histograms of the occurrence of specific OTUs within each clade compared with the null distribution of site conditions. The niche of a clade was considered significantly narrower than expected if (i) the standard deviation of the null distribution was more than twice that of the OTU distribution and (ii) the Levene test for homogeneity of variances was significant at α = 0.05. To visualize the relationships of clades with the climatic, edaphic, and biogeographic environment, we constructed a two-dimensional detrended correspondence analysis (DCA) ordination biplot using the occurrence of OTUs of clades and prominent branches and Bray-Curtis distance, as implemented in the vegan package of R (Fig. 3).
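The two-part narrowness criterion can be checked as follows. This sketch uses the original, mean-centred Levene statistic and, to stay within the Python standard library, obtains its p-value by permuting group labels rather than from the F distribution used by a standard Levene test; that substitution is an assumption of the sketch. `site_values` stands for one variable (e.g. soil pH) across all sites and `clade_values` for its values at sites where OTUs of the clade occur; both names, like the function names, are hypothetical. With SciPy available, `scipy.stats.levene` provides the parametric test (note that it centres on the median by default).

```python
import random
import statistics


def levene_w(groups):
    """Levene's W statistic, centring each group on its mean."""
    z = [[abs(v - statistics.fmean(g)) for v in g] for g in groups]
    n_total = sum(len(zi) for zi in z)
    k = len(z)
    z_means = [statistics.fmean(zi) for zi in z]
    z_grand = sum(sum(zi) for zi in z) / n_total
    between = sum(len(zi) * (m - z_grand) ** 2 for zi, m in zip(z, z_means))
    within = sum((v - m) ** 2 for zi, m in zip(z, z_means) for v in zi)
    if within == 0:
        return float("inf")
    return (n_total - k) / (k - 1) * between / within


def levene_perm_p(a, b, n_perm=999, seed=1):
    """Permutation p-value for Levene's W (a stand-in for the F-based p-value)."""
    rng = random.Random(seed)
    observed = levene_w([a, b])
    pooled = list(a) + list(b)
    exceed = 0
    for _ in range(n_perm):
        rng.shuffle(pooled)
        if levene_w([pooled[:len(a)], pooled[len(a):]]) >= observed:
            exceed += 1
    return (exceed + 1) / (n_perm + 1)


def niche_is_narrow(site_values, clade_values, alpha=0.05):
    """(i) null SD more than twice the clade SD and (ii) Levene test significant."""
    sd_null = statistics.stdev(site_values)
    sd_clade = statistics.stdev(clade_values)
    return sd_null > 2 * sd_clade and levene_perm_p(clade_values, site_values) < alpha


if __name__ == "__main__":
    sites = [3.5 + 0.05 * i for i in range(101)]     # toy pH values of all sites
    clade = [5.0, 5.1, 5.2, 5.0, 5.3, 5.1, 4.9, 5.2]  # toy pH values where a clade occurs
    print(niche_is_narrow(sites, clade))  # True
```

Requiring both conditions guards against declaring a niche narrow merely because a small clade happens to have low variance by chance, or because a significant but trivially small variance difference was detected.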
The 18S and 28S rRNA gene sequences were further compared with the metadata and phylograms in the literature from which the other environmental sequences used in our phylograms were obtained (Additional file 1: Table S2). These data and the associated metadata were integrated to interpret the ecological and geographic distribution of the soil-inhabiting groups. In addition, the ITS sequences of all focal taxa were compared with the 80% sequence similarity-based compound clusters in the UNITE database to determine the capacity of the newly recognized groups to identify clusters of recently accumulated fungal ITS barcodes.
Dighton J. Fungi in ecosystem processes. New York: Marcel Dekker; 2003.
Knack JJ, Wilcox LW, Delaux P-M, Piotrowski MJ, Cook ME, Graham JM. Microbiomes of streptophyte algae and bryophytes suggest that a functional suite of microbiota fostered plant colonization of land. Int J Plant Sci. 2015;176:405–20.
Blackwell M. The Fungi: 1, 2, 3 … 5.1 million species? Am J Bot. 2011;98:426–38.
Wardle DA, Lindahl BD. Disentangling global soil fungal diversity. Science. 2014;346:1052–3.
Spatafora JW, McLaughlin DJ. The Mycota 7: systematics and evolution. Berlin: Springer; 2014/2015.
Hawksworth DL. The fungal dimension of biodiversity: magnitude, significance, and conservation. Mycol Res. 1991;95:641–55.
Vartoukian SR, Palmer RM, Wade WG. Strategies for culture of ‘unculturable’ bacteria. FEMS Microbiol Lett. 2010;309:1–7.
Tedersoo L, Bahram M, Põlme S. Global diversity and geography of soil fungi. Science. 2014;346:1078.
Schoch CL, Seifert KA, Huhndorf S, Robert V, Spouge JL, Levesque CA, Chen W, Fungal Barcoding Consortium. Nuclear ribosomal internal transcribed spacer (ITS) region as a universal DNA barcode marker for Fungi. Proc Natl Acad Sci U S A. 2012;109:6241–6.
Nilsson RH, Wurzbacher C, Bahram M, Coimbra VRM, Larsson E, Tedersoo L, Eriksson J, Duarte Ritter C, Svantesson S, Sánchez-García M, Ryberg M, Kristiansson E, Abarenkov K. Top 50 most wanted fungi. MycoKeys. 2016;12:29–40.
Hugenholtz P, Goebel BM, Pace NR. Impact of culture-independent studies on the emerging phylogenetic view of bacterial diversity. J Bacteriol. 1998;180:4765–74.
Moon-van der Staay SY, De Wachter R, Vaulot D. Oceanic 18S rDNA sequences from picoplankton reveal unsuspected eukaryotic diversity. Nature. 2001;409:607–10.
Schadt CW, Martin AP, Lipson DA, Schmidt SK. Seasonal dynamics of previously unknown fungal lineages in tundra soils. Science. 2003;301:1359–61.
Jones MDM, Forn I, Gadelha C, Egan MJ, Bass D, Massana R, Richards TA. Discovery of novel intermediate forms redefines the fungal tree of life. Nature. 2011;474:200–3.
James TY, Kauff F, Schoch CL. Reconstructing the early evolution of fungi using a six-gene phylogeny. Nature. 2006;443:818–22.
Rosling A, Cox F, Cruz-Martinez K, Ihrmark K, Grelet G-A, Lindahl BD, Menkis A, James TY. Archaeorhizomycetes: unearthing an ancient class of ubiquitous soil fungi. Science. 2011;333:876–9.
Tringe SG, von Mering C, Kobayashi A, Salamov AA, Chen K, Chang HW, et al. Comparative metagenomics of microbial communities. Science. 2005;308:554–7.
Hibbett D. The invisible dimension of fungal diversity. Science. 2016;351:1150–1.
Jablonski D, Roy K, Valentine JW. Out of the tropics: evolutionary dynamics of the latitudinal diversity gradient. Science. 2006;314:102–5.
Grossart H-P, Wurzbacher C, James TY, Kagami M. Discovery of dark matter fungi in aquatic ecosystems demands a reappraisal of the phylogeny and ecology of zoosporic fungi. Fungal Ecol. 2016;19:28–38.
Karpov SA, Mamkaeva MA, Aleoshin VV, Nassonova E, Lilje O, Gleason FH. Morphology, phylogeny, and ecology of the aphelids (Aphelidea, Opisthokonta) and proposal for the new superphylum Opisthosporidia. Front Microbiol. 2014;5:112.
Sekimoto S, Rochon D, Long JE, Dee JM, Berbee ML. A multigene phylogeny of Olpidium and its implications for early fungal evolution. BMC Evol Biol. 2011;11:331.
Lefevre E, Letcher PM, Powell MJ. Temporal variation of the small eukaryotic community in two freshwater lakes: emphasis on zoosporic fungi. Aquat Microb Ecol. 2012;67:91–105.
Richards TA, Guy L, Mahé F, del Campo J, Romac S, Jones MDM, Maguire F, Dunthorn M, de Vargas C, Massana R, Chambouvet A. Molecular diversity and distribution of marine fungi across 130 European environmental samples. Proc R Soc B. 2015;282:20152243.
Benny GL, Smith ME, Kirk PM, Tretter ED, White MM. Challenges and future perspectives in the systematics of Kickxellomycotina, Mortierellomycotina, Mucoromycotina, and Zoopagomycotina. In: Li D-W, editor. Biology of microfungi. Cham: Springer; 2016. p. 65–126.
Treseder KK, Maltz M, Hawkins BA, Fierer N, Stajich JE, McGuire KL. Evolutionary histories of soil fungi are reflected in their large scale biogeography. Ecol Lett. 2014;17:1086–93.
Stajich JE, Berbee ML, Blackwell M, Hibbett DS, James TY, Spatafora JW, Taylor JW. The Fungi. Curr Biol. 2009;19:R840–5.
Lindahl BD, Nilsson RH, Tedersoo L, Abarenkov K, Carlsen T, Kjøller R, et al. Fungal community analysis by high-throughput sequencing of amplified markers—a user’s guide. New Phytol. 2013;199:288–99.
Kõljalg U, Tedersoo L, Nilsson RH, Abarenkov K. Digital identifiers for fungal species. Science. 2016;352:1182–3.
Mueller RC, Balasch MM, Kuske CR. Contrasting soil fungal community responses to experimental nitrogen addition using the large subunit rRNA taxonomic marker and cellobiohydrolase I functional marker. Mol Ecol. 2014;23:4406–17.
James TY, Pelin A, Bonen L, Ahrendt S, Sain D, Corradi N, Stajich JE. Shared signatures of parasitism and phylogenomics unite Cryptomycota and Microsporidia. Curr Biol. 2013;23:1548–53.
Dentinger BTM, Gaya E, O’Brien H, Suz LM, Lachlan R, Diaz-Valderrama JR, Koch RA, Aime MC. Tales from the crypt: genome mining from fungarium specimens improves resolution of the mushroom tree of life. Biol J Linn Soc. 2016;117:11–32.
Tedersoo L, Liiv I, Kivistik PA, Anslan S, Kõljalg U, Bahram M. Genomics and metagenomics technologies to recover ribosomal DNA and single-copy genes from old fruitbody and ectomycorrhiza specimens. MycoKeys. 2016;13:1–20.
Yoon HS, Price DC, Stepanauskas R, Rajah VD, Sieracki ME, Wilson WH, Yang EC, Duffy S, Bhattacharya D. Single-cell genomics reveals organismal interactions in uncultivated marine protists. Science. 2011;332:714–7.
del Campo J, Sieracki ME, Molestina RE, Keeling P, Massana R, Ruiz-Trillo I. The others: our biased perspective of eukaryotic genomes. Trends Ecol Evol. 2014;29:252–9.
Solomon KV, Haitjema CH, Henske JK, Gilmore SP, Borges-Rivera D, Lipzen A. Early-branching gut fungi possess a large, comprehensive array of biomass-degrading enzymes. Science. 2016;351:1192–5.
Sridhar KR, Beaton M, Bärlöcher F. Fungal propagules and DNA in feces of two detritus-feeding amphipods. Microb Ecol. 2011;61:31–40.
Panzer K, Yilmaz P, Weiß M, Reich L, Richter M, Wiese J. Identification of habitat-specific biomes of aquatic fungal communities using a comprehensive nearly full-length 18S rRNA dataset enriched with contextual data. PLoS ONE. 2015;10:e0134377.
Kõljalg U, Nilsson RH, Abarenkov K, Tedersoo L, Taylor AFS, Bahram M, et al. Towards a unified paradigm for sequence-based identification of Fungi. Mol Ecol. 2013;22:5271–7.
Edgar RC, Haas BJ, Clemente JC, Quince C, Knight R. UCHIME improves sensitivity and speed of chimera detection. Bioinformatics. 2011;27:2194–200.
Katoh K, Standley DM. MAFFT multiple sequence alignment software version 7: improvements in performance and usability. Mol Biol Evol. 2013;30:772–80.
Castresana J. Selection of conserved blocks from multiple alignments for their use in phylogenetic analysis. Mol Biol Evol. 2000;17:540–52.
Cantino P, de Queiroz K. International code of phylogenetic nomenclature 4c. 2011. http://www.ohiou.edu/phylocode/. Accessed 15 May 2016.
Stamatakis A, Aberer AJ, Goll C, Smith SA, Berger SA, Izquierdo-Carrasco F. RAxML-Light: a tool for computing terabyte phylogenies. Bioinformatics. 2012;28:2064–6.
Breiman L. Random forests. Mach Learn. 2001;45:5–32.
Liaw A, Wiener M. Classification and regression by randomForest. R News. 2002;2:18–22.
Genuer R, Poggi J-M, Tuleau-Malot C. VSURF: an R package for variable selection using random forests. R J. 2015;7:19–33.
http://CRAN.R-project.org/package=rfPermute. Accessed 5 May 2016.
Oksanen J, Blanchet FG, Kindt R, Legendre P, Minchin PR, O’Hara RB, Simpson GL, Solymos P, Stevens MH, Wagner H. Vegan: community ecology package. R package version 2.0-10. 2013.
https://unite.ut.ee/. Accessed 12 May 2016.
Acknowledgements
We thank K. Abarenkov for sequence archiving, and U. Kõljalg, C. Wurzbacher, and five anonymous referees for constructive comments on an earlier version of the manuscript.
Funding
This study was funded by Estonian Science Foundation grants 9286, PUT0171, and PUT1399; EMP265; MOBERC1; and EcolChange, which together covered all aspects of the work.
Availability of data and materials
All sequences are available through SRA (accession SRP055957), GenBank (accessions KY687510-KY687860), and UNITE (accessions UDB014609-UDB014959). OTU distribution data and sample metadata are available in Additional file 2: Data S1.
Authors' contributions
LT and TYJ conceptualized the work. RP performed the molecular analyses. TYJ ran the phylogenetic analyses. MB and RHN performed the statistical analyses. LT, RHN and TYJ wrote the manuscript. All authors read and approved the final manuscript.
Competing interests
The authors declare that they have no competing interests.
Consent for publication
Ethics approval and consent to participate
This work does not involve human or animal subjects or protected species.
Additional file 1: Figure S1. Full concatenated 18S and 28S rRNA gene phylogram.
Figure S2. Full 18S rRNA gene phylogram.
Figure S3. Full 28S rRNA gene phylogram.
Figure S4. Best models of the random forest machine-learning-based niche analysis of fungal clades and prominent branches.
Figure S5. Histograms of the distribution of fungal clades (summed occurrences of OTUs) across sites by mean annual temperature.
Figure S6. Histograms of the distribution of fungal clades (summed occurrences of OTUs) across sites by mean annual precipitation.
Figure S7. Histograms of the distribution of fungal clades (summed occurrences of OTUs) across sites by soil pH.
Figure S8. Histograms of the distribution of fungal clades (summed occurrences of OTUs) across sites by time since last fire, soil carbon content, and soil phosphorus concentration.
Text S1. Profiles of undescribed clades and prominent branches of fungi. (PDF 10058 kb)
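The histograms in Figures S5 to S8 summarise how often each clade's OTUs occur along an environmental gradient. The sketch below shows one way such summed occurrence counts could be tallied. It is not the authors' workflow: it assumes presence records stored as (OTU, site) pairs, a lookup from OTU to clade, per-site environmental values, and fixed-width bins, and the function name `summed_occurrences_by_bin`, the record format, and the clade labels are hypothetical.

```python
from collections import defaultdict


def summed_occurrences_by_bin(occurrences, otu_clade, site_value, bin_width):
    """Sum OTU occurrences per clade within fixed-width bins of a site variable.

    Hypothetical input format (an assumption, not the published data layout):
      occurrences: iterable of (otu_id, site_id) presence records
      otu_clade:   dict mapping otu_id -> clade label
      site_value:  dict mapping site_id -> environmental value (e.g. MAT in °C)
      bin_width:   width of each histogram bin, in the variable's units
    Returns {clade: {bin_lower_edge: summed_occurrences}}.
    """
    counts = defaultdict(lambda: defaultdict(int))
    for otu, site in occurrences:
        clade = otu_clade.get(otu)
        if clade is None or site not in site_value:
            continue  # skip OTUs without a clade or sites without metadata
        # Floor division maps each value to the lower edge of its bin
        bin_start = (site_value[site] // bin_width) * bin_width
        counts[clade][bin_start] += 1
    return {clade: dict(bins) for clade, bins in counts.items()}


# Toy example with made-up OTUs, sites, and temperatures
occurrences = [("otu1", "s1"), ("otu2", "s1"), ("otu1", "s2"), ("otu3", "s2")]
otu_clade = {"otu1": "cladeA", "otu2": "cladeA", "otu3": "cladeB"}
site_value = {"s1": 3, "s2": 24}
result = summed_occurrences_by_bin(occurrences, otu_clade, site_value, 5)
```

Here `result` is `{"cladeA": {0: 2, 20: 1}, "cladeB": {20: 1}}`: two cladeA OTUs occur in the 0 to 5 °C bin and one in the 20 to 25 °C bin. Counting each OTU-by-site presence once is only one reading of "summed occurrences"; abundance-weighted counts would need read numbers instead.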