Stability of operational taxonomic units: an important but neglected property for analyzing microbial diversity
© He et al.; licensee BioMed Central. 2015
Received: 5 January 2015
Accepted: 3 April 2015
Published: 20 May 2015
The operational taxonomic unit (OTU) is widely used in microbial ecology. Reproducibility in microbial ecology research depends on the reliability of OTU-based 16S ribosomal subunit RNA (rRNA) analyses.
Here, we report that many hierarchical and greedy clustering methods produce unstable OTUs, with membership that depends on the number of sequences clustered. If OTUs are regenerated with additional sequences or samples, sequences originally assigned to a given OTU can be split into different OTUs. Alternatively, sequences assigned to different OTUs can be merged into a single OTU. This OTU instability affects alpha-diversity analyses such as rarefaction curves, beta-diversity analyses such as distance-based ordination (for example, Principal Coordinate Analysis (PCoA)), and the identification of differentially represented OTUs. Our results show that the proportion of unstable OTUs varies for different clustering methods. We found that the closed-reference method is the only one that produces completely stable OTUs, with the caveat that sequences that do not match a pre-existing reference sequence collection are discarded.
As a compromise to the factors listed above, we propose using an open-reference method to enhance OTU stability. This type of method clusters sequences against a database and includes unmatched sequences by clustering them via a relatively stable de novo clustering method. OTU stability is an important consideration when analyzing microbial diversity and is a feature that should be taken into account during the development of novel OTU clustering methods.
Rapid advances in DNA sequencing technologies over the past decade have allowed us to study communities of microorganisms in far greater depth than was previously possible. Many of these studies involve PCR amplification and sequencing of marker genes (often the 16S small ribosomal subunit RNA (rRNA)) from complex communities of organisms, which can then be compared to databases of known sequences to identify the taxa present in the microbial community. These methods have led to the discovery of new organisms at a much faster rate than taxonomists can describe and name. To facilitate taxonomy-independent analyses and to reduce the computational resources necessary for such, marker gene sequence reads are typically clustered based on sequence similarity, under the assumption that sequences with greater similarity represent more phylogenetically similar organisms. These clusters, or operational taxonomic units (OTUs), are widely used as an analytical unit in microbial ecology studies .
Due to the lack of a gold standard of ‘correct’ OTUs, several measurements have been used to evaluate the performance of clustering methods, for example, rationality of OTU structure [2,3], computational efficiency (that is, runtime and memory requirements) , and the ability to cope with OTU inflation . However, OTU stability has rarely been studied to date, despite the importance of this property. Here, we define the stability of an OTU by whether it contains the same clustered sequence(s) regardless of the number of sequences that are clustered. If OTUs are found to be unstable when clustering different numbers of sequences in different clustering runs, the sequences in a given OTU may be assigned to different OTUs. Alternatively, sequences assigned to different OTUs can be assigned to a single OTU.
In this study, we reveal that unstable OTUs lead to non-overlapping rarefaction curves. We further show that these unstable OTUs can also affect beta-diversity analyses. We also evaluated existing de novo and reference-based clustering methods to show that all de novo clustering methods are unstable to some extent, while closed-reference clustering generates stable OTUs. Closed-reference clustering has the considerable limitation of excluding any OTUs that are not defined in a pre-existing reference set, which in turn excludes novel OTUs from analysis. As a compromise between generating OTU instability and the potential elimination of novel taxa, we recommend using open-reference OTU clustering , which we show to result in more stable OTUs than fully de novo clustering methods.
Results and discussion
Changing membership of OTUs at different sequencing depths (OTU instability) - a neglected but important property for analyses of microbial diversity
To illustrate the problem created by unstable OTUs, we reproduced the non-overlapping rarefaction curves using the same dataset (Canada soil dataset) and the same clustering method (complete linkage clustering, referred to as CL clustering) employed by Roesch et al. (Figure 1a). We randomly subsampled the raw sequences at four sequencing depths (20%, 40%, 60%, and 80% of the input sequences) using 30 replicates of each. We then used complete linkage (CL) clustering to cluster each of the subsamples (definitions of all clustering methods can be found in Additional file 1) and generated rarefaction curves for each sampling depth. In the case of CL clustering, the rarefaction curve produced by a larger subsample is steeper than that produced by a smaller subsample.
One goal when generating rarefaction curves is to support interpolation, meaning that if we create a rarefaction curve from a full dataset, we would like to use that curve to determine how many species would be observed for some number of sequences that amounts to less than the total. For example, when we interpolate from the rarefaction curve created from a full dataset, we estimate that we have approximately 4,500 species if we randomly select 30,000 sequences from the full dataset (point A in Figure 1a). The problem that non-overlapping rarefaction curves pose for interpolation, however, is that if we instead randomly subsampled 30,000 sequences from an 80% subsample of the full dataset, we would estimate that only 4,200 species are represented by these 30,000 sequences (point B in Figure 1a). This scenario would essentially be true in cases where only a few sequences were collected per sample, a phenomenon that conflicts with the expected behavior of rarefaction curves.
We have observed that the non-overlapping of rarefaction curves, as illustrated in Figure 1a, is actually caused by the instability of OTU clustering methods. In other words, the cluster that a sequence is assigned to can be affected by the number of sequences being clustered. An illustration of this hypothesis is shown in Figure 1b. If we observe only two sequences, S1 and S2, within the similarity threshold (indicated by linking with a bar), they are clustered into a single OTU (OTU1). We then add three more sequences, S3, S4, and S5, which could be linked to S1 or to S2, but several pairwise distances exceed the threshold (these pairs are not linked by bars). By definition of CL, pairwise distances for all sequences assigned to a single OTU must fit within the distance threshold [8,9], which could allow S1 and S2 to be separated into OTU2 and OTU3. OTU1 disappears at this sequencing depth, and its sequences are reassigned to two different OTUs, illustrating the problem of OTU instability. Theoretically, adding more sequences tends to split existing OTUs when using the CL algorithm. As a result, when being clustered with a larger dataset versus a smaller dataset, the same sequences will be grouped into more OTUs. This will result in a steepening of the rarefaction curve that is derived from the larger sample and the conclusion that it has a higher alpha-diversity. Rarefaction curves that arise from CL are therefore more sensitive to sequencing depth. Though this effect is weak, it still partially illustrates why, in some cases, the collecting of a number of sequences that is based on a smaller sample size would be expected to produce a rarefaction curve that reaches a plateau, and instead a continually increasing rarefaction curve is produced. This phenomenon of an individual being assigned to different OTUs simply because of increased or decreased sampling depth is obviously problematic. An analogous situation based on traditional (macro-scale) ecology would be if counting different numbers of birds within a fixed area led to the redefinition of which individual birds group together as a species. However, the above-described instability is not due to the occasional identification of novel species, as might be the case in traditional ecology. In contrast, these changes to OTU membership occur systematically within a large proportion of the sequences being reassigned between OTUs.
To further investigate the effect of unstable OTUs on biological interpretation, we next explored beta-diversity using ordination. Using Principal Coordinate Analysis (PCoA), we compared the microbial communities against the full dataset using subsamples comprising 60% of the full dataset. We repeated this subsampling 30 times to create replicates. We then used CL clustering to cluster all of the subsamples, as well as the full dataset, and combined the clustering results by representative OTU sequence (defined as the most abundant sequence in each OTU). The samples were then randomly rarefied to include 30,000 sequences per sample, including the 30 replicate rarefactions that resulted from the clustering of the full dataset. Following rarefaction, all samples contained the same number of sequences so that the only differences among them were the number of sequences that were initially clustered. PCoA demonstrated that these samples separated according to the number of sequences that were initially clustered, indicating that OTU instability results in the same samples appearing to have different compositions (Figure 1c). A similar result was observed when comparing the 20%, 40%, and 80% subsamples against the full dataset (Additional file 2: Figure S1). Furthermore, 125 OTUs (after false discovery rate (FDR) correction) and 26 OTUs (after Bonferroni correction) were determined to be significantly different between these two groups using the Mann-Whitney U test. We also tested the effect that unstable OTUs have on calculating taxonomic composition and found the effect to be very limited (Additional file 3: Figure S2 and Additional file 4). This is because these OTUs are still assigned to the same taxa as a consequence of their phylogenetic proximity, despite the fact that they are changing when more sequences are added using CL (also discussed below in the section detailing the tolerance of PCoA to using phylogenetic metrics with unstable OTUs).
Alternative hierarchical and greedy clustering methods also produce unstable OTUs
The instability produced by average linkage is more complicated because both OTU splitting and OTU merging can occur. These conflicting effects lead to more subtle differences in OTU counts, and the resultant rarefaction curves that are created with AL overlap at different depths (Figure 2d). Furthermore, the AL OTUs themselves are unstable (Additional file 5: Figure S3) due to the large number of OTU splitting and merging events that occur. Additionally, even though these unstable OTUs affect beta-diversity (Adonis, R = 0.16, P = 0.001), the major separation in PCoA appears to be caused by factors other than sample size; for example, the possible inclusion of differences that result from the input order of the sequences or the presence or absence of certain key sequences within different subsamples (Figure 2e). This observation may result from the sensitivity of AL to the order of input sequences, which would result in different clustering patterns. When using AL, 804 OTUs (after FDR correction) and 5 OTUs (after Bonferroni correction) were differentially represented across the two sampling depths.
Greedy clustering, such as that which is implemented in USEARCH, is another commonly used de novo clustering method that is more computationally efficient than CL, SL, and AL. When using greedy clustering, a sequence must be within the distance threshold of a single OTU centroid to be clustered in that OTU. Furthermore, sequences are processed in a defined order, and each query sequence will either be assigned to an existing OTU or as the centroid of a new OTU. If one query sequence is within the distance threshold of multiple existing OTU centroids, it can be assigned to either the closest centroid (here referred to as distance-based greedy clustering (DGC)) or the most abundant centroid (here referred to as abundance-based greedy clustering (AGC)) (Additional file 1). Alternative approaches exist for breaking such ties; however, we chose to limit our focus to those that are the most commonly employed. In the present study, we evaluate USEARCH as a method for greedy clustering (we did not evaluate UPARSE because its clustering algorithm is the same as that used in USEARCH).
We used greedy clustering on alpha rarefaction curves and beta-diversity PCoA to analyze the effects generated by unstable OTUs. As stated above, DGC and AGC both suffer from centroid changeability (this effect is not biased towards OTU splitting or merging), and DGC additionally suffers from the splitting of existing OTUs. As a result, DGC and CL clustering produced similar curves, which became steeper as subsample size increased (Figure 3c). In contrast, AGC produced overlapped curves that were unaffected by depth (Figure 3d). However, as with AL clustering, this does not mean that the OTUs were stable, but only that similar numbers of (possibly different) OTUs were obtained at the different subsampling depths. The unstable OTUs produced in DGC and AGC effect estimations of beta-diversity (Figure 3e,f). In the case of AGC, 392 OTUs (after FDR correction) and 14 OTUs (after Bonferroni correction) were determined to be differentially represented across the two depths, and in the case of DGC, these numbers were 370 and 15, respectively.
One de novo clustering method that does produce stable OTUs is dereplication or the clustering of sequences that are identical and of equal length (Additional file 8: Figure S4a). As with closed-reference OTU clustering, all OTUs remain absolutely stable across different sequencing depths because clustering is not affected by the composition of the sequence collection being clustered. As a result, rarefaction curves produced using dereplication are overlapping across different depths (Additional file 8: Figure S4b), and beta-diversity is not affected by the size of the subsamples (Additional file 8: Figure S4c). Moreover, not a single OTU is determined to be significantly different between the two groups. It is important to note that dereplication is highly vulnerable to identifying spurious OTUs that result from sequencing error. Due to its stability in binning OTUs, it also produces overlapping rarefaction curves across different depths, indicating that unstable OTUs (rather than sequencing errors) are the main cause of non-overlapping rarefaction curves. Furthermore, the stability of the dereplication method suggests that a higher similarity threshold for clustering may reduce the occurrence of unstable OTUs, as de novo clustering methods become more similar to dereplication as the similarity threshold increases. In practice, dereplication clustering yields high numbers of OTUs, which is computationally expensive to employ downstream. Thus, modern dataset sizes prevent us from working with sequences that have only been dereplicated. It is possible that future methods may use approaches based on dereplication to manage the problem of OTU instability. Another extreme example would be the clustering of all sequences into one OTU while that OTU remains absolutely stable. Nevertheless, unlike dereplication, OTUs can be utilized in further analyses, such as alpha-diversity, beta-diversity, and taxonomic composition. Furthermore, clustering all sequences into one OTU can hardly be called ‘clustering’ and is completely useless for downstream analysis.
Reference-based methods minimize the problem of unstable OTUs
To overcome the limitations of closed-reference OTU clustering, open-reference OTU clustering can be used. Open-reference clustering begins in the same way as closed-reference clustering but continues to cluster the sequences that do not match the reference collection in a de novo manner. Although existing de novo clustering methods produce unstable OTUs, open-reference clustering can be much more stable than such methods because many sequences are initially clustered by the closed-reference approach. We evaluated OTU stability in open-reference clustering using AGC for the de novo clustering step (Figure 4a,b,c) and found it to be a much more effective method than using de novo methods alone. The majority of the unstable OTUs were low abundance sequences with no reference match (a category of sequences that is commonly considered to be error-prone). Open-reference OTU clustering produces overlapping rarefaction curves (Additional file 9: Figure S5a), and even though the instability of open-reference OTU clustering still affects PCoA analysis (Additional file 9: Figure S5b), the PC and R value (by ADONIS, R = 0.03) is lower than with any other de novo method alone, as is the number of OTUs that are differentially represented across the two groups (104 OTUs after FDR correction and 2 OTUs after Bonferroni correction). We compared open-reference clustering methods with other de novo methods on additional datasets, focusing on the proportion of unstable sequences and unstable OTUs and found that these results are generally consistent across environment types and sequencing technologies (Additional file 7: Figure S6).
In addition to quantifying the instability of OTUs, we used the MCC index to investigate how the clustering of sequence pairs changed based on clustering of the full dataset versus the 60% subset (Figure 4b, Additional file 6: Table S2). It is clear that the two reference-based methods and dereplication clustering have the highest stability by this metric and that AGC is the most stable of the de novo clustering methods (Kruskal-Wallis test, P < 0.05). AL had the lowest MCC value, indicating that the clustering of many sequences pairs changed when using this method. Alternatively, SL produced a higher MCC value than most of the de novo methods, including AL and CL. Nevertheless, part of the reason for the high MCC value of SL is that its FP value equals 0 (sequences that are separated in a smaller subsample will be merged into a single OTU in a larger subsample, but the reverse situation does not happen at all). Thus, due to its severe problems with OTU merging, SL should not be considered a much more stable method.
Phylogenetic beta-diversity metrics minimize the effect of OTU instability
Unlike non-phylogenetic metrics, where all OTUs are considered equally dissimilar from each other, phylogenetic metrics such as UniFrac take into account the phylogenetic relationship between OTUs when calculating distances between samples. Unstable OTU clustering methods will move sequences between OTUs that would usually be closely related evolutionarily so that the calculated distance between samples should generally remain more similar than it would when using non-phylogenetic diversity metrics. We re-analyzed the effect of unstable OTUs on beta-diversity using CL, SL, AL, AGC, and DGC based on UniFrac distance (Additional file 10: Figure S7). The results show that unstable OTUs of CL, AGC, and DGC minimally affect beta-diversity using UniFrac distance, confirming the hypothesis that when sequences are changing between closely related OTUs with these unstable methods, phylogenetic metrics are more tolerant to that instability. Nevertheless, in SL clustering, distantly related OTUs can ultimately be joined into a single OTU, so that beta-diversity can be affected even when using UniFrac distance. In AL, the major separation is still caused by different clustering patterns, as with the non-phylogenetic metrics.
Assigning an organism to a different species based on the number of individuals included in a given census is obviously problematic in traditional macro-ecology. Unfortunately, due to an artifact of many OTU clustering methods, the equivalent situation is common in microbial ecology studies that use taxonomic-independent methods. This study demonstrated for the first time that the problem of depth-dependent rarefaction curves that is inherent in most of the de novo clustering algorithms arises from unstable OTUs and not from sequencing errors. Furthermore, unstable OTUs may also contribute to depth-dependent beta-diversity estimates and OTU membership. Our results demonstrate that the closed-reference OTU clustering method provides stable OTUs, while all other de novo clustering methods (except dereplication) produce unstable OTUs. Unfortunately, the closed-reference OTU method is limited by the availability of databases of known marker gene sequences, which excludes novel OTUs from analysis. To balance this caveat of closed-reference OTU clustering against the problem of OTU instability, we recommend using open-reference OTU clustering employing AGC (for example, as implemented using uclust  in QIIME ), a relatively stable de novo method, to cluster sequences that did not map to the reference database. This allows for clustering of all sequences and introduces minimal OTU instability compared to de novo OTU clustering methods (which also cluster all sequences). As new OTU clustering algorithms continue to be rapidly developed, we suggest that OTU stability should be an important consideration in future endeavors. However, we do not argue that OTU stability is the most important quality of clustering methods. Other attributes of clustering methods, such as diminishing sequencing errors, rational OTU structure, niche consistency, and time efficiency, should also be considered when choosing a method for a specific type of analysis or evaluating a novel approach to clustering. Furthermore, as the 16S rRNA gene database is expanding, closed-reference OTU clustering and new algorithms that bypass OTU clustering altogether during microbiome analysis may render problems concerning OTU instability obsolete.
The ‘Canada soil’ dataset [GenBank:EF308591-EF361836] was originally used to describe that rarefaction curves generated from the same dataset but at different depths often do not overlap  and is used here to demonstrate unstable OTUs. This dataset was one of the earliest reported 454 datasets, and quality information was not available for the sequences contained therein. Although we demonstrated in the results section (via the dereplication method) that OTU instability is introduced by clustering algorithms and not by sequencing errors, we nevertheless removed sequences that contained ambiguous bases (N), potential chimeras identified by UCHIME using the de novo mode (using --minchunk 20 --xn 7 –noskipgaps 2) , and sequences that were of a length that ranged outside of two standard deviations from the mean. Such sequences were removed for quality control purposes (in the absence of quality scores). The original dataset contained a total of 53,246 sequences, and 13,469 unique sequences remained after dereplication. Following our quality control measures, a total of 50,542 sequences with 13,293 unique tags remained, ranging in length from 86 to 120 bp.
The hierarchical clustering algorithms, CL, AL, and SL, were run using MOTHUR 1.27 . The same pairwise Needleman-Wunsch distance matrix (with default parameters) was calculated for each pair of unique sequences and used in the three clustering algorithms at a 97% identity threshold. We used the QIIME 1.7.0  USEARCH 6.1 wrapper  for the AGC and DGC greedy clustering algorithms. We used QIIME 1.7.0  with the Greengenes 13_8 database for closed- and open-reference clustering. These clustering methods are further defined in Additional file 1, along with commands used the execute them.
To investigate the effect of sequencing depths on rarefaction curves, we subsampled our input data (prior to OTU clustering) at 5 depths (20%, 40%, 60%, 80%, and 100% of the total data), with 30 replicates at each depth. We then clustered each of these datasets with each of the clustering methods and created rarefaction curves. The presented data included the average rarefaction curve across all replicates for each clustering method at each depth.
To demonstrate the effect of sequencing depth on beta-diversity, we subsampled our input data to include 60% of sequences prior to OTU clustering and compared it with the full dataset (the first round of subsampling was to create a 60% subsample to compare with the full dataset). Because de novo clustering methods do not provide universal OTU identifiers, after clustering each of these datasets, we combined the clustering results according to a representative sequence of each OTU (which we chose to be the most abundant sequence in each OTU). In this way, OTUs with the same representative sequences were merged into a single OTU. We then created a BIOM file  from the merged OTUs for further analyses. We computed beta-diversity using QIIME 1.7.0 with all samples rarefied to 30,000 sequences per sample (60% of the total dataset, the second round of subsampling; to diminish the effect of depth on PCoA, we only tested the effect of unstable OTUs on PCoA). To accomplish this, we employed the Bray-Curtis distance (a non-phylogenetic metric) and weighted UniFrac (a phylogenetic metric). Adonis was used to test whether the samples from the full dataset and the 60% subset clustered independently of each other and to quantify the size of that effect (this was performed by running compare_categories.py --method adonis -i dm.txt -m Map.txt -c Treatment -o adonis_out -n 999).
We used a proportion of unstable sequences, a proportion of unstable OTUs, and Matthews’s correlation coefficient (MCC) to quantify OTU stability. Unstable sequences were defined to include unique sequences that were represented by different OTU centroids at the different sequencing depths. Unstable OTUs were defined to include OTUs whose membership changed at different sequencing depths (in other words, if an OTU is composed of the same sequences at different sequencing depths, excluding sequences that were not included in the smaller subsample, then that OTU is defined as being stable). To compute the percent of unstable sequences and unstable OTUs, we compared the clustering result of the full dataset with that of the 60% subsample using different clustering methods to analyze 30 replicates for each method. If a unique sequence was represented by the same representative sequence in both the 60% and the full dataset (excluding sequences not present in the 60% dataset), we considered the sequence to be stable. Otherwise, the sequence was considered to be unstable. If an OTU in the 60% subsample contained all of the same sequences as the full dataset (not including any sequences not present in the 60% subsample), we considered the OTU to be stable. Otherwise, we considered the OTU to be unstable. We additionally grouped all of the OTUs according to the frequency of their centroid at counts of 1, 2, 3 to 5, 6 to 10 and higher than 10, for the purpose of evaluating the stability of each of these groups.
If two sequences were clustered together in the full dataset and also in the 60% subsample, we recorded it as a true positive (TP). If two sequences were clustered separately in the full dataset and also in the 60% subsample, we recorded it as a true negative (TN). If two sequences were clustered together in the full dataset but not in the 60% subsample, we recorded it as a false negative (FN). Finally, if two sequences were clustered separately in the full dataset but together in the 60% subsample, we recorded it as a false positive (FP). Higher MCC values indicate enhanced stability of the clustering method.
This work was supported by the National Natural Science Foundation of China (NSFC 31270152 and 31322003), the COMRA project (DY125-15-R-01), the Program for New Century Excellent Talents in University (NCET-11-0921), the Guangdong Natural Science Foundation (S2011010004136), the Educational Commission of Guangdong Province, China (2012KJCX0031), a grant awarded by the United States National Science Foundation [NSF/BDI 0960626 to SMH], and grants awarded by the National Institutes of Health and the Howard Hughes Medical Institute.
- The Human Microbiome Project Consortium. A framework for human microbiome research. Nature. 2012;486:215–21.View ArticleGoogle Scholar
- Jiang X-T, Zhang H, Sheng H-F, Wang Y, He Y, Zou F, et al. Two-stage clustering (TSC): a pipeline for selecting operational taxonomic units for the high-throughput sequencing of PCR amplicons. PLoS ONE. 2012;7:e30230.PubMed CentralPubMedView ArticleGoogle Scholar
- Schloss PD, Westcott SL. Assessing and improving methods used in operational taxonomic unit-based approaches for 16S rRNA gene sequence analysis. Appl Environ Microbiol. 2011;77:3219–26.PubMed CentralPubMedView ArticleGoogle Scholar
- Edgar RC. Search and clustering orders of magnitude faster than BLAST. Bioinformatics. 2010;26:2460–1.PubMedView ArticleGoogle Scholar
- Edgar RC. UPARSE: highly accurate OTU sequences from microbial amplicon reads. Nat Methods. 2013;10:996–8.PubMedView ArticleGoogle Scholar
- Roesch LFW, Fulthorpe RR, Riva A, Casella G, Hadwin AKM, Kent AD, et al. Pyrosequencing enumerates and contrasts soil microbial diversity. ISME J. 2007;1:283–90.PubMed CentralPubMedGoogle Scholar
- Rideout JR, He Y, Navas-Molina JA, Walters WA, Ursell LK, Gibbons SM, et al. Subsampled open-reference clustering creates consistent, comprehensive OTU definitions and scales to billions of sequences. Peer J. 2014;2:e545.PubMed CentralPubMedView ArticleGoogle Scholar
- Huse SM, Welch DM, Morrison HG, Sogin ML. Ironing out the wrinkles in the rare biosphere through improved OTU clustering. Environ Microbiol. 2010;12:1889–98.PubMed CentralPubMedView ArticleGoogle Scholar
- McDonald D, Price MN, Goodrich J, Nawrocki EP, DeSantis TZ, Probst A, et al. An improved Greengenes taxonomy with explicit ranks for ecological and evolutionary analyses of bacteria and archaea. ISME Journal. 2011;6:610–8.PubMed CentralPubMedView ArticleGoogle Scholar
- Edgar RC, Haas BJ, Clemente JC, Quince C, Knight R. UCHIME improves sensitivity and speed of chimera detection. Bioinformatics. 2011;27:2194–200.PubMed CentralPubMedView ArticleGoogle Scholar
- Schloss PD, Westcott SL, Ryabin T, Hall JR, Hartmann M, Hollister EB, et al. Introducing mothur: open-source, platform-independent, community-supported software for describing and comparing microbial communities. Appl Environ Microbiol. 2009;75:7537–41.PubMed CentralPubMedView ArticleGoogle Scholar
- Caporaso JG, Kuczynski J, Stombaugh J, Bittinger K, Bushman FD, Costello EK, et al. QIIME allows analysis of high-throughput community sequencing data. Nat Methods. 2010;7:335–6.PubMed CentralPubMedView ArticleGoogle Scholar
- Daniel McDonald JCC, Justin K, Jai Ram R, Jesse S, Doug W, Andreas W, et al. The Biological Observation Matrix (BIOM) format or: how I learned to stop worrying and love the ome-ome. Giga Science. 2013;1:7.View ArticleGoogle Scholar
This is an Open Access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/4.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly credited. The Creative Commons Public Domain Dedication waiver (http://creativecommons.org/publicdomain/zero/1.0/) applies to the data made available in this article, unless otherwise stated.