Fig. 1 | Microbiome

From: MetaDecoder: a novel method for clustering metagenomic contigs

Fig. 1

The two-layer architecture of MetaDecoder. In the first layer, a GPU-based modified Dirichlet process Gaussian mixture model (DPGMM) clusters all contigs (≥ 2.5 Kb by default) into preliminary clusters based on the combination of k-mer frequency and coverage. Preliminary clusters whose average pairwise Euclidean distance between k-mer frequencies is greater than 0.04 are marked as abnormal and removed from subsequent analysis. Each remaining preliminary cluster is then passed to the second layer for further clustering. The second layer comprises a semi-supervised k-mer frequency probabilistic model with an elaborate seed selection model, and a modified Gaussian mixture model (GMM) as the coverage probabilistic model. Pure clusters (with an estimated genome number of one) are output, and the remaining contigs continue to the next iteration until all contigs are consumed.
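
The abnormal-cluster filter in the caption can be illustrated with a small sketch. This is not MetaDecoder's own code: the function names, the toy 4-element frequency vectors, and the choice to treat a cluster with fewer than two contigs as having distance 0 are assumptions made for illustration. Only the 0.04 threshold and the use of the average pairwise Euclidean distance come from the caption.

```python
from itertools import combinations
from math import dist
from statistics import fmean

# Threshold stated in the caption; everything else here is illustrative.
ABNORMAL_THRESHOLD = 0.04


def mean_pairwise_distance(kmer_freqs):
    """Average Euclidean distance over all unordered pairs of frequency vectors."""
    pairs = list(combinations(kmer_freqs, 2))
    if not pairs:
        # Assumption: a cluster with fewer than two contigs has no pairs,
        # so it is treated as having distance 0.
        return 0.0
    return fmean(dist(a, b) for a, b in pairs)


def is_abnormal(kmer_freqs, threshold=ABNORMAL_THRESHOLD):
    """True if the cluster's average pairwise distance exceeds the threshold."""
    return mean_pairwise_distance(kmer_freqs) > threshold


# Toy clusters: each row is one contig's (hypothetical) k-mer frequency vector.
tight = [
    [0.25, 0.25, 0.25, 0.25],
    [0.26, 0.24, 0.25, 0.25],
    [0.25, 0.26, 0.24, 0.25],
]
loose = [
    [0.40, 0.10, 0.10, 0.40],
    [0.10, 0.40, 0.40, 0.10],
]

print(is_abnormal(tight))  # compositionally similar contigs
print(is_abnormal(loose))  # compositionally dissimilar contigs
```

In this sketch the `tight` cluster stays below the threshold and would be kept, while the `loose` cluster exceeds it and would be marked as abnormal. Real k-mer frequency vectors are much longer than four elements, and the all-pairs loop is quadratic in the number of contigs, so a production implementation would likely vectorize this step.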
