From: Recovering complete and draft population genomes from metagenome datasets
Method | Starting point | Clustering methods | Negatives | Positives | Computational Resources |
---|---|---|---|---|---|
Nucleotide composition (NC) | Oligonucleotide frequency matrix and %G+C-based screening. | Hierarchical clustering (HCL), correlation-based network graphs, and emergent self-organizing maps (ESOM). | (i) More efficient for genomes with skewed nucleotide composition patterns. (ii) Less efficient in differentiating between closely related genotypes. (iii) Depends on visualization and manual inspection of bins and is therefore not suitable for very large assemblies representing complex environments. | (i) Individual metagenome assemblies, or samples where populations do not change over time, can be used. | (i) R packages: qgraph (8), igraph, pvclust [82]. (ii) tetramerFreqs [83] (https://github.com/tetramerFreqs/Binning). (iii) Databionic ESOM tools [84] (http://databionic-esom.sourceforge.net/). (iv) 2T-binning [85] (http://hmp.ucalgary.ca/HMP/metagenomes/data/SCADC/454/Binning/2TBinning/). |
Nucleotide composition and abundance (NCA) | A composite distance matrix built from the oligonucleotide frequency matrix and coverage. | K-medoids clustering, Gaussian mixture models, and the expectation-maximization algorithm. | (ii), (iv) Require multiple samples for better performance, and are therefore associated with higher cost, time, and computational resources. | (i), (ii) Better contig binning than the NC method. | (i) MetaBAT [54] (https://bitbucket.org/berkeleylab/metabat). (ii) CONCOCT [86] (https://github.com/BinPro/CONCOCT). (iii) MaxBin [87] (http://downloads.jbei.org/data/microbial_communities/MaxBin/MaxBin.html). (iv) GroopM [57] (https://github.com/minillinim/GroopM). (v) Databionic ESOM tools [84] (http://databionic-esom.sourceforge.net/). |
Differential abundance (DA) | Differential coverage patterns across multiple samples in which populations changed in abundance over time. | Profile-based correlation cut-off. | (iv) Requires multiple samples in which populations changed in abundance over time, and is therefore associated with higher cost, computational time, and resources. | (ii), (iii) Strain-level resolution can be achieved. | (i) Multi-metagenome [49] (https://github.com/MadsAlbertsen/multi-metagenome). (ii) MGS Canopy algorithm [51] (https://github.com/fplaza/mgs-canopy-algorithm). (iii) Databionic ESOM tools [84] (http://databionic-esom.sourceforge.net/). |
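
The starting points listed in the table can be illustrated with a minimal Python sketch. It is not the implementation used by any of the tools above. The function names, the equal weighting in `composite_distance`, and the 0.9 correlation cut-off are assumptions chosen for illustration, and reverse-complement tetramers are not merged, which real binning tools commonly do.

```python
from itertools import product
from math import sqrt

# All 256 possible tetranucleotides, in a fixed order.
KMERS = ["".join(p) for p in product("ACGT", repeat=4)]


def tetra_freqs(seq):
    """Normalized tetranucleotide frequencies of one contig (NC starting point)."""
    seq = seq.upper()
    counts = dict.fromkeys(KMERS, 0)
    for i in range(len(seq) - 3):
        kmer = seq[i:i + 4]
        if kmer in counts:  # skip windows containing N or other ambiguity codes
            counts[kmer] += 1
    total = sum(counts.values())
    return [counts[k] / total if total else 0.0 for k in KMERS]


def euclidean(a, b):
    """Euclidean distance between two equal-length vectors."""
    return sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def pearson(a, b):
    """Pearson correlation of two coverage profiles (one value per sample)."""
    n = len(a)
    mean_a, mean_b = sum(a) / n, sum(b) / n
    cov = sum((x - mean_a) * (y - mean_b) for x, y in zip(a, b))
    var_a = sum((x - mean_a) ** 2 for x in a)
    var_b = sum((y - mean_b) ** 2 for y in b)
    if var_a == 0 or var_b == 0:
        return 0.0  # a flat profile carries no differential signal
    return cov / sqrt(var_a * var_b)


def composite_distance(seq_a, cov_a, seq_b, cov_b, weight=0.5):
    """Hypothetical NCA-style distance: weighted composition plus abundance terms."""
    composition = euclidean(tetra_freqs(seq_a), tetra_freqs(seq_b))
    abundance = 1.0 - pearson(cov_a, cov_b)
    return weight * composition + (1.0 - weight) * abundance


def co_abundant(cov_a, cov_b, cutoff=0.9):
    """DA-style test: do two contigs pass an (assumed) correlation cut-off?"""
    return pearson(cov_a, cov_b) >= cutoff


if __name__ == "__main__":
    contig_a, contig_b = "ACGTACGTTTGACC" * 20, "GGGCGCCGCGGGCC" * 20
    cov_a, cov_b = [10.0, 25.0, 60.0], [12.0, 11.0, 13.0]
    print(composite_distance(contig_a, cov_a, contig_b, cov_b))
    print(co_abundant(cov_a, cov_b))
```

In this sketch the NC approach corresponds to using `tetra_freqs` alone, the NCA approach to `composite_distance`, and the DA approach to `co_abundant` applied to coverage profiles from multiple samples. A clustering step such as those named in the table would then operate on the resulting distances or correlations.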