Fig. 2 | Microbiome

Fig. 2

From: High-resolution strain-level microbiome composition analysis from short reads

Overview of StrainScan. (a) Sketch of the strain genome clustering process. Given the strain genomes (G1, G2, ...) of the bacteria of interest, all-against-all k-mer Jaccard similarities are computed using Dashing [40]. The genomes are then grouped by single-linkage hierarchical clustering; by default, the clustering threshold is a Jaccard similarity of 0.95. In this example, the cutoff shown by the dashed red line yields five clusters, C1 to C5. (b) From these clusters, a hierarchical cluster tree is constructed for later cluster-level identification. (c) Collinear blocks are generated to extract k-mers that help distinguish strains within the same cluster. (d) This step completes the construction of the indexing structure for the reference genomes. (e) and (f) The indexing structure and the sequencing data (reads) are the input for strain search. (e) Search for clusters. (f) Strains are identified by iterative matrix multiplication, and the relative abundance profile is finally inferred by elastic net regression.
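
The clustering step in panel (a) can be illustrated with a minimal Python sketch. It is not StrainScan's implementation: it computes exact k-mer Jaccard similarities rather than the sketch-based estimates produced by Dashing, and the function names (`kmer_set`, `jaccard`, `single_linkage_clusters`) and the toy default `k=4` are assumptions made for illustration only. Cutting a single-linkage dendrogram at a similarity threshold is equivalent to taking the connected components of the graph that links every pair of genomes whose similarity meets the threshold, which is what the union-find structure below does.

```python
from itertools import combinations


def kmer_set(seq, k):
    """Return the set of all k-mers (substrings of length k) in seq."""
    return {seq[i:i + k] for i in range(len(seq) - k + 1)}


def jaccard(a, b):
    """Exact Jaccard similarity |A & B| / |A | B| of two k-mer sets."""
    union = len(a | b)
    return len(a & b) / union if union else 1.0


def single_linkage_clusters(genomes, k=4, threshold=0.95):
    """Group genomes by single linkage at a Jaccard similarity threshold.

    genomes: dict mapping a genome name to its sequence string
             (hypothetical input format chosen for this sketch).
    Returns a sorted list of clusters, each a sorted list of names.
    """
    names = list(genomes)
    kmers = {name: kmer_set(genomes[name], k) for name in names}
    parent = {name: name for name in names}  # union-find forest

    def find(x):
        # Follow parent links to the root, halving the path as we go.
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    # Merge the clusters of every pair whose similarity reaches the cutoff.
    for a, b in combinations(names, 2):
        if jaccard(kmers[a], kmers[b]) >= threshold:
            parent[find(a)] = find(b)

    clusters = {}
    for name in names:
        clusters.setdefault(find(name), []).append(name)
    return sorted(sorted(members) for members in clusters.values())
```

Calling `single_linkage_clusters` on a dictionary of genome sequences returns the groups that would correspond to C1, C2, ... in the figure. Because linkage is single, two genomes can share a cluster without being directly similar, as long as a chain of sufficiently similar genomes connects them.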
