Fig. 1 | Microbiome

Fig. 1

From: Metagenomic strain detection with SameStr: identification of a persisting core gut microbiota transferable by fecal transplantation

Species-specific shared strain detection in metagenomic samples with SameStr. A Schematic of the SameStr workflow. SameStr has been implemented modularly, including optional wrapper functions for quality preprocessing and alignment of whole-genome shotgun (WGS) metagenomic reads to species-specific MetaPhlAn markers (align), functions for the conversion to nucleotide variant profiles (convert), extraction of markers from genome sequences (extract), sample and reference pooling (merge), extensive global, per-sample, marker and position filtering (filter) and comparison of SNV profiles (compare) based on maximum variant similarity (MVS). SameStr outputs (summarize) tables denoting pairwise comparison results, including species alignment similarity and overlap, and co-occurrence of taxa at distinct taxonomic levels (based on MetaPhlAn) and at the strain level. B SameStr identifies shared strains in metagenomic samples by calculating a pairwise MVS, using all single-nucleotide variants detected in the read alignments of these samples to species-specific marker genes. C To assess the MetaPhlAn-based phylogenetic resolution (db_v20) and validate the 99.9% similarity threshold for shared strains used by SameStr, 458 bacterial genomes from 20 of the most abundant and prevalent fecal microbiota species in our rCDI cohort (Table S4) were compared using MetaPhlAn2 [30] and using average nucleotide identities (ANIs) determined with FastANI [31]. MetaPhlAn2- and FastANI-based pairwise sequence similarities are strongly correlated (Spearman’s r = 0.93, p < 2.2e−16, n = 9813), demonstrating comparable phylogenetic resolution. Genome similarities exhibit a multimodal distribution (two-dimensional density kernel contours): reference genomes share peak sequence similarities at 97.5%, 99.0%, and above 99.9% identity, which reflect the presence of distinct species, subspecies, and strains in the reference dataset.
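The pairwise comparison described in panel B can be illustrated with a minimal sketch. This is not SameStr's code or API. It assumes that each sample's SNV profile is a list of per-position allele counts in the order A, C, G, T, and that two samples agree at a position when they share at least one detected allele there. The function names, the profile layout, and the example counts are all hypothetical. Only the 99.9% threshold is taken from the caption.

```python
# Illustrative sketch only: names and data layout are assumptions, not SameStr's API.

def max_variant_similarity(profile_a, profile_b):
    """Return (similarity, overlap) for two per-position allele count profiles.

    Each profile is a list of [A, C, G, T] read counts, one entry per marker
    position. Positions not covered in both samples are ignored.
    """
    overlap = 0  # positions covered in both samples
    shared = 0   # overlapping positions with at least one allele in common
    for counts_a, counts_b in zip(profile_a, profile_b):
        alleles_a = {i for i, c in enumerate(counts_a) if c > 0}
        alleles_b = {i for i, c in enumerate(counts_b) if c > 0}
        if not alleles_a or not alleles_b:
            continue  # no coverage in one of the samples
        overlap += 1
        if alleles_a & alleles_b:
            shared += 1
    if overlap == 0:
        return None, 0
    return shared / overlap, overlap


def is_shared_strain(similarity, threshold=0.999):
    """Apply the 99.9% similarity threshold named in the caption."""
    return similarity is not None and similarity >= threshold


# Hypothetical toy profiles with four marker positions.
sample_x = [[10, 0, 0, 0], [0, 8, 2, 0], [0, 0, 0, 0], [0, 0, 5, 0]]
sample_y = [[7, 0, 0, 0], [0, 0, 9, 0], [0, 0, 4, 0], [0, 6, 0, 0]]

mvs, n_overlap = max_variant_similarity(sample_x, sample_y)
shared_call = is_shared_strain(mvs)
```

In this toy case the third position is excluded because sample_x has no coverage there, and the second position counts as a match because the minor G allele in sample_x is also present in sample_y. A real analysis would compare far more positions and would likely also require a minimum overlap before making a call.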
