Fig. 4 | Microbiome

Fig. 4

From: Model-free prediction of microbiome compositions

Dissimilarity-overlap analysis of the training data as a proxy for the kNN performance. a–c Examples of dissimilarity-overlap curves (DOCs) for three cohorts of \(m=1000\) simulated samples, with \(N=40\) species, interaction strength \(\sigma =3.4\), and different universality values (\(\lambda = 0, 0.25, 0.5\)). The slope of the DOC is calculated using a linear fit over the 20% of points with the highest overlap (blue lines). d The kNN gain, \(\Delta\), versus the DOC slope for different cohorts of simulated samples. For each cohort, we calculate its DOC slope and its \(\Delta\) value, the latter based on 50 test samples. Different cohorts were generated by varying the ‘interaction strength’ and the ‘universality’ features. Blue symbols represent the mean \(\Delta\) over 10 cohorts with a fixed universality value of \(\lambda =0\) and interaction strength \(\sigma\) ranging between 0.8 and 3.4. Yellow diamonds represent the mean \(\Delta\) over 10 cohorts with a fixed interaction strength \(\sigma =3.4\) and \(\lambda\) between 0 and 1. The error bars represent the standard error (SE). The straight lines are linear fits (with goodness of fit \(R^2 = 0.99\) for both lines). e–g Examples of DOCs of three cohorts of real microbial samples from different body sites (“left retroauricular crease,” “hard palate,” and “subgingival plaque”). h Same as (d), for 13 body sites of the HMP dataset. The straight line is a linear fit (goodness of fit \(R^2 = 0.96\)). In both simulated and real microbial data, the kNN gain is larger for cohorts with steeper DOC slopes (larger absolute values).
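The DOC slope described in the caption can be illustrated with a minimal sketch. The caption does not define overlap or dissimilarity, so the code assumes one common convention: overlap is the mean relative abundance carried by the species two samples share, and dissimilarity is the root Jensen-Shannon divergence between the renormalized shared-species profiles. The function names `overlap_and_dissimilarity` and `doc_slope` are hypothetical, and "highest 20% overlapping points" is read here as the 20% of sample pairs with the largest overlap. This is not the authors' code.

```python
import numpy as np
from itertools import combinations


def overlap_and_dissimilarity(x, y):
    """Return (overlap, dissimilarity) for two relative-abundance vectors.

    Assumed definitions (not stated in the caption):
      overlap       = mean abundance carried by species present in both samples
      dissimilarity = root Jensen-Shannon divergence of the renormalized
                      shared-species profiles
    """
    shared = (x > 0) & (y > 0)
    if not shared.any():
        return 0.0, np.nan  # no shared species: dissimilarity is undefined

    overlap = 0.5 * (x[shared].sum() + y[shared].sum())

    # Renormalize each sample over the shared species only.
    p = x[shared] / x[shared].sum()
    q = y[shared] / y[shared].sum()
    m = 0.5 * (p + q)

    def kl(a, b):
        # Kullback-Leibler divergence; all entries are strictly positive here.
        return float(np.sum(a * np.log(a / b)))

    jsd = 0.5 * kl(p, m) + 0.5 * kl(q, m)
    return float(overlap), float(np.sqrt(max(jsd, 0.0)))


def doc_slope(samples, top_fraction=0.2):
    """Slope of a linear fit over the sample pairs with the highest overlap."""
    points = [
        overlap_and_dissimilarity(samples[i], samples[j])
        for i, j in combinations(range(len(samples)), 2)
    ]
    points = np.array([pt for pt in points if not np.isnan(pt[1])])

    # Keep the top_fraction of pairs ranked by overlap (at least two points).
    n_top = max(2, int(np.ceil(top_fraction * len(points))))
    top = points[np.argsort(points[:, 0])[-n_top:]]

    slope, _intercept = np.polyfit(top[:, 0], top[:, 1], 1)
    return float(slope)
```

Here `samples` is assumed to be a 2-D array with one row per sample and one column per species, each row summing to 1. Under these definitions, a more negative slope means that pairs sharing more of their community also have more similar abundance profiles, which is the regime in which the caption reports a larger kNN gain.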
