Fig. 5 | Microbiome

Fig. 5

From: Gut microbiome-metabolome interactions predict host condition

Internal representation improves outcome prediction compared with microbiome and metabolites and is associated with dataset features. A Average SCC between the CCA outputs of the microbiome and metadata (pink), the metabolites and metadata (yellow), and LOCATE’s representation and the metadata (blue). A one-sided t-test was applied between the models. The significance stars follow the convention of the previous figures. B–D Weights of the CCA between LOCATE’s representations and the metadata on its first two components for He (B), Jacob (C), and Poyet (D). When a variable is categorical, all of its weights are stacked together in different colors (for the categorical information, see Supplementary material Table S7). The first-component values are shown in blue and the second-component values in green. E, F Bar plots of the average AUC (E) and the average SCC (F) of the predicted outcomes over different datasets and different tasks. The pink colors represent the microbiome-based models: light pink is an iMic model trained on the microbiome data only (referred to as “Mic. iMic”), and dark pink is an iMic model trained on the microbiome and the metadata together (referred to as “Mic., meta iMic”). The yellow colors represent the metabolite-based models: light yellow is a logistic regression (LR) model in E or a Ridge model in F trained only on the metabolites (referred to as “Met. LR”), and dark yellow is an iMic model trained on both the metabolites and the microbiome (referred to as “Mic., Met. iMic”). The blue colors represent the models based on LOCATE: the lightest blue is the Log network (referred to as “Log-log LR”), the intermediate blue is a model trained on LOCATE’s representation (referred to as “Z LOCATE LR”), and the darkest blue is a model trained on both LOCATE’s representation and the metadata (referred to as “Z LOCATE, meta LR”). The standard errors over the 10 cross-validations are shown in black.
A one-sided t-test was applied between the models. The p-values are \(< 0.001\) in all the comparisons apart from Kim and some of the LI tasks. G–I Effect of a decreasing number of metabolites used for LOCATE’s representation on the condition predictions in He (G), Jacob (H), and Poyet (I). The x-axis represents the number of microbiome-metabolite pairs used to train LOCATE; the y-axis represents the difference between the average AUC (over 10 runs) of the outcome predicted from LOCATE’s representation and the average AUC (over 10 runs) of the outcome predicted from the microbiome only. In most of the datasets, 50 metabolites are enough for LOCATE’s representation to outperform the microbiome. The pink line marks the zero value, and the dashed yellow line represents the metabolites’ contribution (of all samples) to the microbiome. When LOCATE is better, the point lies above the pink line. J, K Bar plots of the average AUC (J) and the average SCC (K) of the predicted outcomes over different datasets and different tasks. The orange color represents the Multiview model’s results. The red colors represent the different IntegratedLearner models: pink-red is the microbiome-only variant, orange-red is the metabolites-only variant, red is the stacked variant, and dark red is the concatenated variant. The blue color represents LOCATE. The standard errors over the 10 cross-validations are shown in black. A one-sided t-test was applied between the models. The p-values are \(< 0.001\) in all the comparisons apart from the LI task, Poyet, and VAT18. All the results in A, E, F, J, and K are reported as an average of 10 runs on an external test set.
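The quantity plotted on the y-axis of panels G to I, and the black error bars in the bar plots, can be illustrated with a minimal sketch. The function names and the AUC values here are hypothetical placeholders chosen for illustration; they are not taken from the paper or from LOCATE's code.

```python
from math import sqrt
from statistics import mean, stdev


def mean_and_se(scores):
    """Return the mean and the standard error of the mean of per-run scores."""
    return mean(scores), stdev(scores) / sqrt(len(scores))


def auc_gain(auc_representation, auc_microbiome):
    """Difference of average AUCs; a positive value means the
    representation-based model scored higher than the microbiome-only one."""
    return mean(auc_representation) - mean(auc_microbiome)


# Made-up AUC values for 10 runs (illustrative only, not results from the paper).
auc_representation = [0.80, 0.82, 0.79, 0.81, 0.83, 0.80, 0.78, 0.82, 0.81, 0.84]
auc_microbiome = [0.74, 0.76, 0.73, 0.75, 0.77, 0.74, 0.72, 0.76, 0.75, 0.78]

gain = auc_gain(auc_representation, auc_microbiome)
avg, se = mean_and_se(auc_representation)
print(f"AUC gain over microbiome only: {gain:.3f}")
print(f"Representation AUC: {avg:.3f} (SE {se:.3f})")
```

A point with a positive gain would sit above the zero line in such a plot.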
