From: The evolutionary signal in metagenome phyletic profiles predicts many gene functions

Inferring gene function from metagenomes representing distinct environments. a Proportions of Gene Ontology (GO) terms that can be simultaneously predicted from a certain number of environments, expressed for three different stringencies (Pr thresholds). b Ability to predict GO functions, expressed as the function-specific accuracy of the environment-representing MPP. Rows in heatmaps represent highly specific GO functions (IC > 8), columns are environments, and brighter colors represent higher accuracy (as AUPRC score). Rows are ordered by hierarchical clustering (full dendrogram in Additional file 1: Figure S9). c Distribution of the selected associations over seven environment types. d A REVIGO plot [82] showing the semantic similarity of the ‘Biological process’ GO functions that were associated with the human host metagenomic data. Circle color represents excess accuracy, computed by subtracting the function-specific AUPRC of the second-best MPP from the AUPRC of the best MPP. e, f Precision-recall curves for two GO functions associated with human host data sets. g, h Distributions of GO function relative abundances across metagenomes from different environments. Points in the violin plots represent the first quartile, the median, and the third quartile. The width of each violin plot is scaled in proportion to the number of metagenomes observed in the group.
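The excess accuracy used for the circle colors in panel d is a simple difference between the two highest per-environment AUPRC scores for a GO function. A minimal sketch of that arithmetic follows; the function name, the environment labels other than the human host, and the AUPRC values are hypothetical and chosen only for illustration, not taken from the paper.

```python
def excess_accuracy(auprc_by_environment):
    """Return (best environment, best AUPRC minus second-best AUPRC).

    `auprc_by_environment` maps an environment name to the AUPRC that the
    environment's MPP achieved for one GO function (illustrative input format).
    """
    if len(auprc_by_environment) < 2:
        raise ValueError("need AUPRC scores for at least two environments")
    # Rank environments from highest to lowest AUPRC.
    ranked = sorted(auprc_by_environment.items(), key=lambda kv: kv[1], reverse=True)
    (best_env, best_score), (_, second_score) = ranked[0], ranked[1]
    return best_env, best_score - second_score


# Hypothetical AUPRC values for a single GO function.
scores = {"human_host": 0.62, "soil": 0.41, "marine": 0.35}
env, excess = excess_accuracy(scores)
print(env, round(excess, 2))
```

A large excess indicates that one environment's MPP predicts the function much better than any other, while a value near zero means that at least two environments perform similarly.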

