Fig. 5
From: Machine learning-aided analyses of thousands of draft genomes reveal specific features of activated sludge processes

Performance of the random forest model. a Confusion matrix showing the performance of the random forest model on the 20% testing data group of the holdout validation. b Prediction accuracy of the random forest model determined based on 10-fold cross-validation. c ROC curves for evaluating the random forest model created from 10-fold cross-validation. d The completeness and contamination of correctly predicted MAGs and wrongly predicted MAGs. Boxplots along the x- and y-axes show the means and quartiles of the completeness and contamination values of correctly and wrongly predicted MAGs.
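
For readers unfamiliar with the evaluation terms in this caption, the following is a minimal sketch of how a confusion matrix, a prediction accuracy, and a 10-fold split can be computed. It is not the authors' pipeline: the class names `class_a` and `class_b`, the toy labels, and the helper functions are hypothetical and chosen only for illustration, and no random forest is trained here.

```python
import random

def confusion_matrix(y_true, y_pred, labels):
    """Return nested counts[true_label][predicted_label] over the given labels."""
    counts = {t: {p: 0 for p in labels} for t in labels}
    for t, p in zip(y_true, y_pred):
        counts[t][p] += 1
    return counts

def accuracy(y_true, y_pred):
    """Fraction of samples whose predicted label equals the true label."""
    correct = sum(t == p for t, p in zip(y_true, y_pred))
    return correct / len(y_true)

def k_fold_indices(n_samples, k=10, seed=0):
    """Shuffle sample indices and deal them into k near-equal folds.

    In k-fold cross-validation each fold serves once as the test set
    while the remaining k - 1 folds are used for training.
    """
    idx = list(range(n_samples))
    random.Random(seed).shuffle(idx)
    return [idx[i::k] for i in range(k)]

# Hypothetical toy labels (assumption: two classes, six samples).
labels = ["class_a", "class_b"]
y_true = ["class_a", "class_a", "class_b", "class_b", "class_a", "class_b"]
y_pred = ["class_a", "class_b", "class_b", "class_b", "class_a", "class_a"]

cm = confusion_matrix(y_true, y_pred, labels)
acc = accuracy(y_true, y_pred)
folds = k_fold_indices(25, k=10)
```

In this sketch the diagonal entries of `cm` count correct predictions, so the accuracy equals the diagonal sum divided by the number of samples. Averaging such per-fold accuracies over the ten folds would give a cross-validated accuracy of the kind reported in panel b.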