Table 2 Classification performance using RF under different levels of features covering total contributions

From: Statistical modeling of gut microbiota for personalized health status monitoring

Values are AUC. Columns 0.3 to 0.9 give the accumulated contribution percentage used for feature selection (η, note a).

| Diseases | 0.3 | 0.4 | 0.5 | 0.6 | 0.7 | 0.8 | 0.9 | KS-92 features | All 313 features |
|---|---|---|---|---|---|---|---|---|---|
| ACVD | 0.731 | 0.738 | 0.747 | 0.762 | 0.752 | 0.791 | 0.788 | 0.770 | 0.784 |
| SA | 0.617 | 0.708 | 0.592 | 0.492 | 0.633 | 0.600 | 0.683 | 0.600 | 0.883 |
| CRC | 0.652 | 0.765 | 0.746 | 0.762 | 0.815 | 0.794 | 0.789 | 0.827 | 0.828 |
| CA | 0.539 | 0.463 | 0.550 | 0.529 | 0.554 | 0.632 | 0.621 | 0.523 | 0.554 |
| UC | 0.706 | 0.809 | 0.807 | 0.797 | 0.773 | 0.742 | 0.696 | 0.668 | 0.725 |
| CD | 0.862 | 0.868 | 0.878 | 0.934 | 0.934 | 0.959 | 0.950 | 0.953 | 0.986 |
| T2D | 0.575 | 0.665 | 0.623 | 0.693 | 0.712 | 0.696 | 0.694 | 0.618 | 0.591 |
| IGT | 0.674 | 0.487 | 0.530 | 0.560 | 0.622 | 0.562 | 0.462 | 0.540 | 0.592 |
| RA | 0.525 | 0.579 | 0.613 | 0.616 | 0.576 | 0.629 | 0.646 | 0.631 | 0.571 |
| OB | 0.723 | 0.706 | 0.756 | 0.786 | 0.773 | 0.777 | 0.800 | 0.800 | 0.883 |
| OW | 0.467 | 0.511 | 0.571 | 0.551 | 0.539 | 0.621 | 0.509 | 0.454 | 0.525 |
| UW | 0.497 | 0.683 | 0.733 | 0.811 | 0.751 | 0.805 | 0.737 | 0.638 | 0.516 |
| Averaged (note b) | 0.602 | 0.658 | 0.683 | 0.701 | 0.694 | 0.721 | 0.698 | 0.659 | 0.664 |

  1. (a) The BHC of each species across diseases in the discovery cohort is ranked in descending order, and the total contribution is \(\sum_{i=1}^{D}\mathrm{BHC}_{x_i}\). Features \(x_1, x_2, \ldots, x_p\) are selected until the ratio of the accumulated contribution \(\sum_{i=1}^{p}\mathrm{BHC}_{x_i}\) to the total contribution exceeds the given percentage \(\eta\), that is, \(\sum_{i=1}^{p}\mathrm{BHC}_{x_i} \big/ \sum_{i=1}^{D}\mathrm{BHC}_{x_i} > \eta\). Once the features have been selected, a random forest (RF) classifier is used for classification evaluation.
  2. (b) Excluding SA (smallest sample size), CD (best AUC) and IGT (worst AUC), the AUC values of the remaining nine diseases were averaged. The average AUC was highest (0.721) when the accumulated contribution ratio exceeded 0.8.
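The selection rule in note a can be sketched in Python. This is a minimal sketch, assuming the BHC score of each species has already been computed; the BHC definition itself is not given in this table. The function name `select_by_accumulated_contribution` and the example scores are hypothetical, and the subsequent random forest evaluation is not shown.

```python
from typing import Dict, List


def select_by_accumulated_contribution(bhc: Dict[str, float], eta: float) -> List[str]:
    """Keep top-ranked features until their share of the total BHC exceeds eta.

    `bhc` maps each feature (species) to a precomputed, non-negative BHC score.
    Computing BHC itself is assumed to happen elsewhere (hypothetical input).
    """
    if not 0.0 < eta < 1.0:
        raise ValueError("eta must lie strictly between 0 and 1")
    total = sum(bhc.values())  # total contribution over all D features
    if total <= 0.0:
        raise ValueError("total contribution must be positive")

    # Rank features by BHC in descending order.
    ranked = sorted(bhc, key=bhc.get, reverse=True)

    selected: List[str] = []
    accumulated = 0.0
    for feature in ranked:
        selected.append(feature)
        accumulated += bhc[feature]
        # Stop at the first p whose accumulated/total ratio is strictly greater than eta.
        if accumulated / total > eta:
            break
    return selected


if __name__ == "__main__":
    scores = {"s4": 1.0, "s2": 3.0, "s1": 4.0, "s3": 2.0}  # hypothetical BHC values
    print(select_by_accumulated_contribution(scores, 0.6))  # → ['s1', 's2']
    print(select_by_accumulated_contribution(scores, 0.7))  # → ['s1', 's2', 's3']
```

With scores 4, 3, 2 and 1 (total 10) and η = 0.6, the top two features already give a ratio of 0.7 > 0.6. With η = 0.7 the strict inequality is not met by 0.7, so a third feature is added.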
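The averaging rule in note b can be checked against the table with a short Python sketch. The AUC values are transcribed from Table 2; the helper name `averaged_auc` is hypothetical. Averaging the nine remaining diseases column by column and rounding to three decimals reproduces the Averaged row, with its maximum in the 0.8 column.

```python
from statistics import mean
from typing import Dict, List, Set

# AUC values transcribed from Table 2. Column order: eta = 0.3 ... 0.9, KS-92, All 313.
COLUMNS = ["0.3", "0.4", "0.5", "0.6", "0.7", "0.8", "0.9", "KS-92", "All 313"]
AUC: Dict[str, List[float]] = {
    "ACVD": [0.731, 0.738, 0.747, 0.762, 0.752, 0.791, 0.788, 0.770, 0.784],
    "SA":   [0.617, 0.708, 0.592, 0.492, 0.633, 0.600, 0.683, 0.600, 0.883],
    "CRC":  [0.652, 0.765, 0.746, 0.762, 0.815, 0.794, 0.789, 0.827, 0.828],
    "CA":   [0.539, 0.463, 0.550, 0.529, 0.554, 0.632, 0.621, 0.523, 0.554],
    "UC":   [0.706, 0.809, 0.807, 0.797, 0.773, 0.742, 0.696, 0.668, 0.725],
    "CD":   [0.862, 0.868, 0.878, 0.934, 0.934, 0.959, 0.950, 0.953, 0.986],
    "T2D":  [0.575, 0.665, 0.623, 0.693, 0.712, 0.696, 0.694, 0.618, 0.591],
    "IGT":  [0.674, 0.487, 0.530, 0.560, 0.622, 0.562, 0.462, 0.540, 0.592],
    "RA":   [0.525, 0.579, 0.613, 0.616, 0.576, 0.629, 0.646, 0.631, 0.571],
    "OB":   [0.723, 0.706, 0.756, 0.786, 0.773, 0.777, 0.800, 0.800, 0.883],
    "OW":   [0.467, 0.511, 0.571, 0.551, 0.539, 0.621, 0.509, 0.454, 0.525],
    "UW":   [0.497, 0.683, 0.733, 0.811, 0.751, 0.805, 0.737, 0.638, 0.516],
}

# Diseases left out of the average according to note b.
EXCLUDED: Set[str] = {"SA", "CD", "IGT"}


def averaged_auc(auc: Dict[str, List[float]], excluded: Set[str]) -> List[float]:
    """Column-wise mean AUC over diseases not in `excluded`, rounded to 3 decimals."""
    kept = [row for disease, row in auc.items() if disease not in excluded]
    return [round(mean(column), 3) for column in zip(*kept)]


averages = averaged_auc(AUC, EXCLUDED)
best_column = COLUMNS[averages.index(max(averages))]
print(averages)
print(best_column)  # → 0.8
```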