Table 1 Taxonomic classification performance on genus level for benchmark datasets

From: Flexible metagenome analysis using the MGX framework

 

| Dataset | Metric | Kraken | Kaiju | Centrifuge | MetaPhlAn 2 | MGX |
|---|---|---|---|---|---|---|
| RefSeq | True positive | 12,059,412 | 9,329,288 | 12,611,380 | 414,943 | 12,566,362 |
| RefSeq | False positive | 18,748 | 185,899 | 53,092 | 7,171 | 20,698 |
| RefSeq | False negative | 1,281,840 | 3,844,813 | 695,528 | 12,937,886 | 772,940 |
| RefSeq | Sensitivity | 0.9039 | 0.7082 | 0.9477 | 0.0311 | 0.9421 |
| RefSeq | Precision | 0.9984 | 0.9805 | 0.9958 | 0.9830 | 0.9984 |
| RefSeq | Accuracy | 0.9027 | 0.6983 | 0.9440 | 0.0311 | 0.9406 |
| RefSeq | F1 score | 0.9488 | 0.8224 | 0.9712 | 0.0602 | 0.9694 |
| GenBank | True positive | 1,851,436 | 2,592,655 | 2,175,122 | 92,383 | 3,976,270 |
| GenBank | False positive | 398,899 | 1,230,445 | 864,989 | 10,378 | 734,389 |
| GenBank | False negative | 9,629,665 | 8,056,900 | 8,839,889 | 11,777,239 | 7,169,341 |
| GenBank | Sensitivity | 0.1613 | 0.2435 | 0.1975 | 0.0078 | 0.3568 |
| GenBank | Precision | 0.8227 | 0.6782 | 0.7155 | 0.8990 | 0.8441 |
| GenBank | Accuracy | 0.1558 | 0.2182 | 0.1831 | 0.0078 | 0.3347 |
| GenBank | F1 score | 0.2697 | 0.3583 | 0.3095 | 0.0154 | 0.5015 |

All tools achieve high precision on the RefSeq-derived metagenome, as the source organisms are already included in the relevant classification databases. For the GenBank-based metagenome, which contains only species not present in the tools' databases, MetaPhlAn 2 offers the highest precision but only a very low sensitivity (0.78%). The MGX-provided default pipeline follows in precision and ranks highest in sensitivity, accuracy, and F1 score. Numbers in italics denote best results.
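
The derived rows of Table 1 can be recomputed from the raw counts. The table itself does not state the formulas, so the sketch below assumes the standard definitions of sensitivity, precision, and F1 score, and assumes accuracy is computed as TP / (TP + FP + FN), since no true-negative counts are reported. Under these assumptions the results agree with the tabulated values for the MGX column on the GenBank dataset. The class name `Counts` is illustrative only and is not part of MGX.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Counts:
    """Raw classification counts for one tool on one dataset."""
    tp: int  # true positives
    fp: int  # false positives
    fn: int  # false negatives

    @property
    def sensitivity(self) -> float:
        # Fraction of classifiable reads that were assigned correctly.
        return self.tp / (self.tp + self.fn)

    @property
    def precision(self) -> float:
        # Fraction of assignments that were correct.
        return self.tp / (self.tp + self.fp)

    @property
    def accuracy(self) -> float:
        # Assumption: no true negatives are reported, so accuracy is
        # taken over TP + FP + FN only.
        return self.tp / (self.tp + self.fp + self.fn)

    @property
    def f1(self) -> float:
        # Harmonic mean of precision and sensitivity.
        return 2 * self.tp / (2 * self.tp + self.fp + self.fn)


# Counts copied from the MGX column, GenBank dataset, of Table 1.
mgx_genbank = Counts(tp=3_976_270, fp=734_389, fn=7_169_341)

print(f"sensitivity {mgx_genbank.sensitivity:.4f}")  # 0.3568
print(f"precision   {mgx_genbank.precision:.4f}")    # 0.8441
print(f"accuracy    {mgx_genbank.accuracy:.4f}")     # 0.3347
print(f"F1 score    {mgx_genbank.f1:.4f}")           # 0.5015
```

The same counts also give a quick consistency check: for each tool, TP + FP + FN sums to the same total within a dataset (11,880,000 for the GenBank columns shown here).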