
Table 2 Optimized methods configurations for standard operating conditions

From: Optimizing taxonomic classification of marker-gene amplicon sequences with QIIME 2’s q2-feature-classifier plugin

    

| Target | Condition | Method | Parameters | Mock F | Mock P | Mock R | Cross-validated F | Cross-validated P | Cross-validated R | Novel taxa F | Novel taxa P | Novel taxa R | Threshold |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 16S rRNA gene | Balanced | NB-bespoke | [6,6]:0.9 | 0.705 | 0.98 | 0.582 | 0.827 | 0.931 | 0.744 | 0.165 | 0.243 | 0.125 | F = (0.49, 0.8, 0.1) |
| | | | [6,6]:0.92 | 0.705 | 0.98 | 0.581 | 0.825 | 0.936 | 0.737 | 0.165 | 0.251 | 0.123 | F = (0.7, 0.8, 0.15) |
| | | | [6,6]:0.94 | 0.703 | 0.98 | 0.579 | 0.822 | 0.942 | 0.729 | 0.162 | 0.259 | 0.118 | |
| | | | [7,7]:0.92 | 0.712 | 0.978 | 0.592 | 0.831 | 0.931 | 0.751 | 0.151 | 0.221 | 0.115 | |
| | | | [7,7]:0.94 | 0.708 | 0.978 | 0.586 | 0.829 | 0.936 | 0.743 | 0.157 | 0.239 | 0.117 | |
| | | Naive-Bayes | [7,7]:0.7 | 0.495 | 0.797 | 0.38 | 0.819 | 0.886 | 0.761 | 0.115 | 0.138 | 0.099 | |
| | | rdp | 0.6 | 0.564 | 0.798 | 0.457 | 0.815 | 0.868 | 0.768 | 0.102 | 0.128 | 0.084 | |
| | | | 0.7 | 0.55 | 0.799 | 0.438 | 0.812 | 0.892 | 0.746 | 0.124 | 0.173 | 0.096 | |
| | | Uclust | 0.51:0.9:3 | 0.498 | 0.746 | 0.392 | 0.846 | 0.876 | 0.817 | 0.154 | 0.201 | 0.126 | |
| | Precision | NB-bespoke | [6,6]:0.98 | 0.676 | 0.987 | 0.537 | 0.803 | 0.956 | 0.692 | 0.163 | 0.303 | 0.111 | P = (0.94, 0.95, 0.25) |
| | | | [7,7]:0.98 | 0.687 | 0.98 | 0.551 | 0.815 | 0.951 | 0.713 | 0.164 | 0.283 | 0.115 | |
| | | rdp | 1 | 0.239 | 0.941 | 0.16 | 0.632 | 0.968 | 0.469 | 0.12 | 0.457 | 0.069 | |
| | Recall | NB-bespoke | [12,12]:0.5 | 0.754 | 0.8 | 0.721 | 0.815 | 0.83 | 0.801 | 0.053 | 0.058 | 0.049 | R = (0.47, 0.75, 0.04) |
| | | | [14,14]:0.5 | 0.758 | 0.802 | 0.726 | 0.811 | 0.826 | 0.797 | 0.052 | 0.057 | 0.048 | R = (0.7, 0.75, 0.04) |
| | | | [16,16]:0.5 | 0.755 | 0.785 | 0.732 | 0.808 | 0.825 | 0.792 | 0.052 | 0.058 | 0.047 | |
| | | | [18,18]:0.5 | 0.772 | 0.803 | 0.748 | 0.805 | 0.823 | 0.789 | 0.055 | 0.061 | 0.05 | |
| | | | [32,32]:0.5 | 0.937 | 0.966 | 0.913 | 0.788 | 0.818 | 0.76 | 0.054 | 0.067 | 0.045 | |
| | | Naive-Bayes | [11,11]:0.5 | 0.567 | 0.77 | 0.479 | 0.793 | 0.82 | 0.768 | 0.059 | 0.065 | 0.055 | |
| | | | [12,12]:0.5 | 0.567 | 0.769 | 0.479 | 0.79 | 0.816 | 0.765 | 0.059 | 0.064 | 0.055 | |
| | | | [18,18]:0.5 | 0.564 | 0.764 | 0.477 | 0.779 | 0.807 | 0.753 | 0.057 | 0.063 | 0.051 | |
| | | rdp | 0.5 | 0.577 | 0.791 | 0.48 | 0.816 | 0.848 | 0.787 | 0.068 | 0.079 | 0.06 | |
| | Novel | Blast+ | 10:0.51:0.8 | 0.436 | 0.723 | 0.325 | 0.816 | 0.896 | 0.749 | 0.225 | 0.332 | 0.171 | F = (0.4, 0.8, 0.2) |
| | | Uclust | 0.76:0.9:5 | 0.467 | 0.775 | 0.348 | 0.84 | 0.938 | 0.76 | 0.219 | 0.358 | 0.158 | |
| | | VSEARCH | 10:0.51:0.8 | 0.45 | 0.74 | 0.342 | 0.814 | 0.891 | 0.75 | 0.226 | 0.333 | 0.171 | |
| | | | 10:0.51:0.9 | 0.45 | 0.74 | 0.342 | 0.82 | 0.896 | 0.755 | 0.219 | 0.338 | 0.162 | |
| Fungi | Balanced | Naive-Bayes | [6,6]:0.94 | 0.874 | 0.935 | 0.827 | 0.481 | 0.57 | 0.416 | 0.374 | 0.438 | 0.327 | F = (0.85, 0.45, 0.37) |
| | | | [6,6]:0.96 | 0.874 | 0.935 | 0.827 | 0.495 | 0.597 | 0.423 | 0.399 | 0.473 | 0.344 | |
| | | | [6,6]:0.98 | 0.874 | 0.935 | 0.827 | 0.505 | 0.629 | 0.423 | 0.426 | 0.52 | 0.361 | |
| | | | [7,7]:0.98 | 0.874 | 0.935 | 0.827 | 0.485 | 0.596 | 0.409 | 0.388 | 0.47 | 0.33 | |
| | | NB-bespoke | [6,6]:0.94 | 0.928 | 0.968 | 0.915 | 0.48 | 0.567 | 0.416 | 0.371 | 0.433 | 0.325 | |
| | | | [6,6]:0.96 | 0.928 | 0.968 | 0.915 | 0.491 | 0.59 | 0.42 | 0.393 | 0.466 | 0.34 | |
| | | | [6,6]:0.98 | 0.927 | 0.97 | 0.913 | 0.504 | 0.624 | 0.422 | 0.421 | 0.512 | 0.358 | |
| | | | [7,7]:0.98 | 0.935 | 0.97 | 0.921 | 0.487 | 0.596 | 0.412 | 0.386 | 0.466 | 0.329 | |
| | | rdp | 0.7 | 0.929 | 0.939 | 0.922 | 0.479 | 0.572 | 0.413 | 0.382 | 0.451 | 0.332 | |
| | | | 0.8 | 0.924 | 0.939 | 0.915 | 0.507 | 0.633 | 0.422 | 0.434 | 0.534 | 0.366 | |
| | | | 0.9 | 0.922 | 0.937 | 0.913 | 0.517 | 0.698 | 0.411 | 0.47 | 0.617 | 0.379 | |
| | Precision | Naive-Bayes | [6,6]:0.98 | 0.874 | 0.935 | 0.827 | 0.505 | 0.629 | 0.423 | 0.426 | 0.52 | 0.361 | P = (0.92, 0.6, 0.3) |
| | | NB-bespoke | [6,6]:0.98 | 0.927 | 0.97 | 0.913 | 0.504 | 0.624 | 0.422 | 0.421 | 0.512 | 0.358 | |
| | | rdp | 0.8 | 0.924 | 0.939 | 0.915 | 0.507 | 0.633 | 0.422 | 0.434 | 0.534 | 0.366 | |
| | | | 0.9 | 0.922 | 0.937 | 0.913 | 0.517 | 0.698 | 0.411 | 0.47 | 0.617 | 0.379 | |
| | | | 1 | 0.821 | 0.943 | 0.742 | 0.461 | 0.81 | 0.322 | 0.459 | 0.774 | 0.327 | |
| | Recall | NB-bespoke | [6,6]:0.92 | 0.938 | 0.971 | 0.924 | 0.467 | 0.544 | 0.409 | 0.353 | 0.407 | 0.312 | R = (0.9, 0.4, 0.3) |
| | | | [6,6]:0.94 | 0.928 | 0.968 | 0.915 | 0.48 | 0.567 | 0.416 | 0.371 | 0.433 | 0.325 | |
| | | | [6,6]:0.96 | 0.928 | 0.968 | 0.915 | 0.491 | 0.59 | 0.42 | 0.393 | 0.466 | 0.34 | |
| | | | [6,6]:0.98 | 0.927 | 0.97 | 0.913 | 0.504 | 0.624 | 0.422 | 0.421 | 0.512 | 0.358 | |
| | | | [7,7]:0.96 | 0.935 | 0.969 | 0.921 | 0.47 | 0.56 | 0.404 | 0.357 | 0.422 | 0.31 | |
| | | | [7,7]:0.98 | 0.935 | 0.97 | 0.921 | 0.487 | 0.596 | 0.412 | 0.386 | 0.466 | 0.329 | |
| | | rdp | 0.7 | 0.929 | 0.939 | 0.922 | 0.479 | 0.572 | 0.413 | 0.382 | 0.451 | 0.332 | |
| | | | 0.8 | 0.924 | 0.939 | 0.915 | 0.507 | 0.633 | 0.422 | 0.434 | 0.534 | 0.366 | |
| | | | 0.9 | 0.922 | 0.937 | 0.913 | 0.517 | 0.698 | 0.411 | 0.47 | 0.617 | 0.379 | |
| | Novel | Naive-Bayes | [6,6]:0.98 | 0.874 | 0.935 | 0.827 | 0.505 | 0.629 | 0.423 | 0.426 | 0.52 | 0.361 | F = (0.85, 0.45, 0.4) |
| | | NB-bespoke | [6,6]:0.98 | 0.927 | 0.97 | 0.913 | 0.504 | 0.624 | 0.422 | 0.421 | 0.512 | 0.358 | |
| | | rdp | 0.8 | 0.923 | 0.939 | 0.915 | 0.507 | 0.633 | 0.422 | 0.434 | 0.534 | 0.366 | |
| | | | 0.9 | 0.921 | 0.937 | 0.913 | 0.517 | 0.698 | 0.411 | 0.47 | 0.617 | 0.379 | |
 
  a. F, F-measure; P, precision; R, recall
  b. Naive Bayes parameters: k-mer range, confidence
  c. RDP parameters: confidence
  d. BLAST+/VSEARCH parameters: max accepts, minimum consensus, minimum percent identity
  e. UCLUST parameters: minimum consensus, similarity, max accepts
  f. Threshold describes the score cut-offs used to define optimal method ranges, in the following format: [metric = (mock score, cross-validated score, novel-taxa score)]. If two cut-offs are given, the second is a higher cut-off used to select parameters for the developmental NB-bespoke method. In that case the configurations listed are the union of the two selections: the second cut-off selects NB-bespoke configurations, and the first selects configurations for all other methods
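
The cut-off rule in the Threshold footnote can be sketched in a few lines of Python. This is only an illustration, not the authors' selection code: the `Config` type, the `passes` and `select` helpers, and the use of `>=` for the comparison are assumptions made here. The scores are the F-measure values of four rows from the 16S rRNA gene "Balanced" block of the table, and the two cut-offs are the ones listed for that block.

```python
from typing import NamedTuple


class Config(NamedTuple):
    """One table row, reduced to the metric named by the threshold (here F).

    Hypothetical container for illustration; not part of any library.
    """
    method: str
    params: str
    mock: float
    cross_validated: float
    novel_taxa: float


# Cut-offs from the 16S rRNA gene "Balanced" block:
# (mock score, cross-validated score, novel-taxa score)
BASE_CUTOFF = (0.49, 0.8, 0.1)      # first cut-off: all other methods
BESPOKE_CUTOFF = (0.7, 0.8, 0.15)   # second, higher cut-off: NB-bespoke only


def passes(cfg: Config, cutoff: tuple) -> bool:
    """True if every score meets its cut-off (assumes an inclusive >= test)."""
    scores = (cfg.mock, cfg.cross_validated, cfg.novel_taxa)
    return all(score >= limit for score, limit in zip(scores, cutoff))


def select(configs: list) -> list:
    """Union of the two selections: the higher cut-off applies to NB-bespoke,
    the base cut-off to every other method."""
    selected = []
    for cfg in configs:
        cutoff = BESPOKE_CUTOFF if cfg.method == "NB-bespoke" else BASE_CUTOFF
        if passes(cfg, cutoff):
            selected.append(cfg)
    return selected


# F-measure scores copied from the table (16S rRNA gene, Balanced).
configs = [
    Config("NB-bespoke", "[6,6]:0.9", 0.705, 0.827, 0.165),
    Config("Naive-Bayes", "[7,7]:0.7", 0.495, 0.819, 0.115),
    Config("rdp", "0.6", 0.564, 0.815, 0.102),
    Config("Uclust", "0.51:0.9:3", 0.498, 0.846, 0.154),
]

for cfg in select(configs):
    print(cfg.method, cfg.params)
```

All four rows are selected under their respective cut-offs. The Naive-Bayes row would not meet the higher NB-bespoke cut-off (`passes(configs[1], BESPOKE_CUTOFF)` is false because its mock F-measure of 0.495 is below 0.7), which is consistent with the footnote's statement that the two cut-offs are applied to different methods.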