Strategy | Pros | Cons | Data combination bias |
---|---|---|---|
Closed-reference | • Is extremely parallelizable<br>• Computes reference assignments only once<br>• Is highly unlikely to retain non-16S sequences<br>• Supports read fragments from multiple loci<br>• Gets the phylogeny and taxonomy for free | • Is limited to finding diversity present in the OTU reference | • May show large bias if combining studies with differential representation in the reference |
De novo | • Utilizes all of the sequences<br>• Requires no OTU database<br>• Can group organisms distinct from anything seen before | • Must hold all sequence data in memory<br>• Is very complex to parallelize<br>• Produces spurious OTUs without pre-filtering<br>• May produce phylogenies sensitive to subtle differences in OTUs<br>• Is infeasible if data are from multiple loci<br>• Must redo OTU picking with all data being combined | • May generate spurious OTUs if combining studies with differential error profiles |
Open-reference | • Leverages an OTU database but also utilizes sequences that do not match that database<br>• Is modestly parallelizable | • Produces spurious OTUs without pre-filtering<br>• Is infeasible if data are from multiple loci<br>• Must redo OTU picking with all data being combined | • Shows less bias due to differential diversity representation than closed-reference<br>• Shows less bias due to differential error profiles than de novo |
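
The contrast between the three strategies can be illustrated with a toy sketch. This is not the implementation of any particular OTU picking tool: the `identity` function is a hypothetical position-by-position comparison standing in for a real alignment-based search, the greedy centroid clustering is one simple way to do de novo picking, and the sequences, OTU identifiers, and 0.9 threshold are invented for illustration.

```python
def identity(a, b):
    """Toy similarity: fraction of matching positions (assumption: no alignment)."""
    matches = sum(x == y for x, y in zip(a, b))
    return matches / max(len(a), len(b))


def closed_reference(reads, reference, threshold):
    """Assign each read to its best reference OTU; reads without a hit are set aside."""
    assigned, unassigned = {}, []
    for read in reads:
        best_id, best_score = None, 0.0
        for otu_id, ref_seq in reference.items():
            score = identity(read, ref_seq)
            if score > best_score:
                best_id, best_score = otu_id, score
        if best_id is not None and best_score >= threshold:
            assigned.setdefault(best_id, []).append(read)
        else:
            unassigned.append(read)
    return assigned, unassigned


def de_novo(reads, threshold, prefix="denovo"):
    """Greedy clustering: a read joins the first centroid it matches, else founds a new OTU."""
    centroids, clusters = {}, {}
    for read in reads:
        for otu_id, centroid in centroids.items():
            if identity(read, centroid) >= threshold:
                clusters[otu_id].append(read)
                break
        else:
            otu_id = f"{prefix}{len(centroids)}"
            centroids[otu_id] = read
            clusters[otu_id] = [read]
    return clusters


def open_reference(reads, reference, threshold):
    """Closed-reference first, then de novo clustering of the reads that failed to hit."""
    assigned, unassigned = closed_reference(reads, reference, threshold)
    novel = de_novo(unassigned, threshold)
    return {**assigned, **novel}


# Hypothetical toy data: two reference OTUs and six reads, three of them novel.
reference = {"ref0": "AAAAAAAAAA", "ref1": "CCCCCCCCCC"}
reads = ["AAAAAAAAAA", "AAAAAAAAAT", "CCCCCCCCCC",
         "GGGGGGGGGG", "GGGGGGGGGT", "TTTTTTTTTT"]

closed_otus, discarded = closed_reference(reads, reference, 0.9)
denovo_otus = de_novo(reads, 0.9)
open_otus = open_reference(reads, reference, 0.9)
```

In this toy run, closed-reference keeps only the reads that match `ref0` or `ref1` and sets the three novel reads aside, de novo clusters every read but its OTU identifiers depend on the input data, and open-reference retains the reference identifiers while adding new clusters for the unmatched reads. Because the de novo step depends on which reads are present, it has to be rerun when datasets are combined, as the table notes.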