Parameter | Value | Comment |
---|---|---|
task | megablast | Megablast is optimized for rapid homology searches of nucleotide sequences that are expected to share high sequence identity. |
taxidlist | All bacterial NCBI taxonomy IDs | Restricts the search of the nucleotide database to bacterial sequences only. The full ID list was obtained from the NCBI Taxonomy database with the Entrez query “txid2[Organism:exp]” (n = 515,103 at the time of writing). |
word_size | 64 | BLASTn recovered bacterial hosts equally well with word sizes of 32, 48, and 64. Because a longer word size gives faster searches, 64 was selected. |
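max_hsps | 3 | Maximum number of HSPs kept per subject sequence for each query. |
qcov_hsp_perc | 50 | Minimum query coverage per HSP, in percent. |
perc_identity | 75 | Minimum percent identity. |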
max_target_seqs | 500 | The initial BLAST+ results are limited to the top 500 matches. When all 500 matches for a query segment have identical bit scores across bacteria, the search is repeated with the identity and coverage thresholds set to the maximum values from the first run and the number of hits to return raised to 5,000. This ensures that all top-scoring bacteria are represented in the results. |
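The parameters in the table correspond to `blastn` command-line options in BLAST+. The sketch below assembles such a command as a Python argument list; it is an illustration under stated assumptions, not the authors' pipeline. The query file, the database name `nt`, the taxid-list file name and the `-outfmt` column layout are hypothetical placeholders, and the helpers `write_taxidlist` and `build_blastn_cmd` are invented for this example. `-taxidlist` reads a plain-text file with one taxonomy ID per line.

```python
"""Sketch: assemble a blastn command line from the parameter table.

Hypothetical (not from the source): query path, database name,
taxid-list file name, output columns, and the helper function names.
"""
import tempfile
from pathlib import Path
from typing import Iterable, List

# Values taken from the parameter table.
PARAMS = {
    "task": "megablast",
    "word_size": 64,
    "max_hsps": 3,
    "qcov_hsp_perc": 50,
    "perc_identity": 75,
    "max_target_seqs": 500,
}


def write_taxidlist(path: Path, taxids: Iterable[int]) -> Path:
    """Write one NCBI taxonomy ID per line, the format -taxidlist reads."""
    path.write_text("".join(f"{t}\n" for t in taxids))
    return path


def build_blastn_cmd(query: str, db: str, taxidlist: Path,
                     out: str, params: dict = PARAMS) -> List[str]:
    """Return an argument list suitable for subprocess.run()."""
    cmd = ["blastn", "-query", query, "-db", db,
           "-taxidlist", str(taxidlist), "-out", out,
           # Hypothetical tabular layout; adjust the columns as needed.
           "-outfmt", "6 qseqid sseqid staxids pident qcovhsp bitscore"]
    # Append each table parameter as "-name value".
    for name, value in params.items():
        cmd += [f"-{name}", str(value)]
    return cmd


if __name__ == "__main__":
    tmp = Path(tempfile.mkdtemp()) / "bacteria.txids"
    write_taxidlist(tmp, [2, 1224, 1236])  # illustrative IDs only
    print(" ".join(build_blastn_cmd("query.fa", "nt", tmp, "hits.tsv")))
```

The returned list could then be passed to `subprocess.run(cmd, check=True)` on a machine with BLAST+ and a database that carries taxonomy information.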
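The rerun rule described for `max_target_seqs` can be sketched as a check on the first-run hits for one query segment. This is one reading of the rule, not the authors' code. It assumes one row per subject sequence (for example, the best HSP per subject), interprets "the maximum value of the first run" as the highest percent identity and query coverage observed among those hits, and uses the hypothetical names `Hit` and `rerun_params`.

```python
"""Sketch of the saturation check and rerun settings for max_target_seqs.

Assumption: hits holds one row per subject for a single query segment,
e.g. parsed from blastn -outfmt 6 output (hypothetical parsing step).
"""
from typing import List, NamedTuple, Optional


class Hit(NamedTuple):
    subject: str
    pident: float    # percent identity
    qcovhsp: float   # percent query coverage of the HSP
    bitscore: float


FIRST_RUN_LIMIT = 500
RERUN_LIMIT = 5000


def rerun_params(hits: List[Hit], limit: int = FIRST_RUN_LIMIT) -> Optional[dict]:
    """Return rerun settings if the first run looks saturated, else None.

    Saturated means the hit limit was reached and every hit has the same
    bit score, so further equally good hits may have been cut off.
    """
    if len(hits) < limit:
        return None
    if len({h.bitscore for h in hits}) != 1:
        return None
    # Tighten thresholds to the best values seen and raise the hit cap.
    return {
        "perc_identity": max(h.pident for h in hits),
        "qcov_hsp_perc": max(h.qcovhsp for h in hits),
        "max_target_seqs": RERUN_LIMIT,
    }


if __name__ == "__main__":
    saturated = [Hit(f"s{i}", 100.0, 100.0, 2000.0) for i in range(500)]
    print(rerun_params(saturated))
    # {'perc_identity': 100.0, 'qcov_hsp_perc': 100.0, 'max_target_seqs': 5000}
```

Returning `None` for unsaturated runs keeps the common case cheap: only segments whose top 500 hits are indistinguishable by bit score trigger the larger second search.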