
Table 1 BLAST+ settings used in MGS2AMR

From: MGS2AMR: a gene-centric mining of metagenomic sequencing data for pathogens and their antimicrobial resistance profile

| Parameter | Value | Comment |
| --- | --- | --- |
| megablast | | Megablast is optimized for rapid homology searches of nucleotide sequences that are expected to share high sequence identity |
| taxidlist | All bacterial NCBI taxonomy IDs | The nucleotide database is masked to include bacterial sequences only. The full list of IDs was obtained from the NCBI taxonomy database with the Entrez filter “txid2[Organism:exp]” (n = 515,103 at the time of writing) |
| word_size | 64 | The performance of BLASTn in recovering a bacterial host was consistent across word sizes 32, 48, and 64. Because a longer word size makes searches faster, 64 was selected |
| max_hsps | 3 | |
| qcov_hsp_perc | 50 | |
| perc_identity | 75 | |
| max_target_seqs | 500 | The initial BLAST+ results are limited to the top 500 matches. However, when a particular segment matches 500 times with identical bit scores across bacteria, the sequence homology search is repeated with the identity and coverage thresholds set to the maximum values observed in the first run and the number of hits to return raised to 5000. This ensures that all top-scoring bacteria are represented in the results |
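
The settings in Table 1 and the rerun rule described for max_target_seqs can be sketched in Python as follows. This is a minimal illustration, not the MGS2AMR implementation. The option names are standard `blastn` flags, but the helper names, the output columns, and the file and database names in the usage note are hypothetical. The saturation check is one possible reading of "matches 500 times with identical bit scores".

```python
from typing import Dict, List, Optional

# Values transcribed from Table 1.
TABLE1_SETTINGS = {
    "word_size": 64,
    "max_hsps": 3,
    "qcov_hsp_perc": 50,
    "perc_identity": 75,
    "max_target_seqs": 500,
}


def build_blastn_command(query: str, db: str, taxidlist: str,
                         settings: Optional[Dict[str, float]] = None) -> List[str]:
    """Assemble a blastn (megablast) argument list for subprocess.run."""
    settings = TABLE1_SETTINGS if settings is None else settings
    cmd = [
        "blastn", "-task", "megablast",
        "-query", query,
        "-db", db,
        "-taxidlist", taxidlist,  # file of bacterial taxonomy IDs, one per line
        # Assumed tabular output; the source does not state the format used.
        "-outfmt", "6 qseqid sacc staxid pident qcovhsp bitscore",
    ]
    for name, value in settings.items():
        cmd += [f"-{name}", str(value)]
    return cmd


def rerun_settings(hits: List[Dict[str, float]],
                   limit: int = 500) -> Optional[Dict[str, float]]:
    """Return settings for a second search if the first one was saturated.

    `hits` holds the first-run hits for ONE query segment, each with the
    keys 'pident', 'qcovhsp' and 'bitscore'. Assumption: "saturated" means
    the hit limit was reached and every hit has the same bit score.
    """
    if len(hits) < limit:
        return None
    if len({h["bitscore"] for h in hits}) != 1:
        return None
    return {
        **TABLE1_SETTINGS,
        # Tighten both filters to the best values seen in the first run.
        "perc_identity": max(h["pident"] for h in hits),
        "qcov_hsp_perc": max(h["qcovhsp"] for h in hits),
        "max_target_seqs": 5000,
    }
```

For example, `build_blastn_command("segments.fasta", "nt", "bacteria.txids")` returns a list that can be passed to `subprocess.run`. If `rerun_settings` returns a dictionary for a segment, the same function can be called again with that dictionary as `settings`.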