Table 1 Overview of tools to identify and predict phage sequences in microbial ecosystems

From: Gauge your phage: benchmarking of bacteriophage identification tools in metagenomic sequencing data

Software

Description

Reference

DeepVirFinder

Predicts viral sequences via a k-mer-based deep learning method using convolutional neural networks (CNN). Based on VirFinder

[37]

MARVEL

Machine learning tool for predicting phage sequences in metagenomic bins

[38]

MetaPhinder

Integrates BLAST hits to multiple phage genomes in a database to identify phage sequences in assembled contigs

[39]

viralVerify (metaviralSPAdes)

viralVerify is a module of metaviralSPAdes that classifies contigs with a naïve Bayes classifier based on hits to protein hidden Markov models (HMMs)

[40]

PhaMers

Identifies phage sequences by a machine learning model based on k-mer frequencies

[41]

PPR-Meta

Deep learning CNN approach to identify both phages and plasmids

[42]

Seeker

Deep learning framework that uses a long short-term memory (LSTM) model, which does not depend on sequence motifs

[43]

VIBRANT

Deep learning neural network based on protein signatures which also highlights auxiliary metabolic genes and pathways

[35]

ViraMiner

Extension of DeepVirFinder that is trained to identify any virus that may colonise human samples

[44]

VirFinder

K-mer-based machine learning method for identification of viral contigs

[45]

virMine

Iterative pipeline that relies on the abundance of nonviral sequences in databases to strictly filter out unwanted contigs. Pipeline accepts either reads or assembled contigs

[46]

VirMiner

Web-based pipeline that handles genome assembly, functional annotation using a variety of databases and identification of phage contigs via a random forest algorithm

[47]

VirNet

Deep learning neural network using an attentional neural model trained on nucleotide viral fragments

[48]

VIROME

Web-based pipeline that classifies viral sequences based on homology to databases and functionally annotates them. No local version

[34]

VirSorter

Uses reference-based and reference-free approaches in unison, relying on probabilistic similarity models and reference-based protein homology searches to increase novel virus detection

[28]

VirSorter2

Builds on VirSorter by applying machine learning to evaluate “viralness” using genomic features. Works with a wider variety of viral groups than its predecessor

[36]

VirusSeeker

Made up of two BLAST-based pipelines: virome and discovery. Virome aligns reads to a curated database to identify viral sequences and compute their abundance in the sample. Discovery focuses on contig-based analysis to aid novel virus discovery

[49]

  1. Tools in italics were not included in this study, either because they were not relevant to it or because technical difficulties were encountered during their use. MARVEL was excluded as it is currently limited to detecting phages of the order Caudovirales
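
Several tools in the table (DeepVirFinder, PhaMers, VirFinder) are described as k-mer-based. As a rough illustration of what a k-mer frequency feature is, the sketch below computes the relative frequency of every DNA k-mer in a contig. It is not the implementation of any listed tool: the function name `kmer_frequencies`, the default `k = 4`, and the choice to skip windows containing ambiguous bases are all assumptions made for this example.

```python
from collections import Counter
from itertools import product


def kmer_frequencies(sequence: str, k: int = 4) -> dict[str, float]:
    """Return the relative frequency of every DNA k-mer in `sequence`.

    Hypothetical helper for illustration only; real tools differ in
    normalisation, strand handling and choice of k.
    """
    seq = sequence.upper()
    # Count only windows made purely of A, C, G, T (skips N and other ambiguity codes).
    counts = Counter(
        seq[i:i + k]
        for i in range(len(seq) - k + 1)
        if set(seq[i:i + k]) <= set("ACGT")
    )
    total = sum(counts.values())
    # Emit all 4**k k-mers in a fixed order so vectors from different contigs align.
    all_kmers = ("".join(p) for p in product("ACGT", repeat=k))
    return {kmer: (counts[kmer] / total if total else 0.0) for kmer in all_kmers}
```

A classifier would typically take the values of this dictionary, in their fixed order, as one feature vector per contig. How those vectors are then modelled varies by tool and is not shown here.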