Table 1 Outline of current research gaps and future goals discussed at the January 2018 M3 Meeting

From: Current progress and future opportunities in applications of bioinformatics for biodefense and pathogen detection: report from the Winter Mid-Atlantic Microbiome Meet-up, College Park, MD, January 10, 2018

Research gaps | Current limitations | Community goals
Tracking microbial communities across time and topography (Key Conclusions 1 and 3)
 Importance: studies incorporating temporal and/or spatial sampling allow us to detect important shifts in community dynamics
 Application example: detecting the spread of an infection within a hospital, or of a pathogen that contaminates crops and causes food-borne illness
Current limitations:
• Sequencing strategies cannot quantify viable organisms, a capability that is essential for biodefense applications
• Lack of well-established statistical approaches for exploring longitudinal microbiome data
• Larger sample sizes make these studies more expensive and make it harder to achieve sufficient statistical power across all subjects, time points, and regions
Community goals:
• Collection, sequencing, and sharing of more time series datasets
• Development of statistical methods and tools to help analyze longitudinal and/or geospatial microbiome datasets
Looking beyond bacterial pathogens (Key Conclusion 2)
 Importance: viral and fungal components of the microbiome are often under-explored, despite their potential implications for biodefense
 Application example: better understanding the transmission of infectious viruses, like influenza
Current limitations:
• Lack of a universal marker gene for viruses
• Difficulty obtaining sufficient material from low-biomass environments
• High levels of host contamination
• Incomplete reference databases
Community goals:
• More consistent database curation and maintenance (potentially incentivized financially or with publications)
• Improved identification of gene function
Development and application of metagenomic analysis tools (Key Conclusion 4)
 Importance: computational tools need to be developed to help improve the utility of high-throughput sequencing strategies for biodefense problems
 Application example: improved metagenome assembly methods could better delineate between different strains of a pathogen in samples
Current limitations:
• Tools for metagenome pre-processing, assembly, and binning are not always sensitive or fast enough for detection of pathogens in a sample
• As sequencing technologies advance, we need new tools to handle output from long- and short-read technologies, as well as single-cell metagenomics approaches
Community goals:
• Easy-to-install, open-access software with comprehensive documentation detailing best and worst use cases
• Defined metrics for critical assessment and validation of existing tools
• Software and database versions should be more consistently reported in the literature and preserved for future replication of analyses
Navigating the trade-off between speed and accuracy (Key Conclusion 4)
 Importance: metagenomic analyses used for pathogen detection and identification are time-sensitive
 Application example: deciding if a food product should be recalled due to contamination
Current limitations:
• Current algorithms vary in speed and accuracy (often sacrificing one for the other)
• Large datasets, error-prone heuristics, and the coarse resolution of k-mer-based methods present challenges
Community goals:
• Better documentation of available tools to help users choose software suited to their available resources
• Advances in sequencing technologies and algorithms that improve both speed and accuracy
Storing and sharing data (Key Conclusion 5)
 Importance: access to publicly available datasets helps others verify results and advances scientific knowledge. Scientists need to be encouraged to move their data out of private silos and into shared databases
Current limitations:
• Not all data can be shared, because personally identifiable information and intellectual property rights must be protected
• Lack of sufficient infrastructure or personnel to upload or store datasets at scale
Community goals:
• Defined quality standards to maintain usable, open repositories
• Improved ways to securely interrogate genomic datasets that cannot be openly shared due to privacy regulations
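The coarse resolution of k-mer-based methods listed among the current limitations can be illustrated with a toy exact-match classifier. This is a minimal sketch under simplifying assumptions, not the algorithm of any particular tool: the function names (`kmers`, `build_index`, `classify`), the value k = 5, and the two short "strain" sequences are all hypothetical. Real classifiers use much longer k-mers, large reference databases, and strategies such as lowest-common-ancestor assignment rather than the strict set intersection used here.

```python
from collections import defaultdict


def kmers(seq: str, k: int) -> set[str]:
    """Return the set of all length-k substrings of seq."""
    return {seq[i:i + k] for i in range(len(seq) - k + 1)}


def build_index(references: dict[str, str], k: int) -> dict[str, set[str]]:
    """Map every k-mer to the set of reference sequences that contain it."""
    index: dict[str, set[str]] = defaultdict(set)
    for name, seq in references.items():
        for km in kmers(seq, k):
            index[km].add(name)
    return dict(index)


def classify(read: str, index: dict[str, set[str]], k: int) -> set[str]:
    """Return the references consistent with every indexed k-mer in the read.

    Simplified rule (an assumption for illustration): k-mers absent from the
    index, e.g. from sequencing errors, are skipped, and the candidate set is
    the intersection of the hit sets of all remaining k-mers. If every matched
    k-mer is shared by several references, all of them are returned, meaning
    the read cannot be resolved to a single strain.
    """
    candidates = None  # stays None until the first indexed k-mer is seen
    for km in kmers(read, k):
        hits = index.get(km)
        if hits is None:
            continue
        candidates = set(hits) if candidates is None else candidates & hits
    return candidates or set()


# Hypothetical toy strains that differ by a single substitution near the end.
REFERENCES = {
    "strain_A": "ATGCGTACGTTAGCATCGGA",
    "strain_B": "ATGCGTACGTTAGCATCGCA",
}
K = 5
index = build_index(REFERENCES, K)

print(sorted(classify("ATGCGTACGT", index, K)))  # ['strain_A', 'strain_B']
print(sorted(classify("CATCGGA", index, K)))     # ['strain_A']
```

Because the two toy strains share every k-mer that lies outside the substitution, a read from the shared region resolves only to the pair; only reads spanning the differing position can be assigned to one strain. Increasing k makes each match more specific, but a single sequencing error then disrupts more k-mers, so k-mer-based tools must balance resolution against sensitivity.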