
Table 1 Summary of technical terms

From: Streaming histogram sketching for rapid microbiome analytics

| Term | Definition |
|---|---|
| Consistent weighted sampling | An efficient method of sub-sampling histogram data that takes the frequency of each bin into account |
| De novo | Analysis based solely on the collected sequence data, without the use of reference genomes |
| Dimensionality reduction | Representing the sequence data in a metagenome by a relatively small number of collective quantities |
| Dissimilarity measure | A measure of how dissimilar two metagenomes are, typically used to identify significant changes in microbiome composition |
| Feature vectors | A set of key quantities describing a dataset that can be used as input to a machine learning algorithm |
| Histosketch | A small, approximate representation of histogram data, such as a k-mer spectrum |
| Jaccard similarity | A measure of the similarity of two datasets based on the proportion of shared members: the size of their intersection divided by the size of their union |
| K-mer | A short sub-sequence of length k extracted from a read or genome |
| K-mer spectrum | The set of all k-mers observed in a sequence dataset, together with their abundances |
| Locality-sensitive hashing | A method of dimensionality reduction that hashes sequence data so that similar sequences are likely to receive the same hash values |
| Reference-based | Making use of existing reference genomes to align and classify new sequencing data |
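The k-mer, k-mer spectrum and Jaccard similarity entries can be illustrated with a minimal Python sketch. It assumes plain DNA strings, counts overlapping k-mers without merging reverse complements, and uses hypothetical function names (`kmer_spectrum`, `jaccard`, `weighted_jaccard`) chosen only for illustration. The weighted variant (sum of minimum counts over sum of maximum counts) is included because it is the similarity that consistent weighted sampling estimates when the data are histograms rather than sets.

```python
from collections import Counter


def kmer_spectrum(seq: str, k: int) -> Counter:
    """Count every overlapping k-mer (substring of length k) in seq.

    Assumption: k-mers and their reverse complements are counted separately.
    """
    return Counter(seq[i:i + k] for i in range(len(seq) - k + 1))


def jaccard(a, b) -> float:
    """Set Jaccard similarity |A ∩ B| / |A ∪ B| over distinct k-mers."""
    sa, sb = set(a), set(b)
    union = sa | sb
    return len(sa & sb) / len(union) if union else 1.0


def weighted_jaccard(a: Counter, b: Counter) -> float:
    """Weighted Jaccard: sum of per-k-mer minimum counts over sum of maxima."""
    keys = set(a) | set(b)
    num = sum(min(a[x], b[x]) for x in keys)  # Counter returns 0 for absent keys
    den = sum(max(a[x], b[x]) for x in keys)
    return num / den if den else 1.0
```

For `ACGTACGT` and `ACGTTCGT` with k = 3, the two spectra share 2 of 7 distinct 3-mers (set Jaccard 2/7), while the weighted Jaccard is 3/9 = 1/3 because the counts of the shared k-mers are compared bin by bin.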
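Consistent weighted sampling is a form of locality-sensitive hashing for histograms: two histograms yield the same sample with probability equal to their weighted Jaccard similarity, so a fixed-length list of samples can serve as a histosketch. The sketch below assumes Ioffe's Improved Consistent Weighted Sampling (ICWS) and is not the paper's implementation: it is non-streaming, applies no decay to older data, and the names `histosketch`, `_icws_sample` and `sketch_similarity` are hypothetical. The per-bin random values are seeded from the (slot, element) pair so that every histogram draws the same values for the same bin, which is what makes the sampling consistent.

```python
import math
import random


def _icws_sample(histogram, slot):
    """Draw one ICWS sample (element, t) from a histogram {element: weight}.

    Returns None if the histogram has no positive weights.
    """
    best, best_a = None, math.inf
    for element, weight in histogram.items():
        if weight <= 0:
            continue
        # Seed from (slot, element) so every histogram draws the same
        # r, c and beta for this bin: this makes the sampling consistent.
        rng = random.Random(f"{slot}:{element}")
        r = rng.gammavariate(2.0, 1.0)
        c = rng.gammavariate(2.0, 1.0)
        beta = rng.random()
        # Quantise the log-weight on a random grid defined by r and beta.
        t = math.floor(math.log(weight) / r + beta)
        y = math.exp(r * (t - beta))
        a = c / (y * math.exp(r))
        # Keep the bin with the smallest hash value a.
        if a < best_a:
            best, best_a = (element, t), a
    return best


def histosketch(histogram, sketch_size):
    """Hypothetical helper: one ICWS sample per slot gives a fixed-size sketch."""
    return [_icws_sample(histogram, slot) for slot in range(sketch_size)]


def sketch_similarity(sketch1, sketch2):
    """Fraction of matching slots; estimates the weighted Jaccard similarity."""
    if len(sketch1) != len(sketch2):
        raise ValueError("sketches must have the same size")
    matches = sum(1 for x, y in zip(sketch1, sketch2) if x == y)
    return matches / len(sketch1)
```

Comparing sketches slot by slot gives an estimate whose accuracy improves with `sketch_size`, while the sketch itself stays the same size however many bins the histogram has. Histograms with no bins in common can never produce matching samples, so their estimated similarity is exactly 0.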