
Optimizing methods and dodging pitfalls in microbiome research


Research on the human microbiome has yielded numerous insights into health and disease, but also has resulted in a wealth of experimental artifacts. Here, we present suggestions for optimizing experimental design and avoiding known pitfalls, organized in the typical order in which studies are carried out. We first review best practices in experimental design and introduce common confounders such as age, diet, antibiotic use, pet ownership, longitudinal instability, and microbial sharing during cohousing in animal studies. Typically, samples will need to be stored, so we provide data on best practices for several sample types. We then discuss design and analysis of positive and negative controls, which should always be run with experimental samples. We introduce a convenient set of non-biological DNA sequences that can be useful as positive controls for high-volume analysis. Careful analysis of negative and positive controls is particularly important in studies of samples with low microbial biomass, where contamination can comprise most or all of a sample. Lastly, we summarize approaches to enhancing experimental robustness by careful control of multiple comparisons and to comparing discovery and validation cohorts. We hope the experimental tactics summarized here will help researchers in this exciting field advance their studies efficiently while avoiding errors.


Studies of microbial communities (the microbiome) have become quite popular in recent years. These studies are powered by new DNA sequencing technologies, which allow acquisition of more than one trillion bases of sequence information in a single instrument run. Using these methods, sequence profiles of microbial communities from different sources can be obtained and compared to elucidate associated patterns in the microbiota. For example, human samples from a disease state can be compared to samples from healthy controls, allowing differences to be quantified [1,2,3,4,5,6,7,8]. In these studies, DNA is first purified from the samples. DNA sequencing is then used to characterize the associated taxa, querying either a marker gene (16S for bacteria, 18S for eukaryotes, and ITS for fungi) or all DNA in a mixture (shotgun metagenomic sequencing). In at least some situations, the composition of these microbial communities matters a great deal: fecal microbiota transplantation radically resets gut community structure and cures relapsing Clostridium difficile infection in up to 90% of cases [9, 10].

Carrying out definitive experiments on the microbiota requires great care, as in any field of research. All analytical methods have biases that must be taken into account in experimental execution and interpretation. For example, for analysis of 16S rRNA gene segments, the choice of gene region studied influences the types of bacteria queried [11,12,13,14,15,16]. Another example, emphasized here, involves low microbial biomass samples. If there is very little microbial DNA in a specimen, the library preparation and sequencing methods will often return sequences that are derived primarily from contamination [17,18,19,20,21,22,23,24]. Contaminating sequences can originate in reagents, dust, crossover between samples, or other sources. Without appropriate precautions and controls, these false calls can be difficult to distinguish from authentic microbiota. Other challenges mentioned below include changes associated with sample storage, microbial sharing among animals during cohousing, and authentic longitudinal microbial instability in the body site of a host animal.

The goal of this article is to catalog major challenges in microbiome research and to outline approaches to address them. Many of these points have come up in the projects of the PennCHOP Microbiome Program, with which the authors of this article are associated. This review is intended to help our collaborators and other microbiome researchers wrestling with these issues. We will focus primarily on laboratory work important for microbiome analysis and touch on computational and statistical methods only briefly. Most examples will be from 16S rRNA marker gene sequencing, but examples from ITS marker gene sequencing for fungi and shotgun metagenomics are also discussed. Several good articles have also addressed these issues and are recommended as additional reading [25,26,27,28,29]. Reviews of methods for bioinformatics analysis of microbiome specimens include [28, 30,31,32,33]. We focus here on studies of the vertebrate microbiome and break out points that are specific to studies of humans and model organisms. We present sections in an order that matches the progression of performing an experiment—the paper begins with study design, continues with sample collection and processing, and concludes with analysis.

Planning a microbiome experiment

It is essential to plan carefully to ensure that the experiment carried out will answer the question posed. Plan the statistical analysis for your study at the start. If possible, carry out a power analysis. Several approaches tailored to microbiome research have been reported [34, 35].
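As a rough starting point, the number of samples per group needed to detect a difference in a single per-sample summary (for example, a diversity index) can be estimated with the standard normal-approximation formula for a two-sided, two-sample comparison of means, n = 2 (z(1−α/2) + z(power))² / d², where d is the standardized effect size. The sketch below assumes normally distributed outcomes and equal group sizes. It is not one of the microbiome-specific approaches cited above, which address community-level data, and the effect size has to come from pilot data or the literature.

```python
from math import ceil
from statistics import NormalDist


def samples_per_group(effect_size_d: float, alpha: float = 0.05, power: float = 0.8) -> int:
    """Approximate samples per group for a two-sided, two-sample comparison of means.

    Uses n = 2 * (z_{1-alpha/2} + z_{power})**2 / d**2, where d is the
    standardized effect size (difference in means divided by the pooled SD).
    """
    if effect_size_d <= 0:
        raise ValueError("effect size must be positive")
    z = NormalDist()  # standard normal distribution
    z_alpha = z.inv_cdf(1 - alpha / 2)
    z_power = z.inv_cdf(power)
    return ceil(2 * ((z_alpha + z_power) / effect_size_d) ** 2)


for d in (0.5, 0.8, 1.0):
    print(f"d = {d}: {samples_per_group(d)} samples per group")
```

Because the t distribution has heavier tails than the normal, exact t-test power calculations give slightly larger numbers than this approximation.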

Consider the influence of factors such as antibiotic use, age, sex, diet, geography, and pet ownership

The human microbiome is sensitive to its environment, which can confound efforts to associate any particular condition or intervention with a change in microbiota composition. Drug use, diet, age, geography, pet ownership, and sex have all been reported to influence microbiome function and composition [36,37,38,39]. In 2008, Relman and colleagues documented effects of antibiotic treatment on the gut microbiome, and many subsequent studies have also reported effects [5, 40,41,42]. It has further been suggested that other prescription drugs can affect microbiome analyses [43, 44]. For example, Imhann et al. have suggested that reducing stomach acidity with proton pump inhibitors allows microbes from the upper gastrointestinal tract to move down into the gut more readily [45], altering the composition of the lower gastrointestinal microbiota and increasing the risk of C. difficile infection.

Diet also influences the microbiota [5, 46,47,48,49,50,51,52,53,54,55,56]. Microbial community structure and gene expression are reported to change on short time scales in response to extreme short-term alterations in diet [57]. Long-term dietary patterns have been linked to gut microbiomes dominated by certain genera: diets high in protein and animal fat are associated with high abundance of Bacteroides, whereas diets high in carbohydrates are associated with high abundance of Prevotella [55].

The human microbiome changes from birth until death. Typically, the gut microbiota settles into a stable anaerobic pattern by around age 3 years but is considerably more variable in early life [58,59,60]. The microbiome also changes in old age, with institutionalized elderly people commonly developing high levels of Proteobacteria [61]. Thus, it is critical to use age-matched controls for microbiota comparisons.

Sex can also affect microbiome studies. The gut microbiome has been described as a virtual endocrine organ because of the metabolites and neurotransmitters it produces [62]. For example, early microbial exposure was reported to increase testosterone levels in male mice, leading to a protective effect against type 1 diabetes [63]. When the microbiota from these protected male mice was transplanted into younger female mice, the same protection against type 1 diabetes was seen [63]. A study of the effects of an antipsychotic drug on weight and gut microbiota in male and female rats reported that drug treatment induced significant weight gain in female rats only [64]. Microbial circadian rhythms in mice were reported to differ between sexes [65]. Sex differences in microbiota have also been reported in macaques [39, 66].

Remarkably, even sexual preference among men has been linked to gut microbiome differences [67], which may be a confounding factor in studies of gut microbiome and HIV infection where controls were not matched by sexual preference.

Other studies have investigated whether pets influence the human microbiome and vice versa [68]. One group showed that cohabiting adults shared more similar skin microbiota if they owned a dog [69].

How each of these factors influences any given microbiome study depends on the question asked and the strength of the differences between study groups. In general, it is important to enumerate possible confounders during experimental design, quantify each one, and then include each as a covariate (independent variable) in downstream statistical analyses.
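One simple way to account for confounders when analyzing a per-sample summary, such as a diversity index, is to fit a linear model that includes them alongside the group indicator. The sketch below uses ordinary least squares via NumPy; the variable names and the toy data (age and recent antibiotic use) are hypothetical, and community-level outcomes would instead call for distance-based or compositional methods.

```python
import numpy as np


def adjusted_group_effect(outcome, group, covariates):
    """Least-squares estimate of a group effect, adjusted for confounders.

    outcome:    (n,) per-sample summary, e.g. a diversity index
    group:      (n,) 0/1 indicator, e.g. case = 1, control = 0
    covariates: (n, k) confounders, e.g. age and recent antibiotic use (0/1)
    """
    outcome = np.asarray(outcome, dtype=float)
    n = outcome.shape[0]
    # Design matrix: intercept, group indicator, then one column per confounder
    X = np.column_stack([np.ones(n), np.asarray(group, float), np.asarray(covariates, float)])
    coef, *_ = np.linalg.lstsq(X, outcome, rcond=None)
    return coef[1]  # coefficient on the group indicator


# Hypothetical toy data in which cases are older, on average, than controls
age = np.array([20, 30, 40, 50, 60, 70, 25, 35])
abx = np.array([0, 1, 0, 1, 0, 1, 0, 1])
group = np.array([0, 0, 0, 0, 1, 1, 1, 1])
outcome = 3.0 + 0.5 * group - 0.02 * age - 0.3 * abx

naive = outcome[group == 1].mean() - outcome[group == 0].mean()
adjusted = adjusted_group_effect(outcome, group, np.column_stack([age, abx]))
print(f"naive difference: {naive:.3f}, adjusted group effect: {adjusted:.3f}")
```

In this toy example the unadjusted difference understates the simulated group effect because age, which lowers the outcome, is unevenly distributed between groups.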

Longitudinal instability

During experimental design, it is important to consider the longitudinal stability of the microbiota to be studied. The healthy adult human gut microbiota is known to be largely stable in composition over time [70,71,72], and a perturbation of this stability (dysbiosis) has been associated with diseases such as inflammatory bowel disease [1, 5, 73]. However, the microbiome of other sites, such as the human vagina, can vary on short time scales without necessarily indicating dysbiosis [74,75,76,77,78]. Even the gut microbiome has been reported to display circadian behavior on a 24-h cycle [65, 79, 80]. Thus, for studies of a new sample type, it is essential to understand longitudinal variation in order to collect samples that address the question posed.
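A pilot longitudinal series can be summarized by comparing how much samples from the same subject differ over time with how much different subjects differ from each other. The sketch below uses Bray-Curtis dissimilarity on relative abundance profiles; the choice of metric and the toy profiles are assumptions for illustration. If within-subject distances approach between-subject distances, a single time point per subject may not represent that body site well.

```python
from itertools import combinations

import numpy as np


def bray_curtis(u, v):
    """Bray-Curtis dissimilarity between two abundance vectors (0 = identical)."""
    u, v = np.asarray(u, float), np.asarray(v, float)
    return np.abs(u - v).sum() / (u + v).sum()


def within_vs_between(profiles, subjects):
    """Mean dissimilarity for same-subject pairs and for different-subject pairs."""
    within, between = [], []
    for i, j in combinations(range(len(profiles)), 2):
        d = bray_curtis(profiles[i], profiles[j])
        (within if subjects[i] == subjects[j] else between).append(d)
    return float(np.mean(within)), float(np.mean(between))


# Hypothetical relative abundances: two subjects, two time points each
profiles = [
    [0.70, 0.20, 0.10], [0.68, 0.22, 0.10],  # subject A, days 0 and 7
    [0.10, 0.30, 0.60], [0.12, 0.28, 0.60],  # subject B, days 0 and 7
]
subjects = ["A", "A", "B", "B"]
print(within_vs_between(profiles, subjects))
```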

Different batches of DNA extraction kit reagents can be a significant source of variation for longitudinal studies [23, 81]. It is wise to purchase all the extraction kits needed at the start of the study, or store samples and extract all at the same time, to minimize the effects of this variable.

Cage effects in animal experiments

Cage effects can derail microbiome studies in mice and may be important for other laboratory animals as well. Mice housed in the same cage come to share similar gut microbiota because of mixing through coprophagia [82]. For perspective, in one recent study, mouse strain accounted for 19% of the variation in gut microbiota, whereas cage effects accounted for 31% [83].

To account for cage effects, an investigator must set up multiple cages for each study group and treat the cage as a variable in the final statistical analyses. One can then determine whether microbial communities differ between groups given the measured effect of the cage variable. To keep costs down, it is fine to house two to three mice per cage [84,85,86].
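Treating the cage as the unit of analysis is one conservative way to account for cage effects. The sketch below, which is our illustration rather than a method from the cited studies, averages a per-mouse measurement within each cage and runs an exact permutation test that shuffles treatment labels among cages rather than among mice. It assumes every cage holds mice from a single study group. A consequence worth noticing: with three cages per group there are only 20 ways to split six cages, so the smallest attainable two-sided P value is 0.1, regardless of how many mice each cage holds.

```python
from itertools import combinations

import numpy as np


def cage_level_permutation_p(values, cages, treated):
    """Exact two-sided permutation P value for a treatment effect, using cages as units.

    values:  per-mouse measurements (e.g., a diversity index)
    cages:   per-mouse cage labels
    treated: per-mouse booleans; every mouse in a cage must share the same value
    """
    cage_ids = sorted(set(cages))
    means, is_treated = [], []
    for cage in cage_ids:
        idx = [i for i, c in enumerate(cages) if c == cage]
        flags = {bool(treated[i]) for i in idx}
        if len(flags) != 1:
            raise ValueError(f"cage {cage!r} mixes treatment groups")
        means.append(np.mean([values[i] for i in idx]))
        is_treated.append(flags.pop())
    means = np.array(means)
    observed_mask = np.array(is_treated)
    n_treated = int(observed_mask.sum())
    if n_treated in (0, len(cage_ids)):
        raise ValueError("both groups need at least one cage")
    observed = means[observed_mask].mean() - means[~observed_mask].mean()

    # Enumerate every way of assigning the treated label to the same number of cages
    diffs = []
    for combo in combinations(range(len(cage_ids)), n_treated):
        mask = np.zeros(len(cage_ids), dtype=bool)
        mask[list(combo)] = True
        diffs.append(means[mask].mean() - means[~mask].mean())
    return float(np.mean(np.abs(diffs) >= abs(observed) - 1e-12))


# Hypothetical data: six cages of two mice, three cages per group
cages = ["c1", "c1", "c2", "c2", "c3", "c3", "c4", "c4", "c5", "c5", "c6", "c6"]
treated = [True] * 6 + [False] * 6
values = [10, 10, 11, 11, 12, 12, 1, 1, 2, 2, 3, 3]
print(cage_level_permutation_p(values, cages, treated))
```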

As an example, consider the longitudinal study of fungal populations during an antibiotic intervention in mice in Dollive et al. [87]. In this work, antibiotic treatment was associated with increased fungal colonization in the treated groups (Fig. 1). The fungi detected were mostly consistent within each cage, but varied from cage to cage within each treatment group and also in the untreated controls. The types of fungi detected changed longitudinally, but nevertheless were consistent within cages. This highlights how potent cage effects can be, and emphasizes the importance of analyzing multiple cages per study group.

Fig. 1

Example of cage effects dominating a mouse study of fungal communities. Fungal lineages in the murine gut were inferred from ITS sequencing of fecal pellets [87]. The heat maps summarize taxonomic assignments derived from the sequence data. The color scale to the right indicates the proportion of each lineage; white indicates not detected. Caging dominated over treatment in this study. The three conditions studied were continuous exposure to antibiotics (Condition 1), short-term exposure to antibiotics (Condition 2), and no exposure to antibiotics (Condition 3). For details, see [87].

Considerations during sample collection and processing

Sample storage conditions

The most important considerations for storing microbiome samples are to minimize changes to the original microbiota between sample collection and processing, and to keep storage conditions consistent for all samples in a study. Sample storage conditions are not always consistent between labs because of differing downstream applications and resource limitations. In 2010, Wu et al. compared human fecal samples that were immediately frozen at −80 °C, stored on ice for 24 h, or stored on ice for 48 h before DNA extraction and analysis. Differences due to storage method were not significant compared with differences between individuals [88].

Because a growing number of studies collect samples from remote locations, several groups have assessed the efficacy of preservation methods that can be used when laboratory freezers are not readily available. In 2016, Song et al. tested the effects of different preservatives and temperature fluctuations on feces to mimic microbiome sampling in the field. If fecal samples cannot be frozen, store them in 95% ethanol or on FTA cards, or use the OMNIgene Gut kit [89]. These conditions are optimized for sample collection in the field; however, they may not be suitable for all studies, depending on study goals and available resources. Other groups have published similar sample storage studies [90,91,92,93,94,95,96,97].

We recently performed a study on the storage of oral swab samples and found conditions to be relatively forgiving (Fig. 2). In this study, we collected cheek swab samples from three healthy subjects and stored them under a variety of conditions (frozen at −20 °C, refrigerated at 4 °C, or kept at an ambient temperature of 20 °C for 0, 24, 48, 72, or 96 h) before freezing at −80 °C (details and additional analysis are presented in Additional file 1). Figure 2 shows a principal coordinates analysis of unweighted UniFrac distances between the samples. Subject identity (Fig. 2a) accounted for almost half the total variation in UniFrac distances (R² = 0.47, P < 0.001). Storage condition did not have a significant effect (Fig. 2b); we estimated its effect size at less than half that of inter-subject variability (R² = 0.17, P = 0.2). The UniFrac results were recapitulated in our analysis of taxon abundances, where the effect of subject far exceeded any potential storage effect. This analysis provided evidence that, over a period of 3 days, storage conditions did not substantially influence the measured oral microbiome composition of cheek swabs from these subjects. Another group recently investigated the effect of collection method, storage condition, and storage medium on taxonomic relative abundance in saliva and dental plaque, and found saliva samples stored in OMNIgene medium to be relatively consistent after a week at room temperature [98].
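R² values of this kind come from partitioning the variation in a distance matrix among groups, as in permutational multivariate analysis of variance (PERMANOVA). The sketch below shows the one-factor calculation: R² is one minus the ratio of the within-group to the total sum of squared distances, and the P value comes from permuting group labels. The text does not state which software produced the reported values, so this illustrates the statistic rather than reproducing that analysis; the toy distance matrix is hypothetical.

```python
import numpy as np


def permanova_one_factor(dist, labels, n_perm=999, seed=0):
    """One-factor PERMANOVA on a square distance matrix.

    Returns (R2, P): R2 is the fraction of the total sum of squared distances
    explained by the grouping; P is a permutation P value.
    """
    d2 = np.asarray(dist, dtype=float) ** 2
    labels = np.asarray(labels)
    n = len(labels)

    def ss(indices):
        # Sum of squared pairwise distances among the given samples, divided by their count
        sub = d2[np.ix_(indices, indices)]
        return sub[np.triu_indices(len(indices), k=1)].sum() / len(indices)

    def ss_within(lab):
        return sum(ss(np.flatnonzero(lab == g)) for g in np.unique(lab))

    ss_total = ss(np.arange(n))
    observed = ss_within(labels)
    r2 = 1.0 - observed / ss_total

    # With the total fixed, a smaller within-group sum of squares is equivalent to a
    # larger pseudo-F, so count permutations at least as extreme as the observed split.
    rng = np.random.default_rng(seed)
    hits = sum(ss_within(rng.permutation(labels)) <= observed + 1e-12 for _ in range(n_perm))
    return r2, (hits + 1) / (n_perm + 1)


# Hypothetical distances: two tight pairs of samples far from each other
dist = [
    [0.0, 0.1, 1.0, 1.0],
    [0.1, 0.0, 1.0, 1.0],
    [1.0, 1.0, 0.0, 0.1],
    [1.0, 1.0, 0.1, 0.0],
]
r2, p = permanova_one_factor(dist, ["s1", "s1", "s2", "s2"])
print(f"R2 = {r2:.3f}, P = {p:.3f}")
```

With only four samples, even a near-perfect grouping cannot reach a small P value, because only a handful of distinct label arrangements exist.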

Fig. 2

Effects of sample storage methods on community structure inferred for oral swabs. Oral swab samples were acquired from three human individuals and DNA was extracted. DNA was amplified using 16S rRNA gene primers binding to the V1-V2 region and then sequenced on the Illumina platform using our standard procedures [88]. Unweighted UniFrac [129] was used to generate distances between all pairs of samples, and results were displayed using principal coordinates analysis (PCoA). a Samples from each of the three subjects are color coded (red, blue, and green). b Nine storage conditions were compared, indicated by the different colors. The key to storage conditions is at the right.

Optimal storage conditions have also been investigated for other sample types. Lauber et al. tested the effect of both temperature and length of storage on the relative taxon abundance of bacterial communities in soil, human skin, and human fecal samples. The overall composition of bacterial communities and the relative abundance of most major bacterial taxa did not change across the storage conditions studied (P > 0.1 for all sample types) [99]. Replicate samples for both skin and feces clustered by host rather than by temperature or length of storage. However, Lauber et al. noted that one fecal sample replicate kept at room temperature was excluded from analysis because of visible fungal growth before DNA was extracted. Although convenience can be prioritized when handling samples over a short period (e.g., shipping samples on cold packs for 48 h before putting them in the freezer), we recommend freezing samples promptly after collection, or using alternative preservation methods if freezers are unavailable [89].

Low microbial biomass samples—managing environmental contamination

Handling and analyzing samples with low microbial biomass can be challenging. Reagent and laboratory contamination comprise a larger proportion of the total microbial load in these samples compared to samples with rich microbial communities (e.g., healthy human feces). The low absolute amount of starting material can be overpowered by trace amounts of DNA from reagents or laboratory instruments used for sample processing, so that some or all of the microbial reads can be derived from environmental sources. Accounting for potential contaminants is especially important when studying the microbiome of body sites with low levels of bacteria, such as the human lung and skin, or sites that may not normally harbor any microbes at all, such as various healthy tissues [17, 19, 22].
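The arithmetic behind this is simple: if a roughly constant amount of contaminant DNA, C, enters every sample during processing, the contaminant share of the reads is C / (C + S), where S is the amount of authentic template. The sketch below tabulates that share across a tenfold dilution series. The contaminant load of 1,000 copies is a hypothetical number chosen for illustration (real levels vary by kit, lot, and laboratory), and the model assumes reads are proportional to input DNA, ignoring amplification bias.

```python
def contaminant_fraction(sample_copies: float, contaminant_copies: float) -> float:
    """Expected fraction of reads from contaminants, assuming reads scale with input DNA."""
    return contaminant_copies / (contaminant_copies + sample_copies)


CONTAMINANT_COPIES = 1_000  # hypothetical, constant per-sample background

# Tenfold dilution series of authentic template, from 10^8 down to 10^2 copies
for exponent in range(8, 1, -1):
    sample = 10 ** exponent
    share = contaminant_fraction(sample, CONTAMINANT_COPIES)
    print(f"10^{exponent} template copies -> {share:.1%} contaminant reads")
```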

Problems with contamination were well recognized even before the era of deep sequencing [100,101,102]. More recently, several groups have reported the presence of bacterial DNA in DNA extraction kits (the “kitome”), as well as in other reagents used during sample processing [20, 23, 24, 103]. Salter et al. demonstrated that serial dilutions of a bacterial culture produced more contaminating 16S sequence reads and fewer “real” reads with each successive dilution, until contamination accounted for the majority of total sequences [23]. This pattern occurred at three different institutes that participated in the study, indicating a widespread issue [23]. Salter and colleagues also investigated the effect of the number of PCR cycles used for amplification. For low biomass samples, 20 cycles were too few, whereas 40 cycles recovered both contaminating and authentic low-level sequences [23]. Later, Kennedy and colleagues reported that starting template concentration was the major factor behind variability in sequencing results [104]. Even in metagenomic samples prepared without a targeted PCR amplification step, similar contamination patterns were observed for samples containing low amounts of microbial DNA [23].

The kitome varies between kits and can even vary between different lots of the same kit [20, 23]. Thus, it is best to process all samples in a project side by side using the same batches of reagents. It is crucial to record the kit used to process each sample, and which batch of each kit was used. If multiple kits or batches were used, treat kit batch as a factor in the statistical analysis.
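Recording the kit lot for each sample also makes it possible to check, before sequencing, whether lot is confounded with study group: if one group was extracted largely with a lot that other groups did not use, kit background and biology become hard to separate statistically. The sketch below is a hypothetical bookkeeping check; the record layout and lot names are invented for illustration.

```python
from collections import defaultdict


def lot_group_counts(records):
    """Count samples per (kit lot, study group) from (sample_id, lot, group) tuples."""
    counts = defaultdict(lambda: defaultdict(int))
    for _sample_id, lot, group in records:
        counts[lot][group] += 1
    return counts


def lots_missing_groups(records):
    """Return lots that were not used for every study group (potential confounding)."""
    all_groups = {group for _sample_id, _lot, group in records}
    counts = lot_group_counts(records)
    return sorted(lot for lot, by_group in counts.items() if set(by_group) != all_groups)


# Hypothetical sample sheet
records = [
    ("S01", "LOT-A", "case"), ("S02", "LOT-A", "control"),
    ("S03", "LOT-B", "case"), ("S04", "LOT-B", "case"),
]
print(lots_missing_groups(records))
```

A lot used for only one group is not necessarily fatal, but it deserves a closer look before the samples are processed.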

In our lab, we have investigated different DNA extraction methods in order to minimize the kitome. While the MO BIO PowerSoil DNA Isolation Kit (MO BIO Laboratories, Carlsbad, CA, USA) provides high yields and has been used widely in microbiome work, including the Human Microbiome Project [105], it was designed to isolate DNA from soil, stool, and environmental samples, which are high in microbial DNA. The MO BIO kit was not manufactured with the intention of minimizing background contamination; C. difficile and Streptophyta, for example, have both been identified as possible reagent contaminants in this kit [22]. For low microbial biomass samples, we instead recommend DNA isolation kits designed to minimize kit contamination (e.g., the QIAamp UCP (UltraClean production) Pathogen Mini Kit (QIAGEN)). Remember that it is important to choose one kit type for all of the samples in a microbiome study: if a project contains both low and high microbial biomass samples, commit to one kit type for all samples in order to avoid kitome variation.

On the analytical side, several methods have been developed for filtering suspected contaminating taxa. In a study of the human oral and lung microbiome, Bittinger et al. introduced a method to determine the probability that fungal taxa arose from contamination sources [18], making use of the total fungal DNA concentration, as approximated by post-PCR assays of DNA concentration using PicoGreen. The PicoGreen assay is usually included in the sequencing protocol as a standard step, so the data is available with no extra effort. Similarly, Lazarevic et al. presented a method that incorporates measurements of total DNA concentration by qPCR, a more accurate but more resource-intensive approach [106]. Jervis-Bardy and colleagues showed that contaminating taxa tend to show a strong decrease in relative abundance as total DNA concentration increases and used this as the basis of another method to remove contaminant taxa [21]. Individual contamination sources can be modeled using SourceTracker, which employs a Bayesian approach to estimate the relative fraction of sequence reads arising from each source [107].
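The observation that contaminant taxa decline in relative abundance as total DNA concentration rises can be turned into a simple screen. The sketch below is a simplified illustration, not the published procedure of Jervis-Bardy and colleagues: it computes the Pearson correlation between each taxon's relative abundance and log10 DNA concentration, and flags taxa below a hypothetical threshold of −0.7. Flagged taxa are candidates for closer inspection against negative controls, not automatic removals.

```python
import numpy as np


def flag_concentration_dependent_taxa(rel_abund, dna_conc, r_threshold=-0.7):
    """Return column indices of taxa whose relative abundance falls as DNA concentration rises.

    rel_abund:   (n_samples, n_taxa) relative abundances
    dna_conc:    (n_samples,) total DNA concentration per sample, e.g. from PicoGreen
    r_threshold: hypothetical cutoff on the Pearson correlation with log10 concentration
    """
    rel_abund = np.asarray(rel_abund, dtype=float)
    log_conc = np.log10(np.asarray(dna_conc, dtype=float))
    flagged = []
    for j in range(rel_abund.shape[1]):
        column = rel_abund[:, j]
        if np.std(column) == 0:
            continue  # correlation is undefined for a taxon with constant abundance
        r = np.corrcoef(log_conc, column)[0, 1]
        if r <= r_threshold:
            flagged.append(j)
    return flagged


# Hypothetical data: taxon 0 behaves like a contaminant, taxon 1 like authentic signal
dna_conc = [0.1, 1.0, 10.0, 100.0]
rel_abund = [
    [0.80, 0.10],
    [0.40, 0.50],
    [0.10, 0.80],
    [0.01, 0.95],
]
print(flag_concentration_dependent_taxa(rel_abund, dna_conc))
```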

Studies investigating a potential placenta microbiome provide a case study of the difficulties of working with low biomass samples (Fig. 3). Several groups have reported that there may be a unique, low-abundance microbiome in healthy human placenta [46, 108,109,110], but reporting of negative controls in these studies has been incomplete.

Fig. 3

Wrestling with kit contamination—similar bacterial composition in placental samples and negative controls. Relative abundances of bacterial lineages were inferred from 16S V1-V2 rRNA marker gene sequence information [22]. Samples studied included negative controls, fetal side (FS) placental swabs, maternal side (MS) placental swabs, saliva, and vaginal swabs. Replicates of each sample were extracted using two different kits—the kit type is indicated above each panel. Operating room (OR) air swabs are swabs that were waved in the air at the time of sample collection to be used as negative controls. Saliva samples, which are high in microbial biomass, showed similar compositions for each of the two extractions; placental samples resemble the kit-specific negative controls

However, a series of independent control studies showed no significant difference in taxonomic abundance between placenta samples and contamination controls [22]. Lauder and colleagues extracted DNA from placentas of six human subjects and processed them alongside several types of blank swabs and empty extraction wells containing reagents only. DNA was extracted from samples using two different purification kits in order to characterize the contribution of the kitome. Real-time qPCR was performed to quantify total 16S rRNA gene copies in placental samples, controls, and saliva samples from the same subjects, which were also purified using both DNA extraction kits. Placental samples showed copy numbers that were low and indistinguishable from those of negative controls regardless of the kit used, whereas oral samples showed signals several logs above background. Characterization of bacterial lineages by 16S rRNA gene sequencing showed that oral samples harbored distinct 16S profiles characteristic of the well-studied oral microbiota, and results were consistent between kits. In contrast, placental and control samples looked similar to each other, and their composition tracked with the DNA extraction kit used rather than with the sample type (Fig. 3). Several of the lineages shared by placental and control samples were known contaminants of DNA extraction kits. The inference was that the kitome provided the predominant microbial signature in placental samples [22]. It remains to be seen whether future studies can show a clear distinction between placental samples and negative controls.
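The qPCR comparison above can be expressed as the number of log10 units by which a sample's 16S copy number exceeds the contamination background. The sketch below uses the highest negative-control value as the background, a deliberately conservative choice that is our assumption; the copy numbers in the example are hypothetical and do not come from the placenta study.

```python
import math


def logs_above_background(sample_copies, control_copies):
    """log10 units by which a sample exceeds the highest negative-control copy number.

    Values near or below zero mean the sample cannot be distinguished from the
    contamination background on copy number alone.
    """
    background = max(control_copies)
    if sample_copies <= 0 or background <= 0:
        raise ValueError("copy numbers must be positive")
    return math.log10(sample_copies) - math.log10(background)


controls = [1e3, 5e2, 1e4]  # hypothetical 16S copies in blank swabs and extraction blanks
for name, copies in [("saliva", 1e7), ("tissue", 5e3)]:
    print(f"{name}: {logs_above_background(copies, controls):+.1f} logs above background")
```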

Negative control samples

It is essential to collect negative control samples to allow empirical assessment of the contamination background. We commonly include three types of negative control samples on each 16S rRNA marker gene sequencing run (Fig. 4). For “blank swab” samples, a sterile swab is removed from its package in the sequencing lab, and the full sequencing protocol is applied to the swab. For “blank extraction” samples, DNA extraction and all subsequent steps are carried out with no additional input material. For “blank library” samples, the extraction protocol is skipped; DNA-free water (UltraClean PCR Water, MO BIO Laboratories, Carlsbad, CA, USA) is used as input to the post-extraction steps of the protocol, starting with library generation, to characterize contamination introduced in downstream steps.

Fig. 4

Analysis of three negative control sample types reveals contaminating taxa. Data for negative controls were acquired using 16S V1-V2 rRNA marker gene sequencing on the Illumina MiSeq platform. Data from 11 experiments were pooled. a Comparison of average read counts. Experimental samples had an average read count of 137,243 and negative control samples had an average read count of 6613. b Heat map summary of bacterial lineages present in negative control samples. Different OTUs are present in DNA extraction controls (“blank extraction” and “blank swab”) and library preparation controls (“library blank”) collected over multiple sequencing runs.

If microbial biomass is low, additional negative control samples can be included to measure contaminating DNA introduced during sample collection. As an example, in studies of the lung microbiome using bronchoalveolar lavage, an excellent negative control can be generated by washing the bronchoscope with a sample of the lavage saline prior to carrying out the bronchoscopy [19].

In our recent work, the average number of DNA sequence reads for negative control samples was far lower than the average for experimental samples (6613 versus 137,243 reads on average in the pooled data; Fig. 4a). The bacterial taxa appearing in negative control samples were among those previously reported as contaminants in the literature, including Comamonadaceae, Ralstonia, and Propionibacterium (Fig. 4b).
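Once negative controls have been sequenced, a first-pass comparison is to ask, for each taxon, whether it is at least as prevalent in the controls as in the experimental samples. The sketch below implements that crude rule on hypothetical read counts; it is not a statistical test. The decontam R package offers statistically grounded prevalence- and frequency-based methods for the same purpose.

```python
import numpy as np


def prevalence(counts):
    """Fraction of samples (rows) in which each taxon (column) has at least one read."""
    return (np.asarray(counts) > 0).mean(axis=0)


def suspect_contaminants(sample_counts, control_counts, taxa):
    """Flag taxa that occur in controls at least as often as in experimental samples.

    A crude heuristic for triage only; flagged taxa still need manual review.
    """
    p_samples = prevalence(sample_counts)
    p_controls = prevalence(control_counts)
    return [t for t, ps, pc in zip(taxa, p_samples, p_controls) if pc > 0 and pc >= ps]


# Hypothetical read counts (rows = samples, columns = taxa)
taxa = ["Bacteroides", "Ralstonia", "Comamonadaceae"]
sample_counts = [[500, 0, 3], [800, 2, 0], [650, 0, 0]]
control_counts = [[0, 40, 12], [1, 35, 0]]
print(suspect_contaminants(sample_counts, control_counts, taxa))
```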

Positive control samples

Side-by-side sequencing of new samples with well-vetted positive controls is strongly recommended. Positive control samples allow verification that sample preparation and sequencing procedures are running smoothly. When samples are purified on multi-well plates, the consistent placement of samples in defined locations on plates allows any sample-tracking mix-ups to be detected in the sequence output. Positive and negative controls will ideally be positioned asymmetrically on extraction plates, uniquely defining the plate orientation.
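Whether a control layout fixes the plate orientation can be checked mechanically: rotate or mirror the plate map and see whether the pattern of control types is unchanged. The sketch below assumes a standard 8 × 12 (rows A to H, columns 1 to 12) 96-well plate. It checks a 180° rotation and also row and column mirroring, the latter as a guard against a plate map transcribed in the wrong direction; the well assignments in the example are hypothetical.

```python
ROWS = "ABCDEFGH"
N_COLS = 12

# Plate transformations that could plausibly go unnoticed on the bench or in a plate map
TRANSFORMS = {
    "rotate_180": lambda r, c: (len(ROWS) - 1 - r, N_COLS - 1 - c),
    "mirror_columns": lambda r, c: (r, N_COLS - 1 - c),
    "mirror_rows": lambda r, c: (len(ROWS) - 1 - r, c),
}


def parse_well(well):
    """Convert a well name such as 'B7' into zero-based (row, column) indices."""
    return ROWS.index(well[0].upper()), int(well[1:]) - 1


def ambiguous_orientations(controls):
    """Return the transforms under which the control layout looks identical.

    controls: dict mapping well name to control type, e.g. {"A1": "positive"}.
    An empty list means the controls pin down the plate orientation.
    """
    layout = {parse_well(well): kind for well, kind in controls.items()}
    ambiguous = []
    for name, transform in TRANSFORMS.items():
        moved = {transform(r, c): kind for (r, c), kind in layout.items()}
        if moved == layout:
            ambiguous.append(name)
    return ambiguous


symmetric = {"A1": "positive", "H12": "positive"}
asymmetric = {"A1": "positive", "C5": "negative", "H12": "negative"}
print(ambiguous_orientations(symmetric))  # a 180-degree rotation would go undetected
print(ambiguous_orientations(asymmetric))
```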

Many studies have used positive controls composed of mixtures of cultured organisms (“mock communities”) [23, 96, 111] or known mixtures of free DNA (“mock DNA” samples) [88, 112, 113], both of which make useful controls. Analysis usually shows that sequencing results are reproducible within a method and lab environment, but biases can differ between methods and labs [23].

For a simple positive control, we designed and synthesized mock DNA samples as gene blocks (Fig. 5a; see Additional file 2 for DNA sequences). We based the synthesized DNA on regions of the 16S rRNA gene from eight archaeal species, which would not normally be detected in experimental data because the sequences at the primer binding sites of the archaeal V1-V2 region do not match the bacterial V1-V2 primers used. In the engineered sequences, bacterial 16S V1-V2 primer binding sites were added synthetically to the archaeal sequences, allowing amplification. This has the advantage that the control sequences can be easily distinguished from experimental samples while still being processed through the same pipeline. A disadvantage of this strategy is that such controls are specific to a particular primer set and must be remade for each amplicon used. However, given the low cost of synthetic DNA, the cost of a set of positive controls is modest (about $450). After sequencing the archaeal gene block samples in 11 separate sequencing runs, we found that the relative abundances of the sequences were largely consistent (Fig. 5b).
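At the sequence level, the construction is string manipulation: the insert is flanked on the sense strand by the forward primer sequence and by the reverse complement of the reverse primer, because the reverse primer anneals to the sense strand. The sketch below shows this with toy sequences; it does not reproduce the gene block sequences in Additional file 2, and real primer sequences should be taken from the lab's own protocol. Degenerate IUPAC bases are rejected because a synthesized block needs a concrete base at every position.

```python
VALID_BASES = set("ACGT")
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def check_bases(seq: str) -> str:
    """Uppercase a sequence and reject anything other than A, C, G, T."""
    seq = seq.upper()
    invalid = set(seq) - VALID_BASES
    if invalid:
        # Degenerate positions (e.g. M, R, Y) must be resolved to one base before synthesis
        raise ValueError(f"non-ACGT bases present: {sorted(invalid)}")
    return seq


def reverse_complement(seq: str) -> str:
    """Reverse complement of an unambiguous DNA sequence."""
    return check_bases(seq).translate(COMPLEMENT)[::-1]


def build_gene_block(insert: str, forward_primer: str, reverse_primer: str) -> str:
    """Sense-strand construct: forward primer site + insert + reverse primer landing site.

    Both primers are given 5'->3' as they would be ordered; the reverse primer's
    landing site on the sense strand is its reverse complement.
    """
    return check_bases(forward_primer) + check_bases(insert) + reverse_complement(reverse_primer)


# Toy sequences, not real primers or archaeal 16S fragments
block = build_gene_block(insert="GGGGCCCC", forward_primer="ACGTAC", reverse_primer="TTAGGC")
print(block)
# The reverse primer should match the 5' end of the opposite strand
print(reverse_complement(block).startswith("TTAGGC"))
```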

Fig. 5

Synthetic non-biological 16S DNA as a positive control for 16S rRNA marker gene sequencing. a A diagram of the gene block design. At the top is a typical 16S rRNA gene amplicon, with primer binding sites for the widely used 27F and 338R primers. To generate recognizable sequences that would not be found authentically in samples, we synthesized DNAs in which the forward (27F) and reverse (338R) primer landing sites were added to archaeal DNA sequences, creating molecules not found in nature but readily analyzed using conventional pipelines. b Control sequence mixtures using the gene blocks show consistent relative abundances. Note that the eight gene blocks annotate as five archaeal taxa. c Heat map displaying the relative abundance of control gene blocks, where each square represents one well on a 96-well plate of a typical 16S rRNA marker gene sequencing project. Positive control wells, where gene blocks were added and amplified alongside experimental samples, are denoted with “x”.

The gene block design provided an opportunity to test the level of cross-contamination between experimental samples during wet-lab library preparation (in 96-well plates) and sequence acquisition. Figure 5c shows representative results from one sequencing run. The abundance of control archaeal taxa did not increase with proximity to positive control samples on the 96-well plates (P = 0.6, linear regression analysis), suggesting that spill-over during preparation was not a prominent source of admixture between samples. However, low levels of these sequences could be detected in multiple dispersed samples (Fig. 5c, blue squares), potentially due to misreading of bar codes or hybridization of DNA molecules in adjacent clusters during Illumina sequencing [114]. A possible means of suppressing this would be to use bar codes on both ends of the amplicons and to require precise matches to both in the quality filtering [115].
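The proximity analysis can be sketched as a regression of each non-control well's fraction of control-derived reads on its distance to the nearest positive-control well; a clearly negative slope would point to well-to-well spill-over. The code below is an illustration under our own assumptions (Euclidean distance in well spacings, least-squares slope via NumPy). It omits the P value, which in practice would come from a standard regression routine, and the read fractions are hypothetical.

```python
import numpy as np

ROWS = "ABCDEFGH"


def well_coords(well):
    """Zero-based (row, column) position of a well such as 'C7' as a float vector."""
    return np.array([ROWS.index(well[0].upper()), int(well[1:]) - 1], dtype=float)


def distance_to_nearest_control(well, control_wells):
    """Euclidean distance, in well spacings, to the closest positive-control well."""
    position = well_coords(well)
    return min(np.linalg.norm(position - well_coords(c)) for c in control_wells)


def spillover_slope(control_read_fraction, control_wells):
    """Least-squares slope of control-read fraction versus distance to nearest control.

    control_read_fraction: dict mapping each non-control well to the fraction of its
    reads assigned to the synthetic control sequences.
    """
    wells = sorted(control_read_fraction)
    distance = np.array([distance_to_nearest_control(w, control_wells) for w in wells])
    fraction = np.array([control_read_fraction[w] for w in wells])
    slope, _intercept = np.polyfit(distance, fraction, deg=1)
    return float(slope)


positive_controls = ["A1"]
flat = {"A2": 0.001, "A3": 0.001, "A4": 0.001, "A5": 0.001}  # no proximity effect
gradient = {"A2": 0.04, "A3": 0.03, "A4": 0.02, "A5": 0.01}  # declines with distance
print(spillover_slope(flat, positive_controls), spillover_slope(gradient, positive_controls))
```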

The gene block scheme is a simple method for ensuring proper amplification of experimental samples, tracking sample mix-ups, and measuring sample cross-contamination during library preparation and sequencing. However, synthetic positive controls are not useful for benchmarking analytical and statistical methods. Analysis methods developed for real communities often do not perform as well on mock communities, and vice versa, due to the presence of naturally occurring sequence variation and low abundance taxa.

Many investigators use primers that simultaneously target the 16S region of both bacteria and archaea, for example, the 515fB/806rB primer set used by the Earth Microbiome Project [116, 117]. In this case, there is no advantage to using archaeal sequences in the gene blocks, because archaea might be present in experimental samples. Nonetheless, investigators can build gene blocks using artificially altered DNA sequences that are different enough to be reliably distinguished from genomic sequences but similar enough to be compatible with the analysis pipeline. In Additional file 2, we present example gene block sets for the 515fB/806rB primer pair.
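One hypothetical way to make such altered sequences is to introduce substitutions at regularly spaced positions of a template insert, keeping its length unchanged (primer sites would sit outside the altered insert), and then confirm the number of differences. The scheme below (one substitution every 10 bases, with a fixed base-swap rule) is our illustration, not the design used for the sequences in Additional file 2. In practice one would also check, for example by a BLAST search, that the altered sequence does not closely match any known organism.

```python
# Fixed substitution rule: every base is replaced by a different one
SWAP = {"A": "C", "C": "G", "G": "T", "T": "A"}


def alter_sequence(seq: str, every: int = 10, offset: int = 5) -> str:
    """Substitute one base every `every` positions, starting at `offset` (hypothetical scheme)."""
    seq = seq.upper()
    altered = list(seq)
    for i in range(offset, len(seq), every):
        altered[i] = SWAP[seq[i]]
    return "".join(altered)


def hamming(a: str, b: str) -> int:
    """Number of mismatched positions between two equal-length sequences."""
    if len(a) != len(b):
        raise ValueError("sequences must have equal length")
    return sum(x != y for x, y in zip(a, b))


template = "ACGT" * 25  # toy 100-base template, not a real 16S fragment
altered = alter_sequence(template)
print(len(altered), hamming(template, altered))
```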

When artificial positive control samples are not suitable or cost-effective, many of the benefits can be achieved by sequencing a small number of positive control samples collected from the field. We have used samples of pond water and saliva as indicators of consistency in sample preparation and sequencing, though we ultimately found the mock DNA samples to be more convenient.

Contamination in shotgun metagenomic data

Microbial DNA introduced by reagents can also be detected in shotgun metagenomic sequencing. As for amplicon sequencing, contamination is particularly apparent in samples with low microbial biomass. This is seen both for samples with generally low biomass (e.g., skin swabs) and for samples dominated by non-microbial DNA (e.g., tissue biopsies).

For example, in our work to characterize the microbiota in sarcoidosis, we performed shotgun metagenomic sequencing on tissue DNA extracted using both standard (DNeasy PowerSoil, Qiagen, Valencia, CA, USA) and low-contaminant (QiaAmp UCP Pathogen, Qiagen, Valencia, CA, USA) kits (unpublished data). Sequencing of negative control samples showed that the kit background differed between the two kits (Fig. 6a). Lineages found in the negative controls of both kits were also present in our low-biomass tissue samples and were likely derived from reagents. Lineages found in both samples and controls included Propionibacterium spp. and Corynebacterium spp., which are commonly associated with human skin, and Bradyrhizobium, a common soil bacterium also identified as a contaminant by other groups [23, 118]. Of concern, this lineage has been proposed to be responsible for a colitis syndrome in patients undergoing umbilical-cord hematopoietic stem-cell transplantation [118, 119]. It will be key to strengthen the link to colitis with additional forms of data that can rule out contamination as an explanation.

Fig. 6

Contamination in shotgun metagenomic data. a Lineages observed in shotgun metagenomic sequencing of negative control samples using standard (DNeasy PowerSoil) and low-contaminant (QiaAmp UCP Pathogen) kits. b Detecting Bacillus phage phi29 polymerase reads in a blank sample. Twenty-one reads from a blank sample aligned to the DNA polymerase gene (1145 to 2863 bp) of Bacillus phage phi29. The protein was purchased as a reagent from a commercial supplier, suggestive of contamination of the protein with cloned DNA encoding the polymerase gene used in protein over-expression

This indicates that, although some reagent contamination is unavoidable, using low-contaminant kits reduces the total sequencing effort spent on contaminants. It also highlights the importance of sequencing and analyzing extraction controls, because without them it is impossible to distinguish reagent contamination from true microbial signals.

An extreme example of contamination detection comes from virome analysis, where multiple displacement amplification is used to amplify specimens. This method uses the highly processive DNA polymerase of Bacillus phage phi29 to copy target DNA prior to library preparation. Shotgun metagenomic sequencing of a blank virome preparation (unpublished data) returned hits to phage phi29. On inspection, however, these reads aligned exclusively to the polymerase gene (Fig. 6b). Evidently, the amplification was sensitive enough to recover the gene used to produce the polymerase itself. We had purchased this protein from a commercial supplier and used it in our library preparation procedure.
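The key observation is that every hit falls inside a single gene rather than across the whole phage genome. This can be quantified as the fraction of read alignments lying entirely within that gene's coordinates. The sketch below takes the polymerase coordinates (1145 to 2863 bp) from the Fig. 6b caption. The alignment tuples are hypothetical; in practice they would be parsed from an aligner's output. What fraction counts as "exclusively" is a judgment call.

```python
from typing import List, Tuple

# Polymerase gene coordinates on the phi29 genome, as given in the Fig. 6b caption.
POLYMERASE_GENE = (1145, 2863)


def fraction_within(alignments: List[Tuple[int, int]], start: int, end: int) -> float:
    """Fraction of read alignments lying entirely inside [start, end].

    Alignments are (first, last) reference positions, 1-based and inclusive;
    reverse-strand hits given as (last, first) are normalized first.
    """
    if not alignments:
        return 0.0
    inside = 0
    for a, b in alignments:
        lo, hi = min(a, b), max(a, b)
        if start <= lo and hi <= end:
            inside += 1
    return inside / len(alignments)


if __name__ == "__main__":
    # Hypothetical hits from a blank sample.
    hits = [(1200, 1350), (2000, 2150), (2850, 2700)]
    print(fraction_within(hits, *POLYMERASE_GENE))  # 1.0
```

A fraction near 1.0 across many reads points to cloned gene DNA carried over with the reagent. Reads spread across the genome would instead suggest the phage itself.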

Considerations during analysis

This article is mostly concerned with optimal procedures for laboratory methods, but we do want to comment on three issues in analyzing and interpreting microbiome data.

Handling of negative controls

It is essential to report the composition of negative control samples, just as for all other samples. Work up negative control samples through the full pipeline. Sequence negative control samples even if library yield is low or undetectable. Show the lineages present in stacked bar graphs or heat maps. Deposit negative control data in sequence archives along with the experimental samples. Do not just subtract lineages found in negative controls and consider the problem solved. Without specific evidence, there is no reason to think that contaminating lineages have been fully sampled. There can also be cases where environmental lineages are authentically present in samples and functionally important.
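A minimal way to follow this advice in code is to tabulate each lineage's prevalence in experimental samples and in negative controls side by side. Lineages seen in controls are then flagged rather than deleted. The sketch below assumes read counts per lineage per sample are already available. The data, sample IDs, and function names are hypothetical. Prevalence is only one of several reasonable summaries; relative abundance and microbial load are others.

```python
from typing import Dict, List

Profile = Dict[str, Dict[str, int]]  # sample ID -> {lineage: read count}


def prevalence(profiles: Profile, lineage: str) -> float:
    """Fraction of samples in which `lineage` has at least one read."""
    if not profiles:
        return 0.0
    hits = sum(1 for counts in profiles.values() if counts.get(lineage, 0) > 0)
    return hits / len(profiles)


def control_report(samples: Profile, controls: Profile) -> List[dict]:
    """Tabulate every lineage with its prevalence in samples and in negative controls.

    Lineages seen in controls are flagged for scrutiny, not removed: every row is
    kept so that contaminants and genuine signal can both be examined.
    """
    lineages = sorted(
        {name for p in (samples, controls) for counts in p.values() for name in counts}
    )
    return [
        {
            "lineage": name,
            "prevalence_samples": prevalence(samples, name),
            "prevalence_controls": prevalence(controls, name),
            "seen_in_controls": prevalence(controls, name) > 0,
        }
        for name in lineages
    ]


if __name__ == "__main__":
    samples = {"S1": {"Bacteroides": 500, "Bradyrhizobium": 5}, "S2": {"Bacteroides": 300}}
    controls = {"NC1": {"Bradyrhizobium": 40, "Propionibacterium": 10}}
    for row in control_report(samples, controls):
        print(row)
```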

Controlling multiple comparisons

High-throughput sequencing experiments commonly generate sequence reads attributed to hundreds of taxa. Researchers wishing to know which taxa are potentially associated with a difference in phenotype must make many comparisons, each time testing the null hypothesis of no difference in taxon abundance. In addition, studies often involve multiple types of clinical data, allowing myriad comparisons over the microbiome data set. Suppose the acceptable false positive rate for each test is set at a certain level (e.g., 5%). These repeated comparisons then raise the overall chance of obtaining at least one false positive above that level. To bring the false positive rate back to the desired level, a multiple testing correction must be used.
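The size of this inflation can be made concrete under two simplifying assumptions: the $m$ tests are independent, and every null hypothesis is true. Taxon abundances are in reality correlated, so this calculation is only illustrative. At a per-test level $\alpha$, the probability of at least one false positive (the family-wise error rate) is

$$
\mathrm{FWER} = 1 - (1 - \alpha)^{m}, \qquad
\alpha = 0.05,\; m = 100 \;\Rightarrow\; \mathrm{FWER} = 1 - 0.95^{100} \approx 0.994 .
$$

Testing each taxon at $\alpha / m$ instead (the Bonferroni threshold) keeps the family-wise error rate at or below $\alpha$.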

This type of problem, controlling for multiple comparisons, is well covered by the statistical literature. A conservative approach is the Bonferroni correction [120], which controls the probability that any of the hypotheses is falsely rejected (the family-wise error rate) at a specified level. However, this method is often unacceptably conservative when many hypotheses are tested, leading to too many false negatives. A more popular approach is to control the false discovery rate at a pre-specified level; this rate is the expected proportion of false rejections of the null hypothesis among all rejections. Benjamini and Hochberg presented a method to control the false discovery rate in a series of independent tests [121]. This is the formulation used in microbiome analysis software such as QIIME [122] and Mothur [123]. Use of a multiple testing correction is strongly recommended whenever multiple comparisons are made.
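Both corrections fit in a few lines. The sketch below implements the standard Bonferroni adjustment and the Benjamini-Hochberg step-up procedure on a hypothetical list of per-taxon p-values. It is not the code used inside QIIME or Mothur. For real analyses, an established implementation such as `statsmodels.stats.multitest.multipletests` (method `"fdr_bh"`) is preferable.

```python
from typing import List, Sequence


def bonferroni(pvalues: Sequence[float]) -> List[float]:
    """Bonferroni-adjusted p-values: each p multiplied by the number of tests, capped at 1."""
    m = len(pvalues)
    return [min(1.0, p * m) for p in pvalues]


def benjamini_hochberg(pvalues: Sequence[float]) -> List[float]:
    """Benjamini-Hochberg adjusted p-values, returned in the input order.

    Rejecting every hypothesis whose adjusted value is <= q controls the false
    discovery rate at level q for independent tests.
    """
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])  # indices, smallest p first
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk down from the largest p-value so adjusted values never decrease with rank.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, pvalues[i] * m / rank)
        adjusted[i] = running_min
    return adjusted


if __name__ == "__main__":
    p = [0.001, 0.012, 0.014, 0.03, 0.2]  # hypothetical per-taxon p-values
    print(sum(q <= 0.05 for q in bonferroni(p)))          # 1
    print(sum(q <= 0.05 for q in benjamini_hochberg(p)))  # 4
```

With the same p-values, Bonferroni retains one taxon while Benjamini-Hochberg retains four. This illustrates why false discovery rate control is usually preferred when hundreds of taxa are tested.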

Discovery and validation cohorts

Moving beyond single experiments, researchers can provide better and more reliable evidence for a discovery by reproducing the results in an independent cohort of samples. The use of separate discovery and validation cohorts is standard in genome-wide association studies, which are also massively multivariate (e.g., [124]). Using this strategy in the microbiome context, the experiment is first conducted in the discovery cohort, and taxa or gene types are selected using a particular testing procedure. The validation cohort is then analyzed to test only those results found to be significant in the discovery cohort. The total number of tests is thus drastically reduced in the validation cohort.
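The discovery-then-validation logic can be sketched as below. The sketch assumes per-taxon p-values have already been computed separately in each cohort, using a test of the investigator's choosing. It uses Benjamini-Hochberg in discovery and Bonferroni over only the selected taxa in validation. That is one reasonable combination, not a prescribed one. The taxon names, p-values, and function names are hypothetical.

```python
from typing import Dict, List, Tuple


def bh_select(pvalues: Dict[str, float], fdr: float) -> List[str]:
    """Names rejected by the Benjamini-Hochberg step-up procedure at level `fdr`."""
    items = sorted(pvalues.items(), key=lambda kv: kv[1])
    m = len(items)
    cutoff_rank = 0
    for rank, (_, p) in enumerate(items, start=1):
        if p <= fdr * rank / m:
            cutoff_rank = rank  # largest rank meeting the BH criterion
    return [name for name, _ in items[:cutoff_rank]]


def validate(
    discovery_p: Dict[str, float],
    validation_p: Dict[str, float],
    fdr: float = 0.05,
    alpha: float = 0.05,
) -> Tuple[List[str], List[str]]:
    """Select taxa in the discovery cohort, then test only those in the validation cohort."""
    selected = bh_select(discovery_p, fdr)
    if not selected:
        return selected, []
    threshold = alpha / len(selected)  # Bonferroni over the small validation family
    replicated = [t for t in selected if validation_p.get(t, 1.0) <= threshold]
    return selected, replicated


if __name__ == "__main__":
    discovery = {"Taxon_A": 0.0004, "Taxon_B": 0.003, "Taxon_C": 0.04, "Taxon_D": 0.6}
    validation = {"Taxon_A": 0.01, "Taxon_B": 0.2, "Taxon_C": 0.001}
    print(validate(discovery, validation))  # (['Taxon_A', 'Taxon_B'], ['Taxon_A'])
```

Taxon_C has a small validation p-value but is never tested there, because it was not selected in discovery. This restriction is what keeps the validation family small. A fuller analysis would also require the direction of each effect to agree between cohorts.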

Several microbiome studies have used independent discovery and validation cohorts to select taxa of interest for a disease state. Sabino et al. identified three bacterial genera associated with primary sclerosing cholangitis in a discovery cohort and used their results to correctly classify 75% of subjects in an independent validation cohort [125]. Forslund et al. used separate cohorts to replicate their findings of taxa altered in metformin-treated subjects with type 2 diabetes mellitus [126]. In a series of papers, a composite index of bacterial taxon abundance in stool associated with inflammatory bowel disease (IBD) was developed in one group of subjects [73]. It was then found to distinguish IBD from healthy controls in an independent follow-up study [127]. Kelsen et al. applied the discovery-validation cohort design to determine differences in the subgingival microbiota between children with Crohn’s disease and healthy controls [128]. They identified taxa whose differences were reproducible across cohorts. Additionally, they were able to distinguish taxa associated with antibiotic use from those associated only with the disease.


Summarizing the considerations above, we can make several recommendations for the design and execution of microbiome studies.

  • For analysis, multiple confounding factors need to be taken into account, including antibiotic use, age, sex, diet, geography, and pet ownership.

  • In animal studies, cage effects can dominate over what may seem to be extreme interventions. Thus, it is critical to set up each condition to be studied in multiple cages, so that the caging variable can be isolated and accounted for.

  • For the most accurate results, we recommend storing samples, especially fecal samples, at −80 °C immediately after collection. Alternative storage methods for field studies, however, also yield results with relatively small deviations. For new sample types, it is advisable to test for changes during storage under study-specific conditions.

  • In a cross-sectional study, it is essential to know whether the time point sampled will be representative. For example, the healthy adult gut microbiota does not change radically over short time scales, but that of the vagina sometimes does. Therefore, it is important to assess the relationship of possible longitudinal dynamics to the question posed.

  • Be energetic in creating and analyzing negative controls—DNA extraction kits usually come with contaminants, and contamination may vary between suppliers and even between batches of the same kit.

  • Use positive controls for each batch of samples. Mock communities are valuable for this, and the simple synthetic DNA controls presented here (Additional file 2) are also quite useful. Place controls asymmetrically in purification plates to verify proper sample tracking through the DNA purification and library preparation procedures.

  • Low microbial biomass samples present many challenges. When starting a study that might involve low microbial biomass samples, it is essential to quantify the microbial load in the samples to understand the extent of the challenge. qPCR of total 16S rRNA gene copies can be used for this purpose, as can conventional plating assays if applicable. In an experiment that may involve low biomass samples, start with the null hypothesis that all sequence data reflect contamination only, and ask whether this idea can be rejected in a statistical analysis of the data.

  • Be realistic about “data dredging”: impose a rigorous statistical method to control for multiple comparisons.

  • Lastly, if affordable, it greatly strengthens a study to assess effects in separate discovery and validation cohorts.
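The recommendation to place controls asymmetrically can be checked mechanically. The sketch below rests on these assumptions:

- a standard 8 × 12 plate with rows A–H and columns 1–12;
- all control wells are treated as interchangeable;
- three hypothetical handling errors: a plate rotated by 180 degrees, or a plate map read with its columns or its rows in reverse order.

If the control pattern is unchanged by one of these errors, that error would go undetected. The list of errors and the function names are assumptions for illustration.

```python
from typing import Callable, Dict, Iterable, List, Tuple

ROWS = "ABCDEFGH"
N_COLS = 12
Well = Tuple[int, int]  # (row index 0-7, column index 0-11)


def parse_well(label: str) -> Well:
    """Convert a well label such as 'C7' to zero-based (row, column)."""
    return ROWS.index(label[0].upper()), int(label[1:]) - 1


# Hypothetical handling errors, each mapping a well to where it would end up.
TRANSFORMS: Dict[str, Callable[[Well], Well]] = {
    "rotated 180 degrees": lambda w: (len(ROWS) - 1 - w[0], N_COLS - 1 - w[1]),
    "columns reversed": lambda w: (w[0], N_COLS - 1 - w[1]),
    "rows reversed": lambda w: (len(ROWS) - 1 - w[0], w[1]),
}


def symmetric_under(control_wells: Iterable[str]) -> List[str]:
    """List the handling errors that would leave the control pattern unchanged."""
    wells = {parse_well(w) for w in control_wells}
    return [name for name, f in TRANSFORMS.items() if {f(w) for w in wells} == wells]


if __name__ == "__main__":
    print(symmetric_under(["A1", "H12"]))        # ['rotated 180 degrees']
    print(symmetric_under(["A1", "B3", "G12"]))  # []
```

An empty result means none of the modeled errors would go unnoticed. Controls carrying distinct gene blocks add further protection, because a rotation would then swap which block appears where.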

There is no question that the human microbiota are critical for health and disease—by attending to the above challenges, one can generate high quality data to drive new discoveries in this exciting field.



ITS: Internal transcribed spacer


  1.

    Chehoud C, Albenberg LG, Judge C, Hoffmann C, Grunberg S, Bittinger K, Wu GD. Fungal signature in the gut microbiota of pediatric patients with inflammatory bowel disease. Inflamm Bowel Dis. 2015;21(8):1948–56. doi:10.1097/MIB.0000000000000454.

  2.

    Debelius JW, Vazquez-Baeza Y, McDonald D, Xu Z, Wolfe E, Knight R. Turning participatory microbiome research into usable data: lessons from the american gut project. J Microbiol Biol Educ. 2016;17(1):46–50. doi:10.1128/jmbe.v17i1.1034.

  3.

    Faust K, Sathirapongsasuti JF, Izard J, Segata N, Gevers D, Raes J, Huttenhower C. Microbial co-occurrence relationships in the human microbiome. PLoS Comput Biol. 2012;8(7):e1002606. doi:10.1371/journal.pcbi.1002606.

  4.

    Human Microbiome Project Consortium. A framework for human microbiome research. Nature. 2012;486(7402):215–21. doi:10.1038/nature11209.

  5.

    Lewis JD, Chen EZ, Baldassano RN, Otley AR, Griffiths AM, Lee D, Bushman FD. Inflammation, antibiotics, and diet as environmental stressors of the gut microbiome in pediatric crohn’s disease. Cell Host Microbe. 2015;18(4):489–500. doi:10.1016/j.chom.2015.09.008.

  6.

    Ley RE, Turnbaugh PJ, Klein S, Gordon JI. Microbial ecology: human gut microbes associated with obesity. Nature. 2006;444(7122):1022–3. doi:10.1038/4441022a.

  7.

    Qin J, Li R, Raes J, Arumugam M, Burgdorf KS, Manichanh C, Wang J. A human gut microbial gene catalogue established by metagenomic sequencing. Nature. 2010;464(7285):59–65. doi:10.1038/nature08821.

  8.

    Turnbaugh PJ, Ley RE, Hamady M, Fraser-Liggett CM, Knight R, Gordon JI. The human microbiome project. Nature. 2007;449(7164):804–10. doi:10.1038/nature06244.

  9.

    Kassam Z, Lee CH, Yuan Y, Hunt RH. Fecal microbiota transplantation for Clostridium difficile infection: systematic review and meta-analysis. Am J Gastroenterol. 2013;108(4):500–8. doi:10.1038/ajg.2013.59.

  10.

    van Nood E, Vrieze A, Nieuwdorp M, Fuentes S, Zoetendal EG, de Vos WM, Keller JJ. Duodenal infusion of donor feces for recurrent Clostridium difficile. N Engl J Med. 2013;368(5):407–15. doi:10.1056/NEJMoa1205037.

  11.

    Baker GC, Smith JJ, Cowan DA. Review and re-analysis of domain-specific 16S primers. J Microbiol Methods. 2003;55(3):541–55.

  12.

    D’Amore R, Ijaz UZ, Schirmer M, Kenny JG, Gregory R, Darby AC, Hall N. A comprehensive benchmarking study of protocols and sequencing platforms for 16S rRNA community profiling. BMC Genomics. 2016;17:55. doi:10.1186/s12864-015-2194-9.

  13.

    Liu Z, Lozupone C, Hamady M, Bushman FD, Knight R. Short pyrosequencing reads suffice for accurate microbial community analysis. Nucleic Acids Res. 2007;35(18):e120. doi:10.1093/nar/gkm541.

  14.

    Mizrahi-Man O, Davenport ER, Gilad Y. Taxonomic classification of bacterial 16S rRNA genes using short sequencing reads: evaluation of effective study designs. PLoS One. 2013;8(1):e53608. doi:10.1371/journal.pone.0053608.

  15.

    Schloss PD, Jenior ML, Koumpouras CC, Westcott SL, Highlander SK. Sequencing 16S rRNA gene fragments using the PacBio SMRT DNA sequencing system. PeerJ. 2016;4:e1869. doi:10.7717/peerj.1869.

  16.

    Tremblay J, Singh K, Fern A, Kirton ES, He S, Woyke T, Tringe SG. Primer and platform effects on 16S rRNA tag sequencing. Front Microbiol. 2015;6:771. doi:10.3389/fmicb.2015.00771.

  17.

    Aho VT, Pereira PA, Haahtela T, Pawankar R, Auvinen P, Koskinen K. The microbiome of the human lower airways: a next generation sequencing perspective. World Allergy Organ J. 2015;8(1):23. doi:10.1186/s40413-015-0074-z.

  18.

    Bittinger K, Charlson ES, Loy E, Shirley DJ, Haas AR, Laughlin A, Bushman FD. Improved characterization of medically relevant fungi in the human respiratory tract using next-generation sequencing. Genome Biol. 2014;15(10):487. doi:10.1186/s13059-014-0487-y.

  19.

    Charlson ES, Bittinger K, Haas AR, Fitzgerald AS, Frank I, Yadav A, Collman RG. Topographical continuity of bacterial populations in the healthy human respiratory tract. Am J Respir Crit Care Med. 2011;184(8):957–63. doi:10.1164/rccm.201104-0655OC.

  20.

    Glassing A, Dowd SE, Galandiuk S, Davis B, Chiodini RJ. Inherent bacterial DNA contamination of extraction and sequencing reagents may affect interpretation of microbiota in low bacterial biomass samples. Gut Pathog. 2016;8:24. doi:10.1186/s13099-016-0103-7.

  21.

    Jervis-Bardy J, Leong LE, Marri S, Smith RJ, Choo JM, Smith-Vaughan HC, Marsh RL. Deriving accurate microbiota profiles from human samples with low bacterial content through post-sequencing processing of Illumina MiSeq data. Microbiome. 2015;3:19. doi:10.1186/s40168-015-0083-8.

  22.

    Lauder AP, Roche AM, Sherrill-Mix S, Bailey A, Laughlin AL, Bittinger K, Bushman FD. Comparison of placenta samples with contamination controls does not provide evidence for a distinct placenta microbiota. Microbiome. 2016;4(1):29. doi:10.1186/s40168-016-0172-3.

  23.

    Salter SJ, Cox MJ, Turek EM, Calus ST, Cookson WO, Moffatt MF, Walker AW. Reagent and laboratory contamination can critically impact sequence-based microbiome analyses. BMC Biol. 2014;12:87. doi:10.1186/s12915-014-0087-z.

  24.

    Weiss S, Amir A, Hyde ER, Metcalf JL, Song SJ, Knight R. Tracking down the sources of experimental contamination in microbiome studies. Genome Biol. 2014;15(12):564. doi:10.1186/s13059-014-0564-2.

  25.

    Di Bella JM, Bao Y, Gloor GB, Burton JP, Reid G. High throughput sequencing methods and analysis for microbiome research. J Microbiol Methods. 2013;95(3):401–14. doi:10.1016/j.mimet.2013.08.011.

  26.

    Foster JA, Bunge J, Gilbert JA, Moore JH. Measuring the microbiome: perspectives on advances in DNA-based techniques for exploring microbial life. Brief Bioinform. 2012;13(4):420–9. doi:10.1093/bib/bbr080.

  27.

    Goodrich JK, Di Rienzi SC, Poole AC, Koren O, Walters WA, Caporaso JG, Ley RE. Conducting a microbiome study. Cell. 2014;158(2):250–62. doi:10.1016/j.cell.2014.06.037.

  28.

    Kuczynski J, Lauber CL, Walters WA, Parfrey LW, Clemente JC, Gevers D, Knight R. Experimental and analytical tools for studying the human microbiome. Nat Rev Genet. 2012;13(1):47–58. doi:10.1038/nrg3129.

  29.

    Robinson CK, Brotman RM, Ravel J. Intricacies of assessing the human microbiome in epidemiologic studies. Ann Epidemiol. 2016;26(5):311–21. doi:10.1016/j.annepidem.2016.04.005.

  30.

    Bikel S, Valdez-Lara A, Cornejo-Granados F, Rico K, Canizales-Quinteros S, Soberon X, Ochoa-Leyva A. Combining metagenomics, metatranscriptomics and viromics to explore novel microbial interactions: towards a systems-level understanding of human microbiome. Comput Struct Biotechnol J. 2015;13:390–401. doi:10.1016/j.csbj.2015.06.001.

  31.

    Kim Y, Koh I, Rho M. Deciphering the human microbiome using next-generation sequencing data and bioinformatics approaches. Methods. 2015;79–80:52–9. doi:10.1016/j.ymeth.2014.10.022.

  32.

    Laukens D, Brinkman BM, Raes J, De Vos M, Vandenabeele P. Heterogeneity of the gut microbiome in mice: guidelines for optimizing experimental design. FEMS Microbiol Rev. 2016;40(1):117–32. doi:10.1093/femsre/fuv036.

  33.

    Tsilimigras MC, Fodor AA. Compositional data analysis of the microbiome: fundamentals, tools, and challenges. Ann Epidemiol. 2016;26(5):330–5. doi:10.1016/j.annepidem.2016.03.002.

  34.

    Kelly BJ, Gross R, Bittinger K, Sherrill-Mix S, Lewis JD, Collman RG, Li H. Power and sample-size estimation for microbiome studies using pairwise distances and PERMANOVA. Bioinformatics. 2015;31(15):2461–8. doi:10.1093/bioinformatics/btv183.

  35.

    La Rosa PS, Brooks JP, Deych E, Boone EL, Edwards DJ, Wang Q, Shannon WD. Hypothesis testing and power calculations for taxonomic-based human microbiome data. PLoS One. 2012;7(12):e52078. doi:10.1371/journal.pone.0052078.

  36.

    Blaser M, Bork P, Fraser C, Knight R, Wang J. The microbiome explored: recent insights and future challenges. Nat Rev Microbiol. 2013;11(3):213–7. doi:10.1038/nrmicro2973.

  37.

    Dave M, Higgins PD, Middha S, Rioux KP. The human gut microbiome: current knowledge, challenges, and future directions. Transl Res. 2012;160(4):246–57. doi:10.1016/j.trsl.2012.05.003.

  38.

    Lozupone CA, Stombaugh JI, Gordon JI, Jansson JK, Knight R. Diversity, stability and resilience of the human gut microbiota. Nature. 2012;489(7415):220–30. doi:10.1038/nature11550.

  39.

    McKenna P, Hoffmann C, Minkah N, Aye PP, Lackner A, Liu Z, Bushman FD. The macaque gut microbiome in health, lentiviral infection, and chronic enterocolitis. PLoS Pathog. 2008;4(2):e20. doi:10.1371/journal.ppat.0040020.

  40.

    Abeles SR, Ly M, Santiago-Rodriguez TM, Pride DT. Effects of long term antibiotic therapy on human oral and fecal viromes. PLoS One. 2015;10(8):e0134941. doi:10.1371/journal.pone.0134941.

  41.

    Dethlefsen L, Huse S, Sogin ML, Relman DA. The pervasive effects of an antibiotic on the human gut microbiota, as revealed by deep 16S rRNA sequencing. PLoS Biol. 2008;6(11):e280. doi:10.1371/journal.pbio.0060280.

  42.

    Jakobsson HE, Jernberg C, Andersson AF, Sjolund-Karlsson M, Jansson JK, Engstrand L. Short-term antibiotic treatment has differing long-term impacts on the human throat and gut microbiome. PLoS One. 2010;5(3):e9836. doi:10.1371/journal.pone.0009836.

  43.

    Devkota S. MICROBIOME. Prescription drugs obscure microbiome analyses. Science. 2016;351(6272):452–3. doi:10.1126/science.aaf1353.

  44.

    Mardinoglu A, Boren J, Smith U. Confounding effects of metformin on the human gut microbiome in type 2 diabetes. Cell Metab. 2016;23(1):10–2. doi:10.1016/j.cmet.2015.12.012.

  45.

    Imhann F, Bonder MJ, Vich Vila A, Fu J, Mujagic Z, Vork L, Zhernakova A. Proton pump inhibitors affect the gut microbiome. Gut. 2016;65(5):740–8. doi:10.1136/gutjnl-2015-310376.

  46.

    Amarasekara R, Jayasekara RW, Senanayake H, Dissanayake VH. Microbiome of the placenta in pre-eclampsia supports the role of bacteria in the multifactorial cause of pre-eclampsia. J Obstet Gynaecol Res. 2015;41(5):662–9. doi:10.1111/jog.12619.

  47.

    Dore J, Blottiere H. The influence of diet on the gut microbiota and its consequences for health. Curr Opin Biotechnol. 2015;32:195–9. doi:10.1016/j.copbio.2015.01.002.

  48.

    Fallucca F, Porrata C, Fallucca S, Pianesi M. Influence of diet on gut microbiota, inflammation and type 2 diabetes mellitus. First experience with macrobiotic Ma-Pi 2 diet. Diabetes Metab Res Rev. 2014;30 Suppl 1:48–54. doi:10.1002/dmrr.2518.

  49.

    Hrncir T, Stepankova R, Kozakova H, Hudcovic T, Tlaskalova-Hogenova H. Gut microbiota and lipopolysaccharide content of the diet influence development of regulatory T cells: studies in germ-free mice. BMC Immunol. 2008;9:65. doi:10.1186/1471-2172-9-65.

  50.

    Moreira AP, Texeira TF, Ferreira AB, Peluzio Mdo C, Alfenas Rde C. Influence of a high-fat diet on gut microbiota, intestinal permeability and metabolic endotoxaemia. Br J Nutr. 2012;108(5):801–9. doi:10.1017/S0007114512001213.

  51.

    Murphy EA, Velazquez KT, Herbert KM. Influence of high-fat diet on gut microbiota: a driving force for chronic disease risk. Curr Opin Clin Nutr Metab Care. 2015;18(5):515–20. doi:10.1097/MCO.0000000000000209.

  52.

    Rothe M, Blaut M. Evolution of the gut microbiota and the influence of diet. Benef Microbes. 2013;4(1):31–7. doi:10.3920/BM2012.0029.

  53.

    Scott KP, Gratz SW, Sheridan PO, Flint HJ, Duncan SH. The influence of diet on the gut microbiota. Pharmacol Res. 2013;69(1):52–60. doi:10.1016/j.phrs.2012.10.020.

  54.

    Sherman MP, Zaghouani H, Niklas V. Gut microbiota, the immune system, and diet influence the neonatal gut-brain axis. Pediatr Res. 2015;77(1-2):127–35. doi:10.1038/pr.2014.161.

  55.

    Wu GD, Chen J, Hoffmann C, Bittinger K, Chen YY, Keilbaugh SA, Lewis JD. Linking long-term dietary patterns with gut microbial enterotypes. Science. 2011;334(6052):105–8. doi:10.1126/science.1208344.

  56.

    Wu GD, Compher C, Chen EZ, Smith SA, Shah RD, Bittinger K, Lewis JD. Comparative metabolomics in vegans and omnivores reveal constraints on diet-dependent gut microbiota metabolite production. Gut. 2016;65(1):63–72. doi:10.1136/gutjnl-2014-308209.

  57.

    David LA, Maurice CF, Carmody RN, Gootenberg DB, Button JE, Wolfe BE, Turnbaugh PJ. Diet rapidly and reproducibly alters the human gut microbiome. Nature. 2014;505(7484):559–63. doi:10.1038/nature12820.

  58.

    Koenig JE, Spor A, Scalfone N, Fricker AD, Stombaugh J, Knight R, Ley RE. Succession of microbial consortia in the developing infant gut microbiome. Proc Natl Acad Sci U S A. 2011;108 Suppl 1:4578–85. doi:10.1073/pnas.1000081107.

  59.

    Lee D, Albenberg L, Compher C, Baldassano R, Piccoli D, Lewis JD, Wu GD. Diet in the pathogenesis and treatment of inflammatory bowel diseases. Gastroenterology. 2015;148(6):1087–106. doi:10.1053/j.gastro.2015.01.007.

  60.

    Yatsunenko T, Rey FE, Manary MJ, Trehan I, Dominguez-Bello MG, Contreras M, Gordon JI. Human gut microbiome viewed across age and geography. Nature. 2012;486(7402):222–7. doi:10.1038/nature11053.

  61.

    Claesson MJ, Cusack S, O’Sullivan O, Greene-Diniz R, de Weerd H, Flannery E, O’Toole PW. Composition, variability, and temporal stability of the intestinal microbiota of the elderly. Proc Natl Acad Sci U S A. 2011;108 Suppl 1:4586–91. doi:10.1073/pnas.1000097107.

  62.

    Clarke G, Stilling RM, Kennedy PJ, Stanton C, Cryan JF, Dinan TG. Minireview: gut microbiota: the neglected endocrine organ. Mol Endocrinol. 2014;28(8):1221–38. doi:10.1210/me.2014-1108.

  63.

    Markle JG, Frank DN, Mortin-Toth S, Robertson CE, Feazel LM, Rolle-Kampczyk U, Danska JS. Sex differences in the gut microbiome drive hormone-dependent regulation of autoimmunity. Science. 2013;339(6123):1084–8. doi:10.1126/science.1233521.

  64.

    Davey KJ, O’Mahony SM, Schellekens H, O’Sullivan O, Bienenstock J, Cotter PD, Cryan JF. Gender-dependent consequences of chronic olanzapine in the rat: effects on body weight, inflammatory, metabolic and microbiota parameters. Psychopharmacol (Berl). 2012;221(1):155–69. doi:10.1007/s00213-011-2555-2.

  65.

    Liang X, Bushman FD, FitzGerald GA. Rhythmicity of the intestinal microbiota is regulated by gender and the host circadian clock. Proc Natl Acad Sci U S A. 2015;112(33):10479–84. doi:10.1073/pnas.1501305112.

  66.

    Ren W, Ma Y, Yang L, Gettie A, Salas J, Russell K, Cheng-Mayer C. Fast disease progression in simian HIV-infected female macaque is accompanied by a robust local inflammatory innate immune and microbial response. AIDS. 2015;29(10):F1–8. doi:10.1097/QAD.0000000000000711.

  67.

    Noguera-Julian M, Rocafort M, Guillen Y, Rivera J, Casadella M, Nowak P, Paredes R. Gut microbiota linked to sexual preference and HIV infection. EBioMed. 2016;5:135–46. doi:10.1016/j.ebiom.2016.01.032.

  68.

    Oh C, Lee K, Cheong Y, Lee SW, Park SY, Song CS, Lee JB. Comparison of the oral microbiomes of canines and their owners using next-generation sequencing. PLoS One. 2015;10(7):e0131468. doi:10.1371/journal.pone.0131468.

  69.

    Song SJ, Lauber C, Costello EK, Lozupone CA, Humphrey G, Berg-Lyons D, Knight R. Cohabiting family members share microbiota with one another and with their dogs. Elife. 2013;2:e00458. doi:10.7554/eLife.00458.

  70.

    Jalanka-Tuovinen J, Salonen A, Nikkila J, Immonen O, Kekkonen R, Lahti L, de Vos WM. Intestinal microbiota in healthy adults: temporal analysis reveals individual and common core and relation to intestinal symptoms. PLoS One. 2011;6(7):e23035. doi:10.1371/journal.pone.0023035.

  71.

    Rajilic-Stojanovic M, Heilig HG, Tims S, Zoetendal EG, de Vos WM. Long-term monitoring of the human intestinal microbiota composition. Environ Microbiol. 2012. doi:10.1111/1462-2920.12023.

  72.

    Zoetendal EG, Akkermans AD, De Vos WM. Temperature gradient gel electrophoresis analysis of 16S rRNA from human fecal samples reveals stable and host-specific communities of active bacteria. Appl Environ Microbiol. 1998;64(10):3854–9.

  73.

    Gevers D, Kugathasan S, Denson LA, Vazquez-Baeza Y, Van Treuren W, Ren B, Xavier RJ. The treatment-naive microbiome in new-onset Crohn’s disease. Cell Host Microbe. 2014;15(3):382–92. doi:10.1016/j.chom.2014.02.005.

  74.

    Brotman RM, Shardell MD, Gajer P, Tracy JK, Zenilman JM, Ravel J, Gravitt PE. Interplay between the temporal dynamics of the vaginal microbiota and human papillomavirus detection. J Infect Dis. 2014;210(11):1723–33. doi:10.1093/infdis/jiu330.

  75.

    Chehoud C, Stieh DJ, Bailey AG, Laughlin AL, Allen SA, McCotter KL, Bushman FD. Associations of the vaginal microbiota with HIV infection, bacterial vaginosis and demographic factors. AIDS. 2017. doi:10.1097/QAD.0000000000001421.

  76.

    Gajer P, Brotman RM, Bai G, Sakamoto J, Schutte UM, Zhong X, Ravel J. Temporal dynamics of the human vaginal microbiota. Sci Transl Med. 2012;4(132):132ra152. doi:10.1126/scitranslmed.3003605.

  77.

    Ravel J, Brotman RM, Gajer P, Ma B, Nandy M, Fadrosh DW, Forney LJ. Daily temporal dynamics of vaginal microbiota before, during and after episodes of bacterial vaginosis. Microbiome. 2013;1(1):29. doi:10.1186/2049-2618-1-29.

  78.

    Ravel J, Gajer P, Abdo Z, Schneider GM, Koenig SS, McCulle SL, Forney LJ. Vaginal microbiome of reproductive-age women. Proc Natl Acad Sci U S A. 2011;108 Suppl 1:4680–7. doi:10.1073/pnas.1002611107.

  79.

    Liang X, Bushman FD, FitzGerald GA. Time in motion: the molecular clock meets the microbiome. Cell. 2014;159(3):469–70. doi:10.1016/j.cell.2014.10.020.

  80.

    Thaiss CA, Zeevi D, Levy M, Segal E, Elinav E. A day in the life of the meta-organism: diurnal rhythms of the intestinal microbiome and its host. Gut Microbes. 2015;6(2):137–42. doi:10.1080/19490976.2015.1016690.

  81.

    Bushon RN, Kephart CM, Koltun GF, Francy DS, Schaefer 3rd FW, Alan Lindquist HD. Statistical assessment of DNA extraction reagent lot variability in real-time quantitative PCR. Lett Appl Microbiol. 2010;50(3):276–82. doi:10.1111/j.1472-765X.2009.02788.x.

  82.

    Campbell JH, Foster CM, Vishnivetskaya T, Campbell AG, Yang ZK, Wymore A, Podar M. Host genetic and environmental effects on mouse intestinal microbiota. ISME J. 2012;6(11):2033–44. doi:10.1038/ismej.2012.54.

  83.

    Hildebrand F, Nguyen TL, Brinkman B, Yunta RG, Cauwe B, Vandenabeele P, Raes J. Inflammation-associated enterotypes, host genotype, cage and inter-individual effects drive gut microbiota variation in common laboratory mice. Genome Biol. 2013;14(1):R4. doi:10.1186/gb-2013-14-1-r4.

  84.

    Arndt SS, Laarakker MC, van Lith HA, van der Staay FJ, Gieling E, Salomons AR, Ohl F. Individual housing of mice--impact on behaviour and stress responses. Physiol Behav. 2009;97(3-4):385–93. doi:10.1016/j.physbeh.2009.03.008.

  85.

    Laber K, Veatch LM, Lopez MF, Mulligan JK, Lathers DM. Effects of housing density on weight gain, immune function, behavior, and plasma corticosterone concentrations in BALB/c and C57BL/6 mice. J Am Assoc Lab Anim Sci. 2008;47(2):16–23.

  86.

    Paigen B, Currer JM, Svenson KL. Effects of varied housing density on a hybrid mouse strain followed for 20 months. PLoS One. 2016;11(2):e0149647. doi:10.1371/journal.pone.0149647.

  87.

    Dollive S, Chen YY, Grunberg S, Bittinger K, Hoffmann C, Vandivier L, Bushman FD. Fungi of the murine gut: episodic variation and proliferation during antibiotic treatment. PLoS One. 2013;8(8):e71806. doi:10.1371/journal.pone.0071806.

  88.

    Wu GD, Lewis JD, Hoffmann C, Chen YY, Knight R, Bittinger K, Bushman FD. Sampling and pyrosequencing methods for characterizing bacterial communities in the human gut using 16S sequence tags. BMC Microbiol. 2010;10:206. doi:10.1186/1471-2180-10-206.

  89.

    Song SJ, Amir A, Metcalf L, Amato KR, Xu ZZ, Humphrey G, Knight R. Preservation methods differ in fecal microbiome stability, affecting suitability for field studies. mSystems. 2016;1(3). doi:10.1128/mSystems.00021-16.

  90.

    Blekhman R, Tang K, Archie EA, Barreiro LB, Johnson ZP, Wilson ME, Tung J. Common methods for fecal sample storage in field studies yield consistent signatures of individual identity in microbiome sequencing data. Sci Rep. 2016;6:31519. doi:10.1038/srep31519.

  91.

    Choo JM, Leong LE, Rogers GB. Sample storage conditions significantly influence faecal microbiome profiles. Sci Rep. 2015;5:16350. doi:10.1038/srep16350.

  92.

    Dominianni C, Wu J, Hayes RB, Ahn J. Comparison of methods for fecal microbiome biospecimen collection. BMC Microbiol. 2014;14:103. doi:10.1186/1471-2180-14-103.

  93.

    Hill CJ, Brown JR, Lynch DB, Jeffery IB, Ryan CA, Ross RP, O’Toole PW. Effect of room temperature transport vials on DNA quality and phylogenetic composition of faecal microbiota of elderly adults and infants. Microbiome. 2016;4(1):19. doi:10.1186/s40168-016-0164-3.

  94.

    Kerckhof FM, Courtens EN, Geirnaert A, Hoefman S, Ho A, Vilchez-Vargas R, Boon N. Optimized cryopreservation of mixed microbial communities for conserved functionality and diversity. PLoS One. 2014;9(6):e99517. doi:10.1371/journal.pone.0099517.

  95.

    McKain N, Genc B, Snelling TJ, Wallace RJ. Differential recovery of bacterial and archaeal 16S rRNA genes from ruminal digesta in response to glycerol as cryoprotectant. J Microbiol Methods. 2013;95(3):381–3. doi:10.1016/j.mimet.2013.10.009.

  96.

    Sinha R, Abnet CC, White O, Knight R, Huttenhower C. The microbiome quality control project: baseline study design and future directions. Genome Biol. 2015;16:276. doi:10.1186/s13059-015-0841-8.

  97.

    Vogtmann E, Chen J, Amir A, Shi J, Abnet CC, Nelson H, Sinha R. Comparison of collection methods for fecal samples in microbiome studies. Am J Epidemiol. 2017;185(2):115–23. doi:10.1093/aje/kww177.

  98.

    Luo T, Srinivasan U, Ramadugu K, Shedden KA, Neiswanger K, Trumble E, Foxman B. Effects of specimen collection methodologies and storage conditions on the short-term stability of oral microbiome taxonomy. Appl Environ Microbiol. 2016;82(18):5519–29. doi:10.1128/AEM.01132-16.

  99.

    Lauber CL, Zhou N, Gordon JI, Knight R, Fierer N. Effect of storage conditions on the assessment of bacterial community structure in soil and human-associated samples. FEMS Microbiol Lett. 2010;307(1):80–6. doi:10.1111/j.1574-6968.2010.01965.x.

  100.

    Corless CE, Guiver M, Borrow R, Edwards-Jones V, Kaczmarski EB, Fox AJ. Contamination and sensitivity issues with a real-time universal 16S rRNA PCR. J Clin Microbiol. 2000;38(5):1747–52.

  101.

    Rand KH, Houck H. Taq polymerase contains bacterial DNA of unknown origin. Mol Cell Probes. 1990;4(6):445–50.

  102.

    Tanner MA, Goebel BM, Dojka MA, Pace NR. Specific ribosomal DNA sequences from diverse environmental settings correlate with experimental contaminants. Appl Environ Microbiol. 1998;64(8):3110–3.

  103.

    Shen H, Rogelj S, Kieft TL. Sensitive, real-time PCR detects low-levels of contamination by Legionella pneumophila in commercial reagents. Mol Cell Probes. 2006;20(3-4):147–53. doi:10.1016/j.mcp.2005.09.007.

  104.

    Kennedy K, Hall MW, Lynch MD, Moreno-Hagelsieb G, Neufeld JD. Evaluating bias of Illumina-based bacterial 16S rRNA gene profiles. Appl Environ Microbiol. 2014;80(18):5717–22. doi:10.1128/AEM.01451-14.

  105.

    Wagner Mackenzie B, Waite DW, Taylor MW. Evaluating variation in human gut microbiota profiles due to DNA extraction method and inter-subject differences. Front Microbiol. 2015;6:130. doi:10.3389/fmicb.2015.00130.

  106.

    Lazarevic V, Gaia N, Girard M, Schrenzel J. Decontamination of 16S rRNA gene amplicon sequence datasets based on bacterial load assessment by qPCR. BMC Microbiol. 2016;16:73. doi:10.1186/s12866-016-0689-4.

  107.

    Knights D, Kuczynski J, Charlson ES, Zaneveld J, Mozer MC, Collman RG, Kelley ST. Bayesian community-wide culture-independent microbial source tracking. Nat Methods. 2011;8(9):761–3. doi:10.1038/nmeth.1650.

  108.

    Aagaard K, Ma J, Antony KM, Ganu R, Petrosino J, Versalovic J. The placenta harbors a unique microbiome. Sci Transl Med. 2014;6(237):237ra65. doi:10.1126/scitranslmed.3008599.

  109.

    Antony KM, Ma J, Mitchell KB, Racusin DA, Versalovic J, Aagaard K. The preterm placental microbiome varies in association with excess maternal gestational weight gain. Am J Obstet Gynecol. 2015;212(5):653.e1–16. doi:10.1016/j.ajog.2014.12.041.

  110.

    Zheng J, Xiao X, Zhang Q, Mao L, Yu M, Xu J. The placental microbiome varies in association with low birth weight in full-term neonates. Nutrients. 2015;7(8):6924–37. doi:10.3390/nu7085315.

  111.

    Yuan S, Cohen DB, Ravel J, Abdo Z, Forney LJ. Evaluation of methods for the extraction and purification of DNA from the human microbiome. PLoS One. 2012;7(3):e33865. doi:10.1371/journal.pone.0033865.

  112.

    Haas BJ, Gevers D, Earl AM, Feldgarden M, Ward DV, Giannoukos G, Birren BW. Chimeric 16S rRNA sequence formation and detection in Sanger and 454-pyrosequenced PCR amplicons. Genome Res. 2011;21(3):494–504. doi:10.1101/gr.112730.110.

  113.

    Parada AE, Needham DM, Fuhrman JA. Every base matters: assessing small subunit rRNA primers for marine microbiomes with mock communities, time series and global field samples. Environ Microbiol. 2016;18(5):1403–14. doi:10.1111/1462-2920.13023.

  114.

    Kircher M, Heyn P, Kelso J. Addressing challenges in the production and analysis of Illumina sequencing data. BMC Genomics. 2011;12:382. doi:10.1186/1471-2164-12-382.

  115.

    Brady T, Roth SL, Malani N, Wang GP, Berry CC, Leboulch P, Bushman FD. A method to sequence and quantify DNA integration for monitoring outcome in gene therapy. Nucleic Acids Res. 2011;39(11):e72. doi:10.1093/nar/gkr140.

  116.

    Caporaso JG, Lauber CL, Walters WA, Berg-Lyons D, Lozupone CA, Turnbaugh PJ, Knight R. Global patterns of 16S rRNA diversity at a depth of millions of sequences per sample. Proc Natl Acad Sci U S A. 2011;108 Suppl 1:4516–22. doi:10.1073/pnas.1000080107.

  117.

    Walters W, Hyde ER, Berg-Lyons D, Ackermann G, Humphrey G, Parada A, Knight R. Improved bacterial 16S rRNA gene (V4 and V4-5) and fungal internal transcribed spacer marker gene primers for microbial community surveys. mSystems. 2016;1(1). doi:10.1128/mSystems.00009-15.

  118.

    Laurence M, Hatzis C, Brash DE. Common contaminants in next-generation sequencing that hinder discovery of low-abundance microbes. PLoS One. 2014;9(5):e97876. doi:10.1371/journal.pone.0097876.

  119.

    Bhatt AS, Freeman SS, Herrera AF, Pedamallu CS, Gevers D, Duke F, Meyerson M. Sequence-based discovery of Bradyrhizobium enterica in cord colitis syndrome. N Engl J Med. 2013;369(6):517–28. doi:10.1056/NEJMoa1211115.

  120.

    Dunn O. Multiple comparisons among means. J Am Stat Assoc. 1961;56(293):52–64. doi:10.2307/2282330.

  121.

    Benjamini Y, Hochberg Y. Controlling the false discovery rate: a practical and powerful approach to multiple testing. J R Stat Soc Ser B Methodol. 1995;57(1):289–300. doi:10.2307/2346101.

  122.

    Caporaso JG, Kuczynski J, Stombaugh J, Bittinger K, Bushman FD, Costello EK, Knight R. QIIME allows analysis of high-throughput community sequencing data. Nat Methods. 2010;7(5):335–6. doi:10.1038/nmeth.f.303.

  123.

    Schloss PD, Westcott SL, Ryabin T, Hall JR, Hartmann M, Hollister EB, Weber CF. Introducing mothur: open-source, platform-independent, community-supported software for describing and comparing microbial communities. Appl Environ Microbiol. 2009;75(23):7537–41. doi:10.1128/AEM.01541-09.

  124.

    Wang X, Tucker NR, Rizki G, Mills R, Krijger PH, de Wit E, Boyer LA. Discovery and validation of sub-threshold genome-wide association study loci using epigenomic signatures. eLife. 2016;5. doi:10.7554/eLife.10557.

  125.

    Sabino J, Vieira-Silva S, Machiels K, Joossens M, Falony G, Ballet V, Raes J. Primary sclerosing cholangitis is characterised by intestinal dysbiosis independent from IBD. Gut. 2016. doi:10.1136/gutjnl-2015-311004.

  126.

    Forslund K, Hildebrand F, Nielsen T, Falony G, Le Chatelier E, Sunagawa S, Pedersen O. Disentangling type 2 diabetes and metformin treatment signatures in the human gut microbiota. Nature. 2015;528(7581):262–6. doi:10.1038/nature15766.

  127.

    Shaw KA, Bertha M, Hofmekler T, Chopra P, Vatanen T, Srivatsa A, Kugathasan S. Dysbiosis, inflammation, and response to treatment: a longitudinal study of pediatric subjects with newly diagnosed inflammatory bowel disease. Genome Med. 2016;8(1):75. doi:10.1186/s13073-016-0331-y.

  128.

    Kelsen J, Bittinger K, Pauly-Hubbard H, Posivak L, Grunberg S, Baldassano R, Bushman FD. Alterations of the subgingival microbiota in pediatric Crohn’s disease studied longitudinally in discovery and validation cohorts. Inflamm Bowel Dis. 2015;21(12):2797–805. doi:10.1097/MIB.0000000000000557.

  129.

    Lozupone C, Knight R. UniFrac: a new phylogenetic method for comparing microbial communities. Appl Environ Microbiol. 2005;71(12):8228–35. doi:10.1128/AEM.71.12.8228-8235.2005.


Acknowledgements

We are grateful to Laurie Zimmerman and members of the Bushman laboratory for help and suggestions.

Funding

This work was supported by the National Institute of Allergy and Infectious Diseases P30 AI 045008 (EC, AL, and FDB); the National Heart, Lung, and Blood Institute R01 HL113252 (RGC); the National Institute of Allergy and Infectious Diseases T32 AI007632 (SSM and CC); the Pennsylvania Department of Health SAP 4100068710 (DK, CEH, CZ, LM, CT, RB, and KB); the Crohn’s and Colitis Foundation of America Career Development Award 3276 (JK); and the National Institutes of Health 1T32DK101371-01 (MC).

Availability of data and materials

The raw sequence files generated for comparisons of swab storage methods, positive gene block controls, and negative control samples are available from the NCBI Sequence Read Archive (BioProject accessions PRJNA356343, PRJNA356422, PRJNA356404, and PRJNA380255, respectively).

Authors’ contributions

DK, CEH, LM, AL, EC, SSM, RGC, RB, FDB, and KB wrote the manuscript. DK, CEH, LM, AL, JK, and MC carried out experiments for the comparison of storage methods and positive/negative control samples. JK, MC, FDB, and KB designed the comparison of storage methods. DK, CEH, LM, FDB, and KB designed the comparison of positive/negative controls. CZ, CT, SSM, CC, and KB performed the data analysis. All authors read and approved the final manuscript.

Competing interests

The authors declare that they have no competing interests.

Consent for publication

Not applicable.

Ethics approval and consent to participate

Not applicable.

Publisher’s Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Author information



Corresponding author

Correspondence to Kyle Bittinger.

Additional files

Additional file 1:

Supplementary methods. (PDF 1926 kb)

Additional file 2:

DNA sequences for gene block control samples. (XLSX 11 kb)

Rights and permissions

Open Access This article is distributed under the terms of the Creative Commons Attribution 4.0 International License, which permits unrestricted use, distribution, and reproduction in any medium, provided you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license, and indicate if changes were made. The Creative Commons Public Domain Dedication waiver applies to the data made available in this article, unless otherwise stated.


About this article


Cite this article

Kim, D., Hofstaedter, C.E., Zhao, C. et al. Optimizing methods and dodging pitfalls in microbiome research. Microbiome 5, 52 (2017).



  • Metagenomics
  • 16S rRNA gene
  • Shotgun metagenomics
  • Environmental contamination
  • Methods
  • Study design
  • Best practices