Optimizing methods and dodging pitfalls in microbiome research
Microbiome volume 5, Article number: 52 (2017)
Research on the human microbiome has yielded numerous insights into health and disease, but also has resulted in a wealth of experimental artifacts. Here, we present suggestions for optimizing experimental design and avoiding known pitfalls, organized in the typical order in which studies are carried out. We first review best practices in experimental design and introduce common confounders such as age, diet, antibiotic use, pet ownership, longitudinal instability, and microbial sharing during cohousing in animal studies. Typically, samples will need to be stored, so we provide data on best practices for several sample types. We then discuss design and analysis of positive and negative controls, which should always be run with experimental samples. We introduce a convenient set of non-biological DNA sequences that can be useful as positive controls for high-volume analysis. Careful analysis of negative and positive controls is particularly important in studies of samples with low microbial biomass, where contamination can comprise most or all of a sample. Lastly, we summarize approaches to enhancing experimental robustness by careful control of multiple comparisons and to comparing discovery and validation cohorts. We hope the experimental tactics summarized here will help researchers in this exciting field advance their studies efficiently while avoiding errors.
Studies of microbial communities—the microbiome—have become quite popular in recent years. These studies are powered by the new DNA sequencing technologies which allow acquisition of over one trillion bases of sequence information in a single instrument run. Using these methods, sequence profiles of microbial communities from different sources can be obtained and compared to elucidate the associated patterns in the microbiota. For example, human samples from a disease state can be compared to samples from healthy controls, allowing for quantification of differences [1,2,3,4,5,6,7,8]. In these studies, DNA is first purified from the samples. DNA sequencing is then used to characterize the associated taxa, querying either a marker gene (16S for bacteria, 18S for eukaryotes, and ITS for fungi) or all DNAs in a mixture (shotgun metagenomics sequencing). In at least some situations, the nature of these microbial communities matters a lot—fecal microbial transplantation radically resets gut community structure and cures relapsing Clostridium difficile infection in up to 90% of cases [9, 10].
Carrying out definitive experiments on the microbiota requires great care, as in any field of research. All analytical methods have biases that must be taken into account in experimental execution and interpretation. For example, for analysis of 16S rRNA gene segments, the choice of gene region studied influences the types of bacteria queried [11,12,13,14,15,16]. Another example, emphasized here, involves low microbial biomass samples. If there is very little microbial DNA in a specimen, the library preparation and sequencing methods will often return sequences that are derived primarily from contamination [17,18,19,20,21,22,23,24]. Contaminating sequences can originate in reagents, dust, crossover between samples, or other sources. Without appropriate precautions and controls, these false calls can be difficult to distinguish from authentic microbiota. Other challenges mentioned below include changes associated with sample storage, microbial sharing among animals during cohousing, and authentic longitudinal microbial instability in the body site of a host animal.
The goal of this article is to catalog major challenges in microbiome research and to outline approaches to address them. Many of these points have come up in the projects of the PennCHOP Microbiome Program, with which the authors of this article are associated. This review is intended to help our collaborators and other microbiome researchers wrestling with these issues. We will focus primarily on laboratory work important for microbiome analysis and touch on computational and statistical methods only briefly. Most examples will be from 16S rRNA marker gene sequencing, but examples from ITS marker gene sequencing for fungi and shotgun metagenomics are also discussed. Several good articles have also addressed these issues and are recommended as additional reading [25,26,27,28,29]. Reviews of methods for bioinformatics analysis of microbiome specimens include [28, 30,31,32,33]. We focus here on studies of the vertebrate microbiome and break out points that are specific to studies of humans and model organisms. We present sections in an order that matches the progression of performing an experiment—the paper begins with study design, continues with sample collection and processing, and concludes with analysis.
Planning a microbiome experiment
It is essential to plan carefully to ensure that the experiment carried out will answer the question posed. Plan the statistical analysis for your study at the start. If possible, carry out a power analysis. Several approaches tailored to microbiome research have been reported [34, 35].
Consider the influence of factors such as antibiotic use, age, sex, diet, geography, and pet ownership
The human microbiome is sensitive to its environment, which can considerably confound attempts to associate any particular condition or intervention with a change in microbiota composition. Drug use, diet, age, geography, pet ownership, and sex have all been reported to influence function and composition [36,37,38,39]. In 2008, Relman and colleagues documented effects of antibiotic treatment on the gut microbiome, and many subsequent studies have also reported effects [5, 40,41,42]. It has further been suggested that additional prescription drugs can affect microbiome analyses [43, 44]. For example, Imhann et al. have suggested that decreasing the acidity of the stomach with proton pump inhibitors allows upper gastrointestinal microbes to move down into the gut more readily, altering the composition of the lower gastrointestinal microbiota and increasing the risk of C. difficile infections.
Diet also influences the microbiota [5, 46,47,48,49,50,51,52,53,54,55,56]. Microbial community structure and gene expression are reported to change on short time scales in response to extreme short-term alterations in diet. Long-term dietary patterns have been linked to gut microbiomes dominated by certain genera: diets high in protein and animal fat are associated with high Bacteroides, whereas diets high in carbohydrates are associated with high Prevotella.
The human microbiome evolves from birth until death. Typically, the gut microbiota adopts a stable anaerobic pattern around age 3 years but varies in early life [58,59,60]. The microbiome also changes in old age, with institutionalized elderly commonly developing high levels of Proteobacteria. Thus, it is critical to use age-matched controls for microbiota comparisons.
Sex can also affect microbiome studies. The gut microbiome serves as a virtual endocrine organ due to the metabolites and neurotransmitters it produces. For example, early microbial exposure was reported to increase testosterone levels in male mice, leading to a protective effect against type 1 diabetes. When the microbiota from these protected male mice was transplanted into younger female mice, the same protection against type 1 diabetes was seen. A study of the effects of an anti-psychotic drug on weight and gut microbiota in male and female rats reported that drug treatment induced significant weight gain in female rats only. Microbial circadian rhythms in mice were reported to differ between sexes. Sex differences in microbiota have also been reported in macaques [39, 66].
Remarkably, even sexual preference among men has been linked to gut microbiome differences, which may be a confounding factor in studies of gut microbiome and HIV infection where controls were not matched by sexual preference.
How each of these factors will influence any given microbiome study is dependent on the question asked and the strengths of differences between study groups. In general, it is important to enumerate possible confounders during experimental design, quantify each, and then treat them each as independent variables in downstream statistical analyses.
During experimental design, it is important to consider the longitudinal stability of the microbiota to be studied. The healthy human adult gut is known to be largely stable in microbial composition over time [70,71,72], and a perturbation in such stability (dysbiosis) has been associated with diseases such as inflammatory bowel disease [1, 5, 73]. However, the microbiome of other sites, like the human vagina, can vary on short time scales without necessarily indicating dysbiosis [74,75,76,77,78]. Even the gut microbiome has been reported to display circadian behavior on a 24-h cycle [65, 79, 80]. Thus, for studies of a new sample type, it is essential to understand longitudinal variation in order to acquire samples that address the question posed.
Different batches of DNA extraction kit reagents can be a significant source of variation for longitudinal studies [23, 81]. It is wise to purchase all the extraction kits needed at the start of the study, or store samples and extract all at the same time, to minimize the effects of this variable.
Cage effects in animal experiments
Cage effects can derail microbiome studies in mice and may be important for other laboratory animals as well. Mice housed in the same cage come to share similar gut microbiota due to mixing by coprophagia. For perspective, in a recent study, mouse strain was found to account for 19% of the variation in gut microbiota, whereas cage effects accounted for 31%.
To account for cage effects, an investigator must set up multiple cages for each study group and treat the cage as a variable in the final statistical analyses. One can then determine whether microbial communities differ between groups given the measured effect of the cage variable. To keep costs down, it is fine to house two to three mice per cage [84,85,86].
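One simple way to respect the cage structure, sketched below, is to treat the cage rather than the individual mouse as the unit of analysis: per-mouse measurements are collapsed to one mean per cage, and group labels are then permuted at the cage level. This is only an illustrative stand-in for a full model with cage as a factor; the function names and the per-mouse values (for example, the relative abundance of one taxon) are hypothetical.

```python
from itertools import combinations
from statistics import mean


def cage_means(values_by_cage):
    """Collapse per-mouse measurements to one mean per cage."""
    return {cage: mean(vals) for cage, vals in values_by_cage.items()}


def cage_permutation_test(group_a, group_b):
    """Exact two-sided permutation test on cage means.

    group_a, group_b: dict mapping cage ID -> list of per-mouse values.
    Returns (observed absolute difference in group means, p-value).
    """
    a = list(cage_means(group_a).values())
    b = list(cage_means(group_b).values())
    pooled = a + b
    observed = abs(mean(a) - mean(b))
    hits = total = 0
    # Enumerate every way of assigning len(a) cages to the first group.
    for idx in combinations(range(len(pooled)), len(a)):
        chosen = set(idx)
        perm_a = [pooled[i] for i in chosen]
        perm_b = [pooled[i] for i in range(len(pooled)) if i not in chosen]
        if abs(mean(perm_a) - mean(perm_b)) >= observed - 1e-12:
            hits += 1
        total += 1
    return observed, hits / total


# Hypothetical data: four cages per group, two to three mice per cage.
treated = {"T1": [0.62, 0.58, 0.60], "T2": [0.55, 0.57],
           "T3": [0.66, 0.64, 0.65], "T4": [0.59, 0.61]}
control = {"C1": [0.30, 0.32], "C2": [0.35, 0.33, 0.34],
           "C3": [0.28, 0.31], "C4": [0.36, 0.38]}
observed_diff, p_value = cage_permutation_test(treated, control)
```

With only four cages per group there are 70 possible label assignments, so the smallest attainable two-sided p-value is 2/70. This illustrates why the number of cages, not the number of mice, limits the power of the comparison.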
As an example, consider the longitudinal study of fungal populations during an antibiotic intervention in mice in Dollive et al. In this work, antibiotic treatment was associated with increased fungal colonization in the treated groups (Fig. 1). The fungi detected were mostly consistent within each cage, but varied from cage to cage within each treatment group and also in the untreated controls. The types of fungi detected changed longitudinally, but nevertheless were consistent within cages. This highlights how potent cage effects can be, and emphasizes the importance of analyzing multiple cages per study group.
Considerations during sample collection and processing
Sample storage conditions
The most important considerations for storing microbiome samples are to reduce changes in the original microbiota from sample collection to processing and to keep storage conditions consistent for all samples in a study. Sample storage conditions are not always consistent between labs due to downstream applications and resource limitations. In 2010, Wu et al. compared human fecal samples that were immediately frozen at −80 °C, stored on ice for 24 h, or stored on ice for 48 h before DNA extraction and analysis. Differences due to storage method were not significant compared to differences between human individuals.
Due to an increased number of studies collecting samples from remote locations, several groups have assessed the efficacy of preservation methods that may be used when laboratory freezers are not readily available. In 2016, Song et al. tested the effects of different preservatives and temperature fluctuations on feces to mimic microbiome sampling in the field. If fecal samples cannot be frozen, store them in 95% ethanol or on FTA cards, or use the OMNIgene Gut kit. These conditions are optimized for sample collection in the field; however, they may not be applicable to all studies depending on study goals and available resources. Other groups have also published similar sample storage studies [90,91,92,93,94,95,96,97].
We recently performed a study on the storage of oral swab samples and found conditions to be relatively forgiving (Fig. 2). In this study, we collected cheek swab samples from three healthy subjects and stored them in a variety of conditions (frozen at −20 °C, refrigerated at 4 °C, or stored at an ambient temperature of 20 °C for 0, 24, 48, 72, or 96 h) before freezing at −80 °C (details and additional analysis are presented in Additional file 1). Figure 2 shows a principal coordinates analysis of unweighted UniFrac distance between the samples. The subject identifier (Fig. 2a) accounted for almost half the total variation in UniFrac distances (R² = 0.47, P < 0.001). The storage conditions did not represent a significant effect (Fig. 2b); we estimated the relative effect size at less than half the effect of inter-subject variability (R² = 0.17, P = 0.2). The UniFrac results were recapitulated in our analysis of taxon abundances, where the effect of subject far exceeded any potential storage effects. This analysis provided evidence that over a period of 4 days (96 h), storage conditions of cheek swabs did not substantially influence the measured oral microbiome composition for these subjects. Another group recently investigated the effect of collection method, storage condition, and storage medium on taxonomic relative abundance in saliva and dental plaque, and found saliva samples stored in OMNIgene medium to be relatively consistent after a week at room temperature.
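To illustrate how an R² and P value of this kind can be obtained from a distance matrix, the sketch below partitions the sum of squared distances in the style of a one-way PERMANOVA and estimates a p-value by permuting labels. We assume here, for illustration only, that a PERMANOVA-style test is appropriate; the function names and the toy distance matrix are hypothetical and are not the data behind Fig. 2.

```python
import random


def permanova_r2(dist, labels):
    """R^2 = SS_between / SS_total for a square distance matrix."""
    n = len(labels)
    ss_total = sum(dist[i][j] ** 2
                   for i in range(n) for j in range(i + 1, n)) / n
    ss_within = 0.0
    for group in set(labels):
        members = [i for i in range(n) if labels[i] == group]
        ss_within += sum(dist[i][j] ** 2
                         for k, i in enumerate(members)
                         for j in members[k + 1:]) / len(members)
    return (ss_total - ss_within) / ss_total


def permanova_p(dist, labels, n_perm=999, seed=0):
    """Permutation p-value for the observed R^2 (labels shuffled)."""
    rng = random.Random(seed)
    observed = permanova_r2(dist, labels)
    shuffled = list(labels)
    hits = 0
    for _ in range(n_perm):
        rng.shuffle(shuffled)
        if permanova_r2(dist, shuffled) >= observed - 1e-12:
            hits += 1
    return observed, (hits + 1) / (n_perm + 1)


# Toy example: three subjects, two samples each; samples from the same
# subject are close (0.2) and samples from different subjects are far (0.8).
subjects = ["S1", "S1", "S2", "S2", "S3", "S3"]
dist = [[0.0 if i == j else (0.2 if subjects[i] == subjects[j] else 0.8)
         for j in range(6)] for i in range(6)]
r2, p = permanova_p(dist, subjects)
```

In a one-way design with fixed group sizes, R² is a monotone function of the pseudo-F statistic, so permuting on R² gives the same p-value as permuting on pseudo-F. For real analyses, an established implementation is preferable to this sketch.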
Optimal storage conditions have also been investigated for other sample types. Lauber et al. tested the effect of both temperature and length of storage on relative taxon abundance of bacterial communities in soil, human skin, and human fecal samples. The overall composition of bacterial communities and the relative abundance of most major bacterial taxa did not change with the different storage conditions studied (P > 0.1 for all sample types). Replicate samples for both skin and feces clustered by host rather than by temperature or length of storage. However, Lauber et al. mentioned that one fecal sample replicate kept at room temperature was excluded from analysis due to visible fungal growth before DNA was extracted. Though convenience can be prioritized when handling samples over a short period of time (e.g., shipping samples on cold-packs for a 48-h period before putting them in the freezer), we do recommend freezing samples promptly after collection or using alternative preservative methods if freezers are unavailable.
Low microbial biomass samples—managing environmental contamination
Handling and analyzing samples with low microbial biomass can be challenging. Reagent and laboratory contamination comprise a larger proportion of the total microbial load in these samples compared to samples with rich microbial communities (e.g., healthy human feces). The low absolute amount of starting material can be overpowered by trace amounts of DNA from reagents or laboratory instruments used for sample processing, so that some or all of the microbial reads can be derived from environmental sources. Accounting for potential contaminants is especially important when studying the microbiome of body sites with low levels of bacteria, such as the human lung and skin, or sites that may not normally harbor any microbes at all, such as various healthy tissues [17, 19, 22].
Problems with contamination were well recognized even before the era of deep sequencing [100,101,102]. More recently, several groups have reported on the presence of bacterial DNA in DNA extraction kits (the “kitome”) as well as in other reagents used during sample processing [20, 23, 24, 103]. Salter et al. demonstrated that serial dilutions of a bacterial culture produced more contaminating 16S sequence reads and fewer “real” reads with each subsequent dilution, until contamination accounted for the majority of total sequences. This pattern occurred at three different institutes that participated in this study, indicating a widespread issue. Salter and colleagues also investigated effects of the number of PCR cycles for amplification. For low biomass samples, 20 cycles were too few, but 40 cycles recovered both contaminating and authentic low level sequences. Later, Kennedy and colleagues reported that starting template concentration was the major factor behind variability in sequencing results. Even in metagenomic samples prepared without a targeted PCR amplification step, similar contamination patterns were observed for samples containing low amounts of microbial DNA.
The kitome varies between kits, and can even vary between different lots of the same kit [20, 23]. Thus, it is best to process all samples in a project side by side using the same batches of reagents. It is crucial to record the kit used to process each sample, and which batch of each kit was used. If multiple kits were used, treat kit batch as a factor in the statistical analysis.
In our lab, we have investigated different DNA extraction methods in order to minimize the presence of the kitome. While the MO BIO PowerSoil DNA Isolation Kit (MO BIO Laboratories, Carlsbad, CA, USA) provides high yields and has been used widely in microbiome work, including the Human Microbiome Project, the kit was designed to isolate DNA from soil, stool, and environmental samples, which are high in microbial DNA. The MO BIO kit was not manufactured with the intention of minimizing background contamination. C. difficile and Streptophyta, for example, have both been identified as possible reagent contaminants in this kit. For low microbial biomass samples, we instead recommend using DNA isolation kits designed to minimize kit contamination (e.g., the QIAamp UCP (UltraClean production) Pathogen Mini Kit (QIAGEN)). Remember: it is important to choose one kit type for all of the samples in a microbiome study. Thus, if a project contains both low and high microbial biomass samples, commit to one kit type for all samples in order to avoid kitome variation.
On the analytical side, several methods have been developed for filtering suspected contaminating taxa. In a study of the human oral and lung microbiome, Bittinger et al. introduced a method to determine the probability that fungal taxa arose from contamination sources, making use of the total fungal DNA concentration, as approximated by post-PCR assays of DNA concentration using PicoGreen. The PicoGreen assay is usually included in the sequencing protocol as a standard step, so the data are available with no extra effort. Similarly, Lazarevic et al. presented a method that incorporates measurements of total DNA concentration by qPCR, a more accurate but more resource-intensive approach. Jervis-Bardy and colleagues showed that contaminating taxa tend to show a strong decrease in relative abundance as total DNA concentration increases and used this as the basis of another method to remove contaminant taxa. Individual contamination sources can be modeled using SourceTracker, which employs a Bayesian approach to estimate the relative fraction of sequence reads arising from each source.
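The idea that contaminant taxa decline in relative abundance as total DNA concentration rises can be sketched with a rank correlation. The code below is a simplified illustration of that principle, not a reimplementation of any of the published methods above. The function names, the threshold of -0.7, and all values are hypothetical choices for this example.

```python
def ranks(values):
    """1-based ranks, with tied values sharing their mean rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    out = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        for k in range(i, j + 1):
            out[order[k]] = (i + j) / 2 + 1
        i = j + 1
    return out


def spearman(x, y):
    """Spearman rank correlation; returns 0.0 if either input is constant."""
    rx, ry = ranks(x), ranks(y)
    mx, my = sum(rx) / len(rx), sum(ry) / len(ry)
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    vx = sum((a - mx) ** 2 for a in rx)
    vy = sum((b - my) ** 2 for b in ry)
    if vx == 0 or vy == 0:
        return 0.0
    return cov / (vx * vy) ** 0.5


def flag_candidate_contaminants(rel_abund, dna_conc, threshold=-0.7):
    """Taxa whose relative abundance falls as DNA concentration rises."""
    return sorted(taxon for taxon, abund in rel_abund.items()
                  if spearman(abund, dna_conc) <= threshold)


# Hypothetical data: six samples ordered by increasing DNA concentration.
dna_conc = [0.1, 0.5, 1.0, 5.0, 10.0, 50.0]
rel_abund = {
    "Ralstonia": [0.60, 0.41, 0.30, 0.08, 0.03, 0.01],
    "Streptococcus": [0.10, 0.22, 0.18, 0.35, 0.30, 0.40],
}
flagged = flag_candidate_contaminants(rel_abund, dna_conc)
```

Taxa flagged this way are candidates only. As noted elsewhere in this article, environmental lineages can be authentically present in samples, so flagged taxa should be examined against negative controls rather than removed automatically.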
Studies investigating a potential placenta microbiome provide a case study of the difficulties of working with low biomass samples (Fig. 3). Several groups have reported that there may be a unique, low-abundance microbiome in healthy human placenta [46, 108,109,110], but reporting of negative controls in these studies has been incomplete.
However, a series of independent control studies showed no significant difference in taxonomic abundance between placenta samples and contamination controls. Lauder and colleagues extracted DNA from placenta from six human subjects and worked them up alongside several types of blank swabs and empty extraction wells containing reagents only. DNA was extracted from samples using two different purification kits in order to characterize the contribution of the kitome. Real-time qPCR was performed to quantify total 16S rRNA gene copies in placental samples, controls, and saliva samples (from the same subjects), which were also purified using both DNA extraction kits. Placental samples showed copy numbers that were low and indistinguishable from those of negative controls regardless of the kit used, whereas oral samples showed high signals several logs above background. Characterization of bacterial lineages by 16S rRNA gene sequencing showed that oral samples harbored distinct 16S profiles characteristic of the well-studied oral microbiota, and results were consistent between kits. In contrast, placental and control samples looked similar to each other, and the pattern seen tracked with the DNA extraction kit used rather than with the sample type (Fig. 3). Several of the shared lineages found in placental and control samples were known contaminants of DNA extraction kits. The inference was that the kitome provided the predominant microbial signature in placental samples. It remains to be seen whether future studies can show a clear distinction between placental samples and negative controls.
Negative control samples
It is essential to collect negative control samples to allow empirical assessment of the contamination background. We commonly include three types of negative control samples on each 16S rRNA marker gene sequencing run (Fig. 4). In “blank swab” samples, a sterile swab was opened from its package in the sequencing lab, and the full sequencing protocol was applied to the swab. In “blank extraction” samples, DNA extraction and all subsequent steps were carried out with no additional input material. In “blank library” samples, the extraction protocol was not applied; DNA-free water (UltraClean PCR Water, MO BIO Laboratories, Carlsbad, CA, USA) was used as input to the post-extraction steps of the protocol, starting with library generation, to characterize contamination in downstream steps.
If microbial biomass is low, additional negative control samples can be included to measure contaminating DNA introduced during sample collection. As an example, in studies of the lung microbiome using bronchoalveolar lavage, an excellent negative control can be generated by washing the bronchoscope with a sample of the lavage saline prior to carrying out the bronchoscopy.
In our recent work, the average number of DNA sequence reads for negative control samples was typically five logs lower than the average for experimental samples derived from high biomass sites such as feces (Fig. 4a). The bacterial taxa appearing in negative control samples were among those previously reported as contamination in the literature, including Comamonadaceae, Ralstonia, and Propionibacterium (Fig. 4b).
Positive control samples
Side by side sequencing of new samples with well-vetted positive controls is strongly recommended. Positive control samples allow verification that sample preparation and sequencing procedures are running smoothly. When samples are purified on multi-well plates, the consistent placement of samples in defined locations on plates allows any sample tracking mix-ups to be detected in the sequence output. Positive and negative controls will ideally be positioned asymmetrically on extraction plates, uniquely defining the plate orientation.
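Whether a proposed control layout uniquely defines plate orientation can be checked before any samples are loaded. The sketch below assumes a standard 96-well plate (rows A to H, columns 1 to 12) and tests whether the layout changes when the plate is rotated by 180 degrees; the function names and example layouts are hypothetical.

```python
ROWS = "ABCDEFGH"
N_COLS = 12


def rotate_180(well):
    """Map a well ID such as 'A1' to its position after a 180-degree turn."""
    row, col = ROWS.index(well[0]), int(well[1:])
    return f"{ROWS[len(ROWS) - 1 - row]}{N_COLS + 1 - col}"


def defines_orientation(controls):
    """True if the control layout changes when the plate is rotated.

    controls: dict mapping well ID -> control type, e.g. {'A1': 'positive'}.
    """
    rotated = {rotate_180(well): kind for well, kind in controls.items()}
    return rotated != controls


# Two negative controls in opposite corners look identical after rotation.
symmetric_layout = {"A1": "negative", "H12": "negative"}
# Mixing control types and adding an off-center well breaks the symmetry.
asymmetric_layout = {"A1": "positive", "H12": "negative", "C5": "negative"}

symmetric_ok = defines_orientation(symmetric_layout)
asymmetric_ok = defines_orientation(asymmetric_layout)
```

A layout that fails this check cannot reveal a plate that was processed upside down, because the controls would appear in the expected wells either way.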
Many studies have used positive controls composed of mixtures of cultured organisms (“mock communities”) [23, 96, 111] or known mixtures of free DNA (“mock DNA” samples) [88, 112, 113], both of which make useful controls. Analysis usually shows that sequencing results are reproducible within a method and lab environment, but biases can differ between methods and labs.
For a simple positive control, we designed and synthesized mock DNA samples as gene blocks (Fig. 5a, see Additional file 2 for DNA sequences). We selected DNA to synthesize using regions of the 16S rRNA gene in eight archaeal species which would not normally be detected in experimental data because the sequences at the amplification primer binding sites in the archaeal V1-V2 region do not match the bacterial V1-V2 primers used. In the engineered sequences, bacterial 16S V1-V2 primer binding sites were added synthetically to archaeal controls, allowing amplification. This has the advantage that the control sequences can be easily distinguished from experimental samples while still being processed through the same pipeline. A disadvantage of this strategy is that such controls are specific to a particular primer set and must be remade for each amplicon used. However, given the low cost of synthetic DNA, cost for a set of positive controls is modest (about $450). After sequencing archaeal gene block samples in 11 separate sequencing runs, we found that the relative abundances of the sequences were relatively consistent (Fig. 5b).
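The logic of the gene block design can be sketched as string assembly: the top strand carries the forward primer site, then the archaeal insert, then the reverse complement of the reverse primer. The sequences below are toy placeholders, not real primers or archaeal 16S fragments (the actual sequences are given in Additional file 2), and the exact-match amplification check is a deliberate simplification of primer binding.

```python
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq):
    """Reverse complement of an uppercase DNA string."""
    return seq.translate(COMPLEMENT)[::-1]


def build_gene_block(forward_primer, insert, reverse_primer):
    """Top strand: forward primer site + insert + reverse primer site."""
    return forward_primer + insert + reverse_complement(reverse_primer)


def amplifies(template, forward_primer, reverse_primer):
    """Crude model: both primer sites present, in the right order."""
    start = template.find(forward_primer)
    if start < 0:
        return False
    site = reverse_complement(reverse_primer)
    return template.find(site, start + len(forward_primer)) >= 0


# Toy placeholder sequences (hypothetical, for illustration only).
FWD = "ACGTACGTAC"
REV = "TTGGCCAATG"
INSERT = "GATTACAGATTACA"

block = build_gene_block(FWD, INSERT, REV)
bare_insert_amplifies = amplifies(INSERT, FWD, REV)
block_amplifies = amplifies(block, FWD, REV)
```

In this model the bare insert is not amplified by the primer pair, whereas the engineered block is, mirroring the rationale for adding bacterial primer binding sites to archaeal sequences.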
The gene block design provided an opportunity to test the level of cross-contamination between experimental samples during wet-lab library preparation (in 96-well plates) and sequence acquisition. Figure 5c shows representative results from one sequencing run. The abundance of control archaeal taxa did not increase with proximity to positive control samples on the 96-well plates (P = 0.6, linear regression analysis), suggesting that spill-over during preparation was not a prominent source of admixture between samples. However, low levels of these sequences could be detected in multiple dispersed samples (Fig. 5c, blue squares), potentially due to misreading of bar codes or hybridization of DNA molecules in adjacent clusters during Illumina sequencing. A possible means of suppressing this would be to use bar codes on both ends of the amplicons and to require precise matches to both in the quality filtering.
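A proximity analysis of this kind can be sketched by computing each well's distance to the nearest positive control and regressing control-sequence read counts on that distance. The code below is a minimal illustration under assumed conventions (Euclidean distance in well units on a 96-well plate, ordinary least squares); the function names and read counts are hypothetical, and a p-value for the slope would in practice come from a statistics package.

```python
ROWS = "ABCDEFGH"


def well_coords(well):
    """Convert a well ID such as 'B7' to zero-based (row, column)."""
    return ROWS.index(well[0]), int(well[1:]) - 1


def distance_to_nearest(well, control_wells):
    """Euclidean distance, in well units, to the closest control well."""
    r, c = well_coords(well)
    return min(((r - cr) ** 2 + (c - cc) ** 2) ** 0.5
               for cr, cc in map(well_coords, control_wells))


def ols_slope(x, y):
    """Slope and intercept of the least-squares line y = slope * x + b."""
    mx, my = sum(x) / len(x), sum(y) / len(y)
    sxx = sum((a - mx) ** 2 for a in x)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    slope = sxy / sxx
    return slope, my - slope * mx


# Hypothetical counts of control archaeal reads in experimental wells.
positive_controls = ["A1", "H12"]
archaeal_reads = {"A2": 3, "B7": 0, "D6": 2, "G11": 1, "E9": 0}

distances = [distance_to_nearest(w, positive_controls) for w in archaeal_reads]
slope, intercept = ols_slope(distances, list(archaeal_reads.values()))
```

A clearly negative slope (more control reads in wells nearer the controls) would point toward physical spill-over during plate handling, whereas a slope near zero is more consistent with position-independent sources such as bar code misassignment.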
The gene block scheme is a simple method for ensuring proper amplification of experimental samples, tracking sample mix-ups, and measuring sample cross-contamination during library preparation and sequencing. However, synthetic positive controls are not useful for benchmarking analytical and statistical methods. Analysis methods developed on mock communities often do not perform as well on real communities, because real samples contain naturally occurring sequence variation and low abundance taxa.
Many investigators use primers that simultaneously target the 16S region of both bacteria and archaea, for example, the 515fB/806rB primer set used by the Earth Microbiome Project [116, 117]. Here, there is no advantage to using archaeal sequences in the gene blocks because archaea might be observed in experimental samples. Nonetheless, investigators can build gene blocks using artificially altered DNA sequences that are different enough to be reliably distinguished from genomic sequence but similar enough to be compatible with the analysis pipeline. In Additional file 2, we present example gene block sets for the 515fB/806rB primer pair.
When artificial positive control samples are not suitable or cost effective, many of the benefits may be achieved by sequencing a small number of positive control samples collected from the field. We have used samples of pond water and saliva as indicators of consistency in sample preparation and sequencing, though ultimately found the mock DNA samples to be more convenient.
Contamination in shotgun metagenomic data
Microbial DNA introduced by reagents can also be detected in shotgun metagenomic sequencing. As for amplicon sequencing, contamination is particularly apparent in samples with low microbial biomass. This is seen both for samples with generally low biomass (e.g., skin swab) and for samples dominated by non-microbial DNA (e.g., tissue biopsy).
For example, in our work to characterize the microbiota in sarcoidosis, we performed shotgun metagenomic sequencing on tissue DNA extracted using both standard (DNeasy PowerSoil, Qiagen, Valencia, CA, USA) and low-contaminant (QIAamp UCP Pathogen, Qiagen, Valencia, CA, USA) kits (unpublished data). When sequencing negative control samples, we observed that the kit background differed between the two (Fig. 6a). Lineages found in both kits were also present in our low biomass tissue samples, likely derived from reagents. Lineages found in both samples and controls included Propionibacterium spp. and Corynebacterium spp., commonly associated with human skin, and Bradyrhizobium, a common soil bacterium also identified as a contaminant by other groups [23, 118]. Of concern, this lineage has been proposed to be responsible for a colitis syndrome in patients undergoing umbilical-cord hematopoietic stem-cell transplantation [118, 119]; it will be key to strengthen the link to colitis with additional forms of data to rule out contamination as an explanation.
This indicates that while some reagent contamination is unavoidable, usage of low-contaminant kits reduces the total sequencing effort spent on contaminants. Furthermore, it highlights the importance of sequencing and analyzing extraction controls, because without them it is impossible to distinguish reagent contamination from true microbial signals.
An extreme example of contamination detection comes from virome analysis, where multiple displacement amplification is used to amplify specimens. The multiple displacement amplification method uses the phage phi29 DNA polymerase, a highly processive phage polymerase, to copy target DNA prior to library preparation. Shotgun metagenomic sequencing of a blank virome prep sample (unpublished data) returned hits on phage phi29, but upon inspection, these turned out to align exclusively to the polymerase gene (Fig. 6b). Evidently the amplification method was so sensitive that we recovered the gene used to produce a protein that we had purchased from a commercial supplier and used in our library preparation procedure.
Considerations during analysis
This article is mostly concerned with optimal procedures for laboratory methods, but we do want to comment on three issues in analyzing and interpreting microbiome data.
Handling of negative controls
It is essential to report compositions of negative control samples as for all other samples. Work up negative control samples through the full pipeline. Sequence negative control samples even if library yield is low or undetectable. Show the lineages present in stacked bar graphs or heat maps. Check negative control data into sequence archives when experimental samples are deposited. Do not just subtract lineages in negative controls and consider the problem solved. There is no reason to think that contaminating lineages are fully sampled without specific evidence, and there can be cases where environmental lineages are authentically present in samples and functionally important.
Controlling multiple comparisons
High-throughput sequencing experiments commonly generate sequence reads attributed to hundreds of taxa. Researchers wishing to know which taxa are potentially associated with a difference in phenotype must make many comparisons, each time testing a null hypothesis of no difference in taxon abundance. In addition, studies will often involve multiple types of clinical data, allowing myriad comparisons over the microbiome data set. If the acceptable false positive rate for the test is set at a certain level (e.g., 5%), these repeated comparisons will raise the chances of getting a false positive higher than that level. To re-adjust the false positive rate back to the desired level, a multiple testing correction must be used.
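The inflation of the false positive rate can be made concrete with a short calculation. Assuming the tests are independent and every null hypothesis is true (simplifying assumptions that rarely hold exactly for taxon abundances), the probability of at least one false positive among m tests at level alpha is 1 - (1 - alpha)^m. The function name below is our own.

```python
def familywise_error_rate(alpha, n_tests):
    """P(at least one false positive) across independent true-null tests."""
    return 1 - (1 - alpha) ** n_tests


# Error rate at a per-test level of 5% for 1, 10, and 100 taxa tested.
rates = {m: familywise_error_rate(0.05, m) for m in (1, 10, 100)}
```

Under these assumptions the rate rises from 5% for a single test to about 40% for 10 tests and above 99% for 100 tests, which is why an uncorrected screen over hundreds of taxa is almost certain to produce spurious hits.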
This type of problem, controlling for multiple comparisons, is well covered by the statistical literature. A conservative approach is to ensure that none of the hypotheses are falsely rejected, within a specified probability, using the Bonferroni correction. However, this method can be overly conservative when many tests are made, leading to too many false negatives. A more popular approach is to control for a pre-specified rate of false discovery (i.e., the expected proportion of rejected null hypotheses that are false rejections). Benjamini and Hochberg presented a method to control for the false discovery rate in a series of independent tests, and this is the formulation used in microbiome analysis software such as QIIME and Mothur. Use of a multiple testing correction is strongly recommended whenever multiple comparisons are made.
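The two corrections can be compared side by side on a small set of p-values. The sketch below implements the Bonferroni adjustment and the Benjamini-Hochberg step-up adjustment in plain Python; the function names and p-values are hypothetical, and established statistical packages should be preferred for real analyses.

```python
def bonferroni(pvalues):
    """Bonferroni-adjusted p-values: multiply by the number of tests."""
    m = len(pvalues)
    return [min(1.0, p * m) for p in pvalues]


def benjamini_hochberg(pvalues):
    """Benjamini-Hochberg adjusted p-values (q-values), in input order."""
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down, enforcing monotonicity.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, pvalues[i] * m / rank)
        adjusted[i] = running_min
    return adjusted


# Hypothetical p-values from testing five taxa.
pvals = [0.03, 0.001, 0.6, 0.012, 0.02]
bonf = bonferroni(pvals)
bh = benjamini_hochberg(pvals)
n_bonf = sum(p < 0.05 for p in bonf)
n_bh = sum(q < 0.05 for q in bh)
```

On these values the Bonferroni correction retains one taxon at the 0.05 level, whereas the Benjamini-Hochberg procedure retains four, illustrating the trade-off between strict control of any false positive and control of the expected fraction of false discoveries.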
Discovery and validation cohorts
Moving beyond single experiments, researchers can provide better and more reliable evidence for a discovery by reproducing the results in an independent cohort of samples. The use of separate discovery and validation cohorts is standard in genome-wide association studies, which are also massively multivariate. Using this strategy in the microbiome context, the experiment is first conducted in the discovery cohort and taxa or gene types are selected using a particular testing procedure. The validation cohort is then analyzed to test only those results found to be significant in the discovery cohort. The total number of tests is thus drastically reduced in the validation cohort.
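The two-stage logic above can be sketched as follows. This is an illustrative outline, not a prescribed workflow: the taxon names and p-values are hypothetical, the per-taxon p-values are assumed to come from whatever test the study uses, and applying a Benjamini-Hochberg adjustment within each cohort is an assumption made here for concreteness.

```python
from typing import Dict, List


def bh_adjust(p_values: List[float]) -> List[float]:
    """Benjamini-Hochberg adjusted p-values, returned in the input order."""
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])
    adjusted = [0.0] * m
    running_min = 1.0
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, p_values[i] * m / rank)
        adjusted[i] = running_min
    return adjusted


def select_in_discovery(discovery_p: Dict[str, float], alpha: float = 0.05) -> List[str]:
    """Test every taxon in the discovery cohort and keep those that pass."""
    taxa = list(discovery_p)
    adjusted = bh_adjust([discovery_p[t] for t in taxa])
    return [t for t, q in zip(taxa, adjusted) if q <= alpha]


def confirm_in_validation(validation_p: Dict[str, float], candidates: List[str],
                          alpha: float = 0.05) -> List[str]:
    """Test only the discovery candidates in the validation cohort."""
    adjusted = bh_adjust([validation_p[t] for t in candidates])
    return [t for t, q in zip(candidates, adjusted) if q <= alpha]


# Hypothetical p-values per taxon in each cohort, for illustration only.
discovery_p = {"Taxon_A": 0.0004, "Taxon_B": 0.003, "Taxon_C": 0.02,
               "Taxon_D": 0.30, "Taxon_E": 0.65, "Taxon_F": 0.81}
validation_p = {"Taxon_A": 0.01, "Taxon_B": 0.20, "Taxon_C": 0.03,
                "Taxon_D": 0.04, "Taxon_E": 0.50, "Taxon_F": 0.90}

candidates = select_in_discovery(discovery_p)
confirmed = confirm_in_validation(validation_p, candidates)

if __name__ == "__main__":
    print("Selected in discovery cohort:", candidates)
    print("Confirmed in validation cohort:", confirmed)
```

In this sketch Taxon_D is never tested in the validation cohort even though its validation p-value is small, because it was not selected in the discovery cohort; only three tests are corrected for in the second stage rather than six.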
Several microbiome studies have used independent discovery and validation cohorts to select taxa of interest for a disease state. Sabino et al. identified three bacterial genera associated with primary sclerosing cholangitis in a discovery cohort and used their results to correctly classify 75% of subjects in an independent validation cohort. Forslund et al. used separate cohorts to replicate their findings of taxa altered in metformin-treated subjects with type 2 diabetes mellitus. In a series of papers, a composite index of bacterial taxon abundance in stool associated with inflammatory bowel disease (IBD) was developed in one group of subjects, and then found to distinguish IBD from healthy controls in an independent follow-up study. Kelsen et al. applied the discovery-validation cohort design to determine differences in the subgingival microbiota between children with Crohn’s disease and healthy controls, and identified taxa that differed reproducibly in both cohorts. Additionally, they were able to distinguish taxa that were associated with antibiotic use from those associated only with the disease.
Summarizing the considerations above, we can make several recommendations for the design and execution of microbiome studies.
For analysis, multiple confounding factors need to be taken into account, including antibiotic use, age, sex, diet, geography, and pet ownership.
In animal studies, cage effects can dominate over what may seem to be extreme interventions. Thus, it is critical to set up each condition to be studied in multiple cages, so that the caging variable can be isolated and accounted for.
Although we recommend storing samples, especially fecal samples, at −80 °C immediately after collection for most accurate results, alternative storage methods for field studies also lead to results with relatively small deviations. For new sample types, it will be wise to test for changes during storage under study-specific storage conditions.
In a cross-sectional study, it is essential to know whether the time point sampled will be representative. For example, the healthy adult gut microbiota does not change radically over short time scales, but that of the vagina sometimes does. Therefore, it is important to assess the relationship of possible longitudinal dynamics to the question posed.
Be energetic in creating and analyzing negative controls—DNA extraction kits usually come with contaminants, and contamination may vary between suppliers and even between batches of the same kit.
Use positive controls for each batch of samples. Mock communities are valuable for this, and the simple synthetic DNA controls presented here (Additional file 2) are also quite useful. Place controls asymmetrically in purification plates to verify proper sample tracking through the DNA purification and library preparation procedures.
Low microbial biomass samples present many challenges. When starting a study that might involve low microbial biomass samples, it is essential to quantify the microbial load in the samples to understand the extent of the challenge. QPCR of total 16S rRNA gene copies can be used for this purpose, as can conventional plating assays if applicable. In an experiment that may involve low biomass samples, start with the null hypothesis that all sequence data reflects contamination only, and ask whether this idea can be rejected in a statistical analysis of the data.
Be realistic about “data dredging”: impose a rigorous statistical method to control for multiple comparisons.
Lastly, if affordable, it greatly strengthens a study to assess effects in separate discovery and validation cohorts.
There is no question that the human microbiota are critical for health and disease—by attending to the above challenges, one can generate high quality data to drive new discoveries in this exciting field.
ITS: Internal transcribed spacer
Chehoud C, Albenberg LG, Judge C, Hoffmann C, Grunberg S, Bittinger K, Wu GD. Fungal signature in the gut microbiota of pediatric patients with inflammatory bowel disease. Inflamm Bowel Dis. 2015;21(8):1948–56. doi:10.1097/MIB.0000000000000454.
Debelius JW, Vazquez-Baeza Y, McDonald D, Xu Z, Wolfe E, Knight R. Turning participatory microbiome research into usable data: lessons from the american gut project. J Microbiol Biol Educ. 2016;17(1):46–50. doi:10.1128/jmbe.v17i1.1034.
Faust K, Sathirapongsasuti JF, Izard J, Segata N, Gevers D, Raes J, Huttenhower C. Microbial co-occurrence relationships in the human microbiome. PLoS Comput Biol. 2012;8(7):e1002606. doi:10.1371/journal.pcbi.1002606.
Human Microbiome Project Consortium. A framework for human microbiome research. Nature. 2012;486(7402):215–21. doi:10.1038/nature11209.
Lewis JD, Chen EZ, Baldassano RN, Otley AR, Griffiths AM, Lee D, Bushman FD. Inflammation, antibiotics, and diet as environmental stressors of the gut microbiome in pediatric crohn’s disease. Cell Host Microbe. 2015;18(4):489–500. doi:10.1016/j.chom.2015.09.008.
Ley RE, Turnbaugh PJ, Klein S, Gordon JI. Microbial ecology: human gut microbes associated with obesity. Nature. 2006;444(7122):1022–3. doi:10.1038/4441022a.
Qin J, Li R, Raes J, Arumugam M, Burgdorf KS, Manichanh C, Wang J. A human gut microbial gene catalogue established by metagenomic sequencing. Nature. 2010;464(7285):59–65. doi:10.1038/nature08821.
Turnbaugh PJ, Ley RE, Hamady M, Fraser-Liggett CM, Knight R, Gordon JI. The human microbiome project. Nature. 2007;449(7164):804–10. doi:10.1038/nature06244.
Kassam Z, Lee CH, Yuan Y, Hunt RH. Fecal microbiota transplantation for Clostridium difficile infection: systematic review and meta-analysis. Am J Gastroenterol. 2013;108(4):500–8. doi:10.1038/ajg.2013.59.
van Nood E, Vrieze A, Nieuwdorp M, Fuentes S, Zoetendal EG, de Vos WM, Keller JJ. Duodenal infusion of donor feces for recurrent Clostridium difficile. N Engl J Med. 2013;368(5):407–15. doi:10.1056/NEJMoa1205037.
Baker GC, Smith JJ, Cowan DA. Review and re-analysis of domain-specific 16S primers. J Microbiol Methods. 2003;55(3):541–55.
D’Amore R, Ijaz UZ, Schirmer M, Kenny JG, Gregory R, Darby AC, Hall N. A comprehensive benchmarking study of protocols and sequencing platforms for 16S rRNA community profiling. BMC Genomics. 2016;17:55. doi:10.1186/s12864-015-2194-9.
Liu Z, Lozupone C, Hamady M, Bushman FD, Knight R. Short pyrosequencing reads suffice for accurate microbial community analysis. Nucleic Acids Res. 2007;35(18):e120. doi:10.1093/nar/gkm541.
Mizrahi-Man O, Davenport ER, Gilad Y. Taxonomic classification of bacterial 16S rRNA genes using short sequencing reads: evaluation of effective study designs. PLoS One. 2013;8(1):e53608. doi:10.1371/journal.pone.0053608.
Schloss PD, Jenior ML, Koumpouras CC, Westcott SL, Highlander SK. Sequencing 16S rRNA gene fragments using the PacBio SMRT DNA sequencing system. PeerJ. 2016;4:e1869. doi:10.7717/peerj.1869.
Tremblay J, Singh K, Fern A, Kirton ES, He S, Woyke T, Tringe SG. Primer and platform effects on 16S rRNA tag sequencing. Front Microbiol. 2015;6:771. doi:10.3389/fmicb.2015.00771.
Aho VT, Pereira PA, Haahtela T, Pawankar R, Auvinen P, Koskinen K. The microbiome of the human lower airways: a next generation sequencing perspective. World Allergy Organ J. 2015;8(1):23. doi:10.1186/s40413-015-0074-z.
Bittinger K, Charlson ES, Loy E, Shirley DJ, Haas AR, Laughlin A, Bushman FD. Improved characterization of medically relevant fungi in the human respiratory tract using next-generation sequencing. Genome Biol. 2014;15(10):487. doi:10.1186/s13059-014-0487-y.
Charlson ES, Bittinger K, Haas AR, Fitzgerald AS, Frank I, Yadav A, Collman RG. Topographical continuity of bacterial populations in the healthy human respiratory tract. Am J Respir Crit Care Med. 2011;184(8):957–63. doi:10.1164/rccm.201104-0655OC.
Glassing A, Dowd SE, Galandiuk S, Davis B, Chiodini RJ. Inherent bacterial DNA contamination of extraction and sequencing reagents may affect interpretation of microbiota in low bacterial biomass samples. Gut Pathog. 2016;8:24. doi:10.1186/s13099-016-0103-7.
Jervis-Bardy J, Leong LE, Marri S, Smith RJ, Choo JM, Smith-Vaughan HC, Marsh RL. Deriving accurate microbiota profiles from human samples with low bacterial content through post-sequencing processing of Illumina MiSeq data. Microbiome. 2015;3:19. doi:10.1186/s40168-015-0083-8.
Lauder AP, Roche AM, Sherrill-Mix S, Bailey A, Laughlin AL, Bittinger K, Bushman FD. Comparison of placenta samples with contamination controls does not provide evidence for a distinct placenta microbiota. Microbiome. 2016;4(1):29. doi:10.1186/s40168-016-0172-3.
Salter SJ, Cox MJ, Turek EM, Calus ST, Cookson WO, Moffatt MF, Walker AW. Reagent and laboratory contamination can critically impact sequence-based microbiome analyses. BMC Biol. 2014;12:87. doi:10.1186/s12915-014-0087-z.
Weiss S, Amir A, Hyde ER, Metcalf JL, Song SJ, Knight R. Tracking down the sources of experimental contamination in microbiome studies. Genome Biol. 2014;15(12):564. doi:10.1186/s13059-014-0564-2.
Di Bella JM, Bao Y, Gloor GB, Burton JP, Reid G. High throughput sequencing methods and analysis for microbiome research. J Microbiol Methods. 2013;95(3):401–14. doi:10.1016/j.mimet.2013.08.011.
Foster JA, Bunge J, Gilbert JA, Moore JH. Measuring the microbiome: perspectives on advances in DNA-based techniques for exploring microbial life. Brief Bioinform. 2012;13(4):420–9. doi:10.1093/bib/bbr080.
Goodrich JK, Di Rienzi SC, Poole AC, Koren O, Walters WA, Caporaso JG, Ley RE. Conducting a microbiome study. Cell. 2014;158(2):250–62. doi:10.1016/j.cell.2014.06.037.
Kuczynski J, Lauber CL, Walters WA, Parfrey LW, Clemente JC, Gevers D, Knight R. Experimental and analytical tools for studying the human microbiome. Nat Rev Genet. 2012;13(1):47–58. doi:10.1038/nrg3129.
Robinson CK, Brotman RM, Ravel J. Intricacies of assessing the human microbiome in epidemiologic studies. Ann Epidemiol. 2016;26(5):311–21. doi:10.1016/j.annepidem.2016.04.005.
Bikel S, Valdez-Lara A, Cornejo-Granados F, Rico K, Canizales-Quinteros S, Soberon X, Ochoa-Leyva A. Combining metagenomics, metatranscriptomics and viromics to explore novel microbial interactions: towards a systems-level understanding of human microbiome. Comput Struct Biotechnol J. 2015;13:390–401. doi:10.1016/j.csbj.2015.06.001.
Kim Y, Koh I, Rho M. Deciphering the human microbiome using next-generation sequencing data and bioinformatics approaches. Methods. 2015;79–80:52–9. doi:10.1016/j.ymeth.2014.10.022.
Laukens D, Brinkman BM, Raes J, De Vos M, Vandenabeele P. Heterogeneity of the gut microbiome in mice: guidelines for optimizing experimental design. FEMS Microbiol Rev. 2016;40(1):117–32. doi:10.1093/femsre/fuv036.
Tsilimigras MC, Fodor AA. Compositional data analysis of the microbiome: fundamentals, tools, and challenges. Ann Epidemiol. 2016;26(5):330–5. doi:10.1016/j.annepidem.2016.03.002.
Kelly BJ, Gross R, Bittinger K, Sherrill-Mix S, Lewis JD, Collman RG, Li H. Power and sample-size estimation for microbiome studies using pairwise distances and PERMANOVA. Bioinformatics. 2015;31(15):2461–8. doi:10.1093/bioinformatics/btv183.
La Rosa PS, Brooks JP, Deych E, Boone EL, Edwards DJ, Wang Q, Shannon WD. Hypothesis testing and power calculations for taxonomic-based human microbiome data. PLoS One. 2012;7(12):e52078. doi:10.1371/journal.pone.0052078.
Blaser M, Bork P, Fraser C, Knight R, Wang J. The microbiome explored: recent insights and future challenges. Nat Rev Microbiol. 2013;11(3):213–7. doi:10.1038/nrmicro2973.
Dave M, Higgins PD, Middha S, Rioux KP. The human gut microbiome: current knowledge, challenges, and future directions. Transl Res. 2012;160(4):246–57. doi:10.1016/j.trsl.2012.05.003.
Lozupone CA, Stombaugh JI, Gordon JI, Jansson JK, Knight R. Diversity, stability and resilience of the human gut microbiota. Nature. 2012;489(7415):220–30. doi:10.1038/nature11550.
McKenna P, Hoffmann C, Minkah N, Aye PP, Lackner A, Liu Z, Bushman FD. The macaque gut microbiome in health, lentiviral infection, and chronic enterocolitis. PLoS Pathog. 2008;4(2):e20. doi:10.1371/journal.ppat.0040020.
Abeles SR, Ly M, Santiago-Rodriguez TM, Pride DT. Effects of long term antibiotic therapy on human oral and fecal viromes. PLoS One. 2015;10(8):e0134941. doi:10.1371/journal.pone.0134941.
Dethlefsen L, Huse S, Sogin ML, Relman DA. The pervasive effects of an antibiotic on the human gut microbiota, as revealed by deep 16S rRNA sequencing. PLoS Biol. 2008;6(11):e280. doi:10.1371/journal.pbio.0060280.
Jakobsson HE, Jernberg C, Andersson AF, Sjolund-Karlsson M, Jansson JK, Engstrand L. Short-term antibiotic treatment has differing long-term impacts on the human throat and gut microbiome. PLoS One. 2010;5(3):e9836. doi:10.1371/journal.pone.0009836.
Devkota S. MICROBIOME. Prescription drugs obscure microbiome analyses. Science. 2016;351(6272):452–3. doi:10.1126/science.aaf1353.
Mardinoglu A, Boren J, Smith U. Confounding effects of metformin on the human gut microbiome in type 2 diabetes. Cell Metab. 2016;23(1):10–2. doi:10.1016/j.cmet.2015.12.012.
Imhann F, Bonder MJ, Vich Vila A, Fu J, Mujagic Z, Vork L, Zhernakova A. Proton pump inhibitors affect the gut microbiome. Gut. 2016;65(5):740–8. doi:10.1136/gutjnl-2015-310376.
Amarasekara R, Jayasekara RW, Senanayake H, Dissanayake VH. Microbiome of the placenta in pre-eclampsia supports the role of bacteria in the multifactorial cause of pre-eclampsia. J Obstet Gynaecol Res. 2015;41(5):662–9. doi:10.1111/jog.12619.
Dore J, Blottiere H. The influence of diet on the gut microbiota and its consequences for health. Curr Opin Biotechnol. 2015;32:195–9. doi:10.1016/j.copbio.2015.01.002.
Fallucca F, Porrata C, Fallucca S, Pianesi M. Influence of diet on gut microbiota, inflammation and type 2 diabetes mellitus. First experience with macrobiotic Ma-Pi 2 diet. Diabetes Metab Res Rev. 2014;30 Suppl 1:48–54. doi:10.1002/dmrr.2518.
Hrncir T, Stepankova R, Kozakova H, Hudcovic T, Tlaskalova-Hogenova H. Gut microbiota and lipopolysaccharide content of the diet influence development of regulatory T cells: studies in germ-free mice. BMC Immunol. 2008;9:65. doi:10.1186/1471-2172-9-65.
Moreira AP, Texeira TF, Ferreira AB, Peluzio Mdo C, Alfenas Rde C. Influence of a high-fat diet on gut microbiota, intestinal permeability and metabolic endotoxaemia. Br J Nutr. 2012;108(5):801–9. doi:10.1017/S0007114512001213.
Murphy EA, Velazquez KT, Herbert KM. Influence of high-fat diet on gut microbiota: a driving force for chronic disease risk. Curr Opin Clin Nutr Metab Care. 2015;18(5):515–20. doi:10.1097/MCO.0000000000000209.
Rothe M, Blaut M. Evolution of the gut microbiota and the influence of diet. Benef Microbes. 2013;4(1):31–7. doi:10.3920/BM2012.0029.
Scott KP, Gratz SW, Sheridan PO, Flint HJ, Duncan SH. The influence of diet on the gut microbiota. Pharmacol Res. 2013;69(1):52–60. doi:10.1016/j.phrs.2012.10.020.
Sherman MP, Zaghouani H, Niklas V. Gut microbiota, the immune system, and diet influence the neonatal gut-brain axis. Pediatr Res. 2015;77(1-2):127–35. doi:10.1038/pr.2014.161.
Wu GD, Chen J, Hoffmann C, Bittinger K, Chen YY, Keilbaugh SA, Lewis JD. Linking long-term dietary patterns with gut microbial enterotypes. Science. 2011;334(6052):105–8. doi:10.1126/science.1208344.
Wu GD, Compher C, Chen EZ, Smith SA, Shah RD, Bittinger K, Lewis JD. Comparative metabolomics in vegans and omnivores reveal constraints on diet-dependent gut microbiota metabolite production. Gut. 2016;65(1):63–72. doi:10.1136/gutjnl-2014-308209.
David LA, Maurice CF, Carmody RN, Gootenberg DB, Button JE, Wolfe BE, Turnbaugh PJ. Diet rapidly and reproducibly alters the human gut microbiome. Nature. 2014;505(7484):559–63. doi:10.1038/nature12820.
Koenig JE, Spor A, Scalfone N, Fricker AD, Stombaugh J, Knight R, Ley RE. Succession of microbial consortia in the developing infant gut microbiome. Proc Natl Acad Sci U S A. 2011;108 Suppl 1:4578–85. doi:10.1073/pnas.1000081107.
Lee D, Albenberg L, Compher C, Baldassano R, Piccoli D, Lewis JD, Wu GD. Diet in the pathogenesis and treatment of inflammatory bowel diseases. Gastroenterology. 2015;148(6):1087–106. doi:10.1053/j.gastro.2015.01.007.
Yatsunenko T, Rey FE, Manary MJ, Trehan I, Dominguez-Bello MG, Contreras M, Gordon JI. Human gut microbiome viewed across age and geography. Nature. 2012;486(7402):222–7. doi:10.1038/nature11053.
Claesson MJ, Cusack S, O’Sullivan O, Greene-Diniz R, de Weerd H, Flannery E, O’Toole PW. Composition, variability, and temporal stability of the intestinal microbiota of the elderly. Proc Natl Acad Sci U S A. 2011;108 Suppl 1:4586–91. doi:10.1073/pnas.1000097107.
Clarke G, Stilling RM, Kennedy PJ, Stanton C, Cryan JF, Dinan TG. Minireview: gut microbiota: the neglected endocrine organ. Mol Endocrinol. 2014;28(8):1221–38. doi:10.1210/me.2014-1108.
Markle JG, Frank DN, Mortin-Toth S, Robertson CE, Feazel LM, Rolle-Kampczyk U, Danska JS. Sex differences in the gut microbiome drive hormone-dependent regulation of autoimmunity. Science. 2013;339(6123):1084–8. doi:10.1126/science.1233521.
Davey KJ, O’Mahony SM, Schellekens H, O’Sullivan O, Bienenstock J, Cotter PD, Cryan JF. Gender-dependent consequences of chronic olanzapine in the rat: effects on body weight, inflammatory, metabolic and microbiota parameters. Psychopharmacol (Berl). 2012;221(1):155–69. doi:10.1007/s00213-011-2555-2.
Liang X, Bushman FD, FitzGerald GA. Rhythmicity of the intestinal microbiota is regulated by gender and the host circadian clock. Proc Natl Acad Sci U S A. 2015;112(33):10479–84. doi:10.1073/pnas.1501305112.
Ren W, Ma Y, Yang L, Gettie A, Salas J, Russell K, Cheng-Mayer C. Fast disease progression in simian HIV-infected female macaque is accompanied by a robust local inflammatory innate immune and microbial response. AIDS. 2015;29(10):F1–8. doi:10.1097/QAD.0000000000000711.
Noguera-Julian M, Rocafort M, Guillen Y, Rivera J, Casadella M, Nowak P, Paredes R. Gut microbiota linked to sexual preference and HIV infection. EBioMed. 2016;5:135–46. doi:10.1016/j.ebiom.2016.01.032.
Oh C, Lee K, Cheong Y, Lee SW, Park SY, Song CS, Lee JB. Comparison of the oral microbiomes of canines and their owners using next-generation sequencing. PLoS One. 2015;10(7):e0131468. doi:10.1371/journal.pone.0131468.
Song SJ, Lauber C, Costello EK, Lozupone CA, Humphrey G, Berg-Lyons D, Knight R. Cohabiting family members share microbiota with one another and with their dogs. Elife. 2013;2:e00458. doi:10.7554/eLife.00458.
Jalanka-Tuovinen J, Salonen A, Nikkila J, Immonen O, Kekkonen R, Lahti L, de Vos WM. Intestinal microbiota in healthy adults: temporal analysis reveals individual and common core and relation to intestinal symptoms. PLoS One. 2011;6(7):e23035. doi:10.1371/journal.pone.0023035.
Rajilic-Stojanovic M, Heilig HG, Tims S, Zoetendal EG, de Vos WM. Long-term monitoring of the human intestinal microbiota composition. Environ Microbiol. 2012. doi:10.1111/1462-2920.12023.
Zoetendal EG, Akkermans AD, De Vos WM. Temperature gradient gel electrophoresis analysis of 16S rRNA from human fecal samples reveals stable and host-specific communities of active bacteria. Appl Environ Microbiol. 1998;64(10):3854–9.
Gevers D, Kugathasan S, Denson LA, Vazquez-Baeza Y, Van Treuren W, Ren B, Xavier RJ. The treatment-naive microbiome in new-onset Crohn’s disease. Cell Host Microbe. 2014;15(3):382–92. doi:10.1016/j.chom.2014.02.005.
Brotman RM, Shardell MD, Gajer P, Tracy JK, Zenilman JM, Ravel J, Gravitt PE. Interplay between the temporal dynamics of the vaginal microbiota and human papillomavirus detection. J Infect Dis. 2014;210(11):1723–33. doi:10.1093/infdis/jiu330.
Chehoud C, Stieh DJ, Bailey AG, Laughlin AL, Allen SA, McCotter KL, Bushman FD. Associations of the vaginal microbiota with HIV infection, bacterial vaginosis and demographic factors. AIDS. 2017. doi:10.1097/QAD.0000000000001421.
Gajer P, Brotman RM, Bai G, Sakamoto J, Schutte UM, Zhong X, Ravel J. Temporal dynamics of the human vaginal microbiota. Sci Transl Med. 2012;4(132):132ra152. doi:10.1126/scitranslmed.3003605.
Ravel J, Brotman RM, Gajer P, Ma B, Nandy M, Fadrosh DW, Forney LJ. Daily temporal dynamics of vaginal microbiota before, during and after episodes of bacterial vaginosis. Microbiome. 2013;1(1):29. doi:10.1186/2049-2618-1-29.
Ravel J, Gajer P, Abdo Z, Schneider GM, Koenig SS, McCulle SL, Forney LJ. Vaginal microbiome of reproductive-age women. Proc Natl Acad Sci U S A. 2011;108 Suppl 1:4680–7. doi:10.1073/pnas.1002611107.
Liang X, Bushman FD, FitzGerald GA. Time in motion: the molecular clock meets the microbiome. Cell. 2014;159(3):469–70. doi:10.1016/j.cell.2014.10.020.
Thaiss CA, Zeevi D, Levy M, Segal E, Elinav E. A day in the life of the meta-organism: diurnal rhythms of the intestinal microbiome and its host. Gut Microbes. 2015;6(2):137–42. doi:10.1080/19490976.2015.1016690.
Bushon RN, Kephart CM, Koltun GF, Francy DS, Schaefer 3rd FW, Alan Lindquist HD. Statistical assessment of DNA extraction reagent lot variability in real-time quantitative PCR. Lett Appl Microbiol. 2010;50(3):276–82. doi:10.1111/j.1472-765X.2009.02788.x.
Campbell JH, Foster CM, Vishnivetskaya T, Campbell AG, Yang ZK, Wymore A, Podar M. Host genetic and environmental effects on mouse intestinal microbiota. ISME J. 2012;6(11):2033–44. doi:10.1038/ismej.2012.54.
Hildebrand F, Nguyen TL, Brinkman B, Yunta RG, Cauwe B, Vandenabeele P, Raes J. Inflammation-associated enterotypes, host genotype, cage and inter-individual effects drive gut microbiota variation in common laboratory mice. Genome Biol. 2013;14(1):R4. doi:10.1186/gb-2013-14-1-r4.
Arndt SS, Laarakker MC, van Lith HA, van der Staay FJ, Gieling E, Salomons AR, Ohl F. Individual housing of mice--impact on behaviour and stress responses. Physiol Behav. 2009;97(3-4):385–93. doi:10.1016/j.physbeh.2009.03.008.
Laber K, Veatch LM, Lopez MF, Mulligan JK, Lathers DM. Effects of housing density on weight gain, immune function, behavior, and plasma corticosterone concentrations in BALB/c and C57BL/6 mice. J Am Assoc Lab Anim Sci. 2008;47(2):16–23.
Paigen B, Currer JM, Svenson KL. Effects of varied housing density on a hybrid mouse strain followed for 20 months. PLoS One. 2016;11(2):e0149647. doi:10.1371/journal.pone.0149647.
Dollive S, Chen YY, Grunberg S, Bittinger K, Hoffmann C, Vandivier L, Bushman FD. Fungi of the murine gut: episodic variation and proliferation during antibiotic treatment. PLoS One. 2013;8(8):e71806. doi:10.1371/journal.pone.0071806.
Wu GD, Lewis JD, Hoffmann C, Chen YY, Knight R, Bittinger K, Bushman FD. Sampling and pyrosequencing methods for characterizing bacterial communities in the human gut using 16S sequence tags. BMC Microbiol. 2010;10:206. doi:10.1186/1471-2180-10-206.
Song SJ, Amir A, Metcalf L, Amato KR, Xu ZZ, Humphrey G, Knight R. Preservation methods differ in fecal microbiome stability, affecting suitability for field studies. mSystems. 2016;1(3). doi:10.1128/mSystems.00021-16.
Blekhman R, Tang K, Archie EA, Barreiro LB, Johnson ZP, Wilson ME, Tung J. Common methods for fecal sample storage in field studies yield consistent signatures of individual identity in microbiome sequencing data. Sci Rep. 2016;6:31519. doi:10.1038/srep31519.
Choo JM, Leong LE, Rogers GB. Sample storage conditions significantly influence faecal microbiome profiles. Sci Rep. 2015;5:16350. doi:10.1038/srep16350.
Dominianni C, Wu J, Hayes RB, Ahn J. Comparison of methods for fecal microbiome biospecimen collection. BMC Microbiol. 2014;14:103. doi:10.1186/1471-2180-14-103.
Hill CJ, Brown JR, Lynch DB, Jeffery IB, Ryan CA, Ross RP, O’Toole PW. Effect of room temperature transport vials on DNA quality and phylogenetic composition of faecal microbiota of elderly adults and infants. Microbiome. 2016;4(1):19. doi:10.1186/s40168-016-0164-3.
Kerckhof FM, Courtens EN, Geirnaert A, Hoefman S, Ho A, Vilchez-Vargas R, Boon N. Optimized cryopreservation of mixed microbial communities for conserved functionality and diversity. PLoS One. 2014;9(6):e99517. doi:10.1371/journal.pone.0099517.
McKain N, Genc B, Snelling TJ, Wallace RJ. Differential recovery of bacterial and archaeal 16S rRNA genes from ruminal digesta in response to glycerol as cryoprotectant. J Microbiol Methods. 2013;95(3):381–3. doi:10.1016/j.mimet.2013.10.009.
Sinha R, Abnet CC, White O, Knight R, Huttenhower C. The microbiome quality control project: baseline study design and future directions. Genome Biol. 2015;16:276. doi:10.1186/s13059-015-0841-8.
Vogtmann E, Chen J, Amir A, Shi J, Abnet CC, Nelson H, Sinha R. Comparison of collection methods for fecal samples in microbiome studies. Am J Epidemiol. 2017;185(2):115–23. doi:10.1093/aje/kww177.
Luo T, Srinivasan U, Ramadugu K, Shedden KA, Neiswanger K, Trumble E, Foxman B. Effects of specimen collection methodologies and storage conditions on the short-term stability of oral microbiome taxonomy. Appl Environ Microbiol. 2016;82(18):5519–29. doi:10.1128/AEM.01132-16.
Lauber CL, Zhou N, Gordon JI, Knight R, Fierer N. Effect of storage conditions on the assessment of bacterial community structure in soil and human-associated samples. FEMS Microbiol Lett. 2010;307(1):80–6. doi:10.1111/j.1574-6968.2010.01965.x.
Corless CE, Guiver M, Borrow R, Edwards-Jones V, Kaczmarski EB, Fox AJ. Contamination and sensitivity issues with a real-time universal 16S rRNA PCR. J Clin Microbiol. 2000;38(5):1747–52.
Rand KH, Houck H. Taq polymerase contains bacterial DNA of unknown origin. Mol Cell Probes. 1990;4(6):445–50.
Tanner MA, Goebel BM, Dojka MA, Pace NR. Specific ribosomal DNA sequences from diverse environmental settings correlate with experimental contaminants. Appl Environ Microbiol. 1998;64(8):3110–3.
Shen H, Rogelj S, Kieft TL. Sensitive, real-time PCR detects low-levels of contamination by Legionella pneumophila in commercial reagents. Mol Cell Probes. 2006;20(3-4):147–53. doi:10.1016/j.mcp.2005.09.007.
Kennedy K, Hall MW, Lynch MD, Moreno-Hagelsieb G, Neufeld JD. Evaluating bias of illumina-based bacterial 16S rRNA gene profiles. Appl Environ Microbiol. 2014;80(18):5717–22. doi:10.1128/AEM.01451-14.
Wagner Mackenzie B, Waite DW, Taylor MW. Evaluating variation in human gut microbiota profiles due to DNA extraction method and inter-subject differences. Front Microbiol. 2015;6:130. doi:10.3389/fmicb.2015.00130.
Lazarevic V, Gaia N, Girard M, Schrenzel J. Decontamination of 16S rRNA gene amplicon sequence datasets based on bacterial load assessment by qPCR. BMC Microbiol. 2016;16:73. doi:10.1186/s12866-016-0689-4.
Knights D, Kuczynski J, Charlson ES, Zaneveld J, Mozer MC, Collman RG, Kelley ST. Bayesian community-wide culture-independent microbial source tracking. Nat Methods. 2011;8(9):761–3. doi:10.1038/nmeth.1650.
Aagaard K, Ma J, Antony KM, Ganu R, Petrosino J, Versalovic J. The placenta harbors a unique microbiome. Sci Transl Med. 2014;6(237):237ra265. doi:10.1126/scitranslmed.3008599.
Antony KM, Ma J, Mitchell KB, Racusin DA, Versalovic J, Aagaard K. The preterm placental microbiome varies in association with excess maternal gestational weight gain. Am J Obstet Gynecol. 2015;212(5):653.e1–16. doi:10.1016/j.ajog.2014.12.041.
Zheng J, Xiao X, Zhang Q, Mao L, Yu M, Xu J. The placental microbiome varies in association with low birth weight in full-term neonates. Nutrients. 2015;7(8):6924–37. doi:10.3390/nu7085315.
Yuan S, Cohen DB, Ravel J, Abdo Z, Forney LJ. Evaluation of methods for the extraction and purification of DNA from the human microbiome. PLoS One. 2012;7(3):e33865. doi:10.1371/journal.pone.0033865.
Haas BJ, Gevers D, Earl AM, Feldgarden M, Ward DV, Giannoukos G, Birren BW. Chimeric 16S rRNA sequence formation and detection in Sanger and 454-pyrosequenced PCR amplicons. Genome Res. 2011;21(3):494–504. doi:10.1101/gr.112730.110.
Parada AE, Needham DM, Fuhrman JA. Every base matters: assessing small subunit rRNA primers for marine microbiomes with mock communities, time series and global field samples. Environ Microbiol. 2016;18(5):1403–14. doi:10.1111/1462-2920.13023.
Kircher M, Heyn P, Kelso J. Addressing challenges in the production and analysis of illumina sequencing data. BMC Genomics. 2011;12:382. doi:10.1186/1471-2164-12-382.
Brady T, Roth SL, Malani N, Wang GP, Berry CC, Leboulch P, Bushman FD. A method to sequence and quantify DNA integration for monitoring outcome in gene therapy. Nucleic Acids Res. 2011;39(11):e72. doi:10.1093/nar/gkr140.
Caporaso JG, Lauber CL, Walters WA, Berg-Lyons D, Lozupone CA, Turnbaugh PJ, Knight R. Global patterns of 16S rRNA diversity at a depth of millions of sequences per sample. Proc Natl Acad Sci U S A. 2011;108 Suppl 1:4516–22. doi:10.1073/pnas.1000080107.
Walters W, Hyde ER, Berg-Lyons D, Ackermann G, Humphrey G, Parada A, Knight R. Improved bacterial 16S rRNA gene (V4 and V4-5) and fungal internal transcribed spacer marker gene primers for microbial community surveys. mSystems. 2016;1(1). doi:10.1128/mSystems.00009-15.
Laurence M, Hatzis C, Brash DE. Common contaminants in next-generation sequencing that hinder discovery of low-abundance microbes. PLoS One. 2014;9(5):e97876. doi:10.1371/journal.pone.0097876.
Bhatt AS, Freeman SS, Herrera AF, Pedamallu CS, Gevers D, Duke F, Meyerson M. Sequence-based discovery of Bradyrhizobium enterica in cord colitis syndrome. N Engl J Med. 2013;369(6):517–28. doi:10.1056/NEJMoa1211115.
Dunn O. Multiple comparisons among means. J Am Stat Assoc. 1961;56(293):52–64. doi:10.2307/2282330.
Benjamini Y, Hochberg Y. Controlling the false discovery rate: a practical and powerful approach to multiple testing. J R Stat Soc Ser B Methodol. 1995;57(1):289–300. doi:10.2307/2346101.
Caporaso JG, Kuczynski J, Stombaugh J, Bittinger K, Bushman FD, Costello EK, Knight R. QIIME allows analysis of high-throughput community sequencing data. Nat Methods. 2010;7(5):335–6. doi:10.1038/nmeth.f.303.
Schloss PD, Westcott SL, Ryabin T, Hall JR, Hartmann M, Hollister EB, Weber CF. Introducing mothur: open-source, platform-independent, community-supported software for describing and comparing microbial communities. Appl Environ Microbiol. 2009;75(23):7537–41. doi:10.1128/AEM.01541-09.
Wang X, Tucker NR, Rizki G, Mills R, Krijger PH, de Wit E, Boyer LA. Discovery and validation of sub-threshold genome-wide association study loci using epigenomic signatures. Elife. 2016;5. doi:10.7554/eLife.10557.
Sabino J, Vieira-Silva S, Machiels K, Joossens M, Falony G, Ballet V, Raes J. Primary sclerosing cholangitis is characterised by intestinal dysbiosis independent from IBD. Gut. 2016. doi:10.1136/gutjnl-2015-311004.
Forslund K, Hildebrand F, Nielsen T, Falony G, Le Chatelier E, Sunagawa S, Pedersen O. Disentangling type 2 diabetes and metformin treatment signatures in the human gut microbiota. Nature. 2015;528(7581):262–6. doi:10.1038/nature15766.
Shaw KA, Bertha M, Hofmekler T, Chopra P, Vatanen T, Srivatsa A, Kugathasan S. Dysbiosis, inflammation, and response to treatment: a longitudinal study of pediatric subjects with newly diagnosed inflammatory bowel disease. Genome Med. 2016;8(1):75. doi:10.1186/s13073-016-0331-y.
Kelsen J, Bittinger K, Pauly-Hubbard H, Posivak L, Grunberg S, Baldassano R, Bushman FD. Alterations of the subgingival microbiota in pediatric Crohn’s disease studied longitudinally in discovery and validation cohorts. Inflamm Bowel Dis. 2015;21(12):2797–805. doi:10.1097/MIB.0000000000000557.
Lozupone C, Knight R. UniFrac: a new phylogenetic method for comparing microbial communities. Appl Environ Microbiol. 2005;71(12):8228–35. doi:10.1128/AEM.71.12.8228-8235.2005.
We are grateful to Laurie Zimmerman and members of the Bushman laboratory for help and suggestions.
This work was supported by the National Institute of Allergy and Infectious Diseases P30 AI 045008 (EC, AL, and FDB); the National Heart, Lung, and Blood Institute R01 HL113252 (RGC); the National Institute of Allergy and Infectious Diseases T32 AI007632 (SSM and CC); the Pennsylvania Department of Health SAP 4100068710 (DK, CEH, CZ, LM, CT, RB, and KB); the Crohn’s and Colitis Foundation of America Career Development Award 3276 (JK); and the National Institutes of Health 1T32DK101371-01 (MC).
Availability of data and materials
The raw sequence files generated for comparisons of swab storage methods, positive gene block controls, and negative control samples are available from the NCBI Sequence Read Archive (BioProject accessions PRJNA356343, PRJNA356422, PRJNA356404, and PRJNA380255).
DK, CEH, LM, AL, EC, SSM, RGC, RB, FDB, and KB wrote the manuscript. DK, CEH, LM, AL, JK, and MC carried out experiments for the comparison of storage methods and positive/negative control samples. JK, MC, FDB, and KB designed the comparison of storage methods. DK, CEH, LM, FDB, and KB designed the comparison of positive/negative controls. CZ, CT, SSM, CC, and KB performed the data analysis. All authors read and approved the final manuscript.
The authors declare that they have no competing interests.
Consent for publication
Ethics approval and consent to participate
Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
Cite this article
Kim, D., Hofstaedter, C.E., Zhao, C. et al. Optimizing methods and dodging pitfalls in microbiome research. Microbiome 5, 52 (2017). https://doi.org/10.1186/s40168-017-0267-5
- 16S rRNA gene
- Shotgun metagenomics
- Environmental contamination
- Study design
- Best practices