This was a nested case–control study conducted to compare changes in the vaginal microbiota of patients who had a spontaneous preterm labor and delivery with those who had an uncomplicated pregnancy. Cases and controls were selected in a 1:4 ratio from a prospective longitudinal cohort study designed to examine the relationship between biological markers and pregnancy outcome. The study included 18 cases and 72 controls. Patients with indicated preterm birth (for example, preeclampsia, intrauterine growth restriction, or congenital anomalies) were excluded. Patients volunteered to participate in the study and signed a written informed consent. The use of samples from the longitudinal study of pregnant women was approved by the Human Investigations Committee of Wayne State University and the Institutional Review Board (IRB) of the Eunice Kennedy Shriver National Institute of Child Health and Human Development (NICHD).
A normal pregnant woman was defined as one without obstetrical, medical or surgical complications, who delivered at term (38 to 42 weeks) without congenital anomalies or acute histologic chorioamnionitis. Preterm labor was diagnosed by the presence of at least two uterine contractions every 10 minutes associated with cervical changes in patients with a gestational age between 20 and 34 weeks. Preterm premature rupture of membranes (PPROM) was identified with a sterile speculum examination with documentation of vaginal pooling and positive nitrazine and ferning tests. Spontaneous preterm delivery was defined as having occurred prior to the 34th week of gestation in patients with either intact membranes or PPROM. Acute histologic chorioamnionitis was diagnosed based on the presence of inflammatory cells in the chorionic plate and/or chorioamniotic membranes [98–100]. Sixty-one percent (11/18) of the cases had evidence of acute histologic chorioamnionitis.
Pregnant women who agreed to participate in the study had a speculum examination at each visit; a sample of vaginal fluid was collected under direct visualization from the posterior vaginal fornix by an obstetrician or midwife using a Dacron swab (medical packaging swab – PAK™ Carmarillo, CA, USA). The protocol called for sample collection every 4 weeks until 24 weeks of gestation, and then every 2 weeks until the last prenatal visit. Vaginal swabs were placed in a tube without any buffer and immediately stored at −70°C until assayed.
DNA extraction, amplification and pyrosequencing of barcoded V1-V3 hypervariable regions of 16S rRNA genes
Procedures for the extraction of genomic DNA from frozen vaginal swabs have been developed and validated previously [101, 95, 102]. Briefly, frozen vaginal swabs were immersed in 1 ml of sterile PBS and vortexed for 10 minutes. A total of 500 μl of the cell suspension was mixed with 500 μl of pre-warmed (55°C) cell lysis buffer composed of 0.05 M potassium phosphate buffer containing 50 μl lyzosyme (10 mg/ml), 6 μl of mutanolysin (25,000 U/ml; Sigma-Aldrich, St. Louis, MO, USA) and 3 μl of lysostaphin (4,000 U/ml in sodium acetate; Sigma-Aldrich) and the mixture was incubated for one hour at 37°C. Then 10 μl proteinase K (20 mg/ml), 100 μl 10% SDS, and 20 μl RNase A (20 mg/ml) were added and the mixture was incubated for one hour at 55 °C. The samples were transferred to a FastPrep Lysing Matrix B tube (MP Biomedical, Santa Ana, CA, USA) and microbial cells were lysed by mechanical disruption using a bead beater (FastPrep instrument, Qbiogene, Carlsbad, CA, USA) set at 6.0 m/second for 30 seconds. The lysate was processed using the CellFree500 kit on a QIAsymphony robotic platform. The DNA was eluted into 100 μl of TE (10 mM Tris–HCl, 1 mM EDTA) buffer, pH 8.0. This procedure provided between 2.5 and 5 μg of high quality whole genomic DNA from vaginal swabs.
The bacterial species composition and abundance in vaginal communities were determined using culture-independent methods. The V1-V3 hypervariable regions of the 16S rRNA gene were amplified using an optimized primer set comprising 27 F  and 534R. Because primer 534R contains a unique sample identifying barcode, up to 192 samples were sequenced per run and generated 4,000 to 6,000 sequence reads per sample. The primers were as follows:
27 F - 5′-GCCTTGCCAGCCCGCTCAGTCAGAGTTTGATCCTGGCTCAG-3′
534R - 5′-GCCTCCCTCGCGCCATCAGNNNNNNNNCATTACCGCGGCTGCTGGCA
The italicized sequences are the 454 Life Sciences® primers B and A in 27 F and 534R, respectively, and the bold font denotes the universal 16S rRNA primers 27 F and 534R. The barcode within 534R is denoted by eight Ns (but varies from six to eight Ns) and were identical to those used by the Human Microbiome Project . A mixture of bacterial 27 F primers was used to maximize sequence type discovery and eliminate the PCR amplification bias described by Frank et al. . The 27 F formulation remains relatively simple: having only seven distinct primer sequences there is minimal loss of overall amplification efficiency and specificity. The 27 F primer mixture was: four parts of four-fold-degenerate primer 27f-YM (5′-AGAGTTTGATYMTGGCTCAG, where Y is C or T) plus one part each of primers specific for the amplification of Bifidobacteriaceae (27f-Bif, 5′-AGGGTTCGATTCTGGCTCAG), Borrelia (27f-Bor, 5′-AGAGTTTGATCC TGGCTTAG), and Chlamydiales (27f-Chl, 5′-AGAATTTGATCTTGGTTCAG) sequences. This primer formulation was previously shown to better maintain the original rRNA gene ratio of Lactobacillus spp. to Gardnerella spp. in quantitative PCR assays, particularly under stringent amplification conditions .
For every set of 192 vaginal genomic DNA samples, PCR amplification of 16S rRNA genes was performed in 96-well microtiter plates as follows: 1× PCR buffer, 0.3 μM primer 27 F and 534R, 0.25 μl HotStar HiFidelity DNA polymerase (5 U/μl; Qiagen, Germantown, MD), and 25 ng of template DNA in a total reaction volume of 25 μl. Reactions were set up on a QIAgility robotic platform in a semi-sterile environment. Reactions were run in a DNA engine Tetrad2 instrument (Bio-Rad, Hercules, CA) using the following cycling parameters: 5 minutes denaturing at 95°C followed by 29 cycles of 30 seconds at 94°C (denaturing), 30 seconds at 52°C (annealing) and 60 seconds at 72°C (elongation), with a final extension at 72°C for 10 minutes. Separate plates containing negative controls without a template for each of the 96 barcoded primers were included for each set of plates processed: in our workflow, if one of these samples is positive, the samples and negative controls plates are rerun with new primers; however, no amplicons were observed in any of the no template controls. The presence of amplicons was confirmed by gel electrophoresis on a 2% agarose gel stained with SYBRGreen (Life Technologies, Carlsbad, CA, USA). PCR products were quantified using the Quant-iT Picogreen® quantification system (Life Technologies) and equimolar amounts (100 ng) of the PCR amplicons were mixed in a single tube using the QIAgility robotic platform. Amplification primers and reaction buffer were removed by processing the amplicon mixture with the Agencourt AMPure Kit (Beckman-Coulter, Pasadena, CA, USA). All PCR amplification reactions that failed were repeated twice using different amounts of template DNA, and if these failed, the samples were excluded from the analysis.
The purified amplicon mixtures were sequenced by 454 pyrosequencing using 454 Life Sciences® (Roche/454 Life Sciences, Branford, CT) primer A by the Genomics Resource Center at the Institute for Genome Sciences, University of Maryland School of Medicine, using Roche/454 Titanium chemistries and protocols recommended by the manufacturer and amended by the Center.
All sequences were trimmed before the first ambiguous base pair. The QIIME software package  was used for quality control of the sequence reads using the split-library.pl script and the following criteria: 1) minimum and maximum length of 250 bp and 450 bp; 2) an average of q25 over a sliding window of 25 bp. If the read quality dropped below q25, it was trimmed at the first base pair of the window then reassessed for length criteria; 3) a perfect match to a barcode sequence; and 4) presence of the 534R 16S primer sequence used for amplification. Sequences were binned based on sample-specific barcode sequences and trimmed by removal of the barcode and primer sequences (forward, if present, and reverse). High-quality sequence reads were first de-replicated (99% similarity) using the UCLUST software package . Detection of potential chimeric sequences was performed using the UCHIME component of UCLUST  with the de novo algorithm. Chimeric sequences were removed prior to taxonomic assignments. Taxonomic assignments were performed on each individual quality checked 16S rRNA sequence read using a combination of pplacer  and speciateIT (speciateIT.sourceforge.net). Taxonomic assignments (sequence read counts and relative abundances) are shown in Additional file 1: Table S1. All sequence data and metadata were deposited in the Sequence Read Archive (SRA; http://www.ncbi.nlm.nih.gov/Traces/sra/) under BioProject PRJNA242473 (SRA accession SRA150182, SRP040750).
The abundance of bacteria is generally expressed on a logarithmic scale (base 10), given the wide range of bacterial abundance and the exponential nature of bacterial growth under certain circumstances (for example in vitro). The standard is to compare microbial abundance over time using the difference of logs, log10 (a) - log10 (b), which is the same as the log fold change log10 (a/b), where a and b are relative abundances of a given microorganism in two samples (for example, two sampling time points).
Changes in the abundance of a complex microbial ecosystem within the same patient at different time points were estimated for specific phylotypes. We assessed the dissimilarity between community states (in other words, how divergent community states are) using the Jensen-Shannon metric . In microbial ecology, the term ‘community state’ refers to the relative abundance of all phylotypes at a particular time point in a subject; in our case, a sample of vaginal fluid.
The Jensen-Shannon divergence between two community states, p and q, is the average of the Kullback–Leibler divergences DKL
(p, a) and DKL
is the mean of p
(p,q) is the Kullback–Leibler divergence defined as:
where p = (p1, …, pn) and q = (q1, …, qn).
The Kullback–Leibler divergence DKL(p, q) calculates the mean log fold changes log (pi/qi). While the Kullback–Leibler measure is widely used, it has one drawback: its value becomes infinite if one of the components of q is zero. In contrast, the Jensen-Shannon divergence always yields a value between 0 and 1. A Jensen-Shannon divergence score of 0 means that two community states are the same. In contrast, a Jensen-Shannon divergence score of 1 means that the two community states are completely different. The square root of the Jensen-Shannon divergence is called ‘Jensen-Shannon distance.’
The term ‘community state type’ is used in microbial ecology to describe a group of community states with similar microbial phylotype composition and abundance [95, 110]. Such grouping is desirable in order to reduce dimensionality. Utilizing Jensen-Shannon divergence as a measure of dissimilarity among community states and hierarchical clustering with Ward linkage, six vaginal community state types in pregnant and non-pregnant women have been previously identified [95, 97]. Four of the community state types (I, II, III and V) are dominated by Lactobacillus spp. (Lactobacillus crispatus, L. gasseri, L. iners, and L. jensenii, respectively) and the remaining two community state types (IV-A, IV-B) consist of microbial ecosystems with a diverse array of anaerobes and strict anaerobes, and substantially lower numbers of Lactobacillus spp. than the other community state types.
Statistical procedures to evaluate the differential abundance of phylotypes between women who deliver at term and those who had a spontaneous preterm delivery
In order to assess a change in phylotype relative abundance between the two groups, we modeled the relative abundance of one phylotype at a time as a function of study group (that is, normal pregnancy versus spontaneous preterm delivery). Only phylotypes present (one read count) in 25% or more of the samples were considered in the analysis.
Read count data obtained from a longitudinal experiment design are typically modeled using generalized estimation equations (GEE) or linear mixed-effects (LME) models by assuming a Poisson or negative binomial distribution of the response. The choice of a Poisson distribution is justified when the count variance equals the count mean, while the negative binomial distribution is preferred when the mean-variance equality cannot be safely assumed.
Several phylotypes were not detected in a large proportion of samples; hence, the frequency of 0 count values in the dataset was larger than expected under a Poisson or negative binomial distribution. Therefore, models that allow for zero inflation are more appropriate; indeed, this approach has been used for decades .
To ensure a proper fit of the count data of each phylotype, we utilized zero-inflated negative binomial mixed-effects models (ZINBLME) in addition to negative binomial linear mixed effects (NBLME) and Poisson linear mixed effects (PLME) models. These three types of models were fitted to each phylotype, and the model with the lowest Akaike Information Criterion (AIC) value was retained. The P-value for the association between the microbial relative abundance and the group variable was computed only for the best model (smallest AIC).
The mixed effects modeling of the read counts data (dependent variable) on pregnancy status (independent variable) was performed using the NLMIXED procedure in SAS (version 9.3) as discussed elsewhere [97, 112, 113]. All three types of models (PLME, NBLME and ZINBLME) included an offset term (the log of the total number of reads in a given sample) to allow for a comparison of the relative abundance (and not absolute counts) between groups. The random effect in the ZINBLME models was allowed only on the non-zero inflation component (negative binomial mean).
For each of the three types of models, the reported coefficient represents the difference in mean log relative abundance between patients who subsequently had a spontaneous preterm delivery and those who delivered at term, which was further converted into a fold change. The P-value of the model with the best fit (smallest AIC) was retained and false discovery rate adjustment was applied across the phylotypes. A q-value <0.1 and fold change >1.5 were considered significant.
Analytical approach to examine changes in abundance of phylotypes with gestational age
The approach used to identify phylotypes associated with spontaneous preterm delivery, described above, was also used to characterize changes in the phylotypes’ abundance as a function of gestational age. The gestational age range over which samples were obtained in this longitudinal study of pregnant women who deliver at term was divided into three intervals: 6.9 to 22.1, 22.2 to 29.8 and 29.9 to 41 weeks. The two cut-off points at 22.1 and 29.8 weeks were selected so that the resulting three intervals had comparable gestational age windows and a comparable number of vaginal samples. The 5th and 95th percentiles of the gestational age over which samples were collected, were calculated. Then, the interval between the 5th and 95th percentile was divided into three gestational age windows (14.5 to 22.1 weeks, 22.2 to 29.8 weeks and 29.9 to 37.5 weeks).
This analysis provides a simple description of the gestational age-related trends in microbial abundance (for example, an increase in abundance of approximately two fold from the first to second interval). However, such an approach may not capture potentially more complex trends in the microbial abundance as a function of gestational age. Therefore, a secondary analysis was performed by treating gestational age as a continuous variable. Orthogonal polynomial terms based on gestational age were used as explanatory variables in a NBLME model. The response variable in this model was the observed number of reads for each phylotype in each sample. The degree of the polynomial function was selected so that the resulting model minimized the AIC criterion. The degree of the polynomial function varied from 1 to 7. The P-values for the ‘between intervals’ comparisons as well as the P-value for each polynomial term were adjusted across phylotypes. A false discovery rate of 10% was used.