
Respiratory tract clinical sample selection for microbiota analysis in patients with pulmonary tuberculosis



Changes in respiratory tract microbiota have been associated with diseases such as tuberculosis, a global public health problem that affects millions of people each year. This pilot study used sputum, oropharynx, and nasal respiratory tract samples collected from patients with pulmonary tuberculosis and from healthy control individuals to compare sample types and assess their usefulness for detecting changes in bacterial and fungal communities.


Most V1-V2 16S rRNA gene sequences belonged to the phyla Firmicutes, Bacteroidetes, Proteobacteria, Actinobacteria, and Fusobacteria, with differences in relative abundances and in specific taxa associated with each sample type. Most fungal ITS1 sequences were classified as Ascomycota and Basidiomycota, but their relative abundances differed among sample types. Several beta diversity analyses indicated that bacterial and fungal community structures in oropharynx and sputum samples were similar to one another and that both differed from those in nasal samples. The only differences between patient and control microbiota, for both bacteria and fungi, were found in oropharynx samples. Bacterial diversity was greater in sputum samples, whereas fungal diversity was greater in nasal samples.


Respiratory tract microbial communities were similar in terms of the major phyla identified, yet they varied in relative abundances and diversity indexes. Oropharynx communities varied with health status and resembled those in sputum samples, which could be collected only from tuberculosis patients because sputum is difficult to obtain from healthy individuals. These results suggest that oropharynx samples can be used to analyze alterations in community structure associated with tuberculosis.


Recent studies suggest that microbial communities inhabiting the human body can influence the host's health status and contribute to disease [1]. The human upper respiratory tract is the major portal of entry for numerous airborne microorganisms, such as bacteria, fungi, and viruses [2]. High-throughput sequencing methods have provided great insight into the composition of the respiratory tract microbiota, which has recently been associated with diseases such as asthma [3], nosocomial pneumonia, pulmonary cystic fibrosis [4], and chronic obstructive pulmonary disease [5].

Tuberculosis (TB), a respiratory disease caused by Mycobacterium tuberculosis (Mtb), is a major global public health problem that affects millions of people each year and ranks as the second leading cause of death from an infectious disease worldwide, with 8.6 million new cases and 1.3 million deaths in 2012 (25% of them HIV-associated) [6]. Mtb typically affects the lungs (pulmonary TB) but can affect other sites as well (extrapulmonary TB). Individuals with pulmonary TB can expel bacteria by talking, coughing, or sneezing, spreading the pathogen through airborne particles that are inhaled by others. The complex interaction between Mtb and the human host, and the resulting infectious process, indicate that the development of TB disease may be multifactorial [7]. The characteristics of the microorganism, together with the local host immune response, determine whether bacilli are cleared or lead to either acute or latent disease [2].

Recent studies of the respiratory tract microbiota using sputum samples and mixtures of saliva and pharyngeal secretions indicate changes and possible associations with pulmonary TB [8, 9]. In this work, we examined the microbiota in three types of respiratory tract samples: nasal swabs, oropharynx swabs, and sputum. Sputum was taken only from patients because it is difficult to obtain from healthy individuals, and bronchoalveolar lavage is even more invasive. Previous studies have shown that oropharyngeal swabs can be a reasonable proxy for lung samples [10], and an analysis in healthy individuals indicated that bacterial populations in the lung and the upper airways, including the oropharynx, were largely indistinguishable [11]. Because the resemblance between oropharyngeal and sputum communities remains unclear, the aim of this work was to compare different sample types and determine which could be used to evaluate the composition of the respiratory tract microbiota in TB patients and healthy controls.

Population and sampling

To assess the respiratory tract microbiota of TB patients and healthy controls, we collected nasal, oropharynx, and sputum samples from six TB patients and nasal and oropharynx samples from six healthy controls. The inclusion and exclusion criteria are given in Additional file 1, and the demographic and clinical characteristics of the participants are shown in Additional file 2. Nasal samples were taken by swabbing the mucosal surface of the deep nasal cavity with ten rotational movements in each nostril; oropharynx swabs were taken from the back wall of the oropharynx, avoiding contact with other surfaces such as the tonsils, palate, and tongue. Consistent with previous reports [12], the median body mass index (BMI) was significantly lower in TB patients (19.6) than in healthy controls (25.5) (Table 1). All sputum, nasal, and oropharynx samples were collected and processed as previously reported [13], and DNA was isolated with the MoBio PowerSoil DNA Isolation Kit (MO BIO Laboratories, Carlsbad, CA, USA) [14, 15] following the manufacturer's recommendations.

Table 1 Population characteristics

Bacterial diversity

The V1-V2 hypervariable region of the bacterial 16S rRNA gene was amplified with primers 27F (5′ AGAGTTTGATCCTGGCTCAG 3′) and 338R (5′ TGCTGCCTCCCGTAGGAGT 3′) [16], using 10 ng DNA and AccuPrime™ Taq DNA polymerase (Invitrogen, Thermo Fisher Scientific, Waltham, MA, USA) under the following conditions: 95°C for 3 min, followed by 35 cycles of 20 s at 95°C, 20 s at 52°C, and 60 s at 65°C, and a final step of 6 min at 72°C. Samples were sequenced using 454/Roche GS-FLX Titanium chemistry (EnGenCore, University of South Carolina, Columbia, SC, USA). Pyrosequencing reads have been submitted to the NCBI Sequence Read Archive (BioProject no. PRJNA242354).

All sequence analyses were carried out using Quantitative Insights Into Microbial Ecology (QIIME) v1.6 [17]. After quality filtering with a minimum quality score of 25 over a 40-base sliding window, approximately 589,000 sequences longer than 200 bp remained (386,645 and 202,422 reads from TB patient and control samples, respectively). The open-reference operational taxonomic unit (OTU) picking protocol, with 97% sequence identity and the Greengenes core set [18], was used to discard chimeras and sequences that were likely not rRNA. Samples were rarefied to the minimum number of reads per sample (read counts per sample ranged from 10,480 to 38,099), and taxonomic classification was performed using the Ribosomal Database Project naïve Bayesian classifier [19]. The Chao1 and Shannon indexes were calculated to estimate taxon richness and diversity, respectively, and significance was assessed with the non-parametric Mann-Whitney U test (SPSS v.18, SPSS Inc., Chicago, IL, USA). Sputum samples had the highest diversity, followed by oropharynx samples, whereas nasal samples were the least diverse. Both nasal and oropharynx samples from healthy controls were more diverse than those from TB patients, with a significant difference in the Shannon index for nasal samples (Table 2).
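
The rarefaction and alpha-diversity steps described above can be sketched in plain Python. This is a minimal illustration, not the QIIME v1.6 code used in the study: the function names are hypothetical, Shannon entropy is assumed to be computed in base 2, and Chao1 is assumed to use its bias-corrected form, so values may differ slightly from those in Table 2.

```python
import math
import random
from collections import Counter


def rarefy(counts, depth, seed=0):
    """Subsample `depth` reads without replacement from one sample's OTU counts."""
    if depth > sum(counts):
        # Samples below the rarefaction depth are dropped rather than rarefied.
        raise ValueError("sample has fewer reads than the rarefaction depth")
    # Expand the count vector into one OTU index per read, then draw reads.
    reads = [otu for otu, n in enumerate(counts) for _ in range(n)]
    drawn = Counter(random.Random(seed).sample(reads, depth))
    return [drawn.get(otu, 0) for otu in range(len(counts))]


def shannon(counts, base=2):
    """Shannon index H = -sum(p_i * log(p_i)) over OTUs with nonzero counts."""
    total = sum(counts)
    return -sum((n / total) * math.log(n / total, base) for n in counts if n > 0)


def chao1(counts):
    """Bias-corrected Chao1 richness: S_obs + F1 * (F1 - 1) / (2 * (F2 + 1))."""
    s_obs = sum(1 for n in counts if n > 0)
    f1 = sum(1 for n in counts if n == 1)  # singletons
    f2 = sum(1 for n in counts if n == 2)  # doubletons
    return s_obs + f1 * (f1 - 1) / (2 * (f2 + 1))


# Hypothetical OTU counts for one sample (not data from this study).
sample = [10, 5, 2, 1, 1, 0]
print(chao1(sample))  # → 5.5 (5 observed OTUs, 2 singletons, 1 doubleton)
print(round(shannon([1, 1, 1, 1]), 6))  # → 2.0 (four equally abundant OTUs)
print(sum(rarefy(sample, depth=10)))  # → 10
```

Rarefying every sample to the same depth, as done here with the smallest library, keeps richness estimates such as Chao1 comparable, because observed richness grows with sequencing effort.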
Most sequences in all samples (>99% in TB patients and >98% in healthy controls) belonged to five phyla, Firmicutes, Bacteroidetes, Proteobacteria, Actinobacteria, and Fusobacteria, consistent with previous reports [9, 20, 21] (Figure 1A). White's non-parametric t test for pairwise comparisons [22], ANOVA for multiple comparisons, and false discovery rate (FDR) correction, all implemented in the STAMP software [23], were used to identify groups characteristic of each sample type. At the phylum level, STAMP identified only Bacteroidetes (p = 0.017) and Thermi (p = 0.020) as significantly different among sample types (nasal, oropharynx, and sputum). Principal coordinate analysis (PCoA) and unweighted pair group method with arithmetic mean (UPGMA) clustering showed that oropharynx and sputum communities clustered together, whereas nasal samples clustered separately, consistent with previous analyses of oropharynx and nasal communities [14, 20] (Figure 1). To test whether communities from the same sample type were more similar to one another than to communities from other sample types, between-group and within-group UniFrac distances were compared with Student's t test, using permutations to assess significance. Oropharynx communities were as similar to sputum communities as they were to each other (p > 0.05, data not shown), and sputum communities were likewise indistinguishable from oropharynx communities, indicating that the two are closely related.
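
The within-group versus between-group comparison can be illustrated with a small permutation test on a precomputed distance matrix (for example, UniFrac distances). This is a sketch under stated assumptions: the study used Student's t test with permutations in QIIME, whereas this example swaps in the simpler difference of mean distances as the test statistic, and the helper names and toy matrix are hypothetical.

```python
import itertools
import random
from statistics import mean


def split_distances(dist, labels, group_a, group_b):
    """Collect within-A distances and A-to-B distances from a symmetric matrix."""
    within, between = [], []
    for i, j in itertools.combinations(range(len(labels)), 2):
        pair = (labels[i], labels[j])
        if pair == (group_a, group_a):
            within.append(dist[i][j])
        elif set(pair) == {group_a, group_b}:
            between.append(dist[i][j])
    return within, between


def permutation_test(dist, labels, group_a, group_b, n_perm=999, seed=0):
    """One-sided test: are A-to-B distances larger than distances within A?

    Group A needs at least two samples so that within-A distances exist.
    """
    def statistic(lab):
        within, between = split_distances(dist, lab, group_a, group_b)
        return mean(between) - mean(within)

    observed = statistic(labels)
    rng = random.Random(seed)
    shuffled = list(labels)
    hits = 0
    for _ in range(n_perm):
        rng.shuffle(shuffled)  # break any link between labels and samples
        if statistic(shuffled) >= observed:
            hits += 1
    # Add-one correction so the permutation p-value is never exactly zero.
    return observed, (hits + 1) / (n_perm + 1)


# Hypothetical 4 + 4 design: two tight groups that are far from each other.
labels = ["oropharynx"] * 4 + ["nasal"] * 4
dist = [[0.0 if i == j else (0.1 if labels[i] == labels[j] else 0.9)
         for j in range(8)] for i in range(8)]
obs, p = permutation_test(dist, labels, "oropharynx", "nasal")
print(round(obs, 3))  # → 0.8
print(p)  # small: only the original split (or its mirror image) reaches 0.8
```

In this toy matrix the groups are clearly separated, so the p-value is small. In the study's setting, the analogous oropharynx-versus-sputum comparison was not significant (p > 0.05).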

Table 2 Sequence data and diversity indexes
Figure 1

Analysis of bacterial 16S rRNA gene sequences. (A) Taxonomic classification (bottom) and UPGMA clustering based on the unweighted UniFrac metric (top) for sequences obtained from TB patient (P) or healthy control (C) sputum (S), oropharynx (O), and nasal (N) samples. Different individuals are indicated by numbers. (B) PCoA based on weighted UniFrac distances for sputum (green), oropharynx (blue), and nasal (red) samples from controls (squares) and patients (circles).

These differences reflected a higher abundance of some phyla, particularly Bacteroidetes and Fusobacteria in oropharynx samples and Thermi in nasal swabs (p = 0.034, 0.030, and 0.031, respectively). Fourteen taxa differed significantly between nasal and oropharynx samples when patient and control groups were analyzed together, but only some of these differed within each group: one phylum in patients and three phyla in controls (Table 3). Sputum and oropharynx communities could be compared only for TB patients, from whom both sample types were collected; the only observed difference was in Actinobacteria, which was significantly more abundant in sputum samples (Figure 1A, Table 3), and no significant differences were found at other taxonomic levels. As expected, sequences belonging to the genus Mycobacterium were detected only in sputum, not in patient oropharynx samples, consistent with culture results.
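
The false discovery rate correction that STAMP applied to the many taxon-level comparisons described above can be illustrated with the Benjamini-Hochberg step-up procedure. The text does not specify which FDR procedure was used, so Benjamini-Hochberg is an assumption here, and the p-values in the example are hypothetical.

```python
def benjamini_hochberg(pvalues):
    """Return Benjamini-Hochberg adjusted p-values in the original order."""
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])  # indices, smallest p first
    adjusted = [0.0] * m
    running_min = 1.0
    # Step up from the largest p-value, keeping adjusted values monotone.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, pvalues[i] * m / rank)
        adjusted[i] = running_min
    return adjusted


# Hypothetical raw p-values from four taxon-level comparisons.
raw = [0.01, 0.04, 0.03, 0.20]
q = benjamini_hochberg(raw)
print([round(v, 4) for v in q])  # → [0.04, 0.0533, 0.0533, 0.2]
print([p_adj <= 0.05 for p_adj in q])  # → [True, False, False, False]
```

The example shows why a correction matters when many taxa are tested at once: a raw p-value of 0.03 or 0.04 can lose significance after adjustment.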

Table 3 Phyla that differ significantly between sample types

Samples were also analyzed for changes in respiratory tract bacterial communities associated with health status. Whether nasal and oropharynx samples were analyzed separately or together, the only difference between patient and control groups was found in oropharynx samples, in which unclassified sequences belonging to the family Streptococcaceae were more abundant in TB patients (p = 0.00878, not shown). Taken together, these observations suggest alterations in these communities and raise the possibility that such imbalances could affect, or result from, infection and/or colonization.

Fungal diversity

The fungal nuclear ribosomal internal transcribed spacer 1 (ITS1) region was amplified using primers ITS-5 (5′GGAAGTAAAAGTCGTAACAAGG3′) and ITS-2 (5′GCTGCGTTCTTCATCGATGC3′) [24] under the conditions described above for Bacteria, except for 35 cycles of 60 s at 94°C, 60 s at 55.2°C, and 90 s at 72°C, followed by a final extension of 10 min at 72°C. Amplicons were pyrosequenced, and sequences were analyzed as described above for Bacteria. Of 783,925 raw sequences, 268,751 (34.3%) longer than 100 bp were retained after quality filtering. Chimeras and non-rRNA sequences were discarded as described above for Bacteria, using the set of fungal ITS sequences from the UNITE database [25] at 97% sequence identity. Samples were rarefied to 2,076 reads per sample (read counts per sample ranged from 1 to 42,479), which left only 17 of 18 patient samples and 7 of 12 control samples. Nasal samples showed greater fungal richness and diversity, although differences between patients and controls in samples of the same type were not significant (Table 2). Overall, most ITS1 sequences analyzed (90%) were classified as Ascomycota, followed by Basidiomycota. This pattern was observed for all sample types except nasal samples from healthy controls (Figure 2) and is consistent with previous fungal analyses of the nares [26]. However, unlike previous reports for diverse skin sites, the genus Malassezia was not predominant in this study, probably because of the different environmental conditions of the body sites sampled [26]. Again, communities clustered according to sample type (oropharynx, nasal, and sputum) (Figure 2), and TB patient sputum and oropharynx samples showed similar relative abundances, with no significant differences at the phylum level (Figure 2, Table 3).
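
The Mann-Whitney U comparisons of diversity indexes between patients and controls involve very small groups (at most six per group, and fewer for the rarefied fungal data), so the exact null distribution can be enumerated directly. The sketch below is a generic exact test that uses midranks for ties; it is not SPSS's implementation, which may report asymptotic or differently computed exact p-values, and the function names and input values are hypothetical.

```python
from itertools import combinations


def midranks(values):
    """1-based ranks, with tied values sharing the average of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    start = 0
    while start < len(order):
        end = start
        while end + 1 < len(order) and values[order[end + 1]] == values[order[start]]:
            end += 1
        shared = (start + end) / 2 + 1  # mean of positions start+1 .. end+1
        for k in range(start, end + 1):
            ranks[order[k]] = shared
        start = end + 1
    return ranks


def mann_whitney_exact(x, y):
    """Return (U for x, exact two-sided p) by enumerating every relabeling."""
    n1, n2 = len(x), len(y)
    ranks = midranks(list(x) + list(y))

    def u_stat(indices):
        return sum(ranks[i] for i in indices) - n1 * (n1 + 1) / 2

    centre = n1 * n2 / 2
    u_obs = u_stat(range(n1))
    deviation = abs(u_obs - centre)
    splits = list(combinations(range(n1 + n2), n1))  # 924 splits for 6 vs 6
    extreme = sum(1 for s in splits if abs(u_stat(s) - centre) >= deviation - 1e-9)
    return u_obs, extreme / len(splits)


# Hypothetical Shannon indexes for three patients and three controls.
u, p = mann_whitney_exact([1.0, 2.0, 3.0], [4.0, 5.0, 6.0])
print(u, p)  # → 0.0 0.1
```

Even with complete separation between the groups, three samples per group cannot reach p < 0.05 in a two-sided exact test, which illustrates how little power such small groups provide.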
Significant differences were observed only when patient nasal communities were compared with those of the oropharynx (Ascomycota and unclassified sequences) or sputum (unclassified sequences) (Table 3). As for Bacteria, differences between patients and controls were observed only in oropharynx samples, with a decrease in the genus Cryptococcus in patients (p < 1e-15, not shown). In TB patients, Candida and Aspergillus were the most frequent genera in both sputum and oropharyngeal samples, although no significant differences were found in comparison with healthy controls. In contrast to Bacteria, significant phylum-level differences between oropharynx and nasal communities were seen only in TB patients, not in controls (Table 3). Previous work on skin microbial communities showed that bacterial and fungal richness were not linearly correlated and that diversity depended on body site [26]. Similarly, in this study, bacterial and fungal diversity varied inversely across sample types: bacterial diversity was greater in oropharynx than in nasal samples, whereas fungi were more diverse in nasal than in oropharynx samples (Table 2).

Figure 2

Phylum level analysis of fungal ITS1 sequences. The bottom shows classification for sequences obtained from TB patient (P) and healthy control (C) sputum (S), oropharynx (O), and nasal (N) samples. The top indicates clustering analysis based on Jaccard distances.
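
The clustering shown in Figure 2 can be sketched as presence/absence (binary) Jaccard distances between samples, followed by average-linkage hierarchical clustering (UPGMA). The caption names only the Jaccard distance, so the use of its binary form and of UPGMA linkage (as used for the bacterial tree in Figure 1) is an assumption, and the sample names and OTU counts are hypothetical.

```python
import itertools


def jaccard_distance(a, b):
    """Binary Jaccard distance: 1 - |shared OTUs| / |OTUs present in either sample|."""
    present_a = {i for i, n in enumerate(a) if n > 0}
    present_b = {i for i, n in enumerate(b) if n > 0}
    union = present_a | present_b
    return 0.0 if not union else 1 - len(present_a & present_b) / len(union)


def upgma(names, dist):
    """Average-linkage (UPGMA) clustering; returns the tree as nested tuples."""
    # cluster id -> (subtree, indices of the original samples it contains)
    clusters = {i: (name, [i]) for i, name in enumerate(names)}

    def linkage(c1, c2):
        members1, members2 = clusters[c1][1], clusters[c2][1]
        total = sum(dist[i][j] for i in members1 for j in members2)
        return total / (len(members1) * len(members2))

    next_id = len(names)
    while len(clusters) > 1:
        # Merge the pair of clusters with the smallest mean pairwise distance.
        c1, c2 = min(itertools.combinations(clusters, 2), key=lambda pair: linkage(*pair))
        tree1, members1 = clusters.pop(c1)
        tree2, members2 = clusters.pop(c2)
        clusters[next_id] = ((tree1, tree2), members1 + members2)
        next_id += 1
    return next(iter(clusters.values()))[0]


# Hypothetical fungal OTU counts for two patient sputum (PS) and two nasal (PN) samples.
samples = {"PS1": [5, 3, 0, 0], "PS2": [2, 4, 0, 1], "PN1": [0, 0, 7, 2], "PN2": [0, 1, 6, 3]}
names = list(samples)
dist = [[jaccard_distance(samples[a], samples[b]) for b in names] for a in names]
print(upgma(names, dist))  # → (('PS1', 'PS2'), ('PN1', 'PN2'))
```

Because the binary Jaccard distance ignores abundances, samples cluster by which taxa they share, so sample types with distinct fungal membership separate even when their dominant phyla overlap.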


Differences in community diversity indexes and in the abundance of particular taxa between TB patients and healthy controls, specifically in oropharynx communities, suggest a disturbance of respiratory tract microbial communities despite their overall similarity in terms of the major phyla identified. These altered communities could either result from or influence infection and/or colonization by M. tuberculosis, a possibility that can be examined further by studying changes in particular taxa, or in community function via metagenomic sequencing, in samples collected at various time points. Importantly, communities from the sputum of TB patients resembled those in the oropharynx, and both were distinct from the nasal microbiota. This study therefore indicates that oropharynx samples can be valuable for probing the respiratory tract microbiota and lays the groundwork for more extensive comparisons and analyses of possible microbial community imbalances associated with diseases such as TB.

Ethics statement

The research complied with the standards and recommendations for biomedical research involving human subjects adopted by the 18th World Medical Assembly (Helsinki, Finland, June 1964), as amended at the 59th WMA General Assembly (Seoul, 2008). Ethical standards also complied with resolution N°008430 (1993) of the Colombian Ministry of Health for research involving humans. The study was approved by the Ethics Committees of Corporación Corpogen (Bogotá) and Corporación para Investigaciones Biológicas-CIB (Medellín) and by the Research Committee of METROSALUD, ESE (Medellín), and written informed consent was obtained from all participants prior to enrollment.



Abbreviations

BMI: Body mass index

HIV: Human immunodeficiency virus

ITS1: Internal transcribed spacer region 1

Mtb: Mycobacterium tuberculosis

OTU: Operational taxonomic unit

PCoA: Principal coordinate analyses

QIIME: Quantitative Insights Into Microbial Ecology

TB: Tuberculosis

UPGMA: Unweighted pair group method with arithmetic mean


  1. Littman DR, Pamer EG: Role of the commensal microbiota in normal and pathogenic host immune responses. Cell Host Microbe. 2011, 10: 311-323.

  2. Delhaes L, Monchy S, Frealle E, Hubans C, Salleron J, Leroy S, Prevotat A, Wallet F, Wallaert B, Dei-Cas E, Sime-Ngando T, Chabé M, Viscogliosi E: The airway microbiota in cystic fibrosis: a complex fungal and bacterial community—implications for therapeutic management. PLoS One. 2012, 7: e36313.

  3. Marri PR, Stern DA, Wright AL, Billheimer D, Martinez FD: Asthma-associated differences in microbial composition of induced sputum. J Allergy Clin Immunol. 2013, 131: 346-352. e341-343

  4. Zhou Y, Lin P, Li Q, Han L, Zheng H, Wei Y, Cui Z, Ni Y, Guo X: Analysis of the microbiota of sputum samples from patients with lower respiratory tract infections. Acta Biochim Biophys Sin (Shanghai). 2010, 42: 754-761.

  5. Cabrera-Rubio R, Garcia-Nunez M, Seto L, Anto JM, Moya A, Monso E, Mira A: Microbiome diversity in the bronchial tracts of patients with chronic obstructive pulmonary disease. J Clin Microbiol. 2012, 50: 3562-3568.

  6. World Health Organization: Global Tuberculosis Report 2013. Accessed 30 October 2013.

  7. Sharma SK, Mohan A: Tuberculosis: from an incurable scourge to a curable disease—journey over a millennium. Indian J Med Res. 2013, 137: 455-493.

  8. Cui Z, Zhou Y, Li H, Zhang Y, Zhang S, Tang S, Guo X: Complex sputum microbial composition in patients with pulmonary tuberculosis. BMC Microbiol. 2012, 12: 276.

  9. Cheung MK, Lam WY, Fung WY, Law PT, Au CH, Nong W, Kam KM, Kwan HS, Tsui SK: Sputum microbiota in tuberculosis as revealed by 16S rRNA pyrosequencing. PLoS One. 2013, 8: e54574.

  10. Klepac-Ceraj V, Lemon KP, Martin TR, Allgaier M, Kembel SW, Knapp AA, Lory S, Brodie EL, Lynch SV, Bohannan BJ, Green JL, Maurer BA, Kolter R: Relationship between cystic fibrosis respiratory tract bacterial communities and age, genotype, antibiotics and Pseudomonas aeruginosa. Environ Microbiol. 2010, 12: 1293-1303.

  11. Charlson ES, Bittinger K, Haas AR, Fitzgerald AS, Frank I, Yadav A, Bushman FD, Collman RG: Topographical continuity of bacterial populations in the healthy human respiratory tract. Am J Respir Crit Care Med. 2011, 184: 957-963.

  12. Tverdal A: Body mass index and incidence of tuberculosis. Eur J Respir Dis. 1986, 69: 355-362.

  13. Human Microbiome Project Consortium: Structure, function and diversity of the healthy human microbiome. Nature. 2012, 486: 207-214.

  14. Charlson ES, Chen J, Custers-Allen R, Bittinger K, Li H, Sinha R, Hwang J, Bushman FD, Collman RG: Disordered microbial communities in the upper respiratory tract of cigarette smokers. PLoS One. 2010, 5: e15216.

  15. Wu GD, Lewis JD, Hoffmann C, Chen YY, Knight R, Bittinger K, Hwang J, Chen J, Berkowsky R, Nessel L, Li H, Bushman FD: Sampling and pyrosequencing methods for characterizing bacterial communities in the human gut using 16S sequence tags. BMC Microbiol. 2010, 10: 206.

  16. Hamady M, Walker JJ, Harris JK, Gold NJ, Knight R: Error-correcting barcoded primers for pyrosequencing hundreds of samples in multiplex. Nat Methods. 2008, 5: 235-237.

  17. Caporaso JG, Kuczynski J, Stombaugh J, Bittinger K, Bushman FD, Costello EK, Fierer N, Pena AG, Goodrich JK, Gordon JI, Huttley GA, Kelley ST, Knights D, Koenig JE, Ley RE, Lozupone CA, McDonald D, Muegge BD, Pirrung M, Reeder J, Sevinsky JR, Turnbaugh PJ, Walters WA, Widmann J, Yatsunenko T, Zaneveld J, Knight R: QIIME allows analysis of high-throughput community sequencing data. Nat Methods. 2010, 7: 335-336.

  18. DeSantis TZ, Hugenholtz P, Larsen N, Rojas M, Brodie EL, Keller K, Huber T, Dalevi D, Hu P, Andersen GL: Greengenes, a chimera-checked 16S rRNA gene database and workbench compatible with ARB. Appl Environ Microbiol. 2006, 72: 5069-5072.

  19. Wang Q, Garrity GM, Tiedje JM, Cole JR: Naive Bayesian classifier for rapid assignment of rRNA sequences into the new bacterial taxonomy. Appl Environ Microbiol. 2007, 73: 5261-5267.

  20. Lemon KP, Klepac-Ceraj V, Schiffer HK, Brodie EL, Lynch SV, Kolter R: Comparative analyses of the bacterial microbiota of the human nostril and oropharynx. MBio. 2010, 1: e00129-10.

  21. Ling Z, Liu X, Luo Y, Yuan L, Nelson KE, Wang Y, Xiang C, Li L: Pyrosequencing analysis of the human microbiota of healthy Chinese undergraduates. BMC Genomics. 2013, 14: 390.

  22. White JR, Nagarajan N, Pop M: Statistical methods for detecting differentially abundant features in clinical metagenomic samples. PLoS Comput Biol. 2009, 5: e1000352.

  23. Parks DH, Beiko RG: Identifying biologically relevant differences between metagenomic communities. Bioinformatics. 2010, 26: 715-721.

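  24. White TJ, Bruns T, Lee S, Taylor J: Amplification and direct sequencing of fungal ribosomal RNA genes for phylogenetics. In PCR Protocols: A Guide to Methods and Applications. Edited by Innis MA, Gelfand DH, Sninsky JJ, White TJ. 1990, San Diego: Academic Press, 315-322.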

  25. Koljalg U, Larsson KH, Abarenkov K, Nilsson RH, Alexander IJ, Eberhardt U, Erland S, Hoiland K, Kjoller R, Larsson E, Pennanen T, Sen R, Taylor AF, Tedersoo L, Vrålstad T, Ursing BM: UNITE: a database providing web-based methods for the molecular identification of ectomycorrhizal fungi. New Phytol. 2005, 166: 1063-1068.

  26. Findley K, Oh J, Yang J, Conlan S, Deming C, Meyer JA, Schoenfeld D, Nomicos E, Park M, Kong HH, Segre JA, NIH Intramural Sequencing Center Comparative Sequencing Program: Topographic diversity of fungal and bacterial communities in human skin. Nature. 2013, 498: 367-370.



This work was funded by Colciencias (Grant No. 657049326148). We would like to thank the program ‘Habitante de Calle,’ Secretaría de Salud, Medellín, Colombia, and Dr. Lucas Arias, for facilitating access to the institution and to patients with pulmonary TB diagnosis. We would also like to thank Alejandro Reyes and Silvia Restrepo (Universidad de Los Andes, Bogotá, Colombia) for their input and experimental assistance during the development of this research.

Author information


Corresponding author

Correspondence to María Mercedes Zambrano.

Additional information

Competing interests

The authors declare that they have no competing interests.

Authors' contributions

PDP, MMZ, JR, and LEB were involved in experimental design. LEB assembled the population and epidemiological data and performed the sampling. LEB and MLC were involved in the molecular procedures. LDS, JRB, and JMA developed workflows for the analysis of bacterial and fungal diversity. LEB, LDS, MLC, JMA, and MMZ analyzed the data. LEB, LDS, MLC, and MMZ wrote and edited the paper. All authors read and approved the final manuscript.

Luz Elena Botero and Luisa Delgado-Serrano contributed equally to this work.

Electronic supplementary material


Additional file 1: Definition of groups and inclusion and exclusion criteria. Parameters used to select the individuals for the study. (PDF 58 KB)


Additional file 2: Demographic and clinical characteristics of the population. Metadata describing the most important demographic and clinical features of the study individuals. (XLS 29 KB)

Rights and permissions

Open Access  This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made.

The images or other third party material in this article are included in the article’s Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article’s Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder.

To view a copy of this licence, visit

The Creative Commons Public Domain Dedication waiver applies to the data made available in this article, unless otherwise stated in a credit line to the data.

About this article

Cite this article

Botero, L.E., Delgado-Serrano, L., Cepeda, M.L. et al. Respiratory tract clinical sample selection for microbiota analysis in patients with pulmonary tuberculosis. Microbiome 2, 29 (2014).

  • Received:

  • Accepted:

  • Published:

  • DOI: