Microbial function and genital inflammation in young South African women at high risk of HIV infection

Background
Female genital tract (FGT) inflammation is an important risk factor for HIV acquisition. The FGT microbiome is closely associated with inflammatory profile; however, the relative importance of microbial activities has not been established. Since proteins are key elements representing actual microbial functions, this study used metaproteomics to evaluate the relationship between FGT microbial function and inflammation in 113 young and adolescent South African women at high risk of HIV infection. Women were grouped as having low, medium or high FGT inflammation by K-means clustering according to pro-inflammatory cytokine concentrations.

Results
A total of 3,186 microbial and human proteins were identified in lateral vaginal wall swabs using liquid chromatography-tandem mass spectrometry, while 94 microbial taxa were included in the taxonomic analysis. Both metaproteomics and 16S rRNA gene sequencing analyses showed increased non-optimal bacteria and decreased lactobacilli in women with FGT inflammatory profiles. However, the predicted relative abundance of most bacteria differed between 16S rRNA gene sequencing and metaproteomics analyses. Bacterial protein functional annotations (gene ontology) predicted inflammatory cytokine profiles more accurately than bacterial relative abundance determined by 16S rRNA gene sequence analysis, as well as functional predictions based on 16S rRNA gene sequence data (p<0.0001). The majority of microbial biological processes were underrepresented in women with high inflammation compared to those with low inflammation, including a Lactobacillus-associated signature of reduced cell wall organization and peptidoglycan biosynthesis. This signature remained associated with high FGT inflammation in a subset of 74 women nine weeks later, was upheld after adjusting for Lactobacillus relative abundance, and was associated with in vitro inflammatory cytokine responses to Lactobacillus isolates from the same women. Reduced cell wall organization and peptidoglycan biosynthesis were also associated with high FGT inflammation in an independent sample of ten women.

Conclusions
Both the presence of specific microbial taxa in the FGT and their properties and activities are critical determinants of FGT inflammation. Our findings support those of previous studies suggesting that peptidoglycan is directly immunosuppressive, and identify a possible avenue for biotherapeutic development to reduce inflammation in the FGT. To facilitate further investigations of microbial activities, we have developed the FGT-METAP application, which is available at (http://immunodb.org/FGTMetap/).

Keywords: metaproteomics; microbiome; microbial function; female genital tract; inflammation; cytokines

Bacterial vaginosis (BV), non-optimal cervicovaginal bacteria, and sexually transmitted infections (STIs), which are highly prevalent in South African women, are likely important drivers of genital inflammation in this population [9,10]. BV and non-optimal bacteria have been consistently associated with a marked increase in pro-inflammatory cytokine concentrations [9-12]. Conversely, women who have "optimal" vaginal microbiota, consisting primarily of Lactobacillus spp., have low levels of genital inflammation and a reduced risk of acquiring HIV [13]. Lactobacilli and the lactic acid that they produce may actively suppress inflammatory responses and thus may play a significant role in modulating immune profiles in the FGT [14-16]. However, partly due to the complexity and diversity of the microbiome, the immunomodulatory mechanisms of specific vaginal bacterial species are not fully understood [17]. Further adding to this complexity, the cervicovaginal microbiota differ substantially by geographical location and ethnicity, and the properties of different strains within particular microbial species are also highly variable [15,18].

Of the proteins identified in lateral vaginal wall swab samples, 38.8% were human, 55.8% were bacterial, 2.7% fungal, 0.4% archaeal, 0.09% viral, and 2% were grouped as "other" (not shown).

However, as the majority of taxa identified had only a single protein detected, a more stringent cut-off was applied to include only taxa with ≥3 detected proteins, or with 2 proteins detected in multiple samples. The final curated dataset included 44% human, 55% bacterial (n=81 taxa), and 1% fungal (n=13 taxa) proteins (Fig. 1a-d; Additional file 2: Table S2). When the relative abundance of the most abundant bacterial taxa identified using metaproteomics was compared with that of the most abundant bacteria identified using 16S rRNA gene sequencing, a large degree of similarity was found at the genus level: 6/9 of the genera identified using 16S rRNA gene sequence analysis were also identified using metaproteomics (Fig. 1e).
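The taxon-level cut-off described above can be sketched in Python as follows. This is a minimal illustration, not the study's pipeline: it assumes the input is a flat list of (taxon, protein, sample) detections and interprets "2 proteins detected in multiple samples" as a taxon with exactly two distinct proteins observed across at least two samples. The function name `filter_taxa` and its parameters are hypothetical.

```python
from collections import defaultdict


def filter_taxa(detections, min_proteins=3, min_samples_for_pair=2):
    """Return the taxa that pass a protein-count cut-off.

    detections: iterable of (taxon, protein_id, sample_id) tuples.
    A taxon is kept if it has >= min_proteins distinct proteins, or exactly
    two distinct proteins detected across >= min_samples_for_pair samples.
    """
    proteins = defaultdict(set)  # taxon -> distinct protein IDs
    samples = defaultdict(set)   # taxon -> samples in which it was detected
    for taxon, protein, sample in detections:
        proteins[taxon].add(protein)
        samples[taxon].add(sample)

    kept = set()
    for taxon, prots in proteins.items():
        if len(prots) >= min_proteins:
            kept.add(taxon)
        elif len(prots) == 2 and len(samples[taxon]) >= min_samples_for_pair:
            kept.add(taxon)
    return kept


# Hypothetical detections: taxon C has two proteins but only in one sample,
# and taxon D has a single protein, so both are removed.
detections = [
    ("A", "p1", "s1"), ("A", "p2", "s1"), ("A", "p3", "s2"),
    ("B", "p4", "s1"), ("B", "p5", "s2"),
    ("C", "p6", "s1"), ("C", "p7", "s1"),
    ("D", "p8", "s1"), ("D", "p8", "s2"),
]
print(sorted(filter_taxa(detections)))
```

Using sets per taxon means repeated detections of the same protein (as for taxon D) count once toward the protein total but still add samples.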

Significantly overrepresented pathways in women with high versus low inflammation included multiple inflammatory processes, such as chronic response to antigenic stimulus, positive regulation of IL-6 production and inflammatory response (FDR adj. p<0.0001 for all; Additional file 1: Fig. S5).

Weighted correlation network analysis of microbial and host proteins identified five modules (clusters) representing co-correlations between microbial and host proteins (Fig. 2a, b). The yellow module included primarily L. iners proteins, while the turquoise module consisted primarily of L. crispatus and host proteins. The grey and brown modules consisted entirely of host proteins, and the blue module included non-optimal bacteria and host proteins. Pro-inflammatory cytokines correlated inversely with the Lactobacillus modules (yellow and turquoise) and positively with the grey, brown and blue modules. Chemokines were significantly positively correlated with the brown and grey modules and inversely with the turquoise module (Fig. 2c). The blue, yellow, turquoise and grey modules correlated with BV Nugent score, while BV showed no significant correlation with the brown module (Fig. 2d).
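WGCNA summarizes each module by its eigengene (the first principal component of the module's abundance matrix) and correlates that summary with traits such as cytokine concentrations. The Python sketch below illustrates module-trait correlation under a simplifying assumption: the module score is the mean of per-protein z-scores rather than the true eigengene, which keeps the example dependency-free and is a reasonable stand-in only when the module's proteins are strongly co-correlated. All function names and example values are hypothetical.

```python
from statistics import mean, pstdev


def zscore(values):
    """Standardize one protein's abundances across samples (population SD)."""
    m, s = mean(values), pstdev(values)
    return [(v - m) / s for v in values]


def module_score(protein_rows):
    """Simplified module summary: mean z-score per sample.

    protein_rows: one list of abundances per protein, all in the same sample order.
    (WGCNA proper uses the module eigengene, i.e. the first principal component.)
    """
    z_rows = [zscore(row) for row in protein_rows]
    return [mean(sample_vals) for sample_vals in zip(*z_rows)]


def pearson(x, y):
    """Pearson correlation coefficient between two equal-length sequences."""
    mx, my = mean(x), mean(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sum((a - mx) ** 2 for a in x) ** 0.5
    sy = sum((b - my) ** 2 for b in y) ** 0.5
    return cov / (sx * sy)


# Hypothetical module of two co-varying proteins measured in four samples,
# and a pro-inflammatory cytokine that rises as the module falls.
module = [[4.0, 3.0, 2.0, 1.0], [8.0, 6.0, 4.0, 2.0]]
cytokine = [1.0, 2.0, 3.0, 4.0]
r = pearson(module_score(module), cytokine)
print(round(r, 3))
```

A negative coefficient, as here, corresponds to the inverse module-cytokine relationships reported for the Lactobacillus modules. Note that `zscore` fails for a protein with constant abundance, which real data would need to filter first.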

For STIs, no significant correlations (p<0.05) with any of the five modules were detected (Fig. 2d). The finding that the brown module did not include any microbial proteins and was not associated with BV or STIs suggests the presence of inflammatory processes independent of the microbiota.

To determine whether the co-correlations of microbial proteins with host proteins had functional meaning, we profiled the top biological processes of the proteins, as shown in the row sidebar in Fig. 2a.

Host proteins that correlated negatively with the blue module (consisting of non-optimal bacteria) were mostly involved in cell adhesion, cell-cell adhesion, cell-cell junction assembly, and regulation of cell adhesion pathways. This suggests reduced epithelial barrier function associated with G. vaginalis, Prevotella spp., Megasphaera, and A. vaginae. Host proteins involved in immune system and inflammatory response pathways were positively correlated with the brown and grey modules and inversely associated with the turquoise module, which included L. crispatus. Despite variability in the microbial proteins and taxa, redundant microbial functional pathways were identified across different modules, indicating that the microbiota share a set of common metabolic functions (e.g. glucose metabolism, protein translation and synthesis), as expected. Although most Lactobacillus proteins were negatively associated with BV and pro-inflammatory cytokines (see turquoise and yellow modules in Fig. 2a), almost all L. iners proteins clustered in the yellow module, with functional pathways slightly different from those of the Lactobacillus proteins in the turquoise module.

Microbial function predicts genital inflammation with greater accuracy than taxa relative abundance

The accuracies of (i) microbial relative abundance (determined using 16S rRNA gene sequencing), (ii) functional prediction based on 16S rRNA gene sequence data, (iii) microbial protein relative abundance (determined using metaproteomics) and (iv) microbial molecular function (determined by aggregation of GOs of proteins identified using metaproteomics) for prediction of genital inflammation status (low, medium and high groups) were evaluated using random forest analysis.
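Feature set (iv) requires collapsing protein-level abundances into GO-term-level values. The text does not specify the aggregation rule, so the Python sketch below assumes simple summation: each protein's abundance is added to every GO term it is annotated with, per sample. The function name and protein IDs are hypothetical; the two GO IDs are the standard identifiers for peptidoglycan biosynthetic process (GO:0009252) and cell wall organization (GO:0071555).

```python
from collections import defaultdict


def aggregate_by_go(protein_abundance, protein_go):
    """Sum protein abundances into GO-term totals, per sample.

    protein_abundance: {sample_id: {protein_id: abundance}}
    protein_go: {protein_id: set of GO term IDs}
    A protein annotated with several GO terms contributes to each of them;
    proteins without annotation are ignored.
    """
    totals_by_sample = {}
    for sample, proteins in protein_abundance.items():
        totals = defaultdict(float)
        for protein, value in proteins.items():
            for term in protein_go.get(protein, ()):
                totals[term] += value
        totals_by_sample[sample] = dict(totals)
    return totals_by_sample


# Hypothetical proteins annotated with peptidoglycan biosynthetic process
# (GO:0009252) and/or cell wall organization (GO:0071555).
abundance = {"s1": {"protA": 10.0, "protB": 5.0}, "s2": {"protA": 2.0}}
annotation = {"protA": {"GO:0009252", "GO:0071555"}, "protB": {"GO:0071555"}}
result = aggregate_by_go(abundance, annotation)
print(result)
```

Because multi-annotated proteins are counted under every term, GO totals are not additive across terms; this is usually acceptable when each term is tested separately, as in a random forest over GO features.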

Atopobium species), beta-phosphoglucomutase activity (expressed by lactobacilli) and GTPase activity (of multiple proteins produced mainly by Prevotella, G. vaginalis, and some lactobacilli), as illustrated in Fig. 3d. These findings suggest that the functions of key BV-associated bacteria and lactobacilli are critical for determining the level of genital inflammation. It was also found that the out-of-bag (OOB) error rate distribution was significantly lower for molecular function, followed by protein relative abundance, functional prediction based on 16S rRNA gene sequence data and then taxa relative abundance (Fig. 3e). This suggests that the activities of the bacteria present in the FGT play a

Using metaproteomic data from an independent group of 10 women, we similarly found that peptidoglycan biosynthetic process, cell wall organization, and regulation of cell shape GOs were underrepresented in women with high inflammation (Fig. 5a).

To further investigate whether the associations between FGT inflammation and Lactobacillus biological processes and cellular components were independent of Lactobacillus relative abundance, we compared these GOs in BV-negative women with Lactobacillus-dominant communities who were grouped as having low (n=20) versus high (n=20) inflammation by median splitting of samples on the first principal component of nine pro-inflammatory cytokine concentrations (Fig. S6). Even though only non-significant decreases in the relative abundance of lactobacilli were observed by 16S rRNA gene sequencing (p=0.73) and metaproteomics (p=0.14), several previously identified GOs, including peptidoglycan-based cell wall (p=0.01), cell wall organization (p=0.03), and S-layer (p=0.01), remained associated with the level of inflammation (Fig. S6). This suggests an association between these biological processes and cellular components and inflammation that is not fully explained by the relative abundance of the lactobacilli themselves.

Changes in FGT metaproteomic profiles over time

To evaluate variations in FGT metaproteomic profiles associated with inflammatory profiles over time, we compared proteins, taxa, and biological process and cellular component GOs between two time points nine weeks apart (n=74). Grouping according to inflammatory cytokine profile was consistent at both visits for 41/74 (55%) participants, including 7/12 (58%) women who received antibiotic treatment between visits (Fig. 6a). The findings at the second visit also confirmed inflammation as one of the drivers of variation in the complex metaproteome data, and similar patterns of data distribution according to inflammation status were observed at both visits (Fig. 6b).

When the proteins, taxa and functions most closely associated with inflammation grouping (high vs. low) at the first visit were evaluated at the second visit, overall profiles remained similar between visits (Fig. 6c-e). Significantly lower relative abundance of Lactobacillus species and increased non-optimal bacteria were observed in women with inflammation at the second visit (Fig. 6d). Similarly, cell wall organization, regulation of cell shape, and peptidoglycan biosynthetic process

between visits in women who moved from a higher to a lower inflammation grouping and decreased in women who moved from a lower to a higher inflammation grouping.

While we found a large degree of similarity between the metaproteomic and 16S rRNA gene sequence taxonomic assignments for the same women [27], important differences in the predicted relative abundance of most bacteria were observed. Differences in the relative abundance of certain taxa between metaproteomics and 16S rRNA gene sequence data are not surprising, as taxonomic assignment from metaproteomic approaches also reflects functional activity (measured by the amount of expressed protein), rather than simply the abundance of a particular bacterial taxon.

Additionally, others have noted that 16S rRNA gene sequencing cannot distinguish between live bacteria and transient DNA [28]. We also observed differences that are likely due to database limitations. For example, Lachnovaginosum genomospecies (previously known as BVAB1) was not identified in the metaproteomics analysis because this species was not present in the UniProt or NCBI databases [29]. However, the Clostridium and Ruminococcus species that were identified are likely to represent Lachnovaginosum genomospecies, since these taxa fall into the same order, Clostridiales. Additionally, Clostridium and Ruminococcus species were more abundant in women with high compared to low inflammation, as expected for Lachnovaginosum genomospecies [10]. Metaproteomic analysis was able to identify lactobacilli to species level, while sequencing of the V4 region of the 16S rRNA gene had more limited resolution for this genus, as expected.

The finding that fungal protein relative abundance was substantially lower than bacterial protein relative abundance is not surprising, as the total number of fungal cells in the human body has been estimated to be orders of magnitude lower than the number of bacterial cells [30]. Although Candida proteins were detected, the curated dataset (following removal of taxa that had only 1 protein detected or 2 proteins detected in only a single sample) did not include any Candida species. This may be due to the relative rarity of fungal cells in these samples, and it would be interesting to evaluate alternative methodology for sample processing to better resolve this population in the future [31]. A limitation of microbial functional analysis using metaproteomics is that, at present, functional annotation in the databases is generally sparse for the majority of the microbiome. Thus, the findings of this study will be biased toward the microbes that are better annotated. Nonetheless, microbial function was closely associated with genital inflammatory profiles, and this relationship may be even stronger should better annotation exist. Another

and allows users to repeat the analyses presented here, as well as obtain additional information by analyzing individual proteins, species, and pathways.

A significant functional signature associated with FGT inflammation grouping included Lactobacillus-associated cell wall, peptidoglycan and cell membrane biological processes and cellular components. Although these functions were linked exclusively to Lactobacillus species, the associations with FGT inflammation were upheld after adjusting for Lactobacillus relative abundance determined by 16S rRNA gene sequencing and were also evident in an analysis including only women with Lactobacillus-dominant microbiota. This profile was confirmed when these GOs were compared between Lactobacillus isolates that induced high versus low levels of cytokine production in vitro. Previous studies have suggested that the peptidoglycan structure, the proteins present in the cell wall, and the cell membrane may influence the immunomodulatory properties of Lactobacillus species [14,24]. In a murine model, administration of peptidoglycan extracted from gut lactobacilli was able to

was further found that L. crispatus was more strongly associated with low inflammation in the differential abundance analysis and that L. iners was the Lactobacillus sp. most frequently detected in women with high inflammation. The co-expression analysis revealed that the majority of L. crispatus proteins grouped separately from L. iners proteins. However, both the L. crispatus and the L. iners modules were strongly inversely associated with inflammatory cytokine and chemokine concentrations. This is interesting since L. crispatus dominance is considered optimal, while the role of L. iners, the most prevalent Lactobacillus species in African women [10,11], is poorly understood; L. iners has been associated with compositional instability and transition to a non-optimal microbiota, as well as increased risk of STI acquisition [32,33].

In this study, we observed a host proteome profile associated with high FGT inflammation similar to that reported in previous studies [7]. Multiple inflammatory pathways were overrepresented in women with high versus low FGT inflammation, while signatures of reduced barrier function were observed in women with high inflammation, with underrepresented endothelial, ectoderm and tight junction biological processes. These findings similarly suggest that genital inflammation may be associated with impaired epithelial barrier function.

The link between FGT microbial function and local inflammatory responses described in this study suggests that both the presence of specific microbial taxa in the FGT and their properties and activities play a critical role in modulating inflammation. Currently, the annotation of microbial functions is sparse, but with ever-increasing amounts of high-quality, high-throughput data, the available information will improve steadily. The findings of the present study contribute to our understanding of the mechanisms by which the microbiota may influence local immunity and, in turn, alter the risk of HIV infection. Additionally, the analyses described herein identify specific microbial properties that may be harnessed for biotherapeutic development.

The detailed parameters are provided in Additional file 1:

whilst exhibiting a high degree of similarity to the 16S rRNA gene sequence data.

Therefore, the data generated using the first database were used for downstream analysis.

For quality control analysis, raw protein intensities and log10-transformed iBAQ intensities of the quality control pool, as well as the raw protein intensities of the clinical samples, were compared between batches (not shown).
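iBAQ (intensity-based absolute quantification) divides a protein's summed peptide intensity by its number of theoretically observable peptides; the log10 transform compresses the large dynamic range before batches are compared. The Python sketch below shows that transform and one possible batch comparison, the spread of per-batch medians. The median-based comparison is an illustrative assumption, since the text does not state which statistic was used, and the function names and values are hypothetical.

```python
import math
from statistics import median


def log10_ibaq(summed_intensity, n_observable_peptides):
    """log10 of the iBAQ value: summed peptide intensity / theoretically observable peptides."""
    return math.log10(summed_intensity / n_observable_peptides)


def batch_median_spread(values_by_batch):
    """Median per batch and the largest difference between batch medians."""
    medians = {batch: median(vals) for batch, vals in values_by_batch.items()}
    return medians, max(medians.values()) - min(medians.values())


# Hypothetical QC-pool proteins: (summed intensity, observable peptides) per batch.
qc = {
    "batch1": [log10_ibaq(1e6, 10), log10_ibaq(5e7, 25), log10_ibaq(2e5, 4)],
    "batch2": [log10_ibaq(1.2e6, 10), log10_ibaq(4e7, 25), log10_ibaq(3e5, 4)],
}
medians, spread = batch_median_spread(qc)
print(medians, round(spread, 3))
```

Proteins with zero intensity must be excluded (or imputed) before the log transform, since `math.log10(0)` raises an error.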

The effects of BV, STIs and chemokine and pro-inflammatory cytokine profiles on the metaproteome were investigated by PCA using the mixOmics R package [37]. The ComplexHeatmap package [38] was used to generate heatmaps and to cluster the samples and metaproteomes based on hierarchical and k-means clustering methods. Additionally, the R packages EnhancedVolcano, weighted correlation network analysis (WGCNA) [39], and FlipPlots were used for volcano plots, co-correlation analysis, and Sankey diagrams, respectively. Basic R functions and the ggplot2 R package were used for data manipulation, transformation, normalization and generation of graphics.
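Women were grouped as having low, medium or high FGT inflammation by K-means clustering of pro-inflammatory cytokine concentrations. The stdlib-only Python sketch below shows Lloyd's k-means with k=3. The deterministic initialization (the points with the lowest, middle and highest mean value) and the naming of clusters by centroid mean are illustrative choices, not details reported by the study; cytokine concentrations would normally be log-transformed and standardized before clustering. The function names and data are hypothetical.

```python
from statistics import mean


def _sq_dist(a, b):
    return sum((x - y) ** 2 for x, y in zip(a, b))


def kmeans_three_groups(points, max_iter=100):
    """Lloyd's k-means with k=3; clusters are named by centroid mean.

    points: one list of (standardized) cytokine values per woman.
    Returns a 'low' / 'medium' / 'high' label for each point.
    """
    k = 3
    # Deterministic start: the points with the lowest, middle and highest mean.
    ranked = sorted(points, key=mean)
    step = (len(ranked) - 1) / (k - 1)
    centroids = [list(ranked[round(i * step)]) for i in range(k)]

    def assign(cents):
        return [min(range(k), key=lambda c: _sq_dist(p, cents[c])) for p in points]

    for _ in range(max_iter):
        labels = assign(centroids)
        updated = []
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            # Keep an empty cluster's centroid where it was.
            updated.append([mean(col) for col in zip(*members)] if members else centroids[c])
        if updated == centroids:
            break
        centroids = updated

    labels = assign(centroids)
    order = sorted(range(k), key=lambda c: mean(centroids[c]))
    name = {c: n for c, n in zip(order, ["low", "medium", "high"])}
    return [name[lab] for lab in labels]


# Hypothetical standardized values of two cytokines in six women.
women = [[0.1, 0.2], [0.0, 0.1], [2.0, 2.1], [2.1, 1.9], [4.0, 4.2], [4.1, 3.9]]
print(kmeans_three_groups(women))
```

Naming clusters by centroid mean only makes sense when all cytokines move in the same direction, which is the usual situation for pro-inflammatory cytokines but should be checked on real data.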

To identify the key factors distinguishing women defined as having low, medium and high inflammation in their FGTs, we performed random forest analysis with the R package randomForest [40] using the following settings: (i) the type of random forest was classification, (ii) the number of trees

Coulter, CA, USA) and quantified using the Qubit dsDNA HS Assay (Life Technologies, CA, USA).

Illumina sequencing adapters and dual-index barcodes were added to the purified amplicon products

Inflammatory and antimicrobial properties differ between vaginal Lactobacillus isolates from South