The neovaginal microbiome of transgender women post-gender reassignment surgery

Gender reassignment surgery is a procedure some transgender women (TW) undergo for gender-affirming purposes. This often includes the construction of a neovagina using existing penile and scrotal tissue and/or a sigmoid colon graft. There are limited data regarding the composition and function of the neovaginal microbiome, which represents a major gap in our knowledge of neovaginal health. Metaproteomics was performed on secretions collected from the neovaginas (n = 5) and rectums (n = 7) of TW surgically reassigned via penile inversion/scrotal graft with (n = 1) or without (n = 4) a sigmoid colon graft extension and compared with secretions from cis vaginas (n = 32). We identified 541 unique bacterial proteins from 38 taxa. The most abundant taxa in the neovaginas were Porphyromonas (30.2%), Peptostreptococcus (9.2%), Prevotella (9.0%), Mobiluncus (8.0%), and Jonquetella (7.2%), while cis vaginas were primarily Lactobacillus and Gardnerella. Rectal samples were mainly composed of Prevotella and Roseburia. Neovaginas (median Shannon’s H index = 1.33) had higher alpha diversity compared to cis vaginas (Shannon’s H = 0.35) (p = 7.2E−3, Mann-Whitney U test) and were more similar to the non-Lactobacillus dominant/polymicrobial cis vaginas based on beta diversity (perMANOVA, p = 0.001, r2 = 0.342). In comparison to cis vaginas, toll-like receptor response, amino acid, and short-chain fatty acid metabolic pathways were increased (p < 0.01), while keratinization and cornification proteins were decreased (p < 0.001) in the neovaginal proteome. Penile skin-lined neovaginas have diverse, polymicrobial communities that show similarities in composition to uncircumcised penises and host responses to cis vaginas with bacterial vaginosis (BV) including increased immune activation pathways and decreased epithelial barrier function.
Developing a better understanding of microbiome-associated inflammation in the neovaginal environment will be important for improving our knowledge of neovaginal health.


Introduction
Transgender is a term used to define people whose gender identity is different from their assigned sex at birth [1]. Many transgender women (TW), defined here as people assigned as male at birth who identify as female, undergo medical interventions such as feminizing hormone therapy and gender reassignment surgery (GRS) for gender affirmation purposes [2]. GRS generally includes neovaginoplasty, during which a neovagina is created through penile inversion, scrotal grafts, sigmoid colon grafts, and/or a combination thereof [3].
Data about the environment of the neovagina are scarce, particularly at the molecular and microbial level. For cisgender women (CW), i.e., those assigned as female at birth, it is well understood that optimal vaginal microbiota include Lactobacillus species. Meanwhile, non-optimal microbial communities in the cis vagina, such as those containing Gardnerella vaginalis, Atopobium, Prevotella, Mobiluncus, and other anaerobic species, are often higher in diversity and are associated with a condition known as bacterial vaginosis (BV). BV or anaerobic dysbiosis has been linked to increased genital tract inflammation and an increased risk of STI acquisition in CW as well as in uncircumcised men [4][5][6][7]. However, few molecular sequencing studies have been conducted on neovaginas for the purposes of defining the microbiome [8,9]. To address this gap, we chose to investigate the neovaginal microbiome using a metaproteomics technique [10]. The objectives of this study were to map and characterize the microbial composition and function of neovaginal and rectal secretions from TW, to compare them with vaginal secretions from CW, and to assess relationships between the microbial communities measured and host immune pathways.

Demographic and clinical characteristics of study participants
We examined rectal and neovaginal secretions from TW (n = 9) and compared them against vaginal secretions from CW (n = 30). We excluded participants without measurable bacterial protein levels. Five out of the nine (56%) neovaginal samples and seven out of the nine (78%) rectal samples and all cis vaginal samples had measurable bacterial protein levels and remained in the analysis. Age, GRS methods, feminizing hormone therapy, sexually transmitted infections (STI), and sexual behavior data for the TW and CW with measurable neovaginal and vaginal microbiome data are described in Table 1. CW were younger (median age = 31) than TW (median age = 48) (p = 0.058, Mann-Whitney U test, Table 1). The median time elapsed since last GRS was 9.5 years (range 3.5-34 years).
Hierarchical clustering of individual profiles revealed 3 main branches based on bacterial protein composition (Fig. 1b). Branch 1 included 5 rectal samples and 1 cis vaginal sample dominated by Prevotella or Roseburia. Branches 2 and 3a were composed entirely of cis vaginal samples that were dominated by Lactobacillus or Gardnerella, respectively. Branch 3b included all neovaginal samples (n = 5) as well as 3 cis vaginal and 2 rectal samples. Branch 3b had significantly higher alpha diversity based on Shannon's H index than branches 1, 2, and 3a (2.77-, 5.27-, and 2.9-fold changes, respectively; p = 0.0005, Kruskal-Wallis). Neovaginal samples had higher alpha diversity (Shannon's H index median = 1.33) than cis vaginal samples (Shannon's H index median = 0.35) (p = 0.0072, Mann-Whitney U test) when examined separately. Indeed, neovaginas grouped more closely with non-Lactobacillus dominant/polymicrobial cis vaginas (< 50% of the microbial profile contributed by Lactobacillus proteins) than with Lactobacillus-dominant cis vaginas when Bray-Curtis dissimilarity distances were examined (Fig. 2). Variation in bacterial community composition between individuals can be attributed to sample type (p = 0.001, r2 = 0.13, perMANOVA) and Lactobacillus levels (p = 0.001, r2 = 0.21, perMANOVA). 16S rRNA gene profiling also revealed similarities between the neovaginal
Proteins annotated to Jonquetella anthropi, the only bacterium identified belonging to the phylum Synergistetes, were uniquely identified in 60% of neovaginal samples. 16S rRNA genes belonging to the family Synergistaceae were detected in 80% of the neovaginal samples. Other unique taxa identified in at least one neovaginal sample included various Proteobacteria (Escherichia, Campylobacter, Eikenella), Firmicutes (Anaerosphaera, Anaeroglobus, Pseudoramibacter), Fusobacteria (Fusobacterium), and Actinobacteria (Actinomyces).
Interestingly, the one neovaginal sample that had a sigmoid colon graft for neovaginal extension purposes had a microbiome that appeared more gut-like such that Bacteroidaceae and Enterobacteriaceae were the main taxa detected via 16S rRNA gene sequencing. This was partially confirmed via metaproteomics as there were Escherichia proteins detected as well (Supplemental Figure 1). This participant's matching rectal sample (R6) was also composed of elevated levels of Bacteroidaceae (41%) as measured by 16S rRNA gene sequencing (Supplemental Figure 3).
A subset of the samples (n = 4) included in this study had matched neovaginal and rectal secretions collected (Supplemental Figure 4A). Principal coordinate analysis of Bray-Curtis distances calculated from bacterial proteins highlighted the differences in microbial composition between each of the rectal-neovaginal sample pairs (Supplemental Figure 4B). Variation in bacterial community composition between rectal and neovaginal samples can be attributed to sample type (p = 0.014, r2 = 0.24, perMANOVA) and Roseburia levels (p = 0.02, r2 = 0.20, perMANOVA).
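The alpha and beta diversity measures reported above can be illustrated with a minimal sketch. The study computed them in R with the vegan package; the Python functions below are only an illustration of the two formulas, assuming the natural logarithm for Shannon's H (vegan's default) and using hypothetical spectral counts rather than study data.

```python
import math

def shannon_h(counts):
    """Shannon's H index (natural log) from non-negative abundance values."""
    total = sum(counts)
    if total == 0:
        return 0.0
    h = 0.0
    for c in counts:
        if c > 0:
            p = c / total          # proportion contributed by this taxon
            h -= p * math.log(p)
    return h

def bray_curtis(u, v):
    """Bray-Curtis dissimilarity between two equal-length abundance vectors."""
    if len(u) != len(v):
        raise ValueError("abundance vectors must have the same length")
    numerator = sum(abs(a - b) for a, b in zip(u, v))
    denominator = sum(a + b for a, b in zip(u, v))
    return numerator / denominator if denominator else 0.0

# Hypothetical spectral counts per genus (illustrative only, not study data).
polymicrobial = [30, 9, 9, 8, 7]
lactobacillus_dominant = [95, 5, 0, 0, 0]

print(shannon_h(polymicrobial) > shannon_h(lactobacillus_dominant))
print(bray_curtis(polymicrobial, lactobacillus_dominant))
```

A profile spread across several taxa yields a higher H than one dominated by a single taxon, which is the pattern described for neovaginal versus Lactobacillus-dominant cis vaginal samples.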

Microbial functional differences exist between neovaginal and cis vaginal samples
Of the 541 bacterial proteins identified, 377 (70%) were successfully assigned functions from the KEGG Pathway database. The top five most abundant broad, B-level functions in neovaginal samples included energy metabolism (29.8%), carbohydrate metabolism (23.2%), amino acid metabolism (17.8%), metabolism of cofactors and vitamins (9.3%), and signal transduction (7.6%). The top B-level functions in cis vaginal samples were carbohydrate metabolism (37.5%), energy metabolism (17.5%), signal transduction (8.2%), metabolism of cofactors and vitamins (6.1%), and membrane transport (5.1%) (Supplemental Figure 5). Upon further evaluation of more specific functional categories (KEGG ko level), vitamin B6 metabolism via phosphoserine aminotransferase from Porphyromonas and various forms of amino acid and fatty acid metabolism were uniquely associated with neovaginal samples (Supplemental Table 1).

The neovagina associates with increased immune activation and decreased barrier function pathways
To explore host immunity differences between neovaginas and cis vaginas, we performed differential protein expression analysis; 158 (15.1%) proteins were significantly different between neovaginas and cis vaginas (p < 0.05, Mann-Whitney U test, Supplemental Table 2). Of those 158, 68 met the 80% power restriction based on an effect size of 2.7-fold difference. Principal component analysis and hierarchical clustering highlighted how the abundance of these proteins differs between neovaginal and cis vaginal samples (Fig. 3).
Furthermore, we performed protein set enrichment analysis comparing our data set against pre-defined protein sets from cervical immune cells. The protein set that was the most enriched and overlapped with the proteins found to decrease in the neovaginal compartment relative to the cis vaginal compartment was from CD4+CD38+HLADR+ T cells (normalized enrichment score = − 2.01, FDR q value = 1.16E−3) (Supplemental Figure 6, Supplemental Table 8).

Discussion
Anaerobic bacterial species dominated the neovaginal microbiome. The neovaginal microbial profiles identified in this study overlap with what has been seen in previous penile skin-lined neovaginal studies as well as uncircumcised penile studies including elevated levels of Prevotella, Porphyromonas, and Peptoniphilus (Clostridiales Family XI) [7,8,11-15]. Indeed, the bacterial composition of penile skin-lined neovaginas resembled those of uncircumcised penises with penile community state types (CST) known to be abundant with BV-associated bacteria [14].
Despite the great deal of consistency of taxa observed in our study and others, several unique taxa were also identified: Eikenella, Anaeroglobus, Anaerosphaera, and Pseudoramibacter. Bacteria identified in the neovagina may represent bacteria that were seeded by unique routes of transmission. For instance, Eikenella corrodens is a commensal bacterium found in the mouth. Oral-genital contact has been suggested as a possible route of transmission of these bacteria to the genital tract [16]. Anaeroglobus geminatus, Pseudoramibacter alactolyticus, Campylobacter ureolyticus, Fusobacterium nucleatum, and Actinomyces have been described as putative pathogens also found in the oral cavity associated with periodontitis and endodontic infections [17][18][19][20][21]. The presence of oral bacteria in the neovaginal compartment could suggest oral-genital bacterial transmission. Jonquetella anthropi has been detected on the scrotum and penis and has also been described as an opportunistic pathogen associated with soft tissue infections [7,22,23]. Taxa belonging to the phylum Synergistetes have been detected from healthy cis vaginas, although we did not detect any in the cis vaginal samples analyzed in our study [22]. Due to the detection of J. anthropi from scrotal/penile samples in other studies, and the fact that the penile inversion/scrotal graft surgical method was the main surgical method used on the TW in this study, there may have been carry over or seeding of these microbiota from the original penis and/or scrotum into the neovagina. Indeed, of the 4 participants who underwent penile inversion/scrotal graft neovaginoplasty without a sigmoid colon graft, 3 had detectable J. anthropi proteins and all 4 had measurable Synergistaceae 16S rRNA gene sequences in their neovaginal samples.
Furthermore, the one neovaginal sample that had a sigmoid colon graft in addition to the penile inversion/scrotal graft surgical method had a microbiome that appeared more gut-like as Bacteroidaceae and Enterobacteriaceae were the main taxa dominating its microbial profile. This finding provides further evidence that the organs used to generate and/or modify the neovagina represent major sources of bacterial transmission or origination that may contribute to the neovaginal microbiome.
Some studies suggest that the vaginal compartment is seeded by bacteria found in the rectal compartment [9]. We found very little compositional similarity between matching neovaginal and rectal profiles based on the bacterial proteins measured, although we were underpowered to properly evaluate this comparison. The rectal microbial profiles observed in our study were similar to those of other studies, particularly those that examined rectal/anal microbiomes of CW as well as men who have sex with men, where Prevotella and Bacteroides were most abundant [24][25][26].
Various taxa that are associated with BV in CW were detected in the neovagina, including elevated levels of Prevotella, Mobiluncus, Porphyromonas, and Peptostreptococcus [27]. Neovaginas also had similar host responses to cis vaginas with BV such that we observed increased immune activation signatures including increased amino acid metabolism, short-chain fatty acid metabolism, TLR responses, and bacterial invasion/phagocytosis, as well as decreased signatures of barrier and innate immune function [28][29][30]. Decreased levels of particular antimicrobial and/or defense proteins such as cathelicidin (CAMP) and lipocalin-2 (LCN2) may hinder appropriate immune responses to non-optimal bacteria [31,32]. Decreases in LCN2 may lead to increased bacteria-driven inflammation as this protein has been shown to limit inflammation by restricting bacterial access to iron [33]. Increased amino acid metabolism, particularly the degradation of isoleucine, leucine, and valine, has been linked with antimicrobial protein expression including beta-defensins and mucosal barrier function [34,35]. Therefore, a lack of these amino acids or amino acid starvation could impair barrier function. This has also been shown to trigger inflammation and T helper 17 cell responses [36,37]. Furthermore, bacterial vitamin B6 metabolism, a bacterial function uniquely associated with neovaginas in our study, may also be linked to increased host inflammation and hindered immune responses as vitamin B6 levels have been shown to be inversely correlated with various proinflammatory markers. A deficiency in vitamin B6 is associated with reduced lymphocyte proliferation, T cell-mediated cytotoxicity, and antibody production [38,39]. We also found that neovaginal host signatures overlapped with signatures associated with elevated, activated CD4+ T cell levels in the female genital tract, providing further evidence that increased immune activation signatures are present within the neovagina.
Overall, these data suggest that neovaginas are similar to polymicrobial or BV-like cis vaginas based on the bacterial composition, bacterial function, and the corresponding host immune activation and barrier dysfunction profiles.
Neovaginas generated from penile and scrotal skin, which are known to express estrogen receptors, may also have an intrinsic predisposition to decreased barrier protein expression due to low estrogen levels relative to the cis vagina [40,41]. Indeed, we found estrogen-regulated keratins at lower levels in the neovagina as well as a number of cornified envelope proteins. The cornified envelope, as well as the corneodesmosomes found within, is critical to maintaining barrier integrity in tissues that experience mechanical stress such as the neovaginal or vaginal skin [42], and if these are weakened, neovaginas may be more likely to experience tears and/or damage and have limitations to their wound healing potential [43][44][45]. It is also well understood that estrogen promotes keratinization and barrier integrity in the vagina of animal models as well as in the inner foreskin in humans, and a lack of estrogen or its receptors results in a loss of the cornified layer [46][47][48]. Of the 5 neovaginal samples included in our analysis, three came from participants who reported taking estrogen transdermally; therefore, it is possible our observations could be related to a lack of intrinsic and/or pharmaceutically delivered estrogen.
While there was considerable overlap, we observed different bacterial information from 16S rRNA gene sequencing and mass spectrometry-based proteomics. This is not unexpected as these two methodologies measure different components of the microbiome. Proteomic data better reflects the metabolic state of a bacterium which may be dependent upon many factors including the growth state, nutrient availability, and composition of the neighboring microbiota [49][50][51]. 16S rRNA gene data provides more sensitive data on bacterial composition, but does not provide information on bacterial activity. Therefore, it is not unexpected to observe differences in the proteomic and genomic data in this study.
Limitations exist in this study including its small sample size and post hoc study design. Four of the TW included in this study had the penile inversion/scrotal graft surgical method used for their neovaginoplasties, and one TW had the penile inversion/scrotal graft method with a sigmoid colon graft extension. Future studies with larger sample sizes will be required to better compare the impact of surgical method (i.e., penile inversion versus sigmoid colon grafts) on the microbiome. Further to this, future studies should include uncircumcised penile samples as a comparator to penile skin-lined neovaginal samples. Another limitation in our study was that the CW were not from the same geographic location as the TW, which may introduce underlying variation between study groups. There are a few methodological limitations in our study that are important to note. Shotgun proteomics is less sensitive than sequencing methods such as 16S rRNA sequencing and metagenomics, and therefore, fewer bacterial species were detected by metaproteomics. The 16S rRNA gene sequencing methods used on CW and TW samples were different (V3-V4 vs V4 regions, respectively), as the CW data were originally generated for an independent study. Utilizing different regions of the 16S rRNA gene can impact which bacteria are preferentially amplified, and this represents a limitation to this study. Furthermore, the CW and TW 16S rRNA gene data were each processed using different bioinformatics pipelines, which could also introduce biases in each data set. Nevertheless, we do not expect these to be major contributing factors to the observations in this study. Despite these limitations, this is the first study to evaluate the microbiome of the neovagina using a metaproteomics technique where both bacterial composition and function can be described and related to host responses.

Conclusions
This study identified unique bacteria in the neovaginal compartment which may have been transmitted via the oral-genital route and/or may represent bacteria originally associated with the organs used to generate and/or modify the neovagina. This study corroborates previous neovaginal studies identifying neovaginas with diverse, polymicrobial communities that elicit similar host responses to cis vaginas with BV. Increased immune activation and reduced barrier protein signatures detected within the neovaginal compartment, whether caused by the bacteria present or an intrinsic lack or insufficient level of pharmaceutically delivered estrogen, are important findings that increase our understanding of the physiology of the neovagina.

Study populations and ethics statement
This is a cross-sectional study that evaluated Brazilian TW recruited at the LaPClin-AIDS Clinical Research Laboratory of the National Institute of Infectious Diseases Evandro Chagas (INI), at Oswaldo Cruz Foundation (FIOCRUZ) in Rio de Janeiro, Brazil, and CW from Canada and Sweden. All TW were over the age of 18 and were tested for STIs including HIV, chlamydia, gonorrhea, syphilis, and hepatitis B and C. CW were also over the age of 18, not pregnant, and were tested for HIV, chlamydia, and gonorrhea. Swedish participants included in our CW control group were from a low-risk cohort not taking any form of hormonal contraception. Canadian participants included in our CW control group were from a higher-risk cohort of women experiencing negative reproductive health outcomes including vaginal symptoms and/or HIV/STI infections. Women whose samples had no measurable bacterial protein levels were excluded. The study was approved by the Research Ethics Committee at the INI-FIOCRUZ, the Research Ethics Board of the University of Manitoba, and the Stockholm Regional Ethics Board. Written informed consent was obtained from all study participants.

Sample collection
Secretions were collected by swabbing the neovaginal and rectal compartments from TW who had undergone GRS. Cervicovaginal secretions were collected from CW via swab or cervicovaginal lavage. Secretions from TW were placed in cryotubes of 500 μL Allprotect (Qiagen, Valencia, CA) and then frozen at − 80°C. Secretions from CW were frozen at − 80°C shortly after collection.

Sample preparation for mass spectrometry
Swabs were centrifuged to remove excess Allprotect. Swabs were eluted with phosphate-buffered saline (pH 7.0) at 4°C. Eluates were centrifuged to remove cellular debris and stored at − 80°C. Equal volumes and/or concentrations of each sample were digested with trypsin and analyzed by tandem mass spectrometry as described by Birse et al. [64]. Briefly, samples were denatured with urea, reduced with dithiothreitol, alkylated with iodoacetamide, and digested with trypsin into peptides. Peptides were cleaned of salt and detergents by reverse-phase liquid chromatography (LC) using a step-function gradient. Cleaned peptides were quantified using Lava-Pep's Fluorescent Peptide and Protein Quantification Kit (Gel Company, CA, USA) according to the manufacturer's protocol.

Mass spectrometry analysis
One microgram of peptide per sample was re-suspended in 2% acetonitrile, 0.1% formic acid, and injected into a nano-flow LC system (Easy nLC, Thermo Fisher, MA, USA) connected inline to a Q Exactive Quadrupole Orbitrap mass spectrometer (Thermo Fisher, MA, USA). The Q Exactive mass spectrometer (MS) used the following method: a 50-cm long, 2.0-μm particle-sized Easy-Spray C-18 column (Thermo Fisher, MA, USA) was used for peptide separation. The elution gradient was from 98% buffer A to 30% buffer B in 200 min at a constant flow rate of 200 nL/min. MS spectra were acquired on the Orbitrap analyzer at 70,000 resolution at 200 m/z. After each MS spectrum, the 15 most intense precursor ions were automatically selected for fragmentation by higher-energy collisional dissociation at 28% normalized collision energy, and fragment spectra were acquired in the Orbitrap analyzer at 17,500 resolution at 200 m/z. Bacterial peptide identity searching was performed using Mascot (v2.4; Matrix Science, Boston, MA). Data were searched against a manually curated TrEMBL (Translated European Molecular Biology Laboratory) database containing the major genera identified in an initial search against all TrEMBL bacterial proteins. The curated database contained 57 different bacterial taxa (Supplemental Table 9) and the database from Homo sapiens to rule out potential homologs. Human peptide identity searching was performed with Mascot v2.4.0 (Matrix Science) against the human SwissProt database. A decoy database was included to determine the false discovery rate. Search results for both human and bacteria were imported into Scaffold separately to validate the protein identifications, using the following criteria: ≤ 0.1% false discovery rate (FDR) for peptide identification, ≤ 1% FDR for protein identification, and at least 2 unique peptides identified per protein. Microbial abundance was calculated by summing normalized total spectral counts for all proteins associated with each genus.
Host proteome results were imported into Progenesis LC-Mass Spectrometry software to perform label-free differential protein expression analysis based on MS peak intensities. Feature detection, normalization, and quantification were all performed using default settings from the software.
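The genus-level abundance calculation described above (summing normalized total spectral counts over all proteins assigned to a genus) can be sketched as follows. This is a minimal illustration, not the authors' script: the function name, the input layout, and the conversion to percentages are assumptions, and the counts are hypothetical.

```python
from collections import defaultdict

def genus_relative_abundance(protein_counts):
    """Sum normalized spectral counts per genus and express them as percentages.

    protein_counts: iterable of (genus, normalized_spectral_count) pairs,
    one pair per identified protein (hypothetical input layout).
    """
    totals = defaultdict(float)
    for genus, count in protein_counts:
        totals[genus] += count
    grand_total = sum(totals.values())
    if grand_total == 0:
        return {}
    return {genus: 100.0 * total / grand_total for genus, total in totals.items()}

# Hypothetical per-protein counts for one sample (not study data).
example = [
    ("Porphyromonas", 12.0),
    ("Porphyromonas", 18.0),
    ("Prevotella", 10.0),
    ("Mobiluncus", 10.0),
]
abundance = genus_relative_abundance(example)
print(abundance)
```

In this toy sample the two Porphyromonas proteins together account for 30 of 50 counts, so the genus contributes 60% of the bacterial profile.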

Functional microbiome analysis
Functional microbiome analysis was performed using KEGG (Kyoto Encyclopedia of Genes and Genomes) ontology assignment through the GhostKOALA (KEGG Orthology And Links Annotation) portal. Pathway maps were reconstructed using observed proteins and manually curated to remove 7 categories associated with organism-level functions (aging, cardiovascular diseases, endocrine and metabolic diseases, endocrine system, immune system, nervous system, neurodegenerative diseases), protein not found in the database, and 2 general "overview" categories to eliminate redundancy (global overview and maps, cancers: overview). Cumulative functional abundance for each category was calculated by summing abundances of all associated protein spectral counts, and proteins belonging to multiple categories contributed to each of those associated.
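The cumulative functional abundance rule above, in which a protein assigned to several categories contributes its counts to each of them, can be sketched as below. The function name and input layout are hypothetical, and the excluded set lists only a few of the curated-out categories for brevity; this is an illustration rather than the LOGAN implementation.

```python
from collections import defaultdict

# Illustrative subset of the categories removed during manual curation.
EXCLUDED_CATEGORIES = frozenset({"Aging", "Immune system", "Global and overview maps"})

def functional_abundance(proteins, excluded=EXCLUDED_CATEGORIES):
    """Sum spectral counts per functional category.

    proteins: iterable of (spectral_count, categories) pairs, where
    categories is a list of KEGG category names (hypothetical layout).
    A protein in several categories adds its full count to each one.
    """
    totals = defaultdict(float)
    for count, categories in proteins:
        for category in set(categories):   # guard against duplicate labels
            if category not in excluded:
                totals[category] += count
    return dict(totals)

# Hypothetical proteins (not study data).
proteins = [
    (10.0, ["Energy metabolism", "Carbohydrate metabolism"]),
    (5.0, ["Carbohydrate metabolism"]),
    (3.0, ["Immune system"]),
]
print(functional_abundance(proteins))
```

Because multi-category proteins are counted once per category, the category totals can sum to more than the total spectral count of the sample.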
Protein set size was restricted to between 15 and 500 associated proteins. Protein sets with an FDR q value below 0.05 were included in our analysis. For protein sets with overlapping associations with our data set, only those with an absolute normalized enrichment score greater than 2 are shown. We defined normalized enrichment scores (NES) with an absolute value greater than 2.0 as high scoring, |NES| > 1.5 as medium scoring, and |NES| < 1.5 as weak scoring. Enrichment scores are assigned based on how similarly each protein is ranked between the two data sets. The greater the overlap and the more consistent the ranking of proteins between our data set and the predefined data set, the higher the enrichment score. The rank metric score is the score used to position the protein in the ranked list and represents each protein's correlation with the phenotype of interest (i.e., neovaginal vs cis vagina).
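The filtering and scoring thresholds above can be written down as a short sketch. The function names are hypothetical, the thresholds are applied to the absolute value of the NES, and a score of exactly 1.5 is treated as weak, which is an assumption because the text does not define that boundary.

```python
def classify_nes(nes):
    """Label a normalized enrichment score by its absolute value."""
    magnitude = abs(nes)
    if magnitude > 2.0:
        return "high"
    if magnitude > 1.5:
        return "medium"
    return "weak"   # assumption: exactly 1.5 is grouped with weak scores

def keep_protein_set(size, fdr_q, min_size=15, max_size=500, q_cutoff=0.05):
    """Apply the protein set size window and the FDR q value cutoff."""
    return min_size <= size <= max_size and fdr_q < q_cutoff

# The CD4+CD38+HLADR+ T cell set reported in the results (NES = -2.01, q = 1.16E-3);
# the set size of 100 is a hypothetical placeholder.
print(classify_nes(-2.01), keep_protein_set(100, 1.16e-3))
```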
PCR products were purified using Ampure XP beads (Beckman Coulter, Mississauga, ON) and run on a QIAXcel Advanced Instrument (QIAGEN, Inc., Toronto, ON) to check amplicon purity and band size. All samples were amplified to add sequencing adaptors in a second PCR, using Nextera XT Index Kit v2 Set A and Set D (Illumina Inc., San Diego, CA, USA). This PCR was completed in a total volume of 50 μL and had 8 cycles of 95°C for 30 s, 55°C for 30 s, and 72°C for 30 s, followed by extension at 72°C for 5 min. PCR products were again purified using Ampure XP beads and run on the QIAXcel Advanced Instrument (QIAGEN, Inc., Toronto, ON) to check amplicon purity and band size. DNA concentration in nanograms per microliter was quantified using a Qubit 2.0 fluorometer (Life Technologies, Inc., Burlington, ON), after which the concentration of all samples was normalized to 4 nM. Samples were prepared for MiSeq following the manufacturer's protocol (Illumina Inc., San Diego, CA, USA). Final pooled DNA was diluted to 8 pM, and a spike-in of 10% PhiX was run with pooled samples. The experiment was run on an Illumina MiSeq using 500 cycle v2 PE reagents, resulting in 2 × 250 bp paired-end reads.
Reads were analyzed using mothur v1.39.5 [54], following the outline of the MiSeq SOP [55]. Briefly, forward and reverse reads for each sample were joined into contigs and the primer sequences were trimmed. The third quartile of contig length was found to be 427 bp, and therefore, any contigs over 427 bp in length were discarded. A custom version of the 16S rDNA SILVA reference alignment (v132) [56] was made specific to the V3-V4 region of the 16S rRNA gene, and contigs were aligned to this reference. Any sequences that did not align were discarded. Sequences with up to 2 base pair differences were combined in a precluster step, following which chimeras were identified and removed using UCHIME [57]. Sequences were then classified using the naive Bayesian classifier [58] and Ribosomal Database Project (RDP) taxonomy database (v16) [59]. Phylotype classification was used to identify sequences to the phylogenetic level of genus, and a taxonomy summary table was produced.
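The contig length filter described above can be illustrated with a small sketch. The analysis itself was run in mothur; the Python below only mirrors the idea of taking the third quartile of contig lengths as the cutoff and discarding longer contigs. The function names are hypothetical, and the quartile method used here (the inclusive method of Python's statistics module) is an assumption that may differ from how mothur reports quartiles.

```python
import statistics

def third_quartile(lengths):
    """Third quartile of contig lengths (inclusive method; an assumption)."""
    return statistics.quantiles(lengths, n=4, method="inclusive")[2]

def discard_long_contigs(contigs, max_length):
    """Keep contigs whose length does not exceed max_length.

    contigs: dict mapping contig name to its sequence string.
    """
    return {name: seq for name, seq in contigs.items() if len(seq) <= max_length}

# Toy contigs (not study data): one of 400 bp and one of 428 bp.
contigs = {"contig_a": "ACGT" * 100, "contig_b": "ACGT" * 107}
kept = discard_long_contigs(contigs, max_length=427)
print(sorted(kept))
```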
Sixty-two genera were identified across all samples. Taxa that were identified with a higher abundance in water controls than in samples (based on a fold calculation) were removed as contaminants, with the exception of Pseudomonas. This taxon was detected at a similarly low abundance in both samples and water controls, but as we have previously detected Pseudomonas in CVL samples using mass spectrometry, we elected to include this genus [10]. Replicates for each biological sample were pooled, and for the purposes of visualization, the top 25% most abundant taxa overall are shown, while remaining lower abundance taxa have been binned to "other."

Swedish cis vaginal 16S rRNA gene analysis
DNA extraction, targeted amplification, and sequencing of the V3-V4 regions of the 16S rRNA gene from vaginal mucosal samples were performed as described previously [60].
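The contaminant removal and low-abundance binning applied to the TW 16S rRNA gene data can be sketched as follows. This is a simplified illustration under stated assumptions: a taxon is treated as a contaminant when its abundance in the water control exceeds its abundance in samples (a control-to-sample fold above 1), the function names are hypothetical, and the taxon "TaxonX" and all values are invented for the example.

```python
import math

def remove_contaminants(sample_abund, control_abund, keep=frozenset({"Pseudomonas"})):
    """Drop taxa more abundant in water controls than in samples, unless exempted."""
    cleaned = {}
    for taxon, sample_value in sample_abund.items():
        control_value = control_abund.get(taxon, 0.0)
        if taxon in keep or control_value <= sample_value:
            cleaned[taxon] = sample_value
    return cleaned

def bin_low_abundance(abund, top_fraction=0.25):
    """Keep the top fraction of taxa by abundance and pool the rest as 'other'."""
    ranked = sorted(abund, key=abund.get, reverse=True)
    n_top = max(1, math.ceil(len(ranked) * top_fraction))
    binned = {taxon: abund[taxon] for taxon in ranked[:n_top]}
    other = sum(abund[taxon] for taxon in ranked[n_top:])
    if other > 0:
        binned["other"] = other
    return binned

# Hypothetical abundances (not study data).
samples = {"Prevotella": 40.0, "TaxonX": 1.0, "Pseudomonas": 0.5,
           "Lactobacillus": 50.0, "Gardnerella": 8.0}
controls = {"TaxonX": 5.0, "Pseudomonas": 0.6}

cleaned = remove_contaminants(samples, controls)
print(bin_low_abundance(cleaned))
```

Here "TaxonX" is removed because it is more abundant in the control, while Pseudomonas is retained through the explicit exemption, mirroring the exception described in the text.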

Statistical analysis
Differential protein expression analysis was performed using non-parametric Mann-Whitney U tests. Power calculations were performed in G*Power (v3.1.9.2). We were able to detect host proteome differences of 2.7-fold between samples taken from cis vaginas (n = 30) and neovaginas (n = 5) while retaining 80% power, assuming a proteome variance of 100% [60] and an adjusted alpha of 0.0001. Alpha (Shannon's H index) and beta diversity (Bray-Curtis dissimilarity distances) calculations as well as permutational multivariate analysis of variance (perMANOVA) calculations were performed in R (v3.5.0) using the vegan.R (v2.5-3) package. Principal coordinate analysis was conducted in R using the ape.R (v5.2) package. The phyloseq.R (v1.16) package was also used for the analysis of 16S microbiome profiling data. Principal component analysis was conducted in MATLAB and EigenVector software. Graphs and statistical analysis for bacterial protein function were generated using the functional microbiome analysis pipeline (LOGAN) [63]. Hierarchical clustering was conducted in R using NMF.R (v0.21.0). Pearson distance metrics and complete linkage were the parameters specified. Human protein functional analysis was conducted using over-representation analysis via ConsensusPathDB (Max Planck Institute for Molecular Genetics). Pathway-based sets (INOH, Reactome, KEGG) and gene ontology biological processes level 5 categories were selected. p values were calculated using hypergeometric tests.
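The hypergeometric test behind over-representation analysis can be illustrated with a short sketch. It computes the probability of observing at least k pathway members among n input proteins drawn from a background of N proteins, K of which belong to the pathway. The function name and the example numbers are hypothetical, and the background definition used by ConsensusPathDB is not described here, so this is a generic illustration rather than a reproduction of that tool.

```python
from math import comb

def hypergeom_upper_tail(k, N, K, n):
    """P(X >= k) for X ~ Hypergeometric(N background, K in pathway, n drawn)."""
    if not (0 <= K <= N and 0 <= n <= N and k >= 0):
        raise ValueError("invalid hypergeometric parameters")
    upper = min(K, n)
    favourable = sum(comb(K, i) * comb(N - K, n - i) for i in range(k, upper + 1))
    return favourable / comb(N, n)

# Hypothetical example: 8 of 60 input proteins fall in a 40-protein pathway,
# against a background of 1000 proteins.
p_value = hypergeom_upper_tail(k=8, N=1000, K=40, n=60)
print(p_value)
```

A small p value indicates that the pathway contains more of the input proteins than expected by chance given the background.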

Supplementary information
Supplementary information accompanies this paper at https://doi.org/10.1186/s40168-020-00804-1.
Additional file 6: Supplemental Figure 6. Protein set enrichment analysis identifies overlapping signatures measured in the neovagina of transgender women and in cervices of cisgender women with elevated activated CD4+ T cell levels. A) Pre-ranked proteins are plotted based on their enrichment score. Each dot on the plot represents overlapping proteins known to decrease in individuals with elevated cervical CD4+CD38+HLADR+ T cell levels as well as those known to decrease in neovagina relative to cis vagina. B) The rank metric score represents each protein's correlation with the neovaginal phenotype. Proteins highlighted in blue represent those that account for the core enrichment, i.e., the proteins that contribute most to the enrichment observed.
Supplemental Table 1. Percent coverage and mean normalized bacterial spectral counts of KEGG ko level functions from neovaginal and cis vaginal compartments.
Supplemental Table 2. Proteins differentially abundant between neovaginas and cis vaginas.
Supplemental Table 3. Enriched host pathways positively associated with the neovaginal compartment compared to the cis vaginal compartment.
Supplemental Table 4. Enriched host pathways negatively associated with the neovaginal compartment compared to the cis vaginal compartment.
Supplemental Table 5. Human protein correlates of bacterial diversity as measured by Shannon's H index.
Supplemental Table 6. Enriched host pathways positively associated with bacterial diversity as measured by Shannon's H index.
Supplemental Table 7. Enriched host pathways negatively associated with bacterial diversity as measured by Shannon's H index.
Supplemental Table 8. Immune cell protein set signatures that overlap with signatures enriched in the neovagina.
Supplemental Table 9. Taxa included in the curated database and each taxon's protein detection from the initial TrEMBL bacteria database search.
16S analysis and analyzed data from the transwomen cohort. AL performed the 16S analysis and analyzed data from the cis women cohort. CFZ generated the cervical cell database used for protein set enrichment analysis. SM and LNR generated the application for functional metaproteome analysis (LOGAN). LNR also assisted with data processing and provided statistical support. EMJ, BG, RKF, and VV collected the TW's samples. KB and FB collected samples from the Swedish participant controls included in our study. VP collected samples from the Canadian participant controls included in our study. AB and GA funded and conceived the study. The authors read and approved the final manuscript.

Availability of data and materials
16S rRNA gene sequence files and metadata for the TW and CW samples used in this study have been deposited in Figshare (https://figshare.com/articles/TW_neovaginal_rectal_buccal_seq/11690382; https://figshare.com/articles/CW_16S/11710227). Data sets including unrarefied OTU tables from 16S rRNA gene sequence data, metadata, protein spectral count data, and R scripts used in this study are available in GitHub (https://github.com/kmbirse/Birse_etal_Neovaginal-Microbiome).

Ethics approval and consent to participate
The study was approved by the Research Ethics Committee at the Evandro Chagas National Institute of Infectology (Rio de Janeiro, Brazil, CAAE: 55081016.0.0000.5262), the Research Ethics Board of the University of Manitoba (ethics #HS21185 (H2017:338)), and the Stockholm Regional Ethics Board (2018/1476-32/2). Written informed consent was obtained from all study participants.

Consent for publication
Not applicable