Host macaque transcripts confirm increased inflammation in ICD
Prior reports of ICD indicate that it is a form of inflammatory bowel disease of the colon [2, 3]. To verify this phenotype in our samples, we investigated the host transcriptome, which is known to include intestinal epithelial cells of the colon that are exfoliated in fecal samples [22]. RNA reads mapped to rhesus macaque (host) proteins were examined to compare host transcript abundance in control and ICD samples. There were 562 host transcripts that were robustly increased in ICD (log2 fold change ≥ 2 relative to control; adjusted p ≤ 0.05) and 358 host transcripts increased in control animals (log2 fold change ≥ 2 relative to ICD; adjusted p ≤ 0.05). Functional enrichment analyses yielded major clusters for transcripts increased in ICD, including “collagen/extracellular matrix(ECM)”, “ECM-receptor interaction/PI3K-AKT signaling,” “inflammatory bowel disease”, and “Toll-like receptor signaling pathway.” Upstream regulator analysis revealed well-known inflammatory regulators interferon gamma (IFNγ), tumor necrosis factor (TNF), interleukin-1 beta (IL1β), and interferon alpha (IFNα) as highly significant (− log(p-value) > 30) in ICD animals relative to controls. For host transcripts that were elevated in control relative to ICD, there were no significant upstream regulators.
Ten host transcripts were 32× higher in ICD than controls (Fig. 1). Prostaglandin G/H synthase 2 (PTGS2, also known as cyclooxygenase-2 or COX-2) is a downstream target of innate immune signaling via the Toll-like receptor 4 (TLR4) signaling pathway. Formin-like protein 1 (FMNL1) is a major component of macrophage podosomes, which is an integrin-based structure that is unique to invasive cells [23]. Protein S100-A9 forms a heterodimer with S100-A8 (also increased in ICD, log2 fold change = 3.80, adjusted p = 0.0002). This heterodimer is known as calprotectin, which comprises more than half of the cytosolic protein in neutrophils [24]. Fecal calprotectin is frequently used as a marker of intestinal inflammation [25]. Interleukin-8 precursor (IL-8, also known as CXCL8) is expressed by intestinal epithelial cells to recruit neutrophils from the lamina propria to the epithelium [26]. Pleckstrin (PLEK) is most highly expressed in neutrophils and macrophages (using GeneVisible [27]). Interleukin-1 beta (IL1β) is a pro-inflammatory cytokine that is secreted by activated macrophages and also by colonocytes [28]. Indoleamine 2,3-dioxygenase 1 (IDO1) is an enzyme that is the rate-limiting step in tryptophan catabolism. IDO1 is highly expressed by antigen-presenting cells to reduce tryptophan levels in the local microenvironment, which reduces activated T cell responses and promotes regulatory T cell activity [29]. In addition to antigen-presenting cells, colonocytes also express IDO1 during inflammation [30]. Granulocyte colony-stimulating factor receptor (CSF3R, also known as CD114) is expressed by neutrophils [31]. Myeloid cell nuclear differentiation antigen (MNDA) can be expressed by innate immune cells or intestinal epithelial cells to activate inflammasomes [32]. TLR4 is a receptor expressed by intestinal epithelial cells that recognize bacterial PAMPs, such as lipopolysaccharide [33]. Taken together, these most highly differentially expressed genes suggest that the stool samples from animals with ICD contain cells which express a pro-inflammatory profile.
To determine which host cell types might be differentially present in the stool samples, the expression of cell marker-associated genes were examined. Keratins KRT8, KRT18, KRT19, and KRT20 were all more highly expressed in ICD, confirming that exfoliated colonocytes were present in the stool and more abundant in ICD. Genes CD14, CD44, CD53, and CD97 were more highly expressed in ICD, suggesting that a heterogeneous population of leukocytes may be present in ICD. The high expression of CSF3R, S100-A8, and S100-A9 was supportive of the presence of neutrophils. Thus, the host transcriptome provided some evidence that there were more exfoliated colonocytes and leukocytes in the stool of ICD-afflicted animals compared with controls. Collectively, the host cells in stool from animals with ICD included colonocytes as well as leukocytes, and these cells exhibited a pro-inflammatory transcriptional program.
Fecal metatranscriptomes had decreased diversity in ICD samples and clustered by phenotype
Using all RNA-Seq reads that mapped to bacterial proteins, diversity among the microbial populations was calculated based on transcript abundances, with a 0.1% abundance cutoff threshold for organism identification at the genus level. Comparison of the calculated diversity metrics between samples from control and ICD-afflicted animals showed a significant decrease in diversity in ICD. Using the Shannon diversity calculation, controls had an average diversity score of 3.28, compared with the ICD-associated average diversity score of 1.93 (p = 1.245e−5) (Fig. 2a). Similarly, by the Simpson diversity calculation, there was a higher average diversity among control samples when compared with ICD (0.906 vs. 0.656, p = 2.380e−5) (Fig. 2b).
Using the same organism transcript abundance measurements with the 0.1% threshold, a principal component analysis (PCA) (Fig. 2c) demonstrated that control and ICD phenotypes clearly segregated, with greater variation seen across ICD samples when compared to the control samples. Hierarchical clustering of whole transcriptomes also showed clustering of control samples (Fig. 2d). However, ICD-associated metatranscriptomes split into two clusters, with some ICD samples (ICD.8, ICD.11, ICD.10) more similar to controls than to other ICD samples. These data suggest greater heterogeneity among the ICD samples. In both the PCA and clustering analyses, control animal 5’s metatranscriptome was more similar to the ICD animals than other controls. Upon re-inspection of life histories, individual macaque control 5 was in the hospital for more days than any other control animal, so it is possible that this animal was not truly a healthy control. However, for the purposes of the remaining analyses, this sample was retained among the controls. In summary, microbial transcriptome diversity was decreased in ICD samples, but heterogeneity between ICD samples was increased.
Control and ICD samples harbored significant differences in organism-specific gene expression
Using all mRNA reads mapped to bacterial organisms, transcript counts were summarized at the genus level. The genera with the highest transcript abundances in control samples included Prevotella, Bacteroides, Treponema, Ruminococcus, Clostridium, Streptococcus, Oscillibacter, Megasphaera, Bacteroidales, and Eubacterium (Fig. 3). The genera with the highest transcript abundances in ICD samples included Prevotella, Bacteroides, Megasphaera, Eubacterium, Clostridium, Ruminococcus, Roseburia, Faecalibacterium, Parabacteroides, and Lactobacillus. Although many genera were common between both control and ICD samples, their relative transcript abundances differed; importantly, the “Other” category (all other genera not represented by the ten most abundant) was significantly greater in control samples (Fig. 3). Examination of differential gene expression revealed many differences between ICD and controls with 732 bacterial genera, out of 1511 total detected, showing significantly different expression levels (Additional file 1). Among the genera with the highest transcript abundances, the largest difference was observed in Prevotella, which were greatly increased (2.7-log2 fold change, adjusted p = 2.68e−16) in the ICD-afflicted animals relative to controls. There were also increases in ICD in Bacteroides (1.8-log2 fold change, adjusted p = 1.09e−13), Megasphaera (2.0-log2 fold change, adjusted p = 4.82e−6), Selenomonas (2.2-log2 fold change, adjusted p = 3.56e−14), and Campylobacter (2.7-log2 fold change, adjusted p = 2.54e−9), whereas there was a decrease in Treponema (1.7-log2 fold decrease, adjusted p = 7.2e−14). Thus, while many genera expressed genes in both control and ICD samples, their relative expression differed substantially between the two phenotypes, with the largest significant fold changes being among abundant organisms in Prevotella and Campylobacter genera.
Community-wide and Prevotella-specific transcriptomes suggest differences in nutrient availability between control and ICD samples
To further explore microbial community-wide differences in the metatranscriptomes between control and ICD samples, reads were mapped to a database derived from the SEED Subsystems hierarchy [34]. Comparison of the ICD cases to controls revealed changes even at the broadest (level 1) SEED hierarchy categories. The largest increases were in membrane transport (1.23-log2 fold increase, adjusted p = 1.76e−29), sulfur metabolism (0.75-log2 fold increase, adjusted p = 1.20e−13), and carbohydrate-linked functions (0.58-log2 fold increase, adjusted p = 7.72e−23). Notable decreases in ICD-afflicted microbiome functions were seen in the categories of secondary metabolism (2.02-log2 fold decrease, adjusted p = 2.82e−8), dormancy and sporulation (1.16-log2 fold decrease, adjusted p = 3.59e−7), nitrogen metabolism (0.99-log2 fold decrease, adjusted p = 7.1e−6), and iron acquisition and metabolism (0.98-log2 fold decrease, adjusted p = 9.84e−13). (Fig. 4). Within the sulfur metabolism category, one significantly increased enzyme was sulfatase (1.8-log2 fold increase, adjusted p = 9.92e−7), which may be one of the inflammation-contributing factors [35]. The SEED level 2 annotations “Central carbohydrate metabolism,” “Monosaccharides,” “Polysaccharides,” and “Di- and oligosaccharides” were all increased in ICD (Additional file 2), suggesting ample access to carbon sources [35]. Thus, the overall functional annotation of the bacterial community-wide transcriptome suggests that the fecal microbiome from ICD had better access to nutrients than the control fecal microbiome. This may possibly be due to the faster transit time of diarrhea, which has been shown to alter microbial composition, diversity, and activity [36].
To determine whether the functional differences of the ICD metatranscriptome may be due to faster-growing microbes in the fecal samples as would be typical of the small intestine, we examined annotations related to growth rate. Epithelial cells of the small intestine have a higher turnover rate than cells in the colon [37], and shorter transit times (e.g., diarrhea) have been linked to faster growth of some microbial species [38]. At the community level, “DNA replication” was not significantly different between control and ICD, but the level 3 annotation, “RNA_polymerase_bacterial” was increased in ICD (log2 fold change 0.55, p = 0.0001). The concentration of RNA polymerase is known to increase during exponential growth [39]; thus, it is possible that fecal microbes in ICD were growing faster than fecal microbes in controls, which would be consistent with more simple carbohydrate sources reaching the distal colon.
Given that Prevotella had greater relative abundance of transcripts in ICD, we investigated the functional annotations of genes specifically expressed by this genus. The largest significant increases in ICD relative to control samples were in the categories of respiration (0.74-log2 fold increase, adjusted p = 5.35e−25) and membrane transport (0.68-log2 fold increase, adjusted p = 9.98e−24). The largest significant decreases were in the categories of stress response (0.89-log2 fold decrease, adjusted p = 1.32e−13), potassium metabolism (0.70-log2 fold decrease, adjusted p = 3.03e−5), and phage and transposable elements (0.59-log2 fold decrease, adjusted p = 9.82e−7) (Additional file 3). Like the community-wide metatranscriptome, the Prevotella-specific transcriptome also had increased “Central carbohydrate metabolism,” “Monosaccharides,” “Polysaccharides,” and “Di- and oligosaccharides” in ICD relative to controls. We next asked whether Prevotella had greater “DNA replication” in ICD, but there was actually a slight decrease (− 0.14-log2 fold, adjusted p = 0.0041). “RNA_polymerase_bacterial” was statistically unchanged. In summary, the Prevotella-specific transcriptome suggests that Prevotella in fecal samples of animals with ICD may have had greater access to nutrients without being in an exponential stage of growth.
Transcripts from bacterial pathogens were more abundant in ICD animals
Several pathogenic species have been previously linked with ICD [2,3,4], although none has definitively been proven to be responsible for the condition. We therefore examined the metatranscriptome data for transcripts that mapped to these known pathogens: Camplyobacter coli (C. coli), Camplyobacter jejuni (C. jejuni), Helicobacter pylori (H. pylori), Shigella flexneri (S. flexneri), and Yersinia enterocolitica (Y. enterocolitica). Interestingly, some gene expression for all of these pathogens was detectable even in control samples. However, relative transcriptional abundances of all of these known bacterial pathogens were significantly higher in the ICD samples (Fig. 5). The largest differences between control and ICD were in Campylobacter species.
Screening of fungi, protozoa, and archaea yielded potential non-bacterial opportunists
To determine whether non-bacterial opportunists may potentially be active in ICD, fecal metatranscriptome reads were mapped to fungal, protozoan, and archaeal databases. The fungus, Sordaria macrospora, had higher transcript abundances in ICD (3.18-log2 fold increase, adjusted p = 6.33e−07). Another species from this genus, Sordaria fimicola, is known to be present in mammalian dung [40]. Sordaria fimicola was not in the reference data set, nor was any other species of Sordaria, so it is possible that the active species in macaques was Sordaria fimicola due to the limitations of the data. Interestingly, transcripts associated with the known fungal opportunist, Candida albicans, were not differentially abundant between control and ICD animals.
Non-human primates are vulnerable to several enteric protozoa, including Entamoeba histolytica, Giardia lamblia, Cryptosporidium parvum, Cyclospora cayetanensis, and Balantidium coli [41]. E. histolytica and G. lamblia were among the protozoans with the highest gene expression in this cohort, but their transcripts were not differentially abundant between control and ICD animals. RNA-Seq reads mapping to Cryptosporidium parvum were relatively lower in this cohort, compared with E. histolytica and G. lamblia, and reads mapping to Cyclospora cayetanensis were exceedingly rare and likely to be off-target hits. There were no species from the Balantidium genus in the data set.
Protozoans with the most abundant transcripts in the fecal samples from the macaques were Blastocystis sp. and Trichomonas vaginalis. Blastocystis sp. was higher in the control animals (1.96-log2 fold decrease, adjusted p = 0.03). T. vaginalis was higher in the animals with ICD (2.24-log2 fold increase, adjusted p = 0.002). T. vaginalis was the only species in the reference data set representative of the Trichomonas genus, so it is possible that the particular species with increased gene expression in macaques with ICD was not T. vaginalis.
Although there are no known pathogens among archaea, some archaeal genera are known to be gut-associated. Among the ten most abundant archaeal genera detected in the fecal samples, six were significantly different between control and ICD animals and all of these were lower in ICD. These include Methanosarcina (0.16-log2 fold decrease, adjusted p = 0.004), Methanococcus (0.16-log2 fold decrease, adjusted p = 0.018), Methanobacterium (0.13-log2 fold decrease, adjusted p = 0.009), Methanocorpusculum (0.21-log2 fold decrease, adjusted p = 0.044), and Methanocella (0.18-log2 fold decrease, adjusted p = 0.003). These genera are predominately methane producers. Methane-producing archaea may be depleted in ICD; however, this observation may be confounded by gut transit time, which is reduced by the presence of methane [42]. In summary, ICD was associated with increased gene expression by a fungus (Sordaria) and a protozoan (Trichomonas) and decreased gene expression by archaeal methanogens.
Campylobacter expresses mucosa-associated transcripts in ICD
Of all potential bacterial and non-bacterial pathogens (excluding viruses, which are not represented in the metatranscriptome), Campylobacter transcripts were the most differentially abundant between ICD and controls. We therefore investigated the functional annotations of genes specifically expressed by this genus. Functional annotation differences between control and ICD animals (Additional file 4) indicate that organisms within this genus experienced more oxidative stress and were more adherent and likely more virulent in ICD relative to control animals. Fermentation, and specifically the abundance of anaerobic respiratory reductases, was reduced in ICD relative to control, suggesting that Campylobacter was exposed to a more oxygen-rich environment in the ICD intestine compared with control. Abundance of genes related to oxidative stress and the stringent response ((p)ppGpp_metabolism) were higher in ICD-afflicted animals, whereas genes related to iron acquisition and metabolism were lower in ICD relative to control. This is expected given the inflammatory environment of the colon during ICD. Control of iron acquisition is a common microbial mechanism for reducing exposure to damage via the Fenton reaction.
The oxidative stress that was clearly experienced by Campylobacter could potentially be due to a more adherent population in ICD. Genes involved in adhesion were significantly more abundant in ICD animals as were genes for motility and chemotaxis, including flagellum (Additional file 4). Flagellar motility is required for Campylobacter colonization [43] and invasion of cultured intestinal epithelial cells [44,45,46]. This tighter affiliation with the host likely facilitates both increased oxidative stress for the Campylobacter present in ICD animals, as described above, and increased virulence. The stringent response, which aids in survival of oxygen stress conditions, has also been shown to be necessary for attachment and invasion of intestinal epithelial cells [47]. Further evidence of a more virulent phenotype in ICD animals was that the pVir plasmid was more nearly four times as abundant (1.8-log2 fold, p = 4.66e−04) in ICD animals as in control animals (Additional file 4). Although the definitive role of this plasmid in the virulence of Campylobacter has yet to be uncovered, it contains genes with homology to type four secretion system genes in Helicobacter pylori and Agrobacterium tumefaciens. The mutation of several of these genes affects C. jejuni adhesion and invasion in an intestinal epithelial cell culture model [48]. Overall, the Campylobacter-specific transcriptome suggests that Campylobacter were more closely associated with the mucosa in ICD than in controls.
Transcripts from mucin-degrading bacteria and mucin-degrading enzymes were increased in ICD
Having screened the metatranscriptomes for potential pathogens, we next investigated our hypothesis that ICD is associated with increased mucin degradation. First, we subsetted the data to compare abundances of transcripts from bacterial species known to be linked with inflammatory bowel conditions [49], although they are also considered to be commensals. Significantly higher transcript levels were observed in ICD for Ruminococcus gnavus, Ruminococcus torques, Bacteroides thetaiotaomicron, Bacteroides caccae, and Bacteroides fragilis (Fig. 6a). Abundances of transcripts associated with several Bifidobacterium species (B. bifidum, B. breve, B. longum) were slightly, but significantly, higher as well. In short, transcripts associated with all known mucin-degrading gut microbes except for Akkermasia muciniphilia were significantly more abundant in the feces of animals with ICD. These organisms are associated with both degradation of the protective mucin layer that forms on the inside of the gut epithelium and with increased presence in human irritable bowel disease [49,50,51].
Given that mucin-degrading species (Fig. 6a) are also known to be commensals that can thrive on dietary carbohydrates, we next examined the abundances of transcripts arising from the expression of specific enzymes known to be involved in mucin degradation. All of these mucin-degrading enzymes were differentially expressed between ICD and controls (Fig. 6b). Transcript abundances from six of the seven mucin-degrading enzymes—alpha- and beta-galactosidase, alpha-fucosidase, alpha-mannosidase, N-acetylglucosamindase, and O-acetylesterase—were all higher in ICD. In general, transcripts from both mucin-degrading bacteria and mucin-degrading enzymes were higher in ICD.
Intestinal cells synthesize fucosylated mucins in ICD
Given the differential expression of mucus-degrading enzymes in ICD compared with controls, we hypothesized that control and ICD-afflicted animals may have different mucus composition. Colon sections from control and ICD animals were stained with FITC-conjugated lectins for mannose and fucose (Fig. 7a). All images were selected and analyzed in a blinded manner (animal status unknown). Animals with ICD had markedly higher fucose relative to controls (p = 0.008) (Fig. 7b). No changes in intensity were observed for mannose levels between control and ICD animals. These observations suggest that the colons of animals with ICD have greater amounts of fucose than the controls.
Given that transcripts for microbial fucosidases were higher in ICD-associated metatranscriptomes and fucose was more abundant in the ICD-associated colon sections, we asked whether the host cell transcriptome was consistent with what would be expected of increased synthesis of fucosylated mucins. The most highly expressed mucin transcripts were mucin-12 (MUC12), mucin-2 (MUC2), mucin-17 (MUC17), and mucin-13 (MUC13). All of these mucins are known to be expressed by enterocytes in the colon. They were all significantly increased in ICD relative to controls: mucin-12, 2.1-log2 fold change, adjusted p = 2.01e−5; mucin-2, 2.3-log2 fold change, adjusted p = 3.98e−6; mucin-17, 1.3-log2 fold change, adjusted p = 0.005; and mucin-13, 1.6-log2 fold change, adjusted p = 1.77e−5. Overall, host fucosyltransferase gene expression was low across all samples. However, the most abundant fucosyltransferase, galactoside 3(4)-l-fucosyltransferase (FUT3), was more highly expressed in ICD (1.5-log2 fold change, adjusted p = 0.02) than the controls. Interestingly, the second most abundant fucosyltransferase, galactoside 2-alpha-l-fucosyltransferase 2 (FUT2), which is known to confer secretor status in humans [52], was not differentially expressed between ICD and control. In summary, the immunohistochemistry of colon sections and the examination of the host cell transcriptome suggest that ICD is associated with increased fucosylated mucins.
Fucose utilization in ICD
Given the increased abundance of transcripts from mucin-degrading enzymes and the increased availability of fucose in the colon of animals with ICD, we next investigated how fucose might be utilized. First, the metatranscriptome was subsetted by the annotation “alpha-l-fucosidase” to determine which microbes were liberating fucose. The top five species expressing this enzyme were all members of the Bacteroides genera: B. fragilis, B. xylanisolvens, B. vulgatus, B. thetaiotaomicron, and B. helcogenes. Expression of alpha-l-fucosidase by these species was significantly increased in ICD relative to controls.
The liberation of fucose from fucosylated mucins by these species is known to increase the amount of available fucose in the lumen [53]. We therefore investigated the expression of “fucose permease” to determine which microbes were importing fucose. The species with the most significantly increased abundance of “fucose permease” transcripts in ICD were Bacteroides salanitronis, Prevotella dentalis, Bacteroides fragilis, Bacteroides thetaiotaomicron, Bacteroides vulgatus, and Haemophilus influenzae. These results suggest that Bacteroides could potentially be cross-feeding fucose to Prevotella and Haemophilus in ICD.
An overview of mechanisms proposed by the results in this study is given in Fig. 8. Bacteroides sp. is known to stimulate host production of fucosylated mucins [54]. We found that enterocyte mucins and fucosyltransferase-3 transcripts were elevated in the host transcriptome. Immunohistochemistry of colon sections also illustrated the much higher presence of fucose in ICD relative to controls. Multiple Bacteroides sp. in ICD expressed more alpha-l-fucosidase, which liberates fucose from fucosylated mucins in the lumen. Bacteroides sp. as well as other microbes, such as Prevotella and Haemophilus, expressed fucose permeases which would have enabled them to import fucose, which likely improves their persistence in ICD. Interestingly, Campylobacter did not express more transcripts that give rise to proteins which import fucose; instead, it expressed more transcripts for adherence proteins, like cadF, that would have enabled it to adhere to fucosylated mucins.