CViewer: a Java-based statistical framework for integration of shotgun metagenomics with other omics datasets

Koci, Orges; Russell, Richard K.; Shaikh, M. Guftar; Edwards, Christine; Gerasimidis, Konstantinos; Ijaz, Umer Zeeshan

doi:10.1186/s40168-024-01834-9

Software
Open access
Published: 29 June 2024

CViewer: a Java-based statistical framework for integration of shotgun metagenomics with other omics datasets

Orges Koci¹,
Richard K. Russell²,
M. Guftar Shaikh³,
Christine Edwards¹,
Konstantinos Gerasimidis¹ &
…
Umer Zeeshan Ijaz ORCID: orcid.org/0000-0001-5780-8551^4,5,6

Microbiome volume 12, Article number: 117 (2024) Cite this article

Metrics details

Abstract

Background

Shotgun metagenomics for microbial community survey recovers enormous amount of information for microbial genomes that include their abundances, taxonomic, and phylogenetic information, as well as their genomic makeup, the latter of which then helps retrieve their function based on annotated gene products, mRNA, protein, and metabolites. Within the context of a specific hypothesis, additional modalities are often included, to give host-microbiome interaction. For example, in human-associated microbiome projects, it has become increasingly common to include host immunology through flow cytometry. Whilst there are plenty of software approaches available, some that utilize marker-based and assembly-based approaches, for downstream statistical analyses, there is still a dearth of statistical tools that help consolidate all such information in a single platform. By virtue of stringent computational requirements, the statistical workflow is often passive with limited visual exploration.

Results

In this study, we have developed a Java-based statistical framework (https://github.com/KociOrges/cviewer) to explore shotgun metagenomics data, which integrates seamlessly with conventional pipelines and offers exploratory as well as hypothesis-driven analyses. The end product is a highly interactive toolkit with a multiple document interface, which makes it easier for a person without specialized knowledge to perform analysis of multiomics datasets and unravel biologically relevant patterns. We have designed algorithms based on frequently used numerical ecology and machine learning principles, with value-driven from integrated omics tools which not only find correlations amongst different datasets but also provide discrimination based on case–control relationships.

Conclusions

CViewer was used to analyse two distinct metagenomic datasets with varying complexities. These include a dietary intervention study to understand Crohn’s disease changes during a dietary treatment to include remission, as well as a gut microbiome profile for an obesity dataset comparing subjects who suffer from obesity of different aetiologies and against controls who were lean. Complete analyses of both studies in CViewer then provide very powerful mechanistic insights that corroborate with the published literature and demonstrate its full potential.

Video Abstract

Introduction

Whilst advances in high-throughput sequencing technologies have revolutionized the study of uncultured microbial communities, recent microbiome surveys using whole-genome shotgun sequencing delineate an enormous amount of information about microbial genomes, their taxonomic and functional profiles. With the ability to assemble nearly complete genomes, we are now at the stage where we can understand the mechanistic underpinnings to the biological processes involved in the studies for which the sequencing data are generated. Additionally, we can use metadata (typically, physicochemical and clinical parameters) relevant to the study to highlight patterns of interest. Whilst many tools have become available in the past few years, from microbial genome binnings [1,2,3] to annotation [4, 5], as well as downstream statistical analyses [6, 7], there is still a dire need to consolidate all the biological entities of microbial genomes, such as gene products, mRNA, protein, metabolites, and their interactions in a single platform, and to enable exploration of this multi-component dataset in an easy-to-use interface.

This combination multiomics approach will be advantageous and will enable researchers, especially those who have a basic understanding about microbiology and microbial informatics, to quickly elucidate the interplay between microbiomes and their environments. Furthermore, combining visualizations with statistical inference in a single software platform has an added advantage of leveraging time spent on the analyses. Previous work towards this goal included conventional genome viewers like MGAViewer [8] where it is only possible for one to analyse at most two genomes together for comparative genomics, albeit, impractical for metagenomics exploration. Whilst there are other attempts such as Anvi’o [3], a tool that implements an advanced analysis and visualization platform for omics data, combining taxonomic coverages with phylogenetic trees, and is typically used as a de facto tool for metagenomics exploration, as well as Elviz [9] and WHAM [10]! which allow the exploration of metagenomic data. Although useful, these tools have limited support for statistical analyses, particularly Anvi’o [3] and Elviz [9], without any means to do an integrative assessment. In view of the limitations as above, there is an unmet need for improvement to fully exploit the potential of the shotgun metagenomics data, both in terms of graphical user interface interactivity, as well as statistical inference. This serves as the basis for developing CViewer in a language such as Java that is not only cross-platform compatible, but also has an established graphical interface to show relevant information on the fly.

The design principles of CViewer include Java-based solution able to run on users’ local machines to incorporate the output from CONCOCT [2], a binning software, as well as annotation data for the contigs from major third party taxonomic and annotation tools [5, 11]. All this is done to explore the sample space in the context of clinical metadata, with the interface given in Fig. 1. Beyond visualizing contigs from CONCOCT software, and looking at how they cluster at the species level, we have implemented a variety of statistical algorithms from numerical ecology literature, what is conventionally available in R’s vegan package [7], to allow exploratory as well as hypothesis driven analyses, emphasizing functional traits of microbial communities and their phylogenetic signal to assess community assembly. The implemented functionalities in CViewer include alpha (based on the Simpson Index [12], Shannon Entropy [13], and Pielou’s Evenness [14]) and beta diversity (principal component analysis [15], multi-dimensional scaling [16]) estimates; statistical procedures to test covariates in terms of correlation with microbial community structure (fuzzy set ordination [17]) and the variability they account for (PERMANOVA [18]) estimates (Fig. 1A); differential abundance (utilizing both the non-parametric Kruskall-Wallis [19] test and Friedman [20] test for repeated measures) and correlation analyses (including the Pearson’s [21], Kendall’s [22] and Spearman’s [23] coefficients) (Fig. 1B); enrichment analyses of KEGG [24] metabolic pathways (Fig. 1C); phylogenetic alpha diversity indices (based on the Net Relatedness Index and Nearest Taxon Index [25]) (Fig. 1D); visualization and analysis of coverage, clustering and functional diversity of metagenomics contigs (Fig. 1E); and integrative techniques for the analysis of multi-omics datasets (including the DISCO-SCA [26], JAVA [27], and O2PLS [28] algorithms) (Fig. 1F), with details given in the Additional file 1: Supplementary Note 1. The software and the accompanying data along with video demos exploring these features are available at https://github.com/KociOrges/cviewer.

To demonstrate how CViewer is useful in analysing the metagenomics datasets, we have next used the software to explore a dataset of gut microbiome composition and metabolic profile from children with active Crohn’s disease (CD) who undergo dietary therapy with exclusive enteral nutrition (EEN). The data were categorized as Crohn’s disease individuals who were treated with EEN for two months and healthy individuals to use as the reference microbiome and metabolome of healthy status. A smaller subset of these data has been previously published under Gerasimidis et al. [29], Quince et al. [30], and Alghamdi et al. [31], and additional samples and metagenomic sequencing were incorporated to see if the trends became more prominent. In addition, CViewer was tested on another dataset where the gut microbiome of children with pathological cause of hypothalamic obesity (Prader-Willi syndrome) was compared against children who suffer from common or classical type of obesity. Lean subjects from each of these two groups were used as controls. For more details on the datasets and the study characteristics, the reader is referred to Additional file 1: Supplementary Materials and Methods.

Results

Crohn’s disease dataset

Community structure of the gut metagenome and metabolome

Alpha diversity analysis using the Shannon index suggested significantly lower diversity in the gut metagenome of CD children compared to that of healthy controls (p < 0.0001) (Fig. 2A). Similar results were also found when we looked for differences during treatment with EEN (Fig. 2B and Additional file 1: Supplementary Note 2). Shannon diversity remained significantly lower in CD subjects than in controls prior to EEN, throughout treatment and post-treatment (A:Start of EEN, p = 0.0001; B:15d, p < 0.0001; C:30d, p < 0.0001; D:End of EEN, p < 0.0001; E:Free Diet, p = 0.0034). However, during EEN, the diversity of CD subjects decreased further, with the effect becoming noticeable after 15 days of treatment, reaching minimum diversity after ~ 30 days, and then showing a slight recovery towards the end of EEN, and complete recovery to pre-treatment levels when patients returned to habitual diet. When we explored for differences between the treatment days, our results suggested a significant difference only between the time points C and E of EEN for Shannon’s diversity (p = 0.0134).

Beta diversity analysis of the gut metagenome using PCA showed an evident clustering of CD patients distinct from controls (Fig. 2C). This was also observed using PERMANOVA (distances between groups) which suggested that 8.3% of the variation in community structure was explained significantly by the sample groups (p = 0.001). When the CD subjects were grouped according to EEN samples, PCA described higher variability of the community structure for CD children compared to healthy controls (Fig. 2D) and the amount of significantly explained variability according to PERMANOVA increased to 11.22% (p = 0.001) suggesting that more differences in the community structure were accounted to changes during EEN. However, after the CD patients completed the treatment and returned to their free habitual diet, the samples appeared closer to the pre-treatment groups. Furthermore, analysis using fuzzy set ordination (FSO) suggested a significant association between the microbial community structure of the CD individuals at point D (end of EEN) and E (free diet) of EEN with calprotectin (r = 0.638, p = 0.001), a marker of gut inflammation, (Additional file 1: Supplementary Note 2 and Figs. S4A–S4B) and showed that patients who achieved remission, especially at the end of EEN, were related to lower calprotectin levels compared with the patients who still have active disease following completion of treatment. Similarly, PCA analysis on the faecal metabolome showed a distinct separation for the healthy control samples from the CD patient group (Fig. 2E). In addition, PCA showed that the metabolomes from healthy children were more tightly clustered, whilst CD patients were more variable and changed during EEN treatment (Fig. 2F).

Differential abundance analysis

Abundance analysis showed several clusters (genomes expressed in average abundances across samples) differentiating significantly in the CD groups and the healthy controls (Additional file 1: Table S1). Amongst them, Bifidobacterium longum was in lower abundance in CD patients, at all sampling points, with a significant decrease noticed during EEN, when compared with the healthy control group (A:Start of EEN, p > 0.05; B:15d, p = 0.0002; C:30d, p = 0.0001; D:End of EEN, p < 0.0001) (Fig. 3A). Moreover, Bifidobacterium adolescentis was in significantly lower abundance in the CD samples at all sampling points with the effect being more evident after 15 days of EEN and onwards (A:Start of EEN, p = 0.0282; B:15d, p = 0.0002; C:30d, p = 0.0002; D:End of EEN, p < 0.0001; E:Free Diet, p = 0.0002) (Fig. 3A). In a similar way, Eubacterium rectale was found in a lower abundance in CD groups and remained significantly lower at all points, although it moved closer to the control levels post-EEN treatment on food reintroduction (A:Start of EEN, p = 0.0001; B:15d, p < 0.0001; C:30d, p < 0.0001; D:End of EEN, p < 0.0001; E:Free Diet, p = 0.0357) (Fig. 3A). In contrast, Escherichia coli was significantly more prevalent in samples from CD patients at EEN initiation compared with the healthy controls (p = 0.0009) and remained more abundant in CD individuals across all stages of EEN, although a decreasing pattern could be observed during the treatment course (Fig. 3A).

Differential analysis using Kruskal–Wallis also suggested that 487 annotated metabolites were significantly different between the CD groups and the healthy controls (Additional file 1: Table S2). More specifically, the levels of docosatetraenoic acid were found in a higher abundance in children with active CD at all sampling points compared to the healthy group (A:Start of EEN, p = 0.0001; B:15d, p = 0.0001; C:30d, p = 0.0017; D:End of EEN, p = 0.0002; E:Free Diet, p = 0.0089). (Fig. 3B). This was also the case for icosatrienoic acid (A:Start of EEN, p = 0.0014; B:15d, p = 0.0002; C:30d, p = 0.0026; D:End of EEN, p = 0.0007; E:Free Diet, p = 0.0147) and arachidonic acid (5Z_8Z_11Z_14Z-eicosatetraenoic acid) (A:Start of EEN, p = 0.0001; B:15d, p = 0.021; C:30d, p = 0.0195; D:End of EEN, p = 0.0189; E:Free Diet, p = 0.0056) (Fig. 3B). The compounds remained significantly higher than the controls during EEN although some of the metabolites regressed to pre-treatment levels when subjects returned to free diet, with the effect being most pronounced for arachidonic acid. In contrast, the levels of an ornithine isomer (N4-acetyl-N4-hydroxy-1-aminopropane) were significantly lower in the CD samples than in the healthy group, with the effect being more noticeable between the pre-treatment samples and the healthy controls (A:Start of EEN, p = 0.0013; B:15d, p = 0.016; C:30d, p = 0.0021; D:End of EEN, p = 0.0069; E:Free Diet, p = 0.020) (Fig. 3B).

Changes in faecal bacterial metabolites during EEN

Significant differences were observed when we assessed the quantitative changes in the concentration of major targeted bacterial metabolites of CD patients during EEN (Fig. 4A). Compared to healthy individuals, faecal pH was significantly higher when patients were on EEN, but no differences were observed at EEN initiation or when patients returned to their habitual diet (A:Start of EEN, p > 0.05; B:15d, p = 0.0005; C:30d, p = 0.0004; D:End of EEN p < 0.0001; E:Free Diet, p > 0.5) (Fig. 4A). This was also the case for total sulphide (A:Start of EEN, p > 0.05; B:15d, p = 0.0015; C:30d, p = 0.0004; D:End of EEN, p = 0.0018; E:Free Diet, p > 0.5) (Fig. 4A). In contrast, butyric acid was significantly lower in patients with CD compared to healthy controls with the effect being more evident after 15 days of EEN and onwards. This effect was lost when patients returned to their habitual diet (A:Start of EEN, p = 0.0137; B:15d, p = 0.0002; C:30d, p = 0.0003; D:End of EEN, p < 0.0001; E:Free Diet, p > 0.5) (Fig. 4A). A similar pattern was also found for the proportional ratio of butyric acid (A:Start of EEN, p = 0.0367; B:15d, p = 0.0002; C:30d, p = 0.0003; D:End of EEN, p = 0.0002; E:Free Diet, p > 0.5) (Fig. 4A).

Integrated analysis of metagenomics and metabolomics

An integrated analysis of faecal metagenomics and metabolomics was performed to investigate possible interactions between the two datasets. This methodology is particularly useful for decomposing the variability of the composite omics systems into a joint variability or common structure that highlights the biological mechanisms underlying both the omics types under study. High contributions to the common structure across the two datasets (expressed as component loadings) could indicate a mechanistic association between the microbes and the metabolites, such as that a species may release a particular metabolite, or that specific metabolites may stimulate the growth of a particular species.

To explore this, the DISCO-SCA [26] method was used for samples collected from CD patients before EEN initiation and healthy individuals (see Additional file 1: Supplementary Material and Methods and Supplementary Note 1 for more details on methods and data pre-processing). Figure 4B and C visualize the estimated DISCO-SCA joint loadings for the omics datasets for the first common component, sorted by absolute value. When we explored the 30 first species with the highest contribution values for the metagenome, we noticed that Rothia dentocariosa had the largest value in the common structure loadings, whilst a high contribution was also noticed for several species of the genus Streptococcus (Streptococcus intermedius/oralis/constellatus/gordonii/arginosus/parasanguinis/agalactiae) (Fig. 4B). In a similar way, when we examined the 30 first metabolites with the highest contribution in the common structure for the metabolome, we found that D-gluconic acid, 2-Aminomuconate semialdehyde, and N-(3-oxooctanoyl)-L-homoserine had the highest magnitude of positive loading values as compared to the other metabolites (Fig. 4C).

Obesity dataset

Bacterial community structure

Shannon’s entropy and Pielou’s evenness showed no significant difference in the microbial diversity between the four groups, suggesting that the bacterial communities of the participants were similar in species abundance and richness (p = 0.223) (Fig. 5A) and evenness (p = 0.228). Shannon entropy provided similar results when participants were grouped according to obesity status (p = 0.272) and obesity aetiology (p = 0.551). Moreover, although the difference was not significant (p = 0.156), it could be seen that the microbiota of the hypothalamic lean group was more diverse than the hypothalamic obese children (Fig. 5B). Beta diversity analysis using MDS showed no evident clustering in the community structure of the four groups, suggesting a similar microbial profile between the study participants (Fig. 5C). This was also confirmed by PERMANOVA, which suggested that the groupings did not account for a significant amount of the variability in the community structure (R² = 7.13%, p = 0.355). Using PERMANOVA, no significant effect of obesity phenotype (R² = 1.66%, p = 0.826) or obesity aetiology (R² = 2.12%, p = 0.532) on the community structure variation was observed. Moreover, differential analysis using Kruskal–Wallis did not suggest any significant differences in the average abundances of clusters (genomes) between the four groups when they were compared to each other, or when they were categorized according to obesity phenotype and aetiology.

Faecal bacterial metabolites

Significant differences in the proportion and concentration of bacterial metabolites were observed when the samples were grouped based on pathology (healthy vs. pathology). Healthy participants (healthy lean and healthy obese) had a significantly lower percentage (p = 0.001) of acetate compared with the pathology groups (hypothalamic lean and hypothalamic obese) (Fig. 5H). In contrast, healthy participants had a significantly higher percentage of butyrate (p = 0.001) and ammonia concentration (p = 0.013) than the hypothalamic individuals (Fig. 5H, I). In a similar way, when we examined the differences between the samples according to obesity status (healthy lean and hypothalamic lean vs. healthy obese and hypothalamic obese), it was noticed that the obese group (healthy obese and hypothalamic obese) had a significantly higher concentration (p = 0.011) of acetate, concentration (p = 0.001) and percentage (p = 0.001) of propionate, and concentration (p = 0.006) of total SCFA than the lean samples (Fig. 5I and Additional file 1: Fig. S5).

Implementation

CViewer was developed as a Java-based desktop application that has cross-platform portability since a compiled Java program can run on all platforms for which there exists a Java Virtual Machine (JVM). This holds for all major operating systems, including Windows, Mac OS, and Linux. Java is also very flexible for designing GUIs by providing a rich set of packages for graphics manipulation. For visualization, a combination of toolkits and chart libraries were used mainly from JavaFX, Swing, and JFreeChart, which offer easy-of-use benefits and extensible and pluggable components resilient to future enhancements. We have considered an intuitive integrated multiple document interface (MDI) as opposed to the single document interface (SDI) where you have one window per functionality to avoid cluttering up of windows on the desktop and to have a single window with multiple panes instead to visualize all the information. This makes the GUI more coherent, simpler to use and easier to operate.

The software design requires user input in the form of CSV, TSV, and GBK files. The business logic performs data entry validation and, when necessary, element matching for the input datasets to ensure compatibility for downstream analyses (i.e. all datasets should describe the same set of samples). In addition to user-provided data, the software relies on the local filesystem for retrieving and storing information relevant to analysis, such as the full list of KEGG metabolic pathways and maps and single-copy clusters of orthologous genes useful for phylogenomic analysis, which are given in the form of TXT, PNG, or XML files. A Document Object Model (DOM) parser is used to represent and modify XML pathway maps as tree structures, and ImageIO API to extract them into image formats. The software is distributed in the form of an executable, ready-to-run, and platform-independent package file (JAR) which requires no installation or further configuration from the user. The application interacts with the operating system, and in principle, no Internet connection is required when using the tool.

For the sake of simplicity, CViewer has been designed to make use of the available machine memory for data manipulation, instead of utilizing a dedicated database for this purpose (e.g. MySQL, MongoDB). This approach alleviates the efforts of database configuration and maintenance which can often be complicated and time-consuming, particularly for people with no related knowledge. However, it may still pose a challenge in terms of the computational power that is needed for the tool to achieve optimal performance. To address this, user-provided data are pre-processed upon entry and compressed into files of reduced size by maintaining information only essential for the tool to operate or present in the software interface, as is often the case for annotation files (GBK). In addition, indexing is applied as a pre-processing step for faster information retrieval of, e.g. CDS regions of a particular contig. Nevertheless, we still consider the implementation of a database as useful for CViewer, particularly in the case of a web-based version of the software. System maintenance would then rely exclusively on the application server making it possible for users to access the resource online with no further action. To achieve this, a new system design would need to be established employing web development technologies (e.g. SpringBoot, J2EE), an objective that we pursue to realize in a future version of our tool.

Discussion

CViewer has successfully revealed patterns of interest in two independent datasets: a dataset for longitudinal gut microbiome and metabolomic profiles from children with Crohn’s disease who undergo dietary treatment with EEN; as well as on gut microbiome profile for an obesity dataset where subjects have a non-pathological or a pathological cause of obesity, and as compared to those who are lean. In the former study, beta diversity analysis of the gut microbiome and metabolome showed a clear separation of the CD groups throughout treatment and post-treatment from the healthy controls. In addition, the gut microbiota of CD patients changed soon after EEN initiation was less diverse than in controls at all sampling points and decreased during EEN with a recovery to pre-treatment levels when patients returned to their habitual diet. These observations align with previous studies [32,33,34,35], including some of ours [30, 31], demonstrating a distinct clustering of the metabolomes [35] and metagenomes of CD patients from that of healthy controls and exhibiting general decreases in microbiota diversity relative to healthy individuals [32], especially during EEN treatment [33, 34], suggesting the usefulness of an integrative approach. Our analysis also showed that classic commensal organisms such as Bifidobacterium longum, Bifidobacterium adolescentis, and Eubacterium rectale differentiated in the CD groups over the course of EEN and were in lower abundance in CD children, compared to controls. These results align to previous research exploring the role of these organisms in CD pathogenesis [36]. Conversely, we noticed a higher abundance of E. coli in CD subjects than in controls; an expected outcome as the E. coli population is a much-studied topic in CD, particularly adherent invasive E. coli which are overrepresented in CD patients [37].

Also, in accordance with our previous findings [31], omega-6 fatty acids, including docosatetraenoic acid and arachidonic acid, were in significantly higher levels in the CD patients compared to healthy individuals and remained high both pre- and post-EEN treatment in the CD group. These findings conform to published work that highlights these compounds as pro-inflammatory in the gut and implicated in IBD [38, 39]. At the same time, by exploiting exploratory methods further provided by the software, such as FSO, we were able to highlight an association between calprotectin and the microbiome of CD patients, especially for those who achieved remission at point D (end of EEN) of treatment. In their research, Quince et al. [30] reported several taxa which were different between the CD and controls and significantly correlated with calprotectin, with Bifidobacterium spp. having the strongest negative and Atopobium spp. the strongest positive association with calprotectin in multivariate regression analysis. They also suggested that EEN caused the reduction in the relative abundance of gut bacteria that were positively and negatively associated with calprotectin. Although our study is limited in this perspective due to the lack of an implementation for regression analysis in the current version of the software, such an enhancement is one of our immediate future plans and with our findings for calprotectin and microbiome showing promising results, additional analysis on this aspect will be pursued in a future study with CViewer.

Finally, integrated analysis of the metagenome and the metabolome showed that D-gluconic acid, 2-Aminomuconate semialdehyde, and N-(3-oxooctanoyl)-L-homoserine had the highest magnitude of positive loading values as compared to the other metabolites, whilst a high contribution was observed for Rothia dentocariosa and several species of the genus Streptococcus (Streptococcus intermedius/oralis/constellatus/gordonii/arginosus/parasanguinis/agalactiae) in the common structure loadings, i.e. the biological mechanisms underlying both omics datasets. Rothia species are assumed to have important interactions with the host immune system that may promote inflammatory diseases, including Crohn’s disease. Rothia dentocariosa, in particular, has been previously reported to increase the production of the inflammatory cytokine tumour necrosis factor-alpha (TNF-a) and thus might act as an intermediate factor for increased inflammation in the oral cavity [40]. Moreover, Streptococci, especially the Streptococcus anginosus (milleri) group, comprising of S. intermedius, S. arginosus, and S. constellatus species, has been associated with liver abscess in patients with Crohn’s disease [41]. Similar to the metabolite loadings, the values for these species were also positive and could suggest a potentially synergistic positive interaction between the microbes and the metabolites, where the highly contributing metabolites could either promote the growth of the species highlighted above, or that those species may produce the particular metabolites. These associations captured with CViewer provide some novel insights into the interactions of the two omics datasets and as relevant studies are still not prevalent in literature, supplementary correlation analysis (using, e.g. Spearman’s correlation) of the metabolites and the species with high contributions in the common structure loadings could be useful to illustrate them even further as part of a future exploration with the tool.

In the latter study of lean and obese children of obesity of different aetiology, we demonstrated that the gut microbiota and metabolic activity did not differentiate between obesity of different aetiology or between obese and lean phenotypes, suggesting that gut microbiota is not incriminated in the aetiology of obesity. Moreover, our results described higher faecal SCFA in the two obese groups (common and hypothalamic obese) compared with lean (healthy and hypothalamic lean). Even though the role of SCFA in obesity is still not fully understood, recent research has demonstrated that a higher concentration of SCFA in faeces is associated with obesity and hypertension [42]. However, the absence of difference in SCFA concentration between individuals with obesity of different aetiology strongly suggests that the increased production of SCFA in these groups is a secondary effect of differences in dietary patterns and hyperphagia rather than a primary defect increasing the risk of obesity onset.

Conclusions

Analytics for shotgun sequencing metagenomics has seen substantial growth in recent years by availability of numerous tools that cover and advance a particular aspect of the analyses, with their usability confined to the passive running of scripts that allow little or no interactivity. CViewer on the other hand allows one to explore the datasets and benefits from the interactivity offered by the graphical user interface, particularly the implementation platform in Java is well suited towards this purpose. Secondly, we have incorporated comprehensive statistical tools in the framework to allow one place to process and analyse all the datasets without the need to use any third-party tools such as R or Matlab. Moreover, multiomics exploration allows seamless integration of metagenomics with other omics technologies, such as metabolomics, which reveal patterns that not only consolidate/correlate between multiple datasets but provide discriminatory cues for multiple treatment groups. This way, an overarching linkage between multiple modalities fills in the gaps in our understanding of how microbes are behaving in the context of the hypothesis under which the data is generated. Whilst we are exploring community assembly using phylogenetic alpha diversity measures (NRI/NTI), some new methods have appeared in recent years, some that are extensions of the above [43, 44] and, if implemented, can give further insights into the community assemblage processes. Nonetheless, in the absence of these methods, the existing functionality is sufficient to reveal useful patterns in the metagenomics datasets. Moreover, whilst we have tested and demonstrated the integration of metagenomics with metabolomics in CViewer, future versions can possibly extend the analysis to a simultaneous exploration of metagenomics with two or more modalities including proteomics and transcriptomics to enable far more extensive exploration of the data.

Availability of data and materials

The code for CViewer software and the associated data are available at https://github.com/KociOrges/cviewer.

References

Lu YY, Chen T, Fuhrman JA, Sun F. COCACOLA: binning metagenomic contigs using sequence COmposition, read CoverAge CO-alignment and paired-end read LinkAge. Bioinformatics. 2017;33:791–8.
Article CAS PubMed Google Scholar
Alneberg J, et al. Binning metagenomic contigs by coverage and composition. Nat Methods. 2014;11:1144–6.
Article CAS PubMed Google Scholar
Eren AM, et al. Anvi’o: an advanced analysis and visualization platformfor ’omics data. PeerJ. 2015;3:e1319.
Article PubMed PubMed Central Google Scholar
Overbeek R, et al. The subsystems approach to genome annotation and its use in the project to annotate 1000 genomes. Nucleic Acids Res. 2005;33:5691–702.
Article CAS PubMed PubMed Central Google Scholar
Seemann T. Prokka: rapid prokaryotic genome annotation. Bioinformatics. 2014;30:2068–9.
Article CAS PubMed Google Scholar
Love MI, Huber W, Anders S. Moderated estimation of fold change and dispersion for RNA-seq data with DESeq2. Genome Biol. 2014;15:550.
Article PubMed PubMed Central Google Scholar
Oksanen J, et al. Package ‘vegan’ Title Community Ecology Package Version 2.5–6. 2019.
Google Scholar
Zhu Z, et al. MGAviewer: a desktop visualization tool for analysis of metagenomics alignment data. Bioinformatics. 2013;29:122–3.
Article CAS PubMed Google Scholar
Cantor M, et al. Elviz - exploration of metagenome assemblies with an interactive visualization tool. BMC Bioinformatics. 2015;16:130.
Article PubMed PubMed Central Google Scholar
Devlin JC, Battaglia T, Blaser MJ, Ruggles KV. WHAM!: a web-based visualization suite for user-defined analysis of metagenomic shotgun sequencing data. BMC Genomics. 2018;19:1–11.
Article Google Scholar
Wood DE, Salzberg SL. Kraken: ultrafast metagenomic sequence classification using exact alignments. Genome Biol. 2014;15:R46.
Article PubMed PubMed Central Google Scholar
Simpson EH. Measurement of diversity [16]. Nature. 1949;163:688. Preprint at https://doi.org/10.1038/163688a0.
Article Google Scholar
Spellerberg IF, Fedor PJ. A tribute to Claude-Shannon (1916–2001) and a plea for more rigorous use of species richness, species diversity and the ‘Shannon-Wiener’ Index. Glob Ecol Biogeogr. 2003;12:177–9.
Article Google Scholar
Wiegand H. Pielou, E. C. An introduction to mathematical ecology. Wiley Interscience. John Wiley & Sons, New York 1969. VIII + 286 S., 32 Abb., Preis 140 s. Biom Z. 1971;13:219–20.
Article Google Scholar
Wold H. Soft modelling by latent variables: the non-linear iterative partial least squares (NIPALS) approach. J Appl Probab. 1975;12:117–42.
Article Google Scholar
Gower JC. Some distance properties of latent root and vector methods used in multivariate analysis. Biometrika. 1966;53:325–38.
Article Google Scholar
Roberts DW. Ordination on the basis of fuzzy set theory. Vegetatio. 1986;66:123–31.
Article Google Scholar
Anderson MJ, Ellingsen KE, McArdle BH. Multivariate dispersion as a measure of beta diversity. Ecology Letters, 9(6), 683–693. 2006, doi: 10.1111/j.1461-0248.2006.00926.x of beta diversity. Ecol Lett. 2006;9:683–93.
Article PubMed Google Scholar
Kruskal WH, Wallis WA. Use of ranks in one-criteron analysis of variance. J Am Stat Assoc. 1952;47:583–621.
Article Google Scholar
Siegel S, John Castellan N Jr. Nonparametric statistics for the behavioral sciences, International Edition. 1988. p. 262–72.
Google Scholar
Pearson K. Mathematical contributions to the theory of evolution. III. Regression, heredity, and panmixia. Philos Trans R Soc Lond A. 187;253–318. Preprint at https://doi.org/10.2307/90707.
Kendall MG. A new measure of rank correlation. Biometrika. 1938;30:81–93.
Article Google Scholar
Spearman C. ‘General intelligence’, objectively determined and measured. Am J Psychol. 1904;15:201.
Article Google Scholar
Du J, et al. KEGG-PATH: Kyoto encyclopedia of genes and genomes-based pathway analysis using a path analysis model. Mol Biosyst. 2014;10:2441–7.
Article CAS PubMed Google Scholar
Webb CO. Exploring the phylogenetic structure of ecological communities: an example for rain forest trees. Am Nat. 2000;156:145–55.
Article PubMed Google Scholar
Schouteden M, Van Deun K, Wilderjans TF, Van Mechelen I. DISCO-SCA. Behav Res Methods. 2014;46:576–87.
Article PubMed Google Scholar
Lock EF, Hoadley KA, Marron JS, Nobel AB. Joint and individual variation explained (JIVE) for integrated analysis of multiple data types. Ann Appl Stat. 2013;7:523–42.
Article PubMed PubMed Central Google Scholar
Trygg J, Wold S. O2-PLS, a two-block (X-Y) latent variable regression (LVR) method with an integral OSC filter. J Chemom. 2003;17:53–64.
Article CAS Google Scholar
Gerasimidis K, et al. Decline in presumptively protective gut bacterial species and metabolites are paradoxically associated with disease improvement in pediatric Crohn’s disease during enteral nutrition. Inflamm Bowel Dis. 2014;20:861–71.
Article PubMed Google Scholar
Quince C, et al. Extensive modulation of the fecal metagenome in children with Crohn’s disease during exclusive enteral nutrition. Am J Gastroenterol. 2015;110:1718–29.
Article CAS PubMed PubMed Central Google Scholar
Alghamdi A, et al. Untargeted metabolomics of extracts from faecal samples demonstrates distinct differences between paediatric Crohn’s disease patients and healthy controls but no significant changes resulting from exclusive enteral nutrition treatment. Metabolites. 2018;8:82.
Article PubMed PubMed Central Google Scholar
Jacobs JP, et al. A disease-associated microbial and metabolomics state in relatives of pediatric inflammatory bowel disease patients. Cell Mol Gastroenterol Hepatol. 2016;2:750–66.
Article PubMed PubMed Central Google Scholar
Kaakoush NO, et al. Effect of exclusive enteral nutrition on the microbiota of children with newly diagnosed Crohn’s disease. Clin Transl Gastroenterol. 2015;6:e71.
Article CAS PubMed PubMed Central Google Scholar
Guinet-Charpentier C, Lepage P, Morali A, Chamaillard M, Peyrin-Biroulet L. Effects of enteral polymeric diet on gut microbiota in children with Crohn’s disease. Gut. 2017;66:194–5.
Article CAS PubMed Google Scholar
Bjerrum JT, Wang Y, Hao F, Coskun M, Ludwig C, Günther U, et al. Metabonomics of human fecal extracts characterize ulcerative colitis, Crohn’s disease and healthy individuals. Metabolomics. 2015;11:122–33.
Article CAS PubMed Google Scholar
Kabeerdoss J, Jayakanthan P, Pugazhendhi S, Ramakrishna BS. Alterations of mucosal microbiota in the colon of patients with inflammatory bowel disease revealed by real time polymerase chain reaction amplification of 16S ribosomal ribonucleic acid. Indian J Med Res. 2015;142:23–32.
Article CAS PubMed PubMed Central Google Scholar
Kotlowski R, Bernstein CN, Sepehri S, Krause DO. High prevalence of Escherichia coli belonging to the B2+D phylogenetic group in inflammatory bowel disease. Gut. 2007;56:669–75.
Article CAS PubMed Google Scholar
Musso G, Gambino R, Cassader M. Interactions between gut microbiota and host metabolism predisposing to obesity and diabetes. Annu Rev Med. 2011;62:361–80.
Article CAS PubMed Google Scholar
Kaliannan K, Wang B, Li X-Y, Kim K-J, Kang JX. A host-microbiome interaction mediates the opposing effects of omega-6 and omega-3 fatty acids on metabolic endotoxemia. Sci Rep. 2015;5:11276.
Article CAS PubMed PubMed Central Google Scholar
Kataoka H, et al. Rothia dentocariosa induces TNF-alpha production in a TLR2-dependent manner. Pathog Dis. 2014;71:65–8.
Article CAS PubMed Google Scholar
Narayanan S, et al. Crohn’s disease presenting as pyogenic liver abscess with review of previous case reports. Am J Gastroenterol. 1998;93:2607–9.
Article CAS PubMed Google Scholar
de la Cuesta-Zuluaga J, et al. Higher fecal short-chain fatty acid levels are associated with gut microbiome dysbiosis, obesity, hypertension and cardiometabolic disease risk factors. Nutrients. 2019;11:51.
Article Google Scholar
Ning D, Deng Y, Tiedje JM, Zhou J. A general framework for quantitatively assessing ecological stochasticity. Proc Natl Acad Sci U S A. 2019;116:16892–8.
Article CAS PubMed PubMed Central Google Scholar
Kraft NJB, et al. Disentangling the drivers of β diversity along latitudinal and elevational gradients. Science. 2011;1979(333):1755–8.
Article Google Scholar

Download references

Acknowledgements

We would like to thank and dedicate this paper in memory of Dr. Jaffar Khan who performed the sample collection and laboratory analysis of the obesity datasets. We would also like to thank Christopher Quince who assisted with the generation of contigs for the Crohn’s disease dataset.

Availability and requirements

Project name: CViewer.

Project home page: https://github.com/KociOrges/cviewer.

Operating system(s): Platform independent.

Programming language: Java.

Other requirements: Java Platform Standard Edition 1.8.0 or later.

Licence: MIT.

Any restrictions to use by non-academics: no.

Funding

UZI is funded by NERC IRF NE/L011956/1, BBSRC BB/T010657/1, and EPSRC EP/V030515/1. OK is supported by Nestle Industrial PhD Partnership with the University of Glasgow. RKR is supported by a National Health Service senior research fellowship. The Glasgow Children Hospital Charity and the Children with Crohn’s and Colitis funded the metagenomics analysis of the datasets.

Author information

Authors and Affiliations

Human Nutrition, School of Medicine, College of Medical, Veterinary and Life Sciences, University of Glasgow, Glasgow Royal Infirmary, Glasgow, G4 0SF, UK
Orges Koci, Christine Edwards & Konstantinos Gerasimidis
Department of Paediatric Gastroenterology, Hepatology and Nutrition, Royal Hospital for Children & Young People, Edinburgh, EH16 4TJ, UK
Richard K. Russell
Department of Endocrinology, Royal Hospital for Children, Glasgow, 1345 Govan Rd., Glasgow, G51 4T, UK
M. Guftar Shaikh
Water & Environment Research Group, University of Glasgow, Mazumdar-Shaw Advanced Research Centre, Glasgow, G11 6EW, UK
Umer Zeeshan Ijaz
National University of Ireland, Galway, University Road, Galway, H91 TK33, Ireland
Umer Zeeshan Ijaz
Department of Molecular and Clinical Cancer Medicine, University of Liverpool, Liverpool, L69 7BE, UK
Umer Zeeshan Ijaz

Authors

Orges Koci
View author publications
You can also search for this author in PubMed Google Scholar
Richard K. Russell
View author publications
You can also search for this author in PubMed Google Scholar
M. Guftar Shaikh
View author publications
You can also search for this author in PubMed Google Scholar
Christine Edwards
View author publications
You can also search for this author in PubMed Google Scholar
Konstantinos Gerasimidis
View author publications
You can also search for this author in PubMed Google Scholar
Umer Zeeshan Ijaz
View author publications
You can also search for this author in PubMed Google Scholar

Contributions

UZI and KG designed the study; UZI, KG, and RKR directed this study as supervisors for OK; OK wrote the software under the guidance of UZI and carried out the statistical analysis; OK and UZI wrote the manuscript; KG critically interpreted findings; RKR, CE, and MGT provid-ed feedback on the manuscript and clinical relevance/translation of this work; All authors read, commented on, and approved the paper.

Corresponding author

Correspondence to Umer Zeeshan Ijaz.

Ethics declarations

Ethics approval and consent to participate

The Crohn’s disease study received ethics approval by the Yorkhill Research Ethics Committee (05/S0708/66). The obesity study was approved by the West of Scotland Research Ethics Committee (WoREC) and the Research and Development Department of National Health Service (R&D NHS) Greater Glasgow and Clyde on 14 September 2011 for a period of 4 years under the study reference number WS/11/032 and title “Diet, gut microbiota, and energy from colonic fermentation of dietary carbohydrates in children with simple and pathological obesity; cause or effect?”. For both studies, the carer and patients provided written consent.

Consent for publication

All authors read, commented on, and approved the paper.

Competing interests

The authors declare no competing interests.

Additional information

Publisher’s Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Supplementary Information

40168_2024_1834_MOESM1_ESM.docx

Additional file 1: Supplementary Materials and Methods, Notes 1-3, Tables S1–S2, and Figs. S1–S5. Table S1. List of CONCOCT clusters (genomes) that were significantly different in the CD groups (before, during, and after EEN treatment) compared with the group of healthy controls (H) (P-value < 0.05). Mean expression indicates the mean log-normalized abundance for each genome. A post hoc pairwise Dunn’s comparison indicating significant differences between the groups is shown on the right half. A; StartofEEN; B; 15d; C; 30d; D; EndofEEN; E; FreeDiet; H; healhy. Table S2. List of metabolites that were significantly different in the CD groups (before, during, and after EEN treatment) compared with the group of healthy controls (H) (P-value < 0.05). Mean expression indicates the mean Pareto-scaled frequency for each metabolite. A post hoc pairwise Dunn’s comparison indicating significant differences between the groups is shown on the right half. A; StartofEEN; B; 15d; C; 30d; D; EndofEEN; E; FreeDiet; H; healhy. Table S3. Table showing significant differences in subject characteristics between subjects who suffer from obesity of different aetiology and against controls who are lean (P-value < 0.05). SDS; Standard Deviation Scores; Ht; Height (cm), Wt; Weight (Kg), BMI; Body Mass Index (kg/m2).

Rights and permissions

Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article's Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article's Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by/4.0/. The Creative Commons Public Domain Dedication waiver (http://creativecommons.org/publicdomain/zero/1.0/) applies to the data made available in this article, unless otherwise stated in a credit line to the data.

Reprints and permissions

About this article

Cite this article

Koci, O., Russell, R.K., Shaikh, M.G. et al. CViewer: a Java-based statistical framework for integration of shotgun metagenomics with other omics datasets. Microbiome 12, 117 (2024). https://doi.org/10.1186/s40168-024-01834-9

Download citation

Received: 16 July 2023
Accepted: 09 May 2024
Published: 29 June 2024
DOI: https://doi.org/10.1186/s40168-024-01834-9

CViewer: a Java-based statistical framework for integration of shotgun metagenomics with other omics datasets

Abstract

Background

Results

Conclusions

Introduction

Results

Crohn’s disease dataset

Community structure of the gut metagenome and metabolome

Differential abundance analysis

Changes in faecal bacterial metabolites during EEN

Integrated analysis of metagenomics and metabolomics

Obesity dataset

Bacterial community structure

Faecal bacterial metabolites

Implementation

Discussion

Conclusions

Availability of data and materials

References

Acknowledgements

Availability and requirements

Funding

Author information

Authors and Affiliations

Contributions

Corresponding author

Ethics declarations

Ethics approval and consent to participate

Consent for publication

Competing interests

Additional information

Publisher’s Note

Supplementary Information

40168_2024_1834_MOESM1_ESM.docx

Rights and permissions

About this article

Cite this article

Share this article

Microbiome

Contact us