Gut microbiome analysis of indigenous Malaysians and urban controls
To identify and characterize helminth-associated microbiome effects, this study consisted of a cross-sectional component that compares urban individuals (n = 56) living in Kuala Lumpur (KL) with indigenous Orang Asli (OA) (n = 351) from five different villages (Figs. S1 and S2), as well as a longitudinal component to examine changes to the microbiome after anthelmintic (albendazole) treatment. A total of 650 fecal samples (including longitudinal samples) were processed for metagenomic sequencing, resulting in 11,480,206,516 paired reads after quality control and contamination removal (Supplementary Fig. S3). We compared different OA villages, which have different prevalence of soil-transmitted helminth infections (Supplementary Figs. S1 and S2). In the longitudinal phase of the study, consented OA subjects were treated with 400 mg albendazole for 3 consecutive days after collection of the first fecal sample. At 21 and 42 days following treatment, additional fecal samples were collected; however, this phase of the study was disrupted by the COVID-19 pandemic, reducing the number of paired samples available. KL subjects were not treated with albendazole, and they provided only one sample. Questionnaire data were collected and analyzed for some of the study subjects (n = 340).
When we first mapped the metagenomic sequences to RefSeq (i.e., bacteria, protozoa, fungi, viral, archaea genomes), we observed a very low percentage of mapped reads (median: 41.6%). However, when we mapped the sequences to databases that incorporate MAGs (i.e., HRGM [25] and UHGG [3], the percentage of sequencing reads mapped to HRGM (median: 91.5%) and UHGG (median: 87.9%) was much higher than RefSeq (Fig. 1A). Additionally, the percentage of mapped reads to all three databases was higher in KL subjects than the OA population (HRGM: p = 2.6e−11; UHGG: p = 1.2e−07; RefSeq: p = 2e−07) (Fig. 1A), indicating that there are more unknown microbial genomes in the OA population.
Utilizing HRGM, we determined the core microbiota for the Malaysian population and found that 237 core bacterial species were 100% shared among the subjects (Fig. 1B; Supplementary Figs. S4 A–E and S5 A–C). The most abundant phylum was Firmicutes A, the majority of which were uncultured species [3] (Fig. 1B). Agathobacter rectalis, Blautia_A wexlerae, and Agathobacter faecis were the main species from Firmicutes A (Fig. 1B). Using a cross-validated random forest model to identify core microbiota species driving the variation between OA vs KL subjects, we achieved a mean prediction accuracy of 98.05% at a kappa of 96.06% (out-of-bag error = 1.8%). Megamonas funiformis, Phocaeicola plebeius A, Bacteroides stercoris, Phocaeicola massiliensis, and HRGM Genome 3145 were the top five predictors between OA and KL subjects (Supplementary Fig. S6 A–C). Of these, HRGM Genome 3145, Gemmiger sp900539695, and Blautia A sp000436615 were more abundant in OA subjects, while Megamonas funiformi, Phocaeicola plebeius A, and Bacteroides stercoris were more abundant in KL subjects (Supplementary Fig. S6 A–C). The bacterial species with the largest variation (cutoff 6.0 for the coefficient of variation) among the core gut microbiota is shown in Supplementary Fig. S7A. To control for covariates, we utilized (MaAsLin2) to identify bacterial taxa differentially abundant between OA and KL subjects that are independent of village, age, and sex. Fourteen bacterial species, of which many are uncharacterized, including HRGM_Genome_2427 (p = 0.009), CAG 964.sp000435335 (p = 0.009), and Ruminococcus_E sp003438075 (p = 0.009), are more abundant in OA subjects, whereas HRGM_Genome_0171 (p = 2e−4) and HRGM_Genome_3486 (p = 0.009) are more abundant in KL subjects (Fig. 1C and Supplementary Table S1).
The Orang Asli live in different geographical settings and have distinctive cultures and lifestyles. We found that KL subjects have higher mapped reads than all OA villages (Fig. 1D; Supplementary Fig. S7 B and C), and the percentage of mapped reads from both villages Rasau (p = 2e−16) and Legong (p = 2e−16) was markedly lower compared to KL (Fig. 1D; Supplementary Fig. S7 B and C). To compare pairwise beta diversity at the species level within each village group to the KL cohort and to use a reference independent strategy as an alternative approach, we assessed Jaccard distances using 21 nucleotide k-mers and genus-level annotations from HRGM, which showed similar results (Fig. 1E). In addition, we observed that Rasau and Legong had the highest beta diversity and nucleotide dissimilarity compared to KL (Fig. 1E). Moreover, comparison of bacterial communities at species level across geographical locations using Jaccard distance revealed substantial differences between villages (ADONIS: p = 0.001, R2 = −0.073; analysis of similarity [ANOSIM]: p = 0.001, R = 0.215) (Fig. 1F, Supplementary Table S2). From the principal coordinate analysis (PCoA) plot (Fig. 1F), we observed clustering of the samples from Rasau and Legong. Conversely, the samples from Bangkong and Sepat were clustered together with KL, while Judah exhibited a more dispersed distribution. Hence, OA subjects in Rasau and Legong were more similar in gut microbial composition and were different from KL and other villages. Equivalent beta-diversity results were observed with other k-mers sketches (31 and 51) and at the species level (Fig. S8 A–G).
Village-dependent effects of helminth infection on the gut microbiome
We determined the infection intensity and the prevalence of intestinal helminth infection among the 351 OA subjects and found that Trichuris infection (61.8%, n = 217) was the most predominant, followed by hookworm (20.8%, 73) and Ascaris (17.9%, 63) infections (Fig. 2A). The distribution of age and gender of these subjects is shown in Fig. S9 A and B. The overall prevalence of helminth infection was 67.2% (n = 236) (Fig. 2A), and infection intensity was summarized in Fig. S9C. For beta diversity at species level, based on PCoA, there were differences in gut microbiome between infected and uninfected individuals; however, statistically, the effect size was small (ADONIS: p = 0.001, R2 = 0.024; ANOSIM: p = 0.001, R = 0.145) (Fig. 2B, Supplementary Table S2), which was also the case for Bray-Curtis distance and nonmultidimensional scaling (NMDS) ordination (Supplementary Fig. S10 A–C).
For alpha diversity at species level, we observed higher species richness in the samples from infected subjects (p = 2.50e−5) (Fig. 2C). This relationship was confirmed by a linear mixed model analyses controlling for village as a random effect (p = 1.18 × 10−6). Individuals infected with either single (p = 0.005) or multiple species of helminths (p = 0.033) had higher species richness (Fig. 2C). Trichuris-infected OA (p = 9.70e−06) had higher species richness than uninfected (Fig. 2C), including those infected at light (eggs per gram [epg] < 999; p = 0.045) and moderate (epg < 9,999; p = 8.34e−07) intensities (Fig. 2C). Other alpha-diversity indices (i.e., Shannon and Simpson, at species level as well) are shown in Supplementary Fig. S11 A–H, and results for each village are shown in Supplementary Fig. S12 A–E. The prevalence of helminth infection varied according to village and was highest in Rasau (89.6%, n = 43 of 48), followed by Legong (81.0%, 81 of 100), Judah (71.6%, 83 of 116), Sepat (55.0%, 22 of 40), and Bangkong (14.9%, 7 of 47) (Fig. 2D). As Trichuris was the predominant helminth, the prevalence of Trichuris was similar for Rasau (81.3%, 39 of 48), Legong (77.0%, 77 of 100), Judah (65.6%, 76 of 116), Sepat (50.0%, 20 of 40), and Bangkong (10.6%, 5 of 47) (Fig. 2D). There was no infected individual with helminths in KL. The two villages with the highest prevalence, Rasau (p = 2.0e−4) and Legong (p = 8.1e−07), showed higher species richness compared to KL (Fig. 2E). Also, we observed that species richness appeared to be greater when helminth infections in the villages were more prevalent, which was similar to the order of villages for unmapped reads shown in Fig. 1D. To determine if Trichuris infection intensity was associated with unmapped reads, we performed a Spearman correlation test and found that the intensity of Trichuris infection was positively correlated (p = 3.2e−06, R = 0.25) with the percentage of unmapped reads to the HRGM database (Fig. 2F). These results indicated that helminth infections were associated with underrepresentation in the catalog of bacterial genomes.
We next determined the relative contribution of village and helminth infection status on the gut microbiome in relation to other factors (e.g., whether they had probiotic food, diarrhea, or antibiotics in the past 3 months, different age groups, and protozoa infection). ADONIS analysis at species level indicated that only village (p =1.000e−4, F-value = 1.672, R2 = 0.025) and helminth status (p = 0.028, F-value = 1.387, R2 = 0.010) had significant effects on the gut microbiome composition (Fig. 2G). Since village has the largest effect size on gut microbiome composition, we next used MaAsLin2 [26] to identify bacterial species that were differentially abundant between Trichuris infected and uninfected individuals from specific villages. Importantly, we found that the bacterial species that were most differentially abundant between infected and uninfected subjects were unique to each village (Supplementary Fig. S13A). For example, Haemophilus_A.parahaemolyticus and Corynebacterium provencense were different in Bangkong and Rasau, whereas Prevotella.sp900316565 was different in Sepat, C941.sp004557565 and UBA10281.HRGM_Genome_2392 in Judah, Prevotella.sp900546575 and Prevotella.HRGM_Genome_3676 in Legong, and UBA1829.sp900549045 and F082.HRGM_Genome_5331 in Rasau (Supplementary Fig. S13A). Similar patterns of results were obtained with ANCOM-BC (Supplementary Fig. S13B). These results indicated that helminth infections may have different effects on the gut microbiome in different villages.
To specifically test the hypothesis that effects of helminth infection on the microbiome are highly dependent on village, we used multivariate distance matrix regression (MDMR) [27] to test for statistical interactions between helminth infection and village and to calculate relative effect sizes on microbiota variation at species level. We find that there was a significant interaction between village and helminth infection, and that the effect of village is greater than helminth infection status after accounting for the effects of this interaction (Fig. 2H).
To identify bacterial features that are significant in helminth-village interactions, as well as independent of these covariates, we used MaAsLin2 with helminth and village as fixed effects, to identify bacteria that are independent and associated with the interaction between helminth and village. Of the 230 helminth-associated bacteria, more than 55% (n = 135) were associated with village (Fig. 2I). Hence, most of the effects of helminth-associated bacteria are village dependent, and in different villages, there are different bacteria associated with helminth infections. Several Lactobacillus species, including Lactobacillus gasseri and Lactobacillus crispatus, were associated with helminth infection independent of village (Supplementary Fig. S14 and Table S3).
Dynamic changes to the gut microbiome after anthelmintic treatment
Longitudinal interventional approaches provide stronger assessment of cause-and-effect relationships. Fecal samples analyzed at pre- and post-anthelmintic treatment provided insights into the effects of deworming on the gut microbiome. Individual subjects were grouped into four categories (i.e., full responders [n = 43 paired; from 26–33,099 to 0 epg], partial responders [n = 23 paired; from 281–119,875 to 26–71,579 epg], nonresponders [n = 5 paired; from 204–1097 to 281–1632 epg], and uninfected [n = 58 paired]), based on the Trichuris infection intensity before and after deworming (Fig. 3A). While mixed infection was present in some individuals, hookworm and Ascaris infection were always cured after deworming (Supplementary Fig. S15A).
First, we compared pre and post samples for responders, which include both full and partial responders. PCoA based on Jaccard distances showed that there are differences in gut microbiota composition at species level between pre and post treatment, but the effect size was small (ADONIS: p = 0.001, R2 = 0.014; ANOSIM: p = 0.001, R = 0.072) (Fig. 3B, Supplementary Table S4). Since albendazole may have a direct effect on the microbiota, we next compared the gut microbiota profile pre and post treatment for uninfected individuals. Similar to the responders, PCoA based on Jaccard distances also indicated differences in gut microbiota composition at species level between pre and post samples, with a small effect size (ADONIS: p = 0.006, R2 = 0.012; ANOSIM: p = 0.001, R = 0.069) (Fig. 3C, Supplementary Table S4). NMDS ordination, Bray-Curtis distance matrix, and beta-dispersion analysis showed similar results (Supplementary Figs. S16 A–E and S17 A–E, Table S4), and there were no significant changes to alpha diversity at species level between pre and post treatment (i.e., Richness, Shannon, Simpson) (Supplementary Fig. S15 B and C).
Using MaAsLin2 for differential abundance testing, we found changes of 911 bacterial species at pre and post treatment among responders. However, there was substantial overlap with changes found in pre and post treatment samples for uninfected individuals (658 species, 72.2%) (Fig. 3D and Supplementary Fig. S18A), with only 253 taxa which were specific to the responders. For example, in both responders and uninfected individuals, the relative abundance of Collinsella sp003466125 (p = 1.52e−08; p = 3.66e−07, respectively) and RUG013.sp001486445 (p = 2.53e−07; p = 3.80e−06) was reduced after deworming, while the relative abundance of Bilophila sp900550745 increased (p = 1.33e−08; p = 5.81e−05) (Supplementary Fig. S18 B and C). To assess the longitudinal effects of albendazole treatment, we also used MaAsLin2 to identify taxa altered by treatment response, controlling for infection status and village as fixed effects. Of the 576 species that were identified to be associated with these covariates, the majority were associated with village (n = 305) and with infection status (n = 200), and only 69 species were associated with treatment response, of which only four species were independent of village and infection status (Supplementary Fig. S19A). Of the four, only one (CAG.245.sp900552135) showed a statistically significant (Supplementary Fig. S19B) association with treatment response but independent of village or infection status (Supplementary Table S5).
Next, we used MaAsLin2 to identify bacterial taxa that were associated with drug response and helminth status, correcting for village as a covariate. There were a total of 293 bacteria species that were associated with drug response, only six of which were associated with host response (Supplementary Fig. S19C). There were only two taxa (Sutterella HRGM Genome 4418 and Muricomes contorta B) associated with deworming treatment in responders but not in nonresponders in this model. The Sutterella HRGM Genome 4418 (p = 8.10e−06) was more abundant in helminth-infected individuals, whereas Muricomes contorta B (p = 0.024) was more enriched in nonhelminth-infected individuals (Fig. 3E). In contrast, there were many more taxa (n = 295) that are associated with deworming treatment in both responders and nonresponders like Collinsella sp900540845 and Collinsella stercoris (Fig. 3F), which indicates that the effects of albendazole were greater on the bacterial communities than helminth infection itself (Supplementary Table S6). Hence, albendazole may have a substantial effect on the microbiota that may be an important confounding factor for deworming studies.
In some individuals, we conducted a follow-up study 42 days post-anthelmintic treatment. There were no differences in alpha diversity on day 42 (Supplementary Fig. S20A), and although beta-diversity analysis at species level showed significant differences between three timepoints (i.e., pre, 21 days, and 42 days) (Supplementary Fig. S20 B–C, Supplementary Table S7), these differences are driven by the pre-treatment samples (Supplementary Fig. S20D). Therefore, the changes in the gut microbiome in both responders and uninfected individuals after albendazole treatment remain stable by day 42.
Bacterial replication in the context of helminth infection
Actively replicating bacteria can be identified by calculating the index of replication based on coverage trends of bidirectional genome replication from a single origin of replication. We used the algorithm growth rate index (GRiD) to estimate the growth rate of gut bacteria in relation to helminth infection status. Spearman correlation analysis on Trichuris egg burden with the growth rate of the bacteria identified 350 bacterial species with growth rate associated with Trichuris egg burden (Fig. 4 A and B, Supplementary Fig. S21, and Supplementary Table S8). Prevotella stercorea replication was most positively associated (p = 1.58e−14, R = 0.39) with egg burden, while Bifidobacterium longum (p = 1.30e-11, R = −0.35) and Phocaeicola vulgatus (p = 3.45e−9, R = −0.31) were negatively associated with egg burden. Using a linear mixed model of Trichuris egg burden while controlling for village, Prevotella stercorea and Bifidobacterium longum were significantly associated with egg burden (p = 4.89e−06 and p = 7.99e−08, respectively). The predicted replication rate of Prevotella sterorea was higher in Trichuris-infected individuals (p = 1.30e−09), while the predicted replication rate of Bifidobacterium longum and Phocaeicola vulgatus was notably lower in Trichuris-infected individuals (p = 4.50e−09and p = 9.80e−09, respectively) (Fig. 4C).
For the longitudinal deworming component of the study, we observed that the growth rate of 93 bacterial species was different between pre and post treatment samples among the responders. Among these bacterial species, slightly more than one-third of them (n = 33) were also identified from the cross-sectional analysis (Supplementary Fig. S22A). Spearman correlation analysis demonstrated that the growth rate of uncultured Oscillibacter sp. (p = 6.000e−04) and Phocaeicola vulgatus (p = 0.006) was significantly correlated with Trichuris burden (Supplementary Fig. S22B and Supplementary Table S9). After we verified the results by building a linear mixed model controlling for village, both uncultured Oscillibacter sp. (p = 0.033) and Phocaeicola_vulgatus (p = 0.028) were negatively associated with Trichuris burden, indicating more replication in responders after anti-helminthic treatment (Supplementary Fig. S22C). However, considerable portions of responder-associated taxa (32 out of 93) were also observed in the noninfected individuals (n = 67) (Supplementary Fig. S22D). Therefore, it could be difficult to disentangle the effects of Trichuris infection and direct effects of albendazole treatment on the dynamics of the microbiome. Hence, we conducted MaAsLin2 analyses to identify bacterial replication (based on GRiD score) that were associated with Trichuris infection while controlling for treatment group and village. Collinsella_sp._TF06.26 (adjusted p = 0.004) was positively associated with Trichuris infection, while Phocaiecola vulgatus (adjusted p = 0.04), Burkholderia sp. K4410.MGS.135 (p = 0.04), Bacteroides stercoris (adjusted p = 0.04), and Phocaeicola massiliensis (adjusted p = 0.04) were negatively associated with Trichuris infection (Supplementary Table S10).
Functional gene profiles of the Orang Asli microbiota and the effects of albendazole treatment
We used the HUMAnN tool to investigate pathway inference and gene families with Pfam domains (Figs. S23–S25). To adjust for covariates, we used MaAsLin2 to identify the pathways and gene families that were differentially abundant between the following: (1) Orang Asli and Urban cohorts, while controlling for age and sex; (2) Helminth positive and negative individuals, while including village as a covariate; and (3) Treatment response groups while controlling for village. Using these models, we find that the L-tryptophan biosynthesis superpathway was enriched in the Orang Asli microbiome compared to urban controls from KL (Supplementary Figs. S23 and S25A). Tryptophan, an essential amino acid, and its catabolites have been suggested to affect intestinal homeostasis through the aryl hydrocarbon receptor and may be important in inflammatory bowel diseases [28]. Hence, future work on microbial metabolism in the Orang Asli may focus on this pathway.
After controlling for village, there were no significant pathways differentiating helminth positive and negative individuals. Between different villages, we found the strongest significance for the peptidoglycan biosynthesis II (Staphylococci) pathway (Supplementary Figs. S23 and S25B). Villages with high helminth prevalence have individuals enriched in this pathway, but we did not find a significant relationship between helminth infection and Staphylococcus aureus abundance. This pathway may be important in the generation of peptidoglycan in other gram-positive bacteria, and the significance of the geographical difference in the abundance of this pathway is still unclear. We also found that the microbiome after albendazole treatment is enriched for the L-glutamate degradation V (via hydroxyglutarate) pathway (Supplementary Figs. S23 and S25C), which is an indication that albendazole may affect the fermentation of amino acids in an anoxic environment. Additionally, in our gene family enrichment analysis, our only substantial observation is that the phosphoenolpyruvate carboxylase gene family was decreased after albendazole treatment (Supplementary Figs. S2 and S25D). This also indicates how albendazole can affect metabolic processes of the microbiome; however, the implications of these results remain unclear.