Deep subsurface physicochemical conditions enrich for a conserved microbial community over time
Chemical and microbial dynamics were interrogated across three wells within the STACK shale play, OK, USA. Differences in drilling and HF techniques between the STACK-14 and STACK-16 & 17 wells (Table S1) afforded a unique opportunity to investigate how variability in the chemistry and microbiology of the input fluids (“frack fluids”) used in fracturing of the shale influenced the microbial community assembly over time.
Microbiological and chemical analyses revealed that the frack fluids (Fig. 1) for each well (STACK-14 vs. STACK-16 & 17) had statistically discernible starting microbial communities (Figure S1) and metabolite chemistries (Table S4), as measured by 16S rRNA gene sequencing and nuclear magnetic resonance (NMR) spectroscopy, respectively. For example, choline and isopropanol were discriminant chemical features in STACK-14 frack fluids, while acetate and glutarate were discriminant compounds in STACK-16 & 17 frack fluids (Table S4). However, despite initial differences in microbial community composition and chemical inputs, microbial communities in produced fluids collected 100 days after HF could no longer be statistically distinguished between the wells, suggesting that deep subsurface shale conditions enriched for similar microbial taxa.
Metagenome-derived insights into community composition and dynamics mirrored observations made with complementary 16S rRNA gene datasets. Briefly, dominant taxa across both datasets were affiliated with Thermotogae, Fusobacteriales, and Clostridia. Focusing on metagenomic analyses, the dominant microbial community members across the three wells were represented by 24 metagenome-assembled genomes (MAGs), each achieving >5% relative abundance at any time point (Table S5). We observed the dominance of a single, high-quality Thermotoga petrophila MAG (M2-7-6-bin.8; 92% complete, <2% contamination) in the majority of the 18 produced fluid timepoints across the three wells and note the overwhelming dominance of this Thermotoga MAG through the entire STACK-17 timeseries (Fig. 2). The remainder of the microbial community across the STACK wells was dominated by MAGs affiliated with Firmicutes, Desulfobacterota, and Bacteroidota, with only one archaeal MAG recovered (Halobacterota). Three MAGs were affiliated with two novel genera, Clostridia SK-Y3 (K-7-4-bin.6) and Peptococcia DRI-13 (M1-7-4-bin.22). We were unable to assign family-level placement for two MAGs (Fusobacteriales (K-7-4-bin.55) and Desulfitibacterales (K-7-2-bin.50)), highlighting their taxonomic novelty.
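The >5% relative abundance criterion for dominant MAGs can be illustrated with a short sketch. This is not the pipeline used in this study; the function name, the MAG labels, and the abundance values below are hypothetical and serve only to show the selection rule.

```python
def dominant_mags(abundance, threshold=5.0):
    """Return MAG names whose relative abundance (%) exceeds `threshold`
    in at least one sample (time point), sorted alphabetically."""
    return sorted(
        mag for mag, values in abundance.items()
        if any(v > threshold for v in values)
    )


# Hypothetical relative abundances (%) across three time points.
example = {
    "mag_a": [42.0, 61.5, 70.2],  # dominant throughout
    "mag_b": [1.2, 6.3, 0.4],     # exceeds 5% at a single time point
    "mag_c": [0.5, 2.1, 4.9],     # never exceeds 5%
}
print(dominant_mags(example))
```

Under this rule a MAG that is abundant at only one time point is still retained, which is why transient community members appear alongside persistent ones in a list of dominant genomes.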
Metabolic characterization of these MAGs revealed that samples were dominated by inferred fermenters and sub-populations of inferred respiratory sulfate- and thiosulfate-reducing microorganisms (Table S5). Functional profiling of the prevalent Thermotoga MAG (M2-7-6-bin.8) revealed a fermentative lifestyle with the capacity for both simple and complex carbon degradation, findings similar to laboratory-based physiological studies of this genus [53,54,55,56]. Other key taxa inferred to be fermenters were affiliated with the classes Clostridia, Mahellia, and Bacteroidia. All inferred fermenters lacked genomic evidence of a complete electron transport chain, and we cataloged their possible organic carbon sources for growth by inventorying the genes encoding carbohydrate active enzymes (CAZymes) (Figure S2). MAGs that represent taxa inferred to perform sulfur cycling were affiliated with the classes Desulfovibrionia, Deferribacteres, Syntrophobacteria, Peptococcia, and Moorellia and were characterized by the presence of reductive dsrAB and/or phsA genes, smaller complements of CAZymes, and more complete electron transport chains (Figure S2). Together, these dominant microorganisms have the potential to produce corrosive sulfide and organic acids, which are highly detrimental to the recovery of oil and gas in these systems.
Genome-resolved source-tracking reveals hydraulic fracturing inputs play a crucial role in the inoculation of dominant microorganisms in fractured shale ecosystems
Given that deep shale formations are most likely devoid of microbial life prior to HF, a key goal for management of these systems is determining the source of microbial taxa that subsequently colonize and persist within the fracture network. Previous studies by our research group and others have hypothesized that exogenous microorganisms introduced during the HF process are responsible for inoculating the fracture network [5, 6]. Here, we leveraged a novel and extensive catalog of input samples used in the development of the STACK-14, 16, & 17 wells to perform genome-resolved source tracking of 24 dominant MAGs in support of this hypothesis.
By mapping metagenomic reads from input samples to MAGs, we detected genomic signatures for five of the 24 dominant and persisting microorganisms in input samples (Figure S3), providing the first detailed source tracking of persisting, dominant microbes during well engineering. Not all 24 dominant MAGs had detectable signals in input materials; however, this is likely due to the physical complexity of the materials and sequencing depth of samples rather than evidence of indigenous microbial life. Microorganisms that persist in hydraulically fractured shales often have metabolic potential to produce corrosive organic acids or sulfides, which damage well infrastructure and interfere with oil and gas recovery. Indeed, two MAGs representing inferred fermentative taxa, including the dominant Thermotoga MAG (M2-7-6-bin.8), were identified in source water and frack fluids, while the SK-Y3 Clostridia MAG (K-7-4-bin.6) was detected in drill muds. Notably, three MAGs with putative roles in sulfur cycling (Shewanella, Peptococcia DRI-13, and Desulfitibacterales) were also detected in drill muds (Figure S3). The detection of four out of five key MAGs in the drill muds suggests that these organic-rich materials likely harbor key taxa that colonize the fracture network [57]. As such, these materials may require more targeted microbial control practices to minimize subsurface biomass growth. Additionally, the detection of the dominant Thermotoga genome in frack fluids offers strong evidence that this microorganism is derived from surface inputs. Given the prevalence of microorganisms in fractured shale ecosystems and the consequences of their metabolic by-products on subsurface infrastructure and resources, understanding these sources of biomass is crucial for targeted microbial management.
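One common way to decide whether a MAG is "detected" in a sample from read mapping is to require a minimum breadth of coverage, that is, the fraction of genome positions covered by at least one read. The sketch below illustrates that idea only; the detection criterion actually applied in this study is not stated here, and the function names and the 0.5 threshold are assumptions chosen for illustration.

```python
def breadth_of_coverage(per_base_depth):
    """Fraction of positions with read depth > 0 (0.0 for an empty genome)."""
    if not per_base_depth:
        return 0.0
    covered = sum(1 for depth in per_base_depth if depth > 0)
    return covered / len(per_base_depth)


def is_detected(per_base_depth, min_breadth=0.5):
    """Call a MAG detected when breadth of coverage reaches `min_breadth`.

    `min_breadth` is an assumed, illustrative threshold, not a value
    taken from the study.
    """
    return breadth_of_coverage(per_base_depth) >= min_breadth


# Hypothetical per-base read depths for a very short toy genome.
print(breadth_of_coverage([0, 3, 1, 0]))
print(is_detected([0, 0, 0, 2]))
```

A breadth-based rule is less sensitive than raw read counts to a few reads piling up on a conserved region, which matters for low-biomass input materials such as drill muds.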
Organic additives used in the hydraulic fracturing process are a nutrient resource for shale colonizing microbial members
Complex organic additives used during the HF process may be degraded by colonizing microorganisms, potentially yielding more labile substrates [58]. To investigate how such processes supported microbial metabolism within the persisting shale community, we coupled MAG metabolic profiles with recovered fluid metabolite chemistry. Bacteria and archaea that encode expansive CAZyme profiles are likely capable of degrading polymers such as guar gum and cellulose—some of the most common organic polymers present in frack fluids [59, 60]. In the STACK system, we infer that multiple taxonomically distinct fermenters—primarily Thermotoga petrophila (M2-7-6-bin.8), Clostridia SK-Y3 (K-7-4-bin.6), and Fusobacteriales (K-7-4-bin.55)—were responsible for initially degrading the complex carbon polymers added as amendments (Fig. 3 and Figure S2). The potential for guar gum degradation was inferred from the presence of alpha-galactosidases that remove galactose side chains and beta-mannosidases that subsequently cleave the mannose backbone. Likewise, the ability to degrade cellulose was determined from the presence of CAZymes capable of cellulose backbone and oligo cleavage (Fig. 3 and Figure S2). Beyond these specific organic polymers, we detected genes encoding extensive collections of CAZymes (Figure S2) within many putative fermenters, indicating the capability for the degradation of other minor organic polymers introduced in the HF process.
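The inference logic described above (a polymer is considered degradable when every required enzymatic activity is present in a MAG) can be sketched as a simple rule check. The activity labels below are functional descriptions taken from the text rather than specific CAZyme family identifiers, and the function and dictionary names are hypothetical.

```python
# Required activities per polymer, as described in the text. A real analysis
# would map these labels to specific CAZyme annotations, which are not
# specified here.
REQUIRED_ACTIVITIES = {
    "guar gum": {"alpha-galactosidase", "beta-mannosidase"},
    "cellulose": {"cellulose backbone cleavage", "cellulose oligo cleavage"},
}


def degradable_polymers(mag_activities):
    """Return polymers for which the MAG encodes every required activity."""
    found = set(mag_activities)
    return sorted(
        polymer
        for polymer, required in REQUIRED_ACTIVITIES.items()
        if required <= found  # subset test: all required activities present
    )


# Hypothetical annotation set for one MAG.
print(degradable_polymers(["alpha-galactosidase", "beta-mannosidase"]))
```

Requiring the complete set of activities is a conservative choice: a MAG with only the side-chain-removing enzyme would not be scored as a guar gum degrader on its own.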
The degradation of polymeric carbon by fermentative community members yielded a range of waste organic acids that likely fueled respiratory metabolisms through intracommunity metabolic exchange. Acetate production was predicted for the majority of MAGs representing likely fermenters, and concentrations were observed to increase up to 7 mM in STACK-16 & 17 samples (Fig. 3). Similarly, high propionate concentrations (up to 600 μM) measured in STACK-16 & 17 samples likely resulted from the activity of dominant Thermotoga and Clostridia microorganisms (Fig. 3). Reflecting its role as a dominant genome in the STACK samples, the Thermotoga MAG encoded genomic potential for degradation of cellulose, guar gum, and xyloglucan, and its relative abundance was predictive (via sparse Partial Least Squares regression analyses; sPLS) of acetate, propionate, and butyrate metabolite concentrations in the fluids, findings consistent with the metabolic role predicted from the genome. Other significant sPLS linkages between genomes and organic acids were identified for MAGs affiliated with the Fusobacteriales, Clostridiales, and Desulfomicrobiaceae (Fig. 3), further supporting our genomic inferences of carbon cycling in this ecosystem.
While fermentative metabolisms are dominant in this system, we also observed a lower-abundance sub-community of respiratory sulfur-reducing microorganisms. Freshwater used in the hydraulic fracturing process can promote the dissolution of sulfate minerals from the surrounding rock matrix [61], and thus produced fluids frequently contain sulfate and thiosulfate. The organic acids that are generated as waste products from fermentative microorganisms likely serve as electron donors to support this respiratory lifestyle (Fig. 4). Specifically, the presence of putative sulfate- and thiosulfate-reducing microorganisms likely drives consumption of organic acids such as acetate and lactate (Figs. 3 and 4). Ultimately, we identified the genome-resolved metabolic potential to catalyze the flow of carbon from added complex organic polymers used in the HF process to the consumption of organic acids by inferred sulfate- and thiosulfate-reducing microorganisms. This finding further emphasizes the importance of input materials in sustaining the persisting microbial community for extended periods of time.
Active viral predation influences microbial community heterogeneity
Viruses were prevalent in STACK samples, with 5587 viral contigs (>10 kb in length) identified across all produced fluid and input samples. The majority of viruses detected in this study were identified from topside input samples, with 748 found to persist in produced fluids recovered from the STACK shale play. The viral populations between wells encompass a majority of shared vMAGs, likely reflecting the previously noted microbial community convergence. However, we also detected subsets of vMAGs unique to each well (Figure S4) that could be reflective of unique genera, species, or strains that are not shared across wells. Prior to this work, 1838 vMAGs (>5 kb in length), of which only 852 were >10 kb, had been recovered from 33 samples across five HF wells in the Appalachian Basin [24]. Only 17 of the viruses recovered from STACK samples were shared with Appalachian Basin vMAGs, and thus, our results greatly expand the virome sampling of geographically distinct hydraulically fractured shale ecosystems.
The unique viral populations scaled in proportion with the richness of MAGs in each well. Here, STACK-14 hosted the largest number of unique vMAGs and also exhibited the highest microbial host genomic richness. Of the 539 vMAGs that clustered with International Committee on Taxonomy of Viruses (ICTV)-classified reference sequences, all were classified within the viral order Caudovirales. Within Caudovirales, most were assigned to the families Siphoviridae (39.5%) and Myoviridae (34.5%) (Figure S4). However, the majority of vMAGs identified in these STACK samples could not be assigned to ICTV taxonomic clusters, highlighting the novelty of viruses present in this engineered deep terrestrial ecosystem.
Consistent with the presence of these viruses, the majority (18 of 24) of the dominant MAGs, including every MAG that achieved 20% or greater relative abundance in a given sample, encoded a CRISPR-Cas viral defense system (Fig. 5a). Furthermore, among the MAGs present at the last sampling time point (~500 days), only one, a low relative abundance Desulfomicrobiaceae (WD-3-bin.38), lacked a CRISPR-Cas system. Through the perfect matching of viral protospacers (i.e., sequences in vMAGs) with spacers in bacterial CRISPR-Cas systems, we directly linked viruses to 12 microbial hosts, with the majority of these MAGs linked to multiple vMAGs (Fig. 5a). The identification of spacer-protospacer matches between viruses and half of the persisting bacterial hosts highlights the extent of virus-host interactions in this subsurface ecosystem and the role these processes likely play in shaping community assembly.
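Perfect spacer-protospacer matching can be sketched as an exact substring search of each CRISPR spacer, on either strand, against each viral contig. This is a minimal illustration and not the tool used in this study; the function names, the MAG and virus labels, and the toy sequences are hypothetical, and a real analysis would typically rely on a dedicated aligner and realistic spacer lengths.

```python
_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq):
    """Reverse complement of an uppercase DNA sequence (A, C, G, T only)."""
    return seq.translate(_COMPLEMENT)[::-1]


def link_hosts_to_viruses(spacers_by_mag, viral_contigs):
    """Return sorted (MAG, virus) pairs where a spacer matches the viral
    contig exactly, on the forward or reverse strand."""
    links = set()
    for mag, spacers in spacers_by_mag.items():
        for spacer in spacers:
            rc = reverse_complement(spacer)
            for virus, contig in viral_contigs.items():
                if spacer in contig or rc in contig:
                    links.add((mag, virus))
    return sorted(links)


# Toy, hypothetical sequences (real spacers are usually ~30 bp or longer).
spacers = {"mag_a": ["AAACCCGGGT"], "mag_b": ["TTTTTTTTTT"]}
viruses = {
    "virus_1": "TTAAACCCGGGTAA",  # forward-strand match to mag_a
    "virus_2": "GGACCCGGGTTTCC",  # reverse-complement match to mag_a
    "virus_3": "AAACCCGGGA",      # one mismatch, so no link
}
print(link_hosts_to_viruses(spacers, viruses))
```

Because only exact matches are accepted, a single mismatch (as in `virus_3`) yields no link, which keeps false-positive host assignments low at the cost of missing diverged protospacers.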
Our findings also provide new insights into viral ecology of this system. We report an instance where the same virus was linked to two distinct Firmicutes MAGs, Peptococcia DRI-13 and Clostridia SK-Y3 (M1-7-4-bin.22 and K-7-4-bin.6, respectively). Examination of the protospacers matched by spacers from the two MAGs revealed that they were not identical and corresponded to viral genes for phosphopentomutase and a helix-turn-helix domain protein. We consider this observation likely the result of incorporation of protospacer sequences from common viral genes (likely from two distinct viruses) into bacterial spacer arrays, rather than multiple infections from the same virus with a broad host range. Viruses generally exhibit high host specificity, and infection across multiple different genera is rarely reported using similar methods [62, 63].
Bacterial interactions with viruses, as inferred from CRISPR-Cas linkages, had variable impacts on the ability of a given MAG to persist within the STACK ecosystem. For example, a Lachnospirales MAG (M2-7-5-bin.8) that was linked to seven unique viruses exhibited dramatic decreases in relative abundance across all three wells, a common characteristic of microorganisms under viral predation in many other ecosystems [24, 64, 65] (Fig. 5a). In contrast, the dominant Thermotoga MAG was linked to two viruses yet generally did not exhibit relative abundance decreases between the timepoints sampled (Fig. 5a). We note, however, that MAGs are composites of many populations of closely related members, and thus the impact on specific strains may be obscured by this approach.
It is likely that many taxa are impacted by viral predation in this ecosystem. Supporting this, the relative abundances of the most abundant MAGs (e.g., Thermotoga) were positively correlated with the relative abundances of the viruses linked to them (Fig. 5b). Given the requirement of active bacterial cells for viral replication, these patterns imply that dominant microbial taxa must be continually infected and lysed to support these large pools of free viruses. Additionally, cell lysis can result in mobilization of key metabolites that can subsequently act as substrates for the remaining microbial community [24]. We previously observed such processes occurring at the strain level in samples recovered from Appalachian Basin shales, where infection and associated cell lysis of one Halanaerobium congolense cultivated strain yielded niche space for emergence of another distinct strain [24]. However, these dynamics can be obscured at higher taxonomic (e.g., species or MAG) levels, resulting in the appearance of stable community composition. Here, we anticipate that similar virus-host interactions are occurring in Thermotoga, resulting in an ongoing “arms race” between multiple Thermotoga petrophila strains and associated viruses that supports high relative abundances of both virus and host.
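A host-virus abundance relationship of this kind can be quantified with a correlation coefficient computed across samples. The sketch below uses Pearson's r purely as an illustration; the statistic used for Fig. 5b is not stated here, and the function name and abundance values are hypothetical.

```python
import math


def pearson_r(x, y):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("x and y must have the same length (at least 2)")
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        raise ValueError("correlation is undefined for constant input")
    return sxy / math.sqrt(sxx * syy)


# Hypothetical relative abundances (%) of a host MAG and its linked virus
# across four time points.
host = [10.0, 25.0, 40.0, 60.0]
virus = [1.0, 2.2, 4.1, 5.8]
print(round(pearson_r(host, virus), 3))
```

A positive coefficient indicates that virus and host abundances rise and fall together; it does not by itself distinguish active lysis from co-occurrence, which is why the interpretation above also draws on the CRISPR-based linkages.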
Lower salinity deep shales are characterized by higher taxonomic and metabolic diversity and the dominance of Thermotogae
To date, the majority of genome-resolved metagenomic studies detailing the microbiology of HF systems were performed in eastern US shale formations (i.e., Marcellus & Utica formations). These systems distinguish themselves from the STACK shale play through the presence of highly saline produced waters, generated from the dissolution of salt minerals in the shale rock [14, 28, 66]. For example, produced water in the Appalachian Basin can reach brine-level salinities (126.74 ± 35.61 mS/cm), whereas salinities in the STACK produced fluids were roughly 5-fold lower (25.06 ± 8.85 mS/cm). Although accurate temperature measurements for hydraulically fractured wells can be difficult to obtain, it is likely that the STACK wells also exhibit higher temperatures compared to their eastern counterparts [28, 29].
Due to thermodynamic and physiological constraints, salinity likely exerts a strong influence on the microbial community within the shale fracture network. Consistent with this concept, we measured 4-fold higher Shannon’s diversity in these less saline STACK wells, relative to microbial communities in produced fluids from Appalachian Basin wells (Table S6). However, as generally observed across the majority of time-resolved shale studies, microbial alpha diversity in STACK samples decreased with time, reflecting the influence of other abiotic and viral constraints on community assembly through the lifetime of the wells (Fig. 2).
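For reference, Shannon's diversity index is H = -Σ p_i ln(p_i), where p_i is the proportion of taxon i in a sample. A minimal sketch follows; the counts are hypothetical and the function name is chosen for illustration only.

```python
import math


def shannon_diversity(counts):
    """Shannon diversity H = -sum(p_i * ln(p_i)) over taxa with nonzero counts."""
    total = sum(counts)
    if total <= 0:
        raise ValueError("counts must sum to a positive value")
    return -sum(
        (c / total) * math.log(c / total)
        for c in counts
        if c > 0  # taxa with zero counts contribute nothing
    )


# Hypothetical taxon counts: an even community versus one dominated by a
# single taxon.
even = [25, 25, 25, 25]
skewed = [97, 1, 1, 1]
print(shannon_diversity(even), shannon_diversity(skewed))
```

The index rises with both the number of taxa and the evenness of their abundances, so a community dominated by a single taxon scores low even when many rare taxa are present.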
Salinity can also constrain the ability of specific metabolisms (and therefore taxonomies) to operate in a given environment [25,26,27]. For example, heterotrophic sulfate reduction may not be thermodynamically favorable in environments where the cost of osmoregulation is greater than the energy gained from a redox couple. This principle was previously used to explain the absence of canonical sulfate reducing microorganisms from high salinity wells in the Appalachian Basin [5]. In contrast to those results, here we observed a persistent, low relative abundance community of inferred respiratory sulfate- and thiosulfate-reducing microorganisms in the STACK wells that are likely able to tolerate the lower salinity conditions (Fig. 3). Further underscoring the contrasts between these two basins, we also note the lack of genomic potential for the cycling of quaternary amines and methylamines in the lower salinity STACK shale play (Supplementary Discussion). However, despite lower salinities relative to the Appalachian Basin, we still observe the prevalence of osmoprotection strategies in the dominant STACK MAGs, suggesting the importance of this physiological trait in persisting in this ecosystem (Figure S5, Supplementary Discussion).
Finally, we note the impact of salinity on the distribution of Thermotoga species across HF shales. As described earlier, a Thermotoga petrophila MAG (M2-7-6-bin.8) was dominant in the majority of STACK produced water samples (Fig. 2). This is in contrast to the majority of samples from the Appalachian Basin, where strains of the halophilic fermenter Halanaerobium dominate the microbial communities and likely occupy similar niches in the shale ecosystem [5, 13, 22, 67, 68].
Expanding these analyses, we assessed the geographic distribution of Thermotoga across a range of fractured shales displaying gradients in salinity and temperature [18, 28, 29]. Equipment used in the drilling and development of HF wells is re-used across large geographic areas, potentially aiding in the distribution of dominant microorganisms such as Thermotoga and Halanaerobium. However, as shown here, neither of these taxa dominates in all shale formations. To better understand this pattern, marker gene (i.e., 16S rRNA gene) relative abundance data from this study were paired with results from existing deep shale ecosystem studies from the Utica [5] and Marcellus formations [7, 16], Bakken [17, 18] and Three Forks formations, and the Denver-Julesburg (DJ) Basin [18]. Our analysis revealed that Thermotogae displays a clear biogeographic signal, decreasing dramatically in relative abundance as formation salinity increases to values characteristic of Appalachian Basin or Bakken formations (Fig. 6). These observations suggest that in situ salinity may act as a control on Thermotoga distribution across the deep terrestrial subsurface. Temperature also likely has an effect on microbial community composition in different shale formations. In contrast to the effects of salinity, elevated temperatures are known to select for thermophilic taxa such as Thermotoga [69, 70]. As such, the presence of higher in situ temperatures in the STACK formation (100–120°C) [29] compared to the Appalachian Basin (50–100°C) [28] likely promotes Thermotogae dominance in this system. We speculate that in more saline shale ecosystems, Thermotoga may be unable to compete with Halanaerobium, while in the presence of elevated temperatures and lower salinity (i.e., STACK shale play, DJ Basin), Thermotoga may outcompete Halanaerobium, which has lower temperature growth thresholds.
Despite the differences in microbial community composition between the STACK shale play and the Appalachian Basin, the dominance of either Halanaerobium or Thermotoga highlights the central conserved role that fermenters play in both of these ecosystems. These halophilic or thermophilic taxa may be thought of as “microbial weeds” that encode specific traits, allowing them to exploit the conditions and available resources in the aftermath of HF and outcompete other microbial community members [71]. We infer that the ability to degrade complex carbon polymers used in the HF process is a key trait for microorganisms to persist in fractured shale ecosystems. Although the Thermotoga MAG contained CAZymes for degradation of common topside additives (e.g., guar gum), it contained fewer CAZymes than many of the other inferred fermenters in the STACK system. These observations suggest that in the relatively stable chemical environment of the deep subsurface, a more constrained genomic repertoire may be optimal for persisting over extended periods of time [72], in contrast to other “opportunitroph” microorganisms with broader metabolic and physiological potential [73].