A gene co-association network regulating gut microbial communities in a Duroc pig population

Analyses of gut microbiome composition in livestock species have shown its potential to contribute to the regulation of complex phenotypes. However, little is known about the host genetic control over the gut microbial communities. In pigs, previous studies are based on classical “single-gene-single-trait” approaches and have evaluated the role of host genome controlling gut prokaryote and eukaryote communities separately. In order to determine the ability of the host genome to control the diversity and composition of microbial communities in healthy pigs, we undertook genome-wide association studies (GWAS) for 39 microbial phenotypes that included 2 diversity indexes, and the relative abundance of 31 bacterial and six commensal protist genera in 390 pigs genotyped for 70 K SNPs. The GWAS results were processed through a 3-step analytical pipeline comprised of (1) association weight matrix; (2) regulatory impact factor; and (3) partial correlation and information theory. The inferred gene regulatory network comprised 3561 genes (within a 5 kb distance from a relevant SNP–P < 0.05) and 738,913 connections (SNP-to-SNP co-associations). Our findings highlight the complexity and polygenic nature of the pig gut microbial ecosystem. Prominent within the network were 5 regulators, PRDM15, STAT1, ssc-mir-371, SOX9 and RUNX2 which gathered 942, 607, 588, 284 and 273 connections, respectively. PRDM15 modulates the transcription of upstream regulators of WNT and MAPK-ERK signaling to safeguard naive pluripotency and regulates the production of Th1- and Th2-type immune response. The signal transducer STAT1 has long been associated with immune processes and was recently identified as a potential regulator of vaccine response to porcine reproductive and respiratory syndrome. The list of regulators was enriched for immune-related pathways, and the list of predicted targets includes candidate genes previously reported as associated with microbiota profile in pigs, mice and human, such as SLIT3, SLC39A8, NOS1, IL1R2, DAB1, TOX3, SPP1, THSD7B, ELF2, PIANP, A2ML1, and IFNAR1. Moreover, we show the existence of host-genetic variants jointly associated with the relative abundance of butyrate producer bacteria and host performance. Taken together, our results identified regulators, candidate genes, and mechanisms linked with microbiome modulation by the host. They further highlight the value of the proposed analytical pipeline to exploit pleiotropy and the crosstalk between bacteria and protists as significant contributors to host-microbiome interactions and identify genetic markers and candidate genes that can be incorporated in breeding program to improve host-performance and microbial traits. CMu2mBtM4pXqucLimd7oyA Video Abstract Video Abstract


Background
The gut microbiota is a diverse ecosystem predominantly dominated by bacteria, but other microorganisms such as fungi and protists are also present. Influenced by the host immunity and environmental factors such as age, diet, and geography, the gut microbiota is known to be involved in many physiological functions as well as in disease pathogenesis (see for instance the recent reviews of Richard and Sokol, 2019 [1]; Torp Austvoll et al. 2020 [2]).
Lagging behind humans and model organisms, studies of the gut microbiome in livestock species have dramatically increased over the last decade thanks to a decreased metagenome next-generation sequencing cost. In the particular case of the pig, these studies have provided valuable information into the gut microbiota compositional changes, as well as associations between the porcine gut microbial communities and production traits [3][4][5][6][7][8]. In contrast, little is known about the host genetic control over the gut microbial communities in pigs. Previous studies based on classical "single-gene-single-trait" approaches and have independently evaluated the role of host-genetic genome controlling pig gut prokaryote [9][10][11] or eukaryote communities [5]. Therefore, ignoring the crosstalk between bacterial and protist communities, but also the contribution of gene-by-gene interactions shaping the pig gut microbial ecosystem. In their review, Aluthge et al. (2019) [12] identified the lack of sophisticated computational and bioinformatics approaches as the major bottleneck to look beyond compositional changes and instead understand the functional causative role of the pig gut microbiome.
To address this void, we propose an association weight matrix approach (AWM) [13,14] to generate a gene coassociation network where nodes are SNP mapped to or nearby gene coding regions, and found to be associated with the diversity and abundance of 31 bacterial and 6 commensal protist genera in the gut of 390 pigs genotyped for 70 K SNPs. In building and interpreting the network, we place emphasis on gene regulators responsible for the crosstalk between the host bacterial and protists communities.

Methods
Animal care and experimental procedures were carried out following national and institutional guidelines for the Good Experimental Practices and were approved by the IRTA Ethical Committee.

Animals, sample collection, and gut microbiome phenotypes
The pigs employed in this study are a subset of those reported in Ramayo-Caldas et al. (2020) [5]. In brief, we used a total of 405 weaned piglets (204 males and 201 females) distributed in seven batches. Samples were collected in a commercial farm at 60 ± 8 days of age. Fecal DNA was extracted with the DNeasy PowerSoil Kit (QIAGEN, Hilden, Germany), following manufacturer's instructions. The 16S rRNA gene fragment was amplified using the primers V3_F357_N: 5′-CCTACGGGNG GCWGCAG-3′ and V4_R805: 5′-GACTACHVGGGTA TCTAATCC-3′. Protist-specific primers F-566: 5′-CAG-CAGCCGCGGTAATTCC-3′ and R-1200: 5′-CCCGTG TTGAGTCAAATTAAGC-3′ were used to amplify the 18S rRNA gene fragment. Amplicons were paired-end (2 × 250 nt) sequenced on an Illumina NovaSeq (Illumina, San Diego, CA, USA) at the University of Illinois Keck Center. Sequences were analyzed with QIIME2 [15] and processed into amplicon sequences variants (ASVs) at 99% of identity. Samples with less than 10,000 reads were excluded and ASVs present in less than three samples and representing less than 0.005% of the total counts were discarded. ASVs were classified to the lowest possible taxonomic level based on SILVA v123 database for 18S rRNA genes, and GreenGenes Database for Bacteria [16]. Bacteria and protist alpha diversity were evaluated with the Shannon index (Shannon, 1948); before the estimation of diversity indexes, samples were rarefied at 10,000 reads of depth. Finally, ASVs were aggregated at genera level, and only those genus present in more than 60% of the samples were considered in posteriors data analysis. In total, we captured 39 microbiome phenotypes that included 2 diversity indexes, and the relative abundance of 31 bacterial and 6 commensal protist genera. (Supplementary Table 1).

Genotypes and genome-wide association studies (GWAS)
The Porcine 70 K GGP Porcine HD Array (Illumina, San Diego, CA) was used to genotype 390 out of 405 animals. We excluded single nucleotide polymorphisms (SNPs) with minor allele frequencies < 5%, rates of missing genotypes above 10%, as well as SNPs that did not map to the porcine reference genome (Sscrofa11.1 assembly). Then, to identify SNPs from the host genome associated with the alpha diversity as well as bacteria and protists relative abundances, a series of 39 GWAS were performed between 42,562 SNPs and the alpha diversity or the centered log ratio (clr) transformed genera abundance. For the GWAS, we used the GCTA software [17] using the following model at each SNP: where y ijk corresponds to the microbiome phenotype under scrutiny of the i-th individual animal of sex j in the k-th batch; sex j and b k correspond to the systematic effects of j-th sex (2 levels) and k-th batch (7 levels), respectively; u i is the random additive genetic effect of the i-th individual, collectively distributed as u~N(0, σ 2 u G) where σ 2 u is the additive genetic variance and G is the genomic relationship matrix calculated using the filtered autosomal SNPs based on the methodology of Yang et al. (2011) [17]; s li is the genotype (coded as 0,1,2) for the l-th SNP of the i-th individual, and a l is the allele substitution effect of the l-th SNP on the microbiome phenotype being analyzed. Following [18] and with equivalent original derivations from [19], FDR was calculated as Where P is the P value tested, A is the number of SNP that were significant at the P value tested, and T is the total number of SNP tested. For further analyses and in order to allow for direct comparison across phenotypes, estimated SNP effects were standardized by dividing them by the standard deviation of all SNP effects.
Association weight matrix, regulatory impact factors, and gene co-association networks We used the AWM methodology [13,14] in combination with the regulatory impact factors (RIF) algorithm [20] to identify the SNP, anchored to genes, to be included in the co-association gene network. Genes in the AWM are tagged by SNP that were found to be either associated with the key phenotype (alpha diversity), pleiotropic, or key regulators. In detail, the AWM was built in accordance with the following steps:

Initial search of SNP in genes (SNP-gene)
Using a population of 20 European pig breeds, Muñoz et al. (2019) [21] reported linkage disequilibrium of r2 > 0.2 for SNP pairs separated by 0.05 Mb. Therefore, we selected SNP located in the coding region or within 5 kb of an annotated gene based on the Sscrofa11.1 reference genome assembly.

SNP-genes associated with key phenotypes
Using the alpha diversity indexes for bacteria and protists as key phenotypes, we selected those SNP-Gene associated with alpha diversity and namely diversityassociated SNPs.

SNP-genes with pleiotropic potential
We captured the average number of phenotypes to which the diversity-associated SNPs were associated with, namely N A , and selected the remaining SNP-Genes associated (P < 0.05) to more than N A phenotypes and referred to as pleiotropic SNPs.

SNP-genes with regulatory potential
We used the RIF algorithm [20] to identify key regulatory SNP. Details are as follows.
To undertake the RIF analysis, we considered the diversity-associated SNPs and the pleiotropic SNP as potential targets of all the SNP-Genes annotated as either transcription factors (TF) or microRNA genes (miRNA). The RIF algorithm is designed to detect loci with high regulatory potential in a set of loci, TFs, and microRNA genes in our case, while contrasting two biological conditions or groups, such as bacteria and protists in our case. The analysis makes use of two metrics: RIF1 and RIF2. While RIF1 prioritizes regulators that are consistently the most differentially co-associated with the highly associated potential target genes, RIF2 highlights regulators with the most altered abilities to predict the association of potential target genes.
To identify significant gene-gene interactions, we used the partial correlation and information theory (PCIT) algorithm [22] which calculates pairwise correlations between loci while accounting for the influence of a third locus. Unlike likelihood-based approaches, which invoke a parametric distribution (e.g., normal) assumed to hold under the null hypothesis and then a nominal P value (e.g., 5%) used to ascertain significance, PCIT is an information theoretic approach. Its threshold is an informative metric; in this case, the partial correlation after exploring all trios in judging the significance of a given correlation, which might then become a connection when inferring a network. It thereby tests all possible 3-way combinations in a dataset and only keeps correlations between loci if they are significant and independent of the expression of another locus, whereas no hard threshold is set for the correlation strength. The significance threshold for each combination of loci depends on the average ratio of partial to direct correlations. Gene interactions were predicted using correlation analysis of the SNP effects across pairwise rows of the AWM. Hence, the AWM-predicted gene interactions are based on significant co-association between SNP. In the network, every node represents a gene (or SNP), whereas every edge connecting two nodes represents a significant gene-gene interaction (based on SNP-SNP co-association). Finally, the Cytoscape software [23] was used to visualize the gene network and the CentiScaPe plugin [24] was used to calculate specific node centrality values and network topology parameters.

Transcription factor binding motifs search
To further investigate the alleged role of the most relevant TFs, i.e., the PR/Set domain 15 (PRDM15) and the signal transducer and activator of transcription 1 (STAT1), a TF-binding motif search was implemented in the promoter region of the predicted target genes (i.e., genes with significant associations according to the AWM approach for PRDM15 and STAT1. To this end, we established putative promoter regions 1 kb upstream from the start of the coding region of each target gene and downloaded the corresponding sequences in the Sscrofa11.1 porcine assembly according to Ensembl repositories [25], by means of the BioMart tool (https:// www.ensembl.org/biomart/martview/). Transcription factor binding motifs for PRDM15 and STAT1 genes were retrieved from JASPAR database v.2020 [26] represented as position frequency matrices (PFMs). The FIMO algorithm [27] within MEME Suite [28] was subsequently employed for scanning individual matches between PFMs and retrieved promoter sequences. The Scrofa11.1 assembly was used for building a zero-order Markov model and integrated in FIMO calculations in order to correct for possible biases in nucleotide proportions. Motif occurrences with estimated P value below 10 − 4 were considered significant and retained for further analyses.

MicroRNA-binding sites search and structural inference
Following the rationale for detecting putative functional interactions among the set of key regulators identified by the AWM approach, we aimed at determining whether miRNA-mRNA significant co-associations found for relevant miRNA regulator genes were suggestive of an interaction between the miRNA and their co-associated mRNA transcripts. In this way, the ssc-miR-371 gene was among the top regulatory factors according to RIF metrics. The mRNA genes harboring SNPs significantly co-associated with the polymorphism within miR-371 (rs320008166, n.59 T > C) gene were retrieved and their 3′-UTR sequences according to the Sscrofa11.1 assembly annotation in Ensembl repositories [25] were downloaded by means of the BioMart tool (https://www.ensembl.org/biomart/martview/). In order to obtain a putative list of targeted mRNAs by ssc-miR-371, the seed region of the miRNA (2nd to 8th 5′ nucleotides of the mature miRNA) was reverse complemented and miRNA binding sites comprising perfect sequence matches between the seed and the retrieved 3′-UTRs (7mer-m8 sites) were assessed by using the locate tool from the SeqKit toolkit [29]. The potential structural consequences of rs320008166 (n.59 T > C) in the hairpin organization of ssc-miR-371 precursor transcript was assessed with the RNAfold software [30]. Table 1 lists the number of significant SNP and false discovery rate (FDR) at three nominal P value thresholds (P value < 0.05, 0.01, and 0.001) across the 39 phenotypes that were subjected to GWAS. The results highlight the trade-off that exists between significant SNP (the higher the better) and the FDR (the lower the better) as P values become more stringent. On average across the 39 phenotypes, the number of significant SNP (average FDR in brackets) at P values < 0.05, 0.01, and 0.001 was 4015.3 (FDR = 50.6%), 272.8 (FDR = 16.5%), and 87.6 (FDR = 5.8%), respectively. At the most stringent P value threshold of < 0.001, the highest number of significant SNP (N = 381; FDR = 1.1%) was obtained for the abundance of Faecalibacterium. On the other extreme, the lowest number of significant SNP (N = 25; FDR = 17.0%) was obtained for the abundance of Catenibacterium. Details of the genome map position and strength of the statistical association of the most significant SNP in each of the 39 phenotypes are given in Table 2. For each SNP-phenotype pair, the distance to and identity of the nearest gene is also listed in Table 2. With five instances, Chromosome 8 harbored the highest number of most associated SNP including one in the coding region of SORCS2 (SNP rs320095924) and one in the coding region of TRIM2 (rs329143797).

Gene-tailored association between microbial traits
The results of GWAS served as the basis for the AWM, the gene co-association network, and the regulatory impact factor approaches. In a first step, for all 39 phenotypes, estimated SNP effects were standardized by dividing them by the standard deviation of all SNP effects. After applying the analytical pipeline described in the "Methods" sections, the resulting AWM was comprised of 3561 SNP anchored to individual genes of which 121 were annotated as TF and 7 microRNA genes. In addition, there were 47 key regulators according to the RIF analyses including 3 microRNA genes. Notably, 10 of the 47 key regulators did not have a significant association (P < 0.05) with any of the phenotypes and their relevance would have been overlooked by GWAS alone. For the remaining 3551 genes, the number of associated phenotypes ranged from 1 to 13 and averaged 3.79. A total of 84.08% of the SNPs mapped within genes and the 15.92% were located upstream/downstream of annotated genes. Correlations between microbial traits were calculated using AWM columns (standardized SNP effects across bacteria and protists diversity and abundance) and were visualized as a hierarchical tree cluster, in which strong positive and negative correlations are displayed as proximity and distance, respectively (Fig. 1). Gene co-association network linked to microbial phenotypes Supplementary Fig. 1 shows an overview of the PCIT-inferred gene co-association network for 3561 SNP-Genes included in the AWM procedure, that connected by 738,913 edges of which 374,116 were positive and 364,797 were negative. In the network, node color indicates the phenotype with the strongest association (Supplementary Fig. 1). The network is characterized by a large central module where most of the bacteria vs. protists crosstalk takes place, surrounded by lots of smaller modules mostly phenotype specific. Reflecting the pleiotropic nature of the AWM, while the key phenotypes to capture SNP-Genes were the bacteria and protist alpha diversities, these only represent a minority of the genes: 241 for bacteria alpha diversity (dark blue nodes) and 231 for protist alpha diversity (bright red nodes). In contrast, other bacteria abundance was captured by 2568 genes (light blue nodes), while other protist abundance phenotypes were captured by 521 genes (orange nodes).

Key regulators in the network
The RIF analyses identified 47 key regulators among genes selected in the AWM procedure, and these are listed in Table 3, and Fig. 2 resumes the gene coassociation sub-network comprising the 47 regulators identified by RIF. Prominent among the 47 key regulators were PRDM15, STAT1, ssc-mir-371, SOX9, and RUNX2 which gathered 942, 607, 588, 284, and 273 connections, respectively (Table 3). PRDM15 was associated with 10 traits and was the regulator showing the highest pleotropic value (Table 1). PRDM15 is a zinc-finger sequence-specific chromatin factor that modulates the transcription of upstream regulators of WNT and MAPK-ERK signaling to safeguard naive pluripotency and regulates the production of Th1-and Th2-type immune response [31]. Similarly, the microRNA sscmir-371 has been reported to play an important role in pluripotent regulation in pigs [32]. On the other hand, mice with depleted gut microbiome have been found to develop liver damage and bone loss through the mediation of SOX9 [33] and RUNX2 [34], respectively. The signal transducer STAT1 was associated with five microbial traits (Table 3). STAT1 has long been associated with immune processes and was recently identified as a potential regulator of vaccine response to porcine reproductive and respiratory syndrome [35]. It should be noted that between 32 and 22.5% of the predicted target genes had at least one TF-binding site for STAT1 and PRDM15, respectively (Supplementary Table 2). Regarding the microRNA ssc-mir-371, a total of 155 binding sites were identified (Supplementary Table 3), comprising a total of 71 different mRNA genes (12% of the initial 588 co-associated genes). The search for background random miRNA-binding sites in the reverse complemented 3′-UTRs gave a total of 115 different binding sites (Supplementary Table 3), comprising 48 different mRNA genes, i.e., the expected number of miRNA 7mer-m8 binding sites for ssc-miR-371 seed was increased by 1.48 folds compared with background random miRNA-mRNA interactions. When we assessed the putative structural consequences of the presence of the rs320008166 (n.59 T > C) mutated allele, a reduction in the minimum free energy (MFE) of the folding of the precursor miRNA hairpin was highlighted. More specifically, the presence of the alternative C allele at position 59th of the precursor region of the miRNA implied the stabilization of a G:U wobble pairing in the wild-type miRNA sequence, introducing a stable canonical Watson-Crick G:C pairing. While the miRNA hairpin carrying the T allele had a MFE = − 35.44 kcal/mol, the presence of the T allele implied an estimated MFE = − 37.74 kcal/mol. Overall, in agreement with AWM original publication [13], here we provide a promoter sequence in silico validation of some of the predicted TF-target genes and also for the first time predicted miRNA target genes. A closer inspection of values in Table 3 reveals some fascinating relationships. Pleiotropy (measured by the number of significantly associated phenotypes) and connectivity (number of first neighbors in the co-association network) are significantly correlated (r = 0.571; P value < 0.0001), suggesting that both metrics are indicators of the pluripotential capacity of the regulators. This finding is of relevance because while pleiotropy was computed from the number of significant phenotypes in the GWAS, the number of connections is a feature of the connectivity in a co-association network. Two vastly different concepts which, when pointing to the same outcome, underscore the relevance of, in this case, regulators. Similarly, while RIF1 and RIF2 scores are moderately correlated (r = 0.421; P value < 0.01), only RIF2 is significantly correlated with pleiotropy (r = 0.471; P value < 0.001) and more significantly with connectivity (r = 0.806; P value < 0.0001). This relationship, to our knowledge, never before documented, indicates the ability of RIF2 scores to prioritize regulators that, in our case study, have an uncanny ability to address the 'bacteria vs. protists' crosstalk by differentiating between genes associated with bacterial traits from those associated with protists phenotypes, i.e., the contrast used in developing RIF. To further explore 'bacteria vs. protists' crosstalk, Fig. 3 shows the 3-way relationship between a gene's association with alpha diversity in bacteria, alpha diversity in protists, and its pleiotropy across the 3561 SNP-Genes included in the AWM. While alpha diversities indexes were the key phenotypes used to capture genes for the AWM, a further step aimed at identifying genes with pleiotropic potential (see "Methods"). The plateau of the surface revealed by Fig. 3 indicates that indeed SNP-Genes with near-zero non-significant association with alpha diversity phenotypes were still significantly associated across a large number of other microbial abundance phenotypes which reflect their potential pleiotropic effect.

Discussions
In this study, we propose a system biology approach to identify candidate genes, regulators, and biological pathways associated with 39 microbial traits. The cluster distribution based on the estimated additive value mirrored the distinct community composition types (enterotype clusters) as well as the known co-occurrence patterns of the pig gut microbiota [4]. For instance, pig gut enterotypes driver taxon Prevotella and Mitsuokella clustered together and distantly of a second cluster that includes Treponema and Ruminococcus (Fig. 1). We also noted that butyrate producer genera such as Faecalibacterium, Dorea, Blautia, Butyrococcus, and Coprococcus tend to   Fig. 2 Gene co-association sub-network: PCIT-inferred gene co-association sub-network comprising the 47 key regulators identified by RIF (diamonds) and their first neighbors (ellipses) presenting strong significant correlations (> 0.7). Node color is mapped to the phenotype of the strongest association: dark blue for bacteria alpha diversity (BACT_Alpha), light blue for other bacteria abundance phenotypes (BACT), dark green for protist alpha diversity (PROT_Alpha), and light green for other protist abundance phenotypes (PROT). Pink and cyan edges indicate positive and negative correlations, respectively, and node size indicates the amount of pleiotropy cluster closely, which suggest a common directionality of the additive values, and perhaps a common genetic control for this groups of taxa. Notwithstanding the fact that targeting of 16S rRNA variable regions with shortread sequencing platforms cannot achieve the taxonomic resolution at the species level, our results add confidence and suggest the usefulness of the proposed analytical framework to recover key ecological properties of gut microbial ecosystem. The gene co-association network ( Supplementary Fig.  1) revealed the identity of predicted target genes and the higher complexity and polygenic nature of the diversity and composition of the pig gut microbial ecosystem. It is worth noting that literature mining confirms the association between the pig host-genome and the relative abundance of six bacterial genus reported by Crespo et al. 2019 (Supplementary Table 4). Furthermore, our findings also confirm association of 27 of the 68 genes recently reported by [36] as linked with the alpha diversity and the relative abundance of member of the swine gut microbiota (Supplementary Table 5). Remarkably, PRDM15 was among the genes commonly identify by [36]. As previously mentioned, PRDM15 was associated with the highest number of traits (Table 3) Table 4), as well as QTLs reported by [36] associated with members of Clostridium, Succinivibrio, Bacteroides, Prevotella, Blautia, Turicibacter, Treponema, Mobiluncus, and Oscillibacter genera. Therefore, our results suggest that in contrast to host genome-microbiota association performed in humans and mouse [37][38][39], several QTLs reported in swine can be replicated which open the possibility to identify genetic markers and candidate genes that can be incorporated in genetic breeding program to improve microbial traits.
We also noted that the list of target-genes contains a total of 200 candidate genes previously reported as linked to microbial traits in mouse or human studies (Supplementary Table 6). Among them, it is worth highlighting the following: (1) SLIT3, reported in the UK Twins Dataset [40] as associated with unclassified Clostridiaceae [41], linked to MetaCyc pathways involved in plant-derived steroid degradation, and whose expression is upregulated in colon crypts during the conventionalization of germ-free mice [42]; (2) SLC39A8, a gene with a pleiotropic missense variant related with Crohn's disease and the composition of human gut microbiome [43]; and (3) NOS1, for which a pleiotropic association with body fatness and gut microbiota composition in mice had been shown [44]. We also identified other genes that had been shown to be related to β-diversity (CSMD1, ZFAT, FRMPD1, CLEC16A, IL1R2, BANK1, PRKAG2, LHFPL3, ST5, and NXN) and gut bacterial abundance (TOX3, NAPG, DLEC1, COL19A, DDM, and Fig. 3 Microbiome diversity and pleiotropy. Surface plot of the 3-way relationship between the SNP association with alpha diversity in protists (width), bacteria (depth) and its pleiotropy (height), measured as the number of significantly associated (adjusted P value < 0.05) protist and bacteria genera IFNAR1) in mice [45][46][47]. Genes reported by Allison et al. (2019) [48] as differentially expressed in primary human colonic epithelial cells (NEBL, ASAP3, ABLIM1,  CUEDC2, PRRC2C, DENND1A, LAMC1, MAL2, ITGB1, CAST, A2ML1, IL7R, PCDH7, NFATC2IP, SORCS2, and DNM3) were also found in our study. Furthermore, genes linked to the functional profiles of the human gut ecosystem (SORCS2, LRRC32, and ARAP1) were among the predicted target genes in our network. As previously mentioned, SORCS2 was reported as linked to a plantderived steroid degradation pathway. Meanwhile, genetic variants located in ARAP1 were linked to the bile acid metabolism, and LRRC32 was associated to the profile of 'cell-cell signaling' GO term [41]. It should be noted that many of these genes, including the key regulators, would have been missed by traditional single-trait GWAS which highly the usefulness of the proposed analytical pipeline to identify novels regulators and candidate genes linked with diversity and the composition of pig gut microbial ecosystem.

Host-genome markers associated with butyrate producing bacteria and piglets body weight
The identification of host-genetic markers linked with the relative abundance of butyrate producer bacteria such as Faecalibacterium, Dorea, Blautia, Butyrococcus, and Coprococcus (Table 2) prompted us to investigate whether these associations can be expanded in terms of overall pig's wellbeing, and using the piglets' body weight as the proxy for health and productivity. Butyrate is an important energy source for intestinal epithelial with anti-inflammatory potential that influences cell differentiation and strengthens the epithelial defense barrier [49,50]. In fact, the beneficial effect of butyrate on swine growth and intestinal integrity have been documented [51,52]. Therefore, we focused on the identification of SNPs with pleiotropic effect associated with the relative abundance of butyrate producer bacteria and host performance. After correcting for the systematic effects of sex (2 levels) and batch (7 levels), we found one significant SNP located at 96,004,482 bp of SSC2 (rs81361511) linked with the relative abundance of members of Butyricicoccus (P = 2.43E− 15) and piglets body weight (P = 0.026) ( Table 4), as well as two SNPs associated with the relative abundance of Coprococcus (rs81344777, P = 1.26E− 07) and Faecalibacterium (rs81288412, P = 1.07E − 17) that were suggestively associated with piglets body weight. It is noteworthy to highlight that in these three cases, the allelic effects were in the same direction: the same allele affects the relative abundance of Butyricicoccus, Coprococcus, Faecalibacterium and piglets body weight. In agreement with [36], our findings suggest the existence of host-genetic variants jointly associated with the relative abundance of beneficial bacterial and host performance. However, larger studies including experimental validations and alternative source of information at host (additional phenotypic traits) and microbial (whole-metagenome, meta-transcriptomics) level are needed to fully characterize the role of the host genomeassociated microbial communities in swine production performance, welfare, and health.

Conclusions
In the present study, we built and explored a SNP-gene co-association network comprising 3561 genes related to the diversity and abundance of 31 bacterial and 6 commensal protist genera in pigs gut microbiota. Besides identifying genes associated with alpha diversity in both bacteria and protist, our analytical approach takes advantage of the genetic contribution to related microbial traits and revealed genetic variants with pleiotropic effects on pig gut microbiota profile. We also identify SNPs with pleiotropic effect associated with the relative abundance of butyrate producer bacteria (Faecalibacterium, Butyrococcus, and Coprococcus) and host performance. Placing emphasis on regulatory elements, a total of 47 regulators that enriched for immune-related pathways were identified. Among them, five regulators resulted prominent within the network: PRDM15, STAT1, sscmir-371, SOX9, and RUNX2. The list of predicted targets included 200 candidate genes previously reported as associated with microbiota profile in mice and human, such as SLIT3, SLC39A8, NOS1, IL1R2, DAB1, TOX3, SPP1, THSD7B, ELF2, PIANP, A2ML1, and IFNAR1. Taken together, our results highlight the value of the proposed analytical pipeline to exploit pleiotropy and the crosstalk between bacteria and protists as significant contributors to host-microbiome interactions.
Additional file 1: Supplementary Table 1. Microbial traits employed in the study.
Additional file 2: Supplementary Fig. 1. Overview of the PCITinferred gene co-association network for 3561 SNP-Genes included in the AWM procedure.