- Open Access
Immunogenetic variation shapes the gut microbiome in a natural vertebrate population
Microbiome volume 10, Article number: 41 (2022)
The gut microbiome (GM) can influence many biological processes in the host, impacting its health and survival, but the GM can also be influenced by the host’s traits. In vertebrates, Major Histocompatibility Complex (MHC) genes play a pivotal role in combatting pathogens and are thought to shape the host’s GM. Despite this—and the documented importance of both GM and MHC variation to individual fitness—few studies have investigated the association between the GM and MHC in the wild.
We characterised MHC class I (MHC-I), MHC class II (MHC-II) and GM variation in individuals within a natural population of the Seychelles warbler (Acrocephalus sechellensis). We determined how the diversity and composition of the GM varied with MHC characteristics, in addition to environmental factors and other host traits. Our results show that the presence of specific MHC alleles, but not MHC diversity, influences both the diversity and composition of the GM in this population. MHC-I alleles, rather than MHC-II alleles, had the greatest impact on the GM. GM diversity was negatively associated with the presence of three MHC-I alleles (Ase-ua3, Ase-ua4, Ase-ua5), and one MHC-II allele (Ase-dab4), while changes in GM composition were associated with the presence of four different MHC-I alleles (Ase-ua1, Ase-ua7, Ase-ua10, Ase-ua11). There were no associations between GM diversity and TLR3 genotype, but GM diversity was positively correlated with genome-wide heterozygosity and varied with host age and field period.
These results suggest that components of the host’s immune system play a role in shaping the GM of wild animals. Host genotype—specifically MHC-I and to a lesser degree MHC-II variation—can modulate the GM, although whether this occurs directly, or indirectly through effects on host health, is unclear. Importantly, if immune genes can regulate host health through modulation of the microbiome, then it is plausible that the microbiome could also influence selection on immune genes. As such, host–microbiome coevolution may play a role in maintaining functional immunogenetic variation within natural vertebrate populations.
Most animals harbour a complex microbial community including bacteria, archaea, viruses, and microbial eukaryotes, collectively known as the host’s microbiome. Members of this diverse community have coevolved with their host’s over evolutionary time and can play a fundamental role in their host’s biology . Consequently, it is important to understand the ecological and evolutionary processes that shape host-associated microbial communities . This is particularly true of the vertebrate gut, where the gut microbiome (GM) contributes to many key biological processes in the host, from enabling nutrient uptake  to pathogen defence . As such, studies on domestic or laboratory animals have demonstrated that disrupting the GM can have negative consequences for host health and survival . However, the factors that govern the diversity and composition of the GM remain unclear, particularly in natural populations where organisms are exposed to greater levels of environmental and microbial complexity compared to domestic or laboratory animals .
Studies on wild vertebrates are now attempting to unravel the importance of extrinsic factors, such as the environment, season and diet in shaping GM variation among species, populations, and individuals e.g. [7–9]. Within a population, individual variation in the GM has also been linked to a multitude of host factors including sex , age , body condition , cognition , sociality , and hormones [15, 16]. Host genetics may also play an important role in determining individual GM variation within a population . For example, increased genetic relatedness predicts decreased GM dissimilarity [10, 18] and compositional changes [19, 20].
Immunogenetic variation may play an important role in driving individual differences in the GM (reviewed in ). Microbial complexity presents a unique challenge to the host immune system, which has evolved to prevent pathogenic microbes from proliferating, while still enabling beneficial microbes (i.e., those that form mutualistic/commensal interactions) to remain [22, 23]. In humans, inappropriate immune responses can lead to detrimental compositional changes to the GM via a loss of diversity and stability, as well as an increase in the proliferation of pathogenic bacteria . Such changes in GM composition can lead to serious health outcomes [5, 25], so accurate regulation of microbes by the host is essential. Variation in GM composition has been associated with variation in immune receptor genes (which detect antigens, inducing an immune response) , making them key candidate genes that might influence the relationship between host genotype and the GM. For example, pattern recognition receptor genes involved in the innate immune response, including Toll-like receptors (TLR), nucleotide-binding oligomerization-like receptors, RIG-I-like receptors and C-type lectin receptors, play an important role in recognising and regulating the GM in humans and captive animals (reviewed in [26, 27]).
In vertebrates, Major Histocompatibility Complex (MHC) genes, which play a key role in antigen detection in the adaptive immune response , may also shape GM variation. A specialised group of host cell-surface glycoproteins encoded by MHC genes, recognise and bind protein antigens (including microbial derived peptides), presenting them to T lymphocytes and B cell receptors. If the combined MHC-peptide complex is recognised as non-self, the T and B cells are activated, triggering an appropriate immune cascade . After such a reaction, the adaptive immune system produces memory cells, enhancing any future immune response (an acquired response) if the host re-encounters that specific antigen. MHC class I (MHC-I) molecules, which are expressed on the surface of virtually all somatic cells, primarily present intracellular peptides to cytotoxic CD8+ T cells, while MHC class II (MHC-II) molecules are present on antigen-presenting cells and mainly present extracellular antigens to helper CD4+ T cells, including those from bacterial species in the intestinal tract [30, 31]. Functional variants of MHC genes can confer differential antigen recognition and affect fitness [32, 33], with pathogen-mediated selection thought to drive the extraordinary within- and among-population variation observed at the MHC [34, 35]. Likewise, individuals with different MHC genotypes are likely to respond differently to different microbial species, including commensals in the gut , which could contribute to the inter-individual GM variation seen in natural vertebrate systems. Comparing functional immunogenetic variation with the diversity and composition of the GM could provide further understanding of the selective pressures acting on the maintenance of host genetic variation, with the GM being one possible factor driving selection at MHC genes.
Various studies on captive animals have found links between individual MHC variation and GM composition [37–39]. However, captivity can profoundly alter an organism’s microbiome  and the complexities associated with natural populations cannot be captured in such studies . Few studies have investigated associations between the microbiome and individual MHC variation in wild animals. Those that have, have primarily focused on MHC-II variation, as this is central to humoral immunity against extracellular microbes . For example, in the reddish-grey mouse lemur (Microcebus griseorufus), the presence of specific MHC supertypes was associated with changes in GM composition, but not diversity . In threespine sticklebacks (Gasterosteus aculeatus), GM diversity and MHC-II diversity were negatively correlated, and changes in GM composition and diversity were associated with specific MHC-II alleles . In contrast, in two species of giant salamander, the Ozark hellbender (Cryptobranchus alleganiensis bishopi) and eastern hellbender (C. a. alleganiensis), although the composition of the cutaneous microbiome was associated with specific MHC-II alleles, cutaneous microbiome diversity and composition were positively correlated with individual MHC-II divergence . Lastly, studies on Leach’s storm petrel (Oceanodroma leucorhoa), and the blue petrel (Halobaena caerulea), found that variation in body surface (feather and glandular) bacterial communities, responsible for the production of volatile cues involved in mate choice, was associated with MHC-II variation [45, 46]. However, very few studies have tested for an association between MHC-I genes (or any other host immune genes) and the microbiome in a natural population (but see [19, 42]). This is despite the fact that MHC-I variation could impact the GM indirectly, for example via differences in individual susceptibility to intracellular infections (such as viruses) that could then impact the health and/or GM of the host .
The isolated population of Seychelles warblers (Acrocephalus sechellensis) on Cousin Island has been intensively monitored since 1985 [48, 49]—and represents an excellent study system in which to investigate associations between MHC variation and the GM in the wild. The age, sex, status and territory of nearly every individual is known, and DNA samples have been collected since 1990 . This population harbours reduced genetic variation as a result of a past genetic bottleneck . However, variation still exists across the genome  and, importantly, at both MHC-I  and MHC-II  loci, as well as at some TLR genes . Differences in individual fitness have been linked to this genetic variation, for example, individual condition  and reproductive success  are negatively correlated with genome-wide homozygosity. Likewise, differential survival and reproductive success are associated with variation in the viral-sensing TLR3 gene . Lastly, survival is positively associated with a specific MHC-I allele (Ase-ua4) and MHC-I diversity , and the occurrence of extra-pair paternity is negatively related to MHC-I diversity of the social male . But whether there is functional variation at MHC-II in the Seychelles warbler and if this is related to variation in fitness remains unresolved.
Here, we aim to test whether immunogenetic variation, alongside other host and environmental factors, is associated with individual GM variation in the Seychelles warbler. Specifically, we test if bacterial diversity and community composition are associated with MHC-I and MHC-II gene diversity, or the presence of specific alleles at MHC-I, MHC-II, or TLR3 loci. It is difficult to make clear predictions about such associations. Individual GM diversity might be negatively associated with immunogenetic variation  if this genetic diversity enables hosts to recognise and instigate an immune response against more bacterial species. Alternatively, GM diversity might be positively associated with greater, or optimal, immunogenetic diversity. This could occur via two different pathways: first, directly, with greater immunogenetic diversity helping to eliminate highly competitive (potentially pathogenic) bacterial taxa, while still tolerating a network of commensal/mutualistic bacterial species—for example, there is evidence that following peptide-binding, the MHC can also enable tolerance towards certain microbial species including gut commensals; this can be induced through mechanisms involving regulatory T cells or immunoglobulin A [60–62]- and second, indirectly, with greater, or optimal, immunogenetic diversity conferring increased host health and fitness, which in turn may be associated with greater GM diversity . We also predict that GM composition and diversity will differ in relation to the presence or absence of specific immune alleles via differences in immunity and tolerance. This is expected to be most marked for MHC-II alleles, as these are expressed extracellularly on antigen-presenting cells that can extend into the gut lumen and are therefore important in the recognition of gut microbes .
Study species and sample collection
The Seychelles warbler is a small insectivorous passerine, endemic to the Seychelles archipelago. They are facultative cooperative breeders, defending strict year-round territories . The population on Cousin Island (4°20’S,55°40’E; 0.29 km2)—which has been extensively monitored since 1985 [48, 49] has a carrying capacity of ca. 320 adult individuals, existing in ca. 115 territories [48, 63, 64]. There is virtually no migration to or from Cousin . Individuals can live for a maximum of 19 years, with a median post-fledging lifespan of 5.5 years . A comprehensive population census is conducted bi-yearly during the major breeding season (June–September) in the south-east monsoon, and the minor breeding season (January–March) in the north-west monsoon . Territory quality (as defined by insect prey availability) varies quantifiably within and between years ; thus, it is possible to separate the influence of environmental factors from host-intrinsic factors when investigating individual GM variation.
Individuals are either caught as chicks in the nest or by mist net. Each bird is given a metal British Trust for Ornithology (BTO) ring and a unique combination of three colour rings, allowing it to be individually identifiable. Birds are aged based on hatch date, behaviour, or eye colour; grey eyes indicate an age < 5 months, light brown eyes are characteristic of sub-adults (5–12 months), and adults (> 12 months) have red-brown eyes [48, 69]. Blood samples (25 μl) are collected by brachial venipuncture and stored in 0.8 ml of absolute ethanol at either room temperature or 4°C.
In total, 343 faecal samples were collected from 293 captures, across two years and three consecutive field seasons: the major 2017, minor 2018, and major 2018 breeding seasons. Captured birds were immediately placed into a clean bag. In the first major breeding season of our study (2017), this was a laundered cotton bag; however, for all following seasons, birds were placed into a single use plastic-lined, flat-bottomed paper bag containing a plastic tray covered by a metal grate, according to an established protocol . The metal grate and tray were sterilised with a 10% bleach solution between use. Individuals were removed from the bag once they had defaecated or after a maximum of 30 min. A sterile flocked swab was used to transfer faecal material into a sterile microcentrifuge tube containing 1 ml of absolute ethanol. If the bird defaecated outside of the bag or tray, then a sample was still collected. Control samples from possible sources of contamination such as the bag, grate, and tray were taken throughout each sampling season, using sterile flocked swabs (n = 1 per type). In addition to this, four swab samples were taken from fieldworkers’ hands throughout each sampling season; the gDNA from different hand samples was pooled into one sample prior to sequencing. Faecal samples were stored in the field at 4°C for a maximum of 3 months (mean days ± SEM: 38.1 ± 1.3), before being transported to the lab, where they were stored at −80°C prior to extraction. There was no association between faecal GM diversity (measured as Shannon, Chao1 and Faith's phylogenetic diversity) or composition (measured as weighted UniFrac), and extraction time (time in days from collection to extraction) or sequencing run (n = 195, P > 0.05).
Genomic DNA (gDNA) was extracted from blood using the DNeasy blood and tissue kit (Qiagen). Individuals were genotyped at 30 polymorphic microsatellite loci [52, 64], and standardised individual microsatellite heterozygosity (Hs) was calculated using the R 3.6.1 function Genhet 3.1 . Sex was determined via PCR [64, 72]. Variation at one non-synonymous SNP within the leucine-rich repeat domain of TLR3 exon 4 was determined following .
MHC sequencing and bioinformatics
In total, 314 samples were MHC sequenced, including 229 samples from individuals that had GM data and 31 samples from individuals previously MHC screened at either MHC-I  or MHC-II  using older techniques. The remainder included 30 repeated samples, 23 negative controls (making up at least 5% of each plate) and four samples (one per plate) from one great reed warbler (Acrocephalus arundinaceus) individual to serve as a positive control.
MHC-I exon 3 and MHC-II exon 2 were amplified and sequenced using previously validated primer sets [48, 49] (Additional file 1: Table S1), with the addition of Illumina index sites. Additionally, six random hexamers (N) were added to the first round PCR (PCR1) primers to increase diversity and improve cluster separation . The two primer pairs amplifying MHC-I each included a motif-specific primer situated within exon 3, and one general primer situated in intron 3, and so amplified 262 bp of the full exon (274 bp). These primers had been designed to preferentially amplify functional variants, while avoiding known pseudogenes [53, 75]. The primers for MHC-II, situated within the flanking introns 1 and 2 of exon 2, amplify a 291-bp fragment. These sequences were then edited to the 270 bp MHC-II exon 2 . The term ‘allele’ is used to describe the different variants amplified for each class of MHC, consistent with other publications investigating MHC diversity; however, alleles cannot be assigned to specific (duplicated) loci within each MHC class. Previous work suggests that, in the Seychelles warbler, there are a minimum of four duplicated MHC-I loci and six MHC-II loci [53, 54]. Sequencing of the MHC-I and MHC-II exons was carried out using 2x 250-bp paired-end sequencing on an Illumina MiSeq platform (Illumina, San Diego) (see Additional file 1: Supplementary methods for details).
Processing and MHC genotyping of raw Illumina MiSeq data were conducted using the Amplicon Sequencing Assignment (AmpliSAS) tool . First, FastQC was used to check read quality, before merging pair-ended reads together using AmpliMERGE (10,257,832 merged sequences). AmpliCLEAN was then used to remove low-quality reads (Phred score of <30) and any that were missing either primers or barcodes (e.g., from residual PhiX). MHC-I and MHC-II sequences were separated resulting in 3,044,897 raw MHC-I sequences and 6,144,575 raw MHC-II sequences. All downstream bioinformatics and analysis were conducted separately for MHC-I and MHC-II. Cutadapt 1.6  was used to remove MHC-specific primers, the six random hexamers, and short reads (<100 bp). For MHC-II sequences, remaining intron regions were also removed, leaving a 270-bp fragment spanning the full exon. AmpliCHECK was used for preliminary data exploration, using Illumina-based default settings. Finally, AmpliSAS was used for demultiplexing, clustering, and filtering reads. First, a subset of 30 duplicated samples were used to optimise parameters for MHC-I and MHC-II, testing both minimum dominant frequency settings for the clustering step, and minimum amplicon depth for the filtering step, as recommended in . Based on these results (Additional file 1: Table S2, S3) Illumina default clustering settings were used (1% substitution errors, 0.001% indel errors, 25% minimum dominant frequency) for both MHC classes. For the filtering step, chimaeras and sequences that only appeared in one sample were removed, and the minimum amplicon frequency was set as 1.6% for MHC-I and 1.8% for MHC-II. This resulted in 1,267,410 raw MHC-I sequences and 1,385,049 raw MHC-II sequences. Due to computational limitations, the MHC-II dataset was split into two halves and analysed using AmpliSAS separately, before being combined using AmpliCOMBINE in the web version of AmpliSAS.
For MHC-I, the majority of putative alleles were 262 bp, although three sequences that were <262 bp were present in >80% individuals. These were not homologous to any known MHC gene when checked using blastn and were removed from downstream analysis. The majority of MHC-II putative alleles were the full 270 bp length, although there were also sequences between 267 and 269 bp, which were similar to MHC genes (see results). All MHC-II sequences <267 bp in length were not similar to any known MHC genes and, as putative sequencing artefacts, were removed from downstream analysis. MHC-I and MHC-II putative alleles were first checked against all known Seychelles warbler alleles. Any unknown putative alleles were then checked against the GenBank (NCBI) nucleotide database (accessed on June 25, 2020) to assess similarity to known MHC alleles from other related species. Additionally, samples of insufficient read depth based on rarefaction curves, which equated to a minimum read depth of 150 per amplicon for MHC-I, and 100 per amplicon for MHC-II, were removed. For 30 individuals sequenced twice to confirm repeatability, the sample with the greatest read depth was retained. After processing, the total number of reads assigned to an allele was 1,071,525 for MHC-I (mean ± SEM = 4391.5 ± 149.3 per sample) and 1,123,211 for MHC-II (mean ± SEM = 4603.3 ± 888.2 per sample) in the Seychelles warbler.
Microbial extraction, sequencing, and bioinformatics
In total, 400 faecal samples were sequenced across three sequencing runs (two plates per sequencing run). This included 343 unique faecal samples (from 235 individuals) and 14 control samples, the latter of which included six extraction negative controls, four positive controls (using a microbial community standard), and four sampling controls. Additionally, 43 faecal samples were sequenced twice (20 within the same run and 23 across different runs) to determine sequencing accuracy and repeatability within, and between runs.
Faecal samples were centrifuged for 10 min at 10,000 rpm, and the supernatant was removed. To remove any residual ethanol, the resulting pellet was washed with 100 μl RNase/DNA-free molecular grade water by centrifuging at 10,000 rpm for 10 min. The supernatant was then removed, and the washing step repeated a further two times. Microbial DNA was extracted from 0.05–0.1 g of each sample using the DNeasy PowerSoil Kit (Qiagen), according to an optimised version of the manufacturer’s instructions. Modifications consisted of a heat block step (65°C for 10 min) prior to bead-beating and elution of DNA in a final volume of 60 μl elution buffer. To further assess sequencing accuracy and repeatability between runs, a ZymoBIOMICS microbial community standard (D6300) was extracted as a positive control using a ZymoBIOMICS DNA miniprep kit (Zymo Research), according to the manufacturer’s instructions. The observed microbiome profiles of the positive control represented the expected microbial community (8 ASVs). The accuracy and repeatability of the DNeasy PowerSoil Kit extraction method were assessed and validated in a separate study which also utilised the ZymoBIOMICS microbial community standard (D6300).
Extracted gDNA was quantified using a Qubit 4.0 Fluorometer (Invitrogen) with a Qubit dsDNA HS assay kit (Invitrogen). Aliquots of gDNA were shipped on dry ice to the Centre for Genomic Research, University of Liverpool, for library preparation, pooling and sequencing. Bacterial barcoding was performed with a 2-step amplification process using the primers 515F (5'TGCCAGCMGCCGCGGTAA3’) and 806R (5’GGACTACHVGGGTWTCTAAT3’) , which amplify the V4 region of the 16S rRNA gene (see Additional file 1: Supplementary methods for details). Paired-end sequencing was carried out using 2x 250-bp paired-end sequencing on an Illumina MiSeq platform (Illumina, San Diego).
For each sequencing run, raw reads were first trimmed using Cutadapt 1.2.1  to remove Illumina adapter sequences. Reads were further trimmed using Sickle 1.200 with a minimum window quality score of 20, resulting in 12,308,047, 9,397,303 and 9,831,508 demultiplexed reads for the three runs, respectively (mean ± SEM per sample: 102,567.1 ± 10,454.8, 67,123.6 ± 6,633.1, 70,225.1 ± 5,423.5). Sequences were then analysed using QIIME2 2019.10 . Based on overall quality scores, the first 10 bases of each read were trimmed, and sequences truncated to 210 bp for both forward and reverse reads. The DADA2 plugin 2019.10.0 was used to join paired-end reads, denoise, remove chimaeras and residual PhiX reads, dereplicate and call amplicon sequence variants (ASVs) [80, 81]. Following this, results from the three separate runs were merged, resulting in a total of 22,997,693 reads (mean ± SEM per sample: 57,494.3 ± 3424.8) with 36,182 ASVs. A mid-point rooted phylogeny was then constructed using the masked alignment MAFFT  and the Fast Tree approach . Taxonomic assignment of ASVs was performed by training a naïve-Bayes classifier on the SILVA 132 16S dataset using 99% sequence similarity [84, 85]. Plastid-like and archaeal sequences were removed, as well as amplicons which only had one read across the whole dataset (singletons) which likely represent sequencing errors. In total, ASVs from 19 genera were present in the negative control samples (Additional file 2). Of these, 15 genera were present at < 3% overall abundance in the negative control samples (and had greater read counts in the faecal samples) and so were not considered as contaminants introduced at extraction. Furthermore, two genera had fewer than 500 reads across faecal samples, and one genus was found at equal abundance in negative extraction controls and faecal samples from different sequencing runs, and so it could not be determined which sample was contaminated; as such, these sequences were retained in analysis. A visual assessment of samples showed that two ASVs were present at relatively high sequencing depth in negative extraction controls—these were also prevalent in faecal samples with low sequencing depth and DNA concentration and so were removed as probable contaminants. These two ASVs were additionally confirmed as contaminants using the prevalence and filtering methods in the DECONTAM package . One of these ASVs was from the genus Defltia (relative abundance of 90.5% in a negative extraction control from the first run), and one was from the genus Limnobacter (relative abundance of 99.9% in a negative extraction control from the third run). The removal of these ASVs resulted in a total of 21,863,215 reads (mean ± SEM per sample: 54,795.0 ± 3,439.6) with 34,869 ASVs. Following cleaning, visual assessment of the positive controls revealed that only the eight expected ASVs were present, while visual assessment of the collection controls showed they had quite different bacterial profiles when compared to the faecal samples, with few overlapping ASVs at high abundance. This indicates that neither the sequencing nor the sampling methods significantly impacted taxa present in the faecal samples. The final sample metadata, ASV and taxonomy tables were all exported from QIIME2 into R 3.6.1 where they were processed using phyloseq 1.28.0 . Sample completeness and rarefaction curves were generated using iNEXT 2.0.20 ; completeness plateaued at approximately 10,000 reads and 34 samples (including all six negative extraction controls) with fewer reads were excluded from downstream analyses (Additional file 1: Fig S1). Overall, 320 unique samples (93%) were retained from 224 individuals.
The repeatability of GM sequencing was tested by comparing samples that were sequenced multiply within and across sequencing runs and that had sequencing depth of >10,000 reads (37 out of the initial 43 samples). Individual repeatability was tested for individuals that had multiple samples collected from the same capture or were caught multiple times in the same field season and that had a sequencing depth of >10,000 reads (115 samples from 51 individuals). The individual repeatability dataset was further filtered to remove ASVs that had a total read count of <50 across samples and could potentially represent sequencing errors. Euclidean dissimilarity between pairs of samples was compared using one metric of alpha diversity (the Shannon diversity index) and two metrics of beta diversity (unweighted and weighted UniFrac of between and within duplicated samples) using Kruskal–Wallis tests.
Unless otherwise stated, all analyses were conducted in R 3.6.1. To characterise the Seychelles warbler GM, samples sequenced twice for repeatability analysis (sample duplicates) were filtered such that only the sample with the greatest read-depth was retained. Samples collected from the same bird during the same catch session (catch duplicates) were filtered to retain the single sample with the smallest potential exposure to external contamination, i.e., samples collected from cleaned trays were prioritised over those collected from other substrates, then the sample with the highest read depth was prioritised. The removal of sample and catch duplicates resulted in 281 samples (from 224 individuals). This resulted in a total of 16,562,592 reads (mean ± SEM per sample: 58,941.6 ± 3997.7) with 34,869 ASVs in the faecal dataset. For all alpha diversity, beta diversity and differential abundance analyses, microbiome samples taken from chicks were excluded due to a small sample size (n = 11). Individuals with incomplete MHC genotype data (n = 25) were also excluded. Lastly, to prevent pseudo-replication, where an individual had multiple samples taken at different capture events, a single sample was selected at random, giving an overall sample size of 195 samples from 195 individuals from Cousin Island. Prior to downstream analysis the dataset was further filtered to remove ASVs that had a total read count of <50 across samples and could potentially represent sequencing errors. This resulted in a total of 10,680,281 reads (mean ± SEM per sample: 54,770.7 ± 4,170.4) with 9,628 ASVs in the cleaned, non-rarefied dataset.
All 195 samples were rarefied to a depth of 10,000 reads, based on sample completeness curves, leaving a total of 1,950,000 reads and 27,547 ASVs across samples in the rarefied dataset. Analyses were run using both rarefied and non-rarefied data; however, as results were comparable between datasets and library size was highly variable across samples, only the outcome of models using the rarefied dataset are presented. Three metrics of alpha diversity were calculated: Chao1  (microbial species richness), Shannon diversity index  (species richness, taking into account sample evenness) and Faith’s phylogenetic diversity index  (the phylogenetic diversity of a sample). Chao1 and Shannon diversity indices were calculated using phyloseq 1.28.0 , and Faith’s phylogenetic diversity was calculated using btools 0.0.1 . Both Chao1 and Faith’s phylogenetic diversity were log-transformed in models to improve residual fit.
Linear models with a Gaussian distribution were constructed using glmmTMB 0.2.3  to determine whether the alpha diversity of the GM differed with (1) the presence/absence of individual immune genes and (2) immune gene diversity. For each diversity metric two models were run. The first model included the presence/absence of all the MHC-I and MHC-II alleles that were present in at least 15% of sampled individuals and that were the correct length (see above). This included Ase-dab3, Ase-dab4, Ase-dab5, Ase-ua1/10, Ase-ua3, Ase-ua4, Ase-ua5, Ase-ua6, Ase-ua7, Ase-ua8, Ase-ua9, and Ase-ua11 (Ase-ua1 and Ase-ua10 were perfectly correlated, so only Ase-ua1 was included). The second model contained MHC-I diversity, MHC-II diversity, and the squares of each of these terms, since optimal, rather than a greater diversity, of MHC alleles could be more beneficial . Both models also included individual heterozygosity (Hs) and TLR3 genotype (TLR3AA, TLR3AC or TLR3CC). The field period sampled (major 2017, major 2018, minor 2018), sex (male, female), and age (fledgling, old fledging, sub-adult, adult) were also included, as these factors have been shown to influence GM variation in other studies. Alpha diversity (Shannon, log Chao1, log Faith) was entered as the response variable. In all models, continuous factors were standardised (scaled and centred) using arm 1.10-1 . Biologically relevant interactions were initially included in models but were removed prior to model averaging to enable interpretation of first-order effects, as all were non-significant (P < 0.1). Field period and territory quality were correlated (linear model; F2,185 = 117.2, P < 0.001), so only field period was included in the models. Collinearity between independent variables was tested using variance inflation factors ensuring an upper limit of three. Collinearity between the presence/absence of immune alleles was further assessed using GGally 2.0.0 . DHARMA 0.2.4  was used to confirm that there was no over or under dispersion, or residual spatial or temporal autocorrelation in the models. Model averaging—an information-theoretic approach using the dredge function in MUMIn 1.43.6 —was used to select plausible models. All models within seven AICc of the top model were included in the averaged model, to obtain the final conditional model .
The unrarefied dataset was further filtered to remove ASVs that appeared in fewer than five samples, based on an assessment of overall ASV prevalence and abundance (Additional file 1: Fig S2). Overall, 1,934 ASVs were retained following filtering. To account for uneven sequencing depth across samples, reads were normalised using the cumulative sum scaling function  in metagenomeSeq 1.26.3 . This method includes adding a pseudocount and transforming the data (log (x + 0.0001). Therefore, to preserve zeros from the original counts, post-transformation values were corrected by subtracting the log of the pseudocount . Two beta diversity metrics that incorporate phylogenetic distance were then calculated using phyloseq 1.28.0 ; these were unweighted UniFrac distance (based on the presence/absence of microbial taxa) and weighted UniFrac distance (a quantitative measure which accounts for differences in the abundances of microbial taxa) . Marginal Permutational Analysis of Variance tests (PERMANOVAs) were used to assess whether GM community composition differed with immune gene characteristics, using the adonis2 function in Vegan 2.5.6  with 10,000 permutations. As with alpha diversity models, two sets of PERMANOVA tests were constructed for each beta diversity metric, with the first set of models including the presence/absence of MHC alleles, and the second set of models including MHC diversity; other variables were included as in alpha diversity models. To clarify whether significant differences detected in PERMANOVA tests were caused by differences in mean values, rather than variation in dispersion across groups , homogeneity of group dispersions was tested using the betadisper function in Vegan 2.5.6 . Principle coordinate analysis (PCoA), based on weighted and unweighted UniFrac distances, was used to visualise the differences in composition between groups.
Differential abundance analysis
To assess whether particular ASVs were differentially abundant between groups of individuals with different immune gene characteristics, DESeq2 1.24.0  was used. For this analysis, unrarefied reads were filtered following the same protocol as used for beta diversity analysis, but untransformed, as DeSeq2 uses its own variance stabilising transformation to account for variation in library size across samples. Overall, 1,934 ASVs were retained following filtering. DeSeq2 estimates the log2-fold change in microbial abundance between sample groups using a negative binomial distribution to model ASV counts. Only variables that were associated with significant compositional shifts in PERMANOVA tests (Ase-ua7, Ase-ua11, Ase-ua1/10, field period, and age class) were included in this analysis to avoid over-parametrization (Table 1). To account for the large number of zero counts for individual ASVs, the “poscounts” estimator was included when estimating size factors. As part of the DeSeq2 pipeline, differential ASV abundance was assessed using negative binomial Wald tests and P values were adjusted using the Benjamini and Hochberg false-discovery rate correction, with a significance cut-off of P < 0.01. Two ASVs did not converge due to a high number of zero counts across samples; these were removed from the analysis.
Seychelles warbler GM profile
The overall bacterial phyla and class profile of the Seychelles warbler GM was similar to other passerine species . We identified 40 bacterial phyla across the 281 samples combined; however, of these, Proteobacteria (42% of total reads), Firmicutes (22%), and Actinobacteria (17%) dominated, with all other phyla being present at lower relative abundances (summing to <5% of the total read count). The dominant bacterial classes were Gammaproteobacteria (25%), Alphaproteobacteria (16%), Actinobacteria (16%), Bacilli (16%), and Clostridia (6%) (Fig 1). At lower taxonomic levels, we identified 126 classes, 372 orders, 745 families, 1,827 genera, 2,586 species and 34,869 ASVs across the 281 samples combined.
The core microbiome was further characterised at the family level by extracting bacterial families that appeared in at least 50% of samples with a minimum relative abundance of 0.1% (Additional file 1: Table S4). This resulted in the detection of 28 core families, with ASVs from these families making up 75% of all reads. Of the core families, eight were present in at least 80% of samples, and four accounted for >5% of all reads; this latter group consisted of Enterobacteriaceae (23% of total reads), Streptococcaceae (10%), Rhizobiaceae (6%), and Enterococcaceae (5%). Of the assigned genera, 20 were present in at least 50% of samples and ASVs from these genera made up a total of 28% of all reads. However, of these, only two genera (Microbacterium and Enterococcus) were present in more than 80% of samples.
Despite the presence of a core microbiome, the abundance of individual bacterial taxa was highly variable across individuals as is typical in a wild species (Fig. 1) [109–111]. Additionally, there was considerable individual variation in alpha diversity when measured as Chao1 (mean = 323.2 ± 14.63 SEM), Shannon (mean = 4.0 ± 0.07), and Faith’s phylogenetic diversity (mean = 18.8 ± 0.62). Individual repeatability of the GM was tested using samples taken from the same individual during the same catch or field season, and three metrics of diversity (Shannon, unweighted, and weighted UniFrac). Pairwise beta diversity distances between different individuals were significantly greater (more dissimilar) than within-individual comparisons of samples taken during the same catch or field season (n = 51, unweighted UniFrac; χ1 = 21.28, P <0.001: weighted UniFrac; χ1 = 6.06, P = 0.014; Additional file 1: Fig S3). Likewise, pairwise Shannon diversity distances between different individuals were significantly greater (more dissimilar) than within-individual comparisons (n = 49, χ1 = 4.51, P= 0.034; Additional file 1: Fig S3). Note that in this analysis, two samples with unusually low Shannon diversity (0.9 compared to a mean of 4.0) were not included.
Repeatability of GM sequencing for the same sample was tested using three metrics of diversity (Shannon, unweighted, and weighted UniFrac). As expected, pairwise distances between samples were significantly greater (or more dissimilar) than within-sample comparisons, i.e., pairwise distances when gDNA from the same sample was sequenced twice (n = 37, P < 0.001) (Additional file 1: Fig S4).
244 individuals were successfully genotyped at MHC-I exon 3 and MHC-II exon 2 genes. The repeatability of MHC-I genotypes was 95.0% and of MHC-II was 90.1%, based on 26 and 24 duplicate samples, respectively (Additional file 1: Table S2, S3). The great reed warbler positive control sample had four MHC-I and four MHC-II alleles—all of which mapped with 100% similarity to previously sequenced great reed warbler MHC alleles.
On average, individuals had 5.0 MHC-I alleles: 2–7 alleles per individual. Of these, 10 MHC-I alleles were present in >5% but <95% individuals, and another 10 were present in ≤5% of individuals (Fig. 2). Nine of the ten common alleles matched previously sequenced alleles , with an average of 98% sequence similarity. Ase-ua2 was not present in the current sequencing cohort. When comparing 29 individuals also genotyped using reference strand-mediated conformation analysis , and excluding Ase-ua2, there was 95% similarity between genotyping methods.
Including all MHC-II alleles, individuals had on average 5.8 alleles (range 3–11) out of a total of 24 alleles (Fig. 2a). However, of these 24 MHC-II alleles, only 14 were of the full exon length (270 bp), six alleles had a 1-bp deletion (269 bp in length), two alleles had a 2-bp deletion (268 bp) and two alleles had a 3-bp deletion (267 bp). Of the 10 alleles which contained indels, three of these also contained stop codons, and all were missing the Cys74 residue, which along with Cys10 residue creates a disulphide bridge, which is important for conformation of the mature MHC protein; therefore, these alleles were removed as putative pseudo- or non-functional alleles. Concentrating on putatively functional MHC-II alleles, there were 2.9 alleles on average per individual (range 1–5 alleles per individual). Of these, only three were present in >5% but < 95% of individuals (Fig. 2b). Of the other putatively functional alleles, two were present in virtually all individuals, while nine alleles had a frequency <5%.
For downstream analysis, only alleles of the full, correct length (i.e. MHC-I: 262 bp, MHC-II: 270 bp) were included when calculating diversity or for presence/absence. Again, for the presence/absence of alleles, only alleles present in >5% but <95% of individuals were included. Of those alleles included in the final presence/absence analyses, all 10 of the MHC-I and three of the MHC-II alleles translated into unique amino acid sequences.
The effect of MHC and other host variables on GM alpha diversity
The presence of four MHC alleles was associated with reduced diversity and richness of the GM (Fig. 3). Individuals with the Ase-ua5 allele had significantly lower alpha diversity for all calculated metrics, compared to individuals without (Additional file 1: Table S6, Fig 3), indicating that Ase-ua5 negatively influences the richness (β ± SE = −0.32 ± 0.13, z = 2.40, P = 0.016), evenness (β ± SE = −0.55 ± 0.25, z = 2.22, P = 0.027) and the phylogenetic diversity (β ± SE = −0.22 ± 0.09, z = 2.36, P = 0.018) of the GM. Likewise, the presence of the Ase-ua3 allele was also associated with a decrease in Shannon diversity (β ± SE = -0.51 ± 0.24, z = 2.16, P = 0.031), Chao1 richness (β ± SE = −0.41 ± 0.16, z = 2.62, P = 0.009: Additional file 1: Table S6a, Fig 3) and phylogenetic diversity of the GM (β ± SE = −0.25 ± 0.10, z = 2.39, P = 0.017: Additional file 1: Table S6a, Fig. 3). The presence of the Ase-ua4 allele was associated with reduced GM richness (β ± SE = −0.34 ± 0.14, z = 2.35, P = 0.019) and phylogenetic diversity (β ± SE = −0.21 ± 0.10, z = 2.15, P = 0.032; Additional file 1: Table S6a, Fig. 3), but there was no significant difference in the evenness of the GM between individuals with or without the Ase-ua4 allele (Additional file 1: Table S6a, Fig 3). None of the remaining MHC-I alleles or TLR3 genotype were associated with alpha diversity (Additional file 1: Table S6a, Fig. 3). Likewise, most MHC-II alleles were not associated with changes in GM diversity. However, the presence of one allele, Ase-dab4, was associated with a reduction in Shannon diversity (β ± SE = −0.45 ± 0.23, z = 1.98, P = 0.048; Additional file 1 Table S6a, Fig 3), but not Chao1 richness or Faith’s phylogenetic diversity (Additional file 1: Table S6a, Fig 3). There was no significant effect of MHC-I or MHC-II diversity, or diversity2 on alpha diversity (Additional file 1: Table S6b). By contrast, individual heterozygosity was positively associated with Shannon diversity (β ± SE = 0.35 ± 0.16, z = 2.21, P = 0.027; Additional file 1: Table S6a, Fig. 3), but not Chao1 or Faith’s phylogenetic diversity, though these both showed the same pattern.
Males had reduced phylogenetic diversity compared to females (β ± SE = −0.13 ± 0.06, z = 2.13, P = 0.034, Additional file 1: Table S6, Fig 3), and the same pattern was evident when comparing richness. There was also an association between GM diversity and age, with old fledglings having reduced evenness, richness and phylogenetic diversity compared to adults (Shannon; β ± SE = −0.57 ± 0.27, z = 2.10, P = 0.036: logChao1: β ± SE = -0.45 ± 0.16, z = 2.62, P = 0.009; logFaiths: β ± SE = −0.26 ± 0.11, z = 2.45, P = 0.015; Additional file 1: Table S6, Fig 3). There was no association between GM diversity and field period (Additional file 1: Table S6a, Fig. 3), suggesting that environmental variation across field periods had little influence on the observed variation in alpha diversity values across individuals.
The effect of host variables on GM composition
In addition to effects on alpha diversity, compositional differences in the GM of individuals with, or without, specific MHC-I alleles were identified, although the alleles that showed an effect were not the same as those associated with shifts in GM alpha diversity. PERMANOVA tests showed that the overall composition of the GM was significantly different for individuals with the Ase-ua7 allele, the Ase-ua11 allele, or for Ase-ua1 (and Ase-ua10 as these alleles were co-occurring) versus those without them (Table 1, Fig. 4), but only when weighted UniFrac distances were used as a beta diversity metric. None of the differences detected in PERMANOVA tests were due to differential dispersion (all betadisper tests: P > 0.05), indicating that the results reflected differences in mean values across groups. However, although statistically significant, the presence/absence of specific alleles only explained a minimum of 1.4% and a maximum of 1.7% (per allele) of the variation in GM composition, suggesting that each allele only had a small influence on overall GM composition (Table 1). The remaining MHC-I and MHC-II alleles, as well as MHC-I and MHC-II diversity (or diversity2) had no effect on GM composition (Table 1). Additionally, TLR3 genotype and Hs were not associated with any of the beta diversity metrics (Table 1).
Age class was associated with a compositional shift in the GM in PERMANOVAs based on unweighted UniFrac distances (Table 1) and explained 1.9% of the variance in GM composition. Based on the Principal Coordinate analysis plots, this difference was due to old fledglings being slightly more differentiated compared to other age classes (Additional file 1: Fig S5). However, this effect was absent in models based on weighted UniFrac, which takes the abundance of ASVs into account (Table 1). This suggests that the changes in composition with age class may be due to differences in the presence/absence of different bacterial taxa in the GM, rather than differing abundances of the same taxa. There were no differences in beta diversity between males and females (Table 1). There were significant compositional differences in the GM between field periods, which overall explained either 1.7 or 2.1% variance for unweighted and weighted UniFrac distance, respectively (Table 1).
The influence of host variables on the abundance of specific ASVs
The co-occurring Ase-ua1/10 alleles were associated with the greatest change in ASV abundance, with 67 ASVs (across 31 bacterial orders) being significantly more abundant when these alleles were absent and 32 ASVs (across 15 orders) being more abundant when they were present (Additional file 3; Fig. 5c). Fewer taxa were differentially abundant between groups of individuals with/without Ase-ua11. In this instance, 12 ASVs (across 7 orders) were significantly more abundant when the allele was absent, and 32 ASVs (across 17 orders) were more abundant when the allele was present (Additional file 3; Fig. 5b). Overall, 30 ASVs (across 13 orders) were significantly more abundant when the allele Ase-ua7 was absent and 22 ASVs (across 16 orders) were more abundant when the allele was present (Additional file 3; Fig. 5a).
Many ASVs were significantly less abundant in old fledglings compared to other age groups (old fledglings compared to fledglings: 24 vs 150; old-fledgling compared to sub-adults: 59 vs 134; old-fledglings compared to adults: 18 vs 168; Additional file 1: Fig S6). In comparison, the numbers of differentially abundant taxa between other age groups were more even (fledglings compared to sub-adults: 47 vs 55; fledglings compared to adults: 22 vs 33; sub-adults compared to adults: 22 vs 86; Additional file 1: Fig S6).
Concentrating on extrinsic associations with GM, 225, 227, and 144 ASVs were significantly differentially abundant between the three field periods, respectively (Additional file 1: Fig S7). The majority of ASVs were underrepresented in the minor 2018 season compared to either major season (60 in the minor 2018 vs 167 in the major 2017 season, and 37 in the minor 2018 season vs 188 in the major 2018 season). Of these, 150 ASVs from the minor season differed in abundance across analyses.
In this study, we screened GM variation and MHC class I and II characteristics of individuals in a natural population of the Seychelles warbler. This enabled us to assess how the diversity and composition of the bacterial component of the GM is associated with individual immunogenetic variation i.e. MHC and TLR3 genotypes. Our results indicate that differences in GM diversity are associated with the presence of certain MHC alleles (specifically, lower diversity is associated with four of the 13 tested MHC alleles). Furthermore, differences in GM composition are associated with the presence/absence of four (different) MHC-I alleles, including the differential abundance of certain bacterial taxa. While we found no effect of MHC diversity or TLR3 genotype on the GM, we did find a positive association between bacterial GM diversity and individual genome-wide heterozygosity. Lastly, GM characteristics were also associated with several other host specific or extrinsic variables, namely sex, age, and sampling period.
MHC variation in the Seychelles warbler
Here, we screened variation at the MHC-I exon 3 and MHC-II exon 2 genes using next-generation sequencing. We found reduced functional allelic diversity at MHC-II compared to MHC-I, consistent with what has been found in other passerines . Previous studies on the Seychelles warbler have provided evidence that balancing selection has maintained variation at both the MHC-I  and MHC-II . However, the latter study did not fully resolve individual MHC-II characteristics because of difficulties with the cloning and reference strand-mediated conformation analysis techniques used. In the present study, we were able to confirm (class-I) and fully characterise (class-II) individual MHC variation. Our results, showing that variation has been maintained at both sites in this species despite reduced genome-wide variation , support the idea that balancing selection is maintaining variation at both MHC classes. Given the MHC’s role in antigen detection, this is likely to be pathogen-mediated selection.
GM variation is associated with MHC variation
We found that GM diversity was negatively associated with the presence of three (of 10) MHC-I alleles (Ase-ua3, Ase-ua4, Ase-ua5) and one (of three) MHC-II alleles (Ase-dab4). All three MHC-I alleles were consistently associated with a reduction in GM bacterial richness and phylogenetic diversity. Ase-ua3 and Ase-ua5 were additionally associated with reduced evenness in the GM. This suggests that these alleles may lead to the selective elimination of certain bacterial taxa from the gut, resulting in a reduced community of species with a narrower phylogenetic range. This is similar to another study that identified MHC-II motifs associated with reduced GM diversity in threespine sticklebacks . It is likely that MHC-II alleles—such as Ase-dab4 in the Seychelles warbler—directly impact GM diversity, since MHC-II molecules are produced in antigen-presenting cells, which are abundant in the lamina propria behind the gut epithelium . Such dendritic cells can extend between epithelial cells where they phagocytose particles, including microbes, from the gut lumen . Antigens from these microbes are then exported to the cell surface by MHC-II molecules, so that they can be presented to B and T cell populations  and, if recognised, instigate an immune response.
Our study expands on previous work by investigating how variation at MHC-I genes impact GM variation in a natural population (see also ). Laboratory studies comparing transgenic and wild-type rats  have previously shown that the expression of human MHC-I genes (specifically HLA-B27, typically associated with arthritis) alters GM composition and that this may be related to immune-mediated disease state. Likewise, in congenic mice , the lack of MHC-I-mediated antigen presentation resulted in shifts in composition and structure of the GM. In addition to the simplification of the GM in captive populations , animal models are typically engineered to be homozygous/heterozygous or to have different expression at specific alleles. In comparison, natural populations normally contain a diverse array of alleles and allelic combinations, and are exposed to a wider range of microbial taxa; thus, biologically relevant indirect effects of host genes on the GM may not be observed in captivity. This is particularly important when considering associations between GM characteristics and individual MHC-I variation. MHC-I molecules typically respond to intracellular microbes, rather than extracellular microbes, and play a central role in anti-viral and anti-tumour immunity . As such, we would not necessarily expect these molecules to recognise bacterial antigens, although cross reactivity via the presentation of exogenously derived antigens can occur . Instead, MHC-I variation may indirectly affect GM characteristics by impacting other aspects of the host’s health and physiology. In the Seychelles warbler, we know that survival is associated with a specific MHC-I allele (Ase-ua4)  and, although we do not know what drives this effect (i.e. we have not identified the selective factor responsible), any such interaction could also shape changes in the GM. In our study, the Ase-ua4 allele was associated with a reduction in GM richness and phylogenetic diversity but did not significantly change the composition of the GM. Numerous studies have linked MHC-I variation with susceptibility to malarial infection in passerines [116, 117] and other taxa . Malaria infections alter the GM via disruption to immune homeostasis  and could play a role in the Seychelles warbler, in which a single strain of the malarial parasite (Haemoproteus nucleocondensus) has been identified . Alternatively, MHC-I might alter an individual’s susceptibility to another, as yet, unidentified pathogen, that could also drive differences in the diversity of the GM [47, 120].
As well as MHC alleles directly, or indirectly (through fitness) driving the differences in the GM we see here, it is possible that these associations arise due to other causes. GM taxa have been shown to be heritable , and other genes have been associated with GM variation in wild populations . It may also be that variation at other loci in linkage disequilibrium with the MHC genes investigated here could drive the associations observed here. This is especially important to consider in bottlenecked species, such as the Seychelles warbler, where there will be considerable linkage disequilibrium throughout the genome. However, given the important role of the MHC genes in the acquired immune response, and because the alleles identified encode putatively functional variation in the peptide binding region of the MHC molecules, it is logical to suspect that the MHC variation is involved in driving the changes in GM variation seen here.
Given the negative, but lack of positive, associations identified between MHC alleles and GM alpha diversity, it is surprising that there was no significant effect of overall MHC diversity on the GM. One might expect the cumulative negative effect of each allele (Fig. 3) to cause at least a weak negative correlation between GM diversity and MHC diversity. However, with the multitude of factors involved in determining both the host’s GM and immune response, this lack of association could simply be due to limited power to detect weak effects, as is often the case when examining associations between immunocompetence and MHC variation . Assessing how the functional divergence of MHC alleles within an individual—which provides information about the range of antigens that can be detected —rather than just the number of MHC genes has provided additional resolution in other MHC studies e.g. . This approach can provide extra clarity in species with high MHC diversity where multiple alleles may be functionally similar with overlapping antigen binding properties [125, 126] particularly when considering the diversity of bacterial taxa that are present within the GM. However, in the bottlenecked population of the Seychelles warbler, the very limited number of alleles present in the MHC-I are all highly divergent and cannot be further reduced into functional supertypes. This is likely because functionally divergent MHC alleles were selected for through the bottleneck this species underwent 
We also observed compositional differences in the GM associated with four MHC-I alleles (Ase-ua7, Ase-ua11, and the co-occurring Ase-ua1/10 alleles) but not with any MHC-II alleles. Alleles linked to compositional differences were different to those that were negatively associated with GM alpha diversity. This pattern could arise if different ASVs are tolerated, or removed by MHC-mediated responses, but overall GM richness remains similar across individuals with or without a particular allele, causing compositional but not alpha diversity differences. In contrast, where a greater number of taxonomically similar ASVs and/or less abundant taxa—which may have a reduced impact on compositional differences—are removed by MHC-mediated responses, this may result in reduced alpha diversity but a largely similar composition in individuals with, compared to those without, these alleles. The biggest compositional shift was associated with Ase-ua1/10, whose presence/absence caused the greatest number of differentially abundant ASVs. This is perhaps not surprising, as individuals with both alleles would be able to recognise a larger number of antigens, thus providing a broader immune response compared to a single allele. However, the presence/absence of Ase-ua1/10 only explained 0.5–1.4% (depending on the metric of beta diversity) of the variance in GM composition suggesting that, separately, each allele has a relatively small impact on the GM overall.
Several ASVs were not assigned beyond the level of bacterial family and many bacterial taxa have not been fully characterised, making it difficult to draw conclusions about the functional significance of compositional changes in the GM for the host. However, there were several potentially interesting, shared candidate taxa that were differentially abundant between individuals with/without the Ase-ua1/10 and Ase-ua11 alleles. For example, individuals with these alleles had a reduced abundance of ASVs from the order Lactobacillales, a lactic acid-producing bacterial taxon, generally thought to be a beneficial member of the GM. Indeed, members of this order are used as probiotics in poultry farms to boost the immune response of chickens . In contrast, individuals with the Ase-ua1/10 and Ase-ua11 alleles had an increased abundance of ASVs from Bacteroidales, an order commonly associated with chronic intestinal inflammation . Two of these ASVs were from the genus Bacteroides, while species from this genus can be mutualistic, opportunistic pathogenic infections can occur in humans and other animals . Likewise, a third ASV from the genus Alistipes has a pathogenic role in various human and animal diseases . The patterns of change associated with Ase-ua7 were different to those arising from the presence/absence of Ase-ua1/10, and Ase-ua11, with fewer ASVs from the orders Lactobacillales and Bacteriodales being differentially abundant. Instead, several ASVs in the order Clostridiales were significantly more abundant when the Ase-ua11 or Ase-ua1/10 alleles were present (and less abundant when the Ase-ua7 allele was present), suggesting that this order could have been selectively tolerated or that ASVs in this order proliferated when other competing taxa were removed.
Direct tolerance of gut commensals by the host immune system can occur by several means, secretions produced by the microbes themselves can induce tolerance by the host immune system, or tolerogenic responses can be genetically encoded by the host . For example, genetically encoded tolerance mechanisms can occur through differential MHC expression on key tolerogenic inducing cells, such as the group 3 innate lymphoid cells . Immunoglobulin A repertoire in the gut can be controlled by MHC genotype, mediating the response against commensal microbes . Mucosal dendritic cells can also induce tolerogenic T and B cell responses, and regulatory T cells can directly recognise, and tolerate, commensal antigens [61, 62]. However, it is not clear how the presence of different polymorphisms in the MHC binding region could facilitate these mechanisms. More likely, coevolution between host and GM resulted in the MHC repertoire that has evolved to tolerate commensal taxa, with MHC alleles that eliminate commensal antigens being under weakened (or even negative) selection compared to those that eliminate pathogenic microbial antigens.
Cumulatively, the variance in composition explained by overall MHC allele presence or absence was low (6.3% or 8.9% for unweighted or weighted UniFrac, respectively). However, this is not unusual when investigating the factors that influence GM composition across individuals within a single population; for example, environmental and host factors explained between 0.4 and 10% variance in red squirrels (Tamiasciurus hudsonicus) . Even sampling period, the most significant predictor of beta diversity in our study, explained only 2% of variation in the GM. One explanation for the low level of explained variance could be the greater presence of transitionary microbiota in the avian gut i.e. dietary, or environmental microbes that are ingested and pass through the intestine without interacting with the host . Adaptations for flight have placed constraints on avian morphology, leading to shorter intestinal lengths, and consequently, shorter food retention times ; this may reduce the potential for bacterial species to adapt to the avian gut and to variation in host ecology. Secondly, if many bacterial taxa carry out the same function in the host gut, there could be a high turnover of species without any consequences for the host . This can give rise to high inter-individual variation in the GM and may explain why the variables analysed here (or indeed in many within-population studies of the GM) explain a low proportion of the overall variance. Indeed, functional GM diversity may be more important than species diversity for host fitness . To address this, future work incorporating metagenomic analysis would allow greater resolution of bacterial species and a more accurate assessment of the functional composition of the GM .
Apart from the acquired immune response, the GM may also be affected by the innate immune response (underpinned by genes such as TLRs) . However, we detected no effect of TLR3 genotype (one of the few TLRs to have functional variation in this system ) on the GM; this is despite survival and reproductive success being significantly associated with TLR3 variation in the Seychelles warbler . This is perhaps not surprising, given TLR3’s role in recognising viral dsRNA , rather than any bacterial conserved structures.
Individual heterozygosity was positively correlated with GM bacterial richness in the Seychelles warbler, though this was not associated with differences in GM phylogenetic diversity or composition. A decrease in individual heterozygosity (or increase in homozygosity) may reflect increased inbreeding. In the Seychelles warbler, increased inbreeding is associated with poorer individual condition (via reduced telomere length ) and reproductive success, with maternal homozygosity negatively predicting offspring survival in years with high mortality [57, 139]. Thus, we may be detecting an indirect effect of increased inbreeding, whereby the decreased fitness or health of individuals in turn negatively impacts GM diversity . However, it is also possible that this association could be driven by reduced heterozygosity of currently unknown, specific functional loci directly reducing GM diversity. In future studies, it could be informative to use either quantitative trait locus mapping , or genome-wide association studies to identify candidate genes associated with GM variation, e.g. [19, 142]. The Seychelles warbler could be particularly useful for this, as it underwent a bottleneck in the 1960s, resulting in a 25% reduction in genome-wide variation , thus making it a more tractable study system in which to disentangle the associations between host genetic variation and the GM.
It is difficult to assess the impact of the identified relationships between GM variation and MHC alleles on host fitness. Typically, greater GM alpha diversity is thought to be beneficial, as it correlates with increased health and survival in humans . However, other studies have shown that high alpha diversity can indicate dis-regulation and GM instability . Similarly, the function of many bacteria in the GM of wild animals is unknown. Given the key role that MHC genes play in pathogen resistance, it is possible that the observed negative correlation between GM alpha diversity and presence of specific MHC alleles is beneficial to the host. For example, Seychelles warblers with the Ase-ua4 allele had reduced GM alpha diversity, and this same allele conferred a survival advantage in individuals [50,58]. To fully unpick the consequences of these GM/MHC relationships in the Seychelles warbler, longitudinal data and analyses accounting for within- and between-individual differences are needed to fully test whether there are fitness differences between individuals with different MHC alleles and GM characteristics. This is no small undertaking, and will require extensive and powerful datasets, which are not yet available.
The GM may drive the evolution of immune genes
Variation in the GM can affect traits important to the host’s own fitness , including host immune function , the severity of diseases  and, ultimately, survival [109, 147], and this could provide the potential for evolutionary adaptation in the host . Balancing selection is thought to be central in maintaining diversity at MHC genes [149–151]. Thus, the GM could act as a selective pressure, shaping host phenotypes (reviewed in [152, 153]), ultimately resulting in host-microbiome co-evolution and adaptation, or speciation [9, 154]. For example, if components of the GM interact with MHC variation, leading to the differential selection of MHC alleles, this could explain how variation at MHC genes has been maintained in the previously bottlenecked Seychelles warbler [51, 53], despite the very limited macroparasite fauna in this population . We detected no effect of MHC diversity (or optimality) on the GM, and therefore no evidence of MHC heterozygote advantage in relation to the GM . However, specific alleles were associated with differences in GM composition; this is consistent with either rare allele  or fluctuating selection  mechanisms, although differentiating between the two is extremely difficult . Identifying the function of GM taxa that are associated with MHC alleles, whether they be pathogenic, beneficial, or commensal, could help infer the significance and direction of these associations.
Effects of age, sex and field period on the GM
In addition to genetic factors, several other key variables influenced the GM. Our results indicated a relationship between GM composition and age class in the Seychelles warbler. Old fledglings had reduced GM alpha diversity and compositional differences compared to all other age group comparisons (which did not differ from one another). In the Seychelles warbler, old fledglings are newly independent and start to forage for themselves and so may be eating different—perhaps lower quality—food items compared to older birds. This may explain the reduced number of differentially abundant taxa present in this age group, compared to others, including a reduced abundance of ASVs in the order Planctomycetes, which are typically transient colonisers of the gut (but see ). Alternatively, exposure to stress via glucocorticoids alters host GM in other species . Thus, increased stress in young individuals as they encounter new situations and pathogens could contribute to differences between age groups. Indeed, mortality is greatest during the first year of life in the Seychelles warbler .
While sex is an important determinant of individual variation in natural populations, its importance as a driver of GM variation varies across vertebrate species [10, 14, 110, 158]. Sex was only associated with a minor difference in the GM of the Seychelles warbler, with males having marginally reduced diversity, but no difference in composition compared to females. It is, perhaps, not surprising that the effect of sex on the GM was so limited, given that Seychelles warblers of both sexes have the same diet and exhibit limited differences in morphology and behaviour. In threespine sticklebacks, GM–MHC associations were sex-dependent ; however, we found no evidence of this in the Seychelles warbler.
Within a species, seasonal changes in diet can be an important factor driving GM variation [12, 110, 159]. In the Seychelles warbler, field period explains 1.7–2.1% of the variance in GM composition. Although the temperature on Cousin Island is relatively stable, there are measurable differences between seasons and years , which could lead to variation in the type and abundance of insect prey species. This may explain the observed difference in GM composition, but not diversity, between field periods. For example, mean island-wide territory quality increased by 80% in the major 2017 field period and 75% in the minor 2018 field period, compared to the later major 2018 field period. Alternatively, increases in food availability between seasons could also act indirectly on the GM by buffering individuals against stress or susceptibility to pathogens.
Our results show that variation has been maintained at MHC-I and MHC-II genes in the Seychelles warbler, and that the presence of specific alleles, but not MHC diversity, was associated with differences in GM diversity and composition. It is possible that such GM–MHC interactions might explain previous results in this population showing that specific MHC alleles are associated with higher survival. However, further longitudinal data are needed to establish whether these associations equate to fitness differences between individuals and to better understand host immunogenetic–GM coevolution.
Availability of data and materials
All 16S rRNA gene amplicon sequences have been submitted to the European Nucleotide Archive (ENA) database under the study accession number PRJEB45408. The 44 MHC alleles have been deposited at GenBank, accession numbers for MHC class I alleles are MZ509455-74, and for MHC class II alleles are MZ509475-98. All metadata, along with R scripts used to run analyses, are available in the Github Repository, https://github.com/Seychelle-Warbler-Project/Davies_2021_Microbiome.
Hird SM. Evolutionary biology needs wild microbiomes. Front Microbiol. 2017;8.
McFall-Ngai M, Hadfield MG, Bosch TCG, Carey HV, Domazet-Lošo T, Douglas AE, et al. Animals in a bacterial world, a new imperative for the life sciences. Proc Natl Acad Sci U S A. 2013;110.
Hooper LV, Midtvedt T, Gordon JI. How host-microbial interactions shape the nutrient environment of the mammalian intestine. Annu Rev Nutr. 2002;22.
Pickard JM, Zeng MY, Caruso R, Núñez G. Gut microbiota: role in pathogen colonization, immune responses, and inflammatory disease. Immunol Rev. 2017;279.
Round JL, Mazmanian SK. The gut microbiota shapes intestinal immune responses during health and disease. Nat Rev Immunol. 2009;9.
Amato KR. Co-evolution in context: the importance of studying gut microbiomes in wild animals. Microbiome Sci Med. 2013;1.
Maurice CF, Knowles S, Ladau J, Pollard KS, Fenton A, Pedersen AB, et al. Marked seasonal variation in the wild mouse gut microbiota. ISME J. 2015;9.
Hird SM, Sánchez C, Carstens BC, Brumfield RT. Comparative gut microbiota of 59 neotropical bird species. Front Microbiol. 2015;6.
Greene LK, Williams CV, Junge RE, Mahefarisoa KL, Rajaonarivelo T, Rakotondrainibe H, et al. A role for gut microbiota in host niche differentiation. ISME J. 2020;14.
Stoffel MA, Acevedo-Whitehouse K, Morales-Durán N, Grosser S, Chakarov N, Krüger O, et al. Early sexual dimorphism in the developing gut microbiome of northern elephant seals. Mol Ecol. 2020;29.
Videvall E, Song SJ, Bensch HM, Strandh M, Engelbrecht A, Serfontein N, et al. The development of gut microbiota in ostriches and its association with juvenile growth. bioRxiv. 2018.
Bolnick DI, Snowberg LK, Hirsch PE, Lauber CL, Knight R, Caporaso JG, et al. Individuals’ diet diversity influences gut microbial diversity in two freshwater fish (threespine stickleback and Eurasian perch). Ecol Lett. 2014;17.
Davidson GL, Cooke AC, Johnson CN, Quinn JL. The gut microbiome as a driver of individual variation in cognition and functional behaviour. Philos Trans R Soc Lond Ser B Biol Sci. 2018;373.
Tung J, Barreiro LB, Burns MB, Grenier J-C, Lynch J, Grieneisen LE, et al. Social networks predict gut microbiome composition in wild baboons. eLife. 2015;4.
Mallott EK, Borries C, Koenig A, Amato KR, Lu A. Reproductive hormones mediate changes in the gut microbiome during pregnancy and lactation in Phayre’s leaf monkeys. Sci Rep. 2020;10.
Noguera JC, Aira M, Pérez-Losada M, Domínguez J, Velando A. Glucocorticoids modulate gastrointestinal microbiome in a wild bird. R Soc Open Sci. 2018;5.
Spor A, Koren O, Ley R. Unravelling the effects of the environment and host genotype on the gut microbiome. Nat Rev Microbiol. 2011;9.
Perofsky AC, Lewis RJ, Abondano LA, Fiore AD, Meyers LA. Hierarchical social networks shape gut microbial composition in wild Verreaux’s sifaka. Proc R Soc B. 2017;284.
Suzuki TA, Phifer-Rixey M, Mack KL, Sheehan MJ, Lin D, Bi K, et al. Host genetic determinants of the gut microbiota of wild mice. Mol Ecol. 2019;28.
Couch CE, Arnold HK, Crowhurst RS, Jolles AE, Sharpton TJ, Witczak MF, et al. Bighorn sheep gut microbiomes associate with genetic and spatial structure across a metapopulation. Sci Rep. 2020;10.
Marietta E, Rishi A, Taneja V. Immunogenetic control of the intestinal microbiota. Immunology. 2015:145.
Hooper LV, Littman DR, Macpherson AJ. Interactions between the microbiota and the immune system. Science. 2012:336.
Pirr S, Viemann D. Host factors of favorable intestinal microbial colonization. Front Immunol. 2020;11.
Claesson MJ, Jeffery IB, Conde S, Power SE, O’Connor EM, Cusack S, et al. Gut microbiota composition correlates with diet and health in the elderly. Nature. 2012;488.
Cho I, Blaser MJ. The human microbiome: at the interface of health and disease. Nat Rev Genet. 2012;13.
Kurilshikov A, Wijmenga C, Fu J, Zhernakova A. Host genetics and gut microbiome: challenges and perspectives. Trends Immunol. 2017;38.
Thaiss CA, Zmora N, Levy M, Elinav E. The microbiome and innate immunity. Nature. 2016:535.
Piertney SB, Oliver MK. The evolutionary ecology of the major histocompatibility complex. Heredity. 2005;96.
Delves PJ, Roitt IM. The immune system. N Engl J Med. 2000;343.
Klein J. Natural history of the major histocompatibility complex: Wiley; 1986.
Hughes AL, Yeager M. Natural selection at major histocompatibility complex loci of vertebrates. Annu Rev Genet. 1998;32.
Hill AVS, Allsopp CEM, Kwiatkowski D, Anstey NM, Twumasi P, Rowe PA, et al. Common West African HLA antigens are associated with protection from severe malaria. Nature. 1991;352.
Loiseau C, Zoorob R, Garnier S, Birard J, Federici P, Julliard R, et al. Antagonistic effects of a Mhc class I allele on malaria-infected house sparrows. Ecol Lett. 2008;11.
Hedrick PW. Evolutionary genetics of the major histocompatibility complex. Am Nat. 1994;143.
Spurgin LG, Richardson DS. How pathogens drive genetic diversity: MHC, mechanisms and misunderstandings. Proc R Soc B. 2010;277.
Khan AA, Yurkovetskiy L, O’Grady K, Pickard JM, Pooter R, Antonopoulos DA, et al. Polymorphic immune mechanisms regulate commensal repertoire. Cell Rep. 2019:29.
Toivanen P, Vaahtovuo J, Eerola E. Influence of major histocompatibility complex on bacterial composition of fecal flora. Infect Immun. 2001;69.
Khan MAW, Stephens WZ, Mohammed AD, Round JL, Kubinak JL. Does MHC heterozygosity influence microbiota form and function. PLoS One. 2019:14.
Lin P, Bach M, Asquith M, Lee AY, Akileswaran L, Stauffer P, et al. HLA-B27 and human β2-Microglobulin affect the gut microbiota of transgenic rats. PLoS One. 2014;9.
Clayton JB, Vangay P, Huang H, Ward T, Hillmann BM, Al-Ghalith GA, et al. Captivity humanizes the primate microbiome. Proc Natl Acad Sci U S A. 2016;113.
Jensen PE. Recent advances in antigen processing and presentation. Nat Immunol. 2007;8.
Montero BK, Wasimuddin SN, Gillingham MAF, Ratovonamana YR, Rakotondranary SJ, et al. Evidence of MHC class I and II influencing viral and helminth infection via the microbiome in a non-human primate. PLoS Pathog. 2021;17.
Bolnick DI, Snowberg LK, Caporaso JG, Lauber C, Knight R, Stutz WE. Major Histocompatibility Complex class IIb polymorphism influences gut microbiota composition and diversity. Mol Ecol. 2014;23.
Hernández-Gómez O, Briggler JT, Williams RN. Influence of immunogenetics, sex and body condition on the cutaneous microbial communities of two giant salamanders. Mol Ecol. 2018;27.
Leclaire S, Strandh M, Dell’Ariccia G, Gabirot M, Westerdahl H, Bonadonna F. Plumage microbiota covaries with the major histocompatibility complex in blue petrels. Mol Ecol. 2019;28.
Pearce DS, Hoover BA, Jennings S, Nevitt GA, Docherty KM. Morphological and genetic factors shape the microbiome of a seabird species (Oceanodroma leucorhoa) more than environmental and social factors. Microbiome. 2017;5.
Li N, Ma W-T, Pang M, Fan Q-L, Hua J-L. The Commensal microbiota and viral infection: A comprehensive review. Front Immunol. 2019;10.
Komdeur J. Importance of habitat saturation and territory quality for evolution of cooperative breeding in the Seychelles warbler. Nature. 1992;358.
Hammers M, Kingma SA, Spurgin LG, Bebbington K, Dugdale HL, Burke T, et al. Breeders that receive help age more slowly in a cooperatively breeding bird. Nat Com. 2019;10.
Davies CS, Taylor MI, Hammers M, Burke T, Komdeur J, Dugdale HL, et al. Contemporary evolution of the innate immune receptor gene TLR3 in an isolated vertebrate population. Mol Ecol. 2021;30.
Spurgin LG, Wright DJ, van der Velde M, Collar NJ, Komdeur J, Burke T, et al. Museum DNA reveals the demographic history of the endangered Seychelles warbler. Evol Appl. 2014;7.
Richardson D, Jury F, Blaakmeer K, Komdeur J, Burke T. Parentage assignment and extra-group paternity in a cooperative breeder: the Seychelles warbler (Acrocephalus sechellensis). Mol Ecol. 2001;10.
Richardson DS, Westerdahl H. MHC diversity in two Acrocephalus species: the outbred Great reed warbler and the inbred Seychelles warbler. Mol Ecol. 2003;12.
Hutchings K. Parasite-mediated selection in an island endemic, the Seychelles warbler (Acrocephalus sechellensis). University of East. Anglia. 2009.
Gilroy D, van Oosterhout C, Komdeur J, Richardson DS. Toll-like receptor variation in the bottlenecked population of the endangered Seychelles warbler. Anim Conserv. 2017;20.
Bebbington K, Spurgin LG, Fairfield EA, Dugdale HL, Komdeur J, Burke T, et al. Telomere length reveals cumulative individual and transgenerational inbreeding effects in a passerine bird. Mol Ecol. 2016;25.
Brouwer L, Komdeur J, Richardson DS. Heterozygosity–fitness correlations in a bottlenecked island species: a case study on the Seychelles warbler. Mol Ecol. 2007;16.
Brouwer L, Barr I, Van De Pol M, Burke T, Komdeur J, Richardson DS. MHC-dependent survival in a wild population: evidence for hidden genetic benefits gained through extra-pair fertilizations. Mol Ecol. 2010;19.
Richardson DS, Komdeur J, Burke T, von Schantz T. MHC-based patterns of social and extra-pair mate choice in the Seychelles warbler. Proc R Soc B. 2005;272.
Kubinak JL, Stephens WZ, Soto R, Petersen C, Chiaro T, Gogokhia L, et al. MHC variation sculpts individualized microbial communities that control susceptibility to enteric infection. Nat Commun. 2015:6.
Lathrop SK, Bloom SM, Rao SM, Nutsch K, Lio C-W, Santacruz N, et al. Peripheral education of the immune system by colonic commensal microbiota. Nature. 2011;478.
Spasova DS, Surh CD. Blowing on embers: commensal microbiota and our immune system. Front Immunol. 2014;5.
Brouwer L, Tinbergen JM, Both C, Bristol R, Richardson DS, Komdeur J. Experimental evidence for density-dependent reproduction in a cooperatively breeding passerine. Ecology. 2009:90.
Sparks AM, Spurgin LG, van der Velde M, Fairfield EA, Komdeur J, Burke T, et al. Telomere heritability and parental age at conception effects in a wild avian population. Mol Ecol. 2021.
Komdeur J, Piersma T, Kraaijeveld K, Kraaijeveld-Smit F, Richardson DS. Why Seychelles Warblers fail to recolonize nearby islands: unwilling or unable to fly there? Ibis. 2004;146.
Hammers M, Brouwer L. Rescue behaviour in a social bird: removal of sticky ‘bird-catcher tree’seeds by group members. Behaviour. 2017;154.
Komdeur J, Daan S. Breeding in the monsoon: semi-annual reproduction in the Seychelles warbler (Acrocephalus sechellensis). J Ornithol. 2005;146.
Komdeur J, Burke T, Dugdale HL, Richardson DS. Seychelles warblers: complexities of the helping paradox. Cooperative breeding in vertebrates. Stud Ecol Evol Behav. 2016.
Wright DJ. Evolutionary and conservation genetics of the Seychelles warbler (Acrocephalus sechellensis). University of East. Anglia. 2014.
Knutie SA, Gotanda KM. A non-invasive method to collect fecal samples from wild birds for microbiome studies. Microb. 2018;76.
Coulon A: genhet: an easy-to-use R function to estimate individual heterozygosity. Mol Ecol Resour 2010; 10.
Griffiths R, Double MC, Orr K, Dawson RJG. a DNA test to sex most birds. Mol Ecol. 1998;7.
Wright DJ, Spurgin LG, Collar NJ, Komdeur J, Burke T, Richardson DS. The impact of translocations on neutral and functional genetic diversity within and among populations of the Seychelles warbler. Mol Ecol. 2014;23.
Miya M, Sato Y, Fukunaga T, Sado T, Poulsen JY, Sato K, et al. MiFish, a set of universal PCR primers for metabarcoding environmental DNA from fishes: detection of more than 230 subtropical marine species. R Soc Open Sci. 2015;2.
Westerdahl H, Wittzell H, Schantz T, Bensch S. MHC class I typing in a songbird with numerous loci and high polymorphism using motif-specific PCR and DGGE. Heredity. 2004;92.
Sebastian A, Herdegen M, Migalska M, Radwan J. Amplisas: a web server for multilocus genotyping using next-generation amplicon sequencing data. Mol Ecol Resour. 2016;16.
Martin M. Cutadapt removes adapter sequences from high-throughput sequencing reads. EMBnet J. 2011;17.
Caporaso JG, Lauber CL, Walters WA, Berg-Lyons D, Lozupone CA, Turnbaugh PJ, et al. Global patterns of 16S rRNA diversity at a depth of millions of sequences per sample. Proc Natl Acad Sci U S A. 2011;108.
Caporaso JG, Kuczynski J, Stombaugh J, Bittinger K, Bushman FD, Costello EK, et al. QIIME allows analysis of high-throughput community sequencing data. Nat Methods. 2010:7.
Callahan BJ, McMurdie PJ, Rosen MJ, Han AW, Johnson AJA, Holmes SP. DADA2: high-resolution sample inference from Illumina amplicon data. Nat Methods. 2016;13.
Callahan BJ, McMurdie PJ, Holmes SP. Exact sequence variants should replace operational taxonomic units in marker-gene data analysis. ISME J. 2017;11.
Katoh K, Misawa K, Kuma K, Miyata T. MAFFT: a novel method for rapid multiple sequence alignment based on fast Fourier transform. Nucleic Acids Res. 2002;30.
Price MN, Dehal PS, Arkin AP. FastTree: computing large minimum evolution trees with profiles instead of a distance matrix. Mol Biol Evol. 2009;26.
Quast C, Pruesse E, Yilmaz P, Gerken J, Schweer T, Yarza P, et al. The SILVA ribosomal RNA gene database project: improved data processing and web-based tools. Nucleic Acids Res. 2012;41.
Yilmaz P, Parfrey LW, Yarza P, Gerken J, Pruesse E, Quast C, et al. The SILVA and “All-species Living Tree Project (LTP)” taxonomic frameworks. Nucleic Acids Res. 2013;42.
Davis NM, Proctor DM, Holmes SP, Relman DA, Callahan BJ. Simple statistical identification and removal of contaminant sequences in marker-gene and metagenomics data. Microbiome. 2018;6.
McMurdie PJ, Holmes S. phyloseq: an R package for reproducible interactive analysis and graphics of microbiome census data. PLoS One. 2013;8.
Hsieh T, Ma K, Chao A. iNEXT: an R package for rarefaction and extrapolation of species diversity (H ill numbers). Methods Ecol Evol. 2016;7.
Chao A. Nonparametric estimation of the number of classes in a population. Scand Stat Theory Appl. 1984.
Shannon CE. A mathematical theory of communication. Bell Syst Techn J. 1948;27.
Faith DP. Conservation evaluation and phylogenetic diversity. Biol Conserv. 1992;61.
Battaglia T. btools: a suite of R function for all types of microbial diversity analyses. 0.0.1 edition; 2018.
Brooks ME, Kristensen K, van Benthem KJ, Magnusson A, Berg CW, Nielsen A, et al. glmmTMB balances speed and flexibility among packages for zero-inflated generalized linear mixed modeling. R J. 2017;9.
Nowak MA, Tarczy-Hornoch K, Austyn JM. The optimal number of major histocompatibility complex molecules in an individual. Proc Natl Acad Sci U S A. 1992;89.
Gelman A, Su Y, Masanao Y, Zheng T, Dorie V. arm: data analysis using regression and multilevel/hierarchical models, version 1.10-1; 2018.
Schloerke B, Crowley J, Cook D, Hofmann H, Wickham H, Briatte F, et al: Ggally: extension to ggplot2. 2011.
Hartig F: DHARMa: residual diagnostics for hierarchical (multi-level/mixed) regression models. R package version 024. 2017; 5.
Barton K, Barton MK. Package ‘MuMIn’. R package version 1.43.6 edition; 2019.
Burnham KP, Anderson DR, Huyvaert KP. AIC model selection and multimodel inference in behavioral ecology: some background, observations, and comparisons. Behav Ecol Sociobiol. 2011;65.
Paulson JN, Stine O, Bravo H, Pop M. Robust methods for differential abundance analysis in marker gene surveys. Nat Methods. 2013;10.
Paulson JN, Pop M, Bravo HC: metagenomeSeq: statistical analysis for sparse high-throughput sequencing. Bioconductor package 2013; 1.
Thorsen J, Brejnrod A, Mortensen M, Rasmussen MA, Stokholm J, Al-Soud WA, et al. Large-scale benchmarking reveals false discoveries and count transformation sensitivity in 16S rRNA gene amplicon data analysis methods used in microbiome studies. Microbiome. 2016;4.
Lozupone C, Hamady M, Knight R. UniFrac–an online tool for comparing microbial community diversity in a phylogenetic context. BMC Bioinform. 2006;7.
Lozupone CA, Hamady M, Kelley ST, Knight R. Quantitative and qualitative β diversity measures lead to different insights into factors that structure microbial Ccommunities. Appl Environ. 2007;73.
Oksanen J, Kindt R, Legendre P, O’Hara B, Stevens MHH, Oksanen MJ, et al. The vegan package Community ecology package; 2007. p. 10.
Anderson MJ. A new method for non-parametric multivariate analysis of variance. Austral Ecol. 2001;26.
Love MI, Huber W, Anders S. Moderated estimation of fold change and dispersion for RNA-seq data with DESeq2. Genome Biol. 2014;15.
Kropáčková L, Těšický M, Albrecht T, Kubovčiak J, Čížková D, Tomášek O, et al. Codiversification of gastrointestinal microbiota and phylogeny in passerines is not explained by ecological divergence. Mol Ecol. 2017.
Davidson GL, Somers SE, Wiley N, Johnson CN, Reichert MS, Ross RP, et al. A time-lagged association between the gut microbiome, nestling weight and nestling survival in wild great tits. J Anim Ecol. 2021;90.
Ren T, Boutin S, Humphries MM, Dantzer B, Gorrell JC, Coltman DW, et al. Seasonal, spatial, and maternal effects on gut microbiome in wild red squirrels. Microbiome. 2017;5.
Góngora E, Elliott KH, Whyte L. Gut microbiome is affected by inter-sexual and inter-seasonal variation in diet for thick-billed murres (Uria lomvia). Sci Rep. 2021;11.
Minias P, Pikus E, Whittingham LA, Dunn PO. A global analysis of selection at the avian MHC. Evolution. 2018:72.
Niess JH, Brand S, Gu X, Landsman L, Jung S, McCormick BA, et al. CX3CR1-Mediated dendritic cell access to the intestinal lumen and bacterial clearance. Science. 2005;307.
Hooper LV, Macpherson AJ. Immune adaptations that maintain homeostasis with the intestinal microbiota. Nat Rev Immunol. 2010;10.
Ackerman AL, Cresswell P. Cellular mechanisms governing cross-presentation of exogenous antigens. Nat Immunol. 2004;5.
Biedrzycka A, Bielański W, Ćmiel A, Solarz W, Zając T, Migalska M, et al. Blood parasites shape extreme major histocompatibility complex diversity in a migratory passerine. Mol Ecol. 2018;27.
Westerdahl H, Waldenström J, Hansson B, Hasselquist D, von Schantz T, Bensch S. Associations between malaria and MHC genes in a migratory songbird. Proc R Soc B. 2005;272.
Mukherjee D, Chora ÂF, Mota MM. Microbiota, a third player in the host plasmodium affair. Trends Parasitol. 2020;36.
Fairfield EA, Hutchings K, Gilroy DL, Kingma SA, Burke T, Komdeur J, et al. The impact of conservation-driven translocations on blood parasite prevalence in the Seychelles warbler. Sci Rep. 2016;6.
Rosshart SP, Vassallo BG, Angeletti D, Hutchinson DS, Morgan AP, Takeda K, et al. Wild mouse gut microbiota promotes host fitness and improves disease resistance. Cell. 2017;171.
Grieneisen L, Dasari M, Gould TJ, Björk JR, Grenier J-C, Yotova V, et al. Gut microbiome heritability is nearly universal but environmentally contingent. Science. 2021;373.
Gaigher A, Burri R, San-Jose LM, Roulin A, Fumagalli L. Lack of statistical power as a major limitation in understanding MHC-mediated immunocompetence in wild vertebrate populations. Mol Ecol. 2019;28.
Wakeland EK, Boehme S, She JX, Lu C-C, McIndoe RA, Cheng I, et al. Ancestral polymorphisms of MHC class II genes: divergent allele advantage. J Immunol Res. 1990;9.
Pineaux M, Merkling T, Danchin E, Hatch S, Duneau D, Blanchard P, et al. Sex and hatching order modulate the association between MHC-II diversity and fitness in early-life stages of a wild seabird. Mol Ecol. 2020;29.
Sepil I, Lachish S, Hinks AE, Sheldon BC. MHC supertypes confer both qualitative and quantitative resistance to avian malaria infections in a wild bird population. Proc Soc B. 2013;280.
Lenz TL. Computational prediction of MHC II-antigen binding supports divergent allele advantage and explains trans-species polymorphism. Evolution. 2011;65.
Brisbin JT, Gong J, Orouji S, Esufali J, Mallick AI, Parvizi P, et al. Oral treatment of chickens with lactobacilli influences elicitation of immune responses. Clin Vaccine Immunol. 2011;18.
Zitomersky NL, Atkinson BJ, Franklin SW, Mitchell PD, Snapper SB, Comstock LE, et al. Characterization of adherent bacteroidales from intestinal biopsies of children and young adults with inflammatory bowel disease. PLoS One. 2013;8.
Wexler HM. Bacteroides: the good, the bad, and the nitty-gritty. Clin. 2007;20.
Parker BJ, Wearsch PA, Veloo ACM, Rodriguez-Palacios A. The genus Alistipes: gut bacteria with emerging implications to inflammation, cancer, and mental health. Front Immunol. 2020;11.
Round JL, O'Connell RM, Mazmanian SK. Coordination of tolerogenic immune responses by the commensal microbiota. J Autoimmun. 2010;34.
Hepworth MR, Fung TC, Masur SH, Kelsen JR, McConnell FM, Dubrot J, et al. Immune tolerance. Group 3 innate lymphoid cells mediate intestinal selection of commensal bacteria-specific CD4+ T cells. Science (New York, NY). 2015;348.
Song SJ, Sanders JG, Delsuc F, Metcalf J, Amato K, Taylor MW, et al. Comparative analyses of vertebrate gut microbiomes reveal convergence between birds and bats. mBio. 2020;11.
Caviedes-Vidal E, McWhorter TJ, Lavin SR, Chediack JG, Tracy CR, Karasov WH. The digestive adaptation of flying vertebrates: high intestinal paracellular absorption compensates for smaller guts. Proc Natl Acad Sci U S A. 2007;104.
Huttenhower C, Gevers D, Knight R, Abubucker S, Badger JH, Chinwalla AT, et al. Structure, function and diversity of the healthy human microbiome. Nature. 2012;486.
Louca S, Polz MF, Mazel F, Albright MBN, Huber JA, O’Connor MI, et al. Function and functional redundancy in microbial systems. Nat Ecol Evol. 2018;2.
Ranjan R, Rani A, Metwally A, McGee HS, Perkins DL. Analysis of the microbiome: Advantages of whole genome shotgun versus 16S amplicon sequencing. Biochem Biophys Res Commun. 2016;469.
Barton GM. Viral recognition by Toll-like receptors. Semin Immunol. 2007;19.
Richardson DS, Komdeur J, Burke T. Inbreeding in the Seychelles warbler: environment-dependent maternal effects. Evolution. 2004;58.
Grosser S, Sauer J, Paijmans AJ, Caspers BA, Forcada J, Wolf JBW, et al. Fur seal microbiota are shaped by the social and physical environment, show mother–offspring similarities and are associated with host genetic quality. Mol Ecol. 2019;28.
Snijders AM, Langley SA, Kim Y-M, Brislawn CJ, Noecker C, Zink EM, et al. Influence of early life exposure, host genetics and diet on the mouse gut microbiome and metabolome. Nat Microbiol. 2016;2.
Bonder MJ, Kurilshikov A, Tigchelaar EF, Mujagic Z, Imhann F, Vila AV, et al. The effect of host genetics on the gut microbiome. Nat Genet. 2016;48.
Coyte KZ, Schluter J, Foster KR. The ecology of the microbiome: networks, competition, and stability. Science. 2015;350.
Suzuki TA. Links between natural variation in the microbiome and host fitness in wild mammals. Integr. 2017;57.
van Veelen HPJ, Falcão Salles J, Matson KD, van der Velde M, Tieleman BI. Microbial environment shapes immune function and cloacal microbiota dynamics in zebra finches Taeniopygia guttata. Anim Microbiome. 2020;2.
Villarino NF, LeCleir GR, Denny JE, Dearth SP, Harding CL, Sloan SS, et al. Composition of the gut microbiota modulates the severity of malaria. Proc Natl Acad Sci U S A. 2016;113.
Benskin CMH, Rhodes G, Pickup RW, Mainwaring MC, Wilson K, Hartley IR. Life history correlates of fecal bacterial species richness in a wild population of the blue tit Cyanistes caeruleus. Ecol Evol. 2015;5.
Kolodny O, Callahan BJ, Douglas AE. The role of the microbiome in host evolution. Philos Trans R Soc Lond Ser B Biol Sci. 2020;375.
Eizaguirre C, Lenz TL, Kalbe M, Milinski M. Rapid and adaptive evolution of MHC genes under parasite selection in experimental vertebrate populations. Nat Comms. 2012;3.
Hedrick PW. Pathogen resistance and genetic variation at MHC loci. Evolution. 2002;56.
Bernatchez L, Landry C. MHC studies in nonmodel vertebrates: what have we learned about natural selection in 15 years? J Evol Biol. 2003;16.
Koskella B, Bergelson J. The study of host-microbiome (co) evolution across levels of selection. Philos Trans R Soc Lond Ser B Biol Sci. 2020;375.
Rowe M, Veerus L, Trosvik P, Buckling A, Pizzari T. The reproductive microbiome: an emerging driver of sexual selection, sexual conflict, mating systems, and reproductive isolation. Trends Ecol Evol. 2020;35.
Moeller AH, Sanders JG. Roles of the gut microbiota in the adaptive evolution of mammalian species. Philos Trans R Soc Lond Ser B Biol Sci. 2020;375.
Doherty PC, Zinkernagel RM. Enhanced immunological surveillance in mice heterozygous at the H-2 gene complex. Nature. 1975;256.
Slade R, McCallum H. Overdominant vs. frequency-dependent selection at MHC loci. Genetics. 1992;132.
Cayrou C, Sambe B, Armougom F, Raoult D, Drancourt M. Molecular diversity of the Planctomycetes in the human gut microbiota in France and Senegal. APMIS. 2013;121.
Teyssier A, Rouffaer LO, Saleh Hudin N, Strubbe D, Matthysen E, Lens L, et al. Inside the guts of the city: urban-induced alterations of the gut microbiota in a wild passerine. Sci Total Environ. 2018;612.
Michel AJ, Ward LM, Goffredi SK, Dawson KS, Baldassarre DT, Brenner A, et al. The gut of the finch: uniqueness of the gut microbiome of the Galápagos vampire finch. Microbiome. 2018;6.
van de Crommenacker J, Komdeur J, Burke T, Richardson DS. Spatio-temporal variation in territory quality and oxidative status: a natural experiment in the Seychelles warbler (Acrocephalus sechellensis). J Anim Ecol. 2011;80.
We thank the Seychelles Bureau of Standards and the Department of Environment for permission to conduct fieldwork overall and Nature Seychelles for facilitating fieldwork on Cousin Island. This study would not have been possible without the contribution of exceptional fieldworkers and technicians. We particularly thank Gavin Horsburgh for assistance in MHC sequencing, Maria-Elena Mannarelli for assistance extracting the faecal samples, and Marco van der Velde for microsatellite genotyping. Bacterial data generation and analysis were carried out by the Centre for Genomic Research, University of Liverpool. The research presented in this paper was carried out on the High-Performance Computing Clusters supported respectively by the Research and Specialist Computing Support service at the University of East Anglia and IT Services at the University of Sheffield.
CSD was funded by the Natural Environment Research Council and EnvEast DTP (NE/L002582/1). The MHC laboratory work was supported by the UK Natural Environment Research Council (NERC) Biomolecular Analysis Facility at the University of Sheffield (NBAF1150). The microbiome sequencing was funded by a NERC NBAF Pilot Scheme Grant (NBAF1092) and by a NERC grant (NE/S010939/1) to DSR.
Ethics approval and consent to participate
Fieldwork was carried out in accordance with local ethical regulations and agreements. The Seychelles Department of Environment and the Seychelles Bureau of Standards approved the fieldwork.
Consent for publication
The authors declare that they have no competing interests.
Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
Additional file 1: Table S1.
Primer sequences used for MHC sequencing. Degenerate bases are shown according to IUPAC codes: Y = C/T, N = any base. Table S2. Repeatability of MHC-I (n = 26) and MHC-II (n = 24) genotyping for different dominant frequency thresholds. Minimum amplicon frequency was kept constant at 0.3%. Threshold with the greatest repeatability is in bold. Table S3. Repeatability of MHC-I (n = 26) and MHC-II (n = 24) genotyping for different minimum amplicon frequencies. Minimum dominant frequency threshold was kept constant at 25%. Minimum amplicon frequency with the greatest repeatability for each MHC class is in bold. Table S5. Core families present in 281 faecal samples, collected from 224 Seychelles warblers. Core microbiome is defined as bacterial families that appeared in at least 50% of samples, with a minimum relative abundance of 0.1%. Total number of reads, and % of all reads are included. Table S5. The effect of host-associated variables on gut microbiome diversity in the Seychelles warbler (n = 195). GLMMs for three metrics of alpha diversity: Shannon diversity, Chao 1(log transformed) and Faiths phylogenetic diversity (log transformed): A) including the presence/absence of MHC alleles or, B) MHC diversity. A Linear model was used to generate conditional model-averaged estimates (β), their standard error (SE), z value, P value, and relative importance (ω) are shown for all predictors featuring in the top model set (ΔAICc ≤ 7). All continuous factors were standardised. Estimates are in reference to MHC allele = absent, TLR3 genotype = TLR3AA, sex = female, age class = fledgling, field period = Major 2017. Significant terms are in bold and underlined. *** P < 0.001, ** P < 0.01, * P < 0.05. Figure S1. (A) Sample completeness and (B) rarefaction curves in Seychelles warbler faecal samples. Each line represents a single faecal sample (281 faecal samples, collected from 224 Seychelles warblers). Curves were generated using the R package iNEXT 2.0.20, with 50 bootstrap replicates per sample. The dashed line represents the number of reads used as a cut-off for retaining samples in downstream analysis (all samples with fewer than 10,000 reads were removed). Figure S2. Prevalence and total abundance of all ASV’s separated by phylum. Each phylum is shown in a separate plot, and a different colour. Dashed lines represent the values used as cut-offs for filtering rare taxa before alpha and beta diversity analyses (minimum abundance = 50), and additional filtering for beta diversity (prevalence threshold = 2.5%). Figure S3. Individual repeatability of alpha and beta diversity measures in the Seychelles warbler. This was tested by sequecnoing multiple samples taken from the same individuals; these samples were collected during the same field season (n = 115 faecal samples from 51 individuals. Pairwise Euclidean distances were calculated between samples taken from different individuals, versus those from within the same individual, in the same season for A). Shannon dissimilarity B) unweighted UniFrac dissimilarity and C) weighted UniFrac dissimilarity. Boxes span the interquartile (25% - 75%) range. Whiskers extend to 1.5 times the interquartile range. The median is marked by a horizontal line and the mean is marked by a diamond. Dark blue points in A) indicate pairwise comparisons involving two outliers. Significant differences are shown, and P-values are derived from Kruskal–Wallis tests: *** P < 0.001, * P < 0.05. Figure S4. The repeatability of sequencing methods. This was tested by sequencing 37 faecal samples taken from individual Seychelles warblers twice. A) Relative abundance (%) of the 10 most abundant taxa at the phylum level for the 37 duplicated samples. Each column represents one sample, black lines separate duplicated samples. All other taxa within each sample are collapsed into the low abundance category. B) The pairwise Euclidean dissimilarity between different samples, versus within pairs of duplicated samples (same DNA sequenced twice) for i. Shannon dissimilarity ii. unweighted UniFrac dissimilarity and iii. weighted UniFrac dissimilarity. Boxes span the interquartile (25% - 75%) range. Whiskers extend to 1.5 times the interquartile range. The median is marked by a horizontal line and the mean is marked by a diamond. Significant differences are shown, and P-values were derived from Kruskal–Wallis tests: *** P < 0.001. Figure S5. Beta diversity of Seychelles warbler gut microbiome composition in different age classes. The principal coordinate plots are based on A) unweighted UniFrac distances, and B) weighted UniFrac distances. Points represent a single faecal sample from a different individual (n = 195). Sample sizes are specified in brackets in the legend, and colours indicate the age class which was either fledgling (yellow), old-fledgling (green), sub-adult (indigo) and adult (purple). Ellipses represent a 95% confidence interval around the cluster centroids. Figure S6. Differentially abundant ASV’s in the gut microbiome of Seychelles warblers between different age categories (FL = fledgling, OFL = old fledgling, SA = sub-adult, A = adult). ASVs are grouped at the level of bacterial order and coloured according to bacterial phylum. Differential ASV abundance was assessed using negative binomial Wald tests and P values were adjusted using the Benjamini and Hochberg false-discovery rate correction with a significance cut-off of P < 0.01. ASVs shown with a log2 fold change greater than zero are significantly more abundant in the age classes on the left and ASVs with a log2 fold change smaller than zero are significantly more abundant in age classes on the right. Figure S7. Differentially abundant ASV’s in the gut microbiome of Seychelles warblers, between seasons. Comparisons are A) Major 2017 vs Minor 2018, B) Major 2018 vs Minor 2017, or C) Major 2017 vs Major 2018. ASVs are grouped at the level of bacterial order and coloured according to bacterial phylum. Differential ASV abundance was assessed using negative binomial Wald tests and P values were adjusted using the Benjamini and Hochberg false-discovery rate correction with a significance cut-off of P < 0.01. ASVs shown with a log2 fold change greater than zero are significantly more abundant in seasons on the left and ASVs with a log2 fold change smaller than zero are significantly more abundant in seasons on the right.
Additional file 2.
The identity of 19 amplicon sequencing variants (ASVs) identified in the negative extraction controls. A taxonomic breakdown and the number of reads associated with each ASV is provided. ASVs were either filtered from the dataset before further analysis or retained. In the second tab is a detailed breakdown of number of reads with each ASV present in each sample, along with whether it is a negative control or faecal sample, and whether it was sequenced in the 1st, 2nd, or 3rd sequencing run.
Additional file 3.
Differentially abundant ASVs (Padj < 0.01) in the gut microbiomes of Seychelles warblers, according to the presence/absence of the MHC-I alleles A) Ase-ua7 B) Ase-ua11 or C) Ase-ua1/10. Differential ASV abundance was assessed using negative binomial Wald tests and P values were adjusted using the Benjamini and Hochberg false-discovery rate correction with a significance cut-off of P < 0.01. ASVs shown with a log2-fold change greater than zero are significantly more abundant in individuals without this allele and ASVs with a log2 fold change smaller than zero are significantly more abundant in individuals with a copy of this allele.
Rights and permissions
Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article's Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article's Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by/4.0/. The Creative Commons Public Domain Dedication waiver (http://creativecommons.org/publicdomain/zero/1.0/) applies to the data made available in this article, unless otherwise stated in a credit line to the data.
About this article
Cite this article
Davies, C.S., Worsley, S.F., Maher, K.H. et al. Immunogenetic variation shapes the gut microbiome in a natural vertebrate population. Microbiome 10, 41 (2022). https://doi.org/10.1186/s40168-022-01233-y
- Major Histocompatibility Complex
- Gut microbiome
- Microbial diversity
- Acrocephalus sechellensis
- Genetic variation
- Life history