Farms selection and sample collection
The final number of farms included in the study was 38, distributed all over Spain (see Additional file 12: Figure S7). These farms were divided into intensive (19 herds) and extensive (19 herds) farms based on their production system. Farm and sampling characteristics, including sampling season, and antimicrobial consumption of each farm are included in an additional table (see Additional file 13). Sampling was carried out between 2017 and 2018 in feedlots with pigs between 6 and 8 months old. No antimicrobial treatment was administered in the immediate month prior to the sampling.
On each farm, feces, environmental swabs, and slurry were collected. Five fresh fecal samples were obtained from the rectum of fattening pigs. Five samples were swabbed in the environment of the fattening unit (feeders, drinkers, floors, walls, and windows) using swabs soaked in phosphate buffer saline (PBS) 1×. On those farms with slurry pits, three samples were collected from different points of the pit. Nine extensive farms did not provide slurry samples due to the lack of slurry pits in their facilities. The samplings were carried out by trained veterinarians and the samples were sent to our laboratory under cooling conditions (2–8 °C) in less than 24 h.
Antimicrobial use (AMU)
The veterinary practitioner responsible for each farm recorded the antimicrobial consumption of the pigs of the fattening unit sampled over the immediate 4-month period prior to sampling. AMU was categorized into 10 classes: (i) total, (ii) aminoglycosides, (iii) beta-lactams, (iv) diaminopyrimidines, (v) MLP, (vi) phenicols, (vii) polymyxins, (viii) quinolones, (ix) sulfonamides, and (x) tetracyclines. For each antimicrobial class, consumption per farm was expressed in annual mg/PCU, following the European Surveillance of Veterinary Antimicrobial Consumption (ESVAC) protocol [52].
Sample processing, DNA extraction, library preparation, and sequencing
From each farm, a single DNA sample was obtained from fecal, environmental and, if available, slurry samples. A total of 105 DNA samples were sequenced from these 38 independent herds, including three DNA extraction negative controls. The samples were divided into environment (38), feces (38), and slurry (29). Prior to DNA extraction, fecal samples from each farm were pooled by stirring thoroughly with a sterile tongue depressor using 3 g per individual sample, obtaining a final composite sample of 15 g. After its homogenization, 2 g were soaked in 18 ml of PBS 1× and vigorously mixed for 5 min using a Stomacher Laboratory Blender (Seward, Worthing, UK). For the environmental and slurry samples, 2 ml were recovered from each individual sample and added to a 15-ml sterile Falcon tube (BD, Erembodegem, Belgium), obtaining a final volume of 10 ml and 6 ml, respectively. These tubes were centrifuged at 4500×g for 10 min at 4 °C, discharging the supernatant. Sample handling was performed on ice.
DNA was extracted using the Stool DNA Purification Kit (EURX, Gdańsk, Poland) following the manufacturer’s instructions with minor modifications. As starting material for DNA extractions, 500 μl were used for the three different samples. The final DNA was eluted in 200 μl of 10 mM Tris HCl buffer (pH 8) after its incubation for 5 min for maximum elution efficiency and stored at – 80 °C until its use. Negative controls were prepared with 500 μl of sterile distilled water as starting material and included in DNA extraction batches to confirm that no contamination occurred in the samples during DNA extraction and sequencing.
Prior to sequencing, a Qubit High Sensitivity DNA assay (BioSciences, Dublin, Ireland) was used to determine the total DNA concentration, being its purity assessed by the 260/280 and 260/230 absorbance ratios using a spectrophotometer NanoDrop ND-1000 (Thermo Fisher Scientific, Wilmintong, Delaware). Paired-end sequencing libraries were prepared from the extracted DNA using the Illumina Nextera XTLibrary Preparation Kit (Illumina Inc., San Diego, CA, USA) followed by sequencing on the Illumina NextSeq 500, with a NextSeq 500/550 High Output Reagent kit v2 (300 cycles), in accordance with the standard Illumina sequencing protocols.
Reads quality filtering
Pre-processing of raw reads by sequence quality and length was performed with PRINSEQ-Lite v0.20.4 [53]. A mean quality lower than Q25 in a 10 base pair sliding window was the criteria utilized for trimming low quality reads at the 3′-end. Moreover, a minimum length of 150 base pairs was ensured for all reads. The Illumina sequences clean were screened against the pig reference genome (Sus scrofa UCSC) downloaded from Illumina iGenomes (https://support.illumina.com/sequencing/sequencing_software/igenome.html, 2019) to remove host reads using BMTagger v3.101 (ftp://ftp.ncbi.nlm.nih.gov/pub/agarwala/bmtagger/, 2011). Read duplicates were removed using the Picard MarkDuplicates tool v2.18.1 (https://broadinstitute.github.io/picard/, 2016) to create fastq files with unique reads only. Afterwards, reads were subjected to a further quality filtering step. In brief, sequences were trimmed for low quality score using a modified version of the script trimBWAstyle.pl that works directly from BAM files (TrimBWAstyle.usingBam.pl, 2010; https://github.com/genome/genome/blob/master/lib/perl/Genome/Site/TGI/Hmp/HmpSraProcess/trimBWAstyle.usingBam.pl). The script was used to trim off bases with a quality value of 3 or lower. This threshold was chosen to delete all the bases with an uncertain quality as defined by Illumina’s EAMMS (End Anchored Max Scoring Segments) filter. Additionally, reads trimmed to less than 200bp were also removed.
Assembly into contigs and taxonomic annotation of reads and contigs
Filtered reads were assembled using IDBA_UD v1.1.3 (kmers 20–120) [54], keeping those contigs with length above 500 bp. Contigs and filtered reads were taxonomically assigned by using Kraken2 software v2.0.8-beta [55] and kraken2-microbial database (2018-09-03) (https://lomanlab.github.io/mockcommunity/mc_databases.html). Only those taxa belonging to the kingdom Bacteria were used for further analyses. The same procedure and database were employed for the taxonomical assignment of both reads and contigs in order to avoid biases caused by the use of different approaches. The relative abundance matrix of filtered reads at family level was used for bacterial microbiome characterization analyses.
ARGs annotation
Reads from each sample were mapped against the ResFinder database (2019-08-28) [56] using Bowtie2 v2.3.4.1 [57, 58]. The “.trimmed_pairs” fastq files generated by Bowtie2 were transformed into a fasta file where forward and reverse reads were concatenated. This new fasta file was employed to perform a BLAST v2.6.0 [59] against the ResFinder database [56] using a 70% identity cut-off and taking 100 hits (max_target_seqs) in order to avoid problems associated to BLAST use in local [60]. Only the first hit per sequence was kept for further analyses.
The document “phenotypes.txt” was downloaded from the ResFinder repository (2019-10-01) (https://bitbucket.org/genomicepidemiology/resfinder_db/src/master/) and manually curated in order to modify the “class” variable, gathering genes that confer resistance to macrolides, lincosamides, streptogramins, and pleuromutilins into the MLSP class, and those that confer resistance to oxazolidinones, as the oxazolidinone class. This last group included cfr genes, which confer resistance to phenicols, lincosamides, oxazolidinones, pleuromutilins and streptogramins A, the optrA gene, which confers resistance to phenicols and oxazolidinones, and the poxtA gene, which confers resistance to phenicols, oxazolidinones, and tetracyclines.
The manually curated version of phenotypes.txt file (see Additional file 14) was used to create two different matrices from BLASTn-firsthit file: (a) gene abundance, (b) antimicrobial resistance class abundance. Abundance matrices were transformed to CPM matrices, defined as a normalization which consists of scaling the counts by the total number of filtered reads, for further analyses.
Taxonomical assignment of ARGs
The “.trimmed_pairs” fastq files generated by mapping the reads on the ResFinder database [56] by Bowtie2 [57, 58] were re-mapped against contigs using the same approach, in order to know which ARG-read belonged to each contig.
Taxonomic assignment for each contig was exported to the contained ARG-read, prior to the final quantification of ARG gene and AMR class per taxonomic group at family level. Abundance matrices were transformed to CPM matrices for further analyses.
AMR mobilome characterization
BLASTn comparison [59] against the ResFinder database [56] was carried out for contigs longer than 1500bp, keeping only those contigs containing ARG for their mobilome analysis. Plasmid location was predicted by PlasFlow v1.1 [61], LGT events were detected by WAAFLE (https://huttenhower.sph.harvard.edu/waafle) and integrons were predicted by Integron_Finder v2 [62], using the assembled contig files as query files.
The coding sequences (CDS) within LGT events and integron regions were extracted by using their coordinates in the contig from WAAFLE and Integron_Finder output files, and bedtools utilities v2.29.0 (https://bedtools.readthedocs.io/en/latest/) to extract this CDS fasta files from contig fasta files. CDS fasta files were used for BLASTn comparison [59] against the ResFinder database [56] to determine which LGT events and integrons contained ARGs. Additionally, complete integron sequences were extracted from contig fasta files and clustered using VSEARCH v2.7.1 [63] with “--cluster_fast” option, to evaluate which integrons were shared between samples.
An in-house ruby script was developed to summarize AMR mobilome outputs in a contig-count per sample matrix, gathering ARGs with chromosomal, plasmid or unknown location, together with LGT events and integron location data. ARGs with unknown location were excluded from further mobilome analyses.
Statistical analyses and figures visualization
Along the study, analyses were carried out initially for the whole sample collection and later among the two production systems (extensive and intensive) and the three different sample types (environment, feces, and slurry), individually and nested. Besides, the analyses were split into the 16 antimicrobial classes provided by the curated ResFinder database: (i) total, (ii) aminoglycosides, (iii) beta-lactams, (iv) diaminopyrimidines, (v) fosfomycin, (vi) glycopeptides, (vii) MLSP, (viii) multidrug, (ix) nitroimidazoles, (x) phenicols, (xi) oxazolidinones, (xii) polymyxins, (xiii) quinolones, (xiv) rifamycins, (xv) sulfonamides, and (xvi) tetracyclines. All analyses were carried out using R v3.6.2 [64].
The within-herd resistome diversity was computed at the gene-level CPM matrix using the R package vegan v2.5.6 [65]. Alpha diversity was estimated by the Inverse Simpson Diversity (1/D), Simpson (1-D), Shannon and ARGs richness indexes. Comparisons in alpha diversity estimates were carried out with the Wilcoxon signed-rank test through the ggpubr package v0.4.0 [66]. Beta diversity was estimated by Bray-Curtis dissimilarities and analyzed by non-metric multidimensional scaling (NMDS) using the “metaMDS” function in vegan. Within-group dispersion was evaluated through the “betadisper” function. Finally, the effect of the type of sample and the production system on sample dissimilarities was determined by permutational multivariate analysis of variance (PERMANOVA) using distance matrices with “adonis2” function (pairwise adonis). These analyses were also computed for bacterial microbiome characterization at the family-level relative abundance matrix.
Associations among the variables under study (production system and sample type) with the CPM matrices for ARGs and AMR classes, as those taxonomically assigned, the contig-counts matrix for AMR mobilome and the relative abundance matrix for microbiome characterization, were performed with the Kruskal Wallis test and the post hoc Wilcoxon signed-rank test. All p values were adjusted by following the Benjamini and Hochberg method [67] and significance was established at p < 0.05.
Procrustes analyses were used to determine the association between the resistome and the bacterial microbiome composition. Bray-Curtis dissimilarities from each matrix were ordinated using NMDS. The symmetric Procrustes correlation coefficients between the resistome and the microbiome ordinations, p values and plots were obtained using the “protest” function in vegan.
As AMU data were strongly skewed, they were log10 transformed. In addition, a pseudocount of 1 was added before the log10 transformation to deal with an excess of zeros in the data [43]. To reveal the association between AMU and AMR, as the link between AMU and AMR mobilome, the pairwise Spearman’s rank correlation was calculated for CPM matrices for AMR classes and the contig-count AMR mobilome matrix, respectively. Correlations were removed if the Spearman correlation coefficient, rs, was lower than 0.4 and the p value > 0.05, adjusting this p value to avoid false positives using the Benjamini and Hochberg method [67]. The correlations were carried out with the R package Hmisc v4.4.0 [68].
The circular Bray-Curtis resistome dendrogram was constructed by exporting the dendrogram in Newick format using the R package ape v5.4 [69] and further annotating it using the Interactive Tree of Life tool (https://itol.embl.de/). Other plots were produced using the ggplot2 package v3.3.2 [70], and further modified using the software Inkscape v0.92.4 (https://inkscape.org/). The level of statistical significance was represented with asterisks: four asterisks (****) indicated a p value less than 0.0001; three asterisks (***) indicated a p value between 0.0001 and 0.001; two asterisks (**) indicated a p value between 0.001 and 0.01; one asterisk (*) indicated a p-value between 0.01 and 0.05; non-significance (ns) indicated a p value higher than 0.05.