Fischer 344 male rats were purchased at 5 weeks from Charles River (Saint-Germain-Nuelles, France). The rats were housed individually in metabolic cages, allowed free access to diet and tap water, in a room kept at a temperature of 22 °C on a 12-h light-dark cycle. After acclimatisation, the rats were randomly allocated into dietary groups; the rats were fed control (CON), calcium carbonate (Ca), haemin (HEM), or haemin+calcium carbonate (HEM-Ca) diets for 14 or 21 days. Body weight was monitored at the beginning, at the middle, and at the end of experimental periods, and food and water intakes were recorded at the end of experiments.
All diets were based on a modified low-calcium AIN76 diet in powdered form (UPAE, INRA, France), balanced in iron (haem in haem-enriched diets vs. ferric citrate in control diets), proteins (20% using casein), and lipids (5% using safflower oil). Haem groups (HEM and HEM-Ca) received 1.5 μmol/g diet of haemin (Sigma Chemical), and control groups (CON and Ca) received 0.036% of ferric citrate to balance the iron level. The calcium level is critical for haem-induced promotion of carcinogenesis; calcium was thus excluded from the mineral mix, but dibasic calcium phosphate was included in all diets at a low concentration of 3.4 g/kg. Calcium groups (Ca and HEM-Ca) received 2% of calcium carbonate (Sigma Chemical) in order to chelate haem in the intestinal lumen.
Faecal and urinary biomarkers
Twenty-four-hour faecal pellets and urine samples were collected under each metabolic cage to assess faecal haem, faecal thiobarbituric acid reactive substances (TBARS), and urinary 1,4-dihydroxynonane mercapturic acid (DHN-MA) as described previously [7, 16]. Faecal waters were prepared from 0.5 g of fresh faeces homogenised in 1 ml of distilled water and 50 μl of 0.45 M butylated hydroxytoluene (Sigma), using a FastPrep® (MP Biomedicals, Illkirch, France) for 3 cycles of 30 s at 6 m s−1. After centrifugation at 5,500g for 20 min, faecal water (supernatant) was collected and kept at −20 °C until use.
In vivo assessment of colon mucosa inflammation, permeability, and genotoxicity
Colonic mucosa inflammation was evaluated, as described previously, by myeloperoxidase (MPO) activity and confirmed by measuring the cytokines IL-1β and IL-10 using commercial kits according to the manufacturers’ protocols.
Colonic paracellular permeability was evaluated using 51-chromium-labelled ethylenediamine tetra-acetic acid (51Cr-EDTA; Perkin Elmer Life Science, Paris, France). After 14 days of experimental diets, rats received an administration of 51Cr-EDTA (25.9 kBq) diluted in 0.5 ml of saline by oral gavage. Rats were then placed in metabolic cages, and radioactivity in urine was measured with a gamma counter (Cobra II; Packard, Meriden, CT, USA) after 24 h. Permeability to 51Cr-EDTA was expressed as the percentage of total radioactivity administered.
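The expression of permeability as a percentage of the administered dose can be written out as a minimal sketch. The function name and the example urinary activity are hypothetical; only the administered dose of 25.9 kBq comes from the protocol above, and no decay or background correction is assumed.

```python
def percent_dose_recovered(urine_activity_kbq, administered_kbq=25.9):
    """Urinary 51Cr-EDTA recovery as a percentage of the gavaged dose.

    Assumes both activities are expressed in the same unit (kBq here)
    and have already been background-corrected by the caller.
    """
    if administered_kbq <= 0:
        raise ValueError("administered activity must be positive")
    return 100.0 * urine_activity_kbq / administered_kbq


# Hypothetical example: 0.518 kBq recovered in 24-h urine
print(round(percent_dose_recovered(0.518), 2))
```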
Colon mucosa genotoxicity was evaluated by alkaline comet assay. Cells were collected by scraping the mucosa and stored in NaCl 0.075 M/EDTA 0.024 M buffer at pH 7.5 before slow freezing at −80 °C. After counting, cells were embedded in 0.7% Low Melting Point Agarose (Sigma) and laid on CometAssay® HT slides (Trevigen) in triplicate. Slides were immersed overnight in a lysis solution (NaCl 2.5 M/EDTA 0.1 M/Tris 10 mM pH 10/DMSO 10%/Triton 1%). Then, after 40 min of unwinding in electrophoresis buffer (EDTA 1 mM/NaOH 0.3 M, pH 13), the slides were transferred into an electrophoresis tank at 28 V (resulting in 0.8 V/cm on the platform) for 24 min in buffer (EDTA 1 mM/NaOH 0.3 M). Finally, slides were immersed in a PBS solution for neutralisation and cells were fixed using cold absolute ethanol. For DNA staining, 2 μg/ml of ethidium bromide was added to each sample. Fifty cells per slide and 2 slides per sample were analysed using a Nikon 50i fluorescence microscope equipped with a camera and the Komet 6.0 software. The extent of DNA damage was evaluated for each cell as the intensity of all tail pixels divided by the total intensity of all pixels in the head and tail of the comet. The median of these 100 values was calculated and reported as % tail DNA. The experiment was done in triplicate.
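The % tail DNA computation described above can be illustrated with a short sketch. It is not the Komet software's implementation; the function names are hypothetical, and the per-cell head and tail intensities are assumed to have been measured beforehand.

```python
from statistics import median


def percent_tail_dna(tail_intensity, head_intensity):
    """Share of total comet fluorescence located in the tail, in percent."""
    total = tail_intensity + head_intensity
    if total <= 0:
        raise ValueError("comet has no measurable intensity")
    return 100.0 * tail_intensity / total


def sample_tail_dna(cells):
    """Median % tail DNA over all scored cells of one sample.

    `cells` is an iterable of (tail_intensity, head_intensity) pairs,
    e.g. 50 cells from each of 2 slides (100 values in total).
    """
    return median(percent_tail_dna(tail, head) for tail, head in cells)
```

The median, rather than the mean, is used here because it is what the protocol reports per sample.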
Expression of genes involved in aldehyde detoxification, inflammation, and paracellular permeability
Total RNA was extracted from tissue using the RNeasy Plus Mini kit (Qiagen, France) according to the manufacturer’s instructions. RNA samples were reverse transcribed using the iScript cDNA Synthesis kit (Biorad, France). Amplifications were carried out using a ViiA7 Real-Time PCR System (Applied Biosystems, Foster City, CA, USA). The 384-well plates were prepared by an Agilent Bravo Automated Liquid Handling Platform (Agilent Technologies, Santa Clara, CA, USA). Each well contained a final volume of 5 μl: 2.5 μl of iQ SYBR Green Supermix (Biorad, France) used as a fluorescent dye, 1.5 μl of each primer set, and 1 μl of cDNA material. Thermal cycling conditions were as follows: 3 min denaturation at 95 °C followed by 40 cycles of 15 s at 95 °C, 15 s at 60 °C, and 15 s at 72 °C, and a melting curve. Data were collected using the QuantStudio Real-Time PCR Software v1.1 (Applied Biosystems). Results were normalised with the housekeeping genes TATA-box binding protein (TBP) and 18S and expressed as absolute abundance/gene copy number (delta Ct method). Sequences of primers designed for rat cells are listed in Additional file 1: Table S1a.
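As an illustration of the delta Ct normalisation, a minimal sketch follows. It assumes 100% amplification efficiency (a doubling per cycle) and averages the Ct values of the reference genes arithmetically; both choices are assumptions of this sketch and are not stated in the protocol. The function name is hypothetical.

```python
from statistics import mean


def delta_ct_abundance(ct_target, ct_references):
    """Abundance of a target gene relative to reference genes.

    ct_target      -- Ct of the gene of interest
    ct_references  -- Ct values of the housekeeping genes (e.g. TBP, 18S)
    Returns 2 ** -(Ct_target - mean(Ct_references)).
    """
    if not ct_references:
        raise ValueError("at least one reference gene is required")
    delta_ct = ct_target - mean(ct_references)
    return 2.0 ** (-delta_ct)
```

A target that crosses the threshold two cycles later than the averaged references is thus reported at one quarter of the reference abundance.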
Assessment of aldehyde detoxification, inflammation, permeability, genotoxicity, and ROS formation after faecal water treatment of murine colon epithelial cells
Apc+/+ colon epithelial cells (derived from C57BL/6 J mice) express the heat-labile SV40 large T antigen (AgT tsA58) under the control of an IFNγ-inducible promoter. The Apc+/+ cell line expressed cytokeratin 18, a marker of its epithelial phenotype. The culture conditions affected cell proliferation because of the thermolabile tsA58 T antigen, which confers conditional immortalisation: at 33 °C with IFNγ, the large T antigen is active and drives cellular proliferation, whereas at 37 °C, the temperature-sensitive mutation yields an inactive protein and cells behave like non-proliferating epithelial cells. All studies were conducted at the non-permissive temperature (37 °C).
Expression of genes involved in aldehyde detoxification, inflammation, and permeability
Normal murine epithelial colonic cells (Apc+/+) were treated with filtered faecal waters diluted at 1/160 from rats fed with CON, Ca, HEM, or HEM-Ca diets to assess the expression of genes involved in aldehyde detoxification, cellular inflammation, and paracellular permeability by RT-qPCR. Real-time quantitative PCR (qPCR) was performed as described in the section above with minor changes in the amplification conditions: a first hold stage at 95 °C for 10 min followed by 40 cycles (95 °C for 15 s and 60 °C for 30 s) and a final step (95 °C for 15 s, 60 °C for 1 min, and 95 °C for 15 s) for melt curve analysis. Primer sequences designed for mouse cells are listed in Additional file 1: Table S1b.
Cellular permeability (TEER)
Cells were seeded onto Transwell inserts (Greiner Bio-One®, 3.0 μm pore polyethylene terephthalate membrane insert, 0.6 × 10⁶ pores/cm²) at a density of 260,000 cells/insert in Dulbecco's modified Eagle's medium (DMEM) supplemented with 10% (v/v) foetal calf serum, 1% (v/v) penicillin/streptomycin, and 10 U/ml interferon-γ. After seeding, the inserts were transferred into a cellZscope module (NanoAnalytics, Münster, Germany) at 5% CO2 and at the permissive temperature of 33 °C. After 24 h, the module was transferred to 37 °C (5% CO2) without interferon-γ. When TEER had stabilised, the medium was replaced with medium without foetal calf serum. Six hours later, the cells were treated for 24 h with filtered faecal waters diluted at 1/160 from rats fed with CON, Ca, HEM, or HEM-Ca diets. Faecal waters from rats in the CON and HEM groups were additionally incubated or not with polymer resin to trap aldehydes, as previously described. Five replicates were performed per condition, and the experiment was performed three times. TEER values were recorded every 40 min, and the results were normalised to the value before treatment.
To verify potential differences in cell viability after treatment, cells seeded on inserts were fixed in 4% paraformaldehyde and stained using the fluorescent dye Hoechst 33342 (Life Technologies, 0.5 ng/ml in PBS). Apoptotic (fragmented and/or condensed) and live nuclei were counted using a fluorescence microscope (Evos FL Digital Inverted Microscope, AMG) and expressed as a percentage of the total population (n > 500 nuclei).
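The normalisation of the TEER time course to the pre-treatment value can be sketched as follows. The function name, the use of a simple ratio (rather than a percentage), and the example readings are assumptions of this sketch; only the 40-min recording interval comes from the protocol.

```python
def normalise_teer(readings, baseline, interval_min=40):
    """Express post-treatment TEER readings relative to the baseline.

    readings     -- TEER values recorded after treatment, in order
    baseline     -- TEER value recorded just before treatment
    interval_min -- time between successive recordings, in minutes
    Returns a list of (minutes_after_treatment, relative_TEER) pairs,
    the first reading being taken one interval after treatment.
    """
    if baseline <= 0:
        raise ValueError("baseline TEER must be positive")
    return [
        ((i + 1) * interval_min, value / baseline)
        for i, value in enumerate(readings)
    ]


# Hypothetical insert: baseline 100 ohm.cm2, then three recordings
for minutes, relative in normalise_teer([100.0, 80.0, 50.0], 100.0):
    print(minutes, relative)
```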
Cells were seeded into 24-well plates at 260,000 cells/well in the same medium described above at the permissive temperature of 33 °C. After 24 h, cells were transferred to 37 °C without interferon-γ for 24 h and then treated with native filtered faecal waters diluted at 1/320 from rats fed with CON, Ca, HEM, or HEM-Ca diets and with faecal water from CON and HEM groups additionally incubated with polymer resin to trap aldehydes. After 24 h, the cells were trypsinized and suspended in culture medium with serum and DNA damage was assessed by alkaline comet assay as described in the section above.
Reactive oxygen species formation
Intracellular H2O2 was measured by flow cytometry using H2-DCF-DA (dichlorodihydro-fluorescein diacetate) (Life Technologies). After treatment with filtered native or aldehyde-depleted faecal waters diluted 1/100 for 1 h, the cells were gently trypsinized, stained for 20 min at 37 °C with H2-DCF-DA (10 μM) in HEPES-buffered solution, and next analysed using a MACSQuant Analyzer (Miltenyi Biotec). Menadione (100 μM) (Sigma-Aldrich) was used as a positive control. Each measurement was conducted on 40,000 events in the B1 channel (525 ± 25 nm) and analysed with VenturiOne software.
Faecal extracts preparation
Faecal extracts for NMR spectroscopy and microbiota composition analysis were prepared by homogenising 1 g of frozen faecal pellets three times using a FastPrep® (MP Biomedicals, Illkirch, France) at 6 m s−1 for 30 s. Sixty milligrams of this homogenate were suspended in 1.2 ml of phosphate buffer (0.2 M, pH 7.4) containing 90% D2O, 1% (w/v) of sodium 3-(trimethylsilyl)propionate (TSP), and 0.3 mM NaN3. After vortex mixing, samples were centrifuged at 10,000g for 10 min at 4 °C. The supernatants were collected and transferred into an NMR tube (outer diameter, 5 mm) pending NMR analysis.
All 1H-NMR spectra from faecal extracts were obtained on a Bruker Ascend 800 Avance III NMR spectrometer (Bruker, France) on the LISBP metabolomics platform (MetaToul) operating at 800.13 MHz for the 1H resonance frequency using an inverse detection 5-mm 1H-13C-31P-15N cryoprobe (CQPCI) attached to a cryoplatform (preamplifier cooling unit). The 1H-NMR spectra were acquired at 298 K using the Carr-Purcell-Meiboom-Gill (CPMG) spin-echo pulse sequence with pre-saturation and a total spin-echo delay (2nτ) of 100 ms. A total of 32 transients were collected into 64,000 data points using a spectral width of 15 ppm, a relaxation delay of 5 s, and an acquisition time of 2.72 s.
Data were analysed by applying an exponential window function with a 0.3-Hz line broadening prior to Fourier transformation. The resultant spectra were phased, baseline corrected, and calibrated to TSP (δ 0.00) manually using Mnova NMR (v9.0, Mestrelab Research). The spectra were subsequently imported into MATLAB (R2014a, MathWorks, Inc.). The region containing the water resonance (δ 4.6–5.2 ppm) was removed, and the spectra were normalised to the probabilistic quotient and aligned using a previously published function. All data were analysed using full-resolution spectra. Data were mean-centred and scaled using unit variance scaling prior to analysis using orthogonal projection on latent structure-discriminant analysis (O-PLS-DA). A first PLS-DA model was built using the mixOmics R package (version 6.1.1) [21,22,23] using all four treatment groups. Pairwise O-PLS-DA models were then constructed to compare the groups two by two. 1H-NMR data were used as independent variables (X matrix) and regressed against a dummy matrix (Y matrix) indicating the class of samples (CON vs HEM, CON vs Ca, HEM vs HEM-Ca). PLS-derived models were evaluated for goodness of prediction (Q2Y value) using eightfold cross-validation. Parameters of the final models are indicated in the figure legends. To identify metabolites responsible for discrimination between the groups, the O-PLS-DA correlation coefficients (r2) were calculated for each variable and back-scaled into a spectral domain, so that the shape of NMR spectra and the sign of the coefficients were preserved. The weights of the variables were colour-coded according to the square of the O-PLS-DA correlation coefficients. Correlation coefficients extracted from significant models were filtered so that only correlations above the threshold defined by Pearson’s critical correlation coefficient (P < 0.05; |r2| > 0.49) were considered significant.
For illustration purposes, the area under the curve of several signals of interest was integrated and statistical significance was tested using t test.
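Probabilistic quotient normalisation, mentioned above, can be illustrated with a minimal sketch. It is not the MATLAB function used in the study: the median spectrum is taken as the reference, the preliminary total-area normalisation that often precedes this step is omitted, and spectra are assumed to be non-negative lists of equal length.

```python
from statistics import median


def pqn_normalise(spectra):
    """Probabilistic quotient normalisation of a list of spectra.

    Each spectrum is divided by the median of its point-wise quotients
    against a reference spectrum (here, the median across samples).
    """
    n_points = len(spectra[0])
    if any(len(s) != n_points for s in spectra):
        raise ValueError("all spectra must have the same length")

    # Reference spectrum: median intensity at each spectral point
    reference = [median(s[j] for s in spectra) for j in range(n_points)]

    normalised = []
    for spectrum in spectra:
        # Quotients are only defined where the reference is positive
        quotients = [
            spectrum[j] / reference[j]
            for j in range(n_points)
            if reference[j] > 0
        ]
        dilution_factor = median(quotients)
        normalised.append([x / dilution_factor for x in spectrum])
    return normalised
```

Three spectra that differ only by a dilution factor collapse onto the same profile after this step, which is the behaviour the normalisation is intended to provide.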
SCFA assay in faeces
All the organic acids were extracted by vigorous homogenisation with ultrapure water followed by centrifugation (14,000g, 15 min at 4 °C). The SCFAs in the supernatants were then derivatised by esterification and analysed with a gas chromatograph equipped with a capillary column (30 m, 0.32 mm ID; Restek Rtx 502.2) and fitted with a flame ionisation detector, using a modification of the method of Kristensen et al. The amounts of SCFA were determined by external standards with reference to internal standards.
Microbial community analysis
Genomic DNA was obtained from faecal extracts using the ZR Faecal DNA Miniprep™ kit (Zymo Research), and DNA quantity was determined using a TECAN Fluorometer (Qubit® dsDNA HS Assay Kit, Molecular Probes).
16S rRNA gene amplification and amplicon sequencing
The V3-V4 hypervariable region of the 16S rRNA gene was amplified by PCR. The forward PCR primer 5′CTT TCC CTA CAC GAC GCT CTT CCG ATC TAC GGR AGG CAG CAG3′ was a 42-nucleotide fusion primer consisting of the 28-nt Illumina adapter (the 5′ portion of the sequence) and the 14-nt broad-range bacterial primer 343F. The reverse PCR primer 5′GGA GTT CAG ACG TGT GCT CTT CCG ATC TTA CCA GGG TAT CTA ATC CT3′ was a 47-nucleotide fusion primer consisting of the 28-nt Illumina adapter (the 5′ portion of the sequence) and the 19-nt broad-range bacterial primer 784R.
The PCR mix contained MTP Taq DNA polymerase (SIGMA, 0.05 U/μl), 200 μM of each dNTP (SIGMA, premix), and 0.5 μM of each primer. After initial denaturation at 94 °C for 60 s in a CFX-96 Thermal Cycler (Bio-Rad), 30 cycles were run with 60 s denaturation at 94 °C, 60 s annealing at 65 °C, and 60 s at 72 °C, followed by a final 10-min extension at 72 °C. Amplification quality (length, quantity, and specificity) was verified using the Agilent 2200 TapeStation System (High Sensitivity D1000 ScreenTape assay) and AATI Fragment Analyser at the GeT (Genomic and Transcriptomic, TRIX, and PlaGe) platforms in Toulouse. Because MiSeq enables paired 250-bp reads, the ends of each read pair overlap and can be stitched together to generate extremely high-quality, full-length reads of the entire V3 and V4 region in a single run. Single multiplexing was performed using a home-made 6-bp index, which was added to the reverse primer (784R) during a second PCR with 12 cycles using forward primer (AATGATACGGCGACCACCGAGATCTACACTCTTTCCCTACACGAC) and reverse primer (CAAGCAGAAGACGGCATACGAGAT-index-GTGACTGGAGTTCAGACGTGT). The resulting PCR products were purified and loaded onto the Illumina MiSeq cartridge according to the manufacturer's instructions. The quality of the run was checked internally using PhiX, and each paired-end sequence was then assigned to its sample using the previously integrated index. Paired-end sequences were assembled using FLASH software, requiring at least a 10-bp overlap between the forward and reverse sequences and allowing 10% mismatches. The absence of contamination was checked with a negative control during the PCR (water as template). The quality of the stitching procedure was controlled using four bacterial samples that are run routinely in the sequencing facility in parallel with the current samples.
16S rRNA gene analysis
High-quality filtered reads (2,502,588 reads) were further processed using the FROGS pipeline (Find Rapidly OTU with Galaxy Solution) to obtain OTUs and their respective taxonomic assignments on a Galaxy instance (https://galaxy-workbench.toulouse.inra.fr). The initial FROGS pre-processing step selected overlapped reads of the expected length and without N, yielding 1,886,283 pass-filter reads (an average of 60,000 reads per sample). The Swarm clustering method was applied with a first run for denoising with a distance of 1 and then a second run for clustering with a maximal aggregation distance of 3 on the seeds of the first Swarm run, yielding 267,558 clusters (an average of 10,000 per sample). Putative chimaeras were removed using VSEARCH combined with cross-validation (GitHub repository, doi 10.5281/zenodo.15524), yielding 189,803 clusters (an average of 6400 per sample). Clusters were filtered at a minimum abundance of 0.005% and/or had to be present in at least three samples, yielding 332 final clusters (an average of 212 clusters per sample) corresponding to 1,425,084 final valid reads (an average of 44,534 valid reads per sample). All clusters were affiliated to OTUs using the SILVA 123 16S reference database and a taxonomic multi-affiliation procedure (BLAST+ with equal multi-hits). Since rarefaction has been shown to result in high rates of false-positive tests for differential abundance, counts were not rarefied. Richness and diversity indexes of the bacterial community, as well as clustering and ordinations, were computed using the Phyloseq package (v 1.19.1) in RStudio software [23, 33]. Within-sample (alpha) diversity was assessed by observed diversity (i.e. the number of unique OTUs per sample) and the Simpson index, an abundance-based index. Divergence in community composition between samples was quantitatively assessed by calculating weighted UniFrac (abundance and phylogenetic relation) distance matrices.
Unconstrained ordination was visualised using multidimensional scaling (MDS) and hierarchical clustering (complete linkage combined with wUniFrac distance) and compared using Adonis test (9999 permutations).
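The two alpha-diversity measures named above can be illustrated with a minimal sketch operating on the OTU counts of a single sample. The Simpson index is computed here in its 1 − Σp² form; this is an assumption about the convention used, and the function names are hypothetical rather than Phyloseq functions.

```python
def observed_richness(counts):
    """Number of OTUs with at least one read in the sample."""
    return sum(1 for c in counts if c > 0)


def simpson_index(counts):
    """Simpson diversity, 1 - sum(p_i ** 2), from raw OTU counts.

    Ranges from 0 (a single OTU holds all reads) towards 1
    (many evenly abundant OTUs).
    """
    total = sum(counts)
    if total <= 0:
        raise ValueError("sample contains no reads")
    return 1.0 - sum((c / total) ** 2 for c in counts)


# Hypothetical sample with four OTUs, one of them absent
sample = [10, 0, 5, 5]
print(observed_richness(sample), simpson_index(sample))
```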
In order to evaluate differential abundance in response to diet and to identify important taxa modulated by haem and associated with lipoperoxidation status, OTUs were agglomerated at the species rank, reducing the taxon list to 122. Differentially abundant taxa were identified by characterising the difference between two different diets (multivariate analysis, Kruskal-Wallis non-parametric pairwise comparisons) using the LEfSe algorithm with an alpha value of 0.01 and a threshold of 3 on the logarithmic LDA score for discriminative features. Univariate differential abundance of taxa was also tested using a negative binomial noise model for overdispersion as implemented in the R package DESeq2 (v1.14.1, [32, 35]). In order to identify taxa altered by haem, the LRT method was first applied to select significantly affected taxa across the 4 diets (86 taxa) before applying a pairwise post-test comparison (contrast/Wald test) to sort taxa altered by haem as compared to the control diet (39 taxa). In parallel, a 2 × 2 factor design combined with a Wald test was applied in order to identify taxa for which the haem effect changed with calcium exposure (interaction term). Of the 33 final taxa corresponding to the interaction term, 12 taxa were selected because the addition of calcium to the haem-enriched diet restored the level observed in the control diet. Taxa were considered significantly differentially abundant between diets if their adjusted P value was below 0.01 and if the estimated change was |log2FC| > 1.5. Tests were corrected for multiple inferences using the Benjamini-Hochberg method to control the false discovery rate. The sequences used for analysis can be found in the MG-RAST database under the project name “Haem_calcium”, with the following accession number: mgp89255.
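The final selection step (Benjamini-Hochberg adjustment followed by the two thresholds) can be sketched as below. This is an illustration only and not the DESeq2 implementation, which additionally applies independent filtering; the function names and the example taxa are hypothetical.

```python
def benjamini_hochberg(pvalues):
    """Benjamini-Hochberg adjusted P values, in the input order."""
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest to the smallest P value so that the
    # adjusted values stay monotone with respect to rank
    for rank in range(m, 0, -1):
        index = order[rank - 1]
        running_min = min(running_min, pvalues[index] * m / rank)
        adjusted[index] = running_min
    return adjusted


def significant_taxa(results, alpha=0.01, min_abs_log2fc=1.5):
    """Taxa with adjusted P < alpha and |log2FC| > min_abs_log2fc.

    `results` is a list of (taxon, raw_pvalue, log2_fold_change).
    """
    padj = benjamini_hochberg([p for _, p, _ in results])
    return [
        taxon
        for (taxon, _, log2fc), q in zip(results, padj)
        if q < alpha and abs(log2fc) > min_abs_log2fc
    ]


# Hypothetical test results for three taxa
example = [("taxon_A", 0.0001, 2.0), ("taxon_B", 0.0002, 0.5), ("taxon_C", 0.5, 3.0)]
print(significant_taxa(example))
```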
Integrative analysis of two or three datasets using rCCA or extended sGCCA respectively
Regularised canonical correlation analysis (rCCA) or extended sparse generalised canonical correlation analysis (sGCCA, named DIABLO) was performed using the R package mixOmics (v 6.1.1) in order to improve the representation of the links between bacterial taxa, metabolite signatures, physiological traits (metadata), and most importantly, lipoperoxidation status. Both of these supervised multiblock approaches (two or three datasets), for which generic recommended frameworks were applied [36, 37], are able to maximise common or correlated information in a single exploratory analysis. Prior to integrative analysis with other datasets, the microbiota dataset was processed according to the multivariate statistical mixMC framework, which included data processing using Total Sum Scaling (TSS) normalisation and log-ratio transformation.
Results are expressed as mean ± SEM unless otherwise stated. For in vivo experiments, N refers to the number of animals per group used in each experiment. The significance of differences between experimental groups was determined by ANOVA with Holm-Sidak multiple comparison post-test, or by the Kruskal-Wallis non-parametric ANOVA with Dunn’s multiple comparison post-test, as appropriate (Prism 6, GraphPad Software). Two-sided analyses were used throughout, and P < 0.05 was considered significant.
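The TSS normalisation and log-ratio transformation applied to the microbiota counts can be illustrated with a minimal sketch. The centred log-ratio is used here, and a pseudocount of 1 is added to handle zero counts; both are assumptions of this sketch and are not specified in the text, and the functions are not those of the mixOmics package.

```python
import math


def total_sum_scaling(counts, pseudocount=1.0):
    """Convert raw OTU counts of one sample into proportions.

    A pseudocount is added first so that zero counts remain
    defined after the subsequent log transformation.
    """
    shifted = [c + pseudocount for c in counts]
    total = sum(shifted)
    return [x / total for x in shifted]


def centred_log_ratio(proportions):
    """Centred log-ratio: log of each part minus the mean log."""
    logs = [math.log(p) for p in proportions]
    mean_log = sum(logs) / len(logs)
    return [value - mean_log for value in logs]


# Hypothetical sample with three taxa, one of them unobserved
print(centred_log_ratio(total_sum_scaling([3, 0, 6])))
```

By construction, the transformed values of a sample sum to zero, which removes the unit-sum constraint of compositional data before the integrative analysis.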