Alongside the n = 50 human and n = 8 murine brain samples, we also investigated several negative and positive controls for this proof-of-concept study (see Table 1). Human brain samples included tissues from healthy (healthy control, HC) and PD donors and comprised samples from different brain regions (olfactory bulb, inferior frontal gyrus, and putamen), all of which can be affected at different stages of PD pathology; fresh sterile cortex samples were obtained during resective neurosurgery (sterile cortex, SC). Because this is a very low-biomass sequencing experiment, we applied dedicated filtering and analysis algorithms, which form part of the overall study results.
Quality filtering and removal of off-target amplicons in metabarcoding data
After removing spurious OTUs with standard 16S rRNA gene sequencing approaches (denoising, de novo chimera removal, and crosstalk removal), we obtained 3014 zero-radius OTUs (zOTUs).
A large fraction of zOTUs (53.3% of 3014) could not be taxonomically assigned, even at the bacterial phylum level. For this reason, we implemented a filtering algorithm in the LotuS pipeline [21] that maps zOTUs onto the human and murine reference genomes to determine whether they truly represent bacterial zOTUs. This showed that 1032 (34.2%) of the 3014 zOTUs mapped onto the human or mouse genome and were therefore off-target amplifications of host DNA. Notably, these off-targets were not amplicons of human or murine mitochondrial 16S or chromosomal 18S rDNA genes, which would have been taxonomically classified as such using a standard reference database such as SILVA [38]. These human or mouse genomic regions likely share some similarity with our primers, as we detected the amplification primers in 1012 of the 1032 off-target amplicons.
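Checking whether an off-target amplicon still carries an amplification primer is an approximate string match that has to honour IUPAC degenerate bases. The following is a minimal sketch and not the LotuS implementation. The example primer (a commonly used degenerate 16S forward primer sequence) and the tolerance of two mismatches are illustrative assumptions, because this section does not restate the primers that were actually used.

```python
# IUPAC nucleotide codes mapped to the bases each one accepts.
IUPAC = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "R": "AG", "Y": "CT", "S": "CG", "W": "AT", "K": "GT", "M": "AC",
    "B": "CGT", "D": "AGT", "H": "ACT", "V": "ACG", "N": "ACGT",
}


def primer_hits(read: str, primer: str, max_mismatches: int = 2) -> list[int]:
    """Return every start position where `primer` matches `read` with at most
    `max_mismatches` mismatches. Degenerate primer bases accept any base they encode."""
    read, primer = read.upper(), primer.upper()
    hits = []
    for start in range(len(read) - len(primer) + 1):
        mismatches = 0
        for offset, code in enumerate(primer):
            if read[start + offset] not in IUPAC[code]:
                mismatches += 1
                if mismatches > max_mismatches:
                    break  # this window can no longer qualify
        if mismatches <= max_mismatches:
            hits.append(start)
    return hits


# Hypothetical example: the primer sits at position 2 of this toy read.
EXAMPLE_PRIMER = "CCTACGGGNGGCWGCAG"
print(primer_hits("TTCCTACGGGAGGCAGCAGTT", EXAMPLE_PRIMER))
```

Under this rule, an amplicon that contains a primer site only counts as evidence of primer binding. Genome mapping, not the primer match, is what identifies the amplicon as host-derived.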
Off-target amplicons probably arise from the extremely low bacterial biomass in the samples, with the primers competing against the dominant host DNA background. Most of the off-target zOTUs identified (887; 86%) had no taxonomic assignment. However, 145 (14%) did match references in the SILVA database using LotuS least common ancestor (LCA) taxonomic assignments. Most of these were assigned to Firmicutes (144), with one hit to Phragmoplastophyta, a plant (eukaryotic) clade. The majority of the Firmicutes hits could not be classified further, while some (n = 38) were matched at species level to Clostridium difficile. These 38 seemingly high-quality matches were explained by short, highly similar stretches shared between the reference 16S rRNA gene and the zOTU sequences. These stretches were often at 100% identity and on average 35 nt long, which just passed the default BLAST e-value threshold of 1e-9 used in the LotuS LCA step. Interestingly, these matches occurred at the end of the forward reads (average position in the merged read: 275 bp; read length: 300 bp). We speculate that they could be amplified host DNA regions that formed chimeras with bacterial contaminant DNA in the sample.
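A single short hit that barely passes the e-value cut-off can yield a species-level label because of how LCA assignment works. The following is a simplified sketch, assuming that BLAST hits have already been reduced to (taxonomy path, e-value) pairs. LotuS applies further filters, such as per-rank identity cut-offs, that are omitted here, and the taxonomy paths are abbreviated and illustrative.

```python
from typing import NamedTuple


class Hit(NamedTuple):
    taxonomy: tuple[str, ...]  # root-to-leaf path, e.g. ("Bacteria", "Firmicutes", ...)
    evalue: float


def lca(hits: list[Hit], max_evalue: float = 1e-9) -> tuple[str, ...]:
    """Lowest common ancestor of all hits whose e-value is at or below the cut-off."""
    kept = [h.taxonomy for h in hits if h.evalue <= max_evalue]
    if not kept:
        return ()  # nothing passes: the zOTU stays unassigned
    common = []
    for ranks in zip(*kept):  # walk down the ranks present in every kept path
        if all(r == ranks[0] for r in ranks):
            common.append(ranks[0])
        else:
            break  # first disagreement: stop at the last shared rank
    return tuple(common)


# Illustrative: one short, 100%-identity hit that just passes the cut-off
# is enough to produce a species-level assignment.
cdiff = ("Bacteria", "Firmicutes", "Clostridia", "Clostridioides", "Clostridioides difficile")
print(lca([Hit(cdiff, 8e-10)]))
```

Because the LCA is computed only over hits that pass the threshold, a zOTU with a single passing hit inherits that hit's full path. A minimum alignment length would be one way to guard against this.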
The abundance of off-target zOTUs varied between samples and appeared to be inversely related to the amount of contaminants (Fig. 1 and Suppl. Fig. 1a). We investigated whether this was related to the extraction protocol. Indeed, the cumulative off-target zOTU abundance per sample was significantly higher with the PK1 protocol (P = 8e−8, Fig. 1a), which was less prone to bacterial contamination. Correlating the fraction of off-target zOTU abundance with the number of 16S rDNA copies found in each sequencing well further confirmed this, as the two were significantly anticorrelated (P = 3e−5, rho = −0.52, Fig. 1b). Further, samples from mice held under germ-free conditions, which would hypothetically consist of purely murine DNA, contained approx. 80% murine off-target amplicons, as did samples from mice housed in SPF (specific pathogen free) cages (Fig. 1c).
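The anticorrelation between off-target fraction and 16S rDNA copies is a Spearman rank correlation. The sketch below computes rho as the Pearson correlation of average ranks, so that tied values are handled. The input values are hypothetical. In practice, `scipy.stats.spearmanr` also returns the P-value, which this sketch does not compute.

```python
def average_ranks(values):
    """1-based ranks; tied values receive the mean of the ranks they span."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks


def spearman_rho(x, y):
    """Spearman's rho: Pearson correlation computed on the ranks of x and y."""
    rx, ry = average_ranks(x), average_ranks(y)
    n = len(rx)
    mx, my = sum(rx) / n, sum(ry) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    vx = sum((a - mx) ** 2 for a in rx)
    vy = sum((b - my) ** 2 for b in ry)
    return cov / (vx * vy) ** 0.5


# Hypothetical wells: off-target read fraction vs. 16S copies per well.
offtarget_fraction = [0.9, 0.7, 0.4, 0.2]
copies_per_well = [120, 800, 2500, 9000]
print(spearman_rho(offtarget_fraction, copies_per_well))
```

Rank correlation suits this comparison because copy numbers span orders of magnitude and need not relate linearly to read fractions.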
These experiments suggest that off-target amplicons most likely result from a lack of bacterial primer targets. In the absence of such targets, the 16S rRNA gene primers bind to suboptimal sites in the mouse or human genome.
Given the challenge posed by host-derived off-target amplicons, we examined how other taxonomic assignment algorithms classify them. Specifically, we compared the taxonomic assignments made by the widely used pipelines QIIME1, Dada2, mothur, and LotuS (Fig. 1d). The RDP classifier [39] used in LotuS-RDP (by default with a confidence threshold of 80%) misclassified more off-target amplicons as bacteria at the domain level than the LotuS-LCA approach did (Fig. 1d). LotuS-RDP and LotuS-LCA erroneously assigned fewer sequences to bacteria than mothur and Dada2 did. Dada2 also uses the RDP classifier, but with a default confidence threshold of 50%. Thus, LotuS-LCA classified off-target amplicons more accurately than LotuS-RDP, mothur, and Dada2. QIIME1 (sortmerna and uclust) produced the fewest false positives, but these spanned a wider phylogenetic spectrum.
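A toy truncation rule shows why a lower confidence threshold lets more off-targets through. The RDP classifier reports a bootstrap confidence for each rank, and the assignment is cut at the first rank that falls below the threshold. The rank names and confidence values below are hypothetical, not taken from our data.

```python
def truncate_by_confidence(ranks, confidences, threshold):
    """Keep ranks from the root downwards while the bootstrap confidence
    (0-100) is at least `threshold`; stop at the first rank below it."""
    kept = []
    for name, confidence in zip(ranks, confidences):
        if confidence < threshold:
            break
        kept.append(name)
    return kept


# Hypothetical off-target amplicon with weak, declining support per rank.
ranks = ["Bacteria", "Firmicutes", "Clostridia", "Clostridiales"]
confidences = [72, 60, 55, 40]
print(truncate_by_confidence(ranks, confidences, 80))  # stricter cut-off
print(truncate_by_confidence(ranks, confidences, 50))  # more permissive cut-off
```

With these values the 80% cut-off leaves the sequence unassigned, whereas the 50% cut-off places it in Clostridia. This mirrors the direction of the difference between the default settings described above.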
We next evaluated whether the underlying sequence clustering algorithm influenced the detection and classification of off-target amplicons. When ASVs were generated with Dada2 instead of zOTUs (Suppl. Table 2), a similar 34.2% of all ASVs (1763 of 5149) represented off-target sequences mapping onto the human or mouse genome. Most of them (1574; 89.3%) could not be assigned to a bacterial or eukaryotic SSU sequence, while 10.5% (185 ASVs) were assigned using the default LotuS LCA taxonomic assignment. When the RDP classifier was used instead, 65.7% (759 of 1156) of all off-targets were taxonomically classified to the bacterial domain at high (>0.8) confidence.
Overall, in our dataset, all workflows tested appear to produce false-positive taxonomic assignments, independent of the underlying sequence clustering (zOTU or ASV). They do so by classifying off-target amplicons as bacteria and further assigning them to lower taxonomic ranks.
Despite their frequent occurrence, off-target amplicons are only one type of false-positive zOTU that we excluded from our analysis.
Contaminants in metabarcoding data of human and murine brains
Having removed off-targets and other false-positive zOTUs (zOTUs removed during denoising, chimeric zOTUs, and PhiX-matching zOTUs), we attempted to classify the remaining zOTUs (Suppl. Table 3). These could represent either true-positive bacteria in brain samples or contaminant bacteria originating from reagents, the environment, or laboratory equipment. Two different bioinformatic approaches exist for removing contaminant bacteria, relying on either negative or positive controls [40]. To rigorously identify confounding contaminants, we included both negative and positive controls (Fig. 2a), which were evaluated with different computational methods.
Most contaminant zOTUs were detected using approaches focused on negative controls, as implemented in decontam [35]. Among the decontam methods, “isNotContaminant” performed best. It compares the prevalence of zOTUs across true samples and negative controls to identify non-contaminants, i.e., it detects contaminants by their increased prevalence in negative controls, independently of the absolute bacterial biomass assessed via qPCR. Prevalence-based contaminant identification remains valid even in extremely low-biomass samples, as non-contaminants are expected to appear in a larger proportion of true samples than of negative controls. The “isContaminant” method identified fewer contaminants, whether it was prevalence based (iC1), frequency based (iC2), or combined frequency and prevalence (iC3). Two further approaches gave comparable results: identifying known contaminant taxa introduced through laboratory reagents (Suppl. Table 1), and flagging zOTUs present in negative controls at an abundance threshold of ≥ 0.01. Both removed more contaminants than the approach based on the positive control, the serially diluted mock community whose bacterial composition is known a priori. Notably, contaminants alone accounted for approximately one third of the relative zOTU abundance in study samples (Fig. 2b). Analysis of the contaminant taxa (heatmap of contaminants and off-target amplicons, Suppl. Fig. 3) revealed no clear clustering of taxonomic signals in human samples. Non-metric multidimensional scaling (NMDS) of all zOTUs separated samples by tissue origin. However, after removal of off-target and contaminant zOTUs, the groups overlapped (Suppl. Figure 4), indicating that the initial differences between study groups might be driven by technical signals.
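Prevalence-based identification reduces to a 2 × 2 presence/absence table per zOTU (negative controls vs. true samples). The following is a simplified sketch in the spirit of decontam's prevalence approach, not its actual implementation or scoring. It uses a one-sided Fisher's exact test in which a small P-value indicates higher prevalence in negative controls. The sample counts in the example are hypothetical.

```python
from math import comb


def prevalence_pvalue(neg_present, neg_total, smp_present, smp_total):
    """One-sided Fisher's exact test: probability of seeing at least
    `neg_present` positive negative controls if presence were independent
    of sample type (hypergeometric upper tail)."""
    total = neg_total + smp_total
    present = neg_present + smp_present
    upper = min(present, neg_total)
    tail = sum(
        comb(present, k) * comb(total - present, neg_total - k)
        for k in range(neg_present, upper + 1)
    )
    return tail / comb(total, neg_total)


# Hypothetical zOTUs: (present in negatives, n negatives, present in samples, n samples)
print(prevalence_pvalue(5, 5, 0, 20))   # only in negatives: likely contaminant
print(prevalence_pvalue(0, 5, 10, 20))  # never in negatives: no evidence
```

Because only presence and absence enter the test, the score does not depend on absolute biomass. This is the property that keeps prevalence-based filtering usable in extremely low-biomass samples.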
Combining all approaches (contaminant identification and off-target filtering), we thus excluded 2684 zOTUs from further downstream analysis and tested the remaining 331 zOTUs for abundance differences.
Validation with 16S rRNA gene qPCR results
To quantify the total bacterial biomass per mg of brain tissue, we used 16S rRNA gene qPCR. The 16S rRNA gene copy numbers were then corrected to the bacterial biomass expected after removal of both the off-target and the contaminant DNA fractions.
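The correction described above can be sketched as scaling the qPCR copy number by the fraction of reads that survive filtering and normalizing it to tissue mass. This sketch assumes that copy numbers scale proportionally with read fractions. The parameter names, elution volume, and example values are hypothetical and are not the study's exact procedure.

```python
def corrected_copies_per_mg(copies_per_ul, elution_volume_ul, tissue_mg,
                            offtarget_fraction, contaminant_fraction):
    """Hypothetical correction: total 16S copies in the extract, scaled to the
    read fraction left after removing off-target and contaminant zOTUs,
    per mg of extracted tissue. Assumes proportional scaling."""
    removed = offtarget_fraction + contaminant_fraction
    if not 0.0 <= removed <= 1.0:
        raise ValueError("read fractions must sum to a value between 0 and 1")
    total_copies = copies_per_ul * elution_volume_ul
    return total_copies * (1.0 - removed) / tissue_mg


# Hypothetical sample: 100 copies/ul in 50 ul from 25 mg tissue,
# with 30% off-target and 20% contaminant reads.
print(corrected_copies_per_mg(100, 50, 25, 0.3, 0.2))
```

Proportional scaling is the simplest assumption. Amplification biases between host and bacterial templates would make the true correction non-linear.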
To validate our methodological approach, we also evaluated a set of serially diluted mock samples (positive control) and several negative controls. The number of 16S rRNA gene copies detected in mock samples was in line with the copies expected from the manufacturer's bacterial density and our dilutions (Suppl. Fig. 2a), validating our approach for estimating bacterial biomass. However, when we used amplicon sequencing to determine the mock community composition, only six of the eight expected bacterial species were found at their theoretical abundances. Listeria monocytogenes was absent, and Bacillus subtilis was less abundant than expected (both bacteria are reported to be difficult to lyse [41, 42]). To ensure comparability within the analyses, no mechanical lysis was performed when extracting either brain or mock community samples. In total, the original taxa accounted for 99.5% of the relative zOTU abundance in the undiluted mock community sample, and this remained stable up to a dilution of 1:10³. Contaminant DNA increased with further dilution. At the two highest serial dilutions (1:10⁴ and 1:10⁵), 13.5% and 18.7% of reads, respectively, were attributable to contaminants (Suppl. Fig. 2b).
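Comparing measured and expected copies in a dilution series relies on a qPCR standard curve, Ct = slope · log10(copies) + intercept. The sketch below fits that line by least squares, derives the amplification efficiency E = 10^(−1/slope) − 1, inverts the curve to estimate copies from a Ct value, and computes the expected copies after a 10^k dilution. The standard series and Ct values are hypothetical, since the curve parameters used in this study are not given here.

```python
def fit_standard_curve(log10_copies, ct_values):
    """Least-squares line Ct = slope * log10(copies) + intercept."""
    n = len(log10_copies)
    mean_x = sum(log10_copies) / n
    mean_y = sum(ct_values) / n
    sxx = sum((x - mean_x) ** 2 for x in log10_copies)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(log10_copies, ct_values))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


def amplification_efficiency(slope):
    """E = 10^(-1/slope) - 1; a slope of about -3.32 corresponds to ~100%."""
    return 10 ** (-1.0 / slope) - 1.0


def copies_from_ct(ct, slope, intercept):
    """Invert the standard curve to estimate copies in a reaction."""
    return 10 ** ((ct - intercept) / slope)


def expected_copies(stock_copies, dilution_exponent):
    """Copies expected after diluting a stock 1:10^k."""
    return stock_copies / 10 ** dilution_exponent


# Hypothetical 10-fold standard series (10^8 ... 10^3 copies) with ideal Ct values.
standards = [8, 7, 6, 5, 4, 3]
cts = [40.0 - 3.32 * x for x in standards]
slope, intercept = fit_standard_curve(standards, cts)
print(slope, intercept, amplification_efficiency(slope))
```

Near the detection limit, measured copies that level off above the expected values point to background (contaminant) template rather than to the diluted input.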
Negative controls did not differ in the number of 16S rRNA gene copies present (Suppl. Table 4). Overall bacterial biomass in biological samples was extremely low compared with the mock samples (115,730,862 and 4182 16S rRNA gene copies/μl in the undiluted and most diluted samples, respectively). It was also extremely low compared with the bacterial biomass found, e.g., in mouse feces (1.6 × 10⁸ 16S rRNA gene copies/mg feces, data from N. Beraza). This suggests that we are likely reaching the methodological limits of bacterial detection using 16S rRNA gene qPCR.
In human brain samples, the 16S rRNA gene copy number significantly exceeded that in mouse samples and negative controls (Fig. 3; Suppl. Table 4), even after correcting for the fraction of off-targets in each sample. This was likely an effect of differential tissue handling during DNA extraction (see below). Although PD samples appeared to contain more bacterial DNA than HC samples, there were no significant differences between PD and HC after off-target correction (P > 0.05). Furthermore, the quantity of bacterial DNA in SC samples did not differ significantly from that in PD or HC samples (P > 0.05, Fig. 3b). No correlation was found between the number of 16S rRNA gene copies (quantity of bacterial DNA) in human samples and the donor's age or the post-mortem delay (Suppl. Fig. 2c), making post-mortem bacterial invasion or an age-related effect unlikely.
Last, we investigated the influence of tissue handling during the DNA extraction procedure. Non-sterile tissue handling (PK2) resulted in a significantly greater quantity of contaminants in all samples than sterile tissue handling (PK1, Suppl. Fig. 1b). This was also reflected in an increased bacterial biomass assessed with qPCR (Fig. 3c), which likely explains the higher total 16S rRNA gene copy number in human samples.
Contaminant taxa present in positive and negative controls
In total, 98 different contaminants were present in the mock samples, of which nine lacked a genus-level taxonomic assignment (Suppl. Table 5). As expected, we found no murine genomic DNA in the mock samples. However, in the two most diluted mock samples (dilutions of 1:10⁴ and 1:10⁵), three zOTUs corresponded to human off-target amplicons (Zotu2403, 6e-05%; Zotu1715, 0.00012%; and Zotu1762, 0.000112% relative abundance; Suppl. Table 5). The taxonomic diversity of contaminants in the negative controls was greatest in the PCR reagents (no template/KitQIB), compared with the DNA extraction blank/KitUKB, DNA buffer, and sterile water (Suppl. Fig. 5). Further analysis showed that 99% of taxa across all negative controls were exogenous contaminants. The remaining 1% was human DNA in the DNA extraction blank samples, possibly due to “cross-talk” [26] or human contamination during sequencing.
Analysis of putative true positive zOTUs in brain samples
After correcting for off-target zOTUs and contaminants, we tested the remaining zOTUs for differences in abundance among study groups. The filtered zOTUs had extremely low abundances and were irregularly distributed among all biological samples (Fig. 4a); most remaining zOTUs were unclassified at species level.
All taxonomic signals of putative true-positive zOTUs (Fig. 4a) were reassessed using manual Blast searches (Suppl. Table 6). The 17 most abundant zOTUs across all study samples fell into five groups:

- Six belonged to either the human or the mouse genome (zOTU20/uncl. Peptostreptococcales-Tissierellales, zOTU24/uncl. Peptostreptococcales-Tissierellales, zOTU56/Clostridioides difficile, zOTU88/uncl. Peptostreptococcales-Tissierellales, zOTU95/unclassified, and zOTU126/uncl. Clostridia).
- Three matched bacteria and the human or mouse genome with similar query coverage and percent identity (zOTU13/uncl. Peptostreptococcales-Tissierellales, zOTU121/uncl. Clostridia, and zOTU113/uncl. Clostridia).
- One was classified as an uncultured organism (zOTU180/uncl. Oscillospiraceae).
- Six were accurately classified as bacteria, five of them as uncultured bacteria in the NCBI databases by manual Blast (zOTU34/uncl. Methylotenera, zOTU93/uncultured Bacteroidales bacterium, zOTU763/Romboutsia ilealis, zOTU217/uncl. Shewanella, zOTU91/uncl. Lachnospiraceae NK4A136 group, and zOTU254/uncl. Paludicola). However, these six occurred across all types of study samples, including negative controls.
- One zOTU (zOTU124), assigned to Methylobacterium-Methylorubrum by both LotuS and manual Blast, was present only in human samples (n = 11, across PD, SC, and HC), and at extremely low abundances.

These Methylobacterium-Methylorubrum bacteria are commonly found in the atmosphere, soil, and on human skin, but also in laboratory reagents [43, 44]. Only very rarely have they been reported as opportunistic pathogens in clinical samples [44]. Thus, there is a remote possibility that they represent undetected invasions of healthy and diseased brains. However, given that they are only rarely (opportunistically) pathogenic and frequently present in soil and reagents, it seems unlikely that they are true positives. Further, plotting the frequency of zOTU124 against bacterial biomass showed an inverse relationship (Suppl. Fig. 6), as would be expected for a contaminant bacterium.
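The inverse relationship between a zOTU's frequency and total bacterial biomass is the signal used by frequency-based contaminant detection. The following is a simplified comparison in the spirit of decontam's frequency approach, not its actual scoring. In log space, a non-contaminant is modelled as a constant frequency, and a contaminant as a frequency proportional to 1/biomass (slope fixed at −1). The two residual sums of squares are then compared. All values are hypothetical, and concentrations must be positive.

```python
from math import log


def frequency_model_ratio(freqs, concentrations):
    """RSS(contaminant model) / RSS(non-contaminant model) in log space.
    Non-contaminant: log f = a.  Contaminant: log f = b - log C.
    Values well below 1 favour the contaminant model; samples with f == 0 are skipped."""
    pairs = [(log(f), log(c)) for f, c in zip(freqs, concentrations) if f > 0]
    if len(pairs) < 2:
        raise ValueError("need the zOTU in at least two samples")
    log_f = [p[0] for p in pairs]
    log_c = [p[1] for p in pairs]
    n = len(pairs)
    a = sum(log_f) / n  # best constant for the non-contaminant model
    rss_non = sum((y - a) ** 2 for y in log_f)
    b = sum(y + x for y, x in zip(log_f, log_c)) / n  # best intercept with slope -1
    rss_con = sum((y - (b - x)) ** 2 for y, x in zip(log_f, log_c))
    if rss_non == 0:
        return float("inf")  # perfectly constant frequency: no contaminant signal
    return rss_con / rss_non


biomass = [1.0, 2.0, 4.0, 8.0]  # hypothetical relative 16S copies per sample
print(frequency_model_ratio([0.1 / c for c in biomass], biomass))  # inverse pattern
print(frequency_model_ratio([0.05, 0.06, 0.05, 0.06], biomass))    # roughly constant
```

Fixing the slope at −1 encodes the expectation that a fixed amount of contaminant DNA is diluted by increasing amounts of genuine template.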
Of the remaining zOTUs that appeared to be true-positive bacteria in human samples, 52 were uniquely present in putamen samples. The five most abundant were zOTU34/Methylotenera, zOTU187/Sporichthyaceae, zOTU231/Nitrospiraceae, zOTU259/Moraxellaceae bacterium HYN0046, and zOTU256/Comamonadaceae (Fig. 4d). These are environmental bacteria found in soil and wastewater or occur as pathogens in the human oral cavity. However, their presence was likely a result of “non-sterile” (PK2) tissue handling during DNA extraction, which only became apparent after statistical blocking for tissue handling type. The zOTUs from putamen samples all matched known bacteria more closely than host genomes and would therefore seem to represent additional contaminants of our extraction protocols (Suppl. Fig. 1b). These contaminants were likely missed by our initial computational approach because we did not include negative controls processed with the PK2 protocol, which was more prone to contamination (see above).
Manual Blast analysis of the zOTUs enriched in the olfactory bulb (not shown) revealed them all to be of human origin, except zOTU2986/Salmonella enterica, which was identified as an uncultured bacterium clone. Since zOTU2986 was also present (at low prevalence) in human and murine brain samples and in negative controls, it is almost certainly a false positive.
Further, in human samples, six zOTUs contributed to the significant difference between PD and HC samples: zOTU643, zOTU1896, zOTU873, zOTU607, zOTU939, and zOTU1931 (Fig. 4c). All six were assigned to Clostridia by LotuS-LCA but appeared to be of human origin in a manual Blast search. Thus, they represent additional off-target amplicons not detected by our automatic off-target removal.
The same was true for five enriched zOTUs from murine samples (Fig. 4b), four of which (zOTU24, zOTU593, zOTU590, and zOTU1665) were originally assigned to Clostridia but were ultimately found to be murine genomic DNA. Manual Blast identified zOTU847/uncl. Lachnospiraceae NK4A136 group, which was enriched in murine samples, as an uncultured bacterium clone, but it was present in only two samples.
Manual Blast searches identified an additional 75 zOTUs as off-target amplicons, although these had remained in the analysis after automatic off-target and contamination removal. All other workflows evaluated also classified the majority of these 75 zOTUs as bacteria (Suppl. Fig. 7). Thus, with one very unlikely exception, all brain-enriched zOTUs evaluated in depth in our study can be considered false positives, and no convincing taxonomic signal of a consistent bacterial colonization or presence was found.
Validation of study results in an independent dataset
To assess whether off-target amplification could be a broader phenomenon in low-biomass 16S rRNA gene sequencing experiments, we additionally evaluated an independent dataset. This was a 16S rRNA gene sequencing experiment on human brain tissue from PD patients and healthy individuals (see “Methods”).
Of the 2165 zOTUs generated in this dataset, 8.7% (198 zOTUs) mapped onto the human genome. After filtering out zOTUs with no taxonomic assignment and technical artifacts, we obtained 1264 zOTUs that could putatively originate from bacteria. As positive and negative controls were not available for this dataset, bioinformatic contaminant-removal approaches could not be applied. However, when the detected species were matched against typical contaminants from laboratory reagents (Suppl. Table 1), 402 zOTUs matched known contaminant bacteria.
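Without controls, matching against a reference list of reagent contaminants is essentially a set lookup on taxonomic labels. The following is a minimal sketch, assuming that matching is done case-insensitively at genus level. The genus names and zOTU assignments below are illustrative placeholders, not the contents of Suppl. Table 1.

```python
def flag_known_contaminants(assignments, contaminant_genera):
    """Return the IDs of zOTUs whose genus-level assignment appears in a
    reference list of reagent contaminant genera (case-insensitive).
    zOTUs without a genus assignment are never flagged."""
    reference = {g.casefold() for g in contaminant_genera}
    return sorted(
        zotu for zotu, genus in assignments.items()
        if genus and genus.casefold() in reference
    )


# Illustrative reference list and assignments (hypothetical).
reagent_genera = ["Ralstonia", "Methylobacterium", "Bradyrhizobium"]
assignments = {
    "Zotu1": "Ralstonia",
    "Zotu2": "Romboutsia",
    "Zotu3": None,
    "Zotu4": "methylobacterium",
}
print(flag_known_contaminants(assignments, reagent_genera))
```

A list-based filter can only remove taxa that have already been reported as contaminants. Unlisted contaminants pass through, which is why control-based methods remain preferable when controls are available.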