Overview of samples and taxonomic composition
We collected morning and evening microbiota samples across a 10-day sampling period from the palms of four people in Hong Kong, from the bed headboards and internal front door knobs in their homes, and from handrails in a subway entrance and a park or campus near their homes. The study participants all lived alone and in different regions of Hong Kong. We surveyed the microbial communities present in each sample with metagenomic sequencing. Although this metagenomic approach captured members of the Archaea, Eukarya and some viruses, Bacteria were the most abundant domain across all samples of all types (skin: mean abundance 93%, SD 6.9%; household surface: mean 95%, SD 8.8%; public surface: mean 95%, SD 11%). The pooled microbial abundance across all skin samples was dominated by a shared set of known skin commensals, most notably the families Propionibacteriaceae, Micrococcaceae and Moraxellaceae (Fig. 1). However, each person’s skin microbiome had at least one notable bacterial family that was present at higher abundance on that person than on other people, creating individual taxonomic profiles that were similar between a person’s left and right palm and generally maintained across the 10 days of sampling. Person 1 was noted to have a high abundance of Micrococcaceae compared to the other people in the study; person 2 was characterised by Dermacoccaceae; person 3 a mix of families including Gordoniaceae, Dietziaceae and Dermacoccaceae; and person 4 Dermabacteraceae. These distinctive profiles were reflected in the surface microbiomes from the respective homes, with residence 1 having a substantial abundance of Micrococcaceae, residence 2 a high abundance of Dermacoccaceae on the door knob, residence 3 a high abundance of Gordoniaceae on both the door knob and bed headboard and residence 4 a high abundance of Dermabacteraceae. This pattern of locality-specific families was not as evident for the public surface samples, which are likely to reflect microbiota input pooled from the local population and environmental sources.
The overall abundance of Propionibacteriaceae in skin samples (mean relative abundance 27%, SD 23% across all skin samples) was higher than in our previous 16S rRNA gene-based survey of a similar group of Hong Kong households and occupants [11]. This is likely because the 515F/806R primer pair used in the previous study is biassed against the phylum Actinobacteria [18] of which the family Propionibacteriaceae is a member, and highlights a benefit of the metagenomic approach in capturing a more representative sample of the species present.
Human skin as the major microbial source for surface communities
A key premise of microbiota matching as a forensic method is that there is significant transfer of microbiota from a person’s skin to surfaces they touch. We used three methods to confirm that human skin was the major source of microbiota on the sampled household and public surfaces. First, we examined the relative abundances of the families identified by Dunn et al. [19] as indicators of human (skin, oral, stool) vs. environmental (leaf, soil) sources. With the exception of ten of the public surface samples, in which ‘leaf’ was the major microbial source identified and ‘human skin’ the second-major source, the highest-abundance identified source in every surface sample was human skin (Fig. 2a). Second, we used SourceTracker [20] to identify whether human skin or negative control samples were the most likely sources for surface communities (Fig. 2b). Again with the exception of public surfaces, human skin was predicted to be the major source for samples of all types, although the predicted proportion of skin as a source for household surface samples (mean 75%, SD 25%) was higher than for public surface samples (mean 36%, SD 31%), with a much higher proportion of ‘unknown’ sources in these environments. As the ‘skin’ source is determined only from the skin of study participants, this ‘unknown’ source potentially includes skin inputs from other people who have touched these public surfaces. Third, we compared the Bray-Curtis dissimilarities between pairs of skin and surface samples taken from the same location with pairs from different locations (Fig. 2c). Communities at all surface sites were significantly more similar (Mann-Whitney p < 0.05) to skin communities from the same location than communities from different locations, although the magnitude of the difference was larger for household surfaces than for public surfaces. Together, these results confirm the now well-established finding that the microbiomes on frequently touched surfaces both in homes and in public spaces are assembled substantially from deposition of microbiota from human skin, and in particular resemble the skin of the household occupants or local residents.
Performance of microbiota matching methods
Microbiota matching aims to accurately identify the person who has touched a surface by comparing the surface’s microbiome (hereafter the ‘query’) to the skin microbiomes of a group of people (hereafter the ‘pool’ of ‘references’). Previous studies have used a range of methods to achieve this, including SourceTracker prediction of the sources of surface microbiota [11, 13], shared taxonomic composition between query and references [9] and community dissimilarity between query and references [7, 17]. We compared matching with SourceTracker, which we have used previously to match a similar Hong Kong cohort to their household surfaces with 67% accuracy [11], to matching by minimum Canberra distance or Bray-Curtis dissimilarity [17] (Fig. 3). Included in this comparison were query and reference samples taken at the same time as well as samples taken at different times (‘sampling delay’). Overall, the Canberra distance method produced the most accurate matches, with 84% accuracy for household surface and 50% for public surface samples across all matching attempts, significantly higher (χ squared p < 0.05) than the SourceTracker method (77% accurate for household surfaces, 42% for public) and the Bray-Curtis dissimilarity method (75% household, 38% public). Of the 172 combinations of query site and sampling delay for which each matching method was tested, the Canberra distance method had the highest or equal-highest matching accuracy in 102 combinations. This is consistent with the report of Watanabe et al. [17] that the Canberra distance outperforms other distance and dissimilarity measures; as noted by Meadow et al. [21], this is likely because the Canberra distance is more influenced by low-abundance taxa that drive the individuality of grossly similar microbial habitats such as skin or surfaces. We used the Canberra distance method for all further matching analyses.
Temporal effects on matching accuracy
We examined the effect of sampling delay, in which a surface ‘query’ sample is taken some time before or after a skin ‘reference’ sample, on the accuracy of microbiota matching. As our study took place over a 10-day period with morning and evening samples taken on each day, the range of sampling delays examined was between zero (samples taken at the same time) and 10 days (query sample taken 10 days after reference sample, or vice-versa), with increments of approximately 12 h. For household surface queries, there was no significant relationship between absolute sampling delay and matching accuracy (Fig. 3; Spearman’s ρ, p > 0.05), in contrast with our previous study where matching accuracy declined significantly for household surface queries collected several months before or after the reference pool [11]. This suggests that household surface traces up to at least 10 days old have not yet been substantially affected by the processes that degrade the microbial matching utility of older traces. By contrast, for public surface queries, there was a strong (Spearman’s ρ = –0.73 for public handrail, –0.51 for subway exit handrail) and significant (p < 0.05) negative relationship between absolute sampling delay and matching accuracy (Fig. 3).
There was also a substantial and significant relationship between the time of day at which the query sample and/or pool of reference samples was taken and matching accuracy (Fig. S1). For household surfaces, queries collected in the evening had a significantly higher accuracy rate than morning queries (χ squared p < 0.05), whilst skin reference pools collected in the morning had a significantly higher accuracy rate than those collected in the evening (p < 0.05). For public surface sites, however, morning queries had a significantly higher accuracy rate than evening queries (p < 0.05), whilst there was no significant difference in accuracy rate between morning and evening references. This suggests that there are significant differences in the temporal dynamics underlying the similarity of people’s skin and home microbiota environments compared to skin and public surfaces, with divergence between skin and public surface microbiomes occurring significantly faster than between skin and household surface. Further, it suggests that these dynamics, which include processes of microbial dispersal between sites as well as the elimination of dispersal traces, are not uniform throughout the day but are in some way influenced by the diurnal cycle.
Diurnal variation in skin and surface microbiomes
We noted that some taxa at both skin and surface sites appeared to have a diurnal abundance pattern, with relative abundance consistently higher in either the morning or evening. Time of day was a significant contributor to variance at the community level, with a significant difference between morning and evening communities across all species and sites (PERMANOVA p = 0.008). To better understand this diurnal variation and how it may help account for the observed time-of-day effects on microbiome matching, we investigated the identity and abundance patterns of these diurnally varying taxa. The abundance time series for each species at each site and location was decomposed into trend, random and seasonal components, with the seasonal component having a period of 1 day (i.e. a single diurnal cycle). The strength of the seasonal component (FS) was defined as its variance relative to the variance in the random component, and the significance of this strength assessed with 999 random permutations of the species abundances. With this method, we identified a total of 160 combinations of species, site and location where the species exhibited a significant (p < 0.05) pattern of diurnal variation (Fig. 4). The proportion of species identified as significantly diurnal, as well as the FS of those species, was highest at skin sites (14% of all combinations of species, site and location, mean FS = 0.40), followed by smaller proportions at household surfaces (6.0%, mean FS = 0.33) and smaller again at public surfaces (2.1%, mean FS = 0.29). Among significantly diurnal species, the difference in relative abundance between morning and evening was on average largest at skin sites (mean difference in relative abundance 3.3%, SD 6.4%), followed by public surfaces (mean 3.0%, SD 9.4%) and household surfaces (mean 2.5%, SD 5.7%).
Before examining biological explanations for these diurnal patterns, we first considered whether they could be artefactual or due to error. One possible source of an artefactual diurnal pattern would be a systematic difference in the processing of morning and evening samples, such as the introduction of a contaminant species though mishandling, or a bias for or against some species due to differences in DNA extraction technique. Because such a systematic difference would be expected to apply similarly to samples from the four locations, which were processed in parallel in the same laboratory and using the same equipment and reagents, we examined the number of significantly diurnal species at each site which were found in only one location. For all sites, the majority of significantly diurnal species were unique to a single location (72% of species at left palm, 75% at right palm, 100% at bed headboard, 64% at door knob, 80% at public handrail, 100% at subway exit handrail), a pattern not consistent with a systematic methodological error. We also examined the proportion of species with a significantly diurnal pattern that were identified in the negative control samples. The majority (75%) of significantly diurnal species across all samples were absent from all the negative control samples, although this proportion was smaller than that of non-diurnal species of which 94% were absent from the negative controls.
We also examined the possibility that this effect may be driven by input into our metagenomes of genetic material from reagent contamination, in spite of the decontamination steps taken (see the “Methods” section). In particular, if there was a systematic difference in the total biomass in morning compared to evening samples at the DNA extraction and sequencing stages, reagent contamination would proportionally contribute a greater proportion of the observed species abundance in the lower-biomass samples. We note that many of the genera found to exhibit a significant diurnal pattern, such as Janibacter, Microoccus and Dietzia, have previously been reported as common DNA extraction reagent contaminants [22], although these genera are also plausible components of human skin and environmental microbiomes. We re-ran the time series decompositions for all samples with all species that appeared in our negative control samples removed. Of all the species that had been initially identified as significantly diurnal, 25% (n = 40) were removed because they were present in the negative controls; of the remainder, 17.5% (n = 28) were no longer found to be significantly diurnal, perhaps due to compositional effects (see next paragraph ), whilst 57.5% (n = 92) remained significantly diurnal. Considering that many of the species identified in the negative controls were likely to be genuinely present in the sampled sites, these results do not support a major contribution of reagent contamination to the observed patterns of diurnal variation.
Another possible cause for an artefactual diurnal pattern is the compositional effect of normalising species counts in each sample to produce relative abundances [23]. Relative species abundances within a sample are not independent: an increase in the relative abundance of one species proportionally reduces the relative abundances of all others, even if the absolute abundances of the other species remained unchanged. Thus, it is possible that even a single high-abundance species that has a genuine diurnal pattern could cause other species within the same sample to falsely appear to have complementary diurnal patterns. We examined the number of significantly diurnal species at each site that had their abundance peak in the morning vs. evening, across the 24 combinations of site and location. Of these 24 combinations, five had no diurnal species, seven had diurnal species that all exhibited their highest abundance at the same time of day (i.e. all during the morning or during the evening), nine had a mix of diurnal species most abundant in the morning and in the evening, and the remaining three had a single species most abundant at one time with the remainder most abundant at the other time. The pattern in these latter three sites is potentially consistent with a compositional effect, with a true diurnal pattern in a single species driving an apparent pattern in the others. To examine this possibility further, the species with the singular diurnal pattern was removed from these three sites, species relative abundances were re-normalised, and the time series decompositions were re-run to determine the effect on the other diurnal species. At one of the sites, there was no difference in the number of diurnal species, one site had one (of eight, excluding the intentionally removed species) fewer significantly diurnal species after the re-analysis, and the last site had no significantly diurnal species (from three previously). Overall, whilst the possibility of a compositional effect at two of these three sites could not be excluded, there was no evidence of a compositional effect driving the diurnal pattern for the majority of sites.
To better characterise the species that exhibit a diurnal pattern, we first examined the relationship between diurnality and abundance. Diurnal species (mean relative abundance 3.1%, SD 6.5%) were significantly (Mann-Whitney p < 0.05) more abundant than non-diurnal samples (mean 0.72%, SD 2.9%), excluding samples in which a given species was absent. Abundance was also positively correlated with the FS of diurnal species (Spearman’s ρ = 0.39, p < 0.05). It is notable that this positive relationship between seasonality and abundance exists in spite of the fact that diurnally varying species necessarily have significantly reduced abundance in around half of the samples in a given time series. We next examined the taxonomic composition of the diurnal species. By relative abundance, diurnal species at all sites were overwhelmingly dominated by the phylum Actinobacteria, and particularly the families Propionibacteriaceae, Micrococcaceae, Gordoniaceae and Dermacoccaceae (Fig. 4b). These families were noted to correspond well to the families that characterised the skin and surface samples for a particular location; for example, the family Gordoniaceae was unusually abundant in skin, household surface and public surface samples from location 3 (Fig. 1).
Whilst this study was not designed to assess physiological or environmental factors that may contribute to this diurnal pattern, we evaluated whether microbial dispersal was a potential or partial explanation. We hypothesised that microbial dispersal from public surfaces to hands occurs primarily during the daylight hours, when people are likely to be out of their homes and in public spaces, and is followed by a ‘normalisation’ of the skin microbiota overnight driven by bathing, handwashing and interaction with household objects and surfaces that act as reservoirs of the household occupant’s normal skin microbiome. This hypothesis presumes that the major route of dispersal among the three types of sites is from public surfaces, which represent pooled human skin microbiota as well as environmental inputs, onto skin whilst people are out in public spaces, and subsequently onto surfaces within people’s private homes. This is consistent with the observed proportions of diurnal varying species at sites of each type, with the fixed public pool having the fewest diurnal species with the weakest FS, skin having the most diurnal species with the strongest FS, and household sites falling in between. This hypothesis would also explain why morning skin references had significantly higher accuracy rates than evening references when matching against household queries, as the evening references were taken at the end of a day of dispersal from public surfaces, whilst morning references were taken at the end of the period of normalisation. It may also explain why morning public surface queries had marginally but significantly higher accuracy rates than evening queries, as these queries were taken during the daytime hours when people were likely out and interacting with public surfaces. However, it does not explain why household surface evening queries had higher accuracy rates than morning queries (Fig. S1).
To test this hypothesis, we generated Dynamic Bayesian Networks (DBNs) for all taxonomic families present at each location, modelling the effect of the abundance of each family at a given site on the abundance at other sites after one time point increment (i.e. morning to evening, or evening to the following morning, approximately 12 h). In these networks, a node represents a site at a particular location (e.g. ‘subway exit handrail at location 2’ or ‘left palm at location 4’), whilst an edge from one node to another represents a causal relationship between the abundance of a family at the first site (the ‘parent’ node) and the abundance of the same family at the second site (the ‘child’ node) after one time point increment, implying microbial dispersal from one site to another [13]. Forty-seven percent of the DBNs were generated with at least one edge. Public surface nodes were significantly (Kruskal-Wallis p < 0.05) more likely to be parents than nodes of other types, yielding 599 edges in total across all networks, compared to 525 edges from skin parent nodes and 476 from household surface parent nodes. Skin nodes were significantly (Kruskal-Wallis p < 0.05) more likely to be child nodes than other types, with 597 edges ending at skin nodes, compared to 522 ending at household surface nodes and 481 at public surface nodes. The single most common edge type was from a public surface node to a skin node, comprising 261 edges in total. This strongly supports dispersal from public surfaces to skin at the approximately 12-h time scale as the major dispersal route between the three types of sites studied. However, these results do not suggest that dispersal from public surfaces to skin is the only, or even the major, driver of diurnal variation. Other regular patterns of activity, such as bathing, as well as physiological factors are very likely to play major roles.
Whilst there has been some previous work on the connection between the human circadian rhythm and the gut microbiome (reviewed in [24]), to our knowledge, there have been few previous reports on diurnal variation in the human skin microbiome, whether driven by physiological factors or by dispersal. Interestingly, one previous study examined human skin immediately after contact with public surfaces in the subway system in Hong Kong and found a similar pattern of diurnal variation between morning and evening samples, with the Actinobacteria as a major contributor to this variation as well as being the most abundant phylum overall [25]. This suggests that in addition to the effect of the daily rhythm of contact with public surfaces followed by ‘normalisation’ with the household microbiome that we propose above, there is also an effect of intra-day variation on public surfaces themselves, although as described above we found fewer significantly diurnal species on public surfaces than on skin or household surfaces.
Our results demonstrate yet another way in which forensic microbiota traces are not simply ‘microbial fingerprints’, but rather momentary snapshots of a dynamic network of microbial dispersal in which a person’s skin and a surface on which they leave a trace are only two nodes. In particular, they suggest that understanding the potential acquisition of microbiota from touched surfaces into the skin microbiome may be useful to develop effective matching methods. Despite the success of microbiota matching in this and previous similar studies [11,12,13], using relatively simple methods of comparing the community distances between reference and query samples, these studies have the advantage of identifying a match from a small and closed pool of references of which the correct match is known to be a member. In real-world forensic applications, the reference pool could potentially comprise the entire population of a city or country, with no certainty that the correct match is part of this pool. There have been some previous demonstrations of alternative approaches to microbial matching based on identifying marker taxa or sequences, including the metagenomic hitting set-based method described by Franzosa et al. [14], the hidSkinPlex panel of curated marker genes for forensic identification described by Schmedes et al. [26], and the use of minimum entropy decomposition (MED) to find individually identifying clusters of marker gene sequences described by Richardson et al. [27]. As we have shown in this and in our previous study [11], community-level matching will always have to contend with forces of microbial dispersal that degrade the similarity between a query and a reference sample. Approaches focused instead on identifying a targeted set of marker sequences or taxa that can then be traced through a dispersal event, or potentially even though multiple steps of a microbial dispersal network, may ultimately be more reliable for forensic matching than community-level methods.