Forensic analysis of the microbiome of phones and shoes
© Lax et al.; licensee BioMed Central. 2015
Received: 21 January 2015
Accepted: 3 April 2015
Published: 12 May 2015
Microbial interaction between human-associated objects and the environments we inhabit may have forensic implications, and the extent to which microbes are shared between individuals inhabiting the same space may be relevant to human health and disease transmission. In this study, two participants sampled the front and back of their cell phones, four different locations on the soles of their shoes, and the floor beneath them every waking hour over a 2-day period. A further 89 participants took individual samples of their shoes and phones at three different scientific conferences.
Samples taken from different surface types maintained significantly different microbial community structures. The impact of the floor microbial community on that of the shoe environments was strong and immediate, as evidenced by Procrustes analysis of shoe replicates and significant correlation between shoe and floor samples taken at the same time point. Supervised learning was highly effective at determining which participant had taken a given shoe or phone sample, and a Bayesian method was able to determine which participant had taken each shoe sample based entirely on its similarity to the floor samples. Both shoe and phone samples taken by conference participants clustered into distinct groups based on location, though much more so when an unweighted distance metric was used, suggesting sharing of low-abundance microbial taxa between individuals inhabiting the same space.
Correlations between microbial community sources and sinks allow for inference of the interactions between humans and their environment.
Keywords: Forensic microbiology; Source-sink dynamics; Shoe microbiome; Phone microbiome; Microbial time series
In recent years, research into the microbial interactions between humans and their surroundings has revolutionized our understanding of the microbial ecology of the built environment. The dynamic relationship between the bacteria associated with human skin and the microbiome of indoor surfaces and of objects we interact with has demonstrated the degree to which the human microbiome can shape the microbial ecology of our homes, offices, hospitals, and cities [2-6]. Characterizing this microbial dynamic is critical for many purposes, such as determining the rate and progression of microbial colonization of human infants exposed to the indoor microbiome [7,8]. We therefore believe it is essential to determine how the microbial ecology of the built environment establishes and fluctuates over time.
Research on the microbial exchange between human and built environments has illuminated the forensic potential of the microbiome. In some cases, human microbial signatures have been used to match individuals to objects they have interacted with, including computer keyboards. Work on the microbiome of multiple home surfaces has shown that the microbial signature of a family can be highly predictive of the microbiome of that family’s home and that individuals within a home can be differentiated. Indeed, recent work on the microbial assemblages associated with smart phones has shown that individuals leave their skin microbiome on the surface of their phones. The rate at which these microbial communities change after they are deposited on a surface is also potentially valuable for forensic applications. Recent work has shown that the microbiome of animal hosts changes dramatically after death, but in a predictable manner. This predictability enables us to use microbial assemblages to help explore not just where someone is right now but also where they may have been recently. To explore the potential to determine the microbial fingerprint of individuals on personal items, we performed a detailed biogeographic and longitudinal characterization of the microbial communities on personal mobile phones. Additionally, we examined whether the microbial communities associated with an individual’s shoes were determined by the floor microbiome associated with where they were walking.
Results and discussion
Identifying signatures on shoe and phone samples
Table 1. Summary of predictive accuracy of random forest supervised learning models

Sample set           Estimated error ± SD
All phone samples    0.037 ± 0.062
All shoe samples     0.010 ± 0.020
P1 phone samples     0.417 ± 0.206
P2 phone samples     0.268 ± 0.180
P1 shoe samples      0.705 ± 0.125
P2 shoe samples      0.796 ± 0.090
In contrast to the high error ratio of models predicting study participant, the models did no better than expected by chance in determining which of the four shoe sites a sample had been taken from, even when models were segregated by study participant. We propose that this is due to the homogenization of communities across the shoe sole over time or to rapid changes in community structure at each sampling site. A similar pattern was observed in phone samples, with the models able to classify the participant a phone sample was taken from (error ratio of 13.6) but unable to determine whether the sample had been taken from the front or back of a given phone (Table 1).
Random forest models were also used to assess which bacterial taxa were most associated with different surface types. Models were trained on a genus-level summary of the OTU table, and shoe and floor samples were merged into a single surface type based on their similarity in ordination analyses. When trained at the genus level, models were able to determine whether a sample was taken from a phone or a shoe/floor with an error ratio of 3.6. The 20 genera with the highest feature importance scores are summarized in Additional file 2: Figure S2, with skin-associated genera such as Streptococcus, Propionibacterium, and Corynebacterium highly enriched in phone samples relative to shoe samples.
Longitudinal interaction between shoe and floor communities
To determine whether changes in the microbial community of the four shoe environments tended to be similar at each hourly sampling interval, we employed Procrustes analysis of the four sets of principal coordinates (Additional file 4: Figure S4). All three pairwise comparisons for each study participant produced significant P values (P < 0.005; Additional file 5: Table S1), demonstrating that changes in the microbial communities of the four shoe environments resemble each other at each sampling interval, and thus suggesting a consistent impact from the floor microbial community. Procrustes analysis of the principal coordinates from the front and back of participants’ phones did not produce significant P values. We hypothesize that this reflects greater heterogeneity in community composition across the surface of an individual phone than across a shoe at a given time point, owing to lower overall biomass and the high volatility of hand-associated microbial communities. It is also likely that microbes on the back of phones are sourced mostly from hands, while those on the front may also be sourced from the face of the owner.
Although our experimental design only allows us to assess the impact of the floor microbial community on that of the shoe sole, it is of course also true that shoes influence floor microbial communities by depositing microbes that have adhered to them. As participants walk, bacteria may adhere to shoes and be subsequently transferred back to the floor in a dynamic process of continual loading and unloading of microbes. A study of the uptake and deposition of particles via indoor foot traffic showed that in many cases deposition of particles in the size range of bacteria from shoe to floor is greater than uptake by the shoe.
To assess the stability of microbial community structure across the 12 individual shoe and phone time series, we focused on weighted UniFrac distance between samples from consecutive time points and visualized community volatility as a density plot of those distances (Additional file 6: Figure S5). Phone-associated microbial communities were observed to be both less stable (higher median distance) and more variable in their rate of change over time (broader distribution) than shoe-associated communities. By contrast, little difference was observed between the four shoe environments or between the two phone environments. We hypothesize that the high volatility of phone-associated microbial communities is likely due to a small microbial biomass that would be prone to a rapid turnover in community composition and the very high volatility of hand-associated microbiota that has been observed in previous studies.
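The volatility measure described above can be illustrated with a minimal sketch. It assumes the pairwise distances are already available as a lookup keyed by unordered sample pairs; the function name, sample IDs, and distance values are hypothetical and are not data from the study.

```python
from statistics import median


def consecutive_distances(dist, ordered_ids):
    """Return the distance between each pair of consecutive time points.

    dist maps an unordered pair of sample IDs (a frozenset) to a distance;
    ordered_ids lists one time series' sample IDs in chronological order.
    """
    return [dist[frozenset((a, b))] for a, b in zip(ordered_ids, ordered_ids[1:])]


# Hypothetical toy distances for one four-point time series.
toy_dist = {
    frozenset(("t1", "t2")): 0.20,
    frozenset(("t2", "t3")): 0.35,
    frozenset(("t3", "t4")): 0.25,
    frozenset(("t1", "t3")): 0.40,  # non-consecutive pair, ignored below
}

steps = consecutive_distances(toy_dist, ["t1", "t2", "t3", "t4"])
volatility = median(steps)  # a higher median indicates a less stable community
```

Comparing the median and the spread of `steps` between time series corresponds to the "higher median distance" and "broader distribution" comparisons made in the text.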
Biogeographic influence on community structure
In addition to the two time series participants, we also collected individual shoe and phone samples from volunteers at three academic conferences, one in Vancouver, BC (N = 29), one in Washington, D.C. (N = 26), and one in California (N = 34). California samples were taken from two different rooms at the same conference while Vancouver and Washington samples were all taken from the same room. We used these data both to corroborate the patterns of diversity observed in the time series with a larger number of participants and to assess the differentiation in community structure attributable to geographic segregation.
As in the time series analyses, phone and shoe microbial communities were significantly different (Figure 4; Pseudo-F = 38.2 for weighted UniFrac, P < 0.0001). The location at which samples were collected also played a significant role in shaping community similarity, especially in shoe samples (Pseudo-F = 8.8, weighted UniFrac, P < 0.0001) though also significantly in phone samples (Pseudo-F = 4.9, weighted UniFrac, P < 0.0001). Random forest models were able to determine which of the three conferences a sample was taken from significantly better than expected by chance for both the shoe and phone environments (error ratio = 11.7 and 8.0, respectively). This suggests to us that, as seen in the time series data, different sites maintain a significantly different floor microbial community, which in turn shapes the microbial assemblage structure associated with the shoe samples.
Microbial communities show unique structure and composition based on surface type, the identity of the person interacting with the surface, and geographic location. This has significant implications for a variety of applications. While we suggest that it is possible to infer individual identities based on the microbial community associated with their smart phone surface, it is less likely that this assemblage could be used to track where that person has been recently located in space due to the rapid turnover of the surface-associated microbial community. We believe that the personalized nature of the human microbiome and the distinct community types associated with urban and built environments may play a significant role in future forensic investigations.
This article reports the results of two studies, one of which employed longitudinal sampling of shoe and phone microbial communities (time series study) and one of which collected individual shoe and phone samples from individuals attending three geographically disparate conferences (biogeographical study). For the time series study, two participants were recruited to sample their shoes and phones every hour over the course of two 12-hour time periods on consecutive days. Samples were collected by the participants by rubbing sterile swabs pre-moistened with 0.15 M saline solution on each site of interest. Floor samples were taken immediately adjacent to wherever the participant was standing at the time of shoe sampling, not necessarily in an area where they had recently stepped. All samples were immediately placed at −20°C, or on dry ice in cases where samples were collected while participants were away from home or office. At each sampling site, participants made note of their current environment and of all actions taken over the preceding hour. Participant 1 wore flat-bottomed, rubber-soled boots while participant 2 wore sneakers with a more complex sole topography. Each participant wore the same pair of shoes, both with rubber soles, on both days of sampling.
For the biogeographical study, samples were collected at random from participants’ phones and shoes at three national and international conferences during 2012. Samples were collected by the participants by rubbing sterile swabs pre-moistened with 0.15 M saline solution on each site of interest. All samples were immediately placed on dry ice and shipped to Argonne National Laboratory, where they were stored at −80°C until processed.
Total DNA was extracted from swabs using the Extract-N-Amp plant PCR kit (Sigma, St. Louis, USA) following the manufacturer’s protocol with minor modifications. After extraction, DNA was quantified using PicoGreen (Invitrogen, Grand Island, USA) and a plate reader. DNA was then amplified using the Earth Microbiome Project barcoded primer set, adapted for the Illumina HiSeq2000 and MiSeq (Illumina, San Diego, USA) by adding nine extra bases in the adapter region of the forward amplification primer that support paired-end sequencing. The V4 region of the 16S rRNA gene (515F-806R) was amplified with region-specific primers that included the Illumina flowcell adapter sequences and a 12-base barcode sequence [15,16]. Each 25 μl PCR reaction contained the following: 12 μl of MoBio PCR Water (Certified DNA-Free; MoBio, Carlsbad, USA), 10 μl of 5 Prime HotMasterMix (1×), 1 μl of forward primer (5 μM concentration, 200 pM final), 1 μl of Golay Barcode Tagged Reverse Primer (5 μM concentration, 200 pM final), and 1 μl of template DNA. The conditions for PCR were as follows: 94°C for 3 min to denature the DNA, with 35 cycles at 94°C for 45 s, 50°C for 60 s, and 72°C for 90 s, with a final extension of 10 min at 72°C to ensure complete amplification. Amplicons were quantified using PicoGreen (Invitrogen) and a plate reader. Once quantified, different volumes of each of the products were pooled into a single tube so that each amplicon was represented equally. This pool was then cleaned up using the UltraClean® PCR Clean-Up Kit (MoBio, Carlsbad, USA) and quantified using Qubit (Invitrogen, Grand Island, USA). After quantification, the molarity of the pool was determined and the pool was diluted to 2 nM, denatured, and then diluted to a final concentration of 4 pM with a 30% PhiX spike for loading on the Illumina HiSeq2000 sequencer (for the time series study), or to a final concentration of 6.1 pM with a 30% PhiX spike for sequencing on the Illumina MiSeq (for the biogeographical study).
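The pooling and dilution arithmetic described above can be sketched as follows. This is an illustration only: it assumes that "represented equally" means equal DNA mass per amplicon, it ignores the volume changes of the denaturation step, and the function names, concentrations, target mass, and final volume are all hypothetical rather than taken from the protocol.

```python
def pooling_volumes(conc_ng_per_ul, target_ng):
    """Volume (in microliters) of each amplicon needed to contribute target_ng of DNA."""
    return {sample: target_ng / conc for sample, conc in conc_ng_per_ul.items()}


def dilution_volumes(stock_conc, final_conc, final_volume):
    """Solve C1 * V1 = C2 * V2 for the stock volume V1.

    Returns (stock_volume, diluent_volume); both concentrations must use
    the same unit, and the volumes share the unit of final_volume.
    """
    stock_volume = final_conc * final_volume / stock_conc
    return stock_volume, final_volume - stock_volume


# Hypothetical amplicon concentrations in ng/ul.
volumes = pooling_volumes({"sampleA": 10.0, "sampleB": 20.0}, target_ng=100.0)

# Diluting a 2 nM pool to 4 pM (0.004 nM) in a hypothetical 1000 ul final volume.
stock_ul, diluent_ul = dilution_volumes(stock_conc=2.0, final_conc=0.004, final_volume=1000.0)
```

A more concentrated amplicon contributes a proportionally smaller volume, which is why different volumes of each product go into the pool.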
Sequence processing and analysis
Unpaired reads of length 151 bp for both the time series and biogeographic studies were clustered together at 97% identity using the Quantitative Insights Into Microbial Ecology (QIIME) script pick_open_reference_otus.py, with the May 2013 release of Greengenes (greengenes.lbl.gov) as the reference. OTUs comprising only a single read were discarded, and samples were rarefied to an even depth of 1,000 reads.
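The two filtering steps above (discarding single-read OTUs, then rarefying to an even depth) can be sketched in plain Python. This is not the QIIME implementation; the table layout (sample to OTU to count), function names, and toy counts are assumptions made for illustration.

```python
import random


def drop_singleton_otus(table):
    """Remove OTUs whose total count across all samples is a single read."""
    totals = {}
    for counts in table.values():
        for otu, n in counts.items():
            totals[otu] = totals.get(otu, 0) + n
    keep = {otu for otu, n in totals.items() if n > 1}
    return {
        sample: {otu: n for otu, n in counts.items() if otu in keep}
        for sample, counts in table.items()
    }


def rarefy(counts, depth, rng):
    """Subsample one sample's reads without replacement down to `depth` reads.

    Returns None when the sample holds fewer than `depth` reads, so that
    shallow samples can be dropped by the caller.
    """
    reads = [otu for otu, n in counts.items() for _ in range(n)]
    if len(reads) < depth:
        return None
    rarefied = {}
    for otu in rng.sample(reads, depth):
        rarefied[otu] = rarefied.get(otu, 0) + 1
    return rarefied


# Hypothetical toy table; the study used a depth of 1,000 reads.
toy_table = {"s1": {"otuA": 5, "otuB": 1}, "s2": {"otuA": 3, "otuC": 2}}
filtered = drop_singleton_otus(toy_table)
even = rarefy(filtered["s2"], depth=4, rng=random.Random(0))
```

Samples that return None here correspond to samples that fall below the rarefaction depth and therefore drop out of downstream analyses.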
Analysis of beta-diversity was performed by calculating the pairwise weighted and unweighted UniFrac distance between each pair of samples, and the resulting distance matrix was used for all downstream statistical tests of sample similarity. The significance of sample groupings was assessed using PERMANOVA (QIIME’s compare_categories.py script) and statistical significance was calculated by comparing the Pseudo-F statistic to a distribution generated by 10,000 permutations of the randomized dataset.
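For readers unfamiliar with the Pseudo-F statistic, the following is a minimal sketch of a one-way PERMANOVA on a square distance matrix. It is not the QIIME code used in the study; the function names and the toy matrix are hypothetical, and the P value uses the common (hits + 1) / (permutations + 1) convention, which is an assumption here.

```python
import random
from itertools import combinations


def pseudo_f(dist, labels):
    """Pseudo-F: among-group over within-group sums of squared distances."""
    n = len(labels)
    groups = {}
    for idx, label in enumerate(labels):
        groups.setdefault(label, []).append(idx)
    a = len(groups)
    ss_total = sum(dist[i][j] ** 2 for i, j in combinations(range(n), 2)) / n
    ss_within = sum(
        sum(dist[i][j] ** 2 for i, j in combinations(members, 2)) / len(members)
        for members in groups.values()
    )
    ss_among = ss_total - ss_within
    return (ss_among / (a - 1)) / (ss_within / (n - a))


def permanova(dist, labels, permutations=999, seed=0):
    """Return (observed Pseudo-F, permutation P value) after shuffling labels."""
    rng = random.Random(seed)
    observed = pseudo_f(dist, labels)
    shuffled = list(labels)
    hits = 0
    for _ in range(permutations):
        rng.shuffle(shuffled)
        if pseudo_f(dist, shuffled) >= observed:
            hits += 1
    return observed, (hits + 1) / (permutations + 1)


# Hypothetical toy matrix: small within-group and large between-group distances.
toy = [
    [0.0, 0.1, 0.9, 0.9],
    [0.1, 0.0, 0.9, 0.9],
    [0.9, 0.9, 0.0, 0.1],
    [0.9, 0.9, 0.1, 0.0],
]
f_stat, p_value = permanova(toy, ["phone", "phone", "shoe", "shoe"], permutations=99)
```

A large Pseudo-F means that samples differ more between groups than within them. With only four samples the permutation P value cannot become small, so the toy input illustrates the mechanics rather than a realistic test.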
Random forest models
Random forest supervised learning models were used to determine the diagnostic power of microbial community profiles in predicting the surface type or participant a sample originated from. These models form decision trees using a subset of samples to identify patterns associated with a metadata category and then test the accuracy of the tree on the remaining samples not used for training. Each model runs a number of independent trees and reports the ratio of baseline (random) error to model error as a metric for the predictive power of the category’s microbial communities. A greater ratio of baseline to model error indicates a better ability to classify that grouping by microbial community alone. The models were run using the supervised_learning.py command in QIIME, with 1,000 trees per model and tenfold cross validation.
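The error ratio can be illustrated with a short sketch. It assumes that the baseline error is the error of always guessing the most common class, which may differ in detail from what supervised_learning.py computes; the function names, labels, and error value are hypothetical.

```python
def baseline_error(labels):
    """Error rate of always guessing the most common class (assumed baseline)."""
    counts = {}
    for label in labels:
        counts[label] = counts.get(label, 0) + 1
    return 1 - max(counts.values()) / len(labels)


def error_ratio(baseline, estimated):
    """Ratio of baseline error to model error; higher means better classification."""
    if estimated == 0:
        return float("inf")  # a model with no errors beats any baseline
    return baseline / estimated


# Hypothetical example: two equally represented participants, 5% model error.
labels = ["P1"] * 3 + ["P2"] * 3
ratio = error_ratio(baseline_error(labels), estimated=0.05)
```

Under this definition, a ratio close to 1 means the model does no better than guessing, while larger ratios indicate that the microbial community profile carries information about the category.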
For the SourceTracker models, all four shoe samples taken by each participant at a given time point were consolidated and treated as individual sinks (N = 29 and 27 for participants 1 and 2, respectively). All floor samples from the two participants’ time series were collapsed and treated as the two possible sources to the shoe sink community. Models were run following QIIME tutorial guidelines (http://qiime.org/tutorials/source_tracking.html).
Procrustes analysis compares the shape of two PCoA plots by optimally rotating and scaling one plot to best fit the other, with the goodness of fit measured by the M² statistic. P values are generated using a Monte Carlo simulation in which sample identifiers are shuffled (here 1,000 times) and the M² statistic is compared to the distribution drawn from these permutations. The proportion of M² values that are equal to or lower than the actual M² value is the Monte Carlo P value.
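The Monte Carlo P value defined above reduces to a simple proportion, sketched here under the assumption that the permuted M² values have already been computed by a Procrustes routine. The function name and the numbers are hypothetical.

```python
def monte_carlo_p(observed_m2, permuted_m2):
    """Proportion of permuted M^2 values equal to or lower than the observed M^2."""
    hits = sum(1 for m2 in permuted_m2 if m2 <= observed_m2)
    return hits / len(permuted_m2)


# Hypothetical values: a lower M^2 means a better fit between the two plots.
p_value = monte_carlo_p(0.12, [0.50, 0.10, 0.44, 0.61])
```

Because a lower M² indicates a better fit, a small P value means that few label shufflings fit as well as the true pairing of samples.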
Only time points in which all four shoe samples passed quality filtering were considered (N = 24 for participant 1 and 19 for participant 2). For each participant, samples were divided by shoe environment and four different sets of principal coordinates were computed based on weighted UniFrac distance between samples. The QIIME script transform_coordinate_matrices.py was used for Procrustes analysis, with the left heel coordinates used as the reference and the other three coordinate matrices transformed to best fit the reference.
Availability of supporting data
All sequencing data as well as the OTU table and mapping file are available at http://figshare.com/articles/Forensic_analysis_of_the_microbiome_of_phones_and_shoes/1311743.
This work was enabled by the generous support of the Alfred P. Sloan Foundation. This work was supported in part by the U.S. Dept. of Energy under Contract DE-AC02-06CH11357. S.M.G. was supported by an EPA STAR Graduate Fellowship and by a National Institutes of Health Training Grant 5 T-32 EB-009412.
1. Kelley ST, Gilbert JA. Studying the microbiology of the indoor environment. Genome Biol. 2013;14. doi:10.1186/gb-2013-14-2-202.
2. Lax S, Smith DP, Hampton-Marcell J, Owens SM, Handley KM, Scott NM, et al. Longitudinal analysis of microbial interaction between humans and the indoor environment. Science. 2014;345(6200):1048–52.
3. Gibbons SM, Schwartz T, Fouquier J, Mitchell M, Sangwan N, Gilbert JA, et al. Ecological succession and viability of human-associated microbiota on restroom surfaces. Appl Environ Microbiol. 2015;81(2):765–73.
4. Brooks B, Firek BA, Miller CS, Sharon I, Thomas BC, Baker R, et al. Microbes in the neonatal intensive care unit resemble those found in the gut of premature infants. Microbiome. 2014;2(1). doi:10.1186/2049-2618-2-1.
5. Meadow JF, Altrichter AE, Kembel SW, Moriyama M, O’Connor TK, Womack AM, et al. Bacterial communities on classroom surfaces vary with human contact. Microbiome. 2014;2(7):1–7.
6. Song SJ, Lauber C, Costello EK, Lozupone CA, Humphrey G, Berg-Lyons D, et al. Cohabiting family members share microbiota with one another and with their dogs. eLife. 2013;2:e00458.
7. Groer MW, Luciano AA, Dishaw LJ, Ashmeade TL, Miller E, Gilbert JA. Development of the preterm infant gut microbiome: a research priority. Microbiome. 2014;2(38). doi:10.1186/2049-2618-2-38.
8. Lax S, Nagler C, Gilbert JA. Our interface with the built environment: immunity and the indoor microbiota. Trends Immunol. 2015. In press.
9. Fierer N, Lauber CL, Zhou N, McDonald D, Costello EK, Knight R. Forensic identification using skin bacterial communities. Proc Natl Acad Sci. 2010;107(14):6477–81.
10. Meadow JF, Altrichter AE, Green JL. Mobile phones carry the personal microbiome of their owners. PeerJ. 2014;2:e447. doi:10.7717/peerj.447.
11. Metcalf JL, Wegener Parfrey L, Gonzalez A, Lauber CL, Knights D, Ackermann G, et al. A microbial clock provides an accurate estimate of the postmortem interval in a mouse model system. eLife. 2013;2:e01104.
12. Blaser MJ. Harnessing the power of the human microbiome. Proc Natl Acad Sci. 2010;107(14):6125–6.
13. Knights D, Kuczynski J, Charlson ES, Zaneveld J, Mozer MC, Collman RG, et al. Bayesian community-wide culture-independent microbial source tracking. Nat Methods. 2011;8:761–3.
14. Sippola MR, Sextro RG, Thatcher TL. Measurements and modeling of deposited particle transport by foot traffic indoors. Environ Sci Technol. 2014;48:3800–7.
15. Caporaso JG, Kuczynski J, Stombaugh J, Bittinger K, Bushman FD, Costello EK, et al. QIIME allows analysis of high-throughput community sequencing data. Nat Methods. 2010;7:335–6.
16. Caporaso JG, Lauber CL, Walters WA, Berg-Lyons D, Huntley J, Fierer N, et al. Ultra-high-throughput microbial community analysis on the Illumina HiSeq and MiSeq platforms. ISME J. 2012;6:1621–4.
17. Lozupone C, Knight R. UniFrac: a new phylogenetic method for comparing microbial communities. Appl Environ Microbiol. 2005;71:8228–35.
This is an Open Access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/4.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly credited. The Creative Commons Public Domain Dedication waiver (http://creativecommons.org/publicdomain/zero/1.0/) applies to the data made available in this article, unless otherwise stated.