Microbial DNA on the move: sequencing based detection and analysis of transduced DNA in pure cultures and microbial communities

Horizontal gene transfer (HGT) plays a central role in microbial evolution. Our understanding of the mechanisms, frequency and taxonomic range of HGT in polymicrobial environments is limited, as we currently rely on historical HGT events inferred from genome sequencing and on studies involving cultured microorganisms. We lack approaches to observe ongoing HGT in microbial communities. To address this knowledge gap, we developed a DNA-sequencing-based “transductomics” approach that detects and characterizes microbial DNA transferred via transduction. We validated our approach using model systems representing a range of transduction modes and show that we can detect numerous classes of transducing DNA. Additionally, we show that we can use this methodology to obtain insights into DNA transduction among all major taxonomic groups of the intestinal microbiome. This work extends the genomic toolkit for the broader study of mobile DNA within microbial communities and could be used to understand how phenotypes spread within microbiomes.

Significance Statement

Microbes can rapidly evolve new capabilities by acquiring genes from other organisms through a process called horizontal gene transfer (HGT). HGT occurs via different routes, one of which is the transfer of DNA carried by microbe-infecting viruses (phages) or virus-like agents. This process is called transduction and has primarily been studied in the lab using pure cultures, or indirectly in environmental communities by analyzing signatures in microbial genomes that reveal past transduction events. The transductomics approach that we present here allows for the detection and characterization of genes that are potentially transferred between microbes in complex microbial communities at the time of measurement, and thus provides insights into ongoing horizontal gene transfer in real time.


The importance of horizontal gene transfer (HGT) as a driver of rapid evolution and adaptation in microbial communities and host-associated microbiomes has become increasingly recognized(1, 2). Publicly available genomes and metagenomes have revealed pervasive horizontally acquired genes in almost all available genomes. A study of HGT in the human microbiome, for example, identified >10,000 recently transferred genes in 2,235 analyzed genomes(3). HGT has been implicated in the spread of antibiotic resistance genes(4), toxin and other virulence genes(5, 6), genes that enable microbes in the intestine to digest dietary compounds(7), and metabolic genes that augment microbial metabolism with critical functions in environmental populations(8). Despite its recognized importance, our understanding of the taxonomic range, frequency, and mechanisms of HGT is still limited. Most studies of HGT in microbiomes rely on analysis of microbial genomes(3, 9) and thus attempt to reconstruct historical HGT. What we currently lack are methods that measure ongoing HGT and identify the mechanism of DNA transfer. Here we present a novel method that specifically determines the sequence of DNA that is transferred between cells via one of the major known pathways for DNA transfer: transduction.

Currently, there are three major ways in which genetic material is known to be exchanged between microbial cells: (1) transformation, the uptake of DNA by naturally competent cells; (2) conjugation, the exchange of genetic material (e.g. plasmids) through direct contact between donor and recipient cells; and (3) transduction, the transfer of genetic material by viruses or virus-like particles (VLPs)(2).
Here we focus on transduction only. There are several known types of transduction, including classic specialized and generalized transduction, and more recently discovered types, including gene transfer agents (GTAs), lateral transduction and hijacking of bacteriophage (phage) particles by genomic islands(10-12). During specialized transduction, DNA adjacent to prophage integration sites in the bacterial genome is co-excised at a low frequency and packaged into phage heads after prophage genome replication. In generalized transduction, non-random pieces of the host bacterial genomic DNA or plasmids are packaged at low frequency into phage particles when a lytic phage infects and replicates in a bacterial cell. This non-random packaging is mediated by genomic features that resemble the packaging site (pac site) on the phage genome, which the phage packaging machinery uses as the start site for packaging phage DNA into the capsid(13). In lateral transduction, prophages replicate while still integrated in the host genome, and prophage packaging initiates in situ, ultimately leading to high-frequency packaging of host DNA in a unidirectional fashion away from the prophage integration site(12). GTAs are phage-like particles encoded in bacterial genomes that package random pieces of the genomic DNA upon production and can transfer these pieces to other cells(10). In contrast to phages, GTAs do not carry sufficient DNA to support their own reproduction in the target cells. Lastly, some genomic islands, including

In the sample preparation step the sample is gently homogenized and split into two subsamples.
One subsample is used directly for whole-community DNA extraction; the other is subjected to ultra-purification of virus-like particles (VLPs) using a combination of filtration, DNase digestion and CsCl density gradient centrifugation, as previously described(24).

specialized transducing bacteriophage λ to analyze the sequencing coverage patterns produced by specialized transduction. In specialized transduction, a prophage that is integrated in the chromosome of the bacterial host packages host-genome-derived DNA at low frequency due to imprecise excision from the genome upon prophage induction. Prophage λ integrates between the gal (galactose metabolism) and bio (biotin metabolism) operons in the E. coli genome. In rare cases, λ excision is imprecise and either the gal or the bio operon is excised and packaged in the phage particle (Fig. 2a)

into the chromosome or remains as an extrachromosomal element, which is diluted out in the population during cell divisions.

Using the transductomics approach, we found that coverage of the E. coli genome with sequencing reads derived from purified λ phage particles is almost exclusively restricted to the λ phage integration site and two ~25 kbp regions to the left and right of the λ integration site (Fig. 2b and c). These flanking regions with read coverage represent the regions that are transduced by λ, as indicated by the presence of the bio and gal operons in them (Fig. 2c). The coverage of the λ prophage region is roughly 10,000-fold greater than the coverage of the flanking transduced regions, indicating that only a small number of phage particles actually carry transduced DNA and thus are specialized transducing particles.

Using the E. coli-prophage λ model, we show that specialized transduction by a prophage produces a unique read coverage pattern.
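The step from a coverage ratio to a particle frequency can be made explicit with a short calculation. The Python sketch below is a hypothetical illustration, not the authors' analysis: it assumes that mean read coverage over a region scales linearly with the number of particles carrying that region, and the coverage values are invented numbers chosen only to reproduce the ~10,000-fold ratio. The 1:1,000,000 frequency is the previously reported value for successful transduction of the gal operon(28); the function names are hypothetical.

```python
def transducing_fraction(prophage_cov: float, flank_cov: float) -> float:
    """Estimate the fraction of phage particles that carry flanking host DNA.

    Assumption (hypothetical): mean read coverage over a region scales
    linearly with the number of particles that carry that region.
    """
    if prophage_cov <= 0:
        raise ValueError("prophage coverage must be positive")
    return flank_cov / prophage_cov


def successful_per_transducing(particle_fraction: float, transduction_freq: float) -> float:
    """Successful transductions per transducing particle."""
    return transduction_freq / particle_fraction


# Illustrative mean coverages (not measured values) with a 10,000-fold ratio
fraction = transducing_fraction(prophage_cov=1_000_000.0, flank_cov=100.0)
print(f"transducing particles: 1 in {1 / fraction:,.0f}")

# 1 successful gal transduction per 1,000,000 lambda particles, as reported previously
success = successful_per_transducing(fraction, transduction_freq=1e-6)
print(f"successful transductions per transducing particle: {success:.2f}")
```

The second printed value, about 0.01, restates the roughly 100-fold gap between the estimated frequency of transducing particles and the reported frequency of successful transduction.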
Analysis of the read coverage pattern of the transduced DNA region adjacent to the prophage DNA furthermore allows determination of both the size and content of the transduced host genome region (~50 kbp in total in the case of λ) and estimation of the frequency with which transducing particles are produced (1:10,000 in the case of λ). Based on our data, the number of transducing particles produced is roughly 100-fold higher than previously reported values for successful transduction of the gal operon by phage λ (one successful transduction per 1,000,000 λ particles)(28), which indicates that only a small fraction of λ particles carrying host DNA ultimately leads to successful transduction.

Figure 2. Genome coverage pattern associated with prophage λ induction and specialized transducing prophage λ.

The upper box shows coverage patterns for whole genome sequencing reads and purified phage particle reads mapped to the E. coli genome. c) In the lower box, an enlargement of the purified phage read coverage for the prophage λ region is shown (log scale). The positions of the gal and bio operons, which are known to be transduced by prophage λ, are indicated(27).

Generalized transduction of the Salmonella enterica serovar Typhimurium LT2 genome by phage P22 and the E. coli genome by phage P1:

We used two well-studied generalized transducing bacteriophages, P22 and P1, to analyze the sequencing coverage patterns produced during generalized transduction events. In generalized transduction, nonspecific host chromosomal DNA is packaged into phage particles during lytic infection and can then be injected into a new host cell (Fig. 3a)

whole genome sequencing of S. enterica yielded even coverage (Fig. 3b). Regions of high or low P22 read coverage corresponded to the previously reported transduction frequencies for 23 out of 28 chromosomal markers(30) (Fig. 3b). Only one region, at around 4 Mbp, for which high transduction frequencies had been reported, did not show high coverage (Fig. 3b), which might be due to differences in pac sites within this region between the S. enterica strain used in our study and the strain used in the 1982 study.

The coverage of P22-derived reads showed a distinct pattern of peaks that rise vertically on one side and decline slowly over several hundred kbp on the other side. We speculate that the vertical edge of each peak corresponds to the location of a pac site at which the packaging of DNA into phage heads is initiated, and that the slope of the peak indicates the processivity of the headful packaging mechanism (i.e. how many headfuls are packaged into particles before the packaging apparatus dissociates from the chromosome).
This speculation is based on several facts: (1) the size of host DNA carried by transducing particles corresponds to the size of the P22 genome (~44 kbp)(31); (2) the P22 genome is replicated by rolling circle replication, which produces long concatemers of P22 DNA, and a specific sequence on the phage DNA (pac site) initiates the packaging of these concatemers into phage heads using a headful mechanism(31); (3) the packaging of phage DNA continues sequentially along the P22 genome concatemer, with a decreasing probability for each subsequent headful to be encapsulated in a phage particle(30); and (4) there are five to six sequences on the S. enterica genome that are similar to the pac site, which leads to packaging of Salmonella DNA into P22 particles upon P22 infection, albeit with much lower frequency than P22 DNA(30).

For E. coli phage P1, the majority of sequencing reads from purified P1 particles mapped to the P1 genome, and only 4.5% of the reads mapped to the E. coli genome. The percentage of transducing P1 phages was previously reported to be 6%(32). We also observed that the P1-derived reads mapping to the E. coli genome covered the genome unevenly. However, the pattern was less pronounced than for P22 and S. enterica (Fig. 3c). This lower unevenness in sequencing read coverage agrees with previous data on transduction frequencies of chromosomal markers, which found at most a 10-fold difference in transduction frequency across the E. coli genome(33).

packaged into the phage head by a so-called headful packaging mechanism, which relies on the recognition of a packaging (pac) site. The bacterial host chromosomes contain sites that resemble the pac site and thus lead to packaging of non-random pieces of the host chromosome into phage heads. The packaging happens in a processive fashion, i.e.
after one phage head has been filled, the packaging machinery continues to fill the next phage head with the remaining DNA molecule. The likelihood that the packaging machinery dissociates from the molecule increases the further away from the pac site it gets, thus leading to decreasing packaging efficiency with distance. b) Salmonella enterica genome coverage pattern associated with generalized transduction by phage P22. Whole genome sequencing reads and purified phage particle reads were mapped to the S. enterica genome. In the lower part

transduction with a maximum 30-fold difference between the lowest and highest covered regions (Fig. 4d). Reads from whole genome sequencing of B. subtilis covered the genome evenly, with coverage increasing slightly toward the origin of replication, as expected(39). The genomic region containing PBSX had lower read coverage in VLP-derived reads than neighboring genomic regions. This is consistent with results from a previous study, which found that a genetic marker integrated in the PBSX region was packaged into particles less frequently than a marker in a neighboring region(40). Interestingly, the genomic region containing the prophage SPbeta, which is excised upon mitomycin C treatment(41), did not show higher or lower coverage in the VLP-derived sequencing reads compared to neighboring genomic regions (Fig. 4d).

Our results show that packaging of host DNA by the GTA-like PBSX element of B. subtilis produces a distinct and non-random sequencing coverage pattern that bears similarities to the read coverage pattern produced by the generalized transducing phage P1 (Fig. 3c).
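The ~30-fold difference reported for PBSX compares the most and least covered regions of the genome. One hypothetical way to derive such a figure from read alignments is sketched below in Python: per-base depth is built from alignment intervals with a difference array, averaged into fixed-size bins, and the highest non-zero bin is divided by the lowest. The bin size, the 0-based half-open interval convention, the exclusion of uncovered bins and all function names are assumptions for illustration, not the parameters used in this study.

```python
from statistics import mean


def coverage_from_reads(genome_length: int, reads: list[tuple[int, int]]) -> list[int]:
    """Per-base read depth from 0-based, half-open (start, end) read alignments.

    Uses a difference array: +1 where a read starts, -1 where it ends,
    followed by a running sum along the genome.
    """
    diff = [0] * (genome_length + 1)
    for start, end in reads:
        diff[start] += 1
        diff[end] -= 1
    depth, running = [], 0
    for pos in range(genome_length):
        running += diff[pos]
        depth.append(running)
    return depth


def binned_means(depth: list[int], bin_size: int) -> list[float]:
    """Mean depth in consecutive bins (the last bin may be shorter)."""
    return [mean(depth[i:i + bin_size]) for i in range(0, len(depth), bin_size)]


def fold_range(bins: list[float]) -> float:
    """Ratio of highest to lowest binned coverage; zero-coverage bins are skipped."""
    covered = [b for b in bins if b > 0]
    if not covered:
        raise ValueError("no covered bins")
    return max(covered) / min(covered)


# Toy 40 bp "genome" with reads stacked unevenly over four 10 bp bins
reads = [(0, 10)] * 3 + [(10, 20)] + [(20, 30)] * 6 + [(30, 40)] * 2
bins = binned_means(coverage_from_reads(40, reads), bin_size=10)
print(bins, fold_range(bins))
```

With real data the bin size matters: small bins pick up local noise and tend to inflate the ratio, while very large bins smooth away the peaks that make up the pattern.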

The patterns for different transduction modes have distinct characteristics that affect detection sensitivity and the false positive rate. For detection of prophage induction and specialized transduction patterns there are three potential challenges: (1) the genome sequence fragment (contig) used for read mapping needs to be long enough to encompass both the prophage genome and a portion of the host genome; (2) assembly artifacts (chimeric contigs consisting of multiple source genomes) can lead to highly uneven read coverage that could resemble an induced prophage pattern. In our approach this is mitigated by mapping whole metagenome and VLP reads to the same contigs: we expect even read coverage for the whole metagenome read mapping, which indicates correct assembly; and (3) if read coverage is too low, patterns will not be sufficiently distinct. The frequency of specialized transduction can be expected to be specific to particular host species and prophages. Nevertheless, we tested the lower limit of read coverage needed for detection of prophage induction and specialized/lateral transduction by downsampling the VLP reads from E. coli prophage λ and E. faecalis prophages (Figs. 2 and 4b) to coverage levels similar to those we observed in the mouse case study below, which ranged from several tens-fold to several thousand-fold. For E. coli prophage λ, we found that at ~6000x maximum read coverage (5% of total reads) the specialized transduction pattern was still weakly visible, but it disappeared at lower coverages, while the induction of the prophage itself was still identifiable at read coverages of 20x (0.01% of total reads) and less (Fig. S1a). For the E. faecalis prophages, specialized/lateral transduction patterns were still visible at ~500x coverage for the pp1 region and at ~150x for the pp5 region. Prophage induction was detectable at coverages well below 40x (Fig. S1b). These results indicate that specialized and lateral transduction, as well as prophage induction, can be detected with the read coverages obtained in shotgun metagenomic sequencing of VLPs.

For detection of generalized transduction and GTA-mediated DNA transfer patterns, the two main challenges are: (1) similar patterns could be generated by contamination of the ultra-purified VLPs with DNA from microbial cells, which can be addressed, for example, by comparing contig rank abundances between whole metagenome and VLP read coverage (see the case study below); and (2) the pattern is difficult to recognize on short contigs, because the sloping can extend across hundreds of kbp. To test whether generalized transduction or GTA-like patterns can be detected on shorter contigs, we used the P22, P1 and PBSX data to simulate how contig length impacts pattern visibility, by examining the coverage patterns of 200 kbp stretches of the genome (Fig. S2). We found that detectability of generalized and GTA-like patterns in 200 kbp stretches depended on where the stretch was located within the overall read coverage pattern. In some cases distinguishable coverage sloping was observed (e.g. #2 in Fig. S2a and #2 in S2c); in other cases coverage looked even or irregular (e.g. #4 in S2b and #4 in S2c). These results indicate that generalized transduction and GTA-mediated DNA transfer can be detected from the contig lengths produced by short-read shotgun metagenomics of microbiome samples; however, some DNA transfer events are likely missed if the assembled contigs do not cover regions that show the characteristic coverage sloping associated with these transfer events.
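Why detectability depends on where a 200 kbp stretch falls can be illustrated with a toy version of the packaging mechanism speculated for P22: packaging starts at pac-like sites and proceeds in ~44 kbp headfuls, each subsequent headful being packaged with lower probability. The Python sketch below is hypothetical and is not the analysis behind Fig. S2. The processivity value (0.8 per headful), the two pac-like site positions, unidirectional (rightward) packaging on a linear genome, and the use of an ordinary least-squares slope as a "sloping" score are all illustrative assumptions, and the function names are invented.

```python
def expected_coverage(genome_len: int, pac_sites: list[int], headful: int = 44_000,
                      processivity: float = 0.8, bin_size: int = 10_000) -> list[float]:
    """Expected relative VLP coverage per bin under a toy headful-packaging model.

    Hypothetical model: packaging starts at each pac-like site and runs
    rightward along a linear genome; headful k (k = 0, 1, 2, ...) is
    packaged with probability processivity ** k.
    """
    coverage = []
    for b in range(genome_len // bin_size):
        centre = b * bin_size + bin_size // 2
        total = 0.0
        for site in pac_sites:
            if centre >= site:
                k = (centre - site) // headful  # index of the headful covering this bin
                total += processivity ** k
        coverage.append(total)
    return coverage


def ols_slope(values: list[float]) -> float:
    """Ordinary least-squares slope of values against their bin index."""
    n = len(values)
    x_bar = (n - 1) / 2
    y_bar = sum(values) / n
    num = sum((x - x_bar) * (y - y_bar) for x, y in enumerate(values))
    den = sum((x - x_bar) ** 2 for x in range(n))
    return num / den


def window_slopes(coverage: list[float], window_bins: int) -> list[float]:
    """Relative slope (slope / window mean) of non-overlapping windows."""
    slopes = []
    for start in range(0, len(coverage) - window_bins + 1, window_bins):
        window = coverage[start:start + window_bins]
        m = sum(window) / len(window)
        slopes.append(ols_slope(window) / m if m > 0 else 0.0)
    return slopes


# 1 Mbp toy genome, two pac-like sites, 10 kbp bins, 200 kbp windows (20 bins)
cov = expected_coverage(1_000_000, pac_sites=[100_000, 600_000])
for i, s in enumerate(window_slopes(cov, window_bins=20)):
    print(f"window {i}: {i * 200}-{(i + 1) * 200} kbp, relative slope {s:+.3f}")
```

In this toy profile the first window contains the sharp rise at a pac-like site and has a positive slope, whereas the window immediately downstream shows only the slow decline. This mirrors, in a simplified way, the observation that the same underlying pattern can look different depending on which stretch of the genome a contig covers.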

Case study: High occurrence of transduction in the intestinal microbiome
We next assessed the power and application of our transductomics approach for detecting transduced DNA in VLPs from complex microbiomes. We sequenced the whole metagenome (~390 million reads) and VLPs (~360 million reads) from a fecal sample of one mouse to high coverage. The VLPs were ultra-purified using the multi-step procedure shown in Fig. 1, which we previously showed efficiently removes DNA from microbial cells and from the mouse host present in fecal samples(24). We were able to assemble 2143 contigs >40 kbp from the whole metagenome reads, with the largest contig being 813 kbp (ENA accession for assembly: ERZ1273841). We discarded contigs <40 kbp because detection of transduction patterns requires coverage analysis of a sufficiently large genomic region. We mapped the metagenomic and VLP reads to the contigs >40 kbp to obtain the read coverage patterns. For the complete metagenome, 44% of all reads mapped to the contigs >40 kbp, and for the VLPs 10% of all reads did, indicating that a large portion of DNA carried in VLPs is derived from prophages and microbial hosts. Of the 2143 contigs, 1957 showed a "standard" read coverage pattern (Fig. 5a, Suppl. Table S1), i.e. high, even coverage of the contigs with metagenomic reads and low, even or no coverage with VLP reads, indicating no mobilization of host DNA in VLPs. The remaining 186 contigs (8.7% of all contigs >40 kbp) showed a read coverage pattern that indicates potential mobilization of DNA in VLPs (Fig. 5b-f, Suppl. Table S1).

To verify that the multi-step VLP ultra-purification procedure employed for this study efficiently removes DNA from microbial cells, ruling out potential contamination with microbial host DNA, we further assessed the read coverage patterns of the 186 contigs in comparison to all contigs. We ranked all 2143 contigs by their normalized coverage for both the whole metagenome and the purified VLP samples (i.e. the average x-fold read coverage of a contig divided by the sum of the average x-fold read coverages of all contigs in the sample), with the highest normalized coverage assigned rank 1 (Table S1). The expectation is that contigs whose DNA is carried in VLPs have the same or a numerically lower rank in the VLP sample than in the whole metagenome sample, whereas contigs whose VLP reads derive from microbial contamination should have a numerically higher rank in the VLP sample than in the whole metagenome sample, because randomly contaminating DNA would be depleted in the purified VLP sample. We found that 26 of the 186 contigs with coverage patterns suggesting DNA mobilization had a numerically higher rank for VLP read coverage than for whole metagenome read coverage, indicating that these 26 patterns are potentially due to contamination or, alternatively, due to very low efficiency of mobilization.

We classified all contigs taxonomically using CAT(42) (Suppl. Tables S2 and S3). The majority of contigs were classified as Bacteroidetes (all contigs: 805, transduction pattern contigs: 83), Firmicutes (all: 586, transduction pattern: 42), Proteobacteria (all: 89, transduction pattern: 3), or not classified at the phylum level (all: 527, transduction pattern: 34). With a few exceptions, the relative abundance of contigs assigned to specific phyla was similar between the set of all contigs >40 kbp and the subset of contigs with transduction patterns. The phyla that differed in relative contig abundance were Proteobacteria, with less than half the relative abundance in the contigs with transduction patterns, and Verrucomicrobia and Candidatus Saccharibacteria, with 3.5x and 11.5x the relative contig abundance, respectively, in the contigs with transduction patterns. Since members of Cand. Saccharibacteria have been shown to be extremely small (200 to 300 nm)(43), they likely share similar properties with bacteriophages in terms of size and density and thus might get enriched in the VLP fraction. In fact, all transduction patterns of Cand. Saccharibacteria contigs had been classified as "unknown" or "unknown, potentially a small bacterium" before the taxonomic identity of the contigs was known.

We classified the type of DNA mobilization/transduction in the 186 contigs with a mobilization pattern based on the visual characteristics of the mobilized region in the VLP read coverage, as well as on annotated genes within the mobilized region. For example, we classified a mobilization pattern as a prophage if it showed high coverage with sharp edges on both sides (compare Fig. 2), with the presence of characteristic phage genes (e.g. capsid proteins) as an additional but not required criterion. We observed 74 contigs that indicated induced prophages. Of these, 12 (16%) showed indications of specialized transduction, i.e. read coverage above the base level of the contig in regions adjacent to the prophage (Fig. 5b and d). Additionally, we classified 8 patterns as potential prophages or chromosomal islands, as they showed the same pattern as other prophages, but we were unable to find recognizable phage genes in the annotations.

We found patterns of potential generalized transduction or GTA-carried DNA in 46 contigs; however, some of these patterns were observed on shorter contigs and could thus be incorrect classifications (Fig. 5c). One of the contigs (NODE_5, classified as Bacteria) with a generalized transduction or GTA pattern additionally showed a sharp coverage drop in a ~15 kbp region only in the VLP reads (Fig. 5c).
This region is flanked by a tRNA gene and carries one gene annotated as a potential virulence factor, the internalin used by Listeria monocytogenes for host cell entry(44). This region might represent a chromosomal island that was excised from the bacterial chromosome prior to or during production of the unknown VLP and that did not get encapsulated in the VLP. Alternatively, similar strains might be present in the sample, of which only some carry the chromosomal island, with strains carrying the island being less prone to producing the VLPs, e.g. due to superinfection resistance against a generalized transducing phage conferred by the chromosomal island.

We observed 9 patterns that showed strong differences between whole metagenome read coverage and VLP read coverage but did not correspond to any of the patterns we analyzed in our proof-of-principle work. However, based on gene annotations, we determined that these patterns likely represent retrotransposons or other transposable elements. For example, on contig NODE_1640 (classified as Bacteria by CAT) we observed high coverage with VLP reads on one part of the contig, which carries a gene annotated as a retrotransposon (Fig. 5e). Interestingly, the retrotransposon region is flanked by an ltrA gene, which is encoded on a bacterial group II intron and encodes a maturase, an enzyme with reverse transcriptase and endonuclease activity(45). Surprisingly, the region containing the ltrA gene had above-average coverage in the whole metagenome reads, but no coverage in the VLP reads. This suggests that the intron actively reverse splices into expressed RNA with subsequent formation of cDNA(45), leading to increased copy number of this genomic region.
As another example, on contig NODE_1223 (classified as Bacteria by CAT) a region containing a transposase gene is strongly overrepresented in the VLP reads, suggesting that this region is a transposable element that is packaged into a VLP (Fig. 5e).

Finally, we determined that two patterns likely represent lytic phages, and we classified 47 patterns as "unknown" transduced DNA, as the coverage pattern is uneven, indicating transport in VLPs, but we could not determine the type of transport. For example, in contig NODE_646 (classified as Clostridiales by CAT) we observed a potential prophage pattern in which we found some of the main phage-relevant genes, such as the major capsid protein; however, within the prophage pattern we observed high-coverage spikes for which we currently have no good explanation (Fig. 5f).

Table S2). The complete metagenome reads and the purified VLP reads were mapped to the exact same set of contigs assembled from the complete metagenome reads. The read coverage pattern of the complete metagenome reads provides evidence for the correct assembly of the contigs and allows potential transduction-derived VLP read coverage patterns to be distinguished from VLP coverage patterns due to contamination with microbial DNA. With the exception of panel A), all shown contigs have the same or a numerically lower abundance rank for VLP read coverage as compared to complete metagenome read coverage, indicating that their overall read coverage was enriched in the VLP samples. Read coverage due to VLP sample contamination with cellular DNA is expected to result in a numerically higher abundance rank for VLP read coverage, as compared to complete metagenome read coverage.
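The rank comparison used to screen the 186 candidate contigs for possible contamination can be expressed in a few lines. The Python sketch below is a hypothetical illustration under stated assumptions: contig names and coverage values are invented, per-contig mean coverages are assumed to be already computed from the read mappings, ties are broken by input order, and the function names are not from this study. Dividing by the sample total does not change the ordering within a sample, so the normalization only matters when normalized values are compared across samples; it is kept here to mirror the definition in the text.

```python
def normalized_ranks(mean_coverage: dict[str, float]) -> dict[str, int]:
    """Rank contigs by normalized coverage; rank 1 = highest.

    Normalized coverage = contig mean coverage / sum of mean coverages in the
    sample. Ties keep their input order (an arbitrary choice).
    """
    total = sum(mean_coverage.values())
    normalized = {c: cov / total for c, cov in mean_coverage.items()}
    ordered = sorted(normalized, key=normalized.get, reverse=True)
    return {contig: rank for rank, contig in enumerate(ordered, start=1)}


def possible_contamination(meta_cov: dict[str, float], vlp_cov: dict[str, float],
                           candidates: list[str]) -> list[str]:
    """Candidate contigs whose VLP rank is numerically higher than their metagenome rank."""
    meta_rank = normalized_ranks(meta_cov)
    vlp_rank = normalized_ranks(vlp_cov)
    return [c for c in candidates if vlp_rank[c] > meta_rank[c]]


# Hypothetical mean coverages for four contigs in each sample
meta = {"contig_A": 100.0, "contig_B": 50.0, "contig_C": 10.0, "contig_D": 5.0}
vlp = {"contig_A": 1.0, "contig_B": 2.0, "contig_C": 40.0, "contig_D": 0.5}
print(possible_contamination(meta, vlp, candidates=["contig_A", "contig_C"]))
```

Here contig_A drops from rank 1 in the metagenome to rank 3 in the VLP sample and is flagged, while contig_C rises from rank 3 to rank 1 and is kept as a plausible mobilization pattern.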

The transductomics approach that we developed should be applicable to a broad range of environments, ranging from host-associated microbiomes to soils and aqueous environments.

One of the major surprises for us when analyzing the mouse intestinal transductome data was that around one quarter of the transduction patterns that we identified are unknown. These patterns showed even coverage in the whole metagenome reads and strongly uneven coverage in the VLP reads (e.g. Fig. 4e); however, we were unable to associate them clearly with any of the transduction modes that we investigated with pure cultures. We foresee two types of future studies to characterize the nature of the transducing particles that lead to these unknown patterns and to rule out that they are artifacts. First, read coverage patterns of newly discovered modes of transduction have to be analyzed with the "transductomics" approach to correlate them with patterns observed in microbiomes and microbial communities. While we investigated the transduction patterns associated with both major known transduction pathways, as well as more recently discovered ones, novel modes of transduction are continuously being discovered. Novel transduction modes that still need to be characterized with our approach include new types of GTAs(10), lateral transduction(12) and DNA transfer in outer membrane vesicles(50, 51). Second, approaches that link specific transduced DNA sequences to the identity of transducing particles in microbial community samples can be developed. We envision, for example, that high-resolution filtration and density-gradient-based separation of individual VLPs will allow the transduced DNA (identified by sequencing) to be linked to the identity of the transducing VLPs, using proteomics to identify VLP proteins.
Using and further developing these approaches will allow us to increase the range of transduction modes that can be detected in microbial communities, and potentially to reveal types of transduction that are not yet known from pure culture studies.

We see several pathways for improving the sensitivity, accuracy and throughput of the transductomics approach in the future. Currently, our ability to detect generalized transduction patterns is limited by the fact that detection of these patterns requires long stretches of the microbial host genome to be assembled. Our P22 and P1 data show that these patterns stretch across genomic regions >500 kbp. Additionally, high sequencing coverage is needed for the detection of these patterns. Assembly of long contigs in metagenomes of high-diversity communities is currently hampered by the relatively short read lengths of the sequencers that allow for high coverage. We expect that increasing read numbers from long-read sequencing technologies such as PacBio and Oxford Nanopore will in the future allow us to sequence complex microbiomes to sufficient depth for the assembly of long metagenomic contigs. Combining long-read sequencing of whole-community metagenomes with a short-read, high-coverage approach for the VLP fraction will then provide more sensitive and accurate detection of generalized transduction patterns. In addition to improvements in long-read sequencing, we expect the development of computational tools for the automatic or semi-automatic detection of transduction patterns in read coverage data from paired whole metagenome and VLP metagenome sequencing. There is a large number of possible parameters that could be used to train a machine learning algorithm to detect transduction patterns.
These parameters include differences between average and maximal read coverage for VLP reads (Table S1), and the comparison of contig rank abundance based on coverage, which we used to cross-check transduction patterns for signatures of microbial DNA contamination. Such computational tools will enable high-throughput detection of transduction patterns in many samples, which is currently limited by the need for visual inspection of patterns.
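The parameters named above, together with the evenness of whole-metagenome coverage that the text uses as an assembly check, could be turned into per-contig features for such a tool. The Python sketch below is a hypothetical illustration, not a tool described in this work: the feature names and the function are invented, and the inputs (binned depths for one contig and its abundance ranks in both samples) are assumed to be precomputed.

```python
from statistics import mean, pstdev


def contig_features(meta_depth: list[float], vlp_depth: list[float],
                    meta_rank: int, vlp_rank: int) -> dict[str, float]:
    """Hypothetical per-contig features for a transduction-pattern classifier."""
    vlp_mean = mean(vlp_depth)
    meta_mean = mean(meta_depth)
    return {
        # Peaks in VLP coverage relative to the contig average
        "vlp_max_minus_mean": max(vlp_depth) - vlp_mean,
        "vlp_max_over_mean": max(vlp_depth) / vlp_mean if vlp_mean > 0 else 0.0,
        # Evenness of whole-metagenome coverage (low values suggest correct assembly)
        "meta_cv": pstdev(meta_depth) / meta_mean if meta_mean > 0 else 0.0,
        # Negative: contig ranks better in the VLP sample (enrichment);
        # positive: possible contamination
        "rank_shift": vlp_rank - meta_rank,
    }


# Toy binned depths for one contig and its (hypothetical) abundance ranks
features = contig_features(meta_depth=[10, 10, 10, 10], vlp_depth=[1, 1, 9, 1],
                           meta_rank=5, vlp_rank=2)
print(features)
```

Feature dictionaries like this one could be collected into a table, one row per contig, and used to train any standard classifier on contigs that have already been labeled by visual inspection.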

In vitro bacteriophage propagation and induction of transducing prophages and other elements
Lambda. E. coli KL740 was inoculated into 300 ml of LB and grown to an OD600 of 0.7 at 28 °C with aeration. The culture flask was transferred to a 42 °C water bath for 10 min and then incubated at 42 °C for 30 min with shaking. The temperature was reduced to 28 °C and cell lysis was allowed to proceed for 2 hrs. The remaining cells and debris were removed by centrifugation at 2750 x g for 10 min and the phage-containing culture fluid was filtered through a 0.45 µm membrane.
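Centrifugation steps throughout these methods are reported as relative centrifugal force (x g). Converting them to rotor speed requires the rotor radius, which the protocol does not specify. The sketch below uses the standard relation RCF = 1.118 × 10⁻⁵ × r × rpm² (r in cm) with a hypothetical 10 cm radius, so the printed rpm values are illustrative only.

```python
import math

# Standard conversion constant in RCF = K * r_cm * rpm**2
K = 1.118e-5


def rpm_for_rcf(rcf, radius_cm):
    """Rotor speed (rpm) needed to reach a given relative centrifugal force (x g)."""
    return math.sqrt(rcf / (K * radius_cm))


def rcf_for_rpm(rpm, radius_cm):
    """Relative centrifugal force (x g) produced at a given rotor speed."""
    return K * radius_cm * rpm ** 2


if __name__ == "__main__":
    radius = 10.0  # hypothetical rotor radius in cm; substitute your rotor's value
    for g_force in (2750, 12000, 83000):
        print(f"{g_force} x g -> {rpm_for_rcf(g_force, radius):.0f} rpm at r = {radius} cm")
```

Reporting x g rather than rpm, as the protocol does, keeps the steps reproducible across centrifuges with different rotors.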

P22. The data set used to analyze generalized transduction by Salmonella phage P22 was taken from a previous study assessing methods for phage particle purification from intestinal contents(24). For a detailed description of P22 propagation and purification, please refer to our previous publication.

Enterococcal prophages. E. faecalis strain VE14089, a derivative of E. faecalis V583 that has been cured of its three endogenous plasmids(15), was subcultured to an OD600 of 0.025 in 1 L of prewarmed brain heart infusion broth (BHI) and grown statically at 37 °C to an OD600 of 0.5. To induce excision of integrated prophages, ciprofloxacin was added to the culture at a final concentration of 2 µg/ml and the bacteria were grown for an additional 4 hrs at 37 °C. The bacterial cells and debris were pelleted by centrifugation at 2750 x g for 10 min and the culture fluid was filtered through a 0.45 µm membrane.

Purification of phage particles from culture fluid
All phage-containing culture fluid was treated with 10 U of DNase and 2.5 U of RNase for 1 hr at RT. Solid NaCl (to 1 M) and 10% (wt/vol) polyethylene glycol (PEG) 8000 were added and the phages were precipitated overnight on ice at 4 °C. The precipitated phages were resuspended in 2 ml of SM-plus, loaded directly onto CsCl step gradients (1.35, 1.5 and 1.7 g/ml fractions) and centrifuged for 16 hrs at 83,000 x g. The phage bands were extracted from the CsCl gradients using a 23-gauge needle and syringe, brought up to 4 ml with SM-plus buffer and loaded onto a 10,000 Da molecular weight cutoff Amicon centrifugal filter (EMD Millipore) to remove excess CsCl. The phages were washed 3 times with ~4 ml of SM-plus and then stored at 4 °C.
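Removing CsCl on a centrifugal filter works by serial dilution: each wash leaves behind only the CsCl contained in the retained volume. The sketch below estimates the residual fraction under simplifying assumptions that are not stated in the protocol: CsCl passes freely through the membrane, every wash is mixed completely, and a hypothetical 100 µl is retained after each spin. The band volume used in the example is also hypothetical.

```python
def residual_cscl_fraction(band_volume_ml, diluted_volume_ml, retentate_volume_ml,
                           wash_volume_ml, n_washes):
    """Fraction of the band's CsCl concentration left after centrifugal washing.

    Assumes CsCl passes freely through the membrane (the retentate keeps the
    concentration of the solution loaded) and that each wash mixes completely.
    """
    # The gradient band is first diluted to the loading volume (e.g. 4 ml).
    fraction = band_volume_ml / diluted_volume_ml
    # Each wash dilutes the retained volume with fresh buffer.
    per_wash = retentate_volume_ml / (retentate_volume_ml + wash_volume_ml)
    return fraction * per_wash ** n_washes


if __name__ == "__main__":
    # Hypothetical: 1 ml band brought to 4 ml, 0.1 ml retentate, three ~4 ml washes.
    left = residual_cscl_fraction(1.0, 4.0, 0.1, 4.0, 3)
    print(f"Residual CsCl: {left:.2e} of the band concentration")
```

Because the residual fraction falls geometrically with the number of washes, the retained volume after each spin matters more than the exact wash volume.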

Isolation of phage and host bacterial DNA from pure cultures
Following CsCl purification of phages and phage-like elements, DNA was isolated by adding 0.5% SDS, 20 mM EDTA (pH 8) and 50 µg/ml Proteinase K (New England Biolabs) and incubating at 56 °C for 1 hr. Samples were cooled to RT and extracted with an equal volume of phenol:chloroform:isoamyl alcohol. The samples were centrifuged at 12,000 x g for 1 min and the aqueous phase containing the DNA was extracted with an equal volume of chloroform. Following centrifugation at ~16,000 x g for 1 min, 0.3 M NaOAc (pH 7) was added, followed by an equal volume of 100% isopropanol to precipitate the DNA. The DNA was pelleted at 12,000 x g for 30 min and washed once with 500 µl of 70% ethanol. The supernatant was decanted and the pellets were air dried for 10 min and resuspended in 100 µl of sterile water.
For the isolation of bacterial genomic DNA, we used the Gentra Puregene Yeast/Bacteria Kit (Qiagen) according to the manufacturer's instructions.

Isolation and purification of bacteria and VLPs from mouse fecal pellets for metagenomic sequencing
The entire colon contents of one male C57BL/6J mouse were added to 1.2 ml of SM-plus buffer and homogenized manually with the handle of a sterile disposable inoculating loop. After homogenization, the sample was brought up to 2 ml with SM-plus. One third of the sample volume was added to a fresh tube containing 100 mM EDTA and set aside on ice; this represented the unprocessed whole metagenome sample. The remaining two thirds of the sample volume were used to isolate VLPs.
VLPs from the homogenized feces were ultra-purified as described previously(24). Briefly, the sample was centrifuged at 2500 x g for 5 min, and the supernatant was transferred to a clean tube and centrifuged a second time at 5000 x g to pellet any residual bacteria and debris. The supernatant was transferred to a sterile 1 ml syringe and filtered through a 0.45 µm syringe filter.
The clarified supernatant was treated with 100 U of DNase and 15 U of RNase for 1 hr at 37 °C. The sample was loaded onto a CsCl step gradient (1.35, 1.5 and 1.7 g/ml fractions) and centrifuged for 16 hrs at 83,000 x g. The VLPs residing at the interface of the 1.35 and 1.5 g/ml fractions were collected (~2 ml) and the CsCl was removed by centrifugal filtration as described above. The purified VLPs were disrupted by the addition of 50 µg/ml proteinase K and 0.5% sodium dodecyl sulfate (SDS) at 56 °C for 1 hr. The samples were cooled to room temperature and total DNA was extracted by the addition of an equal volume of phenol:chloroform:isoamyl alcohol. The phases were separated by centrifugation at 12,000 x g for 2 min and the aqueous phase was extracted with an equal volume of chloroform. The DNA was precipitated by the addition of 0.3 M NaOAc (pH 7) and an equal volume of isopropanol. The DNA pellet was washed once with ice-cold 70% ethanol and resuspended in 100 µl of sterile water. The DNA was further cleaned on a MinElute spin column (Qiagen) and eluted into 12 µl of elution buffer (Qiagen).
To purify total metagenomic DNA, unclarified fecal homogenate was treated with 5 mg/ml lysozyme for 30 min at 37 °C. The sample was transferred to 2 ml Lysing Matrix B tubes (MP Biomedicals) and bead-beaten in a Bullet Blender BBX24B (Next Advance) at top speed for 1 min, followed by 2 min on ice; this cycle was repeated a total of 4 times. The samples were centrifuged at 12,000 x g for 1 min and the DNA in the supernatant was extracted, precipitated and purified as described above.
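Adding proteinase K and SDS from concentrated stocks dilutes the sample and each other slightly. The sketch below solves for stock volumes so that every reagent reaches its stated final concentration at the same time. The stock concentrations (10% SDS, 20 mg/ml proteinase K) and the 400 µl sample volume are hypothetical, and `co_addition_volumes` is an illustrative name. The pure-culture protocol's 20 mM EDTA could be added to the same dictionary.

```python
def co_addition_volumes(sample_volume, reagents):
    """Stock volumes that bring several reagents to their final concentrations at once.

    reagents: dict of name -> (final concentration, stock concentration), with
    both values of a pair in the same units. With f_i = final_i / stock_i, each
    stock volume is v_i = f_i * V_total and V_total = V0 + sum(v_i), which
    gives V_total = V0 / (1 - sum(f_i)).
    """
    fractions = {name: final / stock for name, (final, stock) in reagents.items()}
    total_fraction = sum(fractions.values())
    if total_fraction >= 1:
        raise ValueError("stocks are too dilute to reach the requested concentrations")
    total_volume = sample_volume / (1 - total_fraction)
    return {name: f * total_volume for name, f in fractions.items()}


if __name__ == "__main__":
    vlp_lysis = {
        "SDS (% w/v)": (0.5, 10.0),               # hypothetical 10% stock
        "proteinase K (ug/ml)": (50.0, 20000.0),  # hypothetical 20 mg/ml stock
    }
    for name, volume in co_addition_volumes(400.0, vlp_lysis).items():
        print(f"{name}: add {volume:.2f} ul to the 400 ul sample")
```

Solving for all additions together avoids the small under-dosing that results from computing each stock volume against the starting volume alone.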

DNA Sequencing
The concentration of purified microbial DNA was determined using a Qubit 3.0 fluorometer (Thermo Fisher). Prior to library preparation, total microbial DNA was sheared using a Covaris S2 sonicator with a duty cycle of 10%, an intensity setting of 5.0 and a duration of 2 x 60 sec at 4 °C. Sequencing libraries were generated using the KAPA HTP library preparation kit KR0426 v3.13 (KAPA Biosystems) with Illumina TruSeq ligation adapters. Library quality was determined using a 2100 Bioanalyzer system (Agilent). Libraries were size selected and purified to fragments of 300-900 bp and subjected to
Decontamination and trimming of the 75 and 150 bp reads of the mouse metagenome were performed using the BBMap short read aligner (v. 36.19)(25) as previously described(52). Briefly, for decontamination, raw reads were mapped to the internal Illumina control phiX174 (J02482.1) and to the mouse (mm10) and human (hg38) reference genomes using the bbsplit algorithm with default settings. The resulting unmapped reads were adapter trimmed, and low-quality reads and reads of insufficient length were removed using the bbduk algorithm with the following parameters: ktrim = lr, k = 20, mink = 4, minlength = 20, qtrim = f. The reads were assembled using SPAdes version 3.6.1(53) with the following parameters: --only-assembler -k 21,33,55,77,99. Contigs <40 kbp were discarded, leaving 2143 contigs >40 kbp.
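The length filter applied after assembly can be reproduced on a SPAdes FASTA file with a few lines of plain Python. This is a minimal sketch, not the script used in the study; the helper names are hypothetical, and treating contigs of exactly 40,000 bp as retained is an assumption, since the text only states that contigs <40 kbp were discarded.

```python
from typing import Dict, Iterable, Iterator, Tuple


def read_fasta(lines: Iterable[str]) -> Iterator[Tuple[str, str]]:
    """Yield (header, sequence) pairs from FASTA-formatted lines."""
    header, chunks = None, []
    for line in lines:
        line = line.strip()
        if not line:
            continue  # skip blank lines
        if line.startswith(">"):
            if header is not None:
                yield header, "".join(chunks)
            header, chunks = line[1:], []
        else:
            chunks.append(line)  # sequences may be wrapped over several lines
    if header is not None:
        yield header, "".join(chunks)


def filter_contigs(lines: Iterable[str], min_length: int = 40_000) -> Dict[str, str]:
    """Keep contigs of at least min_length bp, discarding shorter ones."""
    return {h: s for h, s in read_fasta(lines) if len(s) >= min_length}
```

For example, opening the SPAdes output file `contigs.fasta` and passing the file handle to `filter_contigs` returns the retained contigs, whose count can be compared with the number reported above.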

Read mapping and read coverage visualization
The whole (meta)genome and purified VLP read sets were mapped onto the corresponding reference genomes of pure culture organisms or onto the mouse fecal metagenome contigs using BBMap(25) with the following parameters: ambiguous=random qtrim=lr minid=0.97. The resulting read mapping files (.bam) were sorted and indexed using SAMtools (v. 1.7)(55). Integrative Genomics Viewer (IGV, v. 2.3.67) tools were used to generate tiled data files (.tdf) from the read mapping (.bam) files for data compression and faster access in IGV, using the count command with 9 zoom levels, the mean as window function and a window size of 25 or 100 bp(26). Read coverage patterns were displayed and visually assessed in IGV on a linear or, if needed, logarithmic scale.
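The count step summarizes per-base read coverage as the mean over fixed windows. As an illustration of that summarization only (not a reimplementation of IGV tools or the .tdf format), the sketch below assumes that per-base depth was exported from the sorted .bam files with `samtools depth -a`, which reports every position, including those with zero depth, as tab-separated contig, position and depth columns. It averages the depth over non-overlapping 25 or 100 bp windows. The log10(x + 1) transform used to mimic a log-scale view is an assumption, and the function names are hypothetical.

```python
import math
from collections import defaultdict
from typing import Dict, Iterable, List


def load_depth(lines: Iterable[str]) -> Dict[str, List[int]]:
    """Parse single-sample `samtools depth -a` output into per-contig depth lists.

    Assumes positions for each contig are consecutive and sorted, as produced by -a.
    """
    depth: Dict[str, List[int]] = defaultdict(list)
    for line in lines:
        if not line.strip():
            continue
        contig, _position, value = line.rstrip("\n").split("\t")
        depth[contig].append(int(value))
    return dict(depth)


def window_means(depths: List[int], window: int = 100) -> List[float]:
    """Mean depth in consecutive non-overlapping windows; the last window may be shorter."""
    means = []
    for start in range(0, len(depths), window):
        chunk = depths[start:start + window]
        means.append(sum(chunk) / len(chunk))
    return means


def log_scale(values: Iterable[float], pseudocount: float = 1.0) -> List[float]:
    """log10(x + pseudocount), so zero-coverage windows stay finite on a log axis."""
    return [math.log10(v + pseudocount) for v in values]
```

Windowed means of this kind could also serve as input for the automated pattern detection discussed earlier, in place of visual inspection of coverage tracks.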