In-depth study of tomato and weed viromes reveals undiscovered plant virus diversity in an agroecosystem

Viruses influence plants in agroecosystems, where they are studied primarily as crop pathogens. Within the same systems, their diversity in non-crop plants and their roles beyond disease are less well known. To better understand their diversity and ecology, we performed an extensive virome exploration focusing on tomatoes and diverse weed species within or surrounding tomato farms. We detected 126 viruses, of which 80 were novel (70 exclusively from weeds), and demonstrated infectivity of a novel tobamovirus in solanaceous hosts. Diversities of predominantly detected tomato viruses were variable, in some cases with patterns comparable to those of global isolates of the same species. We phylogenetically analyzed and taxonomically classified novel viruses and showed links between a subgroup of phylogenetically-related rhabdoviruses and a group of taxonomically-related host plants. Around one-third of viruses (n=10) detected in tomatoes were also detected in weeds, which might indicate a possible role of weeds as inoculum reservoirs. We showed that even in a relatively well studied agroecosystem, a large part of plant viromes can still be unknown. Extensive biological and ecological insights generated from such holistic agroecosystem virome exploration will aid in anticipating possible emergences of plant virus diseases, and will serve as a baseline for post-discovery characterization studies.


Introduction
Awareness of the importance of virus diseases, especially amid the ongoing COVID-19 pandemic 1, has increased research interest in the exploration of virus diversity across ecosystems, assisted by high-throughput sequencing (HTS) 2-4, and by exploring global nucleotide databases 5-7. In agroecosystems, viruses are ubiquitous microbes associated with eukaryotic hosts including crop and weed plants, fungi, oomycetes, arthropods, and nematodes, as well as prokaryotes such as bacteria 8,9. Thus, viruses could influence the dynamics of plant populations and individual phytobiomes, directly, or through modulation of other ecological or environmental factors 10. Due to the parasitic nature, high transmissibility and adaptability of plant pathogenic viruses 11, it was estimated that they account for half of emerging diseases in plants 12, and losses equivalent to around a quarter of expected crop yield 13. Tomato (Solanum lycopersicum L.), which has the highest volume of vegetable production globally 14, is associated with more than 300 viruses, including several that are frequently associated with disease symptoms and yield losses 15. In recent years, plant virologists have witnessed the spread of emerging tomato viruses, such as tomato brown rugose fruit virus (ToBRFV) 16 and tomato leaf curl New Delhi virus (ToLCNDV) 17. To study such a high diversity of viruses, HTS has become the tool of choice. HTS-based viromics, coupled with bioinformatics tools, enables inference of biological, evolutionary, and ecological insights 18, which also impact the fields of virus diagnostics and epidemiology 19.
With HTS, plant virology has greatly shifted from the traditional focus on disease-associated relationships 18, to a scalable and unbiased ecosystems-level approach 3 [...] including weeds (designated simply 'weeds' hereafter) surrounding selected tomato production sites in Slovenia.
We examined their viromes using an HTS-based approach, followed by virus characterization, to answer the following questions: (1) How diverse and prevalent are plant [...] plant virus taxa, and four other families that are known to be associated with other eukaryotic hosts (Dicistroviridae, Iflaviridae, Lispiviridae, and Picornaviridae) (Fig. 1e). The majority of the detected viruses were from 15 taxa of positive sense (+) single-stranded (ss) RNA viruses (n=61), followed by four families of negative sense (-) ssRNA viruses (n=17) (Fig. 1f). Ten viruses were found both in tomato and weed composite samples from several localities, some of which are known to have a wide host range and are endemic in tomato (i.e., tomato spotted wilt orthotospovirus (TSWV), cucumber mosaic virus (CMV), tomato mosaic virus (ToMV), and potato virus Y (PVY)) 15 (Fig. 1g). Viruses recently detected in tomato (i.e., Solanum nigrum ilarvirus 1 (SnIV1), Ranunculus white mottle ophiovirus (RWMV), tomato matilda virus (TMaV)) 23,32-34, a known insect virus (Aphis glycines virus 1 (AGV1)), a known fungal virus (Leveillula taurica associated rhabdo-like virus 1 (LtaRLV1)), and a novel virus (plant associated tobamo-like virus 1 (PaToLV1)) were also detected in both sample types. Among the viruses detected both in tomatoes and weeds, there were several cases in which we could detect the same virus within the same location in both sample types: seven viruses (TSWV, CMV, ToMV, PaToLV1, LtaRLV1, PVY, and TMaV) showed this pattern. Three known viruses were detected for the first time both in tomatoes and in Slovenia, and five others were detected for the first time in the country (i.e., were not reported before in local comprehensive records) 35 (Fig.
1h).
To get general insights into the association of detected viruses with observed disease symptoms, we compared the number and overlap of detected viruses in sampled symptomatic and asymptomatic tomatoes. Out of the 45 viruses detected in tomato composite samples, only four (8.9%) were exclusively detected in asymptomatic tomatoes, and 23 (51.1%) exclusively in symptomatic tomatoes (Fig. 2a). Eighteen (40.0%) viruses were detected in both types of tomato samples, of which six (i.e., southern tomato virus (STV), CMV, PVY, olive latent virus 1 (OLV1), TMaV, and ToMV) were detected in at least six composite samples. A total of seven new viruses, and eight known but still unclassified arthropod, fungal and oomycete viruses were detected in tomatoes. Using RT-PCR assays (for details, see Methods and Supplementary Table 7), we confirmed the presence of a subset of mostly novel viruses in weeds and tomatoes, and identified key hosts that could be potential transmission hubs, or alternate hosts (Fig. 2b, for details see Supplementary Table 9). Some examples include: SnIV1 detected in tomatoes and in another weed host, Physalis sp., from a single farm, PaToLV1 detected in tomatoes and in Convolvulus arvensis also in a single farm, and TMaV detected in tomatoes and in three other weed species (i.e., Chenopodium sp., Ranunculus repens, and Erigeron annuus) that span five localities. RWMV was detected both in Solanum nigrum, and in four pools of tomato samples from three different localities. Twelve novel viruses (discussed in succeeding sections), including six rhabdoviruses and three tombusviruses, were detected in four Asteraceae species (i.e., Picris echioides, Cichorium intybus, Taraxacum officinale, and Cirsium arvense).
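The symptomatic/asymptomatic breakdown reported above (4, 23 and 18 of 45 viruses) is a simple set partition, which can be sketched as follows. This is only an illustration of the arithmetic, not the authors' analysis code: the function names and the placeholder identifiers `virus_0` to `virus_44` are hypothetical, and are sized only to match the counts given in the text.

```python
def partition_detections(symptomatic, asymptomatic):
    """Split detected viruses into symptomatic-only, asymptomatic-only and shared sets."""
    symptomatic, asymptomatic = set(symptomatic), set(asymptomatic)
    return {
        "symptomatic_only": symptomatic - asymptomatic,
        "asymptomatic_only": asymptomatic - symptomatic,
        "both": symptomatic & asymptomatic,
    }


def summarize(groups):
    """Return (count, percent of all detected viruses) for each group."""
    total = sum(len(members) for members in groups.values())
    return {
        name: (len(members), round(100 * len(members) / total, 1))
        for name, members in groups.items()
    }


# Placeholder identifiers (hypothetical), sized to match the counts in the text:
# 18 shared, 23 symptomatic-only, 4 asymptomatic-only (45 in total).
shared = {f"virus_{i}" for i in range(18)}
symptomatic = shared | {f"virus_{i}" for i in range(18, 41)}
asymptomatic = shared | {f"virus_{i}" for i in range(41, 45)}

summary = summarize(partition_detections(symptomatic, asymptomatic))
print(summary)
```

With real data, the two input sets would be the lists of viruses detected in the symptomatic and asymptomatic composite samples, respectively.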
To gain further insights on the population genomic diversity of the most prevalent tomato viruses, we examined populations of 12 viruses (viruses for which we were able to assemble at least three full genomes). STV, ToMV, TMaV and TSWV showed the narrowest range of pairwise nucleotide identities, lowest nucleotide diversity and number of polymorphic sites, [...] Tospoviridae species. MerV1 has genome segments similar to plant orthotospoviruses (Fig. 4b,e). Phylogenetic analyses based on conserved RdRp aa sequences of orthotospoviruses revealed that MerV1 is related to viruses in phylogroup C (i.e., a clade of phylogenetically-related orthotospoviruses) 36 (Fig. 4i).
A novel fimovirus, Artemisia fimovirus 1 (ArtV1), was detected in Artemisia verlotiorum. It is 30.2-46.1% identical to related Fimoviridae species based on RdRp aa comparison, and has a genome organization similar to plant fimoviruses (Fig. 4a,c,f). Phylogenetic analyses with known fimoviruses revealed that ArtV1 is related to Perilla mosaic virus, making it the fourth member of the divergent phylogroup IV of Fimoviridae (i.e., a clade of phylogenetically-related fimoviruses) 37,38 (Fig. 4i).
A novel bunya-like virus, closely related to Fimoviridae and Tospoviridae, was detected in seven symptomatic tomatoes using RT-PCR assays, and was thus named tomato associated bunya-like virus 1 (TaBLV1). Pairwise comparison of RdRp aa sequences showed that TaBLV1 is only 20.8-22.4% identical to closely related bunyaviruses; however, only L- and M-like genome segments were found for TaBLV1 (L, M and S segments are characteristic of plant-infecting orthotospoviruses) (Fig. 4a,g). TaBLV1 is more closely related to orthotospoviruses based on RdRp phylogenetic analyses (Fig. 4i).
We assembled, for the first time, the full genome of RWMV, which was previously detected in tomatoes and peppers in Slovenia 32 (Fig. 4a,d,h).
Using RT-PCR assays, RWMV was detected in three different localities, in eight tomatoes, and one S. nigrum, which is a newly identified associated host. All RWMV isolates form a single clade (100% bootstrap support (BS)), but

Sequence quality screening, trimming and virus genome assembly
Raw reads were trimmed, screened for quality and analyzed following a previously described pipeline for plant virus detection using HTS 71. Contigs were primarily assembled from the filtered reads using CLC Genomics Workbench (GWB) v. 20 71 (Qiagen, USA). Within this pipeline, virus and virus-like reads and contigs were initially identified by mapping trimmed reads/contigs to the virus RefSeq database (version from Jul. 2020) 72 and by searching contigs for viral domains against the Pfam database v. 33. Candidate viral contigs were later confirmed by homology search using BLASTn against the NCBI nucleotide (nt) and BLASTx 73 against the NCBI non-redundant protein (nr) databases from Dec. 2020. Assembly in metaSPAdes v. 3.14 74 was also implemented to recover longer contigs in some cases where these were not assembled in CLC-GWB. Consensus genome sequences of detected viruses were reconstructed using de novo assembled contigs and/or iterative read mapping to the most similar reference sequences obtained from NCBI GenBank. Since datasets were derived from pooled samples, consensus viral genomes were reconstructed only for the viruses with observed low to moderate population diversity (determined after manual inspection of the mapping files), indicative of infection with a single viral lineage. As a final check, reads of the corresponding datasets were mapped to the reconstructed consensus virus genomes (with 95% identity and read length fraction). Overall sequencing results and statistics, and information on Sequence Read Archive (SRA) metadata are given in Supplementary Table 3. The internal controls were used to check the prevalence of sequencing crosstalk. A threshold of <0.00001% of total reads in the library was set, based on the general virome composition of the internal controls, to classify sequences as either a possible contamination, crosstalk, or low titer virus infection.
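The read-fraction threshold described above can be expressed as a short check. This is a minimal sketch under stated assumptions, not the pipeline actually used: the function names are hypothetical, the 50-million-read library size is an invented example, and the text does not say how borderline cases were resolved, so the function only flags sequences that fall below the threshold.

```python
# Threshold stated in the text: 0.00001% of total reads in the library.
CROSSTALK_THRESHOLD_PERCENT = 1e-5


def percent_of_library(mapped_reads, total_reads):
    """Percentage of all reads in a library that map to a given virus sequence."""
    if total_reads <= 0:
        raise ValueError("total_reads must be positive")
    return 100.0 * mapped_reads / total_reads


def below_threshold(mapped_reads, total_reads,
                    threshold_percent=CROSSTALK_THRESHOLD_PERCENT):
    """True if the sequence falls below the threshold, i.e. it may represent
    contamination, crosstalk, or a low titer infection."""
    return percent_of_library(mapped_reads, total_reads) < threshold_percent


# Hypothetical library of 50 million reads: the threshold corresponds to about 5 reads.
library_size = 50_000_000
print(below_threshold(3, library_size))    # flagged for closer inspection
print(below_threshold(500, library_size))  # above the threshold
```

Expressing the threshold as a percentage of the library, rather than as an absolute read count, keeps the criterion comparable across libraries of different sequencing depth.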
For the sequences obtained from inoculated plants (discussed below) using the MinION platform, quality screening was done following a customized workflow 70 that uses tools from the NanoPack suite 75 for quality screening and visualization. Reads were assembled using [...] confirmed by analyses of phylogenetic relationships with known virus taxa (described below).

Genome assembly and screening for putative viroids
Viroid-like circular RNAs were assembled using the SLS-PFOR2 program 54. BLASTn searches 73 against the NCBI nt database and BLASTx searches against the NCBI nr database, from Dec. 2020, were implemented to filter out sequences from known organisms, and sequences that code for proteins (E-value < 10^-4). Filtered contigs were re-examined by remapping (95% identity and read length fraction) virtually-diced reads (generated as a part of the SLS-PFOR2 pipeline) and trimmed reads to the assembled circular RNAs. Contigs with average mapping depth below 10 were manually discarded. Contigs were again filtered based on the presence of two or more rotationally identical contigs of the same length (i.e., indicating the presence of (+) and (-) strands). After selecting for rotationally identical contigs, the presence of viroid-like secondary structure motifs (e.g., avsunviroid hammerhead ribozyme and rod-like pospiviroid structures) was predicted using mFold 82 and forna 83. Low structural Gibbs free energy and visual inspection criteria, such as a high degree of base pairing or degree of branching, were used to preliminarily select putative novel viroids.
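The depth and rotational-identity filters described above can be sketched as follows. This is an illustrative reimplementation, not the SLS-PFOR2 code: the function names and the toy contigs are hypothetical, and treating a rotation of the reverse complement as a match is an assumption made here to reflect the (+)/(-) strand criterion in the text.

```python
def reverse_complement(seq):
    """Reverse complement of a DNA-alphabet sequence (contigs assumed to use A/C/G/T)."""
    return seq.upper().translate(str.maketrans("ACGT", "TGCA"))[::-1]


def is_rotation(a, b):
    """True if b is a circular rotation of a (same length, b found within a + a)."""
    return len(a) == len(b) and b in (a + a)


def rotationally_identical(a, b):
    """True if b is a rotation of a, or of the reverse complement of a."""
    return is_rotation(a, b) or is_rotation(reverse_complement(a), b)


def filter_candidates(contigs, min_depth=10):
    """contigs maps name -> (sequence, average mapping depth).
    Keep contigs with depth >= min_depth that have at least one rotationally
    identical partner among the other retained contigs."""
    kept = {name: seq for name, (seq, depth) in contigs.items() if depth >= min_depth}
    return sorted(
        name for name, seq in kept.items()
        if any(other != name and rotationally_identical(seq, other_seq)
               for other, other_seq in kept.items())
    )


# Toy contigs (hypothetical): c2 is a rotation of c1, c3 is a rotation of the
# reverse complement of c1, c4 is unrelated, and c5 has insufficient depth.
contigs = {
    "c1": ("ACGGTCA", 40),
    "c2": ("GTCAACG", 35),
    "c3": ("CCGTTGA", 22),
    "c4": ("ACGGTCC", 50),
    "c5": ("GTCAACG", 5),
}
print(filter_candidates(contigs))
```

The `b in (a + a)` test works because every rotation of a circular sequence appears as a substring of the sequence concatenated with itself.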

RT-PCR assays and Sanger sequencing
PCR primers were designed using Primer3Plus 84, and RT-PCR assays were designed to identify individual plant host(s) of selected viruses detected using HTS of pooled samples (Supplementary Table 7). RT-PCRs were performed using the OneStep RT-PCR kit (Qiagen, USA) with thermocycling program conditions given in Supplementary Table 8. To confirm genome circularity of viroid-like circular RNAs, abutting primers for inverse RT-PCR were manually designed as previously described 54 (Supplementary Fig. 2). Amplicons were visualized in 1% agarose gel, stained with ethidium bromide. When PCR amplicons needed to be sequenced, to confirm sequence identity or circularity in the viroid-like sequences, they were purified using the QIAquick PCR purification kit (Qiagen, USA), and sent to Eurofins Genomics, Germany, for sequencing. Amplicon sequences were visualized and trimmed in CLC-GWB v. 20. Final sequences were remapped (95% identity and read length fraction) to the target virus genome to confirm its identity, or to the putative viroid genome to confirm its circularity (Supplementary Fig. 2). Photos of identified plant hosts are in Supplementary Fig. 3.

Phylogenetic analyses
Phylogenetic analyses were performed to investigate taxonomic positions of detected newly discovered and known viruses from different taxonomic groups (Supplementary Table 10 [...] Tables 4-6. All other data, expanded or detailed figures, and raw results (i.e., data source) of all analyses are provided in the 'Supplementary information' section of this paper.