Metagenomic and metatranscriptomic inventories of the lower Amazon River, May 2011

Background: The Amazon River runs nearly 6,500 km across the South American continent before emptying into the western tropical North Atlantic Ocean. In terms of both volume and watershed area, it is the world's largest riverine system, and it affects elemental cycling on a global scale.

Results: A quantitative inventory of genes and transcripts, benchmarked with internal standards, was obtained at five stations in the lower Amazon River during May 2011. At each station, metagenomes and metatranscriptomes were obtained in duplicate for two microbial size fractions (free-living, 0.2 to 2.0 μm; particle-associated, 2.0 to 297 μm) using 150 × 150 paired-end Illumina sequencing. Forty-eight sample datasets were obtained, averaging 15 × 10⁶ potential protein-encoding reads each (730 × 10⁶ in total). Prokaryotic metagenomes and metatranscriptomes were dominated by members of Actinobacteria, Planctomycetes, Betaproteobacteria, Verrucomicrobia, Nitrospirae, and Acidobacteria. The reference genome of actinobacterium SCGC AAA027-L06 recruited the most reads overall, with this single bin contributing an average of 50 billion genes and 500 million transcripts per liter of river water. Several dominant taxa were unevenly distributed between the free-living and particle-associated size fractions: reads binning to the planctomycete Schlesneria paludicola were biased toward the particle-associated fraction, whereas those binning to actinobacterium SCGC AAA027-L06 were biased toward the free-living fraction. Gene expression ratios (transcripts per gene copy) increased downstream from Óbidos to Macapá and Belém, indicating higher per-cell activity of Amazon River bacteria and archaea as river water approached the ocean.

Conclusion: This inventory of riverine microbial genes and transcripts, benchmarked with internal standards for full quantitation, provides an unparalleled window into microbial taxa and functions in the globally important Amazon River ecosystem.
Electronic supplementary material: The online version of this article (doi:10.1186/s40168-015-0099-0) contains supplementary material, which is available to authorized users.


RNA Processing for Total Community Metatranscriptomes
Prior to RNA extraction, the filters were thawed, removed from the preservative solution, placed in Whirl-Pak bags (Nasco, Fort Atkinson, WI), and flash-frozen in liquid nitrogen. Frozen filters were broken into small pieces with a rubber mallet and transferred to a 50 mL lysis tube containing 10 mL of Denaturation Solution (Ambion), 500 µL of Plant RNA Isolation Aid (Ambion), 2 mL of sterilized zirconium beads (OPS Diagnostics), and internal standards [1]. Tubes were vortexed for 10 min to lyse cells and then centrifuged for 1 min at 5,000 rpm. The lysates were transferred to sterile 15 mL conical tubes and centrifuged for 5 min at 5,000 rpm. The clarified lysates were transferred to sterile 50 mL conical tubes, and 3.5 mL of saturated phenol (pH 4.3) was added to each lysate and vortexed thoroughly. After centrifugation for 8 min at 12,000 × g, the non-viscous phase in each tube was transferred to a fresh 50 mL conical tube, 3.5 mL of phenol:chloroform (1:1, pH 5) was added, and the contents were mixed well. Following a second 8 min centrifugation at 12,000 × g, the aqueous phase in each tube was transferred to a sterile 50 mL conical tube, and 5 mL of chloroform:isoamyl alcohol was added. Tubes were vortexed and centrifuged for 5 min at 12,000 × g. The final aqueous phase in each tube was transferred to a fresh 50 mL conical tube, and an equal volume of 100% ethanol was added. Each mixture was homogenized by passing it through a syringe several times. RNA purification was completed for each sample using the Direct-zol RNA Kit (Zymo Research) according to the manufacturer's protocol. Residual DNA was removed by two successive treatments with the TURBO DNA-free kit (Invitrogen, Carlsbad, CA). Ribosomal RNA (rRNA) was selectively removed using community-specific biotinylated rRNA probes prepared from DNA collected simultaneously [2].
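The protocol above mixes rotor speeds (rpm) and relative centrifugal force (× g). Converting between the two depends on the rotor radius, which the methods do not report. The sketch below uses the standard relation RCF = ω²r/g₀ with ω = 2π·rpm/60; the 15 cm radius in the usage lines is a hypothetical value, not the rotor actually used.

```python
import math

STANDARD_GRAVITY = 9.80665  # m/s^2


def rcf_from_rpm(rpm: float, radius_cm: float) -> float:
    """Relative centrifugal force (multiples of g) at a given rotor radius."""
    omega = 2.0 * math.pi * rpm / 60.0  # angular velocity in rad/s
    return omega ** 2 * (radius_cm / 100.0) / STANDARD_GRAVITY


def rpm_from_rcf(rcf: float, radius_cm: float) -> float:
    """Rotor speed (rpm) needed to reach a target RCF at a given radius."""
    omega = math.sqrt(rcf * STANDARD_GRAVITY / (radius_cm / 100.0))
    return omega * 60.0 / (2.0 * math.pi)


if __name__ == "__main__":
    radius = 15.0  # hypothetical rotor radius in cm; not stated in the methods
    print(f"5,000 rpm at {radius} cm is about {rcf_from_rpm(5000, radius):,.0f} x g")
    print(f"12,000 x g at {radius} cm needs about {rpm_from_rcf(12000, radius):,.0f} rpm")
```

At that assumed 15 cm radius, 5,000 rpm corresponds to roughly 4,200 × g; a different rotor gives a different value, which is why RCF is the more portable unit.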
To maximize rRNA removal, probes were generated for bacterial and archaeal 16S and 23S rRNA and for eukaryotic 18S and 28S rRNA. Probe-bound rRNA was removed by hybridization to streptavidin-coated magnetic beads (New England Biolabs, Ipswich, MA), and successful removal of rRNA was confirmed using either an Experion automated electrophoresis system (Bio-Rad Laboratories, Hercules, CA) or a Bioanalyzer (Agilent Technologies, Santa Clara, CA). rRNA-depleted samples were linearly amplified using the MessageAmp II-Bacteria Kit (Applied Biosystems, Austin, TX), and the amplified mRNA was converted into cDNA using the SuperScript III First-Strand Synthesis System (Invitrogen, Carlsbad, CA) with random primers, followed by the NEBNext mRNA Second Strand Synthesis Module (New England Biolabs, Ipswich, MA), both according to the manufacturers' protocols. Synthesized cDNA was purified using the QIAquick PCR Purification Kit (Qiagen, Valencia, CA), followed by ethanol precipitation, resuspended in 100 µL of TE buffer, and stored at -80 °C until library preparation for sequencing.

DNA Processing for Metagenomes
DNA was extracted and purified as previously described [3,4], with some modifications. Briefly, each filter was thawed, removed from RNAlater, and rinsed three times in autoclaved, filter-sterilized 0.1% phosphate-buffered saline (PBS) to remove residual RNAlater. Each filter was shattered as described above and placed in a tube containing DNA extraction buffer [DEB: 0.1 M Tris-HCl (pH 8), 0.1 M Na-EDTA (pH 8), 0.1 M sodium phosphate (pH 8), 1.5 M NaCl, 5% CTAB]. All liquid from the rinses, as well as the original RNAlater, was pushed through a Sterivex-GP filter capsule (EMD Millipore, Billerica, MA), which was then rinsed three times to salvage any lost cells. The capsule was opened, and its filter was sliced into pieces and added to the tube containing the original membrane filter and an internal genomic DNA standard (described below). Following treatment with proteinase K, lysozyme, and sodium dodecyl sulfate, DNA was purified by phenol:chloroform extraction and isopropanol precipitation.
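To prepare DEB at bench scale, the molar and percent (w/v) entries in the recipe translate into grams by mass = concentration × volume × molar mass. The sketch below is a hypothetical calculator that assumes Tris base titrated to pH 8 with HCl, disodium EDTA dihydrate, and CTAB given as % w/v; the molar masses are standard values for those assumed reagent forms. The sodium phosphate component is omitted because its exact salt form is ambiguous as written.

```python
# Assumed reagent forms and their molar masses (g/mol).
MOLAR_MASS_G_PER_MOL = {
    "Tris base": 121.14,          # titrate to pH 8 with HCl after dissolving
    "Na2EDTA dihydrate": 372.24,  # titrate to pH 8 with NaOH to dissolve
    "NaCl": 58.44,
}

# Final concentrations taken from the DEB recipe (mol/L).
MOLAR_COMPONENTS = {"Tris base": 0.1, "Na2EDTA dihydrate": 0.1, "NaCl": 1.5}

# Percent weight/volume components (g per 100 mL).
PERCENT_WV_COMPONENTS = {"CTAB": 5.0}


def deb_masses(volume_ml: float) -> dict[str, float]:
    """Grams of each solid needed for `volume_ml` of DEB (phosphate omitted)."""
    volume_l = volume_ml / 1000.0
    masses = {
        name: conc * volume_l * MOLAR_MASS_G_PER_MOL[name]
        for name, conc in MOLAR_COMPONENTS.items()
    }
    for name, percent in PERCENT_WV_COMPONENTS.items():
        masses[name] = percent * volume_ml / 100.0
    return masses


if __name__ == "__main__":
    for reagent, grams in deb_masses(100.0).items():
        print(f"{reagent}: {grams:.3f} g per 100 mL")
```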

Internal Standards
Omics processing included the addition of internal standards to allow calculation of volume-based absolute copy numbers for each gene or transcript type, rather than relative quantification alone (i.e., counts L⁻¹ in addition to percentage of the library) [1,5]. Two mRNA standards that mimicked prokaryotic mRNAs were synthesized by in vitro transcription using a modification of a previously described method [1]. The standards were constructed by linearizing two custom-synthesized vectors with a restriction enzyme. Each linearized vector was purified by phenol:chloroform:isoamyl alcohol extraction and ethanol precipitation, and complete digestion was confirmed on a 1% agarose gel. The linearized templates were then transcribed in vitro with T7 RNA polymerase using the Riboprobe in vitro Transcription System (Promega, Madison, WI) according to the manufacturer's protocol, yielding two artificial transcripts, each 1,006 nt long. Residual DNA was removed using RQ1 RNase-Free DNase, and the RNA was purified by phenol:chloroform:isoamyl alcohol extraction and ethanol precipitation. The RNA standards were quantified using the Quant-iT RiboGreen RNA Reagent and Kit (Invitrogen, Carlsbad, CA), and transcript length was confirmed with a Bioanalyzer. A known copy number of each standard was added independently to each lysis tube immediately before the sample filter was added.
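Turning the RiboGreen mass measurement into the known copy number of each mRNA standard is a unit conversion: copies = mass × N_A / (length × mean molar mass per nucleotide). The sketch below assumes about 340 g/mol per ribonucleotide, a common approximation for single-stranded RNA rather than a value from the source, and uses the 1,006 nt length stated above. The 5 ng spike in the usage lines is hypothetical.

```python
AVOGADRO = 6.02214076e23      # molecules per mole (exact SI value)
RNA_G_PER_MOL_PER_NT = 340.0  # assumed mean ribonucleotide mass in ssRNA
STANDARD_LENGTH_NT = 1006     # length of each artificial mRNA standard


def rna_copies_from_ng(mass_ng: float, length_nt: int = STANDARD_LENGTH_NT) -> float:
    """Number of RNA molecules in `mass_ng` nanograms of a transcript."""
    grams = mass_ng * 1e-9
    molar_mass = length_nt * RNA_G_PER_MOL_PER_NT  # g/mol for one transcript
    return grams / molar_mass * AVOGADRO


def ng_for_rna_copies(copies: float, length_nt: int = STANDARD_LENGTH_NT) -> float:
    """Nanograms of a transcript needed to spike in `copies` molecules."""
    return copies / AVOGADRO * length_nt * RNA_G_PER_MOL_PER_NT * 1e9


if __name__ == "__main__":
    spike_ng = 5.0  # hypothetical spike mass
    print(f"{spike_ng} ng of standard ~ {rna_copies_from_ng(spike_ng):.3e} copies")
```

Under these assumptions, 1 ng of a 1,006 nt standard is roughly 1.76 × 10⁹ copies.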
The genomic internal standard consisted of Thermus thermophilus HB8 (DSM 7039) genomic DNA (American Type Culture Collection, Manassas, VA), added immediately prior to cell lysis.
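The genomic standard is spiked by mass but counted as genome copies. A sketch of that conversion, assuming about 650 g/mol per base pair of double-stranded DNA and a T. thermophilus HB8 genome of roughly 2.1 Mb (chromosome plus plasmids; check the current reference assembly before relying on this value). Both constants are assumptions of this example, and plasmid copy number per cell is ignored.

```python
AVOGADRO = 6.02214076e23        # molecules per mole
DSDNA_G_PER_MOL_PER_BP = 650.0  # assumed mean mass of one base pair
GENOME_SIZE_BP = 2.1e6          # approximate T. thermophilus HB8 genome size (assumption)


def genome_copies_from_ng(mass_ng: float, genome_bp: float = GENOME_SIZE_BP) -> float:
    """Genome equivalents contained in `mass_ng` nanograms of genomic DNA."""
    grams = mass_ng * 1e-9
    return grams / (genome_bp * DSDNA_G_PER_MOL_PER_BP) * AVOGADRO


if __name__ == "__main__":
    spike_ng = 10.0  # hypothetical spike mass
    print(f"{spike_ng} ng of standard ~ {genome_copies_from_ng(spike_ng):.3e} genome copies")
```

With these constants, 1 ng of genomic DNA corresponds to about 4.4 × 10⁵ genome equivalents.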
The amount of each internal standard added was calculated from the estimated yields of DNA and total RNA, as described in [1].
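Once the standards are counted in the sequence data, each one yields a scaling factor from reads to molecules. A minimal sketch, assuming (as internal-standard approaches generally do) that native molecules and standards are recovered and sequenced with the same efficiency: copies L⁻¹ = gene reads × (standard copies added / standard reads recovered) / volume filtered, and the expression ratio is transcripts L⁻¹ divided by gene copies L⁻¹. Averaging the factors from multiple standards is this example's choice, not necessarily the procedure of [1], and the numbers in the usage lines are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Standard:
    copies_added: float   # copies spiked into the lysis tube
    reads_recovered: int  # reads assigned to this standard after QC


def copies_per_liter(gene_reads: int, standards: list[Standard], volume_l: float) -> float:
    """Scale a read count to copies per liter of water filtered.

    Each standard gives a factor (copies added / reads recovered); with
    several standards, the mean factor is used (an assumption of this sketch).
    """
    if volume_l <= 0:
        raise ValueError("volume must be positive")
    factors = [s.copies_added / s.reads_recovered for s in standards if s.reads_recovered > 0]
    if not factors:
        raise ValueError("no standard reads recovered")
    factor = sum(factors) / len(factors)
    return gene_reads * factor / volume_l


def expression_ratio(transcripts_per_l: float, genes_per_l: float) -> float:
    """Transcripts per gene copy for one gene or taxonomic bin."""
    return transcripts_per_l / genes_per_l


if __name__ == "__main__":
    # Hypothetical counts for one gene in paired metatranscriptome/metagenome libraries.
    transcripts = copies_per_liter(250, [Standard(1e9, 8_000), Standard(1e9, 12_000)], 1.5)
    genes = copies_per_liter(900, [Standard(5e8, 20_000)], 1.5)
    print(f"expression ratio: {expression_ratio(transcripts, genes):.3f}")
```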

Sequencing and Data Processing
cDNA and DNA were sheared ultrasonically to ~200-250 bp fragments, and TruSeq libraries (Illumina Inc., San Diego, CA) were constructed for paired-end (150 × 150) sequencing on the Illumina HiSeq 2500 platform. Following sequencing, read pairs were merged using PandaSeq (Masella et al., 2012) and filtered with the FASTX-Toolkit (http://hannonlab.cshl.edu/fastx_toolkit/), requiring a minimum quality score of 20 over 80% of each read. From the merged, quality-controlled reads, internal standard sequences were quantified and removed, and rRNA sequences were removed from the metatranscriptomes. Transcript and gene abundances, as well as expression ratios, were calculated as previously described [1].
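The quality filter keeps a read only if at least 80% of its bases have a quality score of 20 or higher, which is the rule FASTX-Toolkit's fastq_quality_filter applies with -q 20 -p 80. A plain-Python sketch of that rule, assuming Phred+33 quality encoding and four-line FASTQ records; it illustrates the criterion and is not the authors' pipeline.

```python
from typing import Iterable, Iterator


def passes_quality(qual: str, min_q: int = 20, min_percent: int = 80, offset: int = 33) -> bool:
    """True if at least `min_percent`% of bases have Phred quality >= `min_q`."""
    if not qual:
        return False
    good = sum(1 for ch in qual if ord(ch) - offset >= min_q)
    # Integer comparison avoids floating-point edge cases at the threshold.
    return good * 100 >= min_percent * len(qual)


def filter_fastq(lines: Iterable[str]) -> Iterator[tuple[str, str, str, str]]:
    """Yield (header, sequence, plus, quality) records that pass the filter.

    Lines are grouped four at a time; an incomplete trailing record is dropped.
    """
    stripped = (line.rstrip("\n") for line in lines)
    for header, seq, plus, qual in zip(*[stripped] * 4):
        if passes_quality(qual):
            yield header, seq, plus, qual
```

Because `filter_fastq` accepts any iterable of lines, an open file handle can be passed directly, so large FASTQ files are streamed rather than loaded into memory.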

Figure S1.
Hierarchical clustering of Bray-Curtis dissimilarities in the taxonomic binning of transcripts from two microbial size fractions, sampled at 50% water depth at each of five locations in the Amazon River and from surface water at the Tapajós station (n = 2 for all). Note the similarity in composition between the Tapajós surface and 50% depth samples, indicative of a well-mixed water column.
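Figure S1 is built on Bray-Curtis dissimilarities between samples' taxonomic bin counts. A self-contained sketch of that distance, BC(u, v) = Σ|uᵢ - vᵢ| / Σ(uᵢ + vᵢ), computed on per-sample relative abundances; normalizing first is an assumption here, since the caption does not say whether raw counts or proportions were used. The sample names and counts are hypothetical. The resulting matrix could be converted to condensed form (for example with scipy.spatial.distance.squareform) and passed to scipy.cluster.hierarchy.linkage; the linkage method behind Figure S1 is not stated.

```python
from typing import Mapping, Sequence


def to_proportions(counts: Sequence[float]) -> list[float]:
    """Convert raw bin counts to relative abundances summing to 1."""
    total = sum(counts)
    if total <= 0:
        raise ValueError("sample has no counts")
    return [c / total for c in counts]


def bray_curtis(u: Sequence[float], v: Sequence[float]) -> float:
    """Bray-Curtis dissimilarity between two equal-length abundance vectors."""
    if len(u) != len(v):
        raise ValueError("vectors must have the same length")
    denom = sum(a + b for a, b in zip(u, v))
    if denom == 0:
        return 0.0
    return sum(abs(a - b) for a, b in zip(u, v)) / denom


def dissimilarity_matrix(samples: Mapping[str, Sequence[float]]) -> dict[tuple[str, str], float]:
    """All pairwise Bray-Curtis values on relative abundances, keyed by sample pair."""
    names = list(samples)
    props = {n: to_proportions(samples[n]) for n in names}
    return {(a, b): bray_curtis(props[a], props[b]) for a in names for b in names}


if __name__ == "__main__":
    # Hypothetical transcript counts per taxonomic bin (same bin order in every sample).
    samples = {
        "site1_FL": [500, 120, 30],
        "site1_PA": [200, 300, 90],
        "site2_FL": [480, 130, 35],
    }
    for (a, b), d in dissimilarity_matrix(samples).items():
        if a < b:
            print(f"{a} vs {b}: {d:.3f}")
```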