Metagenomic and metatranscriptomic inventories of the lower Amazon River, May 2011
© Satinsky et al. 2015
Received: 26 March 2015
Accepted: 12 August 2015
Published: 10 September 2015
The Amazon River runs nearly 6500 km across the South American continent before emptying into the western tropical North Atlantic Ocean. In terms of both volume and watershed area, it is the world’s largest riverine system, affecting elemental cycling on a global scale.
A quantitative inventory of genes and transcripts, benchmarked with internal standards, was obtained at five stations in the lower Amazon River during May 2011. At each station, metagenomes and metatranscriptomes were obtained in duplicate for two microbial size fractions (free-living, 0.2 to 2.0 μm; particle-associated, 2.0 to 297 μm) using 150 × 150 paired-end Illumina sequencing. Forty-eight sample datasets were obtained, averaging 15 × 10⁶ potential protein-encoding reads each (730 × 10⁶ total). Prokaryotic metagenomes and metatranscriptomes were dominated by members of the phyla Actinobacteria, Planctomycetes, Verrucomicrobia, Nitrospirae, and Acidobacteria and the proteobacterial class Betaproteobacteria. The actinobacterium SCGC AAA027-L06 reference genome recruited the greatest number of reads overall, with this single bin contributing an average of 50 billion genes and 500 million transcripts per liter of river water. Several dominant taxa were unevenly distributed between the free-living and particle-associated size fractions; for example, reads binning to the planctomycete Schlesneria paludicola were biased toward the particle-associated fraction, whereas reads binning to actinobacterium SCGC AAA027-L06 were biased toward the free-living fraction. Gene expression ratios (transcripts per gene copy) increased downstream from Óbidos to Macapá and Belém, indicating higher per-cell activity of Amazon River bacteria and archaea as river water approached the ocean.
This inventory of riverine microbial genes and transcripts, benchmarked with internal standards for full quantitation, provides an unparalleled window into microbial taxa and functions in the globally important Amazon River ecosystem.
Keywords: Amazon River; Metagenomics; Metatranscriptomics; Internal standards; Microbial communities
The Amazon River is the world’s largest riverine system, formed by a network of tributaries draining Andean and lowland basins. Understanding the fate of materials transported through the Amazon River will help to better quantify its impact on global elemental cycles. As is the case for other large tropical rivers, the Amazon drains extensive floodplains and other continental areas of high primary and secondary production and contributes significantly to organic matter and nutrient export to the ocean [3, 4]. Processes occurring within the river also drive large fluxes of methane and carbon dioxide to the atmosphere [5–7].
Here, metagenomic and metatranscriptomic sequences were obtained by Illumina sequencing using 150 × 150 bp overlapping paired-end reads. Whereas community genomic data have typically been analyzed within a relative framework (i.e., percent of metagenome and percent of metatranscriptome), the approach used here incorporated known copy numbers of internal standards added at the start of sample processing. This allowed transcripts and genes to be inventoried within an absolute framework (transcripts L⁻¹, gene copies L⁻¹, and transcript to gene copy ratios), facilitating comparisons of gene expression levels and regulatory responses among taxa and between river locations.
Measurements were made at five stations along the lower Amazon River: the historic downstream gauging station at Óbidos, the clearwater Tapajós tributary, and the three primary channels near the Amazon River mouth. For each station, metagenomes and metatranscriptomes were obtained in duplicate for two discrete size fractions (free-living, 0.2 to 2.0 μm; particle-associated, 2.0 to 297 μm), resulting in 40 datasets (5 stations × 2 nucleic acid types × 2 size fractions × 2 replicates). At the Tapajós station, an additional set of filters was collected from the surface water for comparison with the sample collected at 50 % of the river depth, bringing the total to 48 datasets. Following quality control (removal of poor-quality reads, removal of ribosomal RNA (rRNA) from metatranscriptomes, removal of internal standards, and joining of overlapping 150 bp paired ends), 730 million potential protein-encoding reads were obtained and analyzed for taxonomy and function.
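The dataset count follows directly from the sampling design. A minimal Python sketch that enumerates it, assuming the station labels used elsewhere in the text; the tuple layout and label strings are illustrative and are not taken from the deposited metadata files:

```python
from itertools import product

# Station labels as used in the text; the tuple layout is illustrative only.
STATIONS = ["Óbidos", "Tapajós", "Macapá-North", "Macapá-South", "Belém"]
NUCLEIC_ACIDS = ["metagenome", "metatranscriptome"]
SIZE_FRACTIONS = ["free-living 0.2-2.0 μm", "particle-associated 2.0-297 μm"]
REPLICATES = [1, 2]


def sampling_design():
    """Return (station, depth, nucleic_acid, size_fraction, replicate) tuples."""
    # Core design: every station sampled at 50 % of the water column depth.
    core = [
        (station, "50% depth", acid, fraction, rep)
        for station, acid, fraction, rep in product(
            STATIONS, NUCLEIC_ACIDS, SIZE_FRACTIONS, REPLICATES
        )
    ]
    # Additional Tapajós surface collection with the same replication scheme.
    surface = [
        ("Tapajós", "surface", acid, fraction, rep)
        for acid, fraction, rep in product(NUCLEIC_ACIDS, SIZE_FRACTIONS, REPLICATES)
    ]
    return core, surface


core, surface = sampling_design()
print(len(core), len(surface), len(core) + len(surface))  # 40 8 48
```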
Detailed sample collection and processing methodology can be found in Additional file 1. The five sampling sites were located at Óbidos, the Tapajós River confluence, the north and south Macapá channels, and Belém (Fig. 1a, Additional file 2). Water samples were collected at 50 % of the water column depth at each station, which ranged from 10 to 33 m among the stations, and microbial cells were collected by filtration and preserved in RNAlater (Applied Biosystems, Austin, TX). During sample processing, internal standards consisting of two different ~1000-nt RNA standards [10, 11] and a Thermus thermophilus HB8 genomic DNA standard were added to each sample prior to cell lysis. The samples collected for metatranscriptomics were processed by extracting total RNA from the filters after removal of RNAlater, treating the extracted RNA with DNase to remove residual DNA, depleting rRNA by subtractive hybridization with community-specific biotinylated nucleotide probes, linearly amplifying the remaining transcripts, and synthesizing double-stranded complementary DNA (cDNA) for library preparation and sequencing. The metagenomic samples were processed by extracting DNA and removing residual proteins and RNA. The cDNA or DNA was then sheared, and libraries were constructed for paired-end sequencing (150 × 150) on the Illumina HiSeq 2500 platform.
From a total of 48 samples, we obtained 1.27 × 10⁹ raw sequences. Following quality control, 0.94 × 10⁹ reads with a mean length of 212 bp were obtained. Internal standards were quantified and removed, along with any remaining rRNA sequences, leaving 0.73 × 10⁹ potential protein-encoding reads. These were annotated against the RefSeq protein database using RAPSearch2, and abundance per liter was calculated from internal standard recovery following the methods in Satinsky et al. (Additional file 2).
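The per-liter calculation rests on a simple proportion: if a known number of standard copies is added before lysis and a known number of standard reads is recovered in the library, each recovered read stands for (copies added / reads recovered) molecules in the sample. A minimal Python sketch of that scaling, assuming the read counts for a gene or taxon and for the standard come from the same post-QC library and that the filtered water volume is known. The function and parameter names are hypothetical, and averaging the scale factors when more than one standard is used (for example, the two RNA standards) is an assumption of this sketch rather than a documented step:

```python
from statistics import mean


def per_liter_abundance(reads, standards, volume_liters):
    """Convert a read count into molecules per liter using internal standards.

    reads:          reads assigned to a gene or taxon in one library
    standards:      list of (copies_added, reads_recovered) pairs, one per standard
    volume_liters:  volume of water filtered for this sample
    """
    if volume_liters <= 0:
        raise ValueError("volume must be positive")
    if any(recovered <= 0 for _, recovered in standards):
        raise ValueError("every standard must have at least one recovered read")
    # Each recovered standard read represents copies_added / reads_recovered molecules.
    # Averaging across standards is an assumption of this sketch.
    scale = mean(added / recovered for added, recovered in standards)
    return reads * scale / volume_liters


# Hypothetical numbers: 2.0e9 standard copies added, 4000 standard reads recovered,
# 100 reads for the gene of interest, 0.5 L filtered.
print(per_liter_abundance(100, [(2.0e9, 4000)], 0.5))  # 100000000.0
```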
Biological and chemical data measured concurrently with sample collection included temperature, depth, conductivity, bacterial abundance, dissolved inorganic carbon concentrations, and chlorophyll concentrations (Additional file 2). Datasets describing the organic chemistry and bacterial respiration were collected at each of the lower stations [13–15].
The PANDAseq program was used to join the paired-end Illumina reads with default parameters, except that the threshold score was set to 0.8, which is stricter than the default. The FASTX-Toolkit [http://hannonlab.cshl.edu/fastx_toolkit/index.html] was used for quality control of the paired reads. Ribosomal RNA and internal standard sequences were identified in the metatranscriptomes by a BLASTN search against a custom database containing representative rRNA sequences and the internal standard sequences; sequences with a bit score ≥ 50 were classified as rRNA or internal standards and removed from the datasets. Internal standards were identified in the metagenomes by first performing a BLASTN search (bit score cutoff ≥ 50) against the T. thermophilus HB8 genome. Hits were then queried against the RefSeq protein database using BLASTX (bit score cutoff ≥ 40) to identify and quantify T. thermophilus HB8 protein-encoding reads, and these reads were removed from the datasets prior to data analysis and deposition.
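The bit-score screens described above can be expressed as a filter over BLAST tabular output. A minimal Python sketch, assuming the searches were run with the default 12-column tabular format (-outfmt 6), in which the query ID is the first column and the bit score the last. The function names are hypothetical, and for the metagenome step the sketch assumes the BLASTX hits have already been restricted to T. thermophilus HB8 proteins:

```python
from typing import Iterable, List, Set


def flagged_queries(blast_tab_lines: Iterable[str], min_bitscore: float) -> Set[str]:
    """Return query IDs with at least one hit at or above min_bitscore.

    Assumes default BLAST tabular output (-outfmt 6): 12 tab-separated
    columns, query ID in column 1 and bit score in column 12.
    """
    flagged = set()
    for line in blast_tab_lines:
        if not line.strip() or line.startswith("#"):
            continue  # skip blank lines and comment lines (e.g. from -outfmt 7)
        fields = line.rstrip("\n").split("\t")
        if float(fields[11]) >= min_bitscore:
            flagged.add(fields[0])
    return flagged


def metagenome_standard_reads(blastn_lines, blastx_lines) -> Set[str]:
    """Two-step screen: BLASTN >= 50 against the genome, then BLASTX >= 40."""
    candidates = flagged_queries(blastn_lines, 50)
    confirmed = flagged_queries(blastx_lines, 40)
    return candidates & confirmed


def remove_reads(read_ids: Iterable[str], to_remove: Set[str]) -> List[str]:
    """Drop flagged reads, preserving the order of the remaining IDs."""
    return [r for r in read_ids if r not in to_remove]


# Hand-made tabular lines; read IDs and scores are hypothetical.
blastn = ["r1\tstd\t99.0\t150\t1\t0\t1\t150\t1\t150\t1e-70\t270\n",
          "r2\tstd\t85.0\t40\t6\t0\t1\t40\t1\t40\t0.01\t42.1\n"]
blastx = ["r1\tprot\t95.0\t50\t2\t0\t1\t150\t1\t50\t1e-20\t98.6\n"]
hits = metagenome_standard_reads(blastn, blastx)
print(hits)  # {'r1'}
print(remove_reads(["r1", "r2", "r3"], hits))  # ['r2', 'r3']
```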
Table 1. Library types and reads obtained in the Amazon River May 2011 microbial gene and transcript inventories, as part of the Amazon Continuum Project
Metagenomes (community DNA)
Metatranscriptomes (community mRNA)
Size fractions sampled
8.85 × 10⁸
1.73 × 10⁹
Joined reads post QC
3.54 × 10⁸
6.17 × 10⁸
Average joined read length (bp)
1.52 × 10⁸
Potential protein-encoding reads
3.52 × 10⁸
3.78 × 10⁸
Average abundance (genes L⁻¹ or transcripts L⁻¹)
3.70 × 10¹²
6.18 × 10¹²
3.24 × 10¹⁰
6.17 × 10¹⁰
1.25 × 10¹¹
1.94 × 10¹¹
4.70 × 10⁹
8.07 × 10⁹
9.56 × 10¹⁰
1.88 × 10¹¹
3.44 × 10⁹
3.04 × 10¹⁰
1.39 × 10¹¹
1.31 × 10¹¹
4.50 × 10⁸
5.42 × 10⁸
Table 2. Gene and transcript inventories for the ten reference genome bins recruiting the highest number of metagenome reads
| Reference genome bin | Mean transcripts L⁻¹ | Mean genes L⁻¹ | Mean % of transcripts | Mean % of genes | PA genome equivalents L⁻¹ | FL genome equivalents L⁻¹ | Total genome equivalents L⁻¹ |
| Actinobacterium SCGC AAA027-L06 | 4.84 × 10⁹ | 4.97 × 10¹¹ |  |  | 1.92 × 10⁸ | 2.08 × 10⁸ | 4.00 × 10⁸ |
|  | 1.30 × 10⁹ | 1.51 × 10¹¹ |  |  | 1.71 × 10⁷ | 4.41 × 10⁶ | 2.15 × 10⁷ |
| Polynucleobacter necessarius asym. | 1.55 × 10⁹ | 1.49 × 10¹¹ |  |  | 4.19 × 10⁷ | 2.73 × 10⁷ | 6.91 × 10⁷ |
| Candidatus Solibacter usitatus Ellin6076 | 1.16 × 10⁹ | 1.44 × 10¹¹ |  |  | 1.31 × 10⁷ | 5.53 × 10⁶ | 1.86 × 10⁷ |
|  | 2.00 × 10⁹ | 1.16 × 10¹¹ |  |  | 1.25 × 10⁷ | 5.36 × 10⁶ | 1.79 × 10⁷ |
| Niastella koreensis GR20-10 | 1.04 × 10⁹ | 1.12 × 10¹¹ |  |  | 9.54 × 10⁶ | 6.09 × 10⁶ | 1.56 × 10⁷ |
| Ilumatobacter coccineum YM16-304 | 4.44 × 10⁸ | 1.10 × 10¹¹ |  |  | 1.56 × 10⁷ | 1.04 × 10⁷ | 2.60 × 10⁷ |
| Candidatus Nitrospira defluvii | 1.44 × 10⁹ | 7.42 × 10¹⁰ |  |  | 1.45 × 10⁷ | 4.34 × 10⁶ | 1.88 × 10⁷ |
|  | 5.38 × 10⁸ | 7.27 × 10¹⁰ |  |  | 2.11 × 10⁷ | 1.46 × 10⁷ | 3.57 × 10⁷ |
| Opitutus terrae PB90-1 | 6.94 × 10⁸ | 6.75 × 10¹⁰ |  |  | 7.61 × 10⁶ | 7.04 × 10⁶ | 1.46 × 10⁷ |
| Total | 1.50 × 10¹⁰ | 1.49 × 10¹² |  |  |  |  | 6.38 × 10⁸ ᵇ |
PA, particle-associated (2.0 to 297 μm); FL, free-living (0.2 to 2.0 μm).
We assessed the accuracy of gene counts based on the internal genomic standard by comparing genome equivalent estimates with direct cell counts obtained by epifluorescence microscopy. The average number of genome equivalents in Amazon River water was estimated at 3.80 × 10⁹ genomes L⁻¹, extrapolated from the summed genome equivalents of the top ten taxa (6.38 × 10⁸; Table 2) under the assumption that these taxa account for 16.8 % of total genome equivalents (the same as their percent contribution to total genes; Table 2). In close agreement with this internal standard-based estimate (a difference of about 10 %), direct cell counts indicated an average of 3.46 × 10⁹ cells L⁻¹ (Additional file 2).
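The extrapolation amounts to one sum and one division. A short Python check using the total genome equivalents listed for the ten bins in Table 2 and the 16.8 % share stated above; the variable names are illustrative:

```python
# Total genome equivalents L^-1 for the ten bins in Table 2 (PA + FL).
top_ten_genome_equivalents = [
    4.00e8, 2.15e7, 6.91e7, 1.86e7, 1.79e7,
    1.56e7, 2.60e7, 1.88e7, 3.57e7, 1.46e7,
]
share_of_total = 0.168      # assumed equal to the bins' share of total genes
direct_cell_count = 3.46e9  # cells L^-1 from epifluorescence microscopy

top_ten_sum = sum(top_ten_genome_equivalents)
estimated_total = top_ten_sum / share_of_total

print(f"{top_ten_sum:.2e}")                          # 6.38e+08
print(f"{estimated_total:.2e}")                      # 3.80e+09
print(f"{estimated_total / direct_cell_count:.2f}")  # 1.10
```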
Calculating the expression ratio (the ratio of transcripts to gene copies) allowed comparisons of transcriptional activity among taxa, river locations, and size fractions. Of the ten most abundant prokaryotic reference genomes, all except the nitrite-oxidizing Candidatus Nitrospira defluvii showed higher average transcript to gene copy ratios in the larger size fraction, indicating that cells associated with aggregates or particulate material were more active. Within this overall pattern, however, expression ratios were consistently higher for free-living cells at the upriver stations at Óbidos and Tapajós and at the Macapá-South station. At Macapá-North and Belém, as well as at offshore Amazon Plume stations sampled in a previous study, particle-associated prokaryotic cells were considerably more transcriptionally active (Fig. 1b). To assess depth-related differences in river microbial communities that could indicate water column substructure, an additional surface-water sample was collected at the Tapajós station for comparison with the 50 % depth sample. The composition of the surface and 50 % depth metagenomes and metatranscriptomes was highly similar (Additional file 2: Figure S1).
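As a minimal sketch of the expression ratio, the Python below divides mean transcripts L⁻¹ by mean genes L⁻¹ for a few bins from Table 2. This is a ratio of table means, not the per-sample average ratio used for the size-fraction comparisons above, which would require the per-sample values in Additional file 2; the second function shows how such a per-fraction average could be computed, and the records passed to it here are hypothetical numbers chosen only for illustration:

```python
from collections import defaultdict
from statistics import mean

# Mean transcripts L^-1 and mean genes L^-1 from Table 2 (selected bins).
TABLE2 = {
    "Actinobacterium SCGC AAA027-L06": (4.84e9, 4.97e11),
    "Polynucleobacter necessarius asym.": (1.55e9, 1.49e11),
    "Ilumatobacter coccineum YM16-304": (4.44e8, 1.10e11),
    "Candidatus Nitrospira defluvii": (1.44e9, 7.42e10),
}


def expression_ratio(transcripts_per_l: float, genes_per_l: float) -> float:
    """Transcripts per gene copy; both inputs must share the same volume basis."""
    if genes_per_l <= 0:
        raise ValueError("gene abundance must be positive")
    return transcripts_per_l / genes_per_l


def mean_ratio_by_fraction(records):
    """records: (size_fraction, transcripts_per_l, genes_per_l) for one taxon."""
    by_fraction = defaultdict(list)
    for fraction, transcripts, genes in records:
        by_fraction[fraction].append(expression_ratio(transcripts, genes))
    # Average of per-sample ratios within each size fraction.
    return {fraction: mean(values) for fraction, values in by_fraction.items()}


ratios = {name: expression_ratio(t, g) for name, (t, g) in TABLE2.items()}
for name, ratio in sorted(ratios.items(), key=lambda kv: kv[1], reverse=True):
    print(f"{name}: {ratio:.4f}")

# Hypothetical per-sample values for one taxon (not from this dataset).
HYPOTHETICAL = [("PA", 2e8, 1e10), ("PA", 3e8, 1e10), ("FL", 1e8, 1e10), ("FL", 1e8, 2e10)]
example = mean_ratio_by_fraction(HYPOTHETICAL)
print(example)
```

Averaging per-sample ratios, rather than dividing summed transcripts by summed genes, keeps each replicate's activity estimate on an equal footing regardless of how many cells it contained.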
The Amazon basin plays a central role in global nutrient cycling, and the rainforest surrounding the river is responsible for nearly 10 % of global primary production. At its mouth, the Amazon discharges water at a rate greater than that of the next six largest rivers combined. To better understand the diversity and metabolic activity of microbial communities within this extremely large river system and its oceanic plume, four high-throughput metagenomic and metatranscriptomic sequence datasets are being produced as part of the ANACONDAS and ROCA projects (http://amazoncontinuum.org). The June 2010 plume dataset has been published. Two additional datasets, consisting of concurrently sampled plume and river collections from July 2012, are in progress. These high-coverage, size-discrete, and replicated datasets are all benchmarked with internal genomic and messenger RNA (mRNA) standards. Future analyses will focus on expression of biogeochemically relevant genes mediating key transformations in the carbon, nitrogen, and phosphorus cycles, and on the physiological and environmental factors regulating expression levels [10, 25].
Availability of supporting data
Sequences from this May 2011 Amazon Continuum study are available from NCBI under accession numbers SRP039390 (metagenomes) and SRP037995 (metatranscriptomes). The NCBI sequences are fastq files from which internal standard sequences (metagenomes and metatranscriptomes) and rRNA sequences (metatranscriptomes only) have been removed prior to deposition. Metadata accompanying the omics datasets are provided in Additional file 2. ANACONDAS and ROCA project data are also available at the BCO-DMO data repository (http://www.bco-dmo.org/project/2097).
We appreciate the assistance of Roger Nilsen with library preparation and thank the scientists of the ROCA and ANACONDAS projects, Henrique O. Sawakuchi (CENA, Piracicaba), Alan Cavalcanti da Cunha (UNIFAP, Macapá), Daímio Brito (UEAP, Macapá), Troy P. Beldini (UFOPA, Santarem), José Mauro (UFOPA, Santarem), and Rodrigo da Silva (UFOPA, Santarem) for the supporting data and insights. This research is funded by the Gordon and Betty Moore Foundation through Grants GBMF 2293 and 2928 to PL Yager and Grant GBMF 538.01 to MA Moran. Resources and technical expertise were provided by the University of Georgia’s Georgia Advanced Computing Resource Center.
Open Access This article is distributed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, and reproduction in any medium, provided you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license, and indicate if changes were made. The Creative Commons Public Domain Dedication waiver (http://creativecommons.org/publicdomain/zero/1.0/) applies to the data made available in this article, unless otherwise stated.
1. Richey JE, Devol AH, Wofsy SC, Victoria R, Ribeiro MNG. Biogenic gases and the oxidation and reduction of carbon in Amazon River and floodplain waters. Limnol Oceanogr. 1988;33(4):551–61.
2. Nebel G, Dragsted J, Vega AS. Litter fall, biomass and net primary production in flood plain forests in the Peruvian Amazon. Forest Ecol Manag. 2001;150(1–2):93–102.
3. Hedges JI, Clark WA, Quay PD, Richey JE, Devol AH, Santos UD. Compositions and fluxes of particulate organic material in the Amazon River. Limnol Oceanogr. 1986;31(4):717–38.
4. Spencer RGM, Hernes PJ, Aufdenkampe AK, Baker A, Gulliver P, Stubbins A, et al. An initial investigation into the organic matter biogeochemistry of the Congo River. Geochim Cosmochim Ac. 2012;84:614–27.
5. Devol AH, Richey JE, Clark WA, King SL, Martinelli LA. Methane emissions to the troposphere from the Amazon floodplain. J Geophys Res-Atmos. 1988;93(D2):1583–92.
6. Richey JE, Melack JM, Aufdenkampe AK, Ballester VM, Hess LL. Outgassing from Amazonian rivers and wetlands as a large tropical source of atmospheric CO2. Nature. 2002;416(6881):617–20.
7. Sawakuchi HO, Bastviken D, Sawakuchi AO, Krusche AV, Ballester MVR, Richey JE. Methane emissions from Amazonian rivers and their contribution to the global methane budget. Global Change Biol. 2014;20(9):2829–40.
8. Ghai R, Rodriguez-Valera F, McMahon KD, Toyama D, Rinke R, Cristina Souza De Oliveira T, et al. Metagenomics of the water column in the pristine upper course of the Amazon river. PLoS One. 2011;6(8):e23785.
9. Satinsky BM, Gifford SM, Crump BC, Moran MA. Use of internal standards for quantitative metatranscriptome and metagenome analysis. Methods Enzymol. 2013;531:237–50.
10. Satinsky BM, Crump BC, Smith CB, Sharma S, Zielinski BL, Doherty M, et al. Microspatial gene expression patterns in the Amazon River Plume. Proc Natl Acad Sci U S A. 2014;111(30):11085–90.
11. Satinsky BM, Zielinski BL, Doherty M, Smith CB, Sharma S, Paul JH, et al. The Amazon continuum dataset: quantitative metagenomic and metatranscriptomic inventories of the Amazon River plume, June 2010. Microbiome. 2014;2:17.
12. Zhao Y, Tang H, Ye Y. RAPSearch2: a fast and memory-efficient protein similarity search tool for next-generation sequencing data. Bioinformatics. 2012;28(1):125–6.
13. Seidel M, Yager PL, Ward ND, Carpenter EJ, Gomes HR, Krusche AV, et al. Molecular-level changes of dissolved organic matter along the Amazon River-to-ocean continuum. Mar Chem. 2015. doi:10.1016/j.marchem.2015.06.019.
14. Ward ND, Keil RG, Medeiros PM, Brito DC, Cunha AC, Dittmar T, et al. Degradation of terrestrially derived macromolecules in the Amazon River. Nat Geosci. 2013;6(7):530–3.
15. Ward ND, Krusche AV, Sawakuchi HO, Brito DC, Cunha AC, Moura JMS, et al. The compositional evolution of dissolved and particulate organic matter along the lower Amazon River–Óbidos to the ocean. Mar Chem. 2015. doi:10.1016/j.marchem.2015.06.013.
16. Masella AP, Bartram AK, Truszkowski JM, Brown DG, Neufeld JD. PANDAseq: paired-end assembler for Illumina sequences. BMC Bioinformatics. 2012;13:31.
17. Garcia SL, McMahon KD, Martinez-Garcia M, Srivastava A, Sczyrba A, Stepanauskas R, et al. Metabolic potential of a single cell belonging to one of the most abundant lineages in freshwater bacterioplankton. ISME J. 2013;7(1):137–47.
18. Newton RJ, Jones SE, Eiler A, McMahon KD, Bertilsson S. A guide to the natural history of freshwater lake bacteria. Microbiol Mol Biol Rev. 2011;75(1):14–49.
19. DeLong EF, Franks DG, Alldredge AL. Phylogenetic diversity of aggregate-attached vs. free-living marine bacterial assemblages. Limnol Oceanogr. 1993;38(5):924–34.
20. Fuchsman CA, Staley JT, Oakley BB, Kirkpatrick JB, Murray JW. Free-living and aggregate-associated Planctomycetes in the Black Sea. FEMS Microbiol Ecol. 2012;80(2):402–16.
21. Spieck E, Hartwig C, McCormack I, Maixner F, Wagner M, Lipski A, et al. Selective enrichment and molecular characterization of a previously uncultured Nitrospira-like bacterium from activated sludge. Environ Microbiol. 2006;8(3):405–15.
22. Giovannoni SJ, Tripp HJ, Givan S, Podar M, Vergin KL, Baptista D, et al. Genome streamlining in a cosmopolitan oceanic bacterium. Science. 2005;309(5738):1242–5.
23. Luo H, Moran MA. Evolutionary ecology of the marine Roseobacter clade. Microbiol Mol Biol Rev. 2014;78(4):573–87.
24. Field CB, Behrenfeld MJ, Randerson JT, Falkowski P. Primary production of the biosphere: integrating terrestrial and oceanic components. Science. 1998;281(5374):237–40.
25. Hilton JA, Satinsky BM, Doherty M, Zielinski B, Zehr JP. Metatranscriptomics of N2-fixing cyanobacteria in the Amazon River plume. ISME J. 2015;9(7):1557–69.