Editorial | Open
“Available upon request”: not good enough for microbiome data!
Microbiome volume 6, Article number: 8 (2018)
Open data that is free and publicly available without restrictions is critical for progress in any scientific discipline and has been the cornerstone of sound and reproducible genomics research. Microbiome research is still a relatively young but thriving field with great biomedical potential. Because the field is heavily data-driven, microbiome projects can include hundreds or even thousands of participants, samples, and associated background (“metadata”) parameters. Processing this data, identifying meaningful associations, and determining significance depend on complex, often non-standardized bioinformatics and biostatistics protocols. These protocols must be reproducible, transparent, and extensible so that others can review, evaluate, and build upon the work; this is crucial to fulfilling the promise of microbiome research and maintaining its credibility. At the absolute minimum, unrestricted access to the raw sequencing data and associated metadata is needed, a requirement that has been recognized and implemented by the scientific community, some journals, and funding agencies. In practice, access to open protocols for data processing and analysis is also important to promote reproducibility and advances in the field but is rarely provided. Unfortunately, an increasing number of studies appear to fail to satisfy even basic, community-accepted standards.
Motivated by a number of recent negative experiences in our own research projects, as well as our interaction with authors aiming to publish in Microbiome, this editorial aims to shed light on common problems in the field and make recommendations to reinforce a culture of open data and protocols for microbiome research.
Access to sequence data is required by most peer-reviewed journals. However, when we have attempted to access published sequence data and metadata from microbiome projects, we have often encountered missing, incomplete, inconsistent, and/or incomprehensible sequence data and metadata, as well as reluctance by authors, editors, and publishers to react to our complaints.
Authors increasingly use new models for data distribution that restrict or limit data access. Data is made only “available upon request”, or access is granted only through non-transparent, arbitrary, and costly application procedures.
Reproducibility is further complicated by the limited availability of bioinformatic and biostatistical protocols, including software versions, program parameters, and the code of software scripts.
Although individual experiences will vary, examples like the one highlighted in Table 1 are commonplace and largely unreported. We believe that the field would greatly benefit from an improved open data and open protocol culture. In the following, we outline a number of recommendations, which we have begun implementing at Microbiome:
Free, unrestricted access to data and metadata, to non-commercial bioinformatic software, and to the options and code of published analyses should be provided at the time of manuscript peer review and maintained after publication.
Released data and protocols should encompass all parameters and analyses (including the code and scripts used) that are part of the publication and are needed to fully reproduce its results.
Journal peer review guidelines should be extended to include checking compliance with open data and protocol guidelines.
Journal responsibilities should be extended and reinforced to control compliance and to react to non-compliance.
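As one illustration of what releasing "all parameters and analyses" could look like in practice, the following minimal Python sketch writes a small provenance manifest next to an analysis: checksums of the input files, the versions of the tools used, and the parameters passed to them. It is not a procedure prescribed by this editorial or by Microbiome. The function names, the manifest layout, and the tool and parameter names in the usage note are hypothetical and chosen only for illustration.

```python
import hashlib
import json
import platform
from pathlib import Path


def sha256_of(path):
    """Return the SHA-256 hex digest of a file, read in 1 MiB chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(1 << 20), b""):
            digest.update(chunk)
    return digest.hexdigest()


def build_manifest(input_files, tool_versions, parameters):
    """Collect the details a reader would need to rerun an analysis.

    All keys below are an illustrative layout, not a community standard.
    """
    return {
        "python_version": platform.python_version(),
        "platform": platform.platform(),
        # Versions of external tools, supplied by the analyst.
        "tool_versions": dict(tool_versions),
        # Program options exactly as they were passed to the tools.
        "parameters": dict(parameters),
        # Checksums let others confirm they hold the same input data.
        "inputs": {Path(p).name: sha256_of(p) for p in input_files},
    }


def write_manifest(manifest, path):
    """Write the manifest as sorted, indented JSON for easy diffing."""
    with open(path, "w", encoding="utf-8") as handle:
        json.dump(manifest, handle, indent=2, sort_keys=True)
```

A call such as `write_manifest(build_manifest(["reads.fastq"], {"example_tool": "1.0"}, {"min_quality": 20}), "manifest.json")` would produce a file that can be deposited alongside the raw data and scripts; the file name, tool name, and parameter here are placeholders.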
We are concerned that recent trends will continue and that they will set the precedent for data access restriction, greatly limiting scientific progress and reproducibility. We should note that some may try to contest open data access under the veil of privacy. Data must of course be handled ethically, but the public release of non-identifiable molecular data that has already led to publishable results is the minimum moral and scientific standard to which researchers must be held. Further, funding agencies (public and private) should require their grantees to be fully compliant with open data access policies and should endorse open data guidelines developed by the scientific community. We encourage all microbiome researchers, including authors, editors, and peer reviewers, to stand up for open data access in order to ensure progress, credibility, and reproducibility in this rapidly developing research field.
Zhernakova A, Kurilshikov A, Bonder MJ, Tigchelaar EF, Schirmer M, Vatanen T, Mujagic Z, Vila AV, Falony G, Vieira-Silva S, et al. Population-based metagenomics analysis reveals markers for gut microbiome composition and diversity. Science. 2016;352(6285):565–9.