GBIF has launched the pilot phase for the Metabarcoding Data Programme, intended to improve GBIF’s integration of DNA metabarcoding data on biodiversity. In part a response to the ongoing effort to enrich the GBIF data model, the programme establishes a framework for GBIF nodes to support and engage communities of researchers who collect and manage such evidence using a newly developed tool, the Metabarcoding Data Toolkit (MDT).
The pilot programme is open exclusively to GBIF node managers who wish to manage an instance of the MDT. Node managers interested in joining the pilot may complete the application form, which outlines the terms and conditions of this hosted service. Upon approval of an application, the Secretariat will configure an installation and provide pilot participants with an initial introduction to the service, as well as ongoing support, both to ensure successful data publication to GBIF and to gather feedback on the MDT’s ongoing development and improvement.
Consultations in recent years within the GBIF network and among metabarcoding data holders had highlighted the need for user-friendly tools to format and publish DNA metabarcoding data. GBIF Secretariat staff members Tobias Guldberg Frøslev and Thomas Stjernegaard Jeppesen developed the MDT to address this need, building a tool that data holders with basic knowledge of data standards and GBIF processes can use to convert “OTU tables” common among the DNA metabarcoding community into a standardized GBIF-ready format.
“As a mycologist and having worked as a researcher with biodiversity and DNA metabarcoding for many years, I’m excited to see GBIF offer easier ways to share typical eDNA datasets,” said Frøslev. “I’m hopeful this will help accelerate the closing of taxonomic data gaps, particularly in groups like fungi, bacteria and micro-eukaryotes.”
Feedback from GBIF nodes on initial prototypes of the MDT shaped the programme’s design, reinforcing their pivotal role in the network as coordinators of the flow of data from DNA metabarcoding communities into GBIF’s federated infrastructure.
“GBIF nodes participating in the pilot phase of the Metabarcoding Data Programme will have the opportunity to engage the numerous holders of eDNA data as potential new publishers,” said Anne-Sophie Archambeau, chair of the GBIF Global Nodes Steering Group and node manager of GBIF France. “Such new collaborations will serve to improve our connections and relevance to wider research and policy communities.”
Programme goals
The initial two-year pilot programme will conclude in 2026, and its desired outcomes include:
- Strengthening ties with eDNA research communities. By leveraging the established role of GBIF nodes in coordinating the endorsement of new publishers and data publication, the network establishes wider and more resilient relationships with the global communities of researchers working with metabarcoding data.
- Enhancing tool usability. Feedback from participants will contribute to iterative improvements and make the Metabarcoding Data Toolkit more user-friendly.
- Refining data templates. The pilot programme will finalize a tested set of standardized templates that assist researchers in preparing metabarcoding datasets for publication through the GBIF network.
- Developing training materials. The GBIF network will create and contribute to comprehensive documentation and training materials that support nodes and data publishers using the MDT.
- Guiding script development. The programme will create guidelines and leverage existing libraries, wherever possible, to support use of R and Python scripts to work with datasets generated by the MDT.
- Determining future deployment strategy. The pilot phase will inform how best to package the MDT for future deployments, whether in installations for GBIF nodes or institutional data hosts or as a GBIF-hosted service.
Next steps
The GBIF Secretariat will provide each participating node with a hosted installation of the MDT, along with maintenance and updates as new features and versions are introduced. In recognition of the fact that data sovereignty issues may require some nodes to host datasets on servers within their national boundaries, installations can be configured to operate in one of two modes:
- Publishing mode: MDT users can register datasets for publication through GBIF via the organizations with which they’re associated. Operating in this mode, the MDT functions similarly to an installation of GBIF’s Integrated Publishing Toolkit (IPT) and serves as a publishing platform into GBIF.
- Conversion-only mode: MDT users can use it to reshape their datasets into GBIF-ready Darwin Core Archive (DwC-A) files but must download them for hosting and publication on another repository, such as an IPT. This mode may be most appropriate where nodes or data holders have data sovereignty concerns.
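For readers unfamiliar with the kind of reshaping involved, the following is a minimal sketch of turning a wide OTU table into long, occurrence-style records. It is an illustration only and not the MDT's implementation: the function name `otu_table_to_long`, the input layout and the identifier scheme are assumptions, and the field names are Darwin Core terms chosen for the example.

```python
from typing import Dict, List

# Hypothetical helper, not part of the MDT: reshapes a wide OTU table
# (OTU id -> sample id -> read count) into one record per OTU per sample.
def otu_table_to_long(otu_table: Dict[str, Dict[str, int]]) -> List[dict]:
    # First pass: total reads per sample, used as the sample size.
    sample_totals: Dict[str, int] = {}
    for counts in otu_table.values():
        for sample_id, reads in counts.items():
            sample_totals[sample_id] = sample_totals.get(sample_id, 0) + reads

    # Second pass: emit a record only where the OTU was actually detected.
    records: List[dict] = []
    for otu_id, counts in otu_table.items():
        for sample_id, reads in counts.items():
            if reads <= 0:
                continue
            records.append({
                # Assumed identifier scheme for illustration only.
                "occurrenceID": f"{sample_id}:{otu_id}",
                "eventID": sample_id,
                "organismQuantity": reads,
                "organismQuantityType": "DNA sequence reads",
                "sampleSizeValue": sample_totals[sample_id],
                "sampleSizeUnit": "DNA sequence reads",
            })
    return records


# Made-up example data: two OTUs across two samples.
example = {
    "OTU_1": {"sample_A": 120, "sample_B": 0},
    "OTU_2": {"sample_A": 30, "sample_B": 45},
}
records = otu_table_to_long(example)
for record in records:
    print(record)
```

A real dataset would also carry the sequences, taxonomy and sample metadata, which this sketch omits.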
Learn more
- Metabarcoding Data Programme
- Apply to participate in the pilot programme (GBIF node managers only)
- Questions about the programme? Contact DNA@gbif.org
- About the Metabarcoding Data Toolkit
- GBIF briefing on Digital Sequence Information (DSI)
The development of the Metabarcoding Data Toolkit is co-funded by the European Union (Grant Agreement No. 101057437: BioDT)