Quick guide to publishing datasets through GBIF.org

GBIF.org supports the publication of four classes of datasets. While the steps below are numbered, endorsement (#2) is the only one that is dependent on others. Do not view what follows as a strictly linear process—several of the steps can and probably should run in parallel.

       

      1. Secure your internal institutional agreements

      Once you decide to share data through GBIF, you should alert your administrators to your plans to publish on behalf of your institution. Sharing data through GBIF can increase the visibility and global impact of your institution by building on traditional methods like academic publications and specimen loans. Learn more about the benefits of data publishing on GBIF.org.

      At present, the GBIF network only publishes data directly from organizations, that is, institutions, networks and societies. Individuals wishing to publish data should work through their affiliated organizations to seek endorsement as a publisher.

      Citizen scientists and others can also share their observation data by contributing to the growing network of local, national and thematic publishers, including Sweden’s ArtDatabanken, the Norwegian Biodiversity Information Centre, the UK National Biodiversity Network, eBird or iNaturalist, among many others.

      2. Request endorsement

      Endorsement of new data publishers is a GBIF community procedure that aims to ensure that:

      • Data are relevant to GBIF’s scope and objectives
      • Data hosting arrangements are stable and persistent
      • National, regional and thematic networks are actively engaged in data publishing and use
      • Data can be openly shared and reused
      • Data quality can be improved by data publishers responding to feedback

      We encourage organizations to request endorsement as soon as they agree in principle to share data through GBIF in order to avoid delays in publishing data.

      3. Familiarize yourself with publishing tools, workflows and/or partners

      The majority of the data now shared with GBIF resides on one of the dozens of installations of our Integrated Publishing Toolkit, or IPT (see stats). Other alternatives exist, including seeking in-country hosting support from national nodes and/or other active participants (e.g., iDigBio). For advanced publishers we also expose an API to register datasets programmatically. We maintain an extensive knowledge base of tools and documentation, along with detailed manuals for publishers.
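
      For publishers considering the programmatic route, the following is a minimal sketch of how a registration request might be assembled in Python. The endpoint URL, the JSON field names (publishingOrganizationKey, installationKey, type, title) and the use of HTTP basic authentication are assumptions based on the public GBIF registry API and should be checked against the current API documentation. The function name and the keys passed to it are hypothetical. The sketch only builds the request and does not send it.

```python
import base64
import json
import urllib.request

# Assumed registry endpoint; confirm against the current GBIF API documentation.
GBIF_DATASET_ENDPOINT = "https://api.gbif.org/v1/dataset"

# The four dataset classes, written as the assumed API enumeration values.
DATASET_TYPES = {"OCCURRENCE", "CHECKLIST", "SAMPLING_EVENT", "METADATA"}


def build_registration_request(title, dataset_type, publisher_key,
                               installation_key, username, password):
    """Build (but do not send) a POST request that would register a dataset.

    publisher_key and installation_key stand for the identifiers of an
    endorsed publishing organization and its installation.
    """
    if dataset_type not in DATASET_TYPES:
        raise ValueError(f"unknown dataset type: {dataset_type}")

    # Field names are assumptions; verify them before real use.
    payload = {
        "title": title,
        "type": dataset_type,
        "publishingOrganizationKey": publisher_key,
        "installationKey": installation_key,
    }

    # HTTP basic authentication header built from the account credentials.
    token = base64.b64encode(f"{username}:{password}".encode("utf-8"))

    request = urllib.request.Request(
        GBIF_DATASET_ENDPOINT,
        data=json.dumps(payload).encode("utf-8"),
        method="POST",
    )
    request.add_header("Content-Type", "application/json")
    request.add_header("Authorization", "Basic " + token.decode("ascii"))
    return request
```

      Passing the returned object to urllib.request.urlopen would submit it. Keeping construction separate from sending makes it possible to inspect the payload first, for example against a test environment.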

      4. Prepare data for publication

      GBIF is built on open standards. We rely primarily on the Darwin Core (DwC) Standard, which accounts for more than 80% of the data currently published on GBIF.org. We also support BioCASe’s ABCD standard for biodiversity occurrences, the EML standard for dataset description and metadata, and legacy protocols like TAPIR and DiGIR.

      Regardless of the standard or the tool used to share your dataset, this is a good moment to perform some general data quality checking and cleaning, ensuring that all fields have consistent formats, records aren’t duplicated or incomplete, and any known issues or inconsistencies are documented.
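
      As an illustration of the checks described above, here is a minimal sketch in Python that scans occurrence records for missing values, duplicate identifiers and inconsistent formats. The field names are Darwin Core terms, but the choice of which terms to treat as required, the function name and the specific rules are assumptions made for this example, not a GBIF validation procedure.

```python
from datetime import date

# Darwin Core terms treated as mandatory here. This list is an assumption
# for illustration; adjust it to your dataset class.
REQUIRED_TERMS = ("occurrenceID", "scientificName", "eventDate", "basisOfRecord")


def _text(record, term):
    """Return the value of a term as stripped text, or '' if absent."""
    value = record.get(term)
    return "" if value is None else str(value).strip()


def check_records(records):
    """Return a list of (row_index, message) tuples describing problems."""
    issues = []
    first_seen = {}  # occurrenceID -> index of the first row that used it
    for index, record in enumerate(records):
        # Incomplete records: a required term is absent or blank.
        for term in REQUIRED_TERMS:
            if not _text(record, term):
                issues.append((index, f"missing {term}"))

        # Duplicated records: the same occurrenceID appears more than once.
        occurrence_id = _text(record, "occurrenceID")
        if occurrence_id:
            if occurrence_id in first_seen:
                issues.append((index, "duplicate occurrenceID, first seen in row "
                                      f"{first_seen[occurrence_id]}"))
            else:
                first_seen[occurrence_id] = index

        # Consistent formats: expect eventDate as an ISO 8601 calendar date.
        event_date = _text(record, "eventDate")
        if event_date:
            try:
                date.fromisoformat(event_date)
            except ValueError:
                issues.append((index, f"eventDate is not an ISO date: {event_date}"))

        # Coordinates, when present, must be numeric and within range.
        for term, limit in (("decimalLatitude", 90.0), ("decimalLongitude", 180.0)):
            raw = _text(record, term)
            if not raw:
                continue
            try:
                value = float(raw)
            except ValueError:
                issues.append((index, f"{term} is not numeric: {raw}"))
                continue
            if not abs(value) <= limit:  # also rejects NaN
                issues.append((index, f"{term} out of range: {raw}"))
    return issues
```

      Running check_records over rows loaded with csv.DictReader returns a list of problems to fix or to document in the dataset metadata. An empty list means no issue was found by these particular checks only.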

      5. Publish data

      If you’re using an IPT, simply click the button to ‘register’ your dataset with GBIF. Your data publisher page and dataset information will appear on GBIF.org, and our real-time infrastructure will quickly begin crawling the individual occurrences. Soon, the indexed summary of the published data will start to display users’ activity and download statistics. These stats, together with the Google Chrome extension that won the 2015 Ebbe Nielsen Challenge (our innovation prize), can highlight potential issues and guide your efforts to improve the data.

      Next steps


      French translation: Guide rapide pour la publication d’ensemble de données via le site GBIF.org