Quick guide to publishing data through GBIF.org

Learn about tools, processes and best practices for publishing datasets through the GBIF network

GBIF.org supports the publication of four classes of datasets (metadata-only, checklist, occurrence and sampling-event datasets) using widely accepted biodiversity data standards.

At present, the GBIF network only publishes datasets directly from organizations. Individuals who wish to publish relevant datasets should work through their affiliated organizations (see 'Request endorsement' below) or consider submitting a data paper to one of a growing number of journals.

Citizen scientists can contribute occurrence records indirectly by participating in the growing number of projects worldwide that publish their datasets through the GBIF network.

Secure institutional agreements

Once you decide to share data through the GBIF network, alert your institution's administrators to your plans to publish on its behalf. Sharing open data can increase an institution's visibility and impact. It builds on traditional channels such as academic publications and specimen loans, reveals new opportunities for collaboration and, through DOI-based citations, links the data directly to the research that uses it.

Request endorsement

To become a data publisher, your organization must request endorsement from the GBIF community. Once you have reviewed the data publisher agreement and agree in principle to share data, we encourage you to request endorsement for your organization as soon as possible to avoid delays in publishing data.

Select publishing tools and partners

Much of the data now shared with GBIF resides on one of the dozens of installations of the GBIF Integrated Publishing Toolkit (IPT) and, increasingly, on national installations of the Living Atlases platform originally developed by the Atlas of Living Australia.

Other arrangements also exist, including data hosting both within and outside a given data-publishing institution. Technically experienced publishers can also register datasets programmatically through the GBIF API (contact the GBIF help desk for details).
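
As a rough illustration of programmatic registration, the sketch below prepares (but does not send) two HTTP requests: one creating a dataset record and one attaching a Darwin Core Archive endpoint to it. The base URL, the `/dataset` and `/dataset/{key}/endpoint` paths, the JSON field names and the use of HTTP Basic authentication are assumptions based on the public registry API; the helper functions are hypothetical, and the required fields should be confirmed with the GBIF help desk before use.

```python
import base64
import json
import urllib.request

# Assumed base URL of the GBIF registry API; confirm against the current API docs.
API_BASE = "https://api.gbif.org/v1"


def _auth_header(username: str, password: str) -> str:
    """Build an HTTP Basic Authorization header value."""
    token = base64.b64encode(f"{username}:{password}".encode("utf-8")).decode("ascii")
    return f"Basic {token}"


def _post_json(url: str, payload: dict, username: str, password: str) -> urllib.request.Request:
    """Prepare (but do not send) an authenticated JSON POST request."""
    return urllib.request.Request(
        url,
        data=json.dumps(payload).encode("utf-8"),
        headers={
            "Content-Type": "application/json",
            "Authorization": _auth_header(username, password),
        },
        method="POST",
    )


def build_dataset_request(org_key, installation_key, title, username, password):
    """Hypothetical helper: request that creates an occurrence dataset record."""
    # Field names are assumptions; the registry may require additional fields.
    payload = {
        "publishingOrganizationKey": org_key,
        "installationKey": installation_key,
        "type": "OCCURRENCE",
        "title": title,
    }
    return _post_json(f"{API_BASE}/dataset", payload, username, password)


def build_endpoint_request(dataset_key, archive_url, username, password):
    """Hypothetical helper: request that points a dataset at its Darwin Core Archive."""
    payload = {"type": "DWC_ARCHIVE", "url": archive_url}
    return _post_json(f"{API_BASE}/dataset/{dataset_key}/endpoint", payload, username, password)
```

Separating request construction from sending makes it easy to inspect a payload before submitting it with `urllib.request.urlopen(req)`.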

We also maintain a knowledgebase of tools and other documentation.

Prepare data for publication

Data holders who choose to share their data using Darwin Core Archives (see data standards) can familiarize themselves with the format using spreadsheet templates created for occurrence datasets, checklists and sampling-event datasets.
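
To show what a Darwin Core Archive contains, here is a minimal sketch that packages occurrence records as a zip file holding a tab-separated `occurrence.txt` core file and a `meta.xml` descriptor that maps each column to a Darwin Core term. The column list and function names are hypothetical. A publishable archive also needs an `eml.xml` metadata document, which this sketch omits (the IPT generates one for you), so treat it as an illustration of the format rather than a publishing tool.

```python
import io
import zipfile

DWC = "http://rs.tdwg.org/dwc/terms/"
TEXT_NS = "http://rs.tdwg.org/dwc/text/"

# Hypothetical column set for illustration; real datasets usually carry more terms.
COLUMNS = ["occurrenceID", "basisOfRecord", "scientificName", "eventDate"]


def build_meta_xml(columns):
    """Describe a tab-separated occurrence core file; column 0 is the record id."""
    fields = "\n".join(
        f'    <field index="{i}" term="{DWC}{term}"/>' for i, term in enumerate(columns)
    )
    return (
        '<?xml version="1.0" encoding="UTF-8"?>\n'
        f'<archive xmlns="{TEXT_NS}">\n'
        # "\\t" and "\\n" write a literal backslash sequence, as meta.xml expects.
        '  <core encoding="UTF-8" fieldsTerminatedBy="\\t" linesTerminatedBy="\\n"\n'
        f'        fieldsEnclosedBy="" ignoreHeaderLines="1" rowType="{DWC}Occurrence">\n'
        "    <files><location>occurrence.txt</location></files>\n"
        '    <id index="0"/>\n'
        f"{fields}\n"
        "  </core>\n"
        "</archive>\n"
    )


def _clean(value):
    # Tabs and line breaks would break the tab-separated layout declared in meta.xml.
    return str(value).replace("\t", " ").replace("\r", " ").replace("\n", " ")


def write_occurrence_archive(records, columns=COLUMNS):
    """Return zip bytes containing meta.xml and occurrence.txt (no eml.xml)."""
    lines = ["\t".join(columns)]
    for rec in records:
        lines.append("\t".join(_clean(rec.get(col, "")) for col in columns))
    buf = io.BytesIO()
    with zipfile.ZipFile(buf, "w", zipfile.ZIP_DEFLATED) as zf:
        zf.writestr("meta.xml", build_meta_xml(columns))
        zf.writestr("occurrence.txt", "\n".join(lines) + "\n")
    return buf.getvalue()
```

The spreadsheet templates remain the easier starting point; the sketch only makes visible how the column headers become Darwin Core terms in `meta.xml`.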

Data holders also need to decide how their data will be hosted. Some choose to host and maintain their own instance of the Integrated Publishing Toolkit (IPT), a free, open-source software tool developed by the GBIF Secretariat. Others use alternatives such as hosted IPT services offered by national and thematic nodes, or cloud-based regional services maintained by the Secretariat.

Using the GBIF Data Validator, you can check datasets before publication and receive specific recommendations for improving and cleaning them. The validation report flags issues such as duplicate records, incomplete fields and recognized formatting inconsistencies.
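
For a quick local pass before running the validator, the sketch below flags two of the issue types mentioned above: duplicate `occurrenceID` values and records with empty key fields. It does not reproduce the GBIF Data Validator's actual checks. The `REQUIRED` tuple is an assumption chosen for illustration, and `quick_checks` is a hypothetical helper; consult GBIF's data quality requirements for the authoritative field list.

```python
from collections import Counter

# Assumed minimum fields for an occurrence record; check GBIF's published
# data quality requirements for the authoritative list.
REQUIRED = ("occurrenceID", "basisOfRecord", "scientificName", "eventDate")


def _is_blank(value):
    """Treat None and whitespace-only strings as missing."""
    return value is None or not str(value).strip()


def quick_checks(records):
    """Return human-readable issues found in a list of record dicts."""
    issues = []
    # Count each non-blank occurrenceID to find duplicates.
    ids = Counter(
        r.get("occurrenceID") for r in records if not _is_blank(r.get("occurrenceID"))
    )
    for occ_id, n in sorted(ids.items()):
        if n > 1:
            issues.append(f"duplicate occurrenceID {occ_id!r} appears {n} times")
    # Report missing fields per record, numbering records from 1.
    for row, rec in enumerate(records, start=1):
        missing = [f for f in REQUIRED if _is_blank(rec.get(f))]
        if missing:
            issues.append(f"record {row}: missing {', '.join(missing)}")
    return issues
```

Fixing these basics first tends to make the validator's report shorter and easier to act on.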

You can also prepare datasets to comply with GBIF's data quality requirements.

Choose a Creative Commons licence

In keeping with a 2014 decision by the GBIF Governing Board, data publishers must assign one of three Creative Commons licences to any occurrence dataset:

CC0, for data made available for any use without any restrictions
CC BY, for data made available for any use with appropriate attribution
CC BY-NC, for data made available for any non-commercial use with appropriate attribution

Note that CC BY-NC licences significantly limit the reusability of data, because they rule out any commercial use. GBIF encourages data publishers to choose the most open licence they can.
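
One way to see the effect on reuse: when records from several datasets are combined, as in a GBIF download, the combined data carry the most restrictive of the source licences. The sketch below models that rule using the three licences listed above, ordered from most open to most restrictive. The function names are hypothetical and the string labels are simplified stand-ins for the full licence identifiers.

```python
# Ordered from most open to most restrictive, as listed above.
LICENCE_ORDER = ["CC0", "CC BY", "CC BY-NC"]


def combined_licence(licences):
    """Return the most restrictive licence among the datasets being combined."""
    if not licences:
        raise ValueError("at least one licence is required")
    unknown = set(licences) - set(LICENCE_ORDER)
    if unknown:
        raise ValueError(f"unsupported licence(s): {sorted(unknown)}")
    return max(licences, key=LICENCE_ORDER.index)


def allows_commercial_use(licence):
    """Only the NC (non-commercial) licence excludes commercial reuse."""
    return licence != "CC BY-NC"
```

A single CC BY-NC dataset in a combination of otherwise open data is enough to make the whole result unavailable for commercial use.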

Publish datasets

If you’re using an IPT, simply click the button to ‘register’ your dataset with GBIF. Once it is published, you can view quick metrics on your dataset, its user download activity and traceable literature citations.
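
The same information can also be retrieved from the public GBIF API. The sketch below builds query URLs for a dataset's occurrence count, its downloads and the literature citing it, and reads the total from a JSON response. The endpoint paths, the `datasetKey` and `gbifDatasetKey` parameters and the `count` field are assumptions based on the GBIF API v1 and should be checked against its current documentation; `metric_urls` and `total_count` are hypothetical helpers.

```python
import json
from urllib.parse import quote, urlencode

# Assumed base URL of the public GBIF API.
API_BASE = "https://api.gbif.org/v1"


def metric_urls(dataset_key):
    """Build API URLs for occurrence count, downloads and citing literature."""
    key = quote(dataset_key, safe="")
    return {
        # limit=0 asks only for the total count, not the records themselves.
        "occurrences": f"{API_BASE}/occurrence/search?"
        + urlencode({"datasetKey": dataset_key, "limit": 0}),
        "downloads": f"{API_BASE}/occurrence/download/dataset/{key}",
        "citations": f"{API_BASE}/literature/search?"
        + urlencode({"gbifDatasetKey": dataset_key}),
    }


def total_count(response_body):
    """Read the assumed 'count' field from a paged JSON response."""
    return json.loads(response_body)["count"]
```

To run a query, pass one of the URLs to `urllib.request.urlopen(url).read()` and hand the result to `total_count`.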

Incentives for publishing open-access biodiversity data

An important part of GBIF's mission is to promote a culture in which people recognize the benefits of publishing open-access biodiversity data, for themselves as well as for the broader society.

By making your data discoverable and accessible through GBIF and similar information infrastructures, you will contribute to global knowledge about biodiversity, and thus to the solutions that will promote its conservation and sustainable use.
Data publishing enables datasets held all over the world to be integrated, revealing new opportunities for collaboration among data owners and researchers.
Publishing data ensures that individuals and institutions are properly credited for their work creating and curating biodiversity data, and good metadata authoring gives publishing institutions visibility. Authoring a peer-reviewed data paper extends this recognition further by giving scholarly credit for the publication of biodiversity datasets.
Collection managers can trace usage and citations of digitized data published from their institutions and accessed through GBIF and similar infrastructures.
Some funding agencies now require researchers receiving public funds to make data freely accessible at the end of a project.