Data hosting

A quick guide for making good decisions about how to host data shared with GBIF

Wikimedia Foundation servers. Photo 2012 Victor Grigas, Wikimedia Foundation, licensed under CC BY-SA 3.0.

GBIF.org is an index of biodiversity data published through a globally distributed network of national, thematic and project infrastructures. Within this interconnected system, it is essential for data publishers to ensure that the data they share has a persistent, stable point of access. This requirement is challenging for many institutions, especially those that are new to GBIF and may not have the facilities to host and maintain data on servers that always remain online.

One way to start addressing the challenge is to distinguish between data publishing and data hosting. While these activities are connected, there is no formal or technical requirement that the same institution must perform both tasks (even if that is generally the case).

Data publishing is the act of organizing and sharing data standardized for use through the GBIF network. An institution becomes a GBIF data publisher by completing an online registration form and receiving endorsement, either through one of the national and organizational Participants in the GBIF network or the Nodes Steering Group.

Data hosting is the act of storing the data on a stable and accessible web platform. While there is no standard arrangement for providing this service, data hosting is a significant commitment: it requires dedicated, long-term capacity to maintain a persistent and highly reliable web-connected platform.
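One way to gauge whether a host meets this commitment is to probe its public endpoint on a schedule and track the share of successful responses. The Python sketch below is illustrative only: the function names and the URL https://ipt.example.org/ are hypothetical, and it is not a GBIF tool or a GBIF requirement.

```python
from typing import Callable, Iterable
from urllib.request import urlopen


def default_fetch(url: str, timeout: float = 10.0) -> int:
    """Return the HTTP status code for a GET request to url."""
    with urlopen(url, timeout=timeout) as response:
        return response.status


def is_reachable(url: str, fetch: Callable[[str], int] = default_fetch) -> bool:
    """True if the endpoint answers with a 2xx status, False on any error.

    urlopen raises URLError/HTTPError (both subclasses of OSError) for
    network failures and error statuses, and ValueError for malformed URLs.
    """
    try:
        return 200 <= fetch(url) < 300
    except (OSError, ValueError):
        return False


def availability(checks: Iterable[bool]) -> float:
    """Fraction of successful checks; 0.0 when no checks were recorded."""
    results = list(checks)
    return sum(results) / len(results) if results else 0.0


if __name__ == "__main__":
    # Hypothetical endpoint; replace with the address of your own host.
    print(is_reachable("https://ipt.example.org/"))
```

Passing the fetch function as a parameter keeps the check testable without network access. A scheduler such as cron could call is_reachable at regular intervals and feed the recorded results to availability.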

Regardless of who hosts the datasets, GBIF works to credit both the data publishing institution and its country of registration. What follows is a quick guide to making informed decisions about how to host data shared with GBIF.


Hosting steps

Once your data have been organized into a supported data format, proceed as follows:

  1. Become a GBIF data publisher by completing the publisher registration form
  2. Choose a data hosting and publishing platform. GBIF's Integrated Publishing Toolkit (IPT) can be self-hosted, hosted by a national or thematic node (including one of several available trusted data-hosting centres), or hosted on one of the GBIF Secretariat's cloud-based regional IPTs.
  3. Get access to the IPT manual and training resources
  4. Start publishing your datasets


Intro to the IPT: Integrated Publishing Toolkit

The IPT is free open-source software developed and supported by the GBIF Secretariat that organizations around the world use to publish and share biodiversity datasets through the GBIF network. The IPT can also function as a repository for data referenced in an article, as in this example of an IPT installation hosted by the Canadensys network.

Learn more about the technical requirements for hosting an IPT

Test Mode

The IPT can be installed in Test mode, which is intended for evaluating the software or conducting training. In Test mode, registrations go into a test registry, so hosted resources are never indexed and cannot be found by searching GBIF.org. If you decide to install your own IPT, GBIF recommends that you try Test mode first in order to understand the registration process.

Once you are sure that the IPT is working the way that you expect, you will have to reinstall the software in Production mode to make the data actually discoverable through GBIF. Production mode registers datasets and publishes them so they are indexed and publicly accessible through GBIF.org.
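The difference between the two modes can be summarized as a small lookup. This Python sketch only restates the behaviour described above; the class names and registry labels are hypothetical and are not taken from the IPT's own code or configuration.

```python
from dataclasses import dataclass
from enum import Enum


class IptMode(Enum):
    TEST = "test"
    PRODUCTION = "production"


@dataclass(frozen=True)
class ModeBehaviour:
    registry: str          # where registrations are recorded
    indexed_by_gbif: bool  # whether resources become searchable on GBIF.org


# Hypothetical labels summarizing the two modes as described in this guide.
BEHAVIOUR = {
    IptMode.TEST: ModeBehaviour(registry="test registry", indexed_by_gbif=False),
    IptMode.PRODUCTION: ModeBehaviour(registry="GBIF registry", indexed_by_gbif=True),
}


def is_discoverable(mode: IptMode) -> bool:
    """Return True only when the mode makes resources discoverable through GBIF.org."""
    return BEHAVIOUR[mode].indexed_by_gbif
```

The mode is modelled as a fixed property of an installation rather than a switch, which mirrors the need to reinstall the software to move from Test to Production.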

Both the IPT instance and its associated organization must be registered with GBIF. If your organization isn't registered yet, you will be asked to complete this step and provide basic information through a short form in the IPT. Learn more about how this works in the IPT User Manual.


Terms of Use

The use of an external data host by a data publisher should be negotiated between the respective parties, ideally with a service-level agreement that outlines the terms and obligations for both the data publisher and the data host. The use of GBIF's cloud-hosted IPT will be governed by the GBIF Data Publisher Agreement.