Data accessed through the GBIF network is free for all—but not free of obligations. Under the terms of the GBIF data user agreement, users who download individual datasets or search results and use them in research or policy agree to cite them using a DOI, or Digital Object Identifier.
Good citation practices ensure scientific transparency and reproducibility by guiding other researchers to the original sources of information. They also reward data-publishing institutions and individuals by reinforcing the value of sharing open data and demonstrating its impact to their stakeholders and funders. Datasets published through GBIF are authored electronic data publications and, as such, should be treated as first-class research outputs and correctly cited.
While all example citations below are formatted in Harvard style, please adapt them to the style required by your institution, publisher or agency. However, please include every element of content, most importantly the DOI expressed as a URL.
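DOIs are often copied in several forms: a bare DOI, a "doi:" prefix, or an older dx.doi.org link. Because every citation should carry the DOI as a URL, it can help to normalise them all to the https://doi.org/ form. The Python sketch below does this. The helper name doi_to_url and the list of prefixes it strips are illustrative assumptions, not part of any GBIF tool.

```python
import re

# Prefixes commonly found in front of a bare DOI (illustrative, not exhaustive).
_DOI_PREFIX = re.compile(r"^(?:https?://(?:dx\.)?doi\.org/|doi:\s*)", re.IGNORECASE)


def doi_to_url(doi: str) -> str:
    """Return a DOI in resolvable URL form, e.g. https://doi.org/10.15468/dl.f14yjv."""
    bare = _DOI_PREFIX.sub("", doi.strip())
    # Every DOI name starts with the "10." directory indicator.
    if not bare.startswith("10."):
        raise ValueError(f"not a DOI: {doi!r}")
    return "https://doi.org/" + bare


for raw in ("10.15468/dl.f14yjv", "doi:10.15472/ciasei", "http://dx.doi.org/10.15468/dl.f14yjv"):
    print(doi_to_url(raw))
```

The function only rewrites the form of the identifier. It does not check that the DOI actually resolves.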
- Occurrence data downloads
- Individual datasets
- Species pages
- Derived datasets
- Occurrence data obtained using third-party tools (e.g. rgbif, pygbif, spocc, dismo)
- Occurrence data accessed via the GBIF occurrence search API
- Occurrence data accessed in a cloud computing environment
- Citing non-data GBIF content
Occurrence data downloads
When downloading data from GBIF.org, a registered user is immediately redirected to a download page that includes a citation for the download, complete with its DOI.
This citation appears again in the confirmation email sent to the registered user. Keep this reference close so you can cite it. Details of previous downloads can always be accessed in the registered user's list of downloads. Please contact GBIF if you need help finding a previous download.
The download page lists all contributing datasets and records a snapshot of all search terms, filters and facets. From the download page, users can quickly rerun or update the search. They will also see links to any publications that cite the download once these are picked up by GBIF's literature tracking programme.
Citing filtered downloads
If you have significantly filtered the downloaded data, you can create a derived dataset to cite only the records used in your downstream analysis. This requires that you preserve the datasetKey column through all filtering steps.
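The sketch below shows what keeping datasetKey through a filtering step can look like in Python. It filters a tab-delimited occurrence table and tallies the surviving records per dataset, which is the information a derived dataset needs. Several details are assumptions for illustration: the function name filter_download, the tiny sample table with dataset keys ds-1 to ds-3 standing in for real UUIDs, and the choice to read the file without quote handling.

```python
import csv
import io
from collections import Counter


def filter_download(handle, keep_row):
    """Filter a tab-delimited download while keeping datasetKey on every row.

    handle   -- an open text file (or StringIO) holding the occurrence table
    keep_row -- predicate deciding which records are used in the analysis
    Returns the kept rows and a Counter of kept records per datasetKey.
    """
    # QUOTE_NONE treats quote characters inside fields as ordinary text
    # (an assumption about the export; adjust if your file uses quoted fields).
    reader = csv.DictReader(handle, delimiter="\t", quoting=csv.QUOTE_NONE)
    if "datasetKey" not in (reader.fieldnames or []):
        raise ValueError("datasetKey column missing; per-dataset counts cannot be derived")
    kept = [row for row in reader if keep_row(row)]
    counts = Counter(row["datasetKey"] for row in kept)
    return kept, counts


# Hypothetical miniature download; real files have many more columns.
SAMPLE = (
    "gbifID\tdatasetKey\tcountryCode\tyear\n"
    "1\tds-1\tSE\t2015\n"
    "2\tds-1\tSE\t2016\n"
    "3\tds-2\tSE\t1990\n"
    "4\tds-2\tNO\t2018\n"
    "5\tds-3\tNO\t2001\n"
    "6\tds-2\tSE\t2005\n"
)

# Keep Swedish records from 2000 onwards.
kept, counts = filter_download(
    io.StringIO(SAMPLE),
    lambda row: row["countryCode"] == "SE" and int(row["year"]) >= 2000,
)
print(len(kept), dict(counts))
```

For a real download, pass in a file opened with newline="" and encoding="utf-8" instead of the StringIO sample.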
Citing multiple downloads
If you have used multiple downloads, you may not be able to include all citations in the reference list of your article. In this case, we recommend including a supplementary list or addendum of all downloads used. You may also choose to summarize the combined data using a derived dataset. Note that the GBIF download system allows for multiple taxa (up to 100,000) in a single download request.
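A single derived dataset can summarise several downloads. If so, a record that appears in more than one download should be counted only once. The Python sketch below assumes each download has already been reduced to (gbifID, datasetKey) pairs and uses the gbifID to deduplicate. The function name and the sample data are hypothetical.

```python
from collections import Counter
from typing import Iterable, Tuple


def combined_dataset_counts(downloads: Iterable[Iterable[Tuple[str, str]]]) -> Counter:
    """Count distinct records per datasetKey across several downloads.

    Each download is an iterable of (gbifID, datasetKey) pairs. A record that
    occurs in more than one download (same gbifID) is counted only once.
    """
    dataset_of = {}
    for records in downloads:
        for gbif_id, dataset_key in records:
            dataset_of[gbif_id] = dataset_key  # later duplicates overwrite, not add
    return Counter(dataset_of.values())


# Two hypothetical downloads that overlap on record "3".
download_a = [("1", "ds-1"), ("2", "ds-1"), ("3", "ds-2")]
download_b = [("3", "ds-2"), ("4", "ds-3")]
print(dict(combined_dataset_counts([download_a, download_b])))
```

Simply adding up the per-download counts would report two records for ds-2 here, although only one distinct record exists.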
Individual datasets
Most downloads from GBIF.org contain records from multiple datasets (as above), but in some instances, such as internal reporting or the advance publication of a dataset for research, users may want or need to cite a single dataset, as in this example:
Rivas Pava M D P, Muñoz Lara D G, Ruiz Camayo M A, Fernández Trujillo L F, Muñoz Castro F A, Pérez Muñoz N (2017). Colección Mastozoológica del Museo de Historia Natural de la Universidad del Cauca. Version 1.1. Universidad del Cauca. Occurrence dataset https://doi.org/10.15472/ciasei accessed via GBIF.org on 2020-03-02.
Note that, as datasets may change over time, even single-dataset downloads are assigned new, unique DOIs, which should be used in citations. If appropriate, this can be done in combination with the original dataset citation, e.g.:
Telenius A, Jonsson C (2017). Molluscs of the Gothenburg Natural History Museum (GNM). GBIF-Sweden. Occurrence download https://doi.org/10.15468/dl.f14yjv accessed via GBIF.org on 2020-03-02.
Species pages
Each species page includes a default citation.
Note: If making assertions about the distribution of a given taxon, consider making a download of occurrences. This will ensure a persistent time-stamped snapshot of data with a DOI that can be cited in the same way as occurrence data downloads.
Derived datasets
Derived datasets are citable records of GBIF-mediated occurrence data derived from either:
- a GBIF.org download that has been filtered/reduced significantly, or
- data accessed in a cloud computing environment, or
- data obtained by any means for which no DOI was assigned, but one is required (e.g. third-party tools accessing the GBIF search API)
When created, a derived dataset is assigned a unique DOI that can be used to cite the data. To create a derived dataset, you need to authenticate with a GBIF.org account and provide a list of the GBIF datasets (by DOI or datasetKey) from which the data originated, ideally with a count of the records each dataset contributed.
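Per-dataset record counts can be written out as a simple two-column list. The Python sketch below does this with largest contributors first and leaves out datasets that contributed nothing. The layout (one "datasetKey,count" line per dataset, no header row) is an assumption made for illustration. Check the GBIF derived-dataset form for the exact format it currently accepts. The function name is hypothetical.

```python
import csv
import io
from collections import Counter


def dataset_counts_csv(counts: Counter) -> str:
    """Serialise per-dataset record counts as 'datasetKey,count' lines.

    Assumed layout: no header row, largest contributors first, ties broken
    alphabetically by key.
    """
    buf = io.StringIO()
    writer = csv.writer(buf, lineterminator="\n")
    for key, n in sorted(counts.items(), key=lambda kv: (-kv[1], kv[0])):
        if n > 0:  # datasets that contributed no records are left out
            writer.writerow([key, n])
    return buf.getvalue()


print(dataset_counts_csv(Counter({"ds-2": 1, "ds-1": 2, "ds-3": 0})), end="")
```

The keys may be datasetKeys or dataset DOIs, since either identifies the source dataset.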
GBIF data accessed using third-party tools (e.g. rgbif, pygbif, spocc, dismo)
Accessing occurrence data from GBIF in R, Python and other programming languages is fast and easy. It is, however, important to always keep in mind that the citation requirements of the GBIF data user agreement still apply.
For most users, obtaining occurrence data with the occ_download() function of the rgbif package is strongly recommended, as this ensures that downloads are assigned DOIs for easy citation.
Tools that return results directly from the GBIF search API (e.g. spocc, dismo and the occ_data() and occ_search() functions of rgbif) do not assign a DOI to the data retrieved. It is up to the user to identify the dataset publishers and properly acknowledge each of them when citing the data.
For data obtained via occurrence search API-based tools, we recommend using a derived dataset as an easy way of obtaining a DOI for citing the data. The rOpenSci documentation site provides instructions on how to cite GBIF-mediated data in rgbif.
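Results from the occurrence search API carry no DOI. However, each returned record includes its datasetKey, so the dataset list needed for a derived dataset can be tallied as the pages are read. The Python sketch below pages through https://api.gbif.org/v1/occurrence/search using its offset, limit and endOfRecords fields. It assumes 300 is the maximum page size and that this approach is only practical for modest result sets. The function names are hypothetical. The example at the end runs offline on two trimmed, made-up pages rather than calling the API.

```python
import json
import urllib.parse
import urllib.request
from collections import Counter
from typing import Iterable, Iterator

SEARCH_URL = "https://api.gbif.org/v1/occurrence/search"


def iter_pages(params: dict, limit: int = 300) -> Iterator[dict]:
    """Yield successive pages of occurrence search results (makes network calls)."""
    offset = 0
    while True:
        query = urllib.parse.urlencode({**params, "offset": offset, "limit": limit})
        with urllib.request.urlopen(f"{SEARCH_URL}?{query}") as resp:
            page = json.load(resp)
        yield page
        if page.get("endOfRecords", True):  # stop on the last page
            break
        offset += limit


def count_datasets(pages: Iterable[dict]) -> Counter:
    """Tally records per datasetKey across pages of search results."""
    counts = Counter()
    for page in pages:
        counts.update(record["datasetKey"] for record in page.get("results", []))
    return counts


# Offline example with two hypothetical, heavily trimmed result pages.
pages = [
    {"endOfRecords": False, "results": [{"key": 1, "datasetKey": "ds-1"},
                                        {"key": 2, "datasetKey": "ds-2"}]},
    {"endOfRecords": True, "results": [{"key": 3, "datasetKey": "ds-1"}]},
]
print(dict(count_datasets(pages)))
```

Against the live API, count_datasets(iter_pages({"country": "SE", "year": "2020"})) would build the same kind of tally. The resulting counts can then be listed when registering a derived dataset.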
Occurrence data accessed in a cloud computing environment
GBIF makes monthly snapshots of occurrence data available for analysis in a number of cloud computing environments:
Users accessing and/or analysing data in such cloud environments should refer to the specific instructions provided in the cloud computing repositories. As a minimum, include the DOI of the relevant snapshot (see table) in the citation. For analyses where data are significantly filtered, please track the datasetKeys used and create a derived dataset for citing the data.
Citing non-data GBIF content
Those wishing to cite GBIF's website in general can use the following example:
GBIF.org (year) GBIF Home Page. Available from https://www.gbif.org [13 January 2020].
Authored content at GBIF.org (web page)
Similarly, users can cite non-data pages on the GBIF website, for example:
GBIF.org (year) Citation guidelines. Available from https://www.gbif.org/citation-guidelines [13 January 2020].
Note: this approach is not an accepted alternative for citing data downloads.
GBIF as an infrastructure/entity
We recommend that those wishing to cite GBIF in a broader, more general context should use the following citation:
GBIF: The Global Biodiversity Information Facility (year) What is GBIF? Available from https://www.gbif.org/what-is-gbif [13 January 2020].