FAQ

Answers to some of the most frequently asked questions about GBIF and GBIF.org

As a data publisher, how do I make sure that my organization's logo looks best on GBIF.org?

Simple steps to format your logo: you can use any basic image tool to adjust your file. Create a canvas: for example, 900 x 600 pixels (for a 3:2 ratio). Place your logo: center the logo on the…

I have occurrence data/photos/videos of species x that I would like to submit to GBIF. (How) can I do so?

GBIF only aggregates data from endorsed institutions. If you are affiliated with an institution that is already registered as a data publisher, you can find details of whom to contact on their data…

I'm having issues connecting to and accessing GBIF services or network—what might be wrong and what can I do?

GBIF servers are openly accessible on the internet and intended for use by all. Our data-processing infrastructure also needs to make outbound connections to registered data repositories in order to…

How do I create a CSV file for a derived dataset?

The CSV file must contain exactly two columns, the first containing datasetKey identifiers and the second containing a count (integer) of how many occurrences were derived from that specific dataset.…
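
A minimal Python sketch of building such a file, assuming you already have one datasetKey value per derived occurrence record. The function name is hypothetical, and writing the file without a header row is an assumption rather than something stated here:

```python
import csv
from collections import Counter


def write_derived_dataset_csv(dataset_keys, path):
    """Write one row per source dataset: datasetKey, then an integer count.

    `dataset_keys` holds one datasetKey per derived occurrence record,
    so repeated keys are tallied into the second column.
    Assumption: no header row is written.
    """
    counts = Counter(dataset_keys)
    with open(path, "w", newline="", encoding="utf-8") as handle:
        writer = csv.writer(handle)
        for dataset_key, count in sorted(counts.items()):
            writer.writerow([dataset_key, count])
    return counts
```

Calling it with a list of keys in which one key appears twice would produce a single row for that key with a count of 2.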

Which coordinate systems are used for GBIF occurrence downloads?

Publishers can share their coordinate data in various spatial reference systems, datums and ellipsoids, but GBIF's ingestion workflow always reprojects the coordinates to World Geodetic System (WGS)…

How can I add events to my project page?

To have events related to a project show up on the project's page, please suggest the event and make sure to include your project ID under GBIF-funded projects.

How can I link datasets to my project page?

For datasets to be displayed on a project page, you must ensure that your project ID has been specified correctly, i.e., under Metadata -> Project Data -> Identifier in the IPT, or <project…

I used data from GBIF, but I don't have a DOI—what can I do?

If you downloaded occurrence data through GBIF.org, you simply need to log back into your account and find the download from the list of downloads. If you obtained the data through a third-party tool…

I lost/forgot the DOI of my download. How can I find the DOIs of past downloads?

You can access a full history of all previous downloads in your GBIF.org account. After logging in, click your username in the upper-right corner and then click the "Downloads" tab. Downloads are shown in reverse chronological order, i.e., with the most recent downloads at the top.

How can I improve the matching of occurrence records with GRSciColl collection registry entities?

GBIF matches primary occurrence records to GRSciColl collection entries using fields such as collectionCode, institutionCode and collectionID. Sometimes these are ambiguous, and the matches are then flagged…

What are the IP addresses of GBIF servers?

All GBIF servers have IP addresses in the range 130.225.43.0/24, and data publishers and hosts should adjust their firewall filters to make sure that access is allowed from this range.
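
As a rough illustration, the following Python sketch checks whether an address seen in a server log falls inside that range. The function name is hypothetical, and how you apply the result to an actual firewall depends on your own setup:

```python
import ipaddress

# The range quoted in the answer above.
GBIF_RANGE = ipaddress.ip_network("130.225.43.0/24")


def is_gbif_address(address):
    """Return True if `address` lies inside the GBIF server range."""
    return ipaddress.ip_address(address) in GBIF_RANGE
```

An invalid address string raises `ValueError`, so log parsing code may want to catch that.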

What are some examples of expected deliverables from the final evaluation phase of the BID grant?

These would be user-defined, integrated data products and communication tools derived from these products that could include:

Data summary tables indicating population trends, species richness and community composition
Map products such as species distributions or geographical data gap analyses
Indicators informing Sustainable Development Goals or CBD targets

Why does my download contain a different number of records than reported in the portal and the download page?

The occurrences shown when searching on GBIF.org and the occurrences used for generating large download archives (approximately 100,000 records or more) originate from two separate data stores. The…

What is the definition of "Relevance" and the different options used in this filter?

In the literature database, Relevance refers to how publications relate to GBIF following these definitions: GBIF used: makes substantive use of data in a quantitative analysis (e.g. ecological…

What does the taxon status "doubtful" mean and when is it used?

A taxon name can be flagged as doubtful for several reasons that have to do with the way the GBIF backbone taxonomy is built. Here is a non-exhaustive list: The name comes from a nomenclator…

In what formats can I download occurrence data from GBIF.org?

Occurrence data can be downloaded in the following formats: CSV: Tab delimited CSV. Only contains the data after GBIF interpretation. No multimedia included. More information about CSV Darwin Core…

How many species are in GBIF?

This may seem like a simple question, but the answer isn't. For a detailed explanation of why GBIF doesn't show a species count on the home page, please refer to this article.

(How) can I publish molecular/sequence/DNA based data to GBIF?

Yes, you can publish sequence-based data to GBIF. Read more here. Note that GBIF.org does not index and provide search for the sequences themselves. However, sequences may be displayed on individual…

How can I use data from GBIF in ArcGIS?

While it is beyond the scope of this FAQ to provide detailed and exhaustive help on how to use ArcGIS, these simple steps can be followed to add species occurrences to a map. Using downloaded data (small…

What is the Darwin Core Archive download format? What's contained in the file?

This format is a TDWG Standard and contains rich information. It is a ZIP file containing the original data as shared by the publisher, and the interpreted view after data has gone through quality…

What is the Tab delimited CSV download format? What's contained in the file?

This simple format provides a view of the data with the most commonly used columns, separated by tabs. The table includes only the data after it has gone through interpretation and quality control. Tools such as Microsoft Excel can be used to read this format.
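
For readers who prefer a script to a spreadsheet, here is a minimal Python sketch of reading such a tab-separated file with the standard library. The sample rows are made up for illustration, and disabling quote handling is an assumption about how the file is written:

```python
import csv
import io


def read_tab_delimited(handle):
    """Parse a tab-separated file into a list of dicts keyed by column name."""
    # Assumption: fields are not quoted, so quote characters are kept as data.
    reader = csv.DictReader(handle, delimiter="\t", quoting=csv.QUOTE_NONE)
    return list(reader)


# Made-up sample standing in for a downloaded file.
sample = io.StringIO(
    "gbifID\tspecies\tcountryCode\n"
    "1\tPuma concolor\tAR\n"
    "2\tLynx lynx\tNO\n"
)
rows = read_tab_delimited(sample)
```

For a real download, pass a file opened with `open(path, newline="", encoding="utf-8")` instead of the in-memory sample.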

How do I download images shown in the gallery?

Some occurrence records have images associated with them, and in occurrence searches you can view all images in a search using the Gallery tab. However, as images are not hosted by GBIF and may be…

How do I report a bug or error, or give feedback in general, on this site?

You can access Feedback and questions by clicking the speech bubble in the upper right-hand corner. Here, you can report bugs or issues on specific content, as well as submit ideas for new developments on the site.

All feedback submitted is public and available through GitHub.

Why do some taxa show as being "deleted"? What does this mean?

This means that the taxon has been deleted from the GBIF Backbone Taxonomy, but the identifier is preserved for historical purposes. Taxa are deleted for a number of reasons, e.g. duplicate entries, being removed from source checklist, etc.

What is a DOI? How does GBIF use them?

A Digital Object Identifier, or DOI, is a standard, permanent identifier that provides an actionable, interoperable, persistent link to any entity. The concept is that DOI differs from commonly used…

Can I use the images I find via GBIF.org?

Though some occurrence records published on GBIF.org do include photo, audio or even video content as evidence, GBIF does not publish any of this multimedia content directly. Instead, data providers…

What is an orphan dataset?

A dataset indexed by GBIF.org is considered "orphaned" when GBIF hasn't been able to re-ingest it from its source for more than six months. At this point, GBIF takes steps to export the dataset to a…

How do I open tab-delimited CSV files downloaded from GBIF.org in Excel?

If you're using Excel on a Mac
Open Excel
Create a new empty spreadsheet (File → New)
Import text file (Data → Get Data → From File → From Text/CSV)
Select the downloaded CSV file (e.g.…

How and when does GBIF assign Digital Object Identifiers (DOIs)?

GBIF assigns unique Digital Object Identifiers (DOIs) to all datasets and occurrence downloads. When data is used, following DOI citation practice ensures an easy and consistent way of crediting…

For how long does GBIF store downloads?

Download files are initially stored for six months. After six months, the CSV or Darwin Core file may be deleted, but the information about the download will be kept forever. This includes the DOI,…

What is a publishing institution?

A publishing institution is a GBIF data publisher that has published at least one dataset. As newly endorsed publishers may not yet have published any data, the total number of data publishers is slightly higher.

Why can't I open the zip file I downloaded from GBIF.org?

Downloads larger than four gigabytes (4 GB) must be compressed using ZIP64, an extension of the original zip format. Not all operating systems support this extension natively; MS Windows XP and Mac OS X are among those that do not. Please make sure that the software you use to decompress the file is compatible with the ZIP64 extension.
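
If no ZIP64-capable desktop tool is at hand, one option is Python's standard `zipfile` module, which reads ZIP64 archives. The sketch below is illustrative only; the function names and paths are hypothetical:

```python
import zipfile


def list_zip_members(path):
    """Return the names of the files stored in the archive at `path`."""
    with zipfile.ZipFile(path) as archive:
        return archive.namelist()


def extract_zip(path, target_dir):
    """Extract every member of the archive into `target_dir`."""
    with zipfile.ZipFile(path) as archive:
        archive.extractall(target_dir)
```

Listing the members first is a cheap way to confirm the archive opens before extracting several gigabytes.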

What is inside a GBIF download zip file?

When you request a download in the GBIF data portal, you will receive a Darwin Core Archive file (DwC-A). This is the most widely-used data exchange file format in the GBIF network. To open it, you…

How do I provide feedback on data trend charts?

Please use the feedback button at the top right of each page or contact us by email.

What would it take for me to produce data trend charts myself in a different style or language?

The scripts used for this work are maintained in the GitHub project site. GBIF can provide the underlying digested data in the form of a collection of CSV files which can be used in various…

Can I make suggestions for other interesting charts that I would like to see on GBIF.org?

In future GBIF work programmes, it may be possible to extend this work further to include other interesting trends around data mobilization in GBIF. Please use the feedback button to provide any additional ideas or comments on the current charts, or consider contributing to the project.

What can I do to improve the completeness of records available through GBIF?

A complete record is here defined as having species identification, valid coordinates and the full date of collection or observation. Some records published to GBIF are incomplete. There can be…
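
The definition above can be sketched as a simple check in Python. The field names are assumptions in the style of Darwin Core terms, and "valid coordinates" is reduced here to a numeric range test, which is likely narrower than GBIF's own validation:

```python
def is_complete(record):
    """Return True if a record has a species, in-range coordinates and a full date.

    `record` is a dict; the keys used here are assumed field names.
    """
    has_species = bool(record.get("species"))

    try:
        lat = float(record["decimalLatitude"])
        lon = float(record["decimalLongitude"])
    except (KeyError, TypeError, ValueError):
        has_valid_coordinates = False
    else:
        # Range check only; this does not detect swapped or zeroed coordinates.
        has_valid_coordinates = -90 <= lat <= 90 and -180 <= lon <= 180

    # A full date needs year, month and day to all be present.
    has_full_date = all(record.get(part) for part in ("year", "month", "day"))

    return has_species and has_valid_coordinates and has_full_date
```

Running such a check over your own data before publishing gives a quick count of records that would be treated as incomplete under this definition.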

Which technologies are involved in producing data trend charts?

The original unprocessed data resides in Hadoop. Hive is used for the SQL processing on the Hadoop data using custom UDFs wrapping the GBIF core processing libraries (Java). Hive is used to digest the data into CSV tables. All other processing is in R.

How did you select the colours used in data trend charts and can we improve them?

The colour palettes come from colorbrewer2.org, and an attempt was made to select colours that would be colour-blind safe. It is difficult to find suitable colour palettes that work on all charts (e.g. global and country specific) and input would be greatly appreciated to help improve these.

Why are data trend charts presented as static images and not something more dynamic?

This is a first iteration of work. Future versions could be more interactive, although one has to consider if a PDF view or simple images for (e.g.) annual reports are required. As an open project, anyone with interest in improving the data visualization is welcome to get involved. Please contact us.

How do I submit suggestions to improve the clarity of the data trend charts?

Please use the feedback button on the top of the page to log any suggestions.

What is the cause of strange peaks in the charts showing trends in the temporality of the data?

The charts may reveal patterns that represent biases in data collection (seasonality, public holidays) or potential issues in data management (disproportionate numbers of records shown for the first or last days in the year or each month or week). Such issues may arise at various stages in data processing and require further investigation.
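
One way to look for the second kind of pattern in your own data is to tally records by day. This is a hedged Python sketch, not the method used for the charts; the function names are hypothetical:

```python
from collections import Counter
from datetime import date


def day_of_month_counts(dates):
    """Count records per day of the month (1 to 31)."""
    return Counter(d.day for d in dates)


def first_of_january_share(dates):
    """Return the fraction of records dated 1 January of any year."""
    dates = list(dates)
    if not dates:
        return 0.0
    hits = sum(1 for d in dates if d.month == 1 and d.day == 1)
    return hits / len(dates)
```

A share far above roughly 1/365 suggests that 1 January may be standing in for an unknown day or month in the source data.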

In some trend charts, why does the amount of mobilized data sometimes go down before going up again?

This is due to the removal of datasets from GBIF. This might occur if a publisher wishes to remove their data, but is often due to the removal of datasets that were inadvertently published twice (duplicate datasets).

How do data trend charts take into account changes in the GBIF taxonomic backbone over time?

All data is processed to the latest GBIF backbone taxonomy, to ensure that species counts are comparable over time.

How are data trend charts produced?

The project is documented on the GitHub project site. Approximately four historical views per year of the GBIF index are restored (totalling approximately 8 billion records in May 2014), and the raw data is processed to the latest quality control and taxonomic backbone. Various scripts are then used to digest the records into smaller views, which are then processed in R to produce the charts.

How can I contribute to data trend charts?

This project is being developed openly on the GitHub project site. While some data preparation stages require access to the GBIF index and Hadoop infrastructure, other stages run using R and can be developed remotely. Please contact us if you would like to contribute to the work.

How often are data trend charts updated?

The charts show data trends from the end of 2007 until recent weeks and will be recalculated periodically, approximately quarterly.

Can I reuse data trend charts in my national reports?

Yes; however, we encourage you to review them before doing so.

Are data trend reports available for download?

The data used for the data trend charts and reports is available to download at https://analytics-files.gbif.org/

Who produces the data trend charts on GBIF.org, and why?

The GBIF Secretariat is producing information on data mobilization trends observed on the GBIF network. Showing trends on the data mobilized by the GBIF network can help with planning data mobilization efforts, showing the results of previous investments in digitization or data mobilization, or in highlighting issues to be targeted to improve the fitness-for-use of the data.

What type of datasets does GBIF index/support?

GBIF currently supports four classes of datasets. However, GBIF currently indexes only species occurrence records, which can be provided as either core records or extension records. In the case of sampling-event datasets, species occurrences in extension records will be augmented with information from their core event record wherever possible.

How often does GBIF reindex my dataset?

GBIF automatically attempts to reindex a registered dataset each time its registration is updated. This happens each time the dataset gets republished via the IPT. To cater to datasets not published…

Where can I find additional information about how GBIF automatically generates text citations for datasets?

You’ll find a slightly more formal description of the logic behind automatic citation generation in this GBIF GitHub repository.

How is the dataset citation generated if I don’t name any authors, or list only the metadata author without any originating authors?

In cases where no authors are named, or where only metadata authors are named without any originating authors, the citation text will start with the name of the publishing institution, followed by the publication year and the other elements.

How is the dataset citation text auto-generated?

By default, the auto-generated citation contains the following information: Name(s) of the dataset’s originating author(s), formatted to show surname and initial(s), e.g. ‘Andersen AA’ for Anders…

Where does the dataset citation text come from? I published this dataset, and that is not the citation text that I provided!

The data standards that GBIF supports, and that institutions use to publish their data through GBIF, include a number of so-called metadata elements–descriptive information, that is, about the…

Can you provide an example of an auto-generated dataset citation?

Text appears at the bottom of each dataset page on GBIF.org to provide guidance on how users should cite it. We also provide more general guidelines on citations. Here's an example: Khidas K,…

Why hasn’t GBIF (re)indexed my dataset yet?

Occasionally, GBIF turns off its indexing service for maintenance. This is the most common reason why datasets aren’t indexed as quickly as expected. If your dataset has been successfully reindexed,…

How long does it take GBIF to start (re)indexing my dataset?

The answer depends on how long GBIF's indexing queue is, how big your dataset is and whether GBIF's indexing service is turned on. Normally it will take between 5 and 60 minutes for GBIF to start…

These are some of the most frequently asked questions about GBIF and GBIF.org. Please contact us if your questions aren’t answered here.