Please contact us with questions about the data.
Who is producing these charts and why?
The GBIF Secretariat is producing information on data mobilization trends observed on the GBIF
Showing trends on the data mobilized by the GBIF network can help with planning data
showing the results of previous investments in digitization or data mobilization, or in
to be targeted to improve the fitness-for-use of the data.
Can I reproduce these charts in my national reports?
Yes; however, we encourage you to review them before doing so.
How often will the charts be updated?
The charts show data trends from the end of 2007 until recent weeks and will be recalculated
periodically, approximately quarterly.
How can I contribute to this work?
This project is being developed openly on the GitHub
While some data preparation stages require access to the GBIF index and Hadoop infrastructure,
stages run using R and can be developed remotely. Please contact
if you would like to contribute to the work.
Understanding the trends and improving the charts
How have these trends been produced?
The project is documented on the GitHub project
Approximately 4 historical views per year of the GBIF index were restored (totalling
Billion records in May 2014), and the raw data were processed to the latest quality
control and taxonomic backbone.
Various scripts were then used to digest the records into smaller views which are then processed
in R to produce the charts.
How do they take into account changes in the GBIF taxonomic backbone over time?
All data are processed to the latest GBIF backbone taxonomy, to ensure that species counts are
comparable over time.
In some charts I can see that the amount of mobilized data sometimes goes down before
going up again. Why might that be?
This is due to the removal of datasets from GBIF. This might occur if a publisher wishes to
remove their data, but it is more often due to the removal of datasets that were inadvertently
published twice (duplicate datasets).
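The dips described above can be located by comparing successive snapshot totals. The following is a minimal Python sketch, not part of the GBIF scripts; the snapshot labels and counts are invented for illustration.

```python
def find_decreases(series):
    """Return (previous_label, label, drop) for each step where the count fell.

    `series` is a list of (label, count) pairs in chronological order.
    """
    drops = []
    for (prev_label, prev_count), (label, count) in zip(series, series[1:]):
        if count < prev_count:
            drops.append((prev_label, label, prev_count - count))
    return drops


# Hypothetical snapshot totals (invented numbers, not GBIF figures).
snapshots = [
    ("2012-Q1", 1000),
    ("2012-Q2", 1200),
    ("2012-Q3", 1150),  # e.g. a duplicate dataset was removed
    ("2012-Q4", 1400),
]
decreases = find_decreases(snapshots)
```

Each reported step is only a starting point; finding out which datasets were removed requires looking at the underlying records.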
I can see strange peaks in the charts showing trends in the temporality of the data.
What might be the cause of this?
The charts may reveal patterns that represent biases in data collection (seasonality, public
holidays) or potential issues in data management (disproportionate numbers of records shown for
the first or last days in the year or each month or week). Such issues may arise at various
stages in data processing and require further investigation.
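One way to look for the first-day-of-month pattern mentioned above is to compare the share of records dated on the 1st with what an even spread across the year would give. This is a rough Python sketch under stated assumptions: the sample dates are invented, and the factor of 5 used as a threshold is an arbitrary choice for illustration, not a GBIF rule.

```python
from datetime import date


def first_day_share(dates):
    """Return the fraction of dates that fall on the first day of a month."""
    if not dates:
        return 0.0
    first = sum(1 for d in dates if d.day == 1)
    return first / len(dates)


# Hypothetical sample: records with an unknown day are often stored as the 1st.
sample = [date(2010, 1, 1)] * 6 + [
    date(2010, 1, 15),
    date(2010, 2, 20),
    date(2010, 3, 1),
    date(2010, 4, 9),
]
share = first_day_share(sample)

# With an even spread, about 12 days in 365 would be the first of a month.
EXPECTED_SHARE = 12 / 365
SUSPICION_FACTOR = 5  # arbitrary threshold, an assumption for this sketch
suspicious = share > SUSPICION_FACTOR * EXPECTED_SHARE
```

A high share does not prove a data management problem, since real collecting events can cluster on particular days, so flagged datasets still need further investigation.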
I have suggestions to improve the clarity of the charts included here - what should I do?
Please use the feedback button on the side of the page to log any suggestions.
Why are these charts presented as static images and not something more dynamic?
This is a first iteration of the work. Future versions could be more interactive, although one has
to consider whether a
PDF view or simple images for (e.g.) annual reports are required. As an open project, anyone
with interest in improving
the data visualization is welcome to get involved. Please contact
How did you select the colours used in these charts and can we improve them?
The colour palettes come from colorbrewer2.org and an attempt was made to select colours that
would be colour-blind safe. It is difficult to find suitable colour palettes that work on all
charts (e.g. global and country specific) and input would be greatly appreciated to help improve
these.
Which technologies were involved in this work?
The original unprocessed data resides in Hadoop. Hive handles the SQL processing of the Hadoop
data, using custom UDFs that wrap the GBIF core processing libraries (Java), and digests the data
into CSV tables. All other processing is done in R.
How to get involved
What can I do to improve the completeness of records available through GBIF?
A complete record is here defined as having species identification, valid coordinates and the
full date of collection or observation. The charts show that some records published to GBIF are
incomplete. There can be different reasons for this, which include deliberately excluding
coordinates for sensitive data, or the full date of collection not being available for some
historic collections. However, for many datasets, the completeness of records could be improved
by working with the data publisher concerned. All GBIF Nodes are encouraged to consider how they
can work with the data publishers in their networks to improve the completeness of the records,
which will contribute to making these data fit for a broader range of uses.
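The definition of a complete record given above can be expressed as a simple check. The following Python sketch is illustrative only: the field names (`species`, `latitude`, `longitude`, `year`, `month`, `day`) are hypothetical, and "valid coordinates" is reduced here to a numeric range test, which is an assumption and simpler than real coordinate validation.

```python
from datetime import date


def is_complete(record):
    """True if the record has a species, in-range coordinates and a full date."""
    species = record.get("species")
    has_species = bool(species and species.strip())

    lat, lon = record.get("latitude"), record.get("longitude")
    has_coords = (
        isinstance(lat, (int, float))
        and isinstance(lon, (int, float))
        and -90 <= lat <= 90
        and -180 <= lon <= 180
    )

    try:
        # date() rejects missing parts (TypeError) and impossible dates (ValueError).
        date(record.get("year"), record.get("month"), record.get("day"))
        has_full_date = True
    except (TypeError, ValueError):
        has_full_date = False

    return has_species and has_coords and has_full_date


# Hypothetical records, invented for illustration.
records = [
    {"species": "Puma concolor", "latitude": 10.5, "longitude": -84.1,
     "year": 2012, "month": 6, "day": 3},
    {"species": "Puma concolor", "latitude": None, "longitude": None,
     "year": 2012, "month": 6, "day": 3},   # coordinates withheld
    {"species": "Quercus robur", "latitude": 55.7, "longitude": 12.6,
     "year": 1890, "month": None, "day": None},  # historic, year only
]
complete_share = sum(is_complete(r) for r in records) / len(records)
```

Running a check like this per dataset shows which publishers' records most often lack coordinates or full dates, which is where follow-up is likely to help most.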
I have suggestions for other interesting charts that I would like to see on GBIF.org.
Can I request more charts?
In future GBIF work programmes, it may be possible to extend this work further to include other
interesting trends around data mobilization in GBIF.
Please use the feedback button to provide any additional ideas or comments on the current
charts, or consider contributing to the project.
What would it take for me to produce these charts myself in a different style or
The scripts used for this work are maintained in the GitHub project site.
GBIF can provide the underlying digested data as a collection of CSV files, which can be used in
various applications to produce the charts. For those wishing to do far more detailed analysis
than GBIF is able to do globally, the processed source records can be provided for subsets of the
data (e.g. all records for Spain). Please note that the Secretariat has limited resources but
will do all it can to support others wishing to further the analysis. Please also note that the
volumes of data can be very large: the data covers approximately 8 billion records (May 2014).
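Because the volumes can be large, it helps to read the CSV files row by row rather than loading them whole. The sketch below shows the idea in Python using only the standard library. The column names (`snapshot`, `country`, `count`) and the sample rows are assumptions made for illustration and do not describe the actual layout of the files GBIF provides.

```python
import csv
import io
from collections import defaultdict


def totals_by_snapshot(lines):
    """Sum the `count` column per `snapshot`, streaming one row at a time."""
    totals = defaultdict(int)
    for row in csv.DictReader(lines):
        totals[row["snapshot"]] += int(row["count"])
    return dict(totals)


# Hypothetical digest content; a real file would be opened with open(path, newline="").
sample_csv = io.StringIO(
    "snapshot,country,count\n"
    "2013-12-31,ES,100\n"
    "2013-12-31,DK,50\n"
    "2014-03-31,ES,120\n"
)
totals = totals_by_snapshot(sample_csv)
```

The resulting totals can be passed to any charting tool, so the charts can be restyled without rerunning the earlier processing stages.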
How do I provide feedback?
Please use the feedback button on the side of each page or contact
us by mail.