This marks a key milestone in GBIF’s implementation of machine-readable licensing, which began in April 2014 with a community consultation, followed by painstaking communications with all data publishers, led by GBIF’s national nodes.
In the next four to six weeks, GBIF.org will release a feature enabling users to search and filter occurrences by licence type, the final step in the implementation.
Following a 2014 decision by GBIF’s governing board, all occurrence datasets are now assigned one of the following three licences from Creative Commons:
- CC0, under which data are made available for any use without restriction or particular requirements on the part of users
- CC BY, under which data are made available for any use provided that attribution is appropriately given for the sources of data used
- CC BY-NC, under which data are made available for any use provided that attribution is appropriately given and provided the use is not for commercial purposes
CC BY-NC licences have a significant effect on the reusability of data, and data publishers have been encouraged to choose the most open option they can.
The vast majority of publishers (83 per cent) have selected CC BY licences for their datasets, with CC0 licences accounting for another 5 per cent. In terms of individual records, more than 82 per cent are under either CC0 or CC BY, which place no restriction on users other than to give appropriate attribution, and GBIF’s support for digital object identifiers (DOIs) makes attribution simple and straightforward. In addition, more than 50 per cent of all records shared through GBIF now carry the CC0 designation, which waives all copyright claims and places them permanently in the public domain.
Impact on data totals
The improvements gained through these changes do, however, come with costs. A few data publishers felt unable to assign the accepted licences to their datasets, so users will lose access to some previously available datasets. In other instances, the top-to-bottom review of datasets identified some legacy versions that duplicated records now available in other, more current forms. Overall, the total data loss amounts to 456 datasets containing 48.7 million occurrences, or about 7.5 per cent of the total number of records.
“When we started this process, some predicted massive data loss, but these results inspire confidence that the efforts across the community, while lengthy and sometimes difficult, have been worthwhile,” said Donald Hobern, director of the GBIF Secretariat. “Given that the vast majority of publishers in our network clearly recognize the value of sharing their data under standardized licences, we can move forward and encourage re-use of open data for public benefit more rationally and consistently.”
While the total number of occurrences available through GBIF.org will dip to just over 600 million records, we expect that ongoing data mobilization and publication will compensate for the decrease over a relatively short period. As GBIF’s analyses of global trends show, the number of records published through the GBIF network continues to grow.
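The totals quoted above can be cross-checked with a little arithmetic. The sketch below is only a rough consistency check, not an official GBIF calculation: it assumes the 7.5 per cent share is a rounded figure, and the variable names are illustrative.

```python
# Figures quoted in the article.
removed_records = 48.7e6   # occurrences removed
removed_share = 0.075      # "about 7.5 per cent" of all records (rounded)

# Implied totals; approximate because the quoted share is rounded.
total_before = removed_records / removed_share
total_after = total_before - removed_records

print(f"Implied total before removal: {total_before / 1e6:.0f} million")
print(f"Implied total after removal:  {total_after / 1e6:.0f} million")
```

Under these assumptions, the implied total before removal is roughly 649 million records, which leaves a little over 600 million afterwards, in line with the figure given in the text.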
The largest national impact falls on the United Kingdom, where GBIF will remove 329 datasets comprising 27.3 million occurrences, which represent 72 per cent of the records currently published through the UK National Biodiversity Network (NBN). However, both GBIF and NBN anticipate that many if not most of these will return, once NBN completes discussions on data licensing with all of its data publishers in March 2017.
Following the review that NBN launched in 2014, its data partners affirmed their commitment to open data and continue to make steady progress toward fully compatible machine-readable licences. NBN has also secured permission to publish 120 new datasets not previously shared through GBIF, which will join the 149 datasets that remain available after being re-licensed by current UK data partners.