This page details the main updates to GBIF.org and related infrastructure. Further details are found in the GitHub repositories.
26 October 2023
- The rewrite of the workflows that generate the precalculated maps on GBIF.org was deployed.
20 September 2023
- GBIF.org now processes `taxonConceptID` for configured identifier schemes; initially WoRMS LSID. The details of this are discussed in this issue.
28 August 2023
- New GBIF backbone taxonomy, with three new sources and other updates. Name matching to species aggregates has been improved. Refer to the backbone build log for additional details.
- The field `identifier` has been removed from interpreted data and downloads, as it had been introduced unintentionally and contained only internal identifiers.
- 43 unused Dublin Core terms — which have never been used in Darwin Core, and were always empty — have also been removed from new Darwin Core archive downloads.
- Searching using `gbif_id` is now supported in the API, website and downloads.
24 August 2023
Two new reference databases added to the Sequence ID tool
- 12S: MitoFish - Mitochondrial Genome Database of Fish
- 18S: PR2 18S rRNA database
12 January 2023
- Continent interpretation now considers occurrence coordinates. All georeferenced, terrestrial records now have a continent value, and issues are applied where the publisher's value is unexpected.
- A new field `distanceFromCentroidInMeters` is present for occurrences within 5000 m of a known country centroid. Particularly for preserved specimens, this can highlight imprecise georeferencing. See the data blog for the motivation for this field.
- Coordinate uncertainty, where provided by the publisher, is now taken into account when verifying the country/countryCode values.
- A new check prevents derived datasets from being created without a related dataset.
- The Camera Trap Data Package (Camtrap DP) format is now supported.
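The `distanceFromCentroidInMeters` entry above can be illustrated with a minimal sketch. It assumes a haversine great-circle distance and a single centroid per country; the function names, the Earth radius constant and the centroid coordinates are hypothetical, and the actual GBIF pipeline may compute the value differently.

```python
import math

EARTH_RADIUS_M = 6_371_000  # mean Earth radius in metres (approximation)


def haversine_m(lat1, lon1, lat2, lon2):
    """Great-circle distance in metres between two points in decimal degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlam = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlam / 2) ** 2
    return 2 * EARTH_RADIUS_M * math.asin(math.sqrt(a))


def distance_from_centroid(lat, lon, centroid, threshold_m=5000):
    """Return the distance to the centroid if within threshold_m, else None.

    `centroid` is a hypothetical (lat, lon) tuple; the 5000 m default mirrors
    the threshold mentioned in the release note.
    """
    distance = haversine_m(lat, lon, centroid[0], centroid[1])
    return distance if distance <= threshold_m else None
```

A record sitting exactly on the centroid yields `0.0`, which is the case most likely to indicate a coordinate that was assigned from a country name rather than measured.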
26 August 2022
- Mechanisms deployed to detect large-scale changes in occurrence record IDs before indexing. When such changes are detected, data managers can intervene and confirm or correct mistakes to better ensure GBIF ID stability.
23 June 2022
- Clustering rules relaxed for iBOL and EMBL (INSDC) datasets to accommodate sparser data. When a record comes from one of these sources, it is now sufficient for the accepted scientific name and an identifier to overlap to connect the records.
22 June 2022
- Occurrence search results can now be shown in 4 map projections
19 May 2022
- GBIF data now on Google BigQuery as a public table, updated monthly
2 May 2022
- 16S (Bacteria and Archaea) Genome Taxonomy Database r207
- COI (Animalia) database updated to International Barcode of Life v2022-02-22
28 April 2022
Fields that can contain multiple values can now be searched using individual values (including `samplingProtocol`). For example, searching for records collected by "L. Richardson" now also returns records where that person was part of a group making the observation. (pipelines/665 and pipelines/283)
Occurrences are now searchable using the Darwin Core `datasetName` and `datasetID` fields. This search respects what the record states the value is, allowing different values within a dataset registered on GBIF to support search within aggregated datasets. (pipelines/662)
`datasetName` is now included in the occurrence download (pipelines/270)
Occurrence records are now searchable using the Darwin Core term `otherCatalogNumbers` in the API (pipelines/664)
The `preparations` field is now correctly populated (pipelines/667)
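The multi-value search described in this release can be pictured with a small sketch. It assumes values are delimited with a pipe character, as Darwin Core recommends for lists, and that matching is case-insensitive; both are assumptions for illustration, and the helper names are hypothetical rather than part of the GBIF pipelines.

```python
def split_values(raw, separator="|"):
    """Split a delimited field into trimmed, non-empty individual values."""
    return [value.strip() for value in raw.split(separator) if value.strip()]


def matches_individual_value(raw, wanted):
    """True if `wanted` equals one of the individual values, ignoring case."""
    return wanted.casefold() in (value.casefold() for value in split_values(raw))
```

With this approach a record whose collector field reads `L. Richardson | J. Smith` matches a search for `L. Richardson`, while a partial string such as `L. Richards` does not.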
11 March 2022
- Rules for clustering tightened, to avoid over-eagerly clustering records that have the same species and catalogue number but nothing else to support the link
1 March 2022
- IPT 2.5.7 released, fixing 3 bugs and adding 2 minor improvements
28 February 2022
- Datasets can now declare which country should be attributed as publishing the data on a record-by-record basis. Previously only eBird had this capability; now other datasets, such as iNaturalist, can use it
7 February 2022
- Changes to the map server have been deployed, to provide higher-resolution maps to all hosted portals, such as on GBIF.us
17 February 2022
- Changes to the content model deployed allowing the communications team more control of the GBIF.org homepage. Deployed with first changes to the styling.
3 February 2022
- The data validator has been updated to be consistent with GBIF.org indexing. Within the tool, logged in users can now find their historical validation reports
31 January 2022
- The occurrence index has been updated to support the latest version of Darwin Core, released last year.
- A new basis of record, Material Citation, replaces the obsolete Literature basis of record. Additionally, the "Unknown" basis of record will no longer be used; instead, records will be shown with an "Occurrence" basis of record.
- The existing term Establishment Means now has a vocabulary in Darwin Core. This replaces the GBIF enumeration used for this term.
- New terms Degree of Establishment and Pathway are now available, and have their own vocabularies.
- The new terms Vertical Datum, Verbatim Identification, Subfamily, Infrageneric Epithet and Cultivar Epithet may be used on occurrences, although the taxonomic terms are not yet supported by the GBIF Backbone Taxonomy.
29 January 2022
- Changes applied to data ingestion that abort the process if more than 5% of record IDs are seen to change, allowing data managers to verify before proceeding.
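A minimal sketch of such a safeguard follows. It assumes that a "changed" ID is one that was previously indexed but is absent from the newly crawled data; how GBIF actually counts changes is not stated here, and the function names are hypothetical.

```python
def changed_id_fraction(previous_ids, new_ids):
    """Fraction of previously indexed record IDs missing from the new crawl."""
    previous = set(previous_ids)
    if not previous:
        return 0.0  # nothing indexed before, so nothing can have changed
    missing = previous - set(new_ids)
    return len(missing) / len(previous)


def should_abort(previous_ids, new_ids, threshold=0.05):
    """True when more than `threshold` of the known IDs would disappear."""
    return changed_id_fraction(previous_ids, new_ids) > threshold
```

The comparison is strict, so a dataset in which exactly 5% of IDs change would still be ingested under this sketch.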
13 January 2022
- GRSciColl now supports the ability to select a GBIF dataset or publishing organisation as the "master" source of information for a GRSciColl collection or institution record. Changes made in the organisation's registration or dataset metadata will automatically be reflected on the GRSciColl entity.
3 December 2021
A new backbone is live, with a new WCVP Fabaceae source and additional Plazi publications. For further details, see the build log
28 October 2021
- New filters and facets for the literature service (gbifTaxonKey, gbifOccurrenceKey, gbifHigherTaxonKey, citationType)
- New geo distance filter/predicate for occurrence search and download (geoDistance)
- New model for GRSciColl contacts that replaces the current staff members (#379)
- Number of specimens in institutions made optional (#389)
- Taxonomic coverage added to the collections search (#390)
- Lookup now accepts alternative codes + ID matches as exact (#381)
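As an illustration of the `geoDistance` filter listed above, the sketch below builds an occurrence search URL. The `/occurrence/search` path and the `latitude,longitude,distance` value format (for example `5km`) are assumptions that should be checked against the GBIF API documentation; the function name is hypothetical.

```python
from urllib.parse import urlencode

# Assumed search endpoint; verify against the GBIF API documentation.
SEARCH_URL = "https://api.gbif.org/v1/occurrence/search"


def geo_distance_search_url(lat, lon, distance, limit=20):
    """Build a search URL for records within `distance` of a point.

    `distance` is a string with a unit, e.g. "5km" (assumed format).
    """
    params = {"geoDistance": f"{lat},{lon},{distance}", "limit": limit}
    return f"{SEARCH_URL}?{urlencode(params)}"
```

`urlencode` percent-encodes the commas in the parameter value, which a server decodes back to the original string.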
17 September 2021
New backbone live
- Improved parsing and matching of operational taxonomic units (OTUs) from Genome Taxonomy Database (GTDB)
- Update of Systema Dipterorum
- First update of the PaleoBiology Database (PaleoBioDB) and Index Fungorum (via Species Fungorum for CoL+) in several years
- Addition of United Kingdom Species Inventory (UKSI)
- Addition of three national checklists (plants, birds, legal) from Colombia
- Updated Fabaceae taxonomy via RBG Kew's World Checklist of Vascular Plants
- Even more updates than usual from Plazi
- Resolution of issues from user feedback in the GitHub project's "Done" column
Refer to the backbone build log for additional details.
31 August 2021
Integrated Publishing Toolkit (IPT)
- A fresher-looking user interface, which should still be familiar to existing users
- The user manual has been converted from the GitHub Wiki to AsciiDoctor/Antora
- Source data files can now be downloaded by a resource manager
- Auto-publishing can now be set to specific, future dates
- Archive mode can be limited to a set number of old archives to retain
- A new health/troubleshooting page reports common system problems, like running out of disk space or incorrect filesystem permissions
- The administration contact (for forgotten passwords) is now configurable
- Database (JDBC) drivers have been updated
- A URL can now be used as a data source
2 July 2021
- Occurrence records with an IIIF manifest given in the Audubon Core extension or Dynamic Properties now display a draggable IIIF icon with a link to a viewer (example)
11 June 2021
- New service in the GRSciColl API to suggest changes such as creating new entities or modifying the existing ones. Available in the registry UI
- The IH synchronization now uses machine tags instead of identifiers. This allows an entity to be disconnected from IH while keeping the IH identifier.
- New audit log in GRSciColl to track all the changes done in the catalogue: https://api.gbif.org/v1/grscicoll/auditLog
- New permissions model for GRSciColl that includes country scopes, namespace rights and a new Mediator role #310
- The filter by code in the GRSciColl institutions and collections is now case insensitive: https://api.gbif.org/v1/grscicoll/collection?code=naic
- Now possible to filter GRSciColl staff by identifiers and machine tags, e.g.: https://api.gbif.org/v1/grscicoll/person?identifierType=IH_IRN
- New service to download GRSciColl institutions and collections in CSV or TSV format, e.g.: https://api.gbif.org/v1/grscicoll/institution/export?active=true
- More fields to filter searches of GRSciColl institutions and collections #357 #269
31 May 2021
- Dataset search API supports filters and facets by networkKey, hostingCountry and endorsingNodeKey
Dataset export services
- Search export service: accepts the same parameters as the search service, but the result is exported to a TSV or CSV file. Facets and paging parameters are ignored, e.g. https://api.gbif.org/v1/dataset/search/export?q=inaturalist (also available in UI)
- Dataset occurrence download usages, exports datasets used in a download into tsv or csv formats, e.g. https://api.gbif.org/v1/occurrence/download/0220580-200613084148143/datasets/export?format=TSV (also available in UI)
- New download statistics api, accepts the same filters as the downloadsByDataset service, e.g. https://api.gbif.org/v1/occurrence/download/statistics?datasetKey=0001480b-76ca-4f30-86bc-f4292481554b
- Export service for download statistics: accepts the same filters as download/statistics and exports the results in CSV or TSV format, e.g. http://api.gbif.org/v1/occurrence/download/statistics/export?datasetKey=0001480b-76ca-4f30-86bc-f4292481554b
- The latest release of the ChronometricAge Extension is now supported, and datasets using it can now be filtered using the occurrence DWCA_EXTENSION search filter.
- New occurrence lookup service, occurrence records can now be looked up by using: datasetKey/occurrenceId, e.g. https://api.gbif.org/v1/occurrence/0001480b-76ca-4f30-86bc-f4292481554b/651D49B2-FF77-7F3F-E053-2614A8C050DE and in UI https://www.gbif.org/occurrence/0001480b-76ca-4f30-86bc-f4292481554b/651D49B2-FF77-7F3F-E053-2614A8C050DE
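The URL patterns in this release can be assembled programmatically. The sketch below follows the example URLs given above for the occurrence lookup and download statistics services; the helper names are hypothetical, and percent-encoding of identifiers containing reserved characters is an assumption about how the API expects them.

```python
from urllib.parse import quote, urlencode

API_BASE = "https://api.gbif.org/v1"


def occurrence_lookup_url(dataset_key, occurrence_id):
    """URL for looking up a record by datasetKey/occurrenceId."""
    # Encoding reserved characters in identifiers is an assumption.
    return (
        f"{API_BASE}/occurrence/"
        f"{quote(dataset_key, safe='')}/{quote(occurrence_id, safe='')}"
    )


def download_statistics_url(dataset_key):
    """URL for download statistics filtered by dataset."""
    return f"{API_BASE}/occurrence/download/statistics?{urlencode({'datasetKey': dataset_key})}"
```

Called with the dataset key and occurrence ID from the examples above, both helpers reproduce the URLs shown in the release notes.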
21 May 2021
- Search occurrences using modification date stated by publisher #219
- Download filters support search “field has a value” using the
- Registry console supports user filtering by roles and editor scopes #330
- API response for dataset citation now includes authors as objects if they are also contacts, and an indication of whether the citation was provided or generated #351
- Dataset search API supports filters and facets by installationKey and endpointType #148
- Creating a network constituent for a non-existent network no longer throws an error #349
- Network suggest no longer includes deleted entities #308
- Consistent behaviour on GBIF.org and Registry management console for publisher search #198
17 May 2021
- First GBIF Parquet export added to the Amazon Public Data Catalog, with data available on 5 continents
5 May 2021
API and processing
- Ability to search for datasets by the network they belong to (e.g. OBIS)
- Network facets added in the dataset search API (e.g. http://api.gbif.org/v1/dataset/search?facet=networkKey&limit=0)
- Events added that trigger occurrence dataset reprocessing when a dataset's network membership changes
20 April 2021
- New Parquet download format added to the API
- First GBIF Parquet export added to the Microsoft Planetary Computer data catalogue.
22 March 2021
- Classification of Bacteria and Archaea by 16S sequences matched against the Genome Taxonomy Database r95
- ITS (Fungi) database updated to UNITE v8.2
- COI (Animalia) database updated to International Barcode of Life v2021-02-08
11 March 2021
New backbone live
- Data source replacements, primarily for Fabaceae family and the prokaryotic kingdoms Bacteria and Archaea
- Improvements for stable identifiers, especially relating to OTUs
- Algorithm improvements (misplaced taxa)
- Removal of names / terms on a denylist
- Please refer to the backbone build log for additional details
23 February 2021
- Support for registering dataset endpoints in Catalogue of Life Data Package format
- Flagging of potential duplicates added to assist editors in deduplicating entries in the GRSciColl catalogue, e.g. reuse of the code PCU
- Ability to restrict permissions for GRSciColl editors to institution or collection, allowing more people to participate
- Schema.org metadata tags revised on the dataset and taxon pages to improve search engine discoverability
11 February 2021
- Quarterly trends now include summaries by GBIF Region (e.g., Latin America and the Caribbean)
26 January 2021
- Improvements to the handling of networks (groupings of datasets)
- Support for DOIs for ad hoc data exports by GBIFS staff (example https://doi.org/10.15468/dd.jskxae)
- This service is a precursor for GBIF to offer public datasets in cloud environments
- Bug fix for BioCASe protocol metadata synchronisation
- Added the literature vocabularies type, topic and relevance to the API to support analyses by external data scientists
- Added an experimental API categorisation of the griddedness of datasets (e.g. this example)
- Based on exploratory work documented in this blog post
- Added capability to associate ROR and GRID ids to organisations in the GBIF registry
17 December 2020
- Search capability to find records that participate in a cluster, e.g. 9M specimen-related occurrences that cluster
- Search for records that have content in any Darwin Core Archive extension. For example, records with the OBIS Extended Measurements and Facts
- A dashboard (metrics) is added to the institution (e.g. Kew Gardens) and collection (e.g. SAIAB Algae) pages summarizing the digitized occurrence records. Note that records may come from multiple datasets
- Improvements to date interpretation, including the ability to disambiguate date formats (dd/mm/yyyy vs MM/dd/yyyy) using the GBIF Registry and machine tags
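The date disambiguation mentioned above can be sketched as follows. The boolean `day_first` hint stands in for whatever the GBIF Registry and machine tags supply; it is a hypothetical simplification, and the real interpretation logic handles many more formats.

```python
from datetime import date


def parse_ambiguous_date(text, day_first=None):
    """Parse 'a/b/yyyy', using a per-dataset hint only when it is needed.

    Returns None when the order cannot be determined and no hint is given.
    Raises ValueError for impossible dates.
    """
    first, second, year = (int(part) for part in text.split("/"))
    if first > 12:
        day, month = first, second      # first part can only be a day
    elif second > 12:
        month, day = first, second      # second part can only be a day
    elif first == second:
        day = month = first             # order does not matter
    elif day_first is None:
        return None                     # genuinely ambiguous without a hint
    elif day_first:
        day, month = first, second      # dd/mm/yyyy
    else:
        month, day = first, second      # MM/dd/yyyy
    return date(year, month, day)
```

Only values where both parts are 12 or less and differ from each other need the hint; everything else resolves on its own.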
15 December 2020
- Search for occurrence records by hosting organization e.g. map of records hosted by GBIF France or through the API
- Search for records by life stage added, such as images of records in nymph stage. Interpretation of this content is backed by the vocabulary server that is part of the registry. GBIF intend to open up vocabularies for collaborative editing when ready, and are working with the TDWG Data Quality Group on this topic.
14 December 2020
- API deployed to support Literature search by DOI. This API is documented in GitHub but documentation will be moved to the GBIF API documentation shortly
8 December 2020
- The new Catalogue of Life website is live. This is the first deployment that is powered by GBIF and hosted on GBIF infrastructure. In addition to the public website are the common repository known as the checklistbank, and a new API which is supported in the rOpenSci client.
2 December 2020
- Extension data now shown on all occurrence pages e.g. measurements example
- Specimen-related occurrence records now link to the collection catalogue entries in addition to the dataset they originate from, e.g. this record from SAIAB. Matching uses a variety of fields including `institutionID`. See the FAQ on how to improve matching
- New API to improve searching against the Collection Catalogue, e.g. searching for "K"
- Elasticsearch updated to version 7.10.0