Story: An ID Tag for Biodiversity Information Objects


Two workshops organised by the Taxonomic Databases Working Group and GBIF have adopted a system of globally unique identifiers - called LSIDs - that will be implemented within the GBIF information architecture in conjunction with many other partners.
Released on: 27 June 2006
Contributor: Not applicable
Language: English
Spatial coverage: Not applicable
Keywords:
Source of information: Lee Belbin, Manager, TDWG Infrastructure Project
Concerned URL:

Put ten taxonomists in a room to describe a new species and you will probably get more than ten descriptions and names. Biology is complex. Imagine then what it might be like to try to trace changes in the naming and misnaming of a species through time. Identifying the status of a species name can be a frustrating exercise even for experts.

Consider then the millions of observations of hundreds of thousands of species stored in thousands of databases around the world. If we are to conserve the planet's biodiversity, we depend on such data to make quality management decisions. How can informed decisions be based on data where the same species is given five different names across different databases, or one name refers to five different species?

An initial meeting of 30 international experts in bioinformatics and computing was hosted in February 2006 by the USA's National Evolutionary Synthesis Center (NESCent) to address this problem. The workshop was organized by the Taxonomic Databases Working Group (TDWG) of the International Union of Biological Sciences and the Global Biodiversity Information Facility (GBIF). It is GBIF's role to provide open access to the world's biological data, and it is TDWG's role to provide the standards that make that possible.

The workshop sought a system of identifiers for data records that relate, for example, to the naming or occurrence of organisms. Each identifier needs to be "globally unique": no two data records, in any database anywhere, may share one. Such identifiers are therefore called globally unique identifiers, or GUIDs.

By using GUIDs, the current ambiguity in databases could be greatly reduced over time. A well-designed GUID system could also provide the basis for valuable additional services to those seeking biological information.

The meeting (see a full report here) adopted a GUID technology known as Life Sciences Identifiers (LSIDs). The LSID specification was developed through the Object Management Group (OMG), an open-membership, not-for-profit consortium that produces and maintains computer industry specifications that enable data integration. If widely adopted, the LSID protocol will enable scientists and researchers across multiple organizations to share data and collaborate in ways never before considered.

LSIDs serve as uniform resource tags that uniquely identify a given data object on the Internet. LSIDs provide standard mechanisms for accessing data and metadata (descriptive information) for the objects they identify. Experts say that they will form anchor-points for a range of layered information services relating to these objects, such as access to further data resources in a wide range of formats. LSIDs can in this way serve as a stepping-stone to integrating biodiversity data.

For example, an LSID issuing authority such as a museum could generate an LSID for a type specimen (the specimen that a species is named from) and link services that enable a scientist on the Internet to see images of the specimen, a set of keys to identifying the species within the family it belongs to, and observations in space and time of that species.

An LSID consists of six colon-separated components, for example:

    urn:lsid:ncbi.nlm.nih.gov:GenBank:T48601:2
where "urn" and "lsid" form a mandatory prefix for every LSID; "ncbi.nlm.nih.gov" is the identifier of the organization that assigned the LSID to the data; "GenBank" identifies a class of data objects offered by the organization; "T48601" is the name of the data object; and "2" is an optional version number.
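
The structure described above can be illustrated with a short parsing sketch in Python. It is not taken from the workshop reports or from any LSID toolkit: the names `LSID` and `parse_lsid` are hypothetical, and the case-insensitive handling of the "urn:lsid" prefix is an assumption about the specification rather than something stated in this story.

```python
from typing import NamedTuple, Optional


class LSID(NamedTuple):
    """The variable parts of an LSID (the fixed "urn:lsid" prefix is dropped)."""
    authority: str                  # organization that assigned the LSID
    namespace: str                  # class of data objects offered by that organization
    object_id: str                  # name of the data object
    revision: Optional[str] = None  # optional version number


def parse_lsid(text: str) -> LSID:
    """Split an LSID string into its components, or raise ValueError."""
    parts = text.strip().split(":")
    # Five components without a version number, six with one.
    if len(parts) not in (5, 6):
        raise ValueError(f"expected 5 or 6 colon-separated components: {text!r}")
    # Assumption: the mandatory prefix is matched case-insensitively.
    if parts[0].lower() != "urn" or parts[1].lower() != "lsid":
        raise ValueError(f"missing 'urn:lsid' prefix: {text!r}")
    authority, namespace, object_id = parts[2:5]
    if not (authority and namespace and object_id):
        raise ValueError(f"empty component in: {text!r}")
    # Treat an absent or empty sixth component as "no version given".
    revision = parts[5] if len(parts) == 6 and parts[5] else None
    return LSID(authority, namespace, object_id, revision)
```

Applied to the example above, `parse_lsid("urn:lsid:ncbi.nlm.nih.gov:GenBank:T48601:2")` yields the authority "ncbi.nlm.nih.gov", the namespace "GenBank", the object name "T48601" and the version "2". The sketch only checks the shape of the identifier; it does not resolve the LSID to data or metadata.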

IBM has developed tools that help providers serve LSIDs on the Internet and help clients resolve those LSIDs to access resources (see www-128.ibm.com/developerworks/webservices/library/os-lsid2/#resources).

The Second International Workshop on Globally Unique Identifiers for Biodiversity Informatics (GUID-2) was hosted in June 2006 by the e-Science Institute in Edinburgh, Scotland. The full report of that meeting can be found here. Importantly, the meeting affirmed the choice of LSIDs, then made technical recommendations and agreed on the other actions needed to begin implementing an Internet-wide system of LSIDs for biodiversity informatics.

At this second workshop, it was recommended that TDWG and GBIF:
  • Work first with the nomenclators, particularly IPNI, Index Fungorum and ZooBank, to associate LSIDs with these reference lists of scientific names, recognising their foundational place in biodiversity informatics.
  • Update software tools for LSID resolution and develop packaged installers.
  • Support integration of LSID functionality into TDWG data provider packages.
  • Develop documentation and publicity materials for managers, biologists and IT specialists.
  • Approach external software projects to encourage the support of LSIDs.
Tasks were assigned to individuals and working groups, with the short-term goal of reporting on progress to the TDWG annual meeting in October 2006.

All concerned recognize that widespread partnerships with other communities (especially digital libraries) are critical both to the adoption of LSIDs and to their effective use in improving the delivery of data and information. These communities are being involved at every possible step in this process of development.

Please note that this story expired on 2006/09/15
