Big data for biodiversity: GBIF surpasses 1 billion species occurrences

Milestone represents a collective effort to share evidence of our planet’s biodiversity by more than 1,200 institutions in 123 countries—and more than 1 million researchers and citizen scientists worldwide

Frilled anemone (Metridium dianthus). Photo by B. Guichard, Agence Française pour la Biodiversité, licensed under CC BY-NC-SA 4.0.

GBIF topped 1 billion species occurrence records on 4 July 2018, thanks to a surge of datasets from the French National Inventory of Natural Heritage (l'Inventaire National du Patrimoine Naturel), which included this observation of a frilled anemone (Metridium dianthus) off Saint-Pierre and Miquelon, a French archipelago in the northwestern Atlantic. The record for this marine invertebrate is one of 150,097 gathered through BioObs (Base pour l’Inventaire des Observations Subaquatiques), a citizen science tool that enables divers to learn about the marine environment while contributing to a national inventory of underwater species.

The milestone symbolizes a major collective achievement, one made possible through the work of the GBIF network, a diverse partnership of more than 1,200 public and private organizations from 123 countries. GBIF’s global index and research infrastructure provides anyone, anywhere, with instant access to free and open data about where and when life forms occur on Earth.


In recent years, the long-term contributions of biologists and field researchers, IT professionals, collections curators, biodiversity informaticians and data scientists have received a boost through the participation of more than 1 million individuals, whose observations are shared through the recording societies and citizen science projects in which they participate.

The continuing growth in the number of occurrence records also shows a steady rise in the coverage and diversity of species represented in GBIF, a trend that reflects the network’s increasing emphasis on filling known taxonomic, geographical and temporal gaps. As of April 2018, GBIF has at least one record for 1,049,839 species, representing 62% of those reviewed in the most recent Catalogue of Life checklist.

An unequalled resource for research

GBIF’s global occurrence index provides an unequalled evidence base for informing scientific research and policy through its support of ‘big data’ analyses. On average, nearly two peer-reviewed research papers appear each day that rely on data accessed through GBIF, for example, to illuminate our planet’s evolutionary history or to generate models that seek to understand the impact of rapidly changing conditions for life on Earth. The findings are not limited to research and management topics around species conservation and protected areas or risks from alien and invasive species. They also explore how we can improve food security by conserving the wild plants that are related to important crops, where to target monitoring of human diseases given changes in the distribution of the animals that carry them, and why the benefits and services that nature provides our communities hinge on biodiversity.

“Investments by several dozen national governments over the past 15 years have enabled the GBIF network to produce a high-performance platform for sharing biodiversity data freely, publicly and openly,” said Dr Tanya Abrahamse, founding CEO of SANBI, the South African National Biodiversity Institute, and current chair of the GBIF Governing Board. “But equally importantly, the infrastructure comes with a highly effective and worldwide community of practice. The individuals participating in GBIF’s intercontinental collaborations generously transfer their skills and knowledge to enable ever-wider access to data relating to life on Earth.”

Recent and ongoing improvements to the network’s underlying technology platforms have produced a high-volume, near-real-time infrastructure prepared to build on GBIF’s rapid recent growth and to deliver even greater and richer quantities of biodiversity information in the years to come. These enhancements proved timely, given that users of GBIF downloaded more than 845 billion records in 2017, a 200% increase in the data delivered the previous year. GBIF fully expects this total to cross the 1 trillion mark in 2018, another symbol of the infrastructure’s maturity.

GBIF’s global infrastructure yields researchers valuable savings and efficiencies by enabling them to search information from hundreds of collections and databases worldwide. The community itself also provides increasingly important services by investing labour into open-source tools for data sharing and access, like the Living Atlases platform, open-source software originally developed for the Australian government by the Atlas of Living Australia and now in use or under development in dozens of countries covering every region of the world.

“If we want to address the big challenges we face around the future of land use, conservation, climate change, food security and health, we need efficient ways to bring together all the data capable of helping us understand the changing state of the world and the essential role that biodiversity plays at all scales,” said Donald Hobern, Executive Secretary of GBIF. “This milestone shows that today’s GBIF is prepared for continued growth and ready to handle the massive volume of data we expect to see from other new technologies and sources, including environmental sequencing and remote sensing.”

What lies ahead

Although reaching 1 billion records is a significant milestone, much work remains to be done, including expanding the partnerships needed to link sources of biodiversity data not yet connected through the GBIF network. To this end, GBIF provides a ready-made framework for helping countries and organizations to fill gaps and biases in geographic, temporal and taxonomic coverage of biodiversity information.

Naturalists, explorers and scientists have documented life around the world for centuries. Open biodiversity data served through the GBIF network repatriates evidence about species gathered from field expeditions and held in natural history collections around the world, unlocking it through digital access for use by researchers and citizens worldwide, including in the countries of origin.

The use of standardized data formats and licences on GBIF eliminates guesswork and uncertainty about the terms of both sharing and using open biodiversity data. The global index also incorporates an advanced system of linking data properly cited in research to the datasets that supported it, thereby ensuring that the institutions that share data receive credit for their actions. GBIF also continues to support and advocate for data papers as a tool for ensuring that researchers gain accepted academic credit for their work to collect, curate and share freely accessible, interoperable and reusable data.