Award winner uses data mining and machine learning to identify collectors and duplicated herbarium specimens

Nicky Nicolson, PhD candidate from Brunel University London, earns award for developing computational techniques to identify collectors and duplicate specimens, improving data and inferring connections between different collections

Nicky Nicolson, a PhD candidate from the United Kingdom at Brunel University London, is one of two recipients of the GBIF Young Researchers Award for 2019. As a staff member at the Royal Botanic Gardens, Kew, Nicolson is well known within the GBIF community, having recently moved from a technical software development role to that of senior research leader in biodiversity informatics while pursuing her graduate studies.

For centuries, naturalists collecting plants in the field have generally gathered at least five or six representative samples of any specimen. After drying and preparing the specimens and returning to the herbarium, they would then often split up these groups and distribute the individual specimens to other institutions. This strategy maximizes researchers' access to specimens, makes efficient use of collection space and safeguards knowledge against the catastrophic loss of any single collection. However, with collections pursuing independent digitization processes, links between duplicate specimens remain hidden when shared through the GBIF infrastructure.

Nicolson's research starts with clustering digital data from herbarium specimens shared through GBIF to identify the collectors responsible for gathering specimens in the field and the expeditions they conducted. Data-mining these implicit entities, which are often not formally managed but are recognized by researchers and specimen collection holders alike, allows the reshaping of data aggregations like GBIF. Effective summaries of large-scale data enable novel outputs such as data visualisations and network graph representations.

Representing this data as a network makes it possible to infer connections and reveal hidden or forgotten aspects of specimen collections. For example, reconciling information about collectors' names, dates and locations can detect collecting trips and closely related specimens now dispersed and managed independently, even if the vouchers and digital records no longer explicitly record them as coming from the same collection event.
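As a rough illustration of the reconciliation idea described above, the following Python sketch groups specimen records that share a collector surname, collector number and collecting date, then keeps only the groups held by more than one institution. It is a minimal sketch under stated assumptions, not Nicolson's actual method: the field names follow Darwin Core terms (recordedBy, recordNumber, eventDate, institutionCode), while the helper names surname_key and duplicate_candidates and all sample records are hypothetical.

```python
import re
import unicodedata
from collections import defaultdict


def surname_key(recorded_by):
    """Reduce a collector string to a lowercase, accent-free surname.

    Assumes only two name layouts: "Surname, Initials" and "Initials Surname".
    """
    text = unicodedata.normalize("NFKD", recorded_by)
    text = "".join(ch for ch in text if not unicodedata.combining(ch)).lower()
    if "," in text:
        surname = text.split(",", 1)[0]
    else:
        parts = text.split()
        surname = parts[-1] if parts else ""
    return re.sub(r"[^a-z]", "", surname)


def duplicate_candidates(records):
    """Group records by (surname, collector number, date).

    Returns only the groups that span more than one institution, since
    those are the candidates for dispersed duplicates.
    """
    groups = defaultdict(list)
    for rec in records:
        key = (
            surname_key(rec["recordedBy"]),
            rec["recordNumber"].strip(),
            rec["eventDate"],
        )
        groups[key].append(rec)
    return {
        key: recs
        for key, recs in groups.items()
        if len({r["institutionCode"] for r in recs}) > 1
    }


# Hypothetical sample records, invented purely for illustration.
records = [
    {"institutionCode": "K", "recordedBy": "Smith, J.",
     "recordNumber": "1234", "eventDate": "1932-05-17"},
    {"institutionCode": "P", "recordedBy": "J. Smith",
     "recordNumber": "1234", "eventDate": "1932-05-17"},
    {"institutionCode": "K", "recordedBy": "A. Jones",
     "recordNumber": "88", "eventDate": "1950-01-02"},
]

candidates = duplicate_candidates(records)
for key, recs in candidates.items():
    print(key, [r["institutionCode"] for r in recs])
```

Exact matching on a simplified key is the crudest possible choice here. Real specimen data carries misspellings, collecting teams and partial dates, which is why the research described in this article relies on clustering rather than exact lookups.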

"Annotations held on specimens represent both the most costly part of the specimen digitization process (georeferencing) and the most valuable kinds of usage: identifications, and the citation of type specimens formally attached to scientific names," said Nicolson. "By identifying the collecting events from specimens pooled in GBIF, I hope to combine their histories and form a richer, shared basis of evidence."

"The vast amount of digitized information held at Kew and mobilised through GBIF offers a rich resource to mine knowledge about botanical specimens, and how they are collected internationally," said Dr Allan Tucker, senior lecturer in computer science at Brunel University London and head of the Intelligent Data Analysis Group. "By developing state-of-the-art algorithms, Nicky has enabled new inferences to be made about the botanical scientific process, revealing how science in the field has changed over time and undoubtedly informing future efforts."

"Nicky's research to reconcile specimen duplicates holds the promise of enabling shared curation effort between organizations worldwide, making the most efficient use of expert input," said Dr Alan Paton, Head of Science Collections at the Royal Botanic Gardens, Kew. "Her work also has great potential for mining details of specimen use from literature and other data sources, demonstrating the value of digitized collections as a basic scientific infrastructure for addressing environmental challenges. Applying these new computational techniques to the collections data strengthens the data-level connections between institutions, helping us to scale up mass digitization."

Like the botanical technique of propagation, which grows new plants from diverse sources like seeds and cuttings, the computational techniques Nicolson has developed aim to cultivate connections between collections. Sharing metadata elements and annotations from related specimens held in different herbaria could enable the dissemination of better, more consistent data that efficiently enriches records for them all.

Meanwhile, researchers would benefit from improved standardization, documentation and linkage of digital specimen information, and detection of collection patterns within individual expeditions. Collections could also see positive effects in reduced data management costs and improved data quality and data usage reporting. Finally, revealing the latent relationships between collections with shared specimen material could highlight institutions that stand to benefit from collaborations aimed at enabling community curation.

The award jury, led by GBIF science committee vice chair Anders G. Finstad of the Norwegian University of Science and Technology (NTNU), lauded Nicolson for her "highly original and innovative" approaches and her successful "use of data from GBIF to combine geographically distant collections using only minimal information on the specimen."

Nicolson is the first U.K. national to win the award since Amy McDougal earned the honour in 2010, the programme's first year. She is also the third U.K.-based winner, preceded by McDougal and Juan Escamilla Mólgora, a Mexican PhD candidate at Lancaster University and 2016 award recipient.

The GBIF Science Committee selected Nicolson and Marcos Daniel Zárate, a PhD candidate from Argentina, from a pool of 11 candidates nominated by heads of delegation from seven GBIF Participant countries, including the United Kingdom, whose delegation nominated Nicolson for the award. Zárate and Nicolson will each receive a €5,000 award and recognition at the 26th GBIF Governing Board in Leiden, the Netherlands, in October 2019.

About the Award

Since the award's inception in 2010, GBIF national heads of delegation have used the annual Young Researchers Award to promote and encourage innovation in biodiversity-related research using data shared through the GBIF network.

About Brunel University London

Brunel University London is an international university committed to bringing benefit to society through excellence in education, research and knowledge transfer. Founded in 1966, Brunel has a reputation for high-impact academic research and entrepreneurial flair. The university works extensively with industry partners, contributing to global innovation and policy change. The Intelligent Data Analysis Group is a leading centre of excellence in artificial intelligence, data science and software engineering, and its research in artificial intelligence is among the most highly cited in the world. Learn more at

About Royal Botanic Gardens, Kew

The Royal Botanic Gardens, Kew is a world-famous scientific organisation, internationally respected for its outstanding collections as well as its scientific expertise in plant diversity, conservation and sustainable development in the UK and around the world. Kew Gardens was made a UNESCO World Heritage Site in July 2003 and celebrates its 260th anniversary in 2019. Kew supplies GBIF with information from almost one million digitized Herbarium specimens and more than 1.6 million vascular plant names from the International Plant Names Index as well as data on accepted plant species from the World Checklist of Selected Plant Families. Learn more at

2019 Young Researchers Award Jury

  • Anders G. Finstad * (chair): Norwegian University of Science and Technology
  • Jurate De Prins *: Royal Belgian Institute of Natural Sciences
  • Gregory Riccardi *: Florida State University
  • Philippe Grandcolas *: Centre National de la Recherche Scientifique | Muséum national d'Histoire naturelle
  • Emma Patricia Gomez Ruiz: Facultad de Ciencias Biológicas, Universidad Autónoma de Nuevo León
  • Erlend Nilsen: Norwegian Institute for Nature Research

* denotes GBIF Science Committee member