Improving biodiversity data quality in Latin America: documenting best practices across data workflows and life cycles

Vultur gryphus Linnaeus, 1758

Andean Condor (Vultur gryphus), Quindío, Colombia. Photo 2019 Ma Alejandra Bedoya-Cañón via iNaturalist Research-grade Observations, licensed under CC BY-NC 4.0.

Nearly every occurrence record in GBIF can be refined or improved. In Latin America, data professionals and decision-makers alike consistently identify data quality as a priority for ensuring that biodiversity data are included in national conservation strategies and policies.

This project, led by GBIF Colombia (SiB Colombia), will address this need by documenting best practices for improving data quality at each stage in the typical data workflow. While the project will focus on delivering a set of best practices adapted to Latin American local needs, the resulting document will draw upon and bring together existing materials from across the global community.

Activities will revolve around a pair of virtual workshops, and the project team will document recommended approaches at three different stages of typical data workflows: digitization, publication and data repatriation.

  1. Unpublished data: A great deal of biodiversity information across Latin America remains undigitized. To improve overall quality and fitness-for-use, it is crucial to consider data quality at the earliest stages of digitization processes. The project will seek to produce clear, consistent protocols for distribution among data providers.
  2. Published data: Already published data is of varying quality. We need to implement unified data quality processes at the node and provider levels in order to identify and correct errors, complete existing information where it is incomplete, and republish the data.
  3. Data repatriation: We need good quality, research-ready data for decision-making. A considerable percentage of biodiversity data published about our countries comes from foreign institutions. We need a clear, unified protocol to guide re-ingestion, assessment and enhancement of the quality of these data.
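As an illustration of the kind of check that could run at any of the three stages above, here is a minimal sketch in Python. It is not taken from the project's guide: the function name `check_record`, the issue labels and the choice of required fields are hypothetical, and the sample values are made up. The field names are standard Darwin Core terms. The date check is a simplification, because it flags year-only dates and date ranges that Darwin Core otherwise permits.

```python
from datetime import date

# Hypothetical choice of fields treated as mandatory for this sketch.
REQUIRED_FIELDS = ("scientificName", "eventDate", "countryCode")


def check_record(record):
    """Return a list of issue labels for one occurrence record (a dict)."""
    issues = []

    # Flag required fields that are absent or blank.
    for field in REQUIRED_FIELDS:
        if not str(record.get(field) or "").strip():
            issues.append(f"missing:{field}")

    # Coordinates must be numeric, within range, and not the 0,0 placeholder.
    try:
        lat = float(record.get("decimalLatitude"))
        lon = float(record.get("decimalLongitude"))
    except (TypeError, ValueError):
        issues.append("coordinates:not_numeric")
    else:
        if not (-90 <= lat <= 90 and -180 <= lon <= 180):
            issues.append("coordinates:out_of_range")
        elif lat == 0 and lon == 0:
            issues.append("coordinates:zero")

    # Simplified date check: expects YYYY-MM-DD at the start of eventDate.
    event_date = str(record.get("eventDate") or "").strip()
    if event_date:
        try:
            if date.fromisoformat(event_date[:10]) > date.today():
                issues.append("eventDate:in_future")
        except ValueError:
            issues.append("eventDate:not_full_iso_date")

    return issues


# Made-up example record, for illustration only.
sample = {
    "scientificName": "Vultur gryphus",
    "eventDate": "2019-05-01",
    "countryCode": "CO",
    "decimalLatitude": "95",
    "decimalLongitude": "-75.6",
}
print(check_record(sample))
```

Returning labels rather than silently correcting values keeps the decision with the data provider, in line with the identify, correct and republish cycle described for published data.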

While the main deliverable, a guide provisionally entitled Data Quality Best Practices for Digitization, Publication and Biodiversity Data Repatriation, will appear first in Spanish, the project team will also deliver a summary in English, with the full document available for translation like the rest of GBIF's digital documentation.

Sources and precedents

  • Chapman AD (2005) Principles of Data Quality. Copenhagen: Global Biodiversity Information Facility. https://doi.org/10.15468/doc.jrgg-a190
  • Chapman AD, Belbin L, Zermoglio PF, Wieczorek J, Morris PJ, Nicholls M, Rees ER, Veiga AK, Thompson A, Saraiva AM, James SA, Gendreau C, Benson A & Schigel D (2020) Developing Standards for Improved Data Quality and for Selecting Fit for Use Biodiversity Data. Biodiversity Information Science and Standards 4: e50889. https://doi.org/10.3897/biss.4.50889
  • Comisión Nacional para el Conocimiento y Uso de la Biodiversidad, México (2017) CONABIO, 25 años de evolución. Ciudad de México: CONABIO. https://www.gob.mx/cms/uploads/attachment/file/262393/25_an_os_Conabio_web.pdf
  • CONABIO (2019) Datos primarios de ejemplares del Sistema Nacional sobre Biodiversidad (SNIB): características y reglas. Ciudad de México: CONABIO. http://www.snib.mx/ejemplares/docs/CONABIO-SNIB-ProtocoloCalidadI.pdf
  • Escobar D, Jojoa LM, Díaz SR, Rudas E, Albarracín RD, Ramírez C, Gómez JY, López CR, Saavedra J & Ortiz R (2016) Georreferenciación de localidades: Una guía de referencia para colecciones biológicas, versión 4.0. Bogotá, Colombia: Instituto de Investigación de Recursos Biológicos Alexander von Humboldt y Instituto de Ciencias Naturales, Universidad Nacional de Colombia. https://hdl.handle.net/20.500.11761/35180
  • Hill AW, Otegui J, Ariño AH & Guralnick RP (2010) GBIF Position Paper on Future Directions and Recommendations for Enhancing Fitness-for-Use Across the GBIF Network, version 1.0. Copenhagen: Global Biodiversity Information Facility. https://www.gbif.org/document/80623/
  • Escobar D, Beltrán N, Buitrago L, Plata C & Delgado E (2015) Calidad de Datos: Guía de herramientas para mejorar los datos primarios de biodiversidad, versión 1.0. Bogotá, Colombia: SiB Colombia. https://hdl.handle.net/20.500.11761/35351
  • Escobar D & Ortiz R (2018) Lineamientos para la georreferenciación de datos sobre biodiversidad, versión 1.0. Bogotá, Colombia: SiB Colombia. https://hdl.handle.net/20.500.11761/35331
  • Buitrago L, Plata C, Ortíz R & Beltrán N (2019) OpenRefine - Guía básica, Limpieza de datos sobre biodiversidad, versión 1.0. Bogotá, Colombia: SiB Colombia. https://hdl.handle.net/20.500.11761/35348
  • Ortíz R, Plata C & Buitrago L (2019) OpenRefine - Guía de validación y limpieza de datos sobre biodiversidad, versión 1.0. Bogotá D.C.: SiB Colombia. https://hdl.handle.net/20.500.11761/35350
  • Veiga AK, Saraiva AM, Chapman AD, Morris PJ, Gendreau C, Schigel D & Robertson TJ (2017) A conceptual framework for quality assessment and management of biodiversity data. PLoS ONE 12(6):e0178731. https://doi.org/10.1371/journal.pone.0178731

Project progress

At final reporting, the project had achieved its aim of addressing an identified need: the implementation of good biodiversity data quality practices at different steps of the data workflow, through workshops and documentation focused on the Spanish-speaking community.

All partners created an inventory of existing materials related to data quality from international and local sources, including critical details for use in developing the project's main document. This inventory was consolidated using Zotero.

Based on this compilation, the project drafted, in Spanish, the document “Mejores prácticas en calidad de datos para digitalización, publicación y repatriación de datos de biodiversidad”, which addresses all the needs identified in biodiversity data quality processes.

Using the themes of the document, the project organized and held its central workshop, “Workshop on Data quality and Biodiversity”, in November 2021. Attended by 60 participants of 14 different nationalities, this online workshop trained relevant stakeholders in the use of the data quality best practices developed.

Workshop materials (text, slides, exercises) are openly available from a GitHub repository created for the project. Additionally, a landing page for the workshop was created to maximize its visibility at the time of the call and consolidate materials. Videos of the workshop are also available via YouTube.

To help promote the project in the Latin American community, the GBIF project page was translated into Spanish and the project had an access point on the SiB Colombia website. The project also used various channels of communication to share information about the workshop call and materials.

In addition to the above activities, the project completed an assessment of the central workshop results. This assessment is the first part of the work towards an internal workshop to finalize the documentation, incorporating lessons learned from applying the draft documentation during the central workshop.

The COVID-19 pandemic affected project work and delayed the implementation of activities. Implementing all activities online was also challenging, but it provided an opportunity to reach a higher number of participants in the project’s central workshop. Work towards the remaining commitments is envisaged after the end of the project.

€ 13,262
€ 27,468
Duration
29 November 2020 - 29 December 2022
Project identifier
CESP2020-018
Funded by
Project lead
GBIF Colombia
Contact details

Dairo Escobar
Sistema de Información sobre Biodiversidad de Colombia - SiB Colombia
Instituto de Investigación de Recursos Biológicos Alexander von Humboldt
Calle 28A # 15-09
111311 Bogotá, D.C.
Colombia
