The GBIF Secretariat has released a new Spanish-language guide aimed at helping researchers, analysts and other data wranglers who use the open-source tool OpenRefine to clean and transform biodiversity data from the GBIF network.
Guía para la limpieza de datos sobre biodiversidad con OpenRefine offers a step-wise explanation of key functions and possible approaches for assessing and improving the quality of biodiversity data using OpenRefine, an open-source desktop application for cleaning, transforming and extending records from large datasets.
"The community has long recognized data quality as a key fitness-for-use issue around biodiversity data, but dealing with it is often easier said than done," said Paula Zermoglio of VertNet, lead author of the guide. "We prepared this document to provide an easy-to-follow guide that helps users build an initial set of skills that they can then expand on to enhance the quality of the data they share or use."
Conversations that arose during the initial community review period prompted Zermoglio and VertNet co-author John Wieczorek to collaborate with colleagues from SiB Colombia to expand the guide. Its "first-final" version adds new sections drawn from earlier materials they developed while also accommodating and responding to comments from other community members.
"By merging our earlier materials into this guide, we feel we can provide more comprehensive documentation while avoiding confusion and duplication of the effort needed to maintain different sources," said Camila Plata, data administration team lead at SiB Colombia. "We feel this approach ties in well with our own long-standing efforts to support the community of biodiversity data users in Colombia."
SiB Colombia—an acronym for "Sistema de Información sobre Biodiversidad de Colombia" (in English: Biodiversity Information System of Colombia)—serves as the GBIF node in Colombia, coordinating activities of its own national network and collaborating with other GBIF members in the Latin America and Caribbean region.
"The successful merger of source materials from VertNet and SiB Colombia marks an important milestone for the whole GBIF community," said Joe Miller, executive secretary of GBIF. "The expanded OpenRefine guide demonstrates the effectiveness of having community-maintained documentation and highlights the strengths of our multilingual network."
The OpenRefine guide is the final instalment in a series of five digital-first documents commissioned from VertNet by GBIF, following the sensitive species guide and a set of three georeferencing documents. Developed with the aim of providing up-to-date technical guidance for skills development and training across GBIF's communities of practice, the digital documentation system continues to grow with training materials arising out of other activities.
Many if not most of the materials are or will soon be available in languages other than English, supported in part by a free non-commercial licence provided by CrowdIn that empowers dozens of volunteer translators from across the GBIF community. The OpenRefine guide reverses that flow: it was written first in Spanish, and translation projects are already established for it in French and English.
Those interested in future publications within this programme, including peer review and translation opportunities, should consider subscribing to the digital documentation mailing list.