Validating and improving data quality before publication often requires either separate tools or manual intervention. Besides consuming extra time and resources, these approaches can be difficult, if not impossible, when working with large datasets or when publishing data in languages other than English.
In this project, SiB Colombia (the Colombian Biodiversity Information System, known by its Spanish acronym) will work with the U.S.-based collections collaboration VertNet to translate the interface and documentation of the Darwin Core Data Migrator Toolkit into Spanish. The collaboration between these two GBIF Participants will produce a version of the tool that fills an important technical gap for the many Spanish-speaking staff across the GBIF community.
By automatically generating data quality check and improvement reports for datasets, the Darwin Core Data Migrator Toolkit reflects VertNet’s long-standing experience in developing and automating routines to monitor and improve data quality. The team also hopes that the project can serve as a pilot for future cooperation among stakeholders elsewhere in the world who are interested in procedures for improving biodiversity data quality early and often.
The project began with a five-day workshop that trained partners to use VertNet’s Data Migrator Toolkit within the SiB Colombia data sharing workflow. Early progress has been made in applying the toolkit to datasets shared through SiB Colombia. Alongside this, the toolkit has been enhanced and its documentation translated into Spanish for broader use by the Spanish-speaking community. So far, SiB Colombia staff have tested the tool on a dataset from the fish collection of the Humboldt Institute, and their anticipated feedback will guide further development of the tool in the near future. The next phase will include applying the migration process to other datasets shared through SiB Colombia.