Biogeo: an R package for assessing and improving data quality of occurrence record datasets

The R package biogeo was developed for detecting and correcting errors and for assessment of data quality of collections datasets consisting of occurrence records.

NOTE: This is a placeholder as the package has not yet been publicly released.

Occurrence data from museum and herbarium collections are valuable for mapping biodiversity patterns in space and time. Unfortunately, these collections datasets contain many errors and suffer from several data quality issues that can affect the quality of the products derived from them. Users are left to identify these errors and data quality issues themselves. Despite the large number of potential users of these datasets, there are few software tools dedicated to detecting and correcting errors in collections datasets.

Features of the package include error detection, such as identifying mismatches between the recorded country and the country in which the record plots, flagging records of terrestrial species that fall in the sea, and detecting outliers.
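
As a rough illustration of what outlier detection on occurrence records can look like, the sketch below flags values of an environmental variable that lie outside the usual 1.5 x IQR boxplot fences. It uses base R only. The function name flag_outliers_iqr, the parameter k and the example temperatures are hypothetical and are not taken from biogeo; the package's own outlier method may differ.

```r
# Hypothetical helper for illustration only; not part of the biogeo API.
# Flags values outside the boxplot fences [Q1 - k * IQR, Q3 + k * IQR].
flag_outliers_iqr <- function(x, k = 1.5) {
  q <- stats::quantile(x, c(0.25, 0.75), na.rm = TRUE, names = FALSE)
  iqr <- q[2] - q[1]
  x < q[1] - k * iqr | x > q[2] + k * iqr
}

# Made-up mean annual temperatures (degrees C) extracted at seven records.
temp <- c(14.2, 15.1, 15.8, 16.0, 16.4, 17.3, 31.5)
flag_outliers_iqr(temp)
# [1] FALSE FALSE FALSE FALSE FALSE FALSE  TRUE
```

A flagged record is only a candidate error. It still needs to be checked against its locality description before being corrected or removed.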

A key feature of the package is the ability to identify likely alternative positions for points that represent obvious errors in the dataset. The package also provides functions for exploring records in geographical and environmental space in order to identify possible errors. Functions are also available for converting coordinates in various text formats into degrees, minutes and seconds, and then into decimal degrees.
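
The arithmetic behind the coordinate conversion is simple: decimal degrees = degrees + minutes / 60 + seconds / 3600, with southern and western hemispheres given a negative sign. The base R sketch below shows this for one simple text layout. The function names dms_to_dd and parse_dms and the accepted input format are assumptions made for illustration; they are not the biogeo functions, which may handle more formats and use different names.

```r
# Hypothetical helpers for illustration only; not part of the biogeo API.

# Convert degrees, minutes, seconds and a hemisphere letter to decimal degrees.
dms_to_dd <- function(deg, min, sec, hemi) {
  hemi <- toupper(hemi)
  stopifnot(all(hemi %in% c("N", "S", "E", "W")))
  stopifnot(all(deg >= 0), all(min >= 0 & min < 60), all(sec >= 0 & sec < 60))
  dd <- deg + min / 60 + sec / 3600
  # South and west are negative by convention.
  ifelse(hemi %in% c("S", "W"), -dd, dd)
}

# Parse a single string such as "33 55 30 S" or "33d55'30\"S":
# three numbers separated by non-digits, followed by a hemisphere letter.
parse_dms <- function(x) {
  stopifnot(is.character(x), length(x) == 1)
  pattern <- "^\\s*(\\d+)\\D+(\\d+)\\D+(\\d+(?:\\.\\d+)?)\\D*?([NSEWnsew])\\s*$"
  m <- regmatches(x, regexec(pattern, x, perl = TRUE))[[1]]
  if (length(m) == 0) stop("unrecognised coordinate format: ", x)
  dms_to_dd(as.numeric(m[2]), as.numeric(m[3]), as.numeric(m[4]), m[5])
}

parse_dms("33 55 30 S")
# [1] -33.925
```

Strings that do not match the expected layout raise an error rather than returning a guessed value, so malformed coordinates are not silently converted.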

Citations

Robertson MP, Visser V & Hui C (2016). Biogeo: an R package for assessing and improving data quality of occurrence record datasets. Ecography http://doi.org/10.1111/ecog.02118