Using DNA sequences to improve taxonomic identifications

Some species are easily misidentified because they closely resemble other species. Such errors affect large biodiversity repositories such as GBIF, so how can misidentifications be corrected without going through millions of records?

GBIF-mediated data resources used: 4,672 species occurrences

Usnea longissima (now known as Dolichousnea longissima) - or is it? Photo by J Brew via iNaturalist, licensed under CC BY-SA 2.0.

Some species are easily misidentified because they closely resemble other species. This affects large biodiversity repositories such as GBIF, but how can such misidentifications be corrected without going through millions of records? In this study, researchers present a strategy that combines DNA sequence data and specimen occurrence data to find potentially misidentified specimens in large repositories such as GBIF. The researchers created ecological niche models for the lichen fungus Usnea longissima using only georeferenced specimens that DNA sequence data had confirmed to represent a single species. When GBIF-mediated occurrences were plotted against this verified distribution of the fungus, outliers pointed to records that potentially needed taxonomic scrutiny and revision. Re-examination of these outliers revealed that most were, in fact, misidentified and belonged to similar species with different distributions. The study raises interesting questions about the potential of DNA sequence data to improve the quality of species information in GBIF.
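
The outlier-screening idea described above can be illustrated with a minimal Python sketch. It is a deliberately simplified stand-in, not the niche-modeling approach used in the study: it builds a rectangular climate envelope from DNA-verified records and flags unverified occurrences that fall outside it. The record identifiers, the two climate variables and all values are hypothetical and chosen only for illustration.

```python
from dataclasses import dataclass

# Hypothetical climate variables; a real analysis would use many more layers.
VARIABLES = ("mean_temp_c", "annual_precip_mm")


@dataclass(frozen=True)
class Record:
    record_id: str
    mean_temp_c: float        # assumed climate value at the record's coordinates
    annual_precip_mm: float   # assumed climate value at the record's coordinates


def fit_envelope(verified, margin=0.1):
    """Return (low, high) bounds per variable, spanning the verified
    records and widened on each side by `margin` times the observed range."""
    bounds = {}
    for name in VARIABLES:
        values = [getattr(r, name) for r in verified]
        low, high = min(values), max(values)
        pad = (high - low) * margin
        bounds[name] = (low - pad, high + pad)
    return bounds


def flag_outliers(records, bounds):
    """Return the ids of records that fall outside the envelope on any variable."""
    flagged = []
    for r in records:
        inside = all(lo <= getattr(r, name) <= hi
                     for name, (lo, hi) in bounds.items())
        if not inside:
            flagged.append(r.record_id)
    return flagged


# Made-up records standing in for DNA-verified specimens.
verified = [
    Record("dna-1", 4.0, 1800.0),
    Record("dna-2", 6.5, 2400.0),
    Record("dna-3", 8.0, 2100.0),
]
# Made-up records standing in for unverified repository occurrences.
unverified = [
    Record("occ-1", 5.5, 2000.0),
    Record("occ-2", 19.0, 600.0),
    Record("occ-3", 7.9, 2450.0),
]

bounds = fit_envelope(verified)
suspects = flag_outliers(unverified, bounds)
print(suspects)
```

A flagged record is only a candidate for re-examination, not a confirmed misidentification; as in the study, the outliers still need taxonomic review.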

Smith BE, Johnston MK and Lücking R (2016) From GenBank to GBIF: Phylogeny-Based Predictive Niche Modeling Tests Accuracy of Taxonomic Identifications in Large Occurrence Data Repositories. PLOS ONE. Public Library of Science (PLoS), e0151232. Available at doi:10.1371/journal.pone.0151232.