Automated species identification using 19th-century zoological illustrations

Study explores large-scale zero-shot learning for automated classification of scientific illustrations to reduce time needed for digitization

Scutigera in Iconographia Zoologica from the Special Collections of the University of Amsterdam, Public domain, via Wikimedia Commons

Scientific illustrations have historically served as perhaps the most important medium for conveying a species' characteristic traits, offering a means of highlighting and delineating minuscule details not always well-suited to photography. Such illustrations are often stored in museum repositories and archives in undigitized form, where they remain unavailable for general use.

The use of automated methods for reducing the time and effort required for digitization—including interpretation of historical names and other metadata—may help facilitate access to such important sources of historical knowledge.

In this study, a team of Dutch researchers explores "zero-shot learning" to address the problem. In brief, this approach allows a model to recognize object classes for which it observed no direct examples during training. Instead, the method relies on embedded class information from other sources.
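The idea can be illustrated with a minimal sketch, which is not the authors' model. It assumes that images and classes have already been mapped into a shared embedding space, and it assigns each image to the class whose embedding is most similar, including a class with no training images. The function names, class names and all vector values below are hypothetical and chosen only for illustration.

```python
import numpy as np


def cosine_similarity(a, b):
    """Cosine similarity between each row of a and each row of b."""
    a = a / np.linalg.norm(a, axis=1, keepdims=True)
    b = b / np.linalg.norm(b, axis=1, keepdims=True)
    return a @ b.T


def zero_shot_predict(image_embeddings, class_embeddings, class_names):
    """Assign each image to the class with the most similar embedding."""
    sims = cosine_similarity(image_embeddings, class_embeddings)
    return [class_names[i] for i in sims.argmax(axis=1)]


# Hypothetical class embeddings. In practice these would come from side
# information such as a taxonomy or text, not from training images, which
# is why a class without any training images can still be a candidate.
class_names = ["seen_class_a", "seen_class_b", "unseen_class_c"]
class_embeddings = np.array([
    [1.0, 0.0, 0.0],
    [0.0, 1.0, 0.0],
    [0.0, 0.0, 1.0],
])

# Hypothetical image embeddings, assumed to be the output of a trained
# image encoder that projects illustrations into the same space.
image_embeddings = np.array([
    [0.9, 0.1, 0.0],
    [0.1, 0.2, 0.95],
])

predictions = zero_shot_predict(image_embeddings, class_embeddings, class_names)
print(predictions)
```

In this toy setup the second image lands closest to the class that had no training examples, which is the behaviour zero-shot learning aims for.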

The authors trained their model on a dataset of 14,502 illustrations of 7,973 animal species from Iconographia Zoologica, embedding a class hierarchy based on the GBIF backbone taxonomy, literature from the Biodiversity Heritage Library (BHL) and dimensional features from photographs from iNaturalist.

Evaluating the model on an unannotated dataset of digitized illustrations of historical fauna of Indonesia, the researchers achieved an overall classification accuracy of around 35 per cent. While this may seem low, illustrations from 80 classes—with zero examples for training—were categorized correctly, showing the potential of computational methods for embedded models of species classification.

Stork L, Weber A, van den Herik J, Plaat A, Verbeek F and Wolstencroft K (2021) Large-scale zero-shot learning in the wild: Classifying zoological illustrations. Ecological Informatics. Elsevier BV 62: 101222. Available at: