Natural history collections of many sizes contribute data to GBIF.org, data that are used extensively every week to model the distribution of species for a variety of purposes. Every record counts, but small collections may be more regional in scope, with a more specific taxonomic or ecological focus, than larger collections.
To quantify the impact of small collections, the authors of this study built distribution models for five test-case species using GBIF-mediated data partitioned by the size of the source collection. Despite having fewer records, the dataset based on small collections contributed unique information: for every test species, combining it with the data from the large collections led to more refined and robust predictions of habitat suitability than the large collections alone.
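The partitioning step can be illustrated with a minimal sketch. It is not the authors' workflow: the records, the collection codes, the `SIZE_THRESHOLD` cutoff and the 1-degree grid are all hypothetical, and the number of records per collection stands in for collection size, which the study may have defined differently. The sketch splits occurrence records into small and large collections, then lists the grid cells for one species that are documented only by small collections.

```python
import math
from collections import Counter

# Hypothetical occurrence records: (collection_code, species, latitude, longitude).
# These values are invented for illustration and are not GBIF data.
records = [
    ("BIG", "sp_a", 40.1, -105.2),
    ("BIG", "sp_a", 40.3, -105.4),
    ("BIG", "sp_a", 41.2, -104.9),
    ("BIG", "sp_b", 39.8, -105.0),
    ("SMALL1", "sp_a", 37.6, -107.8),
    ("SMALL2", "sp_a", 40.2, -105.3),
]

# Assumed cutoff: collections with fewer records than this count as "small".
SIZE_THRESHOLD = 3


def partition_by_collection_size(recs, threshold):
    """Split records into (small, large) by records per source collection."""
    sizes = Counter(code for code, *_ in recs)
    small = [r for r in recs if sizes[r[0]] < threshold]
    large = [r for r in recs if sizes[r[0]] >= threshold]
    return small, large


def occupied_cells(recs, species):
    """Return the set of 1-degree grid cells in which the species was recorded."""
    return {
        (math.floor(lat), math.floor(lon))
        for _, sp, lat, lon in recs
        if sp == species
    }


small, large = partition_by_collection_size(records, SIZE_THRESHOLD)

# Cells for sp_a that only the small collections document.
unique_to_small = occupied_cells(small, "sp_a") - occupied_cells(large, "sp_a")
print(sorted(unique_to_small))
```

In this toy data one small collection overlaps an area the large collection already covers, while the other adds a cell the large collection lacks, which is the kind of complementary information the study describes. A real analysis would feed each partition, and their union, into a distribution model rather than compare grid cells.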
While using large numbers of species occurrences as input for distribution models can improve performance and reliability, the present study suggests that the nature of the data source can be important too. This potential of small, regional collections should be considered when planning digitization and data publication efforts.