Effects of sampling bias on stability of areas of endemism

By artificially introducing biases into data, a study reveals that areas of endemism can still be described spatially even when the data are biased

Data resources used via GBIF: 34,000 species occurrences
Executioner clownfrog (Dendropsophus carnifex) observed in Reserva del Río Guajalito, Ecuador by Marco F. Monteros.
Photo via iNaturalist (CC BY-NC 4.0)

Biases in data, whether geographic or taxonomic, intentional or not, introduce uncertainty into downstream analyses, which can distort results and misrepresent the real world.

This study attempts to quantify the effects of different types of bias by introducing two stability measures that indicate how closely the results obtained from a biased dataset agree with those from its unbiased version.

Using three datasets, including a GBIF download of all Amazonian amphibians, the researchers artificially introduced "biases" by randomly removing increasing fractions of records 1) for all species, 2) for specific subsets of species, and 3) within defined geographical sectors. These three treatments emulate poor sampling, uneven sampling and geographical bias, respectively.
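The three removal schemes can be sketched in Python as below. This is a minimal illustration under stated assumptions, not the authors' code: each record is assumed to be a (species, grid cell) pair, the removal fraction is applied to the pooled set of affected records, and the function names (`poor_sampling`, `uneven_sampling`, `geographic_bias`) are hypothetical.

```python
import random
from typing import Callable

# Assumption: a record is (species_name, (col, row)) on a hypothetical grid.
Record = tuple[str, tuple[int, int]]


def _thin(records: list[Record], affected: Callable[[Record], bool],
          fraction: float, rng: random.Random) -> list[Record]:
    """Randomly drop `fraction` of the records for which `affected` is true.

    Records outside the affected subset are always kept.
    """
    hit = [r for r in records if affected(r)]
    untouched = [r for r in records if not affected(r)]
    n_keep = round(len(hit) * (1 - fraction))
    return untouched + rng.sample(hit, n_keep)


def poor_sampling(records, fraction, rng):
    # 1) Every record, of every species, is a candidate for removal.
    return _thin(records, lambda r: True, fraction, rng)


def uneven_sampling(records, fraction, species_subset, rng):
    # 2) Only records of the chosen species are thinned.
    return _thin(records, lambda r: r[0] in species_subset, fraction, rng)


def geographic_bias(records, fraction, in_sector, rng):
    # 3) Only records whose grid cell falls inside the sector are thinned.
    return _thin(records, lambda r: in_sector(r[1]), fraction, rng)


if __name__ == "__main__":
    rng = random.Random(1)
    # Toy data: 50 records each for species A and B, spread over 10 columns.
    recs = ([("A", (i % 10, 0)) for i in range(50)]
            + [("B", (i % 10, 1)) for i in range(50)])
    for f in (0.25, 0.5, 0.75):  # increasing fractions removed
        print(f, len(poor_sampling(recs, f, rng)))
```

Repeating each treatment at increasing fractions, as in the loop above, mirrors the idea of gradually degrading the data; an actual study would then rerun the endemism analysis on each thinned copy.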

In subsequent analyses of areas of endemism, the authors compared the results from the original data with those from the intentionally biased data to derive measures of geographical and taxonomic stability, that is, the degree to which the biased data lead to the same areas of endemism as the unbiased data.
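The paper defines its own geographical and taxonomic stability measures, and the sketch below does not reproduce them. As a hypothetical stand-in, it represents each area of endemism as a set of grid cells plus a set of species, and scores stability as the mean, over the original areas, of the best Jaccard overlap with any area recovered from the biased data. Comparing cells gives a geographical score; comparing species gives a taxonomic one. The function names and the Jaccard-based scoring are assumptions made for illustration only.

```python
from typing import Callable


def jaccard(a: set, b: set) -> float:
    """Size of the intersection divided by size of the union (1.0 if both empty)."""
    if not a and not b:
        return 1.0
    return len(a & b) / len(a | b)


def stability(original_areas: list[dict], biased_areas: list[dict],
              key: Callable[[dict], set]) -> float:
    """Hypothetical stability score: for each area found in the original data,
    take its best Jaccard match among the areas found in the biased data,
    then average. `key` selects the cells or the species of an area."""
    if not original_areas:
        return 1.0
    scores = [max((jaccard(key(a), key(b)) for b in biased_areas), default=0.0)
              for a in original_areas]
    return sum(scores) / len(scores)


if __name__ == "__main__":
    # Toy example: the biased run recovers the same cells but fewer species.
    original = [{"cells": {1, 2, 3, 4}, "species": {"A", "B", "C", "D"}}]
    biased = [{"cells": {1, 2, 3, 4}, "species": {"A", "B"}}]
    print(stability(original, biased, lambda a: a["cells"]))    # 1.0
    print(stability(original, biased, lambda a: a["species"]))  # 0.5
```

In this toy case the spatial score stays at 1.0 while the taxonomic score drops to 0.5, which shows how the two measures can diverge when the same areas are recovered with incomplete species lists.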

In nearly all cases, stability decreased as more data were removed. However, data incompleteness appeared to have a bigger impact on taxonomic stability than on geographical stability, suggesting, perhaps on a positive note, that even scattered data can yield a fair spatial identification of areas of endemism.

Original paper

Casagranda MD and Goloboff PA (2019) On stability measures and effects of data structure in the recognition of areas of endemism. Biological Journal of the Linnean Society. Oxford University Press (OUP) 127(1): 143–155. Available at: https://doi.org/10.1093/biolinnean/blz019