Biases in data, whether geographic or taxonomic, intentional or not, introduce uncertainty into downstream analyses, which can distort results and misrepresent the real world.
This study attempts to quantify the effects of different types of bias by introducing two stability measures that indicate the degree to which analyses of a biased dataset agree with those of its unbiased version.
Using three datasets, including a GBIF download of all Amazonian amphibians, the researchers artificially introduced "biases" by randomly removing increasing fractions of records for (1) all species, (2) specific subsets of species, and (3) records within defined geographical sectors, emulating poor sampling, uneven sampling, and geographical bias, respectively.
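The three degradation schemes can be sketched as simple record-removal functions. Everything below is illustrative: the column names, species labels, and removal logic are assumptions for a toy occurrence table, not the study's actual pipeline.

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(42)

# Toy occurrence table standing in for a GBIF download (columns are assumptions).
records = pd.DataFrame({
    "species": rng.choice(["sp_a", "sp_b", "sp_c"], size=300),
    "lon": rng.uniform(-75, -45, size=300),
    "lat": rng.uniform(-15, 5, size=300),
})

def thin_all(df, frac, seed=0):
    """Poor sampling: drop a fraction of records across all species."""
    return df.drop(df.sample(frac=frac, random_state=seed).index)

def thin_species(df, species, frac, seed=0):
    """Uneven sampling: drop a fraction of records for chosen species only."""
    pool = df[df["species"].isin(species)]
    return df.drop(pool.sample(frac=frac, random_state=seed).index)

def thin_sector(df, lon_min, lon_max, frac, seed=0):
    """Geographical bias: drop a fraction of records inside a longitude band."""
    pool = df[df["lon"].between(lon_min, lon_max)]
    return df.drop(pool.sample(frac=frac, random_state=seed).index)

print(len(records), len(thin_all(records, 0.3)))  # 300 210
```

Rerunning the downstream analysis on each thinned table, at increasing removal fractions, yields the biased results that are then compared against the full-data baseline.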
In subsequent analyses of areas of endemism, the authors compared results from the original data with those from the intentionally biased data to derive measures of geographical and taxonomic stability: the degree to which the biased data lead to the same predictions as the unbiased data.
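The paper's exact stability formulas are not reproduced here, but agreement between two analyses can be scored in this spirit with a Jaccard-style overlap, assuming each analysis yields a set of grid cells (geographical stability) or a set of species (taxonomic stability). This is a minimal sketch under that assumption, not the authors' actual measure.

```python
def stability(reference: set, biased: set) -> float:
    """Jaccard overlap: 1.0 means the biased analysis exactly recovers
    the reference result; 0.0 means no agreement at all."""
    if not reference and not biased:
        return 1.0
    return len(reference & biased) / len(reference | biased)

# Hypothetical areas of endemism expressed as sets of grid-cell ids.
full_cells = {"c1", "c2", "c3", "c4"}
degraded_cells = {"c1", "c2", "c5"}
print(stability(full_cells, degraded_cells))  # 0.4
```

Plotting such a score against the fraction of records removed, for each bias scheme, traces out the stability curves the study compares.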
In nearly all cases, stability diminished as more data were removed. However, data incompleteness had a greater impact on taxonomic stability than on geographical stability, suggesting, on a positive note, that even sparse data can support reasonable spatial identification of areas of endemism.