Biases in data, whether geographic or taxonomic, introduce uncertainty into downstream analyses, potentially skewing results and misrepresenting the real world.
This study quantifies the effects of different types of bias by introducing two stability measures, one geographic and one taxonomic, that indicate how closely results from a biased dataset agree with those from its unbiased version.
Using three datasets, including a GBIF download of all Amazonian amphibians, the researchers artificially introduced "biases" by randomly removing increasing fractions of records for (1) all species, (2) specific subsets of species, and (3) records within defined geographic sectors, emulating poor sampling, uneven sampling, and geographic bias, respectively.
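The three removal schemes above can be sketched as simple subsampling routines. This is an illustrative reconstruction, not the authors' actual code; the record fields (`species`, `lon`, `lat`) and function names are assumptions made for the example.

```python
import random

def remove_fraction(records, fraction, rng):
    """Randomly drop `fraction` of records (emulates poor sampling)."""
    keep = max(0, int(round(len(records) * (1 - fraction))))
    return rng.sample(records, keep)

def bias_taxa(records, target_species, fraction, rng):
    """Drop records only for a chosen subset of species (uneven sampling)."""
    hit = [r for r in records if r["species"] in target_species]
    rest = [r for r in records if r["species"] not in target_species]
    return rest + remove_fraction(hit, fraction, rng)

def bias_sector(records, in_sector, fraction, rng):
    """Drop records only inside a geographic sector (geographic bias).
    `in_sector` is any predicate on (lon, lat)."""
    hit = [r for r in records if in_sector(r["lon"], r["lat"])]
    rest = [r for r in records if not in_sector(r["lon"], r["lat"])]
    return rest + remove_fraction(hit, fraction, rng)
```

Running each routine at increasing values of `fraction` produces the series of progressively biased datasets the study compares against the original.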
In analyses focused on specific areas of endemism, the authors compared results from the original data with those from the intentionally biased data to compute stability: the degree to which the biased data yield the same predictions as the unbiased data.
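One simple way to operationalize such an agreement score is set overlap between the elements identified from the unbiased and biased datasets (e.g., the species defining an area of endemism, or the grid cells assigned to it). The Jaccard index below is a stand-in assumption for illustration; the study's actual stability measures may be defined differently.

```python
def stability(unbiased_result, biased_result):
    """Jaccard similarity between two result sets:
    1.0 = perfect agreement with the unbiased analysis, 0.0 = none."""
    a, b = set(unbiased_result), set(biased_result)
    if not a and not b:
        return 1.0  # both analyses recovered nothing: trivially identical
    return len(a & b) / len(a | b)
```

Plotting this score against the removed fraction would show how quickly agreement with the unbiased analysis decays as bias increases.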
In nearly all cases, stability declined as more data were removed. Data incompleteness, however, affected taxonomic stability more strongly than geographic stability, suggesting, on a positive note, that even sparse data can support reasonable spatial identification of areas of endemism.