Big data in biodiversity

Reviewing the pros and cons of using big data in biodiversity research

GBIF-mediated data resources used: 9,397 species occurrences

Arctic dwarf birch (Betula nana), the plant species with the highest number of records in GBIF. Photo by Jason Grant, licensed under CC BY-NC 4.0.

Researchers are using big data in many areas of ecology and biodiversity research. Integrating large datasets makes it possible to address questions on a global scale, but each type of data brings its own challenges.

This concept paper reviews four classes of data relevant for forecasting the impacts of global change on vegetation: environmental data, species occurrences, community plots and species traits. The authors highlight how a growing number of regional and global biodiversity research initiatives use GBIF for species occurrence data, but also note that errors in spatial coordinates and taxonomic identification can lead to overestimations.
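As a rough illustration of the kind of coordinate problems mentioned above, the sketch below flags occurrence records whose coordinates are missing, out of range, or set to exactly zero. It is not the authors' method. The field names follow Darwin Core terms (`decimalLatitude`, `decimalLongitude`, `scientificName`), while the function name `coordinate_issues` and the sample records are hypothetical and chosen only for this example.

```python
from typing import List, Optional


def coordinate_issues(lat: Optional[float], lon: Optional[float]) -> List[str]:
    """Return a list of basic problems found in a latitude/longitude pair."""
    if lat is None or lon is None:
        return ["missing coordinates"]
    issues = []
    # Valid latitudes lie in [-90, 90] and valid longitudes in [-180, 180].
    if not (-90 <= lat <= 90) or not (-180 <= lon <= 180):
        issues.append("out of range")
    # A point at exactly (0, 0) often indicates a placeholder, not a real locality.
    if lat == 0 and lon == 0:
        issues.append("zero coordinates")
    return issues


# Hypothetical records, invented for illustration only.
records = [
    {"scientificName": "Betula nana", "decimalLatitude": 68.35, "decimalLongitude": 18.82},
    {"scientificName": "Betula nana", "decimalLatitude": 0.0, "decimalLongitude": 0.0},
    {"scientificName": "Betula nana", "decimalLatitude": 168.35, "decimalLongitude": 18.82},
    {"scientificName": "Betula nana", "decimalLatitude": None, "decimalLongitude": None},
]

# Keep only records that pass every check.
clean = [
    r for r in records
    if not coordinate_issues(r["decimalLatitude"], r["decimalLongitude"])
]

for r in records:
    problems = coordinate_issues(r["decimalLatitude"], r["decimalLongitude"])
    print(r["decimalLatitude"], r["decimalLongitude"], problems or "ok")
```

Checks like these catch only the most obvious errors. Records that are plausible but wrong, such as a misidentified specimen or a point placed in the wrong country, need taxonomic or expert review.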

For researchers producing their own datasets, the authors stress the importance of storing them in data repositories that allow for re-use and post-publication peer review, while also ensuring proper attribution, which in turn will foster more enthusiastic data sharing and data-driven discovery. The quality of the data, however, must always be considered, as the size of a dataset alone cannot overcome problems caused by systematic errors.

Franklin J, Serra-Diaz JM, Syphard AD and Regan HM (2016) Big data for forecasting the impacts of global change on plant communities. Global Ecology and Biogeography. Wiley-Blackwell 26(1): 6–17. Available at: https://doi.org/10.1111/geb.12501.