Researchers are using big data in many areas of ecology and biodiversity research. Integrating large datasets makes it possible to address questions on a global scale, but each type of data poses its own challenges.
This concept paper reviews four classes of data relevant for forecasting the impacts of global change on vegetation: environmental data, species occurrences, community plots and species traits. The authors highlight how a growing number of regional and global biodiversity research initiatives use GBIF for species occurrence data, but also note that errors in spatial coordinates and taxonomic identification can lead to overestimations.
For researchers producing their own datasets, the authors stress the importance of storing them in data repositories that allow for re-use and post-publication peer review, while also ensuring proper attribution, which in turn will foster more enthusiastic data sharing and data-driven discovery. The quality of the data, however, must always be considered, as the size of a dataset alone cannot overcome problems caused by systematic errors.