Uses of GBIF in scientific research

Peer-reviewed research citing GBIF as a data source, with at least one author from the Netherlands.
Extracted from the Mendeley GBIF Public Library.

List of publications

  • Lin, Y., Deng, D., Lin, W., Lemmens, R., Crossman, N., Henle, K., Schmeller, D., 2015.

    Uncertainty analysis of crowd-sourced and professionally collected field data used in species distribution models of Taiwanese moths

    Biological Conservation 181 102-110.

    The purposes of this study are to extract the names of species and places for a citizen-science monitoring program, to obtain crowd-sourced data of acceptable quality, and to assess the quality and the uncertainty of predictions based on crowd-sourced data and professional data. We used Natural Language Processing to extract names of species and places from text messages in a citizen science project. Bootstrap and Maximum Entropy methods were used to assess the uncertainty in the model predictions based on crowd-sourced data from the EnjoyMoths project in Taiwan. We compared uncertainty in the predictions obtained from the project and from the Global Biodiversity Information Facility (GBIF) field data for seven focal species of moth. The proximity to locations of easy access and the Ripley K method were used to test the level of spatial bias and randomness of the crowd-sourced data against GBIF data. Our results show that extracting information to identify the names of species and their locations from crowd-sourced data performed well. The results of the spatial bias and randomness tests revealed that the crowd-sourced data and GBIF data did not differ significantly in respect to both spatial bias and clustering. The prediction models developed using the crowd-sourced dataset were the most effective, followed by those that were developed using the combined dataset. Those that performed least well were based on the small sample size GBIF dataset. Our method demonstrates the potential for using data collected by citizen scientists and the extraction of information from vast social networks. Our analysis also shows the value of citizen science data to improve biodiversity information in combination with data collected by professionals.

    Keywords: Citizen science, Large-scale monitoring program, Natural language, Prediction of species distribution, Social media, Uncertainty, Volunteer survey


  • van Andel, T., Croft, S., van Loon, E., Quiroz, D., Towns, A., Raes, N., 2015.

    Prioritizing West African medicinal plants for conservation and sustainable extraction studies based on market surveys and species distribution models

    Biological Conservation 181 173-181.

    Sub-Saharan African human populations rely heavily on wild-harvested medicinal plants for their health. The trade in herbal medicine provides an income for many West African people, but little is known about the effects of commercial extraction on wild plant populations. Detailed distribution maps are lacking for even the most commonly traded species. Here we combine quantitative market surveys in Ghana and Benin with species distribution models (SDMs) to assess potential species’ vulnerability to overharvesting and to prioritize areas for sustainable extraction studies. We provide the first detailed distribution maps for 12 commercially extracted medicinal plants in West Africa. We suggest an IUCN threat status for four forest species that were not previously assessed (Sphenocentrum jollyanum, Okoubaka aubrevillei, Entada gigas and Piper guineense), which have narrow distributions in West Africa and are extensively commercialized. As SDMs estimate the extent of suitable abiotic habitat conditions rather than population size per se, their output is of limited use to assess vulnerability for overharvesting of widely distributed species. Examples of such species are Khaya senegalensis and Securidaca longipedunculata, two trees that were reported by market vendors as becoming increasingly scarce in the wild. Field surveys should start in predicted suitable habitats closest to urban areas and main roads, as commercial extraction likely occurs at the shortest cost distance to the markets. Our study provides an example of applying SDMs to conservation assessments aiming to safeguard provisioning ecosystems.

    Keywords: Commercial extraction, Ecosystem services, Herbal medicine, Non-timber forest products, Trade


  • Aguirre-Gutiérrez, J., Serna-Chavez, H., Villalobos-Arambula, A., de la Rosa, J., Raes, N., 2014.

    Similar but not equivalent: ecological niche comparison across closely-related Mexican white pines

    Diversity and Distributions Forthcoming.

    Aim: In the face of global environmental change, identifying the factors that shape the ecological niches of species and understanding the mechanisms behind them can help to draft effective conservation plans. The differences in the ecological factors that shape species distributions may then help to highlight differences between closely related taxa. We investigate the applicability of ecological niche modelling and the comparison of species distributions in ecological niche space to detect areas with priority for biodiversity conservation and to analyse differences in the ecological niche spaces used by closely related taxa. Location: United States of America, Mexico and Central America. Methods: We apply ordination and ecological niche modelling techniques to assess the main environmental drivers of the distribution of Mexican white pines (Pinus: Pinaceae). Furthermore, we assess the similarities and differences of the ecological niches occupied by closely related taxa. We analyse whether Mexican white pines occupy similar or equivalent ecological niches. Results: All the studied taxa presented different responses to the environmental factors, resulting in a unique combination of niche conditions. Our stacked habitat suitability maps highlighted regions in southern Mexico and northern Central America as highly suitable for most species and thus with high conservation value. By quantitatively assessing the niche overlap, similarity and equivalency of Mexican white pines, our results prove that the distribution of one species cannot be implied by the distribution of another, even if these taxa are considered closely related. Main conclusions: The fact that each Mexican white pine is constrained by a unique set of environmental conditions, and thus, their non-equivalence of ecological niches has direct implications for conservation as this highlights the inadequacy of one-fits all type of conservation measure.

    Keywords: conifers, conservation, forest, niche comparison, pinus, species distribution


  • Cadima, X., van Zonneveld, M., Scheldeman, X., Castañeda, N., Patiño, F., Beltran, M., Van Damme, P., 2014.

    Endemic wild potato (Solanum spp.) biodiversity status in Bolivia: Reasons for conservation concerns

    Journal for Nature Conservation 22(2) 113-131.

    Crop wild relatives possess important traits, therefore ex situ and in situ conservation efforts are essential to maintain sufficient options for crop improvement. Bolivia is a centre of wild relative diversity for several crops, among them potato, which is an important staple worldwide and the principal food crop in this country. Despite their relevance for plant breeding, limited knowledge exists about their in situ conservation status. We used Geographic Information Systems (GIS) and distribution modelling with the software Maxent to better understand geographic patterns of endemic wild potato diversity in Bolivia. In combination with threat layers, we assessed the conservation status of all endemic species, 21 in total. We prioritised areas for in situ conservation by using complementary reserve selection and excluded 25% of the most-threatened collection sites because costs to implement conservation measures at those locations may be too high compared to other areas. Some 70% (15 of 21 species) has a preliminary vulnerable status or worse according to IUCN red list distribution criteria. Our results show that four of these species would require special conservation attention because they were only observed in <15 locations and are highly threatened by human accessibility, fires and livestock pressure. Although highest species richness occurs in south-central Bolivia, in the departments Santa Cruz and Chuquisaca, the first priority area for in situ conservation according to our reserve selection exercise is central Bolivia, Cochabamba; this area is less threatened than the potato wild relatives’ hotspot in south-central Bolivia. Only seven of the 21 species were observed in protected areas. To improve coverage of potato wild relatives’ distribution by protected areas, we recommend starting inventories in parks and reserves with high modelled diversity. Finally, to improve ex situ conservation, we targeted areas for germplasm collection of species with <5 accessions conserved in genebanks.

    Keywords: Crop wild relatives, Ex situ conservation, IUCN red listing, In situ conservation, Potato breeding material, Reserve selection, Species distribution modelling, Threat assessment


  • Cornwell, W., Westoby, M., Falster, D., FitzJohn, R., O'Meara, B., Pennell, M., McGlinn, D., Eastman, J., Moles, A., Reich, P., Tank, D., Wright, I., Aarssen, L., Beaulieu, J., Kooyman, R., Leishman, M., Miller, E., Niinemets, Ü., Oleksyn, J., Ordonez, A., Royer, D., Smith, S., Stevens, P., Warman, L., Wilf, P., Zanne, A., 2014.

    Functional distinctiveness of major plant lineages

    Journal of Ecology 102(2) 345-356.

    1. Plant traits vary widely across species and underpin differences in ecological strategy. Despite centuries of interest, the contributions of different evolutionary lineages to modern-day functional diversity remain poorly quantified. 2. Expanding databases of plant traits plus rapidly improving phylogenies enable for the first time a data-driven global picture of plant functional diversity across the major clades of higher plants. We mapped five key traits relevant to metabolism, resource competition and reproductive strategy onto a phylogeny across 48324 vascular plant species world-wide, along with climate and biogeographic data. Using a novel metric, we test whether major plant lineages are functionally distinctive. We then highlight the trait–lineage combinations that are most functionally distinctive within the present-day spread of ecological strategies. 3. For some trait–clade combinations, knowing the clade of a species conveys little information to neo- and palaeo-ecologists. In other trait–clade combinations, the clade identity can be highly revealing; especially informative clade–trait combinations include Proteaceae, which is highly distinctive, representing the global slow extreme of the leaf economic spectrum. Magnoliidae and Rosidae contribute large leaf sizes and seed masses and have distinctively warm, wet climatic distributions. 4. Synthesis. This analysis provides a shortlist of the most distinctive trait–lineage combinations along with their geographic and climatic context: a global view of extant functional diversity across the tips of the vascular plant phylogeny.

    Keywords: Kolmogorov–Smirnov Importance index, determinants of plant community diversity and stru, functional traits, geographic and climatic distributions, leaf nitrogen, leaf size, maximum adult height, phylogenetic tree, seed mass, specific leaf area


  • Creemers, R., Denoël, M., Campos, J., Vences, M., Crochet, P., Gonçalves, J., de Pous, P., Kuzmin, S., Speybroeck, J., Toxopeus, B., Corti, C., Vieites, D., Ficetola, G., Bonardi, A., Crnobrnja Isailović, J., Rodríguez, A., Lymberakis, P., Sindaco, R., Sillero, N., 2014.

    Updated distribution and biogeography of amphibians and reptiles of Europe

    Amphibia-Reptilia 35(1) 1-31.

    A precise knowledge of the spatial distribution of taxa is essential for decision-making processes in land management and biodiversity conservation, both for present and under future global change scenarios. This is a key base for several scientific disciplines (e.g. macro-ecology, biogeography, evolutionary biology, spatial planning, or environmental impact assessment) that rely on species distribution maps. An atlas summarizing the distribution of European amphibians and reptiles with 50 × 50 km resolution maps based on ca. 85 000 grid records was published by the Societas Europaea Herpetologica (SEH) in 1997. Since then, more detailed species distribution maps covering large parts of Europe became available, while taxonomic progress has led to a plethora of taxonomic changes including new species descriptions. To account for these progresses, we compiled information from different data sources: published in books and websites, ongoing national atlases, personal data kindly provided to the SEH, the 1997 European Atlas, and the Global Biodiversity Information Facility (GBIF). Databases were homogenised, deleting all information except species names and coordinates, projected to the same coordinate system (WGS84) and transformed into a 50 × 50 km grid. The newly compiled database comprises more than 384 000 grid and locality records distributed across 40 countries. We calculated species richness maps as well as maps of Corrected Weighted Endemism and defined species distribution types (i.e. groups of species with similar distribution patterns) by hierarchical cluster analysis using Jaccard’s index as association measure. Our analysis serves as a preliminary step towards an interactive, dynamic and online distributed database system (NA2RE system) of the current spatial distribution of European amphibians and reptiles. The NA2RE system will serve as well to monitor potential temporal changes in their distributions. Grid maps of all species are made available along with this paper as a tool for decision-making and conservation-related studies and actions. We also identify taxonomic and geographic gaps of knowledge that need to be filled, and we highlight the need to add temporal and altitudinal data for all records, to allow tracking potential species distribution changes as well as detailed modelling of the impacts of land use and climate change on European amphibians and reptiles.

    Keywords: European herpetofauna, IUCN red list, biogeography, conservation, distribution atlas, distribution types, endemism, species richness
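
    The two grid-based measures named in the abstract above can be illustrated with a minimal sketch. It assumes the commonly used definitions (Jaccard's index as shared cells over all occupied cells; Corrected Weighted Endemism as the sum of inverse range sizes in a cell divided by that cell's species richness). The function names, species labels, and cell identifiers are hypothetical, and this is not the authors' code.

```python
from collections import defaultdict


def jaccard_index(cells_a, cells_b):
    """Jaccard similarity between two sets of occupied grid cells."""
    a, b = set(cells_a), set(cells_b)
    union = a | b
    if not union:
        return 0.0  # two empty ranges: treat as no similarity
    return len(a & b) / len(union)


def corrected_weighted_endemism(occurrences):
    """Return {cell: CWE} from a {species: set of grid cells} mapping.

    Weighted endemism of a cell is the sum of 1 / range size over the
    species present; dividing by species richness gives the corrected value.
    """
    weighted = defaultdict(float)
    richness = defaultdict(int)
    for cells in occurrences.values():
        cells = set(cells)
        for cell in cells:
            weighted[cell] += 1.0 / len(cells)
            richness[cell] += 1
    return {cell: weighted[cell] / richness[cell] for cell in weighted}


# Hypothetical toy data: three species on a five-cell grid.
occurrences = {
    "species_a": {"c1", "c2"},
    "species_b": {"c2", "c3", "c4", "c5"},
    "species_c": {"c1"},
}

similarity_ab = jaccard_index(occurrences["species_a"], occurrences["species_b"])
cwe = corrected_weighted_endemism(occurrences)
```

    A matrix of pairwise Jaccard values between species could then serve as the association measure for a hierarchical clustering step.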


  • FitzJohn, R., Pennell, M., Zanne, A., Stevens, P., Tank, D., Cornwell, W., 2014.

    How much of the world is woody?

    Journal of Ecology 102(5) 1266-1272.

    1. The question posed by the title of this paper is a basic one, and it is surprising that the answer is not known. Recently assembled trait datasets provide an opportunity to address this, but scaling these datasets to the global scale is challenging because of sampling bias. Although we currently know the growth form of tens of thousands of species, these data are not a random sample of global diversity; some clades are exhaustively characterised, while others we know little-to-nothing about. 2. Starting with a database of woodiness for 39,313 species of vascular plants (12% of taxonomically resolved species, 59% of which were woody), we estimated the status of the remaining taxonomically resolved species by randomisation. To compare the results of our method to conventional wisdom, we informally surveyed a broad community of biologists. No consensus answer to the question existed, with estimates ranging from 1% to 90% (mean: 31.7%). 3. After accounting for sampling bias, we estimated the proportion of woodiness among the world's vascular plants to be between 45% and 48%. This was much lower than a simple mean of our dataset and much higher than the conventional wisdom. 4. Synthesis: Alongside an understanding of global taxonomic diversity (i.e., number of species globally), building a functional understanding of global diversity is an important emerging research direction. This approach represents a novel way to account for sampling bias in functional trait datasets and to answer basic questions about functional diversity at a global scale.

    Keywords: Databases, Determinants of plant community diversity and str, Functional diversity, Herbaceousness, Macroecology, Sampling bias, Woodiness
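
    The idea of correcting a biased sample by randomisation, as described in the abstract above, can be sketched as follows. This is a simplified illustration and not the authors' exact procedure: it assumes each taxonomic group has at least one species of known status, and it imputes the unknown species in a group by random draws at that group's observed woody fraction. All group sizes and names are hypothetical toy values.

```python
import random


def naive_woody_fraction(groups):
    """Woody fraction among species of known status, ignoring sampling bias."""
    woody = sum(n_woody for n_woody, _, _ in groups)
    known = sum(n_woody + n_herb for n_woody, n_herb, _ in groups)
    return woody / known


def randomised_woody_fraction(groups, n_draws=200, seed=0):
    """Mean global woody fraction after imputing unknown species per group.

    groups: list of (n_woody_known, n_herbaceous_known, n_total_species).
    Every group must have at least one species of known status.
    """
    rng = random.Random(seed)
    total_species = sum(n_total for _, _, n_total in groups)
    estimates = []
    for _ in range(n_draws):
        woody = 0
        for n_woody, n_herb, n_total in groups:
            known = n_woody + n_herb
            p_woody = n_woody / known
            unknown = n_total - known
            # Draw a status for each unsampled species in this group.
            imputed = sum(1 for _ in range(unknown) if rng.random() < p_woody)
            woody += n_woody + imputed
        estimates.append(woody / total_species)
    return sum(estimates) / len(estimates)


# Hypothetical toy data: a small, fully sampled, mostly woody group and a
# large, poorly sampled, mostly herbaceous group.
groups = [
    (90, 10, 100),
    (2, 8, 900),
]

naive = naive_woody_fraction(groups)
corrected = randomised_woody_fraction(groups)
```

    In this toy case the naive fraction is dominated by the well-sampled woody group, while the randomised estimate weights each group by its full species count and comes out much lower.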


  • Flantua, S., Hooghiemstra, H., Boxel, J., Cabrera, M., González, Z., González-Arango, C., 2014.

    Connectivity dynamics since the Last Glacial Maximum in the northern Andes; a pollen-driven framework to assess potential migration

    98 - 123.

    We provide an innovative pollen-driven connectivity framework of the dynamic altitudinal distribution of North Andean biomes since the Last Glacial Maximum (LGM). Altitudinally changing biome distributions reconstructed from a pollen record from Lake La Cocha (2780 m) are assessed in terms of their changing surface and connectivity within the study area. The upper forest line (UFL) ecotone lodged during much of the time around 2000 m (LGM), 2400 m (ca. 14–8 ka), 2800 m (ca. 8–3 ka), and 3550–3600 m (modern time). This resulted in a four-fold increase of the area covered by mountain forest (Andean and sub-Andean), a decrease of 96% of páramo, and a disappearance of permanent snow. Upslope migration of the UFL of 20 vertical m yr⁻¹ and more, as inferred from the pollen record, was spatially assessed: reduced surface area, dispersal limitation, reduced connectivity, and extirpation of the subpáramo biome during a few centuries is shown. The study area includes abundant higher mid-range altitudes (2600–3400 m), with a steep reduction of available surface area and increased dispersal distance in the high and low altitudes. In this range, each 100-m altitudinal rise of the UFL results in 20%–60% reduction of the surface area available for páramo and connectivity. The critical elevations where large biome surfaces start to disconnect depend on the elevation of lowest thresholds in the landscape and the elevation of summits. The 2500–3600 m elevation range is most dynamic in terms of geography and ecological species sorting; the 1000–1500 m interval is relatively stable and is permanently covered by Andean forest, making this interval less sensitive for monitoring climate change. When forests migrate to higher elevations, distribution nuclei of species are compressed, resulting temporarily in a higher species diversity. The species dissimilarity coefficient reflects rate of (ecological) change more adequately than the rate of palynological turnover, because the latter is much influenced by the lengths of the time steps between the pollen samples. Spatial analysis of site-specific dynamics provides exciting new insights into past vegetation dynamics, with potential for better understanding species-area distributions, distribution patterns of biodiversity, and conservation of mountain ecosystems.

    Keywords: Andean biome dynamics, GIS, Last Glacial Maximum, elevational distribution patterns, landscape connectivity, palynological turnover rate, upper forest line


  • Mathew, C., Güntsch, A., Obst, M., Vicario, S., Haines, R., Williams, A., de Jong, Y., Goble, C., 2014.

    A semi-automated workflow for biodiversity data retrieval, cleaning, and quality control

    Biodiversity Data Journal 2(2) e4221.

    The compilation and cleaning of data needed for analyses and prediction of species distributions is a time consuming process requiring a solid understanding of data formats and service APIs provided by biodiversity informatics infrastructures. We designed and implemented a Taverna-based Data Refinement Workflow which integrates taxonomic data retrieval, data cleaning, and data selection into a consistent, standards-based, and effective system hiding the complexity of underlying service infrastructures. The workflow can be freely used both locally and through a web-portal which does not require additional software installations by users.

    Keywords: biodiversity informatics, data cleaning, e-Science, service oriented architecture, web services, workflows


  • Samy, A., van de Sande, W., Fahal, A., Peterson, A., 2014.

    Mapping the potential risk of mycetoma infection in Sudan and South Sudan using ecological niche modeling

    PLoS Neglected Tropical Diseases 8(10) e3250.

    In 2013, the World Health Organization (WHO) recognized mycetoma as one of the neglected tropical conditions due to the efforts of the mycetoma consortium. This same consortium formulated knowledge gaps that require further research. One of these gaps was that very few data are available on the epidemiology and transmission cycle of the causative agents. Previous work suggested a soil-borne or Acacia thorn-prick-mediated origin of mycetoma infections, but no studies have investigated effects of soil type and Acacia geographic distribution on mycetoma case distributions. Here, we map risk of mycetoma infection across Sudan and South Sudan using ecological niche modeling (ENM). For this study, records of mycetoma cases were obtained from the scientific literature and GIDEON; Acacia records were obtained from the Global Biodiversity Information Facility. We developed ENMs based on digital GIS data layers summarizing soil characteristics, land-surface temperature, and greenness indices to provide a rich picture of environmental variation across Sudan and South Sudan. ENMs were calibrated in known endemic districts and transferred countrywide; model results suggested that risk is greatest in an east-west belt across central Sudan. Visualizing ENMs in environmental dimensions, mycetoma occurs under diverse environmental conditions. We compared niches of mycetoma and Acacia trees, and could not reject the null hypothesis of niche similarity. This study revealed contributions of different environmental factors to mycetoma infection risk, identified suitable environments and regions for transmission, signaled a potential mycetoma-Acacia association, and provided steps towards a robust risk map for the disease.
