Webinar | Data papers: Bringing more data to light

30 March 2023
14:00 - 15:00 CEST


Data papers serve as "a mechanism to incentivize data publishing in biodiversity science" and a means of supporting researchers who curate and manage data with a recognizable form of scholarly publishing. Unlike conventional research articles that report hypotheses and conclusions, a data paper has the primary purpose of describing a dataset, outlining its significance and the circumstances of its collection, and increasing the visibility and discoverability of the data, its authors and its originating organization.

During this 60-minute webinar, Dmitry Schigel, scientific officer at the GBIF Secretariat, and partners from two open-data publishers, Pensoft Publishers and GigaScience Press, will discuss the current open calls for submissions of data papers. The calls use GBIF to target the mobilization of data on soil biodiversity and on vectors of human disease.

Over the past decade, GBIF has worked with journal publishers to support and promote the preparation of data papers, leading to nearly 400 data papers that describe FAIR and open datasets freely accessible through its global infrastructure.


Time | Topic | Speaker
14:00–14:05 | Welcome | Daniel Noesgaard, GBIF Secretariat
14:05–14:15 | GBIF-mediated data and data papers | Dmitry Schigel, GBIF Secretariat
14:15–14:30 | Origins of data papers: Biodiversity Data Journal and call on soil biodiversity | Lyubomir Penev, Pensoft Publishers
14:30–14:45 | Data papers about vectors of diseases: GigaByte and call on vectors of human diseases | Scott Edmunds, GigaScience Press
14:45–15:00 | Q&A |


Questions and answers

Question Answer
Are there palaeo data on GBIF? Yes! Try using the Basis of record filter for Occurrences.
I've never submitted data to GBIF. Is there a template I'd have to follow to make the data deposit easier? Yes! Please start with our quick guide, where you can find the templates. Basically, publishing data to GBIF boils down to converting your data export to an international data standard, and making the online location of the standardized table and its metadata "known" to GBIF (a range of approaches is available for this step).
If some of my specimens, let's say 5% of my dataset, were only morphotyped to genus level, is that acceptable too? To what level of detail do specimens have to be determined? Yes, we welcome identification to as low a taxonomic rank as possible, but genus-level identification works. Any rank works, in fact, but there should be a Latin name or a global OTU ID to make records publishable in GBIF.
Do you have some way to publish small R code alongside very raw data? GBIF supports open data and open source / open code science. Maybe placing the code in the body text of the data paper is not the best idea, and you would rather use CRAN or GitHub, and link to that repo from your data paper.
Are there any metrics on subsequent usage of the data sets post-publication? Yes! You can follow DOI-based citations of use for each dataset. Look for the grey Citations box in the top right corner of each dataset page. See, for example, the data citations a dataset has accumulated since 2021 versus the bibliographic citations known for its data paper. This is a strongly contrasting example; patterns of attention vary.
Are there processes or tools to make this [data publishing we guess] easier? There is a range of tools and instruments that make data cleaning (e.g. OpenRefine) and data standardization and mapping (e.g. IPT) easier. Please browse the full collection at GBIF.org -> Tools
Clarification on the '5000 presence records' - what does this mean? What is meant by 5,000 new records to GBIF? For GBIF, sponsoring calls for data papers is an instrument to incentivise data mobilization and introduce new communities to data publishing. For these reasons, we sponsor papers describing data that are new to GBIF. 5,000 records is a guiding minimum number of records per paper. It is not strict, but it exists to protect the call against salami publishing, including the slicing of datasets. Most datasets are defined by their origin, including methods, time, space and the researchers involved, and should be exposed through data papers in their entirety.
Data analysis? Not required in data papers
Data to collect in soil study? Any biodiversity data can be published through GBIF; at the moment, sponsored calls focus on a few priority themes. As GBIF is a general discovery platform, only a limited number of data fields are indexed. In the case of soil biodiversity data, for example, records will be discoverable through species searches, geography and so on, but many soil-specific parameters are not standardized in the Darwin Core standard. This, however, should not stop anyone from publishing realm-specific data through GBIF. On the contrary, GBIF and data papers mainly help data visibility beyond the original field.
Funding and training course on SCHISTOSOMIASIS and DENGUE research project for OMAN? Please follow GBIF news for our upcoming activities with the Gulf countries. For the specific request, the nearest training will take place in Thailand in connection with the AMV conference (registration). Note that applications for the training will open separately from the conference registration.
Generating Abstracts from data? To our knowledge, abstracts are not generated by the partner journals from the dataset metadata (but manuscripts are!). If you are not in the habit of writing your own abstracts, why not try some of the AI tools? Just kidding: please write the abstract last, after the rest of the manuscript is ready.
How do we convert data to meaningful information? This is a deep philosophical question! In science, analysis and synthesis have been around for a while, modeling and predictions are trending in data-intensive research, and at the science-policy interface, indicators and dashboards of all sorts seem to be popular at the moment of writing.
How can I use data for Research results? In data papers, the Results section may not be essential - after materials and methods, the data resource section, including link(s) to datasets, is the centerpiece of the data paper story. In Discussion, as necessary, you are welcome to discuss the shortcomings and opportunities that may help the data reuse.
How much continuous data is required for vector density variation analysis per location? No restrictions, let your study design guide you
How much does it cost? How to select the best journal to publish? See GBIF page on data papers for the range of prices and other parameters. The soil and the vectors calls are sponsored, so effectively free for authors as long as the budget lasts (calls therefore can close before the announced deadlines).
How to do it effectively? How to make a data paper? How to publish a data paper? Select a dataset you want to publish -> data cleaning -> data standardization -> publish to GBIF -> generate a ms draft using journal’s tools -> finish the ms -> check the call details and journal guidelines carefully again -> submit -> enjoy
How to evaluate the impact of development, for instance the Maya train, on indigenous population and their environment? This is a very interesting topic. Perhaps a data paper on the affected vs. non-affected lands can enable analysis of the impact of development.
How to implement in a resource limited setting? Enjoy the sponsored calls while they last.
How to manage data? Don't use spreadsheet processors; build a database or use a CMS.
What about submissions of data papers in open access journals? See GBIF page on data papers. To our knowledge, all data paper journals are open access (OA).
What are the components of good-quality data? Data quality is a relative concept: quality (or the lack of it) is specific to each data application scenario. However, mistakes, omissions etc. can and should be corrected. One possible start is the GBIF data quality requirements page.
Relevance of [biodiversity?] data in policy in health? This is a very important question. Some answers can be found in the recent paper "Biodiversity data supports research on human infectious diseases: global trends, challenges, and opportunities", published in One Health in 2023. The presence and importance of biodiversity data in the WHO recommendations and guidelines have not been summarized, but there is interest in doing so. You might like to follow news and announcements at the TDR and GBIF websites for updates on this front.
[What] support and promotion [is available for] the preparation of data papers? Support is provided by the journals and by the sponsors, including TDR and GBIF.
What is the most appropriate presentation of data? This depends on the kind of data at hand; please start by exploring Dataset classes.
How to describe the dataset of vectors of human disease through its global infrastructure? See Vectors call and links therein.
Updates of the AWT? Please explore the Arpha Writing Tool website
What data is considered to be in darkness? Digitally invisible or non-standardized data, not FAIR data. There are multiple taxonomic, temporal, spatial, and realm gaps in the current digitally available and FAIR data. Sponsored calls for data papers help address thematic gaps in particular fields.
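
Several answers above describe publishing as converting a data export to an international data standard (Darwin Core) and note that each record needs a Latin name or a global OTU ID, at any taxonomic rank. The following is a minimal Python sketch of that conversion step, not GBIF's own tooling. The export column names, the sample rows and the helper functions are all hypothetical and made up for illustration; only the Darwin Core term names (occurrenceID, scientificName, taxonRank, eventDate, decimalLatitude, decimalLongitude, basisOfRecord) come from the standard.

```python
import csv
import io

# Hypothetical column names from a local export, mapped to Darwin Core terms.
COLUMN_TO_DWC = {
    "record_id": "occurrenceID",
    "taxon": "scientificName",
    "rank": "taxonRank",
    "date": "eventDate",
    "lat": "decimalLatitude",
    "lon": "decimalLongitude",
}


def to_darwin_core(rows, basis_of_record="PreservedSpecimen"):
    """Rename export columns to Darwin Core terms and add basisOfRecord."""
    records = []
    for row in rows:
        record = {dwc: row[src].strip() for src, dwc in COLUMN_TO_DWC.items()}
        record["basisOfRecord"] = basis_of_record
        records.append(record)
    return records


def missing_names(records):
    """Return the occurrenceIDs of records that lack a scientific name."""
    return [r["occurrenceID"] for r in records if not r["scientificName"]]


# Made-up sample export: one species-level, one genus-level, one unnamed record.
EXPORT = """record_id,taxon,rank,date,lat,lon
S-001,Lumbricus terrestris,species,2021-05-14,55.70,12.56
S-002,Aporrectodea,genus,2021-05-14,55.70,12.56
S-003,,,2021-05-15,55.71,12.57
"""

rows = list(csv.DictReader(io.StringIO(EXPORT)))
records = to_darwin_core(rows)
print(missing_names(records))  # prints ['S-003']
```

In this sketch the genus-level record passes the check, in line with the answer on identification level, while the record without any name is flagged for follow-up before publishing. Tools mentioned above, such as OpenRefine and the IPT, cover the same cleaning and mapping steps more thoroughly.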

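The answer on palaeo data points to the Basis of record filter for occurrences. As a hedged sketch, the same filter can be expressed as a query against the public GBIF occurrence search API. The helper function below is hypothetical; the endpoint and the `basisOfRecord` and `limit` parameters are assumed to behave as in the GBIF API documentation, and the code only builds the URL without sending a request.

```python
from urllib.parse import urlencode

# Public GBIF occurrence search endpoint (assumed from the GBIF API documentation).
GBIF_OCCURRENCE_SEARCH = "https://api.gbif.org/v1/occurrence/search"


def occurrence_search_url(basis_of_record, limit=20, **filters):
    """Build an occurrence search URL filtered by basis of record."""
    params = {"basisOfRecord": basis_of_record, "limit": limit, **filters}
    return f"{GBIF_OCCURRENCE_SEARCH}?{urlencode(params)}"


# FOSSIL_SPECIMEN is the basis-of-record value used here for palaeo data.
url = occurrence_search_url("FOSSIL_SPECIMEN", limit=5)
print(url)
# prints https://api.gbif.org/v1/occurrence/search?basisOfRecord=FOSSIL_SPECIMEN&limit=5
```

Opening the printed URL in a browser, or fetching it with any HTTP client, should return a JSON page of matching occurrence records.
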