Webinar: Data papers describing datasets on vectors of human diseases

25 January 2022
12:00 - 13:00 CET
Asian tiger mosquito (Aedes albopictus), observed in the Republic of Korea. Photo 2021 Jake David MacLennan via iNaturalist Research-grade Observations, licensed under CC BY-NC 4.0.

This webinar explains the details of the sponsored call for data papers describing datasets on vectors of human diseases, launched in November 2021. It is aimed at authors interested in submitting a manuscript.

Presenters include Dmitry Schigel (Scientific Officer, GBIF Secretariat) and Scott Edmunds (Editor-in-Chief, GigaScience).

Questions and answers from the webinar

I have never published data through GBIF before. Can I just make my data file available online elsewhere and submit a data paper?

This is indeed possible under the GigaByte guidelines to authors, but to qualify for sponsorship under this particular call, the data paper has to be based on a new dataset or new records added to GBIF.

I have gone through GBIF’s data publishing guidelines, but I still have questions about the Darwin Core standard, IPT, and data hosting. Whom can I ask?

GBIF has set up a thematic helpdesk for any questions related to publishing data on human diseases and health. Please contact health@gbif.org.

I have a good data story in my head, and I have a data table in need of edits and standardization. Where should I start?

We recommend a dataset-first approach: clean your data, prepare your metadata, then publish the dataset through GBIF. Once this is done, turn to the manuscript narrative and present it following the simple structure of GigaByte Data Release articles.

The call says the number of sponsored slots is limited. Am I too late?

No, you are not too late. Sponsorship is indeed limited to the first 15 accepted papers, but the submission deadline is the end of February and about a dozen slots are left.

Could publishing a dataset that includes unpublished data in the journal interfere with future publications?

In most cases this is not a problem. The purpose of a data paper is to stimulate downstream use of the data to answer scientific questions. Many publishing houses, including the largest ones (Elsevier, Springer Nature, Wiley, etc.), publish data papers and data journals too. See also the 2013 F1000 survey of publishers, which showed that journals do not regard this as "prior publication".

I would like to submit a manuscript based on an updated version of our existing dataset. Will it still be eligible if I use our own IPT and simply update the dataset with more than 5,000 additional records? Can the dataset be made available now?

Yes, data papers based on an existing dataset are welcome if the update adds 5,000 or more new records. And yes, the dataset can be made available now; in fact, datasets need to be published before the manuscript is submitted.

Can the same author submit two different manuscripts?

We did not set any limits on how many data papers a single author or a group of authors can submit.

We have three different sampling-event datasets, but together they contain about 3,000 records or fewer, and I am aware that the editor will assess their suitability.

If they are stand-alone datasets, it would make sense to submit them as three separate data papers. The series aims for submissions with more than 5,000 records, but there is flexibility here, especially for particularly interesting and hard-to-gather datasets, which these sound like.

Can meta-analyses be considered data papers?

Unfortunately, no. You are expected to write a data paper about new datasets that you author, curate or manage yourself, e.g. in your group, unit or organization. Data descriptors or data papers about somebody else's data, created without authorization, are not eligible for publication in GigaByte.

Can anyone use the data described by a data paper and publish articles based on it?

Yes, the idea of a data paper is to promote discoverability and reuse. In fact, the more people find the data paper and its data, the more successful the paper is. Research has shown that data sharing leads to more bibliographic and data citations.

What will be the formal process to use the data of others? (Does one need to get permission from the original source person/entity?)

By publishing data with citation details, a clear license and a statement on reuse, you are making it clear that the data can be used with attribution of the source but without any other restrictions. Good academic practice, including correct citation, is the right way to deal with this. As anywhere in science, crediting each other's efforts is expected, and this applies to data and data papers, too. Failing to attribute the source is regarded by your peers, journals, research institutions and funders as research misconduct.

Will the current TDR call on disease vectors be repeated in the future?

There is no information at the moment on possible continuations, and we encourage all holders of suitable datasets to make the most of the current opportunity.

Is there a specific formal procedure for getting permission from the original funding agency? Will it be different for a government-supported program?

Many funders expect or require data availability, so publishing datasets and data papers is a good, positive response to these expectations. In GBIF, datasets are published from organizational accounts, and registering an organization in GBIF requires local administrative approval. Local processes for data access and data publication are expected to be respected. Please provide grant and funding details so we can include them in the article and dataset metadata and the funders can be credited for these outputs.

Is there any limit on the number of participants or authors? We are preparing a database in which 47 countries are participating. Are there any issues with data repositories in so many different countries?

GigaByte does not limit the number of authors, so this should not be a problem. Please take into account that hyperauthorship can be seen either as an instrument of inclusivity or as a dilution of credit. A useful mental exercise is to ask, "Could this paper have happened without person NN?" If the answer is "it could not", add NN as a co-author. Also note that co-authorship can affect future roles, such as committee membership or acting as an opponent.

As mentioned in the presentation, it is usually not an issue to publish a data paper based on previously published datasets. Does this also work the other way around? For example, if we published some records in other papers, can we include them (with references) in the larger dataset of the data paper?

The only novelty that matters for this call for data papers is the new availability of the data through GBIF; in a way, liberating the data from non-digital formats or unindexed supplementary materials. If new data become available through GBIF and you can base a data paper on them, this is welcome. As you say, a reference to the preceding research publications is expected.

Should we contact health@gbif.org before preparing the dataset?

Only if you have questions. Contact the helpdesk if you have questions or get stuck after trying to publish the data yourself.

Is there an age limit for data (i.e., will old data be accepted)?

There is no limit: the data can be of any age, but they should be newly available through GBIF (e.g., not published to GBIF ten years ago).

Just to be sure: must the data we submit be new, unpublished data rather than data extracted from the literature?

Data extracted from the literature are eligible, but please consider including the authors of these papers in this work, if at all possible.

How should a list of species (data) be presented?

Please refer to the Checklists section of the Quick guide to publishing data through GBIF.org.
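As a rough illustration only (the Quick guide remains the authoritative reference), a species checklist is typically shared as a table with one row per taxon and Darwin Core term names as column headers. The Python sketch below writes such a table as tab-separated text. The selection of columns, the local taxonID values and the example rows are assumptions chosen for illustration, not GBIF requirements.

```python
import csv
import io

# Darwin Core term names used as column headers. These are real Darwin Core
# terms, but which terms a given checklist needs is an assumption here.
COLUMNS = ["taxonID", "scientificName", "taxonRank", "kingdom", "family", "genus"]

# Hypothetical example rows: a few mosquito vectors with made-up local IDs.
ROWS = [
    {"taxonID": "local-1", "scientificName": "Aedes albopictus", "taxonRank": "species",
     "kingdom": "Animalia", "family": "Culicidae", "genus": "Aedes"},
    {"taxonID": "local-2", "scientificName": "Aedes aegypti", "taxonRank": "species",
     "kingdom": "Animalia", "family": "Culicidae", "genus": "Aedes"},
    {"taxonID": "local-3", "scientificName": "Anopheles gambiae", "taxonRank": "species",
     "kingdom": "Animalia", "family": "Culicidae", "genus": "Anopheles"},
]


def checklist_to_tsv(rows, columns=COLUMNS):
    """Return the checklist as tab-separated text: a header row, then one row per taxon."""
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=columns, delimiter="\t", lineterminator="\n")
    writer.writeheader()
    writer.writerows(rows)
    return buffer.getvalue()


if __name__ == "__main__":
    # Prints to the console; a real upload would write this text to a file instead.
    print(checklist_to_tsv(ROWS), end="")
```

Further Darwin Core terms, such as scientificNameAuthorship, could be added as extra columns if the guide calls for them.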

How can I validate data for a list of species?

Please see the GBIF Data Validator.

