gbif.org

Publications: archive file

Archive

The archive containing all publications should be a list of simple Dublin Core records. There are two ways of encoding such an archive: a simple CSV text file, or XML.

CSV archive

In a CSV archive, each row of the text file represents a single publication. This format is very simple to produce and is compatible with the Darwin Core text guidelines, in particular the ECAT references extension.

The CSV format does not allow line breaks within a metadata value, although they are common in abstracts. If you do not have abstracts, or can replace the line breaks, please consider this format. A simple example file with only one record looks like this:

<pre>
dc:identifier    link    dc:bibliographicCitation    dc:title    dc:creator    dc:date    dc:source    dc:subject    dc:description
doi:10.1038/ng0609-637    http://www.nature.com/ng/journal/v41/n6/pdf/ng0609-637.pdf    Hartge, P., Genetics of reproductive lifespan. Nature Genetics 41, 637 - 638 (2009)    Genetics of reproductive lifespan    Patricia Hartge    2009-06-01    Nature Genetics 41, 635 (2009)    genomics, epidemiology    Five genome-wide association studies of the timing of menarche and menopause have now taken us beyond the range of candidate gene and linkage studies. The list of new genetic associations identified for these two traits should shed light on the mechanisms of ovarian aging, as well as breast cancer and other diseases associated with reproductive lifespan.
 ...
</pre>
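
Replacing the line breaks before export can be automated. The following Python sketch is one possible way to do it, not an official GBIF tool. It assumes the columns are separated by tabs, as the layout of the example suggests, and the names `COLUMNS`, `flatten` and `write_archive` are hypothetical helpers introduced only for illustration.

```python
import io
import re

# Column order copied from the header row of the example above.
COLUMNS = [
    "dc:identifier", "link", "dc:bibliographicCitation", "dc:title",
    "dc:creator", "dc:date", "dc:source", "dc:subject", "dc:description",
]


def flatten(value):
    """Collapse line breaks, tabs and repeated spaces into single spaces."""
    return re.sub(r"\s+", " ", value).strip()


def write_archive(records, out):
    """Write a header row, then one tab-delimited row per record.

    Each record is a dict keyed by the names in COLUMNS; missing keys
    become empty fields. The tab delimiter is an assumption.
    """
    out.write("\t".join(COLUMNS) + "\n")
    for record in records:
        fields = [flatten(record.get(column, "")) for column in COLUMNS]
        out.write("\t".join(fields) + "\n")


# Usage: an abstract containing a line break is written on a single row.
buffer = io.StringIO()
write_archive(
    [{
        "dc:identifier": "doi:10.1038/ng0609-637",
        "dc:title": "Genetics of reproductive lifespan",
        "dc:description": "First sentence of the abstract.\nSecond sentence.",
    }],
    buffer,
)
print(buffer.getvalue(), end="")
```

Because `flatten` also removes tabs inside a value, no field can be mistaken for a column boundary or a row boundary.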

XML archive

The same information can also be encoded as XML, which allows for line breaks and markup within the abstracts. A simple XML schema is provided to validate resources encoded in Dublin Core alone. The above example would look like this:

<?xml version="1.0" encoding="UTF-8"?>
<resources xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
    xmlns:dc="http://purl.org/dc/elements/1.1/"
    xsi:noNamespaceSchemaLocation="http://gbif-ecat.googlecode.com/files/publication_archive.xsd">
    <resource>
        <dc:identifier>doi:10.1038/ng0609-637</dc:identifier>
        <dc:identifier>http://www.nature.com/ng/journal/v41/n6/pdf/ng0609-637.pdf</dc:identifier>
        <dc:title>Genetics of reproductive lifespan</dc:title>
        <dc:creator>Patricia Hartge</dc:creator>
        <dc:date>2009-06-01</dc:date>
        <dc:source>Nature Genetics 41, 635 (2009)</dc:source>
        <dc:subject>genomics; epidemiology</dc:subject>
        <dc:language>en</dc:language>
        <dc:rights>Copyright © 2009 Wiley-Liss, Inc., A Wiley Company</dc:rights>
        <dc:description>
            Five genome-wide association studies of the timing of menarche and menopause have now taken us beyond the range of candidate gene and linkage studies.
            The list of new genetic associations identified for these two traits should shed light on the mechanisms of ovarian aging, as well as breast cancer and other diseases associated with reproductive lifespan.
        </dc:description>
    </resource>
     ...
</resources>
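
To illustrate how such an archive might be consumed, here is a minimal Python sketch that reads the `resource` elements with the standard library's `xml.etree.ElementTree`. It is only an illustration under stated assumptions: the function name `read_archive` is hypothetical, the sample is a shortened copy of the record above, and no schema validation is performed.

```python
import xml.etree.ElementTree as ET

# Dublin Core elements namespace, as declared on the <resources> element.
DC_PREFIX = "{http://purl.org/dc/elements/1.1/}"

# Shortened copy of the example record, used only for demonstration.
SAMPLE = """<resources xmlns:dc="http://purl.org/dc/elements/1.1/">
    <resource>
        <dc:identifier>doi:10.1038/ng0609-637</dc:identifier>
        <dc:identifier>http://www.nature.com/ng/journal/v41/n6/pdf/ng0609-637.pdf</dc:identifier>
        <dc:title>Genetics of reproductive lifespan</dc:title>
        <dc:creator>Patricia Hartge</dc:creator>
    </resource>
</resources>"""


def read_archive(xml_text):
    """Return one dict per <resource>, mapping each Dublin Core term
    (without prefix) to the list of its values.

    Lists are used because a term such as dc:identifier may repeat.
    """
    root = ET.fromstring(xml_text)
    records = []
    for resource in root.iter("resource"):
        record = {}
        for element in resource:
            if not element.tag.startswith(DC_PREFIX):
                continue  # skip anything that is not a Dublin Core element
            term = element.tag[len(DC_PREFIX):]
            # itertext() also collects text nested inside markup in an abstract.
            text = "".join(element.itertext()).strip()
            record.setdefault(term, []).append(text)
        records.append(record)
    return records


records = read_archive(SAMPLE)
print(records[0]["title"][0])
print(records[0]["identifier"])
```

Inner line breaks in a value are kept as they are; only leading and trailing whitespace is trimmed.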