

The call closed on 30 April 2010 and a total of 8 Nodes responded: Belgium, Bioversity International, Costa Rica, ETI, France, Spain, Switzerland, and Tanzania. Unfortunately, not all of them were able to complete all the testing requirements. In the end, 4 Nodes (Belgium, Costa Rica, France, and Spain) successfully carried out all the requirements, including the submission of a final report by the closing date of 17 July.

In compiling the feedback from these final reports, GBIF hoped not only to improve the HIT, but also to better understand the community's need for it. A synthesis of all the reports has since been published, and can be downloaded here: http://gbif-indexingtoolkit.googlecode.com/files/HIT-testing-final-report.pdf



The GBIF Secretariat is making a release candidate (RC) of the Harvesting and Indexing Toolkit (HIT) available for testing. We are looking for 5 to 10 Nodes who can assist with testing and provide us with feedback on this new GBIF application.

What is the HIT?

The HIT is an open source Java-based web application developed by the GBIF Secretariat (http://www.gbif.org/) to manage biodiversity data harvesting and quickly build indexes of harvested data.

How does the HIT work?

The HIT can harvest data from data publishers that expose their data through any of three protocols: Distributed Generic Information Retrieval (DiGIR), Biological Collection Access Service (BioCASe), and TDWG Access Protocol for Information Retrieval (TAPIR). It can also harvest data directly from a single export: a dump in Darwin Core Archive format, created in accordance with the new Darwin Core terms. By accessing various data publishers through a single tool, the HIT provides a convenient mechanism to manage indexing operations.

The harvested data, usually in XML, is first extracted into intermediary tab-delimited text files. A specialised module then synchronises/indexes those tab-delimited text files into a GBIF-structured database. The GBIF Data Portal could in turn be built on top of the indexed data, providing access to it. Similar modules that synchronise/index into other database structures (not just that of GBIF's Data Portal) could also be developed, allowing thematic portals, for example, to be built on top of them.
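The extraction step described above can be illustrated with a minimal sketch. It is not the HIT's own code (the HIT is a Java application); it only shows the general idea of flattening XML records into tab-delimited rows. The `<records>`/`<record>` layout, the field names, and the function name `xml_to_tab_delimited` are assumptions made for illustration, since real DiGIR, BioCASe, and TAPIR responses use their own namespaced schemas.

```python
import csv
import io
import xml.etree.ElementTree as ET

# Hypothetical harvested response, invented for this example.
SAMPLE_XML = """<records>
  <record>
    <id>1</id>
    <scientificName>Puma concolor</scientificName>
    <country>Costa Rica</country>
  </record>
  <record>
    <id>2</id>
    <scientificName>Quercus robur</scientificName>
  </record>
</records>"""

# Columns of the intermediary file (assumed field names).
FIELDS = ["id", "scientificName", "country"]


def xml_to_tab_delimited(xml_text, fields):
    """Flatten each <record> element into one tab-delimited row."""
    root = ET.fromstring(xml_text)
    out = io.StringIO()
    writer = csv.writer(out, delimiter="\t", lineterminator="\n")
    writer.writerow(fields)  # header row
    for record in root.iter("record"):
        # Missing elements become empty columns so every row has the same width.
        writer.writerow([(record.findtext(name) or "").strip() for name in fields])
    return out.getvalue()


if __name__ == "__main__":
    print(xml_to_tab_delimited(SAMPLE_XML, FIELDS), end="")
```

Keeping every row the same width, even when a record lacks a field, is what lets a later indexing module load the file column by column.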

What are the technical requirements?

To limit the technical requirements, the specialised module responsible for indexing has been taken out of the version to be tested. As a result, there is no longer a need for a separate indexing database, which, depending on its size, could be quite expensive. In future, the GBIF Secretariat will provide further assistance to Nodes that are interested in building their own index, for example for the purposes of a data portal.

For this trimmed-down version (which only goes as far as extracting into intermediary tab-delimited text files), the requirements are: a computer with the Java (TM) runtime environment (http://java.sun.com/), version 5 or higher; a web server with a servlet container, such as Apache Tomcat (http://tomcat.apache.org/), that is allocated at least 1 GB of RAM and connected to the internet; and a MySQL database management system, version 5.1 or higher.
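As a rough aid, the minimums listed above could be written down as a small pre-installation check. This is only a sketch: the function and constant names are hypothetical, the caller is assumed to supply the version details by hand, and MySQL versions are assumed to be plain dotted numbers such as 5.1.49.

```python
# Minimums taken from the requirements paragraph above.
MIN_JAVA_RELEASE = 5
MIN_MYSQL_VERSION = (5, 1)
MIN_SERVLET_MEMORY_MB = 1024  # 1 GB


def parse_version(text):
    """Turn a plain dotted version such as '5.1.49' into (5, 1, 49)."""
    return tuple(int(part) for part in text.split("."))


def check_requirements(java_release, mysql_version, servlet_memory_mb):
    """Return a list of unmet requirements (empty when all are satisfied).

    java_release is the release number (5, 6, ...). Note that
    `java -version` reports Java 5 as "1.5.0", so pass 5 here, not 1.
    """
    problems = []
    if java_release < MIN_JAVA_RELEASE:
        problems.append("Java runtime must be version 5 or higher")
    if parse_version(mysql_version) < MIN_MYSQL_VERSION:
        problems.append("MySQL must be version 5.1 or higher")
    if servlet_memory_mb < MIN_SERVLET_MEMORY_MB:
        problems.append("Servlet container needs at least 1 GB of RAM")
    return problems


if __name__ == "__main__":
    for problem in check_requirements(5, "5.0.91", 2048):
        print(problem)
```

An empty list means the machine meets the stated minimums; internet connectivity is not covered by this check.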

What are the guidelines for testing?

All volunteers must agree to:

a. Install the application

b. Learn how to use the application

c. Report any bugs

d. Suggest any improvements that could be made

  • Follow the same instructions on how to report a bug (above)

e. Join the HIT mailing list

Volunteers might also be interested in getting involved with other activities such as:

  1. Writing documentation
  2. Translating documentation
  3. Translating the application itself
  4. Developing new extensions to the application

In order to compile the results from testing for use in improving future versions of the application, all volunteers are asked to finalise testing and provide feedback within two months from the start of testing. The feedback could take the form of a small report, for example, that answers the following questions:

a. What was your overall impression of the HIT?

b. Was the tool easy enough and intuitive to use?

  • If no, what made the application difficult to use?

c. Was the documentation sufficient?

  • If no, what would you like to see added?

d. Could this tool be of value to your Node?

  • If yes, what would you use it for?
  • If no, why wouldn’t you use it?


How will the feedback be used?

All the feedback collected will be used to improve subsequent versions of the HIT so that they more closely meet the GBIF community’s needs. A detailed description of the project’s timeline for 2010 and how feedback from testing will be used can be found here. Note that a collated report of the feedback received will be made available to the Nodes community on 9 July 2010.

What happens after testing?

We hope that the GBIF Participant Nodes will continue to use the HIT following testing to satisfy their own harvesting and indexing needs.

Furthermore, those who have learned to use the HIT will ideally have become expert users who can support wider adoption of the tool, for example by helping to answer questions posted to the mailing list.

For more information on the HIT development, visit the project website.

If you’re interested…

Contact the HIT’s lead developer, Kyle Braak (kbraak_@gbif.org), no later than Friday 28 April, indicating your willingness to become one of the HIT testers. Please also be sure to list the technical contact(s) that he can liaise with to commence installation.