Taxonomy and distribution of ecosystem types: implementation of ecosystemology principles
Citation
Senterre B (2023). Taxonomy and distribution of ecosystem types: implementation of ecosystemology principles. Version 1.4. Seychelles National Herbarium. Metadata dataset https://doi.org/10.15468/q23r47 accessed via GBIF.org on 2024-03-01.
This dataset was initially intended for publication on GBIF (see details below), but we have now restricted it to a 'metadata' entry, and the corresponding ecosystem dataset is published on Zenodo: https://doi.org/10.5281/zenodo.7812549. It compiles data on ecosystem types and their distribution, gathered during a series of field studies led by the author in Seychelles and in West and Central Africa (Senterre 2014, Senterre & Wagner 2014, Senterre 2016, Senterre et al. 2017, 2019, 2020, 2021a, 2022). The aims of this dataset are:
1. To share in an explicit and transparent way data on proposed taxonomies of ecosystems, i.e. conceptualizations of ecosystem-types, including explicit ecosystem names and management of synonymies.
2. To develop ecosystem red listing based on transparent and falsifiable raw distribution data, combining distribution modeling (maps) and in situ observations of individual stands.
3. To illustrate in detail how to deal with ecosystem data following the approach described in Senterre et al. (2021b) (i.e. "ecosystemology" approach).
Although GBIF is currently unable to cater appropriately for ecosystem data, having been designed from a species-centric perspective, it is the largest repository of biodiversity data in the world, and it is therefore relevant to at least explore the possibility of filling that gap. In addition, we suggest, and show here, that only a few additions and adjustments to the current GBIF structure would be required to integrate ecosystem data in a standardized way, following the "ecosystemology" approach (ecosystem taxonomy) proposed by Senterre et al. 2021b (http://dx.doi.org/10.1016/j.ecocom.2021.100945).
In the ‘sampling method’ section of these metadata, we present in detail the suggested adjustments and additions to the GBIF structure, and we explain our short-term strategy for publishing an existing ecosystemology dataset within the current GBIF structure, by squeezing information into available and suitable GBIF fields (mostly free-text fields related to the ecosystem or habitat). Several fields are thus stored within a single GBIF field, separated by the pipe character (|).
We then developed a series of R scripts that take the ecosystem data squeezed into the GBIF fields and restore the tables needed for an ecosystem taxonomy treatment (by splitting columns at the pipe separators). Finally, we compile ecosystem checklists, taxonomies and occurrence data into an R shiny application. In addition, we integrate the use of Google Earth Engine (EE) and develop a method to combine EE data with the GBIF dataset, in order to produce complete distribution maps and use them in the Red Listing of Ecosystems (RLE).
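The authors' own implementation of this unpacking step is in R (see the repository below). Purely as an illustration of the idea, the following Python sketch splits pipe-packed fields back into named columns. The subfield order is an assumption: we take it to follow the order in which the fields are listed in these metadata (e.g. 'reference|page|ID' for identificationReference), and the layout constants are hypothetical names, not part of the published scripts.

```python
from __future__ import annotations

# Hypothetical packing layouts: subfield order is ASSUMED to follow the order
# in which the fields are listed in these metadata.
REFERENCE_LAYOUT = {
    "identificationReference": [
        "identificationReference",
        "identificationReferencePage",
        "identificationReferenceID",
    ],
}
FIELD_NOTES_LAYOUT = {
    "fieldNotes": ["ecoFunctional", "degreeOfInvasion", "ecoNaturalness", "ecoStage"],
}


def unpack_record(record: dict[str, str], layout: dict[str, list[str]]) -> dict[str, str]:
    """Return a copy of `record` in which each packed field is split into named columns."""
    # Keep every unpacked field as-is.
    out = {k: v for k, v in record.items() if k not in layout}
    for packed_field, names in layout.items():
        raw = record.get(packed_field) or ""
        parts = [p.strip() for p in raw.split("|")] if raw else []
        if len(parts) > len(names):
            raise ValueError(
                f"{packed_field}: expected at most {len(names)} parts, got {len(parts)}"
            )
        # Missing trailing parts become empty strings.
        parts += [""] * (len(names) - len(parts))
        out.update(zip(names, parts))
    return out
```

For example, `unpack_record({"occurrenceID": "occ-1", "fieldNotes": "T1.1|native|natural|old growth"}, FIELD_NOTES_LAYOUT)` yields one column per ecosystem field. Raising on surplus parts, rather than silently dropping them, is a deliberate choice: an unexpected extra pipe usually signals a data entry problem.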
The R scripts developed are available here: https://github.com/bsenterre/ecosystemology
Study Extent
At the time this dataset was created, the existing data, which will be added in later versions, came from West Africa (mostly Guinea and Liberia), Seychelles and Atlantic Central Africa (Equatorial Guinea, Cameroon, Republic of Congo).
Sampling
1. Adaptation of the GBIF standard to manage an ecosystemology approach
Sampling of ecosystem occurrences follows the standard proposed in Senterre et al. (2021b): http://dx.doi.org/10.1016/j.ecocom.2021.100945. This standard describes a method and provides modern tools for recording ‘virtual ecosystem specimens’ in the field. The data collected on smartphones with Open Foris Collect are then managed in an MS Access database called BIO, but any other database system can be used, as long as the standardized fields and terminology are followed. The formats used to produce the ecosystem occurrence core and its corresponding identification history extension are detailed here, along with a brief introduction to ecosystemology concepts. To arrive at the solution proposed here, we reviewed the various standards available in GBIF through the main 'cores' and extensions (http://rs.gbif.org/extension/gbif/1.0/ and https://rs.gbif.org/extension/dwc/identification_history_2022-02-02.xml). Although the ecosystemology approach is conceptually complex, from a data management point of view it translates into a very simple structure. Three core tables are needed:
(a) a table of individual stands, or ‘eco-occurrences’ (a given habitat observed at a given location, by a given observer, at a given time);
(b) a table of ‘eco-determinavit’ (ecosystem determinavit), i.e. identifications of a given stand as a given eco-species, according to a given author at a given time;
(c) a table of ecosystem types at the most fundamental operational level, which we call the eco-species level (by analogy with species taxonomy, which yields a terminology that is easier to integrate than those of other systems).
(a) Occurrence core
The stand table is analogous to the table of individual observations for species (i.e. the classical GBIF 'Occurrence core'). It therefore contains the same fields, except that the fields on species identification and individual organisms do not apply, and the fields describing the state of the individual organism (stage, status, establishmentMeans, etc.) need to be translated into ecosystem analogues. The fields needed are:
occurrenceID: unique ID for each 'individual' stand at a given time
basisOfRecord: applicable terms include 'HumanObservation', 'Literature', 'RemoteObservation' and 'Modeling' (the last three terms need to be added)
recordedBy: Surname, Firstname of a person
recordNumber: field code given to a stand (for how to create unique codes, see https://www.researchgate.net/publication/353345419_BIO_KBA_3_4_Open_Foris_Collect_Survey_File_with_species_list_for_Seychelles)
eventDate: YYYY-MM-DD
eventID: used to store (if it exists) the unique ID of a survey or vegetation plot corresponding to the current time-slice description of this stand. This links to the other GBIF dataset ‘seyvegplot’. However, in principle a given stand can be the object of more than one inventory (e.g. one sub-plot for trees, another for understorey herbs, one for some fauna elements, etc.). We therefore prefer to use the field ‘locationID’ (same as ‘recordNumber’) to link stand descriptions made at a site with species inventories made at that same site over time.
habitat: free text as described by the source
All the geographic fields: locationID, islandGroup, island, country, locality, location, verbatimElevation, decimalLatitude, decimalLongitude, locationRemarks. Note that the field 'locationID' can be used to retrieve the list of species observed within this stand and published in the ‘seychecklist’ dataset. This code is normally the same as the code given in the field recordNumber above (which is used for the taxonomic citation of a stand as a virtual ecosystem specimen).
fieldNotes: in the short term, we use this field to squeeze in all the ecosystem fields that are needed but currently missing from the Occurrence core (the terminology follows that of http://dx.doi.org/10.1016/j.ecocom.2021.100945):
ecoFunctional: a broad, generalized classification of habitat types based on functional principles, e.g. level 3 of the new IUCN Global Ecosystem Typology (or any equivalent)
degreeOfInvasion: 'native', 'mostly native', 'mixed', 'mostly exotic', 'exotic'
ecoNaturalness: 'natural', 'semi-natural', 'artificial'
ecoStage: 'pioneer', 'early secondary', 'late secondary', 'old growth'
(b) Ecosystem identification history extension
Because the taxonomic approach is rooted in the concept of the nomenclatural type, ecosystem 'names' are not primarily stored in a table of ecosystem names. Rather, an ecosystem name is actually created when an eco-determinavit bearing the new eco-species name is applied to a given individual stand by a given author, with a mention of the reference in which the eco-species name is described and an explicit statement that the eco-determinavit defines the stand as the 'biotype' of the eco-species name (functionally equivalent to the mention "sp. nov." in taxonomy). This is where the name is created and where it exists. The most important implication is that the protologue reference must be indicated, and the 'biotype' status designated, on the eco-determinavit. This information is therefore not needed in a table of eco-species names (such a table is still needed to manage the uniqueness of eco-species identifiers (IDs), to link to eco-species profiles, etc.). In addition, the ecosystem identification history extension is the key to dealing with ecosystem taxonomic synonymies, and therefore with ecosystem distribution.
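To make the link between determinavit, synonymy and distribution more concrete, here is a minimal Python sketch (not the authors' R code). It uses the field names of the proposed extension listed in the next paragraph and two rules that are our own simplifying assumptions: a name applied to the holobiotype stand of another name makes the latter its synonym (in the opinion of that determinavit's author), and each stand is counted under the name of its most recent determinavit. The `Det` record type is hypothetical.

```python
from __future__ import annotations

from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class Det:
    """One eco-determinavit (hypothetical minimal record)."""
    occurrenceID: str        # the stand being identified
    ecoSpeciesID: str        # the eco-species name applied to it
    identifiedBy: str
    dateIdentified: str      # YYYY-MM-DD, so string order equals date order
    biotypeStatus: str = ""  # 'holobiotype' when the determinavit creates the name


def synonymies(dets: list[Det]) -> list[tuple[str, str, str, str]]:
    """(synonym, accepted, author, date) for each name applied to another name's holobiotype."""
    # Assumes a stand is the holobiotype of at most one name.
    holo_stand = {d.occurrenceID: d.ecoSpeciesID for d in dets if d.biotypeStatus == "holobiotype"}
    return [
        (holo_stand[d.occurrenceID], d.ecoSpeciesID, d.identifiedBy, d.dateIdentified)
        for d in dets
        if d.occurrenceID in holo_stand and holo_stand[d.occurrenceID] != d.ecoSpeciesID
    ]


def current_distribution(dets: list[Det], resolve_synonyms: bool = True) -> dict[str, set[str]]:
    """Group stands under the name of their most recent determinavit."""
    latest: dict[str, Det] = {}
    for d in dets:
        prev = latest.get(d.occurrenceID)
        if prev is None or d.dateIdentified >= prev.dateIdentified:
            latest[d.occurrenceID] = d
    accepted: dict[str, str] = {}
    if resolve_synonyms:
        # One level only; later synonymy opinions override earlier ones.
        for syn, acc, _author, _date in sorted(synonymies(dets), key=lambda s: s[3]):
            accepted[syn] = acc
    dist: dict[str, set[str]] = defaultdict(set)
    for stand, d in latest.items():
        dist[accepted.get(d.ecoSpeciesID, d.ecoSpeciesID)].add(stand)
    return dict(dist)
```

The point of the sketch is that no name table is edited: adding one determinavit to a holobiotype stand is enough to establish the synonymy and to move all stands of the synonym into the distribution of the accepted name.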
Because the Identification history extension is designed for species, it cannot really be extended to ecosystems, and it is probably best to create a new extension with the following fields (this requires further discussion; for now we use the fields identificationReference, identificationRemarks and taxonRemarks to squeeze in the missing fields):
occurrenceID: foreign key linking to the Occurrence core (stands)
identificationID: primary key providing a unique ID for each determinavit
identifiedBy: Surname, Firstname of a person
dateIdentified: the exact date must always be provided, even when the actual date is known only to the year or month. It must be the date of publication of the protologue in the case of a holobiotype designation or of parabiotype material; otherwise it is simply the date of identification.
identificationReference: a free-text citation of the reference. Since the next two fields are not currently available in the Identification history extension, in the short term we squeeze them in here as 'reference|page|ID'
identificationReferencePage: the page of the reference in which a name is published
identificationReferenceID: a unique ID (e.g. DOI) for a publication
biotypeStatus: if the eco-determinavit has taxonomic value (publication of an eco-species), the holobiotype must be designated here. Acceptable values are 'holobiotype', 'neobiotype' and 'lectobiotype'. Since this field is not currently available in the Identification history extension, we squeeze it into identificationRemarks.
identificationRemarks: if a determinavit is applied to the holobiotype of an existing name, the argumentation must be provided here as free text.
taxonRemarks: in the short term, we squeeze in here all the necessary ecosystem identification fields, namely:
lifeZoneID, lifeZone, ecoOrderID, ecoOrder, ecoFamilyID, ecoFamily, ecoGenusID, ecoGenus, ecoSpeciesID, ecoSpecies, ecoSpeciesTranslated
(c) Ecosystem taxon core
As discussed briefly above, an eco-species table is useful to manage unique eco-species identifiers and other synthetic attributes of eco-species. Although it is tempting, we should refrain from including in such a table fields defining the protologue and the authors of the name and its biotype. Rather, it must be mandatory to record explicitly the biotype (an individual stand) and the eco-determinavit responsible for the creation of the eco-species name. For the ecosystemology treatment to work with the proposed adaptation of the GBIF structure, the fields identifiedBy and dateIdentified must be clean and standardized (i.e. authors' names must be entered with care, and a protologue date must be entered precisely, e.g. ‘2022-12-27’, and be exactly the same for all the determinavit originating from a given protologue). These fields are key to identifying paratypic determinavit (see R script), i.e. determinavit originating from the protologue of the name they refer to. Alternatively, careful standardization of identificationReferenceID (using a DOI or equivalent) can provide a solution, but standardizing reference IDs appears to be even more difficult than standardizing authors' names. The translation of the BIO eco-taxonomic backbone into GBIF can be discussed later. This will involve creating an analogue of the Taxon core for ecosystems, compiling the BIO backbone of eco-species, eco-genera, eco-families, eco-orders and life zones, and their definitions using the standardized ecosystem ontology proposed in the ecosystemology paper (http://dx.doi.org/10.1016/j.ecocom.2021.100945).
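The dependence on clean identifiedBy and dateIdentified values can be illustrated with a small Python sketch (the authors' actual script is in R). It assumes, as a simplification, that a paratypic determinavit is any determinavit of a name that carries exactly the same author and date as the holobiotype determinavit of that name. The `Det` record type is hypothetical.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass(frozen=True)
class Det:
    """One eco-determinavit (hypothetical minimal record)."""
    identificationID: str
    occurrenceID: str
    ecoSpeciesID: str
    identifiedBy: str
    dateIdentified: str      # YYYY-MM-DD
    biotypeStatus: str = ""  # 'holobiotype' on the name-creating determinavit


def paratypic(dets: list[Det]) -> list[Det]:
    """Determinavit that share author and date with the protologue of the name they apply."""
    protologue = {d.ecoSpeciesID: d for d in dets if d.biotypeStatus == "holobiotype"}
    found = []
    for d in dets:
        holo = protologue.get(d.ecoSpeciesID)
        if holo is None or d.identificationID == holo.identificationID:
            continue  # name has no designated holobiotype, or d is the holobiotype itself
        # Exact string comparison: this is why identifiedBy and dateIdentified must be standardized.
        if (d.identifiedBy, d.dateIdentified) == (holo.identifiedBy, holo.dateIdentified):
            found.append(d)
    return found
```

With exact matching, a determinavit entered as 'Doe, J' instead of 'Doe, J.', or dated '2019-05' instead of '2019-05-01', would silently drop out of the protologue material, which is the failure mode the standardization rule above is meant to prevent.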
Note that eco-species names translated into several languages have to be split between the ecoSpecies and ecoSpeciesTranslated fields: the former is dedicated to the original or preferred name from the protologue, while the latter holds a translation (taken either from the protologue, or from a later translation if none was provided in the protologue). Each name must be followed by ' (EN)' or ' (FR)' to indicate its language (any other language is acceptable). Alternative versions of a name, simplified for local communication, will have to be dealt with later using a modified version of the Vernacular names extension.
2. Extending GBIF geographic capabilities through integration with Google Earth Engine
Thanks to this ecosystem dataset and GBIF, it is possible to formally store the basic ecosystem taxonomic information provided by vegetation and ecosystem studies, such as those produced in West Africa by the current author. The exact list of ecosystem types recognized in those reports can thus be compiled in R, from the most elementary raw data up to a checklist, including the treatment of synonymies, in a very transparent and explicit way. The next question is whether this explicit checklist can serve as the reference for producing distribution data on ecosystems and therefore, ultimately, for assessing threats (Red Listing of Ecosystems, RLE) and conservation priorities. Whereas for species, distribution data are predominantly available as latitude-longitude point coordinates (rarely as drawn polygons), for ecosystems the situation is the exact opposite. Yet these geographic objects (raster or vector data) cannot be published on GBIF.
Where to publish geographic data on biodiversity distribution?
A large amount of similar data (e.g. on land cover) is already compiled worldwide on Google Earth Engine (EE); this repository is free, can be accessed easily from R (using the rgee package), and can thus easily be combined with GBIF data (also in R, using various packages, e.g. rgbif). We therefore discuss here a solution based on EE as a repository for the geographic data that GBIF does not accept.
What format to follow for publication of geographic data related to an ecosystemology approach? In other words, how to link EE to GBIF? There are basically four issues to discuss:
(a) Should a GBIF ID be put in EE, or an EE ID in GBIF?
(b) How to deal with ecosystem types identified by combinations of rasters (e.g. land cover + landform)?
(c) How to deal with mapping units that include more than one eco-species?
(d) How to deal with maps that detail different states of a given ecosystem type (e.g. primary, secondary)?
(a) Although we could technically insert into the GBIF ecosystem dataset a link to an object in EE (Asset ID + band name or attribute + value), this is not a good idea because EE Asset IDs are not necessarily stable and can change when an asset is moved to another folder. The best option is thus to reference GBIF IDs in the attributes of EE assets: occID (occurrenceID), taxonID, ecoOccID (ecoOccurrenceID), ecoSpID (ecoSpeciesID), assocSpID (associatedEcoSpeciesID), ecoGeID (ecoGenusID), basisOfRec (basisOfRecord), recordedBy, eventDate, locality, habitat, remarks. Only one of the occurrence and taxonomic IDs can be provided. If an ecoOccurrenceID is provided, the ecoSpeciesID will be retrieved through GBIF.
(b) Take the example of the ‘West African ravine forests of the perhumid lowland rainforest life zone’ ecosystem: no explicit GIS data exist for it directly. However, such data can be derived by combining a layer for ‘forest’, a layer for the ‘perhumid rainforest life zone’ and a layer for ‘ravine’ landforms. Consequently, there is no place in EE (in a given asset) to put the eco-species ID for that ecosystem.
In that case as well, we are generally dealing with the broadest distribution knowledge, obtained through modeling. This means that the geographic object is never a stand and therefore can never be a dynamic component of an ecosystemology approach (because eco-determinavit cannot be applied to something that is not a true individual object). We therefore suggest that the only workable solution is to define the geographic object in either an R or an EE script (preferably the latter, see https://code.earthengine.google.com/?accept_repo=users/bsenterre/ecosystemology, because rgee is difficult to install and use). This method can be interpreted as the publication of ‘dynamic’ eco-determinavit on objects created in EE. These cannot be used for eco-taxonomic purposes but can be used to analyse the distribution of the eco-species. The created EE objects and their eco-determinavit can then be saved to the user’s assets, in the same ‘bioOccurrences’ folder, using a file naming standard in which the prefix “ecoSp_” is combined with the eco-species ID as defined in GBIF. These assets can later be retrieved in R (using rgee) and compiled (as local ‘.tif’ raster files) with other data into a shiny app aimed at the Red Listing of Ecosystems.
(c) Sometimes clearly defined mapped elements (mapping units), within clearly identified geographic objects (or EE assets), represent a combination of several ecosystem types that cannot be distinguished at the map scale. These mapping units are therefore ‘mixed’ in terms of ecosystem composition (at the eco-species level), so a mapping unit cannot be associated with a single eco-species ID. We suggest that the solution can be analogous to that used for some non-vascular plant specimens (typically Marchantiophyta), where many species grow together in mats or as epiphylls that cannot be physically separated into individual specimens.
There are only two solutions: either put in additional work to separate them into different mapping units, or split the attribute linking to an eco-species ID into two fields, one for the ‘predominant’ eco-species represented by the mapping unit and one for the ‘associatedEcoSpeciesID’. The latter field is thus partly analogous to the field ‘associatedTaxa’ in the Darwin Core Occurrence core. If more than one accompanying eco-species is present, the ecoSpeciesIDs of all of them can be provided, separated by a pipe ‘|’.
(d) In the case of a geographic object describing a given eco-species in a given state (e.g. climax state), if that object can be linked to a given eco-occurrence in the BIO database, then state information can be dealt with in BIO. If not, the state also has to be captured in the attributes of the geographic object. This can be achieved by defining rasters not as binary (1/0) but with codes corresponding to the states. The same codification can then be used in the vector bioOccurrence FeatureCollections. We will discuss this further in a later version.
What can be used for ecosystemology?
After the above discussion, it is important to note that only geographic objects linked to individual occurrences (of eco-species) will be available to contribute to ecosystem taxonomy. This is simply because eco-determinavit can be applied only to eco-occurrences (stands), not to eco-species.
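The attribute conventions in (a) and (c) lend themselves to a simple automated check before an asset is linked to the GBIF data. The following Python sketch is only an illustration under stated assumptions: we read "only one of the occurrence and taxonomic IDs" as exactly one of occID, taxonID, ecoOccID, ecoSpID and ecoGeID; we assume assocSpID is meaningful only next to a predominant ecoSpID; and the function names are hypothetical.

```python
from __future__ import annotations

# Attribute names as listed in (a); ID_FIELDS groups the occurrence and taxonomic IDs.
ID_FIELDS = ("occID", "taxonID", "ecoOccID", "ecoSpID", "ecoGeID")


def check_feature_properties(props: dict[str, str]) -> list[str]:
    """Return problems found in the GBIF-link attributes of one EE feature (sketch)."""
    problems = []
    given = [f for f in ID_FIELDS if props.get(f)]
    if len(given) != 1:
        problems.append(f"expected exactly one ID among {ID_FIELDS}, got {given}")
    assoc = props.get("assocSpID", "")
    if assoc:
        # Assumption: associated eco-species only make sense next to a predominant ecoSpID.
        if not props.get("ecoSpID"):
            problems.append("assocSpID given without a predominant ecoSpID")
        ids = [a.strip() for a in assoc.split("|")]
        if any(not a for a in ids):
            problems.append("empty entry in assocSpID")
        if props.get("ecoSpID") in ids:
            problems.append("predominant ecoSpID repeated in assocSpID")
    return problems


def asset_name(eco_species_id: str) -> str:
    """Asset name following the 'ecoSp_' + eco-species ID convention."""
    return f"ecoSp_{eco_species_id}"
```

A mixed mapping unit would then carry, for instance, `{"ecoSpID": "E1", "assocSpID": "E2|E3"}`, which passes the check, while a feature carrying both an ecoSpID and an ecoOccID is flagged.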
Take the example of the map produced for the “West African dwarf subsaxicolous forest of the perhumid submontane life zone”, which is a multipolygon object with several polygons in the Nimba (Guinea) and one polygon in the Wologizi (Liberia): https://www.tropicos.org/projectimages/Threat.Pl.LGMN/Ecosystems/03-map.html The options are:
1) to give the ecoSpeciesID as an attribute of the multipolygon;
2) to link the multipolygon object to a broadly defined ‘eco-occurrence’ in BIO;
3) to split the multipolygon into smaller groups or (ideally) individual polygons corresponding to individual localities, and to link each of them to a stand description provided in the ecosystem GBIF dataset (i.e. an ecoOccurrenceID).
Only the third option makes it possible to split the Nimba form of that ecosystem taxonomically from the Liberian form, if an author wanted to consider them as distinct eco-species. With the first option, this geographic information on ecosystem distribution can never be involved in eco-taxonomy, and no ecosystem identification history is possible, since the ecoSpeciesID is edited directly in the EE asset and not in the BIO database/GBIF eco-determinavit.
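The third option can be sketched on GeoJSON-style dictionaries in Python. This is an illustration only, not the procedure used for the Nimba map: the list of ecoOccurrenceIDs (one per polygon, in order) is a hypothetical input that would come from matching polygons to stand descriptions, and the original ecoSpID attribute is deliberately dropped so that the eco-species is inferred from the GBIF eco-determinavit of each stand.

```python
from __future__ import annotations


def split_multipolygon(feature: dict, eco_occ_ids: list[str]) -> dict:
    """Split a GeoJSON MultiPolygon Feature into one Polygon Feature per stand.

    eco_occ_ids[i] is the (hypothetical) ecoOccurrenceID of the stand described
    by the i-th polygon.
    """
    geom = feature["geometry"]
    if geom["type"] != "MultiPolygon":
        raise ValueError("expected a MultiPolygon geometry")
    polygons = geom["coordinates"]  # list of polygons, each a list of linear rings
    if len(polygons) != len(eco_occ_ids):
        raise ValueError("need exactly one ecoOccurrenceID per polygon")
    features = [
        {
            "type": "Feature",
            "geometry": {"type": "Polygon", "coordinates": rings},
            # Only the stand link is kept; no eco-species ID on the geometry.
            "properties": {"ecoOccID": occ_id},
        }
        for rings, occ_id in zip(polygons, eco_occ_ids)
    ]
    return {"type": "FeatureCollection", "features": features}
```

Once each polygon points to a stand, re-identifying the Liberian stand as a distinct eco-species only requires a new eco-determinavit in BIO; the geometry itself does not change.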
Quality Control
Shiny apps are being developed to make all these data easily accessible and to help detect issues in the data. We have cleaned the data as much as possible; the quality control process will be detailed in a later version.
- Step 1 In the BIO database (Seychelles National Herbarium), data on observed stands (i.e. individual occurrences of ecosystems at a given location and at a given time) are entered either through a smartphone application designed with Open Foris Collect or directly in the database (MS Access forms), following the standards described in Senterre et al. (2021).
- Step 2 Eco-taxonomic revisions (taxonomic revisions of ecosystem types) that explicitly identify individual stands as named ‘eco-species’ (i.e. the most elementary taxonomic unit of an ecosystem type) are then entered in the BIO database as ‘eco-determinavit’ attached to those stands. Eco-determinavit that represent the actual creation of an eco-species name are explicitly qualified as such by defining the stand as the ‘holobiotype’ of the eco-species name, according to the author of the eco-determinavit. If an eco-determinavit (i.e. a reference to a name) is added to the holobiotype of another eco-species name, a synonymy is established. Even when no eco-taxonomic novelty is proposed, new versions of this dataset can include new eco-determinavit, i.e. new identifications of existing stands as a given eco-species. These are made by exploring the BIO database and the shiny apps, focusing on unidentified stands that have some ecosystem characters recorded or accompanying photos.
- Step 3 The ecosystem occurrence dataset and its corresponding identification history data are exported to two text files following the GBIF standards. To make sure that Memo fields (> 255 characters) are not interpreted as short text fields and truncated to 255 characters, we follow the procedure described here: https://social.technet.microsoft.com/Forums/office/en-US/9ba70566-3e46-4df7-977e-88144d611d71/access-2007-memo-field-export-to-txt-file?forum=officeitproprevious In the same MS Access export window, we define the export as a tab-delimited text file, with UTF-8 encoding, dates as ‘YYYY-MM-DD’ and double quotes (") as the text qualifier. The methodology and format used to produce the eco-occurrence file and the identification history file (eco-determinavit) are detailed in the ‘Sampling description’ section of these metadata.
- Step 4 We log in to the GBIF IPT (https://cloud.gbif.org/africa), go to 'Manage resources' and open the ‘ecosystemology’ resource. We delete the two earlier versions of the data sources and then upload the new versions exported from MS Access in the previous step. We then map the two source files to the Occurrence core and the Identification history extension, respectively. If anything has changed in the metadata, we make the required changes. Finally, we ‘publish’ the new version, with a brief text explaining the changes since the previous version.
- Step 5 We run the R script that creates ready-to-use files for updating the shiny app. We then update the shiny app by re-publishing the script. These R scripts are available here: https://github.com/bsenterre/ecosystemology. If any of the newly added ecosystem occurrence data, or any older data, has a newly developed spatial object published in Earth Engine, that object is linked to its corresponding ecosystem data in BIO using the corresponding BIO ID (which can be the ID of a species occurrence, a species name, an ecosystem occurrence, an eco-species name or an eco-genus name). The script is adjusted to copy these new geographic objects into the R working directory used for the shiny app. Vector data are stored in GeoJSON format (considering the limitation of ‘Esri shapefiles’ regarding the length of field names).
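The export settings of Step 3 can also be checked after the fact, before the files are uploaded to the IPT. The Python sketch below is an assumption-laden illustration, not part of the documented workflow: it reads the tab-delimited, double-quoted export and flags values of exactly 255 characters (a heuristic hint that a Memo field was truncated) and date fields not in YYYY-MM-DD form. The choice of date fields to check is ours.

```python
from __future__ import annotations

import csv
import io
import re

ISO_DATE = re.compile(r"\d{4}-\d{2}-\d{2}")
DATE_FIELDS = ("eventDate", "dateIdentified")  # assumed date columns to check


def audit_export(text: str) -> list[str]:
    """Flag records of a tab-delimited, double-quoted export that look damaged."""
    issues = []
    reader = csv.DictReader(io.StringIO(text), delimiter="\t", quotechar='"')
    for n, row in enumerate(reader, start=1):  # n counts data records, not physical lines
        for field, value in row.items():
            # Heuristic: exactly 255 characters often means a Memo field was truncated.
            if isinstance(value, str) and len(value) == 255:
                issues.append(f"record {n}: {field} has exactly 255 characters")
        for field in DATE_FIELDS:
            value = row.get(field)
            if value and not ISO_DATE.fullmatch(value):
                issues.append(f"record {n}: {field}={value!r} is not YYYY-MM-DD")
    return issues
```

To audit a file on disk, open it with `open(path, encoding="utf-8-sig", newline="")` and pass its contents to `audit_export`; the `utf-8-sig` codec also strips a byte-order mark if the exporter wrote one.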
- Senterre, B., Lowry II, P.P., Bidault, E. & Stévart, T. (2021) Ecosystemology: a new approach toward a taxonomy of ecosystems. Ecological Complexity 47: 100945. https://doi.org/10.1016/j.ecocom.2021.100945
- Senterre, B., Bidault, E. & Stévart, T. (2019) Identification et évaluation des écosystèmes menacés du Mont Nimba. Rapport de consultance, Missouri Botanical Garden (MBG), Africa and Madagascar Department. https://doi.org/10.13140/RG.2.2.13242.93129
- Senterre, B., Paradis, A.-H., Bidault, E., Stévart, T. & Lowry II, P.P. (2022) Qualité et distribution des savanes montagnardes du Nimba. Rapport de consultance, Missouri Botanical Garden (MBG), Africa and Madagascar Department. http://dx.doi.org/10.13140/RG.2.2.13433.34401
- Senterre, B. et al. (2022) Cartographie des grands types de savanes du Nimba par interprétation visuelle et intégration d'un modèle multispectral. http://dx.doi.org/10.13140/RG.2.2.31874.76488
- Senterre, B. et al. (2020) Assessment of Key Biodiversity Areas in the Lofa-Gola-Mano-Nimba complexes (West Africa) using ecosystem criteria. http://dx.doi.org/10.13140/RG.2.2.17934.89924
Seychelles National Herbarium
Bel Etand, Mont Fleuri
P.O. Box 720
Telephone: +248 2746862
Missouri Botanical Garden (MBG)
administrative point of contact
Government of Seychelles
administrative point of contact