Ecosystem Approach Community of Practice: OBIS

From IMarine Wiki



OBIS Current Situation

The Ocean Biogeographic Information System (OBIS) is a global information system for digital marine biodiversity data. It comprises data management facilities and infrastructure that provide open access to data for technical, educational, scientific and resource management purposes. By providing access to marine biogeographic data using a standard terminology, OBIS fills a critical gap in marine data. An extensive description of OBIS can be found online.

OBIS is built on open-source software. The central data management tool is PostgreSQL, with PostGIS providing the GIS extensions. The website and web-based search interface are based on Apache, PHP, GeoServer and OpenLayers.

OBIS manages the upload of data from a variety of sources (list) and formats (csv, ...), and offers many database functions and procedures for sorting, filtering, merging, etc. It also provides advanced analytical support in the form of PostgreSQL queries and PL/pgSQL functions. The data are used to calculate several derived products, mainly maps of diversity indices such as species richness, and heat maps of sampling density and of the number of species recorded. Most of these calculations are done in SQL within PostgreSQL; in some cases, R has been used.
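The derived products described above (species richness and sampling-density maps) reduce to grouping occurrence records into grid cells and counting. The following is a minimal Python sketch of that logic only, assuming records arrive as (species name, latitude, longitude) tuples and assuming a 5-degree grid; in OBIS these aggregations run in PostgreSQL, and the function names here are hypothetical.

```python
import math
from collections import defaultdict


def grid_cell(lat, lon, size_deg=5.0):
    """Return the (row, col) index of the size_deg x size_deg cell containing a point.

    Assumes size_deg divides 180 and 360 evenly; points on the north pole or the
    antimeridian are clamped into the last row/column.
    """
    n_rows, n_cols = int(180 // size_deg), int(360 // size_deg)
    row = min(math.floor((lat + 90.0) / size_deg), n_rows - 1)
    col = min(math.floor((lon + 180.0) / size_deg), n_cols - 1)
    return row, col


def grid_summary(records, size_deg=5.0):
    """Map each occupied cell to (number of records, number of distinct species).

    The record count is a sampling-density measure; the distinct-species count is
    the species richness of the cell.
    """
    counts = defaultdict(int)
    species = defaultdict(set)
    for name, lat, lon in records:
        cell = grid_cell(lat, lon, size_deg)
        counts[cell] += 1
        species[cell].add(name)
    return {cell: (counts[cell], len(species[cell])) for cell in counts}
```

Counting records and distinct names separately keeps both heat-map layers (sampling effort and richness) available from a single pass over the data.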

The resulting OBIS relational database feeds data to the OBIS website and to upstream aggregators such as GBIF and EOL. It can also:

  • generate geospatially explicit (georeferenced) data;
  • display data in an integrated MapViewer;
  • export data over R-ODBC to a stand-alone R environment;
  • calculate 'ranges' over a number of physical oceanography parameters and bathymetry, which are shared with the Encyclopedia of Life (EOL);
  • etc.
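The page does not say how the 'ranges' shared with EOL are defined. As an illustration only, the sketch below assumes a range is summarized as (minimum, median, maximum) of each environmental value attached to a species' occurrence records; the parameter names `depth` and `temperature` and the record layout are hypothetical.

```python
from collections import defaultdict
from statistics import median


def species_ranges(records, parameters=("depth", "temperature")):
    """Summarize each environmental parameter per species as (min, median, max).

    records: iterable of dicts with a 'species' key plus one key per parameter.
    Missing or None values are skipped rather than treated as zero.
    """
    values = defaultdict(lambda: defaultdict(list))
    for rec in records:
        for param in parameters:
            v = rec.get(param)
            if v is not None:
                values[rec["species"]][param].append(v)
    return {
        sp: {p: (min(vs), median(vs), max(vs)) for p, vs in params.items()}
        for sp, params in values.items()
    }
```

Skipping missing values matters here: many occurrence records lack a depth or temperature, and counting them as zero would distort the lower bound of the range.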


The OBIS infrastructure is distributed, with several servers hosted by partners in the OBIS network. An 'OBIS instance' consists of two servers: a database server, and a web and application server; the operating system is Ubuntu. More details on the set-up can be found in a recent paper: "Fujioka, E., Vanden Berghe, E., Donnelly, B., Castillo, J., Cleary, J., Holmes, C., McKnight, S., et al. (Accepted). Advancing global marine biogeography research with opensource GIS software and cloud computing. Transactions in GIS." Please contact Edward if you are interested in a copy of the preprint. As soon as the paper is available online, this page will be updated to link to it directly.

Currently there are three such pairs: the development, staging and production installations. The staging and production pairs are both hosted by the Flanders Marine Institute. The development web server is hosted at Duke University; the data assembly ('data development') machine is managed from Rutgers and runs on the Amazon Cloud. One issue to investigate is whether the data development machine can be operated on the D4Science infrastructure. The Amazon solution is highly satisfactory (very stable, fast, flexible...) but not without cost.

Several parts of the OBIS data streams are clear candidates for work. Quality control needs to be automated; data ingestion and integration have to be improved, including the detection of duplicates; the upstream provision of OBIS data, mainly to GBIF and EOL, should be improved; and marine data now available in GBIF but not in OBIS should be incorporated into OBIS. All these activities will lead to a bigger and better OBIS database, better able to serve the Community of Practice.
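Automated quality control usually starts with cheap per-record checks that run before ingestion. The sketch below is an assumption about what such checks might look like, not the OBIS QC procedure: field names follow Darwin Core (`scientificName`, `decimalLatitude`, `decimalLongitude`, `minimumDepthInMeters`), `eventDate` is assumed to be already parsed into a `date`, and the particular set of checks is illustrative.

```python
from datetime import date
from typing import Optional


def qc_flags(rec, today: Optional[date] = None):
    """Return a list of QC problems for one occurrence record (empty list = passed).

    Checks (illustrative only): missing name, missing or out-of-range coordinates,
    the common (0, 0) placeholder, negative depth, and event dates in the future.
    """
    today = today or date.today()
    flags = []
    if not rec.get("scientificName"):
        flags.append("missing scientificName")
    lat, lon = rec.get("decimalLatitude"), rec.get("decimalLongitude")
    if lat is None or lon is None:
        flags.append("missing coordinates")
    else:
        if not -90 <= lat <= 90:
            flags.append("latitude out of range")
        if not -180 <= lon <= 180:
            flags.append("longitude out of range")
        if lat == 0 and lon == 0:
            flags.append("zero coordinates")
    depth = rec.get("minimumDepthInMeters")
    if depth is not None and depth < 0:
        flags.append("negative depth")
    event = rec.get("eventDate")
    if event is not None and event > today:
        flags.append("event date in the future")
    return flags
```

Returning a list of flags instead of a single pass/fail lets records be quarantined with a reason attached, which is easier to report back to data providers.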

Another line of work will be environmental modelling. For this, we need access to physical oceanography data and other data types (bathymetry, distance from ice...). We also need the algorithms to be operational: AquaMaps and the algorithms implemented in OpenModeller. The first results of this line of work are still some way off. As an intermediate result, we can automate and streamline the data delivery to EOL.
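To make "environmental modelling" concrete: AquaMaps-style models describe a species' tolerance for each environmental parameter as an envelope (absolute limits plus a preferred range) and combine the per-parameter suitabilities into one value per cell. The sketch below is a simplified envelope model in that spirit, assuming trapezoidal suitability and a product across parameters; it is not the AquaMaps or OpenModeller implementation, and the parameter names and bounds are hypothetical.

```python
def trapezoid(x, lo, pref_lo, pref_hi, hi):
    """Suitability in [0, 1]: 1 inside [pref_lo, pref_hi], 0 at or beyond lo/hi,
    and linear ramps in between."""
    if pref_lo <= x <= pref_hi:
        return 1.0
    if x <= lo or x >= hi:
        return 0.0
    if x < pref_lo:
        return (x - lo) / (pref_lo - lo)
    return (hi - x) / (hi - pref_hi)


def envelope_probability(cell_env, envelope):
    """Multiply per-parameter suitabilities for one cell.

    cell_env: parameter -> observed value in the cell (e.g. mean temperature).
    envelope: parameter -> (lo, pref_lo, pref_hi, hi) for the species.
    """
    p = 1.0
    for param, bounds in envelope.items():
        p *= trapezoid(cell_env[param], *bounds)
    return p
```

Multiplying means a single parameter outside the absolute limits rules the cell out entirely, which is the usual behaviour of envelope models.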


Many unfinished sections

  • OBIS Postgres database management
  • OBIS pgAdmin
  • OBIS R-ODBC
  • OBIS Users Management
  • OBIS Data Visualization (Table / Chart / Map)
  • OBIS Data Export
  • …

IRD Data Access


IRD has discussed the services it expects to contribute to OBIS. Last year, IRD set up the GBIF Integrated Publishing Toolkit (IPT, [1]) on a server at IRD to provide access to part of IRD's data:

  • metadata in the EML metadata format;
  • data in the Darwin Core format.
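An IPT publishes each resource as a Darwin Core Archive: a zip file containing the data file(s), the EML metadata, and a `meta.xml` descriptor that maps columns to Darwin Core terms. The minimal reader below is a sketch of how harvested archives could be turned into records. It assumes the archive's core is a single delimited file, ignores extensions, constant default values and non-UTF-8 encodings, and assumes the defaults shown when `meta.xml` omits an attribute.

```python
import csv
import io
import zipfile
import xml.etree.ElementTree as ET

NS = {"dwca": "http://rs.tdwg.org/dwc/text/"}


def read_dwca_core(path_or_file):
    """Yield the core rows of a Darwin Core Archive as dicts keyed by short term name."""
    with zipfile.ZipFile(path_or_file) as zf:
        core = ET.fromstring(zf.read("meta.xml")).find("dwca:core", NS)
        location = core.find("dwca:files/dwca:location", NS).text
        # meta.xml writes a tab as the two characters backslash + t
        sep = core.get("fieldsTerminatedBy", ",").replace("\\t", "\t")
        quote = core.get("fieldsEnclosedBy", '"')
        skip = int(core.get("ignoreHeaderLines", "0"))
        # Column index -> short term, e.g. ".../dwc/terms/scientificName" -> "scientificName"
        columns = {int(f.get("index")): f.get("term").rsplit("/", 1)[-1]
                   for f in core.findall("dwca:field", NS) if f.get("index") is not None}
        text = zf.read(location).decode("utf-8")
    if quote:
        reader = csv.reader(io.StringIO(text), delimiter=sep, quotechar=quote)
    else:
        reader = csv.reader(io.StringIO(text), delimiter=sep, quoting=csv.QUOTE_NONE)
    for i, row in enumerate(reader):
        if i < skip or not row:
            continue
        yield {name: row[idx] for idx, name in columns.items() if idx < len(row)}
```

Keying rows by the short term name keeps downstream code independent of the column order chosen by each provider.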

In iMarine, a workflow has to be defined that allows these data to be reused to populate the OBIS database. The data can be obtained either from GBIF or by connecting the OBIS system to IRD's IPT instance. The obvious problem is avoiding duplication of data arriving in the OBIS system through different data flows.
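One way to avoid record-level duplication when the same data arrive both via GBIF and directly from IRD's IPT is to key each record on a stable identifier and keep only one copy. The sketch below assumes records carry either a Darwin Core `occurrenceID` or the classic (`institutionCode`, `collectionCode`, `catalogNumber`) triplet, and assumes the direct IPT copy should win; both are design assumptions, not an agreed OBIS rule.

```python
def merge_flows(direct, via_gbif):
    """Combine records from two data flows, keeping one copy per occurrence.

    Records are matched on occurrenceID when present, otherwise on the
    (institutionCode, collectionCode, catalogNumber) triplet. Copies harvested
    directly from the provider overwrite copies obtained through GBIF.
    """
    def key(rec):
        if rec.get("occurrenceID"):
            return ("id", rec["occurrenceID"])
        return ("triplet", rec.get("institutionCode"),
                rec.get("collectionCode"), rec.get("catalogNumber"))

    merged = {}
    for rec in via_gbif:
        merged[key(rec)] = rec
    for rec in direct:  # direct copies replace any GBIF copy with the same key
        merged[key(rec)] = rec
    return list(merged.values())
```

This only works if both flows preserve the provider's identifiers; a record that has an `occurrenceID` in one flow but only the triplet in the other will not be matched, which is one reason the metadata flags discussed below are still needed.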


Currently, OBIS does not interact with the IPT, even though the IPT is increasingly replacing DiGIR providers. iMarine WP6 should validate the Harvesting and Indexing Toolkit (HIT, [2]) developed by GBIF and connect it to the PostgreSQL instance of OBIS on the D4Science infrastructure. That would kick-start collaboration with WP6 and provide a powerful tool for harvesting not only IRD data but other data as well.
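The last step of any harvester is writing records into the database idempotently, so that re-harvesting a resource does not duplicate rows. The sketch below uses Python's built-in `sqlite3` purely as a stand-in for the OBIS PostgreSQL instance (in PostgreSQL the equivalent of `INSERT OR IGNORE` is `INSERT ... ON CONFLICT DO NOTHING`); the table and column names are hypothetical, not the OBIS schema, and input values are assumed to be strings as read from a Darwin Core Archive.

```python
import sqlite3

# Hypothetical table; sqlite3 stands in for PostgreSQL in this sketch.
SCHEMA = """
CREATE TABLE IF NOT EXISTS occurrence (
    occurrence_id   TEXT PRIMARY KEY,
    scientific_name TEXT NOT NULL,
    latitude        REAL,
    longitude       REAL,
    dataset_id      TEXT
)"""


def load_records(conn, records, dataset_id):
    """Insert harvested Darwin Core records, skipping occurrenceIDs already loaded.

    Returns the number of new rows actually inserted.
    """
    conn.execute(SCHEMA)
    before = conn.execute("SELECT COUNT(*) FROM occurrence").fetchone()[0]
    conn.executemany(
        "INSERT OR IGNORE INTO occurrence VALUES (?, ?, ?, ?, ?)",
        [(r["occurrenceID"], r["scientificName"],
          float(r["decimalLatitude"]), float(r["decimalLongitude"]), dataset_id)
         for r in records],
    )
    conn.commit()
    after = conn.execute("SELECT COUNT(*) FROM occurrence").fetchone()[0]
    return after - before
```

Using the occurrence identifier as the primary key makes the load safe to repeat on every harvest cycle.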

To avoid 'crop circles' (the same data circulating between aggregators and coming back as duplicates), the metadata should include a flag indicating whether or not a dataset may flow upstream. Data that have already been submitted to GBIF should also be recognizable; this prevents OBIS in iMarine from passing those data to GBIF again, and removes the need to harvest them from GBIF if they have already been harvested directly from IRD.
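The two rules in the paragraph above can be expressed as a small per-dataset routing decision. This is a sketch under assumptions: the three boolean metadata fields and the `route` function are hypothetical names for the flags proposed here, not existing OBIS or GBIF metadata elements.

```python
from dataclasses import dataclass


@dataclass
class DatasetMeta:
    dataset_id: str
    may_flow_upstream: bool   # hypothetical flag: provider allows OBIS to pass data on
    already_in_gbif: bool     # provider has already published the dataset to GBIF
    harvested_directly: bool  # OBIS harvests it straight from the provider (e.g. an IPT)


def route(ds: DatasetMeta) -> dict:
    """Decide whether OBIS should push a dataset to GBIF and whether to pull it from GBIF."""
    return {
        # never send GBIF something it already has, or something the provider withheld
        "push_to_gbif": ds.may_flow_upstream and not ds.already_in_gbif,
        # only fetch from GBIF what OBIS cannot get directly from the provider
        "harvest_from_gbif": ds.already_in_gbif and not ds.harvested_directly,
    }
```

For example, an IRD dataset that is both in GBIF and harvested directly from IRD's IPT yields no transfer in either direction, which is exactly the loop the flags are meant to break.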

IRD data accuracy

IRD's IPT could expose much more data than are currently shared with GBIF. IRD does not currently share these data because of location accuracy issues. OBIS / iMarine has addressed this issue in the past (e.g. for trawl nets), and IRD is seeking a solution through the iMarine collaboration.

IRD collects data from purse-seiner landings, and these do not always reflect exactly where the fish were caught (a landing may combine several fishing operations spread over months and huge areas). Instead of a set of accurate points, IRD has many possible points, resulting in a polygon that can be small or large. IRD has developed a range of algorithms to transform such polygons into points that could be used in OBIS.
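IRD's own polygon-to-point algorithms are not described on this page. One common simplification, shown here only as an illustration, replaces the polygon by its centroid plus an uncertainty radius equal to the largest distance from the centroid to any vertex; the result maps naturally onto Darwin Core's `decimalLatitude`, `decimalLongitude` and `coordinateUncertaintyInMeters`. The sketch assumes vertices given as (lon, lat) and uses a planar centroid, so it is only reasonable for polygons that are not very large and do not cross the antimeridian.

```python
import math

EARTH_RADIUS_M = 6371008.8  # mean Earth radius in metres


def haversine_m(lat1, lon1, lat2, lon2):
    """Great-circle distance in metres between two (lat, lon) points."""
    p1, p2 = math.radians(lat1), math.radians(lat2)
    dp, dl = p2 - p1, math.radians(lon2 - lon1)
    a = math.sin(dp / 2) ** 2 + math.cos(p1) * math.cos(p2) * math.sin(dl / 2) ** 2
    return 2 * EARTH_RADIUS_M * math.asin(math.sqrt(a))


def polygon_to_point(vertices):
    """Reduce an unclosed polygon [(lon, lat), ...] to (lat, lon, uncertainty_m).

    The centroid is computed with the shoelace formula in lon/lat space; the
    uncertainty is the farthest vertex from that centroid.
    """
    area2 = cx = cy = 0.0  # area2 is twice the signed area
    n = len(vertices)
    for i in range(n):
        x0, y0 = vertices[i]
        x1, y1 = vertices[(i + 1) % n]
        cross = x0 * y1 - x1 * y0
        area2 += cross
        cx += (x0 + x1) * cross
        cy += (y0 + y1) * cross
    if area2 == 0:
        raise ValueError("degenerate polygon")
    lon, lat = cx / (3 * area2), cy / (3 * area2)
    radius = max(haversine_m(lat, lon, vlat, vlon) for vlon, vlat in vertices)
    return lat, lon, radius
```

Keeping the radius alongside the point lets users filter out records whose location is too vague for their analysis, instead of silently treating every point as exact.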

By the end of this year, IRD will be able to share more data located by polygons instead of points.

For OBIS, harvesting directly is important because it allows marine data to be described more specifically: for instance, transects can be ingested as start and end points rather than as a single point. OBIS can provide details on the differences between Darwin Core and the OBIS Schema. Other extension requirements include handling polygons and sets of points.
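Since OBIS runs on PostGIS, transects and polygons can be stored as real geometries rather than single points. A simple way to pass them to PostGIS is Well-Known Text (WKT), which PostGIS parses with `ST_GeomFromText(wkt, 4326)`. The helpers below are hypothetical and only build the WKT strings from (lon, lat) pairs; WKT puts longitude (x) before latitude (y).

```python
def transect_wkt(start, end):
    """WKT LINESTRING for a transect given (lon, lat) start and end points."""
    return "LINESTRING({} {}, {} {})".format(*start, *end)


def polygon_wkt(vertices):
    """WKT POLYGON for an exterior ring [(lon, lat), ...]; the ring is closed if needed."""
    ring = list(vertices)
    if ring[0] != ring[-1]:
        ring.append(ring[0])  # WKT requires the first and last vertex to coincide
    return "POLYGON((" + ", ".join(f"{x} {y}" for x, y in ring) + "))"
```

The resulting strings could then be passed as a query parameter to `ST_GeomFromText` in an `INSERT`, so that spatial queries can test whether a transect or fishing area intersects a region instead of only comparing points.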

IRD biological parameters

For some of its datasets, IRD can also share biological parameters such as weight and length, as well as the sampling or survey method used to collect the data.

Lengths and weights would be an interesting extension, but we would have to extend the OBIS Schema to deal with them, and I am a bit reluctant to do this directly. I think we should first investigate rewriting the OBIS Schema as a formal extension to the new extensible Darwin Core, and then add length and weight either as an extension of that extension or as a direct extension of GBIF's Darwin Core (assuming several extensions can be used in parallel).
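The page leaves the schema choice open. To illustrate how an extension avoids widening the core schema, the sketch below follows the Darwin Core Archive "star" layout, where extension rows point back to a core occurrence by its id, and uses the Darwin Core measurement terms (`measurementType`, `measurementValue`, `measurementUnit`) for length and weight. The `coreid`/`id` field names and the function are assumptions for illustration, not a decision on the OBIS Schema.

```python
from collections import defaultdict


def attach_measurements(core_rows, mof_rows):
    """Attach measurement extension rows to their core occurrence records.

    core_rows: dicts with an 'id' key (one per occurrence).
    mof_rows:  dicts with 'coreid', 'measurementType', 'measurementValue'
               and optionally 'measurementUnit' (any number per occurrence).
    """
    by_core = defaultdict(list)
    for m in mof_rows:
        by_core[m["coreid"]].append(
            (m["measurementType"], float(m["measurementValue"]), m.get("measurementUnit")))
    # .get avoids inserting empty entries into the defaultdict for unmatched ids
    return [dict(row, measurements=by_core.get(row["id"], [])) for row in core_rows]
```

Because any number of measurement rows can hang off one occurrence, adding further parameters later (e.g. sex or maturity) would need no change to the core schema.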

IRD data coverage


[Figures: IRD data coverage; OBIS data coverage]

EurOBIS is not in charge of the Indian Ocean data.


The IRD datasets start in (list sets). They are updated continuously / monthly / annually.

OBIS Website

OBIS VRE Profile


Describe the proposed solution in at most three sentences:

With ICIS capture time-series can be

Priority to CoP

List the priority of the proposed solution following the iMarine Board priority-setting criteria:

  • Identified community: Users now:
  • Potential for co-funding:
  • Structural allocation of resources:
  • Referred in DoW:
  • Business Cases:
  • How does the proposed action generally support sustainability aspects?
  • How consistent is it with EC regulations/strategies (e.g. INSPIRE, ...)?
  • Re-usability, benefits, compatibility:


Relation to CoP Software

Relation to D4S technologies

Does the proposed solution solve other problems associated with EA-CoP Business Cases?

If the proposed solution can be used in another software scenario (not users!), please describe it.


How big is the expected user community after delivery?


Are the proposed measures effective?

Does it reduce a known workload?


Is the proposed solution cheap?

Expected effort in PM:


How is the component delivered to users? (Design / online help / training material / support) The OBIS VRE will build a data-loading and validation interface around the existing PostgreSQL database.

The VRE will not replace all existing data structures and services, and pgAdmin access is still expected, even with very restricted grants and rights on the database instance.

The VRE will offer:

  • Data discovery through DiGIR, IPT and HIT
  • Data loading into the database
  • Data visualization on a map
  • Interactive data management with tabular and map interfaces
  • ...


Are they safe?

The PostgreSQL database stores data ...

Access is only possible through ....

Does the proposed solution need to manage confidential information at the data, dataset or organizational level? None of the data are confidential.

Describe security and privacy issues:


Are there any policies available that describe data access and sharing?


Are these really needed?

Yes. OBIS combines data from many different sources, and it is important to keep track of ownership and to provide proper attribution in all products. Without a well-defined framework of data access and sharing rules, it is also very difficult to identify replicates, e.g. whether a dataset has already been uploaded to GBIF, or whether it has already been reviewed.

Copyright / attribution / metadata / legal

The attribution records are the most important.
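Keeping attribution in "all products" means every derived map or export should be able to list the datasets it was built from. A minimal sketch, assuming each record carries the Darwin Core terms `datasetName` and `institutionCode`; the function name and output format are hypothetical.

```python
from collections import Counter


def attribution_lines(records):
    """List every contributing dataset with its record count, largest first,
    so that a derived product (map, index, export) can credit all its sources."""
    counts = Counter((r["datasetName"], r["institutionCode"]) for r in records)
    ranked = sorted(counts.items(), key=lambda kv: (-kv[1], kv[0]))
    return [f"{name} ({inst}): {n} record{'s' if n != 1 else ''}"
            for (name, inst), n in ranked]
```

Generating the list from the records actually used, rather than from a static catalogue, guarantees that a filtered product credits exactly the datasets that contributed to it.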


Do they introduce moral hazard? (A hazard here is the risk that users will behave more recklessly if they are insulated from the effects of the software, or if they do not understand what it produces, where the data come from, what they represent, etc.)

The OBIS VRE carries no risks to users or developers.
