Biodiversity Draft Work Plan Q2-3 2012
The expectations of the Biodiversity Cluster (mainly related to BC 2) were presented at the iMarine Board meeting in Rome in March. After the meeting they were further detailed using WP3 conference calls and meetings with EA-CoP representatives. The iMarine Vice-Chair (E. Vanden Berghe) was instrumental in collecting the descriptions.
Abstract or Executive Summary
One of the 4 currently identified clusters in iMarine is 'biodiversity'. The term is sufficiently vague to enable the collection of requirements that operate on, or benefit from, the 'biodiversity' domain. The boundary of this domain is far from sharp, and immediate relations with e.g. the geospatial and statistical clusters are evident.
The work plan is not domain-specific, and it is technology-neutral. It describes how the iMarine Board can be involved in the specification of use cases, data policies, and harmonization issues, to name a few. This page is deliberately written from the end-users' point of view, without details of implementation, concentrating instead on functionality. This page is seen as a springboard from which to define a concrete work plan that can be passed on to the development teams.
Introduction and Background (The Problems)
The iMarine Board is responsible for the implementation of 2 Business Cases in the project, and brings a wealth of community expertise to the technical e-infrastructure. The EA-CoP needs to search over multiple resources, to extract data from several of them, and to integrate data from many sources into information products that cross the boundaries between data sets and even scientific disciplines. For example, the ultimate aim of Business Case 2 is to perform analysis and create information products that will support the FAO Vulnerable Marine Ecosystem project; these products will have to be based on biogeographical data warehouses such as OBIS and GBIF (in themselves compilations of large numbers of individual data sets), and environmental data from oceanography, meteorology, bathymetry...
The opportunity was presented and discussed at the iMarine Board meeting in Rome (March 19-21), and later further elaborated and discussed by project partners with EA-CoP input.
Goals and Objectives (The Outputs)
The ultimate goal of the Biodiversity Cluster, and of Business Case 2, is to support the ecosystem approach to fisheries. In the first instance this will be done through activities supporting FAO's Vulnerable Marine Ecosystems (VME) project, and CBD's Ecologically and Biologically Significant Areas (EBSA) activities. The ultimate goal of BC2 is to be able to bring biodiversity information into the process of defining VMEs and/or EBSAs, by predicting hot-spots of biodiversity, by providing information on the distribution of rare and/or endangered species, by providing information on areas essential in the life cycle of marine species... The iMarine activities, and hence the D4Science infrastructure, will be used to enhance the workflows that integrate data within the warehouses for biogeographic data, by enabling the creation of tools for quality control, and by bringing biodiversity data together with non-biodiversity environmental data.
The cluster discussion at the iMarine Board meeting was summarized by OBIS (Edward Vanden Berghe). Five goals were identified as products that can be delivered; for each of these, an initial set of objectives emerged that requires further discussion. The goals are: Taxon name access, Taxon name reconciliation, Occurrence data access, Occurrence data reconciliation, and Occurrence data enrichment.
Taxon Names Access
The work of CNR will have to be reviewed and validated once the service is completed by the end of April 2012. It is noted that plug-ins now exist to consume WoRMS services, and that Catalogue of Life is also available within the infrastructure. Three more reference lists used at OBIS at this point are ITIS, IRMNG and NCBI. Currently these lists are kept up to date by downloading a copy of each database and uploading it to the OBIS PostgreSQL database; this is a time-consuming process, and access to these name lists in an environment where the OBIS names are also available would enhance the workflow for OBIS.
GBIF is exposing its taxonomic names as well; this could be explored as a separate reference list.
Efforts in this context will mainly be of benefit through combination with Taxonomic Name Reconciliation.
Taxon Name Reconciliation
Currently, OBIS (Edward) uses SQL statements to merge taxonomic lists. The merging is governed by a number of rules.
The proposed service will produce a list of pairs of Taxa, each pair with a probability that the two Taxa are the same. CNR and FIN will take the lead in specifying services for Taxon Data. FAO has developed a very similar tool for vessel disambiguation that includes a well-designed UI; this will be reviewed too.
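The pairwise output described above can be illustrated with a minimal sketch. It assumes plain string similarity from Python's standard `difflib` as a stand-in for the "probability of similarity"; the function names, the threshold and the sample names are hypothetical and are not part of any iMarine or gCube service.

```python
from difflib import SequenceMatcher
from itertools import product


def normalise(name: str) -> str:
    """Lower-case and collapse whitespace so trivial differences do not count."""
    return " ".join(name.lower().split())


def candidate_pairs(list_a, list_b, threshold=0.85):
    """Return (name_a, name_b, score) for pairs scoring at or above threshold.

    The score is difflib's similarity ratio in [0, 1]. It is only a proxy for
    a 'probability of similarity', not a calibrated probability.
    """
    pairs = []
    for a, b in product(list_a, list_b):
        score = SequenceMatcher(None, normalise(a), normalise(b)).ratio()
        if score >= threshold:
            pairs.append((a, b, round(score, 3)))
    # Best candidates first, so a curator can review the likeliest matches.
    return sorted(pairs, key=lambda p: p[2], reverse=True)


# Hypothetical extracts from two reference lists.
reference_a = ["Gadus morhua", "Thunnus thynnus"]
reference_b = ["Gadus morrhua", "Thunnus  Thynnus", "Salmo salar"]
matches = candidate_pairs(reference_a, reference_b)
```

A real service would probably also use authorship, rank and synonymy, as the rule-based SQL merging does; comparing every pair, as done here, is only workable for short lists.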
Data availability depends on the number of plugins the infrastructure is equipped with, one plugin for each data source / provider. It is therefore evident that the first Use Case has to be operational.
Occurrence data access
CNR is already implementing a first schedule:
- first, develop occurrence point data access, i.e. work on services giving access to occurrence points from a number of data providers. The occurrence data service will be queried by species name, and by spatial and temporal parameters;
- then, in a second phase, the taxonomic data.
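A query by species name with spatial and temporal parameters, as outlined in the schedule above, could look like the following sketch. All class, field and provider names are assumptions made for illustration; they do not describe the CNR service interface.

```python
from dataclasses import dataclass
from datetime import date


@dataclass(frozen=True)
class Occurrence:
    species: str
    lat: float
    lon: float
    observed: date
    provider: str


@dataclass(frozen=True)
class OccurrenceQuery:
    species: str
    min_lat: float
    max_lat: float
    min_lon: float
    max_lon: float
    start: date
    end: date

    def matches(self, rec: Occurrence) -> bool:
        # Simple bounding box; boxes crossing the antimeridian are not handled.
        return (
            rec.species.lower() == self.species.lower()
            and self.min_lat <= rec.lat <= self.max_lat
            and self.min_lon <= rec.lon <= self.max_lon
            and self.start <= rec.observed <= self.end
        )


def search(query, providers):
    """Collect matching records from every provider (name -> list of records)."""
    return [
        rec
        for records in providers.values()
        for rec in records
        if query.matches(rec)
    ]


# Hypothetical in-memory providers standing in for remote plug-ins.
providers = {
    "provider_a": [Occurrence("Gadus morhua", 51.2, 2.9, date(2010, 5, 1), "provider_a")],
    "provider_b": [
        Occurrence("Gadus morhua", 70.0, 20.0, date(2010, 5, 1), "provider_b"),
        Occurrence("Salmo salar", 51.0, 3.0, date(2010, 6, 1), "provider_b"),
    ],
}
query = OccurrenceQuery("gadus morhua", 50.0, 55.0, 0.0, 5.0, date(2010, 1, 1), date(2010, 12, 31))
hits = search(query, providers)
```

Keeping one entry per provider mirrors the one-plugin-per-data-source arrangement: adding a provider widens the search without changing the query.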
Occurrence Data Reconciliation
There may be overlaps and gaps between the datasets contained in 2 (or more) repositories. With millions of occurrence records, support is needed to identify both the gaps and the overlaps, not only at data level, but also at dataset level.
- OBIS and GBIF can serve data through an 'occurrences service';
- The project partners have to consider how to define an 'occurrence service' for 'singleton'/'duplicate' identification;
CNR has plans to initiate work in May.
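As a minimal sketch of 'singleton'/'duplicate' identification at record level, two repositories can be compared on a normalised key. The key used here (name, rounded coordinates, date) and the record layout are assumptions chosen for illustration; the actual matching rules still have to be defined by the project partners.

```python
def record_key(rec, decimals=2):
    """Build a comparison key from a (name, lat, lon, day) tuple.

    Rounding the coordinates makes records that differ only by small
    positional noise compare as equal.
    """
    name, lat, lon, day = rec
    return (name.strip().lower(), round(lat, decimals), round(lon, decimals), day)


def reconcile(repo_a, repo_b):
    """Split the records of two repositories into duplicates and singletons."""
    keys_a = {record_key(r) for r in repo_a}
    keys_b = {record_key(r) for r in repo_b}
    return {
        "duplicates": keys_a & keys_b,  # overlap between the repositories
        "only_a": keys_a - keys_b,      # missing from repository B
        "only_b": keys_b - keys_a,      # missing from repository A
    }


# Hypothetical records: (scientific name, latitude, longitude, ISO date).
repo_a = [
    ("Gadus morhua", 51.2001, 2.9002, "2010-05-01"),
    ("Salmo salar", 60.0, 5.0, "2011-06-01"),
]
repo_b = [
    ("gadus morhua ", 51.2, 2.9, "2010-05-01"),
    ("Thunnus thynnus", 36.0, -5.5, "2009-08-15"),
]
result = reconcile(repo_a, repo_b)
```

The same set comparison could be applied at dataset level by using dataset identifiers as keys instead of individual records.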
Occurrence data enrichment
By the end of April CNR expects to complete the activity on 'occurrence point access'. The enrichment will come in a successive phase, also since this depends on results of other clusters (namely the Geo-spatial one);
With occurrence data enrichment, a user would call a service that, either on-line or in batch mode, takes a set of spatio-temporal parameters and a set of occurrence points, and queries an external environmental data repository to extract spatially explicit information. For example, for 10,000 points, the nearest 1000 sea surface temperatures are interpolated over a 1-month period, and returned as average, max, min and std for each point.
For flagging outliers on land, gazetteers are available; in a marine environment, however, the notion of space is different, and iMarine can contribute truly innovative solutions.
CNR sees a role for the other project partners and the iMarine Board in guiding the classification of Occurrence Points, e.g. survey data rather than specimen.
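The example above can be sketched as follows. The sketch assumes the environmental samples have already been restricted to the requested time window, and it summarises the k nearest values with plain statistics instead of a true interpolation; function names and sample values are hypothetical.

```python
import heapq
import math
import statistics

EARTH_RADIUS_KM = 6371.0


def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance between two points given in decimal degrees."""
    p1, p2 = math.radians(lat1), math.radians(lat2)
    dp = p2 - p1
    dl = math.radians(lon2 - lon1)
    a = math.sin(dp / 2) ** 2 + math.cos(p1) * math.cos(p2) * math.sin(dl / 2) ** 2
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(a))


def enrich(points, samples, k=1000):
    """Attach summary statistics of the k nearest samples to each point.

    points:  iterable of (lat, lon)
    samples: list of (lat, lon, value), pre-filtered to the time window
    """
    enriched = []
    for lat, lon in points:
        nearest = heapq.nsmallest(
            k, samples, key=lambda s: haversine_km(lat, lon, s[0], s[1])
        )
        values = [s[2] for s in nearest]
        enriched.append({
            "lat": lat,
            "lon": lon,
            "avg": statistics.mean(values),
            "max": max(values),
            "min": min(values),
            "std": statistics.pstdev(values),  # population standard deviation
        })
    return enriched


# Hypothetical sea surface temperature samples (degrees Celsius).
sst = [(0.0, 0.0, 10.0), (0.0, 1.0, 12.0), (50.0, 50.0, 30.0)]
rows = enrich([(0.0, 0.0)], sst, k=2)
```

For 10,000 points and a large repository, a spatial index would be needed in place of this linear scan, which is one of the dependencies on the geospatial cluster.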
Taxon Distribution Modeling
Use OBIS data to generate species distribution maps and to analyse occurrence data using a variety of modeling tools, including AquaMaps. The tools proposed are the CRIA openModeller suite and the gCube statistical services.
There are several dependencies between the different use cases. These can be graphically represented as follows:
        ---Taxon Name Enrichment
        |
Taxon Name Access -----------------> Taxon Name Reconciliation
        |
Occurrence Data Access ------------> Occurrence Data Reconciliation
        |
Environmental Data Access ----|      |
                              |--> Occurrence Data Enrichment
                              |      |
                              ------> Taxon Distribution Modelling
Resources and Constraints (The Inputs)
The iMarine project was designed with a clear vision of the need to support challenging data access and management scenarios. It also anticipated that specialized resources would have to be identified after the project started, e.g. by establishing collaborations with specialized departments in project partners' institutions (FAO, IRD), and with related EA-CoP projects such as AgInfra. A quick assessment of some potential resources to include can be used to identify the next steps to bring them to the e-Infrastructure. The resources in this project that can be included are listed by contributing project partner.
The tables below list the resources by:
- Name: a short identifier;
- Source: a URL or other resource identifier;
- MoSCoW: Must, Should, or Would the resource be exploitable through the e-Infrastructure;
- Purpose: in what scenario / Use Cases the resource is needed.
OBIS Taxonomic data:
|World Register of Marine Species||Link to WoRMS||Must||Taxon name access|
|Catalogue of Life||Link to CoL||Must||Taxon name access|
|Integrated Taxonomic Information System||Link to ITIS||Must||Taxon name access|
|Interim Register of Marine and Non-marine Genera||Link to IRMNG; discriminating between marine and non-marine, but also recent and fossil taxa||Should||Taxon name access|
|National Center for Biotechnology Information||Link to NCBI and GenBank||Should||Taxon name access|
- Note: The OBIS and GBIF taxonomy are already available, but should not be used as a source for taxonomy – OBIS is a ‘consumer’ of the taxonomy, not an authoritative source. The same is true for GBIF.
|OBIS||Link to OBIS||Must||Occurrence data access|
|GBIF||Link to GBIF||Must||Occurrence data access|
There are several more ‘thematic sub-networks’ of GBIF, such as VertNet. We could check whether GBIF has all the VertNet data, just as we can do for OBIS. Let’s concentrate on OBIS to build the tools; afterwards we can try and ‘sell’ this to the VertNet people. ‘VertNet’ is the Vertebrates Network, and is itself a network with specialised nodes dealing with mammals, reptiles, fish and birds.
Once the above 2 have been decided in this cluster, the advice of the geospatial partners and cluster will be sought for the later use cases. Several data sets are essential as input for the environmental envelope modelling activities foreseen with openModeller; many others are desirable. Some of the more pertinent data sources are:
- Bathymetry
- ETOPO (OBIS is using the 1-minute grid);
- as an alternative, or in addition, we might want to explore GEBCO: it has a reputation for being more reliable, especially in coastal waters, and now exists as a 30-second grid.
- There are a number of parameters we might want to derive from the bathymetry (such as rugosity, slope and aspect)
- Physical/Chemical Oceanography
- World Ocean Database 09; available from the USA National Oceanographic Data Center in Silver Spring, Maryland; contains data from 9 million 'casts', mostly vertical profiles; also includes data of species occurrence of planktonic animals, mostly obtained from US sources. The biogeographical component of WOD has been extracted from WOD and integrated in OBIS.
- World Ocean Atlas is derived from the WOD; WOD contains the raw data; WOA contains data interpolated to standard grid (lat/lon), depth and time. It would be good to check with the NODC people how they do this – instead of reducing WOD to the regular-spaced points/intervals of the WOA, we might try and adapt their algorithms to give us the best value for the position and time of the OBIS data
- The one thing missing from WOA is pH – which is very important
Others we could include: distance from ice (can be derived from datasets available from a NOAA web site); ocean colour (as a proxy for production); distance from coast and from 200-m depth contour… Last but not least: IPCC predictions for change of temperature, oxygen and pH; apparently there are some models that predict these not only for the surface but over the water column.
This list is likely to grow – it would make sense to have it on a Wiki page
FAO - Use case description, data provider
CNR - Tools and application provider, developer
CRIA - Tools and application provider, developer
FIN - .....
Specific constraints are the low level of expertise in gCube technology development in the EA-CoP and among some partners that have developed biodiversity tools. In addition, many data are volatile or incomplete, and will require specialized curation.
Strategy and Actions (from Inputs to Outputs)
This schedule will have to be further elaborated, and discussed with the iMarine Board in May. Their response can then be used at the TCom in Greece in June.
The goals and objectives have been defined and discussed at the iMarine Board meeting in Rome in March. Here, it was also decided that a biodiversity cluster be established to define objectives, and prepare outlines for VRE's, applications and services. These will then be presented to the iMarine Board and the wider EA-CoP (May 2012).
In June, the results from the EA-CoP consultation will be discussed at the TCom, to establish feasibility, usability, and usefulness of the identified Use Cases and components.
The feedback from the TCom and technical boards will then be discussed with the iMarine Board and selected EA-CoP representatives for follow-up actions.
Meanwhile, project partners can already spend effort on the first 4 Use Cases, which support the Taxon service, access to biodiversity data repositories, and discovery and download of species occurrence data: taxon name discovery, taxon name reconciliation, occurrence data access, and occurrence data reconciliation.
Appendices (Planned Effort, Resources, Meeting notes, Schedule and Others)
Planned effort & resources
FIN: From 1 July 2012 to end of the project: at least 50% time of Maricarl Ortiz (programmer VRE); 10% Nicolas.