Semantic cluster

The main purpose of the Cluster work plan (template here) is to give the iMarine Board a management tool that serves both as a framework for planning activities and as a guide for carrying them out. Its scope is thus the interface between the Board and the activities of the project's Work Packages. Once drafted, a work plan needs approval from the iMarine Board, following the Board procedures.


Executive Summary

The iMarine Semantic Cluster maintains and promotes a Work Plan (this document) aimed at:

  • organizing collections of requirements gathered from the iMarine Business Cases
  • providing recommendations for the implementation of the iMarine infrastructure.

The requirements are inputs for the cluster. They come from the iMarine Business Cases, which are grouped as follows:

  • Support to the regional (Africa) LME pelagic EAF community [1]
  • the FAO deep seas fisheries programme
  • the UN EAF (Ecosystem Approach to Fisheries)

The recommendations are outputs from the cluster, primarily intended for the iMarine Board, the iMarine project partners (Work Packages) and the Communities of Practice (CoP) identified within the Ecosystem Approach. They are aimed at releasing infrastructure services such as:

  • setting up ontologies from controlled vocabularies of the domain: species taxonomy, fishing vessel and gear codes (FAO, DG-MARE code lists...)
  • creation of Linked Open Data through enrichment of metadata with URIs of ontologies (TLO, Ecoscope, FLOD, WORMS): bibliographic references, OGC metadata (data sources and related services, including processes), EML metadata, .pdf / .doc files
  • workflow for massive RDF generation, storage and publication (triple store, SPARQL endpoint, OpenSearch)
  • seamless access to metadata catalogues through search engines based on ontologies

Such Infrastructure Services can be used by the iMarine eScience services (VREs & Apps): species manager, geoexplorer, iMarine search engine.

Introduction and Background (The Problems)

Currently, some datasets are freely available (GBIF, OBIS, INSPIRE...) but difficult to retrieve because the related metadata are heterogeneous. The names of creators and the other tags used to annotate these resources with related entities of the domain (species, fishing gears, fisheries...) rarely use the same terms. Data discovery is thus complicated: users have to try synonyms for the same concept, in multiple languages, to retrieve the datasets. Ontologies can help in matching terms and improving data discovery.

Semantic Web and ontologies enable data producers to create richer metadata. Metadata usually follow an XML schema in which tag values (such as keywords or persons) are literals. This is the case for Dublin Core metadata, OGC metadata and EML metadata. These XML metadata with literals can be transformed into RDF metadata with URIs of ontologies. This can be achieved programmatically with text mining applications.
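The transformation described above can be sketched as follows. This is a minimal illustration and not the iMarine tooling: a plain label lookup stands in for a real text mining application, and the label index, the example.org URIs and the sample record are all assumptions made up for the example. Literal subjects that match a known label (including synonyms in other languages) become URI-valued triples; unmatched literals are kept as literals.

```python
import xml.etree.ElementTree as ET

DC_NS = "http://purl.org/dc/elements/1.1/"
DC_SUBJECT = "http://purl.org/dc/elements/1.1/subject"   # used for plain literals
DCTERMS_SUBJECT = "http://purl.org/dc/terms/subject"     # used for concept URIs

# Hypothetical label index: lower-cased labels and synonyms mapped to
# made-up concept URIs (not real TLO, FLOD or Ecoscope identifiers).
LABEL_TO_URI = {
    "yellowfin tuna": "http://example.org/species/thunnus_albacares",
    "thunnus albacares": "http://example.org/species/thunnus_albacares",
    "rabil": "http://example.org/species/thunnus_albacares",
}

# Hypothetical Dublin Core record with literal subjects.
RECORD = """<record xmlns:dc="http://purl.org/dc/elements/1.1/">
  <dc:identifier>http://example.org/dataset/42</dc:identifier>
  <dc:subject>Yellowfin tuna</dc:subject>
  <dc:subject>Rabil</dc:subject>
  <dc:subject>purse seine</dc:subject>
</record>"""


def record_to_ntriples(xml_text, index=LABEL_TO_URI):
    """Return sorted, de-duplicated N-Triples lines for the record's subjects."""
    root = ET.fromstring(xml_text)
    resource = root.findtext(f"{{{DC_NS}}}identifier").strip()
    triples = set()
    for subject in root.findall(f"{{{DC_NS}}}subject"):
        label = (subject.text or "").strip()
        uri = index.get(label.lower())
        if uri is not None:
            # Known concept: replace the literal with the concept URI.
            triples.add(f"<{resource}> <{DCTERMS_SUBJECT}> <{uri}> .")
        else:
            # Unknown term: keep it as an escaped literal.
            escaped = label.replace("\\", "\\\\").replace('"', '\\"')
            triples.add(f'<{resource}> <{DC_SUBJECT}> "{escaped}" .')
    return sorted(triples)


for line in record_to_ntriples(RECORD):
    print(line)
```

In this sketch the English and Spanish labels collapse into a single URI-valued triple, which is the effect that makes discovery independent of the synonym or language used by the data producer.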

However, the main issue is the lack of ontologies for the domain of Ecosystem Approach to Marine Resources. Many initiatives have been dealing with related sub-domains:

  • species:
    • Worms [2] which is not a real ontology but is translated into RDF [3]
    • NASA Semantic Web for Earth and Environmental Terminology (SWEET ontologies [4])
    • ontologies for ecoinformatics [5]
  • fisheries sciences: Neon with FAO [6]

On top of these ontologies, there is a need to build a new top-level ontology that reuses parts of existing ones (including those for information resources: Dublin Core, FOAF, Dclite4g [7], Genesi-dec [8]...).

Such ontologies can be used to set up knowledge bases by instantiating the underlying classes and properties. Indeed, concepts are not only URIs used to annotate information resources; they also carry a set of properties describing the relationships between entities of the domain: which species preys on which other species, which fishing gears target a given species, where given vessels are fishing... Knowledge bases can thus be used to set up Web portals summarizing knowledge about entities: fact sheets about species, fishing gears, ecosystems, fisheries...
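As a rough illustration of how a knowledge base of instantiated properties can feed a fact sheet, the sketch below stores a handful of triples in memory and collects everything said about one entity. The URIs, property names and facts are hypothetical placeholders chosen for the example; a real deployment would query a triple store through its SPARQL endpoint instead.

```python
EX = "http://example.org/"  # made-up namespace, for illustration only

# Hypothetical mini knowledge base: (subject, predicate, object) triples.
TRIPLES = {
    (EX + "yellowfin_tuna", EX + "label", "Yellowfin tuna"),
    (EX + "yellowfin_tuna", EX + "preysOn", EX + "flying_fish"),
    (EX + "purse_seine", EX + "targets", EX + "yellowfin_tuna"),
    (EX + "longline", EX + "targets", EX + "yellowfin_tuna"),
}


def match(triples, s=None, p=None, o=None):
    """Return the triples matching a pattern; None acts as a wildcard."""
    return sorted(
        t for t in triples
        if (s is None or t[0] == s)
        and (p is None or t[1] == p)
        and (o is None or t[2] == o)
    )


def fact_sheet(triples, entity):
    """Group what the entity states (outgoing) and what is stated about it (incoming)."""
    sheet = {"outgoing": {}, "incoming": {}}
    for _, predicate, obj in match(triples, s=entity):
        sheet["outgoing"].setdefault(predicate, []).append(obj)
    for subj, predicate, _ in match(triples, o=entity):
        sheet["incoming"].setdefault(predicate, []).append(subj)
    return sheet


sheet = fact_sheet(TRIPLES, EX + "yellowfin_tuna")
for direction, properties in sheet.items():
    for predicate, values in properties.items():
        print(direction, predicate, values)
```

The incoming direction matters here: the gears targeting a species are stated on the gear entities, so a fact sheet built only from the species' own properties would miss them.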

Automated fact sheet generation is a key issue for iMarine, given that many systems already publish their own fact sheets:

  • Worms Yellowfin Tuna fact sheet [9]
  • FIRMS Yellowfin Tuna fact sheet [10]
  • Fishbase Yellowfin Tuna fact sheet [11]
  • Encyclopedia Of Life Yellowfin Tuna fact sheet [12]
  • GBIF Yellowfin Tuna fact sheet [13]

Generating such fact sheets directly from RDF requires the content of the underlying information systems to be made available in RDF. iMarine VREs and apps can help achieve this goal. Indeed, applications like "species manager" can combine information from different sources (OBIS, WORMS, GBIF, Fishbase...) and export the resulting mapping in RDF (compliant with TLO).
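The kind of cross-source mapping export mentioned above might look like the sketch below. It is only an assumption of how such a reconciliation could work, not the species manager implementation: records are grouped by normalized scientific name, and skos:exactMatch is used as a stand-in vocabulary because the TLO schema is not reproduced here. The record URIs are placeholders, not real source identifiers.

```python
from collections import defaultdict
from itertools import combinations

SKOS_EXACT_MATCH = "http://www.w3.org/2004/02/skos/core#exactMatch"

# Hypothetical source records: (source, placeholder URI, scientific name).
RECORDS = [
    ("WORMS", "http://example.org/worms/A1", "Thunnus albacares"),
    ("GBIF", "http://example.org/gbif/B2", "thunnus  albacares"),
    ("Fishbase", "http://example.org/fishbase/C3", "Thunnus albacares"),
    ("OBIS", "http://example.org/obis/D4", "Katsuwonus pelamis"),
]


def normalize(name):
    """Lower-case the name and collapse repeated whitespace."""
    return " ".join(name.lower().split())


def reconcile(records):
    """Group record URIs that share the same normalized scientific name."""
    groups = defaultdict(list)
    for _source, uri, name in records:
        groups[normalize(name)].append(uri)
    return groups


def mapping_to_ntriples(groups):
    """Emit one exactMatch triple per pair of URIs within each group."""
    lines = []
    for uris in groups.values():
        for left, right in combinations(sorted(uris), 2):
            lines.append(f"<{left}> <{SKOS_EXACT_MATCH}> <{right}> .")
    return sorted(lines)


for line in mapping_to_ntriples(reconcile(RECORDS)):
    print(line)
```

Matching on the normalized name alone is deliberately naive; real reconciliation would also need to handle synonyms, authorship and taxonomic revisions.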

Other domains face similar issues, and research projects like agInfra suggest methods and tools that have to be taken into account in the framework of iMarine.

Goals and Objectives (The Outputs)

Outputs of the cluster are roadmaps, trade-off analyses and guidelines for the development, deployment and maintenance of infrastructure services involving semantic resources and technology, such as:

  • publication of the species manager VRE results (code mapping / reconciliation) as RDF (based on the Top Level Ontology schema)
  • publication of iMarine geonetwork metadata (about data sources and related services: WMS / WFS / WCS / WPS...) as RDF (based on the GENESI-DEC schema)
  • RDF generation from various types of information resources (Web pages, OGC metadata / CSW URL, .pdf / .doc files, bibliographic references...)

Such Infrastructure Services are needed by the iMarine eScience services (VREs & Apps) and other web service endpoints.

A validation process aims at matching the cluster outputs with 'consuming' eScience services such as:

  • a VRE to provide GUIs to facilitate RDF generation through iMarine Tagger
  • a VRE to provide a search engine for iMarine enabling seamless access to different metadata catalogues (iMarine native metadata element set, OGC, publications, pictures...)
  • Smartfish Web portal
  • Fact sheet generator (e.g. Tuna Atlas Use Case)

Resources and Constraints (The Inputs)

The Business Cases requirements are inputs for the cluster. They come from the following Business Cases:

  • Smartfish
  • Tuna Atlas

Other inputs:

  • RDF sources for domain entities: FAO FLOD (species, vessels, areas and related properties), IRD Ecoscope (species, vessels, ecosystems and related properties), WORMS (taxon ranks and related properties), Species manager VRE (species and codes).
  • RDF sources for information resources metadata: FAO FLOD (publications, ??), IRD Ecoscope (pictures, databases, publications, people...), iMarine geonetwork

Strategy and Actions (from Inputs to Outputs)

Another Wiki page is dedicated to Semantic cluster achievements [14] related to iMarine Board Work Plan [15].

Building on the strengths and skills of the iMarine partners contributing to the Semantic Cluster, the following actions have been carried out or are underway:

  • leveraging the FLOD and Ecoscope knowledge bases,
  • implementing SPARQL endpoints,
  • implementing OpenSearch,
  • implementing a new schema for RDF metadata (GENESI-DEC),
  • using the FORTH search engine (xSearch) on top of the FLOD and Ecoscope knowledge bases (including OpenSearch for results and SPARQL endpoints for clustering),
  • using the FORTH entity / text mining application with FLOD and Ecoscope to highlight Web pages,
  • using FORTH entity / text mining to annotate new kinds of information resources (bibliographic references, OGC metadata...).

For each of them, the plan (by January 2013) is to review and benchmark its added value according to the following iMarine standard review:

  • Who are the Users
  • Who are the co-funding partners
  • What are the iMarine infrastructure resources involved
  • What are the outcomes that match the iMarine Description of Work
  • How do they fit in the EA-CoP business cases
  • How do they contribute to the sustainability of an EA-CoP
  • How far are they re-usable with clear benefits to EA-CoP representatives, and proven compatibility with EA-CoP resources
  • How far are they consistent with EC regulations/strategies such as open data strategy for Europe [16].

Cluster Participants and Roles

  • IRD:
    • provides an ontology about domain entities and related information resources metadata,
    • provides expertise about the domain (Ecosystem Approach to Marine Resources) through its underlying research laboratory
  • FAO:
    • provides an ontology which deals with entities of the domain (vessel, gear, linneantaxonomy, port, flagstate, area: sea, eez, statisticaldivision, rfb..),
    • provides Linked Open Data (publications) which are annotated with FLOD ontologies URIs
  • FORTH:
    • provides expertise in setting up ontologies and works on the TLO [17]
    • provides tools to annotate information resources and discover them through a search engine that exploits ontologies (for clustering results...)

Appendix A - Resources

  • Wiki page about Top Level Ontology / TLO [18]
  • Ongoing version of TLO [19]
  • Previous version of TLO [20]
  • FORTH xSearch [21]
  • FORTH tagger [22]
  • Ecoscope fact sheet example [23]

Appendix B - Budget

Appendix C - Schedule

The Semantic Cluster aligns its work plan with the milestones of its primary 'customer', namely the iMarine Board meetings scheduled throughout the lifetime of the iMarine project:

  • Semester 1 (Nov 2011 - Apr. 2012);
    • Mobilization phase: identification of opportunities for collaboration and technologies
    • Semantic Cluster support:
  • Semester 2 (May 2012 - Oct. 2012);
    • Stabilization phase: validation of opportunities and definition of the technology scope
    • Semantic Cluster support:
  • Semester 3 (Nov 2012 - Apr. 2013);
    • Experimentation phase: with technologies, and with expansion of the EA-CoP user base
    • Semantic Cluster support:
  • Semester 4 (May 2013 - Oct. 2013);
    • Validation phase: collaboration structures and EA-CoP requirements consolidation
    • Semantic Cluster support:
  • Semester 5 (Nov 2013 - Apr. 2014);
    • Exploitation phase: operations through EA-CoP collaboration frameworks
    • Semantic Cluster support:

Appendix D - Documents

TCOM Documents

  • OGC/ISO Publishing guidelines for Data and Services Providers. Use Cases and links with the Statistical Cluster (and VREs) and Semantic Cluster (Tuna Atlas fact sheets and indicators) TCOM-4 Oostende, Belgium 23-25 January 2013 at:
  • T10.4-Semantic Data Analysis FORTH 4th TCOM.pdf TCOM-4 Oostende, Belgium 23-25 January 2013 [24]
  • T10.4-FLOD initiative TCOM-4 Oostende, Belgium 23-25 January 2013 [25]

Appendix E - Other

iMarine Technical Guidelines

  • Publishing guidelines for Data and Services Providers [26]