OGC/ISO Publishing guidelines for Data and Services Providers

From IMarine Wiki

This page presents the guidelines for data and service providers.

The following sections describe what is required to connect data and related services to the geospatial cluster.

What kind of data and services do we aim to provide?

This section lists the data and services that are in the process of being shared with the iMarine infrastructure.

Data sources

  • A data sources inventory is provided here. It lists the collections, datasets and related services that data providers are in the process of sharing.
  • Other data providers who want to share geo-related data with the Geospatial Cluster are invited to contribute to this page.

Related services

On top of the data sources described in the previous section, the Geospatial Cluster will deliver different Web Services:

  • Data discovery services through CSW (metadata catalogs), which make users aware of the available data,
  • Data access services through OGC WxS services (WMS, WFS and WCS) and other potentially relevant web services (e.g. SDMX for statistical data), which make data accessible,
  • Data processing services through OGC WPS XML descriptions, which make processes accessible.

The list of services in the process of being shared is described here.

A CSW URL can be sufficient to discover existing Web Service instances through metadata catalogs and to query them (GetCapabilities). For this to work, each Web Service instance has to be described according to the ISO 19119 / 19139 standards.
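As an illustration of what "providing a GetCapabilities URL" means in practice, the following minimal sketch builds a key-value-pair GetCapabilities request for any OGC service. The endpoint `https://example.org/geonetwork/srv/eng/csw` and the function name `get_capabilities_url` are hypothetical; only the `service`, `request` and `version` parameters come from the OGC specifications.

```python
from urllib.parse import urlencode


def get_capabilities_url(endpoint: str, service: str, version: str = "") -> str:
    """Build a key-value-pair GetCapabilities request for an OGC service."""
    params = {"service": service, "request": "GetCapabilities"}
    if version:
        params["version"] = version
    # Append to an existing query string if the endpoint already has one.
    separator = "&" if "?" in endpoint else "?"
    return endpoint + separator + urlencode(params)


# Hypothetical endpoint, for illustration only.
csw_url = get_capabilities_url(
    "https://example.org/geonetwork/srv/eng/csw", "CSW", "2.0.2"
)
print(csw_url)
```

The same helper can be reused for WMS, WFS, WCS or WPS endpoints by changing the `service` argument.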

Once the available data and the ways to access them are well known, the aim will be to enable processes that can be combined with these data.

End products

The following end products are intended to be delivered to users by combining the data and services described above:

  • Statistical data reallocation (SPREAD): combining statistical data (time series) with geospatial data (intersections, species distributions, etc.) processed with WPS, in order to provide statistical estimates as new time series referenced at other geospatial resolutions.
  • Tuna atlas: tuna fisheries data will be handled through Web Services: made accessible through standard protocols (e.g. WFS) and formats, and processed with WPS to generate indicators. The goal is a Common Tuna Atlas replicating two different tuna atlases available online.
  • Species data enrichment: species distributions and occurrences from other fisheries databases are intended as input data for a WPS that collects environmental data for the same date and location as the species observations. However, the underlying processing currently requires the IDL language.

Metadata guidelines for data and service providers

This section provides guidelines for data and service providers, with the objective of enabling new sources of data and new processes to be handled by the iMarine infrastructure.

The guidelines are mainly based on the use of metadata standards, such as OGC/ISO-TC211, and OGC Web-Services (OWS) standards.

FAO and IRD implementations are given as examples.

Metadata Standards for data sources and related services

Dataset & Service metadata description

  • Data & services description with appropriate metadata standards
    • data sources should be described with ISO 19115:2003/19139 (dataset / database metadata)
    • related Web Services should, where possible, be described with ISO 19119/19139 (OGC WxS service metadata)
  • Association between dataset (in geographic server) & dataset metadata (in metadata catalogue)
    • A dataset and its metadata should be closely associated.
    • This association should be established on both sides:
      • in the dataset metadata, the data should be made available as WxS online resource(s)
      • in the WxS GetCapabilities, the metadata should be referenced through a MetadataURL tag
  • Association between dataset metadata & service metadata
    • All or part of these data sources (described with ISO 19115/19139) can be made available through Web Services. In this case, the Web Services have to be described, and the relationships between datasets and Web Services have to be indicated in both kinds of metadata.
    • In the service metadata, the relationship with the dataset metadata (ISO 19115:2003/19139) is established using the srv:operatesOn tag.
  • Compliance with INSPIRE
    • data sources shared with the geospatial catalog should be in the process of becoming compliant with INSPIRE
    • as a first validation step, INSPIRE compliance can be assessed with the INSPIRE metadata validator
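To check the "GetCapabilities side" of the dataset/metadata association, a provider could list the MetadataURL references declared for each layer. The sketch below assumes a WMS 1.3.0 capabilities document; the sample fragment, the layer name `example_layer` and the metadata URL are invented for illustration.

```python
import xml.etree.ElementTree as ET

NS = {"wms": "http://www.opengis.net/wms"}
XLINK_HREF = "{http://www.w3.org/1999/xlink}href"

# Hypothetical WMS 1.3.0 capabilities fragment, for illustration only.
SAMPLE_CAPABILITIES = """
<WMS_Capabilities xmlns="http://www.opengis.net/wms"
                  xmlns:xlink="http://www.w3.org/1999/xlink" version="1.3.0">
  <Capability>
    <Layer>
      <Name>example_layer</Name>
      <MetadataURL type="ISO19115:2003">
        <Format>text/xml</Format>
        <OnlineResource xlink:type="simple"
                        xlink:href="https://example.org/metadata/abc-123.xml"/>
      </MetadataURL>
    </Layer>
  </Capability>
</WMS_Capabilities>
"""


def metadata_urls(capabilities_xml: str) -> dict:
    """Map each named layer to the list of metadata links it declares."""
    root = ET.fromstring(capabilities_xml.strip())
    result = {}
    for layer in root.iter("{http://www.opengis.net/wms}Layer"):
        name = layer.findtext("wms:Name", default=None, namespaces=NS)
        if name is None:
            continue  # unnamed container layers cannot be requested
        resources = layer.findall("wms:MetadataURL/wms:OnlineResource", NS)
        result[name] = [res.get(XLINK_HREF) for res in resources]
    return result


print(metadata_urls(SAMPLE_CAPABILITIES))
```

A layer mapped to an empty list has no metadata reference yet and would need a MetadataURL entry to satisfy the guideline.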

Data & Service sharing

  • For iMarine, partners should be able to share existing data and related services in two different ways:
    • by providing URLs of CSW GetCapabilities
    • by providing URLs of WMS/WFS/WCS/WPS GetCapabilities

Examples of implementations:

Implementation Online Resource
CSW nodes (GetCapabilities)
WxS GetCapabilities
ISO 19119/19139 (service metadata)
ISO 19115/19139 (dataset metadata)

Data discovery

To share a set of dataset and service metadata served by a given CSW, the provider only has to supply the CSW URL. The iMarine CSW instance (GeoNetwork) can then harvest the related metadata, so that they do not have to be replicated in the iMarine infrastructure.

If only a subset of metadata has to be harvested from the CSW, a set of keywords could be given by the data/service provider to restrict the scope of the harvesting operation.
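A keyword-restricted harvest can be pictured as a CSW 2.0.2 GetRecords request with a CQL constraint. The sketch below is only an approximation of what a harvester might send: the endpoint is hypothetical, and the queryable name `subject` is an assumption, since the exact queryable that maps to keywords depends on the catalogue implementation.

```python
from urllib.parse import parse_qs, urlencode, urlparse


def get_records_url(endpoint: str, keywords: list, max_records: int = 10) -> str:
    """Build a CSW 2.0.2 GetRecords request restricted to some keywords."""
    # Single quotes inside a CQL string literal are escaped by doubling them.
    escaped = [kw.replace("'", "''") for kw in keywords]
    # 'subject' is an assumed queryable name; check the catalogue's own list.
    constraint = " OR ".join(f"subject = '{kw}'" for kw in escaped)
    params = {
        "service": "CSW",
        "version": "2.0.2",
        "request": "GetRecords",
        "typeNames": "csw:Record",
        "elementSetName": "full",
        "resultType": "results",
        "constraintLanguage": "CQL_TEXT",
        "constraint_language_version": "1.1.0",
        "constraint": constraint,
        "maxRecords": str(max_records),
    }
    return endpoint + "?" + urlencode(params)


# Hypothetical endpoint and keywords, for illustration only.
url = get_records_url("https://example.org/csw", ["tuna", "Indian Ocean"])
decoded = parse_qs(urlparse(url).query)
print(decoded["constraint"][0])
```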

CSW is the most powerful way to provide data and services to iMarine, since WMS/WFS/WCS GetCapabilities URLs are only part of the information carried by a CSW (through ISO 19119). For example, this IRD WFS GetCapabilities example can be found in the IRD CSW.

This approach has been validated with FAO and IRD data and services as a first step.

Additional metadata for processes

  • In addition to the ISO 19119/19139 metadata record that describes a WPS instance, each process of this instance that is made available to iMarine (by FAO, IRD and other partners) should also be described through OGC WPS metadata, i.e. a DescribeProcess request that returns the ProcessDescription.
  • This description can be obtained from the WPS instance URL with a DescribeProcess request.
  • For example, a conventional description like this one should be translated into an OGC-compliant description.
  • Another open question is whether the list of iMarine processes (deployed on the infrastructure) can be extended with those described in the WPS instances of partners who manage their own WPS servers.
  • Whatever the architecture (distributed or not), a crucial issue is the ability to suggest a suitable combination of data and processes.
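For reference, a WPS 1.0.0 process description is requested with the DescribeProcess operation. The following minimal sketch builds such a request; the endpoint `https://example.org/wps` and the process identifier `buffer` are hypothetical.

```python
from urllib.parse import urlencode


def describe_process_url(endpoint: str, identifiers: list) -> str:
    """Build a WPS 1.0.0 DescribeProcess request for one or more processes."""
    params = {
        "service": "WPS",
        "version": "1.0.0",
        "request": "DescribeProcess",
        # Several process identifiers are passed as a comma-separated list.
        "identifier": ",".join(identifiers),
    }
    # Keep commas readable instead of percent-encoding them.
    return endpoint + "?" + urlencode(params, safe=",")


# Hypothetical endpoint and process identifier, for illustration only.
wps_url = describe_process_url("https://example.org/wps", ["buffer"])
print(wps_url)
```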

Use of keywords in metadata

Once the different partner CSW nodes have been harvested, data discovery will be facilitated by keyword tags (instead of the current ad hoc URL syntax rules, which require a specific literal to appear in the URLs of WxS instances, something that is not always possible). The standard approach described here will make it possible to:

  • discover data that can be displayed by the iMarine GeoExplorer web-application (datasets served through WMS),
  • discover data inputs for a given WPS,
  • discover processes that can run with a given dataset made available through WFS and WCS,
  • cluster/categorize layers according to different needs (species, environmental data, fishing gears, fisheries, vessels...). The cluster/categorization requirements are described here (TRACT ticket).

Keywords will be introduced in two steps:

  • First step: use of literals between tags,
  • Second step: use of top-ontology URIs. The goal will be to use the controlled vocabularies of iMarine (from semantic cluster / TLO / FLOD / Ecoscope). Literals could be turned into URIs with the "matching service" (based on text / entity mining).

Examples of keyword categories:

  • Species: WoRMS identifier and/or Latin names.
  • Fishing gears: WoRMS identifier and/or Latin names.
  • Fisheries,
  • Vessels,
  • Flags, etc

The creation of an iMarine thematic thesaurus should be investigated in order to cluster discovery results.

Keywords in 19139 metadata

Example of keywords in ISO 19139:

 <gmd: ... >
   <gmd:descriptiveKeywords>
     <gmd:MD_Keywords>
       <gmd:keyword>
         <gco:CharacterString>indian Ocean</gco:CharacterString>
       </gmd:keyword>
       <gmd:keyword>
         <gco:CharacterString>ocean Indien</gco:CharacterString>
       </gmd:keyword>
       <gmd:type>
         <gmd:MD_KeywordTypeCode codeListValue="place" codeList="MD_KeywordTypeCode" />
       </gmd:type>
       <gmd:thesaurusName>
         <gmd:CI_Citation id="ID">
           <gmd:title>
             <gco:CharacterString>iMarine ontology</gco:CharacterString>
           </gmd:title>
         </gmd:CI_Citation>
       </gmd:thesaurusName>
      ...
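
To show how such keywords could be read back for discovery, the following minimal sketch groups the keywords of an ISO 19139 record by keyword type. The surrounding `gmd:MD_Metadata` wrapper in the sample and the function name `keywords_by_type` are assumptions made to obtain a well-formed, self-contained example.

```python
import xml.etree.ElementTree as ET

NS = {
    "gmd": "http://www.isotc211.org/2005/gmd",
    "gco": "http://www.isotc211.org/2005/gco",
}

# Minimal, hypothetical ISO 19139 record wrapping the keyword block.
SAMPLE_METADATA = """
<gmd:MD_Metadata xmlns:gmd="http://www.isotc211.org/2005/gmd"
                 xmlns:gco="http://www.isotc211.org/2005/gco">
  <gmd:identificationInfo>
    <gmd:MD_DataIdentification>
      <gmd:descriptiveKeywords>
        <gmd:MD_Keywords>
          <gmd:keyword>
            <gco:CharacterString>indian Ocean</gco:CharacterString>
          </gmd:keyword>
          <gmd:keyword>
            <gco:CharacterString>ocean Indien </gco:CharacterString>
          </gmd:keyword>
          <gmd:type>
            <gmd:MD_KeywordTypeCode codeListValue="place"
                                    codeList="MD_KeywordTypeCode"/>
          </gmd:type>
        </gmd:MD_Keywords>
      </gmd:descriptiveKeywords>
    </gmd:MD_DataIdentification>
  </gmd:identificationInfo>
</gmd:MD_Metadata>
"""


def keywords_by_type(metadata_xml: str) -> dict:
    """Group keyword literals by the codeListValue of their keyword type."""
    root = ET.fromstring(metadata_xml.strip())
    grouped = {}
    for block in root.iterfind(".//gmd:MD_Keywords", NS):
        type_code = block.find("gmd:type/gmd:MD_KeywordTypeCode", NS)
        kw_type = type_code.get("codeListValue") if type_code is not None else "untyped"
        words = [
            el.text.strip()
            for el in block.iterfind("gmd:keyword/gco:CharacterString", NS)
            if el.text
        ]
        grouped.setdefault(kw_type, []).extend(words)
    return grouped


print(keywords_by_type(SAMPLE_METADATA))
```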


Keywords in WPS metadata

Example of keywords in Processing description:


<DataInputs>
 <Input minOccurs="1" maxOccurs="1">
   <ows:Identifier>InputPolygon</ows:Identifier>
   <ows:Title>Polygon to be buffered</ows:Title>
   <ows:Abstract>URI to a set of GML that describes the polygon.</ows:Abstract>
   <ComplexData maximumMegabytes="5">
    <Default>
     <Format>
       <MimeType>text/xml</MimeType>
       <Encoding>base64</Encoding>
       <Schema>http://foo.bar/gml/3.1.0/polygon.xsd</Schema>
     </Format>
    </Default>
    <Supported>
     <Format>
       <MimeType>text/xml</MimeType>
       <Encoding>UTF-8</Encoding>
       <Schema>http://foo.bar/gml/3.1.0/polygon.xsd</Schema>
     </Format>
    </Supported>
   </ComplexData>
 </Input>
...
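
The Schema values of a process description can be collected per input, which is the information needed later to relate processes to datasets. The sketch below works on a shortened, well-formed version of the fragment above; it assumes the WPS 1.0.0 convention in which `Input`, `Format` and `Schema` are unqualified elements while `Identifier` belongs to the OWS 1.1 namespace.

```python
import xml.etree.ElementTree as ET

OWS = "{http://www.opengis.net/ows/1.1}"

# Shortened DataInputs fragment, closed so that it is well-formed.
SAMPLE_INPUTS = """
<DataInputs xmlns:ows="http://www.opengis.net/ows/1.1">
  <Input minOccurs="1" maxOccurs="1">
    <ows:Identifier>InputPolygon</ows:Identifier>
    <ComplexData maximumMegabytes="5">
      <Default>
        <Format>
          <MimeType>text/xml</MimeType>
          <Schema>http://foo.bar/gml/3.1.0/polygon.xsd</Schema>
        </Format>
      </Default>
      <Supported>
        <Format>
          <MimeType>text/xml</MimeType>
          <Schema>http://foo.bar/gml/3.1.0/polygon.xsd</Schema>
        </Format>
      </Supported>
    </ComplexData>
  </Input>
</DataInputs>
"""


def input_schemas(data_inputs_xml: str) -> dict:
    """Map each input identifier to the sorted, de-duplicated schemas it accepts."""
    root = ET.fromstring(data_inputs_xml.strip())
    result = {}
    for inp in root.iter("Input"):
        identifier = inp.findtext(OWS + "Identifier")
        schemas = {s.text.strip() for s in inp.iter("Schema") if s.text}
        result[identifier] = sorted(schemas)
    return result


print(input_schemas(SAMPLE_INPUTS))
```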

Combining data and processes through keywords

We aim to answer the following needs:

  • the user wants to know which processes iMarine can run on a given dataset
  • the user wants to know which datasets can be used as inputs for a given process

To answer the previous questions, there is a need to indicate in both data and processing descriptions some information to relate one with the other. The use of dataset metadata keywords was already mentioned to locate/discover datasets.

A similar approach is suggested for the discovery of "data and related processes", using the following tags:

  • in the WPS description: the data dictionary of the input dataset will be identified by using the Schema tag of the ProcessDescription to assign an identifier.
  • in the WFS description: this identifier will be used as a keyword in the WxS GetCapabilities, the dataset metadata (OGC/ISO 19115/19139), or the service metadata (OGC/ISO 19119/19139).
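
The matching rule suggested above can be sketched as a simple set intersection: a process is a candidate for a dataset when one of its input schema identifiers also appears among the dataset keywords. All class names, dataset names and identifiers below are hypothetical and only illustrate the idea.

```python
from dataclasses import dataclass, field


@dataclass
class Dataset:
    name: str
    keywords: set = field(default_factory=set)


@dataclass
class Process:
    identifier: str
    input_schemas: set = field(default_factory=set)


def processes_for_dataset(dataset: Dataset, processes: list) -> list:
    """Processes whose input schema identifier appears in the dataset keywords."""
    return [p.identifier for p in processes if p.input_schemas & dataset.keywords]


def datasets_for_process(process: Process, datasets: list) -> list:
    """Datasets carrying a keyword equal to one of the process input schemas."""
    return [d.name for d in datasets if d.keywords & process.input_schemas]


# Hypothetical catalogue content, for illustration only.
POLYGON_SCHEMA = "http://foo.bar/gml/3.1.0/polygon.xsd"
datasets = [
    Dataset("fishing_areas", {"place", POLYGON_SCHEMA}),
    Dataset("catch_time_series", {"tuna"}),
]
processes = [
    Process("buffer", {POLYGON_SCHEMA}),
    Process("reallocation", {"http://foo.bar/schemas/timeseries.xsd"}),
]

print(processes_for_dataset(datasets[0], processes))
print(datasets_for_process(processes[0], datasets))
```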

A more sophisticated approach to matching data with processes should be investigated, using the Feature Catalog metadata standard (ISO 19110) in relation with:

  • the MD_FeatureCatalogueDescription of ISO 19115/19139 metadata standard (for vector data)
  • the MD_CoverageDescription of ISO 19115/19139 metadata standard (for raster data).

This could be the next step, depending on the progress of the geospatial cluster.

Transforming OGC metadata into RDF for the semantic cluster

OGC metadata can be transformed into RDF by re-using the following XSL files:

  • XSL files [1] developed by Terradue for the GENESI-DEC project,
  • XSL files of USGS/USGIN for WMS 1.1.1 (which can be customized for iMarine needs).

However, keywords in RDF should be expressed as URIs instead of literals, and XSL alone cannot achieve this. The Semantic cluster "entity mining" / "matching service" applications could help here (the same approach as turning literals into URIs in other use cases: Zotero RDF references, OpenSearch RDF results, etc.). See related tasks and tickets: here...XXX

References
