2012.03 WP10: Data Consumption Facilities Development Monthly Activity By Task and Beneficiary


This WP10 Activity Report describes the activities performed in March 2012, by task and beneficiary.

It is part of the March 2012 Activity Report.

T10.1 Data Retrieval Facilities

NKUA Activities

NKUA has been working on evolving the Resource Registry component in the following directions:

  • Flexibility in remote data store selection. Remote data stores for read and update operations will be supported in addition to the existing local ones. This will enable the Resource Registry to function in a variety of modes and environments with minimal development effort.
  • Support for new read and write policies. Where the underlying repository provider supports it, the Resource Registry will offer the option of contacting the remote data store directly for store and retrieve operations, instead of relying on the periodic bridging iterations. Repository providers will be free to offer both or just one of the two modes of operation.
  • Evolution to a plug-in oriented architecture. Pre- and post-processing tasks that sit within the logic of repository providers but should be independent of it are being moved to separate plug-in modules. Plug-in logic will be executed at designated points in bridging cycles or triggered periodically, can be enabled or disabled easily via configuration, and will encapsulate various administrative or value-adding features such as automatic field creation or data source management.
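The plug-in hook mechanism described in the last bullet can be sketched as follows. All names here (BridgingPoint, Plugin, PluginRegistry, AutoFieldCreation) are hypothetical illustrations, not the actual gCube Resource Registry API.

```python
from enum import Enum

class BridgingPoint(Enum):
    # Designated points in a bridging cycle where plug-ins may run
    PRE_CYCLE = "pre"
    POST_CYCLE = "post"

class Plugin:
    """Base class for a pre/post-processing plug-in (hypothetical API)."""
    def __init__(self, name, enabled=True):
        self.name = name
        self.enabled = enabled  # toggled via configuration, not code changes

    def run(self, context):
        raise NotImplementedError

class AutoFieldCreation(Plugin):
    """Example value-adding plug-in: creates missing index fields."""
    def run(self, context):
        context.setdefault("fields", []).append("auto_field")
        return context

class PluginRegistry:
    def __init__(self):
        self._hooks = {p: [] for p in BridgingPoint}

    def register(self, point, plugin):
        self._hooks[point].append(plugin)

    def execute(self, point, context):
        # Disabled plug-ins are skipped without touching the provider logic
        for plugin in self._hooks[point]:
            if plugin.enabled:
                context = plugin.run(context)
        return context

registry = PluginRegistry()
registry.register(BridgingPoint.POST_CYCLE, AutoFieldCreation("auto-fields"))
result = registry.execute(BridgingPoint.POST_CYCLE, {})
# result["fields"] == ["auto_field"]
```

The key design point is that the repository provider only calls `execute` at the designated points; everything else lives in the plug-in modules.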

In addition, NKUA has been supporting FORTH in using the HTTP API of the gCube Search System, and both partners are investigating the cause of an error that prevents FORTH from getting results (ticket #226).


Issues

None

Main Achievements

None

FORTH Activities

We tested the provision of snippets by the gCube Search Service. The service returned a NullPointerException (NPE), so no testing was possible; a ticket has already been opened.

During the 1st TCOM we addressed several issues related to the communication between X-Search and the gCube search system. Initially, X-Search will be developed externally to the gCube infrastructure, and will therefore use the HTTP API of the gCube search system. However, the OpenSearch description document of the underlying search engine should be modified to contain more descriptive information.
The extended OpenSearch description document will also carry several pieces of information that should be passed to X-Search (e.g. which metadata categories to use for entity mining, the name of the field that holds the id, the URL of the actual hit, etc.).
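A rough sketch of how X-Search could read such an extended description document: the extension namespace and element names below are illustrative assumptions, not the actual schema agreed between the partners.

```python
import xml.etree.ElementTree as ET

# Standard OpenSearch 1.1 namespace plus a hypothetical extension namespace
OS_NS = "http://a9.com/-/spec/opensearch/1.1/"
XS_NS = "http://example.org/xsearch-extension"

description = f"""
<OpenSearchDescription xmlns="{OS_NS}" xmlns:xs="{XS_NS}">
  <ShortName>gCube Search</ShortName>
  <Url type="application/xml" template="http://example.org/search?q={{searchTerms}}"/>
  <xs:MiningCategories>title,abstract</xs:MiningCategories>
  <xs:IdField>dc:identifier</xs:IdField>
  <xs:HitUrlField>dc:source</xs:HitUrlField>
</OpenSearchDescription>
"""

root = ET.fromstring(description)
# Extract the extra configuration that should be "passed" to X-Search
config = {
    "mining_fields": root.findtext(f"{{{XS_NS}}}MiningCategories").split(","),
    "id_field": root.findtext(f"{{{XS_NS}}}IdField"),
    "hit_url_field": root.findtext(f"{{{XS_NS}}}HitUrlField"),
}
```

Standard OpenSearch clients would ignore the unknown namespaced elements, so the extension stays backward compatible.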

Furthermore, we discussed how we plan to proceed with the integration of X-Search into the gCube infrastructure.


Issues

None

Main Achievements

None

Terradue Activities

No report received.

T10.2 Data Manipulation Facilities

NKUA Activities

During the last month, NKUA finalized the development of the WorkflowDTSAdaptor. The adaptor is now capable of producing PE2ng execution plans that carry out the transformation process of a Data Source.

Specifically, WorkflowDTSAdaptor produces an execution plan consisting of the Merger Operator, which is responsible for merging the various transformation chains that may arise for different source content types. After WorkflowDTSAdaptor has been initialized, WorkflowDTSSubplanAdaptor can be used to construct execution plans for the individual transformation chains; merging them is delegated to the main execution plan.

For each transformation chain a separate execution plan is constructed. That plan is composed of a Data Source Operator, responsible for retrieving the Data Source and forwarding Data Elements of a specific content type, and Transformation Operators, which perform simple transformations on Data Elements. The output of the last Transformation Operator, a ResultSet containing Data Elements transformed to the desired target Content Type, is assigned to the Merger Operator through the ResultSet that the Merger Operator takes as input. Consequently, WorkflowDTSAdaptor can adapt dynamically to an arbitrary number of different transformation chains.
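The plan structure described above can be sketched schematically. The class names (Operator, TransformationChain, ExecutionPlan) are illustrative stand-ins for the PE2ng plan elements, not the actual adaptor classes.

```python
from dataclasses import dataclass, field
from typing import List

@dataclass
class Operator:
    name: str

@dataclass
class TransformationChain:
    """One chain: a Data Source Operator followed by Transformation Operators."""
    content_type: str
    operators: List[Operator] = field(default_factory=list)

@dataclass
class ExecutionPlan:
    merger: Operator
    chains: List[TransformationChain] = field(default_factory=list)

    def add_chain(self, chain: TransformationChain) -> None:
        # The output ResultSet of the chain's last operator feeds the Merger
        self.chains.append(chain)

# Build a main plan and attach one subplan per source content type
plan = ExecutionPlan(merger=Operator("Merger"))
for ctype in ["text/xml", "application/pdf"]:
    subplan = TransformationChain(
        ctype, [Operator("DataSource"), Operator("Transform")]
    )
    plan.add_chain(subplan)
# plan now merges an arbitrary number of chains, one per content type
```

The point being illustrated is the dynamic aspect: the number of chains is not fixed at plan-construction time.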

As for the next steps, we are going to test the new adaptor extensively for integrity and performance, advance to full functionality of the transformation process, and investigate fault tolerance issues.


Issues

None

Main Achievements

The development phase of the WorkflowDTSAdaptor has finished.

CNR Activities

No report received.

FAO Activities

No report received.

T10.3 Data Mining and Visualisation Facilities

CNR Activities

During the reporting period, CNR activity focused on the Bio-Climate Analysis. The principal aim of the analysis was to evaluate the impact of climate change on marine species distributions. Experiments were made to understand how variations in some sea characteristics can influence the suitability of certain regions for the survival and reproduction of marine species. The experimental setup consisted of:

  • A benchmark set of 11,549 marine species, coming from the FishBase data source, with associated meta-information indicating life stages, scientific names, ranks, authors' names, etc.
  • Two tables, named HCAF and HCAF2050, holding the values of selected environmental parameters at 0.5-degree resolution. The underlying assumption is that these contain the minimum set of information which can influence a marine species' fundamental niche.
  • A modeling system able to produce spatial probability distributions over the oceans at 0.5-degree resolution. In the reported charts, the AquaMaps Suitable algorithm was used to calculate the probability that a given fish can prosper in a defined area of the sea.
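The kind of probability calculation performed by such a modeling system can be sketched as below: each environmental parameter is scored with a trapezoidal suitability envelope and the per-parameter scores are multiplied. This is a simplified illustration of an AquaMaps-style calculation, and the envelope values shown are invented for the example, not real HSPEN data.

```python
def trapezoid(value, abs_min, pref_min, pref_max, abs_max):
    """Suitability of one parameter under a trapezoidal envelope:
    1 inside the preferred range, 0 outside the absolute range,
    linear ramps in between."""
    if value <= abs_min or value >= abs_max:
        return 0.0
    if pref_min <= value <= pref_max:
        return 1.0
    if value < pref_min:
        return (value - abs_min) / (pref_min - abs_min)
    return (abs_max - value) / (abs_max - pref_max)

def cell_probability(cell, envelopes):
    """Probability for one 0.5-degree cell: product of per-parameter scores."""
    p = 1.0
    for param, env in envelopes.items():
        p *= trapezoid(cell[param], *env)
    return p

# Illustrative envelopes: (abs_min, pref_min, pref_max, abs_max)
envelopes = {
    "sst": (5.0, 10.0, 20.0, 28.0),       # sea surface temperature
    "salinity": (30.0, 33.0, 36.0, 38.0),
}
cell = {"sst": 15.0, "salinity": 34.0}
# cell_probability(cell, envelopes) == 1.0 (both values in preferred ranges)
```

Running this over every cell of an HCAF-style table yields the spatial probability distribution for one species.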

The analysis was performed at various levels. Official subdivisions of the seas defined by FAO were used: FAO Major Areas and Large Marine Ecosystems (LMEs) were selected. The analysis was carried out by enriching the functionalities already present in the AquaMaps gCube application with new technological facilities added to the Statistical Manager. In particular, the following techniques were added:

  • Usage of the Rainy Cloud computational infrastructure for executing probability model calculations
  • Interpolation between two different climate scenarios
  • Visualization of trends
  • Analysis of the variation of occupancy probabilities (HSPEC analysis)
  • Analysis of the trends and of the HSPECs for each FAO Area or LME
  • Analysis of the variations of the Envelopes (HSPEN analysis)
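As a rough illustration of the interpolation facility, the sketch below linearly interpolates a per-cell environmental parameter between the 2012 and 2050 scenario tables for an intermediate year. Linear interpolation is a simplifying assumption here; the actual Statistical Manager facility may use a different scheme, and the cell values are invented.

```python
def interpolate_scenarios(table_2012, table_2050, year):
    """Per-cell linear interpolation between two climate scenario tables
    (illustrative; assumes the same cell keys in both tables)."""
    t = (year - 2012) / (2050 - 2012)
    return {cell: v + t * (table_2050[cell] - v) for cell, v in table_2012.items()}

# Invented sea-surface-temperature values for two 0.5-degree cells
sst_2012 = {"cell_a": 14.0, "cell_b": 18.0}
sst_2050 = {"cell_a": 16.0, "cell_b": 19.0}

sst_2031 = interpolate_scenarios(sst_2012, sst_2050, 2031)
# 2031 is halfway, so cell_a: 14.0 + 0.5 * 2.0 = 15.0
```

Feeding such interpolated tables into the probability models is what makes trend visualization over intermediate years possible.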

The following images show the principal results obtained:

Trend of the ice concentration from 2012 to 2050.
Trend of the salinity from 2012 to 2050.
Trend of the sea surface temperature from 2012 to 2050.
Effect of climate variations on the distribution of the species in the oceans.


Issues

No deviations from the foreseen actions.

Main Achievements

  • Introduction of interpolation facilities between two different climate scenarios
  • Evaluation of changes over time in the probability distributions of marine species
  • Analysis focused on FAO Areas and LMEs

NKUA Activities

No report received.

FAO Activities

FAO continued the development of the Spatial REallocation of Aquatic Data (SPREAD) functionality. This requires consuming the AquaMaps species-distribution data stored in the D4Science e-infrastructure. Because FAO and the D4Science infrastructure rely on different technologies, consuming AquaMaps data from the remote infrastructure is experiencing delays. The matter is under investigation.



T10.4 Semantic Data Analysis Facilities

FORTH Activities

We have developed a new version of the X-Search service, whose functionality can be summarized as follows:

  • Provision of snippet-based results clustering over any search system (that returns textual snippets and for which there is an OpenSearch description)
  • Provision of snippet- or contents-based entity mining (generic as well as vertical, based on predetermined entity categories and lists)
  • Provision of gradual faceted (session-based) search
  • Ability to gradually restrict the answer based on the selected entities and/or clusters
  • Ability to fetch (by querying SPARQL endpoints) and display the semantic information of an identified entity
  • Ability to apply these services to any web page through a Web browser (bookmarklet)
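The snippet-based entity mining above can be illustrated with a minimal sketch that scans snippet text against predetermined category lists. The category names match those used in the FIGIS prototype, but the entity lists and the matching logic here are hypothetical simplifications (the real implementation must handle far larger lists and more robust matching).

```python
# Tiny illustrative entity lists; the real vertical lists contain
# thousands of names per category.
ENTITY_LISTS = {
    "Countries": ["Greece", "Norway", "Japan"],
    "Marine Species": ["Thunnus albacares", "Gadus morhua"],
}

def mine_entities(snippet):
    """Return, per category, the list entities mentioned in the snippet."""
    found = {}
    text = snippet.lower()
    for category, names in ENTITY_LISTS.items():
        hits = [n for n in names if n.lower() in text]
        if hits:
            found[category] = hits
    return found

snippet = "Catches of Gadus morhua reported by Norway declined in 2011."
# mine_entities(snippet) ==
#     {"Countries": ["Norway"], "Marine Species": ["Gadus morhua"]}
```

The mined entities then drive the faceted restriction: selecting "Norway" keeps only hits whose snippets mention that entity.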

We have developed two prototypes.

The first is a general version over a Web search engine (Google in our case) and the FactForge SPARQL endpoint, which includes 8 LOD datasets (including DBpedia, Freebase, GeoNames, WordNet, etc.).

The second is a prototype over the FIGIS search component and the FLOD SPARQL endpoint. The FIGIS search component can receive queries through an HTTP API. Apart from formatted HTML, the search results can be returned in an XML format that uses the Dublin Core schema to encapsulate bibliographic information. Each returned hit has various textual elements, including the publication title and abstract; the first is around 9 words, the second cannot exceed 3,000 characters. As concerns entity mining, we identified (in collaboration with FAO) the following relevant categories: Countries, Water Areas, Regional Fisheries Bodies, and Marine Species. For each one there is a list of entities: 240 countries, 28 water areas, 47 regional fisheries bodies, and 8,277 marine species, 8,592 names in total. Each such entity is also described and interlinked in the Fisheries Linked Open Data (FLOD) RDF dataset. FLOD's extended network of entities is exposed via a public SPARQL endpoint and web-based services.

The objective is to investigate how to enrich keyword search with entity mining, where the identified entities are linked to entities in the FLOD endpoint, from which semantic descriptions can be created and served.
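The lookup step, linking a mined entity name to its description in an RDF dataset, might be sketched as a SPARQL query builder. The `rdfs:label` lookup is an illustrative assumption; the predicates actually used in FLOD may differ, and the query here is only built, not sent.

```python
def entity_lookup_query(entity_name):
    """Build a SPARQL query resolving a mined entity name to matching
    resources and their properties (illustrative predicate choice)."""
    return f"""PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>
SELECT ?entity ?p ?o WHERE {{
  ?entity rdfs:label ?label .
  FILTER (lcase(str(?label)) = "{entity_name.lower()}")
  ?entity ?p ?o .
}}"""

query = entity_lookup_query("Thunnus albacares")
# The query string can then be POSTed to a public SPARQL endpoint,
# and the bindings rendered as the entity's semantic description.
```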

The running prototypes are available at http://www.ics.forth.gr/isl/ios


Issues

None

Main Achievements

None

FAO Activities

No report received (the responsible officer is on leave).
