2011.12 WP10: Data Consumption Facilities Development Monthly Activity By Task and Beneficiary

This activity report documents the activities performed in November and December 2011. It is an integral part of the Activity Report for the period.

T10.1 Data Retrieval Facilities

NKUA Activities

NKUA has been working in three main directions for this task:

  • Finalization of the gCube Query Language Specification: The formalization of the query language, which embraces all the capabilities currently offered by the infrastructure, including filtering, projection, feature matching and geospatial/temporal matching, is nearing completion, building on work performed before the start of the iMarine project. The first version of the QL specification is therefore on schedule for the D10.1 deliverable in M6. In addition, since semantics-related functionality is not currently included in the specification, NKUA and FORTH will investigate including the outcome of the collaboration with T10.4, provided the latter has reached an acceptable level of maturity by M6.
  • Collaboration with FORTH to analyze the requirements between tasks 10.1 and 10.4 and to devise a work plan: Following the discussions at the KOM, NKUA arranged a meeting with FORTH on Dec 13th to streamline collaboration between T10.1 and T10.4. In this meeting NKUA and FORTH discussed possible enhancements of the gCube Search System and QL with semantic functionality (e.g. query expansion through thesauri/ontologies/taxonomies, result clustering), as well as the format of the search results and the content model. NKUA also provided key facts about the gCube infrastructure to help FORTH familiarize themselves with the system, and started gathering requirements from FORTH, the first being support for search result snippets (teasers). A work plan for the next few months was agreed upon, and NKUA offered to compile a list of requirements and functionalities for T10.4; this work is in progress.
  • Enhancement of the gCube Search System
    • A query cache (task ticket #6) has been integrated into the gCube Search System in order to enhance its performance and scalability. The cache stores the outcome of the planning phase (abstract search plans) and will help to further reduce response times by allowing the system to avoid re-executing the planning phase.
    • The robustness of the system has been enhanced by fixing defects discovered in the Resource Registry and Search Operator components.
    • The implementation of support for search result snippets (task ticket #7) is currently in progress.
    • The implementation of a set of features for the Resource Registry, which will aid administration and provide support for UI enhancements, is in progress. These features include data source similarity detection, semi-automatic update of identical data sources, and indication of automatically vs manually managed search fields.
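
The query cache described above can be illustrated with a minimal sketch. It is not the gCube implementation: the class name `PlanCache`, the `planner` callback, the whitespace normalization and the least-recently-used eviction policy are all assumptions made for illustration. The idea shown is only that a repeated query reuses a stored abstract plan instead of triggering the planning phase again.

```python
from collections import OrderedDict
from typing import Callable


class PlanCache:
    """Hypothetical LRU cache mapping a normalized query to its abstract plan."""

    def __init__(self, planner: Callable[[str], object], capacity: int = 128):
        self._planner = planner      # stands in for the expensive planning phase
        self._capacity = capacity
        self._plans = OrderedDict()  # insertion order doubles as recency order
        self.hits = 0
        self.misses = 0

    @staticmethod
    def _normalize(query: str) -> str:
        # Assumption: queries differing only in whitespace share one plan.
        return " ".join(query.split())

    def plan_for(self, query: str) -> object:
        key = self._normalize(query)
        if key in self._plans:
            self.hits += 1
            self._plans.move_to_end(key)     # mark as most recently used
            return self._plans[key]
        self.misses += 1
        plan = self._planner(key)            # planning runs only on a miss
        self._plans[key] = plan
        if len(self._plans) > self._capacity:
            self._plans.popitem(last=False)  # evict the least recently used plan
        return plan
```

A real cache would also need to invalidate stored plans when the set of available data sources changes; that is left out of this sketch.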


A defect in the Search/Index System, discovered on Dec 22nd while performing queries in the development infrastructure, delayed the work of FORTH. The defect was resolved on Dec 27th.


  • Implementation and testing of a query cache for the gCube Search System is complete.

FORTH Activities

Discussions with NKUA (at Pisa) regarding the provision of snippets for enabling real-time clustering of results and other services (see the FORTH slides from the kick-off meeting). We then registered to several VREs in order to familiarize ourselves with the gCube infrastructure. In addition, we read the developer's guide on the gCube wiki and studied a guide (provided by NKUA) for the development of new portlets. We had discussions (KO meeting, teleconference) with the WP10 leader (NKUA) and identified that a first step is to use the search service API, analyze the search results (access data/metadata through the content model structure, OpenSearch), and identify the services (e.g. real-time results clustering, exploiting metadata in faceted search) we could offer on top of this API. The results of this first step will define how the exploratory search services we provide will be integrated into the gCube infrastructure (i.e. the specification of the services/API, related to tasks T11.*, that will enable others to build applications on them).
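
To make the notion of a search result snippet concrete, the following is a minimal sketch that assumes a snippet is a fixed-width window of text around the first occurrence of any query term. The function name `make_snippet` and the `width` parameter are hypothetical, and the snippet support being built for the gCube Search System may work quite differently.

```python
import re


def make_snippet(text: str, query: str, width: int = 60) -> str:
    """Return roughly `width` characters of `text` around the first query-term match."""
    terms = [re.escape(term) for term in query.split()]
    match = re.search("|".join(terms), text, flags=re.IGNORECASE) if terms else None
    if match is None:
        # No term found: fall back to the beginning of the text.
        start, end = 0, min(len(text), width)
    else:
        centre = (match.start() + match.end()) // 2
        start = max(0, centre - width // 2)
        end = min(len(text), start + width)
    # Ellipses signal that the text was cut on that side.
    prefix = "..." if start > 0 else ""
    suffix = "..." if end < len(text) else ""
    return prefix + text[start:end].strip() + suffix
```

Short windows like these are cheap to produce and small enough to be clustered in real time, which is the use case discussed with NKUA.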


We tried (Dec 20) to build and deploy the searchsystem example, but several problems occurred. According to discussions with NKUA, development and deployment have been tested on Linux; this remains to be checked.


None

Terradue Activities

None


None


None

T10.2 Data Manipulation Facilities

NKUA Activities

As a first step towards providing Data Manipulation facilities and meeting the task's goals, NKUA has focused on investigating how the gCube Data Transformation Service (gDTS) can be altered to produce execution plans for the PE2ng engine instead of performing transformations on the local node. Integrating gDTS with the PE2ng engine will give the former access to vast amounts of processing power and enable it to handle virtually any transformation task, thus making it the standard Data Manipulation facility for gCube applications, many of which currently rely on their own custom methods for such tasks. Additional goals of the design work currently underway are to make the new version of the transformers completely independent of the underlying execution model and to specify requirements for T8.3. The design process is still in progress and has not yet produced concrete results. Since this work is performed in tandem with the familiarization of the current task leader with the gDTS software, more definitive results are expected in the next periods.
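
The separation between producing an execution plan and executing it can be sketched as follows. This is only an illustration of the design goal stated above, not the gDTS or PE2ng API: `Step`, `build_plan` and `LocalExecutor` are hypothetical names, and the plan is assumed to be a simple linear sequence of named operations.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List


@dataclass(frozen=True)
class Step:
    """One node of a linear transformation plan (hypothetical structure)."""
    operation: str


def build_plan(source_format: str, target_format: str) -> List[Step]:
    # Planning only describes what to do; it performs no transformation itself.
    return [Step(f"decode:{source_format}"), Step(f"encode:{target_format}")]


class LocalExecutor:
    """Runs a plan in-process; a remote engine could consume the same plan."""

    def __init__(self, operations: Dict[str, Callable[[object], object]]):
        self._operations = operations

    def run(self, plan: List[Step], payload: object) -> object:
        # Apply each step's operation to the output of the previous one.
        for step in plan:
            payload = self._operations[step.operation](payload)
        return payload
```

Because the plan is plain data, the code that builds it does not need to know whether it will run on the local node or be handed to a distributed engine.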


Discussions between partners in the task, as well as inter-task and WP discussions concerning the requirements for execution models, new data types to be supported and other relevant issues, have not started yet. Since such discussions will be necessary for the full realisation of the task's goals, the WP and task leaders will liaise with CNR and FAO once the first phase of the design and analysis procedure is complete.


-

CNR Activities

None


None


None

FAO Activities

None


None


None

T10.3 Data Mining and Visualisation Facilities

CNR Activities

The main objective of this task is to set up a service in the infrastructure which is able to:

  • Perform Data Mining operations on datasets coming from users or services
  • Perform computations on several Computational Infrastructures (e.g. single machine, cloud etc.)
  • Parallelize commonly used computational algorithms (e.g. for species probability distributions)
  • Supply statistical analysis features and charts production
  • Exchange input and output datasets in standard formats (e.g. SDMX format)
  • Plug-in best practice methods for data processing from the iMarine community experience
  • Train systems for performing experiments about Ecological Modeling


A statistical service is being developed in order to achieve the above goals. Currently the implementation supports the generative model of AquaMaps, along with a tool for monitoring performance. A plug-in based system has been implemented for both the service and the underlying library which performs the calculations. A mechanism that is simple from the user's point of view lets ordinary users develop their own code and run it in parallel. Support for other cloud infrastructures has not been implemented yet. The interface with the Venus-C computational infrastructure is running, although writing the results back to the database is not yet implemented. The statistical service can monitor the occupation of the computational infrastructures it manages. The following figure reports a conceptual sketch of the service core.

Figure: Statistical Service Internal Logic

Tests are being performed in order to check the robustness of the implementation. The plug-in system on the library side is complete as regards the probability generation models which perform projections for species distribution estimation. The parallel model training facility is still under development.

NKUA Activities

NKUA had no Data Mining activities planned for this phase.

FAO Activities

FAO had no Data Mining activities planned for this phase.

T10.4 Semantic Data Analysis Facilities

FORTH Activities

High Level Objective: Provide some basic RDF metadata management services (T10.4) and exploit them for advancing the search service of gCube (T10.1). Emphasis will be given to offering exploratory search services that leverage (in various ways) the available knowledge models (vocabularies, ontologies, etc.) and metadata.
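
One simple way a search service can leverage a vocabulary is query expansion, where each query term is supplemented with related terms. The sketch below assumes a plain dictionary as the thesaurus; the entries and the function name `expand_query` are illustrative only, and an actual service would draw the related terms from RDF vocabularies or ontologies.

```python
from typing import Dict, List

# Toy thesaurus; the entries are illustrative only.
THESAURUS: Dict[str, List[str]] = {
    "tuna": ["thunnus"],
    "fish": ["pisces"],
}


def expand_query(query: str, thesaurus: Dict[str, List[str]]) -> List[str]:
    """Return the query terms followed by their related terms, without duplicates."""
    expanded: List[str] = []
    for term in query.lower().split():
        for candidate in [term, *thesaurus.get(term, [])]:
            if candidate not in expanded:
                expanded.append(candidate)
    return expanded
```

For example, `expand_query("Tuna catch", THESAURUS)` returns `["tuna", "thunnus", "catch"]`.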

Log of activities: Familiarization with gCube: we read the developer's guide on the gCube wiki, registered to some VREs and familiarized ourselves with the environment (we had a Skype call with Anton from FAO, who presented some VREs from the infrastructure). We defined a work plan for the next months and discussed it with the WP10 leader (NKUA).

The work plan is available at: http://bscw.research-infrastructures.eu/bscw/bscw.cgi/d239678/ActionPlan_Tasks_10.1-10.4_FORTH.docx


None


None

FAO Activities

None


None


None
