2012.06 WP10: Data Consumption Facilities Development Monthly Activity By Task and Beneficiary

From IMarine Wiki

This WP10 Activity Report describes the activities performed in June 2012 by Beneficiary and Task.

It is part of the June Monthly Report.

T10.1 Data Retrieval Facilities

NKUA Activities

During this period, deliverables D10.1: gCube Query Language Specification and D10.3: iMarine Data Consumption Software were finalized and submitted.

In addition, discussions on the integration of Semantic Search with the gCube Search System took place during the second TCom meeting in Rhodes, in dedicated parallel sessions. The discussions led to final decisions on the exact way Semantic Search will communicate with gCube Search and on the exact components of the ASL layer the latter will use.

In particular, Semantic Search functionality will be deployed in the infrastructure as a service, in order to minimize the burden imposed on portal nodes. The default mode of operation of Semantic Search, which is based on search result snippets, is not computationally intensive. However, if specific features are enabled, the requirements with respect to both computation and memory are expected to be higher. For this reason, Semantic Search features will be deployed in the infrastructure separately from the UI component, as a servlet or a gCube Service, depending on the requirements imposed by interfacing with SPARQL endpoints as gCube Runtime Resources.

Semantic Search will therefore consist of two components: the Semantic Search portlet and the Semantic Search Service. The portlet will be in charge of presenting the results along with the outcome of semantic search features, e.g. clusters and entities, and of feeding the Semantic Search Service with data extracted from the search results. In order to obtain the results, the portlet will use the RSConsumer facility of the ASL. Since result iteration is similar in nature for Search and Semantic Search, the RSConsumer facility was preferred over using the gRS2 API directly. This keeps the specifics of handling search results, such as the choice of gRS2 reader and the handling of timeouts, in a single point within the software stack; from an architectural point of view, the ASL layer is the most suitable place to provide such functionality. Moreover, the ASL will provide the means to ensure that results are presented in a common format and style in both the Search and Semantic Search portlets.

The existing Search portlet will include a selector which will be used to enable semantic features. When the selector is enabled, the Search portlet, after submitting the search query to the gCube Search System and retrieving the outcome of the query, will redirect the user to the Semantic Search portlet and will provide the latter with the search terms in free text form along with the returned result set locator. The Semantic Search Service will in turn need data from the top k returned hits in order to perform its computation, where k is expected to be in the order of 100-1000 results. This data consists of result snippets and, optionally, the payload of all relevant annotated search fields, depending on the features selected by the user. The Semantic Search portlet is in charge of forwarding the search terms and the extracted data to the Semantic Search Service. Since neither the size nor the number of the extracted data items can be known in advance, gRS2 will be used for communication between the Semantic Search portlet and Service.
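The extraction step described above can be illustrated with a minimal Python sketch. It is not the ASL or gRS2 API: the `Hit` record, the `extract_top_k` function and the field names are hypothetical, and a plain Python iterator stands in for the result set. The sketch only shows the idea of lazily taking the top k hits and forwarding snippets, plus annotated field payloads when a selected feature requires them.

```python
from dataclasses import dataclass, field
from itertools import islice
from typing import Dict, Iterable, Iterator, Sequence


@dataclass
class Hit:
    """Hypothetical search hit: a snippet plus annotated field payloads."""
    snippet: str
    fields: Dict[str, str] = field(default_factory=dict)


def extract_top_k(hits: Iterable[Hit], k: int,
                  wanted_fields: Sequence[str] = ()) -> Iterator[dict]:
    """Lazily yield the data needed by the service from the first k hits."""
    # islice stops after k hits without reading the rest of the result set.
    for hit in islice(hits, k):
        record = {"snippet": hit.snippet}
        # Field payloads are included only when a selected feature asks for them.
        for name in wanted_fields:
            if name in hit.fields:
                record[name] = hit.fields[name]
        yield record
```

Because the function is a generator, records are produced one at a time, which matches the observation that neither the size nor the number of extracted items is known in advance.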

The method of retrieving annotated fields through the Resource Registry will be used with no changes with respect to what has been decided before the TCom meeting.


The beneficiary should report here major issues faced in the reporting period and the identified corrective actions, if any.


  • Finalization and submission of deliverable D10.1: gCube Query Language Specification
  • Finalization and submission of deliverable D10.3: iMarine Data Consumption Software

FORTH Activities

The activities focused on defining how XSearch will communicate with the gCube Search System. The most convenient way of communicating is to make XSearch act as a “consumer” of the results retrieved from gCube Search. This strategy requires that another component (in particular the SearchPortlet) formulates the CQL queries, performs the query using the gCube Search System and then triggers XSearch to analyze and present the results.


none


none

Terradue Activities

Following the general outline from FAO's TCom, we started to create the "Visualization" part of the plan. We would like to design a WPSHadoop "connector" to a WMS hosted by GeoServer. We are thinking of using the GDAL Java bindings to create GeoTIFF files from each layer and upload them via the REST interface (API).

T10.2 Data Manipulation Facilities

NKUA Activities

During the last month, NKUA focused on the proper distribution of the transformation process to different execution nodes. Transformation plans should exploit prior knowledge about the transformation in order to construct efficient workflows. In this way, the performance and scalability of the transformations will be improved.

In particular, the constructed workflow plans, which describe the transformation that takes place, now contain instructions to the execution engine on how the plan should be deployed. This is done using the appropriate facility offered by the execution engine, the Boundary plan elements, in order to prevent plan elements from being executed on the same execution node. For example, Data Source Retrieval and the initial transformation together occupy one execution node exclusively, different execution nodes are used for the subsequent transformations, and strictly one node is used for merging the transformation results into the output.
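The placement policy in the example above can be sketched as follows. This is an illustration only and does not use the execution engine or its Boundary plan elements: the function `assign_nodes`, the element names `retrieve` and `merge`, and the round-robin choice for the subsequent transformations are all assumptions made for the sketch.

```python
from typing import Dict, List


def assign_nodes(transformations: List[str], nodes: List[str]) -> Dict[str, str]:
    """Map plan elements to execution nodes (hypothetical policy).

    nodes[0] is reserved for source retrieval plus the first transformation,
    nodes[-1] is reserved for the merge, and the remaining nodes are shared
    round-robin by the subsequent transformations.
    """
    if len(nodes) < 3:
        raise ValueError("need at least three nodes: head, worker, merge")
    head, workers, merger = nodes[0], nodes[1:-1], nodes[-1]
    placement = {"retrieve": head}
    if transformations:
        # The first transformation shares the node of the data source retrieval.
        placement[transformations[0]] = head
    for i, name in enumerate(transformations[1:]):
        # Subsequent transformations never run on the head or the merge node.
        placement[name] = workers[i % len(workers)]
    placement["merge"] = merger
    return placement
```

With four transformations and four nodes, the first transformation lands on the same node as the retrieval, the next three are spread over the two middle nodes, and the merge runs alone on the last node.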

Abstractions over execution models will be exploited in order to support the execution of the same transformation processes on different execution nodes, through data partitioning. This activity depends on the provision of the corresponding facilities by PE2ng (WP8) and will be performed in parallel with the latter.


none


none

CNR Activities

CNR worked on geo-spatial data manipulation. The activities concerned the integration of a Thredds Service with GeoNetwork and the management of NetCDF-CF files. The Environmental Explorer library was developed in order to perform intersections of geo-spatial layers with coordinate points. Intersections are performed against GeoNetwork, and it is transparent to the user whether the underlying layers reside on Thredds or on a GeoServer. NetCDF-CF files are correctly indexed on GeoNetwork by means of metadata management classes in the Environmental Explorer library. In addition, CNR worked on WPS integration based on the WPS-hadoop framework developed by Terradue in WP9. Usage tests on basic configurations were successful after discussions and collaboration through tickets. Integration is currently focusing on running jobs on remote Hadoop workers and HDFS.
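The idea of intersecting layers with coordinate points behind a uniform interface can be sketched in Python as below. This is not the Environmental Explorer library API: `grid_sampler`, `intersect`, the regular-grid layer model and the layer name are hypothetical. Each layer is represented by a sampling function, so the caller does not need to know where the layer is hosted.

```python
from typing import Callable, Dict, List, Optional, Sequence, Tuple

Point = Tuple[float, float]  # (longitude, latitude)
Sampler = Callable[[float, float], Optional[float]]


def grid_sampler(lon0: float, lat0: float, cell: float,
                 rows: Sequence[Sequence[float]]) -> Sampler:
    """Build a sampler for a regular grid whose lower-left corner is (lon0, lat0)."""
    def sample(lon: float, lat: float) -> Optional[float]:
        col = int((lon - lon0) // cell)
        row = int((lat - lat0) // cell)
        if 0 <= row < len(rows) and 0 <= col < len(rows[row]):
            return rows[row][col]
        return None  # the point falls outside the layer
    return sample


def intersect(points: Sequence[Point],
              layers: Dict[str, Sampler]) -> List[Dict[str, Optional[float]]]:
    """Return, for every point, the value of each layer at that point."""
    return [{name: sample(lon, lat) for name, sample in layers.items()}
            for lon, lat in points]
```

A sampler backed by a remote service could be substituted for `grid_sampler` without changing `intersect`, which is the sense in which the hosting of a layer stays transparent to the caller.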


No deviations to report.


  • NetCDF-CF files management and publication in D4Science
  • Environmental Explorer library was developed and tested
  • Thredds has been released in the infrastructure
  • Thredds/NetCDF-CF files are indexed on GeoNetwork
  • WPS-Hadoop integration in D4Science was started
  • WPS integration with remote distributed Hadoop installation was started

FAO Activities

The beneficiary should report here a summary of the activities performed in the reporting period


The beneficiary should report here major issues faced in the reporting period and the identified corrective actions, if any.


The beneficiary should report here a bullet list highlighting the main achievements of the reporting period

T10.3 Data Mining and Visualisation Facilities

CNR Activities

During the month of June, CNR worked on enhancing the efficiency of the Statistical Manager. A connection to the Executor engine was implemented, so algorithms can now be parallelized on several GHNs. An evaluation of the performance and robustness of the system was made, as reported in ticket #429. The next steps will try the same approach with PE2ng. In addition, the Statistical Manager was redesigned in order to rely on an external message queue. Requests by the users will be added to a queue and consumed by the service, with clients acting as producers. The Apache MQ was selected for testing. On data visualization, CNR improved the GeoExplorer Portlet by designing the management of multiple GIS workspaces and by optimizing and bug fixing the current portlet.
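The producer/consumer arrangement described for the redesigned Statistical Manager can be sketched with Python's standard in-process queue. This is an illustration of the pattern only: it does not use an external message broker, and `run_service`, `submit_all` and the request names are hypothetical.

```python
import queue
import threading
from typing import List


def run_service(requests: "queue.Queue", results: List[str]) -> None:
    """Consume requests in arrival order until a None sentinel is received."""
    while True:
        item = requests.get()
        if item is None:
            break
        # Placeholder for running the requested algorithm.
        results.append(f"done:{item}")


def submit_all(algorithms: List[str]) -> List[str]:
    """Act as a client (producer): enqueue requests and wait for the service."""
    requests: "queue.Queue" = queue.Queue()
    results: List[str] = []
    worker = threading.Thread(target=run_service, args=(requests, results))
    worker.start()
    for name in algorithms:
        requests.put(name)
    requests.put(None)  # tell the service that no more requests will arrive
    worker.join()
    return results
```

The queue decouples clients from the service: producers return as soon as a request is enqueued, and the service processes requests at its own pace.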


No deviations to report.


  • Design of Map Reduce procedures in D4Science
  • Usage of the Executor for distributed computations
  • Test of failure and robustness for the distributed computation
  • Re-design of the Statistical Manager for managing requests by means of an external queue
  • Enhancements in GeoExplorer Portlet

NKUA Activities

Activities in this area during this period were restricted to participation in the TCom sessions and to email discussions, which were kept to a minimum.


none


none

FAO Activities

The beneficiary should report here a summary of the activities performed in the reporting period


The beneficiary should report here major issues faced in the reporting period and the identified corrective actions, if any.


The beneficiary should report here a bullet list highlighting the main achievements of the reporting period

Terradue Activities

Following the general outline from FAO's TCom, we started to create the "Visualization" part of the plan. We would like to design a WPSHadoop "connector" to a WMS hosted by GeoServer. We are thinking of using the GDAL Java bindings to create GeoTIFF files from each layer and upload them via the REST interface (API).


None


None

T10.4 Semantic Data Analysis Facilities

FORTH Activities

FORTH started committing the source code of XSearch at https://svn.d4science-ii.research-infrastructures.eu/gcube/trunk/semantic-search/. The committed component is already mavenized with the following information:

   <groupId>x-search</groupId>
   <artifactId>x-search</artifactId>
   <version>1.0</version>

Furthermore, FORTH continued the work on integrating XSearch in the infrastructure. To this end, FORTH will continue working on two tasks: (a) performing the necessary actions for running XSearch over the gCube Search System and (b) developing the XSearch portlet. Regarding (a), XSearch will “receive” the results derived from the gCube Search System and perform its functionality on top of these. Regarding (b), the portlet is necessary for presenting the results (with the identified mined entities and cluster results) to the user through the portal.

Finally, FORTH continued working on the refinement of the MS45-Semantic Data Analysis Specification.


none


none

FAO Activities

The beneficiary should report here a summary of the activities performed in the reporting period


The beneficiary should report here major issues faced in the reporting period and the identified corrective actions, if any.


The beneficiary should report here a bullet list highlighting the main achievements of the reporting period

NKUA Activities

NKUA is following the activities in the semantic data analysis sector. As a result, it is designing and implementing supporting components in other services that enable the integration of the Semantic Data Analysis sector products with gCube.


none


none
