2012.07 WP10: Data Consumption Facilities Development Monthly Activity By Task and Beneficiary

This WP10 Activity Report describes the activities performed in July 2012, by Beneficiary and Task.

It is part of the July Monthly Activity Report.

T10.1 Data Retrieval Facilities

NKUA Activities

NKUA has been working in the following directions:

  • Enhancing the gCube search service to support the new features of the Workflow Search Adaptor introduced in the context of WP8. More specifically, the service now supports the following properties:
    • The data source node selector to be used
    • The tie breaker selector of the data source selector
    • The node selector used to select nodes for search operator invocations
    • The tie breaker selector of the operator node selector
    • The node assignment policy to be used
    • The maximum cost of search plan subtrees which can be executed in a single node, termed maximum colocation cost
    • A threshold value which is used by the node assignment policy. All nodes whose score fails to reach the threshold are excluded from selection.
    • Whether or not the local node should be used for the execution of simple plans.
  • Performing enhancements and solving issues in the Resource Registry:
    • Implementation of a new class of plugins which are specialized to the gCube environment.
    • Functionality to distinguish local from remote hosting nodes, by exploiting the information provided by a specific gCube plugin.
    • Minor improvements in plugin execution.
    • Correction of total clock speed property name in HostingNode.
    • Fixes in the apply and store operations of the HostingNode entity.
    • Replaced getLock with getSharedLock and getExclusiveLock to avoid the confusion caused by read/write lock terminology, since the purpose of this lock is not to protect actual read and write operations.
    • Resolution of issue which prevented hosting node properties from being stored.
    • Scope is now optional in all QueryHelper methods which accept it as argument.
    • Added new operation for data source service retrieval by id.
    • Minor corrections in persistency descriptors.
  • Supporting the integration of XSearch with gCube. A conference call was organized to plan the activity, taking into account the time constraints before the next gCube minor release. In this call the involved partners, NKUA and FORTH, agreed to follow the design discussed during the 2nd TCOM meeting. A second conference call followed soon after to assess the progress of the activity and resolve remaining issues. Since the start of the integration activity, NKUA has been working towards providing a stable service through the development infrastructure, by resolving residual issues caused by the development activity of the search service, and towards supporting FORTH in the usage of the Resource Registry.
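
The threshold and tie breaker properties listed above for the search service can be pictured with a small sketch. It is written in Python for brevity and is not the gCube implementation: the `Node` class, the `select_node` function and the `least_loaded` tie breaker are hypothetical names, and the scores are assumed to be "higher is better".

```python
from dataclasses import dataclass
from typing import Callable, Optional, Sequence


@dataclass(frozen=True)
class Node:
    """Hypothetical stand-in for a hosting node; not a gCube class."""
    name: str
    score: float  # assumed: higher is better, produced by some node selector
    load: float   # used only by the illustrative tie breaker below


def select_node(
    nodes: Sequence[Node],
    threshold: float,
    tie_breaker: Callable[[Sequence[Node]], Node],
) -> Optional[Node]:
    """Exclude nodes below the threshold, then return the best-scoring node."""
    eligible = [n for n in nodes if n.score >= threshold]
    if not eligible:
        return None  # no node reaches the threshold
    best = max(n.score for n in eligible)
    tied = [n for n in eligible if n.score == best]
    # The tie breaker is consulted only when several nodes share the top score.
    return tied[0] if len(tied) == 1 else tie_breaker(tied)


def least_loaded(candidates: Sequence[Node]) -> Node:
    """Illustrative tie breaker: prefer the node with the lowest load."""
    return min(candidates, key=lambda n: n.load)


nodes = [
    Node("n1", score=0.9, load=0.7),
    Node("n2", score=0.9, load=0.2),
    Node("n3", score=0.4, load=0.1),
]
chosen = select_node(nodes, threshold=0.5, tie_breaker=least_loaded)
print(chosen.name if chosen else "no eligible node")
```

Here `n3` is excluded by the threshold, and the tie between `n1` and `n2` is resolved by the tie breaker.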


The beneficiary should report here major issues faced in the reporting period and the identified corrective actions, if any.


The beneficiary should report here a bullet list highlighting the main achievements of the reporting period

FORTH Activities

FORTH focused mainly on the integration between the gCube Search System and XSearch. In the 2nd TCOM it was decided that the most efficient way of communicating with gCube Search is to make XSearch act as a consumer, in the sense that it will not perform the search directly. Instead, it will receive the results and perform the required analysis (described in more detail in T10.4).

After the completion of the first phase of deployment of XSearch, FORTH will perform a study of the quality of the results returned by the gCube search (i.e. metadata, snippets, etc.) in order to find the configuration that returns the “best” results (in terms of their textual content).


None


None

Terradue Activities

The beneficiary should report here a summary of the activities performed in the reporting period


The beneficiary should report here major issues faced in the reporting period and the identified corrective actions, if any.


The beneficiary should report here a bullet list highlighting the main achievements of the reporting period

T10.2 Data Manipulation Facilities

NKUA Activities

NKUA has been working on the integration of the new node selection library, which is part of org.gcube.execution.MadgikCommons, into the Data Transformation Service Workflow Adaptor. In this way, during the construction of a data transformation execution plan, the node selection library is responsible for finding the appropriate execution nodes to be used.

During the exploration of the available nodes, the Merger node must be placed taking into account its high I/O demands, which imply high resource needs. In this case, a special-purpose, cost-function-based selector named "best node selector" is invoked. The remaining nodes of the plan must be distributed evenly across all available nodes, considering the distance between subsequent execution nodes. For this case, a distance-based selector with an MRU policy is used.
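
The two selection strategies described above can be illustrated with a rough Python sketch. It does not reproduce the node selection library: `best_node`, `assign_evenly`, the cost values and the two-dimensional node positions are all hypothetical, and the MRU policy itself is not modelled. Even distribution is approximated here by preferring the least-used nodes and, among those, the one closest to the previously chosen node.

```python
from typing import Dict, List, Sequence, Tuple

Coord = Tuple[float, float]


def best_node(costs: Dict[str, float]) -> str:
    """Cost-function selector: return the node with the lowest cost."""
    return min(costs, key=lambda name: costs[name])


def distance(a: Coord, b: Coord) -> float:
    """Euclidean distance, standing in for any node-to-node distance."""
    return ((a[0] - b[0]) ** 2 + (a[1] - b[1]) ** 2) ** 0.5


def assign_evenly(
    steps: Sequence[str], positions: Dict[str, Coord], start: str
) -> Dict[str, str]:
    """Place each step on the nearest node among those used least so far."""
    usage = {name: 0 for name in positions}
    previous = start
    placement: Dict[str, str] = {}
    for step in steps:
        fewest = min(usage.values())
        candidates: List[str] = [n for n in positions if usage[n] == fewest]
        # Ties on distance are broken by node name to keep the result stable.
        chosen = min(
            candidates,
            key=lambda n: (distance(positions[n], positions[previous]), n),
        )
        placement[step] = chosen
        usage[chosen] += 1
        previous = chosen
    return placement


# Hypothetical infrastructure: three nodes on a line, with made-up I/O costs.
positions = {"a": (0.0, 0.0), "b": (1.0, 0.0), "c": (5.0, 0.0)}
merger_costs = {"a": 3.0, "b": 1.5, "c": 2.0}

merger_node = best_node(merger_costs)
placement = assign_evenly(["t1", "t2", "t3", "t4"], positions, start=merger_node)
print(merger_node, placement)
```

The Merger is placed by cost alone, while the remaining steps are spread so that every node is used once before any node is used twice.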


The beneficiary should report here major issues faced in the reporting period and the identified corrective actions, if any.


The beneficiary should report here a bullet list highlighting the main achievements of the reporting period

CNR Activities

During the month of July CNR continued with the integration of a remote distributed Hadoop cluster with the WPS-Hadoop framework developed by Terradue. The integration was finally achieved with Hadoop v2 but issues related to the algorithm are still waiting for a solution.


The integration was performed with Hadoop v0.2 at a first stage and then repeated for Hadoop v2.


  • a remote submission procedure for the Java-WPS Hadoop framework was released.

FAO Activities

The beneficiary should report here a summary of the activities performed in the reporting period


The beneficiary should report here major issues faced in the reporting period and the identified corrective actions, if any.


The beneficiary should report here a bullet list highlighting the main achievements of the reporting period

T10.3 Data Mining and Visualisation Facilities

CNR Activities

During the month of July, CNR worked on using a distributed network of Executor instances, along with an ActiveMQ installation, to perform data mining experiments in parallel. Tests were carried out to evaluate performance, and a first version is ready to be integrated with the Statistical Service. The performance and limitations of the DBSCAN algorithm were investigated for its application to the clustering of species occurrence points.
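
As an illustration of the clustering technique mentioned above, the following is a minimal, unoptimised DBSCAN in Python. It is a textbook sketch, not the implementation evaluated by CNR: the sample points, `eps` and `min_pts` are made up, and occurrence points are assumed to be given in a projected plane so that Euclidean distance is meaningful.

```python
from math import hypot
from typing import List, Optional, Sequence, Tuple

Point = Tuple[float, float]
NOISE = -1


def region_query(points: Sequence[Point], i: int, eps: float) -> List[int]:
    """Indices of all points within eps of point i (including i itself)."""
    px, py = points[i]
    return [
        j for j, (qx, qy) in enumerate(points)
        if hypot(px - qx, py - qy) <= eps
    ]


def dbscan(points: Sequence[Point], eps: float, min_pts: int) -> List[int]:
    """Return one cluster label per point; NOISE marks outliers."""
    labels: List[Optional[int]] = [None] * len(points)
    cluster = -1
    for i in range(len(points)):
        if labels[i] is not None:
            continue
        neighbours = region_query(points, i, eps)
        if len(neighbours) < min_pts:
            labels[i] = NOISE  # may later be relabelled as a border point
            continue
        cluster += 1
        labels[i] = cluster
        queue = [j for j in neighbours if j != i]
        while queue:
            j = queue.pop()
            if labels[j] == NOISE:
                labels[j] = cluster  # border point: joins but is not expanded
            if labels[j] is not None:
                continue
            labels[j] = cluster
            more = region_query(points, j, eps)
            if len(more) >= min_pts:  # j is a core point: keep expanding
                queue.extend(more)
    return [NOISE if label is None else label for label in labels]


# Made-up occurrence points: two dense groups and one isolated record.
occurrences = [
    (0, 0), (0, 1), (1, 0), (1, 1),
    (10, 10), (10, 11), (11, 10),
    (50, 50),
]
labels = dbscan(occurrences, eps=1.5, min_pts=3)
print(labels)
```

The neighbourhood search is quadratic in the number of points, which is one of the practical limitations to consider for large occurrence datasets.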


No deviations to report.


  • a beta version of a parallel processing system for niche modeling experiments was released
  • the DBSCAN algorithm was implemented and evaluated

NKUA Activities

The beneficiary should report here a summary of the activities performed in the reporting period


The beneficiary should report here major issues faced in the reporting period and the identified corrective actions, if any.


The beneficiary should report here a bullet list highlighting the main achievements of the reporting period

Terradue Activities

In this period Terradue implemented a library called gtuploader that uses the GDAL Java bindings to retrieve all layers, and their metadata, from a NetCDF file. The next step should be to develop a WPS-Hadoop algorithm that sends these layers and metadata to a known GeoServer via its REST interface (API) and retrieves the new getCapabilities response from it.

FAO Activities

The beneficiary should report here a summary of the activities performed in the reporting period


The beneficiary should report here major issues faced in the reporting period and the identified corrective actions, if any.


The beneficiary should report here a bullet list highlighting the main achievements of the reporting period

T10.4 Semantic Data Analysis Facilities

FORTH Activities

FORTH continued the development of the XSearch portlet. The portlet will act as a consumer of search results (provided by the Search portlet through the ASL session) and perform an analysis of them. In the current version of the portlet, the analysis is performed over the (textual) metadata of the results and their snippets (as given by gCube Search).

XSearch uses gRS2 to create a locator and then passes it to the XSearch service. The results from the XSearch service, in particular the entities mined from the search results and the clustering of the search results, are then returned to the XSearch portlet and presented to the user.

Apart from returning the list of mined (and categorized) entities and the clustering of the results, the XSearch portlet offers the user a gradual restriction of the results based on the mined entities and clusters.
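
The gradual restriction described above can be pictured as repeated filtering of the current result list. The Python sketch below is illustrative only and does not reflect the XSearch code: the `Hit` class, its fields, the `restrict` function and the sample data are all hypothetical.

```python
from dataclasses import dataclass
from typing import AbstractSet, FrozenSet, List, Optional, Sequence


@dataclass(frozen=True)
class Hit:
    """Hypothetical search result; field names are illustrative only."""
    title: str
    entities: FrozenSet[str]  # entities mined from metadata and snippet
    cluster: str              # label of the cluster the hit belongs to


def restrict(
    hits: Sequence[Hit],
    required_entities: AbstractSet[str] = frozenset(),
    cluster: Optional[str] = None,
) -> List[Hit]:
    """Keep hits that mention every required entity and match the cluster."""
    return [
        h for h in hits
        if required_entities <= h.entities
        and (cluster is None or h.cluster == cluster)
    ]


hits = [
    Hit("Tuna catch report", frozenset({"Thunnus albacares", "Indian Ocean"}), "fisheries"),
    Hit("Tuna tagging study", frozenset({"Thunnus albacares"}), "tagging"),
    Hit("Reef survey", frozenset({"Indian Ocean"}), "fisheries"),
]

# Each user selection narrows the previous result list a little further.
step1 = restrict(hits, required_entities={"Thunnus albacares"})
step2 = restrict(step1, cluster="fisheries")
print([h.title for h in step1], [h.title for h in step2])
```

Applying the filter to the previous step's output, rather than to the full list, is what makes the restriction gradual.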

The first version of the XSearch portlet is almost ready (some look & feel changes are still pending) and the source code with the latest changes can be found in the SVN repository.


None


None

FAO Activities

The beneficiary should report here a summary of the activities performed in the reporting period


The beneficiary should report here major issues faced in the reporting period and the identified corrective actions, if any.


The beneficiary should report here a bullet list highlighting the main achievements of the reporting period
