2013.02 WP9: Data Management Facilities Development Monthly Activity By Task and Beneficiary


This WP9 Activity Report describes the activities performed in February 2013, by Beneficiary and Task.

It is part of the February Activity Report.

T9.1 Data Access and Storage Facilities

FAO Activities

In this reporting period, FAO provided support and coordinated activities towards the integration of the tree-based access subsystem with other gCube subsystems, particularly those related to data transfer and search result presentation.


None to report.


None to report.

CNR Activities

Species Discovery Service

The CNR has been involved in the following activities:

  • enhancing the Darwin Core Archive generation, fixing bugs in the taxonomy ranks


The beneficiary should report here major issues faced in the reporting period and the identified corrective actions, if any.


The beneficiary should report here a bullet list highlighting the main achievements of the reporting period

T9.2 Data Transfer Facilities

CERN Activities

Task on data transfer implemented by CERN:


None


None

NKUA Activities

Nothing to report


Nothing to report


Nothing to report

Terradue Activities


None


SW architecture for the Legacy Applications Integration http://gcube.wiki.gcube-system.org/gcube/index.php/Legacy_applications_integration

T9.3 Data Assessment, Harmonization and Certification Facilities

CNR Activities

During February, CNR studied several software solutions in the fields of Data Warehouse, Data Integration and Data Analysis, with the idea of either integrating them into a new version of the timeseries software suite or taking cues from them for an in-house solution (#1435). Data curation and analysis tools were considered for implementing user-defined rules on data and the data analysis workflow. CNR started a discussion on the adoption of OLAP technologies as solutions that can be leveraged to cover part of the identified requirements (#1224). CNR tested the following software solutions and analyzed their impact on the current implementation:

  • Mondrian: an open-source OLAP (online analytical processing) solution that supports the MDX query language (#1442). Supporting Mondrian cubes of statistical data requires a data structure definition (an XML document) for each data cube's star schema, specifying the dimensions (with their hierarchies) and the measures of each curated dataset. The tool used for this purpose must integrate easily with the semi-interactive workflow currently implemented in timeseries for the management of statistical data. Achieving this requires several modifications to the structure of the database tables, as well as a data integration/transformation solution that fits both the current workflow and an automated curation workflow.
  • Kettle (Pentaho Data Integration): Kettle is a data integration software library, a client tool and an engine (#1441). Data integration software can be leveraged for the use cases concerning data transformation and data import. Because tools of this kind are as complex as they are flexible, an effort is needed to hide the complexities of the data integration engine from the end user.
  • Drools: we envisioned the adoption of a rule engine as a solution for the curation/transformation of data (#1436). According to the initial plan, rules would be defined and parameterized by the end user in order to verify constraints on datasets and take actions (notifying the user, automatically modifying datasets). We abandoned the plan of adopting a rule engine for the curation/transformation of datasets because we encountered performance and resource management issues, as well as problems with the definition of the transformation workflow by the end user. To address the performance and resource management issues, we tested Drools together with a key-value store (Redis).
  • Data Cleaner: Data Cleaner is a client tool and a library for data analysis (#1442). The tool can also perform transformations on data prior to analysis.
  • AnalyzerBeans: AnalyzerBeans is the library on which Data Cleaner is based; it allows datasets to be analyzed against a set of parameterized tests and produces reports (#1442).
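To make the star-schema terminology in the Mondrian item more concrete, the following is a minimal sketch in plain Python of a fact table, one dimension with a two-level hierarchy, and a roll-up of a measure along that hierarchy. It does not use Mondrian or MDX; all table contents, column names (`species_key`, `catch_tonnes`) and the `roll_up` helper are hypothetical and chosen only for illustration.

```python
from collections import defaultdict

# Hypothetical dimension table: each key maps to its members in a
# two-level hierarchy (family -> species).
SPECIES_DIM = {
    "COD": {"family": "Gadidae", "species": "Atlantic cod"},
    "HAD": {"family": "Gadidae", "species": "Haddock"},
    "TUN": {"family": "Scombridae", "species": "Bluefin tuna"},
}

# Hypothetical fact table: one row per observation, holding the dimension
# key, the year, and a numeric measure (illustrative values only).
FACTS = [
    {"species_key": "COD", "year": 2011, "catch_tonnes": 120.0},
    {"species_key": "HAD", "year": 2011, "catch_tonnes": 80.0},
    {"species_key": "TUN", "year": 2011, "catch_tonnes": 40.0},
    {"species_key": "COD", "year": 2012, "catch_tonnes": 100.0},
]


def roll_up(facts, dimension, level, measure):
    """Sum `measure` per (hierarchy member at `level`, year)."""
    totals = defaultdict(float)
    for row in facts:
        # Resolve the fact's dimension key to the requested hierarchy level.
        member = dimension[row["species_key"]][level]
        totals[(member, row["year"])] += row[measure]
    return dict(totals)


by_family = roll_up(FACTS, SPECIES_DIM, "family", "catch_tonnes")
```

In an OLAP engine the same aggregation would be expressed declaratively against the cube definition; the sketch only shows what information such a definition has to carry (dimension keys, hierarchy levels, measures).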

Other open-source software solutions and components were also considered briefly but not tested during this month: Olap4j, Talend ETL, Clover ETL, JasperSoft ETL, Metamodel.
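The idea of end-user parameterized rules described in the Drools item above (verify a constraint, then either notify the user or modify the dataset) can be sketched in plain Python as follows. This is not the Drools API and not the timeseries implementation; the `Rule` class, `make_range_rule`, `apply_rules` and the column name are hypothetical names introduced only for illustration.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Optional

Row = Dict[str, float]


@dataclass
class Rule:
    """A hypothetical rule: a constraint check plus an optional automatic fix."""
    name: str
    violates: Callable[[Row], bool]
    fix: Optional[Callable[[Row], None]] = None


def make_range_rule(column: str, low: float, high: float,
                    auto_fix: bool = False) -> Rule:
    """Build a rule from user-supplied parameters (column and allowed range)."""
    def violates(row: Row) -> bool:
        return not (low <= row[column] <= high)

    def clamp(row: Row) -> None:
        # Automatic modification: move the value to the nearest bound.
        row[column] = min(max(row[column], low), high)

    return Rule(name=f"{column} in [{low}, {high}]",
                violates=violates,
                fix=clamp if auto_fix else None)


def apply_rules(rows: List[Row], rules: List[Rule]) -> List[str]:
    """Check every row against every rule; fix in place or collect a notification."""
    notifications = []
    for index, row in enumerate(rows):
        for rule in rules:
            if rule.violates(row):
                if rule.fix is not None:
                    rule.fix(row)
                else:
                    notifications.append(f"row {index}: violates '{rule.name}'")
    return notifications
```

A rule built with `make_range_rule("catch_tonnes", 0, 1000)` reports out-of-range rows, while passing `auto_fix=True` clamps them instead. The sketch evaluates every rule on every row in memory, which hints at why performance and resource management become a concern on large datasets.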


The beneficiary should report here major issues faced in the reporting period and the identified corrective actions, if any.


The beneficiary should report here a bullet list highlighting the main achievements of the reporting period
