3rd TCom Meeting: 2nd October 2012 Discussions and Notes

From IMarine Wiki



Meeting Agenda

Meeting Participants

Web conference system

gCube Workflow Management

Presenter: G. Farantatos (NKUA)

Slides: .pptx file

Participants: A. Antoniadis (NKUA), M. Assante (CNR), N. Bailly (FIN), J. Barde (IRD), C. Bekiari (FORTH), E. Blondel (FAO), F. Brito (Terradue), L. Candela (CNR), V. Canhos (CRIA), P. Cauquil (IRD), G. Coro (CNR), G. Farantatos (NKUA), G. Kakaletris (NKUA), A. Manzi (CERN), V. Marioli (CNR), Y. Marketakis (FORTH), P. Pagano (CNR), F. Simeoni (FAO), E. Travaglino (E-IIS), R. Tsantouli (NKUA);

Starting with the gCube 2.10 release, multiple instances of Search can be deployed. The ASL selects a random instance among the available instances of the Search "System".
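The random selection policy can be sketched as follows. This is a minimal illustration that assumes the ASL already holds a list of endpoints for the deployed Search instances. The function name `pick_search_instance` and the endpoint URLs are hypothetical and do not reproduce the ASL API.

```python
import random
from typing import Optional, Sequence


def pick_search_instance(endpoints: Sequence[str],
                         rng: Optional[random.Random] = None) -> str:
    """Return one Search endpoint chosen uniformly at random.

    Hypothetical helper: it mimics the random selection policy described
    in the notes, not the actual ASL implementation.
    """
    if not endpoints:
        raise ValueError("no Search instances available")
    if rng is None:
        rng = random.Random()
    return rng.choice(list(endpoints))


if __name__ == "__main__":
    # Hypothetical endpoints of two deployed Search instances.
    instances = ["http://node1.example.org/search",
                 "http://node2.example.org/search"]
    print(pick_search_instance(instances))
```

A uniform choice needs no load information. A weighted policy would only become possible once the indicators discussed below are identified.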

Are there any indicators that can be used to tune the deployment strategy? NKUA will liaise with CERN to define the allocation plan.

No figures on performance improvements are available yet. NKUA is thoroughly testing the new version of the Workflow Management.

Data Mining

Presenter: G. Coro (CNR)

Slides: .pptx file

Participants: A. Antoniadis (NKUA), M. Assante (CNR), N. Bailly (FIN), J. Barde (IRD), C. Bekiari (FORTH), C. Baldassarre (FAO), E. Blondel (FAO), F. Brito (Terradue), L. Candela (CNR), V. Canhos (CRIA), P. Cauquil (IRD), G. Coro (CNR), P. Fabriani (E-IIS), G. Farantatos (NKUA), G. Kakaletris (NKUA), A. Manzi (CERN), V. Marioli (CNR), Y. Marketakis (FORTH), P. Pagano (CNR), F. Simeoni (FAO), E. Travaglino (E-IIS), R. Tsantouli (NKUA);

No comparison with a Map-Reduce engine has been made yet. CNR is starting to test the approach and compare it with Hadoop.

The proposed approach is not a replication of Hadoop; it is an attempt to exploit the available GHNs for processing purposes. Moreover, it is designed to have a minimal impact on GHN performance.

PE2ng was considered overkill with respect to the exploitation scenarios underlying the Statistical Service.

An extensive testing activity aiming at estimating the overhead resulting from a multi-site deployment is needed.

Geospatial Processing

Presenter: G. Coro (CNR)

Slides: .pptx file

Participants: A. Antoniadis (NKUA), M. Assante (CNR), N. Bailly (FIN), J. Barde (IRD), C. Bekiari (FORTH), C. Baldassarre (FAO), E. Blondel (FAO), F. Brito (Terradue), L. Candela (CNR), V. Canhos (CRIA), H. Caumont (Terradue), P. Cauquil (IRD), G. Coro (CNR), P. Fabriani (E-IIS), G. Farantatos (NKUA), G. Kakaletris (NKUA), A. Manzi (CERN), V. Marioli (CNR), Y. Marketakis (FORTH), P. Pagano (CNR), F. Simeoni (FAO), E. Travaglino (E-IIS), R. Tsantouli (NKUA);

Some limitations affecting the current implementation of the WPS-Hadoop service have been highlighted, namely the single-node execution mode for the three algorithms and the minimal integration with the rest of the infrastructure services (e.g. Storage).

IRD clarified that they are not going to develop their own approach to kriging and propose to rely on existing solutions. However, they have a number of algorithms that can be added (including spatial re-sampling and trajectory analysis). The process for adding new processes/algorithms does not seem to be clear.

IRD will circulate an inventory of data and algorithms they can offer (some of them are based on R).

Terradue (and CNR) should provide IRD with guidelines on how to inject their algorithms and data into the proposed framework.

The intersection algorithm is not a priority for SPREAD. Since SPREAD is the only use case, the development of this part will be planned taking into account that no use case currently requires it.

Integration between Search and XSearch

Presenter: G. Farantatos (NKUA)

Slides: .pptx file

Participants: A. Antoniadis (NKUA), M. Assante (CNR), N. Bailly (FIN), J. Barde (IRD), C. Bekiari (FORTH), C. Baldassarre (FAO), E. Blondel (FAO), F. Brito (Terradue), L. Candela (CNR), V. Canhos (CRIA), H. Caumont (Terradue), P. Cauquil (IRD), G. Coro (CNR), P. Fabriani (E-IIS), G. Farantatos (NKUA), G. Kakaletris (NKUA), A. Manzi (CERN), V. Marioli (CNR), Y. Marketakis (FORTH), P. Pagano (CNR), F. Simeoni (FAO), E. Travaglino (E-IIS), R. Tsantouli (NKUA);

Some open issues affecting integration of XSearch need to be carefully discussed. In particular:

  • how to configure the XSearch service per VRE;

Two questions were raised by C. Baldassarre about XSearch:

  • what are the use cases for XSearch;
    • a long discussion took place.
  • when the XSearch will be released "as-a-Service" via the Infrastructure;
    • XSearch will be released in gCube 2.11 and deployed before the forthcoming review;

RSConsumer Performance Results

Presenter: A. Antoniadis (NKUA)

Slides: .pptx file

Participants: A. Antoniadis (NKUA), M. Assante (CNR), N. Bailly (FIN), J. Barde (IRD), C. Bekiari (FORTH), C. Baldassarre (FAO), E. Blondel (FAO), F. Brito (Terradue), L. Candela (CNR), V. Canhos (CRIA), H. Caumont (Terradue), P. Cauquil (IRD), G. Coro (CNR), P. Fabriani (E-IIS), G. Farantatos (NKUA), G. Kakaletris (NKUA), A. Manzi (CERN), V. Marioli (CNR), Y. Marketakis (FORTH), P. Pagano (CNR), F. Simeoni (FAO), E. Travaglino (E-IIS), R. Tsantouli (NKUA);

Integration between Search and XSearch

Presenter: Y. Marketakis (FORTH)

Slides: .pdf file

Participants: A. Antoniadis (NKUA), M. Assante (CNR), N. Bailly (FIN), J. Barde (IRD), C. Bekiari (FORTH), C. Baldassarre (FAO), E. Blondel (FAO), F. Brito (Terradue), L. Candela (CNR), V. Canhos (CRIA), H. Caumont (Terradue), P. Cauquil (IRD), G. Coro (CNR), P. Fabriani (E-IIS), G. Farantatos (NKUA), G. Kakaletris (NKUA), A. Manzi (CERN), V. Marioli (CNR), Y. Marketakis (FORTH), P. Pagano (CNR), F. Simeoni (FAO), E. Travaglino (E-IIS), R. Tsantouli (NKUA);

A number of prototypes were briefly presented. However, their integration with the rest of the infrastructure is very limited and should be enhanced.

Ecological Modeling

Presenter: G. Coro (CNR)

Slides: .pptx file

Participants: A. Antoniadis (NKUA), W. Appeltans (UNESCO), N. Bailly (FIN), J. Barde (IRD), C. Bekiari (FORTH), F. Brito (Terradue), L. Candela (CNR), V. Canhos (CRIA), H. Caumont (Terradue), G. Coro (CNR), A. Ellenbroek (FAO), G. Kakaletris (NKUA), V. Marioli (CNR), Y. Marketakis (FORTH), P. Pagano (CNR), M. Taconet (FAO), E. Vanden Berghe (remote participation).

N. Bailly: clustering can be performed in space and time. Time should be taken into account.

  • This requires other types of algorithms;

E. Vanden Berghe: cluster analysis and outlier detection are very important with respect to environmental parameters. Time is less important.

N. Bailly: the kind of analysis described here should be carefully evaluated by experts. Models represent only part of reality.

Environmental parameters worth managing include: bathymetry, salinity, wind, ...

  • the most difficult part is to make the data available through the infrastructure;

V. Canhos:

  • it is fundamental to carefully understand the "quality of data";
  • data quality has implications for model quality;

P. Pagano:

  • the Statistical Service will be released in October. Scientists are expected to start testing it and experimenting with it;

A Demonstration of the Statistical Manager took place.

  • the whole service is open, i.e. new operators can be easily added by developing a dedicated plugin;
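To illustrate what an open, plugin-based design can look like, the sketch below registers operators by name and dispatches calls to them. It is a minimal sketch under assumptions: the `Operator` base class, the `register` decorator, the `execute` function and the `MeanOperator` example are all hypothetical and do not reproduce the Statistical Manager's actual plugin API.

```python
from abc import ABC, abstractmethod
from typing import Dict, List, Type


class Operator(ABC):
    """Hypothetical base class that every plugin operator implements."""

    name: str = ""

    @abstractmethod
    def run(self, values: List[float]) -> float:
        ...


# Hypothetical registry mapping operator names to operator classes.
OPERATORS: Dict[str, Type[Operator]] = {}


def register(cls: Type[Operator]) -> Type[Operator]:
    """Class decorator that adds a plugin to the registry under its name."""
    if not cls.name:
        raise ValueError("operator must define a name")
    OPERATORS[cls.name] = cls
    return cls


@register
class MeanOperator(Operator):
    name = "MEAN"

    def run(self, values: List[float]) -> float:
        return sum(values) / len(values)


def execute(operator_name: str, values: List[float]) -> float:
    """Look up an operator by name and run it on the input values."""
    try:
        operator_cls = OPERATORS[operator_name]
    except KeyError:
        raise KeyError(f"unknown operator: {operator_name}") from None
    return operator_cls().run(values)


if __name__ == "__main__":
    print(execute("MEAN", [1.0, 2.0, 3.0]))  # prints 2.0
```

In this design, adding an operator only requires a new decorated class; the dispatch code stays unchanged.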

Species Products Management

Presenter: P. Pagano (CNR)

Slides: [xxx]

Participants: A. Antoniadis (NKUA), W. Appeltans (UNESCO), N. Bailly (FIN), J. Barde (IRD), C. Bekiari (FORTH), F. Brito (Terradue), L. Candela (CNR), V. Canhos (CRIA), H. Caumont (Terradue), G. Coro (CNR), A. Ellenbroek (FAO), G. Farantatos (NKUA), G. Kakaletris (NKUA), V. Marioli (CNR), Y. Marketakis (FORTH), P. Pagano (CNR), M. Taconet (FAO), E. Vanden Berghe (remote participation).

V. Canhos: Data Quality is a prerequisite for any data processing. We should identify what kind of data quality controls can be implemented;

  • gCube (TimeSeries) already offers facilities for this, called "curation" (e.g. matching values against a code list/controlled vocabulary);
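As an illustration of this kind of curation check, the sketch below matches incoming values against a code list and separates matched from unmatched entries. It assumes a simple case- and whitespace-insensitive comparison. The function name `curate` and the sample country codes are hypothetical and do not reflect the TimeSeries implementation.

```python
from typing import Dict, Iterable, List, Tuple


def curate(values: Iterable[str],
           code_list: Iterable[str]) -> Tuple[Dict[str, str], List[str]]:
    """Match raw values against a controlled vocabulary.

    Returns a mapping from each matched raw value to its canonical code,
    plus the list of values that could not be matched (to be reviewed).
    """
    # Normalise codes once: lower-case and strip surrounding whitespace.
    canonical = {code.strip().lower(): code for code in code_list}
    matched: Dict[str, str] = {}
    unmatched: List[str] = []
    for value in values:
        key = value.strip().lower()
        if key in canonical:
            matched[value] = canonical[key]
        else:
            unmatched.append(value)
    return matched, unmatched


if __name__ == "__main__":
    # Hypothetical code list of country codes and raw input values.
    codes = ["ITA", "GRC", "FRA"]
    ok, bad = curate(["ita", " GRC ", "Italy"], codes)
    print(ok)   # {'ita': 'ITA', ' GRC ': 'GRC'}
    print(bad)  # ['Italy']
```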

The Global Names Architecture (GNA) (http://www.gbif.org/informatics/name-services/global-names-architecture/) is an activity that should be analysed.

GBIF Australia has developed some code for outlier detection and removal, available through Google. This should be analysed.

Usage Tracker Integration (Parallel Session)

Presenter: P. Fabriani (E-IIS)

Participants: P. Fabriani (E-IIS), G. Farantatos (NKUA), A. Manzi (CERN), F. Simeoni (FAO), E. Travaglino (E-IIS);

No decision has been taken yet on the deployment scenarios of the Usage Tracker, but a single UT for the whole infrastructure is a possible solution.

Possible client Integrations:

  • Workflow Engine
  • Tree Manager
  • OpenSearch

Possible connectors:

  • Storage Manager (MongoDB)

Defining the records:

  • Type Data Access: for Tree Manager and OpenSearch (maybe);
  • Type Execution: Workflow;
  • Type Storage: Storage Manager connector;
  • Type Service: all gCube invocations (already collected in a DB; only a connector needs to be created);
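The record types listed above, and the components expected to emit them, can be captured in a small data structure. This is only a sketch of one possible representation: the `RecordType` enum, the `SOURCES` mapping and the `UsageRecord` fields are hypothetical, since the actual record definitions are still to be agreed through tickets.

```python
from dataclasses import dataclass, field
from datetime import datetime, timezone
from enum import Enum
from typing import Dict, List


class RecordType(Enum):
    DATA_ACCESS = "Data Access"
    EXECUTION = "Execution"
    STORAGE = "Storage"
    SERVICE = "Service"


# Components expected to produce each record type, as listed in the notes
# (OpenSearch is only a possible source of Data Access records).
SOURCES: Dict[RecordType, List[str]] = {
    RecordType.DATA_ACCESS: ["Tree Manager", "OpenSearch"],
    RecordType.EXECUTION: ["Workflow Engine"],
    RecordType.STORAGE: ["Storage Manager connector"],
    RecordType.SERVICE: ["gCube invocations connector"],
}


@dataclass
class UsageRecord:
    """Hypothetical minimal usage record sent to the Usage Tracker."""
    record_type: RecordType
    source: str
    consumer: str
    timestamp: datetime = field(
        default_factory=lambda: datetime.now(timezone.utc))

    def __post_init__(self) -> None:
        # Reject records whose source is not expected to emit this type.
        if self.source not in SOURCES[self.record_type]:
            raise ValueError(
                f"{self.source} does not emit {self.record_type.value} records")


if __name__ == "__main__":
    rec = UsageRecord(RecordType.EXECUTION, "Workflow Engine", "user@example.org")
    print(rec.record_type.value, rec.source)  # prints: Execution Workflow Engine
```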

Tickets are to be opened for the definition of the record types.

Contact points :

  • Workflow - John (Gerasimos)
  • Data Access - Fabio
  • Service - Andrea
  • Storage - Paolo, Roberto Cirillo (CNR)

UT deployment in dev (open a ticket)


PEB

Participants: L. Candela (CNR), G. Coro (CNR), P. Fabriani (E-IIS), G. Farantatos (NKUA), G. Kakaletris (NKUA), A. Manzi (CERN), P. Pagano (CNR);

All ongoing deliverables should be completed by 12 Oct.

The deadline for gCube 2.11.0 has been set to 19 Oct. Apart from major fixes, the release will contain only the following components:

- Statistical Manager

- Time Series

- Occurrence

- XSearch

- Data Transfer portlet

- Visualization Library from T2?
