2012.02 WP10: Data Consumption Facilities Development Monthly Activity By Task and Beneficiary


This WP10 Activity Report describes the activities performed in February 2012, by Task and Beneficiary.

It is part of the February 2012 Activity Report.

T10.1 Data Retrieval Facilities

NKUA Activities

NKUA has released version 3.0.0 of the gCube Search System for gCube 2.8.0, which features the integration of the query cache implemented during the previous period into the planner of the Search System.

Version 3.3.0 of the gCube Index subsystem, featuring snippet support, has also been included in the 2.8.0 release.

Moreover, a new feature has been added to all Search Operators included in the Search Operator Library and used by the gCube Search System: the operators now support configurable gRS2 buffer capacities by exposing a relevant parameter through their APIs. This parameterization enables the Search System execution planner to adapt to the number of collocated operator executions by adjusting the buffer capacities of operators executed on the same node, trading performance for memory whenever necessary. The feature is included in a new version of the Search Operator Library (1.1.0), shipped with gCube 2.8.0, and is supported by the corresponding gRS2 enhancement in T9.2, also released in 2.8.0.
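The memory/performance trade-off described above can be illustrated with a small sketch. This is not the gCube API; the function name and the even-split policy are assumptions made purely for illustration.

```python
# Hypothetical sketch: a planner shrinking per-operator buffer capacities
# when many operators are collocated on one node, so that total buffer
# memory stays within a fixed budget. Illustrative only, not gCube code.

def plan_buffers(n_collocated: int, memory_budget: int, default_capacity: int) -> int:
    """Return the buffer capacity each collocated operator should use."""
    if n_collocated == 0:
        return default_capacity
    # Divide the node's budget evenly, but never exceed the default capacity.
    return min(default_capacity, memory_budget // n_collocated)

# With few operators the default is kept; with many, capacity shrinks.
print(plan_buffers(2, 1024, 256))   # -> 256
print(plan_buffers(16, 1024, 256))  # -> 64
```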

Finally, NKUA has been working on the compilation of the specifications for Data Retrieval Facilities in accordance with the schedule of MS41. The milestone has been achieved and the specifications are available here.


MS41 has been achieved with a delay of 2 working days, but is included in the present report as most of the work toward its achievement took place during the current and previous periods.


  • Release of version 3.0.0 of the gCube Search System (org.gcube.search.searchsystem.2-2-0) featuring query caching
  • Release of version 1.1.0 of the Search Operator Library (org.gcube.search.operatorlibrary.1-1-0) featuring configurable buffer capacity
  • Release of version 3.3.0 of the Index subsystem (org.gcube.index-management.3-3-0) featuring snippet support

FORTH Activities

We are developing a prototype application (external to gCube) that will use the gCube Search System as its back end and provide its functionality on top of it. We organized a teleconference to discuss several issues regarding this approach.

We discussed the availability of a textual snippet for each result, as well as the ability to get the actual content of a hit. However, we realized that there is not always a textual description for an ‘object’ (e.g. there are maps, CSV files, time series, etc.), and there is not a single identity resolver for every identifier (regarding the ability to get the actual ‘object’).

A gCube Search System prototype that returns snippets is being prepared by NKUA, but it is not yet available to us (FORTH) because it is not yet part of the infrastructure.


No major issues were faced in the reporting period.


No specific achievements to highlight beyond the activities above.

Terradue Activities

Nothing reported for this period.

T10.2 Data Manipulation Facilities

NKUA Activities

As the redesign of the Data Transformation Service was finished last month, NKUA moved on to the implementation of that design. Working from the bottom up, we implemented the operators of the plan elements, namely the Data Source Operator, Transformation Operator and Merger Operator, which will be used by PE2ng as the working nodes that actually carry out the transformation.

Specifically:

  • The Data Source Operator takes as input a data source description and the content type it forwards, resolves the corresponding data source type through a data source handler, and creates a new data source to be used as a source. It also creates a result set data sink, to which data elements of the corresponding content type read from the source are appended. It returns the locator of the constructed result set.
  • The Transformation Operator takes as input a result set, the corresponding content type and the target content type, performs a simple transformation, and appends the result to a result set. The locator of the created result set is returned.
  • The Merger Operator takes as input a result set containing the locators of the result sets to be merged. A new result set, constituting the output of the whole transformation, is constructed as the outcome of the merging process.
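A minimal Python sketch of how the three operators could chain together, modeling a result set as a plain list addressed by an integer locator. All names are assumptions made for illustration, not the actual gCube classes, and the Merger Operator here takes the locators directly rather than reading them from a result set.

```python
# Illustrative sketch of the three plan-element operators described above.
# A "result set" is a list of (content_type, payload) pairs stored under an
# integer locator; real gCube result sets are streamed, not in-memory lists.

RESULT_SETS = {}       # locator -> list of (content_type, payload)
_next_locator = [0]    # mutable counter for issuing fresh locators

def new_result_set(elements):
    RESULT_SETS[_next_locator[0]] = list(elements)
    _next_locator[0] += 1
    return _next_locator[0] - 1

def data_source_operator(source, content_type):
    """Read elements of the given content type from a source into a result set."""
    return new_result_set((content_type, e) for e in source)

def transformation_operator(locator, source_type, target_type, transform):
    """Apply a simple transformation to every matching element; return the new locator."""
    out = [(target_type, transform(p))
           for (ct, p) in RESULT_SETS[locator] if ct == source_type]
    return new_result_set(out)

def merger_operator(locators):
    """Merge the result sets named by the given locators into one output set."""
    merged = [e for loc in locators for e in RESULT_SETS[loc]]
    return new_result_set(merged)

src = data_source_operator(["a", "b"], "text/plain")
up = transformation_operator(src, "text/plain", "text/upper", str.upper)
out = merger_operator([up])
print(RESULT_SETS[out])  # -> [('text/upper', 'A'), ('text/upper', 'B')]
```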

Next, we have to combine these working nodes, which will be orchestrated by a constructed transformation plan.


No major issues were faced in the reporting period.


No specific achievements to highlight beyond the activities above.

CNR Activities

Nothing reported for this period.

FAO Activities

Nothing reported for this period.

T10.3 Data Mining and Visualisation Facilities

CNR Activities

A special page has been created on the gCube wiki summarizing the experiments being performed on the Statistical Manager service, as well as the enhancements and the implemented functionalities: https://gcube.wiki.gcube-system.org/gcube/index.php/Ecological_Modeling#Habitat_Representativeness

In particular, in February the following techniques have been implemented or enhanced:

  • CLASSIFICATION QUALITY ANALYSIS: This evaluation method applies to a probability distribution and a set of occurrence/absence points. The calculation includes the following values:
    • TRUE_POSITIVES
    • FALSE_NEGATIVES
    • TRUE_NEGATIVES
    • FALSE_POSITIVES
    • ACCURACY
    • SENSITIVITY
    • SPECIFICITY
  • DISCREPANCY ANALYSIS - BETWEEN TWO SPATIAL DISTRIBUTIONS: Evaluates the distance between two spatial probability distributions with the same resolution, in terms of:
    • ACCURACY
    • MEAN ERROR
    • VARIANCE
    • NUMBER_OF_ERRORS
    • MAXIMUM_ERROR
    • MAXIMUM_ERROR_POINT
    • NUMBER_OF_COMPARISONS
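
The two evaluation methods above can be sketched in a few lines, covering a subset of the listed values. These are illustrative reimplementations of the standard formulas, not the Statistical Manager code; all names and the error threshold are assumptions.

```python
# Sketch of the confusion-matrix metrics (predicted presence/absence vs.
# observed occurrence/absence points) and of the cell-by-cell discrepancy
# between two equal-resolution distributions. Illustrative only.

def classification_quality(predicted, observed):
    tp = sum(1 for p, o in zip(predicted, observed) if p and o)
    tn = sum(1 for p, o in zip(predicted, observed) if not p and not o)
    fp = sum(1 for p, o in zip(predicted, observed) if p and not o)
    fn = sum(1 for p, o in zip(predicted, observed) if not p and o)
    return {
        "ACCURACY":    (tp + tn) / len(predicted),
        "SENSITIVITY": tp / (tp + fn) if tp + fn else 0.0,  # true positive rate
        "SPECIFICITY": tn / (tn + fp) if tn + fp else 0.0,  # true negative rate
    }

def discrepancy(dist_a, dist_b, threshold=0.1):
    """Distance between two probability distributions with the same resolution."""
    errors = [abs(a - b) for a, b in zip(dist_a, dist_b)]
    mean = sum(errors) / len(errors)
    return {
        "MEAN ERROR":            mean,
        "VARIANCE":              sum((e - mean) ** 2 for e in errors) / len(errors),
        "MAXIMUM_ERROR":         max(errors),
        "NUMBER_OF_ERRORS":      sum(e > threshold for e in errors),
        "NUMBER_OF_COMPARISONS": len(errors),
    }

print(classification_quality([1, 1, 0, 0], [1, 0, 0, 1])["ACCURACY"])  # -> 0.5
print(discrepancy([0.25, 0.75], [0.25, 0.25])["MAXIMUM_ERROR"])        # -> 0.5
```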

Some notes about the HRS: we applied the HRS technique to the experiment from the previous month. The output of the procedure is currently a score for each input feature (ice concentration, salinity, etc.), ranging from 0 to 2, where 0 means that the set of occurrence/absence points for the species under analysis is rich enough to represent the variability in the projection area (usually all the oceans).

An overall score was also calculated by summing the HRS scores of the features. This is the only minor difference with respect to the MacLeod paper, which suggests weighting each score by the inverse of the eigenvalues of the PCA transformation. We ignored the inverse weighting for two reasons: (i) the eigenvalues depend strongly on the ordering of the vectors used to calculate the PCA (while the HRS scores do not); (ii) since all the PCA components are used by the HRS algorithm, we should consider all the features as independent, or at least equally important; all the HRS scores are commensurable, and an inverse weight would always give too much importance to the less variable dimensions. Our overall HRS score is almost independent of the ordering of the input vectors and rapidly approaches an asymptotic value as an increasing number of random samples is taken from the projection area.
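The unweighted overall score described above amounts to a plain sum of the per-feature HRS values. A minimal sketch, with feature names and scores chosen purely for illustration:

```python
# Overall HRS as the plain (unweighted) sum of per-feature scores, treating
# all PCA components as equally important; the inverse-eigenvalue weighting
# of the MacLeod formulation is deliberately omitted. Values are examples.
hrs_scores = {"ice_concentration": 0.8, "salinity": 1.2, "temperature": 0.5}

overall = sum(hrs_scores.values())
print(overall)  # -> 2.5
```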

In the following table, we report an analysis we performed on the basking shark case discussed last month:

                               Presence +      Presence +       Dense      Random
                               Dense Absence   Random Absence   Absence    Absence
  COMPLETE HCAF                4.49            3.89             7.57       3.39
  DENSE ABSENCE DISTRIBUTION   9.16            17.39            0.00       15.62
  RANDOM ABSENCE DISTRIBUTION  5.20            4.92             6.34       0.00


  • CLASSIFICATION QUALITY ANALYSIS
  • DISCREPANCY ANALYSIS
  • HABITAT REPRESENTATIVENESS SCORE
  • PRINCIPAL COMPONENT ANALYSIS (used as a core functionality in the HRS)

NKUA Activities

Nothing reported for this period.

FAO Activities

Nothing reported for this period.

T10.4 Semantic Data Analysis Facilities

FORTH Activities

We are investigating how to exploit Semantic Web formats for publishing advanced search results (i.e. clusters, mined entities). For the moment we will refer to these services by the name X-Search.

We prepared a document discussing three main topics:

  • [I] Capabilities of the underlying search system
  • [II] Structure and format of X-Search responses
  • [III] Specification of a general HTTP API

Then we organized a teleconference to discuss these issues (as well as issues related to T10.1) with other participants in WP10 and WP11. In summary, the meeting concluded the following.

X-Search services can function on top of the gCube Search System. In this way, all the indexes currently used by the gCube Search System, as well as its distributed evaluation approach, are exploited. Some of the X-Search services (e.g. clustering) rely on the availability of textual snippets; a generic way to tackle this requirement is for the OpenSearch description document of the source (here the gCube Search System), or the query response itself, to indicate which attribute holds the textual descriptions (if any exist). Some of the X-Search services could, on demand, further process the actual contents of a hit; this requires an identifier for the hit and a resolver, and can be tackled analogously, e.g. the OpenSearch description document of the gCube Search System (or its response) could indicate which attribute carries the identifier information.
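The source-agnostic approach summarized above can be sketched as follows. The field names (`snippet_attribute`, `identifier_attribute`) are illustrative assumptions, not part of the OpenSearch specification or of the gCube response format.

```python
# Sketch: the source's description document (or response) declares which
# attributes carry snippets and identifiers, so an X-Search-style service
# can stay agnostic to the underlying search system. All names are
# hypothetical, chosen only to illustrate the indirection.

description = {
    "snippet_attribute": "gcube:summary",
    "identifier_attribute": "gcube:docid",
}

hit = {"gcube:summary": "A basking shark sighting...", "gcube:docid": "doc-42"}

def snippet_of(hit, description):
    """Resolve the snippet via the attribute the description declares."""
    attr = description.get("snippet_attribute")
    return hit.get(attr) if attr else None  # None when the source has no text

print(snippet_of(hit, description))  # -> A basking shark sighting...
```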

In parallel, we are investigating with FAO a configuration of architectural components that can instantiate a retrieval mechanism for Fisheries document resources, leveraging X-Search and the Fisheries Linked Data resource FLOD.


No major issues were faced in the reporting period.


No specific achievements to highlight beyond the activities above.

FAO Activities

Nothing reported for this period.
