From IMarine Wiki
Latest revision as of 11:03, 11 September 2013
The D4Science / iMarine infrastructure combines the functionality of more than 500 components into a coherent and centrally managed infrastructure of hardware, software, and data resources. Together, these offer a platform that can host a variety of applications. These applications share a common theme: providing a service to a Community of Practice. Unlike other infrastructures that boast size, power, performance, or the latest technology, D4Science puts the community first. In the context of iMarine, this is taken even further, quite literally, as the Ecosystem Approach Community of Practice is spread around the globe. No other infrastructure equals iMarine in supporting real-life scenarios that must overcome 'low' hurdles: low resources, low training, low connectivity, low data quality. We are glad to leave the high hurdles to specialists; we would rather serve communities that work to achieve the UN Millennium Development Goals. This does not imply that we make concessions on quality or performance; rather, we see it as our mission to offer quality and performance to communities that have no resources of their own to jump high hurdles.
The infrastructure resembles an archipelago where applications emerge as islands of services, resting on an underlying infrastructure bedrock. The islands specialize in one or more domains, yet are not isolated 'atolls'. Every island is well connected to others, and island-hopping is strongly encouraged. Each island offers a standard set of features that can be extended by selecting services from several topical bundles.
The iMarine infrastructure currently offers 4 main domain bundles that can be customized and / or enriched into flexible, purpose-built applications. Each application in the infrastructure is tightly integrated with the underlying gCube enabling software, and can access and re-purpose data from other iMarine applications.
Through the enabling environment of gCube, all users benefit from Infrastructure Services, but where to start? For new users, iMarine offers several domain oriented solutions for 4 categories of users: data managers and analysts, biologists, spatial data managers, and policy oriented 'omnivores'. For each of these, a bundle of relevant gCube software components is available in a 'Cube'. This bundle can be limited so that users receive (and pay for) only those resources actually needed or consumed. A bundle can also be extended with resources coming from other bundles; our aim is to offer bundles characterized by the domain tools, and not by domain boundaries. In our experience, most experts prefer to manage their information in a bundle of domain specific software, and are only consumers of data from other bundles. Thus, in most use scenarios, a user would be a data manager in one bundle, but only a consumer in another.
The 4 key-applications that iMarine has delivered and continues to enrich are:
- BiolCube; focuses on the management and interpretation of biodiversity data.
- StatsCube; a complete full life-cycle data framework, from observational data to aggregated data repositories enriched with validation and analytical tools.
- GeosCube; tightly connected to BiolCube, this framework is based on OGC compliant tools and services that manage the storage and interpretation of geospatial explicit information, including WPS processing.
- ConnectCube; brings semantic technologies for publishing structured data so that it can be interlinked and become more useful to end-users, enabling them to produce LOD and to share information in a way that can be read automatically by computers. This enables data from different sources to be connected and queried.
The bundle approach, by itself an abstraction over a host of services, is expected to offer more 'flavors' in the near future. For instance a focused approach for infrastructure support for Mobile Apps is foreseen:
- AppsCube; will offer an integrated approach to mobile app development. The infrastructure organizes the content and data-exchange with mobile apps. Please note that the App itself is not developed with iMarine; rather, it relies on the infrastructure to maintain and manage the data collected with and exposed through this App.
- IceCube; an Integrated Computing Environment, will offer access to infrastructure, cloud and grid based computing resources "as a Service". An instance of such a Cube will offer users access to predefined data, and to algorithms that can be applied to these data.
BiolCube is available as a suite that packs many useful features into one research environment, where marine ecologists are offered a complete private work-space to manage species names and occurrence data. The main areas where BiolCube offers services are:
Taxonomic and occurrence data discovery and management
- Occurrence data finder: Download public datasets from world class biodiversity occurrence data repositories to your private environment, where you can prepare datasets for use in further analysis with iMarine tools for data curation, filtering, merging and duplicate detection. Occurrence data can be directly visualized on maps using the geo-explorer, downloaded in several formats, and shared with or sent to other environments.
- Species name finder: Not sure about a species name? Then iMarine offers tools to search, download and verify taxonomic and vernacular names of marine species.
- Species name matcher: Correcting spelling mistakes or incomplete names can be very time-consuming. With iMarine tools you can validate the species names in your data to ensure they comply with the standard of your choice. iMarine offers powerful matching and reconciliation services, already in use at FAO, to identify close matches for the names in your datasets. The infrastructure makes several key reference datasets available for consultation and reconciliation. These include the FAO ASFIS species list, FishBase for finfishes, and WoRMS, the World Register of Marine Species. If you wish, you can add your own reference list.
- Environmental enrichment of data: In a shared service with GeosCube, this service adds environmental information to occurrence data to improve their quality and usefulness in modeling and analytical exercises. The service provides estimates for a range of dynamically computed environmental parameters such as water temperature, ocean color, salinity, aragonite content, or BOD. The service can identify the nearest observations in space and time, and will return a computed average or the nearest observation that can document an occurrence. The tools let users specify what 'nearest' means, i.e. a distance, a distance over a gradient, a seasonal average, or a depth range.
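The 'nearest observation in space and time' idea behind the environmental enrichment service can be illustrated with a minimal sketch. This is not the iMarine implementation: the `Observation` type, the `enrich` function, and the `max_km` / `max_days` thresholds are hypothetical names chosen for illustration, and 'nearest' is simplified to great-circle distance within a time window.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta
from math import asin, cos, radians, sin, sqrt


@dataclass
class Observation:
    """Hypothetical environmental observation (not an iMarine type)."""
    lat: float
    lon: float
    time: datetime
    value: float  # e.g. water temperature in degrees Celsius


def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance between two points in kilometres."""
    lat1, lon1, lat2, lon2 = map(radians, (lat1, lon1, lat2, lon2))
    a = sin((lat2 - lat1) / 2) ** 2 + cos(lat1) * cos(lat2) * sin((lon2 - lon1) / 2) ** 2
    return 2 * 6371.0 * asin(sqrt(a))


def enrich(lat, lon, when, observations, max_km=50.0, max_days=15, mode="nearest"):
    """Return a value for an occurrence, or None if no observation is near enough.

    mode="nearest" returns the spatially closest observation in the time window;
    mode="average" returns the mean of all observations within range.
    """
    window = timedelta(days=max_days)
    candidates = [
        (haversine_km(lat, lon, o.lat, o.lon), o)
        for o in observations
        if abs(o.time - when) <= window
    ]
    candidates = [(d, o) for d, o in candidates if d <= max_km]
    if not candidates:
        return None
    if mode == "average":
        return sum(o.value for _, o in candidates) / len(candidates)
    return min(candidates, key=lambda pair: pair[0])[1].value
```

A distance over a gradient, a seasonal average, or a depth range would each need a different candidate filter; the sketch covers only the plain distance case.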
Modeling and analysis of distribution data
- Biodiversity mapping tools: The first iMarine species distribution and biodiversity mapping tool enabled the production of the well-known AquaMaps. With iMarine, the generation became faster and more robust, and results are shared in a collaborative environment. In addition to AquaMaps, many other biodiversity analytical and predictive tools are available. These include the toolset of OpenModeler and custom-built Neural Network driven analytical services.
- Species fact-sheets generator: With scientists spread over the globe, generating consistent information sheets on marine species is no sinecure. That is why the FishFinderVRE was designed. It offers a complete templating and reporting work-flow operated by scientists, for scientists. The results, species fact-sheets, can be disseminated in a variety of formats, in particular those established by FAO for its now famous species and regional catalogues, field guides, and the more recent pocket guides.
- Trend-analysis of data: In a shared service with StatsCube, Trendylyzer offers services to identify and visualize trends in time-series of data. Trendylyzer was developed to specifically address skewness and gaps in datasets.
- Spatial analysis of data: In a shared service with StatsCube, BiolCube offers clustering, probability, and other spatial analytical features.
BiolCube is an independent yet not isolated bundle of specialized services for marine ecologists and natural aquatic resource managers. Well embedded in the iMarine e-Infrastructure, it provides access to auxiliary services that turn BiolCube into a multi-purpose toolbox for biodiversity data analysis. iMarine enables near-seamless access to powerful statistical analysis software through StatsCube, and to advanced plotting and geospatial data production through GeosCube.
With BiolCube and StatsCube services combined, developers are now working towards an integrated environment where species distribution can be studied in space and over time, with occurrence data analyzed using measured environmental observations, rather than estimated large scale average values.
The services that are most characteristic of this bundle are:
- Species Product Discovery service
- Occurrence Data Reconciliation
- Occurrence Data Enrichment Service
- Taxon Names Reconciliation Service
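To give a feel for what taxon name reconciliation involves, the sketch below matches possibly misspelled names against a reference list using the `difflib` module from the Python standard library. It is an illustration under stated assumptions, not the matching logic used by iMarine or FAO: the `reconcile` function, the four-name reference list, and the 0.8 similarity cutoff are all hypothetical.

```python
from difflib import get_close_matches

# Tiny illustrative reference list; a real one would come from a source
# such as ASFIS, FishBase or WoRMS.
REFERENCE_NAMES = [
    "Thunnus albacares",
    "Thunnus alalunga",
    "Katsuwonus pelamis",
    "Gadus morhua",
]


def reconcile(names, reference=REFERENCE_NAMES, cutoff=0.8):
    """Map each input name to a (matched reference name or None, status) pair."""
    by_lower = {ref.lower(): ref for ref in reference}
    result = {}
    for name in names:
        key = " ".join(name.split()).lower()  # normalise whitespace and case
        if key in by_lower:
            result[name] = (by_lower[key], "exact")
            continue
        # Fall back to the single most similar reference name above the cutoff.
        close = get_close_matches(key, list(by_lower), n=1, cutoff=cutoff)
        result[name] = (by_lower[close[0]], "close") if close else (None, "unmatched")
    return result
```

Calling `reconcile(["Thunnus albacores", "gadus  morhua", "Octopus vulgaris"])` would flag the first name as a close match, the second as exact after normalisation, and the third as unmatched, which is the kind of triage a curator would then review.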
If you wish to learn more about using BiolCube or specific services, please contact us.
StatsCube offers a complete data suite to manage the entire data-cycle from collection to archiving. With iMarine technologies, exciting new capabilities are added to the life-cycle management and analysis of data, especially time-series data. StatsCube is developed using state-of-the-art OpenSource components that are brought together in a managed infrastructure. This enables a very cost-effective offer to resource-poor institutes in need of sophisticated data services. Other benefits are the availability of shared services for reference data management, and harmonization of data repository services.
StatsCube relies on continued support and ongoing development of a bundle of services. This bundle offers services that together support a complete life-cycle for statistical data, but can also connect to services offered through other bundles to establish a network of cross-domain services.
The StatsCube bundle offers a set of services available to VRE managers. They can select from this bundle to compose one or more VRE's, and decide who can access such services. This allows for a fine-grained approach to sometimes complex data-workflows, where data flow from detailed field level data through several aggregation and review stages until summary statistics can be produced. At each stage of such a work-flow, other resources can be mobilized in support of specific activities such as geo-referencing, enrichment with environmental data, statistical modeling or analysis. With StatsCube, iMarine implements key data services:
Data Work-flow: If you need to manage data-flows, iMarine offers life-cycle support where data enter the system as observations or batch data, and can then be harmonized and validated before being added to a repository. Not only are data well described by metadata during this process, but the processing steps are also captured as process metadata. The entire process is under the control of a 'visor' that protects the data from unauthorized access and modifications. The harmonization can rely on powerful matching features that establish matches between datasets that would be very time-consuming to establish manually. Just as one would expect in a work-flow, the matching results are kept for re-use and reference. The matching is usually performed against (long) code lists that are fully managed through the iMarine infrastructure. A specialized code list manager enables the ingestion (of existing SDMX code lists), creation, and maintenance of reference lists.
Data Analysis: iMarine excels in offering advanced data analysis facilities to users. The clear separation of data and analytical resources also makes it easy to work with these analytical tools. The infrastructure stores the data, and no complicated steps are needed other than to select and filter the datasets, and load these into the required analytical environment. For analysis, several environments are proposed, ranging from a bare-bones R-studio, parallelized R-servers, and VRE-based analytical and predictive algorithms such as AquaMaps, to the Statistical manager, where users can integrate their own logic. This logic can exploit infrastructure computing resources, or interact with external Cloud or Hadoop clusters. With iMarine, the threshold for exploiting such resources is lowered considerably, making them accessible to a much wider, geographically dispersed EA-CoP. Examples of analytical features implemented in iMarine are:
- Tools include R, WPS, Hadoop, WEKA data mining and access to Cloud resources;
- Algorithms in the statistical service include DBSCAN, Neural Networks, Clustering, and trend analysis.
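As an illustration of one of the algorithms named above, the following is a minimal, self-contained DBSCAN sketch. It is a textbook version written for readability, not the code run by the iMarine statistical service; it uses plain Euclidean distance on coordinate tuples (a simplification for geographic data) and a quadratic neighbour search.

```python
from math import dist

NOISE = -1


def dbscan(points, eps, min_pts):
    """Label each point with a cluster id (0, 1, ...) or NOISE (-1)."""
    labels = [None] * len(points)

    def neighbours(i):
        # Indices of all points within eps of point i (including i itself).
        return [j for j, q in enumerate(points) if dist(points[i], q) <= eps]

    cluster = -1
    for i in range(len(points)):
        if labels[i] is not None:
            continue
        seeds = neighbours(i)
        if len(seeds) < min_pts:
            labels[i] = NOISE  # may later be relabelled as a border point
            continue
        cluster += 1
        labels[i] = cluster
        queue = [j for j in seeds if j != i]
        while queue:
            j = queue.pop()
            if labels[j] == NOISE:
                labels[j] = cluster  # border point reached from a core point
            if labels[j] is not None:
                continue
            labels[j] = cluster
            nbrs = neighbours(j)
            if len(nbrs) >= min_pts:
                queue.extend(nbrs)  # j is a core point, so keep expanding
    return labels
```

Applied to occurrence coordinates, dense groups of records receive a shared cluster id while isolated records are marked as noise, which can help separate plausible aggregations from outliers.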
Data reporting and visualization: After a dataset has been added to the infrastructure, or once an analysis has been performed, the results are available in the same infrastructure to enrich reports, repositories or other infrastructure resources that can access them. Datasets in iMarine are easily enriched and re-used in sometimes surprising new contexts. Some advanced facilities to work with statistical data are:
- geo-referencing time-series, and display these on maps;
- include time-series in reports;
- infrastructure services for download, sharing and sending datasets.
A few key services of this bundle are:
- Tabular Data
- Time Series
- Data manipulation, mining and modeling
For more information on getting started with and using StatsCube, the iMarine website offers many resources. You can also register on the iMarine gateway to experience some of the components.
Examples of StatsCube implementations are:
- ICIS; a complete solution for the collection and dissemination of fisheries capture data.
- Tuna Atlas; a focused ICIS implementation, with extended mapping capabilities provided through GeosCube.
- TimeSeries Environment; An open free-to-use private solution of ICIS.
- Trendylyzer; A trend-analysis toolkit for time series that have evolved over time, and have incorporated inconsistencies, gaps, and discrepancies. Trendylyzer employs a range of mining and manipulation techniques to first prepare a harmonized data-set, and then discover trends, if the data allow.
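A very small sketch of trend discovery in a gappy time series is given here. It fits an ordinary least-squares line through the observed years only and simply skips the gaps. This is an assumption-laden illustration, not Trendylyzer's actual method; the function name `linear_trend` and the year-to-value dictionary layout are hypothetical.

```python
def linear_trend(series):
    """Least-squares slope and intercept of value against year, skipping gaps.

    `series` maps year -> value, with None marking a gap in the record.
    """
    pairs = [(year, value) for year, value in sorted(series.items()) if value is not None]
    if len(pairs) < 2:
        raise ValueError("need at least two observed values to fit a trend")
    n = len(pairs)
    mean_x = sum(x for x, _ in pairs) / n
    mean_y = sum(y for _, y in pairs) / n
    sxx = sum((x - mean_x) ** 2 for x, _ in pairs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in pairs)
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x
```

For example, `linear_trend({2000: 10, 2001: None, 2002: 14, 2003: 16, 2005: 20})` fits the line through the four observed years; a production tool would also need to handle skewness and inconsistent series, which this sketch does not attempt.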
GeosCube is the iMarine answer to the large and complex issue of understanding fisheries and biodiversity data in the spatial domain. Through GeosCube, spatial services are offered to consumers of the iMarine infrastructure, be they other iMarine tools or VRE's, or external organizations wishing to use iMarine's web-services.
Through GeosCube, iMarine aims to offer an INSPIRE directive compliant bundle of services that will enable the generation and management of geospatial explicit data for practitioners who have no resources to develop and maintain their own spatial data infrastructure. From the onset of iMarine, GeosCube was seen as a service provider to several business cases. The set of services, standards and protocols that together comprise the bundle rely on W*Ss, GeoNetwork, GeoServer, and THREDDS. In iMarine a catalogue is implemented using the CS-W protocol through a GeoNetwork service. GeosCube bundles a range of OGC compliant resources that can be made available either in its entirety, or as a selection of services that can be mounted in a customized environment, such as a VRE. These VRE's are vertically integrated, and horizontally interoperable. They rest on the gCube infrastructure, and are thus managed through a well-defined environment, while at the same time seamlessly benefiting from data and processing resources made available through that infrastructure.
GeosCube bridges the gap between powerful infrastructure-based geospatial tools and data, and lightweight web map solutions with limited processing capacity. It thus enables the use of these powerful tools for resource limited users and organizations.
GeosCube bundles the tools to:
- Upload large datasets and overlay them with thousands of other layers;
- Share edit or view access with small or large groups;
- Export data to standard formats;
- Make use of powerful online geospatial tools;
- Predictive mapping using world-class algorithms such as AquaMaps;
- Analytical features such as clustering and trend-analysis with the custom-built statistical manager;
- Legacy applications for e.g. interpolation and map comparison using WPS/Hadoop;
- Use our DIY approach to convert and host your application;
- Georeference statistical data, occurrence data, fact-sheets, and documents online;
- Publish one’s data to the world or to just a few collaborators.
GeosCube is constantly being enriched with features. We are working hard on:
- Annotation and commenting on maps;
- Create and edit maps and link map features to rich media content including LOD;
- Validation of geospatial explicit data such as names, location, and movements;
- Interpolate environmental data sets to add information to occurrence data;
- Mobile client;
- Field-data collection.
Interested users can select services from this bundle described in detail here:
Example products that rely on services made available through this bundle in the iMarine infrastructure are:
- AquaMaps; use this State-of-the-art suite to generate predictive species distribution maps;
- ICIS; Georeference Statistical datasets;
- Species Products Discovery; discovery and sharing of species occurrence geospatial datasets (KML / GML);
- GeoExplorer; visualize species information, environmental information, borders and competence areas, and other geospatial explicit data. View details, select layers of information and share the results.
ConnectCube aims to deliver information to policy makers from a variety of sources as an integrated view. These are generated using a variety of approaches, including semantic technologies.
ConnectCube offers flexible sharing, storage, reporting, search and retrieval, aggregation and projection facilities. These are primarily offered as data-driven indicators and topical fact sheets. These facilities can only be effective if a modern toolset is available to enrich or annotate existing data with relevant information in the form of e.g. URIs.
ConnectCube includes several semantic technologies. One important objective is to identify and link equivalent concepts from different resources, in order to allow a harmonized search over datasets. The current semantic network includes entities and relationships from the domains of marine species, water areas, land areas, exclusive economic zones, and capture. It serves software applications in the domains of statistics and GIS. The main information outlets are currently semantic factsheets. The content is also exposed via either SPARQL endpoints (suitable for semantic applications), or via a Java API to be embedded in consumers' application code (one could also see the Semantic Cluster technologies wiki page).
The use of an infrastructure makes it possible to focus on the needs of policy makers, who need to rely on dynamic reports extracted in near-real time from data coming in from multiple directions, with varying quality and accessibility policies attached to these data flows.
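How a client might ask a SPARQL endpoint for concepts equivalent to a given one can be sketched as follows. The endpoint address and the concept URI are placeholders (the actual iMarine endpoint and vocabulary are not given here), and the use of `owl:sameAs` to express equivalence is an assumption; the sketch only prepares a standard SPARQL Protocol GET request and does not send it.

```python
from urllib.parse import urlencode
from urllib.request import Request

# Hypothetical endpoint; replace with the address of an actual SPARQL service.
ENDPOINT = "https://example.org/sparql"

# Assumes equivalence links are published as owl:sameAs statements.
QUERY = """
PREFIX owl: <http://www.w3.org/2002/07/owl#>
SELECT ?equivalent WHERE {
  <%s> owl:sameAs ?equivalent .
} LIMIT 20
"""


def build_request(concept_uri, endpoint=ENDPOINT):
    """Prepare (but do not send) a SPARQL Protocol GET request."""
    params = urlencode({"query": QUERY % concept_uri})
    return Request(
        f"{endpoint}?{params}",
        headers={"Accept": "application/sparql-results+json"},
    )
```

Passing the returned object to `urllib.request.urlopen` would execute the query against a live endpoint and return the bindings as JSON, provided the service supports that result format.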
- Organizational features of iMarine; Workspace, messaging, mailing, user management
- Social tool;
- Semantic search and fact-sheets;
- Ontology engineering and use, especially in the fisheries domain;
- Linked Open Data engineering and maintenance;
- Plugins for remote information (OAI, OpenSearch)
Expected products that use semantic services from the ConnectCube bundle are:
- Ecoscope; semantic fact sheets for tuna fisheries;
- Smartfish; semantic factsheets on top of 3 data repositories;
- FishFinder; factsheets of marine species enriched with semantic annotations.
Some of the most indicative services for this bundle are:
Examples of products that already rely on services offered through this bundle are:
- The reporting VRE's FCPPS and FishFinderVRE;
- The iSearch VRE;
- All VRE's equipped with the social tools and workspace.
The rapidly growing use of mobile apps for data collection and dissemination requires that content and reference data are managed from an integrated data perspective. With ever more versatile and demanding apps, data often cannot be kept in one central repository that fits all sizes. Very often, apps mash-up data from e.g. geo-spatial and statistical data resources, or, when used in data collection, rely on constantly updated reference data, such as of names of species, vessel characteristics, or local reporting requirements.
Modern apps require an infrastructure that was designed specifically with data discovery, access, and manipulation features in mind, and that combines these with search and retrieval functionality over multiple resources. With the D4Science infrastructure, iMarine offers a very powerful backbone to mobile apps.
In iMarine, mobile apps are considered as data clients for data managed through the infrastructure, which are exposed to the apps (or vice-versa) through web-services. Examples are map-display in the AppliFish mobile app, and the infrastructure search enabled in the search mobile app. The D4Science infrastructure can make data available to apps through reliable connectors, and can offer services that collect and validate mobile application data.
The first mobile applications in iMarine that provide evidence of the suitability of the infrastructure are:
- AppliFish; The FAO species fact sheets enriched with domain specific data (+4000 downloads!)
A key benefit of iMarine is the ease of setting up scalable data processing solutions. A scalable solution may be needed because you have to manage any combination of a lot of users, a lot of data, a lot of processing, and a lot of new functionality. This requires expertise that is usually not found in one place. An infrastructure can offer more than one solution: a dedicated computing environment, parallelization, access to a grid or cloud environment, or outsourcing computations to external infrastructures are all options to consider. With iMarine expertise, you can ask for a technology solution, where several options can be discussed.
The iMarine Integrated Computation Environment Bundle (ICE-Cube) aims to speed up not only the computational processes, but also the administrative and organizational process to select, tune, and test a new infrastructure.
The services available on demand can be separated in several categories:
- Manage administrative scalability
- Manage users;
- Manage virtual Organizations.
- Manage Functionality
- Manage data in a pre-processing environment;
- Select the processes you wish to apply to your data;
- Perform the computation and monitor progress, intervene if needed;
- Share the results, or use in another process in the same infrastructure, eliminating the need to transfer data;
- Keep a trail of the applied processes with the data results, boosting reproducibility and credibility.
- Manage Load scalability
- If your computations take more time than expected, or are growing fast in number or size, more resources can be dynamically added;
- If your computation is complex or unstable, iMarine can offer expertise from trained computer experts to analyze the code and propose alternative solutions.
- Manage geographic scalability
- Keep your data and processes together to ensure confidentiality;
- Bring your computation to your data to reduce bandwidth use.
ICE-Cube is available and ready to be further exploited.