D9.2 Service deployment in computing and data e-Infrastructures Version 2

Bibliographic Details
Main Authors: Chen, Yin, Haggström, Ingemar, Buck, Justin, Stocker, Markus, Carval, Thierry, Vitale, Domenico, Huber, Robert, Hellström, Margareta, Candela, Leonardo, Haslinger, Florian
Format: Text
Language: unknown
Published: Zenodo 2018
Online Access: https://dx.doi.org/10.5281/zenodo.3258529
https://zenodo.org/record/3258529
Description
Summary: This document reports on the implementation results of the T9.1 use cases for testing and validating ENVRIplus Theme 2 service solutions. It is an update of the previous report, D9.1, which was submitted in M28. In D9.1 we explained the approach to service integration and validation. The driver was to define appropriate use cases that address the community’s real needs, integrate ENVRIplus Theme 2 services as part of the service solutions, and implement those use cases following an agile approach. This document describes 7 well-developed use cases that were selected as the final Science Demonstrators. By Science Demonstrator we mean “a showcase of a service solution illustrated through a prototype implementation, which serves as proof or evidence that the Theme2 services can bring added value for supporting ENVRIplus community to deliver scientific research”.
Science Demonstrator 1 addresses a requirement of the EISCAT RI community, namely to allow individual scientists to process their experimental data using their own algorithms. The challenge is common to many ENVRIplus RIs, where data is often processed using standard models and methods. As researchers want to use different analysis models, easily modify parameters or algorithms, and collaborate with each other, they need a Virtual Research Environment (VRE). This demo showcases a model that makes use of the D4Science gCube platform developed by T7.1, which enables scientific researchers to re-process data by implementing and adapting algorithms and parameters from other sources.
Science Demonstrator 2 showcases a novel implementation of a computationally efficient tool for processing Eddy Covariance (EC) data. It lets users calculate EC fluxes with the EddyPro® software (LI-COR Biosciences, 2017; Fratini and Mauder, 2014) according to 4 processing schemes, each resulting from a different combination of existing methods.
To reduce the computational runtime, the 4 processing schemes were implemented and executed in parallel. The whole service setup, including a metadata management algorithm, was implemented and tested in the D4Science gCube Virtual Research Environment provided by Task 7.1. The final computational runtime for Near Real Time (NRT) processing (i.e., flux estimates based on raw data collected the previous day) is about 4 minutes, similar to that required for a standard run involving only a single processing scheme.
Science Demonstrator 3 addresses a problem common to ENVRIplus RIs, specifically observatories that build on environmental sensor networks: data acquisition services, in particular the preparation of data for transfer prior to transmission, are often not yet sufficiently standardized. This hinders the operation of efficient cross-RI data processing routines, e.g., for data quality checking. The demonstrator showcases a service prototype that allows submitting and publishing raw observational (non-geophysical) environmental time series data in common standard formats (T-SOS XML and SSNO JSON). Through a messaging API (EGI ARGO), an Apache Storm NRT QC Topology performs NRT quality control procedures and publishes the quality-controlled and labelled data via a messaging output queue.
Science Demonstrator 4 describes the EuroArgo Data Subscription Service (DSS), which allows researchers to subscribe to customized views of Argo data by selecting specific regions and time spans and choosing the frequency of updates. Tailored updates are then delivered on schedule to the researchers’ private storage. The demo showcases an integration solution that combines the EuroArgo community data portal with e-Infrastructure services (EUDAT B2SAFE, EGI FedCloud, etc.) and uses the DRIP service developed by T7.2 for optimised service deployment.
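The subscription idea behind Science Demonstrator 4 (a region, a time span and an update frequency defining a customized view) can be illustrated with a minimal Python sketch. It is not the DSS implementation and does not use the DSS, B2SAFE or DRIP interfaces: the names `Profile`, `Subscription`, `select_view` and all field names are hypothetical, and the bounding-box region model is an assumption made only for illustration.

```python
from dataclasses import dataclass
from datetime import date, timedelta
from typing import List


@dataclass(frozen=True)
class Profile:
    """Hypothetical record for one float profile: position and date."""
    float_id: str
    lat: float
    lon: float
    observed: date


@dataclass(frozen=True)
class Subscription:
    """Hypothetical subscription: bounding box, time span, update interval."""
    lat_min: float
    lat_max: float
    lon_min: float
    lon_max: float
    start: date
    end: date
    every_days: int

    def matches(self, p: Profile) -> bool:
        # A profile belongs to the view if it lies inside the box and time span.
        return (self.lat_min <= p.lat <= self.lat_max
                and self.lon_min <= p.lon <= self.lon_max
                and self.start <= p.observed <= self.end)

    def is_due(self, last_sent: date, today: date) -> bool:
        # An update is due once the chosen interval has elapsed.
        return today - last_sent >= timedelta(days=self.every_days)


def select_view(sub: Subscription, profiles: List[Profile]) -> List[Profile]:
    """Return the profiles that make up the subscriber's customized view."""
    return [p for p in profiles if sub.matches(p)]
```

In such a sketch, a scheduler would call `is_due` for each subscription and, when it returns true, push the output of `select_view` to the subscriber's storage; how the actual service schedules and delivers updates is described in the full report, not here.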
The pilot activity was initiated by the marine research community. However, the possibility of receiving regular transmissions of data, especially in near-real time, directly from the organisation responsible for data collection and (pre-)processing is very important to many large initiatives. ENVRIplus RIs can benefit from the subscription services, e.g., to create more elaborate data products by requesting data from other sources, and can optimise their internal workflows by signing up for automatic updates.
Science Demonstrator 5 showcases a “sensor registry” that aims to support the management of sensors deployed for in-situ measurements. Common sensors or families of sensors are used across different research infrastructures, for example, oxygen optodes, which are mounted on platforms in multiple research infrastructures. The goal of this work is to define common methods to access the sensor metadata in such cases. The sensor registry applies the design principle of the data catalogue developed in WP8, and uses data technologies and standards from the OGC Sensor Web Enablement family, including SensorML, Observations and Measurements (O&M), and Sensor Observation Service (SOS). It brings together a marine domain implementation of these standards (the Marine SWE profile) developed by several European projects, demonstrating its viability for future sensor and observation activities. The service can be integrated with various types of platforms: deep-sea observatories (e.g., EMSO), marine gliders (e.g., EuroGOOS), as well as solid earth (e.g., EPOS) or atmosphere observations (e.g., ICOS). It can also be used to track the usage of specific sensor models (e.g., CO2) across the RI’s observation networks.
Science Demonstrator 6 describes a service prototype that supports aerosol scientists in studying new atmospheric particle formation events by moving data analysis from local computing environments to interoperable infrastructures, thus harmonizing the data analysis itself and, more importantly, the syntax and semantics of the data derived from the analysis. As researchers interpret primary data, and thus gain information and transform information into knowledge, we are studying and advancing in particular some technical aspects of a knowledge infrastructure, i.e., a robust network of scientists, artefacts such as virtual research environments and research data, and institutions such as research infrastructures and e-Infrastructures that acquire, maintain and share scientific knowledge about the natural world. The science demonstrator showcases a possible architecture of a socio-technical infrastructure that “transforms data into knowledge.” The proposed approach highlights a range of novel possibilities, in particular enabling researchers to focus on data analysis and interpretation while leaving data access and transformation from and to systems to interoperable infrastructure. It contributes significantly to implementing the global agenda of FAIR data by promoting the notion of “FAIR by Design”, weaving data FAIRness into the fabric of infrastructures. It builds on the principle of not leaving it to researchers to make data FAIR, but guaranteeing it through the design of well-engineered infrastructures. The demonstrator is of primary interest to a specific scientific community, namely the various aerosol research groups that study new particle formation events.
Science Demonstrator 7 illustrates how a LifeWatch researcher can easily upload and integrate an analysis algorithm in D4Science, and share it with other researchers in a VRE. The use case proposes an integration solution that links the D4Science/gCube VRE to the LifeWatch RI and to the EGI e-Infrastructure.
This integration, for example, enables individual researchers to repeat and reuse algorithms at will, run trend analyses, and add new parameters and custom data. The VRE provides provenance registration, which improves reproducibility, and also allows computation results to be retained in the user’s workspace. This facilitates editing and adaptation of algorithms, features that are not provided by the existing LifeWatch ICT.
Slightly updated from version 2.0 (which was delivered as the official report to the Commission).