The DataCloud Toolbox provides six main tools (DIS-PIPE, DEF-PIPE, SIM-PIPE, ADA-PIPE, R-MARKET, and DEP-PIPE), as shown in the figure below.
Each tool has its own GitHub repository and may consist of several components, each with its own separate GitHub repository.
DIS-PIPE (discovery pipeline tool)
DIS-PIPE integrates process mining techniques and artificial intelligence algorithms in a scalable way to learn the structure of Big Data pipelines by extracting, processing, and interpreting vast amounts of event data collected from several data sources. DIS-PIPE also supports a variety of analytics techniques for visualising the discovered pipelines together with detailed diagnostic information about their execution.
DEF-PIPE (definition pipeline tool)
DEF-PIPE provides a visual design environment in which domain experts implement Big Data pipelines based on a DSL, including means to store and load pipeline definitions. It also enables data scientists to define pipelines by configuring each step, injecting code, or customising predefined generic templates.
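To make the idea of step-wise definition, template customisation, and store/load concrete, here is a minimal Python sketch. It is not the DEF-PIPE DSL or its API: the `Step` and `Pipeline` classes, the `customise` helper, the JSON format, and the image names are all hypothetical and serve only as an illustration.

```python
import json
from dataclasses import asdict, dataclass, field, replace


@dataclass(frozen=True)
class Step:
    """One pipeline step (hypothetical structure, not the DEF-PIPE DSL)."""
    name: str
    image: str                                  # container image that runs the step
    params: dict = field(default_factory=dict)  # per-step configuration


@dataclass
class Pipeline:
    name: str
    steps: list = field(default_factory=list)

    def add(self, step: Step) -> "Pipeline":
        self.steps.append(step)
        return self                             # returning self allows chaining


# A predefined generic template (hypothetical) that is customised per pipeline.
CSV_INGEST_TEMPLATE = Step("ingest", "example/csv-ingest", {"delimiter": ","})


def customise(template: Step, **overrides) -> Step:
    """Return a copy of the template with selected parameters overridden."""
    return replace(template, params={**template.params, **overrides})


def to_json(pipeline: Pipeline) -> str:
    """Store a pipeline definition as JSON text."""
    return json.dumps(asdict(pipeline))


def from_json(text: str) -> Pipeline:
    """Load a pipeline definition from JSON text."""
    raw = json.loads(text)
    return Pipeline(raw["name"], [Step(**s) for s in raw["steps"]])


pipeline = (
    Pipeline("sensor-cleaning")
    .add(customise(CSV_INGEST_TEMPLATE, delimiter=";"))
    .add(Step("clean", "example/clean", {"drop_nulls": True}))
)
restored = from_json(to_json(pipeline))
print([s.name for s in restored.steps])
```

Keeping templates immutable (`frozen=True`) means that customising one pipeline never changes the shared template.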
Components
SIM-PIPE (simulation pipeline tool)
SIM-PIPE simulates the pipeline execution and provides final deployment configurations that conform to the hardware requirements. SIM-PIPE also provides testing functionalities, such as a sandbox for evaluating the performance of individual pipeline steps and statistical analysis of the performance of the overall pipeline.
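The kind of statistical analysis described above can be sketched as follows. This is an illustration only, not SIM-PIPE code: the step names and timings are invented sample data, and the assumption that steps run strictly one after another is ours.

```python
import statistics

# Hypothetical wall-clock times (seconds) from repeated sandbox runs of each step.
step_runs = {
    "ingest": [2.0, 2.2, 2.1, 2.3],
    "clean": [5.0, 5.5, 4.5, 5.0],
}


def summarise(samples):
    """Basic descriptive statistics for one step's runtimes."""
    return {
        "mean": statistics.mean(samples),
        "stdev": statistics.stdev(samples),  # sample standard deviation
        "max": max(samples),
    }


report = {name: summarise(samples) for name, samples in step_runs.items()}

# Assuming sequential execution, the expected pipeline time is the sum of step means.
expected_total = sum(stats["mean"] for stats in report.values())

# The step with the largest mean runtime is the first candidate for more resources.
bottleneck = max(report, key=lambda name: report[name]["mean"])

print(f"expected total: {expected_total:.2f} s, bottleneck: {bottleneck}")
```

Per-step summaries like these are one plausible input for choosing a deployment configuration that satisfies the hardware requirements.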
ADA-PIPE (adaptation pipeline tool)
ADA-PIPE provides a data-aware, adaptive scheduling algorithm that allocates data pipeline steps to Computing Continuum resources and can adapt to infrastructure drift. ADA-PIPE allows dynamic resource reconfiguration for improved performance and SLO fulfillment.
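As a rough illustration of data-aware placement with re-scheduling after drift, consider the greedy sketch below. It is not the ADA-PIPE algorithm: the greedy heuristic, the resource names, the CPU figures, and the transfer-cost table are all assumptions chosen for the example.

```python
def schedule(steps, resources, transfer_cost):
    """Greedy placement: each step goes to the feasible resource that
    minimises the cost of moving its input data (hypothetical heuristic)."""
    free = {name: res["cpu"] for name, res in resources.items()}
    placement = {}
    for step in steps:
        feasible = [name for name in free if free[name] >= step["cpu"]]
        if not feasible:
            raise RuntimeError(f"no resource can host step {step['name']}")
        best = min(
            feasible,
            key=lambda name: transfer_cost[(step["data_at"], resources[name]["site"])],
        )
        placement[step["name"]] = best
        free[best] -= step["cpu"]  # reserve the capacity used by this step
    return placement


# Hypothetical infrastructure and pipeline description.
resources = {
    "edge-1": {"cpu": 2, "site": "edge"},
    "cloud-1": {"cpu": 8, "site": "cloud"},
}
transfer_cost = {
    ("edge", "edge"): 0, ("edge", "cloud"): 5,
    ("cloud", "cloud"): 0, ("cloud", "edge"): 5,
}
steps = [
    {"name": "filter", "cpu": 2, "data_at": "edge"},
    {"name": "train", "cpu": 4, "data_at": "cloud"},
]

initial = schedule(steps, resources, transfer_cost)

# Infrastructure drift: the edge node disappears, so the pipeline is re-scheduled.
remaining = {name: res for name, res in resources.items() if name != "edge-1"}
adapted = schedule(steps, remaining, transfer_cost)

print(initial)
print(adapted)
```

In this toy setup the `filter` step first runs next to its data on the edge node and moves to the cloud node once the edge node is gone, at the price of a higher transfer cost.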
R-MARKET (resource marketplace tool)
R-MARKET deploys a decentralised hybrid permissioned and permissionless blockchain network that federates a vast set of heterogeneous resources from various providers spread across the Computing Continuum. R-MARKET creates a democratic marketplace of trustworthy resources and enables transparent provisioning over multiple control and network domains for external use.
Components
DEP-PIPE (deployment pipeline tool)
DEP-PIPE enables flexible and scalable deployment and orchestration of Big Data pipelines over Computing Continuum resources. DEP-PIPE monitors pipeline execution and provides online SLO metrics to the other tools.
Components