The Dataverse
Data Spaces can be nested and overlapping, as one dataset can participate in many Data Spaces, and many applications can be built on top of one Data Space. The whole is greater than the sum of its parts: this is the Dataverse.
Overview
OKP4 is a general-purpose Ecosystem that enables XaaS (Anything as a Service) integration. Anything presented to the Protocol as a Service, whatever it does, wherever it is hosted or deployed (in the cloud or on premises), and whoever provides it, can be used by the Protocol. Therein lies the integration power of the Protocol, which brings infinite scalability and extensibility to the entire OKP4 ecosystem.
The Protocol already supports a broad typology of services (a minimal descriptor sketch follows this list):
- Data as a Service - Services allowing data management: storage, accessibility, administration...
- Algorithm as a Service - Services that process data to produce meaningful information (i.e. knowledge), for instance: machine learning algorithms.
- Infrastructure as a Service - Services that offer computation, storage and networking resources.
- Identity as a Service - Services that offer a decentralized management of identities.
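To make this typology concrete, here is a minimal sketch, in TypeScript, of what a service descriptor could look like. The type and field names (ServiceKind, provider, endpoint, ...) are illustrative assumptions, not the Protocol's actual schema.

```typescript
// Hypothetical sketch of a service descriptor referenced in the Dataverse.
// Field names are illustrative, not the Protocol's actual schema.
type ServiceKind = "data" | "algorithm" | "infrastructure" | "identity";

interface ServiceDescriptor {
  id: string;            // unique reference recorded on-chain
  kind: ServiceKind;     // which "as a Service" family it belongs to
  provider: string;      // who operates the service
  endpoint: string;      // where it is hosted (cloud, on premises, ...)
  description?: string;  // human-readable summary
}

// Example: a machine-learning algorithm exposed as a service.
const anonymizer: ServiceDescriptor = {
  id: "svc:example-anonymizer",
  kind: "algorithm",
  provider: "did:example:acme",
  endpoint: "https://algo.example.com/anonymize",
  description: "Removes personally identifiable information from tabular datasets",
};
```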
Applications that leverage outputs from one (or many) Data Spaces are not considered resources in the Dataverse, as they are not referenced in the OKP4 protocol.
The Ecosystem is based on a fully decentralized, open and agnostic architecture, which is:
- Scalable - designed to scale to infinity.
- Interoperable - ensures maximum integration with existing and future services.
- Auditable - provides a complete topology of the Ecosystem, recorded as an ontology on-chain.
The Dataverse is an ever-expanding universe comprising all the Datasets, Algorithms, Software, Infrastructures, Identity solutions, Workflow Engines, and other resources referenced in the Blockchain.
Let's dive into each kind of resource that can be shared in the Dataverse and how to coordinate them.
Decentralized Orchestration
The Workflow Engine
The Workflow Engine is a special service of the Protocol which orchestrates the invocations of other services. It is a reflex component of the Ecosystem that listens to the transactions of the blockchain and is triggered on command, when a particular execution-request transaction is registered in the blockchain.
The following diagram describes the different possible interactions during the execution of a workflow with a single service (a minimal code sketch of this loop follows the steps).
- (1) - The use case starts with the submission of a transaction requesting to execute a workflow.
- (2) & (3) - In parallel, the Workflow Engine regularly queries the blockchain to retrieve the latest validated transactions.
- (4) & (5) - If the transaction is a request to execute a workflow, the Workflow Engine retrieves the metadata describing the services to invoke. For simplicity, we consider here that the invoked workflow consists of a single service.
- (6) - Before invoking the service, the Workflow Engine records a status transaction in the blockchain. This information is important because it helps to ensure that the contract established between the customer and the various suppliers (data, services, infrastructure, etc.) is honored.
- (7), (8) & (9) - Based on the service metadata, the Workflow Engine is able to call the service.
- (10) - After invoking the service, the Workflow Engine records a status transaction (success, failure) in the blockchain.
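The event-driven loop above can be summarized with a hedged sketch. The client interface and method names (queryLatestTransactions, fetchWorkflowMetadata, recordStatus, invokeService) are illustrative placeholders rather than the actual Protocol API; the numbered comments map back to the steps of the diagram.

```typescript
// Hypothetical sketch of the Workflow Engine loop described above.
interface ExecutionRequestTx {
  type: string;
  workflowId: string;
  parameters: Record<string, unknown>;
}

interface WorkflowMetadata {
  services: { endpoint: string }[];
}

interface BlockchainClient {
  queryLatestTransactions(): Promise<ExecutionRequestTx[]>;
  fetchWorkflowMetadata(workflowId: string): Promise<WorkflowMetadata>;
  recordStatus(workflowId: string, status: "started" | "success" | "failure"): Promise<void>;
}

// (7), (8) & (9): call the service described by its metadata
async function invokeService(service: { endpoint: string }, params: Record<string, unknown>): Promise<void> {
  await fetch(service.endpoint, { method: "POST", body: JSON.stringify(params) });
}

async function workflowEngineLoop(chain: BlockchainClient): Promise<void> {
  while (true) {
    // (2) & (3): poll the blockchain for the latest validated transactions
    const txs = await chain.queryLatestTransactions();

    for (const tx of txs) {
      if (tx.type !== "execution-request") continue; // not a workflow execution request

      // (4) & (5): retrieve the metadata describing the service(s) to invoke
      const workflow = await chain.fetchWorkflowMetadata(tx.workflowId);

      // (6): record on-chain that the execution has started
      await chain.recordStatus(tx.workflowId, "started");

      try {
        await invokeService(workflow.services[0], tx.parameters);
        // (10): record the outcome on-chain
        await chain.recordStatus(tx.workflowId, "success");
      } catch {
        await chain.recordStatus(tx.workflowId, "failure");
      }
    }
  }
}
```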
The knowledge graph
All the different activities of the Workflow Engine, and especially the invocations of the services, are recorded in the blockchain, with all the necessary context for a complete understanding.
Since all the information is expressed in the form of an ontology, the Workflow Engine enables the chain of custody of each piece of data (what) through the different processing steps (how) operated at the different locations (where) at a given time (when), which leads to a knowledge graph from the source data to the final knowledge produced. Moreover, based on this knowledge graph, monetary policies can be applied in order to reward all the actors in the workflow, from Data Providers to Service Providers, in accordance with the business model stated in the Data Space.
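As an illustration, a single provenance fact feeding this knowledge graph could be modeled as follows. The field names and identifiers are illustrative assumptions; the Protocol expresses this information as an on-chain ontology rather than the ad-hoc structure shown here.

```typescript
// Hypothetical sketch of a single provenance fact (what / how / where / when).
interface ProvenanceRecord {
  what: string;        // the piece of data being processed
  how: string;         // the service that processed it
  where: string;       // the infrastructure on which the processing ran
  when: string;        // when it happened (ISO 8601 timestamp)
  producedBy: string;  // the workflow execution that generated this fact
}

const fact: ProvenanceRecord = {
  what: "dataset:example-weather-2023",
  how: "svc:example-anonymizer",
  where: "infra:example-eu-cluster",
  when: "2023-06-01T12:00:00Z",
  producedBy: "workflow-exec:42",
};
```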
The workflow
A workflow is a (standardized) description of a processing flow using existing services. It is understood that a workflow can be reduced to the use of a single service, in which case the processing is just the invocation of that service.
The services registered in the Protocol's Catalog are described as a workflow using a Domain Specific Language, which gives the Workflow Engine great flexibility in how it interacts with the described service. In this sense, at runtime, the Protocol's service descriptions act as a broker to the target services: an intermediary that performs protocol adaptation and translation between the Workflow Engine and the invoked services.
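The following sketch suggests what such a workflow description could look like once parsed: an ordered list of steps, each referencing a Catalog service and mapping its inputs and outputs. It is an assumption for illustration only; the actual Domain Specific Language used by the Protocol may differ.

```typescript
// Hypothetical sketch of a parsed workflow description. The real DSL may differ.
interface WorkflowStep {
  service: string;                 // Catalog reference of the service to invoke
  inputs: Record<string, string>;  // maps service inputs to datasets or upstream outputs
  outputs: string[];               // names under which the step's results are published
}

interface WorkflowDefinition {
  id: string;
  steps: WorkflowStep[];
}

const anonymizeThenTrain: WorkflowDefinition = {
  id: "wf:example-anonymize-then-train",
  steps: [
    {
      service: "svc:example-anonymizer",
      inputs: { data: "dataset:example-weather-2023" },
      outputs: ["clean"],
    },
    {
      service: "svc:example-trainer",
      inputs: { data: "clean" },
      outputs: ["model"],
    },
  ],
};
```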
The decentralization of orchestration
Decentralization comes first and foremost through the Protocol's ability to bring into play services of all kinds, hosted anywhere. It is also possible to have the same services (or services offering a similar set of features) deployed in several places (cloud or on premises), which reinforces the distributed aspect. The Workflow Engine itself obeys the same principles and can therefore exist in different deployments in the cloud. In this case, the selection of the Workflow Engine instance can be done according to different strategies that can be decided at the Data Space level and/or at the level of the user running the workflow.
Secondly, decentralization is also achieved by the ability of the Protocol to be agnostic: indeed, each service can be described by any language and thus interpreted by any Workflow Engine. By this principle, it is possible to have services invocable by different Workflow Engines.
Datasets
Any Dataset can be shared and used in the Dataverse.
Because all datasets and services are stored off-chain, OKP4 can support any volume, variety and velocity of data.
If your data is stored locally, you need to make it available to the network. You can host it on your own server, upload it to a cloud provider, or publish it to a decentralized storage network... anything is possible.
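For instance, a minimal way to make a locally stored file reachable by the network is to serve it over HTTP, as in the Node.js sketch below (paths, port and file names are placeholders; any other hosting option works as well).

```typescript
// Hypothetical sketch: serving a local dataset over HTTP so the network can reach it.
import { createServer } from "node:http";
import { createReadStream } from "node:fs";

createServer((req, res) => {
  if (req.url === "/datasets/weather-2023.csv") {
    res.writeHead(200, { "Content-Type": "text/csv" });
    createReadStream("./data/weather-2023.csv").pipe(res); // stream the file from local disk
  } else {
    res.writeHead(404);
    res.end();
  }
}).listen(8080); // the resulting URL is what gets referenced in the Dataverse
```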
Algorithms
Any Algorithm can be shared and used in the Dataverse, either as code (without the computation infrastructure) or as an API.
Infrastructure
Any Infrastructure can be shared and used in the Dataverse.
Storage
All storage options can be considered: from local storage to company-operated cloud storage (e.g. AWS, Azure, Google Cloud...), to decentralized storage networks (e.g. IPFS, Filecoin, Arweave...).
Computation
All computation options can be considered: many services will be provided as APIs with their own computation resources, but others can also be executed on company-operated cloud services or decentralized computation networks (e.g. Akash, Ankr, Cudos...).
Identity solutions
Identity management services or identity standards can be used, depending on the needs.
Personal sovereignty in the Dataverse
What matters in the Dataverse design is the ability of any provider to remain sovereign over what they share in the Dataverse.
For example, because Data Space governance is highly customizable, it can be quite centralized (operated by a single company, for example). But the Dataverse is open, trustless, and censorship-resistant, so anyone can revoke access rights to what they have shared in Data Spaces, and even dereference it from the Dataverse completely, without asking anyone for permission.
Security
The purpose of the OKP4 protocol is to allow the deployment of applications dedicated to data sharing within data spaces governed by customizable governance. In this context, which involves several different systems, from different providers, in different runtime environments, security is a major concern.
The means used to guarantee the security of the protocol and its ecosystem are varied and depend on the nature of each component and the way they are used and exposed.
In the following chapters, we describe the main security principles on which the protocol is based.
The responsibility of Service Providers
The Services deployed in the OKP4 ecosystem are responsible for all the data they host, process and distribute. This means that data at rest and in transit must be protected from unauthorized access. Clients must be authenticated and appropriately authorized to access data. Ideally, all interactions must be auditable and uniquely traceable to the originating entity. The system must be robust against intentional attempts to circumvent any of its access controls.
In the protocol, Service Providers are encouraged to make security a primary issue, as Service Providers have value at stake that they can lose if they fail to meet their obligations. We, therefore, consider that the obligation of means in terms of security must be guaranteed by the Service Provider.
Interactions flow
We consider here internal service interactions, i.e. the interactions that take place between the different components of the ecosystem, and more specifically the interactions between a service and the Workflow Service in charge of orchestration.
In this situation, the Workflow Service must have the legitimacy to access the data or be able to provide the authorization to another service (typically an algorithm service) to access the data. The pain point is that, for several services to cooperate, they must share secrets that allow them to access certain resources under certain conditions.
In the framework of the OKP4 protocol, services dedicated to the management of identity-based secrets and encryptions can be used: those are the Secrets Vaults.
The sections below detail the mechanisms of secret sharing used during the data processing with the workflow service.
Store credentials
First, the Data Provider deposits the data on the Data Service Provider and secures it by appropriate means, e.g. with credentials. Then, to give the protocol access to this data, the provider stores the secret in the Secret Vault.
It should be noted that different instances of this service can be deployed in the protocol and operated by different providers, contributing to the decentralization of secret management.
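A hedged sketch of this first step follows; the client interface and its store method are illustrative assumptions, not an actual OKP4 SDK.

```typescript
// Hypothetical sketch: a Data Provider stores the credentials protecting its dataset
// in a Secret Vault so the protocol can later access the data on its behalf.
interface SecretsVaultClient {
  // stores a secret under a key; the key is how workflows will reference it
  store(key: string, secret: Record<string, string>): Promise<void>;
}

async function registerDatasetCredentials(vault: SecretsVaultClient): Promise<void> {
  await vault.store("dataset:example-weather-2023", {
    username: "reader",     // credentials configured on the Data Service Provider
    password: "<redacted>",
  });
}
```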
Retrieve credentials
As soon as a workflow request is submitted to the protocol, the Workflow Service is triggered and retrieves the metadata describing the services to invoke. To operate, the Workflow Service makes a request to retrieve the secrets from the Secret Vault. An authorization step ensures that the Workflow Service is authorized to access the secret, by verifying in the blockchain that a corresponding authorization transaction exists. If authorized, the Secret Vault grants access to the secrets.
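The authorization step can be sketched as follows; hasGrant and read are hypothetical names standing for the on-chain verification and the vault lookup.

```typescript
// Hypothetical sketch of the Secret Vault authorization step: the secret is released
// only if a matching grant transaction exists on-chain for the requester.
interface Chain {
  hasGrant(requester: string, secretKey: string): Promise<boolean>;
}

interface VaultStore {
  read(secretKey: string): Promise<Record<string, string>>;
}

async function retrieveSecret(
  chain: Chain,
  vault: VaultStore,
  requester: string,  // identity of the Workflow Service instance
  secretKey: string,  // key of the secret protecting the resource to access
): Promise<Record<string, string>> {
  const authorized = await chain.hasGrant(requester, secretKey); // verify in the blockchain
  if (!authorized) {
    throw new Error(`${requester} is not authorized to access ${secretKey}`);
  }
  return vault.read(secretKey); // the Secret Vault grants access
}
```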
Use credentials to invoke services
Having the secret, the Workflow Service is now allowed to invoke the service.
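For illustration, the final call could look like the following Node.js-style sketch, assuming the secret holds basic-auth credentials; the endpoint and header scheme are assumptions.

```typescript
// Hypothetical sketch: the Workflow Service uses the retrieved secret to authenticate
// its call to the data service.
async function invokeDataService(
  endpoint: string,
  secret: { username: string; password: string },
): Promise<Response> {
  const token = Buffer.from(`${secret.username}:${secret.password}`).toString("base64");
  return fetch(endpoint, {
    headers: { Authorization: `Basic ${token}` }, // credentials from the Secret Vault
  });
}
```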
Highly-Secured Data Spaces
For very sensitive data where security must be very high, the OKP4 protocol supports the deployment of a private and secure ecosystem for a Data Space.
In this Data Space, all declared resources are private and used only within the Data Space. In particular:
- Services have been audited to ensure security and compliance. These services are deployed and operated on a secure infrastructure and have no exposure to the outside world.
- The data is hosted in situ and never leaves the Data Space.