Data Spaces
Data is the limiting factor in the creation of knowledge. Data sharing therefore helps to find answers to societal, policy, and legal questions that are in the public interest.
Organizations will not be able to maintain their competitiveness by innovating alone. It is necessary to think about models that allow collaboration and innovation that respond to the collective and individual interests. These new models must be intimately linked to trust between actors and data sovereignty.
Data sovereignty, which guarantees the transparency and sustainability of data sharing ecosystems, refers to the ability of a natural or legal person to determine and enforce the rights to use their data.
OKP4 enables anyone (organizations, individuals, public administrations, NGOs, or communities) to build and join any number of highly customizable Data Spaces.
What is a Data Space?
First, what is not a Data Space? It is not a platform, a hub, a data lake, or a data warehouse.
In Designing Data Spaces, Lars Nagel and Douwe Lycklama define six mandatory principles to be introduced in all data space implementations:
- Decentralization
- Scalability
- Collaboration
- Interoperability
- Evaluation (audit)
- Trust
In 2021, four key European organizations formed an initiative, the Data Space Business Alliance (DSBA), to create a common framework that accelerates business transformation in the data economy and makes data spaces happen.
According to the International Data Spaces Association, data spaces “comprise relationships between trusted partners that are governed by the IDSA standard for secure and sovereign data exchange, certification and governance for business and industry across Europe and around the world”.
The GAIA-X Association defines a data space as “a type of data relationship between trusted partners who adhere to the same high level standards and guidelines in relation to data storage and sharing within one or many Vertical Ecosystems”.
From FIWARE's point of view, a data space can be defined as a decentralized data ecosystem built around commonly agreed building blocks that enable effective and trusted sharing of data among participants.
From the perspective of the BDVA (Big Data Value Association), five pillars (data, governance, people, organization, and technology) are needed to create value from data, with trust as a central concept, together with the tools and mechanisms that let strategic stakeholders jointly create data sharing spaces.
These definitions show that Data Spaces are collaborative commercial environments in which various actors share and/or exchange data to reach common goals, based on fundamental, non-negotiable principles: trust, security, interoperability, and data sovereignty.
In Europe, data spaces are governed by the European Data Governance Act, adopted in May 2022, and derive from the European Data Strategy.
A Data Space with the OKP4 protocol
Definition of digital commons: "A common is a resource produced and/or maintained collectively by a community of heterogeneous actors, and governed by rules that ensure its collective and shared character. It is said to be digital when the resource is dematerialized: software, database, digital content (text, image, video and/or sound), etc." according to Labo Société Numérique.
Definition of OKP4 Data Spaces: A Data Space is a digital common that makes data and services (algorithms, computing resources, etc.) available to the community, based on shared governance, for the creation of knowledge. The governance guarantees the individual and collective interests of the participants, who retain sovereignty over their resources. Trust is provided by the decentralized infrastructure, which orchestrates the interactions between different resources.
Digital resources can be shared without being exchanged. In an OKP4 Data Space, datasets and algorithms are not stored centrally but at the source, and are therefore only shared (via semantic interoperability) when necessary. Several Data Spaces can have data and algorithms in common, and Data Spaces can be nested and overlapping. The result is an endless combination of often interoperable Data Spaces in which participants and digital resources can interact freely and create value.
In short, a data space is a digital common, totally decentralized, that enables the knowledge economy. All the rules are defined by the participants and the community. These Data Spaces are the source of new value chains in the knowledge economy by enabling trust-minimized data sharing, offering possibilities way beyond what exists today with transactional data marketplaces.
The OKP4 protocol acts as a clearing house that provides clearing and settlement services for all financial and digital resource sharing transactions. Based on the business model chosen by the participants, the remuneration of the participants is then automated by the decentralized protocol.
A Data Space consists of three elements:
- A bundle of rights and rules that regulate interactions between data and services shared by participants
- Governance mechanisms to make these rules evolve
- Interfaces to create, modify and interact with the Data Space
New knowledge created from different shared resources within a data space can then feed any application. These applications are not part of the data space and have their own governance. They can be either Web 2.0 or Web 3.0 applications.
1. A bundle of rights and rules
A Data Space is highly customizable in every aspect of data sharing: access control, data and service management, business models, governance frameworks, and so on.
The governance rules of a Data Space describe a set of rules, standards, and tools shared with Data Space participants, based on 5 pillars: ethics, legal, data management, technical requirements, and business models.
The rules can cover data management, remuneration, business models, or technical requirements, for instance. All rules are fully customizable and, in particular, can be amended, provided that specific provisions have been made for amendments. An amendment is an addition to or a modification of the governance rules that is compatible with the rules already defined (consistency).
Examples of rules
- Data management
  - All metadata of datasets and services must be described
  - Restrictions on the license of the dataset
  - Only one type of actor has access to the raw data
  - No member of the Data Space has access to raw data, only to aggregated and anonymized data
- Technical requirements
  - Minimum x GB of available disk space
  - Minimum upstream bandwidth of x Mbps
  - Minimum download bandwidth of x Mbps
  - Minimum of x TB of available bandwidth a month
  - Availability x nines
  - Security
- Business models
  - Remuneration of contributors according to the template XXX: open data is not remunerated, and other datasets are remunerated according to service consumption
  - Remuneration of contributors according to the template “Data Marketplace”: each provider sets the price of its data or services
- ...
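As a minimal sketch of how technical requirements like those above could be expressed as data and checked, the Python below compares a provider's declared capacities against a set of thresholds. The class names, field names, and the `violations` helper are hypothetical and chosen for illustration only; they are not part of the OKP4 protocol or SDK.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class TechnicalRequirements:
    """Hypothetical thresholds a Data Space could impose on a provider."""
    min_disk_gb: float
    min_upload_mbps: float
    min_download_mbps: float
    min_availability: float  # e.g. 0.999 for "three nines"


@dataclass(frozen=True)
class ProviderOffer:
    """What a provider declares about its infrastructure (illustrative fields)."""
    disk_gb: float
    upload_mbps: float
    download_mbps: float
    availability: float


def violations(req: TechnicalRequirements, offer: ProviderOffer) -> list[str]:
    """Return the names of the requirements the offer fails to meet."""
    checks = {
        "disk": offer.disk_gb >= req.min_disk_gb,
        "upload": offer.upload_mbps >= req.min_upload_mbps,
        "download": offer.download_mbps >= req.min_download_mbps,
        "availability": offer.availability >= req.min_availability,
    }
    return [name for name, ok in checks.items() if not ok]


req = TechnicalRequirements(500, 100, 100, 0.999)
offer = ProviderOffer(disk_gb=750, upload_mbps=50, download_mbps=200, availability=0.9995)
print(violations(req, offer))  # ['upload']
```

Returning the list of failed requirements, rather than a single boolean, lets a participant see exactly which rule blocks them from joining.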
2. Governance mechanisms to change rules
Each Data Space can have its own governance mechanism to change the rules.
Centralized Data Space Governance
When a Data Space is created, the creator can decide to retain sole authority to change the rules through a single-signature (monosig) mechanism. This gives the creator maximum control over the rules. Participants can still make proposals and suggestions, but they cannot vote.
This mechanism can be useful for early communities or for companies who want to keep maximum control.
In this case, the Data Space governance rights holder makes sure that participants remain aligned with the Data Space rules and still find it in their own interest to participate. The decentralized infrastructure enables participants to stay sovereign over everything they share: they can withdraw their consent at any moment, without asking permission from the governance rights holder. A period of time can also be defined between the adoption of a rule change and its effective date, enabling participants to anticipate changes.
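The notice period between a rule change and its effective date can be sketched as follows. This is an illustration under assumptions: the `PendingRuleChange` class and its fields are hypothetical, and time is represented as a plain integer (for example a block height or a Unix timestamp).

```python
from dataclasses import dataclass


@dataclass
class PendingRuleChange:
    """A rule change announced at `announced_at` that takes effect after `delay`."""
    description: str
    announced_at: int  # e.g. a block height or Unix timestamp
    delay: int         # notice period, in the same unit

    def effective_at(self) -> int:
        return self.announced_at + self.delay

    def is_effective(self, now: int) -> bool:
        return now >= self.effective_at()


change = PendingRuleChange("raise minimum availability", announced_at=1_000, delay=500)
print(change.is_effective(1_200))  # False: still inside the notice period
print(change.is_effective(1_500))  # True
```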
Permissioned Data Space Governance
In this case, the Data Space governance rights are shared by multiple participants using a multisig mechanism. This makes it possible for two or more users to change rules as a group in an m-of-n design. For instance, a 2-of-2 scheme requires both governance rights holders to agree on the same proposal. A 2-of-3 scheme requires 2 of the 3 governance rights holders to agree. Schemes such as 7-of-10 are also possible in consortium contexts.
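The m-of-n approval logic described above can be sketched in a few lines. This is a simplified illustration that counts approvals by name; a real multisig verifies cryptographic signatures, which is omitted here. The function name and the holder names are hypothetical.

```python
def is_approved(approvals: set[str], holders: set[str], threshold: int) -> bool:
    """Return True when at least `threshold` distinct rights holders approve.

    Approvals from accounts that are not governance rights holders are ignored.
    """
    if not 1 <= threshold <= len(holders):
        raise ValueError("threshold must be between 1 and the number of holders")
    return len(approvals & holders) >= threshold


holders = {"alice", "bob", "carol"}
print(is_approved({"alice", "carol"}, holders, threshold=2))         # True  (2-of-3)
print(is_approved({"alice", "mallory"}, holders, threshold=2))       # False (mallory is not a holder)
print(is_approved({"alice", "bob"}, {"alice", "bob"}, threshold=2))  # True  (2-of-2)
```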
Open Data Space Governance
Governance is open when governance rights can be earned or acquired through a publicly traded token. The more stake a participant holds in the Data Space, the more influence they have over how the rules change.
Tokens can also be delegated to trusted leaders if enabled.
Each Data Space can have its own native token, which can take a fungible or non-fungible format.
A Data Space can also reuse an existing token for governance, whether the OKP4 $KNOW token or another token in the Cosmos ecosystem or beyond.
This brings the ability to have community-governed Data Space with dynamic mechanisms, ideal for decentralized communities.
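Token-weighted voting with delegation can be illustrated with the sketch below. It rests on several assumptions that are not specified in this document: delegation is one level deep (not transitive), a proposal passes with a simple majority of the voting power cast, and the function names are hypothetical.

```python
from collections import defaultdict


def voting_power(stakes: dict[str, int], delegations: dict[str, str]) -> dict[str, int]:
    """Compute each voter's power after one level of delegation.

    `delegations` maps a delegator to the account it delegates to.
    """
    power: dict[str, int] = defaultdict(int)
    for holder, stake in stakes.items():
        power[delegations.get(holder, holder)] += stake
    return dict(power)


def tally(votes: dict[str, bool], power: dict[str, int]) -> bool:
    """A proposal passes when 'yes' power strictly exceeds half of the power cast."""
    yes = sum(power.get(voter, 0) for voter, choice in votes.items() if choice)
    cast = sum(power.get(voter, 0) for voter in votes)
    return cast > 0 and 2 * yes > cast


stakes = {"alice": 40, "bob": 35, "carol": 25}
power = voting_power(stakes, delegations={"carol": "alice"})
print(power)                                         # {'alice': 65, 'bob': 35}
print(tally({"alice": True, "bob": False}, power))   # True
```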
3. Interfaces to create, modify and interact with the Data Space
Anyone can create and design a Data Space with as much freedom as possible. OKP4 provides templates and no-code tools to set the governance rules. The OKP4 SDK is designed to give as much freedom as possible to developers, data scientists, and communities at large. Freedom of development, freedom of business models, freedom of standards... freedom of innovation.
To this end, an interface has been designed to make it easy to interact with the OKP4 protocol and, more specifically, to create a data space. The interface gives access to datasets and services and allows the creation of a new data space. During the creation process, besides entering the name, description, and tags, one can either use an existing governance template or define a custom governance, with or without an existing token (e.g., KNOW).
The governance rules of a data space can be built using a public template, one's own library of data spaces, or custom blocks. For example, it is possible to define the access control rules for data spaces, datasets, and services, the token used, or the token amount. Data management can also be defined according to the desired format, size, or sectors. Other elements such as service management, business model, and governance can be set up as well. Once created, the data space is available on the OKP4 Explorer, where a search tool can be used to look for datasets, services, and data spaces.
Once the Data Space is created through the interface, its governance rules are encoded in a Domain Specific Language that allows the rules to be expressed as formal code which
- is stored on-chain and is freely auditable by anyone,
- is interpretable by a smart contract in a fully decentralized and autonomous way.
Thus, for any transaction sent to the OKP4 blockchain, this smart contract ensures that the transaction complies with the rights established by the Data Space governance rules. If not, the transaction is rejected.
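The accept-or-reject check described above can be sketched as a list of predicates that must all permit a transaction. This is an illustration only: it uses plain Python functions in place of the on-chain Domain Specific Language and smart contract, and the `Transaction` fields, action names, and rule constructors are hypothetical.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass(frozen=True)
class Transaction:
    sender: str
    action: str    # e.g. "read_raw" or "read_aggregated" (illustrative values)
    resource: str


# A rule returns True when it permits the transaction.
Rule = Callable[[Transaction], bool]


def only_members(members: set[str]) -> Rule:
    """Permit only transactions sent by members of the Data Space."""
    return lambda tx: tx.sender in members


def no_raw_access() -> Rule:
    """Forbid access to raw data; aggregated access stays allowed."""
    return lambda tx: tx.action != "read_raw"


def validate(tx: Transaction, rules: list[Rule]) -> bool:
    """Accept the transaction only if every governance rule permits it."""
    return all(rule(tx) for rule in rules)


rules = [only_members({"alice", "bob"}), no_raw_access()]
print(validate(Transaction("alice", "read_aggregated", "dataset-1"), rules))  # True
print(validate(Transaction("alice", "read_raw", "dataset-1"), rules))         # False
print(validate(Transaction("eve", "read_aggregated", "dataset-1"), rules))    # False
```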
Data Space Business Models
How to price data, services & knowledge?
There is no unique way to price data, services & knowledge. Today, the vast majority of transactions around data happen at a fixed price posted upfront or through long negotiations. These approaches work for now but are very inefficient.
Other approaches for dynamic data pricing such as the one enabled by Ocean Protocol provide alternatives that still rely on pricing data itself.
These pricing models are usually suboptimal, benefiting a few large players who have an overview of the data market. They also benefit larger players, who can derive more value from data than smaller players thanks to economies of scale (more data) and economies of scope (more variety of data). Finally, negotiations caused by the non-fungibility of data are very inefficient and create a lot of friction, which prevents the data market from scaling.
We envision a world where data (proprietary or not) is readily available for applications, where knowledge can flow seamlessly from data provider to applications, providing value to the knowledge consumer without exposing the data. Given the issues mentioned above and the ambition of OKP4, it's fair to say that we need new mechanisms for the data economy.
At OKP4, we propose generic templates, for both the data sharing and marketplace models, which allow the implementation of different business model templates based on user requirements.
Introducing: business model templates
Different data sharing business models can be implemented using generic templates proposed by OKP4. These generic templates are fully customizable according to users' needs, for example: the depth of the (non-)operating workflows of services considered when rewarding data providers, the importance of datasets for the creation of new data, the pricing algorithms of the data marketplaces, etc.
Service-based pricing template
A general and sensible approach to data pricing can be based on the cost of executing the workflow of services that operate on shared data to create the desired knowledge. The value generated from the knowledge can then be shared among all the providers (data, algorithms, etc.) involved in the processing.
The generic template for service-based pricing considers that the value associated with newly created knowledge is a consequence of the different workflows of services which operate, or have operated, on shared data. Therefore, the value of the created knowledge is a function of the cost of executing these workflows. The rewards of the providers are based on the importance (weights) assigned to their shared datasets in these workflows.
This template has the following parameters available to be used:
- An offset (ξ > 0), which allows the knowledge value (i.e., the value of the created dataset) to be computed as a function of the cost of the associated workflow of services operating on shared datasets
- Weights (αin >= 0), which characterize the importance of each shared dataset in the workflows of services linked to the created dataset
- A last rewarded rank (N >= 0), which allows rewards for data providers to be computed, across the different workflows, up to this rank
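To make the three parameters concrete, here is a sketch under loudly stated assumptions, since this document does not give the template's formulas. It assumes that the knowledge value is the workflow cost plus the offset, that rank 0 designates datasets used directly by the workflow and higher ranks designate datasets further upstream, and that the value is split pro rata to the weights of the eligible datasets. All names are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Contribution:
    provider: str
    weight: float  # importance of the shared dataset (weight >= 0)
    rank: int      # assumed: 0 = used directly, 1 = one workflow upstream, ...


def knowledge_value(workflow_cost: float, offset: float) -> float:
    """Assumed form: value = cost + offset, with offset > 0."""
    if offset <= 0:
        raise ValueError("offset must be positive")
    return workflow_cost + offset


def rewards(value: float, contributions: list[Contribution], last_rank: int) -> dict[str, float]:
    """Split `value` among contributions with rank <= last_rank, pro rata to weights."""
    eligible = [c for c in contributions if c.rank <= last_rank]
    total = sum(c.weight for c in eligible)
    if total == 0:
        return {}
    out: dict[str, float] = {}
    for c in eligible:
        out[c.provider] = out.get(c.provider, 0.0) + value * c.weight / total
    return out


contribs = [
    Contribution("alice", weight=3.0, rank=0),
    Contribution("bob", weight=1.0, rank=0),
    Contribution("carol", weight=2.0, rank=1),
]
value = knowledge_value(workflow_cost=10.0, offset=2.0)
print(rewards(value, contribs, last_rank=0))  # {'alice': 9.0, 'bob': 3.0}
```

Raising `last_rank` to 1 would also reward carol, whose dataset sits one workflow upstream in this example.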
Data-based pricing template
Another approach to pricing data is to consider datasets and services as atomic units with their own prices and, possibly, their own dynamics at the atomic level. No weights are involved, and the price of each unit is independent of the workflow it belongs to. These prices are added up to define the price of a workflow. While service pricing is fixed and defined by the providers (on a time basis or a per-use basis), dataset pricing can be fixed or dynamic, as the template allows users to implement different dataset pricing algorithms. The algorithms proposed in the template include a fixed price, first-price and second-price sealed-bid auctions, and a price defined as a function of time and number of purchases.
This template has the following parameters:
- Auction time-interval, which sets the start and end of the auction bidding process
- Dataset reserve price, which sets a price threshold below which a dataset will not be sold
- Weight parameters (ψ and ɸ > 1), which allow the dataset price to be defined as a function of time and number of purchases
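The sealed-bid auctions with a reserve price mentioned above can be sketched as follows. This is a textbook first-price and second-price auction, not the template's actual implementation. The bidding time interval is not modeled, and the price function based on time and number of purchases is left out because its formula is not given in this document. The function name and tie-breaking rule are assumptions.

```python
from typing import Optional


def sealed_bid_auction(
    bids: dict[str, float], reserve_price: float, second_price: bool = True
) -> Optional[tuple[str, float]]:
    """Return (winner, price paid), or None if no bid reaches the reserve price.

    First-price: the winner pays their own bid.
    Second-price: the winner pays the highest other qualifying bid,
    or the reserve price if there is no other qualifying bid.
    Ties are broken in favour of the earliest entry in `bids`.
    """
    qualifying = {bidder: amount for bidder, amount in bids.items() if amount >= reserve_price}
    if not qualifying:
        return None
    ranked = sorted(qualifying.items(), key=lambda item: item[1], reverse=True)
    winner, top = ranked[0]
    if not second_price:
        return winner, top
    price = ranked[1][1] if len(ranked) > 1 else reserve_price
    return winner, price


bids = {"alice": 120.0, "bob": 90.0, "carol": 40.0}
print(sealed_bid_auction(bids, reserve_price=50.0))                      # ('alice', 90.0)
print(sealed_bid_auction(bids, reserve_price=50.0, second_price=False))  # ('alice', 120.0)
print(sealed_bid_auction(bids, reserve_price=100.0))                     # ('alice', 100.0)
print(sealed_bid_auction(bids, reserve_price=200.0))                     # None
```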