Data contracts are the talk of the town. Numerous webinars and blogs have been dedicated to the topic, but its definition is still murky, partly because the discussion is largely semantic and partly because the concept is fairly new. A more in-depth thesis about data contracts can be found here. Vendors are moving into the space from multiple angles, each claiming to be the ultimate data contract solution. When the smoke clears, the industry will come to a consensus on the definition of data contracts, at the risk of that definition becoming solution-based.
The smoke will clear to reveal the true definition of data contracts
Photo by Filip Bunkens on Unsplash
Let us therefore be very clear. Raito is not a data contract solution. A data contract is an up-front agreement between a data producer and a data consumer, governing the format of the dataset created by the data producer. Depending on the tool, a user can merely monitor compliance with the data contract, or even enforce it. Raito offers a data access management solution, governing data usage by the data consumer. In this capacity, Raito combines data access management and usage analytics with basic data discoverability. In short, Raito governs the consumer's interactions with the data objects subject to the data contract, in contrast to data contract tools, which govern the behavior of data producers.
Raito and data contract tools
Image by author
Data access management and data contracts
Still, you may feel that data access management is closely related to data contracts. A data contract might have multiple endpoints that a data consumer can use to access the data covered by the contract. An endpoint can expose masked or unmasked data, a subset of the data or the whole dataset, each serving a different purpose.
The concept of endpoints implies that access policies are not in scope of a data contract, even though an access policy can affect or even define how data is consumed through an endpoint, for example through column masking or row filtering. In fact, global access policies overlay these endpoints orthogonally and might conflict with the locally defined endpoints.
Global access policies overlay locally defined endpoints orthogonally
Image by author
Where endpoints are defined locally at the technology level as a handover point between the data producer and consumer, access policies can be defined globally at the logical level, where they can affect the behavior of multiple endpoints at once. For instance, if you have an access policy that says that employees can only access data of customers within their geographical region, that policy will affect any endpoint exposing customer data, whether it is an API, a data warehouse or a BI report.
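To make the regional example concrete, here is a minimal sketch in Python of one globally defined policy overlaying three locally defined endpoints. Every name in it (User, CUSTOMERS, region_policy, the three endpoint functions and read) is hypothetical and chosen for illustration only; it does not show how Raito or any data contract tool implements this.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List

Row = Dict[str, str]


@dataclass(frozen=True)
class User:
    name: str
    region: str


# Hypothetical customer data covered by a data contract.
CUSTOMERS: List[Row] = [
    {"customer": "Acme", "region": "EU"},
    {"customer": "Globex", "region": "US"},
    {"customer": "Initech", "region": "EU"},
]


# Endpoints are defined locally by the data producer, one per technology.
def api_endpoint() -> List[Row]:
    return [dict(row) for row in CUSTOMERS]


def warehouse_endpoint() -> List[Row]:
    return sorted((dict(row) for row in CUSTOMERS), key=lambda r: r["customer"])


def bi_report_endpoint() -> List[Row]:
    return [{"customer": r["customer"], "region": r["region"]} for r in CUSTOMERS]


ENDPOINTS: Dict[str, Callable[[], List[Row]]] = {
    "api": api_endpoint,
    "warehouse": warehouse_endpoint,
    "bi_report": bi_report_endpoint,
}


# The access policy is defined once, globally, at the logical level.
def region_policy(user: User, rows: List[Row]) -> List[Row]:
    """Employees only see customers within their own geographical region."""
    return [row for row in rows if row["region"] == user.region]


def read(user: User, endpoint_name: str) -> List[Row]:
    """Every endpoint passes through the same global policy."""
    rows = ENDPOINTS[endpoint_name]()
    return region_policy(user, rows)


if __name__ == "__main__":
    alice = User(name="alice", region="EU")
    for name in ENDPOINTS:
        print(name, [row["customer"] for row in read(alice, name)])
```

The point of the sketch is the separation: the endpoints know nothing about regions, and the policy knows nothing about the technology behind each endpoint, yet changing the policy changes what all three expose.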
The data owner also owns the data access controls
If access policies are not in scope of the data contract, who owns data access management? A data contract is an agreement between a data producer and a data consumer. In modern data management, the concept of a business data steward or a business data owner is becoming standard practice. Additionally, within data mesh, data ownership is being pushed to the data producer. Hence, when a data contract is an agreement between the data producer and the data consumer, the data producer becomes the data owner, who is responsible for managing access to the data.
Applied to data contracts, this makes a lot of sense: the party that provides the goods of the contract should hand out the key as well. A data producer should be enabled to decide who can access which version of their data. Even when their endpoints are influenced by global policies, the data owner should be able to determine which users can interact with their data and in which manner. More on this here.
A data owner remains in control and can offer unmasked access to authorized users, even though regular users obtain masked access by default
Image by author
The ways in which global policies defined by the central data governance team interact with a data producer’s endpoints can become quite complex. For example, you could have a central policy that requires customer data to always be masked for employees from other geographic regions, independent of the endpoint used to access that data. Still, the data producer should be able to provide access to all the unmasked data for valid purposes. Understanding the impact of this interplay between policies and their exemptions, and resolving it, can become a very time-consuming activity that affects your time to market.
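The interplay between a central masking policy and an owner-granted exemption can be sketched as follows. This is an illustrative Python model under simple assumptions: a single masked column, exemptions keyed by user name, and only the data owner allowed to grant them. The names Dataset, grant_unmasked and read are hypothetical and do not describe Raito's actual behavior.

```python
from dataclasses import dataclass, field
from typing import Dict, List

MASK = "***"


@dataclass(frozen=True)
class User:
    name: str
    region: str


@dataclass
class Dataset:
    owner: str
    rows: List[Dict[str, str]]
    # Exemptions granted by the data owner: user name -> stated purpose.
    exemptions: Dict[str, str] = field(default_factory=dict)


def grant_unmasked(dataset: Dataset, granted_by: str, user_name: str, purpose: str) -> None:
    """Only the data owner may exempt a user from the central masking policy."""
    if granted_by != dataset.owner:
        raise PermissionError(f"{granted_by} does not own this dataset")
    dataset.exemptions[user_name] = purpose


def read(dataset: Dataset, user: User) -> List[Dict[str, str]]:
    """Central policy: mask customers from other regions, unless the user is exempt."""
    exempt = user.name in dataset.exemptions
    result = []
    for row in dataset.rows:
        if exempt or row["region"] == user.region:
            result.append(dict(row))
        else:
            result.append({**row, "customer": MASK})
    return result


if __name__ == "__main__":
    data = Dataset(
        owner="producer",
        rows=[
            {"customer": "Acme", "region": "EU"},
            {"customer": "Globex", "region": "US"},
        ],
    )
    bob = User(name="bob", region="US")
    print(read(data, bob))  # masked by default for the EU customer
    grant_unmasked(data, granted_by="producer", user_name="bob", purpose="fraud audit")
    print(read(data, bob))  # unmasked after the owner's exemption
```

Even in this toy version, the result of a read depends on both the central rule and the local exemption list, which is why reasoning about many such policies at once quickly becomes time-consuming.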
Amongst the many other benefits, it is this complexity that Raito resolves. If you want to know more, reach out to us for a free trial!