Data Security for Data Mesh

Introduction

Data Mesh is still very much alive. Despite Gartner saying that Data Mesh is dead, we still see many organisations implementing it in one shape or another. Data Mesh also offers a good opportunity to revisit your data security framework and make it future-proof.

In this article, we describe the impact of the four pillars of Data Mesh on data access management.

Domain-Driven Data Ownership

Under the first and most important principle of Data Mesh, ownership shifts from a central team to the different business domains. Because these teams produce the data in their domain and know its business context best, they’re ideally positioned to take ownership of, and responsibility for, that data. In practice, this means they’re responsible for making the data findable, accessible, interoperable, and reusable, while ensuring that data quality standards are met. Their business context also makes them best positioned to manage access and respond to access requests. By moving this responsibility from the central team, where it traditionally sits, to the domains, data consumers get 73% faster access to data, making them significantly more productive.

However, you cannot federate ownership of all data to the domains. For highly regulated personal data or business-critical IP, the central data governance team still has to keep control over access. This means the data governance team should remain the owner of the access controls, data masking rules, and row filters for that data.
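
To make this split concrete, here is a minimal sketch of how ownership of access decisions could be resolved: a dataset carrying a regulated tag is owned by central governance, and everything else by the producing domain. The tag names `PII` and `CONFIDENTIAL_IP`, the `Dataset` type, and `resolve_access_owner` are hypothetical illustrations, not Raito's API.

```python
from dataclasses import dataclass, field

# Hypothetical tags marking highly regulated personal data or business-critical IP.
CENTRALLY_GOVERNED_TAGS = frozenset({"PII", "CONFIDENTIAL_IP"})


@dataclass(frozen=True)
class Dataset:
    name: str
    domain: str  # business domain that produces the data
    tags: frozenset = field(default_factory=frozenset)


def resolve_access_owner(dataset: Dataset, governance_team: str = "central-governance") -> str:
    """Return the team that owns access decisions for a dataset.

    Domains own access to their own data, except when the dataset carries a
    centrally governed tag, in which case the governance team keeps control.
    """
    if dataset.tags & CENTRALLY_GOVERNED_TAGS:
        return governance_team
    return dataset.domain


orders = Dataset("orders", "sales")
customers = Dataset("customers", "sales", frozenset({"PII"}))
print(resolve_access_owner(orders))     # sales
print(resolve_access_owner(customers))  # central-governance
```

The rule is deliberately tag-driven rather than a hand-maintained list of datasets, so a domain dataset moves under central control as soon as it is tagged as regulated.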

Our customers use Raito to assign ownership of data, access controls, and data security controls (data masking and row filters) to users and user groups.

*Make users and groups owners of data products, data access controls, and data security controls*

Data as a Product

When managing data as a product, data teams apply product thinking to their data products and approach their development the same way software engineers approach software. It also means that data products are versioned and managed as code in your CI/CD pipeline. For access management, this means access is managed as code as well, becoming an integral part of how the data product is defined. This can be implemented with tools such as data contracts, dbt, or Terraform.

However, granting access directly to users in code is discouraged. It hinders security because it makes access hard to adapt to organisational changes, such as employee departures. It also puts the onus of deciding who should have access on the data engineer, who often lacks that knowledge. To keep access controls flexible under organisational change and to achieve segregation of duties, data producers should only grant roles or groups (not individual users) access to data products. It is then up to the data owner to add users to those roles or groups.
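
As a minimal sketch of this rule, assume a data product's access definition is kept in code next to the product, with each principal written as `<kind>:<name>` (a hypothetical convention; `Grant`, `DataProduct`, and `validate_grants` are illustrative names). A check run in the CI/CD pipeline can then reject any grant made directly to a user, leaving group and role membership to the data owner.

```python
from dataclasses import dataclass
from typing import List, Tuple

# Hypothetical principal convention: "<kind>:<name>", e.g. "group:sales-analysts".
ALLOWED_PRINCIPAL_KINDS = {"group", "role"}


@dataclass(frozen=True)
class Grant:
    principal: str
    privilege: str  # e.g. "SELECT"


@dataclass(frozen=True)
class DataProduct:
    name: str
    version: str
    grants: Tuple[Grant, ...]


def validate_grants(product: DataProduct) -> List[str]:
    """Return policy violations; an empty list means the product definition passes.

    Producers may only grant access to groups or roles. Adding individual
    users to those groups or roles is left to the data owner.
    """
    violations = []
    for grant in product.grants:
        kind, _, name = grant.principal.partition(":")
        if kind not in ALLOWED_PRINCIPAL_KINDS or not name:
            violations.append(
                f"{product.name}@{product.version}: {grant.privilege} granted to "
                f"'{grant.principal}'; grant to a group or role instead"
            )
    return violations


orders = DataProduct(
    name="sales.orders",
    version="1.2.0",
    grants=(
        Grant("group:sales-analysts", "SELECT"),
        Grant("user:alice@example.com", "SELECT"),  # direct user grant: rejected
    ),
)
for problem in validate_grants(orders):
    print(problem)  # a CI job would fail the build when any violation is reported
```

Because the check runs on the versioned definition, a direct user grant is caught at review time rather than discovered later in the warehouse.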

Alternatively, data producers can add tags to their data products, after which Raito will dynamically grant access using the data owners’ access policies, or dynamically mask data using data governance’s data masking policies. This is described in more detail under the pillar ‘Federated computational governance’.
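
A minimal sketch of tag-driven grants, assuming the data owner's access policy can be expressed as a simple mapping from tags to the groups allowed to read (the tag and group names are hypothetical, and Raito's own policy model is not shown here). When a producer tags a data product, the grants are derived from the policy instead of being written by hand.

```python
from typing import Dict, Iterable, Set, Tuple

# Hypothetical access policy maintained by the data owner: tag -> groups allowed to read.
OWNER_ACCESS_POLICY: Dict[str, Set[str]] = {
    "finance": {"finance-analysts", "controllers"},
    "marketing": {"marketing-analysts"},
}


def grants_from_tags(product: str, tags: Iterable[str],
                     policy: Dict[str, Set[str]]) -> Set[Tuple[str, str]]:
    """Derive (group, product) read grants from the tags a producer applied.

    Tags without a matching policy entry grant nothing, so an unknown tag
    never opens up access by accident.
    """
    grants = set()
    for tag in tags:
        for group in policy.get(tag, set()):
            grants.add((group, product))
    return grants


print(sorted(grants_from_tags("finance.invoices", ["finance", "unreviewed"], OWNER_ACCESS_POLICY)))
# [('controllers', 'finance.invoices'), ('finance-analysts', 'finance.invoices')]
```

The producer only decides what the data is (its tags); who may read it stays with the data owner's policy, which preserves the segregation of duties described above.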

Our customers use Raito for both managing access as code, and dynamically granting access and masking data using tags. Check out our webinar to learn more.

Self-Service Data Platform

Self-service data platforms democratize access to data. They enable non-technical users to access, process, and get insights from data without support from a central data team. This makes them an important driver for becoming more data-driven, which in many sectors is a prerequisite for remaining competitive. However, without a proper access request workflow, you’ll likely frustrate data consumers who can’t get access in time.

With Raito, data consumers can find and request access to data and data products, after which data owners review the access request. Upon approval, Raito updates the permissions in the underlying cloud data provider. To respect the principle of least privilege, data consumers get just-enough and just-in-time access.

Data owners can use Raito to grant access automatically in two ways:

Pre-approved access: When data owners pre-approve access requests for certain users or groups, all of their access requests are auto-approved, granting them immediate access. Data owners can also configure how long that access lasts before Raito automatically revokes it.
Attribute-Based Access Control (ABAC): Instead of granting access manually, data owners can grant access to data consumers and service accounts dynamically using ABAC, where access is granted automatically based on the attributes of the data and of the data consumer.
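
The two mechanisms can be sketched together as follows. This is an illustration under stated assumptions, not Raito's implementation: pre-approval is modelled as a hypothetical mapping from a data owner's pre-approved groups to an access duration, and ABAC as a single rule comparing a hypothetical `department` attribute of the consumer with a `domain` attribute of the data.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta, timezone
from typing import Optional


@dataclass
class Consumer:
    name: str
    groups: frozenset
    attributes: dict  # hypothetical attributes, e.g. {"department": "sales"}


@dataclass
class DataAsset:
    name: str
    attributes: dict  # hypothetical attributes, e.g. {"domain": "sales"}


@dataclass(frozen=True)
class AccessGrant:
    consumer: str
    asset: str
    expires_at: Optional[datetime]  # None: no fixed expiry, access follows the ABAC rule


# Hypothetical pre-approval set by the data owner: group -> how long access lasts.
PRE_APPROVED = {"sales-analysts": timedelta(days=30)}


def abac_allows(consumer: Consumer, asset: DataAsset) -> bool:
    """Example ABAC rule: consumers may read data from their own department's domain."""
    department = consumer.attributes.get("department")
    return department is not None and department == asset.attributes.get("domain")


def handle_request(consumer: Consumer, asset: DataAsset, now: datetime) -> Optional[AccessGrant]:
    """Auto-approve via pre-approval or ABAC; None means a data owner must review the request."""
    durations = [PRE_APPROVED[g] for g in consumer.groups if g in PRE_APPROVED]
    if durations:
        # Just-in-time access: revoked once the longest configured duration has passed.
        return AccessGrant(consumer.name, asset.name, now + max(durations))
    if abac_allows(consumer, asset):
        return AccessGrant(consumer.name, asset.name, None)
    return None


def is_active(grant: AccessGrant, now: datetime) -> bool:
    """Time-bound grants lapse at expiry; re-evaluating ABAC grants is not shown here."""
    return grant.expires_at is None or now < grant.expires_at


now = datetime.now(timezone.utc)
orders = DataAsset("sales.orders", {"domain": "sales"})
alice = Consumer("alice", frozenset({"sales-analysts"}), {"department": "marketing"})
bob = Consumer("bob", frozenset(), {"department": "sales"})
print(handle_request(alice, orders, now))  # pre-approved grant that expires after 30 days
print(handle_request(bob, orders, now))    # ABAC grant without a fixed expiry
```

Requests that match neither mechanism return `None` and fall back to the manual approval flow, so automation only shortcuts the cases the data owner has explicitly configured.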

You can also ingest access requests from other platforms such as:

Data Marketplace: Collibra, Atlan, Zeenea
Ticketing Systems: ServiceNow, Jira
Communication Services: Slack

Later this year, we will release the data product marketplace, where data consumers can find curated data products, understand what they consist of, and request access to the underlying data, whether it’s structured, unstructured, or streaming.

Federated computational governance

As you federate ownership to the domain teams, it is important to provide sufficient guardrails that keep the organisation's data private and secure without routing every decision through the central data governance team, which risks creating bottlenecks. Raito lets the data governance team dynamically mask columns or filter rows using policies that are computationally enforced based on data and user attributes, without manual intervention.

This way, the data governance team can write a policy stating that Payment Card Information (PCI) must always be masked. Raito detects all columns tagged as PCI and masks them automatically, without the data governance team having to mask each column by hand. These tags can be ingested from dbt, Terraform, or data contracts, but can also come from your cloud data provider or data catalog.
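
A minimal sketch of a tag-driven masking policy with an optional row filter, assuming column tags have already been ingested and rows are plain Python dicts. In a platform such as Databricks the equivalent would be enforced inside the data platform itself (for example through column masks and row filters), not in application code; `MASKED_TAG`, `mask_value`, and `apply_policies` are hypothetical names.

```python
from typing import Callable, Dict, List, Set

Row = Dict[str, object]

# Hypothetical governance policy: every column carrying this tag is masked for everyone.
MASKED_TAG = "PCI"


def mask_value(value: object) -> str:
    """Redact all but the last four characters; short values are fully redacted."""
    text = str(value)
    if len(text) <= 4:
        return "*" * len(text)
    return "*" * (len(text) - 4) + text[-4:]


def apply_policies(rows: List[Row],
                   column_tags: Dict[str, Set[str]],
                   row_filter: Callable[[Row], bool] = lambda row: True) -> List[Row]:
    """Apply the row filter, then mask every column whose tags include MASKED_TAG.

    The policy never names columns: a newly tagged column is masked automatically.
    """
    masked_columns = {col for col, tags in column_tags.items() if MASKED_TAG in tags}
    return [
        {col: mask_value(val) if col in masked_columns else val for col, val in row.items()}
        for row in rows
        if row_filter(row)
    ]


def eu_only(row: Row) -> bool:
    """Hypothetical row filter, e.g. derived from the user's region attribute."""
    return row["region"] == "EU"


rows = [
    {"customer": "Ann", "card_number": "4111111111111111", "region": "EU"},
    {"customer": "Bo", "card_number": "5500000000000004", "region": "US"},
]
tags = {"card_number": {"PCI"}}  # e.g. ingested from dbt, Terraform, or a data catalog
print(apply_policies(rows, tags, eu_only))
# [{'customer': 'Ann', 'card_number': '************1111', 'region': 'EU'}]
```

The governance team maintains one rule keyed on a tag; domain teams only need to tag their columns correctly for the rule to apply.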

*Dynamically mask all columns tagged as PCI in Databricks*

Don't hesitate to reach out!

Bart