Imagine data engineers as modern-day Indiana Jones, navigating vast data troves to unearth hidden insights and fuel innovation. But unlike Jones, their adventures come with a hefty responsibility: guarding access to sensitive information.
Traditionally, data access was a static affair, dictated by decrees from on high and implemented through rigid "infrastructure as code" (IaC) scripts. However, this approach clashes with the dynamic and collaborative nature of modern data stacks. Think sprawling data lakes, real-time analytics, and diverse teams vying for crucial information. In this fast-paced landscape, inflexible controls become bottlenecks, hindering agility and innovation.
Enter the shift-left paradigm: by embedding data security management (grants, masks, filters, etc.) earlier in the data lifecycle, we empower engineers to define dynamic access policies. This doesn't mean abandoning IaC scripts – they remain crucial for efficient development. Instead, we integrate them with modern access management tools, unlocking access-on-request, time-bound, or purpose-based access. Teams gain agility while upholding security best practices like the principle of least privilege (PoLP).
This blog post delves into this exciting shift, exploring how it can transform data access management from a roadblock into a catalyst for collaboration and innovation. Get ready to unleash the Indiana Jones in your data engineering team, all while keeping the data vault secure.
Imagine a world where data access controls are baked into your code base alongside your data pipelines and transformations, using IaC or transformation workflow systems such as dbt. This proactive approach, known as "shift-left," allows you to define secure access policies at the beginning of the data lifecycle, not as an afterthought. Why is this crucial? Because policies that live next to the code are versioned, reviewed, and deployed together with the pipelines they protect, and the principle of least privilege is applied from day one.
But static rules have their limits. Shift-left isn't just about defining access grants in code: modern data needs are dynamic. Teams have evolving needs, data lakes constantly expand, and temporary access requests arise. Hard-coding the "who" in access control scripts removes the flexibility needed to meet these dynamic access requirements.
Here's where Raito comes in, with a hybrid approach:
We distinguish between grants, filters, and masks. To ensure data protection, general column- and row-level security should be defined completely statically. Think of default rules that mask PII data, or filters that only allow access to records from a team's own region. This provides the baseline data protection; exceptions can still be introduced at a later stage. The sketch below illustrates what such defaults could look like.
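As a minimal, illustrative sketch (not a definitive implementation), such static defaults could be expressed with Raito's Terraform provider roughly as follows. The raito_mask resource follows the same schema as the full example later in this post, and data.raito_datasource.research_ds refers to a Raito data source defined as in that example; the raito_filter resource, its attributes, and the table and column names are assumptions made for illustration, so check the provider documentation for the exact schema.

# Illustrative sketch of static, default protections.
# raito_mask mirrors the schema used in the full example later in this post;
# the raito_filter resource name, its attributes, and the object names are assumptions.

resource "raito_mask" "pii_default_mask" {
  name        = "Default PII mask"
  description = "Mask personally identifiable information by default"
  data_source = data.raito_datasource.research_ds.id
  what_data_objects = [
    {
      fullname = "RESEARCH.RECYCLING.CUSTOMER_DATA.EMAIL" # hypothetical PII column
    }
  ]
  type = "SHA256"
}

resource "raito_filter" "region_row_filter" { # assumed resource name
  name          = "Region-based row filter"
  description   = "Only expose records belonging to the team's own region"
  data_source   = data.raito_datasource.research_ds.id
  table         = "RESEARCH.RECYCLING.MATERIAL_DATA" # assumed attribute
  filter_policy = "region = 'EMEA'"                  # assumed attribute; expression is illustrative
}

Because these defaults live in the repository, they are versioned and reviewed like any other infrastructure change, and targeted exceptions can be layered on top of them later.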
Grants, on the other hand, should be dynamic in order to stay agile and meet the needs of modern data teams. By adopting a "what, not who" approach, data engineers keep full control over data access at the table, schema, and database level, without micromanaging individual users or groups. Because the "who" is dynamic, features such as access-on-request, time-bound access, and purpose-based access become possible. The full Terraform example later in this post shows a read grant that defines only the "what".
Our SaaS solution, Raito.io, lets you treat grants, filters, and masks as managed infrastructure resources and manage them as configuration.
This hybrid approach combines the benefits of shift-left with the flexibility needed in modern data stacks. Data engineers gain the autonomy to manage access securely, while other teams enjoy frictionless access when needed. It's a win-win for security, agility, and collaboration.
On top of that, dynamic features such as time-bound access can be introduced, where a data consumer gets access for a limited amount of time only. There is no need for the data owner or data engineer to revoke access afterwards; it's all automated. Or think of purpose-based access, where multiple grants for multi-cloud data products or projects are easily combined.
In one of my previous jobs, I had the important responsibility of improving the efficiency and accuracy of material sampling for a recycling company. The manual process in place at that time was time-consuming, error-prone, and lacked the consistency needed for optimal results. Our collaboration focused on developing a robust machine learning (ML) model to automate material sampling, which would streamline our operations and foster innovation. However, addressing the legal and ethical concerns associated with sharing confidential data presented a significant challenge.
A few years ago, granting access to different teams responsible for different parts of the ML solution while ensuring the protection of critical company data was difficult. Access was managed either directly in the data source or through scattered resource files across various projects.
Nowadays, with Raito, access controls can be created in the owning repository, for example by using Terraform. A Terraform script included in the project could look as follows:
provider "raito" {
domain = var.raito_domain
user = var.raito_user
secret = var.raito_password
}
data "raito_datasource" "research_ds" {
name = "research"
}
data "raito_user" "ruben" {
email = "ruben@******"
}
resource "raito_grant" "recycling_data_engineers" {
name = "Recycling data engineers"
description = "Access control to grant access for research purposes"
data_source = data.raito_datasource.research_ds.id
owners = [data.raito_user.ruben.id]
what_data_objects = [
{
fullname = "RESEARCH.RECYCLING.MATERIAL_DATA"
global_permissions = ["READ"]
},
{
fullname = "RESEARCH.RECYCLING.MATERIAL_ENRICHED_DATA"
global_permissions = ["READ"]
}
]
}
resource "raito_grant" "recycling_data_writer_sa" {
name = "Recycling writer service account"
description = "Access control for incoming data stream (service account)"
data_source = data.raito_datasource.research_ds.id
owners = [data.raito_user.ruben.id]
what_data_objects = [
{
fullname = "RESEARCH.RECYCLING.MATERIAL_DATA"
permissions = ["INSERT"]
},
]
who = [
{
user = "recyling_insert_service_account"
}
]
}
resource "raito_mask" "recycling_mask" {
name = "Recycling data mask for critical data"
description = "Access control for incoming data stream (service account)"
data_source = data.raito_datasource.research_ds.id
owners = [data.raito_user.ruben.id]
what_data_objects = [
{
fullname = "RESEARCH.RECYCLING.MATERIAL_DATA.ORIGIN_COMPANY"
}
]
type = "SHA256"
}
In this example, we have defined static write access for the service account that handles the incoming data stream, a SHA-256 mask on the origin-company column, and a read grant for research purposes. It is important to note that no users are directly assigned to that read grant. Instead, its beneficiaries are managed dynamically by Raito Cloud, which allows us to use time-bound or purpose-based access, depending on the needs of the data teams.
Remember our intrepid data engineers as modern-day Indiana Jones, navigating data troves for hidden insights? Just like Jones, they face a crucial responsibility: safeguarding sensitive information. Raito’s shift-left approach empowers them to embrace this challenge, shifting security and privacy earlier in the data development lifecycle. By integrating "what, not who" access controls, they gain agility and autonomy, while upholding security best practices like the principle of least privilege.
Think of Raito as Jones' trusty whip, enabling dynamic access management starting from the code base. Just as Jones collaborates with allies, your data teams can seamlessly share access based on project needs, fostering innovation.
This dynamic shift left turns data engineers from gatekeepers into collaborators. Instead of following lengthy data request workflows and manually maintaining grants, teams are accelerated by dynamic access management. It is made possible by a shift-left approach in which grants, masks, and filters are resources created alongside the data: not statically, as in most IaC setups, but enhanced with the dynamic nature of modern data teams.