
Data Security for Data Products II

As data teams increasingly become software teams, we need a more holistic approach to access management for data products: one that covers access to data, infrastructure, and code repositories, and that integrates well with the data development process.

In a previous post I talked about the challenges with Data Security for Data Products, but after talking with several data engineers, data architects, and security experts, I realized that I missed something important. Access management for Data Products is not limited to the underlying data. Depending on your role in the data team, you will also need the right access to infrastructure and code repositories, and the type of permissions you need depends on the job ahead:

  • A Data Analyst will need read permissions to the underlying data of the data product, and the reporting tool used to report the data.
  • An Analytics Engineer will need access to the dbt project, and read and write access to the schemas used by the dbt project.
  • A Data Engineer will need access to the Git repository, Airflow, read access to the landing zone, and write access to the staging zone.
  • An ML Engineer will need read access to ML libraries, training data, the Git repository, and the performance monitoring tools.
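
The role list above can also be written down as data, which makes it easy to see that every role spans more than one kind of system. The following is only a minimal sketch: the `Permission` type, the three category labels, the resource names such as `landing_zone`, and the choice to file the dbt project under "code" are all hypothetical, chosen to mirror the bullets rather than any real tool.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Permission:
    category: str  # hypothetical labels: "data", "infrastructure", or "code"
    resource: str  # hypothetical resource name taken from the bullets above
    level: str     # "read", "write", or "access" when the post does not specify


# One entry per role in the list above; the categorisation is an assumption.
ROLE_PERMISSIONS = {
    "data_analyst": [
        Permission("data", "data_product", "read"),
        Permission("infrastructure", "reporting_tool", "read"),
    ],
    "analytics_engineer": [
        Permission("code", "dbt_project", "access"),
        Permission("data", "dbt_schemas", "read"),
        Permission("data", "dbt_schemas", "write"),
    ],
    "data_engineer": [
        Permission("code", "git_repository", "access"),
        Permission("infrastructure", "airflow", "access"),
        Permission("data", "landing_zone", "read"),
        Permission("data", "staging_zone", "write"),
    ],
    "ml_engineer": [
        Permission("code", "ml_libraries", "read"),
        Permission("data", "training_data", "read"),
        Permission("code", "git_repository", "read"),
        Permission("infrastructure", "performance_monitoring", "read"),
    ],
}


def categories_for(role: str) -> set[str]:
    """Return the kinds of systems a role needs access to."""
    return {p.category for p in ROLE_PERMISSIONS[role]}


if __name__ == "__main__":
    for role in ROLE_PERMISSIONS:
        print(role, sorted(categories_for(role)))
```

In this sketch no role is covered by data permissions alone, which is the point of the list.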

Why am I writing about this now? Apart from having time on my hands now that I finished the second season of Foundation, the recent NIST CSF 2.0 framework prompted me to pick up my proverbial pen and write this addendum to my previous post. The NIST framework, like the upcoming NIS2 Directive, requires organisations to limit access to, and monitor usage of, not only data but also infrastructure and code bases. This matters all the more when you’re using managed services, as these are accessible over the web. As managed services are integrated into almost every aspect of a modern data team’s workload, it no longer suffices to think only about data access when thinking about access management. You’ll have to broaden your scope to include access to infrastructure and code bases.

This brings me to a point that I already touched upon briefly in my previous article. The effort of managing access grows exponentially with every tool. As if managing cloud data access for a growing data team weren’t hard enough in itself, you now also have to manage access to the other tools in the data development stack. Managing access in isolation, tool by tool, won’t help much, nor will it make your engineers more productive.

Or, to quote a data engineer from a large software vendor:

“What’s the point of using one tool for access management to data, and another tool for managing access to Airflow?”

To quote a security leader from a very fast-growing scale-up:

“When an engineer has access issues, they send me a ticket. Doesn’t matter what kind of access.” 

To quote an MLOps leader from a leading worldwide retailer:

“Access management in GitHub is a massive pain”

Request access to a Data Product to get all the right access to data, infrastructure, and code repositories

As data teams increasingly become software teams, we need a more holistic approach to access management for data products: one that covers access to data, infrastructure, and code repositories, and that integrates well with the data development process. Otherwise, data teams end up with a piecemeal solution that comes with huge productivity costs and security blind spots.
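
To make the idea of a single request per Data Product concrete, here is a rough sketch of how one request could fan out into grants on each underlying system. It is an illustration under assumptions, not a description of any product's API: the `request_access` function, the bundle layout, and the system and resource names (`git`, `airflow`, `storage`, `pipeline-repo`, and so on) are all hypothetical.

```python
from collections import defaultdict

# Hypothetical bundle for one data product: for each role, the
# (system, resource, level) tuples that role needs across the stack.
PRODUCT_BUNDLE = {
    "data_engineer": [
        ("git", "pipeline-repo", "access"),
        ("airflow", "pipeline-dags", "access"),
        ("storage", "landing_zone", "read"),
        ("storage", "staging_zone", "write"),
    ],
}


def request_access(user: str, role: str, bundle: dict) -> dict:
    """Expand one data-product access request into grants grouped by system."""
    grants = defaultdict(list)
    for system, resource, level in bundle[role]:
        grants[system].append(
            {"user": user, "resource": resource, "level": level}
        )
    return dict(grants)


if __name__ == "__main__":
    # One request yields grants on three systems instead of three tickets.
    for system, items in request_access("alice", "data_engineer", PRODUCT_BUNDLE).items():
        print(system, items)
```

Grouping the result by system keeps one record of who was granted what and why, which is also what monitoring usage across data, infrastructure, and code would need.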

Raito offers data teams a platform to centrally group all the permissions to data, infrastructure, and code repositories needed to work with Data Products. Reach out to info@raito.io to learn how this approach will make data teams more efficient.
