Snowflake’s rise to global megastar has felt nothing short of meteoric. The focus for enterprise companies was primarily on data performance, with query execution time a key factor when large datasets are becoming the norm. Snowflake’s scalable compute model aptly served this requirement, all while ‘playing nicely’ with the rest of a company’s data stack.
With scale comes mounting pressure for features that surround your core product offering. One of the most important of these features, in our opinion, has been Snowflake’s approach to data access management. Controlling who can access what data in your data source is always going to be a crucial aspect of any organisation, and the ramifications of getting this wrong can be huge. After all, a balance must be found. What is the point of having performant data that is not available or usable to those who need it? Equally, what is the fallout from sacrificing security in order to provide data access?
In this blog series, we will go through the details of Snowflake’s basic access management and governance capabilities, showing you how to build from scratch - and how to move from the basic levels to the more complex. We will even show you some ways to make it all a whole lot easier.
What is Data Access Management?
Data access management is the process of defining and controlling who can take an action upon specific data, in specific situations and through which methods. This typically includes processes, roles, policies, standards and metrics that combine to ensure an effective data management structure throughout the organization and the data’s lifecycle.
Effective access management will empower users, allowing everyone who needs access to certain data the correct access environment, whilst also ensuring the business is always protected from the mis-use of data - and any potential fines and fallouts that are associated with this.
Snowflake has taken a good look at how companies manage access, with a least-privilege approach through role-based access controls. In this blog, we go through how Snowflake implements this, whilst also highlighting some of the common difficulties and solutions that come with it.
How does Data Access Management work on Snowflake?
Snowflake uses a mix of Role-Based Access Controls and Discretionary Access Control to provide its data access controls. The way this works in practice is on the query level.
For example, you want to run a query in Snowflake. Four contextual points are required for you to create and run your query. Firstly, you need to select a role. Then, you select and configure the chosen warehouse (compute resource), before selecting the namespace (the database and schema). Once these four pieces are ready, you can run your query from the worksheet. This would look similar to the below:
use role DATA_ENGINEER;
use warehouse COMPUTE_WH;
use database MASTER_DATA;
use schema PRODUCTION;
select * FROM PRODUCT limit 10;
Snowflake does try to make this easier for you. In fact, the above can be condensed when you enable a default warehouse (Snowflake will autoselect it for you), and you enable the usage of secondary roles. Snowflake will then automatically pick the right role to run the query.
SELECT* FROM MASTER_DATA.PRODUCTION.PRODUCT limit 10;
Your role is checked to ensure that you have the permissions to run that query along the assigned parameters. If this is accurate, the query completes and you receive your response.
As you will no doubt know, role-based access controls is a method of restricting access based on the roles assigned to individual users within an enterprise structure. It ensures that employees can only access information that they need to do their roles, and prevents them accessing information that doesn’t relate to their role.
RBAC allows Snowflake to control a role hierarchy, where roles can inherit access from those linked to them on the hierarchy. Permission grants are applied to each role, which allows any assigned user the ability to perform a set of permitted actions (such as reading from the database, assigning other role permissions etc) against the Snowflake resources to which they have been assigned access.
Roles in Snowflake come in many different shapes and sizes; but specifically system and custom.
System roles inherit from each other and are split into 5 differing levels:
ACCOUNTADMIN
The highest role leve in Snowflake, access is typically limited to a very select few. This is due to the level of control that ACCOUNTADMIN has across your Snowflake account. SYSADMIN and SECURITYADMIN are inherited within ACCOUNTADMIN.
SECURITYADMIN
This role mangoes object grant globally. MANAGE GRANTS privilege is automatically included by default, with USERADMIN role inherited.
USERADMIN
This applies directly to user and role management. This role can create and modify users and all their respective roles.
SYSADMIN
Controlling system object management, it is typical that custom roles are regularly assigned to the SYSADMIN role. This means that SYSADMIN can grant any custom role privileges to other roles.
PUBLIC
Every user or role gets this role. Everything owned by PUBLIC are inherently public.
A simple hierarchy is obvious within Snowflake’s system roles that extends into custom roles. Creating custom roles obviously therefore requires the CREATE ROLE privilege - contained within the SYSADMIN role if on a new Snowflake account. Having a logical hierarchy when creating roles within Snowflake is important as it ensures a continuing understanding of access and goals.
For example, a group of users who need to be able to query sensitive data within a select database could all sit within a single role that has object access. Alternatively, you can create a role with a business function name that applies the same access - giving a marketing team access to marketing specific data like churn, for example. This custom role now sits beneath your SYSADMIN, within whichever hierarchy you have specified to control the rest of the marketing team.
use role accountadmin;
grant usage on database MY_DB to role MARKETING;
grant usage on all schemas in database MY_DB to role MARKETING;
grant select on all tables in database MY_DB to role MARKETING;
New objects and privileges therefore need to be handled in a similar manner. Automated grant privileges allow this to be rolled out in a quicker way when dealing with new access when creating a new schema in Snowflake. Using “on future tables” provides an easy solution to this:
use role accountadmin;
grant usage on database MY_DB to role MARKETING;
grant usage on future schemas in database MY_DB to role MARKETING;
grant select on future tables in database MY_DB to role MARKETING;
Using this syntax will allow the privileges to carry down to all the tables within that schema.
With potentially fewer complications, within Snowflake each resource can and will have a defined owner. The owner can be added at creation or changed by the owning role, a role that sits higher in the direct hierarchy, or the account administrator. Most custom roles will sit underneath the SYSADMIN role, since it is responsible for custom role privileges.
So, this shows a clear structure for Snowflake’s data access management, how it works within a role-based approach, as well as linking that to defined owners of specific resources within your organisation. This sets out a very early, easy to understand base from which to build your data access management approach.
Over the next few weeks, we will go into more detail about the intricacies of Snowflake’s data governance, as well as how Raito enhances Snowflake’s base controls to offer the best business benefit for you. So