Snowflake is one of the leading data warehousing solutions available today; its high-performance, cloud-based design has enabled thousands of customers around the world to create value from their datasets. However, the risk of data breaches, strict security requirements, and regulations increasingly constrain firms that want to work with valuable information in cloud environments.
Comforte empowers organizations to identify, classify, and protect sensitive data throughout their whole architecture. Modern data analytics solutions often involve technologies deployed on different cloud providers and on-premises. Our easy-to-use, resilient, and scalable platform can flexibly implement data policies across different environments and tools. The comforte Data Security Platform’s tokenization engine offers a range of customizable options, including anonymization, pseudonymization, and format-preserving encryption. Tokenization is part of a data-centric protection strategy that significantly reduces the burden of security and compliance by protecting the data itself rather than focusing exclusively on the applications’ perimeters.
Snowflake can easily be set up to leverage external functions and invoke the comforte Data Security Platform through an API Gateway service. This method enables authorized users to easily tokenize and detokenize columns from Snowflake itself through a UDF.
Snowflake supports the creation of external functions on the major cloud service providers, including Amazon Web Services (AWS), Microsoft Azure, and Google Cloud Platform (GCP). The configuration differs slightly for each provider. This document focuses on how to configure Snowflake external functions on AWS.
We will not cover the creation of a protection cluster or the setup of the REST API, but both are prerequisites for the Snowflake integration. Please refer to the documentation of each product for information on how to configure them on each cloud platform.
The integration between Snowflake and the REST API requires an API Gateway to be configured.
Each cloud platform has its own way to configure an API gateway, and we will describe the steps required to do it on Amazon AWS. For other cloud providers, please check their documentation.
The AWS account must have the privileges to:
- Create AWS roles via IAM.
- Create an API Gateway endpoint.
On Snowflake, the user account must have the ACCOUNTADMIN role or a role with the CREATE INTEGRATION privilege.
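If you prefer not to work as ACCOUNTADMIN, the CREATE INTEGRATION privilege can be granted to a dedicated role. The following is a minimal sketch; the role name securdps_integration_admin and the <snowflake_user> placeholder are hypothetical and do not come from this guide.

```sql
-- Run as ACCOUNTADMIN. The role and user names are illustrative placeholders.
use role ACCOUNTADMIN;

create role if not exists securdps_integration_admin;

-- CREATE INTEGRATION is an account-level privilege.
grant create integration on account to role securdps_integration_admin;

grant role securdps_integration_admin to user <snowflake_user>;
```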
The first step is to create a new AWS role that will be used for the Snowflake integration.
Create an empty text file to record some values and information for future reference. Some values will be required in different places, and it will be easier to copy and paste them from the text file.
After creating the role, we need to create the API Gateway endpoint that Snowflake will later use to call the SecurDPS Enterprise REST API to perform the protect or reveal operations.
Paste the JSON below in the Resource Policy editor:
{
Edit the provided JSON, replacing the <12-digit-number> value with your AWS account ID. On the same line, replace the <external_function_role> value with the name of the IAM role created earlier.
Also replace <method_request_ARN> with the ARN value you recorded earlier for the POST method.
After double-checking that you have substituted all the placeholders in the Resource Policy JSON configuration, click the Save button.
The suggested name for the API integration can be edited. Please record the name of the API integration.
Replace <new_IAM_role_ARN> with the ARN of the IAM role that was created in step 2.a.
The api_allowed_prefixes field should contain the resource invocation URL that you recorded earlier in step 2.b.
After making the changes, click the Run button to create the API integration and confirm that you receive the "Integration SECURDPS_API_INTEGRATION successfully created" message.
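For orientation, the statement edited in the steps above could look roughly like the sketch below. It follows Snowflake's CREATE API INTEGRATION syntax for AWS API Gateway and uses the suggested integration name; the two placeholders stand for the values recorded earlier and must be replaced before running it.

```sql
-- Sketch only: replace both placeholders with the values recorded earlier.
create or replace api integration securdps_api_integration
  api_provider = aws_api_gateway
  api_aws_role_arn = '<new_IAM_role_ARN>'
  api_allowed_prefixes = ('<resource_invocation_url>')
  enabled = true;
```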
4. Execute the DESCRIBE INTEGRATION command.
describe integration <my_integration_name>;
For example:
describe integration securdps_api_integration;
Look for the property named API_AWS_IAM_USER_ARN and then record that property’s property_value for future use.
Find the property named API_AWS_EXTERNAL_ID and record that property’s property_value for future use.
Note that the property_value of the API_AWS_EXTERNAL_ID often ends with an equals sign (“=”). That equals sign is part of the value; make sure that you cut and paste it along with the rest of the property_value.
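To avoid copying errors such as a dropped trailing equals sign, the two properties can also be selected directly from the DESCRIBE output. This is a sketch that assumes the integration is named securdps_api_integration and that the DESCRIBE output exposes lowercase "property" and "property_value" columns.

```sql
describe integration securdps_api_integration;

-- RESULT_SCAN re-reads the output of the previous statement in this session,
-- so this query must run immediately after the DESCRIBE above.
select "property", "property_value"
from table(result_scan(last_query_id()))
where "property" in ('API_AWS_IAM_USER_ARN', 'API_AWS_EXTERNAL_ID');
```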
This topic provides instructions for linking the API integration object in Snowflake to the AWS API Gateway. You do this by creating a trust relationship between Snowflake and the IAM (identity and access management) role you created earlier.
Find the Statement Condition field. Initially, this should contain only curly brackets (“{}”).
Paste the following between the curly brackets:
"StringEquals": { "sts:ExternalId": "xxx" }
Replace the xxx with the value of the API_AWS_EXTERNAL_ID property recorded earlier in step 3.a).4.
Finally, click the Update policy button.
Now we are ready to create the External Function in Snowflake in the next step.
To create the external function, open the Snowflake web interface and enter the CREATE EXTERNAL FUNCTION command as shown below:
CREATE EXTERNAL FUNCTION my_external_function(v VARCHAR)
Change the "my_external_function" function name to something more meaningful, for example securdps_protect_SSN, so that it is clear the function protects and returns the SSN (social security number) passed to it.
Following the same principle, you could rename the v argument to SSN. The data type should also be adjusted to match the data type of the actual database column.
Replace the <api_integration_name> with the previously recorded name of the API integration object. If you followed the suggestion, the name is securdps_api_integration.
In the HEADERS parameter, replace the <securdps_restapi_token> value with the bearer token that the REST API uses to authenticate the connecting user. Set the X-Operation header to the operation to execute (PROTECT or REVEAL), and set the X-Strategy header to one of the protection strategies configured in the protection cluster.
Finally, replace <resource_invocation_url> with the API Gateway invoke URL recorded in step 2.b).8.
The resulting command should look like:
CREATE OR REPLACE EXTERNAL FUNCTION SDPS_PROTECT_SSN(value VARCHAR)
Run the command to create the Snowflake external function.
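The command shown above is abbreviated. A fuller sketch that combines the substitutions described in this section might look like the following. Several details are assumptions rather than values taken from this guide: the varchar return type, the use of an Authorization header to carry the bearer token, and the <protection_strategy_name> placeholder. Check them against your REST API configuration before use.

```sql
-- Sketch under stated assumptions; not a definitive definition.
create or replace external function SDPS_PROTECT_SSN(value varchar)
  returns varchar                        -- assumed return type
  api_integration = securdps_api_integration
  headers = (
    'Authorization' = 'Bearer <securdps_restapi_token>',  -- header name is an assumption
    'X-Operation'   = 'PROTECT',
    'X-Strategy'    = '<protection_strategy_name>'        -- hypothetical placeholder
  )
  as '<resource_invocation_url>';
```

According to Snowflake's documentation, header names given in HEADERS are sent with an sf-custom- prefix, so the API Gateway or the REST API needs to expect or map them accordingly. A matching reveal function would use the same shape with 'X-Operation' = 'REVEAL'.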
You can now test the external function by running it with a single value, as shown below:
SELECT SDPS_PROTECT_SSN('111-22-3333');
If everything was set up correctly, you should see a tokenized value as the result.
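Once the single-value test works, the same function can be applied to a whole column. The sketch below assumes a CUSTOMERS table with an SSN column, as used in the masking example later in this document, and that the function was created under the name SDPS_PROTECT_SSN.

```sql
-- Compare stored values with their protected form for a small sample.
select SSN as original_value,
       SDPS_PROTECT_SSN(SSN) as protected_value
from CUSTOMERS
limit 10;
```

Snowflake sends rows to the remote service in batches, so starting with a small LIMIT keeps the first test inexpensive.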
Snowflake supports column-level masking policies, which mask the data returned by SELECT statements. Using the external function we created earlier, it is possible to apply column-level tokenization in a few steps, described next.
1. First, create a masking policy using the CREATE MASKING POLICY command, as shown next:
create or replace masking policy <my_masking_policy> as (val string) returns string ->
Using the previously implemented external function as an example, we could define a masking policy that allows access to plain data only to the SECURITYADMIN role in Snowflake, as in the following example:
create or replace masking policy SSN_MASK as (val string) returns string ->
In the example above, we assume that the data is stored in plain text in Snowflake and that we want to protect it from unauthorized users. In this case, all users who are not using the SECURITYADMIN role will see only protected data.
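The policy statements above are abbreviated after the arrow. One possible body, given as a sketch and not necessarily the exact policy intended by this guide, returns the stored value to SECURITYADMIN and the output of the external function to every other role. It assumes the external function is named SDPS_PROTECT_SSN; the cast to string is a precaution in case the function was declared with a non-string return type.

```sql
create or replace masking policy SSN_MASK as (val string) returns string ->
  case
    -- Only the SECURITYADMIN role sees the stored (plain) value.
    when current_role() in ('SECURITYADMIN') then val
    -- Every other role receives the tokenized value.
    else SDPS_PROTECT_SSN(val)::string
  end;
```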
2. After creating the masking policy you have to apply it to the table/column to be masked, using the SET MASKING POLICY command. A masking policy can be applied during CREATE TABLE statements or be applied later using the ALTER TABLE statement.
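As a sketch of the CREATE TABLE variant, the policy can be attached in the column definition. The table name CUSTOMERS_NEW and the ID and NAME columns are hypothetical and only illustrate the syntax; the SSN_MASK policy must already exist.

```sql
-- Hypothetical table; only the SSN column carries the masking policy.
create table if not exists CUSTOMERS_NEW (
  ID   number,
  NAME string,
  SSN  string masking policy SSN_MASK
);
```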
To apply the masking policy to existing data we can use the following command:
alter table if exists <table_name> modify column <column_name> set masking policy <masking_policy_name>;
Using the samples we created in this document, the command would look like:
alter table if exists CUSTOMERS modify column SSN set masking policy SSN_MASK;
After this masking policy is applied, any user who queries the SSN column of the CUSTOMERS table without using the SECURITYADMIN role will get only protected results.
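A quick way to confirm the behavior is to run the same query under two roles. In this sketch, <other_role> is a hypothetical placeholder for any role other than SECURITYADMIN; both roles are assumed to have the privileges needed to query CUSTOMERS.

```sql
-- Expected to return the stored values.
use role SECURITYADMIN;
select SSN from CUSTOMERS limit 5;

-- Expected to return protected values.
use role <other_role>;
select SSN from CUSTOMERS limit 5;

-- To remove the policy from the column again:
alter table if exists CUSTOMERS modify column SSN unset masking policy;
```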
For more information on how to create objects in Snowflake, including external functions, masking policies, user roles, tables, and columns, please refer to the links and references below: