
Giovanni Battista Spinelli Barrile | Sep 14, 2022 | Data Protection, Big Data Analytics, Cloud Computing

How to Protect Data on Snowflake with comforte Data Security Platform

Snowflake is one of the leading data warehousing solutions available today; its high-performance, cloud-based design has enabled thousands of customers around the world to create value from their datasets. However, the risk of data breaches, strict security requirements, and regulations pose growing limitations and concerns for firms that want to work with valuable information in cloud environments.

Comforte empowers organizations to identify, classify, and protect sensitive data throughout their whole architecture. Modern data analytics solutions often involve technologies deployed across different cloud providers and on-premises. Our easy-to-use, resilient, and scalable platform can flexibly implement data policies across different environments and tools. The comforte Data Security Platform’s tokenization engine offers customizable options including anonymization, pseudonymization, and format-preserving encryption. Tokenization is part of a data-centric protection strategy that significantly reduces the burden of security and compliance by protecting the data itself rather than focusing exclusively on the applications’ perimeters.


Snowflake can be set up to use external functions that invoke the comforte Data Security Platform through an API Gateway service. This method enables authorized users to tokenize and detokenize columns from within Snowflake itself through a user-defined function (UDF).

Snowflake supports the creation of external functions on the main cloud service providers, including Amazon Web Services (AWS), Microsoft Azure, and Google Cloud Platform (GCP). The configuration differs slightly for each provider. This document focuses on how to configure Snowflake external functions on AWS.

1. Protection cluster and REST API

We will not cover the creation of a protection cluster or the setup of the REST API, but both are prerequisites for the Snowflake integration. Please refer to the documentation of each product for information on how to configure them on each cloud platform.

2. API gateway

The integration between Snowflake and the REST API requires an API Gateway to be configured.

Each cloud platform has its own way to configure an API gateway, and we will describe the steps required to do it on Amazon AWS. For other cloud providers, please check their documentation.

The AWS account must have the privileges to:

  - Create AWS roles via IAM.
  - Create an API gateway endpoint.

On Snowflake, the user must have the ACCOUNTADMIN role or a role with the CREATE INTEGRATION privilege.

a) Create an AWS role for the integration

The first step is to create a new AWS role that will be used for the Snowflake integration.

Create an empty text file to record some values and information for future reference. Some values will be required in different places, and it will be easier to copy and paste them from the text file.

  1. Access the Identity and Access Management (IAM) console, click the Roles menu option, and then click the Create role button.
  2. On the Select trusted entity page, select the AWS account entity type and specify which account ID will be able to access this role.
  3. On the Add permissions page, optionally attach Permissions policies and click Next.
  4. On the Name, review, and create page, define a name for the role. The suggested name is SecurDPS_Snowflake_integration; record this name for future reference. Finally, click the Create role button.
  5. If everything is correct, the new role will be created. Record the ARN of this role for future use.

b) Create an AWS API Gateway endpoint

After creating the role, we need to create the actual API Gateway endpoint that will be later used by Snowflake to call the SecurDPS Enterprise REST API to do the protect or reveal operations.

  1. Access the Amazon API Gateway console and click on the Create API button.
  2. Select the REST API option and click the Build button.
  3. Select the REST protocol and choose to create a New API. In the Settings define a name for the API (suggested name is SecurDPS_Snowflake_Integration). Select the Endpoint Type based on your cloud deployment type. Click the Create API button to finish the initial steps.
  4. If the API was created successfully, you will see a new screen with many options. Find the Actions drop-down button and select the Create Method option.
  5. Just below the Actions button you will notice a drop-down box where you can select the available methods. Select the POST method and click the button with a "check" symbol to save the selected method.
  6. You will now be presented with the setup options for the POST method. Select the HTTP integration type and then enter the Endpoint URL of the REST API instance. The Endpoint URL will look like the example below. Note that the REST API is reached on port 8080 through the specific path /securdps/v1/snowflake/batch/.

    http://**********4687b5daac5bb4a54f2-*****2279.us-east-2.elb.amazonaws.com:8080/securdps/v1/snowflake/batch/

    Save the changes to the API by clicking the Save button.

  7. Next, deploy the API. Click the Actions drop down menu and select the Deploy API option. On the new form, select the [New stage] option for the Deployment stage field, then define a name for the stage (suggestion is prod) and click the Deploy button.
  8. Record the Invoke URL information for future reference and click the "Save changes" button.
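For orientation, the sketch below shows the shape of the request that Snowflake sends through the gateway to the batch endpoint and the shape of the response it expects back. The row format (a `data` array of `[row_number, value]` pairs) is Snowflake's external function format. The `build_endpoint_url` helper, the host name, and the `fake_protect` stand-in are hypothetical and only illustrate the flow; they do not reproduce the SecurDPS REST API.

```python
import json

REST_PORT = 8080
BATCH_PATH = "/securdps/v1/snowflake/batch/"


def build_endpoint_url(load_balancer_host: str) -> str:
    """Assemble the HTTP integration Endpoint URL from the load balancer host."""
    return f"http://{load_balancer_host}:{REST_PORT}{BATCH_PATH}"


def handle_batch(request_body: str, transform) -> str:
    """Apply `transform` to every row of a batch, preserving the row numbers."""
    rows = json.loads(request_body)["data"]
    results = [[row_number, transform(value)] for row_number, value in rows]
    return json.dumps({"data": results})


def fake_protect(value: str) -> str:
    # Hypothetical stand-in for real tokenization: it only reverses the string.
    return value[::-1]


# Snowflake batches rows as [row_number, argument] pairs.
request = json.dumps({"data": [[0, "111-22-3333"], [1, "444-55-6666"]]})
response = handle_batch(request, fake_protect)

print(build_endpoint_url("example-elb.us-east-2.elb.amazonaws.com"))
print(response)
```

The row numbers in the response must match those in the request so that Snowflake can pair each result with its input row.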

c) Secure Your Amazon API Gateway Endpoint

  1. At this point, you should be on the screen that displays your API Gateway information and you should see your resource and POST method.
  2. In the left-hand pane, click on Resources and then click on the POST method.
  3. Record the Method Request ARN from the Method Request box to your notes for future reference.
  4. Click on the title Method Request to open the details window. Click the edit symbol beside Authorization and select AWS_IAM to specify that the method request requires AWS_IAM authorization. Then click on the small checkmark next to the menu to confirm your selection.
  5. To set the resource policy for the API Gateway to specify who is authorized to invoke the gateway endpoint, click on Resource Policy in the left-hand column of the window for the API.

    Paste the JSON below in the Resource Policy editor:

    {
       "Version": "2012-10-17",
       "Statement":
       [
          {
          "Effect": "Allow",
          "Principal":
             {
             "AWS": "arn:aws:sts::<12-digit-number>:assumed-role/<external_function_role>/snowflake"
              },
         "Action": "execute-api:Invoke",
         "Resource": "<method_request_ARN>"
         }
       ]
    }

    Edit the provided JSON, replacing the <12-digit-number> value with your AWS Account ID. On the same line, replace the <external_function_role> value with the name of the IAM role created earlier.

    Also replace <method_request_ARN> with the ARN value you recorded earlier for the POST method.

    After double-checking you have substituted all the variables in the Resource Policy JSON configuration, click the Save button.

  6. Deploy all the changes. Select the Resources option in the left-hand column for the API, then open the Actions menu, select Deploy API, and choose the stage created earlier (there is no need to create a new stage).
  7. The next step is to create the Snowflake API integration.
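Because a forgotten placeholder in the resource policy is an easy mistake to make, the substitution can be scripted. The following is a minimal sketch, assuming the three values were recorded in your notes; the `render_resource_policy` helper and all sample values (account ID, API ID, region) are made up for illustration.

```python
import json
import re

# Same structure as the resource policy shown above.
POLICY_TEMPLATE = {
    "Version": "2012-10-17",
    "Statement": [
        {
            "Effect": "Allow",
            "Principal": {
                "AWS": "arn:aws:sts::<12-digit-number>:assumed-role/<external_function_role>/snowflake"
            },
            "Action": "execute-api:Invoke",
            "Resource": "<method_request_ARN>",
        }
    ],
}


def render_resource_policy(account_id: str, role_name: str, method_request_arn: str) -> str:
    """Fill the template and fail loudly if any <placeholder> is left over."""
    if not re.fullmatch(r"\d{12}", account_id):
        raise ValueError("AWS account ID must be exactly 12 digits")
    text = json.dumps(POLICY_TEMPLATE, indent=2)
    text = (
        text.replace("<12-digit-number>", account_id)
        .replace("<external_function_role>", role_name)
        .replace("<method_request_ARN>", method_request_arn)
    )
    leftover = re.findall(r"<[^<>]+>", text)
    if leftover:
        raise ValueError(f"unreplaced placeholders: {leftover}")
    return text


# Made-up sample values; use the ones from your own notes.
policy = render_resource_policy(
    "123456789012",
    "SecurDPS_Snowflake_integration",
    "arn:aws:execute-api:us-east-2:123456789012:abc123/*/POST/",
)
print(policy)
```

The printed JSON can then be pasted into the Resource Policy editor.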

3. Snowflake API integration

a) Create the API Integration for AWS in Snowflake

  1. Open a Snowflake session, typically a Snowflake web interface session.
  2. Use a Snowflake role with ACCOUNTADMIN privileges or the CREATE INTEGRATION privilege, for example:
    use role <has_accountadmin_privileges>;
  3. Type the CREATE API INTEGRATION command to create an API integration. The command should look similar to the following:
create or replace api integration securdps_api_integration
   api_provider = aws_api_gateway
   api_aws_role_arn = '<new_IAM_role_ARN>'
   api_allowed_prefixes = ('<resource_invocation_url>')
   enabled = true;

The suggested name for the API integration can be changed. Record the name of the API integration for future reference.

Replace <new_IAM_role_ARN> with the ARN of the IAM role that was created in step 2.a.

The api_allowed_prefixes field should contain the resource invocation URL that you recorded earlier in step 2.b.

After making the changes, click the Run button to create the API integration and confirm that you receive the "Integration SECURDPS_API_INTEGRATION successfully created" message.

   4. Execute the DESCRIBE INTEGRATION command.

describe integration <my_integration_name>;

For example:

describe integration securdps_api_integration;

Look for the property named API_AWS_IAM_USER_ARN and then record that property’s property_value for future use.

Find the property named API_AWS_EXTERNAL_ID and record that property’s property_value for future use.

Note that the property_value of the API_AWS_EXTERNAL_ID often ends with an equals sign (“=”). That equals sign is part of the value; make sure that you cut and paste it along with the rest of the property_value.
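If you read the DESCRIBE INTEGRATION output programmatically instead of copying it by hand, the two values can be picked out as in the sketch below. It assumes each result row is available as a dictionary keyed by the `property` and `property_value` column names; the `extract_integration_values` helper and the sample rows are made up for illustration.

```python
def extract_integration_values(rows):
    """Return the two values needed later from DESCRIBE INTEGRATION rows."""
    by_name = {row["property"]: row["property_value"] for row in rows}
    wanted = ("API_AWS_IAM_USER_ARN", "API_AWS_EXTERNAL_ID")
    missing = [name for name in wanted if name not in by_name]
    if missing:
        raise KeyError(f"missing properties: {missing}")
    # Values are returned untouched, so a trailing "=" is preserved.
    return {name: by_name[name] for name in wanted}


# Made-up sample rows; real values come from your own integration.
sample_rows = [
    {"property": "ENABLED", "property_value": "true"},
    {"property": "API_AWS_IAM_USER_ARN", "property_value": "arn:aws:iam::123456789012:user/example"},
    {"property": "API_AWS_EXTERNAL_ID", "property_value": "EXAMPLE_SFCRole=2_abcdefg="},
]

values = extract_integration_values(sample_rows)
print(values["API_AWS_IAM_USER_ARN"])
print(values["API_AWS_EXTERNAL_ID"])
```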

b) Link the API Integration for AWS to the Proxy Service in the Management Console

This step links the API integration object in Snowflake to the AWS API Gateway. You do this by creating a trust relationship between Snowflake and the IAM (Identity and Access Management) role you created earlier.

  1. Log into your AWS account and navigate to the IAM management console.
  2. Select the Roles item under the Access management menu.
  3. Locate the role created earlier using the search box. Click on the role name to open the details screen. Click on the Trust relationships tab, then click on the button Edit trust policy. This should open the Trusted Entities document into which you can add authentication information.

Find the Statement Condition field. Initially, this should contain only curly brackets (“{}”).

Paste the following between the curly brackets:

"StringEquals": { "sts:ExternalId": "xxx" }

Replace the xxx with the value for the API_AWS_EXTERNAL_ID field recorded earlier in step 3.a).4.

Finally, click the Update policy button.
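The edit to the trust policy can be pictured as the small transformation below. This is only a sketch of the step described above: the `add_external_id` helper, the principal, and the external ID are made-up sample values, and the real document is the one shown in the Edit trust policy screen.

```python
import copy
import json


def add_external_id(trust_policy: dict, external_id: str) -> dict:
    """Return a copy of the trust policy with the ExternalId condition set."""
    updated = copy.deepcopy(trust_policy)
    for statement in updated["Statement"]:
        statement["Condition"] = {"StringEquals": {"sts:ExternalId": external_id}}
    return updated


# Made-up trust policy as it might look before the edit (empty Condition).
trust_policy = {
    "Version": "2012-10-17",
    "Statement": [
        {
            "Effect": "Allow",
            "Principal": {"AWS": "arn:aws:iam::123456789012:root"},
            "Action": "sts:AssumeRole",
            "Condition": {},
        }
    ],
}

# The external ID keeps its trailing "=" exactly as recorded.
updated = add_external_id(trust_policy, "EXAMPLE_SFCRole=2_abcdefg=")
print(json.dumps(updated, indent=2))
```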

Now we are ready to create the External Function in Snowflake in the next step.

4. Snowflake External Function

To create the external function, open the Snowflake web interface and enter the CREATE EXTERNAL FUNCTION command as shown below:

CREATE EXTERNAL FUNCTION my_external_function(v VARCHAR)
   RETURNS VARIANT
   API_INTEGRATION = <api_integration_name>
   HEADERS = ( 'Authorization' = <securdps_restapi_token>,
               'Content-Type' = 'application/json',
               'X-Operation' = <securdps_operation>,
               'X-Strategy' = <securdps_strategy>)
   AS '<resource_invocation_url>';

Change the "my_external_function" function name to something more meaningful, for example securdps_protect_SSN, so you can identify that this function will protect and return the passed SSN (social security number).

Following the same principle, you could rename the v argument to SSN. The data type should also be adjusted to match the data type of the actual database column.

Replace the <api_integration_name> with the previously recorded name of the API integration object. If you followed the suggestion, the name is securdps_api_integration.

In the HEADERS parameters, replace the <securdps_restapi_token> value with the bearer token that the REST API uses to authenticate the connecting user. Set the X-Operation header to the operation to be executed, either PROTECT or REVEAL, and set X-Strategy to one of the protection strategies configured in the protection cluster.

Finally, replace <resource_invocation_url> with the API Gateway invoke URL recorded in step 2.b).8.

The resulting command should look like:

CREATE OR REPLACE EXTERNAL FUNCTION SDPS_PROTECT_SSN(value VARCHAR)
RETURNS variant
API_INTEGRATION = securdps_api_integration
HEADERS = ( 'Authorization' = 'Bearer *****672-5598-4480-aca5-*********',
            'Content-Type' = 'application/json',
            'X-Operation' = 'PROTECT',
            'X-Strategy' = 'SSN')
AS 'https://***********.execute-api.eu-central-1.amazonaws.com/Prod';

Run the command to create the Snowflake external function.
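Because the PROTECT and REVEAL functions differ only in a few values, the statement can also be generated rather than edited by hand. The sketch below is one possible way to do that under the header layout shown above; the `render_external_function` helper, the function name SDPS_REVEAL_SSN, and the sample URL are hypothetical, and the token is left as a placeholder.

```python
ALLOWED_OPERATIONS = {"PROTECT", "REVEAL"}


def sql_literal(value: str) -> str:
    """Quote a value as a SQL string literal, doubling embedded single quotes."""
    return "'" + value.replace("'", "''") + "'"


def render_external_function(name, integration, token, operation, strategy, url):
    """Build a CREATE EXTERNAL FUNCTION statement for one operation/strategy."""
    if operation not in ALLOWED_OPERATIONS:
        raise ValueError(f"X-Operation must be one of {sorted(ALLOWED_OPERATIONS)}")
    if not url.startswith("https://"):
        raise ValueError("the invoke URL is expected to start with https://")
    headers = {
        "Authorization": f"Bearer {token}",
        "Content-Type": "application/json",
        "X-Operation": operation,
        "X-Strategy": strategy,
    }
    header_lines = ",\n            ".join(
        f"{sql_literal(key)} = {sql_literal(value)}" for key, value in headers.items()
    )
    return (
        f"CREATE OR REPLACE EXTERNAL FUNCTION {name}(value VARCHAR)\n"
        "RETURNS VARIANT\n"
        f"API_INTEGRATION = {integration}\n"
        f"HEADERS = ( {header_lines})\n"
        f"AS {sql_literal(url)};"
    )


# Hypothetical REVEAL counterpart of the function created above.
sql = render_external_function(
    name="SDPS_REVEAL_SSN",
    integration="securdps_api_integration",
    token="<securdps_restapi_token>",
    operation="REVEAL",
    strategy="SSN",
    url="https://example.invalid/prod",
)
print(sql)
```

Rejecting unknown operations early avoids creating a function that would only fail later, when it is first called.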

You can now test the external function by running it with a single value, as shown below:

SELECT SDPS_PROTECT_SSN('111-22-3333');

If everything was set up correctly, you should see a tokenized value as the result.

5. Snowflake column masking

Snowflake supports column-level masking policies, which mask the data returned by SELECT statements. Using the external function we created earlier, it is possible to apply column-level tokenization in a few steps, described next.

      1. First, create a masking policy using the CREATE MASKING POLICY command, as shown below:

create or replace masking policy <my_masking_policy> as (val string) returns string ->
      <snowflake_external_function>(val);

Using the external function implemented earlier as an example, we can define a masking policy that allows only the SECURITYADMIN role in Snowflake to see plain data:

create or replace masking policy SSN_MASK as (val string) returns string ->
   CASE
     WHEN CURRENT_ROLE() IN ('SECURITYADMIN') THEN val
     ELSE SDPS_PROTECT_SSN(val)::string
   END;

In the example above we assume that the data is stored in plain text in Snowflake and that we want to protect it from unauthorized users. In this case, every user whose current role is not SECURITYADMIN will see only protected data.

      2. After creating the masking policy you have to apply it to the table/column to be masked, using the SET MASKING POLICY command. A masking policy can be applied during CREATE TABLE statements or be applied later using the ALTER TABLE statement.

To apply the masking policy to existing data we can use the following command:

alter table if exists <table_name> modify column <column_name> set masking policy <masking_policy_name>;

Using the samples we created in this document, the command would look like:

alter table if exists CUSTOMERS modify column SSN set masking policy SSN_MASK;

After applying this masking policy, any user whose current role is not SECURITYADMIN and who queries the SSN column of the CUSTOMERS table will get only protected results.

For more information on how to create objects in Snowflake, including external functions, masking policies, user roles, tables and columns, please refer to the links and references below:
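The decision that the SSN_MASK policy makes for each value can be modeled in a few lines. This is only an illustrative model of the CASE expression above, not Snowflake code; the `ssn_mask` function and the `placeholder_protect` stand-in for the external function are hypothetical.

```python
def ssn_mask(current_role: str, val: str, protect) -> str:
    """Mirror the CASE expression: plain value for SECURITYADMIN, else protected."""
    if current_role in ("SECURITYADMIN",):
        return val
    return protect(val)


def placeholder_protect(val: str) -> str:
    # Hypothetical stand-in for SDPS_PROTECT_SSN: replaces every character with X.
    return "X" * len(val)


for role in ("SECURITYADMIN", "ANALYST"):
    print(role, ssn_mask(role, "111-22-3333", placeholder_protect))
```

Only the role active for the session matters here, which matches the use of CURRENT_ROLE() in the policy.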

Links and references

