Snowflake ETL and PII Masking

 


Challenges

You may face one or more of these time-consuming issues working with Snowflake:


  • Data searches, profiling, and/or classification
  • Integrating or wrangling data for DW/BI ops
  • Data movement/migration to/from tables
  • Transforming or loading large tables
  • Change data capture or replication
  • Clustering or query performance
  • Generating smart, safe test data
  • Masking sensitive data

Diagnosing and tuning specific performance problems also takes time and may affect other users. Finally, stored SQL procedures may be programmed inefficiently, require optimization, and still take too long to run.

Solutions

Each goal below is followed by the IRI software to use for it:

Keep Snowflake Data in Order & Externalize SQL Transforms

IRI CoSort to pre-sort flat files for bulk loads and inserts, and to bypass slower in-database transformations like sorting, joining, and filtering by running the external CoSort SortCL data processing program against Snowflake data. This removes the overhead of that work from Snowflake when it needs to be done, which can improve the performance of clustering and commonly performed queries.
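To illustrate the general idea of pre-sorting a flat file on its intended clustering key before a bulk load, here is a minimal Python sketch. It is not CoSort or SortCL syntax; the function name `presort_csv` and the column names are hypothetical, and it sorts in memory, so it only suits small files (an external sort tool would be used for large ones).

```python
import csv
import io


def presort_csv(text: str, key: str) -> str:
    """Return CSV text with its data rows sorted by `key`; the header is kept.

    Illustration only: the whole file is held in memory, and values are
    compared as strings (ISO dates therefore sort chronologically).
    """
    reader = csv.DictReader(io.StringIO(text))
    rows = sorted(reader, key=lambda row: row[key])

    out = io.StringIO()
    writer = csv.DictWriter(out, fieldnames=reader.fieldnames, lineterminator="\n")
    writer.writeheader()
    writer.writerows(rows)
    return out.getvalue()


# Hypothetical sample data: orders that arrive out of date order.
sample = (
    "order_date,order_id\n"
    "2024-03-02,17\n"
    "2024-01-15,42\n"
    "2024-02-09,8\n"
)
sorted_text = presort_csv(sample, "order_date")
print(sorted_text)
```

The sorted output could then be staged and bulk loaded, so that rows with neighboring key values arrive together.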

Integrate and Wrangle Data for DWH & Analytics

IRI Voracity to leverage the multi-threaded, memory-optimized, and task-consolidating power of CoSort to perform ETL operations, and to act as a production analytics platform that prepares, packages, and even reports on data simultaneously. For more information, see the tabs under https://www.iri.com/solutions/data-integration/implement.

Migrate and Replicate Snowflake Databases

IRI NextForm Database Edition to acquire, re-map, re-format, and build/populate new tables during migrations to and from Snowflake. You can also use NextForm, or the SortCL program in CoSort or Voracity, to refresh, re-map, and convert data in Snowflake, and to produce custom reports, copies, and federated views of data.

Mask Data in Snowflake Columns

IRI FieldShield to classify, find, and mask structured data in Snowflake columns (or DarkShield for semi-structured and unstructured data), such as personally identifiable information (PII) or protected health information (PHI). Apply redaction, encryption, pseudonymization, blurring, and other de-identifying functions to comply with privacy laws and standards like HIPAA, PCI DSS, FERPA, and GDPR, and to support DevOps. For structured data, IRI separately documents how to connect to Snowflake and how to mask and map data in Snowflake.
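As a rough illustration of two of the masking functions named above, redaction and pseudonymization, here is a minimal Python sketch. It does not show FieldShield or DarkShield syntax; the function names, the column names `ssn` and `email`, and the secret key are all hypothetical. Pseudonymization is approximated with a keyed HMAC-SHA256 digest so that the same input always yields the same token.

```python
import hashlib
import hmac


def redact(value: str, keep_last: int = 4, fill: str = "*") -> str:
    """Replace every character except the last `keep_last` with `fill`."""
    if keep_last <= 0:
        return fill * len(value)
    hidden = max(len(value) - keep_last, 0)
    return fill * hidden + value[hidden:]


def pseudonymize(value: str, secret: bytes, length: int = 12) -> str:
    """Return a deterministic token: the first `length` hex characters
    of an HMAC-SHA256 digest of the value under `secret`."""
    digest = hmac.new(secret, value.encode("utf-8"), hashlib.sha256).hexdigest()
    return digest[:length]


def mask_row(row: dict, secret: bytes) -> dict:
    """Mask the hypothetical PII columns of one row; other columns pass through."""
    masked = dict(row)
    masked["ssn"] = redact(row["ssn"])
    masked["email"] = pseudonymize(row["email"], secret)
    return masked


# Hypothetical row and key, for illustration only.
secret_key = b"example-key-not-for-production"
row = {"id": 1, "ssn": "123-45-6789", "email": "pat@example.com"}
masked_row = mask_row(row, secret_key)
print(masked_row)
```

Because the token is deterministic for a given key, the same email masked in two tables still joins on the masked value, which is one reason consistent masking matters for test and analytics use.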

Generate Snowflake Test Data

IRI RowGen to populate Snowflake operations rapidly with safe test data. RowGen uses relational data models to generate realistic test data automatically for an entire database or DataVault 2.0 models with referential (or business-key) integrity. IRI RowGen, FieldShield, and subsetting operations are also tightly integrated with the ValueLabs Test Data Hub for test data management (TDM) in Snowflake.
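To make the idea of test data with referential integrity concrete, here is a minimal Python sketch. It is not RowGen syntax and does not read a data model; the two tables, their column names, and the function name `generate_tables` are hypothetical. The only point shown is that every foreign key in the child table is drawn from keys that exist in the parent table.

```python
import random


def generate_tables(n_customers: int, n_orders: int, seed: int = 0):
    """Generate synthetic parent (customers) and child (orders) rows.

    Every order's customer_id is picked from the generated customers,
    so the foreign key always resolves. The seed makes runs repeatable.
    """
    rng = random.Random(seed)
    customers = [
        {"customer_id": i, "name": f"Customer {i:04d}"}
        for i in range(1, n_customers + 1)
    ]
    customer_ids = [c["customer_id"] for c in customers]
    orders = [
        {
            "order_id": j,
            "customer_id": rng.choice(customer_ids),
            "amount": round(rng.uniform(5, 500), 2),
        }
        for j in range(1, n_orders + 1)
    ]
    return customers, orders


customers, orders = generate_tables(n_customers=5, n_orders=20, seed=42)
print(len(customers), len(orders))
```

No real values are involved at any point, which is what makes data produced this way safe to load into non-production environments.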

Learn more about all of these mapping and masking options in the IRI Voracity data management platform, which includes these components.
