This is the first of a four-part series of blog articles examining the inherent tradeoffs between data processing and information storage and presentation within traditional ETL paradigms — from the ODS to the data lake.
There are times when it is necessary to test with or share data that has elements of personally identifiable information (PII). To comply with data privacy laws and prevent a data breach, you may need to provide data that reflects, and sometimes imparts, critical information, but still protects the PII.
The IRI Voracity data management platform (and the IRI FieldShield data masking product within it) now allows you to auto-define data classes and groups based on your business glossaries or domain ontologies, and to apply transformation rules to those classes across multiple data sources and fields.
A dimension is a structure that categorizes a collection of information so that meaningful answers to questions regarding that information may be obtained. Dimensions in data management and data warehouses contain relatively static data; however, this dimensional data can change slowly over time and at unpredictable intervals.
This article looks at sets from an information-processing perspective: what they are, how they are constructed, and the distinct ways in which data can be drawn from sets within IRI software products using the SortCL data definition and processing program; i.e.,
Dimensional data that change slowly or unpredictably are captured in Slowly Changing Dimensions (SCD) analyses. In a data warehouse environment, a dimension table has a primary key that uniquely identifies each record and other pieces of information that are known as the dimensional data.
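To make the dimension-table idea concrete, here is a minimal Python sketch of one common way to record a slowly changing attribute: keep the old row, close it out, and append a new version under a new primary key (often called a "Type 2" update). The excerpt above does not say which SCD type the article covers, so this choice is an assumption, as are all the names used here (`CustomerRow`, `apply_change`, `customer_id`, `city`). It is plain Python for illustration, not SortCL syntax.

```python
from dataclasses import dataclass
from datetime import date
from typing import List, Optional


@dataclass
class CustomerRow:
    """One row of a hypothetical customer dimension table."""
    surrogate_key: int                 # primary key that uniquely identifies the row
    customer_id: str                   # natural (business) key
    city: str                          # dimensional attribute that may change slowly
    valid_from: date
    valid_to: Optional[date] = None    # None marks the current version


def apply_change(rows: List[CustomerRow], customer_id: str,
                 new_city: str, as_of: date) -> None:
    """Close the current row for customer_id and append a new version."""
    current = next(
        (r for r in rows if r.customer_id == customer_id and r.valid_to is None),
        None,
    )
    if current is not None:
        if current.city == new_city:
            return                     # attribute unchanged, nothing to record
        current.valid_to = as_of       # expire the old version, keeping its history
    next_key = max((r.surrogate_key for r in rows), default=0) + 1
    rows.append(CustomerRow(next_key, customer_id, new_city, as_of))


# Hypothetical data: one customer who later moves to another city.
dimension = [CustomerRow(1, "C100", "Boston", date(2020, 1, 1))]
apply_change(dimension, "C100", "Denver", date(2023, 6, 15))
```

After the call, the table holds two rows for the same customer: the original row with its end date filled in, and a new current row with its own primary key. History is preserved rather than overwritten, which is the point of tracking the change at all.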