Glossary
A structured reference for data professionals covering protection, quality, integration & test data management — aligned to IRI's Content Hub and product suite.
Key
| Data Protection & Privacy | Data Quality & Control | Data Integration & Modernization | Test Data & QA |
A
2 terms
Anonymization
The irreversible process of altering data so that individuals cannot be identified, directly or indirectly. Unlike pseudonymized data, truly anonymized data falls outside the scope of the GDPR because individuals can no longer be identified by any means reasonably likely to be used. Techniques include generalization, suppression, and noise addition.
See also: Pseudonymization · Data Masking · GDPR ▶ IRI Product(s): FieldShield, Voracity
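A minimal Python sketch of two of the techniques named above, suppression and generalization, applied to a hypothetical record layout (`name`, `age`, `zip`, `diagnosis`). The field names, 10-year age bands, and 3-digit ZIP truncation are illustrative assumptions. Real anonymization also requires assessing re-identification risk across the whole released dataset. The `min_group_size` helper shows only the simplest such check, the "k" in k-anonymity.

```python
from collections import Counter

def anonymize(record: dict) -> dict:
    """Suppress the direct identifier and generalize quasi-identifiers (illustrative rules)."""
    low = (record["age"] // 10) * 10
    return {
        # Suppression: "name" is simply not copied into the output.
        # Generalization: exact age becomes a 10-year band.
        "age_band": f"{low}-{low + 9}",
        # Generalization: keep only the first three ZIP digits.
        "zip3": record["zip"][:3] + "**",
        "diagnosis": record["diagnosis"],
    }

def min_group_size(rows: list[dict], quasi_ids: tuple[str, ...]) -> int:
    """Smallest number of rows sharing identical quasi-identifier values."""
    counts = Counter(tuple(r[q] for q in quasi_ids) for r in rows)
    return min(counts.values())

people = [
    {"name": "Ann Lee", "age": 34, "zip": "32801", "diagnosis": "flu"},
    {"name": "Bo Diaz", "age": 37, "zip": "32804", "diagnosis": "asthma"},
    {"name": "Cy Park", "age": 52, "zip": "10001", "diagnosis": "flu"},
    {"name": "Di Ross", "age": 58, "zip": "10003", "diagnosis": "gout"},
]
released = [anonymize(p) for p in people]
print(released[0])                                     # {'age_band': '30-39', 'zip3': '328**', 'diagnosis': 'flu'}
print(min_group_size(released, ("age_band", "zip3")))  # 2
```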
Audit Trail
A chronological record capturing who accessed or modified data, when, and what changed. Audit trails are required under regulations and standards such as HIPAA, SOX, and PCI-DSS, and they support forensic investigation, anomaly detection, and demonstrations of compliance.
See also: Audit-Ready Outputs · Reconciliation · Data Lineage ▶ IRI Product(s): Voracity, CoSort, FieldShield, DarkShield, RowGen, NextForm, Ripcurrent
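The definition does not prescribe how an audit trail is stored. One common way to make a log tamper-evident is hash chaining, in which each entry includes the hash of the previous entry. Below is a minimal Python sketch of that technique. The entry fields (`who`, `action`, `detail`) are hypothetical, and this is not a description of how any IRI product records audit data.

```python
import hashlib
import json
from datetime import datetime, timezone

GENESIS = "0" * 64  # stand-in "previous hash" for the first entry

def _digest(entry: dict) -> str:
    # Canonical JSON (sorted keys) so identical entries always hash identically.
    payload = json.dumps(entry, sort_keys=True).encode("utf-8")
    return hashlib.sha256(payload).hexdigest()

def append(log: list[dict], who: str, action: str, detail: str) -> None:
    """Append an entry that commits to the hash of the entry before it."""
    entry = {
        "ts": datetime.now(timezone.utc).isoformat(),
        "who": who,
        "action": action,
        "detail": detail,
        "prev": log[-1]["hash"] if log else GENESIS,
    }
    entry["hash"] = _digest(entry)
    log.append(entry)

def verify(log: list[dict]) -> bool:
    """Recompute the chain; editing, reordering, or deleting a non-final entry breaks it."""
    prev = GENESIS
    for entry in log:
        body = {k: v for k, v in entry.items() if k != "hash"}
        if body["prev"] != prev or _digest(body) != entry["hash"]:
            return False
        prev = entry["hash"]
    return True

log: list[dict] = []
append(log, "alice", "UPDATE", "customers.id=42 email changed")
append(log, "bob", "READ", "customers.id=42")
print(verify(log))          # True
log[0]["who"] = "mallory"   # tamper with history
print(verify(log))          # False
```

Note that chaining alone cannot detect truncation of the newest entries; in practice the latest hash is usually also stored somewhere the writer cannot alter.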
B
1 term
Batch Processing
The execution of data jobs in which records collected over a period are processed together as a group, without manual intervention. Batch processing suits high-volume, non-time-critical workloads such as end-of-day reconciliation, report generation, or nightly ETL loads. Contrast with streaming (real-time) processing.
See also: ETL · Streaming · CI/CD Data Refresh ▶ IRI Product(s): CoSort / SortCL
C
3 terms
CI/CD Data Refresh
The automatic refreshing of test datasets as part of a Continuous Integration / Continuous Delivery (CI/CD) pipeline. It ensures that tests run against representative, up-to-date data without manual intervention, reducing test flakiness and improving release confidence.
See also: DevSecOps · Data Subsetting · Test Data Generation ▶ IRI Product(s): RowGen, FieldShield, DarkShield, Voracity
Compliance
Adherence to laws, regulations, standards, or internal policies governing how data is collected, stored, processed, and shared. Key frameworks include GDPR (EU), HIPAA (US healthcare), PCI-DSS (payment cards), and CPRA (California). Non-compliance can result in significant fines and reputational damage.
See also: GDPR · HIPAA · PCI-DSS · Data Masking ▶ IRI Product(s): FieldShield, DarkShield
CoSort
IRI's high-performance data transformation engine for sorting, aggregating, joining, and reformatting large datasets at the command line or within graphical workflows. CoSort uses the SortCL language and underpins structured data manipulation in the IRI Voracity platform, enabling fast ETL, wrangling, reporting, and data quality operations without the overhead of a database engine.
See also: SortCL · ETL · Data Transformation ▶ IRI Product(s): CoSort, Voracity
D
13 terms
Data Cleansing
The detection and correction (or removal) of corrupt, inaccurate, incomplete, or duplicate records in a dataset. Common tasks include standardizing formats, filling nulls, deduplicating records, and validating against reference data. Cleansing is a prerequisite for reliable analytics and reporting.
See also: Data Quality · Validation Rules · Reconciliation ▶ IRI Product(s): CoSort, Voracity
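A minimal Python sketch of three of the tasks listed above (standardizing formats, filling nulls, and deduplicating) on hypothetical customer rows. The field names, the `"UNKNOWN"` default, and the choice of normalized email as the duplicate key are assumptions for illustration only.

```python
def cleanse(rows: list[dict]) -> list[dict]:
    seen: set[str] = set()
    cleaned = []
    for row in rows:
        email = (row.get("email") or "").strip().lower()   # standardize the key field
        if not email or email in seen:                     # remove blanks and duplicates
            continue
        seen.add(email)
        cleaned.append({
            "name": " ".join((row.get("name") or "").split()).title(),  # collapse spaces, fix case
            "email": email,
            "country": row.get("country") or "UNKNOWN",    # fill nulls with an explicit default
        })
    return cleaned

raw = [
    {"name": "  ada   LOVELACE ", "email": "Ada@Example.com ", "country": "UK"},
    {"name": "Ada Lovelace", "email": "ada@example.com", "country": None},
    {"name": "alan turing", "email": "alan@example.com", "country": None},
    {"name": "No Email", "email": None, "country": "US"},
]
for r in cleanse(raw):
    print(r)
# {'name': 'Ada Lovelace', 'email': 'ada@example.com', 'country': 'UK'}
# {'name': 'Alan Turing', 'email': 'alan@example.com', 'country': 'UNKNOWN'}
```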
Data Integration
The process of combining data from disparate sources into a unified, consistent view. Integration may be achieved through ETL pipelines, data virtualization, or API-based federation. It is essential for enterprise analytics, AI model training, and operational reporting.
See also: ETL · ELT · Data Migration · Data Pipeline ▶ IRI Product: Voracity
Data Lineage
The end-to-end record of data's origins, movements, transformations, and consumption across systems. Lineage is critical for debugging data quality issues, satisfying regulatory audit requirements, and understanding the impact of upstream changes on downstream consumers.
See also: Audit Trail · Data Governance · ETL ▶ IRI Product: Voracity
Data Masking
Replacing sensitive data values with fictitious but structurally realistic substitutes. Static masking permanently transforms copies of data used in non-production environments; dynamic masking is applied in real time at query execution. When applied consistently, masking preserves referential integrity and format validity, so applications and tests continue to function normally.
See also: Tokenization · Pseudonymization · Anonymization · FieldShield ▶ IRI Product(s): FieldShield, DarkShield, CellShield, Voracity
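To show how masked values can stay format-valid and consistent across tables, here is a minimal Python sketch of deterministic static masking. An HMAC keyed with a secret picks a substitute from a list of fictitious names, so the same input masks to the same output everywhere it appears, and phone digits are replaced while punctuation is kept. The key, substitute list, and function names are hypothetical; this is not FieldShield's algorithm.

```python
import hashlib
import hmac

SECRET = b"hypothetical-masking-key"   # keep real keys out of source control
FAKE_NAMES = ["Jordan Blake", "Casey Moreno", "Riley Chen", "Morgan Patel", "Taylor Quinn"]

def _index(value: str, n: int) -> int:
    digest = hmac.new(SECRET, value.encode("utf-8"), hashlib.sha256).digest()
    return int.from_bytes(digest[:8], "big") % n

def mask_name(name: str) -> str:
    """Deterministic: the same real name always yields the same fictitious name."""
    return FAKE_NAMES[_index(name, len(FAKE_NAMES))]

def mask_phone(phone: str) -> str:
    """Replace every digit deterministically but keep punctuation, so format checks still pass."""
    digest = hmac.new(SECRET, phone.encode("utf-8"), hashlib.sha256).hexdigest()
    digits = iter(str(int(digest, 16)))
    return "".join(next(digits) if ch.isdigit() else ch for ch in phone)

# The same customer masks identically in every table, so joins on the masked value still work.
assert mask_name("Ann Lee") == mask_name("Ann Lee")
print(mask_name("Ann Lee"), mask_phone("(407) 555-0142"))
```

With only five substitutes, different names can collide, so this approach suits descriptive fields; values that must stay unique (such as keys) need a one-to-one scheme such as format-preserving encryption.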
Data Migration
The process of moving data from one storage system, format, or environment to another, for example from an on-premises mainframe to a cloud data warehouse. Successful migration requires profiling, cleansing, mapping, validation, and cut-over planning to avoid data loss or corruption.
See also: ETL · Mainframe-to-Cloud · Data Integration ▶ IRI Product(s): IRI Voracity, NextForm
Data Pipeline
An automated sequence of steps that ingests raw data from one or more sources, applies transformations, and loads it into a target system. Pipelines may run on a schedule (batch) or continuously (streaming) and typically include error handling, logging, and retry logic.
See also: ETL · ELT · Batch Processing · Streaming ▶ IRI Product(s): CoSort, IRI Voracity
Data Profiling
The examination of an existing dataset to collect statistics about its structure, content, and quality. Profiling reveals nulls, uniqueness rates, value distributions, format inconsistencies, and referential integrity violations, providing the foundation for cleansing and governance efforts.
See also: Data Quality · Data Cleansing · Validation Rules ▶ IRI Product: IRI Voracity
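A minimal Python sketch of column profiling over hypothetical rows: for each column it counts nulls (treating `None` and empty strings as null, an assumption), distinct values, a uniqueness rate, and the most frequent values.

```python
from collections import Counter

def profile(rows: list[dict]) -> dict[str, dict]:
    """Per-column null count, distinct count, uniqueness rate, and top values."""
    columns = {key for row in rows for key in row}
    report = {}
    for col in sorted(columns):
        values = [row.get(col) for row in rows]
        present = [v for v in values if v not in (None, "")]
        counts = Counter(present)
        report[col] = {
            "nulls": len(values) - len(present),
            "distinct": len(counts),
            "uniqueness": round(len(counts) / len(present), 2) if present else 0.0,
            "top": counts.most_common(2),
        }
    return report

rows = [
    {"id": 1, "state": "FL", "phone": "407-555-0100"},
    {"id": 2, "state": "fl", "phone": ""},
    {"id": 3, "state": "FL", "phone": "4075550102"},
    {"id": 3, "state": None, "phone": "407-555-0103"},
]
for col, stats in profile(rows).items():
    print(col, stats)
```

On this sample the profile surfaces a duplicated `id` (uniqueness 0.75 for what should be a key), mixed-case state codes, a missing state, a blank phone, and inconsistent phone formats: the kinds of issues cleansing would then address.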
Data Quality
A multi-dimensional measure of data's fitness for use, assessed against accuracy, completeness, consistency, timeliness, validity, and uniqueness. Poor data quality leads to flawed decisions, failed migrations, and compliance violations. Ongoing quality management involves profiling, cleansing, monitoring, and remediation.
See also: Data Cleansing · Validation Rules · Audit-Ready Outputs ▶ IRI Product(s): CoSort, Voracity
Data Subsetting
The extraction of a representative, referentially intact portion of a production database for use in development or testing environments. Subsetting substantially reduces environment size and provisioning time while ensuring that tests exercise realistic data distributions and relationships.
See also: Test Data Generation · CI/CD Data Refresh · Data Masking ▶ IRI Product(s): RowGen, FieldShield, Voracity
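A minimal Python sketch of referentially intact subsetting on two hypothetical in-memory tables: select parent rows by a criterion (here, customers in one region, an assumed example), then keep only the child rows whose foreign keys point at the selected parents, so no order is orphaned.

```python
customers = [
    {"id": 1, "region": "EU"}, {"id": 2, "region": "US"},
    {"id": 3, "region": "EU"}, {"id": 4, "region": "APAC"},
]
orders = [
    {"order_id": 10, "customer_id": 1}, {"order_id": 11, "customer_id": 2},
    {"order_id": 12, "customer_id": 3}, {"order_id": 13, "customer_id": 3},
    {"order_id": 14, "customer_id": 4},
]

def subset(parents, children, predicate, fk="customer_id", pk="id"):
    """Select parents by predicate, then follow the foreign key to keep only matching children."""
    kept_parents = [p for p in parents if predicate(p)]
    kept_keys = {p[pk] for p in kept_parents}
    kept_children = [c for c in children if c[fk] in kept_keys]
    return kept_parents, kept_children

eu_customers, eu_orders = subset(customers, orders, lambda c: c["region"] == "EU")
print([c["id"] for c in eu_customers])     # [1, 3]
print([o["order_id"] for o in eu_orders])  # [10, 12, 13]
```

Real schemas usually need this traversal repeated across many related tables (and sometimes upward, to pull in parents that selected children require).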
Data Transformation
The conversion of data from one format, structure, or schema to another. Transformations include filtering, sorting, aggregating, joining, splitting, normalizing, or enriching records. Transformation is a core step in ETL and ELT pipelines, ensuring data is fit for purpose in the target system.
See also: ETL · ELT · CoSort · SortCL ▶ IRI Product(s): CoSort, Voracity
De-identification
The broad process of removing or obscuring personally identifiable information (PII) in datasets so that individuals cannot be identified. De-identification is an umbrella term encompassing anonymization, pseudonymization, masking, redaction, obfuscation, and tokenization. HIPAA's Safe Harbor method defines specific criteria (removal of 18 identifiers) for sufficient de-identification.
See also: Anonymization · Pseudonymization · Data Masking · PII ▶ IRI Product(s): FieldShield, DarkShield, CellShield, Voracity
DevSecOps
A cultural and technical practice that integrates security controls into DevOps pipelines from the earliest development stages. In a data context, DevSecOps includes automated masking in CI/CD, test data governance policies, and continuous compliance validation, so that sensitive data never reaches non-production environments.
See also: CI/CD Data Refresh · Data Masking · Test Data Governance ▶ IRI Product(s): RowGen, FieldShield, DarkShield
Dynamic Data Masking (DDM)
A masking technique that intercepts queries in real time and returns masked values to unauthorized users while leaving the underlying data unchanged. DDM is typically role-based and requires no duplicate copy of the database, making it suitable for live systems where different users need different levels of data visibility.
See also: Static Data Masking · Data Masking · FieldShield ▶ IRI Product(s): FieldShield, DarkShield
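In practice DDM is enforced by a proxy or by the database engine at query time. The following minimal Python sketch only imitates the idea: the stored row is never modified, and a hypothetical role-to-visibility policy decides what each caller sees on read. The role names and masking rules are assumptions.

```python
STORED = {"name": "Ann Lee", "ssn": "123-45-6789", "email": "ann.lee@example.com"}

# Hypothetical policy: fields each role may see in the clear.
CLEAR_FIELDS = {
    "dba": {"name", "ssn", "email"},
    "support": {"name", "email"},
    "analyst": set(),
}

def _mask(field: str, value: str) -> str:
    if field == "ssn":
        return "***-**-" + value[-4:]          # partial mask: last four digits only
    if field == "email":
        user, _, domain = value.partition("@")
        return user[0] + "***@" + domain
    return "****"

def read(row: dict, role: str) -> dict:
    """Return a masked view for this role; the stored row itself is left unchanged."""
    visible = CLEAR_FIELDS.get(role, set())     # unknown roles see nothing in the clear
    return {f: (v if f in visible else _mask(f, v)) for f, v in row.items()}

print(read(STORED, "support"))
# {'name': 'Ann Lee', 'ssn': '***-**-6789', 'email': 'ann.lee@example.com'}
print(read(STORED, "analyst"))
# {'name': '****', 'ssn': '***-**-6789', 'email': 'a***@example.com'}
```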
E
3 terms
ELT (Extract, Load, Transform)
A modern data integration pattern in which raw data is extracted from sources and loaded into a target (typically a cloud data warehouse) before transformations are applied. ELT leverages the compute power of the target platform and is popular with columnar cloud stores such as Snowflake, BigQuery, and Redshift.
See also: ETL · Data Pipeline · Data Integration ▶ IRI Product: Voracity
Encryption
The encoding of data with a cryptographic algorithm so that only authorized parties holding the correct decryption key can read it. Encryption protects data at rest (stored files, databases) and in transit (network communication). Unlike masked data, encrypted data can be fully restored (decrypted) with the key.
See also: Tokenization · Data Masking · FieldShield ▶ IRI Product(s): FieldShield, DarkShield, CellShield
ETL (Extract, Transform, Load)
A foundational data integration pattern in which data is extracted from source systems, transformed (cleansed, reformatted, enriched) in a staging area, and then loaded into a target system such as a data warehouse. ETL remains prevalent in mainframe and on-premises environments where transformation logic must be centralized.
See also: ELT · Data Pipeline · Data Integration · Batch Processing ▶ IRI Product: Voracity
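A minimal Python sketch of the three ETL stages using only the standard library: extract rows from CSV text, transform them in memory (the staging step), and load them into an in-memory SQLite table standing in for a warehouse. The column names and transformation rules are hypothetical.

```python
import csv
import io
import sqlite3

SOURCE = """order_id,amount,currency
1, 19.99 ,usd
2,5.00,USD
3,,usd
"""

def extract(text: str) -> list[dict]:
    return list(csv.DictReader(io.StringIO(text)))

def transform(rows: list[dict]) -> list[tuple]:
    out = []
    for r in rows:
        amount = r["amount"].strip()
        if not amount:                       # reject rows missing a required value
            continue
        out.append((int(r["order_id"]), round(float(amount), 2), r["currency"].strip().upper()))
    return out

def load(rows: list[tuple], conn: sqlite3.Connection) -> None:
    conn.execute("CREATE TABLE IF NOT EXISTS orders "
                 "(order_id INTEGER PRIMARY KEY, amount REAL, currency TEXT)")
    conn.executemany("INSERT INTO orders VALUES (?, ?, ?)", rows)
    conn.commit()

conn = sqlite3.connect(":memory:")
load(transform(extract(SOURCE)), conn)
print(conn.execute("SELECT * FROM orders ORDER BY order_id").fetchall())
# [(1, 19.99, 'USD'), (2, 5.0, 'USD')]
```

In the ELT variant, `load` would run before `transform`, and the transformation would typically be SQL executed inside the target.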
F
2 terms
FieldShield
IRI's data masking and protection product supporting static masking, dynamic data masking, encryption, tokenization, and pseudonymization across structured, semi-structured, and unstructured data sources. It integrates with IRI Voracity and the IRI Workbench IDE for end-to-end data protection workflows.
See also: Data Masking · Tokenization · Pseudonymization · Encryption ▶ IRI Product: FieldShield
Format-Preserving Encryption (FPE)
An encryption method that produces ciphertext of the same format and length as the plaintext (e.g., a 16-digit card number encrypts to another 16-digit number). FPE allows encrypted data to pass format validation checks without changes to downstream applications, making it well suited to payment and identity data.
See also: Encryption · Tokenization · PCI-DSS ▶ IRI Product(s): FieldShield, DarkShield, CellShield
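To make the idea concrete, here is a toy Python sketch of a keyed, reversible permutation on 16-digit strings, built as a balanced Feistel network with HMAC-SHA256 round functions and modular addition. This is a teaching sketch only: it is not NIST SP 800-38G FF1 or FF3-1 (the standardized FPE modes), it has not been analyzed for security, and it must not protect real card data. The key and round count are assumptions.

```python
import hashlib
import hmac

KEY = b"toy-fpe-key"   # hypothetical key
ROUNDS = 10            # assumed round count for this sketch
HALF = 8               # a 16-digit string is split into two 8-digit halves
MOD = 10 ** HALF

def _f(round_no: int, half: int) -> int:
    """Round function: HMAC of (round number, half), reduced to an 8-digit integer."""
    msg = f"{round_no}:{half:0{HALF}d}".encode()
    return int.from_bytes(hmac.new(KEY, msg, hashlib.sha256).digest(), "big") % MOD

def _split(digits: str) -> tuple[int, int]:
    if len(digits) != 2 * HALF or not digits.isdigit():
        raise ValueError("expected a 16-digit string")
    return int(digits[:HALF]), int(digits[HALF:])

def _join(left: int, right: int) -> str:
    return f"{left:0{HALF}d}{right:0{HALF}d}"   # zero-pad so the output keeps 16 digits

def encrypt(digits: str) -> str:
    left, right = _split(digits)
    for i in range(ROUNDS):
        left, right = right, (left + _f(i, right)) % MOD
    return _join(left, right)

def decrypt(digits: str) -> str:
    left, right = _split(digits)
    for i in reversed(range(ROUNDS)):           # undo the rounds in reverse order
        left, right = (right - _f(i, left)) % MOD, left
    return _join(left, right)

pan = "4111111111111111"   # a well-known test card number, not a real account
ct = encrypt(pan)
print(len(ct), ct.isdigit())   # 16 True
print(decrypt(ct) == pan)      # True
```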
G
1 term
GDPR (General Data Protection Regulation)
The EU's data protection regulation (applicable since May 2018) governing how organizations collect, process, store, and transfer the personal data of individuals in the EU. GDPR grants individuals rights including access, rectification, erasure (the "right to be forgotten"), and data portability. The most serious infringements carry fines of up to €20 million or 4% of worldwide annual turnover, whichever is higher.
See also: PII · Anonymization · Data Masking · Compliance ▶ IRI Product(s): FieldShield, DarkShield
H
1 term
HIPAA (Health Insurance Portability and Accountability Act)
US federal legislation mandating privacy and security standards for Protected Health Information (PHI). HIPAA's Safe Harbor method defines 18 identifiers that must be removed or masked before health data can be considered de-identified. Covered entities and business associates face civil and criminal penalties for violations.
See also: PHI · De-identification · Compliance · Data Masking ▶ IRI Product(s): FieldShield, DarkShield
I
1 term
IRI Voracity
IRI's all-in-one data management platform combining ETL, data quality, data masking, analytics, and migration capabilities in a single licensed environment. Built on the CoSort engine, Voracity reduces tool sprawl and total cost of ownership for enterprise data teams managing complex, mixed-environment data landscapes.
See also: CoSort · FieldShield · RowGen · ETL ▶ IRI Product: Voracity
M
1 term
Mainframe-to-Cloud Migration
The process of moving data, workloads, or applications from legacy mainframe systems (e.g., IBM z/OS) to cloud platforms (AWS, Azure, GCP) or hybrid environments. Key challenges include EBCDIC-to-Unicode conversion, copybook parsing, referential integrity preservation, and minimizing downtime during cut-over.
See also: Data Migration · ETL · Data Integration ▶ IRI Product(s): NextForm, CoSort, Voracity
P
4 terms
PCI-DSS (Payment Card Industry Data Security Standard)
A set of security standards established by the PCI Security Standards Council to protect cardholder data. Organizations that store, process, or transmit payment card data must comply. Requirements include rendering stored Primary Account Numbers (PANs) unreadable (for example, through strong cryptography, tokenization, truncation, or hashing) and enforcing strict access controls.
See also: Tokenization · Encryption · Compliance · FPE ▶ IRI Product(s): FieldShield, DarkShield, CellShield
PHI (Protected Health Information)
Any individually identifiable health information held or transmitted by a HIPAA-covered entity or business associate. PHI includes demographic details, diagnoses, treatment records, and payment information when linked to a specific individual. Unless another HIPAA pathway (such as patient authorization) applies, PHI must be de-identified before use in research, analytics, or test environments.
See also: HIPAA · De-identification · PII ▶ IRI Product(s): FieldShield, DarkShield, CellShield
PII (Personally Identifiable Information)
Any data that can be used, alone or in combination with other data, to identify a specific individual. Examples include full name, national ID number, email address, IP address, biometric records, and location data. PII is subject to protection under regulations such as GDPR, CCPA/CPRA, and HIPAA.
See also: GDPR · HIPAA · Data Masking · De-identification ▶ IRI Product(s): FieldShield, DarkShield, CellShield
Pseudonymization
The processing of personal data so that it can no longer be attributed to a specific individual without additional information, such as a separately held key or mapping table. Unlike anonymized data, pseudonymized data is still personal data under the GDPR, but pseudonymization is recognized as a risk-reducing safeguard that can ease certain obligations (see, e.g., GDPR Articles 25 and 32). Typical techniques include consistent substitution and keyed hashing.
See also: Anonymization · Data Masking · GDPR · Tokenization ▶ IRI Product(s): FieldShield, DarkShield, CellShield
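A minimal Python sketch of pseudonymization by keyed hashing (HMAC-SHA256). The same identifier always maps to the same pseudonym, so joins and counts still work, but the mapping cannot be recomputed without the key. A reverse lookup table, held separately, allows authorized re-identification, which is what distinguishes this from anonymization. The class name, `P-` prefix, and 16-character truncation are assumptions; truncation trades some collision resistance for shorter values.

```python
import hashlib
import hmac
from typing import Optional

class Pseudonymizer:
    """Keyed, consistent pseudonyms plus a separately held re-identification table."""

    def __init__(self, key: bytes):
        self._key = key                      # store apart from the pseudonymized data
        self._reverse: dict[str, str] = {}   # pseudonym -> original (also kept separately)

    def pseudonymize(self, identifier: str) -> str:
        mac = hmac.new(self._key, identifier.encode("utf-8"), hashlib.sha256).hexdigest()
        pseudonym = "P-" + mac[:16]
        self._reverse[pseudonym] = identifier
        return pseudonym

    def reidentify(self, pseudonym: str) -> Optional[str]:
        """Only a holder of the additional information can reverse the mapping."""
        return self._reverse.get(pseudonym)

p = Pseudonymizer(key=b"hypothetical-secret")
a1 = p.pseudonymize("jane.doe@example.com")
a2 = p.pseudonymize("jane.doe@example.com")
print(a1 == a2)            # True: consistent substitution
print(p.reidentify(a1))    # jane.doe@example.com
```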
R
3 terms
Reconciliation
The process of comparing two sets of records to ensure they are consistent and accurate. In data quality, reconciliation verifies that source and target datasets match after a migration, transformation, or integration job. Discrepancies are flagged for investigation and remediation before data is promoted to production.
See also: Data Quality · Audit Trail · Validation Rules ▶ IRI Product(s): CoSort, Voracity
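A minimal Python sketch of key-based reconciliation between a hypothetical source and target, classifying each discrepancy as missing, unexpected, or mismatched. It assumes the key is unique in both datasets; large jobs often compare control totals or row checksums first and drill down only where those differ.

```python
def reconcile(source: list[dict], target: list[dict], key: str) -> dict[str, list]:
    """Compare two record sets by key and classify every discrepancy."""
    src = {r[key]: r for r in source}
    tgt = {r[key]: r for r in target}
    return {
        "missing_in_target": sorted(src.keys() - tgt.keys()),
        "unexpected_in_target": sorted(tgt.keys() - src.keys()),
        "mismatched": sorted(k for k in src.keys() & tgt.keys() if src[k] != tgt[k]),
    }

source = [{"id": 1, "amt": 100}, {"id": 2, "amt": 250}, {"id": 3, "amt": 75}]
target = [{"id": 1, "amt": 100}, {"id": 2, "amt": 205}, {"id": 4, "amt": 10}]
report = reconcile(source, target, key="id")
print(report)
# {'missing_in_target': [3], 'unexpected_in_target': [4], 'mismatched': [2]}
print("OK to promote" if not any(report.values()) else "Investigate before promoting")
```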
Referential Integrity
A constraint ensuring that relationships between tables in a database (and values across applications) remain consistent. Foreign key values in a child table must correspond to valid primary key values in the parent table. Maintaining referential integrity is critical during data masking, subsetting, synthesis, and migration to prevent orphaned records and application errors.
See also: Data Quality · Data Subsetting · Data Migration ▶ IRI Product(s): FieldShield, DarkShield, RowGen, Voracity
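A minimal Python sketch using the standard-library `sqlite3` module. SQLite enforces foreign keys only after `PRAGMA foreign_keys = ON`; with enforcement on, an insert that references a missing parent is rejected. A `LEFT JOIN` query finds orphans that slipped in while enforcement was off. The table and column names are hypothetical, and the connection uses autocommit (`isolation_level=None`) because the pragma has no effect inside an open transaction.

```python
import sqlite3

conn = sqlite3.connect(":memory:", isolation_level=None)   # autocommit, so PRAGMAs take effect
conn.execute("PRAGMA foreign_keys = ON")                   # SQLite leaves FK enforcement off by default
conn.execute("CREATE TABLE customers (id INTEGER PRIMARY KEY, name TEXT)")
conn.execute("CREATE TABLE orders (id INTEGER PRIMARY KEY, "
             "customer_id INTEGER REFERENCES customers(id))")
conn.execute("INSERT INTO customers VALUES (1, 'Ann')")
conn.execute("INSERT INTO orders VALUES (10, 1)")          # valid: parent 1 exists

try:
    conn.execute("INSERT INTO orders VALUES (11, 99)")     # parent 99 does not exist
except sqlite3.IntegrityError as exc:
    print("rejected:", exc)

# Rows loaded with enforcement off (e.g., by a careless copy job) can still be orphans.
conn.execute("PRAGMA foreign_keys = OFF")
conn.execute("INSERT INTO orders VALUES (12, 42)")
orphans = conn.execute(
    "SELECT o.id, o.customer_id FROM orders o "
    "LEFT JOIN customers c ON c.id = o.customer_id WHERE c.id IS NULL"
).fetchall()
print(orphans)   # [(12, 42)]
```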
RowGen
IRI's test data generation and population product that creates syntactically and semantically valid synthetic data for files, tables, and reports used in development, QA, and performance testing. RowGen generates data according to user-defined rules, statistical distributions, and referential constraints, enabling thorough testing without exposing production PII.
See also: Synthetic Data · Data Subsetting · CI/CD Data Refresh ▶ IRI Product: RowGen
S
4 terms
SortCL
IRI's command-line scripting language embedded in CoSort and Voracity for defining data transformation, sorting, aggregation, and reporting jobs. SortCL scripts describe input/output layouts, field-level transformations, and filtering conditions, enabling high-speed data processing across flat files, databases, and mainframe datasets.
See also: CoSort · Data Transformation · ETL ▶ IRI Product(s): CoSort, Voracity, FieldShield, RowGen, NextForm
Static Data Masking (SDM)
A masking approach in which sensitive data in a copy of the production database is permanently replaced before the copy is distributed to non-production environments. SDM is a one-time, offline operation that creates a stable, safe dataset for development or testing. Contrast with Dynamic Data Masking (DDM).
See also: Dynamic Data Masking · Data Masking · FieldShield ▶ IRI Product(s): FieldShield, DarkShield, CellShield
Streaming
A data processing model in which records are ingested and processed continuously as they arrive, rather than in bulk batches. Streaming enables low-latency use cases such as fraud detection, IoT telemetry, and live dashboards. Common building blocks include Apache Kafka (event transport) and Apache Flink (stream processing).
See also: Batch Processing · ETL · Data Pipeline ▶ IRI Product: Voracity
Synthetic Data
Artificially generated data that statistically mimics the properties of real data without containing any actual personal or sensitive information. Synthetic data is increasingly used in AI/ML model training, software testing, and demo environments as a privacy-safe alternative to masked or sampled production data.
See also: RowGen · Test Data Generation · Data Masking ▶ IRI Product: RowGen
T
3 terms
Test Data Generation
The automated or manual creation of datasets specifically designed for software testing. Effective test data generation covers positive cases, negative cases, edge cases, and high-volume scenarios. Automated tools like RowGen generate referentially consistent, rule-bound test data at scale, reducing QA cycle times.
See also: RowGen · Synthetic Data · Data Subsetting ▶ IRI Product: RowGen
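A minimal Python sketch of rule-based synthetic test data: a seeded random generator produces customers and orders from simple value rules, every order references a generated customer (so referential integrity holds), and two deliberate edge cases are appended. All rules, field names, and edge cases are hypothetical; this does not reflect RowGen's syntax or behavior.

```python
import random
from datetime import date, timedelta

def generate(n_customers: int, n_orders: int, seed: int = 42):
    rng = random.Random(seed)                  # seeded, so every test run gets identical data
    first = ["Avery", "Blake", "Carmen", "Dev", "Emil"]
    last = ["Nguyen", "Okafor", "Silva", "Weber", "Young"]
    customers = [
        {"id": i, "name": f"{rng.choice(first)} {rng.choice(last)}",
         "tier": rng.choices(["basic", "gold", "platinum"], weights=[70, 25, 5])[0]}
        for i in range(1, n_customers + 1)
    ]
    start = date(2024, 1, 1)
    orders = [
        {"id": 1000 + j,
         "customer_id": rng.randint(1, n_customers),       # always an existing parent key
         "amount": round(rng.uniform(1, 500), 2),          # rule: 1.00 to 500.00
         "placed": start + timedelta(days=rng.randrange(366))}
        for j in range(n_orders)
    ]
    # Deliberate edge cases: an out-of-range zero amount (negative test) and the maximum amount.
    orders.append({"id": 1000 + n_orders, "customer_id": 1, "amount": 0.0, "placed": start})
    orders.append({"id": 1001 + n_orders, "customer_id": n_customers, "amount": 500.0,
                   "placed": date(2024, 12, 31)})
    return customers, orders

customers, orders = generate(n_customers=5, n_orders=20)
ids = {c["id"] for c in customers}
print(len(customers), len(orders), all(o["customer_id"] in ids for o in orders))  # 5 22 True
```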
Test Data Governance
The policies, processes, and controls that manage how test data is created, stored, distributed, and retired. Governance ensures that test environments never expose production PII, that data is refreshed consistently across teams, and that compliance obligations are met throughout the software development lifecycle.
See also: DevSecOps · CI/CD Data Refresh · Data Masking ▶ IRI Product(s): RowGen, FieldShield, DarkShield, Voracity
Tokenization
The replacement of a sensitive data value (e.g., a credit card number) with a non-sensitive placeholder called a token. Tokens have no exploitable value outside the tokenization system, so they can be stored and processed with far fewer restrictions than the original data. In vault-based tokenization, the original value is retrieved from a secure token vault. Tokenization is widely used in payment processing to reduce PCI-DSS scope.
See also: Data Masking · Encryption · PCI-DSS · FPE ▶ IRI Product: FieldShield
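A minimal Python sketch of vault-based tokenization. Tokens are random, so they have no mathematical relationship to the original value; the vault maps tokens back to values, and a repeated value reuses its existing token. The in-memory dictionaries stand in for a hardened, access-controlled vault, and the `tok_` prefix is an assumption. Vaultless schemes exist but are not shown.

```python
import secrets

class TokenVault:
    """In-memory stand-in for a secure token vault (illustration only)."""

    def __init__(self) -> None:
        self._by_token: dict[str, str] = {}
        self._by_value: dict[str, str] = {}

    def tokenize(self, value: str) -> str:
        if value in self._by_value:                  # same value -> same token
            return self._by_value[value]
        token = "tok_" + secrets.token_hex(8)        # random, not derived from the value
        while token in self._by_token:               # guard against an unlikely collision
            token = "tok_" + secrets.token_hex(8)
        self._by_token[token] = value
        self._by_value[value] = token
        return token

    def detokenize(self, token: str) -> str:
        """In a real system, restricted to authorized callers."""
        return self._by_token[token]

vault = TokenVault()
t = vault.tokenize("4111111111111111")
print(t.startswith("tok_"), vault.tokenize("4111111111111111") == t)   # True True
print(vault.detokenize(t))                                              # 4111111111111111
```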
V
1 term
Validation Rules
Predefined constraints applied to data to verify that it meets required standards before processing or persistence. Examples include range checks, format checks, referential checks, and business rules. Records that fail validation are rejected or quarantined for review.
See also: Data Quality · Data Cleansing · Reconciliation ▶ IRI Product(s): CoSort, Voracity
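A minimal Python sketch with one rule of each kind listed above (range, format, referential, and business rule), routing each record to an accepted list or a quarantine list that records which rules failed. The rule names, deliberately simple email pattern, and country reference set are hypothetical.

```python
import re
from datetime import date

EMAIL = re.compile(r"^[^@\s]+@[^@\s]+\.[^@\s]+$")   # deliberately simple format check
KNOWN_COUNTRIES = {"US", "GB", "DE"}                 # stand-in reference data

RULES = [
    ("age in range 0-120", lambda r: isinstance(r.get("age"), int) and 0 <= r["age"] <= 120),
    ("email format", lambda r: bool(EMAIL.match(r.get("email") or ""))),
    ("country in reference list", lambda r: r.get("country") in KNOWN_COUNTRIES),
    ("end on or after start", lambda r: r["end"] >= r["start"]),   # business rule
]

def validate(rows: list[dict]) -> tuple[list[dict], list[dict]]:
    accepted, quarantined = [], []
    for row in rows:
        failures = [name for name, check in RULES if not check(row)]
        if failures:
            quarantined.append({"row": row, "failed": failures})
        else:
            accepted.append(row)
    return accepted, quarantined

rows = [
    {"age": 34, "email": "a@example.com", "country": "US",
     "start": date(2024, 1, 1), "end": date(2024, 2, 1)},
    {"age": 150, "email": "not-an-email", "country": "FR",
     "start": date(2024, 3, 1), "end": date(2024, 2, 1)},
]
accepted, quarantined = validate(rows)
print(len(accepted))              # 1
print(quarantined[0]["failed"])
# ['age in range 0-120', 'email format', 'country in reference list', 'end on or after start']
```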

