Data Education Center

 



Glossary

A structured reference for data professionals covering data protection, quality, integration, and test data management, aligned to IRI's Content Hub and product suite.

Key

 

Data Protection & Privacy · Data Quality & Control · Data Integration & Modernization · Test Data & QA

 

A

2 terms
 
Anonymization | DATA PROTECTION & PRIVACY
 

The irreversible process of altering data so individuals cannot be identified, directly or indirectly. Unlike pseudonymization, properly anonymized data falls outside GDPR scope because individuals are no longer identifiable. Techniques include generalization, suppression, and noise addition.

See also: Pseudonymization · Data Masking · GDPR     ▶ IRI Product(s): FieldShield, Voracity
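
A minimal Python sketch of the three techniques named above (generalization, suppression, noise addition). The record layout, field names, bucket width, and noise range are hypothetical, and the code does not represent FieldShield or Voracity syntax. Whether a given output is truly anonymous depends on a re-identification risk assessment that this sketch does not perform.

```python
import random

def anonymize(record, rng, age_bucket=10, noise=500):
    """Return a new record with identifiers suppressed and quasi-identifiers coarsened."""
    low = (record["age"] // age_bucket) * age_bucket
    return {
        "name": None,                                             # suppression
        "age_range": f"{low}-{low + age_bucket - 1}",             # generalization
        "zip3": record["zip"][:3] + "**",                         # generalization
        "salary": record["salary"] + rng.randint(-noise, noise),  # noise addition
    }

rng = random.Random(42)  # seeded only so the example is repeatable
row = {"name": "Ada Smith", "age": 37, "zip": "32901", "salary": 72000}
safe = anonymize(row, rng)
```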

Audit Trail | DATA QUALITY & CONTROL
 

A chronological record capturing who accessed or modified data, when, and what changed. Audit trails are mandatory in regulated industries (HIPAA, SOX, PCI-DSS) and support forensic investigation, anomaly detection, and compliance demonstration.

See also: Audit-Ready Outputs · Reconciliation · Data Lineage     ▶ IRI Product(s): Voracity, CoSort, FieldShield, DarkShield, RowGen, NextForm, Ripcurrent
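
As an illustration of the "who, when, what changed" record described above, the following Python sketch appends entries to an in-memory trail. The hash chaining that links each entry to the previous one is an added assumption (one common way to make tampering detectable), not something the definition requires, and the function names are hypothetical.

```python
import hashlib
import json
from datetime import datetime, timezone

def _digest(body):
    # Hash a canonical JSON form so the same content always yields the same hash.
    return hashlib.sha256(json.dumps(body, sort_keys=True).encode("utf-8")).hexdigest()

def append_entry(trail, user, action, before, after):
    """Record who changed what and when, linked to the previous entry's hash."""
    entry = {
        "who": user,
        "when": datetime.now(timezone.utc).isoformat(),
        "action": action,
        "before": before,
        "after": after,
        "prev_hash": trail[-1]["hash"] if trail else "",
    }
    entry["hash"] = _digest(entry)
    trail.append(entry)
    return entry

def verify(trail):
    """Return True if no entry was altered, removed, or reordered."""
    prev = ""
    for entry in trail:
        body = {k: v for k, v in entry.items() if k != "hash"}
        if entry["prev_hash"] != prev or _digest(body) != entry["hash"]:
            return False
        prev = entry["hash"]
    return True
```

A production audit trail would also need durable, access-controlled storage; an in-memory list is used here only to keep the sketch self-contained.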

B

1 term
 
Batch Processing | DATA INTEGRATION & MODERNIZATION
 

Execution of data jobs collected over a period and processed together without manual intervention. Batch processing suits high-volume, non-time-critical workloads such as end-of-day reconciliation, report generation, or nightly ETL loads. Contrast with streaming/real-time processing.

See also: ETL · Streaming · CI/CD Data Refresh     ▶ IRI Product(s): CoSort / SortCL

C

3 terms
 
CI/CD Data Refresh | TEST DATA & QA
 

The automatic refreshing of test datasets as part of a Continuous Integration / Continuous Delivery pipeline. Ensures tests run against representative, up-to-date data without manual intervention, reducing flakiness and improving release confidence.

See also: DevSecOps · Data Subsetting · Test Data Generation     ▶ IRI Product(s): RowGen, FieldShield, DarkShield, Voracity

Compliance | DATA PROTECTION & PRIVACY
 

Adherence to laws, regulations, standards, or internal policies governing how data is collected, stored, processed, and shared. Key frameworks include GDPR (EU), HIPAA (US healthcare), PCI-DSS (payment cards), and CPRA (California). Non-compliance can result in significant fines and reputational damage.

See also: GDPR · HIPAA · PCI-DSS · Data Masking     ▶ IRI Product(s): FieldShield, DarkShield

IRI CoSort | DATA QUALITY & CONTROL
 

IRI's high-performance data transformation engine for sorting, aggregating, joining, and reformatting large datasets at the command line or within graphical workflows. CoSort uses SortCL and underpins structured data manipulations in the IRI Voracity platform, enabling fast ETL, wrangling, reporting, and data quality operations without database overhead.

See also: SortCL · ETL · Data Transformation     ▶ IRI Product(s): CoSort, Voracity

D

13 terms
 
Data Cleansing | DATA QUALITY & CONTROL
 

The detection and correction (or removal) of corrupt, inaccurate, incomplete, or duplicate records from a dataset. Common tasks include standardizing formats, filling nulls, deduplicating records, and validating against reference data. A prerequisite for reliable analytics and reporting.

See also: Data Quality · Validation Rules · Reconciliation     ▶ IRI Product(s): CoSort, Voracity
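
The common tasks listed above can be sketched in a few lines of Python. The field names, the choice of email as the deduplication key, and the default country are hypothetical; real cleansing rules depend on the dataset and its reference data.

```python
def cleanse(rows, default_country="US"):
    """Standardize formats, fill nulls, drop unusable rows, and deduplicate by email."""
    seen, clean = set(), []
    for row in rows:
        email = (row.get("email") or "").strip().lower()   # standardize format
        if "@" not in email:
            continue                                       # remove unusable record
        if email in seen:
            continue                                       # deduplicate
        seen.add(email)
        clean.append({
            "email": email,
            "name": " ".join((row.get("name") or "").split()).title(),
            "country": row.get("country") or default_country,  # fill null
        })
    return clean
```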

Data Integration | DATA INTEGRATION & MODERNIZATION
 

The process of combining data from disparate sources into a unified, consistent view. Integration may be achieved through ETL pipelines, data virtualization, or API-based federation. Essential for enabling enterprise analytics, AI model training, and operational reporting.

See also: ETL · ELT · Data Migration · Data Pipeline     ▶ IRI Product: Voracity

Data Lineage | DATA QUALITY & CONTROL
 

The end-to-end record of data's origins, movements, transformations, and consumption across systems. Lineage is critical for debugging data quality issues, satisfying regulatory audit requirements, and understanding the impact of upstream changes on downstream consumers.

See also: Audit Trail · Data Governance · ETL     ▶ IRI Product: Voracity

Data Masking | DATA PROTECTION & PRIVACY
 

Replacing sensitive data values with fictitious but structurally realistic substitutes. Static masking permanently transforms data, typically in copies destined for non-production environments; dynamic masking applies in real time at query execution. When masking rules are applied consistently, masked data retains referential integrity and format validity, allowing applications and tests to function normally.

See also: Tokenization · Pseudonymization · Anonymization · FieldShield     ▶ IRI Product(s): FieldShield, DarkShield, CellShield, Voracity

Data Migration | DATA INTEGRATION & MODERNIZATION
 

The process of moving data from one storage system, format, or environment to another - e.g., from an on-premises mainframe to a cloud data warehouse. Successful migration requires profiling, cleansing, mapping, validation, and cut-over planning to avoid data loss or corruption.

See also: ETL · Mainframe-to-Cloud · Data Integration     ▶ IRI Product(s): Voracity, NextForm

Data Pipeline | DATA INTEGRATION & MODERNIZATION
 

An automated sequence of steps that ingests raw data from one or more sources, applies transformations, and loads it into a target system. Pipelines may run on a schedule (batch) or continuously (streaming) and typically include error handling, logging, and retry logic.

See also: ETL · ELT · Batch Processing · Streaming     ▶ IRI Product(s): CoSort, Voracity

Data Profiling | DATA QUALITY & CONTROL
 

The examination of an existing dataset to collect statistics about its structure, content, and quality. Profiling reveals nulls, uniqueness rates, value distributions, format inconsistencies, and referential integrity violations - providing the foundation for cleansing and governance efforts.

See also: Data Quality · Data Cleansing · Validation Rules     ▶ IRI Product: Voracity
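
A small Python sketch of the statistics mentioned above (nulls, uniqueness rate, value distribution). The function name and output keys are hypothetical, and treating empty strings as nulls is an assumption that may not suit every dataset.

```python
from collections import Counter

def profile(rows, columns):
    """Collect per-column null counts, distinct counts, uniqueness rate, and top value."""
    stats = {}
    for col in columns:
        values = [r.get(col) for r in rows]
        present = [v for v in values if v not in (None, "")]
        counts = Counter(present)
        stats[col] = {
            "rows": len(values),
            "nulls": len(values) - len(present),
            "distinct": len(counts),
            "unique_rate": len(counts) / len(present) if present else 0.0,
            "top": counts.most_common(1),
        }
    return stats
```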

Data Quality | DATA QUALITY & CONTROL
 

A multi-dimensional measure of data's fitness for use, assessed against accuracy, completeness, consistency, timeliness, validity, and uniqueness. Poor data quality leads to flawed decisions, failed migrations, and compliance violations. Ongoing quality management involves profiling, cleansing, monitoring, and remediation.

See also: Data Cleansing · Validation Rules · Audit-Ready Outputs     ▶ IRI Product(s): CoSort, Voracity

Data Subsetting | TEST DATA & QA
 

Extraction of a representative, referentially intact portion of a production database for use in development or testing environments. Subsetting dramatically reduces environment size and provisioning time while ensuring tests exercise realistic data distributions and relationships.

See also: Test Data Generation · CI/CD Data Refresh · Data Masking     ▶ IRI Product(s): RowGen, FieldShield, Voracity
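
To illustrate what "referentially intact" means in practice, the Python sketch below selects a set of parent rows and then keeps only the child rows that point to them. The two-table layout (customers and orders) and the key names are hypothetical; a real subsetting job would follow every foreign-key path in the schema.

```python
def subset(customers, orders, keep_ids):
    """Keep the chosen customers and only the orders that reference them."""
    kept_customers = [c for c in customers if c["id"] in keep_ids]
    kept_ids = {c["id"] for c in kept_customers}
    kept_orders = [o for o in orders if o["customer_id"] in kept_ids]
    return kept_customers, kept_orders
```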

Data Transformation | DATA INTEGRATION & MODERNIZATION
 

The conversion of data from one format, structure, or schema to another. Transformations include filtering, sorting, aggregating, joining, splitting, normalizing, or enriching records. A core step in ETL and ELT pipelines, ensuring data is fit for purpose in the target system.

See also: ETL · ELT · CoSort · SortCL     ▶ IRI Product(s): CoSort, Voracity

De-identification | DATA PROTECTION & PRIVACY
 

The broad process of removing or obfuscating personally identifiable information (PII) from datasets so individuals cannot be identified. De-identification is a superset term encompassing anonymization, pseudonymization, masking, redaction, obfuscation and tokenization. HIPAA Safe Harbor defines specific criteria for sufficient de-identification.

See also: Anonymization · Pseudonymization · Data Masking · PII     ▶ IRI Product(s): FieldShield, DarkShield, CellShield, Voracity

DevSecOps | TEST DATA & QA
 

A cultural and technical practice that integrates security controls into DevOps pipelines from the earliest development stages. In a data context, DevSecOps includes automated masking in CI/CD, test data governance policies, and continuous compliance validation - ensuring sensitive data never reaches non-production environments.

See also: CI/CD Data Refresh · Data Masking · Test Data Governance     ▶ IRI Product(s): RowGen, FieldShield, DarkShield

Dynamic Data Masking (DDM) | DATA PROTECTION & PRIVACY
 

A masking technique that intercepts queries in real time and returns masked values to unauthorized users while leaving the underlying data unchanged. DDM is role-based and requires no duplication of the database, making it suitable for live systems where different users need different data visibility.

See also: Static Data Masking · Data Masking · FieldShield     ▶ IRI Product(s): FieldShield, DarkShield
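
The role-based behavior described above can be sketched in Python as a read path that masks values for some roles while leaving stored rows untouched. The role names, the `ssn` field, and the masking pattern are hypothetical; real DDM is typically enforced in the database or a proxy layer, not in application code like this.

```python
def query(table, role, unmasked_roles=("auditor",)):
    """Return rows as seen by `role`; the stored rows are never modified."""
    if role in unmasked_roles:
        return [dict(r) for r in table]
    # Everyone else sees only the last four characters of the SSN.
    return [{**r, "ssn": "***-**-" + r["ssn"][-4:]} for r in table]

table = [{"name": "Ada", "ssn": "123-45-6789"}]
analyst_view = query(table, "analyst")
auditor_view = query(table, "auditor")
```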

E

3 terms
 
ELT (Extract, Load, Transform) | DATA INTEGRATION & MODERNIZATION
 

A modern data integration pattern where raw data is extracted from sources and loaded into a target (typically a cloud data warehouse) before transformations are applied. ELT leverages the compute power of the target platform and is popular with columnar cloud stores such as Snowflake, BigQuery, and Redshift.

See also: ETL · Data Pipeline · Data Integration     ▶ IRI Product: Voracity

Encryption | DATA PROTECTION & PRIVACY
 

The encoding of data using a cryptographic algorithm so that only authorized parties with the correct decryption key can read it. Encryption protects data at rest (stored files, databases) and in transit (network communication). Unlike masking, encrypted data can be fully restored with the key.

See also: Tokenization · Data Masking · FieldShield     ▶ IRI Product(s): FieldShield, DarkShield, CellShield

ETL (Extract, Transform, Load) | DATA INTEGRATION & MODERNIZATION
 

A foundational data integration pattern in which data is extracted from source systems, transformed (cleansed, reformatted, enriched) in a staging area, and then loaded into a target system such as a data warehouse. ETL remains prevalent in mainframe and on-premises environments where transformation logic must be centralized.

See also: ELT · Data Pipeline · Data Integration · Batch Processing     ▶ IRI Product: Voracity
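
The three stages can be sketched with the Python standard library alone. The inline CSV text, the `payments` table, and the transformation rules are hypothetical stand-ins for a real source system, staging logic, and warehouse; this is not Voracity or SortCL syntax.

```python
import csv
import io
import sqlite3

RAW = "id,amount,currency\n1, 10.50 ,usd\n2,,usd\n3,7.25,eur\n"

def extract(text):
    """Extract: read source records (here, CSV text) into dictionaries."""
    return list(csv.DictReader(io.StringIO(text)))

def transform(rows):
    """Transform: drop rows with no amount, trim whitespace, normalize types and case."""
    out = []
    for r in rows:
        amount = r["amount"].strip()
        if not amount:
            continue
        out.append((int(r["id"]), float(amount), r["currency"].strip().upper()))
    return out

def load(conn, rows):
    """Load: write the transformed rows into the target table."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS payments "
        "(id INTEGER PRIMARY KEY, amount REAL, currency TEXT)"
    )
    conn.executemany("INSERT INTO payments VALUES (?, ?, ?)", rows)
    conn.commit()

conn = sqlite3.connect(":memory:")
load(conn, transform(extract(RAW)))
```

In an ELT variant, the raw rows would be loaded first and the cleanup would run as SQL inside the target.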

F

2 terms
 
IRI FieldShield | DATA PROTECTION & PRIVACY
 

IRI's data masking and protection product supporting static masking, dynamic data masking, encryption, tokenization, and pseudonymization across structured, semi-structured, and unstructured data sources. Integrates with IRI Voracity and the IRI Workbench IDE for end-to-end data protection workflows.

See also: Data Masking · Tokenization · Pseudonymization · Encryption     ▶ IRI Product: FieldShield

Format-Preserving Encryption (FPE) | DATA PROTECTION & PRIVACY
 

An encryption method that produces ciphertext of the same format and length as the plaintext (e.g., a 16-digit card number encrypts to another 16-digit number). FPE allows encrypted data to pass format validation checks without changes to downstream applications, making it ideal for payment and identity data.

See also: Encryption · Tokenization · PCI-DSS     ▶ IRI Product(s): FieldShield, DarkShield, CellShield

G

1 term
 
GDPR (General Data Protection Regulation) | DATA PROTECTION & PRIVACY
 

The EU's data protection regulation (in force since May 2018) governing how organizations collect, process, store, and transfer personal data of EU residents. GDPR grants individuals rights including access, rectification, erasure (right to be forgotten), and data portability. Non-compliance carries fines of up to €20 million or 4% of global annual turnover, whichever is higher.

See also: PII · Anonymization · Data Masking · Compliance     ▶ IRI Product(s): FieldShield, DarkShield

H

1 term
 
HIPAA (Health Insurance Portability and Accountability Act) | DATA PROTECTION & PRIVACY
 

US federal legislation mandating privacy and security standards for Protected Health Information (PHI). HIPAA's Safe Harbor method defines 18 identifiers that must be removed or masked before health data can be considered de-identified. Covered entities and business associates face civil and criminal penalties for violations.

See also: PHI · De-identification · Compliance · Data Masking     ▶ IRI Product(s): FieldShield, DarkShield

I

1 term
 
IRI Voracity | DATA INTEGRATION & MODERNIZATION
 

IRI's all-in-one data management platform combining ETL, data quality, data masking, analytics, and migration capabilities in a single licensed environment. Built on the CoSort engine, Voracity reduces tool sprawl and total cost of ownership for enterprise data teams managing complex, mixed-environment data landscapes.

See also: CoSort · FieldShield · RowGen · ETL     ▶ IRI Product: Voracity

M

1 term
 
Mainframe-to-Cloud Migration | DATA INTEGRATION & MODERNIZATION
 

The process of moving data, workloads, or applications from legacy mainframe systems (e.g., IBM z/OS) to cloud platforms (AWS, Azure, GCP) or hybrid environments. Key challenges include EBCDIC-to-Unicode conversion, copybook parsing, referential integrity preservation, and minimizing downtime during cut-over.

See also: Data Migration · ETL · Data Integration     ▶ IRI Product(s): NextForm, CoSort, Voracity
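
As a small illustration of EBCDIC-to-Unicode conversion and copybook-style field slicing, the Python sketch below decodes a fixed-width record. The layout is hypothetical, and `cp037` is only one of several EBCDIC code pages; the correct one depends on the source system. Binary and packed-decimal (COMP-3) fields cannot be decoded as text this way and are not handled here.

```python
# Hypothetical fixed-width layout, as a copybook might describe it:
#   NAME  PIC X(5)  -> bytes 0-4
#   CODE  PIC 9(3)  -> bytes 5-7 (unsigned zoned decimal)
LAYOUT = [("name", 0, 5), ("code", 5, 8)]

def parse_record(raw, layout=LAYOUT, codec="cp037"):
    """Decode an EBCDIC record to a Python (Unicode) string and slice it into fields."""
    text = raw.decode(codec)
    return {field: text[start:end].strip() for field, start, end in layout}

record = bytes([0xC8, 0xC5, 0xD3, 0xD3, 0xD6, 0xF1, 0xF2, 0xF3])
parsed = parse_record(record)
utf8_name = parsed["name"].encode("utf-8")  # re-encode for a cloud target
```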

P

4 terms
 
PCI-DSS (Payment Card Industry Data Security Standard) | DATA PROTECTION & PRIVACY
 

A set of security standards established by the PCI Security Standards Council to protect cardholder data. Organizations that store, process, or transmit payment card data must comply, including requirements for data encryption, tokenization of Primary Account Numbers (PANs), and strict access controls.

See also: Tokenization · Encryption · Compliance · FPE     ▶ IRI Product(s): FieldShield, DarkShield, CellShield

PHI (Protected Health Information) | DATA PROTECTION & PRIVACY
 

Any individually identifiable health information held or transmitted by a HIPAA-covered entity or business associate. PHI includes demographic details, diagnoses, treatment records, and payment information when linked to a specific individual. PHI must be de-identified before use in research, analytics, or test environments.

See also: HIPAA · De-identification · PII     ▶ IRI Product(s): FieldShield, DarkShield, CellShield

PII (Personally Identifiable Information) | DATA PROTECTION & PRIVACY
 

Any data that can be used, alone or in combination with other data, to identify a specific individual. Examples include full name, national ID number, email address, IP address, biometric records, and location data. PII is subject to protection under regulations such as GDPR, CCPA/CPRA, and HIPAA.

See also: GDPR · HIPAA · Data Masking · De-identification     ▶ IRI Product(s): FieldShield, DarkShield, CellShield

Pseudonymization | DATA PROTECTION & PRIVACY
 

Processing of personal data so that it can no longer be attributed to a specific individual without use of a separate key or mapping table. Unlike anonymization, pseudonymized data is still considered personal data under GDPR but benefits from reduced regulatory requirements. Typical techniques include consistent substitution and hashing.

See also: Anonymization · Data Masking · GDPR · Tokenization     ▶ IRI Product(s): FieldShield, DarkShield, CellShield
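
A minimal Python sketch of consistent substitution via keyed hashing, using the standard `hmac` module. The `p_` prefix and the 16-character truncation are arbitrary choices for illustration. Here the secret key plays the role of the "separate key" in the definition: without it, the pseudonyms cannot be regenerated or linked back to the inputs by recomputation.

```python
import hashlib
import hmac

def pseudonymize(value, key):
    """Map a value to a stable pseudonym; the same value and key always give the same result."""
    digest = hmac.new(key, value.encode("utf-8"), hashlib.sha256).hexdigest()
    return "p_" + digest[:16]
```

Because the mapping is deterministic, the same customer ID pseudonymized in two tables still joins correctly, which is what keeps the output useful for analytics and testing.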

R

3 terms
 
Reconciliation | DATA QUALITY & CONTROL
 

The process of comparing two sets of records to ensure they are consistent and accurate. In data quality, reconciliation verifies that source and target datasets match after a migration, transformation, or integration job. Discrepancies are flagged for investigation and remediation before data is promoted to production.

See also: Data Quality · Audit Trail · Validation Rules     ▶ IRI Product(s): CoSort, Voracity
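
The source-versus-target comparison can be sketched in Python as a keyed diff. The function name, the `id` key, and the report keys are hypothetical; large datasets would usually be compared with counts and checksums rather than row by row in memory.

```python
def reconcile(source, target, key="id"):
    """Report keys missing from the target, unexpected in the target, or with differing values."""
    src = {r[key]: r for r in source}
    tgt = {r[key]: r for r in target}
    return {
        "missing_in_target": sorted(src.keys() - tgt.keys()),
        "unexpected_in_target": sorted(tgt.keys() - src.keys()),
        "mismatched": sorted(k for k in src.keys() & tgt.keys() if src[k] != tgt[k]),
    }
```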

Referential Integrity | DATA QUALITY & CONTROL
 

A constraint ensuring that relationships between tables in a database (and values across applications) remain consistent. Foreign key values in a child table must correspond to valid primary key values in the parent table. Maintaining referential integrity is critical during data masking, subsetting, synthesis, and migration to prevent orphaned records and application errors.

See also: Data Quality · Data Subsetting · Data Migration     ▶ IRI Product(s): FieldShield, DarkShield, RowGen, Voracity
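
A short Python sketch of the orphan check implied above: any child row whose foreign key has no matching parent primary key violates referential integrity. Table and column names are hypothetical.

```python
def find_orphans(children, parents, fk="customer_id", pk="id"):
    """Return child rows whose foreign key matches no parent primary key."""
    valid = {p[pk] for p in parents}
    return [c for c in children if c[fk] not in valid]
```

Running a check like this after masking, subsetting, or migration is one way to confirm that key values were transformed consistently on both sides of each relationship.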

IRI RowGen | TEST DATA & QA
 

IRI's test data generation and population product that creates syntactically and semantically valid synthetic data for files, tables, and reports in development, QA, and performance testing. RowGen generates data according to user-defined rules, statistical distributions, and referential constraints - enabling thorough testing without exposing production PII.

See also: Synthetic Data · Data Subsetting · CI/CD Data Refresh     ▶ IRI Product: RowGen

S

4 terms
 
SortCL | DATA QUALITY & CONTROL
 

IRI's command-line scripting language embedded in CoSort and Voracity for defining data transformation, sorting, aggregation, and reporting jobs. SortCL scripts describe input/output layouts, field-level transformations, and filtering conditions, enabling high-speed data processing across flat files, databases, and mainframe datasets.

See also: CoSort · Data Transformation · ETL     ▶ IRI Product(s): CoSort, Voracity, FieldShield, RowGen, NextForm

Static Data Masking (SDM) | DATA PROTECTION & PRIVACY
 

A masking approach in which sensitive data in a copy of the production database is permanently replaced before the copy is distributed to non-production environments. SDM is a one-time, offline operation that creates a stable, safe dataset for development or testing. Contrast with Dynamic Data Masking (DDM).

See also: Dynamic Data Masking · Data Masking · FieldShield     ▶ IRI Product(s): FieldShield, DarkShield, CellShield

Streaming (Real-Time) Processing | DATA INTEGRATION & MODERNIZATION
 

A data processing model in which records are ingested and processed continuously as they arrive, rather than in bulk. Streaming enables low-latency use cases such as fraud detection, IoT telemetry, and live dashboards. Technologies like Apache Kafka and Flink are commonly used for stream orchestration.

See also: Batch Processing · ETL · Data Pipeline     ▶ IRI Product: Voracity

Synthetic Data | TEST DATA & QA
 

Artificially generated data that statistically mimics the properties of real data without containing any actual personal or sensitive information. Synthetic data is increasingly used in AI/ML model training, software testing, and demo environments as a privacy-safe alternative to masked or sampled production data.

See also: RowGen · Test Data Generation · Data Masking     ▶ IRI Product: RowGen

T

3 terms
 
Test Data Generation | TEST DATA & QA
 

The automated or manual creation of datasets specifically designed for use in software testing. Effective test data generation covers positive cases, negative cases, edge cases, and high-volume scenarios. Automated tools like RowGen generate referentially consistent, rule-bound test data at scale, reducing QA cycle times.

See also: RowGen · Synthetic Data · Data Subsetting     ▶ IRI Product: RowGen
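
To show what "referentially consistent, rule-bound" generation can look like, here is a generic Python sketch using a seeded random generator. It is not RowGen syntax; the tables, value ranges, and state list are hypothetical. Seeding makes each run reproducible, which helps when a failing test needs to be replayed.

```python
import random

def generate(n_customers, n_orders, seed=0):
    """Generate customers and orders where every order references an existing customer."""
    rng = random.Random(seed)
    customers = [
        {"id": i, "state": rng.choice(["FL", "GA", "NY"])}
        for i in range(1, n_customers + 1)
    ]
    orders = [
        {
            "order_id": j,
            "customer_id": rng.choice(customers)["id"],  # referential constraint
            "amount": round(rng.uniform(1, 500), 2),     # rule: amount within 1..500
        }
        for j in range(1, n_orders + 1)
    ]
    return customers, orders
```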

Test Data Governance | TEST DATA & QA
 

The policies, processes, and controls that manage how test data is created, stored, distributed, and retired. Governance ensures that test environments never expose production PII, that data is refreshed consistently across teams, and that compliance obligations are met throughout the software development lifecycle.

See also: DevSecOps · CI/CD Data Refresh · Data Masking     ▶ IRI Product(s): RowGen, FieldShield, DarkShield, Voracity

Tokenization | DATA PROTECTION & PRIVACY
 

The replacement of a sensitive data value (e.g., a credit card number) with a non-sensitive placeholder called a token. Tokens have no exploitable value outside the tokenization system and can be stored and processed freely. The original value can be retrieved via a secure token vault. Widely used in payment processing to reduce PCI-DSS scope.

See also: Data Masking · Encryption · PCI-DSS · FPE     ▶ IRI Product: FieldShield
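
A toy Python sketch of the vault model described above: tokens are random, so they carry no information about the original value, and recovery is possible only through the vault's lookup table. The class name and `tok_` prefix are hypothetical, and an in-memory dictionary stands in for what would need to be a secured, access-controlled store.

```python
import secrets

class TokenVault:
    """In-memory stand-in for a secure token vault."""

    def __init__(self):
        self._by_token = {}
        self._by_value = {}

    def tokenize(self, value):
        """Return the existing token for a value, or issue a new random one."""
        if value in self._by_value:
            return self._by_value[value]
        token = "tok_" + secrets.token_hex(8)
        while token in self._by_token:  # regenerate on the unlikely collision
            token = "tok_" + secrets.token_hex(8)
        self._by_token[token] = value
        self._by_value[value] = token
        return token

    def detokenize(self, token):
        """Look up the original value; raises KeyError for unknown tokens."""
        return self._by_token[token]
```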

V

1 term
 
Validation Rules | DATA QUALITY & CONTROL
 

Predefined constraints applied to data to verify it meets required standards before processing or persistence. Examples include range checks, format checks, referential checks, and business rules. Failed validations trigger rejections or quarantine.

See also: Data Quality · Data Cleansing · Reconciliation     ▶ IRI Product(s): CoSort, Voracity
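
The reject-or-quarantine flow can be sketched in Python as a set of named checks applied to each row. The two rules shown (a range check and a deliberately simple format check) and the field names are hypothetical examples, not a complete rule set.

```python
import re

RULES = {
    "age in 0-120": lambda r: isinstance(r.get("age"), int) and 0 <= r["age"] <= 120,
    "email format": lambda r: re.fullmatch(
        r"[^@\s]+@[^@\s]+\.[^@\s]+", r.get("email") or ""
    ) is not None,
}

def validate(rows, rules=RULES):
    """Split rows into accepted rows and quarantined (row, failed rule names) pairs."""
    accepted, quarantined = [], []
    for row in rows:
        failures = [name for name, check in rules.items() if not check(row)]
        if failures:
            quarantined.append((row, failures))
        else:
            accepted.append(row)
    return accepted, quarantined
```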
