Database Subsetting

Home » Solutions » Test Data » DB Subsetting

Quick Links

Overview Benchmarking Compliance DB Test Data DB Subsetting DevOps Test Files/Reports Virtual Test Data TDaaS

Challenges

Database application developers often rely on data in production tables for testing. But there are several drawbacks to that approach, including, the:

confidentiality of data in those tables
cost of migrating, masking, refreshing, and/or storing replicated databases for testing
redundancy of production data, which means wasted space and insufficient testing coverage
need for only small slices of data for specific test cases

Sometimes smaller, masked subsets of massive files are also needed for rapidly testing applications with anonymized data. Most data masking tools cannot support the volume and variety of flat files involved.

Solutions

In addition to the powerful database parsing, generation, and population capabilities that IRI RowGen provides for synthesizing structurally and referentially correct test data, you can now also produce (and mask) referential intact database subsets from standard relational -- as well as complex and/or very large database (VLDB) -- sources.

A proven, powerful database subsetting wizard for relational databases is inside IRI Workbench, the Eclipse IDE for the IRI Voracity test data management platform or its component IRI Data Protector suite test data security tools: IRI FieldShield and IRI DarkShield for data masking and/or RowGen for test data generation.

This ergonomic data subsetting tool allows you to rapidly create custom-sized subsets of manageable, referentially correct data determined by your master (parent) table, and apply consistent data masking and/or mapping rules to all the subset (child) tables at the same time.

It's also possible to selectively subset and mask data from individual tables on an ad hoc basis for testing using the same IRI metadata framework. To do that, simply write, or use a wizard to automatically create, a single source FieldShield data masking job script with the table details, and add either SQL SELECT syntax directly into the input section, or create and use custom /INCOLLECT row-count filtering and/or qualitative /INCLUDE or /OMIT statements to define the size and content of each subset, respectively.

You can provision the DB subsets in multiple ways for test data users working on-premise or in the cloud. These database subset management options include: new persistent or virtual (federated) test schema, flat file targets, and DevOps pipelines per this example

In addition to database subsets, you can also create test file subsets. Use RowGen to create synthetic, highly realistic test files in any format and size. Use FieldShield to extract (and mask) test data subsets from structured (flat) files in fixed position or delimited format using its built-in selection and filtering functionality. Built on the same big data engine for Voracity (the IRI CoSort SortCLprogram), FieldShieldwill search and mask structured files at any volume, too.

Subsetting strategies like these not only minimize the risk of PII exposure and privacy law violations, but dramatically lower the costs of database and application testing infrastructures ... some say as much as $50,000 per database. And by running the subsetting jobs you save, you can use this tool for automated test data generation.

Learn how to automatically set-up, and create test data subsetting jobs in Workbench here.

Frequently Asked Questions (FAQs)

1. What is database subsetting?

Database subsetting is the process of extracting a smaller, representative slice of data from a larger production database. These subsets maintain referential integrity and are used for testing, development, and training without exposing full production datasets.

2. How can database subsetting help reduce testing costs?

By working with smaller, targeted subsets instead of full databases, you reduce the storage, migration, masking, and refresh costs. Subsets also accelerate test cycles and infrastructure provisioning.

3. What makes IRI’s subsetting solution different?

IRI’s solution uses a graphical job configuration wizard to specify the automatic creation of referentially intact subsets based on master-child table relationships. It applies masking and mapping rules across all related tables and scales to large or complex database environments. It also supports custom qualitative and quantitative filters for target tables and/or custom files.

4. How does IRI ensure referential integrity in subsets?

IRI’s subsetting tool identifies the parent table and preserves all related foreign key relationships across child tables, ensuring that the extracted subset behaves like the original schema during testing.

5. Can I apply data masking during the subsetting process?

Yes. Subsetting jobs in the IRI Voracity platform support the simultaneous application of data masking rules in your subsetting jobs. You can reuse FieldShield’s encryption, redaction, or pseudonymization functions across all extracted data to stay compliant with privacy laws and maintain referential integrity. You can apply these rules, and other data transformation, cleansing, and reporting functions at the same time (during subsetting) thanks to the use of the SortCL data definition and manipulation program powering both DB and file subsetting.

6. What databases are supported for subsetting?

IRI supports all major relational databases, including Oracle, SQL Server, PostgreSQL, MySQL, DB2, and Teradata. It also works with very large databases (VLDBs) without needing proprietary connectors or additional infrastructure.

7. How do I subset a specific portion of a table?

You can filter data using SQL SELECT statements, or apply SortCL/INCOLLECT, /INCLUDE, or /OMIT syntax in FieldShield jobs to define subset size, row limits, or specific criteria. These types of filters are front-ended in the database subsetting wizard in the graphical IRI IDE, too.

8. Can I subset (and mask PII in) flat files, too?

Yes. IRI FieldShield can extract and mask structured flat files, including fixed-width and delimited formats. RowGen can also generate synthetic test file subsets in any structure or volume. You can use the same filtering functions described above in any SortCL-compatible job script to do this.

9. What formats can I provision the subsets into?

Subset data can be delivered as persistent tables, federated (virtual) schemas, or flat files. You can also push them into DevOps pipelines or clone environments using supported tools.

10. Can I reuse subsetting jobs for automated test data generation?

Yes. Once created, your subsetting and masking jobs can be saved and reused in scheduled, scripted, or pipeline-driven workflows for automated test data provisioning.

11. How does subsetting support compliance with data privacy laws?

Subsetting minimizes exposure by limiting access to sensitive fields and enabling targeted masking. This helps organizations stay compliant with laws like HIPAA, GDPR, and PCI DSS.

12. Can I perform subsetting in the cloud?

Yes. IRI supports on-premise, cloud, and hybrid environments. You can provision subsets to cloud-hosted databases or storage and integrate with cloud DevOps tools like Jenkins, GitLab, and Azure DevOps.

Share this page

Request More Information

Live Chat

* indicates a required field.
IRI does NOT share your information.

Database Subsetting

Test with Smaller, Masked Schemas

Quick Links

Frequently Asked Questions (FAQs)

Request More Information

Solutions

Products

Customers

Services

Company

Support

News

Partners

Database Subsetting

Test with Smaller, Masked Schemas

Quick Links

Frequently Asked Questions (FAQs)

Request More Information

Follow us on

Get the IRI Newsletter